RFC 9755 | UTF8=ACCEPT | March 2025 |
Resnick, et al. | Standards Track |
This specification extends the Internet Message Access Protocol, specifically IMAP4rev1 (RFC 3501), to support UTF-8 encoded international characters in user names, mail addresses, and message headers. This specification replaces RFC 6855. This specification does not extend IMAP4rev2 (RFC 9051), since that protocol includes everything in this extension.
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9755.
Copyright (c) 2025 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
This specification forms part of the Email Address Internationalization protocols described in the Email Address Internationalization Framework document [RFC6530]. It extends IMAP [RFC3501] to permit UTF-8 [RFC3629] in headers, as described in "Internationalized Email Headers" [RFC6532]. It also adds a mechanism to support mailbox names using the UTF-8 charset. This specification creates two new IMAP capabilities to allow servers to advertise these new extensions.
This specification assumes that the IMAP server will be operating in a fully internationalized environment, i.e., one in which all clients accessing the server will be able to accept non-ASCII message header fields and other information, as specified in Section 3. At least during a transition period, that assumption will not be realistic for many environments; the issues involved are discussed in Section 7 below.
This specification replaces an earlier, experimental approach to the same problem; see [RFC5738] as well as [RFC6855].
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
The "UTF8=ACCEPT" capability indicates that the server supports the ability to open mailboxes containing internationalized messages with the "SELECT" and "EXAMINE" commands, and the server can provide UTF-8 responses to the "LIST" and "LSUB" commands. This capability also affects other IMAP extensions that can return mailbox names or their prefixes, such as NAMESPACE [RFC2342] and ACL [RFC4314].
The "UTF8=ONLY" capability, described in Section 7, implies the "UTF8=ACCEPT" capability. A server is said to support "UTF8=ACCEPT" if it advertises either "UTF8=ACCEPT" or "UTF8=ONLY".
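As an illustration only, a client might classify a server's advertised capabilities along these lines. The function names are hypothetical, and the sketch assumes the capability tokens have already been split out of a CAPABILITY response and that names are compared case-insensitively.

```python
def utf8_support(capabilities):
    """Classify UTF-8 support from an iterable of capability names.

    Returns "only" if UTF8=ONLY is advertised, "accept" if UTF8=ACCEPT is
    advertised, and None if neither is present. (Hypothetical helper.)
    """
    caps = {c.upper() for c in capabilities}
    if "UTF8=ONLY" in caps:
        return "only"    # UTF8=ONLY implies UTF8=ACCEPT
    if "UTF8=ACCEPT" in caps:
        return "accept"
    return None


def supports_utf8_accept(capabilities):
    """True if the server is 'said to support' UTF8=ACCEPT."""
    return utf8_support(capabilities) is not None
```

Either result other than None means the server supports "UTF8=ACCEPT" in the sense used by this specification.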
A client MUST use the "ENABLE" command [RFC5161] with the "UTF8=ACCEPT" option (defined in Section 4 below) to indicate to the server that the client accepts UTF-8 in quoted-strings and supports the "UTF8=ACCEPT" extension. The "ENABLE UTF8=ACCEPT" command is only valid in the authenticated state.
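A minimal client-side sketch of this step follows. It assumes a connection object shaped like Python's `imaplib.IMAP4` (a `capabilities` attribute and an `enable()` method returning a `(type, data)` pair) that has already authenticated; `enable_utf8` is a hypothetical helper name, not part of any library.

```python
def enable_utf8(conn):
    """Send ENABLE UTF8=ACCEPT if the server advertises what is needed.

    `conn` is assumed to behave like an authenticated imaplib.IMAP4 object.
    Returns True if the server answered OK, False if the command was not
    sent or was refused.
    """
    caps = {c.upper() for c in conn.capabilities}
    has_utf8 = bool({"UTF8=ACCEPT", "UTF8=ONLY"} & caps)
    if "ENABLE" not in caps or not has_utf8:
        return False
    # The client always enables UTF8=ACCEPT, even on a UTF8=ONLY server.
    typ, _data = conn.enable("UTF8=ACCEPT")
    return typ == "OK"
```

Because a server may advertise different capabilities after authentication, a real client would probably refresh the capability list before calling such a helper.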
The IMAP base specification [RFC3501] forbids the use of 8-bit characters in atoms or quoted-strings. Thus, a UTF-8 string can only be sent as a literal. This can be inconvenient from a coding standpoint, and unless the server offers IMAP non-synchronizing literals [RFC7888], this requires an extra round trip for each UTF-8 string sent by the client. When the IMAP server supports "UTF8=ACCEPT", it supports UTF-8 in quoted-strings with the following ABNF syntax [RFC5234]:
    quoted        =/ DQUOTE *uQUOTED-CHAR DQUOTE
           ; QUOTED-CHAR is not modified, as it will affect
           ; other RFC 3501 ABNF non-terminals.
    uQUOTED-CHAR  = QUOTED-CHAR / UTF8-2 / UTF8-3 / UTF8-4
    UTF8-2        = <Defined in Section 4 of RFC 3629>
    UTF8-3        = <Defined in Section 4 of RFC 3629>
    UTF8-4        = <Defined in Section 4 of RFC 3629>
When this extended quoting mechanism is used by the client, the server MUST reject, with a "BAD" response, any octet sequences with the high bit set that fail to comply with the formal syntax requirements of UTF-8 [RFC3629]. The IMAP server MUST NOT send UTF-8 in quoted-strings to the client unless the client has indicated support for that syntax by using the "ENABLE UTF8=ACCEPT" command.
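The following sketch shows one way a server-side parser could apply the extended quoted syntax. `parse_extended_quoted` is a hypothetical name; the sketch relies on Python's strict UTF-8 decoder to reject malformed sequences (including overlong forms and surrogates) and raises `ValueError` at the points where a server would answer "BAD".

```python
DQUOTE, BACKSLASH = 0x22, 0x5C


def parse_extended_quoted(raw: bytes) -> str:
    """Parse a quoted string, including its surrounding DQUOTEs.

    Accepts RFC 3501 QUOTED-CHAR octets plus well-formed UTF-8 multi-octet
    sequences. Raises ValueError for anything else.
    """
    if len(raw) < 2 or raw[0] != DQUOTE or raw[-1] != DQUOTE:
        raise ValueError("not a quoted string")
    body = raw[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        b = body[i]
        if b in (0x00, 0x0D, 0x0A):
            raise ValueError("NUL, CR, and LF are not allowed")
        if b == BACKSLASH:
            # A backslash may only escape DQUOTE or another backslash.
            i += 1
            if i >= len(body) or body[i] not in (DQUOTE, BACKSLASH):
                raise ValueError("invalid escape")
            out.append(body[i])
        elif b == DQUOTE:
            raise ValueError("unescaped DQUOTE inside quoted string")
        else:
            out.append(b)
        i += 1
    try:
        # Strict decoding rejects octets with the high bit set that do not
        # form valid UTF-8.
        return out.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("invalid UTF-8 in quoted string") from exc
```

Scanning octet by octet is safe here because UTF-8 lead and continuation octets never collide with DQUOTE or backslash.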
If the server supports "UTF8=ACCEPT", the client MAY use extended quoted syntax with any IMAP argument that permits a string (including astring and nstring). However, if characters outside the US-ASCII repertoire are used in an inappropriate place, the results would be the same as if other syntactically valid but semantically invalid characters were used. Specific cases where UTF-8 characters are permitted or not permitted are described in the following paragraphs.
All IMAP servers that support "UTF8=ACCEPT" SHOULD accept UTF-8 in mailbox names, and those that also support the Mailbox International Naming Convention described in [RFC3501], Section 5.1.3, MUST accept UTF-8 in mailbox names and convert them to the appropriate internal format. Mailbox names MUST comply with the Net-Unicode Definition ([RFC5198], Section 2) with the specific exception that they MUST NOT contain control characters (U+0000 - U+001F and U+0080 - U+009F), a delete character (U+007F), a line separator (U+2028), or a paragraph separator (U+2029).
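As a rough sketch, the explicitly excluded code points could be checked as shown below. `mailbox_name_problems` is a hypothetical helper; it covers only the exclusions listed above and does not attempt to verify the remaining Net-Unicode requirements.

```python
def mailbox_name_problems(name: str) -> list:
    """Return a description of each forbidden character found in `name`.

    An empty list means none of the explicitly excluded code points occur.
    """
    problems = []
    for ch in name:
        cp = ord(ch)
        if cp <= 0x1F or 0x80 <= cp <= 0x9F:
            problems.append(f"control character U+{cp:04X}")
        elif cp == 0x7F:
            problems.append("delete character U+007F")
        elif cp == 0x2028:
            problems.append("line separator U+2028")
        elif cp == 0x2029:
            problems.append("paragraph separator U+2029")
    return problems
```

Returning descriptions rather than a boolean makes it easier to produce a useful diagnostic when a name is refused.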
Once an IMAP client has enabled UTF-8 support with the "ENABLE UTF8=ACCEPT" command, it MUST NOT issue a "SEARCH" command that contains a charset specification. If an IMAP server receives such a "SEARCH" command in that situation, it SHOULD reject the command with a "BAD" response (due to the conflicting charset labels). This also applies to any IMAP command or extension that includes an optional charset label and associated strings in the command arguments, including the MULTISEARCH extension. For commands with a mandatory charset field, such as SORT and THREAD, servers SHOULD reject charset values other than UTF-8 with a "BAD" response (due to the conflicting charset labels).
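The charset rules above might be sketched on the server side as follows. This is an illustration under simplifying assumptions: `charset_verdict` is a hypothetical name, the arguments are assumed to be pre-tokenized (with the parenthesized SORT criteria as a single token), the charset is assumed to be the second argument of SORT and THREAD, and the UID prefix and MULTISEARCH are not handled.

```python
def charset_verdict(command: str, args: list, utf8_enabled: bool) -> str:
    """Return "BAD" where the charset rules suggest rejection, else "OK"."""
    if not utf8_enabled:
        return "OK"  # the rules only apply after ENABLE UTF8=ACCEPT
    cmd = command.upper()
    if cmd == "SEARCH":
        # Optional charset: its mere presence conflicts with UTF8=ACCEPT.
        if args and args[0].upper() == "CHARSET":
            return "BAD"
    elif cmd in ("SORT", "THREAD"):
        # Mandatory charset: only UTF-8 is acceptable.
        if len(args) < 2 or args[1].upper() != "UTF-8":
            return "BAD"
    return "OK"
```

Since rejection is a SHOULD rather than a MUST, a server could choose a more lenient policy than this sketch.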
If the server supports "UTF8=ACCEPT", then the server accepts UTF-8 headers in the "APPEND" command message argument.
If an IMAP server supports "UTF8=ACCEPT" and the IMAP client has not issued the "ENABLE UTF8=ACCEPT" command, the server MUST reject, with a "NO" response, an "APPEND" command that includes any 8-bit character in message header fields.
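A minimal sketch of the APPEND check, assuming the message is available as raw octets and that only the top-level header section (everything before the first empty line) is inspected. `append_verdict` is a hypothetical name, and header fields of nested MIME parts are deliberately left out of this illustration.

```python
def append_verdict(message: bytes, utf8_enabled: bool) -> str:
    """Return "NO" if the header has 8-bit octets and UTF-8 is not enabled."""
    # The header section ends at the first CRLF CRLF; if there is none,
    # the whole message is treated as header.
    header, _sep, _body = message.partition(b"\r\n\r\n")
    has_8bit = any(octet >= 0x80 for octet in header)
    if has_8bit and not utf8_enabled:
        return "NO"
    return "OK"
```

8-bit content in the body alone does not trigger the rejection in this sketch, since the rule above concerns message header fields.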
This specification does not extend the IMAP "LOGIN" command [RFC3501] to support UTF-8 usernames and passwords. Whenever a client needs to use UTF-8 usernames or passwords, it MUST use the IMAP "AUTHENTICATE" command, which is already capable of passing UTF-8 usernames and credentials.
Although using the IMAP "AUTHENTICATE" command in this way makes it syntactically legal to have a UTF-8 username or password, there is no guarantee that the user provisioning system utilized by the IMAP server will allow such identities. This is an implementation decision and may depend on what identity system the IMAP server is configured to use.
[RFC9051], Section 7.5.2 treats message/global like message/rfc822, which means that for some messages, the response to FETCH BODYSTRUCTURE varies depending on whether IMAP4rev1 or IMAP4rev2 is in use.
[RFC6855] does not extend [RFC3501] in this respect. This document extends the media-message ABNF production to match [RFC9051].
    media-message = DQUOTE "MESSAGE" DQUOTE SP DQUOTE ("RFC822" / "GLOBAL") DQUOTE
When IMAP4rev1 and UTF8=ACCEPT have been enabled, the server MAY treat message/global like message/rfc822 when computing the body structure, but MAY also treat it as described in [RFC3501]. Clients MUST accept both cases.
When IMAP4rev2 and UTF8=ACCEPT are in use, the server MUST behave as described in [RFC9051].
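One way a client could accept both body structure forms is to inspect the shape of the part rather than its subtype. The sketch below assumes the part has already been parsed into nested Python lists with NIL mapped to `None`; `is_message_style_part` is a hypothetical name. It relies on the RFC 3501 grammar, in which the element after the octet count is an envelope (a parenthesized list) in the message form, but an optional MD5 nstring in the basic form.

```python
def is_message_style_part(part: list) -> bool:
    """Guess whether a single-part body structure uses the message form.

    Expected layout: type, subtype, params, id, description, encoding,
    octets, then either (envelope, body, lines) or optional extension data.
    """
    # Index 7 is an envelope list in the message form; in the basic form it
    # is absent or an MD5 value (string or NIL).
    return len(part) > 7 and isinstance(part[7], list)
```

A client that branches on this shape handles a message/global part correctly whichever form the server chose.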
The "UTF8=ONLY" capability indicates that the server supports "UTF8=ACCEPT" (see Section 3) and that it requires support for UTF-8 from clients. In particular, this means that the server will send UTF-8 in quoted-strings, and it will not accept the older international mailbox name convention (modified UTF-7 [RFC3501]). Because these are incompatible changes to IMAP, explicit server announcement and client confirmation are necessary: clients MUST use the "ENABLE UTF8=ACCEPT" command before using this server. A server that advertises "UTF8=ONLY" will reject, with a "NO [CANNOT]" response [RFC5530], any command that might require UTF-8 support and is not preceded by an "ENABLE UTF8=ACCEPT" command.
IMAP clients that find support for a server that announces "UTF8=ONLY" problematic are encouraged to at least detect the announcement and provide an informative error message to the end user.
Because the "UTF8=ONLY" server capability includes support for "UTF8=ACCEPT", the capability string will include, at most, one of those and never both. For the client, "ENABLE UTF8=ACCEPT" is always used -- never "ENABLE UTF8=ONLY".
In most situations, it will be difficult or impossible for the implementer or operator of an IMAP (or POP) server to know whether all of the clients that might access it, or the associated mail store more generally, will be able to support the facilities defined in this document. In almost all cases, servers that conform to this specification will have to be prepared to deal with clients that do not enable the relevant capabilities. Unfortunately, there is no completely satisfactory way to do so other than for systems that wish to receive email that requires SMTPUTF8 capabilities to be sure that all components of those systems -- including IMAP and other clients selected by users -- are upgraded appropriately.
When a message that requires SMTPUTF8 is encountered and the client does not enable UTF-8 capability, choices available to the server include hiding the problematic message(s), creating in-band or out-of-band notifications or error messages, or somehow trying to create a surrogate of the message with the intention of providing useful information to that client about what has occurred. Such surrogate messages cannot be actual substitutes for the original message: they will almost always be impossible to reply to (either at all or without loss of information) and the new header fields or specialized constructs for server-client communications may go beyond the requirements of current email specifications (e.g., [RFC5322]). Consequently, such messages may confuse some legacy mail user agents (including IMAP clients) or not provide expected information to users. There are also trade-offs in constructing surrogates of the original message between accepting complexity and additional computation costs in order to try to preserve as much information as possible (for example, in "Post-Delivery Message Downgrading for Internationalized Email Messages" [RFC6857]) and trying to minimize those costs while still providing useful information (for example, in "Simplified POP and IMAP Downgrading for Internationalized Email" [RFC6858]).
Implementations that choose to perform downgrading SHOULD use one of the standardized algorithms provided in [RFC6857] or [RFC6858]. Getting downgrade algorithms right, and minimizing the risk of operational problems and harm to the email system, is tricky and requires careful engineering. These two algorithms are well understood and carefully designed.
Because such messages are really surrogates of the original ones, not really "downgraded" ones (although that terminology is often used for convenience), they inevitably have relationships to the originals that the IMAP specification [RFC3501] did not anticipate. This brings up two concerns in particular: First, digital signatures computed over and intended for the original message will often not be applicable to the surrogate message, and will often fail signature verification. (It will be possible for some digital signatures to be verified, if they cover only parts of the original message that are not affected in the creation of the surrogate.) Second, servers that may be accessed by the same user with different clients or methods (e.g., POP or webmail systems in addition to IMAP or IMAP clients with different capabilities) will need to exert extreme care to be sure that UIDVALIDITY [RFC3501] behaves as the user would expect. Those issues may be especially sensitive if the server caches the surrogate message or computes and stores it when the message arrives with the intent of making either form available depending on client capabilities. Additionally, in order to cope with the case when a server compliant with this extension returns the same UIDVALIDITY to both legacy and "UTF8=ACCEPT"-aware clients, a client upgraded from being non-"UTF8=ACCEPT"-aware MUST discard its cache of messages downloaded from the server.
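The cache rule for upgraded clients could be sketched as below. The `CachedMailbox` structure and `reconcile_cache` function are hypothetical; the sketch assumes the client records, alongside UIDVALIDITY, whether its cache was filled while "UTF8=ACCEPT" was enabled.

```python
from dataclasses import dataclass


@dataclass
class CachedMailbox:
    uidvalidity: int
    utf8_aware: bool   # True if filled with UTF8=ACCEPT enabled
    messages: dict     # UID -> cached message data


def reconcile_cache(cache, server_uidvalidity, client_utf8_aware):
    """Return the cache to use for this session, emptied when required."""
    if cache is None or cache.uidvalidity != server_uidvalidity:
        # Ordinary IMAP rule: a new UIDVALIDITY invalidates cached UIDs.
        return CachedMailbox(server_uidvalidity, client_utf8_aware, {})
    if client_utf8_aware and not cache.utf8_aware:
        # Same UIDVALIDITY, but the cache may hold surrogate messages
        # fetched before the upgrade, so it is discarded.
        return CachedMailbox(server_uidvalidity, True, {})
    return cache
```

Storing the awareness flag with the cache lets the client detect the upgrade even when the server reports an unchanged UIDVALIDITY.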
The best (or "least bad") approach for any given environment will depend on local conditions, local assumptions about user behavior, the degree of control the server operator has over client usage and upgrading, the options that are actually available, and so on. It is impossible, at least at the time of publication of this specification, to give good advice that will apply to all situations, or even particular profiles of situations, other than "upgrade legacy clients as soon as possible".
When an IMAP server uses a mailbox format that supports UTF-8 headers and it permits selection or examination of that mailbox without issuing "ENABLE UTF8=ACCEPT" first, it is the responsibility of the server to comply with the IMAP base specification [RFC3501] and the Internet Message Format [RFC5322] with respect to all header information transmitted over the wire. The issue of handling messages containing non-ASCII characters in legacy environments is discussed in Section 8.
The "IMAP Capabilities" registry contained a number of references to [RFC6855]. IANA has updated them to point to this document instead. The affected references are:
The security considerations of UTF-8 [RFC3629] and PRECIS Usernames and Passwords [RFC8265] apply to this specification, particularly with respect to use of UTF-8 in usernames and passwords. Otherwise, this is not believed to alter the security considerations of IMAP.
Special considerations, some of them with security implications, occur if a server that conforms to this specification is accessed by a client that does not, as well as in some more complex situations in which a given message is accessed by multiple clients that might use different protocols and/or support different capabilities. Those issues are discussed in Section 8.
This non-normative section discusses the reasons behind some of the design choices in this specification.
The "UTF8=ONLY" mechanism simplifies diagnosis of interoperability problems when legacy support goes away. In the situation where backwards compatibility is not working anyway, the non-conforming "just-send-UTF-8 IMAP" has the advantage that it might work with some legacy clients. However, the difficulty of diagnosing interoperability problems caused by a "just-send-UTF-8 IMAP" mechanism is the reason the "UTF8=ONLY" capability mechanism was chosen.
This non-normative section describes the changes made since [RFC6855].
This document removes APPEND's UTF8 data item, making the UTF8-related syntax compatible with IMAP4rev2 as defined by [RFC9051] and making it simpler for clients to support IMAP4rev1 and IMAP4rev2 with the same code.
IMAP4rev2 [RFC9051] provides roughly the same abilities as [RFC6855] but does not include APPEND's UTF8 item. None of [RFC6855], IMAP4rev2, or JMAP [RFC8620] specify any way to learn whether a particular message was stored using the UTF8 data item. As of today, an IMAP client cannot learn whether a particular message was stored using the UTF8 data item, nor would it be able to trust that information even if IMAP4rev1 and 2 were extended to provide that information.
In July 2023, one of the authors found only one IMAP client that uses the UTF8 data item, and that client uses it incorrectly (it sends the data item for all messages if the server supports UTF8=ACCEPT, without regard to whether a particular message includes any UTF8 at all).
For these reasons, it was judged best to revise [RFC6855] and adopt the same syntax as IMAP4rev2.
[RFC6532] defines a new media type, message/global, which is substantially like message/rfc822 except that the submessage may (also) use the syntax defined in [RFC6532]. [RFC3501] and [RFC9051] define a FETCH item to return the MIME structure of a message, which servers usually compute once and store.
None of the RFCs point out to implementers that IMAP4rev1 and IMAP4rev2 are slightly different, so storing the BODYSTRUCTURE in the way servers and clients often do can easily lead to problems.
This document makes the syntax optional, making it simple for server authors to implement this extension correctly. This implies that clients need to parse and handle both varieties, which they need to do anyway if they want to support both IMAP4rev1 and IMAP4rev2.
This document is an almost unchanged copy of [RFC6855], which was written by Pete Resnick, Chris Newman, and Sean Shen. Sean has since changed jobs and the current authors do not have a new email address for him. We cannot be sure that he would approve of the changes in this document, so we did not list him as author, but do gratefully acknowledge his work on [RFC6855]. Jiankang Yao replaces him.
The next paragraph is a straight copy of the acknowledgments in [RFC6855]:
The authors wish to thank the participants of the EAI working group for their contributions to this document, with particular thanks to Harald Alvestrand, David Black, Randall Gellens, Arnt Gulbrandsen, Kari Hurtta, John Klensin, Xiaodong Lee, Charles Lindsey, Alexey Melnikov, Subramanian Moonesamy, Shawn Steele, Daniel Taharlev, and Joseph Yee for their specific contributions to the discussion.
Many of them also reread the document during this revision.