Unicode® Technical Standard #46

Unicode IDNA Compatibility Processing

Version	17.0.0
Editors	Mark Davis ([email protected]), Markus Scherer ([email protected])
Date	2025-09-04
This Version	https://www.unicode.org/reports/tr46/tr46-35.html
Previous Version	https://www.unicode.org/reports/tr46/tr46-33.html
Latest Version	https://www.unicode.org/reports/tr46/
Latest Proposed Update	https://www.unicode.org/reports/tr46/proposed.html
Revision	35

Summary

Client software, such as browsers and emailers, faced a difficult transition from the version of international domain names approved in 2003 (IDNA2003), to the revision approved in 2010 (IDNA2008). The specification in this document has been providing a mechanism that minimizes the impact of this transition for client software, allowing client software to access domains that are valid under either system.

The specification provides two main features: One is a comprehensive mapping to support current user expectations for casing and other variants of domain names. Such a mapping is allowed by IDNA2008. The second is a compatibility mechanism that supports the existing domain names that were allowed under IDNA2003. This second feature was intended to improve client behavior during the transition period.

Status

This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.

A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.

Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard, see [Unicode]. For a list of current Unicode Technical Reports, see [Reports]. For more information about versions of the Unicode Standard, see [Versions].

1 Introduction

One of the great strengths of domain names is universality. The URL https://Apple.com goes to Apple's website from anywhere in the world, using any browser. The email address [email protected] can be used to send email to an editor of this specification from anywhere in the world, using any emailer.

Initially, domain names were restricted to ASCII characters. This was a significant burden on people using other characters. Suppose, for example, that the domain name system had been invented by Greeks, and one could only use Greek characters in URLs. Rather than apple.com, one would have to write something like αππλε.κομ. An English speaker would not only have to be acquainted with Greek characters, but would also have to pick those Greek letters that would correspond to the desired English letters. One would have to guess at the spelling of particular words, because there are not exact matches between scripts.

Most of the world’s population faced this situation until recently, because their languages use non-ASCII characters. A system was introduced in 2003 for internationalized domain names (IDN). This system is called Internationalizing Domain Names for Applications, or IDNA2003 for short. This mechanism supports IDNs by means of a client software transformation into a format known as Punycode. A revision of IDNA was approved in 2010 (IDNA2008). This revision has a number of incompatibilities with IDNA2003.

The incompatibilities forced implementers of client software, such as browsers and emailers, to face difficult choices during the transition period as registries shifted from IDNA2003 to IDNA2008. This document specifies a mechanism that has minimized the impact of this transition for client software, allowing client software to access domains that are valid under either system.

The specification provides two main features. The first is a comprehensive mapping to support current user expectations for casing and other variants of domain names. Such a mapping is allowed by IDNA2008. The second feature is a compatibility mechanism that supports the existing domain names that were allowed under IDNA2003. This second feature was intended to improve client behavior during the transition period. Although the transition is complete and transitional processing is now deprecated, the mapping and processing defined in this specification, and the validation based on the latest version of Unicode, remain valuable and in widespread use.

This specification contains both normative and informative material. Only the conformance clauses and the text that they directly or indirectly reference are considered normative.

1.1 IDNA2003

The series of RFCs collectively known as IDNA2003 [IDNA2003] allows domain names to contain non-ASCII Unicode characters, which includes not only the characters needed for Latin-script languages other than English (such as Å, Ħ, or Þ), but also different scripts, such as Greek, Cyrillic, Tamil, or Korean. An internationalized domain name such as Bücher.de can then be used in an "internationalized" URL, called an IRI, such as http://Bücher.de#titel.

The IDNA mechanism for allowing non-ASCII Unicode characters in domain names involves applying the following steps to each label in the domain name that contains Unicode characters:

Transforming (mapping) a Unicode string to remove case and other variant differences.
Checking the resulting string for validity, according to certain rules.
Transforming the Unicode characters into a DNS-compatible ASCII string using a specialized encoding called Punycode [RFC3492].

For example, typing the IRI http://Bücher.de into the address bar of any modern browser goes to a corresponding site, even though the "ü" is not an ASCII character. This works because the IDN in that IRI resolves to the Punycode string which is actually stored by the DNS for that site. Similarly, when a browser interprets a web page containing a link such as <a href="http://Bücher.de">, the appropriate site is reached. (In this document, phrases such as "a browser interprets" refer to domain names parsed out of IRIs entered in an address bar as well as to those contained in links internal to HTML text.)

In the case of the IDN Bücher.de, the Punycode value actually used for the domain names on the wire is xn--bcher-kva.de. The Punycode version is also typically transformed back into Unicode form for display. The resulting display string will be a string which has already been mapped according to the IDNA2003 rules. This example results in a display string for the IRI that has been casefolded to lowercase:

http://Bücher.de → http://xn--bcher-kva.de → http://bücher.de
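
The round trip shown above can be reproduced with Python's built-in "idna" codec, which follows the IDNA2003 rules of [RFC3490] (it does not implement IDNA2008 or UTS #46). This is a minimal sketch; the variable names are illustrative only.

```python
# Python's standard "idna" codec implements IDNA2003 (RFC 3490 with Nameprep).
domain = "B\u00FCcher.de"                  # "Bücher.de"

ascii_form = domain.encode("idna")         # ToASCII applied to each label
display_form = ascii_form.decode("idna")   # ToUnicode applied to each label

print(ascii_form)      # b'xn--bcher-kva.de'
print(display_form)    # bücher.de
```

The uppercase "B" is lost in the round trip because the mapping step runs before the Punycode conversion.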

A major limitation of IDNA2003 is its restriction to the repertoire of characters in Unicode 3.2, which means that some modern languages are excluded or not fully supported. Furthermore, within the constraints of IDNA2003, there is no simple way to extend the repertoire. IDNA2003 also does not make it clear to users of registries exactly which string they are registering for a domain name (between Bücher.de and bücher.de, for example).

1.2 IDNA2008

In early 2010, a new version of IDNA was approved. Like IDNA2003, this version consists of a collection of RFCs and is called IDNA2008 [IDNA2008]. IDNA2008 is intended to solve the major problems in IDNA2003. It extends the valid repertoire of characters in domain names, and establishes an automatic process for updating to future versions of the Unicode Standard. Furthermore, it defines the concept of a valid domain name clearly, so that registrants understand exactly what domain name string is being registered.

Processing in IDNA2008 is identical to IDNA2003 for many common domain names. Both IDNA2003 and IDNA2008 transform a Unicode domain name in an IRI (like http://öbb.at) to the Punycode version (like http://xn--bb-eka.at). However, IDNA2008 does not maintain strict backward compatibility with IDNA2003. The main differences are:

Additions. Some IDNs are invalid in IDNA2003, but valid in IDNA2008.
Subtractions. Some IDNs are valid in IDNA2003, but invalid in IDNA2008.
Deviations. Some IDNs are valid in both, but resolve to different destinations.

1.3 Transition Considerations

The differences between IDNA2008 and IDNA2003 may cause interoperability and security problems. They affect extremely common characters, such as all uppercase characters, all halfwidth or fullwidth characters (commonly used in Japan, China, and Korea), and certain other characters like the German eszett (U+00DF ß LATIN SMALL LETTER SHARP S) and Greek final sigma (U+03C2 ς GREEK SMALL LETTER FINAL SIGMA). Note that for the “deviation” characters like the sharp s and the sigma, the industry has fully transitioned to IDNA2008 behavior, and transitional processing has been deprecated.

1.3.1 Mapping

IDNA2003 requires a mapping phase, which maps ÖBB.at to öbb.at, for example. Mapping typically involves mapping uppercase characters to their lowercase pairs, but it also involves other types of mappings between equivalent characters, such as mapping halfwidth katakana characters to normal katakana characters in Japanese. The mapping phase in IDNA2003 was included to match the case insensitivity of ASCII domain names. Users are accustomed to having both CNN.com and cnn.com work identically. They expect domain names with accents to have the same casing behavior, so that ÖBB.at is the same as öbb.at. There are variations similar to case differences in other scripts. The IDNA2003 mapping is based on data specified in the Unicode Standard, Version 3.2; this mapping was later formalized as the Unicode property [NFKC_Casefold].
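
The kind of mapping described here can be approximated with the Python standard library by combining case folding with compatibility normalization. The helper name approx_map is hypothetical, and the function is only a rough stand-in: it is neither the NFKC_Casefold property nor the UTS #46 mapping table.

```python
import unicodedata

def approx_map(domain_name: str) -> str:
    """Rough approximation of a mapping phase: case folding, then NFKC.

    Illustration only; real implementations use the IDNA Mapping Table.
    """
    folded = domain_name.casefold()
    return unicodedata.normalize("NFKC", folded)

print(approx_map("\u00D6BB.at"))   # öbb.at
print(approx_map("CNN.com"))       # cnn.com
print(approx_map("\uFF76"))        # halfwidth katakana KA becomes U+30AB (カ)
```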

Note that case-folding generates a stable form of a string that erases functional case-differences. It is not the same as lowercasing. In particular, the lowercase Cherokee characters added in Unicode Version 8.0 are case-folded to their uppercase counterparts.
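
The difference between case folding and lowercasing can be observed with Python's str.casefold() and str.lower(). This sketch assumes a Python build whose Unicode data is at Version 8.0 or later; the variable names are illustrative.

```python
# Case folding is not lowercasing.
sharp_s = "\u00DF"              # ß LATIN SMALL LETTER SHARP S
print(sharp_s.lower() == sharp_s)   # True: ß is already lowercase
print(sharp_s.casefold())           # ss

small_cherokee_a = "\uAB70"     # CHEROKEE SMALL LETTER A
capital_cherokee_a = "\u13A0"   # CHEROKEE LETTER A
# Lowercasing leaves the small letter alone; case folding maps it to the capital.
print(small_cherokee_a.lower() == small_cherokee_a)
print(small_cherokee_a.casefold() == capital_cherokee_a)
```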

IDNA2008 does not require a mapping phase, but does permit one (called "Local Mapping" or "Custom Mapping"). For more information on the permitted mappings, see the Protocol document of [IDNA2008], Section 4.2, Permitted Character and Label Validation and Section 5.2, Conversion to Unicode.

The UTS #46 specification defines a mapping consistent with the normative requirements of the IDNA2008 protocol, and which is mostly compatible with IDNA2003. For client software, this provides behavior that is the most consistent with user expectations about the handling of domain names with existing data, namely, that domain names are case-insensitive.

1.3.2 Deviations

There are a few situations where the use of IDNA2008 without compatibility mapping will result in the resolution of IDNs to IP addresses different from those in IDNA2003, unless the registry or registrant takes special action. This affects a very small number of characters, but because these characters are very common in particular languages, a significant number of domain names in those languages are affected. This set of characters is referred to as "Deviations" and is shown in Table 1, Deviation Characters, illustrated in the context of IRIs.

Table 1. Deviation Characters

Char	Example	IDNA2003 Result	IDNA2008 Result
ß `00DF`	href="http://faß.de"	http://fass.de → http://fass.de	http://faß.de → http://xn--fa-hia.de
ς `03C2`	href="http://βόλος.com"	http://βόλοσ.com → http://xn--nxasmq6b.com	http://βόλος.com → http://xn--nxasmm1c.com
ZWJ `200D`	href="http://ශ්‍රී.com"	http://ශ්රී.com→ http://xn--10cl1a0b.com	http://ශ්‍රී.com→ http://xn--10cl1a0b660p.com
ZWNJ `200C`	href="http://نامه‌ای.com"	http://نامهای.com→ http://xn--mgba3gch31f.com	http://نامه‌ای.com→ http://xn--mgba3gch31f060k.com

For more information on the rationale for the occurrence of these Deviations in IDNA2008, see the [IDN FAQ].
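
The first two rows of Table 1 can be reproduced with the Python standard library. In this sketch the built-in "idna" codec stands in for IDNA2003, and the IDNA2008 column is approximated by Punycode-encoding the unmapped label directly with the "punycode" codec. The helper punycode_label is hypothetical and performs no validation.

```python
def punycode_label(label: str) -> str:
    """Encode one already-mapped label as an A-label (no validity checks)."""
    return "xn--" + label.encode("punycode").decode("ascii")

fass = "fa\u00DF"                            # faß
volos = "\u03B2\u03CC\u03BB\u03BF\u03C2"     # βόλος

# IDNA2003: the deviation characters are mapped away before encoding.
print((fass + ".de").encode("idna"))         # b'fass.de'
print((volos + ".com").encode("idna"))       # b'xn--nxasmq6b.com'

# IDNA2008: the deviation characters are kept and encoded.
print(punycode_label(fass))                  # xn--fa-hia
print(punycode_label(volos))                 # xn--nxasmm1c
```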

The differences in interpretation of Deviation characters result in potential for security exploits. Consider a scenario involving http://www.sparkasse-gießen.de, a German IRI containing an IDN for "Gießen Savings and Loan".

Alice's browser supports IDNA2003. Under those rules, http://www.sparkasse-gießen.de is mapped to http://www.sparkasse-giessen.de, which leads to a site with the IP address 01.23.45.67.
She visits her friend Bob, and checks her bank statement on his browser. His browser supports IDNA2008. Under those rules, http://www.sparkasse-gießen.de is also valid, but converts to a different Punycode domain name in http://www.xn--sparkasse-gieen-2ib.de. This can lead to a different site with the IP address 101.123.145.167, a spoof site.

Alice ends up at the phishing site, supplies her bank password, and her money is stolen. While the .DE registrar (DENIC) might have a policy about bundling all of the variants of ß together (so that they all have the same owner) it is not required of registries. It is unlikely that all registries will have and enforce such a bundling policy in all such cases.

There are two Deviations of particular concern. IDNA2008 allows the joiner characters (ZWJ and ZWNJ) in labels. By contrast, these are removed by the mapping in IDNA2003. When used in the intended contexts in particular scripts, the joiner characters produce a noticeable change in displayed text. However, when used between any other characters in those scripts, or in any other scripts, they are invisible. For example, when used between the Latin characters "a" and "b" there is no visible difference: the sequence "a<ZWJ>b" looks just like "ab".
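
The invisibility of the joiner can be made concrete with a few lines of Python. The sketch below compares the two strings at the code point level and then shows that Python's built-in "idna" codec, which follows IDNA2003, removes the joiner. The variable names are illustrative.

```python
import unicodedata

ZWJ = "\u200D"                     # ZERO WIDTH JOINER
plain = "ab"
with_joiner = "a" + ZWJ + "b"

print(plain == with_joiner)              # False: different code point sequences
print(len(plain), len(with_joiner))      # 2 3
print(unicodedata.category(ZWJ))         # Cf (format character)

# IDNA2003 (Python's "idna" codec) maps the joiner to nothing.
print((with_joiner + ".com").encode("idna"))   # b'ab.com'
```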

Because of the visual confusability introduced by the joiner characters, IDNA2008 provides a special category for them called CONTEXTJ, and only permits CONTEXTJ characters in limited contexts: certain sequences of Arabic or Indic characters. However, applications that perform IDNA2008 lookup are not required to check for these contexts, so overall security is dependent on registries having correct implementations. Moreover, the IDNA2008 context restrictions do not catch most cases where distinct domain names have visually confusable appearances because of ZWJ and ZWNJ.

Note that for these “deviations”, the industry has fully transitioned to IDNA2008 behavior, and transitional processing has been deprecated.

2 Unicode IDNA Compatibility Processing

To satisfy user expectations for mapping, and (originally) provide compatibility with IDNA2003, this document specifies a mapping for use with IDNA2008. In addition, this document provides a Unicode algorithm for a standardized processing that allows conformant implementations to minimize the security and interoperability problems caused by the differences between IDNA2003 and IDNA2008. This Unicode IDNA Compatibility Processing is structured according to IDNA2003 principles, but extends those principles to Unicode 5.2 and later. It also incorporates the repertoire extensions provided by IDNA2008.

UTS #46 can be used purely as a preprocessing (local mapping) for IDNA2008 by claiming conformance specifically to Conformance Clause C3.

By using this Compatibility Processing, a domain name such as ÖBB.at will be mapped to the valid domain name öbb.at, thus matching user expectation for case behavior in domain names. For transitional use, the Compatibility Processing also allows domain names containing symbols and punctuation that were valid in IDNA2003, such as √.com (which has an associated web page). Such domain names containing symbols will gradually disappear as registries shift to IDNA2008.

Implementations may also restrict or flag (in a UI) domain names that include symbols and punctuation. For more information, see Unicode Technical Report #36, Unicode Security Considerations [UTR36].

Using the Unicode IDNA Compatibility Processing to transform an IDN into a form suitable for DNS lookup is similar to the tactic of "try IDNA2008 then try IDNA2003". However, this approach avoids a potentially problematic dual lookup. It allows browsers and other clients, such as search engines, to have a single processing step, without the burden of maintaining two different implementations and multiple tables. It accounts for a number of edge cases that would cause problems, and provides a stable definition with predictable results.

The Unicode IDNA Compatibility Processing also provides alternate mappings for the Deviation characters. This facilitates the transition from IDNA2003 to IDNA2008. It is up to the registries to decide how to handle the transition, for example, by either bundling or blocking the Deviation characters that they support. In practice, for the deviation characters, the transition is complete. All major implementations have switched to nontransitional processing of the four deviation characters.

The term "registries" includes far more than top-level registries, such as for .de or .com. For example, .blogspot.com has more domain names registered than most top-level registries. There may be different policies in place for a registry and any of its subregistries. Thus millions of registries need to be considered in a transition strategy, not just hundreds.

In lookup software, transitions may be fine-grained: for example, it may be possible to transition to IDNA2008 rules regarding Deviations for .subdomain.com at a given point but not for .com, or vice versa. If .tld bundles or blocks the Deviation characters, then clients could transition Deviations for .tld, but not for (say) .subdomain.tld. Moreover, client software with a UI, such as the address bar in a browser, could provide more options for the transition. A full discussion of such transition strategies is outside of the scope of this document.

During the interim, authors of documents, such as HTML documents, can unambiguously refer to the IDNA2008 interpretation of characters by explicitly using the Punycode form of the domain name label.

There are two slightly different compatibility mechanisms for domain names during a transition and afterward. UTS #46 therefore specifies two specific types of processing: Transitional Processing (Conformance Clause C1) and Nontransitional Processing (Conformance Clause C2). The only difference between them is the handling of the four Deviation characters.

Summarized briefly, UTS #46 builds upon IDNA2008 in three areas:

Mapping. The UTS #46 mapping is used to maintain maximal compatibility and meet user expectations. It is conformant to IDNA2008, which allows for mapping input.
Symbols and Punctuation. UTS #46 supports processing of symbols and punctuation. Registries which implement IDNA2008 will simply refuse the DNS lookups of IDNs with symbols.
Deviations (deprecated). UTS #46 provides two ways of handling these to support a transition. Transitional Processing (deprecated) had been recommended to be used immediately before a DNS lookup in the circumstances where the registry does not guarantee a strategy of bundling or blocking. Nontransitional Processing, which is fully compatible with IDNA2008, should be used in all cases.

For a demonstration of differences between IDNA2003, IDNA2008, and the Unicode IDNA Compatibility Processing, see the [DemoIDN].

UTS #46 does not change any of the terms defined in IDNA2008, such as A-Label or U-Label.

Neither the Unicode IDNA Compatibility Processing nor IDNA2008 addresses security problems associated with confusables (the so-called "paypal.com" problem). IDNA2008 disallows certain symbols and punctuation characters that can be used for spoofing, such as spoofs of the slash character ("/"). However, these are an extremely small fraction of the confusable characters used for spoofing. Moreover, confusable characters themselves account for a small proportion of phishing problems: most are cases like "secure-wellsfargo.com". For more information, see [Bortzmeyer] and the [IDN FAQ]. It is strongly recommended that Unicode Technical Report #36, Unicode Security Considerations [UTR36] and Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39] be consulted for information on dealing with confusables, both for client software and registries. In particular, [UTS39] provides information that can be used to drastically reduce the number of confusables when dealing with international domain names, much beyond what IDNA2008 does. See also the [DemoConf].

2.1 Display of Internationalized Domain Names

IDNA2003 applications customarily display the processed string to the user. This improves security by reducing the opportunity for visual confusability. Thus, for example, the URL http://googIe.com (with a capital I in place of the L) is revealed as http://googie.com.
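
The effect on this example can be imitated in Python with plain case folding. Here str.casefold() is only a stand-in for the mapping step, which is enough for an all-ASCII name; it is not the full mapping used by IDNA2003 or UTS #46.

```python
typed = "http://googIe.com"      # capital I (U+0049) where a lowercase l is expected
displayed = typed.casefold()     # stand-in for the mapping step on ASCII input

print(displayed)                 # http://googie.com
```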

2.2 Registries

This specification is primarily targeted at applications doing lookup of IDNs. There is, however, one strong recommendation for registries: do not allow the registration of labels that are invalid according to Nontransitional Processing, and do use bundling or blocking for labels containing confusable characters.

These tactics can be described as follows:

Bundling: If two or more labels are different, but confusable, and more than one is registered, the registrant for each must be the same.
Blocking: If two or more labels are different, but confusable, allow the registration of only one, and block the others. Registries that do not allow any Deviation characters at all count as blocking.

Note: Some implementations outside Unicode use different terminology for these strategies. In particular, in the ICANN Root Zone Label Generation Rules [RZLGR5], the term allocatable variant of X is used for labels that can be bundled with X, and the term blocked variant is used for a mutually exclusive label.

The label that is actually registered and inserted into a registry has always been processed. For example, xn--bcher-kva corresponds to bücher. It may nevertheless be useful for a registry to also ask for "unprocessed" labels, such as Bücher, as part of the registration process, so that they are aware of the registrant's intent. However, such unprocessed labels must be handled carefully:

Storing the unprocessed label as the sequence of characters that the registrant really wanted to apply for.
Processing the unprocessed label, and displaying the processed label to the registrant for confirmation.
Proceeding with the regular registration process using only the processed label.

2.3 Notation

Sets of code points are defined using properties and the syntax of Unicode Technical Standard #18, Unicode Regular Expressions [UTS18]. For example, the set of combining marks is represented by the syntax \p{gc=M}. Additionally, the "+" indicates the addition of elements to a set, for clarity.

In this document, a label is a substring of a domain name. That substring is bounded on both sides by either the start or the end of the string, or any of the following characters, called label-separators:

U+002E ( . ) FULL STOP
U+FF0E ( ． ) FULLWIDTH FULL STOP
U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
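
Splitting a domain name into labels at any of these four label-separators can be sketched with a regular expression. The function name split_labels is hypothetical and the sketch performs no mapping or validation.

```python
import re

# The four label-separators: FULL STOP, FULLWIDTH FULL STOP,
# IDEOGRAPHIC FULL STOP, HALFWIDTH IDEOGRAPHIC FULL STOP.
_LABEL_SEPARATOR = re.compile("[.\uFF0E\u3002\uFF61]")

def split_labels(domain_name: str) -> list:
    """Split a domain name into labels at any label-separator."""
    return _LABEL_SEPARATOR.split(domain_name)

print(split_labels("www\uFF0Eexample\u3002com"))   # ['www', 'example', 'com']
```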

Many people use the terms "domain names" and "hostnames" interchangeably. This document follows [RFC3490] in use of the term "domain name".

A Bidi domain name is a domain name containing at least one character with Bidi_Class R, AL, or AN. See [IDNA2008] RFC 5893, Section 1.4.
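
The definition of a Bidi domain name translates directly into a check on the Bidi_Class of each code point. The sketch below uses Python's unicodedata module; the function name is hypothetical, and the check assumes the name is already in Unicode form, so any labels still in Punycode are not decoded first.

```python
import unicodedata

def is_bidi_domain_name(domain_name: str) -> bool:
    """True if any code point has Bidi_Class R, AL, or AN."""
    return any(
        unicodedata.bidirectional(ch) in {"R", "AL", "AN"}
        for ch in domain_name
    )

print(is_bidi_domain_name("\u0646\u0627\u0645\u0647.com"))   # True (Arabic letters, AL)
print(is_bidi_domain_name("example.com"))                    # False
```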

3 Conformance

The requirements for conformance on implementations of the Unicode IDNA Compatibility Processing algorithm are stated in the following clauses. An implementation can claim conformance to any or all of these clauses independently.

C1 (deprecated). Given a version of Unicode and a Unicode String, a conformant implementation of Transitional Processing shall replicate the results given by applying the Transitional Processing algorithm specified by Section 4, Processing.

C2. Given a version of Unicode and a Unicode String, a conformant implementation of Nontransitional Processing shall replicate the results given by applying the Nontransitional Processing algorithm specified by Section 4, Processing.

C3. Given a version of Unicode and a Unicode String, a conformant implementation of Preprocessing for IDNA2008 shall replicate the results specified by Section 4.4, Preprocessing for IDNA2008.

These specifications are logical ones, designed to be straightforward to describe. An actual implementation is free to use different methods as long as the result is the same as that specified by the logical algorithm.

Any conformant implementation may also have tighter validity criteria than those imposed by Section 4.1, Validity Criteria. For example, an application could disallow or warn of domain name labels with certain characteristics, such as:

labels with certain combinations of scripts (Safari)
labels with characters outside of the user's specified languages (IE)
labels with certain confusable characters (Firefox)
labels that are detected by the Google Safe Browsing API [SafeBrowsing]
labels that do not meet the validity requirements of IDNA2008
labels produced by toUnicode that would not meet the label validity requirements if toASCII were performed
labels containing characters which are not contained in the General Security Profile for Identifiers from Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39]
labels that do not satisfy Restriction Level 4, Moderately Restrictive from Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39]

For more information, see Unicode Technical Report #36, Unicode Security Considerations [UTR36] and Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39].

3.1 STD3 Rules

IDNA2003 provides for a flag, UseSTD3ASCIIRules, that allows for implementations to choose whether or not to abide by the rules in [STD3]. These rules exclude ASCII characters outside the set consisting of A-Z, a-z, 0-9, and U+002D ( - ) HYPHEN-MINUS. For example, some browsers also allow characters such as U+005F ( _ ) LOW LINE (underbar) in domain names, and thus use a custom set of valid ASCII characters when checking the Validity Criteria.

4 Processing

The input to Unicode IDNA Compatibility Processing is a prospective domain_name string expressed in Unicode, and a choice of Transitional or Nontransitional Processing. The domain name consists of a sequence of labels with dot separators, such as "Bücher.de". For more information about the composition of a URL, see Section 3.5 of [STD13].

Main Processing Steps

The following steps, performed in order, successively alter the input domain_name string and then output it as a converted Unicode string, plus a flag to indicate whether there was an error. Even if an error occurs, the conversion of the string is performed as much as is possible.

Input

A prospective domain_name expressed as a sequence of Unicode code points
A boolean flag: UseSTD3ASCIIRules
A boolean flag: CheckHyphens
A boolean flag: CheckBidi
A boolean flag: CheckJoiners
A boolean flag: Transitional_Processing (deprecated)
A boolean flag: IgnoreInvalidPunycode

Processing

1. Map. For each code point in the domain_name string, look up the Status value in Section 5, IDNA Mapping Table, and take the following actions:
   - disallowed: Leave the code point unchanged in the string. Note: The Convert/Validate step below checks for disallowed characters, after mapping and normalization.
   - ignored: Remove the code point from the string. This is equivalent to mapping the code point to an empty string.
   - mapped: If Transitional_Processing (deprecated) and the code point is U+1E9E capital sharp s (ẞ), then replace the code point in the string by “ss”. Otherwise:
     Replace the code point in the string by the value for the mapping in Section 5, IDNA Mapping Table.
   - deviation:
     - If Transitional_Processing (deprecated), replace the code point in the string by the value for the mapping in Section 5, IDNA Mapping Table.
     - Otherwise, leave the code point unchanged in the string.
   - valid: Leave the code point unchanged in the string.
2. Normalize. Normalize the domain_name string to Unicode Normalization Form C.
3. Break. Break the string into labels at U+002E ( . ) FULL STOP.
4. Convert/Validate. For each label in the domain_name string:
   - If the label starts with “xn--”:
     1. If the label contains any non-ASCII code point (i.e., a code point greater than U+007F), record that there was an error, and continue with the next label.
     2. Attempt to convert the rest of the label to Unicode according to Punycode [RFC3492]. If that conversion fails and if not IgnoreInvalidPunycode, record that there was an error, and continue with the next label. Otherwise replace the original label in the string by the results of the conversion.
     3. If the label is empty, or if the label contains only ASCII code points, record that there was an error.
     4. Verify that the label meets the validity criteria in Section 4.1, Validity Criteria for Nontransitional Processing. If any of the validity criteria are not satisfied, record that there was an error.
   - If the label does not start with “xn--”:
     - Verify that the label meets the validity criteria in Section 4.1, Validity Criteria for the input Processing choice (Transitional or Nontransitional). If any of the validity criteria are not satisfied, record that there was an error.

Any input domain_name string that does not record an error has been successfully processed according to this specification. Conversely, if an input domain_name string causes an error, then the processing of the input domain_name string fails. Determining what to do with error input is up to the caller, and not in the scope of this document. The processing is idempotent: reapplying the processing to the output will make no further changes. For examples, see Table 2, Examples of Transitional Processing.
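
The overall flow of the Map, Normalize, Break, and Convert steps can be sketched in Python as follows. Everything here is an assumption made for illustration: TOY_TABLE is a hypothetical, drastically reduced stand-in for the IDNA Mapping Table, case folding is used as a fallback for characters not in it, the validity criteria are not checked, the IgnoreInvalidPunycode flag is not modeled, and Python's "punycode" codec may be more lenient than a strict [RFC3492] decoder.

```python
import unicodedata

# Hypothetical, heavily reduced stand-in for the IDNA Mapping Table (Section 5).
TOY_TABLE = {
    "\u00DF": ("deviation", "ss"),       # ß
    "\u03C2": ("deviation", "\u03C3"),   # ς -> σ
    "\u200C": ("deviation", ""),         # ZWNJ
    "\u200D": ("deviation", ""),         # ZWJ
    "\u00AD": ("ignored", ""),           # SOFT HYPHEN
    "\u1E9E": ("mapped", "\u00DF"),      # ẞ -> ß
    "\u3002": ("mapped", "."),           # IDEOGRAPHIC FULL STOP
}

def lookup(ch):
    """Return (status, mapping) for one code point, using the toy table."""
    if ch in TOY_TABLE:
        return TOY_TABLE[ch]
    folded = ch.casefold()
    # Assumption: anything that case-folds to something else counts as "mapped".
    return ("mapped", folded) if folded != ch else ("valid", ch)

def process(domain_name, transitional=False):
    """Return (converted string, error flag); validation is omitted."""
    error = False
    # Map.
    mapped = []
    for ch in domain_name:
        status, mapping = lookup(ch)
        if status == "ignored":
            continue
        if status == "mapped":
            if transitional and ch == "\u1E9E":
                mapping = "ss"
            mapped.append(mapping)
        elif status == "deviation":
            mapped.append(mapping if transitional else ch)
        else:  # valid (the toy table has no disallowed entries)
            mapped.append(ch)
    # Normalize.
    text = unicodedata.normalize("NFC", "".join(mapped))
    # Break, then Convert each label that starts with "xn--".
    labels = []
    for label in text.split("."):
        if label.startswith("xn--"):
            if not label.isascii():
                error = True
                labels.append(label)
                continue
            try:
                label = label[4:].encode("ascii").decode("punycode")
            except UnicodeError:
                error = True
                labels.append(label)
                continue
            if label == "" or label.isascii():
                error = True
        labels.append(label)
    return ".".join(labels), error

print(process("B\u00FCcher.de"))                  # ('bücher.de', False)
print(process("xn--bcher-kva.de"))                # ('bücher.de', False)
print(process("fa\u00DF.de", transitional=True))  # ('fass.de', False)
```

Feeding the output back into process() returns it unchanged, which mirrors the idempotence property stated above.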

Implementations may make further modifications to the resulting Unicode string when showing it to the user. For example, it is recommended that disallowed characters be replaced by a U+FFFD to make them visible to the user. Similarly, labels that fail processing during step 4 may be marked by the insertion of a U+FFFD or other visual device.

With either Transitional or Nontransitional Processing, sources already in Punycode are validated without mapping. In particular, Punycode containing Deviation characters, such as href="xn--fu-hia.de" (for fuß.de) is not remapped. This provides a mechanism allowing explicit use of Deviation characters even during a transition period.

4.1 Validity Criteria

Each of the following criteria must be satisfied for a non-empty label:

1. The label must be in Unicode Normalization Form NFC.
2. If CheckHyphens, the label must not contain a U+002D HYPHEN-MINUS character in both the third and fourth positions.
3. If CheckHyphens, the label must neither begin nor end with a U+002D HYPHEN-MINUS character.
4. If not CheckHyphens, the label must not begin with “xn--”.
5. The label must not contain a U+002E ( . ) FULL STOP.
6. The label must not begin with a combining mark, that is: General_Category=Mark.
7. Each code point in the label must only have certain Status values according to Section 5, IDNA Mapping Table:
   1. For Transitional Processing (deprecated), each value must be valid.
   2. For Nontransitional Processing, each value must be either valid or deviation.
   3. In addition, if UseSTD3ASCIIRules=true and the code point is an ASCII code point (U+0000..U+007F), then it must be a lowercase letter (a-z), a digit (0-9), or a hyphen-minus (U+002D). (Note: This excludes uppercase ASCII A-Z which are mapped in UTS #46 and disallowed in IDNA2008.)
8. If CheckJoiners, the label must satisfy the ContextJ rules from Appendix A, in The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) [IDNA2008].
9. If CheckBidi, and if the domain name is a Bidi domain name, then the label must satisfy all six of the numbered conditions in [IDNA2008] RFC 5893, Section 2.

The first 6 criteria are from [IDNA2008], except for the fourth criterion. Criterion #2 in particular is meant to allow for future label extensions beyond just xn--, such as for future versions of IDNA. Some implementations appear to consider such extensions unlikely, and allow labels such as "r3---sn-apo3qvuoxuxbt-j5pe".
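
The first six criteria, counted in the order listed above, need only the label itself and Python's unicodedata module, so they can be sketched compactly. The function name failed_criteria is hypothetical; the criteria that depend on the IDNA Mapping Table, the ContextJ rules, and the Bidi rules are not covered.

```python
import unicodedata

def failed_criteria(label: str, check_hyphens: bool = True) -> list:
    """Return which of criteria 1-6 a non-empty label fails, by position."""
    failed = []
    if unicodedata.normalize("NFC", label) != label:
        failed.append(1)                      # not in NFC
    if check_hyphens:
        if label[2:4] == "--":
            failed.append(2)                  # hyphens in positions 3 and 4
        if label.startswith("-") or label.endswith("-"):
            failed.append(3)                  # leading or trailing hyphen
    elif label.startswith("xn--"):
        failed.append(4)                      # "xn--" prefix without CheckHyphens
    if "." in label:
        failed.append(5)                      # contains FULL STOP
    if unicodedata.category(label[0]).startswith("M"):
        failed.append(6)                      # begins with a combining mark
    return failed

print(failed_criteria("b\u00FCcher"))                    # []
print(failed_criteria("r3---sn-apo3qvuoxuxbt-j5pe"))     # [2]
```

With check_hyphens=False the second label passes, which corresponds to the more permissive behavior mentioned above.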

Any particular applicationmay have tighter validitycriteria, as discussed inSection 3,Conformance.

4.1.1 UseSTD3ASCIIRules

Starting with Unicode 16.0, UseSTD3ASCIIRules=true is handled only in the Validity Criteria. An implementation may choose to allow additional ASCII characters but should always consider ASCII lowercase letters, digits, and the hyphen-minus ([\u002Da-z0-9]) as valid.

Note: ASCII characters may have resulted from a mapping: for example, a U+005F ( _ ) LOW LINE (underbar) may have originally been a U+FF3F ( ＿ ) FULLWIDTH LOW LINE.
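
As an illustration of the UseSTD3ASCIIRules=true check in the Validity Criteria, the sketch below lists the ASCII code points of a label that fall outside [\u002Da-z0-9]. The function name is hypothetical, and the label is assumed to have been mapped already, so an uppercase letter at this point is a violation rather than something to be lowercased.

```python
STD3_ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-")

def std3_ascii_violations(label):
    """Return the ASCII characters of the label that STD3 rules reject."""
    # Non-ASCII code points are not the concern of this particular check.
    return [ch for ch in label if ch.isascii() and ch not in STD3_ALLOWED]
```

For example, a label containing U+005F LOW LINE, whether typed directly or produced by mapping U+FF3F, would be reported here.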

4.1.2 Right-to-Left Scripts

In addition, the label should meet the requirements for right-to-left characters specified in the Right-to-Left Scripts document of [IDNA2008], and for the CONTEXTJ requirements in the Protocol document of [IDNA2008]. It is strongly recommended that Unicode Technical Report #36, Unicode Security Considerations [UTR36] and Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39] be consulted for information on dealing with confusables, and for characters that should be excluded from identifiers. Note that the recommended exclusions are a superset of those in [IDNA2008].

4.2 ToASCII

The operation corresponding to ToASCII of [RFC3490] is defined by the following steps:

Input

A prospective domain_name expressed as a sequence of Unicode code points
A boolean flag: CheckHyphens
A boolean flag: CheckBidi
A boolean flag: CheckJoiners
A boolean flag: UseSTD3ASCIIRules
A boolean flag: Transitional_Processing (deprecated)
A boolean flag: VerifyDnsLength
A boolean flag: IgnoreInvalidPunycode

Processing

To the input domain_name, apply the Processing Steps in Section 4, Processing, using the input boolean flags Transitional_Processing, CheckHyphens, CheckBidi, CheckJoiners, and UseSTD3ASCIIRules. This may record an error.
Break the result into labels at U+002E FULL STOP.
Convert each label with non-ASCII characters into Punycode [RFC3492], and prefix by “xn--”. This may record an error.
If the VerifyDnsLength flag is true, then verify DNS length restrictions. This may record an error. For more information, see [STD13] and [STD3].
1. The length of the domain name, excluding the root label and its dot, is from 1 to 253.
2. The length of each label is from 1 to 63.
  - Note: Technically, a complete domain name ends with an empty label for the DNS root (see [STD13] [RFC1034] section 3). This empty label, and the trailing dot, is almost always omitted.
  - When VerifyDnsLength is false, the empty root label is passed through.
  - When VerifyDnsLength is true, the empty root label is disallowed. This corresponds to the syntax in [RFC1034] section 3.5 Preferred name syntax which also defines the label length restrictions.
If an error was recorded in steps 1-4, then the operation has failed and a failure value is returned. No DNS lookup should be done.
Otherwise join the labels using U+002E FULL STOP as a separator, and return the result.
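
Steps 2 through 6 can be sketched with Python's built-in punycode codec. This is an illustration under assumptions, not a full ToASCII: the function name is hypothetical, the input is assumed to be the already mapped, normalized, and validated output of step 1, and the failure value is represented as None.

```python
def to_ascii_after_processing(processed_domain, verify_dns_length=True):
    """Apply ToASCII steps 2-6 to the output of step 1."""
    errors = []
    converted = []
    # Step 2: break into labels at U+002E FULL STOP.
    for label in processed_domain.split("."):
        if label.isascii():
            converted.append(label)
            continue
        # Step 3: Punycode-encode labels with non-ASCII characters.
        try:
            converted.append("xn--" + label.encode("punycode").decode("ascii"))
        except UnicodeError:
            errors.append("punycode conversion failed")
            converted.append(label)
    result = ".".join(converted)
    # Step 4: optional DNS length restrictions. A trailing empty root
    # label has length 0 and is therefore rejected when the flag is true.
    if verify_dns_length:
        if not 1 <= len(result) <= 253:
            errors.append("domain name length out of range")
        if any(not 1 <= len(label) <= 63 for label in converted):
            errors.append("label length out of range")
    # Steps 5 and 6: fail if any error was recorded, else return the join.
    return (None, errors) if errors else (result, errors)
```

Returning the error list alongside the result is a design choice of this sketch; the specification only requires that a failure be signaled.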

Implementations are advised to apply additional tests to these labels, such as those described in Unicode Technical Report #36, Unicode Security Considerations [UTR36] and Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39], and take appropriate actions. For example, a label with mixed scripts or confusables may be called out in the UI. Note that the use of Punycode to signal problems may be counter-productive, as described in [UTR36].

4.3 ToUnicode

The operation corresponding to ToUnicode of [RFC3490] is defined by the following steps:

Input

A prospective domain_name expressed as a sequence of Unicode code points
A boolean flag: CheckHyphens
A boolean flag: CheckBidi
A boolean flag: CheckJoiners
A boolean flag: UseSTD3ASCIIRules
A boolean flag: Transitional_Processing (deprecated)
A boolean flag: IgnoreInvalidPunycode

Processing

To the input domain_name, apply the Processing Steps in Section 4, Processing, using the input boolean flags Transitional_Processing, CheckHyphens, CheckBidi, CheckJoiners, and UseSTD3ASCIIRules. This may record an error.
Like [RFC3490], this will always produce a converted Unicode string. Unlike ToASCII of [RFC3490], this always signals whether or not there was an error.

Implementations are advised to apply additional tests to these labels, such as those described in Unicode Technical Report #36, Unicode Security Considerations [UTR36] and Unicode Technical Standard #39, Unicode Security Mechanisms [UTS39], and take appropriate actions. For example, a label with mixed scripts or confusables may be called out in the UI. Note that the use of Punycode to signal problems may be counter-productive, as described in [UTR36].

4.4 Preprocessing for IDNA2008

The table specified in Section 5, IDNA Mapping Table may also be used for a pure preprocessing step for IDNA2008, mapping a Unicode string for input directly to the algorithm specified in IDNA2008.

Preprocessing for IDNA2008 is specified as follows:

Apply the Section 4.3, ToUnicode processing to the Unicode string.

Note that this preprocessing allows some characters that are invalid according to IDNA2008. However, the IDNA2008 processing will catch those characters. For example, a Unicode string containing a character listed as DISALLOWED in IDNA2008, such as U+2665 (♥) BLACK HEART SUIT, will pass the preprocessing step without an error, but subsequent application of the IDNA2008 processing will fail with an error, indicating that the string is not a valid IDN according to IDNA2008.

4.5 Implementation Notes

A number of optimizations can be applied to the Unicode IDNA Compatibility Processing. These optimizations can improve performance, reduce table size, make use of existing NFKC transform mechanisms, and so on. For example:

There is an NFC check in Section 4.1, Validity Criteria. However, it only needs to be applied to labels that were converted from Punycode into Unicode in Step 3.
A simple way to do much of the validity checking in Section 4.1, Validity Criteria is to reapply Steps 1 and 2, and verify that the result does not change.
Because the four label separators are all mapped to U+002E ( . ) FULL STOP by Step 1, the parsing of labels in Steps 3 and 4 only needs to detect U+002E ( . ) FULL STOP, and not the other label separators defined in IDNA [RFC3490].

Note that the input domain_name string for the Unicode IDNA Compatibility Processing must have had all escaped Unicode code points converted to Unicode code points. For example, U+5341 ( 十 ) CJK UNIFIED IDEOGRAPH-5341 could have been escaped as any of the following:

&#x5341; an HTML numeric character reference (NCR)
\u5341 a JavaScript escape
%E5%8D%81 a URI/IRI %-escape
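
A caller might undo these three escape forms before invoking the processing. The sketch below uses only the Python standard library; treating the JavaScript escape as a JSON string literal is a stand-in chosen for this illustration, since JSON shares the \uXXXX syntax.

```python
import html
import json
from urllib.parse import unquote

# HTML numeric character reference.
from_ncr = html.unescape("&#x5341;")
# \uXXXX escape, decoded here by wrapping it in a JSON string literal.
from_js = json.loads('"\\u5341"')
# URI/IRI %-escape of the UTF-8 bytes E5 8D 81.
from_uri = unquote("%E5%8D%81")

# All three forms yield the single code point U+5341.
assert from_ncr == from_js == from_uri == "\u5341"
```

Which escapes are in play depends on where the domain name came from, so this decoding belongs to the calling context rather than to the UTS #46 processing itself.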

Examples are shown in Table 2, Examples of Processing:

Table 2. Examples of Processing

Input	Map	Normalize	Convert	Validate	Comment
Bloß.de	bloss.de	=	n/a	ok	Transitional (deprecated): maps uppercase and sharp s
Bloß.de	bloß.de	=	n/a	ok	Nontransitional: maps uppercase
BLOẞ.de	bloß.de	=	n/a	ok	Maps uppercase
xn--blo-7ka.de	=	=	bloß.de	ok	Punycode is not mapped, so ß never changes (whether transitional or not).
u¨.com	=	ü.com	n/a	ok	Normalize changes u + umlaut to ü
xn--tda.com	=	=	ü.com	ok	Punycode xn--tda changes to ü
xn--u-ccb.com	=	=	u¨.com	*error*	Punycode is not mapped, but is validated. Because u + umlaut is not NFC, it fails.
a⒈com	*error*	*error*	*error*	*error*	The character "⒈" is disallowed, because it would produce a dot when mapped.
xn--a-ecp.ru	xn--a-ecp.ru	=	a⒈.ru	*error*	Punycode xn--a-ecp = a⒈, which fails validation.
xn--0.pt	xn--0.pt	=	*error*	*error*	Punycode xn--0 is invalid.
日本語。ＪＰ	日本語.jp	=	n/a	ok	Fullwidth characters are remapped, including 。
☕.us	=	=	n/a	ok	Post-Unicode 3.2 characters are allowed.
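
The Punycode rows of Table 2 can be reproduced with Python's built-in punycode codec and an NFC check. This is a sketch of the Convert column and the NFC part of the Validate column only; the function name is hypothetical and the remaining validity criteria are not applied.

```python
import unicodedata

def decode_xn_label(label):
    """Return (decoded_label, is_nfc); decoded_label is None if Punycode is invalid."""
    if not label.startswith("xn--"):
        return label, unicodedata.is_normalized("NFC", label)
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        # Covers both non-ASCII input and malformed Punycode.
        return None, False
    # Decoded labels are not mapped or normalized, only validated.
    return decoded, unicodedata.is_normalized("NFC", decoded)
```

The label xn--u-ccb decodes to "u" followed by U+0308 COMBINING DIAERESIS, which is not NFC, so the second element of the result is False, matching the error in the table.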

5 IDNA Mapping Table

For each code point in Unicode, the IDNA Mapping Table provides one of the following Status values:

valid: the code point is valid, and not modified.
ignored: the code point is removed: this is equivalent to mapping the code point to an empty string.
mapped: the code point is replaced in the string by the value for the mapping.
deviation: the code point is either mapped or valid, depending on whether the processing is transitional or not.
disallowed: the code point is not allowed.

If this Status value is mapped or deviation, the table also supplies a mapping value for that code point.
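
The effect of the five Status values on a string can be sketched as a single loop. The table below is a toy with four entries, the default of valid for every other code point is an assumption made for brevity, and keeping a disallowed code point in the output while recording an error is likewise an assumption of this sketch. A real implementation loads the full data file.

```python
# Toy table: code point -> (Status, mapping value). Illustrative only.
TOY_TABLE = {
    0x0041: ("mapped", "a"),
    0x00AD: ("ignored", ""),
    0x00DF: ("deviation", "ss"),
    0x0080: ("disallowed", None),
}

def apply_statuses(text, table, transitional=False):
    """Return (mapped_text, error_recorded) for the given table."""
    out = []
    error = False
    for ch in text:
        status, mapping = table.get(ord(ch), ("valid", None))
        if status == "disallowed":
            error = True          # record the error, keep the code point
            out.append(ch)
        elif status == "ignored":
            continue              # same as mapping to the empty string
        elif status == "mapped":
            out.append(mapping)
        elif status == "deviation":
            # Mapped only under (deprecated) transitional processing.
            out.append(mapping if transitional else ch)
        else:
            out.append(ch)        # valid: unchanged
    return "".join(out), error
```

With this toy table, "A", U+00AD, "ß" becomes "aß" under nontransitional processing and "ass" under transitional processing.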

A table is provided for each version of Unicode starting with Unicode 5.1 under [IDNA-Table]. Each table for a version of the Unicode Standard will always be backward compatible with previous versions of the table: only characters with the Status value disallowed may change in Status or Mapping value, with the following exception:

As part of the deprecation of transitional processing, the following exceptional change has been made in Unicode 15.1:
- Before Unicode 15.1, U+1E9E capital sharp s (ẞ) was unconditionally mapped to “ss”, consistent with transitional processing which maps U+00DF small sharp s (ß) also to “ss”.
- Since Unicode 15.1, when using nontransitional processing, capital sharp s is mapped to small sharp s, which is treated as valid under nontransitional processing. This is the new Mapping value in the table.
  When using transitional processing (deprecated), U+1E9E capital sharp s (ẞ) continues to be mapped to “ss”, just like the deviation mapping for U+00DF small sharp s (ß). This is handled during processing.

Unlike the IDNA2008 table, this table is designed to be applied to the entire domain name, not just to individual labels. That design provides for the IDNA2003 handling of label separators. In particular, the table is constructed to forbid problematic characters such as U+2488 ( ⒈ ) DIGIT ONE FULL STOP, whose decompositions contain a "dot".

The Unicode IDNA Compatibility Processing is based on the Unicode character mapping property [NFKC_Casefold]. Section 6, Mapping Table Derivation describes the derivation of these tables. Like derived properties in the Unicode Character Database, the description of the derivation is informative. Only the data in IDNA Mapping Table is normative for the application of this specification.

The files use a semicolon-delimited format similar to those in the Unicode Character Database [UAX44]. The field values are listed in Table 2b, Data File Fields:

Table 2b. Data File Fields

Num	Field	Description
0	Code point(s)	Hex value or range of values.
1	Status	valid, ignored, mapped, deviation, or disallowed
2	Mapping	Hex value(s). Only present if the Status is ignored, mapped, or deviation.
3	IDNA2008 Status	There are two values: NV8 and XV8. NV8 is only present if the Status is valid but the character is excluded by IDNA2008 from all domain names for all versions of Unicode. XV8 is present when the character is excluded by IDNA2008 for the current version of Unicode. These are not normative values.

Example:

0000..002C    ; valid      ;      ; NV8    # 1.1  <control-0000>..COMMA
002D..002E    ; valid                      # 1.1  HYPHEN-MINUS..FULL STOP
002F          ; valid      ;      ; NV8    # 1.1  SOLIDUS
0030..0039    ; valid                      # 1.1  DIGIT ZERO..DIGIT NINE
003A..0040    ; valid      ;      ; NV8    # 1.1  COLON..COMMERCIAL AT
0041          ; mapped     ; 0061          # 1.1  LATIN CAPITAL LETTER A
...
0080..009F    ; disallowed                 # 1.1  <control-0080>..<control-009F>
...
00A1..00A7    ; valid      ;      ; NV8    # 1.1  INVERTED EXCLAMATION MARK..SECTION SIGN
...
00AD          ; ignored                    # 1.1  SOFT HYPHEN
...
00DF          ; deviation  ; 0073 0073     # 1.1  LATIN SMALL LETTER SHARP S
...
19DA          ; valid      ;      ; XV8    # 5.2  NEW TAI LUE THAM DIGIT ONE
...
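
A line in this format can be split into the four fields of Table 2b as follows. This is a minimal sketch: the function name and the tuple layout are hypothetical, and absent or empty fields are returned as None.

```python
def parse_mapping_line(line):
    """Parse one data line into (first, last, status, mapping, idna2008_status)."""
    data = line.split("#", 1)[0].strip()   # drop the trailing comment
    if not data:
        return None                        # blank or comment-only line
    fields = [field.strip() for field in data.split(";")]
    first, _, last = fields[0].partition("..")
    start = int(first, 16)
    end = int(last, 16) if last else start
    status = fields[1]
    mapping = None
    if len(fields) > 2 and fields[2]:
        # Field 2 holds space-separated hex code points.
        mapping = "".join(chr(int(h, 16)) for h in fields[2].split())
    idna2008 = fields[3] if len(fields) > 3 and fields[3] else None
    return start, end, status, mapping, idna2008
```

For example, the line for U+00DF yields the status deviation with the mapping "ss", and the line for U+0000..U+002C yields a range with the informative NV8 value.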

6 Mapping Table Derivation

The following describes the derivation of the mapping table. This description has nothing to do with the actual mapping of labels in Section 4, Processing. Instead, this section describes the derivation of the table in Section 5, IDNA Mapping Table. That table is then normatively used for mapping in Section 4, Processing.

The derivation is described as a series of steps. Step 1 defines a base mapping; Steps 2, 3, and 4 define three sets of characters. Step 5 will modify the base mapping or the sets of characters as needed to maintain backward compatibility. The mapping and sets are all used in Step 6 to produce the mapping and Status values for the table. Step 7 removes characters whose mappings contain characters that are not valid. Each numbered step may have substeps: for example, Step 1 consists of Steps 1.1 through 1.2.

If a Unicode property changes in a future version in a way that would affect backward compatibility, a corresponding clause will be added to Step 5 to maintain compatibility. For more information on compatibility, see Section 5, IDNA Mapping Table.

Step 1: Define a base mapping

This step specifies a base mapping, which is a mapping from each Unicode code point to sequences of zero or more code points. The value resulting from mapping a particular code point C is called the base mapping value of C. The base mapping value for C may be identical to C.

Map the following exceptional characters:
1. Map label separator characters to U+002E ( . ) FULL STOP:
  - U+FF0E ( ． ) FULLWIDTH FULL STOP
  - U+3002 ( 。 ) IDEOGRAPHIC FULL STOP
  - U+FF61 ( ｡ ) HALFWIDTH IDEOGRAPHIC FULL STOP
2. Map all Bidi_Control characters to themselves
3. Map U+1E9E (ẞ) LATIN CAPITAL LETTER SHARP S to U+00DF (ß) LATIN SMALL LETTER SHARP S
Map each other character to its NFKC_Casefold value [NFKC_Casefold].
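
The order of precedence in Step 1 can be sketched as below. The exceptional characters are handled as listed above. Python's standard library has no NFKC_Casefold function, so approx_nfkc_casefold is only a stand-in assumed for this illustration: it does not remove default ignorable code points and may differ from the real property in other cases. A faithful derivation would pass in a function backed by DerivedNormalizationProps.txt.

```python
import unicodedata

# Step 1.1.a: label separators other than U+002E itself.
LABEL_SEPARATORS = {"\uFF0E", "\u3002", "\uFF61"}
# Step 1.1.b: the Bidi_Control code points.
BIDI_CONTROL = {
    "\u061C", "\u200E", "\u200F",
    *map(chr, range(0x202A, 0x202F)),   # U+202A..U+202E
    *map(chr, range(0x2066, 0x206A)),   # U+2066..U+2069
}

def approx_nfkc_casefold(ch):
    """Rough stand-in for NFKC_Casefold (assumption, not the real property)."""
    folded = unicodedata.normalize("NFKC", ch.casefold())
    return unicodedata.normalize("NFKC", folded.casefold())

def base_mapping(ch, nfkc_casefold=approx_nfkc_casefold):
    """Return the base mapping value of a single code point."""
    if ch in LABEL_SEPARATORS:
        return "."                      # Step 1.1.a
    if ch in BIDI_CONTROL:
        return ch                       # Step 1.1.b
    if ch == "\u1E9E":
        return "\u00DF"                 # Step 1.1.c
    return nfkc_casefold(ch)            # Step 1.2
```

Passing the NFKC_Casefold function as a parameter keeps the exceptions separate from the property lookup, so the stand-in can be replaced without touching the rest.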

Unicode 6.3 adds Bidi_Control characters that were not present in Unicode 3.2. To preserve the intent of IDNA2003 in disallowing Bidi_Control characters rather than just ignoring them, Step 1.1.b was added. This step causes Step 6.3 to disallow all Bidi_Control characters.

Step 1.1.b only affects 5 new characters added in Unicode 6.3. It would also impact any new Bidi_Control characters in future versions of the standard.

Step 1.1.c (added in Unicode 15.1) maps the capital sharp s (ẞ) to the small sharp s (ß) rather than to ss because all major implementations have adopted nontransitional processing, which does not map ß to ss as in NFKC_Casefold.

Step 2: Specify the base valid set

The base valid set is defined by the sequential list of additions and subtractions in Table 3, Base Valid Set. This definition is based on the principles of IDNA2003. When applied to the repertoire of Unicode 3.2 characters, this produces a set which is closely aligned with IDNA2003.

Table 3. Base Valid Set

Formal Set Notation	Description
`\P{Changes_When_NFKC_Casefolded}`	Start with characters that are equal to their [NFKC_Casefold] value. This criterion excludes uppercase letters, for example, as well as characters that are unstable under NFKC normalization, and default ignorable code points. Note that according to Perl/Java syntax, \P means the inverse of \p, so these are the characters that do not change when individually mapped according to [NFKC_Casefold].
`+ \u00DF`	Add LATIN SMALL LETTER SHARP S (ß).
`- \p{c} - \p{z}`	Remove Unassigned, Controls, Private Use, Format, Surrogate, and Whitespace.
`- \p{IDS_Unary_Operator} - \p{IDS_Binary_Operator} - \p{IDS_Trinary_Operator}`	Remove ideographic description characters.
`+ \p{ascii} - [\u002E]`	Add all ASCII except for "."

Step 3: Specify the base exclusion set

The base exclusion set consists of the following code points:

U+FFFC OBJECT REPLACEMENT CHARACTER
U+FFFD REPLACEMENT CHARACTER
U+E0001..U+E007F Tag characters (includes some unassigned code points)

Step 4: Specify the deviation set

This is the set of characters that deviate between IDNA2003 and IDNA2008.

U+200C ZERO WIDTH NON-JOINER
U+200D ZERO WIDTH JOINER
U+00DF ( ß ) LATIN SMALL LETTER SHARP S
U+03C2 ( ς ) GREEK SMALL LETTER FINAL SIGMA

Step 5: Specify changes for backward compatibility

This set is currently empty. Adjustments to the above sets or base mapping will be made in this section if the steps would cause an already existing character to change Status or mapping under a future version of Unicode, so that backward compatibility is maintained.

Step 6: Produce the initial Status and Mapping values

For each code point:

If the code point is in the deviation set
- the Status is deviation and the mapping value is the base mapping value for that code point.
Otherwise, if the code point is in the base exclusion set or is unassigned
- the Status is disallowed and there is no mapping value in the table.
Otherwise, if the code point is not a label separator and some code point in its base mapping value is not in the base valid set
- the Status is disallowed and there is no mapping value in the table.
Otherwise, if the base mapping value is an empty string
- the Status is ignored and there is no mapping value in the table.
Otherwise, if the base mapping value is the same as the code point
- the Status is valid and there is no mapping value in the table.
Otherwise,
- the Status is mapped and the mapping value is the base mapping value for that code point.
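
The cascade in Step 6 translates directly into an ordered sequence of tests. In the sketch below the function name and parameters are hypothetical, code points are represented as one-character strings, and the sets and base mapping value are supplied by the caller (in practice they come from Steps 1 through 5).

```python
LABEL_SEPARATORS = {".", "\uFF0E", "\u3002", "\uFF61"}

def initial_status(cp, mapping, valid_set, exclusion_set, deviation_set,
                   unassigned=frozenset()):
    """Return (Status, mapping value or None) for one code point."""
    if cp in deviation_set:                                   # Step 6.1
        return "deviation", mapping
    if cp in exclusion_set or cp in unassigned:               # Step 6.2
        return "disallowed", None
    if cp not in LABEL_SEPARATORS and any(c not in valid_set for c in mapping):
        return "disallowed", None                             # Step 6.3
    if mapping == "":                                         # Step 6.4
        return "ignored", None
    if mapping == cp:                                         # Step 6.5
        return "valid", None
    return "mapped", mapping                                  # Step 6.6
```

The order matters: U+3002 IDEOGRAPHIC FULL STOP maps to ".", which is not in the base valid set, and only the label separator exemption in the third test lets it reach the final mapped case.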

Step 7: Produce the final Status and Mapping values

After processing all code points in previous steps:

Iterate through the set of characters with a Status of mapped. Any whose mapping values are not wholly in the union of the valid set and the deviation set, make disallowed.
Recursively apply these actions until there are no more Status changes.

For example, for Unicode 15.1, the set of characters set to disallowed in Step 7 consists of the following:

U+FE12 ( ︒ ) PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP

Note: Characters such as U+2488 ( ⒈ ) DIGIT ONE FULL STOP are disallowed by Step 6.3.
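
Step 7 is a fixed-point iteration, which the sketch below expresses as a loop that repeats until a pass makes no change. The function name and the dictionary representation are hypothetical; the toy data in the usage note models U+FE12, assuming its base mapping value is U+3002.

```python
def finalize_statuses(statuses, mappings):
    """Apply Step 7 to {cp: Status} and {cp: mapping value}; return new statuses."""
    statuses = dict(statuses)
    changed = True
    while changed:
        changed = False
        # Code points currently counted as valid or deviation.
        acceptable = {cp for cp, st in statuses.items()
                      if st in ("valid", "deviation")}
        for cp, status in list(statuses.items()):
            if status == "mapped" and any(c not in acceptable for c in mappings[cp]):
                statuses[cp] = "disallowed"
                changed = True
    return statuses
```

With statuses for U+FE12 (mapped to U+3002), U+3002 (mapped to "."), and "." (valid), the first pass turns U+FE12 into disallowed because U+3002 is itself mapped rather than valid, while U+3002 stays mapped.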

7 IDNA Comparison

Until Unicode 15.1, this section provided a detailed comparison of the differences between IDNA2003, UTS #46, and IDNA2008. Due to the end of the transition period, starting with Unicode 16.0, the Mapping Table Derivation no longer takes IDNA2003 mappings into account; therefore that information is no longer applicable.

Unicode provides a derived property file matching IDNA2008. Compared with IDNA2008, UTS #46 mostly adds mappings and considers punctuation and symbols valid. For more information see Section 2, Unicode IDNA Compatibility Processing and consult the IDNA Mapping Table.

8 Conformance Testing

A conformance testing file (IdnaTestV2.txt) is provided for each version of Unicode starting with Unicode 6.0 under [IDNA-Table]. It only provides test cases for UseSTD3ASCIIRules=true.

8.1 Format

The test file is UTF-8, with certain characters escaped using the \uXXXX or \x{XXXX} convention for readability. The details are in the header of the test file.

8.2 Testing Conformance

To test for conformance to UTS #46, an implementation will perform the toUnicode, toAsciiN, and toAsciiT operations on the source string, then verify the resulting strings and relevant Status values. The details are in the header of the test file.

Implementations may be more strict than the default settings for UTS #46. In particular, an implementation conformant to IDNA2008 would disallow the input for lines marked with NV8. Implementations need only record that there is an error: they need not reproduce the precise Status codes (after removing any ignored Status values).

8.3 Migration

16.0

The test file for version 16.0 corrects some mistakes in the generation of status values and makes some improvements.

Starting with Unicode 16.0, the test format uses "" to mean the empty string. This is in contrast to a blank field value, which continues to have a different meaning. For example:
```
""; ; [X4_2]; ; [A4_1, A4_2]; ;  #
\u200C; ; [C1]; xn--0ug; ; ""; [A4_1, A4_2] #
```
See the header of the test data file for details.
One or more new source strings are ill-formed, containing an unpaired surrogate, so that status value A3 is covered by test cases.
The status values V4-V6 have been renumbered to V5-V7, in order to match the insertion of validity criterion 4 in Unicode 15.1.
Status value U1 is set instead of V7 for ASCII characters other than lowercase letters (a-z), digits (0-9), or hyphen-minus (U+002D), as had been suggested by the file header comments.
The file header comments about several status values have been corrected or clarified.

11.0

The test format and file name changed in Version 11.0 so that it could express a variety of different combinations of input options that people needed. The new format allows the testing implementation to test for precisely the results of its combination of supported flags, by filtering out Status codes that correspond to an unsupported input flag. The value XV8 was also removed, since it was not very useful in practice.

The following illustrate the differences between the old and new format. The set of examples is not exhaustive, but shows how there is more information available for the same examples.

Sample lines in test data format prior to 11.0:

T;  Faß.de;     faß.de;     fass.de
N;  Faß.de;     faß.de;     xn--fa-hia.de
B;  Bücher.de;  bücher.de;  xn--bcher-kva.de
B;  à\u05D0;    [B5 B6];    [B5 B6]
B;  a。。b;      [A4_2];     [A4_2]

Sample lines in test data format since 11.0:

Faß.de;     faß.de;     [];       xn--fa-hia.de;     ;  fass.de;
Bücher.de;  bücher.de;  [];       xn--bcher-kva.de;  ;  ;
à\u05D0;    àא;         [B5 B6];  xn--0ca24w;        ;  ;
a。。b;      a..b;       [A4_2];   a..b;              ;  ;
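
A line in the newer format can be split into its seven fields as sketched below. The column names are inferred from the samples and from the operations named in Section 8.2, and are assumptions of this sketch. Blank fields have a defined meaning given in the test file header; they are kept as None here rather than resolved, while "" is returned as the empty string.

```python
import re

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\x\{([0-9A-Fa-f]+)\}")
COLUMNS = ("source", "toUnicode", "toUnicodeStatus",
           "toAsciiN", "toAsciiNStatus", "toAsciiT", "toAsciiTStatus")

def _string_field(raw):
    raw = raw.strip()
    if raw == "":
        return None                      # blank: see the test file header
    if raw == '""':
        return ""                        # explicit empty string (since 16.0)
    # Expand \uXXXX and \x{XXXX} escapes into code points.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), raw)

def _status_field(raw):
    raw = raw.strip()
    if raw == "":
        return None
    # Accept both "[B5 B6]" and "[A4_1, A4_2]" spellings.
    return {code for code in re.split(r"[,\s]+", raw.strip("[]")) if code}

def parse_test_line(line):
    """Return a dict of the seven fields, or None for a comment-only line."""
    data = line.split("#", 1)[0]
    if not data.strip():
        return None
    cells = (data.split(";") + [""] * 7)[:7]
    return {name: (_status_field(cell) if name.endswith("Status")
                   else _string_field(cell))
            for name, cell in zip(COLUMNS, cells)}
```

Since implementations need only record that an error occurred, a test harness built on this would typically compare whether each status set is empty rather than compare the exact codes.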

9 IDNA Derived Property

To facilitate comparison between versions of the Unicode Character Database and to highlight the implications for the addition of new characters and changes of character properties, the Unicode Technical Committee has prepared a collection of IDNA Derived Property data files. Since Unicode 17.0, the version-specific Idna2008.txt data file is posted in the versioned [IDNA-Table] directory. Before Unicode 17.0, these data files were posted at [IDNA-Derived].

For each version of the Unicode Standard starting with Unicode 6.1.0, the value of the enumerated IDNA2008_Category property is calculated and listed explicitly in a separate data file. This property matches the "IDNA Derived Property" as defined in RFC 5892 (see [IDNA2008]). The explicit listing is provided as a convenience for implementers. It is the result of performing the exact calculations defined in RFC 5892 concurrent with the release of each version of the Unicode Character Database.

RFC 5892 gives a list of code points for which the derivation is overridden by exceptional values. All known exceptions are applied when a data file is created, but exceptions added in future updates of the IDNA protocol are not applied retroactively.

The format of these IDNA Derived Property data files is modeled closely on that specified in Appendix B.1 of RFC 5892, except that the comment section of each line is not truncated at column 72. For example, excerpted from RFC 5892:

007B..00B6  ; DISALLOWED  # LEFT CURLY BRACKET..PILCROW SIGN
00B7        ; CONTEXTO    # MIDDLE DOT
00B8..00DE  ; DISALLOWED  # CEDILLA..LATIN CAPITAL LETTER THORN
00DF..00F6  ; PVALID      # LATIN SMALL LETTER SHARP S..LATIN SMALL LETT

Compare the same ranges excerpted from the data files:

007B..00B6  ; DISALLOWED  # LEFT CURLY BRACKET..PILCROW SIGN
00B7        ; CONTEXTO    # MIDDLE DOT
00B8..00DE  ; DISALLOWED  # CEDILLA..LATIN CAPITAL LETTER THORN
00DF..00F6  ; PVALID      # LATIN SMALL LETTER SHARP S..LATIN SMALL LETTER O WITH DIAERESIS

This close match in format is designed to simplify scripted comparison between these IDNA Derived Property data files posted at unicode.org and other existing calculated listings based on RFC 5892 that have been posted at IANA or elsewhere.
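
A scripted comparison of two such listings might look like the sketch below. The function names are hypothetical. Comments are discarded before comparing, so the truncation at column 72 in the RFC 5892 listing does not produce spurious differences.

```python
def parse_derived_listing(text):
    """Expand a listing into {code point: derived property value}."""
    values = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()    # ignore the comment part
        if not data:
            continue
        code_range, value = [part.strip() for part in data.split(";")[:2]]
        first, _, last = code_range.partition("..")
        for cp in range(int(first, 16), int(last or first, 16) + 1):
            values[cp] = value
    return values

def diff_listings(a, b):
    """Return {code point: (value in a, value in b)} where the two differ."""
    return {cp: (a.get(cp), b.get(cp))
            for cp in sorted(a.keys() | b.keys())
            if a.get(cp) != b.get(cp)}
```

Applied to the two excerpts above, the difference is empty, since they differ only in the length of the final comment.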

Acknowledgments

Mark Davis and Michel Suignard authored the bulk of the original text of this document, under direction from the Unicode Technical Committee. For their contributions of ideas or text to this specification, the editors thank Julie Allen, Matitiahu Allouche, Peter Constable, Craig Cummings, Martin Dürst, Peter Edberg, Asmus Freytag, Deborah Goldsmith, Laurentiu Iancu, Gervase Markham, Simon Montagu, Lisa Moore, Eric Muller, Simon Sapin, Murray Sargent, Markus Scherer, Jungshik Shin, Henri Sivonen, Shawn Steele, Erik van der Poel, Chris Weber, and Ken Whistler. The specification builds upon [IDNA2008], developed in the IETF Idna-update working group, especially contributions from Matitiahu Allouche, Harald Alvestrand, Vint Cerf, Martin J. Dürst, Lisa Dusseault, Patrik Fältström, Paul Hoffman, Cary Karp, John Klensin, and Peter Resnick, and also upon [IDNA2003], authored by Marc Blanchet, Adam Costello, Patrik Fältström, and Paul Hoffman.

References

[Bortzmeyer]	http://www.bortzmeyer.org/idn-et-phishing.html The most interesting studies cited there (originally from Mike Beltzner of Mozilla) are: Decision Strategies and Susceptibility to Phishing by Downs, Holbrook & Cranor; Why Phishing Works by Dhamija, Tygar & Hearst; Do Security Toolbars Actually Prevent Phishing Attacks by Wu, Miller & Garfinkel; Phishing Tips and Techniques by Gutmann.
[DemoConf]	https://util.unicode.org/UnicodeJsps/confusables.jsp
[DemoIDN]	https://util.unicode.org/UnicodeJsps/idna.jsp
[DemoIDNChars]	https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=\p{age%3D3.2}-\p{cn}-\p{cs}-\p{co}&abb=on&g=uts46+idna+idna2008
[IDNA2003]	The IDNA2003 specification is defined by a cluster of IETF RFCs: IDNA [RFC3490]; Nameprep [RFC3491]; Punycode [RFC3492]; Stringprep [RFC3454].
[IDNA2008]	The IDNA2008 specification is defined by a cluster of IETF RFCs: Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework https://www.rfc-editor.org/info/rfc5890; Internationalized Domain Names in Applications (IDNA) Protocol https://www.rfc-editor.org/info/rfc5891; The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) https://www.rfc-editor.org/info/rfc5892; Right-to-Left Scripts for Internationalized Domain Names for Applications (IDNA) https://www.rfc-editor.org/info/rfc5893. There is also an informative document: Internationalized Domain Names for Applications (IDNA): Background, Explanation, and Rationale https://www.rfc-editor.org/info/rfc5894
[IDNA-Derived]	https://www.unicode.org/Public/idna2008derived
[IDNA-Table]	https://www.unicode.org/Public/17.0.0/idna (Before Unicode 17.0: https://www.unicode.org/Public/idna)
[IDN-FAQ]	https://www.unicode.org/faq/idn.html
[NFKC_Casefold]	The Unicode property specified in [UAX44], and defined by the data in DerivedNormalizationProps.txt (search for "NFKC_Casefold").
[RFC1034]	Mockapetris, P., "Domain names - concepts and facilities", RFC 1034, November 1987. https://www.rfc-editor.org/info/rfc1034
[RFC3454]	Hoffman, P. and M. Blanchet, "Preparation of Internationalized Strings ("stringprep")", RFC 3454, December 2002. https://www.rfc-editor.org/info/rfc3454
[RFC3490]	Faltstrom, P., Hoffman, P. and A. Costello, "Internationalizing Domain Names in Applications (IDNA)", RFC 3490, March 2003. https://www.rfc-editor.org/info/rfc3490
[RFC3491]	Hoffman, P. and M. Blanchet, "Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)", RFC 3491, March 2003. https://www.rfc-editor.org/info/rfc3491
[RFC3492]	Costello, A., "Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)", RFC 3492, March 2003. https://www.rfc-editor.org/info/rfc3492
[RZLGR5]	Integration Panel, "Root Zone Label Generation Rules — LGR-5", 22 May 2022. https://www.icann.org/sites/default/files/lgr/rz-lgr-5-overview-26may22-en.pdf
[SafeBrowsing]	http://code.google.com/apis/safebrowsing/
[Stability]	Unicode Consortium Stability Policies https://www.unicode.org/policies/stability_policy.html
[STD3]	Braden, R., "Requirements for Internet Hosts -- Communication Layers", STD 3, RFC 1122, and "Requirements for Internet Hosts -- Application and Support", STD 3, RFC 1123, October 1989. https://www.rfc-editor.org/info/std3
[STD13]	Mockapetris, P., "Domain names - concepts and facilities", STD 13, RFC 1034 and "Domain names - implementation and specification", STD 13, RFC 1035, November 1987. https://www.rfc-editor.org/info/std13
[UAX44]	UAX #44: Unicode Character Database https://www.unicode.org/reports/tr44/
[Unicode]	The Unicode Standard. For the latest version, see: https://www.unicode.org/versions/latest/
[UTR36]	UTR #36: Unicode Security Considerations https://www.unicode.org/reports/tr36/
[UTS18]	UTS #18: Unicode Regular Expressions https://www.unicode.org/reports/tr18/
[UTS39]	UTS #39: Unicode Security Mechanisms https://www.unicode.org/reports/tr39/

Modifications

The following summarizes modifications from the previous published version of this document.

Revision 35

Reissued for Unicode 17.0.0.
Updated data file references to point to new locations for Version 17.0. ([182-A11])

Modifications for previous versions are listed in those respective versions.

© 2010–2025 Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the Terms of Use. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.

Use of all Unicode Products, including this publication, is governed by the Unicode Terms of Use. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.
