
Internationalized Domain Names in Applications (IDNA)

Support for the Internationalized Domain Names in Applications (IDNA) protocol as specified in RFC 5891. This is the latest version of the protocol and is sometimes referred to as “IDNA 2008”.

This library also provides support for Unicode Technical Standard 46, Unicode IDNA Compatibility Processing.

This acts as a suitable replacement for the “encodings.idna” module that comes with the Python standard library, which only supports the older, superseded IDNA specification (RFC 3490).

Basic usage is straightforward:

    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト

Installation

This package is available for installation from PyPI:

$ python3 -m pip install idna

Usage

For typical usage, the encode and decode functions take a domain name argument and perform a conversion to A-labels or U-labels, respectively.

    >>> import idna
    >>> idna.encode('ドメイン.テスト')
    b'xn--eckwd4c7c.xn--zckzah'
    >>> print(idna.decode('xn--eckwd4c7c.xn--zckzah'))
    ドメイン.テスト
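A common place for these functions is preparing a host name taken from a URL for DNS lookups or HTTP headers, which need the ASCII (A-label) form. The sketch below is illustrative only: `ascii_host` is a hypothetical helper, not part of this library, and it assumes the URL contains a host name rather than an IP literal.

```python
from urllib.parse import urlsplit

import idna


def ascii_host(url: str) -> str:
    """Return the A-label form of the host in *url* (hypothetical helper)."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        raise ValueError(f"no host in {url!r}")
    # idna.encode() returns bytes; the result is pure ASCII, so decode it to str
    return idna.encode(host).decode("ascii")


print(ascii_host("https://ドメイン.テスト/path"))  # xn--eckwd4c7c.xn--zckzah
```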

You can also use the codec encoding and decoding methods by importing the idna.codec module:

    >>> import idna.codec
    >>> print('домен.испытание'.encode('idna2008'))
    b'xn--d1acufc.xn--80akhbyknj4f'
    >>> print(b'xn--d1acufc.xn--80akhbyknj4f'.decode('idna2008'))
    домен.испытание

Conversions can be applied on a per-label basis using the ulabel or alabel functions if necessary:

    >>> idna.alabel('测试')
    b'xn--0zwm56d'
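A short round trip through both per-label functions: alabel returns bytes and ulabel accepts those bytes and returns a str. Splitting only on the ASCII full stop is a simplifying assumption of this sketch; the domain-level encode and decode functions handle splitting themselves.

```python
import idna

domain = "测试.ドメイン"

# Convert each label to its A-label form independently
a_labels = [idna.alabel(label) for label in domain.split(".")]
print(a_labels)  # [b'xn--0zwm56d', b'xn--eckwd4c7c']

# ulabel() reverses the conversion for a single label
u_labels = [idna.ulabel(label) for label in a_labels]
print(u_labels)  # ['测试', 'ドメイン']
```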

Compatibility Mapping (UTS #46)

As described in RFC 5895, the IDNA specification does not normalize input from the different potential ways a user may input a domain name. This functionality, known as a “mapping”, is considered by the specification to be a local user-interface issue distinct from IDNA conversion functionality.

This library provides one such mapping, developed by the Unicode Consortium. Known as Unicode IDNA Compatibility Processing, it provides both a regular mapping for typical applications and a transitional mapping to help migrate from older IDNA 2003 applications. Strings are preprocessed according to Section 4.4, “Preprocessing for IDNA2008”, prior to the IDNA operations.

For example, “Königsgäßchen” is not a permissible label, as LATIN CAPITAL LETTER K is not allowed (nor are capital letters in general). UTS 46 will convert this into lower case prior to applying the IDNA conversion.

    >>> import idna
    >>> idna.encode('Königsgäßchen')
    Traceback (most recent call last):
      ...
    idna.core.InvalidCodepoint: Codepoint U+004B at position 1 of 'Königsgäßchen' not allowed
    >>> idna.encode('Königsgäßchen', uts46=True)
    b'xn--knigsgchen-b4a3dun'
    >>> print(idna.decode('xn--knigsgchen-b4a3dun'))
    königsgäßchen
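The mapping step can also be observed on its own. Current releases export `idna.uts46_remap()` at the package level, and `encode(..., uts46=True)` applies it before the IDNA conversion; since it is not documented in this README, treat the function and its default arguments as an assumption to check against the installed version.

```python
import idna

raw = "Königsgäßchen"

# Apply only the (non-transitional) UTS 46 mapping: uppercase K becomes k
mapped = idna.uts46_remap(raw)
print(mapped)  # königsgäßchen

# Encoding the pre-mapped string matches passing uts46=True directly
assert idna.encode(mapped) == idna.encode(raw, uts46=True)
print(idna.encode(mapped))  # b'xn--knigsgchen-b4a3dun'
```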

Transitional processing provides conversions to help transition from the older 2003 standard to the current standard. For example, in the original IDNA specification, LATIN SMALL LETTER SHARP S (ß) was converted into two LATIN SMALL LETTER S (ss), whereas in the current IDNA specification this conversion is not performed.

    >>> idna.encode('Königsgäßchen', uts46=True, transitional=True)
    b'xn--knigsgsschen-lcb0w'

Implementers should use transitional processing with caution, only in rare cases where conversion from legacy labels to current labels must be performed (i.e. IDNA implementations that pre-date 2008). For typical applications that just need to convert labels, transitional processing is unlikely to be beneficial and could produce unexpected, incompatible results.

`encodings.idna` Compatibility

Function calls from the Python built-in encodings.idna module are mapped to their IDNA 2008 equivalents using the idna.compat module. Simply substitute the import clause in your code to refer to the new module name.
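A minimal sketch of that substitution for the two conversion helpers of encodings.idna, ToASCII and ToUnicode. In current releases these compat functions delegate to the domain-level encode and decode functions; encodings.idna also exposes nameprep, which has no IDNA 2008 counterpart and which idna.compat (as far as current releases go) implements by raising NotImplementedError.

```python
# Before (standard library, IDNA 2003):
# from encodings.idna import ToASCII, ToUnicode

# After (this library, IDNA 2008):
from idna.compat import ToASCII, ToUnicode

ascii_label = ToASCII("ドメイン")
print(ascii_label)                # b'xn--eckwd4c7c'
print(ToUnicode(ascii_label))     # ドメイン
```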

Exceptions

All errors that arise during conversion according to the specification should raise an exception derived from the idna.IDNAError base class.

More specific exceptions may also be raised: idna.IDNABidiError when the error reflects an illegal combination of left-to-right and right-to-left characters in a label; idna.InvalidCodepoint when a specific codepoint is an illegal character in an IDN label (i.e. INVALID); and idna.InvalidCodepointContext when the codepoint is illegal based on its positional context (i.e. it is CONTEXTO or CONTEXTJ but the contextual requirements are not satisfied).
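Because the specific exceptions derive from idna.IDNAError, callers can handle the cases they care about first and catch everything else through the base class. `try_encode` below is a hypothetical helper, a sketch rather than a recommended API; note that the subclass clause must come before the base-class clause or it would never be reached.

```python
from typing import Optional

import idna


def try_encode(domain: str) -> Optional[bytes]:
    """Return the A-label form of *domain*, or None if it is rejected (hypothetical helper)."""
    try:
        return idna.encode(domain)
    except idna.InvalidCodepoint as exc:
        # A character that is never permitted in an IDN label
        print(f"invalid codepoint: {exc}")
    except idna.IDNAError as exc:
        # Any other failure: bidi rules, contextual rules, label length, ...
        print(f"IDNA error: {exc}")
    return None


print(try_encode("ドメイン.テスト"))  # b'xn--eckwd4c7c.xn--zckzah'
print(try_encode("Königsgäßchen"))   # reports the invalid codepoint, then prints None
```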

Building and Diagnostics

The IDNA and UTS 46 functionality relies upon pre-calculated lookup tables for performance. These tables are derived by computing against the eligibility criteria in the respective standards, using the command-line script tools/idna-data.

This tool will fetch relevant codepoint data from the Unicode repository and perform the required calculations to identify eligibility. There are three main modes:

idna-data make-libdata. Generates idnadata.py and uts46data.py, the pre-calculated lookup tables used for IDNA and UTS 46 conversions. Implementers who wish to track this library against a different Unicode version may use this tool to manually generate a different version of the idnadata.py and uts46data.py files.
idna-data make-table. Generates a table of the IDNA disposition (e.g. PVALID, CONTEXTJ, CONTEXTO) in the format found in Appendix B.1 of RFC 5892 and the pre-computed tables published by IANA.
idna-data U+0061. Prints debugging output on the various properties associated with an individual Unicode codepoint (in this case, U+0061) that are used to assess the IDNA and UTS 46 status of a codepoint. This is helpful in debugging or analysis.

The tool accepts a number of arguments, described using idna-data -h. Most notably, the --version argument specifies the version of Unicode to be used in computing the table data. For example, idna-data --version 9.0.0 make-libdata will generate library data against Unicode 9.0.0.
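The generated tables are plain Python data, so the disposition of a single codepoint can be cross-checked from inside Python. This sketch reads `idna.idnadata.codepoint_classes` and `idna.intranges.intranges_contain`, which are internal implementation details in current releases rather than a documented API and may change; `disposition` is a hypothetical helper.

```python
from idna import idnadata
from idna.intranges import intranges_contain


def disposition(char: str) -> str:
    """Classify one character using the bundled IDNA 2008 tables (internal data)."""
    value = ord(char)
    for cls in ("PVALID", "CONTEXTJ", "CONTEXTO"):
        if intranges_contain(value, idnadata.codepoint_classes[cls]):
            return cls
    # Codepoints outside these classes are rejected by idna.encode()
    return "not permitted (DISALLOWED or UNASSIGNED)"


for ch in ("a", "\u200d", "\u00b7", "K"):
    print(f"U+{ord(ch):04X}: {disposition(ch)}")
```

Expected classifications: U+0061 is PVALID, U+200D (ZERO WIDTH JOINER) is CONTEXTJ, U+00B7 (MIDDLE DOT) is CONTEXTO, and U+004B is not permitted, matching the failure shown earlier for “Königsgäßchen”.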

Additional Notes

Packages. The latest tagged release version is published in the Python Package Index.
Version support. This library supports Python 3.8 and higher. As this library serves as a low-level toolkit for a variety of applications, many of which strive for broad compatibility with older Python versions, there is no rush to remove older interpreter support. Support for an older version is likely to be removed from new releases once automated tests can no longer easily be run against it, i.e. once that Python version is officially end-of-life.
Testing. The library has a test suite based on each rule of the IDNA specification, as well as tests that are provided as part of Unicode Technical Standard 46, Unicode IDNA Compatibility Processing.
Emoji. Support for emoji domains in this library is occasionally requested. Encoding of symbols like emoji is expressly prohibited by the technical standard IDNA 2008, and emoji domains have been broadly phased out across the domain industry due to associated security risks. For now, applications that need to support these non-compliant labels may wish to consider trying the encode/decode operation in this library first, and then falling back to using encodings.idna. See the GitHub project for more discussion.
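A minimal sketch of that fallback, assuming the application accepts IDNA 2003 output for labels that IDNA 2008 rejects. `encode_with_fallback` is a hypothetical helper; the fallback path uses the standard library's `idna` codec, which may itself raise UnicodeError for input it cannot handle.

```python
import idna


def encode_with_fallback(domain: str) -> bytes:
    """Try IDNA 2008 first, then the stdlib IDNA 2003 codec (hypothetical helper)."""
    try:
        return idna.encode(domain)
    except idna.IDNAError:
        # Non-compliant labels such as emoji: retry with encodings.idna (IDNA 2003)
        return domain.encode("idna")


print(encode_with_fallback("ドメイン.テスト"))  # b'xn--eckwd4c7c.xn--zckzah'
print(encode_with_fallback("i❤.ws"))            # rejected by IDNA 2008, encoded via IDNA 2003
```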