Important
This PEP is a historical document. The up-to-date, canonical documentation can now be found at tomllib.
See PEP 1 for how to propose changes.
This PEP proposes adding the tomllib module to the standard library for parsing TOML (Tom’s Obvious Minimal Language, https://toml.io).
TOML is the format of choice for Python packaging, as evidenced by PEP 517, PEP 518 and PEP 621. This creates a bootstrapping problem for Python build tools, forcing them to vendor a TOML parsing package or employ other undesirable workarounds, and causes serious issues for repackagers and other downstream consumers. Including TOML support in the standard library would neatly solve all of these issues.
Further, many Python tools are now configurable via TOML, such as black, mypy, pytest, tox, pylint and isort. Many that are not, such as flake8, cite the lack of standard library support as a main reason why. Given the special place TOML already has in the Python ecosystem, it makes sense for it to be an included battery.
Finally, TOML as a format is increasingly popular (for the reasons outlined in PEP 518), with various Python TOML libraries having about 2000 reverse dependencies on PyPI (for comparison, requests has about 28000 reverse dependencies). Hence, this is likely to be a generally useful addition, even looking beyond the needs of Python packaging and related tools.
This PEP proposes basing the standard library support for reading TOML on the third-party library tomli (github.com/hukkin/tomli).
Many projects have recently switched to using tomli, such as pip, build, pytest, mypy, black, flit, coverage, setuptools-scm and cibuildwheel.
tomli is actively maintained and well-tested. It is about 800 lines of code with 100% test coverage, and passes all tests in the proposed official TOML compliance test suite, as well as the more established BurntSushi/toml-test suite.
A new module tomllib will be added to the Python standard library, exposing the following public functions:
    def load(
        fp: SupportsRead[bytes],
        /,
        *,
        parse_float: Callable[[str], Any] = ...,
    ) -> dict[str, Any]: ...

    def loads(
        s: str,
        /,
        *,
        parse_float: Callable[[str], Any] = ...,
    ) -> dict[str, Any]: ...
tomllib.load deserializes a binary file-like object containing a TOML document to a Python dict. The fp argument must have a read() method with the same API as io.RawIOBase.read().
tomllib.loads deserializes a str instance containing a TOML document to a Python dict.
The parse_float argument is a callable object that takes as input the original string representation of a TOML float, and returns a corresponding Python object (similar to parse_float in json.load). For example, the user may pass a function returning a decimal.Decimal, for use cases where exact precision is important. By default, TOML floats are parsed as instances of the Python float type.
The returned object contains only basic Python objects (str, int, bool, float, datetime.{datetime,date,time}, list, dict with string keys), and the results of parse_float.
tomllib.TOMLDecodeError is raised in the case of invalid TOML.
Note that this PEP does not propose tomllib.dump or tomllib.dumps functions; see Including an API for writing TOML for details.
The release of TOML 1.0.0 in January 2021 indicates the TOML format should now be officially considered stable. Empirically, TOML has proven to be a stable format even prior to the release of TOML 1.0.0. From the changelog, we can see that TOML has had no major changes since April 2020, and has had two releases in the past five years (2017-2021).
In the event of changes to the TOML specification, we can treat minor revisions as bug fixes and update the implementation in place. In the event of major breaking changes, we should preserve support for TOML 1.x.
The proposed implementation (tomli) is pure Python, well tested and weighs in at under 1000 lines of code. It is minimalist, offering a smaller API surface area than other TOML implementations.
The author of tomli is willing to help integrate tomli into the standard library and help maintain it, as per this post. Furthermore, Python core developer Petr Viktorin has indicated a willingness to maintain a read API, as per this post.
Rewriting the parser in C is not deemed necessary at this time. It is rare for TOML parsing to be a bottleneck in applications, and users with higher performance needs can use a third-party library (as is already often the case with JSON, despite Python offering a standard library C-extension module).
As discussed in the Motivation section, TOML holds a special place in the Python ecosystem, for reading PEP 518 pyproject.toml packaging and tool configuration files. This chief reason to include TOML in the standard library does not apply to other formats, such as YAML or MessagePack.
In addition, the simplicity of TOML distinguishes it from other formats like YAML, which are highly complicated to construct and parse.
An API for writing TOML may, however, be added in a future PEP.
This proposal has no backwards compatibility issues within the standard library, as it describes a new module. Any existing third-party module named tomllib will break, as import tomllib will import the standard library module. However, tomllib is not registered on PyPI, so it is unlikely that any module with this name is widely used.
Note that we avoid using the more straightforward name toml to avoid backwards compatibility implications for users who have pinned versions of the current toml PyPI package. For more details, see the Alternative names for the module section.
Errors in the implementation could cause potential security issues. However, the parser’s output is limited to simple data types; inability to load arbitrary classes avoids security issues common in more “powerful” formats like pickle and YAML. Also, the implementation will be in pure Python, which reduces security issues endemic to C, such as buffer overflows.
The API of tomllib mimics that of other well-established file format libraries, such as json and pickle. The lack of a dump function will be explained in the documentation, with a link to relevant third-party libraries (e.g. tomlkit, tomli-w, pytomlpp).
The proposed implementation can be found at https://github.com/hukkin/tomli
Several potential alternative implementations exist:
* tomlkit is well established, actively maintained and supports TOML 1.0.0. An important difference is that tomlkit supports style roundtripping. As a result, it has a more complex API and implementation (about 5x as much code as tomli). Its author does not believe that tomlkit is a good choice for the standard library.
* toml is a very widely used library. However, it is not actively maintained, does not support TOML 1.0.0 and has a number of known bugs. Its API is more complex than that of tomli. It allows customising output style through a complicated encoder API, and some very limited and mostly unused functionality to preserve input style through an undocumented decoder API. For more details on its API differences from this PEP, refer to Appendix A.
* pytomlpp is a Python wrapper for the C++ project toml++. Pure Python libraries are easier to maintain than extension modules.
* rtoml is a Python wrapper for the Rust project toml-rs and hence has similar shortcomings to pytomlpp. In addition, it does not support TOML 1.0.0.

tomli meets our needs and the author is willing to help with its inclusion in the standard library.

There are several reasons to not include an API for writing TOML.
The ability to write TOML is not needed for the use cases that motivate this PEP: core Python packaging tools, and projects that need to read TOML configuration files.
Use cases that involve editing an existing TOML file (as opposed to writing a brand new one) are better served by a style preserving library. TOML is intended as a human-readable and -editable configuration format, so it’s important to preserve comments, formatting and other markup. This requires a parser whose output includes style-related metadata, making it impractical to output plain Python types like str and dict. Furthermore, it substantially complicates the design of the API.
Even without considering style preservation, there are too many degrees of freedom in how to design a write API. For example, what default style (indentation, vertical and horizontal spacing, quotes, etc.) should the library use for the output, and how much control should users be given over it? How should the library handle input and output validation? Should it support serialization of custom types, and if so, how? While there are reasonable options for resolving these issues, the nature of the standard library is such that we only get “one chance to get it right”.
Currently, no CPython core developers have expressed willingness to maintain a write API, or sponsor a PEP that includes one. Since it is hard to change or remove something in the standard library, it is safer to err on the side of exclusion for now, and potentially revisit this later.
Therefore, writing TOML is left to third-party libraries. If a good API and relevant use cases for it are found later, write support can be added in a future PEP.
Types accepted as the first argument of tomllib.load
The toml library on PyPI allows passing paths (and lists of path-like objects, ignoring missing files and merging the documents into a single object) to its load function. However, allowing this here would be inconsistent with the behavior of json.load, pickle.load and other standard library functions. If we agree that consistency here is desirable, allowing paths is out of scope for this PEP. This can easily and explicitly be worked around in user code, or by using a third-party library.
The proposed API takes a binary file, while toml.load takes a text file and json.load takes either. Using a binary file allows us to ensure UTF-8 is the encoding used (ensuring correct parsing on platforms with other default encodings, such as Windows), and avoid incorrectly parsing files containing single carriage returns as valid TOML due to universal newlines in text mode.
tomllib.loads
While tomllib.load takes a binary file, tomllib.loads takes a text string. This may seem inconsistent at first.
Quoting the TOML v1.0.0 specification:
    A TOML file must be a valid UTF-8 encoded Unicode document.
tomllib.loads does not intend to load a TOML file, but rather the document that the file stores. The most natural representation of a Unicode document in Python is str, not bytes.
It is possible to add bytes support in the future if needed, but we are not aware of any use cases for it.
Controlling the type of mappings returned by tomllib.load[s]
The toml library on PyPI accepts a _dict argument in its load[s] functions, which works similarly to the object_hook argument in json.load[s]. There are several uses of _dict found on https://grep.app; however, almost all of them are passing _dict=OrderedDict, which should be unnecessary as of Python 3.7. We found two instances of relevant use: in one case, a custom class was passed for friendlier KeyErrors; in the other, the custom class had several additional lookup and mutation methods (e.g. to help resolve dotted keys).
Such a parameter is not necessary for the core use cases outlined in the Motivation section. The absence of this can be pretty easily worked around using a wrapper class, transformer function, or a third-party library. Finally, support could be added later in a backward-compatible way.
parse_float in tomllib.load[s]
This option is not strictly necessary, since TOML floats should be implemented as “IEEE 754 binary64 values”, which is equivalent to a Python float on most architectures.
The TOML specification uses the word “SHOULD”, however, implying a recommendation that can be ignored for valid reasons. Parsing floats differently, such as to decimal.Decimal, allows users extra precision beyond that promised by the TOML format. In the experience of the author of tomli, this is particularly useful in scientific and financial applications. This is also useful for other cases that need greater precision, or where end-users include non-developers who may not be aware of the limits of binary64 floats.
There are also niche architectures where the Python float is not an IEEE 754 binary64 value. The parse_float argument allows users to achieve correct TOML semantics even on such architectures.
Ideally, we would be able to use the toml module name.
However, the toml package on PyPI is widely used, so there are backward compatibility concerns. Since the standard library takes precedence over third party packages, libraries and applications that currently depend on the toml package would likely break when upgrading Python versions due to the many API incompatibilities listed in Appendix A, even if they pin their dependency versions.
To further clarify, applications with pinned dependencies are of greatest concern here. Even if we were able to obtain control of the toml PyPI package name and repurpose it for a backport of the proposed new module, we would still break users on new Python versions that included it in the standard library, regardless of whether they have pinned an older version of the existing toml package. This is unfortunate, since pinning would likely be a common response to breaking changes introduced by repurposing the toml package as a backport (that is incompatible with today’s toml).
Finally, the toml package on PyPI is not actively maintained, but as of yet, efforts to request that the author add other maintainers have been unsuccessful, so action here would likely have to be taken without the author’s consent.
Instead, this PEP proposes the name tomllib. This mirrors plistlib and xdrlib, two other file format modules in the standard library, as well as other modules, such as pathlib, contextlib and graphlib.
Other names considered but rejected include:
* tomlparser. This mirrors configparser, but is perhaps somewhat less appropriate if we include a write API in the future.
* tomli. This assumes we use tomli as the basis for implementation.
* toml under some namespace, such as parser.toml. However, this is awkward, especially so since existing parsing libraries like json, pickle, xml, html etc. would not be included in the namespace.

toml
This appendix covers the differences between the API proposed in this PEP and that of the third-party package toml. These differences are relevant to understanding the amount of breakage we could expect if we used the toml name for the standard library module, as well as to better understand the design space. Note that this list might not be exhaustive.
toml.dump[s]
This PEP currently proposes not including a write API; that is, there will be no equivalent of toml.dump or toml.dumps, as discussed at Including an API for writing TOML.
If we included a write API, it would be relatively straightforward to convert most code that uses toml to the new standard library module (acknowledging that this is very different from a compatible API, as it would still require code changes).
A significant fraction of toml users rely on this, based on comparing occurrences of “toml.load” to occurrences of “toml.dump”.
toml.load
toml.load has the following signature:

    def load(
        f: Union[SupportsRead[str], str, bytes, list[PathLike | str | bytes]],
        _dict: Type[MutableMapping[str, Any]] = ...,
        decoder: TomlDecoder = ...,
    ) -> MutableMapping[str, Any]: ...
This is quite different from the first argument proposed in this PEP: SupportsRead[bytes].
Recapping the reasons for this, previously mentioned at Types accepted as the first argument of tomllib.load:
SupportsRead[bytes] allows us to ensure UTF-8 is the encoding used, and avoid incorrectly parsing single carriage returns as valid TOML.
A significant fraction of toml users rely on this, based on manual inspection of occurrences of “toml.load”.
toml raises TomlDecodeError, vs. the proposed PEP 8-compliant TOMLDecodeError.
A significant fraction of toml users rely on this, based on occurrences of “TomlDecodeError”.
toml.load[s] accepts a _dict argument
Discussed at Controlling the type of mappings returned by tomllib.load[s].
As mentioned there, almost all usage consists of _dict=OrderedDict, which is not necessary in Python 3.7 and later.
toml.load[s] support an undocumented decoder argument
It seems the intended use case is for an implementation of comment preservation. The information recorded is not sufficient to roundtrip the TOML document preserving style, the implementation has known bugs, the feature is undocumented and we could only find one instance of its use on https://grep.app.
The toml.TomlDecoder interface exposed is far from simple, containing nine methods.
Users are likely better served by a more complete implementation of style-preserving parsing and writing.
toml.dump[s] support an encoder argument
Note that we currently propose to not include a write API; however, if that were to change, these differences would likely become relevant.
The encoder argument enables two use cases:
The first is reasonable; however, we could only find two instances of this on https://grep.app. One of these two used this ability to add support for dumping decimal.Decimal, which a potential standard library implementation would support out of the box. If needed for other types, this use case could be well served by the equivalent of the default argument in json.dump.
The second use case is enabled by allowing users to specify subclasses of toml.TomlEncoder and overriding methods to specify parts of the TOML writing process. The API consists of five methods and exposes substantial implementation detail.
There is some usage of the encoder API on https://grep.app; however, it appears to account for a tiny fraction of the overall usage of toml.
toml uses and exposes custom toml.tz.TomlTz timezone objects. The proposed implementation uses datetime.timezone objects from the standard library.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0680.rst
Last modified: 2025-02-01 08:55:40 GMT