Python Enhancement Proposals

PEP 819 – JSON Package Metadata

Author: Emma Harper Smith <emma at python.org>
PEP-Delegate: Paul Moore
Discussions-To: Discourse thread
Status: Draft
Type: Standards Track
Topic: Packaging
Created: 18-Dec-2025
Post-History: 06-Jan-2026

Abstract

This PEP proposes introducing JSON encoded core metadata and wheel file format metadata files in Python packages. Python package metadata (“core metadata”) was first defined in PEP 241 to use RFC 822 email headers to encode information about packages. This was reasonable in 2001; email messages were the only widely used, standardized text format that had a parser in the standard library. However, issues with handling different encodings, differing handling of line breaks, and other differences between implementations have caused numerous packaging bugs. Using the JSON format for encoding metadata files would eliminate a wide range of these potential issues.

Motivation

The email message format has a number of complexities and limitations which reduce its utility as a portable textual interchange format for packaging metadata. Because the email parser requires configuration changes to properly generate valid core metadata, many projects do not use the email module and instead generate core metadata in a custom manner. There are many pitfalls with generating email headers that can be encountered by such custom generators. First, core metadata field values may contain newlines. These newlines must be handled properly to “unfold” multiple lines per RFC 822. One field that is particularly difficult to encode is the Description field, which may contain newlines and indentation. To encode the field in email headers, CRLF line breaks must be followed by seven (7) spaces and a pipe (‘|’) character. While Description may now be encoded in the message body, similar escaping issues occur for the Author and Maintainer fields. Improperly escaped newlines can lead to missing, partial, or invalid core metadata. Second, as discussed in the core metadata specifications:

The standard file format for metadata (including in wheels and installed projects) is based on the format of email headers. However, email formats have been revised several times, and exactly which email RFC applies to packaging metadata is not specified. In the absence of a precise definition, the practical standard is set by what the standard library email.parser module can parse using the email.policy.compat32 policy.

Since no specific email RFC is selected, the current core metadata specification is ambiguous about whether a given core metadata document is valid. RFC 822 is the only email standard to be explicitly listed in a PEP. However, the core metadata specification also requires that core metadata is encoded using UTF-8 when written to a file. This de facto makes the core metadata follow RFC 6532, which specifies internationalization of email headers. This has practical interoperability concerns. Until a few years ago, it was unspecified how to properly encode non-ASCII emails in core metadata, making parsing ambiguous. Third, the current format is difficult to properly validate and parse. Many tools do not check for issues with the output of the email parser. If a document is malformed, it may still parse without error by the email module as a valid email message. Furthermore, due to limitations in the email format, fields like Project-URL must create custom encodings of nested key-value items, further complicating parsing and validation. Finally, the lack of a schema makes it difficult to validate the contents of email message encoded metadata. While introducing a specification for the current format has been discussed previously, no progress had been made, and converting to JSON was a suggested resolution to the issues raised.

The WHEEL file format is currently encoded in a custom key-value format. While this format is easy to parse and write, it requires manual parsing and validation to ensure that the contents are valid. Moving to a JSON encoded format will allow for easier parsing and validation of the contents, and simplify packaging tools and services by using a consistent format for distribution metadata.

Rationale

Introducing a new core metadata file with a well-specified format will greatly ease generating, parsing, and validating metadata. JSON is a natural choice for storing package core metadata. It is easily machine readable and writable, is understandable to humans, and is well supported across many languages. Furthermore, PEP 566 already specifies a canonicalization of email formatted core metadata to JSON. JSON is also a frequently used format for data interchange on the web. For discussion of other formats considered, please refer to the rejected ideas section.

To maintain backwards compatibility, the JSON metadata file MUST be generated alongside the existing email formatted metadata file. This ensures that tools that do not support the new format can still read package metadata for new packages.

The JSON formatted metadata file must be semantically equivalent to the email encoded file. This ensures that the metadata is unambiguous between the two formats, and tools may read either when both are present. To maintain performance, this equivalence is not required to be verified by installers, though other tools may do so. Some tools may choose to make the check dependent on a configuration flag.

Package indexes SHOULD check that the metadata files are semantically equivalent when the package is added to the index. This is a low-cost, one-time check that ensures users of the index are served valid packages.

Specification

JSON Format Core Metadata File

A new optional but recommended file METADATA.json shall be introduced as a metadata file for Python distribution packages. If generated, the METADATA.json file MUST be placed in the same directory as the current email formatted METADATA or PKG-INFO file.

For wheels, this means that METADATA.json MUST be located in the .dist-info directory.

If present, the METADATA.json file MUST be located in the root directory of the project sources in a source distribution package. Tools that prefer the JSON formatted metadata file MUST NOT assume the presence of the METADATA.json file in the source distribution before reading the file.
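
As an illustration of the "MUST NOT assume the presence" rule, a tool that prefers the JSON file might fall back to the email formatted file when METADATA.json is absent. The following is a minimal sketch, not part of the specification; the function name read_core_metadata and its return shape are hypothetical.

```python
import json
from email.parser import HeaderParser
from pathlib import Path


def read_core_metadata(directory: Path) -> tuple[str, object]:
    """Return (format, parsed metadata) for a metadata directory.

    Hypothetical helper: prefers METADATA.json and falls back to the
    email formatted METADATA or PKG-INFO file when it is missing.
    """
    json_path = directory / "METADATA.json"
    if json_path.is_file():
        with json_path.open(encoding="utf-8") as f:
            return "json", json.load(f)

    # Older distributions only carry the email formatted file.
    for name in ("METADATA", "PKG-INFO"):
        email_path = directory / name
        if email_path.is_file():
            with email_path.open(encoding="utf-8") as f:
                return "email", HeaderParser().parse(f)

    raise FileNotFoundError(f"no core metadata found in {directory}")
```

Returning the format alongside the parsed value lets the caller decide whether the email form still needs converting.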

The semantic contents of the METADATA and METADATA.json files MUST be equivalent if METADATA.json is present. Installers MAY verify this information. Public package indexes SHOULD verify the files are semantically equivalent.

The new METADATA.json file MUST be included in the installed project metadata, if present in the distribution metadata.

Conversion of METADATA to JSON Encoding

Conversion from the current email format for core metadata to JSON should follow the process described in PEP 566, with the following modification: the Project-URL entries should be converted into an object with keys containing the labels and values containing the URLs from the original email value. The overall process thus becomes:

  1. The original key-value format should be read with email.parser.HeaderParser;
  2. All transformed keys should be reduced to lower case. Hyphens should be replaced with underscores, but otherwise should retain all other characters;
  3. The transformed value for any field marked with “(Multiple use)” should be a single list containing all the original values for the given key;
  4. The Keywords field should be converted to a list by splitting the original value on commas;
  5. The Project-URL field should be converted into a JSON object with keys containing the labels and values containing the URLs from the original email value;
  6. The message body, if present, should be set to the value of the description key;
  7. The result should be stored as a string-keyed dictionary.

One edge case in the above conversion is that the Project-URL label is “free text, with a maximum length of 32 characters.” This presents a problem when trying to decode the label, since free text may itself contain a comma. Therefore this PEP sets the requirement that the Project-URL label be any text except the comma (,) character. This allows for unambiguous parsing of the Project-URL entries by splitting the text on the left-most comma (,) character.
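
The conversion steps and the left-most comma rule can be sketched in Python as follows. This is an illustrative sketch rather than the reference implementation: the function name metadata_to_json_dict is hypothetical, the MULTIPLE_USE set is an assumed list that should be checked against the core metadata specification, and stripping whitespace around the Project-URL label and URL is an assumption not stated in the text.

```python
from email.parser import HeaderParser

# Assumption: illustrative list of fields marked "(Multiple use)"; consult
# the core metadata specification for the authoritative set.
MULTIPLE_USE = {
    "classifier", "dynamic", "license_file", "obsoletes_dist", "platform",
    "provides_dist", "provides_extra", "requires_dist", "requires_external",
    "supported_platform",
}


def metadata_to_json_dict(text: str) -> dict:
    """Convert email formatted core metadata into a string-keyed dict."""
    msg = HeaderParser().parsestr(text)               # step 1
    result: dict = {}
    for key, value in msg.items():
        name = key.lower().replace("-", "_")          # step 2
        if name == "project_url":                     # step 5
            # Split on the left-most comma only; URLs may contain commas.
            label, _, url = value.partition(",")
            result.setdefault(name, {})[label.strip()] = url.strip()
        elif name in MULTIPLE_USE:                    # step 3
            result.setdefault(name, []).append(value)
        elif name == "keywords":                      # step 4
            result[name] = value.split(",")
        else:
            result[name] = value
    body = msg.get_payload()                          # step 6
    if body:
        result["description"] = body
    return result                                     # step 7
```

Using str.partition rather than str.split keeps any later commas inside the URL, which is what the label restriction is designed to allow.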

JSON Schema for Core Metadata

To enable verification of JSON encoded core metadata, a JSON schema for core metadata has been produced. This schema will be updated with each revision to the core metadata specification. The schema is available in Appendix: JSON Schema for Core Metadata.

Serving METADATA.json in the Simple Repository API

PEP 658 introduced a means of serving package metadata in the Simple Repository API. The JSON encoded version of the package metadata may also be served, via the following modifications to the Simple Repository API:

A new attribute data-dist-info-metadata-json may be added to anchor tags in the Simple API. This attribute SHOULD have a value containing the hash information for the METADATA.json file in the same format as data-dist-info-metadata. If data-dist-info-metadata-json is present, the repository MUST serve the JSON encoded metadata file at the distribution’s path with .metadata.json appended to it. For example, if a distribution is served at /simple/foo-1.0-py3-none-any.whl, the JSON encoded core metadata file MUST be served at /simple/foo-1.0-py3-none-any.whl.metadata.json.
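
A client could discover the JSON metadata locations from an HTML index page roughly as sketched below. The helper names are hypothetical, and dropping any "#hash" fragment from the href before appending .metadata.json is an assumption made for this sketch.

```python
from html.parser import HTMLParser


class _AnchorCollector(HTMLParser):
    """Collect anchors that advertise JSON encoded metadata."""

    def __init__(self):
        super().__init__()
        self.json_metadata = {}  # maps href -> attribute value (hash info)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href and "data-dist-info-metadata-json" in attributes:
            self.json_metadata[href] = attributes["data-dist-info-metadata-json"]


def json_metadata_urls(index_html: str) -> dict[str, str]:
    """Map each METADATA.json URL to its advertised hash information."""
    parser = _AnchorCollector()
    parser.feed(index_html)
    return {
        # Assumption: the URL fragment is not part of the distribution path.
        href.split("#", 1)[0] + ".metadata.json": digest
        for href, digest in parser.json_metadata.items()
    }
```

Anchors without the attribute are skipped, so the client falls back to whatever metadata mechanism it already supports for those files.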

JSON Format Wheel Metadata File

A new optional but recommended file WHEEL.json shall be introduced as a JSON encoded version of the WHEEL file. If generated, the WHEEL.json file MUST be placed in the same directory as the current key-value formatted WHEEL file, i.e. the .dist-info directory. The semantic contents of the WHEEL and WHEEL.json files MUST be equivalent. The wheel file format version will be incremented to 1.1 to reflect the introduction of WHEEL.json.

The WHEEL.json file SHOULD be preferred over the WHEEL file when both are present.

Conversion of WHEEL to JSON Encoding

Conversion from the current key-value format for wheel file format metadata to JSON should proceed as follows:

  1. The original key-value format should be read.
  2. All transformed keys should be reduced to lower case. Hyphens should be replaced with underscores, but otherwise should retain all other characters.
  3. The Tag field’s entries should be converted to a list containing the original values.
  4. The result should be stored as a string-keyed dictionary.

This follows a similar process to the conversion of METADATA to JSON encoding.
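
A minimal sketch of the WHEEL conversion steps follows. The function name wheel_to_json_dict is hypothetical. The sketch assumes each line has the form "Key: Value" and that values other than Tag remain strings; the JSON schema may require different types for some keys.

```python
def wheel_to_json_dict(text: str) -> dict:
    """Convert key-value formatted WHEEL contents to a string-keyed dict."""
    result: dict = {}
    for line in text.splitlines():                    # step 1
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed WHEEL line: {line!r}")
        name = key.strip().lower().replace("-", "_")  # step 2
        value = value.strip()
        if name == "tag":                             # step 3
            result.setdefault(name, []).append(value)
        else:
            # Assumption: other values are kept as plain strings.
            result[name] = value
    return result                                     # step 4
```

The result can be written out with json.dump to produce a WHEEL.json candidate, subject to validation against the schema.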

JSON Schema for Wheel Metadata

To enable verification of JSON encoded wheel file format metadata, a JSON schema for wheel metadata has been produced. This schema will be updated with each revision to the wheel metadata specification. The schema is available in Appendix: JSON Schema for Wheel Metadata.

Deprecation of the METADATA, PKG-INFO, and WHEEL Files

The METADATA, PKG-INFO, and WHEEL files are now deprecated. This means that a future PEP may make the METADATA, PKG-INFO, and WHEEL files optional and require METADATA.json and WHEEL.json to be present. Please see the next section for more information on backwards compatibility caveats to that change.

Despite the METADATA and PKG-INFO files being deprecated, new core metadata revisions should be implemented for both JSON and email to ensure that they may remain semantically equivalent. Similarly, new WHEEL metadata keys should be implemented for both JSON and key-value formats to ensure that they may remain semantically equivalent.

Backwards Compatibility

The specification for METADATA.json and WHEEL.json is designed such that the new format is completely backwards compatible. Existing tools may read metadata from the existing email formatted files, and new tools may take advantage of the new format.

A future major revision of the wheel specification may make the METADATA, PKG-INFO, and WHEEL files optional and make the METADATA.json and WHEEL.json files required.

Note that tools will need to maintain parsing of email metadata and the key-value formatted WHEEL file indefinitely to support parsing metadata for old packages which only have the METADATA, PKG-INFO, or WHEEL files.

Security Implications

One attack vector with JSON encoded core metadata is a JSON payload designed to consume excessive memory or CPU resources in a denial of service (DoS) attack. While this attack is unlikely to affect users, who can cancel resource-intensive interactive operations, it may be an issue for package indexes.

There are several mitigations that can be made to prevent this:

  1. The length of the JSON payload can be restricted to a reasonable size.
  2. The reader may use a JSONDecoder to omit parsing int and float values to avoid quadratic number parsing time complexity attacks.
  3. I plan to contribute a change to JSONDecoder in Python 3.15+ that will allow it to be configured to restrict the nesting of JSON payloads to a reasonable depth. Core metadata currently has a maximum depth of 2 to encode mapping and list fields.

With these mitigations in place, concerns about denial of service attacks withJSON encoded core metadata are minimal.
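
The first two mitigations can be approximated today with the standard library json module, as in the sketch below. The limits MAX_BYTES and MAX_DEPTH and the function names are hypothetical choices for illustration. Note that the depth check here runs after parsing, so it only rejects over-nested documents; it does not bound the cost of parsing them, which is what the planned JSONDecoder change is meant to address.

```python
import json

MAX_BYTES = 1_000_000  # hypothetical size limit for a metadata payload
MAX_DEPTH = 2          # the text gives 2 as the current maximum depth

# Keep numbers as their raw text instead of converting them.
_decoder = json.JSONDecoder(parse_int=str, parse_float=str)


def nesting_depth(value) -> int:
    """Return how deeply dicts and lists are nested (scalars count as 0)."""
    deepest = 0
    stack = [(value, 1)]
    while stack:  # iterative, so deep input cannot exhaust the call stack
        item, level = stack.pop()
        if isinstance(item, dict):
            children = item.values()
        elif isinstance(item, list):
            children = item
        else:
            continue
        deepest = max(deepest, level)
        stack.extend((child, level + 1) for child in children)
    return deepest


def load_untrusted_metadata(payload: bytes) -> dict:
    """Parse a JSON metadata payload with basic resource checks."""
    if len(payload) > MAX_BYTES:
        raise ValueError("metadata payload too large")
    try:
        data = _decoder.decode(payload.decode("utf-8"))
    except RecursionError as exc:
        raise ValueError("metadata nested too deeply") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    if nesting_depth(data) > MAX_DEPTH:
        raise ValueError("metadata nested too deeply")
    return data
```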

Reference Implementation

A reference implementation of the JSON schema for JSON core metadata is available in Appendix: JSON Schema for Core Metadata.

Furthermore, a reference implementation in the packaging library is available.

A reference implementation generating both METADATA.json and WHEEL.json in the uv build backend is also available.

Rejected Ideas

Using Another File Format (TOML, YAML, etc.)

While TOML or another format could be used for the new core metadata file format, JSON has been chosen for a few reasons:

  1. Core metadata is mostly meant as a machine interchange format to be used by tools and services which wish to interoperate. Therefore the human-readability of TOML is not an important consideration in this selection.
  2. JSON parsers are implemented in many languages’ standard libraries and the json module has been part of Python’s standard library for a very long time.
  3. JSON is fast to parse and emit.
  4. JSON schemas are JSON native and commonly used.

Open Issues

Where should the JSON schema be served?

Where should the standard JSON Schema be served? Some options would be packaging.python.org, pypi.org, python.org, or pypa.org.

My first choice would be packaging.python.org, but I am open to other options.

Acknowledgements

Thanks to Konstantin Schütze for writing the reference implementation of this PEP in the uv build backend and for providing valuable feedback on the specification.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0819.rst

Last modified: 2026-01-09 23:30:01 GMT

