This PEP describes changes to the PyPI infrastructure that are needed to ensure that users get valid packages from PyPI. These changes should have minimal impact on other parts of the ecosystem. The PEP focuses on communication between PyPI and users, and so does not require any action by package developers. Developers will upload packages using the current process, and PyPI will automatically generate signed repository metadata for these packages.
In order for the security mechanism to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the signatures and metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not a prerequisite for publishing the metadata from PyPI, and can be done according to the timelines and priorities of individual projects.
This PEP proposes how The Update Framework [2] (TUF) should be integrated with the Python Package Index (PyPI [1]). TUF was designed to be a flexible security add-on to a software updater or package manager. A full implementation of the framework integrates best security practices, such as separating role responsibilities, adopting the many-man rule for signing packages, keeping signing keys offline, and revocation of expired or compromised signing keys. As a result, attackers would need to steal multiple signing keys, which are stored independently, in order to compromise the role responsible for specifying a repository's available files. Or, alternatively, a role responsible for indicating the latest snapshot of the repository may also have to be compromised.
The initial integration proposed in this PEP will allow modern package managers, such as pip [3], to be more secure against attacks on PyPI mirrors and PyPI's own content distribution network, and to better protect users from such attacks. Specifically, this PEP describes how PyPI processes should be adapted to generate and incorporate TUF metadata (i.e., the minimum security model). This minimum security model supports verification of PyPI distributions that are signed with keys stored on PyPI. Distributions that are uploaded by developers are signed by PyPI, requiring no action from developers (other than uploading the distribution), and are immediately available for download. The minimum security model also minimizes PyPI administrative responsibilities by automating much of the signing process.
There is no discussion in this PEP of support for project distributions that are signed by developers (maximum security model). This possible future extension is covered in detail in PEP 480. The maximum security model requires more PyPI administrative work (though no added work for clients), and also proposes an easy-to-use key management solution for developers/publishers, ideas on how to interface with a potential future build farm on PyPI infrastructure, and the feasibility of end-to-end signing.
While it does provide implementation recommendations, this PEP does not prescribe exactly how package managers, such as pip, should be adapted to install or update projects from PyPI with TUF metadata. Package managers interested in adopting TUF on the client side may consult its library documentation, which was created for this purpose.
This PEP does not eliminate any existing features from PyPI. In particular, it does not replace existing support for OpenPGP signatures. Developers can continue to upload detached OpenPGP signatures along with distributions. In the future, PEP 480 may allow developers to directly sign TUF metadata using their OpenPGP keys.
Due to the amount of work required to implement this PEP, in early 2019 it was deferred until appropriate funding could be secured. The Python Software Foundation secured this funding [22] and new PEP coauthors restarted discussion of the PEP.
Attacks on software repositories are common, even in organizations with very good security practices. The resulting repository compromise allows an attacker to edit all files stored on the repository and sign these files using any keys stored on the repository (online keys). In many signing schemes (like TLS), this access allows the attacker to replace files on the repository and make it look like these files are coming from PyPI. Without a way to revoke and replace the trusted private key, it is very challenging to recover from a repository compromise. In addition to the dangers of repository compromise, software repositories are vulnerable to an attacker on the network (MITM) intercepting and changing files. These and other attacks on software repositories are detailed here.
This PEP, together with the follow-up proposal in PEP 480, aims to protect users of PyPI from compromises of the integrity, consistency, and freshness properties of PyPI packages, and enhances compromise resilience by mitigating key risk and providing mechanisms to recover from a compromise of PyPI or its signing keys.
On January 5, 2013, the Python Software Foundation (PSF) announced [4] that a security breach had occurred on the python.org wikis for Python and Jython. As a result, all of the wiki data was destroyed. Fortunately, the PyPI infrastructure was not affected by this breach. However, the incident is a reminder that PyPI needed to take defensive steps to protect users as much as possible in the event of a compromise. Attacks on software repositories happen all the time [5]. The PSF must accept the possibility of security breaches and prepare PyPI accordingly because it is a valuable resource used by thousands, if not millions, of people.
Before the wiki attack, PyPI used MD5 hashes to tell package managers, such as pip, whether or not a distribution file was corrupted in transit. However, the absence of SSL made it hard for package managers to verify transport integrity to PyPI. It was therefore easy to launch a man-in-the-middle attack between pip and PyPI, and arbitrarily change the content of distributions. As a result, users could be tricked into installing malicious distributions. After the wiki attack, several steps were proposed (some of which were implemented) to deliver a much higher level of security than was previously the case. These steps included requiring SSL to communicate with PyPI [6], restricting project names [7], and migrating from MD5 to SHA-2 hashes [8].
Though necessary, these steps are insufficient to protect distributions, because attacks are still possible through other avenues. For example, a public mirror is trusted to honestly mirror PyPI, but some mirrors may misbehave, whether by accident or through malicious intervention. Package managers such as pip are supposed to use signatures from PyPI to verify distribution files downloaded from a public mirror, but none are known to actually do so [10]. Therefore, it would be wise to add more security measures to detect attacks from public mirrors or content delivery networks [11] (CDNs).
Even though official mirrors have been deprecated on PyPI, a wide variety of other attack vectors on package managers remain [13]. These attacks can crash client systems, cause obsolete distributions to be installed, or even allow an attacker to execute arbitrary code. In September 2013, a post was made to the Distutils mailing list showing that the latest version of pip (at the time) was susceptible to such attacks, and how TUF could protect users against them [14]. Specifically, testing was done to see how pip would respond to these attacks with and without TUF. Attacks tested included replay and freeze, arbitrary installation, slow retrieval, and endless data. The post also included a demonstration of how pip would respond if PyPI were compromised.
To provide compromise resilient protection of PyPI, this PEP proposes the use of The Update Framework [2] (TUF). TUF provides protection from a variety of attacks on software update systems, while also providing mechanisms to recover from a repository compromise. TUF has been used in production by a number of organizations, including use in the Cloud Native Computing Foundation's Notary service, which provides the infrastructure for container image signing in Docker Registry. The TUF specification has been the subject of three independent security audits.
The scope of this PEP is protecting users from compromises of PyPI mirrors, and of PyPI's own TLS termination and content distribution infrastructure. Protection from compromises of PyPI itself is discussed in PEP 480.
The threat model assumes the following:
An attacker is considered successful if they can cause a client to install (or leave installed) something other than the most up-to-date version of a software distribution file. If the attacker is preventing the installation of updates, they do not want clients to realize there is anything wrong.
This threat model describes the minimum security model. The maximum security model described in PEP 480 also assumes that attackers can compromise PyPI's online keys.
The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.
This PEP focuses only on integrating TUF into PyPI. However, the reader is encouraged to review TUF design principles [2] and SHOULD be familiar with the TUF specification [16].
The following terms used in this PEP are defined in the Python Packaging Glossary [17]: project, release, distribution.
Additional terms used in this PEP are defined as follows:
At its highest level, TUF provides applications with a secure method for knowing about and obtaining new versions of files. On the surface, this all sounds simple. The basic steps for updating applications are:
The problem is that updating applications is only simple when there are no malicious activities in the picture. If an attacker is trying to interfere with these seemingly simple steps, there is plenty they can do.
Assume a software updater takes the approach of most systems (at least the ones that try to be secure). It downloads both the file it wants and a cryptographic signature of the file. The software updater already knows which key it trusts to make the signature. It checks that the signature is correct and was made by this trusted key. Unfortunately, the software updater is still at risk in many ways, including the following scenarios:
TUF is designed to address these attacks, and others, by adding signed metadata(text files that describe the repository’s files) to the repository andreferencing the metadata files during the update procedure. Repository filesare verified against the information included in the metadata before they arehanded off to the software update system. The framework also providesmulti-signature trust, explicit and implicit revocation of cryptographic keys,responsibility separation of the metadata, and minimized key risk. For a fulllist and outline of the repository attacks and software updater weaknessesaddressed by TUF, see Appendix A.
A software update system must complete two main tasks to integrate with TUF. First, the repository on the server side MUST be modified to provide signed TUF metadata. This PEP is concerned with this first part of the integration: the changes on PyPI required to support software updates with TUF.
Second, the framework must be added to the client side of the update system. For example, TUF MAY be integrated with the pip package manager. Thus, new versions of pip going forward SHOULD use TUF by default to download and verify distributions from PyPI before installing them. However, there may be unforeseen issues that might prevent users from installing or updating distributions, including pip itself, via TUF. Therefore, pip SHOULD provide an option, e.g., --unsafely-disable-package-verification, in order to work around such issues until they are resolved. Note that the proposed option name is purposefully long, because a user must be helped to understand that the action is unsafe and not generally recommended.
We assume that pip would use TUF to verify distributions downloaded only from PyPI. pip MAY support TAP 4 in order to use TUF to also verify distributions downloaded from elsewhere.
In order for package managers like pip to download and verify distributions with TUF, a few extra files MUST be added to PyPI. These extra repository files are called TUF metadata, and they contain such information as which keys can be trusted, the cryptographic hashes of files, signatures, metadata version numbers, and the date after which the metadata should be considered expired.
When a package manager wants to check for updates, it asks TUF to do the work. That is, a package manager never has to deal with this additional metadata or understand what's going on underneath. If TUF reports back that there are updates available, a package manager can then ask TUF to download these files from PyPI. TUF downloads them and checks them against the TUF metadata that it also downloads from the repository. If the downloaded target files are trustworthy, TUF then hands them over to the package manager.
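The final verification step above can be sketched as follows. This is a simplified illustration, not the TUF reference implementation: the metadata layout is abbreviated, the names are hypothetical, and a real client verifies role signatures before trusting any hashes.

```python
# Sketch: checking a downloaded target file against the length and
# hashes recorded in (already signature-verified) targets metadata.
import hashlib

def verify_target(data: bytes, target_info: dict) -> bool:
    """Return True iff `data` matches the length and hashes in `target_info`."""
    if len(data) != target_info["length"]:
        return False
    for algo, expected in target_info["hashes"].items():
        if hashlib.new(algo, data).hexdigest() != expected:
            return False
    return True

# Hypothetical metadata entry for a distribution file:
data = b"example distribution contents"
info = {
    "length": len(data),
    "hashes": {"sha512": hashlib.sha512(data).hexdigest()},
}
assert verify_target(data, info)            # intact file passes
assert not verify_target(data + b"x", info) # tampered file fails
```

Only if this check succeeds is the file handed over to the package manager; otherwise it is discarded and the download is treated as untrustworthy.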
The Document formats section of the TUF specification provides information about each type of required metadata and its expected content. The next section covers the different kinds of metadata RECOMMENDED for PyPI.
In addition, all target files SHOULD be available on disk at least twice: once under their original filename, to provide backwards compatibility, and once with their SHA-512 hash included in their filename. This is required to produce Consistent Snapshots.
Depending on the file system used, different data deduplication mechanisms MAY be employed to avoid the storage increase from keeping full copies of target files.
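The hash-embedded filename scheme can be sketched in a few lines. The filename and content here are hypothetical, and the exact naming convention is defined by the consistent snapshots section of the TUF specification:

```python
# Sketch: derive the consistent-snapshot name for a target file by
# prefixing its SHA-512 hex digest, e.g. "<digest>.project-1.0.tar.gz".
import hashlib

def consistent_snapshot_name(filename: str, content: bytes) -> str:
    digest = hashlib.sha512(content).hexdigest()
    return f"{digest}.{filename}"

name = consistent_snapshot_name("project-1.0.tar.gz", b"archive bytes")
# 128 hex characters, a dot, then the original filename:
assert len(name) == 128 + 1 + len("project-1.0.tar.gz")
assert name.endswith(".project-1.0.tar.gz")
```

Because the digest changes whenever the content changes, clients reading an older snapshot always fetch exactly the bytes that snapshot described, while the plain filename remains available for legacy clients.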
TUF metadata provides information that clients can use to make update decisions. For example, a targets metadata file lists the available target files on PyPI and includes the required signatures, cryptographic hashes, and file sizes for each. Different metadata files provide different information, and are signed by separate roles. The root role indicates what metadata belongs to each role. The concept of roles allows TUF to delegate responsibilities to multiple roles, thus minimizing the impact of any one compromised role.
TUF requires four top-level roles. These are root, timestamp, snapshot, and targets. The root role specifies the public cryptographic keys of the top-level roles (including its own). The timestamp role references the latest snapshot and can signify when a new snapshot of the repository is available. The snapshot role indicates the latest version of all the TUF metadata files (other than timestamp). The targets role lists the file paths of available target files together with their cryptographic hashes. The file paths must be specified relative to a base URL. This allows the actual target files to be served from anywhere, as long as the base URL can be accessed by the client. Each top-level role will serve its responsibilities without exception. Table 1 provides an overview of the roles used in TUF.
| Role | Responsibility |
| root | The root role is the locus of trust for the entire repository. The root role signs the root.json metadata file. This file indicates which keys are authorized for each of the top-level roles, including for the root role itself. The roles "root", "snapshot", "timestamp", and "targets" must be specified and each has a list of public keys. |
| targets | The targets role is responsible for indicating which target files are available from the repository. More precisely, it shares the responsibility of providing information about the content of updates. The targets role signs targets.json metadata, and can delegate trust for repository files to other roles (delegated roles). |
| delegated roles | If the top-level targets role performs delegation, the resulting delegated roles can then provide their own metadata files. The format of the metadata files provided by delegated targets roles is the same as that of targets.json. As with targets.json, the latest version of metadata files belonging to delegated roles is described in the snapshot role's metadata. |
| snapshot | The snapshot role is responsible for ensuring that clients see a consistent repository state. It provides repository state information by indicating the latest versions of the top-level targets and delegated targets metadata files on the repository in snapshot.json. root and timestamp are not listed in snapshot.json, because timestamp signs for its freshness after snapshot.json has been created, and root, which has all top-level keys, is required ahead of time to trust any of the top-level roles. |
| timestamp | The timestamp role is responsible for providing information about the timeliness of available updates. Timeliness information is made available by frequently signing a new timestamp.json file that has a short expiration time. This file indicates the latest version of snapshot.json. |
Table 1: An overview of the TUF roles.
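To make Table 1 concrete, the following is an illustrative, heavily abbreviated shape of root.json: the root role lists the authorized key IDs and signature threshold for each top-level role. Key IDs are placeholders, and fields such as "keys", "expires", and "signatures" are omitted relative to the actual TUF document format.

```python
# Simplified sketch of the "signed" portion of a root.json document.
import json

root_signed = {
    "_type": "root",
    "version": 1,
    "roles": {
        "root":      {"keyids": ["keyid-r1", "keyid-r2", "keyid-r3"], "threshold": 2},
        "targets":   {"keyids": ["keyid-t1", "keyid-t2"], "threshold": 2},
        "snapshot":  {"keyids": ["keyid-s1"], "threshold": 1},
        "timestamp": {"keyids": ["keyid-ts1"], "threshold": 1},
    },
}

# All four top-level roles must be specified, each with keys and a threshold:
assert set(root_signed["roles"]) == {"root", "targets", "snapshot", "timestamp"}
serialized = json.dumps(root_signed, sort_keys=True)  # JSON, as TUF uses
assert json.loads(serialized) == root_signed
```

The real document also carries the public key values themselves, an expiration date, and a "signatures" list over the canonical serialization; see the Document formats section of the TUF specification for the authoritative layout.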
Unless otherwise specified, this PEP RECOMMENDS that every metadata or target file be hashed using the SHA2-512 function of the SHA-2 family. SHA-2 has native and well-tested Python 2 and 3 support (allowing for verification of these hashes without additional, non-Python dependencies). If stronger security guarantees are required, then both SHA2-256 and SHA2-512, or both SHA2-256 and SHA3-256, MAY be used instead. SHA2-256 and SHA3-256 are based on very different designs from each other, providing extra protection against collision attacks. However, SHA-3 requires installing additional, non-Python dependencies for Python 2.
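The hash options discussed above can be exercised with the standard library alone: SHA2-512 (the default recommendation) and the SHA2-256 + SHA3-256 pair suggested for extra collision resistance. Note that Python 3 ships SHA-3 in hashlib; the caveat about extra dependencies applies to Python 2 only.

```python
# Comparing the recommended hash functions using only hashlib.
import hashlib

data = b"some metadata or target file"
sha2_512 = hashlib.sha512(data).hexdigest()
sha2_256 = hashlib.sha256(data).hexdigest()
sha3_256 = hashlib.sha3_256(data).hexdigest()

assert len(sha2_512) == 128          # 512 bits -> 128 hex characters
assert len(sha2_256) == len(sha3_256) == 64
# The two 256-bit functions have very different internal designs,
# so an attack on one is unlikely to transfer to the other:
assert sha2_256 != sha3_256
```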
The top-level root role signs for the keys of the top-level timestamp, snapshot, targets, and root roles. The timestamp role signs for every new snapshot of the repository metadata. The snapshot role signs for root, targets, and all delegated targets roles. The delegated targets role bins further delegates to the bin-n roles, which sign for all distribution files belonging to registered PyPI projects.
Figure 1 provides an overview of the roles available within PyPI, which includes the top-level roles and the roles delegated to by targets. The figure also indicates the types of keys used to sign each role, and which roles are trusted to sign for files available on PyPI. The next two sections cover the details of signing repository files and the types of keys used for each role.

Figure 1: An overview of the role metadata available on PyPI.
The roles that change most frequently are timestamp, snapshot, and the roles delegated to by bins (i.e., bin-n). The timestamp and snapshot metadata MUST be updated whenever root, targets, or delegated metadata are updated. Observe, though, that root and targets metadata are much less likely to be updated as often as delegated metadata. Similarly, the bins role will only be updated when a bin-n role is added, updated, or removed. Therefore, timestamp, snapshot, and bin-n metadata will most likely be updated frequently (possibly every minute), because delegated metadata is updated frequently in order to support continuous delivery of projects. Continuous delivery is a set of processes that PyPI uses to produce snapshots that can safely coexist and be deleted independently of other snapshots [18].
Every year, PyPI administrators SHOULD sign for root and targets role keys. Automation will continuously sign for a timestamped snapshot of all projects. A repository Metadata API is available that can be used to manage a TUF repository.
In standard operation, the bin-n metadata will be updated and signed as new distributions are uploaded to PyPI. However, there will also need to be a one-time online initialization mechanism to create and sign bin-n metadata for all existing distributions that are part of the PyPI repository every time PyPI is re-initialized.
Package managers like pip MUST ship the root metadata file with the installation files that users initially download. This includes information about the keys trusted for all top-level roles (including the root keys themselves). Package managers must also bundle a TUF client library. Any new version of root metadata that the TUF client library may download is verified against the root keys initially bundled with the package manager. If a root key is compromised, but a threshold of keys are still secured, then PyPI administrators MUST push new root metadata that revokes trust in the compromised keys. If a threshold of root keys are compromised, then the root metadata MUST be updated out-of-band. (However, the threshold of root keys should be chosen so that this event is extremely unlikely.) Package managers do not necessarily need to be updated immediately if root keys are revoked or added between new releases of the package manager, as the TUF update process automatically handles cases where a threshold of previous root keys sign for new root keys (assuming no backwards incompatibility in the TUF specification used). So, for example, if a package manager was initially shipped with version 1 of the root metadata, and a threshold of root keys in version 1 signed version 2 of the root metadata, and a threshold of root keys in version 2 signed version 3 of the root metadata, then the package manager should be able to transparently update its copy of the root metadata from version 1 to 3 using its TUF client library.
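The root-metadata update chain described above can be sketched as follows. Keys and signatures are reduced to plain identifiers here, so this models only the threshold bookkeeping, not real cryptographic signature verification:

```python
# Sketch: walking the root metadata chain 1 -> 2 -> 3, accepting each
# new version only if a threshold of the currently trusted keys signed it.

def threshold_met(trusted: dict, signing_keyids: set) -> bool:
    """Did a threshold of the keys trusted in `trusted` root sign?"""
    return len(set(trusted["keyids"]) & signing_keyids) >= trusted["threshold"]

root_v1 = {"version": 1, "keyids": ["k1", "k2", "k3"], "threshold": 2}
root_v2 = {"version": 2, "keyids": ["k4", "k5", "k6"], "threshold": 2,
           "signed_by": {"k1", "k2"}}   # a threshold of v1 keys endorses v2
root_v3 = {"version": 3, "keyids": ["k7", "k8", "k9"], "threshold": 2,
           "signed_by": {"k4", "k5"}}   # a threshold of v2 keys endorses v3

trusted = root_v1                        # shipped with the package manager
for new_root in (root_v2, root_v3):
    assert threshold_met(trusted, new_root["signed_by"])
    trusted = new_root                   # rotate trust forward
assert trusted["version"] == 3
```

A real TUF client additionally requires each new root to be signed by its own keys and checks version numbers and expiration, but the chain-of-thresholds logic is the reason a client shipped with version 1 can safely reach version 3 without an out-of-band update.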
Thus, to repeat, the latest good copy of root metadata and a TUF client library MUST be included in any new version of pip shipped with CPython (via ensurepip). The TUF client library inside the package manager then loads the root metadata and downloads the rest of the roles, including updating the root metadata if it has changed. An outline of the update process is available.
There are two security models to consider when integrating TUF into PyPI. The one proposed in this PEP is the minimum security model, which supports verification of PyPI distributions signed with private cryptographic keys stored on PyPI. Distributions uploaded by developers are signed by PyPI and immediately available for download. A possible future extension to this PEP, discussed in PEP 480, proposes the maximum security model and allows a developer to sign for their project. Developer keys are not stored online: therefore, projects are safe from PyPI compromises.
The minimum security model requires no action from a developer and protects against malicious CDNs [19] and public mirrors. To support continuous delivery of uploaded distributions, PyPI signs for projects with an online key. This level of security prevents projects from being accidentally or deliberately tampered with by a mirror or a CDN, because neither will have any of the keys required to sign for projects. However, it does not protect projects from attackers who have compromised PyPI, since they can then manipulate TUF metadata using the keys stored online.
This PEP proposes that the bin-n roles sign for all PyPI projects with online keys. These bin-n roles MUST all be delegated by the upper-level bins role, which is signed with an offline key, and which in turn MUST be delegated by the top-level targets role, which is also signed with an offline key. This means that when a package manager such as pip (i.e., using TUF) downloads a distribution file from a project on PyPI, it will consult the targets role about the TUF metadata for that distribution file. If ultimately no bin-n role delegated by targets via bins specifies the distribution file, then it is considered to be non-existent on PyPI.
Note that the reason why targets does not directly delegate to bin-n, but instead uses the intermediary bins role, is so that other delegations can easily be added or removed without affecting the bins-to-bin-n mapping. This is crucial for the implementation of PEP 480.
The metadata for the root, targets, and bins roles SHOULD each expire in one year, because these metadata files are expected to change very rarely.
The timestamp, snapshot, and bin-n metadata SHOULD each expire in one day, because a CDN or mirror SHOULD synchronize itself with PyPI every day. Furthermore, this generous time frame also takes into account client clocks that are highly skewed or adrift.
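The expiration policy above can be expressed with the standard library: long-lived metadata (root, targets, bins) expires in one year, while the frequently re-signed metadata (timestamp, snapshot, bin-n) expires in one day. The signing time and exact timestamp format here are illustrative; the authoritative field format is given by the TUF specification.

```python
# Sketch: computing role expiration timestamps from a signing time.
from datetime import datetime, timedelta, timezone

now = datetime(2019, 11, 7, tzinfo=timezone.utc)   # hypothetical signing time
EXPIRY = {
    "root": timedelta(days=365), "targets": timedelta(days=365),
    "bins": timedelta(days=365),
    "timestamp": timedelta(days=1), "snapshot": timedelta(days=1),
    "bin-n": timedelta(days=1),
}
expires = {role: (now + delta).isoformat() for role, delta in EXPIRY.items()}

assert expires["timestamp"] == "2019-11-08T00:00:00+00:00"
assert expires["root"] == "2020-11-06T00:00:00+00:00"  # 365 days across a leap year
```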
As the number of projects and distributions on a repository grows, TUF metadata will need to grow correspondingly. For example, consider the bins role. In August 2013, it was found that the size of the bins metadata was about 42MB if the bins role itself signed for about 220K PyPI targets (which are simple indices and distributions). This PEP does not delve into the details, but TUF features a so-called "hashed bin delegation" scheme that splits a large targets metadata file into many small ones. This allows a TUF client updater to intelligently download only a small number of TUF metadata files in order to update any project signed for by the bins role. For example, applying this scheme to the previous repository resulted in pip downloading between 1.3KB and 111KB to install or upgrade a PyPI project via TUF.
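A sketch of how hashed bin delegation might assign a target to a bin, assuming (consistently with the next section) 16,384 bins and SHA2-512 path hashing. The exact prefix-to-bin mapping and path layout used by PyPI may differ; this only illustrates the idea that the hash of a target's path deterministically selects one small bin-n metadata file:

```python
# Sketch: hashed bin delegation. The first hex digits of the SHA-512
# digest of a target path select which bin-n role signs for it.
import hashlib
import math

NUM_BINS = 16_384
PREFIX_LEN = math.ceil(math.log(NUM_BINS, 16))        # 4 hex digits

def bin_for_target(target_path: str) -> int:
    prefix = hashlib.sha512(target_path.encode()).hexdigest()[:PREFIX_LEN]
    # Spread the 16**PREFIX_LEN prefixes evenly over NUM_BINS bins:
    return int(prefix, 16) * NUM_BINS // 16 ** PREFIX_LEN

b = bin_for_target("example-1.0-py3-none-any.whl")     # hypothetical target
assert 0 <= b < NUM_BINS
```

A client updating one project therefore downloads only the single small bin-n file its targets hash into, rather than one monolithic targets file describing millions of targets.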
Based on our findings as of the time this document was updated for implementation (November 7, 2019), summarized in Tables 2-3, PyPI SHOULD split all targets in the bins role by delegating them to 16,384 bin-n roles (see C10 in Table 2). Each bin-n role would sign for the PyPI targets whose SHA2-512 hashes fall into that bin (see Figure 1 and Consistent Snapshots). It was found that this number of bins would result in a 5-9% metadata overhead (relative to the average size of downloaded distribution files; see V13 and V15 in Table 3) for returning users, and a 69% overhead for new users who are installing pip for the first time (see V17 in Table 3).
A few assumptions used in calculating these metadata overhead percentages:
| Name | Description | Value |
| C1 | # of bytes in a SHA2-512 hexadecimal digest | 128 |
| C2 | # of bytes for a SHA2-512 public key ID | 64 |
| C3 | # of bytes for an Ed25519 signature | 128 |
| C4 | # of bytes for an Ed25519 public key | 64 |
| C5 | # of bytes for a target relative file path | 256 |
| C6 | # of bytes to encode a target file size | 7 |
| C7 | # of bytes to encode a version number | 6 |
| C8 | # of targets (simple indices and distributions) | 2,273,539 |
| C9 | Average # of bytes for a downloaded distribution | 2,184,393 |
| C10 | # of bins | 16,384 |
C8 was computed by querying the number of release files. C9 was derived by taking the average between a rough estimate of the average size of release files downloaded over the past 31 days (1,628,321 bytes) and the average size of release files on disk (2,740,465 bytes). Ee Durbin helped to provide these numbers on November 7, 2019.
Table 2: A list of constants used to calculate metadata overhead.
| Name | Description | Formula | Value |
| V1 | Length of a path hash prefix | math.ceil(math.log(C10, 16)) | 4 |
| V2 | Total # of path hash prefixes | 16**V1 | 65,536 |
| V3 | Avg # of targets per bin | math.ceil(C8/C10) | 139 |
| V4 | Avg size of SHA-512 hashes per bin | V3*C1 | 17,792 |
| V5 | Avg size of target paths per bin | V3*C5 | 35,584 |
| V6 | Avg size of lengths per bin | V3*C6 | 973 |
| V7 | Avg size of bin-n metadata (bytes) | V4+V5+V6 | 54,349 |
| V8 | Total size of public key IDs in bins | C10*C2 | 1,048,576 |
| V9 | Total size of path hash prefixes in bins | V1*V2 | 262,144 |
| V10 | Est. size of bins metadata (bytes) | V8+V9 | 1,310,720 |
| V11 | Est. size of snapshot metadata (bytes) | C10*C7 | 98,304 |
| V12 | Est. size of metadata overhead per distribution per returning user (same snapshot) | 2*V7 | 108,698 |
| V13 | Est. metadata overhead per distribution per returning user (same snapshot) | round((V12/C9)*100) | 5% |
| V14 | Est. size of metadata overhead per distribution per returning user (diff snapshot) | V12+V11 | 207,002 |
| V15 | Est. metadata overhead per distribution per returning user (diff snapshot) | round((V14/C9)*100) | 9% |
| V16 | Est. size of metadata overhead per distribution per new user | V14+V10 | 1,517,722 |
| V17 | Est. metadata overhead per distribution per new user | round((V16/C9)*100) | 69% |
Table 3: Estimated metadata overheads for new and returning users.
The interested reader may find an interactive version of the metadata overhead calculator here:
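In the same spirit, the constants and formulas of Tables 2-3 can be reproduced as a short script; running it confirms the 5%, 9%, and 69% figures quoted above.

```python
# The metadata overhead calculation from Tables 2-3, as a runnable sketch.
# Constants are copied from Table 2; formulas are those listed in Table 3.
import math

C1, C2, C5, C6, C7 = 128, 64, 256, 7, 6          # per-item sizes in bytes
C8 = 2_273_539                                   # number of targets
C9 = 2_184_393                                   # avg distribution size (bytes)
C10 = 16_384                                     # number of bins

V1 = math.ceil(math.log(C10, 16))                # path hash prefix length
V2 = 16 ** V1                                    # total path hash prefixes
V3 = math.ceil(C8 / C10)                         # avg targets per bin
V7 = V3 * C1 + V3 * C5 + V3 * C6                 # avg bin-n metadata size
V10 = C10 * C2 + V1 * V2                         # est. bins metadata size
V11 = C10 * C7                                   # est. snapshot metadata size
V12 = 2 * V7                                     # returning user, same snapshot
V14 = V12 + V11                                  # returning user, new snapshot
V16 = V14 + V10                                  # new user

assert (V7, V10, V11) == (54_349, 1_310_720, 98_304)
assert round(V12 / C9 * 100) == 5                # V13: same-snapshot overhead
assert round(V14 / C9 * 100) == 9                # V15: new-snapshot overhead
assert round(V16 / C9 * 100) == 69               # V17: new-user overhead
```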
This number of bins SHOULD increase when the metadata overhead for returning users exceeds 50%. Presently, this SHOULD happen when the number of targets increases at least 10x, from over 2M to over 22M, at which point the metadata overhead for returning and new users would be around 50-54% and 114% respectively, assuming that the number of bins stays fixed. If the number of bins is increased, then the cost for all users would effectively be the cost for new users, because their cost would be dominated by the (once-in-a-while) cost of downloading the large number of delegations in the bins metadata. If the cost for new users should prove to be too much, primarily due to the overhead of downloading the bins metadata, then this subject SHOULD be revisited before that happens.
Note that changes to the number of bins on the server are transparent to the client. The package manager will be required to download a fresh set of metadata, as though it were a new user, but this operation will not require any explicit code logic or user interaction in order to do so.
It is possible to make TUF metadata more compact by representing it in a binary format, as opposed to the JSON text format. Nevertheless, a sufficiently large number of projects and distributions will introduce scalability challenges at some point, and therefore the bins role will still need delegations (as outlined in Figure 1) in order to address the problem. The JSON format is an open and well-known standard for data interchange, which is already supported by the TUF reference implementation, and is therefore the data format recommended by this PEP. However, due to the large number of delegations, compressed versions of all metadata SHOULD also be made available to clients via the existing Warehouse mechanisms for HTTP compression. In addition, the JSON metadata could be compressed before being sent to clients. The TUF reference implementation does not currently support downloading compressed JSON metadata, but this could be added to reduce the metadata size.
In this section, the kinds of keys required to sign for TUF roles on PyPI are examined. TUF is agnostic with respect to choices of digital signature algorithms. However, this PEP RECOMMENDS that all digital signatures be produced with the Ed25519 algorithm [15]. Ed25519 has native and well-tested Python support (allowing for verification of signatures without additional, non-Python dependencies), uses small keys, and is supported by modern HSM and authentication token hardware.
The root role key is critical for security and should very rarely be used. It is primarily used for key revocation, and it is the locus of trust for all of PyPI. The root role signs for the keys that are authorized for each of the top-level roles (including its own). Keys belonging to the root role are intended to be very well-protected and used with the least frequency of all keys. It is RECOMMENDED that the PSF board determine the current set of trusted root key holders, each of whom will own a (strong) root key. A majority of them can then constitute a quorum to revoke or endow trust in all top-level keys. Alternatively, the system administrators of PyPI could be given responsibility for signing for the root role. Therefore, the root role SHOULD require (t, n) keys, where n is the number of key holders determined by the PSF board, and t > 1 (so that at least two members must sign the root role).
The targets role will be used only to sign for the static delegation of all targets to the bins role. Since these target delegations must be secured against attacks in the event of a compromise, the keys for the targets role MUST be offline and independent of other keys. For simplicity of key management, without sacrificing security, it is RECOMMENDED that the keys of the targets role be permanently discarded as soon as they have been created and used to sign for the role. Therefore, the targets role SHOULD require (2, 2) keys. Again, this is because the keys are going to be permanently discarded, and more offline keys will not help resist key recovery attacks [20] unless the diversity of cryptographic algorithms is maintained.
For similar reasons, the keys for the bins role SHOULD be set up similarly to the keys for the targets role.
In order to support continuous delivery, the keys for the timestamp, snapshot, and all bin-n roles MUST be online. There is little benefit in requiring all of these roles to use different online keys, since attackers would presumably be able to compromise all of them if they compromise PyPI. Therefore, it is reasonable to use one online key for all of them.
The online key shared by the timestamp, snapshot, and all bin-n roles MAY be stored, encrypted or not, on the Python infrastructure. For example, the key MAY be kept on a self-hosted key management service (e.g. Hashicorp Vault), or a third-party one (e.g. AWS KMS, Google Cloud KMS, or Azure Key Vault).
Some of these key management services allow keys to be stored on Hardware Security Modules (HSMs) (e.g., Hashicorp Vault, AWS CloudHSM, Google Cloud HSM, Azure Key Vault). This prevents attackers from exfiltrating the online private key (albeit not from using it, although their actions may now be cryptographically auditable). However, this requires modifying the reference TUF implementation to support HSMs (WIP).
Regardless of where and how this online key is kept, its use SHOULD becarefully logged, monitored, and audited, ideally in such a manner thatattackers who compromise PyPI are unable to immediately turn off this logging,monitoring, and auditing.
As explained in the previous section, the root, targets, and bins role keys MUST be offline for maximum security. These keys will be offline in the sense that their private keys MUST NOT be stored on PyPI, though some of them MAY be online in the private infrastructure of the project.
There SHOULD be an offline key ceremony to generate, back up, and store these keys in such a manner that the private keys can be read only by the Python administrators when necessary (e.g., when rotating the keys for the top-level TUF roles). Thus, keys SHOULD be generated, preferably in a physical location where side-channel attacks are not a concern, using:
In order to avoid the persistence of sensitive data (e.g., private keys) other than on backup media after the ceremony, offline keys SHOULD be generated encrypted using strong passwords, either on (in decreasing order of trust): private HSMs (e.g., YubiHSM), cloud-based HSMs (e.g., those listed above), in volatile memory (e.g., RAM), or in nonvolatile memory (e.g., SSD or microSD). If keys must be generated on nonvolatile memory, then this memory MUST be irrecoverably destroyed after the keys have been securely backed up.
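A minimal sketch of generating an offline role key encrypted at rest with a strong password, again using the third-party `cryptography` package (the password shown is a placeholder, and the serialization choices are illustrative rather than mandated by this PEP):

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Generate an offline role key during the ceremony.
password = b"correct horse battery staple"  # placeholder; use a real passphrase
key = Ed25519PrivateKey.generate()

# Only the encrypted form ever touches backup media.
encrypted_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(password),
)

# Decryption requires the password, e.g. when an admin later uses the key.
restored = serialization.load_pem_private_key(encrypted_pem, password=password)
```

The plaintext key object exists only in volatile memory here; what is written out is always the password-protected PKCS#8 blob.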
Passwords used to encrypt keys SHOULD be stored somewhere durable andtrustworthy to which only Python admins have access.
In order to minimize OPSEC errors during the ceremony, scripts SHOULD be written, for execution on the trusted key-generation computer, to automate tedious steps of the ceremony, such as:
Note that the one-time keys for the targets and bins roles MAY be safely generated, used, and deleted during the offline key ceremony. Furthermore, the root keys MAY not be generated during the offline key ceremony itself. Instead, a threshold t of n Python administrators, as discussed above, MAY independently sign the root metadata after the offline key ceremony used to generate all other keys.
Project developers expect the distributions they upload to PyPI to be immediately available for download. Unfortunately, there will be problems when many readers and writers simultaneously access the same metadata and target files. That is, there needs to be a way to ensure consistency of metadata and target files when multiple developers simultaneously change these files. There are also issues with consistency on PyPI without TUF, but the problem is more severe with signed metadata that MUST keep track of the files available on PyPI in real-time.
Suppose that PyPI generates a snapshot that indicates the latest version of every metadata file, except timestamp, at version 1, and a client requests this snapshot from PyPI. While the client is busy downloading this snapshot, PyPI then timestamps a new snapshot at, say, version 2. Without ensuring consistency of metadata, the client would find itself with a copy of snapshot that disagrees with what is available on PyPI. The result would be indistinguishable from arbitrary metadata injected by an attacker. The problem would also occur with mirrors attempting to sync with PyPI.
To keep TUF metadata on PyPI consistent with the highly volatile target files,consistent snapshots SHOULD be used. Each consistent snapshot captures thestate of all known projects at a given time and MAY safely coexist with anyother snapshot, or be deleted independently, without affecting any othersnapshot.
To maintain consistent snapshots, all TUF metadata MUST, when written to disk,include a version number in their filename:
- VERSION_NUMBER.ROLENAME.json,
- where VERSION_NUMBER is an incrementing integer, and ROLENAME is one of the top-level metadata roles – root, snapshot or targets – or one of the delegated targets roles – bins or bin-n.
The only exception is the timestamp metadata file, whose version would not be known in advance when a client performs an update. The timestamp metadata lists the version of the snapshot metadata, which in turn lists the versions of the targets and delegated targets metadata, all as part of a given consistent snapshot.
In normal usage, version number overflow is unlikely to occur. An 8-byte integer,for instance, can be incremented once per millisecond and last almost 300 millionyears. If an attacker increases the version number arbitrarily, the repositorycan recover by revoking the compromised keys and resetting the version number asdescribed in the TUFspecification.
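The arithmetic behind this claim can be checked directly:

```python
# Largest value of a signed 8-byte (64-bit) version counter.
max_versions = 2**63 - 1

# Incrementing once per millisecond, how long until overflow?
ms_per_year = 1000 * 60 * 60 * 24 * 365
years_until_overflow = max_versions // ms_per_year

print(years_until_overflow)  # roughly 292 million years
```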
The targets or delegated targets metadata refer to the actual target files, including their cryptographic hashes as specified above. Thus, to mark a target file as part of a consistent snapshot it MUST, when written to disk, include its hash in its filename:
- HASH.FILENAME
- where HASH is the hex digest of the hash of the file contents and FILENAME is the original filename.
This means that there MAY be multiple copies of every target file, one for eachof the cryptographic hash functions specified above.
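The two naming schemes above can be sketched as small helpers (illustrative only; the helper names are not part of the TUF reference implementation):

```python
import hashlib

def versioned_metadata_filename(version: int, rolename: str) -> str:
    """VERSION_NUMBER.ROLENAME.json, e.g. '42.snapshot.json'."""
    return f"{version}.{rolename}.json"

def consistent_target_filename(contents: bytes, filename: str,
                               algorithm: str = "sha256") -> str:
    """HASH.FILENAME, where HASH is the hex digest of the file contents."""
    digest = hashlib.new(algorithm, contents).hexdigest()
    return f"{digest}.{filename}"
```

Producing one such filename per supported hash algorithm is what yields the multiple on-disk copies of each target file described above.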
Assuming infinite disk space, strictly incrementing version numbers, and nohash collisions, a client may safely read from one snapshot while PyPIproduces another snapshot.
Clients, such as pip, that use the TUF protocol MUST be modified to download every metadata and target file, except for timestamp metadata, by including, in the file request, the version of the file (for metadata) or the cryptographic hash of the file (for target files) in the filename.
In this simple but effective manner, PyPI is able to capture a consistentsnapshot of all projects and the associated metadata at a given time. The nextsubsection provides implementation details of this idea.
Note: This PEP does not prohibit using advanced file systems or tools toproduce consistent snapshots. There are two important reasons for proposing a simple solution in this PEP.First, the solution does not mandate that PyPIuse any particular file system or tool. Second, the generic file-system basedapproach allows mirrors to use extant file transfer tools, such as rsync, toefficiently transfer consistent snapshots from PyPI.
When a new distribution file is uploaded to PyPI, PyPI MUST update the responsible bin-n metadata. Remember that all target files are sorted into bins by their filename hashes. PyPI MUST also update snapshot to account for the updated bin-n metadata, and timestamp to account for the updated snapshot metadata. These updates SHOULD be handled by an automated snapshot process.
File uploads MAY be handled in parallel; however, consistent snapshots MUST be produced in a strictly sequential manner. Furthermore, as long as distribution files are self-contained, a consistent snapshot MAY be produced for each uploaded file. To do so, upload processes place new distribution files into a concurrency-safe FIFO queue, and the snapshot process reads from that queue one file at a time and performs the following tasks:
First, it adds the new file path to the relevant bin-n metadata, increments its version number, signs it with the bin-n role key, and writes it to VERSION_NUMBER.bin-N.json.
Then, it takes the most recent snapshot metadata, updates its bin-n metadata version numbers, increments its own version number, signs it with the snapshot role key, and writes it to VERSION_NUMBER.snapshot.json.
And finally, the snapshot process takes the most recent timestamp metadata, updates its snapshot metadata hash and version number, increments its own version number, sets a new expiration time, signs it with the timestamp role key, and writes it to timestamp.json.
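The three steps above can be sketched as a single-threaded worker over an in-memory model. This is a simplification under stated assumptions: role metadata is reduced to dictionaries, signing and expiration handling are elided, and the bin count and class names are illustrative.

```python
import hashlib
from collections import deque

NUM_BINS = 16  # illustrative; the real repository uses many more bins

def bin_for(path: str) -> str:
    """Sort a target into a bin by a prefix of its filename hash."""
    digest = hashlib.sha256(path.encode()).hexdigest()
    return f"bin-{int(digest[:8], 16) % NUM_BINS}"

class SnapshotProcess:
    """Sequentially turns queued uploads into consistent snapshots."""

    def __init__(self):
        self.queue = deque()  # stand-in for a concurrency-safe FIFO queue
        self.bins = {}        # bin name -> {"version": int, "targets": set}
        self.snapshot = {"version": 0, "meta": {}}
        self.timestamp = {"version": 0, "snapshot_version": 0}

    def process_one(self):
        path = self.queue.popleft()
        # 1. Add the file to the relevant bin-n metadata and bump its version
        #    (in production: sign with the bin-n key, write N.bin-n.json).
        name = bin_for(path)
        b = self.bins.setdefault(name, {"version": 0, "targets": set()})
        b["targets"].add(path)
        b["version"] += 1
        # 2. Record the new bin-n version in snapshot and bump its version
        #    (sign with the snapshot key, write N.snapshot.json).
        self.snapshot["meta"][name] = b["version"]
        self.snapshot["version"] += 1
        # 3. Point timestamp at the new snapshot and bump its version (set a
        #    new expiration, sign with the timestamp key, write timestamp.json).
        self.timestamp["snapshot_version"] = self.snapshot["version"]
        self.timestamp["version"] += 1

sp = SnapshotProcess()
sp.queue.extend(["foo-1.0.tar.gz", "bar-2.0-py3-none-any.whl"])
sp.process_one()
sp.process_one()
```

Because each call to process_one consumes exactly one queued upload and bumps the snapshot and timestamp versions together, every upload yields its own consistent snapshot, in strict FIFO order.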
When updating bin-n metadata for a consistent snapshot, the snapshot process SHOULD also include any new or updated hashes of simple index pages in the relevant bin-n metadata. Note that simple index pages may be generated dynamically on API calls, so it is important that their output remain stable throughout the validity of a consistent snapshot.
Since the snapshot process MUST generate consistent snapshots in a strictlysequential manner it constitutes a bottleneck. Fortunately, the operation ofsigning is fast enough that this may be done a thousand or more times persecond.
Moreover, PyPI MAY serve distribution files to clients before the correspondingconsistent snapshot metadata is generated. In that case the client softwareSHOULD inform the user that full TUF protection is not yet available but willbe shortly.
PyPI SHOULD use a transaction log to record upload processes and the snapshot queue for auditing and to recover from errors after a server failure.
To avoid running out of disk space due to the constant production of newconsistent snapshots, PyPI SHOULD regularly delete old consistent snapshots,i.e. metadata and target files that were obsoleted some reasonable time inthe past, such as 1 hour.
In order to preserve the latest consistent snapshot, PyPI MAY use a “mark-and-sweep” algorithm. That is, walk from the root of the latest consistent snapshot, i.e. timestamp over snapshot over targets and delegated targets, down to the target files, marking all visited files, and delete all unmarked files. The last few consistent snapshots may be preserved in a similar fashion.
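A minimal sketch of the mark phase, assuming the simplified VERSION_NUMBER.ROLENAME.json naming convention (the function and data layout are hypothetical, for illustration only):

```python
def mark_reachable(timestamp: dict, snapshots: dict) -> set:
    """Return the metadata filenames reachable from the latest timestamp;
    everything on disk but unmarked is eligible for sweeping."""
    marked = {"timestamp.json"}
    snap_version = timestamp["snapshot_version"]
    marked.add(f"{snap_version}.snapshot.json")
    # Walk from snapshot down to each delegated bin-n role it lists.
    for bin_name, bin_version in snapshots[snap_version]["meta"].items():
        marked.add(f"{bin_version}.{bin_name}.json")
    return marked

# Example: one obsolete and one current snapshot; only the current survives.
snapshots = {
    1: {"meta": {"bin-0": 1}},
    2: {"meta": {"bin-0": 2}},
}
timestamp = {"snapshot_version": 2}
on_disk = {"timestamp.json", "1.snapshot.json", "2.snapshot.json",
           "1.bin-0.json", "2.bin-0.json"}
to_delete = on_disk - mark_reachable(timestamp, snapshots)
```

Preserving the last few consistent snapshots amounts to running the mark phase from each of the last few timestamp versions and taking the union of the marked sets.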
Deleting a consistent snapshot will cause clients to see nothing except HTTP404 responses to any request for a file within that consistent snapshot.Clients SHOULD then retry their requests (as before) with the latest consistentsnapshot.
Note that root metadata, even though versioned, is not part of any consistent snapshot. PyPI MUST NOT delete old versions of root metadata. This guarantees that clients can update to the latest root role keys, no matter how outdated their local root metadata is.
From time to time either a project or a distribution will need to be revoked. To revoke trust in a project or a distribution, the associated bin-n role can simply remove the corresponding targets and re-sign the bin-n metadata. This requires only the online bin-n key.
This PEP has covered the minimum security model, the TUF roles that should beadded to support continuous delivery of distributions, and how to generate andsign the metadata for each role. The remaining sections discuss how PyPISHOULD audit repository metadata, and the methods PyPI can use to detect andrecover from a PyPI compromise.
Table 4 summarizes a few of the attacks possible when a threshold number ofprivate cryptographic keys (belonging to any of the PyPI roles) arecompromised. The leftmost column lists the roles (or a combination of roles)that have been compromised, and the columns to its right show whether thecompromised roles leave clients susceptible to malicious updates, a freezeattack, or metadata inconsistency attacks. Note that if the timestamp, snapshot,and bin-n roles are stored in the same online location, a compromise of onemeans they will all be compromised. Therefore, the table considers theseroles together. A version of this table that considers these roles separatelyis included inPEP 480.
| Role Compromise | Malicious Updates | Freeze Attack | Metadata Inconsistency Attacks |
|---|---|---|---|
| targets OR bins | NO (snapshot and timestamp need to cooperate) | NO (snapshot and timestamp need to cooperate) | NO (snapshot and timestamp need to cooperate) |
| timestamp AND snapshot AND bin-n | YES (limited by earliest root, targets, or bins metadata expiry time) | YES (limited by earliest root, targets, or bins metadata expiry time) | YES (limited by earliest root, targets, or bins metadata expiry time) |
| root | YES | YES | YES |
Table 4: Attacks possible by compromising certain combinations of role keys. In September 2013, it was shown how the latest version (at the time) of pip was susceptible to these attacks and how TUF could protect users against them[14].
Note that compromising targets or bins does not immediately allow an attacker to serve malicious updates. The attacker must also compromise the timestamp and snapshot roles, which are both online and therefore more likely to be compromised. This means that, in order to launch any attack, one must not only be able to act as a man-in-the-middle, but also compromise the timestamp key (or compromise the root keys and sign a new timestamp key). To launch any attack other than a freeze attack, one must also compromise the snapshot key. In practice, this PEP recommends storing the snapshot, timestamp, and bin-n keys together, or even using the same key for all of these roles. Because of this, the attacker only needs to compromise this single server to perform any of the attacks listed above. Note that clients are still protected against compromises of non-signing infrastructure such as CDNs or mirrors. Moreover, the offline root key will allow the repository to recover from an attack by revoking the online key(s).
The maximum security model shows how TUF mitigates online key compromises by introducing additional roles for end-to-end signing. Details about how to generate developer keys and sign uploaded distributions are provided in PEP 480.
A key compromise means that a threshold of keys (belonging to the metadataroles on PyPI), as well as the PyPI infrastructure have been compromised andused to sign new metadata on PyPI.
If a threshold number of timestamp, snapshot, targets, bins or bin-n keys have been compromised, then PyPI MUST take the following steps:
Following these steps would preemptively protect all of these roles, even ifonly one of them may have been compromised.
If a threshold number of root keys have been compromised, then PyPI MUST take the above steps and also replace all root keys in the root role.
It is also RECOMMENDED that PyPI sufficiently document compromises with security bulletins. These security bulletins will be most informative when users of pip-with-TUF are unable to install or update a project because the keys for the timestamp, snapshot or root roles are no longer valid. They could then visit the PyPI web site to consult security bulletins that would help to explain why they are no longer able to install or update, and then take action accordingly. When a threshold number of root keys have not been revoked due to a compromise, then new root metadata may be safely updated because a threshold number of existing root keys will be used to sign for the integrity of the new root metadata. TUF clients will be able to verify the integrity of the new root metadata with a threshold number of previously known root keys. This will be the common case. Otherwise, in the worst case, in which a threshold number of root keys have been revoked due to a compromise, an end-user may choose to update new root metadata with out-of-band mechanisms.
If a malicious party compromises PyPI, they can sign arbitrary files with any of the online keys. The roles with offline keys (i.e., root, targets and bins) are still protected. To safely recover from a repository compromise, snapshots should be audited to ensure files are only restored to trusted versions.
When a repository compromise has been detected, the integrity of three types ofinformation must be validated:
In order to safely restore snapshots in the event of a compromise, PyPI SHOULDmaintain a small number of its own mirrors to copy PyPI snapshots according tosome schedule. The mirroring protocol can be used immediately for thispurpose. The mirrors must be secured and isolated such that they areresponsible only for mirroring PyPI. The mirrors can be checked against oneanother to detect accidental or malicious failures.
Another approach is to generate the cryptographic hash of snapshot periodically and tweet it. Perhaps a user comes forward with the actual metadata and the repository maintainers can verify the metadata file’s cryptographic hash. Alternatively, PyPI may periodically archive its own versions of snapshot rather than rely on externally provided metadata. In this case, PyPI SHOULD take the cryptographic hash of every target file on the repository and store this data on an offline device. If any target file hash has changed, this indicates an attack.
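The offline-hash comparison can be sketched as follows (a minimal sketch; the function name and in-memory data layout are illustrative, not part of any PyPI tooling):

```python
import hashlib

def audit_targets(current_files: dict, offline_hashes: dict) -> list:
    """Compare current target file contents against hashes recorded on an
    offline device; return the paths whose contents changed or are missing
    from the offline record."""
    changed = []
    for path, contents in current_files.items():
        digest = hashlib.sha256(contents).hexdigest()
        if offline_hashes.get(path) != digest:
            changed.append(path)
    return sorted(changed)

# Example: one file was silently replaced after the offline record was made.
offline = {"pkg-1.0.tar.gz": hashlib.sha256(b"original").hexdigest()}
current = {"pkg-1.0.tar.gz": b"tampered"}
suspicious = audit_targets(current, offline)
```

A non-empty result indicates an attack, and the flagged targets are the ones that must not be restored from the compromised snapshot.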
As for attacks that serve different versions of metadata, or freeze a versionof a distribution at a specific version, they can be handled by TUF with techniqueslike implicit key revocation and metadata mismatch detection[2].
If breaking changes are made to the update process, PyPI should implement thesechanges without disrupting existing clients. For general guidance on how to doso, see the ongoing discussion in the TAPrepository.
Note that the changes to PyPI from this PEP will be backwards compatible. Thelocation of target files and simple indices are not changed in this PEP, so anyexisting PyPI clients will still be able to perform updates using these files.This PEP adds the ability for clients to use TUF metadata to improve thesecurity of the update process.
If the algorithm used to hash target and metadata files becomes vulnerable, itSHOULD be replaced by a stronger hash algorithm.
The TUF metadata format allows digests from different hash algorithms to be listed alongside each other, together with an algorithm identifier, so that clients can seamlessly switch between algorithms.
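This side-by-side listing can be sketched as follows. The algorithm pair, function names, and the client's selection rule are illustrative assumptions, not requirements of the TUF specification:

```python
import hashlib

ALGORITHMS = ("sha256", "blake2b")  # illustrative old/new algorithm pair

def target_hashes(contents: bytes) -> dict:
    """List digests from several algorithms alongside each other, keyed by
    an algorithm identifier."""
    return {alg: hashlib.new(alg, contents).hexdigest() for alg in ALGORITHMS}

def client_verify(contents: bytes, hashes: dict,
                  supported=("blake2b", "sha256")) -> bool:
    """A client checks the first recorded algorithm it supports."""
    for alg in supported:
        if alg in hashes:
            return hashlib.new(alg, contents).hexdigest() == hashes[alg]
    return False  # no overlap: this client cannot verify the target

recorded = target_hashes(b"distribution bytes")
```

An old client that only supports the old algorithm still verifies successfully as long as both digests are listed, which is what makes a gradual transition possible.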
However, once support for an old algorithm is turned off, clients that don’tsupport the new algorithm will only be able to install or update packages,including the client itself, by disabling TUF verification. To allow clients totransition without temporarily losing TUF security guarantees, we recommendthe following procedure.
This material is based upon work supported by the National Science Foundationunder Grants No. CNS-1345049 and CNS-0959138. Any opinions, findings, andconclusions or recommendations expressed in this material are those of theauthor(s) and do not necessarily reflect the views of the National ScienceFoundation.
We thank Alyssa Coghlan, Daniel Holth, Donald Stufft, and the distutils-sigcommunity in general for helping us to think about how to usably andefficiently integrate TUF with PyPI.
Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin Samuelhelped us to design TUF from its predecessor, Thandy of the Tor project.
We appreciate the efforts of Konstantin Andrianov, Geremy Condra, Zane Fisher,Justin Samuel, Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng indeveloping TUF.
Vladimir Diaz, Monzur Muhammad, Sai Teja Peddinti, Sumana Harihareswara,Ee Durbin and Dustin Ingram helped us to review this PEP.
Zane Fisher helped us to review and transcribe this PEP.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0458.rst
Last modified: 2025-02-01 08:59:27 GMT