Python Enhancement Proposals

PEP 262 – A Database of Installed Python Packages

Author:
A.M. Kuchling <amk at amk.ca>
Status:
Rejected
Type:
Standards Track
Topic:
Packaging
Created:
08-Jul-2001
Post-History:
27-Mar-2002

Note

This PEP was superseded by PEP 345 and PEP 376, which were accepted. Therefore, this PEP is (by implication) rejected.

Introduction

This PEP describes a format for a database of the Python software installed on a system.

(In this document, the term “distribution” is used to mean a set of code that’s developed and distributed together. A “distribution” is the same as a Red Hat or Debian package, but the term “package” already has a meaning in Python terminology, meaning “a directory with an __init__.py file in it.”)

Requirements

We need a way to figure out what distributions, and what versions of those distributions, are installed on a system. We want to provide features similar to CPAN, APT, or RPM. Required use cases that should be supported are:

  • Is distribution X on a system?
  • What version of distribution X is installed?
  • Where can the new version of distribution X be found? (This can be defined as either “a home page where the user can go and find a download link”, or “a place where a program can find the newest version”. Both should probably be supported.)
  • What files did distribution X put on my system?
  • What distribution did the file x/y/z.py come from?
  • Has anyone modified x/y/z.py locally?
  • What other distributions does this software need?
  • What Python modules does this distribution provide?

Database Location

The database lives in a bunch of files under <prefix>/lib/python<version>/install-db/. This location will be called INSTALLDB through the remainder of this PEP.

The structure of the database is deliberately kept simple; each file in this directory or its subdirectories (if any) describes a single distribution. Binary packagings of Python software such as RPMs can then update Python’s database by just installing the corresponding file into the INSTALLDB directory.

The rationale for scanning subdirectories is that we can move to a directory-based indexing scheme if the database directory contains too many entries. For example, this would let us transparently switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some similar hashing scheme.
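The layout change described above can be sketched as follows. This is only an illustration: the helper names `hashed_entry_path` and `iter_entry_files` and the `depth` parameter are hypothetical, since the PEP only says "or some similar hashing scheme" and does not fix one.

```python
import os


def hashed_entry_path(installdb, name, depth=2):
    """Return a path such as INSTALLDB/N/Nu/Numeric for a distribution name.

    Hypothetical scheme: one directory level per leading-prefix length,
    up to `depth` levels. The PEP does not specify the exact hashing.
    """
    prefixes = [name[:i] for i in range(1, depth + 1) if len(name) >= i]
    return os.path.join(installdb, *prefixes, name)


def iter_entry_files(installdb):
    """Yield every database file under installdb, at any directory depth.

    Because the scan is recursive, flat entries (INSTALLDB/Numeric) and
    hashed entries (INSTALLDB/N/Nu/Numeric) are both found.
    """
    for dirpath, _dirnames, filenames in os.walk(installdb):
        for filename in sorted(filenames):
            yield os.path.join(dirpath, filename)
```

A reader that always walks the whole tree does not need to know which layout the writer chose, which is what makes the switch transparent.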

Database Contents

Each file in INSTALLDB or its subdirectories describes a single distribution, and has the following contents:

An initial line listing the sections in this file, separated by whitespace. Currently this will always be ‘PKG-INFO FILES REQUIRES PROVIDES’. This is for future-proofing; if we add a new section, for example to list documentation files, then we’d add a DOCS section and list it in the contents. Sections are always separated by blank lines.
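As a rough sketch of how a reader might split such a file, assuming every listed section is non-empty and that blank lines occur only between sections (the PEP does not say how an empty section would be written), something like the following could work. The function name `split_sections` and the sample entry are hypothetical.

```python
def split_sections(text):
    """Split a database file's text into {section_name: [lines]}.

    Assumptions (not stated in the PEP): the first line names the
    sections in order, every section has at least one line, and blank
    lines appear only as section separators.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("empty database file")
    names = lines[0].split()

    sections = []
    current = []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the section collected so far.
            sections.append(current)
            current = []
    if current:
        sections.append(current)

    if len(sections) != len(names):
        raise ValueError(
            "expected %d sections, found %d" % (len(names), len(sections))
        )
    return dict(zip(names, sections))


# Hypothetical entry for a distribution called "example".
SAMPLE = (
    "PKG-INFO FILES REQUIRES PROVIDES\n"
    "Name: example\n"
    "Version: 1.0\n"
    "\n"
    "/usr/lib/python/example.py\t0\t0644\troot\troot\t"
    "da39a3ee5e6b4b0d3255bfef95601890afd80709\n"
    "\n"
    "python-stdlib\n"
    "\n"
    "example\n"
)
```

Because the section names come from the first line, a file that also lists a future DOCS section would parse without any change to this function.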

A distribution that uses the Distutils for installation should automatically update the database. Distributions that roll their own installation will have to use the database’s API to manually add or update their own entry. System package managers such as RPM or pkgadd can just create the new file in the INSTALLDB directory.

Each section of the file is used for a different purpose.

PKG-INFO section

An initial set of RFC 822 headers containing the distribution information for a file, as described in PEP 241, “Metadata for Python Software Packages”.
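Since the section is plain RFC 822 headers, the standard library's email parser can read it. The sketch below assumes the section text has already been isolated from the rest of the file; the function name `parse_pkg_info` and the sample values are hypothetical, while the field names are the ones PEP 241 defines.

```python
from email.parser import Parser


def parse_pkg_info(text):
    """Parse the PKG-INFO section into a header mapping.

    headersonly=True makes the parser treat the text as headers only,
    without looking for a message body.
    """
    return Parser().parsestr(text, headersonly=True)


# Hypothetical PKG-INFO section using PEP 241 field names.
PKG_INFO = (
    "Metadata-Version: 1.0\n"
    "Name: example\n"
    "Version: 1.0\n"
    "Summary: An example distribution\n"
    "Home-page: http://example.com/\n"
)

headers = parse_pkg_info(PKG_INFO)
print(headers["Name"], headers["Version"])
```

Header lookup is case-insensitive, and `get_all()` is available for fields such as Platform that may occur more than once.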

FILES section

An entry for each file installed by the distribution. Generated files such as .pyc and .pyo files are on this list as well as the original .py files installed by a distribution; their checksums won’t be stored or checked, though.

Each file’s entry is a single tab-delimited line that contains the following fields:

  • The file’s full path, as installed on the system.
  • The file’s size.
  • The file’s permissions. On Windows, this field will always be ‘unknown’.
  • The owner and group of the file, separated by a tab. On Windows, these fields will both be ‘unknown’.
  • A SHA1 digest of the file, encoded in hex. For generated files such as *.pyc files, this field must contain the string “-”, which indicates that the file’s checksum should not be verified.
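The field list above amounts to six tab-separated columns, because owner and group are separate fields. A minimal sketch of reading and writing one entry follows; the names `FileEntry`, `parse_file_entry`, `format_file_entry` and `sha1_hex` are hypothetical. Permissions are kept as an uninterpreted string, since the PEP does not say how they are encoded and the value may be ‘unknown’.

```python
import hashlib
from collections import namedtuple

# Field order follows the list above.
FileEntry = namedtuple("FileEntry", "path size perms owner group digest")

# Marker used instead of a digest for generated files such as *.pyc.
NO_CHECKSUM = "-"


def parse_file_entry(line):
    """Parse one tab-delimited FILES line into a FileEntry."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 tab-separated fields, got %d" % len(fields))
    path, size, perms, owner, group, digest = fields
    return FileEntry(path, int(size), perms, owner, group, digest)


def format_file_entry(entry):
    """Render a FileEntry back into a tab-delimited line (no newline)."""
    return "\t".join(str(field) for field in entry)


def sha1_hex(data):
    """Return the hex-encoded SHA1 digest of a bytes object."""
    return hashlib.sha1(data).hexdigest()
```

A verifier would skip the digest comparison whenever `entry.digest == NO_CHECKSUM`.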

REQUIRES section

This section is a list of strings giving the services required for this module distribution to run properly. This list includes the distribution name (“python-stdlib”) and module names (“rfc822”, “htmllib”, “email”, “email.Charset”). It will be specified by an extra ‘requires’ argument to the distutils.core.setup() function. For example:

setup(..., requires=['xml.utils.iso8601', ...])

Eventually there may be automated tools that look through all of the code and produce a list of requirements, but it’s unlikely that these tools can handle all possible cases; a manual way to specify requirements will always be necessary.

PROVIDES section

This section is a list of strings giving the services provided by an installed distribution. This list includes the distribution name (“python-stdlib”) and module names (“rfc822”, “htmllib”, “email”, “email.Charset”).
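Taken together, REQUIRES and PROVIDES allow a simple dependency check: a requirement is satisfied when some installed distribution lists the same string. The sketch below assumes exact string matching, which the PEP does not spell out; the function name `missing_requirements` and the sample data are hypothetical.

```python
def missing_requirements(requires, installed_provides):
    """Return the required strings that no installed distribution provides.

    requires           -- list of strings from one REQUIRES section
    installed_provides -- {distribution_name: [strings from PROVIDES]}

    Assumes a requirement matches only an identical PROVIDES string.
    """
    provided = set()
    for strings in installed_provides.values():
        provided.update(strings)
    return [name for name in requires if name not in provided]


# Hypothetical installed state.
installed = {
    "python-stdlib": ["python-stdlib", "rfc822", "htmllib", "email"],
    "PyXML": ["PyXML", "xml.utils.iso8601"],
}
print(missing_requirements(["email", "xml.utils.iso8601", "Numeric"], installed))
```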

XXX should files be listed? e.g. $PREFIX/lib/color-table.txt, to pick up data files, required scripts, etc.

Eventually there may be an option to let module developers add their own strings to this section. For example, you might add “XML parser” to this section, and other module distributions could then list “XML parser” as one of their dependencies to indicate that multiple different XML parsers can be used. For now this ability isn’t supported because it raises too many issues: do we need a central registry of legal strings, or just let people put whatever they like? Etc., etc…

API Description

There’s a single fundamental class, InstallationDatabase. The code for it lives in distutils/install_db.py. (XXX any suggestions for alternate locations in the standard library, or an alternate module name?)

The InstallationDatabase returns instances of Distribution that contain all the information about an installed distribution.

XXX Several of the fields in Distribution are duplicates of ones in distutils.dist.Distribution. Probably they should be factored out into the Distribution class proposed here, but can this be done in a backward-compatible way?

InstallationDatabase has the following interface:

class InstallationDatabase:
    def __init__(self, path=None):
        """InstallationDatabase(path:string)
        Read the installation database rooted at the specified path.
        If path is None, INSTALLDB is used as the default.
        """

    def get_distribution(self, distribution_name):
        """get_distribution(distribution_name:string) : Distribution
        Get the object corresponding to a single distribution.
        """

    def list_distributions(self):
        """list_distributions() : [Distribution]
        Return a list of all distributions installed on the system,
        enumerated in no particular order.
        """

    def find_distribution(self, path):
        """find_distribution(path:string) : Distribution
        Search and return the distribution containing the file 'path'.
        Returns None if the file doesn't belong to any distribution
        that the InstallationDatabase knows about.
        XXX should this work for directories?
        """


class Distribution:
    """Instance attributes:
    name : string
      Distribution name
    files : {string : (size:int, perms:int, owner:string, group:string,
                       digest:string)}
       Dictionary mapping the path of a file installed by this distribution
       to information about the file.

    The following fields all come from PEP 241.

    version : distutils.version.Version
      Version of this distribution
    platform : [string]
    summary : string
    description : string
    keywords : string
    home_page : string
    author : string
    author_email : string
    license : string
    """

    def add_file(self, path):
        """add_file(path:string):None
        Record the size, ownership, &c., information for an installed file.
        XXX as written, this would stat() the file.  Should the size/perms/
        checksum all be provided as parameters to this method instead?
        """

    def has_file(self, path):
        """has_file(path:string) : Boolean
        Returns true if the specified path belongs to a file in this
        distribution.
        """

    def check_file(self, path):
        """check_file(path:string) : Boolean
        Checks whether the file's size, checksum, and ownership match,
        returning true if they do.
        """

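To make the intended behaviour of has_file(), check_file() and find_distribution() more concrete, here is a minimal in-memory sketch. It is not the patch referenced by this PEP: the class name `SketchDistribution` and the module-level `find_distribution` function are hypothetical, and check_file() here compares only size and SHA1 digest, leaving out the ownership comparison that the interface above mentions.

```python
import hashlib
import os


class SketchDistribution:
    """Hypothetical stand-in for the proposed Distribution class."""

    def __init__(self, name, files=None):
        self.name = name
        # {path: (size, perms, owner, group, digest)}, as in the PEP.
        self.files = dict(files or {})

    def has_file(self, path):
        """Return True if path was recorded for this distribution."""
        return path in self.files

    def check_file(self, path):
        """Return True if the file on disk still matches its record.

        Simplification: ownership and permissions are not compared.
        """
        size, _perms, _owner, _group, digest = self.files[path]
        if not os.path.isfile(path) or os.path.getsize(path) != size:
            return False
        if digest == "-":
            # Generated file: the checksum must not be verified.
            return True
        with open(path, "rb") as handle:
            return hashlib.sha1(handle.read()).hexdigest() == digest


def find_distribution(distributions, path):
    """Return the first distribution that owns path, or None."""
    for dist in distributions:
        if dist.has_file(path):
            return dist
    return None
```

With this shape, "Has anyone modified x/y/z.py locally?" reduces to finding the owning distribution and calling check_file() on the path.
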
Deliverables

A description of the database API, to be added to this PEP.

Patches to the Distutils that 1) implement an InstallationDatabase class, 2) update the database when a new distribution is installed, and 3) add a simple package management tool, features to be added to this PEP. (Or should that be a separate PEP?) See [2] for the current patch.

Open Issues

PJE suggests the installation database “be potentially present on every directory in sys.path, with the contents merged in sys.path order. This would allow home-directory or other alternate-location installs to work, and ease the process of a distutils install command writing the file.” Nice feature: it does mean that package manager tools can take into account Python packages that a user has privately installed.
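One possible reading of “merged in sys.path order” is that an entry found in an earlier directory shadows an entry of the same name found later, the same way module lookup works. That interpretation is an assumption, as are the function name `merged_entries`, the flat per-directory layout, and the use of the entry's file name as the distribution name. Under those assumptions, a sketch could look like this:

```python
import os


def merged_entries(search_path, dirname="install-db"):
    """Map distribution name -> database file, earlier directories winning.

    search_path -- directories to consult, for example sys.path
    dirname     -- name of the database directory inside each one
                   (assumed here to be "install-db")
    """
    merged = {}
    for directory in search_path:
        db_dir = os.path.join(directory, dirname)
        if not os.path.isdir(db_dir):
            continue  # this path entry has no database
        for name in sorted(os.listdir(db_dir)):
            full = os.path.join(db_dir, name)
            if os.path.isfile(full):
                # setdefault keeps the entry from the earliest directory.
                merged.setdefault(name, full)
    return merged
```

Calling `merged_entries(sys.path)` would then let a private install in a user's home directory take precedence over a system-wide entry of the same name, provided the home directory comes first on the path.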

AMK wonders: what does setup.py do if it’s told to install packages to a directory not on sys.path? Does it write an install-db directory to the directory it’s told to write to, or does it do nothing?

Should the package-database file itself be included in the files list? (PJE would think yes, but of course it can’t contain its own checksum. AMK can’t think of a use case where including the DB file matters.)

PJE wonders about writing the package DB file first, before installing any other files, so that failed partial installations can both be backed out, and recognized as broken. This PEP may have to specify some algorithm for recognizing this situation.

Should we guarantee the format of installation databases remains compatible across Python versions, or is it subject to arbitrary change? Probably we need to guarantee compatibility.

Rejected Suggestions

Instead of using one text file per distribution, one large text file or an anydbm file could be used. This has been rejected for a few reasons. First, performance is probably not an extremely pressing concern as the database is only used when installing or removing software, a relatively infrequent task. Scalability also likely isn’t a problem, as people may have hundreds of Python packages installed, but thousands or tens of thousands seems unlikely. Finally, individual text files are compatible with installers such as RPM or DPKG because a binary packager can just drop the new database file into the database directory. If one large text file or a binary file were used, the Python database would then have to be updated by running a postinstall script.

On Windows, the permissions and owner/group of a file aren’t stored. Windows does in fact support ownership and access permissions, but reading and setting them requires the win32all extensions, and they aren’t present in the basic Python installer for Windows.

References

[1] Michael Muller’s patch (posted to the Distutils-SIG around 28 Dec 1999) generates a list of installed files.

[2] A patch to implement this PEP will be tracked as patch #562100 on SourceForge (https://bugs.python.org/issue562100). Code implementing the installation database is currently in Python CVS in the nondist/sandbox/pep262 directory.

Acknowledgements

Ideas for this PEP originally came from postings by Greg Ward, Fred L. Drake Jr., Thomas Heller, Mats Wichmann, Phillip J. Eby, and others.

Many changes and rewrites to this document were suggested by the readers of the Distutils SIG.

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0262.rst

Last modified: 2025-02-01 08:55:40 GMT
