Python Enhancement Proposals

PEP 3147 – PYC Repository Directories

Author:
Barry Warsaw <barry at python.org>
Status:
Final
Type:
Standards Track
Created:
16-Dec-2009
Python-Version:
3.2
Post-History:
30-Jan-2010, 25-Feb-2010, 03-Mar-2010, 12-Apr-2010
Resolution:
Python-Dev message

Abstract

This PEP describes an extension to Python’s import mechanism which improves sharing of Python source code files among multiple installed versions of the Python interpreter. It does this by allowing more than one byte compilation file (.pyc files) to be co-located with the Python source file (.py file). The extension described here can also be used to support different Python compilation caches, such as JIT output that may be produced by an Unladen Swallow (PEP 3146) enabled CPython.

Background

CPython compiles its source code into “byte code”, and for performance reasons, it caches this byte code on the file system whenever the source file changes. This makes loading of Python modules much faster because the compilation phase can be bypassed. When your source file is foo.py, CPython caches the byte code in a foo.pyc file right next to the source.

Byte code files contain two 32-bit little-endian numbers followed by the marshaled [2] code object. The 32-bit numbers represent a magic number and a timestamp. The magic number changes whenever Python changes the byte code format, e.g. by adding new byte codes to its virtual machine. This ensures that pyc files built for previous versions of the VM won’t cause problems. The timestamp is used to make sure that the pyc file matches the py file that was used to create it. When either the magic number or the timestamp does not match, the py file is recompiled and a new pyc file is written.
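
As a minimal sketch of the header check described above, the following builds and validates an in-memory pyc image using the 8-byte layout this PEP describes. The helper names (build_pyc, is_fresh, load_code) are hypothetical, the magic value is borrowed from the hexadecimal example later in this PEP, and the integers are decoded least-significant byte first, which is how CPython's marshal helpers write them. Later CPython releases extended the header beyond these 8 bytes, so this is not a parser for pyc files written by current interpreters.

```python
import marshal
import struct

HEADER_SIZE = 8  # 4-byte magic number + 4-byte source timestamp


def build_pyc(code, magic, source_mtime):
    """Return header + marshaled code object as bytes (hypothetical helper)."""
    stamp = struct.pack("<I", int(source_mtime) & 0xFFFFFFFF)
    return magic + stamp + marshal.dumps(code)


def is_fresh(pyc_bytes, expected_magic, source_mtime):
    """True only if both the magic number and the timestamp match."""
    if len(pyc_bytes) < HEADER_SIZE:
        return False
    magic = pyc_bytes[:4]
    (stamp,) = struct.unpack("<I", pyc_bytes[4:8])
    return magic == expected_magic and stamp == int(source_mtime) & 0xFFFFFFFF


def load_code(pyc_bytes):
    """Unmarshal the code object that follows the header."""
    return marshal.loads(pyc_bytes[HEADER_SIZE:])
```

A mismatch in either field is the signal to recompile the source and rewrite the cache file.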

In practice, it is well known that pyc files are not compatible across Python major releases. A reading of import.c [3] in the Python source code shows that within recent memory, every new CPython major release has bumped the pyc magic number.

Rationale

Linux distributions such as Ubuntu [4] and Debian [5] provide more than one Python version at the same time to their users. For example, Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1, with Python 2.6 being the default.

This causes a conflict for third party Python source files installed by the system, because you cannot compile a single Python source file for more than one Python version at a time. When Python finds a pyc file with a non-matching magic number, it falls back to the slower process of recompiling the source. Thus if your system installed a /usr/share/python/foo.py, two different versions of Python would fight over the pyc file and rewrite it each time the source is compiled. (The standard library is unaffected by this, since multiple versions of the stdlib are installed on such distributions.)

Furthermore, in order to ease the burden on operating system packagers for these distributions, the distribution packages do not contain Python version numbers [6]; they are shared across all Python versions installed on the system. Putting Python version numbers in the packages would be a maintenance nightmare, since all the packages, and their dependencies, would have to be updated every time a new Python release was added to or removed from the distribution. Because of the sheer number of packages available, this amount of work is infeasible.

(PEP 384 has been proposed to address binary compatibility issues of third party extension modules across different versions of Python.)

Because these distributions cannot share pyc files, elaborate mechanisms have been developed to put the resulting pyc files in non-shared locations while the source code is still shared. Examples include the symlink-based Debian regimes python-support [8] and python-central [9]. These approaches make for much more complicated, fragile, inscrutable, and fragmented policies for delivering Python applications to a wide range of users. Arguably more users get Python from their operating system vendor than from upstream tarballs. Thus, solving this pyc sharing problem for CPython is a high priority for such vendors.

This PEP proposes a solution to this problem.

Proposal

Python’s import machinery is extended to write and search for byte code cache files in a single directory inside every Python package directory. This directory will be called __pycache__.

Further, pyc file names will contain a magic string (called a “tag”) that differentiates the Python version they were compiled for. This allows multiple byte compiled cache files to co-exist for a single Python source file.

The magic tag is implementation defined, but should contain the implementation name and a version number shorthand, e.g. cpython-32. It must be unique among all versions of Python, and whenever the magic number is bumped, a new magic tag must be defined. An example pyc file for Python 3.2 is thus foo.cpython-32.pyc.

The magic tag is available in the imp module via the get_tag() function. This is parallel to the imp.get_magic() function.
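
A short sketch of how a tagged file name is assembled. The helper tagged_pyc_name is hypothetical. By default it reads the tag from sys.implementation.cache_tag, an attribute that later Python 3 releases provide; using it in place of imp.get_tag() is an assumption made here so the example also runs on interpreters that no longer ship the imp module.

```python
import sys


def tagged_pyc_name(module_name, tag=None):
    """Build '<module>.<tag>.pyc' (hypothetical helper)."""
    if tag is None:
        # Assumption: the running interpreter exposes its tag here.
        tag = sys.implementation.cache_tag
    if tag is None:
        raise ValueError("this implementation defines no cache tag")
    return "{}.{}.pyc".format(module_name, tag)
```

Calling tagged_pyc_name("foo", "cpython-32") yields the foo.cpython-32.pyc name used in this PEP.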

This scheme has the added benefit of reducing the clutter in a Python package directory.

When a Python source file is imported for the first time, a __pycache__ directory will be created in the package directory, if one does not already exist. The pyc file for the imported source will be written to the __pycache__ directory, using the magic-tag formatted name. If either the creation of the __pycache__ directory or the pyc file inside that fails, the import will still succeed, just as it does in a pre-PEP 3147 world.

If the py source file is missing, the pyc file inside __pycache__ will be ignored. This eliminates the problem of accidental stale pyc file imports.

For backward compatibility, Python will still support pyc-only distributions; however, it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. A pyc file outside of __pycache__ will only be imported if the py source file is missing.

Tools such as py_compile [15] and compileall [16] will be extended to create PEP 3147 formatted layouts automatically, but will have an option to create pyc-only distribution layouts.

Examples

What would this look like in practice?

Let’s say we have a Python package named alpha which contains a sub-package named beta. The source directory layout before byte compilation might look like this:

    alpha/
        __init__.py
        one.py
        two.py
        beta/
            __init__.py
            three.py
            four.py

After byte compiling this package with Python 3.2, you would see the following layout:

    alpha/
        __pycache__/
            __init__.cpython-32.pyc
            one.cpython-32.pyc
            two.cpython-32.pyc
        __init__.py
        one.py
        two.py
        beta/
            __pycache__/
                __init__.cpython-32.pyc
                three.cpython-32.pyc
                four.cpython-32.pyc
            __init__.py
            three.py
            four.py

Note: listing order may differ depending on the platform.

Let’s say that two new versions of Python are installed, one is Python 3.3 and another is Unladen Swallow. After byte compilation, the file system would look like this:

    alpha/
        __pycache__/
            __init__.cpython-32.pyc
            __init__.cpython-33.pyc
            __init__.unladen-10.pyc
            one.cpython-32.pyc
            one.cpython-33.pyc
            one.unladen-10.pyc
            two.cpython-32.pyc
            two.cpython-33.pyc
            two.unladen-10.pyc
        __init__.py
        one.py
        two.py
        beta/
            __pycache__/
                __init__.cpython-32.pyc
                __init__.cpython-33.pyc
                __init__.unladen-10.pyc
                three.cpython-32.pyc
                three.cpython-33.pyc
                three.unladen-10.pyc
                four.cpython-32.pyc
                four.cpython-33.pyc
                four.unladen-10.pyc
            __init__.py
            three.py
            four.py

As you can see, as long as the Python version identifier string is unique, any number of pyc files can co-exist. These identifier strings are described in more detail below.
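
To illustrate how the tag keeps the cached files apart, this sketch groups the file names found in a __pycache__ directory by their tag. group_by_tag is a hypothetical helper, and it assumes the simple name.tag.pyc pattern shown in the listings above.

```python
from collections import defaultdict


def group_by_tag(filenames):
    """Map each tag to the sorted module names cached under it."""
    groups = defaultdict(list)
    for filename in filenames:
        parts = filename.rsplit(".", 2)
        if len(parts) != 3 or parts[2] != "pyc":
            continue  # not a tagged pyc file name
        module, tag, _ = parts
        groups[tag].append(module)
    return {tag: sorted(modules) for tag, modules in groups.items()}
```

Each interpreter only ever looks at the group for its own tag, which is why the files never collide.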

A nice property of this layout is that the __pycache__ directories can generally be ignored, such that a normal directory listing would show something like this:

    alpha/
        __pycache__/
        __init__.py
        one.py
        two.py
        beta/
            __pycache__/
            __init__.py
            three.py
            four.py

This is much less cluttered than even today’s Python.

Python behavior

When Python searches for a module to import (say foo), it may find one of several situations. As per current Python rules, the term “matching pyc” means that the magic number matches the current interpreter’s magic number, and the source file’s timestamp matches the timestamp in the pyc file exactly.

Case 0: The steady state

When Python is asked to import module foo, it searches for a foo.py file (or foo package, but that’s not important for this discussion) along its sys.path. If found, Python looks to see if there is a matching __pycache__/foo.<magic>.pyc file, and if so, that pyc file is loaded.

Case 1: The first import

When Python locates the foo.py, if the __pycache__/foo.<magic>.pyc file is missing, Python will create it, also creating the __pycache__ directory if necessary. Python will parse and byte compile the foo.py file and save the byte code in __pycache__/foo.<magic>.pyc.

Case 2: The second import

When Python is asked to import module foo a second time (in a different process of course), it will again search for the foo.py file along its sys.path. When Python locates the foo.py file, it looks for a matching __pycache__/foo.<magic>.pyc and finding this, it reads the byte code and continues as usual.

Case 3: __pycache__/foo.<magic>.pyc with no source

It’s possible that the foo.py file somehow got removed, while leaving the cached pyc file still on the file system. If the __pycache__/foo.<magic>.pyc file exists, but the foo.py file used to create it does not, Python will raise an ImportError when asked to import foo. In other words, Python will not import a pyc file from the cache directory unless the source file exists.

Case 4: legacy pyc files and source-less imports

Python will ignore all legacy pyc files when a source file exists next to it. In other words, if a foo.pyc file exists next to the foo.py file, the pyc file will be ignored in all cases.

In order to continue to support source-less distributions though, if the source file is missing, Python will import a lone pyc file if it lives where the source file would have been.

Case 5: read-only file systems

When the source lives on a read-only file system, or the __pycache__ directory or pyc file cannot otherwise be written, all the same rules apply. This is also the case when __pycache__ happens to have been created with permissions which do not allow the pyc files it contains to be written.

Flow chart

Here is a flow chart describing how modules are loaded:

[Figure: flow chart of module loading (pep-3147-1.png)]
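
The cases above can be condensed into a small decision function. This is a sketch of the rules as stated in this PEP, not of CPython's import code: plan_import is a hypothetical helper, the directory is modelled as a set of relative path strings, and cache_matches stands in for the magic number and timestamp comparison.

```python
def plan_import(existing, name, tag, cache_matches=True):
    """Return (action, path) for importing `name` from one directory.

    existing      -- set of relative paths present in the directory
    tag           -- magic tag of the running interpreter
    cache_matches -- result of the magic number/timestamp check
    """
    source = "{}.py".format(name)
    cached = "__pycache__/{}.{}.pyc".format(name, tag)
    legacy = "{}.pyc".format(name)

    if source in existing:
        if cached in existing and cache_matches:
            return ("load", cached)      # cases 0 and 2
        # Case 1 (or a stale cache): compile, then try to write the
        # cache file; a failed write (case 5) does not fail the import.
        # A legacy pyc next to the source is ignored (case 4).
        return ("compile", cached)
    if legacy in existing:
        return ("load", legacy)          # case 4: source-less import
    return ("error", None)               # case 3: cache without source
```

Note that a pyc file inside __pycache__ is never consulted unless the source file is present.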

Alternative Python implementations

Alternative Python implementations such as Jython [11], IronPython [12], PyPy [13], Pynie [14], and Unladen Swallow can also use the __pycache__ directory to store whatever compilation artifacts make sense for their platforms. For example, Jython could store the class file for the module in __pycache__/foo.jython-32.class.

Implementation strategy

This feature is targeted for Python 3.2, solving the problem for that and all future versions. It may be back-ported to Python 2.7. Vendors are free to backport the changes to earlier distributions as they see fit. For backports of this feature to Python 2, when the -U flag is used, a file such as foo.cpython-27u.pyc can be written.

Effects on existing code

Adoption of this PEP will affect existing code and idioms, both inside Python and outside. This section enumerates some of these effects.

Detecting PEP 3147 availability

The easiest way to detect whether your version of Python provides PEP 3147 functionality is to do the following check:

    >>> import imp
    >>> has3147 = hasattr(imp, 'get_tag')

__file__

In Python 3, when you import a module, its __file__ attribute points to its source py file (in Python 2, it points to the pyc file). A package’s __file__ points to the py file for its __init__.py. E.g.:

    >>> import foo
    >>> foo.__file__
    'foo.py'
    # baz is a package
    >>> import baz
    >>> baz.__file__
    'baz/__init__.py'

Nothing in this PEP would change the semantics of __file__.

This PEP proposes the addition of a __cached__ attribute to modules, which will always point to the actual pyc file that was read or written. When the environment variable $PYTHONDONTWRITEBYTECODE is set, or the -B option is given, or if the source lives on a read-only filesystem, then the __cached__ attribute will point to the location that the pyc file would have been written to if it didn’t exist. This location of course includes the __pycache__ subdirectory in its path.

For alternative Python implementations which do not support pyc files, the __cached__ attribute may point to whatever information makes sense. E.g. on Jython, this might be the .class file for the module: __pycache__/foo.jython-32.class. Some implementations may use multiple compiled files to create the module, in which case __cached__ may be a tuple. The exact contents of __cached__ are Python implementation specific.

It is recommended that when nothing sensible can be calculated, implementations should set the __cached__ attribute to None.
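
A minimal sketch that loads a throwaway module and reports both attributes. load_and_report and demo are hypothetical helpers built on importlib.util, which postdates this PEP. The code falls back to the module spec's cached attribute in case the interpreter does not set __cached__ on the module; that fallback is an assumption about newer Python versions rather than something this PEP specifies.

```python
import importlib.util
import os
import tempfile


def load_and_report(directory, name):
    """Load <directory>/<name>.py and return (source path, cache path)."""
    path = os.path.join(directory, name + ".py")
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumption: prefer __cached__, fall back to the spec if it is unset.
    cached = getattr(module, "__cached__", None) or spec.cached
    return module.__file__, cached


def demo():
    """Create a temporary foo.py and report paths relative to its directory."""
    with tempfile.TemporaryDirectory() as directory:
        with open(os.path.join(directory, "foo.py"), "w") as handle:
            handle.write("VALUE = 1\n")
        source, cached = load_and_report(directory, "foo")
        return (os.path.relpath(source, directory),
                os.path.relpath(cached, directory))
```

The first element names the source file, the second a tagged file inside __pycache__, whether or not that file could actually be written.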

py_compile and compileall

Python comes with two modules, py_compile [15] and compileall [16], which support compiling Python modules external to the built-in import machinery. py_compile in particular has intimate knowledge of byte compilation, so these will be updated to understand the new layout. The -b flag is added to compileall for writing legacy .pyc byte-compiled file path names.
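
As an illustration of both layouts, this sketch byte-compiles a throwaway file with each tool. py_compile.compile() and the legacy argument of compileall.compile_file() are the library-level counterparts as they exist in current Python 3 releases; the compile_both_ways helper and the file name are hypothetical.

```python
import compileall
import os
import py_compile
import tempfile


def compile_both_ways():
    """Compile foo.py into __pycache__ and into the legacy location."""
    with tempfile.TemporaryDirectory() as directory:
        source = os.path.join(directory, "foo.py")
        with open(source, "w") as handle:
            handle.write("VALUE = 1\n")

        # PEP 3147 layout: returns the path of the file it wrote.
        cached = py_compile.compile(source, doraise=True)

        # Legacy layout (what the -b flag selects): foo.pyc beside foo.py.
        ok = compileall.compile_file(source, quiet=1, legacy=True)
        legacy_exists = os.path.exists(os.path.join(directory, "foo.pyc"))

        return os.path.relpath(cached, directory), bool(ok), legacy_exists
```

The legacy form is what a pyc-only distribution would ship once the source files are removed.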

bdist_wininst and the Windows installer

These tools also compile modules explicitly on installation. If they do not use py_compile and compileall, then they would also have to be modified to understand the new layout.

File extension checks

There exists some code which checks for files ending in .pyc and simply chops off the last character to find the matching .py file. This code will obviously fail once this PEP is implemented.

To support this use case, we’ll add two new methods to the imp package [17]:

  • imp.cache_from_source(py_path) -> pyc_path
  • imp.source_from_cache(pyc_path) -> py_path

Alternative implementations are free to override these functions to return reasonable values based on their own support for this PEP. These methods are allowed to return None when the implementation (or PEP 302 loader in effect) for whatever reason cannot calculate the appropriate file name. They should not raise exceptions.
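
A short sketch of the round trip between the two paths. It uses importlib.util.cache_from_source() and importlib.util.source_from_cache(), which is where these functions live in current Python 3 releases; using them in place of the imp functions named above is an assumption made so the example runs on interpreters without imp. The importlib versions may raise instead of returning None when a path cannot be mapped. round_trip is a hypothetical helper.

```python
import importlib.util
import os


def round_trip(py_path):
    """Map a source path to its cache path and back again."""
    pyc_path = importlib.util.cache_from_source(py_path)
    recovered = importlib.util.source_from_cache(pyc_path)
    return pyc_path, recovered
```

Code that used to chop the trailing character off a .pyc name can call the second function instead.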

Backports

For versions of Python earlier than 3.2 (and possibly 2.7), it is possible to backport this PEP. However, in Python 3.2 (and possibly 2.7), this behavior will be turned on by default, and in fact, it will replace the old behavior. Backports will need to support the old layout by default. We suggest supporting PEP 3147 through the use of an environment variable called $PYTHONENABLECACHEDIR or the command line switch -X enablecachedir to enable the feature.

Makefiles and other dependency tools

Makefiles and other tools which calculate dependencies on .pyc files (e.g. to byte-compile the source if the .pyc is missing) will have to be updated to check the new paths.
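
For dependency tools written in Python, the check might look like the following sketch. needs_rebuild is a hypothetical helper; it locates the cache file with importlib.util.cache_from_source() (an assumption, since this PEP names the imp equivalent) and uses a plain modification-time comparison, which is coarser than the exact timestamp match the import system performs.

```python
import importlib.util
import os


def needs_rebuild(py_path):
    """True if the tagged pyc is missing or older than its source."""
    pyc_path = importlib.util.cache_from_source(py_path)
    if not os.path.exists(pyc_path):
        return True
    return os.path.getmtime(pyc_path) < os.path.getmtime(py_path)
```

A Makefile rule would need the same path computation, since the target is no longer simply the source name with a c appended.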

Alternatives

This section describes some alternative approaches or details that were considered and rejected during the PEP’s development.

Hexadecimal magic tags

pyc files inside of the __pycache__ directories contain a magic tag in their file names. These are mnemonic tags for the actual magic numbers used by the importer. We could have used the hexadecimal representation [10] of the binary magic number as a unique identifier. For example, in Python 3.2:

    >>> from binascii import hexlify
    >>> from imp import get_magic
    >>> 'foo.{}.pyc'.format(hexlify(get_magic()).decode('ascii'))
    'foo.580c0d0a.pyc'

This isn’t particularly human friendly though, thus the magic tag proposed in this PEP.

PEP 304

There is some overlap between the goals of this PEP and PEP 304, which has been withdrawn. However, PEP 304 would allow a user to create a shadow file system hierarchy in which to store pyc files. This concept of a shadow hierarchy for pyc files could be used to satisfy the aims of this PEP. Although PEP 304 does not indicate why it was withdrawn, shadow directories have a number of problems. The location of the shadow pyc files would not be easily discovered and would depend on the proper and consistent use of the $PYTHONBYTECODE environment variable both by the system and by end users. There are also global implications, meaning that while the system might want to shadow pyc files, users might not want to, but the PEP defines only an all-or-nothing approach.

As an example of the problem, a common (though fragile) Python idiom for locating data files is to do something like this:

    from os.path import dirname, join
    import foo.bar
    data_file = join(dirname(foo.bar.__file__), 'my.dat')

This would be problematic since foo.bar.__file__ will give the location of the pyc file in the shadow directory, and it may not be possible to find the my.dat file relative to the source directory from there.

Fat byte compilation files

An earlier version of this PEP described “fat” Python byte code files. These files would contain the equivalent of multiple pyc files in a single pyf file, with a lookup table keyed off the appropriate magic number. This was an extensible file format so that the first 5 parallel Python implementations could be supported fairly efficiently, but with extension lookup tables available to scale pyf byte code objects as large as necessary.

The fat byte compilation files were fairly complex, and inherently introduced difficult race conditions, so the current simplification of using directories was suggested. The same problem applies to using zip files as the fat pyc file format.

Multiple file extensions

The PEP author also considered an approach where multiple thin byte compiled files lived in the same place, but used different file extensions to designate the Python version. E.g. foo.pyc25, foo.pyc26, foo.pyc31 etc. This was rejected because of the clutter involved in writing so many different files. The multiple extension approach makes it more difficult (and an ongoing task) to update any tools that are dependent on the file extension.

.pyc

A proposal was floated to call the __pycache__ directory .pyc or some other dot-file name. This would have the effect on *nix systems of hiding the directory. There are many reasons why this was rejected by the BDFL [20] including the fact that dot-files are only special on some platforms, and we actually do not want to hide these completely from users.

Reference implementation

Work on this code is tracked in a Bazaar branch on Launchpad [22] until it’s ready for merge into Python 3.2. The work-in-progress diff can also be viewed [23] and is updated automatically as new changes are uploaded.

A Rietveld code review issue [24] has been opened as of 2010-04-01 (no, this is not an April Fools joke :).

References

[2] The marshal module: https://docs.python.org/3.1/library/marshal.html
[3] import.c: https://github.com/python/cpython/blob/v3.2a1/Python/import.c
[4] Ubuntu: https://www.ubuntu.com
[5] Debian: https://www.debian.org
[6] Debian Python Policy: https://www.debian.org/doc/packaging-manuals/python-policy/
[8] python-support: https://web.archive.org/web/20100110123824/http://wiki.debian.org/DebianPythonFAQ#Whatispython-support.3F
[9] python-central: https://web.archive.org/web/20100110123824/http://wiki.debian.org/DebianPythonFAQ#Whatispython-central.3F
[10] binascii.hexlify(): https://docs.python.org/3.1/library/binascii.html#binascii.hexlify
[11] Jython: http://www.jython.org/
[12] IronPython: http://ironpython.net/
[13] PyPy: https://web.archive.org/web/20100310130136/http://codespeak.net/pypy/dist/pypy/doc/
[14] Pynie: https://code.google.com/archive/p/pynie/
[15] py_compile: https://docs.python.org/3.1/library/py_compile.html
[16] compileall: https://docs.python.org/3.1/library/compileall.html
[17] imp: https://docs.python.org/3.1/library/imp.html
[20] https://www.mail-archive.com/python-dev@python.org/msg45203.html
[21] importlib: https://docs.python.org/3.1/library/importlib.html
[22] https://code.launchpad.net/~barry/python/pep3147
[23] https://code.launchpad.net/~barry/python/pep3147/+merge/22648
[24] http://codereview.appspot.com/842043/show

ACKNOWLEDGMENTS

Barry Warsaw’s original idea was for fat Python byte code files. Martin von Loewis reviewed an early draft of the PEP and suggested the simplification to store traditional pyc and pyo files in a directory. Many other people reviewed early versions of this PEP and provided useful feedback including but not limited to:

  • David Malcolm
  • Josselin Mouette
  • Matthias Klose
  • Michael Hudson
  • Michael Vogt
  • Piotr Ożarowski
  • Scott Kitterman
  • Toshio Kuratomi

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-3147.rst

Last modified: 2025-02-01 08:55:40 GMT

