
Python Enhancement Proposals

PEP 304 – Controlling Generation of Bytecode Files


Author:
Skip Montanaro
Status:
Withdrawn
Type:
Standards Track
Created:
22-Jan-2003
Post-History:
27-Jan-2003, 31-Jan-2003, 17-Jun-2005


Historical Note

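While this original PEP was withdrawn, a variant of this feature was eventually implemented for Python 3.8 in https://bugs.python.org/issue33499 (the PYTHONPYCACHEPREFIX environment variable, the -X pycache_prefix command-line option, and sys.pycache_prefix).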

Several of the issues and concerns originally raised in this PEP were resolved by other changes in the intervening years:

  • the introduction of isolated mode to handle potential security concerns
  • the switch to importlib, a fully import-hook based import system implementation
  • PEP 3147’s change in the bytecode cache layout to use __pycache__ subdirectories, including the cache_from_source(path) and source_from_cache(path) APIs (now in importlib.util) that allow the interpreter to automatically handle the redirection to a separate cache directory
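For comparison with the proposal below, the Python 3.8 feature is exposed as the PYTHONPYCACHEPREFIX environment variable, the -X pycache_prefix=PATH option and the sys.pycache_prefix attribute, and importlib.util.cache_from_source honours it. The sketch below assumes Python 3.8+ and POSIX paths; cache_path_for is a hypothetical helper that temporarily sets sys.pycache_prefix to show how the computed cache path moves into a parallel tree, much like this PEP's augmented directory.

```python
import importlib.util
import sys


def cache_path_for(source_path, prefix=None):
    """Return the .pyc path the import system would use for source_path.

    `prefix` temporarily overrides sys.pycache_prefix (Python 3.8+), the
    runtime counterpart of PYTHONPYCACHEPREFIX and -X pycache_prefix.
    """
    saved = sys.pycache_prefix
    sys.pycache_prefix = prefix
    try:
        return importlib.util.cache_from_source(source_path)
    finally:
        # Restore whatever prefix the interpreter started with.
        sys.pycache_prefix = saved


if __name__ == "__main__":
    src = "/usr/lib/python3/urllib.py"
    # Without a prefix: a __pycache__ directory beside the source.
    print(cache_path_for(src))
    # With a prefix: the source directory is mirrored under the prefix.
    print(cache_path_for(src, "/tmp/pyc"))
```

The exact file name includes the interpreter's cache tag (sys.implementation.cache_tag, for example cpython-312).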

Abstract

This PEP outlines a mechanism for controlling the generation and location of compiled Python bytecode files. This idea originally arose as a patch request [1] and evolved into a discussion thread on the python-dev mailing list [2]. The introduction of an environment variable will allow people installing Python or Python-based third-party packages to control whether or not bytecode files should be generated at installation time, and if so, where they should be written. It will also allow users to control whether or not bytecode files should be generated at application run-time, and if so, where they should be written.

Proposal

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of environment variables which Python understands. PYTHONBYTECODEBASE is interpreted as follows:

  • If not defined, Python bytecode is generated in exactly the same way as is currently done. sys.bytecodebase is set to the root directory (either / on Unix and Mac OS X or the root directory of the startup (installation???) drive, typically C:\, on Windows).
  • If defined and it refers to an existing directory to which the user has write permission, sys.bytecodebase is set to that directory and bytecode files are written into a directory structure rooted at that location.
  • If defined but empty, sys.bytecodebase is set to None and generation of bytecode files is suppressed altogether.
  • If defined and one of the following is true:
    • it does not refer to a directory,
    • it refers to a directory, but not one for which the user has write permission

    a warning is displayed, sys.bytecodebase is set to None and generation of bytecode files is suppressed altogether.
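The decision rules above can be sketched as a small function returning the value sys.bytecodebase would receive. This is a hypothetical illustration: resolve_bytecode_base is not a real API, and os.access stands in for the write-permission check (the startup code later in this PEP uses a probe file instead).

```python
import os
import warnings


def resolve_bytecode_base(environ=os.environ):
    """Hypothetical sketch of PEP 304's rules for PYTHONBYTECODEBASE.

    Returns a directory path string, or None when bytecode writing
    is suppressed.
    """
    value = environ.get("PYTHONBYTECODEBASE")
    if value is None:
        # Not defined: behave as today; the base is the filesystem root
        # (on Windows, abspath(os.sep) gives the current drive's root).
        return os.path.abspath(os.sep)
    if value == "":
        # Defined but empty: suppress bytecode generation, no warning.
        return None
    path = os.path.abspath(value)
    if os.path.isdir(path) and os.access(path, os.W_OK):
        # Existing, writable directory: use it as the bytecode base.
        return path
    # Not a directory, or not writable: warn and suppress generation.
    warnings.warn(f"PYTHONBYTECODEBASE={value!r} is unusable; "
                  "bytecode files will not be written")
    return None
```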

After startup initialization, all runtime references are to sys.bytecodebase, not the PYTHONBYTECODEBASE environment variable. sys.path is not modified.

From the above, we see sys.bytecodebase can only take on two valid types of values: None or a string referring to a valid directory on the system.

During import, this extension works as follows:

  • The normal search for a module is conducted. The search order is roughly: dynamically loaded extension module, Python source file, Python bytecode file. The only time this mechanism comes into play is if a Python source file is found.
  • Once we’ve found a source module, an attempt to read a byte-compiled file in the same directory is made. (This is the same as before.)
  • If no valid byte-compiled file is found there and the bytecode base is not None, an attempt to read a byte-compiled file from the augmented directory is made.
  • If bytecode generation is required, the generated bytecode is written to the augmented directory if possible.

Note that this PEP is explicitly not about providing module-by-module or directory-by-directory control over the disposition of bytecode files.

Glossary

  • “bytecode base” refers to the current setting of sys.bytecodebase.
  • “augmented directory” refers to the directory formed from the bytecode base and the directory name of the source file.
  • PYTHONBYTECODEBASE refers to the environment variable when necessary to distinguish it from “bytecode base”.

Locating bytecode files

When the interpreter is searching for a module, it will use sys.path as usual. However, when a possible bytecode file is considered, an extra probe for a bytecode file may be made. First, a check is made for the bytecode file using the directory in sys.path which holds the source file (the current behavior). If a valid bytecode file is not found there (either one does not exist or exists but is out-of-date) and the bytecode base is not None, a second probe is made using the directory in sys.path prefixed appropriately by the bytecode base.
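The two probes can be sketched as follows, assuming POSIX paths and Python-2-era naming (urllib.py becomes urllib.pyc). find_bytecode and _fresh are hypothetical names, and "valid" is approximated by comparing file modification times; CPython actually checks the source timestamp recorded in the .pyc header.

```python
import os


def _fresh(pyc, source):
    # Simplifying assumption: a bytecode file is valid when it exists and
    # is at least as new as its source.
    return os.path.isfile(pyc) and os.path.getmtime(pyc) >= os.path.getmtime(source)


def find_bytecode(source, bytecodebase):
    """Hypothetical sketch of PEP 304's two-probe lookup (POSIX paths).

    Returns the path of a usable bytecode file, or None if one must be
    generated from the source.
    """
    source = os.path.abspath(source)
    # Probe 1: the .pyc beside the source (the pre-PEP behaviour).
    beside = source + "c"
    if _fresh(beside, source):
        return beside
    # Probe 2: only made when a bytecode base is configured.
    if bytecodebase is None:
        return None
    augdir = os.path.join(os.path.abspath(bytecodebase),
                          os.path.dirname(source).lstrip(os.sep))
    augmented = os.path.join(augdir, os.path.basename(beside))
    return augmented if _fresh(augmented, source) else None
```

Note that a stale .pyc beside the source does not stop the search; the augmented directory is still consulted.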

Writing bytecode files

When the bytecode base is not None, a new bytecode file is written to the appropriate augmented directory, never directly to a directory in sys.path.
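A write path following this rule might look like the sketch below (POSIX paths; write_bytecode is a hypothetical name). It creates intermediate directories as the examples later in this PEP require. The temporary-file-plus-os.replace step is our own addition, not part of the PEP: it is one common way to keep two processes writing at nearly the same time from leaving a half-written file behind.

```python
import os
import tempfile


def write_bytecode(source, bytecodebase, data):
    """Hypothetical sketch: store compiled bytes for `source` under the
    bytecode base. Returns the path written, or None if suppressed."""
    if bytecodebase is None:
        return None  # bytecode generation is suppressed
    source = os.path.abspath(source)
    augdir = os.path.join(os.path.abspath(bytecodebase),
                          os.path.dirname(source).lstrip(os.sep))
    os.makedirs(augdir, exist_ok=True)  # create intermediate directories
    target = os.path.join(augdir, os.path.basename(source) + "c")
    # Assumption (not in the PEP): write to a temporary file in the same
    # directory, then atomically rename it over the target.
    fd, tmp = tempfile.mkstemp(dir=augdir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, target)
    except BaseException:
        os.unlink(tmp)  # remove the partial temporary file
        raise
    return target
```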

Defining augmented directories

Conceptually, the augmented directory for a bytecode file is the directory in which the source file exists prefixed by the bytecode base. In a Unix environment this would be:

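```python
import os
import sys

pcb = os.path.abspath(sys.bytecodebase)
# Strip the leading "/" so os.path.join() does not discard pcb.
if sourcefile.startswith(os.sep):
    sourcefile = sourcefile[1:]
augdir = os.path.join(pcb, os.path.dirname(sourcefile))
```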

On Windows, which does not have a single-rooted directory tree, the drive letter of the directory containing the source file is treated as a directory component after removing the trailing colon. The augmented directory is thus derived as

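```python
import os
import sys

pcb = os.path.abspath(sys.bytecodebase)
drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
drive = drive[:-1]            # "C:" -> "C"
if base.startswith("\\"):     # strip the leading backslash
    base = base[1:]
augdir = os.path.join(pcb, drive, base)
```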
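Both recipes can be folded into one function. In this hypothetical sketch, augmented_dir takes the path module (posixpath or ntpath) as a parameter so the Unix and Windows rules can be exercised on any OS; UNC paths (\\server\share) are not handled.

```python
import ntpath
import posixpath


def augmented_dir(bytecodebase, sourcefile, pathmod=posixpath):
    """Hypothetical helper combining the PEP's Unix and Windows recipes."""
    pcb = pathmod.abspath(bytecodebase)
    drive, base = pathmod.splitdrive(pathmod.dirname(sourcefile))
    # Windows: the drive letter becomes a directory component ("C:" -> "C").
    # On POSIX, splitdrive() always returns an empty drive.
    drive = drive.rstrip(":")
    # Drop leading separators so join() does not discard pcb.
    base = base.lstrip(pathmod.sep + (pathmod.altsep or ""))
    return pathmod.join(pcb, drive, base)


if __name__ == "__main__":
    print(augmented_dir("/tmp", "/usr/lib/python2.3/urllib.py"))
    print(augmented_dir("C:\\TEMP", "C:\\PYTHON22\\urllib.py", ntpath))
```

On POSIX, a bytecode base of / makes the augmented directory coincide with the source directory, which matches the "not defined" rule of writing bytecode exactly where it is written today.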

Fixing the location of the bytecode base

During program startup, the value of the PYTHONBYTECODEBASE environment variable is made absolute, checked for validity and added to the sys module, effectively:

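```python
import os
import sys

pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
probe = os.path.join(pcb, "foo")
try:
    # Close the probe file at once so it can be removed on every platform.
    open(probe, "w").close()
except OSError:  # IOError in Python 2; an alias of OSError since Python 3.3
    sys.bytecodebase = None
else:
    os.unlink(probe)
    sys.bytecodebase = pcb
```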

This allows the user to specify the bytecode base as a relative path, but not have it subject to changes to the current working directory during program execution. (I can’t imagine you’d want it to move around during program execution.)

There is nothing special about sys.bytecodebase. The user may changeit at runtime if desired, but normally it will not be modified.

Rationale

In many environments it is not possible for non-root users to write into directories containing Python source files. Most of the time, this is not a problem as Python source is generally byte compiled during installation. However, there are situations where bytecode files are either missing or need to be updated. If the directory containing the source file is not writable by the current user, a performance penalty is incurred each time a program importing the module is run [3]. Warning messages may also be generated in certain circumstances. If the directory is writable, nearly simultaneous attempts to write the bytecode file by two separate processes may occur, resulting in file corruption [4].

In environments with RAM disks available, it may be desirable for performance reasons to write bytecode files to a directory on such a disk. Similarly, in environments where Python source code resides on network file systems, it may be desirable to cache bytecode files on local disks.

Alternatives

The only other alternative proposed so far [1] seems to be to add a -R flag to the interpreter to disable writing bytecode files altogether. This proposal subsumes that. Adding a command-line option is certainly possible, but is probably not sufficient, as the interpreter’s command line is not readily available during installation (early during program startup???).

Issues

  • Interpretation of a module’s __file__ attribute. I believe the __file__ attribute of a module should reflect the true location of the bytecode file. If people want to locate a module’s source code, they should use imp.find_module(module).
  • Security - What if root has PYTHONBYTECODEBASE set? Yes, this can present a security risk, but so can many other things the root user does. The root user should probably not set PYTHONBYTECODEBASE except possibly during installation. Still, perhaps this problem can be minimized. When running as root the interpreter should check to see if PYTHONBYTECODEBASE refers to a directory which is writable by anyone other than root. If so, it could raise an exception or warning and set sys.bytecodebase to None. Or, see the next item.
  • More security - What if PYTHONBYTECODEBASE refers to a general directory (say, /tmp)? In this case, perhaps loading of a preexisting bytecode file should occur only if the file is owned by the current user or root. (Does this matter on Windows?)
  • The interaction of this PEP with import hooks has not been considered yet. In fact, the best way to implement this idea might be as an import hook. See PEP 302.
  • In the current (pre-PEP 304) environment, it is safe to delete a source file after the corresponding bytecode file has been created, since they reside in the same directory. With PEP 304 as currently defined, this is not the case. A bytecode file in the augmented directory is only considered when the source file is present and is thus never considered when looking for module files ending in “.pyc”. I think this behavior may have to change.
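The two security checks suggested above could be expressed with os.stat on POSIX systems. Both function names are hypothetical, and the first one reads "writable by anyone other than root" as "owned by root and not group- or world-writable", which is one possible interpretation; os.getuid is not available on Windows.

```python
import os
import stat


def base_is_safe_for_root(path):
    """Hypothetical check for a root-owned interpreter: accept the bytecode
    base only if root owns it and neither group nor others can write it."""
    st = os.stat(path)
    owned_by_root = st.st_uid == 0
    others_can_write = bool(st.st_mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owned_by_root and not others_can_write


def bytecode_is_trusted(pyc):
    """Hypothetical check: load a preexisting bytecode file only if it is
    owned by the current user or by root (POSIX only)."""
    return os.stat(pyc).st_uid in (os.getuid(), 0)
```

Under this reading, a shared directory such as /tmp (world-writable) would always be rejected as a bytecode base for root.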

Examples

In the examples which follow, the urllib source code resides in /usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but is not writable by the current user.

  • The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and is valid. When urllib is imported, the contents of /usr/lib/python2.3/urllib.pyc are used. The augmented directory is not consulted. No other bytecode file is generated.
  • The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists, but is out-of-date. When urllib is imported, the generated bytecode file is written to urllib.pyc in the augmented directory which has the value /tmp/usr/lib/python2.3. Intermediate directories will be created as needed.
  • The bytecode base is None. No urllib.pyc file is found. When urllib is imported, no bytecode file is written.
  • The bytecode base is /tmp. No urllib.pyc file is found. When urllib is imported, the generated bytecode file is written to the augmented directory which has the value /tmp/usr/lib/python2.3. Intermediate directories will be created as needed.
  • At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist. A warning is emitted, sys.bytecodebase is set to None and no bytecode files are written during program execution unless sys.bytecodebase is later changed to refer to a valid, writable directory.
  • At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not writable by the current user. A warning is emitted, sys.bytecodebase is set to None and no bytecode files are written during program execution unless sys.bytecodebase is later changed to refer to a valid, writable directory. Note that even though the augmented directory constructed for a particular bytecode file may be writable by the current user, what counts is that the bytecode base directory itself is writable.
  • At startup PYTHONBYTECODEBASE is set to the empty string. sys.bytecodebase is set to None. No warning is generated, however. If no urllib.pyc file is found when urllib is imported, no bytecode file is written.

In the Windows examples which follow, the urllib source code resides in C:\PYTHON22\urllib.py. C:\PYTHON22 is in sys.path but is not writable by the current user.

  • The bytecode base is set to C:\TEMP. C:\PYTHON22\urllib.pyc exists and is valid. When urllib is imported, the contents of C:\PYTHON22\urllib.pyc are used. The augmented directory is not consulted.
  • The bytecode base is set to C:\TEMP. C:\PYTHON22\urllib.pyc exists, but is out-of-date. When urllib is imported, a new bytecode file is written to the augmented directory which has the value C:\TEMP\C\PYTHON22. Intermediate directories will be created as needed.
  • At startup PYTHONBYTECODEBASE is set to TEMP and the current working directory at application startup is H:\NET. The potential bytecode base is thus H:\NET\TEMP. If this directory exists and is writable by the current user, sys.bytecodebase will be set to that value. If not, a warning will be emitted and sys.bytecodebase will be set to None.
  • The bytecode base is C:\TEMP. No urllib.pyc file is found. When urllib is imported, the generated bytecode file is written to the augmented directory which has the value C:\TEMP\C\PYTHON22. Intermediate directories will be created as needed.

Implementation

See the patch on Sourceforge.[6]

References

[1] patch 602345, Option for not writing py.[co] files, Klose (https://bugs.python.org/issue602345)
[2] python-dev thread, Disable writing .py[co], Norwitz (https://mail.python.org/pipermail/python-dev/2003-January/032270.html)
[3] Debian bug report, Mailman is writing to /usr in cron, Wegner (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=96111)
[4] python-dev thread, Parallel pyc construction, Dubois (https://mail.python.org/pipermail/python-dev/2003-January/032060.html)
[6] patch 677103, PYTHONBYTECODEBASE patch (PEP 304), Montanaro (https://bugs.python.org/issue677103)

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0304.rst

Last modified: 2025-02-01 08:59:27 GMT

