Python Enhancement Proposals

PEP 488 – Elimination of PYO files

Author:
Brett Cannon <brett at python.org>
Status:
Final
Type:
Standards Track
Created:
20-Feb-2015
Python-Version:
3.5
Post-History:
06-Mar-2015, 13-Mar-2015, 20-Mar-2015

Abstract

This PEP proposes eliminating the concept of PYO files from Python. To continue the support of the separation of bytecode files based on their optimization level, this PEP proposes extending the PYC file name to include the optimization level in the bytecode repository directory when there are optimizations applied.

Rationale

As of today, bytecode files come in two flavours: PYC and PYO. A PYC file is the bytecode file generated and read from when no optimization level is specified at interpreter startup (i.e., -O is not specified). A PYO file represents the bytecode file that is read/written when any optimization level is specified (i.e., when -O or -OO is specified). This means that while PYC files clearly delineate the optimization level used when they were generated (namely, no optimizations beyond the peepholer), the same is not true for PYO files. To put this in terms of optimization levels and the file extension:

  • 0: .pyc
  • 1 (-O): .pyo
  • 2 (-OO): .pyo

The reuse of the .pyo file extension for both level 1 and 2 optimizations means that there is no clear way to tell what optimization level was used to generate the bytecode file. In terms of reading PYO files, this can lead to an interpreter using a mixture of optimization levels with its code if the user was not careful to make sure all PYO files were generated using the same optimization level (typically done by blindly deleting all PYO files and then using the compileall module to compile all-new PYO files [1]). This issue is only compounded when people optimize Python code beyond what the interpreter natively supports, e.g., using the astoptimizer project [2].

In terms of writing PYO files, the need to delete all PYO files every time one either changes the optimization level they want to use or is unsure of what optimization was used the last time PYO files were generated leads to unnecessary file churn. The change proposed by this PEP also allows for all optimization levels to be pre-compiled for bytecode files ahead of time, something that is currently impossible thanks to the reuse of the .pyo file extension for multiple optimization levels.

As for distributing bytecode-only modules, having to distribute both .pyc and .pyo files is unnecessary for the common use-case of code obfuscation and smaller file deployments. This means that bytecode-only modules will only load from their non-optimized .pyc file name.

Proposal

To eliminate the ambiguity that PYO files present, this PEP proposes eliminating the concept of PYO files and their accompanying .pyo file extension. To allow for the optimization level to be unambiguous as well as to avoid having to regenerate optimized bytecode files needlessly in the __pycache__ directory, the optimization level used to generate the bytecode file will be incorporated into the bytecode file name. When no optimization level is specified, the pre-PEP .pyc file name will be used (i.e., no optimization level will be specified in the file name). For example, a source file named foo.py in CPython 3.5 could have the following bytecode files based on the interpreter’s optimization level (none, -O, and -OO):

  • 0: foo.cpython-35.pyc (i.e., no change)
  • 1: foo.cpython-35.opt-1.pyc
  • 2: foo.cpython-35.opt-2.pyc

Currently bytecode file names are created by importlib.util.cache_from_source(), approximately using the following expression defined by PEP 3147 [3], [4]:

'{name}.{cache_tag}.pyc'.format(name=module_name, cache_tag=sys.implementation.cache_tag)

This PEP proposes to change the expression when an optimization level is specified to:

'{name}.{cache_tag}.opt-{optimization}.pyc'.format(name=module_name, cache_tag=sys.implementation.cache_tag, optimization=str(sys.flags.optimize))
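The two expressions can be combined into one small function. The following is only a sketch: bytecode_file_name and its level parameter are hypothetical names chosen for illustration, and the function is not the importlib implementation. It assumes level follows the meaning of sys.flags.optimize, so 0 stands for "no optimization".

```python
import sys


def bytecode_file_name(module_name, level=None):
    """Return the bytecode file name (without directory) for a module.

    Hypothetical helper mirroring the two expressions above; ``level``
    is assumed to have the same meaning as ``sys.flags.optimize``.
    """
    cache_tag = sys.implementation.cache_tag
    if level is None:
        # Fall back to the running interpreter's own level (0, 1 or 2).
        level = sys.flags.optimize
    if level == 0:
        # No optimization: keep the pre-PEP file name.
        return '{name}.{cache_tag}.pyc'.format(
            name=module_name, cache_tag=cache_tag)
    return '{name}.{cache_tag}.opt-{optimization}.pyc'.format(
        name=module_name, cache_tag=cache_tag, optimization=str(level))


for level in (0, 1, 2):
    print(bytecode_file_name('foo', level))
```

The printed names depend on the cache tag of the interpreter running the sketch; under CPython 3.5 they would match the three file names listed earlier.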

The “opt-” prefix was chosen so as to provide a visual separator from the cache tag. The placement of the optimization level after the cache tag was chosen to preserve lexicographic sort order of bytecode file names based on module name and cache tag which will not vary for a single interpreter. The “opt-” prefix was chosen over “o” so as to be somewhat self-documenting. The “opt-” prefix was chosen over “O” so as to not have any confusion in case “0” was the leading prefix of the optimization level.

A period was chosen over a hyphen as a separator so as to distinguish clearly that the optimization level is not part of the interpreter version as specified by the cache tag. It also lends to the use of the period in the file name to delineate semantically different concepts.

For example, if -OO had been passed to the interpreter then instead of importlib.cpython-35.pyo the file name would be importlib.cpython-35.opt-2.pyc.

Leaving out the new opt- tag when no optimization level is applied should increase backwards-compatibility. This is also more accommodating of Python implementations which have no use for optimization levels (e.g., PyPy [10]).

It should be noted that this change in no way affects the performance of import. Since the import system looks for a single bytecode file based on the optimization level of the interpreter already and generates a new bytecode file if it doesn’t exist, the introduction of potentially more bytecode files in the __pycache__ directory has no effect in terms of stat calls. The interpreter will continue to look for only a single bytecode file based on the optimization level and thus no increase in stat calls will occur.

The only potentially negative result of this PEP is the probable increase in the number of .pyc files and thus increase in storage use. But for platforms where this is an issue, sys.dont_write_bytecode exists to turn off bytecode generation so that it can be controlled offline.
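As a rough illustration of that escape hatch, the sketch below sets sys.dont_write_bytecode, imports a throwaway module from a temporary directory, and checks that no __pycache__ directory was created next to it. The module name pep488_demo is hypothetical and exists only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

previous = sys.dont_write_bytecode
sys.dont_write_bytecode = True  # same effect as the -B command-line flag
try:
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical throwaway module used only for this demonstration.
        with open(os.path.join(tmp, 'pep488_demo.py'), 'w') as f:
            f.write('VALUE = 1\n')
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module('pep488_demo')
            # Record whether the import wrote a bytecode directory.
            cache_written = os.path.isdir(os.path.join(tmp, '__pycache__'))
        finally:
            sys.path.remove(tmp)
            sys.modules.pop('pep488_demo', None)
finally:
    sys.dont_write_bytecode = previous

print(cache_written)
```

With the flag set, bytecode can instead be generated ahead of time by a separate step, which is what "controlled offline" amounts to.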

Implementation

An implementation of this PEP is available [11].

importlib

As importlib.util.cache_from_source() is the API that exposes bytecode file paths as well as being directly used by importlib, it requires the most critical change. As of Python 3.4, the function’s signature is:

importlib.util.cache_from_source(path, debug_override=None)

This PEP proposes changing the signature in Python 3.5 to:

importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)

The introduced optimization keyword-only parameter will control what optimization level is specified in the file name. If the argument is None then the current optimization level of the interpreter will be assumed (including no optimization). Any argument given for optimization will be passed to str() and must have str.isalnum() be true, else ValueError will be raised (this prevents invalid characters being used in the file name). If the empty string is passed in for optimization then the addition of the optimization will be suppressed, reverting to the file name format which predates this PEP.
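The behaviour described above can be tried directly on Python 3.5 or later. This is a minimal sketch; the source path pkg/foo.py is a hypothetical example and does not need to exist, since the function only computes a path.

```python
import os
import sys
from importlib.util import cache_from_source

source = os.path.join('pkg', 'foo.py')  # hypothetical source path

# '' suppresses the tag; 1 and 2 are CPython's own optimization levels.
names = [
    os.path.basename(cache_from_source(source, optimization=opt))
    for opt in ('', 1, 2)
]
print(names)

# With optimization=None the interpreter's current level is assumed.
level = sys.flags.optimize
default_path = cache_from_source(source)
explicit_path = cache_from_source(source,
                                  optimization=level if level else '')
print(default_path == explicit_path)

# Non-alphanumeric values are rejected before they reach the file system.
try:
    cache_from_source(source, optimization='opt-3')
except ValueError as exc:
    rejected = True
    print('rejected:', exc)
else:
    rejected = False
```

The exact file names depend on the cache tag of the interpreter that runs the sketch.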

It is expected that beyond Python’s own two optimization levels, third-party code will use a hash of optimization names to specify the optimization level, e.g. hashlib.sha256(','.join(['nodeadcode', 'constfolding']).encode()).hexdigest(). While this might lead to long file names, it is assumed that most users never look at the contents of the __pycache__ directory and so this won’t be an issue.
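A third-party optimizer might derive such a tag as sketched below. The optimization names and the source path are hypothetical; the only assumptions about the standard library are hashlib.sha256() and importlib.util.cache_from_source() as available in Python 3.5 and later.

```python
import hashlib
import os
from importlib.util import cache_from_source

# Hypothetical optimization names used by a third-party optimizer.
optimizations = ['nodeadcode', 'constfolding']

# sha256() needs bytes, so the joined names are encoded first.
tag = hashlib.sha256(','.join(optimizations).encode('utf-8')).hexdigest()

# A hex digest is alphanumeric, so it passes the str.isalnum() check.
path = cache_from_source(os.path.join('pkg', 'foo.py'), optimization=tag)
print(os.path.basename(path))
```

Sorting the names before joining them would be one way to make the tag independent of the order in which optimizations are listed; whether that is desirable is left to the tool.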

The debug_override parameter will be deprecated. A False value will be equivalent to optimization=1 while a True value will represent optimization='' (a None argument will continue to mean the same as for optimization). A deprecation warning will be raised when debug_override is given a value other than None, but there are no plans for the complete removal of the parameter at this time (but removal will be no later than Python 4).
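The mapping from the deprecated parameter to the new one can be written out as a small function. This is a sketch only: resolve_optimization is a hypothetical helper, not part of importlib, and rejecting the case where both arguments are supplied is an assumption made for the illustration.

```python
import warnings


def resolve_optimization(debug_override=None, optimization=None):
    """Translate the deprecated debug_override into an optimization value.

    Hypothetical helper illustrating the mapping described above; it is
    not the importlib implementation.
    """
    if debug_override is None:
        return optimization
    warnings.warn("debug_override is deprecated; use 'optimization' instead",
                  DeprecationWarning, stacklevel=2)
    if optimization is not None:
        # Assumption: supplying both arguments is treated as an error.
        raise TypeError('debug_override or optimization must be None')
    # True meant "debug" (no optimization tag); False meant "optimized".
    return '' if debug_override else 1


with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    print(repr(resolve_optimization(debug_override=True)))   # ''
    print(repr(resolve_optimization(debug_override=False)))  # 1
```

Note that no debug_override value maps to level 2, so callers that need the -OO file name have to pass optimization=2 explicitly.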

The various module attributes for importlib.machinery which relate to bytecode file suffixes will be updated [7]. The DEBUG_BYTECODE_SUFFIXES and OPTIMIZED_BYTECODE_SUFFIXES will both be documented as deprecated and set to the same value as BYTECODE_SUFFIXES (removal of DEBUG_BYTECODE_SUFFIXES and OPTIMIZED_BYTECODE_SUFFIXES is not currently planned, but will be no later than Python 4).
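The resulting attribute values can be inspected as sketched below. Because the two legacy names are deprecated, the sketch assumes they may emit a warning or be absent on some interpreters and looks them up defensively.

```python
import importlib.machinery as machinery
import warnings

print(machinery.BYTECODE_SUFFIXES)

# Assumption: the deprecated names may warn or be missing on newer
# interpreters, so they are read with getattr() and a default.
with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    for legacy in ('DEBUG_BYTECODE_SUFFIXES', 'OPTIMIZED_BYTECODE_SUFFIXES'):
        value = getattr(machinery, legacy, None)
        if value is not None:
            print(legacy, value == machinery.BYTECODE_SUFFIXES)
```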

All various finders and loaders will also be updated as necessary, but updating the previously mentioned parts of importlib should be all that is required.

Rest of the standard library

The various functions exposed by the py_compile and compileall modules will be updated as necessary to make sure they follow the new bytecode file name semantics [6], [1]. The CLI for the compileall module will not be directly affected (the -b flag will be implicit as it will no longer generate .pyo files when -O is specified).
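As a sketch of the new semantics, py_compile.compile() can produce bytecode for every optimization level side by side, which was not possible while levels 1 and 2 shared the .pyo extension. The module foo.py below is a hypothetical file created in a temporary directory.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'foo.py')  # hypothetical module
    with open(source, 'w') as f:
        f.write('assert True\n')

    # optimize=0, 1, 2 correspond to no flag, -O and -OO.
    compiled = [
        os.path.basename(py_compile.compile(source, optimize=level))
        for level in (0, 1, 2)
    ]

print(compiled)
```

The compileall module accepts a comparable optimize argument in its compile_dir() and compile_file() functions for whole directory trees.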

Compatibility Considerations

Any code directly manipulating bytecode files from Python 3.2 on will need to consider the impact of this change on their code (prior to Python 3.2, including all of Python 2, there was no __pycache__, which already necessitates bifurcating bytecode file handling support). If code was setting the debug_override argument to importlib.util.cache_from_source() then care will be needed if they want the path to a bytecode file with an optimization level of 2. Otherwise only code not using importlib.util.cache_from_source() will need updating.
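Code that inspects bytecode file names by hand would need to account for the optional tag. The following sketch shows one way to do that; optimization_from_name is a hypothetical helper, and it assumes the module name itself contains no periods.

```python
def optimization_from_name(file_name):
    """Return the optimization tag of a PEP 488 style bytecode file name.

    Hypothetical helper: returns '' when no "opt-" tag is present.
    """
    parts = file_name.split('.')
    # Expected shapes: name.cache_tag.pyc or name.cache_tag.opt-N.pyc
    if parts[-1] != 'pyc' or len(parts) not in (3, 4):
        raise ValueError('unexpected bytecode file name: %r' % file_name)
    if len(parts) == 3:
        return ''
    tag = parts[2]
    if not tag.startswith('opt-') or not tag[4:].isalnum():
        raise ValueError('unexpected optimization tag: %r' % tag)
    return tag[4:]


print(repr(optimization_from_name('foo.cpython-35.pyc')))        # ''
print(repr(optimization_from_name('foo.cpython-35.opt-2.pyc')))  # '2'
```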

As for people who distribute bytecode-only modules (i.e., use a bytecode file instead of a source file), they will have to choose which optimization level they want their bytecode files to be since distributing a .pyo file with a .pyc file will no longer be of any use. Since people typically only distribute bytecode files for code obfuscation purposes or smaller distribution size then only having to distribute a single .pyc should actually be beneficial to these use-cases. And since the magic number for bytecode files changed in Python 3.5 to support PEP 465 there is no need to support pre-existing .pyo files [8].

Rejected Ideas

Completely dropping optimization levels from CPython

Some have suggested that instead of accommodating the various optimization levels in CPython, we should instead drop them entirely. The argument is that significant performance gains would occur from runtime optimizations through something like a JIT and not through pre-execution bytecode optimizations.

This idea is rejected for this PEP as that ignores the fact that there are people who do find the pre-existing optimization levels for CPython useful. It also assumes that no other Python interpreter would find what this PEP proposes useful.

Alternative formatting of the optimization level in the file name

Using the “opt-” prefix and placing the optimization level between the cache tag and file extension is not critical. All options which have been considered are:

  • importlib.cpython-35.opt-1.pyc
  • importlib.cpython-35.opt1.pyc
  • importlib.cpython-35.o1.pyc
  • importlib.cpython-35.O1.pyc
  • importlib.cpython-35.1.pyc
  • importlib.cpython-35-O1.pyc
  • importlib.O1.cpython-35.pyc
  • importlib.o1.cpython-35.pyc
  • importlib.1.cpython-35.pyc

These were initially rejected either because they would change the sort order of bytecode files, possible ambiguity with the cache tag, or were not self-documenting enough. An informal poll was taken and people clearly preferred the formatting proposed by the PEP [9]. Since this topic is non-technical and of personal choice, the issue is considered solved.

Embedding the optimization level in the bytecode metadata

Some have suggested that rather than embedding the optimization level of bytecode in the file name that it be included in the file’s metadata instead. This would mean every interpreter had a single copy of bytecode at any time. Changing the optimization level would thus require rewriting the bytecode, but there would also only be a single file to care about.

This has been rejected due to the fact that Python is often installed as a root-level application and thus modifying the bytecode file for modules in the standard library is not always possible. In this situation integrators would need to guess at what a reasonable optimization level was for users for any/all situations. By allowing multiple optimization levels to co-exist simultaneously it frees integrators from having to guess what users want and allows users to utilize the optimization level they want.

References

[1] The compileall module (https://docs.python.org/3.5/library/compileall.html)
[2] The astoptimizer project (https://web.archive.org/web/20150909225454/https://pypi.python.org/pypi/astoptimizer)
[3] importlib.util.cache_from_source() (https://docs.python.org/3.5/library/importlib.html#importlib.util.cache_from_source)
[4] Implementation of importlib.util.cache_from_source() from CPython 3.4.3rc1 (https://github.com/python/cpython/blob/e55181f517bbfc875065ce86ed3e05cf0e0246fa/Lib/importlib/_bootstrap.py#L437)
[6] The py_compile module (https://docs.python.org/3.5/library/py_compile.html)
[7] The importlib.machinery module (https://docs.python.org/3.5/library/importlib.html#module-importlib.machinery)
[8] importlib.util.MAGIC_NUMBER (https://docs.python.org/3.5/library/importlib.html#importlib.util.MAGIC_NUMBER)
[9] Informal poll of file name format options on Google+ (https://web.archive.org/web/20160925163500/https://plus.google.com/+BrettCannon/posts/fZynLNwHWGm)
[10] The PyPy Project (https://www.pypy.org/)
[11] Implementation of PEP 488 (https://github.com/python/cpython/issues/67919)

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0488.rst

Last modified: 2025-02-01 08:59:27 GMT

