
Python Enhancement Proposals

PEP 432 – Restructuring the CPython startup sequence

Author:
Alyssa Coghlan <ncoghlan at gmail.com>, Victor Stinner <vstinner at python.org>, Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To:
Capi-SIG list
Status:
Withdrawn
Type:
Standards Track
Requires:
587
Created:
28-Dec-2012
Post-History:
28-Dec-2012, 02-Jan-2013, 30-Mar-2019, 28-Jun-2020

PEP Withdrawal

From late 2012 to mid 2020, this PEP provided general background and specific concrete proposals for making the CPython startup sequence easier to maintain and the CPython runtime easier to embed as part of a larger application.

For most of that time, the changes were maintained either in a separate feature branch, or else as underscore-prefixed private APIs in the main CPython repo.

In 2019, PEP 587 migrated a subset of those API changes to the public CPython API for Python 3.8+ (specifically, the PEP updated the interpreter runtime to offer an explicitly multi-stage struct-based configuration interface).

In June 2020, in response to a query from the Steering Council, the PEP authors decided that it made sense to withdraw the original PEP, as enough has changed since PEP 432 was first written that we think any further changes to the startup sequence and embedding API would be best formulated as a new PEP (or PEPs) that take into account not only the not-yet-implemented ideas from PEP 432 that weren’t considered sufficiently well validated to make their way into PEP 587, but also any feedback on the public PEP 587 API, and any other lessons that have been learned while adjusting the CPython implementation to be more embedding and subinterpreter friendly.

In particular, PEPs proposing the following changes, and any further infrastructure changes needed to enable them, would likely still be worth exploring:

  • shipping an alternate Python executable that ignores all user level settings and runs in isolated mode by default, and would hence be more suitable for execution of system level Python applications than the default interpreter
  • enhancing the zipapp module to support the creation of single-file executables from pure Python scripts (and potentially even Python extension modules, given the introduction of multi-phase extension module initialisation)
  • migrating the complex sys.path initialisation logic from C to Python in order to improve test suite coverage and the general maintainability of that code

Abstract

This PEP proposes a mechanism for restructuring the startup sequence for CPython, making it easier to modify the initialization behaviour of the reference interpreter executable, as well as making it easier to control CPython’s startup behaviour when creating an alternate executable or embedding it as a Python execution engine inside a larger application.

When implementation of this proposal is completed, interpreter startup will consist of three clearly distinct and independently configurable phases:

  • Python core runtime preinitialization
    • setting up memory management
    • determining the encodings used for system interfaces (including settings passed in for later configuration phase)
  • Python core runtime initialization
    • ensuring C API is ready for use
    • ensuring builtin and frozen modules are accessible
  • Main interpreter configuration
    • ensuring external modules are accessible
    • (Note: the name of this phase is quite likely to change)

Changes are also proposed that impact main module execution and subinterpreter initialization.

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate resolution for most of these should become clearer as the reference implementation is developed.

Proposal

This PEP proposes that initialization of the CPython runtime be split into three clearly distinct phases:

  • core runtime preinitialization
  • core runtime initialization
  • main interpreter configuration

(Earlier versions proposed only two phases, but experience with attempting to implement the PEP as an internal CPython refactoring showed that at least 3 phases are needed to get clear separation of concerns)

The proposed design also has significant implications for:

  • main module execution
  • subinterpreter initialization

In the new design, the interpreter will move through the following well-defined phases during the initialization sequence:

  • Uninitialized - haven’t even started the pre-initialization phase yet
  • Pre-Initialization - no interpreter available
  • Runtime Initialized - main interpreter partially available, subinterpreter creation not yet available
  • Initialized - main interpreter fully available, subinterpreter creation available

PEP 587 is a more detailed proposal that covers separating out the Pre-Initialization phase from the last two phases, but doesn’t allow embedding applications to run arbitrary code while in the “Runtime Initialized” state (instead, initializing the core runtime will also always fully initialize the main interpreter, as that’s the way the native CPython CLI still works in Python 3.8).

As a concrete use case to help guide any design changes, and to solve a known problem where the appropriate defaults for system utilities differ from those for running user scripts, this PEP proposes the creation and distribution of a separate system Python (system-python) executable which, by default, operates in “isolated mode” (as selected by the CPython -I switch), as well as the creation of an example stub binary that just runs an appended zip archive (permitting single-file pure Python executables) rather than going through the normal CPython startup sequence.
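As a rough illustration of what “isolated mode” already changes in today’s interpreter, the sketch below starts a child process with the existing -I switch and reads back the relevant sys.flags fields. The helper name isolated_flags is invented for this example; it is not part of the proposal, and no system-python executable is assumed to exist.

```python
import subprocess
import sys


def isolated_flags(python=sys.executable):
    """Return (isolated, ignore_environment, no_user_site) from a child run with -I.

    Hypothetical helper for illustration only; it just wraps the existing -I switch.
    """
    code = (
        "import sys; "
        "print(sys.flags.isolated, sys.flags.ignore_environment, "
        "sys.flags.no_user_site)"
    )
    result = subprocess.run(
        [python, "-I", "-c", code],
        capture_output=True, text=True, check=True,
    )
    return tuple(int(value) for value in result.stdout.split())


if __name__ == "__main__":
    # -I implies -E (ignore PYTHON* environment variables) and -s (no user site dir)
    print(isolated_flags())
```

A system-level tool that wants these defaults today has to pass -I explicitly (for example in its shebang line); the proposed executable would make them the default.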

To keep the implementation complexity under control, this PEP does not propose wholesale changes to the way the interpreter state is accessed at runtime. Changing the order in which the existing initialization steps occur in order to make the startup sequence easier to maintain is already a substantial change, and attempting to make those other changes at the same time will make the change significantly more invasive and much harder to review. However, such proposals may be suitable topics for follow-on PEPs or patches - one key benefit of this PEP and its related subproposals is decreasing the coupling between the internal storage model and the configuration interface, so such changes should be easier once this PEP has been implemented.

Background

Over time, CPython’s initialization sequence has become progressively more complicated, offering more options, as well as performing more complex tasks (such as configuring the Unicode settings for OS interfaces in Python 3[10], bootstrapping a pure Python implementation of the import system, and implementing an isolated mode more suitable for system applications that run with elevated privileges[6]).

Much of this complexity is formally accessible only through the Py_Main and Py_Initialize APIs, offering embedding applications little opportunity for customisation. This creeping complexity also makes life difficult for maintainers, as much of the configuration needs to take place prior to the Py_Initialize call, meaning much of the Python C API cannot be used safely.

A number of proposals are on the table for even more sophisticated startup behaviour, such as better control over sys.path initialization (e.g. easily adding additional directories on the command line in a cross-platform fashion[7], controlling the configuration of sys.path[0][8]), and easier configuration of utilities like coverage tracing when launching Python subprocesses[9].

Rather than continuing to bolt such behaviour onto an already complicated system indefinitely, this PEP proposes to start simplifying the status quo by introducing a more structured startup sequence, with the aim of making these further feature requests easier to implement.

Originally the entire proposal was maintained in this one PEP, but that proved impractical, so as parts of the proposed design stabilised, they were split out into their own PEPs, allowing progress to be made even while the details of the overall design are still evolving.

Key Concerns

There are a few key concerns that any change to the startup sequence needs to take into account.

Maintainability

The CPython startup sequence as of Python 3.6 was difficult to understand, and even more difficult to modify. It was not clear what state the interpreter was in while much of the initialization code executed, leading to behaviour such as lists, dictionaries and Unicode values being created prior to the call to Py_Initialize when the -X or -W options are used[1].

By moving to an explicitly multi-phase startup sequence, developers should only need to understand:

  • which APIs and features are available prior to pre-configuration (essentially none, except for the pre-configuration API itself)
  • which APIs and features are available prior to core runtime configuration (using them will implicitly run the pre-configuration with default settings that match the behaviour of Python 3.6 if the pre-configuration hasn’t been run explicitly)
  • which APIs and features are only available after the main interpreter has been fully configured (which will hopefully be a relatively small subset of the full C API)

The first two aspects of that are covered by PEP 587, while the details of the latter distinction are still being considered.

By basing the new design on a combination of C structures and Python data types, it should also be easier to modify the system in the future to add new configuration options.

Testability

One of the problems with the complexity of the CPython startup sequence is the combinatorial explosion of possible interactions between different configuration settings.

This concern impacts both the design of the new initialisation system, and the proposed approach for getting there.

Performance

CPython is used heavily to run short scripts where the runtime is dominated by the interpreter initialization time. Any changes to the startup sequence should minimise their impact on the startup overhead.

Experience with the importlib migration suggests that the startup time is dominated by IO operations. However, to monitor the impact of any changes, a simple benchmark can be used to check how long it takes to start and then tear down the interpreter:

    python3 -m timeit -s "from subprocess import call" "call(['./python', '-Sc', 'pass'])"

Current numbers on my system for Python 3.7 (as built by the Fedora project):

    $ python3 -m timeit -s "from subprocess import call" "call(['python3', '-Sc', 'pass'])"
    50 loops, best of 5: 6.48 msec per loop

(TODO: run this microbenchmark with perf rather than the stdlib timeit)

This PEP is not expected to have any significant effect on the startup time, as it is aimed primarily at reordering the existing initialization sequence, without making substantial changes to the individual steps.

However, if this simple check suggests that the proposed changes to the initialization sequence may pose a performance problem, then a more sophisticated microbenchmark will be developed to assist in investigation.
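The same start-and-tear-down check can also be scripted, which makes it easier to compare two interpreter builds side by side. This is only a sketch using the stdlib timeit and subprocess modules; the function name startup_time and its parameters are assumptions made for this example, and the numbers it reports will vary by machine.

```python
import subprocess
import sys
import timeit


def startup_time(python=sys.executable, number=5, repeat=3):
    """Return the best observed time in seconds for one `python -S -c pass` launch.

    Hypothetical helper; `python` may point at any interpreter binary to compare.
    """
    command = [python, "-S", "-c", "pass"]
    # Each timing covers `number` launches; keep the fastest batch, as timeit's CLI does
    timings = timeit.repeat(
        lambda: subprocess.call(command), number=number, repeat=repeat
    )
    return min(timings) / number


if __name__ == "__main__":
    print(f"{startup_time() * 1000:.2f} msec per launch")
```

Passing a path such as './python' as the first argument would time a freshly built interpreter instead of the one running the script.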

Required Configuration Settings

See PEP 587 for a detailed listing of CPython interpreter configuration settings and the various means available for setting them.

Implementation Strategy

An initial attempt was made at implementing an earlier version of this PEP for Python 3.4[2], with one of the significant problems encountered being merge conflicts after the initial structural changes were put in place to start the refactoring process. Unlike some other previous major changes, such as the switch to an AST-based compiler in Python 2.5, or the switch to the importlib implementation of the import system in Python 3.3, there is no clear way to structure a draft implementation that won’t be prone to the kinds of merge conflicts that afflicted the original attempt.

Accordingly, the implementation strategy was revised to instead first implement this refactoring as a private API for CPython 3.7, and then review the viability of exposing the new functions and structures as public API elements in CPython 3.8.

After the initial merge, Victor Stinner then proceeded to actually migrate settings to the new structure in order to successfully implement the PEP 540 UTF-8 mode changes (which required the ability to track all settings that had previously been decoded with the locale encoding, and decode them again using UTF-8 instead). Eric Snow also migrated a number of internal subsystems over as part of making the subinterpreter feature more robust.

That work showed that the detailed design originally proposed in this PEP had a range of practical issues, so Victor designed and implemented an improved private API (inspired by an earlier iteration of this PEP), which PEP 587 proposes to promote to a public API in Python 3.8.

Design Details

Note

The API details here are still very much in flux. The header files that show the current state of the private API are mainly:

PEP 587 covers the aspects of the API that are considered potentially stable enough to make public. Where a proposed API is covered by that PEP, “(see PEP 587)” is added to the text below.

The main theme of this proposal is to initialize the core language runtime and create a partially initialized interpreter state for the main interpreter much earlier in the startup process. This will allow most of the CPython API to be used during the remainder of the initialization process, potentially simplifying a number of operations that currently need to rely on basic C functionality rather than being able to use the richer data structures provided by the CPython C API.

PEP 587 covers a subset of that task, which is splitting out the components that even the existing “May be called before Py_Initialize” interfaces need (like memory allocators and operating system interface encoding details) into a separate pre-configuration step.

In the following, the term “embedding application” also covers the standard CPython command line application.

Interpreter Initialization Phases

The following distinct interpreter initialisation phases are proposed:

  • Uninitialized:
    • Not really a phase, but the absence of a phase
    • Py_IsInitializing() returns 0
    • Py_IsRuntimeInitialized() returns 0
    • Py_IsInitialized() returns 0
    • The embedding application determines which memory allocator to use, and which encoding to use to access operating system interfaces (or chooses to delegate those decisions to the Python runtime)
    • Application starts the initialization process by calling one of the Py_PreInitialize APIs (see PEP 587)
  • Runtime Pre-Initialization:
    • no interpreter is available
    • Py_IsInitializing() returns 1
    • Py_IsRuntimeInitialized() returns 0
    • Py_IsInitialized() returns 0
    • The embedding application determines the settings required to initialize the core CPython runtime and create the main interpreter and moves to the next phase by calling Py_InitializeRuntime
    • Note: as of PEP 587, the embedding application instead calls Py_Main(), Py_UnixMain, or one of the Py_Initialize APIs, and hence jumps directly to the Initialized state.
  • Main Interpreter Initialization:
    • the builtin data types and other core runtime services are available
    • the main interpreter is available, but only partially configured
    • Py_IsInitializing() returns 1
    • Py_IsRuntimeInitialized() returns 1
    • Py_IsInitialized() returns 0
    • The embedding application determines and applies the settings required to complete the initialization process by calling Py_InitializeMainInterpreter
    • Note: as of PEP 587, this state is not reachable via any public API, it only exists as an implicit internal state while one of the Py_Initialize functions is running
  • Initialized:
    • the main interpreter is available and fully operational, but __main__ related metadata is incomplete
    • Py_IsInitializing() returns 0
    • Py_IsRuntimeInitialized() returns 1
    • Py_IsInitialized() returns 1
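The phase list above amounts to a small state machine, and it can help to see the three query results side by side. The following is a toy model written in Python, not a binding to any CPython API: the class RuntimeModel, the Phase enum, and all method names are invented for this sketch. It also simplifies by requiring the transitions in strict order.

```python
from enum import Enum


class Phase(Enum):
    UNINITIALIZED = "uninitialized"
    PRE_INITIALIZATION = "runtime pre-initialization"
    MAIN_INTERPRETER_INITIALIZATION = "main interpreter initialization"
    INITIALIZED = "initialized"


class RuntimeModel:
    """Toy model of the proposed phases (hypothetical, for illustration only)."""

    def __init__(self):
        self.phase = Phase.UNINITIALIZED

    # Stand-ins for Py_IsInitializing / Py_IsRuntimeInitialized / Py_IsInitialized
    def is_initializing(self):
        return int(self.phase in (Phase.PRE_INITIALIZATION,
                                  Phase.MAIN_INTERPRETER_INITIALIZATION))

    def is_runtime_initialized(self):
        return int(self.phase in (Phase.MAIN_INTERPRETER_INITIALIZATION,
                                  Phase.INITIALIZED))

    def is_initialized(self):
        return int(self.phase is Phase.INITIALIZED)

    def queries(self):
        return (self.is_initializing(),
                self.is_runtime_initialized(),
                self.is_initialized())

    # Stand-ins for the three transition calls
    def pre_initialize(self):
        self._advance(Phase.UNINITIALIZED, Phase.PRE_INITIALIZATION)

    def initialize_runtime(self):
        self._advance(Phase.PRE_INITIALIZATION,
                      Phase.MAIN_INTERPRETER_INITIALIZATION)

    def initialize_main_interpreter(self):
        self._advance(Phase.MAIN_INTERPRETER_INITIALIZATION, Phase.INITIALIZED)

    def _advance(self, expected, target):
        # Simplification: out-of-order or repeated calls are rejected outright
        if self.phase is not expected:
            raise RuntimeError(
                f"cannot enter {target.value!r} from {self.phase.value!r}"
            )
        self.phase = target
```

Walking a RuntimeModel through pre_initialize(), initialize_runtime() and initialize_main_interpreter() yields the query tuples (1, 0, 0), (1, 1, 0) and (0, 1, 1) in turn, matching the values listed for each phase.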

Invocation of Phases

All listed phases will be used by the standard CPython interpreter and the proposed System Python interpreter.

An embedding application may still continue to leave initialization almost entirely under CPython’s control by using the existing Py_Initialize or Py_Main() APIs - backwards compatibility will be preserved.

Alternatively, if an embedding application wants greater control over CPython’s initial state, it will be able to use the new, finer grained API to step through the initialization process itself.

PEP 587 covers an initial iteration of that API, separating out the pre-initialization phase without attempting to separate core runtime initialization from main interpreter initialization.

Uninitialized State

The uninitialized state is where an embedding application determines the settings which are required in order to be able to correctly pass configuration settings to the embedded Python runtime.

This covers telling Python which memory allocator to use, as well as which text encoding to use when processing provided settings.

PEP 587 defines the settings needed to exit this state in its PyPreConfig struct.

A new query API will allow code to determine if the interpreter hasn’t even started the initialization process:

    int Py_IsInitializing();

The query for a completely uninitialized environment would then be !(Py_IsInitialized() || Py_IsInitializing()).

Runtime Pre-Initialization Phase

Note

In PEP 587, the settings for this phase are not yet separated out, and are instead only available through the combined PyConfig struct

The pre-initialization phase is where an embedding application determines the settings which are absolutely required before the CPython runtime can be initialized at all. Currently, the primary configuration settings in this category are those related to the randomised hash algorithm - the hash algorithms must be consistent for the lifetime of the process, and so they must be in place before the core interpreter is created.

The essential settings needed are a flag indicating whether or not to use a specific seed value for the randomised hashes, and if so, the specific value for the seed (a seed value of zero disables randomised hashing). In addition, due to the possible use of PYTHONHASHSEED in configuring the hash randomisation, the question of whether or not to consider environment variables must also be addressed early. Finally, to support the CPython build process, an option is offered to completely disable the import system.

The proposed APIs for this step in the startup sequence are:

    PyInitError Py_InitializeRuntime(
        const PyRuntimeConfig *config
    );
    PyInitError Py_InitializeRuntimeFromArgs(
        const PyRuntimeConfig *config,
        int argc, char **argv
    );
    PyInitError Py_InitializeRuntimeFromWideArgs(
        const PyRuntimeConfig *config,
        int argc, wchar_t **argv
    );

If Py_IsInitializing() is false, the Py_InitializeRuntime functions will implicitly call the corresponding Py_PreInitialize function. The use_environment setting will be passed down, while other settings will be processed according to their defaults, as described in PEP 587.

The PyInitError return type is defined in PEP 587, and allows an embedding application to gracefully handle Python runtime initialization failures, rather than having the entire process abruptly terminated by Py_FatalError.

The new PyRuntimeConfig struct holds the settings required for preliminary configuration of the core runtime and creation of the main interpreter:

    /* Note: if changing anything in PyRuntimeConfig, also update
     * PyRuntimeConfig_INIT */
    typedef struct {
        bool use_environment;     /* as in PyPreConfig, PyConfig from PEP 587 */
        int use_hash_seed;        /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
        unsigned long hash_seed;  /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
        bool _install_importlib;  /* Needed by freeze_importlib */
    } PyRuntimeConfig;

    /* Rely on the "designated initializer" feature of C99 */
    #define PyRuntimeConfig_INIT {.use_hash_seed=-1}

The core configuration settings pointer may be NULL, in which case the default values are as specified in PyRuntimeConfig_INIT.

The PyRuntimeConfig_INIT macro is designed to allow easy initialization of a struct instance with sensible defaults:

    PyRuntimeConfig runtime_config = PyRuntimeConfig_INIT;

use_environment controls the processing of all Python related environment variables. If the flag is true, then PYTHONHASHSEED is processed normally. Otherwise, all Python-specific environment variables are considered undefined (exceptions may be made for some OS specific environment variables, such as those used on Mac OS X to communicate between the App bundle and the main Python binary).

use_hash_seed controls the configuration of the randomised hash algorithm. If it is zero, then randomised hashes with a random seed will be used. If it is positive, then the value in hash_seed will be used to seed the random number generator. If the hash_seed is zero in this case, then the randomised hashing is disabled completely.

If use_hash_seed is negative (and use_environment is true), then CPython will inspect the PYTHONHASHSEED environment variable. If the environment variable is not set, is set to the empty string, or to the value "random", then randomised hashes with a random seed will be used. If the environment variable is set to the string "0" the randomised hashing will be disabled. Otherwise, the hash seed is expected to be a string representation of an integer in the range [0; 4294967295].

To make it easier for embedding applications to use the PYTHONHASHSEED processing with a different data source, the following helper function will be added to the C API:

    int Py_ReadHashSeed(char *seed_text,
                        int *use_hash_seed,
                        unsigned long *hash_seed);

This function accepts a seed string in seed_text and converts it to the appropriate flag and seed values. If seed_text is NULL, the empty string or the value "random", both use_hash_seed and hash_seed will be set to zero. Otherwise, use_hash_seed will be set to 1 and the seed text will be interpreted as an integer and reported as hash_seed. On success the function will return zero. A non-zero return value indicates an error (most likely in the conversion to an integer).
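The rules for use_hash_seed, hash_seed and PYTHONHASHSEED are easier to check when written out as code. The sketch below is a Python model of the described behaviour, not the proposed C function: read_hash_seed and resolve_hashing are names invented here, the tuple return value stands in for the C output parameters, and two details are assumptions of this sketch rather than statements from the PEP: out-of-range or non-decimal text is reported as an error, and a negative use_hash_seed with use_environment false falls back to a random seed.

```python
MAX_HASH_SEED = 4294967295  # upper bound of the range quoted in the text


def read_hash_seed(seed_text):
    """Model of the proposed helper: return (status, use_hash_seed, hash_seed)."""
    if seed_text is None or seed_text in ("", "random"):
        return 0, 0, 0
    # Assumption: only plain ASCII decimal digits count as a valid seed
    if not (seed_text.isascii() and seed_text.isdigit()):
        return -1, 0, 0
    seed = int(seed_text)
    if seed > MAX_HASH_SEED:
        return -1, 0, 0
    return 0, 1, seed


def resolve_hashing(use_environment, use_hash_seed, hash_seed, environ):
    """Return "random", "disabled", or ("seeded", seed) for the given settings."""
    if use_hash_seed < 0:
        # Negative flag: defer to PYTHONHASHSEED, but only if the environment is used
        text = environ.get("PYTHONHASHSEED") if use_environment else None
        status, use_hash_seed, hash_seed = read_hash_seed(text)
        if status != 0:
            raise ValueError(f"invalid PYTHONHASHSEED value: {text!r}")
    if use_hash_seed == 0:
        return "random"
    if hash_seed == 0:
        return "disabled"
    return ("seeded", hash_seed)
```

For example, resolve_hashing(True, -1, 0, {"PYTHONHASHSEED": "0"}) gives "disabled", while the same call with an empty mapping gives "random". An embedding application could feed text from its own settings file through the same path instead of the process environment.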

The _install_importlib setting is used as part of the CPython build process to create an interpreter with no import capability at all. It is considered private to the CPython development team (hence the leading underscore), as the only currently supported use case is to permit compiler changes that invalidate the previously frozen bytecode for importlib._bootstrap without breaking the build process.

The aim is to keep this initial level of configuration as small as possible in order to keep the bootstrapping environment consistent across different embedding applications. If we can create a valid interpreter state without the setting, then the setting should appear solely in the comprehensive PyConfig struct rather than in the core runtime configuration.

A new query API will allow code to determine if the interpreter is in the bootstrapping state between the core runtime initialization and the creation of the main interpreter state and the completion of the bulk of the main interpreter initialization process:

    int Py_IsRuntimeInitialized();

Attempting to call Py_InitializeRuntime() again when Py_IsRuntimeInitialized() is already true is reported as a user configuration error. (TBC, as existing public initialisation APIs support being called multiple times without error, and simply ignore changes to any write-once settings. It may make sense to keep that behaviour rather than trying to make the new API stricter than the old one)

As frozen bytecode may now be legitimately run in an interpreter which is not yet fully initialized, sys.flags will gain a new initialized flag.

With the core runtime initialised, the main interpreter and most of the CPython C API should be fully functional except that:

  • compilation is not allowed (as the parser and compiler are not yet configured properly)
  • creation of subinterpreters is not allowed
  • creation of additional thread states is not allowed
  • The following attributes in the sys module are all either missing or None:
    • sys.path
    • sys.argv
    • sys.executable
    • sys.base_exec_prefix
    • sys.base_prefix
    • sys.exec_prefix
    • sys.prefix
    • sys.warnoptions
    • sys.dont_write_bytecode
    • sys.stdin
    • sys.stdout
  • The filesystem encoding is not yet defined
  • The IO encoding is not yet defined
  • CPython signal handlers are not yet installed
  • Only builtin and frozen modules may be imported (due to above limitations)
  • sys.stderr is set to a temporary IO object using unbuffered binary mode
  • The sys.flags attribute exists, but the individual flags may not yet have their final values.
  • The sys.flags.initialized attribute is set to 0
  • The warnings module is not yet initialized
  • The __main__ module does not yet exist

<TBD: identify any other notable missing functionality>

The main things made available by this step will be the core Python data types, in particular dictionaries, lists and strings. This allows them to be used safely for all of the remaining configuration steps (unlike the status quo).

In addition, the current thread will possess a valid Python thread state, allowing any further configuration data to be stored on the main interpreter object rather than in C process globals.

Any call to Py_InitializeRuntime() must have a matching call to Py_Finalize(). It is acceptable to skip calling Py_InitializeMainInterpreter() in between (e.g. if attempting to build the main interpreter configuration settings fails).

Determining the remaining configuration settings

The next step in the initialization sequence is to determine the remaining settings needed to complete the process. No changes are made to the interpreter state at this point. The core APIs for this step are:

    int Py_BuildPythonConfig(
        PyConfigAsObjects *py_config, const PyConfig *c_config
    );
    int Py_BuildPythonConfigFromArgs(
        PyConfigAsObjects *py_config, const PyConfig *c_config,
        int argc, char **argv
    );
    int Py_BuildPythonConfigFromWideArgs(
        PyConfigAsObjects *py_config, const PyConfig *c_config,
        int argc, wchar_t **argv
    );

The py_config argument should be a pointer to a PyConfigAsObjects struct (which may be a temporary one stored on the C stack). For any already configured value (i.e. any non-NULL pointer), CPython will sanity check the supplied value, but otherwise accept it as correct.

A struct is used rather than a Python dictionary as the struct is easier to work with from C, the list of supported fields is fixed for a given CPython version and only a read-only view needs to be exposed to Python code (which is relatively straightforward, thanks to the infrastructure already put in place to expose sys.implementation).

Unlike Py_InitializeRuntime, this call will raise a Python exception and report an error return rather than returning a Python initialization specific C struct if a problem is found with the config data.

Any supported configuration setting which is not already set will be populated appropriately in the supplied configuration struct. The default configuration can be overridden entirely by setting the value before calling Py_BuildPythonConfig. The provided value will then also be used in calculating any other settings derived from that value.

Alternatively, settings may be overridden after the Py_BuildPythonConfig call (this can be useful if an embedding application wants to adjust a setting rather than replace it completely, such as removing sys.path[0]).

The c_config argument is an optional pointer to a PyConfig structure, as defined in PEP 587. If provided, it is used in preference to reading settings directly from the environment or process global state.

Merely reading the configuration has no effect on the interpreter state: it only modifies the passed in configuration struct. The settings are not applied to the running interpreter until the Py_InitializeMainInterpreter call (see below).
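The "fill in whatever is still unset" contract described above can be pictured with a small Python model. Everything in it is hypothetical: ConfigModel, build_config, the three fields chosen, the default prefix, and in particular the derivation rules (exec_prefix copied from prefix, a single import path entry built from prefix) are invented to show the override-before and adjust-after patterns, and do not reflect how CPython actually computes these values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConfigModel:
    """Stand-in for a struct of object pointers; None plays the role of NULL."""
    prefix: Optional[str] = None
    exec_prefix: Optional[str] = None
    import_path: Optional[List[str]] = None


def build_config(config, default_prefix="/usr"):
    """Populate unset fields in place; preset values are kept and feed derived ones.

    Only the passed-in object is modified, mirroring the "no interpreter state
    changes" rule. Returns 0 on success, like the C-style int result.
    """
    if config.prefix is None:
        config.prefix = default_prefix
    if config.exec_prefix is None:
        # Invented rule: derive from whatever prefix is now in effect
        config.exec_prefix = config.prefix
    if config.import_path is None:
        # Invented rule: a script-directory placeholder plus one library directory
        config.import_path = ["", config.prefix + "/lib/python"]
    return 0
```

Setting a field before the call replaces the default and flows into derived values: build_config(ConfigModel(prefix="/opt/app")) leaves exec_prefix at "/opt/app". Adjusting after the call, for instance dropping the first import_path entry, changes one computed value without having to reproduce the rest.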

Supported configuration settings

The interpreter configuration is split into two parts: settings which are either relevant only to the main interpreter or must be identical across the main interpreter and all subinterpreters, and settings which may vary across subinterpreters.

NOTE: For initial implementation purposes, only the flag indicating whether or not the interpreter is the main interpreter will be configured on a per interpreter basis. Other fields will be reviewed for whether or not they can feasibly be made interpreter specific over the course of the implementation.

Note

The list of config fields below is currently out of sync with PEP 587. Where they differ, PEP 587 takes precedence.

The PyConfigAsObjects struct mirrors the PyConfig struct from PEP 587, but uses full Python objects to store values, rather than C level data types. It adds raw_argv and argv list fields, so later initialisation steps don’t need to accept those separately.

Fields are always pointers to Python data types, with unset values indicated by NULL:

    typedef struct {
        /* Argument processing */
        PyListObject *raw_argv;
        PyListObject *argv;
        PyListObject *warnoptions; /* -W switch, PYTHONWARNINGS */
        PyDictObject *xoptions;    /* -X switch */

        /* Filesystem locations */
        PyUnicodeObject *program_name;
        PyUnicodeObject *executable;
        PyUnicodeObject *prefix;           /* PYTHONHOME */
        PyUnicodeObject *exec_prefix;      /* PYTHONHOME */
        PyUnicodeObject *base_prefix;      /* pyvenv.cfg */
        PyUnicodeObject *base_exec_prefix; /* pyvenv.cfg */

        /* Site module */
        PyBoolObject *enable_site_config; /* -S switch (inverted) */
        PyBoolObject *no_user_site;       /* -s switch, PYTHONNOUSERSITE */

        /* Import configuration */
        PyBoolObject *dont_write_bytecode; /* -B switch, PYTHONDONTWRITEBYTECODE */
        PyBoolObject *ignore_module_case;  /* PYTHONCASEOK */
        PyListObject *import_path;         /* PYTHONPATH (etc) */

        /* Standard streams */
        PyBoolObject    *use_unbuffered_io; /* -u switch, PYTHONUNBUFFERED */
        PyUnicodeObject *stdin_encoding;    /* PYTHONIOENCODING */
        PyUnicodeObject *stdin_errors;      /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stdout_errors;     /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_encoding;   /* PYTHONIOENCODING */
        PyUnicodeObject *stderr_errors;     /* PYTHONIOENCODING */

        /* Filesystem access */
        PyUnicodeObject *fs_encoding;

        /* Debugging output */
        PyBoolObject *debug_parser; /* -d switch, PYTHONDEBUG */
        PyLongObject *verbosity;    /* -v switch */

        /* Code generation */
        PyLongObject *bytes_warnings; /* -b switch */
        PyLongObject *optimize;       /* -O switch */

        /* Signal handling */
        PyBoolObject *install_signal_handlers;

        /* Implicit execution */
        PyUnicodeObject *startup_file; /* PYTHONSTARTUP */

        /* Main module
         *
         * If prepare_main is set, at most one of the main_* settings should
         * be set before calling PyRun_PrepareMain (Py_ReadMainInterpreterConfig
         * will set one of them based on the command line arguments if
         * prepare_main is non-zero when that API is called).
         */
        PyBoolObject    *prepare_main;
        PyUnicodeObject *main_source;       /* -c switch */
        PyUnicodeObject *main_path;         /* filesystem path */
        PyUnicodeObject *main_module;       /* -m switch */
        PyCodeObject    *main_code;         /* Run directly from a code object */
        PyObject        *main_stream;       /* Run from stream */
        PyBoolObject    *run_implicit_code; /* Run implicit code during prep */

        /* Interactive main
         *
         * Note: Settings related to interactive mode are very much in flux.
         */
        PyObject     *prompt_stream; /* Output interactive prompt */
        PyBoolObject *show_banner;   /* -q switch (inverted) */
        PyBoolObject *inspect_main;  /* -i switch, PYTHONINSPECT */

    } PyConfigAsObjects;

The PyInterpreterConfig struct holds the settings that may vary between the main interpreter and subinterpreters. For the main interpreter, these settings are automatically populated by Py_InitializeMainInterpreter().

    typedef struct {
        PyBoolObject *is_main_interpreter; /* Easily check for subinterpreters */
    } PyInterpreterConfig;

As these structs consist solely of object pointers, no explicit initializer definitions are needed - C99’s default initialization of struct memory to zero is sufficient.

Completing the main interpreter initialization

The final step in the initialization process is to actually put the configuration settings into effect and finish bootstrapping the main interpreter up to full operation:

    int Py_InitializeMainInterpreter(const PyConfigAsObjects *config);

Like Py_BuildPythonConfig, this call will raise an exception and report an error return rather than exhibiting fatal errors if a problem is found with the config data. (TBC, as existing public initialisation APIs support being called multiple times without error, and simply ignore changes to any write-once settings. It may make sense to keep that behaviour rather than trying to make the new API stricter than the old one)

All configuration settings are required - the configuration struct should always be passed through Py_BuildPythonConfig to ensure it is fully populated.

After a successful call, Py_IsInitialized() will become true and Py_IsInitializing() will become false. The caveats described above for the interpreter during the phase where only the core runtime is initialized will no longer hold.

Attempting to call Py_InitializeMainInterpreter() again when Py_IsInitialized() is true is an error.

However, some metadata related to the__main__ module may still beincomplete:

  • sys.argv[0] may not yet have its final value
    • it will be -m when executing a module or package with CPython
    • it will be the same as sys.path[0] rather than the location of the __main__ module when executing a valid sys.path entry (typically a zipfile or directory)
    • otherwise, it will be accurate:
      • the script name if running an ordinary script
      • -c if executing a supplied string
      • - or the empty string if running from stdin
  • the metadata in the __main__ module will still indicate it is a builtin module

This function will normally implicitly import site as its final operation (after Py_IsInitialized() is already set). Setting the “enable_site_config” flag to Py_False in the configuration settings will disable this behaviour, as well as eliminating any side effects on global state if import site is later explicitly executed in the process.

Preparing the main module

Note

In PEP 587, PyRun_PrepareMain and PyRun_ExecMain are not exposed separately, and are instead accessed through a Py_RunMain API that both prepares and executes main, and then finalizes the Python interpreter.

This subphase completes the population of the __main__ module related metadata, without actually starting execution of the __main__ module code.

It is handled by calling the following API:

int PyRun_PrepareMain();

This operation is only permitted for the main interpreter, and will raise RuntimeError when invoked from a thread where the current thread state belongs to a subinterpreter.

The actual processing is driven by the main related settings stored in the interpreter state as part of the configuration struct.

If prepare_main is zero, this call does nothing.

If all of main_source, main_path, main_module, main_stream and main_code are NULL, this call does nothing.

If more than one of main_source, main_path, main_module, main_stream or main_code is set, RuntimeError will be reported.

If main_code is already set, then this call does nothing.

If main_stream is set, and run_implicit_code is also set, then the file identified in startup_file will be read, compiled and executed in the __main__ namespace.

If main_source, main_path or main_module is set, then this call will take whatever steps are needed to populate main_code:

  • For main_source, the supplied string will be compiled and saved to main_code.
  • For main_path:
    • if the supplied path is recognised as a valid sys.path entry, it is inserted as sys.path[0], main_module is set to __main__ and processing continues as for main_module below.
    • otherwise, the path is read as a CPython bytecode file
    • if that fails, it is read as a Python source file and compiled
    • in the latter two cases, the code object is saved to main_code and __main__.__file__ is set appropriately
  • For main_module:
    • any parent package is imported
    • the loader for the module is determined
    • if the loader indicates the module is a package, add .__main__ to the end of main_module and try again (if the final name segment is already .__main__ then fail immediately)
    • once the module source code is located, save the compiled module code as main_code and populate the following attributes in __main__ appropriately: __name__, __loader__, __file__, __cached__, __package__.

(Note: the behaviour described in this section isn’t new, it’s a write-up of the current behaviour of the CPython interpreter adjusted for the new configuration system)
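The decision rules for PyRun_PrepareMain can be summarised as a small Python model. This is only an illustrative sketch: MainConfig and prepare_main_action are hypothetical names that are not part of the proposal, the returned labels are made up for readability, and the real logic would be implemented in C. Treating a main_stream without run_implicit_code as a no-op is an assumption, as the text does not spell that case out.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-in for the main-related fields of PyConfigAsObjects.
@dataclass
class MainConfig:
    prepare_main: bool = True
    main_source: Optional[str] = None
    main_path: Optional[str] = None
    main_module: Optional[str] = None
    main_code: Any = None
    main_stream: Any = None
    run_implicit_code: bool = False


_MAIN_FIELDS = ("main_source", "main_path", "main_module",
                "main_stream", "main_code")


def prepare_main_action(config: MainConfig) -> str:
    """Return a label describing the step the preparation phase would take."""
    if not config.prepare_main:
        return "nothing"
    selected = [name for name in _MAIN_FIELDS
                if getattr(config, name) is not None]
    if not selected:
        return "nothing"
    if len(selected) > 1:
        raise RuntimeError("more than one main_* setting is set: "
                           + ", ".join(selected))
    (name,) = selected
    if name == "main_code":
        # The code object is already available, so there is nothing to prepare.
        return "nothing"
    if name == "main_stream":
        # Assumption: without run_implicit_code there is nothing to prepare.
        return "run startup_file" if config.run_implicit_code else "nothing"
    return {
        "main_source": "compile source",
        "main_path": "load path",
        "main_module": "locate module",
    }[name]
```

For example, prepare_main_action(MainConfig(main_module="timeit")) selects the module lookup branch, while supplying both main_source and main_module raises RuntimeError.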

Executing the main module

Note

In PEP 587, PyRun_PrepareMain and PyRun_ExecMain are not exposed separately, and are instead accessed through a Py_RunMain API that both prepares and executes main, and then finalizes the Python interpreter.

This subphase covers the execution of the actual __main__ module code.

It is handled by calling the following API:

int PyRun_ExecMain();

This operation is only permitted for the main interpreter, and will raise RuntimeError when invoked from a thread where the current thread state belongs to a subinterpreter.

The actual processing is driven by the main related settings stored in the interpreter state as part of the configuration struct.

If both main_stream and main_code are NULL, this call does nothing.

If both main_stream and main_code are set, RuntimeError will be reported.

If main_stream and prompt_stream are both set, main execution will be delegated to a new internal API:

int _PyRun_InteractiveMain(PyObject *input, PyObject *output);

If main_stream is set and prompt_stream is NULL, main execution will be delegated to a new internal API:

int _PyRun_StreamInMain(PyObject *input);

If main_code is set, main execution will be delegated to a new internal API:

int _PyRun_CodeInMain(PyCodeObject *code);

After execution of main completes, if inspect_main is set, or the PYTHONINSPECT environment variable has been set, then PyRun_ExecMain will invoke _PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__).
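The dispatch performed by PyRun_ExecMain can be modelled in a few lines of Python. This is a sketch for illustration only: exec_main_target is a hypothetical helper that merely names the internal API that would be selected, and it does not model the follow-up interactive session triggered by inspect_main or PYTHONINSPECT.

```python
def exec_main_target(main_stream=None, main_code=None, prompt_stream=None):
    """Name the internal API that the execution phase would delegate to."""
    if main_stream is None and main_code is None:
        return None  # nothing to execute
    if main_stream is not None and main_code is not None:
        raise RuntimeError("main_stream and main_code are both set")
    if main_stream is not None:
        if prompt_stream is not None:
            return "_PyRun_InteractiveMain"
        return "_PyRun_StreamInMain"
    return "_PyRun_CodeInMain"
```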

Internal Storage of Configuration Data

The interpreter state will be updated to include details of the configuration settings supplied during initialization by extending the interpreter state object with at least an embedded copy of the PyConfigAsObjects and PyInterpreterConfig structs.

For debugging purposes, the configuration settings will be exposed as a sys._configuration simple namespace (similar to sys.flags and sys.implementation). The attributes will themselves be simple namespaces corresponding to the two levels of configuration setting:

  • all_interpreters
  • active_interpreter

Field names will match those in the configuration structs, except for hash_seed, which will be deliberately excluded.

An underscored attribute is chosen deliberately, as these configuration settings are part of the CPython implementation, rather than part of the Python language definition. If new settings are needed to support cross-implementation compatibility in the standard library, then those should be agreed with the other implementations and exposed as new required attributes on sys.implementation, as described in PEP 421.

These are snapshots of the initial configuration settings. They are not modified by the interpreter during runtime (except as noted above).

Creating and Configuring Subinterpreters

As the new configuration settings are stored in the interpreter state, they need to be initialised when a new subinterpreter is created. This turns out to be trickier than one might expect due to PyThreadState_Swap(NULL); (which is fortunately exercised by CPython’s own embedding tests, allowing this problem to be detected during development).

To provide a straightforward solution for this case, the PEP proposes to add a new API:

PyInterpreterState *PyInterpreterState_Main();

This will be a counterpart to PyInterpreterState_Head(), only reporting the oldest currently existing interpreter rather than the newest. If Py_NewInterpreter() is called from a thread with an existing thread state, then the interpreter configuration for that thread will be used when initialising the new subinterpreter. If there is no current thread state, the configuration from PyInterpreterState_Main() will be used.

While the existing PyInterpreterState_Head() API could be used instead, that reference changes as subinterpreters are created and destroyed, while PyInterpreterState_Main() will always refer to the initial interpreter state created in Py_InitializeRuntime().

A new constraint is also added to the embedding API: attempting to delete the main interpreter while subinterpreters still exist will now be a fatal error.

Stable ABI

Most of the APIs proposed in this PEP are excluded from the stable ABI, as embedding a Python interpreter involves a much higher degree of coupling than merely writing an extension module.

The only newly exposed APIs that will be part of the stable ABI are the Py_IsInitializing() and Py_IsRuntimeInitialized() queries.

Build time configuration

This PEP makes no changes to the handling of build time configuration settings, and thus has no effect on the contents of sys.implementation or the result of sysconfig.get_config_vars().

Backwards Compatibility

Backwards compatibility will be preserved primarily by ensuring that Py_BuildPythonConfig() interrogates all the previously defined configuration settings stored in global variables and environment variables, and that Py_InitializeMainInterpreter() writes affected settings back to the relevant locations.

One acknowledged incompatibility is that some environment variables which are currently read lazily may instead be read once during interpreter initialization. As the reference implementation matures, these will be discussed in more detail on a case-by-case basis. The environment variables which are currently known to be looked up dynamically are:

  • PYTHONCASEOK: writing to os.environ['PYTHONCASEOK'] will no longer dynamically alter the interpreter’s handling of filename case differences on import (TBC)
  • PYTHONINSPECT: os.environ['PYTHONINSPECT'] will still be checked after execution of the __main__ module terminates

The Py_Initialize() style of initialization will continue to be supported. It will use (at least some elements of) the new API internally, but will continue to exhibit the same behaviour as it does today, ensuring that sys.argv is not populated until a subsequent PySys_SetArgv call (TBC). All APIs that currently support being called prior to Py_Initialize() will continue to do so, and will also support being called prior to Py_InitializeRuntime().

A System Python Executable

When executing system utilities with administrative access to a system, many of the default behaviours of CPython are undesirable, as they may allow untrusted code to execute with elevated privileges. The most problematic aspects are the fact that user site directories are enabled, environment variables are trusted and that the directory containing the executed file is placed at the beginning of the import path.

Issue 16499 [6] added a -I option to change the behaviour of the normal CPython executable, but this is a hard to discover solution (and adds yet another option to an already complex CLI). This PEP proposes to instead add a separate system-python executable.

Currently, providing a separate executable with different default behaviour would be prohibitively hard to maintain. One of the goals of this PEP is to make it possible to replace much of the hard to maintain bootstrapping code with more normal CPython code, as well as making it easier for a separate application to make use of key components of Py_Main. Including this change in the PEP is designed to help avoid acceptance of a design that sounds good in theory but proves to be problematic in practice.

Cleanly supporting this kind of “alternate CLI” is the main reason for the proposed changes to better expose the core logic for deciding between the different execution modes supported by CPython:

  • script execution
  • directory/zipfile execution
  • command execution (“-c” switch)
  • module or package execution (“-m” switch)
  • execution from stdin (non-interactive)
  • interactive stdin

Actually implementing this may also reveal the need for some better argument parsing infrastructure for use during the initializing phase.

Open Questions

  • Error details for Py_BuildPythonConfig and Py_InitializeMainInterpreter (these should become clearer as the implementation progresses)

Implementation

The reference implementation is being developed as a private API refactoring within the CPython reference interpreter (as attempting to maintain it as an independent project proved impractical).

PEP 587 extracts a subset of the proposal that is considered sufficiently stable to be worth proposing as a public API for Python 3.8.

The Status Quo (as of Python 3.6)

The current mechanisms for configuring the interpreter have accumulated in a fairly ad hoc fashion over the past 20+ years, leading to a rather inconsistent interface with varying levels of documentation.

Also see PEP 587 for further discussion of the existing settings and their handling.

(Note: some of the info below could probably be cleaned up and added to the C API documentation for 3.x - it’s all CPython specific, so it doesn’t belong in the language reference)

Ignoring Environment Variables

The -E command line option allows all environment variables to be ignored when initializing the Python interpreter. An embedding application can enable this behaviour by setting Py_IgnoreEnvironmentFlag before calling Py_Initialize().

In the CPython source code, the Py_GETENV macro implicitly checks this flag, and always produces NULL if it is set.
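The effect of the macro can be approximated in Python as follows. This is a sketch rather than a translation of the C source: py_getenv is a hypothetical helper, and it takes the flag and the environment mapping as explicit arguments so the behaviour is easy to see.

```python
import os


def py_getenv(name, ignore_environment, environ=os.environ):
    """Rough Python analogue of Py_GETENV: honour the -E flag, then look up."""
    if ignore_environment:
        return None  # corresponds to the macro producing NULL
    return environ.get(name)
```

From Python code, the current state of the flag is visible as sys.flags.ignore_environment.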

<TBD: I believe PYTHONCASEOK is checked regardless of this setting ><TBD: Does -E also ignore Windows registry keys? >

Randomised Hashing

Randomised hashing is controlled via the -R command line option (in releases prior to 3.3), as well as the PYTHONHASHSEED environment variable.

In Python 3.3, only the environment variable remains relevant. It can be used to disable randomised hashing (by using a seed value of 0) or else to force a specific hash seed (e.g. for repeatability of testing, or to share hash values between processes).

However, embedding applications must use the Py_HashRandomizationFlag to explicitly request hash randomisation (CPython sets it in Py_Main() rather than in Py_Initialize()).

The new configuration API should make it straightforward for an embedding application to reuse the PYTHONHASHSEED processing with a text based configuration setting provided by other means (e.g. a config file or separate environment variable).
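The repeatability that PYTHONHASHSEED provides can be observed from Python by launching child interpreters with a fixed seed. The helper name str_hash_with_seed is hypothetical, and the sketch assumes sys.executable points at a usable Python 3.7+ interpreter.

```python
import os
import subprocess
import sys


def str_hash_with_seed(text, seed):
    """Return hash(text) as computed by a child interpreter using the seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        # With -c, the extra argument arrives in the child as sys.argv[1].
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

Two calls with the same text and seed report the same value, which is what allows hash values to be shared between processes.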

Locating Python and the standard library

The location of the Python binary and the standard library is influenced by several elements. The algorithm used to perform the calculation is not documented anywhere other than in the source code [3], [4]. Even that description is incomplete, as it failed to be updated for the virtual environment support added in Python 3.3 (detailed in PEP 405).

These calculations are affected by the following function calls (made prior to calling Py_Initialize()) and environment variables:

  • Py_SetProgramName()
  • Py_SetPythonHome()
  • PYTHONHOME

The filesystem is also inspected for pyvenv.cfg files (see PEP 405) or, failing that, a lib/os.py (Windows) or lib/python$VERSION/os.py file.

The build time settings for PREFIX and EXEC_PREFIX are also relevant, as are some registry settings on Windows. The hardcoded fallbacks are based on the layout of the CPython source tree and build output when working in a source checkout.
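The results of these calculations are visible from Python through the sys module, which gives a quick way to check what a given interpreter worked out. The helper in_virtual_environment is a hypothetical name; comparing sys.prefix with sys.base_prefix is the usual check for a PEP 405 style virtual environment.

```python
import sys


def in_virtual_environment():
    """True when pyvenv.cfg redirected sys.prefix away from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


# Print the location settings calculated during startup.
for name in ("executable", "prefix", "exec_prefix",
             "base_prefix", "base_exec_prefix"):
    print(f"sys.{name} = {getattr(sys, name)!r}")
print("virtual environment:", in_virtual_environment())
```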

Configuring sys.path

An embedding application may call Py_SetPath() prior to Py_Initialize() to completely override the calculation of sys.path. It is not straightforward to only allow some of the calculations, as modifying sys.path after initialization is already complete means those modifications will not be in effect when standard library modules are imported during the startup sequence.

If Py_SetPath() is not used prior to the first call to Py_GetPath() (implicit in Py_Initialize()), then it builds on the location data calculations above to calculate suitable path entries, along with the PYTHONPATH environment variable.

<TBD: On Windows, there’s also a bunch of stuff to do with the registry>

The site module, which is implicitly imported at startup (unless disabled via the -S option) adds additional paths to this initial set of paths, as described in its documentation [5].

The -s command line option can be used to exclude the user site directory from the list of directories added. Embedding applications can control this by setting the Py_NoUserSiteDirectory global variable.

The following commands can be used to check the default path configurations for a given Python executable on a given system:

  • ./python -c "import sys, pprint; pprint.pprint(sys.path)" - standard configuration
  • ./python -s -c "import sys, pprint; pprint.pprint(sys.path)" - user site directory disabled
  • ./python -S -c "import sys, pprint; pprint.pprint(sys.path)" - all site path modifications disabled

(Note: you can see similar information using -m site instead of -c, but this is slightly misleading as it calls os.path.abspath on all of the path entries, making relative path entries look absolute. Using the site module also causes problems in the last case, as on Python versions prior to 3.3, explicitly importing site will carry out the path modifications -S avoids, while on 3.3+ combining -m site with -S currently fails)

The calculation of sys.path[0] is comparatively straightforward:

  • For an ordinary script (Python source or compiled bytecode), sys.path[0] will be the directory containing the script.
  • For a valid sys.path entry (typically a zipfile or directory), sys.path[0] will be that path
  • For an interactive session, running from stdin or when using the -c or -m switches, sys.path[0] will be the empty string, which the import system interprets as allowing imports from the current directory
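The three rules above can be written down as a small Python function. This sketch mirrors the rules as stated in this section rather than the C implementation: initial_sys_path0 and its mode labels are hypothetical, and details such as symlink resolution are ignored.

```python
import os


def initial_sys_path0(mode, target=None):
    """Return the sys.path[0] value described for each execution mode."""
    if mode == "script":
        # Directory containing the script (symlink handling is not modelled).
        return os.path.dirname(os.path.abspath(target))
    if mode == "path_entry":
        # A directory or zipfile that is itself a valid sys.path entry.
        return target
    if mode in {"interactive", "stdin", "command", "module"}:
        # The empty string means "import from the current directory".
        return ""
    raise ValueError(f"unknown execution mode: {mode!r}")
```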

Configuring sys.argv

Unlike most other settings discussed in this PEP, sys.argv is not set implicitly by Py_Initialize(). Instead, it must be set via an explicit call to PySys_SetArgv().

CPython calls this in Py_Main() after calling Py_Initialize(). The calculation of sys.argv[1:] is straightforward: they’re the command line arguments passed after the script name or the argument to the -c or -m options.

The calculation of sys.argv[0] is a little more complicated:

  • For an ordinary script (source or bytecode), it will be the script name
  • For a sys.path entry (typically a zipfile or directory) it will initially be the zipfile or directory name, but will later be changed by the runpy module to the full path to the imported __main__ module.
  • For a module specified with the -m switch, it will initially be the string "-m", but will later be changed by the runpy module to the full path to the executed module.
  • For a package specified with the -m switch, it will initially be the string "-m", but will later be changed by the runpy module to the full path to the executed __main__ submodule of the package.
  • For a command executed with -c, it will be the string "-c"
  • For explicitly requested input from stdin, it will be the string "-"
  • Otherwise, it will be the empty string

Embedding applications must call PySys_SetArgv themselves. The CPython logic for doing so is part of Py_Main() and is not exposed separately. However, the runpy module does provide roughly equivalent logic in runpy.run_module and runpy.run_path.
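As a minimal illustration of the runpy route, the sketch below writes a throwaway script to a temporary directory and executes it as __main__ with runpy.run_path, which temporarily sets sys.argv[0] to the script path while the code runs. The helper name run_script_as_main and the file name demo.py are arbitrary choices for the example.

```python
import os
import runpy
import tempfile

# The script records what it observes while running as __main__.
SCRIPT = "import sys\nargv0 = sys.argv[0]\nmodule_name = __name__\n"


def run_script_as_main(source):
    """Write source to a temporary script and execute it as __main__."""
    with tempfile.TemporaryDirectory() as tmpdir:
        script_path = os.path.join(tmpdir, "demo.py")
        with open(script_path, "w", encoding="utf-8") as stream:
            stream.write(source)
        # run_path returns the globals of the executed module as a dict.
        return runpy.run_path(script_path, run_name="__main__")


result = run_script_as_main(SCRIPT)
print(result["module_name"])
print(os.path.basename(result["argv0"]))
```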

Other configuration settings

TBD: Cover the initialization of the following in more detail:

  • Completely disabling the import system
  • The initial warning system state:
    • sys.warnoptions
    • (-W option, PYTHONWARNINGS)
  • Arbitrary extended options (e.g. to automatically enable faulthandler):
    • sys._xoptions
    • (-X option)
  • The filesystem encoding used by:
    • sys.getfilesystemencoding
    • os.fsencode
    • os.fsdecode
  • The IO encoding and buffering used by:
    • sys.stdin
    • sys.stdout
    • sys.stderr
    • (-u option, PYTHONIOENCODING, PYTHONUNBUFFERED)
  • Whether or not to implicitly cache bytecode files:
    • sys.dont_write_bytecode
    • (-B option, PYTHONDONTWRITEBYTECODE)
  • Whether or not to enforce correct case in filenames on case-insensitive platforms
    • os.environ["PYTHONCASEOK"]
  • The other settings exposed to Python code in sys.flags:
    • debug (Enable debugging output in the pgen parser)
    • inspect (Enter interactive interpreter after __main__ terminates)
    • interactive (Treat stdin as a tty)
    • optimize (__debug__ status, write .pyc or .pyo, strip doc strings)
    • no_user_site (don’t add the user site directory to sys.path)
    • no_site (don’t implicitly import site during startup)
    • ignore_environment (whether environment vars are used during config)
    • verbose (enable all sorts of random output)
    • bytes_warning (warnings/errors for implicit str/bytes interaction)
    • quiet (disable banner output even if verbose is also enabled or stdin is a tty and the interpreter is launched in interactive mode)
  • Whether or not CPython’s signal handlers should be installed
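The sys.flags entries listed above can be inspected directly from Python, which is a convenient way to confirm how a particular interpreter was launched. The helper name flag_summary is hypothetical; the attribute names are those listed in this section.

```python
import sys

# Names of the sys.flags fields discussed in this section.
FLAG_NAMES = (
    "debug", "inspect", "interactive", "optimize", "no_user_site",
    "no_site", "ignore_environment", "verbose", "bytes_warning", "quiet",
)


def flag_summary():
    """Return the current value of each listed sys.flags field."""
    return {name: getattr(sys.flags, name) for name in FLAG_NAMES}


for name, value in flag_summary().items():
    print(f"{name} = {value}")
```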

Much of the configuration of CPython is currently handled through C level global variables:

Py_BytesWarningFlag (-b)
Py_DebugFlag (-d option)
Py_InspectFlag (-i option, PYTHONINSPECT)
Py_InteractiveFlag (property of stdin, cannot be overridden)
Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
Py_NoSiteFlag (-S option)
Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFERED)
Py_VerboseFlag (-v option, PYTHONVERBOSE)

For the above variables, the conversion of command line options and environment variables to C global variables is handled by Py_Main, so each embedding application must set those appropriately in order to change them from their defaults.

Some configuration can only be provided as OS level environment variables:

PYTHONSTARTUP
PYTHONCASEOK
PYTHONIOENCODING

The Py_InitializeEx() API also accepts a boolean flag to indicate whether or not CPython’s signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory banner) is triggered only when standard input is reported as a terminal connection by the operating system.

TBD: Document how the “-x” option is handled (skips processing of the first comment line in the main script)

Also see detailed sequence of operations notes at [1].

References

[1] CPython interpreter initialization notes (http://wiki.python.org/moin/CPythonInterpreterInitialization)
[2] BitBucket Sandbox (https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits)
[3] *nix getpath implementation (http://hg.python.org/cpython/file/default/Modules/getpath.c)
[4] Windows getpath implementation (http://hg.python.org/cpython/file/default/PC/getpathp.c)
[5] Site module documentation (http://docs.python.org/3/library/site.html)
[6] Proposed CLI option for isolated mode (http://bugs.python.org/issue16499)
[7] Adding to sys.path on the command line (https://mail.python.org/pipermail/python-ideas/2010-October/008299.html) (https://mail.python.org/pipermail/python-ideas/2012-September/016128.html)
[8] Control sys.path[0] initialisation (http://bugs.python.org/issue13475)
[9] Enabling code coverage in subprocesses when testing (http://bugs.python.org/issue14803)
[10] Problems with PYTHONIOENCODING in Blender (http://bugs.python.org/issue16129)

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0432.rst

Last modified: 2025-02-01 08:59:27 GMT

