Note

This PEP effectively continues in a cleaner form in PEP 734. This PEP is kept as-is for the sake of the various sections of background information and deferred/rejected ideas that have been stripped from PEP 734.
CPython has supported multiple interpreters in the same process (AKA "subinterpreters") since version 1.5 (1997). The feature has been available via the C-API.[c-api] Multiple interpreters operate in relative isolation from one another, which facilitates novel alternative approaches to concurrency.
This proposal introduces the stdlib interpreters module. It exposes the basic functionality of multiple interpreters already provided by the C-API, along with basic support for communicating between interpreters. This module is especially relevant since PEP 684 introduced a per-interpreter GIL in Python 3.12.
Summary:
The interpreters module will provide a high-level interface to the multiple interpreter functionality, and wrap a new low-level _interpreters module (in the same way as the threading module). See the Examples section for concrete usage and use cases.
Along with exposing the existing (in CPython) multiple interpreter support, the module will also support a basic mechanism for passing data between interpreters. That involves setting "shareable" objects in the __main__ module of a target subinterpreter. Some such objects, like os.pipe(), may be used to communicate further. The module will also provide a minimal implementation of "channels" as a demonstration of cross-interpreter communication.
Note that objects are not shared between interpreters since they are tied to the interpreter in which they were created. Instead, the objects' data is passed between interpreters. See the Shared Data and API For Communication sections for more details about sharing/communicating between interpreters.
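The distinction between sharing objects and passing their data can be illustrated in plain Python. Here pickle stands in for the proposed mechanism (it is only an illustration, not the actual implementation): the receiving side ends up with an equal but distinct object.

```python
import pickle

# A stand-in for cross-interpreter passing: the proposed module copies an
# object's *data* rather than sharing the object itself.  A pickle
# round-trip gives the same observable effect for simple values.
original = [1, 'spam', b'eggs']
received = pickle.loads(pickle.dumps(original))

assert received == original      # same data...
assert received is not original  # ...but a different object
```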
Here is a summary of the API for the interpreters module. For a more in-depth explanation of the proposed classes and functions, see the "interpreters" Module API section below.
For creating and using interpreters:
| signature | description |
|---|---|
| list_all() -> [Interpreter] | Get all existing interpreters. |
| get_current() -> Interpreter | Get the currently running interpreter. |
| get_main() -> Interpreter | Get the main interpreter. |
| create() -> Interpreter | Initialize a new (idle) Python interpreter. |
| signature | description |
|---|---|
| class Interpreter | A single interpreter. |
| .id | The interpreter's ID (read-only). |
| .is_running() -> bool | Is the interpreter currently executing code? |
| .close() | Finalize and destroy the interpreter. |
| .set_main_attrs(**kwargs) | Bind "shareable" objects in __main__. |
| .get_main_attr(name) | Get a "shareable" object from __main__. |
| .exec(src_str, /) | Run the given source code in the interpreter (in the current thread). |
For communicating between interpreters:
| signature | description |
|---|---|
| is_shareable(obj) -> bool | Can the object's data be passed between interpreters? |
| create_channel() -> (RecvChannel, SendChannel) | Create a new channel for passing data between interpreters. |
An executor will be added that extends ThreadPoolExecutor to run per-thread tasks in subinterpreters. Initially, the only supported tasks will be whatever Interpreter.exec() takes (e.g. a str script). However, we may also support some functions, as well as eventually a separate method for pickling the task and arguments, to reduce friction (at the expense of performance for short-running tasks).
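The proposed executor does not exist yet, so the following is only a sketch of the per-thread-task idea, simulated with ordinary threads: each worker thread gets its own isolated namespace (standing in for a subinterpreter) via the initializer, and tasks are source strings, as with Interpreter.exec(). The helper names (_init_worker, run_source) are illustrative assumptions, not part of the proposal.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()

def _init_worker():
    # In the real proposal this would create a subinterpreter per thread,
    # e.g. interpreters.create(); a plain dict namespace stands in here.
    _local.ns = {'__name__': '__main__'}

def run_source(src):
    # Execute a source-string task in this thread's persistent namespace,
    # mirroring how Interpreter.exec() preserves state between calls.
    exec(src, _local.ns)
    return _local.ns.get('result')

with ThreadPoolExecutor(max_workers=2, initializer=_init_worker) as pool:
    fut = pool.submit(run_source, "result = 6 * 7")
    value = fut.result()

assert value == 42
```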
In practice, an extension that implements multi-phase init (PEP 489)is considered isolated and thus compatible with multiple interpreters.Otherwise it is “incompatible”.
Many extension modules are still incompatible. The maintainers and users of such extension modules will both benefit when they are updated to support multiple interpreters. In the meantime, users may become confused by failures when using multiple interpreters, which could negatively impact extension maintainers. See Concerns below.
To mitigate that impact and accelerate compatibility, we will do thefollowing:
* raise ImportError when an incompatible module is imported in a subinterpreter

Run isolated code in the current OS thread:

```python
interp = interpreters.create()
print('before')
interp.exec('print("during")')
print('after')
```
The same thing, but run in a separate OS thread:

```python
interp = interpreters.create()

def run():
    interp.exec('print("during")')

t = threading.Thread(target=run)
print('before')
t.start()
t.join()
print('after')
```
Pre-populate an interpreter before handling requests:

```python
interp = interpreters.create()
interp.exec(tw.dedent("""
    import some_lib
    import an_expensive_module
    some_lib.set_up()
    """))
wait_for_request()
interp.exec(tw.dedent("""
    some_lib.handle_request()
    """))
```
Handling an exception from the subinterpreter:

```python
interp = interpreters.create()
try:
    interp.exec(tw.dedent("""
        raise KeyError
        """))
except interpreters.RunFailedError as exc:
    print(f"got the error from the subinterpreter: {exc}")
```
Re-raising the original exception:

```python
interp = interpreters.create()
try:
    try:
        interp.exec(tw.dedent("""
            raise KeyError
            """))
    except interpreters.RunFailedError as exc:
        raise exc.__cause__
except KeyError:
    print("got a KeyError from the subinterpreter")
```
Note that this pattern is a candidate for later improvement.
Interact with the interpreter's __main__ namespace:

```python
interp = interpreters.create()
interp.set_main_attrs(a=1, b=2)
interp.exec(tw.dedent("""
    res = do_something(a, b)
    """))
res = interp.get_main_attr('res')
```
Synchronize using an OS pipe:

```python
interp = interpreters.create()
r1, s1 = os.pipe()
r2, s2 = os.pipe()

def task():
    interp.exec(tw.dedent(f"""
        import os
        os.read({r1}, 1)
        print('during B')
        os.write({s2}, b'\\x00')
        """))

t = threading.Thread(target=task)
t.start()
print('before')
os.write(s1, b'\x00')
print('during A')
os.read(r2, 1)
print('after')
t.join()
```
Sharing a file descriptor:

```python
interp = interpreters.create()
with open('spamspamspam') as infile:
    interp.set_main_attrs(fd=infile.fileno())
    interp.exec(tw.dedent("""
        import os
        for line in os.fdopen(fd):
            print(line)
        """))
```
Passing objects via pickle:

```python
interp = interpreters.create()
r, s = os.pipe()
interp.exec(tw.dedent(f"""
    import os
    import pickle
    reader = {r}
    """))
interp.exec(tw.dedent("""
    data = b''
    c = os.read(reader, 1)
    while c != b'\\x00':
        while c != b'\\x00':
            data += c
            c = os.read(reader, 1)
        obj = pickle.loads(data)
        do_something(obj)
        data = b''
        c = os.read(reader, 1)
    """))
for obj in input:
    data = pickle.dumps(obj)
    os.write(s, data)
    os.write(s, b'\x00')  # terminate each pickled object
os.write(s, b'\x00')  # an extra NUL signals that no more objects are coming
```
Capturing an interpreter's stdout:

```python
interp = interpreters.create()
stdout = io.StringIO()
with contextlib.redirect_stdout(stdout):
    interp.exec(tw.dedent("""
        print('spam!')
        """))
assert stdout.getvalue() == 'spam!'

# alternately:
interp.exec(tw.dedent("""
    import contextlib, io
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        print('spam!')
    captured = stdout.getvalue()
    """))
captured = interp.get_main_attr('captured')
assert captured == 'spam!'
```
A pipe (os.pipe()) could be used similarly.
Running a module:

```python
interp = interpreters.create()
main_module = mod_name
interp.exec(f'import runpy; runpy.run_module({main_module!r})')
```
Running as a script:

```python
interp = interpreters.create()
main_script = path_name
interp.exec(f"import runpy; runpy.run_path({main_script!r})")
```
A simple worker pool using channels:

```python
tasks_recv, tasks = interpreters.create_channel()
results, results_send = interpreters.create_channel()

def worker():
    interp = interpreters.create()
    interp.set_main_attrs(tasks=tasks_recv, results=results_send)
    interp.exec(tw.dedent("""
        def handle_request(req):
            ...

        def capture_exception(exc):
            ...

        while True:
            try:
                req = tasks.recv()
            except Exception:
                # channel closed
                break
            try:
                res = handle_request(req)
            except Exception as exc:
                res = capture_exception(exc)
            results.send_nowait(res)
        """))

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()

requests = ...
for req in requests:
    tasks.send(req)
tasks.close()

for t in threads:
    t.join()
```
Sharing a memoryview across workers (a map-reduce sketch):

```python
data, chunksize = read_large_data_set()
buf = memoryview(data)
numchunks = (len(buf) + chunksize - 1) // chunksize  # ceiling division
results = memoryview(bytearray(numchunks))  # writable buffer for results

tasks_recv, tasks = interpreters.create_channel()

def worker():
    interp = interpreters.create()
    interp.set_main_attrs(data=buf, results=results, tasks=tasks_recv)
    interp.exec(tw.dedent("""
        while True:
            try:
                req = tasks.recv()
            except Exception:
                # channel closed
                break
            resindex, start, end = req
            chunk = data[start: end]
            res = reduce_chunk(chunk)
            results[resindex] = res
        """))

t = threading.Thread(target=worker)
t.start()

for i in range(numchunks):
    if not workers_running():
        raise ...
    start = i * chunksize
    end = start + chunksize
    if end > len(buf):
        end = len(buf)
    tasks.send((i, start, end))  # matches the (resindex, start, end) unpacking
tasks.close()

t.join()

use_results(results)
```
Running code in multiple interpreters provides a useful level of isolation within the same process. This can be leveraged in a number of ways. Furthermore, subinterpreters provide a well-defined framework in which such isolation may be extended. (See PEP 684.)
Alyssa (Nick) Coghlan explained some of the benefits through a comparison withmulti-processing[benefits]:
> [I] expect that communicating between subinterpreters is going to end up looking an awful lot like communicating between subprocesses via shared memory. The trade-off between the two models will then be that one still just looks like a single process from the point of view of the outside world, and hence doesn't place any extra demands on the underlying OS beyond those required to run CPython with a single interpreter, while the other gives much stricter isolation (including isolating C globals in extension modules), but also demands much more from the OS when it comes to its IPC capabilities. The security risk profiles of the two approaches will also be quite different, since using subinterpreters won't require deliberately poking holes in the process isolation that operating systems give you by default.
CPython has supported multiple interpreters, with increasing levels of support, since version 1.5. While the feature has the potential to be a powerful tool, it has suffered from neglect because the multiple interpreter capabilities are not readily available directly from Python. Exposing the existing functionality in the stdlib will help reverse the situation.
This proposal is focused on enabling the fundamental capability of multiple interpreters, isolated from each other, in the same Python process. This is a new area for Python so there is relative uncertainty about the best tools to provide as companions to interpreters. Thus we minimize the functionality we add in the proposal as much as possible.
Some have argued that subinterpreters do not add sufficient benefitto justify making them an official part of Python. Adding featuresto the language (or stdlib) has a cost in increasing the size ofthe language. So an addition must pay for itself.
In this case, multiple interpreter support provides a novel concurrency model focused on isolated threads of execution. Furthermore, it provides an opportunity for changes in CPython that will allow simultaneous use of multiple CPU cores (currently prevented by the GIL; see PEP 684).
Alternatives to subinterpreters include threading, async, andmultiprocessing. Threading is limited by the GIL and async isn’tthe right solution for every problem (nor for every person).Multiprocessing is likewise valuable in some but not all situations.Direct IPC (rather than via the multiprocessing module) providessimilar benefits but with the same caveat.
Notably, subinterpreters are not intended as a replacement for any ofthe above. Certainly they overlap in some areas, but the benefits ofsubinterpreters include isolation and (potentially) performance. Inparticular, subinterpreters provide a direct route to an alternateconcurrency model (e.g. CSP) which has found success elsewhere andwill appeal to some Python users. That is the core value that theinterpreters module will provide.
In the Interpreter Isolation section below we identify ways in which isolation in CPython's subinterpreters is incomplete. Most notable are extension modules that use C globals to store internal state. (PEP 3121 and PEP 489 provide a solution to that problem, followed by some extra APIs that improve efficiency, e.g. PEP 573.)
Consequently, projects that publish extension modules may face anincreased maintenance burden as their users start using subinterpreters,where their modules may break. This situation is limited to modulesthat use C globals (or use libraries that use C globals) to storeinternal state. For numpy, the reported-bug rate is one every 6months.[bug-rate]
Ultimately this comes down to a question of how often it will be aproblem in practice: how many projects would be affected, how oftentheir users will be affected, what the additional maintenance burdenwill be for projects, and what the overall benefit of subinterpretersis to offset those costs. The position of this PEP is that the actualextra maintenance burden will be small and well below the threshold atwhich subinterpreters are worth it.
Introducing an API for a new concurrency model, as happened with asyncio, is an extremely large project that requires a lot of careful consideration. It is not something that can be done as simply as this PEP proposes and likely deserves significant time on PyPI to mature. (See Nathaniel's post on python-dev.)
However, this PEP does not propose any new concurrency API. At most it exposes minimal tools (e.g. subinterpreters, channels) which may be used to write code that follows patterns associated with (relatively) new-to-Python concurrency models. Those tools could also be used as the basis for APIs for such concurrency models. Again, this PEP does not propose any such API.
A common misconception is that this PEP also includes a promise thatinterpreters will no longer share the GIL. When that is clarified,the next question is “what is the point?”. This is already answeredat length in this PEP. Just to be clear, the value lies in:
* increase exposure of the existing feature, which helps improve the code health of the entire CPython runtime
* expose the (mostly) isolated execution of interpreters
* preparation for per-interpreter GIL
* encourage experimentation
(See [cache-line-ping-pong].)
This shouldn’t be a problem for now as we have no immediate plansto actually share data between interpreters, instead focusingon copying.
Concurrency is a challenging area of software development. Decades ofresearch and practice have led to a wide variety of concurrency models,each with different goals. Most center on correctness and usability.
One class of concurrency models focuses on isolated threads of execution that interoperate through some message passing scheme. A notable example is Communicating Sequential Processes [CSP] (upon which Go's concurrency is roughly based). The intended isolation inherent to CPython's interpreters makes them well-suited to this approach.
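The CSP-style pattern can be sketched with tools available today: threads that interact only through message passing, with a queue.Queue standing in for the proposed channel. The worker below never touches shared mutable state directly, which is the property the text describes.

```python
import threading
import queue

# CSP-style message passing, simulated with threads: isolated workers
# interact only via send/receive on queues (stand-ins for channels).
tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        if item is None:        # sentinel: the "channel" is closed
            break
        results.put(item * 2)   # communicate results back, not state

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    tasks.put(i)
tasks.put(None)
t.join()

out = sorted(results.get() for _ in range(3))
assert out == [0, 2, 4]
```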
CPython’s interpreters are inherently isolated (with caveatsexplained below), in contrast to threads. So the samecommunicate-via-shared-memory approach doesn’t work. Without analternative, effective use of concurrency via multiple interpretersis significantly limited.
The key challenge here is that sharing objects between interpreters faces complexity due to various constraints on object ownership, visibility, and mutability. At a conceptual level it's easier to reason about concurrency when objects only exist in one interpreter at a time. At a technical level, CPython's current memory model limits how Python objects may be shared safely between interpreters; effectively, objects are bound to the interpreter in which they were created. Furthermore, the complexity of object sharing increases as interpreters become more isolated, e.g. after GIL removal (though this is mitigated somewhat for some "immortal" objects; see PEP 683).
Consequently, the mechanism for sharing needs to be carefully considered.There are a number of valid solutions, several of which may beappropriate to support in Python’s stdlib and C-API. Any such solutionis likely to share many characteristics with the others.
In the meantime, we propose here a minimal solution(Interpreter.set_main_attrs()), which sets some precedent for howobjects are shared. More importantly, it facilitates the introductionof more advanced approaches later and allows them to coexist and cooperate.In part to demonstrate that, we will provide a basic implementation of“channels”, as a somewhat more advanced sharing solution.
Separate proposals may cover:
Interpreter.set_main_attrs()

The fundamental enabling feature for communication is that most objects can be converted to some encoding of underlying raw data, which is safe to be passed between interpreters. For example, an int object can be turned into a C long value, sent to another interpreter, and turned back into an int object there. As another example, None may be passed as-is.
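The conversion described above can be shown at the Python level. The real mechanism lowers an int to a C value; bytes serve here only as an illustrative interpreter-independent encoding:

```python
# Lower an int to raw, interpreter-independent data and rebuild it,
# mirroring the int <-> C long round trip the text describes.
value = 1234567890
raw = value.to_bytes(8, 'little', signed=True)   # plain data, safe to pass
rebuilt = int.from_bytes(raw, 'little', signed=True)

assert rebuilt == value
```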
Regardless, the effort to determine the best way forward here is mostly outside the scope of this PEP. In the meantime, this proposal describes a basic interim solution using pipes (os.pipe()), as well as providing a dedicated capability ("channels"). See API For Communication below.
CPython’s interpreters are intended to be strictly isolated from eachother. Each interpreter has its own copy of all modules, classes,functions, and variables. The same applies to state in C, including inextension modules. The CPython C-API docs explain more.[caveats]
However, there are ways in which interpreters do share some state.First of all, some process-global state remains shared:
There are no plans to change this.
Second, some isolation is faulty due to bugs or implementations that didnot take subinterpreters into account. This includes things likeextension modules that rely on C globals.[cryptography] In thesecases bugs should be opened (some are already):
Finally, some potential isolation is missing due to the current designof CPython. Improvements are currently going on to address gaps in thisarea:
* the PyGILState_* API is somewhat incompatible [gilstate]

Multiple interpreter support has not been a widely used feature. In fact, there have been only a handful of documented cases of widespread usage, including mod_wsgi, OpenStack Ceph, and JEP. On the one hand, these cases provide confidence that existing multiple interpreter support is relatively stable. On the other hand, there isn't much of a sample size from which to judge the utility of the feature.
I’ve solicited feedback from various Python implementors about supportfor subinterpreters. Each has indicated that they would be able tosupport multiple interpreters in the same process (if they choose to)without a lot of trouble. Here are the projects I contacted:
The module provides the following functions:
```
list_all() -> [Interpreter]

   Return a list of all existing interpreters.

get_current() -> Interpreter

   Return the currently running interpreter.

get_main() -> Interpreter

   Return the main interpreter.  If the Python implementation
   has no concept of a main interpreter then return None.

create() -> Interpreter

   Initialize a new Python interpreter and return it.
   It will remain idle until something is run in it
   and always run in its own thread.

is_shareable(obj) -> bool:

   Return True if the object may be "shared" between interpreters.
   This does not necessarily mean that the actual objects will be
   shared.  Instead, it means that the objects' underlying data
   will be shared in a cross-interpreter way, whether via a proxy,
   a copy, or some other means.
```
The module also provides the following class:
```
class Interpreter(id):

   id -> int:

      The interpreter's ID.  (read-only)

   is_running() -> bool:

      Return whether or not the interpreter's "exec()" is currently
      executing code.  Code running in subthreads is ignored.
      Calling this on the current interpreter will always return True.

   close():

      Finalize and destroy the interpreter.

      This may not be called on an already running interpreter.
      Doing so results in a RuntimeError.

   set_main_attrs(iterable_or_mapping, /):
   set_main_attrs(**kwargs):

      Set attributes in the interpreter's __main__ module
      corresponding to the given name-value pairs.  Each value
      must be a "shareable" object and will be converted to a new
      object (e.g. copy, proxy) in whatever way that object's type
      defines.  If an attribute with the same name is already set,
      it will be overwritten.

      This method is helpful for setting up an interpreter
      before calling exec().

   get_main_attr(name, default=None, /):

      Return the value of the corresponding attribute of the
      interpreter's __main__ module.  If the attribute isn't set
      then the default is returned.  If it is set, but the value
      isn't "shareable" then a ValueError is raised.

      This may be used to introspect the __main__ module, as well
      as a very basic mechanism for "returning" one or more results
      from Interpreter.exec().

   exec(source_str, /):

      Run the provided Python source code in the interpreter,
      in its __main__ module.

      This may not be called on an already running interpreter.
      Doing so results in a RuntimeError.

      An "interp.exec()" call is similar to a builtin exec() call
      (or to calling a function that returns None).  Once
      "interp.exec()" completes, the code that called "exec()"
      continues executing (in the original interpreter).  Likewise,
      if there is any uncaught exception then it effectively
      (see below) propagates into the code where "interp.exec()"
      was called.  Like exec() (and threads), but unlike function
      calls, there is no return value.  If any "return" value from
      the code is needed, send the data out via a pipe (os.pipe())
      or channel or other cross-interpreter communication mechanism.

      The big difference from exec() or functions is that
      "interp.exec()" executes the code in an entirely different
      interpreter, with entirely separate state.  The interpreters
      are completely isolated from each other, so the state of the
      original interpreter (including the code it was executing in
      the current OS thread) does not affect the state of the target
      interpreter (the one that will execute the code).  Likewise,
      the target does not affect the original, nor any of its other
      threads.

      Instead, the state of the original interpreter (for this
      thread) is frozen, and the code it's executing completely
      blocks.  At that point, the target interpreter is given
      control of the OS thread.  Then, when it finishes executing,
      the original interpreter gets control back and continues
      executing.

      So calling "interp.exec()" will effectively cause the current
      Python thread to completely pause.  Sometimes you won't want
      that pause, in which case you should make the "exec()" call
      in another thread.  To do so, add a function that calls
      "interp.exec()" and then run that function in a normal
      "threading.Thread".

      Note that the interpreter's state is never reset, neither
      before "interp.exec()" executes the code nor after.  Thus the
      interpreter state is preserved between calls to
      "interp.exec()".  This includes "sys.modules", the "builtins"
      module, and the internal state of C extension modules.

      Also note that "interp.exec()" executes in the namespace of
      the "__main__" module, just like scripts, the REPL, "-m", and
      "-c".  Just as the interpreter's state is not ever reset, the
      "__main__" module is never reset.  You can imagine
      concatenating the code from each "interp.exec()" call into
      one long script.  This is the same as how the REPL operates.

      Supported code: source text.
```
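The "concatenating the code into one long script" behavior can be mirrored with the builtin exec() and a reused namespace, which persists state between calls the same way an interpreter's __main__ module does:

```python
# Successive exec() calls into one shared namespace behave like successive
# Interpreter.exec() calls into one __main__ module: state persists.
ns = {'__name__': '__main__'}
exec("x = 1", ns)
exec("y = x + 1", ns)   # sees x from the previous call

assert ns['y'] == 2
```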
In addition to the functionality of Interpreter.set_main_attrs(), the module provides a related way to pass data between interpreters: channels. See Channels below.
Regarding uncaught exceptions in Interpreter.exec(), we noted that they are "effectively" propagated into the code where interp.exec() was called. To prevent leaking exceptions (and tracebacks) between interpreters, we create a surrogate of the exception and its traceback (see traceback.TracebackException), set it to __cause__ on a new interpreters.RunFailedError, and raise that.
Directly raising (a proxy of) the exception is problematic since it's harder to distinguish between an error in the interp.exec() call and an uncaught exception from the subinterpreter.
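A rough sketch of the surrogate approach, using today's traceback.TracebackException: the original exception's details are captured as interpreter-independent text and a replacement error is raised. The RunFailedError class and the way the surrogate is carried (in the message, rather than attached via __cause__ as the PEP specifies) are simplifications for illustration.

```python
import traceback

class RunFailedError(RuntimeError):
    """Stand-in for the proposed interpreters.RunFailedError."""

def run_in_fake_interpreter(src):
    try:
        exec(src, {'__name__': '__main__'})
    except Exception as exc:
        # Build an interpreter-independent surrogate of the exception,
        # so no objects from the "other" interpreter leak across.
        tbe = traceback.TracebackException.from_exception(exc)
        summary = ''.join(tbe.format_exception_only()).strip()
        raise RunFailedError(summary) from None

try:
    run_in_fake_interpreter("raise KeyError('spam')")
except RunFailedError as exc:
    msg = str(exc)

assert 'KeyError' in msg
```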
Every new interpreter created by interpreters.create() now has specific restrictions on any code it runs. This includes the following:

* os.fork() is not allowed (so no multiprocessing)
* os.exec*() is not allowed (but "fork+exec", a la subprocess, is okay)

Note that interpreters created with the existing C-API do not have these restrictions. The same is true for the "main" interpreter, so existing use of Python will not change.
We may choose to later loosen some of the above restrictions or providea way to enable/disable granular restrictions individually. Regardless,requiring multi-phase init from extension modules will always be adefault restriction.
As discussed in Shared Data above, multiple interpreter support is less useful without a mechanism for sharing data (communicating) between them. Sharing actual Python objects between interpreters, however, has enough potential problems that we are avoiding support for that in this proposal. Nor, as mentioned earlier, are we adding anything more than a basic mechanism for communication.
That mechanism is the Interpreter.set_main_attrs() method. It may be used to set up global variables before Interpreter.exec() is called. The name-value pairs passed to set_main_attrs() are bound as attributes of the interpreter's __main__ module. The values must be "shareable". See Shareable Types below.
Additional approaches to communicating and sharing objects are enabled through Interpreter.set_main_attrs(). A shareable object could be implemented which works like a queue, but with cross-interpreter safety. In fact, this PEP does include an example of such an approach: channels.
An object is "shareable" if its type supports shareable instances. The type must implement a new internal protocol, which is used to convert an object to interpreter-independent data and then convert it back to an object on the other side. Also see is_shareable() above.
A minimal set of simple, immutable builtin types will be supported initially, including:

* None
* bool
* bytes
* str
* int
* float

We will also support a small number of complex types initially:
Further builtin types may be supported later, complex or not.Limiting the initial shareable types is a practical matter, reducingthe potential complexity of the initial implementation. There are anumber of strategies we may pursue in the future to expand supportedobjects, once we have more experience with interpreter isolation.
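The shape of the is_shareable() check for the initial type set can be sketched as a simple allowlist. The SHAREABLE_TYPES tuple and the function body here are illustrative assumptions; the real check goes through the internal protocol, not isinstance().

```python
# Sketch of what is_shareable() might report initially: a small allowlist
# of simple, immutable builtin types (the exact set comes from the PEP,
# the function itself is hypothetical).
SHAREABLE_TYPES = (type(None), bool, int, float, str, bytes)

def is_shareable(obj):
    return isinstance(obj, SHAREABLE_TYPES)

assert is_shareable(b'spam')
assert not is_shareable([1, 2])   # mutable containers are not shareable
```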
In the meantime, a separate proposal will discuss making the internal protocol (and C-API) used by Interpreter.set_main_attrs() public. With that protocol, support for other types could be added by extension modules.
Even without a dedicated object for communication, users may already use existing tools. For example, one basic approach for sending data between interpreters is to use a pipe (see os.pipe()):
* interpreter A calls os.pipe() to get a read/write pair of file descriptors (both int objects)
* interpreter A calls interp.set_main_attrs(), binding the read FD (or embeds it using string formatting)
* interpreter A calls interp.exec() on interpreter B

Several of the earlier examples demonstrate this, such as Synchronize using an OS pipe.
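The recipe can be exercised today with a thread standing in for interpreter B, since file descriptors work the same either way: B blocks reading the pipe until A writes a byte.

```python
import os
import threading

# The pipe recipe, with a thread standing in for interpreter B:
# B blocks on the read FD until A writes a byte.
r, s = os.pipe()
seen = []

def interpreter_b():
    seen.append(os.read(r, 1))   # blocks until A sends

t = threading.Thread(target=interpreter_b)
t.start()
os.write(s, b'\x00')             # interpreter A signals B
t.join()
os.close(r)
os.close(s)

assert seen == [b'\x00']
```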
The interpreters module will include a dedicated solution for passing object data between interpreters: channels. They are included in the module in part to provide an easier mechanism than using os.pipe() and in part to demonstrate how libraries may take advantage of Interpreter.set_main_attrs() and the protocol it uses.
A channel is a simplex FIFO. It is a basic, opt-in data sharingmechanism that draws inspiration from pipes, queues, and CSP’schannels.[fifo] The main difference from pipes is that channels canbe associated with zero or more interpreters on either end. Likequeues, which are also many-to-many, channels are buffered (thoughthey also offer methods with unbuffered semantics).
Channels have two operations: send and receive. A key characteristicof those operations is that channels transmit data derived from Pythonobjects rather than the objects themselves. When objects are sent,their data is extracted. When the “object” is received in the otherinterpreter, the data is converted back into an object owned by thatinterpreter.
To make this work, the mutable shared state will be managed by the Python runtime, not by any of the interpreters. Initially we will support only one type of object for shared state: the channels provided by interpreters.create_channel(). Channels, in turn, will carefully manage passing objects between interpreters.
This approach, including keeping the API minimal, helps us avoid furtherexposing any underlying complexity to Python users.
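The send/receive semantics described above can be modeled with a queue of serialized payloads: send() extracts an object's data, recv() rebuilds a fresh object owned by the receiver. The Channel class and its use of pickle are a simplified stand-in for the runtime-managed mechanism, not the proposed implementation.

```python
import pickle
import queue

class Channel:
    """Toy model of the proposed channel: data is transmitted, not objects."""

    def __init__(self):
        self._buf = queue.Queue()   # stands in for runtime-managed state

    def send(self, obj):
        self._buf.put(pickle.dumps(obj))    # extract the object's data

    def recv(self):
        return pickle.loads(self._buf.get())  # rebuild a new object

ch = Channel()
sent = [1, 'spam']
ch.send(sent)
received = ch.recv()

assert received == sent          # same data on the other side
assert received is not sent      # but a distinct, receiver-owned object
```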
The interpreters module provides the following function related to channels:
```
create_channel() -> (RecvChannel, SendChannel):

   Create a new channel and return (recv, send), the RecvChannel
   and SendChannel corresponding to the ends of the channel.

   Both ends of the channel are supported "shared" objects (i.e.
   may be safely shared by different interpreters).  Thus they
   may be set using "Interpreter.set_main_attrs()".
```
The module also provides the following channel-related classes:
```
class RecvChannel(id):

   The receiving end of a channel.  An interpreter may use this to
   receive objects from another interpreter.  Any type supported by
   Interpreter.set_main_attrs() will be supported here, though at
   first only a few of the simple, immutable builtin types will be
   supported.

   id -> int:

      The channel's unique ID.  The "send" end has the same one.

   recv(*, timeout=None):

      Return the next object from the channel.  If none have been
      sent then wait until the next send (or until the timeout is
      hit).

      At the least, the object will be equivalent to the sent
      object.  That will almost always mean the same type with the
      same data, though it could also be a compatible proxy.
      Regardless, it may use a copy of that data or actually share
      the data.  That's up to the object's type.

   recv_nowait(default=None):

      Return the next object from the channel.  If none have been
      sent then return the default.  Otherwise, this is the same
      as the "recv()" method.


class SendChannel(id):

   The sending end of a channel.  An interpreter may use this to
   send objects to another interpreter.  Any type supported by
   Interpreter.set_main_attrs() will be supported here, though at
   first only a few of the simple, immutable builtin types will be
   supported.

   id -> int:

      The channel's unique ID.  The "recv" end has the same one.

   send(obj, *, timeout=None):

      Send the object (i.e. its data) to the "recv" end of the
      channel.  Wait until the object is received.  If the object
      is not shareable then ValueError is raised.

      The builtin memoryview is supported, so sending a buffer
      across involves first wrapping the object in a memoryview
      and then sending that.

   send_nowait(obj):

      Send the object to the "recv" end of the channel.  This
      behaves the same as "send()", except for the waiting part.
      If no interpreter is currently receiving (waiting on the
      other end) then queue the object and return False.  Otherwise
      return True.
```
Again, Python objects are not shared between interpreters. However, in some cases data those objects wrap is actually shared and not just copied. One example might be PEP 3118 buffers.
In those cases the object in the original interpreter is kept aliveuntil the shared data in the other interpreter is no longer used.Then object destruction can happen like normal in the originalinterpreter, along with the previously shared data.
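The keep-alive relationship has a single-interpreter analogue in today's buffer protocol: a memoryview keeps the underlying object's data alive even after the original reference is dropped, just as a source object would be kept alive while another interpreter still uses its data.

```python
# A memoryview keeps the exporting object's buffer alive, analogous to how
# an object would be kept alive while its shared data is in use elsewhere.
data = bytearray(b'spam')
view = memoryview(data)
del data                      # drop the original reference...

assert bytes(view) == b'spam' # ...the shared buffer survives via the view
```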
The new stdlib docs page for the interpreters module will include the following:
Docs about resources for extension maintainers already exist on theIsolating Extension Modules howto page. Anyextra help will be added there. For example, it may prove helpfulto discuss strategies for dealing with linked libraries that keeptheir own subinterpreter-incompatible global state.
Note that the documentation will play a large part in mitigating any negative impact that the new interpreters module might have on extension module maintainers.
Also, the ImportError for incompatible extension modules will be updated to clearly say it is due to missing multiple interpreters compatibility and that extensions are not required to provide it. This will help set user expectations properly.
One possible alternative to a new module is to add support for interpreters to concurrent.futures. There are several reasons why that wouldn't work:
* concurrent.futures is all about executing functions, but currently we don't have a good way to run a function from one interpreter in another

Similar reasoning applies for support in the multiprocessing module.
* … interp.exec() runs in the current thread?
* … interp.exec(), and/or Interpreter.set_main_attrs() and Interpreter.get_main_attr()?
* … interp.exec() right now?
* … Interpreter.close() to Interpreter.destroy()?
* … Interpreter.get_main_attr(), since we have channels?

In the interest of keeping this proposal minimal, the following functionality has been left out for future consideration. Note that this is not a judgement against any of said capability, but rather a deferment. That said, each is arguably valid.
There are a number of things I can imagine would smooth outhypothetical rough edges with the new module:
* an Interpreter.run() or Interpreter.call() that calls interp.exec() and falls back to pickle
* … Interpreter.set_main_attrs() and Interpreter.get_main_attr()

These would be easy to do if this proves to be a pain point.
One regular point of confusion has been that Interpreter.exec() executes in the current OS thread, temporarily blocking the current Python thread. It may be worth doing something to avoid that confusion.
Some possible solutions for this hypothetical problem:
* add Interpreter.exec_in_thread()?
* add Interpreter.exec_in_current_thread()?

In earlier versions of this PEP the method was interp.run(). The simple change to interp.exec() alone will probably reduce confusion sufficiently, when coupled with educating users via the docs. If it turns out to be a real problem, we can pursue one of the alternatives at that point.
Interpreter.is_running() refers specifically to whether or not Interpreter.exec() (or similar) is running somewhere. It does not say anything about whether the interpreter has any subthreads running. That information might be helpful.
Some things we could do:
* rename Interpreter.is_running() to Interpreter.is_running_main()
* add Interpreter.has_threads(), to complement Interpreter.is_running()
* expand to Interpreter.is_running(main=True, threads=False)

None of these are urgent and any could be done later, if desired.
We could add a special method, like __xid__, to correspond to tp_xid. At the very least, it would allow Python types to convert their instances to some other type that implements tp_xid.

The problem is that exposing this capability to Python code presents a degree of complexity that hasn't been explored yet, nor is there a compelling case to investigate that complexity.
It would be convenient to run existing functions in subinterpreters directly. Interpreter.exec() could be adjusted to support this, or a call() method could be added:

    Interpreter.call(f, *args, **kwargs)
This suffers from the same problem as sharing objects between interpreters via queues. The minimal solution (running a source string) is sufficient for us to get the feature out where it can be explored.
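For illustration only, a pickle-based fallback of the kind mentioned above could be sketched roughly as follows. This is a hypothetical sketch: the builtin exec() stands in for interp.exec() (which only accepts a code string), and the shared namespace dict stands in for the attribute-passing helpers.

```python
import pickle

def call_in(exec_fn, func, *args, **kwargs):
    # Hypothetical sketch of the deferred call() idea: serialize the
    # callable and its arguments with pickle, run a small script in the
    # target namespace, then unpickle the result.
    ns = {"_payload": pickle.dumps((func, args, kwargs)), "_result": None}
    script = (
        "import pickle\n"
        "f, a, kw = pickle.loads(_payload)\n"
        "_result = pickle.dumps(f(*a, **kw))\n"
    )
    exec_fn(script, ns)
    return pickle.loads(ns["_result"])

result = call_in(exec, max, 3, 7)
```

Note that this only works for picklable callables and arguments, which is exactly the limitation the deferral acknowledges.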
This method would make an interp.exec() call for you in a thread. Doing this using only threading.Thread and interp.exec() is relatively trivial, so we've left it out.
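The boilerplate described as trivial might look something like this sketch, where the builtin exec() stands in for interp.exec() (the proposed module is not assumed to be available):

```python
import threading

def exec_in_thread(script, ns):
    # Run a script in its own OS thread so the current Python thread
    # is not blocked.  exec() stands in for interp.exec() here.
    t = threading.Thread(target=exec, args=(script, ns))
    t.start()
    return t

ns = {}
t = exec_in_thread("x = 6 * 7", ns)
t.join()
```

With the real module, the target would be a small function that calls interp.exec(script) instead.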
The threading module provides a number of synchronization primitives for coordinating concurrent operations. This is especially necessary due to the shared-state nature of threading. In contrast, interpreters do not share state. Data sharing is restricted to the runtime's shareable objects capability, which does away with the need for explicit synchronization. If any sort of opt-in shared state support is added to CPython's interpreters in the future, that same effort can introduce synchronization primitives to meet that need.
A csp module would not be a large step away from the functionality provided by this PEP. However, adding such a module is outside the minimalist goals of this proposal.

The Go language provides a concurrency model based on CSP, so it's similar to the concurrency model that multiple interpreters support. However, Go also provides syntactic support, as well as several builtin concurrency primitives, to make concurrency a first-class feature. Conceivably, similar syntactic (and builtin) support could be added to Python using interpreters. However, that is way outside the scope of this PEP!

The multiprocessing module could support interpreters in the same way it supports threads and processes. In fact, the module's maintainer, Davin Potts, has indicated this is a reasonable feature request. However, it is outside the narrow scope of this PEP.
By using the PyModuleDef_Slot introduced by PEP 489, we could easily add a mechanism by which C-extension modules could opt out of multiple interpreter support. Then the import machinery, when operating in a subinterpreter, would need to check the module for support. It would raise an ImportError if unsupported.

Alternately, we could support opting in to multiple interpreter support. However, that would probably exclude many more modules (unnecessarily) than the opt-out approach. Also, note that PEP 489 defined that an extension's use of the PEP's machinery implies multiple interpreter support.

The scope of adding the ModuleDef slot and fixing up the import machinery is non-trivial, but could be worth it. It all depends on how many extension modules break under subinterpreters. Given that there are relatively few cases we know of through mod_wsgi, we can leave this for later.
CSP has the concept of poisoning a channel. Once a channel has been poisoned, any send() or recv() call on it would raise a special exception, effectively ending execution in the interpreter that tried to use the poisoned channel.

This could be accomplished by adding a poison() method to both ends of the channel. The close() method can be used in this way (mostly), but these semantics are relatively specialized and can wait.

As proposed, every call to Interpreter.exec() will execute in the namespace of the interpreter's existing __main__ module. This means that data persists there between interp.exec() calls. Sometimes this isn't desirable and you want to execute in a fresh __main__. Also, you don't necessarily want to leak objects there that you aren't using any more.
Note that the following won't work right because it will clear too much (e.g. __name__ and the other "__dunder__" attributes):

    interp.exec('globals().clear()')
Possible solutions include:
* a create() arg to indicate resetting __main__ after each interp.exec() call
* an Interpreter.reset_main flag to support opting in or out after the fact
* an Interpreter.reset_main() method to opt in when desired
* importlib.util.reset_globals() [reset_globals]

Also note that resetting __main__ does nothing about state stored in other modules. So any solution would have to be clear about the scope of what is being reset. Conceivably we could invent a mechanism by which any (or every) module could be reset, unlike reload() which does not clear the module before loading into it.
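As a rough illustration of what a reset would have to do differently from globals().clear(), here is a hypothetical helper that clears only user-defined names from a namespace dict while preserving the "__dunder__" attributes:

```python
def reset_main(ns):
    # Hypothetical helper: remove every user-defined name but keep the
    # "__dunder__" attributes (__name__, __builtins__, and so on) that
    # globals().clear() would wrongly wipe out.
    for name in list(ns):
        if not (name.startswith("__") and name.endswith("__")):
            del ns[name]

ns = {"__name__": "__main__", "x": 1, "leftover": []}
reset_main(ns)
```

Any real solution would also need to decide what to do about module-level state outside __main__, as noted above.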
Regardless, since __main__ is the execution namespace of the interpreter, resetting it has a much more direct correlation to interpreters and their dynamic state than does resetting other modules. So a more generic module reset mechanism may prove unnecessary.

This isn't a critical feature initially. It can wait until later if desirable.
It may be nice to re-use an existing subinterpreter instead of spinning up a new one. Since an interpreter has substantially more state than just the __main__ module, it isn't so easy to put an interpreter back into a pristine/fresh state. In fact, there may be parts of the state that cannot be reset from Python code.

A possible solution is to add an Interpreter.reset() method. This would put the interpreter back into the state it was in when newly created. If called on a running interpreter it would fail (hence the main interpreter could never be reset). This would likely be more efficient than creating a new interpreter, though that depends on what optimizations will be made later to interpreter creation.

While this would potentially provide functionality that is not otherwise available from Python code, it isn't a fundamental functionality. So in the spirit of minimalism here, this can wait. Regardless, I doubt it would be controversial to add it post-PEP.

Relatedly, it may be useful to support creating a new interpreter based on an existing one, e.g. Interpreter.copy(). This ties into the idea that a snapshot could be made of an interpreter's memory, which would make starting up CPython, or creating new interpreters, faster in general. The same mechanism could be used for a hypothetical Interpreter.reset(), as described previously.
Given that file descriptors and sockets are process-global resources, making them shareable is a reasonable idea. They would be a good candidate for the first effort at expanding the supported shareable types. They aren't strictly necessary for the initial API.
Per Antoine Pitrou [async]:

    Has any thought been given to how FIFOs could integrate with async code driven by an event loop (e.g. asyncio)? I think the model of executing several asyncio (or Tornado) applications each in their own subinterpreter may prove quite interesting to reconcile multi-core concurrency with ease of programming. That would require the FIFOs to be able to synchronize on something an event loop can wait on (probably a file descriptor?).

The basic functionality of multiple interpreters support does not depend on async and can be added later.
A possible solution is to provide async implementations of the blocking channel methods (recv() and send()).

Alternately, "readiness callbacks" could be used to simplify use in async scenarios. This would mean adding an optional callback (kw-only) parameter to the recv_nowait() and send_nowait() channel methods. The callback would be called once the object was sent or received (respectively).

(Note that making channels buffered makes readiness callbacks less important.)
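A minimal sketch of the readiness-callback idea, with a plain queue standing in for a channel end (the callback parameter is the hypothetical addition being described):

```python
import queue

def send_nowait(q, obj, *, callback=None):
    # Sketch: the optional kw-only callback fires once the object has
    # actually been handed off (here, once it is on the queue).
    q.put_nowait(obj)
    if callback is not None:
        callback(obj)

sent = []
q = queue.SimpleQueue()
send_nowait(q, "ping", callback=sent.append)
```

An event loop could use such a callback to wake a waiting task instead of blocking a thread.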
Supporting iteration on RecvChannel (via __iter__() or __next__()) may be useful. A trivial implementation would use the recv() method, similar to how files do iteration. Since this isn't a fundamental capability and has a simple analog, adding iteration support can wait until later.
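The trivial implementation alluded to above might look like this sketch, where the recv callable and ChannelClosedError are stand-ins for the proposed channel API:

```python
import queue

class ChannelClosedError(Exception):
    pass

class RecvIter:
    # Sketch: iteration for a RecvChannel-like object, analogous to how
    # file objects iterate by repeatedly calling a read method.
    def __init__(self, recv):
        self._recv = recv

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return self._recv()
        except ChannelClosedError:
            raise StopIteration

# Demonstration with a plain queue standing in for the channel.
q = queue.SimpleQueue()
for item in (1, 2, 3):
    q.put(item)

def recv():
    try:
        return q.get_nowait()
    except queue.Empty:
        raise ChannelClosedError

items = list(RecvIter(recv))
```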
Context manager support on RecvChannel and SendChannel may be helpful. The implementation would be simple, wrapping a call to close() (or maybe release()) like files do. As with iteration, this can wait.
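The "simple" implementation would be little more than the following sketch (ChannelEnd is a hypothetical stand-in for either channel end):

```python
class ChannelEnd:
    # Minimal stand-in showing context manager support wrapping close(),
    # the same way file objects do it.
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()
        return False  # do not swallow exceptions

with ChannelEnd() as ch:
    pass
# ch is closed automatically on exit from the with block
```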
With the proposed object passing mechanism of "os.pipe()", other similar basic types aren't strictly required to achieve the minimal useful functionality of multiple interpreters. Such types include pipes (like unbuffered channels, but one-to-one) and queues (like channels, but more generic). See below in Rejected Ideas for more information.

Even though these types aren't part of this proposal, they may still be useful in the context of concurrency. Adding them later is entirely reasonable. They could be trivially implemented as wrappers around channels. Alternatively, they could be implemented for efficiency at the same low level as channels.
When sending an object through a channel, you don't have a way of knowing when the object gets received on the other end. One way to work around this is to return a locked threading.Lock from SendChannel.send() that unlocks once the object is received.

Alternately, the proposed SendChannel.send() (blocking) and SendChannel.send_nowait() provide an explicit distinction that is less likely to confuse users.

Note that returning a lock would matter for buffered channels (i.e. queues). For unbuffered channels it is a non-issue.
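The lock-returning idea can be sketched with a plain queue standing in for a buffered channel (send_with_ack and recv_with_ack are hypothetical names):

```python
import queue
import threading

def send_with_ack(q, obj):
    # Sketch: send() hands back a locked Lock that is released only
    # once the object is received on the other end.
    lock = threading.Lock()
    lock.acquire()
    q.put((obj, lock))
    return lock

def recv_with_ack(q):
    obj, lock = q.get()
    lock.release()  # signal the sender that the object arrived
    return obj

q = queue.SimpleQueue()
lock = send_with_ack(q, "data")
obj = recv_with_ack(q)
# the sender can now acquire the lock (or check it) to confirm receipt
```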
A simple example is queue.PriorityQueue in the stdlib.
Folks might find it useful, when creating a new interpreter, to be able to indicate that they would like some things "inherited" by the new interpreter. The mechanism could be a strict copy or it could be copy-on-write. The motivating example is the warnings module (e.g. copy the filters).

The feature isn't critical, nor would it be widely useful, so it can wait until there's interest. Notably, both suggested solutions will require significant work, especially when it comes to complex objects and most especially for mutable containers of mutable complex objects.
Exceptions are propagated out of run() calls, so it isn't a big leap to make them shareable. However, as noted elsewhere, it isn't essential (or particularly common), so we can wait on doing that.

We could use pickle (or marshal) to serialize everything and thus make them shareable. Doing this is potentially inefficient, but it may be a matter of convenience in the end. We can add it later, but trying to remove it later would be significantly more painful.

An uncaught exception in a subinterpreter (from interp.exec()) is copied to the calling interpreter and set as __cause__ on a RunFailedError, which is then raised. That copying part involves some sort of deserialization in the calling interpreter, which can be expensive (e.g. due to imports) yet is not always necessary.

So it may be useful to use an ExceptionProxy type to wrap the serialized exception and only deserialize it when needed. That could be via ExceptionProxy.__getattribute__() or perhaps through RunFailedError.resolve() (which would raise the deserialized exception and set RunFailedError.__cause__ to the exception).

It may also make sense to have RunFailedError.__cause__ be a descriptor that does the lazy deserialization (and sets __cause__) on the RunFailedError instance.
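The descriptor idea can be sketched as follows. This is a hypothetical illustration: LazyCause and RunFailedProxy are made-up names, the attribute is called "cause" rather than the real __cause__ slot, and pickle stands in for whatever cross-interpreter serialization is actually used.

```python
import pickle

class LazyCause:
    # Sketch of a descriptor that deserializes a pickled exception only
    # on first access, caching the result on the instance.
    def __set_name__(self, owner, name):
        self._attr = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        cached = getattr(obj, self._attr, None)
        if cached is None:
            cached = pickle.loads(obj._serialized)
            setattr(obj, self._attr, cached)
        return cached

class RunFailedProxy(Exception):
    # Hypothetical stand-in for RunFailedError; "cause" plays the role
    # of the lazily resolved __cause__.
    cause = LazyCause()

    def __init__(self, serialized):
        super().__init__("execution raised an exception")
        self._serialized = serialized

err = RunFailedProxy(pickle.dumps(ValueError("boom")))
# No deserialization cost is paid until err.cause is first accessed.
```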
Currently interp.exec() always returns None. One idea is to return the return value from whatever the subinterpreter ran. However, for now it doesn't make sense. The only thing folks can run is a string of code (i.e. a script). This is equivalent to PyRun_StringFlags(), exec(), or a module body. None of those "return" anything. We can revisit this once interp.exec() supports functions, etc.
This would be _threading.Lock (or something like it) where interpreters would actually share the underlying mutex. The main concern is that locks and isolated interpreters may not mix well (as learned in Go).
We can add this later if it proves desirable without much trouble.
The exception types that inherit from BaseException (aside from Exception) are usually treated specially. These types are: KeyboardInterrupt, SystemExit, and GeneratorExit. It may make sense to treat them specially when it comes to propagation from interp.exec(). Here are some options:

* propagate like normal via RunFailedError
* do not propagate (handle them somehow in the subinterpreter)
* propagate them directly (avoid RunFailedError)
* propagate them directly (set RunFailedError as __cause__)

We aren't going to worry about handling them differently. Threads already ignore SystemExit, so for now we will follow that pattern.
It can be convenient to have an explicit way to close a channel against further global use. Likewise, it could be useful to have an explicit way to release one of the channel ends relative to the current interpreter. Among other reasons, such a mechanism is useful for communicating overall state between interpreters without the extra boilerplate that passing objects through a channel directly would require.

The challenge is getting automatic release/close right without making it hard to understand. This is especially true when dealing with a non-empty channel. We should be able to get by without release/close for now.
This method would allow no-copy sending of an object through a channel if it supports the PEP 3118 buffer protocol (e.g. memoryview).

Support for this is not fundamental to channels and can be added on later without much disruption.
The PEP proposes a hard separation between subinterpreters and threads: if you want to run in a thread you must create the thread yourself and call interp.exec() in it. However, it might be convenient if interp.exec() could do that for you, meaning there would be less boilerplate.

Furthermore, we anticipate that users will want to run in a thread much more often than not. So it would make sense to make this the default behavior. We would add a kw-only param "threaded" (default True) to interp.exec() to allow the run-in-the-current-thread operation.

Interpreters are implicitly associated with channels upon recv() and send() calls. They are de-associated with release() calls. The alternative would be explicit methods. It would be either add_channel() and remove_channel() methods on Interpreter objects or something similar on channel objects.

In practice, this level of management shouldn't be necessary for users. So adding more explicit support would only add clutter to the API.
A pipe would be a simplex FIFO between exactly two interpreters. For most use cases this would be sufficient. It could potentially simplify the implementation as well. However, it isn't a big step to supporting a many-to-many simplex FIFO via channels. Also, with pipes the API ends up being slightly more complicated, requiring naming the pipes.

Queues and buffered channels are almost the same thing. The main difference is that channels have a stronger relationship with context (i.e. the associated interpreter).

The name "Channel" was used instead of "Queue" to avoid confusion with the stdlib queue.Queue.

The list_all() function provides the list of all interpreters. In the threading module, which partly inspired the proposed API, the function is called enumerate(). The name is different here to avoid confusing Python users that are not already familiar with the threading API. For them "enumerate" is rather unclear, whereas "list_all" is clear.
In function calls, uncaught exceptions propagate to the calling frame. The same approach could be taken with interp.exec(). However, this would mean that exception objects would leak across the inter-interpreter boundary. Likewise, the frames in the traceback would potentially leak.

While that might not be a problem currently, it would be a problem once interpreters get better isolation relative to memory management (which is necessary to stop sharing the GIL between interpreters). We've resolved the semantics of how the exceptions propagate by raising a RunFailedError instead, for which __cause__ wraps a safe proxy for the original exception and traceback.
Rejected possible solutions:
* convert at the boundary (a la subprocess.CalledProcessError) (requires a cross-interpreter representation)
* support customization via Interpreter.excepthook (requires a cross-interpreter representation)
* wrap in a proxy at the boundary (including with support for something like err.raise() to propagate the traceback)
* return the exception (or its proxy) from interp.exec() instead of raising it
* return a result object (like subprocess does) [result-object] (unnecessary complexity?)
* throw the exception away and expect users to deal with unhandled exceptions explicitly in the script they pass to interp.exec() (they can pass error info out via channels); with threads you have to do something similar

As implemented in the C-API, an interpreter is not inherently tied to any thread. Furthermore, it will run in any existing thread, whether created by Python or not. You only have to activate one of its thread states (PyThreadState) in the thread first. This means that the same thread may run more than one interpreter (though obviously not at the same time).
The proposed module maintains this behavior. Interpreters are not tied to threads. Only calls to Interpreter.exec() are. However, one of the key objectives of this PEP is to provide a more human-centric concurrency model. With that in mind, from a conceptual standpoint the module might be easier to understand if each interpreter were associated with its own thread.

That would mean interpreters.create() would create a new thread and Interpreter.exec() would only execute in that thread (and nothing else would). The benefit is that users would not have to wrap Interpreter.exec() calls in a new threading.Thread. Nor would they be in a position to accidentally pause the current interpreter (in the current thread) while their interpreter executes.

The idea is rejected because the benefit is small and the cost is high. The difference from the capability in the C-API would be potentially confusing. The implicit creation of threads is magical. The early creation of threads is potentially wasteful. The inability to run arbitrary interpreters in an existing thread would prevent some valid use cases, frustrating users. Tying interpreters to threads would require extra runtime modifications. It would also make the module's implementation overly complicated. Finally, it might not even make the module easier to understand.
Associate interpreters with channel ends only once recv(), send(), etc. are called.

Doing this is potentially confusing and also can lead to unexpected races where a channel is auto-closed before it can be used in the original (creating) interpreter.

This would make sense especially if Interpreter.exec() were to manage new threads for you (which we've rejected). Essentially, each call would run independently, which would be mostly fine from a narrow technical standpoint, since each interpreter can have multiple threads.

The problem is that the interpreter has only one __main__ module and simultaneous Interpreter.exec() calls would have to sort out sharing __main__ or we'd have to invent a new mechanism. Neither would be simple enough to be worth doing.
While having __cause__ set on RunFailedError helps produce a more useful traceback, it's less helpful when handling the original error. To help facilitate this, we could add RunFailedError.reraise(). This method would enable the following pattern:
    try:
        try:
            interp.exec(script)
        except RunFailedError as exc:
            exc.reraise()
    except MyException:
        ...
This would be made even simpler if there existed a __reraise__ protocol.

All that said, this is completely unnecessary. Using __cause__ is good enough:
    try:
        try:
            interp.exec(script)
        except RunFailedError as exc:
            raise exc.__cause__
    except MyException:
        ...
Note that in extreme cases it may require a little extra boilerplate:
    try:
        try:
            interp.exec(script)
        except RunFailedError as exc:
            if exc.__cause__ is not None:
                raise exc.__cause__
            raise  # re-raise
    except MyException:
        ...
The implementation of the PEP has 4 parts:
These are at various levels of completion, with more done the lower you go:

The implementation effort for PEP 554 is being tracked as part of a larger project aimed at improving multi-core support in CPython. [multi-core-project]
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0554.rst

Last modified: 2025-02-01 08:55:40 GMT