Python Enhancement Proposals

Python »
PEP Index »
PEP 219

Author:: Gordon McMillan <gmcm at hypernet.com>
Status:

Introduction

This PEP discusses changes required to core Python in order toefficiently support generators, microthreads and coroutines. It isrelated toPEP 220, which describes how Python should be extendedto support these facilities. The focus of this PEP is strictly onthe changes required to allow these extensions to work.

While these PEPs are based on Christian Tismer’s Stackless[1]implementation, they do not regard Stackless as a referenceimplementation. Stackless (with an extension module) implementscontinuations, and from continuations one can implementcoroutines, microthreads (as has been done by Will Ware[2]) andgenerators. But in more than a year, no one has found any otherproductive use of continuations, so there seems to be no demandfor their support.

However, Stackless support for continuations is a relatively minorpiece of the implementation, so one might regard it as “a”reference implementation (rather than “the” referenceimplementation).

Background

Generators and coroutines have been implemented in a number oflanguages in a number of ways. Indeed, Tim Peters has done purePython implementations of generators[3] and coroutines[4] usingthreads (and a thread-based coroutine implementation exists forJava). However, the horrendous overhead of a thread-basedimplementation severely limits the usefulness of this approach.

Microthreads (a.k.a “green” or “user” threads) and coroutinesinvolve transfers of control that are difficult to accommodate ina language implementation based on a single stack. (Generators canbe done on a single stack, but they can also be regarded as a verysimple case of coroutines.)

Real threads allocate a full-sized stack for each thread ofcontrol, and this is the major source of overhead. However,coroutines and microthreads can be implemented in Python in a waythat involves almost no overhead. This PEP, therefore, offers away for making Python able to realistically manage thousands ofseparate “threads” of activity (vs. today’s limit of perhaps dozensof separate threads of activity).

Another justification for this PEP (explored inPEP 220) is thatcoroutines and generators often allow a more direct expression ofan algorithm than is possible in today’s Python.

Discussion

The first thing to note is that Python, while it minglesinterpreter data (normal C stack usage) with Python data (thestate of the interpreted program) on the stack, the two arelogically separate. They just happen to use the same stack.

A real thread gets something approaching a process-sized stackbecause the implementation has no way of knowing how much stackspace the thread will require. The stack space required for anindividual frame is likely to be reasonable, but stack switchingis an arcane and non-portable process, not supported by C.

Once Python stops putting Python data on the C stack, however,stack switching becomes easy.

The fundamental approach of the PEP is based on these twoideas. First, separate C’s stack usage from Python’s stackusage. Secondly, associate with each frame enough stack space tohandle that frame’s execution.

In the normal usage, Stackless Python has a normal stackstructure, except that it is broken into chunks. But in thepresence of a coroutine / microthread extension, this samemechanism supports a stack with a tree structure. That is, anextension can support transfers of control between frames outsidethe normal “call / return” path.

Problems

The major difficulty with this approach is C calling Python. Theproblem is that the C stack now holds a nested execution of thebyte-code interpreter. In that situation, a coroutine /microthread extension cannot be permitted to transfer control to aframe in a different invocation of the byte-code interpreter. If aframe were to complete and exit back to C from the wronginterpreter, the C stack could be trashed.

The ideal solution is to create a mechanism where nestedexecutions of the byte code interpreter are never needed. The easysolution is for the coroutine / microthread extension(s) torecognize the situation and refuse to allow transfers outside thecurrent invocation.

We can categorize code that involves C calling Python into twocamps: Python’s implementation, and C extensions. And hopefully wecan offer a compromise: Python’s internal usage (and C extensionwriters who want to go to the effort) will no longer use a nestedinvocation of the interpreter. Extensions which do not go to theeffort will still be safe, but will not play well with coroutines/ microthreads.

Generally, when a recursive call is transformed into a loop, a bitof extra bookkeeping is required. The loop will need to keep itsown “stack” of arguments and results since the real stack can nowonly hold the most recent. The code will be more verbose, becauseit’s not quite as obvious when we’re done. While Stackless is notimplemented this way, it has to deal with the same issues.

In normal Python,PyEval_EvalCode is used to build a frame andexecute it. Stackless Python introduces the concept of aFrameDispatcher. LikePyEval_EvalCode, it executes one frame. Butthe interpreter may signal theFrameDispatcher that a new framehas been swapped in, and the new frame should be executed. When aframe completes, theFrameDispatcher follows the back pointer toresume the “calling” frame.

So Stackless transforms recursions into a loop, but it is not theFrameDispatcher that manages the frames. This is done by theinterpreter (or an extension that knows what it’s doing).

The general idea is that where C code needs to execute Pythoncode, it creates a frame for the Python code, setting its backpointer to the current frame. Then it swaps in the frame, signalstheFrameDispatcher and gets out of the way. The C stack is nowclean - the Python code can transfer control to any other frame(if an extension gives it the means to do so).

In the vanilla case, this magic can be hidden from the programmer(even, in most cases, from the Python-internals programmer). Manysituations present another level of difficulty, however.

The map builtin function involves two obstacles to thisapproach. It cannot simply construct a frame and get out of theway, not just because there’s a loop involved, but each passthrough the loop requires some “post” processing. In order to playwell with others, Stackless constructs a frame object for mapitself.

Most recursions of the interpreter are not this complex, butfairly frequently, some “post” operations are required. Stacklessdoes not fix these situations because of the amount of code changesrequired. Instead, Stackless prohibits transfers out of a nestedinterpreter. While not ideal (and sometimes puzzling), thislimitation is hardly crippling.

Advantages

For normal Python, the advantage to this approach is that C stackusage becomes much smaller and more predictable. Unboundedrecursion in Python code becomes a memory error, instead of astack error (and thus, in non-Cupertino operating systems,something that can be recovered from). The price, of course, isthe added complexity that comes from transforming recursions ofthe byte-code interpreter loop into a higher order loop (and theattendant bookkeeping involved).

The big advantage comes from realizing that the Python stack isreally a tree, and the frame dispatcher can transfer controlfreely between leaf nodes of the tree, thus allowing things likemicrothreads and coroutines.