Application-level Stackless features

Introduction

PyPy can expose to its user language features similar to the onespresent inStackless Python: the ability to write code in amassively concurrent style. (It does not (any more) offer theability to run with norecursion depth limit, but the same effectcan be achieved indirectly.)

This feature is based on a custom primitive called acontinulet.Continulets can be directly used by application code, or it is possibleto write (entirely at app-level) more user-friendly interfaces.

Currently PyPy implementsgreenlets on top of continulets. It alsoimplements (an approximation of) tasklets and channels, emulating the modelofStackless Python.

Continulets are extremely light-weight, which means that PyPy should beable to handle programs containing large amounts of them. However, dueto an implementation restriction, a PyPy compiled with--gcrootfinder=shadowstack consumes at least one page of physicalmemory (4KB) per live continulet, and half a megabyte of virtual memoryon 32-bit or a complete megabyte on 64-bit. Moreover, the feature isonly available (so far) on x86 and x86-64 CPUs; for other CPUs you needto add a short page of custom assembler torpython/translator/c/src/stacklet/.

Theory

The fundamental idea is that, at any point in time, the program happensto run one stack of frames (or one per thread, in case ofmulti-threading). To see the stack, start at the top frame and followthe chain off_back until you reach the bottom frame. From thepoint of view of one of these frames, it has af_back pointing toanother frame (unless it is the bottom frame), and it is itself beingpointed to by another frame (unless it is the top frame).

The theory behind continulets is to literally take the previous sentenceas definition of “an O.K. situation”. The trick is that there areO.K. situations that are more complex than just one stack: you willalways have one stack, but you can also have in addition one or moredetachedcycles of frames, such that by following thef_back chainyou run in a circle. But note that these cycles are indeed completelydetached: the top frame (the currently running one) is always the onewhich is not thef_back of anybody else, and it is always the top ofa stack that ends with the bottom frame, never a part of these extracycles.

How do you create such cycles? The fundamental operation to do so is totake two frames andpermute theirf_back — i.e. exchange them.You can permute any twof_back without breaking the rule of “an O.K.situation”. Say for example thatf is some frame halfway down thestack, and you permute itsf_back with thef_back of the topframe. Then you have removed from the normal stack all intermediateframes, and turned them into one stand-alone cycle. By doing the samepermutation again you restore the original situation.

In practice, in PyPy, you cannot change thef_back of an arbitraryframe, but only of frames stored incontinulets.

Continulets are internally implemented usingstacklets. Stacklets are abit more primitive (they are really one-shot continuations), but thatidea only works in C, not in Python. The basic idea of continulets isto have at any point in time a complete valid stack; this is importante.g. to correctly propagate exceptions (and it seems to give meaningfultracebacks too).

Application level interface

Continulets

A translated PyPy contains by default a module called_continuationexporting the typecontinulet. Acontinulet object from thismodule is a container that stores a “one-shot continuation”. It playsthe role of an extra frame you can insert in the stack, and whosef_back can be changed.

To make a continulet object, callcontinulet() with a callable andoptional extra arguments.

Later, the first time youswitch() to the continulet, the callableis invoked with the same continulet object as the extra first argument.At that point, the one-shot continuation stored in the continulet pointsto the caller ofswitch(). In other words you have a perfectlynormal-looking stack of frames. But whenswitch() is called again,this stored one-shot continuation is exchanged with the current one; itmeans that the caller ofswitch() is suspended with its continuationstored in the container, and the old continuation from the continuletobject is resumed.

The most primitive API is actually ‘permute()’, which just permutes theone-shot continuation stored in two (or more) continulets.

In more details:

  • continulet(callable,*args,**kwds): make a new continulet.Like a generator, this only creates it; thecallable is onlyactually called the first time it is switched to. It will becalled as follows:

    callable(cont,*args,**kwds)

    wherecont is the same continulet object.

    Note that it is actuallycont.__init__() that bindsthe continulet. It is also possible to create a not-bound-yetcontinulet by calling explicitlycontinulet.__new__(), andonly bind it later by calling explicitlycont.__init__().

  • cont.switch(value=None,to=None): start the continulet ifit was not started yet. Otherwise, store the current continuationincont, and activate the target continuation, which is theone that was previously stored incont. Note that the targetcontinuation was itself previously suspended by another call toswitch(); this olderswitch() will now appear to return.Thevalue argument is any object that is carried to the targetand returned by the target’sswitch().

    Ifto is given, it must be another continulet object. Inthat case, performs a “double switch”: it switches as describedabove tocont, and then immediately switches again toto.This is different from switching directly toto: the currentcontinuation gets stored incont, the old continuation fromcont gets stored into, and only then we resume theexecution from the old continuation out ofto.

  • cont.throw(type,value=None,tb=None,to=None): similar toswitch(), except that immediately after the switch is done, raisethe given exception in the target.

  • cont.is_pending(): return True if the continulet is pending.This is False when it is not initialized (because we called__new__ and not__init__) or when it is finished (becausethecallable() returned). When it is False, the continuletobject is empty and cannot beswitch()-ed to.

  • permute(*continulets): a global function that permutes thecontinuations stored in the given continulets arguments. Mostlytheoretical. In practice, usingcont.switch() is easier andmore efficient than usingpermute(); the latter does not onits own change the currently running frame.

Genlets

The_continuation module also exposes thegenerator decorator:

@generatordeff(cont,a,b):cont.switch(a+b)cont.switch(a+b+1)foriinf(10,20):printi

This example prints 30 and 31. The only advantage over using regulargenerators is that the generator itself is not limited toyieldstatements that must all occur syntactically in the same function.Instead, we can pass aroundcont, e.g. to nested sub-functions, andcallcont.switch(x) from there.

Thegenerator decorator can also be applied to methods:

classX:@generatordeff(self,cont,a,b):...

Greenlets

Greenlets are implemented on top of continulets inlib_pypy/greenlet.py.See the officialdocumentation of the greenlets.

Note that unlike the CPython greenlets, this version does not sufferfrom GC issues: if the program “forgets” an unfinished greenlet, it willalways be collected at the next garbage collection.

Unimplemented features

The following features (present in some past Stackless version of PyPy)are for the time being not supported any more:

  • Coroutines (could be rewritten at app-level)
  • Continuing execution of a continulet in a different thread(but if it is “simple enough”, you can pickle it and unpickle itin the other thread).
  • Automatic unlimited stack (must beemulated so far)
  • Support for other CPUs than x86 and x86-64

We also do not include any of the recent API additions to StacklessPython, likeset_atomic(). Contributions welcome.

Recursion depth limit

You can use continulets to emulate the infinite recursion depth presentin Stackless Python and in stackless-enabled older versions of PyPy.

The trick is to start a continulet “early”, i.e. when the recursiondepth is very low, and switch to it “later”, i.e. when the recursiondepth is high. Example:

from_continuationimportcontinuletdefinvoke(_,callable,arg):returncallable(arg)defbootstrap(c):# this loop runs forever, at a very low recursion depthcallable,arg=c.switch()whileTrue:# start a new continulet from here, and switch to# it using an "exchange", i.e. a switch with to=.to=continulet(invoke,callable,arg)callable,arg=c.switch(to=to)c=continulet(bootstrap)c.switch()defrecursive(n):ifn==0:return("ok",n)ifn%200==0:prev=c.switch((recursive,n-1))else:prev=recursive(n-1)return(prev[0],prev[1]+1)printrecursive(999999)# prints ('ok', 999999)

Note that if you press Ctrl-C while running this example, the tracebackwill be built withall recursive() calls so far, even if this is morethan the number that can possibly fit in the C stack. These frames are“overlapping” each other in the sense of the C stack; more precisely,they are copied out of and into the C stack as needed.

(The example above also makes use of the following general “guideline”to help newcomers write continulets: inbootstrap(c), only callmethods onc, not on another continulet object. That’s why we wrotec.switch(to=to) and notto.switch(), which would mess up thestate. This is however just a guideline; in general we would recommendto use other interfaces like genlets and greenlets.)

Stacklets

Continulets are internally implemented using stacklets, which is thegeneric RPython-level building block for “one-shot continuations”. Formore information about them please see the documentation in the C sourceatrpython/translator/c/src/stacklet/stacklet.h.

The modulerpython.rlib.rstacklet is a thin wrapper around the abovefunctions. The key point is that new() and switch() always return afresh stacklet handle (or an empty one), and switch() additionallyconsumes one. It makes no sense to have code in which the returnedhandle is ignored, or used more than once. Note thatstacklet.c iswritten assuming that the user knows that, and so no additional checkingoccurs; this can easily lead to obscure crashes if you don’t use awrapper like PyPy’s ‘_continuation’ module.

Theory of composability

Although the concept of coroutines is far from new, they have not beengenerally integrated into mainstream languages, or only in limited form(like generators in Python and iterators in C#). We can argue that apossible reason for that is that they do not scale well when a program’scomplexity increases: they look attractive in small examples, but themodels that require explicit switching, for example by naming the targetcoroutine, do not compose naturally. This means that a program thatuses coroutines for two unrelated purposes may run into conflicts causedby unexpected interactions.

To illustrate the problem, consider the following example (simplifiedcode using a theoreticalcoroutine class). First, a simple usage ofcoroutine:

main_coro=coroutine.getcurrent()# the main (outer) coroutinedata=[]defdata_producer():foriinrange(10):# add some numbers to the list 'data' ...data.append(i)data.append(i*5)data.append(i*25)# and then switch back to main to continue processingmain_coro.switch()producer_coro=coroutine()producer_coro.bind(data_producer)defgrab_next_value():ifnotdata:# put some more numbers in the 'data' list if neededproducer_coro.switch()# then grab the next value from the listreturndata.pop(0)

Every call to grab_next_value() returns a single value, but if necessaryit switches into the producer function (and back) to give it a chance toput some more numbers in it.

Now consider a simple reimplementation of Python’s generators in term ofcoroutines:

defgenerator(f):"""Wrap a function 'f' so that it behaves like a generator."""defwrappedfunc(*args,**kwds):g=generator_iterator()g.bind(f,*args,**kwds)returngreturnwrappedfuncclassgenerator_iterator(coroutine):def__iter__(self):returnselfdefnext(self):self.caller=coroutine.getcurrent()self.switch()returnself.answerdefYield(value):"""Yield the value from the current generator."""g=coroutine.getcurrent()g.answer=valueg.caller.switch()defsquares(n):"""Demo generator, producing square numbers."""foriinrange(n):Yield(i*i)squares=generator(squares)forxinsquares(5):printx# this prints 0, 1, 4, 9, 16

Both these examples are attractively elegant. However, they cannot becomposed. If we try to write the following generator:

defgrab_values(n):foriinrange(n):Yield(grab_next_value())grab_values=generator(grab_values)

then the program does not behave as expected. The reason is thefollowing. The generator coroutine that executesgrab_values()callsgrab_next_value(), which may switch to theproducer_corocoroutine. This works so far, but the switching back fromdata_producer() tomain_coro lands in the wrong coroutine: itresumes execution in the main coroutine, which is not the one from whichit comes. We expectdata_producer() to switch back to thegrab_next_values() call, but the latter lives in the generatorcoroutineg created inwrappedfunc, which is totally unknown tothedata_producer() code. Instead, we really switch back to themain coroutine, which confuses thegenerator_iterator.next() method(it gets resumed, but not as a result of a call toYield()).

Thus the notion of coroutine isnot composable. By opposition, theprimitive notion of continulets is composable: if you build twodifferent interfaces on top of it, or have a program that uses twice thesame interface in two parts, then assuming that both parts independentlywork, the composition of the two parts still works.

A full proof of that claim would require careful definitions, but let usjust claim that this fact is true because of the following observation:the API of continulets is such that, when doing aswitch(), itrequires the program to have some continulet to explicitly operate on.It shuffles the current continuation with the continuation stored inthat continulet, but has no effect outside. So if a part of a programhas a continulet object, and does not expose it as a global, then therest of the program cannot accidentally influence the continuationstored in that continulet object.

In other words, if we regard the continulet object as being essentiallya modifiablef_back, then it is just a link between the frame ofcallable() and the parent frame — and it cannot be arbitrarilychanged by unrelated code, as long as they don’t explicitly manipulatethe continulet object. Typically, both the frame ofcallable()(commonly a local function) and its parent frame (which is the framethat switched to it) belong to the same class or module; so from thatpoint of view the continulet is a purely local link between two localframes. It doesn’t make sense to have a concept that allows this linkto be manipulated from outside.