[Python-ideas] PEP 550 v2
Nathaniel Smith njs at pobox.com
Wed Aug 16 03:18:23 EDT 2017
On Tue, Aug 15, 2017 at 4:55 PM, Yury Selivanov
<yselivanov.ml at gmail.com> wrote:
> Hi,
>
> Here's the PEP 550 version 2.

Awesome!

Some of the changes from v1 to v2 might be a bit confusing -- in
particular the thing where ExecutionContext is now a stack of
LocalContext objects instead of just being a mapping. So here's the
big picture as I understand it:

In discussions on the mailing list and off-line, we realized that the
main reason people use "thread locals" is to implement fake dynamic
scoping. Of course, generators/async/await mean that currently it's
impossible to *really* fake dynamic scoping in Python -- that's what
PEP 550 is trying to fix. So PEP 550 v1 essentially added "generator
locals" as a refinement of "thread locals". But... it turns out that
"generator locals" aren't enough to properly implement dynamic scoping
either! So the goal in PEP 550 v2 is to provide semantics strong
enough to *really* get this right.

I wrote up some notes on what I mean by dynamic scoping, and why
neither thread-locals nor generator-locals can fake it:

https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope.ipynb

> Specification
> =============
>
> Execution Context is a mechanism of storing and accessing data specific
> to a logical thread of execution. We consider OS threads,
> generators, and chains of coroutines (such as ``asyncio.Task``)
> to be variants of a logical thread.
>
> In this specification, we will use the following terminology:
>
> * **Local Context**, or LC, is a key/value mapping that stores the
>   context of a logical thread.

If you're more familiar with dynamic scoping, then you can think of an
LC as a single dynamic scope...

> * **Execution Context**, or EC, is an OS-thread-specific dynamic
>   stack of Local Contexts.

...and an EC as a stack of scopes.
Looking up a ContextItem in an EC proceeds by checking the first LC
(innermost scope), then if it doesn't find what it's looking for it
checks the second LC (the next-innermost scope), etc.

> ``ContextItem`` objects have the following methods and attributes:
>
> * ``.description``: read-only description;
>
> * ``.set(o)`` method: set the value to ``o`` for the context item
>   in the execution context.
>
> * ``.get()`` method: return the current EC value for the context item.
>   Context items are initialized with ``None`` when created, so
>   this method call never fails.

Two issues here, that both require some expansion of this API to
reveal a *bit* more information about the EC structure.

1) For trio's cancel scope use case I described in the last, I
actually need some way to read out all the values on the LocalContext
stack. (It would also be helpful if there were some fast way to check
the depth of the ExecutionContext stack -- or at least tell whether
it's 1 deep or more-than-1 deep. I know that any cancel scopes that
are in the bottommost LC will always be attached to the given Task, so
I can set up the scope->task mapping once and re-use it indefinitely.
OTOH for scopes that are stored in higher LCs, I have to check at
every yield whether they're currently in effect. And I want to
minimize the per-yield workload as much as possible.)

2) For classic decimal.localcontext context managers, the idea is
still that you save/restore the value, so that you can nest multiple
context managers without having to push/pop LCs all the time. But the
above API is not actually sufficient to implement a proper
save/restore, for a subtle reason: if you do

    ci.set(ci.get())

then you just (potentially) moved the value from a lower LC up to the
top LC.

Here's an example of a case where this can produce user-visible effects:

https://github.com/njsmith/pep-550-notes/blob/master/dynamic-scope-on-top-of-pep-550-draft-2.py

There are probably a bunch of options for fixing this.
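
A toy model may make the problem easier to see. This is only a sketch
under loud assumptions, not the PEP's implementation: an LC is modelled
as a plain dict, an EC as a list of dicts with the innermost LC last,
and ToyContextItem is a made-up stand-in for ContextItem that takes the
stack as an explicit argument.

```python
# Toy model only: a LocalContext is a dict and an ExecutionContext is a
# list of dicts (innermost last). Nothing here is the real PEP 550 API.

class ToyContextItem:
    def __init__(self, description, ec):
        self.description = description
        self._ec = ec  # hypothetical: the EC stack is passed in explicitly

    def get(self):
        # Check the innermost LC first, then work outwards.
        for lc in reversed(self._ec):
            if self in lc:
                return lc[self]
        return None  # items default to None, so get() never fails

    def set(self, value):
        # set() always writes to the innermost (top) LC.
        self._ec[-1][self] = value


ec = [{}]                      # an EC holding a single outer LC
ci = ToyContextItem("precision", ec)
ci.set(10)                     # stored in the outer LC

ec.append({})                  # model a new LC being pushed on top
before = ci in ec[-1]          # the top LC has no entry for ci yet
ci.set(ci.get())               # naive save/restore of the "same" value
after = ci in ec[-1]           # the value now also lives in the top LC

ec[0][ci] = 20                 # the outer scope later changes its value
shadowed = ci.get()            # the inner scope still sees the stale copy
```

In this model `before` is False, `after` is True, and `shadowed` stays
at 10 even though the outer LC now holds 20: the no-op-looking
set(get()) created a shadowing entry in the top LC.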
But basically we need some API that makes it possible to temporarily
set a value in the top LC, and then restore that value to what it was
before (either the previous value, or 'unset' to unshadow a value in a
lower LC). One simple option would be to make the idiom be something
like:

    @contextmanager
    def local_value(new_value):
        state = ci.get_local_state()
        ci.set(new_value)
        try:
            yield
        finally:
            ci.set_local_state(state)

where 'state' is something like a tuple (ci in EC[-1],
EC[-1].get(ci)). A downside with this is that it's a bit error-prone
(very easy for an unwary user to accidentally use get/set instead of
get_local_state/set_local_state). But I'm sure we can come up with
something.

> Manual Context Management
> -------------------------
>
> Execution Context is generally managed by the Python interpreter,
> but sometimes it is desirable for the user to take the control
> over it. A few examples when this is needed:
>
> * running a computation in ``concurrent.futures.ThreadPoolExecutor``
>   with the current EC;
>
> * reimplementing generators with iterators (more on that later);
>
> * managing contexts in asynchronous frameworks (implement proper
>   EC support in ``asyncio.Task`` and ``asyncio.loop.call_soon``.)
>
> For these purposes we add a set of new APIs (they will be used in
> later sections of this specification):
>
> * ``sys.new_local_context()``: create an empty ``LocalContext``
>   object.
>
> * ``sys.new_execution_context()``: create an empty
>   ``ExecutionContext`` object.
>
> * Both ``LocalContext`` and ``ExecutionContext`` objects are opaque
>   to Python code, and there are no APIs to modify them.
>
> * ``sys.get_execution_context()`` function. The function returns a
>   copy of the current EC: an ``ExecutionContext`` instance.

If there are enough of these functions then it might make sense to
stick them in their own module instead of adding more stuff to sys.
I guess worrying about that can wait until the API details are more
firm though.

> * If ``coro.cr_local_context`` is an empty ``LocalContext`` object
>   that ``coro`` was created with, the interpreter will set
>   ``coro.cr_local_context`` to ``None``.

I like all the ideas in this section, but this specific point feels a
bit weird. Coroutine objects need a second hidden field somewhere to
keep track of whether the object they end up with is the same one they
were created with?

If I set cr_local_context to something else, and then set it back to
the original value, does that trigger the magic await behavior or not?
What if I take the initial LocalContext off of one coroutine and
attach it to another, does that trigger the magic await behavior?

Maybe it would make more sense to have two sentinel values:
UNINITIALIZED and INHERIT?

> To enable correct Execution Context propagation into Tasks, the
> asynchronous framework needs to assist the interpreter:
>
> * When ``create_task`` is called, it should capture the current
>   execution context with ``sys.get_execution_context()`` and save it
>   on the Task object.

I wonder if it would be useful to have an option to squash this
execution context down into a single LocalContext, since we know we'll
be using it for a while and once we've copied an ExecutionContext it
becomes impossible to tell the difference between one that has lots of
internal LocalContexts and one that doesn't. This could also be handy
for trio/curio's semantics where they initialize a new task's context
to be a shallow copy of the parent task: you could do

    new_task_coro.cr_local_context = get_current_context().squash()

and then skip having to wrap every send() call in a run_in_context.

> Generators
> ----------
>
> Generators in Python, while similar to Coroutines, are used in a
> fundamentally different way.
> They are producers of data, and
> they use ``yield`` expression to suspend/resume their execution.
>
> A crucial difference between ``await coro`` and ``yield value`` is
> that the former expression guarantees that the ``coro`` will be
> executed fully, while the latter is producing ``value`` and
> suspending the generator until it gets iterated again.
>
> Generators, similarly to coroutines, have a ``gi_local_context``
> attribute, which is set to an empty Local Context when created.
>
> Contrary to coroutines though, ``yield from o`` expression in
> generators (that are not generator-based coroutines) is semantically
> equivalent to ``for v in o: yield v``, therefore the interpreter does
> not attempt to control their ``gi_local_context``.

Hmm. I assume you're simplifying for expository purposes, but 'yield
from' isn't the same as 'for v in o: yield v'. In fact PEP 380 says:

"Motivation: [...] a piece of code containing a yield cannot be
factored out and put into a separate function in the same way as other
code. [...] If yielding of values is the only concern, this can be
performed without much difficulty using a loop such as 'for v in g:
yield v'. However, if the subgenerator is to interact properly with
the caller in the case of calls to send(), throw() and close(), things
become considerably more difficult. As will be seen later, the
necessary code is very complicated, and it is tricky to handle all the
corner cases correctly."

So it seems to me that the whole idea of 'yield from' is that it's
supposed to handle all the tricky bits needed to guarantee that if you
take some code out of a generator and refactor it into a subgenerator,
then everything works the same as before.
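
The send() difference that PEP 380 describes can be seen in a few lines
of ordinary Python. This is just an illustrative sketch; the function
names (sub, with_loop, with_yield_from, drive) are made up for the
example and have nothing to do with PEP 550 itself.

```python
def sub():
    # A subgenerator that reacts to values sent in by the caller.
    received = yield "ready"
    yield "got %r" % (received,)


def with_loop():
    # The plain loop: values passed to send() stop at this generator's
    # own yield and never reach sub(), which is resumed with None.
    for v in sub():
        yield v


def with_yield_from():
    # 'yield from' forwards send() (and throw()/close()) to sub().
    yield from sub()


def drive(gen):
    # Start the generator, then send one value into it.
    first = next(gen)
    second = gen.send("hello")
    return first, second


loop_result = drive(with_loop())
delegated_result = drive(with_yield_from())
```

Here loop_result is ('ready', 'got None') while delegated_result is
('ready', "got 'hello'"): only the 'yield from' version behaves as if
the body of sub() had been written inline in the outer generator.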
This suggests that 'yield from' should do the same magic as 'await',
where by default the subgenerator shares the same LocalContext as the
parent generator. (And as a bonus it makes things simpler if 'yield
from' and 'await' work the same.)

> Asynchronous Generators
> -----------------------
>
> Asynchronous Generators (AG) interact with the Execution Context
> similarly to regular generators.
>
> They have an ``ag_local_context`` attribute, which, similarly to
> regular generators, can be set to ``None`` to make them use the outer
> Local Context. This is used by the new
> ``contextlib.asynccontextmanager`` decorator.
>
> The EC support of ``await`` expression is implemented using the same
> approach as in coroutines, see the `Coroutine Object Modifications`_
> section.

You showed how to make an iterator that acts like a generator. Is it
also possible to make an async iterator that acts like an async
generator? It's not immediately obvious, because you need to make sure
that the local context gets restored each time you re-enter the
__anext__ generator. I think it's something like:

    class AIter:
        def __init__(self):
            self._local_context = ...

        # Note: intentionally not async
        def __anext__(self):
            coro = self._real_anext()
            coro.cr_local_context = self._local_context
            return coro

        async def _real_anext(self):
            ...

Does that look right?

> ContextItem.get() Cache
> -----------------------
>
> We can add three new fields to ``PyThreadState`` and
> ``PyInterpreterState`` structs:
>
> * ``uint64_t PyThreadState->unique_id``: a globally unique
>   thread state identifier (we can add a counter to
>   ``PyInterpreterState`` and increment it when a new thread state is
>   created.)
>
> * ``uint64_t PyInterpreterState->context_item_deallocs``: every time
>   a ``ContextItem`` is GCed, all Execution Contexts in all threads
>   will lose track of it.
>   ``context_item_deallocs`` will simply
>   count all ``ContextItem`` deallocations.
>
> * ``uint64_t PyThreadState->execution_context_ver``: every time
>   a new item is set, or an existing item is updated, or the stack
>   of execution contexts is changed in the thread, we increment this
>   counter.

I think this can be refined further (and I don't understand
context_item_deallocs -- maybe it's a mistake?). AFAICT the things
that invalidate a ContextItem's cache are:

1) switching threadstates
2) pushing a non-empty LocalContext onto, or popping one off, the
current threadstate's ExecutionContext
3) calling ContextItem.set() on *that* context item

So I'd suggest tracking the thread state id, a counter of how many
non-empty LocalContexts have been pushed/popped on this thread state,
and a *per ContextItem* counter of how many times set() has been
called.

> Backwards Compatibility
> =======================
>
> This proposal preserves 100% backwards compatibility.

While this is mostly true in the strict sense, in practice this PEP is
useless if existing thread-local users like decimal and numpy can't
migrate to it without breaking backcompat. So maybe this section
should discuss that?

(For example, one constraint on the design is that we can't provide
only a pure push/pop API, even though that's what would be most
convenient for context managers like decimal.localcontext or
numpy.errstate, because we also need to provide some backcompat story
for legacy functions like decimal.setcontext and numpy.seterr.)

-n

-- 
Nathaniel J. Smith -- https://vorpus.org