Python Enhancement Proposals

PEP 789 – Preventing task-cancellation bugs by limiting yield in async generators

Author:
Zac Hatfield-Dodds <zac at zhd.dev>, Nathaniel J. Smith <njs at pobox.com>
PEP-Delegate:

Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
14-May-2024
Python-Version:
3.14

Abstract

Structured concurrency is increasingly popular in Python. Interfaces such as the asyncio.TaskGroup and asyncio.timeout context managers support compositional reasoning, and allow developers to clearly scope the lifetimes of concurrent tasks. However, using yield to suspend a frame inside such a context leads to situations where the wrong task is canceled, timeouts are ignored, and exceptions are mishandled. More fundamentally, suspending a frame inside a TaskGroup violates the structured concurrency design principle that child tasks are encapsulated within their parent frame.

To address these issues, this PEP proposes a new sys.prevent_yields() context manager. When syntactically inside this context, attempting to yield will raise a RuntimeError, preventing the task from yielding. Additionally, a mechanism will be provided for decorators such as @contextmanager to allow yields inside the decorated function. sys.prevent_yields() will be used by asyncio and downstream libraries to implement task groups, timeouts, and cancellation; and a related mechanism by contextlib etc. to convert generators into context managers which allow safe yields.

Background

Structured concurrency is increasingly popular in Python, in the form of newer asyncio interfaces and third-party libraries such as Trio and anyio. These interfaces support compositional reasoning, so long as users never write a yield which suspends a frame while inside a cancel scope.

A cancel scope is a context manager which can… cancel… whatever work occurs within that context (…scope). In asyncio, this is implicit in the design of async with asyncio.timeout(): or async with asyncio.TaskGroup() as tg:, which respectively cancel the contained work after the specified duration, or cancel sibling tasks when one of them raises an exception. The core functionality of a cancel scope is synchronous, but the user-facing context managers may be either sync or async. [1] [2]

This structured approach works beautifully, unless you hit one specific sharp edge: breaking the nesting structure by yielding inside a cancel scope. This has much the same effect on structured control flow as adding just a few cross-function gotos, and the effects are truly dire:

  • The wrong task can be canceled, whether due to a timeout, an error in a sibling task, or an explicit request to cancel some other task
  • Exceptions, including CancelledError, can be delivered to the wrong task
  • Exceptions can go missing entirely, being dropped instead of added to an ExceptionGroup

Problem statement

Here’s the fundamental issue: yield suspends a call frame. It only makes sense to yield in a leaf frame - i.e., if your call stack goes like A -> B -> C, then you can suspend C, but you can’t suspend B while leaving C running.

But, TaskGroup is a kind of “concurrent call” primitive, where a single frame can have multiple child frames that run concurrently. This means that if we allow people to mix yield and TaskGroup, then we can end up in exactly this situation, where B gets suspended but C is actively running. This is nonsensical, and causes serious practical problems (e.g., if C raises an exception and A has returned, we have no way to propagate it).

This is a fundamental incompatibility between generator control flow and structured concurrency control flow, not something we can fix by tweaking our APIs. The only solution seems to be to forbid yield inside a TaskGroup.

Although timeouts don’t leave a child task running, the close analogy and related problems lead us to conclude that yield should be forbidden inside all cancel scopes, not only TaskGroups. See “Can’t we just deliver exceptions to the right place?” for discussion.

Motivating examples

Let’s consider three examples, to see what this might look like in practice.

Leaking a timeout to the outer scope

Suppose that we want to iterate over an async iterator, but wait for at most max_time seconds for each element. We might naturally encapsulate the logic for doing so in an async generator, so that the call site can continue to use a straightforward async for loop:

    async def iter_with_timeout(ait, max_time):
        try:
            while True:
                with timeout(max_time):
                    yield await anext(ait)
        except StopAsyncIteration:
            return

    async def fn():
        async for elem in iter_with_timeout(ait, max_time=1.0):
            await do_something_with(elem)

Unfortunately, there’s a bug in this version: the timeout might expire after the generator yields but before it is resumed! In this case, we’ll see a CancelledError raised in the outer task, where it cannot be caught by the with timeout(max_time): statement.

The fix is fairly simple: get the next element inside the timeout context, and then yield outside that context.

    async def correct_iter_with_timeout(ait, max_time):
        try:
            while True:
                with timeout(max_time):
                    tmp = await anext(ait)
                yield tmp
        except StopAsyncIteration:
            return

Leaking background tasks (breaks cancellation and exception handling)

Timeouts are not the only interface which wraps a cancel scope - and if you need some background worker tasks, you can’t simply close the TaskGroup before yielding.

As an example, let’s look at a fan-in generator, which we’ll use to merge the feeds from several “sensors”. We’ll also set up our mock sensors with a small buffer, so that we’ll raise an error in the background task while control flow is outside the combined_iterators generator.

    import asyncio, itertools

    async def mock_sensor(name):
        for n in itertools.count():
            await asyncio.sleep(0.1)
            if n == 1 and name == "b":  # 'presence detection'
                yield "PRESENT"
            elif n == 3 and name == "a":  # inject a simple bug
                print("oops, raising RuntimeError")
                raise RuntimeError
            else:
                yield f"{name}-{n}"  # non-presence sensor data

    async def move_elements_to_queue(ait, queue):
        async for obj in ait:
            await queue.put(obj)

    async def combined_iterators(*aits):
        """Combine async iterators by starting N tasks, each of
        which move elements from one iterable to a shared queue."""
        q = asyncio.Queue(maxsize=2)
        async with asyncio.TaskGroup() as tg:
            for ait in aits:
                tg.create_task(move_elements_to_queue(ait, q))
            while True:
                yield await q.get()

    async def turn_on_lights_when_someone_gets_home():
        combined = combined_iterators(mock_sensor("a"), mock_sensor("b"))
        async for event in combined:
            print(event)
            if event == "PRESENT":
                break
        print("main task sleeping for a bit")
        await asyncio.sleep(1)  # do some other operation

    asyncio.run(turn_on_lights_when_someone_gets_home())

When we run this code, we see the expected sequence of observations, then a ‘detection’, and then while the main task is sleeping we trigger that RuntimeError in the background. But… we don’t actually observe the RuntimeError, not even as the __context__ of another exception!

    >> python3.11 demo.py
    a-0
    b-0
    a-1
    PRESENT
    main task sleeping for a bit
    oops, raising RuntimeError

    Traceback (most recent call last):
      File "demo.py", line 39, in <module>
        asyncio.run(turn_on_lights_when_someone_gets_home())
      ...
      File "demo.py", line 37, in turn_on_lights_when_someone_gets_home
        await asyncio.sleep(1)  # do some other operation
      File ".../python3.11/asyncio/tasks.py", line 649, in sleep
        return await future
    asyncio.exceptions.CancelledError

Here, again, the problem is that we’ve yielded inside a cancel scope; this time the scope which a TaskGroup uses to cancel sibling tasks when one of the child tasks raises an exception. However, the CancelledError which was intended for the sibling task was instead injected into the outer task, and so we never got a chance to create and raise an ExceptionGroup(..., [RuntimeError()]).

To fix this, we need to turn our async generator into an async context manager, which yields an async iterable - in this case a generator wrapping the queue; in future perhaps the queue itself:

    from contextlib import asynccontextmanager

    async def queue_as_aiterable(queue):
        # async generators that don't `yield` inside a cancel scope are fine!
        while True:
            try:
                yield await queue.get()
            except asyncio.QueueShutDown:
                return

    @asynccontextmanager  # yield-in-cancel-scope is OK in a context manager
    async def combined_iterators(*aits):
        q = asyncio.Queue(maxsize=2)
        async with asyncio.TaskGroup() as tg:
            for ait in aits:
                tg.create_task(move_elements_to_queue(ait, q))
            yield queue_as_aiterable(q)

    async def turn_on_lights_when_someone_gets_home():
        ...
        async with combined_iterators(...) as ait:
            async for event in ait:
                ...

In a user-defined context manager

Yielding inside a cancel scope can be safe, if and only if you’re using the generator to implement a context manager [3] - in this case any propagating exceptions will be redirected to the expected task.

We’ve also implemented the ASYNC101 linter rule in flake8-async, which warns against yielding in known cancel scopes. Could user education be sufficient to avoid these problems? Unfortunately not: user-defined context managers can also wrap a cancel scope, and it’s infeasible to recognize or lint for all such cases.
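To make the limits of linting concrete, here is a toy syntactic check written for this illustration. It is not the flake8-async implementation, and the KNOWN_CANCEL_SCOPES set and all function names are assumptions. It flags a yield nested directly inside a with block whose context expression calls a known name, and it necessarily misses a user-defined wrapper such as open_websocket().

```python
import ast

# Illustrative list only; a real linter needs a far more careful match.
KNOWN_CANCEL_SCOPES = {"timeout", "timeout_at", "TaskGroup"}


def _called_name(expr):
    """Return the final name in a call such as `asyncio.timeout(...)`."""
    if isinstance(expr, ast.Call):
        if isinstance(expr.func, ast.Attribute):
            return expr.func.attr
        if isinstance(expr.func, ast.Name):
            return expr.func.id
    return None


class YieldInKnownScope(ast.NodeVisitor):
    def __init__(self):
        self.depth = 0   # number of enclosing known cancel scopes
        self.lines = []  # line numbers of offending yields

    def _visit_with(self, node):
        known = any(_called_name(item.context_expr) in KNOWN_CANCEL_SCOPES
                    for item in node.items)
        self.depth += known
        for stmt in node.body:
            self.visit(stmt)
        self.depth -= known

    visit_With = visit_AsyncWith = _visit_with

    def _visit_function(self, node):
        # A nested function body is a separate frame: start counting afresh.
        outer, self.depth = self.depth, 0
        self.generic_visit(node)
        self.depth = outer

    visit_FunctionDef = visit_AsyncFunctionDef = visit_Lambda = _visit_function

    def _visit_yield(self, node):
        if self.depth:
            self.lines.append(node.lineno)
        self.generic_visit(node)

    visit_Yield = visit_YieldFrom = _visit_yield


def yields_in_known_scopes(source):
    checker = YieldInKnownScope()
    checker.visit(ast.parse(source))
    return checker.lines


DIRECT = (
    "async def gen(ait, t):\n"
    "    while True:\n"
    "        with timeout(t):\n"
    "            yield await anext(ait)\n"
)
WRAPPED = (
    "async def get_messages(url):\n"
    "    async with open_websocket(url) as ws:\n"
    "        while True:\n"
    "            yield await ws.get_message()\n"
)
```

Here yields_in_known_scopes(DIRECT) reports the yield on line 4, while yields_in_known_scopes(WRAPPED) reports nothing, even though open_websocket() may contain a TaskGroup. A purely syntactic tool cannot see through that call.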

This regularly arises in practice, because ‘run some background tasks for the duration of this context’ is a very common pattern in structured concurrency. We saw that in combined_iterators() above; and have seen this bug in multiple implementations of the websocket protocol:

    async def get_messages(websocket_url):

        # The websocket protocol requires background tasks to manage the socket heartbeat
        async with open_websocket(websocket_url) as ws:  # contains a TaskGroup!
            while True:
                yield await ws.get_message()

    async with open_websocket(websocket_url) as ws:
        async for message in get_messages(ws):
            ...

Specification

To prevent these problems, we propose:

  1. a new context manager, with sys.prevent_yields(reason): ... which will raise a RuntimeError if you attempt to yield while inside it. [4] Cancel-scope-like context managers in asyncio and downstream code can then wrap this to prevent yielding inside their with-block.
  2. a mechanism by which generator-to-context-manager decorators can allow yields across one call. We’re not yet sure what this should look like; the leading candidates are:
    1. a code-object attribute, fn.__code__.co_allow_yields = True, or
    2. some sort of invocation flag, e.g. fn.__invoke_with_yields__, to avoid mutating a code object that might be shared between decorated and undecorated functions
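As a rough illustration of the first item, a cancel-scope-like context manager might wrap the proposed context manager along these lines. This is only a sketch: sys.prevent_yields does not exist in any released interpreter, so the helper below falls back to a no-op, and ToyCancelScope and the helper name are hypothetical.

```python
import contextlib
import sys


def prevent_yields(reason):
    # sys.prevent_yields is only proposed by this PEP; fall back to a no-op
    # context manager on interpreters that do not provide it.
    factory = getattr(sys, "prevent_yields", None)
    if factory is None:
        return contextlib.nullcontext()
    return factory(reason)


class ToyCancelScope:
    """Hypothetical cancel-scope-like context manager (illustration only)."""

    def __enter__(self):
        self._guard = prevent_yields("cannot yield inside ToyCancelScope")
        self._guard.__enter__()
        # ... a real implementation would set up cancellation state here ...
        return self

    def __exit__(self, exc_type, exc, tb):
        # ... and tear it down here, before releasing the guard.
        return self._guard.__exit__(exc_type, exc, tb)
```

On an interpreter implementing this proposal, a generator that yields inside with ToyCancelScope(): would be expected to get a RuntimeError; on current interpreters the guard does nothing.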

Implementation - tracking frames

The new sys.prevent_yields context manager will require interpreter support. For each frame, we track the entries and exits of this context manager.

We’re not particularly attached to the exact representation; we’ll discuss it as a stack (which would support clear error messages), but more compact representations such as pair-of-integers would also work.

  • When entering a newly-created or resumed frame, initialize empty stacks of entries and exits.
  • When returning from a frame, merge these stacks into that of the parent frame.
  • When yielding:
    • if entries != [] and not frame.allow_yield_flag, raise a RuntimeError instead of yielding (the new behavior this PEP proposes)
    • otherwise, merge stacks into the parent frame as for a return.

Because this is about yielding frames within a task, not switching between tasks, syntactic yield and yield from should be affected, but await expressions should not.

We can reduce the overhead by storing this metadata in a single stack per thread for all stack frames which are not generators.
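The bookkeeping rules above can be modelled in plain Python. The following is a toy simulation under stated assumptions, not interpreter code: FrameModel and its method names are invented for illustration, exits are kept as a simple counter, and merging is interpreted as "unmatched exits cancel the parent's entries, remaining entries are appended to the parent".

```python
class FrameModel:
    """Toy stand-in for one frame's prevent_yields bookkeeping."""

    def __init__(self, parent=None, allow_yield_flag=False):
        self.parent = parent
        self.allow_yield_flag = allow_yield_flag
        self.entries = []  # reasons entered during this activation
        self.exits = 0     # exits whose matching entry was merged upward

    def enter(self, reason):
        self.entries.append(reason)

    def exit(self):
        if self.entries:
            self.entries.pop()
        else:
            self.exits += 1

    def _merge_into_parent(self):
        if self.parent is not None:
            for _ in range(self.exits):
                self.parent.exit()
            for reason in self.entries:
                self.parent.enter(reason)
        self.entries, self.exits = [], 0

    def on_return(self):
        self._merge_into_parent()

    def on_yield(self):
        if self.entries != [] and not self.allow_yield_flag:
            raise RuntimeError(f"yield prevented: {self.entries[-1]}")
        self._merge_into_parent()

    def on_resume(self):
        self.entries, self.exits = [], 0


caller = FrameModel()                                  # e.g. an async generator
cm = FrameModel(parent=caller, allow_yield_flag=True)  # e.g. a @contextmanager body

cm.enter("inside a timeout")
cm.on_yield()          # allowed; the entry now belongs to the caller
try:
    caller.on_yield()  # the caller's own yield is refused
except RuntimeError as exc:
    refused = str(exc)

cm.on_resume()
cm.exit()              # leaving the with-block after resumption
cm.on_return()         # the unmatched exit rebalances the caller
caller.on_yield()      # no longer prevented
```

In this model the decorated frame can suspend while its caller cannot, and the caller's state is balanced again once the context manager has exited.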

Worked examples

No-yield example

In this example, we see multiple rounds of the stack merging as we unwind from sys.prevent_yields, through the user-defined ContextManager, back to the original Frame. For brevity, the reason for preventing yields is not shown; it is part of the “1 enter” state.

[Figure: pep-789-example-no-yield.png]

With no yield we don’t raise any errors, and because the number of enters and exits balance, the frame returns as usual with no further tracking.

Attempts-to-yield example

In this example, the Frame attempts to yield while inside the sys.prevent_yields context. This is detected by the interpreter, which raises a RuntimeError instead of suspending the frame.

[Figure: pep-789-example-yield-errors.png]

Allowed-to-yield example

In this example, a decorator has marked the Frame as allowing yields. This could be @contextlib.contextmanager or a related decorator.

[Figure: pep-789-example-yield-allowed.png]

When the Frame is allowed to yield, the entry/exit stack is merged into the parent frame’s stack before suspending. When the Frame resumes, its stack is empty. Finally, when the Frame exits, the exit is merged into the parent frame’s stack, rebalancing it.

This ensures that the parent frame correctly inherits any remaining sys.prevent_yields state, while allowing the Frame to safely suspend and resume.

Allowing yield for context managers

TODO: this section is a placeholder, pending a decision on the mechanism for @contextmanager to re-enable yields in the wrapped function.

  • Explain and show a code sample of how @asynccontextmanager sets the flag

Note that third-party decorators such as @pytest.fixture demonstrate that we can’t just have the interpreter special-case contextlib.

Behavior if sys.prevent_yields is misused

While unwise, it’s possible to call sys.prevent_yields.__enter__ and .__exit__ in an order that does not correspond to any valid nesting, or get an invalid frame state in some other way.

There are two ways sys.prevent_yields.__exit__ could detect an invalid state. First, if yields are not prevented, we can simply raise an exception without changing the state. Second, if an unexpected entry is at the top of the stack, we suggest popping that entry and raising an exception - this ensures that out-of-order calls will still clear the stack, while still making it clear that something is wrong.

(If we choose, e.g., an integer-based rather than stack-based representation, such states may not be distinguishable from correct nesting at all, in which case the question will not arise.)
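A minimal sketch of the two detection cases, for the stack-based representation only. The class name, the use of opaque tokens to identify entries, and the choice of RuntimeError are assumptions made for this illustration; the text above does not fix the exception type.

```python
class PreventYieldsStack:
    """Toy stack-based state; tokens identify individual entries."""

    def __init__(self):
        self._stack = []

    def enter(self, token):
        self._stack.append(token)

    def exit(self, token):
        if not self._stack:
            # First case: yields are not prevented; leave the state unchanged.
            raise RuntimeError("exit without a matching enter")
        top = self._stack.pop()
        if top is not token:
            # Second case: unexpected entry on top; it has still been popped.
            raise RuntimeError("exit called out of order")

    def preventing(self):
        return bool(self._stack)


state = PreventYieldsStack()
a, b = object(), object()
state.enter(a)
state.enter(b)

errors = 0
for token in (a, b, a):  # two out-of-order exits, then one exit too many
    try:
        state.exit(token)
    except RuntimeError:
        errors += 1
```

Every misordered call raises, yet the stack still empties after two exits, so a misbehaving caller cannot leave yields prevented forever.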

Anticipated uses

In the standard library, sys.prevent_yields could be used by asyncio.TaskGroup, asyncio.timeout, and asyncio.timeout_at. Downstream, we expect to use it in trio.CancelScope, async fixtures (in pytest-trio, anyio, etc.), and perhaps other places.

We consider use-cases unrelated to async correctness, such as preventing decimal.localcontext from leaking out of a generator, out of scope for this PEP.

The generator-to-context-manager support would be used by @contextlib.(async)contextmanager, and if necessary in (Async)ExitStack.

Backwards Compatibility

The addition of the sys.prevent_yields context manager, changes to @contextlib.(async)contextmanager, and corresponding interpreter support are all fully backwards-compatible.

Preventing yields inside asyncio.TaskGroup, asyncio.timeout, and asyncio.timeout_at would be a breaking change to at least some code in the wild, which (however unsafe and prone to the motivating problems above) may work often enough to make it into production.

We will seek community feedback on appropriate deprecation pathways for standard-library code, including the suggested length of any deprecation period. As an initial suggestion, we could make suspending stdlib contexts emit a DeprecationWarning only under asyncio debug mode in 3.14; then transition to warn-by-default and error under debug mode in 3.15; and finally a hard error in 3.16.

Irrespective of stdlib usage, downstream frameworks would adopt this functionality immediately.

How widespread is this bug?

We don’t have solid numbers here, but believe that many projects are affected in the wild. Since hitting a moderate and a critical bug attributed to suspending a cancel scope in the same week at work, we’ve used static analysis with some success. Three people Zac spoke to at PyCon recognized the symptoms and concluded that they had likely been affected.

TODO: run the ASYNC101 lint rule across ecosystem projects, e.g. the aio-libs packages, and get some sense of frequency in widely-used PyPI packages? This would help inform the break/deprecation pathways for stdlib code.

How to Teach This

Async generators are very rarely taught to novice programmers.

Most intermediate and advanced Python programmers will only interact with this PEP as users of TaskGroup, timeout, and @contextmanager. For this group, we expect a clear exception message and documentation to be sufficient.

  • A new section will be added to the developing with asyncio page, which briefly states that async generators are not permitted to yield when inside a “cancel scope” context, i.e. TaskGroup or timeout context manager. We anticipate that the problem-restatement and some parts of the motivation section will provide a basis for these docs.
    • When working in codebases which avoid async generators entirely [5], we’ve found that an async context manager yielding an async iterable is a safe and ergonomic replacement for async generators - and avoids the delayed-cleanup problems described in PEP 533, which this proposal does not address.
  • In the docs for each context manager which wraps a cancel scope, and thus now sys.prevent_yields, include a standard sentence such as “If used within an async generator, [it is an error to yield inside this context manager].” with a hyperlink to the explanation above.

For asyncio, Trio, curio, or other-framework maintainers who implement cancel scope semantics, we will ensure that the documentation of sys.prevent_yields gives a full explanation distilled from the solution and implementation sections of this PEP. We anticipate consulting most such maintainers for their feedback on the draft PEP.

Rejected alternatives

PEP 533, deterministic cleanup for iterators

PEP 533 proposes adding __[a]iterclose__ to the iterator protocol, essentially wrapping a with [a]closing(ait) around each (async) for loop. While this would be useful for ensuring timely and deterministic cleanup of resources held by iterators, the problem it aims to solve, it does not fully address the issues that motivate this PEP.

Even with PEP 533, misfired cancellations would still be delivered to the wrong task and could wreak havoc before the iterator is closed. Moreover, it does not address the fundamental structured concurrency problem with TaskGroup, where suspending a frame that owns a TaskGroup is incompatible with the model of child tasks being fully encapsulated within their parent frame.

Deprecate async generators entirely

At the 2024 language summit, several attendees suggested instead deprecating async generators in toto. Unfortunately, while the common-in-practice cases all use async generators, Trio code can trigger the same problem with standard generators:

    # We use Trio for this example, because while `asyncio.timeout()` is async,
    # Trio's CancelScope type and timeout context managers are synchronous.
    import trio

    def abandon_each_iteration_after(max_seconds):
        # This is of course broken, but I can imagine someone trying it...
        while True:
            with trio.move_on_after(max_seconds):
                yield

    @trio.run
    async def main():
        for _ in abandon_each_iteration_after(max_seconds=1):
            await trio.sleep(3)

If it wasn’t for the bug in question, this code would look pretty idiomatic - but after about a second, instead of moving on to the next iteration it raises:

    Traceback (most recent call last):
      File "demo.py", line 10, in <module>
        async def main():
      File "trio/_core/_run.py", line 2297, in run
        raise runner.main_task_outcome.error
      File "demo.py", line 12, in main
        await trio.sleep(3)
      File "trio/_timeouts.py", line 87, in sleep
        await sleep_until(trio.current_time() + seconds)
      ...
      File "trio/_core/_run.py", line 1450, in raise_cancel
        raise Cancelled._create()
    trio.Cancelled: Cancelled

Furthermore, there are some non-cancel-scope synchronous context managers which exhibit related problems, such as the abovementioned decimal.localcontext. While fixing the example below is not a goal of this PEP, it demonstrates that yield-within-with problems are not exclusive to async generators:

    import decimal

    def why_would_you_do_this():
        with decimal.localcontext(decimal.Context(prec=1)):
            yield

    one = decimal.Decimal(1)
    print(one / 3)  # 0.3333333333333333333333333333
    next(gen := why_would_you_do_this())
    print(one / 3)  # 0.3

While I’ve had good experiences in async Python without async generators [5], I’d prefer to fix the problem rather than remove them from the language.

Can’t we just deliver exceptions to the right place?

If we implemented PEP 568 (Generator-sensitivity for Context Variables; see also PEP 550), it would be possible to handle exceptions from timeouts: the event loop could avoid firing a CancelledError until the generator frame which contains the context manager is on the stack - either when the generator is resumed, or when it is finalized.

This can take arbitrarily long; even if we implemented PEP 533 to ensure timely cleanup on exiting (async) for-loops it’s still possible to drive a generator manually with next/send.

However, this doesn’t address the other problem with TaskGroup. The model for generators is that you put a stack frame in suspended animation and can then treat it as an inert value which can be stored, moved around, and maybe discarded or revived in some arbitrary place. The model for structured concurrency is that your stack becomes a tree, with child tasks encapsulated within some parent frame. They’re extending the basic structured programming model in different, and unfortunately incompatible, directions.

Suppose for example that suspending a frame containing an open TaskGroup also suspended all child tasks. This would preserve the ‘downward’ structured concurrency, in that children remain encapsulated - albeit at the cost of deadlocking both of our motivating examples, and much real-world code. However, it would still be possible to resume the generator in a different task, violating the ‘upwards’ invariant of structured concurrency.

We don’t think it’s worth adding this much machinery to handle cancel scopes, while still leaving task groups broken.

Alternative implementation - inspecting bytecode

Jelle Zijlstra has sketched an alternative, where sys.prevent_yields inspects the bytecode of callers until satisfied that there is no yield between the calling instruction pointer and the next context exit. We expect that support for syntactically-nested context managers could be added fairly easily.

However, it’s not yet clear how this would work when user-defined context managers wrap sys.prevent_yields. Worse, this approach ignores explicit calls to __enter__() and __exit__(), meaning that the context management protocol would vary depending on whether the with statement was used.

The ‘only pay if you use it’ performance cost is very attractive. However, inspecting frame objects is prohibitively expensive for core control-flow constructs, and causes whole-program slowdowns via de-optimization. On the other hand, adding interpreter support for better performance leads back to the same pay-regardless semantics as our preferred solution above.

Footnotes

[1]
While cancel scopes are implicit in asyncio, the analogous trio.fail_after() (sync) and trio.open_nursery() (async) context managers literally wrap an instance of trio.CancelScope. We’ll stick with asyncio for examples here, but say “cancel scope” when referring to the framework-independent concept.
[2]
A TaskGroup is not only a cancel scope, but preventing yields would resolve its further problem too. See “Can’t we just deliver exceptions to the right place?”.
[3]
via e.g. contextlib.[async]contextmanager, or moral equivalents such as @pytest.fixture
[4]
Note that this prevents yields in both sync and async generators, so that downstream frameworks can safely define sync cancel scope contexts such as trio.fail_after().
[5] (1,2)
see Zac’s experience report here

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0789.rst

Last modified: 2024-06-04 01:45:13 GMT

