NotificationsYou must be signed in to change notification settings
Fork32.3k
Star67.8k

gh-91048: Add support for reconstructing async call stacks#103976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Closed

mpage wants to merge8 commits intopython:mainfrommpage:port-awaiters

Closed

gh-91048: Add support for reconstructing async call stacks#103976

mpage wants to merge8 commits intopython:mainfrommpage:port-awaiters

Conversation

Copy link

Contributor

mpage commentedApr 28, 2023•
edited by bedevere-bot
Loading

When profiling an async Python application it's useful to see both the
stack for the currently executing task as well as the chain of coroutines
that are transitively awaiting the task. Consider the following example,
where T represents a task, C represents a coroutine, and A '->' B
indicates A is awaiting B.

T0    +---> T1|     |     |C0    |     C2|     |     |v     |     vC1    |     C3|     |+-----|

The async stack from C3 would be C3, C2, C1, C0. In contrast, the
synchronous call stack while C3 is executing is only C3, C2. It's
possible to reconstruct this view in most cases using what is
currently available in CPython, however it's difficult to do so
efficiently, and would be very challenging to do so, let alone
efficiently, in an out of process profiler that leverages eBPF.

This introduces a new field onto coroutines and async generators
that makes it easy to efficiently reconstruct the async call stack.
The field stores an owned reference, set by the interpreter, to
the coroutine or async generator that is awaiting the field's owner.
To reconstruct the chain of coroutines/async generators one only
needs to walk the new field backwards.

Intermediate awaitables (e.g.Task,_GatheringFuture) complicate
maintaining a complete chain of awaiters. A special method,__set_awaiter__
is introduced to simplify the process. Types can provide an implementation
of this method to forward the awaiter on child objects.

Issue:Async Call-Stack Reconstruction #91048

mpage added4 commits

April 28, 2023 10:35

Add support for reconstructing async call stacks

ba18f2d

When profiling an async Python application it's useful to see both thestack for the currently executing task as well as the chain of coroutinesthat are transitively awaiting the task. Consider the following example,where T represents a task, C represents a coroutine, and A '->' Bindicates A is awaiting B.    T0    +---> T1    |     |     |    C0    |     C2    |     |     |    v     |     v    C1    |     C3    |     |    +-----|The async stack from C3 would be C3, C2, C1, C0. In contrast, thesynchronous call stack while C3 is executing is only C3, C2. It'spossible to reconstruct this view in most cases using what iscurrently available in CPython, however it's difficult to do soefficiently, and would be very challenging to do so, let aloneefficiently, in an out of process profiler that leverages eBPF.This introduces a new field onto coroutines and async generatorsthat makes it easy to efficiently reconstruct the async call stack.The field stores an owned reference, set by the interpreter, tothe coroutine or async generator that is awaiting the field's owner.To reconstruct the chain of coroutines/async generators one onlyneeds to walk the new field backwards.Intermediate awaitables (e.g. `Task`, `_GatheringFuture`) complicatemaintaining a complete chain of awaiters. A special method, `__set_awaiter__`is introduced to simplify the process. Types can provide an implementationof this method to forward the awaiter on child objects.

Fix formatting of blurb

975bd29

Fix style for_PyAwaitable_SetAwaiter

669fe71

Revert newline removal

9e234bb

bedevere-bot mentioned this pull request

Apr 28, 2023

Async Call-Stack Reconstruction#91048

Open

bedevere-bot added the awaiting review label

Apr 28, 2023

Fix inline literal syntax in blurb

34e674e

mpage marked this pull request as ready for review

April 28, 2023 20:33

mpage requested review frommarkshannon,1st1,asvetlov,gvanrossum,kumaraditya303 andwillingc ascode owners

April 28, 2023 20:33

carljm reviewed

Apr 29, 2023

View reviewed changes

Copy link

Member

carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice! Do you have pyperformance benchmark results?

Python/bytecodes.c OutdatedShow resolvedHide resolved

gvanrossum removed request for1st1,asvetlov andkumaraditya303

April 29, 2023 01:29

Copy link

ContributorAuthor

mpage commentedMay 1, 2023

Nice! Do you have pyperformance benchmark results?

Nope, will run them.

Copy link

Member

arhadthedev commentedMay 1, 2023

This branch has conflicts that must be resolved
[...]
Python/generated_cases.c.h

I tried to regeneratePython/generated_cases.c.h by porting the make rule:

cpython/Makefile.pre.in

Lines 1490 to 1503 ina679c3d

	regen-cases:
	# Regenerate Python/generated_cases.c.h
	# and Python/opcode_metadata.h
	# from Python/bytecodes.c
	# using Tools/cases_generator/generate_cases.py
	PYTHONPATH=$(srcdir)/Tools/cases_generator \
	$(PYTHON_FOR_REGEN) \
	$(srcdir)/Tools/cases_generator/generate_cases.py \
	--emit-line-directives \
	-o $(srcdir)/Python/generated_cases.c.h.new \
	-m $(srcdir)/Python/opcode_metadata.h.new \
	$(srcdir)/Python/bytecodes.c
	$(UPDATE_FILE) $(srcdir)/Python/generated_cases.c.h $(srcdir)/Python/generated_cases.c.h.new
	$(UPDATE_FILE) $(srcdir)/Python/opcode_metadata.h $(srcdir)/Python/opcode_metadata.h.new

to the Windows cmd. However, I get an exception:

D:\Oleg\cpython>set PYTHONPATH=D:\Oleg\cpythonD:\Oleg\cpython>python Tools/cases_generator/generate_cases.py --emit-line-directives Python/generated_cases.c.h.new Python/opcode_metadata.h.newRunning Release|x64 interpreter...Traceback (most recent call last):  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1259, in <module>    main()  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1250, in main    a.parse()  # Raises SyntaxError on failure    ^^^^^^^^^  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 557, in parse    self.parse_file(filename, instrs_idx)  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 568, in parse_file    with open(filename) as file:         ^^^^^^^^^^^^^^FileNotFoundError: [Errno 2] No such file or directory: 'Python/generated_cases.c.h.new'D:\Oleg\cpython>python Tools/cases_generator/generate_cases.py --emit-line-directives Python/generated_cases.c.h Python/opcode_metadata.hRunning Release|x64 interpreter...Traceback (most recent call last):  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1259, in <module>    main()  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 1250, in main    a.parse()  # Raises SyntaxError on failure    ^^^^^^^^^  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 557, in parse    self.parse_file(filename, instrs_idx)  File "D:\Oleg\cpython\Tools\cases_generator\generate_cases.py", line 583, in parse_file    raise psr.make_syntax_error(  File "Python/generated_cases.c.h", line 4658    }    ^SyntaxError: Couldn't find '// BEGIN BYTECODES //' in Python/generated_cases.c.hD:\Oleg\cpython>pythonRunning Release|x64 interpreter...Python 3.12.0a5+ (tags/v3.12.0a5-186-gf300a1fa4c:f300a1fa4c, Apr 30 2023, 22:46:31) [MSC v.1929 64 bit (AMD64)] on win32

How should I call it correctly?

Move awaiter setting after specialization block and fix check in `SEN…

c8c348a

…D_GEN`

Copy link

Member

carljm commentedMay 1, 2023

@arhadthedev it would be better to make a new separate issue for difficulties regenerating cases on Windows, rather than conflating it into this PR, which isn't really related.

arhadthedev mentioned this pull request

May 1, 2023

PCBuild needs an analog ofmake regen-cases#104055

Open

Copy link

Member

gvanrossum commentedMay 1, 2023

FYI you should be able to run the script without any parameters and it will use the right files.

Also I thought it is run automatically by the Windows build.bat script.

Copy link

ContributorAuthor

mpage commentedMay 1, 2023•
edited
Loading

PyPerformance results:https://gist.github.com/mpage/b8f712396755c3fee277f84352058608

Looks neutral except for unpack sequence (1.23x faster) and unpickle (1.30x slower). I'm surprised that this change would affect either of these benchmarks. Are they considered to be reliable?

Copy link

Member

carljm commentedMay 1, 2023•
edited
Loading

Just chatted with@mpage and there are a few things we could do to streamline this more:

TheFRAME_OWNED_BY_GENERATOR pre-check shouldn't be necessary for correctness, we can probably just eliminate it?
Since we are doing a runtime check here based on characteristics of the code object (is it anasync def) it would make more sense to push that into the compiler. We could extract this into a dedicatedSET_AWAITER opcode that the compiler emits only in async defs; then no runtime check is needed.
Alternatively if we don't want it in a separate opcode, we could at least specialize away the check by having two versions ofSEND_GEN, one where the caller is known to be an async def and one where it isn't.
We could decide to have this feature enabled only by a runtime flag (since it's for improved debugging/profiling). In that case we probably wouldn't want it handled in the compiler, since then we'd need different sets of pycs for flag on/off.

My inclination is for#2 (do it in the compiler, new opcode.) Would welcome comments from other reviewers (e.g.@markshannon,@gvanrossum,@willingc).

mpage added2 commits

May 1, 2023 15:24

Merge branch 'main' into port-awaiters

67f2394

Handle case where the task's coroutine is NULL

5f63310

This can happen if task's coro completes eagerly

Copy link

Member

markshannon commentedMay 3, 2023

This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines.
It would be more efficient to link the tasks.
So instead of

T0    +---> T1|     |     |C0    |     C2|     |     |v     |     vC1    |     C3|     |+-----|

We would have

T0  ---> T1|        |C0       C2|        |v        vC1       C3

The original could then be reconstructed as needed.

Copy link

Contributor

njsmith commentedMay 3, 2023

I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task doesawait some_task, then it invokesasyncio.Task.__await__, which is arbitrary code that asyncio controls -- wouldn't it be simpler for that code to make a note about the relationship between the tasks? In some cases it might even let you chain together relationships/"causality" through regularFutures, not just tasks.

Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:

async def wait_on(task):    await taskasync def main():    sleep_task = asyncio.ensure_future(asyncio.sleep(10))    task1 = asyncio.ensure_future(wait_on(sleep_task))    task2 = asyncio.ensure_future(wait_on(sleep_task))

I think in this case the link from task1 -> sleep_task would get overwritten and lost?

Copy link

ContributorAuthor

mpage commentedMay 8, 2023

This adds considerable overhead to the fast path of generators, generator expressions and creating coroutines. It would be more efficient to link the tasks. So instead of
The original could then be reconstructed as needed.

It's worth mentioning that the choice to maintain the async stacks on coroutines was to make it as simple as possible to reconstruct the stack from an eBPF probe. I think that linking tasks should be doable, it'll mean that we have to recreate the logic of_PyGen_yf in a probe, but that isn't terribly complicated.

Where were you thinking that we would update the links between tasks?

Copy link

ContributorAuthor

mpage commentedMay 8, 2023

I guess this is a similar question to Mark's... this seems like something that you could do much more simply directly in asyncio, without touching the core interpreter? When a task doesawait some_task, then it invokesasyncio.Task.__await__, which is arbitrary code that asyncio controls -- wouldn't it be simpler for that code to make a note about the relationship between the tasks? In some cases it might even let you chain together relationships/"causality" through regularFutures, not just tasks.

It probably doesn't matter in practice, but since__await__ is an ordinary method there's nothing that guarantees that whatever calls__await__ will await the awaitable that is returned (wow that was a mouthful!).

Also, I might be missing something, but I think the code here gives the wrong answer in cases where a task has multiple waiters, which is allowed by asyncio semantics:
async def wait_on(task):    await taskasync def main():    sleep_task = asyncio.ensure_future(asyncio.sleep(10))    task1 = asyncio.ensure_future(wait_on(sleep_task))    task2 = asyncio.ensure_future(wait_on(sleep_task))
I think in this case the link from task1 -> sleep_task would get overwritten and lost?

Yep, this is a limitation of the implementation, the last awaiter wins. This has worked out OK for us so far.

Copy link

Member

gvanrossum commentedMay 8, 2023

I'm not going to be able to give this enough attention any time soon. Would it be okay to consider this for 3.13 instead of 3.12? 3.12 feature freeze (beta 1) was supposed to be today, and while the RM has postponed that by two weeks, he also gave guidance that this is not meant as an encouragement for new features (the SC ruled on a few pending PEPs, allowing some, postponing others to 3.13).