Movatterモバイル変換

Copy link

Member

markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There seems to be a lot of unnecessary work happening when moving from trace to trace or from tier to tier.

Thestack_pointer and frame attributes should be correctly handled in both tier 1 and tier 2 interpreters. They shouldn't need fixing up.

		opcode=next_instr[1].op.code;
		}

		// For selected opcodes build a new executor and enter it now.

Copy link

Member

markshannonDec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why "selected opcodes", why not everywhere?

Copy link

MemberAuthor

gvanrossumDec 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In an earlier version that somehow didn't work. Right now the check whether the new trace isn't going to immediately deopt again relies on these opcodes. I figured once we have the side exit machinery working we could gradually increase the scope to other deoptimizations. Also, not all deoptimizations are worthy of the effort (e.g. the PEP 523 test).

Copy link

Member

markshannonDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No special cases, please, it just make the code more complicated and slower.
If we want to treat some exits differently, let's do it properlyfaster-cpython/ideas#638, not here.

Copy link

MemberAuthor

gvanrossumDec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There are several reasons. First, as I explain below, for bytecodes other than branches, I can't promise an exact check for whether the newly created sub-executor doesn't just repeat the same deoptimizing uop that triggered its creation (in which case the sub-executor wouldalways deopt immediately if it is entered at all).

Second, for most bytecodes other than branches, deoptimization paths are relatively rare (IIRC this is apparent from thepystats data -- with the exception of someLOAD_ATTR specializations).

For branches, we expect many cases where the "common" path is not much more common than the "uncommon" path (e.g. 60/40 or 70/30). Now, it might make sense have adifferent special case here, where if e.g._GUARD_IS_TRUE_POP has a hot side-exit, we know that the branch goes the other way, so we can simply create a sub-executor starting at the less-common branch target. The current approach doesn't do this (mostly because I'd have to thread the special case all the way through the various optimizer functions) but just creates a new executor starting at the original Tier 1 branch bytecode -- in the expectation that if the counters are tuned just right, we will have executed the less-common branch in Tier 1 while taking the common branch in Tier 2, so that Tier 1's shift register has changed state and now indicates that the less-common branch is actually taken more frequently. The checks at L1161 and ff. below are a safeguard in case that didn't happen yet (there are all kinds of interesting scenarios, e.g. loops that don't iterate much -- remember that thefirst iteration each time we enter a loop will be done in Tier 1, where we stay until we hit theJUMP_BACKWARD bytecode at the end of the loop).

I propose this PR as a starting point for futher iterations, not as the ultimate design for side-exits. Let's discuss this Monday.

Python/optimizer.c OutdatedShow resolvedHide resolved

Python/ceval.cShow resolvedHide resolved

Python/ceval.c OutdatedShow resolvedHide resolved

Add API to access sub-interpreters

83297df

...and set resume_threshold so they are actually produced.

Add test

10b98f1

gvanrossum changed the title~~Proof of concept: add executor for less-taken branch~~gh-112354: add executor for less-taken branch

bedevere-appbot mentioned this pull request

Trace stitching#112354

Closed

gvanrossum changed the title~~gh-112354: add executor for less-taken branch~~gh-112354: Add executor for less-taken branch

gvanrossum marked this pull request as ready for review

December 13, 2023 22:07

bedevere-appbot added the awaiting core review label

Copy link

MemberAuthor

gvanrossum commentedDec 13, 2023•
edited
Loading

@markshannon Please review again. I did some of the things you asked for, for a few others I explained why not.

TODO:

Clearing out sub-executors when main executor is deallocated. I forgot about this, will add it before you have a look. (Fixed a memory leak while I was at it.)
Blurb.
Reduce stack frame save/restore ops when switching tiers.

gvanrossum added5 commits

December 13, 2023 15:11

Fix memory leak

1450ca6

Clear sub-executors array upon dealloc

dcde4d3

Add blurb

15df63f

Avoid redundant stack frame saves/restores

c786418

Revert "Disable curses tests in --fast-ci mode (make test)"

ee0734b

This has no business being in this PR.This reverts commitf1998c0.

gvanrossum commented

Dec 14, 2023

Include/internal/pycore_uops.hShow resolvedHide resolved

Python/specialize.c

		void
		_Py_Specialize_ForIter(PyObjectiter,_Py_CODEUNITinstr,intoparg)
		{
		assert(_PyOpcode_Deopt[instr->op.code]==FOR_ITER);

Copy link

MemberAuthor

gvanrossumDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We should really add such asserts to many specialization functions; I ran into this one during an intense debugging session.

Copy link

Member

markshannonDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The assert can beinstr->op.code == FOR_ITER and it shouldn't be necessary, as_Py_Specialize_ForIter is only called fromFOR_ITER.

Copy link

MemberAuthor

gvanrossumDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I tried that and I get a Bus error. And of course it's not supposed to be called with something else! But a logic error in my early prototype caused that to happen, and it took me quite a while to track it down.

markshannon reviewed

Dec 14, 2023

		opcode=next_instr[1].op.code;
		}

		// For selected opcodes build a new executor and enter it now.

Copy link

Member

markshannonDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

No special cases, please, it just make the code more complicated and slower.
If we want to treat some exits differently, let's do it properlyfaster-cpython/ideas#638, not here.

		gotoenter_tier_two;// All systems go!
		}

		// The trace is guaranteed to deopt again; forget about it.

Copy link

Member

markshannonDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is it? Why?

Copy link

MemberAuthor

gvanrossumDec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

See explanation above.

		Py_DECREF(current_executor);
		current_executor= (_PyUOpExecutorObject)pexecutor;

		// Reject trace if it repeats the uop that just deoptimized.

Copy link

Member

markshannonDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why?

Copy link

MemberAuthor

gvanrossumDec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This test may be a bit imprecise(*), but it tries to discard the case where, even though the counter in the executor indicated that this side exit is "hot", the Tier 1 bytecode hasn't been re-specialized yet. In that case the trace projection will just repeat the uop that just took a deopt side exit, causing it to immediately deopt again. This seems a waste of time and executors -- eventually the sub-executor's deopt counter will also indicate it is hot, and then we'll try again, but it seems better (if we can catch it) to avoid creating the sub-executor in the first place, relying on exponential backoff for the side-exit counter instead (implemented below at L1180 and ff.).

For various reasons, the side-exit counters and the Tier 1 deopt counters don't run in sync, so it's possible that the side-exit counter triggers before the Tier 1 counter has re-specialized. This check gives that another chance.

The test that I wouldlike to use here would be check if the Tier 1 opcode is still unchanged (i.e., not re-specialized), but the executor doesn't record that information (and it would take up a lot of space, we'd need an extra byte for each uop that can deoptimize at least).

(*) The test I wrote is exact for the conditional branches I special-cased above (that's why there's a further special case here for_IS_NONE). For other opcodes it may miss a few cases, e.g. when a single T1 bytecode translates to multiple guards and the failing guard is not the first uop in the translation (i.e. this would always happens for calls, whose translation always starts with_PEP_523, which never dopts in cases we care about). In those cases we can produce a sub-executor that immediately deoptimizes. (And we never try to re-create executors, no matter how often it deoptimizes -- that's a general flaw in the current executor architecture that we should probably file separately.)

Python/specialize.c

		void
		_Py_Specialize_ForIter(PyObjectiter,_Py_CODEUNITinstr,intoparg)
		{
		assert(_PyOpcode_Deopt[instr->op.code]==FOR_ITER);

Copy link

Member

markshannonDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The assert can beinstr->op.code == FOR_ITER and it shouldn't be necessary, as_Py_Specialize_ForIter is only called fromFOR_ITER.

Python/ceval.c OutdatedShow resolvedHide resolved

Copy link

Member

markshannon commentedDec 14, 2023

#113104 unifies_EXIT_TRACE with other exits and reduces the number of code paths.

gvanrossum added3 commits

December 14, 2023 08:23

Merge branch 'main' into uops-extras

655a841

Merge branch 'main' into uops-extras

32e36fa

Fix compiler warning about int/Py_ssize_t

f5b317a

gvanrossum commented

Dec 15, 2023

Copy link

MemberAuthor

gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Let's talk offline about special cases for side exits on Monday. I would prefer to do only the special cases first and generalize later, but I hear that you prefer a different development strategy.

Python/specialize.c

		void
		_Py_Specialize_ForIter(PyObjectiter,_Py_CODEUNITinstr,intoparg)
		{
		assert(_PyOpcode_Deopt[instr->op.code]==FOR_ITER);

Copy link

MemberAuthor

gvanrossumDec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

gvanrossum commented

Dec 15, 2023

Python/ceval.c OutdatedShow resolvedHide resolved

gvanrossum added2 commits

December 15, 2023 20:54

Be less casual about incref/decref current executor

4804a3c

Slightly nicer way to handle refcounts

46c7d26

gvanrossum commented

Dec 17, 2023

Copy link

MemberAuthor

gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Some food for thought for Monday's discussion.

		Py_DECREF(current_executor);
		current_executor= (_PyUOpExecutorObject)pexecutor;

		// Reject trace if it repeats the uop that just deoptimized.

Copy link

MemberAuthor

gvanrossumDec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

		gotoenter_tier_two;// All systems go!
		}

		// The trace is guaranteed to deopt again; forget about it.

Copy link

MemberAuthor

gvanrossumDec 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

See explanation above.