Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
gh-104584: Support most jumping instructions#106393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
FWIW the Tier 2 interpreter now supports 109 bytecodes and 12 uops (the Tier 1 interpreter supports 202 bytecodes). Cases in Python/executor_cases.c.h:
|
It looks like if we can manage to arrange that a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We want to support specialized instructions, like LOAD_ATTR_INSTANCE_VALUE, and simple instructions like LOAD_FAST, but we specifically do not want to support unspecialized instructions like LOAD_ATTR.
If those instructions show up, either we are projecting the superblock too early or, more likely, need better specialization.
Regarding jumps:
- Unconditional jumps should be nops, simply relying on
SAVE_IP
to update the externally visible state - Conditional jumps should be converted to conditional exits and unconditional jumps.
@@ -2219,17 +2219,17 @@ dummy_func( | |||
} | |||
inst(JUMP_FORWARD, (--)) { | |||
JUMPBY(oparg); | |||
JUMP_POP_DISPATCH(oparg, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can we leave dispatch to the code generator?
The saved instruction pointer (frame->prev_instr
) is part of the VM state like any other, so shouldn't need special casing.Unless the necessary information is not otherwise present. All the information necessary is inJUMPBY(oparg)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Okay, so you're saying that for Tier 2 the code generator should just replaceJUMPBY(<expr>)
with something Tier-2-appropriate. That could actually work. There are three places where we shouldn't do that,SEND
,JUMP_BACKWARD
andENTER_EXECUTOR
, we can exclude those by forbidding something they use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
SEND
will probably need to be broken up into two micro ops, one that does the send, and another that does the jump. TheSEND
jump should be a more-or-less normal jump. We also need to track where we are sending from, to handle the matchingYIELD_VALUE
, so I'd "forbid"SEND
for now.
I think you already handleJUMP_BACKWARD
andENTER_EXECUTOR
correctly. They complete the loop, or exit.
Python/optimizer.c Outdated
@@ -338,6 +339,13 @@ translate_bytecode_to_trace( | |||
ADD_TO_TRACE(SAVE_IP, (int)(instr - (_Py_CODEUNIT *)code->co_code_adaptive)); | |||
int opcode = instr->op.code; | |||
uint64_t operand = instr->op.arg; | |||
// TODO: EXTENDED_ARG handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Setoparg = 0
, loop over EXTENDED_ARGS, shifting as you go then shift in the final oparg.
https://github.com/python/cpython/blob/main/Python/instrumentation.c#L1332
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Will do.
gvanrossumJul 5, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This turns out more complicated than I expected. The big question is:what IP shouldSAVE_IP
save?
To make my current deopt strategy work (re-execute the specialized bytecode, which will fail the same guard and then deopt to the generic bytecode), the saved IP should point to theEXTENDED_ARG
instruction. But the jump offsets are calculated relative to the jumping opcode. I suppose I could add the number ofEXTENDED_ARG
opcodes to the jump offsets (or subtract, for backward jumps).
But I worry that the error handling code might also need an adjustment -- I don't know ifEXTENDED_ARG
is included in exception handling ranges or if the error handling looks at the last executed bytecode.
I guess for now I can just treatEXTENDED_ARG
as an unsupported opcode and kick this can of worms down the road...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Saving the IP of the start of theinstruction (not codeunit) is the correct thing, I think.
On entering the tier 1 interpreter, it will then get the correctoparg
as it will execute all theEXTENDED_ARG
s.
EXTENDED_ARG
can never cause an error, so it doesn't matter whether they are included in the jump tables, although I think they are anyway.
Python/optimizer.c Outdated
_PyExecutorObject *executor = (_PyExecutorObject *)code->co_executors->executors[operand&255]; | ||
opcode = executor->vm_data.opcode; | ||
DPRINTF(2, " * ENTER_EXECUTOR -> %s\n", _PyOpcode_OpName[opcode]); | ||
operand = executor->vm_data.oparg; // TODO: EXTENDED_ARG handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
operand = (operand & 0xffffff00) | executor->vm_data.oparg
Python/executor_cases.c.h Outdated
@@ -118,12 +118,38 @@ | |||
break; | |||
} | |||
case TO_BOOL: { | |||
static_assert(INLINE_CACHE_ENTRIES_TO_BOOL == 3, "incorrect cache size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We don't want unspecialized instructions in the superblocks. I'd treatENABLE_SPECIALIZATION
as one of the forbidden words.
The fix is to improve our specialization, not to handle unspecialized instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That seems a bit harsh. It would mean that if we have a sequence of innocent bytecodes like
LOAD_FAST iLOAD_CONST jBINARY_OP (/)STORE_FAST k...
we would not be able to include theBINARY_OP
andSTORE_FAST
instructions in a superblock, because the unspecializedBINARY_OP
ends the superblock generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Harsh, but fair.BINARY_OP
needs to work harder at being specialized 🙂
Seriously though, there is no merit in longer superblocks if we can't optimize them well, and the unspecialized forms do not optimize well. They are much slower, and force us to discard a lot of type information.
It is far more profitable to improve the specializer, so that unspecialized instructions are rare.
opcode = executor->vm_data.opcode; | ||
DPRINTF(2, " * ENTER_EXECUTOR -> %s\n", _PyOpcode_OpName[opcode]); | ||
operand = executor->vm_data.oparg; // TODO: EXTENDED_ARG handling | ||
} | ||
switch (opcode) { | ||
case LOAD_FAST_LOAD_FAST: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Rather than special case individual instructions, can we test on flags?
That way, this will continue to work if we addLOAD_FAST_LOAD_CONST
.
Something likeif (_Py_OpcodeMetadata[opcode].is_super_instruction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could do something where the code generator encodes this in thesize
field of the expansions, in the switch (currently) at line 405 below. E.g.-1
meansoperand >> 4
and-2
meansoperand & 0xF
. Should probably switch to an enum. The generator would have to work a little harder but that seems doable (e.g. it could detect the use ofoparg1
andoparg2
and then parse the instruction name into two pieces).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That should work, although I was thinking of the dumber approach of just annotatingLOAD_FAST_LOAD_FAST
, etc as a superinstruction in bytecodes.c. Something like adding/* superinstruction */
@@ -2702,6 +2702,10 @@ void Py_LeaveRecursiveCall(void) | |||
///////////////////// Experimental UOp Interpreter ///////////////////// | |||
#undef JUMP_POP_DISPATCH | |||
#define JUMP_POP_DISPATCH(x, n) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This seems unnecessary.
The superblock generator should be converting each conditional jump into a conditional exit, followed by an unconditional jump.
Such that:
POP_JUMP_IF_TRUE label
becomes:
False branch more likely
EXIT_IF_TRUE
True branch more likely
EXIT_IF_FALSEJUMP_FORWARD label
As for which branch is more likely, we will need to record which way branches go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Okay, just so I understand correctly, the idea is thatEXIT_IF_TRUE
/FALSE
doesn't pop, and continues in the bytecode at the originalPOP_JUMP_IF_TRUE
instruction, which will pop. So the "False branch more likely" (i.e., branch not likely) version will have to actually translate toEXIT_IF_TRUE; POP_TOP
. We could special-case this in the translator: it would have put such a list of uops in the expansion forPOP_JUMP_IF_TRUE
and friends, and we'd have to add hand-written cases to the uop executor forEXIT_IF_TRUE
etc.
In the other case (branch likely) theJUMP_FORWARD label
could just beSAVE_IP label
; superblock generation would then continue from the bytecode atlabel
. That's not something I currently do -- a simplistic approach could do theSAVE_IP
followed byEXIT_TRACE
, and we can iterate later on continuing fromlabel
.
But recording which way branches go is something for a future PR; again the most simplistic thing to do for now is to assume that branching is less likely than not branching (i.e., always generateEXIT_IF_TRUE; POP_TOP
for now).
Honestly, for now I think I'll stick to just making the generator explicitly translateJUMPBY(n)
into the correct sequence ofJUMPBY
,STACK_SHRINK
andDISPATCH
. Nothing should followJUMPBY
then.
markshannonJul 5, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think it makes more sense to consume the condition before exiting
These two instructions,POP_JUMP_IF_TRUE
andPOP_JUMP_IF_FALSE
(theNone
variants can be broken down intoLOAD_CONST None; IS_OP; POP_JUMP...
) are special, so don't worry too much about fitting them into the more general framework.
There are a number of approaches, for exits. Here are three:
- Leave the condition on the stack, and exit to the
POP_JUMP...
instruction.
I don't like this approach as it is inefficient and will make trace stitching tricky - Pop the condition and exit to the target, or successor, instruction.
This is better, but means that theEXIT
uop needs to handle a jump and the exit. - Convert the superblock into a tree, jumping to an exit branch which makes an unconditional exit.
Definitely my preferred option. Means that there is only oneEXIT
uop and that it just exits.
Option 3 makes the superblock creation more complex, but simplifies the tier2 interpreter.
We will want to push some operations onto exit branches in the specializer and partial evaluator, so we will need to convert the superblock to a tree at some point.
If option 3 is too complex for this PR, option 2 should work as a temporary step.
So for option 3, using the example above ofPOP_JUMP_IF_TRUE label
we get:
False branch more likely
POP_JUMP_IF_TRUE_UOP uop_label: ...uop_label: SAVE_IP label EXIT
True branch more likely
POP_JUMP_IF_FALSE_UOP uop_label: SAVE_IP label ...uop_label: EXIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks, this gave me food for thought.
@@ -2219,17 +2219,17 @@ dummy_func( | |||
} | |||
inst(JUMP_FORWARD, (--)) { | |||
JUMPBY(oparg); | |||
JUMP_POP_DISPATCH(oparg, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Okay, so you're saying that for Tier 2 the code generator should just replaceJUMPBY(<expr>)
with something Tier-2-appropriate. That could actually work. There are three places where we shouldn't do that,SEND
,JUMP_BACKWARD
andENTER_EXECUTOR
, we can exclude those by forbidding something they use.
@@ -2702,6 +2702,10 @@ void Py_LeaveRecursiveCall(void) | |||
///////////////////// Experimental UOp Interpreter ///////////////////// | |||
#undef JUMP_POP_DISPATCH | |||
#define JUMP_POP_DISPATCH(x, n) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Okay, just so I understand correctly, the idea is thatEXIT_IF_TRUE
/FALSE
doesn't pop, and continues in the bytecode at the originalPOP_JUMP_IF_TRUE
instruction, which will pop. So the "False branch more likely" (i.e., branch not likely) version will have to actually translate toEXIT_IF_TRUE; POP_TOP
. We could special-case this in the translator: it would have put such a list of uops in the expansion forPOP_JUMP_IF_TRUE
and friends, and we'd have to add hand-written cases to the uop executor forEXIT_IF_TRUE
etc.
In the other case (branch likely) theJUMP_FORWARD label
could just beSAVE_IP label
; superblock generation would then continue from the bytecode atlabel
. That's not something I currently do -- a simplistic approach could do theSAVE_IP
followed byEXIT_TRACE
, and we can iterate later on continuing fromlabel
.
But recording which way branches go is something for a future PR; again the most simplistic thing to do for now is to assume that branching is less likely than not branching (i.e., always generateEXIT_IF_TRUE; POP_TOP
for now).
Honestly, for now I think I'll stick to just making the generator explicitly translateJUMPBY(n)
into the correct sequence ofJUMPBY
,STACK_SHRINK
andDISPATCH
. Nothing should followJUMPBY
then.
Python/executor_cases.c.h Outdated
@@ -118,12 +118,38 @@ | |||
break; | |||
} | |||
case TO_BOOL: { | |||
static_assert(INLINE_CACHE_ENTRIES_TO_BOOL == 3, "incorrect cache size"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That seems a bit harsh. It would mean that if we have a sequence of innocent bytecodes like
LOAD_FAST iLOAD_CONST jBINARY_OP (/)STORE_FAST k...
we would not be able to include theBINARY_OP
andSTORE_FAST
instructions in a superblock, because the unspecializedBINARY_OP
ends the superblock generation.
Python/executor_cases.c.h Outdated
#line127 "Python/executor_cases.c.h" | ||
#line153 "Python/executor_cases.c.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Useless diff chunks like this make it harder to review the generated output. I would love to drop the#line
directives. (Unsurprisingly, I get a lot of value out of seeing the diff of the generated code when I am fiddling with the code generator.) For debugging you could still generate line numbers using
python3 Tools/cases_generator/generate_cases.py -lPy_BUILD_CORE
opcode = executor->vm_data.opcode; | ||
DPRINTF(2, " * ENTER_EXECUTOR -> %s\n", _PyOpcode_OpName[opcode]); | ||
operand = executor->vm_data.oparg; // TODO: EXTENDED_ARG handling | ||
} | ||
switch (opcode) { | ||
case LOAD_FAST_LOAD_FAST: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
We could do something where the code generator encodes this in thesize
field of the expansions, in the switch (currently) at line 405 below. E.g.-1
meansoperand >> 4
and-2
meansoperand & 0xF
. Should probably switch to an enum. The generator would have to work a little harder but that seems doable (e.g. it could detect the use ofoparg1
andoparg2
and then parse the instruction name into two pieces).
Python/optimizer.c Outdated
@@ -338,6 +339,13 @@ translate_bytecode_to_trace( | |||
ADD_TO_TRACE(SAVE_IP, (int)(instr - (_Py_CODEUNIT *)code->co_code_adaptive)); | |||
int opcode = instr->op.code; | |||
uint64_t operand = instr->op.arg; | |||
// TODO: EXTENDED_ARG handling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Handling branches is a bit tricky, so it's OK to take a bit of time to get this right.
@@ -2219,17 +2219,17 @@ dummy_func( | |||
} | |||
inst(JUMP_FORWARD, (--)) { | |||
JUMPBY(oparg); | |||
JUMP_POP_DISPATCH(oparg, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
SEND
will probably need to be broken up into two micro ops, one that does the send, and another that does the jump. TheSEND
jump should be a more-or-less normal jump. We also need to track where we are sending from, to handle the matchingYIELD_VALUE
, so I'd "forbid"SEND
for now.
I think you already handleJUMP_BACKWARD
andENTER_EXECUTOR
correctly. They complete the loop, or exit.
@@ -2702,6 +2702,10 @@ void Py_LeaveRecursiveCall(void) | |||
///////////////////// Experimental UOp Interpreter ///////////////////// | |||
#undef JUMP_POP_DISPATCH | |||
#define JUMP_POP_DISPATCH(x, n) \ |
markshannonJul 5, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think it makes more sense to consume the condition before exiting
These two instructions,POP_JUMP_IF_TRUE
andPOP_JUMP_IF_FALSE
(theNone
variants can be broken down intoLOAD_CONST None; IS_OP; POP_JUMP...
) are special, so don't worry too much about fitting them into the more general framework.
There are a number of approaches, for exits. Here are three:
- Leave the condition on the stack, and exit to the
POP_JUMP...
instruction.
I don't like this approach as it is inefficient and will make trace stitching tricky - Pop the condition and exit to the target, or successor, instruction.
This is better, but means that theEXIT
uop needs to handle a jump and the exit. - Convert the superblock into a tree, jumping to an exit branch which makes an unconditional exit.
Definitely my preferred option. Means that there is only oneEXIT
uop and that it just exits.
Option 3 makes the superblock creation more complex, but simplifies the tier2 interpreter.
We will want to push some operations onto exit branches in the specializer and partial evaluator, so we will need to convert the superblock to a tree at some point.
If option 3 is too complex for this PR, option 2 should work as a temporary step.
So for option 3, using the example above ofPOP_JUMP_IF_TRUE label
we get:
False branch more likely
POP_JUMP_IF_TRUE_UOP uop_label: ...uop_label: SAVE_IP label EXIT
True branch more likely
POP_JUMP_IF_FALSE_UOP uop_label: SAVE_IP label ...uop_label: EXIT
opcode = executor->vm_data.opcode; | ||
DPRINTF(2, " * ENTER_EXECUTOR -> %s\n", _PyOpcode_OpName[opcode]); | ||
operand = executor->vm_data.oparg; // TODO: EXTENDED_ARG handling | ||
} | ||
switch (opcode) { | ||
case LOAD_FAST_LOAD_FAST: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That should work, although I was thinking of the dumber approach of just annotatingLOAD_FAST_LOAD_FAST
, etc as a superinstruction in bytecodes.c. Something like adding/* superinstruction */
gvanrossum commentedJul 6, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Drat, the super-instruction refactor is broken. But1868f91 works great! I'll debug later. |
I think it might be worth breaking up this PR into 3:
The handling of jumps has some subtleties, so we might as well get the other changes in while we refine the design of jump handling. |
I might do that, though I suspect there are mute lurking bugs that we will only find by attempting to implement jumps. Also, jumps will be an iterative design anyway. And where do you see handling EXTENDED_ARG fitting in? |
In another PR as well. I much prefer lots of small PRs. Large, draft PRs tend to block other progress. |
Will do. I wonder though -- is a draft PR more of a problem than a private branch? If you'd rather have me do more of the latter I'm fine with that, but draft PRs have the advantage that they run CI. |
When `_PyOptimizer_BackEdge` returns `NULL`, we should restore `next_instr` (and `stack_pointer`). To accomplish this we should jump to `resume_with_error` instead of just `error`.The problem this causes is subtle -- the only repro I have is in PRgh-106393, at commitd7df54b. But the fix is real (as shown later in that PR).While we're at it, also improve the debug output: the offsets at which traces are identified are now measured in bytes, and always show the start offset. This makes it easier to correlate executor calls with optimizer calls, and either with `dis` output.<!-- gh-issue-number:gh-104584 -->* Issue:gh-104584<!-- /gh-issue-number -->
I pushed a leaner version, but without any of the significant changes to the branch strategy, so it's still a draft. Everything else has been split out or is being split out. |
8d07517
toc87f9d6
CompareIntroduce a new macro, JUMP_POP_DISPATCH(x, n).This does JUMPBY(x), STACK_SHRINK(n), DISPATCH().Most JUMP opcodes can use this.The exceptions are SEND, JUMP_BACKWARD, and JUMP_BACKWARD_NO_INTERRUPT.For JUMP_BACKWARD, I have to research whether CHECK_EVAL_BREAKER() and JUMPBY() commute.I think I'll just punt on SEND (it's too complex anyways).
This involves some subtle rearrangement of code in JUMP_BACKWARD.
This may destroy the symmetry or slow things down,but (for now) it's needed so that the executor canat least avoid bailing when the jump is not taken.(The original code was doing a jump 0 in that case.)
If JUMP_BACKWARD jumps to the start of the trace, add this.It contains an eval breaker check.
Okay, I'll close this. I'll create a new issue describing how I think branches could be handled. |
Uh oh!
There was an error while loading.Please reload this page.
SEND
FOR_ITER
and its specializations, exceptFOR_ITER_GEN
ENTER_EXECUTOR
POP_JUMP_IF_TRUE
/FALSE
to only jump when neededJUMP_BACKWARD
to top of trace becomes a specialJUMP_TO_TOP
uopA big weakness (?) is that this always assumes that branches are rarely taken. Any time a branch or jump (other than to the top of the trace) is taken, we leave the trace.