gh-127022: Simplify PyStackRef_FromPyObjectSteal #127024
This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`. Overall, this improves performance about 2% in the free threading build.

This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` because the macro requires that the tag bits of the arguments match, which is only true in certain special cases.
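For readers unfamiliar with the tagging scheme, here is a minimal sketch of what removing the immortal check amounts to. All names (`DemoObject`, `DemoRef`, `DEMO_TAG_DEFERRED`, the `demo_*` functions) are hypothetical stand-ins chosen for illustration, not CPython's actual definitions, and the sketch assumes a single low tag bit marking a deferred reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not CPython's real types. */
#define DEMO_TAG_DEFERRED ((uintptr_t)1)

typedef struct { uint32_t refcnt; bool immortal; } DemoObject;
typedef struct { uintptr_t bits; } DemoRef;

/* Sample immortal object for experimenting with the two functions. */
static DemoObject demo_immortal = { 1, true };

/* Before: load a flag from the object and branch to pick the tag. */
static inline DemoRef demo_steal_with_check(DemoObject *obj) {
    assert(((uintptr_t)obj & DEMO_TAG_DEFERRED) == 0);  /* input is untagged */
    uintptr_t tag = obj->immortal ? DEMO_TAG_DEFERRED : 0;
    return (DemoRef){ .bits = (uintptr_t)obj | tag };
}

/* After: store the pointer as-is, with no load from the object and no branch. */
static inline DemoRef demo_steal_simple(DemoObject *obj) {
    assert(((uintptr_t)obj & DEMO_TAG_DEFERRED) == 0);
    return (DemoRef){ .bits = (uintptr_t)obj };
}

/* Recover the object pointer regardless of the tag bit. */
static inline DemoObject *demo_as_object(DemoRef ref) {
    return (DemoObject *)(ref.bits & ~DEMO_TAG_DEFERRED);
}
```

Under these assumptions, the same immortal object can now reach the stack with or without the tag bit set, which is why a purely bitwise comparison between two references stops being a reliable identity test.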
2c43ad0 to 5583ac0

Fidget-Spinner commented Nov 20, 2024
Said benchmark: https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20241118-3.14.0a1+-ed7085a-NOGIL

I was thinking of how this breaks the nice encapsulation we have :(, but 2% speedup is too good to give up.
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
There are a number of cases where we've gone from checking bitwise equality with `PyStackRef_Is` to checking equality after masking out the deferred bit with one of `PyStackRef_Is{None,True,False}`. Were the previous checks wrong?
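To make the distinction in this question concrete, here is a small sketch of the two kinds of comparison. The names (`DemoRef`, `demo_is_exactly`, `demo_is_true`, `demo_true_obj`) are hypothetical and only mimic the shape of the real macros, assuming one low tag bit for deferred references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not CPython's real definitions. */
#define DEMO_TAG_DEFERRED ((uintptr_t)1)
typedef struct { uintptr_t bits; } DemoRef;

/* Stand-in for the True singleton; uint64_t keeps the low bit free. */
static uint64_t demo_true_obj;

/* Canonical form: pointer to True with the deferred bit set. */
static inline DemoRef demo_true_canonical(void) {
    return (DemoRef){ .bits = (uintptr_t)&demo_true_obj | DEMO_TAG_DEFERRED };
}

/* Non-canonical form: the same object, but without the tag bit. */
static inline DemoRef demo_true_untagged(void) {
    return (DemoRef){ .bits = (uintptr_t)&demo_true_obj };
}

/* Bitwise equality: correct only when both sides carry the same tag. */
static inline bool demo_is_exactly(DemoRef a, DemoRef b) {
    return a.bits == b.bits;
}

/* Masked equality: ignores the tag bit, so it accepts either form. */
static inline bool demo_is_true(DemoRef ref) {
    return (ref.bits & ~DEMO_TAG_DEFERRED) == (uintptr_t)&demo_true_obj;
}
```

In this sketch the bitwise check rejects an untagged True when compared against the canonical one, while the masked check accepts both.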
Python/bytecodes.c Outdated
      replaced op(_POP_JUMP_IF_TRUE, (cond -- )) {
          assert(PyStackRef_BoolCheck(cond));
    -     int flag = PyStackRef_Is(cond, PyStackRef_True);
    +     int flag = PyStackRef_IsExactly(cond, PyStackRef_True);
Why do we use `PyStackRef_IsExactly` here (which doesn't mask out the deferred bit) but use `PyStackRef_IsFalse` (which does mask out the deferred bit) in `_POP_JUMP_IF_FALSE` above? Is this the rare case where it's safe?
colesbury Nov 21, 2024 • edited
Our codegen ensures that these ops only see True or False. That's often by adding a `TO_BOOL` immediately before, which may be folded into `COMPARE_OP`. The preceding `TO_BOOL`, including in `COMPARE_OP`, ensures the canonical representation of `PyStackRef_False` or `PyStackRef_True` with the deferred bit set.

However, there are two places in `codegen.c` that omit the `TO_BOOL` because they have other reasons to know that the result is exactly a boolean:
Lines 678 to 682 in 09c240f

    ADDOP_I(c, loc, LOAD_FAST, 0);
    ADDOP_LOAD_CONST(c, loc, _PyLong_GetOne());
    ADDOP_I(c, loc, COMPARE_OP, (Py_NE << 5) | compare_masks[Py_NE]);
    NEW_JUMP_TARGET_LABEL(c, body);
    ADDOP_JUMP(c, loc, POP_JUMP_IF_FALSE, body);

Lines 5746 to 5749 in 09c240f

    ADDOP(c, LOC(p), GET_LEN);
    ADDOP_LOAD_CONST_NEW(c, LOC(p), PyLong_FromSsize_t(size));
    ADDOP_COMPARE(c, LOC(p), GtE);
    RETURN_IF_ERROR(jump_to_fail_pop(c, LOC(p), pc, POP_JUMP_IF_FALSE));
The `COMPARE_OP`s here still generate bools, but not always in the canonical representation. So we can either:

- Modify `COMPARE_OP` to ensure the canonical representation like https://github.com/colesbury/cpython/blob/5583ac0c311132e36ef458842e087945898ffdec/Python/bytecodes.c#L2409-L2416
- Use `PyStackRef_IsFalse` (instead of `PyStackRef_IsExactly`) in the `JUMP_IF_FALSE`
- Modify the codegen by inserting `TO_BOOL` in those two spots.
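As a rough illustration of the first option, the sketch below shows a comparison that always emits the canonical tagged bool, which is what makes an exact bitwise test in the following jump safe. This is not the code at the link above; every name (`DemoRef`, `demo_canonical_bool`, `demo_compare_ne`, `demo_jump_if_false_exact`) is a hypothetical stand-in, and the single deferred tag bit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not CPython's real definitions. */
#define DEMO_TAG_DEFERRED ((uintptr_t)1)
typedef struct { uintptr_t bits; } DemoRef;

/* Stand-ins for the True and False singletons. */
static uint64_t demo_true_obj, demo_false_obj;

/* Always build the canonical form: singleton pointer plus deferred bit. */
static inline DemoRef demo_canonical_bool(bool value) {
    uintptr_t ptr = (uintptr_t)(value ? &demo_true_obj : &demo_false_obj);
    return (DemoRef){ .bits = ptr | DEMO_TAG_DEFERRED };
}

/* Option 1: the comparison itself normalizes its result. */
static inline DemoRef demo_compare_ne(long a, long b) {
    return demo_canonical_bool(a != b);
}

/* A jump that tests bits exactly; safe only for canonical inputs. */
static inline bool demo_jump_if_false_exact(DemoRef cond) {
    return cond.bits == demo_canonical_bool(false).bits;
}
```

The second option moves the robustness to the consumer instead (a masked comparison in the jump), and the third restores the invariant in the code generator by inserting the normalizing op.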
That makes sense, thanks for the explanation. Since using `PyStackRef_IsExactly` safely is sensitive to code generation changes, I might suggest using it only when we're sure it actually matters for performance, and otherwise defaulting to the variants that mask out the deferred bits, since those are always safe. I'd guess that this wouldn't affect the performance improvement of this change much, since it should come from avoiding the tagging in `_PyStackRef_FromPyObjectSteal`. I don't feel super strongly though.
I'll switch to using `PyStackRef_IsFalse` and `PyStackRef_IsTrue`.

I'm no longer convinced that `PyStackRef_IsExactly` is actually a performance win (and I didn't see it in measurements). I think we have issues with code generation quality that we'll need to address later. Things like `POP_JUMP_IF_NONE` are composed of `_IS_NONE` and `_POP_JUMP_IF_TRUE` and we pack the intermediate result in a tagged `_PyStackRef`. Clang does a pretty good job of optimizing through it. GCC less so: https://gcc.godbolt.org/z/Ejs8c78qd.
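The pack-then-unpack pattern described here can be sketched roughly as follows. This is an illustrative model with hypothetical names (`DemoRef`, `demo_is_none`, `demo_pop_jump_if_true`, and so on), not the generated interpreter code or the code behind the godbolt link. It shows the intermediate tagged bool that a compiler has to see through to reach the direct test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not CPython's real definitions. */
#define DEMO_TAG_DEFERRED ((uintptr_t)1)
typedef struct { uintptr_t bits; } DemoRef;

/* Stand-ins for the None, True and False singletons. */
static uint64_t demo_none_obj, demo_true_obj, demo_false_obj;

static inline DemoRef demo_ref(const uint64_t *obj, bool deferred) {
    uintptr_t tag = deferred ? DEMO_TAG_DEFERRED : 0;
    return (DemoRef){ .bits = (uintptr_t)obj | tag };
}

/* Masked identity test: ignores the tag bit. */
static inline bool demo_points_to(DemoRef ref, const uint64_t *obj) {
    return (ref.bits & ~DEMO_TAG_DEFERRED) == (uintptr_t)obj;
}

/* First micro-op: pack the result of the None test into a tagged bool. */
static inline DemoRef demo_is_none(DemoRef value) {
    bool is_none = demo_points_to(value, &demo_none_obj);
    return demo_ref(is_none ? &demo_true_obj : &demo_false_obj, true);
}

/* Second micro-op: unpack the tagged bool to decide whether to jump. */
static inline bool demo_pop_jump_if_true(DemoRef cond) {
    return demo_points_to(cond, &demo_true_obj);
}

/* The composed instruction goes through the intermediate tagged value. */
static inline bool demo_pop_jump_if_none(DemoRef value) {
    return demo_pop_jump_if_true(demo_is_none(value));
}

/* What an optimizer would ideally reduce the composition to. */
static inline bool demo_pop_jump_if_none_direct(DemoRef value) {
    return demo_points_to(value, &demo_none_obj);
}
```

Both versions compute the same answer; the difference is only how much work the compiler must do to prove that and drop the intermediate value.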
colesbury commented Nov 21, 2024
No, the previous checks were okay when |
mpage left a comment
Nice!
colesbury commented Nov 22, 2024

Benchmark on most recent changes: https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20241122-3.14.0a1+-a9e4872-NOGIL#vs-base
4759ba6 into python:main
This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`. Overall, this improves performance about 1-2% in the free threading build.

This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` because the macro requires that the tag bits of the arguments match, which is only true in certain special cases.

PyStackRef_FromPyObjectSteal #127022