Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-127022: SimplifyPyStackRef_FromPyObjectSteal#127024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
colesbury merged 5 commits intopython:mainfromcolesbury:gh-127022-cheaper-steal
Nov 22, 2024

Conversation

@colesbury
Copy link
Contributor

@colesburycolesbury commentedNov 19, 2024
edited
Loading

This gets rid of the immortal check inPyStackRef_FromPyObjectSteal(). Overall, this improves performance about 1-2% in the free threading build.

This also renamesPyStackRef_Is() toPyStackRef_IsExactly() because the macro requires that the tag bits of the arguments match, which is only true in certain special cases.

This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`.Overall, this improves performance about 2% in the free threadingbuild.This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` becausethe macro requires that the tag bits of the arguments match, which isonly true in certain special cases.
@Fidget-Spinner
Copy link
Member

Said benchmark:https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20241118-3.14.0a1+-ed7085a-NOGIL

I was thinking of how this breaks the nice encapsulation we have :(, but 2% speedup is too good to give up.

@colesburycolesbury marked this pull request as ready for reviewNovember 21, 2024 16:35
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Copy link
Contributor

@mpagempage left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

There are a number of cases where we've gone from checking bitwise equality withPyStackRef_Is to checking equality after masking out the deferred bit with one ofPyStackRef_Is{None,True,False}. Were the previous checks wrong?

replacedop(_POP_JUMP_IF_TRUE, (cond-- )) {
assert(PyStackRef_BoolCheck(cond));
intflag=PyStackRef_Is(cond,PyStackRef_True);
intflag=PyStackRef_IsExactly(cond,PyStackRef_True);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Why do we usePyStackRef_IsExactly here (which doesn't mask out the deferred bit) but usePyStackRef_IsFalse (which does mask out the deferred bit) in_POP_JUMP_IF_FALSE above? Is this the rare case where it's safe?

Copy link
ContributorAuthor

@colesburycolesburyNov 21, 2024
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Our codegen ensures that these ops only see True or False. That's often by adding aTO_BOOL immediately before, which may befolded intoCOMPARE_OP. The precedingTO_BOOL, including inCOMPARE_OP, ensures the canonical representation ofPyStackRef_False orPyStackRef_True with the deferred bit set.

However, there are two places incodegen.c that omit theTO_BOOL because they have other reasons to know that the result is exactly a boolean:

ADDOP_I(c,loc,LOAD_FAST,0);
ADDOP_LOAD_CONST(c,loc,_PyLong_GetOne());
ADDOP_I(c,loc,COMPARE_OP, (Py_NE <<5) |compare_masks[Py_NE]);
NEW_JUMP_TARGET_LABEL(c,body);
ADDOP_JUMP(c,loc,POP_JUMP_IF_FALSE,body);

cpython/Python/codegen.c

Lines 5746 to 5749 in09c240f

ADDOP(c,LOC(p),GET_LEN);
ADDOP_LOAD_CONST_NEW(c,LOC(p),PyLong_FromSsize_t(size));
ADDOP_COMPARE(c,LOC(p),GtE);
RETURN_IF_ERROR(jump_to_fail_pop(c,LOC(p),pc,POP_JUMP_IF_FALSE));

TheCOMPARE_OPs here still generate bools, but not always in the canonical representation. So we can either:

  1. ModifyCOMPARE_OP to ensure the canonical representation likehttps://github.com/colesbury/cpython/blob/5583ac0c311132e36ef458842e087945898ffdec/Python/bytecodes.c#L2409-L2416
  2. UsePyStackRef_IsFalse (instead ofPyStackRef_IsExactly) in theJUMP_IF_FALSE
  3. Modify the codegen by insertingTO_BOOL in those two spots.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That makes sense, thanks for the explanation. Since usingPyStackRef_IsExactly safely is sensitive to code generation changes, I might suggest using it only when we're sure it actually matters for performance, and default to using the variants that mask out the deferred bits everywhere by default since those are always safe. I'd guess that this wouldn't affect the performance improvement of this change much, since it should come from avoiding the tagging in_PyStackRef_FromPyObjectSteal. I don't feel super strongly though.

colesbury reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'll switch to usingPyStackRef_IsFalse andPyStackRef_IsTrue.

I'm no longer convinced thatPyStackRef_IsExactly is actually a performance win (and I didn't see it in measurements). I think we have issues with code generation quality that we'll need to address later. Things likePOP_JUMP_IF_NONE are composed of_IS_NONE and_POP_JUMP_IF_TRUE and we pack the intermediate result in a tagged_PyStackRef. Clang does a pretty good job of optimizing through it. GCC less so:https://gcc.godbolt.org/z/Ejs8c78qd.

mpage reacted with thumbs up emoji
@colesbury
Copy link
ContributorAuthor

Were the previous checks wrong?

No, the previous checks were okay whenPyStackRef_FromPyObjectSteal ensured that stack refs created from immortal objects always had their deferred bit set.

mpage reacted with thumbs up emoji

Copy link
Contributor

@mpagempage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Nice!

replacedop(_POP_JUMP_IF_TRUE, (cond-- )) {
assert(PyStackRef_BoolCheck(cond));
intflag=PyStackRef_Is(cond,PyStackRef_True);
intflag=PyStackRef_IsExactly(cond,PyStackRef_True);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That makes sense, thanks for the explanation. Since usingPyStackRef_IsExactly safely is sensitive to code generation changes, I might suggest using it only when we're sure it actually matters for performance, and default to using the variants that mask out the deferred bits everywhere by default since those are always safe. I'd guess that this wouldn't affect the performance improvement of this change much, since it should come from avoiding the tagging in_PyStackRef_FromPyObjectSteal. I don't feel super strongly though.

colesbury reacted with thumbs up emoji
@colesbury
Copy link
ContributorAuthor

Benchmark on most recent changes:https://github.com/facebookexperimental/free-threading-benchmarking/tree/main/results/bm-20241122-3.14.0a1+-a9e4872-NOGIL#vs-base

Geometric mean: 1.03x faster (HPT: reliability of 100.00%, 1.02x faster at 99th %ile)

@colesburycolesbury merged commit4759ba6 intopython:mainNov 22, 2024
65 checks passed
@colesburycolesbury deleted the gh-127022-cheaper-steal branchNovember 22, 2024 17:55
ebonnal pushed a commit to ebonnal/cpython that referenced this pull requestJan 12, 2025
This gets rid of the immortal check in `PyStackRef_FromPyObjectSteal()`.Overall, this improves performance about 2% in the free threadingbuild.This also renames `PyStackRef_Is()` to `PyStackRef_IsExactly()` becausethe macro requires that the tag bits of the arguments match, which isonly true in certain special cases.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@eendebakpteendebakpteendebakpt left review comments

@mpagempagempage approved these changes

@markshannonmarkshannonAwaiting requested review from markshannonmarkshannon is a code owner

@Fidget-SpinnerFidget-SpinnerAwaiting requested review from Fidget-SpinnerFidget-Spinner is a code owner

Assignees

No one assigned

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@colesbury@Fidget-Spinner@mpage@eendebakpt

[8]ページ先頭

©2009-2025 Movatter.jp