Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32k
gh-131798: JIT: Optimize_POP_CALL_TWO_LOAD_CONST_INLINE_BORROW
#134369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
f37b9c3
to82e0a17
ComparePython/bytecodes.c Outdated
PyStackRef_CLOSE(pop1); | ||
} | ||
tier2 pure op(_POP_CALL_LOAD_CONST_INLINE_BORROW, (ptr/4, callable, null -- value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Can you remove thepure
annotations from these please? They don't get optimized to anything so there's not really a use for them.
Fidget-Spinner left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for the PR. I think@brandtbucher brought this up before, but I think we should be making the uop buffer add-to rather than in-place. What I mean is that instead of adding a pop variant for every optimization we want to do:
..._POP_X_LOAD_CONST_INLINE_BORROW
We should add:
..._POP_TOP_POP_TOP..._LOAD_CONST_INLINE_BORROW
This way:
- We don't have to add a new special pop uop for every single op. This saves on mental overhead, because a special pop variant of the uop for every uop we want to optimize is really tough to reason about.
- In the optimizer, we can just generalize it to
POP_TOP
X times, which is easier to write code to optimize for, rather than add a new entry toremove_unneeded_uops
. As you probably noticed already, it's really tedious to do this.
This would require allowing us to add to the uop buffer, rather than overwriting it in-place. It might be a bigger change, but if you're up for it, it would greatly simplify all future optimizations, and I'd be grateful for someone cleaning that part up. It would best be a separate PR.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
We actually discussed this w/ Brandt yesterday and I totally agree with that. Brandt also mentioned that it is ok to add this for now but I don't mind closing this and focusing on changing the uop buffer right away. Note that Brandt greatly simplified |
Yeah, 90% of the reason why I felt the urgency to move away from in-place modification was because the logic in the pass that removes matching pushes and pops was getting completely out-of-hand. That's not a concern anymore, so I think we can keep the momentum going on the current in-place optimizer work for the time being. |
Fidget-Spinner commentedMay 21, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Ok. Can we please make sure to move away from the in-place optimizer after this PR for any new pop tops load const inline X instructions? The pushes and pops are no longer an issue, but the number of new uops variants we're adding is getting out of hand. I have no clue what some of these uops do unless I specifically look up their emission in the optimizer. We don't need this many uops and fewer uops is generally a better thing (less maintainer burden, less human error ...). If we can generalize to just a few |
Yep, I agree. It also allows us to specialize individual pops for specific known types, immortal values, non-escaping pops, etc. After the sprint ends tomorrow, we can start with larger cleanup efforts (which I agree we need, in the tracer too). |
Great! Feel free to dismiss my review then in the GH UI. |
3d71a8e
to47995d0
Compare47995d0
to90d38d7
CompareWhen I get back, I'll start working on making the optimizer operate on a copy of the stream. This is good for now though. Really cool to see entire calls start disappearing. |
d1ea8ed
intopython:mainUh oh!
There was an error while loading.Please reload this page.
Looks great.@tomasr8: natural follow-ups would be support for tuples of types in |
on it! |
Uh oh!
There was an error while loading.Please reload this page.
Optimize
_POP_CALL_TWO_LOAD_CONST_INLINE_BORROW
This requires adding some additional ops:
_POP_CALL_ONE_LOAD_CONST_INLINE_BORROW
_POP_CALL_LOAD_CONST_INLINE_BORROW
_POP_THREE
_POP_TWO
We can now optimize:
into just
_POP_CALL_LOAD_CONST_INLINE_BORROW
i.e. we save two stack pushes and two stack pops.