bpo-46372: Try to mutate the LHS during fast float ops #30594

brandtbucher wants to merge 18 commits into python:main

Conversation
sweeneyde commented Jan 14, 2022

I think …
markshannon left a comment
A few general comments:
Could you attach the benchmark results to the PR, not the issue? The speedup depends on the actual implementation, not the general idea.
I'm a bit surprised that this offers much of a speedup given the complexity of the int operators. Maybe the current implementation is just really slow?
Does the specialization of ints add much, or is it mainly the float specialization giving the speedup? I'd be curious to see how a float-only version compares.
It would be worth adding stats to give us some idea what fraction of operations can use this optimization or something like it. In other words, what fraction of operations have the lhs with a ref-count of 1 (or 2 when assigning to same variable)? That would be generally useful information beyond this particular optimization.
As a general rule, if something can be done at compile time, it should be. If not then it should be done at specialization time. Only do stuff at runtime, if we can't avoid it.
We can determine at compile time whether a binary operator is of the form s += x or s = s + x (where s is a "fast" local). We should do that.
I don't know if it is worth having two specializations, one for the s = s + x form and one for the general form, or embedding the information in the cache. I suspect that it won't matter.
I like the float optimization, but I think our efforts to speed up int operations should focus on making int objects more efficient first. Otherwise the coupling of the interpreter to the int implementation is going to make improving the latter even more difficult.
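The core trick under review can be sketched outside CPython with a toy refcounted object (the `ToyFloat` type and functions below are invented for illustration; they are not CPython's actual layout or API): if the left operand's refcount is 1, the stack holds the only reference, so its storage can be overwritten with the result instead of allocating a fresh object.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted float, invented for illustration only. */
typedef struct {
    int refcnt;
    double value;
} ToyFloat;

static ToyFloat *toyfloat_new(double v) {
    ToyFloat *f = malloc(sizeof(ToyFloat));
    f->refcnt = 1;
    f->value = v;
    return f;
}

/* Adds two toy floats, consuming one reference to each operand
 * (mirroring how the interpreter pops both off the stack). */
static ToyFloat *toyfloat_add(ToyFloat *lhs, ToyFloat *rhs) {
    double d = lhs->value + rhs->value;
    if (--rhs->refcnt == 0) {
        free(rhs);
    }
    if (lhs->refcnt == 1) {
        lhs->value = d;   /* sole reference: reuse lhs in place, no allocation */
        return lhs;
    }
    if (--lhs->refcnt == 0) {
        free(lhs);
    }
    return toyfloat_new(d);  /* lhs visible elsewhere: allocate a new result */
}
```

The stats Mark asks for would measure how often the fast path (the `lhs->refcnt == 1` branch here) is actually taken in real workloads.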
bedevere-bot commented Jan 17, 2022

When you're done making the requested changes, leave the comment: …
brandtbucher commented Jan 19, 2022

Yeah, that makes sense. They can also be updated as the implementation changes, too.
Looks like your intuition is correct. Here is a comparison of …
I've updated the PR to use the adaptive entry for this purpose. I understand the benefits of a compile-time check, but doing the check at specialization time seems a lot simpler in this particular case.
Shall I drop the …
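The compile-time vs. specialization-time question can be made concrete with a toy sketch (the mini-bytecode below is invented; it is not CPython's instruction set or its real specializer): at specialization time, peek at the neighboring instructions to see whether the binary op's left operand was loaded from the same fast local that the result is stored back to, i.e. the pattern produced by s = s + x or s += x.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mini-bytecode for illustration only. */
enum opcode { LOAD_FAST, BINARY_ADD, STORE_FAST };

typedef struct {
    enum opcode op;
    int arg;   /* local-variable index for LOAD_FAST/STORE_FAST */
} Instr;

/* True if code[i] is a BINARY_ADD whose left operand was loaded from the
 * same fast local that the following STORE_FAST writes back to — the shape
 * compiled from `s = s + x` (or `s += x`). */
static bool can_specialize_inplace(const Instr *code, int i, int len) {
    if (i < 2 || i + 1 >= len || code[i].op != BINARY_ADD) {
        return false;
    }
    return code[i - 2].op == LOAD_FAST &&
           code[i + 1].op == STORE_FAST &&
           code[i - 2].arg == code[i + 1].arg;
}
```

A compiler-side check would mark the instruction once at compile time instead, but as the comment notes, a one-time check like this at specialization time is simpler and costs nothing per execution.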
markshannon commented Jan 20, 2022

Looks good.
brandtbucher commented Jan 20, 2022 • edited
Do you have time for a quick Teams call, @markshannon? Might be easier to chat about the stats I gathered in person. (If not, maybe tomorrow or Monday.)
markshannon commented Jan 21, 2022 • edited
Somewhat surprising benchmarking results.
sweeneyde commented Jan 21, 2022

The pidigits slowdown should be somewhat expected, since it's mostly huge-integer operations, which are specialized and handled by _PyLong_Add and such before this PR but not after.
brandtbucher commented Jan 25, 2022

Hm, looks like removing the …
I'll go ahead and try the less-branchy version that we discussed this morning (an adaptive superinstruction based on the existing in-place string addition op).
markshannon commented Jan 25, 2022

LGTM. Feel free to merge once the merge conflicts are fixed.
gvanrossum left a comment
Nice and simple!
Python/ceval.c (outdated)

```c
            Py_DECREF(rhs);                 \
            STACK_SHRINK(1);                \
            if (Py_REFCNT(lhs) == 1) {      \
                PyFloat_AS_DOUBLE(lhs) = d; \
```
Presumably (see PEP 674) Victor won't like this use of a macro as a target. :-)
Nevertheless, this whole macro is so nice and straightforward compared to ints!
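The assignment `PyFloat_AS_DOUBLE(lhs) = d` works only because the macro expands to a struct-field access, which is a valid lvalue; PEP 674's preferred static-inline-function accessors return a value and cannot be assigned to, which is exactly the friction Guido is alluding to. A toy illustration (all names below are invented, not CPython's):

```c
#include <assert.h>

typedef struct {
    double ob_fval;   /* payload field, mimicking a float object's shape */
} ToyFloatObject;

/* Macro accessor: expands to the field itself, so it is an lvalue
 * and can appear on the left of an assignment. */
#define TOYFLOAT_AS_DOUBLE(obj) ((obj)->ob_fval)

/* Function accessors return/accept values, so mutation needs an
 * explicit setter instead of assigning to the getter. */
static inline double ToyFloat_GetDouble(ToyFloatObject *obj) {
    return obj->ob_fval;
}
static inline void ToyFloat_SetDouble(ToyFloatObject *obj, double v) {
    obj->ob_fval = v;
}
```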
Python/ceval.c Outdated
| double d = l OP r; \ | ||
| Py_DECREF(rhs); \ | ||
| STACK_SHRINK(1); \ | ||
| if (Py_REFCNT(lhs) == 1) { \ |
Would there be any benefit in checking whether perhaps rhs has a refcount of 1, in case lhs doesn't? While people are probably more likely to write return x+1 rather than return 1+x (though even I could sometimes see a reason to do that, from "rhyming" with some other operations or just how the textbook presents the formula), writing return 2*x is more likely than return x*2. And of course for asymmetric operations you may not have a choice, like return 1/x.
Of course, there's a cost (more code, more branches). There's also the fact that += and friends steer this in the direction you're taking here.
> Of course, there's a cost (more code, more branches). There's also the fact that += and friends steer this in the direction you're taking here.
Yeah, those are my main motivations for sticking with the LHS here.
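For what it's worth, Guido's suggestion is straightforward to sketch for commutative operations, again with a toy refcounted type (invented names, not CPython code, and not what the PR actually does): try to reuse the lhs first, then fall back to reusing the rhs, and only allocate when both are shared.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy refcounted float, invented for illustration only. */
typedef struct {
    int refcnt;
    double value;
} ToyFloat;

static ToyFloat *toyfloat_new(double v) {
    ToyFloat *f = malloc(sizeof(ToyFloat));
    f->refcnt = 1;
    f->value = v;
    return f;
}

static void toyfloat_decref(ToyFloat *f) {
    if (--f->refcnt == 0) {
        free(f);
    }
}

/* Commutative multiply that tries to reuse either operand's storage:
 * first the lhs (covering `x *= 2`), then the rhs (covering `2 * x`).
 * Consumes one reference to each operand. Not valid for asymmetric
 * ops like `1/x`, where only the result's value placement matters. */
static ToyFloat *toyfloat_mul(ToyFloat *lhs, ToyFloat *rhs) {
    double d = lhs->value * rhs->value;
    if (lhs->refcnt == 1) {
        toyfloat_decref(rhs);
        lhs->value = d;
        return lhs;
    }
    if (rhs->refcnt == 1) {
        toyfloat_decref(lhs);
        rhs->value = d;
        return rhs;
    }
    toyfloat_decref(lhs);
    toyfloat_decref(rhs);
    return toyfloat_new(d);
}
```

The extra branch is the cost Guido mentions; the PR sticks to the lhs-only check.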
brandtbucher commented Feb 1, 2022

Okay, I just pushed variants of …
This gets us back up to a 1% improvement:
gvanrossum commented Feb 1, 2022

Seems you have a merge conflict. Also, Mark should probably re-review this?
markshannon left a comment • edited
Not a fan of the big macro. Otherwise, looks good though.
```c
};

// NOTE: INPLACE must be a compile-time constant to avoid runtime branching!
#define BINARY_OP_FAST_FLOAT(OP, INPLACE) \
```
I'm not a fan of giant macros and I don't like the hidden compile-time control flow.
Given that the inplace and non-inplace specialized forms differ in a fundamental way, I'd like that to be explicit in the instructions themselves.
Maybe have two different macros, one for the inplace form and one for the non-inplace form. You should be able to factor out some of the common code into helper macros.
That might be clearer. Up to you.
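The factoring being suggested might look roughly like this (these toy macros are invented for illustration; they are not the PR's real code): one shared helper for the arithmetic, and two explicitly named entry points instead of a single macro that branches on a compile-time INPLACE flag.

```c
#include <assert.h>

/* Shared helper: evaluate `l OP r` into `result`. Wrapping the body in
 * do/while(0) keeps the macro safe to use as a single statement. */
#define FAST_FLOAT_COMPUTE(result, l, r, OP) \
    do {                                     \
        (result) = (l) OP (r);               \
    } while (0)

/* Non-in-place form: write the result into a separate output slot. */
#define FAST_FLOAT_BINARY(out, l, r, OP) \
    FAST_FLOAT_COMPUTE(out, l, r, OP)

/* In-place form: overwrite the left operand's storage, making the
 * fundamental difference between the two forms explicit at the call site. */
#define FAST_FLOAT_BINARY_INPLACE(l, r, OP) \
    FAST_FLOAT_COMPUTE(l, l, r, OP)
```

The two names document intent where a hidden INPLACE constant would not.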
Makes sense. I'll go ahead and split it up.
iritkatriel left a comment
This has merge conflicts now.
bedevere-bot commented Nov 25, 2022

When you're done making the requested changes, leave the comment: …
This modifies the existing int and float specializations to mutate the LHS operand in-place when possible. It also unifies the implementations of all of these operations.

https://bugs.python.org/issue46372