gh-116477: Improve performance of range for the single argument case #116478
Conversation
In general LGTM. But please remove redundant spaces.
Misc/NEWS.d/next/Core and Builtins/2024-03-07-22-38-07.gh-issue-116477.AqwRAV.rst (outdated; resolved)
…e-116477.AqwRAV.rst
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Objects/rangeobject.c (outdated):
lstop = PyLong_AsLong(r->stop);
if (lstop == -1 && PyErr_Occurred()) {
if (r->start == _PyLong_GetZero() && r->step == _PyLong_GetOne()) {
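As an illustration of the quoted condition: in C it is two pointer comparisons against the cached small-int objects for 0 and 1. A rough Python analogue (is_fast_range is a hypothetical helper for illustration, not part of the patch) looks like this:

```python
def is_fast_range(r):
    # Hypothetical Python analogue of the C check
    #   r->start == _PyLong_GetZero() && r->step == _PyLong_GetOne()
    # In C these are two pointer comparisons against the cached
    # small-int objects; here we approximate them with == tests.
    return r.start == 0 and r.step == 1

print(is_fast_range(range(10)))        # True:  range(n), fast path
print(is_fast_range(range(1, 10)))     # False: start != 0
print(is_fast_range(range(0, 10, 2)))  # False: step != 1
```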
How large is the overhead of the two PyLong_AsLong() calls for 0 and 1? Is it noticeable?
Run something like:
./python -m timeit -s 'r = range(10)' 'for i in r: break'
AFAIK this is the fastest code that is affected by the iterator creation time.
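The same micro-benchmark can also be driven from the timeit module directly (a sketch; the repeat and number settings are illustrative choices, not from the PR):

```python
import timeit

# The loop body breaks immediately, so nearly all of the measured
# time is iterator creation: iter(range(...)) plus a single next().
per_call = min(timeit.repeat(stmt="for i in r: break",
                             setup="r = range(10)",
                             number=100_000, repeat=5)) / 100_000
print(f"{per_call * 1e9:.1f} ns per iterator creation")
```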
The overhead for the two PyLong_AsLong calls (plus some bits and pieces) is, from the first benchmark above:
range(1): Mean +- std dev: [main_pgo] 44.4 ns +- 1.0 ns -> [pr_pgo] 34.7 ns +- 4.0 ns: 1.28x faster
So about 10 ns.
The last benchmark from the first comment:
for loop: Mean +- std dev: [main_pgo] 295 ns +- 9 ns -> [pr_pgo] 267 ns +- 5 ns: 1.10x faster
executes:
def g():
    x = 0
    for ii in range(10):
        x += 1
So a small for loop with minimal work becomes 10% faster. (Timings may vary; my system is noisy.)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Looks good to me, but I'd like Serhiy's thumbs up before landing.
I'll merge this later at the sprints. Thanks!
I am not convinced that this change is worth doing, but I will not object to committing it if another core developer likes it.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@eendebakpt, are the benchmarks up to date?
Here is a set of benchmarks for the PR rebased to main:
Benchmark script (compiled with PGO, executed with …)
There is some noise in the data, but there are enough datapoints to make the trends clear. Most relevant for real applications are the …
So, if we exclude the iter optimisation, there will be no slowdowns (unless I misread the benchmark)?
Both optimizations use the same check (two pointer comparisons to …). Two runs of benchmarks of main vs. the updated PR:
LGTM. @serhiy-storchaka, any objections?
Please repeat benchmarks with the current code.
eendebakpt commented Feb 6, 2025 (edited)
New benchmarks (Ubuntu, with the new tail calling interpreter, non-PGO):
Pieter, can you fix the linting CI? Let's land this!
I understand your concern. For example, we did not optimise …
Benchmark

The fast paths eliminate several calls to PyLong_AsLongAndOverflow and the construction of a new Python integer with PyLong_FromLong. The fast path for the range object is in compute_range_length. That implies a small performance penalty (two pointer comparisons) for the cases where start != 0 or step != 1. We could move the fast path to range_from_array (the case 1: part), but that leads to a bit more duplicated code.
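For readers without the C source at hand, the shape of the change can be sketched in pure Python. This only loosely mirrors the C compute_range_length (the real code works on PyLongObject values with overflow-checked C longs); the fast-path guard corresponds to the two pointer comparisons mentioned above:

```python
def compute_range_length(start, stop, step):
    # Fast path for the common range(n) case: start == 0, step == 1.
    # In the C version this guard is two pointer comparisons, so
    # other ranges only pay that small extra cost.
    if start == 0 and step == 1:
        return stop if stop > 0 else 0
    # General case: max(0, ceil((stop - start) / step)).
    if step > 0:
        lo, hi = start, stop
    else:
        lo, hi, step = stop, start, -step
    if lo >= hi:
        return 0
    return (hi - lo - 1) // step + 1

# Agrees with len() on a few ranges:
print(compute_range_length(0, 10, 1) == len(range(10)))         # True
print(compute_range_length(2, 11, 3) == len(range(2, 11, 3)))   # True
print(compute_range_length(10, 2, -2) == len(range(10, 2, -2))) # True
```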