Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-117349: Micro-optimize a fewos.path functions#117350

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
AlexWaygood merged 50 commits intopython:mainfromnineteendo:speedup-os.path
Apr 2, 2024

Conversation

@nineteendo
Copy link
Contributor

@nineteendonineteendo commentedMar 28, 2024
edited
Loading

Benchmarks

ntpath.py

script
# TODO: test isjunction() on Windowsecho"isreserved()"&& python -m timeit -s"import before.ntpath""before.ntpath.isreserved('con')"&& python -m timeit -s"import after.ntpath""after.ntpath.isreserved('con')"echo"isreserved('.')"&& python -m timeit -s"import before.ntpath""before.ntpath.isreserved('.')"&& python -m timeit -s"import after.ntpath""after.ntpath.isreserved('.')"echo"expanduser()"&& python -m timeit -s"import before.ntpath""before.ntpath.expanduser('~')"&& python -m timeit -s"import after.ntpath""after.ntpath.expanduser('~')"echo"realpath()"; python -m timeit -s"import test""test.realpath1('.')"; python -m timeit -s"import test""test.realpath2('.')"echo"realpath('nul')"; python -m timeit -s"import test""test.realpath1('nul')"; python -m timeit -s"import test""test.realpath2('nul')"
isreserved()200000 loops, best of 5: 1.66 usec per loop # before200000 loops, best of 5: 1.65 usec per loop # after# -> no differenceisreserved('.')200000 loops, best of 5: 1.31 usec per loop # before500000 loops, best of 5: 966 nsec per loop # after# -> 1.36x faster (for . & ..)expanduser()200000 loops, best of 5: 1.72 usec per loop # before200000 loops, best of 5: 1.72 usec per loop # after# -> no differencerealpath()2000 loops, best of 5: 132 usec per loop # before2000 loops, best of 5: 131 usec per loop # after# -> no differencerealpath('nul')200000 loops, best of 5: 1.26 usec per loop # before500000 loops, best of 5: 907 nsec per loop # after# -> 1.39x faster (for nul)

posixpath.py

script
# test.shecho"ismount()"&& python -m timeit -s"import before.posixpath""before.posixpath.ismount('/Volumes/2GB_001')"&& python -m timeit -s"import after.posixpath""after.posixpath.ismount('/Volumes/2GB_001')"echo"expanduser()"&& python -m timeit -s"import before.posixpath""before.posixpath.expanduser('~')"&& python -m timeit -s"import after.posixpath""after.posixpath.expanduser('~')"echo"expanduser(b'~root')"&& python -m timeit -s"import before.posixpath""before.posixpath.expanduser(b'~root')"&& python -m timeit -s"import after.posixpath""after.posixpath.expanduser(b'~root')"echo"_normpath_fallback()"&& python -m timeit -s"import before.posixpath""before.posixpath._normpath_fallback('foo//bar')"&& python -m timeit -s"import after.posixpath""after.posixpath._normpath_fallback('foo//bar')"echo"abspath()"&& python -m timeit -s"import before.posixpath""before.posixpath.abspath('foo')"&& python -m timeit -s"import after.posixpath""after.posixpath.abspath('foo')"echo"abspath('/foo')"&& python -m timeit -s"import before.posixpath""before.posixpath.abspath('/foo')"&& python -m timeit -s"import after.posixpath""after.posixpath.abspath('/foo')"echo"realpath()"&& python -m timeit -s"import before.posixpath""before.posixpath.realpath('foo/../../..')"&& python -m timeit -s"import after.posixpath""after.posixpath.realpath('foo/../../..')"
ismount()10000 loops, best of 5: 20.3 usec per loop # before10000 loops, best of 5: 19.3 usec per loop # after# -> 1.05x fasterexpanduser()200000 loops, best of 5: 1.43 usec per loop # before200000 loops, best of 5: 1.42 usec per loop # after# -> no differenceexpanduser(b'~root')200000 loops, best of 5: 1.82 usec per loop # before200000 loops, best of 5: 1.75 usec per loop # after# -> 1.04x faster (for byte users)_normpath_fallback()200000 loops, best of 5: 1.07 usec per loop # before500000 loops, best of 5: 953 nsec per loop # after# -> 1.13x fasterabspath()20000 loops, best of 5: 16.7 usec per loop # before20000 loops, best of 5: 16.5 usec per loop # after# -> no differenceabspath('/foo')500000 loops, best of 5: 509 nsec per loop # before500000 loops, best of 5: 423 nsec per loop # after# -> 1.20x faster (for absolute paths)realpath()10000 loops, best of 5: 24.4 usec per loop # before10000 loops, best of 5: 23.9 usec per loop # after# -> 1.02x faster

@nineteendonineteendo mentioned this pull requestMar 29, 2024
16 tasks
@nineteendo
Copy link
ContributorAuthor

I believe that's everything. If you would like these changes to be split up in multiple pull requests let me know.

@nineteendonineteendo marked this pull request as ready for reviewMarch 29, 2024 15:09
Copy link
Member

@AlexWaygoodAlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

As@sobolevn says, it's very difficult to see here which changes are actually related to performance improvements, and which are simply cosmetic changes that have no impact on how the code works. Our general policy is not to accept cosmetic changes, but even if we decided that we wanted to make an exception here, any such stylistic/formatting improvements would have to go into their own PR, so that the git history clearly showed which changes were performance-related and which were cosmetic.

Please revert all changes that do not actually have any impact on performance.

erlend-aasland reacted with thumbs up emoji
@bedevere-app
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@nineteendo
Copy link
ContributorAuthor

Please revert all changes that do not actually have any impact on performance.

Does that include unnesting?

@nineteendo
Copy link
ContributorAuthor

Is there anything you still expect from me? Or can this finally be merged? I've listened to basically all the feedback.
If you want to wait for a benchmark from me, that's understandable of course.

@nineteendo
Copy link
ContributorAuthor

The speed improvements seem way less impressive now. ;(
And I that's even without running my benchmark.

@nineteendo
Copy link
ContributorAuthor

Are you happy now?

@AlexWaygood
Copy link
Member

I re-ran your benchmarks locally. With the latest version of your PR, I'm getting a reasonableslowdown onposixpath.ismount() from your PR branch.

Onmain:

(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython500000 loops, best of 5: 602 nsec per loop(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython500000 loops, best of 5: 603 nsec per loop(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython500000 loops, best of 5: 607 nsec per loop(main) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                                 ~/dev/cpython500000 loops, best of 5: 601 nsec per loop

With your PR branch:

(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython500000 loops, best of 5: 637 nsec per loop(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython500000 loops, best of 5: 650 nsec per loop(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython500000 loops, best of 5: 646 nsec per loop(speedup-os.path) % ./python.exe -m timeit -s "import posixpath" "posixpath.ismount('/Volumes/2GB_001')"                      ~/dev/cpython500000 loops, best of 5: 643 nsec per loop

@nineteendo
Copy link
ContributorAuthor

With the latest version of your PR, I'm getting a reasonable slowdown on posixpath.ismount() from your PR branch.

You don't have a usb stick "2GB_001" plugged in, so the function will returnFalse immediately afters1 = os.lstat(path).
I believe something has been sped up on the main branch, so I just synced my branch.

@AlexWaygood
Copy link
Member

You don't have a usb stick "2GB_001" plugged in, so the function will returnFalse immediately afters1 = os.lstat(path).

heh, that makes sense

Copy link
Member

@AlexWaygoodAlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@nineteendo
Copy link
ContributorAuthor

I've updated the benchmarks. Sadly, the only noticeable speedup for regular users is callingposixpath.abspath() with an absolute path.

@AlexWaygood
Copy link
Member

I've updated the benchmarks. Sadly, the only noticeable speedup for regular users is callingposixpath.abspath() with an absolute path.

Don't be too disheartened. As a result of your efforts here, a conversation has been started about possible ways of optimisingstr.startswith and other string methods. If one or more of those ideas comes to fruition, that will bevery impactful for Python users.

Optimising the stdlib is hard! I tried outmany things that ultimately didn't work when I was working on#74690. And there were several things thatdid work, but which I never created PRs for, as they would have made the code too ugly or too fragile.

My advice for the future, however, would definitely be to work on small, focused PRs that have isolated, easily measurable changes. PRs like that are much easier for us to review, and you should find the contributing experience less frustrating as a result.

erlend-aasland reacted with thumbs up emojiJelleZijlstra reacted with heart emoji

@nineteendo
Copy link
ContributorAuthor

Can this be merged, or are we still waiting on something?

@AlexWaygood
Copy link
Member

Can this be merged, or are we still waiting on something?

It can be merged, I just wanted to wait a little bit longer to give the other reviewers time to chime in on the final version of the PR, if they wanted to. I'll merge tomorrow or the day after if there are no further objections from anybody.

nineteendo reacted with thumbs up emoji

nineteendoand others added2 commitsApril 2, 2024 21:46
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
@AlexWaygoodAlexWaygood changed the titlegh-117349: Speedup os.pathgh-117349: Micro-optimize a fewos.path functionsApr 2, 2024
@AlexWaygoodAlexWaygood merged commitcae4cdd intopython:mainApr 2, 2024
@AlexWaygood
Copy link
Member

Thanks@nineteendo

nineteendo reacted with thumbs up emoji

@nineteendonineteendo deleted the speedup-os.path branchApril 2, 2024 20:34
Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM.

AlexWaygood reacted with thumbs up emoji
@erlend-aasland
Copy link
Contributor

I believe it should indeed be possible to speed upstr.startswith(prefix), but notstr.startswith((prefix1, prefix2)), because there's not guarantee that the prefixes have the same length. I'll revert the optimisations for the first case.

Both cases were significantly improved by#117466:

# pre optimisation$ ./python.exe -m timeit -s "s = 'abcdef'" "s.startswith(('abc', 'de'))"5000000 loops, best of 5: 89 nsec per loop# post optimisation$ ./python.exe -m timeit -s "s = 'abcdef'" "s.startswith(('abc', 'de'))"10000000 loops, best of 5: 26.7 nsec per loop

I think we should be careful about micro-optimising Python code, like this PR. Instead, it would be better to keep the code as idiomatic as possible, and instead optimise the Python interpreter for the idiomatic cases.

AlexWaygood and hauntsaninja reacted with thumbs up emoji

@AlexWaygood
Copy link
Member

I think we should be careful about micro-optimising Python code, like this PR. Instead, it would be better to keep the code as idiomatic as possible, and instead optimise the Python interpreter for the idiomatic cases.

I agree that we should neither accept micro-optimisations that make code significantly less idiomatic, nor changes that are purely cosmetic and have no impact on performance. I accepted the PR nonetheless because in the final iteration of the PR, all changes seemed to me to be small, localised changes that, as well as providing small speedups, mostly made the code more idiomatic, and in all cases (in my opinion) did not cause a significant deterioration in style.

erlend-aasland and barneygale reacted with thumbs up emoji

@nineteendo
Copy link
ContributorAuthor

nineteendo commentedApr 3, 2024
edited
Loading

Both cases were significantly improved by#117466

I didn't expect the second case to be sped up as it was a sequence, so I asked for clarification:faster-cpython/ideas#671 (comment), but didn't get a response.

This speedup must have been added afterwards, which I missed.
I obviously reverted the regularstr.startswith() &str.endswith().

diegorusso pushed a commit to diegorusso/cpython that referenced this pull requestApr 17, 2024
)Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>Co-authored-by: Barney Gale <barney.gale@gmail.com>Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@sobolevnsobolevnsobolevn left review comments

@erlend-aaslanderlend-aaslanderlend-aasland left review comments

@AlexWaygoodAlexWaygoodAlexWaygood approved these changes

@eendebakpteendebakpteendebakpt approved these changes

@barneygalebarneygalebarneygale approved these changes

@serhiy-storchakaserhiy-storchakaserhiy-storchaka approved these changes

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

7 participants

@nineteendo@AlexWaygood@erlend-aasland@eendebakpt@barneygale@serhiy-storchaka@sobolevn

[8]ページ先頭

©2009-2025 Movatter.jp