GH-101362: Optimise pathlib by deferring path normalisation #101560


Closed

barneygale (Contributor) commented Feb 4, 2023 (edited)

`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x.
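The deferred, cached normalisation described above could be sketched roughly as follows. This is a minimal illustration only: `LazyPurePath`, `_raw`, `_parts` and `parse_count` are hypothetical names, not pathlib's actual internals, and the sketch ignores drives, roots and absolute paths.

```python
from functools import cached_property


class LazyPurePath:
    """Hypothetical stand-in for PurePath; not pathlib's real implementation."""

    def __init__(self, *segments):
        # Construction only stores the raw text; no parsing happens here.
        self._raw = "/".join(segments)
        self.parse_count = 0  # instrumentation for this demo only

    @cached_property
    def _parts(self):
        # Normalise on first use: drop empty segments and "." components.
        self.parse_count += 1
        return tuple(p for p in self._raw.split("/") if p and p != ".")

    @property
    def name(self):
        return self._parts[-1] if self._parts else ""

    @property
    def parent(self):
        return LazyPurePath(*self._parts[:-1])

    def __str__(self):
        return "/".join(self._parts) or "."


p = LazyPurePath("foo//bar", "./baz.txt")
assert p.parse_count == 0          # nothing was parsed at construction time
assert p.name == "baz.txt"         # first access triggers parsing
assert str(p.parent) == "foo/bar"  # later accesses reuse the cached parts
assert p.parse_count == 1
```

The trade-off is the one the benchmarks in this thread show: construction gets cheaper, while any operation that needs the parsed form pays the parsing cost on first access.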

`PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up `p.joinpath('bar')` and `p / 'bar'`. edit: will fix separately.
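The claim that an unnormalised path is transparent to filesystem APIs can be checked with a small experiment. The sketch below assumes a typical POSIX-like filesystem and uses a made-up path, `foo//bar/./baz.txt`; it only exercises pathlib's documented collapsing of repeated separators and `.` components, not the changes in this PR.

```python
import os
import tempfile
from pathlib import PurePosixPath


def spellings_match(raw="foo//bar/./baz.txt"):
    """Return True if the raw and normalised spellings name the same file."""
    # pathlib collapses repeated separators and "." components.
    normalised = str(PurePosixPath(raw))
    assert normalised == "foo/bar/baz.txt"

    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "foo", "bar"))
        with open(os.path.join(tmp, normalised), "w") as f:
            f.write("hello")
        # The OS resolves both spellings to the same underlying file.
        return os.path.samefile(
            os.path.join(tmp, raw), os.path.join(tmp, normalised)
        )


print(spellings_match())
```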

This also fixes GH-76846 and GH-85281 by unifying path constructors and adding an `__init__()` method. edit: will fix separately.

`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x.

`PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up pickling, `p.joinpath('bar')` and `p / 'bar'`.

This also fixes pythonGH-76846 and pythonGH-85281 by unifying path constructors and adding an `__init__()` method.

barneygale (Contributor, Author) commented Feb 4, 2023 (edited)

Constructing path objects is up to 4x faster with one argument:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'PurePath("foo/bar")'
    1000000 loops, best of 5: 2.01 usec per loop  # before
    1000000 loops, best of 5: 495 nsec per loop  # after

More than 2x faster with two arguments:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'PurePath("foo", "bar")'
    1000000 loops, best of 5: 2.28 usec per loop  # before
    1000000 loops, best of 5: 1.02 usec per loop  # after
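For anyone who wants to repeat these construction measurements from a script rather than the command line, a sketch using the standard `timeit` module follows. The helper name `best_of` and the reduced loop count are assumptions made here to keep the run short; absolute numbers will differ by machine and Python build.

```python
import timeit


def best_of(stmt, setup="from pathlib import PurePath", number=20_000, repeat=5):
    """Return the best per-loop time in seconds, like `timeit`'s 'best of 5'."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)) / number


one_arg = best_of('PurePath("foo/bar")')
two_args = best_of('PurePath("foo", "bar")')

print(f"one argument:  {one_arg * 1e9:.0f} nsec per loop")
print(f"two arguments: {two_args * 1e9:.0f} nsec per loop")
```

Running the same script against builds from before and after a change gives a comparison equivalent to the `# before` / `# after` figures quoted in this comment.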

~~And ~25% faster when joining arguments:~~

[edit: no longer true!]

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath; p = PurePath("foo")' 'p.joinpath("bar")'
    1000000 loops, best of 5: 1.66 usec per loop  # before
    1000000 loops, best of 5: 1.3 usec per loop  # after

But it's 12% slower when the path needs normalization, as with `str()`:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'str(PurePath("foo/bar"))'
    1000000 loops, best of 5: 2.96 usec per loop  # before
    1000000 loops, best of 5: 3.31 usec per loop  # after

And 25% slower when walking directories (where pathlib keeps everything normalized):

[edit: resolved! see comment]

    $ ./python -m timeit -n 20 -s 'from pathlib import Path' 'list(Path().rglob("*"))'
    20 loops, best of 5: 53.4 msec per loop  # before
    20 loops, best of 5: 66.5 msec per loop  # after

But still faster for filesystem operations that don't require normalization:

[edit: no longer true! this can't be properly fixed until other stuff lands]

    $ ./python -m timeit -n 100000 -s 'from pathlib import Path' 'Path("README.rst").read_text()'
    100000 loops, best of 5: 26.1 usec per loop  # before
    100000 loops, best of 5: 21.2 usec per loop  # after

    $ ./python -m timeit -n 100000 -s 'from pathlib import Path' 'Path("README.rst").exists()'
    100000 loops, best of 5: 5.45 usec per loop  # before
    100000 loops, best of 5: 2.97 usec per loop  # after

@barneygale marked this pull request as ready for review February 4, 2023 18:28
@barneygale marked this pull request as draft February 7, 2023 20:35

barneygale (Contributor, Author) commented:

I've found a couple other small optimizations which are best tackled in other PRs, so I'm marking this PR as a 'draft' for now.


@barneygale changed the title from "GH-101362 - Optimise pathlib by deferring path normalisation" to "GH-101362: Optimise pathlib by deferring path normalisation" Mar 6, 2023
@AlexWaygood added the performance (Performance or resource usage) label Mar 6, 2023

barneygale (Contributor, Author) commented:

I've undone the change to `_from_parsed_parts()`, which has restored directory-walking performance:

    $ ./python -m timeit -n 20 -s 'from pathlib import Path' 'list(Path().rglob("*"))'
    20 loops, best of 5: 146 msec per loop  # before
    20 loops, best of 5: 152 msec per loop  # after
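The figures above depend on whatever happens to be in the current directory. A more repeatable variant, sketched below, times `rglob("*")` over a small synthetic tree; `build_tree` and its `depth` and `width` parameters are hypothetical helpers invented for this illustration, and the tree is far smaller than a real checkout.

```python
import tempfile
import timeit
from pathlib import Path


def build_tree(root, depth=3, width=3):
    """Create a small directory tree with one file in every directory."""
    (root / "file.txt").write_text("x")
    if depth:
        for i in range(width):
            child = root / f"dir{i}"
            child.mkdir()
            build_tree(child, depth - 1, width)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root)
    entries = list(root.rglob("*"))
    # Best of 3 repeats, 5 walks per repeat, reported per walk.
    best = min(timeit.repeat(lambda: list(root.rglob("*")), number=5, repeat=3)) / 5

print(f"{len(entries)} entries, {best * 1e3:.2f} msec per walk")
```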

Still a tiny bit slower than pre-PR.

The rest of the speedups/slowdowns mentioned in my previous comment are still there.

@barneygale marked this pull request as ready for review March 6, 2023 02:33

barneygale (Contributor, Author) commented:

The change to `importlib` is necessary because it's relying on a bug in pathlib's path normalization:

I think I need to solve that issue first, so I'm going to mark this PR as a draft (again!)

@barneygale marked this pull request as draft March 11, 2023 23:32
@barneygale marked this pull request as ready for review March 17, 2023 16:20
@barneygale marked this pull request as draft March 17, 2023 16:45

barneygale (Contributor, Author) commented:

This PR has strayed too far from the original implementation. I'm going to abandon it. New PR here:


Reviewers: jaraco, warsaw, AlexWaygood (all awaiting requested review)

Labels: awaiting review, performance (Performance or resource usage), topic-pathlib

Development: successfully merging this pull request may close this issue: pathlib.Path._from_parsed_parts should call cls.__new__(cls)

3 participants: @barneygale, @bedevere-bot, @AlexWaygood
