GH-101362: Optimise pathlib by deferring path normalisation #101560
`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x.

`PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up pickling, `p.joinpath('bar')` and `p / 'bar'`.

This also fixes GH-76846 and GH-85281 by unifying path constructors and adding an `__init__()` method.

barneygale commented Feb 4, 2023 • edited
Constructing path objects is up to 4x faster with one argument:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'PurePath("foo/bar")'
    1000000 loops, best of 5: 2.01 usec per loop  # before
    1000000 loops, best of 5: 495 nsec per loop  # after

More than 2x faster with two arguments:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'PurePath("foo", "bar")'
    1000000 loops, best of 5: 2.28 usec per loop  # before
    1000000 loops, best of 5: 1.02 usec per loop  # after

~~And ~25% faster when joining arguments:~~ [edit: no longer true!]

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath; p = PurePath("foo")' 'p.joinpath("bar")'
    1000000 loops, best of 5: 1.66 usec per loop  # before
    1000000 loops, best of 5: 1.3 usec per loop  # after

But it's 12% slower when the path needs normalization, as with:

    $ ./python -m timeit -n 1000000 -s 'from pathlib import PurePath' 'str(PurePath("foo/bar"))'
    1000000 loops, best of 5: 2.96 usec per loop  # before
    1000000 loops, best of 5: 3.31 usec per loop  # after

[edit: resolved! see comment]

    $ ./python -m timeit -n 20 -s 'from pathlib import Path' 'list(Path().rglob("*"))'
    20 loops, best of 5: 53.4 msec per loop  # before
    20 loops, best of 5: 66.5 msec per loop  # after

[edit: no longer true! this can't be properly fixed until other stuff lands]

    $ ./python -m timeit -n 100000 -s 'from pathlib import Path' 'Path("README.rst").read_text()'
    100000 loops, best of 5: 26.1 usec per loop  # before
    100000 loops, best of 5: 21.2 usec per loop  # after

    $ ./python -m timeit -n 100000 -s 'from pathlib import Path' 'Path("README.rst").exists()'
    100000 loops, best of 5: 5.45 usec per loop  # before
    100000 loops, best of 5: 2.97 usec per loop  # after
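For anyone who wants to repeat these measurements from a script rather than the command line, here is a rough sketch using the standard `timeit` module. The helper name `best_per_loop` is made up for illustration, and the numbers it prints will depend on your machine and interpreter build, so they won't match the figures above.

```python
import timeit
from pathlib import PurePath


def best_per_loop(stmt, number=100_000, repeat=5):
    """Return the best per-loop time in seconds (the "best of N" figure)."""
    # timeit.repeat() returns one total time per repetition.
    totals = timeit.repeat(
        stmt,
        globals={"PurePath": PurePath},
        number=number,
        repeat=repeat,
    )
    return min(totals) / number


if __name__ == "__main__":
    statements = (
        'PurePath("foo/bar")',
        'PurePath("foo", "bar")',
        'str(PurePath("foo/bar"))',
    )
    for stmt in statements:
        print(f"{stmt}: {best_per_loop(stmt) * 1e9:.0f} nsec per loop")
```

Run it once against a build without the change and once with it to get before/after pairs.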
I've found a couple other small optimizations which are best tackled in other PRs, so I'm marking this PR as a 'draft' for now.
I've undone the change to […]. Still a tiny bit slower than pre-PR. The rest of the speedups/slowdowns mentioned in my previous comment are still there.
The change to […]. I think I need to solve that issue first, so I'm going to mark this PR as a draft (again!)
This PR has strayed too far from the original implementation. I'm going to abandon it. New PR here:
`PurePath` now normalises and splits paths only when necessary, e.g. when `.name` or `.parent` is accessed. The result is cached. This speeds up path object construction by around 4x.

[edit: will fix separately.] `PurePath.__fspath__()` now returns an unnormalised path, which should be transparent to filesystem APIs (else pathlib's normalisation is broken!). This extends the earlier performance improvement to most impure `Path` methods, and also speeds up `p.joinpath('bar')` and `p / 'bar'`.

This also fixes GH-76846 and GH-85281 by unifying path constructors and adding an `__init__()` method. [edit: will fix separately.]
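To make the "normalise only when necessary, then cache" idea concrete, here is a minimal sketch. It is not the code in this PR or in pathlib: the class `LazyPurePath`, its attributes, and the `parse_count` counter are all hypothetical, and it only handles POSIX-style paths. The constructor just stores its arguments; splitting and normalising happen on first access to `.name`, `.parent` or `str()`, and the result is reused afterwards.

```python
import os
import posixpath


class LazyPurePath:
    """Hypothetical illustration of deferred normalisation (POSIX only)."""

    def __init__(self, *args):
        # Construction is cheap: keep the raw segments, parse nothing.
        self._raw_paths = [os.fspath(a) for a in args]
        self._parts_cache = None
        self.parse_count = 0  # demo-only counter, shows how often we parse

    def _parts(self):
        # Split and normalise on first use, then reuse the cached result.
        if self._parts_cache is None:
            self.parse_count += 1
            raw = posixpath.join(*self._raw_paths) if self._raw_paths else ""
            anchor = "/" if raw.startswith("/") else ""
            # Drop empty segments (from "//") and "." segments.
            names = tuple(p for p in raw.split("/") if p and p != ".")
            self._parts_cache = (anchor, names)
        return self._parts_cache

    @property
    def name(self):
        _, names = self._parts()
        return names[-1] if names else ""

    @property
    def parent(self):
        anchor, names = self._parts()
        return LazyPurePath(anchor + "/".join(names[:-1]))

    def __str__(self):
        anchor, names = self._parts()
        return (anchor + "/".join(names)) or "."


p = LazyPurePath("foo//bar", "./baz")
print(p.parse_count)  # nothing parsed yet
print(p.name, str(p), str(p.parent))
print(p.parse_count)  # parsed once, despite several accesses
```

The trade-off is the one shown in the timings: construction gets cheaper, while the first operation that needs the parsed form pays the deferred cost.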