GH-72904: Add `glob.translate()` function #106703


Merged
barneygale merged 27 commits into python:main from barneygale:gh-72904-fnmatch-seps
Nov 13, 2023

@barneygale (Contributor) commented Jul 12, 2023 (edited):

Add `glob.translate()` function that converts a pathname with shell wildcards to a regular expression. The regular expression is used by pathlib to implement `match()` and `glob()`.

This function differs from `fnmatch.translate()` in that wildcards do not match path separators by default, and that a `*` pattern segment matches precisely one path segment. When *recursive* is set to true, `**` pattern segments match any number of path segments, and `**` cannot appear outside its own segment.

In pathlib, this change speeds up directory walking (because `_make_child_relpath()` does less work), makes path objects smaller (they don't need a `_lines` slot), and removes the need for some gnarly code.
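
For readers who want to see the difference described above, here is a small sketch comparing the two functions. It assumes Python 3.13 or later for `glob.translate()` (the calls are guarded so the snippet still runs on older interpreters); the `matches` helper is a hypothetical name introduced only for this illustration.

```python
import fnmatch
import glob
import re


def matches(regex, path):
    """Return True if the translated regex matches the whole path."""
    return re.compile(regex).match(path) is not None


# fnmatch.translate(): "*" is free to cross "/" boundaries.
print(matches(fnmatch.translate("*.py"), "pkg/mod.py"))  # True

# glob.translate() only exists on Python 3.13+, so guard the calls.
if hasattr(glob, "translate"):
    one_level = glob.translate("*.py", seps="/")
    print(matches(one_level, "mod.py"))      # True
    print(matches(one_level, "pkg/mod.py"))  # False: "*" stops at "/"

    # With recursive=True, a "**" segment spans any number of segments.
    any_depth = glob.translate("**/*.py", recursive=True, seps="/")
    print(matches(any_depth, "pkg/sub/mod.py"))  # True
```

Passing `seps="/"` keeps the result the same on every platform; without it the function falls back to the separators of the running OS.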


📚 Documentation preview 📚: https://cpython-previews--106703.org.readthedocs.build/

mariosasko, wimglenn, and edgarrmondragon reacted with thumbs up emoji
If a sequence of path separators is given to the new argument, `translate()` produces a pattern that matches similarly to `pathlib.Path.glob()`. Specifically:

- A `*` pattern segment matches precisely one path segment.
- A `**` pattern segment matches any number of path segments.
- If `**` appears in any other position within the pattern, `ValueError` is raised.
- `*` and `?` wildcards in other positions don't match path separators.

This change allows us to factor out a lot of complex code in pathlib.
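
The four rules in this commit message can be sketched in a few lines. This is a simplified illustration, not the code from this PR: `translate_sketch` is a hypothetical name, it handles only `*`, `?` and `**` (no `[...]` character classes), and it ignores hidden-file handling.

```python
import re


def translate_sketch(pattern, sep="/"):
    """Simplified, hypothetical translator; not the code from this PR."""
    esc = re.escape(sep)
    not_sep = f"[^{esc}]"
    parts = pattern.split(sep)
    last = len(parts) - 1
    out = []
    for i, part in enumerate(parts):
        if part == "**":
            # A "**" segment matches any number of whole path segments.
            out.append(f"(?:{not_sep}+{esc})*" if i < last else ".*")
            continue
        if "**" in part:
            # "**" anywhere else in the pattern is rejected.
            raise ValueError("'**' must be an entire pattern segment")
        if part == "*":
            # A "*" segment matches precisely one (non-empty) path segment.
            out.append(f"{not_sep}+")
        else:
            for ch in part:
                if ch == "*":
                    out.append(f"{not_sep}*")  # never crosses a separator
                elif ch == "?":
                    out.append(not_sep)
                else:
                    out.append(re.escape(ch))
        if i < last:
            out.append(esc)
    return "(?s:" + "".join(out) + r")\Z"


match = re.compile(translate_sketch("src/**/*.py")).match
print(bool(match("src/a.py")))      # True
print(bool(match("src/x/y/a.py")))  # True
print(bool(match("docs/a.py")))     # False
```

Splitting on the separator first is what makes the per-segment rules easy to state: each segment is translated on its own, and separators are re-inserted between the pieces.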
@barneygale (Contributor, Author) commented:

~20% globbing speedup:

$ ./python -m timeit -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*", follow_symlinks=False))'
2 loops, best of 5: 175 msec per loop  # before
2 loops, best of 5: 146 msec per loop  # after

barneygale changed the title from "GH-72904: Add optional *seps* argument to `fnmatch.translate()`" to "GH-72904: Add optional *sep* argument to `fnmatch.translate()`" on Jul 26, 2023
Co-authored-by: Jason R. Coombs <jaraco@jaraco.com>
jaraco mentioned this pull request on Aug 5, 2023
barneygale changed the title from "GH-72904: Add optional *sep* argument to `fnmatch.translate()`" to "GH-72904: Add `glob.translate()` function" on Aug 12, 2023
@barneygale (Contributor, Author) commented:

I've moved this to a new `glob.translate()` function.

It was easy enough to implement a *recursive* argument, so I did that and made its default `False` to match `glob()`.
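
For context, this is the existing `glob.glob()` default that the new argument mirrors: without `recursive=True`, a `**` segment behaves like a plain `*`. The sketch below builds a throwaway directory tree to show it; the file names and the `relative` helper are made up for the illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: a.py, x/b.py, x/y/c.py
    os.makedirs(os.path.join(root, "x", "y"))
    for rel in ("a.py", "x/b.py", "x/y/c.py"):
        open(os.path.join(root, *rel.split("/")), "w").close()

    # Escape the temp dir so only "**" and "*.py" act as wildcards.
    pattern = os.path.join(glob.escape(root), "**", "*.py")

    def relative(paths):
        """Sorted paths relative to root, with forward slashes."""
        return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in paths)

    flat = relative(glob.glob(pattern))                  # recursive=False (default)
    deep = relative(glob.glob(pattern, recursive=True))

print(flat)  # ['x/b.py']
print(deep)  # ['a.py', 'x/b.py', 'x/y/c.py']
```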

It's much harder to implement an *include_hidden* argument, so I've left that for now. I don't feel great about it, tbh.

@barneygale (Contributor, Author) commented:

Right, after some futzing around I'm going to mark this PR as ready again.

In `fnmatch.py`, I've split the `translate()` method into `_translate()` and `_join_translated_parts()`. This minimises the diff, risk, and performance impact in that module.

(In #109879 I've re-implemented `fnmatch.translate()`, but that PR is an optional side-quest now)

In `glob.py`, I've spent some time making the implementation of `translate(include_hidden=False)` as clear and performant as I can. There's still a fair whack of regex involved but I think it's followable.

In `pathlib.py`, I've made pattern matching use `str(path)` directly, rather than a slight variant that represents empty paths as `''` rather than `'.'`.
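
As a quick reminder of the `str(path)` behaviour mentioned here, an empty path stringifies to `'.'`. The snippet uses `PurePosixPath` only so that it prints the same thing on every platform.

```python
from pathlib import PurePosixPath

empty = PurePosixPath()
print(repr(str(empty)))  # '.'

# An empty string and "." produce the same path string.
print(str(PurePosixPath("")) == str(PurePosixPath(".")))  # True
```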

@barneygale (Contributor, Author) commented Sep 30, 2023 (edited):

Timings:

$ ./python -m timeit -n 5 -s 'from glob import glob' 'list(glob("**/*", recursive=True, include_hidden=True))'
5 loops, best of 5: 105 msec per loop  # for interest
$ ./python -m timeit -n 5 -s 'from pathlib import Path; p = Path()' 'list(p.glob("**/*", follow_symlinks=True))'
5 loops, best of 5: 86.6 msec per loop  # before
5 loops, best of 5: 72.7 msec per loop  # after

So `Path.glob()` is ~20% faster than before, and ~45% faster than `glob.glob()`, at least for this test case.
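
The rounded percentages can be checked against the timings quoted above. The sketch assumes "X% faster" means the ratio of the two run times minus one; `speedup_percent` is a hypothetical helper, not part of any benchmark tooling.

```python
def speedup_percent(slower_ms, faster_ms):
    """Percentage by which the faster run beats the slower one (ratio of times)."""
    return (slower_ms / faster_ms - 1) * 100


glob_glob_ms = 105.0       # glob.glob(), "for interest"
pathlib_before_ms = 86.6   # Path.glob() before this PR
pathlib_after_ms = 72.7    # Path.glob() after this PR

print(f"{speedup_percent(pathlib_before_ms, pathlib_after_ms):.1f}%")  # 19.1%
print(f"{speedup_percent(glob_glob_ms, pathlib_after_ms):.1f}%")       # 44.4%
```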

@AA-Turner (Member) left a comment:

A desk review:

A

barneygale reacted with thumbs up emoji
barneygale and others added 2 commits September 30, 2023 23:21
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
@barneygale (Contributor, Author) commented:

I've realised that the docs are a bit skew-whiff. Fix is in a separate PR: #110418

@encukou (Member) commented:

Is it correct to keep duplicated path separators?

>>> glob.translate('a//b')
'(?s:a//b)\\Z'

@barneygale (Contributor, Author) commented Nov 3, 2023 (edited):

> Is it correct to keep duplicated path separators?
>
> >>> glob.translate('a//b')
> '(?s:a//b)\\Z'

`glob()` keeps them:

>>> os.makedirs('a/b')
>>> glob.glob('a//b')
['a//b']

So I reckon yes?

@barneygale (Contributor, Author) commented:

Also, the number of additional slashes is meaningful in some cases, e.g. in Windows UNC paths or POSIX paths starting with two forward slashes. I don't think a pattern like `/foo` should match a path like `//foo`, and vice-versa.
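
pathlib's pure path classes show the same distinction. The paths below are illustrative only; nothing is read from disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Exactly two leading slashes are preserved on POSIX; three collapse to one.
print(PurePosixPath("//foo"))   # //foo
print(PurePosixPath("///foo"))  # /foo
print(PurePosixPath("/foo") == PurePosixPath("//foo"))  # False

# On Windows, a doubled leading separator introduces a UNC drive.
unc = PureWindowsPath("//server/share/file.txt")
print(unc.drive)  # \\server\share
```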

@barneygale (Contributor, Author) commented Nov 12, 2023 (edited):

Hey @encukou, do you think I can merge this, or should I wait for a more complete review from someone?

@encukou (Member) commented:

Oh, I should have been more clear that I wouldn't get to a thorough review in any reasonable time.
You already have a few approvals, merge away!

barneygale reacted with heart emoji

barneygale merged commit cf67ebf into python:main on Nov 13, 2023
aisk pushed a commit to aisk/cpython that referenced this pull request on Feb 11, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request on Sep 2, 2024
Reviewers: jaraco, AA-Turner, serhiy-storchaka (each awaiting requested review)
Assignees: No one assigned
Labels: performance (Performance or resource usage), topic-pathlib
Projects: None yet
Milestone: No milestone
5 participants: @barneygale, @encukou, @jaraco, @AA-Turner, @bedevere-bot
