Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-130942: Fix path seperator matched in character ranges for glob.translate#130989

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
dmitrin9 wants to merge17 commits intopython:main
base:main
Choose a base branch
Loading
fromdmitrin9:glob_translations

Conversation

dmitrin9
Copy link

@dmitrin9dmitrin9 commentedMar 8, 2025
edited by bedevere-appbot
Loading

@ghost
Copy link

ghost commentedMar 8, 2025
edited by ghost
Loading

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-appbedevere-appbot added the testsTests in the Lib/test dir labelMar 8, 2025
@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@bedevere-app
Copy link

Most changes to Pythonrequire a NEWS entry. Add one using theblurb_it web app or theblurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply theskip news label instead.

@dmitrin9dmitrin9 marked this pull request as draftMarch 8, 2025 23:09
@dmitrin9dmitrin9 marked this pull request as ready for reviewMarch 10, 2025 19:16
@dmitrin9
Copy link
Author

@barneygale@picnixz
PR is ready for review! :)

@dmitrin9
Copy link
Author

+type-bug -tests

@barneygale
Copy link
Contributor

barneygale commentedMar 10, 2025
edited
Loading

Thanks v much for taking a look!

Range expressions like[%-0] are still valid, so we should evaluate them as wildcards rather than matching literally IMO. Basically we just need to apply an additional restriction: don't match a separator. We could do that with a lookahead (untested):

diff --git a/Lib/fnmatch.py b/Lib/fnmatch.pyindex 865baea2346..ee35dd4d24c 100644--- a/Lib/fnmatch.py+++ b/Lib/fnmatch.py@@ -145,8 +145,10 @@ def _translate(pat, star, question_mark):                     add('(?!)')                 elif stuff == '!':                     # Negated empty range: match any character.-                    add('.')+                    add(question_mark)                 else:+                    if question_mark != '.':+                        add(f'(?={question_mark})')                     # Escape set operations (&&, ~~ and ||).                     stuff = _re_setops_sub(r'\\\1', stuff)                     if stuff[0] == '!':
dkaszews reacted with thumbs up emojidmitrin9 reacted with hooray emoji

@barneygalebarneygale added type-bugAn unexpected behavior, bug, or error and removed testsTests in the Lib/test dir labelsMar 10, 2025
@@ -514,6 +514,9 @@ def fn(pat):
self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')
self.assertEqual(fn('**/*'), r'(?s:(?:.+[/\\])?[^/\\]+)\Z')

self.assertEqual(fn('foo[%-0]bar'), r'(?s:foo\[%-0\]bar)\Z')
Copy link

@dkaszewsdkaszewsMar 11, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not sure this is correct. From my understanding of manpages quoted in the issue, a class should be escaped only if it contains a literal path separator, not a range encompassing it. In latter case, we need to just exclude the separator.

[%-0]=> (?!/)[%-0][ab/]=> \[ab/\]

Edge case to be tested in bash andglob.glob: is a range beginning with separator ([%-/] or[/-0]) the first case or the second one? What about corner case of single element range[/-/]? I would say that all three should be escaped since they "contain an explicit/ character".

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Also, does

A range containing an
explicit '/' character is syntactically incorrect. (POSIX requires that
syntactically incorrect patterns are left unchanged.)

mean that entire glob should be escaped, or just the part with the separator? I.e, does[ab][0/][xy] map to[ab]\[0/\][xy] or\[ab\]\[0/\]\[xy\]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Copy link
Author

@dmitrin9dmitrin9Mar 11, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

(Thus, "[]-]" matches just the two characters ']' and
'-', and "[--0]" matches the three characters '-', '.', and '0',
since '/' cannot be matched.)

This would indicate that a range which includes a '/' character as a non-literal would match that range but exclude the '/' character, at least with my interpretation.

I got that from the glob manpage.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

2.13.3.1 looks to back my interpretation:

  1. If path separator appears between brackets, be it a single character or next to a hyphen, escape entire bracket expression. For'\\' in seps case, be careful to check it is not an escape but actual'\\\\'.
  2. Else, if any hyphened range spans a separator, add a negative lookahead. For simplicity, it can also be added for any bracket expression with a hyphen, or any bracket at all - result is the same, just simplifies regex in most cases.
  3. All bracket expressions are analyzed separately, so path separator in one does not invalidate and escape all others.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

@dmitya26 I don't have Python on hand, can you just quickly runglob.glob('a[/-b]c') on a following tree:

|-- abc`-- a[    `-- -b]c

If it returns[abc], then you are correct, if it returns the file in subdir then my interpretation seems to match existing implementation.

Copy link
Author

@dmitrin9dmitrin9Mar 11, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

It returns '[]'.

edit: Oh wait I think I might've misread how the directories need to be structured.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

├── a[
│   └── -b]c
└── abc

and

glob.glob('a[/-b]c')

would return

['a[/-b]c']

for me.

dkaszews reacted with thumbs up emoji
Copy link
Author

@dmitrin9dmitrin9Mar 14, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Wait so regarding the spec, do you think we should be disallowing only '/' characters, the system's path separator (os.path.sep), or all path separators mentioned like the ones in glob.translate?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Current implementation already extends the spec to all given separators, e.g.glob.translate('abc?', seps=['/', '\\']) maps to'(?s:abc[^/\\\\])\\Z'.

@dmitrin9
Copy link
Author

@barneygale Alright . I just pushed the implementation Dkaszews proposed earlier as that seems to be the most compliant with the spec you mentioned earlier on. I can also get you the initial implementation you showed where it uses a lookahead to exclude path separators from the range though if you feel that would be better. Feel free to take a look! :)

@dkaszews
Copy link

@dmitya26 Looks good, could you please also add test cases for[abc/], [%-/], [/-0] and[/-/] to show that they are all escaped?

@picnixzpicnixz removed the type-bugAn unexpected behavior, bug, or error labelMar 12, 2025
@picnixz
Copy link
Member

(type-bug is reserved for the issues generally)

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please revert all new lines changes that are un-necessary. New lines are added to separate logical sections of a function (the stdlib is quite compactly written).

In addition, please add more tests, some tests with multiple ranges,[%-0][1-9] for instance, some with incomplete ranges, some with side-by-side ranges, some with collapsing ranges. I may think of more once the implementation is stable.

dmitrin9 reacted with thumbs up emoji
@@ -263,7 +263,6 @@ def escape(pathname):
_dir_open_flags = os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)
_no_recurse_symlinks = object()


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please revert

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please revert.

@bedevere-app
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phraseI have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@dkaszews
Copy link

dkaszews commentedMar 12, 2025
edited
Loading

Just though of something, doesn't[!a] currently translate trivially to[^a]? Because that also needs a negative lookahead, otherwisea[!b]c will also falsely matcha/c.

Edit: Instead of negative lookahead, a more compact solution would be to replace[!...] with[^/...].

dmitrin9 reacted with thumbs up emoji

@dmitrin9
Copy link
Author

I have made the requested changes; please review again.

@bedevere-app
Copy link

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

@bedevere-appbedevere-appbot requested a review frompicnixzMarch 17, 2025 07:07
@dmitrin9
Copy link
Author

dmitrin9 commentedMar 17, 2025
edited
Loading

Just though of something, doesn't[!a] currently translate trivially to[^a]? Because that also needs a negative lookahead, otherwisea[!b]c will also falsely matcha/c.

Edit: Instead of negative lookahead, a more compact solution would be to replace[!...] with[^/...].

I definitely did implement this at some point, and it definitely is way easier than what's on my fork right now, I'm just not entirely confident it's spec compliant.

@dkaszews
Copy link

Wouldn't this not be spec compliant though?

What spec? The only spec concerns behavior ofglob.glob, which says no class can match path separator. So(?!/)[^...] and[^/...] are the same, because they match the exact same set of files.

@dmitrin9
Copy link
Author

Oh, my mistake. I can have the changes out to you later today! :)

dkaszews reacted with thumbs up emoji

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I haven't looked exactly at the implementation again because I want to be sure we're on the same page, especially concerning empty ranges.

@@ -263,7 +263,6 @@ def escape(pathname):
_dir_open_flags = os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)
_no_recurse_symlinks = object()


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Please revert.

else:
negative_lookahead=''
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
negative_lookahead=''
negative_lookahead=''

@@ -135,6 +138,9 @@ def _translate(pat, star, question_mark):
if chunks[k-1][-1] > chunks[k][0]:
chunks[k-1] = chunks[k-1][:-1] + chunks[k][1:]
del chunks[k]
if len(chunks)>1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Suggested change
iflen(chunks)>1:
iflen(chunks)>1:

@@ -513,7 +513,14 @@ def fn(pat):
return glob.translate(pat, recursive=True, include_hidden=True, seps=['/', '\\'])
self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')
self.assertEqual(fn('**/*'), r'(?s:(?:.+[/\\])?[^/\\]+)\Z')

self.assertEqual(fn('foo[!a]bar'), r'(?s:foo(?![/\\])[^a]bar)\Z')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We also need new tests forfnmatch.translate.

self.assertEqual(fn('foo[%-0]bar'), r'(?s:foo(?![/\\])[%-0]bar)\Z')
self.assertEqual(fn('foo[%-0][1-9]bar'), r'(?s:foo(?![/\\])[%-0][1-9]bar)\Z')
self.assertEqual(fn('foo[0-%]bar'), r'(?s:foo(?!)bar)\Z')
self.assertEqual(fn('foo[^-'), r'(?s:foo\[\^\-)\Z')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

We need also a test case with multiple ranges and incomplete ones, e.g.,[0-%][0-%[0-%]. And possibly with an additional tail after the last range.

@@ -513,7 +513,14 @@ def fn(pat):
return glob.translate(pat, recursive=True, include_hidden=True, seps=['/', '\\'])
self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

More generally, can you upodatetest_translate_matching and include the examples ofhttps://man7.org/linux/man-pages/man7/glob.7.html so that we have a compliant implementation?

@@ -0,0 +1 @@
Glob.translate negative-lookaheads path separators regex ranges that ecompass path seperator. For ranges which include path separator literals, the range is escaped.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This requires a better indication. In addition, aversionchanged:: next should be added for bothglob.translate() andfnmatch.translate(). Note that the meaning of/ infnmatch.translate() is different fromglob.translate() because/ isnot special at all.

Suggested change
Glob.translate negative-lookaheads path separators regexrangesthat ecompass path seperator. For ranges which include path separator literals, the range is escaped.
:func:`glob.translate` now correctly handlesrangesimplicitly containing path
separators (for instance, ``[0-%]`` contains ``/``). In addition, ranges including
path separator literals are now correctly escaped, as specified by POSIX specifications.

This suggestion is not perfect so we will likely come back later. However for the translate() functions need to be updated.

@dmitrin9
Copy link
Author

The empty ranges were replaced with a negative lookahead before I even opened the PR. I think we should leave it as is and remove the test case. The reason I wrote that test case was to insure that I wasn't altering its behavior by accident when we were discussing how to handle invalid ranges all the way back in the beginning of the issue thread.

@dkaszews
Copy link

To clarify, because "empty ranges" can be a bit ambiguous:

  • Immediately closed class[] -] as first character gets implicitly escaped, may become part of bigger class such as[][] is actually[\]\[], i.e. either literal[ or]. Since glob spec matches Python regex, no special handling needed.
  • Classes that are not empty, but nevertheless cannot match anything, usually due to a backwards range such as[z-a]. Again, could be left alone, but current implementation simplifies them to empty negative lookahead(?!) which has the same semantic of never matching anything.

@dmitrin9
Copy link
Author

Yea, I think it's best to leave it as is. I never intended on changing it and I don't think it is impacting the current issue at all.

@picnixz
Copy link
Member

but current implementation simplifies them to empty negative lookahead (?!) which has the same semantic of never matching anything.

Ups, I think I only remembered the part were we remove empty ranges, but then make them match nothing. False alarm, my bad!

@dmitrin9
Copy link
Author

dmitrin9 commentedMar 19, 2025
edited
Loading

Alright.

I changed the negative lookahead for '!' matching. I also added some more tests which account for rules mentioned in the manpage as you suggested. I am seeing now that I was a bit lacking on the test_translate_matching testcases, so I'll get to adding more of those, but if you see anything more that I haven't noticed yet lmk.

edit: In my next commit I'm also going to remove the newline in the documentation file that's currently failing the CI.

@dmitrin9
Copy link
Author

@picnixz I've made the requested changes! :)

@dmitrin9
Copy link
Author

@picnixz@barneygale Hey, I was just wondering if you guys had a chance to look at the changes I made.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@barneygalebarneygalebarneygale left review comments

@picnixzpicnixzpicnixz requested changes

@dkaszewsdkaszewsdkaszews left review comments

Assignees
No one assigned
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

4 participants
@dmitrin9@barneygale@dkaszews@picnixz

[8]ページ先頭

©2009-2025 Movatter.jp