gh-130942: Fix path seperator matched in character ranges for glob.translate #130989

Open

dmitrin9 wants to merge 17 commits into python:main from dmitrin9:glob_translations

+48 −4

Conversation

dmitrin9 commented Mar 8, 2025 (edited by bedevere-appbot)

Issue: glob.translate incorrectly matches path separator in character ranges #130942

added testcase for globbing with a ranged seperator

be37b54

ghost commented Mar 8, 2025 (edited by ghost)

All commit authors signed the Contributor License Agreement.

bedevere-appbot added the tests label

Mar 8, 2025

bedevere-appbot commented Mar 8, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

bedevere-appbot added the awaiting review label

Mar 8, 2025

bedevere-appbot mentioned this pull request

Mar 8, 2025

glob.translate incorrectly matches path separator in character ranges #130942

Open

Merge branch 'main' into glob_translations

b874745

dmitrin9 marked this pull request as draft

March 8, 2025 23:09

bedevere-appbot removed the awaiting review label

Mar 8, 2025

blurb-itbot and others added 8 commits

March 8, 2025 23:26

📜🤖 Added by blurb_it.

6990566

Merge branch 'main' into glob_translations

cc03a6d

WIP - need to refine glob testcases.

cea1f5e

Merge branch 'glob_translations' of https://github.com/dmitya26/cpython…

5251d75

… into glob_translations

Escape regex ranges including seperators in glob.translate.

dd1b155

Merge branch 'main' into glob_translations

e8b3559

Typo function name in glob.py

9f461a5

Merge branch 'glob_translations' of https://github.com/dmitya26/cpython…

4820018

… into glob_translations

dmitrin9 marked this pull request as ready for review

March 10, 2025 19:16

bedevere-appbot added the awaiting review label

Mar 10, 2025

dmitrin9 (Author) commented Mar 10, 2025

@barneygale @picnixz
PR is ready for review! :)

dmitrin9 (Author) commented Mar 10, 2025

+type-bug -tests

barneygale (Contributor) commented Mar 10, 2025 (edited)

Thanks v much for taking a look!

Range expressions like [%-0] are still valid, so we should evaluate them as wildcards rather than matching literally IMO. Basically we just need to apply an additional restriction: don't match a separator. We could do that with a lookahead (untested):

diff --git a/Lib/fnmatch.py b/Lib/fnmatch.py
index 865baea2346..ee35dd4d24c 100644
--- a/Lib/fnmatch.py
+++ b/Lib/fnmatch.py
@@ -145,8 +145,10 @@ def _translate(pat, star, question_mark):
                     add('(?!)')
                 elif stuff == '!':
                     # Negated empty range: match any character.
-                    add('.')
+                    add(question_mark)
                 else:
+                    if question_mark != '.':
+                        add(f'(?={question_mark})')
                     # Escape set operations (&&, ~~ and ||).
                     stuff = _re_setops_sub(r'\\\1', stuff)
                     if stuff[0] == '!':
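To make the effect of the suggested lookahead concrete, here is a minimal sketch using only `re`. It assumes a single separator `/` and that `question_mark` is the separator-free class `[^/]`; both are assumptions made for illustration and are not taken from the patch itself.

```python
import re

# Assumption: with seps='/', the question_mark regex is a separator-free class.
QUESTION_MARK = '[^/]'

# Without a guard, the range %-0 spans '/', so the separator matches.
plain = re.compile(r'(?s:a[%-0]c)\Z')

# With a lookahead in front of the range, the character must also satisfy
# QUESTION_MARK, which rules the separator out.
guarded = re.compile(r'(?s:a(?=' + QUESTION_MARK + r')[%-0]c)\Z')

names = ('a/c', 'a.c', 'a-c')
plain_hits = [name for name in names if plain.match(name)]
guarded_hits = [name for name in names if guarded.match(name)]
print(plain_hits)
print(guarded_hits)
```

The positive lookahead leaves the range itself untouched, which is why the range still behaves as a wildcard for every non-separator character.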

barneygale added the type-bug label and removed the tests label

Mar 10, 2025

dkaszews reviewed

Mar 11, 2025

Lib/test/test_glob.py Outdated

		@@ -514,6 +514,9 @@ def fn(pat):
		self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')
		self.assertEqual(fn('*/'), r'(?s:(?:.+[/\\])?[^/\\]+)\Z')

		self.assertEqual(fn('foo[%-0]bar'), r'(?s:foo\[%-0\]bar)\Z')

dkaszews Mar 11, 2025 (edited)

I'm not sure this is correct. From my understanding of the manpages quoted in the issue, a class should be escaped only if it contains a literal path separator, not a range encompassing it. In the latter case, we just need to exclude the separator.

[%-0] => (?!/)[%-0]
[ab/] => \[ab/\]

Edge case to be tested in bash and glob.glob: is a range beginning or ending with a separator ([%-/] or [/-0]) the first case or the second one? What about the corner case of a single-element range [/-/]? I would say that all three should be escaped since they "contain an explicit / character".
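The two mappings can be exercised directly with `re`. This is only a sketch of the proposed output regexes wrapped in the usual `(?s:...)\Z` envelope; the file names are made up for illustration.

```python
import re

# Case 1: a range that merely spans '/': keep the class, exclude the separator.
spanning = re.compile(r'(?s:a(?!/)[%-0]c)\Z')

# Case 2: a class with an explicit '/': the whole bracket is taken literally.
literal = re.compile(r'(?s:a\[ab/\]c)\Z')

def matches(regex, names):
    """Return the names that the compiled regex accepts."""
    return [name for name in names if regex.match(name)]

spanning_hits = matches(spanning, ['a.c', 'a/c', 'a0c'])
literal_hits = matches(literal, ['aac', 'a/c', 'a[ab/]c'])
print(spanning_hits)
print(literal_hits)
```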

dkaszews Mar 11, 2025

Also, does

A range containing an
explicit '/' character is syntactically incorrect. (POSIX requires that
syntactically incorrect patterns are left unchanged.)

mean that the entire glob should be escaped, or just the part with the separator? I.e., does [ab][0/][xy] map to [ab]\[0/\][xy] or \[ab\]\[0/\]\[xy\]?

barneygale (Contributor) Mar 11, 2025

Relevant standard seems to be here: https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/V3_chap02.html#tag_18_13_01

dmitrin9 (Author) Mar 11, 2025 (edited)

(Thus, "[]-]" matches just the two characters ']' and
'-', and "[--0]" matches the three characters '-', '.', and '0',
since '/' cannot be matched.)

This would indicate that a range which includes a '/' character as a non-literal would match that range but exclude the '/' character, at least with my interpretation.

I got that from the glob manpage.
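As a quick check of that reading, the sketch below enumerates which printable ASCII characters each class accepts once the separator is excluded. The `(?!/)` guard is an assumption about how the exclusion would be expressed in the translated regex; the manpage only describes the matching behaviour.

```python
import re
import string

def matched_chars(char_regex):
    """Return the printable ASCII characters that a one-character regex accepts."""
    compiled = re.compile(char_regex)
    return {c for c in string.printable if compiled.fullmatch(c)}

closing_and_dash = matched_chars(r'[]-]')
dash_to_zero = matched_chars(r'(?!/)[--0]')
print(sorted(closing_and_dash))
print(sorted(dash_to_zero))
```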

dkaszews Mar 11, 2025

2.13.3.1 looks to back my interpretation:

If a path separator appears between brackets, be it as a single character or next to a hyphen, escape the entire bracket expression. For the '\\' in seps case, be careful to check that it is not an escape but an actual '\\\\'.
Else, if any hyphenated range spans a separator, add a negative lookahead. For simplicity, it can also be added for any bracket expression with a hyphen, or for any bracket at all: the result is the same, it just simplifies the regex in most cases.
All bracket expressions are analyzed separately, so a path separator in one does not invalidate and escape all the others.
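The three rules can be sketched as a small per-bracket helper. `translate_bracket` is a hypothetical function written for this illustration, not the code in the PR; it assumes the text between the brackets needs no further regex escaping and it ignores the backslash-as-escape caveat mentioned in the first rule.

```python
import re

def translate_bracket(inner, seps=('/',)):
    """Translate the text between '[' and ']' of one glob bracket expression.

    Hypothetical helper for illustration; not the code in this PR.
    """
    if any(sep in inner for sep in seps):
        # Explicit separator: the bracket expression is matched literally.
        return re.escape('[' + inner + ']')
    # Glob negation '!' becomes regex negation '^'.
    if inner.startswith('!'):
        inner = '^' + inner[1:]
    # No explicit separator: keep the class, but never let it match one.
    guard = '(?![' + ''.join(re.escape(sep) for sep in seps) + '])'
    return guard + '[' + inner + ']'

for glob_class in ('%-0', 'ab/', '/-0', '!a'):
    print(glob_class, '->', translate_bracket(glob_class))
```

Because the helper is called once per bracket expression, a separator in one bracket has no effect on how its neighbours are translated, which is the point of the third rule.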

dkaszews Mar 11, 2025

@dmitya26 I don't have Python on hand, can you just quickly run glob.glob('a[/-b]c') on the following tree:

|-- abc
`-- a[
    `-- -b]c

If it returns [abc], then you are correct; if it returns the file in the subdir, then my interpretation seems to match the existing implementation.

dmitrin9 (Author) Mar 11, 2025 (edited)

It returns '[]'.

edit: Oh wait I think I might've misread how the directories need to be structured.

dmitrin9 (Author) Mar 11, 2025

├── a[
│ └── -b]c
└── abc

and

glob.glob('a[/-b]c')

would return

['a[/-b]c']

for me.
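For anyone wanting to repeat the experiment, the following sketch builds the same tree in a temporary directory and runs the pattern there. `run_experiment` is a hypothetical helper name; the exact path string returned may differ between platforms, so no output is shown.

```python
import glob
import os
import tempfile

def run_experiment(pattern='a[/-b]c'):
    """Build the two-entry tree in a temporary directory and glob inside it."""
    previous = os.getcwd()
    with tempfile.TemporaryDirectory() as root:
        # Top-level file 'abc' and nested file '-b]c' inside directory 'a['.
        open(os.path.join(root, 'abc'), 'w').close()
        os.mkdir(os.path.join(root, 'a['))
        open(os.path.join(root, 'a[', '-b]c'), 'w').close()
        os.chdir(root)
        try:
            return glob.glob(pattern)
        finally:
            # Leave the directory before it is cleaned up.
            os.chdir(previous)

result = run_experiment()
print(result)
```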

dmitrin9 (Author) Mar 14, 2025 (edited)

Wait so regarding the spec, do you think we should be disallowing only '/' characters, the system's path separator (os.path.sep), or all path separators mentioned like the ones in glob.translate?

dkaszews Mar 15, 2025

Current implementation already extends the spec to all given separators, e.g. glob.translate('abc?', seps=['/', '\\']) maps to '(?s:abc[^/\\\\])\\Z'.
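The quoted regex can be checked on its own, without calling `glob.translate` (which is only available on recent Python versions). The candidate names below are made up for illustration.

```python
import re

# The regex quoted above, written as a raw string.
regex = re.compile(r'(?s:abc[^/\\])\Z')

candidates = ['abcd', 'abc/', 'abc\\', 'abc']
accepted = [name for name in candidates if regex.match(name)]
print(accepted)
```

Both separators are rejected in the `?` position, which is the behaviour the bracket handling would need to mirror.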

dmitrin9 added 2 commits

March 12, 2025 00:05

Lookahead to ignore path separators in ranges which span path separat…

c7f6d87

…ors in fnmatch._translate

Added empty negative lookahead in front of ranges which encompass pat…

d5748b8

…h separator in fnmatch._translate().

dmitrin9 (Author) commented Mar 12, 2025

@barneygale Alright. I just pushed the implementation dkaszews proposed earlier, as that seems to be the most compliant with the spec you mentioned. I can also get you the initial implementation you showed, where a lookahead excludes path separators from the range, if you feel that would be better. Feel free to take a look! :)

dkaszews commented Mar 12, 2025

@dmitya26 Looks good, could you please also add test cases for [abc/], [%-/], [/-0] and [/-/] to show that they are all escaped?

picnixz removed the type-bug label

Mar 12, 2025

picnixz (Member) commented Mar 12, 2025

(type-bug is reserved for the issues generally)

picnixz requested changes

Mar 12, 2025

picnixz (Member) left a comment

Please revert all unnecessary newline changes. New lines are added to separate logical sections of a function (the stdlib is quite compactly written).

In addition, please add more tests: some with multiple ranges, [%-0][1-9] for instance, some with incomplete ranges, some with side-by-side ranges, some with collapsing ranges. I may think of more once the implementation is stable.

Lib/fnmatch.py Outdated (resolved)

Lib/glob.py

		@@ -263,7 +263,6 @@ def escape(pathname):
		_dir_open_flags = os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)
		_no_recurse_symlinks = object()

picnixz (Member) Mar 12, 2025

Please revert

picnixz (Member) Mar 17, 2025

Please revert.

bedevere-appbot added awaiting changes and removed awaiting review labels

Mar 12, 2025

bedevere-appbot commented Mar 12, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

dkaszews commented Mar 12, 2025 (edited)

Just thought of something: doesn't [!a] currently translate trivially to [^a]? Because that also needs a negative lookahead, otherwise a[!b]c will also falsely match a/c.

Edit: Instead of a negative lookahead, a more compact solution would be to replace [!...] with [^/...].

dmitrin9 added 2 commits

March 13, 2025 11:34

Revert "Added empty negative lookahead in front of ranges which encom…

95b4ccf

…pass path separator in fnmatch._translate()."

Refine testcases and and escape ranges including path separator liter…

cdfcf47

…als.

dmitrin9 (Author) commented Mar 17, 2025

I have made the requested changes; please review again.

bedevere-appbot added awaiting change review and removed awaiting changes labels

Mar 17, 2025

bedevere-appbot commented Mar 17, 2025

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

bedevere-appbot requested a review from picnixz

March 17, 2025 07:07

fix blurb.

3929b06

dmitrin9 (Author) commented Mar 17, 2025 (edited)

Just thought of something: doesn't [!a] currently translate trivially to [^a]? Because that also needs a negative lookahead, otherwise a[!b]c will also falsely match a/c.
Edit: Instead of a negative lookahead, a more compact solution would be to replace [!...] with [^/...].

I definitely did implement this at some point, and it definitely is way easier than what's on my fork right now, I'm just not entirely confident it's spec compliant.

dkaszews commented Mar 17, 2025

Wouldn't this not be spec compliant though?

What spec? The only spec concerns the behavior of glob.glob, which says no class can match a path separator. So (?!/)[^...] and [^/...] are the same, because they match the exact same set of files.
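The claimed equivalence is easy to confirm by brute force for a concrete class. The sketch below uses `[!b]` as the example and compares both regex forms over every single-character string in the Latin-1 range; the choice of class and range is arbitrary.

```python
import re

lookahead_form = re.compile(r'(?s:(?!/)[^b])\Z')
folded_form = re.compile(r'(?s:[^/b])\Z')

# Compare both regexes on every single-character string in the Latin-1 range.
chars = [chr(code) for code in range(256)]
lookahead_set = {c for c in chars if lookahead_form.match(c)}
folded_set = {c for c in chars if folded_form.match(c)}

same = lookahead_set == folded_set
print(same)
print('/' in folded_set, 'b' in folded_set)
```

Folding the separator into the negated class only works for negated classes; a non-negated range such as `[%-0]` still needs a lookahead or an equivalent rewrite.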

dmitrin9 (Author) commented Mar 17, 2025

Oh, my mistake. I can have the changes out to you later today! :)

picnixz reviewed

Mar 17, 2025

picnixz (Member) left a comment

I haven't looked exactly at the implementation again because I want to be sure we're on the same page, especially concerning empty ranges.

Lib/glob.py

		@@ -263,7 +263,6 @@ def escape(pathname):
		_dir_open_flags = os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)
		_no_recurse_symlinks = object()

picnixz (Member) Mar 17, 2025

Please revert.

Lib/fnmatch.py

		else:
		negative_lookahead=''

picnixz (Member) Mar 17, 2025

Suggested change

	negative_lookahead=''
	negative_lookahead = ''

Lib/fnmatch.py Outdated

		@@ -135,6 +138,9 @@ def _translate(pat, star, question_mark):
		if chunks[k-1][-1] > chunks[k][0]:
		chunks[k-1] = chunks[k-1][:-1] + chunks[k][1:]
		del chunks[k]
		if len(chunks)>1:

picnixz (Member) Mar 17, 2025

Suggested change

	if len(chunks)>1:
	if len(chunks) > 1:

Lib/test/test_glob.py Outdated

		@@ -513,7 +513,14 @@ def fn(pat):
		return glob.translate(pat, recursive=True, include_hidden=True, seps=['/', '\\'])
		self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')
		self.assertEqual(fn('*/'), r'(?s:(?:.+[/\\])?[^/\\]+)\Z')

		self.assertEqual(fn('foo[!a]bar'), r'(?s:foo(?![/\\])[^a]bar)\Z')

picnixz (Member) Mar 17, 2025

We also need new tests for fnmatch.translate.

Lib/test/test_glob.py

		self.assertEqual(fn('foo[%-0]bar'), r'(?s:foo(?![/\\])[%-0]bar)\Z')
		self.assertEqual(fn('foo[%-0][1-9]bar'), r'(?s:foo(?![/\\])[%-0][1-9]bar)\Z')
		self.assertEqual(fn('foo[0-%]bar'), r'(?s:foo(?!)bar)\Z')
		self.assertEqual(fn('foo[^-'), r'(?s:foo\[\^\-)\Z')

picnixz (Member) Mar 17, 2025

We also need a test case with multiple ranges and incomplete ones, e.g., [0-%][0-%[0-%]. And possibly with an additional tail after the last range.
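The expected translations in the test hunk can be sanity-checked independently of `glob.translate` by compiling them and probing a few names. The regex strings are copied from the hunk; the probe names are made up for illustration.

```python
import re

# Expected translations copied from the test hunk above.
spanning = re.compile(r'(?s:foo(?![/\\])[%-0]bar)\Z')
backwards = re.compile(r'(?s:foo(?!)bar)\Z')
unclosed = re.compile(r'(?s:foo\[\^\-)\Z')

checks = {
    'range matches a dot': bool(spanning.match('foo.bar')),
    'range rejects a slash': not spanning.match('foo/bar'),
    'backwards range matches nothing': not backwards.match('foobar'),
    'unclosed bracket is literal': bool(unclosed.match('foo[^-')),
}
print(checks)
```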

Lib/test/test_glob.py (resolved)

Lib/test/test_glob.py

		@@ -513,7 +513,14 @@ def fn(pat):
		return glob.translate(pat, recursive=True, include_hidden=True, seps=['/', '\\'])
		self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')

picnixz (Member) Mar 17, 2025

More generally, can you update test_translate_matching and include the examples of https://man7.org/linux/man-pages/man7/glob.7.html so that we have a compliant implementation?

Misc/NEWS.d/next/Library/2025-03-08-23-26-50.gh-issue-130942.jxRMK_.rst Outdated

		@@ -0,0 +1 @@
		Glob.translate negative-lookaheads path separators regex ranges that ecompass path seperator. For ranges which include path separator literals, the range is escaped.

picnixz (Member) Mar 17, 2025

This requires a better indication. In addition, a versionchanged:: next should be added for both glob.translate() and fnmatch.translate(). Note that the meaning of / in fnmatch.translate() is different from glob.translate() because / is not special at all.

Suggested change

	Glob.translate negative-lookaheads path separators regex ranges that ecompass path seperator. For ranges which include path separator literals, the range is escaped.
	:func:`glob.translate` now correctly handles ranges implicitly containing path
	separators (for instance, ``[%-0]`` contains ``/``). In addition, ranges including
	path separator literals are now correctly escaped, as specified by POSIX specifications.

This suggestion is not perfect, so we will likely come back to it later. However, the docs for the translate() functions need to be updated.
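The point about `/` not being special in `fnmatch` can be seen with the public matching function. This is a small illustration of the behaviour on released Python versions, not of anything changed in this PR.

```python
import fnmatch

# fnmatch has no notion of path separators, so wildcards and ranges
# are free to match '/'.
question_mark_hit = fnmatch.fnmatchcase('a/b', 'a?b')
range_hit = fnmatch.fnmatchcase('a/b', 'a[%-0]b')
star_hit = fnmatch.fnmatchcase('dir/sub/file.py', '*.py')
print(question_mark_hit, range_hit, star_hit)
```

This is why any separator handling has to be driven by the arguments that `glob.translate()` passes down, leaving plain `fnmatch.translate()` results unchanged.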

dmitrin9 (Author) commented Mar 17, 2025

The empty ranges were replaced with a negative lookahead before I even opened the PR. I think we should leave it as is and remove the test case. The reason I wrote that test case was to ensure that I wasn't altering its behavior by accident when we were discussing how to handle invalid ranges all the way back at the beginning of the issue thread.

dkaszews commented Mar 17, 2025

To clarify, because "empty ranges" can be a bit ambiguous:

Immediately closed class []: a ] as the first character gets implicitly escaped and may become part of a bigger class; for example, [][] is actually [\]\[], i.e. either a literal [ or ]. Since the glob spec matches Python regex here, no special handling is needed.
Classes that are not empty but nevertheless cannot match anything, usually due to a backwards range such as [z-a]. Again, these could be left alone, but the current implementation simplifies them to an empty negative lookahead (?!), which has the same semantics of never matching anything.
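Both kinds of "empty range" can be poked at with `re` directly. The sketch below is illustrative only; it shows how Python's regex engine treats the raw constructs, not how `fnmatch._translate` processes them.

```python
import re

# A backwards range is rejected outright by Python's regex compiler.
try:
    re.compile('[z-a]')
    backwards_ok = True
except re.error:
    backwards_ok = False

# The empty negative lookahead compiles and can never succeed.
never = re.compile(r'(?s:foo(?!)bar)\Z')
never_hits = [s for s in ('foobar', 'foo', 'bar') if never.match(s)]

# ']' directly after '[' is a literal in Python regex too, so '[][]'
# is a class containing ']' and '['.
brackets = re.compile(r'[][]')
bracket_hits = [c for c in '[]x' if brackets.fullmatch(c)]

print(backwards_ok)
print(never_hits)
print(bracket_hits)
```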

dmitrin9 (Author) commented Mar 17, 2025

Yea, I think it's best to leave it as is. I never intended on changing it and I don't think it is impacting the current issue at all.

picnixz (Member) commented Mar 17, 2025

but current implementation simplifies them to empty negative lookahead (?!) which has the same semantic of never matching anything.

Oops, I think I only remembered the part where we remove empty ranges, but then make them match nothing. False alarm, my bad!

Refine fnmatch translate and glob translate testcases.

e5abc80

dmitrin9 (Author) commented Mar 19, 2025 (edited)

Alright.

I changed the negative lookahead for '!' matching. I also added some more tests which account for rules mentioned in the manpage as you suggested. I am seeing now that I was a bit lacking on the test_translate_matching testcases, so I'll get to adding more of those, but if you see anything more that I haven't noticed yet lmk.

edit: In my next commit I'm also going to remove the newline in the documentation file that's currently failing the CI.

Add some more matching tests for glob tests.

93c3092

dmitrin9 (Author) commented Mar 20, 2025

@picnixz I've made the requested changes! :)

dmitrin9 (Author) commented May 3, 2025

@picnixz @barneygale Hey, I was just wondering if you guys had a chance to look at the changes I made.

Labels

awaiting change review

4 participants
