gh-130942: Fix path separator matched in character ranges for glob.translate #130989
base:main
Changes from 10 commits
@@ -263,6 +263,54 @@ def escape(pathname):
_dir_open_flags = os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)
_no_recurse_symlinks = object()

def escape_regex_range_including_seps(pat, seps):
    """Escape ranges containing separators in a path
    """
    pat = list(pat)
    ordinal_seps=set(map(ord, seps))
    insideRange = False
    ds=[]
    buf=''
    idx1=0
    idx2=0
    rangeIncludesSep=False
    for path_idx, path_ch in enumerate(pat):
        if path_idx > 0:
            # An unescaped '[' opens a character class.
            if path_ch == '[' and pat[path_idx-1] != '\\':
                insideRange = True
                idx1=path_idx
                continue
            # An unescaped ']' closes it.
            if path_ch == ']' and pat[path_idx-1] != '\\':
                insideRange = False
                idx2=path_idx+1
        if insideRange:
            buf+=path_ch
            if path_ch == '-':
                # range() excludes its end point, so this covers lo..hi-1.
                glob_range = list(range(ord(pat[path_idx-1]), ord(pat[path_idx+1])))
                if ordinal_seps.intersection(glob_range):
                    rangeIncludesSep = True
        elif len(buf)>0:
            # Record the finished class and whether it spans a separator.
            ds.append([idx1, idx2, rangeIncludesSep])
            buf=''
            idx1=1
            idx2=2
            rangeIncludesSep=False
    # Escape the brackets of each recorded class that spans a separator.
    for ds_idx, ds_elem in enumerate(ds):
        idx1=ds_elem[0]
        idx2=ds_elem[1]
        rangeIncludesSep=ds_elem[2]
        if rangeIncludesSep:
            pat.insert(idx1, '\\')
            pat.insert(idx2, '\\')
    return ''.join(pat)
Review comment: Please revert.

def translate(pat, *, recursive=False, include_hidden=False, seps=None):
    """Translate a pathname with shell wildcards to a regular expression.
@@ -282,6 +330,8 @@ def translate(pat, *, recursive=False, include_hidden=False, seps=None):
            seps = (os.path.sep, os.path.altsep)
        else:
            seps = os.path.sep
    escaped_seps = ''.join(map(re.escape, seps))
    any_sep = f'[{escaped_seps}]' if len(seps) > 1 else escaped_seps
    not_sep = f'[^{escaped_seps}]'
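These three lines build the separator fragments that appear in the expected test strings: with `seps=('/', '\\')`, `any_sep` is `[/\\]` and `not_sep` is `[^/\\]`. The standalone sketch below reproduces them; the wrapper name `separator_classes` is hypothetical. In the hunk that follows, `not_sep` is passed to `fnmatch._translate` only as the replacement for `*` and `?`, while a bracket class written in the pattern is not intersected with it, which (as the issue title describes) is how a range such as `%-0` can end up matching `/`.

```python
import re

def separator_classes(seps):
    """Build the any_sep / not_sep fragments the same way translate() does."""
    escaped_seps = ''.join(map(re.escape, seps))
    # Several separators need a class; a single one can stand on its own.
    any_sep = f'[{escaped_seps}]' if len(seps) > 1 else escaped_seps
    not_sep = f'[^{escaped_seps}]'
    return any_sep, not_sep

any_sep, not_sep = separator_classes(('/', '\\'))
print(any_sep, not_sep)  # [/\\] [^/\\]
```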
@@ -312,10 +362,14 @@ def translate(pat, *, recursive=False, include_hidden=False, seps=None):
            if part:
                if not include_hidden and part[0] in '*?':
                    results.append(r'(?!\.)')
                results.extend(fnmatch._translate(part, f'{not_sep}*', not_sep)[0])
            if idx < last_part_idx:
                results.append(any_sep)
    res = ''.join(results)
    res=escape_regex_range_including_seps(res, seps=seps)
    return fr'(?s:{res})\Z'
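As posted, `escape_regex_range_including_seps` runs on the regex that `translate()` has already assembled. The sketch below keeps that post-processing idea but tightens the bookkeeping: the posted loop checks `range(ord(lo), ord(hi))`, which leaves out the upper bound, only looks for `[` and `]` after index 0, and inserts backslashes left to right, so the indices recorded for any later class are stale once an earlier class has been escaped. The name `escape_classes_spanning_seps` is hypothetical, and the sketch assumes class bodies contain only plain characters and `lo-hi` ranges. It illustrates the technique under review, which the reviewers asked to revert, rather than a proposed fix.

```python
def escape_classes_spanning_seps(regex: str, seps: str) -> str:
    """Escape each unescaped [...] class whose lo-hi range spans a separator.

    Sketch only (hypothetical helper): assumes class bodies hold plain
    characters and lo-hi ranges, with no backslash escapes inside the
    brackets and no nesting. Like the posted helper, any '[' preceded by
    a backslash is treated as escaped.
    """
    sep_codes = {ord(sep) for sep in seps}
    spans = []  # (index of '[', index of ']') for each class to escape
    i = 0
    while i < len(regex):
        if regex[i] == '[' and (i == 0 or regex[i - 1] != '\\'):
            end = regex.find(']', i + 1)
            if end == -1:
                break  # unterminated class: leave the rest untouched
            body = regex[i + 1:end]
            for j in range(1, len(body) - 1):
                if body[j] == '-':
                    lo, hi = ord(body[j - 1]), ord(body[j + 1])
                    # Inclusive on both ends, so '%-/' counts as spanning '/'.
                    if any(lo <= code <= hi for code in sep_codes):
                        spans.append((i, end))
                        break
            i = end + 1
        else:
            i += 1
    # Insert backslashes from right to left so the recorded indices stay valid.
    chars = list(regex)
    for start, end in reversed(spans):
        chars.insert(end, '\\')
        chars.insert(start, '\\')
    return ''.join(chars)
```

For instance, `escape_classes_spanning_seps('[%-0][%-0]', '/')` escapes both classes, giving `\[%-0\]\[%-0\]`.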
@@ -514,6 +514,9 @@ def fn(pat):
        self.assertEqual(fn('foo/bar\\baz'), r'(?s:foo[/\\]bar[/\\]baz)\Z')
Review comment: More generally, can you update
        self.assertEqual(fn('**/*'), r'(?s:(?:.+[/\\])?[^/\\]+)\Z')
        self.assertEqual(fn('foo[%-0]bar'), r'(?s:foo\[%-0\]bar)\Z')
Review comment: I'm not sure this is correct. From my understanding of the manpages quoted in the issue, a class should be escaped only if it contains a literal path separator, not a range encompassing it. In the latter case, we just need to exclude the separator:
[%-0] => (?!/)[%-0]
[ab/] => \[ab/\]
Edge case to be tested in bash and
Review comment: Also, does
mean that the entire glob should be escaped, or just the part with the separator? I.e., does
Review comment: The relevant standard seems to be here: https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/V3_chap02.html#tag_18_13_01
Author: This would indicate that a range which includes a '/' character as a non-literal would match that range but exclude the '/' character, at least with my interpretation.
Review comment: 2.13.3.1 looks to back my interpretation:
Review comment: @dmitya26 I don't have Python on hand, can you just quickly run
If it returns
Author: It returns '[]'. Edit: Oh wait, I think I might've misread how the directories need to be structured.
Review comment: ├── a[ and glob.glob('a[/-b]c') would return ['a[/-b]c'] for me.
Author: Wait, so regarding the spec, do you think we should disallow only '/' characters, the system's path separator (os.path.sep), or all the path separators given, like the ones in glob.translate?
Review comment: The current implementation already extends the spec to all given separators, e.g.
        self.assertEqual(fn('foo[U-d]bar'), r'(?s:foo\[U-d\]bar)\Z')

if __name__ == "__main__":
    unittest.main()
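The reviewer's proposal in the thread above (`[%-0] => (?!/)[%-0]` and `[ab/] => \[ab/\]`) can be sketched for a single bracket expression. `translate_bracket` is a hypothetical helper, not part of `glob` or `fnmatch`, and it assumes the body between `[` and `]` has no `!` negation and no backslash escapes. A class containing a literal separator is escaped as plain text; any other class is kept and guarded by a negative lookahead, so a range that spans a separator still cannot match it.

```python
import re

def translate_bracket(body: str, seps: str = '/') -> str:
    """Translate a glob bracket expression body (the text between '[' and ']').

    Hypothetical sketch of the reviewer's proposal; assumes the body has no
    '!' negation and no backslash escapes.
    """
    if any(sep in body for sep in seps):
        # A literal separator inside the brackets: under the reading of POSIX
        # discussed in review, '[' is then an ordinary character, so the
        # whole expression is matched as literal text.
        return re.escape(f'[{body}]')
    # Otherwise keep the class, but refuse to match a separator even when a
    # range such as %-0 spans it.
    if len(seps) == 1:
        sep_pattern = re.escape(seps)
    else:
        sep_pattern = '[' + ''.join(map(re.escape, seps)) + ']'
    return f'(?!{sep_pattern})[{body}]'
```

With the default `seps='/'`, the sketch returns `(?!/)[%-0]` and `\[ab/\]` for the two examples in the comment. The lookahead form also extends to several separators, as in `translate_bracket('U-d', '/\\')`.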
@@ -0,0 +1 @@
glob.translate escapes regex ranges that encompass a path separator.
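The NEWS entry describes escaping the whole class. A quick comparison with `re` illustrates the trade-off raised in review: the escaped regex expected by the new `foo[%-0]bar` test no longer matches `foo/bar`, but it also stops matching `foo.bar`, which the range `%-0` covers and which, under the reviewer's reading of POSIX, should still match. The two patterns below are copied from the unescaped class and from the test's expected string; nothing here calls `glob` itself.

```python
import re

# 'foo[%-0]bar' with the class kept as-is: matches any one char from '%' to '0'.
unescaped = re.compile(r'(?s:foo[%-0]bar)\Z')
# The form the new test expects: the whole class escaped, so it is literal text.
escaped = re.compile(r'(?s:foo\[%-0\]bar)\Z')

results = {
    name: (bool(unescaped.match(name)), bool(escaped.match(name)))
    for name in ['foo/bar', 'foo.bar', 'foo[%-0]bar']
}
for name, (plain, esc) in results.items():
    print(f'{name}: unescaped={plain} escaped={esc}')
```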