gh-91760: More strict rules for numerical group references and group names in RE #91792
Merged
10 commits
8909d14 gh-91760: More strict rules for numerical group references and group … (serhiy-storchaka)
46f320d Merge branch 'main' into re-group-name (serhiy-storchaka)
6c1a446 Merge branch 'main' into re-group-name (serhiy-storchaka)
5026649 Address review comments and minimize the diff. (serhiy-storchaka)
2f486af Merge branch 'main' into re-group-name (serhiy-storchaka)
ed09039 Merge branch 'main' into re-group-name (serhiy-storchaka)
954ef45 Merge branch 're-group-name' of github.com:serhiy-storchaka/cpython i… (serhiy-storchaka)
3835c24 Merge branch 'main' into re-group-name (serhiy-storchaka)
e0d2f85 Merge branch 'main' into re-group-name (serhiy-storchaka)
3c0dfcc Update What's New (serhiy-storchaka)
19 changes: 11 additions & 8 deletions Doc/library/re.rst
10 changes: 10 additions & 0 deletions Doc/whatsnew/3.12.rst
40 changes: 12 additions & 28 deletions Lib/re/_parser.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -291,17 +291,13 @@ def error(self, msg, offset=0): | ||
msg = msg.encode('ascii', 'backslashreplace').decode('ascii') | ||
return error(msg, self.string, self.tell() - offset) | ||
def checkgroupname(self, name, offset): | ||
if not (self.istext or name.isascii()): | ||
msg = "bad character in group name %a" % name | ||
raise self.error(msg, len(name) + offset) | ||
if not name.isidentifier(): | ||
msg = "bad character in group name %r" % name | ||
raise self.error(msg, len(name) + offset) | ||
def _class_escape(source, escape): | ||
# handle escape code inside character class | ||
@@ -717,11 +713,11 @@ def _parse(source, state, verbose, nested, first=False): | ||
if sourcematch("<"): | ||
# named group: skip forward to end of name | ||
name = source.getuntil(">", "group name") | ||
source.checkgroupname(name, 1) | ||
elif sourcematch("="): | ||
# named backreference | ||
name = source.getuntil(")", "group name") | ||
source.checkgroupname(name, 1) | ||
gid = state.groupdict.get(name) | ||
if gid is None: | ||
msg = "unknown group name %r" % name | ||
@@ -782,20 +778,14 @@ def _parse(source, state, verbose, nested, first=False): | ||
elif char == "(": | ||
# conditional backreference group | ||
condname = source.getuntil(")", "group name") | ||
if not (condname.isdecimal() and condname.isascii()): | ||
source.checkgroupname(condname, 1) | ||
condgroup = state.groupdict.get(condname) | ||
if condgroup is None: | ||
msg = "unknown group name %r" % condname | ||
raise source.error(msg, len(condname) + 1) | ||
else: | ||
condgroup = int(condname) | ||
if not condgroup: | ||
raise source.error("bad group number", | ||
len(condname) + 1) | ||
@@ -1022,20 +1012,14 @@ def addgroup(index, pos): | ||
if not s.match("<"): | ||
raise s.error("missing <") | ||
name = s.getuntil(">", "group name") | ||
if not (name.isdecimal() and name.isascii()): | ||
s.checkgroupname(name, 1) | ||
try: | ||
index = groupindex[name] | ||
except KeyError: | ||
raise IndexError("unknown group name %r" % name) from None | ||
else: | ||
index = int(name) | ||
if index >= MAXGROUPS: | ||
raise s.error("invalid group reference %d" % index, | ||
len(name) + 1) | ||
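The checks in the hunks above can be sketched as standalone functions. This is a minimal illustration, not the code from `Lib/re/_parser.py`: the names `is_numeric_reference` and `check_group_name` and the `is_text` parameter are hypothetical stand-ins for the inline `isdecimal() and isascii()` test and for `checkgroupname` with `self.istext`, and `ValueError` is used in place of the parser's own error type.

```python
def is_numeric_reference(name: str) -> bool:
    """Mirror the test used in the diff: only ASCII digits count as a number.

    str.isdecimal() alone would also accept non-ASCII decimal digits,
    so the diff pairs it with str.isascii().
    """
    return name.isdecimal() and name.isascii()


def check_group_name(name: str, is_text: bool) -> None:
    """Hypothetical stand-in for checkgroupname (raises ValueError instead).

    is_text=True models a str pattern, is_text=False a bytes pattern.
    """
    # Names in bytes patterns must be pure ASCII.
    if not (is_text or name.isascii()):
        raise ValueError("bad character in group name %a" % name)
    # In every case the name must be a valid identifier.
    if not name.isidentifier():
        raise ValueError("bad character in group name %r" % name)


if __name__ == "__main__":
    for candidate in ("12", "+1", " 1", "1_0", "\u0661"):
        print(repr(candidate), is_numeric_reference(candidate))
    check_group_name("caf\u00e9", is_text=True)  # accepted for str patterns
    try:
        check_group_name("caf\u00e9", is_text=False)
    except ValueError as exc:
        print("rejected for bytes patterns:", exc)
```

Anything that is not a run of ASCII digits falls through to the group-name check, which is why forms such as `+1` end up reported as a bad group name rather than parsed by `int()`.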
79 changes: 24 additions & 55 deletions Lib/test/test_re.py
5 changes: 5 additions & 0 deletions Misc/NEWS.d/next/Library/2022-04-21-19-14-29.gh-issue-91760.54AR-m.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Apply more strict rules for numerical group references and group names in
regular expressions. Only a sequence of ASCII digits is now accepted as
a numerical reference. The group name in
bytes patterns and replacement strings can now only contain ASCII letters,
digits, and underscores. |
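The effect described in this NEWS entry can be probed with a short script. This is a rough sketch: the helper `try_sub` and its default pattern and text are made up for illustration, and the output for the last two templates depends on the Python version in use (interpreters without this change may still accept them, possibly with a `DeprecationWarning`).

```python
import re


def try_sub(template, pattern=r"(?P<word>\w+)", text="hello"):
    """Return the substitution result, or the re.error message if rejected.

    Hypothetical helper used only for this illustration.
    """
    try:
        return re.sub(pattern, template, text)
    except re.error as exc:
        return f"re.error: {exc}"


if __name__ == "__main__":
    # Plain ASCII digits and ordinary group names are valid references.
    print(try_sub(r"[\g<1>]"))
    print(try_sub(r"[\g<word>]"))

    # Forms affected by the stricter rules: a sign or surrounding
    # whitespace is not "a sequence of ASCII digits".
    for template in (r"[\g<+1>]", r"[\g< 1 >]"):
        print(template, "->", try_sub(template))
```

Running it on interpreters from before and after the change is a quick way to find replacement strings in a code base that rely on the looser parsing.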