Closed
Description
Bug report
Bug description:
It seems that SRE ignores the ASCII flag when compiling a case-insensitive character range whose upper bound lies outside the BMP:
>>> import re
# should match
>>> regex = re.compile("[\ua7aa-\uffff]", re.IGNORECASE)
>>> print(regex.match("\u0266"))
<re.Match object; span=(0, 1), match='ɦ'>
# should not match
>>> regex = re.compile("[\ua7aa-\U00010000]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
<re.Match object; span=(0, 1), match='ɦ'>
# must be related to case folding, since \ua7aa folds to \u0266
>>> regex = re.compile("[\ua7ab-\U00010000]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
None
# correct behavior when upper bound is in BMP
>>> regex = re.compile("[\ua7aa-\uffff]", re.ASCII | re.IGNORECASE)
>>> print(regex.match("\u0266"))
None
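To make the expected behavior concrete, here is a minimal sketch of a reference model that can be compared against what `re` actually returns. The helper names `expected_ascii_icase_match` and `actual_match` are hypothetical, not part of the `re` module. The model assumes that under `re.ASCII | re.IGNORECASE` only the ASCII letters are matched case-insensitively, and it covers a single simple range only.

```python
import re


def expected_ascii_icase_match(lo: str, hi: str, ch: str) -> bool:
    """Hypothetical reference model: should [lo-hi] match ch
    under re.ASCII | re.IGNORECASE?"""

    def in_range(c: str) -> bool:
        return ord(lo) <= ord(c) <= ord(hi)

    # A character that lies literally inside the range always matches.
    if in_range(ch):
        return True
    # Assumption: in ASCII mode only a-z / A-Z take part in case folding,
    # so a non-ASCII character such as U+0266 gets no second chance.
    if ord(ch) < 128 and ch.isalpha():
        return in_range(ch.swapcase())
    return False


def actual_match(lo: str, hi: str, ch: str) -> bool:
    """What the installed interpreter's re module reports for the same range."""
    pattern = re.compile(
        "[" + re.escape(lo) + "-" + re.escape(hi) + "]",
        re.ASCII | re.IGNORECASE,
    )
    return pattern.match(ch) is not None


if __name__ == "__main__":
    lo, hi, ch = "\ua7aa", "\U00010000", "\u0266"
    # U+A7AA lowercases to U+0266, which is the pair the report points at.
    print("lower() of U+A7AA is U+%04X" % ord(lo.lower()))
    print("expected:", expected_ascii_icase_match(lo, hi, ch))
    print("actual:  ", actual_match(lo, hi, ch))
```

If the two printed values differ, the interpreter is showing the behavior described in this report, which was observed on 3.12. The model is only an illustration of the intended semantics and does not reflect how SRE compiles character classes internally.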
CPython versions tested on:
3.12
Operating systems tested on:
Linux
Linked PRs
- gh-126505: Do not use Unicode case folding in ASCII regexes #126544
- gh-126505: Fix bugs in compiling case-insensitive character classes #126557
- [3.13] gh-126505: Fix bugs in compiling case-insensitive character classes (GH-126557) #126689
- [3.12] gh-126505: Fix bugs in compiling case-insensitive character classes (GH-126557) #126690