
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2006-07-09 18:34 by nneonneo, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| re_sub_unmatched_group.patch | serhiy.storchaka, 2014-09-18 10:54 | review | ||
| Messages (23) | |||
|---|---|---|---|
| msg29112 -(view) | Author: Robert Xiao (nneonneo)* | Date: 2006-07-09 18:34 | |
Using sre.sub[n], an "unmatched group" error can occur.

The test I used is this pattern:

    sre.sub("foo(?:b(ar)|baz)","\\1","foobaz")

This will cause the following backtrace to occur:

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "lib/python2.4/sre.py", line 142, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "lib/python2.4/sre.py", line 260, in filter
        return sre_parse.expand_template(template, match)
      File "lib/python2.4/sre_parse.py", line 782, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched group

Python Version 2.4.3, Mac OS X (behaviour has been verified on Windows 2.4.3 as well).

This behaviour, while by design, is unwanted because this type of matching usually requests that a blank match be returned (i.e. the example should return '').

The example that I was trying resembles the following:

    sre.sub("User: (?:Registered User #(\d+)|Guest)","%USERID|\1%",data)

The intended behaviour is that the function returns "" when the user is a guest and the user number if the user is a registered member. However, when this function encounters a Guest, it raises an exception and terminates, which is not what is wanted.

Perl and other regex engines behave as I have described, substituting empty strings for unmatched groups. The code fix is relatively simple, and would really help out for these types of things. | |||
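To make the report concrete, here is a minimal sketch of why the template expansion has nothing to substitute. It uses the modern `re` module rather than `sre`, and the variable names are illustrative only. The capturing group sits in the alternative that was not taken, so the match object reports it as `None`.

```python
import re

# Same pattern and input as in the report; group 1 lives in the
# "b(ar)" alternative, which is not the branch that matches "foobaz".
match = re.search(r"foo(?:b(ar)|baz)", "foobaz")

whole = match.group(0)                # the full match
group_one = match.group(1)            # None: the group did not participate
defaults = match.groups(default="")   # substitute "" for unmatched groups

print(whole, group_one, defaults)
```

`Match.groups(default="")` is one way to get the "blank for unmatched" behaviour the report asks for when the match object is handled directly.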
| msg29113 -(view) | Author: Matt Chaput (mchaput) | Date: 2007-02-15 18:35 | |
The current behavior also makes the "sub" function useless when you need to backreference a group that might not capture, since you have no chance to deal with the exception. | |||
| msg29114 -(view) | Author: Robert Xiao (nneonneo)* | Date: 2007-02-17 02:56 | |
AFAIK the findall function works as desired in this respect: empty matches will return empty strings. | |||
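A short sketch of the `findall` behaviour mentioned here, assuming the pattern from the original report and an input string chosen for illustration. With a single capturing group, `re.findall` returns the group's text for each match, and an unmatched group shows up as an empty string instead of raising.

```python
import re

# Two matches: "foobar" captures "ar", "foobaz" leaves group 1 unmatched.
found = re.findall(r"foo(?:b(ar)|baz)", "foobar foobaz")

print(found)
```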
| msg58672 -(view) | Author: Brandon Mintern (BMintern) | Date: 2007-12-16 12:24 | |
This is still a problem which has just given me a headache, because using re.sub now requires gymnastics instead of just using a simple string as I did in Perl. | |||
| msg69541 -(view) | Author: Gerard (gerardjp) | Date: 2008-07-11 08:17 | |
Hi All,

I found a workaround for the re.sub method so it does not raise an exception but returns an empty string when backref-ing an empty group.

This is the nutshell: When doing a search and replace with sub, replace the group represented as optional with a group represented as an alternation with one empty subexpression. So instead of this “(.+?)?” use this “(|.+?)” (without the double quotes).

If there’s nothing matched by this group the empty subexpression matches. Then an empty string is returned instead of a None and the sub method is executed normally instead of raising the “unmatched group” error.

A complete description is in my post: http://www.gp-net.nl/2008/07/11/solved-python-regex-raising-exception-unmatched-group/

Regards,
Gerard. | |||
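The difference between the two spellings can be sketched as follows. The patterns `x(\d+)?y` and `x(|\d+)y` and the sample strings are made up for illustration and are not taken from the linked post. An optional group that is skipped reports `None`, while a group with an empty alternative always participates and reports `""`.

```python
import re

# Optional group: may be skipped entirely, leaving it unmatched.
optional_group = re.compile(r"x(\d+)?y")
# Empty alternative: the group always matches, possibly the empty string.
empty_alternative = re.compile(r"x(|\d+)y")

optional_result = optional_group.match("xy").group(1)        # None
alternative_result = empty_alternative.match("xy").group(1)  # ""

# Because the group always participates, the backreference is always defined.
with_digits = empty_alternative.sub(r"<\1>", "x12y xy")

print(optional_result, repr(alternative_result), with_digits)
```

Note that this only helps when the group is optional in place; it does not cover a group that sits inside one branch of a larger alternation.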
| msg69558 -(view) | Author: Brandon Mintern (BMintern) | Date: 2008-07-11 16:52 | |
Looking at your code example, that solution seems quite obvious now, and I wouldn't even call it a "workaround". Thanks for figuring this out. Now if I could only remember what code I was using that for... | |||
| msg78272 -(view) | Author: Robert Xiao (nneonneo)* | Date: 2008-12-24 21:30 | |
How would I apply that workaround to my example? re.sub("foo(?:b(ar)|baz)","\\1","foobaz") | |||
| msg79830 -(view) | Author: Gerard (gerardjp) | Date: 2009-01-14 05:21 | |
Dear Bobby, I don't see what would be the part that generates the empty string? Regards, Gerard. | |||
| msg79853 -(view) | Author: Robert Xiao (nneonneo)* | Date: 2009-01-14 14:34 | |
Well, in this example the group (ar) is unmatched, so sre throws the error, and because of the alternation, the workaround you mentioned doesn't seem to directly apply.

A better example is probably re.sub("foo(?:b(ar)|foo)","\\1","foofoo") because this can't be simply repaired by refactoring the regex.

The correct behaviour, as I have observed in other regex implementations, is to replace the group by the empty string; for example, in JavaScript, 'foobaz'.replace(/foo(?:b(ar)|baz)/,'$1') returns "". | |||
| msg81064 -(view) | Author: Gerard (gerardjp) | Date: 2009-02-03 15:59 | |
Bobby, Can you post the actual text you need this for? The back ref indeed returns a None. I'm wondering if the regex can be simplified and if a positive lookbehind could solve this. Semantically speaking ... If there's a "b" then return the "ar", because then an empty alternate might again be of help. Kind regards, Gerard. | |||
| msg81118 -(view) | Author: Robert Xiao (nneonneo)* | Date: 2009-02-04 00:36 | |
It was so long ago, I've since redone half my codebase (the hack is still there, but I can't remember what it was meant to replace now :( ). Sorry about that. | |||
| msg81220 -(view) | Author: Matthew Barnett (mrabarnett)* | Date: 2009-02-05 19:32 | |
This has been addressed in issue #2636. | |||
| msg81462 -(view) | Author: Gerard (gerardjp) | Date: 2009-02-09 16:44 | |
Matthew, Thanx for the heads-up! Regards, Gerard. | |||
| msg108662 -(view) | Author: Terry J. Reedy (terry.reedy)* | Date: 2010-06-26 00:30 | |
If I understand "This has been addressed in issue #2636.", this issue should be closed as, perhaps, out-of-date or duplicate, with 2636 as superseder. Correct? | |||
| msg108669 -(view) | Author: Matthew Barnett (mrabarnett)* | Date: 2010-06-26 00:58 | |
Issue #2636 resulted in the new regex module (also available on PyPI), so this issue is addressed by that, but there's no patch for the re module. | |||
| msg108670 -(view) | Author: Ezio Melotti (ezio.melotti)* | Date: 2010-06-26 01:09 | |
It would be nice if you could port 'pieces' of #2636 to Python, in order to fix this and other bugs (and possibly add more features too). | |||
| msg155967 -(view) | Author: Nikki DelRosso (Nikker) | Date: 2012-03-15 22:02 | |
I'm having the same issue as the original author of this issue was. The workaround does not apply to the situation where the captured text is on one side of an "or" grouping, rather than just being optional. I'm trying to remove groups of text in parentheses that come at the end of a string, but if the content in a pair of parentheses is a number, I want to retain it. My regular expression looks like so:

These work:

    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009)')
    'avatar 2009'
    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (2009) (special edition)')
    'avatar 2009'

This doesn't:

    >>> re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib/python2.6/re.py", line 151, in sub
        return _compile(pattern, 0).sub(repl, string, count)
      File "/usr/lib/python2.6/re.py", line 278, in filter
        return sre_parse.expand_template(template, match)
      File "/usr/lib/python2.6/sre_parse.py", line 793, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched groupedition)')

Is there some way I can apply this workaround to this situation? | |||
| msg155969 -(view) | Author: Nikki DelRosso (Nikker) | Date: 2012-03-15 22:04 | |
Sorry, the non-working command should look as follows: re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$','\\1','avatar (special edition)') | |||
| msg155982 -(view) | Author: Matthew Barnett (mrabarnett)* | Date: 2012-03-16 00:59 | |
The replacement can be a callable, so you could do this: re.sub(r'(?:\((?:(\d+)|.*?)\)\s*)+$', lambda m: m.group(1) or '', 'avatar (special edition)') | |||
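The callable approach can be written out as a small runnable sketch. The helper name `keep_number` and the constant `PATTERN` are hypothetical names introduced here; the pattern and inputs come from the messages above. The callable receives the match object, so it can turn an unmatched group (`None`) into an empty string before anything is substituted.

```python
import re

# Pattern from the thread: trailing parenthesised groups, capturing digits.
PATTERN = re.compile(r"(?:\((?:(\d+)|.*?)\)\s*)+$")


def keep_number(match):
    """Return the captured number, or "" when group 1 did not match."""
    return match.group(1) or ""


with_year = PATTERN.sub(keep_number, "avatar (2009)")
without_year = PATTERN.sub(keep_number, "avatar (special edition)")

print(repr(with_year), repr(without_year))
```

Since no replacement template is expanded, this form should behave the same on interpreters that predate the eventual fix.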
| msg155983 -(view) | Author: Nikki DelRosso (Nikker) | Date: 2012-03-16 01:08 | |
Perfect; thank you! | |||
| msg227037 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)* | Date: 2014-09-18 10:54 | |
Here is a patch which makes unmatched groups be replaced by the empty string. These changes look more like a new feature than a bug fix and therefore can be applied only to 3.5. | |||
| msg228966 -(view) | Author: Roundup Robot (python-dev) | Date: 2014-10-10 08:16 | |
New changeset bd2f1ea04025 by Serhiy Storchaka in branch 'default': Issue 1519638: Now unmatched groups are replaced with empty strings in re.sub() https://hg.python.org/cpython/rev/bd2f1ea04025 | |||
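As a quick check of the behaviour this changeset describes, the sketch below reruns the examples from the original report. It assumes an interpreter that includes the change (Python 3.5 or later); on older versions the same calls raise the "unmatched group" error. The input strings for the user example are invented for illustration.

```python
import re

# Assumes Python 3.5+: unmatched groups expand to "" in the template.
result = re.sub(r"foo(?:b(ar)|baz)", r"\1", "foobaz")

user_pattern = r"User: (?:Registered User #(\d+)|Guest)"
guest = re.sub(user_pattern, r"%USERID|\1%", "User: Guest")
registered = re.sub(user_pattern, r"%USERID|\1%", "User: Registered User #42")

print(repr(result), guest, registered)
```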
| msg228969 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)* | Date: 2014-10-10 08:45 | |
Thank you for your review Antoine. | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:18 | admin | set | github: 43640 |
| 2014-10-10 08:45:02 | serhiy.storchaka | set | status: open -> closed resolution: fixed messages: +msg228969 stage: patch review -> resolved |
| 2014-10-10 08:16:35 | python-dev | set | nosy: +python-dev messages: +msg228966 |
| 2014-10-10 07:50:01 | serhiy.storchaka | set | assignee:serhiy.storchaka |
| 2014-10-08 20:32:20 | pitrou | set | assignee:effbot -> (no value) |
| 2014-09-18 10:54:53 | serhiy.storchaka | set | files: +re_sub_unmatched_group.patch type: enhancement components: + Library (Lib) versions: + Python 3.5, - Python 2.6, Python 2.7 keywords: +patch nosy: +serhiy.storchaka messages: +msg227037 stage: patch review |
| 2013-09-16 14:39:27 | THRlWiTi | set | nosy: +THRlWiTi |
| 2012-03-16 01:08:10 | Nikker | set | messages: +msg155983 |
| 2012-03-16 00:59:59 | mrabarnett | set | messages: +msg155982 |
| 2012-03-15 22:04:12 | Nikker | set | messages: +msg155969 |
| 2012-03-15 22:02:49 | Nikker | set | nosy: +Nikker messages: +msg155967 |
| 2010-06-26 01:09:57 | ezio.melotti | set | nosy: +ezio.melotti messages: +msg108670 |
| 2010-06-26 00:58:24 | mrabarnett | set | messages: +msg108669 |
| 2010-06-26 00:30:53 | terry.reedy | set | nosy: +terry.reedy messages: +msg108662 versions: - Python 2.5, Python 3.0 |
| 2009-02-09 16:44:49 | gerardjp | set | messages: +msg81462 |
| 2009-02-05 19:32:55 | mrabarnett | set | nosy: +mrabarnett messages: +msg81220 |
| 2009-02-04 00:36:38 | nneonneo | set | messages: +msg81118 |
| 2009-02-03 15:59:47 | gerardjp | set | messages: +msg81064 |
| 2009-01-14 14:34:02 | nneonneo | set | messages: +msg79853 versions: + Python 2.6, Python 2.5, Python 3.0 |
| 2009-01-14 05:21:40 | gerardjp | set | messages: +msg79830 |
| 2008-12-24 21:30:42 | nneonneo | set | messages: +msg78272 |
| 2008-09-27 14:39:08 | timehorse | set | versions: + Python 2.7, - Python 2.5 |
| 2008-09-27 14:36:36 | timehorse | set | nosy: +timehorse |
| 2008-07-11 16:52:19 | BMintern | set | messages: +msg69558 |
| 2008-07-11 08:17:20 | gerardjp | set | nosy: +gerardjp messages: +msg69541 title: Unmatched Group issue -> Unmatched Group issue - workaround |
| 2007-12-16 12:24:50 | BMintern | set | nosy: +BMintern messages: +msg58672 |
| 2006-07-09 18:34:12 | nneonneo | create | |