- Notifications
You must be signed in to change notification settings - Fork5
Commit5223f96
committed
Fix regex back-references that are directly quantified with *.
The syntax "\n*", that is a backref with a * quantifier directly appliedto it, has never worked correctly in Spencer's library. This has been anopen bug in the Tcl bug tracker since 2005:https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894The core of the problem is in parseqatom(), which first changes "\n*" to"\n+|" and then applies repeat() to the NFA representing the backref atom.repeat() thinks that any arc leading into its "rp" argument is part of thesub-NFA to be repeated. Unfortunately, since parseqatom() already createdthe arc that was intended to represent the empty bypass around "\n+", thisarc gets moved too, so that it now leads into the state loop created byrepeat(). Thus, what was supposed to be an "empty" bypass gets turned intosomething that represents zero or more repetitions of the NFA representingthe backref atom. In the original example, in place of^([bc])\1*$we now have something that acts like^([bc])(\1+|[bc]*)$At runtime, the branch involving the actual backref fails, as it's supposedto, but then the other branch succeeds anyway.We could no doubt fix this by some rearrangement of the operations inparseqatom(), but that code is plenty ugly already, and what's more thewhole business of converting "x*" to "x+|" probably needs to go away to fixanother problem I'll mention in a moment. Instead, this patch suppressesthe *-conversion when the target is a simple backref atom, leaving the caseof m == 0 to be handled at runtime. This makes the patch in regcomp.c aone-liner, at the cost of having to tweak cbrdissect() a little. In theevent I went a bit further than that and rewrote cbrdissect() to check allthe string-length-related conditions before it starts comparing characters.It seems a bit stupid to possibly iterate through many copies of ann-character backreference, only to fail at the end because the targetstring's length isn't a multiple of n --- we could have found that outbefore starting. The existing coding could only be a win if integerdivision is hugely expensive compared to character comparison, but I don'tknow of any modern machine where that might be true.This does not fix all the problems with quantified back-references. Inparticular, the code is still broken for back-references that appear withina larger expression that is quantified (so that direct insertion of thequantification limits into the BACKREF node doesn't apply). I think fixingthat will take some major surgery on the NFA code, specifically introducingan explicit iteration node type instead of trying to transform iterationinto concatenation of modified regexps.Back-patch to all supported branches. In HEAD, also add a regression testcase for this. (It may seem a bit silly to create a regression test filefor just one test case; but I'm expecting that we will soon import a wholebunch of regex regression tests from Tcl, so might as well create theinfrastructure now.)1 parente00f68e commit5223f96
File tree
6 files changed
+103
-30
lines changed- src
- backend/regex
- test/regress
- expected
- sql
6 files changed
+103
-30
lines changedLines changed: 6 additions & 2 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
1088 | 1088 |
| |
1089 | 1089 |
| |
1090 | 1090 |
| |
1091 |
| - | |
1092 |
| - | |
| 1091 | + | |
| 1092 | + | |
| 1093 | + | |
| 1094 | + | |
| 1095 | + | |
| 1096 | + | |
1093 | 1097 |
| |
1094 | 1098 |
| |
1095 | 1099 |
| |
|
Lines changed: 46 additions & 27 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
720 | 720 |
| |
721 | 721 |
| |
722 | 722 |
| |
723 |
| - | |
| 723 | + | |
724 | 724 |
| |
725 | 725 |
| |
726 | 726 |
| |
| |||
962 | 962 |
| |
963 | 963 |
| |
964 | 964 |
| |
965 |
| - | |
966 | 965 |
| |
967 |
| - | |
968 |
| - | |
| 966 | + | |
| 967 | + | |
| 968 | + | |
| 969 | + | |
969 | 970 |
| |
970 |
| - | |
971 | 971 |
| |
972 | 972 |
| |
973 | 973 |
| |
| |||
978 | 978 |
| |
979 | 979 |
| |
980 | 980 |
| |
| 981 | + | |
981 | 982 |
| |
982 | 983 |
| |
983 |
| - | |
984 |
| - | |
| 984 | + | |
| 985 | + | |
985 | 986 |
| |
986 | 987 |
| |
987 | 988 |
| |
988 | 989 |
| |
989 | 990 |
| |
990 | 991 |
| |
991 |
| - | |
992 |
| - | |
| 992 | + | |
| 993 | + | |
| 994 | + | |
| 995 | + | |
| 996 | + | |
| 997 | + | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
993 | 1007 |
| |
994 |
| - | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
995 | 1012 |
| |
| 1013 | + | |
996 | 1014 |
| |
997 | 1015 |
| |
998 | 1016 |
| |
999 |
| - | |
1000 |
| - | |
1001 |
| - | |
| 1017 | + | |
| 1018 | + | |
| 1019 | + | |
| 1020 | + | |
| 1021 | + | |
| 1022 | + | |
| 1023 | + | |
| 1024 | + | |
| 1025 | + | |
| 1026 | + | |
1002 | 1027 |
| |
1003 |
| - | |
1004 | 1028 |
| |
1005 |
| - | |
1006 |
| - | |
1007 |
| - | |
| 1029 | + | |
| 1030 | + | |
| 1031 | + | |
1008 | 1032 |
| |
1009 |
| - | |
1010 |
| - | |
1011 |
| - | |
| 1033 | + | |
| 1034 | + | |
| 1035 | + | |
1012 | 1036 |
| |
1013 |
| - | |
1014 | 1037 |
| |
1015 |
| - | |
1016 |
| - | |
1017 |
| - | |
1018 |
| - | |
1019 |
| - | |
1020 |
| - | |
| 1038 | + | |
| 1039 | + | |
1021 | 1040 |
| |
1022 | 1041 |
| |
1023 | 1042 |
| |
|
Lines changed: 36 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + |
Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
30 | 30 |
| |
31 | 31 |
| |
32 | 32 |
| |
33 |
| - | |
| 33 | + | |
34 | 34 |
| |
35 | 35 |
| |
36 | 36 |
| |
|
Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
42 | 42 |
| |
43 | 43 |
| |
44 | 44 |
| |
| 45 | + | |
45 | 46 |
| |
46 | 47 |
| |
47 | 48 |
| |
|
Lines changed: 13 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + |
0 commit comments
Comments
(0)