forked from postgres/postgres
Commit 4aea704
Fix semantics of regular expression back-references.
POSIX defines the behavior of back-references thus:

    The back-reference expression '\n' shall match the same (possibly
    empty) string of characters as was matched by a subexpression
    enclosed between "\(" and "\)" preceding the '\n'.

As far as I can see, the back-reference is supposed to consider only the data characters matched by the referenced subexpression. However, because our engine copies the NFA constructed from the referenced subexpression, it effectively enforces any constraints therein, too. As an example, '(^.)\1' ought to match 'xx', or any other string starting with two occurrences of the same character; but in our code it does not, and indeed can't match anything, because the '^' anchor constraint is included in the backref's copied NFA. If POSIX intended that, you'd think they'd mention it. Perl for one doesn't act that way, so it's hard to conclude that this isn't a bug.

Fix by modifying the backref's NFA immediately after it's copied from the reference, replacing all constraint arcs by EMPTY arcs so that the constraints are treated as automatically satisfied. This still allows us to enforce matching rules that depend only on the data characters; for example, in '(^\d+).*\1' the NFA matching step will still know that the backref can only match strings of digits.

Perhaps surprisingly, this change does not affect the results of any of a rather large corpus of real-world regexes. Nonetheless, I would not consider back-patching it, since it's a clear compatibility break.

Patch by me, reviewed by Joel Jacobson

Discussion: https://postgr.es/m/661609.1614560029@sss.pgh.pa.us

1 parent c5530d8, commit 4aea704
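
The fix described in the commit message can be illustrated with a toy model. The sketch below is written in Python rather than the engine's C, and every name in it (`Arc`, `relax_constraints`, the arc-kind strings) is a hypothetical stand-in, not an identifier from the PostgreSQL regex code. It only shows the idea: after copying the referenced subexpression's arcs, constraint arcs are rewritten as EMPTY arcs while character-consuming arcs are left alone.

```python
import re
from dataclasses import dataclass, replace

# Toy arc kinds (assumed names, not the real engine's identifiers).
PLAIN = "PLAIN"    # consumes one data character
EMPTY = "EMPTY"    # consumes nothing and can always be traversed
ANCHOR = "ANCHOR"  # a constraint such as '^' or '$'; consumes nothing

CONSTRAINT_KINDS = {ANCHOR}


@dataclass(frozen=True)
class Arc:
    kind: str
    src: int
    dst: int
    label: str = ""


def relax_constraints(arcs):
    """Return a copy of `arcs` with every constraint arc turned into EMPTY.

    Arcs that consume data characters are kept unchanged, so rules that
    depend only on the matched characters still apply to the copy.
    """
    return [
        replace(arc, kind=EMPTY, label="") if arc.kind in CONSTRAINT_KINDS else arc
        for arc in arcs
    ]


# Toy sub-NFA for the group in '(^.)': state 0 --'^'--> 1 --any char--> 2
group = [Arc(ANCHOR, 0, 1, "^"), Arc(PLAIN, 1, 2, ".")]

# The backref's copy keeps the character arc but no longer enforces '^'.
backref = relax_constraints(group)

# For comparison, Python's own `re` module (a different engine) already
# treats a back-reference as matching only the captured text.
assert re.fullmatch(r"(^.)\1", "xx") is not None
```

The original `group` list is left untouched, mirroring the commit's point that only the backref's copy is modified, not the referenced subexpression itself.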
File tree
5 files changed, +107 -0 lines changed
- doc/src/sgml
- src/backend/regex
- src/test/modules/test_regex/expected
- src/test/modules/test_regex/sql

Lines changed per file, in page order: 3, 71, 6, 22, and 5 additions; 0 deletions in each.