forked frompostgres/postgres
- Notifications
You must be signed in to change notification settings - Fork6
Commit12c9a04
committed
Implement lookbehind constraints in our regular-expression engine.
A lookbehind constraint is like a lookahead constraint in that it consumesno text; but it checks for existence (or nonexistence) of a match *ending*at the current point in the string, rather than one *starting* at thecurrent point. This is a long-requested feature since it exists in manyother regex libraries, but Henry Spencer had never got around toimplementing it in the code we use.Just making it work is actually pretty trivial; but naive copying of thelogic for lookahead constraints leads to code that often spends O(N^2) timeto scan an N-character string, because we have to run the match enginefrom string start to the current probe point each time the constraint ischecked. In typical use-cases a lookbehind constraint will be written atthe start of the regex and hence will need to be checked at every character--- so O(N^2) work overall. To fix that, I introduced a third copy of thecore DFA matching loop, paralleling the existing longest() and shortest()loops. This version, matchuntil(), can suspend and resume matching givena couple of pointers' worth of storage space. So we need only run itacross the string once, stopping at each interesting probe point and thenresuming to advance to the next one.I also put in an optimization that simplifies one-character lookahead andlookbehind constraints, such as "(?=x)" or "(?<!\w)", into AHEAD and BEHINDconstraints, which already existed in the engine. This avoids the overheadof the LACON machinery entirely for these rather common cases.The net result is that lookbehind constraints run a factor of three or soslower than Perl's for multi-character constraints, but faster than Perl'sfor one-character constraints ... and they work fine for variable-lengthconstraints, which Perl gives up on entirely. So that's not bad from acompetitive perspective, and there's room for further optimization ifanyone cares. (In reality, raw scan rate across a large input string isprobably not that big a deal for Postgres usage anyway; so I'm happy ifit's linear.)1 parentc5057b2 commit12c9a04
File tree
14 files changed
+690
-73
lines changed- doc/src/sgml
- src
- backend/regex
- include/regex
- test/regress
- expected
- sql
14 files changed
+690
-73
lines changedLines changed: 17 additions & 3 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
4477 | 4477 |
| |
4478 | 4478 |
| |
4479 | 4479 |
| |
| 4480 | + | |
| 4481 | + | |
| 4482 | + | |
| 4483 | + | |
| 4484 | + | |
| 4485 | + | |
| 4486 | + | |
| 4487 | + | |
| 4488 | + | |
| 4489 | + | |
| 4490 | + | |
| 4491 | + | |
| 4492 | + | |
| 4493 | + | |
4480 | 4494 |
| |
4481 | 4495 |
| |
4482 | 4496 |
| |
4483 | 4497 |
| |
4484 | 4498 |
| |
4485 |
| - | |
4486 |
| - | |
| 4499 | + | |
| 4500 | + | |
4487 | 4501 |
| |
4488 | 4502 |
| |
4489 | 4503 |
| |
| |||
5355 | 5369 |
| |
5356 | 5370 |
| |
5357 | 5371 |
| |
5358 |
| - | |
| 5372 | + | |
5359 | 5373 |
| |
5360 | 5374 |
| |
5361 | 5375 |
| |
|
Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
332 | 332 |
| |
333 | 333 |
| |
334 | 334 |
| |
335 |
| - | |
336 |
| - | |
337 |
| - | |
338 |
| - | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
339 | 339 |
| |
340 | 340 |
| |
341 | 341 |
| |
|
Lines changed: 12 additions & 3 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
196 | 196 |
| |
197 | 197 |
| |
198 | 198 |
| |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
199 | 207 |
| |
200 | 208 |
| |
201 |
| - | |
202 |
| - | |
| 209 | + | |
| 210 | + | |
203 | 211 |
| |
204 | 212 |
| |
205 | 213 |
| |
| |||
856 | 864 |
| |
857 | 865 |
| |
858 | 866 |
| |
859 |
| - | |
| 867 | + | |
| 868 | + | |
860 | 869 |
| |
861 | 870 |
| |
862 | 871 |
| |
|
Lines changed: 25 additions & 4 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
582 | 582 |
| |
583 | 583 |
| |
584 | 584 |
| |
| 585 | + | |
| 586 | + | |
585 | 587 |
| |
586 | 588 |
| |
587 | 589 |
| |
| |||
596 | 598 |
| |
597 | 599 |
| |
598 | 600 |
| |
599 |
| - | |
600 |
| - | |
| 601 | + | |
| 602 | + | |
601 | 603 |
| |
602 | 604 |
| |
603 |
| - | |
604 |
| - | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
605 | 626 |
| |
606 | 627 |
| |
607 | 628 |
| |
|
Lines changed: 43 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
1348 | 1348 |
| |
1349 | 1349 |
| |
1350 | 1350 |
| |
| 1351 | + | |
| 1352 | + | |
| 1353 | + | |
| 1354 | + | |
| 1355 | + | |
| 1356 | + | |
| 1357 | + | |
| 1358 | + | |
| 1359 | + | |
| 1360 | + | |
| 1361 | + | |
| 1362 | + | |
| 1363 | + | |
| 1364 | + | |
| 1365 | + | |
| 1366 | + | |
| 1367 | + | |
| 1368 | + | |
| 1369 | + | |
| 1370 | + | |
| 1371 | + | |
| 1372 | + | |
| 1373 | + | |
| 1374 | + | |
| 1375 | + | |
| 1376 | + | |
| 1377 | + | |
| 1378 | + | |
| 1379 | + | |
| 1380 | + | |
| 1381 | + | |
| 1382 | + | |
| 1383 | + | |
| 1384 | + | |
| 1385 | + | |
| 1386 | + | |
| 1387 | + | |
| 1388 | + | |
| 1389 | + | |
| 1390 | + | |
| 1391 | + | |
| 1392 | + | |
| 1393 | + | |
1351 | 1394 |
| |
1352 | 1395 |
| |
1353 | 1396 |
| |
|
0 commit comments
Comments
(0)