Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commitb5517b2

Browse files
committed
optimize positive lookahead at end of pattern
- we achieve this by moving a trailing positive lookahead outside our wrapped capture group 0.- if the contents of the lookahead are easy, and the rest of the expression is also easy, we can then delegate the whole regexp to regex-automata and fixup the capture groups afterwards.- If the contents are not easy, no harm is done because the expression retains the same semantics in our VM, as we only perform the move if the whole pattern is not self-recursive. Also, as it is done after parsing, any relative backreferences etc. should be unaffected.- Because it is done before analysis, no re-analysis needs to take place, so it is quite efficient.This allows us to take full advantage of regex-automata's prefilters etc, and in general delegating to the NFA should always be quicker than using a backtracking engine and trying each character position in the haystack one at a time.This type of regex pattern is a quite common scenario inside `.sublime-syntax` definitions which are compatible with Sublime Text's `sregex` engine.
1 parent39bee77 commitb5517b2

File tree

14 files changed

+327
-34
lines changed

14 files changed

+327
-34
lines changed

‎.github/FUNDING.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
github:[raphlinus, robinst]
1+
github:[raphlinus, robinst, keith-hall]

‎CHANGELOG.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,12 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
66
This project adheres to[Semantic Versioning](http://semver.org/spec/v2.0.0.html),
77
with the exception that 0.x versions can break between minor versions.
88

9+
##[Unreleased]
10+
###Added
11+
- Add an optimization step after the pattern is parsed but before it is analyzed.
12+
Currently it only optimizes one specific use-case - where the expression is easy except
13+
for a trailing positive lookahead whose contents are also easy. The optimization is to avoid the VM.
14+
915
##[0.15.0] - 2025-07-06
1016
###Added
1117
- Support`\Z` - anchor to the end of the text before any trailing newlines. (#148)

‎Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
[package]
22
name ="fancy-regex"
33
version ="0.15.0"
4-
authors = ["Raph Levien <raph@google.com>","Robin Stocker <robin@nibor.org>"]
4+
authors = ["Raph Levien <raph@google.com>","Robin Stocker <robin@nibor.org>","Keith Hall <keith.hall@available.systems>"]
55
edition ="2018"
66
license ="MIT"
77
description ="An implementation of regexes, supporting a relatively rich set of features, including backreferences and look-around."

‎PERFORMANCE.md

Lines changed: 23 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,9 @@ For a good explanation of that, read
1212

1313
Let's look at the regex from the README again:
1414

15+
```regex
1516
(a|b|ab)*bc
17+
```
1618

1719
And the input text:
1820

@@ -47,15 +49,28 @@ delegate to the regex crate which matches it in linear runtime.
4749

4850
Let's look at another regex, one that makes use of a "fancy" look-ahead:
4951

52+
```regex
5053
(a|b|ab)*(?=c)
51-
54+
```
5255
When fancy-regex matches it against this input:
5356

5457
abababababababababababababababababababababababababababab
5558

56-
It's slow! The reason is that`(?=c)` is not supported by the regex
57-
crate, so we need to handle it with backtracking. And because
58-
`(a|b|ab)*` is before it, we need to do it with backtracking as well.
59+
It's still fast! The reason is that although`(?=c)` is not supported
60+
by the regex crate, fancy-regex detects the trailing positive lookahead
61+
and is able to essentially rewrite the pattern into
62+
63+
```regex
64+
((a|b|ab)*)c
65+
```
66+
67+
and thus delegate the whole thing to the regex crate, and fixup the
68+
captures/match boundaries after a match is found.
69+
70+
If, however, the lookahead didn't come at the end of the pattern,
71+
it would be slow! The reason is that fancy-regex would need to handle
72+
the lookahead with backtracking. And because`(a|b|ab)*` is before it,
73+
that also needs to be done with backtracking as well.
5974

6075
Oniguruma doesn't have a problem with this particular case because its
6176
optimization saves it again: It checks if there's a`c` in the input
@@ -70,8 +85,10 @@ inner part of the look-ahead can be delegated to regex entirely.
7085

7186
###Summary
7287

73-
* If the regex doesn't use fancy features, fancy-regex should have
74-
linear runtime compared to Oniguruma's exponential worst-case.
88+
* If the regex doesn't use fancy features, or the features used in the
89+
pattern can be identified as being syntactic sugar and slightly
90+
rewritten, fancy-regex should have linear runtime compared to
91+
Oniguruma's exponential worst-case.
7592
* Even if the regex doesn't use any fancy features, Oniguruma can be
7693
faster because it is a mature and highly optimized engine.
7794
* With fancy features, Oniguruma can be faster because of optimizations.

‎README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -134,7 +134,8 @@ creating the excellent regex crate.
134134

135135
##Authors
136136

137-
The main author is Raph Levien, with many contributions from Robin Stocker.
137+
The main author is Raph Levien, with many contributions from Robin Stocker
138+
and Keith Hall.
138139

139140
##Contributions
140141

‎benches/bench.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -86,7 +86,7 @@ fn run_tricky(c: &mut Criterion) {
8686
}
8787

8888
fnrun_backtrack_limit(c:&mutCriterion){
89-
let tree =Expr::parse_tree("(?i)(a|b|ab)*(?=c)").unwrap();
89+
let tree =Expr::parse_tree("(?i)(a|b|ab)*(?>c)").unwrap();
9090
let a =analyze(&tree).unwrap();
9191
let p =compile(&a).unwrap();
9292
let s ="abababababababababababababababababababababababababababab";

‎examples/toy.rs

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020

2121
//! A simple test app for exercising and debugging the regex engine.
2222
23-
use fancy_regex::internal::{analyze, compile, run_trace,Insn,Prog};
23+
use fancy_regex::internal::{analyze, compile,optimize,run_trace,Insn,Prog};
2424
use fancy_regex::*;
2525
use std::env;
2626
use std::fmt::{Display,Formatter,Result};
@@ -35,6 +35,10 @@ fn main() {
3535
let re = args.next().expect("expected regexp argument");
3636
let e =Expr::parse_tree(&re);
3737
println!("{:#?}", e);
38+
}elseif cmd =="optimize"{
39+
let re = args.next().expect("expected regexp argument");
40+
let e =Expr::parse_tree(&re).expect("expected regexp to be parsed successfully");
41+
println!("{:#?}", optimize(wrap_tree(e)));
3842
}elseif cmd =="analyze"{
3943
let re = args.next().expect("expected regexp argument");
4044
let stdout = io::stdout();
@@ -122,7 +126,8 @@ fn graph(re: &str, writer: &mut dyn std::io::Write) -> std::io::Result<()> {
122126
fnshow_analysis(re:&str,writer:&mutFormatter<'_>) ->Result{
123127
let tree =Expr::parse_tree(&re).unwrap();
124128
let wrapped_tree =wrap_tree(tree);
125-
let a =analyze(&wrapped_tree);
129+
let(optimized_tree, _) =optimize(wrapped_tree).expect("Expected optimization to succeed");
130+
let a =analyze(&optimized_tree);
126131
write!(writer,"{:#?}\n", a)
127132
}
128133

@@ -132,9 +137,13 @@ fn show_compiled_program(re: &str, writer: &mut Formatter<'_>) -> Result {
132137
}
133138

134139
fnprog(re:&str) ->Prog{
140+
// one thing to note here is that we want the prog, but in lib.rs,
141+
// constructing a regex might not produce a prog - it may be wrapped Regex instead,
142+
// which means that "toy" behaves differently to tests etc.
135143
let tree =Expr::parse_tree(re).expect("Expected parsing regex to work");
136144
let wrapped_tree =wrap_tree(tree);
137-
let result =analyze(&wrapped_tree).expect("Expected analyze to succeed");
145+
let(optimized_tree, _) =optimize(wrapped_tree).expect("Expected optimization to succeed");
146+
let result =analyze(&optimized_tree).expect("Expected analyze to succeed");
138147
compile(&result).expect("Expected compile to succeed")
139148
}
140149

‎src/analyze.rs

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -251,9 +251,7 @@ pub fn analyze<'a>(tree: &'a ExprTree) -> Result<Info<'a>> {
251251
mod tests{
252252
usesuper::analyze;
253253
// use super::literal_const_size;
254-
usecrate::CompileError;
255-
usecrate::Error;
256-
usecrate::Expr;
254+
usecrate::{CompileError,Error,Expr};
257255

258256
// #[test]
259257
// fn case_folding_safe() {

‎src/compile.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -620,6 +620,7 @@ mod tests {
620620
backrefs:BitSet::new(),
621621
named_groups:Default::default(),
622622
contains_subroutines:false,
623+
self_recursive:false,
623624
};
624625
let info =analyze(&tree).unwrap();
625626

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp