NonBacktracking Regex optimizations #102655
Conversation
ieviev commented May 24, 2024
@dotnet-policy-service agree
ieviev commented May 26, 2024
Resolved review thread: ...ies/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/StateFlags.cs (outdated)
ieviev commented May 28, 2024
I forgot the zero-width non-joiner, so this is not correct; it would otherwise almost double the performance. Making the conditional nullability checks cheaper did help by around 20%, though. One thing I didn't figure out: where should tests for SymbolicRegexMatcher go?
danmoseley commented May 28, 2024
IIRC essentially all the tests for the non-backtracking engine are also run against the other engines, and involve passing the pattern and text in (i.e., they aren't testing the internals directly). Is that what you're looking to do? BTW, is any of this work potentially relevant to the other engines? (I don't own regex, just an observer here.)
ieviev commented May 28, 2024
@danmoseley Perhaps Compiled might benefit from the same alphabet compression and character kind lookup. A source-generated variant of the DFA would be very impressive, though; it'd run circles around all the other engines.
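For context, a minimal sketch of what alphabet compression does (illustrative Python, hypothetical names; the real engine works over symbolic "minterms" of BDD-backed character sets): characters that behave identically under every character class in the pattern share one small class ID, so transition tables can be indexed by class instead of by raw character.

```python
def build_char_classes(char_sets, alphabet_size=128):
    """Map each character to a small equivalence-class ID ("minterm").
    Characters with the same membership pattern across all of the
    pattern's character sets share an ID, so a DFA transition table can
    be |states| x |classes| instead of |states| x 65536."""
    signature_to_id = {}
    classes = {}
    for code in range(alphabet_size):
        ch = chr(code)
        sig = tuple(ch in cs for cs in char_sets)  # membership signature
        if sig not in signature_to_id:
            signature_to_id[sig] = len(signature_to_id)
        classes[ch] = signature_to_id[sig]
    return classes
```

For a pattern using the sets [a-c] and [b-d], this yields four classes: characters in neither set, only the first (a), both (b, c), and only the second (d).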
kasperk81 commented May 28, 2024
ieviev commented May 28, 2024
runtime/src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/AttRegexTests.cs, lines 47 to 48 in e8e553b
Running thousands of regex instances with no GC in between, while still maintaining low memory constraints in wasm tests, is not easily solved; maybe some of the high-memory annotations would help.
ieviev commented May 29, 2024
Now the engine has less per-match overhead as well. The ending \Z anchor is a major pain, preventing lots of optimizations. I added a slight indirection to ASCII-only cases to save memory (128 B instead of 65536 B). Other than that, the results look pretty good.
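A sketch of the kind of two-level lookup described above (illustrative Python; the class and method names are made up): a dense 128-entry table covers the ASCII hot path, and everything else takes a slower fallback path.

```python
class CharKindLookup:
    """128-byte ASCII table plus a slow path for non-ASCII, instead of a
    dense 65536-entry table: one extra branch in exchange for ~512x less
    memory."""

    def __init__(self, classify):
        self.classify = classify  # slow classifier used to build the table
        self.ascii_table = bytes(classify(chr(c)) for c in range(128))

    def kind(self, ch):
        code = ord(ch)
        if code < 128:
            return self.ascii_table[code]  # hot path: one array load
        return self.classify(ch)           # rare non-ASCII path
```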
stephentoub commented May 30, 2024
Thanks, @ieviev! I will take a look soon.
ieviev commented Jun 5, 2024
There are still some things to do here.
stephentoub left a comment
Sorry for the delay in getting to this. Thanks very much for submitting it.
A couple of comments/questions...
In our previous conversations, you mentioned you thought inner vectorization would be possible, where we could vectorize within a match rather than just finding the next starting place. I don't see that in this PR. Is that possible? Seems like that'd be an opportunity for some significant wins.
I understand from your comments that these changes brought significant wins on some patterns, in particular those involving non-ASCII, which is great. I'm a bit concerned, though, that when running this on our own perf test suite, I'm seeing regressions in various places. You can find the tests in https://github.com/dotnet/performance/tree/main/src/benchmarks/micro/libraries/System.Text.RegularExpressions. Some of the more concerning ones were \p{Sm} and .{2,4}(Tom, which regressed throughput by ~50%, and Holmes.{0,25}(...).{0,25}Holmes..., which regressed throughput by ~25%. Thoughts?
Thanks!
Resolved review threads:
- src/libraries/System.Text.RegularExpressions/src/System.Text.RegularExpressions.csproj (outdated)
- .../System.Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/MatchingState.cs (4 threads, mostly outdated)
- ....Text.RegularExpressions/src/System/Text/RegularExpressions/Symbolic/SymbolicRegexMatcher.cs (2 threads, outdated)
- src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/NonBacktrackingTests.cs (outdated)
- ...em.Text.RegularExpressions/tests/FunctionalTests/System.Text.RegularExpressions.Tests.csproj (outdated)
- src/libraries/System.Text.RegularExpressions/tests/UnitTests/SymbolicRegexTests.cs (outdated)
ieviev commented Jun 15, 2024
Yes, this is possible. Any pattern that contains .* could be a lot faster with longer matches. It'd be best to start with inner vectorization without anchors; the presence of anchors makes it more complicated and expensive, but I still think it's possible with anchors as well when followed by an anchor-context lookup. It also needs a bit of testing to see where the line between match-time speedup and unwanted compile/construction-time overhead is.
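To illustrate the inner-vectorization idea (a rough Python sketch, not the engine's API): when the DFA sits in a state that loops back to itself on every character except a small exit set, a vectorized scan can jump straight to the next position that can change the state. Here the scan is approximated with str.find, which is C-backed; a real implementation would use SIMD searches such as IndexOfAny.

```python
def skip_self_loop(text, pos, exit_chars):
    """Return the first position >= pos holding a character that leaves
    the self-loop state, or len(text) if none does. Repeated find calls
    stand in for a single vectorized multi-character search."""
    best = len(text)
    for ch in exit_chars:
        i = text.find(ch, pos)
        if i != -1 and i < best:
            best = i
    return best
```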
I'll definitely profile these as well. There is some overhead from the way edge cases are currently handled. \p{Sm} in particular could be made to skip the reversal part entirely, along with other fixed-length patterns. I'll follow up about this once I've done some more testing.
ieviev commented Jun 18, 2024 (edited)
Now I've had a better look at the dotnet/performance benchmarks as well. It turns out I was wrong about the \p{Sm} benchmark. Try running the performance tests again; I think it should be faster across the board now.

I thought about inner vectorization as well, but there don't seem to be many benchmarks containing .* or other star loops, and I think creating a separate implementation of RegexFindOptimizations for the automata engine would be far more beneficial. Derivatives have a very concise and useful way to infer prefixes that works for any pattern, so this would replace a lot of custom logic:

```csharp
// ... prefix start; looping back to this or any visited state is redundant
SymbolicRegexNode currentState = // ...
var mergedPrefix = solver.Empty;
foreach (var minterm in minterms)
{
    var derivative = createDerivative(state, minterm);
    if (isRedundant(derivative)) continue;
    // O(1) bitwise merge of the transitions relevant for the prefix
    mergedPrefix = solver.Or(mergedPrefix, minterm);
}
// ... continue breadth-first search for longer prefixes
return mergedPrefix;
```

There are many optimizations that currently go undetected/unused which could enable AVX in more cases. The AVX and transitions could be combined as well, e.g. when optimizing for a longer prefix. I wonder about replacing the algorithm choices in the match with something like:

```csharp
while (position != end)
{
    // check for success
    // get next state
    position++;
}
```

An optimized DFA will give you a very consistent worst-case throughput of somewhere around 500 MB/s on any pattern, which also accounts for UTF-16 losses on UTF-8 input (why the English benchmarks are slower). AVX on benchmarks is a bit of a gamble: it sometimes helps a lot and sometimes does not, and high match-count overhead can sometimes dominate the benchmark as well. There are massive gains from AVX in some cases, but overall I'm not exactly sure where the line is, as it depends on the input text as well.
The worst-case performance should be very good on both ASCII and Unicode by now; which prefix optimizations to pursue depends on the benchmarks used, e.g. https://github.com/dotnet/runtime-assets/blob/main/src/System.Text.RegularExpressions.TestData/Regex_RealWorldPatterns.json. I'll refactor and clean up the PR and comments for a bit now.
ieviev commented Jun 18, 2024 (edited)
Ah, I left an exception to check whether any pattern has over 255 minterms as well; a pattern from RegexPcreTests did reach it, so a fallback mode is definitely necessary. An int array would be a 256 KB allocation but a couple of times faster than the BDD, so there's a decision there as well. Would pooling the array be an option?
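On the pooling question, a minimal sketch of what renting the large lookup array could look like (illustrative Python with made-up names; in .NET this would presumably be ArrayPool&lt;int&gt;.Shared):

```python
class IntArrayPool:
    """Rent/return large lookup arrays so each matcher construction does
    not allocate a fresh ~256 KB array."""

    def __init__(self, size=65536):
        self.size = size
        self._free = []

    def rent(self):
        # reuse a returned array if available, otherwise allocate one
        return self._free.pop() if self._free else [0] * self.size

    def give_back(self, arr):
        self._free.append(arr)
```

The trade-off is the usual one for pools: rented arrays may contain stale data and must be fully overwritten or cleared by the renter.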
Simplification, style consistency, dead code deletion, some bounds-check removal, etc.
stephentoub left a comment
I pushed a commit with a last round of changes.
Otherwise, LGTM.
Thanks!
stephentoub commented Jul 11, 2024
I've got another round of cleanup/simplification (some of the code in this PR, some pre-existing) that I'm working on and will submit as a separate PR.
veanes commented Jul 11, 2024
I will be happy to provide comments or feedback if needed.
@stephentoub @veanes
Here's the initial list of changes I'd make to the nonbacktracking regex engine. I haven't yet documented much, because I made some design choices that I need to discuss.
Rebar baseline before changes: [benchmark image omitted]
After changes: [benchmark image omitted]
Update: after more optimization: [benchmark image omitted]
Essentially, I fixed the things that hurt performance the most; there are many wins of 300% or more already.
Some fundamental design changes:
There's low-hanging fruit in the hot loop as well. I removed the IsDeadend flag entirely, since there is just one dead-end state, whose state ID I stored instead. The nullability and skipping checks got dedicated short-circuit lookups, since having a nullable position at all (a valid match) is usually rare; something like beq or bgt could be used as well. Some if/else branches in the hot loop should be removed and resolved somewhere above.
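A schematic of a hot loop with the dead-end check reduced to a single integer comparison (hypothetical Python sketch; the real matcher steps over minterm IDs rather than raw characters, and its nullability check is context-dependent):

```python
def match_loop(text, pos, transitions, state, dead_state, nullable_states):
    """Step a DFA one character at a time. A single stored dead-state ID
    replaces a per-state IsDeadend flag load, and the (rare) nullability
    check is a set lookup that could equally be a short-circuit table."""
    last_match = -1
    while pos < len(text):
        if state in nullable_states:  # rare: record a match position
            last_match = pos
        # a missing edge transitions to the dead state
        state = transitions[state].get(text[pos], dead_state)
        if state == dead_state:       # one compare, no flag lookup
            break
        pos += 1
    return last_match
```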
There's a lot of performance still on the table when compared to resharp; dedicated DFA prefilter optimizations could help a lot as well.
Let me know what you think of the changes. I think the nonbacktracking engine could be made a lot faster than Compiled in most scenarios, and the symbolic-automata benefits should be used a lot more aggressively.