<regex>: Improve search performance for regexes with initial + quantifiers #5509
Merged
StephanTLavavej merged 3 commits into microsoft:main from muellerj2:regex-improve-regex_search-performance on May 17, 2025
StephanTLavavej approved these changes May 16, 2025
Member
StephanTLavavej commented May 16, 2025
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Member
StephanTLavavej commented May 17, 2025
I resolved a trivial adjacent-add conflict with #5494 in |
StephanTLavavej approved these changes May 17, 2025
2391e5e into microsoft:main 40 checks passed
Member
StephanTLavavej commented May 17, 2025
➕ 🚀 ⏱️
Towards #5468. This is a small change that greatly speeds up searches for regexes like `a+` that start with some letter/string/character class followed by a `+` quantifier (or any other quantifier requiring at least one repetition). Because this loop must be matched at least once, we can enter the repeated subpattern and look for the first position where a letter/string/character class in the subpattern can match.

While working on this, I noticed that I didn't think the implementation of `text_regex::should_search_match_capture_groups()` through well enough: I designed it to use relative coordinates for expected submatches, but this isn't so helpful when one wants to ensure that the whole match is in a particular position. So I changed the implementation to use absolute coordinates from the start of the matched string. Luckily, no test seems to have relied on the previous behavior, meaning all of them just matched the start of the input string anyway.

Benchmark
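For readers unfamiliar with the idea in the description above, here is a rough user-level sketch of "skip ahead to the first position where the leading subpattern can match". It is only an illustration, not the STL's internal implementation and not the benchmark code: `search_with_skip` and its `can_start` predicate are hypothetical names, and the caller is assumed to know which characters can begin a match (for `a+b`, only `'a'`).

```cpp
#include <algorithm>
#include <regex>
#include <string>

// Hypothetical helper, for illustration only (not the STL's internal code).
// Assumption: `can_start` returns true for every character that can begin a
// match of `re`, which is known when the regex starts with a loop that must
// be matched at least once (e.g. `a+`).
template <class Pred>
bool search_with_skip(const std::string& text, const std::regex& re, Pred can_start, std::smatch& m) {
    namespace rc = std::regex_constants;
    auto it = text.begin();
    for (;;) {
        // Cheaply skip positions where no match can possibly begin.
        it = std::find_if(it, text.end(), can_start);
        if (it == text.end()) {
            return false;
        }
        // Try a match anchored at this candidate position only.
        auto flags = rc::match_continuous;
        if (it != text.begin()) {
            flags |= rc::match_prev_avail; // a character exists before `it`
        }
        if (std::regex_search(it, text.end(), m, re, flags)) {
            return true;
        }
        ++it; // candidate failed; continue with the next possible start
    }
}
```

Note that `m.position(0)` is measured from the iterator passed to `std::regex_search`, so in this sketch the absolute offset of the match has to be computed as `m[0].first - text.begin()`.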
Running on my machine: