- Notifications
You must be signed in to change notification settings - Fork1.6k
<regex>: Correctly reset capture groups for ECMAScript grammar#5456
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
<regex>: Correctly reset capture groups for ECMAScript grammar#5456
Uh oh!
There was an error while loading.Please reload this page.
Conversation
…its uses by `_St._Cur`
…re groups before each repetition
b86b415 to90f84c2Compare<regex>: Correctly reset capture groups for ECMAScript grammars<regex>: Correctly reset capture groups for ECMAScript grammarUh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
StephanTLavavej commentedMay 8, 2025
Thanks, this is great! 😻 I pushed a conflict-free merge with |
StephanTLavavej commentedMay 9, 2025
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed. |
cdbcac3 intomicrosoft:mainUh oh!
There was an error while loading.Please reload this page.
StephanTLavavej commentedMay 10, 2025
Thanks for resetting all of these bugs! ⏪ 🐞 😹 |
Uh oh!
There was an error while loading.Please reload this page.
Fixes#5365 and does some preparatory work for#174.
ECMAScript requires that all capture groups inside a loop are reset to "unmatched" at the start of each repetition. To do this correctly, we have to know the first capture group that appears inside the loop. We actually do not have to know the last one to reset the capture groups correctly: If a loop is repeated, then all capture groups between the first capture group inside the loop and the last one in the regex appear inside this loop or are already unmatched. (Knowing the last capture group inside the loop would avoid some unnecessary work, but given that the number of capture groups is usually low in practice it's not clear that this additional work is worth it when traded against more NFA traversal at the time of matching. This would require some benchmarking.)
To find the first capture group, the matcher now traverses the NFA until it encounters a capture group or has visited all of the nodes inside a loop. For POSIX regexes, the traversal is skipped; instead the first capture group is set to the number of capture groups in the whole expression, which means that no capture groups will be reset when processing loops as required by POSIX semantics.
To avoid this NFA traversal for each repetition, the result of this traversal is now cached in the loop state by extending the type
_Loop_vals_t. But this means the type_Loop_vals_tand_Matcherhave to be renamed for binary compatibility. At this opportunity, I also make some minor cleanups.I split this PR into several commits to make reviewing easier:
_Matcherto_Matcher2and add a template parameter for an allocator to prepare for<regex>: Should regex_match() use the match_results allocator? #174 as suggested by STL. For now, I'm passingvoidas the allocator argument because it's obviously not an allocator, so we don't have to worry about ABI when allocator support is actually added._Ncapmember variable: It should beunsigned int, notint._Get_ncap(): It has become superfluous because the type of_Ncaphas been corrected._Firstin favor of_Begin(to match_End)._Cur_iterin_Do_repto minimally reduce stack consumption.<regex>: Implementation divergence for capture group behavior #5365:_Loop_vals_tand rename it to_Loop_vals_v2_t._Find_first_inner_capture_groupthat recursively traverses the NFA to find the first capture group. The recursive traversal is protected against stack overflow like all other recursive calls in the matcher._Do_repinto_Do_rep(doing the main work for any repetition) and_Do_rep_first(doing the preparation before the first repetition)._Do_repto reset capture groups._Do_rep0to reset capture groups in one for loop. (They already are reset correctly in the other for loop.)_Do_rep_firstto perform the NFA traversal if necessary and update the loop state accordingly.