Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

<regex>: Fix matching of loops with context-dependent empty matches and bounded number of repetitions#5820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

@muellerj2
Copy link
Contributor

@muellerj2muellerj2 commentedNov 1, 2025
edited
Loading

Fixes#5797.

Bug analysis

The matcher contains an optimization for empty matches in a loop that has a minimum number of repetitions set. It's based on the observation that when a repetition matched the empty string, the next repetition can just match the empty string again, so the minimum is essentially lifted. This is achieved by simulating minimum number of reps - k - 1 matches, if the empty match was the k-th one.

However, this means we also have fewer remaining repetitions if there is also a maximum number of repetitions set. If these aren't enough, the matching along this trajectory fails.

This isn't much of an issue, though, if the empty match can happen in any repetition: If we must have at most y empty matches, then we will find the solution in the trajectory where the first empty match happens in repetition (min number of reps - y). (A mathematician might say that repetitions matching an empty string must commute with all other repetitions.)

But it's possible to construct empty matches that can only appear when the repetition happens at specific positions in the input string using assertions: ECMAScript's lookahead assertions, but also caret and dollar anchors. Using these constructs and disjunctions inside the repeated expression, we can construct input-position-dependent empty matches. This means that the bug could be encountered in some ECMAScript, POSIX extended and awk regexes.

What about POSIX basic regexes?

I'm not sure. While such regexes have one more context-dependent construct compared to ECMAScript and POSIX extended, namely backreferences (ECMAScript backreferences arenot the right kind of context-dependent because they either stay the same throughout the whole evaluation of the loop or are reset at the start of each repetition), these regexes don't have any disjunctions. I have tried to come up with a problematic basic regex, but have failed.

What about simple loops?

Simple loops implement this optimization as well. But I think context-dependent constructs cannot appear in simple loops currently (though maybe in the future). And even if they could appear, it wouldn't matter: Since simple loops are guaranteed to match strings of the same length in each repetition, a simple loop can only ever match empty strings if it matched an empty string once.

Fix

The easiest fix is to just disable this optimization for non-simple loops if a maximum number of repetitions has been set on the loop.

Alternatives

We could separate the repetition counters for minimum and maximum number of repetitions and only increase the counter used for the minimum if an empty match is encountered. Alternatively, we could add some additional loop state variables that encode that the minimum number of repetitions have essentially been lifted. But this would mean that more loop state would have to be stored and managed even for loops that don't gain anything from this optimization. I doubt that all this work is worth it just to enable this optimization in a few more cases.

Doubts about this optimization

This is second time this optimization is a source of bugs in the matcher (see#5218 (comment) with fix in#5494 after it became clear that this can cause some wrong assignments to capturing groups in ECMAScript as well) because it is applied too aggressively. I can't guarantee that we will not discover another bug caused by it. It also seems to optimize a weird corner case: What's the point of setting the minimum number of repetitions on a loop if the repeated pattern can match the empty string in all repetitions? So I will probably just pull it out if we stumble on another bug that is caused by this optimization.

StephanTLavavej and frederick-vs-ja reacted with heart emoji
@muellerj2muellerj2 requested a review froma team as acode ownerNovember 1, 2025 19:49
@StephanTLavavejStephanTLavavej added bugSomething isn't working regexmeow is a substring of homeowner labelsNov 1, 2025
@StephanTLavavejStephanTLavavej self-assigned thisNov 1, 2025
@StephanTLavavejStephanTLavavej removed their assignmentNov 3, 2025
@StephanTLavavejStephanTLavavej moved this fromInitial Review toReady To Merge inSTL Code ReviewsNov 3, 2025
@StephanTLavavejStephanTLavavej moved this fromReady To Merge toMerging inSTL Code ReviewsNov 4, 2025
@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavejStephanTLavavej merged commit3f29c70 intomicrosoft:mainNov 5, 2025
41 checks passed
@github-project-automationgithub-project-automationbot moved this fromMerging toDone inSTL Code ReviewsNov 5, 2025
@StephanTLavavej
Copy link
Member

🐞 🛠️ 😸

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@StephanTLavavejStephanTLavavejStephanTLavavej approved these changes

Assignees

No one assigned

Labels

bugSomething isn't workingregexmeow is a substring of homeowner

Projects

Archived in project

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

<regex>: Loops with bounded number of repetitions and context-dependent empty alternative are mishandled

2 participants

@muellerj2@StephanTLavavej

[8]ページ先頭

©2009-2025 Movatter.jp