Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-127971: fix off-by-one read beyond the end of a string during search#132574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
picnixz merged 7 commits intopython:mainfromduaneg:gh-127971
Jul 13, 2025

Conversation

duaneg
Copy link
Contributor

@duanegduaneg commentedApr 16, 2025
edited by bedevere-appbot
Loading

Fix cases where the string search algorithm reads one past the end of the character/byte array under certain conditions.

@@ -595,7 +595,7 @@ STRINGLIB(default_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
continue;
}
/* miss: check if next character is part of pattern */
if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
if (i < w &&!STRINGLIB_BLOOM(mask, ss[i+1])) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm totally not sure thatelse branch is appropiate here for 'i == w2.
Even since we havei > w for thism this logic isn't clear for me.
What about simple replacementi <= w for loop condition with classicali < w?
It could be enough and cleaner.

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think the else branch will be OK, since all it does is advance the index, and it is intended and expected that this could potentially advance it past the end of the string, in which case thefor loop will terminate.

We can't just replace the the loop condition withi < w since thew here is the lastvalid index that the pattern could appear at, and needs to be checked. Otherwise we would miss valid matches, and indeed such a change breaks a large number of unit tests.

Note the conditionals that were changed aremiss conditions, i.e. the algorithm has determined the character at the index cannot be part of the pattern at this location in the string. The conditionals modified are checking whether thefollowing character could potentially be part of a pattern hit, so as to determine whether to skip it entirely by advancing the full length of the pattern or only as much as possible while still considering it as a valid potential hit. In the case where we are at the end of the buffer it doesn't actually matter which branch we take, since either way it will advance past it and terminate. We just need to avoid reading the invalid following character when it doesn't exist.

efimov-mikhail reacted with thumbs up emoji
Copy link
Contributor

@efimov-mikhailefimov-mikhailJun 21, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Maybe it's better to rewrite condition asi+1 <= w? It seems to be more obvious way of checking for me.
It's very similar to "for" condition, but for another argument and before direct using ofi+1 as index.

... thew here is the lastvalid index that the pattern could appear at

IMO, rewrited condition slightly reduces cognitive load.

@efimov-mikhail
Copy link
Contributor

efimov-mikhail commentedJun 19, 2025
edited
Loading

I'm not at all an expert in this code, so I've tried to apply your patch by myself.
It definitely fixes related issue.

But I suggest some changes.
Moreover, I can see very similar code atadaptive_find function.
IMO, we should investigate both cases and fix them in the same PR if needed.

@duaneg
Copy link
ContributorAuthor

I'm not at all an expert in this code, so I've tried to apply your patch by myself. It definitely fixes related issue.

I have been hesitant to ping specific developers, as I don't want to annoy people or violate etiquette, but if you think it would be appropriate there are a couple of devs who have worked on this code frequently and/or recently I could tag in?

But I suggest some changes. Moreover, I can see very similar code atadaptive_find function. IMO, we should investigate both cases and fix them in the same PR if needed.

That is an excellent find, thanks! Agreed: I'll investigate, try to get a test that reproduces the issue there also, and fix it.

efimov-mikhail reacted with thumbs up emoji

@efimov-mikhail
Copy link
Contributor

I've just usedgit blame and find some names.
I hope that's appropriate.
@serhiy-storchaka@sweeneyde

duaneg reacted with heart emoji

@serhiy-storchakaserhiy-storchaka self-requested a reviewJune 20, 2025 07:10
@duaneg
Copy link
ContributorAuthor

I can trigger the off-by-one reads in the adaptive algorithm easily enough, but I haven't been able to get it to cause an ASAN error as yet, so no luck so far on a test for that.

The third variant,default_rfind, already includes the equivalent guard:

/* miss: check if previous character is part of pattern */if (i>0&& !STRINGLIB_BLOOM(mask,s[i-1])) {i=i-m;                                                                  }else {i=i-skip;                                                               }                                                                           }else {/* skip: check if previous character is part of pattern */if (i>0&& !STRINGLIB_BLOOM(mask,s[i-1])) {i=i-m;                                                                  }                                                                           }                                                                           }

@efimov-mikhail
Copy link
Contributor

Doesdefault_rfind have tests on it? If not, those worth to be added too.

IMO, we can add check and tests foradaptive_find even if there is no real crash under ASan.

@duaneg
Copy link
ContributorAuthor

default_rfind does have tests, including ones that exercise the relevant code path. In fact anystr.rfind with a non-matching pattern longer than one character (but shorter or equal to input length so it doesn't short-circuit) does so.adaptive_find also has tests, albeit none that trigger the code branches in question.

Looking closer atadaptive_find, on second thought I don't think the check is strictly required there.adaptive_find is only ever called directly fromFASTSEARCH. For that function to use theadaptive_find algorithm the input length has to be > 2.5k length, and the pattern length has to be more than one character (actually 6+ or 100+ depending on input size). Given that, the character "one past the last possible position the pattern could start" will always be a valid character in the input string. As mentioned above, it doesn't actually matter which branch is taken, so it doesn't matter what its value is.

Presumably the extra clause in the conditional has some measurable performance impact, however slight, and given the input string minimum length requirements it will likely add up to much more than would be saved by avoiding one unnecessary character read. So, perhaps we shouldn't add it toadaptive_find, even though the current code seems safe only due to unchecked implementation details of when/under what circumstances it is called.

Nonetheless, I'll push out an update to this that adds anadaptive_find test, to ensure that code path is exercised, even though there is nothing we can check to verify it is doing what it is supposed to.

@efimov-mikhail
Copy link
Contributor

efimov-mikhail commentedJun 21, 2025
edited
Loading

I suggest to slightly rewritei < w condition asi + 1 <= w fordefault_find and maybei > 0 asi - 1 >= 0 fordefault_rfind.
Moreover, some short comment atadaptive_find code itself also seems valuable.

But those are only cosmetical changes, in other aspects LGTM.

…mmentexplaining why a guard is not necessary in adaptive_find.
@duaneg
Copy link
ContributorAuthor

I suggest to slightly rewritei < w condition asi + 1 <= w fordefault_find and maybei > 0 asi - 1 >= 0 fordefault_rfind. Moreover, some short comment atadaptive_find code itself also seems valuable.

But those are only cosmetical changes, in other aspects LGTM.

Sure thing, will update, thanks!

Copy link
Contributor

@efimov-mikhailefimov-mikhail left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM. I'd like to have just little rewording in the comment.

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I am not fond of tests thatmay trigger a crash and the comments explaining why we have them are too long IMO. I don't see why we can't find an explicit input that triggers the code paths we want (except that it may be hard to craft such input). If it's too hard, we can just.. remove the tests. Or we should indicate in the C code that tests must be updated if we change the implementation.

@duaneg
Copy link
ContributorAuthor

I am not fond of tests thatmay trigger a crash and the comments explaining why we have them are too long IMO. I don't see why we can't find an explicit input that triggers the code paths we want (except that it may be hard to craft such input). If it's too hard, we can just.. remove the tests. Or we should indicate in the C code that tests must be updated if we change the implementation.

I'm not fond of such tests either. I was unable to craft a test case that triggers an ASAN warning, so I will gladly drop it.

@picnixz
Copy link
Member

Ok, just remove the ASAN guard (my bad).

Copy link
Member

@picnixzpicnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

After removing the ASAN guard, I'll merge. Note thatdefault_rfind actually has the checks fori > 0, so it does make sense to keep them forfind as well.

@duaneg
Copy link
ContributorAuthor

OK, will remove the ASAN guard and always run the test. I agree, that seems easiest. Thanks very much for the reviews, both of you, I appreciate it!

efimov-mikhail reacted with thumbs up emoji

@picnixzpicnixz merged commit85ec3b3 intopython:mainJul 13, 2025
44 checks passed
@picnixzpicnixz added needs backport to 3.13bugs and security fixes needs backport to 3.14bugs and security fixes labelsJul 13, 2025
@miss-islington-app
Copy link

Thanks@duaneg for the PR, and@picnixz for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks@duaneg for the PR, and@picnixz for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@miss-islington-app
Copy link

Sorry,@duaneg and@picnixz, I could not cleanly backport this to3.13 due to a conflict.
Please backport usingcherry_picker on command line.

cherry_picker 85ec3b3b503ffd5b7e45f8b3fa2cec0c10e4bef0 3.13

miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestJul 13, 2025
…g search (pythonGH-132574)(cherry picked from commit85ec3b3)Co-authored-by: Duane Griffin <duaneg@dghda.com>
@picnixz
Copy link
Member

Can you take care of the failed backport? TiA.

@bedevere-app
Copy link

GH-136628 is a backport of this pull request to the3.14 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.14bugs and security fixes labelJul 13, 2025
picnixz pushed a commit that referenced this pull requestJul 13, 2025
…ng search (GH-132574) (#136628)gh-127971: fix off-by-one read beyond the end of a string during search (GH-132574)(cherry picked from commit85ec3b3)Co-authored-by: Duane Griffin <duaneg@dghda.com>
@picnixz
Copy link
Member

@duaneg

@efimov-mikhail
Copy link
Contributor

IMO, it'll be better to wait few days for@duaneg's response. I also can make such backport.

@picnixz
Copy link
Member

With pleasure! It's just so that I can close the issue otherwise I'll likely forget about it :)

efimov-mikhail and duaneg reacted with thumbs up emojiduaneg reacted with heart emoji

@duaneg
Copy link
ContributorAuthor

Sorry, is there anything else to do? It looks like the backport PR has now been merged (thanks!) but if there is anything more required, please let me know and I'll take care of it.

Apologies for not doing it myself immediately last night, but it was 1am local time and I've not done a (CPython) backport before, so while it all looks quite straight-forward I wanted to wait until morning when I had time and was fully awake 🙂

@efimov-mikhail
Copy link
Contributor

efimov-mikhail commentedJul 14, 2025
edited
Loading

Yes, there is some work to be done.
You can see PR (#136628) for 3.14, which has already being merged.
OTOH, there're two CPython versions at prerelease/bugfix stages: 3.14 and 3.13 (https://devguide.python.org/versions/).
So, you should manually prepare PR for 3.13 version with cherry-picking technique.

Also, this link maybe useful:https://devguide.python.org/contrib/core-team/committing/#backporting-changes-to-an-older-version

duaneg reacted with thumbs up emoji

@bedevere-app
Copy link

GH-136645 is a backport of this pull request to the3.13 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.13bugs and security fixes labelJul 14, 2025
@bedevere-app
Copy link

GH-136648 is a backport of this pull request to the3.13 branch.

@duaneg
Copy link
ContributorAuthor

GH-136645 is a backport of this pull request to the3.13 branch.

This first one I made a mistake and tried to merge it into main, sorry. Hopefully the second attempt worked correctly!

efimov-mikhail reacted with thumbs up emoji

picnixz pushed a commit that referenced this pull requestJul 14, 2025
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@efimov-mikhailefimov-mikhailefimov-mikhail approved these changes

@picnixzpicnixzpicnixz approved these changes

@serhiy-storchakaserhiy-storchakaAwaiting requested review from serhiy-storchaka

Assignees

@picnixzpicnixz

Labels
None yet
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@duaneg@efimov-mikhail@picnixz

[8]ページ先頭

©2009-2025 Movatter.jp