gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler #129648


Conversation

@serhiy-storchaka (Member) commented on Feb 4, 2025
edited by bedevere-app bot

If an error handler is used, a new bytes object is created and set as the object attribute of the UnicodeDecodeError, and that bytes object then replaces the original data. A pointer into the decoded data becomes invalid once that temporary bytes object is destroyed. So we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have this issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
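
A minimal sketch of where the "first invalid escape" surfaces at the Python level, assuming the codec reports it as a warning whose text contains "invalid escape sequence". The helper name `decode_and_collect_warnings` is hypothetical and not part of this PR, and the example only exercises the default strict error handling.

```python
import warnings
from typing import List, Tuple


def decode_and_collect_warnings(
    data: bytes, errors: str = "strict"
) -> Tuple[str, List[str]]:
    """Decode with the unicode_escape codec and return (text, warning messages).

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that would normally be filtered out.
        warnings.simplefilter("always")
        text = data.decode("unicode_escape", errors)
    return text, [str(w.message) for w in caught]


# b"a\\zb" contains the unrecognized escape "\z"; the backslash is kept
# in the decoded text and the codec reports the escape through a warning.
text, messages = decode_and_collect_warnings(b"a\\zb")
```

Passing an `errors` value other than the default together with malformed input is the path the description above is concerned with, since that is when an error handler runs.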

@gpshead (Member)

Nice! This is similar enough to, but clearly far more polished than, what I quickly whipped up while trying to understand the problem and linked to on the PSRT mailing list, so I won't bother posting my own draft PR.

I don't have a good feel for whether we need to retain the older internal-use-only C APIs. Making this change via new functions with a suffix, as you seem to be proposing, and leaving the old ones in place (though now unused by our own internals) in case something else references them makes sense to me.

@serhiy-storchaka (Member, Author)

I experimented with several different solutions. One of them was similar to yours, except that I copied all three bytes. It was also necessary to distinguish "no invalid escape" from "escaped null byte". In the end, the currently proposed solution is the simplest.
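
The point about distinguishing "no invalid escape" from "escaped null byte" can be sketched with a toy scanner. This is a hedged Python model, not the C code from the PR: the function name, the `RECOGNIZED` set, and the `-1` sentinel are all assumptions chosen for illustration, and the scanner does not validate the digits of `\x`, `\u`, `\U`, or `\N` escapes.

```python
from typing import Tuple

# Bytes that may follow a backslash in this toy model (assumed set).
RECOGNIZED = frozenset(b"\n\\'\"abfnrtv01234567xuUN")

# Sentinel for "nothing found". It must lie outside the 0..255 byte range,
# because 0 is itself a possible offending byte (a backslash followed by NUL).
NO_INVALID_ESCAPE = -1


def first_invalid_escape(data: bytes) -> Tuple[int, int]:
    """Return (byte value, index) of the first unrecognized escape, or (-1, -1)."""
    i = 0
    n = len(data)
    while i < n:
        if data[i] != 0x5C:  # not a backslash
            i += 1
            continue
        if i + 1 >= n:
            break  # a trailing backslash is a decoding error, not handled here
        following = data[i + 1]
        if following not in RECOGNIZED:
            return following, i + 1
        i += 2  # skip the backslash and the recognized escape character
    return NO_INVALID_ESCAPE, NO_INVALID_ESCAPE
```

With `0` as the "nothing found" value, the input `b"\\\x00"` would be indistinguishable from input with no invalid escape at all; here it yields `(0, 1)` while clean input yields `(-1, -1)`.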

This PR does not keep the old C API; I do not think it is needed. The functions are renamed because an error at link time is preferable to undefined behavior at run time.

gpshead reacted with thumbs up emoji

@serhiy-storchaka changed the title from "Fix use-after-free in the unicode-escape decoder with error handler" to "gh-133767: Fix use-after-free in the unicode-escape decoder with an error handler" on May 9, 2025
@serhiy-storchaka marked this pull request as ready for review on May 9, 2025 17:24
@serhiy-storchaka (Member, Author)

After adding a NEWS entry it is ready for review.

The code is now more complex: the decoding functions now return both the invalid character and its position, because the new code in the Python parser needs the position. It can be returned if no decoding errors were handled by the error handler. The Python parser does not use the error handler.
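
As an illustration of why a caller might want the position as well as the character, here is a hedged sketch that turns a `(char, pos)` pair into a line and column. The function name, the `-1` "nothing found" convention, and the message format are assumptions for this example; the comment above does not say how the parser uses the position.

```python
def describe_invalid_escape(source: bytes, char: int, pos: int) -> str:
    """Format a hypothetical diagnostic for an invalid escape.

    `char` is the byte after the backslash and `pos` is its index in
    `source`; a negative `char` means no invalid escape was found.
    """
    if char < 0:
        return ""
    # Lines are 1-based: count the newlines that precede the position.
    line = source.count(b"\n", 0, pos) + 1
    # Columns are 0-based offsets from the start of the current line.
    # rfind returns -1 when there is no earlier newline, giving offset 0.
    line_start = source.rfind(b"\n", 0, pos) + 1
    column = pos - line_start
    return f"invalid escape sequence '\\{chr(char)}' at line {line}, column {column}"
```

The position is only meaningful relative to the buffer it was computed against, which is consistent with the remark that it can be returned when no error handler has handled a decoding error.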

@gpshead added the labels "needs backport to 3.13" (bugs and security fixes), "needs backport to 3.14" (bugs and security fixes), and "🔨 test-with-buildbots" (Test PR w/ buildbots; report in status section) on May 10, 2025

@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @gpshead for commit 7194b4d 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F129648%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the "🔨 test-with-buildbots" (Test PR w/ buildbots; report in status section) label on May 10, 2025
@serhiy-storchaka added the labels "type-security" (A security issue), "needs backport to 3.9" (only security fixes), "needs backport to 3.10" (only security fixes), "needs backport to 3.11" (only security fixes), and "needs backport to 3.12" (only security fixes) on May 10, 2025
@serhiy-storchaka merged commit 9f69a58 into python:main on May 12, 2025
140 of 142 checks passed
@serhiy-storchaka deleted the unicode-escape-decode-errors branch on May 12, 2025 17:42
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request on May 12, 2025
…h an error handler (pythonGH-129648)
(cherry picked from commit 9f69a58)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@miss-islington-app

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e 3.13

@bedevere-app

GH-133942 is a backport of this pull request to the 3.14 branch.

@bedevere-app bot removed the "needs backport to 3.14" (bugs and security fixes) label on May 12, 2025
@miss-islington-app

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e 3.12

@miss-islington-app

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.11 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e 3.11

@miss-islington-app

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.10 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e 3.10

@miss-islington-app

Sorry, @serhiy-storchaka, I could not cleanly backport this to 3.9 due to a conflict.
Please backport using cherry_picker on the command line.

cherry_picker 9f69a58623bd01349a18ba0c7a9cb1dad6a51e8e 3.9

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request on May 12, 2025
…der with an error handler (pythonGH-129648)
(cherry picked from commit 9f69a58)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@bedevere-app

GH-133944 is a backport of this pull request to the 3.13 branch.

@bedevere-app bot removed the "needs backport to 3.13" (bugs and security fixes) label on May 12, 2025
@serhiy-storchaka removed the labels "needs backport to 3.9" (only security fixes), "needs backport to 3.10" (only security fixes), "needs backport to 3.11" (only security fixes), and "needs backport to 3.12" (only security fixes) on May 12, 2025
@serhiy-storchaka removed their assignment on May 12, 2025
serhiy-storchaka added a commit that referenced this pull request on May 13, 2025
…th an error handler (GH-129648) (GH-133942)
(cherry picked from commit 9f69a58)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

encukou pushed a commit that referenced this pull request on May 20, 2025
…th an error handler (GH-129648) (GH-133944)
(cherry picked from commit 9f69a58)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request on May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944)
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request on May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944)
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request on May 20, 2025
…der with an error handler (pythonGH-129648) (pythonGH-133944)
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this pull request on May 20, 2025
…er with an error handler (pythonGH-129648) (pythonGH-133944)
(cherry picked from commit 9f69a58)
(cherry picked from commit 6279eb8)
(cherry picked from commit a75953b)
(cherry picked from commit 0c33e5b)
(cherry picked from commit 8b528ca)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Reviewers

@gpshead approved these changes

@Yhg1s: awaiting requested review

@ericvsmith: awaiting requested review

@sethmlarson: awaiting requested review

@pablogsal: awaiting requested review (code owner)

@lysnikolaou: awaiting requested review (code owner)

Assignees
No one assigned
Labels
type-security (A security issue)
Projects
None yet
Milestone
No milestone

3 participants
@serhiy-storchaka, @gpshead, @bedevere-bot
