Correctly fold unknown-8bit originating from encoded words. #142517

Merged
bitdancer merged 2 commits into python:main from bitdancer:undecodable_encoded_words
Dec 24, 2025

Conversation

@bitdancer
Member

The unknown-8bit trick was designed to deal with unknown bytes in an
ASCII message, and it works fine for that. However, I also tried to
extend it to handle bytes that can't be decoded using the charset
specified in an encoded word, and there it fails because there can be
other non-ASCII characters that were *successfully* decoded. The fix is
simple: do the unknown-8bit encoding using the utf-8 codec. This is
especially appropriate since anyone trying to do recovery on an unknown
byte string will probably attempt utf-8 first.
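
The failure mode described above can be sketched with plain `str`/`bytes` codecs. This is an illustration only, not the email package's code: the helper names `decode_with_escapes` and `recover_bytes` and the sample payload are hypothetical, and the sketch assumes the undecodable bytes are carried through as surrogate escapes (the `surrogateescape` error handler).

```python
# Illustration only: plain codecs, not the email package's internals.
# Helper names and the sample bytes are hypothetical.

def decode_with_escapes(raw: bytes, charset: str) -> str:
    """Decode raw bytes, carrying undecodable bytes through as lone surrogates."""
    return raw.decode(charset, "surrogateescape")


def recover_bytes(text: str, codec: str) -> bytes:
    """Turn an escaped string back into bytes using the given codec."""
    return text.encode(codec, "surrogateescape")


# Case 1: unknown bytes in an otherwise ASCII message. ASCII round-trips fine.
ascii_raw = b"abc \xff"
ascii_roundtrip = recover_bytes(decode_with_escapes(ascii_raw, "ascii"), "ascii")

# Case 2: payload declared as utf-8 with a valid 'e-acute' (0xC3 0xA9)
# followed by a stray 0xFF byte that is not valid UTF-8.
raw = b"caf\xc3\xa9 \xff"
text = decode_with_escapes(raw, "utf-8")

# Re-encoding as ASCII fails: the successfully decoded 'e-acute' is not ASCII,
# and surrogateescape only handles the escaped byte.
try:
    recover_bytes(text, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Re-encoding with utf-8 handles both the decoded character and the escaped byte.
utf8_bytes = recover_bytes(text, "utf-8")

print(ascii_roundtrip == ascii_raw, ascii_ok, utf8_bytes == raw)
```

In this sketch the declared charset is utf-8, so the utf-8 re-encoding reproduces the original bytes exactly; for another declared charset, the successfully decoded characters would come out as UTF-8 rather than in their original encoding.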

bitdancer requested a review from a team as a code owner, December 10, 2025 14:50
bitdancer self-assigned this, Dec 10, 2025
@bitdancer
Member, Author

Does anyone want to review this, or shall I just merge it?

bitdancer merged commit 1e17ccd into python:main, Dec 24, 2025
48 checks passed
bitdancer added the "needs backport to 3.13" (bugs and security fixes) and "needs backport to 3.14" (bugs and security fixes) labels, Dec 24, 2025
@miss-islington-app

Thanks @bitdancer for the PR 🌮🎉. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app

Thanks @bitdancer for the PR 🌮🎉. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request, Dec 24, 2025
…-142517)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
@bedevere-app

GH-143146 is a backport of this pull request to the 3.14 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request, Dec 24, 2025
…-142517)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
bedevere-app (bot) removed the "needs backport to 3.14" (bugs and security fixes) label, Dec 24, 2025
@bedevere-app

GH-143147 is a backport of this pull request to the 3.13 branch.

bedevere-app (bot) removed the "needs backport to 3.13" (bugs and security fixes) label, Dec 24, 2025
bitdancer added a commit to bitdancer/cpython that referenced this pull request, Dec 24, 2025
bitdancer added a commit that referenced this pull request, Dec 24, 2025
bitdancer added a commit that referenced this pull request, Dec 24, 2025
…H-142517) (#143147)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
bitdancer added a commit that referenced this pull request, Dec 24, 2025
…H-142517) (#143146)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
bitdancer deleted the undecodable_encoded_words branch, December 24, 2025 18:21

Reviewers

No reviews

Assignees

bitdancer

Labels

Projects

None yet

Milestone

No milestone

1 participant

@bitdancer
