Correctly fold unknown-8bit originating from encoded words. #142517
The unknown-8bit trick was designed to deal with unknown bytes in an ASCII message, and it works fine for that. However, I also tried to extend it to handle bytes that can't be decoded using the charset specified in an encoded word, and there it fails because there can be other non-ASCII characters that were *successfully* decoded. The fix is simple: do the unknown-8bit encoding using the utf-8 codec. This is especially appropriate since anyone trying to do recovery on an unknown byte string will probably attempt utf-8 first.
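The failure mode described above can be reproduced without the email package at all. The following is a minimal sketch, not the actual patch: the helper names `decode_payload` and `to_unknown_8bit_word` are hypothetical, and the hand-built base64 encoded word only approximates what the folder emits. It assumes the undecodable bytes are carried as lone surrogates via the `surrogateescape` error handler.

```python
import base64

def decode_payload(payload: bytes, charset: str) -> str:
    # Hypothetical helper: bytes that are invalid in the declared charset
    # become lone surrogates (U+DC80..U+DCFF) instead of raising.
    return payload.decode(charset, errors="surrogateescape")

def to_unknown_8bit_word(text: str, codec: str) -> str:
    # Hypothetical helper: turn the surrogate-escaped string back into bytes
    # and wrap them in a base64 encoded word labelled unknown-8bit.
    raw = text.encode(codec, errors="surrogateescape")
    return "=?unknown-8bit?b?" + base64.b64encode(raw).decode("ascii") + "?="

# Payload of an encoded word that declares utf-8: "café" is valid,
# the trailing 0xFF byte is not.
payload = b"caf\xc3\xa9 \xff"
text = decode_payload(payload, "utf-8")

# The ascii codec can restore the escaped byte but not the successfully
# decoded "é", so re-encoding fails.
try:
    to_unknown_8bit_word(text, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The utf-8 codec handles both: "é" is encoded normally and the surrogate
# is turned back into the original byte.
word = to_unknown_8bit_word(text, "utf-8")
recovered = base64.b64decode(word[len("=?unknown-8bit?b?"):-len("?=")])
```

With utf-8, the bytes inside the unknown-8bit word match the original payload exactly, so a recipient who tries utf-8 first during recovery sees the same data the sender's message contained.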
4ae90b4 to 1bba134
bitdancer commented Dec 16, 2025
Does anyone want to review this, or shall I just merge it?
1e17ccd into python:main
Thanks @bitdancer for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13.
Thanks @bitdancer for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
…-142517)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
GH-143146 is a backport of this pull request to the 3.14 branch.
…-142517)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
GH-143147 is a backport of this pull request to the 3.13 branch.
…H-142517) (#143147)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
…H-142517) (#143146)
(cherry picked from commit 1e17ccd)
Co-authored-by: R. David Murray <rdmurray@bitdance.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>