UTF-8 Email parsing/serialising: Roundtrip exits with “surrogates not allowed” #113594

Closed

Labels

3.11 (only security fixes)
3.12 (only security fixes)
3.13 (bugs and security fixes)
topic-email
type-bug (An unexpected behavior, bug, or error)

Description

bronger opened on Dec 31, 2023

Bug report

Bug description:

In the attached minimal Python example, email_raw_1 survives a round-trip from a UTF-8 byte string to an EmailMessage object and back to a string, while email_raw_2 does not:

Traceback (most recent call last):
  File "//surrogate_issue.py", line 29, in <module>
    print(message_2)
  …
  File "/usr/local/lib/python3.12/email/_encoded_words.py", line 224, in encode
    bstring = string.encode(charset)
              ^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-2: surrogates not allowed

The funny thing is that the only difference between the two messages is one additional digit in the middle of the second one.

The email is malformed; however, it is taken from an actual mail at https://wilson.bronger.org/5105.txt. Malformed or not, my other email machinery can deal with it, so I think Python should handle such real-world specimens on a best-effort basis without exiting.

#!/bin/python

import email, email.policy

email_raw_1 = """Content-Type: multipart/mixed; boundary="==="

--===
Content-Type: message/plain 您0123456789012.3456789

--===--
""".encode()

email_raw_2 = """Content-Type: multipart/mixed; boundary="==="

--===
Content-Type: message/plain 您0123456789012.34567890

--===--
""".encode()

message_1 = email.message_from_bytes(email_raw_1, policy=email.policy.SMTPUTF8)
message_2 = email.message_from_bytes(email_raw_2, policy=email.policy.SMTPUTF8)

print(message_1)
print(message_2)
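
The "surrogates not allowed" error in the traceback can be reproduced in isolation, which may help explain the "position 0-2" detail. The following is a minimal sketch, not the email package's actual code path: it assumes the non-ASCII bytes reach the encoder as surrogate escapes, which is how the bytes parser decodes input it cannot interpret as ASCII. The variable names are illustrative only.

```python
# "您" (U+60A8) occupies three bytes in UTF-8: e6 82 a8.
raw = "您".encode("utf-8")

# Assumption: mimic the bytes parser, which decodes input as ASCII with the
# surrogateescape error handler. Each non-ASCII byte becomes one lone surrogate.
decoded = raw.decode("ascii", errors="surrogateescape")
print(len(decoded), [hex(ord(ch)) for ch in decoded])

# Lone surrogates cannot be encoded as UTF-8, which is the failure in the report.
error = None
try:
    decoded.encode("utf-8")
except UnicodeEncodeError as exc:
    error = exc
    print(error)

# Encoding with the same handler that produced the surrogates restores the bytes.
restored = decoded.encode("ascii", errors="surrogateescape")
print(restored == raw)
```

The three surrogates correspond to the three UTF-8 bytes of the single character, which matches the three-character span named in the error message.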

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Metadata

Assignees

serhiy-storchaka
