gh-83938, gh-122476: Stop incorrectly RFC 2047 encoding non-ASCII email addresses #122540

Open
medmunds wants to merge 6 commits into python:main from medmunds:fix-issues-83938-122476
9 changes: 9 additions & 0 deletions Doc/library/email.errors.rst
@@ -59,6 +59,15 @@ The following exception classes are defined in the :mod:`email.errors` module:
    headers.


+.. exception:: InvalidMailboxError()
+
+   Raised when serializing a message with an address header that contains
+   a mailbox incompatible with the policy in use.
+   (See :attr:`email.policy.EmailPolicy.utf8`.)
+
+   .. versionadded:: 3.14
+
+
 .. exception:: MessageDefect()

    This is the base class for all defects found when parsing email messages.
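
A minimal sketch of how calling code might handle the new exception when serializing under ``utf8=False``. It assumes an interpreter that includes this change. Because `InvalidMailboxError` does not exist elsewhere, the sketch looks it up with `getattr` and falls back to `MessageError`, which the diff lists as one of its base classes. The helper name `serialize_ascii_headers` is hypothetical.

```python
from email import errors
from email.message import EmailMessage

# Assumption: InvalidMailboxError is only present once this change is merged.
# Fall back to MessageError so the sketch still imports on other interpreters.
InvalidMailboxError = getattr(errors, "InvalidMailboxError", errors.MessageError)


def serialize_ascii_headers(msg):
    """Return (data, error): serialized bytes, or the error text on failure."""
    try:
        return msg.as_bytes(policy=msg.policy.clone(utf8=False)), None
    except InvalidMailboxError as exc:
        return None, str(exc)


msg = EmailMessage()
msg["To"] = "wok@example.com"  # ASCII-only mailbox: valid under either setting
data, error = serialize_ascii_headers(msg)
```

On an interpreter with this change, a non-ASCII mailbox such as the ones in the new test cases would be expected to take the `except` branch instead.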
15 changes: 13 additions & 2 deletions Doc/library/email.policy.rst
@@ -406,11 +406,22 @@ added matters. To illustrate::
    .. attribute:: utf8

       If ``False``, follow :rfc:`5322`, supporting non-ASCII characters in
-      headers by encoding them as "encoded words". If ``True``, follow
-      :rfc:`6532` and use ``utf-8`` encoding for headers. Messages
+      headers by encoding them as :rfc:`2047` "encoded words". If ``True``,
+      follow :rfc:`6532` and use ``utf-8`` encoding for headers. Messages
       formatted in this way may be passed to SMTP servers that support
       the ``SMTPUTF8`` extension (:rfc:`6531`).

+      When ``False``, the generator will raise an
+      :exc:`~email.errors.InvalidMailboxError` if any address header includes
+      a mailbox ("addr-spec") with non-ASCII characters. To use a mailbox with
+      an internationalized domain name, first encode the domain using the
+      third-party :pypi:`idna` or :pypi:`uts46` module or with
+      :mod:`encodings.idna`. It is not possible to use a non-ASCII username
+      ("local-part") in a mailbox when ``utf8=False``.
+
+      .. versionchanged:: 3.14
+         Raises :exc:`~email.errors.InvalidMailboxError`. (Earlier versions
+         incorrectly applied :rfc:`2047` to non-ASCII addr-specs.)

    .. attribute:: refold_source
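
The domain-encoding advice in the ``utf8`` documentation above can be sketched as follows. This uses the standard library's `encodings.idna` codec (IDNA 2003) through `str.encode("idna")`; the third-party packages named in the docs would be called differently. The helper `ascii_mailbox` is hypothetical, and the `☕.example` domain is borrowed from the PR's own test.

```python
from email.headerregistry import Address
from email.message import EmailMessage


def ascii_mailbox(username, domain, display_name=""):
    """Build an Address whose domain is IDNA-encoded (hypothetical helper)."""
    if not username.isascii():
        # The local-part has no ASCII-compatible form; it needs utf8=True.
        raise ValueError(f"non-ASCII local-part needs utf8=True: {username!r}")
    # The stdlib "idna" codec implements IDNA 2003 only.
    ascii_domain = domain.encode("idna").decode("ascii")
    return Address(display_name=display_name, username=username, domain=ascii_domain)


msg = EmailMessage()
msg["To"] = ascii_mailbox("jorg", "☕.example")
serialized = msg.as_string(policy=msg.policy.clone(utf8=False))
```

Because the resulting addr-spec is pure ASCII, it should serialize the same way with or without this change.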

11 changes: 11 additions & 0 deletions Lib/email/_header_value_parser.py
@@ -2837,6 +2837,17 @@ def _refold_parse_tree(parse_tree, *, policy):
             _fold_mime_parameters(part, lines, maxlen, encoding)
             continue

+        if want_encoding and part.token_type == 'addr-spec':
+            # RFC2047 forbids encoded-word in any part of an addr-spec.
+            if charset == 'unknown-8bit':
+                # Non-ASCII addr-spec came from parsed message; leave unchanged.
+                want_encoding = False
+            else:
+                raise errors.InvalidMailboxError(
+                    "Non-ASCII address requires policy with utf8=True:"
+                    " '{}'".format(part)
+                )
+
         if want_encoding and not wrap_as_ew_blocked:
             if not part.as_ew_allowed:
                 want_encoding = False

Review comment (Member) on the InvalidMailboxError message: Would f"Non-ASCII mailbox {str(part)!r} is invalid under current policy setting (utf8=False)" be clearer do you think?
4 changes: 4 additions & 0 deletions Lib/email/errors.py
@@ -33,6 +33,10 @@ class HeaderWriteError(MessageError):
     """Error while writing headers."""


+class InvalidMailboxError(MessageError, ValueError):
+    """A mailbox was not compatible with the policy in use."""
+
+
 # These are parsing defects which the parser was able to work around.
 class MessageDefect(ValueError):
     """Base class for a message defect."""
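
Since the new class derives from both `MessageError` and `ValueError`, either `except` clause should catch it. The sketch below shows this with a local stand-in that has the same bases as the class in the diff; it does not use the real `email.errors.InvalidMailboxError`, which only exists with this change applied. The helper `caught_as` is hypothetical.

```python
from email import errors


# Stand-in with the same bases as the class added in this diff, defined
# locally so the sketch runs where email.errors lacks InvalidMailboxError.
class InvalidMailboxError(errors.MessageError, ValueError):
    """A mailbox was not compatible with the policy in use."""


def caught_as(exc_type):
    """Report whether an `except exc_type` clause would catch the error."""
    try:
        raise InvalidMailboxError("Non-ASCII address requires policy with utf8=True")
    except exc_type:
        return True
    except Exception:
        return False


results = {
    t.__name__: caught_as(t)
    for t in (errors.MessageError, ValueError, TypeError)
}
```

Existing code that already catches `ValueError` around header handling would therefore also see this error, which may be the motivation for the second base class.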
60 changes: 58 additions & 2 deletions Lib/test/test_email/test_generator.py
@@ -1,4 +1,5 @@
 import io
+import re
 import textwrap
 import unittest
 from email import message_from_string, message_from_bytes
@@ -288,6 +289,30 @@ def test_keep_long_encoded_newlines(self):
         g.flatten(msg)
         self.assertEqual(s.getvalue(), self.typ(expected))

+    def test_non_ascii_addr_spec_raises(self):
+        # RFC2047 encoded-word is not permitted in any part of an addr-spec.
+        # (See also test_non_ascii_addr_spec_preserved below.)
+        g = self.genclass(self.ioclass(), policy=self.policy.clone(utf8=False))
+        cases = [
+            'wők@example.com',
+            'wok@exàmple.com',
+            'wők@exàmple.com',
+            '"Name, for display" <wők@example.com>',
+            'Näyttönimi <wők@example.com>',
+        ]
+        for address in cases:
+            with self.subTest(address=address):
+                msg = EmailMessage()
+                msg['To'] = address
+                addr_spec = msg['To'].addresses[0].addr_spec
+                expected_error = (
+                    fr"(?i)(?=.*non-ascii)(?=.*utf8.*True)(?=.*{re.escape(addr_spec)})"
+                )
+                with self.assertRaisesRegex(
+                    email.errors.InvalidMailboxError, expected_error
+                ):
+                    g.flatten(msg)
+

 class TestGenerator(TestGeneratorBase, TestEmailBase):
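
The `expected_error` pattern in the test above chains three lookaheads, so the required fragments may appear in any order. The sketch below applies the same pattern to the message in the diff and to the wording suggested in the review comment. It appears the suggested wording would not match as written, because it mentions ``utf8=False`` rather than ``utf8=True``, so the test would need updating alongside the message.

```python
import re

addr_spec = "wők@example.com"
# Same pattern as the test: three lookaheads evaluated at one position, so
# the fragments can occur in any order; (?i) makes the match case-insensitive.
pattern = fr"(?i)(?=.*non-ascii)(?=.*utf8.*True)(?=.*{re.escape(addr_spec)})"

# Message text as it appears in the diff.
current = f"Non-ASCII address requires policy with utf8=True: '{addr_spec}'"
# Wording proposed in the review comment.
suggested = (
    f"Non-ASCII mailbox {addr_spec!r} is invalid under current policy"
    " setting (utf8=False)"
)

# assertRaisesRegex uses re.search, so the same call is used here.
matches_current = re.search(pattern, current) is not None
matches_suggested = re.search(pattern, suggested) is not None
```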

@@ -432,12 +457,12 @@ def test_cte_type_7bit_transforms_8bit_cte(self):

     def test_smtputf8_policy(self):
         msg = EmailMessage()
-        msg['From'] = "Páolo <főo@bar.com>"
+        msg['From'] = "Páolo <főo@bàr.com>"
         msg['To'] = 'Dinsdale'
         msg['Subject'] = 'Nudge nudge, wink, wink \u1F609'
         msg.set_content("oh là là, know what I mean, know what I mean?")
         expected = textwrap.dedent("""\
-            From: Páolo <főo@bar.com>
+            From: Páolo <főo@bàr.com>
             To: Dinsdale
             Subject: Nudge nudge, wink, wink \u1F609
             Content-Type: text/plain; charset="utf-8"
@@ -472,6 +497,37 @@ def test_smtp_policy(self):
         g.flatten(msg)
         self.assertEqual(s.getvalue(), expected)

+    def test_non_ascii_addr_spec_preserved(self):
+        # A defective non-ASCII addr-spec parsed from the original
+        # message is left unchanged when flattening.
+        # (See also test_non_ascii_addr_spec_raises above.)
+        source = (
+            'To: jörg@example.com, "But a long name still works with refold_source" <jörg@example.com>'
+        ).encode()
+        expected = (
+            b'To: j\xc3\xb6rg@example.com,\n'
+            b' "But a long name still works with refold_source" <j\xc3\xb6rg@example.com>\n'
+            b'\n'
+        )
+        msg = message_from_bytes(source, policy=policy.default)
+        s = io.BytesIO()
+        g = BytesGenerator(s, policy=policy.default)
+        g.flatten(msg)
+        self.assertEqual(s.getvalue(), expected)
+
+    def test_idna_encoding_preserved(self):
+        # Nothing tries to decode a pre-encoded IDNA domain.
+        msg = EmailMessage()
+        msg["To"] = Address(
+            username='jörg',
+            domain='☕.example'.encode('idna').decode()  # IDNA 2003
+        )
+        expected = 'To: jörg@xn--53h.example\n\n'.encode()
+        s = io.BytesIO()
+        g = BytesGenerator(s, policy=policy.default.clone(utf8=True))
+        g.flatten(msg)
+        self.assertEqual(s.getvalue(), expected)
+

 if __name__ == '__main__':
     unittest.main()
@@ -0,0 +1,5 @@
+The :mod:`email` module no longer incorrectly encodes non-ASCII characters
+in email addresses using :rfc:`2047` encoding. Under a policy with ``utf8=True``
+this means the addresses will be correctly passed through. Under a policy with
+``utf8=False``, attempting to serialize a message with non-ASCII email addresses
+will now result in an :exc:`~email.errors.InvalidMailboxError`.
@@ -0,0 +1,5 @@
+The :mod:`email` module no longer incorrectly encodes non-ASCII characters
+in email addresses using :rfc:`2047` encoding. Under a policy with ``utf8=True``
+this means the addresses will be correctly passed through. Under a policy with
+``utf8=False``, attempting to serialize a message with non-ASCII email addresses
+will now result in an :exc:`~email.errors.InvalidMailboxError`.

Review comment (Member) on this file: Duplicate news item.