gh-121650: Encode newlines in headers, and verify headers are sound #122233
This should fail for custom fold() implementations that aren't careful about newlines.
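To make "not careful about newlines" concrete, here is a minimal sketch of such a policy. `CarelessPolicy` is a hypothetical name used for illustration only, not something from this PR. The snippet calls `fold()` directly, so it does not depend on whether the running interpreter already includes the new check.

```python
from email.policy import EmailPolicy


class CarelessPolicy(EmailPolicy):
    """Hypothetical policy whose fold() pastes the value in verbatim."""

    def fold(self, name, value):
        # No encoding or escaping: a newline in `value` ends the header early.
        return f"{name}: {value}{self.linesep}"


folded = CarelessPolicy().fold("Subject", "foo\nBcc: injected@example.com")

# A parser reading `folded` back sees two header lines, not one.
header_lines = folded.splitlines()
```

This is the kind of output the generator is meant to reject rather than write out.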
Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

---
Credit for an earlier attempt:
Co-Authored-By: Bas Bloemsaat <bas@bloemsaat.org>
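As a small illustration of the point above, an RFC 2047 encoded word can carry a newline as `=0A` while the serialized text stays on one line. This sketch uses the existing `email.header.decode_header` function; the helper name `decode_encoded_words` and the sample value are made up for the example.

```python
from email.header import decode_header


def decode_encoded_words(value):
    """Decode RFC 2047 encoded words in `value` and join the pieces."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Assumption for this sketch: chunks without a charset are ASCII.
            chunk = chunk.decode(charset or "ascii")
        parts.append(chunk)
    return "".join(parts)


# The wire form has no raw newline; "=0A" is the quoted newline octet.
encoded = "=?utf-8?q?foo=0Abar?="
decoded = decode_encoded_words(encoded)
```

The decoded value contains the newline as content, but the encoded form cannot be mistaken for two header lines.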
That sounds entirely reasonable, and conforms to the RFCs. Two points:
from email import message_from_string

email_in = """Subject: foo <bar>\nBCC: injected@example.com
To: incoming+tag@me.example.com
From: External Sender <sender@them.example.com>

message body
"""

msg = message_from_string(email_in)
print(msg)
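For anyone running that snippet: the `\n` escape becomes a real newline as soon as Python evaluates the string literal, so the parser is handed two separate header lines. A quick way to check what the parser sees (a sketch using the same input, written as concatenated literals so the line breaks are explicit):

```python
from email import message_from_string

# Same input as above; every "\n" here is a real newline in the string.
email_in = (
    "Subject: foo <bar>\nBCC: injected@example.com\n"
    "To: incoming+tag@me.example.com\n"
    "From: External Sender <sender@them.example.com>\n"
    "\n"
    "message body\n"
)

msg = message_from_string(email_in)

# Header names in the order the parser found them.
header_names = msg.keys()
```

In other words, with this input the `BCC` header already exists after parsing; nothing is injected at serialization time.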
This is in the branch that handles strings (rather than a custom Header object). I'm not clear on what format that string is supposed to be in.
That |
I'm not touching other instances in this file, since this PR might be backported to very old versions.
@serhiy-storchaka, would you like to review this?
bedevere-bot commented Jul 29, 2024
LGTM.
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…s are sound (pythonGH-122233)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
GH-122609 is a backport of this pull request to the 3.10 branch.
GH-122610 is a backport of this pull request to the 3.9 branch.
GH-122611 is a backport of this pull request to the 3.8 branch.
…sound (GH-122233) (#122484)

gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

GH-GH- Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

GH-GH- Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…sound (GH-122233) (#122599)

* gh-121650: Encode newlines in headers, and verify headers are sound (GH-122233)

- Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

- Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.

Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
(cherry picked from commit 0976339)

* Document changes as made in 3.12.5
…s are sound

pythongh-121650: Encode newlines in headers, and verify headers are sound (pythonGH-122233)

Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
headers are sound (pythonGH-122233)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

This patch also contains modified commit cherry picked from c5bba85.

This commit was backported to simplify the backport of the other commit
fixing CVE. The only modification is a removal of one test case which
tests multiple changes in Python 3.7 and it wasn't working properly
with Python 3.6 where we backported only one change.

Co-authored-by: bsiem <52461103+bsiem@users.noreply.github.com>
…ound (pythonGH-122233)

## Encode header parts that contain newlines

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

## Verify that email headers are well-formed

This should fail for custom fold() implementations that aren't careful
about newlines.

Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…ound (GH-122233) (#122611)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…sound (GH-122233) (#122608)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

Verify that email headers are well-formed.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…sound (GH-122233) (#122609)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
…ound (GH-122233) (#122610)

Per RFC 2047:

> [...] these encoding schemes allow the
> encoding of arbitrary octet values, mail readers that implement this
> decoding should also ensure that display of the decoded data on the
> recipient's terminal will not cause unwanted side-effects

It seems that the "quoted-word" scheme is a valid way to include
a newline character in a header value, just like we already allow
undecodable bytes or control characters.
They do need to be properly quoted when serialized to text, though.

This should fail for custom fold() implementations that aren't careful
about newlines.

(cherry picked from commit 0976339)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Bas Bloemsaat <bas@bloemsaat.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Changelog: https://docs.python.org/release/3.12.5/whatsnew/changelog.html

Include security fix CVE-2024-6923

Reference:
https://nvd.nist.gov/vuln/detail/CVE-2024-6923
python/cpython#122233

(From OE-Core rev: 777cad793a5b07d392b1d9875530fb5480e75863)

Signed-off-by: Vijay Anusuri <vanusuri@mvista.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
Re: #121812
Hello @basbloemsaat,
I've spent the day reading through the email module, and RFCs, and I believe I found a better place to fix the issue.
This involved lots of experimentation, so I'm sending an alternative PR rather than a review on yours.
- The generator (writer) verifies that the representation of each header is sound (a parser won't treat it as multiple headers, start-of-body, or part of another header). That should cover custom `fold()` implementations or `Header` subclasses.
- Newlines are encoded in `fold()`, just like undecodable bytes and other special characters.

Overall, this means that we treat newlines as valid content of headers, but “escape” them when such a header is serialized to text.
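The soundness check described above could look roughly like the following. This is a standalone sketch, not the code in this PR: `check_folded_header` and `is_sound` are hypothetical names, and `ValueError` is a stand-in for whatever exception the generator raises.

```python
import re

# A line break that is not followed by a space or tab ends the header:
# whatever comes next would be read as a new header or the start of the body.
NEWLINE_WITHOUT_FWSP = re.compile(r"\r\n[^ \t]|\r[^ \n\t]|\n[^ \t]")


def check_folded_header(folded, linesep="\n"):
    """Raise ValueError if `folded` would not parse back as one header."""
    if not folded.endswith(linesep):
        raise ValueError(f"folded header does not end with {linesep!r}: {folded!r}")
    # Ignore the final line separator; only interior line breaks matter.
    body = folded[: -len(linesep)]
    if NEWLINE_WITHOUT_FWSP.search(body):
        raise ValueError(f"folded header contains newline: {folded!r}")


def is_sound(folded, linesep="\n"):
    """Boolean wrapper around check_folded_header()."""
    try:
        check_folded_header(folded, linesep)
    except ValueError:
        return False
    return True
```

A folded continuation line (a newline followed by a space or tab) passes, while a newline followed by anything else fails. That covers all three cases named above: multiple headers, start-of-body, and part of another header.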
This PR is a proof of concept. It needs tests and documentation, but I'm out of time for today, and I wanted to share what I have.
Does this look reasonable to you?