Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-134873: fix various quadratic worst-time complexities in_header_value_parser.py [WIP]#134947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Draft
picnixz wants to merge3 commits intopython:main
base:main
Choose a base branch
Loading
frompicnixz:fix/email/dos-134873

Conversation

picnixz
Copy link
Member

@picnixzpicnixz commentedMay 30, 2025
edited by bedevere-appbot
Loading

This is a work in progress (I need to go now) but I'll continue tomorrow. I want to add tests, and some other places are still not fixed because I didn't find a straightforward fix.

johnzhou721 reacted with rocket emoji
@picnixzpicnixz changed the titlegh-134873: fix various quadratic worst-time complexity in_header_value_parser.py [WIP]gh-134873: fix various quadratic worst-time complexities in_header_value_parser.py [WIP]May 30, 2025
@picnixzpicnixz added type-securityA security issue needs backport to 3.9only security fixes needs backport to 3.10only security fixes needs backport to 3.11only security fixes needs backport to 3.12only security fixes needs backport to 3.13bugs and security fixes needs backport to 3.14bugs and security fixes labelsMay 30, 2025
@picnixz
Copy link
MemberAuthor

picnixz commentedMay 31, 2025
edited
Loading

Urgh, so there is still a quadratic complexity that I need to think about, it's when we alternate if-branches. For instance inget_phrase:

try:token,value=get_word(value)phrase.append(token)excepterrors.HeaderParseError:phrase.defects.append(errors.InvalidHeaderDefect("phrase does not start with word"))whilevalueandvalue[0]notinPHRASE_ENDS:ifvalue[0]=='.':phrase.append(DOT)phrase.defects.append(errors.ObsoleteHeaderDefect("period in 'phrase'"))value=value[1:]else:try:token,value=get_word(value)excepterrors.HeaderParseError:ifvalue[0]inCFWS_LEADER:token,value=get_cfws(value)phrase.defects.append(errors.ObsoleteHeaderDefect("comment found without atom"))else:raisephrase.append(token)returnphrase,value

The

ifvalue[0]=='.':phrase.append(DOT)phrase.defects.append(errors.ObsoleteHeaderDefect("period in 'phrase'"))value=value[1:]

is quadratic if we're processing multiple times.. However, if I have something like'a + '.a' * 100, then theif branch still requires a copy every two iterations, whatever I put inside. Even if the length reduces at each iteration, it doesn't sufficiently reduce. I'll need to think a bit more.

One idea would have been to use a deque to prevent a copy when stripping parts, but then this requires to reconstruct a deque everytime.

Maybe switch to a stateful parser? That way, we shouldn't have high complexity and we should be fine. But this requires a complete rewrite of this module.

@picnixz
Copy link
MemberAuthor

Ok, this patch still fixessome cases but not all. Cases where two branches alternate would still be subject to O(n²) complexities (unless we avoid the copy in.lstrip() or in[1:], it's not possible to avoid this with.lstrip() or slices since they still need O(n) to copy the rest of the string).

The advantage of.lstrip() over slices is essentially when theif branch is executed more than once before going to theelse branch (namely, we can batch-process some characters). For instance,"a" + "." * 50000 is parsed usinglstrip() in O(n) instead of O(n²). However,"a" + ".a" * 50000 is still subject to O(n²) parsing.

@picnixz
Copy link
MemberAuthor

picnixz commentedMay 31, 2025
edited
Loading

@serhiy-storchaka I'm a bit stuck here. I don't really have a better idea than to rewrite the module where we would use adeque to hold the current value. That way, I can call.popleft() to drop prefixed chars. Unfortunately, this also means that cannot really return a string anymore as the module is used and signatures are actually in_typeshed:https://github.com/python/typeshed/blob/main/stdlib/email/_header_value_parser.pyi.

So, to ensure backward compatibility, I think I'll need a new module... I can't think of another solution instead of entirely rewriting the logic so that we don't have un-necessary slice so your help would be appreciated, TiA!


I thought about holding the current "index" where the parser stopped but again, I don't think it'll be sufficient as I'll still need to make slices at some point to extract some values to hold (OTOH, using a deque allows me to move some characters to elsewhere without having to copy the string twice, though I'll still need a ''.join() on the part that is being stored).

defget_something(value):storage=Storage()head= []whilecond(value):head.append(value.popleft())storage.value=''.join(head)returnstorage,value

johnzhou721

This comment was marked as resolved.

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@johnzhou721johnzhou721johnzhou721 left review comments

@serhiy-storchakaserhiy-storchakaAwaiting requested review from serhiy-storchaka

Assignees
No one assigned
Labels
needs backport to 3.9only security fixesneeds backport to 3.10only security fixesneeds backport to 3.11only security fixesneeds backport to 3.12only security fixesneeds backport to 3.13bugs and security fixesneeds backport to 3.14bugs and security fixestype-securityA security issue
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

2 participants
@picnixz@johnzhou721

[8]ページ先頭

©2009-2025 Movatter.jp