Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
gh-134873: fix various quadratic worst-time complexities in_header_value_parser.py
[WIP]#134947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Conversation
_header_value_parser.py
[WIP]_header_value_parser.py
[WIP]picnixz commentedMay 31, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Urgh, so there is still a quadratic complexity that I need to think about, it's when we alternate if-branches. For instance in try:token,value=get_word(value)phrase.append(token)excepterrors.HeaderParseError:phrase.defects.append(errors.InvalidHeaderDefect("phrase does not start with word"))whilevalueandvalue[0]notinPHRASE_ENDS:ifvalue[0]=='.':phrase.append(DOT)phrase.defects.append(errors.ObsoleteHeaderDefect("period in 'phrase'"))value=value[1:]else:try:token,value=get_word(value)excepterrors.HeaderParseError:ifvalue[0]inCFWS_LEADER:token,value=get_cfws(value)phrase.defects.append(errors.ObsoleteHeaderDefect("comment found without atom"))else:raisephrase.append(token)returnphrase,value The ifvalue[0]=='.':phrase.append(DOT)phrase.defects.append(errors.ObsoleteHeaderDefect("period in 'phrase'"))value=value[1:] is quadratic if we're processing multiple times
Maybe switch to a stateful parser? That way, we shouldn't have high complexity and we should be fine. But this requires a complete rewrite of this module. |
Ok, this patch still fixessome cases but not all. Cases where two branches alternate would still be subject to O(n²) complexities (unless we avoid the copy in The advantage of |
picnixz commentedMay 31, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@serhiy-storchaka I'm a bit stuck here. I don't really have a better idea than to rewrite the module where we would use a So, to ensure backward compatibility, I think I'll need a new module... I can't think of another solution instead of entirely rewriting the logic so that we don't have un-necessary slice so your help would be appreciated, TiA! I thought about holding the current "index" where the parser stopped but again, I don't think it'll be sufficient as I'll still need to make slices at some point to extract some values to hold (OTOH, using a deque allows me to move some characters to elsewhere without having to copy the string twice, though I'll still need a ''.join() on the part that is being stored). defget_something(value):storage=Storage()head= []whilecond(value):head.append(value.popleft())storage.value=''.join(head)returnstorage,value |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
This is a work in progress (I need to go now) but I'll continue tomorrow. I want to add tests, and some other places are still not fixed because I didn't find a straightforward fix.