gh-88500: Reduce memory use of `urllib.unquote` #96763
Conversation
gpshead commented Sep 12, 2022 • edited
`urllib.unquote_to_bytes` and `urllib.unquote` could both potentially generate `O(len(string))` intermediate `bytes` or `str` objects while computing the unquoted final result, depending on the input provided. As Python objects are relatively large, this could consume a lot of RAM.
This switches the implementation to using an expanding `bytearray` and a generator internally, instead of precomputed `split()`-style operations.
Microbenchmarks with some antagonistic inputs like `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for `unquote` and `unquote_to_bytes`, and no different for typical inputs that are short or lack much unicode or % escaping. But the functions are already quite fast anyway, so this is not a big deal. The slowdown scales consistently and linearly with input size, as expected.
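The shape of that comparison can be reproduced along these lines (illustrative only; absolute numbers depend on the machine and build):

```python
import timeit

# Antagonistic input: dense %-escapes mixed with non-ASCII characters.
setup = (
    "from urllib.parse import unquote, unquote_to_bytes\n"
    'mess = "\\u0141%%%20a%fe" * 1000'
)
for stmt in ("unquote(mess)", "unquote_to_bytes(mess)"):
    per_call = min(timeit.repeat(stmt, setup=setup, number=2000, repeat=5)) / 2000
    print(f"{stmt}: {per_call * 1e6:.1f} us per call")
```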
Memory usage was observed manually by running `/usr/bin/time -v` on `python -m timeit` runs with larger inputs; unit-testing memory consumption is difficult and does not seem worthwhile.
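An in-process alternative for eyeballing the Python-level allocations is `tracemalloc`, though it will not match `/usr/bin/time -v` exactly since it only tracks allocations made through Python's allocator:

```python
import tracemalloc
from urllib.parse import unquote_to_bytes

v = "\u0141%01\u0161%20" * 500_000

tracemalloc.start()
unquote_to_bytes(v)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak traced allocations: {peak / 2**20:.1f} MiB")
```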
Observed memory usage is ~1/2 for `unquote()` and <1/3 for `unquote_to_bytes()`, using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.

Closes #88500.

any thoughts from reviewers?