gh-88500: Reduce memory use of `urllib.unquote` #96763


Merged: gpshead merged 5 commits into python:main from gpshead:gh/88500/unquote_mem_use on Dec 11, 2022.


gpshead (Member) commented on Sep 12, 2022:

`urllib.parse.unquote_to_bytes` and `urllib.parse.unquote` could both, depending on the input, generate `O(len(string))` intermediate `bytes` or `str` objects while computing the final unquoted result. As Python objects are relatively large, this could consume a lot of RAM.
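To make that cost concrete, here is a small illustration (not taken from the PR) of how a `split()`-based decomposition scales: splitting the UTF-8 encoded input on `b"%"` creates one new `bytes` object per `%`, and each object carries fixed per-object overhead as reported by `sys.getsizeof`. Exact sizes vary by CPython version and platform, so the sketch only prints them; the helper name `split_piece_stats` is hypothetical.

```python
import sys


def split_piece_stats(s: str) -> tuple[int, int]:
    """Return (piece count, total sys.getsizeof bytes) for a split-on-'%'
    decomposition of the UTF-8 encoded input (hypothetical helper)."""
    pieces = s.encode("utf-8").split(b"%")
    return len(pieces), sum(sys.getsizeof(p) for p in pieces)


mess = "\u0141%%%20a%fe" * 1000  # adversarial input quoted in the PR
count, total = split_piece_stats(mess)
payload = len(mess.encode("utf-8"))
# One object per '%' (plus one), each with its own header, easily
# outweighing the payload it wraps.
print(f"{count} pieces, ~{total} bytes of objects for a {payload}-byte payload")
```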

This switches the implementation to use an expanding `bytearray` and a generator internally instead of precomputed `split()`-style operations.

Microbenchmarks with some antagonistic inputs such as `mess = "\u0141%%%20a%fe"*1000` show this is 10-20% slower for `unquote` and `unquote_to_bytes`, and no different for typical inputs that are short or contain little Unicode or `%` escaping. The functions are already quite fast anyway, so this is not a big deal. The slowdown scales linearly with input size, as expected.
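A rough way to run this kind of comparison yourself is `timeit` with the same adversarial input. The sketch below times whichever `unquote` and `unquote_to_bytes` your interpreter ships, so to compare before and after you would run it under two Python builds; the `typical` string and the `bench` helper are hypothetical, and absolute numbers will vary by machine.

```python
import timeit
from urllib.parse import unquote, unquote_to_bytes


def bench(func, arg, number: int = 100, repeat: int = 5) -> float:
    """Best-of-`repeat` seconds per call of func(arg) (hypothetical helper)."""
    timings = timeit.repeat(lambda: func(arg), number=number, repeat=repeat)
    return min(timings) / number


mess = "\u0141%%%20a%fe" * 1000  # adversarial input quoted in the PR
typical = "search?q=hello%20world&lang=en"  # hypothetical short, ordinary input

for func in (unquote, unquote_to_bytes):
    for label, arg in (("mess", mess), ("typical", typical)):
        print(f"{func.__name__:17} {label:8} {bench(func, arg) * 1e6:9.2f} us/call")
```

Taking the minimum of several repeats reduces noise from other processes; doubling the `* 1000` factor is an easy way to check the linear scaling mentioned above.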

Memory usage was observed manually using `/usr/bin/time -v` on `python -m timeit` runs with larger inputs. Unit testing memory consumption is difficult and does not seem worthwhile.

Observed memory usage is roughly 1/2 for `unquote()` and under 1/3 for `unquote_to_bytes()`, compared with the previous implementation, using `python -m timeit -s 'from urllib.parse import unquote, unquote_to_bytes; v="\u0141%01\u0161%20"*500_000' 'unquote_to_bytes(v)'` as a test.
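The PR measured peak RSS with `/usr/bin/time -v`. A different technique, swapped in here only for illustration, is the stdlib `tracemalloc` module, which reports the peak of Python-level allocations made during a call. Its figures will not match RSS numbers and depend on the interpreter version; the `peak_alloc` helper is hypothetical, and the input is scaled down from the PR's `* 500_000` so the sketch runs quickly.

```python
import tracemalloc
from urllib.parse import unquote, unquote_to_bytes


def peak_alloc(func, arg) -> int:
    """Return the peak bytes traced by tracemalloc while func(arg) runs."""
    tracemalloc.start()  # allocations made before this point are not traced
    try:
        func(arg)  # the result is dropped right away; only the peak matters
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# The PR used * 500_000; scaled down here so the sketch runs quickly.
v = "\u0141%01\u0161%20" * 50_000
payload = len(v.encode("utf-8"))
for func in (unquote, unquote_to_bytes):
    peak = peak_alloc(func, v)
    print(f"{func.__name__:17} peak {peak / 2**20:6.2f} MiB "
          f"for {payload / 2**20:.2f} MiB of UTF-8 input")
```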

Closes #88500.

gpshead added the type-feature (A feature request or enhancement), performance (Performance or resource usage), and stdlib (Python modules in the Lib dir) labels on Sep 16, 2022.
gpshead marked this pull request as ready for review on September 16, 2022 08:28.
gpshead (Member, Author) commented:

Any thoughts from reviewers?

gpshead requested review from ambv and removed the review requests for ethanfurman and sweeneyde on November 11, 2022 09:21.
gpshead merged commit 2e279e8 into python:main on Dec 11, 2022.
gpshead deleted the gh/88500/unquote_mem_use branch on December 11, 2022 00:17.
Reviewers: ambv (requested review still pending)
Assignees: gpshead
Labels: performance, stdlib, type-feature
Development: closes issue #88500, "Reduce memory usage of urllib.unquote and unquote_to_bytes"
Participants: gpshead, bedevere-bot
