Description
Bug report
When passed a bytestring that is over a hundred mebibytes (MiB), the urllib.parse.quote_from_bytes function uses much more memory and CPU than one would expect.
repro.py:

    #!/usr/bin/env python3
    import base64
    from time import perf_counter
    from urllib.parse import quote_from_bytes

    MIB = 1024 ** 2

    def main():
        bytes_ = base64.b64encode(100 * MIB * b'\x00')  # note 1
        start = perf_counter()
        quoted = quote_from_bytes(bytes_)
        stop = perf_counter()
        print(f"Quoting {len(bytes_)/1024**2:.3f} MiB took {stop-start} seconds")

    if __name__ == '__main__':
        main()

I use /usr/bin/time to track how much CPU and memory is used.

    $ /usr/bin/time -v ./repro.py
    Quoting 133.333 MiB took 7.290915511985077 seconds
        Command being timed: "./repro.py"
        User time (seconds): 7.12
        System time (seconds): 0.68
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.82
        ...
        Maximum resident set size (kbytes): 1374872
        ...

The function ends up at one point needing ten times the size of the bytestring to quote it (i.e. 1.31 GiB). It also takes several seconds to return. I expect it to return in under a second. Fortunately, there's no memory leak as the interpreter does return the memory after the function returns.
Interestingly, if I reduce 100 to 90 in the line marked "note 1", the function returns in half a second and uses only 250 MiB, which is much more in line with my pre-bug expectations.
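One possible explanation for the difference between 90 and 100, offered as an assumption rather than a confirmed diagnosis: quote_from_bytes appears to return its input unchanged when every byte is already safe, and only otherwise falls back to quoting byte by byte. 90 MiB is a multiple of 3 bytes, so its Base64 encoding has no `=` padding and consists solely of `A`. 100 MiB is not, so the encoded form ends in `==`, which has to be escaped. The sketch below checks this on small stand-in inputs; `needs_padding` is a hypothetical helper, not part of urllib.

```python
import base64
from urllib.parse import quote_from_bytes

MIB = 1024 ** 2

def needs_padding(n_bytes: int) -> bool:
    """Hypothetical helper: True if Base64 of n_bytes input bytes ends in '='."""
    return n_bytes % 3 != 0

# Small stand-ins for the 90 MiB and 100 MiB payloads from repro.py.
unpadded = base64.b64encode(3 * b'\x00')  # length divisible by 3, no padding
padded = base64.b64encode(4 * b'\x00')    # length not divisible by 3, padded

print(needs_padding(90 * MIB), needs_padding(100 * MIB))
print(quote_from_bytes(unpadded))  # contains only safe characters
print(quote_from_bytes(padded))    # trailing '=' characters get percent-encoded
```

If this assumption holds, the 90 MiB case never reaches the slow path at all, so it says little about where the slow path starts to degrade.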
This function consuming so much memory affects the AWS SDK for Python, boto3, as a lot of AWS APIs are called with URL-encoded parameters. boto3/botocore calls urllib.parse.urlencode to do that encoding. That ends up calling the problematic quote_from_bytes. Sample stack trace:

      File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 508, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 898, in _make_api_call
        http, parsed_response = self._make_request(
      File "/usr/local/lib/python3.8/dist-packages/botocore/client.py", line 921, in _make_request
        return self._endpoint.make_request(operation_model, request_dict)
      File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 119, in make_request
        return self._send_request(request_dict, operation_model)
      File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 198, in _send_request
        request = self.create_request(request_dict, operation_model)
      File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 139, in create_request
        prepared_request = self.prepare_request(request)
      File "/usr/local/lib/python3.8/dist-packages/botocore/endpoint.py", line 150, in prepare_request
        return request.prepare()
      File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 473, in prepare
        return self._request_preparer.prepare(self)
      File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 360, in prepare
        body = self._prepare_body(original)
      File "/usr/local/lib/python3.8/dist-packages/botocore/awsrequest.py", line 416, in _prepare_body
        body = urlencode(params, doseq=True)
      File "/usr/lib/python3.8/urllib/parse.py", line 962, in urlencode
        v = quote_via(v, safe)
      File "/usr/lib/python3.8/urllib/parse.py", line 870, in quote_plus
        return quote(string, safe, encoding, errors)
      File "/usr/lib/python3.8/urllib/parse.py", line 859, in quote
        return quote_from_bytes(string, safe)
      File "/usr/lib/python3.8/urllib/parse.py", line 898, in quote_from_bytes
        return ''.join([quoter(char) for char in bs])

Your environment
Python 3.8.10 on Ubuntu 20.04 running on a t3.large EC2 instance. I have also been able to reproduce it with Python 3.10.6 and 3.11.0rc1+. I also reproduced it on Windows 10 running Python 3.9.13.
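The last frame of the stack trace builds a list with one element per input byte before joining it, which is a plausible source of the peak memory. As a minimal sketch of a workaround for code that calls quote_from_bytes directly, the input can be quoted in fixed-size chunks so that the intermediate list stays small. The name `quote_from_bytes_chunked` and the 1 MiB default chunk size are assumptions made for illustration, and this does not change what urlencode does inside botocore.

```python
from urllib.parse import quote_from_bytes

def quote_from_bytes_chunked(bs: bytes, safe: str = '/',
                             chunk_size: int = 1024 * 1024) -> str:
    """Hypothetical workaround: quote bs one chunk at a time.

    Each byte is quoted independently of its neighbours, so joining the
    quoted chunks gives the same string as quoting bs in one call, while
    the per-byte intermediate list never exceeds chunk_size entries.
    """
    parts = []
    for start in range(0, len(bs), chunk_size):
        # Slicing copies at most chunk_size bytes at a time.
        parts.append(quote_from_bytes(bs[start:start + chunk_size], safe))
    return ''.join(parts)

if __name__ == '__main__':
    sample = bytes(range(256)) * 4
    # The chunked result should match the single-call result.
    print(quote_from_bytes_chunked(sample, chunk_size=100) == quote_from_bytes(sample))
```

Whether this is faster in practice would need to be measured with the same /usr/bin/time approach as repro.py; the sketch only bounds the size of the temporary list.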