
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2014-03-10 09:35 byvstinner, last changed2022-04-11 14:57 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| base64_mem.py | vstinner,2014-03-10 09:35 | |||
| base64_on_demand.patch | vstinner,2014-03-10 10:41 | review | ||
| base64_lazy_init.patch | serhiy.storchaka,2014-03-10 11:26 | review | ||
| urlparse_lazy_init.patch | serhiy.storchaka,2014-03-10 11:41 | review | ||
| base64_urlparse_lazy_init-2.patch | vstinner,2014-03-14 16:01 | review | ||
| base64_urlparse_lazy_init-3.patch | vstinner,2014-03-14 16:58 | review | ||
| base64_urlparse_lazy_init-4.patch | vstinner,2014-03-14 17:35 | review | ||
| base64_urlparse_lazy_init-4.patch | serhiy.storchaka,2014-03-14 17:43 | review | ||
| Messages (18) | |||
|---|---|---|---|
| msg213018 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-10 09:35 | |
According to tracemalloc, "import base64" allocates 920.6 kB of memory. The 3 top locations are:Lib/base64.py:414: size=420 KiB, count=7226, average=59 B _b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]Lib/base64.py:306: size=420 KiB, count=7226, average=59 B _a85chars2 = [(a + b) for a in _a85chars for b in _a85chars]Lib/base64.py:142: size=59.8 KiB, count=1025, average=60 B _b32tab2 = [a + b for a in _b32tab for b in _b32tab]Compare it to Python 3.3: base64 of Python 3.3 only allocates 10.3 kB. (I installed tracemalloc manually on Python 3.3 to get this number.)I suggest to initialized these precomputed tables to None, and compute them at the first call to b32encode() / b85encode() / a85encode().The new Base65 and Ascii85 codecs were added by the issue#17618._b32tab2 comes from changeset (1b5ef05d6ced) of issue#17812: "quadratic complexity of base64.b32encode(). Optimize base64.b32encode() and base64.b32decode() (speed up to 3x)." So 3.3 branch is also affected (for the base32 table). | |||
| msg213023 -(view) | Author: Martin v. Löwis (loewis)*![]() | Date: 2014-03-10 10:17 | |
It's a classical time-space trade-off. I'm not sure this is a bug; if it is, the reasonable change (IMO) is to just revert the introduction of these *2 tables, and loop through the output character-by-character.I'd personally be in favor of such a change (drop the tables). If people think that these functions are too slow, they should propose an extension of the binascii module, rather than micro-optimizing the Python code. | |||
| msg213024 -(view) | Author: Martin v. Löwis (loewis)*![]() | Date: 2014-03-10 10:18 | |
I'm also -0 on delayed creation of the tables. If these encodings are used, the memory overhead will still occur. | |||
| msg213025 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-10 10:21 | |
> I'd personally be in favor of such a change (drop the tables).I didn't propose to drop the table, but create a table the first time that it is needed.I never used ascii85 nor base85 nor base32 in my applications. Maybe because ascii85 and base85 are new in Python 3.4? ;-) Python should not waste memory if a function is never called.I only import base64 module to the base64 codec :) (I don't care if there are more functions if they don't waste my memory.) | |||
| msg213026 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-10 10:41 | |
base64_on_demand.patch: create tables the first time they are needed.b85decode() already uses such trick to build _b85dec.Note: a85decode() doesn't use a precomputed table.> If people think that these functions are too slow, they should propose an extension of the binascii module, rather than micro-optimizing the Python code.Yes, a C extension may be even faster. But I'm not interested to work on such enhance, I just want to fix the regression in memory usage right now. | |||
| msg213027 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-03-10 11:26 | |
I was more interested in the import time, and it has slightly increased. Memory consumption ((32**2+2*85**2)*sys.getsizeof(b'xx')/2**10 + sys.getsizeof(dict.fromkeys(range(32**2))) + 2*sys.getsizeof(dict.fromkeys(range(85**2))) + 2*(85**2-256)*sys.getsizeof(85**2) < 1MB) was seemed small compared with the overall memory usage of Python (20-25MB), so I were ignored it.No need for lazy initialization of small tables, and the overhead of the _init_*85_tables() call for every encoding/decoding operation may be too large. Here is simpler patch. | |||
| msg213028 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-03-10 11:41 | |
And here is similar patch for urllib.parse. | |||
| msg213029 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-10 11:50 | |
> Here is simpler patch.Your patch only changes _*tab2, not the smaller tables. | |||
| msg213030 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-03-10 12:06 | |
Yes, because small tables consume two orders less memory. | |||
| msg213075 -(view) | Author: Antoine Pitrou (pitrou)*![]() | Date: 2014-03-10 19:49 | |
Victor's approach looks fine to me. | |||
| msg213564 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-14 15:28 | |
urlparse_lazy_init.patch looks good, but you should add a comment explaining your change. See my review:http://bugs.python.org/review/20879/ | |||
| msg213566 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-14 16:01 | |
I combined the two patches and I tried to address all comments: base64_urlparse_lazy_init-2.patch. | |||
| msg213569 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-14 16:58 | |
Serhiy wrote on Rietveld:"As far as this constant is repeated twice, it wastes memory and errorprone."Oh, I expected marshal to be smart and use references, but it does not.Here is a simpler patch which initialize all base85 tables at once using a "b85alphabet" variable which simplify also the initialization of the decoder table. | |||
| msg213574 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-14 17:35 | |
Sorry, it took me many tries to write the perfect patch :-) The current code has artificial dependencies between variables and has references to variables when they are not needed in fact.base64_urlparse_lazy_init-4.patch should be even simpler. | |||
| msg213577 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-03-14 17:43 | |
I don't like base64_urlparse_lazy_init-3.patch at all. It waste memory and time for initializing of non-needed tables in case when only encoding or only decoding is used.Here is new patch based on base64_urlparse_lazy_init-2.patch. Unlike to my patch, it delays initialization of small tables too. Unlike to Victor's patches it initializes only needed tables and delays initialization of b32 tables. | |||
| msg213578 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2014-03-14 17:58 | |
Well, our patches are almost same so your patch LGTM (and it has more comments). But note that my initialization of _b32tab2 is a little faster (bytes() is called 32 times instead of 1024). | |||
| msg213902 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2014-03-17 21:40 | |
New changeset7093d5758954 by Victor Stinner in branch '3.4':Issue#20879: Delay the initialization of encoding and decoding tables forhttp://hg.python.org/cpython/rev/7093d5758954New changeset06d646935c9a by Victor Stinner in branch 'default':(Merge 3.4) Issue#20879: Delay the initialization of encoding and decodinghttp://hg.python.org/cpython/rev/06d646935c9a | |||
| msg213903 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2014-03-17 21:41 | |
Thanks Serhiy, I merged your last patch with mine. This issue should now be fixed.As Martin wrote, an enhancement would be to reimplement these functions in C without such table. | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:59 | admin | set | github: 65078 |
| 2014-03-17 22:18:31 | Arfrever | set | stage: resolved |
| 2014-03-17 21:41:31 | vstinner | set | status: open -> closed resolution: fixed messages: +msg213903 |
| 2014-03-17 21:40:07 | python-dev | set | nosy: +python-dev messages: +msg213902 |
| 2014-03-14 17:58:14 | serhiy.storchaka | set | messages: +msg213578 |
| 2014-03-14 17:43:58 | serhiy.storchaka | set | files: +base64_urlparse_lazy_init-4.patch messages: +msg213577 |
| 2014-03-14 17:35:22 | vstinner | set | files: +base64_urlparse_lazy_init-4.patch messages: +msg213574 |
| 2014-03-14 16:58:53 | vstinner | set | files: +base64_urlparse_lazy_init-3.patch messages: +msg213569 |
| 2014-03-14 16:01:08 | vstinner | set | files: +base64_urlparse_lazy_init-2.patch messages: +msg213566 |
| 2014-03-14 15:28:51 | vstinner | set | messages: +msg213564 |
| 2014-03-10 19:49:34 | pitrou | set | type: performance -> resource usage messages: +msg213075 versions: + Python 3.5, - Python 3.3, Python 3.4 |
| 2014-03-10 12:06:29 | serhiy.storchaka | set | messages: +msg213030 |
| 2014-03-10 11:50:56 | vstinner | set | messages: +msg213029 |
| 2014-03-10 11:41:14 | serhiy.storchaka | set | files: +urlparse_lazy_init.patch messages: +msg213028 |
| 2014-03-10 11:26:55 | serhiy.storchaka | set | files: +base64_lazy_init.patch messages: +msg213027 |
| 2014-03-10 10:50:53 | r.david.murray | set | nosy: +r.david.murray |
| 2014-03-10 10:41:21 | vstinner | set | files: +base64_on_demand.patch keywords: +patch messages: +msg213026 |
| 2014-03-10 10:39:16 | Arfrever | set | nosy: +Arfrever |
| 2014-03-10 10:21:13 | vstinner | set | messages: +msg213025 |
| 2014-03-10 10:18:40 | loewis | set | messages: +msg213024 |
| 2014-03-10 10:17:03 | loewis | set | nosy: +loewis messages: +msg213023 |
| 2014-03-10 09:35:27 | vstinner | create | |