Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue20879

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:base64 module of Python 3.4 uses 920 kB of memory
Type:resource usageStage:resolved
Components:Versions:Python 3.5
process
Status:closedResolution:fixed
Dependencies:Superseder:
Assigned To:Nosy List: Arfrever, loewis, pitrou, python-dev, r.david.murray, serhiy.storchaka, vstinner
Priority:normalKeywords:patch

Created on2014-03-10 09:35 byvstinner, last changed2022-04-11 14:57 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
base64_mem.pyvstinner,2014-03-10 09:35
base64_on_demand.patchvstinner,2014-03-10 10:41review
base64_lazy_init.patchserhiy.storchaka,2014-03-10 11:26review
urlparse_lazy_init.patchserhiy.storchaka,2014-03-10 11:41review
base64_urlparse_lazy_init-2.patchvstinner,2014-03-14 16:01review
base64_urlparse_lazy_init-3.patchvstinner,2014-03-14 16:58review
base64_urlparse_lazy_init-4.patchvstinner,2014-03-14 17:35review
base64_urlparse_lazy_init-4.patchserhiy.storchaka,2014-03-14 17:43review
Messages (18)
msg213018 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-10 09:35
According to tracemalloc, "import base64" allocates 920.6 kB of memory. The 3 top locations are:Lib/base64.py:414: size=420 KiB, count=7226, average=59 B    _b85chars2 = [(a + b) for a in _b85chars for b in _b85chars]Lib/base64.py:306: size=420 KiB, count=7226, average=59 B    _a85chars2 = [(a + b) for a in _a85chars for b in _a85chars]Lib/base64.py:142: size=59.8 KiB, count=1025, average=60 B    _b32tab2 = [a + b for a in _b32tab for b in _b32tab]Compare it to Python 3.3: base64 of Python 3.3 only allocates 10.3 kB. (I installed tracemalloc manually on Python 3.3 to get this number.)I suggest to initialized these precomputed tables to None, and compute them at the first call to b32encode() / b85encode() / a85encode().The new Base65 and Ascii85 codecs were added by the issue#17618._b32tab2 comes from changeset (1b5ef05d6ced) of issue#17812: "quadratic complexity of base64.b32encode(). Optimize base64.b32encode() and base64.b32decode() (speed up to 3x)." So 3.3 branch is also affected (for the base32 table).
msg213023 -(view)Author: Martin v. Löwis (loewis)*(Python committer)Date: 2014-03-10 10:17
It's a classical time-space trade-off. I'm not sure this is a bug; if it is, the reasonable change (IMO) is to just revert the introduction of these *2 tables, and loop through the output character-by-character.I'd personally be in favor of such a change (drop the tables). If people think that these functions are too slow, they should propose an extension of the binascii module, rather than micro-optimizing the Python code.
msg213024 -(view)Author: Martin v. Löwis (loewis)*(Python committer)Date: 2014-03-10 10:18
I'm also -0 on delayed creation of the tables. If these encodings are used, the memory overhead will still occur.
msg213025 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-10 10:21
> I'd personally be in favor of such a change (drop the tables).I didn't propose to drop the table, but create a table the first time that it is needed.I never used ascii85 nor base85 nor base32 in my applications. Maybe because ascii85 and base85 are new in Python 3.4? ;-) Python should not waste memory if a function is never called.I only import base64 module to the base64 codec :) (I don't care if there are more functions if they don't waste my memory.)
msg213026 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-10 10:41
base64_on_demand.patch: create tables the first time they are needed.b85decode() already uses such trick to build _b85dec.Note: a85decode() doesn't use a precomputed table.> If people think that these functions are too slow, they should propose an extension of the binascii module, rather than micro-optimizing the Python code.Yes, a C extension may be even faster. But I'm not interested to work on such enhance, I just want to fix the regression in memory usage right now.
msg213027 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2014-03-10 11:26
I was more interested in the import time, and it has slightly increased. Memory consumption ((32**2+2*85**2)*sys.getsizeof(b'xx')/2**10 + sys.getsizeof(dict.fromkeys(range(32**2))) + 2*sys.getsizeof(dict.fromkeys(range(85**2))) + 2*(85**2-256)*sys.getsizeof(85**2) < 1MB) was seemed small compared with the overall memory usage of Python (20-25MB), so I were ignored it.No need for lazy initialization of small tables, and the overhead of the _init_*85_tables() call for every encoding/decoding operation may be too large. Here is simpler patch.
msg213028 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2014-03-10 11:41
And here is similar patch for urllib.parse.
msg213029 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-10 11:50
> Here is simpler patch.Your patch only changes _*tab2, not the smaller tables.
msg213030 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2014-03-10 12:06
Yes, because small tables consume two orders less memory.
msg213075 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2014-03-10 19:49
Victor's approach looks fine to me.
msg213564 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-14 15:28
urlparse_lazy_init.patch looks good, but you should add a comment explaining your change. See my review:http://bugs.python.org/review/20879/
msg213566 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-14 16:01
I combined the two patches and I tried to address all comments: base64_urlparse_lazy_init-2.patch.
msg213569 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-14 16:58
Serhiy wrote on Rietveld:"As far as this constant is repeated twice, it wastes memory and errorprone."Oh, I expected marshal to be smart and use references, but it does not.Here is a simpler patch which initialize all base85 tables at once using a "b85alphabet" variable which simplify also the initialization of the decoder table.
msg213574 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-14 17:35
Sorry, it took me many tries to write the perfect patch :-) The current code has artificial dependencies between variables and has references to variables when they are not needed in fact.base64_urlparse_lazy_init-4.patch should be even simpler.
msg213577 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2014-03-14 17:43
I don't like base64_urlparse_lazy_init-3.patch at all. It waste memory and time for initializing of non-needed tables in case when only encoding or only decoding is used.Here is new patch based on base64_urlparse_lazy_init-2.patch. Unlike to my patch, it delays initialization of small tables too. Unlike to Victor's patches it initializes only needed tables and delays initialization of b32 tables.
msg213578 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2014-03-14 17:58
Well, our patches are almost same so your patch LGTM (and it has more comments). But note that my initialization of _b32tab2 is a little faster (bytes() is called 32 times instead of 1024).
msg213902 -(view)Author: Roundup Robot (python-dev)(Python triager)Date: 2014-03-17 21:40
New changeset7093d5758954 by Victor Stinner in branch '3.4':Issue#20879: Delay the initialization of encoding and decoding tables forhttp://hg.python.org/cpython/rev/7093d5758954New changeset06d646935c9a by Victor Stinner in branch 'default':(Merge 3.4) Issue#20879: Delay the initialization of encoding and decodinghttp://hg.python.org/cpython/rev/06d646935c9a
msg213903 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2014-03-17 21:41
Thanks Serhiy, I merged your last patch with mine. This issue should now be fixed.As Martin wrote, an enhancement would be to reimplement these functions in C without such table.
History
DateUserActionArgs
2022-04-11 14:57:59adminsetgithub: 65078
2014-03-17 22:18:31Arfreversetstage: resolved
2014-03-17 21:41:31vstinnersetstatus: open -> closed
resolution: fixed
messages: +msg213903
2014-03-17 21:40:07python-devsetnosy: +python-dev
messages: +msg213902
2014-03-14 17:58:14serhiy.storchakasetmessages: +msg213578
2014-03-14 17:43:58serhiy.storchakasetfiles: +base64_urlparse_lazy_init-4.patch

messages: +msg213577
2014-03-14 17:35:22vstinnersetfiles: +base64_urlparse_lazy_init-4.patch

messages: +msg213574
2014-03-14 16:58:53vstinnersetfiles: +base64_urlparse_lazy_init-3.patch

messages: +msg213569
2014-03-14 16:01:08vstinnersetfiles: +base64_urlparse_lazy_init-2.patch

messages: +msg213566
2014-03-14 15:28:51vstinnersetmessages: +msg213564
2014-03-10 19:49:34pitrousettype: performance -> resource usage
messages: +msg213075
versions: + Python 3.5, - Python 3.3, Python 3.4
2014-03-10 12:06:29serhiy.storchakasetmessages: +msg213030
2014-03-10 11:50:56vstinnersetmessages: +msg213029
2014-03-10 11:41:14serhiy.storchakasetfiles: +urlparse_lazy_init.patch

messages: +msg213028
2014-03-10 11:26:55serhiy.storchakasetfiles: +base64_lazy_init.patch

messages: +msg213027
2014-03-10 10:50:53r.david.murraysetnosy: +r.david.murray
2014-03-10 10:41:21vstinnersetfiles: +base64_on_demand.patch
keywords: +patch
messages: +msg213026
2014-03-10 10:39:16Arfreversetnosy: +Arfrever
2014-03-10 10:21:13vstinnersetmessages: +msg213025
2014-03-10 10:18:40loewissetmessages: +msg213024
2014-03-10 10:17:03loewissetnosy: +loewis
messages: +msg213023
2014-03-10 09:35:27vstinnercreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp