Description
Bug report
Bug description:
Hi!
While investigating some memory issues in my Lambda function, I noticed unusually high memory usage coming from `email.message_from_bytes`.
When parsing an .eml file that contains almost no text but a ~30 MB attachment, memory usage jumps by about 238 MB!
That is roughly 9 times the size of the file!
Here is my test:
```python
from email import message_from_bytes
import resource

# ru_maxrss is the peak resident set size; on Linux it is reported in kilobytes.
print('Init ram: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))

data = None
with open('file.eml', 'rb') as f:
    data = f.read()
print('File loaded: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
print(' (file size: {}kb)'.format(len(data) / 1024))

mail = message_from_bytes(data)
print('After message_from_bytes: {}kb'.format(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss))
```
And the output:
```
Init ram: 7168kb
File loaded: 37120kb
 (file size: 29900kb)
After message_from_bytes: 279296kb
```
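Since file.eml itself can't be shared, here is a minimal, self-contained sketch for reproducing the measurement with a synthetic message. It assumes a generated attachment (repeated CSV-like rows) is representative of the real CSV; `build_test_eml()` and `measure_parse()` are hypothetical helpers, and the addresses and sizes are made up. It uses `tracemalloc`, which counts only Python-level allocations, so its numbers are not directly comparable to the `ru_maxrss` figures above.

```python
import tracemalloc
from email import message_from_bytes
from email.message import EmailMessage


def build_test_eml(attachment_size: int) -> bytes:
    """Hypothetical helper: build an .eml with almost no body text and one
    large Base64-encoded attachment of exactly `attachment_size` bytes."""
    msg = EmailMessage()
    msg['Subject'] = 'Large attachment test'
    msg['From'] = 'sender@example.com'
    msg['To'] = 'recipient@example.com'
    msg.set_content('See attached.')
    # Synthetic CSV-like rows standing in for the real CSV file (an assumption).
    row = b'id,value,comment\n1,42,hello world\n'
    csv_bytes = (row * (attachment_size // len(row) + 1))[:attachment_size]
    # Passing bytes makes add_attachment() use Base64 transfer encoding by default.
    msg.add_attachment(csv_bytes, maintype='application',
                       subtype='octet-stream', filename='data.csv')
    return msg.as_bytes()


def measure_parse(raw: bytes) -> int:
    """Hypothetical helper: peak bytes allocated by Python (per tracemalloc)
    while message_from_bytes() parses `raw`."""
    tracemalloc.start()
    try:
        message_from_bytes(raw)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == '__main__':
    # Assumed size, kept smaller than the ~30 MB attachment from the report.
    raw = build_test_eml(5 * 1024 * 1024)
    peak = measure_parse(raw)
    print(f'.eml size: {len(raw) / 1024:.0f} KiB')
    print(f'peak traced while parsing: {peak / 1024:.0f} KiB')
    print(f'ratio: {peak / len(raw):.1f}x')
```

Comparing the ratio printed for a few attachment sizes should show whether the overhead scales linearly with the input.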
The .eml in question contains a CSV attachment encoded in Base64. I suspect that `BytesParser`
is decoding that content to binary data, but I find it surprising that doing so takes 9 times the file size.
Wouldn't it be faster and more memory-efficient to decode it only when it is accessed, and to offer a way to skip decoding entirely (getting the raw Base64)?
(Maybe such an option already exists and I missed it?)
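For reference, `email.message.Message.get_payload()` already distinguishes the two cases: without `decode=True` it returns the payload as stored, and `get_payload(decode=True)` applies the Content-Transfer-Encoding and returns bytes. A minimal sketch, assuming a small in-memory message; `make_eml()` and the `data.csv` attachment are hypothetical. If this holds for the real file too, the overhead measured above presumably comes from parsing and buffering the message text rather than from Base64 decoding.

```python
from email import message_from_bytes
from email.message import EmailMessage


def make_eml() -> bytes:
    """Hypothetical helper: a small message with one Base64-encoded attachment."""
    msg = EmailMessage()
    msg['Subject'] = 'Payload decoding demo'
    msg.set_content('See attached.')
    msg.add_attachment(b'id,value\n1,42\n', maintype='application',
                       subtype='octet-stream', filename='data.csv')
    return msg.as_bytes()


mail = message_from_bytes(make_eml())
# walk() yields the multipart container first, then each subpart.
attachment = next(p for p in mail.walk() if p.get_filename() == 'data.csv')

# Without decode=True the payload comes back as stored: still-encoded Base64 text.
raw_b64 = attachment.get_payload()
# decode=True applies the Content-Transfer-Encoding and returns bytes.
decoded = attachment.get_payload(decode=True)

print(attachment['Content-Transfer-Encoding'])  # base64
print(repr(raw_b64))
print(decoded)  # b'id,value\n1,42\n'
```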
I tested this with:
- Python 3.10.13
- Python 3.12.1
Both versions gave the same results.
CPython versions tested on:
3.10
Operating systems tested on:
Linux