
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2012-08-08 22:38 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| pickle_unicode.patch | vstinner, 2012-08-08 22:38 |
| pickleutf8.patch | pitrou, 2013-04-06 16:48 |
| Messages (15) | |||
|---|---|---|---|
| msg167730 | Author: STINNER Victor (vstinner) | Date: 2012-08-08 22:38 | |
Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings. The attached patch optimizes serialization by taking advantage of new properties of Unicode strings (PEP 393): * text (protocol 0): avoid any temporary buffer if the string is an ASCII or Latin-1 string without a "\\" or "\n" character; otherwise use a small 64 KB buffer (instead of two buffers) * binary (protocols 1 and 2): avoid any temporary buffer if the string is ASCII or if its UTF-8 encoding is already available. The current code for protocol 0 uses raw_unicode_escape(), which is really suboptimal: it writes the escaped string into a first buffer and then copies it into a new temporary buffer of the right size (instead of simply calling _PyBytes_Resize()). | |||
| msg167731 | Author: STINNER Victor (vstinner) | Date: 2012-08-08 22:41 | |
Oh, I forgot to explain that I initially wrote the patch to fix the following failure on our "bigmem" buildbot: http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%20bigmem%203.x/builds/165/steps/test/logs/stdio ERROR: test_huge_str_32b (test.test_pickle.InMemoryPickleTests) Traceback (most recent call last): File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/support.py", line 1281, in wrapper return f(self, maxsize) File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/pickletester.py", line 1267, in test_huge_str_32b pickled = self.dumps(data, protocol=proto) File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/test_pickle.py", line 49, in dumps return pickle.dumps(arg, protocol) MemoryError | |||
| msg167796 | Author: Antoine Pitrou (pitrou) | Date: 2012-08-09 17:10 | |
Looks interesting. Can you post benchmark numbers? (You can use the pickle tests from http://hg.python.org/benchmarks.) | |||
| msg167839 | Author: STINNER Victor (vstinner) | Date: 2012-08-09 21:49 | |
Here is a benchmark comparing Python 3.3 without and with my patch: ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../default/python ../fasterpickle/python Running fastpickle... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle Running pickle_dict... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict Running pickle_list... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list Running slowpickle... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64 Total CPU cores: 8 ### fastpickle ### Min: 0.530622 -> 0.332841: 1.59x faster Avg: 0.539450 -> 0.336833: 1.60x faster Significant (t=232.04) Stddev: 0.00552 -> 0.00276: 2.0032x smaller Timeline: b'http://tinyurl.com/dyu3vap' The following not significant results are hidden, use -v to show them: pickle_dict, pickle_list, slowpickle. | |||
| msg167842 | Author: STINNER Victor (vstinner) | Date: 2012-08-09 22:03 | |
For your information, here are the results of a benchmark comparing Python 3.2 to 3.3: ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../default/python Running fastpickle... INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle Running pickle_dict... INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict Running pickle_list... INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list Running slowpickle... INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64 Total CPU cores: 8 ### fastpickle ### Min: 0.455842 -> 0.542103: 1.19x slower Avg: 0.462334 -> 0.547271: 1.18x slower Significant (t=-101.15) Stddev: 0.00362 -> 0.00471: 1.3028x larger Timeline: b'http://tinyurl.com/btr644x' ### pickle_dict ### Min: 0.360125 -> 0.345850: 1.04x faster Avg: 0.364019 -> 0.348431: 1.04x faster Significant (t=30.84) Stddev: 0.00308 -> 0.00181: 1.6973x smaller Timeline: b'http://tinyurl.com/cd3ashu' ### pickle_list ### Min: 0.803941 -> 0.584800: 1.37x faster Avg: 0.811115 -> 0.589200: 1.38x faster Significant (t=455.00) Stddev: 0.00261 -> 0.00225: 1.1612x smaller Timeline: b'http://tinyurl.com/8u4m2wf' ### slowpickle ### Min: 0.409008 -> 0.461257: 1.13x slower Avg: 0.413668 -> 0.466201: 1.13x slower Significant (t=-115.31) Stddev: 0.00236 -> 0.00219: 1.0772x smaller Timeline: b'http://tinyurl.com/czrg5kf' | |||
| msg167847 | Author: Alexandre Vassalotti (alexandre.vassalotti) | Date: 2012-08-09 22:08 | |
Amazing! Still, it would probably be a good idea to benchmark non-ASCII strings as well. | |||
| msg167848 | Author: STINNER Victor (vstinner) | Date: 2012-08-09 22:08 | |
Last one: Python 3.2 vs. patched Python 3.3. ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../fasterpickle/python Running fastpickle... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle Running pickle_dict... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict Running pickle_list... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list Running slowpickle... INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64 Total CPU cores: 8 ### fastpickle ### Min: 0.470211 -> 0.322453: 1.46x faster Avg: 0.475718 -> 0.328496: 1.45x faster Significant (t=205.65) Stddev: 0.00317 -> 0.00395: 1.2456x larger Timeline: b'http://tinyurl.com/9qpphzp' ### pickle_dict ### Min: 0.353965 -> 0.347959: 1.02x faster Avg: 0.358980 -> 0.350596: 1.02x faster Significant (t=10.44) Stddev: 0.00545 -> 0.00160: 3.3956x smaller Timeline: b'http://tinyurl.com/9pfeqf9' ### pickle_list ### Min: 0.838222 -> 0.593497: 1.41x faster Avg: 0.844636 -> 0.599491: 1.41x faster Significant (t=296.53) Stddev: 0.00520 -> 0.00267: 1.9521x smaller Timeline: b'http://tinyurl.com/9rynvnv' ### slowpickle ### Min: 0.408205 -> 0.458309: 1.12x slower Avg: 0.413738 -> 0.463916: 1.12x slower Significant (t=-53.85) Stddev: 0.00263 -> 0.00604: 2.3019x larger Timeline: b'http://tinyurl.com/coffkbg' | |||
| msg178872 | Author: STINNER Victor (vstinner) | Date: 2013-01-03 01:14 | |
serhiy: I'm not really motivated to finish the work on this issue (especially "... it would probably be good idea to benchmarks non-ASCII strings as well."). Would you like to work on this? | |||
| msg178934 | Author: Serhiy Storchaka (serhiy.storchaka) | Date: 2013-01-03 11:04 | |
Well, I'll take care of this. I have my own patch optimizing raw_unicode_escape(), but microbenchmarks don't show any speedup. Maybe your approach will be better. | |||
| msg186115 | Author: Antoine Pitrou (pitrou) | Date: 2013-04-05 23:27 | |
Ping? | |||
| msg186126 | Author: Antoine Pitrou (pitrou) | Date: 2013-04-06 14:39 | |
Since protocol 0 is essentially dead in Python 3, I would like to propose something simpler and safer: only optimize the binary protocols. If no one beats me to it, I'll adapt Victor's patch for that. | |||
| msg186139 | Author: Antoine Pitrou (pitrou) | Date: 2013-04-06 16:48 | |
Here is a new patch. Benchmark: ### fastpickle ### Min: 0.631457 -> 0.399104: 1.58x faster Avg: 0.631868 -> 0.399519: 1.58x faster Significant (t=701.85) Stddev: 0.00037 -> 0.00064: 1.7604x larger Timeline: http://tinyurl.com/c6n8h5g | |||
| msg186218 | Author: Roundup Robot (python-dev) | Date: 2013-04-07 15:41 | |
New changeset 09a84091ae96 by Antoine Pitrou in branch 'default': Issue #15596: Faster pickling of unicode strings. http://hg.python.org/cpython/rev/09a84091ae96 | |||
| msg186219 | Author: Antoine Pitrou (pitrou) | Date: 2013-04-07 15:50 | |
I've applied the review comments and committed the patch. Thank you! | |||
| msg186247 | Author: STINNER Victor (vstinner) | Date: 2013-04-07 21:34 | |
Hi Antoine, I prefer your patch. Great job! | |||
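
In msg167730, protocol 0 writes a str as one raw-unicode-escape line, which is why backslashes and newlines need special handling. An ASCII or Latin-1 string with neither character can be written out byte for byte. The Python sketch below builds that protocol-0 UNICODE ('V') opcode by hand to show the format. It is only an illustration under stated assumptions. The helpers `text_unicode_op` and `can_copy_as_latin1` are hypothetical names, and they escape only the two characters named in the thread; CPython's own pickler may escape more. The real pickler also emits memo opcodes, which the sketch leaves out.

```python
import pickle


def text_unicode_op(s: str) -> bytes:
    """Hypothetical helper: encode s as a protocol-0 UNICODE ('V') opcode.

    Only the backslash and newline escapes mentioned in the thread are
    applied; CPython's pickler may escape more characters.
    """
    # Escape backslashes first so the escapes added for newlines are not
    # themselves re-escaped. A raw backslash could start a \uXXXX escape on
    # load, and a raw newline would end the line-based opcode early.
    escaped = s.replace("\\", "\\u005c").replace("\n", "\\u000a")
    # raw-unicode-escape writes code points below 256 as single Latin-1
    # bytes and everything else as \uXXXX or \UXXXXXXXX.
    return b"V" + escaped.encode("raw-unicode-escape") + b"\n"


def can_copy_as_latin1(s: str) -> bool:
    """True when the payload is just s.encode('latin-1'), i.e. the case the
    patch wanted to write without any temporary buffer."""
    return "\\" not in s and "\n" not in s and all(ord(c) < 256 for c in s)


for s in ["abc", "caf\xe9", "a\\b\nc", "\u20ac \\u0041"]:
    op = text_unicode_op(s)
    # The opcode followed by STOP ('.') is a minimal pickle that loads back.
    assert pickle.loads(op + b".") == s
    if can_copy_as_latin1(s):
        assert op[1:-1] == s.encode("latin-1")
```

The escaping step is what forced the old code through intermediate buffers. When `can_copy_as_latin1` holds, the payload needs no transformation at all.
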
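
Pitrou's committed patch targets the binary protocols only (msg186126). There, a str is stored as its UTF-8 bytes behind a length prefix. So an ASCII string, or one whose UTF-8 form is already cached, can in principle be copied without an intermediate buffer. The sketch below builds the BINUNICODE ('X') opcode, which is a 4-byte little-endian length followed by the UTF-8 payload, and checks it against `pickle.dumps` at protocol 1. `binary_unicode_op` is a hypothetical helper. Beyond what the sketch shows, the real pickler writes memo opcodes, and protocol 4 and later may pick other opcodes (such as SHORT_BINUNICODE), which are ignored here.

```python
import pickle
import struct


def binary_unicode_op(s: str) -> bytes:
    """Hypothetical helper: encode s as a BINUNICODE ('X') opcode:
    b'X', a 4-byte little-endian payload length, then the UTF-8 payload."""
    # "surrogatepass" lets lone surrogates through, mirroring pickle.
    payload = s.encode("utf-8", "surrogatepass")
    return b"X" + struct.pack("<I", len(payload)) + payload


for s in ["abc", "caf\xe9", "\u20ac", "x" * 300]:
    op = binary_unicode_op(s)
    # A bare opcode followed by STOP ('.') is a valid minimal pickle.
    assert pickle.loads(op + b".") == s
    # Protocol 1 emits this same opcode first; memo and STOP follow it.
    assert pickle.dumps(s, protocol=1).startswith(op)
    if s.isascii():
        # For ASCII text the payload is the characters themselves, which
        # is what lets a C implementation copy them directly.
        assert op[5:] == s.encode("ascii")
```
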
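
The perf.py reports above give each Min and Avg result as a ratio of the old and new timings. The snippet below assumes perf.py divides the larger time by the smaller one and labels the result "faster" or "slower". That is an assumption, but it reproduces the Min figures quoted in msg167839, msg167842 and msg186139. `speed_ratio` is a hypothetical name.

```python
def speed_ratio(old: float, new: float) -> str:
    """Hypothetical helper: format old -> new timings the way the perf.py
    output in this thread appears to (larger time / smaller time)."""
    if new <= old:
        return f"{old / new:.2f}x faster"
    return f"{new / old:.2f}x slower"


# Min timings for fastpickle quoted in the thread.
print(speed_ratio(0.530622, 0.332841))  # 3.3 vs. patched 3.3 -> 1.59x faster
print(speed_ratio(0.455842, 0.542103))  # 3.2 vs. 3.3 -> 1.19x slower
print(speed_ratio(0.631457, 0.399104))  # pickleutf8.patch -> 1.58x faster
```

The Stddev ratios are not reproduced this way. They appear to be computed from unrounded values, so dividing the printed figures gives slightly different numbers.
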
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:57:34 | admin | set | github: 59801 |
| 2013-04-07 21:34:52 | vstinner | set | messages: +msg186247 |
| 2013-04-07 15:50:10 | pitrou | set | status: open -> closed resolution: fixed messages: +msg186219 stage: patch review -> resolved |
| 2013-04-07 15:41:01 | python-dev | set | nosy: +python-dev messages: +msg186218 |
| 2013-04-06 16:49:43 | pitrou | set | stage: patch review |
| 2013-04-06 16:48:13 | pitrou | set | files: +pickleutf8.patch messages: +msg186139 |
| 2013-04-06 14:39:27 | pitrou | set | messages: +msg186126 |
| 2013-04-05 23:27:45 | pitrou | set | messages: +msg186115 |
| 2013-01-03 11:04:52 | serhiy.storchaka | set | messages: +msg178934 |
| 2013-01-03 01:14:16 | vstinner | set | nosy: +serhiy.storchaka messages: +msg178872 |
| 2012-08-11 02:16:47 | jcea | set | nosy: +jcea |
| 2012-08-09 22:08:26 | vstinner | set | messages: +msg167848 |
| 2012-08-09 22:08:24 | alexandre.vassalotti | set | messages: +msg167847 |
| 2012-08-09 22:03:37 | vstinner | set | messages: +msg167842 |
| 2012-08-09 21:49:42 | vstinner | set | messages: +msg167839 |
| 2012-08-09 17:10:27 | pitrou | set | messages: +msg167796 |
| 2012-08-08 22:41:32 | vstinner | set | messages: +msg167731 |
| 2012-08-08 22:38:41 | vstinner | create | |