
Created on 2016-01-31 17:48 by vstinner, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| pymem.patch | vstinner, 2016-01-31 17:48 | review |
| python_memleak.py | vstinner, 2016-02-02 11:12 | |
| tu_malloc.c | vstinner, 2016-02-02 11:12 | |
| pymem_27.patch | catalin.manciu, 2016-02-22 12:50 | review |
| pymalloc.patch | vstinner, 2016-03-14 12:58 | review |
| Messages (51) | |||
|---|---|---|---|
| msg259290 | Author: STINNER Victor (vstinner) | Date: 2016-01-31 17:48 | |
The issue #23601 showed speedup for the dict type by replacing PyMem_Malloc() with PyObject_Malloc() in dictobject.c.

When I worked on the PEP 445, it was discussed to use the Python fast memory allocator for small memory allocations (<= 512 bytes), but I think that nobody tested on benchmark.

So I open an issue to discuss that.

By the way, we should also benchmark the Windows memory allocator which limits fragmentations. Maybe we can skip the Python small memory allocator on recent version of Windows?

Attached patch implements the change. The main question is the speedup on various kinds of memory allocations (need a benchmark) :-)

I will try to run benchmarks.

--

If the patch slows down Python, maybe we can investigate if some Python types (like dict) mostly uses "small" memory blocks (<= 512 bytes).
| msg259297 | Author: STINNER Victor (vstinner) | Date: 2016-01-31 17:59 | |
Ok, to avoid confusion, I opened an issue specific to Windows for its "Low-fragmentation Heap": issue #26251.

Other issues related to memory allocators.

Merged:

- issue #21233: Add *Calloc functions to CPython memory allocation API (extension of the PEP 445, asked by numpy)
- issue #13483: Use VirtualAlloc to allocate memory arenas (implementation of the PEP 445)
- issue #3329: API for setting the memory allocator used by Python

Open:

- issue #18835: Add aligned memory variants to the suite of PyMem functions/macros => this one is still open, the status is unclear :-/
| msg259376 | Author: Antoine Pitrou (pitrou) | Date: 2016-02-02 11:06 | |
Hum, the point of PyMem_Malloc() is that it's distinct from PyObject_Malloc(), right? Why would you redirect one to the other?

| msg259377 | Author: Antoine Pitrou (pitrou) | Date: 2016-02-02 11:06 | |
(of course, we might question why we have two different families of allocation APIs...)
| msg259378 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 11:10 | |
> Hum, the point of PyMem_Malloc() is that it's distinct from PyObject_Malloc(), right? Why would you redirect one to the other?

For performances.

> (of course, we might question why we have two different families of allocation APIs...)

That's the real question: why does Python have PyMem family? Is it still justified in 2016?

--

Firefox uses jemalloc to limit the fragmentation of the heap memory. Once I spent a lot of time to try to understand the principle of fragmentation, and in my tiny benchmarks, jemalloc was *much* better than system allocator. By the way, jemalloc scales well on multiple threads ;-)

* http://www.canonware.com/jemalloc/
* https://github.com/jemalloc/jemalloc/wiki

My notes on heap memory fragmentation: http://haypo-notes.readthedocs.org/heap_fragmentation.html

| msg259379 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 11:12 | |
About heap memory fragmentation, see also my attached two "benchmarks" in Python and C: python_memleak.py and tu_malloc.c.
| msg259382 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 12:00 | |
So, I ran ssh://hg@hg.python.org/benchmarks with my patch. It looks like some benchmarks are up to 4% faster:

    $ python3 -u perf.py ../default/python.orig ../default/python.pymem
    INFO:root:Automatically selected timer: perf_counter
    [ 1/10] 2to3...
    INFO:root:Running `../default/python.pymem lib3/2to3/2to3 -f all lib/2to3`
    INFO:root:Running `../default/python.pymem lib3/2to3/2to3 -f all lib/2to3` 1 time
    INFO:root:Running `../default/python.orig lib3/2to3/2to3 -f all lib/2to3`
    INFO:root:Running `../default/python.orig lib3/2to3/2to3 -f all lib/2to3` 1 time
    [ 2/10] chameleon_v2...
    INFO:root:Running `../default/python.pymem performance/bm_chameleon_v2.py -n 50 --timer perf_counter`
    INFO:root:Running `../default/python.orig performance/bm_chameleon_v2.py -n 50 --timer perf_counter`
    [ 3/10] django_v3...
    INFO:root:Running `../default/python.pymem performance/bm_django_v3.py -n 50 --timer perf_counter`
    INFO:root:Running `../default/python.orig performance/bm_django_v3.py -n 50 --timer perf_counter`
    [ 4/10] fastpickle...
    INFO:root:Running `../default/python.pymem performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle pickle`
    INFO:root:Running `../default/python.orig performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle pickle`
    [ 5/10] fastunpickle...
    INFO:root:Running `../default/python.pymem performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle unpickle`
    INFO:root:Running `../default/python.orig performance/bm_pickle.py -n 50 --timer perf_counter --use_cpickle unpickle`
    [ 6/10] json_dump_v2...
    INFO:root:Running `../default/python.pymem performance/bm_json_v2.py -n 50 --timer perf_counter`
    INFO:root:Running `../default/python.orig performance/bm_json_v2.py -n 50 --timer perf_counter`
    [ 7/10] json_load...
    INFO:root:Running `../default/python.pymem performance/bm_json.py -n 50 --timer perf_counter json_load`
    INFO:root:Running `../default/python.orig performance/bm_json.py -n 50 --timer perf_counter json_load`
    [ 8/10] nbody...
    INFO:root:Running `../default/python.pymem performance/bm_nbody.py -n 50 --timer perf_counter`
    INFO:root:Running `../default/python.orig performance/bm_nbody.py -n 50 --timer perf_counter`
    [ 9/10] regex_v8...
    INFO:root:Running `../default/python.pymem performance/bm_regex_v8.py -n 50 --timer perf_counter`
    INFO:root:Running `../default/python.orig performance/bm_regex_v8.py -n 50 --timer perf_counter`
    [10/10] tornado_http...
    INFO:root:Running `../default/python.pymem performance/bm_tornado_http.py -n 100 --timer perf_counter`
    INFO:root:Running `../default/python.orig performance/bm_tornado_http.py -n 100 --timer perf_counter`

    Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    6.880090 -> 6.818911: 1.01x faster

    ### fastpickle ###
    Min: 0.453826 -> 0.442081: 1.03x faster
    Avg: 0.456499 -> 0.443978: 1.03x faster
    Significant (t=20.03)
    Stddev: 0.00370 -> 0.00242: 1.5293x smaller

    ### fastunpickle ###
    Min: 0.547908 -> 0.526027: 1.04x faster
    Avg: 0.554663 -> 0.528686: 1.05x faster
    Significant (t=15.95)
    Stddev: 0.00893 -> 0.00728: 1.2260x smaller

    ### json_dump_v2 ###
    Min: 2.733907 -> 2.627718: 1.04x faster
    Avg: 2.762473 -> 2.664675: 1.04x faster
    Significant (t=11.99)
    Stddev: 0.03796 -> 0.04341: 1.1435x larger

    ### regex_v8 ###
    Min: 0.042438 -> 0.042581: 1.00x slower
    Avg: 0.042805 -> 0.044078: 1.03x slower
    Significant (t=-2.12)
    Stddev: 0.00171 -> 0.00388: 2.2694x larger

    ### tornado_http ###
    Min: 0.254089 -> 0.246088: 1.03x faster
    Avg: 0.257046 -> 0.249033: 1.03x faster
    Significant (t=15.83)
    Stddev: 0.00401 -> 0.00310: 1.2930x smaller

    The following not significant results are hidden, use -v to show them:
    chameleon_v2, django_v3, json_load, nbody.

    real    19m13.413s
    user    18m50.024s
    sys     0m22.507s
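Reports like the one above are easier to compare across runs once the numbers are extracted. The following is a minimal sketch, not part of perf.py: it assumes the report layout shown in this thread (`### name ###` headers followed by `Min:`/`Avg:` lines), and the names `parse_report` and `speedup` are hypothetical. Entries without a `Min`/`Avg` label, such as the single 2to3 timing, are ignored.

```python
import re

# Matches report lines such as "Min: 0.453826 -> 0.442081: 1.03x faster".
LINE_RE = re.compile(
    r"^(?P<label>Min|Avg):\s*(?P<base>[0-9.]+)\s*->\s*(?P<changed>[0-9.]+):"
)
HEADER_RE = re.compile(r"^### (\S+) ###$")


def parse_report(text):
    """Return {benchmark: {"Min": (base, changed), "Avg": (base, changed)}}."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            results[current] = {}
            continue
        match = LINE_RE.match(line)
        if match and current is not None:
            results[current][match.group("label")] = (
                float(match.group("base")),
                float(match.group("changed")),
            )
    return results


def speedup(base, changed):
    """A ratio above 1.0 means the changed build took less time."""
    return base / changed


# Two entries copied from the report above.
SAMPLE = """\
### fastpickle ###
Min: 0.453826 -> 0.442081: 1.03x faster
Avg: 0.456499 -> 0.443978: 1.03x faster
### regex_v8 ###
Min: 0.042438 -> 0.042581: 1.00x slower
Avg: 0.042805 -> 0.044078: 1.03x slower
"""

report = parse_report(SAMPLE)
for name, stats in report.items():
    base, changed = stats["Avg"]
    print(f"{name}: {speedup(base, changed):.3f}x")
```

Recomputing the ratio from the raw timings avoids relying on the two-decimal rounding printed in the report.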
| msg259383 | Author: Yury Selivanov (Yury.Selivanov) | Date: 2016-02-02 13:17 | |
> On Feb 2, 2016, at 7:00 AM, STINNER Victor <report@bugs.python.org> wrote:
>
> So, I ran ssh://hg@hg.python.org/benchmarks with my patch. It looks like some benchmarks are up to 4% faster:

Please use -r flag for perf.py
| msg259384 | Author: Antoine Pitrou (pitrou) | Date: 2016-02-02 13:28 | |
> It looks like some benchmarks are up to 4% faster:

What this says is that some internal uses of PyMem_XXX should be replaced with PyObject_XXX.
| msg259385 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 13:40 | |
FYI benchmark result to compare Python with and without pymalloc (fast memory allocator for block <= 512 bytes). As expected, no pymalloc is slower, up to 30% slower (and it's never faster).

    Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    7.253671 -> 7.558993: 1.04x slower

    ### chameleon_v2 ###
    Min: 5.598481 -> 5.794526: 1.04x slower
    Avg: 5.714233 -> 5.922142: 1.04x slower
    Significant (t=-8.01)
    Stddev: 0.15956 -> 0.09048: 1.7636x smaller

    ### django_v3 ###
    Min: 0.574221 -> 0.606462: 1.06x slower
    Avg: 0.579659 -> 0.612088: 1.06x slower
    Significant (t=-28.44)
    Stddev: 0.00605 -> 0.00532: 1.1371x smaller

    ### fastpickle ###
    Min: 0.450852 -> 0.502645: 1.11x slower
    Avg: 0.455619 -> 0.513777: 1.13x slower
    Significant (t=-26.24)
    Stddev: 0.00696 -> 0.01404: 2.0189x larger

    ### fastunpickle ###
    Min: 0.544064 -> 0.696306: 1.28x slower
    Avg: 0.552459 -> 0.705372: 1.28x slower
    Significant (t=-85.52)
    Stddev: 0.00798 -> 0.00980: 1.2281x larger

    ### json_dump_v2 ###
    Min: 2.780312 -> 3.265531: 1.17x slower
    Avg: 2.830463 -> 3.370060: 1.19x slower
    Significant (t=-23.73)
    Stddev: 0.04190 -> 0.15521: 3.7046x larger

    ### json_load ###
    Min: 0.428893 -> 0.558956: 1.30x slower
    Avg: 0.431941 -> 0.569441: 1.32x slower
    Significant (t=-74.76)
    Stddev: 0.00791 -> 0.01033: 1.3060x larger

    ### regex_v8 ###
    Min: 0.043439 -> 0.044614: 1.03x slower
    Avg: 0.044388 -> 0.046487: 1.05x slower
    Significant (t=-4.95)
    Stddev: 0.00215 -> 0.00209: 1.0283x smaller

    ### tornado_http ###
    Min: 0.264603 -> 0.278840: 1.05x slower
    Avg: 0.270153 -> 0.285263: 1.06x slower
    Significant (t=-23.04)
    Stddev: 0.00489 -> 0.00436: 1.1216x smaller

    The following not significant results are hidden, use -v to show them:
    nbody.
| msg259389 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 14:47 | |
Test with jemalloc using the shell script "python.jemalloc":

---
    #!/bin/sh
    LD_PRELOAD=/usr/lib64/libjemalloc.so /home/haypo/prog/python/default/python "$@"
---

Memory consumption:

    python3 -u perf.py -m ../default/python ../default/python.jemalloc

    Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    Mem max: 43100.000 -> 220.000: 195.9091x smaller

    ### chameleon_v2 ###
    Mem max: 367276.000 -> 224.000: 1639.6250x smaller

    ### django_v3 ###
    Mem max: 24136.000 -> 284.000: 84.9859x smaller

    ### fastpickle ###
    Mem max: 8692.000 -> 284.000: 30.6056x smaller

    ### fastunpickle ###
    Mem max: 8704.000 -> 216.000: 40.2963x smaller

    ### json_dump_v2 ###
    Mem max: 10448.000 -> 216.000: 48.3704x smaller

    ### json_load ###
    Mem max: 8444.000 -> 220.000: 38.3818x smaller

    ### nbody ###
    Mem max: 7388.000 -> 220.000: 33.5818x smaller

    ### regex_v8 ###
    Mem max: 12764.000 -> 220.000: 58.0182x smaller

    ### tornado_http ###
    Mem max: 28216.000 -> 228.000: 123.7544x smaller

****

Performance:

    Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    7.413484 -> 7.189792: 1.03x faster

    ### chameleon_v2 ###
    Min: 5.559697 -> 5.869468: 1.06x slower
    Avg: 5.672448 -> 6.033152: 1.06x slower
    Significant (t=-13.67)
    Stddev: 0.12098 -> 0.14203: 1.1740x larger

    ### nbody ###
    Min: 0.242194 -> 0.229747: 1.05x faster
    Avg: 0.244991 -> 0.235297: 1.04x faster
    Significant (t=9.75)
    Stddev: 0.00262 -> 0.00652: 2.4861x larger

    ### regex_v8 ###
    Min: 0.042532 -> 0.046920: 1.10x slower
    Avg: 0.043249 -> 0.047907: 1.11x slower
    Significant (t=-13.23)
    Stddev: 0.00180 -> 0.00172: 1.0503x smaller

    ### tornado_http ###
    Min: 0.265755 -> 0.274526: 1.03x slower
    Avg: 0.273617 -> 0.284186: 1.04x slower
    Significant (t=-6.67)
    Stddev: 0.00583 -> 0.01474: 2.5297x larger

    The following not significant results are hidden, use -v to show them:
    django_v3, fastpickle, fastunpickle, json_dump_v2, json_load.
| msg259390 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 14:48 | |
>> It looks like some benchmarks are up to 4% faster:
> What this says is that some internals uses of PyMem_XXX should be replaced with PyObject_XXX.

Why not changing PyMem_XXX to use the same fast allocator than PyObject_XXX? (as proposed in this issue)

FYI we now also have the PyMem_RawXXX family :)
| msg259391 | Author: Antoine Pitrou (pitrou) | Date: 2016-02-02 14:52 | |
On 02/02/2016 15:47, STINNER Victor wrote:
>
> ### 2to3 ###
> Mem max: 43100.000 -> 220.000: 195.9091x smaller
>
> ### chameleon_v2 ###
> Mem max: 367276.000 -> 224.000: 1639.6250x smaller
>
> ### django_v3 ###
> Mem max: 24136.000 -> 284.000: 84.9859x smaller

These figures are not even remotely believable. It would make sense to investigate them before posting such numbers ;-)
| msg259392 | Author: Antoine Pitrou (pitrou) | Date: 2016-02-02 14:53 | |
On 02/02/2016 15:48, STINNER Victor wrote:
>> What this says is that some internals uses of PyMem_XXX should be replaced with PyObject_XXX.
>
> Why not changing PyMem_XXX to use the same fast allocator than PyObject_XXX? (as proposed in this issue)

Why have two sets of functions doing exactly the same thing?
| msg259393 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 14:54 | |
> These figures are not even remotely believable.

To be honest, I didn't try to understand them :-) Are they the number of kB of the RSS memory?

Maybe perf.py doesn't like my shell script?

| msg259395 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 15:01 | |
> Why have two sets of functions doing exactly the same thing?

I have no idea.
| msg259440 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 22:27 | |
> Test with jemalloc using the shell script "python.jemalloc":
> ---
> #!/bin/sh
> LD_PRELOAD=/usr/lib64/libjemalloc.so /home/haypo/prog/python/default/python "$@"
> ---

"perf.py -m" doesn't work with such bash script, but it works using exec:

---
    #!/bin/sh
    LD_PRELOAD=/usr/lib64/libjemalloc.so exec /home/haypo/prog/python/default/python "$@"
---

> Memory consumption:

    python3 -u perf.py -m ../default/python ../default/python.jemalloc

Hum, it looks like jemalloc uses *more* memory than libc memory allocators. I don't know if it's a known

    Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    Mem max: 43088.000 -> 43776.000: 1.0160x larger

    ### chameleon_v2 ###
    Mem max: 367028.000 -> 626324.000: 1.7065x larger

    ### django_v3 ###
    Mem max: 23824.000 -> 25120.000: 1.0544x larger

    ### fastpickle ###
    Mem max: 8696.000 -> 9712.000: 1.1168x larger

    ### fastunpickle ###
    Mem max: 8708.000 -> 9696.000: 1.1135x larger

    ### json_dump_v2 ###
    Mem max: 10488.000 -> 11556.000: 1.1018x larger

    ### json_load ###
    Mem max: 8444.000 -> 9396.000: 1.1127x larger

    ### nbody ###
    Mem max: 7392.000 -> 8416.000: 1.1385x larger

    ### regex_v8 ###
    Mem max: 12760.000 -> 13576.000: 1.0639x larger

    ### tornado_http ###
    Mem max: 28196.000 -> 29920.000: 1.0611x larger
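One plausible reading of why the wrapper only works with `exec` (an assumption, the thread does not confirm it) is that the memory measurement follows the process that was launched: without `exec` that process is the shell, and the interpreter is a separate child. The sketch below only illustrates the PID difference on a POSIX system with `/bin/sh`; the helper name `launcher_and_python_pids` is hypothetical.

```python
import subprocess
import sys

# Snippet executed by the child interpreter: print its own PID.
PRINT_PID = "import os; print(os.getpid())"


def launcher_and_python_pids(shell_script):
    """Run shell_script with /bin/sh.

    Inside the script, "$0" is this interpreter and "$1" is PRINT_PID.
    Returns (pid of the launched shell, pid reported by Python).
    """
    proc = subprocess.Popen(
        ["/bin/sh", "-c", shell_script, sys.executable, PRINT_PID],
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate()
    return proc.pid, int(out)


# Without exec: the shell stays alive and Python runs as its child.
# The trailing "; true" keeps the shell from exec'ing the last command itself.
without_exec = launcher_and_python_pids('"$0" -c "$1"; true')

# With exec: the shell process is replaced by the interpreter (same PID).
with_exec = launcher_and_python_pids('exec "$0" -c "$1"')

print("without exec:", without_exec)
print("with exec:   ", with_exec)
```

With `exec`, the PID handed back to the caller is the interpreter itself, so anything inspecting that PID sees Python rather than a nearly idle shell.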
| msg259441 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 22:27 | |
(Crap. I sent an incomplete message, sorry about that.)

> Hum, it looks like jemalloc uses *more* memory than libc memory allocators. I don't know if it's a known

I don't know if it's a known issue/property of jemalloc.
| msg259445 | Author: STINNER Victor (vstinner) | Date: 2016-02-02 23:14 | |
Yury: "Please use -r flag for perf.py"

Oh, I didn't know this flag. Sure, I can do that.

New benchmark using --rigorous to measure the performance of attached pymem.patch.

It always seems faster, never slower.

    Report on Linux smithers 4.3.3-300.fc23.x86_64 #1 SMP Tue Jan 5 23:31:01 UTC 2016 x86_64 x86_64
    Total CPU cores: 8

    ### 2to3 ###
    Min: 6.772531 -> 6.686245: 1.01x faster
    Avg: 6.875264 -> 6.726859: 1.02x faster
    Significant (t=3.44)
    Stddev: 0.09026 -> 0.03398: 2.6560x smaller

    ### django_v3 ###
    Min: 0.562797 -> 0.552539: 1.02x faster
    Avg: 0.591345 -> 0.557561: 1.06x faster
    Significant (t=4.17)
    Stddev: 0.07689 -> 0.02581: 2.9794x smaller

    ### fastpickle ###
    Min: 0.464270 -> 0.437667: 1.06x faster
    Avg: 0.467195 -> 0.442298: 1.06x faster
    Significant (t=10.59)
    Stddev: 0.01156 -> 0.02046: 1.7693x larger

    ### fastunpickle ###
    Min: 0.548834 -> 0.526554: 1.04x faster
    Avg: 0.554601 -> 0.539456: 1.03x faster
    Significant (t=4.67)
    Stddev: 0.01137 -> 0.03040: 2.6734x larger

    ### json_dump_v2 ###
    Min: 2.723152 -> 2.603108: 1.05x faster
    Avg: 2.749255 -> 2.693655: 1.02x faster
    Significant (t=2.89)
    Stddev: 0.03016 -> 0.18988: 6.2963x larger

    ### regex_v8 ###
    Min: 0.044256 -> 0.042201: 1.05x faster
    Avg: 0.044733 -> 0.043134: 1.04x faster
    Significant (t=4.55)
    Stddev: 0.00201 -> 0.00288: 1.4309x larger

    ### tornado_http ###
    Min: 0.253405 -> 0.247401: 1.02x faster
    Avg: 0.256274 -> 0.250380: 1.02x faster
    Significant (t=17.48)
    Stddev: 0.00285 -> 0.00382: 1.3430x larger

    The following not significant results are hidden, use -v to show them:
    chameleon_v2, json_load, nbody.
| msg260674 | Author: Catalin Gabriel Manciu (catalin.manciu) | Date: 2016-02-22 12:50 | |
Hi all,

Please find below the results from a complete GUPB run on a patched CPython 3.6. In average, an improvement of about 2.1% can be observed. I'm also attaching an implementation of the patch for CPython 2.7 and its benchmark results. On GUPB the average performance boost is 1.5%. In addition we are also seeing a 2.1% increase in throughput rate from our OpenStack Swift setup as measured by ssbench.

Compared to my proposition in issue #26382, this patch yields slightly better results for CPython 3.6, gaining an average of +0.36% on GUPB, and similar results for CPython 2.7.

Hardware and OS configuration:
==============================
Hardware: Intel XEON (Haswell-EP)

BIOS settings:
    Intel Turbo Boost Technology: false
    Hyper-Threading: false

OS: Ubuntu 14.04.2 LTS

OS configuration:
    Address Space Layout Randomization (ASLR) disabled to reduce run to run variation by echo 0 > /proc/sys/kernel/randomize_va_space
    CPU frequency set fixed at 2.3GHz

Repository info:
================
CPython2 : 2d8e8d0e7162 (2.7)
CPython3 : f9391e2b74a5 tip

Results
=======

Table 1: CPython 3 GUPB results
-------------------------------
    unpickle_list          22.74%
    mako_v2                 9.13%
    nqueens                 6.32%
    meteor_contest          5.61%
    fannkuch                5.34%
    simple_logging          5.28%
    formatted_logging       5.06%
    fastunpickle            4.37%
    json_dump_v2            3.10%
    regex_compile           3.01%
    raytrace                2.95%
    pathlib                 2.43%
    tornado_http            2.22%
    django_v3               1.94%
    telco                   1.65%
    pickle_list             1.59%
    chaos                   1.50%
    etree_process           1.48%
    fastpickle              1.34%
    silent_logging          1.12%
    2to3                    1.09%
    float                   1.01%
    nbody                   0.89%
    normal_startup          0.86%
    startup_nosite          0.79%
    richards                0.67%
    regex_v8                0.61%
    etree_generate          0.57%
    hexiom2                 0.54%
    pickle_dict             0.20%
    call_simple             0.18%
    spectral_norm           0.17%
    regex_effbot            0.16%
    unpack_sequence         0.00%
    call_method_unknown    -0.04%
    chameleon_v2           -0.07%
    json_load              -0.08%
    etree_parse            -0.09%
    pidigits               -0.15%
    go                     -0.16%
    etree_iterparse        -0.22%
    call_method_slots      -0.49%
    call_method            -0.97%

Table 2: CPython 2 GUPB results
-------------------------------
    unpickle_list          16.88%
    json_load              11.74%
    fannkuch                8.11%
    mako_v2                 6.91%
    meteor_contest          6.27%
    slowpickle              4.81%
    nqueens                 4.46%
    html5lib_warmup         3.53%
    chaos                   2.67%
    regex_v8                2.56%
    html5lib                2.34%
    fastunpickle            2.32%
    tornado_http            2.23%
    rietveld                2.15%
    simple_logging          1.82%
    normal_startup          1.57%
    call_method_slots       1.53%
    telco                   1.49%
    regex_compile           1.47%
    spectral_norm           1.36%
    hg_startup              1.27%
    regex_effbot            1.18%
    nbody                   1.02%
    2to3                    1.01%
    pybench                 0.99%
    chameleon_v2            0.98%
    slowunpickle            0.93%
    startup_nosite          0.92%
    pickle_list             0.89%
    richards                0.56%
    django_v3               0.48%
    json_dump_v2            0.41%
    raytrace                0.38%
    unpack_sequence         0.00%
    float                  -0.05%
    slowspitfire           -0.07%
    go                     -0.24%
    hexiom2                -0.26%
    spambayes              -0.27%
    pickle_dict            -0.30%
    etree_parse            -0.32%
    pidigits               -0.41%
    etree_iterparse        -0.47%
    bzr_startup            -0.55%
    fastpickle             -0.74%
    etree_process          -0.96%
    formatted_logging      -1.01%
    call_simple            -1.08%
    pathlib                -1.12%
    silent_logging         -1.22%
    etree_generate         -1.23%
    call_method_unknown    -2.14%
    call_method            -2.22%

Table 3: OpenStack Swift ssbench results
----------------------------------------
    ssbench                 2.11%
| msg260675 | Author: STINNER Victor (vstinner) | Date: 2016-02-22 12:56 | |
> Compared to my proposition in issue #26382, this patch yields slightly better results for CPython 3.6, gaining an average of +0.36% on GUPB, and similar results for CPython 2.7.

IMHO this change is too young to be backported to Python 2.7. I wrote it for Python 3.6 only. For Python 2.7, I suggest to write patches with narrow scope, as you did for the patch only modifying the list type.

"""
Table 1: CPython 3 GUPB results
-------------------------------
    unpickle_list          22.74%
    mako_v2                 9.13%
    nqueens                 6.32%
    meteor_contest          5.61%
    fannkuch                5.34%
    simple_logging          5.28%
    formatted_logging       5.06%
"""

I'm surprised to see slow-downs, but I prefer to think that changes smaller than 5% are pure noise.

The good news is the long list of benchmarks with speedup larger than 5.0% :-) 22% on unpickle_list is nice to have too!
| msg260681 | Author: Catalin Gabriel Manciu (catalin.manciu) | Date: 2016-02-22 14:04 | |
I've just posted the results to an OpenStack Swift benchmark run using the patch from my proposition, issue #26382. Victor's patch, applied to CPython 2.7, adds an extra 1% compared to mine (which improved throughput by 1%), effectively doubling the performance gain. Swift is a highly complex real-world workload, so this result is quite significant.
| msg261430 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 14:26 | |
I created the issue #26516 "Add PYTHONMALLOC env var and add support for malloc debug hooks in release mode" to help developers to detect bugs in their code, especially misuse of the PyMem_Malloc() API.
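As a rough illustration of how the `PYTHONMALLOC` variable mentioned above can be used from a test harness, the sketch below runs a child interpreter with the debug hooks enabled. It assumes Python 3.7 or later (for `capture_output`), and the helper name `run_with_allocator` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_allocator(mode, code):
    """Run `code` in a child interpreter with PYTHONMALLOC set to `mode`.

    Documented modes include "debug", "malloc", "malloc_debug",
    "pymalloc" and "pymalloc_debug".
    """
    env = dict(os.environ, PYTHONMALLOC=mode)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )


# A small workload; with the debug hooks installed, allocator misuse in a
# C extension would abort the child with a "Fatal Python error" on stderr.
result = run_with_allocator(
    "debug", "print(sum(len(str(i)) for i in range(1000)))"
)
print(result.returncode, result.stdout.strip())
```

Running an extension's own test suite this way is the kind of validation the message has in mind; a non-zero return code plus a fatal error on stderr would point at an allocator API mismatch or a buffer overflow.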
| msg261431 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 14:36 | |
Patch 3:

- Ooops, I updated pymem_api_misuse(), but I forgot to update the related unit test. It's now fixed.
| msg261433 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 14:44 | |
In February 2016, I started a thread on the python-dev mailing list:

[Python-Dev] Modify PyMem_Malloc to use pymalloc for performance
https://mail.python.org/pipermail/python-dev/2016-February/143084.html

M.-A. Lemburg wrote:

"""
> Do you see any drawback of using pymalloc for PyMem_Malloc()?

Yes: You cannot free memory allocated using pymalloc with the standard C lib free().

It would be better to go through the list of PyMem_*() calls in Python and replace them with PyObject_*() calls, where possible.

> Does anyone recall the rationale to have two families to memory allocators?

The PyMem_*() APIs were needed to have a cross-platform malloc() implementation which returns standard C lib free()able memory, but also behaves well when passing 0 as size.
"""

M.-A. Lemburg fears that the PyMem_Malloc() API is misused:

"""
Sometimes, yes, but we also do allocations for e.g. parsing values in Python argument tuples (e.g. using "es" or "et"):

https://docs.python.org/3.6/c-api/arg.html

We do document to use PyMem_Free() on those; not sure whether everyone does this though.
"""

M.-A. Lemburg suggested testing the patch of this issue on:

"""
Yes, but those are part of the stdlib. You'd need to check a few C extensions which are not tested as part of the stdlib, e.g. numpy, scipy, lxml, pillow, etc. (esp. ones which implement custom types in C since these will often need the memory management APIs).

It may also be a good idea to check wrapper generators such as cython, swig, cffi, etc.
"""
| msg261445 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 16:55 | |
numpy: good!

* I patched Python 3.6 with pymem.patch of this issue + pymem-3.patch of issue #26516
* I had issues to run tests with Python 3.6 compiled in debug mode: http://bugs.python.org/issue26519 & https://github.com/numpy/numpy/issues/7399
* I ran the test suite: all tests pass, no bug related to memory allocators
* Tested numpy version: commit b92cc76afad2e74cbbf6f5b9f5b68050f7c8642a (Mar 7 2016)

Commands ran in numpy tests in a virtual environment:

    numpy$ python setup.py install
    numpy$ cd ..
    $ python -c 'import numpy; numpy.test()'
    (...)
    Ran 6206 tests in 280.986s
    OK (KNOWNFAIL=7, SKIP=6)
| msg261446 | Author: Antoine Pitrou (pitrou) | Date: 2016-03-09 16:57 | |
Victor, why do you insist on this instead of changing internal API calls in CPython?

| msg261447 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 16:58 | |
Antoine Pitrou added the comment:
> Victor, why do you insist on this instead of changing internal API calls in CPython?

https://mail.python.org/pipermail/python-dev/2016-February/143097.html

"There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free()."
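A count like the one quoted above can be approximated with a few lines of scripting. This is only a sketch under assumptions: a plain regular expression is used rather than a C parser, so macros, comments and string literals are not treated specially, and the helper names `count_calls` and `count_tree` are hypothetical.

```python
import re
from pathlib import Path

# Matches PyMem_Malloc(, PyMem_Realloc( and PyMem_Free( but not the
# PyMem_Raw* or PyObject_* families.
CALL_RE = re.compile(r"\bPyMem_(Malloc|Realloc|Free)\s*\(")


def count_calls(source):
    """Count call sites per function name in one piece of C source text."""
    counts = {"Malloc": 0, "Realloc": 0, "Free": 0}
    for match in CALL_RE.finditer(source):
        counts[match.group(1)] += 1
    return counts


def count_tree(root):
    """Sum the counts over every *.c and *.h file below `root`."""
    totals = {"Malloc": 0, "Realloc": 0, "Free": 0}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in {".c", ".h"}:
            text = path.read_text(errors="replace")
            for name, value in count_calls(text).items():
                totals[name] += value
    return totals


SAMPLE_C = """
char *buf = PyMem_Malloc(size);
buf = PyMem_Realloc(buf, size * 2);
PyMem_Free(buf);
void *raw = PyMem_RawMalloc(16);   /* different family, not counted */
"""

print(count_calls(SAMPLE_C))
```

Calling `count_tree()` on a CPython checkout would give a comparable figure, though the exact number depends on the revision and on what the original count included.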
| msg261448 | Author: Antoine Pitrou (pitrou) | Date: 2016-03-09 17:00 | |
> "There are 536 calls to the functions PyMem_Malloc(), PyMem_Realloc() and PyMem_Free()."

I'm sure you can use powerful tools such as "sed" ;-)

| msg261449 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:01 | |
> I'm sure you can use powerful tools such as "sed" ;-)

I guess that PyMem functions are used in third party C extensions modules. I expect (minor) speedup in these modules too.

I don't understand why we should keep a slow allocator if Python has a faster allocator?
| msg261450 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:05 | |
lxml: good!

* I patched Python 3.6 with pymem.patch of this issue + pymem-3.patch of issue #26516
* Tested lxml version: git commit 93ec66f6533995a7742278f9ba14b925149ac140 (Mar 8 2016)

    lxml$ make test
    (...)
    Ran 1735 tests in 27.663s
    OK
| msg261452 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:13 | |
Pillow: good

Note: I had to install JPEG headers (sudo dnf install -y libjpeg-turbo-devel).

Tested version: git commit 555544c5cfc3874deaac9cfa87780822ee714c0d (Mar 8 2016).

---
    Pillow$ python setup.py install
    Pillow$ python selftest.py
    Pillow$ python test-installed.py
    (...)
    Ran 671 tests in 8.458s
    FAILED (SKIP=124, errors=2)
---

The two errors are "OSError: decoder libtiff not available".
| msg261453 | Author: Antoine Pitrou (pitrou) | Date: 2016-03-09 17:13 | |
On 09/03/2016 18:01, STINNER Victor wrote:
> I don't understand why we should keep a slow allocator if Python has a faster allocator?

Define "slow". malloc() on Linux should be reasonably fast.

Do you think it's reasonable to risk breaking external libraries just for a hypothetic "performance improvement"?

Again, why don't you try simply changing internal calls?
| msg261454 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:15 | |
> Define "slow". malloc() on Linux should be reasonably fast.

See first messages of this issue for benchmark results. Some specific benchmarks are faster, none is slower.

> Do you think it's reasonable to risk breaking external libraries just for a hypothetic "performance improvement"?

Yes. It was discussed in the python-dev thread.

| msg261455 | Author: Antoine Pitrou (pitrou) | Date: 2016-03-09 17:20 | |
> Yes. It was discussed in the python-dev thread.

I'm talking about the performance improvement in third-party libraries, not the performance improvement in CPython itself which can be addressed by replacing the internal API calls.
| msg261456 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:27 | |
> I'm talking about the performance improvement in third-party libraries, not the performance improvement in CPython itself which can be addressed by replacing the internal API calls.

Oh ok. I don't know how to measure the performance of third-party libraries. I expect no speedup or a little speedup, but no slow-down.

> Do you think it's reasonable to risk breaking external libraries just for a hypothetic "performance improvement"?

The question is if my change really breaks anything in practice. I'm testing some popular C extensions to prepare an answer. Early results are that developers use the Python allocator API correctly :-)

I disagree on the fact that my change breaks any API. The API doc is clear. For example, you must use PyMem_Free() on memory allocated by PyMem_Malloc(). If you use free(), it fails badly with Python compiled in debug mode.

My issue #26516 "Add PYTHONMALLOC env var and add support for malloc debug hooks in release mode" may help developers to validate their own application.

I suggest you to continue the discussion on python-dev for a wider audience. I will test a few more projects before replying on the python-dev thread.
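The pairing rule stated above (memory from PyMem_Malloc() goes back through PyMem_Free(), not the C library free()) can be tried out from Python itself. This is a minimal sketch using `ctypes.pythonapi`, not how an extension would normally be written; the helper name `roundtrip` is hypothetical.

```python
import ctypes

# ctypes.pythonapi is a PyDLL handle: calls are made with the GIL held,
# which the PyMem_* functions require.
api = ctypes.pythonapi

api.PyMem_Malloc.argtypes = [ctypes.c_size_t]
api.PyMem_Malloc.restype = ctypes.c_void_p
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None


def roundtrip(payload):
    """Copy `payload` into a PyMem_Malloc() block, read it back, free it."""
    ptr = api.PyMem_Malloc(len(payload))
    if not ptr:
        raise MemoryError("PyMem_Malloc() returned NULL")
    try:
        ctypes.memmove(ptr, payload, len(payload))
        return ctypes.string_at(ptr, len(payload))
    finally:
        # Release with the function of the same family. Handing this
        # pointer to the C library free() is the misuse discussed above.
        api.PyMem_Free(ptr)


copied = roundtrip(b"pymalloc")
print(copied)
```

With the change proposed in this issue, a small block like this one may come from pymalloc rather than from the system malloc(), which is why mixing the two families stops being harmless.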
| msg261457 | Author: Antoine Pitrou (pitrou) | Date: 2016-03-09 17:28 | |
On 09/03/2016 18:27, STINNER Victor wrote:
>
> I disagree on the fact that my change breaks any API. The API doc is clear.

Does the API doc say anything about the GIL, for example? Or Valgrind?

> I suggest you to continue the discussion on python-dev for a wider audience. I will test a few more projects before replying on the python-dev thread.

I have no interest in going back and forth between the Python tracker and python-dev (especially since I hardly read python-dev these days). If you address my questions positively here I will be happy with the patch!
| msg261458 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:30 | |
cryptography: good

* Git commit 0681de7241dcbaec7b3dc85d3cf3944e4bec8309 (Mar 9 2016)

"4 failed, 77064 passed, 3096 skipped in 405.09 seconds"

1 error is related to the version number (probably an issue on how I run the tests), 3 errors are FileNotFoundError related to cryptography_vectors. At least, there is no Python fatal error related to memory allocators ;-)

--

Hum, just in case, I checked my venv:

    (ENV)haypo@smithers$ python -c 'import _testcapi; _testcapi.pymem_api_misuse()'
    ...
    Fatal Python error: bad ID: Allocated using API 'o', verified using API 'r'

    (ENV)haypo@smithers$ python -c 'import _testcapi; _testcapi.pymem_buffer_overflow()'
    ...
    Fatal Python error: bad trailing pad byte

It works ;-)
| msg261459 | Author: STINNER Victor (vstinner) | Date: 2016-03-09 17:36 | |
2016-03-09 18:28 GMT+01:00 Antoine Pitrou <report@bugs.python.org>:
> Does the API doc say anything about the GIL, for example? Or Valgrind?

For the GIL, yes, Python 3 doc is explicit:

https://docs.python.org/dev/c-api/memory.html#memory-interface

Red and bold warning: "The GIL must be held when using these functions."

Hum, sadly it looks like the warning is missing in the Python 2 doc.

The GIL was the motivation to introduce the PyMem_RawMalloc() function in Python 3.4.

For Valgrind: using the issue #26516, you will be able to use PYTHONMALLOC=malloc to use easily Valgrind even on a Python compiled in release mode (which is a new feature, before you had to manually recompile Python in debug mode with --with-valgrind).
| msg261488 | Author: STINNER Victor (vstinner) | Date: 2016-03-10 09:47 | |
> Does the API doc say anything about the GIL, for example?

I modified Python to add assert(PyGILState_Check()); in PyMem_Malloc() and other functions.

Sadly, I found a bug in Numpy: Numpy releases the GIL for performance but calls PyMem_Malloc() with the GIL released. I proposed a fix:

https://github.com/numpy/numpy/pull/7404

I guess that the fix is obvious and will be quickly merged, but it means that other libraries may have the issue.

Using the issue #26516 (PYTHONMALLOC=debug), we can check PyGILState_Check() at runtime, but there is currently an issue related to sub-interpreters. The assertion fails in support.run_in_subinterp(), function used by test_threading and test_capi for example.
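The check described above can be mimicked from Python for illustration. This sketch is not the patch from this issue: it wraps PyMem_Malloc() in a hypothetical helper, `checked_malloc`, that refuses to allocate unless PyGILState_Check() reports that the calling thread holds the GIL. Through `ctypes.pythonapi` the GIL is always held, so the check passes here; the NumPy bug was in C code that had released it.

```python
import ctypes

api = ctypes.pythonapi  # PyDLL: the GIL stays held during each call

api.PyGILState_Check.argtypes = []
api.PyGILState_Check.restype = ctypes.c_int
api.PyMem_Malloc.argtypes = [ctypes.c_size_t]
api.PyMem_Malloc.restype = ctypes.c_void_p
api.PyMem_Free.argtypes = [ctypes.c_void_p]
api.PyMem_Free.restype = None


def checked_malloc(size):
    """Allocate with PyMem_Malloc() only if the GIL is held.

    Mirrors the spirit of `assert(PyGILState_Check());` described above.
    """
    if not api.PyGILState_Check():
        raise RuntimeError("PyMem_Malloc() called without holding the GIL")
    ptr = api.PyMem_Malloc(size)
    if not ptr:
        raise MemoryError("PyMem_Malloc() returned NULL")
    return ptr


gil_held = api.PyGILState_Check()
block = checked_malloc(64)
block_ok = bool(block)
api.PyMem_Free(block)
print(gil_held, block_ok)
```

Code that needs to allocate while the GIL is released would use the PyMem_RawMalloc() family instead, which is the reason that family was introduced.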
| msg261749 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2016-03-14 12:58 | |
pymalloc.patch: Updated patch. | |||
| msg261766 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2016-03-14 16:12 | |
> Using the issue#26516 (PYTHONMALLOC=debug), we can check PyGILState_Check() at runtime, but there is currently an issue related to sub-interpreters. The assertion fails in support.run_in_subinterp(), function used by test_threading and test_capi for example.I created#26558 to implement GIL checks in PyMem_Malloc() and PyObject_Malloc(). | |||
| msg261788 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2016-03-14 22:54 | |
I created the issue #26563 "PyMem_Malloc(): check that the GIL is hold in debug hooks". | |||
| msg264020 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-04-22 14:38 | |
New changeset 68b2a43d8653 by Victor Stinner in branch 'default': PyMem_Malloc() now uses the fast pymalloc allocator https://hg.python.org/cpython/rev/68b2a43d8653 | |||
| msg264027 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-04-22 17:11 | |
New changeset 104ed24ebbd0 by Victor Stinner in branch 'default': Issue #26249: Try test_capi on Windows https://hg.python.org/cpython/rev/104ed24ebbd0 | |||
| msg264130 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-04-24 20:33 | |
New changeset 7acad5d8f80e by Victor Stinner in branch 'default': Issue #26249: Mention PyMem_Malloc() change in What's New in Python 3.6 in the https://hg.python.org/cpython/rev/7acad5d8f80e | |||
| msg264132 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2016-04-24 20:35 | |
I documented the change, buildbots are happy, I close the issue. | |||
| msg264174 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2016-04-25 13:27 | |
68b2a43d8653 introduced a memory leak.
$ ./python -m test.regrtest -uall -R : test_format
Run tests sequentially
0:00:00 [1/1] test_format
beginning 9 repetitions
123456789
.........
test_format leaked [6, 7, 7, 7] memory blocks, sum=27
1 test failed: test_format
Total duration: 0:00:01 | |||
| msg264245 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2016-04-26 10:36 | |
New changeset 090502a0c69c by Victor Stinner in branch 'default': Issue #25349, #26249: Fix memleak in formatfloat() https://hg.python.org/cpython/rev/090502a0c69c | |||
| msg264251 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2016-04-26 11:35 | |
> 68b2a43d8653 introduced memory leak.

I was very surprised to see a regression in test_format since I didn't make any change related to bytes, bytearray or str formatting in this issue.
In fact, it's much better than that! With PyMem_Malloc() using pymalloc, we benefit for free from the cheap "_Py_AllocatedBlocks" memory leak detector. I introduced the memory leak in the issue #25349 when I optimized bytes%args and bytearray%args using the new _PyBytesWriter API.
This memory leak gave me an idea, I opened the issue #26850: "PyMem_RawMalloc(): update also sys.getallocatedblocks() in debug mode". | |||
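The block-count leak check described above can be approximated from pure Python with sys.getallocatedblocks(). The following is a rough sketch in the spirit of `regrtest -R`, not the actual regrtest implementation; the helper name `block_deltas` and its parameters are hypothetical, and the counts it reports are noisy by nature, so only a consistently positive delta suggests a leak.

```python
import gc
import sys


def block_deltas(func, warmups=5, repetitions=4):
    """Return the change in allocated memory blocks for each tracked call.

    Hypothetical helper for illustration: `warmups` calls are discarded so
    that caches and lazily created objects settle before measuring.
    """
    for _ in range(warmups):
        func()
    deltas = []
    for _ in range(repetitions):
        gc.collect()  # drop collectable garbage before counting
        before = sys.getallocatedblocks()
        func()
        gc.collect()
        after = sys.getallocatedblocks()
        deltas.append(after - before)
    return deltas
```

A function that keeps appending new objects to a module-level list should report clearly positive deltas on every repetition, whereas a function that discards its temporaries should stay close to zero. Small non-zero values can come from the measurement itself, which is why several repetitions are compared rather than a single one.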
| msg264252 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2016-04-26 11:37 | |
There are no more known bugs related to this change, so I am closing the issue. Thanks for the test_format report, Serhiy; I missed it. | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:27 | admin | set | github: 70437 |
| 2016-04-26 11:37:12 | vstinner | set | status: open -> closed messages: +msg264252 |
| 2016-04-26 11:35:59 | vstinner | set | messages: +msg264251 |
| 2016-04-26 10:36:42 | python-dev | set | messages: +msg264245 |
| 2016-04-25 13:27:34 | serhiy.storchaka | set | status: closed -> open messages: +msg264174 |
| 2016-04-24 20:35:26 | vstinner | set | status: open -> closed resolution: fixed messages: +msg264132 |
| 2016-04-24 20:33:44 | python-dev | set | messages: +msg264130 |
| 2016-04-22 17:11:48 | python-dev | set | messages: +msg264027 |
| 2016-04-22 14:38:51 | python-dev | set | nosy: +python-dev messages: +msg264020 |
| 2016-03-14 22:54:44 | vstinner | set | messages: +msg261788 |
| 2016-03-14 16:12:33 | vstinner | set | messages: +msg261766 |
| 2016-03-14 12:58:16 | vstinner | set | files: +pymalloc.patch messages: +msg261749 |
| 2016-03-10 09:47:41 | vstinner | set | messages: +msg261488 |
| 2016-03-09 17:49:00 | yselivanov | set | nosy: -Yury.Selivanov,yselivanov |
| 2016-03-09 17:36:03 | vstinner | set | messages: +msg261459 |
| 2016-03-09 17:30:17 | vstinner | set | messages: +msg261458 |
| 2016-03-09 17:28:59 | pitrou | set | messages: +msg261457 |
| 2016-03-09 17:27:09 | vstinner | set | messages: +msg261456 |
| 2016-03-09 17:20:37 | pitrou | set | messages: +msg261455 |
| 2016-03-09 17:15:44 | vstinner | set | messages: +msg261454 |
| 2016-03-09 17:13:22 | pitrou | set | messages: +msg261453 |
| 2016-03-09 17:13:07 | vstinner | set | messages: +msg261452 |
| 2016-03-09 17:05:30 | vstinner | set | messages: +msg261450 |
| 2016-03-09 17:01:25 | vstinner | set | messages: +msg261449 |
| 2016-03-09 17:00:03 | pitrou | set | messages: +msg261448 |
| 2016-03-09 16:58:42 | vstinner | set | messages: +msg261447 |
| 2016-03-09 16:57:24 | pitrou | set | messages: +msg261446 |
| 2016-03-09 16:55:45 | vstinner | set | messages: +msg261445 |
| 2016-03-09 14:44:11 | vstinner | set | messages: +msg261433 |
| 2016-03-09 14:36:58 | vstinner | set | files: -pymem-3.patch |
| 2016-03-09 14:36:09 | vstinner | set | files: +pymem-3.patch messages: +msg261431 |
| 2016-03-09 14:26:35 | vstinner | set | messages: +msg261430 |
| 2016-03-09 14:24:45 | vstinner | set | title: Change PyMem_Malloc to use PyObject_Malloc allocator? -> Change PyMem_Malloc to use pymalloc allocator |
| 2016-02-22 14:04:09 | catalin.manciu | set | messages: +msg260681 |
| 2016-02-22 12:56:33 | vstinner | set | messages: +msg260675 |
| 2016-02-22 12:50:31 | catalin.manciu | set | files: +pymem_27.patch nosy: +catalin.manciu messages: +msg260674 |
| 2016-02-15 07:33:12 | alecsandru.patrascu | set | nosy: +alecsandru.patrascu |
| 2016-02-02 23:14:59 | vstinner | set | messages: +msg259445 |
| 2016-02-02 22:27:43 | vstinner | set | messages: +msg259441 |
| 2016-02-02 22:27:02 | vstinner | set | messages: +msg259440 |
| 2016-02-02 15:01:56 | vstinner | set | messages: +msg259395 |
| 2016-02-02 14:54:29 | vstinner | set | messages: +msg259393 |
| 2016-02-02 14:53:16 | pitrou | set | messages: +msg259392 |
| 2016-02-02 14:52:13 | pitrou | set | messages: +msg259391 |
| 2016-02-02 14:48:38 | vstinner | set | messages: +msg259390 |
| 2016-02-02 14:47:27 | vstinner | set | messages: +msg259389 |
| 2016-02-02 13:40:35 | vstinner | set | messages: +msg259385 |
| 2016-02-02 13:28:11 | pitrou | set | messages: +msg259384 |
| 2016-02-02 13:17:09 | Yury.Selivanov | set | nosy: +Yury.Selivanov messages: +msg259383 |
| 2016-02-02 12:00:31 | vstinner | set | messages: +msg259382 |
| 2016-02-02 11:12:50 | vstinner | set | files: +tu_malloc.c |
| 2016-02-02 11:12:45 | vstinner | set | files: +python_memleak.py messages: +msg259379 |
| 2016-02-02 11:10:45 | vstinner | set | messages: +msg259378 |
| 2016-02-02 11:06:53 | pitrou | set | messages: +msg259377 |
| 2016-02-02 11:06:14 | pitrou | set | nosy: +pitrou messages: +msg259376 |
| 2016-01-31 17:59:22 | vstinner | set | messages: +msg259297 |
| 2016-01-31 17:48:49 | vstinner | set | nosy: +jtaylor |
| 2016-01-31 17:48:24 | vstinner | create | |