Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue5512

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:Streamline integer division
Type:performanceStage:
Components:Interpreter CoreVersions:Python 3.1, Python 2.7
process
Status:closedResolution:accepted
Dependencies:Superseder:
Assigned To: mark.dickinsonNosy List: mark.dickinson, pitrou, vstinner
Priority:normalKeywords:patch

Created on2009-03-18 22:11 bymark.dickinson, last changed2022-04-11 14:56 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
faster_integer_division.patchmark.dickinson,2009-03-18 22:11
pidigits_bestof.pymark.dickinson,2009-03-18 22:54Integer benchmark
pidigits-2.pyvstinner,2009-03-18 23:22
faster_integer_division2.patchmark.dickinson,2009-03-21 22:19
Messages (16)
msg83781 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-18 22:11
Here's a patch that streamlines the x_divrem function inObjects/longobject.c.  More benchmarks are needed, but the initial results are promising:  for py3k, on a 32-bit machine with 15-bit digits, Victor's pidigits benchmark runs almost twice as fast with this patch (numbers below).Patch details=============- Normalize inputs by shifting instead of multiplying and dividing.  This halves the number of divisions used in the algorithm.- Streamline innermost loop.- Save an iteration of outer loop around half the time.- Save an object allocation:  only 3 allocations per x_divrem call  instead of 4.- Remove special case where initial quotient estimate is >= PyLong_BASE.  There's no need for this, since the 'digit' type holds values up  to 2*PyLong_BASE - 1.- Make q type digit instead of type twodigits:  this halves the size  of the multiplication in the innermost loop.Benchmark results=================Using the pidigits_bestof.py script that's posted in theissue 4258 discussion, on a non-debug build of py3k (r70465), on OS X 10.5.6/Core 2 Duo:Unpatched---------Macintosh-3:py3k dickinsm$ ./python.exe ../pidigits_bestof.py 2000performing a warm up run...runningsys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)Time; 2234.6 msTime; 2232.2 msTime; 2227.9 msTime; 2225.7 msTime; 2229.8 msBest Time; 2225.7 msPatched-------dickinsm$ ./python.exe ../pidigits_bestof.py 2000performing a warm up run...runningsys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)Time; 1175.6 msTime; 1176.5 msTime; 1177.3 msTime; 1179.5 msTime; 1168.5 msBest Time; 1168.5 msSo the patch gives a speedup of around 90%.  This particular benchmark is heavy on the divisions though, so I'd expect lesser speedups for other benchmarks.
msg83782 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-18 22:17
Would it be possible to include integer benchmark tools to upstream? I mean the "pidigit" tool (I didn't wrote this test! I just ported it to Python3) and maybe my "bench_int" tool. It could useful to reproduce the benchmark on different Python versions and different computers. We may open a new issue for each tool?
msg83785 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-18 22:31
Sounds good to me!  An integer benchmark script in the Tools/ directorywould be handy.
msg83786 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2009-03-18 22:43
Well, the original code in pidigits is by me :-) (although it's notstated in the source file).I haven't tried the patch but the performance work you're doing isreally impressive, Mark!
msg83788 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2009-03-18 22:47
Results here (Athlon X2 3600+ with gcc in 64-bit mode with 30-bitdigits), "pidigits_bestof.py 2000":- unpatched: 1644.9 ms- patched: 694.8 ms!
msg83789 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-18 22:48
Useful informations for an integer benchmark: - CPU type: 32 or 64 bits, endian - Memory size - Python version - Informations about how Python was compiled - Integer implementation: use int_info when it's available, otherwise base=2^15Who has the last version of pidigit tool? Can you attach it here? :-)
msg83790 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-18 22:52
[Antoine]> Well, the original code in pidigits is by me :-)Apologies for the misattribution!
msg83791 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-18 22:54
Here's the version of pidigits that I was using.
msg83793 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-18 23:22
New version of pidigit: - works on python 2.6, trunk and py3k - display Python version, CPU info (bits, endian), PyLong info (base, digit size) - display usage if no argument is given
msg83794 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-18 23:22
Oh, there is no version number of the benchmark tool itself!
msg83795 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-18 23:27
My numbers:# No patch$ ./python /home/haypo/pidigits-2.py 3000Python 3.1a1+ (py3k:70466, Mar 18 2009, 23:56:06)[GCC 4.3.2]CPU: 64 bits, little endianPyLong: base=2^30, sizeof(digit)=32 bits(...)Best Time; 2300.2 ms# With faster_integer_division.patch(...)Best Time; 1138.1 msOk, it's two times faster on 64 bits CPU!!!Other notes about pidigits: - missing header (no author name, no description) - pidigit should also write the number of digits in its output to avoid mistakes in comparaisons
msg83953 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-21 22:19
Updated patch.  Lots of cleanup, but only one significant change:  the inner loop now uses signed arithmetic instead of unsigned arithmetic.  This saves a negation and fixes a subtle bug: the previous inner loop code was incorrect when using 15-bit digits on machines with sizeof(short) == sizeof(long).  Not that I know of any such machines:  the Cray T3E famously has no 16-bit integer type, but there sizeof(short) is 4 and sizeof(long) is 8.A few more timings, this time from doing a single huge integer division:  10**1000000//10**500000 (so this effectively times just the inner loop, since all else will be insignificant).  All timings are best-of-5, from non-debug builds of py3k, on the same system: OS X 10.5.6/2.4 GHz Core 2 Duo.  Times in brackets are the approximate per-inner-loop times (remembering that there are 4 times as many iterations of the inner loop for 15-bit digits).32-bit build, 15-bit digits, unpatched:  92382.2 ms   (~7.5 ns)32-bit build, 15-bit digits, patched:    36473.3 ms   (~3.0 ns)64-bit build, 30-bit digits, unpatched:  14581.4 ms   (~4.8 ns)64-bit build, 30-bit digits, patched:     7385.1 ms   (~2.4 ns)... and just for fun, the other combinations:64-bit build, 15-bit digits, unpatched:  61927.5 ms   (~5.1 ns)64-bit build, 15-bit digits, patched:    43632.9 ms   (~3.6 ns)32-bit build, 30-bit digits, unpatched:  62374.1 ms  (~20.3 ns)32-bit build, 30-bit digits, patched:    26928.3 ms   (~8.8 ns)Thanks for the updated pidigits script, Victor!  Maybe this is too small right now to be worth including in the Tools directory, but I hope we can fatten it up with some other benchmarks.  What do you think?
msg84027 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-23 20:04
Applied,r70542 andr70547.
msg84066 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-24 09:00
Would it be possible to port the patch to python trunk?
msg84067 -(view)Author: Mark Dickinson (mark.dickinson)*(Python committer)Date: 2009-03-24 09:10
Hi Victor!  I already applied the x_divrem patch to the trunk and to py3k.  Did you mean the release branch?I'd prefer not to backport this to 2.6 or 3.0:  it's not a new feature, but it's not a bugfix either and there's always some risk of breakage...
msg84068 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2009-03-24 09:11
> I already applied the x_divrem patch to the trunk and to py3kOops! I taught that you applied it to py3k and release30-maint :-/It's ok, trunk and py3k is enough ;-)
History
DateUserActionArgs
2022-04-11 14:56:46adminsetgithub: 49762
2009-03-24 09:11:59vstinnersetmessages: +msg84068
2009-03-24 09:10:01mark.dickinsonsetmessages: +msg84067
2009-03-24 09:00:07vstinnersetmessages: +msg84066
2009-03-23 20:04:49mark.dickinsonsetstatus: open -> closed
resolution: accepted
messages: +msg84027
2009-03-21 22:19:31mark.dickinsonsetfiles: +faster_integer_division2.patch

messages: +msg83953
2009-03-18 23:27:21vstinnersetmessages: +msg83795
2009-03-18 23:22:35vstinnersetmessages: +msg83794
2009-03-18 23:22:07vstinnersetfiles: +pidigits-2.py

messages: +msg83793
2009-03-18 22:54:54mark.dickinsonsetfiles: +pidigits_bestof.py

messages: +msg83791
2009-03-18 22:52:05mark.dickinsonsetmessages: +msg83790
2009-03-18 22:48:53vstinnersetmessages: +msg83789
2009-03-18 22:47:45pitrousetmessages: +msg83788
2009-03-18 22:43:56pitrousetnosy: +pitrou
messages: +msg83786
2009-03-18 22:31:29mark.dickinsonsetmessages: +msg83785
2009-03-18 22:17:09vstinnersetnosy: +vstinner
messages: +msg83782
2009-03-18 22:11:26mark.dickinsoncreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp