
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2009-03-18 22:11 bymark.dickinson, last changed2022-04-11 14:56 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| faster_integer_division.patch | mark.dickinson,2009-03-18 22:11 | |||
| pidigits_bestof.py | mark.dickinson,2009-03-18 22:54 | Integer benchmark | ||
| pidigits-2.py | vstinner,2009-03-18 23:22 | |||
| faster_integer_division2.patch | mark.dickinson,2009-03-21 22:19 | |||
| Messages (16) | |||
|---|---|---|---|
| msg83781 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-18 22:11 | |
Here's a patch that streamlines the x_divrem function inObjects/longobject.c. More benchmarks are needed, but the initial results are promising: for py3k, on a 32-bit machine with 15-bit digits, Victor's pidigits benchmark runs almost twice as fast with this patch (numbers below).Patch details=============- Normalize inputs by shifting instead of multiplying and dividing. This halves the number of divisions used in the algorithm.- Streamline innermost loop.- Save an iteration of outer loop around half the time.- Save an object allocation: only 3 allocations per x_divrem call instead of 4.- Remove special case where initial quotient estimate is >= PyLong_BASE. There's no need for this, since the 'digit' type holds values up to 2*PyLong_BASE - 1.- Make q type digit instead of type twodigits: this halves the size of the multiplication in the innermost loop.Benchmark results=================Using the pidigits_bestof.py script that's posted in theissue 4258 discussion, on a non-debug build of py3k (r70465), on OS X 10.5.6/Core 2 Duo:Unpatched---------Macintosh-3:py3k dickinsm$ ./python.exe ../pidigits_bestof.py 2000performing a warm up run...runningsys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)Time; 2234.6 msTime; 2232.2 msTime; 2227.9 msTime; 2225.7 msTime; 2229.8 msBest Time; 2225.7 msPatched-------dickinsm$ ./python.exe ../pidigits_bestof.py 2000performing a warm up run...runningsys.int_info= sys.int_info(bits_per_digit=15, sizeof_digit=2)Time; 1175.6 msTime; 1176.5 msTime; 1177.3 msTime; 1179.5 msTime; 1168.5 msBest Time; 1168.5 msSo the patch gives a speedup of around 90%. This particular benchmark is heavy on the divisions though, so I'd expect lesser speedups for other benchmarks. | |||
| msg83782 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-18 22:17 | |
Would it be possible to include integer benchmark tools to upstream? I mean the "pidigit" tool (I didn't wrote this test! I just ported it to Python3) and maybe my "bench_int" tool. It could useful to reproduce the benchmark on different Python versions and different computers. We may open a new issue for each tool? | |||
| msg83785 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-18 22:31 | |
Sounds good to me! An integer benchmark script in the Tools/ directorywould be handy. | |||
| msg83786 -(view) | Author: Antoine Pitrou (pitrou)*![]() | Date: 2009-03-18 22:43 | |
Well, the original code in pidigits is by me :-) (although it's notstated in the source file).I haven't tried the patch but the performance work you're doing isreally impressive, Mark! | |||
| msg83788 -(view) | Author: Antoine Pitrou (pitrou)*![]() | Date: 2009-03-18 22:47 | |
Results here (Athlon X2 3600+ with gcc in 64-bit mode with 30-bitdigits), "pidigits_bestof.py 2000":- unpatched: 1644.9 ms- patched: 694.8 ms! | |||
| msg83789 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-18 22:48 | |
Useful informations for an integer benchmark: - CPU type: 32 or 64 bits, endian - Memory size - Python version - Informations about how Python was compiled - Integer implementation: use int_info when it's available, otherwise base=2^15Who has the last version of pidigit tool? Can you attach it here? :-) | |||
| msg83790 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-18 22:52 | |
[Antoine]> Well, the original code in pidigits is by me :-)Apologies for the misattribution! | |||
| msg83791 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-18 22:54 | |
Here's the version of pidigits that I was using. | |||
| msg83793 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-18 23:22 | |
New version of pidigit: - works on python 2.6, trunk and py3k - display Python version, CPU info (bits, endian), PyLong info (base, digit size) - display usage if no argument is given | |||
| msg83794 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-18 23:22 | |
Oh, there is no version number of the benchmark tool itself! | |||
| msg83795 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-18 23:27 | |
My numbers:# No patch$ ./python /home/haypo/pidigits-2.py 3000Python 3.1a1+ (py3k:70466, Mar 18 2009, 23:56:06)[GCC 4.3.2]CPU: 64 bits, little endianPyLong: base=2^30, sizeof(digit)=32 bits(...)Best Time; 2300.2 ms# With faster_integer_division.patch(...)Best Time; 1138.1 msOk, it's two times faster on 64 bits CPU!!!Other notes about pidigits: - missing header (no author name, no description) - pidigit should also write the number of digits in its output to avoid mistakes in comparaisons | |||
| msg83953 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-21 22:19 | |
Updated patch. Lots of cleanup, but only one significant change: the inner loop now uses signed arithmetic instead of unsigned arithmetic. This saves a negation and fixes a subtle bug: the previous inner loop code was incorrect when using 15-bit digits on machines with sizeof(short) == sizeof(long). Not that I know of any such machines: the Cray T3E famously has no 16-bit integer type, but there sizeof(short) is 4 and sizeof(long) is 8.A few more timings, this time from doing a single huge integer division: 10**1000000//10**500000 (so this effectively times just the inner loop, since all else will be insignificant). All timings are best-of-5, from non-debug builds of py3k, on the same system: OS X 10.5.6/2.4 GHz Core 2 Duo. Times in brackets are the approximate per-inner-loop times (remembering that there are 4 times as many iterations of the inner loop for 15-bit digits).32-bit build, 15-bit digits, unpatched: 92382.2 ms (~7.5 ns)32-bit build, 15-bit digits, patched: 36473.3 ms (~3.0 ns)64-bit build, 30-bit digits, unpatched: 14581.4 ms (~4.8 ns)64-bit build, 30-bit digits, patched: 7385.1 ms (~2.4 ns)... and just for fun, the other combinations:64-bit build, 15-bit digits, unpatched: 61927.5 ms (~5.1 ns)64-bit build, 15-bit digits, patched: 43632.9 ms (~3.6 ns)32-bit build, 30-bit digits, unpatched: 62374.1 ms (~20.3 ns)32-bit build, 30-bit digits, patched: 26928.3 ms (~8.8 ns)Thanks for the updated pidigits script, Victor! Maybe this is too small right now to be worth including in the Tools directory, but I hope we can fatten it up with some other benchmarks. What do you think? | |||
| msg84027 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-23 20:04 | |
Applied,r70542 andr70547. | |||
| msg84066 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-24 09:00 | |
Would it be possible to port the patch to python trunk? | |||
| msg84067 -(view) | Author: Mark Dickinson (mark.dickinson)*![]() | Date: 2009-03-24 09:10 | |
Hi Victor! I already applied the x_divrem patch to the trunk and to py3k. Did you mean the release branch?I'd prefer not to backport this to 2.6 or 3.0: it's not a new feature, but it's not a bugfix either and there's always some risk of breakage... | |||
| msg84068 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2009-03-24 09:11 | |
> I already applied the x_divrem patch to the trunk and to py3kOops! I taught that you applied it to py3k and release30-maint :-/It's ok, trunk and py3k is enough ;-) | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:46 | admin | set | github: 49762 |
| 2009-03-24 09:11:59 | vstinner | set | messages: +msg84068 |
| 2009-03-24 09:10:01 | mark.dickinson | set | messages: +msg84067 |
| 2009-03-24 09:00:07 | vstinner | set | messages: +msg84066 |
| 2009-03-23 20:04:49 | mark.dickinson | set | status: open -> closed resolution: accepted messages: +msg84027 |
| 2009-03-21 22:19:31 | mark.dickinson | set | files: +faster_integer_division2.patch messages: +msg83953 |
| 2009-03-18 23:27:21 | vstinner | set | messages: +msg83795 |
| 2009-03-18 23:22:35 | vstinner | set | messages: +msg83794 |
| 2009-03-18 23:22:07 | vstinner | set | files: +pidigits-2.py messages: +msg83793 |
| 2009-03-18 22:54:54 | mark.dickinson | set | files: +pidigits_bestof.py messages: +msg83791 |
| 2009-03-18 22:52:05 | mark.dickinson | set | messages: +msg83790 |
| 2009-03-18 22:48:53 | vstinner | set | messages: +msg83789 |
| 2009-03-18 22:47:45 | pitrou | set | messages: +msg83788 |
| 2009-03-18 22:43:56 | pitrou | set | nosy: +pitrou messages: +msg83786 |
| 2009-03-18 22:31:29 | mark.dickinson | set | messages: +msg83785 |
| 2009-03-18 22:17:09 | vstinner | set | nosy: +vstinner messages: +msg83782 |
| 2009-03-18 22:11:26 | mark.dickinson | create | |