
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Created on 2015-01-18 12:32 by cmn, last changed 2022-04-11 14:58 by admin. This issue is now closed.
**Files**

| File name | Uploaded | Description |
|---|---|---|
| ipaddress_collapse_non_consecutive_performance.diff | cmn, 2015-01-18 12:32 | The patch on ipaddress.py |
| testrig.tar.gz | cmn, 2015-01-18 12:40 | test rig for performance eval |
| collapse.patch | pitrou, 2015-01-18 13:25 | review |
| ipaddress_faster_collapse3.patch | serhiy.storchaka, 2015-01-18 21:53 | review |
| ipaddress_faster_collapse4.patch | cmn, 2015-01-20 19:03 | |
**Messages (15)**

**msg234239** - Author: Markus (cmn)* | Date: 2015-01-18 12:32

I found the code used to collapse addresses to be very slow on a large number (64k) of island addresses which are not collapsible. The code at https://github.com/python/cpython/blob/0f164ccc85ff055a32d11ad00017eff768a79625/Lib/ipaddress.py#L349 was found to be guilty, especially the index lookup. The patch changes the code to discard the index lookup and have _find_address_range return the number of items consumed. That way the set operation to dedup the addresses can be dropped as well.

Numbers from the testrig I adapted from http://bugs.python.org/issue20826, with 8k non-consecutive addresses:

Execution time: 0.6893927365541458 seconds
vs.
Execution time: 12.116527611762285 seconds

Kind regards,
Markus Kötter
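For context, here is a minimal runnable sketch of the change this message describes, reconstructed from the description rather than from the committed diff; `collapse_sketch` and `_find_address_range_sketch` are hypothetical names, and the exact loop shape is an assumption:

```python
import ipaddress
from ipaddress import summarize_address_range

def _find_address_range_sketch(addresses):
    # Assumed shape of the patched helper: return the first and last address
    # of the leading consecutive run, plus the number of items consumed.
    first = last = addresses[0]
    items = 1
    for ip in addresses[1:]:
        if int(ip) != int(last) + 1:
            break
        last = ip
        items += 1
    return first, last, items

def collapse_sketch(addresses):
    ips = sorted(addresses)
    addrs = []
    i = 0
    while i < len(ips):
        first, last, items = _find_address_range_sketch(ips[i:])
        # Before the patch this step was `i = ips.index(last) + 1`, a linear
        # search on every iteration -- O(n**2) for n non-collapsible islands.
        i += items
        addrs.extend(summarize_address_range(first, last))
    return addrs

# Eight "island" addresses: nothing is consecutive, nothing collapses.
ips = [ipaddress.ip_address('2001:db8::%x' % (i * 2)) for i in range(8)]
print(collapse_sketch(ips))   # eight separate /128 networks
```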
**msg234240** - Author: Markus (cmn)* | Date: 2015-01-18 12:40

Added the testrig.
**msg234241** - Author: Antoine Pitrou (pitrou)* | Date: 2015-01-18 13:24

This is great, thank you. Can you sign the contributor's agreement? https://www.python.org/psf/contrib/contrib-form/
**msg234242** - Author: Antoine Pitrou (pitrou)* | Date: 2015-01-18 13:25

Here is an updated patch with a fix to the tests and docstrings.
**msg234245** - Author: Markus (cmn)* | Date: 2015-01-18 14:19

I just signed the agreement, ewa@ is processing it.
**msg234253** - Author: Roundup Robot (python-dev) | Date: 2015-01-18 15:24

New changeset f7508a176a09 by Antoine Pitrou in branch 'default':
Issue #23266: Much faster implementation of ipaddress.collapse_addresses() when there are many non-consecutive addresses.
https://hg.python.org/cpython/rev/f7508a176a09
**msg234254** - Author: Antoine Pitrou (pitrou)* | Date: 2015-01-18 15:25

Ok, I've committed the patch. Thank you!
**msg234282** - Author: Serhiy Storchaka (serhiy.storchaka)* | Date: 2015-01-18 21:53

Deduplication should not be omitted. Omitting it slowed down the collapsing of duplicated addresses.

$ ./python -m timeit -s "import ipaddress; ips = [ipaddress.ip_address('2001:db8::1000') for i in range(1000)]" -- "ipaddress.collapse_addresses(ips)"

Before f7508a176a09: 100 loops, best of 3: 13.4 msec per loop
After f7508a176a09: 10 loops, best of 3: 129 msec per loop

Proposed patch restores performance for duplicated addresses and simplifies the code using generators.
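A minimal sketch of the generator-based approach this message describes, assuming the shape implied by the description (the committed patch may differ; the real module uses the private `._ip` attribute where this sketch uses the public `int()` conversion):

```python
import ipaddress

def collapse_sketch(addresses):
    """Sketch: sorted(set(...)) restores the duplicate fast path; the scan
    then yields summarized networks lazily instead of building a list."""
    ips = sorted(set(addresses))
    if not ips:
        return
    first = last = ips[0]
    for ip in ips[1:]:
        if int(ip) == int(last) + 1:   # extends the current consecutive run
            last = ip
        else:
            yield from ipaddress.summarize_address_range(first, last)
            first = last = ip
    yield from ipaddress.summarize_address_range(first, last)

# 1000 duplicates of one address reduce to a single element before the scan.
ips = [ipaddress.ip_address('2001:db8::1000') for i in range(1000)]
print(list(collapse_sketch(ips)))      # [IPv6Network('2001:db8::1000/128')]
```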
**msg234283** - Author: Antoine Pitrou (pitrou)* | Date: 2015-01-18 22:08

Good catch. What is the performance on the benchmark posted here?
**msg234286** - Author: Serhiy Storchaka (serhiy.storchaka)* | Date: 2015-01-18 22:28

> What is the performance on the benchmark posted here?

The same as with the current code.
**msg234287** - Author: Antoine Pitrou (pitrou)* | Date: 2015-01-18 22:37

Then +1. The patch looks fine to me.
**msg234288** - Author: Roundup Robot (python-dev) | Date: 2015-01-18 22:42

New changeset 021b23a40f9f by Serhiy Storchaka in branch 'default':
Issue #23266: Restore the performance of ipaddress.collapse_addresses() whith
https://hg.python.org/cpython/rev/021b23a40f9f
**msg234382** - Author: Markus (cmn)* | Date: 2015-01-20 19:03

My initial patch was wrong with respect to _find_address_range: it did not loop over equal addresses. That's why performance with many equal addresses was degraded when dropping the set(). Here is a patch to fix _find_address_range, drop the set, and improve performance again.

python3 -m timeit -s "import bipaddress; ips = [bipaddress.ip_address('2001:db8::1000') for i in range(1000)]" -- "bipaddress.collapse_addresses(ips)"
1000 loops, best of 3: 1.76 msec per loop

python3 -m timeit -s "import aipaddress; ips = [aipaddress.ip_address('2001:db8::1000') for i in range(1000)]" -- "aipaddress.collapse_addresses(ips)"
1000 loops, best of 3: 1.32 msec per loop
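A sketch of the fix this message describes, under the same assumptions as the earlier sketch (hypothetical helper name, public `int()` instead of the module's private attribute): while scanning the sorted input, a duplicate of `last` now extends the current run instead of terminating it.

```python
import ipaddress

def _find_address_range_sketch(addresses):
    """Return (first, last, items consumed) for the leading run of a
    sorted list, where duplicates are consumed as part of the run."""
    first = last = addresses[0]
    items = 1
    for ip in addresses[1:]:
        # Sorted input: a difference of 0 is a duplicate, 1 is consecutive;
        # both extend the run (the original bug stopped on duplicates).
        if int(ip) - int(last) > 1:
            break
        last = ip
        items += 1
    return first, last, items

ips = sorted(ipaddress.ip_address(a) for a in
             ('2001:db8::1', '2001:db8::1', '2001:db8::2', '2001:db8::9'))
print(_find_address_range_sketch(ips))  # consumes both ::1 duplicates and ::2
```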
**msg234389** - Author: Serhiy Storchaka (serhiy.storchaka)* | Date: 2015-01-20 20:22

Only one duplicated address is a degenerate case. When there are a lot of duplicated addresses in the range, the patch causes a regression.

$ ./python -m timeit -s "import ipaddress; ips = [ipaddress.ip_address('2001:db8::%x' % (i%100)) for i in range(100000)]" -- "ipaddress.collapse_addresses(ips)"

Unpatched: 10 loops, best of 3: 369 msec per loop
Patched: 10 loops, best of 3: 1.04 sec per loop
**msg234410** - Author: Markus (cmn)* | Date: 2015-01-20 23:48

Eliminating duplicates before processing is faster once the overhead of the set operation is less than the time required to sort the larger dataset with duplicates. So we are basically comparing sort(data) to sort(set(data)). The optimum depends on the input data.

python3 -m timeit -s "import random; import bipaddress; ips = [bipaddress.ip_address('2001:db8::') + i for i in range(100000)]; random.shuffle(ips)" -- "bipaddress.collapse_addresses(ips)"
10 loops, best of 3: 1.49 sec per loop
vs.
10 loops, best of 3: 1.59 sec per loop

If the data is pre-sorted, possible if you retrieve it from a database, things are drastically different:

python3 -m timeit -s "import random; import bipaddress; ips = [bipaddress.ip_address('2001:db8::') + i for i in range(100000)]" -- "bipaddress.collapse_addresses(ips)"
10 loops, best of 3: 136 msec per loop
vs.
10 loops, best of 3: 1.57 sec per loop

So for my use case, with less than 0.1% duplicates (if any), dropping the set would be better, but other use cases exist.

Still, it is easy to "emulate" the use of sorted(set()) from a user's perspective: just call collapse_addresses(set(data)) if you expect duplicates, and get a speedup by passing in unique, possibly even sorted, data (see the sketch below). On the other hand, if you have a huge load of 99.99% sorted, non-collapsible addresses, there is no way for the user to drop the set() operation inside sorted(set()), no way to speed things up, and the slowdown you get is 10x.

That said, I'd drop the set(). Optimization depends on the input data, and dropping the set() lets users optimize based on the nature of their input.
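The user-side workaround suggested above looks like this in practice; a usage sketch where the input lists are stand-ins:

```python
import ipaddress

# Expecting many duplicates? Dedup at the call site to regain the set() speedup.
dup_addrs = [ipaddress.ip_address('2001:db8::1000')] * 1000
nets = list(ipaddress.collapse_addresses(set(dup_addrs)))

# Input already unique (and perhaps pre-sorted)? Pass it through untouched.
unique_addrs = [ipaddress.ip_address('2001:db8::') + i for i in range(1000)]
nets = list(ipaddress.collapse_addresses(unique_addrs))
```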
**History**

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:12 | admin | set | github: 67455 |
| 2015-01-20 23:48:27 | cmn | set | messages: +msg234410 |
| 2015-01-20 20:22:47 | serhiy.storchaka | set | messages: +msg234389 |
| 2015-01-20 19:03:10 | cmn | set | files: +ipaddress_faster_collapse4.patch; messages: +msg234382 |
| 2015-01-18 22:43:28 | serhiy.storchaka | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved |
| 2015-01-18 22:42:20 | python-dev | set | messages: +msg234288 |
| 2015-01-18 22:37:18 | pitrou | set | messages: +msg234287 |
| 2015-01-18 22:28:38 | serhiy.storchaka | set | messages: +msg234286 |
| 2015-01-18 22:08:30 | pitrou | set | messages: +msg234283 |
| 2015-01-18 21:53:11 | serhiy.storchaka | set | status: closed -> open; files: +ipaddress_faster_collapse3.patch; nosy: +serhiy.storchaka; messages: +msg234282; resolution: fixed -> (no value); stage: resolved -> patch review |
| 2015-01-18 15:25:17 | pitrou | set | status: open -> closed; resolution: fixed; messages: +msg234254; stage: patch review -> resolved |
| 2015-01-18 15:24:23 | python-dev | set | nosy: +python-dev; messages: +msg234253 |
| 2015-01-18 14:19:48 | cmn | set | messages: +msg234245 |
| 2015-01-18 13:25:46 | pitrou | set | files: +collapse.patch; messages: +msg234242 |
| 2015-01-18 13:24:58 | pitrou | set | nosy: +pmoody, pitrou; messages: +msg234241 |
| 2015-01-18 13:14:50 | pitrou | set | stage: patch review |
| 2015-01-18 12:40:52 | cmn | set | files: +testrig.tar.gz; messages: +msg234240 |
| 2015-01-18 12:32:34 | cmn | create | |