Issue 20826

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Classification
Title: Faster implementation to collapse consecutive ip-networks
Type: performance
Stage: commit review
Components: Library (Lib)
Versions: Python 3.5

Process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: pmoody
Nosy List: exhuma, ezio.melotti, ncoghlan, pitrou, pmoody, python-dev
Priority: normal
Keywords: patch

Created on 2014-03-02 14:39 by exhuma, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name                        Uploaded                  Description
faster-collapse-addresses.patch  exhuma, 2014-03-02 14:38
testrig.zip                      exhuma, 2014-03-23 12:12  Test data and scripts.
faster_collapse.patch            pitrou, 2014-05-12 16:34
faster_collapse2.patch           pitrou, 2014-05-12 19:10
Messages (10)
msg212553 - Author: Michel Albert (exhuma), Date: 2014-03-02 14:38
This alternative implementation runs over the ``addresses`` collection only once, and "backtracks" only if necessary. Inspired by a "shift-reduce" approach.

Technically both are O(n), so the best case is always the same. But the old implementation runs over the *complete* list multiple times until it cannot make any more optimisations. The new implementation only repeats the optimisation on elements which require reconciliation.

Tests on a local machine have shown a considerable increase in speed on large collections of elements (iirc about twice as fast on average).
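As a rough illustration of the idea described here (a sketch, not the attached patch; ``collapse_shift_reduce`` is a made-up name): walk the sorted networks once, keeping a stack, and after each "shift" keep "reducing" the top of the stack while merges are possible, so backtracking goes exactly as far as needed and no further.

    import ipaddress

    def collapse_shift_reduce(nets):
        stack = []
        for net in sorted(nets):   # "shift" the next network
            stack.append(net)
            while len(stack) >= 2:
                a, b = stack[-2], stack[-1]
                if b.broadcast_address <= a.broadcast_address:
                    # Input is sorted, so a starts at or before b;
                    # hence b is entirely contained in a.
                    stack.pop()
                elif (a.prefixlen == b.prefixlen
                      and a.supernet() == b.supernet()):
                    # a and b are the two halves of one supernet:
                    # "reduce" them, then retry against the new top.
                    stack[-2:] = [a.supernet()]
                else:
                    break   # nothing left to reduce
        return stack

    nets = [ipaddress.ip_network(s) for s in
            ('192.168.1.0/31', '192.168.1.2/32', '192.168.1.3/32')]
    print(collapse_shift_reduce(nets))
    # -> [IPv4Network('192.168.1.0/30')]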
msg213154 - Author: pmoody (pmoody) (Python committer), Date: 2014-03-11 17:45
Hey Exhuma, thanks for the patch. Can you give me an example list on which this shift-reduce approach works much better?
msg214565 - Author: Michel Albert (exhuma), Date: 2014-03-23 12:12
Sorry for the late reply. I wanted to take some time and give a more detailed explanation, at least to the best of my abilities :)

I attached a zip file with my quick-and-dirty test rig. The zip contains:

    * gendata.py -- the script I used to generate test data
    * testdata.lst -- my test data set (for reproducibility)
    * tester.py -- a simple script using ``timeit.timeit``

I am not sure how sensitive the data I am working with is, so I prefer not to put any of the real data on a public forum. Instead, I wrote a small script which generates a data set that makes the performance difference visible (``gendata.py``). The data which I processed actually created an even worse case, but it's difficult to reproduce. The data set I used is in the file named ``testdata.lst``. I then ran the operation 5 times using ``timeit`` (tester.py).

Let me also outline what is happening: one "merge" operation can make a second one possible. For the sake of readability, let's use IPv4 addresses and consider the following list:

    [a_1, a_2, ..., a_n, 192.168.1.0/31, 192.168.1.2/32, 192.168.1.3/32, b_1, b_2, ..., b_n]

This can be reduced to:

    [a_1, a_2, ..., a_n, 192.168.1.0/31, 192.168.1.2/31, b_1, b_2, ..., b_n]

which in turn can be reduced to:

    [a_1, a_2, ..., a_n, 192.168.1.0/30, b_1, b_2, ..., b_n]

The current implementation sets a boolean (``optimized``) to ``True`` if any merge has been performed, and if so, re-runs over the whole list until no more optimisation is done. Those re-runs also include [a1..an] and [b1..bn], which is unnecessary. With the given data set, this gives the following result:

    Execution time: 48.27790632040014 seconds
    ./python tester.py  244.29s user 0.06s system 99% cpu 4:04.51 total

With the shift/reduce approach, only as many nodes are visited as necessary. If a "reduce" is made, it backtracks as far as necessary, but no further. So in the above example, the nodes [a1..an] are visited only once, and [b1..bn] are only visited once the example addresses have been completely optimised. With the given data set, this gives the following result:

    Execution time: 20.298685277199912 seconds
    ./python tester.py  104.20s user 0.14s system 99% cpu 1:44.58 total

If my thoughts are correct, both implementations should have a similar best case, but the worst case differs significantly. I am not well versed in Big-O notation, especially the best/average/worst case distinction, nor am I good at math, but I believe both are strictly speaking O(n) -- more precisely O(k*n), where k is proportional to the number of reconciliation steps needed (that is, how often one merge makes another merge possible). But k is much smaller in the shift/reduce approach, as only as many elements are revisited as necessary, instead of all of them.
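The merge chain above (minus the surrounding a_i/b_i elements) can be verified directly against the stdlib:

    import ipaddress

    nets = [ipaddress.ip_network(s) for s in
            ('192.168.1.0/31', '192.168.1.2/32', '192.168.1.3/32')]
    # The two /32s first merge into 192.168.1.2/31, which then
    # merges with 192.168.1.0/31 into a single /30.
    print(list(ipaddress.collapse_addresses(nets)))
    # -> [IPv4Network('192.168.1.0/30')]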
msg218305 - Author: Antoine Pitrou (pitrou) (Python committer), Date: 2014-05-12 00:18
From a quick look, the algorithm is indeed correct, and it should perform better on average than the previous one.

Note: both algorithms are O(n**2) worst case, not O(n).
msg218334 - Author: Antoine Pitrou (pitrou) (Python committer), Date: 2014-05-12 16:34
Here is a much faster patch, around 30x faster than the original code.

With exhuma's data set and tester.py, the original code gives:

    $ time python3.4 tester.py
    Execution time: 5.949284339199949 seconds
    real    0m30.152s
    user    0m30.104s
    sys     0m0.016s

The patched code gives:

    $ time ./python tester.py
    Execution time: 0.25444041779992405 seconds
    real    0m1.695s
    user    0m1.681s
    sys     0m0.012s

exhuma, perhaps you want to test with other data sets?
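The patch itself is only attached, not inlined. One plausible shape for this kind of constant-factor win (a hedged sketch, not necessarily what the patch does; ``merge_by_supernet`` is a made-up name) is to merge sibling networks through a dict keyed on their immediate supernet, instead of rescanning a list:

    import ipaddress

    def merge_by_supernet(nets):
        to_merge = list(nets)
        seen = {}
        while to_merge:
            net = to_merge.pop()
            parent = net.supernet()
            sibling = seen.get(parent)
            if sibling is None:
                seen[parent] = net
            elif sibling != net:
                # Both halves of `parent` are present: replace them
                # by the supernet and retry one level up.
                del seen[parent]
                to_merge.append(parent)
        # NB: a complete collapse also needs a pass that drops
        # networks subsumed by larger ones; omitted in this sketch.
        return sorted(seen.values())

    print(merge_by_supernet([ipaddress.ip_network('192.168.1.0/31'),
                             ipaddress.ip_network('192.168.1.2/31')]))
    # -> [IPv4Network('192.168.1.0/30')]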
msg218337 - Author: Antoine Pitrou (pitrou) (Python committer), Date: 2014-05-12 16:53
Uh, those measurements are wrong, sorry, since tester.py doesn't consume the iterator. When consuming the iterator, the patch is ~4x faster than the original code, which is more reasonable :-)
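The pitfall is generic: ``collapse_addresses()`` returns an iterator, so a benchmark has to drain it, e.g. with ``list()``. A minimal sketch (the input networks here are made up):

    import ipaddress
    import timeit

    nets = [ipaddress.ip_network('10.0.%d.0/24' % i) for i in range(256)]

    # list() forces the iterator to run; without it, timeit measures
    # only the cost of creating the iterator object.
    t = timeit.timeit(lambda: list(ipaddress.collapse_addresses(nets)),
                      number=5)
    print('Execution time:', t, 'seconds')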
msg218340 - Author: Antoine Pitrou (pitrou) (Python committer), Date: 2014-05-12 17:12
It seems like the speed of Network.supernet() is a bottleneck here. issue16531 would probably allow making supernet() much faster.
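For context: ``supernet()`` constructs a brand-new network object one prefix length up, so any merge loop that calls it per candidate pays that construction cost each time. A quick illustration:

    import ipaddress

    a = ipaddress.ip_network('192.168.1.0/31')
    b = ipaddress.ip_network('192.168.1.2/31')

    # Two networks are mergeable siblings iff they share the same
    # immediate supernet; each check builds a fresh network object.
    assert a.supernet() == b.supernet() == ipaddress.ip_network('192.168.1.0/30')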
msg218354 - Author: Antoine Pitrou (pitrou) (Python committer), Date: 2014-05-12 19:10
Updated patch, a bit faster yet. Now that issue16531 is committed, it is ~15x faster than 3.4.
msg218624 - Author: Roundup Robot (python-dev) (Python triager), Date: 2014-05-15 18:40
New changeset 8867874a2b7d by Antoine Pitrou in branch 'default':
Issue #20826: Optimize ipaddress.collapse_addresses().
http://hg.python.org/cpython/rev/8867874a2b7d
msg218625 - Author: Antoine Pitrou (pitrou) (Python committer), Date: 2014-05-15 19:05
I've now committed this. exhuma, if you have any further observations or results, don't hesitate to post them!
History
Date                 User          Action  Args
2022-04-11 14:57:59  admin         set     github: 65025
2014-05-15 19:05:45  pitrou        set     status: open -> closed; resolution: fixed; messages: +msg218625; stage: patch review -> commit review
2014-05-15 18:40:59  python-dev    set     nosy: +python-dev; messages: +msg218624
2014-05-12 19:10:21  pitrou        set     files: +faster_collapse2.patch; messages: +msg218354
2014-05-12 17:12:39  pitrou        set     messages: +msg218340
2014-05-12 16:53:17  pitrou        set     messages: +msg218337
2014-05-12 16:34:03  pitrou        set     files: +faster_collapse.patch; messages: +msg218334
2014-05-12 00:18:09  pitrou        set     nosy: +pitrou; messages: +msg218305
2014-05-09 12:48:04  ezio.melotti  set     nosy: +ezio.melotti; stage: patch review
2014-03-23 12:12:14  exhuma        set     files: +testrig.zip; messages: +msg214565
2014-03-11 17:45:31  pmoody        set     assignee: pmoody; messages: +msg213154
2014-03-02 14:39:00  exhuma        create