Benchmarks
Below, rare is compared to various other common and popular tools.
It's worth noting that in many of these results rare is just as fast as the alternatives, but part of the reason is that it consumes CPU in a more efficient way (Go is great at parallelization). So take that into account, for better or worse.
All tests were done on ~83MB of gzip'd (1.5GB gunzip'd) nginx logs spread across 10 files. They were run on a spinning disk on an older machine. New machines run significantly faster.
Each program was run 3 times and the last time was taken (to make sure things were cached equally).
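The run-three-times, keep-the-last procedure can be sketched as a small shell helper. This is only an illustration of the method, not the script used to produce the numbers on this page: `bench_last` is a hypothetical name, it measures wall-clock time only, and it assumes GNU `date` (for the `%N` nanosecond format).

```shell
# Hypothetical helper (not part of rare): run a command several times and
# report only the final run's wall-clock time in seconds, so that the
# earlier runs serve to warm the page cache.
# Assumes GNU date, whose %N format prints nanoseconds.
bench_last() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s.%N)
        "$@" > /dev/null
        end=$(date +%s.%N)
        i=$((i + 1))
    done
    # Only the last iteration's start/end values survive the loop.
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example: time a short sleep three times and print the third measurement.
bench_last 3 sleep 0.1
```

To time a whole pipeline this way, wrap it in a single command, for example `bench_last 3 sh -c 'zcat testdata/*.gz | wc -l'`.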
rare
At no point while scanning the data does rare exceed ~4MB of resident memory.
$ rare -v
rare version 0.4.3, e0fc395; regex: re2

$ time rare filter -m '" (\d{3})' -e "{1}" -z testdata/*.gz | wc -l
Matched: 8,373,328 / 8,373,328
8373328

real    0m3.266s
user    0m10.607s
sys     0m0.769s
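One way to read these timings is to compare CPU time (user + sys) against wall-clock time (real). The sketch below does that arithmetic with the figures reported on this page for `rare filter` and for the `zcat | grep` pipeline; `cpu_ratio` is a hypothetical helper name, not a rare feature.

```shell
# Ratio of CPU time (user + sys) to wall-clock time (real), all in seconds.
# A value well above 1 means the work ran on several cores at once.
cpu_ratio() {
    awk -v real="$1" -v user="$2" -v sys="$3" \
        'BEGIN { printf "%.2f\n", (user + sys) / real }'
}

cpu_ratio 3.266 10.607 0.769    # rare filter        -> 3.48
cpu_ratio 11.272 16.239 1.989   # zcat | grep | wc   -> 1.62
```

A higher ratio alongside a lower wall-clock time is what the parallelization note at the top of this page refers to: rare finishes sooner partly because it keeps more cores busy.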
When aggregating data, rare is significantly faster than alternatives.
$ time rare histo -m '" (\d{3})' -e "{1}" -z testdata/*.gz
404    5,557,374
200    2,564,984
400    243,282
405    5,708
408    1,397
Matched: 8,373,328 / 8,373,328 (Groups: 8)
[9/9] 1.41 GB (514.25 MB/s)

real    0m2.870s
user    0m9.606s
sys     0m0.393s
And, as an alternative, using the dissect matcher instead of regex is slightly faster still:
$ time rare histo -d '" %{CODE} ' -e '{CODE}' -z testdata/*.gz
404    5,557,374
200    2,564,984
400    243,282
405    5,708
408    1,397
Matched: 8,373,328 / 8,373,328 (Groups: 8)
[9/9] 1.41 GB (531.11 MB/s)

real    0m2.533s
user    0m7.976s
sys     0m0.350s
pcre2
The PCRE2 version is approximately the same on a simple regular expression, but begins to shine on more complex regexes.
# Normal re2 version
$ time rare table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*.gz
       2020         2019
400    2,915,487    2,892,274
200    1,716,107    848,925
300    290          245
Matched: 8,373,328 / 8,373,328 (R: 3; C: 2)
[9/9] 1.41 GB (52.81 MB/s)

real    0m27.880s
user    1m28.782s
sys     0m0.824s

# libpcre2 version
$ time rare-pcre table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*.gz
       2020         2019
400    2,915,487    2,892,274
200    1,716,107    848,925
300    290          245
Matched: 8,373,328 / 8,373,328 (R: 3; C: 2)
[9/9] 1.41 GB (241.82 MB/s)

real    0m5.751s
user    0m20.173s
sys     0m0.461s
zcat & grep
$ time zcat testdata/*.gz | grep -Poa '" (\d{3})' | wc -l
8373328

real    0m11.272s
user    0m16.239s
sys     0m1.989s

$ time zcat testdata/* | grep -Poa '" 200' > /dev/null

real    0m5.416s
user    0m4.810s
sys     0m1.185s
I believe the largest holdup here is that zcat passes all the data to grep via a synchronous pipe, whereas rare can process everything in async batches. Using pigz or zgrep instead didn't yield different results, but on single-file results they did perform comparably.
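For readers who want to experiment with the single-pipe bottleneck described above, one rough approach with standard tools is to run a separate `zcat | grep` pipeline per file and sum the per-file counts. This is a sketch only: it is not how rare works internally, it was not part of the measurements on this page, `parallel_count` is a hypothetical name, and it assumes an `xargs` that supports `-0` and `-P` (GNU and BSD both do). It counts matching lines using an extended regex rather than `grep -P`.

```shell
# Hypothetical sketch: count matching lines across .gz files by running one
# `zcat | grep -c` pipeline per file, at most $1 of them at a time, and then
# summing the per-file counts.
# Usage: parallel_count JOBS PATTERN FILE...
parallel_count() {
    jobs=$1
    pattern=$2
    shift 2
    # File names are NUL-separated so that spaces in paths are safe.
    # Inside `sh -c`, $1 is the pattern and $2 is the file appended by xargs.
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P "$jobs" sh -c 'zcat "$2" | grep -c -E -e "$1"' sh "$pattern" |
        awk '{ total += $1 } END { print total + 0 }'
}

# Example (assumes the testdata directory used on this page):
# parallel_count 4 '" [0-9]{3}' testdata/*.gz
```

Whether this closes the gap depends on the disk and core count, so treat it as a starting point for your own measurements rather than a result.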
Ripgrep
Ripgrep (rg) is the most comparable for the use case, but lacks the complete functionality that rare exposes.
$ time rg -z '" (\d{3})' testdata/*.gz | wc -l
8373328

real    0m3.791s
user    0m8.149s
sys     0m4.420s
Other Tools
If there are other tools worth comparing, please create a new issue on the GitHub tracker.