Benchmarks
Below, rare is compared with other common tools on user CPU time and real (wall-clock) time.
It's worth noting that in many of these results rare is just as fast in real time, but part of the reason is that it spreads work across the CPU more effectively (Go is great at parallelization), which shows up as higher user time. So take that into account, for better or worse.
All tests were done on ~83MB of gzip'd (1.5GB gunzip'd) nginx logs spread across 10 files.
Each program was run 3 times and the last time was taken (to make sure things were cached equally).
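The "run 3 times, keep the last" procedure can be scripted. The following is a minimal bash sketch, not the script used to produce the numbers on this page: `last_run_time` is a hypothetical helper name, and it relies on bash's `time` keyword and its `TIMEFORMAT` variable, so it will not work in a plain POSIX `sh`.

```shell
#!/usr/bin/env bash
# Run a command several times and report only the timing of the final run,
# so that the earlier runs serve to warm the filesystem cache.
# `last_run_time` is a hypothetical helper, not part of rare.
last_run_time() {
  local runs=$1 i timing=""
  shift
  for ((i = 1; i <= runs; i++)); do
    # bash's `time` keyword reports on the shell's stderr, so capture that
    # stream and discard the benchmarked command's own output.
    timing=$( {
      TIMEFORMAT='real %R  user %U  sys %S'
      time "$@" >/dev/null 2>&1
    } 2>&1 )
  done
  # Only the timing of the last iteration survives the loop.
  printf '%s\n' "$timing"
}

# Example (pipelines need a wrapping shell, since "$@" is a single command):
#   last_run_time 3 sh -c "zcat testdata/* | grep -Poa '\" 200'"
```

Because the benchmarked command's output is discarded, this variant only suits runs where the timing, not the result, is of interest.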
zcat & grep
$ time zcat testdata/* | grep -Poa '" (\d{3})' | wc -l
8373328

real 0m11.272s
user 0m16.239s
sys 0m1.989s

$ time zcat testdata/* | grep -Poa '" 200' > /dev/null

real 0m5.416s
user 0m4.810s
sys 0m1.185s
I believe the largest holdup here is that zcat passes all of the data to grep through a synchronous pipe, whereas rare can process everything in async batches. Using pigz instead didn't yield different results, but on single-file results they did perform comparably.
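One way to probe the single-pipe hypothesis is to give every file its own decompress-and-grep pipeline and run several of them at once. The sketch below is an assumption-laden illustration, not one of the measurements on this page and not a description of rare's internals: `parallel_count` is a hypothetical helper, `xargs -0 -P` is a GNU/BSD extension rather than POSIX, and it counts matching lines with an extended regex (`-E`) instead of the `-Po` form used above.

```shell
#!/usr/bin/env bash
# Count lines matching a pattern across gzip'd files, using one
# "gzip -dc | grep" pipeline per file and up to $jobs pipelines at a time.
# `parallel_count` is a hypothetical helper, not part of rare.
parallel_count() {
  local jobs=$1 pattern=$2
  shift 2
  # NUL-separate the file names so unusual characters survive xargs.
  printf '%s\0' "$@" |
    # Inside sh -c: $0 is the pattern, $1 is the file handed over by xargs.
    # grep -c exits non-zero on zero matches, hence the "|| true".
    xargs -0 -n 1 -P "$jobs" \
      sh -c 'gzip -dc -- "$1" | grep -a -c -E -- "$0" || true' "$pattern" |
    # Each pipeline prints one per-file count; add them up.
    awk '{ total += $1 } END { print total + 0 }'
}

# Example: parallel_count 4 '" [0-9]{3}' testdata/*
```

Timing this against the single `zcat testdata/* | grep` pipeline would indicate how much of the gap comes from the shared pipe rather than from grep itself.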
Silver Searcher (ag)
Warning
ag version 2.2.0 has a bug where it won't scan all of my testdata. I'll hold off on benchmarking until there's a fix.
Old Benchmark (Less data by factor of ~8x)
$ ag --version
ag version 2.2.0
Features: +jit +lzma +zlib

$ time ag -z '" (\d{3})' testdata/* | wc -l
1131354

real 0m3.944s
user 0m3.904s
sys 0m0.152s
rare
At no point while scanning the data does rare exceed ~76MB of resident memory.
$ rare -v
rare version 0.1.16, 11ca2bfc4ad35683c59929a74ad023cc762a29ae

$ time rare filter -m '" (\d{3})' -e "{1}" -z testdata/* | wc -l
Matched: 8,373,328 / 8,373,328
8373328

real 0m16.192s
user 0m20.298s
sys 0m20.697s

$ time rare histo -m '" (\d{3})' -e "{1}" -z testdata/*
404    5,557,374
200    2,564,984
400    243,282
405    5,708
408    1,397
Matched: 8,373,328 / 8,373,328 (Groups: 8)

real 0m3.869s
user 0m13.423s
sys 0m0.191s
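The introduction's point about parallelization can be read off the `time` output: dividing CPU time (user + sys) by real time gives a rough count of how many cores were kept busy. The sketch below does that arithmetic in POSIX shell with awk; `to_seconds` and `cpu_ratio` are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# Convert a bash-style duration such as "1m40.060s" to seconds.
to_seconds() {
  awk -v t="$1" 'BEGIN {
    split(t, part, "m")       # part[1] = minutes, part[2] = "SS.mmms"
    sub(/s$/, "", part[2])    # drop the trailing "s"
    printf "%.3f\n", part[1] * 60 + part[2]
  }'
}

# Print (user + sys) / real: roughly the average number of busy cores.
# Arguments: real user sys, in the format printed by `time`.
cpu_ratio() {
  awk -v r="$(to_seconds "$1")" \
      -v u="$(to_seconds "$2")" \
      -v s="$(to_seconds "$3")" \
      'BEGIN { printf "%.2f\n", (u + s) / r }'
}

cpu_ratio 0m3.869s 0m13.423s 0m0.191s    # rare histo       -> 3.52
cpu_ratio 0m11.272s 0m16.239s 0m1.989s   # zcat | grep | wc -> 1.62
```

On these figures, rare histo kept roughly three and a half cores busy, against about one and a half for the zcat and grep pipeline, which is why its real time is lower even though its user time is not.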
pcre2
The PCRE2 version is approximately the same speed on a simple regular expression, but begins to shine on more complex regexes.
$ time rare table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*
       2020         2019
400    2,915,487    2,892,274
200    1,716,107    848,925
300    290          245
Matched: 8,373,328 / 8,373,328 (R: 3; C: 2)

real 0m31.419s
user 1m40.060s
sys 0m0.657s

$ time rare-pcre table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*
       2020         2019
400    2,915,487    2,892,274
200    1,716,107    848,925
300    290          245
Matched: 8,373,328 / 8,373,328 (R: 3; C: 2)

real 0m7.936s
user 0m27.600s
sys 0m0.301s