Benchmarks

Below, rare is compared to various other common and popular tools.

It's worth noting that in many of these results rare is just as fast or faster in part because it makes more effective use of the CPU by spreading work across cores (Go is great at parallelization); compare the user and real times below. So take that into account, for better or worse.

All tests were done on ~824MB of gzip'd (13.93 GB gunzip'd) nginx logs spread across 8 files. They were run on an NVMe SSD on a recent (2025) machine.

Each program was run 3 times and the timing of the last run was taken (to make sure the data was cached equally for every tool).
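As a rough sketch of that methodology (the loop is purely illustrative; the invocation is the filter command shown below, and only the final run's timing would be recorded):

# Repeat the same invocation three times so the page cache is warm,
# then keep only the timing reported by the final run.
for i in 1 2 3; do
  time rare filter -m '" (\d{3})' -e "{1}" -z testdata/*.gz | wc -l
done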

rare

At no point while scanning the data does rare exceed ~42MB of resident memory. Buffer sizes can be tweaked to adjust memory usage.

$ rare -v
rare version 0.4.3, e0fc395; regex: re2
$ time rare filter -m '" (\d{3})' -e "{1}" -z testdata/*.gz | wc -l
Matched: 82,733,280 / 82,733,280
82733280

real    0m3.409s
user    0m32.750s
sys     0m2.175s

When aggregating data, rare is significantly faster than the alternatives.

$ time rare histo -m '" (\d{3})' -e "{1}" -z testdata/*.gz
404    54,843,840
200    25,400,160
400    2,412,960
405    56,640
408    13,920
Matched: 82,733,280 / 82,733,280 (Groups: 8) [8/8] 13.93GB (4.27GB/s)

real    0m3.283s
user    0m31.485s
sys     0m1.497s

And, as an alternative, using the dissect matcher instead of a regex is even slightly faster:

$ time rare histo -d '" %{CODE} ' -e '{CODE}' -z testdata/*.gz
404    54,843,840
200    25,400,160
400    2,412,960
405    56,640
408    13,920
Matched: 82,733,280 / 82,733,280 (Groups: 8) [8/8] 13.93GB (5.61GB/s)

real    0m2.546s
user    0m22.922s
sys     0m1.491s

pcre2

The PCRE2 version performs approximately the same on a simple regular expression, but begins to shine on more complex regexes.

# Normal re2 version
$ time rare table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*.gz
        2020         2019
400     28,994,880   28,332,480
200     17,084,640   8,316,000
300     2,880        2,400
Matched: 82,733,280 / 82,733,280 (R: 3; C: 2) [8/8] 13.93GB (596.89MB/s)

real    0m23.819s
user    3m52.252s
sys     0m1.625s

# libpcre2 version
$ time rare -pcre table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*.gz
        2020         2019
400     28,994,880   28,332,480
200     17,084,640   8,316,000
300     2,880        2,400
Matched: 82,733,280 / 82,733,280 (R: 3; C: 2) [8/8] 13.93GB (2.10GB/s)

real    0m6.813s
user    1m15.638s
sys     0m1.985s

zcat & grep

$ time zcat testdata/*.gz | grep -Poa '" (\d{3})' | wc -l
82733280

real    0m28.414s
user    0m35.268s
sys     0m1.865s

$ time zcat testdata/*.gz | grep -Poa '" 200' > /dev/null

real    0m28.616s
user    0m27.517s
sys     0m1.658s

I believe the largest holdup here is the fact that zcat passes all the data to grep via a single synchronous pipe, whereas rare can process everything in async batches. Using pigz or zgrep instead didn't yield different results, but on single-file tests they did perform comparably.
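To illustrate the difference, here is a rough, hypothetical sketch of getting per-file parallelism out of standard tools by fanning the files out with xargs instead of funnelling everything through one pipe (the -P 8 worker count and the use of zgrep are assumptions, and this variant was not part of the benchmark):

# Single synchronous pipe: zcat feeds one grep process (as benchmarked above).
zcat testdata/*.gz | grep -Poa '" (\d{3})' | wc -l

# Hypothetical per-file fan-out: one zgrep per file, up to 8 at a time.
# Each match is printed on its own short line, so wc -l still counts matches,
# though output from concurrent workers can interleave.
printf '%s\n' testdata/*.gz | xargs -P 8 -n 1 zgrep -Poa '" (\d{3})' | wc -l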

Ripgrep

Ripgrep (rg) is the most comparable tool for this use-case, but it lacks the aggregation functionality (histograms, tables, and so on) that rare exposes.

$ time rg -z '" (\d{3})' testdata/*.gz | wc -l
82733280

real    0m7.058s
user    0m40.284s
sys     0m8.962s
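For instance, getting something like rare's status-code histogram out of rg means bolting extra stages onto the pipeline; the sketch below is hypothetical and untimed (the exact flag combination is an assumption), but it shows roughly what rare histo absorbs into a single command:

# Hypothetical sketch: approximate the histogram with rg + coreutils.
# -o prints only the match, -r '$1' keeps just the capture group,
# -I suppresses filenames, -z reads the gzip'd inputs, -a treats them as text.
rg -z -a -o -I -r '$1' '" (\d{3})' testdata/*.gz | sort | uniq -c | sort -rn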

Other Tools

If there are other tools worth comparing, please create a new issue on the GitHub tracker.

