
Benchmarks

Below, rare is compared with other common tools on CPU (user) time and real (wall-clock) time.

It's worth noting that in many of these results rare is just as fast, but part of the reason is that it uses the CPU more efficiently by working in parallel (Go is great at parallelization). So take that into account, for better or worse.
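
One way to read that caveat is to divide CPU time by real time, which gives the average number of cores a command kept busy. The sketch below does this for two of the timings reported on this page; the helper names `parse_time` and `avg_cores_busy` are made up for illustration and are not part of rare.

```python
import re

# Matches the "XmY.YYYs" fields printed by the shell's `time` builtin.
_TIME_RE = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def parse_time(text):
    """Convert a `time` field such as '1m40.060s' to seconds."""
    m = _TIME_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


def avg_cores_busy(real, user, system):
    """(user + sys) / real: average number of cores kept busy."""
    return (parse_time(user) + parse_time(system)) / parse_time(real)


# Timings copied from the zcat | grep and rare histo runs on this page.
grep_cores = avg_cores_busy("0m11.272s", "0m16.239s", "0m1.989s")
histo_cores = avg_cores_busy("0m3.869s", "0m13.423s", "0m0.191s")
print(f"zcat | grep: {grep_cores:.2f} cores, rare histo: {histo_cores:.2f} cores")
```

With these numbers the zcat and grep pipeline averages roughly 1.6 cores while rare histo averages roughly 3.5, so a similar real time can hide a noticeably larger CPU bill.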

All tests were done on ~83MB of gzip'd (1.5GB gunzip'd) nginx logs spread across 10 files.

Each program was run 3 times and the last time was taken (to make sure things were cached equally).
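
The run-three-times, keep-the-last procedure could be scripted along these lines. This is a minimal sketch and not the script used for the numbers on this page: `time_command` is a hypothetical helper, and the child CPU fields of `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv, runs=3):
    """Run argv `runs` times and return (real, user, sys) seconds of the last run.

    Earlier runs only serve to warm the file cache, so their timings are dropped.
    """
    last = None
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        real = time.perf_counter() - start
        after = os.times()
        # CPU time accumulated by waited-for child processes during this run.
        last = (
            real,
            after.children_user - before.children_user,
            after.children_system - before.children_system,
        )
    return last


if __name__ == "__main__":
    # Placeholder command; a shell pipeline can be wrapped as ["sh", "-c", "..."].
    real, user, system = time_command([sys.executable, "-c", "pass"])
    print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```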

zcat & grep

    $ time zcat testdata/* | grep -Poa '" (\d{3})' | wc -l
    8373328
    real    0m11.272s
    user    0m16.239s
    sys     0m1.989s

    $ time zcat testdata/* | grep -Poa '" 200' > /dev/null
    real    0m5.416s
    user    0m4.810s
    sys     0m1.185s

I believe the largest holdup here is that zcat passes all the data to grep through a synchronous pipe, whereas rare can process everything in async batches. Using pigz instead didn't yield different results, but on single-file runs the two did perform comparably.
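
To illustrate the difference in structure, the sketch below decompresses and scans each gzip file in its own worker instead of funnelling every byte through one pipe. It is an assumption-laden illustration, not how rare is implemented (rare is written in Go): `count_matches` and `count_all` are hypothetical names, and because Python threads share the interpreter lock for the regex work, this shows the shape of the approach rather than its speed.

```python
import gzip
import re
from concurrent.futures import ThreadPoolExecutor

# Same pattern as the benchmark: a quote, a space, then a 3-digit status code.
STATUS_RE = re.compile(rb'" (\d{3})')


def count_matches(path):
    """Decompress one gzip file and count the lines containing a match."""
    matched = 0
    with gzip.open(path, "rb") as fh:
        for line in fh:
            if STATUS_RE.search(line):
                matched += 1
    return matched


def count_all(paths, workers=4):
    """Scan files independently in a worker pool and sum the per-file counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_matches, paths))
```

Calling `count_all(sorted(glob.glob("testdata/*")))` after `import glob` would count matching lines across the test files, with each file handled as an independent unit of work.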

Silver Searcher (ag)

Warning

ag version 2.2.0 has a bug where it won't scan all of my test data. I'll hold off on benchmarking it until there's a fix.

Old Benchmark (Less data by factor of ~8x)

    $ ag --version
    ag version 2.2.0
    Features:
      +jit +lzma +zlib

    $ time ag -z '" (\d{3})' testdata/* | wc -l
    1131354
    real    0m3.944s
    user    0m3.904s
    sys     0m0.152s

rare

At no point while scanning the data does rare exceed ~76MB of resident memory.

    $ rare -v
    rare version 0.1.16, 11ca2bfc4ad35683c59929a74ad023cc762a29ae

    $ time rare filter -m '" (\d{3})' -e "{1}" -z testdata/* | wc -l
    Matched: 8,373,328 / 8,373,328
    8373328
    real    0m16.192s
    user    0m20.298s
    sys     0m20.697s

    $ time rare histo -m '" (\d{3})' -e "{1}" -z testdata/*
    404                 5,557,374
    200                 2,564,984
    400                 243,282
    405                 5,708
    408                 1,397
    Matched: 8,373,328 / 8,373,328 (Groups: 8)
    real    0m3.869s
    user    0m13.423s
    sys     0m0.191s

pcre2

The PCRE2 version is approximately as fast on a simple regular expression, but begins to shine on more complex regexes.

    $ time rare table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*
              2020      2019
    400       2,915,487 2,892,274
    200       1,716,107 848,925
    300       290       245
    Matched: 8,373,328 / 8,373,328 (R: 3; C: 2)
    real    0m31.419s
    user    1m40.060s
    sys     0m0.657s

    $ time rare-pcre table -z -m "\[(.+?)\].*\" (\d+)" -e "{buckettime {1} year nginx}" -e "{bucket {2} 100}" testdata/*
              2020      2019
    400       2,915,487 2,892,274
    200       1,716,107 848,925
    300       290       245
    Matched: 8,373,328 / 8,373,328 (R: 3; C: 2)
    real    0m7.936s
    user    0m27.600s
    sys     0m0.301s
