Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

experimental and very fast implementation of a grep

License

NotificationsYou must be signed in to change notification settings

stealth/grab

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 

Repository files navigation

greppin

This is my own, experimental, parallel version ofgrep so I can testvarious strategies to speed up access to large directory trees.On Flash storage or SSDs, you can easily outsmart common greps by upa factor of 8.

Options:

Usage: ./greppin [-rIOLlsSH] [-n <cores>] <regex> <path>-2-- use PCRE2 instead of PCRE-O-- print file offset of match-l-- do not print the matching line (Useful if you want   to see _all_ offsets; if you also print the line, only   the first match in the line counts)-s-- single match; dont search file further after first match   (similar to grep on a binary)-H-- use hyperscan lib for scanning-S-- only for hyperscan: interpret pattern as string literal instead of regex-L-- machine has low mem; half chunk-size (default 2GB)   may be used multiple times-I-- enable highlighting of matches (useful)-n-- Use multiple cores in parallel (omit for single core)-r-- recurse on directory

grab uses thepcre library, so basically its equivalent to agrep -P -a.The-P is important, since Perl-Compatible Regular Expressions have differentcharacteristics than basic regexes.

Build

There are two branches.master andgreppin. Master is the 'traditional'grab that should compile and run on most POSIX systems.greppin comes withits own optimized and parallelized version ofnftw() andreaddir(), whichagain doubles speed on the top of speedup that themaster branch alreadyprovides. Thegreppin branch runs on Linux, BSD and OSX.greppin also comeswith support for Intel'shyperscan libraries that tryto exploit CPU's SIMD instructions if possible (AVX2, AVX512 etc.) when compilingthe regex pattern into JIT code.

You will most likely want to build thegreppin branch:

$ git checkout greppin[...]$ cd src; make[...]

Make sure you have thepcre andpcre2 library packages installed.On BSD systems you needgmake instead ofmake.If you want to do cutting edge tech withgreppin's multiple regex engine and hyperscansupport, you first need to get and build that:

$ git clone https://github.com/intel/hyperscan[...]$ cd hyperscan$ mkdir build; cd build$ cmake -DFAT_RUNTIME=1 -DBUILD_STATIC_AND_SHARED=1 ..[...]$ make[...]

This will build so calledfat runtime of the hyperscan libs which contain supportfor all CPU families in order to select the right compilation pattern at runtimefor most performance. Once the build finishes, you buildgreppin against that:

(inside grab cloned repo)

$ cd src$ HYPERSCAN_BUILD=/path/to/hyperscan/build make -f Makefile.hs[...]

This will produce agreppin binary that enables the-H option to loada different engine at runtime, trying to exploit all possible performance bits.

You could link it against already installed libs, but the API just recentlyadded some functions in the 5.x version and most distros ship with 4.x.

Why is it faster?

grab is usingmmap(2) and matches the whole file blobwithout counting newlines (whichgrep is doing even if there is no match[as of a grep code review of mine in 2012; things may be different today])which is a lot faster thanread(2)-ing the file in small chunks and counting thenewlines. If available,grab also uses the PCRE JIT feature.However, speedups are only measurable on large file trees or fast HDDs or SSDs.In the later case, the speedup can be really drastically (up to 3 times faster)if matching recursively and in parallel. Since storage is the bottleneck,parallelizing the search on HDDs makes no sense, as the seeking takes more timethan just doing stuff in linear.

Additionally,grab is skipping files which are too small to contain theregular expression. For larger regex's in a recursive search, this canskip quite good amount of files without even opening them.

A quite newpcre lib is required, on some older systems the build can faildue to a missingPCRE_INFO_MINLENGTH andpcre_study().

Files are mmaped and matched in chunks of 1Gig. For files which are larger,the last 4096 byte (1 page) of a chunk are overlapped, so that matches on a 1 Gigboundary can be found. In this case, you see the match doubled (but with thesame offset).

If you measuregrep vs.grab, keep in mind to drop the dentry and pagecaches between each run:echo 3 > /proc/sys/vm/drop_caches

Note, thatgrep will print only a 'Binary file matches', if it detects binaryfiles, whilegrab will print all matches, unless-s is given. So, for aspeed test you have to search for an expression thatdoes not exist in the data,in order to enforce searching of the entire files.

grab was made to quickly grep through large directory trees without indexing.The originalgrep has by far a more complete option-set. The speedupfor a single file match is very small, if at all measureable.

For SSDs, the multicore option makes sense. For HDDs it does not, sincethe head has to be positioned back and forth between the threads, potentiallydestroying the locality principle and killing performance.

Thegreppin branch features its own lockfree parallel version ofnftw(), so the timeof idling of N - 1 cores when the 1st core builds the directory tree can alsobe used for working.

Whats left to note:grab will traverse directoriesphysically, i.e. it will not followsymlinks.

spot

spot is the parallel version offind. It supports the most frequently used options asyou know it. Theres not much more to tell about it, just try it out.

Examples

This shows the speedup on a 4-core machine with a search on a SSD:

root@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# time grep -r foobardoesnotexist /source/linuxreal0m34.811suser0m3.710ssys0m10.936sroot@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# time grab -r foobardoesnotexist /source/linuxreal0m31.629suser0m4.984ssys0m8.690sroot@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# time grab -n 2 -r foobardoesnotexist /source/linuxreal0m15.203suser0m3.689ssys0m4.665sroot@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# time grab -n 4 -r foobardoesnotexist /source/linuxreal0m13.135suser0m4.023ssys0m5.581s

Withgreppin branch:

root@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# time grep -a -P -r linus /source/linux/|wc -l16918real    1m12.470suser    0m49.548ssys     0m6.162sroot@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# time greppin -n 4 -r linus /source/linux/|wc -l16918real    0m8.773suser    0m4.670ssys     0m5.837sroot@linux:~#

Yes! ~ 9s vs. ~ 72s! Thats 8x as fast on a 4-core SSD machine as the traditional grep.

Just to proof that it resulted in the same output:

root@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# greppin -n 4 -r linus /source/linux/|sort|md5suma1f9fe635bd22575a4cce851e79d26a0  -root@linux:~# echo 3 > /proc/sys/vm/drop_cachesroot@linux:~# grep -P -a -r linus /source/linux/|sort|md5suma1f9fe635bd22575a4cce851e79d26a0  -root@linux:~#

In the single core comparison, speedup also depends on which CPU the kernelactually scheduls thegrep, so agrab may or may not be faster (mostly it is).If the load is equal among the single-core tests,grab will see a speedup ifsearching on large file trees. On multi-core setups,grab can benefit ofcorse.

ripgrep comparison

The project can be foundhere.

The main speedup thats inside their benchmark tables stems from the fact thatripgrepignores a lot of files (notably dotfiles) when invoked without special options as wellas treating binary files as a single-match target (similar togrep). In order to havecomparable results, keep in mind to (4 is the number of cores):

  • echo 3 > /proc/sys/vm/drop_caches between each run
  • Add-j 4 -a --no-unicode --no-pcre2-unicode -uuu --mmap toripgrep, sinceit will by default match Unicode which is 3 times slower, and tries to compensatethe speedloss by skipping 'ignore'-based files.-e is faster than-P,so better choose-e, but thats not as powerful as a PCRE
  • redirect the output to/dev/null to avoid tty based effects
  • add-H -n 4 togreppin if you want best performance.-H is PCRE compatiblewith only very few exceptions (according to hyperscan docu)
  • setfattr -n user.pax.flags -v "m" /path/to/binary if you run on grsec systemsand require rwx JIT mappings

Then just go ahead and check the timings. Even when not using hyperscan,greppinis significantly faster thanrg when using PCRE2 expressions (PCRE2 vs. PCRE2)and still faster when comparing the fastest expressions (-e vs. hyperscan).

About

experimental and very fast implementation of a grep

Topics

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors2

  •  
  •  

[8]ページ先頭

©2009-2025 Movatter.jp