RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
google/re2
RE2 is an efficient, principled regular expression library that has been used in production at Google and many other places since 2006.
Safety is RE2's primary goal.
RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget (failing gracefully when it is exhausted), and they avoid stack overflow by eschewing recursion.
It is not a goal to be faster than all other engines under all circumstances. Although RE2 guarantees a running time that is asymptotically linear in the length of the input, more complex expressions may incur larger constant factors; longer expressions increase the overhead required to handle those expressions safely. In a sense, RE2 is pessimistic where a backtracking engine is optimistic: a backtracking engine tests each alternative sequentially, making it fast when the first alternative is common. By contrast, RE2 evaluates all alternatives in parallel, avoiding the performance penalty for the last alternative, at the cost of some overhead. This pessimism is what makes RE2 secure.
It is also not a goal to implement all of the features offered by Perl, PCRE and other engines. As a matter of principle, RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported.
For more information, please refer to Russ Cox's articles on regular expression theory and practice:
- Regular Expression Matching Can Be Simple And Fast
- Regular Expression Matching: the Virtual Machine Approach
- Regular Expression Matching in the Wild
In POSIX mode, RE2 accepts standard POSIX (egrep) syntax regular expressions. In Perl mode, RE2 accepts most Perl operators. The only excluded ones are those that require backtracking (and its potential for exponential runtime) to implement. These include backreferences (submatching is still okay) and generalized assertions. The Syntax wiki page documents the supported Perl-mode syntax in detail. The default is Perl mode.
RE2's native language is C++, although there are ports and wrappers listed below.
There are two basic operators: `RE2::FullMatch` requires the regexp to match the entire input text, and `RE2::PartialMatch` looks for a match for a substring of the input text, returning the leftmost-longest match in POSIX mode and the same match that Perl would have chosen in Perl mode.
Examples:
```cpp
assert(RE2::FullMatch("hello", "h.*o"));
assert(!RE2::FullMatch("hello", "e"));
assert(RE2::PartialMatch("hello", "h.*o"));
assert(RE2::PartialMatch("hello", "e"));
```
Both matching functions take additional arguments in which submatches will be stored. The argument can be a `string*`, or an integer type, or the type `absl::string_view*`. (The `absl::string_view` type is very similar to the `std::string_view` type, but for historical reasons, RE2 uses the former.) A `string_view` is a pointer to the original input text, along with a count. It behaves like a string but doesn't carry its own storage. As with a pointer, when using a `string_view` you must be careful not to use it once the original text has been deleted or gone out of scope.
Examples:
```cpp
// Successful parsing.
int i;
string s;
assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", &s, &i));
assert(s == "ruby");
assert(i == 1234);

// Fails: "ruby" cannot be parsed as an integer.
assert(!RE2::FullMatch("ruby", "(.+)", &i));

// Success; does not extract the number.
assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", &s));

// Success; skips NULL argument.
assert(RE2::FullMatch("ruby:1234", "(\\w+):(\\d+)", (void*)NULL, &i));

// Fails: integer overflow keeps value from being stored in i.
assert(!RE2::FullMatch("ruby:123456789123", "(\\w+):(\\d+)", &s, &i));
```
The examples above all recompile the regular expression on each call. Instead, you can compile it once to an RE2 object and reuse that object for each call.
Example:
```cpp
RE2 re("(\\w+):(\\d+)");
assert(re.ok());  // compiled; if not, see re.error();
assert(RE2::FullMatch("ruby:1234", re, &s, &i));
assert(RE2::FullMatch("ruby:1234", re, &s));
assert(RE2::FullMatch("ruby:1234", re, (void*)NULL, &i));
assert(!RE2::FullMatch("ruby:123456789123", re, &s, &i));
```
The constructor takes an optional second argument that can be used to change RE2's default options. For example, `RE2::Quiet` silences the error messages that are usually printed when a regular expression fails to parse:
```cpp
RE2 re("(ab", RE2::Quiet);  // don't write to stderr for parser failure
assert(!re.ok());  // can check re.error() for details
```
Other useful predefined options are `Latin1` (disable UTF-8) and `POSIX` (use POSIX syntax and leftmost-longest matching).
You can also declare your own `RE2::Options` object and then configure it as you like. See the header for the full set of options.
RE2 operates on Unicode code points: it makes no attempt at normalization. For example, the regular expression /ü/ (U+00FC, u with diaeresis) does not match the input "ü" (U+0075 U+0308, u followed by combining diaeresis). Normalization is a long, involved topic. The simplest solution, if you need such matches, is to normalize both the regular expressions and the input in a preprocessing step before using RE2. For more details on the general topic, see https://www.unicode.org/reports/tr15/.
For advanced usage, like constructing your own argument lists, or using RE2 as a lexer, or parsing hex, octal, and C-radix numbers, see re2.h.
RE2 can be built and installed using GNU make, CMake, or Bazel. The simplest installation instructions are:
```sh
make
make test
make benchmark
make install
make testinstall
```
Building RE2 requires a C++17 compiler and the Abseil library. Building the tests and benchmarks requires GoogleTest and Benchmark. To obtain those:
- Linux: `apt install libabsl-dev libgtest-dev libbenchmark-dev`
- macOS: `brew install abseil googletest google-benchmark pkg-config-wrapper`
- Windows: `vcpkg install abseil gtest benchmark` or `vcpkg add port abseil gtest benchmark`
Once those are installed, the build has to be able to find them. If the standard Makefile has trouble, then switching to CMake can help:
```sh
rm -rf build
cmake -DRE2_TEST=ON -DRE2_BENCHMARK=ON -S . -B build
cd build
make
make test
make install
```
When using CMake with benchmarks enabled, `make test` builds and runs the test binaries and builds a `regexp_benchmark` binary, but does not run it. If you don't need the tests or benchmarks at all, you can omit the corresponding `-D` arguments, and then you don't need the GoogleTest or Benchmark dependencies either.
Another useful option is `-DRE2_USE_ICU=ON`, which adds a dependency on the ICU Unicode library but also extends the list of property names available in the `\p` and `\P` patterns.
CMake can also be used to generate Visual Studio and Xcode projects, as well as Cygwin, MinGW, and MSYS makefiles.
- Visual Studio users: You need Visual Studio 2019 or later.
- Cygwin users: You must run CMake from the Cygwin command line, not the Windows command line.
If you are adding RE2 to your own CMake project, CMake has two ways to use a dependency: `add_subdirectory()`, which is when the dependency's sources are in a subdirectory of your project; and `find_package()`, which is when the dependency's binaries have been built and installed somewhere on your system. The Abseil documentation walks through the former here versus the latter here. Once you get Abseil working, getting RE2 working will be a very similar process and, either way, `target_link_libraries(… re2::re2)` should Just Work™.
If you are using Bazel, it will handle the dependencies for you, although you still need to download Bazel, which you can do with Bazelisk.
```sh
go install github.com/bazelbuild/bazelisk@latest
# or on mac: brew install bazelisk
bazelisk build :all
bazelisk test :all
```
If you are using RE2 from another project, you need to make sure you are using at least C++17. See the RE2 .bazelrc file for an example.
RE2 is implemented in C++.
The official Python wrapper is in the python directory and published on PyPI as google-re2. Note that there is also a PyPI package named re2, but it is not by the RE2 authors and is unmaintained. Use google-re2.
There are also other unofficial wrappers:
- A C wrapper is at https://github.com/marcomaggi/cre2/.
- A D wrapper is at https://github.com/ShigekiKarita/re2d/ and on DUB.
- An Erlang wrapper is at https://github.com/dukesoferl/re2/ and on Hex.
- An Inferno wrapper is at https://github.com/powerman/inferno-re2/.
- A Node.js wrapper is at https://github.com/uhop/node-re2/ and on NPM.
- An OCaml wrapper is at https://github.com/janestreet/re2/ and on OPAM.
- A Perl wrapper is at https://github.com/dgl/re-engine-RE2/ and on CPAN.
- An R wrapper is at https://github.com/girishji/re2/ and on CRAN.
- A Ruby wrapper is at https://github.com/mudge/re2/ and on RubyGems (rubygems.org).
- A WebAssembly wrapper is at https://github.com/google/re2-wasm/ and on NPM (npmjs.com).
RE2J is a port of the RE2 C++ code to pure Java, and RE2JS is a port of RE2J to JavaScript.
The Go regexp package and Rust regex crate do not share code with RE2, but they follow the same principles, accept the same syntax, and provide the same efficiency guarantees.
The issue tracker is the best place for discussions.
There is a mailing list for keeping up with code changes.
Please read the contribution guide before sending changes. In particular, note that RE2 does not use GitHub pull requests.