Compile Time Regular Expression in C++
hanickadot/compile-time-regular-expressions
Fast compile-time regular expressions with support for matching/searching/capturing during compile-time or runtime.
You can use the single header version from the directory `single-header`. This header can be regenerated with `make single-header`. If you are using cmake, you can add this directory as a subdirectory and link to the target `ctre`.
More info at compile-time.re

    ctre::match<"REGEX">(subject); // C++20
    "REGEX"_ctre.match(subject); // C++17 + N3599 extension

- Matching
- Searching (`search` or `starts_with`)
- Capturing content (named captures are supported too, but only with syntax `(?<name>...)`)
- Back-Reference (`\g{N}` syntax, and `\1...\9` syntax too)
- Multiline support (with `multiline_` functions)
- Unicode properties and UTF-8 support
The library implements most of the PCRE syntax, with a few exceptions:
- callouts
- comments
- conditional patterns
- control characters (`\cX`)
- match point reset (`\K`)
- named characters
- octal numbers
- options / modes
- subroutines
- unicode grapheme cluster (`\X`)
More documentation on pcre.org.
Not every escaped character is automatically inserted as itself. The library treats escaped characters as having special meaning, and an unknown escaped character is a syntax error.
Explicitly allowed character escapes which insert only the character are:
- `\-`
- `\"`
- `\<`
- `\>`
This is an approximate API specification from a user's perspective (omitting `constexpr` and `noexcept`, which are everywhere, and using C++20 syntax even though the API is C++17 compatible):

    // look if whole input matches the regex:
    template <fixed_string regex> auto ctre::match(auto Range &&) -> regex_results;
    template <fixed_string regex> auto ctre::match(auto First &&, auto Last &&) -> regex_results;

    // look if input contains match somewhere inside of itself:
    template <fixed_string regex> auto ctre::search(auto Range &&) -> regex_results;
    template <fixed_string regex> auto ctre::search(auto First &&, auto Last &&) -> regex_results;

    // check if input starts with match (but doesn't need to match everything):
    template <fixed_string regex> auto ctre::starts_with(auto Range &&) -> regex_results;
    template <fixed_string regex> auto ctre::starts_with(auto First &&, auto Last &&) -> regex_results;

    // result type is deconstructible into structured bindings
    template <...> struct regex_results {
        operator bool() const; // if it's a match
        auto to_view() const -> std::string_view; // also view()
        auto to_string() const -> std::string; // also str()
        operator std::string_view() const; // also supports all char variants
        explicit operator std::string() const;

        // also size(), begin(), end(), data()

        size_t count() const; // number of captures
        template <size_t Id> const captured_content & get() const; // provide specific capture, whole regex_results is implicit capture 0
    };

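For readers coming from `<regex>`, the three entry points above behave roughly like `std::regex_match`, `std::regex_search`, and `std::regex_search` with `std::regex_constants::match_continuous`. The sketch below is a standard-library analogue only, not CTRE code: it runs at runtime, and the helper names (`whole_match`, `contains_match`, `prefix_match`) are hypothetical.

```cpp
#include <regex>
#include <string>

// Standard-library analogue, NOT CTRE: std::regex is used only to
// illustrate the three matching modes. Helper names are hypothetical.

// The whole input must match (analogue of ctre::match).
inline bool whole_match(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}

// A match may occur anywhere inside the input (analogue of ctre::search).
inline bool contains_match(const std::string& s, const std::regex& re) {
    return std::regex_search(s, re);
}

// A match must begin at the first character but does not need to consume
// the whole input (analogue of ctre::starts_with).
inline bool prefix_match(const std::string& s, const std::regex& re) {
    return std::regex_search(s, re, std::regex_constants::match_continuous);
}
```

With the pattern `[0-9]+`, the input `123abc` fails `whole_match` but passes `prefix_match`, while `abc123` passes only `contains_match`.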

    // search for regex in input and return each occurrence, ignoring rest:
    template <fixed_string regex> auto ctre::range(auto Range &&) -> range of regex_result;
    template <fixed_string regex> auto ctre::range(auto First &&, auto Last &&) -> range of regex_result;

    // return range of each match, stopping at something which can't be matched
    template <fixed_string regex> auto ctre::tokenize(auto Range &&) -> range of regex_result;
    template <fixed_string regex> auto ctre::tokenize(auto First &&, auto Last &&) -> range of regex_result;

    // return parts of the input split by the regex, returning it as part of content of the implicit zero capture
    // (other captures are not changed, you can use it to access how the values were split):
    template <fixed_string regex> auto ctre::split(auto Range &&) -> range of regex_result;
    template <fixed_string regex> auto ctre::split(auto First &&, auto Last &&) -> range of regex_result;

All the functions (`ctre::match`, `ctre::search`, `ctre::starts_with`, `ctre::range`, `ctre::tokenize`, `ctre::split`) are function objects (functors) and can be used without parentheses:

    auto matcher = ctre::match<"regex">;
    if (matcher(input)) ...

Possible subjects (inputs):
- `std::string`-like objects (`std::string_view` or your own string if it provides `begin`/`end` functions with forward iterators)
- pairs of forward iterators
To enable Unicode support, you need to include:
- `<ctre-unicode.hpp>`
- or `<ctre.hpp>` and `<unicode-db.hpp>`
Otherwise you will get missing symbols if you try to use the unicode support without enabling it.
- clang 14.0+ (template UDL, C++17 syntax, C++20 cNTTP syntax)
- xcode clang 15.0+ (template UDL, C++17 syntax, C++20 cNTTP syntax)
- gcc 9.0+ (C++17 & C++20 cNTTP syntax)
- MSVC 14.29+ (Visual Studio 16.11+) (C++20 cNTTP syntax)
The template UDL syntax (`"REGEX"_ctre`) requires compiler support for extension N3599, available for example as a GNU extension in gcc (not in GCC 9.1+ by default) and clang.

    constexpr auto match(std::string_view sv) noexcept {
        using namespace ctre::literals;
        return "h.*"_ctre.match(sv);
    }

If you need extension N3599 in GCC 9.1+, you can't use `-pedantic`. Also, you need to define the macro `CTRE_ENABLE_LITERALS`.
In C++17, you can provide a pattern as a `constexpr ctll::fixed_string` variable.

    static constexpr auto pattern = ctll::fixed_string{ "h.*" };

    constexpr auto match(std::string_view sv) noexcept {
        return ctre::match<pattern>(sv);
    }

(this is tested in MSVC 15.8.8)
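As background on why the pattern object has to be a `static constexpr` variable, here is a minimal C++17 sketch of passing a compile-time string to a template by reference. It uses only the standard library; `tiny_fixed_string` and `pattern_length` are hypothetical names made up for illustration and are not part of ctll or CTRE, whose real implementation may differ.

```cpp
#include <cstddef>

// Hypothetical miniature of a fixed-size compile-time string.
// This is NOT ctll::fixed_string, only an illustration of the idea.
template <std::size_t N>
struct tiny_fixed_string {
    char data[N]{};

    // Copy the characters of a string literal, including the trailing '\0'.
    constexpr tiny_fixed_string(const char (&text)[N]) {
        for (std::size_t i = 0; i < N; ++i) data[i] = text[i];
    }

    // Number of characters, excluding the trailing '\0'.
    constexpr std::size_t size() const { return N - 1; }
};

// C++17: a non-type template parameter can be a reference to an object
// with static storage duration and linkage, which is why the pattern
// below is declared at namespace scope.
template <const auto& Pattern>
constexpr std::size_t pattern_length() {
    return Pattern.size();
}

static constexpr tiny_fixed_string demo_pattern{"h.*"};
static_assert(pattern_length<demo_pattern>() == 3);
```

A local (non-static) variable could not be used as the template argument here, because it has no linkage.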
The cNTTP syntax `ctre::match<PATTERN>(subject)` requires C++20 class non-type template parameters. Per the compiler list above, GCC 9+, clang 14+, and MSVC 14.29+ support it.

    constexpr auto match(std::string_view sv) noexcept {
        return ctre::match<"h.*">(sv);
    }


    std::optional<std::string_view> extract_number(std::string_view s) noexcept {
        if (auto m = ctre::match<"[a-z]+([0-9]+)">(s)) {
            return m.get<1>().to_view();
        } else {
            return std::nullopt;
        }
    }


    struct date { std::string_view year; std::string_view month; std::string_view day; };

    std::optional<date> extract_date(std::string_view s) noexcept {
        using namespace ctre::literals;
        if (auto [whole, year, month, day] = ctre::match<"(\\d{4})/(\\d{1,2})/(\\d{1,2})">(s); whole) {
            return date{year, month, day};
        } else {
            return std::nullopt;
        }
    }

    // static_assert(extract_date("2018/08/27"sv).has_value());
    // static_assert((*extract_date("2018/08/27"sv)).year == "2018"sv);
    // static_assert((*extract_date("2018/08/27"sv)).month == "08"sv);
    // static_assert((*extract_date("2018/08/27"sv)).day == "27"sv);

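The structured binding `auto [whole, year, month, day]` above works because the result type can be decomposed like a tuple. One general way a C++17 class can opt in to this is the tuple protocol (`std::tuple_size`, `std::tuple_element`, and a `get<I>()` member). The sketch below shows that mechanism with a hypothetical `toy_result` class; it is an illustration under that assumption, not CTRE's actual `regex_results` code.

```cpp
#include <cstddef>
#include <string_view>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for a match result: the whole match plus one capture.
class toy_result {
    std::string_view whole_;
    std::string_view capture_;

public:
    constexpr toy_result(std::string_view whole, std::string_view capture)
        : whole_(whole), capture_(capture) {}

    // Element 0 is the whole match, element 1 is the capture.
    template <std::size_t I>
    constexpr std::string_view get() const {
        static_assert(I < 2, "toy_result exposes exactly two elements");
        if constexpr (I == 0) {
            return whole_;
        } else {
            return capture_;
        }
    }
};

// Opting in to the tuple protocol: these specializations are what make
// `auto [a, b] = some_toy_result;` compile for a class with private members.
namespace std {
template <>
struct tuple_size<toy_result> : integral_constant<size_t, 2> {};

template <size_t I>
struct tuple_element<I, toy_result> {
    using type = string_view;
};
} // namespace std
```

With these specializations in place, `auto [whole, year] = toy_result{"2018/08", "2018"};` binds `whole` to the first element and `year` to the second.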

    auto result = ctre::match<"(?<year>\\d{4})/(?<month>\\d{1,2})/(?<day>\\d{1,2})">(s);
    return date{result.get<"year">(), result.get<"month">(), result.get<"day">()};

    // or in C++ emulation, but the object must have a linkage
    static constexpr ctll::fixed_string year = "year";
    static constexpr ctll::fixed_string month = "month";
    static constexpr ctll::fixed_string day = "day";
    return date{result.get<year>(), result.get<month>(), result.get<day>()};

    // or use numbered access
    // capture 0 is the whole match
    return date{result.get<1>(), result.get<2>(), result.get<3>()};


    enum class type {
        unknown, identifier, number
    };

    struct lex_item {
        type t;
        std::string_view c;
    };

    std::optional<lex_item> lexer(std::string_view v) noexcept {
        if (auto [m, id, num] = ctre::match<"([a-z]+)|([0-9]+)">(v); m) {
            if (id) {
                return lex_item{type::identifier, id};
            } else if (num) {
                return lex_item{type::number, num};
            }
        }
        return std::nullopt;
    }

Support for iterating over all matches in an input is preliminary; the API will probably change.

    auto input = "123,456,768"sv;

    for (auto match : ctre::search_all<"([0-9]+),?">(input))
        std::cout << std::string_view{match.get<0>()} << "\n";


    #include <ctre-unicode.hpp>
    #include <iostream>

    // needed if you want to output to the terminal
    std::string_view cast_from_unicode(std::u8string_view input) noexcept {
        return std::string_view(reinterpret_cast<const char *>(input.data()), input.size());
    }

    int main() {
        using namespace std::literals;
        std::u8string_view original = u8"Tu es un génie"sv;

        for (auto match : ctre::search_all<"\\p{Letter}+">(original))
            std::cout << cast_from_unicode(match) << std::endl;
        return 0;
    }

You can download and install ctre using the vcpkg dependency manager:

    git clone https://github.com/Microsoft/vcpkg.git
    cd vcpkg
    ./bootstrap-vcpkg.sh
    ./vcpkg integrate install
    ./vcpkg install ctre

The ctre port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.
Just run `make` in the root of this project.