Compile Time Regular Expression in C++

hanickadot/compile-time-regular-expressions

Fast compile-time regular expressions with support for matching/searching/capturing during compile-time or runtime.

You can use the single header version from the directory single-header. This header can be regenerated with make single-header. If you are using cmake, you can add this directory as a subdirectory and link to the target ctre.

More info at compile-time.re

What this library can do

ctre::match<"REGEX">(subject); // C++20
"REGEX"_ctre.match(subject); // C++17 + N3599 extension

  • Matching
  • Searching (search or starts_with)
  • Capturing content (named captures are supported too, but only with the syntax (?<name>...))
  • Back-Reference (\g{N} syntax, and \1...\9 syntax too)
  • Multiline support (with the multiline_ functions)
  • Unicode properties and UTF-8 support

The library implements most of the PCRE syntax, with a few exceptions:

  • callouts
  • comments
  • conditional patterns
  • control characters (\cX)
  • match point reset (\K)
  • named characters
  • octal numbers
  • options / modes
  • subroutines
  • unicode grapheme cluster (\X)

More documentation on pcre.org.

Unknown character escape behaviour

Not every escaped character is inserted as itself. Escaped characters with a special meaning are interpreted accordingly, and an unknown escaped character is a syntax error.

Explicitly allowed character escapes which insert only the character are:

\- \" \< \>

Basic API

This is an approximate API specification from a user's perspective (omitting constexpr and noexcept, which are everywhere, and using C++20 syntax even though the API is C++17 compatible):

// look if whole input matches the regex:
template <fixed_string regex> auto ctre::match(auto Range &&) -> regex_results;
template <fixed_string regex> auto ctre::match(auto First &&, auto Last &&) -> regex_results;

// look if input contains match somewhere inside of itself:
template <fixed_string regex> auto ctre::search(auto Range &&) -> regex_results;
template <fixed_string regex> auto ctre::search(auto First &&, auto Last &&) -> regex_results;

// check if input starts with match (but doesn't need to match everything):
template <fixed_string regex> auto ctre::starts_with(auto Range &&) -> regex_results;
template <fixed_string regex> auto ctre::starts_with(auto First &&, auto Last &&) -> regex_results;

// result type is deconstructible into structured bindings
template <...> struct regex_results {
    operator bool() const; // if it's a match

    auto to_view() const -> std::string_view; // also view()
    auto to_string() const -> std::string; // also str()

    operator std::string_view() const; // also supports all char variants
    explicit operator std::string() const;

    // also size(), begin(), end(), data()

    size_t count() const; // number of captures
    template <size_t Id> const captured_content & get() const; // provide specific capture, whole regex_results is implicit capture 0
};
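The difference between the three entry points (whole input, anywhere, prefix only) can be illustrated with a rough runtime analogy. The sketch below uses std::regex from the standard library as a stand-in, not CTRE itself; the helper names whole_match, found_anywhere and prefix_match are hypothetical and chosen only for this illustration.

```cpp
#include <regex>
#include <string>

// Runtime analogy only: std::regex stands in for the CTRE calls.

// The whole input must match the pattern (compare ctre::match).
bool whole_match(const std::string& s, const std::regex& re) {
    return std::regex_match(s, re);
}

// The pattern may match anywhere inside the input (compare ctre::search).
bool found_anywhere(const std::string& s, const std::regex& re) {
    return std::regex_search(s, re);
}

// The match must begin at the first character, but need not
// consume the whole input (compare ctre::starts_with).
bool prefix_match(const std::string& s, const std::regex& re) {
    return std::regex_search(s, re, std::regex_constants::match_continuous);
}
```

With the pattern [0-9]+, the input 123abc passes only found_anywhere and prefix_match, while abc123 passes only found_anywhere.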

Range outputting API

// search for regex in input and return each occurrence, ignoring rest:
template <fixed_string regex> auto ctre::range(auto Range &&) -> range of regex_result;
template <fixed_string regex> auto ctre::range(auto First &&, auto Last &&) -> range of regex_result;

// return range of each match, stopping at something which can't be matched
template <fixed_string regex> auto ctre::tokenize(auto Range &&) -> range of regex_result;
template <fixed_string regex> auto ctre::tokenize(auto First &&, auto Last &&) -> range of regex_result;

// return parts of the input split by the regex, returning it as part of content of the implicit zero capture
// (other captures are not changed, you can use it to access how the values were split):
template <fixed_string regex> auto ctre::split(auto Range &&) -> range of regex_result;
template <fixed_string regex> auto ctre::split(auto First &&, auto Last &&) -> range of regex_result;
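To make the three range-producing behaviours concrete, here is a minimal runtime sketch that again uses std::regex as a stand-in rather than CTRE. The function names all_matches, consecutive_matches and split_by are hypothetical, and edge cases (empty matches, trailing separators) may well differ from what the library does.

```cpp
#include <regex>
#include <string>
#include <vector>

// Every occurrence, skipping whatever lies between matches
// (roughly the behaviour described for ctre::range).
std::vector<std::string> all_matches(const std::string& s, const std::regex& re) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Back-to-back matches only; stops at the first position where the
// pattern does not match (roughly the behaviour described for ctre::tokenize).
std::vector<std::string> consecutive_matches(const std::string& s, const std::regex& re) {
    std::vector<std::string> out;
    auto pos = s.cbegin();
    std::smatch m;
    while (pos != s.cend() &&
           std::regex_search(pos, s.cend(), m, re,
                             std::regex_constants::match_continuous) &&
           m.length(0) > 0) {  // guard against looping on empty matches
        out.push_back(m.str());
        pos = m[0].second;     // continue right after the previous match
    }
    return out;
}

// The pieces of the input between matches
// (roughly the behaviour described for ctre::split).
std::vector<std::string> split_by(const std::string& s, const std::regex& re) {
    std::sregex_token_iterator first(s.begin(), s.end(), re, -1), last;
    return std::vector<std::string>(first, last);
}
```

For the pattern [a-z]+ and the input ab12cd, all_matches yields ab and cd, while consecutive_matches yields only ab because the digits interrupt the run.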

Functors

All the functions (ctre::match, ctre::search, ctre::starts_with, ctre::range, ctre::tokenize, ctre::split) are functors and can be used without parentheses:

auto matcher = ctre::match<"regex">;
if (matcher(input)) ...
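The reason a name such as ctre::match<"regex"> can be stored in a variable and called later is that it names an object, not a function. A minimal sketch of that general C++ technique (a variable template holding a function object) follows. It is not CTRE's actual implementation; starts_with_char_fn, starts_with_char and is_hash_line are hypothetical names used only here.

```cpp
#include <string_view>

// Hypothetical function object: one distinct type per template argument.
template <char C>
struct starts_with_char_fn {
    constexpr bool operator()(std::string_view input) const noexcept {
        return !input.empty() && input.front() == C;
    }
};

// Variable template: starts_with_char<'#'> is an object, so it can be
// named without parentheses, copied, and invoked later.
template <char C>
inline constexpr starts_with_char_fn<C> starts_with_char{};

// Store the functor first, call it afterwards.
constexpr auto is_hash_line = starts_with_char<'#'>;

static_assert(is_hash_line("# title"));
static_assert(!is_hash_line("title"));
```

Because the functor is an ordinary object, it can also be passed to standard algorithms such as std::find_if as a predicate.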

Possible subjects (inputs)

  • std::string-like objects (std::string_view or your own string if it provides begin/end functions with forward iterators)
  • pairs of forward iterators
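As an illustration of what "your own string" could look like, the sketch below defines a hypothetical text_slice type that exposes only begin() and end(), plus a compile-time check that its iterator is at least a forward iterator. Going by the description above, such a type should qualify as a subject, but this has not been tested against CTRE here; all names in the block are assumptions.

```cpp
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical minimal string-like type: it only exposes begin()/end().
class text_slice {
    const char* first_;
    const char* last_;
public:
    constexpr text_slice(const char* p, std::size_t n) noexcept
        : first_(p), last_(p + n) {}
    constexpr const char* begin() const noexcept { return first_; }
    constexpr const char* end() const noexcept { return last_; }
};

// True if It is a forward iterator or something stronger.
template <typename It>
inline constexpr bool is_forward_iterator_v = std::is_base_of_v<
    std::forward_iterator_tag,
    typename std::iterator_traits<It>::iterator_category>;

// Raw pointers are random access iterators, so the requirement holds.
static_assert(is_forward_iterator_v<decltype(text_slice{"", 0}.begin())>);

// Any code that only needs begin()/end() can consume the type,
// for example a range-based for loop.
constexpr std::size_t count_digits(const text_slice& s) noexcept {
    std::size_t n = 0;
    for (char c : s)
        if (c >= '0' && c <= '9') ++n;
    return n;
}

static_assert(count_digits(text_slice{"a1b22", 5}) == 3);
```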

Unicode support

To enable it, you need to include:

  • <ctre-unicode.hpp>
  • or <ctre.hpp> and <unicode-db.hpp>

Otherwise you will get missing symbols if you try to use the unicode support without enabling it.

Supported compilers

  • clang 14.0+ (template UDL, C++17 syntax, C++20 cNTTP syntax)
  • xcode clang 15.0+ (template UDL, C++17 syntax, C++20 cNTTP syntax)
  • gcc 9.0+ (C++17 & C++20 cNTTP syntax)
  • MSVC 14.29+ (Visual Studio 16.11+) (C++20 cNTTP syntax)

Template UDL syntax

The compiler must support extension N3599, for example as a GNU extension in GCC (not in GCC 9.1+) and clang.

constexpr auto match(std::string_view sv) noexcept {
    using namespace ctre::literals;
    return "h.*"_ctre.match(sv);
}

If you need extension N3599 in GCC 9.1+, you can't use -pedantic. Also, you need to define the macro CTRE_ENABLE_LITERALS.

C++17 syntax

You can provide a pattern as a constexpr ctll::fixed_string variable.

static constexpr auto pattern = ctll::fixed_string{ "h.*" };

constexpr auto match(std::string_view sv) noexcept {
    return ctre::match<pattern>(sv);
}

(this is tested in MSVC 15.8.8)

C++20 syntax

The cNTTP syntax ctre::match<PATTERN>(subject) requires a compiler with C++20 class non-type template parameter support (see the supported compilers listed above).

constexpr auto match(std::string_view sv) noexcept {
    return ctre::match<"h.*">(sv);
}

Examples

Extracting number from input

std::optional<std::string_view> extract_number(std::string_view s) noexcept {
    if (auto m = ctre::match<"[a-z]+([0-9]+)">(s)) {
        return m.get<1>().to_view();
    } else {
        return std::nullopt;
    }
}

Extracting values from date

struct date { std::string_view year; std::string_view month; std::string_view day; };

std::optional<date> extract_date(std::string_view s) noexcept {
    using namespace ctre::literals;
    if (auto [whole, year, month, day] = ctre::match<"(\\d{4})/(\\d{1,2})/(\\d{1,2})">(s); whole) {
        return date{year, month, day};
    } else {
        return std::nullopt;
    }
}

// static_assert(extract_date("2018/08/27"sv).has_value());
// static_assert((*extract_date("2018/08/27"sv)).year == "2018"sv);
// static_assert((*extract_date("2018/08/27"sv)).month == "08"sv);
// static_assert((*extract_date("2018/08/27"sv)).day == "27"sv);

Using captures

auto result = ctre::match<"(?<year>\\d{4})/(?<month>\\d{1,2})/(?<day>\\d{1,2})">(s);
return date{result.get<"year">(), result.get<"month">(), result.get<"day">()};

// or in C++ emulation, but the object must have a linkage
static constexpr ctll::fixed_string year = "year";
static constexpr ctll::fixed_string month = "month";
static constexpr ctll::fixed_string day = "day";
return date{result.get<year>(), result.get<month>(), result.get<day>()};

// or use numbered access
// capture 0 is the whole match
return date{result.get<1>(), result.get<2>(), result.get<3>()};

Lexer

enum class type {
    unknown, identifier, number
};

struct lex_item {
    type t;
    std::string_view c;
};

std::optional<lex_item> lexer(std::string_view v) noexcept {
    if (auto [m, id, num] = ctre::match<"([a-z]+)|([0-9]+)">(v); m) {
        if (id) {
            return lex_item{type::identifier, id};
        } else if (num) {
            return lex_item{type::number, num};
        }
    }
    return std::nullopt;
}

Range over input

This support is preliminary; the API will probably change.

auto input = "123,456,768"sv;

for (auto match: ctre::search_all<"([0-9]+),?">(input))
    std::cout << std::string_view{match.get<0>()} << "\n";

Unicode

#include <ctre-unicode.hpp>
#include <iostream>

// needed if you want to output to the terminal
std::string_view cast_from_unicode(std::u8string_view input) noexcept {
    return std::string_view(reinterpret_cast<const char *>(input.data()), input.size());
}

int main() {
    using namespace std::literals;
    std::u8string_view original = u8"Tu es un génie"sv;

    for (auto match: ctre::search_all<"\\p{Letter}+">(original))
        std::cout << cast_from_unicode(match) << std::endl;
    return 0;
}

Installing ctre using vcpkg

You can download and install ctre using the vcpkg dependency manager:

git clone https://github.com/Microsoft/vcpkg.git
cd vcpkg
./bootstrap-vcpkg.sh
./vcpkg integrate install
./vcpkg install ctre

The ctre port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Running tests (for developers)

Just run make in the root of this project.