Defined in header <regex> | ||
template< class BidirIt, | (since C++11) | |
std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).
On construction, it constructs a std::regex_iterator, and on every increment it steps through the requested sub-matches from the current match_results, incrementing the underlying std::regex_iterator when incrementing away from the last submatch.
The default-constructed std::regex_token_iterator is the end-of-sequence iterator. When a valid std::regex_token_iterator is incremented after reaching the last submatch of the last match, it becomes equal to the end-of-sequence iterator. Dereferencing or incrementing it further invokes undefined behavior.
Just before becoming the end-of-sequence iterator, a std::regex_token_iterator may become a suffix iterator, if the index -1 (non-matched fragment) appears in the list of the requested submatch indices. Such an iterator, if dereferenced, returns a std::sub_match corresponding to the sequence of characters between the last match and the end of the sequence.
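The suffix-iterator behavior can be illustrated with a small splitting helper. This is only a sketch: `split` is a hypothetical name introduced here, not part of the library, and it copies every fragment into a std::string.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of <regex>): returns the fragments of `text`
// that lie between matches of `delim`, by requesting submatch index -1.
std::vector<std::string> split(const std::string& text, const std::regex& delim)
{
    std::vector<std::string> tokens;
    const std::sregex_token_iterator end; // default-constructed: end-of-sequence
    for (std::sregex_token_iterator it(text.begin(), text.end(), delim, -1);
         it != end; ++it)
    {
        // For the text after the last match there is no further match to
        // step to; at that point `it` is a suffix iterator.
        tokens.push_back(it->str());
    }
    return tokens;
}
```

With a "," delimiter, "a,b,c" produces a, b and c. A trailing delimiter as in "a,b," produces only a and b, because an empty remainder does not create a suffix iterator, while a leading delimiter as in ",a" produces an empty first fragment.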
A typical implementation of std::regex_token_iterator holds the underlying std::regex_iterator, a container (e.g. std::vector<int>) of the requested submatch indices, an internal counter equal to the index of the current submatch, a pointer to std::sub_match pointing at the current submatch of the current match, and a std::sub_match object containing the last non-matched character sequence (used in tokenizer mode).
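The way the iterator steps through several requested submatches per match can be sketched as follows. The function name `parse_pairs` and the `key=value` pattern are assumptions made for illustration; the indices are passed as a named std::vector<int> so that the vector overload of the constructor is selected.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extracts key=value pairs by requesting capture
// groups 1 and 2 of every match.
std::vector<std::pair<std::string, std::string>> parse_pairs(const std::string& text)
{
    static const std::regex pair_re(R"((\w+)=(\w+))"); // outlives every iterator
    const std::vector<int> subs{1, 2};                 // requested submatch indices

    std::vector<std::pair<std::string, std::string>> pairs;
    std::sregex_token_iterator it(text.begin(), text.end(), pair_re, subs);
    const std::sregex_token_iterator end;
    while (it != end)
    {
        std::string key = it->str();   // submatch 1 of the current match
        ++it;                          // same match, now submatch 2
        std::string value = it->str();
        ++it;                          // next match, or end-of-sequence
        pairs.emplace_back(std::move(key), std::move(value));
    }
    return pairs;
}
```

Because both indices are requested for every match, the second increment inside the loop is the only one that can reach the end-of-sequence iterator.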
- BidirIt must meet the requirements of LegacyBidirectionalIterator.
Several specializations for common character sequence types are defined:
Defined in header <regex>
| Type | Definition |
|---|---|
| std::cregex_token_iterator | std::regex_token_iterator<const char*> |
| std::wcregex_token_iterator | std::regex_token_iterator<const wchar_t*> |
| std::sregex_token_iterator | std::regex_token_iterator<std::string::const_iterator> |
| std::wsregex_token_iterator | std::regex_token_iterator<std::wstring::const_iterator> |
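As a brief illustration of the `const char*` specialization, the sketch below iterates over a null-terminated string without converting it to std::string. `count_words` is a hypothetical helper, and treating a "word" as a run of non-whitespace characters is an assumption of this example.

```cpp
#include <cstddef>
#include <cstring>
#include <regex>

// Hypothetical helper: counts maximal runs of non-whitespace in a C string.
std::size_t count_words(const char* s)
{
    static const std::regex word_re(R"(\S+)");
    std::size_t n = 0;
    const std::cregex_token_iterator end;
    // The submatch index defaults to 0, i.e. the whole match.
    for (std::cregex_token_iterator it(s, s + std::strlen(s), word_re);
         it != end; ++it)
    {
        ++n;
    }
    return n;
}
```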

| Member type | Definition |
|---|---|
| value_type | std::sub_match<BidirIt> |
| difference_type | std::ptrdiff_t |
| pointer | const value_type* |
| reference | const value_type& |
| iterator_category | std::forward_iterator_tag |
| iterator_concept (C++20) | std::input_iterator_tag |
| regex_type | std::basic_regex<CharT, Traits> |
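The member types listed above can be checked at compile time for one of the specializations. This is a sketch only; the alias `It` is introduced here for brevity, and `iterator_concept` is left out so that the snippet also compiles before C++20.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <type_traits>

// Alias introduced for this example; BidirIt is std::string::const_iterator.
using It = std::sregex_token_iterator;

static_assert(std::is_same<It::value_type, std::ssub_match>::value, "value_type");
static_assert(std::is_same<It::difference_type, std::ptrdiff_t>::value, "difference_type");
static_assert(std::is_same<It::pointer, const std::ssub_match*>::value, "pointer");
static_assert(std::is_same<It::reference, const std::ssub_match&>::value, "reference");
static_assert(std::is_same<It::iterator_category, std::forward_iterator_tag>::value,
              "iterator_category");
static_assert(std::is_same<It::regex_type, std::regex>::value, "regex_type");
```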

| Member function | Description |
|---|---|
| | constructs a new regex_token_iterator (public member function) |
| (destructor) (implicitly declared) | destructs a regex_token_iterator, including the cached value (public member function) |
| | assigns contents (public member function) |
| (removed in C++20) | compares two regex_token_iterators (public member function) |
| | accesses current submatch (public member function) |
| | advances the iterator to the next submatch (public member function) |
It is the programmer's responsibility to ensure that the std::basic_regex object passed to the iterator's constructor outlives the iterator. Because the iterator stores a std::regex_iterator, which stores a pointer to the regex, incrementing the iterator after the regex was destroyed results in undefined behavior.
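One way to respect this lifetime requirement is to keep the regex as a named object in the same scope as the iteration and to copy the results out before that scope ends. The helper below is a hypothetical sketch of that pattern, not a prescribed idiom.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: the regex is a named local, so it stays alive for
// the whole loop; only std::string copies leave the function.
std::vector<std::string> extract_numbers(const std::string& text)
{
    const std::regex number_re(R"(\d+)");
    std::vector<std::string> numbers;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(text.begin(), text.end(), number_re);
         it != end; ++it)
    {
        numbers.push_back(it->str()); // copy while regex and text are alive
    }
    return numbers;
}
```

Returning the iterator itself from such a function would leave it referring to a destroyed regex, which is the situation the note warns about.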
```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main()
{
    // Tokenization (non-matched fragments)
    // Note that regex is matched only two times; when the third value is obtained
    // the iterator is a suffix iterator.
    const std::string text = "Quick brown fox.";
    const std::regex ws_re("\\s+"); // whitespace
    std::copy(std::sregex_token_iterator(text.begin(), text.end(), ws_re, -1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    std::cout << '\n';

    // Iterating the first submatches
    const std::string html = R"(<p><a href="http://google.com">google</a> )"
                             R"(< a HREF ="http://cppreference.com">cppreference</a>\n</p>)";
    const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);
    std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}
```
Output:
```
Quick
brown
fox.

http://google.com
http://cppreference.com
```
The following behavior-changing defect reports were applied retroactively to previously published C++ standards.
| DR | Applied to | Behavior as published | Correct behavior |
|---|---|---|---|
| LWG 3698 (P2770R0) | C++20 | regex_token_iterator was a forward_iterator while being a stashing iterator | made input_iterator [1] |
[1] iterator_category was unchanged by the resolution, because changing it to std::input_iterator_tag might break too much existing code.