| Category | Functions |
|---|---|
| Matching | bmatch, match, matchAll, matchFirst |
| Building | ctRegex, escaper, regex |
| Replace | replace, replaceAll, replaceAllInto, replaceFirst, replaceFirstInto |
| Split | split, splitter |
| Objects | Captures, Regex, RegexException, RegexMatch, Splitter, StaticRegex |

Sample user input:

    They met on 24/01/1970.
    7/8/99 wasn't as hot as 7/8/2022.

Create a regex at run time and print the dates it finds:

    import std.regex;
    import std.stdio;

    // Print out all possible dd/mm/yy(yy) dates found in user input.
    auto r = regex(r"\b[0-9][0-9]?/[0-9][0-9]?/[0-9][0-9](?:[0-9][0-9])?\b");
    foreach (line; stdin.byLine)
    {
        // matchAll() returns a range that can be iterated
        // to get all subsequent matches.
        foreach (c; matchAll(line, r))
            writeln(c.hit);
    }

Compile-time regex (ctRegex):

    import std.regex;
    auto ctr = ctRegex!(`^.*/([^/]+)/?$`);

    // It works just like a normal regex:
    auto c2 = matchFirst("foo/bar", ctr); // First match found here, if any
    assert(!c2.empty); // Be sure to check if there is a match before examining contents!
    assert(c2[1] == "bar"); // Captures is a range of submatches: 0 = full match.

Multi-pattern regex:

    import std.regex;
    auto multi = regex([`\d+,\d+`, `([a-z]+):(\d+)`]);
    auto m = "abc:43 12,34".matchAll(multi);
    assert(m.front.whichPattern == 2);
    assert(m.front[1] == "abc");
    assert(m.front[2] == "43");
    m.popFront();
    assert(m.front.whichPattern == 1);
    assert(m.front[0] == "12,34");

Testing match results directly:

    import std.regex;
    // The result of `matchAll/matchFirst` is directly testable with `if/assert/while`,
    // e.g. test if a string consists of letters only:
    assert(matchFirst("LettersOnly", `^\p{L}+$`));

    // And we can take advantage of the ability to define a variable in the IfCondition:
    if (const captures = matchFirst("At l34st one digit, but maybe more...", `((\d)(\d*))`))
    {
        assert(captures[2] == "3");
        assert(captures[3] == "4");
        assert(captures[1] == "34");
    }

See a short tour of the module API and its abilities. There are other web resources on regular expressions to help newcomers, and a good reference with tutorial can easily be found.

This library uses a remarkably common ECMAScript syntax flavor with extensions that include named subexpressions, Unicode properties, and lookbehind assertions. The pattern syntax is summarized below.

| Pattern element | Semantics |
|---|---|
| Atoms | Match single characters |
| any character except [{\|*+?()^$ | Matches the character itself. |
| . | In single line mode matches any character. Otherwise it matches any character except '\n' and '\r'. |
| [class] | Matches a single character that belongs to this character class. |
| [^class] | Matches a single character that does not belong to this character class. |
| \cC | Matches the control character corresponding to letter C |
| \xXX | Matches a character with hexadecimal value of XX. |
| \uXXXX | Matches a character with hexadecimal value of XXXX. |
| \U00YYYYYY | Matches a character with hexadecimal value of YYYYYY. |
| \f | Matches a formfeed character. |
| \n | Matches a linefeed character. |
| \r | Matches a carriage return character. |
| \t | Matches a tab character. |
| \v | Matches a vertical tab character. |
| \d | Matches any Unicode digit. |
| \D | Matches any character except Unicode digits. |
| \w | Matches any word character (note: this includes numbers). |
| \W | Matches any non-word character. |
| \s | Matches whitespace, same as \p{White_Space}. |
| \S | Matches any character except those recognized as \s. |
| \\ | Matches \ character. |
| \c where c is one of [\|*+?() | Matches the character c itself. |
| \p{PropertyName} | Matches a character that belongs to the Unicode PropertyName set. Single letter abbreviations can be used without surrounding {,}. |
| \P{PropertyName} | Matches a character that does not belong to the Unicode PropertyName set. Single letter abbreviations can be used without surrounding {,}. |
| \p{InBasicLatin} | Matches any character that is part of the BasicLatin Unicode block. |
| \P{InBasicLatin} | Matches any character except ones in the BasicLatin Unicode block. |
| \p{Cyrillic} | Matches any character that is part of the Cyrillic script. |
| \P{Cyrillic} | Matches any character except ones in the Cyrillic script. |
| Quantifiers | Specify repetition of other elements |
| * | Matches previous character/subexpression 0 or more times. Greedy version - tries as many times as possible. |
| *? | Matches previous character/subexpression 0 or more times. Lazy version - stops as early as possible. |
| + | Matches previous character/subexpression 1 or more times. Greedy version - tries as many times as possible. |
| +? | Matches previous character/subexpression 1 or more times. Lazy version - stops as early as possible. |
| ? | Matches previous character/subexpression 0 or 1 time. Greedy version - tries as many times as possible. |
| ?? | Matches previous character/subexpression 0 or 1 time. Lazy version - stops as early as possible. |
| {n} | Matches previous character/subexpression exactly n times. |
| {n,} | Matches previous character/subexpression n times or more. Greedy version - tries as many times as possible. |
| {n,}? | Matches previous character/subexpression n times or more. Lazy version - stops as early as possible. |
| {n,m} | Matches previous character/subexpression n to m times. Greedy version - tries as many times as possible, but no more than m times. |
| {n,m}? | Matches previous character/subexpression n to m times. Lazy version - stops as early as possible, but no fewer than n times. |
| Other | Subexpressions & alternations |
| (regex) | Matches subexpression regex, saving matched portion of text for later retrieval. |
| (?#comment) | An inline comment that is ignored while matching. |
| (?:regex) | Matches subexpression regex, not saving matched portion of text. Useful to speed up matching. |
| A\|B | Matches subexpression A, or failing that, matches B. |
| (?P<name>regex) | Matches named subexpression regex labeling it with name 'name'. When referring to a matched portion of text, names work like aliases in addition to direct numbers. |
| Assertions | Match position rather than character |
| ^ | Matches at the beginning of input or line (in multiline mode). |
| $ | Matches at the end of input or line (in multiline mode). |
| \b | Matches at word boundary. |
| \B | Matches when not at word boundary. |
| (?=regex) | Zero-width lookahead assertion. Matches at a point where the subexpression regex could be matched starting from the current position. |
| (?!regex) | Zero-width negative lookahead assertion. Matches at a point where the subexpression regex could not be matched starting from the current position. |
| (?<=regex) | Zero-width lookbehind assertion. Matches at a point where the subexpression regex could be matched ending at the current position (matching goes backwards). |
| (?<!regex) | Zero-width negative lookbehind assertion. Matches at a point where the subexpression regex could not be matched ending at the current position (matching goes backwards). |
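
The named-subexpression and lookaround rows above can be tried out with a short program. This is a minimal sketch: the input strings and group names are made up for illustration, and only `regex` and `matchFirst` from this module are used.

```d
import std.regex;

void main()
{
    // Named groups use the (?P<name>regex) syntax from the table above.
    auto c = matchFirst("width = 42", regex(`(?P<key>\w+)\s*=\s*(?P<value>\d+)`));
    assert(c);
    assert(c["key"] == "width");
    assert(c["value"] == "42");
    assert(c[2] == "42"); // names are aliases for the numbered groups

    // Lookbehind: digits that directly follow a '#' sign.
    auto issue = matchFirst("see issue #250", regex(`(?<=#)\d+`));
    assert(issue.hit == "250");

    // Negative lookahead: "foo" that is not followed by "bar".
    auto m = matchFirst("foobar foobaz", regex(`foo(?!bar)`));
    assert(m.pre == "foobar ");
}
```

The lookaround assertions consume no input, so `hit` contains only the text matched outside them.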

Character classes:

| Pattern element | Semantics |
|---|---|
| Any atom | Has the same meaning as outside of a character class, except for ] which must be written as \] |
| a-z | Includes characters a, b, c, ..., z. |
| [a\|\|b], [a--b], [a~~b], [a&&b] | Where a, b are arbitrary classes, means union, set difference, symmetric set difference, and intersection respectively. Any sequence of character class elements implicitly forms a union. |
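
The set operations in the last row are easier to read with concrete classes. The sketch below assumes the operators combine the implicit unions on either side, as the table describes; `firstHit` is a hypothetical helper written for this illustration, not part of the module.

```d
import std.regex;

// Hypothetical helper: returns the first match of `pattern` in `input`,
// or null when nothing matches.
string firstHit(string input, string pattern)
{
    auto c = matchFirst(input, regex(pattern));
    return c ? c.hit : null;
}

void main()
{
    // Set difference: lowercase letters except d, e and f.
    assert(firstHit(" dfa", `[a-z--d-f]`) == "a");

    // Intersection: characters that are in both a-z9 and abc0-9,
    // which leaves a, b, c and 9.
    assert(firstHit("z90a0abc", `[a-z9&&abc0-9]{3}`) == "abc");

    // Plain negated class for comparison.
    assert(firstHit("abc123", `[^a-z]+`) == "123");
}
```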

Regex flags:

| Flag | Semantics |
|---|---|
| g | Global regex, repeat over the whole input. |
| i | Case insensitive matching. |
| m | Multi-line mode, match ^, $ on start and end line separators as well as start and end of input. |
| s | Single-line mode, makes . match '\n' and '\r' as well. |
| x | Free-form syntax, ignores whitespace in pattern, useful for formatting complex regular expressions. |
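
The flags are passed as the second argument to `regex`. The following sketch exercises the i, m, s and x flags on made-up inputs; it is an illustration of the table above rather than an exhaustive test.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex;

void main()
{
    // i: case-insensitive matching.
    assert(matchFirst("HELLO world", regex("hello", "i")).hit == "HELLO");

    // m: ^ also matches right after a line separator.
    auto firstWords = matchAll("one two\nthree four", regex(`^\w+`, "m"))
        .map!(c => c.hit)
        .array;
    assert(firstWords == ["one", "three"]);

    // s: . matches '\n' as well.
    assert(!matchFirst("a\nb", regex("a.b")));
    assert(matchFirst("a\nb", regex("a.b", "s")));

    // x: whitespace inside the pattern is ignored.
    assert(matchFirst("2024-01", regex(`(\d+) - (\d+)`, "x")).hit == "2024-01");
}
```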

Replace format string:

| Format specifier | Replaced by |
|---|---|
| $& | the whole match. |
| $` | part of input preceding the match. |
| $' | part of input following the match. |
| $$ | '$' character. |
| \c , where c is any character | the character c itself. |
| \\ | '\' character. |
| $1 .. $99 | submatch number 1 to 99 respectively. |
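
A few of the format specifiers in action, as a minimal sketch using `replaceFirst` and `replaceAll` from this module; the input strings are invented for illustration.

```d
import std.regex;

void main()
{
    // $1 .. $3 refer to the numbered submatches.
    auto date = regex(`(\d+)-(\d+)-(\d+)`);
    assert(replaceFirst("2003-04-05", date, "$3/$2/$1") == "05/04/2003");

    // $& is the whole match.
    assert(replaceAll("a1b22", regex(`\d+`), "<$&>") == "a<1>b<22>");

    // $$ produces a literal '$' character.
    assert(replaceAll("cost 5", regex(`\d+`), "$$$&") == "cost $5");
}
```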

regex by Walter Bright and Andrei Alexandrescu.

Source: std/regex/package.d

Regex(Char)

Regex object holds regular expression pattern in compiled form.

Example: test if this object doesn't contain any compiled pattern.

    Regex!char r;
    assert(r.empty);
    r = regex(""); // Note: "" is a valid regex pattern.
    assert(!r.empty);

Getting a range of all the named captures in the regex.

    import std.range;
    import std.algorithm;

    auto re = regex(`(?P<name>\w+) = (?P<var>\d+)`);
    auto nc = re.namedCaptures;
    static assert(isRandomAccessRange!(typeof(nc)));
    assert(!nc.empty);
    assert(nc.length == 2);
    assert(nc.equal(["name", "var"]));
    assert(nc[0] == "name");
    assert(nc[1..$].equal(["var"]));

StaticRegex = Regex(Char);

StaticRegex is a Regex object that contains D code specially generated at compile-time to speed up matching.

regex(S : C[], C)(const S[] patterns, const(char)[] flags = "")

regex(S)(S pattern, const(char)[] flags = "")

Compiles a regular expression pattern.

| Parameter | Description |
|---|---|
| S pattern | A single regular expression to match. |
| S[] patterns | An array of regular expression strings. The resulting Regex object will match any expression; use whichPattern to know which. |
| const(char)[] flags | The attributes (g, i, m, s and x accepted) |

    import std.regex, std.stdio;

    void test(S)()
    {
        // multi-pattern regex example
        S[] arr = [`([a-z]+):(\d+)`, `(\d+),\d+`];
        auto multi = regex(arr); // multi regex
        S str = "abc:43 12,34";
        auto m = str.matchAll(multi);
        writeln(m.front.whichPattern); // 1
        writeln(m.front[1]); // "abc"
        writeln(m.front[2]); // "43"
        m.popFront();
        writeln(m.front.whichPattern); // 2
        writeln(m.front[1]); // "12"
    }

    import std.meta : AliasSeq;
    static foreach (C; AliasSeq!(string, wstring, dstring))
        // Test with const array of patterns - see https://issues.dlang.org/show_bug.cgi?id=20301
        static foreach (S; AliasSeq!(C, const C, immutable C))
            test!S();

ctRegex(alias pattern, string flags = "");

| Parameter | Description |
|---|---|
| pattern | Regular expression |
| flags | The attributes (g, i, m, s and x accepted) |

Captures(R) if (isSomeString!R);

Captures object contains submatches captured during a call to match or iteration over RegexMatch range.

    import std.range.primitives : popFrontN;
    import std.regex, std.stdio;

    auto c = matchFirst("@abc#", regex(`(\w)(\w)(\w)`));
    assert(c.pre == "@"); // Part of input preceding match
    assert(c.post == "#"); // Immediately after match
    assert(c.hit == c[0] && c.hit == "abc"); // The whole match
    writeln(c[2]); // "b"
    writeln(c.front); // "abc"
    c.popFront();
    writeln(c.front); // "a"
    writeln(c.back); // "c"
    c.popBack();
    writeln(c.back); // "b"
    popFrontN(c, 2);
    assert(c.empty);

    assert(!matchFirst("nothing", "something"));

    // Captures that are not matched will be null.
    c = matchFirst("ac", regex(`a(b)?c`));
    assert(c);
    assert(!c[1]);

pre(); post(); hit(); front(); back(); popFront(); popBack(); empty() const; opIndex()(size_t i) inout; opCast(T : bool)() const;

    import std.regex;
    assert(!matchFirst("nothing", "something"));

whichPattern() const;

    import std.regex;
    writeln(matchFirst("abc", "[0-9]+", "[a-z]+").whichPattern); // 2

opIndex(String)(String i)

    import std.regex;
    import std.range;

    auto c = matchFirst("a = 42;", regex(`(?P<var>\w+)\s*=\s*(?P<value>\d+);`));
    assert(c["var"] == "a");
    assert(c["value"] == "42");
    popFrontN(c, 2);
    // named groups are unaffected by range primitives
    assert(c["var"] == "a");
    assert(c.front == "42");

length() const; captures();

RegexMatch(R) if (isSomeString!R);

pre(); post(); hit(); front() inout; popFront(); save();

    import std.regex;
    auto m = matchAll("Hello, world!", regex(`\w+`));
    assert(m.front.hit == "Hello");
    m.popFront();
    assert(m.front.hit == "world");
    m.popFront();
    assert(m.empty);

empty() const; opCast(T : bool)(); captures() inout;

match(R, RegEx)(R input, RegEx re)

match(R, String)(R input, String re)

Matches input against regex pattern re, using Thompson NFA matching scheme.

matchFirst(R, RegEx)(R input, RegEx re)

matchFirst(R, String)(R input, String re)

matchFirst(R, String)(R input, String[] re...)

Finds the first slice of input that matches the pattern re. This function picks the most suitable regular expression engine depending on the pattern properties.

The re parameter can be one of three types: a plain string (or strings), a Regex, or a StaticRegex.

matchAll(R, RegEx)(R input, RegEx re)

matchAll(R, String)(R input, String re)

matchAll(R, String)(R input, String[] re...)

Searches for all matches of the pattern re in the given input. The result is a lazy range of matches generated as they are encountered in the input going left to right.

The re parameter can be one of three types: a plain string (or strings), a Regex, or a StaticRegex.

bmatch(R, RegEx)(R input, RegEx re)

bmatch(R, String)(R input, String re)

replaceFirst(R, C, RegEx)(R input, RegEx re, const(C)[] format)

Constructs a new string from input by replacing the first match with a string generated from it according to the format specifier.

| Parameter | Description |
|---|---|
| R input | string to search |
| RegEx re | compiled regular expression to use |
| const(C)[] format | format string to generate replacements from, see the format string. |

    writeln(replaceFirst("noon", regex("n"), "[$&]")); // "[n]oon"

replaceFirst(alias fun, R, RegEx)(R input, RegEx re)

Constructs a new string by replacing a match of pattern re in the input. Unlike the other overload, there is no format string; instead, captures are passed to a user-defined functor fun that returns a new string to use as the replacement.

This version replaces only the first match in input; see replaceAll to replace all of the matches.

Returns input with the first match replaced by the return value of fun. If no match is found, returns the input itself.

    import std.conv : to;
    import std.regex, std.stdio;

    string list = "#21 out of 46";
    string newList = replaceFirst!(cap => to!string(to!int(cap.hit) + 1))
        (list, regex(`[0-9]+`));
    writeln(newList); // "#22 out of 46"

replaceFirstInto(Sink, R, C, RegEx)(ref Sink sink, R input, RegEx re, const(C)[] format)

replaceFirstInto(alias fun, Sink, R, RegEx)(Sink sink, R input, RegEx re)

Outputs the result piece-wise to the sink. In particular this enables efficient construction of a final output incrementally.

There is an overload for substitution guided by the format string and one with the user-defined callback.

    import std.array;
    import std.regex, std.stdio;

    string m1 = "first message\n";
    string m2 = "second message\n";
    auto result = appender!string();
    replaceFirstInto(result, m1, regex(`([a-z]+) message`), "$1");
    // equivalent of the above with user-defined callback
    replaceFirstInto!(cap => cap[1])(result, m2, regex(`([a-z]+) message`));
    writeln(result.data); // "first\nsecond\n"

replaceAll(R, C, RegEx)(R input, RegEx re, const(C)[] format)

Constructs a new string from input by replacing all of the fragments that match a pattern re with a string generated from the match according to the format specifier.

| Parameter | Description |
|---|---|
| R input | string to search |
| RegEx re | compiled regular expression to use |
| const(C)[] format | format string to generate replacements from, see the format string. |

Returns input with all of the matches (if any) replaced. If no match is found, returns the input string itself.

    // insert comma as thousands delimiter
    auto re = regex(r"(?<=\d)(?=(\d\d\d)+\b)", "g");
    writeln(replaceAll("12000 + 42100 = 54100", re, ",")); // "12,000 + 42,100 = 54,100"

replaceAll(alias fun, R, RegEx)(R input, RegEx re)

Constructs a new string by replacing matches of pattern re in the input. Unlike the other overload, there is no format string; instead, captures are passed to a user-defined functor fun that returns a new string to use as the replacement.

This version replaces all of the matches found in input; see replaceFirst to replace the first match only.

Returns input with all matches replaced by return values of fun. If no matches are found, returns the input itself.

| Parameter | Description |
|---|---|
| R input | string to search |
| RegEx re | compiled regular expression |
| fun | delegate to use |

    string baz(Captures!(string) m)
    {
        import std.string : toUpper;
        return toUpper(m.hit);
    }
    // Capitalize the letters 'a' and 'r':
    auto s = replaceAll!(baz)("Strap a rocket engine on a chicken.", regex("[ar]"));
    writeln(s); // "StRAp A Rocket engine on A chicken."

replaceAllInto(Sink, R, C, RegEx)(Sink sink, R input, RegEx re, const(C)[] format)

replaceAllInto(alias fun, Sink, R, RegEx)(Sink sink, R input, RegEx re)

Outputs the result piece-wise to the sink. In particular this enables efficient construction of a final output incrementally.

    // insert comma as thousands delimiter in fifty randomly produced big numbers
    import std.array, std.conv, std.random, std.range;
    import std.regex, std.stdio;

    static re = regex(`(?<=\d)(?=(\d\d\d)+\b)`, "g");
    auto sink = appender!(char [])();
    enum ulong min = 10UL ^^ 10, max = 10UL ^^ 19;
    foreach (i; 0 .. 50)
    {
        sink.clear();
        replaceAllInto(sink, text(uniform(min, max)), re, ",");
        foreach (pos; iota(sink.data.length - 4, 0, -4))
            writeln(sink.data[pos]); // ','
    }
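
The callback overload of replaceAllInto can be combined with a single sink across several inputs. The sketch below is one possible way to do that; the input lines and the capitalization task are invented for illustration, and the sink is created with `appender!string()` so that copies passed by value share the same buffer.

```d
import std.array : appender;
import std.regex;
import std.string : toUpper;

void main()
{
    auto sink = appender!string();
    auto wordStart = regex(`\b[a-z]`);

    foreach (line; ["first line", "second line"])
    {
        // Each replacement is produced by the callback and written to the sink.
        replaceAllInto!(cap => toUpper(cap.hit))(sink, line, wordStart);
        sink.put('\n');
    }

    assert(sink.data == "First Line\nSecond Line\n");
}
```

Because the output accumulates in one appender, no intermediate string is allocated per input line.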

replace(alias scheme = match, R, C, RegEx)(R input, RegEx re, const(C)[] format)

replace(alias fun, R, RegEx)(R input, RegEx re)

The behavior depends on the flags of pattern re. With the "g" flag it performs the equivalent of replaceAll; otherwise it works the same as replaceFirst.

Splitter(Flag!"keepSeparators" keepSeparators = No.keepSeparators, Range, alias RegEx = Regex) if (isSomeString!Range && isRegexFor!(RegEx, Range));

splitter(Flag!"keepSeparators" keepSeparators = No.keepSeparators, Range, RegEx)(Range r, RegEx pat)

Splits r using a regular expression pat as a separator.

| Parameter | Description |
|---|---|
| keepSeparators | flag to specify if the matches should be in the resulting range |
| Range r | the string to split |
| RegEx pat | the pattern to split on |

    import std.algorithm.comparison : equal;

    auto s1 = ", abc, de, fg, hi, ";
    assert(equal(splitter(s1, regex(", *")),
        ["", "abc", "de", "fg", "hi", ""]));

    import std.algorithm.comparison : equal;
    import std.typecons : Yes;

    auto pattern = regex(`([\.,])`);

    assert("2003.04.05"
        .splitter!(Yes.keepSeparators)(pattern)
        .equal(["2003", ".", "04", ".", "05"]));

    assert(",1,2,3"
        .splitter!(Yes.keepSeparators)(pattern)
        .equal([",", "1", ",", "2", ",", "3"]));

front(); empty(); popFront(); save();

split(String, RegEx)(String input, RegEx rx)

An eager version of splitter that creates an array with the split slices of input.

RegexException = std.regex.internal.ir.RegexException;

escaper(Range)(Range r);

    import std.algorithm.comparison;
    import std.regex;

    string s = `This is {unfriendly} to *regex*`;
    assert(s.escaper.equal(`This is \{unfriendly\} to \*regex\*`));
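
split has no example of its own in this section, so here is a minimal sketch of how it might be used. The input string is made up; only `split` and `regex` from this module are assumed.

```d
import std.regex;

void main()
{
    // split returns the pieces as an array, with the separators removed.
    string[] fields = split("alpha, beta,gamma ,  delta", regex(`\s*,\s*`));
    assert(fields == ["alpha", "beta", "gamma", "delta"]);
}
```

Use splitter instead when the pieces should be produced one at a time as a range.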