NDN Regular Expression

NDN regular expression is a kind of regular expression that can match NDN names. Matching isperformed at two levels: the name level and the name component level.

A name component matcher, enclosed in< and>, specifies the pattern of a name component. Thecomponent pattern is expressed with thePerl Regular Expression Syntax.For example,<ab*c> matches the 1st, 3rd, and 4th components of/ac/dc/abc/abbc, but doesnot match the 2nd component. A special case is that<> denotes a wildcard matcher that can matchANY name component.

A component matcher can match only one name component. To match a name, you need to compose an NDNregular expression with zero or more name component matchers. For example,<ndn><edu><ucla>matches the name/ndn/edu/ucla. To describe a more complicated name pattern, we borrow somesyntaxes from the standard regular expressions.

NDN Regex Syntax

Anchors

The^ character matches the start of a name. For example,^<ndn> matches any name startingwith the componentndn, but does not match a name like/local/broadcast.

The$ character matches the end of a name. For example,^<ndn><edu>$ matches only onename:/ndn/edu.

Repetition

A component matcher can be followed by a repeat quantifier to indicate how many times the precedingcomponent may appear.

The* quantifier denotes “zero or more times”. For example,^<A><B>*<C>$ matches/A/C,/A/B/C,/A/B/B/C, and so on.

The+ quantifier denotes “one or more times”. For example,^<A><B>+<C>$ matches/A/B/C,/A/B/B/C, and so on, but does not match/A/C.

The? quantifier denotes “zero or one time”. For example,^<A><B>?<C> matches/A/C and/A/B/C, but does not match/A/B/B/C.

A bounded quantifier specifies a minimum and maximum number of permitted matches:{n} denotes“exactlyn times”;{n,} denotes “at leastn times”;{,n} denotes “at mostntimes”;{n,m} denotes “betweenn andm times (inclusive)”. For example,^<A><B>{2,4}<C>$ matches/A/B/B/C and/A/B/B/B/B/C.

Note that the quantifiers aregreedy, which means it will consume as many matched components aspossible. NDN regular expressions currently do not support non-greedy repeat matching and possessiverepeat matching. For example, for the name/A/B/C/C/C,^<A><B><C>+$ will match the entirename instead of only/A/B/C.

Sets

A name component set, denoted by a bracket expression starting with[ and ending with],defines a set of name components. It matches any single name component that is a member of that set.

Unlike standard regular expressions, NDN regular expression only supportsSingle Components Set,that is, you have to list all the set members one by one between the brackets. For example,^[<ndn><localhost>] matches any names starting with eitherndn" orlocalhost component.

When a name component set starts with a'^', the set becomes aNegation Set. It matches thecomplement of the contained name components. For example,^[^<ndn>] matches any non-empty namethat does not start withndn component.

Some other types of sets, such as Range Set, will be supported later.

Note that component sets may be repeated in the same way as component matchers.

Sub-pattern and Back Reference

A section beginning( and ending) acts as a marked sub-pattern. Whatever matched thesub-pattern is split out in a separate field by the matching algorithm. For example^<A>(<>{2})<B>(<>) matches the name/A/C/D/B/E, and the first sub-pattern capturesC/D.

Marked sub-patterns can be referred to by a back-reference\N, which references one or morecapturing groups. In the example above, a back reference\1\2 extracts/C/D/E out of thename.

Marked sub-patterns can also be repeated. The regex engine does not permanently substituteback-references in a regular expression, but will use the last match saved into the back-reference.If a new match is found by capturing parentheses, the previous match is overwritten. For example,both^([<A><B><C>]+)$ and^([<A><B><C>])+$ match/C/A/B. However, the former regexstoresC/A/B as the first back-reference, while the latter one storesB. That is because the+ quantifier in the latter NDN regular expression causes the marked sub-pattern to be matchedthree times, andB is the last saved match.