Learn Regex


What is Regular Expression?

A regular expression is a group of characters or symbols used to find a specific pattern in a text.

A regular expression is a pattern that is matched against a subject string from left to right. Because "regular expression" is a mouthful, you will usually find the term abbreviated as "regex" or "regexp". Regular expressions are used for replacing text within a string, validating forms, extracting a substring from a string based on a pattern match, and much more.

Imagine you are writing an application and you want to set the rules for when a user chooses their username. We want to allow the username to contain letters, numbers, underscores and hyphens. We also want to limit the number of characters in the username so it does not look ugly. We use the following regular expression to validate a username:



[Figure: Regular expression]

The above regular expression accepts the strings `john_doe`, `jo-hn_doe` and `john12_as`. It does not match `Jo` because that string contains an uppercase letter and is also too short.
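As a rough sketch, the username rules described above could be checked with Python's built-in `re` module. The pattern and the length bounds (3 to 15 characters) are assumptions chosen to fit the stated rules, and `is_valid_username` is a hypothetical helper name.

```python
import re

# Assumed pattern: lowercase letters, digits, underscores and hyphens only.
# The length bounds 3..15 are illustrative assumptions, not taken from the text.
USERNAME_RE = re.compile(r"^[a-z0-9_-]{3,15}$")


def is_valid_username(name: str) -> bool:
    """Return True if the whole string satisfies the assumed username rules."""
    return USERNAME_RE.fullmatch(name) is not None


for candidate in ["john_doe", "jo-hn_doe", "john12_as", "Jo"]:
    print(candidate, is_valid_username(candidate))
```

Under these assumptions the first three candidates pass, while `Jo` fails both the lowercase rule and the minimum length.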


1. Basic Matchers

A regular expression is just a pattern of characters that we use to perform a search in a text. For example, the regular expression `the` means: the letter `t`, followed by the letter `h`, followed by the letter `e`.

"the" => The fat cat sat on the mat. (matches: "the")

The regular expression `123` matches the string `123`. The regular expression is matched against an input string by comparing each character in the regular expression to each character in the input string, one after another. Regular expressions are normally case-sensitive, so the regular expression `The` would not match the string `the`.

"The" => The fat cat sat on the mat. (matches: "The")
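A minimal Python sketch of the two basic matchers above, using the standard `re` module, illustrates the case sensitivity. The variable names are only illustrative.

```python
import re

sentence = "The fat cat sat on the mat."

# Literal patterns are compared character by character and are case-sensitive.
lower_matches = re.findall(r"the", sentence)  # finds only the lowercase word
upper_matches = re.findall(r"The", sentence)  # finds only the capitalised word

print(lower_matches)  # ['the']
print(upper_matches)  # ['The']
```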

2. Meta Characters

Meta characters are the building blocks of regular expressions. Meta characters do not stand for themselves but instead are interpreted in some special way. Some meta characters take on a special meaning when written inside square brackets. The meta characters are as follows:

Meta character  Description
.               Period. Matches any single character except a line break.
[ ]             Character class. Matches any character contained between the square brackets.
[^ ]            Negated character class. Matches any character that is not contained between the square brackets.
*               Matches 0 or more repetitions of the preceding symbol.
+               Matches 1 or more repetitions of the preceding symbol.
?               Makes the preceding symbol optional.
{n,m}           Braces. Matches at least "n" but not more than "m" repetitions of the preceding symbol.
(xyz)           Character group. Matches the characters xyz in that exact order.
|               Alternation. Matches either the characters before or the characters after the symbol.
\               Escapes the next character. This allows you to match the reserved characters [ ] ( ) { } . * + ? ^ $ \ |
^               Matches the beginning of the input.
$               Matches the end of the input.

2.1 Full stop

The full stop `.` is the simplest example of a meta character. The meta character `.` matches any single character. It will not match return or newline characters. For example, the regular expression `.ar` means: any character, followed by the letter `a`, followed by the letter `r`.

".ar" => The car parked in the garage. (matches: "car", "par", "gar")
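The same example can be tried in Python as a small sketch. The second check uses a made-up two-line string to show that, without extra flags, `.` does not match a newline.

```python
import re

# "." stands for any single character except a line break.
dot_matches = re.findall(r".ar", "The car parked in the garage.")
print(dot_matches)  # ['car', 'par', 'gar']

# Hypothetical input: the newline between "a" and "b" is not matched by ".".
newline_match = re.search(r"a.b", "a\nb")
print(newline_match)  # None
```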

2.2 Character set

Character sets are also called character classes. Square brackets are used to specify character sets. Use a hyphen inside a character set to specify a range of characters. The order of the characters listed inside the square brackets doesn't matter. For example, the regular expression `[Tt]he` means: an uppercase `T` or lowercase `t`, followed by the letter `h`, followed by the letter `e`.

"[Tt]he" => The car parked in the garage. (matches: "The", "the")

A period inside a character set, however, means a literal period. The regular expression `ar[.]` means: a lowercase character `a`, followed by the letter `r`, followed by a period `.` character.

"ar[.]" => A garage is a good place to park a car. (matches: "ar.")

2.2.1 Negated character set

In general, the caret symbol represents the start of the string, but when it is typed immediately after the opening square bracket it negates the character set. For example, the regular expression `[^c]ar` means: any character except `c`, followed by the character `a`, followed by the letter `r`.

"[^c]ar" => The car parked in the garage. (matches: "par", "gar")
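The three character-set examples above can be sketched in Python as follows. The variable names are illustrative only.

```python
import re

text = "The car parked in the garage."

# [Tt] accepts either an uppercase or a lowercase "t".
either_case = re.findall(r"[Tt]he", text)

# Inside brackets the period is a literal period.
literal_period = re.findall(r"ar[.]", "A garage is a good place to park a car.")

# A caret right after "[" negates the set: any character except "c".
not_c = re.findall(r"[^c]ar", text)

print(either_case)     # ['The', 'the']
print(literal_period)  # ['ar.']
print(not_c)           # ['par', 'gar']
```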

2.3 Repetitions

The meta characters `+`, `*` and `?` are used to specify how many times a subpattern can occur. These meta characters act differently in different situations.

2.3.1 The Star

The symbol `*` matches zero or more repetitions of the preceding matcher. The regular expression `a*` means: zero or more repetitions of the preceding lowercase character `a`. But if it appears after a character set or class, then it finds the repetitions of the whole character set. For example, the regular expression `[a-z]*` means: any number of lowercase letters in a row.

"[a-z]*" => The car parked in the garage #21. (non-empty matches: "he", "car", "parked", "in", "the", "garage")

The `*` symbol can be used with the meta character `.` to match any string of characters: `.*`. The `*` symbol can be used with the whitespace character `\s` to match a string of whitespace characters. For example, the expression `\s*cat\s*` means: zero or more whitespace characters, followed by the lowercase character `c`, followed by the lowercase character `a`, followed by the lowercase character `t`, followed by zero or more whitespace characters.

"\s*cat\s*" => The fat cat sat on the concatenation. (matches: " cat ", "cat")
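A short Python sketch of the star examples follows. Because `*` also allows zero repetitions, `re.findall` returns empty strings at positions where nothing matches, so the first example filters them out.

```python
import re

# "[a-z]*" can match the empty string, so drop the empty results.
raw = re.findall(r"[a-z]*", "The car parked in the garage #21.")
lowercase_runs = [m for m in raw if m]
print(lowercase_runs)  # ['he', 'car', 'parked', 'in', 'the', 'garage']

# Optional surrounding whitespace becomes part of the match when it is present.
cat_matches = re.findall(r"\s*cat\s*", "The fat cat sat on the concatenation.")
print(cat_matches)  # [' cat ', 'cat']
```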

2.3.2 The Plus

The symbol `+` matches one or more repetitions of the preceding character. For example, the regular expression `c.+t` means: a lowercase letter `c`, followed by at least one character, followed by the lowercase character `t`. Note that this `t` is the last `t` in the sentence, because `+` matches as many characters as possible.

"c.+t" => The fat cat sat on the mat. (matches: "cat sat on the mat")
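The following Python sketch shows the same behaviour: the match runs from the first `c` to the last `t`, and at least one character is required in between. The string `"ct"` is a made-up counterexample.

```python
import re

plus_match = re.search(r"c.+t", "The fat cat sat on the mat.")
print(plus_match.group())  # 'cat sat on the mat'

# "+" needs at least one character between "c" and "t".
too_short = re.search(r"c.+t", "ct")
print(too_short)  # None
```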

2.3.3 The Question Mark

In regular expressions, the meta character `?` makes the preceding character optional. This symbol matches zero or one instance of the preceding character. For example, the regular expression `[T]?he` means: an optional uppercase letter `T`, followed by the lowercase character `h`, followed by the lowercase character `e`.

"[T]he" => The car is parked in the garage. (matches: "The")

"[T]?he" => The car is parked in the garage. (matches: "The", "he")
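In Python, the difference between the two patterns above can be sketched like this:

```python
import re

text = "The car is parked in the garage."

required_t = re.findall(r"[T]he", text)   # the "T" must be present
optional_t = re.findall(r"[T]?he", text)  # the "T" may be missing

print(required_t)  # ['The']
print(optional_t)  # ['The', 'he']
```

The second result contains `he` because the lowercase word `the` has no uppercase `T` in front of its `he`.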

2.4 Braces

In regular expressions, braces (also called quantifiers) are used to specify the number of times that a character or a group of characters can be repeated. For example, the regular expression `[0-9]{2,3}` means: match at least 2 digits but not more than 3 (characters in the range of 0 to 9).

"[0-9]{2,3}" => The number was 9.9997 but we rounded it off to 10.0. (matches: "999", "10")

We can leave out the second number. For example, the regular expression `[0-9]{2,}` means: match 2 or more digits. If we also remove the comma, the regular expression `[0-9]{3}` means: match exactly 3 digits.

"[0-9]{2,}" => The number was 9.9997 but we rounded it off to 10.0. (matches: "9997", "10")

"[0-9]{3}" => The number was 9.9997 but we rounded it off to 10.0. (matches: "999")
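The three brace forms can be compared side by side in a small Python sketch:

```python
import re

text = "The number was 9.9997 but we rounded it off to 10.0."

two_to_three = re.findall(r"[0-9]{2,3}", text)  # between 2 and 3 digits
two_or_more = re.findall(r"[0-9]{2,}", text)    # at least 2 digits
exactly_three = re.findall(r"[0-9]{3}", text)   # exactly 3 digits

print(two_to_three)   # ['999', '10']
print(two_or_more)    # ['9997', '10']
print(exactly_three)  # ['999']
```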

2.5 Capturing Group

A capturing group is a group of sub-patterns written inside parentheses `(...)`. As discussed before, a quantifier placed after a character repeats that character. But if we put a quantifier after a capturing group, it repeats the whole capturing group. For example, the regular expression `(ab)*` matches zero or more repetitions of the characters "ab". We can also use the alternation meta character `|` inside a capturing group. For example, the regular expression `(c|g|p)ar` means: a lowercase character `c`, `g` or `p`, followed by the character `a`, followed by the character `r`.

"(c|g|p)ar" => The car is parked in the garage. (matches: "car", "par", "gar")

Note that capturing groups do not only match but also capture the characters for use in the parent language. The parent language could be Python or JavaScript or virtually any language that implements regular expressions.
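As an illustration of what "capturing for use in the parent language" can look like, here is a minimal Python sketch. Other languages expose captured groups through their own APIs.

```python
import re

text = "The car is parked in the garage."

first = re.search(r"(c|g|p)ar", text)
whole_match = first.group(0)  # the entire matched text
captured = first.group(1)     # the text captured by the first pair of parentheses

# With one capturing group, findall returns the captured text, not the whole match.
captured_all = re.findall(r"(c|g|p)ar", text)

# A quantifier after a group repeats the whole group.
repeated = re.fullmatch(r"(ab)*", "ababab") is not None

print(whole_match, captured)  # car c
print(captured_all)           # ['c', 'p', 'g']
print(repeated)               # True
```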

2.5.1 Non-capturing group

A non-capturing group is a group that only matches the characters but does not capture them. A non-capturing group is denoted by a `?` followed by a `:` immediately inside the opening parenthesis, as in `(?:...)`. For example, the regular expression `(?:c|g|p)ar` is similar to `(c|g|p)ar` in that it matches the same characters, but it will not create a capture group.

"(?:c|g|p)ar" => The car is parked in the garage. (matches: "car", "par", "gar")

Non-capturing groups can come in handy when used in find-and-replace functionality, or when mixed with capturing groups, to keep track of the captures when producing any other kind of output. See also [4. Lookaround](#4-lookaround).
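A brief Python sketch of the difference: the compiled pattern's `groups` attribute reports how many capture groups exist, and `re.findall` returns whole matches when there are none.

```python
import re

text = "The car is parked in the garage."

# No capture group is created, so findall returns the whole matches.
whole_matches = re.findall(r"(?:c|g|p)ar", text)

non_capturing_count = re.compile(r"(?:c|g|p)ar").groups
capturing_count = re.compile(r"(c|g|p)ar").groups

print(whole_matches)                         # ['car', 'par', 'gar']
print(non_capturing_count, capturing_count)  # 0 1
```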

2.6 Alternation

In a regular expression, the vertical bar `|` is used to define alternation. Alternation is like an OR statement between multiple expressions. Now, you may be thinking that character sets and alternation work the same way. But the big difference is that a character set works at the character level, while alternation works at the expression level. For example, the regular expression `(T|t)he|car` means: either (an uppercase character `T` or lowercase `t`, followed by the lowercase character `h`, followed by the lowercase character `e`) OR (a lowercase character `c`, followed by the lowercase character `a`, followed by the lowercase character `r`). Note that the parentheses in this description are included for clarity, to show that either expression in parentheses can be met and it will match.

"(T|t)he|car" => The car is parked in the garage. (matches: "The", "car", "the")
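A small Python sketch of the alternation example follows. `re.finditer` is used so that the whole match is reported rather than the contents of the capture group.

```python
import re

text = "The car is parked in the garage."

# Either "(T|t)he" or "car" may match at each position.
alternation = [m.group() for m in re.finditer(r"(T|t)he|car", text)]
print(alternation)  # ['The', 'car', 'the']

# A character set expresses the single-character choice more compactly.
with_set = re.findall(r"[Tt]he|car", text)
print(with_set)  # ['The', 'car', 'the']
```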

2.7 Escaping special character

The backslash `\` is used in regular expressions to escape the next character. This allows us to specify a symbol as a matching character, including the reserved characters `{ } [ ] / \ + * . $ ^ | ?`. To use a special character as a matching character, prepend `\` to it.

For example, the regular expression `.` is used to match any character except a newline. To match a literal `.` in an input string, it must be escaped: the regular expression `(f|c|m)at\.?` means: a lowercase letter `f`, `c` or `m`, followed by the lowercase character `a`, followed by the lowercase letter `t`, followed by an optional `.` character.

"(f|c|m)at\.?" => The fat cat sat on the mat. (matches: "fat", "cat", "mat.")
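The effect of escaping can be sketched in Python as follows. The strings `"10x0"` and `"10.0"` are made-up inputs, and `re.escape` is the standard-library helper that escapes special characters automatically.

```python
import re

sentence = "The fat cat sat on the mat."
escaped_dot = [m.group() for m in re.finditer(r"(f|c|m)at\.?", sentence)]
print(escaped_dot)  # ['fat', 'cat', 'mat.']

# An unescaped "." matches any character, an escaped one only a literal period.
unescaped_hits = re.search(r"10.0", "10x0") is not None
escaped_hits = re.search(r"10\.0", "10x0") is not None
print(unescaped_hits, escaped_hits)  # True False

# re.escape builds the escaped pattern from a literal string.
auto_escaped = re.escape("10.0")
print(auto_escaped)  # 10\.0
```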

2.8 Anchors

In regular expressions, we use anchors to check whether the matching symbol is the first or the last symbol of the input string. There are two types of anchors: the caret `^`, which checks whether the matching character is the first character of the input, and the dollar sign `$`, which checks whether the matching character is the last character of the input string.

2.8.1 Caret

The caret symbol `^` is used to check whether the matching character is the first character of the input string. If we apply the regular expression `^a` (is `a` the starting symbol?) to the input string `abc`, it matches `a`. But if we apply the regular expression `^b` to the same input string, it does not match anything, because in the input string `abc` the character "b" is not the starting symbol. Let's take a look at another regular expression, `^(T|t)he`, which means: an uppercase character `T` or lowercase character `t` is the start symbol of the input string, followed by the lowercase character `h`, followed by the lowercase character `e`.

"(T|t)he" => The car is parked in the garage. (matches: "The", "the")

"^(T|t)he" => The car is parked in the garage. (matches: "The")

2.8.2 Dollar

The dollar symbol `$` is used to check whether the matching character is the last character of the input string. For example, the regular expression `(at\.)$` means: a lowercase character `a`, followed by the lowercase character `t`, followed by a `.` character, and the match must be at the end of the string.

"(at\.)" => The fat cat. sat. on the mat. (matches: "at." three times, in "cat.", "sat." and "mat.")

"(at\.)$" => The fat cat. sat. on the mat. (matches: only the final "at.")
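Both anchors can be tried in a short Python sketch. Non-capturing groups are used here so that `re.findall` returns the whole match.

```python
import re

text = "The car is parked in the garage."
anywhere = re.findall(r"(?:T|t)he", text)   # matches at any position
at_start = re.findall(r"^(?:T|t)he", text)  # only at the start of the input

ending = "The fat cat. sat. on the mat."
unanchored = re.findall(r"at\.", ending)  # every "at." in the string
at_end = re.findall(r"at\.$", ending)     # only the one at the end

print(anywhere, at_start)       # ['The', 'the'] ['The']
print(len(unanchored), at_end)  # 3 ['at.']
```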

3. Shorthand Character Sets

Regular expressions provide shorthands for commonly used character sets. The shorthand character sets are as follows:

Shorthand  Description
.          Any character except new line
\w         Matches alphanumeric characters: [a-zA-Z0-9_]
\W         Matches non-alphanumeric characters: [^\w]
\d         Matches digit: [0-9]
\D         Matches non-digit: [^\d]
\s         Matches whitespace character: [\t\n\f\r\p{Z}]
\S         Matches non-whitespace character: [^\s]
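A few of these shorthands can be tried in Python as a rough sketch. The sample strings are made up for illustration. Note that Python's `re` module does not support the `\p{Z}` notation, and for text patterns `\w`, `\d` and `\s` also match non-ASCII Unicode characters unless the `re.ASCII` flag is given.

```python
import re

# Hypothetical sample inputs, chosen only to illustrate each shorthand.
digits = re.findall(r"\d+", "Order 66 shipped in 2 boxes")
words = re.findall(r"\w+", "snake_case and 42!")
non_digits = re.findall(r"\D+", "a1b22c")
parts = re.split(r"\s+", "a  b\tc\nd")

print(digits)      # ['66', '2']
print(words)       # ['snake_case', 'and', '42']
print(non_digits)  # ['a', 'b', 'c']
print(parts)       # ['a', 'b', 'c', 'd']
```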

4. Lookaround

Lookbehinds and lookaheads (together called lookarounds) are specific types of non-capturing groups (used to match a pattern without including it in the match). Lookarounds are used when a pattern must be preceded or followed by another pattern. For example, imagine we want to get all the numbers that are preceded by the `$` character from the input string `$4.44 and $10.88`. We will use the regular expression `(?<=\$)[0-9\.]*`, which means: get all the numbers, which may contain the `.` character, that are preceded by the `$` character. These are the lookarounds used in regular expressions:

Symbol  Description
?=      Positive Lookahead
?!      Negative Lookahead
?<=     Positive Lookbehind
?<!     Negative Lookbehind
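The dollar-amount example from the introduction of this section can be sketched in Python like this. The `$` sign is required before each number but is not part of the returned match.

```python
import re

prices = re.findall(r"(?<=\$)[0-9\.]*", "$4.44 and $10.88")
print(prices)  # ['4.44', '10.88']
```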

4.1 Positive Lookahead

The positive lookahead asserts that the first part of the expression must be followed by the lookahead expression. The returned match only contains the text that is matched by the first part of the expression. To define a positive lookahead, parentheses are used. Within those parentheses, a question mark with an equals sign is used like this: `(?=...)`. The lookahead expression is written after the equals sign inside the parentheses. For example, the regular expression `(T|t)he(?=\sfat)` means: match either a lowercase letter `t` or an uppercase letter `T`, followed by the letter `h`, followed by the letter `e`. In parentheses we define a positive lookahead, which tells the regular expression engine to match `The` or `the` only when it is followed by a whitespace character and the word `fat`.

"(T|t)he(?=\sfat)" => The fat cat sat on the mat. (matches: "The")

4.2 Negative Lookahead

A negative lookahead is used when we need to get all matches from an input string that are not followed by a certain pattern. A negative lookahead is defined in the same way as a positive lookahead; the only difference is that instead of the equals character `=` we use the negation character `!`, i.e. `(?!...)`. Let's take a look at the regular expression `(T|t)he(?!\sfat)`, which means: get all `The` or `the` words from the input string that are not followed by a whitespace character and the word `fat`.

"(T|t)he(?!\sfat)" => The fat cat sat on the mat. (matches: "the")

4.3 Positive Lookbehind

A positive lookbehind is used to get all the matches that are preceded by a specific pattern. A positive lookbehind is denoted by `(?<=...)`. For example, the regular expression `(?<=(T|t)he\s)(fat|mat)` means: get all `fat` or `mat` words from the input string that come after the word `The` or `the`.

"(?<=(T|t)he\s)(fat|mat)" => The fat cat sat on the mat. (matches: "fat", "mat")

4.4 Negative Lookbehind

A negative lookbehind is used to get all the matches that are not preceded by a specific pattern. A negative lookbehind is denoted by `(?<!...)`. For example, the regular expression `(?<!(T|t)he\s)(cat)` means: get all `cat` words from the input string that do not come after the word `The` or `the`.

"(?<!(T|t)he\s)(cat)" => The cat sat on cat. (matches: only the second "cat")
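The four lookaround examples can be run together in a small Python sketch. `all_matches` is a hypothetical helper that returns the whole matched text. Python's `re` module requires lookbehind patterns to have a fixed width, which these patterns satisfy.

```python
import re


def all_matches(pattern: str, text: str) -> list:
    """Hypothetical helper: return the whole text of every match."""
    return [m.group() for m in re.finditer(pattern, text)]


sentence = "The fat cat sat on the mat."

positive_ahead = all_matches(r"(T|t)he(?=\sfat)", sentence)
negative_ahead = all_matches(r"(T|t)he(?!\sfat)", sentence)
positive_behind = all_matches(r"(?<=(T|t)he\s)(fat|mat)", sentence)

# Only the second "cat" is not preceded by "The " or "the ".
negative_behind = [m.start() for m in re.finditer(r"(?<!(T|t)he\s)(cat)", "The cat sat on cat.")]

print(positive_ahead)   # ['The']
print(negative_ahead)   # ['the']
print(positive_behind)  # ['fat', 'mat']
print(negative_behind)  # [15]
```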

5. Flags

Flags are also called modifiers because they modify the output of a regular expression. These flags can be used in any order or combination, and are an integral part of the RegExp.

Flag  Description
i     Case insensitive: Sets matching to be case-insensitive.
g     Global search: Searches for a pattern throughout the input string.
m     Multiline: Anchor meta characters work on each line.

5.1 Case Insensitive

The `i` modifier is used to perform case-insensitive matching. For example, the regular expression `/The/gi` means: an uppercase letter `T`, followed by the lowercase character `h`, followed by the character `e`. At the end of the regular expression, the `i` flag tells the regular expression engine to ignore the case. As you can see, we also provided the `g` flag because we want to search for the pattern in the whole input string.

"The" => The fat cat sat on the mat. (matches: "The")

"/The/gi" => The fat cat sat on the mat. (matches: "The", "the")

5.2 Global search

The `g` modifier is used to perform a global match (find all matches rather than stopping after the first match). For example, the regular expression `/.(at)/g` means: any character except a new line, followed by the lowercase character `a`, followed by the lowercase character `t`. Because we provided the `g` flag at the end of the regular expression, it will now find all matches in the input string, not just the first one (which is the default behavior).

"/.(at)/" => The fat cat sat on the mat. (matches: "fat")

"/.(at)/g" => The fat cat sat on the mat. (matches: "fat", "cat", "sat", "mat")

5.3 Multiline

The `m` modifier is used to perform a multi-line match. As we discussed earlier, the anchors `^` and `$` are used to check whether a pattern is at the beginning or at the end of the input string. But if we want the anchors to work on each line, we use the `m` flag. For example, the regular expression `/.at(.)?$/gm` means: any character except a new line, followed by the lowercase character `a`, followed by the lowercase character `t`, optionally followed by any character except a new line. Because of the `m` flag, the regular expression engine now matches the pattern at the end of each line in a string.

"/.at(.)?$/" => The fat
                cat sat
                on the mat.
(matches: "mat.")

"/.at(.)?$/gm" => The fat
                  cat sat
                  on the mat.
(matches: "fat", "sat", "mat.")
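Python has no `/pattern/flags` literal, so the sketch below approximates the three flags with what the `re` module offers: `re.IGNORECASE` for `i`, `re.MULTILINE` for `m`, and the choice between `re.search` (first match only) and `re.findall` (all matches) in place of `g`. The optional group `(.)?` is written as `.?` so that `re.findall` returns whole matches.

```python
import re

sentence = "The fat cat sat on the mat."

# i: ignore case.
case_insensitive = re.findall(r"The", sentence, flags=re.IGNORECASE)

# g: search stops at the first match, findall collects every match.
first_only = re.search(r".at", sentence).group()
every_match = re.findall(r".at", sentence)

# m: "$" matches at the end of each line instead of only at the end of the input.
lines = "The fat\ncat sat\non the mat."
end_of_input = re.findall(r".at.?$", lines)
end_of_lines = re.findall(r".at.?$", lines, flags=re.MULTILINE)

print(case_insensitive)  # ['The', 'the']
print(first_only)        # fat
print(every_match)       # ['fat', 'cat', 'sat', 'mat']
print(end_of_input)      # ['mat.']
print(end_of_lines)      # ['fat', 'sat', 'mat.']
```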

6. Greedy vs lazy matching

By default, a regex performs greedy matching, which means it matches as much as possible. We can use `?` after a quantifier to match lazily, which means as little as possible.

"/(.*at)/" => The fat cat sat on the mat. (matches: "The fat cat sat on the mat")

"/(.*?at)/" => The fat cat sat on the mat. (matches: "The fat")
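The contrast between greedy and lazy matching can be sketched in Python as follows:

```python
import re

sentence = "The fat cat sat on the mat."

greedy = re.search(r".*at", sentence).group()  # extends to the last "at"
lazy = re.search(r".*?at", sentence).group()   # stops at the first "at"

print(greedy)  # The fat cat sat on the mat
print(lazy)    # The fat
```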

Contribution

  • Open pull request with improvements
  • Discuss ideas in issues
  • Spread the word
  • Reach out with any feedback

License

MIT © Zeeshan Ahmad
