In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a match." The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, outputting some component of the matched pattern, and substituting the matching pattern with some other token sequence (i.e., search and replace).
Sequence patterns (e.g., a text string) are often described using regular expressions and matched using techniques such as backtracking.
Tree patterns are used in some programming languages as a general tool to process data based on its structure. For example, C#,[1] F#,[2] Haskell,[3] Java,[4] ML, Python,[5] Racket,[6] Ruby,[7] Rust,[8] Scala,[9] Swift[10] and the symbolic mathematics language Mathematica have special syntax for expressing tree patterns and a language construct for conditional execution and value retrieval based on it.
Often it is possible to give alternative patterns that are tried one by one, which yields a powerful conditional programming construct. Pattern matching sometimes includes support for guards.
Early programming languages with pattern matching constructs include COMIT (1957), SNOBOL (1962), Refal (1968) with tree-based pattern matching, Prolog (1972), St Andrews Static Language (SASL) (1976), NPL (1977), and Kent Recursive Calculator (KRC) (1981).
The pattern matching feature of function arguments in the language ML (1973) and its dialect Standard ML (1983) has been carried over to some other functional programming languages that were influenced by them, such as Haskell (1990), Scala (2004), and F# (2005). The pattern matching construct with the match keyword that was introduced in the ML dialect Caml (1985) was followed by languages such as OCaml (1996), F# (2005), F* (2011), and Rust (2015).
Many text editors support pattern matching of various kinds: the QED editor supports regular expression search, and some versions of TECO support the OR operator in searches.
Computer algebra systems generally support pattern matching on algebraic expressions.[11]
Pattern matching involves specialized terminology.
While some concepts are relatively common to many pattern languages, other pattern languages include unique or unusual extensions.
match v { (a, b) => ... } expects v to be a pair, and a and b are bindings bringing variables of the same name into scope in the continuation ("...").
_, the wildcard pattern, accepts all values without examining them further, ignoring their structure. Also known as discard, the wild pattern, the catch-all pattern, or as a hole.
(list (? even?) ...) first expects a list, and then applies the predicate even? to each element; the overall pattern thus succeeds only when the scrutinee is a list of even numbers.
(== expr) in Racket compares the value against the result of evaluating expr. In Erlang, mention of any variable already in scope in a pattern causes it to act as a constraint in this way (instead of as a binding).
123 or "hello" are called literal patterns.
The simplest pattern in pattern matching is an explicit value or a variable. For an example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, = is not assignment but definition):
f 0 = 1
Here, 0 is a single value pattern. Now, whenever f is given 0 as argument the pattern matches and the function returns 1. With any other argument, the matching and thus the function fail. As the syntax supports alternative patterns in function definitions, we can continue the definition extending it to take more generic arguments:
f n = n * f (n - 1)
Here, the first n is a single variable pattern, which will match absolutely any argument and bind it to name n to be used in the rest of the definition. In Haskell (unlike at least Hope), patterns are tried in order so the first definition still applies in the very specific case of the input being 0, while for any other argument the function returns n * f (n-1) with n being the argument.
The wildcard pattern (often written as _) is also simple: like a variable name, it matches any value, but does not bind the value to any name. Algorithms for matching wildcards in simple string-matching situations have been developed in a number of recursive and non-recursive varieties.[15]
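The literal, variable, and wildcard patterns described above can be put together in a small, self-contained sketch. The names factorial and isZero are chosen here for illustration and do not come from the article; factorial is the function f from the text under a more descriptive name.

```haskell
-- Clauses are tried top to bottom, so the literal pattern 0 is checked first.
factorial :: Integer -> Integer
factorial 0 = 1
-- The variable pattern n matches any other argument and binds it to n.
factorial n = n * factorial (n - 1)

-- The wildcard _ matches any argument without binding it to a name.
isZero :: Integer -> Bool
isZero 0 = True
isZero _ = False
```

As in the text, the first clause only applies to the input 0. This sketch assumes non-negative arguments: a negative argument never reaches the 0 clause, so the recursion would not terminate.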
More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference then is that with variable and wildcard parts, a pattern does not build into a single value, but matches a group of values that are the combination of the concrete elements and the elements that are allowed to vary within the structure of the pattern.
A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of the abstract syntax tree of a programming language and algebraic data types.
In Haskell, the following line defines an algebraic data type Color that has a single data constructor ColorConstructor that wraps an integer and a string.
data Color = ColorConstructor Integer String
The constructor is a node in a tree and the integer and string are leaves in branches.
To make Color an abstract data type, we write functions that interface with it, and thus need to extract some data from it, for example just the string or just the integer part of Color.
If we pass a variable that is of type Color, how can we get the data out of this variable? For example, for a function to get the integer part of Color, we can use a simple tree pattern and write:
integerPart (ColorConstructor theInteger _) = theInteger
As well:
stringPart (ColorConstructor _ theString) = theString
The creation of these functions can be automated by Haskell's data record syntax.
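As a minimal sketch of the two approaches, the block below repeats the hand-written accessors and then shows a record declaration for which the compiler generates the accessors. The names ColorRecord, colorCode, and colorName are hypothetical and chosen only for this illustration.

```haskell
-- Positional constructor with hand-written accessors, as in the text.
data Color = ColorConstructor Integer String

integerPart :: Color -> Integer
integerPart (ColorConstructor theInteger _) = theInteger

stringPart :: Color -> String
stringPart (ColorConstructor _ theString) = theString

-- Record syntax: the accessor functions colorCode and colorName are
-- generated automatically from the field names.
-- (ColorRecord, colorCode and colorName are hypothetical names.)
data ColorRecord = ColorRecord
  { colorCode :: Integer
  , colorName :: String
  }
```

With these definitions, colorName (ColorRecord 3 "red") plays the same role as stringPart (ColorConstructor 3 "red").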
This OCaml example, which defines a red–black tree and a function to re-balance it after element insertion, shows how to match on a more complex structure generated by a recursive data type. The compiler verifies at compile-time that the list of cases is exhaustive and none are redundant.
type color = Red | Black
type 'a tree = Empty | Tree of color * 'a tree * 'a * 'a tree

let rebalance t = match t with
    | Tree (Black, Tree (Red, Tree (Red, a, x, b), y, c), z, d)
    | Tree (Black, Tree (Red, a, x, Tree (Red, b, y, c)), z, d)
    | Tree (Black, a, x, Tree (Red, Tree (Red, b, y, c), z, d))
    | Tree (Black, a, x, Tree (Red, b, y, Tree (Red, c, z, d)))
        -> Tree (Red, Tree (Black, a, x, b), y, Tree (Black, c, z, d))
    | _ -> t (* the 'catch-all' case if no previous pattern matches *)
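For comparison with the Haskell examples used elsewhere in this article, the same re-balancing step can be sketched in Haskell. Haskell 2010 has no or-patterns, so each of the four shapes gets its own clause; the shared right-hand side is factored into a helper named balanced, which is a hypothetical name introduced here.

```haskell
data Color = Red | Black deriving (Eq, Show)

data Tree a = Empty | Tree Color (Tree a) a (Tree a) deriving (Eq, Show)

-- One clause per unbalanced shape; all four produce the same result.
rebalance :: Tree a -> Tree a
rebalance (Tree Black (Tree Red (Tree Red a x b) y c) z d) = balanced a x b y c z d
rebalance (Tree Black (Tree Red a x (Tree Red b y c)) z d) = balanced a x b y c z d
rebalance (Tree Black a x (Tree Red (Tree Red b y c) z d)) = balanced a x b y c z d
rebalance (Tree Black a x (Tree Red b y (Tree Red c z d))) = balanced a x b y c z d
-- Catch-all clause: any other tree is returned unchanged.
rebalance t = t

-- Hypothetical helper: the balanced tree shared by all four cases.
balanced :: Tree a -> a -> Tree a -> a -> Tree a -> a -> Tree a -> Tree a
balanced a x b y c z d = Tree Red (Tree Black a x b) y (Tree Black c z d)
```

For instance, a black root whose left child and left grandchild are both red is rewritten into a red root with two black children, while a tree that matches none of the four shapes passes through the final clause untouched.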
Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a list comprehension could be used for this kind of filtering:
[A x | A x <- [A 1, B 1, A 2, B 2]]
evaluates to
[A 1, A 2]
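The comprehension above presupposes a data type with constructors A and B, which the article does not define. A minimal sketch, assuming a hypothetical type Item with two integer-carrying constructors, could look like this:

```haskell
-- Item is an assumed type; the article only shows the constructors A and B in use.
data Item = A Int | B Int deriving (Eq, Show)

-- Elements for which the pattern A x fails to match are skipped silently.
onlyA :: [Item] -> [Item]
onlyA items = [A x | A x <- items]
```

Here onlyA is a hypothetical name for the filter. A failed pattern match in a generator does not raise an error; the element is simply left out of the result.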
In Mathematica, the only structure that exists is the tree, which is populated by symbols. In the Haskell syntax used thus far, this could be defined as
data SymbolTree = Symbol String [SymbolTree]
An example tree could then look like
Symbol "a" [Symbol "b" [], Symbol "c" []]
In the traditional, more suitable syntax, the symbols are written as they are and the levels of the tree are represented using [], so that for instance a[b,c] is a tree with a as the parent, and b and c as the children.
A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern
A[_]
will match elements such as A[1], A[2], or more generally A[x] where x is any entity. In this case, A is the concrete element, while _ denotes the piece of tree that can be varied. A symbol prepended to _ binds the match to that variable name while a symbol appended to _ restricts the matches to nodes of that symbol. Note that even blanks themselves are internally represented as Blank[] for _ and Blank[x] for _x.
The Mathematica function Cases filters elements of the first argument that match the pattern in the second argument:[16]
Cases[{a[1],b[1],a[2],b[2]},a[_]]
evaluates to
{a[1],a[2]}
Pattern matching applies to the structure of expressions. In the example below,
Cases[{a[b],a[b,c],a[b[c],d],a[b[c],d[e]],a[b[c],d,e]},a[b[_],_]]
returns
{a[b[c],d],a[b[c],d[e]]}
because only these elements will match the pattern a[b[_],_] above.
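The same structural filter can be approximated in Haskell using the SymbolTree encoding introduced earlier. This is only a sketch of the idea, not how Cases is implemented: the pattern is fixed in the code rather than passed as an argument, and the names leaf and casesABlank are hypothetical.

```haskell
data SymbolTree = Symbol String [SymbolTree] deriving (Eq, Show)

-- Hypothetical helper: a node with no children.
leaf :: String -> SymbolTree
leaf name = Symbol name []

-- Keep only trees of the shape a[b[_], _]: a root "a" with exactly two
-- children, the first of which is a "b" node with exactly one child.
casesABlank :: [SymbolTree] -> [SymbolTree]
casesABlank trees =
  [ t | t@(Symbol "a" [Symbol "b" [_], _]) <- trees ]
```

Applied to the Haskell encodings of the five expressions in the Cases example, this keeps only the encodings of a[b[c],d] and a[b[c],d[e]]: the others have the wrong number of children at the root or under b.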
In Mathematica, it is also possible to extract structures as they are created in the course of computation, regardless of how or where they appear. The function Trace can be used to monitor a computation, and return the elements that arise which match a pattern. For example, we can define the Fibonacci sequence as
fib[0|1]:=1
fib[n_]:=fib[n-1]+fib[n-2]
Then, we can ask the question: Given fib[3], what is the sequence of recursive Fibonacci calls?
Trace[fib[3],fib[_]]
returns a structure that represents the occurrences of the pattern fib[_] in the computational structure:
{fib[3],{fib[2],{fib[1]},{fib[0]}},{fib[1]}}
In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate.
For instance, the Mathematica function Compile can be used to make more efficient versions of the code. In the following example the details do not particularly matter; what matters is that the subexpression {{com[_], Integer}} instructs Compile that expressions of the form com[_] can be assumed to be integers for the purposes of compilation:
com[i_] := Binomial[2i, i]
Compile[{x, {i, _Integer}}, x^com[i], {{com[_], Integer}}]
Mailboxes in Erlang also work this way.
The Curry–Howard correspondence between proofs and programs relates ML-style pattern matching to case analysis and proof by exhaustion.
By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.
However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.
In Mathematica, strings are represented as trees of root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed in contrast to _ that would match only a single character.
In Haskell and functional programming languages in general, strings are represented as functional lists of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax:
[] -- an empty list
x:xs -- an element x constructed on a list xs
The structure for a list with some elements is thus element:list. When pattern matching, we assert that a certain piece of data is equal to a certain pattern. For example, in the function:
head (element:list) = element
We assert that the first element of head's argument is called element, and the function returns this. We know that this is the first element because of the way lists are defined, a single element constructed onto a list. This single element must be the first. The empty list would not match the pattern at all, as an empty list does not have a head (the first element that is constructed).
In the example, we have no use for list, so we can disregard it, and thus write the function:
head (element:_) = element
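Because the empty list matches neither version of head, calling it on [] fails at run time. One common way to make the function total, sketched here under the hypothetical name firstElement (chosen to avoid clashing with the Prelude's head), is to add a second clause for the empty list and return a Maybe value:

```haskell
-- firstElement is a hypothetical name; it is not the Prelude's head.
firstElement :: [a] -> Maybe a
-- A non-empty list: bind the first element, ignore the rest with a wildcard.
firstElement (element : _) = Just element
-- The empty list has no first element.
firstElement []            = Nothing
```

The two clauses together cover both ways a list can be constructed, so every argument matches exactly one of them.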
The equivalent Mathematica transformation is expressed as
head[element, ]:=element
In Mathematica, for instance,
StringExpression["a",_]
will match a string that has two characters and begins with "a".
The same pattern in Haskell:
['a',_]
Symbolic entities can be introduced to represent many different classes of relevant features of a string. For instance,
StringExpression[LetterCharacter, DigitCharacter]
will match a string that consists of a letter first, and then a number.
In Haskell, guards could be used to achieve the same matches:
[letter, digit] | isAlpha letter && isDigit digit
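The two Haskell patterns above are fragments rather than complete definitions. A minimal sketch that embeds them in functions, using the hypothetical names startsWithATwoChars and letterThenDigit and the standard predicates isAlpha and isDigit from Data.Char, might read:

```haskell
import Data.Char (isAlpha, isDigit)

-- Matches a string of exactly two characters whose first character is 'a'.
startsWithATwoChars :: String -> Bool
startsWithATwoChars ['a', _] = True
startsWithATwoChars _        = False

-- Matches a letter followed by a digit; the guard tests the bound names.
-- If the guard fails, matching falls through to the next clause.
letterThenDigit :: String -> Bool
letterThenDigit [letter, digit]
  | isAlpha letter && isDigit digit = True
letterThenDigit _ = False
```

The list pattern fixes the length of the string at two characters, and the guard then restricts which characters are accepted.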
The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to build up the patterns themselves or analyze and transform the programs that contain them.
SNOBOL (StriNg Oriented and symBOlic Language) is a computer programming language developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky.
SNOBOL4 stands apart from most programming languages by having patterns as a first-class data type (i.e. a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. Strings generated during execution can be treated as programs and executed.
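The idea of patterns as ordinary values that can be combined by concatenation and alternation can be illustrated, very loosely, in Haskell. The sketch below is not SNOBOL4 syntax or semantics: the type Pattern and the functions lit, cat, alt, and matchesWhole are hypothetical names, and matching is anchored at the start of the input for simplicity.

```haskell
-- A pattern maps an input string to every remainder left after a successful
-- match at the start of the input. (All names here are hypothetical.)
newtype Pattern = Pattern { runPattern :: String -> [String] }

-- Match a literal string at the start of the input.
lit :: String -> Pattern
lit s = Pattern $ \input ->
  case splitAt (length s) input of
    (prefix, rest) | prefix == s -> [rest]
    _                            -> []

-- Concatenation: match p, then match q on each remainder.
cat :: Pattern -> Pattern -> Pattern
cat p q = Pattern $ \input -> concatMap (runPattern q) (runPattern p input)

-- Alternation: try both patterns and keep all results.
alt :: Pattern -> Pattern -> Pattern
alt p q = Pattern $ \input -> runPattern p input ++ runPattern q input

-- A pattern matches the whole input if some remainder is empty.
matchesWhole :: Pattern -> String -> Bool
matchesWhole p = any null . runPattern p
```

Since a Pattern is an ordinary value, it can be stored in a variable, passed to functions, and built up at run time, for example cat (alt (lit "hello") (lit "hi")) (lit " world").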
SNOBOL was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the humanities.
Since SNOBOL's creation, newer languages such as AWK and Perl have made string manipulation by means of regular expressions fashionable. SNOBOL4 patterns, however, subsume Backus–Naur form (BNF) grammars, which are equivalent to context-free grammars and more powerful than regular expressions.[17]