Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 635 – Structural Pattern Matching: Motivation and Rationale

PEP 635 – Structural Pattern Matching: Motivation and Rationale

Author:
Tobias Kohn <kohnt at tobiaskohn.ch>,Guido van Rossum <guido at python.org>
BDFL-Delegate:

Discussions-To:
Python-Dev list
Status:
Final
Type:
Informational
Created:
12-Sep-2020
Python-Version:
3.10
Post-History:
22-Oct-2020, 08-Feb-2021
Resolution:
Python-Committers message

Table of Contents

Abstract

This PEP provides the motivation and rationale forPEP 634(“Structural Pattern Matching: Specification”). First-time readersare encouraged to start withPEP 636, which provides a gentlerintroduction to the concepts, syntax and semantics of patterns.

Motivation

(Structural) pattern matching syntax is found in many languages, fromHaskell, Erlang and Scala to Elixir and Ruby. (A proposal forJavaScript is also under consideration.)

Python already supports a limited form of this through sequenceunpacking assignments, which the new proposal leverages.

Several other common Python idioms are also relevant:

  • Theif...elif...elif...else idiom is often used to findout the type or shape of an object in an ad-hoc fashion, using oneor more checks likeisinstance(x,cls),hasattr(x,"attr"),len(x)==n or"key"inx as guards to select an applicableblock. The block can then assumex supports the interfacechecked by the guard. For example:
    ifisinstance(x,tuple)andlen(x)==2:host,port=xmode="http"elifisinstance(x,tuple)andlen(x)==3:host,port,mode=x# Etc.

    Code like this is more elegantly rendered usingmatch:

    matchx:casehost,port:mode="http"casehost,port,mode:pass# Etc.
  • AST traversal code often looks for nodes matching a given pattern,for example the code to detect a node of the shape “A + B * C” mightlook like this:
    if(isinstance(node,BinOp)andnode.op=="+"andisinstance(node.right,BinOp)andnode.right.op=="*"):a,b,c=node.left,node.right.left,node.right.right# Handle a + b*c

    Usingmatch this becomes more readable:

    matchnode:caseBinOp("+",a,BinOp("*",b,c)):# Handle a + b*c

We believe that adding pattern matching to Python will enable Pythonusers to write cleaner, more readable code for examples like thoseabove, and many others.

For a more academic discussion to this proposal, see[1].

Pattern Matching and OO

Pattern matching is complimentary to the object-oriented paradigm.Using OO and inheritance we can easily define a method on a base classthat defines default behavior for a specific operation on that class,and we can override this default behavior in subclasses. We can alsouse the Visitor pattern to separate actions from data.

But this is not sufficient for all situations. For example, a codegenerator may consume an AST, and have many operations where thegenerated code needs to vary based not just on the class of a node,but also on the value of some class attributes, like theBinOpexample above. The Visitor pattern is insufficiently flexible forthis: it can only select based on the class.

See acomplete example.

Like the Visitor pattern, pattern matching allows for a strict separationof concerns: specific actions or data processing is independent of theclass hierarchy or manipulated objects. When dealing with predefined oreven built-in classes, in particular, it is often impossible to add furthermethods to the individual classes. Pattern matching not only relieves theprogrammer or class designer from the burden of the boilerplate code neededfor the Visitor pattern, but is also flexible enough to directly work withbuilt-in types. It naturally distinguishes between sequences of differentlengths, which might all share the same class despite obviously differingstructures. Moreover, pattern matching automatically takes inheritanceinto account: a classD inheriting fromC will be handled by a patternthat targetsC by default.

Object oriented programming is geared towards single-dispatch: it is asingle instance (or the type thereof) that determines which method is tobe called. This leads to a somewhat artificial situation in case of binaryoperators where both objects might play an equal role in deciding whichimplementation to use (Python addresses this through the use of reversedbinary methods). Pattern matching is structurally better suited to handlesuch situations of multi-dispatch, where the action to be taken depends onthe types of several objects to equal parts.

Patterns and Functional Style

Many Python applications and libraries are not written in a consistentOO style – unlike Java, Python encourages defining functions at thetop-level of a module, and for simple data structures, tuples (ornamed tuples or lists) and dictionaries are often used exclusively ormixed with classes or data classes.

Pattern matching is particularly suitable for picking apart such datastructures. As an extreme example, it’s easy to write code that picksa JSON data structure usingmatch:

matchjson_pet:case{"type":"cat","name":name,"pattern":pattern}:returnCat(name,pattern)case{"type":"dog","name":name,"breed":breed}:returnDog(name,breed)case_:raiseValueError("Not a suitable pet")

Functional programming generally prefers a declarative style with a focuson relationships in data. Side effects are avoided whenever possible.Pattern matching thus naturally fits and highly supports functionalprogramming style.

Rationale

This section provides the rationale for individual design decisions.It takes the place of “Rejected ideas” in the standard PEP format.It is organized in sections corresponding to the specification (PEP 634).

Overview and Terminology

Much of the power of pattern matching comes from the nesting of subpatterns.That the success of a pattern match depends directly on the success ofsubpattern is thus a cornerstone of the design. However, although apattern likeP(Q(),R()) succeeds only if both subpatternsQ()andR() succeed (i.e. the success of patternP depends onQandR), the patternP is checked first. IfP fails, neitherQ() norR() will be tried (this is a direct consequence of thefact that ifP fails, there are no subjects to match againstQ()andR() in the first place).

Also note that patterns bind names to values rather than performing anassignment. This reflects the fact that patterns aim to not have sideeffects, which also means that Capture or AS patterns cannot assign avalue to an attribute or subscript. We thus consistently use the term‘bind’ instead of ‘assign’ to emphasise this subtle difference betweentraditional assignments and name binding in patterns.

The Match Statement

The match statement evaluates an expression to produce a subject, finds thefirst pattern that matches the subject, and executes the associated blockof code. Syntactically, the match statement thus takes an expression anda sequence of case clauses, where each case clause comprises a pattern anda block of code.

Since case clauses comprise a block of code, they adhere to the existingindentation scheme with the syntactic structure of<keyword>...:<(indented)block>, which resembles a compoundstatement. The keywordcase reflects its widespread use inpattern matching languages, ignoring those languages that use othersyntactic means such as a symbol like|, because it would not fitestablished Python structures. The syntax of patterns following thekeyword is discussed below.

Given that the case clauses follow the structure of a compound statement,the match statement itself naturally becomes a compound statement itselfas well, following the same syntactic structure. This naturally leads tomatch<expr>:<case_clause>+. Note that the match statement determinesa quasi-scope in which the evaluated subject is kept alive (although not ina local variable), similar to how a with statement might keep a resourcealive during execution of its block. Furthermore, control flows from thematch statement to a case clause and then leaves the block of the matchstatement. The block of the match statement thus has both syntactic andsemantic meaning.

Various suggestions have sought to eliminate or avoid the naturally arising“double indentation” of a case clause’s code block. Unfortunately, all suchproposals offlat indentation schemes come at the expense of violatingPython’s established structural paradigm, leading to additional syntacticrules:

  • Unindented case clauses.The idea is to align case clauses with thematch, i.e.:
    matchexpression:casepattern_1:...casepattern_2:...

    This may look awkward to the eye of a Python programmer, becauseeverywhere else a colon is followed by an indent. Thematch wouldneither follow the syntactic scheme of simple nor composite statementsbut rather establish a category of its own.

  • Putting the expression on a separate line after “match”.The idea is to use the expression yielding the subject as a statementto avoid the singularity ofmatch having no actual block despitethe colons:
    match:expressioncasepattern_1:...casepattern_2:...

    This was ultimately rejected because the first block would be anothernovelty in Python’s grammar: a block whose only content is a singleexpression rather than a sequence of statements. Attempts to amend thisissue by adding or repurposing yet another keyword along the lines ofmatch:returnexpression did not yield any satisfactory solution.

Although flat indentation would save some horizontal space, the cost ofincreased complexity or unusual rules is too high. It would also complicatelife for simple-minded code editors. Finally, the horizontal space issue canbe alleviated by allowing “half-indent” (i.e. two spaces instead of four)for match statements (though we do not recommend this).

In sample programs usingmatch, written as part of the development of thisPEP, a noticeable improvement in code brevity is observed, more than makingup for the additional indentation level.

Statement vs. Expression. Some suggestions centered around the idea ofmakingmatch an expression rather than a statement. However, thiswould fit poorly with Python’s statement-oriented nature and lead tounusually long and complex expressions and the need to invent newsyntactic constructs or break well established syntactic rules. Anobvious consequence ofmatch as an expression would be that caseclauses could no longer have arbitrary blocks of code attached, but onlya single expression. Overall, the strong limitations could in no wayoffset the slight simplification in some special use cases.

Hard vs. Soft Keyword. There were options to make match a hard keyword,or choose a different keyword. Although using a hard keyword would simplifylife for simple-minded syntax highlighters, we decided not to use hardkeyword for several reasons:

  • Most importantly, the new parser doesn’t require us to do this. Unlikewithasync that caused hardships with being a soft keyword for fewreleases, here we can makematch a permanent soft keyword.
  • match is so commonly used in existing code, that it would breakalmost every existing program and will put a burden to fix code on manypeople who may not even benefit from the new syntax.
  • It is hard to find an alternative keyword that would not be commonly usedin existing programs as an identifier, and would still clearly reflect themeaning of the statement.

Use “as” or “|” instead of “case” for case clauses.The pattern matching proposed here is a combination of multi-branch controlflow (in line withswitch in Algol-derived languages orcond in Lisp)and object-deconstruction as found in functional languages. While the proposedkeywordcase highlights the multi-branch aspect, alternative keywords suchasas would equally be possible, highlighting the deconstruction aspect.as orwith, for instance, also have the advantage of already beingkeywords in Python. However, sincecase as a keyword can only occur as aleading keyword inside amatch statement, it is easy for a parser todistinguish between its use as a keyword or as a variable.

Other variants would use a symbol like| or=>, or go entirely withoutspecial marker.

Since Python is a statement-oriented language in the tradition of Algol, and aseach composite statement starts with an identifying keyword,case seemed tobe most in line with Python’s style and traditions.

Match Semantics

The patterns of different case clauses might overlap in that more thanone case clause would match a given subject. The first-to-match ruleensures that the selection of a case clause for a given subject isunambiguous. Furthermore, case clauses can have increasingly generalpatterns matching wider sets of subjects. The first-to-match rulethen ensures that the most precise pattern can be chosen (although itis the programmer’s responsibility to order the case clauses correctly).

In a statically typed language, the match statement would be compiled toa decision tree to select a matching pattern quickly and very efficiently.This would, however, require that all patterns be purely declarative andstatic, running against the established dynamic semantics of Python. Theproposed semantics thus represent a path incorporating the best of bothworlds: patterns are tried in a strictly sequential order so that eachcase clause constitutes an actual statement. At the same time, we allowthe interpreter to cache any information about the subject or change theorder in which subpatterns are tried. In other words: if the interpreterhas found that the subject is not an instance of a classC, it candirectly skip case clauses testing for this again, without having toperform repeated instance-checks. If a guard stipulates that a variablex must be positive, say (i.e.ifx>0), the interpreter mightcheck this directly after bindingx and before any furthersubpatterns are considered.

Binding and scoping. In many pattern matching implementations, eachcase clause would establish a separate scope of its own. Variables boundby a pattern would then only be visible inside the corresponding case block.In Python, however, this does not make sense. Establishing separate scopeswould essentially mean that each case clause is a separate function withoutdirect access to the variables in the surrounding scope (without having toresort tononlocal that is). Moreover, a case clause could no longerinfluence any surrounding control flow through standard statement such asreturn orbreak. Hence, such strict scoping would lead tounintuitive and surprising behavior.

A direct consequence of this is that any variable bindings outlive therespective case or match statements. Even patterns that only match asubject partially might bind local variables (this is, in fact, necessaryfor guards to function properly). However, these semantics for variablebinding are in line with existing Python structures such as for loops andwith statements.

Guards

Some constraints cannot be adequately expressed through patterns alone.For instance, a ‘less’ or ‘greater than’ relationship defies the usual‘equal’ semantics of patterns. Moreover, different subpatterns areindependent and cannot refer to each other. The addition ofguardsaddresses these restrictions: a guard is an arbitrary expression attachedto a pattern and that must evaluate to a “truthy” value for the pattern to succeed.

For example,case[x,y]ifx<y: uses a guard (ifx<y) toexpress a ‘less than’ relationship between two otherwise disjoint capturepatternsx andy.

From a conceptual point of view, patterns describe structural constraintson the subject in a declarative style, ideally without any side-effects.Recall, in particular, that patterns are clearly distinct from expressions,following different objectives and semantics. Guards then enhance caseblocks in a highly controlled way with arbitrary expressions (that mighthave side effects). Splitting the overall functionality into a static structuraland a dynamically evaluated part not only helps with readability, but canalso introduce dramatic potential for compiler optimizations. To keep thisclear separation, guards are only supported on the level of case clausesand not for individual patterns.

Example using guards:

defsort(seq):matchseq:case[]|[_]:returnseqcase[x,y]ifx<=y:returnseqcase[x,y]:return[y,x]case[x,y,z]ifx<=y<=z:returnseqcase[x,y,z]ifx>=y>=z:return[z,y,x]case[p,*rest]:a=sort([xforxinrestifx<=p])b=sort([xforxinrestifp<x])returna+[p]+b

Patterns

Patterns fulfill two purposes: they impose (structural) constraints onthe subject and they specify which data values should be extracted fromthe subject and bound to variables. In iterable unpacking, which can beseen as a prototype to pattern matching in Python, there is only onestructural pattern to express sequences while there is a rich set ofbinding patterns to assign a value to a specific variable or field.Full pattern matching differs from this in that there is more varietyin structural patterns but only a minimum of binding patterns.

Patterns differ from assignment targets (as in iterable unpacking) in two ways:they impose additional constraints on the structure of the subject, anda subject may safely fail to match a specific pattern at any point(in iterable unpacking, this constitutes an error). The latter means thatpattern should avoid side effects wherever possible.

This desire to avoid side effects is one reason why capture patternsdon’t allow binding values to attributes or subscripts: if thecontaining pattern were to fail in a later step, it would be hard torevert such bindings.

A cornerstone of pattern matching is the possibility of arbitrarilynesting patterns. The nesting allows expressing deeptree structures (for an example of nested class patterns, see the motivationsection above) as well as alternatives.

Although patterns might superficially look like expressions,it is important to keep in mind that there is a clear distinction. In fact,no pattern is or contains an expression. It is more productive to think ofpatterns as declarative elements similar to the formal parameters in afunction definition.

AS Patterns

Patterns fall into two categories: most patterns impose a (structural)constraint that the subject needs to fulfill, whereas the capture patternbinds the subject to a name without regard for the subject’s structure oractual value. Consequently, a pattern can either express a constraint orbind a value, but not both. AS patterns fill this gap in that theyallow the user to specify a general pattern as well as capture the subjectin a variable.

Typical use cases for the AS pattern include OR and Class patternstogether with a binding name as in, e.g.,caseBinOp('+'|'-'asop,...):orcase[int()asfirst,int()assecond]:. The latter could beunderstood as saying that the subject must fulfil two distinct pattern:[first,second] as well as[int(),int()]. The AS patterncan thus be seen as a special case of an ‘and’ pattern (see OR patternsbelow for an additional discussion of ‘and’ patterns).

In an earlier version, the AS pattern was devised as a ‘Walrus pattern’,written ascase[first:=int(),second:=int()]. However, usingasoffers some advantages over:=:

  • The walrus operator:= is used to capture the result of an expressionon the right hand side, whereasas generally indicates some form of‘processing’ as inimportfooasbar orexceptEaserr:. Indeed,the patternPasx does not assign the patternP tox, butrather the subject that successfully matchesP.
  • as allows for a more consistent data flow from left to right (theattributes in Class patterns also follow a left-to-right data flow).
  • The walrus operator looks very similar to the syntax for matching attributes in the Class pattern,potentially leading to some confusion.

Example using the AS pattern:

defsimplify_expr(tokens):matchtokens:case[('('|'[')asl,*expr,(')'|']')asr]if(l+r)in('()','[]'):returnsimplify_expr(expr)case[0,('+'|'-')asop,right]:returnUnaryOp(op,right)case[(int()|float()asleft)|Num(left),'+',(int()|float()asright)|Num(right)]:returnNum(left+right)case[(int()|float())asvalue]:returnNum(value)

OR Patterns

The OR pattern allows you to combine ‘structurally equivalent’ alternativesinto a new pattern, i.e. several patterns can share a common handler. If anyof an OR pattern’s subpatterns matches the subject, the entire ORpattern succeeds.

Statically typed languages prohibit the binding of names (capture patterns)inside an OR pattern because of potential conflicts concerning the types ofvariables. As a dynamically typed language, Python can be less restrictivehere and allow capture patterns inside OR patterns. However, each subpatternmust bind the same set of variables so as not to leave potentially undefinednames. With two alternativesP|Q, this means that ifP binds thevariablesu andv,Q must bind exactly the same variablesu andv.

There was some discussion on whether to use the bar symbol| or theorkeyword to separate alternatives. The OR pattern does not fully fitthe existing semantics and usage of either of these two symbols. However,| is the symbol of choice in all programming languages with support ofthe OR pattern and is used in that capacity for regular expressions inPython as well. It is also the traditional separator between alternativesin formal grammars (including Python’s).Moreover,| is not only used for bitwise OR, but alsofor set unions and dict merging (PEP 584).

Other alternatives were considered as well, but none of these would allowOR-patterns to be nested inside other patterns:

  • Using a comma:
    case401,403,404:print("Some HTTP error")

    This looks too much like a tuple – we would have to find a different wayto spell tuples, and the construct would have to be parenthesized insidethe argument list of a class pattern. In general, commas already have manydifferent meanings in Python, we shouldn’t add more.

  • Using stacked cases:
    case401:case403:case404:print("Some HTTP error")

    This is how this would be done inC, using its fall-through semanticsfor cases. However, we don’t want to mislead people into thinking thatmatch/case uses fall-through semantics (which are a common source of bugsinC). Also, this would be a novel indentation pattern, which might makeit harder to support in IDEs and such (it would break the simple rule “addan indentation level after a line ending in a colon”). Finally, thiswould not support OR patterns nested inside other patterns, either.

  • Using “case in” followed by a comma-separated list:
    casein401,403,404:print("Some HTTP error")

    This would not work for OR patterns nested inside other patterns, like:

    casePoint(0|1,0|1):print("A corner of the unit square")

AND and NOT Patterns

Since this proposal defines an OR-pattern (|) to match one of several alternates,why not also an AND-pattern (&) or even a NOT-pattern (!)?Especially given that some other languages (F# for example) supportAND-patterns.

However, it is not clear how useful this would be. The semantics for matchingdictionaries, objects and sequences already incorporates an implicit ‘and’:all attributes and elements mentioned must be present for the match tosucceed. Guard conditions can also support many of the use cases that ahypothetical ‘and’ operator would be used for.

A negation of a match pattern using the operator! as a prefixwould match exactly if the pattern itself does not match. Forinstance,!(3|4) would match anything except3 or4.However, there isevidence from other languages that this israrely useful, and primarily used as double negation!! to controlvariable scopes and prevent variable bindings (which does not apply toPython). Other use cases are better expressed using guards.

In the end, it was decided that this would make the syntax more complexwithout adding a significant benefit. It can always be added later.

Example using the OR pattern:

defsimplify(expr):matchexpr:case('/',0,0):returnexprcase('*'|'/',0,_):return0case('+'|'-',x,0)|('+',0,x)|('*',1,x)|('*'|'/',x,1):returnxreturnexpr

Literal Patterns

Literal patterns are a convenient way for imposing constraints on thevalue of a subject, rather than its type or structure. They alsoallow you to emulate a switch statement using pattern matching.

Generally, the subject is compared to a literal pattern by means of standardequality (x==y in Python syntax). Consequently, the literal patterns1.0 and1 match exactly the same set of objects, i.e.case1.0:andcase1: are fully interchangeable. In principle,True would alsomatch the same set of objects becauseTrue==1 holds. However, webelieve that many users would be surprised finding thatcaseTrue:matched the subject1.0, resulting in some subtle bugs and convolutedworkarounds. We therefore adopted the rule that the three singletonpatternsNone,False andTrue match by identity (xisy inPython syntax) rather than equality. Hence,caseTrue: will match onlyTrue and nothing else. Note thatcase1: would still matchTrue,though, because the literal pattern1 works by equality and not identity.

Early ideas to induce a hierarchy on numbers so thatcase1.0 wouldmatch both the integer1 and the floating point number1.0, whereascase1: would only match the integer1 were eventually dropped infavor of the simpler and more consistent rule based on equality. Moreover, anyadditional checks whether the subject is an instance ofnumbers.Integralwould come at a high runtime cost to introduce what would essentially bea novel idea in Python. When needed, the explicit syntaxcaseint(1): canbe used.

Recall that literal patterns arenot expressions, but directlydenote a specific value. From a pragmatic point of view, we want toallow using negative and even complex values as literal patterns, butthey are not atomic literals (only unsigned real and imaginary numbersare). E.g.,-3+4j is syntactically an expression of the formBinOp(UnaryOp('-',3),'+',4j). Since expressions are not partof patterns, we had to add explicit syntactic support for such valueswithout having to resort to full expressions.

Interpolatedf-strings, on theother hand, are not literal values, despite their appearance and cantherefore not be used as literal patterns (string concatenation, however,is supported).

Literal patterns not only occur as patterns in their own right, but alsoas keys inmapping patterns.

Range matching patterns.This would allow patterns such as1...6. However, there are a host ofambiguities:

  • Is the range open, half-open, or closed? (I.e. is6 included in theabove example or not?)
  • Does the range match a single number, or a range object?
  • Range matching is often used for character ranges (‘a’…’z’) but thatwon’t work in Python since there’s no character data type, just strings.
  • Range matching can be a significant performance optimization if you canpre-build a jump table, but that’s not generally possible in Python dueto the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decidedthat allowing custom pattern objects (InRange(0,6)) would be more flexibleand less ambiguous; however those ideas have been postponed for the timebeing.

Example using Literal patterns:

defsimplify(expr):matchexpr:case('+',0,x):returnxcase('+'|'-',x,0):returnxcase('and',True,x):returnxcase('and',False,x):returnFalsecase('or',False,x):returnxcase('or',True,x):returnTruecase('not',('not',x)):returnxreturnexpr

Capture Patterns

Capture patterns take on the form of a name that accepts any value and bindsit to a (local) variable (unless the name is declared asnonlocal orglobal). In that sense, a capture pattern is similarto a parameter in a function definition (when the function is called, eachparameter binds the respective argument to a local variable in the function’sscope).

A name used for a capture pattern must not coincide with another capturepattern in the same pattern. This, again, is similar to parameters, whichequally require each parameter name to be unique within the list ofparameters. It differs, however, from iterable unpacking assignment, wherethe repeated use of a variable name as target is permissible (e.g.,x,x=1,2). The rationale for not supporting(x,x) in patternsis its ambiguous reading: it could be seen as in iterable unpacking whereonly the second binding tox survives. But it could be equally seen asexpressing a tuple with two equal elements (which comes with its own issues).Should the need arise, then it is still possible to introduce support forrepeated use of names later on.

There were calls to explicitly mark capture patterns and thus identify themas binding targets. According to that idea, a capture pattern would bewritten as, e.g.?x,$x or=x. The aim of such explicit capturemarkers is to let an unmarked name be a value pattern (see below).However, this is based on the misconception that pattern matching was anextension ofswitch statements, placing the emphasis on fast switching basedon (ordinal) values. Such aswitch statement has indeed been proposed forPython before (seePEP 275 andPEP 3103). Pattern matching, on the otherhand, builds a generalized concept of iterable unpacking. Binding valuesextracted from a data structure is at the very core of the concept and hencethe most common use case. Explicit markers for capture patterns would thusbetray the objective of the proposed pattern matching syntax and simplifya secondary use case at the expense of additional syntactic clutter forcore cases.

It has been proposed that capture patterns are not needed at all,since the equivalent effect can be obtained by combining an ASpattern with a wildcard pattern (e.g.,case_asx is equivalenttocasex). However, this would be unpleasantly verbose,especially given that we expect capture patterns to be very common.

Example using Capture patterns:

defaverage(*args):matchargs:case[x,y]:# captures the two elements of a sequencereturn(x+y)/2case[x]:# captures the only element of a sequencereturnxcase[]:return0casea:# captures the entire sequencereturnsum(a)/len(a)

Wildcard Pattern

The wildcard pattern is a special case of a ‘capture’ pattern: it acceptsany value, but does not bind it to a variable. The idea behind this ruleis to support repeated use of the wildcard in patterns. While(x,x)is an error,(_,_) is legal.

Particularly in larger (sequence) patterns, it is important to allow thepattern to concentrate on values with actual significance while ignoringanything else. Without a wildcard, it would become necessary to ‘invent’a number of local variables, which would be bound but never used. Evenwhen sticking to naming conventions and using e.g._1,_2,_3 to nameirrelevant values, say, this still introduces visual clutter and can hurtperformance (compare the sequence pattern(x,y,*z) to(_,y,*_),where the*z forces the interpreter to copy a potentially very longsequence, whereas the second version simply compiles to code along thelines ofy=seq[1]).

There has been much discussion about the choice of the underscore as_as a wildcard pattern, i.e. making this one name non-binding. However, theunderscore is already heavily used as an ‘ignore value’ marker in iterableunpacking. Since the wildcard pattern_ never binds, this use of theunderscore does not interfere with other uses such as inside the REPL orthegettext module.

It has been proposed to use... (i.e., the ellipsis token) or*(star) as a wildcard. However, both these look as if an arbitrary numberof items is omitted:

case[a,...,z]:...case[a,*,z]:...

Either example looks like it would match a sequence of two or moreitems, capturing the first and last values. While that may be theultimate “wildcard”, it does not convey the desired semantics.

An alternative that does not suggest an arbitrary number of itemswould be?. This is even being proposed independently frompattern matching inPEP 640. We feel however that using? as aspecial “assignment” target is likely more confusing to Python usersthan using_. It violates Python’s (admittedly vague) principleof using punctuation characters only in ways similar to how they areused in common English usage or in high school math, unless the usageisvery well established in other programming languages (like, e.g.,using a dot for member access).

The question mark fails on both counts: its use in other programminglanguages is a grab-bag of usages only vaguely suggested by the ideaof a “question”. For example, it means “any character” in shellglobbing, “maybe” in regular expressions, “conditional expression” inC and many C-derived languages, “predicate function” in Scheme,“modify error handling” in Rust, “optional argument” and “optionalchaining” in TypeScript (the latter meaning has also been proposed forPython byPEP 505). An as yet unnamed PEP proposes it to markoptional types, e.g.int?.

Another common use of? in programming systems is “help”, forexample, in IPython and Jupyter Notebooks and many interactivecommand-line utilities.

In addition, this would put Python in a rather unique position:The underscore is as a wildcard pattern ineveryprogramming language with pattern matching that we could find(includingC#,Elixir,Erlang,F#,Grace,Haskell,Mathematica,OCaml,Ruby,Rust,Scala,Swift, andThorn).Keeping in mind that many users of Python also work with other programminglanguages, have prior experience when learning Python, and may move on toother languages after having learned Python, we find that suchwell-established standards are important and relevant with respect toreadability and learnability. In our view, concerns that this wildcardmeans that a regular name received special treatment are not strongenough to introduce syntax that would make Python special.

Else blocks. A case block without a guard whose pattern is a singlewildcard (i.e.,case_:) accepts any subject without binding it toa variable or performing any other operation. It is thus semanticallyequivalent toelse:, if it were supported. However, adding suchan else block to the match statement syntax would not remove the needfor the wildcard pattern in other contexts. Another argument againstthis is that there would be two plausible indentation levels for anelse block: aligned withcase or aligned withmatch. Theauthors have found it quite contentious which indentation level toprefer.

Example using the Wildcard pattern:

defis_closed(sequence):matchsequence:case[_]:# any sequence with a single elementreturnTruecase[start,*_,end]:# a sequence with at least two elementsreturnstart==endcase_:# anythingreturnFalse

Value Patterns

It is good programming style to use named constants for parametric values orto clarify the meaning of particular values. Clearly, it would be preferableto writecase(HttpStatus.OK,body): overcase(200,body):, for example. The main issue that arises here is how todistinguish capture patterns (variable bindings) from value patterns. Thegeneral discussion surrounding this issue has brought forward a plethora ofoptions, which we cannot all fully list here.

Strictly speaking, value patterns are not really necessary, butcould be implemented using guards, i.e.case(status,body)ifstatus==HttpStatus.OK:. Nonetheless, theconvenience of value patterns is unquestioned and obvious.

The observation that constants tend to be written in uppercase letters orcollected in enumeration-like namespaces suggests possible rules to discernconstants syntactically. However, the idea of using upper- vs. lowercase asa marker has been met with scepticism since there is no similar precedencein core Python (although it is common in other languages). We therefore onlyadopted the rule that any dotted name (i.e., attribute access) is to beinterpreted as a value pattern, for exampleHttpStatus.OKabove. This precludes, in particular, local variables and globalvariables defined in the current module from acting as constants.

A proposed rule to use a leading dot (e.g..CONSTANT) for that purpose was criticised because it was felt that thedot would not be a visible-enough marker for that purpose. Partly inspiredby forms found in other programming languages, a number of differentmarkers/sigils were proposed (such as^CONSTANT,$CONSTANT,==CONSTANT,CONSTANT?, or the word enclosed in backticks), althoughthere was no obvious or natural choice. The current proposal thereforeleaves the discussion and possible introduction of such a ‘constant’ markerfor a future PEP.

Distinguishing the semantics of names based on whether it is a globalvariable (i.e. the compiler would treat global variables as constants ratherthan capture patterns) leads to various issues. The addition or alterationof a global variable in the module could have unintended side effects onpatterns. Moreover, pattern matching could not be used directly inside amodule’s scope because all variables would be global, making capturepatterns impossible.

Example using the Value pattern:

defhandle_reply(reply):matchreply:case(HttpStatus.OK,MimeType.TEXT,body):process_text(body)case(HttpStatus.OK,MimeType.APPL_ZIP,body):text=deflate(body)process_text(text)case(HttpStatus.MOVED_PERMANENTLY,new_URI):resend_request(new_URI)case(HttpStatus.NOT_FOUND):raiseResourceNotFound()

Group Patterns

Allowing users to explicitly specify the grouping is particularly helpfulin case of OR patterns.

Sequence Patterns

Sequence patterns follow as closely as possible the already establishedsyntax and semantics of iterable unpacking. Of course, subpatterns takethe place of assignment targets (variables, attributes and subscript).Moreover, the sequence pattern only matches a carefully selected set ofpossible subjects, whereas iterable unpacking can be applied to anyiterable.

  • As in iterable unpacking, we do not distinguish between ‘tuple’ and‘list’ notation.[a,b,c],(a,b,c) anda,b,c are allequivalent. While this means we have a redundant notation and checkingspecifically for lists or tuples requires more effort (e.g.caselist([a,b,c])), we mimic iterable unpacking as much aspossible.
  • A starred pattern will capture a sub-sequence of arbitrary length,again mirroring iterable unpacking. Only one starred item may bepresent in any sequence pattern. In theory, patterns such as(*_,3,*_)could be understood as expressing any sequence containing the value3.In practice, however, this would only work for a very narrow set of usecases and lead to inefficient backtracking or even ambiguities otherwise.
  • The sequence pattern doesnot iterate through an iterable subject. Allelements are accessed through subscripting and slicing, and the subject mustbe an instance ofcollections.abc.Sequence. This includes, of course,lists and tuples, but excludes e.g. sets and dictionaries. While it wouldinclude strings and bytes, we make an exception for these (see below).

A sequence pattern cannot just iterate through any iterable object. Theconsumption of elements from the iteration would have to be undone if theoverall pattern fails, which is not feasible.

To identify sequences we cannot rely onlen() and subscripting andslicing alone, because sequences share these protocols with mappings(e.g.dict) in this regard. It would be surprising if a sequencepattern also matched a dictionaries or other objects implementingthe mapping protocol (i.e.__getitem__). The interpreter thereforeperforms an instance check to ensure that the subject in question reallyis a sequence (of known type). (As an optimization of the most commoncase, if the subject is exactly a list or a tuple, the instance checkcan be skipped.)

String and bytes objects have a dual nature: they are both ‘atomic’ objectsin their own right, as well as sequences (with a strongly recursive naturein that a string is a sequence of strings). The typical behavior and usecases for strings and bytes are different enough from those of tuples andlists to warrant a clear distinction. It is in fact often unintuitive andunintended that strings pass for sequences, as evidenced by regular questionsand complaints. Strings and bytes are therefore not matched by a sequencepattern, limiting the sequence pattern to a very specific understanding of‘sequence’. The built-inbytearray type, being a mutable version ofbytes, also deserves an exception; but we don’t intend toenumerate all other types that may be used to represent bytes(e.g. some, but not all, instances ofmemoryview andarray.array).

Mapping Patterns

Dictionaries or mappings in general are one of the most important and mostwidely used data structures in Python. In contrast to sequences, mappingsare built for fast direct access to arbitrary elements identified by a key.In most cases an element is retrieved from a dictionary by a known keywithout regard for any ordering or other key-value pairs stored in the samedictionary. Particularly common are string keys.

The mapping pattern reflects the common usage of dictionary lookup: it allowsthe user to extract some values from a mapping by means of constant/knownkeys and have the values match given subpatterns.Extra keys in the subject are ignored even if**rest is not present.This is different from sequence patterns, where extra items will cause amatch to fail. But mappings are actually different from sequences: theyhave natural structural sub-typing behavior, i.e., passing a dictionarywith extra keys somewhere will likely just work.Should it benecessary to impose an upper bound on the mapping and ensure that noadditional keys are present, then the usual double-star-pattern**restcan be used. The special case**_ with a wildcard, however, is notsupported as it would not have any effect, but might lead to an incorrectunderstanding of the mapping pattern’s semantics.

To avoid overly expensive matching algorithms, keys must be literals orvalue patterns.

There is a subtle reason for usingget(key,default) instead of__getitem__(key) followed by a check forAttributeError: ifthe subject happens to be adefaultdict, calling__getitem__for a non-existent key would add the key. Usingget() avoids thisunexpected side effect.

Example using the Mapping pattern:

defchange_red_to_blue(json_obj):matchjson_obj:case{'color':('red'|'#FF0000')}:json_obj['color']='blue'case{'children':children}:forchildinchildren:change_red_to_blue(child)

Class Patterns

Class patterns fulfill two purposes: checking whether a given subject isindeed an instance of a specific class, and extracting data from specificattributes of the subject. Anecdotal evidence revealed thatisinstance()is one of the most often used functions in Python in terms ofstatic occurrences in programs. Such instance checks typically precedea subsequent access to information stored in the object, or a possiblemanipulation thereof. A typical pattern might be along the lines of:

deftraverse_tree(node):ifisinstance(node,Node):traverse_tree(node.left)traverse_tree(node.right)elifisinstance(node,Leaf):print(node.value)

In many cases class patterns occur nested, as in the examplegiven in the motivation:

if(isinstance(node,BinOp)andnode.op=="+"andisinstance(node.right,BinOp)andnode.right.op=="*"):a,b,c=node.left,node.right.left,node.right.right# Handle a + b*c

The class pattern lets you concisely specify both an instance checkand relevant attributes (with possible further constraints). It isthereby very tempting to write, e.g.,caseNode(left,right): in thefirst case above andcaseLeaf(value): in the second. While thisindeed works well for languages with strict algebraic data types, it isproblematic with the structure of Python objects.

When dealing with general Python objects, we face a potentially very largenumber of unordered attributes: an instance ofNode contains a largenumber of attributes (most of which are ‘special methods’ such as__repr__). Moreover, the interpreter cannot reliably deduce theordering of attributes. For an object thatrepresents a circle, say, there is no inherently obvious ordering of theattributesx,y andradius.

We envision two possibilities for dealing with this issue: either explicitlyname the attributes of interest, or provide an additional mapping that tellsthe interpreter which attributes to extract and in which order. Bothapproaches are supported. Moreover, explicitly naming the attributes ofinterest lets you further specify the required structure of an object; ifan object lacks an attribute specified by the pattern, the match fails.

  • Attributes that are explicitly named pick up the syntax of named arguments.If an object of classNode has two attributesleft andrightas above, the patternNode(left=x,right=y) will extract the values ofboth attributes and assign them tox andy, respectively. The dataflow from left to right seems unusual, but is in line with mapping patternsand has precedents such as assignments viaas inwith- orimport-statements (and indeed AS patterns).

    Naming the attributes in question explicitly will be mostly used for morecomplex cases where the positional form (below) is insufficient.

  • The class field__match_args__ specifies a number of attributestogether with their ordering, allowing class patterns to rely on positionalsub-patterns without having to explicitly name the attributes in question.This is particularly handy for smaller objects or instances of data classes,where the attributes of interest are rather obvious and often have awell-defined ordering. In a way,__match_args__ is similar to thedeclaration of formal parameters, which allows calling functions withpositional arguments rather than naming all the parameters.

    This is a class attribute, because it needs to be looked up on the classnamed in the class pattern, not on the subject instance.

The syntax of class patterns is based on the idea that de-constructionmirrors the syntax of construction. This is already the case in virtuallyany Python construct, be assignment targets, function definitions oriterable unpacking. In all these cases, we find that the syntax forsending and that for receiving ‘data’ are virtually identical.

  • Assignment targets such as variables, attributes and subscripts:foo.bar[2]=foo.bar[3];
  • Function definitions: a function defined withdeffoo(x,y,z=6)is called as, e.g.,foo(123,y=45), where the actual argumentsprovided at the call site are matched against the formal parametersat the definition site;
  • Iterable unpacking:a,b=b,a or[a,b]=[b,a] or(a,b)=(b,a), just to name a few equivalent possibilities.

Using the same syntax for reading and writing, l- and r-values, orconstruction and de-construction is widely accepted for its benefits inthinking about data, its flow and manipulation. This equally extends tothe explicit construction of instances, where class patternsC(p,q)deliberately mirror the syntax of creating instances.

The special case for the built-in classesbool,bytearrayetc. (where e.g.str(x) captures the subject value inx) canbe emulated by a user-defined class as follows:

classMyClass:__match_args__=["__myself__"]__myself__=property(lambdaself:self)

Type annotations for pattern variables.The proposal was to combine patterns with type annotations:

matchx:case[a:int,b:str]:print(f"An int{a} and a string{b}:")case[a:int,b:int,c:int]:print("Three ints",a,b,c)...

This idea has a lot of problems. For one, the colon can onlybe used inside of brackets or parentheses, otherwise the syntax becomesambiguous. And because Python disallowsisinstance() checkson generic types, type annotations containing generics will notwork as expected.

History and Context

Pattern matching emerged in the late 1970s in the form of tuple unpackingand as a means to handle recursive data structures such as linked lists ortrees (object-oriented languages usually use the visitor pattern for handlingrecursive data structures). The early proponents of pattern matchingorganised structured data in ‘tagged tuples’ rather thanstruct as inC or the objects introduced later. A node in a binary tree would, forinstance, be a tuple with two elements for the left and right branches,respectively, and aNode tag, written asNode(left,right). InPython we would probably put the tag inside the tuple as('Node',left,right) or define a data classNode to achieve thesame effect.

Using modern syntax, a depth-first tree traversal would then be written asfollows:

deftraverse(node):matchnode:caseNode(left,right):traverse(left)traverse(right)caseLeaf(value):handle(value)

The notion of handling recursive data structures with pattern matchingimmediately gave rise to the idea of handling more general recursive‘patterns’ (i.e. recursion beyond recursive data structures)with pattern matching. Pattern matching would thus also be used to definerecursive functions such as:

deffib(arg):matcharg:case0:return1case1:return1casen:returnfib(n-1)+fib(n-2)

As pattern matching was repeatedly integrated into new and emergingprogramming languages, its syntax slightly evolved and expanded. The twofirst cases in thefib example above could be written more succinctlyascase0|1: with| denoting alternative patterns. Moreover, theunderscore_ was widely adopted as a wildcard, a filler where neitherthe structure nor value of parts of a pattern were of substance. Since theunderscore is already frequently used in equivalent capacity in Python’siterable unpacking (e.g.,_,_,third,_*=something) we kept theseuniversal standards.

It is noteworthy that the concept of pattern matching has always beenclosely linked to the concept of functions. The different case clauseshave always been considered as something like semi-independent functionswhere pattern variables take on the role of parameters. This becomesmost apparent when pattern matching is written as an overloaded function,along the lines of (Standard ML):

funfib0=1|fib1=1|fibn=fib(n-1)+fib(n-2)

Even though such a strict separation of case clauses into independentfunctions does not apply in Python, we find that patterns share manysyntactic rules with parameters, such as binding arguments to unqualifiednames only or that variable/parameter names must not be repeated fora particular pattern/function.

With its emphasis on abstraction and encapsulation, object-orientedprogramming posed a serious challenge to pattern matching. In short: inobject-oriented programming, we can no longer view objects as tagged tuples.The arguments passed into the constructor do not necessarily specify theattributes or fields of the objects. Moreover, there is no longer a strictordering of an object’s fields and some of the fields might be private andthus inaccessible. And on top of this, the given object might actually bean instance of a subclass with slightly different structure.

To address this challenge, patterns became increasingly independent of theoriginal tuple constructors. In a pattern likeNode(left,right),Node is no longer a passive tag, but rather a function that can activelycheck for any given object whether it has the right structure and extract aleft andright field. In other words: theNode-tag becomes afunction that transforms an object into a tuple or returns some failureindicator if it is not possible.

In Python, we simply useisinstance() together with the__match_args__field of a class to check whether an object has the correct structure andthen transform some of its attributes into a tuple. For theNode exampleabove, for instance, we would have__match_args__=('left','right') toindicate that these two attributes should be extracted to form the tuple.That is,caseNode(x,y) would first check whether a given object is aninstance ofNode and then assignleft tox andright toy,respectively.

Paying tribute to Python’s dynamic nature with ‘duck typing’, however, wealso added a more direct way to specify the presence of, or constraints onspecific attributes. Instead ofNode(x,y) you could also writeobject(left=x,right=y), effectively eliminating theisinstance()check and thus supporting any object withleft andright attributes.Or you would combine these ideas to writeNode(right=y) so as to requirean instance ofNode but only extract the value of theright attribute.

Backwards Compatibility

Through its use of “soft keywords” and the new PEG parser (PEP 617),the proposal remains fully backwards compatible. However, 3rd partytooling that uses a LL(1) parser to parse Python source code may beforced to switch parser technology to be able to support those samefeatures.

Security Implications

We do not expect any security implications from this language feature.

Reference Implementation

Afeature-complete CPython implementation is available onGitHub.

Aninteractive playgroundbased on the above implementation was created using Binder[2] and Jupyter[3].

References

[1]
Kohn et al., Dynamic Pattern Matching with Pythonhttps://gvanrossum.github.io/docs/PyPatternMatching.pdf
[2]
Binderhttps://mybinder.org
[3]
Jupyterhttps://jupyter.org

Copyright

This document is placed in the public domain or under theCC0-1.0-Universal license, whichever is more permissive.


Source:https://github.com/python/peps/blob/main/peps/pep-0635.rst

Last modified:2025-02-01 08:59:27 GMT


[8]ページ先頭

©2009-2026 Movatter.jp