
Python Enhancement Proposals

PEP 622 – Structural Pattern Matching

Author:
Brandt Bucher <brandt at python.org>, Daniel F Moisset <dfmoisset at gmail.com>, Tobias Kohn <kohnt at tobiaskohn.ch>, Ivan Levkivskyi <levkivskyi at gmail.com>, Guido van Rossum <guido at python.org>, Talin <viridia at gmail.com>
BDFL-Delegate:

Discussions-To:
Python-Dev list
Status:
Superseded
Type:
Standards Track
Created:
23-Jun-2020
Python-Version:
3.10
Post-History:
23-Jun-2020, 08-Jul-2020
Superseded-By:
634

Abstract

This PEP proposes to add a pattern matching statement to Python, inspired by similar syntax found in Scala, Erlang, and other languages.

Patterns and shapes

The pattern syntax builds on Python’s existing syntax for sequence unpacking (e.g., a, b = value).

A match statement compares a value (the subject) to several different shapes (the patterns) until a shape fits. Each pattern describes the type and structure of the accepted values as well as the variables where to capture its contents.

Patterns can specify the shape to be:

  • a sequence to be unpacked, as already mentioned
  • a mapping with specific keys
  • an instance of a given class with (optionally) specific attributes
  • a specific value
  • a wildcard

Patterns can be composed in several ways.

Syntax

Syntactically, a match statement contains:

  • a subject expression
  • one or more case clauses

Each case clause specifies:

  • a pattern (the overall shape to be matched)
  • an optional “guard” (a condition to be checked if the pattern matches)
  • a code block to be executed if the case clause is selected

Motivation

The rest of the PEP:

  • motivates why we believe pattern matching makes a good addition to Python
  • explains our design choices
  • contains a precise syntactic and runtime specification
  • gives guidance for static type checkers (and one small addition to the typing module)
  • discusses the main objections and alternatives that have been brought up during extensive discussion of the proposal, both within the group of authors and in the python-dev community

Finally, we discuss some possible extensions that might be considered in the future, once the community has ample experience with the currently proposed syntax and semantics.

Overview

Patterns are a new syntactical category with their own rules and special cases. Patterns mix input (given values) and output (captured variables) in novel ways. They may take a little time to use effectively. The authors have provided a brief introduction to the basic concepts here. Note that this section is not intended to be complete or entirely accurate.

Pattern, a new syntactic construct, and destructuring

A new syntactic construct called pattern is introduced in this PEP. Syntactically, patterns look like a subset of expressions. The following are examples of patterns:

  • [first, second, *rest]
  • Point2d(x, 0)
  • {"name": "Bruce", "age": age}
  • 42

The above expressions may look like examples of object construction with a constructor which takes some values as parameters and builds an object from those components.

When viewed as a pattern, the above patterns mean the inverse operation of construction, which we call destructuring. Destructuring takes a subject value and extracts its components.

The syntactic similarity between object construction and destructuring is intentional. It also follows the existing Pythonic style of contexts which makes assignment targets (write contexts) look like expressions (read contexts).

Pattern matching never creates objects. This is in the same way that [a, b] = my_list doesn’t create a new [a, b] list, nor reads the values of a and b.

Matching process

During this matching process, the structure of the pattern may not fit the subject, and matching fails.

For example, matching the pattern Point2d(x, 0) to the subject Point2d(3, 0) successfully matches. The match also binds the pattern’s free variable x to the subject’s value 3.

As another example, if the subject is [3, 0], the match fails because the subject’s type list is not the pattern’s Point2d.

As a third example, if the subject is Point2d(3, 7), the match fails because the subject’s second coordinate 7 is not the same as the pattern’s 0.

The match statement tries to match a single subject to each of the patterns in its case clauses. At the first successful match to a pattern in a case clause:

  • the variables in the pattern are assigned, and
  • a corresponding block is executed.

Each case clause can also specify an optional boolean condition, known as a guard.

Let’s look at a more detailed example of a match statement. The match statement is used within a function to define the building of 3D points. In this example, the function can accept as input any of the following: tuple with 2 elements, tuple with 3 elements, an existing Point2d object or an existing Point3d object:

    def make_point_3d(pt):
        match pt:
            case (x, y):
                return Point3d(x, y, 0)
            case (x, y, z):
                return Point3d(x, y, z)
            case Point2d(x, y):
                return Point3d(x, y, 0)
            case Point3d(_, _, _):
                return pt
            case _:
                raise TypeError("not a point we support")

Without pattern matching, this function’s implementation would require several isinstance() checks, one or two len() calls, and a more convoluted control flow. The match example version and the traditional Python version without match translate into similar code under the hood. With familiarity of pattern matching, a user reading this function using match will likely find this version clearer than the traditional approach.
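For comparison, here is one way the traditional version might look. This is only a sketch: the Point2d and Point3d dataclasses and the function name make_point_3d_traditional are assumptions made for illustration, and the sketch checks for tuples specifically, whereas a sequence pattern would also accept lists and other non-string sequences.

```python
from dataclasses import dataclass


@dataclass
class Point2d:
    x: float
    y: float


@dataclass
class Point3d:
    x: float
    y: float
    z: float


def make_point_3d_traditional(pt):
    # Each branch repeats a type check, and the tuple branches add a length check.
    if isinstance(pt, tuple) and len(pt) == 2:
        x, y = pt
        return Point3d(x, y, 0)
    elif isinstance(pt, tuple) and len(pt) == 3:
        x, y, z = pt
        return Point3d(x, y, z)
    elif isinstance(pt, Point2d):
        return Point3d(pt.x, pt.y, 0)
    elif isinstance(pt, Point3d):
        return pt
    else:
        raise TypeError("not a point we support")
```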

Rationale and Goals

Python programs frequently need to handle data which varies in type, presence of attributes/keys, or number of elements. Typical examples are operating on nodes of a mixed structure like an AST, handling UI events of different types, processing structured input (like structured files or network messages), or “parsing” arguments for a function that can accept different combinations of types and numbers of parameters. In fact, the classic ‘visitor’ pattern is an example of this, done in an OOP style – but matching makes it much less tedious to write.

Much of the code to do so tends to consist of complex chains of nested if/elif statements, including multiple calls to len(), isinstance() and index/key/attribute access. Inside those branches users sometimes need to destructure the data further to extract the required component values, which may be nested several objects deep.

Pattern matching as present in many other languages provides an elegant solution to this problem. These range from statically compiled functional languages like F# and Haskell, via mixed-paradigm languages like Scala and Rust, to dynamic languages like Elixir and Ruby, and is under consideration for JavaScript. We are indebted to these languages for guiding the way to Pythonic pattern matching, as Python is indebted to so many other languages for many of its features: many basic syntactic features were inherited from C, exceptions from Modula-3, classes were inspired by C++, slicing came from Icon, regular expressions from Perl, decorators resemble Java annotations, and so on.

The usual logic for operating on heterogeneous data can be summarized in the following way:

  • Some analysis is done on the shape (type and components) of the data: This could involve isinstance() or len() calls and/or extracting components (via indexing or attribute access) which are checked for specific values or conditions.
  • If the shape is as expected, some more components are possibly extracted and some operation is done using the extracted values.

Take for example this piece of the Django web framework:

    if (
        isinstance(value, (list, tuple)) and
        len(value) > 1 and
        isinstance(value[-1], (Promise, str))
    ):
        *value, label = value
        value = tuple(value)
    else:
        label = key.replace('_', ' ').title()

We can see the shape analysis of the value at the top, followed by the destructuring inside.

Note that shape analysis here involves checking the types both of the container and of one of its components, and some checks on its number of elements. Once we match the shape, we need to decompose the sequence. With the proposal in this PEP, we could rewrite that code into this:

    match value:
        case [*v, label := (Promise() | str())] if v:
            value = tuple(v)
        case _:
            label = key.replace('_', ' ').title()

This syntax makes much more explicit which formats are possible for the input data, and which components are extracted from where. You can see a pattern similar to list unpacking, but also type checking: the Promise() pattern is not an object construction, but represents anything that’s an instance of Promise. The pattern operator | separates alternative patterns (not unlike regular expressions or EBNF grammars), and _ is a wildcard. (Note that the match syntax used here will accept user-defined sequences, as well as lists and tuples.)

On some occasions, extraction of information is not as relevant as identifying structure. Take the following example from the Python standard library:

    def is_tuple(node):
        if isinstance(node, Node) and node.children == [LParen(), RParen()]:
            return True
        return (isinstance(node, Node)
                and len(node.children) == 3
                and isinstance(node.children[0], Leaf)
                and isinstance(node.children[1], Node)
                and isinstance(node.children[2], Leaf)
                and node.children[0].value == "("
                and node.children[2].value == ")")

This is an example of finding out the “shape” of the data without doing significant extraction. This code is not very easy to read, and the intended shape that this is trying to match is not evident. Compare with the updated code using the proposed syntax:

    def is_tuple(node: Node) -> bool:
        match node:
            case Node(children=[LParen(), RParen()]):
                return True
            case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
                return True
            case _:
                return False

Note that the proposed code will work without any modifications to the definition of Node and other classes here. As shown in the examples above, the proposal supports not just unpacking sequences, but also doing isinstance checks (like LParen() or str()), looking into object attributes (Leaf(value="(") for example) and comparisons with literals.

That last feature helps with some kinds of code which look more like the “switch” statement as present in other languages:

    match response.status:
        case 200:
            do_something(response.data)  # OK
        case 301 | 302:
            retry(response.location)  # Redirect
        case 401:
            retry(auth=get_credentials())  # Login first
        case 426:
            sleep(DELAY)  # Server is swamped, try after a bit
            retry()
        case _:
            raise RequestError("we couldn't get the data")

Although this will work, it’s not necessarily what the proposal is focused on, and the new syntax has been designed to best support the destructuring scenarios.

See the syntax sections below for a more detailed specification.

We propose that destructuring objects can be customized by a new special __match_args__ attribute. As part of this PEP we specify the general API and its implementation for some standard library classes (including named tuples and dataclasses). See the runtime section below.

Finally, we aim to provide comprehensive support for static type checkers and similar tools. For this purpose, we propose to introduce a @typing.sealed class decorator that will be a no-op at runtime but will indicate to static tools that all sub-classes of this class must be defined in the same module. This will allow effective static exhaustiveness checks, and together with dataclasses, will provide basic support for algebraic data types. See the static checkers section for more details.

Syntax and Semantics

Patterns

The pattern is a new syntactic construct, that could be considered a loose generalization of assignment targets. The key properties of a pattern are what types and shapes of subjects it accepts, what variables it captures and how it extracts them from the subject. For example, the pattern [a, b] matches only sequences of exactly 2 elements, extracting the first element into a and the second one into b.

This PEP defines several types of patterns. These are certainly not the only possible ones, so the design decision was made to choose a subset of functionality that is useful now but conservative. More patterns can be added later as this feature gets more widespread use. See the rejected ideas and deferred ideas sections for more details.

The patterns listed here are described in more detail below, but summarized together in this section for simplicity:

  • A literal pattern is useful to filter constant values in a structure. It looks like a Python literal (including some values like True, False and None). It only matches objects equal to the literal, and never binds.
  • A capture pattern looks like x and is equivalent to an identical assignment target: it always matches and binds the variable with the given (simple) name.
  • The wildcard pattern is a single underscore: _. It always matches, but does not capture any variable (which prevents interference with other uses for _ and allows for some optimizations).
  • A constant value pattern works like the literal but for certain named constants. Note that it must be a qualified (dotted) name, given the possible ambiguity with a capture pattern. It looks like Color.RED and only matches values equal to the corresponding value. It never binds.
  • A sequence pattern looks like [a, *rest, b] and is similar to a list unpacking. An important difference is that the elements nested within it can be any kind of patterns, not just names or sequences. It matches only sequences of appropriate length, as long as all the sub-patterns also match. It makes all the bindings of its sub-patterns.
  • A mapping pattern looks like {"user": u, "emails": [*es]}. It matches mappings with at least the set of provided keys, and if all the sub-patterns match their corresponding values. It binds whatever the sub-patterns bind while matching with the values corresponding to the keys. Adding **rest at the end of the pattern to capture extra items is allowed.
  • A class pattern is similar to the above but matches attributes instead of keys. It looks like datetime.date(year=y, day=d). It matches instances of the given type, having at least the specified attributes, as long as the attributes match with the corresponding sub-patterns. It binds whatever the sub-patterns bind when matching with the values of the given attributes. An optional protocol also allows matching positional arguments.
  • An OR pattern looks like [*x] | {"elems": [*x]}. It matches if any of its sub-patterns match. It uses the binding for the leftmost pattern that matched.
  • A walrus pattern looks like d := datetime(year=2020, month=m). It matches only if its sub-pattern also matches. It binds whatever the sub-pattern match does, and also binds the named variable to the entire object.

The match statement

A simplified, approximate grammar for the proposed syntax is:

    ...
    compound_statement:
        | if_stmt
        ...
        | match_stmt
    match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
    case_block: "case" pattern [guard] ':' block
    guard: 'if' expression
    pattern: walrus_pattern | or_pattern
    walrus_pattern: NAME ':=' or_pattern
    or_pattern: closed_pattern ('|' closed_pattern)*
    closed_pattern:
        | literal_pattern
        | capture_pattern
        | wildcard_pattern
        | constant_pattern
        | sequence_pattern
        | mapping_pattern
        | class_pattern

See Appendix A for the full, unabridged grammar. The simplified grammars in this section are there for helping the reader, not as a full specification.

We propose that the match operation should be a statement, not an expression. Although in many languages it is an expression, being a statement better suits the general logic of Python syntax. See rejected ideas for more discussion. The allowed patterns are described in detail below in the patterns subsection.

The match and case keywords are proposed to be soft keywords, so that they are recognized as keywords at the beginning of a match statement or case block respectively, but are allowed to be used in other places as variable or argument names.

The proposed indentation structure is as follows:

    match some_expression:
        case pattern_1:
            ...
        case pattern_2:
            ...

Here, some_expression represents the value that is being matched against, which will be referred to hereafter as the subject of the match.

Match semantics

The proposed large scale semantics for choosing the match is to choose the first matching pattern and execute the corresponding suite. The remaining patterns are not tried. If there are no matching patterns, the statement ‘falls through’, and execution continues at the following statement.

Essentially this is equivalent to a chain of if ... elif ... else statements. Note that unlike for the previously proposed switch statement, the pre-computed dispatch dictionary semantics does not apply here.

There is no default or else case - instead the special wildcard _ can be used (see the section on capture_pattern) as a final ‘catch-all’ pattern.

Name bindings made during a successful pattern match outlive the executed suite and can be used after the match statement. This follows the logic of other Python statements that can bind names, such as for loop and with statement. For example:

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x, y, _, _):
            ...
    print(x, y)  # This works

During failed pattern matches, some sub-patterns may succeed. For example, while matching the value [0, 1, 2] with the pattern (0, x, 1), the sub-pattern x may succeed if the list elements are matched from left to right. The implementation may choose to either make persistent bindings for those partial matches or not. User code including a match statement should not rely on the bindings being made for a failed match, but also shouldn’t assume that variables are unchanged by a failed match. This part of the behavior is left intentionally unspecified so different implementations can add optimizations, and to prevent introducing semantic restrictions that could limit the extensibility of this feature.

Note that some pattern types below define more specific rules about when the binding is made.

Allowed patterns

We introduce the proposed syntax gradually. Here we start from the main building blocks. The following patterns are supported:

Literal Patterns

Simplified syntax:

    literal_pattern:
        | number
        | string
        | 'None'
        | 'True'
        | 'False'

A literal pattern consists of a simple literal like a string, a number, a Boolean literal (True or False), or None:

    match number:
        case 0:
            print("Nothing")
        case 1:
            print("Just one")
        case 2:
            print("A couple")
        case -1:
            print("One less than nothing")
        case 1-1j:
            print("Good luck with that...")

Literal pattern uses equality with literal on the right hand side, so that in the above example number == 0 and then possibly number == 1, etc will be evaluated. Note that although technically negative numbers are represented using unary minus, they are considered literals for the purpose of pattern matching. Unary plus is not allowed. Binary plus and minus are allowed only to join a real number and an imaginary number to form a complex number, such as 1+1j.

Note that because equality (__eq__) is used, and the equivalency between Booleans and the integers 0 and 1, there is no practical difference between the following two:

    case True:
        ...

    case 1:
        ...

Triple-quoted strings are supported. Raw strings and byte strings are supported. F-strings are not allowed (since in general they are not really literals).

Capture Patterns

Simplified syntax:

    capture_pattern: NAME

A capture pattern serves as an assignment target for the matched expression:

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")

Only a single name is allowed (a dotted name is a constant value pattern). A capture pattern always succeeds. A capture pattern appearing in a scope makes the name local to that scope. For example, using name after the above snippet may raise UnboundLocalError rather than NameError, if the "" case clause was taken:

    match greeting:
        case "":
            print("Hello!")
        case name:
            print(f"Hi {name}!")
    if name == "Santa":      # <-- might raise UnboundLocalError
        ...                  # but works fine if greeting was not empty

While matching against each case clause, a name may be bound at most once; having two capture patterns with coinciding names is an error:

    match data:
        case [x, x]:  # Error!
            ...

Note: one can still match on a collection with equal items using guards. Also, [x, y] | Point(x, y) is a legal pattern because the two alternatives are never matched at the same time.

The single underscore (_) is not considered a NAME and is treated specially as a wildcard pattern.

Reminder: None, False and True are keywords denoting literals, not names.

Wildcard Pattern

Simplified syntax:

    wildcard_pattern: "_"

The single underscore (_) name is a special kind of pattern that always matches but never binds:

    match data:
        case [_, _]:
            print("Some pair")
            print(_)  # Error!

Given that no binding is made, it can be used as many times as desired, unlike capture patterns.

Constant Value Patterns

Simplified syntax:

    constant_pattern: NAME ('.' NAME)+

This is used to match against constants and enum values. Every dotted name in a pattern is looked up using normal Python name resolution rules, and the value is used for comparison by equality with the match subject (same as for literals):

    from enum import Enum

    class Sides(str, Enum):
        SPAM = "Spam"
        EGGS = "eggs"
        ...

    match entree[-1]:
        case Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
            response = "Have you got anything without Spam?"
        case side:  # Assigns side = entree[-1].
            response = f"Well, could I have their Spam instead of the {side} then?"

Note that there is no way to use unqualified names as constant value patterns (they always denote variables to be captured). See rejected ideas for other syntactic alternatives that were considered for constant value patterns.

Sequence Patterns

Simplified syntax:

    sequence_pattern:
        | '[' [values_pattern] ']'
        | '(' [value_pattern ',' [values_pattern]] ')'
    values_pattern: ','.value_pattern+ ','?
    value_pattern: '*' capture_pattern | pattern

A sequence pattern follows the same semantics as unpacking assignment. Like unpacking assignment, both tuple-like and list-like syntax can be used, with identical semantics. Each element can be an arbitrary pattern; there may also be at most one *name pattern to catch all remaining items:

    match collection:
        case 1, [x, *others]:
            print("Got 1 and a nested sequence")
        case (1, x):
            print(f"Got 1 and {x}")

To match a sequence pattern the subject must be an instance of collections.abc.Sequence, and it cannot be any kind of string (str, bytes, bytearray). It cannot be an iterator. For matching on a specific collection class, see class pattern below.

The _ wildcard can be starred to match sequences of varying lengths. For example:

  • [*_] matches a sequence of any length.
  • (_, _, *_) matches any sequence of length two or more.
  • ["a", *_, "z"] matches any sequence of length two or more that starts with "a" and ends with "z".

Mapping Patterns

Simplified syntax:

    mapping_pattern: '{' [items_pattern] '}'
    items_pattern: ','.key_value_pattern+ ','?
    key_value_pattern:
        | (literal_pattern | constant_pattern) ':' or_pattern
        | '**' capture_pattern

Mapping pattern is a generalization of iterable unpacking to mappings. Its syntax is similar to dictionary display but each key and value are patterns "{" (pattern ":" pattern)+ "}". A **rest pattern is also allowed, to extract the remaining items. Only literal and constant value patterns are allowed in key positions:

    import constants

    match config:
        case {"route": route}:
            process_route(route)
        case {constants.DEFAULT_PORT: sub_config, **rest}:
            process_config(sub_config, rest)

The subject must be an instance of collections.abc.Mapping. Extra keys in the subject are ignored even if **rest is not present. This is different from sequence pattern, where extra items will cause a match to fail. But mappings are actually different from sequences: they have natural structural sub-typing behavior, i.e., passing a dictionary with extra keys somewhere will likely just work.

For this reason, **_ is invalid in mapping patterns; it would always be a no-op that could be removed without consequence.

Matched key-value pairs must already be present in the mapping, and not created on-the-fly by __missing__ or __getitem__. For example, collections.defaultdict instances will only match patterns with keys that were already present when the match block was entered.

Class Patterns

Simplified syntax:

    class_pattern:
        | name_or_attr '(' ')'
        | name_or_attr '(' ','.pattern+ ','? ')'
        | name_or_attr '(' ','.keyword_pattern+ ','? ')'
        | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
    keyword_pattern: NAME '=' or_pattern

A class pattern provides support for destructuring arbitrary objects. There are two possible ways of matching on object attributes: by position like Point(1, 2), and by name like Point(x=1, y=2). These two can be combined, but a positional match cannot follow a match by name. Each item in a class pattern can be an arbitrary pattern. A simple example:

    match shape:
        case Point(x, y):
            ...
        case Rectangle(x0, y0, x1, y1, painted=True):
            ...

Whether a match succeeds or not is determined by the equivalent of an isinstance call. If the subject (shape, in the example) is not an instance of the named class (Point or Rectangle), the match fails. Otherwise, it continues (see details in the runtime section).

The named class must be an instance of type (that is, it must be a class). It may be a single name or a dotted name (e.g. some_mod.SomeClass or mod.pkg.Class). The leading name must not be _, so e.g. _(...) and _.C(...) are invalid. Use object(foo=_) to check whether the matched object has an attribute foo.

By default, sub-patterns may only be matched by keyword for user-defined classes. In order to support positional sub-patterns, a custom __match_args__ attribute is required. The runtime allows matching against arbitrarily nested patterns by chaining all of the instance checks and attribute lookups appropriately.

Combining multiple patterns (OR patterns)

Multiple alternative patterns can be combined into one using |. This means the whole pattern matches if at least one alternative matches. Alternatives are tried from left to right and have a short-circuit property: subsequent patterns are not tried if one matched. Examples:

    match something:
        case 0 | 1 | 2:
            print("Small number")
        case [] | [_]:
            print("A short sequence")
        case str() | bytes():
            print("Something string-like")
        case _:
            print("Something else")

The alternatives may bind variables, as long as each alternative binds the same set of variables (excluding _). For example:

    match something:
        case 1 | x:  # Error!
            ...
        case x | 1:  # Error!
            ...
        case one := [1] | two := [2]:  # Error!
            ...
        case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
            ...
        case [x] | x:  # Valid, both arms bind 'x'
            ...

Guards

Each top-level pattern can be followed by a guard of the form if expression. A case clause succeeds if the pattern matches and the guard evaluates to a true value. For example:

    match input:
        case [x, y] if x > MAX_INT and y > MAX_INT:
            print("Got a pair of large numbers")
        case x if x > MAX_INT:
            print("Got a large number")
        case [x, y] if x == y:
            print("Got equal items")
        case _:
            print("Not an outstanding input")

If evaluating a guard raises an exception, it is propagated onwards rather than fail the case clause. Names that appear in a pattern are bound before the guard succeeds. So this will work:

    values = [0]

    match values:
        case [x] if x:
            ...  # This is not executed
        case _:
            ...
    print(x)  # This will print "0"

Note that guards are not allowed for nested patterns, so that [x if x > 0] is a SyntaxError and 1 | 2 if 3 | 4 will be parsed as (1 | 2) if (3 | 4).

Walrus patterns

It is often useful to match a sub-pattern and bind the corresponding value to a name. For example, it can be useful to write more efficient matches, or simply to avoid repetition. To simplify such cases, any pattern (other than the walrus pattern itself) can be preceded by a name and the walrus operator (:=). For example:

    match get_shape():
        case Line(start := Point(x, y), end) if start == end:
            print(f"Zero length line at {x}, {y}")

The name on the left of the walrus operator can be used in a guard, in the match suite, or after the match statement. However, the name will only be bound if the sub-pattern succeeds. Another example:

    match group_shapes():
        case [], [point := Point(x, y), *other]:
            print(f"Got {point} in the second group")
            process_coordinates(x, y)
            ...

Technically, most such examples can be rewritten using guards and/or nested match statements, but this will be less readable and/or will produce less efficient code. Essentially, most of the arguments in PEP 572 apply here equally.

The wildcard _ is not a valid name here.

Runtime specification

The Match Protocol

The equivalent of an isinstance call is used to decide whether an object matches a given class pattern and to extract the corresponding attributes. Classes requiring different matching semantics (such as duck-typing) can do so by defining __instancecheck__ (a pre-existing metaclass hook) or by using typing.Protocol.

The procedure is as follows:

  • The class object for Class in Class(<sub-patterns>) is looked up and isinstance(obj, Class) is called, where obj is the value being matched. If false, the match fails.
  • Otherwise, if any sub-patterns are given in the form of positional or keyword arguments, these are matched from left to right, as follows. The match fails as soon as a sub-pattern fails; if all sub-patterns succeed, the overall class pattern match succeeds.
  • If there are match-by-position items and the class has a __match_args__ attribute, the item at position i is matched against the value looked up by attribute __match_args__[i]. For example, a pattern Point2d(5, 8), where Point2d.__match_args__ == ["x", "y"], is translated (approximately) into obj.x == 5 and obj.y == 8.
  • If there are more positional items than the length of __match_args__, a TypeError is raised.
  • If the __match_args__ attribute is absent on the matched class, and one or more positional items appear in a match, TypeError is also raised. We don’t fall back on using __slots__ or __annotations__ – “In the face of ambiguity, refuse the temptation to guess.”
  • If there are any match-by-keyword items the keywords are looked up as attributes on the subject. If the lookup succeeds the value is matched against the corresponding sub-pattern. If the lookup fails, the match fails.

Such a protocol favors simplicity of implementation over flexibility and performance. For other considered alternatives, see extended matching.

For the most commonly-matched built-in types (bool, bytearray, bytes, dict, float, frozenset, int, list, set, str, and tuple), a single positional sub-pattern is allowed to be passed to the call. Rather than being matched against any particular attribute on the subject, it is instead matched against the subject itself. This creates behavior that is useful and intuitive for these objects:

  • bool(False) matches False (but not 0).
  • tuple((0, 1, 2)) matches (0, 1, 2) (but not [0, 1, 2]).
  • int(i) matches any int and binds it to the name i.

Overlapping sub-patterns

Certain classes of overlapping matches are detected at runtime and will raise exceptions. In addition to basic checks described in the previous subsection:

  • The interpreter will check that two match items are not targeting the same attribute, for example Point2d(1, 2, y=3) is an error.
  • It will also check that a mapping pattern does not attempt to match the same key more than once.

Special attribute __match_args__

The __match_args__ attribute is always looked up on the type object named in the pattern. If present, it must be a list or tuple of strings naming the allowed positional arguments.

In deciding what names should be available for matching, the recommended practice is that class patterns should be the mirror of construction; that is, the set of available names and their types should resemble the arguments to __init__().

Only match-by-name will work by default, and classes should define __match_args__ as a class attribute if they would like to support match-by-position. Additionally, dataclasses and named tuples will support match-by-position out of the box. See below for more details.

Exceptions and side effects

While matching each case, the match statement may trigger execution of other functions (for example __getitem__(), __len__() or a property). Almost every exception caused by those propagates outside of the match statement normally. The only case where an exception is not propagated is an AttributeError raised while trying to look up an attribute while matching attributes of a Class Pattern; that case results in just a matching failure, and the rest of the statement proceeds normally.

The only side-effect produced explicitly by the matching process is the binding of names. However, the process relies on attribute access, instance checks, len(), equality and item access on the subject and some of its components. It also evaluates constant value patterns and the left side of class patterns. While none of those typically create any side-effects, some of these objects could. This proposal intentionally leaves out any specification of what methods are called or how many times. User code relying on that behavior should be considered buggy.

The standard library

To facilitate the use of pattern matching, several changes will be made to the standard library:

  • Namedtuples and dataclasses will have auto-generated __match_args__.
  • For dataclasses the order of attributes in the generated __match_args__ will be the same as the order of corresponding arguments in the generated __init__() method. This includes the situations where attributes are inherited from a superclass.

In addition, a systematic effort will be put into going through existing standard library classes and adding __match_args__ where it looks beneficial.

Static checkers specification

Exhaustiveness checks

From a reliability perspective, experience shows that missing a case when dealing with a set of possible data values leads to hard-to-debug issues, thus forcing people to add safety asserts like this:

from typing import Union

def get_first(data: Union[int, list[int]]) -> int:
    if isinstance(data, list) and data:
        return data[0]
    elif isinstance(data, int):
        return data
    else:
        assert False, "should never get here"

PEP 484 specifies that static type checkers should support exhaustiveness in conditional checks with respect to enum values. PEP 586 later generalized this requirement to literal types.

This PEP further generalizes this requirement to arbitrary patterns. A typical situation where this applies is matching an expression with a union type:

from typing import List, Tuple, Union

def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
    match val:
        case [x, y] if x > 0 and y > 0:
            return f"A pair of {x} and {y}"
        case [x, *other]:
            return f"A sequence starting with {x}"
        case int():
            return f"Some integer"
        # Type-checking error: some cases unhandled.

The exhaustiveness checks should also apply where both pattern matching and enum values are combined:

from enum import Enum
from typing import Union

class Level(Enum):
    BASIC = 1
    ADVANCED = 2
    PRO = 3

class User:
    name: str
    level: Level

class Admin:
    name: str

account: Union[User, Admin]

match account:
    case Admin(name=name) | User(name=name, level=Level.PRO):
        ...
    case User(level=Level.ADVANCED):
        ...
    # Type-checking error: basic user unhandled

Obviously, no Matchable protocol (in terms of PEP 544) is needed, since every class is matchable and therefore is subject to the checks specified above.

Sealed classes as algebraic data types

Quite often it is desirable to apply exhaustiveness to a set of classes without defining ad-hoc union types, which is itself fragile if a class is missing in the union definition. A design pattern where a group of record-like classes is combined into a union is popular in other languages that support pattern matching and is known as algebraic data types.

We propose to add a special decorator class @sealed to the typing module, that will have no effect at runtime, but will indicate to static type checkers that all subclasses (direct and indirect) of this class should be defined in the same module as the base class.

The idea is that since all subclasses are known, the type checker can treat the sealed base class as a union of all its subclasses. Together with dataclasses this allows a clean and safe support of algebraic data types in Python. Consider this example:

from dataclasses import dataclass
from typing import sealed

@sealed
class Node:
    ...

class Expression(Node):
    ...

class Statement(Node):
    ...

@dataclass
class Name(Expression):
    name: str

@dataclass
class Operation(Expression):
    left: Expression
    op: str
    right: Expression

@dataclass
class Assignment(Statement):
    target: str
    value: Expression

@dataclass
class Print(Statement):
    value: Expression

With such a definition, a type checker can safely treat Node as Union[Name, Operation, Assignment, Print], and also safely treat e.g. Expression as Union[Name, Operation]. So this will result in a type checking error in the snippet below, because Name is not handled (and the type checker can give a useful error message):

def dump(node: Node) -> str:
    match node:
        case Assignment(target, value):
            return f"{target} = {dump(value)}"
        case Print(value):
            return f"print({dump(value)})"
        case Operation(left, op, right):
            return f"({dump(left)} {op} {dump(right)})"

Type erasure

Class patterns are subject to runtime type erasure. Namely, although one can define a type alias IntQueue = Queue[int] so that a pattern like IntQueue() is syntactically valid, type checkers should reject such a match:

queue: Union[Queue[int], Queue[str]]
match queue:
    case IntQueue():  # Type-checking error here
        ...

Note that the above snippet actually fails at runtime with the current implementation of generic classes in the typing module, as well as with builtin generic classes in the recently accepted PEP 585, because they prohibit isinstance checks.

To clarify, generic classes are not prohibited in general from participating in pattern matching, just that their type parameters can’t be explicitly specified. It is still fine if sub-patterns or literals bind the type variables. For example:

from typing import Generic, TypeVar, Union

T = TypeVar('T')

class Result(Generic[T]):
    first: T
    other: list[T]

result: Union[Result[int], Result[str]]

match result:
    case Result(first=int()):
        ...  # Type of result is Result[int] here
    case Result(other=["foo", "bar", *rest]):
        ...  # Type of result is Result[str] here

Note about constants

The fact that a capture pattern is always an assignment target may create unwanted consequences when a user by mistake tries to “match” a value against a constant instead of using the constant value pattern. As a result, at runtime such a match will always succeed and moreover override the value of the constant. It is important therefore that static type checkers warn about such situations. For example:

from typing import Final

MAX_INT: Final = 2 ** 64

value = 0

match value:
    case MAX_INT:  # Type-checking error here: cannot assign to final name
        print("Got big number")
    case _:
        print("Something else")

Note that the CPython reference implementation also generates a SyntaxWarning message for this case.

Precise type checking of star matches

Type checkers should perform precise type checking of star items in pattern matching giving them either a heterogeneous list[T] type, or a TypedDict type as specified by PEP 589. For example:

stuff: Tuple[int, str, str, float]

match stuff:
    case a, *b, 0.5:
        # Here a is int and b is list[str]
        ...

Performance Considerations

Ideally, a match statement should have good runtime performance compared to an equivalent chain of if-statements. Although the history of programming languages is rife with examples of new features which increased engineer productivity at the expense of additional CPU cycles, it would be unfortunate if the benefits of match were counter-balanced by a significant overall decrease in runtime performance.

Although this PEP does not specify any particular implementation strategy, a few words about the prototype implementation and how it attempts to maximize performance are in order.

Basically, the prototype implementation transforms all of the match statement syntax into equivalent if/else blocks, or more accurately, into Python byte codes that have the same effect. In other words, all of the logic for testing instance types, sequence lengths, mapping keys and so on is inlined in place of the match.

This is not the only possible strategy, nor is it necessarily the best. For example, the instance checks could be memoized, especially if there are multiple instances of the same class type but with different arguments in a single match statement. It is also theoretically possible for a future implementation to process case clauses or sub-patterns in parallel using a decision tree rather than testing them one by one.

Backwards Compatibility

This PEP is fully backwards compatible: the match and case keywords are proposed to be (and stay!) soft keywords, so their use as variable, function, class, module or attribute names is not impeded at all.

This is important because match is the name of a popular and well-known function and method in the re module, which we have no desire to break or deprecate.

The difference between hard and soft keywords is that hard keywords are always reserved words, even in positions where they make no sense (e.g. x = class + 1), while soft keywords only get a special meaning in context. Since PEP 617 the parser backtracks, which means that on different attempts to parse a code fragment it could interpret a soft keyword differently.

For example, suppose the parser encounters the following input:

match [x, y]:

The parser first attempts to parse this as an expression statement. It interprets match as a NAME token, and then considers [x, y] to be a double subscript. It then encounters the colon and has to backtrack, since an expression statement cannot be followed by a colon. The parser then backtracks to the start of the line and finds that match is a soft keyword allowed in this position. It then considers [x, y] to be a list expression. The colon then is just what the parser expected, and the parse succeeds.

Impacts on third-party tools

There are a lot of tools in the Python ecosystem that operate on Python source code: linters, syntax highlighters, auto-formatters, and IDEs. These will all need to be updated to include awareness of the match statement.

In general, these tools fall into one of two categories:

Shallow parsers don’t try to understand the full syntax of Python, but instead scan the source code for specific known patterns. IDEs, such as Visual Studio Code, Emacs and TextMate, tend to fall in this category, since frequently the source code is invalid while being edited, and a strict approach to parsing would fail.

For these kinds of tools, adding knowledge of a new keyword is relatively easy, just an addition to a table, or perhaps modification of a regular expression.

Deep parsers understand the complete syntax of Python. An example of this is the auto-formatter Black. A particular requirement with these kinds of tools is that they not only need to understand the syntax of the current version of Python, but older versions of Python as well.

The match statement uses a soft keyword, and it is one of the first major Python features to take advantage of the capabilities of the new PEG parser. This means that third-party parsers which are not ‘PEG-compatible’ will have a hard time with the new syntax.

It has been noted that a number of these third-party tools leverage common parsing libraries (Black for example uses a fork of the lib2to3 parser). It may be helpful to identify widely used parsing libraries (such as parso and libCST) and upgrade them to be PEG compatible.

However, since this work would need to be done not only for the match statement, but for any new Python syntax that leverages the capabilities of the PEG parser, it is considered out of scope for this PEP. (Although it is suggested that this would make a fine Summer of Code project.)

Reference Implementation

A feature-complete CPython implementation is available on GitHub.

An interactive playground based on the above implementation was created using Binder and Jupyter.

Example Code

A small collection of example code is available on GitHub.

Rejected Ideas

This general idea has been floating around for a pretty long time, and many back and forth decisions were made. Here we summarize many alternative paths that were taken but eventually abandoned.

Don’t do this, pattern matching is hard to learn

In our opinion, the proposed pattern matching is not more difficult than adding isinstance() and getattr() to iterable unpacking. Also, we believe the proposed syntax significantly improves readability for a wide range of code patterns, by allowing one to express what one wants to do, rather than how to do it. We hope the few real code snippets we included in the PEP above illustrate this comparison well enough. For more real code examples and their translations see Ref. [1].

Don’t do this, use existing method dispatching mechanisms

We recognize that some of the use cases for the match statement overlap with what can be done with traditional object-oriented programming (OOP) design techniques using class inheritance. The ability to choose alternate behaviors based on testing the runtime type of a match subject might even seem heretical to strict OOP purists.

However, Python has always been a language that embraces a variety of programming styles and paradigms. Classic Python design idioms such as “duck”-typing go beyond the traditional OOP model.

We believe that there are important use cases where the use of match results in a cleaner and more maintainable architecture. These use cases tend to be characterized by a number of features:

  • Algorithms which cut across traditional lines of data encapsulation. If an algorithm is processing heterogeneous elements of different types (such as evaluating or transforming an abstract syntax tree, or doing algebraic manipulation of mathematical symbols), forcing the user to implement the algorithm as individual methods on each element type results in logic that is smeared across the entire codebase instead of being neatly localized in one place.
  • Program architectures where the set of possible data types is relatively stable, but there is an ever-expanding set of operations to be performed on those data types. Doing this in a strict OOP fashion requires constantly adding new methods to both the base class and subclasses to support the new methods, “polluting” the base class with lots of very specialized method definitions, and causing widespread disruption and churn in the code. By contrast, in a match-based dispatch, adding a new behavior merely involves writing a new match statement.
  • OOP also does not handle dispatching based on the shape of an object, such as the length of a tuple, or the presence of an attribute – instead any such dispatching decision must be encoded into the object’s type. Shape-based dispatching is particularly interesting when it comes to handling “duck”-typed objects.

Where OOP is clearly superior is in the opposite case: where the set of possible operations is relatively stable and well-defined, but there is an ever-growing set of data types to operate on. A classic example of this is UI widget toolkits, where there is a fixed set of interaction types (repaint, mouse click, keypress, and so on), but the set of widget types is constantly expanding as developers invent new and creative user interaction styles. Adding a new kind of widget is a simple matter of writing a new subclass, whereas with a match-based approach you end up having to add a new case clause to many widespread match statements. We therefore don’t recommend using match in such a situation.

Allow more flexible assignment targets instead

There was an idea to instead just generalize the iterable unpacking to much more general assignment targets, instead of adding a new kind of statement. This concept is known in some other languages as “irrefutable matches”. We decided not to do this because inspection of real-life potential use cases showed that in the vast majority of cases destructuring is related to an if condition. Also many of those are grouped in a series of exclusive choices.

Make it an expression

In most other languages pattern matching is represented by an expression, not a statement. But making it an expression would be inconsistent with other syntactic choices in Python. All decision making logic is expressed almost exclusively in statements, so we decided to not deviate from this.

Use a hard keyword

There were options to make match a hard keyword, or choose a different keyword. Although using a hard keyword would simplify life for simple-minded syntax highlighters, we decided not to use a hard keyword for several reasons:

  • Most importantly, the new parser doesn’t require us to do this. Unlike with async, which caused hardships by being a soft keyword for a few releases, here we can make match a permanent soft keyword.
  • match is so commonly used in existing code that it would break almost every existing program and would put a burden to fix code on many people who may not even benefit from the new syntax.
  • It is hard to find an alternative keyword that would not be commonly used in existing programs as an identifier, and would still clearly reflect the meaning of the statement.

Use as or | instead of case for case clauses

The pattern matching proposed here is a combination of multi-branch control flow (in line with switch in Algol-derived languages or cond in Lisp) and object-deconstruction as found in functional languages. While the proposed keyword case highlights the multi-branch aspect, alternative keywords such as as would equally be possible, highlighting the deconstruction aspect. as or with, for instance, also have the advantage of already being keywords in Python. However, since case as a keyword can only occur as a leading keyword inside a match statement, it is easy for a parser to distinguish between its use as a keyword or as a variable.

Other variants would use a symbol like | or =>, or go entirely without a special marker.

Since Python is a statement-oriented language in the tradition of Algol, and as each composite statement starts with an identifying keyword, case seemed to be most in line with Python’s style and traditions.

Use a flat indentation scheme

There was an idea to use an alternative indentation scheme, for example where every case clause would not be indented with respect to the initial match part:

match expression:
case pattern_1:
    ...
case pattern_2:
    ...

Although flat indentation saves some horizontal space, it may look awkward to the eye of a Python programmer, because everywhere else a colon is followed by an indent. This will also complicate life for simple-minded code editors. Finally, the horizontal space issue can be alleviated by allowing “half-indent” (i.e. two spaces instead of four) for match statements.

In sample programs using match, written as part of the development of this PEP, a noticeable improvement in code brevity is observed, more than making up for the additional indentation level.

Another proposal considered was to use flat indentation but put the expression on the line after match:, like this:

match:
    expression
case pattern_1:
    ...
case pattern_2:
    ...

This was ultimately rejected because the first block would be a novelty in Python’s grammar: a block whose only content is a single expression rather than a sequence of statements.

Alternatives for constant value pattern

This is probably the trickiest item. Matching against some pre-defined constants is very common, but the dynamic nature of Python also makes it ambiguous with capture patterns. Five other alternatives were considered:

  • Use some implicit rules. For example, if a name was defined in the global scope, then it refers to a constant, rather than representing a capture pattern:
    # Here, the name "spam" must be defined in the global scope (and
    # not shadowed locally). "side" must be local.

    match entree[-1]:
        case spam: ...  # Compares entree[-1] == spam.
        case side: ...  # Assigns side = entree[-1].

    This however can cause surprises and action at a distance if someone defines an unrelated coinciding name before the match statement.

  • Use a rule based on the case of a name. In particular, if the name starts with a lowercase letter it would be a capture pattern, while if it starts with uppercase it would refer to a constant:
    match entree[-1]:
        case SPAM: ...  # Compares entree[-1] == SPAM.
        case side: ...  # Assigns side = entree[-1].

    This works well with the recommendations for naming constants from PEP 8. The main objection is that there’s no other part of core Python where the case of a name is semantically significant. In addition, Python allows identifiers to use different scripts, many of which (e.g. CJK) don’t have a case distinction.

  • Use extra parentheses to indicate lookup semantics for a given name. For example:
    match entree[-1]:
        case (spam): ...  # Compares entree[-1] == spam.
        case side: ...    # Assigns side = entree[-1].

    This may be a viable option, but it can create some visual noise if used often. Also honestly it looks pretty unusual, especially in nested contexts.

    This also has the problem that we may want or need parentheses to disambiguate grouping in patterns, e.g. in Point(x, y=(y := complex())).

  • Introduce a special symbol, for example ., ?, $, or ^ to indicate that a given name is a value to be matched against, not to be assigned to. An earlier version of this proposal used a leading-dot rule:
    match entree[-1]:
        case .spam: ...  # Compares entree[-1] == spam.
        case side: ...   # Assigns side = entree[-1].

    While potentially useful, it introduces strange-looking new syntax without making the pattern syntax any more expressive. Indeed, named constants can be made to work with the existing rules by converting them to Enum types, or enclosing them in their own namespace (considered by the authors to be one honking great idea):

    match entree[-1]:
        case Sides.SPAM: ...  # Compares entree[-1] == Sides.SPAM.
        case side: ...        # Assigns side = entree[-1].

    If needed, the leading-dot rule (or a similar variant) could be added back later with no backward-compatibility issues.

  • There was also an idea to make lookup semantics the default, and require $ or ? to be used in capture patterns:
    match entree[-1]:
        case spam: ...   # Compares entree[-1] == spam.
        case side?: ...  # Assigns side = entree[-1].

    There are a few issues with this:

    • Capture patterns are more common in typical code, so it is undesirable to require special syntax for them.
    • The authors are not aware of any other language that adorns captures in this way.
    • None of the proposed syntaxes have any precedent in Python; no other place in Python that binds names (e.g. import, def, for) uses special marker syntax.
    • It would break the syntactic parallels of the current grammar:
      match coords:
          case ($x, $y):
              return Point(x, y)  # Why not "Point($x, $y)"?

In the end, these alternatives were rejected because of the mentioned drawbacks.

Disallow float literals in patterns

Because of the inexactness of floats, an early version of this proposal did not allow floating-point constants to be used as match patterns. Part of the justification for this prohibition is that Rust does this.

However, during implementation, it was discovered that distinguishing between float values and other types required extra code in the VM that would slow matches generally. Given that Python and Rust are very different languages with different user bases and underlying philosophies, it was felt that allowing float literals would not cause too much harm, and would be less surprising to users.

Range matching patterns

This would allow patterns such as 1...6. However, there are a host of ambiguities:

  • Is the range open, half-open, or closed? (I.e. is 6 included in the above example or not?)
  • Does the range match a single number, or a range object?
  • Range matching is often used for character ranges (‘a’…‘z’) but that won’t work in Python since there’s no character data type, just strings.
  • Range matching can be a significant performance optimization if you can pre-build a jump table, but that’s not generally possible in Python due to the fact that names can be dynamically rebound.

Rather than creating a special-case syntax for ranges, it was decided that allowing custom pattern objects (InRange(0, 6)) would be more flexible and less ambiguous; however those ideas have been postponed for the time being (see deferred ideas).

Use dispatch dict semantics for matches

Implementations of the classic switch statement sometimes use a pre-computed hash table instead of chained equality comparisons to gain some performance. In the context of the match statement this is technically also possible for matches against literal patterns. However, having subtly different semantics for different kinds of patterns would be too surprising for a potentially modest performance win.

We can still experiment with possible performance optimizations in this direction if they will not cause semantic differences.

Use continue and break in case clauses.

Another rejected proposal was to define new meanings for continue and break inside of match, which would have the following behavior:

  • continue would exit the current case clause and continue matching at the next case clause.
  • break would exit the match statement.

However, there is a serious drawback to this proposal: if the match statement is nested inside of a loop, the meanings of continue and break are now changed. This may cause unexpected behavior during refactorings; also, an argument can be made that there are other means to get the same behavior (such as using guard conditions), and that in practice it’s likely that the existing behavior of continue and break is far more useful.

AND (&) patterns

This proposal defines an OR-pattern (|) to match one of several alternates; why not also an AND-pattern (&)? Especially given that some other languages (F# for example) support this.

However, it’s not clear how useful this would be. The semantics for matching dictionaries, objects and sequences already incorporate an implicit ‘and’: all attributes and elements mentioned must be present for the match to succeed. Guard conditions can also support many of the use cases that a hypothetical ‘and’ operator would be used for.

In the end, it was decided that this would make the syntax more complex without adding a significant benefit.

Negative match patterns

A negation of a match pattern using the operator ! as a prefix would match exactly if the pattern itself does not match. For instance, !(3 | 4) would match anything except 3 or 4.

This was rejected because there is documented evidence that this feature is rarely useful (in languages which support it) or is used as double negation !! to control variable scopes and prevent variable bindings (which does not apply to Python). It can also be simulated using guard conditions.

Check exhaustiveness at runtime

The question is what to do if no case clause has a matching pattern, and there is no default case. An earlier version of the proposal specified that the behavior in this case would be to throw an exception rather than silently falling through.

The arguments back and forth were many, but in the end the EIBTI (Explicit Is Better Than Implicit) argument won out: it’s better to have the programmer explicitly throw an exception if that is the behavior they want.

For cases such as sealed classes and enums, where the patterns are all known to be members of a discrete set, static checkers can warn about missing patterns.

Type annotations for pattern variables

The proposal was to combine patterns with type annotations:

match x:
    case [a: int, b: str]: print(f"An int {a} and a string {b}:")
    case [a: int, b: int, c: int]: print(f"Three ints", a, b, c)
    ...

This idea has a lot of problems. For one, the colon can only be used inside of brackets or parens, otherwise the syntax becomes ambiguous. And because Python disallows isinstance() checks on generic types, type annotations containing generics will not work as expected.

Allow *rest in class patterns

It was proposed to allow *rest in a class pattern, giving a variable to be bound to all positional arguments at once (similar to its use in unpacking assignments). It would provide some symmetry with sequence patterns. But it might be confused with a feature to provide the values for all positional arguments at once. And there seems to be no practical need for it, so it was scrapped. (It could easily be added at a later stage if a need arises.)

Disallow _.a in constant value patterns

The first public draft said that the initial name in a constant value pattern must not be _ because _ has a special meaning in pattern matching, so this would be invalid:

case _.a: ...

(However, a._ would be legal and load the attribute with name _ of the object a as usual.)

There was some pushback against this on python-dev (some people have a legitimate use for _ as an important global variable, esp. in i18n) and the only reason for this prohibition was to prevent some user confusion. But it’s not the hill to die on.

Use some other token as wildcard

It has been proposed to use ... (i.e., the ellipsis token) or * (star) as a wildcard. However, both these look as if an arbitrary number of items is omitted:

case [a, ..., z]: ...
case [a, *, z]: ...

Both look like they would match a sequence of two or more items, capturing the first and last values.

In addition, if * were to be used as the wildcard character, we would have to come up with some other way to capture the rest of a sequence, currently spelled like this:

case [first, second, *rest]: ...

Using an ellipsis would also be more confusing in documentation and examples, where ... is routinely used to indicate something obvious or irrelevant. (Yes, this would also be an argument against the other uses of ... in Python, but that water is already under the bridge.)

Another proposal was to use ?. This could be acceptable, although it would require modifying the tokenizer.

Also, _ is already used as a throwaway target in other contexts, and this use is pretty similar. This example is from difflib.py in the stdlib:

for tag, _, _, j1, j2 in group: ...

Perhaps the most convincing argument is that _ is used as the wildcard in every other language we’ve looked at supporting pattern matching: C#, Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, and Swift. Now, in general, we should not be concerned too much with what another language does, since Python is clearly different from all these languages. However, if there is such an overwhelming and strong consensus, Python should not go out of its way to do something completely different – particularly given that _ works well in Python and is already in use as a throwaway target.

Note that _ is not assigned to by patterns – this avoids conflicts with the use of _ as a marker for translatable strings and an alias for gettext.gettext, as recommended by the gettext module documentation.

Use some other syntax instead of | for OR patterns

A few alternatives to using | to separate the alternatives in OR patterns have been proposed. Instead of:

case 401 | 403 | 404:
    print("Some HTTP error")

the following proposals have been fielded:

  • Use a comma:
    case 401, 403, 404:
        print("Some HTTP error")

    This looks too much like a tuple – we would have to find a different way to spell tuples, and the construct would have to be parenthesized inside the argument list of a class pattern. In general, commas already have many different meanings in Python, we shouldn’t add more.

  • Allow stacked cases:
    case 401:
    case 403:
    case 404:
        print("Some HTTP error")

    This is how this would be done in C, using its fall-through semantics for cases. However, we don’t want to mislead people into thinking that match/case uses fall-through semantics (which are a common source of bugs in C). Also, this would be a novel indentation pattern, which might make it harder to support in IDEs and such (it would break the simple rule “add an indentation level after a line ending in a colon”). Finally, this wouldn’t support OR patterns nested inside other patterns.

  • Use case in followed by a comma-separated list:
    case in 401, 403, 404:
        print("Some HTTP error")

    This wouldn’t work for OR patterns nested inside other patterns, like:

    case Point(0 | 1, 0 | 1):
        print("A corner of the unit square")
  • Use the or keyword:
    case 401 or 403 or 404:
        print("Some HTTP error")

    This could work, and the readability is not too different from using |. Some users expressed a preference for or because they associate | with bitwise OR. However:

    1. Many other languages that have pattern matching use | (the list includes Elixir, Erlang, F#, Mathematica, OCaml, Ruby, Rust, and Scala).
    2. | is shorter, which may contribute to the readability of nested patterns like Point(0 | 1, 0 | 1).
    3. Some people mistakenly believe that | has the wrong priority; but since patterns don’t support other operators it has the same priority as in expressions.
    4. Python users use or very frequently, and may build an impression that it is strongly associated with Boolean short-circuiting.
    5. | is used between alternatives in regular expressions and in EBNF grammars (like Python’s own).
    6. | is not just used for bitwise OR – it’s used for set unions, dict merging (PEP 584) and is being considered as an alternative to typing.Union (PEP 604).
    7. | works better as a visual separator, especially between strings. Compare:
      case "spam" or "eggs" or "cheese":

      to:

      case "spam" | "eggs" | "cheese":

Add an else clause

We decided not to add an else clause for several reasons.

  • It is redundant, since we already have case _:
  • There will forever be confusion about the indentation level of the else: clause. Should it align with the list of cases or with the match keyword?
  • Completionist arguments like “every other statement has one” are false: only those statements have an else clause where it adds new functionality.

Deferred Ideas

There were a number of proposals to extend the matching syntax that we decided to postpone for a possible future PEP. These fall into the realm of “cool idea but not essential”, and it was felt that it might be better to acquire some real-world data on how the match statement will be used in practice before moving forward with some of these proposals.

Note that in each case, the idea was judged to be a “two-way door”, meaning that there should be no backwards-compatibility issues with adding these features later.

One-off syntax variant

While inspecting some code-bases that may benefit the most from the proposed syntax, it was found that single clause matches would be used relatively often, mostly for various special-casing. In other languages this is supported in the form of one-off matches. We proposed to support such one-off matches too:

if match value as pattern [and guard]:
    ...

or, alternatively, without the if:

match value as pattern [if guard]:
    ...

as equivalent to the following expansion:

match value:
    case pattern [if guard]:
        ...

To illustrate how this will benefit readability, consider this (slightly simplified) snippet from real code:

if isinstance(node, CallExpr):
    if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
            isinstance(node.args[0], NameExpr)):
        call = node.callee.name
        arg = node.args[0].name
        ...  # Continue special-casing 'call' and 'arg'
...  # Follow with common code

This can be rewritten in a more straightforward way as:

if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
    ...  # Continue special-casing 'call' and 'arg'
...  # Follow with common code

This one-off form would not allow elif match statements, as it was only meant to handle a single pattern case. It was intended to be a special case of a match statement, not a special case of an if statement:

if match value_1 as pattern_1 [and guard_1]:
    ...
elif match value_2 as pattern_2 [and guard_2]:  # Not allowed
    ...
elif match value_3 as pattern_3 [and guard_3]:  # Not allowed
    ...
else:  # Also not allowed
    ...

This would defeat the purpose of one-off matches as a complement to exhaustive full matches - it’s better and clearer to use a full match in this case.

Similarly, if not match would not be allowed, since match ... as ... is not an expression. Nor do we propose a while match construct present in some languages with pattern matching, since although it may be handy, it will likely be used rarely.

Other pattern-based constructions

Many other languages supporting pattern-matching use it as a basis for multiple language constructs, including a matching operator, a generalized form of assignment, a filter for loops, a method for synchronizing communication, or specialized if statements. Some of these were mentioned in the discussion of the first draft. Another question asked was why this particular form (joining binding and conditional selection) was chosen while other forms were not.

Introducing more uses of patterns would be too bold and premature given the limited experience we have using patterns, and would make this proposal too complicated. The statement as presented provides a form of the feature that is sufficiently general to be useful while being self-contained, and without having a massive impact on the syntax and semantics of the language as a whole.

After some experience with this feature, the community may have a better feeling for what other uses of pattern matching could be valuable in Python.

Algebraic matching of repeated names

A technique occasionally seen in functional languages like Erlang and Elixir is to use a match variable multiple times in the same pattern:

match value:
    case Point(x, x):
        print("Point is on a diagonal!")

The idea here is that the first appearance of x would bind the value to the name, and subsequent occurrences would verify that the incoming value was equal to the value previously bound. If the value was not equal, the match would fail.

However, there are a number of subtleties involved with mixing load-store semantics for capture patterns. For the moment, we decided to make repeated use of names within the same pattern an error; we can always relax this restriction later without affecting backwards compatibility.

Note that you can use the same name more than once in alternate choices:

match value:
    case x | [x]:
        # etc.

Custom matching protocol

During the initial design discussions for this PEP, there were a lot of ideas thrown around about custom matchers. There were a couple of motivations for this:

  • Some classes might want to expose a different set of “matchable” names than the actual class properties.
  • Some classes might have properties that are expensive to calculate, and therefore shouldn’t be evaluated unless the match pattern actually needed access to them.
  • There were ideas for exotic matchers such as IsInstance(), InRange(), RegexMatchingGroup() and so on.
  • In order for built-in types and standard library classes to be able to support matching in a reasonable and intuitive way, it was believed that these types would need to implement special matching logic.

These customized match behaviors would be controlled by a special __match__ method on the class name. There were two competing variants:

  • A ‘full-featured’ match protocol which would pass in not only the subject to be matched, but detailed information about which attributes the specified pattern was interested in.
  • A simplified match protocol, which only passed in the subject value, and which returned a “proxy object” (which in most cases could be just the subject) containing the matchable attributes.

Here’s an example of one version of the more complex protocol proposed:

match expr:
    case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
        ...

from types import PatternObject
BinaryOp.__match__(
    (),
    {
        "left": PatternObject(Number, (), {"value": ...}, -1, False),
        "op": ...,
        "right": PatternObject(Number, (), {"value": ...}, -1, False),
    },
    -1,
    False,
)

One drawback of this protocol is that the arguments to __match__ would be expensive to construct, and could not be pre-computed due to the fact that, because of the way names are bound, there are no real constants in Python. It also meant that the __match__ method would have to re-implement much of the logic of matching which would otherwise be implemented in C code in the Python VM. As a result, this option would perform poorly compared to an equivalent if-statement.

The simpler protocol suffered from the fact that although it was more performant, it was much less flexible, and did not allow for many of the creative custom matchers that people were dreaming up.

Late in the design process, however, it was realized that the need for a custom matching protocol was much less than anticipated. Virtually all the realistic (as opposed to fanciful) use cases brought up could be handled by the built-in matching behavior, although in a few cases an extra guard condition was required to get the desired effect.

Moreover, it turned out that none of the standard library classes really needed any special matching support other than an appropriate __match_args__ property.

The decision to postpone this feature came with a realization that this is not a one-way door; that a more flexible and customizable matching protocol can be added later, especially as we gain more experience with real-world use cases and actual user needs.

The authors of this PEP expect that the match statement will evolve over time as usage patterns and idioms evolve, in a way similar to what other “multi-stage” PEPs have done in the past. When this happens, the extended matching issue can be revisited.

Parameterized Matching Syntax

(Also known as “Class Instance Matchers”.)

This is another variant of the “custom match classes” idea that would allow diverse kinds of custom matchers mentioned in the previous section; however, instead of using an extended matching protocol, it would be achieved by introducing an additional pattern type with its own syntax. This pattern type would accept two distinct sets of parameters: one set which consists of the actual parameters passed into the pattern object’s constructor, and another set representing the binding variables for the pattern.

The __match__ method of these objects could use the constructor parameter values in deciding what was a valid match.

This would allow patterns such as InRange<0,6>(value), which would match a number in the range 0..6 and assign the matched value to ‘value’. Similarly, one could have a pattern which tests for the existence of a named group in a regular expression match result (different meaning of the word ‘match’).

Although there is some support for this idea, there was a lot of bikeshedding on the syntax (there are not a lot of attractive options available) and no clear consensus was reached, so it was decided that for now, this feature is not essential to the PEP.

Pattern Utility Library

Both of the previous ideas would be accompanied by a new Python standard library module which would contain a rich set of useful matchers. However, it is not really possible to implement such a library without adopting one of the extended pattern proposals given in the previous sections, so this idea is also deferred.

Acknowledgments

We are grateful to the following individuals (among many others) for helping out during various phases of the writing of this PEP:

  • Gregory P. Smith
  • Jim Jewett
  • Mark Shannon
  • Nate Lust
  • Taine Zhao

Version History

  1. Initial version
  2. Substantial rewrite, including:
    • Minor clarifications, grammar and typo corrections
    • Rename various concepts
    • Additional discussion of rejected ideas, including:
      • Why we choose _ for wildcard patterns
      • Why we choose | for OR patterns
      • Why we choose not to use special syntax for capture variables
      • Why this pattern matching operation and not others
    • Clarify exception and side effect semantics
    • Clarify partial binding semantics
    • Drop restriction on use of _ in load contexts
    • Drop the default single positional argument being the whole subject except for a handful of built-in types
    • Simplify behavior of __match_args__
    • Drop the __match__ protocol (moved to deferred ideas)
    • Drop ImpossibleMatchError exception
    • Drop leading dot for loads (moved to deferred ideas)
    • Reworked the initial sections (everything before syntax)
    • Added an overview of all the types of patterns before the detailed description
    • Added simplified syntax next to the description of each pattern
    • Separate description of the wildcard from capture patterns
    • Added Daniel F Moisset as sixth co-author

References

[1]
https://github.com/gvanrossum/patma/blob/master/EXAMPLES.md

Appendix A – Full Grammar

Here is the full grammar for match_stmt. This is an additional alternative for compound_stmt. It should be understood that match and case are soft keywords, i.e. they are not reserved words in other grammatical contexts (including at the start of a line if there is no colon where expected). By convention, hard keywords use single quotes while soft keywords use double quotes.

Other notation used beyond standard EBNF:

  • SEP.RULE+ is shorthand for RULE (SEP RULE)*
  • !RULE is a negative lookahead assertion

match_expr:
    | star_named_expression ',' star_named_expressions?
    | named_expression
match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
case_block: "case" patterns [guard] ':' block
guard: 'if' named_expression
patterns: value_pattern ',' [values_pattern] | pattern
pattern: walrus_pattern | or_pattern
walrus_pattern: NAME ':=' or_pattern
or_pattern: '|'.closed_pattern+
closed_pattern:
    | capture_pattern
    | literal_pattern
    | constant_pattern
    | group_pattern
    | sequence_pattern
    | mapping_pattern
    | class_pattern
capture_pattern: NAME !('.' | '(' | '=')
literal_pattern:
    | signed_number !('+' | '-')
    | signed_number '+' NUMBER
    | signed_number '-' NUMBER
    | strings
    | 'None'
    | 'True'
    | 'False'
constant_pattern: attr !('.' | '(' | '=')
group_pattern: '(' patterns ')'
sequence_pattern: '[' [values_pattern] ']' | '(' ')'
mapping_pattern: '{' items_pattern? '}'
class_pattern:
    | name_or_attr '(' ')'
    | name_or_attr '(' ','.pattern+ ','? ')'
    | name_or_attr '(' ','.keyword_pattern+ ','? ')'
    | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
signed_number: NUMBER | '-' NUMBER
attr: name_or_attr '.' NAME
name_or_attr: attr | NAME
values_pattern: ','.value_pattern+ ','?
items_pattern: ','.key_value_pattern+ ','?
keyword_pattern: NAME '=' or_pattern
value_pattern: '*' capture_pattern | pattern
key_value_pattern:
    | (literal_pattern | constant_pattern) ':' or_pattern
    | '**' capture_pattern

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0622.rst

Last modified: 2025-02-01 08:55:40 GMT

