This PEP proposes a semantics for pattern matching that respects the general concept ofPEP 634,but is more precise, easier to reason about, and should be faster.
The object model will be extended with two special (dunder) attributes,__match_container__ and__match_class__, in addition to the__match_args__ attribute fromPEP 634, to support pattern matching.Both of these new attributes must be integers and__match_args__ is required to be a tuple of unique strings.
With this PEP:
Pattern matching in Python, as described inPEP 634, is to be added to Python 3.10.Unfortunately,PEP 634 is not as precise about the semantics as it could be,nor does it allow classes sufficient control over how they match patterns.
PEP 634 explicitly includes a section on undefined behavior.Large amounts of undefined behavior may be acceptable in a language like C,but in Python it should be kept to a minimum.Pattern matching in Python can be defined more precisely without losing expressiveness or performance.
PEP 634 delegates the decision over whether a class is a sequence or mapping tocollections.abc.Not all classes that could be considered sequences are registered as subclasses ofcollections.abc.Sequence.This PEP allows them to match sequence patterns, without the fullcollections.abc.Sequence machinery.
PEP 634 privileges some builtin classes with a special form of matching, the “self” match.For example the patternlist(x) matches a list and assigns the list tox.By allowing classes to choose which kinds of pattern they match, other classes can use this form as well.
For example, usingsympy, we might want to write:
# a*a == a**2caseMul(args=[Symbol(a),Symbol(b)])ifa==b:returnPow(a,2)
Which requires the sympy classSymbol to “self” match.Forsympy to support this pattern withPEP 634 is possible, but a bit tricky.With this PEP it can be implemented very easily[1].
With this PEP, access to attributes during pattern matching becomes well defined and deterministic.This makes pattern matching less error prone when matching objects with hidden side effects, such as object-relational mappers.Objects will have more control over their own deconstruction, which can help prevent unintended consequences should attribute access have side-effects.
PEP 634 relies on thecollections.abc module when determining which patterns a value can match, implicitly importing it if necessary.This PEP will eliminate surprising import errors and misleading audit events from those imports.
The semantics proposed in this PEP will allow efficient implementation, partly as a result of having precise semanticsand partly from using the object model.
With precise semantics, it is possible to reason about what code transformations are correct,and thus apply optimizations effectively.
Because the object model is a core part of Python, implementations already handle special attribute lookup efficiently.Looking up a special attribute is much faster than performing a subclass test on an abstract base class.
The object model and special methods are at the core of the Python language. Consequently,implementations support them well.Using special attributes for pattern matching allows pattern matching to be implemented in a way thatintegrates well with the rest of the implementation, and is thus easier to maintain and is likely to perform better.
A match statement performs a sequence of pattern matches. In general, matching a pattern has three parts:
To determine whether a value can match a particular kind of pattern, we add the__match_container__and__match_class__ attributes.This allows the kind of a value to be determined in a efficient fashion.
The__match_container__ and__match_class__ attributes will be added toobject.__match_container__ should be overridden by classes that want to match mapping or sequence patterns.__match_class__ should be overridden by classes that want to change the default behavior when matching class patterns.
__match_container__ must be an integer and should be exactly one of these:
0MATCH_SEQUENCE=1MATCH_MAPPING=2
MATCH_SEQUENCE is used to indicate that instances of the class can match sequence patterns.
MATCH_MAPPING is used to indicate that instances of the class can match mapping patterns.
__match_class__ must be an integer and should be exactly one of these:
0MATCH_SELF=8
MATCH_SELF is used to indicate that for a single positional argument class pattern, the subject will be used and not deconstructed.
Note
In the rest of this document, we will refer to the above values by name only.Symbolic constants will be provided both for Python and C, and the values willnever be changed.
object will have the following values for the special attributes:
__match_container__=0__match_class__=0__match_args__=()
These special attributes will be inherited as normal.
If__match_args__ is overridden, then it is required to hold a tuple of unique strings. It may be empty.
Note
__match_args__ will be automatically generated for dataclasses and named tuples, as specified inPEP 634.
The pattern matching implementation isnot required to check that any of these attributes behave as specified.If the value of__match_container__,__match_class__ or__match_args__ is not as specified, thenthe implementation may raise any exception, or match the wrong pattern.Of course, implementations are free to check these properties and provide meaningful error messages if they can do so efficiently.
In the following, all variables of the form$var are temporary variables and are not visible to the Python program.They may be visible via introspection, but that is an implementation detail and should not be relied on.The pseudo-statementFAIL is used to signify that matching failed for this pattern and that matching should move to the next pattern.If control reaches the end of the translation without reaching aFAIL, then it has matched, and following patterns are ignored.
Variables of the form$ALL_CAPS are meta-variables holding a syntactic element, they are not normal variables.So,$VARS=$items is not an assignment of$items to$VARS,but an unpacking of$items into the variables that$VARS holds.For example, with the abstract syntaxcase[$VARS]:, and the concrete syntaxcase[a,b]: then$VARS would hold the variables(a,b),not the values of those variables.
The pseudo-functionQUOTE takes a variable and returns the name of that variable.For example, if the meta-variable$VAR held the variablefoo thenQUOTE($VAR)=="foo".
All additional code listed below that is not present in the original source will not trigger line events, conforming toPEP 626.
Before any patterns are matched, the expression being matched is evaluated:
matchexpr:
translates to:
$value = expr
Capture patterns always match, so the irrefutable match:
casecapture_var:
translates to:
capture_var = $value
Wildcard patterns always match, so:
case_:
translates to:
# No code -- Automatically matchesThe literal pattern:
caseLITERAL:
translates to:
if $value != LITERAL: FAIL
except when the literal is one ofNone,True orFalse ,when it translates to:
if $value is not LITERAL: FAIL
The value pattern:
casevalue.pattern:
translates to:
if $value != value.pattern: FAIL
A pattern not including a star pattern:
case [$VARS]:
translates to:
$kind = type($value).__match_container__if $kind != MATCH_SEQUENCE: FAILif len($value) != len($VARS): FAIL$VARS = $value
Example:[2]
A pattern including a star pattern:
case [$VARS]
translates to:
$kind = type($value).__match_container__if $kind != MATCH_SEQUENCE: FAILif len($value) < len($VARS): FAIL$VARS = $value # Note that $VARS includes a star expression.
Example:[3]
A pattern not including a double-star pattern:
case {$KEYWORD_PATTERNS}:translates to:
$sentinel = object()$kind = type($value).__match_container__if $kind != MATCH_MAPPING: FAIL# $KEYWORD_PATTERNS is a meta-variable mapping names to variables.for $KEYWORD in $KEYWORD_PATTERNS: $tmp = $value.get(QUOTE($KEYWORD), $sentinel) if $tmp is $sentinel: FAIL $KEYWORD_PATTERNS[$KEYWORD] = $tmp
Example:[4]
A pattern including a double-star pattern:
case {$KEYWORD_PATTERNS, **$DOUBLE_STARRED_PATTERN}:translates to:
$kind = type($value).__match_container__if $kind != MATCH_MAPPING: FAIL# $KEYWORD_PATTERNS is a meta-variable mapping names to variables.$tmp = dict($value)if not $tmp.keys() >= $KEYWORD_PATTERNS.keys(): FAIL:for $KEYWORD in $KEYWORD_PATTERNS: $KEYWORD_PATTERNS[$KEYWORD] = $tmp.pop(QUOTE($KEYWORD))$DOUBLE_STARRED_PATTERN = $tmp
Example:[5]
Class pattern with no arguments:
caseClsName():
translates to:
if not isinstance($value, ClsName): FAIL
Class pattern with a single positional pattern:
case ClsName($VAR):
translates to:
$kind = type($value).__match_class__if $kind == MATCH_SELF: if not isinstance($value, ClsName): FAIL $VAR = $valueelse: As other positional-only class pattern
Positional-only class pattern:
case ClsName($VARS):
translates to:
if not isinstance($value, ClsName): FAIL$attrs = ClsName.__match_args__if len($attr) < len($VARS): raise TypeError(...)try: for i, $VAR in enumerate($VARS): $VAR = getattr($value, $attrs[i])except AttributeError: FAIL
Example:[6]
Class patterns with all keyword patterns:
case ClsName($KEYWORD_PATTERNS):
translates to:
if not isinstance($value, ClsName): FAILtry: for $KEYWORD in $KEYWORD_PATTERNS: $tmp = getattr($value, QUOTE($KEYWORD)) $KEYWORD_PATTERNS[$KEYWORD] = $tmpexcept AttributeError: FAIL
Example:[7]
Class patterns with positional and keyword patterns:
case ClsName($VARS, $KEYWORD_PATTERNS):
translates to:
if not isinstance($value, ClsName): FAIL$attrs = ClsName.__match_args__if len($attr) < len($VARS): raise TypeError(...)$pos_attrs = $attrs[:len($VARS)]try: for i, $VAR in enumerate($VARS): $VAR = getattr($value, $attrs[i]) for $KEYWORD in $KEYWORD_PATTERNS: $name = QUOTE($KEYWORD) if $name in pos_attrs: raise TypeError(...) $KEYWORD_PATTERNS[$KEYWORD] = getattr($value, $name)except AttributeError: FAIL
Example:[8]
The above specification assumes that patterns are not nested. For nested patternsthe above translations are applied recursively by introducing temporary capture patterns.
For example, the pattern:
case[int(),str()]:
translates to:
$kind = type($value).__match_class__if $kind != MATCH_SEQUENCE: FAILif len($value) != 2: FAIL$value_0, $value_1 = $value#Now match on temporary valuesif not isinstance($value_0, int): FAILif not isinstance($value_1, str): FAIL
Guards translate to a test following the rest of the translation:
casepatternifguard:
translates to:
[translationforpattern]ifnotguard:FAIL
All classes should ensure that the the values of__match_container__,__match_class__and__match_args__ follow the specification.Therefore, implementations can assume, without checking, that the following are true:
__match_container__==0or__match_container__==MATCH_SEQUENCEor__match_container__==MATCH_MAPPING__match_class__==0or__match_class__==MATCH_SELF
and that__match_args__ is a tuple of unique strings.
For the core builtin container classes__match_container__ will be:
list:MATCH_SEQUENCEtuple:MATCH_SEQUENCEdict:MATCH_MAPPINGbytearray: 0bytes: 0str: 0Named tuples will have__match_container__ set toMATCH_SEQUENCE.
issubclass(cls,collections.abc.Mapping) is true will have__match_container__ set toMATCH_MAPPING.issubclass(cls,collections.abc.Sequence) is true will have__match_container__ set toMATCH_SEQUENCE.For the following builtin classes__match_class__ will be set toMATCH_SELF:
boolbytearraybytesfloatfrozensetintsetstrlisttupledictThe above semantics implies a lot of redundant effort and copying in the implementation.However, it is possible to implement the above semantics efficiently by employing semantic preserving transformationson the naive implementation.
When performing matching, implementations are allowedto treat the following functions and methods as pure:
For any class supportingMATCH_SEQUENCE:
* ``cls.__len__()``* ``cls.__getitem__()``
For any class supportingMATCH_MAPPING:
* ``cls.get()`` (Two argument form only)
Implementations are allowed to make the following assumptions:
isinstance(obj,cls) can be freely replaced withissubclass(type(obj),cls) and vice-versa.isinstance(obj,cls) will always return the same result for any(obj,cls) pair and repeated calls can thus be elided.__match_container__,__match_class__ or__match_args__ is a pure operation, and may be cached.__match_container__==MATCH_SEQUENCE is not zero, are not modified by iteration, subscripting or calls tolen().Consequently, those operations can be freely substituted for each other where they would be equivalent when applied to an immutable sequence.__match_container__==MATCH_MAPPING is not zero, will not capture the second argument of theget() method.So, the$sentinel value may be freely re-used.In fact, implementations are encouraged to make these assumptions, as it is likely to result in significantly better performance.
None.
The naive implementation that follows from the specification will not be very efficient.Fortunately, there are some reasonably straightforward transformations that can be used to improve performance.Performance should be comparable to the implementation ofPEP 634 (at time of writing) by the release of 3.10.Further performance improvements may have to wait for the 3.11 release.
The following is not part of the specification,but guidelines to help developers create an efficient implementation.
Since the first step in matching each pattern is check to against the kind, it is possible to combine all the checks against kind into a single multi-way branch at the beginningof the match. The list of cases can then be duplicated into several “lanes” each corresponding to one kind.It is then trivial to remove unmatchable cases from each lane.Depending on the kind, different optimization strategies are possible for each lane.Note that the body of the match clause does not need to be duplicated, just the pattern.
This is probably the most complex to optimize and the most profitable in terms of performance.Since each pattern can only match a range of lengths, often only a single length,the sequence of tests can be rewritten in as an explicit iteration over the sequence,attempting to match only those patterns that apply to that sequence length.
For example:
case[]:Acase[x]:Bcase[x,y]:Ccaseother:D
Can be compiled roughly as:
# Choose lane $i = iter($value) for $0 in $i: break else: A goto done for $1 in $i: break else: x = $0 B goto done for $2 in $i: del $0, $1, $2 break else: x = $0 y = $1 C goto done other = $value Ddone:
The best strategy here is probably to form a decision tree based on the size of the mapping and which keys are present.There is no point repeatedly testing for the presence of a key.For example:
matchobj:case{a:x,b:y}:Wcase{a:x,c:y}:Xcase{a:x,b:_,c:y}:Ycaseother:Z
If the key"a" is not present when checking for case X, there is no need to check it again for Y.
The mapping lane can be implemented, roughly as:
# Choose laneif len($value) == 2: if "a" in $value: if "b" in $value: x = $value["a"] y = $value["b"] goto W if "c" in $value: x = $value["a"] y = $value["c"] goto Xelif len($value) == 3: if "a" in $value and "b" in $value: x = $value["a"] y = $value["c"] goto Yother = $valuegoto Z
The changes to the semantics can be summarized as:
__match_args__ to be atuple of strings, not just a sequence.This make pattern matching a bit more robust and optimizable as__match_args__ can be assumed to be immutable.cls.__match_container__ instead ofissubclass(cls,collections.abc.Mapping) andissubclass(cls,collections.abc.Sequence).__match_class__=0.There are no changes to syntax. All examples given in thePEP 636 tutorial should continue to work as they do now.
An earlier version of this PEP only used attributes from the instance’s dictionary when matching a class pattern with__match_class__ was the default value.The intent was to avoid capturing bound-methods and other synthetic attributes. However, this also mean that properties were ignored.
For the class:
classC:def__init__(self):self.a="a"@propertydefp(self):...defm(self):...
Ideally we would match the attributes “a” and “p”, but not “m”.However, there is no general way to do that, so this PEP now follows the semantics ofPEP 634.
__match_args__ on the subject not the patternAn earlier version of this PEP looked up__match_args__ on the class of the subject andnot the class specified in the pattern.This has been rejected for a few reasons:
*Usingtheclassspecifiedinthepatternismoreamenabletooptimizationandcanofferbetterperformance.*Usingtheclassspecifiedinthepatternhasthepotentialtoprovidebettererrorreportingissomecases.*Neitherapproachisperfect,bothhaveoddcornercases.Keepingthestatusquominimizesdisruption.
__match_class__ and__match_container__ into a single valueAn earlier version of this PEP combined__match_class__ and__match_container__ into a single value,__match_kind__.Using a single value has a small advantage in terms of performance,but is likely to result in unintended changes to container matching when overriding class matching behavior, and vice versa.
The original version of this PEP included the match kindMATCH_POSITIONAL and special method__deconstruct__ which would allow classes full control over their matching. This is importantfor libraries likesympy.
For example, usingsympy, we might want to write:
# sin(x)**2 + cos(x)**2 == 1caseAdd(Pow(sin(a),2),Pow(cos(b),2))ifa==b:return1
Forsympy to support the positional patterns with current pattern matching is possible,but is tricky. With these additional features it can be implemented easily[9].
This idea will feature in a future PEP for 3.11.However, it is too late in the 3.10 development cycle for such a change.
In an earlier version of this PEP, there was a distinct value for__match_class__ that allowed classes to not match any classpattern that would have required deconstruction. However, this would become redundant onceMATCH_POSITIONAL is introduced, andcomplicates the specification for an extremely rare case.
classSymbol:__match_class__=MATCH_SELF
This:
case[a,b]ifaisb:
translates to:
$kind = type($value).__match_container__if $kind != MATCH_SEQUENCE: FAILif len($value) != 2: FAILa, b = $valueif not a is b: FAIL
This:
case[a,*b,c]:
translates to:
$kind = type($value).__match_container__if $kind != MATCH_SEQUENCE: FAILif len($value) < 2: FAILa, *b, c = $value
This:
case{"x":x,"y":y}ifx>2:
translates to:
$kind = type($value).__match_container__if $kind != MATCH_MAPPING: FAIL$tmp = $value.get("x", $sentinel)if $tmp is $sentinel: FAILx = $tmp$tmp = $value.get("y", $sentinel)if $tmp is $sentinel: FAILy = $tmpif not x > 2: FAILThis:
case{"x":x,"y":y,**z}:
translates to:
$kind = type($value).__match_container__if $kind != MATCH_MAPPING: FAIL$tmp = dict($value)if not $tmp.keys() >= {"x", "y"}: FAILx = $tmp.pop("x")y = $tmp.pop("y")z = $tmpThis:
matchClsName(x,y):
translates to:
if not isinstance($value, ClsName): FAIL$attrs = ClsName.__match_args__if len($attr) < 2: FAILtry: x = getattr($value, $attrs[0]) y = getattr($value, $attrs[1])except AttributeError: FAIL
This:
matchClsName(a=x,b=y):
translates to:
if not isinstance($value, ClsName): FAILtry: x = $value.a y = $value.bexcept AttributeError: FAIL
This:
matchClsName(x,a=y):
translates to:
if not isinstance($value, ClsName): FAIL$attrs = ClsName.__match_args__if len($attr) < 1: raise TypeError(...)$positional_names = $attrs[:1]try: x = getattr($value, $attrs[0]) if "a" in $positional_names: raise TypeError(...) y = $value.aexcept AttributeError: FAIL
classBasic:__match_class__=MATCH_POSITIONALdef__deconstruct__(self):returnself._args
This document is placed in the public domain or under theCC0-1.0-Universal license, whichever is more permissive.
Source:https://github.com/python/peps/blob/main/peps/pep-0653.rst
Last modified:2025-02-01 08:55:40 GMT