Python Enhancement Proposals

Python »
PEP Index »
PEP 3103

Author:: Guido van Rossum <guido at python.org>
Status:

Rejection Notice

A quick poll during my keynote presentation at PyCon 2007 shows thisproposal has no popular support. I therefore reject it.

Python-dev has recently seen a flurry of discussion on adding a switchstatement. In this PEP I’m trying to extract my own preferences fromthe smorgasbord of proposals, discussing alternatives and explainingmy choices where I can. I’ll also indicate how strongly I feel aboutalternatives I discuss.

This PEP should be seen as an alternative toPEP 275. My views aresomewhat different from that PEP’s author, but I’m grateful for thework done in that PEP.

This PEP introduces canonical names for the many variants that havebeen discussed for different aspects of the syntax and semantics, suchas “alternative 1”, “school II”, “option 3” and so on. Hopefullythese names will help the discussion.

Rationale

A common programming idiom is to consider an expression and dodifferent things depending on its value. This is usually done with achain of if/elif tests; I’ll refer to this form as the “if/elifchain”. There are two main motivations to want to introduce newsyntax for this idiom:

It is repetitive: the variable and the test operator, usually ‘==’or ‘in’, are repeated in each if/elif branch.
It is inefficient: when an expression matches the last test value(or no test value at all) it is compared to each of the precedingtest values.

Both of these complaints are relatively mild; there isn’t a lot ofreadability or performance to be gained by writing this differently.Yet, some kind of switch statement is found in many languages and itis not unreasonable to expect that its addition to Python will allowus to write up certain code more cleanly and efficiently than before.

There are forms of dispatch that are not suitable for the proposedswitch statement; for example, when the number of cases is notstatically known, or when it is desirable to place the code fordifferent cases in different classes or files.

Basic Syntax

I’m considering several variants of the syntax first proposed in PEP275 here. There are lots of other possibilities, but I don’t see thatthey add anything.

I’ve recently been converted to alternative 1.

I should note that all alternatives here have the “implicit break”property: at the end of the suite for a particular case, the controlflow jumps to the end of the whole switch statement. There is no wayto pass control from one case to another. This in contrast to C,where an explicit ‘break’ statement is required to prevent fallingthrough to the next case.

In all alternatives, the else-suite is optional. It is more Pythonicto use ‘else’ here rather than introducing a new reserved word,‘default’, as in C.

Semantics are discussed in the next top-level section.

Alternative 1

This is the preferred form inPEP 275:

switchEXPR:caseEXPR:SUITEcaseEXPR:SUITE...else:SUITE

The main downside is that the suites where all the action is areindented two levels deep; this can be remedied by indenting the cases“half a level” (e.g. 2 spaces if the general indentation level is 4).

Alternative 2

This is Fredrik Lundh’s preferred form; it differs by not indentingthe cases:

switchEXPR:caseEXPR:SUITEcaseEXPR:SUITE....else:SUITE

Some reasons not to choose this include expected difficulties forauto-indenting editors, folding editors, and the like; and confusedusers. There are no situations currently in Python where a lineending in a colon is followed by an unindented line.

Alternative 3

This is the same as alternative 2 but leaves out the colon after theswitch:

switchEXPRcaseEXPR:SUITEcaseEXPR:SUITE....else:SUITE

The hope of this alternative is that it will not upset the auto-indentlogic of the average Python-aware text editor less. But it looksstrange to me.

Alternative 4

This leaves out the ‘case’ keyword on the basis that it is redundant:

switchEXPR:EXPR:SUITEEXPR:SUITE...else:SUITE

Unfortunately now we are forced to indent the case expressions,because otherwise (at least in the absence of an ‘else’ keyword) theparser would have a hard time distinguishing between an unindentedcase expression (which continues the switch statement) or an unrelatedstatement that starts like an expression (such as an assignment or aprocedure call). The parser is not smart enough to backtrack once itsees the colon. This is my least favorite alternative.

Extended Syntax

There is one additional concern that needs to be addressedsyntactically. Often two or more values need to be treated the same.In C, this done by writing multiple case labels together without anycode between them. The “fall through” semantics then mean that theseare all handled by the same code. Since the Python switch will nothave fall-through semantics (which have yet to find a champion) weneed another solution. Here are some alternatives.

Alternative A

Use:

caseEXPR:

to match on a single expression; use:

caseEXPR,EXPR,...:

to match on multiple expressions. The is interpreted so that if EXPRis a parenthesized tuple or another expression whose value is a tuple,the switch expression must equal that tuple, not one of its elements.This means that we cannot use a variable to indicate multiple cases.While this is also true in C’s switch statement, it is a relativelycommon occurrence in Python (see for example sre_compile.py).

Alternative B

Use:

caseEXPR:

to match on a single expression; use:

caseinEXPR_LIST:

to match on multiple expressions. If EXPR_LIST is a singleexpression, the ‘in’ forces its interpretation as an iterable (orsomething supporting __contains__, in a minority semanticsalternative). If it is multiple expressions, each of those isconsidered for a match.

Alternative C

Use:

caseEXPR:

to match on a single expression; use:

caseEXPR,EXPR,...:

to match on multiple expressions (as in alternative A); and use:

case*EXPR:

to match on the elements of an expression whose value is an iterable.The latter two cases can be combined, so that the true syntax is morelike this:

case[*]EXPR,[*]EXPR,...:

The* notation is similar to the use of prefix* already in use forvariable-length parameter lists and for passing computed argumentlists, and often proposed for value-unpacking (e.g.a,b,*c=X asan alternative to(a,b),c=X[:2],X[2:]).

Alternative D

This is a mixture of alternatives B and C; the syntax is likealternative B but instead of the ‘in’ keyword it uses ‘*’. This ismore limited, but still allows the same flexibility. It uses:

caseEXPR:

to match on a single expression and:

case*EXPR:

to match on the elements of an iterable. If one wants to specifymultiple matches in one case, one can write this:

case*(EXPR,EXPR,...):

or perhaps this (although it’s a bit strange because the relativepriority of ‘*’ and ‘,’ is different than elsewhere):

case*EXPR,EXPR,...:

Discussion

Alternatives B, C and D are motivated by the desire to specifymultiple cases with the same treatment using a variable representing aset (usually a tuple) rather than spelling them out. The motivationfor this is usually that if one has several switches over the same setof cases it’s a shame to have to spell out all the alternatives eachtime. An additional motivation is to be able to specifyranges tobe matched easily and efficiently, similar to Pascal’s “1..1000:”notation. At the same time we want to prevent the kind of mistakethat is common in exception handling (and which will be addressed inPython 3000 by changing the syntax of the except clause): writing“case 1, 2:” where “case (1, 2):” was meant, or vice versa.

The case could be made that the need is insufficient for the addedcomplexity; C doesn’t have a way to express ranges either, and it’sused a lot more than Pascal these days. Also, if a dispatch methodbased on dict lookup is chosen as the semantics, large ranges could beinefficient (consider range(1, sys.maxint)).

All in all my preferences are (from most to least favorite) B, A, D’,C, where D’ is D without the third possibility.

Semantics

There are several issues to review before we can choose the rightsemantics.

If/Elif Chain vs. Dict-based Dispatch

There are several main schools of thought about the switch statement’ssemantics:

School I wants to define the switch statement in term of anequivalent if/elif chain (possibly with some optimization thrownin).
School II prefers to think of it as a dispatch on a precomputeddict. There are different choices for when the precomputationhappens.
There’s also school III, which agrees with school I that thedefinition of a switch statement should be in terms of an equivalentif/elif chain, but concedes to the optimization camp that allexpressions involved must be hashable.

We need to further separate school I into school Ia and school Ib:

School Ia has a simple position: a switch statement is translated toan equivalent if/elif chain, and that’s that. It should not belinked to optimization at all. That is also my main objectionagainst this school: without any hint of optimization, the switchstatement isn’t attractive enough to warrant new syntax.
School Ib has a more complex position: it agrees with school II thatoptimization is important, and is willing to concede the compilercertain liberties to allow this. (For example,PEP 275 Solution 1.)In particular, hash() of the switch and case expressions may or maynot be called (so it should be side-effect-free); and the caseexpressions may not be evaluated each time as expected by theif/elif chain behavior, so the case expressions should also beside-effect free. My objection to this (elaborated below) is thatif either the hash() or the case expressions aren’tside-effect-free, optimized and unoptimized code may behavedifferently.

School II grew out of the realization that optimization of commonlyfound cases isn’t so easy, and that it’s better to face this head on.This will become clear below.

The differences between school I (mostly school Ib) and school II arethreefold:

When optimizing using a dispatch dict, if either the switchexpression or the case expressions are unhashable (in which casehash() raises an exception), school Ib requires catching the hash()failure and falling back to an if/elif chain. School II simply letsthe exception happen. The problem with catching an exception inhash() as required by school Ib, is that this may hide a genuinebug. A possible way out is to only use a dispatch dict if all caseexpressions are ints, strings or other built-ins with known goodhash behavior, and to only attempt to hash the switch expression ifit is also one of those types. Type objects should probably also besupported here. This is the (only) problem that school IIIaddresses.
When optimizing using a dispatch dict, if the hash() function of anyexpression involved returns an incorrect value, under school Ib,optimized code will not behave the same as unoptimized code. Thisis a well-known problem with optimization-related bugs, and wastelots of developer time. Under school II, in this situationincorrect results are produced at least consistently, which shouldmake debugging a bit easier. The way out proposed for the previousbullet would also help here.
School Ib doesn’t have a good optimization strategy if the caseexpressions are named constants. The compiler cannot know theirvalues for sure, and it cannot know whether they are truly constant.As a way out, it has been proposed to re-evaluate the expressioncorresponding to the case once the dict has identified which caseshould be taken, to verify that the value of the expression didn’tchange. But strictly speaking, all the case expressions occurringbefore that case would also have to be checked, in order to preservethe true if/elif chain semantics, thereby completely killing theoptimization. Another proposed solution is to have callbacksnotifying the dispatch dict of changes in the value of variables orattributes involved in the case expressions. But this is not likelyimplementable in the general case, and would require many namespacesto bear the burden of supporting such callbacks, which currentlydon’t exist at all.
Finally, there’s a difference of opinion regarding the treatment ofduplicate cases (i.e. two or more cases with match expressions thatevaluates to the same value). School I wants to treat this the sameis an if/elif chain would treat it (i.e. the first match wins andthe code for the second match is silently unreachable); school IIwants this to be an error at the time the dispatch dict is frozen(so dead code doesn’t go undiagnosed).

School I sees trouble in school II’s approach of pre-freezing adispatch dict because it places a new and unusual burden onprogrammers to understand exactly what kinds of case values areallowed to be frozen and when the case values will be frozen, or theymight be surprised by the switch statement’s behavior.

School II doesn’t believe that school Ia’s unoptimized switch is worththe effort, and it sees trouble in school Ib’s proposal foroptimization, which can cause optimized and unoptimized code to behavedifferently.

In addition, school II sees little value in allowing cases involvingunhashable values; after all if the user expects such values, they canjust as easily write an if/elif chain. School II also doesn’t believethat it’s right to allow dead code due to overlapping cases to occurunflagged, when the dict-based dispatch implementation makes it soeasy to trap this.

However, there are some use cases for overlapping/duplicate cases.Suppose you’re switching on some OS-specific constants (e.g. exportedby the os module or some module like that). You have a case for each.But on some OS, two different constants have the same value (since onthat OS they are implemented the same way – like O_TEXT and O_BINARYon Unix). If duplicate cases are flagged as errors, your switchwouldn’t work at all on that OS. It would be much better if you couldarrange the cases so that one case has preference over another.

There’s also the (more likely) use case where you have a set of casesto be treated the same, but one member of the set must be treateddifferently. It would be convenient to put the exception in anearlier case and be done with it.

(Yes, it seems a shame not to be able to diagnose dead code due toaccidental case duplication. Maybe that’s less important, andpychecker can deal with it? After all we don’t diagnose duplicatemethod definitions either.)

This suggests school IIb: like school II but redundant cases must beresolved by choosing the first match. This is trivial to implementwhen building the dispatch dict (skip keys already present).

(An alternative would be to introduce new syntax to indicate “okay tohave overlapping cases” or “ok if this case is dead code” but I findthat overkill.)

Personally, I’m in school II: I believe that the dict-based dispatchis the one true implementation for switch statements and that weshould face the limitations up front, so that we can reap maximalbenefits. I’m leaning towards school IIb – duplicate cases should beresolved by the ordering of the cases instead of flagged as errors.

When to Freeze the Dispatch Dict

For the supporters of school II (dict-based dispatch), the next bigdividing issue is when to create the dict used for switching. I callthis “freezing the dict”.

The main problem that makes this interesting is the observation thatPython doesn’t have named compile-time constants. What isconceptually a constant, such as re.IGNORECASE, is a variable to thecompiler, and there’s nothing to stop crooked code from modifying itsvalue.

Option 1

The most limiting option is to freeze the dict in the compiler. Thiswould require that the case expressions are all literals orcompile-time expressions involving only literals and operators whosesemantics are known to the compiler, since with the current state ofPython’s dynamic semantics and single-module compilation, there is nohope for the compiler to know with sufficient certainty the values ofany variables occurring in such expressions. This is widely thoughnot universally considered too restrictive.

Raymond Hettinger is the main advocate of this approach. He proposesa syntax where only a single literal of certain types is allowed asthe case expression. It has the advantage of being unambiguous andeasy to implement.

My main complaint about this is that by disallowing “named constants”we force programmers to give up good habits. Named constants areintroduced in most languages to solve the problem of “magic numbers”occurring in the source code. For example, sys.maxint is a lot morereadable than 2147483647. Raymond proposes to use string literalsinstead of named “enums”, observing that the string literal’s contentcan be the name that the constant would otherwise have. Thus, wecould write “case ‘IGNORECASE’:” instead of “case re.IGNORECASE:”.However, if there is a spelling error in the string literal, the casewill silently be ignored, and who knows when the bug is detected. Ifthere is a spelling error in a NAME, however, the error will be caughtas soon as it is evaluated. Also, sometimes the constants areexternally defined (e.g. when parsing a file format like JPEG) and wecan’t easily choose appropriate string values. Using an explicitmapping dict sounds like a poor hack.

Option 2

The oldest proposal to deal with this is to freeze the dispatch dictthe first time the switch is executed. At this point we can assumethat all the named “constants” (constant in the programmer’s mind,though not to the compiler) used as case expressions are defined –otherwise an if/elif chain would have little chance of success either.Assuming the switch will be executed many times, doing some extra workthe first time pays back quickly by very quick dispatch times later.

An objection to this option is that there is no obvious object wherethe dispatch dict can be stored. It can’t be stored on the codeobject, which is supposed to be immutable; it can’t be stored on thefunction object, since many function objects may be created for thesame function (e.g. for nested functions). In practice, I’m sure thatsomething can be found; it could be stored in a section of the codeobject that’s not considered when comparing two code objects or whenpickling or marshalling a code object; or all switches could be storedin a dict indexed by weak references to code objects. The solutionshould also be careful not to leak switch dicts between multipleinterpreters.

Another objection is that the first-use rule allows obfuscated codelike this:

deffoo(x,y):switchx:casey:print42

To the untrained eye (not familiar with Python) this code would beequivalent to this:

deffoo(x,y):ifx==y:print42

but that’s not what it does (unless it is always called with the samevalue as the second argument). This has been addressed by suggestingthat the case expressions should not be allowed to reference localvariables, but this is somewhat arbitrary.

A final objection is that in a multi-threaded application, thefirst-use rule requires intricate locking in order to guarantee thecorrect semantics. (The first-use rule suggests a promise that sideeffects of case expressions are incurred exactly once.) This may beas tricky as the import lock has proved to be, since the lock has tobe held while all the case expressions are being evaluated.

Option 3

A proposal that has been winning support (including mine) is to freezea switch’s dict when the innermost function containing it is defined.The switch dict is stored on the function object, just as parameterdefaults are, and in fact the case expressions are evaluated at thesame time and in the same scope as the parameter defaults (i.e. in thescope containing the function definition).

This option has the advantage of avoiding many of the finesses neededto make option 2 work: there’s no need for locking, no worry aboutimmutable code objects or multiple interpreters. It also provides aclear explanation for why locals can’t be referenced in caseexpressions.

This option works just as well for situations where one wouldtypically use a switch; case expressions involving imported or globalnamed constants work exactly the same way as in option 2, as long asthey are imported or defined before the function definition isencountered.

A downside however is that the dispatch dict for a switch inside anested function must be recomputed each time the nested function isdefined. For certain “functional” styles of programming this may makeswitch unattractive in nested functions. (Unless all case expressionsare compile-time constants; then the compiler is of course free tooptimize away the switch freezing code and make the dispatch table partof the code object.)

Another downside is that under this option, there’s no clear momentwhen the dispatch dict is frozen for a switch that doesn’t occurinside a function. There are a few pragmatic choices for how to treata switch outside a function:

Disallow it.
Translate it into an if/elif chain.
Allow only compile-time constant expressions.
Compute the dispatch dict each time the switch is reached.
Like (b) but tests that all expressions evaluated are hashable.

Of these, (a) seems too restrictive: it’s uniformly worse than (c);and (d) has poor performance for little or no benefits compared to(b). It doesn’t make sense to have a performance-critical inner loopat the module level, as all local variable references are slow there;hence (b) is my (weak) favorite. Perhaps I should favor (e), whichattempts to prevent atypical use of a switch; examples that workinteractively but not in a function are annoying. In the end I don’tthink this issue is all that important (except it must be resolvedsomehow) and am willing to leave it up to whoever ends up implementingit.

When a switch occurs in a class but not in a function, we can freezethe dispatch dict at the same time the temporary function objectrepresenting the class body is created. This means the caseexpressions can reference module globals but not class variables.Alternatively, if we choose (b) above, we could choose thisimplementation inside a class definition as well.

Option 4

There are a number of proposals to add a construct to the languagethat makes the concept of a value pre-computed at function definitiontime generally available, without tying it either to parameter defaultvalues or case expressions. Some keywords proposed include ‘const’,‘static’, ‘only’ or ‘cached’. The associated syntax and semanticsvary.

These proposals are out of scope for this PEP, except to suggest thatif such a proposal is accepted, there are two ways for the switch tobenefit: we could require case expressions to be either compile-timeconstants or pre-computed values; or we could make pre-computed valuesthe default (and only) evaluation mode for case expressions. Thelatter would be my preference, since I don’t see a use for moredynamic case expressions that isn’t addressed adequately by writing anexplicit if/elif chain.

Conclusion

It is too early to decide. I’d like to see at least one completedproposal for pre-computed values before deciding. In the meantime,Python is fine without a switch statement, and perhaps those who claimit would be a mistake to add one are right.

Copyright

This document has been placed in the public domain.

Source:https://github.com/python/peps/blob/main/peps/pep-3103.rst

Last modified:2025-02-01 08:55:40 GMT

Movatterモバイル変換

Python Enhancement Proposals

PEP 3103 – A Switch/Case Statement

Rejection Notice

Abstract

Rationale

Basic Syntax

Alternative 1

Alternative 2

Alternative 3

Alternative 4

Extended Syntax

Alternative A

Alternative B

Alternative C

Alternative D

Discussion

Semantics

If/Elif Chain vs. Dict-based Dispatch

When to Freeze the Dispatch Dict

Option 1

Option 2

Option 3

Option 4

Conclusion

Copyright