Python Enhancement Proposals

PEP 531

Author: Alyssa Coghlan <ncoghlan at gmail.com>
Status: Withdrawn

Abstract

Inspired by PEP 505 and the related discussions, this PEP proposes the addition of two new control flow operators to Python:

Existence-checking precondition (“exists-then”): expr1 ?then expr2
Existence-checking fallback (“exists-else”): expr1 ?else expr2

as well as the following abbreviations for common existence checking expressions and statements:

Existence-checking attribute access: obj?.attr (for obj ?then obj.attr)
Existence-checking subscripting: obj?[expr] (for obj ?then obj[expr])
Existence-checking assignment: value ?= expr (for value = value ?else expr)

The common ? symbol in these new operator definitions indicates that they use a new “existence checking” protocol rather than the established truth-checking protocol used by if statements, while loops, comprehensions, generator expressions, conditional expressions, logical conjunction, and logical disjunction.

This new protocol would be made available as operator.exists, with the following characteristics:

types can define a new __exists__ magic method (Python) or tp_exists slot (C) to override the default behaviour. This optional method has the same signature and possible return values as __bool__.
operator.exists(None) returns False
operator.exists(NotImplemented) returns False
operator.exists(Ellipsis) returns False
float, complex and decimal.Decimal will override the existence check such that NaN values return False and other values (including zero values) return True
for any other type, operator.exists(obj) returns True by default. Most importantly, values that evaluate to False in a truth checking context (zeroes, empty containers) will still evaluate to True in an existence checking context
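
As a rough illustration of the characteristics listed above, the following sketch emulates the proposed check in current Python. It is not a reference implementation: operator.exists was never added to the standard library, and the function name exists, as well as the lookup of __exists__ through getattr, are assumptions made for this example only.

```python
import cmath
import math
from decimal import Decimal


def exists(obj):
    """Emulate the proposed operator.exists (hypothetical helper)."""
    # Assumption: like __bool__, the hook is looked up on the type.
    hook = getattr(type(obj), "__exists__", None)
    if hook is not None:
        return bool(hook(obj))
    # The three builtin placeholder singletons report non-existence.
    if obj is None or obj is NotImplemented or obj is Ellipsis:
        return False
    # NaN values of the numeric types named in the PEP report non-existence.
    if isinstance(obj, float):
        return not math.isnan(obj)
    if isinstance(obj, complex):
        return not cmath.isnan(obj)
    if isinstance(obj, Decimal):
        return not obj.is_nan()
    # Everything else exists, including zeroes and empty containers.
    return True
```

Under this sketch, exists(0) and exists([]) are both True even though bool(0) and bool([]) are False, which is the distinction the proposal relies on.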

PEP Withdrawal

When posting this PEP for discussion on python-ideas [4], I asked reviewers to consider 3 high level design questions before moving on to considering the specifics of this particular syntactic proposal:

1. Do we collectively agree that “existence checking” is a useful general concept that exists in software development and is distinct from the concept of “truth checking”?
2. Do we collectively agree that the Python ecosystem would benefit from an existence checking protocol that permits generalisation of algorithms (especially short circuiting ones) across different “data missing” indicators, including those defined in the language definition, the standard library, and custom user code?
3. Do we collectively agree that it would be easier to use such a protocol effectively if existence-checking equivalents to the truth-checking “and” and “or” control flow operators were available?

While the answers to the first question were generally positive, it quickly became clear that the answer to the second question is “No”.

Steven D’Aprano articulated the counter-argument well in [5], but the general idea is that when checking for “missing data” sentinels, we’re almost always looking for a specific sentinel value, rather than any sentinel value.

NotImplemented exists, for example, due to None being a potentially legitimate result from overloaded arithmetic operators and exception handling imposing too much runtime overhead to be useful for operand coercion.

Similarly, Ellipsis exists for multi-dimensional slicing support due to None already having another meaning in a slicing context (indicating the use of the default start or stop indices, or the default step size).

In mathematics, the value of NaN is that programmatically it behaves like a normal value of its type (e.g. exposing all the usual attributes and methods), while arithmetically it behaves according to the mathematical rules for handling NaN values.
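
The distinct roles of these three values can be observed in current Python. The sketch below is only an illustration; the Metres and Probe classes are hypothetical names invented for it.

```python
import math


class Metres:
    """Hypothetical numeric wrapper, defined only for this illustration."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Metres):
            # NotImplemented means "this operand type is not handled here".
            # None could not carry that meaning, since an overloaded
            # operator may legitimately return None.
            return NotImplemented
        return Metres(self.value + other.value)


class Probe:
    """Hypothetical container that returns whatever key it is given."""

    def __getitem__(self, key):
        return key


declined = Metres(2).__add__("three")   # NotImplemented
open_slice = Probe()[:]                 # slice(None, None, None)
mixed_key = Probe()[..., 1:]            # (Ellipsis, slice(1, None, None))

nan = float("nan")
still_a_float = isinstance(nan, float)  # True
propagated = nan + 1.0                  # NaN propagates through arithmetic
```

In each case the code that receives the value is looking for that one sentinel specifically: the interpreter checks for NotImplemented, a __getitem__ implementation checks for Ellipsis or for None inside a slice, and numeric code checks for NaN.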

With that core design concept invalidated, the proposal as a whole doesn’t make sense, and it is accordingly withdrawn.

However, the discussion of the proposal did prompt consideration of a potential protocol based approach to make the existing and, or and if-else operators more flexible [6] without introducing any new syntax, so I’ll be writing that up as another possible alternative to PEP 505.

Relationship with other PEPs

While this PEP was inspired by and builds on Mark Haase’s excellent work in putting together PEP 505, it ultimately competes with that PEP due to significant differences in the specifics of the proposed syntax and semantics for the feature.

It also presents a different perspective on the rationale for the change by focusing on the benefits to existing Python users as the typical demands of application and service development activities are genuinely changing. It isn’t an accident that similar features are now appearing in multiple programming languages, and while it’s a good idea for us to learn from how other language designers are handling the problem, precedents being set elsewhere are more relevant to how we would go about tackling this problem than they are to whether or not we think it’s a problem we should address in the first place.

Rationale

Existence checking expressions

An increasingly common requirement in modern software development is the need to work with “semi-structured data”: data where the structure of the data is known in advance, but pieces of it may be missing at runtime, and the software manipulating that data is expected to degrade gracefully (e.g. by omitting results that depend on the missing data) rather than failing outright.

Some particularly common cases where this issue arises are:

handling optional application configuration settings and function parameters
handling external service failures in distributed systems
handling data sets that include some partial records

It is the latter two cases that are the primary motivation for this PEP - while needing to deal with optional configuration settings and parameters is a design requirement at least as old as Python itself, the rise of public cloud infrastructure, the development of software systems as collaborative networks of distributed services, and the availability of large public and private data sets for analysis mean that the ability to degrade operations gracefully in the face of partial service failures or partial data availability is becoming an essential feature of modern programming environments.

At the moment, writing such software in Python can be genuinely awkward, as your code ends up littered with expressions like:

value1 = expr1.field.of.interest if expr1 is not None else None
value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None
value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5

If these are only occasional, then expanding out to full statement forms may help improve readability, but if you have 4 or 5 of them in a row (which is a fairly common situation in data transformation pipelines), then replacing them with 16 or 20 lines of conditional logic really doesn’t help matters.

Expanding the three examples above that way hopefully helps illustrate that:

if expr1 is not None:
    value1 = expr1.field.of.interest
else:
    value1 = None
if expr2 is not None:
    value2 = expr2["field"]["of"]["interest"]
else:
    value2 = None
if expr3 is not None:
    value3 = expr3
else:
    if expr4 is not None:
        value3 = expr4
    else:
        value3 = expr5

The combined impact of the proposals in this PEP is to allow the above sample expressions to instead be written as:

value1 = expr1?.field.of.interest
value2 = expr2?["field"]["of"]["interest"]
value3 = expr3 ?else expr4 ?else expr5

In these forms, almost all of the information presented to the reader is immediately relevant to the question “What does this code do?”, while the boilerplate code to handle missing data by passing it through to the output or falling back to an alternative input has shrunk to two uses of the ? symbol and two uses of the ?else keyword.

In the first two examples, the 31 character boilerplate clause if exprN is not None else None (minimally 27 characters for a single letter variable name) has been replaced by a single ? character, substantially improving the signal-to-pattern-noise ratio of the lines (especially if it encourages the use of more meaningful variable and field names rather than making them shorter purely for the sake of expression brevity).

In the last example, two instances of the 21 character boilerplate, if exprN is not None (minimally 17 characters) are replaced with single characters, again substantially improving the signal-to-pattern-noise ratio.

Furthermore, each of our 5 “subexpressions of potential interest” is included exactly once, rather than 4 of them needing to be duplicated or pulled out to a named variable in order to first check if they exist.

The existence checking precondition operator is mainly defined to provide a clear conceptual basis for the existence checking attribute access and subscripting operators:

obj?.attr is roughly equivalent to obj ?then obj.attr
obj?[expr] is roughly equivalent to obj ?then obj[expr]

The main semantic difference between the shorthand forms and their expanded equivalents is that the common subexpression to the left of the existence checking operator is evaluated only once in the shorthand form (similar to the benefit offered by augmented assignment statements).
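
The single-evaluation difference can be made visible in current Python (3.8 or later) by counting calls. This is only a sketch: load_record and the stats counter are hypothetical names, and an assignment expression stands in for the single evaluation that the proposed obj?.attr shorthand would perform.

```python
from types import SimpleNamespace

stats = {"calls": 0}


def load_record():
    """Hypothetical data source that counts how often it is evaluated."""
    stats["calls"] += 1
    return SimpleNamespace(name="example")


# Expanded form: the left-hand subexpression appears, and runs, twice.
stats["calls"] = 0
name_expanded = load_record().name if load_record() is not None else None
calls_expanded = stats["calls"]      # 2

# Single evaluation, as load_record()?.name would do under the proposal.
stats["calls"] = 0
name_shorthand = record.name if (record := load_record()) is not None else None
calls_shorthand = stats["calls"]     # 1
```

Both forms produce the same result here; they differ only in how many times the data source is consulted, which matters when that call is expensive or has side effects.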

Existence checking assignment

Existence-checking assignment is proposed as a relatively straightforward expansion of the concepts in this PEP to also cover the common configuration handling idiom:

value = value if value is not None else expensive_default()

by allowing that to instead be abbreviated as:

value ?= expensive_default()

This is mainly beneficial when the target is a subscript operation or subattribute, as even without this specific change, the PEP would still permit this idiom to be updated to:

value = value ?else expensive_default()
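
To show why a subscript target benefits most, here is a sketch of what the hypothetical statement config["timeout"] ?= expensive_default() would correspond to in current Python. The config dictionary and its keys are invented for the example, and the existence check is simplified to a comparison with None.

```python
stats = {"default_calls": 0}


def expensive_default():
    """Hypothetical costly computation; counts how often it runs."""
    stats["default_calls"] += 1
    return 30


config = {"timeout": None, "retries": 0}

# Equivalent of: config["timeout"] ?= expensive_default()
if config["timeout"] is None:
    config["timeout"] = expensive_default()

# Equivalent of: config["retries"] ?= expensive_default()
# The stored 0 is falsy but it exists, so it is kept and the
# default is never computed.
if config["retries"] is None:
    config["retries"] = expensive_default()
```

Written as a single conditional expression, the same update would have to spell out config["timeout"] three times, which is the repetition the augmented form is meant to remove.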

The main argument against adding this form is that it’s arguably ambiguous and could mean either:

value = value ?else expensive_default(); or
value = value ?then value.subfield.of.interest

The second form isn’t at all useful, but if this concern was deemed significant enough to address while still keeping the augmented assignment feature, the full keyword could be included in the syntax:

value ?else= expensive_default()

Alternatively, augmented assignment could just be dropped from the current proposal entirely and potentially reconsidered at a later date.

Existence checking protocol

The existence checking protocol is included in this proposal primarily to allow for proxy objects (e.g. local representations of remote resources) and mock objects used in testing to correctly indicate non-existence of target resources, even though the proxy or mock object itself is not None.

However, with that protocol defined, it then seems natural to expand it to provide a type independent way of checking for NaN values in numeric types - at the moment you need to be aware of the exact data type you’re working with (e.g. builtin floats, builtin complex numbers, the decimal module) and use the appropriate operation (e.g. math.isnan, cmath.isnan, decimal.getcontext().is_nan(), respectively).
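
For reference, the type-specific checks named above look like this in current Python. The variable names are illustrative only.

```python
import cmath
import decimal
import math

float_is_nan = math.isnan(float("nan"))
complex_is_nan = cmath.isnan(complex(float("nan"), 0.0))
decimal_is_nan = decimal.getcontext().is_nan(decimal.Decimal("NaN"))

# Picking the wrong helper for the type fails rather than answering:
try:
    math.isnan(complex(float("nan"), 0.0))
    wrong_helper_rejected = False
except TypeError:
    wrong_helper_rejected = True
```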

Similarly, it seems reasonable to declare that the other placeholder builtin singletons, Ellipsis and NotImplemented, also qualify as objects that represent the absence of data more so than they represent data.
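
The proxy use case that motivates the protocol can be sketched as follows. RemoteResourceProxy and its found flag are hypothetical, and exists is a simplified stand-in for the proposed operator.exists that only consults __exists__ and otherwise compares against None.

```python
class RemoteResourceProxy:
    """Hypothetical local stand-in for a resource that may be missing."""

    def __init__(self, resource_id, found):
        self.resource_id = resource_id
        self._found = found

    def __exists__(self):
        # Report whether the *target* exists, not the proxy itself.
        return self._found


def exists(obj):
    """Simplified stand-in for the proposed operator.exists."""
    hook = getattr(type(obj), "__exists__", None)
    if hook is not None:
        return bool(hook(obj))
    return obj is not None


missing = RemoteResourceProxy("db-42", found=False)
present = RemoteResourceProxy("db-43", found=True)
```

Here missing is not None and is truthy, so neither an identity check nor a truth check can report that the remote resource is absent, while exists(missing) can.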

Proposed symbolic notation

Python has historically only had one kind of implied boolean context: truth checking, which can be invoked directly via the bool() builtin. As this PEP proposes a new kind of control flow operation based on existence checking rather than truth checking, it is considered valuable to have a reminder directly in the code when existence checking is being used rather than truth checking.

The mathematical symbol for existence assertions is U+2203 ‘THERE EXISTS’: ∃

Accordingly, one possible approach to the syntactic additions proposed in this PEP would be to use that already defined mathematical notation:

expr1 ∃then expr2
expr1 ∃else expr2
obj∃.attr
obj∃[expr]
target ∃= expr

However, there are two major problems with that approach, one practical, and one pedagogical.

The practical problem is the usual one that most keyboards don’t offer any easy way of entering mathematical symbols other than those used in basic arithmetic (even the symbols appearing in this PEP were ultimately copied & pasted from [3] rather than being entered directly).

The pedagogical problem is that the symbols for existence assertions (∃) and universal assertions (∀) aren’t going to be familiar to most people the way basic arithmetic operators are, so we wouldn’t actually be making the proposed syntax easier to understand by adopting ∃.

By contrast, ? is one of the few remaining unused ASCII punctuation characters in Python’s syntax, making it available as a candidate syntactic marker for “this control flow operation is based on an existence check, not a truth check”.

Taking that path would also have the advantage of aligning Python’s syntax with corresponding syntax in other languages that offer similar features.

Drawing from the existing summary in PEP 505 and the Wikipedia articles on the “safe navigation operator” [1] and the “null coalescing operator” [2], we see:

The ?. existence checking attribute access syntax precisely aligns with:
- the “safe navigation” attribute access operator in C# (?.)
- the “optional chaining” operator in Swift (?.)
- the “safe navigation” attribute access operator in Groovy (?.)
- the “conditional member access” operator in Dart (?.)
The ?[] existence checking subscripting syntax precisely aligns with:
- the “safe navigation” subscript operator in C# (?[])
- the “optional subscript” operator in Swift (?[])
The ?else existence checking fallback syntax semantically aligns with:
- the “null-coalescing” operator in C# (??)
- the “null-coalescing” operator in PHP (??)
- the “nil-coalescing” operator in Swift (??)

To be clear, these aren’t the only spellings of these operators used in other languages, but they’re the most common ones, and the ? symbol is the most common syntactic marker by far (presumably prompted by the use of ? to introduce the “then” clause in C-style conditional expressions, which many of these languages also offer).

Proposed keywords

Given the symbolic marker ?, it would be syntactically unambiguous to spell the existence checking precondition and fallback operations using the same keywords as their truth checking counterparts:

expr1 ?and expr2 (instead of expr1 ?then expr2)
expr1 ?or expr2 (instead of expr1 ?else expr2)

However, while syntactically unambiguous when written, this approach makes the code incredibly hard to pronounce (What’s the pronunciation of “?”?) and also hard to describe (given reused keywords, there’s no obvious shorthand terms for “existence checking precondition (?and)” and “existence checking fallback (?or)” that would distinguish them from “logical conjunction (and)” and “logical disjunction (or)”).

We could try to encourage folks to pronounce the ? symbol as “exists”, making the shorthand names the “exists-and expression” and the “exists-or expression”, but there’d be no way of guessing those names purely from seeing them written in a piece of code.

Instead, this PEP takes advantage of the proposed symbolic syntax to introduce a new keyword (?then) and borrow an existing one (?else) in a way that allows people to refer to “then expressions” and “else expressions” without ambiguity.

These keywords also align well with the conditional expressions that are semantically equivalent to the proposed expressions.

For ?else expressions, expr1 ?else expr2 is equivalent to:

_lhs_result = expr1
_lhs_result if operator.exists(_lhs_result) else expr2

Here the parallel is clear, since the else expr2 appears at the end of both the abbreviated and expanded forms.

For ?then expressions, expr1 ?then expr2 is equivalent to:

_lhs_result = expr1
expr2 if operator.exists(_lhs_result) else _lhs_result

Here the parallel isn’t as immediately obvious due to Python’s traditionally anonymous “then” clauses (introduced by : in if statements and suffixed by if in conditional expressions), but it’s still reasonably clear as long as you’re already familiar with the “if-then-else” explanation of conditional control flow.
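
The two expansions can be emulated with ordinary functions in current Python, as sketched below. The names exists_then and exists_else are hypothetical, the right-hand operand is passed as a zero-argument callable to preserve short-circuiting, and exists is reduced to a comparison with None for brevity.

```python
def exists(obj):
    """Simplified stand-in for the proposed operator.exists."""
    return obj is not None


def exists_then(lhs, rhs):
    """Emulate ``expr1 ?then expr2``; rhs is a zero-argument callable."""
    return rhs() if exists(lhs) else lhs


def exists_else(lhs, rhs):
    """Emulate ``expr1 ?else expr2``; rhs is a zero-argument callable."""
    return lhs if exists(lhs) else rhs()


record = {"name": "widget"}
name = exists_then(record, lambda: record["name"])    # "widget"
skipped = exists_then(None, lambda: 1 / 0)            # None; rhs never runs
fallback = exists_else(None, lambda: "default")       # "default"
kept = exists_else(0, lambda: 1 / 0)                  # 0; rhs never runs
```

The lambdas are only an artefact of emulating an operator with a function call; the proposed syntax would defer evaluation of the right-hand side without them.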

Risks and concerns

Readability

Learning to read and write the new syntax effectively mainly requires internalising two concepts:

expressions containing ? include an existence check and may short circuit
if None or another “non-existent” value is an expected input, and the correct handling is to propagate that to the result, then the existence checking operators are likely what you want

Currently, these concepts aren’t explicitly represented at the language level, so it’s a matter of learning to recognise and use the various idiomatic patterns based on conditional expressions and statements.

Magic syntax

There’s nothing about ? as a syntactic element that inherently suggests is not None or operator.exists. The main current use of ? as a symbol in Python code is as a trailing suffix in IPython environments to request help information for the result of the preceding expression.

However, the notion of existence checking really does benefit from a pervasive visual marker that distinguishes it from truth checking, and that calls for a single-character symbolic syntax if we’re going to do it at all.

Conceptual complexity

This proposal takes the currently ad hoc and informal concept of “existence checking” and elevates it to the status of being a syntactic language feature with a clearly defined operator protocol.

In many ways, this should actually reduce the overall conceptual complexity of the language, as many more expectations will map correctly between truth checking with bool(expr) and existence checking with operator.exists(expr) than currently map between truth checking and existence checking with expr is not None (or expr is not NotImplemented in the context of operand coercion, or the various NaN-checking operations in mathematical libraries).

As a simple example of the new parallels introduced by this PEP, compare:

all_are_true = all(map(bool, iterable))
at_least_one_is_true = any(map(bool, iterable))
all_exist = all(map(operator.exists, iterable))
at_least_one_exists = any(map(operator.exists, iterable))
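
A runnable version of this comparison, using sample data, shows where the two kinds of check diverge. The exists function is a simplified stand-in for the proposed operator.exists (it treats only None as missing), and the readings list is invented for the example.

```python
def exists(obj):
    """Simplified stand-in for the proposed operator.exists."""
    return obj is not None


readings = [0, "", [], 0.0]          # all falsy, but all present
with_gap = readings + [None]         # one value is missing

all_are_true = all(map(bool, readings))               # False
all_exist = all(map(exists, readings))                # True
at_least_one_is_true = any(map(bool, with_gap))       # False
at_least_one_exists = any(map(exists, with_gap))      # True
all_exist_with_gap = all(map(exists, with_gap))       # False
```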

Design Discussion

Subtleties in chaining existence checking expressions

Similar subtleties arise in chaining existence checking expressions as already exist in chaining logical operators: the behaviour can be surprising if the right hand side of one of the expressions in the chain itself returns a value that doesn’t exist.

As a result, value = arg1 ?then f(arg1) ?else default() would be dubious for essentially the same reason that value = cond and expr1 or expr2 is dubious: the former will evaluate default() if f(arg1) returns None, just as the latter will evaluate expr2 if expr1 evaluates to False in a boolean context.
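
Both pitfalls can be reproduced in current Python. In the sketch below the existence-checking chain is written out with conditional expressions, using is not None as a simplified existence check; f, default, and arg1 are placeholder names.

```python
# Truth-checking version: a falsy middle operand falls through to the end.
surprising = True and 0 or 5          # 5, not 0


def f(arg):
    """Placeholder whose legitimate result happens to be None."""
    return None


def default():
    return "default"


arg1 = "input"

# arg1 ?then f(arg1)
step = f(arg1) if arg1 is not None else arg1
# ... ?else default()
value = step if step is not None else default()   # "default"
```

Even though arg1 exists and f(arg1) ran successfully, the chain still ends up calling default(), because the fallback cannot tell a missing input from a None result.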

Ambiguous interaction with conditional expressions

In the proposal as currently written, the following is a syntax error:

value = f(arg) if arg ?else default

While the following is a valid operation that checks a second condition if the first doesn’t exist rather than merely being false:

value = expr1 if cond1 ?else cond2 else expr2

The expression chaining problem described above means that the argument can be made that the first operation should instead be equivalent to:

value = f(arg) if operator.exists(arg) else default

requiring the second to be written in the arguably clearer form:

value = expr1 if (cond1 ?else cond2) else expr2

Alternatively, the first form could remain a syntax error, and the existence checking symbol could instead be attached to the if keyword:

value = expr1 if? cond else expr2

Existence checking in other truth-checking contexts

The truth-checking protocol is currently used in the following syntactic constructs:

logical conjunction (and-expressions)
logical disjunction (or-expressions)
conditional expressions (if-else expressions)
if statements
while loops
filter clauses in comprehensions and generator expressions

In the current PEP, switching from truth-checking with and and or to existence-checking is a matter of substituting in the new keywords, ?then and ?else in the appropriate places.

For other truth-checking contexts, it proposes either importing and using the operator.exists API, or else continuing with the current idiom of checking specifically for expr is not None (or the context appropriate equivalent).

The simplest possible enhancement in that regard would be to elevate the proposed exists() API from an operator module function to a new builtin function.

Alternatively, the ? existence checking symbol could be supported as a modifier on the if and while keywords to indicate the use of an existence check rather than a truth check.

However, it isn’t at all clear that the potential consistency benefits gained for either suggestion would justify the additional disruption, so they’ve currently been omitted from the proposal.

Defining expected invariant relations between `bool` and `exists`

The PEP currently leaves the definition of __bool__ on all existing types unmodified, which ensures the entire proposal remains backwards compatible, but results in the following cases where bool(obj) returns True, but the proposed operator.exists(obj) would return False:

NaN values for float, complex, and decimal.Decimal
Ellipsis
NotImplemented

The main argument for potentially changing these is that it becomes easier to reason about potential code behaviour if we have a recommended invariant in place saying that values which indicate they don’t exist in an existence checking context should also report themselves as being False in a truth checking context.

Failing to define such an invariant would lead to arguably odd outcomes like float("NaN") ?else 0.0 returning 0.0 while float("NaN") or 0.0 returns NaN.
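
The truth-checking half of that comparison can be confirmed today, and the existence-checking half can be approximated with math.isnan standing in for the proposed protocol. The variable names are illustrative only.

```python
import math

nan = float("nan")

# NaN is truthy, so ``or`` keeps it.
nan_is_truthy = bool(nan)              # True
truth_result = nan or 0.0              # NaN

# Approximation of float("NaN") ?else 0.0 under the proposal.
existence_result = nan if not math.isnan(nan) else 0.0   # 0.0
```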

Limitations

Arbitrary sentinel objects

This proposal doesn’t attempt to provide syntactic support for the “sentinel object” idiom, where None is a permitted explicit value, so a separate sentinel object is defined to indicate missing values:

_SENTINEL = object()
def f(obj=_SENTINEL):
    return obj if obj is not _SENTINEL else default_value()

This could potentially be supported at the expense of making the existence protocol definition significantly more complex, both to define and to use:

at the Python layer, operator.exists and __exists__ implementations would return the empty tuple to indicate non-existence, and otherwise return a singleton tuple containing a reference to the object to be used as the result of the existence check
at the C layer, tp_exists implementations would return NULL to indicate non-existence, and otherwise return a PyObject* pointer as the result of the existence check

Given that change, the sentinel object idiom could be rewritten as:

class Maybe:
    SENTINEL = object()

    def __init__(self, value):
        self._result = (value,) if value is not self.SENTINEL else ()

    def __exists__(self):
        return self._result

def f(obj=Maybe.SENTINEL):
    return Maybe(obj) ?else default_value()

However, I don’t think cases where the 3 proposed standard sentinel values (i.e. None, Ellipsis and NotImplemented) can’t be used are going to be anywhere near common enough for the additional protocol complexity and the loss of symmetry between __bool__ and __exists__ to be worth it.
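
For completeness, the tuple-returning variant described above can be emulated in current Python to show how it would let None pass through as an explicit value. This is a sketch under stated assumptions: exists_else is a hypothetical function standing in for the ?else operator, and default_value is a placeholder.

```python
class Maybe:
    SENTINEL = object()

    def __init__(self, value):
        # Empty tuple: nothing supplied. 1-tuple: the wrapped value.
        self._result = (value,) if value is not self.SENTINEL else ()

    def __exists__(self):
        return self._result


def exists_else(lhs, fallback):
    """Emulate ``lhs ?else fallback()`` under the tuple-returning protocol."""
    result = type(lhs).__exists__(lhs)
    # A non-empty tuple supplies the expression result directly.
    return result[0] if result else fallback()


def default_value():
    return "default"


def f(obj=Maybe.SENTINEL):
    return exists_else(Maybe(obj), default_value)


omitted = f()            # "default"
explicit_none = f(None)  # None is passed through unchanged
explicit_zero = f(0)     # 0
```

The extra unwrapping step in exists_else is the added complexity referred to above: the protocol no longer mirrors __bool__, because its result carries a value as well as a yes/no answer.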

Specification

The Abstract already gives the gist of the proposal and the Rationale gives some specific examples. If there’s enough interest in the basic idea, then a full specification will need to provide a precise correspondence between the proposed syntactic sugar and the underlying conditional expressions that is sufficient to guide the creation of a reference implementation.

…TBD…

Implementation

As with PEP 505, actual implementation has been deferred pending in-principle interest in the idea of adding these operators - the implementation isn’t the hard part of these proposals, the hard part is deciding whether or not this is a change where the long term benefits for new and existing Python users outweigh the short term costs involved in the wider ecosystem (including developers of other implementations, language curriculum developers, and authors of other Python related educational material) adjusting to the change.

…TBD…