PEP 472 – Support for indexing with keyword arguments

Author: Stefano Borini, Joseph Martinot-Lagarde

Abstract

This PEP proposes an extension of the indexing operation to support keyword arguments. Notations in the form a[K=3, R=2] would become legal syntax. For future-proofing, notations such as a[1:2, K=3, R=4] are considered and may be allowed as well, depending on the choice of implementation. In addition to a change in the parser, the index protocol (__getitem__, __setitem__ and __delitem__) will also potentially require adaptation.

Motivation

The indexing syntax carries a strong semantic content, differentiating it from a method call: it implies referring to a subset of data. We believe this semantic association to be important, and wish to expand the strategies allowed to refer to this data.

As a general observation, the number of indices needed by an indexing operation depends on the dimensionality of the data: one-dimensional data (e.g. a list) requires one index (e.g. a[3]), two-dimensional data (e.g. a matrix) requires two indices (e.g. a[2, 3]) and so on. Each index is a selector along one of the axes of the dimensionality, and the position in the index tuple is the metainformation needed to associate each index to the corresponding axis.

The current Python syntax focuses exclusively on position to express the association to the axes, and also contains syntactic sugar to refer to non-punctiform selection (slices):

>>> a[3]       # returns the fourth element of a
>>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
>>> a[3, 2]    # multiple indexes (for multidimensional arrays)

The additional notation proposed in this PEP would allow notations involving keyword arguments in the indexing operation, e.g.

>>> a[K=3, R=2]

which would allow axes to be referred to by conventional names.

One must additionally consider the extended form that allows both positional and keyword specification:

>>> a[3, R=3, K=4]

This PEP will explore different strategies to enable the use of these notations.

Use cases

The following practical use cases present two broad categories of usage of a keyworded specification: indexing and contextual option. For indexing:

To provide a more communicative meaning to the index, preventing e.g. accidental inversion of indexes
```
>>> gridValues[x=3, y=5, z=8]
>>> rain[time=0:12, location=location]
```
In some domains, such as computational physics and chemistry, a notation such as Basis[Z=5] is a domain-specific-language notation that represents a level of accuracy
```
>>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
```
In this case, the index operation would return a basis set at the chosen level of accuracy (represented by the parameter Z). The reason for using indexing is that the BasisSet object could be internally represented as a numeric table, where rows (the “coefficient” axis, hidden from the user in this example) are associated to individual elements (e.g. rows 0:5 contain coefficients for element 1, rows 5:8 coefficients for element 2) and each column is associated to a given degree of accuracy (“accuracy” or “Z” axis) so that the first column is low accuracy, the second column is medium accuracy and so on. With that indexing, the user would obtain another object representing the contents of the column of the internal table for accuracy level 3.

Additionally, the keyword specification can be used as an option contextual to the indexing. Specifically:

A “default” option allows specifying a default return value when the index is not present
```
>>> lst = [1, 2, 3]
>>> value = lst[5, default=0]  # value is 0
```
For a sparse dataset, a keyword can specify an interpolation strategy to infer a missing point from e.g. its surrounding data.
```
>>> value = array[1, 3, interpolate=spline_interpolator]
```
A unit could be specified with the same mechanism
```
>>> value = array[1, 3, unit="degrees"]
```

How the notation is interpreted is up to the implementing class.

Current implementation

Currently, the indexing operation is handled by the methods __getitem__, __setitem__ and __delitem__. These methods’ signatures accept one argument for the index (with __setitem__ accepting an additional argument for the set value). In the following, we will analyze __getitem__(self, idx) exclusively, with the same considerations implied for the remaining two methods.

When an indexing operation is performed, __getitem__(self, idx) is called. Traditionally, the full content between square brackets is turned into a single object passed to argument idx:

When a single element is passed, e.g. a[2], idx will be 2.
When multiple elements are passed, they must be separated by commas: a[2, 3]. In this case, idx will be a tuple (2, 3). With a[2, 3, "hello", {}], idx will be (2, 3, "hello", {}).
A slicing notation, e.g. a[2:10], will produce a slice object, or a tuple containing slice objects if multiple values were passed.
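
The rules above can be checked in current Python. The following is a minimal sketch; the Probe class is a hypothetical helper (not part of this PEP) whose __getitem__ simply returns whatever object it receives as idx.

```python
class Probe:
    """Hypothetical helper: echoes back the object received as idx."""

    def __getitem__(self, idx):
        return idx


a = Probe()

single = a[2]                  # a single element is passed as-is
pair = a[2, 3]                 # comma-separated elements arrive as one tuple
mixed = a[2, 3, "hello", {}]   # any objects can be members of that tuple
one_slice = a[2:10]            # slice notation builds a slice object
two_slices = a[1:2, ::3]       # several slices arrive as a tuple of slices
```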

Except for its unique ability to handle slice notation, the indexing operation has similarities to a plain method call: it acts like one when invoked with only one element; if the number of elements is greater than one, the idx argument behaves like *args. However, as stated in the Motivation section, an indexing operation has the strong semantic implication of extraction of a subset out of a larger set, which is not automatically associated to a regular method call unless appropriate naming is chosen. Moreover, its different visual style is important for readability.

Specifications

The implementation should try to preserve the current signature for __getitem__, or modify it in a backward-compatible way. We will present different alternatives, taking into account the possible cases that need to be addressed:

C0. a[1]; a[1, 2]        # Traditional indexing
C1. a[Z=3]
C2. a[Z=3, R=4]
C3. a[1, Z=3]
C4. a[1, Z=3, R=4]
C5. a[1, 2, Z=3]
C6. a[1, 2, Z=3, R=4]
C7. a[1, Z=3, 2, R=4]    # Interposed ordering

Strategy “Strict dictionary”

This strategy acknowledges that __getitem__ is special in accepting only one object, and the nature of that object must be non-ambiguous in its specification of the axes: it can be either by order, or by name. As a result of this assumption, in presence of keyword arguments, the passed entity is a dictionary and all labels must be specified.

C0. a[1]; a[1, 2]        -> idx = 1; idx = (1, 2)
C1. a[Z=3]               -> idx = {"Z": 3}
C2. a[Z=3, R=4]          -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3]            -> raise SyntaxError
C4. a[1, Z=3, R=4]       -> raise SyntaxError
C5. a[1, 2, Z=3]         -> raise SyntaxError
C6. a[1, 2, Z=3, R=4]    -> raise SyntaxError
C7. a[1, Z=3, 2, R=4]    -> raise SyntaxError

Pros

Strong conceptual similarity between the tuple case and the dictionary case. In the first case, we are specifying a tuple, so we are naturally defining a plain set of values separated by commas. In the second, we are specifying a dictionary, so we are specifying a homogeneous set of key/value pairs, as in dict(Z=3, R=4);
Simple and easy to parse on the __getitem__ side: if it gets a tuple, determine the axes using positioning. If it gets a dictionary, use the keywords.
The C interface does not need changes.

Neutral

Degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4] means the notation is syntactic sugar.

Cons

Very strict.
Destroys ordering of the passed arguments. Preserving the order would be possible with an OrderedDict as drafted by PEP 468.
Does not allow use cases with mixed positional/keyword arguments such as a[1, 2, default=5].
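
As an illustration of how simple the receiving side could be under this strategy, here is a minimal sketch. The Grid class and its axis names are hypothetical, and since a[y=0, x=1] is not valid syntax today, the equivalent dictionary is passed explicitly (the degeneracy noted above makes the two indistinguishable to __getitem__).

```python
class Grid:
    """Hypothetical 2-D container with axes named "x" and "y"."""

    axes = ("x", "y")

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # Keyword form: every axis must be named, none may be positional.
            if set(idx) != set(self.axes):
                raise KeyError(f"expected exactly the labels {self.axes}")
            x, y = idx["x"], idx["y"]
        else:
            # Positional form: axes are identified by order.
            x, y = idx
        return self.rows[x][y]


grid = Grid([[10, 11], [20, 21]])
by_position = grid[1, 0]
# Stand-in for the proposed grid[y=0, x=1].
by_name = grid[{"y": 0, "x": 1}]
```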

Strategy “mixed dictionary”

This strategy relaxes the above constraint to return a dictionary containing both numbers and strings as keys.

C0. a[1]; a[1, 2]        -> idx = 1; idx = (1, 2)
C1. a[Z=3]               -> idx = {"Z": 3}
C2. a[Z=3, R=4]          -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3]            -> idx = {0: 1, "Z": 3}
C4. a[1, Z=3, R=4]       -> idx = {0: 1, "Z": 3, "R": 4}
C5. a[1, 2, Z=3]         -> idx = {0: 1, 1: 2, "Z": 3}
C6. a[1, 2, Z=3, R=4]    -> idx = {0: 1, 1: 2, "Z": 3, "R": 4}
C7. a[1, Z=3, 2, R=4]    -> idx = {0: 1, "Z": 3, 2: 2, "R": 4}

Pros

Allows mixed positional/keyword cases.

Cons

Destroys ordering information for string keys. We have no way of saying if "Z" in C7 was in position 1 or 3.
Implies switching from a tuple to a dict as soon as one specified index has a keyword argument. May be confusing to parse.
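
To make the parsing burden concrete, the following sketch shows one way a __getitem__ implementation might take a mixed dictionary apart. The split_mixed helper is hypothetical; it assumes positional entries use integer keys and keyword entries use string keys, as in the table above.

```python
def split_mixed(idx):
    """Split a mixed dictionary into positional values and keyword values."""
    int_keys = sorted(k for k in idx if isinstance(k, int))
    positional = [idx[k] for k in int_keys]
    keywords = {k: v for k, v in idx.items() if isinstance(k, str)}
    return positional, keywords


# What a[1, Z=3, 2, R=4] (case C7) would pass under this strategy.
positional, keywords = split_mixed({0: 1, "Z": 3, 2: 2, "R": 4})
```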

Strategy “named tuple”

Return a named tuple for idx instead of a tuple. Keyword arguments would obviously have their stated name as key, and positional arguments would have an underscore followed by their order:

C0. a[1]; a[1, 2]        -> idx = 1; idx = (_0=1, _1=2)
C1. a[Z=3]               -> idx = (Z=3)
C2. a[Z=3, R=2]          -> idx = (Z=3, R=2)
C3. a[1, Z=3]            -> idx = (_0=1, Z=3)
C4. a[1, Z=3, R=2]       -> idx = (_0=1, Z=3, R=2)
C5. a[1, 2, Z=3]         -> idx = (_0=1, _1=2, Z=3)
C6. a[1, 2, Z=3, R=4]    -> idx = (_0=1, _1=2, Z=3, R=4)
C7. a[1, Z=3, 2, R=4]    -> idx = (_0=1, Z=3, _1=2, R=4)
                         or idx = (_0=1, Z=3, _2=2, R=4)
                         or raise SyntaxError

The required typename of the namedtuple could be Index or the name of the argument in the function definition. The named tuple keeps the ordering and is easy to analyse by using the _fields attribute. It is backward compatible, provided that C0 with more than one entry now passes a namedtuple instead of a plain tuple.

Pros

Looks nice. namedtuple transparently replaces tuple and gracefully degrades to the old behavior.
Does not require a change in the C interface.

Cons

According to some sources, namedtuple is not well developed. Including it as such an important object would probably require rework and improvement;
The namedtuple fields, and thus the type, will have to change according to the passed arguments. This can be a performance bottleneck, and makes it impossible to guarantee that two subsequent index accesses get the same Index class;
The _n “magic” fields are a bit unusual, but IPython already uses them for result history.
Python currently has no builtin namedtuple. The current one is available in the “collections” module in the standard library.
Unlike with a function, the two notations gridValues[x=3, y=5, z=8] and gridValues[3, 5, 8] would not gracefully match if the order is modified at call time (e.g. we ask for gridValues[y=5, z=8, x=3]). In a function, we can pre-define argument names so that keyword arguments are properly matched. Not so in __getitem__, leaving the task of interpreting and matching to __getitem__ itself.
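
The shape of such an index object can be approximated today with collections.namedtuple. The sketch below is only an emulation: make_index is a hypothetical helper standing in for what the interpreter would build, and the typename Index is one of the options mentioned above.

```python
from collections import namedtuple


def make_index(positional, keywords):
    """Hypothetical helper building the object that would be passed as idx."""
    names = [f"_{i}" for i in range(len(positional))] + list(keywords)
    # rename=True replaces names starting with "_" by "_<position>", which
    # coincides with the names chosen here for the leading positional fields.
    index_type = namedtuple("Index", names, rename=True)
    return index_type(*positional, *keywords.values())


# Stand-in for a[1, 2, Z=3] (case C5).
idx = make_index((1, 2), {"Z": 3})
# A different set of keywords yields a different class.
other = make_index((1,), {"R": 4})
```

Because the result is a tuple subclass, existing code that unpacks or iterates over idx keeps working, while each distinct combination of keywords produces a new class, as noted in the cons above.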

Strategy “New argument contents”

In the current implementation, when many arguments are passed to __getitem__, they are grouped in a tuple and this tuple is passed to __getitem__ as the single argument idx. This strategy keeps the current signature, but expands the range of variability in type and contents of idx to more complex representations.

We identify four possible ways to implement this strategy:

P1: uses a single dictionary for the keyword arguments.
P2: uses individual single-item dictionaries.
P3: similar to P2, but replaces single-item dictionaries with a (key, value) tuple.
P4: similar to P2, but uses a special and additional new object: keyword()

Some of these possibilities lead to degenerate notations, i.e. indistinguishable from an already possible representation. Once again, the proposed notation becomes syntactic sugar for these representations.

Under this strategy, the old behavior for C0 is unchanged.

C0: a[1]    -> idx = 1        # integer
    a[1, 2] -> idx = (1, 2)   # tuple

In C1, we can use either a dictionary or a tuple to represent the key/value pair for the specific indexing entry. We need to have a tuple with a tuple in C1 because otherwise we cannot differentiate a["Z", 3] from a[Z=3].

C1: a[Z=3] -> idx = {"Z": 3}          # P1/P2 dictionary with single key
           or idx = (("Z", 3),)       # P3 tuple of tuples
           or idx = keyword("Z", 3)   # P4 keyword object

As you can see, notation P1/P2 implies that a[Z=3] and a[{"Z": 3}] will call __getitem__ passing the exact same value, and is therefore syntactic sugar for the latter. The same situation occurs, although with a different index, for P3. Using a keyword object as in P4 would remove this degeneracy.

For the C2 case:

C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}                    # P1 dictionary/ordereddict
                or idx = ({"Z": 3}, {"R": 4})                # P2 tuple of two single-key dicts
                or idx = (("Z", 3), ("R", 4))                # P3 tuple of tuples
                or idx = (keyword("Z", 3), keyword("R", 4))  # P4 keyword objects

P1 naturally maps to the traditional **kwargs behavior, however it breaks the convention that two or more entries for the index produce a tuple. P2 preserves this behavior, and additionally preserves the order. Preserving the order would also be possible with an OrderedDict as drafted by PEP 468.

The remaining cases are shown here:

C3. a[1, Z=3] -> idx = (1, {"Z": 3})          # P1/P2
              or idx = (1, ("Z", 3))          # P3
              or idx = (1, keyword("Z", 3))   # P4

C4. a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4})                    # P1
                   or idx = (1, {"Z": 3}, {"R": 4})                  # P2
                   or idx = (1, ("Z", 3), ("R", 4))                  # P3
                   or idx = (1, keyword("Z", 3), keyword("R", 4))    # P4

C5. a[1, 2, Z=3] -> idx = (1, 2, {"Z": 3})          # P1/P2
                 or idx = (1, 2, ("Z", 3))          # P3
                 or idx = (1, 2, keyword("Z", 3))   # P4

C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z": 3, "R": 4})                   # P1
                      or idx = (1, 2, {"Z": 3}, {"R": 4})                 # P2
                      or idx = (1, 2, ("Z", 3), ("R", 4))                 # P3
                      or idx = (1, 2, keyword("Z", 3), keyword("R", 4))   # P4

C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4})                   # P1. Pack the keyword arguments. Ugly.
                      or raise SyntaxError                                # P1. Same behavior as in function calls.
                      or idx = (1, {"Z": 3}, 2, {"R": 4})                 # P2
                      or idx = (1, ("Z", 3), 2, ("R", 4))                 # P3
                      or idx = (1, keyword("Z", 3), 2, keyword("R", 4))   # P4

Pros

Signature is unchanged;
P2/P3 can preserve ordering of keyword arguments as specified at indexing;
P1 needs an OrderedDict, but would destroy interposed ordering if allowed: all keyword indexes would be dumped into the dictionary;
Stays within traditional types: tuples and dicts, and possibly OrderedDict;
Some proposed strategies are similar in behavior to a traditional function call;
The C interface for PyObject_GetItem and family would remain unchanged.

Cons

Apparently complex and wasteful;
Degeneracy in notation (e.g. a[Z=3] and a[{"Z": 3}] are equivalent and indistinguishable notations at the __[get|set|del]item__ level). This behavior may or may not be acceptable.
For P4, an additional object similar in nature to slice() is needed, but only to disambiguate the above degeneracy.
The type and layout of idx seem to change depending on the whims of the caller;
May be complex to parse what is passed, especially in the case of a tuple of tuples;
P2 creates many single-key dictionaries as members of a tuple, which looks ugly. P3 would be lighter and easier to use than the tuple of dicts, and still preserves order (unlike the regular dict), but would result in clumsy extraction of keywords.
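
For P4, the parsing effort on the receiving side might look like the following sketch. Both the keyword class and the split_index helper are hypothetical: the PEP only names a keyword() object and does not define its attributes, so the name and value fields are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class keyword:
    """Hypothetical P4 object pairing a keyword name with its value."""
    name: str
    value: Any


def split_index(idx):
    """Separate positional entries from keyword entries, keeping order."""
    entries = idx if isinstance(idx, tuple) else (idx,)
    positional = [e for e in entries if not isinstance(e, keyword)]
    keywords = [(e.name, e.value) for e in entries if isinstance(e, keyword)]
    return positional, keywords


# What a[1, Z=3, 2, R=4] (case C7) would pass under P4.
positional, keywords = split_index((1, keyword("Z", 3), 2, keyword("R", 4)))
# What a[Z=3] (case C1) would pass: a bare keyword object, not a tuple.
single_positional, single_keywords = split_index(keyword("Z", 3))
```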

Strategy “kwargs argument”

__getitem__ accepts an optional **kwargs argument which should be keyword only. idx also becomes optional to support a case where no non-keyword arguments are allowed. The signature would then be one of

__getitem__(self, idx)
__getitem__(self, idx, **kwargs)
__getitem__(self, **kwargs)

Applied to our cases, this would produce:

C0. a[1, 2]              -> idx = (1, 2);  kwargs = {}
C1. a[Z=3]               -> idx = None;    kwargs = {"Z": 3}
C2. a[Z=3, R=4]          -> idx = None;    kwargs = {"Z": 3, "R": 4}
C3. a[1, Z=3]            -> idx = 1;       kwargs = {"Z": 3}
C4. a[1, Z=3, R=4]       -> idx = 1;       kwargs = {"Z": 3, "R": 4}
C5. a[1, 2, Z=3]         -> idx = (1, 2);  kwargs = {"Z": 3}
C6. a[1, 2, Z=3, R=4]    -> idx = (1, 2);  kwargs = {"Z": 3, "R": 4}
C7. a[1, Z=3, 2, R=4]    -> raise SyntaxError  # in agreement with function behavior

Empty indexing a[] of course remains invalid syntax.

Pros

Similar to function call, evolves naturally from it;
Use of keyword indexing with an object whose __getitem__ does not accept kwargs will fail in an obvious way. That’s not the case for the other strategies.

Cons

It doesn’t preserve order, unless an OrderedDict is used;
Forbids C7, but is it really needed?
Requires a change in the C interface to pass an additional PyObject for the keyword arguments.
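
The behavior in the table can be emulated today by calling __getitem__ explicitly, since the bracket syntax with keywords does not parse. In this sketch the Table and Legacy classes are hypothetical, and the explicit method calls stand in for what the interpreter would do under this strategy.

```python
class Table:
    """Hypothetical container recording how it was indexed."""

    def __getitem__(self, idx=None, **kwargs):
        return idx, kwargs


a = Table()

c0 = a[1, 2]                            # valid today; kwargs is empty
# The calls below emulate the proposed syntax.
c1 = a.__getitem__(Z=3)                 # a[Z=3]
c3 = a.__getitem__(1, Z=3)              # a[1, Z=3]
c6 = a.__getitem__((1, 2), Z=3, R=4)    # a[1, 2, Z=3, R=4]


class Legacy:
    """A class written before the change: no **kwargs in the signature."""

    def __getitem__(self, idx):
        return idx


try:
    Legacy().__getitem__(1, Z=3)
    failed_loudly = False
except TypeError:
    # The unexpected keyword is rejected by the normal call machinery.
    failed_loudly = True
```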

C interface

As briefly introduced in the previous analysis, the C interface would potentially have to change to allow the new feature. Specifically, PyObject_GetItem and related routines would have to accept an additional PyObject *kw argument for Strategy “kwargs argument”. The remaining strategies would not require a change in the C function signatures, but the different nature of the passed object would potentially require adaptation.

Strategy “named tuple” would behave correctly without any change: the class returned by the factory function in collections is a subclass of tuple, meaning that PyTuple_* functions can handle the resulting object.

Alternative Solutions

In this section, we present alternative solutions that would work around the missing feature and make the proposed enhancement not worth implementing.

Use a method

One could keep the indexing as is, and use a traditional get() method for those cases where basic indexing is not enough. This is a good point, but as already reported in the introduction, methods have a different semantic weight from indexing, and you can’t use slices directly in methods. Compare e.g. a[1:3, Z=2] with a.get(slice(1, 3), Z=2).

The authors, however, recognize this argument as compelling, and the advantage in semantic expressivity of a keyword-based indexing may be offset by a rarely used feature that does not bring enough benefit and may have limited adoption.
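
The method-based workaround can be sketched as follows. The Series class and the meaning given to Z (a column selector) are assumptions made for illustration only; the point is the explicit slice(1, 3) that the caller has to spell out.

```python
class Series:
    """Hypothetical wrapper around a list of rows of numbers."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, idx):
        return self.rows[idx]

    def get(self, idx, Z=0):
        # Z selects a column; idx may be an int or a slice over the rows.
        selected = self.rows[idx]
        if isinstance(idx, slice):
            return [row[Z] for row in selected]
        return selected[Z]


a = Series([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
column = a.get(slice(1, 3), Z=2)   # the method spelling of a[1:3, Z=2]
```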

Emulate requested behavior by abusing the slice object

This extremely creative method exploits the slice objects’ behavior, provided that one accepts using strings (or instantiating properly named placeholder objects for the keys), and accepts using “:” instead of “=”.

>>> a["K":3]
slice('K', 3, None)
>>> a["K":3, "R":4]
(slice('K', 3, None), slice('R', 4, None))

While clearly smart, this approach does not allow easy inquiry of the key/value pair, it’s too clever and esoteric, and does not allow passing a slice as in a[K=1:10:2].
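
A receiving class would have to unpack such slices itself. The following sketch works in current Python; the Labeled class is hypothetical and treats any slice whose start is a string as a keyword entry.

```python
class Labeled:
    """Hypothetical class reading "name": value slices as keyword indices."""

    def __getitem__(self, idx):
        entries = idx if isinstance(idx, tuple) else (idx,)
        positional, keywords = [], {}
        for entry in entries:
            if isinstance(entry, slice) and isinstance(entry.start, str):
                # "K":3 arrives as slice("K", 3, None).
                keywords[entry.start] = entry.stop
            else:
                positional.append(entry)
        return positional, keywords


a = Labeled()
result = a[1, "K":3, "R":4]
```

Note that a genuine slice between two string labels, such as "A":"F", would be misread by this sketch as a keyword A with value "F".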

However, Tim Delaney comments:

“I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c, 'd':e]. It’s simple to explain, and gives the greatest backwards compatibility. In particular, libraries that already abused slices in this way will just continue to work with the new syntax.”

We think this behavior would produce inconvenient results. The library Pandas uses strings as labels, allowing notation such as

>>> a[:, "A":"F"]

to extract data from column “A” to column “F”. Under the above comment, this notation would be equally obtained with

>>> a[:, A="F"]

which is weird and collides with the intended meaning of keyword in indexing, that is, specifying the axis through conventional names rather than positioning.

Pass a dictionary as an additional index

>>> a[1, 2, {"K": 3}]

This notation, although less elegant, can already be used and achieves similar results. It’s evident that the proposed Strategy “New argument contents” can be interpreted as syntactic sugar for this notation.
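
A minimal sketch of a class supporting this notation in current Python follows. The Array class is hypothetical, and treating only a dict in last position as the options is an assumption of this example.

```python
class Array:
    """Hypothetical class accepting a trailing dict of options."""

    def __getitem__(self, idx):
        entries = idx if isinstance(idx, tuple) else (idx,)
        options = {}
        if entries and isinstance(entries[-1], dict):
            # Treat a dict in last position as the keyword part of the index.
            *entries, options = entries
        return tuple(entries), options


a = Array()
result = a[1, 2, {"K": 3}]
plain = a[1, 2]
```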

Additional Comments

Commenters also expressed the following relevant points:

Relevance of ordering of keyword arguments

As part of the discussion of this PEP, it’s important to decide if the ordering information of the keyword arguments is important, and if indexes and keys can be ordered in an arbitrary way (e.g. a[1, Z=3, 2, R=4]). PEP 468 tries to address the first point by proposing the use of an OrderedDict; however, one would be inclined to accept that keyword arguments in indexing are equivalent to kwargs in function calls, and therefore as of today equally unordered, and with the same restrictions.

Need for homogeneity of behavior

Relative to Strategy “New argument contents”, a comment from Ian Cordasco points out that

“it would be unreasonable for just one method to behave totally differently from the standard behaviour in Python. It would be confusing for only __getitem__ (and ostensibly, __setitem__) to take keyword arguments but instead of turning them into a dictionary, turn them into individual single-item dictionaries.”

We agree with his point, however it must be pointed out that __getitem__ is already special in some regards when it comes to passed arguments.

Chris Angelico also states:

“it seems very odd to start out by saying “here, let’s give indexing the option to carry keyword args, just like with function calls”, and then come back and say “oh, but unlike function calls, they’re inherently ordered and carried very differently”.”

Again, we agree on this point. The most straightforward strategy to keep homogeneity would be Strategy “kwargs argument”, opening to a **kwargs argument on __getitem__.

One of the authors (Stefano Borini) thinks that only the “strict dictionary” strategy is worth implementing. It is non-ambiguous, simple, does not force complex parsing, and addresses the problem of referring to axes either by position or by name. The “options” use case is probably best handled with a different approach, and may be irrelevant for this PEP. The alternative “named tuple” is another valid choice.

Having .get() become obsolete for indexing with default fallback

Introducing a “default” keyword could make dict.get() obsolete, which would be replaced by d["key", default=3]. Chris Angelico however states:

“Currently, you need to write __getitem__ (which raises an exception on finding a problem) plus something else, e.g. get(), which returns a default instead. By your proposal, both branches would go inside __getitem__, which means they could share code; but there still need to be two branches.”

Additionally, Chris continues:

“There’ll be an ad-hoc and fairly arbitrary puddle of names (some will go default=, others will say that’s way too long and go def=, except that that’s a keyword so they’ll use dflt= or something…), unless there’s a strong force pushing people to one consistent name.”

This argument is valid but it’s equally valid for any function call, and isgenerally fixed by established convention and documentation.
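
Chris Angelico's observation about the two branches sharing one __getitem__ can be illustrated with the following sketch. DefaultingList and its _MISSING sentinel are hypothetical, and the explicit method call stands in for the proposed lst[5, default=0], which is not valid syntax today.

```python
class DefaultingList:
    """Hypothetical list wrapper emulating lst[5, default=0]."""

    _MISSING = object()

    def __init__(self, items):
        self.items = list(items)

    def __getitem__(self, idx, default=_MISSING):
        try:
            return self.items[idx]
        except IndexError:
            # Second branch: what a separate get() method does today.
            if default is self._MISSING:
                raise
            return default


lst = DefaultingList([1, 2, 3])
present = lst[1]
# Stand-in for the proposed lst[5, default=0].
value = lst.__getitem__(5, default=0)
```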

On degeneracy of notation

User Drekin commented: “The case of a[Z=3] and a[{"Z": 3}] is similar to current a[1, 2] and a[(1, 2)]. Even though one may argue that the parentheses are actually not part of tuple notation but are just needed because of syntax, it may look as degeneracy of notation when compared to function call: f(1, 2) is not the same thing as f((1, 2)).”
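
The existing degeneracy mentioned in this comment can be observed directly in current Python. In the sketch below, Probe and f are hypothetical helpers that return what they receive.

```python
class Probe:
    """Hypothetical helper: echoes back the object received as idx."""

    def __getitem__(self, idx):
        return idx


def f(*args):
    """Hypothetical helper: echoes back its positional arguments."""
    return args


a = Probe()
same_index = a[1, 2] == a[(1, 2)]    # both pass the tuple (1, 2)
same_call = f(1, 2) == f((1, 2))     # (1, 2) versus ((1, 2),)
```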

Copyright

This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0472.rst

Last modified: 2024-12-19 20:04:05 GMT