This PEP proposes an extension of the indexing operation to support keyword arguments. Notations in the form a[K=3, R=2] would become legal syntax. For future-proofing considerations, notations such as a[1:2, K=3, R=4] are considered and may be allowed as well, depending on the choice for implementation. In addition to a change in the parser, the index protocol (__getitem__, __setitem__ and __delitem__) will also potentially require adaptation.
The indexing syntax carries a strong semantic content, differentiating it from a method call: it implies referring to a subset of data. We believe this semantic association to be important, and wish to expand the strategies allowed to refer to this data.
As a general observation, the number of indices needed by an indexing operation depends on the dimensionality of the data: one-dimensional data (e.g. a list) requires one index (e.g. a[3]), two-dimensional data (e.g. a matrix) requires two indices (e.g. a[2, 3]) and so on. Each index is a selector along one of the axes of the dimensionality, and the position in the index tuple is the metainformation needed to associate each index to the corresponding axis.
The current Python syntax focuses exclusively on position to express the association to the axes, and also contains syntactic sugar to refer to non-punctiform selection (slices):
>>> a[3]        # returns the fourth element of a
>>> a[1:10:2]   # slice notation (extract a non-trivial data subset)
>>> a[3, 2]     # multiple indexes (for multidimensional arrays)
The additional notation proposed in this PEP would allow notations involving keyword arguments in the indexing operation, e.g.
>>> a[K=3, R=2]
which would allow axes to be referred to by conventional names.
One must additionally consider the extended form that allows both positional and keyword specification:
>>> a[3, R=3, K=4]
This PEP will explore different strategies to enable the use of these notations.
The following practical use cases present two broad categories of usage of a keyworded specification: Indexing and contextual option. For indexing:
>>> gridValues[x=3, y=5, z=8]
>>> rain[time=0:12, location=location]
Basis[Z=5] is a Domain Specific Language notation to represent a level of accuracy:
>>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
In this case, the index operation would return a basis set at the chosen level of accuracy (represented by the parameter Z). The reason for using indexing is that the BasisSet object could be internally represented as a numeric table, where rows (the “coefficient” axis, hidden from the user in this example) are associated to individual elements (e.g. rows 0:5 contain coefficients for element 1, rows 5:8 coefficients for element 2) and each column is associated to a given degree of accuracy (“accuracy” or “Z” axis) so that the first column is low accuracy, the second column is medium accuracy and so on. With that indexing, the user would obtain another object representing the contents of the column of the internal table for accuracy level 3.
Additionally, the keyword specification can be used as an option contextual to the indexing. Specifically:
>>> lst = [1, 2, 3]
>>> value = lst[5, default=0]  # value is 0
>>> value = array[1, 3, interpolate=spline_interpolator]
>>> value = array[1, 3, unit="degrees"]
How the notation is interpreted is up to the implementing class.
Currently, the indexing operation is handled by the methods __getitem__, __setitem__ and __delitem__. The signatures of these methods accept one argument for the index (with __setitem__ accepting an additional argument for the set value). In the following, we will analyze __getitem__(self, idx) exclusively, with the same considerations implied for the remaining two methods.
When an indexing operation is performed, __getitem__(self, idx) is called. Traditionally, the full content between square brackets is turned into a single object passed to argument idx:
- When indexing with a single element, e.g. a[2], idx will be 2.
- When indexing with multiple elements, e.g. a[2, 3], they are packed in a tuple. In this case, idx will be a tuple (2, 3). With a[2, 3, "hello", {}] idx will be (2, 3, "hello", {}).
- A slicing syntax, e.g. a[2:10], will produce a slice object, or a tuple containing slice objects if multiple values were passed.
Except for its unique ability to handle slice notation, the indexing operation has similarities to a plain method call: it acts like one when invoked with only one element; if the number of elements is greater than one, the idx argument behaves like a *args. However, as stated in the Motivation section, an indexing operation has the strong semantic implication of extraction of a subset out of a larger set, which is not automatically associated to a regular method call unless appropriate naming is chosen. Moreover, its different visual style is important for readability.
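The packing rules described above can be observed directly in current Python. The following is a minimal sketch; the Probe class is a hypothetical helper, not part of this PEP, whose __getitem__ simply returns whatever it receives as idx.

```python
class Probe:
    """Hypothetical helper that returns whatever it receives as idx."""

    def __getitem__(self, idx):
        return idx


a = Probe()

single = a[2]                    # a lone index is passed through as is
packed = a[2, 3, "hello", {}]    # several indices are packed into one tuple
sliced = a[2:10]                 # slice notation builds a slice object
mixed = a[1:3, 5]                # a tuple containing a slice and an int

print(single, packed, sliced, mixed)
```

Note that the object cannot tell a[2, 3] apart from a[(2, 3)]: both arrive as the same tuple.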
The implementation should try to preserve the current signature for __getitem__, or modify it in a backward-compatible way. We will present different alternatives, taking into account the possible cases that need to be addressed:
C0. a[1]; a[1, 2]         # Traditional indexing
C1. a[Z=3]
C2. a[Z=3, R=4]
C3. a[1, Z=3]
C4. a[1, Z=3, R=4]
C5. a[1, 2, Z=3]
C6. a[1, 2, Z=3, R=4]
C7. a[1, Z=3, 2, R=4]     # Interposed ordering
The “strict dictionary” strategy acknowledges that __getitem__ is special in accepting only one object, and the nature of that object must be non-ambiguous in its specification of the axes: it can be either by order, or by name. As a result of this assumption, in presence of keyword arguments, the passed entity is a dictionary and all labels must be specified.
C0. a[1]; a[1, 2]      -> idx = 1; idx = (1, 2)
C1. a[Z=3]             -> idx = {"Z": 3}
C2. a[Z=3, R=4]        -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3]          -> raise SyntaxError
C4. a[1, Z=3, R=4]     -> raise SyntaxError
C5. a[1, 2, Z=3]       -> raise SyntaxError
C6. a[1, 2, Z=3, R=4]  -> raise SyntaxError
C7. a[1, Z=3, 2, R=4]  -> raise SyntaxError
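The dispatch that a class would have to perform under this strategy can be sketched as follows. This is only an illustration under stated assumptions: the Grid class and its axis names "Z" and "R" are hypothetical, and since the keyword syntax does not exist, the dictionary is passed explicitly in the brackets.

```python
class Grid:
    """Hypothetical 2D container addressed by position or by axis name."""

    AXES = ("Z", "R")  # assumed axis names, in positional order

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # By name: every axis label must be present, and nothing else.
            if set(idx) != set(self.AXES):
                raise KeyError(f"expected exactly the axes {self.AXES}")
            z, r = (idx[name] for name in self.AXES)
        elif isinstance(idx, tuple) and len(idx) == len(self.AXES):
            # By position: the order in the tuple selects the axis.
            z, r = idx
        else:
            raise TypeError("index must be a (Z, R) tuple or a dict of axes")
        return self._rows[z][r]


g = Grid([[10, 11], [20, 21]])
by_position = g[1, 0]
by_name = g[{"R": 0, "Z": 1}]   # what g[Z=1, R=0] would pass under this strategy
```

Both lookups select the same element, regardless of the order in which the names are written.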
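- With keyword arguments, the index specifies a homogeneous set of key/value pairs, as in dict(Z=3, R=4);
- Parsing is simple on the __getitem__ side: if it gets a tuple, determine the axes using positioning. If it gets a dictionary, use the keywords.
- The degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4] means the notation is syntactic sugar.
- Mixed positional/keyword use cases such as a[1, 2, default=5] are not allowed.
A “mixed dictionary” strategy relaxes the above constraint to return a dictionary containing both numbers and strings as keys.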
C0. a[1]; a[1, 2]      -> idx = 1; idx = (1, 2)
C1. a[Z=3]             -> idx = {"Z": 3}
C2. a[Z=3, R=4]        -> idx = {"Z": 3, "R": 4}
C3. a[1, Z=3]          -> idx = {0: 1, "Z": 3}
C4. a[1, Z=3, R=4]     -> idx = {0: 1, "Z": 3, "R": 4}
C5. a[1, 2, Z=3]       -> idx = {0: 1, 1: 2, "Z": 3}
C6. a[1, 2, Z=3, R=4]  -> idx = {0: 1, 1: 2, "Z": 3, "R": 4}
C7. a[1, Z=3, 2, R=4]  -> idx = {0: 1, "Z": 3, 2: 2, "R": 4}
With this strategy, we have no way of saying if "Z" in C7 was in position 1 or 3.
The “named tuple” strategy returns a named tuple for idx instead of a tuple. Keyword arguments would obviously have their stated name as key, and positional arguments would have an underscore followed by their order:
C0. a[1]; a[1, 2]      -> idx = 1; idx = (_0=1, _1=2)
C1. a[Z=3]             -> idx = (Z=3)
C2. a[Z=3, R=2]        -> idx = (Z=3, R=2)
C3. a[1, Z=3]          -> idx = (_0=1, Z=3)
C4. a[1, Z=3, R=2]     -> idx = (_0=1, Z=3, R=2)
C5. a[1, 2, Z=3]       -> idx = (_0=1, _1=2, Z=3)
C6. a[1, 2, Z=3, R=4]  -> idx = (_0=1, _1=2, Z=3, R=4)
C7. a[1, Z=3, 2, R=4]  -> idx = (_0=1, Z=3, _1=2, R=4)
                       or idx = (_0=1, Z=3, _2=2, R=4)
                       or raise SyntaxError
The required typename of the namedtuple could be Index or the name of the argument in the function definition. It keeps the ordering and is easy to analyse by using the _fields attribute. It is backward compatible, provided that C0 with more than one entry now passes a namedtuple instead of a plain tuple.
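The object that __getitem__ would receive under the named tuple strategy can be approximated today with collections.namedtuple. This is a sketch only: make_index is a hypothetical helper standing in for what the interpreter would build, and the typename Index is the one suggested above. It relies on rename=True, because namedtuple otherwise rejects field names that start with an underscore.

```python
from collections import namedtuple


def make_index(*args, **kwargs):
    """Build a named tuple index from positional and keyword entries."""
    # Positional entries get _0, _1, ...; keyword entries keep their name.
    fields = [f"_{i}" for i in range(len(args))] + list(kwargs)
    # rename=True rewrites underscore-prefixed names to _<position>,
    # which here leaves _0, _1, ... as they are.
    Index = namedtuple("Index", fields, rename=True)
    return Index(*args, *kwargs.values())


# C6: what a[1, 2, Z=3, R=4] would pass under this strategy.
idx = make_index(1, 2, Z=3, R=4)
print(idx._fields)
```

Because the result is a tuple subclass, code that only expects a plain tuple keeps working, while newer code can inspect _fields or access idx.Z by name.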
- The _n “magic” fields are a bit unusual, but IPython already uses them for result history.
- The notations gridValues[x=3, y=5, z=8] and gridValues[3, 5, 8] would not gracefully match if the order is modified at call time (e.g. we ask for gridValues[y=5, z=8, x=3]). In a function, we can pre-define argument names so that keyword arguments are properly matched. Not so in __getitem__, leaving the task of interpreting and matching to __getitem__ itself.
In the current implementation, when many arguments are passed to __getitem__, they are grouped in a tuple and this tuple is passed to __getitem__ as the single argument idx. The “New argument contents” strategy keeps the current signature, but expands the range of variability in type and contents of idx to more complex representations.
We identify four possible ways to implement this strategy:
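- P1: a single dictionary holding the keyword arguments.
- P2: individual single-key dictionaries.
- P3: as P2, but with each single-key dictionary replaced by a (key, value) tuple.
- P4: as P2, but using a new dedicated object, keyword().
Some of these possibilities lead to degenerate notations, i.e. indistinguishable from an already possible representation. Once again, the proposed notation becomes syntactic sugar for these representations.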
Under this strategy, the old behavior for C0 is unchanged.
C0: a[1]     -> idx = 1        # integer
    a[1, 2]  -> idx = (1, 2)   # tuple
In C1, we can use either a dictionary or a tuple to represent the key and value pair for the specific indexing entry. We need to have a tuple with a tuple in C1 because otherwise we cannot differentiate a["Z", 3] from a[Z=3].
C1: a[Z=3]   -> idx = {"Z": 3}          # P1/P2 dictionary with single key
             or idx = (("Z", 3),)       # P3 tuple of tuples
             or idx = keyword("Z", 3)   # P4 keyword object
As you can see, notation P1/P2 implies that a[Z=3] and a[{"Z": 3}] will call __getitem__ passing the exact same value, and is therefore syntactic sugar for the latter. The same situation occurs, although with a different index, for P3. Using a keyword object as in P4 would remove this degeneracy.
For the C2 case:
C2. a[Z=3, R=4]        -> idx = {"Z": 3, "R": 4}                        # P1 dictionary/ordereddict
                       or idx = ({"Z": 3}, {"R": 4})                    # P2 tuple of two single-key dict
                       or idx = (("Z", 3), ("R", 4))                    # P3 tuple of tuples
                       or idx = (keyword("Z", 3), keyword("R", 4))      # P4 keyword objects
P1 naturally maps to the traditional **kwargs behavior, however it breaks the convention that two or more entries for the index produce a tuple. P2 preserves this behavior, and additionally preserves the order. Preserving the order would also be possible with an OrderedDict as drafted by PEP 468.
The remaining cases are shown here:
C3. a[1, Z=3]          -> idx = (1, {"Z": 3})                           # P1/P2
                       or idx = (1, ("Z", 3))                           # P3
                       or idx = (1, keyword("Z", 3))                    # P4
C4. a[1, Z=3, R=4]     -> idx = (1, {"Z": 3, "R": 4})                   # P1
                       or idx = (1, {"Z": 3}, {"R": 4})                 # P2
                       or idx = (1, ("Z", 3), ("R", 4))                 # P3
                       or idx = (1, keyword("Z", 3), keyword("R", 4))   # P4
C5. a[1, 2, Z=3]       -> idx = (1, 2, {"Z": 3})                        # P1/P2
                       or idx = (1, 2, ("Z", 3))                        # P3
                       or idx = (1, 2, keyword("Z", 3))                 # P4
C6. a[1, 2, Z=3, R=4]  -> idx = (1, 2, {"Z": 3, "R": 4})                # P1
                       or idx = (1, 2, {"Z": 3}, {"R": 4})              # P2
                       or idx = (1, 2, ("Z", 3), ("R", 4))              # P3
                       or idx = (1, 2, keyword("Z", 3), keyword("R", 4))  # P4
C7. a[1, Z=3, 2, R=4]  -> idx = (1, 2, {"Z": 3, "R": 4})                # P1. Pack the keyword arguments. Ugly.
                       or raise SyntaxError                             # P1. Same behavior as in function calls.
                       or idx = (1, {"Z": 3}, 2, {"R": 4})              # P2
                       or idx = (1, ("Z", 3), 2, ("R", 4))              # P3
                       or idx = (1, keyword("Z", 3), 2, keyword("R", 4))  # P4
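To make the P4 layout concrete, the sketch below shows how a __getitem__ implementation might take such an idx apart. The keyword class here is a hypothetical stand-in written as a frozen dataclass; the PEP does not specify how the real object would be implemented, and split_index is an invented helper name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class keyword:
    """Hypothetical stand-in for the keyword() object of P4."""
    name: str
    value: Any


def split_index(idx):
    """Separate positional entries from keyword entries, keeping order."""
    entries = idx if isinstance(idx, tuple) else (idx,)
    positional = tuple(e for e in entries if not isinstance(e, keyword))
    keywords = {e.name: e.value for e in entries if isinstance(e, keyword)}
    return positional, keywords


# C7: what a[1, Z=3, 2, R=4] would pass under P4.
positional, keywords = split_index((1, keyword("Z", 3), 2, keyword("R", 4)))
# C1: a single keyword object, not wrapped in a tuple.
only_kw = split_index(keyword("Z", 3))
# A plain ("Z", 3) tuple stays positional, so a["Z", 3] is not confused with a[Z=3].
plain_pair = split_index(("Z", 3))
```

The dedicated type is what removes the degeneracy: under P3 the last two calls would be indistinguishable.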
- PyObject_GetItem and family would remain unchanged.
- P1 to P3 are degenerate (e.g. a[Z=3] and a[{"Z": 3}] are equivalent and indistinguishable notations at the __[get|set|del]item__ level). This behavior may or may not be acceptable.
- The idx type and layout seem to change depending on the whims of the caller.
Under the “kwargs argument” strategy, __getitem__ accepts an optional **kwargs argument which should be keyword only. idx also becomes optional to support a case where no non-keyword arguments are allowed. The signature would then be either
__getitem__(self, idx)
__getitem__(self, idx, **kwargs)
__getitem__(self, **kwargs)
Applied to our cases, this would produce:
C0. a[1, 2]            -> idx = (1, 2); kwargs = {}
C1. a[Z=3]             -> idx = None; kwargs = {"Z": 3}
C2. a[Z=3, R=4]        -> idx = None; kwargs = {"Z": 3, "R": 4}
C3. a[1, Z=3]          -> idx = 1; kwargs = {"Z": 3}
C4. a[1, Z=3, R=4]     -> idx = 1; kwargs = {"Z": 3, "R": 4}
C5. a[1, 2, Z=3]       -> idx = (1, 2); kwargs = {"Z": 3}
C6. a[1, 2, Z=3, R=4]  -> idx = (1, 2); kwargs = {"Z": 3, "R": 4}
C7. a[1, Z=3, 2, R=4]  -> raise SyntaxError  # in agreement to function behavior
Empty indexing a[] of course remains invalid syntax.
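The mapping above can be imitated in current Python by calling __getitem__ explicitly, since the bracket syntax itself does not accept keywords. This is a sketch under that assumption; Table is a hypothetical class that simply echoes what it receives.

```python
class Table:
    """Hypothetical container whose __getitem__ takes keyword arguments."""

    def __getitem__(self, idx=None, **kwargs):
        # Return what was received so the mapping of each case is visible.
        return idx, kwargs


a = Table()

c0 = a[1, 2]                           # valid today: kwargs stays empty
c1 = a.__getitem__(Z=3)                # stands in for a[Z=3]
c4 = a.__getitem__(1, Z=3, R=4)        # stands in for a[1, Z=3, R=4]
c6 = a.__getitem__((1, 2), Z=3, R=4)   # stands in for a[1, 2, Z=3, R=4]
```

Existing classes that define __getitem__(self, idx) without **kwargs would raise TypeError when called this way, which is the failure mode this strategy counts on.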
Keyword indexing on an object whose __getitem__ doesn’t have a kwargs will fail in an obvious way. That’s not the case for the other strategies.
As briefly introduced in the previous analysis, the C interface would potentially have to change to allow the new feature. Specifically, PyObject_GetItem and related routines would have to accept an additional PyObject *kw argument for Strategy “kwargs argument”. The remaining strategies would not require a change in the C function signatures, but the different nature of the passed object would potentially require adaptation.
Strategy “named tuple” would behave correctly without any change: the class returned by the factory method in collections is a subclass of tuple, meaning that PyTuple_* functions can handle the resulting object.
In this section, we present alternative solutions that would work around the missing feature and make the proposed enhancement not worth implementing.
One could keep the indexing as is, and use a traditional get() method for those cases where basic indexing is not enough. This is a good point, but as already reported in the introduction, methods have a different semantic weight from indexing, and you can’t use slices directly in methods. Compare e.g. a[1:3, Z=2] with a.get(slice(1, 3), Z=2).
The authors however recognize this argument as compelling, and the advantage in semantic expressivity of a keyword-based indexing may be offset by a rarely used feature that does not bring enough benefit and may have limited adoption.
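The get() workaround and its cost in notation can be illustrated with a small sketch. The Series class and its scale option are invented for this example; scale merely stands in for a keyword such as Z=2.

```python
class Series:
    """Hypothetical 1D container with a get() method taking options."""

    def __init__(self, values):
        self._values = list(values)

    def __getitem__(self, idx):
        # Plain indexing delegates to get() with no options.
        return self.get(idx)

    def get(self, idx, scale=1):
        # `scale` is an invented option standing in for a keyword like Z=2.
        picked = self._values[idx]
        if isinstance(idx, slice):
            return [v * scale for v in picked]
        return picked * scale


s = Series([10, 20, 30, 40])
plain = s[1:3]                          # slice syntax works in brackets
scaled = s.get(slice(1, 3), scale=2)    # a method needs an explicit slice()
```

The method call works today, but the range has to be spelled as slice(1, 3) because the 1:3 notation is only available inside square brackets.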
An extremely creative workaround exploits the slice objects’ behavior, provided that one accepts using strings (or instantiating properly named placeholder objects for the keys), and accepts using “:” instead of “=”.
>>> a["K":3]
slice('K', 3, None)
>>> a["K":3, "R":4]
(slice('K', 3, None), slice('R', 4, None))
While clearly smart, this approach does not allow easy inspection of the key/value pair, it’s too clever and esoteric, and does not allow passing a slice as in a[K=1:10:2].
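Since this trick works in current Python, it can be shown end to end. The sketch below is one possible way for a class to decode such slices; SliceKeywords is a hypothetical name, and the convention that a string in the start position marks a keyword is an assumption of this example.

```python
class SliceKeywords:
    """Hypothetical class that reads "name":value slices as keywords."""

    def __getitem__(self, idx):
        entries = idx if isinstance(idx, tuple) else (idx,)
        positional, keywords = [], {}
        for entry in entries:
            # A slice whose start is a string is treated as key:value.
            if isinstance(entry, slice) and isinstance(entry.start, str):
                keywords[entry.start] = entry.stop
            else:
                positional.append(entry)
        return tuple(positional), keywords


a = SliceKeywords()
result = a[1, "K":3, "R":4]
print(result)
```

A slice has only three components, so once the first one is spent on the key there is no room left to carry a full start:stop:step value.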
However, Tim Delaney comments:
“I really do think that a[b=c, d=e] should just be syntax sugar for a['b':c, 'd':e]. It’s simple to explain, and gives the greatest backwards compatibility. In particular, libraries that already abused slices in this way will just continue to work with the new syntax.”
We think this behavior would produce inconvenient results. The library Pandas uses strings as labels, allowing notation such as
>>> a[:, "A":"F"]
to extract data from column “A” to column “F”. Under the above comment, this notation would be equally obtained with
>>> a[:, A="F"]
which is weird and collides with the intended meaning of keyword in indexing, that is, specifying the axis through conventional names rather than positioning.
A dictionary can also be passed as an additional index:
>>> a[1, 2, {"K": 3}]
This notation, although less elegant, can already be used and achieves similar results. It’s evident that the proposed Strategy “New argument contents” can be interpreted as syntactic sugar for this notation.
Commenters also expressed the following relevant points:
As part of the discussion of this PEP, it’s important to decide if the ordering information of the keyword arguments is important, and if indexes and keys can be ordered in an arbitrary way (e.g. a[1, Z=3, 2, R=4]). PEP 468 tries to address the first point by proposing the use of an ordereddict, however one would be inclined to accept that keyword arguments in indexing are equivalent to kwargs in function calls, and therefore as of today equally unordered, and with the same restrictions.
Relative to Strategy “New argument contents”, a comment from Ian Cordasco points out that
“it would be unreasonable for just one method to behave totally differently from the standard behaviour in Python. It would be confusing for only __getitem__ (and ostensibly, __setitem__) to take keyword arguments but instead of turning them into a dictionary, turn them into individual single-item dictionaries.”
We agree with his point, however it must be pointed out that __getitem__ is already special in some regards when it comes to passed arguments.
Chris Angelico also states:
“it seems very odd to start out by saying ‘here, let’s give indexing the option to carry keyword args, just like with function calls’, and then come back and say ‘oh, but unlike function calls, they’re inherently ordered and carried very differently’.”
Again, we agree on this point. The most straightforward strategy to keep homogeneity would be Strategy “kwargs argument”, opening to a **kwargs argument on __getitem__.
One of the authors (Stefano Borini) thinks that only the “strict dictionary” strategy is worth implementing. It is non-ambiguous, simple, does not force complex parsing, and addresses the problem of referring to axes either by position or by name. The “options” use case is probably best handled with a different approach, and may be irrelevant for this PEP. The alternative “named tuple” is another valid choice.
Introducing a “default” keyword could make dict.get() obsolete, which would be replaced by d["key", default=3]. Chris Angelico however states:
“Currently, you need to write __getitem__ (which raises an exception on finding a problem) plus something else, e.g. get(), which returns a default instead. By your proposal, both branches would go inside __getitem__, which means they could share code; but there still need to be two branches.”
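The two branches mentioned in this comment can be made visible with a short sketch. DefaultingList is a hypothetical class, and the explicit __getitem__ call stands in for the proposed lst[5, default=0] notation, which is not valid syntax today.

```python
_MISSING = object()  # sentinel: distinguishes "no default" from default=None


class DefaultingList(list):
    """Hypothetical list whose __getitem__ understands a default keyword."""

    def __getitem__(self, idx, default=_MISSING):
        try:
            return super().__getitem__(idx)   # lookup code shared by both branches
        except IndexError:
            if default is _MISSING:
                raise                         # branch 1: behaves like plain indexing
            return default                    # branch 2: behaves like get()


lst = DefaultingList([1, 2, 3])
present = lst[1]
fallback = lst.__getitem__(5, default=0)   # stands in for lst[5, default=0]
```

The lookup is written once, but the method still has to decide between raising and returning the default, as the comment points out.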
Additionally, Chris continues:
“There’ll be an ad-hoc and fairly arbitrary puddle of names (some will go default=, others will say that’s way too long and go def=, except that that’s a keyword so they’ll use dflt= or something…), unless there’s a strong force pushing people to one consistent name.”
This argument is valid but it’s equally valid for any function call, and is generally fixed by established convention and documentation.
User Drekin commented: “The case of a[Z=3] and a[{"Z": 3}] is similar to current a[1, 2] and a[(1, 2)]. Even though one may argue that the parentheses are actually not part of tuple notation but are just needed because of syntax, it may look as degeneracy of notation when compared to function call: f(1, 2) is not the same thing as f((1, 2)).”
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0472.rst
Last modified: 2024-12-19 20:04:05 GMT