Python Enhancement Proposals

PEP 637 – Support for indexing with keyword arguments

Author:
Stefano Borini
Sponsor:
Steven D’Aprano
Discussions-To:
Python-Ideas list
Status:
Rejected
Type:
Standards Track
Created:
24-Aug-2020
Python-Version:
3.10
Post-History:
23-Sep-2020
Resolution:
Python-Dev thread

Note

This PEP has been rejected. In general, the cost of introducing new syntax was not outweighed by the perceived benefits. See the link in the Resolution header field for details.

Abstract

At present keyword arguments are allowed in function calls, but not in item access. This PEP proposes that Python be extended to allow keyword arguments in item access.

The following example shows keyword arguments for ordinary function calls:

>>> val = f(1, 2, a=3, b=4)

The proposal would extend the syntax to allow a similar construct for indexing operations:

>>> val = x[1, 2, a=3, b=4]  # getitem
>>> x[1, 2, a=3, b=4] = val  # setitem
>>> del x[1, 2, a=3, b=4]    # delitem

and would also provide appropriate semantics. Single- and double-star unpacking of arguments is also provided:

>>> val = x[*(1, 2), **{'a': 3, 'b': 4}]  # Equivalent to above.

This PEP is a successor to PEP 472, which was rejected due to lack of interest in 2019. Since then there has been renewed interest in the feature.

Overview

Background

PEP 472 was opened in 2014. The PEP detailed various use cases and was created by extracting implementation strategies from a broad discussion on the python-ideas mailing list, although no clear consensus was reached on which strategy should be used. Many corner cases were examined more closely and felt awkward, backward incompatible or both.

The PEP was eventually rejected in 2019 [1], mostly due to lack of interest in the feature despite its 5 years of existence.

However, with the introduction of type hints in PEP 484 the square bracket notation has been used consistently to enrich the typing annotations, e.g. to specify a list of integers as Sequence[int]. Additionally, there has been a growth of packages for data analysis such as pandas and xarray, which use names to describe columns in a table (pandas) or axes in an nd-array (xarray). These packages allow users to access specific data by names, but cannot currently use index notation ([]) for this functionality.

As a result, a renewed interest in a more flexible syntax that would allow for named information has been expressed occasionally in many different threads on python-ideas, recently by Caleb Donovick [2] in 2019 and Andras Tantos [3] in 2020. These requests prompted strong activity on the python-ideas mailing list, where the various options have been re-discussed and a general consensus on an implementation strategy has now been reached.

Use cases

The following practical use cases present different situations where a keyword specification would improve notation and provide additional value:

  1. To provide a more communicative meaning to the index, preventing e.g. accidental inversion of indexes:
    >>> grid_position[x=3, y=5, z=8]
    >>> rain_amount[time=0:12, location=location]
    >>> matrix[row=20, col=40]
  2. To enrich the typing notation with keywords, especially during the use of generics:
    def function(value: MyType[T=int]):
  3. In some domains, such as computational physics and chemistry, the use of a notation such as Basis[Z=5] is a Domain Specific Language notation to represent a level of accuracy:
    >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
  4. Pandas currently uses a notation such as:
    >>> df[df['x'] == 1]

    which could be replaced with df[x=1].

  5. xarray has named dimensions. Currently these are handled with functions such as .isel:
    >>> data.isel(row=10)  # Returns the row at index 10

    which could also be replaced with data[row=10]. A more complex example:

    >>> # old syntax
    >>> da.isel(space=0, time=slice(None, 2))[...] = spam
    >>> # new syntax
    >>> da[space=0, time=:2] = spam

    Another example:

    >>> # old syntax
    >>> ds["empty"].loc[dict(lon=5, lat=6)] = 10
    >>> # new syntax
    >>> ds["empty"][lon=5, lat=6] = 10
    >>> # old syntax
    >>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
    >>> # new syntax
    >>> ds["empty"][lon=1:5, lat=3:] = 10
  6. Functions/methods whose argument is another function (plus its arguments) need some way to determine which arguments are destined for the target function, and which are used to configure how they run the target. This is simple (if non-extensible) for positional parameters, but we need some way to distinguish these for keywords. [4]

    An indexed notation would afford a Pythonic way to pass keyword arguments to these functions without cluttering the caller’s code.

    >>> # Let's start this example with basic syntax without keywords.
    >>> # the positional values are arguments to `func` while
    >>> # `name=` is processed by `trio.run`.
    >>> trio.run(func, value1, value2, name="func")
    >>> # `trio.run` ends up calling `func(value1, value2)`.
    >>> # If we want/need to pass value2 by keyword (keyword-only argument,
    >>> # additional arguments that won't break backwards compatibility ...),
    >>> # currently we need to resort to functools.partial:
    >>> trio.run(functools.partial(func, param2=value2), value1, name="func")
    >>> trio.run(functools.partial(func, value1, param2=value2), name="func")
    >>> # One possible workaround is to convert `trio.run` to an object
    >>> # with a `__call__` method, and use an "option" helper,
    >>> trio.run.option(name="func")(func, value1, param2=value2)
    >>> # However, foo(bar)(baz) is uncommon and thus disruptive to the reader.
    >>> # Also, you need to remember the name of the `option` method.
    >>> # This PEP allows us to replace `option` with `__getitem__`.
    >>> # The call is now shorter, more mnemonic, and looks+works like typing
    >>> trio.run[name="func"](func, value1, param2=value2)
  7. Availability of star arguments would benefit PEP 646 Variadic Generics, especially in the forms a[*x] and a[*x, *y, p, q, *z]. The PEP details exactly this notation in its “Unpacking: Star Operator” section.
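The "option" helper mentioned in use case 6 can be sketched in today's Python as follows. This is only an illustration of the pattern: `Runner`, `run` and `greet` are hypothetical names invented here and are not trio's API.

```python
import functools


class Runner:
    """Hypothetical stand-in for a runner such as `trio.run` (not trio's API)."""

    def __init__(self, **config):
        self.config = config

    def option(self, **config):
        # Return a new runner that carries the extra configuration.
        return Runner(**{**self.config, **config})

    def __call__(self, func, *args, **kwargs):
        # Everything passed here is forwarded to the target function.
        return self.config.get("name"), func(*args, **kwargs)


def greet(greeting, *, punctuation="."):
    return f"{greeting}{punctuation}"


run = Runner()

# Keyword arguments for the target wrapped with functools.partial ...
via_partial = run(functools.partial(greet, punctuation="!"), "hello")
# ... or the two-step `option` call described in the use case.
via_option = run.option(name="greeter")(greet, "hello", punctuation="!")
```

With the proposed syntax, `run.option(name="greeter")` would be spelled `run[name="greeter"]`; that spelling is not valid Python, so it is not shown as code.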

It is important to note that how the notation is interpreted is up to the implementation. This PEP only defines and dictates the behavior of Python regarding passed keyword arguments, not how these arguments should be interpreted and used by the implementing class.

Current status of indexing operation

Before detailing the new syntax and semantics of the indexing notation, it is relevant to analyse how the indexing notation works today, in which contexts, and how it is different from a function call.

Subscripting obj[x] is, effectively, an alternate and specialised form of function call syntax with a number of differences and restrictions compared to obj(x). The current Python syntax focuses exclusively on position to express the index, and also contains syntactic sugar to refer to non-punctiform selection (slices). Some common examples:

>>> a[3]       # returns the fourth element of 'a'
>>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
>>> a[3, 2]    # multiple indexes (for multidimensional arrays)

This translates into a __(get|set|del)item__ dunder call which is passed a single parameter containing the index (for __getitem__ and __delitem__) or two parameters containing index and value (for __setitem__).

The behavior of the indexing call is fundamentally different from a function call in various aspects:

The first difference is in meaning to the reader. A function call says “arbitrary function call potentially with side-effects”. An indexing operation says “lookup”, typically to point at a subset or specific sub-aspect of an entity (as in the case of typing notation). This fundamental difference means that, while we cannot prevent abuse, implementors should be aware that the introduction of keyword arguments to alter the behavior of the lookup may violate this intrinsic meaning.

The second difference of the indexing notation compared to a function is that indexing can be used for both getting and setting operations. In Python, a function call cannot be on the left hand side of an assignment. In other words, both of these are valid:

>>> x = a[1, 2]
>>> a[1, 2] = 5

but only the first one of these is valid:

>>> x = f(1, 2)
>>> f(1, 2) = 5  # invalid

This asymmetry is important, and makes one understand that there is a natural imbalance between the two forms. It is therefore not a given that the two should behave transparently and symmetrically.

The third difference is that functions have names assigned to their arguments, unless the passed parameters are captured with *args, in which case they end up as entries in the args tuple. In other words, functions already have anonymous argument semantics, exactly like the indexing operation. However, __(get|set|del)item__ does not always receive a tuple as the index argument (to be uniform in behavior with *args). In fact, given a trivial class:

class X:
    def __getitem__(self, index):
        print(index)

The index operation basically forwards the content of the square brackets “as is” in the index argument:

>>> x = X()
>>> x[0]
0
>>> x[0, 1]
(0, 1)
>>> x[(0, 1)]
(0, 1)
>>>
>>> x[()]
()
>>> x[{1, 2, 3}]
{1, 2, 3}
>>> x["hello"]
hello
>>> x["hello", "hi"]
('hello', 'hi')

The fourth difference is that the indexing operation knows how to convert colon notations to slices, thanks to support from the parser. This is valid:

a[1:3]

this one isn’t:

f(1:3)

The fifth difference is that there’s no zero-argument form. This is valid:

f()

this one isn’t:

a[]
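The fourth and fifth differences can be checked with a short sketch in current Python. The names `Echo` and `is_syntax_error` are illustrative helpers introduced here, not part of the PEP.

```python
class Echo:
    """Returns whatever the interpreter passes as the index."""

    def __getitem__(self, index):
        return index


e = Echo()

single = e[3]        # the object 3, passed as is
sliced = e[1:10:2]   # the parser builds a slice object
multi = e[3, 1:2]    # a tuple mixing an int and a slice


def is_syntax_error(source):
    """Return True if `source` fails to compile as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


call_with_slice = is_syntax_error("f(1:3)")  # colon notation is subscript-only
empty_subscript = is_syntax_error("a[]")     # no zero-argument subscript
empty_call = is_syntax_error("f()")          # a zero-argument call compiles
```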

Specification

Before describing the specification, it is important to stress the difference in nomenclature between positional index, final index and keyword argument, as it is important to understand the fundamental asymmetries at play. The __(get|set|del)item__ is fundamentally an indexing operation, and the way the element is retrieved, set, or deleted is through an index, the final index.

The current status quo is to directly build the final index from what is passed between square brackets, the positional index. In other words, what is passed in the square brackets is trivially used to generate what the code in __getitem__ then uses for the indexing operation. For a dict, for example, d[1] has a positional index of 1 and also a final index of 1 (because it is the element that is then added to the dictionary) and d[1,2] has a positional index of (1,2) and a final index also of (1,2) (because yet again it is the element that is added to the dictionary). However, the positional index d[1,2:3] is not accepted by the dictionary, because there is no way to transform the positional index into a final index, as the slice object is unhashable (this held when the PEP was written; slice objects became hashable in Python 3.12). The positional index is what is currently known as the index parameter in __getitem__. Nevertheless, nothing prevents constructing a dictionary-like class that creates the final index by e.g. converting the positional index to a string.
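As a minimal sketch of the last point, the dictionary-like class below derives its final index from the positional index by taking its repr(). The class name `StringKeyed` and the use of repr() are assumptions made for illustration only.

```python
class StringKeyed:
    """Dict-like container whose final index is repr(positional index)."""

    def __init__(self):
        self._data = {}

    def _final_index(self, index):
        # `index` is the positional index, exactly as written in the brackets.
        return repr(index)

    def __setitem__(self, index, value):
        self._data[self._final_index(index)] = value

    def __getitem__(self, index):
        return self._data[self._final_index(index)]


d = StringKeyed()
d[1, 2:3] = "stored"          # positional index: (1, slice(2, 3, None))
final_keys = list(d._data)    # final index: a plain string
```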

This PEP extends the current status quo, and grants more flexibility to create the final index via an enhanced syntax that combines the positional index and keyword arguments, if passed.

The above brings an important point across. Keyword arguments, in the context of the index operation, may be used to take indexing decisions to obtain the final index, and therefore will have to accept values that are unconventional for functions. See for example use case 1, where a slice is accepted.

The successful implementation of this PEP will result in the following behavior:

  1. An empty subscript is still illegal, regardless of context (see Rejected Ideas):
    obj[]  # SyntaxError
  2. A single index value remains a single index value when passed:
    obj[index]          # calls type(obj).__getitem__(obj, index)
    obj[index] = value  # calls type(obj).__setitem__(obj, index, value)
    del obj[index]      # calls type(obj).__delitem__(obj, index)

    This remains the case even if the index is followed by keywords; see point 5 below.

  3. Comma-separated arguments are still parsed as a tuple and passed as a single positional argument:
    obj[spam, eggs]          # calls type(obj).__getitem__(obj, (spam, eggs))
    obj[spam, eggs] = value  # calls type(obj).__setitem__(obj, (spam, eggs), value)
    del obj[spam, eggs]      # calls type(obj).__delitem__(obj, (spam, eggs))

    The points above mean that classes which do not want to support keyword arguments in subscripts need do nothing at all, and the feature is therefore completely backwards compatible.

  4. Keyword arguments, if any, must follow positional arguments:
    obj[1, 2, spam=None, 3]  # SyntaxError

    This is like function calls, where intermixing positional and keyword arguments gives a SyntaxError.

  5. Keyword subscripts, if any, will be handled like they are in function calls. Examples:
    # Single index with keywords:
    obj[index, spam=1, eggs=2]
    # calls type(obj).__getitem__(obj, index, spam=1, eggs=2)
    obj[index, spam=1, eggs=2] = value
    # calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2)
    del obj[index, spam=1, eggs=2]
    # calls type(obj).__delitem__(obj, index, spam=1, eggs=2)

    # Comma-separated indices with keywords:
    obj[foo, bar, spam=1, eggs=2]
    # calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2)
    obj[foo, bar, spam=1, eggs=2] = value
    # calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2)
    del obj[foo, bar, spam=1, eggs=2]
    # calls type(obj).__delitem__(obj, (foo, bar), spam=1, eggs=2)

    Note that:

    • a single positional index will not turn into a tuple just because one adds a keyword value.
    • for __setitem__, the same order is retained for index and value. The keyword arguments go at the end, as is normal for a function definition.
  6. The same rules apply with respect to keyword subscripts as for keywords in function calls:
    • the interpreter matches up each keyword subscript to a named parameter in the appropriate method;
    • if a named parameter is used twice, that is an error;
    • if there are any named parameters left over (without a value) when the keywords are all used, they are assigned their default value (if any);
    • if any such parameter doesn’t have a default, that is an error;
    • if there are any keyword subscripts remaining after all the named parameters are filled, and the method has a **kwargs parameter, they are bound to the **kwargs parameter as a dict;
    • but if no **kwargs parameter is defined, it is an error.
  7. Sequence unpacking is allowed inside subscripts:
    obj[*items]

    This allows notations such as [:, *args, :], which could be treated as [(slice(None), *args, slice(None))]. Multiple star unpackings are allowed:

    obj[1, *(2, 3), *(4, 5), 6, foo=5]  # Equivalent to obj[(1, 2, 3, 4, 5, 6), foo=5]

    The following notation equivalence must be honored:

    obj[*()]         # Equivalent to obj[()]
    obj[*(), foo=3]  # Equivalent to obj[(), foo=3]
    obj[*(x,)]       # Equivalent to obj[(x,)]
    obj[*(x,),]      # Equivalent to obj[(x,)]

    Note in particular case 3: sequence unpacking of a single element will not behave as if only one single argument was passed. A related case is the following example:

    obj[1, *(), foo=5]  # Equivalent to obj[(1,), foo=5]
                        # calls type(obj).__getitem__(obj, (1,), foo=5)

    However, as we saw earlier, for backward compatibility a single index will be passed as is:

    obj[1, foo=5]  # calls type(obj).__getitem__(obj, 1, foo=5)

    In other words, a single positional index will be passed “as is” only if no sequence unpacking is present. If a sequence unpacking is present, then the index will become a tuple, regardless of the resulting number of elements in the index after the unpacking has taken place.

  8. Dict unpacking is permitted:
    items = {'spam': 1, 'eggs': 2}
    obj[index, **items]  # equivalent to obj[index, spam=1, eggs=2]

    The following notation equivalence should be honored:

    obj[**{}]     # Equivalent to obj[()]
    obj[3, **{}]  # Equivalent to obj[3]
  9. Keyword-only subscripts are permitted. The positional index will be the empty tuple:
    obj[spam=1, eggs=2]      # calls type(obj).__getitem__(obj, (), spam=1, eggs=2)
    obj[spam=1, eggs=2] = 5  # calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2)
    del obj[spam=1, eggs=2]  # calls type(obj).__delitem__(obj, (), spam=1, eggs=2)

    The choice of the empty tuple as a sentinel has been debated. Details are provided in the Rejected Ideas section.

  10. Keyword arguments must allow slice syntax:
    obj[3:4, spam=1:4, eggs=2]
    # calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2)

    This may open up the possibility to accept the same syntax for general function calls, but this is not part of this recommendation.

  11. Keyword arguments allow for default values:
    # Given type(obj).__getitem__(obj, index, spam=True, eggs=2)
    obj[3]              # Valid. index = 3, spam = True, eggs = 2
    obj[3, spam=False]  # Valid. index = 3, spam = False, eggs = 2
    obj[spam=False]     # Valid. index = (), spam = False, eggs = 2
    obj[]               # Invalid.
  12. The same semantics given above must be extended to __class_getitem__: Since PEP 560, type hints are dispatched so that for x[y], if no __getitem__ method is found, and x is a type (class) object, and x has a class method __class_getitem__, that method is called. The same changes should be applied to this method as well, so that a notation like list[T=int] can be accepted.
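Because the proposed syntax is not valid Python, the index-building rules above can only be emulated. The sketch below is a hypothetical helper, `emulate_getitem`, that mimics rules 2, 3, 5, 9 and 11 for the getter; it cannot model star unpacking (rule 7), since a function cannot tell whether its arguments came from an unpacking. `Recorder` is likewise an illustrative class, not part of the proposal.

```python
def emulate_getitem(obj, /, *positional, **keywords):
    """Hypothetical helper mimicking obj[positional..., keywords...]."""
    if not positional and not keywords:
        # obj[] stays a SyntaxError in the proposal; emulated as an exception.
        raise SyntaxError("an empty subscript is illegal")
    if not positional:
        index = ()               # rule 9: keyword-only subscript
    elif len(positional) == 1:
        index = positional[0]    # rules 2 and 5: single index passed as is
    else:
        index = positional       # rules 3 and 5: packed into a tuple
    return type(obj).__getitem__(obj, index, **keywords)


class Recorder:
    def __getitem__(self, index, *, spam=True, eggs=2):
        return index, spam, eggs


obj = Recorder()
a = emulate_getitem(obj, 3)               # like obj[3]
b = emulate_getitem(obj, 3, spam=False)   # like obj[3, spam=False]
c = emulate_getitem(obj, 1, 2, eggs=5)    # like obj[1, 2, eggs=5]
d = emulate_getitem(obj, spam=False)      # like obj[spam=False]
```

Keyword binding, defaults and errors (rule 6) come for free here because the helper ends in an ordinary method call.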

Indexing behavior in standard classes (dict, list, etc.)

None of what is proposed in this PEP will change the behavior of the current core classes that use indexing. Adding keywords to the index operation for custom classes is not the same as modifying e.g. the standard dict type to handle keyword arguments. In fact, dict (as well as list and other stdlib classes with indexing semantics) will remain the same and will continue not to accept keyword arguments. In other words, if d is a dict, the statement d[1, a=2] will raise TypeError, as its implementation will not support the use of keyword arguments. The same holds for all other classes (list, tuple, etc.).

Corner case and Gotchas

With the introduction of the new notation, a few corner cases need to be analysed.

  1. Technically, if a class defines its getter like this:
    def __getitem__(self, index):

    then the caller could call that using keyword syntax, like these two cases:

    obj[3, index=4]
    obj[index=1]

    The resulting behavior would be an error automatically, since it would be like attempting to call the method with two values for the index argument, and a TypeError will be raised. In the first case, the index would be 3, in the second case, it would be the empty tuple ().

    Note that this behavior applies to all currently existing classes that rely on indexing, meaning that there is no way for the new behavior to introduce backward compatibility issues in this respect.

    Classes that wish to stress this behavior explicitly can define their parameters as positional-only:

    def __getitem__(self, index, /):
  2. A similar case occurs with setter notation:
    # Given type(obj).__setitem__(obj, index, value):
    obj[1, value=3] = 5

    This poses no issue because the value is passed automatically, and the Python interpreter will raise TypeError: got multiple values for keyword argument 'value'

  3. If the subscript dunders are declared to use positional-or-keyword parameters, there may be some surprising cases when arguments are passed to the method. Given the signature:
    def __getitem__(self, index, direction='north'):

    if the caller uses this:

    obj[0, 'south']

    they will probably be surprised by the method call:

    # expected type(obj).__getitem__(obj, 0, direction='south')
    # but actually get:
    type(obj).__getitem__(obj, (0, 'south'), direction='north')

    Solution: best practice suggests that keyword subscripts should be flagged as keyword-only when possible:

    def __getitem__(self, index, *, direction='north'):

    The interpreter need not enforce this rule, as there could be scenarios where this is the desired behaviour. But linters may choose to warn about subscript methods which don’t use the keyword-only flag.

  4. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.: d[1, a=3] is treated as __getitem__(d, 1, a=3), NOT __getitem__(d, (1,), a=3). It would be extremely confusing if adding keyword arguments were to change the type of the passed index. In other words, adding a keyword to a single-valued subscript will not change it into a tuple. For those cases where an actual tuple needs to be passed, a proper syntax will have to be used:
    obj[(1,), a=3]  # calls type(obj).__getitem__(obj, (1,), a=3)

    In this case, the call is passing a single element (which is passed as is, as per the rule above), only that the single element happens to be a tuple.

    Note that this behavior just reveals the truth that the obj[1,] notation is shorthand for obj[(1,)] (and also obj[1] is shorthand for obj[(1)], with the expected behavior). When keywords are present, the rule that you can omit this outermost pair of parentheses is no longer true:

    obj[1]          # calls type(obj).__getitem__(obj, 1)
    obj[1, a=3]     # calls type(obj).__getitem__(obj, 1, a=3)
    obj[1,]         # calls type(obj).__getitem__(obj, (1,))
    obj[(1,), a=3]  # calls type(obj).__getitem__(obj, (1,), a=3)

    This is particularly relevant in the case where two entries are passed:

    obj[1, 2]         # calls type(obj).__getitem__(obj, (1, 2))
    obj[(1, 2)]       # same as above
    obj[1, 2, a=3]    # calls type(obj).__getitem__(obj, (1, 2), a=3)
    obj[(1, 2), a=3]  # calls type(obj).__getitem__(obj, (1, 2), a=3)

    And particularly when the tuple is extracted as a variable:

    t = (1, 2)
    obj[t]       # calls type(obj).__getitem__(obj, (1, 2))
    obj[t, a=3]  # calls type(obj).__getitem__(obj, (1, 2), a=3)

    Why? Because in the case obj[1, 2, a=3] we are passing two elements (which are then packed as a tuple and passed as the index). In the case obj[(1, 2), a=3] we are passing a single element (which is passed as is) which happens to be a tuple. The final result is that they are the same.
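Corner cases 1 and 3 can be reproduced in current Python by spelling out the dunder calls that the proposal describes. This is a sketch under that assumption; `Compass` and `StrictCompass` are illustrative class names.

```python
class Compass:
    # Positional-or-keyword `direction`, as in corner case 3.
    def __getitem__(self, index, direction="north"):
        return index, direction


class StrictCompass:
    # `index` is positional-only and `direction` is keyword-only.
    def __getitem__(self, index, /, *, direction="north"):
        return index, direction


obj = Compass()

# Valid today: the whole tuple is the index, `direction` keeps its default.
surprise = obj[0, "south"]

# Corner case 1: obj[3, index=4] would amount to this direct call.
duplicate_error = None
try:
    type(obj).__getitem__(obj, 3, index=4)
except TypeError as exc:
    duplicate_error = str(exc)

# The call that obj[0, direction='south'] would make under the proposal.
strict = StrictCompass()
explicit = type(strict).__getitem__(strict, 0, direction="south")
```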

C Interface

Resolution of the indexing operation is performed through a call to the following functions:

  • PyObject_GetItem(PyObject *o, PyObject *key) for the get operation
  • PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value) for the set operation
  • PyObject_DelItem(PyObject *o, PyObject *key) for the del operation

These functions are used extensively within the Python executable, and are also part of the public C API, as exported by Include/abstract.h. It is clear that the signatures of these functions cannot be changed, and different C level functions need to be implemented to support the extended call. We propose:

  • PyObject_GetItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)
  • PyObject_SetItemWithKeywords(PyObject *o, PyObject *key, PyObject *value, PyObject *kwargs)
  • PyObject_DelItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)

New opcodes will be needed for the enhanced call. Currently, the implementation uses BINARY_SUBSCR, STORE_SUBSCR and DELETE_SUBSCR to invoke the old functions. We propose BINARY_SUBSCR_KW, STORE_SUBSCR_KW and DELETE_SUBSCR_KW for the new operations. The compiler will have to generate these new opcodes. The old C implementations will call the extended methods passing NULL as kwargs.

Finally, the following new slots must be added to the PyMappingMethods struct:

  • mp_subscript_kw
  • mp_ass_subscript_kw

These slots will have the appropriate signature to handle the dictionary object containing the keywords.

“How to teach” recommendations

One request that occurred during feedback sessions was to detail a possible narrative for teaching the feature, e.g. to students, data scientists, and similar audiences. This section addresses that need.

We will only describe the indexing from the perspective of use, not of implementation, because it is the aspect that the above mentioned audience will likely encounter. Only a subset of the users will have to implement their own dunder functions, and this can be considered advanced usage. A proper explanation could be:

The indexing operation is generally used to refer to a subset of a larger dataset by means of an index. In the commonly seen cases, the index is made of one or more numbers, strings, slices, etc.

Some types may allow indexing to occur not only with the index, but also with named values. These named values are given between square brackets using the same syntax used for function call keyword arguments. The meaning of the names and their use is found in the documentation of the type, as it varies from one type to another.

The teacher will now show some practical real world examples, explaining the semantics of the feature in the shown library. At the time of writing these examples do not exist, obviously, but the libraries most likely to implement the feature are pandas and numpy, possibly as a method to refer to columns by name.

Reference Implementation

A reference implementation is currently being developed here[6].

Workarounds

Every PEP that changes the Python language should “clearly explain why the existing language specification is inadequate to address the problem that the PEP solves”.

Some rough equivalents to the proposed extension, which we call work-arounds, are already possible. The work-arounds provide an alternative to enabling the new syntax, while leaving the semantics to be defined elsewhere.

These work-arounds follow. In them the helpers H and P are not intended to be universal. For example, a module or package might require the use of its own helpers.

  1. User defined classes can be given getitem and delitem methods, that respectively get and delete values stored in a container:
    >>> val = x.getitem(1, 2, a=3, b=4)
    >>> x.delitem(1, 2, a=3, b=4)

    The same can’t be done for setitem. It’s not valid syntax:

    >>> x.setitem(1, 2, a=3, b=4) = val
    SyntaxError: can't assign to function call
  2. A helper class, here called H, can be used to swap the container and parameter roles. In other words, we use:
    H(1, 2, a=3, b=4)[x]

    as a substitute for:

    x[1, 2, a=3, b=4]

    This method will work for getitem, delitem and also for setitem. This is because:

    >>> H(1, 2, a=3, b=4)[x] = val

    is valid syntax, which can be given the appropriate semantics.

  3. A helper function, here called P, can be used to store the arguments in a single object. For example:
    >>> x[P(1, 2, a=3, b=4)] = val

    is valid syntax, and can be given the appropriate semantics.

  4. The lo:hi:step syntax for slices is sometimes very useful. This syntax is not directly available in the work-arounds. However:
    s[lo:hi:step]

    provides a work-around that is available everywhere, where:

    class S:
        def __getitem__(self, key): return key
    s = S()

    defines the helper object.
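Work-arounds 2 and 4 can be combined into a runnable sketch. The PEP leaves the semantics of H open, so everything below is one possible reading: `Store` is a hypothetical container with explicit getitem/setitem/delitem methods, and its repr()-based key is an assumption chosen so that slices can take part in the key.

```python
class Store:
    """Hypothetical container with explicit getitem/setitem/delitem methods."""

    def __init__(self):
        self._data = {}

    @staticmethod
    def _key(args, kwargs):
        # repr() keeps the key hashable even when it contains slices.
        return repr((args, sorted(kwargs.items())))

    def getitem(self, *args, **kwargs):
        return self._data[self._key(args, kwargs)]

    def setitem(self, value, *args, **kwargs):
        self._data[self._key(args, kwargs)] = value

    def delitem(self, *args, **kwargs):
        del self._data[self._key(args, kwargs)]


class H:
    """Work-around 2: swap the container and parameter roles."""

    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs

    def __getitem__(self, container):
        return container.getitem(*self.args, **self.kwargs)

    def __setitem__(self, container, value):
        container.setitem(value, *self.args, **self.kwargs)

    def __delitem__(self, container):
        container.delitem(*self.args, **self.kwargs)


class S:
    """Work-around 4: make slice syntax usable outside a real subscript."""

    def __getitem__(self, key):
        return key


s = S()
x = Store()

H(1, 2, a=3, b=4)[x] = "val"          # stands in for x[1, 2, a=3, b=4] = val
fetched = H(1, 2, a=3, b=4)[x]        # stands in for x[1, 2, a=3, b=4]
H(s[0:10:2], axis=1)[x] = "sliced"    # stands in for x[0:10:2, axis=1] = ...
del H(1, 2, a=3, b=4)[x]              # stands in for del x[1, 2, a=3, b=4]
remaining = len(x._data)
```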

Rejected Ideas

Previous PEP 472 solutions

PEP 472 presents a good number of ideas that are now all to be considered rejected. A personal email from D’Aprano to the author specifically said:

I have now carefully read through PEP 472 in full, and I am afraid I cannot support any of the strategies currently in the PEP.

We agree that those options are inferior to the one currently presented, for one reason or another.

To keep this document compact, we will not present here the objections for all options presented in PEP 472. Suffice it to say that they were discussed, and each proposed alternative had one or a few dealbreakers.

Adding new dunders

It was proposed to introduce new dunders __(get|set|del)item_ex__ that are invoked in preference to the __(get|set|del)item__ triad, if they are present.

The rationale around this choice is to make the intuition around how to add keyword argument support to square brackets more obvious and in line with the function behavior. Given:

def __getitem_ex__(self, x, y): ...

These all just work and produce the same result effortlessly:

obj[1, 2]
obj[1, y=2]
obj[y=2, x=1]

In other words, this solution would unify the behavior of __getitem__ with the traditional function signature, but since we can’t change __getitem__ and break backward compatibility, we would have an extended version that is used preferentially.

The problems with this approach were found to be:

  • It will slow down subscripting. For every subscript access, this new dunder attribute gets investigated on the class, and if it is not present then the default key translation function is executed. Different ideas were proposed to handle this, from wrapping the method only at class instantiation time, to adding a bit flag to signal the availability of these methods. Regardless of the solution, the new dunder would be effective only if added at class creation time, not if it is added later. This would be unusual and would disallow monkeypatching of the methods (or make it behave unexpectedly), for whatever reason it might be needed.
  • It adds complexity to the mechanism.
  • It will require a long and painful transition period during which libraries will have to somehow support both calling conventions, because most likely, the extended methods will delegate to the traditional ones when the right conditions are matched in the arguments, or some classes will support the traditional dunder and others the extended dunder. While this will not affect calling code, it will affect development.
  • It would potentially lead to mixed situations where the extended version is defined for the getter, but not for the setter.
  • In the __setitem_ex__ signature, value would have to be made the first element, because the index is of arbitrary length depending on the specified indexes. This would look awkward because the visual notation does not match the signature:
    obj[1, 2] = 3  # calls type(obj).__setitem_ex__(obj, 3, 1, 2)
  • The solution relies on the assumption that all keyword indices necessarily map into positional indices, or that they must have a name. This assumption may be false: xarray, which is the primary Python package for numpy arrays with labelled dimensions, supports indexing by additional dimensions (so called “non-dimension coordinates”) that don’t correspond directly to the dimensions of the underlying numpy array, and those have no position to match up to. In other words, anonymous indexes are a plausible use case that this solution would remove, although it could be argued that using *args would solve that issue.

Adding an adapter function

This is similar to the above, in the sense that a pre-function would be called to convert the “new style” indexing into “old style” indexing that is then passed. It has problems similar to the above.

Create a new “kwslice” object

This proposal has already been explored in “New arguments contents” P4 in PEP 472:

obj[a, b:c, x=1]  # calls type(obj).__getitem__(obj, a, slice(b, c), key(x=1))

This solution requires everyone who needs keyword arguments to parse the tuple and/or key object by hand to extract them. This is painful and forces the get/set/del function to always accept arbitrary keyword arguments, whether they make sense or not. We want the developer to be able to specify which arguments make sense and which ones do not.
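To illustrate the objection, the sketch below shows the kind of hand-written unpacking every implementer would have needed. It assumes the index arrives as a single tuple containing a `key` object; the `key` class and `Grid` are hypothetical stand-ins written for this example, not definitions taken from PEP 472.

```python
class key:
    """Hypothetical stand-in for the keyword-carrying object of this idea."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class Grid:
    def __getitem__(self, index):
        # Every implementer has to repeat this unpacking by hand.
        items = index if isinstance(index, tuple) else (index,)
        positional = tuple(i for i in items if not isinstance(i, key))
        keywords = {}
        for item in items:
            if isinstance(item, key):
                keywords.update(item.kwargs)
        # Unexpected names pass through unless the author checks explicitly.
        return positional, keywords


grid = Grid()
# Stand-in for grid[1, 2:3, x=1] under the kwslice idea.
result = grid[1, 2:3, key(x=1)]
```

With the semantics proposed in this PEP, the method signature itself would declare which keywords are accepted, and the interpreter would reject the rest.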

Using a single bit to change the behavior

A special class dunder flag:

__keyfn__ = True

would change the signature of the __(get|set|del)item__ methods to a “function like” dispatch, meaning that this:

>>> obj[1, 2, z=3]

would result in a call to:

>>> type(obj).__getitem__(obj, 1, 2, z=3)  # instead of type(obj).__getitem__(obj, (1, 2), z=3)

This option has been rejected because it feels odd that the signature of a method depends on a specific value of another dunder. It would be confusing both for static type checkers and for humans: a static type checker would have to hard-code a special case for this, because there really is nothing else in Python where the signature of a dunder depends on the value of another dunder. A human who has to implement a __getitem__ dunder would have to check whether the class (or any of its subclasses) defines __keyfn__ before the dunder can be written. Moreover, adding a base class that has the __keyfn__ flag set would break the signature of the current methods. This would be even more problematic if the flag is changed at runtime, or if the flag is generated by calling a function that randomly returns True or something else.

Allowing for empty index notation obj[]

The current proposal prevents obj[] from being valid notation. However, a commenter stated:

We have Tuple[int, int] as a tuple of two integers. And we have Tuple[int] as a tuple of one integer. And occasionally we need to spell a tuple of no values, since that’s the type of (). But we currently are forced to write that as Tuple[()]. If we allowed Tuple[] that odd edge case would be removed.

So I probably would be okay with allowing obj[] syntactically, as long as the dict type could be made to reject it.

This proposal already established that, in case no positional index is given, the passed value must be the empty tuple. Allowing for the empty index notation would make the dictionary type accept it automatically, to insert or refer to the value with the empty tuple as key. Moreover, a typing notation such as Tuple[] can easily be written as Tuple without the indexing notation.

However, subsequent discussion with Brandt Bucher during implementation has revealed that the case obj[] would fit a natural evolution for variadic generics, giving more strength to the above comment. In the end, after a discussion between D’Aprano, Bucher and the author, we decided to leave the obj[] notation as a syntax error for now, and possibly extend the notation with an additional PEP to treat obj[] as equivalent to obj[()].

Sentinel value for no given positional index

The topic of which value to pass as the index in the case of:

obj[k=3]

has been considerably debated.

One apparently rational choice would be to pass no value at all, by making use of the keyword-only argument feature. Unfortunately, this will not work well with the __setitem__ dunder, as a positional element for the value is always passed, and we can’t “skip over” the index one unless we introduce a very weird behavior where the first argument refers to the index when specified, and to the value when the index is not specified. This is extremely deceiving and error prone.

The above consideration makes it impossible to have a keyword-only dunder, and opens up the question of what entity to pass for the index position when no index is passed:

obj[k=3] = 5  # would call type(obj).__setitem__(obj, ???, 5, k=3)

A proposed hack would be to let the user specify which entity to use when an index is not specified, by specifying a default for the index. This, however, necessarily forces the user to also specify a default for the value (one that will never be used, as a value is always passed by design), as we can’t have non-default arguments after a defaulted one:

def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k)

which seems ugly, redundant and confusing. We must therefore accept that some form of sentinel index must be passed by the Python implementation when the obj[k=3] notation is used. This also means that default arguments to those parameters are simply never going to be used (but that is already the case with the current implementation, so no change there).
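
Both constraints mentioned here can be checked on current Python. The sketch below is illustrative only: the `Recorder` class and the default strings are invented for the example. It first confirms that a defaulted `index` followed by a non-defaulted `value` is rejected by the compiler, then shows that defaults on `__setitem__` parameters are never consulted by ordinary item assignment.

```python
# A parameter without a default cannot follow one with a default.
SOURCE = "def __setitem__(self, index=None, value, *, k): pass"

try:
    compile(SOURCE, "<sketch>", "exec")
except SyntaxError:
    rejected_by_compiler = True
else:
    rejected_by_compiler = False


class Recorder:
    """Hypothetical class whose defaults are never used by obj[i] = v."""

    def __setitem__(self, index="DEFAULT_INDEX", value="NEVER_USED"):
        self.last = (index, value)


recorder = Recorder()
recorder[1] = 2  # the interpreter always passes both index and value
```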

Additionally, some classes may want to use **kwargs, instead of a keyword-only argument, meaning that having a definition like:

def __setitem__(self, index, value, **kwargs):

and a user that wants to pass a keyword value:

x[value=1] = 0

expecting a call like:

type(obj).__setitem__(obj, SENTINEL, 0, **{"value": 1})

will instead accidentally be caught by the named value, producing a duplicate value error. The user should not be worried about the actual local names of those two arguments if they are, for all practical purposes, positional only. Using positional-only parameters will ensure this does not happen; unfortunately, it will still not solve the need to pass both index and value even when the index is not provided. The point is that the user should not be prevented from using keyword arguments to refer to a column index, value (or self) just because the class implementor happens to use those names in the parameter list.
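
The name clash can be reproduced today by spelling out the call that `x[value=1] = 0` would have produced. In this sketch the class names are hypothetical, and `SENTINEL` is set to the empty tuple purely as an assumption for the demonstration. The `/` marker requires Python 3.8 or later.

```python
SENTINEL = ()  # assumption: any placeholder index would do for this demo


class ByName:
    def __setitem__(self, index, value, **kwargs):
        self.last = (index, value, kwargs)


class PositionalOnly:
    def __setitem__(self, index, value, /, **kwargs):
        self.last = (index, value, kwargs)


by_name = ByName()
try:
    # The keyword "value" collides with the parameter of the same name.
    type(by_name).__setitem__(by_name, SENTINEL, 0, **{"value": 1})
except TypeError as exc:
    clash_message = str(exc)
else:
    clash_message = None

pos_only = PositionalOnly()
# With positional-only parameters, "value" lands in kwargs as intended.
type(pos_only).__setitem__(pos_only, SENTINEL, 0, **{"value": 1})
```

Note that even the positional-only version still needs something to be passed in the index slot, which is the remaining problem the text describes.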

Moreover, we also require the three dunders to behave in the same way: it would be extremely inconvenient if only __setitem__ were to receive this sentinel, and __get|delitem__ would not because they can get away with a signature that allows for no index specification, thus allowing for a user-specified default index.

Whatever the choice of the sentinel, it will make the following cases degenerate and thus impossible to differentiate in the dunder:

obj[k=3]
obj[SENTINEL, k=3]

The question now shifts to which entity should represent the sentinel. The options were:

  1. Empty tuple
  2. None
  3. NotImplemented
  4. a new sentinel object (e.g. NoIndex)

For option 1, the call will become:

type(obj).__getitem__(obj, (), k=3)

therefore making obj[k=3] and obj[(), k=3] degenerate and indistinguishable.

This option sounds appealing because:

  1. The numpy community was consulted [5], and the general consensus of the responses was that the empty tuple felt appropriate.
  2. It shows a parallel with the behavior of *args in a function, when no positional arguments are given:
    >>> def foo(*args, **kwargs):
    ...     print(args, kwargs)
    ...
    >>> foo(k=3)
    () {'k': 3}

    We do, however, accept the following asymmetry in behavior compared to functions when a single value is passed, as that ship has sailed:

    >>> foo(5, k=3)
    (5,) {'k': 3}   # for indexing, a plain 5, not a 1-tuple is passed

For option 2, using None, it was objected that NumPy uses it to indicate inserting a new axis/dimension (there’s an np.newaxis alias as well):

arr = np.array(5)
arr.ndim == 0
arr[None].ndim == arr[None,].ndim == 1

While this is not an insurmountable issue, it certainly will ripple onto numpy.

The only issue with both of the above is that the empty tuple and None are both potentially legitimate indexes, and there might be value in being able to differentiate the two degenerate cases.

So, an alternative strategy (option 3) would be to use an existing entity that is unlikely to be used as a valid index. One option could be the current built-in constant NotImplemented, which is currently returned by operator methods to report that they do not implement a particular operation, and that a different strategy should be attempted (e.g. to ask the other object). Unfortunately, its name and traditional use call back to a feature that is not available, rather than the fact that something was not passed by the user.

This leaves us with option 4: a new built-in constant. This constant must be unhashable (so it’s never going to be a valid key) and have a clear name that makes its context obvious: NoIndex. This would solve all the above issues, but the question is: is it worth it?
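
As a rough idea of what option 4 could have looked like, the sketch below defines an unhashable singleton in pure Python. `NoIndex` never became part of Python; the `_NoIndexType` and `Table` classes are hypothetical, and the dunder is called explicitly because the keyword-index syntax does not exist.

```python
class _NoIndexType:
    """Hypothetical sentinel type; not an actual Python built-in."""

    __slots__ = ()
    __hash__ = None  # unhashable, so it can never serve as a dict key

    def __repr__(self):
        return "NoIndex"


NoIndex = _NoIndexType()


class Table:
    def __getitem__(self, index, /, **kwargs):
        if index is NoIndex:
            return "no positional index"
        return f"index {index!r}"


try:
    {NoIndex: 1}
except TypeError:
    usable_as_key = False
else:
    usable_as_key = True

table = Table()
without_index = Table.__getitem__(table, NoIndex, k=3)  # would be obj[k=3]
with_empty_tuple = Table.__getitem__(table, (), k=3)    # would be obj[(), k=3]
```

With such a sentinel the two cases stay distinguishable inside the dunder, at the cost of a new built-in.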

From a quick inquiry, it seems that most people on python-ideas believe it’s not crucial, and the empty tuple is an acceptable option. Hence the resulting series will be:

obj[k=3]        # type(obj).__getitem__(obj, (), k=3). Empty tuple
obj[1, k=3]     # type(obj).__getitem__(obj, 1, k=3). Integer
obj[1, 2, k=3]  # type(obj).__getitem__(obj, (1, 2), k=3). Tuple

and the following two notations will be degenerate:

obj[(), k=3]  # type(obj).__getitem__(obj, (), k=3)
obj[k=3]      # type(obj).__getitem__(obj, (), k=3)
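
The mapping above can be emulated on current Python with a small helper. This is only a sketch of the proposed semantics: `pack_index`, `emulate_getitem` and `Echo` are hypothetical names introduced here, and the helper receives the positional indices as ordinary arguments because the bracket syntax is unavailable.

```python
def pack_index(*positional):
    """Build the index the proposal would pass for the given positional part."""
    if len(positional) == 1:
        return positional[0]  # obj[1, k=3] passes the bare value 1
    return positional         # obj[k=3] passes (), obj[1, 2, k=3] passes (1, 2)


def emulate_getitem(obj, /, *positional, **kwargs):
    """Hypothetical emulation of obj[*positional, **kwargs]."""
    return type(obj).__getitem__(obj, pack_index(*positional), **kwargs)


class Echo:
    def __getitem__(self, index, /, **kwargs):
        return index, kwargs


echo = Echo()
no_index = emulate_getitem(echo, k=3)
one_index = emulate_getitem(echo, 1, k=3)
two_indices = emulate_getitem(echo, 1, 2, k=3)
explicit_empty = emulate_getitem(echo, (), k=3)
```

Here `no_index` and `explicit_empty` compare equal, which is the degenerate pair the text accepts.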

Common objections

  1. Just use a method call.

    One of the use cases is typing, where the indexing is used exclusively, and function calls are out of the question. Moreover, function calls do not handle slice notation, which is commonly used in some cases for arrays.

    One problem is that type hint creation has been extended to built-ins in Python 3.9, so that you no longer have to import Dict, List, et al.

    Without keyword arguments inside [], you would not be able to do this:

    Vector = dict[i=float, j=float]

    but for obvious reasons, call syntax using builtins to create custom type hints isn’t an option:

    dict(i=float, j=float)  # would create a dictionary, not a type

    Finally, function calls do not allow for a setitem-like notation, as shown in the Overview: operations such as f(1, x=3) = 5 are not allowed, whereas the equivalent indexing operations are.
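
The syntactic differences between calls and indexing named in this objection can be verified by asking the compiler directly. The helper name `compiles` is made up for this sketch; nothing is executed, the source strings are only parsed.

```python
def compiles(source):
    """Return True if the source string is syntactically valid Python."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


slice_in_index = compiles("x[1:3]")          # slices are valid inside []
slice_in_call = compiles("f(1:3)")           # but not inside ()
assign_to_index = compiles("x[1] = 5")       # item assignment is valid
assign_to_call = compiles("f(1, x=3) = 5")   # assigning to a call is not
```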

References

[1] “Rejection of PEP 472” (https://mail.python.org/pipermail/python-dev/2019-March/156693.html)
[2] “Allow kwargs in __{get|set|del}item__” (https://mail.python.org/archives/list/python-ideas@python.org/thread/EUGDRTRFIY36K4RM3QRR52CKCI7MIR2M/)
[3] “PEP 472 – Support for indexing with keyword arguments” (https://mail.python.org/archives/list/python-ideas@python.org/thread/6OGAFDWCXT5QVV23OZWKBY4TXGZBVYZS/)
[4] “trio.run() should take **kwargs in addition to *args” (https://github.com/python-trio/trio/issues/470)
[5] “[Numpy-discussion] Request for comments on PEP 637 - Support for indexing with keyword arguments” (http://numpy-discussion.10968.n7.nabble.com/Request-for-comments-on-PEP-637-Support-for-indexing-with-keyword-arguments-td48489.html)
[6] “Reference implementation” (https://github.com/python/cpython/compare/master…stefanoborini:PEP-637-implementation-attempt-2)

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0637.rst

Last modified: 2025-02-01 08:59:27 GMT

