Python Enhancement Proposals

Python »
PEP Index »
PEP 3106

PEP 3106 – Revamping dict.keys(), .values() and .items()

Author:: Guido van Rossum
Status:

Abstract

This PEP proposes to change the .keys(), .values() and .items()methods of the built-in dict type to return a set-like or unorderedcontainer object whose contents are derived from the underlyingdictionary rather than a list which is a copy of the keys, etc.; andto remove the .iterkeys(), .itervalues() and .iteritems() methods.

The approach is inspired by that taken in the Java CollectionsFramework[1].

Introduction

It has long been the plan to change the .keys(), .values() and.items() methods of the built-in dict type to return a morelightweight object than a list, and to get rid of .iterkeys(),.itervalues() and .iteritems(). The idea is that code that currently(in 2.x) reads:

fork,vind.iteritems():...

should be rewritten as:

fork,vind.items():...

(and similar for .itervalues() and .iterkeys(), except the latter isredundant since we can write that loop asforkind.)

Code that currently reads:

a=d.keys()# assume we really want a list here

(etc.) should be rewritten as

a = list(d.keys())

There are (at least) two ways to accomplish this. The original planwas to simply let .keys(), .values() and .items() return an iterator,i.e. exactly what iterkeys(), itervalues() and iteritems() return inPython 2.x. However, the Java Collections Framework[1] suggeststhat a better solution is possible: the methods return objects withset behavior (for .keys() and .items()) or multiset (== bag) behavior(for .values()) that do not contain copies of the keys, values oritems, but rather reference the underlying dict and pull their valuesout of the dict as needed.

The advantage of this approach is that one can still write code likethis:

a=d.items()fork,vina:...# And later, again:fork,vina:...

Effectively, iter(d.keys()) (etc.) in Python 3.0 will do whatd.iterkeys() (etc.) does in Python 2.x; but in most contexts we don’thave to write the iter() call because it is implied by a for-loop.

The objects returned by the .keys() and .items() methods behave likesets. The object returned by the values() method behaves like a muchsimpler unordered collection – it cannot be a set because duplicatevalues are possible.

Because of the set behavior, it will be possible to check whether twodicts have the same keys by simply testing:

ifa.keys()==b.keys():...

and similarly for .items().

These operations are thread-safe only to the extent that using them ina thread-unsafe way may cause an exception but will not causecorruption of the internal representation.

As in Python 2.x, mutating a dict while iterating over it using aniterator has an undefined effect and will in most cases raise aRuntimeError exception. (This is similar to the guarantees made bythe Java Collections Framework.)

The objects returned by .keys() and .items() are fully interoperablewith instances of the built-in set and frozenset types; for example:

set(d.keys())==d.keys()

is guaranteed to be True (except when d is being modifiedsimultaneously by another thread).

Specification

I’m using pseudo-code to specify the semantics:

classdict:# Omitting all other dict methods for brevity.# The .iterkeys(), .itervalues() and .iteritems() methods# will be removed.defkeys(self):returnd_keys(self)defitems(self):returnd_items(self)defvalues(self):returnd_values(self)classd_keys:def__init__(self,d):self.__d=ddef__len__(self):returnlen(self.__d)def__contains__(self,key):returnkeyinself.__ddef__iter__(self):forkeyinself.__d:yieldkey# The following operations should be implemented to be# compatible with sets; this can be done by exploiting# the above primitive operations:##   <, <=, ==, !=, >=, > (returning a bool)#   &, |, ^, - (returning a new, real set object)## as well as their method counterparts (.union(), etc.).## To specify the semantics, we can specify x == y as:##   set(x) == set(y)   if both x and y are d_keys instances#   set(x) == y        if x is a d_keys instance#   x == set(y)        if y is a d_keys instance## and so on for all other operations.classd_items:def__init__(self,d):self.__d=ddef__len__(self):returnlen(self.__d)def__contains__(self,(key,value)):returnkeyinself.__dandself.__d[key]==valuedef__iter__(self):forkeyinself.__d:yieldkey,self.__d[key]# As well as the set operations mentioned for d_keys above.# However the specifications suggested there will not work if# the values aren't hashable.  Fortunately, the operations can# still be implemented efficiently.  For example, this is how# intersection can be specified:def__and__(self,other):ifisinstance(other,(set,frozenset,d_keys)):result=set()foriteminother:ifiteminself:result.add(item)returnresultifnotisinstance(other,d_items):returnNotImplementedd={}iflen(other)<len(self):self,other=other,selfforiteminself:ifiteminother:key,value=itemd[key]=valuereturnd.items()# And here is equality:def__eq__(self,other):ifisinstance(other,(set,frozenset,d_keys)):iflen(self)!=len(other):returnFalseforiteminother:ifitemnotinself:returnFalsereturnTrueifnotisinstance(other,d_items):returnNotImplemented# XXX We could also just compare the underlying dicts...iflen(self)!=len(other):returnFalseforiteminself:ifitemnotinother:returnFalsereturnTruedef__ne__(self,other):# XXX Perhaps object.__ne__() should be defined this way.result=self.__eq__(other)ifresultisnotNotImplemented:result=notresultreturnresultclassd_values:def__init__(self,d):self.__d=ddef__len__(self):returnlen(self.__d)def__contains__(self,value):# This is slow, and it's what "x in y" uses as a fallback# if __contains__ is not defined; but I'd rather make it# explicit that it is supported.forvinself:ifv==value:returnTruereturnFalsedef__iter__(self):forkeyinself.__d:yieldself.__d[key]def__eq__(self,other):ifnotisinstance(other,d_values):returnNotImplementediflen(self)!=len(other):returnFalse# XXX Sometimes this could be optimized, but these are the# semantics: we can't depend on the values to be hashable# or comparable.olist=list(other)forxinself:try:olist.remove(x)exceptValueError:returnFalseassertolist==[]returnTruedef__ne__(self,other):result=self.__eq__(other)ifresultisnotNotImplemented:result=notresultreturnresult

Notes:

The view objects are not directly mutable, but don’t implement__hash__(); their value can change if the underlying dict is mutated.

The only requirements on the underlying dict are that it implements__getitem__(), __contains__(), __iter__(), and __len__().

We don’t implement .copy() – the presence of a .copy()method suggests that the copy has the same type as the original, butthat’s not feasible without copying the underlying dict. If you wanta copy of a specific type, like list or set, you can just pass oneof the above to the list() or set() constructor.

The specification implies that the order in which itemsare returned by .keys(), .values() and .items() is the same (just asit was in Python 2.x), because the order is all derived from the dictiterator (which is presumably arbitrary but stable as long as a dictisn’t modified). This can be expressed by the following invariant:

list(d.items())==list(zip(d.keys(),d.values()))

Open Issues

Do we need more of a motivation? I would think that being able to doset operations on keys and items without having to copy them shouldspeak for itself.

I’ve left out the implementation of various set operations. Thesecould still present small surprises.

It would be okay if multiple calls to d.keys() (etc.) returned thesame object, since the object’s only state is the dict to which itrefers. Is this worth having extra slots in the dict object for?Should that be a weak reference or should the d_keys (etc.) objectlive forever once created? Strawman: probably not worth the extraslots in every dict.

Should d_keys, d_values and d_items have a public instance variable ormethod through which one can retrieve the underlying dict? Strawman:yes (but what should it be called?).

I’m soliciting better names than d_keys, d_values and d_items. Theseclasses could be public so that their implementations could be reusedby the .keys(), .values() and .items() methods of other mappings. Orshould they?

Should the d_keys, d_values and d_items classes be reusable?Strawman: yes.

Should they be subclassable? Strawman: yes (but see below).

A particularly nasty issue is whether operations that are specified interms of other operations (e.g. .discard()) must really be implementedin terms of those other operations; this may appear irrelevant but itbecomes relevant if these classes are ever subclassed. Historically,Python has a really poor track record of specifying the semantics ofhighly optimized built-in types clearly in such cases; my strawman isto continue that trend. Subclassing may still be useful toadd newmethods, for example.

I’ll leave the decisions (especially about naming) up to whoeversubmits a working implementation.

References

[1] (1,2)

Java Collections Frameworkhttp://java.sun.com/docs/books/tutorial/collections/index.html

Source:https://github.com/python/peps/blob/main/peps/pep-3106.rst