
Python Enhancement Proposals

PEP 584 – Add Union Operators To dict

Author:
Steven D’Aprano <steve at pearwood.info>, Brandt Bucher <brandt at python.org>
BDFL-Delegate:
Guido van Rossum <guido at python.org>
Status:
Final
Type:
Standards Track
Created:
01-Mar-2019
Python-Version:
3.9
Post-History:
01-Mar-2019, 16-Oct-2019, 02-Dec-2019, 04-Feb-2020, 17-Feb-2020
Resolution:
Python-Dev thread

Abstract

This PEP proposes adding merge (|) and update (|=) operators to the built-in dict class.

Note

After this PEP was accepted, the decision was made to also implement the new operators for several other standard library mappings.

Motivation

The current ways to merge two dicts have several disadvantages:

dict.update

d1.update(d2) modifies d1 in-place. e = d1.copy(); e.update(d2) is not an expression and needs a temporary variable.

{**d1, **d2}

Dict unpacking looks ugly and is not easily discoverable. Few people would be able to guess what it means the first time they see it, or think of it as the “obvious way” to merge two dicts.

As Guido said:

I’m sorry for PEP 448, but even if you know about **d in simpler contexts, if you were to ask a typical Python user how to combine two dicts into a new one, I doubt many people would think of {**d1, **d2}. I know I myself had forgotten about it when this thread started!

{**d1, **d2} ignores the types of the mappings and always returns a dict. type(d1)({**d1, **d2}) fails for dict subclasses such as defaultdict that have an incompatible __init__ method.
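The type problem described above can be reproduced in a few lines. This is a minimal sketch, assuming Python 3.9 or later (where defaultdict also supports the merge operator); the sample keys and values are made up for illustration.

```python
from collections import defaultdict

d1 = defaultdict(list, {"spam": [1]})
d2 = {"eggs": [2]}

# Dict unpacking always builds a plain dict, losing the defaultdict type.
unpacked = {**d1, **d2}

# Rebuilding via type(d1)(...) fails: defaultdict expects a callable
# (or None) as its first positional argument, not a mapping.
try:
    rebuilt = type(d1)(unpacked)
except TypeError:
    rebuilt = None

# The merge operator keeps the left operand's type and default_factory.
merged = d1 | d2
```

Here unpacked is a plain dict, the rebuild attempt raises TypeError, and merged is still a defaultdict.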

collections.ChainMap

ChainMap is unfortunately poorly-known and doesn’t qualify as “obvious”. It also resolves duplicate keys in the opposite order to that expected (“first seen wins” instead of “last seen wins”). Like dict unpacking, it is tricky to get it to honor the desired subclass. For the same reason, type(d1)(ChainMap(d2, d1)) fails for some subclasses of dict.

Further, ChainMaps wrap their underlying dicts, so writes to the ChainMap will modify the original dict:

    >>> from collections import ChainMap
    >>> d1 = {'spam': 1}
    >>> d2 = {'eggs': 2}
    >>> merged = ChainMap(d2, d1)
    >>> merged['eggs'] = 999
    >>> d2
    {'eggs': 999}

dict(d1, **d2)

This “neat trick” is not well-known, and only works when d2 is entirely string-keyed:

    >>> d1 = {"spam": 1}
    >>> d2 = {3665: 2}
    >>> dict(d1, **d2)
    Traceback (most recent call last):
      ...
    TypeError: keywords must be strings

Rationale

The new operators will have the same relationship to the dict.update method as the list concatenate (+) and extend (+=) operators have to list.extend. Note that this is somewhat different from the relationship that |/|= have with set.update; the authors have determined that allowing the in-place operator to accept a wider range of types (as list does) is a more useful design, and that restricting the types of the binary operator’s operands (again, as list does) will help avoid silent errors caused by complicated implicit type casting on both sides.

Key conflicts will be resolved by keeping the rightmost value. This matches the existing behavior of similar dict operations, where the last seen value always wins:

    {'a': 1, 'a': 2}
    {**d, **e}
    d.update(e)
    d[k] = v
    {k: v for x in (d, e) for (k, v) in x.items()}

All of the above follow the same rule. This PEP takes the position that this behavior is simple, obvious, usually the behavior we want, and should be the default behavior for dicts. This means that dict union is not commutative; in general d | e != e | d.

Similarly, the iteration order of the key-value pairs in the dictionary will follow the same semantics as the examples above, with each newly added key (and its value) being appended to the current sequence.
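The “last seen wins” rule and the resulting iteration order can be checked side by side. The following is a small sketch, assuming Python 3.9+ for the | operator; the dicts d and e hold made-up values.

```python
d = {"a": 1, "b": 2}
e = {"b": 20, "c": 30}

via_unpacking = {**d, **e}

via_update = d.copy()
via_update.update(e)

via_comprehension = {k: v for x in (d, e) for (k, v) in x.items()}

via_operator = d | e  # requires Python 3.9+

for result in (via_unpacking, via_update, via_comprehension, via_operator):
    # "b" keeps its original position but takes the right-hand value;
    # "c" is new, so it is appended at the end.
    assert list(result.items()) == [("a", 1), ("b", 20), ("c", 30)]

# Swapping the operands changes both the surviving value and the order.
swapped = e | d
assert list(swapped.items()) == [("b", 2), ("c", 30), ("a", 1)]
```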

Specification

Dict union will return a new dict consisting of the left operand merged with the right operand, each of which must be a dict (or an instance of a dict subclass). If a key appears in both operands, the last-seen value (i.e. that from the right-hand operand) wins:

    >>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
    >>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> d | e
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
    >>> e | d
    {'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}

The augmented assignment version operates in-place:

    >>> d |= e
    >>> d
    {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

Augmented assignment behaves identically to the update method called with a single positional argument, so it also accepts anything implementing the Mapping protocol (more specifically, anything with the keys and __getitem__ methods) or iterables of key-value pairs. This is analogous to list += and list.extend, which accept any iterable, not just lists. Continued from above:

    >>> d | [('spam', 999)]
    Traceback (most recent call last):
      ...
    TypeError: can only merge dict (not "list") to dict

    >>> d |= [('spam', 999)]
    >>> d
    {'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}

When new keys are added, their order matches their order within the right-hand mapping, if any exists for its type.
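To illustrate the difference in accepted operand types, the sketch below uses a hypothetical class, EnvLike, that provides only keys and __getitem__. It is not part of the PEP; it assumes Python 3.9+.

```python
class EnvLike:
    """Hypothetical minimal mapping: only keys() and __getitem__."""

    def __init__(self, data):
        self._data = dict(data)

    def keys(self):
        return self._data.keys()

    def __getitem__(self, key):
        return self._data[key]


d = {"spam": 1}
source = EnvLike({"eggs": 2, "spam": 3})

# The binary operator refuses an operand that is not a dict...
try:
    d | source
    binary_failed = False
except TypeError:
    binary_failed = True

# ...while the in-place operator accepts it, just as update() would.
d |= source
```

After the in-place merge, "spam" keeps its position with the new value and "eggs" is appended.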

Reference Implementation

One of the authors has written a C implementation.

An approximate pure-Python implementation is:

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = dict(self)
        new.update(other)
        return new

    def __ror__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        new = dict(other)
        new.update(self)
        return new

    def __ior__(self, other):
        dict.update(self, other)
        return self

Major Objections

Dict Union Is Not Commutative

Union is commutative, but dict union will not be (d | e != e | d).

Response

There is precedent for non-commutative unions in Python:

    >>> {0} | {False}
    {0}
    >>> {False} | {0}
    {False}

While the results may be equal, they are distinctly different. In general, a | b is not the same operation as b | a.

Dict Union Will Be Inefficient

Giving a pipe operator to mappings is an invitation to writing code that doesn’t scale well. Repeated dict union is inefficient: d | e | f | g | h creates and destroys three temporary mappings.

Response

The same argument applies to sequence concatenation.

Sequence concatenation grows with the total number of items in the sequences, leading to O(N**2) (quadratic) performance. Dict union is likely to involve duplicate keys, so the temporary mappings will not grow as fast.

Just as it is rare for people to concatenate large numbers of lists or tuples, the authors of this PEP believe that it will be rare for people to merge large numbers of dicts. collections.Counter is a dict subclass that supports many operators, and there are no known examples of people having performance issues due to combining large numbers of Counters. Further, a survey of the standard library by the authors found no examples of merging more than two dicts, so this is unlikely to be a performance problem in practice… “Everything is fast for small enough N”.

If one expects to be merging a large number of dicts where performance is an issue, it may be better to use an explicit loop and in-place merging:

    new = {}
    for d in many_dicts:
        new |= d
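The loop above can be wrapped in a helper and compared with a chained union. This is only a sketch, assuming Python 3.9+; the name merge_all and the sample data are illustrative and do not come from the PEP.

```python
from functools import reduce
from operator import or_


def merge_all(many_dicts):
    """Merge an iterable of dicts left to right; the last value wins."""
    new = {}
    for d in many_dicts:
        new |= d  # in-place: no temporary dict per step
    return new


many_dicts = [{"a": i, f"k{i}": i} for i in range(5)]

in_place = merge_all(many_dicts)

# Same result, but every step builds a fresh intermediate dict.
chained = reduce(or_, many_dicts, {})
```

Both spellings produce the same mapping; they differ only in how many intermediate dicts are created along the way.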

Dict Union Is Lossy

Dict union can lose data (values may disappear); no other form of union is lossy.

Response

It isn’t clear why the first part of this argument is a problem. dict.update() may throw away values, but not keys; that is expected behavior, and will remain expected behavior regardless of whether it is spelled as update() or |.

Other types of union are also lossy, in the sense of not being reversible; you cannot get back the two operands given only the union. a | b == 365… what are a and b?

Only One Way To Do It

Dict union will violate the Only One Way koan from the Zen.

Response

There is no such koan. “Only One Way” is a calumny about Python originating long ago from the Perl community.

More Than One Way To Do It

Okay, the Zen doesn’t say that there should be Only One Way To Do It. But it does have a prohibition against allowing “more than one way to do it”.

Response

There is no such prohibition. The “Zen of Python” merely expresses a preference for “only one obvious way”:

    There should be one-- and preferably only one --obvious way to do it.

The emphasis here is that there should be an obvious way to do “it”. In the case of dict update operations, there are at least two different operations that we might wish to do:

  • Update a dict in place: The Obvious Way is to use the update() method. If this proposal is accepted, the |= augmented assignment operator will also work, but that is a side-effect of how augmented assignments are defined. Which you choose is a matter of taste.
  • Merge two existing dicts into a third, new dict: This PEP proposes that the Obvious Way is to use the | merge operator.

In practice, this preference for “only one way” is frequently violated in Python. For example, every for loop could be re-written as a while loop; every if block could be written as an if/else block. List, set and dict comprehensions could all be replaced by generator expressions. Lists offer no fewer than five ways to implement concatenation:

  • Concatenation operator: a + b
  • In-place concatenation operator: a += b
  • Slice assignment: a[len(a):] = b
  • Sequence unpacking: [*a, *b]
  • Extend method: a.extend(b)

We should not be too strict about rejecting useful functionality because it violates “only one way”.
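For reference, the five list spellings listed above can be run against the same inputs. This is a small illustrative sketch; the variable names are made up, and copies of a are used so the in-place forms do not disturb the other cases.

```python
a = [1, 2]
b = [3, 4]

with_operator = a + b

in_place = list(a)
in_place += b

with_slice = list(a)
with_slice[len(with_slice):] = b

with_unpacking = [*a, *b]

with_extend = list(a)
with_extend.extend(b)
```

All five results compare equal; the first and fourth build a new list, while the others mutate an existing one.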

Dict Union Makes Code Harder To Understand

Dict union makes it harder to tell what code means. To paraphrase the objection rather than quote anyone in specific: “If I see spam | eggs, I can’t tell what it does unless I know what spam and eggs are”.

Response

This is very true. But it is equally true today, where the use of the | operator could mean any of:

  • int/bool bitwise-or
  • set/frozenset union
  • any other overloaded operation

Adding dict union to the set of possibilities doesn’t seem to make it harder to understand the code. No more work is required to determine that spam and eggs are mappings than it would take to determine that they are sets, or integers. And good naming conventions will help:

    flags |= WRITEABLE  # Probably numeric bitwise-or.
    DO_NOT_RUN = WEEKENDS | HOLIDAYS  # Probably set union.
    settings = DEFAULT_SETTINGS | user_settings | workspace_settings  # Probably dict union.
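A runnable variant of the three lines above, with placeholder values filled in, might look as follows. The names follow the example, but every value assigned here is an assumption made for illustration; Python 3.9+ is assumed for the dict case.

```python
# int: bitwise-or
READABLE, WRITEABLE = 0b01, 0b10
flags = READABLE
flags |= WRITEABLE

# set: union
WEEKENDS = {"Sat", "Sun"}
HOLIDAYS = {"Jan 1"}
DO_NOT_RUN = WEEKENDS | HOLIDAYS

# dict: union, rightmost value wins
DEFAULT_SETTINGS = {"theme": "light", "tabs": 4}
user_settings = {"theme": "dark"}
workspace_settings = {"tabs": 2}
settings = DEFAULT_SETTINGS | user_settings | workspace_settings
```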

What About The Full set API?

dicts are “set like”, and should support the full collection of set operators: |, &, ^, and -.

Response

This PEP does not take a position on whether dicts should support the full collection of set operators, and would prefer to leave that for a later PEP (one of the authors is interested in drafting such a PEP). For the benefit of any later PEP, a brief summary follows.

Set symmetric difference (^) is obvious and natural. For example, given two dicts:

    d1 = {"spam": 1, "eggs": 2}
    d2 = {"ham": 3, "eggs": 4}

the symmetric difference d1 ^ d2 would be {"spam": 1, "ham": 3}.

Set difference (-) is also obvious and natural, and an earlier version of this PEP included it in the proposal. Given the dicts above, we would have d1 - d2 be {"spam": 1} and d2 - d1 be {"ham": 3}.

Set intersection (&) is a bit more problematic. While it is easy to determine the intersection of keys in two dicts, it is not clear what to do with the values. Given the two dicts above, it is obvious that the only key of d1 & d2 must be "eggs". “Last seen wins”, however, has the advantage of consistency with other dict operations (and the proposed union operators).
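Since dicts do not define ^, - or &, the results described above can only be emulated. The sketch below does so with key views and comprehensions, assuming Python 3.9+ for the | step; the “last seen wins” intersection is just one possible choice, as the text notes.

```python
d1 = {"spam": 1, "eggs": 2}
d2 = {"ham": 3, "eggs": 4}

# Symmetric difference: keys present in exactly one operand.
only_one = d1.keys() ^ d2.keys()
sym_diff = {k: v for k, v in (d1 | d2).items() if k in only_one}

# Difference: keys of the left operand that the right one lacks.
diff_12 = {k: v for k, v in d1.items() if k not in d2}
diff_21 = {k: v for k, v in d2.items() if k not in d1}

# Intersection under "last seen wins": shared keys, right-hand values.
intersection = {k: d2[k] for k in d1 if k in d2}
```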

What About Mapping And MutableMapping?

collections.abc.Mapping and collections.abc.MutableMapping should define | and |=, so subclasses could just inherit the new operators instead of having to define them.

Response

There are two primary reasons why adding the new operators to these classes would be problematic:

  • Currently, neither defines a copy method, which would be necessary for | to create a new instance.
  • Adding |= to MutableMapping (or a copy method to Mapping) would create compatibility issues for virtual subclasses.

Rejected Ideas

Rejected Semantics

There were at least four other proposed solutions for handling conflicting keys. These alternatives are left to subclasses of dict.

Raise

It isn’t clear that this behavior has many use-cases or will be often useful, but it will likely be annoying as any use of the dict union operator would have to be guarded with a try/except clause.

Add The Values (As Counter Does, with +)

Too specialised to be used as the default behavior.

Leftmost Value (First-Seen) Wins

It isn’t clear that this behavior has many use-cases. In fact, one can simply reverse the order of the arguments:

    d2 | d1  # d1 merged with d2, keeping existing values in d1

Concatenate Values In A List

This is likely to be too specialised to be the default. It is not clear what to do if the values are already lists:

    {'a': [1, 2]} | {'a': [3, 4]}

Should this give {'a': [1, 2, 3, 4]} or {'a': [[1, 2], [3, 4]]}?
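As an illustration of leaving alternative semantics to subclasses, here is a minimal sketch of the “raise” behavior. StrictDict is a hypothetical name, not something proposed by the PEP, and only the case where it is the left operand is handled.

```python
class StrictDict(dict):
    """Hypothetical dict subclass whose union rejects duplicate keys."""

    def __or__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        clashes = self.keys() & other.keys()
        if clashes:
            raise KeyError(f"conflicting keys: {sorted(clashes, key=repr)}")
        new = type(self)(self)
        new.update(other)
        return new


a = StrictDict(spam=1)
ok = a | {"eggs": 2}

try:
    a | {"spam": 99}
    raised = False
except KeyError:
    raised = True
```

As the text predicts, callers of such a class would need to guard each union with try/except.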

Rejected Alternatives

Use The Addition Operator

This PEP originally started life as a proposal for dict addition, using the + and += operator. That choice proved to be exceedingly controversial, with many people having serious objections to the choice of operator. For details, see previous versions of the PEP and the mailing list discussions.

Use The Left Shift Operator

The << operator didn’t seem to get much support on Python-Ideas, but no major objections either. Perhaps the strongest objection was Chris Angelico’s comment

The “cuteness” value of abusing the operator to indicate information flow got old shortly after C++ did it.

Use A New Left Arrow Operator

Another suggestion was to create a new operator <-. Unfortunately this would be ambiguous, d <- e could mean d merge e or d less-than minus e.

Use A Method

A dict.merged() method would avoid the need for an operator at all. One subtlety is that it would likely need slightly different implementations when called as an unbound method versus as a bound method.

As an unbound method, the behavior could be similar to:

    def merged(cls, *mappings, **kw):
        new = cls()  # Will this work for defaultdict?
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

As a bound method, the behavior could be similar to:

    def merged(self, *mappings, **kw):
        new = self.copy()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

Advantages

  • Arguably, methods are more discoverable than operators.
  • The method could accept any number of positional and keyword arguments, avoiding the inefficiency of creating temporary dicts.
  • Accepts sequences of (key, value) pairs like the update method.
  • Being a method, it is easy to override in a subclass if you need alternative behaviors such as “first wins”, “unique keys”, etc.

Disadvantages

  • Would likely require a new kind of method decorator which combined the behavior of regular instance methods and classmethod. It would need to be public (but not necessarily a builtin) for those needing to override the method. There is a proof of concept.
  • It isn’t an operator. Guido discusses why operators are useful. For another viewpoint, see Alyssa Coghlan’s blog post.
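The combined instance/class decorator mentioned in the first disadvantage could be approximated with a small descriptor. The sketch below is an assumption about how such a decorator might work, not the proof of concept the PEP refers to; hybridmethod and MergingDict are hypothetical names.

```python
from types import MethodType


class hybridmethod:
    """Hypothetical descriptor: binds to the instance when accessed on
    one, otherwise to the class (as classmethod would)."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        return MethodType(self.func, objtype if obj is None else obj)


class MergingDict(dict):
    @hybridmethod
    def merged(target, *mappings, **kw):
        # target is the class for an unbound call, else the instance.
        new = target() if isinstance(target, type) else target.copy()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new


from_class = MergingDict.merged({"a": 1}, {"a": 2}, b=3)
from_instance = MergingDict(x=0).merged([("y", 1)])
```

Note that the bound branch uses copy(), as in the bound-method version above, so for a dict subclass that does not override copy() the result of an instance call is a plain dict.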

Use a Function

Instead of a method, use a new built-in function merged(). One possible implementation could be something like this:

    def merged(*mappings, **kw):
        if mappings and isinstance(mappings[0], dict):
            # If the first argument is a dict, use its type.
            new = mappings[0].copy()
            mappings = mappings[1:]
        else:
            # No positional arguments, or the first argument is a
            # sequence of (key, value) pairs.
            new = dict()
        for m in mappings:
            new.update(m)
        new.update(kw)
        return new

An alternative might be to forgo the arbitrary keywords, and take a single keyword parameter that specifies the behavior on collisions:

    def merged(*mappings, on_collision=lambda k, v1, v2: v2):
        # implementation left as an exercise to the reader
        ...

Advantages

  • Most of the same advantages of the method solutions above.
  • Doesn’t require a subclass to implement alternative behavior on collisions, just a function.

Disadvantages

  • May not be important enough to be a builtin.
  • Hard to override behavior if you need something like “first wins”, without losing the ability to process arbitrary keyword arguments.
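One way the on_collision variant described above might be filled in is sketched here. It is an illustrative implementation, not one given by the PEP; it always returns a plain dict and accepts either mappings or iterables of (key, value) pairs.

```python
def merged(*mappings, on_collision=lambda k, v1, v2: v2):
    """Merge left to right, resolving duplicate keys via on_collision."""
    new = {}
    for m in mappings:
        # dict(m) normalises both mappings and (key, value) iterables.
        for key, value in dict(m).items():
            if key in new:
                new[key] = on_collision(key, new[key], value)
            else:
                new[key] = value
    return new


a = {"x": 1, "y": 2}
b = {"y": 10, "z": 20}

last_wins = merged(a, b)
first_wins = merged(a, b, on_collision=lambda k, v1, v2: v1)
summed = merged(a, b, on_collision=lambda k, v1, v2: v1 + v2)
```

The default callback reproduces “last seen wins”; swapping it out gives “first wins” or value addition without defining a subclass.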

Examples

The authors of this PEP did a survey of third party libraries for dictionary merging which might be candidates for dict union.

This is a cursory list based on a subset of whatever arbitrary third-party packages happened to be installed on one of the authors’ computers, and may not reflect the current state of any package. Also note that, while further (unrelated) refactoring may be possible, the rewritten version only adds usage of the new operators for an apples-to-apples comparison. It also reduces the result to an expression when it is efficient to do so.

IPython/zmq/ipkernel.py

Before:

    aliases = dict(kernel_aliases)
    aliases.update(shell_aliases)

After:

    aliases = kernel_aliases | shell_aliases

IPython/zmq/kernelapp.py

Before:

    kernel_aliases = dict(base_aliases)
    kernel_aliases.update({
        'ip': 'KernelApp.ip',
        'hb': 'KernelApp.hb_port',
        'shell': 'KernelApp.shell_port',
        'iopub': 'KernelApp.iopub_port',
        'stdin': 'KernelApp.stdin_port',
        'parent': 'KernelApp.parent',
    })
    if sys.platform.startswith('win'):
        kernel_aliases['interrupt'] = 'KernelApp.interrupt'

    kernel_flags = dict(base_flags)
    kernel_flags.update({
        'no-stdout': (
            {'KernelApp': {'no_stdout': True}},
            "redirect stdout to the null device"),
        'no-stderr': (
            {'KernelApp': {'no_stderr': True}},
            "redirect stderr to the null device"),
    })

After:

    kernel_aliases = base_aliases | {
        'ip': 'KernelApp.ip',
        'hb': 'KernelApp.hb_port',
        'shell': 'KernelApp.shell_port',
        'iopub': 'KernelApp.iopub_port',
        'stdin': 'KernelApp.stdin_port',
        'parent': 'KernelApp.parent',
    }
    if sys.platform.startswith('win'):
        kernel_aliases['interrupt'] = 'KernelApp.interrupt'

    kernel_flags = base_flags | {
        'no-stdout': (
            {'KernelApp': {'no_stdout': True}},
            "redirect stdout to the null device"),
        'no-stderr': (
            {'KernelApp': {'no_stderr': True}},
            "redirect stderr to the null device"),
    }

matplotlib/backends/backend_svg.py

Before:

    attrib = attrib.copy()
    attrib.update(extra)
    attrib = attrib.items()

After:

    attrib = (attrib | extra).items()

matplotlib/delaunay/triangulate.py

Before:

    edges = {}
    edges.update(dict(zip(self.triangle_nodes[border[:,0]][:,1],
                          self.triangle_nodes[border[:,0]][:,2])))
    edges.update(dict(zip(self.triangle_nodes[border[:,1]][:,2],
                          self.triangle_nodes[border[:,1]][:,0])))
    edges.update(dict(zip(self.triangle_nodes[border[:,2]][:,0],
                          self.triangle_nodes[border[:,2]][:,1])))

Rewrite as:

    edges = {}
    edges |= zip(self.triangle_nodes[border[:,0]][:,1],
                 self.triangle_nodes[border[:,0]][:,2])
    edges |= zip(self.triangle_nodes[border[:,1]][:,2],
                 self.triangle_nodes[border[:,1]][:,0])
    edges |= zip(self.triangle_nodes[border[:,2]][:,0],
                 self.triangle_nodes[border[:,2]][:,1])

matplotlib/legend.py

Before:

    hm = default_handler_map.copy()
    hm.update(self._handler_map)
    return hm

After:

    return default_handler_map | self._handler_map

numpy/ma/core.py

Before:

    _optinfo = {}
    _optinfo.update(getattr(obj, '_optinfo', {}))
    _optinfo.update(getattr(obj, '_basedict', {}))
    if not isinstance(obj, MaskedArray):
        _optinfo.update(getattr(obj, '__dict__', {}))

After:

    _optinfo = {}
    _optinfo |= getattr(obj, '_optinfo', {})
    _optinfo |= getattr(obj, '_basedict', {})
    if not isinstance(obj, MaskedArray):
        _optinfo |= getattr(obj, '__dict__', {})

praw/internal.py

Before:

    data = {'name': six.text_type(user), 'type': relationship}
    data.update(kwargs)

After:

    data = {'name': six.text_type(user), 'type': relationship} | kwargs

pygments/lexer.py

Before:

    kwargs.update(lexer.options)
    lx = lexer.__class__(**kwargs)

After:

    lx = lexer.__class__(**(kwargs | lexer.options))

requests/sessions.py

Before:

    merged_setting = dict_class(to_key_val_list(session_setting))
    merged_setting.update(to_key_val_list(request_setting))

After:

    merged_setting = dict_class(to_key_val_list(session_setting))
    merged_setting |= to_key_val_list(request_setting)

sphinx/domains/__init__.py

Before:

    self.attrs = self.known_attrs.copy()
    self.attrs.update(attrs)

After:

    self.attrs = self.known_attrs | attrs

sphinx/ext/doctest.py

Before:

    new_opt = code[0].options.copy()
    new_opt.update(example.options)
    example.options = new_opt

After:

    example.options = code[0].options | example.options

sphinx/ext/inheritance_diagram.py

Before:

    n_attrs = self.default_node_attrs.copy()
    e_attrs = self.default_edge_attrs.copy()
    g_attrs.update(graph_attrs)
    n_attrs.update(node_attrs)
    e_attrs.update(edge_attrs)

After:

    g_attrs |= graph_attrs
    n_attrs = self.default_node_attrs | node_attrs
    e_attrs = self.default_edge_attrs | edge_attrs

sphinx/highlighting.py

Before:

    kwargs.update(self.formatter_args)
    return self.formatter(**kwargs)

After:

    return self.formatter(**(kwargs | self.formatter_args))

sphinx/quickstart.py

Before:

    d2 = DEFAULT_VALUE.copy()
    d2.update(dict(("ext_"+ext, False) for ext in EXTENSIONS))
    d2.update(d)
    d = d2

After:

    d = DEFAULT_VALUE | dict(("ext_"+ext, False) for ext in EXTENSIONS) | d

sympy/abc.py

Before:

    clash = {}
    clash.update(clash1)
    clash.update(clash2)
    return clash1, clash2, clash

After:

    return clash1, clash2, clash1 | clash2

sympy/parsing/maxima.py

Before:

    dct = MaximaHelpers.__dict__.copy()
    dct.update(name_dict)
    obj = sympify(str, locals=dct)

After:

    obj = sympify(str, locals=MaximaHelpers.__dict__ | name_dict)

sympy/printing/ccode.py and sympy/printing/fcode.py

Before:

    self.known_functions = dict(known_functions)
    userfuncs = settings.get('user_functions', {})
    self.known_functions.update(userfuncs)

After:

    self.known_functions = known_functions | settings.get('user_functions', {})

sympy/utilities/runtests.py

Before:

    globs = globs.copy()
    if extraglobs is not None:
        globs.update(extraglobs)

After:

    globs = globs | (extraglobs if extraglobs is not None else {})

The above examples show that sometimes the | operator leads to a clear increase in readability, reducing the number of lines of code and improving clarity. However other examples using the | operator lead to long, complex single expressions, possibly well over the PEP 8 maximum line length of 80 columns. As with any other language feature, the programmer should use their own judgement about whether | improves their code.

Related Discussions

Mailing list threads (this is by no means an exhaustive list):

Ticket on the bug tracker

Merging two dictionaries in an expression is a frequently requested feature. For example:

https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression

https://stackoverflow.com/questions/1781571/how-to-concatenate-two-dictionaries-to-create-a-new-one-in-python

https://stackoverflow.com/questions/6005066/adding-dictionaries-together-python

Occasionally people request alternative behavior for the merge:

https://stackoverflow.com/questions/1031199/adding-dictionaries-in-python

https://stackoverflow.com/questions/877295/python-dict-add-by-valuedict-2

…including one proposal to treat dicts as sets of keys.

Ian Lee’s proto-PEP, and discussion in 2015. Further discussion took place on Python-Ideas.

(Observant readers will notice that one of the authors of this PEP was more skeptical of the idea in 2015.)

Adding a full complement of operators to dicts.

Discussion on Y-Combinator.

https://treyhunner.com/2016/02/how-to-merge-dictionaries-in-python/

https://code.tutsplus.com/tutorials/how-to-merge-two-python-dictionaries--cms-26230

In direct response to an earlier draft of this PEP, Serhiy Storchaka raised a ticket in the bug tracker to replace the copy(); update() idiom with dict unpacking.

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.


Source: https://github.com/python/peps/blob/main/peps/pep-0584.rst

Last modified: 2025-02-01 08:59:27 GMT

