Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 372 – Adding an ordered dictionary to collections

Author:
Armin Ronacher <armin.ronacher at active-4.com>,Raymond Hettinger <python at rcn.com>
Status:
Final
Type:
Standards Track
Created:
15-Jun-2008
Python-Version:
2.7, 3.1
Post-History:


Table of Contents

Abstract

This PEP proposes an ordered dictionary as a new data structure forthecollections module, called “OrderedDict” in this PEP. Theproposed API incorporates the experiences gained from working withsimilar implementations that exist in various real-world applicationsand other programming languages.

Patch

A working Py3.1 patch including tests and documentation is at:

OrderedDict patch

The check-in was in revisions: 70101 and 70102

Rationale

In current Python versions, the widely used built-in dict type doesnot specify an order for the key/value pairs stored. This makes ithard to use dictionaries as data storage for some specific use cases.

Some dynamic programming languages like PHP and Ruby 1.9 guarantee acertain order on iteration. In those languages, and existing Pythonordered-dict implementations, the ordering of items is defined by thetime of insertion of the key. New keys are appended at the end, butkeys that are overwritten are not moved to the end.

The following example shows the behavior for simple assignments:

>>>d=OrderedDict()>>>d['parrot']='dead'>>>d['penguin']='exploded'>>>d.items()[('parrot', 'dead'), ('penguin', 'exploded')]

That the ordering is preserved makes an OrderedDict useful for a couple ofsituations:

  • XML/HTML processing libraries currently drop the ordering ofattributes, use a list instead of a dict which makes filteringcumbersome, or implement their own ordered dictionary. This affectsElementTree, html5lib, Genshi and many more libraries.
  • There are many ordered dict implementations in various librariesand applications, most of them subtly incompatible with each other.Furthermore, subclassing dict is a non-trivial task and manyimplementations don’t override all the methods properly which canlead to unexpected results.

    Additionally, many ordered dicts are implemented in an inefficientway, making many operations more complex then they have to be.

  • PEP 3115 allows metaclasses to change the mapping object used forthe class body. An ordered dict could be used to create orderedmember declarations similar to C structs. This could be useful, forexample, for futurectypes releases as well as ORMs that definedatabase tables as classes, like the one the Django framework ships.Django currently uses an ugly hack to restore the ordering ofmembers in database models.
  • The RawConfigParser class accepts adict_type argument thatallows an application to set the type of dictionary used internally.The motivation for this addition was expressly to allow users toprovide an ordered dictionary.[1]
  • Code ported from other programming languages such as PHP oftendepends on an ordered dict. Having an implementation of anordering-preserving dictionary in the standard library could easethe transition and improve the compatibility of different libraries.

Ordered Dict API

The ordered dict API would be mostly compatible with dict and existingordered dicts. Note: this PEP refers to the 2.7 and 3.0 dictionaryAPI as described in collections.Mapping abstract base class.

The constructor andupdate() both accept iterables of tuples aswell as mappings like a dict does. Unlike a regular dictionary,the insertion order is preserved.

>>>d=OrderedDict([('a','b'),('c','d')])>>>d.update({'foo':'bar'})>>>dcollections.OrderedDict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])

If ordered dicts are updated from regular dicts, the ordering of newkeys is of course undefined.

All iteration methods as well askeys(),values() anditems() return the values ordered by the time the key wasfirst inserted:

>>>d['spam']='eggs'>>>d.keys()['a', 'c', 'foo', 'spam']>>>d.values()['b', 'd', 'bar', 'eggs']>>>d.items()[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]

New methods not available on dict:

OrderedDict.__reversed__()
Supports reverse iteration by key.

Questions and Answers

What happens if an existing key is reassigned?

The key is not moved but assigned a new value in place. This isconsistent with existing implementations.

What happens if keys appear multiple times in the list passed to theconstructor?

The same as for regular dicts – the latter item overrides theformer. This has the side-effect that the position of the firstkey is used because only the value is actually overwritten:
>>>OrderedDict([('a',1),('b',2),('a',3)])collections.OrderedDict([('a', 3), ('b', 2)])

This behavior is consistent with existing implementations inPython, the PHP array and the hashmap in Ruby 1.9.

Is the ordered dict a dict subclass? Why?

Yes. Likedefaultdict, an ordered dictionary subclassesdict.Being a dict subclass make some of the methods faster (like__getitem__ and__len__). More importantly, being a dictsubclass lets ordered dictionaries be usable with tools like json thatinsist on having dict inputs by testing isinstance(d, dict).

Do any limitations arise from subclassing dict?

Yes. Since the API for dicts is different in Py2.x and Py3.x, theOrderedDict API must also be different. So, the Py2.7 version will needto override iterkeys, itervalues, and iteritems.

DoesOrderedDict.popitem() return a particular key/value pair?

Yes. It pops-off the most recently inserted new key and itscorresponding value. This corresponds to the usual LIFO behaviorexhibited by traditional push/pop pairs. It is semanticallyequivalent tok=list(od)[-1];v=od[k];delod[k];return(k,v).The actual implementation is more efficient and pops directlyfrom a sorted list of keys.

Does OrderedDict support indexing, slicing, and whatnot?

As a matter of fact,OrderedDict does not implement theSequenceinterface. Rather, it is aMutableMapping that remembersthe order of key insertion. The only sequence-like addition issupport forreversed.

A further advantage of not allowing indexing is that it leaves openthe possibility of a fast C implementation using linked lists.

Does OrderedDict support alternate sort orders such as alphabetical?

No. Those wanting different sort orders really need to be using anothertechnique. The OrderedDict is all about recording insertion order. If anyother order is of interest, then another structure (like an in-memorydbm) is likely a better fit.

How well does OrderedDict work with the json module, PyYAML, and ConfigParser?

For json, the good news is that json’s encoder respects OrderedDict’s iteration order:
>>>items=[('one',1),('two',2),('three',3),('four',4),('five',5)]>>>json.dumps(OrderedDict(items))'{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'

In Py2.6, the object_hook for json decoders passes-in an already builtdictionary so order is lost before the object hook sees it. Thisproblem is being fixed for Python 2.7/3.1 by adding a new hook thatpreserves order (seehttps://github.com/python/cpython/issues/49631 ).With the new hook, order can be preserved:

>>>jtext='{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'>>>json.loads(jtext,object_pairs_hook=OrderedDict)OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

For PyYAML, a full round-trip is problem free:

>>>ytext=yaml.dump(OrderedDict(items))>>>printytext!!python/object/apply:collections.OrderedDict- - [one, 1]  - [two, 2]  - [three, 3]  - [four, 4]  - [five, 5]>>>yaml.load(ytext)OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})

For the ConfigParser module, round-tripping is also problem free. Customdicts were added in Py2.6 specifically to support ordered dictionaries:

>>>config=ConfigParser(dict_type=OrderedDict)>>>config.read('myconfig.ini')>>>config.remove_option('Log','error')>>>config.write(open('myconfig.ini','w'))

How does OrderedDict handle equality testing?

Comparing two ordered dictionaries implies that the test will beorder-sensitive so that list(od1.items())==list(od2.items()).

When ordered dicts are compared with other Mappings, their orderinsensitive comparison is used. This allows ordered dictionariesto be substituted anywhere regular dictionaries are used.

How __repr__ format will maintain order during a repr/eval round-trip?

OrderedDict([(‘a’, 1), (‘b’, 2)])

What are the trade-offs of the possible underlying data structures?

  • Keeping a sorted list of keys is fast for all operations except__delitem__() which becomes an O(n) exercise. This data structure leadsto very simple code and little wasted space.
  • Keeping a separate dictionary to record insertion sequence numbers makesthe code a little bit more complex. All of the basic operations are O(1)but the constant factor is increased for __setitem__() and __delitem__()meaning that every use case will have to pay for this speedup (since allbuildup go through __setitem__). Also, the first traversal incurs aone-timeO(nlogn) sorting cost. The storage costs are double thatfor the sorted-list-of-keys approach.
  • A version written in C could use a linked list. The code would be morecomplex than the other two approaches but it would conserve space andwould keep the same big-oh performance as regular dictionaries. It isthe fastest and most space efficient.

Reference Implementation

An implementation with tests and documentation is at:

OrderedDict patch

The proposed version has several merits:

  • Strict compliance with the MutableMapping API and no new methodsso that the learning curve is near zero. It is simply a dictionarythat remembers insertion order.
  • Generally good performance. The big-oh times are the same as regulardictionaries except that key deletion is O(n).

Other implementations of ordered dicts in various Python projects orstandalone libraries, that inspired the API proposed here, are:

Future Directions

With the availability of an ordered dict in the standard library,other libraries may take advantage of that. For example, ElementTreecould return odicts in the future that retain the attribute orderingof the source file.

References

[1]
https://github.com/python/cpython/issues/42649

Copyright

This document has been placed in the public domain.


Source:https://github.com/python/peps/blob/main/peps/pep-0372.rst

Last modified:2025-02-01 08:55:40 GMT


[8]ページ先頭

©2009-2025 Movatter.jp