Movatterモバイル変換


[0]ホーム

URL:


Following system colour schemeSelected dark colour schemeSelected light colour scheme

Python Enhancement Proposals

PEP 603 – Adding a frozenmap type to collections

Author:
Yury Selivanov <yury at edgedb.com>
Discussions-To:
Discourse thread
Status:
Draft
Type:
Standards Track
Created:
12-Sep-2019
Post-History:
12-Sep-2019

Table of Contents

Abstract

Apersistent data structure is defined as a data structure thatpreserves the previous version of the data when the data is modified.Such data structures are effectivelyimmutable, as operations onthem do not update the structure in-place, but instead always yielda new updated structure (see[0] for more details.)

This PEP proposes to add a new fully persistent and immutable mappingtype calledfrozenmap to thecollections module.

The bulk offrozenmap’s reference implementation is alreadyused in CPython to implement thecontextvars module.

Rationale

Python has two immutable collection types:tuple andfrozenset. These types can be used to represent immutable listsand sets. However, a way to represent immutablemappings does not yetexist, and this PEP proposes afrozenmap to implement animmutablemapping.

The proposedfrozenmap type:

  • implements thecollections.abc.Mapping protocol,
  • supports pickling, and
  • provides an API for efficient creation of “modified” versions.

The following use cases illustrate why an immutable mapping isdesirable:

  • Immutable mappings are hashable which allows their useas dictionary keys or set elements.

    This hashable property permits functions decorated with@functools.lru_cache() to accept immutable mappings asarguments. Unlike an immutable mapping, passing a plaindictto such a function results in error.

  • Immutable mappings can hold complex state. Since immutable mappingscan be copied by reference, transactional mutation of state can beefficiently implemented.
  • Immutable mappings can be used to safely share dictionaries acrossthread and asynchronous task boundaries. The immutability makes iteasier to reason about threads and asynchronous tasks.

Lastly, CPython[1] already contains the main portion of the C coderequired for thefrozenmap implementation. The C code alreadyexists to implement thecontextvars module (seePEP 567 formore details.) Exposing this C code via a public collection typedrastically increases the number of users of the code. This leads toincreased code quality by discovering bugs and improving performancewhich without afrozenmap collection would be very challengingbecause most programs use thecontextvars module indirectly.

Specification

A new public immutable typefrozenmap is added to thecollections module.

Construction

frozenmap implements adict-like construction API:

  • frozenmap() creates a new empty immutable mapping;
  • frozenmap(**kwargs) creates a mapping from**kwargs, e.g.frozenmap(x=10,y=0,z=-1)
  • frozenmap(collection) creates a mapping from the passedcollection object. The passedcollection object can be:
    • adict,
    • anotherfrozenmap,
    • an object with anitems() method that is expected to returna series of key/value tuples, or
    • an iterable of key/value tuples.

Data Access

frozenmap implements thecollection.abc.Mapping protocol.Therefore, getters, membership checks, and iteration work the sameway that they would for adict:

m=frozenmap(foo='bar')assertm['foo']=='bar'assertm.get('foo')=='bar'assert'foo'inmassert'baz'notinmassertm.get('baz','missing')=='missing'assertm==massertm!=frozenmap()# m is not equal to an empty frozenmapassertlen(m)==1# etc.

Mutation

frozenmap instances are immutable. That said, it is possibleto efficiently produce mutatedcopies of the immutable instance.

The complexity of mutation operations is O(log N) and the resultingfrozenmap copies often consume very little additional memory dueto the use of structural sharing (read[6] for more details.)

frozenmap.including(key, value)

The method creates a newfrozenmap copy with a newkey /valuepair:

m=frozenmap(foo=1)m2=m.including('bar',100)print(m)# will print frozenmap({'foo': 1})print(m2)# will print frozenmap({'foo': 1, 'bar': 100})

frozenmap.excluding(key)

The method produces a copy of thefrozenmap which does notinclude a deletedkey:

m=frozenmap(foo=1,bar=100)m2=m.excluding('foo')print(m)# will print frozenmap({'foo': 1, 'bar': 100})print(m2)# will print frozenmap({'bar': 1})m3=m.excluding('spam')# will throw a KeyError('spam')

frozenmap.union(mapping=None, **kw)

The method produces a copy of thefrozenmap and adds or modifiesmultiple key/values for the created copy. The signature ofthe method matches the signature of thefrozenmap constructor:

m=frozenmap(foo=1)m2=m.union({'spam':'ham'})print(m2)# will print frozenmap({'foo': 1, 'spam': 'ham'})m3=m.union(foo=100,y=2)print(m3)# will print frozenmap({'foo': 100, 'y': 2})print(m)# will print frozenmap({'foo': 1})

Calling theunion() method to add/replace N keys is more efficientthan calling theincluding() method N times.

frozenmap.mutating()

The method allows efficient copying of afrozenmap instance withmultiple modifications applied. This method is especially usefulwhen the frozenmap in question contains thousands of key/value pairsand there’s a need to update many of them in a performance-criticalsection of the code.

Thefrozenmap.mutating() method returns a mutable dict-likecopy of thefrozenmap object: an instance ofcollections.FrozenMapCopy.

TheFrozenMapCopy objects:

  • are copy-on-write views of the data offrozenmap instancesthey were created from;
  • are mutable, although any mutations on them do not affect thefrozenmap instances they were created from;
  • can be passed to thefrozenmap constructor; creating afrozenmap from aFrozenMapCopy object is an O(1)operation;
  • have O(log N) complexity for get/set operations; creatingthem is an O(1) operation;
  • have aFrozenMapCopy.close() method that prevents anyfurther access/mutation of the data;
  • can be used as a context manager.

The below example illustrates howmutating() can be used witha context manager:

numbers=frozenmap((i,i**2)foriinrange(1_000_000))withnumbers.mutating()ascopy:foriinnumbers:ifnot(numbers[i]%997):delcopy[i]numbers_without_997_multiples=frozenmap(copy)# at this point, *numbers* still has 1_000_000 key/values, and# *numbers_without_997_multiples* is a copy of *numbers* without# values that are multiples of 997.foriinnumbers:ifnot(numbers[i]%593):delcopy[i]numbers_without_593_multiples=frozenmap(copy)print(copy[10])# will print 100.print(copy[10])# This will throw a ValueError as *copy*# has been closed when the "with" block# was executed.

Iteration

Asfrozenmap implements the standardcollections.abc.Mappingprotocol, so all expected methods of iteration are supported:

assertlist(m)==['foo']assertlist(m.items())==[('foo','bar')]assertlist(m.keys())==['foo']assertlist(m.values())==['bar']

Iteration infrozenmap, unlike indict, does not preserve theinsertion order.

Hashing

frozenmap instances can be hashable just liketuple objects:

hash(frozenmap(foo='bar'))# workshash(frozenmap(foo=[]))# will throw an error

Typing

It is possible to use the standard typing notation for frozenmaps:

m:frozenmap[str,int]=frozenmap()

Implementation

The proposedfrozenmap immutable type uses a Hash Array MappedTrie (HAMT) data structure. Functional programming languages,like Clojure, use HAMT to efficiently implement immutable hash tables,vectors, and sets.

HAMT

The key design contract of HAMT is the guarantee of a predictablevalue when given the hash of akey. For a pair ofkey andvalue,the hash of thekey can be used to determine the location ofvalue in the hash map tree.

Immutable mappings implemented with HAMT have O(log N) performanceforset() andget() operations. This efficiency is possiblebecause mutation operations only affect one branch of the tree,making it possible to reuse non-mutated branches, and, therefore,avoiding copying of unmodified data.

Read more about HAMT in[5]. The CPython implementation[1] has afairly detailed description of the algorithm as well.

Performance

../_images/pep-0603-hamt_vs_dict.png

Figure 1. Benchmark code can be found here:[3].

The above chart demonstrates that:

  • frozenmap implemented with HAMT displays near O(1) performancefor all benchmarked dictionary sizes.
  • dict.copy() becomes less efficient when using around100-200 items.
../_images/pep-0603-lookup_hamt.png

Figure 2. Benchmark code can be found here:[4].

Figure 2 compares the lookup costs ofdict versus a HAMT-basedimmutable mapping. HAMT lookup time is ~30% slower than Python dictlookups on average. This performance difference exists since traversinga shallow tree is less efficient than lookup in a flat continuous array.

Further to that, quoting[6]: “[using HAMT] means that in practicewhile insertions, deletions, and lookups into a persistent hash arraymapped trie have a computational complexity of O(log n), for mostapplications they are effectively constant time, as it would requirean extremely large number of entries to make any operation take morethan a dozen steps.”

Design Considerations

Why “frozenmap” and not “FrozenMap”

The lower-case “frozenmap” resonates well with thefrozensetbuilt-in as well as with types likecollections.defaultdict.

Why “frozenmap” and not “frozendict”

“Dict” has a very specific meaning in Python:

  • a dict is a concrete implementation ofabc.MutableMapping withO(1) get and set operations (frozenmap has O(log N) complexity);
  • Python dicts preserve insertion order.

The proposedfrozenmap does not have these mentionedproperties. Instead,frozenmap has an O(log N) cost of set/getoperations, and it only implements theabc.Mapping protocol.

Implementation

The full implementation of the proposedfrozenmap type isavailable at[2]. The package includes C and pure Pythonimplementations of the type.

See also the HAMT collection implementation as part of theCPython project tree here:[1].

References

[0]
https://en.wikipedia.org/wiki/Persistent_data_structure
[1] (1,2,3)
https://github.com/python/cpython/blob/3.8/Python/hamt.c
[2]
https://github.com/MagicStack/immutables
[3]
https://gist.github.com/1st1/be5a1c10aceb0775d0406e879cf87344
[4]
https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
[5]
https://en.wikipedia.org/wiki/Hash_array_mapped_trie#cite_note-bagwell-1
[6] (1,2)
https://en.wikipedia.org/wiki/Persistent_data_structure#Trees

Acknowledgments

I thank Carol Willing, Łukasz Langa, Larry Hastings, andGuido van Rossum for their feedback, ideas, edits, and discussionsaround this PEP.

Copyright

This document is placed in the public domain or under theCC0-1.0-Universal license, whichever is more permissive.


Source:https://github.com/python/peps/blob/main/peps/pep-0603.rst

Last modified:2025-02-01 08:59:27 GMT


[8]ページ先頭

©2009-2025 Movatter.jp