Movatterモバイル変換

Associative array

From Wikipedia, the free encyclopedia

Data structure holding key/value pairs

For the C++ implementation, seeAssociative containers (C++). For the higher-order function, seeMap (higher-order function). For the relation used in database systems to resolve many-to-many relationships, seeAssociative entity.Not to be confused withdata dictionary.

Incomputer science, anassociative array,key-value store,map,symbol table, ordictionary is anabstract data type that stores acollection ofkey/value pairs, such that each possible key appears at most once in the collection. In mathematical terms, an associative array is afunction with finitedomain.^[1] It supports 'lookup', 'remove', and 'insert' operations.

Thedictionary problem is the classic problem of designing efficientdata structures that implement associative arrays.^[2]The two major solutions to the dictionary problem arehash tables andsearch trees.^[3]^[4]^[5]^[6]It is sometimes also possible to solve the problem using directly addressedarrays,binary search trees, or other more specialized structures.

Many programming languages include associative arrays asprimitive data types, while many other languages providesoftware libraries that support associative arrays.Content-addressable memory is a form of direct hardware-level support for associative arrays.

Associative arrays have many applications including such fundamentalprogramming patterns asmemoization^[7] and thedecorator pattern.^[8]The name does not come from theassociative property known in mathematics. Rather, it arises from the association of values with keys. It should not be confused withassociative processors.

Operations

[edit]

In an associative array, the association betweena key and a value is often known as a "mapping"; the same word may also be used to refer to the process of creating a new association.

The operations that are usually defined for an associative array are:^[3]^[4]^[9]

Insert or put: add a new $(key,value)$ pair to the collection, mapping the key to its new value. Any existing mapping is overwritten. The arguments to this operation are the key and the value.
Remove or delete: remove a $(key,value)$ pair from the collection, unmapping a given key from its value. The argument to this operation is the key.
Lookup, find, or get: find the value (if any) that is bound to a given key. The argument to this operation is the key, and the value is returned from the operation. If no value is found, some lookup functions raise anexception, while others return a default value (such as zero, null, or a specific value passed to the constructor).

Associative arrays may also include other operations such as determining the number of mappings or constructing aniterator to loop over all the mappings. For such operations, the order in which the mappings are returned is usually implementation-defined.

Amultimap generalizes an associative array by allowing multiple values to be associated with a single key.^[10] Abidirectional map is a related abstract data type in which the mappings operate in both directions: each value must be associated with a unique key, and a second lookup operation takes a value as an argument and looks up the key associated with that value.

Properties

[edit]

The operations of the associative array should satisfy various properties:^[9]

lookup(k, insert(j, v, D)) = if k == j then v else lookup(k, D)
lookup(k, new()) = fail, wherefail is an exception or default value
remove(k, insert(j, v, D)) = if k == j then remove(k, D) else insert(j, v, remove(k, D))
remove(k, new()) = new()

wherek andj are keys,v is a value,D is an associative array, andnew() creates a new, empty associative array.

Example

[edit]

Suppose that the set of loans made by a library is represented in a data structure. Each book in a library may be checked out by one patron at a time. However, a single patron may be able to check out multiple books. Therefore, the information about which books are checked out to which patrons may be represented by an associative array, in which the books are the keys and the patrons are the values. Using notation fromPython orJSON, the data structure would be:

{"Pride and Prejudice":"Alice","Wuthering Heights":"Alice","Great Expectations":"John"}

A lookup operation on the key "Great Expectations" would return "John". If John returns his book, that would cause a deletion operation, and if Pat checks out a book, that would cause an insertion operation, leading to a different state:

{"Pride and Prejudice":"Alice","The Brothers Karamazov":"Pat","Wuthering Heights":"Alice"}

Implementation

[edit]

For dictionaries with very few mappings, it may make sense to implement the dictionary using anassociation list, which is alinked list of mappings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of mappings. However, it is easy to implement and the constant factors in its running time are small.^[3]^[11]

Another very simple implementation technique, usable when the keys are restricted to a narrow range, is direct addressing into an array: the value for a given keyk is stored at the array cellA[k], or if there is no mapping fork then the cell stores a specialsentinel value that indicates the lack of a mapping. This technique is simple and fast, with each dictionary operation taking constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small.^[5]

The two major approaches for implementing dictionaries are ahash table or asearch tree.^[3]^[4]^[5]^[6]

Hash table implementations

[edit]

Main article:Hash table

This graph compares the average number ofCPU cache misses required to look up elements in large hash tables (far exceeding size of the cache) with chaining andlinear probing. Linear probing performs better due to betterlocality of reference, though as the table gets full, its performance degrades drastically.

The most common general-purpose implementation of an associative array is ahash table: anarray combined with ahash function that separates each key into a separate "bucket" of the array. The basic idea behind a hash table is that accessing an element of an array via its index is a simple, constant-time operation. Therefore, the average overhead of an operation for a hash table is only the computation of the key's hash, combined with accessing the corresponding bucket within the array. As such, hash tables usually perform in O(1) time, and usually outperform alternative implementations.

Hash tables must be able to handlecollisions: the mapping by the hash function of two different keys to the same bucket of the array. The two most widespread approaches to this problem areseparate chaining andopen addressing.^[3]^[4]^[5]^[12] In separate chaining, the array does not store the value itself but stores apointer to another container, usually anassociation list, that stores all the values matching the hash. By contrast, in open addressing, if a hash collision is found, the table seeks an empty spot in an array to store the value in a deterministic manner, usually by looking at the next immediate position in the array.

Open addressing has a lowercache miss ratio than separate chaining when the table is mostly empty. However, as the table becomes filled with more elements, open addressing's performance degrades exponentially. Additionally, separate chaining uses less memory in most cases, unless the entries are very small (less than four times the size of a pointer).

Tree implementations

[edit]

Main article:Search tree

Self-balancing binary search trees

[edit]

Another common approach is to implement an associative array with aself-balancing binary search tree, such as anAVL tree or ared–black tree.^[13]

Compared to hash tables, these structures have both strengths and weaknesses. The worst-case performance of self-balancing binary search trees is significantly better than that of a hash table, with a time complexity inbig O notation of O(logn). This is in contrast to hash tables, whose worst-case performance involves all elements sharing a single bucket, resulting in O(n) time complexity. In addition, and like all binary search trees, self-balancing binary search trees keep their elements in order. Thus, traversing its elements follows a least-to-greatest pattern, whereas traversing a hash table can result in elements being in seemingly random order. Because they are in order, tree-based maps can also satisfy range queries (find all values between two bounds) whereas a hashmap can only find exact values. However, hash tables have a much better average-case time complexity than self-balancing binary search trees of O(1), and their worst-case performance is highly unlikely when a goodhash function is used.

A self-balancing binary search tree can be used to implement the buckets for a hash table that uses separate chaining. This allows for average-case constant lookup, but assures a worst-case performance of O(logn). However, this introduces extra complexity into the implementation and may cause even worse performance for smaller hash tables, where the time spent inserting into and balancing the tree is greater than the time needed to perform alinear search on all elements of a linked list or similar data structure.^[14]^[15]

Other trees

[edit]

Associative arrays may also be stored in unbalancedbinary search trees or in data structures specialized to a particular type of keys such asradix trees,tries,Judy arrays, orvan Emde Boas trees, though the relative performance of these implementations varies. For instance, Judy trees have been found to perform less efficiently than hash tables, while carefully selected hash tables generally perform more efficiently than adaptive radix trees, with potentially greater restrictions on the data types they can handle.^[16] The advantages of these alternative structures come from their ability to handle additional associative array operations, such as finding the mapping whose key is the closest to a queried key when the query is absent in the set of mappings.

Comparison

[edit]

Underlying data structure	Lookup or Removal		Insertion		Ordered
Underlying data structure	average	worst case	average	worst case	Ordered
Hash table	O(1)	O(n)	O(1)	O(n)	No
Self-balancing binary search tree	O(logn)	O(logn)	O(logn)	O(logn)	Yes
unbalancedbinary search tree	O(logn)	O(n)	O(logn)	O(n)	Yes
Sequential container ofkey–value pairs (e.g.association list)	O(n)	O(n)	O(1)	O(1)	No

Ordered dictionary

[edit]

The basic definition of a dictionary does not mandate an order. To guarantee a fixed order of enumeration, ordered versions of the associative array are often used. There are two senses of an ordered dictionary:

The order of enumeration is always deterministic for a given set of keys by sorting. This is the case for tree-based implementations, one representative being thestd::map (a tree map) container of C++.^[17]
The order of enumeration is key-independent and is instead based on the order of insertion. This is the case for the "ordered dictionary" in.NET Framework, theLinkedHashMap ofJava andPython.^[18]^[19]^[20]

The latter is more common. Such ordered dictionaries can be implemented using anassociation list, by overlaying adoubly linked list on top of a normal dictionary, or by moving the actual data out of the sparse (unordered) array and into a dense insertion-ordered one.

Language support

[edit]

Main article:Comparison of programming languages (associative array)

Associative arrays can be implemented in any programming language as a package and many language systems provide them as part of their standard library. In some languages, they are not only built into the standard system, but have special syntax, often using array-like subscripting.

Built-in syntactic support for associative arrays was introduced in 1969 bySNOBOL4, under the name "table".TMG offered tables with string keys and integer values.MUMPS made multi-dimensional associative arrays, optionally persistent, its key data structure.SETL supported them as one possible implementation of sets and maps. Most modern scripting languages, starting withAWK and includingRexx,Perl,PHP,Tcl,JavaScript,Maple,Python,Ruby,Wolfram Language,Go, andLua, support associative arrays as a primary container type. In many more languages, they are available as library functions without special syntax.

InSmalltalk,Objective-C,.NET,^[21]Python,REALbasic,Swift,VBA andDelphi^[22] they are calleddictionaries; inPerl andRuby they are calledhashes; inC++,C#,Java,Go,Clojure,Scala,OCaml,Haskell they are calledmaps (seemap (C++),unordered_map (C++), andMap); inCommon Lisp andWindows PowerShell, they are calledhash tables (since both typically use this implementation); inMaple and Lua, they are calledtables. InPHP andR, all arrays can be associative, except that the keys are limited to integers and strings. In JavaScript (see alsoJSON), all objects behave as associative arrays with string-valued keys, while the Map and WeakMap types take arbitrary objects as keys. In Lua, they are used as the primitive building block for all data structures. InVisual FoxPro, they are calledCollections. TheD language also supports associative arrays.^[23]

Permanent storage

[edit]

Main article:Key–value store

Many programs using associative arrays will need to store that data in a more permanent form, such as acomputer file. A common solution to this problem is a generalized concept known asarchiving orserialization, which produces a text or binary representation of the original objects that can be written directly to a file. This is most commonly implemented in the underlying object model, like .Net or Cocoa, which includes standard functions that convert the internal data into text. The program can create a complete text representation of any group of objects by calling these methods, which are almost always already implemented in the base associative array class.^[24]

For programs that use very large data sets, this sort of individual file storage is not appropriate, and adatabase management system (DB) is required. Some DB systems natively store associative arrays by serializing the data and then storing that serialized data and the key. Individual arrays can then be loaded or saved from the database using the key to refer to them. Thesekey–value stores have been used for many years and have a history as long as that of the more commonrelational database (RDBs), but a lack of standardization, among other reasons, limited their use to certain niche roles. RDBs were used for these roles in most cases, although saving objects to a RDB can be complicated, a problem known asobject-relational impedance mismatch.

After approximately 2010, the need for high-performance databases suitable forcloud computing and more closely matching the internal structure of the programs using them led to a renaissance in the key–value store market. These systems can store and retrieve associative arrays in a native fashion, which can greatly improve performance in common web-related workflows.

References

[edit]

^Collins, Graham; Syme, Donald (1995). "A theory of finite maps".Higher Order Logic Theorem Proving and Its Applications. Lecture Notes in Computer Science. Vol. 971. pp. 122–137.doi:10.1007/3-540-60275-5_61.ISBN 978-3-540-60275-0.
^Andersson, Arne (1989). "Optimal Bounds on the Dictionary Problem".Proc. Symposium on Optimal Algorithms. Lecture Notes in Computer Science. Vol. 401. Springer Verlag. pp. 106–114.doi:10.1007/3-540-51859-2_10.ISBN 978-3-540-51859-4.
^^a ^b ^c ^d ^eGoodrich, Michael T.;Tamassia, Roberto (2006), "9.1 The Map Abstract Data Type",Data Structures & Algorithms in Java (4th ed.), Wiley, pp. 368–371
^^a ^b ^c ^dMehlhorn, Kurt;Sanders, Peter (2008), "4 Hash Tables and Associative Arrays",Algorithms and Data Structures: The Basic Toolbox(PDF), Springer, pp. 81–98,archived(PDF) from the original on 2014-08-02
^^a ^b ^c ^dCormen, Thomas H.;Leiserson, Charles E.;Rivest, Ronald L.;Stein, Clifford (2001), "11 Hash Tables",Introduction to Algorithms (2nd ed.),MIT Press andMcGraw-Hill, pp. 221–252,ISBN 0-262-03293-7.
^^a ^bDietzfelbinger, M., Karlin, A., Mehlhorn, K., Meyer auf der Heide, F., Rohnert, H., and Tarjan, R. E. 1994."Dynamic Perfect Hashing: Upper and Lower Bounds"Archived 2016-03-04 at theWayback Machine.SIAM J. Comput. 23, 4 (Aug. 1994), 738-761.http://portal.acm.org/citation.cfm?id=182370doi:10.1137/S0097539791194094
^Michie, Donald (1968)."'Memo' Functions and Machine Learning"(PDF).Nature.218 (5136):19–22.Bibcode:1968Natur.218...19M.doi:10.1038/218019a0.S2CID 4265138.
^Goodrich & Tamassia (2006), pp. 597–599.
^^a ^bBlack, Paul E.; Stewart, Rob (2 November 2020)."dictionary".Dictionary of Algorithms and Data Structures. Retrieved26 January 2022.
^Goodrich & Tamassia (2006), pp. 389–397.
^"When should I use a hash table instead of an association list?". lisp-faq/part2. 1996-02-20.
^Klammer, F.;Mazzolini, L. (2006), "Pathfinders for associative maps",Ext. Abstracts GIS-l 2006, GIS-I, pp. 71–74.
^Joel Adams and Larry Nyhoff."Trees in STL".Quote:"The Standard Template library ... some of its containers -- the set<T>, map<T1, T2>, multiset<T>, and multimap<T1, T2> templates -- are generally built using a special kind ofself-balancing binary search tree called ared–black tree."
^Knuth, Donald (1998).The Art of Computer Programming. Vol. 3:Sorting and Searching (2nd ed.). Addison-Wesley. pp. 513–558.ISBN 0-201-89685-0.
^Probst, Mark (2010-04-30)."Linear vs Binary Search". Retrieved2016-11-20.
^Alvarez, Victor; Richter, Stefan; Chen, Xiao; Dittrich, Jens (April 2015). "A comparison of adaptive radix trees and hash tables".2015 IEEE 31st International Conference on Data Engineering. Seoul, South Korea: IEEE. pp. 1227–1238.doi:10.1109/ICDE.2015.7113370.ISBN 978-1-4799-7964-6.S2CID 17170456.
^"std::map".en.cppreference.com.
^"OrderedDictionary Class (System.Collections.Specialized)".MS Docs.
^"LinkedHashMap".
^"collections — Container datatypes — Python 3.9.0a3 documentation".docs.python.org.
^"Dictionary<TKey, TValue> Class". MSDN.
^"System.Generics.Collections.TDictionary - RAD Studio API Documentation".docwiki.embarcadero.com. Retrieved2017-04-18.
^"Associative Arrays, the D programming language". Digital Mars.
^"Archives and Serializations Programming Guide", Apple Inc., 2012

External links

[edit]

Look upassociative array in Wiktionary, the free dictionary.

NIST's Dictionary of Algorithms and Data Structures: Associative Array

v t e Data structures
Types	Collection Container
Abstract	Associative array Multimap Retrieval Data Structure List Stack Queue Double-ended queue Priority queue Double-ended priority queue Set Multiset Disjoint-set
Arrays	Bit array Circular buffer Dynamic array Hash table Hashed array tree Sparse matrix
Linked	Association list Linked list Skip list Unrolled linked list XOR linked list
Trees	B-tree Binary search tree AA tree AVL tree Red–black tree Self-balancing tree Splay tree Heap Binary heap Binomial heap Fibonacci heap R-tree R* tree R+ tree Hilbert R-tree Rope Trie Hash tree
Graphs	Binary decision diagram Directed acyclic graph Directed acyclic word graph
List of data structures

v t e Data types
Uninterpreted	Bit Byte Trit Tryte Word Bit array
Numeric	Arbitrary-precision or bignum Complex Decimal Fixed point Block floating point Floating point Reduced precision Minifloat Half precision bfloat16 Single precision Double precision Quadruple precision Octuple precision Extended precision Long double Integer signedness Interval Rational
Pointer	Address physical virtual Reference
Text	Character String null-terminated
Composite	Algebraic data type generalized Array Associative array Class Dependent Equality Inductive Intersection List Object metaobject Option type Product Record or Struct Refinement Set Union tagged
Other	Any type Boolean Bottom type Collection Enumerated type Exception Function type Opaque data type Recursive data type Semaphore Stream Strongly typed identifier Type class Empty type Unit type Void
Related topics	Value Abstract data type Boxing Data structure Generic Kind metaclass Parametric polymorphism Primitive data type Interface Subtyping Type constructor Type conversion Type system Type theory Variable