Movatterモバイル変換


[0]ホーム

URL:


Jump to content
WikipediaThe Free Encyclopedia
Search

Trie

This is a good article. Click here for more information.
From Wikipedia, the free encyclopedia
Search tree data structure
This article is about a specific type of tree data structure. For tree data structures generally, seetree (data structure).For other uses of "tree", seetree (disambiguation).
Not to be confused withtri,try, ortray.

Trie
TypeTree
Invented1960
Invented byEdward Fredkin,Axel Thue, and René de la Briandais
Time complexity inbig O notation
OperationAverageWorst case
SearchO(n)O(n)
InsertO(n)O(n)
DeleteO(n)O(n)
Space complexity
SpaceO(n)O(n)
Depiction of a trie. Single empty circle, representing the root node, points to three children. The arrow to each child is marked by a different letter. The children themselves have similar set of arrows and child nodes, with nodes that correspond to full words bearing blue integer values.
A trie for keys "A", "to", "tea", "ted", "ten", "i", "in", and "inn". Each complete English word has an arbitrary integer value associated with it.

In computer science, atrie (/ˈtr/,/ˈtr/), also known as adigital tree orprefix tree,[1] is a specializedsearch tree data structure used to store and retrieve strings from a dictionary or set. Unlike abinary search tree, nodes in a trie do not store their associated key. Instead, each node'sposition within the trie determines its associated key, with the connections between nodes defined by individualcharacters rather than the entire key.

Tries are particularly effective for tasks such as autocomplete, spell checking, and IP routing, offering advantages overhash tables due to their prefix-based organization and lack of hash collisions. Every child node shares a commonprefix with its parent node, and the root node represents theempty string. While basic trie implementations can be memory-intensive, various optimization techniques such as compression and bitwise representations have been developed to improve their efficiency. A notable optimization is theradix tree, which provides more efficient prefix-based storage.

While tries commonly store character strings, they can be adapted to work with any ordered sequence of elements, such aspermutations of digits or shapes. A notable variant is thebitwise trie, which uses individualbits from fixed-length binary data (such asintegers ormemory addresses) as keys.

History, etymology, and pronunciation

[edit]

The idea of a trie for representing a set of strings was first abstractly described byAxel Thue in 1912.[2][3] Tries were first described in a computer context by René de la Briandais in 1959.[4][3][5]: 336 

The idea was independently described in 1960 byEdward Fredkin,[6] who coined the termtrie, pronouncing it/ˈtr/ (as "tree"), after the middle syllable ofretrieval.[7][8] However, other authors pronounce it/ˈtr/ (as "try"), in an attempt to distinguish it verbally from "tree".[7][8][3]

Overview

[edit]

Tries are a form of string-indexed look-up data structure, which is used to store a dictionary list of words that can be searched on in a manner that allows for efficient generation ofcompletion lists.[9][10]: 1  A prefix trie is anordered tree data structure used in the representation of a set of strings over a finite alphabet set, which allows efficient storage of words with common prefixes.[1]

Tries can be efficacious onstring-searching algorithms such aspredictive text,approximate string matching, andspell checking in comparison to binary search trees.[11][8][12]: 358  A trie can be seen as a tree-shapeddeterministic finite automaton.[13]

Operations

[edit]
Trie representation of the string setssea,sells, andshe

Tries support various operations: insertion, deletion, and lookup of a string key. Tries are composed of nodes that contain links, which either point to other suffix child nodes ornull. As for every tree, each node but the root is pointed to by only one other node, called itsparent. Each node contains as many links as the number of characters in the applicablealphabet (although tries tend to have a substantial number of null links). In some cases, the alphabet used is simply that of thecharacter encoding—resulting in, for example, a size of 256 in the case of (unsigned)ASCII.[14]: 732 

The null links within the children of a node emphasize the following characteristics:[14]: 734 [5]: 336 

  1. Characters and string keys are implicitly stored in the trie, and include a charactersentinel value indicating string termination.
  2. Each node contains one possible link to aprefix of strong keys of the set.

A basicstructure type of nodes in the trie is as follows;Node{\displaystyle {\text{Node}}} may contain an optionalValue{\displaystyle {\text{Value}}}, which is associated with each key stored in the last character of string, or terminal node.

structure Node    ChildrenNode[Alphabet-Size]    Is-TerminalBoolean    ValueData-Typeend structure

Searching

[edit]

Searching for a value in a trie is guided by the characters in the search string key, as each node in the trie contains a corresponding link to each possible character in the given string. Thus, following the string within the trie yields the associated value for the given string key. A null link during the search indicates the inexistence of the key.[14]: 732-733 

The following pseudocode implements the search procedure for a given stringkey in a rooted triex.[15]: 135 

Trie-Find(x, key)for 0 ≤ i < key.lengthdoif x.Children[key[i]] = nilthenreturn falseend if        x := x.Children[key[i]]repeatreturn x.Value

In the above pseudocode,x andkey correspond to the pointer of trie's root node and the string key respectively. The search operation, in a standard trie, takesO(dm){\displaystyle O({\text{dm}})} time, wherem{\displaystyle {\text{m}}} is the size of the string parameterkey{\displaystyle {\text{key}}}, andd{\displaystyle {\text{d}}} corresponds to thealphabet size.[16]: 754 Binary search trees, on the other hand, takeO(mlogn){\displaystyle O(m\log n)} in the worst case, since the search depends on the height of the tree (logn{\displaystyle \log n}) of the BST (in case ofbalanced trees), wheren{\displaystyle {\text{n}}} andm{\displaystyle {\text{m}}} being number of keys and the length of the keys.[12]: 358 

The trie occupies less space in comparison with a BST in the case of a large number of short strings, since nodes share common initial string subsequences and store the keys implicitly.[12]: 358  The terminal node of the tree contains a non-null value, and it is a searchhit if the associated value is found in the trie, and searchmiss if it is not.[14]: 733 

Insertion

[edit]

Insertion into trie is guided by using thecharacter sets as indexes to the children array until the last character of the string key is reached.[14]: 733-734  Each node in the trie corresponds to one call of theradix sorting routine, as the trie structure reflects the execution of pattern of the top-down radix sort.[15]: 135 

123456789
Trie-Insert(x, key, value)for 0 ≤ i < key.lengthdoif x.Children[key[i]] = nilthen            x.Children[key[i]] := Node()end if        x := x.Children[key[i]]repeat    x.Value := value    x.Is-Terminal := True

If a null link is encountered prior to reaching the last character of the string key, a new node is created (line 3).[14]: 745  The value of the terminal node is assigned to the input value; therefore, if the former was non-null at the time of insertion, it is substituted with the new value.

Deletion

[edit]

Deletion of akey–value pair from a trie involves finding the terminal node with the corresponding string key, marking the terminal indicator and value tofalse and null correspondingly.[14]: 740 

The following is arecursive procedure for removing a stringkey from rooted trie (x).

123456789101112131415
Trie-Delete(x, key)if key = nilthenif x.Is-Terminal = Truethen            x.Is-Terminal := False            x.Value := nilend iffor 0 ≤ i < x.Children.lengthif x.Children[i] != nilreturn xend ifrepeatreturn nilend if    x.Children[key[0]] := Trie-Delete(x.Children[key[0]], key[1:])return x

The procedure begins by examining thekey; null denotes the arrival of a terminal node or end of a string key. If the node is terminal it has no children, it is removed from the trie (line 14). However, an end of string key without the node being terminal indicates that the key does not exist, thus the procedure does not modify the trie. The recursion proceeds by incrementingkey's index.

Replacing other data structures

[edit]

Replacement for hash tables

[edit]

A trie can be used to replace ahash table, over which it has the following advantages:[12]: 358 

  • Searching for a node with an associated key of sizem{\displaystyle m} has the complexity ofO(m){\displaystyle O(m)}, whereas an imperfect hash function may have numerous colliding keys, and the worst-case lookup speed of such a table would beO(N){\displaystyle O(N)}, whereN{\displaystyle N} denotes the total number of nodes within the table.
  • Tries do not need a hash function for the operation, unlike a hash table; there are also nocollisions of different keys in a trie.
  • Buckets in a trie, which are analogous to hash table buckets that store key collisions, are necessary only if a single key is associated with more than one value.
  • String keys within the trie can be sorted using a predetermined alphabetical ordering.

However, tries are less efficient than a hash table when the data is directly accessed on asecondary storage device such as a hard disk drive that has higherrandom access time than themain memory.[6] Tries are also disadvantageous when the key value cannot be easily represented as string, such asfloating point numbers where multiple representations are possible (e.g. 1 is equivalent to 1.0, +1.0, 1.00, etc.),[12]: 359  however it can be unambiguously represented as abinary number inIEEE 754, in comparison totwo's complement format.[17]

Implementation strategies

[edit]
A trie implemented as aleft-child right-sibling binary tree: vertical arrows arechild pointers, dotted horizontal arrows arenext pointers. The set of strings stored in this trie is{baby, bad, bank, box, dad, dance}. The lists are sorted to allow traversal in lexicographic order.

Tries can be represented in several ways, corresponding to different trade-offs between memory use and speed of the operations.[5]: 341  Using a vector of pointers for representing a trie consumes enormous space; however, memory space can be reduced at the expense of running time if asingly linked list is used for each node vector, as most entries of the vector containsnil{\displaystyle {\text{nil}}}.[3]: 495 

Techniques such asalphabet reduction may reduce the large space requirements by reinterpreting the original string as a longer string over a smaller alphabet i.e. a string ofn bytes can alternatively be regarded as a string of2nfour-bit units and stored in a trie with 16 instead of 256 pointers per node. Although this can reduce memory usage by up to a factor of eight, lookups need to visit twice as many nodes in the worst case.[5]: 347–352  Other techniques include storing a vector of 256 ASCII pointers as a bitmap of 256 bits representing ASCII alphabet, which reduces the size of individual nodes dramatically.[18]

Bitwise tries

[edit]
See also:x-fast trie andBitwise trie with bitmap

Bitwise tries are used to address the enormous space requirement for the trie nodes in a naive simple pointer vector implementations. Each character in the string key set is represented via individual bits, which are used to traverse the trie over a string key. The implementations for these types of trie usevectorized CPU instructions tofind the first set bit in a fixed-length key input (e.g.GCC's__builtin_clz()intrinsic function). Accordingly, the set bit is used to index the first item, or child node, in the 32- or 64-entry based bitwise tree. Search then proceeds by testing each subsequent bit in the key.[19]

This procedure is alsocache-local andhighly parallelizable due toregister independency, and thus performant onout-of-order execution CPUs.[19]

Compressed tries

[edit]
Main article:Radix tree

Radix tree, also known as acompressed trie, is a space-optimized variant of a trie in which any node with only one child gets merged with its parent; elimination of branches of the nodes with a single child results in better metrics in both space and time.[20][21]: 452  This works best when the trie remains static and set of keys stored are very sparse within their representation space.[22]: 3–16 

One more approach is to "pack" the trie, in which a space-efficient implementation of a sparse packed trie applied to automatichyphenation, in which the descendants of each node may be interleaved in memory.[8]

Patricia trees

[edit]
Patricia tree representation of the string set{in, integer, interval, string, structure}.

Patricia trees are a particular implementation of the compressed binary trie that uses thebinary encoding of the string keys in its representation.[23][15]: 140  Every node in a Patricia tree contains an index, known as a "skip number", that stores the node's branching index to avoid empty subtrees during traversal.[15]: 140-141  A naive implementation of a trie consumes immense storage due to larger number of leaf-nodes caused by sparse distribution of keys; Patricia trees can be efficient for such cases.[15]: 142 [24]: 3 

A representation of a Patricia tree is shown to the right. Each index value adjacent to the nodes represents the "skip number"—the index of the bit with which branching is to be decided.[24]: 3  The skip number 1 at node 0 corresponds to the position 1 in the binary encoded ASCII where the leftmost bit differed in the key setX{\displaystyle X}.[24]: 3-4  The skip number is crucial for search, insertion, and deletion of nodes in the Patricia tree, and abit masking operation is performed during every iteration.[15]: 143 

Applications

[edit]

Trie data structures are commonly used inpredictive text orautocomplete dictionaries, andapproximate matching algorithms.[11] Tries enable faster searches, occupy less space, especially when the set contains large number of short strings, thus used inspell checking, hyphenation applications andlongest prefix match algorithms.[8][12]: 358  However, if storing dictionary words is all that is required (i.e. there is no need to store metadata associated with each word), a minimal deterministic acyclic finite state automaton (DAFSA) or radix tree would use less storage space than a trie. This is because DAFSAs and radix trees can compress identical branches from the trie which correspond to the same suffixes (or parts) of different words being stored. String dictionaries are also utilized innatural language processing, such as findinglexicon of atext corpus.[25]: 73 

Sorting

[edit]

Lexicographic sorting of a set of string keys can be implemented by building a trie for the given keys and traversing the tree inpre-order fashion;[26] this is also a form ofradix sort.[27] Tries are also fundamental data structures forburstsort, which is notable for being the fastest string sorting algorithm as of 2007,[28] accomplished by its efficient use of CPUcache.[29]

Full-text search

[edit]

A special kind of trie, called asuffix tree, can be used to index allsuffixes in a text to carry out fast full-text searches.[30]

Web search engines

[edit]

A specialized kind of trie called a compressed trie, is used inweb search engines for storing theindexes - a collection of all searchable words.[31] Each terminal node is associated with a list ofURLs—called occurrence list—to pages that match the keyword. The trie is stored in the main memory, whereas the occurrence is kept in an external storage, frequently in largeclusters, or the in-memory index points to documents stored in an external location.[32]

Bioinformatics

[edit]

Tries are used inBioinformatics, notably insequence alignment software applications such asBLAST, which indexes all the different substring of lengthk (calledk-mers) of a text by storing the positions of their occurrences in a compressed trie sequence databases.[25]: 75 

Internet routing

[edit]
See also:Luleå algorithm

Compressed variants of tries, such as databases for managingForwarding Information Base (FIB), are used in storingIP address prefixes withinrouters andbridges for prefix-based lookup to resolvemask-based operations inIP routing.[25]: 75 

See also

[edit]

References

[edit]
  1. ^abMaabar, Maha (17 November 2014)."Trie Data Structure". CVR,University of Glasgow.Archived from the original on 27 January 2021. Retrieved17 April 2022.
  2. ^Thue, Axel (1912)."Über die gegenseitige Lage gleicher Teile gewisser Zeichenreihen".Skrifter Udgivne Af Videnskabs-Selskabet I Christiania.1912 (1):1–67. Cited by Knuth.
  3. ^abcdKnuth, Donald (1997). "6.3: Digital Searching".The Art of Computer Programming Volume 3: Sorting and Searching (2nd ed.). Addison-Wesley. p. 492.ISBN 0-201-89685-0.
  4. ^de la Briandais, René (1959).File searching using variable length keys(PDF). Proc. Western J. Computer Conf. pp. 295–298.doi:10.1145/1457838.1457895.S2CID 10963780. Archived fromthe original(PDF) on 2020-02-11. Cited by Brass and by Knuth.
  5. ^abcdBrass, Peter (8 September 2008).Advanced Data Structures.UK:Cambridge University Press.doi:10.1017/CBO9780511800191.ISBN 978-0521880374.
  6. ^abEdward Fredkin (1960)."Trie Memory".Communications of the ACM.3 (9):490–499.doi:10.1145/367390.367400.S2CID 15384533.
  7. ^abBlack, Paul E. (2009-11-16)."trie".Dictionary of Algorithms and Data Structures.National Institute of Standards and Technology.Archived from the original on 2011-04-29.
  8. ^abcdeFranklin Mark Liang (1983).Word Hy-phen-a-tion By Com-put-er(PDF) (Doctor of Philosophy thesis). Stanford University.Archived(PDF) from the original on 2005-11-11. Retrieved2010-03-28.
  9. ^"Trie". School of Arts and Science,Rutgers University. 2022.Archived from the original on 17 April 2022. Retrieved17 April 2022.
  10. ^Connelly, Richard H.; Morris, F. Lockwood (1993)."A generalization of the trie data structure".Mathematical Structures in Computer Science.5 (3).Syracuse University:381–418.doi:10.1017/S0960129500000803.S2CID 18747244.
  11. ^abAho, Alfred V.; Corasick, Margaret J. (Jun 1975)."Efficient String Matching: An Aid to Bibliographic Search".Communications of the ACM.18 (6):333–340.doi:10.1145/360825.360855.S2CID 207735784.
  12. ^abcdefThareja, Reema (13 October 2018). "Hashing and Collision".Data Structures Using C (2 ed.).Oxford University Press.ISBN 9780198099307.
  13. ^Daciuk, Jan (24 June 2003).Comparison of Construction Algorithms for Minimal, Acyclic, Deterministic, Finite-State Automata from Sets of Strings. International Conference on Implementation and Application of Automata.Springer Publishing. pp. 255–261.doi:10.1007/3-540-44977-9_26.ISBN 978-3-540-40391-3.
  14. ^abcdefgSedgewick, Robert; Wayne, Kevin (3 April 2011).Algorithms (4 ed.).Addison-Wesley,Princeton University.ISBN 978-0321573513.
  15. ^abcdefGonnet, G. H.; Yates, R. Baeza (January 1991).Handbook of algorithms and data structures: in Pascal and C (2 ed.).Boston,United States:Addison-Wesley.ISBN 978-0-201-41607-7.
  16. ^Patil, Varsha H. (10 May 2012).Data Structures using C++.Oxford University Press.ISBN 9780198066231.
  17. ^S. Orley; J. Mathews."The IEEE 754 Format". Department of Mathematics and Computer Science,Emory University.Archived from the original on 28 March 2022. Retrieved17 April 2022.
  18. ^Bellekens, Xavier (2014). "A Highly-Efficient Memory-Compression Scheme for GPU-Accelerated Intrusion Detection Systems".Proceedings of the 7th International Conference on Security of Information and Networks - SIN '14. Glasgow, Scotland, UK: ACM. pp. 302:302–302:309.arXiv:1704.02272.doi:10.1145/2659651.2659723.ISBN 978-1-4503-3033-6.S2CID 12943246.
  19. ^abWillar, Dan E. (27 January 1983)."Log-logarithmic worst-case range queries are possible in space O(n)".Information Processing Letters.17 (2):81–84.doi:10.1016/0020-0190(83)90075-3.
  20. ^Sartaj Sahni (2004)."Data Structures, Algorithms, & Applications in C++: Tries".University of Florida.Archived from the original on 3 July 2016. Retrieved17 April 2022.
  21. ^Mehta, Dinesh P.; Sahni, Sartaj (7 March 2018). "Tries".Handbook of Data Structures and Applications (2 ed.).Chapman & Hall,University of Florida.ISBN 978-1498701853.
  22. ^Jan Daciuk; Stoyan Mihov; Bruce W. Watson; Richard E. Watson (1 March 2000)."Incremental Construction of Minimal Acyclic Finite-State Automata".Computational Linguistics.26 (1).MIT Press:3–16.arXiv:cs/0007009.Bibcode:2000cs........7009D.doi:10.1162/089120100561601.
  23. ^"Patricia tree".National Institute of Standards and Technology.Archived from the original on 14 February 2022. Retrieved17 April 2022.
  24. ^abcCrochemore, Maxime; Lecroq, Thierry (2009). "Trie".Encyclopedia of Database Systems.Boston,United States:Springer Publishing.Bibcode:2009eds..book.....L.doi:10.1007/978-0-387-39940-9.ISBN 978-0-387-49616-0 – viaHAL (open archive).
  25. ^abcMartinez-Prieto, Miguel A.; Brisaboa, Nieves; Canovas, Rodrigo; Claude, Francisco; Navarro, Gonzalo (March 2016)."Practical compressed string dictionaries".Information Systems.56.Elsevier:73–108.doi:10.1016/j.is.2015.08.008.ISSN 0306-4379.
  26. ^Kärkkäinen, Juha."Lecture 2"(PDF).University of Helsinki.The preorder of the nodes in a trie is the same as the lexicographical order of the strings they represent assuming the children of a node are ordered by the edge labels.
  27. ^Kallis, Rafael (2018)."The Adaptive Radix Tree (Report #14-708-887)"(PDF).University of Zurich: Department of Informatics, Research Publications.
  28. ^Ranjan Sinha and Justin Zobel and David Ring (Feb 2006)."Cache-Efficient String Sorting Using Copying"(PDF).ACM Journal of Experimental Algorithmics.11:1–32.doi:10.1145/1187436.1187439.S2CID 3184411.
  29. ^J. Kärkkäinen and T. Rantala (2008). "Engineering Radix Sort for Strings". In A. Amir and A. Turpin and A. Moffat (ed.).String Processing and Information Retrieval, Proc. SPIRE. Lecture Notes in Computer Science. Vol. 5280. Springer. pp. 3–14.doi:10.1007/978-3-540-89097-3_3.ISBN 978-3-540-89096-6.
  30. ^Giancarlo, Raffaele (28 May 1992)."A Generalization of the Suffix Tree to Square Matrices, with Applications".SIAM Journal on Computing.24 (3).Society for Industrial and Applied Mathematics:520–562.doi:10.1137/S0097539792231982.ISSN 0097-5397.
  31. ^Yang, Lai; Xu, Lida; Shi, Zhongzhi (23 March 2012). "An enhanced dynamic hash TRIE algorithm for lexicon search".Enterprise Information Systems.6 (4):419–432.Bibcode:2012EntIS...6..419Y.doi:10.1080/17517575.2012.665483.S2CID 37884057.
  32. ^Transier, Frederik; Sanders, Peter (December 2010)."Engineering basic algorithms of an in-memory text search engine".ACM Transactions on Information Systems.29 (1).Association for Computing Machinery:1–37.doi:10.1145/1877766.1877768.S2CID 932749.

External links

[edit]
Wikimedia Commons has media related toTrie.
Look uptrie in Wiktionary, the free dictionary.
Search trees
(dynamic sets/associative arrays)
Heaps
Tries
Spatial data partitioning trees
Other trees
Types
Abstract
Arrays
Linked
Trees
Graphs
String metric
String-searching algorithm
Multiple string searching
Regular expression
Sequence alignment
Data structure
Other
Retrieved from "https://en.wikipedia.org/w/index.php?title=Trie&oldid=1283788852"
Categories:
Hidden categories:

[8]ページ先頭

©2009-2025 Movatter.jp