- Notifications
You must be signed in to change notification settings - Fork13
Trie implementation in Elixir
License
Rob-bie/retrieval
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Retrieval is a tradional trie implementation in pure Elixir that supports pattern based lookup and a varietyof other functionality. Documentation can be foundHERE.
Add Retrieval to yourmix.exs
dependencies:
defdepsdo[{:retrieval,"~> 0.9.1"}]end
There's three ways to create a new trie struct. Provide zero arguments to create an empty trie,provide a binary key or list of binary keys to create a trie with nodes.
Retrieval.new%Retrieval.Trie{trie:...}Retrieval.new("example")%Retrieval.Trie{trie:...}Retrieval.new(~w/this is an example/)%Retrieval.Trie{trie:...}
For the rest of our examples we are going to assume we have created the following trie:
trie=Retrieval.new(~w/apple apply ape bed between betray cat noon hot warm winter boob smash crush under above people negative poison place out divide zebra extended/)
Just like creating a new trie you can either provide a single binary key or a list of binary keys forinsertion.
Retrieval.insert(trie,"elephant")|>Retrieval.insert("banana")|>Retrieval.insert(~w/cheese orange alchemy/)
Checking if a trie contains a binary key is quick and easy.
Retrieval.contains?(trie,"apple")trueRetrieval.contains?(trie,"extended")trueRetrieval.contains?(trie,"doodle")false
Looking up all binary keys that begin with a certain prefix is very common.
Retrieval.prefix(trie,"a")["above","ape","apple","apply"]Retrieval.prefix(trie,"app")["apple","apply"]Retrieval.prefix(trie,"be")["bed","betray","between"]Retrieval.prefix(trie,"th")[]
Retrieval supports a variety of patterns that can be arbitrarily combined. Erroneous patternsreturn a tuple of the form{:error, reason}
.
The wildcard pattern*
simply matches onany character.
# Three letter word# "*" - Matches any character# "p" - Matches "p"# "e" - Matches "e"Retrieval.pattern(trie,"*pe")["ape"]# Five letter word# "a" - Matches "a"# "*" - Matches any character# "*" - Matches any character# "*" - Matches any character# "y" - Matches "y"Retrieval.pattern(trie,"a***y")["apply"]# Three letter word# "*" - Matches any character# "*" - Matches any character# "*" - Matches any characterRetrieval.pattern(trie,"***")["ape","bed","cat","hot","out"]# "*" - Matches any character# "n" - Matches "n"# Two letter word that ends with "n"Retrieval.pattern(trie,"*n")[]
The inclusion group pattern[...]
matches on any characterinside of the brackets.
# Three letter word# "[ab]" - Matches "a" or "b"# "*" - Matches any character# "*" - Matches any characterRetrieval.pattern(trie,"[ab]**")["ape","bed"]# Three letter word# "[ab]" - Matches "a" or "b"# "*" - Matches any character# "d" - Matches "d"Retrieval.pattern(trie,"[ab]*d")["bed"]# Five letter word# "[abz]" - Matches "a", "b", or, "z"# "[eb]" - Matches "e" or "b"# "*" - Matches any character# "*" - Matches any character# "[ea]" - Matches "e" or "a"Retrieval.pattern(trie,"[abz][eb]**[ea]")["above","zebra"]# Four letter word# "[yxzo]" - Matches "y", "x", "z", or "o"# "*" - Matches any character# "*" - Matches any character# "*" - Matches any characterRetrieval.pattern(trie,"[yxzo]***")[]# Oh no, a bad patternRetrieval.pattern(trie,"[abc][ezf"){:error,"Dangling group (inclusion) starting at column 6, expecting ]"}
The exclusion group pattern[^...]
matches on any characternot inside of the brackets.
# Five letter word# "[abz]" - Matches "a", "b", or, "z"# "[^eb]" - Matches any character but "e" or "b"# "*" - Matches any character# "*" - Matches any character# "[ea]" - Matches "e" or "a"Retrieval.pattern(trie,"[abz][^eb]**[ea]")["apple"]# Six letter word# "[pd]" - Matches "p" or "d"# "*" - Matches any character# "*" - Matches any character# "*" - Matches any character# "[^od]" - Matches any character but "o" or "d"# "*" - Matches any characterRetrieval.pattern(trie,"[pd]***[^od]*")["people"]# Three letter word# "[^abc]" - Matches any character but "a", "b", or, "c"# "*" - Matches any character# "*" - Matches any characterRetrieval.pattern(trie,"[^abc]**")["hot","out"]# Oh no, a bad patternRetrieval.pattern(trie,"[^abc"){:error,"Dangling group (exclusion) starting at column 1, expecting ]"}
The default capture group pattern{name}
must have a name and matches on any character. A name can be any combination of charactersbut it is convention to start with the start with the name1
and increment the name of the next capture group by one. Subsequent usage ofthe named capture group must match the character captured by the first capture group. In other words, capturegroups are useful when looking for binaries with the same character in different positions. Capture groupscan be combined with aninclusion group ({name[...]})
or anexclusion group ({name[^...]})
. It is mandatory that the capture group'sname is the first thing that appears after the opening curly brace and the optional inclusion or exclusion group must the last thing that appears beforethe closing curly brace. In the event that a pattern is provided that doesn't follow these rules an error will be returned.
# Four letter word# "{1}" - Creates a new capture group named "1", matches any character# "{2}" - Creates a new capture group named "2", matches any character# "{2}" - Matches any character captured by initial capture group "{2}"# "{1}" - Matches any character captured by initial capture group "{1}"# You may have recognized that this pattern is for searching for four letter palindromes, neat!Retrieval.pattern(trie,"{1}{2}{2}{1}")["boob","noon"]# Four letter word# "{1[^nm]}" - Creates a new capture group named "1", matches any character but "n" or "m"# "{2[oe]}" - Creates a new capture group named "2", matches "o" or "e"# "{2}" - Matches any character captured by initial capture group "{2}"# "{1}" - Matches any character captured by initial capture group "{1}"# Notice that it no longer accepts "noon" because of the exclusion of "n" in capture group "{1}"Retrieval.pattern(trie,"{1[^nm]}{2[oe]}{2}{1}")["boob"]# Eight letter word# "{aa}" - Creates a new capture group named "aa", matches any character# "*" - Matches any character# "*" - Matches any character# "{aa}" - Matches any character captured by initial capture group "{aa}"# "*" - Matches any character# "{b}" - Creates a new capture group named "b", matches any character# "{aa}" - Matches any character captured by initial capture group "{aa}"# "{b}" - Matches any character captured by initial capture group "{b}"# In English, find a size eight binary that has the same character in the first, fourth, and seventh position# Additionally, it must have the same character in the sixth and eighth position# The rest are wildcards, shove anything that fits in betweenRetrieval.pattern(trie,"{aa}**{aa}*{b}{aa}{b}")["extended"]# Oh no, a bad patternRetrieval.pattern(trie,"{}**{aa}*{b}{aa}{b}"){:error,"Unnamed capture starting at column 1, capture cannot be empty"}# Oh no, a bad patternRetrieval.pattern(trie,"{aa}**{aa}*{b}{aa}{b"){:error,"Dangling group (capture) starting at column 19, expecting }"}# Oh no, a bad patternRetrieval.pattern(trie,"{aa[^abc]a}**{aa}*{b}{aa}{b}"){:error,"Group (exclusion) must in the tail position of capture starting at column 4"}
You may have been wondering if it's possible to match on pattern symbols to find binaries that contain the wildcard*
,the exclusion carot^
, and others. The answer is yes, the pattern parser allows you to escape otherwisereserved symbols to be used directly in patterns.
The following symbols can be escaped:
Symbol | Escaped symbol |
---|---|
* | \\* |
^ | \\^ |
[ | \\[ |
] | \\] |
{ | \\{ |
} | \\} |
And in action:
# Three letter word# "\\*" - Matches "*"# "*" - Matches any character# "*" - Matches any characterRetrieval.new(~w/*ab [x*/)|>Retrieval.pattern("\\***")["*ab"]# Three letter word# "[\\*\\[]" - Matches "*" or "["# "*" - Matches any character# "*" - Matches any characterRetrieval.new(~w/*ab [x*/)|>Retrieval.pattern("[\\*\\[]**")["*ab","[x*"]# Oh no, a bad patternRetrieval.new(~w/*ab [x*/)|>Retrieval.pattern("[*]**"){:error,"Unescaped symbol * at column 2"}
Will ramble later!
This work is free. You can redistribute it and/or modify it under theterms of the MIT License. See the LICENSE file for more details.