You signed in with another tab or window.Reload to refresh your session.You signed out in another tab or window.Reload to refresh your session.You switched accounts on another tab or window.Reload to refresh your session.Dismiss alert
Rax is a radix tree implementation initially written to be used in a specificplace of Redis in order to solve a performance problem, but immediatelyconverted into a stand alone project to make it reusable for Redis itself, outside the initial intended application, and for other projects as well.
The primary goal was to find a suitable balance between performancesand memory usage, while providing a fully featured implementation of radix treesthat can cope with many different requirements.
During the development of this library, while getting more and more excitedabout how practical and applicable radix trees are, I was very surprised tosee how hard it is to write a robust implementation, especially of a fullyfeatured radix tree with a flexible iterator. A lot of things can go wrongin node splitting, merging, and various edge cases. For this reason a majorgoal of the project is to provide a stable and battle tested implementationfor people to use and in order to share bug fixes. The project relies a loton fuzz testing techniques in order to explore not just all the lines of codethe project is composed of, but a large amount of possible states.
Rax is an open source project, released under the BSD two clause license.
Major features:
Memory conscious:
Packed nodes representation.
Able to avoid storing a NULL pointer inside the node if the key is set to NULL (there is anisnull bit in the node header).
Lack of parent node reference. A stack is used instead when needed.
Fast lookups:
Edges are stored as arrays of bytes directly in the parent node, no need to access non useful children while trying to find a match. This translates into less cache misses compared to other implementations.
Cache line friendly scanning of the correct child by storing edges as two separated arrays: an array of edge chars and one of edge pointers.
Complete implementation:
Deletion with nodes re-compression as needed.
Iterators (including a way to use iterators while the tree is modified).
Random walk iteration.
Ability to report and resist out of memory: if malloc() returns NULL the API can report an out of memory error and always leave the tree in a consistent state.
Readable and fixable implementation:
All complex parts are commented with algorithms details.
Debugging messages can be enabled to understand what the implementation is doing when calling a given function.
Ability to print the radix tree nodes representation as ASCII art.
Portable implementation:
Never does unaligned accesses to memory.
Written in ANSI C99, no extensions used.
Extensive code and possible states test coverage using fuzz testing.
Testing relies a lot on fuzzing in order to explore non trivial states.
Implementation of the dictionary and iterator compared with behavior-equivalent implementations of simple hash tables and sorted arrays, generating random data and checking if the two implementations results match.
Out of memory condition tests. The implementation is fuzzed with a special allocator returningNULL at random. The resulting radix tree is tested for consistency. Redis, the primary target of this implementation, does not use this feature, but the ability to handle OOM may make this implementation useful where the ability to survive OOMs is needed.
Part of Redis: the implementation is stressed significantly in the real world.
The layout of a node is as follows. In the example, a node which representsa key (so has a data pointer associated), has three childrenx,y,z.Every space represents a byte in the diagram.
The headerHDR is actually a bitfield with the following fields:
uint32_t iskey:1; /* Does this node contain a key? */uint32_t isnull:1; /* Associated value is NULL (don't store it). */uint32_t iscompr:1; /* Node is compressed. */uint32_t size:29; /* Number of children, or compressed string len. */
Compressed nodes represent chains of nodes that are not keys and haveexactly a single child, so instead of storing:
The basic API is a trivial dictionary where you can add or remove elements.The only notable difference is that the insert and remove APIs also acceptan optional argument in order to return, by reference, the old value storedat a key when it is updated (on insert) or removed.
Creating a radix tree and adding a key
A new radix tree is created with:
rax *rt = raxNew();
In order to insert a new key, the following function is used:
The function returns 1 if the key was inserted correctly, or 0 if the keywas already in the radix tree: in this case, the value is updated. Thevalue of 0 is also returned on out of memory, however in that caseerrno is set toENOMEM.
If the associated valuedata is NULL, the node where the keyis stored does not use additional memory to store the NULL value, sodictionaries composed of just keys are memory efficient if you useNULL as associated value.
Note that keys are unsigned arrays of chars and you need to specify thelength: Rax is binary safe, so the key can be anything.
The insertion function is also available in a variant that will notoverwrite the existing key value if any:
The function is exactly the same as raxInsert(), however if the keyexists the function returns 0 (like raxInsert) without touching theold value. The old value can be still returned via the 'old' pointerby reference.
This function returns the special valueraxNotFound if the key youare trying to access is not there, so an example usage is the following:
void *data = raxFind(rax,mykey,mykey_len);if (data == raxNotFound) return;printf("Key value is %p\n", data);
raxFind() is a read only function so no out of memory conditions arepossible, the function never fails.
Deleting keys
Deleting the key is as you could imagine it, but with the ability toreturn by reference the value associated to the key we are about todelete:
int raxRemove(rax *rax, unsigned char *s, size_t len, void **old);
The function returns 1 if the key gets deleted, or 0 if the key was notthere. This function also does not fail for out of memory, however ifthere is an out of memory condition while a key is being deleted, theresulting tree nodes may not get re-compressed even if possible: the radixtree may be less efficiently encoded in this case.
Theold argument is optional, if passed will be set to the key associatedvalue if the function successfully finds and removes the key.
Iterators
The Rax key space is ordered lexicographically, using the value of thebytes the keys are composed of in order to decide which key is greaterbetween two keys. If the prefix is the same, the longer key is consideredto be greater.
Rax iterators allow to seek a given element based on different operatorsand then to navigate the key space callingraxNext() andraxPrev().
Basic iterator usage
Iterators are normally declared as local variables allocated on the stack,and then initialized with theraxStart function:
raxIterator iter;raxStart(&iter, rt); // Note that 'rt' is the radix tree pointer.
The functionraxStart never fails and returns no value.Once an iterator is initialized, it can be sought (sought is the past tensof 'seek', which is not 'seeked', in case you wonder) in order to startthe iteration from the specified position. For this goal, the functionraxSeek is used:
The functionraxNext returns elements starting from the element soughtwithraxSeek, till the final element of the tree. When there are no moreelements, 0 is returned, otherwise the function returns 1. However the functionmay return 0 when an out of memory condition happens as well: while it attemptsto always use the stack, if the tree depth is large or the keys are big theiterator starts to use heap allocated memory.
The functionraxPrev works exactly in the same way, but will move towardsthe first element of the radix tree instead of moving towards the lastelement.
Releasing iterators
An iterator can be used multiple times, and can be sought again and againusingraxSeek without any need to callraxStart again. However, when theiterator is not going to be used again, its memory must be reclaimedwith the following call:
raxStop(&iter);
Note that even if you do not callraxStop, most of the times you'll notdetect any memory leak, but this is just a side effect of how theRax implementation works: most of the times it will try to use the stackallocated data structures. However for deep trees or large keys, heap memorywill be allocated, and failing to callraxStop will result into a memoryleak.
Seek operators
The functionraxSeek can seek different elements based on the operator.For instance in the example above we used the following call:
raxSeek(&iter,">=",(unsigned char*)"foo",3);
In order to seek the first element>= to the string"foo". Howeverother operators are available. The first set are pretty obvious:
== seek the element exactly equal to the given one.
> seek the element immediately greater than the given one.
>= seek the element equal, or immediately greater than the given one.
< seek the element immediately smaller than the given one.
<= seek the element equal, or immediately smaller than the given one.
^ seek the smallest element of the radix tree.
$ seek the greatest element of the radix tree.
When the last two operators,^ or$ are used, the key and key lengthargument passed are completely ignored since they are not relevant.
Note how certain times the seek will be impossible, for example when theradix tree contains no elements or when we are asking for a seek that isnot possible, like in the following case:
raxSeek(&iter,">",(unsigned char*)"zzzzz",5);
We may not have any element greater than"zzzzz". In this case, whathappens is that the first call toraxNext orraxPrev will simply returnzero, so no elements are iterated.
Iterator stop condition
Sometimes we want to iterate specific ranges, for example from AAA to BBB.In order to do so, we could seek and get the next element. However we needto stop once the returned key is greater than BBB. The Rax library offerstheraxCompare function in order to avoid you need to code the same stringcomparison function again and again based on the exact iteration you aredoing:
raxIterator iter;raxStart(&iter);raxSeek(&iter,">=",(unsigned char*)"AAA",3); // Seek the first elementwhile(raxNext(&iter)) { if (raxCompare(&iter,">",(unsigned char*)"BBB",3)) break; printf("Current key: %.*s\n", (int)iter.key_len,(char*)iter.key);}raxStop(&iter);
The above code shows a complete range iterator just printing the keystraversed by iterating.
The prototype of theraxCompare function is the following:
The operators supported are>,>=,<,<=,==.The function returns 1 if the current iterator key satisfies the operatorcompared to the provided key, otherwise 0 is returned.
Checking for iterator EOF condition
Sometimes we want to know if the itereator is in EOF state before callingraxNext() or raxPrev(). The iterator EOF condition happens when there areno more elements to return via raxNext() or raxPrev() call, because eitherraxSeek() failed to seek the requested element, or because EOF was reachedwhile navigating the tree with raxPrev() and raxNext() calls.
This condition can be tested with the following function that returns 1if EOF was reached:
int raxEOF(raxIterator *it);
Modifying the radix tree while iterating
In order to be efficient, the Rax iterator caches the exact node we are at,so that at the next iteration step, it can start from where it left.However an iterator has sufficient state in order to re-seek againin case the cached node pointers are no longer valid. This problem happenswhen we want to modify a radix tree during an iteration. A common patternis, for instance, deleting all the elements that match a given condition.
Fortunately there is a very simple way to do this, and the efficiency costis only paid as needed, that is, only when the tree is actually modified.The solution consists of seeking the iterator again, with the current key,once the tree is modified, like in the following example:
while(raxNext(&iter,...)) { if (raxRemove(rax,...)) { raxSeek(&iter,">",iter.key,iter.key_size); }}
In the above case we are iterating withraxNext, so we are going towardslexicographically greater elements. Every time we remove an element, what weneed to do is to seek it again using the current element and the> seekoperator: this way we'll move to the next element with a new state representingthe current radix tree (after the change).
The same idea can be used in different contexts, considering the following:
Iterators need to be sought again withraxSeek every time keys are added or removed while iterating.
The current iterator key is always valid to access viaiter.key_size anditer.key, even after it was deleted from the radix tree.
Re-seeking iterators after EOF
After iteration reaches an EOF condition since there are no more elementsto return, because we reached one or the other end of the radix tree, theEOF condition is permanent, and even iterating in the reverse direction willnot produce any result.
The simplest way to continue the iteration, starting again from the lastelement returned by the iterator, is simply to seek itself:
raxSeek(&iter,iter.key,iter.key_len,"==");
So for example in order to write a command that prints all the elementsof a radix tree from the first to the last, and later again from the lastto the first, reusing the same iterator, it is possible to use the followingapproach:
To extract a fair element from a radix tree so that every element is returnedwith the same probability is not possible if we require that:
The radix tree is not larger than expected (for example augmented with information that allows elements ranking).
We want the operation to be fast, at worst logarithmic (so things like reservoir sampling are out since it's O(N)).
However a random walk which is long enough, in trees that are more or less balanced, produces acceptable results, is fast, and eventually returns every possible element, even if not with the right probability.
To perform a random walk, just seek an iterator anywhere and call thefollowing function:
int raxRandomWalk(raxIterator *it, size_t steps);
If the number of steps is set to 0, the function will perform a number ofrandom walk steps between 1 and two times the logarithm in base two of thenumber of elements inside the tree, which is often enough to get a decentresult. Otherwise, you may specify the exact number of steps to take.
Printing trees
For debugging purposes, or educational ones, it is possible to use thefollowing call in order to get an ASCII art representation of a radix treeand the nodes it is composed of:
raxShow(mytree);
However note that this works well enough for trees with a few elements, butbecomes hard to read for very large trees.
The following is an example of the output raxShow() produces after addingthe specified keys and values:
In order to test with Valgrind, just run the tests using it, howeverif you want accurate leaks detection, let Valgrind run thewhole test,since if you stop it earlier it will detect a lot of false positive memoryleaks. This is due to the fact that Rax put pointers at unaligned addresseswithmemcpy, so it is not obvious where pointers are stored for Valgrind,that will detect the leaks. However, at the end of the test, Valgrind willdetect that all the allocations were later freed, and will report thatthere are no leaks.
Debugging Rax
While investigating problems in Rax it is possible to turn debugging messageson by compiling with the macroRAX_DEBUG_MSG enabled. Note that it's a lotof output, and may make running large tests too slow.
In order to active debugging selectively in a dynamic way, it is possible touse the function raxSetDebugMsg(0) or raxSetDebugMsg(1) to disable/enabledebugging.
A problem when debugging code doing complex memory operations like a radixtree implemented the way Rax is implemented, is to understand where the bughappens (for instance a memory corruption). For that goal it is possible touse the function raxTouch() that will basically recursively access everynode in the radix tree, itearting every sub child. In combination withtools like Valgrind, it is possible then to perform the following patternin order to narrow down the state causing a give bug:
The rax-test is executed using Valgrind, adding a printf() so that forthe fuzz tester we see what iteration in the loop we are in.
After every modification of the radix tree made by the fuzz testerin rax-test.c, we add a call to raxTouch().
Now as soon as an operation will corrupt the tree, raxTouch() willdetect it (via Valgrind) immediately. We can add more calls to narrowthe state.
At this point a good idea is to enable Rax debugging messages immediatelybefore the moment the tree is corrupted, to see what happens. This canbe achieved by adding a few "if" statements inside the code, since weknow the iteration that causes the corruption (because of step 1).
This method was used with success during rafactorings in order to debug theintroduced bugs.