- Notifications
You must be signed in to change notification settings - Fork22
Hashing-function agnostic Cuckoo filters for Redis
License
kristoff-it/redis-cuckoofilter
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Hashing-function agnostic Cuckoo filters for Redis.
Cuckoo filters are a probabilistic data structure that allows you to test formembership of an element in a set without having to hold the whole set inmemory.
This is done at the cost of having a probability of getting a false positiveresponse, which, in other words, means that they can only answer "Definitely no"or "Probably yes". The false positive probability is roughly inversely relatedto how much memory you are willing to allocate to the filter.
The most iconic data structure used for this kind of task are Bloom filtersbut Cuckoo filters boast both better practical performance and efficiency, and,more importantly, the ability ofdeleting elements from the filter.
Bloom filters only support insertion of new items.Some extensions of Bloom filters have the ability of deleting items but theyachieve so at the expense of precision or memory usage, resulting in a far worsetradeoff compared to what Cuckoo filters offer.
Cuckoo filters offer a very interesting division of labour between server andclients.
Since Cuckoo filters rely on a single hashing of the original item you want toinsert, it is possible to off-load that part of the computation to the client.In practical terms it means that instead of sending the whole item to Redis, theclients sendhash
andfingeprint
of the original item.
- You need to push trough the cable a constant amount of data per item insteadof N bytes(Redis is a remote service afterall, you're going through a UNIXsocket at the very least).
- To perform well, Cuckoo filters rely on a good choice of fingerprint for eachitem and it should not be left to the library.
- The hash function can be decided by you, meaning that this module ishashing-function agnostic.
The last point is the most important one.It allows you to be more flexible in case you need to reason about item hashesacross different clients potentially written in different languages.
Additionally, different hashing function families specialize on different usecases that might interest you or not. For example some work best for small data(< 7 bytes), some the opposite. Some focus more on performance at the expense ofmore collisions, while some others behave better than the rest on peculiarplatforms.
This blogpostshows a few benchmarks of different hashing function families.
Considering all of that, the choice of hashing and fingerprinting functions hasto be up to you.
For the internal partial hashing that has to happen when reallocating afingerprint server-side, this implementation uses FNV1a which is robust and fastfor 1 byte inputs (the size of a fingerprint).
Thanks to how Cuckoo filters work, that choice is completely transparent to theclients.
Download a precompiled binary from theRelease sectionof this repo or compile it yourself (instructions at the end of this README).
Put
libredis-cuckoofilter.so
module in a folder readable by your Redisserver.To try out the module you can send
MODULE LOAD /path/to/libredis-cuckoofilter.so
using redis-cli or a client ofyour choice.Once you save on disk a key containing a Cuckoo filter you will need to add
loadmodule /path/to/libredis-cuckoofilter.so
to yourredis.conf
, otherwiseRedis will not load complaining that it doesn't know how to read some datafrom the.rdb
file.
redis-cli> MODULE LOAD /path/to/libredis-cuckoofilter.soOKredis-cli> CF.INIT test 64KOK redis-cli> CF.ADD test 5366164415461427448 97OKredis-cli> CF.CHECK test 5366164415461427448 97(integer) 1redis-cli> CF.REM test 5366164415461427448 97OK redis-cli> CF.CHECK test 5366164415461427448 97(integer) 0
importredisr=redis.Redis()# Load the module if you haven't done so alreadyr.execute_command("module","load","/path/to/libredis-cuckoofilter.so")# Create a filterr.execute_command("cf.init","test","64k")# Define a fingerprinting function, for hashing we'll use python's builtin `hash()`deffingerprint(x):returnord(x[0])# takes the first byte and returns its numerical valueitem="banana"# Add an item to the filterr.execute_command("cf.add","test",hash(item),fingerprint(item))# Check for its presencer.execute_command("cf.check","test",hash(item),finterprint(item))# => true# Check for a non-existing itemr.execute_command("cf.check","test",hash("apple"),fingerprint("apple"))# => false
In Cuckoo filters the number of bytes that we decide to use as fingerprintwill directly impact the maximum false positive error rate of a given filter.This implementation supports 1, 2 and 4-byte wide fingerprints.
Error % ->3.125e-02 (~0.03, i.e. 3%)
Error % ->1.22070312e-04 (~0.0001, i.e. 0.01%))
Error % ->9.31322574e-10 (~0.000000001, i.e. 0.0000001%)
Returns the correct size for a filter that must hold at mostuniverse
items.Defaultfpsize
is 1, specify a different value if you need an error rate lowerthan 3%.Cuckoo filters should never be filled over 80% of their maximum theoretical capacityboth for performance reasons and because a filter that approaces 100% fill rate willstart refusing inserts with aERR too full
error.This command will automatically paduniverse
for you. UseEXACT
if you don't wantthat behavior.
Returns the theoretical maximum number of items that can be added to a filter of givensize
andfpsize
. Defaultfpsize
is 1.
Instantiates a new filter. UseCF.SIZEFOR
to know the correct value forsize
.Supported sizes are a power of 2 in this range:1K .. 8G
.Default error rate is 3%, usefpsize
to specify a different target error rate.
Adds a new item to the filter. Bothhash
andfp
must be numbers.In particular,hash
has to be a 64bit representable number, whilefp
should be afpsize
representable number. As an example, a filter withfpsize
set to1
will cause the maximum recommended value offp
to be255
.Thefp
argument is au32
so(2^32)-1
is its maximum valid value, but whenfpsize
is lower than4
, high bits will be truncated (e.g.-1 == 255
whenfpsize == 1
).
You can use both signed and unsigned values as long as you are consistentin their use. Internally all values will be transalted to unsigned.If a filter is undersized/overfilled or you are adding multiple copies ofthe same item or, worse, you're not properely handling information entropy,this command will returnERR too full
.Read the extented example inkristoff-it/zig-cuckoofilterto learn more about misusage scenarios.
Deletes an item. Accepts the same arguments asCF.ADD
.WARNING: this command must be used to only delete items that werepreviously inserted. Trying to delete non-existing items will corrupt thefilter and cause it to lockdown. When that happens all command will startreturningERR broken
, because at that point it will be impossible toknow what the correct state would be. Incurring inERR broken
isa usage error and should never happen. Read the extented example inkristoff-it/zig-cuckoofilterto learn more about misusage scenarios.
Checks if an item is present in the filter or not. Returns1
for thepositive case and0
otherwise. Accepts the same arguments asCF.ADD
.
Returns the number of items present in the filter.
Returns1
if the filter was broken because of misusage ofCF.REM
,returns0
otherwise. A broken filter cannot be fixed and will startreturningERR broken
from most comamnds.
Returns1
if the filter is too full, returns0
otherwise.This command can return1
even if you never received aERR too full
from a call toCF.ADD
.Read the extented example inkristoff-it/zig-cuckoofilterto learn more about misusage scenarios.
If you are adding and alsodeleting items from the filterbut in a moment ofcongestion you ended up ovferfilling the filter,this command can help re-distribute some items to fix the situation.It's not a command you should ever rely on because it should neverbe needed if you properly sized your filter usingCF.SIZEFOR
.Read the extented example inkristoff-it/zig-cuckoofilterto learn more about misusage scenarios.
Checkoutkristoff-it/zig-cuckoofilterfor more information about advanced usage of Cuckoo filters andhow to deal (and most importantly, prevent) failure scenarios.
- Advanced client-side syncrhonizationGiven that now the logic is bundled in zig-cuckoofilter and thatit can now be used by any C ABI compatible target (checkout therepo for examples in C, JS, Python and Go), combined with Streamsit would be possible to keep a client-side Cuckoo filter syncedwith one in Redis, allowing clients to keep reads locally andasyncrhonously sync with Redis to obtain new updates to the filter.
Download the latest Zig compiler version fromhttp://ziglang.org.
$ zig build-lib -dynamic -isystem src --release-fast src/redis-cuckoofilter.zig
$ zig build-lib -dynamic -isystem src --release-fast -target x86_64-linux --library c src/redis-cuckoofilter.zig
Usezig targets
for the complete list of available targets.
MIT License
Copyright (c) 2019 Loris Cro
Permission is hereby granted, free of charge, to any person obtaining a copyof this software and associated documentation files (the "Software"), to dealin the Software without restriction, including without limitation the rightsto use, copy, modify, merge, publish, distribute, sublicense, and/or sellcopies of the Software, and to permit persons to whom the Software isfurnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in allcopies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS ORIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THEAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHERLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THESOFTWARE.
About
Hashing-function agnostic Cuckoo filters for Redis
Topics
Resources
License
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Packages0
Uh oh!
There was an error while loading.Please reload this page.