# cdc
A library for performing Content-Defined Chunking (CDC) on data streams. Implemented using generic iterators, it is very easy to use.
```rust
let reader: BufReader<File> = BufReader::new(file);
let byte_iter = reader.bytes().map(|b| b.unwrap());

// Finds and iterates on the separators.
for separator in SeparatorIter::new(byte_iter) {
    println!("Index: {}, hash: {:016x}", separator.index, separator.hash);
}
```
Each module is documented via an example, which you can find in the `examples/` folder.
To run them, use a command like:
```
cargo run --example separator --release
```

Note: some examples look for a file named `myLargeFile.bin`, which I didn't upload to GitHub. Please use your own files for testing.
From low level to high level:

- `RollingHash64`, a trait for rolling hashes with a 64-bit hash value.
- `Rabin64`, an implementation of the Rabin fingerprint rolling hash with a 64-bit hash value.
- `Separator`, a struct which describes a place in a data stream identified as a separator.
- `SeparatorIter`, an adaptor which takes an `Iterator<Item=u8>` as input and enumerates all the separators found.
- `Chunk`, a struct which describes a piece of the data stream (index and size).
- `ChunkIter`, an adaptor which takes an `Iterator<Item=Separator>` as input and enumerates chunks.
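To illustrate the idea behind a rolling hash and separator detection, here is a minimal, self-contained sketch. It is not the crate's `Rabin64` implementation: the multiplier, window size, and separator mask below are illustrative assumptions only.

```rust
// Sketch of content-defined separator detection: slide a fixed-size window
// over the stream, maintain a polynomial rolling hash, and declare a
// separator wherever the hash matches a bit pattern. NOT the crate's code;
// all constants are arbitrary choices for the sketch.

const WINDOW_SIZE: usize = 16;
const MULTIPLIER: u64 = 1_000_003; // arbitrary odd multiplier
const SEPARATOR_MASK: u64 = (1 << 6) - 1; // ~1 separator per 64 bytes on average

/// Returns the indices just past each window whose hash matches the mask.
fn find_separators(data: &[u8]) -> Vec<usize> {
    // Precompute MULTIPLIER^(WINDOW_SIZE - 1) to remove the outgoing byte.
    let mut pow = 1u64;
    for _ in 0..WINDOW_SIZE - 1 {
        pow = pow.wrapping_mul(MULTIPLIER);
    }

    let mut hash = 0u64;
    let mut separators = Vec::new();
    for (i, &b) in data.iter().enumerate() {
        if i >= WINDOW_SIZE {
            // Remove the contribution of the byte leaving the window.
            hash = hash.wrapping_sub((data[i - WINDOW_SIZE] as u64).wrapping_mul(pow));
        }
        // Shift the window and add the incoming byte.
        hash = hash.wrapping_mul(MULTIPLIER).wrapping_add(b as u64);
        if i + 1 >= WINDOW_SIZE && (hash & SEPARATOR_MASK) == SEPARATOR_MASK {
            separators.push(i + 1);
        }
    }
    separators
}

fn main() {
    let data: Vec<u8> = (0..10_000).map(|i| (i * 31 % 251) as u8).collect();
    let seps = find_separators(&data);
    println!("found {} separators", seps.len());
}
```

Because each separator depends only on the bytes inside its window, inserting data near the start of a stream shifts the later separators but does not move them relative to the content, which is the property that makes chunking "content-defined".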
The library does not cut any files; it only provides the information needed to do so.
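As a sketch of what that cutting could look like on the caller's side, the following splits a buffer at a list of boundary indices. The `boundaries` values here are hypothetical; in practice they would come from the separator or chunk information the library reports.

```rust
/// Splits `data` into chunks at the given boundary indices.
/// Out-of-range or non-increasing boundaries are skipped, so the
/// chunks always cover the whole buffer exactly once.
fn cut_into_chunks<'a>(data: &'a [u8], boundaries: &[usize]) -> Vec<&'a [u8]> {
    let mut chunks = Vec::new();
    let mut start = 0;
    for &b in boundaries {
        if b > start && b < data.len() {
            chunks.push(&data[start..b]);
            start = b;
        }
    }
    chunks.push(&data[start..]); // final chunk up to the end of the stream
    chunks
}

fn main() {
    let data = b"hello, content-defined chunking!";
    // Hypothetical boundaries, e.g. collected from the separator indices.
    let chunks = cut_into_chunks(data, &[5, 14, 23]);
    // Concatenating the chunks restores the original data.
    let rebuilt: Vec<u8> = chunks.concat();
    assert_eq!(rebuilt, data.to_vec());
    println!("{} chunks", chunks.len());
}
```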
You can change the default window size used by `Rabin64`, and how the `SeparatorIter` chooses the separators.

The design of this crate may be subject to change sometime in the future. I am waiting for some features of Rust to mature, especially the `impl Trait` feature.
There is a huge difference between the debug build and the release build in terms of performance. Remember that when you test the lib, use `cargo run --release`.
I may try to improve the performance of the lib at some point, but for now it is good enough for most usages.
Coded with ❤️, licensed under the terms of the MIT license.