green-coder/cdc


A library for performing Content-Defined Chunking (CDC) on data streams. It is implemented with generic iterators and is very easy to use.
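Chunking is "content-defined" because a boundary depends only on the bytes inside a small window, not on absolute offsets. If data is inserted near the start of a stream, later boundaries shift by the length of the insertion instead of all being invalidated. The std-only sketch below illustrates that property. It is not this crate's API. The window size, the cut rule, and the FNV-1a window hash are arbitrary assumptions chosen for brevity. The hash is also recomputed for every window here, whereas the crate's `Rabin64` is a rolling hash.

```rust
/// Window size for this sketch (an arbitrary choice, not the crate's default).
const WINDOW: usize = 8;
/// Cut when the top 4 bits of the window hash are zero: roughly once
/// every 16 positions on random-looking data.
const CUT_BITS: u32 = 4;

/// FNV-1a hash of one window. Recomputed from scratch for each position
/// to keep the sketch short; a rolling hash would update it in O(1).
fn window_hash(window: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in window {
        h ^= u64::from(b);
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Returns every offset (end of a window) where the content says "cut here".
/// Only the WINDOW bytes before an offset decide whether it is a cut.
fn content_defined_cuts(data: &[u8]) -> Vec<usize> {
    (WINDOW..=data.len())
        .filter(|&end| window_hash(&data[end - WINDOW..end]) >> (64 - CUT_BITS) == 0)
        .collect()
}
```

Prepending `k` bytes to `data` moves every cut found in `data` to offset `cut + k` in the edited stream. That is why CDC suits deduplication-style workloads better than fixed-size blocks, which would all shift out of alignment.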

Example

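    use std::fs::File;
    use std::io::{BufReader, Read}; // `Read` provides `bytes()`.

    use cdc::SeparatorIter;

    fn main() {
        let file = File::open("myLargeFile.bin").expect("cannot open the input file");
        let reader: BufReader<File> = BufReader::new(file);
        let byte_iter = reader.bytes().map(|b| b.unwrap());

        // Finds and iterates on the separators.
        for separator in SeparatorIter::new(byte_iter) {
            println!("Index: {}, hash: {:016x}", separator.index, separator.hash);
        }
    }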

Each module is documented by an example, which you can find in the examples/ folder.

To run them, use a command like:

cargo run --example separator --release

Note: Some examples look for a file named myLargeFile.bin, which I didn't upload to GitHub. Please use your own files for testing.

What's in the crate

From low level to high level:

  • RollingHash64, a trait for rolling hashes with a 64-bit hash value.

  • Rabin64, an implementation of the Rabin fingerprint rolling hash with a 64-bit hash value.

  • Separator, a struct describing a position in a data stream that was identified as a separator.

  • SeparatorIter, an adaptor that takes an Iterator<Item=u8> as input and enumerates all the separators found.

  • Chunk, a struct describing a piece of the data stream (index and size).

  • ChunkIter, an adaptor that takes an Iterator<Item=Separator> as input and enumerates the chunks.
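The list above describes a layered pipeline: a rolling hash feeds a separator adaptor, and the separator adaptor feeds a chunk adaptor. The std-only sketch below mirrors that layering so the data flow is easy to follow. It is not this crate's code, and every name in it (`RollingHash`, `PolyHash`, `SeparatorSketchIter`, `ChunkSketchIter`) is hypothetical. `PolyHash` is a Rabin-Karp style polynomial hash modulo 2^64, swapped in because it is short. It is not the Rabin fingerprint that `Rabin64` implements. The index conventions are also assumptions of the sketch: a separator's index counts the bytes consumed so far, and the chunk adaptor takes the total length so it can emit the trailing chunk.

```rust
use std::collections::VecDeque;

/// Hypothetical rolling-hash trait: push one byte, get the hash of the
/// current window (the oldest byte is evicted once the window is full).
trait RollingHash {
    fn roll(&mut self, byte: u8) -> u64;
    fn window_is_full(&self) -> bool;
}

const BASE: u64 = 257;

/// Rabin-Karp style polynomial hash modulo 2^64 over the last `size` bytes:
/// hash = sum of b_i * BASE^(size-1-i). NOT the GF(2) Rabin fingerprint.
struct PolyHash {
    window: VecDeque<u8>,
    size: usize,
    base_pow: u64, // BASE^(size-1), weight of the oldest byte in the window
    hash: u64,
}

impl PolyHash {
    fn new(size: usize) -> Self {
        assert!(size > 0, "window size must be positive");
        let base_pow = (1..size).fold(1u64, |acc, _| acc.wrapping_mul(BASE));
        PolyHash { window: VecDeque::with_capacity(size), size, base_pow, hash: 0 }
    }
}

impl RollingHash for PolyHash {
    fn roll(&mut self, byte: u8) -> u64 {
        if self.window.len() == self.size {
            // Remove the oldest byte's contribution in O(1).
            let out = u64::from(self.window.pop_front().unwrap());
            self.hash = self.hash.wrapping_sub(out.wrapping_mul(self.base_pow));
        }
        self.hash = self.hash.wrapping_mul(BASE).wrapping_add(u64::from(byte));
        self.window.push_back(byte);
        self.hash
    }
    fn window_is_full(&self) -> bool { self.window.len() == self.size }
}

/// A suggested cut: `index` = number of bytes consumed when it was found.
#[derive(Debug, Clone, Copy)]
struct Separator { index: u64, hash: u64 }

/// Bytes in, separators out; `is_separator` decides which hashes qualify.
struct SeparatorSketchIter<I, H, P> { bytes: I, hasher: H, is_separator: P, index: u64 }

impl<I, H, P> SeparatorSketchIter<I, H, P> {
    fn new(bytes: I, hasher: H, is_separator: P) -> Self {
        SeparatorSketchIter { bytes, hasher, is_separator, index: 0 }
    }
}

impl<I: Iterator<Item = u8>, H: RollingHash, P: Fn(u64) -> bool> Iterator for SeparatorSketchIter<I, H, P> {
    type Item = Separator;
    fn next(&mut self) -> Option<Separator> {
        while let Some(byte) = self.bytes.next() {
            let hash = self.hasher.roll(byte);
            self.index += 1;
            // Only full windows are considered, so the first bytes never cut.
            if self.hasher.window_is_full() && (self.is_separator)(hash) {
                return Some(Separator { index: self.index, hash });
            }
        }
        None
    }
}

/// A piece of the stream: start index and size in bytes.
#[derive(Debug, Clone, Copy)]
struct Chunk { index: u64, size: u64 }

/// Separators in, chunks out; `total_len` lets it emit the trailing chunk.
struct ChunkSketchIter<S> { separators: S, start: u64, total_len: u64, done: bool }

impl<S> ChunkSketchIter<S> {
    fn new(separators: S, total_len: u64) -> Self {
        ChunkSketchIter { separators, start: 0, total_len, done: false }
    }
}

impl<S: Iterator<Item = Separator>> Iterator for ChunkSketchIter<S> {
    type Item = Chunk;
    fn next(&mut self) -> Option<Chunk> {
        if self.done { return None; }
        if let Some(sep) = self.separators.next() {
            let chunk = Chunk { index: self.start, size: sep.index - self.start };
            self.start = sep.index;
            return Some(chunk);
        }
        self.done = true;
        let rest = self.total_len.saturating_sub(self.start);
        if rest > 0 { Some(Chunk { index: self.start, size: rest }) } else { None }
    }
}
```

Because each stage is an `Iterator`, the pipeline stays lazy. Assuming the total length is known up front, `ChunkSketchIter::new(SeparatorSketchIter::new(bytes, PolyHash::new(16), |h: u64| h >> 58 == 0), total_len)` consumes the input one byte at a time without buffering it. The predicate argument plays the role of "how the separator is chosen", which the crate also lets you customize.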

Implementation details

  • The library does not cut any files; it only provides the information needed to do so.

  • You can change the default window size used by Rabin64, as well as the way SeparatorIter chooses separators.

  • The design of this crate may change in the future: I am waiting for some Rust features to mature, especially impl Trait.
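Because the library only reports where to cut, the actual splitting is left to the caller. Below is a minimal sketch, assuming the whole input is already in memory. The `cut` helper and the local `Chunk` stand-in are hypothetical. The stand-in only mirrors the crate's description of a chunk (index and size) and is not the crate's own type.

```rust
use std::convert::TryFrom; // already in the prelude on edition 2021

/// Stand-in for a chunk description (start index and size in bytes);
/// hypothetical, not the crate's own `Chunk` type.
#[derive(Debug, Clone, Copy)]
struct Chunk {
    index: u64,
    size: u64,
}

/// Borrows one slice of `data` per chunk. Returns `None` if any chunk
/// reaches past the end of the buffer (or does not fit in `usize`).
fn cut<'a>(data: &'a [u8], chunks: &[Chunk]) -> Option<Vec<&'a [u8]>> {
    chunks
        .iter()
        .map(|c| {
            let start = usize::try_from(c.index).ok()?;
            let end = start.checked_add(usize::try_from(c.size).ok()?)?;
            data.get(start..end)
        })
        .collect()
}
```

The slices borrow from the input, so nothing is copied. For inputs too large to hold in memory, the same (index, size) pairs could instead drive seeks and reads on the file.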

Performance

There is a huge performance difference between debug and release builds. When you test the library, remember to use cargo run --release.

I may try to improve the library's performance at some point, but for now it is good enough for most uses.

License

Coded with ❤️, licensed under the terms of the MIT license.
