sled - it's all downhill from here!!!

An embedded database.

    let tree = sled::open("/tmp/welcome-to-sled")?;

    // insert and get, similar to std's BTreeMap
    let old_value = tree.insert("key", "value")?;

    assert_eq!(
        tree.get(&"key")?,
        Some(sled::IVec::from("value")),
    );

    // range queries
    for kv_result in tree.range("key_1".."key_9") {}

    // deletion
    let old_value = tree.remove(&"key")?;

    // atomic compare and swap
    tree.compare_and_swap(
        "key",
        Some("current_value"),
        Some("new_value"),
    )?;

    // block until all operations are stable on disk
    // (flush_async also available to get a Future)
    tree.flush()?;

This README is out of sync with the main branch, which contains a large in-progress rewrite.

If you would like to work with structured data without paying expensive deserialization costs, check out the structured example!

features

API similar to a threadsafe BTreeMap<[u8], [u8]>
serializable (ACID) transactions for atomically reading and writing to multiple keys in multiple keyspaces.
fully atomic single-key operations, including compare and swap
zero-copy reads
write batches
subscribe to changes on key prefixes
multiple keyspaces
merge operators
forward and reverse iterators over ranges of items
a crash-safe monotonic ID generator capable of generating 75-125 million unique IDs per second
zstd compression (use the compression build feature, disabled by default)
cpu-scalable lock-free implementation
flash-optimized log-structured storage
uses modern b-tree techniques such as prefix encoding and suffix truncation for reducing the storage costs of long keys with shared prefixes. If keys are the same length and sequential then the system can avoid storing 99%+ of the key data in most cases, essentially acting like a learned index
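
To make the prefix encoding idea concrete, here is a minimal std-only sketch. It is not sled's node layout: `shared_prefix_len` and `PrefixEncodedNode` are hypothetical names invented for illustration, and the sketch covers only prefix encoding, not suffix truncation.

```rust
/// Length of the longest byte prefix shared by every key in `keys`.
/// (Hypothetical helper for illustration, not part of sled.)
fn shared_prefix_len(keys: &[&[u8]]) -> usize {
    let (first, rest) = match keys.split_first() {
        Some(split) => split,
        None => return 0,
    };
    // Shrink the candidate prefix length as each further key is compared.
    rest.iter().fold(first.len(), |len, key| {
        first[..len]
            .iter()
            .zip(key.iter())
            .take_while(|(a, b)| a == b)
            .count()
    })
}

/// A toy node: the shared prefix is stored once, each key keeps only its suffix.
struct PrefixEncodedNode {
    prefix: Vec<u8>,
    suffixes: Vec<Vec<u8>>,
}

impl PrefixEncodedNode {
    /// Splits every key into the shared prefix and its own suffix.
    fn encode(keys: &[&[u8]]) -> Self {
        let n = shared_prefix_len(keys);
        let prefix = keys.first().map(|k| k[..n].to_vec()).unwrap_or_default();
        let suffixes = keys.iter().map(|k| k[n..].to_vec()).collect();
        PrefixEncodedNode { prefix, suffixes }
    }

    /// Rebuilds the full key at `index` by re-attaching the shared prefix.
    fn key(&self, index: usize) -> Vec<u8> {
        let mut full = self.prefix.clone();
        full.extend_from_slice(&self.suffixes[index]);
        full
    }

    /// Bytes of key data actually stored by this toy node.
    fn stored_bytes(&self) -> usize {
        self.prefix.len() + self.suffixes.iter().map(Vec::len).sum::<usize>()
    }
}
```

Encoding the three keys `user:0001`, `user:0002` and `user:0003` with this sketch stores the 8-byte prefix `user:000` once plus three 1-byte suffixes, 11 bytes in place of 27. The longer and more similar the keys, the larger the saving.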

expectations, gotchas, advice

Maybe one of the first things that seems weird is the IVec type. This is an inlinable Arced slice that makes some things more efficient.
Durability: sled automatically fsyncs every 500ms by default, which can be configured with the flush_every_ms configurable, or you may call flush / flush_async manually after operations.
Transactions are optimistic - do not interact with external state or perform IO from within a transaction closure unless it is idempotent.
Internal tree node optimizations: sled performs prefix encoding on long keys with similar prefixes that are grouped together in a range, as well as suffix truncation to further reduce the indexing costs of long keys. Nodes will skip potentially expensive length and offset pointers if keys or values are all the same length (tracked separately, don't worry about making keys the same length as values), so it may improve space usage slightly if you use fixed-length keys or values. This also makes it easier to use structured access as well.
sled does not support multiple open instances for the time being. Please keep sled open for the duration of your process's lifespan. It's totally safe and often quite convenient to use a global lazy_static sled instance, modulo the normal global variable trade-offs. Every operation is threadsafe, and most are implemented under the hood with lock-free algorithms that avoid blocking in hot paths.
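
The advice about optimistic transactions is easier to see in a small model. The sketch below is not sled's transaction code: `optimistic_update` is a hypothetical std-only helper that mimics the general optimistic pattern (read, compute, commit only if nothing changed, otherwise retry) on a single atomic integer. The point is that the closure can run more than once, so any side effect inside it is repeated.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Optimistically applies `update` to `cell`: read the current value, compute
/// a proposal, then commit with compare-and-swap. If another writer changed
/// the value in between, start over from a fresh read.
/// Returns how many times `update` was invoked.
/// (Hypothetical helper for illustration, not part of sled.)
fn optimistic_update<F>(cell: &AtomicU64, mut update: F) -> u32
where
    F: FnMut(u64) -> u64,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        let seen = cell.load(Ordering::SeqCst);
        let proposed = update(seen);
        // The commit succeeds only if the value is still the one we read.
        if cell
            .compare_exchange(seen, proposed, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok()
        {
            return attempts;
        }
    }
}
```

If a concurrent writer touches the cell between the read and the commit, `update` runs a second time. A closure that sends an email or bumps an external counter would do so twice, which is why work inside a transaction closure should be idempotent or moved outside it.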

performance

LSM tree-like write performance with traditional B+ tree-like read performance
over a billion operations in under a minute at 95% read 5% writes on 16 cores on a small dataset
measure your own workloads rather than relying on some marketing for contrived workloads

a note on lexicographic ordering and endianness

If you want to store numerical keys in a way that will play nicely with sled's iterators and ordered operations, please remember to store your numerical items in big-endian form. Little endian (the default of many things) will often appear to be doing the right thing until you start working with more than 256 items (more than 1 byte), causing lexicographic ordering of the serialized bytes to diverge from the lexicographic ordering of their deserialized numerical form.

Rust integral types have built-in to_be_bytes and from_be_bytes methods.
bincode can be configured to store integral types in big-endian form.
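
A short std-only sketch of the ordering problem. The helper names `be_key`, `le_key` and `id_from_be_key` are made up for this example; only `to_be_bytes`, `to_le_bytes` and `from_be_bytes` come from the standard library.

```rust
/// Encodes a numeric id so that byte-wise (lexicographic) order matches
/// numeric order. Suitable as a key for ordered iteration.
fn be_key(id: u64) -> [u8; 8] {
    id.to_be_bytes()
}

/// Little-endian encoding, included only to demonstrate the ordering problem.
fn le_key(id: u64) -> [u8; 8] {
    id.to_le_bytes()
}

/// Decodes a key produced by `be_key` back into its numeric id.
fn id_from_be_key(key: [u8; 8]) -> u64 {
    u64::from_be_bytes(key)
}
```

With these helpers, `be_key(255) < be_key(256)` holds, while `le_key(255) > le_key(256)`: the little-endian bytes for 255 start with `0xFF` and those for 256 start with `0x00`, so a range scan over little-endian keys would visit 256 before 255.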

interaction with async

If your dataset resides entirely in cache (achievable at startup by setting the cache to a large enough value and performing a full iteration) then all reads and writes are non-blocking and async-friendly, without needing to use Futures or an async runtime.

To asynchronously suspend your async task on the durability of writes, we support the flush_async method, which returns a Future that your async tasks can await the completion of if they require high durability guarantees and you are willing to pay the latency costs of fsync. Note that sled automatically tries to sync all data to disk several times per second in the background without blocking user threads.

We support async subscription to events that happen on key prefixes, because the Subscriber struct implements Future<Output=Option<Event>>:

    let sled = sled::open("my_db").unwrap();

    let mut sub = sled.watch_prefix("");

    sled.insert(b"a", b"a").unwrap();

    extreme::run(async move {
        while let Some(event) = (&mut sub).await {
            println!("got event {:?}", event);
        }
    });

minimum supported Rust version (MSRV)

We support Rust 1.62 and up.

architecture

lock-free tree on a lock-free pagecache on a lock-free log. the pagecache scatters partial page fragments across the log, rather than rewriting entire pages at a time as B+ trees for spinning disks historically have. on page reads, we concurrently scatter-gather reads across the log to materialize the page from its fragments. check out the architectural outlook for a more detailed overview of where we're at and where we see things going!
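
The fragment idea can be sketched with a toy model. This is not sled's pagecache: `FragmentLog` and its methods are hypothetical, the log lives in memory, and the gather step is sequential and uses locks-free nothing at all, whereas sled describes concurrent, lock-free reads. The sketch only shows small updates being appended to a log and a page being rebuilt from its fragments on read.

```rust
use std::collections::{BTreeMap, HashMap};

/// One partial page fragment: a single key/value update.
type Fragment = (Vec<u8>, Vec<u8>);

/// A toy append-only log of page fragments plus a page table.
/// (Hypothetical type for illustration, not part of sled.)
#[derive(Default)]
struct FragmentLog {
    /// Every fragment ever written, in append order.
    log: Vec<Fragment>,
    /// Page table: page id -> offsets of that page's fragments in `log`.
    page_table: HashMap<u64, Vec<usize>>,
}

impl FragmentLog {
    /// Appends a small fragment for `page_id` instead of rewriting the page.
    fn write_fragment(&mut self, page_id: u64, key: &[u8], value: &[u8]) {
        let offset = self.log.len();
        self.log.push((key.to_vec(), value.to_vec()));
        self.page_table.entry(page_id).or_default().push(offset);
    }

    /// Gathers a page's fragments from the log and replays them in write
    /// order, so later fragments override earlier ones for the same key.
    fn materialize(&self, page_id: u64) -> BTreeMap<Vec<u8>, Vec<u8>> {
        let mut page = BTreeMap::new();
        if let Some(offsets) = self.page_table.get(&page_id) {
            for &offset in offsets {
                let (key, value) = &self.log[offset];
                page.insert(key.clone(), value.clone());
            }
        }
        page
    }
}
```

Each write costs one small append regardless of page size. The price is paid on reads, which must visit every fragment of the page, and that is the trade-off the scatter-gather design is built around.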

philosophy

don't make the user think. the interface should be obvious.
don't surprise users with performance traps.
don't wake up operators. bring reliability techniques from academia into real-world practice.
don't use so much electricity. our data structures should play to modern hardware's strengths.

known issues, warnings

if reliability is your primary constraint, use SQLite. sled is beta.
if storage price performance is your primary constraint, use RocksDB. sled uses too much space sometimes.
if you have a multi-process workload that rarely writes, use LMDB. sled is architected for use with long-running, highly-concurrent workloads such as stateful services or higher-level databases.
quite young, should be considered unstable for the time being.
the on-disk format is going to change in ways that require manual migrations before the 1.0.0 release!

priorities

A full rewrite of sled's storage subsystem is happening on a modular basis as part of the komora project, in particular the marble storage engine. This will dramatically lower both the disk space usage (space amplification) and garbage collection overhead (write amplification) of sled.
The memory layout of tree nodes is being completely rewritten to reduce fragmentation and eliminate serialization costs.
The merge operator feature will change into a trigger feature that resembles traditional database triggers, allowing state to be modified as part of the same atomic writebatch that triggered it for retaining serializability with reactive semantics.