The PebblesDB write-optimized key-value store (SOSP 17)
utsaslab/pebblesdb
PebblesDB is a write-optimized key-value store which is built using the novel FLSM (Fragmented Log-Structured Merge Tree) data structure. FLSM is a modification of the standard log-structured merge tree data structure which aims at achieving higher write throughput and lower write amplification without compromising on read throughput.
PebblesDB is built by modifying HyperLevelDB which, in turn, is built on top of LevelDB. PebblesDB is API compatible with HyperLevelDB and LevelDB. Thus, PebblesDB is a drop-in replacement for LevelDB and HyperLevelDB. The source code is available on GitHub. The full paper on PebblesDB can be found here. The slides for the SOSP 17 talk, which explains the core ideas behind PebblesDB, can be found here.
If you are using LevelDB in your deployment, do consider trying out PebblesDB! PebblesDB can also be used to replace RocksDB as long as RocksDB-specific functionality like column families is not used.
Please cite the following paper if you use PebblesDB: PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, Ittai Abraham. SOSP 17. Bibtex
The benchmarks page has a list of experiments evaluating PebblesDB vs LevelDB, HyperLevelDB, and RocksDB. The summary is that PebblesDB outperforms the other stores on write throughput, equals other stores on read throughput, and incurs a penalty for small range queries on fully compacted key-value stores. PebblesDB achieves 6x the write throughput of RocksDB, while providing similar read throughput and performing 50% less IO. Please see the paper for more details.
If you would like to run MongoDB with PebblesDB as the storage engine, please check out mongo-pebbles, a modification of the mongo-rocks layer between RocksDB and MongoDB.
PebblesDB requires `libsnappy` and `libtool`. To install on Linux, please use `sudo apt-get install libsnappy-dev libtool`. For MacOSX, use `brew install snappy` and instead of `ldconfig`, use `update_dyld_shared_cache`.
PebblesDB was built, compiled, and tested with g++-4.7, g++-4.9, and g++-5. It may not work with other versions of g++ and other C++ compilers.
Using Autotools:
    $ cd pebblesdb/src
    $ autoreconf -i
    $ ./configure
    $ make
    $ make install
    $ ldconfig
Using CMake:
    $ mkdir -p build && cd build
    $ cmake .. && make install -j16
To run the `db_bench` microbenchmark:

1. `cd pebblesdb/src/`
2. `make db_bench` (this only works if you are compiling using autotools, and have done `autoreconf` and `configure` before this step)
3. `./db_bench --benchmarks=<list-of-benchmarks> --num=<number-of-keys> --value_size=<size-of-value-in-bytes> --reads=<number-of-reads> --db=<database-directory-path>`

A complete set of parameters can be found in `db/db_bench.cc`.

Sample usage: `./db_bench --benchmarks=fillrandom,readrandom --num=1000000 --value_size=1024 --reads=500000 --db=/tmp/pebblesdbtest-1000`
Use the `filter` benchmark property to print the filter policy statistics, such as memory usage:

    ./db_bench --benchmarks=fillrandom,readrandom,filter --num=1000000 --value_size=1024 --reads=500000 --db=/tmp/pebblesdbtest-1000

Sample output:

    fillrandom   :     110.460 micros/op;    9.0 MB/s
    readrandom   :       4.120 micros/op; (5000 of 10000 found)
    Filter in-memory size: 0.024 MB
    Count of filters: 1928
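As a rough aid for reading the `micros/op` and `MB/s` figures, the sketch below converts one into the other. The helper names are hypothetical and are not part of `db_bench`. The 16-byte key size used in the usage note is an assumption about the benchmark's defaults, not something stated in this README.

```cpp
#include <cassert>

// Operations per second implied by a "micros/op" figure.
double OpsPerSecond(double micros_per_op) {
  assert(micros_per_op > 0.0);
  return 1e6 / micros_per_op;
}

// Throughput in MB/s for a given number of bytes written per operation.
// ASSUMPTION: 1 MB is taken to be 1048576 bytes.
double MegabytesPerSecond(double micros_per_op, double bytes_per_op) {
  return OpsPerSecond(micros_per_op) * bytes_per_op / 1048576.0;
}
```

With 1024-byte values and an assumed 16-byte key, `MegabytesPerSecond(110.460, 1040.0)` comes out just under 9 MB/s, which is in line with the `fillrandom` line shown above.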
PebblesDB uses the FLSM data structure to logically arrange the sstables on disk. FLSM helps in achieving high write throughput by reducing write amplification. But in FLSM, each guard can contain multiple overlapping sstables. Hence a read or seek over the database requires examining one guard (multiple sstables) per level, thereby increasing the read/seek latency. PebblesDB employs some optimizations to tackle these challenges as follows:
PebblesDB uses sstable-level bloom filters instead of the block-level bloom filters used in HyperLevelDB or LevelDB. With this optimization, even though a guard can contain multiple sstables, PebblesDB effectively reads only one sstable from disk per level.
By default, this optimization is turned on. It can be disabled by commenting out the macro `#define FILE_LEVEL_FILTER` in `db/version_set.h`. Remember to run `make db_bench` after making a change.
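To illustrate why a per-sstable filter keeps disk reads low even when a guard holds several sstables, here is a minimal, self-contained sketch. It is a toy model, not PebblesDB's implementation: `TinyBloom`, `SSTable`, and `GuardGet` are hypothetical names, the filter sizes are arbitrary, and a `std::map` stands in for on-disk data.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Tiny Bloom filter: 2 probes into 1024 bits (illustrative sizes only).
class TinyBloom {
 public:
  void Add(const std::string& key) {
    bits_.set(Probe(key, 0));
    bits_.set(Probe(key, 1));
  }
  // false means "definitely absent"; true means "possibly present".
  bool MayContain(const std::string& key) const {
    return bits_.test(Probe(key, 0)) && bits_.test(Probe(key, 1));
  }

 private:
  static constexpr std::size_t kBits = 1024;
  static std::size_t Probe(const std::string& key, int i) {
    // Derive a second hash by appending a probe-specific character.
    return std::hash<std::string>{}(key + static_cast<char>('0' + i)) % kBits;
  }
  std::bitset<kBits> bits_;
};

// One sstable with a single filter covering the whole file.
struct SSTable {
  std::map<std::string, std::string> entries;  // stands in for on-disk data
  TinyBloom filter;
  void Put(const std::string& key, const std::string& value) {
    entries[key] = value;
    filter.Add(key);
  }
};

// Looks up `key` in one guard, newest sstable first. `disk_reads` counts the
// sstables that had to be opened, i.e. those the filter did not rule out.
bool GuardGet(const std::vector<SSTable>& newest_first, const std::string& key,
              std::string* value, int* disk_reads) {
  assert(value != nullptr && disk_reads != nullptr);
  for (const SSTable& table : newest_first) {
    if (!table.filter.MayContain(key)) continue;  // skipped without any I/O
    ++*disk_reads;
    auto it = table.entries.find(key);
    if (it != table.entries.end()) {
      *value = it->second;
      return true;
    }
  }
  return false;
}
```

In this model, a lookup for a key stored in only one sstable of the guard usually opens just that sstable. Bloom filter false positives can occasionally cause an extra read, but a filter never hides a key that is present.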
Sstable-level bloom filters can't be used to reduce disk reads for the `seek` operation, since `seek` has to examine all files within a guard even if a file doesn't contain the key. To tackle this challenge, PebblesDB does two optimizations:
- Parallel seeks: PebblesDB employs multiple threads to do the `seek()` operation on multiple files within a guard. Note that this optimization might only be helpful when the size of the data set is much larger than the RAM size, because otherwise the overhead of thread synchronization conceals the benefits obtained by using multiple threads. By default, this optimization is disabled. It can be enabled by uncommenting `#define SEEK_PARALLEL` in `db/version_set.h`.
- Forced compaction: When the workload is seek-heavy, PebblesDB can be configured to do a seek-based forced compaction which aims to reduce the number of files within a guard. This can lead to an increase in write IO, but this is a trade-off between write IO and seek throughput. By default, this optimization is enabled. It can be disabled by uncommenting `#define DISABLE_SEEK_BASED_COMPACTION` in `db/version_set.h`.
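The parallel seek idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `db/version_set.h`: each file is modelled as a sorted in-memory vector, one task is launched per file with `std::async`, and the empty string is assumed never to be a real key. `SortedFile`, `SeekInFile`, and `ParallelGuardSeek` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <string>
#include <vector>

using SortedFile = std::vector<std::string>;  // stands in for one sstable

// Smallest key >= target in one file, or "" if there is none.
std::string SeekInFile(const SortedFile& file, const std::string& target) {
  assert(std::is_sorted(file.begin(), file.end()));
  auto it = std::lower_bound(file.begin(), file.end(), target);
  return it == file.end() ? std::string() : *it;
}

// Seeks every file of a guard on its own task, then keeps the smallest hit.
std::string ParallelGuardSeek(const std::vector<SortedFile>& guard,
                              const std::string& target) {
  std::vector<std::future<std::string>> pending;
  for (const SortedFile& file : guard) {
    pending.push_back(std::async(std::launch::async, [&file, &target] {
      return SeekInFile(file, target);
    }));
  }
  std::string best;
  for (auto& result : pending) {
    std::string hit = result.get();  // waits for that file's seek to finish
    if (!hit.empty() && (best.empty() || hit < best)) best = hit;
  }
  return best;
}
```

Every file still has to be examined, so the total work is unchanged. The potential gain comes only from overlapping the per-file seeks, which is why the benefit depends on those seeks being slow enough (for example, hitting disk) to outweigh the cost of starting and joining threads.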
The amount of overhead PebblesDB has for read/seek workloads as well as the amount of gain it has for write workloads depends on a single parameter: `kMaxFilesPerGuardSentinel`, which determines the maximum number of sstables that can be present within a single guard. This parameter can be set in `db/dbformat.h` (default value: 2). Setting this parameter high will favor write throughput while setting it lower will favor read/seek throughputs.
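The trade-off can be made concrete with a deliberately simplified model of a single guard. The merge-in-place rule below is an assumption chosen for illustration and is not PebblesDB's actual compaction policy. `GuardCost` and `SimulateGuard` are hypothetical names, and `max_files` plays the role of `kMaxFilesPerGuardSentinel`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct GuardCost {
  std::size_t units_rewritten;     // data rewritten by merges (write cost)
  std::size_t max_files_per_seek;  // most files a seek would have to examine
};

// Adds `incoming` one-unit files to a single guard. Whenever the guard holds
// more than `max_files` files, all of them are merged into one file.
// ASSUMPTION: this rule is a simplification, not the real compaction logic.
GuardCost SimulateGuard(std::size_t incoming, std::size_t max_files) {
  assert(max_files >= 1);
  std::vector<std::size_t> file_sizes;
  GuardCost cost{0, 0};
  for (std::size_t i = 0; i < incoming; ++i) {
    file_sizes.push_back(1);
    if (file_sizes.size() > max_files) {
      std::size_t merged = std::accumulate(file_sizes.begin(),
                                           file_sizes.end(), std::size_t{0});
      cost.units_rewritten += merged;  // every merged unit is written again
      file_sizes.assign(1, merged);
    }
    // Seeks are assumed to run between merges, on the settled file count.
    cost.max_files_per_seek =
        std::max(cost.max_files_per_seek, file_sizes.size());
  }
  return cost;
}
```

In this model, adding 8 files with `max_files = 2` rewrites 15 units while a seek never examines more than 2 files. With `max_files = 4` only 5 units are rewritten, but a seek may examine up to 4 files. A higher limit costs less write IO and more work per seek, matching the direction described above.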
The Java Native Interface wrapper to PebblesDB is available here. Please follow the instructions specified under the "Running YCSB Workloads with PebblesDB" section for running the YCSB benchmarks.
The YCSB bindings for PebblesDB can be found here.
The following improvements were made to the codebase after the SOSP paper:
- Add CMake build system support (Zeyuan Hu @xxks-kkk)
- Add JNI Wrapper and support for running YCSB benchmarks (Abhijith Nair @abhijith97)
- Accounting for memory used by bloom filters (Karuna Grewal @aakp10)
Please contact us at vijay@cs.utexas.edu with any questions. Drop us a note if you are using or plan to use PebblesDB in your company or university.