The PebblesDB write-optimized key-value store (SOSP 17)

utsaslab/pebblesdb

PebblesDB is a write-optimized key-value store built using the novel FLSM (Fragmented Log-Structured Merge Tree) data structure. FLSM is a modification of the standard log-structured merge tree data structure that aims to achieve higher write throughput and lower write amplification without compromising read throughput.

PebblesDB is built by modifying HyperLevelDB which, in turn, is built on top of LevelDB. PebblesDB is API compatible with HyperLevelDB and LevelDB. Thus, PebblesDB is a drop-in replacement for LevelDB and HyperLevelDB. The source code is available on GitHub. The full paper on PebblesDB can be found here. The slides for the SOSP 17 talk, which explains the core ideas behind PebblesDB, can be found here.

If you are using LevelDB in your deployment, do consider trying out PebblesDB! PebblesDB can also be used to replace RocksDB as long as RocksDB-specific functionality such as column families is not used.

Please cite the following paper if you use PebblesDB: PebblesDB: Building Key-Value Stores using Fragmented Log-Structured Merge Trees. Pandian Raju, Rohan Kadekodi, Vijay Chidambaram, Ittai Abraham. SOSP 17. Bibtex

The benchmarks page has a list of experiments evaluating PebblesDB vs LevelDB, HyperLevelDB, and RocksDB. The summary is that PebblesDB outperforms the other stores on write throughput, equals other stores on read throughput, and incurs a penalty for small range queries on fully compacted key-value stores. PebblesDB achieves 6x the write throughput of RocksDB, while providing similar read throughput and performing 50% less IO. Please see the paper for more details.

If you would like to run MongoDB with PebblesDB as the storage engine, please check out mongo-pebbles, a modification of the mongo-rocks layer between RocksDB and MongoDB.


Dependencies

PebblesDB requires libsnappy and libtool. To install on Linux, please use sudo apt-get install libsnappy-dev libtool. For MacOSX, use brew install snappy and, instead of ldconfig, use update_dyld_shared_cache.

PebblesDB was built, compiled, and tested with g++-4.7, g++-4.9, and g++-5. It may not work with other versions of g++ and other C++ compilers.

Installation

Using Autotools:

$ cd pebblesdb/src
$ autoreconf -i
$ ./configure
$ make
$ make install
$ ldconfig

Using CMake:

$ mkdir -p build && cd build
$ cmake .. && make install -j16

Running microbenchmark

  1. cd pebblesdb/src/
  2. make db_bench (this only works if you are compiling using autotools, and have run autoreconf and configure before this step)
  3. ./db_bench --benchmarks=<list-of-benchmarks> --num=<number-of-keys> --value_size=<size-of-value-in-bytes> --reads=<number-of-reads> --db=<database-directory-path>
    A complete set of parameters can be found in db/db_bench.cc

Sample usage:
./db_bench --benchmarks=fillrandom,readrandom --num=1000000 --value_size=1024 --reads=500000 --db=/tmp/pebblesdbtest-1000

Use the filter benchmark to print filter policy statistics such as memory usage.

./db_bench --benchmarks=fillrandom,readrandom,filter --num=1000000 --value_size=1024 --reads=500000 --db=/tmp/pebblesdbtest-1000

    fillrandom   :     110.460 micros/op;    9.0 MB/s
    readrandom   :       4.120 micros/op; (5000 of 10000 found)
    Filter in-memory size: 0.024 MB
    Count of filters: 1928

Optimizations in PebblesDB

PebblesDB uses the FLSM data structure to logically arrange the sstables on disk. FLSM helps achieve high write throughput by reducing write amplification. But in FLSM, each guard can contain multiple overlapping sstables. Hence a read or seek over the database requires examining one guard (multiple sstables) per level, thereby increasing the read/seek latency. PebblesDB employs some optimizations to tackle these challenges as follows:

Read optimization

  • PebblesDB uses sstable-level bloom filters instead of the block-level bloom filters used in HyperLevelDB and LevelDB. With this optimization, even though a guard can contain multiple sstables, PebblesDB effectively reads only one sstable from disk per level.

  • By default, this optimization is turned on, but it can be disabled by commenting out the macro #define FILE_LEVEL_FILTER in db/version_set.h. Remember to run make db_bench after making a change.
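The effect of an sstable-level filter can be sketched with a toy model. The code below is only an illustration under stated assumptions: ToyBloom, ToySSTable, and GuardGet are hypothetical names that do not appear in the PebblesDB source, and the filter is a simplistic stand-in rather than the filter policy PebblesDB actually uses. It shows how a per-sstable filter lets a lookup skip most of the overlapping sstables in a guard without reading them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy Bloom filter (hypothetical; not PebblesDB's filter policy).
class ToyBloom {
 public:
  ToyBloom() : bits_(kBits, false) {}

  void Add(const std::string& key) {
    for (int i = 0; i < kProbes; ++i) bits_[Index(key, i)] = true;
  }

  // May report true for an absent key (false positive),
  // but never reports false for a key that was added.
  bool MayContain(const std::string& key) const {
    for (int i = 0; i < kProbes; ++i) {
      if (!bits_[Index(key, i)]) return false;
    }
    return true;
  }

 private:
  static const std::size_t kBits = 1024;
  static const int kProbes = 3;

  // Salt the key with the probe number to derive several hash values.
  static std::size_t Index(const std::string& key, int i) {
    return std::hash<std::string>()(key + '#' + std::to_string(i)) % kBits;
  }

  std::vector<bool> bits_;
};

// One sstable in the toy model: a sorted map stands in for the on-disk
// contents, and the filter is assumed to be held in memory.
struct ToySSTable {
  std::map<std::string, std::string> data;
  ToyBloom filter;

  void Put(const std::string& key, const std::string& value) {
    data[key] = value;
    filter.Add(key);
  }
};

// Looks up key in a guard (a set of possibly overlapping sstables).
// *sstables_read counts how many sstables had to be "read from disk".
bool GuardGet(const std::vector<ToySSTable>& guard, const std::string& key,
              std::string* value, int* sstables_read) {
  assert(value != nullptr && sstables_read != nullptr);
  *sstables_read = 0;
  for (std::size_t i = 0; i < guard.size(); ++i) {
    if (!guard[i].filter.MayContain(key)) continue;  // skipped, no disk read
    ++*sstables_read;
    std::map<std::string, std::string>::const_iterator it =
        guard[i].data.find(key);
    if (it != guard[i].data.end()) {
      *value = it->second;
      return true;
    }
  }
  return false;
}
```

In this model, a lookup reads an sstable only when its filter reports a possible match, so the number of sstables read stays close to one per guard as long as false positives are rare.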

Seek optimization

Sstable-level bloom filters can't be used to reduce disk reads for seek operations, since seek has to examine all files within a guard even if a file doesn't contain the key. To tackle this challenge, PebblesDB applies two optimizations:

  1. Parallel seeks: PebblesDB employs multiple threads to perform the seek() operation on multiple files within a guard. Note that this optimization might only be helpful when the data set is much larger than RAM, because otherwise the overhead of thread synchronization masks the benefits of using multiple threads. By default, this optimization is disabled. It can be enabled by uncommenting #define SEEK_PARALLEL in db/version_set.h.

  2. Forced compaction: When the workload is seek-heavy, PebblesDB can be configured to do a seek-based forced compaction, which aims to reduce the number of files within a guard. This can increase write IO; it is a trade-off between write IO and seek throughput. By default, this optimization is enabled. It can be disabled by uncommenting #define DISABLE_SEEK_BASED_COMPACTION in db/version_set.h.
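To see why a seek has to touch every file in a guard, consider the following minimal sketch. It assumes a toy model in which each sstable is just a sorted vector of keys; ToyFile and GuardSeek are hypothetical names, not part of the PebblesDB code. The sketch runs the per-file probes sequentially. Each probe is independent of the others, which is the property a multi-threaded seek can exploit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model: one sstable is a sorted vector of keys (hypothetical).
typedef std::vector<std::string> ToyFile;

// Finds the smallest key >= target across all files in a guard.
// Returns false if no file holds such a key; otherwise stores it in *result.
bool GuardSeek(const std::vector<ToyFile>& guard, const std::string& target,
               std::string* result) {
  assert(result != nullptr);
  bool found = false;
  for (const ToyFile& file : guard) {
    // Every file must be probed: a file that lacks the target itself
    // may still hold the closest following key.
    ToyFile::const_iterator it =
        std::lower_bound(file.begin(), file.end(), target);
    if (it == file.end()) continue;  // this file has nothing >= target
    if (!found || *it < *result) {
      *result = *it;
      found = true;
    }
  }
  return found;
}
```

The cost of a seek in this model grows with the number of files in the guard, which is also why reducing that number through compaction helps seek-heavy workloads.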


Tuning PebblesDB

  • The amount of overhead PebblesDB has for read/seek workloads, as well as the amount of gain it has for write workloads, depends on a single parameter: kMaxFilesPerGuardSentinel, which determines the maximum number of sstables that can be present within a single guard.

  • This parameter can be set in db/dbformat.h (default value: 2). Setting this parameter higher will favor write throughput, while setting it lower will favor read/seek throughput.
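The trade-off can be illustrated with a deliberately simplified simulation of a single guard. This is a rough sketch, not PebblesDB's compaction logic: GuardStats and SimulateGuard are hypothetical names, and the model assumes that the flush which would push a guard past its limit triggers a compaction that empties the guard.

```cpp
#include <algorithm>
#include <cassert>

// Counters collected by the toy simulation (hypothetical).
struct GuardStats {
  int compactions;     // how often the guard had to be compacted
  int max_files_seen;  // most files a read/seek would have to consider
};

// Simulates `flushes` sstables arriving in a single guard that may hold
// at most `max_files_per_guard` sstables.
// Toy assumption: the arrival that exceeds the limit triggers a compaction,
// which moves all of the guard's data out of the guard.
GuardStats SimulateGuard(int flushes, int max_files_per_guard) {
  assert(max_files_per_guard >= 1);
  GuardStats stats = {0, 0};
  int files = 0;
  for (int i = 0; i < flushes; ++i) {
    ++files;  // a new sstable lands in the guard
    if (files > max_files_per_guard) {
      ++stats.compactions;
      files = 0;
    }
    stats.max_files_seen = std::max(stats.max_files_seen, files);
  }
  return stats;
}
```

Under these assumptions, 12 flushes with a limit of 2 trigger 4 compactions and leave at most 2 files per guard, while a limit of 4 triggers only 2 compactions but leaves up to 4 files for a read or seek to examine.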


Running YCSB Benchmarks

The Java Native Interface wrapper to PebblesDB is available here. Please follow the instructions in the Running YCSB Workloads with PebblesDB section for running the YCSB benchmarks.

The YCSB bindings for PebblesDB can be found here.


Improvements made after the SOSP paper

The following improvements have been made to the codebase since the SOSP paper:

  • Add CMake build system support (Zeyuan Hu @xxks-kkk)
  • Add JNI Wrapper and support for running YCSB benchmarks (Abhijith Nair @abhijith97)
  • Accounting for memory used by bloom filters (Karuna Grewal @aakp10)

Contact

Please contact us at vijay@cs.utexas.edu with any questions. Drop us a note if you are using or plan to use PebblesDB in your company or university.

