- Notifications
You must be signed in to change notification settings - Fork349
Succinct Data Structure Library 2.0
License
simongog/sdsl-lite
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
The Succinct Data Structure Library (SDSL) is a powerful and flexible C++11library implementing succinct data structures. In total, the library containsthe highlights of 40research publications. Succinct data structurescan represent an object (such as a bitvector or a tree) in space close to theinformation-theoretic lower bound of the object while supporting operationsof the original object efficiently. The theoretical time complexity of anoperation performed on the classical data structure and the equivalentsuccinct data structure are (most of the time) identical.
Succinct data structures have very attractive theoretical properties. However,in practice implementing succinct data structures is non-trivial as they areoften composed of complex operations on bitvectors. The SDSL Library provideshigh quality, open source implementations of many succinct data structuresproposed in literature.
Specifically, the aim of the library is to provide basic and complex succinctdata structure which are
- Easy and intuitive to use (like theSTL, which provides classical data structures),
- Faithful to the original theoretical results,
- Capable of handling large inputs (yes, we support 64-bit),
- Provide efficient construction of all implemented succinct data structures,while at the same time enable good run-time performance.

In addition we provide additional functionality which can help you use succinctdata structure to their full potential.
- Each data structure can easily be serialized and loaded to/from disk.
- We provide functionality which helps you analyze the storage requirements of anySDSL based data structure (see right)
- We support features such as hugepages and tracking the memory usage of eachSDSL data structure.
- Complex structures can be configured by template parameters and thereforeeasily be composed. There exists one simple method which constructsall complex structures.
- We maintain an extensive collection of examples which help you use the differentfeatures provided by the library.
- All data structures are tested for correctness using a unit-testing framework.
- We provide a large collection of supporting documentation consisting of examples,cheat sheet,tutorial slides and walk-through.
The library contains many succinct data structures from the following categories:
- Bitvectors supporting Rank and Select
- Integer Vectors
- Wavelet Trees
- Compressed Suffix Arrays (CSA)
- Balanced Parentheses Representations
- Longest Common Prefix (LCP) Arrays
- Compressed Suffix Trees (CST)
- Range Minimum/Maximum Query (RMQ) Structures
For a complete overview including theoretical bounds see thecheat sheet or thewiki.
We provide an extensive set of documentation describing all data structuresand features provided by the library. Specifically we provide
- Acheat sheet which succinctlydescribes the usage of the library.
- An doxygen generatedAPI reference which lists all types and functionsof the library.
- A set ofexample programs demonstrating how different featuresof the library are used.
- A tutorialpresentation with theexample code using in thesides demonstrating all features of the library in a step-by-step walk-through.
- Unit Tests which contain small code snippets used to test eachlibrary feature.
The SDSL library requires:
- A modern, C++11 ready compiler such as
g++
version 4.9 or higher orclang
version 3.2 or higher. - Thecmake build system.
- A 64-bit operating system. Either Mac OS X or Linux are currently supported.
- For increased performance the processor of the system should support fast bit operations available in
SSE4.2
To download and install the library use the following commands.
git clone https://github.com/simongog/sdsl-lite.gitcd sdsl-lite./install.sh
This installs the sdsl library into theinclude
andlib
directories in yourhome directory. A different location prefix can be specified as a parameter oftheinstall.sh
script:
./install.sh /usr/local/
To build a portable sdsl library without usingSSE4.2
andAVX2
instructions, setBUILD_PORTABLE
at build time, e.g.BUILD_PORTABLE=1 ./install.sh
ormkdir build && BUILD_PORTABLE=1 cmake ..
.These instructions are enabled by default if the processor of the build system supports them.
To remove the library from your system use the provided uninstall script:
./uninstall.sh
There is also aGentoo Ebuild for SDSL by Mathias Weller.
To get you started with the library you can start by compiling the followingsample program which constructs a compressed suffix array (a FM-Index) over thetextmississippi!
, counts the number of occurrences of patternsi
andstores the data structure, and a space usage visualization to thefilesfm_index-file.sdsl
andfm_index-file.sdsl.html
:
#include<sdsl/suffix_arrays.hpp>#include<fstream>usingnamespacesdsl;intmain() { csa_wt<> fm_index;construct_im(fm_index,"mississippi!",1); std::cout <<"'si' occurs" <<count(fm_index,"si") <<" times.\n";store_to_file(fm_index,"fm_index-file.sdsl"); std::ofstreamout("fm_index-file.sdsl.html"); write_structure<HTML_FORMAT>(fm_index,out);}
To compile the program usingg++
run:
g++ -std=c++11 -O3 -DNDEBUG -I~/include -L~/lib program.cpp -o program -lsdsl -ldivsufsort -ldivsufsort64
Next we suggest you look at the comprehensivetutorial which describesall major features of the library or look at some of the providedexamples.
Implementing succinct data structures can be tricky. To ensure that all datastructures behave as expected, we created a large collection of unit testswhich can be used to check the correctness of the library on your computer.Thetest directory contains test code. We usegoogletestframework andmake to run the tests. See the README file in thedirectory for details.
To simply run all unit tests after installing the library type
cd sdsl-lite/buildmake test-sdsl
Note: Running the tests requires several sample files to be downloaded from the weband can take up to 2 hours on slow machines.
To ensure the library runs efficiently on your system we suggest you run ourbenchmark suite. The benchmark suite recreates apopularexperimental study which you candirectly compare to the results of your benchmark run.
While we use an extensive set of unit tests and test coverage tools you mightstill find bugs in the library. We encourage you to report any problems withthe library via thegithub issue tracking systemof the project.
The latest version can be found on the SDSL github project pagehttps://github.com/simongog/sdsl-lite .
If you are running experiments in an academic settings we suggest you use themost recentreleased versionof the library. This allows others to reproduce your experiments exactly.
The SDSL library is free software provided under the GNU General Public License(GPLv3). For more information see theCOPYING file in the librarydirectory.
We distribute this library freely to foster the use and development of advanceddata structure. If you use the library in an academic setting please cite thefollowing paper:
@inproceedings{gbmp2014sea, title = {From Theory to Practice: Plug and Play with Succinct Data Structures}, author = {Gog, Simon and Beller, Timo and Moffat, Alistair and Petri, Matthias}, booktitle = {13th International Symposium on Experimental Algorithms, (SEA 2014)}, year = {2014}, pages = {326-337}, ee = {http://dx.doi.org/10.1007/978-3-319-07959-2_28}}
A preliminary version is availablehere on arxiv.
We have included the code of two excellent suffix arrayconstruction algorithms.
- Yuta Mori's incredible fast suffixlibdivsufsortalgorithm for byte-alphabets.
- An adapted version ofJesper Larsson'simplementation ofsuffix array sorting on integer-alphabets (description ofLarsson and Sadakane).
Additionally, we use thegoogletest framework to provide unit tests.Our visualizations are implemented using thed3js-library.
The main contributors to the library are:
This project is also supported by code contributionsfrom other researchers. E.g. Juha Kärkkäinen,Dominik Kempa,and Simon Puglisi contributed a compressed bitvectorimplementation (hyb_vector).This project further profited from excellent input of our studentsMarkus Brenner, Alexander Diehm, Christian Ocker, and Maike Zwerger. StefanArnold helped us with tricky template questions. We are also grateful toDiego Caro,Travis Gagie,Kalle Karhu,Bruce Kuo,Jan Kurrus,Shanika Kuruppu,Jouni Siren,andJulio Vizcainofor bug reports.
Are you working on a new or improved implementation of a succinct data structure?We encourage you to contribute your implementation to the SDSL library to makeyour work accessible to the community within the existing library framework.Feel free to contact any of the authors or create an issue on theissue tracking system.
About
Succinct Data Structure Library 2.0