AtomSpace Graph Database RocksDB backend
opencog/atomspace-rocks
Save and restore AtomSpace contents, as well as individual Atoms, to a RocksDB database. The RocksDB database is a high-performance, zero-configuration, single-user, local-host-only, file-backed database. It provides top-notch read and write performance, which is the #1 reason you should be interested in using it. But please note: only one running AtomSpace executable can connect to it at any given moment. Multi-user, networked AtomSpaces are provided by the AtomSpace-Cog `CogStorageNode` driver.
In ASCII-art:
    +---------------------+
    |                     |
    |      AtomSpace      |
    |                     |
    +-- StorageNode API --+
    |                     |
    |  RocksStorageNode   |
    |                     |
    +---------------------+
    |       RocksDB       |
    +---------------------+
    |     filesystem      |
    +---------------------+
Each box is a shared library. Library calls go downwards. The StorageNode API is the same for all `StorageNode`s; the `RocksStorageNode` is just one of them.
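To illustrate what "the same API for all StorageNodes" buys you, here is a minimal Python sketch. It is not the real C++ `StorageNode` class: the class names `StorageSketch`, `SetBackend` and `LogBackend`, and the method names, are all hypothetical, chosen only to mirror the open/store/load/close pattern used elsewhere in this README.

```python
from abc import ABC, abstractmethod


class StorageSketch(ABC):
    """Hypothetical stand-in for a common storage API (not the real StorageNode)."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def store_atom(self, atom): ...

    @abstractmethod
    def load_all(self): ...


class SetBackend(StorageSketch):
    """Toy backend that keeps atoms in an in-memory set."""

    def __init__(self):
        self._atoms = set()

    def open(self):
        pass

    def close(self):
        pass

    def store_atom(self, atom):
        self._atoms.add(atom)

    def load_all(self):
        return set(self._atoms)


class LogBackend(StorageSketch):
    """Toy backend that keeps an append-only log of stored atoms."""

    def __init__(self):
        self._log = []

    def open(self):
        pass

    def close(self):
        pass

    def store_atom(self, atom):
        self._log.append(atom)

    def load_all(self):
        return set(self._log)


def round_trip(backend, atoms):
    """Caller code written once against the shared API, for any backend."""
    backend.open()
    for atom in atoms:
        backend.store_atom(atom)
    loaded = backend.load_all()
    backend.close()
    return loaded


atoms = ["(Concept \"cat\")", "(Concept \"dog\")"]
same = round_trip(SetBackend(), atoms) == round_trip(LogBackend(), atoms)
```

The caller (`round_trip`) never needs to know which backend it was handed, which is the property the diagram above describes for the layers beneath the StorageNode API.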
RocksDB (see https://rocksdb.org/) is an "embeddable persistent key-value store for fast storage." The goal of layering the AtomSpace on top of it is to provide fast persistent storage for the AtomSpace. There are several advantages to doing this:
- RocksDB is file-based, and so it is straightforward to make backup copies of datasets, as well as to share these copies with others. (You don't need to be a DB Admin to do this!)
- RocksDB runs locally, and so the overhead of pushing bytes through the network is eliminated. The remaining inefficiencies/bottlenecks have to do with converting between the AtomSpace's natural in-RAM format, and the position-independent format that all databases need. (Here, we say "position-independent" in that the DB format does not contain any C/C++ pointers; all references are managed with local unique IDs.)
- RocksDB is a "real" database, and so enables the storage of datasets that might not otherwise fit into RAM. This back-end does not try to guess what your working set is; it is up to you to load, work with and save those Atoms that are important for you. The examples demonstrate exactly how that can be done.
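The "position-independent" idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the on-disk format this backend actually uses: the record layout (`node ...` / `link ...` strings), the tuple encoding of atoms, and the function names `store` and `load` are all hypothetical. The point is that in-memory object references are replaced by small integer IDs in a key-value table.

```python
def store(atom, kv, ids):
    """Write `atom` into the dict `kv`; return its local unique ID.

    An atom is modelled as (type, name) for a node, or
    (type, (child, child, ...)) for a link. These are assumptions of the sketch.
    """
    if atom in ids:                      # already stored: reuse its ID
        return ids[atom]
    atom_type, payload = atom
    if isinstance(payload, str):         # a node: type plus name
        fields = ["node", atom_type, payload]
    else:                                # a link: children referenced by ID only
        child_ids = [store(child, kv, ids) for child in payload]
        fields = ["link", atom_type] + [str(i) for i in child_ids]
    new_id = len(ids) + 1                # next unused local ID
    ids[atom] = new_id
    kv[new_id] = " ".join(fields)        # the record holds no pointers
    return new_id


def load(atom_id, kv):
    """Rebuild the in-memory atom from its pointer-free record."""
    kind, atom_type, *rest = kv[atom_id].split(" ")
    if kind == "node":
        return (atom_type, " ".join(rest))
    return (atom_type, tuple(load(int(i), kv) for i in rest))


cat = ("Concept", "cat")
animal = ("Concept", "animal")
link = ("Inheritance", (cat, animal))

kv, ids = {}, {}
link_id = store(link, kv, ids)
restored = load(link_id, kv)
```

After the call, `kv` contains only integers and strings, so it could be written to any key-value store and read back in a different process; the conversion in both directions is the overhead the second bullet refers to.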
This backend, together with the CogServer-based network AtomSpace backend, provides a building block out of which more complex distributed and/or decentralized AtomSpaces can be built.
This is Version 1.5.1. All unit tests pass. It has been used in at least one major project, to process tens of millions of Atoms.
This code is 2x or 3x faster than the PostgresStorageNode on synthetic benchmarks, and has been observed to run 12x faster in a real-world application. At least half of this performance difference can be explained by the fact that the PostgresStorageNode is old and has a sub-optimal design. Someone should port the Rocks code here to create a new, better Postgres StorageNode.
The build and install of `atomspace-rocks` follows the same pattern as other AtomSpace projects.
RocksDB is a prerequisite. On Debian/Ubuntu, `apt install librocksdb-dev`
Then build, install and test:
    cd atomspace-rocks    # the project directory
    mkdir build
    cd build
    cmake ..
    make -j4
    sudo make install
    make check
See the examples directory for details. In brief:
    $ guile
    scheme@(guile-user)> (use-modules (opencog))
    scheme@(guile-user)> (use-modules (opencog persist))
    scheme@(guile-user)> (use-modules (opencog persist-rocks))
    scheme@(guile-user)> (define sto (RocksStorageNode "rocks:///tmp/foo.rdb/"))
    scheme@(guile-user)> (cog-open sto)
    scheme@(guile-user)> (load-atomspace)
    scheme@(guile-user)> (cog-close sto)
That's it! You've loaded the entire contents of `foo.rdb` into the AtomSpace! Of course, loading everything is not generally desirable, especially when the file is huge and RAM space is tight. More granular load and store is possible; see the examples directory for details.
There are two implementations in this repo: a simple one, suitable for users who use only a single AtomSpace, and a more sophisticated one, intended for users who need to work with complex DAGs of AtomSpaces. These are accessed by using `MonoStorageNode` or `RocksStorageNode`, respectively. Both use the standard `StorageNode` API.
The implementation of `MonoStorageNode` is smaller and simpler, and is the easier of the two to understand.
The implementation of `RocksStorageNode` provides full support for deep stacks (DAGs) of AtomSpaces, layered one on top of another (called "Frames", a name meant to suggest "Kripke Frames" and "stack frames"). An individual "frame" can be thought of as a change-set, a collection of deltas to the next frame further down in the DAG. A frame inheriting from multiple AtomSpaces contains the set-union of Atoms in the contributing AtomSpaces. Atoms and Values can be added, changed and removed in each change-set, without affecting Atoms and Values in deeper frames.
This is a minimalistic implementation. There has been no performance tuning. There's only just enough code to make everything work; that's it. This does nothing at all fancy/sophisticated with RocksDB, and it might be possible to improve performance and squeeze out some air. However, the code is not sloppy, so it might be hard to make it go faster.
If you are creating a new StorageNode for some other kind of database, using the code here as a starting point would be an excellent design choice. All the hard problems have been solved, and yet the overall design remains fairly simple. All you'd need to do is replace the references to RocksDB with references to your preferred DB.