LumoSQL is a modification (not a fork) of theSQLite embedded data storage library, which is among themost-deployed software. We are currently inPhase II of the project.
If you are reading this on GitHub you are looking at a read-only mirror.The master is always available atlumosql.org.LumoSQL adds security, privacy, performance and measurement features to SQLite.
SQLite can test and compare results consistently across many kinds of system and configurations using theNot-forking tool. Example:
LumoSQL can swap back end key-value store engines in and out of SQLite. LMDB isthe most famous (but not the only) example of an alternative key-value store,and LumoSQL can combine dozes of versions of LMDB and SQLite source code likethis:
In LumoSQL 0.4 there are three LumoSQL backends:
We are looking at some interesting new development in key-value storage to add and benchmark.
LumoSQL adds modern encryption to SQLite, includingAttribute-Based Encryption(ABE). This can bedone on a per-row basis, and also includes per-row checksums so that any errorcan be noticed quickly and located down to the individual row. Per-rowchecksums also make some search and comparison operations much faster.
LumoSQL is distributed undervery liberal MIT licence terms.
LumoSQL is supported by theNLNet Foundation.
LumoSQL runs on x86, ARM-32 and RISC-V architectures, and many Linux and BSD OSs.
If you are reading this on Github, then you are looking at a mirror. LumoSQL isis maintained usingthe Fossil repository. If you want to participate in LumoSQL there is a forum, and if you have code contributionsyou can ask for access to the respository.
LumoSQL has multiple upstreams, but does not fork any of them despite needing modifications.The novelNot-forking tool semi-automatically tracks upstream changes and is a requirement for building LumoSQL. Between not-forking and theLumoSQL Build and Benchmark System,LumoSQL is as much about combining and configuring upstreams as it is about creatingoriginal database software. By maintaining Not-forking outside LumoSQL, we hopeother projects will find it useful.
The LumoSQL and SQLite projects are cooperating, so any merge friction isexpected to become less over time, and key to that is the approach of notforking.
LumoSQL exists to demonstrate changes to SQLite that might be useful, but whichSQLite probably cannot consider for many years because of SQLite's uniqueposition of being used by a majority of the world's population.
SQLite is used by thousands of software projects, just three beingGoogle's Android, Mozilla's Firefox and Apple's iOS which between them havebillions of users. That is a main reason why SQLite is so careful and conservativewith all changes.
On the other hand, many of these same users need SQLite to have new featureswhich do not fit with the SQLite project's cautious approach, and LumoSQL is ademonstration of some of these improvements.
The LumoSQL documentation project reviews dozens of relevant codebases. SQLitehas become ubiquitous over two decades, which means there is a great deal ofpreparation needed when considering architectural changes.
As of LumoSQL 0.4, there are many obvious limitations, including:
Most developers already have the required minimum of git and core unix-styledevelopment tools. SQLite has very few dependencies (mostly Tcl), andLumoSQL adds one Perl-based processing tool.
LumoSQL is mirrored to Github and application developers can use gitwith Github in the usual way. LumoSQL developers working on the LumoSQLlibrary internals choose to useFossil source codemanager instead of git, and if you're planningto develop LumoSQL internals then you need Fossil.
There are manyreasons why people chooseFossil.For LumoSQL one of them is that SQLite and Fossil are symbioticprojects, each written in the other.
Uncomment existingdeb-src
line in /etc/apt/sources.list, for examplefor Ubuntu 20.04.2 a valid line is:deb-src http://gb.archive.ubuntu.com/ubuntu focal main restricted
Then runsudo apt update # this fetches the deb-src updatessudo apt full-upgrade # this gets the latest OS updatessudo apt install git build-essential tclxsudo apt build-dep sqlite3
Theexact commands above have been tested on a pristine install of Ubuntu20.04.2 LTS, as installed from ISO or one of the operating systems shipped withWindows Services for Linux.
On any reasonably recent Fedora-derived Linux distribution, including Red Hat:
sudo dnf install --assumeyes \ git make gcc ncurses-devel readline-devel glibc-devel autoconf tcl-devel tclx-devel
Once you have done the setup specific to your operating system in the previoussteps, the following should work on reaonably recent Debian and Fedora-relatedoperating systems, and Gentoo.
Other required tools can be installed from your operating system's standard packages.Here are the tool dependencies:
The not-forking tool will advise you with a messageif you need a tool or a version that is not installed.
OnDebian 10 "Buster" Stable Release, the not-forking makefile("perl Makefile.PL") will warn that git needs to be version 2.22 or higher.Buster has version 2.20, however this is not a critical error. If you don'tlike error messages scrolling past during a build, then install a more recentgitfrom Buster backports.
This is a very brief quickstart, for full detail see theBuild and Benchmark System documentation.
Now you have the dependencies installed, clone the LumoSQL repository usingfossil clone https://lumosql.org/src/lumosql
, which will create a new subdirectory calledlumosql
anda file calledlumosql.fossil
in the current directory.
Try:cd lumosqlmake what
To see what the default sources and options are. Thewhat
target does not make any changes although it may generate a fileMakefile.options
to helpmake
parse the command line.
Benchmarking a single binary should take no longer than 4 minutes to complete dependingon hardware. The results are stored in an SQLite database stored in the LumoSQL top-level directory by default, that is, the directory you just created usingfossil clone
.
Start by building and benchmarking the official SQLite release version 3.35.5, which is the currentrelease at the time of writing this README.
make benchmark USE_LMDB=no USE_BDB=no SQLITE_VERSIONS='3.35.5'
All source files fetched are cached in ~/.cache/LumoSQL in a way that maximises reuse regardless of their origin (Fossil, git, wget etc) and which minimises errors. The LumoSQL build system is driving thenot-fork
tool, which maintains the cache. Not-fork will download just the differences of a remote version if most of the code is already in cache.
The output from this make command will be lots of build messages followed by something like this:
*** Running benchmark 3.35.5 TITLE = sqlite 3.35.5 SQLITE_ID = 1b256d97b553a9611efca188a3d995a2fff71275 SQLITE_NAME = 3.35.5 2021-04-19 18:32:05 1b256d97b553a9611efca188a3d995a2fff712759044ba480f9a0c9e98faalt1 DATASIZE = 1 DEBUG = off LMDB_DEBUG = off LMDB_FIXED_ROWID = off LMDB_TRANSACTION = optimistic ROWSUM = off ROWSUM_ALGORITHM = sha3_256 SQLITE3_JOURNAL = default RUN_ID = 70EA47101F68CDD6D3C0ED255962A2AA50F1540EE4FEBB46A03FAD888B49676C OK 0.003 1 Creating database and tables OK 0.019 2 1000 INSERTs OK 0.007 3 100 UPDATEs without an index, upgrading a read-only transaction OK 0.052 4 25000 INSERTs in a transaction OK 0.113 5 100 SELECTs without an index OK 0.243 6 100 SELECTs on a string comparison OK 0.012 7 Creating an index OK 0.046 8 5000 SELECTs with an index OK 0.036 9 1000 UPDATEs without an index OK 0.113 10 25000 UPDATEs with an index OK 0.093 11 25000 text UPDATEs with an index OK 0.032 12 INSERTs from a SELECT OK 0.020 13 DELETE without an index OK 0.028 14 DELETE with an index OK 0.027 15 A big INSERT after a big DELETE OK 0.010 16 A big DELETE followed by many small INSERTs OK 0.005 17 DROP TABLE 0.859 (total time)
A database with the default name ofbenchmarks.sqlite
has been created withtwo tables containing the results. This is one single test run, and the testrun data is kept in the tabletest_data
. The tablerun_data
contains datarelative to a set of runs (version numbers, time test started, etc). This is cumulative,so another invocation ofmake benchmark
will append tobenchmarks.sqlite
.
Every run is assigned a SHA3 hash, which helps in making results persistent over time and across the internet.
The toolbenchmark-filter.tcl
does some basic processing of these results:
tool/benchmark-filter.tclRUN_ID TARGET DATE TIME DURATION70EA47101F68CDD6D3C0ED255962A2AA50F1540EE4FEBB46A03FAD888B49676C 3.35.5 2021-05-20 16:13:18 0.859
The option DATASIZE=parameter is a multiplication factor on the size of the chunks that is used for benchmarking. This is useful because it can affect the time it takes to run the tests by a very differentmultiplication factor:
make benchmark USE_LMDB=no USE_BDB=no DATASIZE=2 SQLITE_VERSIONS='3.35.5 3.33.0'
followed by:
tool/benchmark-filter.tcl RUN_ID TARGET DATE TIME DURATION70EA47101F68CDD6D3C0ED255962A2AA50F1540EE4FEBB46A03FAD888B49676C 3.35.5 2021-05-20 16:13:18 0.85965DD0759B133FF5DFBBD04C494F4631E013C64E475FC5AC06EC70F4E0333372F 3.35.5++datasize-2 2021-05-20 16:18:30 2.511931B1489FC4477A41914A5E0AFDEF3927C306339FBB863B5FB4CF801C8F2F3D0 3.33.0++datasize-2 2021-05-20 16:18:51 2.572
Simplistically, these results suggest that SQLite version 3.35.5 is faster than3.33.0 on larger data sizes, but that 3.35.5 is much faster with smaller datasizes. After adding more versions and running the benchmarking tool again, we wouldsoon discover that SQLite 3.25.0 seems faster than 3.33.0, and other interesting things. Simplistic interpretations can be misleading :-)
This is a Quickstart, so for full detail you will need theBuild/Benchmark documentation. However as a teaser, and since LMDBwas the original inspiration for LumoSQL (see theHistory section below for more on that) here are some more things thatcan be done with the LMDB target:
$ make what LMDB_VERSIONS=alltclsh tool/build.tcl what not-fork.d MAKE_COMMAND='make' LMDB_VERSIONS='all'BENCHMARK_RUNS=1COPY_DATABASES=COPY_SQL=MAKE_COMMAND=makeNOTFORK_COMMAND=not-forkNOTFORK_ONLINE=0NOTFORK_UPDATE=0SQLITE_VERSIONS=3.35.5USE_SQLITE=yesUSE_BDB=yesSQLITE_FOR_BDB=BDB_VERSIONS=BDB_STANDALONE=18.1.32=3.18.2USE_LMDB=yesSQLITE_FOR_LMDB=3.35.5LMDB_VERSIONS=allLMDB_STANDALONE=OPTION_DATASIZE=1OPTION_DEBUG=offOPTION_LMDB_DEBUG=offOPTION_LMDB_FIXED_ROWID=offOPTION_LMDB_TRANSACTION=optimisticOPTION_ROWSUM=offOPTION_ROWSUM_ALGORITHM=sha3_256OPTION_SQLITE3_JOURNAL=defaultBUILDS= 3.35.5 3.18.2 +bdb-18.1.32 3.35.5+lmdb-0.9.11 3.35.5+lmdb-0.9.12 3.35.5+lmdb-0.9.13 3.35.5+lmdb-0.9.14 3.35.5+lmdb-0.9.15 3.35.5+lmdb-0.9.16 3.35.5+lmdb-0.9.17 3.35.5+lmdb-0.9.18 3.35.5+lmdb-0.9.19 3.35.5+lmdb-0.9.20 3.35.5+lmdb-0.9.21 3.35.5+lmdb-0.9.22 3.35.5+lmdb-0.9.23 3.35.5+lmdb-0.9.24 3.35.5+lmdb-0.9.25 3.35.5+lmdb-0.9.26 3.35.5+lmdb-0.9.27 3.35.5+lmdb-0.9.28 3.35.5+lmdb-0.9.29TARGETS= 3.35.5 3.18.2 +bdb-18.1.32 3.35.5+lmdb-0.9.11 3.35.5+lmdb-0.9.12 3.35.5+lmdb-0.9.13 3.35.5+lmdb-0.9.14 3.35.5+lmdb-0.9.15 3.35.5+lmdb-0.9.16 3.35.5+lmdb-0.9.17 3.35.5+lmdb-0.9.18 3.35.5+lmdb-0.9.19 3.35.5+lmdb-0.9.20 3.35.5+lmdb-0.9.21 3.35.5+lmdb-0.9.22 3.35.5+lmdb-0.9.23 3.35.5+lmdb-0.9.24 3.35.5+lmdb-0.9.25 3.35.5+lmdb-0.9.26 3.35.5+lmdb-0.9.27 3.35.5+lmdb-0.9.28 3.35.5+lmdb-0.9.29
After executing this build withmake benchmark
rather thanmake what
, here are summary results using a a new parameter tobenchmark-filter.tcl
:
$ tool/benchmark-filter.tcl -fields TARGET,DURATIONTARGET DURATION3.35.5 0.8523.35.5+lmdb-0.9.11 1.2013.35.5+lmdb-0.9.12 1.2113.35.5+lmdb-0.9.13 1.2123.35.5+lmdb-0.9.14 1.2193.35.5+lmdb-0.9.15 1.1933.35.5+lmdb-0.9.16 1.1913.35.5+lmdb-0.9.17 1.2133.35.5+lmdb-0.9.18 1.2173.35.5+lmdb-0.9.19 1.2093.35.5+lmdb-0.9.20 1.2233.35.5+lmdb-0.9.21 1.2293.35.5+lmdb-0.9.22 1.2303.35.5+lmdb-0.9.23 1.2153.35.5+lmdb-0.9.24 1.2183.35.5+lmdb-0.9.25 1.2193.35.5+lmdb-0.9.26 1.2203.35.5+lmdb-0.9.27 1.2203.35.5+lmdb-0.9.28 1.2093.35.5+lmdb-0.9.29 1.209
Again, simplistic interpretations are insufficient, but the data here suggests that LMDB has decreasedin performance over time, to improve again with the most recent versions, and no version of LMDB is faster than native SQLite 3.35.5 . However, furtherbenchmark runs indicate that is not the final story, as LMDB run on slower hard disks improve in relative speed rapidly. And using theDATASIZE
option also changes the picture.
The results for the Berkely DB backend are also most interesting.
There have been several implementations of new storage backends to SQLite, all of them hard forksand nearly all dead forks. A backend needs certain characteristics:
There are not many candidate key-value stores. One of the most widely-used isHoward Chu's LMDB. There was a lot of attention in 2013 when Howard releasedhisproof of concept SQLite port. LMDBoperates on a very different and more modern principle to all other widely-usedkey/value stores, potentially bringing benefits to some users of SQLite. In2013, the ported SQLite gave significant performance benefits.
The original 2013 code modified the SQLitebtree.c
from version SQLiteversion 3.7.17 to use LMDB 0.9.9 . It took considerable work for LumoSQL toexcavate the ancient code and reproduce the results.
By January 2020 the LumoSQL project concluded:
Since then, many new possibilities have emerged for LumoSQL, and new collaborations.