BadgerDB is an embeddable, persistent and fast key-value (KV) database written in pure Go. It is the underlying database for Dgraph, a fast, distributed graph database. It's meant to be a performant alternative to non-Go-based key-value stores like RocksDB.
Use Discuss Issues for reporting issues about this repository.
Badger is stable and is being used to serve data sets worth hundreds of terabytes. Badger supports concurrent ACID transactions with serializable snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for 8h with the --race flag and ensures the maintenance of transactional guarantees. Badger has also been tested to work with filesystem level anomalies, to ensure persistence and consistency. Badger is being used by a number of projects which include Dgraph, Jaeger Tracing, UsenetExpress, and many more.
The list of projects using Badger can be found here.
Badger v1.0 was released in Nov 2017, and the latest version that is data-compatible with v1.0 is v1.6.0.
Badger v2.0 was released in Nov 2019 with a new storage format which won't be compatible with all of the v1.x versions. Badger v2.0 supports compression, encryption and uses a cache to speed up lookups.
The Changelog is kept fairly up-to-date.
For more details on our version naming schema please read Choosing a version.
- Getting Started
- Resources
- Contact
- Design
- Projects Using Badger
- Contributing
- Frequently Asked Questions
To start using Badger, install Go 1.12 or above and run go get:
$ go get github.com/dgraph-io/badger/v2
This will retrieve the library and install the badger command line utility into your $GOBIN path.
Note: Badger does not directly use CGO but it relies on https://github.com/DataDog/zstd for compression, and that library requires gcc/cgo. If you wish to use badger without gcc/cgo, you can run CGO_ENABLED=0 go get github.com/dgraph-io/badger/... which will download badger without support for the ZSTD compression algorithm.
BadgerDB is a special package in the sense that the most important changes we can make to it are not to its API but to how data is stored on disk.
This is why we follow a version naming schema that differs from Semantic Versioning.
- New major versions are released when the data format on disk changes in an incompatible way.
- New minor versions are released whenever the API changes but data compatibility is maintained. Note that the changes to the API could be backward-incompatible - unlike Semantic Versioning.
- New patch versions are released when there are no changes to the data format or the API.
Following these rules:
- v1.5.0 and v1.6.0 can be used on top of the same files without any concerns, as their major version is the same, therefore the data format on disk is compatible.
- v1.6.0 and v2.0.0 are data incompatible as their major version implies, so files created with v1.6.0 will need to be converted into the new format before they can be used by v2.0.0.
For a longer explanation on the reasons behind using a new versioning naming schema, you can read VERSIONING.md.
The top-level object in Badger is a DB. It represents multiple files on disk in specific directories, which contain the data for a single database.
To open your database, use the badger.Open() function with the appropriate options. The Dir and ValueDir options are mandatory and must be specified by the client. They can be set to the same value to simplify things.
```go
package main

import (
  "log"

  badger "github.com/dgraph-io/badger/v2"
)

func main() {
  // Open the Badger database located in the /tmp/badger directory.
  // It will be created if it doesn't exist.
  db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
  if err != nil {
    log.Fatal(err)
  }
  defer db.Close()
  // Your code here…
}
```
Please note that Badger obtains a lock on the directories so multiple processes cannot open the same database at the same time.
By default, Badger ensures all the data is persisted to disk. It also supports a pure in-memory mode. When Badger is running in in-memory mode, all the data is stored in memory. Reads and writes are much faster in in-memory mode, but all the data stored in Badger will be lost in case of a crash or close. To open badger in in-memory mode, set the InMemory option.
```go
opt := badger.DefaultOptions("").WithInMemory(true)
```
To start a read-only transaction, you can use the DB.View() method:
```go
err := db.View(func(txn *badger.Txn) error {
  // Your code here…
  return nil
})
```
You cannot perform any writes or deletes within this transaction. Badger ensures that you get a consistent view of the database within this closure. Any writes that happen elsewhere after the transaction has started will not be seen by calls made within the closure.
To start a read-write transaction, you can use the DB.Update() method:
```go
err := db.Update(func(txn *badger.Txn) error {
  // Your code here…
  return nil
})
```
All database operations are allowed inside a read-write transaction.
Always check the returned error value. If you return an error within your closure it will be passed through.
An ErrConflict error will be reported in case of a conflict. Depending on the state of your application, you have the option to retry the operation if you receive this error.
An ErrTxnTooBig will be reported in case the number of pending writes/deletes in the transaction exceeds a certain limit. In that case, it is best to commit the transaction and start a new transaction immediately. Here is an example (we are not checking for errors in some places for simplicity):
```go
updates := make(map[string]string)
txn := db.NewTransaction(true)
for k, v := range updates {
  if err := txn.Set([]byte(k), []byte(v)); err == badger.ErrTxnTooBig {
    _ = txn.Commit()
    txn = db.NewTransaction(true)
    _ = txn.Set([]byte(k), []byte(v))
  }
}
_ = txn.Commit()
```
The DB.View() and DB.Update() methods are wrappers around the DB.NewTransaction() and Txn.Commit() methods (or Txn.Discard() in case of read-only transactions). These helper methods will start the transaction, execute a function, and then safely discard your transaction if an error is returned. This is the recommended way to use Badger transactions.
However, sometimes you may want to manually create and commit your transactions. You can use the DB.NewTransaction() function directly, which takes in a boolean argument to specify whether a read-write transaction is required. For read-write transactions, it is necessary to call Txn.Commit() to ensure the transaction is committed. For read-only transactions, calling Txn.Discard() is sufficient. Txn.Commit() also calls Txn.Discard() internally to clean up the transaction, so just calling Txn.Commit() is sufficient for read-write transactions. However, if your code doesn't call Txn.Commit() for some reason (for example, it returns prematurely with an error), then please make sure you call Txn.Discard() in a defer block. Refer to the code below.
```go
// Start a writable transaction.
txn := db.NewTransaction(true)
defer txn.Discard()

// Use the transaction...
err := txn.Set([]byte("answer"), []byte("42"))
if err != nil {
  return err
}

// Commit the transaction and check for error.
if err := txn.Commit(); err != nil {
  return err
}
```
The first argument to DB.NewTransaction() is a boolean stating if the transaction should be writable.
Badger allows an optional callback to the Txn.Commit() method. Normally, the callback can be set to nil, and the method will return after all the writes have succeeded. However, if this callback is provided, the Txn.Commit() method returns as soon as it has checked for any conflicts. The actual writing to the disk happens asynchronously, and the callback is invoked once the writing has finished, or an error has occurred. This can improve the throughput of the application in some cases. But it also means that a transaction is not durable until the callback has been invoked with a nil error value.
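For illustration, here is a minimal sketch of an asynchronous commit. In the v2 API this callback form is exposed as Txn.CommitWith; verify the method name against the Badger version you are using.

```go
txn := db.NewTransaction(true)
if err := txn.Set([]byte("answer"), []byte("42")); err != nil {
  txn.Discard()
  return err
}
// CommitWith returns as soon as conflict detection is done; the callback runs
// once the write has actually been applied (or has failed).
txn.CommitWith(func(err error) {
  // The transaction is durable only after this callback sees err == nil.
  if err != nil {
    log.Printf("async commit failed: %v", err)
  }
})
```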
To save a key/value pair, use the Txn.Set() method:
```go
err := db.Update(func(txn *badger.Txn) error {
  err := txn.Set([]byte("answer"), []byte("42"))
  return err
})
```
A key/value pair can also be saved by first creating an Entry, then setting this Entry using Txn.SetEntry(). Entry also exposes methods to set properties on it.
```go
err := db.Update(func(txn *badger.Txn) error {
  e := badger.NewEntry([]byte("answer"), []byte("42"))
  err := txn.SetEntry(e)
  return err
})
```
This will set the value of the "answer" key to "42". To retrieve this value, we can use the Txn.Get() method:
```go
err := db.View(func(txn *badger.Txn) error {
  item, err := txn.Get([]byte("answer"))
  handle(err)

  var valNot, valCopy []byte
  err = item.Value(func(val []byte) error {
    // This func with val would only be called if item.Value encounters no error.

    // Accessing val here is valid.
    fmt.Printf("The answer is: %s\n", val)

    // Copying or parsing val is valid.
    valCopy = append([]byte{}, val...)

    // Assigning val slice to another variable is NOT OK.
    valNot = val // Do not do this.
    return nil
  })
  handle(err)

  // DO NOT access val here. It is the most common cause of bugs.
  fmt.Printf("NEVER do this. %s\n", valNot)

  // You must copy it to use it outside item.Value(...).
  fmt.Printf("The answer is: %s\n", valCopy)

  // Alternatively, you could also use item.ValueCopy().
  valCopy, err = item.ValueCopy(nil)
  handle(err)
  fmt.Printf("The answer is: %s\n", valCopy)

  return nil
})
```
Txn.Get() returns ErrKeyNotFound if the value is not found.
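For example, a missing key can be distinguished from other failures by comparing against badger.ErrKeyNotFound:

```go
err := db.View(func(txn *badger.Txn) error {
  _, err := txn.Get([]byte("answer"))
  if err == badger.ErrKeyNotFound {
    // The key doesn't exist; treat this as a normal, non-fatal case.
    return nil
  }
  return err
})
```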
Please note that values returned from Get() are only valid while the transaction is open. If you need to use a value outside of the transaction then you must use copy() to copy it to another byte slice.
Use the Txn.Delete() method to delete a key.
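For example, deleting a key inside a read-write transaction looks like this:

```go
err := db.Update(func(txn *badger.Txn) error {
  return txn.Delete([]byte("answer"))
})
```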
To get unique monotonically increasing integers with strong durability, you can use the DB.GetSequence method. This method returns a Sequence object, which is thread-safe and can be used concurrently via various goroutines.
Badger would lease a range of integers to hand out from memory, with the bandwidth provided to DB.GetSequence. The frequency at which disk writes are done is determined by this lease bandwidth and the frequency of Next invocations. Setting a bandwidth too low would do more disk writes; setting it too high would result in wasted integers if Badger is closed or crashes. To avoid wasted integers, call Release before closing Badger.
```go
seq, err := db.GetSequence(key, 1000)
defer seq.Release()
for {
  num, err := seq.Next()
}
```
Badger provides support for ordered merge operations. You can define a func of type MergeFunc which takes in an existing value, and a value to be merged with it. It returns a new value which is the result of the merge operation. All values are specified in byte arrays. For example, here is a merge function (add) which appends a []byte value to an existing []byte value.
```go
// Merge function to append one byte slice to another
func add(originalValue, newValue []byte) []byte {
  return append(originalValue, newValue...)
}
```
This function can then be passed to the DB.GetMergeOperator() method, along with a key, and a duration value. The duration specifies how often the merge function is run on values that have been added using the MergeOperator.Add() method.
The MergeOperator.Get() method can be used to retrieve the cumulative value of the key associated with the merge operation.
```go
key := []byte("merge")

m := db.GetMergeOperator(key, add, 200*time.Millisecond)
defer m.Stop()

m.Add([]byte("A"))
m.Add([]byte("B"))
m.Add([]byte("C"))

res, _ := m.Get() // res should have value ABC encoded
```
Example: Merge operator which increments a counter
```go
func uint64ToBytes(i uint64) []byte {
  var buf [8]byte
  binary.BigEndian.PutUint64(buf[:], i)
  return buf[:]
}

func bytesToUint64(b []byte) uint64 {
  return binary.BigEndian.Uint64(b)
}

// Merge function to add two uint64 numbers
func add(existing, new []byte) []byte {
  return uint64ToBytes(bytesToUint64(existing) + bytesToUint64(new))
}
```
It can be used as
```go
key := []byte("merge")

m := db.GetMergeOperator(key, add, 200*time.Millisecond)
defer m.Stop()

m.Add(uint64ToBytes(1))
m.Add(uint64ToBytes(2))
m.Add(uint64ToBytes(3))

res, _ := m.Get() // res should have value 6 encoded
```
Badger allows setting an optional Time to Live (TTL) value on keys. Once the TTL has elapsed, the key will no longer be retrievable and will be eligible for garbage collection. A TTL can be set as a time.Duration value using the Entry.WithTTL() and Txn.SetEntry() API methods.
```go
err := db.Update(func(txn *badger.Txn) error {
  e := badger.NewEntry([]byte("answer"), []byte("42")).WithTTL(time.Hour)
  err := txn.SetEntry(e)
  return err
})
```
An optional user metadata value can be set on each key. A user metadata value is represented by a single byte. It can be used to set certain bits along with the key to aid in interpreting or decoding the key-value pair. User metadata can be set using the Entry.WithMeta() and Txn.SetEntry() API methods.
```go
err := db.Update(func(txn *badger.Txn) error {
  e := badger.NewEntry([]byte("answer"), []byte("42")).WithMeta(byte(1))
  err := txn.SetEntry(e)
  return err
})
```
Entry APIs can be used to add the user metadata and TTL for the same key. This Entry can then be set using Txn.SetEntry().
```go
err := db.Update(func(txn *badger.Txn) error {
  e := badger.NewEntry([]byte("answer"), []byte("42")).WithMeta(byte(1)).WithTTL(time.Hour)
  err := txn.SetEntry(e)
  return err
})
```
To iterate over keys, we can use an Iterator, which can be obtained using the Txn.NewIterator() method. Iteration happens in byte-wise lexicographical sorting order.
```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.PrefetchSize = 10
  it := txn.NewIterator(opts)
  defer it.Close()
  for it.Rewind(); it.Valid(); it.Next() {
    item := it.Item()
    k := item.Key()
    err := item.Value(func(v []byte) error {
      fmt.Printf("key=%s, value=%s\n", k, v)
      return nil
    })
    if err != nil {
      return err
    }
  }
  return nil
})
```
The iterator allows you to move to a specific point in the list of keys and move forward or backward through the keys one at a time.
By default, Badger prefetches the values of the next 100 items. You can adjust that with the IteratorOptions.PrefetchSize field. However, setting it to a value higher than GOMAXPROCS (which we recommend to be 128 or higher) shouldn't give any additional benefits. You can also turn off the fetching of values altogether. See the section below on key-only iteration.
To iterate over a key prefix, you can combine Seek() and ValidForPrefix():
```go
err := db.View(func(txn *badger.Txn) error {
  it := txn.NewIterator(badger.DefaultIteratorOptions)
  defer it.Close()
  prefix := []byte("1234")
  for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
    item := it.Item()
    k := item.Key()
    err := item.Value(func(v []byte) error {
      fmt.Printf("key=%s, value=%s\n", k, v)
      return nil
    })
    if err != nil {
      return err
    }
  }
  return nil
})
```
Badger supports a unique mode of iteration called key-only iteration. It is several orders of magnitude faster than regular iteration, because it involves access to the LSM-tree only, which is usually resident entirely in RAM. To enable key-only iteration, you need to set the IteratorOptions.PrefetchValues field to false. This can also be used to do sparse reads for selected keys during an iteration, by calling item.Value() only when required.
```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.PrefetchValues = false
  it := txn.NewIterator(opts)
  defer it.Close()
  for it.Rewind(); it.Valid(); it.Next() {
    item := it.Item()
    k := item.Key()
    fmt.Printf("key=%s\n", k)
  }
  return nil
})
```
Badger provides a Stream framework, which concurrently iterates over all or a portion of the DB, converting data into custom key-values, and streams it out serially to be sent over network, written to disk, or even written back to Badger. This is a much faster way to iterate over Badger than using a single Iterator. Stream supports Badger in both managed and normal mode.
Stream uses the natural boundaries created by SSTables within the LSM tree to quickly generate key ranges. Each goroutine then picks a range and runs an iterator to iterate over it. Each iterator iterates over all versions of values and is created from the same transaction, thus working over a snapshot of the DB. Every time a new key is encountered, it calls ChooseKey(item), followed by KeyToList(key, itr). This allows a user to select or reject that key and, if selected, convert the value versions into custom key-values. The goroutine batches up 4MB worth of key-values before sending it over to a channel. Another goroutine further batches up data from this channel using a smart batching algorithm and calls Send serially.
This framework is designed for high throughput key-value iteration, spreading the work of iteration across many goroutines. DB.Backup uses this framework to provide full and incremental backups quickly. Dgraph is a heavy user of this framework. In fact, this framework was developed and used within Dgraph before getting ported over to Badger.
```go
stream := db.NewStream()
// db.NewStreamAt(readTs) for managed mode.

// -- Optional settings
stream.NumGo = 16                     // Set number of goroutines to use for iteration.
stream.Prefix = []byte("some-prefix") // Leave nil for iteration over the whole DB.
stream.LogPrefix = "Badger.Streaming" // For identifying stream logs. Outputs to Logger.

// ChooseKey is called concurrently for every key. If left nil, assumes true by default.
stream.ChooseKey = func(item *badger.Item) bool {
  return bytes.HasSuffix(item.Key(), []byte("er"))
}

// KeyToList is called concurrently for chosen keys. This can be used to convert
// Badger data into custom key-values. If nil, uses stream.ToList, a default
// implementation, which picks all valid key-values.
stream.KeyToList = nil

// -- End of optional settings.

// Send is called serially, while Stream.Orchestrate is running.
stream.Send = func(list *pb.KVList) error {
  return proto.MarshalText(w, list) // Write to w.
}

// Run the stream
if err := stream.Orchestrate(context.Background()); err != nil {
  return err
}
// Done.
```
Badger values need to be garbage collected, because of two reasons:
- Badger keeps values separately from the LSM tree. This means that the compaction operations that clean up the LSM tree do not touch the values at all. Values need to be cleaned up separately.
- Concurrent read/write transactions could leave behind multiple values for a single key, because they are stored with different versions. These could accumulate, and take up unneeded space beyond the time these older versions are needed.
Badger relies on the client to perform garbage collection at a time of their choosing. It provides the following method, which can be invoked at an appropriate time:
DB.RunValueLogGC(): This method is designed to do garbage collection while Badger is online. Along with randomly picking a file, it uses statistics generated by the LSM-tree compactions to pick files that are likely to lead to maximum space reclamation. It is recommended to be called during periods of low activity in your system, or periodically. One call would only result in removal of at most one log file. As an optimization, you could also immediately re-run it whenever it returns a nil error (indicating a successful value log GC), as shown below.

```go
ticker := time.NewTicker(5 * time.Minute)
defer ticker.Stop()
for range ticker.C {
again:
  err := db.RunValueLogGC(0.7)
  if err == nil {
    goto again
  }
}
```
DB.PurgeOlderVersions(): This method is DEPRECATED since v1.5.0. Now, Badger's LSM tree automatically discards older/invalid versions of keys.
Note: The RunValueLogGC method would not garbage collect the latest value log.
There are two public API methods, DB.Backup() and DB.Load(), which can be used to do online backups and restores. Badger v0.9 provides a CLI tool badger, which can do offline backup/restore. Make sure you have $GOPATH/bin in your PATH to use this tool.
The command below will create a version-agnostic backup of the database, to a file badger.bak in the current working directory:
badger backup --dir <path/to/badgerdb>
To restore badger.bak from the current working directory to a new database:
badger restore --dir <path/to/badgerdb>
See badger --help for more details.
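For online backups from within your application, here is a minimal sketch using the DB.Backup() and DB.Load() APIs (signatures as in Badger v2; the file name and the pending-writes limit below are illustrative):

```go
f, err := os.Create("badger.bak")
if err != nil {
  return err
}
defer f.Close()

// since = 0 requests a full backup. The returned timestamp can be stored and
// passed to a later Backup call to produce an incremental backup.
since, err := db.Backup(f, 0)
if err != nil {
  return err
}
_ = since

// Restoring into another (typically empty) DB:
// r, _ := os.Open("badger.bak")
// err = otherDB.Load(r, 256) // 256 caps the number of pending writes during the load.
```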
If you have a Badger database that was created using v0.8 (or below), you can use the badger_backup tool provided in v0.8.1, and then restore it using the command above to upgrade your database to work with the latest version.
badger_backup --dir <path/to/badgerdb> --backup-file badger.bak
We recommend that all users use the Backup and Restore APIs and tools. However, Badger is also rsync-friendly because all files are immutable, barring the latest value log which is append-only. So, rsync can be used as a rudimentary way to perform a backup. In the following script, we repeat rsync to ensure that the LSM tree remains consistent with the MANIFEST file while doing a full backup.
```sh
#!/bin/bash
set -o history
set -o histexpand
# Makes a complete copy of a Badger database directory.
# Repeat rsync if the MANIFEST and SSTables are updated.
rsync -avz --delete db/ dst
while !! | grep -q "(MANIFEST\|\.sst)$"; do :; done
```
Badger's memory usage can be managed by tweaking several options available in the Options struct that is passed in when opening the database using DB.Open.
- Options.ValueLogLoadingMode can be set to options.FileIO (instead of the default options.MemoryMap) to avoid memory-mapping log files. This can be useful in environments with low RAM.
- Number of memtables (Options.NumMemtables)
  - If you modify Options.NumMemtables, also adjust Options.NumLevelZeroTables and Options.NumLevelZeroTablesStall accordingly.
- Number of concurrent compactions (Options.NumCompactors)
- Mode in which LSM tree is loaded (Options.TableLoadingMode)
- Size of table (Options.MaxTableSize)
- Size of value log file (Options.ValueLogFileSize)
If you want to decrease the memory usage of a Badger instance, tweak these options (ideally one at a time) until you achieve the desired memory usage.
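As a rough sketch, the v2 option builders can be chained when opening the database (options below refers to the github.com/dgraph-io/badger/v2/options package; the values are illustrative, not recommendations):

```go
opts := badger.DefaultOptions("/tmp/badger").
  WithValueLogLoadingMode(options.FileIO). // don't memory-map value log files
  WithTableLoadingMode(options.FileIO).    // don't memory-map SSTables
  WithNumMemtables(2).
  WithNumLevelZeroTables(2).
  WithNumLevelZeroTablesStall(4).
  WithNumCompactors(2).
  WithMaxTableSize(16 << 20).     // 16 MB tables
  WithValueLogFileSize(256 << 20) // 256 MB value log files

db, err := badger.Open(opts)
```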
Badger records metrics using the expvar package, which is included in the Go standard library. All the metrics are documented in the y/metrics.go file.
The expvar package adds a handler to the default HTTP server (which has to be started explicitly), and serves up the metrics at the /debug/vars endpoint. These metrics can then be collected by a system like Prometheus, to get better visibility into what Badger is doing.
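For example, starting the default HTTP server is enough to expose the metrics (the port here is arbitrary); they then appear at /debug/vars:

```go
go func() {
  // expvar registers its handler on http.DefaultServeMux, so passing nil
  // serves the metrics at http://localhost:8080/debug/vars.
  log.Println(http.ListenAndServe(":8080", nil))
}()
```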
- Introducing Badger: A fast key-value store written natively in Go
- Make Badger crash resilient with ALICE
- Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go
- Concurrent ACID Transactions in Badger
Badger was written with these design goals in mind:
- Write a key-value database in pure Go.
- Use latest research to build the fastest KV database for data sets spanning terabytes.
- Optimize for SSDs.
Badger's design is based on a paper titled WiscKey: Separating Keys from Values in SSD-conscious Storage.
Feature | Badger | RocksDB | BoltDB |
---|---|---|---|
Design | LSM tree with value log | LSM tree only | B+ tree |
High Read throughput | Yes | No | Yes |
High Write throughput | Yes | Yes | No |
Designed for SSDs | Yes (with latest research [1]) | Not specifically [2] | No |
Embeddable | Yes | Yes | Yes |
Sorted KV access | Yes | Yes | Yes |
Pure Go (no Cgo) | Yes | No | Yes |
Transactions | Yes, ACID, concurrent with SSI [3] | Yes (but non-ACID) | Yes, ACID |
Snapshots | Yes | Yes | Yes |
TTL support | Yes | Yes | No |
3D access (key-value-version) | Yes [4] | No | No |
[1] The WISCKEY paper (on which Badger is based) saw big wins with separating values from keys, significantly reducing the write amplification compared to a typical LSM tree.

[2] RocksDB is an SSD optimized version of LevelDB, which was designed specifically for rotating disks. As such, RocksDB's design isn't aimed at SSDs.

[3] SSI: Serializable Snapshot Isolation. For more details, see the blog post Concurrent ACID Transactions in Badger.

[4] Badger provides direct access to value versions via its Iterator API. Users can also specify how many versions to keep per key via Options.
We have run comprehensive benchmarks against RocksDB, Bolt and LMDB. The benchmarking code and the detailed logs for the benchmarks can be found in the badger-bench repo. More explanation, including graphs, can be found in the blog posts (linked above).
Below is a list of known projects that use Badger:
- Dgraph - Distributed graph database.
- Jaeger - Distributed tracing platform.
- go-ipfs - Go client for the InterPlanetary File System (IPFS), a new hypermedia distribution protocol.
- Riot - An open-source, distributed search engine.
- emitter - Scalable, low latency, distributed pub/sub broker with message storage, uses MQTT, gossip and badger.
- OctoSQL - Query tool that allows you to join, analyse and transform data from multiple databases using SQL.
- Dkron - Distributed, fault tolerant job scheduling system.
- Sandglass - distributed, horizontally scalable, persistent, time sorted message queue.
- TalariaDB - Grab's Distributed, low latency time-series database.
- Sloop - Salesforce's Kubernetes History Visualization Project.
- Immudb - Lightweight, high-speed immutable database for systems and applications.
- Usenet Express - Serving over 300TB of data with Badger.
- gorush - A push notification server written in Go.
- 0-stor - Single device object store.
- Dispatch Protocol - Blockchain protocol for distributed application data analytics.
- GarageMQ - AMQP server written in Go.
- RedixDB - A real-time persistent key-value store with the same redis protocol.
- BBVA - Raft backend implementation using BadgerDB for Hashicorp raft.
- Fantom - aBFT Consensus platform for distributed applications.
- decred - An open, progressive, and self-funding cryptocurrency with a system of community-based governance integrated into its blockchain.
- OpenNetSys - Create useful dApps in any software language.
- HoneyTrap - An extensible and opensource system for running, monitoring and managing honeypots.
- Insolar - Enterprise-ready blockchain platform.
- IoTeX - The next generation of the decentralized network for IoT powered by scalability- and privacy-centric blockchains.
- go-sessions - The sessions manager for Go net/http and fasthttp.
- Babble - BFT Consensus platform for distributed applications.
- Tormenta - Embedded object-persistence layer / simple JSON database for Go projects.
- BadgerHold - An embeddable NoSQL store for querying Go types built on Badger
- Goblero - Pure Go embedded persistent job queue backed by BadgerDB
- Surfline - Serving global wave and weather forecast data with Badger.
- Cete - Simple and highly available distributed key-value store built on Badger. Makes it easy bringing up a cluster of Badger with Raft consensus algorithm by hashicorp/raft.
- Volument - A new take on website analytics backed by Badger.
- KVdb - Hosted key-value store and serverless platform built on top of Badger.
If you are using Badger in a project please send a pull request to add it to the list.
If you're interested in contributing to Badger see CONTRIBUTING.md.
Update: With the new Value(func(v []byte)) API, this deadlock can no longer happen.
The following is true for users on Badger v1.x.
This can happen if a long-running iteration with Prefetch set to false makes an Item::Value call internally in the loop. That causes Badger to acquire read locks over the value log files to avoid value log GC removing the file from underneath. As a side effect, this also blocks a new value log GC file from being created when the value log file boundary is hit.
Please see Github issues #293 and #315.
There are multiple workarounds during iteration:
- Use Item::ValueCopy instead of Item::Value when retrieving a value.
- Set Prefetch to true. Badger would then copy over the value and release the file lock immediately.
- When Prefetch is false, don't call Item::Value and do a pure key-only iteration. This might be useful if you just want to delete a lot of keys.
- Do the writes in a separate transaction after the reads (see the sketch below).
Are you creating a new transaction for every single key update, and waiting for it to Commit fully before creating a new one? This will lead to very low throughput.
We have created the WriteBatch API, which provides a way to batch up many updates into a single transaction and Commit that transaction using callbacks to avoid blocking. This amortizes the cost of a transaction really well, and provides the most efficient way to do bulk writes.
```go
wb := db.NewWriteBatch()
defer wb.Cancel()

for i := 0; i < N; i++ {
  err := wb.Set(key(i), value(i), 0) // Will create txns as needed.
  handle(err)
}
handle(wb.Flush()) // Wait for all txns to finish.
```
Note that the WriteBatch API does not allow any reads. For read-modify-write workloads, you should be using the Transaction API.
If you're using Badger with SyncWrites=false, then your writes might not be written to the value log and won't get synced to disk immediately. Writes to the LSM tree are done in memory first, before they get compacted to disk. The compaction would only happen once MaxTableSize has been reached. So, if you're doing a few writes and then checking, you might not see anything on disk. Once you Close the database, you'll see these writes on disk.
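If you need every committed write flushed to disk immediately, a minimal sketch is to keep SyncWrites enabled when opening the database (at the cost of write throughput):

```go
// WithSyncWrites is the v2 option builder; trading throughput for durability.
db, err := badger.Open(badger.DefaultOptions("/tmp/badger").WithSyncWrites(true))
```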
Just like forward iteration goes to the first key which is equal to or greater than the SEEK key, reverse iteration goes to the first key which is equal to or less than the SEEK key. Therefore, the SEEK key would not be part of the results. You can typically add a 0xff byte as a suffix to the SEEK key to include it in the results. See the following issues: #436 and #347.
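A sketch of reverse iteration over a prefix, using the IteratorOptions.Reverse flag and the 0xff-suffix trick described above (the prefix value is illustrative):

```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.Reverse = true
  it := txn.NewIterator(opts)
  defer it.Close()

  prefix := []byte("1234")
  seek := append(append([]byte{}, prefix...), 0xff) // include keys equal to the SEEK key
  for it.Seek(seek); it.ValidForPrefix(prefix); it.Next() {
    fmt.Printf("key=%s\n", it.Item().Key())
  }
  return nil
})
```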
We recommend using instances which provide local SSD storage, without any limit on the maximum IOPS. In AWS, these are storage optimized instances like i3. They provide local SSDs which clock 100K IOPS over 4KB blocks easily.
```
panic: close of closed channel
panic: send on closed channel
```
If you're seeing panics like the above, this would be because you're operating on a closed DB. This can happen if you call Close() before sending a write, or call it multiple times. You should ensure that you only call Close() once, and that all your read/write operations finish before closing.
We highly recommend setting a high number for GOMAXPROCS, which allows Go to observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set it to 128. For more details, see this thread.
We recommend setting max file descriptors to a high number depending upon the expected size of your data. On Linux and Mac, you can check the file descriptor limit with ulimit -n -H for the hard limit and ulimit -n -S for the soft limit. A soft limit of 65535 is a good lower bound. You can adjust the limit as needed.
An error like "manifest has unsupported version: X (we support Y)" means you have a badger directory which was created by an older version of badger and you're trying to open it in a newer version of badger. The underlying data format can change across badger versions, so users will have to migrate their data directory. Badger data can be migrated from version X of badger to version Y of badger by following the steps listed below. Assume you were on badger v1.6.0 and you wish to migrate to v2.0.0.
- Install badger version v1.6.0
cd $GOPATH/src/github.com/dgraph-io/badger
git checkout v1.6.0
cd badger && go install
This should install the old badger binary in your $GOBIN.
- Create Backup
badger backup --dir path/to/badger/directory -f badger.backup
- Install badger version v2.0.0
cd $GOPATH/src/github.com/dgraph-io/badger
git checkout v2.0.0
cd badger && go install
This should install the new badger binary in your $GOBIN.
- Restore from the backup
badger restore --dir path/to/new/badger/directory -f badger.backup
This will create a new directory at path/to/new/badger/directory and add badger data in the newer format to it.
NOTE - The above steps shouldn't cause any data loss but please ensure the new data is valid before deleting the old badger directory.
Badger does not directly use CGO but it relies on the https://github.com/DataDog/zstd library for zstd compression, and that library requires gcc/cgo. You can build badger without cgo by running CGO_ENABLED=0 go build. This will build badger without support for the ZSTD compression algorithm.
- Please use discuss.dgraph.io for questions, feature requests and discussions.
- Please use the Github issue tracker for filing bugs or feature requests.
- Follow us on Twitter @dgraphlabs.