eBay/akutan: a distributed knowledge graph store
There's a blog post that's a good introduction to Akutan.
Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.
How to model your data as a knowledge graph and how to query it will feel a bit different for people coming from SQL, NoSQL, and property graph stores. In a knowledge graph, data is represented as a single table of facts, where each fact has a subject, predicate, and object. This representation enables the store to sift through the data for complex queries and to apply inference rules that raise the level of abstraction. Here's an example of a tiny graph:
subject | predicate | object |
---|---|---|
<John_Scalzi> | <born> | <Fairfield> |
<John_Scalzi> | <lives> | <Bradford> |
<John_Scalzi> | <wrote> | <Old_Mans_War> |
To learn about how to represent and query data in Akutan, see docs/query.md.
Akutan is designed to store large graphs that cannot fit on a single server. It's scalable in how much data it can store and the rate of queries it can execute. However, Akutan serializes all changes to the graph through a central log, which fundamentally limits the total rate of change. The rate of change won't improve with a larger number of servers, but a typical deployment should be able to handle tens of thousands of changes per second. In exchange for this limitation, Akutan's architecture is a relatively simple one that enables many features. For example, Akutan supports transactional updates and historical global snapshots. We believe this trade-off is suitable for most knowledge graph use cases, which accumulate large amounts of data but do so at a modest pace. To learn more about Akutan's architecture and this trade-off, see docs/central_log_arch.md.
Akutan isn't ready for production-critical deployments, but it's useful today for some use cases. We've run a 20-server deployment of Akutan for development purposes and off-line use cases for about a year, which we've most commonly loaded with a dataset of about 2.5 billion facts. We believe Akutan's current capabilities exceed this capacity and scale; we haven't yet pushed Akutan to its limits. The project has a good architectural foundation on which additional features can be built and higher performance could be achieved.
Akutan needs more love before it can be used for production-critical deployments. Much of Akutan's code consists of high-quality, documented, unit-tested modules, but some areas of the code base are inherited from Akutan's earlier prototype days and still need attention. In other places, some functionality is lacking before Akutan could be used as a critical production data store, including deletion of facts, backup/restore, and automated cluster management. We have filed GitHub issues for these and a few other things. There are also areas where Akutan could be improved that wouldn't necessarily block production usage. For example, Akutan's query language is not quite compatible with SPARQL, and its inference engine is limited.
So, Akutan has a nice foundation and may be useful to some people, but it also needs additional love. If that's not for you, here are a few alternative open-source knowledge and property graph stores that you may want to consider (we have no affiliation with these projects):
- Blazegraph: an RDF store. Supports several query languages, including SPARQL and Gremlin. Disk-based, single-master, scales out for reads only. Seems unmaintained. Powers https://query.wikidata.org/.
- Dgraph: a triple-oriented property graph store. GraphQL-like query language, no support for SPARQL. Disk-based, scales out.
- Neo4j: a property graph store. Cypher query language, no support for SPARQL. Single-master, scales out for reads only.
- See also Wikipedia's Comparison of Triplestores page.
The remainder of this README describes how to get Akutan up and running. Several documents under the docs/ directory describe aspects of Akutan in more detail; see docs/README.md for an overview.
Akutan has the following system dependencies:
- It's written in Go. You'll need v1.11.5 or newer.
- Akutan uses Protocol Buffers extensively to encode messages for gRPC, the log of data changes, and storage on disk. You'll need protobuf version 3. We recommend 3.5.2 or later. Note that 3.0.x is the default in many Linux distributions, but doesn't work with the Akutan build.
- Akutan's Disk Views store their facts in RocksDB.
On Mac OS X, these can all be installed via Homebrew:
$ brew install golang protobuf rocksdb zstd
On Ubuntu, refer to the files within the docker/ directory for package names to use with apt-get.
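Whichever platform you're on, you can quickly confirm that the toolchain meets the version requirements above. These are standard commands, not Akutan-specific:
$ go version          # should report go1.11.5 or newer
$ protoc --version    # should report libprotoc 3.x, ideally 3.5.2 or later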
After cloning the Akutan repository, pull down several Go libraries and additional Go tools:
$ make get
Finally, build the project:
$ make build
The fastest way to run Akutan locally is to launch the in-memory log store:
$ bin/plank
Then open another terminal and run:
$ make run
This will bring up several Akutan servers locally. It starts an API server that listens on localhost for gRPC requests on port 9987 and for HTTP requests on port 9988, such as http://localhost:9988/stats.txt.
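For example, once the servers are up, you can check the HTTP endpoint mentioned above with a plain curl request:
$ curl http://localhost:9988/stats.txt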
The easiest way to interact with the API server is using bin/akutan-client. See docs/query.md for examples. The API server exposes the FactStore gRPC service defined in proto/api/akutan_api.proto.
Earlier, we used bin/plank as a log store, but this is unsuitable for real usage! Plank is in-memory only, isn't replicated, and by default, it only keeps 1000 entries at a time. It's only meant for development.
Akutan also supports using Apache Kafka as its log store. This is recommended over Plank for any deployment. To use Kafka, follow the Kafka quick start guide to install Kafka, start ZooKeeper, and start Kafka. Then create a topic called "akutan" (not "test" as in the Kafka guide) with partitions set to 1. You'll want to configure Kafka to synchronously write entries to disk.
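For example, with a local broker running on the default port, the topic can be created with the standard Kafka CLI. The exact flags depend on your Kafka version; older releases take --zookeeper localhost:2181 instead of --bootstrap-server:
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic akutan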
To use Kafka with Akutan, set the akutanLog's type to kafka in your Akutan configuration (default: local/config.json), and update the locator's addresses accordingly (Kafka uses port 9092 by default). You'll need to clear out Akutan's Disk Views' data before restarting the cluster. The Disk Views by default store their data in $TMPDIR/rocksdb-akutan-diskview-{space}-{partition}, so you can delete them all with rm -rf $TMPDIR/rocksdb-akutan-diskview*.
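As a rough sketch only (the overall shape of local/config.json isn't shown here, and nesting the locator under akutanLog is an assumption based on the field names above), the relevant fragment would look something like:
"akutanLog": {
  "type": "kafka",
  "locator": {
    "addresses": ["localhost:9092"]
  }
}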
This repository includes support for running Akutan inside Docker and Minikube. These environments can be tedious for development purposes, but they're useful as a step towards a modern and robust production deployment.
See the cluster/k8s/Minikube.md file for the steps to build and deploy Akutan services in Minikube. It also includes the steps to build the Docker images.
Akutan generates distributed OpenTracing traces for use with Jaeger. To try it, follow the Jaeger Getting Started Guide for running the all-in-one Docker image. The default make run is configured to send traces there, which you can query at http://localhost:16686. The Minikube cluster also includes a Jaeger all-in-one instance.
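For reference, a minimal all-in-one invocation looks something like this (port mappings trimmed to the agent's UDP port and the query UI; see the Jaeger guide for the full list):
$ docker run -d --name jaeger -p 6831:6831/udp -p 16686:16686 jaegertracing/all-in-one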
You can use whichever editor you'd like, but this repository contains some configuration for VS Code. We suggest the following extensions:
Override the default settings in .vscode/settings.json with ./vscode-settings.json5.
The Makefile contains various targets related to running tests:
Target | Description |
---|---|
make test | run all the akutan unit tests |
make cover | run all the akutan unit tests and open the web-based coverage viewer |
make lint | run basic code linting |
make vet | run all static analysis tests including linting and formatting |
Copyright 2019 eBay Inc.
Primary authors: Simon Fell, Diego Ongaro, Raymond Kroeker, Sathish Kandasamy
Licensed under the Apache License, Version 2.0 (the "License"); you may not usethis file except in compliance with the License. You may obtain a copy of theLicense athttps://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributedunder the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES ORCONDITIONS OF ANY KIND, either express or implied. See the License for thespecific language governing permissions and limitations under the License.
Note the project was renamed to Akutan in July 2019.