eBay/akutan

A distributed knowledge graph store

This repository was archived by the owner on Feb 16, 2022. It is now read-only.

There's a blog post that's a good introduction to Akutan.

Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.

How to model your data as a knowledge graph and how to query it will feel a bit different for people coming from SQL, NoSQL, and property graph stores. In a knowledge graph, data is represented as a single table of facts, where each fact has a subject, predicate, and object. This representation enables the store to sift through the data for complex queries and to apply inference rules that raise the level of abstraction. Here's an example of a tiny graph:

subject          predicate    object
<John_Scalzi>    <born>       <Fairfield>
<John_Scalzi>    <lives>      <Bradford>
<John_Scalzi>    <wrote>      <Old_Mans_War>
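To make the triple model concrete, here's a minimal sketch in Go of a facts table and a simple triple-pattern lookup. This is purely illustrative: `Fact` and `objectsOf` are hypothetical names invented for this example, not Akutan's actual types or API.

```go
package main

import "fmt"

// Fact models one row of the facts table above: a
// subject-predicate-object triple. (Illustrative only;
// this is not Akutan's internal representation.)
type Fact struct {
	Subject, Predicate, Object string
}

// objectsOf returns the objects of every fact matching the given
// subject and predicate — the shape of a basic triple-pattern query.
func objectsOf(graph []Fact, subject, predicate string) []string {
	var out []string
	for _, f := range graph {
		if f.Subject == subject && f.Predicate == predicate {
			out = append(out, f.Object)
		}
	}
	return out
}

func main() {
	graph := []Fact{
		{"<John_Scalzi>", "<born>", "<Fairfield>"},
		{"<John_Scalzi>", "<lives>", "<Bradford>"},
		{"<John_Scalzi>", "<wrote>", "<Old_Mans_War>"},
	}
	// "Where does John_Scalzi live?"
	fmt.Println(objectsOf(graph, "<John_Scalzi>", "<lives>")) // [<Bradford>]
}
```

A real query engine indexes the facts table rather than scanning it, but the pattern-matching idea is the same.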

To learn about how to represent and query data in Akutan, see docs/query.md.

Akutan is designed to store large graphs that cannot fit on a single server. It's scalable in how much data it can store and the rate of queries it can execute. However, Akutan serializes all changes to the graph through a central log, which fundamentally limits the total rate of change. The rate of change won't improve with a larger number of servers, but a typical deployment should be able to handle tens of thousands of changes per second. In exchange for this limitation, Akutan's architecture is a relatively simple one that enables many features. For example, Akutan supports transactional updates and historical global snapshots. We believe this trade-off is suitable for most knowledge graph use cases, which accumulate large amounts of data but do so at a modest pace. To learn more about Akutan's architecture and this trade-off, see docs/central_log_arch.md.

Akutan isn't ready for production-critical deployments, but it's useful today for some use cases. We've run a 20-server deployment of Akutan for development purposes and off-line use cases for about a year, which we've most commonly loaded with a dataset of about 2.5 billion facts. We believe Akutan's current capabilities exceed this capacity and scale; we haven't yet pushed Akutan to its limits. The project has a good architectural foundation on which additional features can be built and higher performance could be achieved.

Akutan needs more love before it can be used for production-critical deployments. Much of Akutan's code consists of high-quality, documented, unit-tested modules, but some areas of the code base are inherited from Akutan's earlier prototype days and still need attention. In other places, some functionality is lacking before Akutan could be used as a critical production data store, including deletion of facts, backup/restore, and automated cluster management. We have filed GitHub issues for these and a few other things. There are also areas where Akutan could be improved that wouldn't necessarily block production usage. For example, Akutan's query language is not quite compatible with SPARQL, and its inference engine is limited.

So, Akutan has a nice foundation and may be useful to some people, but it also needs additional love. If that's not for you, here are a few alternative open-source knowledge and property graph stores that you may want to consider (we have no affiliation with these projects):

  • Blazegraph: an RDF store. Supports several query languages, including SPARQL and Gremlin. Disk-based, single-master, scales out for reads only. Seems unmaintained. Powers https://query.wikidata.org/.
  • Dgraph: a triple-oriented property graph store. GraphQL-like query language, no support for SPARQL. Disk-based, scales out.
  • Neo4j: a property graph store. Cypher query language, no support for SPARQL. Single-master, scales out for reads only.
  • See also Wikipedia's Comparison of Triplestores page.

The remainder of this README describes how to get Akutan up and running. Several documents under the docs/ directory describe aspects of Akutan in more detail; see docs/README.md for an overview.

Installing dependencies and building Akutan

Akutan has the following system dependencies:

  • It's written in Go. You'll need v1.11.5 or newer.
  • Akutan uses Protocol Buffers extensively to encode messages for gRPC, the log of data changes, and storage on disk. You'll need protobuf version 3. We recommend 3.5.2 or later. Note that 3.0.x is the default in many Linux distributions, but doesn't work with the Akutan build.
  • Akutan's Disk Views store their facts in RocksDB.

On Mac OS X, these can all be installed via Homebrew:

$ brew install golang protobuf rocksdb zstd

On Ubuntu, refer to the files within the docker/ directory for package names to use with apt-get.

After cloning the Akutan repository, pull down several Go libraries and additional Go tools:

$ make get

Finally, build the project:

$ make build

Running Akutan locally

The fastest way to run Akutan locally is to launch the in-memory log store:

$ bin/plank

Then open another terminal and run:

$ make run

This will bring up several Akutan servers locally. It starts an API server that listens on localhost for gRPC requests on port 9987 and for HTTP requests on port 9988, such as http://localhost:9988/stats.txt.

The easiest way to interact with the API server is using bin/akutan-client. See docs/query.md for examples. The API server exposes the FactStore gRPC service defined in proto/api/akutan_api.proto.

Deployment concerns

The log

Earlier, we used bin/plank as a log store, but this is unsuitable for real usage! Plank is in-memory only, isn't replicated, and by default, it only keeps 1000 entries at a time. It's only meant for development.

Akutan also supports using Apache Kafka as its log store. This is recommended over Plank for any deployment. To use Kafka, follow the Kafka quick start guide to install Kafka, start ZooKeeper, and start Kafka. Then create a topic called "akutan" (not "test" as in the Kafka guide) with partitions set to 1. You'll want to configure Kafka to synchronously write entries to disk.
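As a sketch, topic creation with Kafka's bundled command-line tools looks roughly like the following. The tool path and broker address follow the Kafka quick start defaults; older broker versions take --zookeeper localhost:2181 instead of --bootstrap-server, so adjust for your install.

```shell
# Create the "akutan" topic with a single partition and no replication
# (run from the Kafka installation directory).
bin/kafka-topics.sh --create --topic akutan \
    --partitions 1 --replication-factor 1 \
    --bootstrap-server localhost:9092
```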

To use Kafka with Akutan, set the akutanLog's type to kafka in your Akutan configuration (default: local/config.json), and update the locator's addresses accordingly (Kafka uses port 9092 by default). You'll need to clear out Akutan's Disk Views' data before restarting the cluster. The Disk Views by default store their data in $TMPDIR/rocksdb-akutan-diskview-{space}-{partition}, so you can delete them all with: rm -rf $TMPDIR/rocksdb-akutan-diskview*
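As a sketch, the relevant fragment of the configuration might look like this. The akutanLog, type, and locator field names come from the description above, but the exact surrounding structure is an assumption here, so edit your existing local/config.json rather than copying this verbatim:

```json
{
  "akutanLog": {
    "type": "kafka",
    "locator": {
      "addresses": ["localhost:9092"]
    }
  }
}
```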

Docker and Kubernetes

This repository includes support for running Akutan inside Docker and Minikube. These environments can be tedious for development purposes, but they're useful as a step towards a modern and robust production deployment.

See the cluster/k8s/Minikube.md file for the steps to build and deploy Akutan services in Minikube. It also includes the steps to build the Docker images.

Distributed tracing

Akutan generates distributed OpenTracing traces for use with Jaeger. To try it, follow the Jaeger Getting Started Guide for running the all-in-one Docker image. The default make run is configured to send traces there, which you can query at http://localhost:16686. The Minikube cluster also includes a Jaeger all-in-one instance.
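For reference, starting the all-in-one image looks roughly like this; the image name and the two ports shown are from the Jaeger guide (UDP 6831 for receiving traces, 16686 for the query UI), but check the current guide for the full port list and tag to use.

```shell
# Run the Jaeger all-in-one container in the background.
docker run -d --name jaeger \
    -p 6831:6831/udp -p 16686:16686 \
    jaegertracing/all-in-one:latest
```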

Development

VS Code

You can use whichever editor you'd like, but this repository contains some configuration for VS Code. We suggest the following extensions:

Override the default settings in .vscode/settings.json with ./vscode-settings.json5.

Test targets

The Makefile contains various targets related to running tests:

Target        Description
make test     run all the akutan unit tests
make cover    run all the akutan unit tests and open the web-based coverage viewer
make lint     run basic code linting
make vet      run all static analysis tests including linting and formatting

License Information

Copyright 2019 eBay Inc.

Primary authors: Simon Fell, Diego Ongaro, Raymond Kroeker, Sathish Kandasamy

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Note the project was renamed to Akutan in July 2019.
