| Datalog | |
|---|---|
| Paradigm | Logic, declarative |
| Family | Prolog |
| First appeared | 1977 |
| Typing discipline | Weak |
| Dialects | Datomic, .QL, Soufflé, XTDB, etc. |
| Influenced by | Prolog |
| Influenced | SQL |
| Filename extension | .dl |
| Internet media type | |
| Website | datalog-specs |
Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.
A Datalog program consists of facts, which are statements that are held to be true, and rules, which say how to deduce new facts from known facts. For example, here are two facts that mean xerces is a parent of brooke and brooke is a parent of damocles:
parent(xerces, brooke).
parent(brooke, damocles).
The names are written in lowercase because strings beginning with an uppercase letter stand for variables. Here are two rules:
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
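The :- symbol is read as "if", and the comma is read "and", so these rules mean: X is an ancestor of Y if X is a parent of Y; and X is an ancestor of Y if X is a parent of some Z and Z is an ancestor of Y.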
The meaning of a program is defined to be the set of all of the facts that can be deduced using the initial facts and the rules. This program's meaning is given by the following facts:
parent(xerces, brooke).
parent(brooke, damocles).
ancestor(xerces, brooke).
ancestor(brooke, damocles).
ancestor(xerces, damocles).
Some Datalog implementations don't deduce all possible facts, but instead answer queries:
?- ancestor(xerces, X).
This query asks: Who are all the X that xerces is an ancestor of? For this example, it would return brooke and damocles.
The non-recursive subset of Datalog is closely related to query languages for relational databases, such as SQL. The following table maps between Datalog, relational algebra, and SQL concepts:
| Datalog | Relational algebra | SQL |
|---|---|---|
| Relation | Relation | Table |
| Fact | Tuple | Row |
| Rule | N/a | Materialized view |
| Query | Select | Query |
More formally, non-recursive Datalog corresponds precisely to unions of conjunctive queries, or equivalently, negation-free relational algebra.
Schematic translation from non-recursive Datalog into SQL:

Datalog:
s(x, y).
t(y).
r(A, B) :- s(A, B), t(B).

SQL:
CREATE TABLE s (z0 TEXT NOT NULL, z1 TEXT NOT NULL, PRIMARY KEY (z0, z1));
CREATE TABLE t (z0 TEXT NOT NULL PRIMARY KEY);
INSERT INTO s VALUES ('x', 'y');
INSERT INTO t VALUES ('y');
CREATE VIEW r AS SELECT s.z0, s.z1 FROM s, t WHERE s.z1 = t.z0;
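The schematic translation can be tried out with an in-memory SQLite database. The following is a minimal sketch, assuming Python's standard `sqlite3` module and writing the column constraint with the standard SQL spelling `NOT NULL`; the table and column names are taken from the translation, everything else is illustrative.

```python
import sqlite3

# In-memory database; table and column names follow the schematic translation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (z0 TEXT NOT NULL, z1 TEXT NOT NULL, PRIMARY KEY (z0, z1));
    CREATE TABLE t (z0 TEXT NOT NULL PRIMARY KEY);
    INSERT INTO s VALUES ('x', 'y');   -- fact s(x, y).
    INSERT INTO t VALUES ('y');        -- fact t(y).
    -- rule r(A, B) :- s(A, B), t(B).
    CREATE VIEW r AS SELECT s.z0, s.z1 FROM s, t WHERE s.z1 = t.z0;
""")

# Querying the view plays the role of asking for all facts r(A, B).
rows = conn.execute("SELECT * FROM r").fetchall()
print(rows)
```

The shared variable B in the rule body becomes the join condition `s.z1 = t.z0`, and the head variables become the selected columns.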
A Datalog program consists of a list of rules (Horn clauses).[1] If constant and variable are two countable sets of constants and variables respectively and relation is a countable set of predicate symbols, then the following BNF grammar expresses the structure of a Datalog program:
<program> ::= <rule> <program> | ""
<rule> ::= <atom> ":-" <atom-list> "."
<atom> ::= <relation> "(" <term-list> ")"
<atom-list> ::= <atom> | <atom> "," <atom-list> | ""
<term> ::= <constant> | <variable>
<term-list> ::= <term> | <term> "," <term-list> | ""
Atoms are also referred to as literals. The atom to the left of the :- symbol is called the head of the rule; the atoms to the right are the body. Every Datalog program must satisfy the condition that every variable that appears in the head of a rule also appears in the body (this condition is sometimes called the range restriction).[1][2]
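The range restriction is easy to check mechanically. The sketch below assumes a hypothetical in-memory representation that is not part of any particular engine: an atom is a pair `(relation, terms)`, a rule is a pair `(head, body)`, and variables are the terms that start with an uppercase letter.

```python
def is_variable(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def variables(atom):
    _relation, terms = atom
    return {t for t in terms if is_variable(t)}

def is_range_restricted(rule):
    """True if every variable of the head also occurs somewhere in the body."""
    head, body = rule
    body_vars = set().union(*(variables(atom) for atom in body))
    return variables(head) <= body_vars

# ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).   satisfies the condition
ok_rule = (("ancestor", ("X", "Y")),
           [("parent", ("X", "Z")), ("ancestor", ("Z", "Y"))])
# r(X, Y) :- s(X).                                  Y is missing from the body
bad_rule = (("r", ("X", "Y")), [("s", ("X",))])
# r(x).                                             a fact has no variables at all
fact = (("r", ("x",)), [])

print(is_range_restricted(ok_rule), is_range_restricted(bad_rule), is_range_restricted(fact))
```

Under this check, a rule with an empty body is acceptable only if its head is ground, which matches the way facts are used in the rest of the article.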
There are two common conventions for variable names: capitalizing variables, or prefixing them with a question mark (?).[3]
Note that under this definition, Datalog does not include negation or aggregates; see § Extensions for more information about those constructs.
Rules with empty bodies are called facts. For example, the following rule is a fact:
r(x):-.
The set of facts is called the extensional database or EDB of the Datalog program. The set of tuples computed by evaluating the Datalog program is called the intensional database or IDB.
Many implementations of logic programming extend the above grammar to allow writing facts without the :-, like so:
r(x).
Some also allow writing 0-ary relations without parentheses, like so:
p:-q.
These are merely abbreviations (syntactic sugar); they have no impact on the semantics of the program.
| Program | edge(x, y). edge(y, z). path(A, B) :- edge(A, B). path(A, C) :- path(A, B), edge(B, C). |
|---|---|
| Herbrand universe | x, y, z |
| Herbrand base | edge(x, x), edge(x, y), ..., edge(z, z), path(x, x), ..., path(z, z) |
| Herbrand model | edge(x, y), edge(y, z), path(x, y), path(y, z), path(x, z) |
There are three widely used approaches to the semantics of Datalog programs: model-theoretic, fixed-point, and proof-theoretic. These three approaches can be proven equivalent.[4]
An atom is called ground if none of its subterms are variables. Intuitively, each of the semantics defines the meaning of a program to be the set of all ground atoms that can be deduced from the rules of the program, starting from the facts.
A rule is called ground if all of its atoms (head and body) are ground. A rule R2 is a ground instance of another rule R1 if R2 is the result of a substitution of constants for all the variables in R1. The Herbrand base of a Datalog program is the set of all ground atoms that can be made with the constants appearing in the program. The Herbrand model of a Datalog program is the smallest subset of the Herbrand base such that, for each ground instance of each rule in the program, if the atoms in the body of the rule are in the set, then so is the head.[5] The model-theoretic semantics define the minimal Herbrand model to be the meaning of the program.
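For a small program the Herbrand base can simply be enumerated. The sketch below does this for the edge/path program used in this section; the dictionary `arities` and the tuple encoding of atoms are assumptions made for illustration.

```python
from itertools import product

# Constants and relation arities of the edge/path example program.
constants = ["x", "y", "z"]
arities = {"edge": 2, "path": 2}

# Every relation applied to every tuple of constants of the right length.
herbrand_base = {
    (relation, args)
    for relation, arity in arities.items()
    for args in product(constants, repeat=arity)
}

print(len(herbrand_base))  # 2 relations * 3**2 argument tuples
```

The Herbrand model of the program is a subset of this set, here 5 of the 18 ground atoms.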
Let I be the power set of the Herbrand base of a program P. The immediate consequence operator for P is a map T from I to I that adds all of the new ground atoms that can be derived from the rules of the program in a single step. The least-fixed-point semantics define the least fixed point of T to be the meaning of the program; this coincides with the minimal Herbrand model.[6]
The fixpoint semantics suggest an algorithm for computing the minimal model: Start with the set of ground facts in the program, then repeatedly add consequences of the rules until a fixpoint is reached. This algorithm is called naïve evaluation.
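The following is a minimal sketch of an immediate consequence operator and its iteration to a fixed point, not a reference implementation. It assumes the same hypothetical encoding as before: atoms are `(relation, terms)` pairs, rules are `(head, body)` pairs, facts are rules with an empty body, and variables start with an uppercase letter. Because facts are rules with empty bodies, `immediate_consequence` always returns them, so iterating from the empty set is enough.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom instantiates to fact; None if impossible."""
    (rel, terms), (frel, fterms) = atom, fact
    if rel != frel or len(terms) != len(fterms):
        return None
    subst = dict(subst)
    for term, value in zip(terms, fterms):
        if is_var(term):
            if subst.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return subst

def immediate_consequence(rules, interp):
    """Heads of all ground rule instances whose body atoms are all in interp."""
    derived = set()
    for head, body in rules:
        substs = [{}]
        for atom in body:
            substs = [s2 for s in substs for fact in interp
                      if (s2 := match(atom, fact, s)) is not None]
        rel, terms = head
        for s in substs:
            derived.add((rel, tuple(s.get(t, t) for t in terms)))
    return derived

program = [
    (("edge", ("x", "y")), []),
    (("edge", ("y", "z")), []),
    (("path", ("A", "B")), [("edge", ("A", "B"))]),
    (("path", ("A", "C")), [("path", ("A", "B")), ("edge", ("B", "C"))]),
]

# Iterate from the empty set until nothing changes.
stages = [set()]
while True:
    nxt = immediate_consequence(program, stages[-1])
    if nxt == stages[-1]:
        break
    stages.append(nxt)
least_fixed_point = stages[-1]
print(sorted(least_fixed_point))
```

Each entry of `stages` is one application of the operator: first the two edge facts, then the one-step paths, then path(x, z).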

Figure: proof tree showing the derivation of the ground atom path(x, z) from the program edge(x, y). edge(y, z). path(A, B) :- edge(A, B). path(A, C) :- path(A, B), edge(B, C).
The proof-theoretic semantics defines the meaning of a Datalog program to be the set of facts with corresponding proof trees. Intuitively, a proof tree shows how to derive a fact from the facts and rules of a program.
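One way to make proof trees concrete is to record, for each derived fact, the body facts that first produced it, and then unfold those records. The sketch below is specialized to the edge/path example program; the dictionary `why` and the helper names are hypothetical.

```python
edges = {("x", "y"), ("y", "z")}

# why[fact] lists the body facts of the rule instance that first derived fact.
# Program facts have no premises.
why = {("edge", a, b): [] for a, b in edges}
changed = True
while changed:
    changed = False
    for rel, a, b in list(why):
        if rel == "edge" and ("path", a, b) not in why:
            # path(A, B) :- edge(A, B).
            why[("path", a, b)] = [("edge", a, b)]
            changed = True
        if rel == "path":
            for b2, c in edges:
                if b2 == b and ("path", a, c) not in why:
                    # path(A, C) :- path(A, B), edge(B, C).
                    why[("path", a, c)] = [("path", a, b), ("edge", b, c)]
                    changed = True

def proof_tree(fact):
    """Nested (fact, [subtrees]) structure; leaves are facts of the program."""
    return (fact, [proof_tree(premise) for premise in why[fact]])

def show(tree, indent=0):
    fact, children = tree
    lines = ["  " * indent + f"{fact[0]}({fact[1]}, {fact[2]})"]
    for child in children:
        lines += show(child, indent + 1)
    return lines

print("\n".join(show(proof_tree(("path", "x", "z")))))
```

Because premises are always recorded before the facts that depend on them, unfolding `why` terminates, and the printed tree has path(x, z) at the root and the two edge facts as leaves.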
One might be interested in knowing whether or not a particular ground atom appears in the minimal Herbrand model of a Datalog program, perhaps without caring much about the rest of the model. A top-down reading of the proof trees described above suggests an algorithm for computing the results of such queries. This reading informs the SLD resolution algorithm, which forms the basis for the evaluation of Prolog.
There are many different ways to evaluate a Datalog program, with different performance characteristics.
Bottom-up evaluation strategies start with the facts in the program and repeatedly apply the rules until either some goal or query is established, or until the complete minimal model of the program is produced.
Naïve evaluation mirrors the fixpoint semantics for Datalog programs. Naïve evaluation uses a set of "known facts", which is initialized to the facts in the program. It proceeds by repeatedly enumerating all ground instances of each rule in the program. If each atom in the body of the ground instance is in the set of known facts, then the head atom is added to the set of known facts. This process is repeated until a fixed point is reached, and no more facts may be deduced. Naïve evaluation produces the entire minimal model of the program.[7]
Semi-naïve evaluation is a bottom-up evaluation strategy that can be asymptotically faster than naïve evaluation.[8]
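The usual idea behind semi-naïve evaluation is to join recursive rules only against the facts that were newly derived in the previous iteration, instead of against all known facts. The sketch below applies this idea to the edge/path program; it is a hand-specialized illustration with hypothetical names, not a general engine.

```python
def semi_naive_paths(edges):
    """Evaluate  path(A, B) :- edge(A, B).
                 path(A, C) :- path(A, B), edge(B, C).
    using only the newly derived path facts in each round."""
    successors = {}
    for b, c in edges:
        successors.setdefault(b, set()).add(c)

    path = set(edges)    # result of the non-recursive rule
    delta = set(edges)   # path facts that are new in the latest round
    rounds = 0
    while delta:
        rounds += 1
        # Join only the new path facts with edge.
        candidates = {(a, c) for a, b in delta for c in successors.get(b, ())}
        delta = candidates - path
        path |= delta
    return path, rounds

chain_paths, chain_rounds = semi_naive_paths({("x", "y"), ("y", "z")})
print(sorted(chain_paths), chain_rounds)
```

Naïve evaluation would re-derive path(x, y) and path(y, z) in every round; here each path fact takes part in the recursive join only in the round after it was first derived.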

Naïve and semi-naïve evaluation both evaluate recursive Datalog rules by repeatedly applying them to a set of known facts until a fixed point is reached. In each iteration, rules are only run for "one step", i.e., non-recursively. As mentioned above, each non-recursive Datalog rule corresponds precisely to a conjunctive query. Therefore, many of the techniques from database theory used to speed up conjunctive queries are applicable to bottom-up evaluation of Datalog.
Many such techniques are implemented in modern bottom-up Datalog engines such as Soufflé. Some Datalog engines integrate SQL databases directly.[17]
Bottom-up evaluation of Datalog is also amenable to parallelization. Parallel Datalog engines are generally divided into two paradigms.
SLD resolution is sound and complete for Datalog programs.
Top-down evaluation strategies begin with a query or goal. Bottom-up evaluation strategies can answer queries by computing the entire minimal model and matching the query against it, but this can be inefficient if the answer only depends on a small subset of the entire model. The magic sets algorithm takes a Datalog program and a query, and produces a more efficient program that computes the same answer to the query while still using bottom-up evaluation.[23] A variant of the magic sets algorithm has been shown to produce programs that, when evaluated using semi-naïve evaluation, are as efficient as top-down evaluation.[24]
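To give a feel for the effect, the sketch below compares plain bottom-up evaluation of the ancestor program with a hand-written rewriting in the style of magic sets for the single query ?- ancestor(xerces, X). The rewritten rules shown in the comments, the extra `magic` relation, and the additional alice/bob/carol facts are assumptions made for illustration; the general algorithm derives such rules automatically for arbitrary programs and queries.

```python
parent = {
    ("xerces", "brooke"), ("brooke", "damocles"),
    ("alice", "bob"), ("bob", "carol"),   # unrelated to the query
}

def ancestors_full(parent):
    # ancestor(X, Y) :- parent(X, Y).
    # ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    anc = set(parent)
    while True:
        new = {(x, y) for x, z in parent for z2, y in anc if z == z2} - anc
        if not new:
            return anc
        anc |= new

def ancestors_magic(parent, start):
    # magic(start).
    # magic(Z) :- magic(X), parent(X, Z).
    magic = {start}
    while True:
        new = {z for x, z in parent if x in magic} - magic
        if not new:
            break
        magic |= new
    # ancestor(X, Y) :- magic(X), parent(X, Y).
    anc = {(x, y) for x, y in parent if x in magic}
    # ancestor(X, Y) :- magic(X), parent(X, Z), ancestor(Z, Y).
    while True:
        new = {(x, y) for x, z in parent if x in magic
               for z2, y in anc if z == z2} - anc
        if not new:
            return anc
        anc |= new

full = ancestors_full(parent)
restricted = ancestors_magic(parent, "xerces")
answers_full = {y for x, y in full if x == "xerces"}
answers_magic = {y for x, y in restricted if x == "xerces"}
print(len(full), len(restricted), sorted(answers_magic))
```

Both evaluations are bottom-up and give the same answers to the query, but the rewritten program never derives ancestor facts about alice, bob, or carol.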
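The decision problem formulation of Datalog evaluation is as follows: "Given a Datalog program P split into a set of facts (EDB) E and a set of rules R, and a ground atom A, is A in the minimal model of P?" In this formulation, there are three variations of the computational complexity of evaluating Datalog programs:[25] data complexity, where the rules R are fixed and E and A are the input; program complexity, where the facts E are fixed and R and A are the input; and combined complexity, where E, R, and A are all part of the input.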
With respect to data complexity, the decision problem for Datalog is P-complete (see Theorem 4.4 in [25]). P-completeness for data complexity means that there exists a fixed Datalog query for which evaluation is P-complete. The proof is based on a Datalog metainterpreter for propositional logic programs.
With respect to program complexity, the decision problem is EXPTIME-complete. In particular, evaluating Datalog programs always terminates; Datalog is not Turing-complete.
Some extensions to Datalog do not preserve these complexity bounds. Extensions implemented in some Datalog engines, such as algebraic data types, can even make the resulting language Turing-complete.
Several extensions have been made to Datalog, e.g., to support negation, aggregate functions, inequalities, to allow object-oriented programming, or to allow disjunctions as heads of clauses. These extensions have significant impacts on the language's semantics and on the implementation of a corresponding interpreter.
Datalog is a syntactic subset of Prolog, disjunctive Datalog, answer set programming, DatalogZ, and constraint logic programming. When evaluated as an answer set program, a Datalog program yields a single answer set, which is exactly its minimal model.[26]
Many implementations of Datalog extend Datalog with additional features; see § Datalog engines for more information.
Datalog can be extended to support aggregate functions.[27]
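As a rough illustration of what an aggregate computes, the sketch below counts, for each person in the parent example, the number of descendants. It assumes one simple evaluation order, in which the aggregate is applied only after the relation it ranges over has reached its fixed point; actual engines differ in the syntax and in the conditions they impose on aggregates.

```python
from collections import Counter

parent = {("xerces", "brooke"), ("brooke", "damocles")}

# First compute ancestor to a fixed point:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
ancestor = set(parent)
while True:
    new = {(x, y) for x, z in parent for z2, y in ancestor if z == z2} - ancestor
    if not new:
        break
    ancestor |= new

# Then aggregate: group the ancestor facts by first argument and count them.
descendant_count = Counter(x for x, _ in ancestor)
print(dict(descendant_count))
```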
Notable Datalog engines that implement aggregation include:
Adding negation to Datalog complicates its semantics, leading to whole new languages and strategies for evaluation. For example, the language that results from adding negation with the stable model semantics is exactly answer set programming.
Stratified negation can be added to Datalog while retaining its model-theoretic and fixed-point semantics. Notable Datalog engines that implement stratified negation include:
Unlike in Prolog, statements of a Datalog program can be stated in any order. Datalog does not have Prolog's cut operator. This makes Datalog a fully declarative language.
In contrast to Prolog, Datalog disallows complex terms as arguments of predicates; for example, p(x, y) is admissible but not p(f(x), y).
This article deals primarily with Datalog without negation (see also Syntax and semantics of logic programming § Negation). However, stratified negation is a common addition to Datalog.
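Stratified negation can be pictured as evaluating a program layer by layer, so that a relation is only negated after it has been computed completely. The sketch below uses a hypothetical two-layer program: the lower layer computes path, and the upper layer derives unreachable from the rule shown in the comment. The relation names `node` and `unreachable` are assumptions for illustration.

```python
edges = {("x", "y"), ("y", "z")}
nodes = {n for edge in edges for n in edge}

# Stratum 0: path is computed to its fixed point first.
#   path(A, B) :- edge(A, B).
#   path(A, C) :- path(A, B), edge(B, C).
path = set(edges)
while True:
    new = {(a, c) for a, b in path for b2, c in edges if b == b2} - path
    if not new:
        break
    path |= new

# Stratum 1: negation refers only to the already complete relation path.
#   unreachable(A, B) :- node(A), node(B), not path(A, B).
unreachable = {(a, b) for a in nodes for b in nodes if (a, b) not in path}
print(sorted(unreachable))
```

Both variables of the negated atom also occur in the positive atoms node(A) and node(B), so the rule only ever tests ground atoms against the finished path relation.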
Datalog generalizes many other query languages. For instance, conjunctive queries and unions of conjunctive queries can be expressed in Datalog. Datalog can also express regular path queries.
When we consider ordered databases, i.e., databases with an order relation on their active domain, then the Immerman–Vardi theorem implies that the expressive power of Datalog is precisely that of the class PTIME: a property can be expressed in Datalog if and only if it is computable in polynomial time.[31]
The boundedness problem for Datalog asks, given a Datalog program, whether it is bounded, i.e., the maximal recursion depth reached when evaluating the program on an input database can be bounded by some constant. In other words, this question asks whether the Datalog program could be rewritten as a nonrecursive Datalog program, or, equivalently, as a union of conjunctive queries. Solving the boundedness problem on arbitrary Datalog programs is undecidable,[32] but it can be made decidable by restricting to some fragments of Datalog.
Systems that implement languages inspired by Datalog, whether compilers, interpreters, libraries, or embedded DSLs, are referred to as Datalog engines. Datalog engines often implement extensions of Datalog, extending it with additional data types, foreign function interfaces, or support for user-defined lattices. Such extensions may allow for writing non-terminating or otherwise ill-defined programs.
Here is a short list of systems that are either based on Datalog or provide a Datalog interpreter:
| Name | Year of latest release | Written in | Licence | Data sources | Description | Links |
|---|---|---|---|---|---|---|
| AbcDatalog | 2023 | Java | BSD | | Datalog engine that implements common evaluation algorithms; designed for extensibility, research use, and education | Homepage |
| Ascent | 2023 | Rust | MIT License | | A logic programming language (similar to Datalog) embedded in Rust via macros, supporting lattices and customized data structures | Repository |
| bddbddb | 2007 | Java | GNU LGPL | | Datalog implementation designed to query Java bytecode, including points-to analysis on large Java programs; uses BDDs internally | Homepage |
| Bloom (Bud) | 2017 | Ruby | BSD 3-Clause | | Ruby DSL for programming with data-centric constructs, based on the Dedalus extension of Datalog, which adds a temporal dimension to the logic | Homepage, Repository |
| Cascalog | 2014 | Clojure | Apache 2.0 | can query other DBMS | Data processing and querying library for Clojure and Java, designed to be used on Hadoop | Repository, Homepage (archived) |
| Clingo | 2024 | C++ | MIT License | | Answer Set Programming system that supports Datalog as a special case; its standalone grounder gringo suffices for plain Datalog | Homepage, Repository, Online demo |
| ConceptBase | 2025 | Prolog/C++/Java | BSD 2-Clause | | Deductive and object-oriented database system for conceptual modeling and metamodeling, which includes a Datalog query evaluator | Homepage |
| Coral | 1997 | C++ | proprietary, free for some uses, open source | | A deductive database system written in C++ with semi-naïve Datalog evaluation. Developed 1988-1997. | Homepage |
| Crepe | 2023 | Rust | Apache 2.0 or MIT | | Rust library for expressing Datalog-like inferences, based on procedural macros | Homepage |
| Datafrog | 2019 | Rust | Apache 2.0 or MIT | | Lightweight Datalog engine intended to be embedded in other Rust programs | Homepage |
| Datafun | 2016 | Racket | open source, no license in repository | | Functional programming language that generalizes Datalog on semilattices | Homepage, Repository |
| Datahike | 2024 | Clojure | Eclipse Public License 1.0 | built-in database (in-memory or file) | Fork of DataScript with a durable backend based on a hitchhiker tree, using Datalog as query language | Homepage |
| Datalevin | 2024 | Clojure | Eclipse Public License 1.0 | LMDB bindings | Fork of DataScript optimized for LMDB durable storage, using Datalog as query language | Homepage |
| Datalog (Erlang) | 2019 | Erlang | Apache 2.0 | | Library to support Datalog queries in Erlang, with data represented as streams of tuples | Homepage |
| Datalog (MITRE) | 2016 | Lua | GNU LGPL | | Lightweight deductive database system, designed to be small and usable on memory-constrained devices | Homepage, Online demo |
| Datalog (OCaml) | 2019 | OCaml | BSD 2-clause | | In-memory Datalog implementation for OCaml featuring bottom-up and top-down algorithms | Homepage |
| Datalog (Racket) | 2022 | Racket | Apache 2.0 or MIT | | Racket package for using Datalog | Homepage, Repository |
| Datalog Educational System | 2025 | Prolog | GNU LGPL | DBMS connectors | Open-source implementation intended for teaching Datalog and SQL[33] | Homepage |
| DataScript | 2024 | Clojure | Eclipse Public License 1.0 | in-memory database | Immutable database that runs in a browser, using Datalog as query language | Homepage |
| Datomic | 2024 | Clojure | closed source; binaries released under Apache 2.0 | bindings for DynamoDB, Cassandra, PostgreSQL and others | Distributed database running on cloud architectures; uses Datalog as query language | Homepage |
| DDlog | 2021 | Rust | MIT License | | Incremental, in-memory, typed Datalog engine; compiled in Rust; based on the differential dataflow[34] library | Homepage |
| DLV | 2023 | C++ | proprietary, free for some uses | | Answer Set Programming system that supports Datalog as a special case | Homepage, Company |
| Dyna1 | 2013 | Haskell | GNU AGPL v3 | | Declarative programming language using Datalog for statistical AI programming; later Dyna versions do not use Datalog | Repository, Homepage (archived) |
| Flix | 2024 | Java | Apache 2.0 | | Functional and logic programming language inspired by Datalog, extended with user-defined lattices and monotone filter/transfer functions | Homepage, Online demo |
| Graal | 2018 | Java | CeCILL v2.1 | RDF import, CSV import, DBMS connectors | Java toolkit dedicated to querying knowledge bases within the framework of existential rules (a.k.a. tuple-generating dependencies or Datalog+/-) | Homepage |
| Inter4QL | 2020 | C++ | BSD | | Interpreter for a database query language based on four-valued logic; supports Datalog as a special case | Homepage |
| IRIS | 2016 | Java | GNU LGPL v2.1 | | Logic programming system supporting Datalog and negation under the well-founded semantics; support for RDFS | Repository |
| Jena | 2024 | Java | Apache 2.0 | RDF import | Semantic web framework that includes a Datalog implementation as part of its general purpose rule engine; compatibility with RDF | Rule engine documentation |
| Mangle | 2024 | Go | Apache 2.0 | | Programming language for deductive database programming, supporting an extension of Datalog | Homepage |
| maplib | 2025 | Rust | Apache 2.0, proprietary for some uses | RDF import, Polars data frames | Semantic web framework in Python that supports Datalog reasoning for knowledge graphs as RDF | Homepage |
| Naga | 2021 | Clojure | Eclipse Public License 1.0 | Asami graph database | Query engine that executes Datalog queries over the graph database; runs in browsers (memory), on JVM (memory/files), or natively (memory/files) | Homepage |
| Nemo | 2024 | Rust | Apache 2.0 or MIT | RDF import, CSV import | In-memory rule engine for knowledge graph analysis and database transformations; compatible with RDF and SPARQL; supports tgds | Homepage, Online demo |
| pyDatalog | 2015 | Python | GNU LGPL | DBMS connectors from Python | Python library for interpreting Datalog queries | Homepage, Repository |
| RDFox | 2025 | C++ | proprietary, free for some uses | in-memory database, RDF import, CSV import, DBMS connectors | Main-memory based RDF triple store with Datalog reasoning; supports incremental evaluation and high availability setups | Homepage |
| SociaLite | 2016 | Java | Apache 2.0 | HDFS bindings | Datalog variant and engine for large-scale graph analysis | Homepage (archived), Repository |
| Soufflé | 2023 | C++ | UPL v1.0 | CSV import, sqlite3 bindings | Datalog engine originally designed for applications in static program analysis; rule sets are either compiled to C++ programs or interpreted | Homepage |
| tclbdd | 2015 | Tcl | BSD | | Datalog implementation based on binary decision diagrams; designed to support development of an optimizing compiler for Tcl[35] | Homepage |
| TerminusDB | 2024 | Prolog/Rust | Apache 2.0 | | Graph database and document store that also features a Datalog-based query language | Homepage |
| XSB | 2022 | C | GNU LGPL | | A logic programming and deductive database system based on Prolog with tabling, giving Datalog-like termination and efficiency, including incremental evaluation[36] | Homepage |
| XTDB (formerly Crux) | 2024 | Clojure | MPL 2.0 | bindings for Apache Kafka and others | Immutable database with time-travel; Datalog used as query language in XTDB 1.x (may change in XTDB 2.x) | Homepage, Repository |
Datalog is quite limited in its expressivity. It is not Turing-complete, and doesn't include basic data types such as integers or strings. This parsimony is appealing from a theoretical standpoint, but it means Datalog per se is rarely used as a programming language or knowledge representation language.[41] Most Datalog engines implement substantial extensions of Datalog. However, Datalog has a strong influence on such implementations, and many authors don't bother to distinguish them from Datalog as presented in this article. Accordingly, the applications discussed in this section include applications of realistic implementations of Datalog-based languages.
Datalog has been applied to problems in data integration, information extraction, networking, security, cloud computing and machine learning.[42][43] Google has developed an extension to Datalog for big data processing.[44]
Datalog has seen application in static program analysis.[45] The Soufflé dialect has been used to write pointer analyses for Java and a control-flow analysis for Scheme.[46][47] Datalog has been integrated with SMT solvers to make it easier to write certain static analyses.[48] The Flix dialect is also suited to writing static program analyses.[49]
Some widely used database systems include ideas and algorithms developed for Datalog. For example, the SQL:1999 standard includes recursive queries, and the Magic Sets algorithm (initially developed for the faster evaluation of Datalog queries) is implemented in IBM's DB2.[50]
The origins of Datalog date back to the beginning of logic programming, but it became prominent as a separate area around 1977 when Hervé Gallaire and Jack Minker organized a workshop on logic and databases.[51] David Maier is credited with coining the term Datalog.[52]