Tuple calculus is acalculus that was created and introduced byEdgar F. Codd as part of therelational model, in order to provide adeclarative database-query language for data manipulation in thisdata model. It formed the inspiration for the database-query languagesQUEL andSQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly everyrelational-database-management system. Michel Lacroix and Alain Pirotte proposeddomain calculus, which is closer tofirst-order logic and together with Codd showed that both of these calculi (as well asrelational algebra) are equivalent inexpressive power.[1] Subsequently, query languages for the relational model were calledrelationally complete if they could express at least all of these queries.
Since the calculus is a query language forrelational databases we first have to define a relational database. The basic relational building block is thedomain (somewhat similar, but not equal to, adata type). Atuple is a finite sequence ofattributes, which areordered pairs of domains and values. Arelation is a set of (compatible) tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. Atable is an accepted visual representation of a relation; a tuple is similar to the concept of arow.
We first assume the existence of a setC of column names, examples of which are "name", "author", "address", etcetera. We defineheaders as finite subsets ofC. Arelational database schema is defined as atupleS = (D,R,h) whereD is the domain of atomic values (seerelational model for more on the notions ofdomain andatomic value),R is a finite set of relation names, and
afunction that associates a header with each relation name inR. (Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domainD we define atuple overD as apartial function that maps some column names to an atomic value inD. An example would be (name : "Harry", age : 25).
The set of all tuples overD is denoted asTD. The subset ofC for which a tuplet is defined is called thedomain oft (not to be confused with the domain in the schema) and denoted asdom(t).
Finally we define arelational database given a schemaS = (D,R,h) as a function
that maps the relation names inR to finite subsets ofTD, such that for every relation namer inR and tuplet indb(r) it holds that
The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.
For the construction of the formulas we will assume an infinite setV of tuple variables. The formulas are defined given a database schemaS = (D,R,h) and a partial functiontype :V ⇸ 2C, called attype assignment, that assigns headers to some tuple variables. We then define theset ofatomic formulasA[S,type] with the following rules:
Examples of atoms are:
The formal semantics of such atoms is defined given a databasedb overS and a tuple variable bindingval :V →TD that maps tuple variables to tuples over the domain inS:
The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables. We define theset of formulasF[S,type] inductively with the following rules:
Examples of formulas:
Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.
We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema. This leads to the following formal semantics for formulas given a databasedb overS and a tuple variable bindingval :V ->TD:
Finally we define what a query expression looks like given a schemaS = (D,R,h):
wherev is a tuple variable,H a header andf(v) a formula inF[S,type] wheretype = { (v,H) } and withv as its only free variable. The result of such a query for a given databasedb overS is the set of all tuplest overD withdom(t) =H such thatf is true fordb andval = { (v,t) }.
Examples of query expressions are:
Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the schema it can be that a query may return a different result for a certain database if another schema is presumed. For example, consider the two schemasS1 = (D1,R,h ) andS2 = (D2,R,h ) with domainsD1 = { 1 },D2 = { 1, 2 }, relation namesR = { "r1" } and headersh = { ("r1", {"a"}) }. Both schemas have a common instance:
If we consider the following query expression
then its result ondb is either { (a : 1) } underS1 or { (a : 1), (a : 2) } underS2. It will also be clear that if we take the domain to be an infinite set, then the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that aredomain independent, i.e., the queries that return the same result for a database under all of its schemas.
An interesting property of these queries is that if we assume that the tuple variables range over tuples over the so-calledactive domain of the database, which is the subset of the domain that occurs in at least one tuple in the database or in the query expression, then the semantics of the query expressions does not change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is defined, which makes all queries by definition domain independent.
In order to limit the query expressions such that they express only domain-independent queries a syntactical notion ofsafe query is usually introduced. To determine whether a query expression is safe we will derive two types of information from a query. The first is whether a variable-column pairt.a isbound to the column of a relation or a constant, and the second is whether two variable-column pairs are directly or indirectly equated (denotedt.v ==s.w).
For deriving boundedness we introduce the following reasoning rules:
For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for equivalence relations: reflexivity, symmetry and transitivity):
We then say that a query expression { v : H | f(v) } issafe if
The restriction to safe query expressions does not limit the expressiveness since all domain-independent queries that could be expressed can also be expressed by a safe query expression. This can be proven by showing that for a schemaS = (D,R,h), a given setK of constants in the query expression, a tuple variablev and a headerH we can construct a safe formula for every pairv.a witha inH that states that its value is in the active domain. For example, assume thatK={1,2},R={"r"} andh = { ("r", {"a, "b"}) } then the corresponding safe formula forv.b is:
This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query expression by adding such a formula for every variablev and column namea in its type where it is used in the expression. Effectively this means that we let all variables range over the active domain, which, as was already explained, does not change the semantics if the expressed query is domain independent.