Movatterモバイル変換

[0]ホーム

Jump to content

Tuple relational calculus

Edit links

From Wikipedia, the free encyclopedia

(Redirected fromTuple calculus)

Relational model

Tuple calculus is acalculus that was created and introduced byEdgar F. Codd as part of therelational model, in order to provide adeclarative database-query language for data manipulation in thisdata model. It formed the inspiration for the database-query languagesQUEL andSQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly everyrelational-database-management system. Michel Lacroix and Alain Pirotte proposeddomain calculus, which is closer tofirst-order logic and together with Codd showed that both of these calculi (as well asrelational algebra) are equivalent inexpressive power.^[1] Subsequently, query languages for the relational model were calledrelationally complete if they could express at least all of these queries.

Definition

[edit]

Relational database

[edit]

Since the calculus is a query language forrelational databases we first have to define a relational database. The basic relational building block is thedomain (somewhat similar, but not equal to, adata type). Atuple is a finite sequence ofattributes, which areordered pairs of domains and values. Arelation is a set of (compatible) tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. Atable is an accepted visual representation of a relation; a tuple is similar to the concept of arow.

We first assume the existence of a setC of column names, examples of which are "name", "author", "address", etcetera. We defineheaders as finite subsets ofC. Arelational database schema is defined as atupleS = (D,R,h) whereD is the domain of atomic values (seerelational model for more on the notions ofdomain andatomic value),R is a finite set of relation names, and

h :R → 2^C

afunction that associates a header with each relation name inR. (Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domainD we define atuple overD as apartial function that maps some column names to an atomic value inD. An example would be (name : "Harry", age : 25).

t :C ⇸D

The set of all tuples overD is denoted asT_D. The subset ofC for which a tuplet is defined is called thedomain oft (not to be confused with the domain in the schema) and denoted asdom(t).

Finally we define arelational database given a schemaS = (D,R,h) as a function

db :R → 2^T_D

that maps the relation names inR to finite subsets ofT_D, such that for every relation namer inR and tuplet indb(r) it holds that

dom(t) =h(r).

The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.

Atoms

[edit]

For the construction of the formulas we will assume an infinite setV of tuple variables. The formulas are defined given a database schemaS = (D,R,h) and a partial functiontype :V ⇸ 2^C, called attype assignment, that assigns headers to some tuple variables. We then define theset ofatomic formulasA[S,type] with the following rules:

ifv andw inV,a intype(v) andb intype(w) then the formulav.a =w.b is inA[S,type],
ifv inV,a intype(v) andk denotes a value inD then the formulav.a =k is inA[S,type], and
ifv inV,r inR andtype(v) =h(r) then the formular(v) is inA[S,type].

Examples of atoms are:

(t.age =s.age) —t has an age attribute ands has an age attribute with the same value
(t.name = "Codd") — tuplet has a name attribute and its value is "Codd"
Book(t) — tuplet is present in relation Book.

The formal semantics of such atoms is defined given a databasedb overS and a tuple variable bindingval :V →T_D that maps tuple variables to tuples over the domain inS:

v.a =w.b is true if and only ifval(v)(a) =val(w)(b)
v.a =k is true if and only ifval(v)(a) =k
r(v) is true if and only ifval(v) is indb(r)

Formulas

[edit]

The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables. We define theset of formulasF[S,type] inductively with the following rules:

every atom inA[S,type] is also inF[S,type]
iff₁ andf₂ are inF[S,type] then the formulaf₁ ∧f₂ is also inF[S,type]
iff₁ andf₂ are inF[S,type] then the formulaf₁ ∨f₂ is also inF[S,type]
iff is inF[S,type] then the formula ¬f is also inF[S,type]
ifv inV,H a header andf a formula inF[S,type_[v->H]] then the formula ∃v :H (f ) is also inF[S,type], wheretype_[v->H] denotes the function that is equal totype except that it mapsv toH,
ifv inV,H a header andf a formula inF[S,type_[v->H]] then the formula ∀v :H (f ) is also inF[S,type]

Examples of formulas:

t.name = "C. J. Date" ∨t.name = "H. Darwen"
Book(t) ∨ Magazine(t)
∀t : {author, title, subject} ( ¬ ( Book(t) ∧t.author = "C. J. Date" ∧ ¬ (t.subject = "relational model")))

Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.

We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema. This leads to the following formal semantics for formulas given a databasedb overS and a tuple variable bindingval :V ->T_D:

f₁ ∧f₂ is true if and only iff₁ is true andf₂ is true,
f₁ ∨f₂ is true if and only iff₁ is true orf₂ is true or both are true,
¬f is true if and only iff is not true,
∃v :H (f ) is true if and only if there is a tuplet overD such thatdom(t) =H and the formulaf is true forval_[v->t], and
∀v :H (f ) is true if and only if for all tuplest overD such thatdom(t) =H the formulaf is true forval_[v->t].

Queries

[edit]

Finally we define what a query expression looks like given a schemaS = (D,R,h):

{v :H |f(v) }

wherev is a tuple variable,H a header andf(v) a formula inF[S,type] wheretype = { (v,H) } and withv as its only free variable. The result of such a query for a given databasedb overS is the set of all tuplest overD withdom(t) =H such thatf is true fordb andval = { (v,t) }.

Examples of query expressions are:

{t : {name} | ∃s : {name, wage} ( Employee(s) ∧s.wage = 50.000 ∧t.name =s.name ) }
{t : {supplier, article} | ∃s : {s#, sname} ( Supplier(s) ∧s.sname =t.supplier ∧ ∃p : {p#, pname} ( Product(p) ∧p.pname =t.article ∧ ∃a : {s#, p#} ( Supplies(a) ∧s.s# =a.s# ∧a.p# =p.p# ))) }

Semantic and syntactic restriction

[edit]

Domain-independent queries

[edit]

Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the schema it can be that a query may return a different result for a certain database if another schema is presumed. For example, consider the two schemasS₁ = (D₁,R,h ) andS₂ = (D₂,R,h ) with domainsD₁ = { 1 },D₂ = { 1, 2 }, relation namesR = { "r₁" } and headersh = { ("r₁", {"a"}) }. Both schemas have a common instance:

db = { ( "r₁", { ("a", 1) } ) }

If we consider the following query expression

{t : {a} |t.a =t.a }

then its result ondb is either { (a : 1) } underS₁ or { (a : 1), (a : 2) } underS₂. It will also be clear that if we take the domain to be an infinite set, then the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that aredomain independent, i.e., the queries that return the same result for a database under all of its schemas.

An interesting property of these queries is that if we assume that the tuple variables range over tuples over the so-calledactive domain of the database, which is the subset of the domain that occurs in at least one tuple in the database or in the query expression, then the semantics of the query expressions does not change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is defined, which makes all queries by definition domain independent.

Safe queries

[edit]

In order to limit the query expressions such that they express only domain-independent queries a syntactical notion ofsafe query is usually introduced. To determine whether a query expression is safe we will derive two types of information from a query. The first is whether a variable-column pairt.a isbound to the column of a relation or a constant, and the second is whether two variable-column pairs are directly or indirectly equated (denotedt.v ==s.w).

For deriving boundedness we introduce the following reasoning rules:

in "v.a =w.b " no variable-column pair is bound,
in "v.a =k " the variable-column pairv.a is bound,
in "r(v) " all pairsv.a are bound fora intype(v),
in "f₁ ∧f₂ " all pairs are bound that are bound either inf₁ or inf₂,
in "f₁ ∨f₂ " all pairs are bound that are bound both inf₁ and inf₂,
in " ¬f " no pairs are bound,
in " ∃v :H (f ) " a pairw.a is bound if it is bound inf andw <>v, and
in " ∀v :H (f ) " a pairw.a is bound if it is bound inf andw <>v.

For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for equivalence relations: reflexivity, symmetry and transitivity):

in "v.a =w.b " it holds thatv.a ==w.b,
in "v.a =k " no pairs are equated,
in "r(v) " no pairs are equated,
in "f₁ ∧f₂ " it holds thatv.a ==w.b if it holds either inf₁ or inf₂,
in "f₁ ∨f₂ " it holds thatv.a ==w.b if it holds both inf₁ and inf₂,
in " ¬f " no pairs are equated,
in " ∃v :H (f ) " it holds thatw.a ==x.b if it holds inf andw<>v andx<>v, and
in " ∀v :H (f ) " it holds thatw.a ==x.b if it holds inf andw<>v andx<>v.

We then say that a query expression { v : H | f(v) } issafe if

for every column namea inH we can derive thatv.a is equated with a bound pair inf,
for every subexpression off of the form " ∀w :G (g ) " we can derive that for every column namea inG we can derive thatw.a is equated with a bound pair ing, and
for every subexpression off of the form " ∃w :G (g ) " we can derive that for every column namea inG we can derive thatw.a is equated with a bound pair ing.

The restriction to safe query expressions does not limit the expressiveness since all domain-independent queries that could be expressed can also be expressed by a safe query expression. This can be proven by showing that for a schemaS = (D,R,h), a given setK of constants in the query expression, a tuple variablev and a headerH we can construct a safe formula for every pairv.a witha inH that states that its value is in the active domain. For example, assume thatK={1,2},R={"r"} andh = { ("r", {"a, "b"}) } then the corresponding safe formula forv.b is:

v.b = 1 ∨v.b = 2 ∨ ∃w ( r(w) ∧ (v.b =w.a ∨v.b =w.b ) )

This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query expression by adding such a formula for every variablev and column namea in its type where it is used in the expression. Effectively this means that we let all variables range over the active domain, which, as was already explained, does not change the semantics if the expressed query is domain independent.