```
What is a Query Language?
Universality of Data Retrieval Languages, Aho and Ullman, POPL 1979
Raghu Ramakrishnan
CS 286, UC Berkeley, Spring 2007 , R. Ramakrishnan
What is …?

What Is A Query Language?
• A language that allows retrieval and manipulation of data from a database.

What Is A Database?
• A large collection of DATA
• The data can be grouped into sets whose elements have
similar structure.


What Kind of Structure Can the Data Have?
What Kind of Manipulation Should Be Allowed?
Some Ideas
• Relations should be treated as sets of tuples.
• The query language must have a simple, nonoperational meaning that is independent of the physical data representation.
• There must be efficient ways to process queries over (large) sets of similarly structured facts.

We will focus on the relational model
Principles for a Relational Query Language*
* Proposed by Aho & Ullman
1) Relation = Set of Tuples.
Ordering & other storage details should not be
visible.
2) Data values should not be ‘interpreted’.
Def: Let $\mu : D \to D$ be a bijection on the domain D of data values; $\mu(r)$ applies $\mu$ to every value of every tuple of r.
A function f is allowable if, for every such $\mu$:
    $\mu(f(r_1, \dots, r_n)) = f(\mu(r_1), \dots, \mu(r_n))$
Note: (2) says that no special meaning should be attached to data values (as far as the query language is concerned); thus, arithmetic is disallowed!
    5 + 6 = 11, 8 < 9, …
Principles – Refinement


Principle (2) is too restrictive. Relax it slightly:
Let P be a special set of predicates (e.g., $<$).
$\mu$ preserves P if, for all $p \in P$ and all $x_1, \dots, x_n$:
    $p(x_1, \dots, x_n)$ is true $\iff$ $p(\mu(x_1), \dots, \mu(x_n))$ is true.
Relaxing Principle (2): we require that
    $\mu(f(r_1, \dots, r_n)) = f(\mu(r_1), \dots, \mu(r_n))$
only for bijections $\mu$ that preserve P.
Note: If we add +, ×, etc. to P, soon only the identity function will preserve P!
Allowable Fns – Transitive Closure

Aho & Ullman’s notion of an allowable function is rather restrictive. However:
1. All Relational Algebra queries are allowable.
2. Transitive Closure is allowable.

And they prove that:
• There is no Relational Algebra query that computes
the Transitive Closure of a Relation.
Intuition: any R.A. expression has a fixed size, say n. Choose R to be the chain a1 → a2 → … → ak, with k > n.
The relational algebra expression cannot deal with the pair (a1, ak).
Proposal

We should extend RA to support a least fixpoint operator.
• Some systems (e.g., Oracle) support limited forms
of recursion like transitive closure. Others (DB2)
support linear recursion, following SQL:1999.
Least Fixpoints

The LFP operator is defined as follows:
    $\mathrm{LFP}(R = f(R)) = r$, where:
1. $r = f(r)$
2. if $r' = f(r')$, then $r \subseteq r'$

Theorem (Tarski):
There is a least fixpoint satisfying LFP(R = f(R)) if f is monotone.
Monotone: $r_1 \subseteq r_2 \Rightarrow f(r_1) \subseteq f(r_2)$
Note: If f is a relational algebra expression without ‘−’ (set difference), then it is monotone.
Least Fixpoint – Cont.

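Theorem (Kleene):
If f is continuous over a complete lattice, then
    $\mathrm{LFP}(R = f(R)) = \lim_{n \to \infty} f^n(\emptyset)$

Example: Transitive closure
    $R = (R \circ r) \cup r$, i.e., $f(R) = (R \circ r) \cup r$,
where $\circ$ is relational composition, $R \circ r = \{(x, z) \mid (x, y) \in R,\ (y, z) \in r\}$, and $r^i$ is r composed with itself i times.
    $f(\emptyset) = r$
    $f(f(\emptyset)) = f(r) = (r \circ r) \cup r$
    $f^n(\emptyset) = r \cup (r \circ r) \cup \dots \cup r^n = \bigcup_{i=1}^{n} r^i$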
LFP - Cont.

Claim:
The LFP operator satisfies principles 1&2

Theorem (Aho-Ullman):
There is no relational algebra expression E(R)
that computes the transitive closure of an
arbitrary input relation R.
Proof
Consider a set of l arbitrary symbols:
    $\Sigma_l = \{a_1, a_2, \dots, a_l\}$
We consider a family of relations
    $R_l = \{(a_1, a_2), (a_2, a_3), \dots, (a_{l-1}, a_l)\}$
[Figure: the chain a1 → a2 → a3 → … → al]
We show that NO relational algebra expression E computes exactly the tuples in $R_l^+$, the transitive closure of $R_l$, for all l.
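We will prove that every R.A. expression $E(R_l)$ can be expressed as:
    $\{b_1 b_2 \dots b_k \mid \psi(b_1, b_2, \dots, b_k)\}$
where
    $\psi$ is of the form $\mathit{clause}_1 \lor \mathit{clause}_2 \lor \dots$
    each clause is of the form $\mathit{atom}_1 \land \mathit{atom}_2 \land \dots$
    each atom is of the form $b_i = a_c$, $b_i \neq a_c$, $b_i = b_j + c$, or $b_i \neq b_j + c$
The b's are variables taking values from $\Sigma_l$, and the c's are constants $(0 \le c \le l)$.
[Figure: the chain a1, a2, …, a_{m−c}, …, a_m, …, al, with b_j at a_{m−c} and b_j + c at a_m]
Note: Here $b_j + c \equiv a_m$ such that $b_j = a_{m-c}$, i.e., the element c steps further along the chain.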
Lemma: If E is any R.A. expression, then
    $E(R_l) = \{b_1 b_2 \dots b_k \mid \psi(b_1, b_2, \dots, b_k)\}$
Suppose the lemma is true; we can then prove the theorem as follows:
If $E(R) = R^+$ for some E and for all R, then
    $R_l^+ = \{b_1 b_2 \mid \psi(b_1, b_2)\}$


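Case 1: Every clause in $\psi$ has an atom of the form
    $b_1 = a_i$, $b_2 = a_i$, $b_1 = b_2 - c$, or $b_1 = b_2 + c$
Consider $(b_1, b_2) = (a_m, a_{m+d})$ with $d \ge 1$ (taking l large enough), where
    $m \neq i$ and $m + d \neq i$ for all i such that $b_1 = a_i$ or $b_2 = a_i$ is an atom;
    $d \neq c$ for all c such that $b_1 = b_2 - c$ is an atom.
(An atom $b_1 = b_2 + c$ always fails for this pair, since $a_m$ precedes $a_{m+d}$.)
Every clause then contains a false atom, so
$\therefore$ $(a_m, a_{m+d})$ is not computed, but it is in $R_l^+$.
[Figure: the chain through a_i, a_m, a_{m+c}, a_{m+d}]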
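Case 2: Some clause in $\psi$ has ONLY atoms with $\neq$.
Consider $(b_1, b_2) = (a_{m+d}, a_m)$, where no atom $b_i \neq a_m$ or $b_i \neq a_{m+d}$ appears in $\psi$, and $d \neq c$ for all c such that $b_1 \neq b_2 + c$ or $b_2 \neq b_1 + c$ appears in $\psi$.
Every atom of that clause then holds, so
$\therefore$ $(a_{m+d}, a_m)$ is computed, but it is not in $R_l^+$.
[Figure: b1 at a_{m+d} and b2 at a_m, a pair pointing backward along the chain]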
Proof of lemma
Basis: 0 operators $\Rightarrow$ E(R) is R or a constant relation.
    $R = \{b_1 b_2 \mid b_2 = b_1 + 1\}$;
    $\{c_1, c_2, \dots, c_m\} = \{b_1 \mid b_1 = c_1 \lor b_1 = c_2 \lor \dots\}$
Induction: $E = E_1 \cup E_2$, $E_1 - E_2$, or $E_1 \times E_2$, where
    $E_1 = \{b_1 \dots b_k \mid \Psi_1(b_1 \dots b_k)\}$
    $E_2 = \{b'_1 \dots b'_k \mid \Psi_2(b'_1 \dots b'_k)\}$
    $E_1 \cup E_2 = \{b_1 \dots b_k \mid \Psi_1(b_1 \dots b_k) \lor \Psi_2(b_1 \dots b_k)\}$
$E = \sigma_F(E_1)$, where F uses only $=$, $\neq$, $\land$, $\lor$:
    $E = \{b_1 \dots b_k \mid \Psi_1(b_1 \dots b_k) \land F(b_1 \dots b_k)\}$
$E = \pi_S(E_1)$: proceed similarly.
Transitive closure - more
[Figure: the chain a1 → a2 → a3 → … → al]
$R_l = \{(a_1, a_2), (a_2, a_3), \dots, (a_{l-1}, a_l)\}$
$\sigma_{1<2}(\pi_1(R_l) \times \pi_2(R_l))$
Does this relational algebra expression compute $R_l^+$?
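YES! But it is NOT a relational algebra expression: it relies on the ordering $<$ of the data values, not on the structure of $R_l$.
[Figure: a chain over a1, a2, a3, a4 whose order along the edges does not follow the subscripts]
What does “$a_i < a_j$” mean now?!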
BP-Completeness

A query language is BP-complete if:
• All functions that can be expressed in the language
are allowable.
• Let r1 and r2 be two relations (instances) such that, for all renamings μ,
    $r_1 = \mu(r_1) \Rightarrow r_2 = \mu(r_2)$.
  Then there is a function f in the language such that $r_2 = f(r_1)$.
Example of BP-Complete
[Table: six small relations, labeled A–F, over the values 5, 6, 7, 8, 10, 11]
1. If ‘A’ is used as ‘r1’ in the definition on the previous slide, which of the others qualifies as ‘r2’?
2. For each such relation, find a relational algebra function f.
```
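The allowability condition on the "Principles for a Relational Query Language" slide can be tested mechanically on small instances. Below is a minimal Python sketch. It assumes relations are Python sets of tuples and checks only the bijections of a small finite domain on one given instance, so it can refute allowability but cannot prove it. `rename`, `is_allowable_on`, and the two sample queries are hypothetical names introduced for illustration.

```python
from itertools import permutations


def rename(mu, rel):
    """Apply the bijection mu (a dict) to every value of every tuple in rel."""
    return frozenset(tuple(mu[v] for v in t) for t in rel)


def is_allowable_on(f, rel, domain):
    """Check mu(f(rel)) == f(mu(rel)) for every bijection mu of a finite domain.

    Only one instance is tested, so True means "no counterexample found here".
    """
    values = sorted(domain)
    for image in permutations(values):
        mu = dict(zip(values, image))
        if rename(mu, f(rel)) != f(rename(mu, rel)):
            return False
    return True


def project_first(rel):
    """Relational projection onto the first column (does not interpret values)."""
    return frozenset((t[0],) for t in rel)


def select_less_than_7(rel):
    """Selection that interprets values: keeps tuples whose first value is < 7."""
    return frozenset(t for t in rel if t[0] < 7)


r = frozenset({(5, 6), (6, 7), (7, 5)})
domain = {5, 6, 7}
print(is_allowable_on(project_first, r, domain))       # True
print(is_allowable_on(select_less_than_7, r, domain))  # False
```

The projection commutes with every renaming. The `< 7` selection does not: swapping 5 and 7 changes which tuples it keeps, which is the kind of interpretation of values that Principle (2) rules out.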
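The "Principles – Refinement" slide notes that enlarging P quickly leaves only the identity as a P-preserving bijection. This hypothetical sketch takes P = {<} over a finite set of integers and enumerates the bijections that preserve it. On a finite totally ordered domain, < alone already leaves only the identity. Over all integers, shifts x ↦ x + k would also preserve <, and adding + to P would rule those out.

```python
from itertools import permutations


def preserves_less_than(mu, values):
    """True if x < y holds exactly when mu[x] < mu[y] holds, for all x, y."""
    return all((x < y) == (mu[x] < mu[y]) for x in values for y in values)


def order_preserving_bijections(domain):
    """All bijections of the finite domain that preserve the predicate <."""
    values = sorted(domain)
    result = []
    for image in permutations(values):
        mu = dict(zip(values, image))
        if preserves_less_than(mu, values):
            result.append(mu)
    return result


domain = {5, 6, 7, 8}
survivors = order_preserving_bijections(domain)
print(len(survivors))                                       # 1
print(all(mu[v] == v for mu in survivors for v in domain))  # True: the identity
```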
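The size argument on the "Allowable Fns – Transitive Closure" slide can be illustrated, though not proved, for one family of expressions: unions of at most n compositions of R with itself. In this hypothetical sketch a_i is encoded as the integer i. With a chain of length k > n, the pair (a1, ak) is never produced.

```python
def compose(r, s):
    """Relational composition: {(x, z) | (x, y) in r and (y, z) in s}."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def chain(k):
    """R = {(a1, a2), ..., (a_{k-1}, a_k)}, with a_i encoded as i."""
    return {(i, i + 1) for i in range(1, k)}


def paths_up_to(r, n):
    """r | r^2 | ... | r^n: a fixed expression built from n compositions."""
    result, power = set(r), set(r)
    for _ in range(n - 1):
        power = compose(power, r)
        result |= power
    return result


n, k = 3, 6  # expression "size" n, chain length k > n
e = paths_up_to(chain(k), n)
print((1, 1 + n) in e)  # True: pairs up to n steps apart are reached
print((1, k) in e)      # False: (a1, ak) is out of reach
```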
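The note on the "Least Fixpoints" slide says that relational algebra expressions without set difference are monotone. The sketch below compares two step functions on one pair of instances. The first is the transitive-closure step f(R) = (R ∘ r) ∪ r. The second, f(R) = r − R, uses difference. Function names are hypothetical, and relations are Python sets of pairs. For nonempty r, the second f has no fixpoint at all, since any tuple of r would have to be both in R and not in R.

```python
def compose(r, s):
    """Relational composition: {(x, z) | (x, y) in r and (y, z) in s}."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def step_closure(R, r):
    """f(R) = (R o r) | r: built without set difference, hence monotone."""
    return compose(R, r) | r


def step_difference(R, r):
    """f(R) = r - R: uses set difference and is not monotone."""
    return r - R


def violates_monotonicity(f, r1, r2, r):
    """True if r1 is a subset of r2 but f(r1) is not a subset of f(r2)."""
    return r1 <= r2 and not (f(r1, r) <= f(r2, r))


r = {(1, 2), (2, 3)}
smaller, larger = set(), {(1, 2)}
print(violates_monotonicity(step_closure, smaller, larger, r))     # False
print(violates_monotonicity(step_difference, smaller, larger, r))  # True
```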
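Kleene's theorem on the "Least Fixpoint – Cont." slide suggests a direct evaluation strategy: start from ∅ and apply f until nothing changes. Below is a minimal sketch of this naive iteration for f(R) = (R ∘ r) ∪ r. It assumes a finite input relation, so the iteration must stop. `lfp` and `transitive_closure` are illustrative names, not the API of any real system.

```python
def compose(r, s):
    """Relational composition: {(x, z) | (x, y) in r and (y, z) in s}."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def lfp(f):
    """Kleene iteration: f(empty), f(f(empty)), ... until the result is stable."""
    current = frozenset()
    while True:
        nxt = frozenset(f(current))
        if nxt == current:
            return current
        current = nxt


def transitive_closure(r):
    """LFP of f(R) = (R o r) | r, the transitive-closure equation on the slide."""
    return lfp(lambda R: compose(R, r) | set(r))


r = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(r)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

After the n-th round the current value is $f^n(\emptyset) = \bigcup_{i=1}^{n} r^i$, matching the expansion on the slide.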
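For the chains $R_l$ used in the "Proof" slide, the target relation has a simple closed form: $R_l^+ = \{(a_i, a_j) \mid 1 \le i < j \le l\}$, with $l(l-1)/2$ tuples. The sketch below checks that claim for small l. Encoding a_i as the integer i is an assumption made only for the sketch.

```python
def chain(l):
    """R_l = {(a_1, a_2), ..., (a_{l-1}, a_l)}, with a_i encoded as i."""
    return {(i, i + 1) for i in range(1, l)}


def closure(r):
    """Transitive closure: keep composing with r until no new pairs appear."""
    result = set(r)
    while True:
        new = {(x, z) for (x, y) in result for (y2, z) in r if y == y2} - result
        if not new:
            return result
        result |= new


for l in range(1, 8):
    expected = {(i, j) for i in range(1, l + 1) for j in range(i + 1, l + 1)}
    assert closure(chain(l)) == expected
    assert len(expected) == l * (l - 1) // 2
print("checked l = 1..7")
```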
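The formulas $\psi$ in the lemma are disjunctions of conjunctions of atoms of four shapes, which makes them easy to evaluate. The sketch below is a hypothetical encoding: class and function names are illustrative, variables are 0-indexed, and each a_m is represented by its index m. It first evaluates the basis formula for R, $b_2 = b_1 + 1$, on a chain and exhibits a Case 1 style witness, a pair in $R_l^+$ that the formula does not produce. It then shows a Case 2 style clause with only $\neq$ atoms accepting a backward pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstAtom:
    """b_i = a_c, or b_i != a_c when negated (values are chain indices)."""
    i: int
    c: int
    negated: bool = False

    def holds(self, b):
        return (b[self.i] == self.c) != self.negated


@dataclass(frozen=True)
class ShiftAtom:
    """b_i = b_j + c (b_i is c steps after b_j), or its negation."""
    i: int
    j: int
    c: int
    negated: bool = False

    def holds(self, b):
        return (b[self.i] == b[self.j] + self.c) != self.negated


def evaluate(psi, b):
    """psi: list of clauses (OR); each clause: list of atoms (AND)."""
    return any(all(atom.holds(b) for atom in clause) for clause in psi)


# Basis case R = {b1 b2 | b2 = b1 + 1}; b1 is index 0 and b2 is index 1 here.
psi_R = [[ShiftAtom(i=1, j=0, c=1)]]

l = 10
pairs = [(m, n) for m in range(1, l + 1) for n in range(1, l + 1)]
computed = {p for p in pairs if evaluate(psi_R, p)}
closure = {(m, n) for (m, n) in pairs if m < n}

print(computed == {(m, m + 1) for m in range(1, l)})  # True: exactly R_l
print((1, 3) in closure and (1, 3) not in computed)   # True: witness with d = 2

# A clause with only != atoms (Case 2), e.g. b1 != a_1: it also accepts the
# backward pair (a_5, a_3), which is not in R_l^+.
psi_neq = [[ConstAtom(i=0, c=1, negated=True)]]
print(evaluate(psi_neq, (5, 3)) and (5, 3) not in closure)  # True
```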
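The expression on the "Transitive closure - more" slide can be run directly if Python's built-in < on integer labels plays the role of the (non-allowable) comparison. The relabeled chain `[1, 2, 4, 3]` is a hypothetical stand-in for the slide's figure. Once the subscripts no longer follow the chain, the same expression gives the wrong answer. Function names are illustrative.

```python
from itertools import product


def chain_from(labels):
    """Chain relation over labels in the given order: (x0, x1), (x1, x2), ..."""
    return {(labels[k], labels[k + 1]) for k in range(len(labels) - 1)}


def closure_of_chain(labels):
    """All pairs (x, y) with x strictly before y along the chain."""
    n = len(labels)
    return {(labels[i], labels[j]) for i in range(n) for j in range(i + 1, n)}


def order_expression(r):
    """sigma_{1<2}(pi_1(r) x pi_2(r)), using the built-in < on labels."""
    firsts = {x for (x, _) in r}
    seconds = {y for (_, y) in r}
    return {(x, y) for x, y in product(firsts, seconds) if x < y}


in_order = [1, 2, 3, 4]   # a1 -> a2 -> a3 -> a4
relabeled = [1, 2, 4, 3]  # a1 -> a2 -> a4 -> a3 (hypothetical relabeling)
print(order_expression(chain_from(in_order)) == closure_of_chain(in_order))    # True
print(order_expression(chain_from(relabeled)) == closure_of_chain(relabeled))  # False
```

On the relabeled chain, the expression misses exactly the pair (a4, a3), because 4 < 3 is false even though a4 precedes a3 along the chain.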
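The second condition in the definition on the "BP-Completeness" slide can be checked on small instances by enumerating renamings. Both the names and the example relations in this sketch are hypothetical. The relations are stand-ins, not the relations A–F from the exercise slide. The function reports whether r2 is fixed by every renaming that fixes r1, i.e., whether a BP-complete language must be able to compute r2 from r1. It assumes integer values and adds one fresh value to the finite domain, so that any value of r2 outside r1 can be renamed without moving r1.

```python
from itertools import permutations


def rename(mu, rel):
    """Apply the renaming mu (a dict) to every value of every tuple."""
    return frozenset(tuple(mu[v] for v in t) for t in rel)


def bp_condition(r1, r2):
    """True if every renaming mu with mu(r1) == r1 also satisfies mu(r2) == r2.

    Assumes integer values. Besides the values in r1 and r2, one fresh value is
    added, so that a value of r2 outside r1 can be renamed without moving r1.
    """
    values = {v for t in r1 | r2 for v in t}
    values.add(max(values, default=0) + 1)  # the fresh value
    ordered = sorted(values)
    for image in permutations(ordered):
        mu = dict(zip(ordered, image))
        if rename(mu, r1) == r1 and rename(mu, r2) != r2:
            return False
    return True


r1 = frozenset({(5, 6), (6, 5)})                      # hypothetical instance
print(bp_condition(r1, frozenset({(5, 5), (6, 6)})))  # True
print(bp_condition(r1, frozenset({(5,), (6,)})))      # True
print(bp_condition(r1, frozenset({(5,)})))            # False: swap 5 and 6
print(bp_condition(r1, frozenset({(7,)})))            # False: 7 is not in r1
```

For a qualifying r2 the definition then promises some f in the language with r2 = f(r1). For example, {(5,), (6,)} is the projection of r1 onto its first column.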