# What is a Query Language?

Universality of Data Retrieval Languages, Aho and Ullman, POPL 1979

Raghu Ramakrishnan
CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan

## What is …?

What is a query language?
• A language that allows retrieval and manipulation of data from a database.

What is a database?
• A large collection of data.
• The data can be grouped into sets whose elements have similar structure.

What kind of structure can the data have?
What kind of manipulation should be allowed?

## Some Ideas

• Relations should be treated as sets of tuples.
• The query language must have a simple, nonoperational meaning that is independent of physical data representation.
• There must be efficient ways to process queries over (large) sets of similarly structured facts.

We will focus on the relational model.

## Principles for a Relational Query Language

(Proposed by Aho & Ullman.)

1) Relation = set of tuples. Ordering and other storage details should not be visible.
2) Data values should not be "interpreted".

Def: Let $\mu : D \to D$ be a bijection. A function $f$ is allowable if, for every such $\mu$,

$$\mu(f(r_1, \ldots, r_n)) = f(\mu(r_1), \ldots, \mu(r_n)).$$

Note: (2) says that no special meaning should be attached to data values (as far as the query language is concerned); thus, arithmetic is disallowed! ($5 + 6 = 11$, $8 < 9$, …)

## Principles – Refinement

Principle (2) is too restrictive. Relax it slightly:

Let $P$ be a special set of predicates (e.g., order comparisons such as $<$). $\mu$ preserves $P$ if, for every $p \in P$,

$$p(x_1, \ldots, x_n) \text{ is true} \iff p(\mu(x_1), \ldots, \mu(x_n)) \text{ is true}.$$

Relaxing Principle (2): we require that

$$\mu(f(r_1, \ldots, r_n)) = f(\mu(r_1), \ldots, \mu(r_n))$$

only for bijections $\mu$ that preserve $P$.

Note: If we add $+$, $\times$, etc. to $P$, soon only the identity function will preserve $P$!
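
The allowability condition can be tested by brute force on a small finite domain. The sketch below is only an illustration, not part of the original slides: the helper names (`rename`, `compose`, `select_less`, `is_allowable_on`) and the sample relation are assumptions, and passing the check on one input does not prove a function allowable in general. It contrasts a generic operation (self-composition) with one that interprets values through `<`.

```python
from itertools import permutations

def rename(mu, relation):
    """Apply the renaming mu (a dict) to every value of every tuple."""
    return {tuple(mu[v] for v in t) for t in relation}

def compose(r, s):
    """Relational composition: join r's 2nd column with s's 1st, project it out."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def self_compose(r):
    """A generic query: it only tests values for equality."""
    return compose(r, r)

def select_less(r):
    """A query that interprets values: keep tuples whose 1st value is below the 2nd."""
    return {(a, b) for (a, b) in r if a < b}

def is_allowable_on(f, relation, domain):
    """Check mu(f(r)) == f(mu(r)) for every bijection mu of a finite domain."""
    values = sorted(domain)
    for image in permutations(values):
        mu = dict(zip(values, image))
        if rename(mu, f(relation)) != f(rename(mu, relation)):
            return False
    return True

# Hypothetical sample data, chosen only for illustration.
r = {(1, 2), (2, 3)}
domain = {1, 2, 3}

print(is_allowable_on(self_compose, r, domain))  # True
print(is_allowable_on(select_less, r, domain))   # False
```

Swapping 1 and 2 is enough to break `select_less`: renaming its output gives two tuples, while applying it to the renamed input keeps only one, which is why Principle (2) rules out comparisons unless they are placed in $P$.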
## Allowable Functions – Transitive Closure

Aho & Ullman's notion of allowable function is rather restrictive. However:
1. All relational algebra queries are allowable.
2. Transitive closure is allowable.

And they prove that:
• There is no relational algebra query that computes the transitive closure of a relation.

Intuition: any R.A. expression has a fixed size, say $n$. Choose a relation $R$ that is a chain $a_1 \to a_2 \to \cdots \to a_k$ with $k > n$. The relational algebra expression cannot deal with $(a_1, a_k)$.

## Proposal

We should extend RA to support a least fixpoint operator.
• Leads to recursive queries.
• Some systems (e.g., Oracle) support limited forms of recursion like transitive closure. Others (DB2) support linear recursion, following SQL:1999.

## Least Fixpoints

The LFP operator is defined as follows: $\mathrm{LFP}(R = f(R)) = r$, where:
1. $r = f(r)$
2. if $r' = f(r')$ then $r \subseteq r'$

Theorem (Tarski): There is a least fixpoint satisfying $\mathrm{LFP}(R = f(R))$ if $f$ is monotone.

Monotone: $r_1 \subseteq r_2 \Rightarrow f(r_1) \subseteq f(r_2)$

Note: If $f$ is a relational algebra expression without $-$ (set difference), then it is monotone.

## Least Fixpoint – Cont.

Theorem (Kleene): If $f$ is continuous and over a complete lattice,

$$\mathrm{LFP}(R = f(R)) = \lim_{n \to \infty} f^n(\emptyset)$$

Example: Transitive closure of $r$, writing $\circ$ for relational composition (join on the shared column, then project it out).

$$R = R \circ r \cup r; \quad f(R) \text{ is } R \circ r \cup r$$

$$f(\emptyset) = r; \quad f(f(\emptyset)) = f(r) = r \circ r \cup r$$

$$f^n(\emptyset) = \bigcup_{i=1}^{n} \underbrace{r \circ r \circ \cdots \circ r}_{i \text{ copies}}$$

## LFP - Cont.

Claim: The LFP operator satisfies Principles 1 & 2.

Theorem (Aho-Ullman): There is no relational algebra expression $E(R)$ that computes the transitive closure of an arbitrary input relation $R$.
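
The Kleene iteration for transitive closure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the slides' own code: the names `compose`, `lfp`, `unrolled`, and `transitive_closure` are hypothetical, relations are Python sets of pairs, and termination relies on the input being finite and the step function being monotone.

```python
def compose(r, s):
    """Relational composition: join r's 2nd column with s's 1st, project it out."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def lfp(f):
    """Iterate f from the empty relation until the result stops changing."""
    current = set()
    while True:
        following = f(current)
        if following == current:
            return current
        current = following

def unrolled(f, n):
    """Apply f exactly n times to the empty relation, i.e. compute f^n(empty)."""
    current = set()
    for _ in range(n):
        current = f(current)
    return current

def transitive_closure(r):
    """Least fixpoint of f(R) = (R composed with r) union r."""
    return lfp(lambda R: compose(R, r) | r)

# Hypothetical chain a1 -> a2 -> a3 -> a4.
chain = {("a1", "a2"), ("a2", "a3"), ("a3", "a4")}
step = lambda R: compose(R, chain) | chain

closure = transitive_closure(chain)
print(len(closure))                          # 6
print(("a1", "a4") in unrolled(step, 2))     # False
print(unrolled(step, 3) == closure)          # True
```

Unrolling this particular iteration a fixed number of times stops short on a longer chain, which echoes the intuition behind the theorem; the actual proof for arbitrary relational algebra expressions is the one given in the slides.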
## Proof

Consider a set of $l$ arbitrary symbols:

$$\Sigma_l = \{a_1, a_2, \ldots, a_l\}$$

We consider a family of relations

$$R_l = \{(a_1, a_2), (a_2, a_3), \ldots, (a_{l-1}, a_l)\}$$

that is, the chain $a_1 \to a_2 \to a_3 \to \cdots \to a_l$.

We show that NO relational algebra expression computes exactly the tuples in $R_l^+$ for all $l$.

We will prove that every R.A. expression $E(R_l)$ can be expressed as

$$\{b_1 b_2 \cdots b_k \mid \Psi(b_1, b_2, \ldots, b_k)\}$$

where $\Psi$ is of the form $\text{clause}_1 \lor \text{clause}_2 \lor \cdots$

Each clause is of the form $\text{atom}_1 \land \text{atom}_2 \land \cdots$

Each atom is of the form $b_i = a_c$, $b_i \neq a_c$, $b_i = b_j + c$, or $b_i \neq b_j + c$.

The $b$'s are variables taking values from $\Sigma_l$, and the $c$'s are constants ($0 \le c \le l$).

Note: Here $(b_j + c) \equiv a_m$ s.t. $b_j = a_{m-c}$; that is, $b_j + c$ is the symbol $c$ positions after $b_j$ on the chain $a_1, a_2, \ldots, a_l$.

Lemma: If $E$ is any R.A. expression, then

$$E(R_l) = \{b_1 b_2 \cdots b_k \mid \Psi(b_1, b_2, \ldots, b_k)\}$$

Suppose the lemma is true; we can then prove the theorem as follows.

Suppose $E(R) = R^+$, for some $E$, for all $R$. Then

$$R_l^+ = \{b_1 b_2 \mid \Psi(b_1, b_2)\}$$

Case 1: Every clause in $\Psi$ has an atom of the form $b_1 = a_i$, $b_2 = a_i$, or $b_1 = b_2 + c$.

Consider $(b_1, b_2) = (a_m, a_{m+d})$ where
• $m > i$ for every $i$ s.t. $b_1 = a_i$ or $b_2 = a_i$ is an atom;
• $d > c$ for every $c$ s.t. $b_1 = b_2 + c$ is an atom.

$(a_m, a_{m+d})$ is not computed, but is in $R_l^+$.

[Diagram: positions $a_i$, $a_m$, $a_{m+c}$, $a_{m+d}$ along the chain.]

Case 2: Some clause in $\Psi$ has ONLY atoms with $\neq$.

Consider $(b_1, b_2) = (a_{m+d}, a_m)$ where no atom $b_i \neq a_m$ or $b_i \neq a_{m+d}$ appears in $\Psi$, and $d > c$ for all $c$ s.t. $b_1 \neq b_2 + c$ or $b_2 \neq b_1 + c$ appears in $\Psi$.

$(a_{m+d}, a_m)$ is computed, but is not in $R_l^+$.

[Diagram: $b_2 = a_m$ precedes $b_1 = a_{m+d}$ on the chain.]

## Proof of lemma

Basis: 0 operators. $E(R)$ is $R$ or a constant relation.
$$R = \{b_1 b_2 \mid b_2 = b_1 + 1\}; \quad \{c_1, c_2, \ldots, c_m\} = \{b_1 \mid b_1 = c_1 \lor b_1 = c_2 \lor \cdots\}$$

Induction: $E = E_1 \cup E_2$, $E_1 - E_2$, or $E_1 \times E_2$, with

$$E_1 = \{b_1 \cdots b_k \mid \Psi_1(b_1 \cdots b_k)\}, \quad E_2 = \{b_1 \cdots b_k \mid \Psi_2(b_1 \cdots b_k)\}$$

$$E_1 \times E_2 = \{b_1 \cdots b_k \, b'_1 \cdots b'_k \mid \Psi_1(b_1 \cdots b_k) \land \Psi_2(b'_1 \cdots b'_k)\}$$

$E = \sigma_F(E_1)$, where $F$ has only $=$ and $\neq$:

$$E = \{b_1 \cdots b_k \mid \Psi_1(b_1 \cdots b_k) \land F(b_1 \cdots b_k)\}$$

$E = \pi_S(E_1)$: proceed similarly.

## Transitive closure - more

Take the chain $a_1 \to a_2 \to a_3 \to \cdots \to a_l$:

$$R_l = \{(a_1, a_2), (a_2, a_3), \ldots, (a_{l-1}, a_l)\}$$

$$\sigma_{\$1 < \$2}(\pi_1(R_l) \times \pi_2(R_l))$$

Does this relational algebra expression compute $R_l^+$?

YES! But it is NOT a relational algebra expression!

[Diagram: a chain over the nodes $a_1$, $a_2$, $a_4$, $a_3$.]

What does "$a_i < a_j$" mean now?!

## BP-Completeness

A query language is BP-complete if:
• All functions that can be expressed in the language are allowable.
• Let $r_1$ and $r_2$ be two relations (instances), such that for all renamings $\mu$,

$$r_1 = \mu(r_1) \Rightarrow r_2 = \mu(r_2).$$

Then there is a function $f$ in the language such that $r_2 = f(r_1)$.

## Example of BP-Complete

The slide shows six binary relations, A through F, side by side. Their values, in the order they appear:

A B C D E F
5 6 5 6 5 6 5 6 5 6 5 6 6 5 6 5 7 8 6 5 6 5 6 5 7 8 10 11 7 8 7 8 8 7 5 5 6 6

1. If A is used as $r_1$ in the previous slide, which of the others qualifies as $r_2$?
2. For each such relation, find a relational algebra function $f$ such that $r_2 = f(r_1)$.
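
Question 1 can be approached mechanically: enumerate the renamings of the values and test whether every renaming that fixes $r_1$ also fixes $r_2$. The sketch below is an assumption-laden illustration, not the slides' solution. The function names are hypothetical, the sample relations are made up rather than taken from the table above, and renamings are restricted to permutations of the values that actually occur in the two relations.

```python
from itertools import permutations

def rename(mu, relation):
    """Apply the renaming mu (a dict) to every value of every tuple."""
    return {tuple(mu[v] for v in t) for t in relation}

def active_domain(*relations):
    """All values occurring in the given relations, in sorted order."""
    return sorted({v for rel in relations for t in rel for v in t})

def satisfies_bp_condition(r1, r2):
    """True if every renaming mu with mu(r1) == r1 also satisfies mu(r2) == r2."""
    domain = active_domain(r1, r2)
    for image in permutations(domain):
        mu = dict(zip(domain, image))
        if rename(mu, r1) == r1 and rename(mu, r2) != r2:
            return False
    return True

# Hypothetical relations, not the ones from the slide's table.
r1 = {(5, 6), (6, 5)}
symmetric = {(5, 5), (6, 6)}
one_way = {(5, 6)}

print(satisfies_bp_condition(r1, symmetric))  # True
print(satisfies_bp_condition(r1, one_way))    # False
```

For the hypothetical `symmetric` relation, one candidate $f$ projects the first column of $r_1$ twice. For `one_way`, swapping 5 and 6 fixes $r_1$ but not the candidate, so no allowable $f$ can produce it from $r_1$.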

# Lecture 1