Relational Algebra Relational Query Languages Query = “retrieval program” Language examples: Theoretical: 1. Relational Algebra 2. Relational Calculus a. tuple relational calculus (TRC) b. domain relational calculus (DRC) Practical 1. SQL (SEQUEL from System R) 2. QUEL (Ingres) 3. Datalog (Prolog-like) Theoretical QL’s: give semantics to practical QL’s key to understand query optimization in relational DBMSs Relational Algebra Basic operators select ( ) project (p ) union ( ) set difference ( - ) cartesian product ( x ) rename ( ) The operators take one or two relations as inputs and give a new relation as a result. relation relation relational operator relation R1 sid Example Instances bid 101 102 103 104 bname Interlake Interlake Clipper Marine color blue red green red Boats Schema: Boats(bid, bname, color) Sailors(sid, sname, rating, age) Reserves( sid, bid, day) 22 58 sid bid day 101 103 10/10/96 11/12/96 sname rating age dustin 7 45.0 31 58 lubber rusty 8 10 55.5 35.0 sid S2 28 31 44 58 sname yuppy lubber guppy rusty rating 9 8 5 10 age 35.0 55.5 35.0 35.0 S1 22 Projection Examples: p age ( S 2 ) ; p sname , rating ( S 2) Retains only attributes that are in the “projection list”. Schema of result: exactly the columns in the projection list, with the same names that they had in the input relation. Projection operator has to eliminate duplicates (How do they arise? Why remove them?) Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?) Projection sid 28 31 44 58 sname yuppy lubber guppy rusty S2 rating 9 8 5 10 age 35.0 55.5 35.0 35.0 sname rating yuppy 9 lubber 8 guppy rusty 5 10 p sname , rating age 35.0 55.5 p age ( S 2 ) ( S 2) Selection () Selects rows that satisfy selection condition. Result is a relation. Schema of result is same as that of the input relation. Do we need to do duplicate elimination? sid 28 31 44 58 sname yuppy lubber guppy rusty rating 9 8 5 10 rating 8 (S 2) age 35.0 55.5 35.0 35.0 p sname rating yuppy 9 rusty 10 snam e, rating ( rating 8 ( S 2 )) Selection Notation: p(r) p is called the selection predicate , r can be the name of a table, or another query Predicate: 1. Simple attr1 = attr2 Attr = constant value (also, <, > , etc) 2. Complex predicate1 AND predicate2 predicate1 OR predicate2 NOT (predicate) Union and Set-Difference All of these operations take two input relations, which must be union-compatible: Same number of columns (attributes). `Corresponding’ columns have the same type. For which, if any, is duplicate elimination required? Union sid sname rating age 22 dustin 7 45.0 31 58 lubber rusty 8 10 55.5 35.0 rating 9 8 5 10 age 35.0 55.5 35.0 35.0 S1 sid 28 31 44 58 sname yuppy lubber guppy rusty S2 sid sname rating age 22 31 58 44 28 dustin lubber rusty guppy yuppy 7 8 10 5 9 45.0 55.5 35.0 35.0 35.0 S1 S 2 Set Difference sid sname rating age sid sname rating age 22 dustin 7 45.0 22 dustin 7 45.0 31 58 lubber rusty 8 10 55.5 35.0 rating 9 8 5 10 age 35.0 55.5 35.0 35.0 S1 S 2 S1 sid 28 31 44 58 sname yuppy lubber guppy rusty S2 sid 28 44 sname yuppy guppy rating 9 5 S2 – S1 age 35.0 35.0 Cartesian-Product S1 R1: Each row of S1 paired with each row of R1. Like the c.p for mathematical relations: every tuple of S1 “appended” to every tuple of R1 Q: How many rows in the result? Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. May have a naming conflict: Both S1 and R1 have a field with the same name. In this case, can use the renaming operator… Cartesian Product Example sid sname rating age 22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0 sid bid day 22 58 101 103 10/10/96 11/12/96 R1 S1 (sid) sname rating age R1 X S1 = (sid) bid day 22 dustin 7 45.0 22 101 10/10/96 22 dustin 7 45.0 58 103 11/12/96 31 lubber 8 55.5 22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96 58 rusty 10 35.0 22 101 10/10/96 58 rusty 10 35.0 58 103 11/12/96 Rename ( ) Allows us to refer to a relation by more than one name and to rename conflicting names Example: x (E) returns the expression E under the name X If a relational-algebra expression E has arity n, then x (A1, A2, …, An) (E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …., An. Ex. temp1(sid1,sname,rating, age, sid2, bid, day) (R1 x S1) Compound Operator: Intersection In addition to the 6 basic operators, there are several additional “Compound Operators” These add no computational power to the language, but are useful shorthands. Can be expressed solely with the basic ops. Intersection takes two input relations, which must be union-compatible. Q: How to express it using basic operators? R S = R (R S) Intersection sid sname rating age 22 dustin 7 45.0 31 58 lubber rusty 8 10 55.5 35.0 S1 sid 28 31 44 58 sname yuppy lubber guppy rusty S2 rating 9 8 5 10 age 35.0 55.5 35.0 35.0 sid sname rating age 31 58 lubber rusty 8 10 55.5 35.0 S1 S 2 Compound Operator: Join Joins are compound operators involving cross product, selection, and (sometimes) projection. Most common type of join is a “natural join” (often just called “join”). R S conceptually is: Compute R S Select rows where attributes that appear in both relations have equal values Project all unique atttributes and one copy of each of the common ones. Note: Usually done much more efficiently than this. Useful for putting “normalized” relations back together. Natural Join Example sid bid day sid sname rating age 22 58 101 103 10/10/96 11/12/96 22 dustin 7 45.0 31 58 lubber rusty 8 10 55.5 35.0 R1 R1 S1 S1 = sid sname rating age bid day 22 58 dustin rusty 7 10 45.0 35.0 101 103 10/10/96 11/12/96 Other Types of Joins Condition Join (or “theta-join”): R c S c ( R S ) (sid) sname rating age (sid) bid day 22 31 dustin lubber 7 8 45.0 55.5 58 58 103 103 11/12/96 11/12/96 S 1 S 1 .sid R1 .sid R1 Result schema same as that of cross-product. May have fewer tuples than cross-product. Equi-join: special case: condition c contains only conjunction of equalities. Compound Operator: Division Useful for expressing “for all” queries like: Find sids of sailors who have reserved all boats. For A/B attributes of B are subset of attrs of A. May need to “project” to make this happen. E.g., let A have 2 fields, x and y; B have only field y: A B x y B ( x, y A) A/B contains all tuples (x) such that for every y tuple in B, there is an xy tuple in A. Examples of Division A/B sno pno pno s1 s1 s1 s1 s2 s2 s3 p1 p2 p3 p4 p1 p2 p2 p2 s4 p2 s4 p4 A B1 pno p2 p4 B2 sno s1 s2 s3 s4 A/B1 pno p1 p2 p4 B3 sno s1 s4 sno A/B2 A/B3 s1 Expressing A/B Using Basic Operators Division is not essential op; just a useful shorthand. (Also true of joins, but joins are so common that systems implement joins specially.) Idea: For A/B, compute all x values that are not `disqualified’ by some y value in B. x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A. Disqualified x values = p x (( p x ( A) B ) A) A/B = p x ( A ) Disqualified x values Banking Example branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-only) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number) Example Queries loan (loan-number, branch-name, amount) Find all loans of over $1200 amount >1200 (loan) Find the loan number for each loan of an amount greater than $1200 ploan-number (amount > 1200 (loan)) Example Queries depositor (customer-name, account-number) borrower (customer-name, loan-number) Find the names of all customers who have a loan, an depositor account, or both, from the bank pcustomer-name (borrower) pcustomer-name (depositor) Find the names of all customers who have a loan and a depositor account at bank. pcustomer-name (borrower) pcustomer-name (depositor) Example Queries loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number) Find the names of all customers who have a loan at the Perryridge branch but do not have an depositor account at any branch of the bank. pcustomer-name (branch-name = “Perryridge” (borrower – pcustomer-name (depositor) loan)) Example Queries account (account-number, branch-name, balance) Find the largest account balance Rename account relation as d The query is: p balance(account) - paccount.balance( account.balance < d.balance (account x d (account))) Example Queries account (account-number, branch-name, balance) depositor (customer-name, account-number) Find all customers who have an account from at least the “Downtown” and the Uptown” branches. Query 1 pCN(BN=“Downtown”(depositor account)) pCN(BN=“Uptown”(depositor account)) where CN denotes customer-name and BN denotes branch-name. Query 2 customer-name, branch-name (depositor account) temp(branch-name) ({(“Downtown”), (“Uptown”)}) Example Queries Find all customers who have an account at all branches located in Boston. pcustomer-name, branch-name (depositor account) pbranch-name (branch-city = “Boston” (branch)) Extended Relational Operations Additional Operators that extend the power of the language Based on SQL… make the language less clean Generalized projections Outer Joins Update General Projection Notation: p e1, e2, …, en (Relation) ei: can include any arithmetic operation – not only attributes Example: credit = cname limit balance Jones Turner 5000 2000 3000 2500 Smith 4000 3000 Then: p cname, limit – balance = cname limit -balance Jones Turner 3000 500 Smith 1000 Outer Joins Motivation: loan lno amt Jones L-170 Turner L-230 Downtown L-170 Redwood L-230 3000 4000 Smith Perry 1700 bname loan cname borrower = lno L-260 bname lno Downtown L-170 Redwood L-230 Join result loses: •any record of Perry •any record of Smith L-155 borrower amt cname 3000 4000 Jones Turner Outer Join ( Left outer Join ( ) ) preserves all tuples in left relation loan borrower = Right outer Join ( bname lno amt cname Downtown L-170 Redwood L-230 3000 4000 Jones Turner Perry 1700 NULL L-260 bname ) lno Downtown L-170 preserves all tuples in right relation Redwood L-230 NULL L-155 amt cname 3000 4000 Jones Turner NULL Smith Outer Join (cont) Full Outer Join ( ) preserves all tuples in both relations bname lno amt cname Downtown L-170 Redwood L-230 3000 4000 Jones Turner Perry L-260 1700 NULL NULL L-155 NULL Smith Update () 1) Deletion: r r – s account account – bname = Perry (account) 2) Insertion: r r s branch branch {( BU, Boston, 9M)} 3) Update: r account pe1, e2, …, en (r) p bname, acct_no, bal * 1.05 (account)

Descargar
# No Slide Title