Comp 231 Database Management Systems 3. Relational Data Model Department of Computer Science, HKUST 1 Basic Concepts • Entities and relationships are stored in tables • Relationships are captured by including key of one table into another • Languages for manipulating the tables • All popular DBMSs today are based on relational data model (or an extension of it, e.g., object-relational data model) Department of Computer Science, HKUST 2 Why is it so good? • Simplicity, everybody knows how to manipulate tables • Tables are simple enough so that solutions to complicated problems such as concurrency control and query optimization can be obtained • It has a theoretical basis for the studying of database design problems • Tables are logical concepts; physically tables can be stored in different ways support data independence Department of Computer Science, HKUST 3 Terminology • • • • • Relation table; denoted by R(A1, A2, ..., An) Attribute column; denoted by Ai Tuple row Attribute value value stored in a table cell Domain legal type and range of values of an attribute denoted by dom(Ai) – Attribute: Age – Attribute: EmpName – Attribute: Salary Domain: [0-100] Domain: 50 alphabetic chars Domain: non-negative integer • Ideally, a domain can be defined in terms of another domain; e.g., the domain of EmpName is PersonName. However, this is NOT allowed in most basic DBMSs. Department of Computer Science, HKUST 4 An Example Relation Relation/Table Name Attributes/Columns STUDENT Name Chan Kin Ho Lam Wai Kin Man Ko Yee Lee Chin Cheung Alvin Lam Student-id 99223367 96882145 96452165 96154292 96520934 Age 23 17 22 16 15 Department of Computer Science, HKUST 5 CGA 11.19 10.89 8.75 10.98 9.65 Some Formal Definitions • A relation is denoted by: R(A1, A2, ..., An) – STUDENT(Name, Student-id, Age, CGA) • Degree of a relation: the number of attributes n in the relation. • Tuple t of R(A1, A2, ..., An): An ordered set of values <v1,v2,...,vn> where each vi is an element of dom(Ai). • Relation instance r(R): A set of tuples in R • r(R) = {t1, t2, ..., tm}, or alternatively r(R) dom(A1) dom(A2) ... dom(An) Department of Computer Science, HKUST 6 Relation and Cartesian Product • A relation is the Cartesian product of sets of values Name = { Lee, Cheung } Grade = { A, B, C } • A relation StudentGrade (Name, Grade) can be defined as any subset of the following maximal extent: Name Grade = • { Lee, A, Lee, B , Lee, C, Cheung, A, Cheung, B, Cheung C } r(StudentGrade) Name Grade = { Lee, A Cheung C } Department of Computer Science, HKUST 7 After all, why don’t we call them tables? • Observe which of the following would you consider a table: STUDENT Name Chan Kin Ho Lam Wai Kin Man Ko Yee Lee Chin Cheung Alvin Lam Student-id 99223367 96882145 96452165 96154292 96520934 Name Chan Kin Ho Age 23 17 22 16 15 CGA 11.19 10.89 8.75 10.98 9.65 Name Chan Kin Ho Student-id 99223367 Name Age CGA 23 11.19 ??? • A empty table/relation is still a table/relation (can be denoted by R = {}) • A null table/relation is still a null table/relation (can be denoted by R = ) Department of Computer Science, HKUST 8 Characteristics of Relations • Tuples in a relation are not considered to be ordered, even though they appear to be in a tabular form. • Ordering of attributes in a relation schema R are significant because the mathematical definition of a relation doesn’t include attribute names. • Values in a tuple: All values are considered atomic. A special null value is used to represent values that are unknown or inapplicable to certain tuples. Department of Computer Science, HKUST 9 Keys • Let K R (I.e., K is a set of attributes which is a subset of R) • K is a superkey of R if K can identify a unique tuple of each possible relation r(R). Customer(CusNo, Name, Address, …) where customers have unique customer numbers and unique names. superkeys: CusNo Name {CusNo, Name} {CusNo, Name, Address} plus many others • K is a candidate key if K is minimal There are two candidate keys: CusNo and Name • Every relation is guarantee to (must) have at least one key. Why? Department of Computer Science, HKUST 10 Relational Algebra Department of Computer Science, HKUST 11 Relational Algebra • It is a procedural language, I.e., sequence of operations is explicitly specified in the language • Six basic operations: select, project, union, set difference, Cartesian product, rename E.g., R = S T U • Each operation takes one or more relations as input and give a relation as output (a property known as closure allowing us to nest relational algebra expressions) • Contrast relational algebra expressions with arithmetic expressions; the former operates on relations and the latter on numbers Department of Computer Science, HKUST 12 Select Operation • Notation: p (r) • Defined as: p (r) = { t | t r P(t) } • Read as: t is a tuple of r and t satisfies the predicate P. • P is a formula of the form: attribute name op attribute name or attribute name op constant where op is one of • Predicates can be connected by (and) (or) (not) Department of Computer Science, HKUST 13 Select Operation - Example • Relation r: • A B D 5 ( r) : A B C D 1 7 5 7 12 3 23 10 A B C D 1 7 23 10 Department of Computer Science, HKUST 14 Project Operation • Notation: A1, A2, ..., Ak (r) where A1, A2, ..., An are attributes defined in r • After the projection, duplicates are assumed to be eliminated automatically since a relation is a set! Not true in real DMBSs: duplicates are not eliminated until explicitly told to do so • Order of attributes in the projection list can be arbitrary. (One use of projection is to rearrange attributes) Department of Computer Science, HKUST 15 Project Operation - Example • Relation r: • A,C (r): A B C 10 1 20 1 30 1 40 2 A C A C 1 1 1 1 1 2 2 Department of Computer Science, HKUST 16 Union Operation • Notation: r s • Defined as: r s = { t | t r t s } • For r s to be valid (r and s must be compatible), I.e., – r, s must have the same number of attributes – domains must be compatible pair-wise between r and s. • R(EmpName, Age) S(StudName, Age) T(ProdID, Name) U(StudName, Age, CGA) • Compatible relations: R and S, R and StudName, Age U, etc. • Incompatible relations: R and T, R and U, etc. Department of Computer Science, HKUST 17 Union Operation - Example • Relation r, s: r • rs: A B 1 2 s A B 2 1 A B 1 2 1 3 Department of Computer Science, HKUST 18 3 Set Difference Operation • Notation: r s • Defined as: r s = { t | t r t s } • For r s to be valid: – r, s must have the same number of attributes – domains must be compatible pair-wise between r and s. Department of Computer Science, HKUST 19 Set Difference Operation - Example • Relation r, s: r • rs: A B 1 2 s A B 2 1 A B 1 1 Department of Computer Science, HKUST 20 3 Cartesian-Product Operation • Notation: r s • Defined as: r s = { t q | t r q s } • Assume that attributes of r and s are disjoint (I.e., r and s don’t have attributes with the same name) • If they are not disjoint, then make them disjoint by renaming the attributes concerned Department of Computer Science, HKUST 21 Cartesian-Product Operation - Example • Relation r, s: r A • rs: B 1 2 s A B C D E 1 1 1 1 2 2 2 2 γ 10 10 20 10 10 10 20 10 γ C D 10 + 10 + 20 – 10 – Department of Computer Science, HKUST 22 E Join Operation • Cartesian product is rarely used by its own; it is the basis of the join operation, which is more popular • r s = A C (r s) • A and C are sets of attributes which have the same attribute names in r and s. Department of Computer Science, HKUST 23 Join Operation - Example B C, D • R = ( A, B, D) • S = ( E, B B, D D) • R joins S has the result schema: T(A, B, C, D, E) and is defined by: r s = r.A,r.B,r.C,r.D,s.E (r.B=s.B r.D=s.D (r s)) • This is also called natural join Department of Computer Science, HKUST 24 Natural Join Operation - Example r • Relation r, s: • r s: A B 1 1 1 1 2 A γ C γ γ B 1 2 4 1 2 D a a a a b C γ γ D a a b a b s E γ γ Department of Computer Science, HKUST 25 B 1 3 1 2 3 D a a a b b E γ Assignment Operation • The assignment operation() provides a convenient way to express complex queries by storing intermediate results into temporary relations • Assignment must always be made to a temporary relation variable. • Example: Write r s as temp1 R-S (r ) temp2 R-S((temp1 s) - R-S,S( r)) result = temp1 - temp2 – The result to the right of the is assigned to the relation variable on the left of the . – May use variable in subsequent expressions. Department of Computer Science, HKUST 26 Example Queries • Find all customers who have an account from both the “Downtown” and “Uptown” branches. depositor (customer-name, account-no) account (account-no, branch-name, branch-city) • Query 1 CN(BN=“Downtown”(depositor account)) CN(BN=“Uptown”(depositor account)) where CN denotes customer-name and BN denotes branchname. • Query 2 (for the purpose of illustrating division only) temp {“Downtown”, “Uptown”} [ customer-name, branch-name (depositor account) ] temp Department of Computer Science, HKUST 27 Example Queries • Find all customers who have an account at all branches located in Brooklyn. depositor (customer-name, account-no) account (account-no, branch-name, branch-city) given a Returns all branches customer x of a customer and the set of branches a, b, customer-name, branch-name (depositor account) c, check x,a branch-name (branch-city = “Brooklyn” (branch)) x,b and x,c are in … • Dividend contains all customer-branch information all branches in • Divisor contains all branches located in Brooklyn Brooklyn Department of Computer Science, HKUST 28 Tuple Relational Calculus Department of Computer Science, HKUST 29 Tuple Relational Calculus • A nonprocedural query language, where each query is of the form { t | P(t) } • It is the set of all tuples t such that predicate P is true for t • t is a tuple variable; t[A] denotes the value of tuple t on attribute A • t r denotes that tuple t is in relation r • P is a formula similar to that of the predicate calculus Department of Computer Science, HKUST 30 Predicate Calculus Formula 1. Set of attributes and constants 2. Set of comparison operators: (e.g., <, , =, , >, ) 3. Set of connectives: and (), or (), not () 4. Implication (): x y, if x is true, then y is true and if x is NOT true, then y could be either true OR false xyxy 5. Set of quantifiers: t r (Q(t)) “there exists” a tuple t in relation r such that predicate Q(t) is true t r (Q(t)) Q is true “for all” tuples t in relation r Department of Computer Science, HKUST 31 Banking example • branch(branch-name, branch-city, assets) • customer (customer-name, customer-street, customer-city) • account (branch-name, account-number, balance) • loan (branch-name, loan-number, amount) • depositor (customer-name, account-number) • borrower (customer-name, loan-number) Department of Computer Science, HKUST 32 A Simple Selection Query • Find the branch-name, loan-number, and amount for loans of over $1200 {t | t loan t[amount] 1200} t[amount] • all relevant attributes are contained in: loan (branch-name, loan-number, t amount) • all attributes in tuple t is returned; {t[name] | … …} to get only the name attribute • Intuitively, you can consider t as a variable (or pointer) which iteratively loops through each tuple in loan • t is called a tuple variable loan table Department of Computer Science, HKUST 33 amount A Join Query • Find the loan number for each loan of an amount greater than $1200 and return the customer info {t | t customer ( s loan) ( t[loan-number] = s[loannumber] s[amount] 1200 ) } customer t loan-no loan t[loan-no] s loan-no amt s[loan-no] s[amount] • It is a join operation • A customer may have many loans, but as long as one of his loans exceeds 1200, his information will be output • again, all attributes in tuple t is returned Department of Computer Science, HKUST 34 Peculiarity from the Textbook • • • • • • borrower (customer-name, loan-number) loan (branch-name, loan-number, amount) {t | s loan (t[loan-number] = s[loan-number] s[amount] 1200)} Same query as the previous one, except that t is not defined as a tuple in borrower, so t is an arbitrary tuple variable The whole expression only mentions “loan-number” as an attribute of t, so we can only assume that t has only one attribute, namely, “loan-number” This query only returns loan numbers Let’s make it simpler (same as the first example!) {s[loan-number] | s loan s[amount] 1200} It is better to define the scope of a tuple variable first! You need the t borrower part if you want to return the customer-street or city anyway! Department of Computer Science, HKUST 35 Example Queries • Find the names of all customers having a loan, an account, or both at the bank t’s name can be found in the borrower table { t | ( s borrower) ( t[customer-name] =s[customer-name] ) ( u depositor) (t[customer-name] = u[customer-name]) } or t’s name can be found in the depositor table • t has NOT been explicitly defined as a tuple in the customer relation • only attribute customer-name of tuple t is returned • consider adding t customer Question: can I use s depositor instead of u depositor ? Department of Computer Science, HKUST 36 Example Queries • Find the names of all customers who have a loan and an account at the bank. {t | s borrower (t[customer-name] =s[customer-name]) u depositor (t[customer-name] = u[customer-name])} • Alternative: {t [customer-name] | t customer ( s borrower (t[customer-name] = s[customer-name]) u depositor (t[customer-name] = u[customer-name]) )} Department of Computer Science, HKUST 37 Example Queries • Find the names of all customers having a loan at the Perryridge branch { t[name] | t customer s borrower ( t[customer-name] = s[customer-name] u loan (u[branch-name] = “Perryridge” t u[loan-number] = s[loan-number] ) )} customer borrower s loan Perryridge u { t | s borrower (t[customer-name] = s[customer-name] u loan (u[branch-name] = “Perryridge” u[loan-name] = s[loan-number]))} [textbook] Department of Computer Science, HKUST 38 Example Queries • • • • The inner u loan () cannot be taken out of the s borrower ( ) quantifier. customer t borrower s s u {t | t customer s borrower ( t[customer-name] = s[customer-name] ) u loan (u[branch-name] = “Perryridge” u[loan-number] = s[loan-number] )} loan Perryridge where is “s” defined? {t | t customer s borrower ( t[customer-name] = s[customer-name] ) u loan s borrower (u[branch-name] = “Perryridge” u[loan-number] = s[loan-number] )} The two “s” in the two unnested existential quantifiers are not the same. Department of Computer Science, HKUST 39 NOT-EXIST Example • Find the names of all customers who have a loan at the Perryridge branch, but no account at any branch of the bank t is a borrower at Perryridge branch {t | s borrower ( t[customer-name] = s[customer-name] u loan ( u[branch-name] = “Perryridge” u[loan-number] = s[loan-number] ) v depositor (v[customer-name] = t[customer-name] ) ) } customer but t is not a depositor anywhere in the bank t loan depositor borrower s Perryridge u v Department of Computer Science, HKUST 40 EXIST vs. NOT-EXIST v depositor (v[customer-name] = t[customer-name] ) Consider a customer t, there does not exist any depositor whose name is the same as t’s name. v depositor (v[customer-name] t[customer-name] ) Consider a customer t, there exist (any) one depositor whose name is not the same as t’s name. t customer a b c depositor v c e • First query returns a and b • Second query returns a, b, and c (c is qualified because you can find e in depositor which is c) • First query is the same as: vdepositor v[cust-nm] t[cust-nm] Department of Computer Science, HKUST 41 Example Queries • Find the names of all customers having a loan from the Perryridge branch, and also return the cities they live in the borrower and the loan is from Perryridge has a loan {t | s loan (s[branch-name] =“Perryridge” t is a u borrower (u[loan-number] = s[loan-number] borrower t[customer-name] = u[customer-name] v customer (u[customer-name] =v[customer-name] t[customer-city] = v[customer-city])))} this part is for finding t’s city; not needed if you use t customer Department of Computer Science, HKUST 42 Example Queries • Find the names of all customers having a loan from the Perryridge branch and live in the the city where Perryridge branch is located. some loan issued from Perryridge { t | t customer s loan ( s[branch-name] =“Perryridge” u borrower ( u[loan-number] = s[loan-number] t[customer-name] = u[customer-name] v branch ( v[branch-name] = s[branch-name] t has borrowed the loan t[customer-city] = v[branch-city] ) ) ) } t lives in the same city as the Perryridge branch is located branch(branch-name, branch-city, assets) s[branch-name] is already bound to “Perryridge”; want to find the city where Perryridge is in Department of Computer Science, HKUST 43 Query with Universal Quantifier • Find the names of all customers who have an account at all branches located in Brooklyn: {t | s branch ( s[branch-city] =“Brooklyn” if s is a branch in Brooklyn then there is an account u in from that branch u account ( s[branch-name] = u[branch-name] s depositor ( t[customer-name] = s[customer-name] s[account-number] = u[account-number]) ) )} such that t is the depositor of account u Department of Computer Science, HKUST 44 Other Relational Algebra Operators Department of Computer Science, HKUST 45 Outer Join • An extension of the join operation that avoids loss of information. • Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join. • Uses null values: – null signifies that the value is unknown or does not exist. – All comparisons involving null are false by definition. Department of Computer Science, HKUST 46 Outer Join - Example • Relation loan branch-name loan-number amount Downtown L-170 3000 Perryridge L-260 1700 Redwood L-230 4000 • Relation borrower cust-name loan-number Jones L-170 Smith L-230 Hayes L-155 Department of Computer Science, HKUST 47 Outer Join - Example loan borrower branch-name loan-number amount cust-name loan-number Downtown L-170 3000 Jones L-170 Perryridge L-260 1700 Smith L-230 Redwood L-230 4000 Hayes L-155 Loan Borrower branch-name loan-number amount cust-name Downtown L-170 3000 Jones Redwood L-230 4000 Smith • Join returns only the matching (or “good”) tuples • The fact that loan L-260 has no borrower is not explicit in the result • Hayes has borrowed an non-existent loan L-155 is also undetected Department of Computer Science, HKUST 48 Left Outer Join -Example • Left outer join: Loan borrower • Keep the entire left relation (Loan) and fill in information from the right relation, use null if information is missing. branch-name loan-number amount cust-name Downtown L-170 3000 Jones Perryridge L-260 1700 null Redwood L-230 4000 Smith Department of Computer Science, HKUST 49 Right and Full Outer Join - example • Loan • Loan Borrower branch-name amount cust-name loan-number Downtown 3000 Jones L-170 Redwood 4000 Smith L-230 null null Hayes L-155 borrower branch-name amount cust-name loan-number Downtown 3000 Jones L-170 Redwood 4000 Smith L-230 Perryridge 1700 null L-260 null null Hayes L-155 Department of Computer Science, HKUST 50 Aggregate functions • Aggregation operator takes a collection of values and returns a single value as a result. avg: average value min: minimum value max: maximum value sum: sum of values count: number of values G1,G2, … Gn F1 A1, F2 A2,…, Fm Am(E) – E is any relational-algebra expression – G1, G2, … ,Gn is a list of attributes on which to group – Fi is an aggregate function (avg, min, max, etc.) – Ai is an attribute name to which Fi applies Department of Computer Science, HKUST 51 Aggregate function - example • Relation r: • sumc( r ) A B C 7 7 3 10 sum-C 27 Department of Computer Science, HKUST 52 Aggregate function - example • Relation account grouped by branch-name: branch-name account-number balance Perryridge a-102 400 Perryridge a-201 900 Brighton a-217 750 Brighton a-215 750 Redwood a-222 700 • Branch-name sum balance (account) branch-name Perryridge Brighton Redwood sum-balance 1300 1500 700 Department of Computer Science, HKUST 53 Division Operation is for Reference Only (not covered in class) Department of Computer Science, HKUST 54 Division Operation • Notation: r s • Suited to queries that include the phrase “for all.” • Let r and s be relations on schemas R and S respectively, where R = (A1, ….., Am, B1, ….., Bn) S = (B1, …., Bn) The result of r s is a relation with attributes R - S = (A1, …, Am) r s = { t | t R-S( r ) u s ( tu r ) } Department of Computer Science, HKUST 55 Division Operation - example r s = { t | t R-S( r ) u s ( tu r ) } r A γ δ δ δ δ ε ε B 1 2 3 1 1 1 3 4 6 1 2 s B 1 2 The result consists of attribute A only but not all of the 5 values. How to find out? u = 1, 2 Check if: u s ( tu r ) t R-S( r ) A γ δ ε Is ,1 and ,2 tuples in r? Is ,1 and ,2 tuples in r? check γ and δ… Is ,1 and ,2 tuples in r? Department of Computer Science, HKUST 56 rs A ε Another Division Example Relations r, s: rs A α α α β β γ γ γ B A A A A A A A A C α γ γ γ γ γ γ β A α γ D A A B A B A B B B A A E 1 1 1 1 3 1 1 1 D E A B 1 1 C γ γ Department of Computer Science, HKUST 57 Properties of Division Operation • Let q = r s Then q is the largest relation satisfying: q s r Relation r, s: r A α α α β γ δ δ δ δ ε ε B 1 2 3 1 1 1 3 4 6 1 2 s B 1 2 q A α ε Department of Computer Science, HKUST 58 Alternative Definition of Division • Division is not a fundamental relational algebra operation since it can be defined with other basic algebra operations Let r(R) and s(S) be relations, and let S R r s = R-S (r ) R-S ((R-S (r ) s) R-S,S (r )) To see why: – R-S,S(r ) simply reorders attributes of r so that non-overlapping attributes appear on the left of those in s – R-S (r ) contains all candidates for r s (need to find the unqualified values) – R-S(r) s gives all possible combinations of “r” and s – (R-S(r ) s) R-S,S(r ) returns combinations which are not in r – R-S((R-S(r ) s) R-S,S(r )) gives those tuples t in R-S(r) such that for some tuple u s, tu r. Department of Computer Science, HKUST 59

Descargar
# 無投影片標題 - Hong Kong University of Science and