Comp 231 Database Management
Systems
3. Relational Data Model
Department of Computer Science, HKUST 1
Basic Concepts
• Entities and relationships are stored in tables
• Relationships are captured by including key of one
table into another
• Languages for manipulating the tables
• All popular DBMSs today are based on relational data
model (or an extension of it, e.g., object-relational
data model)
Department of Computer Science, HKUST 2
Why is it so good?
• Simplicity, everybody knows how to manipulate tables
• Tables are simple enough so that solutions to
complicated problems such as concurrency control and
query optimization can be obtained
• It has a theoretical basis for the studying of database
design problems
• Tables are logical concepts; physically tables can be
stored in different ways  support data independence
Department of Computer Science, HKUST 3
Terminology
•
•
•
•
•
Relation  table; denoted by R(A1, A2, ..., An)
Attribute  column; denoted by Ai
Tuple  row
Attribute value  value stored in a table cell
Domain  legal type and range of values of an attribute
denoted by dom(Ai)
– Attribute: Age
– Attribute: EmpName
– Attribute: Salary
Domain: [0-100]
Domain: 50 alphabetic chars
Domain: non-negative integer
• Ideally, a domain can be defined in terms of another
domain; e.g., the domain of EmpName is PersonName.
However, this is NOT allowed in most basic DBMSs.
Department of Computer Science, HKUST 4
An Example Relation
Relation/Table Name
Attributes/Columns
STUDENT
Name
Chan Kin Ho
Lam Wai Kin
Man Ko Yee
Lee Chin Cheung
Alvin Lam
Student-id
99223367
96882145
96452165
96154292
96520934
Age
23
17
22
16
15
Department of Computer Science, HKUST 5
CGA
11.19
10.89
8.75
10.98
9.65
Some Formal Definitions
• A relation is denoted by: R(A1, A2, ..., An)
– STUDENT(Name, Student-id, Age, CGA)
• Degree of a relation: the number of attributes n in the relation.
• Tuple t of R(A1, A2, ..., An): An ordered set of values <v1,v2,...,vn>
where each vi is an element of dom(Ai).
• Relation instance r(R): A set of tuples in R
• r(R) = {t1, t2, ..., tm}, or alternatively
r(R)  dom(A1)  dom(A2)  ...  dom(An)
Department of Computer Science, HKUST 6
Relation and Cartesian Product
• A relation is the Cartesian product of sets of values
Name = { Lee, Cheung }
Grade = { A, B, C }
• A relation StudentGrade (Name, Grade) can be defined as any
subset of the following maximal extent:
Name  Grade =
•
{ Lee, A, Lee, B , Lee, C,
Cheung, A, Cheung, B, Cheung C }
r(StudentGrade)  Name  Grade = { Lee, A Cheung C }
Department of Computer Science, HKUST 7
After all, why don’t we call them tables?
• Observe which of the following would you consider a table:
STUDENT
Name
Chan Kin Ho
Lam Wai Kin
Man Ko Yee
Lee Chin Cheung
Alvin Lam
Student-id
99223367
96882145
96452165
96154292
96520934
Name
Chan Kin Ho
Age
23
17
22
16
15
CGA
11.19
10.89
8.75
10.98
9.65
Name
Chan Kin Ho
Student-id
99223367
Name
Age CGA
23 11.19
???
• A empty table/relation is still a table/relation (can be denoted by
R = {})
• A null table/relation is still a null table/relation (can be denoted
by R = )
Department of Computer Science, HKUST 8
Characteristics of Relations
• Tuples in a relation are not considered to be ordered, even
though they appear to be in a tabular form.
• Ordering of attributes in a relation schema R are significant
because the mathematical definition of a relation doesn’t
include attribute names.
• Values in a tuple: All values are considered atomic. A special
null value is used to represent values that are unknown or
inapplicable to certain tuples.
Department of Computer Science, HKUST 9
Keys
• Let K  R (I.e., K is a set of attributes which is a subset of R)
• K is a superkey of R if K can identify a unique tuple of each
possible relation r(R).
Customer(CusNo, Name, Address, …)
where customers have unique customer numbers and unique names.
superkeys:
CusNo
Name
{CusNo, Name}
{CusNo, Name, Address}
plus many others
• K is a candidate key if K is minimal
There are two candidate keys: CusNo and Name
• Every relation is guarantee to (must) have at least one key. Why?
Department of Computer Science, HKUST 10
Relational Algebra
Department of Computer Science, HKUST 11
Relational Algebra
• It is a procedural language, I.e., sequence of operations is
explicitly specified in the language
• Six basic operations: select, project, union, set difference,
Cartesian product, rename
E.g., R = S  T  U
• Each operation takes one or more relations as input and give a
relation as output (a property known as closure allowing us to
nest relational algebra expressions)
• Contrast relational algebra expressions with arithmetic
expressions; the former operates on relations and the latter on
numbers
Department of Computer Science, HKUST 12
Select Operation
• Notation: p (r)
• Defined as: p (r) = { t | t  r  P(t) }
• Read as: t is a tuple of r and t satisfies the predicate
P.
• P is a formula of the form:
attribute name op attribute name
or attribute name op constant
where op is one of      
• Predicates can be connected by  (and)  (or)  (not)
Department of Computer Science, HKUST 13
Select Operation - Example
• Relation r:
•  A  B  D  5 ( r) :
A
B
C
D


1
7


5
7


12
3


23
10
A
B
C
D


1
7


23
10
Department of Computer Science, HKUST 14
Project Operation
• Notation:  A1, A2, ..., Ak (r)
where A1, A2, ..., An are attributes defined in r
• After the projection, duplicates are assumed to be
eliminated automatically since a relation is a set!
Not true in real DMBSs: duplicates are not eliminated
until explicitly told to do so
• Order of attributes in the projection list can be
arbitrary. (One use of projection is to rearrange
attributes)
Department of Computer Science, HKUST 15
Project Operation - Example
• Relation r:
• 
A,C
(r):
A
B
C

10 1

20 1

30 1

40 2
A
C
A
C

1
1


1
1


1
2


2
Department of Computer Science, HKUST 16
Union Operation
• Notation: r  s
• Defined as: r  s = { t | t  r  t  s }
• For r  s to be valid (r and s must be compatible),
I.e.,
– r, s must have the same number of attributes
– domains must be compatible pair-wise between r and s.
• R(EmpName, Age) S(StudName, Age)
T(ProdID, Name)
U(StudName, Age, CGA)
• Compatible relations: R and S, R and  StudName, Age U, etc.
• Incompatible relations: R and T, R and U, etc.
Department of Computer Science, HKUST 17
Union Operation - Example
• Relation r, s: r
• rs:
A
B



1
2
s
A
B


2
1
A
B




1
2
1
3
Department of Computer Science, HKUST 18
3
Set Difference Operation
• Notation: r  s
• Defined as: r  s = { t | t  r  t  s }
• For r  s to be valid:
– r, s must have the same number of attributes
– domains must be compatible pair-wise between r and s.
Department of Computer Science, HKUST 19
Set Difference Operation - Example
• Relation r, s: r
• rs:
A
B



1
2
s
A
B


2
1
A
B


1
1
Department of Computer Science, HKUST 20
3
Cartesian-Product Operation
• Notation: r  s
• Defined as: r  s = { t q | t  r  q  s }
• Assume that attributes of r and s are disjoint (I.e., r
and s don’t have attributes with the same name)
• If they are not disjoint, then make them disjoint by
renaming the attributes concerned
Department of Computer Science, HKUST 21
Cartesian-Product Operation - Example
• Relation r, s: r A
• rs:
B

1

2
s
A
B
C
D
E








1
1
1
1
2
2
2
2



γ
10
10
20
10
10
10
20
10











γ
C
D

10 +

10 +

20 –

10 –
Department of Computer Science, HKUST 22
E
Join Operation
• Cartesian product is rarely used by its own; it is the
basis of the join operation, which is more popular
• r
s = A  C (r  s)
• A and C are sets of attributes which have the same
attribute names in r and s.
Department of Computer Science, HKUST 23
Join Operation - Example
B C, D
• R = ( A, B,
D)
• S = ( E, B
B, D
D)
• R joins S has the result schema: T(A, B, C, D, E) and
is defined by:
r s =  r.A,r.B,r.C,r.D,s.E (r.B=s.B  r.D=s.D (r  s))
• This is also called natural join
Department of Computer Science, HKUST 24
Natural Join Operation - Example
r
• Relation r, s:
• r
s:
A





B
1
1
1
1
2
A


γ


C


γ
γ

B
1
2
4
1
2
D
a
a
a
a
b
C

γ

γ

D
a
a
b
a
b
s
E

γ

γ

Department of Computer Science, HKUST 25
B
1
3
1
2
3
D
a
a
a
b
b
E


γ


Assignment Operation
• The assignment operation() provides a convenient way to
express complex queries by storing intermediate results into
temporary relations
• Assignment must always be made to a temporary relation
variable.
• Example: Write r  s as
temp1  R-S (r )
temp2  R-S((temp1  s) - R-S,S( r))
result = temp1 - temp2
– The result to the right of the  is assigned to the relation
variable on the left of the .
– May use variable in subsequent expressions.
Department of Computer Science, HKUST 26
Example Queries
• Find all customers who have an account from both the
“Downtown” and “Uptown” branches.
depositor (customer-name, account-no)
account (account-no, branch-name, branch-city)
• Query 1
CN(BN=“Downtown”(depositor
account)) 
CN(BN=“Uptown”(depositor
account))
where CN denotes customer-name and BN denotes branchname.
• Query 2 (for the purpose of illustrating division only)
temp  {“Downtown”, “Uptown”}
[ customer-name, branch-name (depositor
account) ]  temp
Department of Computer Science, HKUST 27
Example Queries
• Find all customers who have an account at all
branches located in Brooklyn.
depositor (customer-name, account-no)
account (account-no, branch-name, branch-city)
given a
Returns all branches
customer x
of a customer
and the set of
branches a, b,
customer-name, branch-name (depositor account)
c, check x,a
  branch-name (branch-city = “Brooklyn” (branch))
x,b and x,c
are in …
• Dividend contains all customer-branch information
all branches in
• Divisor contains all branches located in Brooklyn
Brooklyn
Department of Computer Science, HKUST 28
Tuple Relational Calculus
Department of Computer Science, HKUST 29
Tuple Relational Calculus
• A nonprocedural query language, where each query is of the
form
{ t | P(t) }
• It is the set of all tuples t such that predicate P is true for t
• t is a tuple variable; t[A] denotes the value of tuple t on
attribute A
• t  r denotes that tuple t is in relation r
• P is a formula similar to that of the predicate calculus
Department of Computer Science, HKUST 30
Predicate Calculus Formula
1. Set of attributes and constants
2. Set of comparison operators: (e.g., <, , =, , >, )
3. Set of connectives: and (), or (), not ()
4. Implication (): x  y, if x is true, then y is true and if x is NOT
true, then y could be either true OR false
xyxy
5. Set of quantifiers:
 t  r (Q(t))  “there exists” a tuple t in relation r such
that predicate Q(t) is true
 t  r (Q(t))  Q is true “for all” tuples t in relation r
Department of Computer Science, HKUST 31
Banking example
• branch(branch-name, branch-city, assets)
• customer (customer-name, customer-street,
customer-city)
• account (branch-name, account-number, balance)
• loan (branch-name, loan-number, amount)
• depositor (customer-name, account-number)
• borrower (customer-name, loan-number)
Department of Computer Science, HKUST 32
A Simple Selection Query
• Find the branch-name, loan-number, and amount for loans of
over $1200
{t | t  loan  t[amount]  1200}
t[amount]
• all relevant attributes are contained in:
loan (branch-name, loan-number,
t
amount)
• all attributes in tuple t is returned;
{t[name] | … …} to get only the name
attribute
• Intuitively, you can consider t as a
variable (or pointer) which iteratively
loops through each tuple in loan
• t is called a tuple variable
loan table
Department of Computer Science, HKUST 33
amount
A Join Query
• Find the loan number for each loan of an amount greater than
$1200 and return the customer info
{t | t customer  ( s  loan) ( t[loan-number] = s[loannumber]  s[amount]  1200 ) }
customer
t
loan-no
loan
t[loan-no]
s
loan-no amt
s[loan-no]
s[amount]
• It is a join operation
• A customer may have many loans, but as long as one of his
loans exceeds 1200, his information will be output
• again, all attributes in tuple t is returned
Department of Computer Science, HKUST 34
Peculiarity from the Textbook
•
•
•
•
•
•
borrower (customer-name, loan-number)
loan (branch-name, loan-number, amount)
{t |  s loan (t[loan-number] = s[loan-number]  s[amount]  1200)}
Same query as the previous one, except that t is not defined as a tuple in
borrower, so t is an arbitrary tuple variable
The whole expression only mentions “loan-number” as an attribute of t, so
we can only assume that t has only one attribute, namely, “loan-number”
This query only returns loan numbers
Let’s make it simpler (same as the first example!)
{s[loan-number] | s loan  s[amount]  1200}
It is better to define the scope of a tuple variable first! You need the
t borrower part if you want to return the customer-street or city anyway!
Department of Computer Science, HKUST 35
Example Queries
• Find the names of all customers having a loan, an account, or
both at the bank
t’s name can be found in the borrower table
{ t | ( s  borrower) ( t[customer-name] =s[customer-name] )
 ( u depositor) (t[customer-name] = u[customer-name]) }
or t’s name can be found in the depositor table
• t has NOT been explicitly defined as a tuple in the customer relation
• only attribute customer-name of tuple t is returned
• consider adding t  customer
Question: can I use s depositor instead of u depositor ?
Department of Computer Science, HKUST 36
Example Queries
• Find the names of all customers who have a loan and an
account at the bank.
{t | s  borrower (t[customer-name] =s[customer-name])
  u depositor (t[customer-name] = u[customer-name])}
• Alternative:
{t [customer-name] | t customer 
( s  borrower (t[customer-name] = s[customer-name])
  u depositor (t[customer-name] = u[customer-name]) )}
Department of Computer Science, HKUST 37
Example Queries
•
Find the names of all customers having a loan at the Perryridge branch
{ t[name] | t  customer 
 s  borrower ( t[customer-name] = s[customer-name]
 u loan (u[branch-name] = “Perryridge”
t
 u[loan-number] = s[loan-number] ) )}
customer
borrower
s
loan
Perryridge
u
{ t | s  borrower (t[customer-name] = s[customer-name]
 u  loan (u[branch-name] = “Perryridge”
 u[loan-name] = s[loan-number]))}
[textbook]
Department of Computer Science, HKUST 38
Example Queries
•
•
•
•
The inner u loan ()
cannot be taken out of the
 s  borrower ( )
quantifier.
customer
t
borrower
s
s
u
{t | t customer 
 s  borrower ( t[customer-name] = s[customer-name] )
  u loan (u[branch-name] = “Perryridge”
 u[loan-number] = s[loan-number] )}
loan
Perryridge
where is “s” defined?
{t | t customer 
 s  borrower ( t[customer-name] = s[customer-name] )
  u loan  s  borrower (u[branch-name] = “Perryridge”
 u[loan-number] = s[loan-number] )}
The two “s” in the two unnested existential quantifiers are not the same.
Department of Computer Science, HKUST 39
NOT-EXIST Example
• Find the names of all customers who have a loan at the
Perryridge branch, but no account at any branch of the bank
t is a borrower at Perryridge branch
{t | s borrower ( t[customer-name] = s[customer-name] 
u loan ( u[branch-name] = “Perryridge” 
u[loan-number] = s[loan-number] ) 
 v  depositor (v[customer-name] = t[customer-name] ) ) }
customer
but t is not a depositor anywhere in the bank
t
loan
depositor
borrower
s
Perryridge
u
v
Department of Computer Science, HKUST 40
EXIST vs. NOT-EXIST
v  depositor (v[customer-name] = t[customer-name] )
Consider a customer t, there does not exist any depositor
whose name is the same as t’s name.
v  depositor (v[customer-name]  t[customer-name] )
Consider a customer t, there exist (any) one depositor
whose name is not the same as t’s name.
t
customer
a
b
c
depositor
v
c
e
• First query returns a and b
• Second query returns a, b, and c (c is
qualified because you can find e in
depositor which is  c)
• First query is the same as:
 vdepositor v[cust-nm]  t[cust-nm]
Department of Computer Science, HKUST 41
Example Queries
• Find the names of all customers having a loan
from the Perryridge branch, and also return the
cities they live in
the borrower
and the loan is
from Perryridge
has a loan
{t | s  loan (s[branch-name] =“Perryridge”
t is a
  u  borrower (u[loan-number] = s[loan-number]
borrower
 t[customer-name] = u[customer-name]
  v  customer (u[customer-name] =v[customer-name]
 t[customer-city] = v[customer-city])))}
this part is for finding t’s city;
not needed if you use t customer
Department of Computer Science, HKUST 42
Example Queries
•
Find the names of all customers having a loan from the Perryridge
branch and live in the the city where Perryridge branch is located.
some loan issued
from Perryridge
{ t | t customer 
 s  loan ( s[branch-name] =“Perryridge”
  u  borrower ( u[loan-number] = s[loan-number]
 t[customer-name] = u[customer-name]
  v  branch ( v[branch-name] =
s[branch-name]
t has borrowed the loan
 t[customer-city] =
v[branch-city] ) ) ) }
t lives in the same city as
the Perryridge branch is located
branch(branch-name, branch-city, assets)
s[branch-name] is already bound
to “Perryridge”; want to find the
city where Perryridge is in
Department of Computer Science, HKUST 43
Query with Universal Quantifier
• Find the names of all customers who have an
account at all branches located in Brooklyn:
{t | s  branch ( s[branch-city] =“Brooklyn” 
if s is a branch
in Brooklyn
then there is an account u
in from that branch
 u  account ( s[branch-name] = u[branch-name]
  s  depositor ( t[customer-name] = s[customer-name]
 s[account-number] = u[account-number]) ) )}
such that t is the
depositor of account u
Department of Computer Science, HKUST 44
Other Relational Algebra Operators
Department of Computer Science, HKUST 45
Outer Join
• An extension of the join operation that avoids loss of
information.
• Computes the join and then adds tuples from one relation that
do not match tuples in the other relation to the result of the join.
• Uses null values:
– null signifies that the value is unknown or does not exist.
– All comparisons involving null are false by definition.
Department of Computer Science, HKUST 46
Outer Join - Example
• Relation loan
branch-name loan-number
amount
Downtown
L-170
3000
Perryridge
L-260
1700
Redwood
L-230
4000
• Relation borrower
cust-name
loan-number
Jones
L-170
Smith
L-230
Hayes
L-155
Department of Computer Science, HKUST 47
Outer Join - Example
loan
borrower
branch-name loan-number amount
cust-name
loan-number
Downtown
L-170
3000
Jones
L-170
Perryridge
L-260
1700
Smith
L-230
Redwood
L-230
4000
Hayes
L-155
Loan
Borrower
branch-name loan-number amount
cust-name
Downtown
L-170
3000
Jones
Redwood
L-230
4000
Smith
• Join returns only the matching (or “good”) tuples
• The fact that loan L-260 has no borrower is not explicit in the result
• Hayes has borrowed an non-existent loan L-155 is also undetected
Department of Computer Science, HKUST 48
Left Outer Join -Example
• Left outer join: Loan
borrower
• Keep the entire left relation (Loan) and fill in information
from the right relation, use null if information is missing.
branch-name loan-number amount
cust-name
Downtown
L-170
3000
Jones
Perryridge
L-260
1700
null
Redwood
L-230
4000
Smith
Department of Computer Science, HKUST 49
Right and Full Outer Join - example
• Loan
• Loan
Borrower
branch-name amount
cust-name
loan-number
Downtown
3000
Jones
L-170
Redwood
4000
Smith
L-230
null
null
Hayes
L-155
borrower
branch-name amount
cust-name
loan-number
Downtown
3000
Jones
L-170
Redwood
4000
Smith
L-230
Perryridge
1700
null
L-260
null
null
Hayes
L-155
Department of Computer Science, HKUST 50
Aggregate functions
•
Aggregation operator  takes a collection of values and returns a single value as
a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
G1,G2, … Gn  F1 A1, F2 A2,…, Fm Am(E)
– E is any relational-algebra expression
– G1, G2, … ,Gn is a list of attributes on which to group
– Fi is an aggregate function (avg, min, max, etc.)
– Ai is an attribute name to which Fi applies
Department of Computer Science, HKUST 51
Aggregate function - example
• Relation r:
• sumc( r )
A




B




C
7
7
3
10
sum-C
27
Department of Computer Science, HKUST 52
Aggregate function - example
• Relation account grouped by branch-name:
branch-name account-number balance
Perryridge
a-102
400
Perryridge
a-201
900
Brighton
a-217
750
Brighton
a-215
750
Redwood
a-222
700
• Branch-name 
sum balance
(account)
branch-name
Perryridge
Brighton
Redwood
sum-balance
1300
1500
700
Department of Computer Science, HKUST 53
Division Operation is for Reference
Only (not covered in class)
Department of Computer Science, HKUST 54
Division Operation
• Notation: r  s
• Suited to queries that include the phrase “for all.”
• Let r and s be relations on schemas R and S
respectively, where
R = (A1, ….., Am, B1, ….., Bn)
S = (B1, …., Bn)
The result of r  s is a relation with attributes R - S =
(A1, …, Am)
r  s = { t | t R-S( r )  u  s ( tu  r ) }
Department of Computer Science, HKUST 55
Division Operation - example
r  s = { t | t R-S( r )  u  s ( tu  r ) }
r
A




γ
δ
δ
δ
δ
ε
ε
B
1
2
3
1
1
1
3
4
6
1
2
s
B
1
2
The result consists of attribute A only but
not all of the 5 values. How to find out?
u = 1, 2 Check if: u  s ( tu  r )
t R-S( r )
A


γ
δ
ε
Is ,1 and ,2
tuples in r?
Is ,1 and ,2
tuples in r?
check γ and δ…
Is ,1 and ,2
tuples in r?
Department of Computer Science, HKUST 56
rs
A

ε
Another Division Example
Relations r, s:
rs
A
α
α
α
β
β
γ
γ
γ
B
A
A
A
A
A
A
A
A
C
α
γ
γ
γ
γ
γ
γ
β
A
α
γ
D
A
A
B
A
B
A
B
B
B
A
A
E
1
1
1
1
3
1
1
1
D
E
A
B
1
1
C
γ
γ
Department of Computer Science, HKUST 57
Properties of Division Operation
• Let q = r  s
Then q is the largest relation satisfying: q  s  r
Relation r, s:
r
A
α
α
α
β
γ
δ
δ
δ
δ
ε
ε
B
1
2
3
1
1
1
3
4
6
1
2
s
B
1
2
q
A
α
ε
Department of Computer Science, HKUST 58
Alternative Definition of Division
• Division is not a fundamental relational algebra operation since it
can be defined with other basic algebra operations
Let r(R) and s(S) be relations, and let S  R
r  s = R-S (r )  R-S ((R-S (r )  s)  R-S,S (r ))
To see why:
– R-S,S(r ) simply reorders attributes of r so that non-overlapping
attributes appear on the left of those in s
– R-S (r ) contains all candidates for r  s (need to find the unqualified
values)
– R-S(r)  s gives all possible combinations of “r” and s
– (R-S(r )  s)  R-S,S(r ) returns combinations which are not in r
– R-S((R-S(r )  s)  R-S,S(r )) gives those tuples t in
R-S(r) such that for some tuple u  s, tu r.
Department of Computer Science, HKUST 59
Descargar

無投影片標題 - Hong Kong University of Science and