Business Analysis and Data Design
ITEC-630 Fall 2008
Business Data Design
Professor J. Alberto Espinosa
Agenda
• Introduction to database concepts
• Data modeling & relational database design
• Transitional artifacts: the CRUD matrix –
linking requirements to data design
• Normalization
• Database queries
2
Database Concepts
3
Definitions
Database:
An organized collection of “logically related” data
that can be retrieved on demand
Database Management System (DBMS):
Software that manages databases
(i.e., define, create, update, and query databases)
Acts as intermediary between business
applications and physical data files
“Most powerful, scalable, flexible and effective business
applications rely on a well designed database and a
powerful underlying DBMS”
4
The Old Way:
Programs and Data files
Data and program files were separate. You had to write individual
programs to: define the data; upload it; update it; manipulate it; and or
retrieve it
Examples:
Data
Files
Application Programs
Accounting,
Human Resources
API1
System Software
Windows
Unix, Linux
INSTRUCTION SET
HARDWARE
PC,
Mainframe
5
A Better Way: Using a DBMS
A business application passes high level instructions to the DBMS. The
DBMS has capabilities to do all the necessary data management: data
definition, manipulation, and retrieval. So, the business application does
not have to worry about low level data management functions
Examples:
Database Application
API2
Database
DBMS
Accounting,
Human Resources,
ERP, CRM
Oracle, Access,
MS SQL Server
API1
System Software
Windows
Unix, Linux
INSTRUCTION SET
HARDWARE
PC,
Mainframe
6
Advantages of Using
Databases & DBMSs
•
•
•
•
•
•
•
•
Programs independent of data structure
Less data redundancy
Better consistency in the data
More flexibility & scalability
Easier to integrate & share data
Easier to develop business applications
Easier to enforce business rules/constraints
Easier access to data by users
(e.g., queries, reports, forms, etc.)
7
Stand-alone DBMS
DBMS and database work in the same computer:
the user’s computer  OK for personal productivity
Stand-alone
DBMS
(e.g., MS Access)
Database
8
DBMS in a Client/Server Environment:
Better for corporate use  the DBMS has two components
DBMS Server: runs the “back-end” part of the DBMS and
performs most of the data management functions – e.g.,
queries, updates, etc.
DBMS Client: runs “front-end” part of the DBMS that provides the
user interface (e.g., data entry, screen displays or presentation,
report formatting, query building tools)
DBMS
Client
Data Request
(e.g., query)
Response
(e.g., query result)
DBMS
Server
Database
Retrieve, add,
delete and/or
update data
9
DBMS in a Web Server Environment:
Very common when there are large numbers of users and would be impractical to deploy and
install a DBSM client  access to the database is done through a browser
(e.g., on-line purchases)
Request (ex. get a price quote, place an order)
Response (ex. query results with HTML-formatted
product price or order confirmation notice)
10
Business to Business
E-Commerce Example
using XML
e.g., supplier
INSERT
query
XML
Processor
DBMS
(e.g., MS
SQL Server)
XML Document
(e.g., Purchase
Order)
Internet
XML
Processor
e.g., buyer
SELECT
query
XML Document
(e.g., Purchase
Order)
DBMS
(e.g., Oracle)
11
Data Warehouse
“A database that stores and consolidates current and historical data from various systems
(internal and external) with tools for management reporting and sophisticated analysis—i.e.,
Datamining”
Business
Intelligence
12
Most Common Database Models
•
•
•
•
Hierarchical (of historical interest only)
Network (of historical interest only)
Relational
Object Oriented (new)
13
Relational Database
• For a database to be truly relational, it must comply
with 12 rules defined by its inventor (Dr. E. F. Codd).
• No commercially available database complies with
the full set of rules, but the 12 rules are used as
guidelines for sound database design.
• Rule 1 states that data should be presented in tables
• Rule 2 states that data must be accessible without
ambiguity
• We will talk more about other rules later (i.e., about
entity integrity and referential integrity – stay tuned).
14
Implications about Rule 1
A relational database must have:
•
•
•
Tables: or “entities”
Every table has a unique name
Ex. Students, Courses
Fields: or “columns”, “attributes”
Every field has a unique name within the table
Ex. Students (StudentID, StudentName, Major, Address)
Ex. Courses (CourseNo, CouseName, CreditPoints,
Description)
Records: or “rows”, “tuples”, “instances”
Every record is unique (has a unique field that identifies it)
Ex. {“jdoe”, “John Doe”, “CS”, 5000 Forbes Ave.)
Ex. {“MGMT-352-001”, “MIS”, Fall 2002, “A great course”}
15
Implications about Rule 2
Unambiguous reference  Table.Field WHERE Record Search
Ex. Students.StudentID 
refers to the StudentID field/column of the Students table
Ex. Courses.CourseName WHERE CourseNo = “ITEC-630”

more specifically, refers to the value in the CourseName
field/column of the Courses table in the record/row in which
the field/column CourseNo is “ITEC-630”, that is “System
Requirements”
16
Object Oriented (OO) Databases
•
•
•
•
•
•
OO languages + added database functionality, or
Database products + added OO programming facilities
Similar to relational databases
“Classes” (a grouping of similar objects -- like tables)
“Objects” (an instance of a class -- like records)
“Object properties” (object attributes -- like fields)
• Plus:
– Methods (i.e., procedures or programs)
Programs embedded in classes and objects
– Other OO Properties (inheritance, encapsulation, etc.)
17
Terminology Equivalence
ERD or
Data Model
OO Database
Relational
Database
Entity
Class
Table
Instances
Objects
Records
Relationship
Relationship
Relationship
Attributes
Properties
Fields
Other
Terms Used
Rows, Tuples
Columns
18
Database Management
Systems (DBMS)
Software that manages databases
i.e., define, create, update, and query databases
e.g., MS Access, MS SQL Server, Oracle
19
DBMS Functions and Tools
•
Performs 3 main functions:
– Data definition (define, create databases)
– Data manipulation (data entry, updates)
– Data retrieval (extraction, reports, displays)
•
Plus additional database tools:
– Data dictionary: data about the database
– Visual tools: report & form design
– Data modeling & database design tools
– Macros and programming languages
– Internet/web features, etc.
•
Examples:
– Oracle, DB2, Visual FoxPro, MS Access & MS SQL
20
Database
Design
21
Database Design Goals
•
•
•
•
•
Data integrity
Avoid anomalies in the data
No data redundancy
Record the data in one place only
Efficient data entry
Duplicate data means having to enter the same data more than
once
Consistency
Duplicate data can lead to inconsistencies when the data changes
e.g., 2 different addresses for same client
Flexibility and easy evolution
East to maintain, update and add new tables
22
Database Design Issue #1:
Enforce Entity Integrity
23
Entity Integrity
•
•
Is ensuring that every record in each table in the database can
be addressed (i.e., found) – this means that there each
record has to have a unique identifier that is not duplicate
or null (i.e., not blank)
Examples: every student has an AU ID; every purchase order
has a unique number; every customer has an ID
Primary key (PK)  helps enforce Entity Integrity:
• Field(s) that uniquely identifies a record in a table
(e.g., AU user ID)
• Entity integrity = PK is not duplicate & not blank
• PK can be:
– A single field (e.g., UserID), or
– Or more than one field (e.g., OrderNo, LineItem)
24
Database Design Issue #2:
Enforce Referential Integrity
25
Referential Integrity
• Is ensuring that the data that is entered in one table
is consistent with data in other tables
• Examples: purchase orders can only be placed by
valid customers; accounting transactions can only be
posted to valid company accounts
Foreign key (FK)  helps enforce referential Integrity:
• A field in a table that is a PK in another table
• That is, a field that “must” exist in another table
• This is how referential integrity is maintained
26
Illustration: Primary and Foreign Keys
PK
FK
PK
27
Entity, Referential Integrity
PK
PK
PK, FK
PK
FK
Database Schema:
The structure of the
database, which contain
tables, views, constraints,
relations, etc. – just about
everything, except the
data itself
PK, FK
28
Other Important Keys
•
Candidate Keys:
– Often there are more than one keys that could serve
as a primary key
– Example: Order, LineItem vs. Order, ProdID
– Example: AU ID, SSN, AU Login ID
– These are called candidate
– Any candidate can be selected as the primary key
•
Alternative Keys:
– Once a primary key has been selected from the
choice of candidate keys, the other keys (not used
as PKs) are referred to as “alternative keys”
29
Database Design Issue #3:
Develop a Data Model
or Entity-Relationship Diagram (ERD)
30
Data Model Example (Entity Relationship Diagram--ERD):
Course Registration System
Courses
Instructors
CourseNo
CourseDescription
Many
InstructorID
CreditPoints
PreRequisites
ClassroomNo
1
Includes
InstructorID
Teach
1
Relationships
StudentID
Enrollments
Comments
Entities
Students
Many
StudentID
CourseNo
LastName
FirstName
Telephone
EMailAddr
Enrolls
Many
1
LastName
FirstName
SSN
Department
College
Major
EMailAddr
31
Data Model Example (Entity Relationship Diagram--ERD):
Course Registration System
Cardinality
1 to
Many
Enrolls
Entities
Relationships
32
The Textbook’s ERD Notation
Entities
InstructorID
LastName
CourseNo
FirstName
Teach
Instructors
Telephone
InstructorID
(FK)
EMail
CourseDescr
Courses
CreditPoints
PreReqs
Relationships
33
Peter Chen’s ERD Notation
Instructors
PK
Course
InstructorID
Teaches
LastName
FirstName
Telephone
EMail
PK
FK1
CourseNo
CourseDescription
InstructorID
CreditPoints
PreRequisites
34
Entity-Relationship Diagrams (ERDs)
i.e., Conceptual Data Modeling
•
•
•
•
Similar to a class diagram, but without methods and
generalizations
Data-oriented modeling method that describes the data and
relationships among data entities
Goal: capture meaning of the data
2 main ERD or data model constructs:
– Entities and its attributes
– Relationships between entities
35
Entity
“An object, person, place, event or thing or which we want to
record data”
•
•
Equivalent to a table in a database
Examples: instructors, students, classrooms, invoices,
registration, machines, countries, states, etc.
•
Entity instance: a single occurrence of an entity
Example: Espinosa, Kogod 39, ITEC 630
•
Entities can be identified in a requirements analysis
description by following the use of NOUNS
36
Relationships
•
Relationships describe how two entities relate to each other
•
Relationships in a database application can be identified following
the VERBS that describe how entities are associated with one
another
•
Examples:
students enroll in courses
countries have cities, etc.
37
Cardinality
•
Cardinality is an important database concept to describe how two
entities are related
•
The Cardinality of a relationship describes how many instances of
one entity can be associated with another entity
The cardinality of a relationship between two entities has two
components:
– Maximum Cardinality: is the maximum number of instances that
can be associated with the other entity – usually either 1 or many
(the exact number is rarely used)
– Minimum Cardinality: is the minimum number of instances that
can be associated with the other entity – usually either 0 or 1
– Symbols:
0
1
Many
•
38
Cardinality (cont’d.)
•
A relationship is fully described by describing the
cardinality in both directions of the relationship: e.g., a
client places zero (i.e., optional) or many orders and
each order must relate to only one (i.e., mandatory)
client.
•
Examples:
1 student can only park 1 (or 0) cars  1 to (0 or) 1
1 client can place (0 or ) many orders  1 to (0 or) many
1 student can enroll in (at least 1 or) many courses and
a course can have (0 or) many students  (0 or) many
to (1 or) many
39
Example: 2 Entities, 1 Relationship
Instructors
PK
Zero or
many
InstructorID
Teaches
LastName
FirstName
Telephone
EMail
One and
only one
Course
PK
FK1
CourseNo
CourseDescription
InstructorID
CreditPoints
PreRequisites
Peter Chen’s notation
& MS Visio software
40
ERD SYMBOLS (cont’d.)
Note: high level conceptual models don’t show attributes, just entities
Employee
BioData
Has
1 to 1
Maximum
Cardinality
(outer symbol)
Employee
Has
Mandatory
FamilyData
Optional
Minimum
Cardinality
(inner symbol)
Peter Chen’s notation
using Systems Architect software
41
ERD SYMBOLS (cont’d.)
→ Advises
← Have
Advisor
Student
1 to Many
Maximum
Cardinality
Faculty
1 to Many (or None)
Mandatory
Teaches
Minimum
Cardinality
Course
Optional
Peter Chen’s (“crow’s feet”) notation
using Systems Architect software
42
Many to Many Relationships?
Many to Many
Orders
Products
Convert a Many-to-Many
into 2 One-to-Many’s
Orders
Products
1 to Many
LineItems
1 to Many
Intersection Table (or None)
43
Cardinality: 1 to 1 (MS Access
notation)
44
Cardinality: 1 to many
(MS Access notation)
45
Steps in data modeling Modeling
1.
Identify and diagram all ENTITIES
2.
Add PK attributes – i.e., implement entity integrity
Ensure PK’s are non-null & non-duplicates
3.
Identify and diagram all RELATIONSHIPS
Note CARDINALITIES (1 to 1, 1 to n, n to n)
4.
Add FK attributes – i.e., implement referential
integrity (this is automatic in some tools—MS
Access)
5.
Add remaining attributes
46
ERD Example:
Course Registration System
Courses (CourseNo (PK), CourseDescripition, InstructorID,
CreditPoints, ClassroomNo)
PreRequisites (CourseNo (PK), PreRequisiteNo (PK),
Comments)
Students (StudentID (PK), LastName, FirstName, SSN,
Department, College, Major, EMail)
Enrollment (StudentID (PK), CourseNo (PK), Comments)
Instructors (InstructorID (PK), LastName, FirstName,
Telephone, EMail)
Classrooms (ClassroomNo (PK), ClassroomName, Building,
BuildingRoomNo, Equipment, Capacity)
Note: PK denotes a primary key
47
Example: Course Registration System
Step 1. Draw Entities
PreRequisites
Course
ClassRooms
Enrollment
Instructors
Students
48
Example: Course Registration System
Step 2. Add PK’s (undeline/separate with a line)
PreRequisites
Course
Instructors
CourseNo
PreRequisiteNo
CourseNo
InstructorID
Enrollment
ClassRooms
ClassroomNo
StudentID
CourseNo
Students
StudentID
49
Example: Course Registration System
Step 3. Add Relationships (w/Cardinalities)
PreRequisites
PK,FK1
PK
CourseNo
PreRequisiteNo
has
Course
PK
Instructors
Teaches
CourseNo
PK
InstructorID
Includes
Assigned
Enrollment
ClassRooms
PK
ClassroomNo
PK,FK1
PK,FK2
StudentID
CourseNo
Students
Enrolls
PK
StudentID
50
Example: Course Registration System
Step 4. Add FK’s
PreRequisites
PK,FK1
PK
CourseNo
PreRequisiteNo
Assigned
Course
has
PK
CourseNo
FK1
FK2
InstructorID
ClassroomNo
Instructors
Teaches
PK
InstructorID
Includes
Enrollment
ClassRooms
PK
ClassroomNo
PK,FK1
PK,FK2
StudentID
CourseNo
Students
Enrolls
PK
StudentID
51
Example: Course Registration System
Step 5. Add Remaining Attributes
Course
PreRequisites
PK,FK1
PK
CourseNo
PreRequisiteNo
Comments
Assigned
Has
PK
CourseNo
FK1
CourseDescription
InstructorID
CreditPoints
FK2
ClassroomNo
Instructors
Teaches
PK
InstructorID
LastName
FirstName
Telephone
EMail
Students
ClassRooms
PK
Includes
PK
ClassroomNo
ClassroomName
Building
BuildingRoomNo
Equipment
Capacity
Enrollment
PK,FK1
PK,FK2
StudentID
CourseNo
Comments
Enrolls
StudentID
LastName
FirstName
SSN
Department
College
Major
EMail
52
Example:
Course Registration System
53
EXAMPLE:
Package Delivery Tracking System
Deliveries
Clients
PK
ClientID
Deliveries
Deliveries
Deliveries
PK
PK DeliveryNo
DeliveryNo
PK
PK DeliveryNo
DeliveryNo
LastName
FirstName
Address
Telephone
FK4 ClientID
ClientID
FK4
FK5 DriverNo
FK5 DriverNo
Date
Status
Clients
Clients
PK
PK
ClientID
Trucks
Trucks
PK
PK
Trucks
PK
TruckNo
TruckNo
Packages
Packages
Packages
Packages
PackageNo
PK
PK
PK PackageNo
PackageNo
PackageNo
FK4
DeliveryNo
FK4 DeliveryNo
Size
Charge
Drivers
Drivers
Drivers
PK
PK DriverNo
DriverNo
PK
DriverNo
Drivers
FK1
TruckNo
TruckNo
PK
DriverNo
Make
Model
Year
FK1
TruckNo
DriverName
LicenseNo
54
Example:
Package Delivery Tracking System
55
EXAMPLE:
Airline Reservation System
56
Example:
Airline Reservation System
57
Database Design Issue #4:
Implement Important Rules: Update,
Delete and
Business Rules
58
Referential Integrity:
Update Rules
What can be updated/modified in the database and when?
1. It is OK to update values in any non-PK fields, provided that
referential integrity and business rules are respected
2. It is OK to update values in the PK in one table if it is not linked to
a FK in another table, provided that entity integrity, referential
integrity and business rules are respected
3. If a PK is linked to a FK in another table, we need to ensure that
referential integrity is maintained. Depending on what makes
business sense, the update rule can be either:
• U:R (Update:Restrict) – i.e., Disallow updates of values in the PK, or
• U:C (Update:Cascade) – i.e., Allow updates,
but cascade changes to all related FKs in other tables
59
Referential Integrity:
Delete Rules
What can be deleted in the database and when?
1. It is OK to delete records in a table [only] if its PK is
not linked to a FK in another table
2. If its PK is linked to a FK in another table, we need to
ensure that referential integrity is maintained.
Depending on what makes business sense, the delete
rule can be either:
• D:R (Delete:Restrict) – i.e., Disallow deletion of records, or
• D:C (Delete:Cascade) – i.e., Allow deletion of records,
but cascade deletions in all related tables that contain a FK
linked to this table
60
Illustration of Update and Delete Rules
Instructors
PK
InstructorID
LastName
FirstName
Telephone
EMail
Course
PK
FK1
CourseNo
CourseDescription
InstructorID
CreditPoints
PreRequisites
What happens if (how is referential integrity affected):
• We change an instructor’s last name?
• We change an InstructorID in the Course table?
• We change an InstructorID in the Instructors table?
• We delete a course?
• We delete an instructor
61
Other Data Integrity:
Business Rules
Most DBMS have features that allow you to impose
constraints in the data to meet rules imposed by a
company when conducting business:
i.e., “business rules” – examples:
• CustomerAge >= 18
• OrderQty >= 100
• ProductPrice <= 1000
• PaymentDate <= PurchaseDate + 90
62
Database Design Issue #5:
“Normalize” Your Design
(we will discuss this later)
63
Database
Queries
64
DBMS Functions and Tools
• Performs 3 main functions:
– Data definition (create databases)
– Data manipulation (enter & update data)
– Data retrieval (data extraction)
•
Using
Queries
(or proprietary
features)
Plus additional database tools:
– Data dictionary: data about the database
– Visual tools: report & form design
– Data modeling & database design tools
– Macros and programming languages
– Internet/web features, etc.
65
Queries
Often thought of as a method to retrieve data, but
queries can also be used to define and manipulate data
Databases can be queried in many ways:
• Proprietary DBMS commands and languages,
which are unique to the particular DBMS product, or
• Standard query methods/languages (QBE, SQL, etc.),
which most DBMS products support
66
Standard Query Methods
Query by Example (QBE) (Design View in MS Access)
• Visual interface using examples of data requested
• Similar to how you do searches in the library
Structured Query Language (SQL)
• Popular with power users
• Works in most DBMS
• Can embed SQL commands in programs, web
scripts, etc.
• English-like commands, practical
• Exact, mathematical: relational algebra & matrix math
67
Query by Example (QBE)
•
•
•
Called Query “Design View” in MS Access
Column labels are the fields we want to retrieve
In table cells we enter “examples” of the info we want
68
Structured Query Language
(SQL)
69
SQL Commands Types
Only 8 Commands!!
• Data Definition:
Create, Drop
• Data Manipulation:
Insert, Update, Delete, Union, Join
• Data Retrieval:
Select
70
SQL Commands: Data Definition
Example: Create & Delete Table called “Employees”
One SQL
Command
(from CREATE to ;)
CREATE TABLE Employees
(EmployeeID integer,
LastName char(24),
Fields
FirstName char(24),
created in
Employees
Birthday date,
table
Phone char(10),
Notes memo);
; = End of
SQL Command
DROP TABLE Employees;
71
SQL Commands: Data Manipulation
• INSERT: Add new records
• UPDATE: Modify existing records
• DELETE: Delete records
• UNION: Combine records from two tables
• JOIN: Combine columns from two tables
72
SQL Commands: Data Manipulation
Add & Update Records
Insert (add) a complete record (values in all fields):
INSERT INTO Employees
VALUES (“ae”, “Espinosa”, “Alberto”,
12/12/2002, “885-1958”, “Looks tired, needs a vacation”);
Insert (add) partial record (values in some fields only):
INSERT INTO Employees
(EmployeeID, LastName, FirstName)
VALUES (“ae”, “Espinosa”, “Alberto”);
Update (modify) record with new values:
UPDATE Employees SET LastName=“Espinosa”
WHERE EmployeeID = “ae”;
73
Data Extraction Queries: The Idea
•
Organize database (design, create):
– In the most efficient & consistent way (internally)
– Not based on how you want the data to look
•
Produce the “virtual” temporary tables the way you
want them to look using queries
How we store
the data
How we display
the data
74
Data Extraction in SQL:
The “SELECT” Command
Note BOLDFACE denotes SQL Keywords
SELECT field1, field2, etc. – columns to retrieve and display
FROM table1, table2, etc. – tables that contain the data
WHERE condition1 – which records to retrieve
AND [OR] condition2 ……. – further conditions
GROUP BY field2, ….. – to group results
HAVING condition3 – like WHERE but after grouping
ORDER BY field1, field2, etc.... [DESC] [ASC] – to sort the query results
SELECT can be followed by:
DISTINCT (SELECT DISTINCT eliminates duplicate rows from result)
TOP n (lists only the top n rows of result – e.g. SELECT Top 5)
* (lists all fields in the table – e.g., SELECT * FROM …)
75
Complexity of SELECT Queries
• Simple Queries:
Involve a single table
• Queries with Aggregate Functions:
When we only want averages, totals, etc.
• Queries with Aggregate Functions and Grouping:
When we want averages, totals, etc. categorized by groups
• Complex Queries with Joins:
Involve more than one table
• Complex Nested Queries:
Sub-queries (which compute something) within queries
76
Simple SQL SELECT Queries
SELECT ProdID, ProdName, Type, Price (a list of fields)
FROM Products (the table where the data resides)
WHERE Price>=300; (which rows to display)
SELECT ProdName, Price FROM Products
WHERE Price>=120 AND Type=“Percussion”;
Note: the SQL DELETE command works identically to the SELECT command,
but instead of displaying the results, it deletes them. For example, the
following DELETE command deletes all the records displayed with the
previous query (it is not a bad idea to view the records before you delete
them)
DELETE ProdName, Price FROM Products
WHERE Price>=120 AND Type=“Percussion”;
Note: an SQL query can not only contain a list of fields, but also
any expression involving on or more fields. For example:
SELECT Labor, Parts, Labor+Parts AS Charges (column name)
FROM Repairs
WHERE Labor+Parts>=300;
77
Delimiters
When writing WHERE conditions and similar
statements in SQL, one often needs to compare a
field with a particular value. The values need to be
written within delimiters that match the data type.
These are the delimiters:
• Text – quotes:
ex. WHERE UserID = “alberto”
• Date/time – pound sign
ex. WHERE OrderDate > #01/07/2008#
• Number – nothing
ex. WHERE Amount > 200
78
SQL Queries With Aggregate Functions
•
These queries yield a single number result
(i.e., a table with 1 column and 1 row)
•
The only thing you can include in the SELECT line are the fields
you are aggregating
Aggregate functions you can use:
Avg, Sum, Min, Max, Count
•
•
These functions aggregate vertically a column of (usually numeric)
values (e.g., salaries, payment amounts, etc.)
SELECT Avg(Price) AS AvgPrice
FROM Products
WHERE Price>=120
AND Type=“Percussion”;
Note: the AS clause is optional; it does not
change the query results; it only changes the
column label in the results
SELECT Max(Price) AS MaxPrice, Avg(Price) AS AvgPrice
FROM Products
WHERE Type=“Guitars”;
SELECT Count(*) as TotOrders
FROM Orders
WHERE OrderStatus = “Top Priority”
Note: you can use more
than one aggregate
function in one SELECT
command
Note: the Count function counts how many
rows meet the Where criteria, so it you can use
any column you wish to count and you will get
the same results – the easiest thing is to use
Count(*)
79
SQL Queries With
Aggregate Functions and Grouping
• The ONLY things you can include in the SELECT line are:
(1) the fields you are aggregating [e.g., Avg(Price)]
(2) and the fields you are using to group [e.g, Type]
SELECT Type, Avg(Price) AS AvgPrice, Max(Price) AS MaxPrice
FROM Products
Note: the WHERE
WHERE Price>=1000
clause is evaluated
BEFORE the grouping
GROUP BY Type
SELECT Type, Avg(Price) AS AvgPrice, Max(Price) as MaxPrice
FROM Products
Note: the HAVING
GROUP BY Type
clause is evaluated
HAVING Avg(Price)>1000 AFTER the grouping
80
Complex SELECT Queries with Joins
Tables: Orders (OrderNo, ClientID, OrderDate, OrderStatus)
LineItems (OrderNo, LineItem, ProdID, Qty)
Table Join (2 ways):
SELECT Orders.OrderNo, OrderStatus, ClientID, LineItem, ProdID, Qty
FROM Orders, LineItems
WHERE Orders.OrderNo = LineItems.OrderNo;
Join
Condition
Table Product (WRONG!! Don’t forget the join condition):
SELECT Orders.OrderNo, ClientID, LineItem, ProdID, Qty
FROM Orders, LineItems;
81
Complex SELECT Queries with Joins: TIPS
COMPLEX queries that JOIN 2 tables are identical to SIMPLE queries, except
for 2 additional rules you MUST ALWAYS apply:
1.
The two tables need to be JOINED through the common field that links them
e.g., WHERE Orders.OrderNo = LineItems.OrderNo
2.
ANY time you refer to a COMMOND FIELD that exists in both tables, you
must use a TABLE PREFIX to eliminate the ambiguity
e.g., SELECT Orders.OrderNo; WHERE Orders.OrderNo = 990001
For complex queries that JOIN 3 or more tables apply rule 1 for EACH link, and
always apply rule 2 – e.g.,
SELECT Clients.ClientID, ClientName, Orders.OrderNo,
OrderStatus, LineItem, ProdID, Qty
FROM Clients, Orders, LineItems
WHERE Clients.ClientID = Orders.ClientID
AND Orders.OrderNo = LineItems.OrderNo
82
Mapping QBE to SQL
SELECT
FROM
JOIN
CONDITION
WHERE
ORDER BY
83
Nested Queries w/Aggregates
Display above average quantities in line items – i.e., order
number, line item, product ID and quantity for any line
item in which the quantity ordered is above the average
quantity ordered in all orders:
SELECT OrderNo, LineItem, ProdID, Qty
FROM LineItems
WHERE Qty>(SELECT Avg(Qty) FROM LineItems)
Tip: prepare the sub-query first: SELECT Avg(Qty) FROM LineItems
84
Nested Queries w/Aggregates
Display product ID’s and total quantities ordered for that
product for totals exceeding 1000 units (2 solutions):
SELECT ProdID, TotQty
FROM (SELECT ProdID, Sum(Qty) AS TotQty
FROM LineItems GROUP BY ProdID)
WHERE TotQty > 1000
Tip: again, prepare the sub-query first:
SELECT ProdID, Sum(Qty) AS TotQty
FROM LineItems GROUP BY ProdID
Alternative solution without sub-query:
SELECT Prod ID, Sum(Qty) AS TotQty
FROM LineItems GROUP BY ProdID
HAVING Sum(Qty) > 1000
85
Nested Queries w/Lists
Produce Sub-Query First – e.g., a single-column table (list)
SELECT PartNumber FROM Shipments
WHERE SupplierID = "S5“
GROUP BY PartNumber
HAVING Avg(Qty)>200;
Then enclose the Sub-Query in parenthesis and use with IN keyword
SELECT DISTINCT PartName
FROM Parts
WHERE PartNumber IN
(SELECT PartNumber FROM Shipments
WHERE SupplierID = "S5"
GROUP BY PartNumber HAVING AVG(Qty)>200);
Another example:
SELECT DISTINCT PartName
FROM Parts
WHERE PartNumber IN (352, 353, 354)
86
Database Design Issue #5:
“Normalize” Your Design
89
Database Design Goals
•
Data integrity (Entity and Referential Integrity – ERD’s)
Avoid anomalies in the data
•
No data redundancy
Record the data in one place only
•
Efficient data entry
Duplicate data means having to enter the same data more than once
•
Consistency
Duplicate data can lead to inconsistencies when the data changes
e.g., 2 different addresses for same client
•
Flexibility and easy evolution
East to maintain, update and add new tables
Normalization
90
Why Normalization?
• Question: if a data model/ERD is sound and all entity integrity,
referential integrity, update/delete and business rules have been well
implemented, does this guarantee a good database design?
Answer: not necessarily. If your design is not “normalized”, you
could have redundant data, and that would be a BAD thing (design)
• Normalization should yield the most efficient way to organize and
record the data internally—not necessarily how users want to see the
data, but what makes more sense for non-redundant data storage
• We can later build user table views (i.e., what the user wants or
needs to see) by querying these normalized tables.
• Redundancy: only PK and FK (e.g., client ID’s) values should appear
in multiple tables (because they are needed to link tables)
 Non-key data (e.g., client last name) that appears in multiple tables is
“redundant”
91
Example
You gather requirements from users and one user gives you this table
and tell you that she would like the system to collect this data.
How would you organize this data internally in the database?
92
Normalization
•
Normalization = The systematic process of “decomposing” a set
of unorganized tables with redundant data into smaller, simpler, and
more organized tables with only minimal data redundant in key
fields and no data redundancy on non-key fields
— i.e., from chaos to order
Decompose to most efficient
internal organization 
You can always recover the original
data format with a query 
Decomposition
Query
93
Degree of Normalization
•
Normalization is a matter of degree -- the more normalized your design
is, the lower the chances of having redundant data
•
Normal Forms (NF) (higher NF designs are more normalized):
1NF  2NF  3NF  BCNF  PJNF  DKNF  4NF  5NF
•
The process of normalizing a design to 3NF may seem complex, but
the concept is very simple:
(1) Minimize data redundancy in key attributes
-- i.e., data in key fields can be entered in more than one table
(2) Eliminate data redundancy in non-key attributes
-- i.e., data in non-key fields should be entered only in one table
(3) Ensure that every piece of data (each non-key attribute)
can be unambiguously located by its PK
(4) Each incremental NF gets us a step closer in this direction
94
Normal Forms
To what extent is a database normalized?
• Normalization is a matter of degree
• Measured in what is called “normal forms” (NF)
• 1NF, 2NF, 3NF, etc., higher NF = more normalized
• 3NF
Good enough for most applications
• BCNF  Boyce-Codd NF (more robust version of 3NF)
Mostly of academic interest (and complex applications):
• 4NF, 5NF or PJNF (Project Join), DKNF (Domain-Key)
 More advanced theoretically, little practical use
 Useful for research and formal methods only
95
Q: What’s wrong with this table?
A: Data in PayDate & Amount fields not single-valued
—i.e., they have repeating values
96
Similar Table, Same Problem
A: repeating values for a PK value  PK is duplicate
97
First Normal Form (1NF)
•
•
•
A “TABLE” is in 1NF if
there are no multi-valued attributes
and no PK is duplicated
i.e., attributes are “atomic”
A “DATABASE” is in 1NF if
ALL its tables are in 1NF
98
Decomposition to 1NF:
Create a separate table where the repeating
values can be recorded as rows
99

Decomposition
100
Q: What’s wrong with this table?
A: Some data in the Client and OrderDate fields are entered twice
i.e., some non-key data are redundant
i.e., there are “partial dependencies” in the table (see next slide)
101
Functional Dependencies
• An attribute B is functionally dependent
on attribute A if the value of a valid
instance of attribute A uniquely
determines the value of attribute B
• Represented as:
A
B
102
Functional Dependency Examples
StudentID
StudentID
StudentName
StudentMajor
What are the functional dependencies in this relations?
Clients (ClientID, ClientName, City, State, Zip)
LineItems (OrderNo, LineItem, ClientID, ProdID, Qty)
103
Second Normal Form (2NF)
• Applies to tables with “composite” PKs
(i.e., PK has more than one attribute)
• A “TABLE” is in 2NF if
(1) it is in 1NF, and
(2) non-key attributes are functionally dependent
on the whole PK, not on just part of it (i.e., no partial dependencies)
• Note: we only need to worry about 2NF when PK contains more than
one attribute (i.e., “composite”)
• That is: if a table is in 1NF and has a single PK, it is automatically in
2NF
• A “DATABASE” is in 2NF if ALL its tables are in 2NF
104
Decomposition to 2NF
Move the partial key (e.g., OrderNo) and the fields
that are functionally dependent on only that part of the
key (e.g., ClientID, OrderDate) to a separate table
and make that partial key the PK in that new table
105
Decomposition

106
Q: What’s wrong with this table?
A: Some of the data in the ClientCity field is redundant,
because once we know who the ClientID is, we know the
city where they live
i.e., there are “transitive dependencies” in the table
107
Transitive Dependencies
•
If a non-key attribute C is functionally dependent on
another non-key attribute B (BC) and
B is in turn dependent on the PK attribute A (AB)
this implies C is transitively dependent on A (AC)
(through B or ABC), which will cause redundancies
•
In 2NF, all non-key attributes are functionally
dependent on the PK
•
Thus, in a 2NF table, a transitive dependency will occur
every time there is a functional dependency between
any two non-key attributes.
108
Transitive Dependency Examples
OrderNo
CourseNo
ClientID
InstructorID
ClientName
InstructorName
Are there transitive dependencies in these relations?
LineItems (OrderNo, LineItem, ProdID, Qty)
LineItems (OrderNo, LineItem, ProdID, ProdName, Qty)
109
Third Normal Form (3NF)
• A “TABLE” is in 3NF if (1) it is in 2NF and (2) non-key
attributes depend on the PK and nothing else
• That is, non-key attributes are NOT functionally
dependent on other non-key attributes
(just on the PK)
• In other words, there are no transitive dependencies
• A “DATABASE” is in 3NF if ALL its tables are in 3NF
110
Decomposition to 3NF:
Move the fields with transitive
dependencies to a separate table
111
Decomposition

112
In Summary
• 1NF = no multi-value attributes (or no PK duplicates)
• 2NF = 1NF + the “whole” PK, not just part of it
• 3NF = 2NF + the PK and “nothing but” the PK
• Important! it is OK to have non-normalized designs,
and some database applications may actually
require a non-normalized design, but you must have
an understanding of which normalization form you
are violating and a good reason for doing it
113
Exercises
Text p.203, problem 3:
Indicate the normal form (PK underlined) and decompose to 3NF
Class (CourseNo, SectionNo, RoomNo)
Class (CourseNo, SectionNo, RoomNo, Capacity)
Class (CourseNo, SectionNo, CourseName, RoomNo, Capacity)
114
Exercises
POS System:
Indicate the normal form (PK underlined) and decompose to 3NF
Sales (SaleNo, ClientID, ClientName, SaleDate, SaleAmount)
SalesDetails (SaleNo, LineItem, SaleDate, ProdID, ProdName, Qty)
Other Systems:
VideoRental (VideoNo, Date, MovieID, MovieName, ClientID)
VideoRental (VideoNo, Date, ClientID, CheckoutDate, RentalDays)
Videos (VideoNo, MovieID, MovieName, MovieType)
Videos (VideoNo, MovieID, VideoCondition)
Movies (MovieID, MovieName, MovieType, Producer, ReleaseDate)
115
Exercise
Indicate the normal form and decompose to 3NF
116
FYI,
Conceptually, normalization can be thought of the opposite of a
SELECT SQL query. When you normalize, you decompose a large
table into simpler, smaller tables without redundancies. In contrast,
when you query several small tables, the result is a larger table in
which redundancies don’t matter.
For example, the decomposed tables of the exercise in the prior page
can be reconstructed by querying the normalized tables as follows:
SELECT Companies.CompanyID, CompanyName,
Employees.EmployeeID, EmployeeName,
Departments.DeptID, DeptName
FROM Departments, Companies, Employees
WHERE Companies.CompanyID = Employees.CompanyID
AND Departments.DeptID = Employees.DeptID
117
Exercise
Indicate the normal form and decompose to 3NF
(and then try to write an SQL query to re-construct the original table)
118
FYI Only
• Boyce-Codd Normal Form (BCNF):
– A more robust version of 3NF
– A database is in BCNF when the database is in 3NF when you
substitute the PK with any other Alternative Key
– That is, the database is in 3NF for all Candidate Keys
• Domain-Key Normal Form (BKNF):
– All values entered in an attribute satisfy the constraints defined in
the domain of that attribute
– An attribute’s domain is the pool of data from which the attribute
can draw its values
– Example: if we define a constraint for the OrderID attribute
(e.g., 6 digits, from 000001 to 999999) in general (i.e., the
domain), the OrderID attribute in every table that uses this
attribute, must satisfy the same constraints.
119
Transitional Artifact:
The CRUD Matrix 
Connecting Data
Objects to Use Cases
Data Objects
• A data object is a person or thing you
want to collect data for:
• In a database application a data object
is a table
Examples:
courses, students, clients, invoices,
orders, deliveries
121
Identifying Data Objects
To identify data objects, refer to the Use Cases
(or other requirements artifacts) and:
• Identify and highlight (or bold face) all nouns
• Inspect these nouns to see if they represent possible system data
objects
• But be careful, a noun may not refer to a data object, but simply to
an attribute of a data object
• A data object maps to a class (in a class diagram), entity (in a data
model) or table (in a database)
• A data object has attributes (and behaviors if object is for a class)
• An attribute is something you want to record about a data object
• For example, in Students (StudentID, Name, SSN, Email)—Students
represents a data object and the data inside the parenthesis
represents attributes of that data object
122
The CRUD Matrix
• A “transitional artifact” is one that helps establish a relationship or
cross reference between artifacts
• A CRUD matrix is a transitional artifact between Use Cases and
Data Objects
• Helps ensure that the Use Cases specified have all the necessary
Data Objects to handle the data needs of the application and,
conversely, that the collection of Data Objects identified cover the
entire functionality specified in the requirements.
• The Use Cases, if properly specified, must describe all the actions
necessary to maintain all data objects
• A CRUD matrix is a table that cross references which Use Cases:
(C)reate, (R)ead, (U)pdate and/or (D)elete data in these objects
123
Developing a CRUD Matrix
• The CRUD matrix has one row for every data object identified and
one column for every Use Case specified (or the other way around)
• So, first create a column (or row) for every Use Case in your model
• Every noun highlighted in the Use Cases will suggest the need for
data object to store the respective data you, so you need to create a
row (or column) for each of these data objects.
• Then go through every cell in the first Use Case and enter a C, R, U
and/or D on the cell depending on whether the Use Case is creating,
reading, updating or deleting records in the respective data object.
• The data objects should give you an indication of the entities (i.e.,
database tables) that you will need in your Data Model (and database)
• And the C’s, R’s, U’s and D’s should give you an idea of the SQL
queries that your application will need
124
Illustration
Table 1
Table 2
Table 3
UC-101
UC-102
C
R
UC-103
U
D
• UC-102 reads data from Table 1
 It will require an SQL SELECT query
• UC-101 creates a record in Table 1
 It will require an SQL INSERT query
• UC-103 deletes records data from Table 3
 It will require an SQL DELETE query
• UC-102 updates data in Table 2
 It will require an SQL UPDATE query
125
CRUD Matrix Example for a Loan
Processing Application
Use Case
Data Object
Submit a Loan
Request
Evaluate a Loan
Request
Applicant
C
Loan Application
C
R
Credit Score
C
R
Credit Report
C
R
Account History
C
R
Loan Request
C
R,U
Loan Officer
R
Evaluation
C
Book a Loan
R
R
Loan Agreement
R
Loan Account
C
Loan Clerk
R
In a database application, these are tables and these are queries
126
ATM Application Example
127
ATM Use Case
Use Case ID
UC-100
Use Case
Withdraw Funds
Actors
(P) Customer
Description
The customer inserts card in the ATM, logs in with a pass code, and
makes a selection from the available choices to withdraw funds. Once in
the funds withdrawal screen, the customer is prompted to enter the
amount to withdraw. After the amount is entered, the system will check
for availability of funds for that customer. Provided that funds are
available, the system will dispense the amount requested in cash and
then debit that amount from the customer’s bank account. The system
will record the last withdrawal date in customer’s file and record
transaction in ATM transaction log .
Priority
Non-Functional
Requirements
Assumptions
Source
128
ATM Use Case
Use Case ID
UC-101
Use Case
Deposit Funds
Actors
(P) Customer
Description
The customer inserts card in the ATM, logs in with a pass code, and
makes a selection from the available choices to deposit funds. Once in
the funds deposit screen, the customer is prompted to enter the amount
to deposit. After the amount is entered, deposit slot door opens, customer
places deposit envelop in slot, deposit slot door closes. The system
credits the customer’s account accordingly, records the last deposit
date in the customer’s file and record the transaction in ATM
transaction log.
Priority
Non-Functional
Requirements
Assumptions
Source
129
ATM Use Case
Use Case ID
UC-102
Use Case
Transfer Funds
Actors
(P) Customer
Description
The customer inserts card in the ATM, logs in with a pass code, and makes
a selection from the available choices to transfer funds. Once in the funds
transfer screen, the customer is prompted to enter the amount to transfer,
from account and to account. After the information is entered, the checks
for availability of funds. If funds are available, it displays the transaction
and asks for confirmation. The customer confirms transaction and the
customer’s account gets adjusted accordingly. The system records the last
funds transfer date in the customer’s file and records the transaction in
ATM transaction log.
Priority
Non-Functional
Requirements
Assumptions
Source
130
ATM Use Case
Use Case ID
UC-103
Use Case
Balance Inquiry
Actors
(P) Customer
Description
The customer inserts card in the ATM, logs in with a pass code, and
makes a selection from the available choice to inquire balances. The
machine prints balances, records the last balance inquiry date in the
customer’s file and records the transaction in ATM transaction log .
Priority
Non-Functional
Requirements
Assumptions
Source
131
ATM System’s CRUD Matrix
Use Case
Deposit
Funds
Transfer
Funds
Inquire
Balances
C
U
U
U
Customer File
R,U
R,U
R,U
R,U
Customer Account
R,U
U
R,U
R
C
U
U
Data Object
ATM
ATM Transaction Log
Customer Transactions
Withdraw
Funds
R,U
132
Descargar

Summer Class 3 - American University