Data Modeling and Database Design
Team
Minder Chen, Ph.D.
Team number
Specialty
Customer
member
is a member of
Employee
Division
Division number
Division name
Division address
belongs to
Customer
Customer
Customer
Customer
Customer
Customer
Employee number
First name
Last name
Employee function
Employee salary
subcontract
staffed by
is assigned to
Project
Task
Task name
Task cost
number
name
address
activity
telephone
fax
contains
Project number
Project name
Project label
Start date
End date
Rationales for Data Modeling
• Data is the foundation of modern information
systems enabled by data base technologies.
• Data in an organization exist and can be described
independently of how these data are used.
• Data should be managed as a corporate-wide
resource.
• The types of data used in an organization do not
change very much.
• Data have certain inherent properties which lead to
correct structuring.
• If we structure data according to their inherent
properties, the structure (i.e., data models) will be
stable.
© Minder Chen, 1993~2006
Data Modeling - 2 -
History of Data Modeling
• Importance of Entity-Relationship Modeling Technique
–
–
–
–
–
Database
Data modeling and enterprise-wide data
Data quality
Data updating and accessing tools and procedure
Data sharing culture
• ER modeling technique was first developed by Peter Chen
in 1976
–
–
–
A conceptual/logical data modeling tool
A user-oriented approach
A graphic-based method
• ER modeling technique is the major data modeling method
in Information Engineering and is widely supported by
most of CASE tools.
• Data modeling is the foundation of most database-centered
transaction processing systems and data warehouse
systems
© Minder Chen, 1993~2006
Data Modeling - 3 -
C/S Development Methodology
SDLC
C/S
Architecture
performance =>
rules=>
Conceptual
Analysis
Logical
Design
Physical
Design
Work
Flow
Form
Sequences
Forms,
Screens
Application
Logic
Process
Flow
Object
Interaction
Model
Programs,
Procedures
Information
& Data Base
Data
Model
Database
Schema
Tables,
Indexes
User
Interface
Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.
Data Modeling - 4 © Minder Chen, 1993~2006
Client/Server Application Development Methodology
Requirements
Information
& Data Base
Processes
Behavior
Workflow
User Interface
Architecture
Application
Design and
Development
Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.
© Minder Chen, 1993~2006
Data Modeling - 5 -
Multiple Perspectives
We use
this data
DATA
ONE
BUSINESS
ACTIVITY
HIRE
EMPLOYEE
PAY
EMPLOYEE
......
......
....
....
EMPLOYEE
© Minder Chen, 1993~2006
We do
these things
PROMOTE
EMPLOYEE
FIRE
EMPLOYEE
Data Modeling - 6 -
Data Model (Entity Relationship Diagram)
Member
Order
sells;
is sold on
Product
placed by;
places
Member
is enrolled under;
applies to
established by;
established
generates;
generated by
is featured in;
features
© Minder Chen, 1993~2006
Agreement
Promotion
sponsors;
is sponsored by
Club
Data Modeling - 7 -
Entity Types
• Definition:
– An entity is an object or event, real or abstract, about
which we would like to store data. Entity is the
abbreviation of entity type. It represent a set of
entity instances which can be described by the
same set of attribute types. The value of the same
attribute for each entity instance may be different.
• Identifying Entity Types
– What information is required by the business?
– Things that are of interest to the business that need
to be remembered in order to manage and track
them.
– Things belong to the same entity type have common
characteristics.
© Minder Chen, 1993~2006
Data Modeling - 8 -
Naming Entity Types
• The name of each entity is in singular form
–
–
–
–
a noun
an adjective + a noun
a noun + a noun => (noun string)
an adjective + a noun + a noun
• Examples
– Customer, Customer Order, Product, Hourly Employee, Project,
Department, Unfilled Customer Order
•
•
•
•
Be clear and concise
Avoid abbreviation
Be consist with user’s terminology
Identify synonyms
–
–
–
–
Customer
Product
Supplier
Teacher
Client
Merchandise
Vendor
Faculty
• Use one name as the official name and document others
as aliases
© Minder Chen, 1993~2006
Data Modeling - 9 -
Exercise: Entity Type Naming
• Courses
• Department
• Customer Order
• PO
© Minder Chen, 1993~2006
Data Modeling - 10 -
Properties of Entity Types
•
•
•
•
•
•
•
•
Name
Description
Identifier
Properties: Estimated number (Max., Min.,
Average) of entity instances
Expected growth rate of entity instances
Subject Area in which the Entity Type
resides
Attributes that describe the Entity Types
Examples of entity type instances
© Minder Chen, 1993~2006
Data Modeling - 11 -
Definition of an Entity Type
• A poor definition of Customer: Anyone
that buys something from the company.
– Can employees be a customer?
– Can a leaser be a customer?
– If the company sold a subsidiary to another
company, does the new owner consider a
customer?
• Good definition should be:
–
–
–
–
–
Compatible
Precise
Concise
Clear
Complete
© Minder Chen, 1993~2006
Data Modeling - 12 -
Good Definition
• Compatible
– Customer: An ORGANIZATION that purchase
PRODUCTs for personal use.
– Distributor: An ORGANIZATION that purchase
PRODUCTs for resale.
• Precision:
– With appropriate qualifiers
– Example: An ORGANIZATION is considered to have
purchase a PRODUCT when we receive a valid
PURCHASE ORDER from it.
• Complete
– ORGANIZATION, PRODUCT, PURCHASE ORDER
need to be defined.
• Concise and Clear
– Use modular definition
© Minder Chen, 1993~2006
Data Modeling - 13 -
Example of Entity Type Descriptions
Entity Type
Description
Customer
Information about all persons or organizations who
purchases
Product
All goods manufactured and sold
Raw-material
Components used to manufacture Products.
Supplier
Vendors of Raw Materials.
Buyer
Company personnel responsible for purchasing
Raw-Materials from Suppliers
© Minder Chen, 1993~2006
Data Modeling - 14 -
Entity Type and Entity Instance (Occurrence)
Entity Types
Vendor
Employee
Course
Department
© Minder Chen, 1993~2006
Entity Instance
ABC Co.
John Smith
Intro. to IE
Marketing Department
Data Modeling - 15 -
Exercise: Entity Types or Entity Instances?
• Maryland
• Organization Unit
• Customer
• President
• Bill Clinton
• Department of Commerce
• Address
© Minder Chen, 1993~2006
Data Modeling - 16 -
Finding Entity Types
•
•
•
•
•
•
Interviews with users
JAD workshops
Business forms
Reports
Computer files using reverse engineering
Operation manuals
© Minder Chen, 1993~2006
Data Modeling - 17 -
Where to Look for an Entity Type?
• Tangible or Intangible Things
– The nouns that are used to describe the problem domain will often
correspond to the major Entity Types of the system, at least at a
high level.
– Examples: Product, Sensor, and Employee, Department, and Sale
Office.
• Resources
– Any resources that an organization needs to manage should be
represented as an Entity Type. Information assists the efficient
and effective use of other resources through improved decision.
– Examples: Inventory, Machine, Bank Account, and Customer.
• Roles Played
– Roles can be played by persons or organizational units.
– Examples: Customers, Managers, and Account representatives.
• Events
– Events are incidents that occur at points in time. An event often
involved an interaction between two Entity Types or an action that
changes the status of an Entity Type.
– Examples: Sale, Delivery, and Registration of a motor vehicle.
© Minder Chen, 1993~2006
Data Modeling - 18 -
BIAIT:
Business Information Analysis and Integration Technique
•
•
•
•
Analysis of Orders
Ordered entities can be a thing, a space, or a skill.
View the order from supplier side.
If an organization receives no orders, it has no reason
for existing.
• An organization unit can receive multiple types of
orders.
• 4 questions about the Supplier:
–
–
–
–
Billing (Cash)?
Deliver Late (Immediate)?
Profile customer?
Negotiate price (Fixed)?
• 3 questions about the Ordered Entity:
– Rented (Sold)?
– Tracked?
– Made to order (Stock)?
Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.
© Minder Chen, 1993~2006
Data Modeling - 19 -
Criteria for Evaluating an Entity Type
• Need to be remembered by the information system in order
to be functional.
• Can be operated on: CREATE, READ, UPDATE, DELETE.
• Has a set of operations/services that always apply to
change the status of each occurrence of an Entity Type.
• Carry a set of attributes that always apply to describe each
occurrence of an Entity Type.
• Have at least one relationship with other entity type.
• Exist more than one entity occurrence (instance) in an
Entity Type.
• Have at least a unique identifier.
• Domain-based requirements: Something that the system
must have in order to operate. These may be clearly
specified in the problem description or known from subject
matter experts.
© Minder Chen, 1993~2006
Data Modeling - 20 -
Entity Relationship Modeling and Diagramming
• Relationships
• Entity Relationship Diagramming
Notation
• Attributes
• Identifiers
• Partitioning and Entity Subtypes
© Minder Chen, 1993~2006
Data Modeling - 21 -
Relationship (Type)
• Definition
– A Relationship Type is an association among Entity
Types. It indicates that there is a business
relationship between these Entity Types.
– Relationship Membership is the participation of an
Entity Type in a Relationship.
– In IE, a Relationship Type can involve only two Entity
Types (binary relationship). Some other modeling
techniques allow n-ary relationships.
• Examples
–
–
–
–
CUSTOMER places ORDER
ORDER is placed by CUSTOMER
EMPLOYEE works on PROJECT
PROJECT has project member EMPLOYEE
© Minder Chen, 1993~2006
Data Modeling - 22 -
Paring (Relationship Instance)
• Relationship paring is a pair of Entity Instances of two
Entity Types associated by a Relationship Type between
these two Entity Types.
Entity Types
Entity Instance
Student
Student#1
Student#2
Course
Course#A
Course#B
Course#C
Course#D
Relationship
Student
takes
Course
© Minder Chen, 1993~2006
Relationship Paring
Student#1 takes Course#A
Student#1 takes Course#B
Student#1 takes Course#D
Student#2 takes Course#A
Student#2 takes Course#C
Student#2 takes Course#D
Data Modeling - 23 -
Relationship Instances Grouping
• Definition: A collection of pairings of
a Relationship Membership in which
an Entity Instance is involved.
• Examples:
– Student#1 takes Course#A, #B, and #D
– Student#2 takes Course#A, #C, and #D
– Course#A is taken by Student#1 and
Student#2
© Minder Chen, 1993~2006
Data Modeling - 24 -
Relationship Cardinality
One-to-One
E1
E2
E1
E2
1:1
One-to-Many
1:M
Many-to-Many
E1
E2
M:N
© Minder Chen, 1993~2006
Data Modeling - 25 -
Relationship Cardinality
• The number of Entity Instances involved in the Relationship
Instances Grouping in a Relationship Type.
• Three Forms of Cardinality
1. One-to-one (1:1)
DEPARTMENT has MANAGER
Each DEPARTMENT has one and only one MANAGER
Each MANAGER manages one and only one DEPARTMENT
2. One-to-many (1:m)
CUSTOMER places ORDER
Each CUSTOMER sometimes (95%) place one or more ORDERs
Each ORDER always is placed by exactly one CUSTOMER
3. Many-to-many (m:n)
INSTRUCTOR teaches COURSE
Each INSTRUCTION teaches zero, one, or more COURSEs
Each COURSE is taught by one or more INSTRUCTORs
© Minder Chen, 1993~2006
Data Modeling - 26 -
Entity Relationship Diagram (ERD): Notations
Graphical Notations
Cardinality
indicator
zero
one
many
relationship-description
Entity-X
reversed-relation-description
Entity-Y
min max
Translate into two structured statements
Each Entity-X relationship-description cardinality-indicator (one-or-many) Entity-Y
Each Entity-Y reversed-relationship-description (zero-or-one) Entity-Y
Example
is-managed-by
Department
© Minder Chen, 1993~2006
manages
Manager
Data Modeling - 27 -
Optionality of Relationship Memberships
• Whether all entity instances of both entity
types need to participate in relationship
pairing.
• Optionality:
– Mandatory
– Optional
• Example:
– CUSTOMER membership is optional
– ORDER membership is mandatory
places
CUSTOMER
© Minder Chen, 1993~2006
is placed by
ORDER
Data Modeling - 28 -
Relationship Statements
Cardinality
indicator
one
one or more
Graphical Notations
places
CUSTOMER
is placed by
Optionality
indicator
ORDER
zero (sometimes)
one (always)
Each Entity X optionality relationship cardinality Entity Y
Each CUSTOMER sometimes places one or more ORDER.
Each ORDER always is placed by one CUSTOMER.
© Minder Chen, 1993~2006
Data Modeling - 29 -
Defining Relationships
• Name
• Description
• Property
– Cardinality volumes
– Optionality percentage: % of Entity Type X's
instances pairing with Entity Type's Y's
instances
– Transferability: A relationship is transferable if
an entity instance can change its pairing within
the same relationship.
» TRANSFERABLE: An EMPLOYEE can change to a
different DEPARTMENT.
» NON-TRANSFERABLE: An ORDER cannot be
transferred to another CUSTOMER.
© Minder Chen, 1993~2006
Data Modeling - 30 -
ERD: More Examples
(a)
Customer
Product
(b)
places
Order
belongs-to
is-contained-in
contains
manages
Employee
is-managed-by
works-for
Parallel
Relationship
Project
has-project-members
is-consists-of
(c)
Part
contained-in
© Minder Chen, 1993~2006
Involuted or Looped
Relationship
Data Modeling - 31 -
Identifying Relationships
• Association between entity types
• Entity types that are used on the
same forms or documents.
• A description in a business document
that has a verb that relates two entity
types
– has
– consists of
– uses
© Minder Chen, 1993~2006
Data Modeling - 32 -
Attributes
• Definition
– Characteristics that could be used to describe Entity Types and
Relationship Types. However, in IE, relationship types are not
allowed to have attributes.
• Naming Conventions:
– Names that have business meaning
– Don't use abbreviation or possessive case, e.g., PN and
Customer's name
– Don't include entity type name because IEF will prefix the attribute
name with entity type name automatically
– Use standard format:
Entity Type Name
(Qualifiers)
Domain Name
Customer
Name
Employee
Starting
Date
• Examples
–
–
–
–
Customer has customer name, address, and telephone number
Product has quantity-on-hand, weight, volume, color, and name.
Employee has SSN, salary, and birthday.
Employee-works-for-project has percentage-of-time, starting-date.
© Minder Chen, 1993~2006
Data Modeling - 33 -
Attributes: Notations
Student
Student ID
Student Name
Birth date
Student ID
Course no.
enrollment
Em ploy ee
Em ploy ee num ber
Fi rst name
Last name
Em ploy ee function
Em ploy ee salary
Student
studentID
name
phone
Birth date
Student(Student ID, Student Name, Birth Date)
Finding Attributes:
Attributes are identified progressively during BAA phase.
• Data Analysis
• Activity Analysis
• Interaction Analysis
• Current Systems Analysis
© Minder Chen, 1993~2006
Data Modeling - 34 -
Attribute Value
• Definition
– Attribute Values are instances of Attributes used to describe
specific Entity Instances
• Examples
–
–
–
–
–
Customer Number: 011334
Customer Name: Minder Chen
State: VA
Order Total: $23,000
Sale tax: $250
• An attribute of an entity type should have only one value
at any given time. (No repeating group)
• Avoid using complex coding scheme for an attribute.
For example: PART Number: X-XXX-XXX
Part Type
© Minder Chen, 1993~2006
Material
Sequence Number
Data Modeling - 35 -
Type & Instance
OBJECT TYPE
Entity Type
Entity
Entity Type
OCCURRENCE
Entity Instance
Entity Instance
Entity
Relationship (Type)
Pairing (Relationship Instance)
Attribute (Type)
(Attribute) Value
© Minder Chen, 1993~2006
Data Modeling - 36 -
Attribute Source Categories
• Basic
– Definition: An Attribute Value that cannot be deduced
or calculated.
– Examples: Student name and Birthday
• Derived
– Definition: The Attribute Value can be calculated or
deduced from relationship Groupings or from the
values of other Attributes. The value of a Derived
Attribute changes constantly.
– Examples: Student Age, Account Balance, Number of
courses taken.
• Designed
– Definition: The Attribute is created to overcome the
system constraints. The value of a Designed
Attribute does not change.
– Examples: Student ID, Course number.
© Minder Chen, 1993~2006
Data Modeling - 37 -
Properties of Attributes
•
•
•
•
•
•
•
Name
Description
Attribute Source Category: Basic, Derived, Designed
Domain or data type: Text, Number, Date, Time, Timestamp
Optionality: Mandatory or optional
Length and/or precision
Permitted Values (Legal Values)
–
–
Ranges
A set of values (Code Table)
• Default value or algorithm
Tools such as PowerBuilder has additional properties for
table’s columns called extended attributes
– Validation Rule
– Editing Format
– Reporting Format
© Minder Chen, 1993~2006
– Column Heading
– Form Label
– Code Table
Data Modeling - 38 -
Data Modeling Case Study
The following is description by a pharmacy owner:
"Jack Smith catches a cold and what he suspects is a
flu virus. He makes an appointment with his family
doctor who confirm his diagnosis. The doctor
prescribes an antibiotic and nasal decongestant
tablets. Jack leaves the doctor's office and drives to
his local drug store. The pharmacist packages the
medication and types the labels for pill bottles. The
label includes information about customer, the doctor
who prescribe the drug, the drug (e.g., Penicillin),
when to take it, and how often, the content of the pill
(250 mg), the number of refills, expiration date, and the
date of purchase."
Please develop a data model for the entities and relationships
within the context of pharmacy. Also develop a definition
for "prescription". List all your underlying assumptions
used in your data models.
© Minder Chen, 1993~2006
Data Modeling - 39 -
Data Modeling Process
• List entity types
• Create relationships
– Pick a central entity type
– Work around the neighborhood
» Add entity types to the diagram
» Build relationships among them
– Determine cardinalities of relationships
• Find/Create identifiers for each entity type
• Add attributes to the entity type in the data
model
• Analyze and revise the data model
© Minder Chen, 1993~2006
Data Modeling - 40 -
Classifying Attribute and Partitioning
• An Entity Subtype A collection of Entities of the same
type to which a narrower definition and additional
Attributes and Relationships apply. An Entity Subtype
inherits (retains) all the Attributes and Relationships of
its parent Entity Type.
• Classifying Attribute: An attribute of the Base Entity Type
whose values partition the Entity Instances into
Subtypes.
• Partitioning: A basis for subdividing one entity type into
subtypes. The process of dividing an Entity Type into
several Subtypes based on a Classifying Attribute is
called Partitioning.
• The Classifying Attribute is recorded as a property of the
Partitioning and it appears on the diagram.
© Minder Chen, 1993~2006
Data Modeling - 41 -
Normalization
• A data base is a model or an image of the
reality.
• Logical Data Base Design is a process of
modeling and capturing the end-user
views of an application domain and
synthesis them into a data base structure.
• Normalization is a logical data base design
method.
• The basis for normalization is the
functional dependencies among attributes
in a table.
© Minder Chen, 1993~2006
Data Modeling - 42 -
SQL Terminology
Column
Product Table
Row
p_no
product_name
quantity
price
101
Color TV
24
500
201
B&W TV
10
250
202
PC
5
2000
Create a table in SQL
CREATE TABLES
(p_no
CHAR(5) NOT NULL,
product_name
CHAR(20),
quantity
SMALLINT,
price
DECIMAL(10, 2));
© Minder Chen, 1993~2006
Data Modeling - 43 -
SQL Terminology
Set Theory
Relational DB
File
Example
Relation
Table
File
Product_table
Attribute
Column
Data item
Product_name
Tuple
Row
Record
Product_101's info.
Domain
Pool of legal values
Data type
DATE
© Minder Chen, 1993~2006
Data Modeling - 44 -
SQL Principles
• The result of a SQL query is always a table (View
or Dynamic Table)
• Rows in a table are considered to be unordered
• Dominate the markets since late 1980s
• Can be used in interactive programming
environments
• Provide both data definition language (DDL) and
data manipulation language (DML)
• A non-procedural language
• Can be embedded in 3GL:
– Embedded SQL
– Dynamic SQL
© Minder Chen, 1993~2006
Data Modeling - 45 -
SQL: Data Definition Language (DDL)
CREATE
DROP
ALTER
© Minder Chen, 1993~2006
TABLE
VIEW
INDEX
DATABASE
TABLE
Data Modeling - 46 -
SQL: Introduction
• A relational data base is perceived by its users
as a collection of tables
• E. F. Codd 1969
• Dominate the markets since late 1980s
• Strengths:
– Simplicity
– End-user orientation
– Standardization
– Value-based instead of pointer-based
– Endorsed by major computer companies
• Most CASE products support the development
of relational data base centered applications
© Minder Chen, 1993~2006
Data Modeling - 47 -
SQL: Data Manipulation Language (DML)
SELECT
UPDATE
INSERT
DELETE
p_no
101
201
202
product_name
Color TV
B&W TV
PC
quantity
24
10
5
price
500
250
2000
The Generic Form of the SELECT Statement
SELECT [DISTINCT] column(s)
FROM table(s)
[WHERE conditions]
[GROUP BY column(s) [HAVING condition]]
[ORDER BY column(s)]
© Minder Chen, 1993~2006
Data Modeling - 48 -
Database Table
• The following code retrieves only the Last Name and the
Employee ID where the Employee ID is greater than 5. The
records are retrieved in descending order.
SELECT LastName, EmployeeID
FROM Employees
WHERE EmployeeID > 5
ORDER BY EmployeeID DESC
© Minder Chen, 1993~2006
Data Modeling - 49 -
WHERE Clause
• WHERE: Use the Where clause to limit the
selection. The # symbol indicates literal date
values.
SELECT * FROM Employees
WHERE LastName = "Smith"
SELECT Employees.LastName FROM Employees
WHERE Employees.State in ('NY','WA')
SELECT OrderID FROM Orders
WHERE OrderDate BETWEEN #01/01/93# AND
#01/31/93#
© Minder Chen, 1993~2006
Data Modeling - 50 -
Keys
• A key, also called identifier, is an Attribute or a
Composite Attribute that can be used to
uniquely identify an instance of an entity type.
• Examples:
Entity Type
Key
Warehouse
Product
Student
Ship
Warehouse Number
Product Number
Student ID or SSN
Name and Port of Registration
Stock of Product
Product Number and Warehouse No.
© Minder Chen, 1993~2006
Data Modeling - 51 -
Types of Key
• Primary Key: A unique key is an attribute or a
set of attributes that has been used by the DBMS
as the identifier of a table.
• Candidate (Alternative) Key: An attribute or a set
of attributes that could have been used as the
primary key of a table.
• Secondary (Index) Key: An attribute or a set of
attributes that has been used to construct the
data retrieval index.
• Concatenated (Combined or Composite) Key: A
set of attributes that has been used as the key.
• Foreign Key: An attribute or a set of attributes
that is used as the primary key in another table.
© Minder Chen, 1993~2006
Data Modeling - 52 -
Purposes of Normalization
• Avoid maintenance problems such as
Update .
• Insert: There may be no place to insert new
information.
• Delete: Some important information will be
lost by deletion.
• Update: Inconsistency may occur because
of the existence of data redundancy.
• Provide maximum flexibility to meet future
information needs by keeping tables
corresponding to object types in their
simplified forms.
© Minder Chen, 1993~2006
Data Modeling - 53 -
A Common Sense Approach to Normalization
• Don't rush to put all the information in one
table.
• Create a table to correspond to a class of
a simple object type that should exist by
itself, i.e., "one fact in one place."
• Include common fields (links) as ways of
joining information from several related
tables.
• Avoid redundancy by using links to
retrieve data from related tables.
© Minder Chen, 1993~2006
Data Modeling - 54 -
Normalization Theory
• Normalization is a process of systematically
breaking a complex table into simpler ones.
• It is built around the concept of normal forms.
• A relation is in a particular normal form if it
satisfies a specific set of constraints such as
dependencies among attributes in the relation.
• For x is an integer and x > 1,
if a relation is in x-NF than it is in (x-1)-NF.
• Higher order normal forms are usually more
desirable than lower order normal forms.
• Normalization process usually starts from
complex relations which are usually drawn
from some existing documents such as
business forms.
© Minder Chen, 1993~2006
Data Modeling - 55 -
A Business Form
© Minder Chen, 1993~2006
Data Modeling - 56 -
An Informal Example of Normalization
• A CUSTOMER ORDER contains the following
information:
–
–
–
–
–
–
–
–
OrderNo
OrderDate
CustNo
CustAddress
CustType
Tax
Total
one or more than one Order-Item which has
»
»
»
»
»
ProductNo
Description
Quantity
UnitPrice
Subtotal.
© Minder Chen, 1993~2006
Data Modeling - 57 -
Solution
Unnormalized table
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total,
1{ProductNo, Description, Quantity, UnitPrice,Subtotal}n)
Remove repeating group
1st NF
(OrderNo, ProductNo, Description, Quantity, UnitPrice, Subtotal)
Remove partial FD
2nd NF
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax, Total)
Remove transitive FD
(OrderNo, ProductNo, Quantity, UnitPrice, Subtotal)
(ProductNo, Description, UnitPrice)
3rd NF
© Minder Chen, 1993~2006
(OrderNo, OrderDate, CustNo, Tax, Total)
(CustNo, CustAddress, CustType)
Data Modeling - 58 -
Unnormalized Form
• A relation that has multi-valued attributes (repeating
groups).
• Normalization Process: Remove Multi-value Attributes
• If an unnormalized relation R has a primary key K and a
multi-value attribute M, the normalization process is:
– The multi-value attribute M should be removed from R.
– A new relation will be created with (K,M) as the primary key of
the relation.
– There may be some other attributes associated with this new
relation.
– R will then be at least in 1NF.
• Example: An Employee relation has an attribute
language-spoken. For some employees there may be
more than one language that they can speak.
EMP (employeeID, empName, empAddress, (language1, language2, ...))

EMP (employeeID, empName, empAddress)
EMP-LANGUAGE (employeeID, language, skillLevel)
© Minder Chen, 1993~2006
Data Modeling - 59 -
How Do You Remove the Repeating Groups?
CREATE TABLE MEM_CONDITION (
MEMBER#
VARCHAR2(12)
CASE#
VARCHAR2(16)
DIAG_ARRAY_1
VARCHAR2(6)
DIAG_ARRAY_2
VARCHAR2(6)
DIAG_ARRAY_3
VARCHAR2(6)
DIAG_ARRAY_4
VARCHAR2(6)
DIAG_ARRAY_5
VARCHAR2(6)
DIAG_EX_ARRAY_1
VARCHAR2(2)
DIAG_EX_ARRAY_2
VARCHAR2(2)
DIAG_EX_ARRAY_3
VARCHAR2(2)
DIAG_EX_ARRAY_4
VARCHAR2(2)
DIAG_EX_ARRAY_5
VARCHAR2(2)
DRUG_ARRAY_1
VARCHAR2(12)
DRUG_ARRAY_2
VARCHAR2(12)
DRUG_ARRAY_3
VARCHAR2(12)
DRUG_ARRAY_4
VARCHAR2(12)
DRUG_ARRAY_5
VARCHAR2(12)
LC_ARRAY_1
VARCHAR2(4)
LC_ARRAY_2
VARCHAR2(4)
LC_ARRAY_3
VARCHAR2(4)
LC_ARRAY_4
VARCHAR2(4)
LC_ARRAY_5
VARCHAR2(4)
MEM_REVIEW
VARCHAR2(4)
OP#
VARCHAR2(4)
© Minder
Chen, 1993~2006
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NOT
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
Data Modeling - 60 -
Functional Dependency
• Notation: R.X => R.Y
• Definition: Attribute Y of Relation R is
functionally dependent on the
Attribute X of Relation R when there
is each value of R.Y associated with
no more than one value of R.X. R.X
and R.Y may be composite attributes.
• Description:
– R .Y is functionally dependent on R.X
– R.X functionally determines R.Y
© Minder Chen, 1993~2006
Data Modeling - 61 -
Full & Partial Dependency
• R.A => R.B
• If B is not functionally dependent on
any subset of A (other than A itself), B
is fully dependent on A in R.
• If B is functionally dependent on a
subset of A (other than A itself), B is
partially dependent on A in R.
© Minder Chen, 1993~2006
Data Modeling - 62 -
First Normal Form (1NF)
• A relation R is in the first normal form (1NF) if and only if all
attributes of any tuple in R contain only atomic values.
• Normalization Process:
– Remove Partial Functional Dependencies
– If R is in 1NF and has a composite primary key (K1,K2), an attribute
P is functionally dependent on K1 (K1 => P) (i.e., P is partially
dependent on (K1, K2)), the normalization process is:
– The attribute P should be removed from R and a new relation will
be created with K1 as the primary key and P as a non-key attribute.
– A relation that is in 1NF and not in 2NF must have a composite
primary key.
• Example
– Supplier-Part relation has attributes supplier#, part#, qty, city,
distance, where (supplier#, part#) is the key.
– City is partially dependent on supplier#.
SUPPLIER-PART (supplier#, part#, qty, city, distance)

SUPPLIER-PART (supplier#, Part#, qty)
SUPPLIER (supplier#, city, distance)
© Minder Chen, 1993~2006
Data Modeling - 63 -
Non-loss Decomposition
• Normalization is a reduction (decomposition)
process that replaces a relation by suitable
projections. Each of the projection is a new
relation that is in a further normalized form than
the original relation. The collection of
projections is equivalent to the original relation.
• The original relation can always be recovered by
taking the natural join of these projections.
• Any information that can be derived from the
original relation can also be derived from the
further normalized relations. The converse is not
true.
• The process is reversible because no
information is loss in the reduction process.
© Minder Chen, 1993~2006
Data Modeling - 64 -
Transitive Dependency
In a relation R,
if R.A =>R.B and R.B => R.C
then attribute C is said to be transitively
dependent on attribute A.
© Minder Chen, 1993~2006
Data Modeling - 65 -
Second Normal Form (2NF)
• A relation R is in the second normal form (2NF) if and
only if it is in 1NF and every non-key attribute is fully
dependent on the primary key.
• Normalization Process: Remove Transitive
Dependencies
• If R is in 2NF and has two non-key attributes A1 and A2
where A2 is functionally dependent on A1 (A1 => A2).
The A2 should be removed from R and a new relation
will be created with A1 as the primary key and A2 as a
non-key attribute.
• Example
– Supplier relation has attributes supplier#, city, distance, where
supplier# is the key and distance to a supplier can be
determined by the city of the supplier.
SUPPLIER (supplier#, city, distance, quality_level)

SUPPLIER (Supplier#, city, quality_level)
CITY-DISTANCE (city, distance)
© Minder Chen, 1993~2006
Data Modeling - 66 -
Third Normal Form (3NF)
• A relation R is in the third normal form (3NF) if
and only if the non-key attributes (if there is any)
are fully dependent on the primary key of R (i.e.,
R is in its 2NF) and are mutually independent.
• Heuristic to Check Whether a Relation Is in 3NF
– All the non-key attributes (which are not multi-value
attributes) are dependent on the (primary) key, the
whole key, and nothing but the key.
Explanation
• All the non-key attributes have atomic value and dependent on the key
(1NF - No multi-value attribute),
• the whole key, (2NF - No Partially Functional Dependency)
• and nothing but the key (3NF - No Transitive Functional Dependency)
© Minder Chen, 1993~2006
Data Modeling - 67 -
Normalization Process
Unnormalized Form
B
A
C
D
E
F
G
H
remove repeating groups
1NF
F
A
G
B
A
H
remove partial dependencies
2NF
C
D
E
remove transitive dependencies
3NF
3NF
3NF
A
F
G
F
© Minder Chen, 1993~2006
H
D
3NF
A
B
C
E
D
Data Modeling - 68 -
Normalization: Pros and Cons
• Pros
– Reduce data redundancy & space required
– Enhance data consistency
– Enforce data integrity
– Reduce update cost
– Provide maximum flexibility in responding ad hoc queries
• Cons
– Many complex queries will be slower because joins have to be
performed to retrieve relevant data from several normalized
tables
– Programmers/users have to understand the underlying data
model of an database application in order to perform proper
joins among several tables
– The formulation of multiple-level queries is a nontrivial task.
© Minder Chen, 1993~2006
Data Modeling - 69 -
Join Two Tables
SELECT Categories.CategoryName, Products.ProductName
FROM Categories, Products
WHERE Products.CategoryID = Categories.Category ID
© Minder Chen, 1993~2006
Data Modeling - 70 -
Tables in Relational DB
• Identify Primary Keys and Foreign Keys in the
following Tables!!!
ID
ID
ID
© Minder Chen, 1993~2006
Data Modeling - 71 -
Join Tables
SELECT Orders.OrderID, Orders.CustID,
LastName, Firstname, Orders.ItemID, Description
FROM
Customer, Orders, Inventory
WHERE Customer.CustID = Orders.CustID AND
Orders.ItemID = Inventory.ItemID
ORDER BY CustID, Orders.ItemID
ID
© Minder Chen, 1993~2006
ID
Data Modeling - 72 -
Foreign Keys & Primary Keys in a Sample Access Database
© Minder Chen, 1993~2006
Data Modeling - 73 -
An Example of a Complex Query
Please list name and phone number of customers
who have ordered product number 007.
SELECT customer_name, customer_phone
FROM customer
WHERE customer_number IN
SELECT customer_number
FROM order
WHERE order_no IN
SELECT order_no
FROM orderItem
WHERE product_number = 007
© Minder Chen, 1993~2006
Data Modeling - 74 -
Denormalization
• The process of intentionally backing away from
normalization to improve performance. Denormalization
should not be the first choice for improving performance
and should only be used for fine tuning a database for a
particular application.
• Requirements
– Prior normalization
– Knowledge of data usage
• Benefits
–
–
–
–
Minimize the need for joins
Reduce number of tables
Reduce number of foreign keys
Reduce number of indices
• Knowledge of Data Usage
–
–
–
–
–
How often are two data items needed together
How many rows are involved
How volatile is denormalized data
How important is visibility of data to users
What is the minimum response time and frequency of an query
© Minder Chen, 1993~2006
Data Modeling - 75 -
De-normalization: An Example
JOIN
R1
R2
Denormalization
R1 * R 2
R2
• Where:
– R1 (ProductNo, SupplierNo, Price)
– R2 (SupplierNo, Name, Address, Phone)
– R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)
• R2 should be kept to prevent data loss.
• Data redundancy in R1*R2 and R2 could cause potential
data inconsistency problems if the redundant data in
these two tables are not maintained properly.
© Minder Chen, 1993~2006
Data Modeling - 76 -
Data Model Refinement and Transformation
•
•
•
•
•
Data Model Refinement
Associative Entity Type
Removing Many-to-Many Relationships
Keys
Transformation to Relational Databases
© Minder Chen, 1993~2006
Data Modeling - 77 -
Refinement of a Data Model: Analysis and Simplification
•
•
•
•
•
•
•
Isolated Entity Type
Solitary Entity Type
One-to-One Relationship
Redundant Relationship
Multi-Valued Attributes
Attribute with Attributes
Many-to-Many Relationship
© Minder Chen, 1993~2006
Data Modeling - 78 -
Isolated Entity Type
• An Entity Type that does not participate in a
Relationship.
• Since every Entity Type should participate in at
least one Relationship, there exist two
alternatives:
– Identify a relevant Relationship
– Remove the Entity Type from the model
© Minder Chen, 1993~2006
Data Modeling - 79 -
Solitary Entity Type
• An Entity Type that has only one Entity Instance.
Examples: Computer Center, Sales Tax, and Current
Order Number. Solitary Entity Types may be too
restrictive.
• Alternatives:
– Introduce another Entity Type with a wider scope.
– Computer Center ==> Organization Unit
– Define it as an Attribute of an Entity Type.
– Sales Tax ==> Sales Tax of Order
– Define it as a data element in an parameter table. A parameter
table has only one row.
– Current Order Number ==> Current Order Number of Parameter
Table
© Minder Chen, 1993~2006
Data Modeling - 80 -
Evaluate One-to-One Relationship
• It may be an unnecessary relationship between
two Entity Types if they have the same attribute
and relationships (i.e., they are identical).
• It should be then combined into one Entity Type.
Maybe Incorrect
Purchase
Request
becomes
has request
Purchase
Order
Correct
Purchase
Order
© Minder Chen, 1993~2006
Data Modeling - 81 -
Redundant Relationship
Is this relationship redundant?
has ordered
product
customer
is ordered by
places
ORDERS
is placed by
contains
has
order
is part of
order
item
Differences in timing of an entity type in its life cycle:
• Implemented as separate entity types or use subtypes
• Use value of attributes or additional attributes to differentiate them
© Minder Chen, 1993~2006
Data Modeling - 82 -
Redundant Relationship
Redundant
Product
stocks
Warehouse
is held as
holds
Stock
contains
is held in
Non-redundant
Product
is contained in
contains
Order Line
is contained in
contains
Order
is placed by
is contained in
contains
© Minder Chen, 1993~2006
Order History
places
is contained in
contains
Customer
Data Modeling - 83 -
Multi-Valued Attribute
• Definition
–
An Attribute that may have more than one value at a time is called a
multi-valued attribute.
• Solution:
–
Create an Entity Type for the multi-valued attribute
• Example:
–
Languages spoken by an Employee
–
–
Employee(ID, Name, Phone, Languages)
Employee(111, “John Smith”, 201-999-8888, (English, Chinese))
–
–
Employee(ID, Name, Phone)
Employee(111, “John Smith”, 210-999-8888)
–
–
–
Employee_language(ID, Language)
Employee_language(111, English)
Employee_language(111, Chinese)
© Minder Chen, 1993~2006
Data Modeling - 84 -
Attribute with Attributes
• An Attribute that can be described by other
Attributes is called an attribute with
attributes.
• Example:
– College Degree by an Employee
– (John Smith has a College Degree in Computer
Sciences from George Mason University)
• Solution:
– Create an Entity Type to avoid an Attribute with
Attributes.
– Add new attributes to the existing Entity Type.
© Minder Chen, 1993~2006
Data Modeling - 85 -
Associative Entity Type
• An Associative Entity Type is an Entity Type
whose existence is meaningful only if it
participates in several (>=2) Relationship Types
at the same time.
• Associative Entity Types are often introduced to
represent additional information in many-tomany Relationships or to decompose a many-tomany Relationship into two one-to-many
Relationships.
• Associative Entity Types are also used to
represent n-ary Relationships in a binary data
model.
© Minder Chen, 1993~2006
Data Modeling - 86 -
Remove Many-to-Many Relationship
Given
Order
contains
belongs-to
Product
Why?
• There is no place to attach Attributes that are required to describe a
many-to-many Relationship.
• It is difficult to translate many-to-many Relationships into relational
tables automatically.
How?
A many-to-many relationship can be decomposed into two
one-to-many Relationships by creating an Associative Entity
Type between the existing two Entity Types.
has
contains
Order
belongs to
© Minder Chen, 1993~2006
Order Line
Product
is contained in
Data Modeling - 87 -
Remove Many-to-Many Relationships: Exercises
Remove the many-to-many relationship from the
following ER diagrams
(a)
Product
(b)
Student
(c)
has-sources
offers
takes
is-taken-by
Supplier
Course
consists-of
Part
is-contained-in
© Minder Chen, 1993~2006
Data Modeling - 88 -
Bills of Material
A
Part
consists-of
is-a-component-in
C
B
D
E
D
3
1
Product Structure
1
2
F
2
2
Product-Structure(Parent Part No, Child Part No, Quantity)
A
A
B
B
C
C
© Minder Chen, 1993~2006
B
C
D
E
D
F
2
1
1
3
2
2
Data Modeling - 89 -
Using an Associative Entity Type to Represent an N-ary Relationship
involved in
product usage
involved in
product usage
Product
Project
involved in
product usage
Supplier
Product Usage is an Associative Entity Type for a 3-ary Relationship.
is used in
Product
uses
Product Usage
Project
supplies
Supplier
© Minder Chen, 1993~2006
Data Modeling - 90 -
Translate Data Models to Relational Tables
Given
has
contains
Order
Order Line
belongs to
Key: Order#
Attribute:
Order date
Customer ID
Sale Person ID
is contained in
Key: Order#+Product#
Attribute:
Quantity
Unit Price
Product
Key: Product#
Attribute:
Description
Qty-on-hand
Unit Price
Relational Tables Created
CREATE TABLE ORDER
(OrderNo
CHAR(10)
OrderDate
DATE,
CustomerID
CHAR(10),
SalePersonID CHAR(10));
© Minder Chen, 1993~2006
NOT NULL,
Data Modeling - 91 -
Transformation of Data Models to Relational Database Tables
• The entire, or part of, a data (entity-relationship)
model can be translated into a normalized
database design.
• Objects Created
– At most one relational database
– One or more relations (tables)
– Data structures (DDL) representing the elements
(attributes) and the primary key of each relation
– Data type of each data elements
© Minder Chen, 1993~2006
Data Modeling - 92 -
Heuristics of Transformation
• A table is created for each Entity Type in the ER diagram.
• A table is created for each multi-valued attribute.
• Relationship Types are implemented as tables or as foreign
keys in other tables.
• Many-to-many relationship types are translated into tables.
• Foreign keys are used for implementing one-to-one and
one-to-many Relationship Types.
• For one-to-many Relationship Types, the foreign key is
placed in the table that represents the Entity Type on the
"many" end of the Relationship Type.
• For identifying one-to-many Relationship Types, the PK of
the "one" table migrate to the "many" table as a FK and the
FK is also part of the PK of the "many" table.
• For non-identifying one-to-many Relationship Types, the
PK of the "one" table migrate to the "many" table as a FK
and the FK is a non-key attribute of the "many" table.
© Minder Chen, 1993~2006
Data Modeling - 93 -
http://www.oracle.com/tools/jdeveloper/documents/jsptwp/index.html?content.html
Auction Web
Site's Data Model
© Minder Chen, 1993~2006
Data Modeling - 94 -
dept_id = parent_id
dept
dept_id
int
parent_id
int
name
varchar(255)
description
text
date_changed datetime
pfid = pfid
product_variant
sku
int
pfid
varchar(30)
attribute0 tinyint
attribute1 tinyint
attribute2 tinyint
attribute3 tinyint
attribute4 tinyint
A Data Model for an Electronic Commerce Application
dept_id = dept_id
pfid = pfid
product_family
pfid
varchar(30)
dept_id
int
manufacturer_id int
name
varchar(255)
short_description varchar(255)
long_description text
image_filename
varchar(255)
intro_date
datetime
date_changed
datetime
list_price
int
monogramable
tinyint
shopper
sku = sku
shopper_id char(32)
order_id = order_id
created
datetime
name
varchar(235)
password
varchar(20)
street
varchar(50)
city
varchar(50)shopper_id = shopper_id
state
varchar(30)
receipt
zip
varchar(15)
order_id
char(26)
country
varchar(20)
shopper_id
char(32)
phone
varchar(16)
total
int
email
varchar(50)
status
tinyint
date_entered
datetime
date_changed
datetime
marshalled_receipt image
shopper_id = shopper_id
basket
shopper_id
char(32)
date_changed
datetime
marshalled_order image
© Minder Chen, 1993~2006
product_attribute
pfid
varchar(30)
attribute_id
tinyint
attribute_index tinyint
attribute_value varchar(20)
pfid = pfid
pfid = pfid
promo_cross
pfid
varchar(30)
related_pfid varchar(30)
description
varchar(255)
pfid = pfid
promo_upsell
pfid
varchar(30)
related_pfid varchar(30)
description
varchar(255)
receipt_item
pfid
varchar(30)
sku
int
order_id
char(26)
row_id
int
quantity
int
adjusted_price int
promo_price
promo_name
varchar(255)
promo_type
int
promo_description
text
promo_rank
int
active
int
date_start
datetime
date_end
datetime
shopper_all
int
shopper_column
varchar(64)
shopper_op
varchar(2)
shopper_value
varchar(64)
cond_all
int
cond_column
varchar(64)
cond_op
varchar(2)
cond_value
varchar(64)
cond_basis
char(1)
cond_min
int
award_all
int
award_column
varchar(64)
award_op
varchar(2)
award_value
varchar(64)
award_max
int
disjoint_cond_award int
disc_type
char(1)
disc_value
realData Modeling
- 95 -
Attribute 0 of pfid 14 is size and
the attribute value 1 is Grande
and 2 is Tall and 3 is Short
© Minder Chen, 1993~2006
Data Modeling - 96 -
Web-based Build-To-Order Application
© Minder Chen, 1993~2006
Data Modeling - 97 -
Data Model for Build-To-Order Application
© Minder Chen, 1993~2006
Data Modeling - 98 -
Descargar

Data Modeling and Database Design