70420403- Database Systems
数据库管理系统
Li Jianguo
[email protected]
1
References


Raghu Ramakrishnan/Johannes
Gehrke, Database Management
Systems, 2nd Ed. , 清华大学出版社
/McGraw-Hill, 2000
C. J.Date, 数据库系统导论(第7版), 机
械工业出版社
2
What is DBMS?



Need for information management
A very large, integrated collection of data.
Models real-world enterprise.
–
–

Entities (e.g., students, courses)
Relationships (e.g., Liu Wei is taking 70420403)
A Database Management System (DBMS) is a
software package designed to store and
manage databases.
3
Why Use a DBMS?






Data independence and efficient access.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
Replication control
Reduced application development time.
4
File Systems
program 1
data description 1
File 1
program 2
data description 2
File 2
program 3
data description 3
File 3
5
Database Management
Application
program 1
(with data
semantics)
Application
program 2
(with data
semantics)
DBMS
description
manipulation
control
database
Application
program 3
(with data
semantics)
6
Why Study Databases??

Shift from computation to information
–
–

at the “low end”: access to physical world
at the “high end”: scientific applications
Datasets increasing in diversity and volume.
–
–

?
Digital libraries, interactive video, Human
Genome project, e-commerce, sensor networks
... need for DBMS/data services exploding
DBMS encompasses several areas of CS
–
OS, languages, theory, AI, multimedia, logic
7
Database Systems
8
Key Terms cont.

Database System
– A database, a database management
system and appropriate hardware and
personnel.
number,
Designation
Hours
worked, Pay
rate
Insurance,
Pension
9
Database System: Hardware

Set of physical devices on which a database
resides. It consists of one or more
computers, disk drives, CRT terminals,
printers, tape drives, connecting cables and
other auxiliary and connecting hardware.
10
Database System: Hardware cont.
Printer
TAPE
Client
Server
Client
11
Database System: Software

A database software includes two types of
software
– General-purpose database management software,
usually called the database management system
(DBMS)
– Application software that uses DBMS facilities to
manipulate the database to achieve a specific
business function, such as providing reports or
documents, which can be used by users.
12
Database System: Software cont.

Application software is generally written
standard programming language such as C,
or it may be written in a language
(commonly called a fourth-generation
language) supplied with the DBMS.

These programs utilise the command
language of the DBMS and make use of the
information contained in the data dictionary.
13
Database System: Software cont.

User Interface
– Language, menus and other facilities by which
users interact with various system components,
such as application programs, the DBMS

Computer-aided software engineering
(CASE) tools
– Automated tools used to design databases and
application programs.
14
Data Dictionary/Directory (Repository)

A subsystem that keeps track of the
– definitions of all data items in the database.
– relationships that exists between various data
structures.
– indexes that are used to access data quickly.
– screen and report format definitions that may be
used by various application programs.
15
Data Dictionary

Definitions of data items in the database
includes:
– Elementary-level data items (fields),
– group and record-level data structures, and
– files or relational tables.
16
Metadata
• Data that describe the properties or
characteristics of other data.
• Some of these properties include data definitions,
data structures and rules or constraints. Item
name, the data type, length, minimum and
maximum allowable values (where appropriate)
and a brief description of each data item.
• Metadata allow database designers and users to
understand what data exist, what the data mean.
• Data without clear meaning can be confusing,
misinterpreted or erroneous.
17
Metadata
E.g.
Data Item
Name Type
Value
Length Min Max Description
Name
ID
Dept
Age
30
9
10
2
Character
Number
Character
Integer
Name
Character 15
Manager Number
9
18
60
Employee Name
Employee No.
Dept. No.
Employee Age
Dept. Name
Mgr. Emp. No.
Employee No. (ID) unique
Manager is an employee of the organisation
18
Database System: People

Two different types of people (users and
practitioners) are concerned with the
database.
Users
– who need information from the database to carry
out their primary business responsibility
e.g. Executives, managers, staff, clerical personnel
19
Database System: People cont.
Practitioners
– people responsible for the database system
and its associated application software.
e.g. Database administrators, analysts,
programmers, database and system designers,
information systems managers.
20
Components of a Database System
Four components: People, H/W, S/W, Data

Practitioners (analysts and database
designers) in consultation with users identify
data needs and design database structures to
accommodate these needs.

The database structures are specified to the
DBMS through the data dictionary.
21
Components of a DBS cont.

Users enter data into the system by following
specified procedures.

The entered data are maintained on
hardware media such as disks and tapes.

Application programmes that access the
database are written by practitioners and
users to be run on computers.
22
Components of a DBS cont.
Application
Program
Analyst
Application
Program
DATA
DICTIONARY/
DIRECTORY
Application
Program
DBMS
DATAB
ASE
Terminals
User
Computer
PRINTER
23
Key Terms cont.

Database Management System
– Systems software that facilitates the management of
a database.
E.g. Oracle, Access, SQL Server
24
Database Management System
(DBMS)

DBMS
– An application software that organises
data into records in one or more
databases and allows organising,
accessing and sorting of the data in a
variety of formats.
25
DBMS cont.

Relational DBMS
– Most common type of DBMS. Data
elements are stored in different tables
made up of rows and columns. Relates
data in different tables through the use
of common data element(s).
26
Components of a DBMS
DBMS
USER
USER
User Query and
Reporting
Facilities
Application
Program
Development
Facilities
PROGRAMMER
Data Dictionary/
Directory
Subsystem
DATA
DICTIONARY/
DIRECTORY
Security and
Integrity
Subsystem
Database Access
Subsystem
DATAB
ASE
PROGRAMMER
27
What is not a DDBS?

A timesharing computer system

A loosely or tightly coupled multiprocessor
system

A database system which resides at one of
the nodes of a network of computers - this is
a centralized database on a network node
28
Data Security

The database is a valuable resource needing
protection.

The DBMS provides database security by
limiting access to the database to authorised
personnel.

Authorised users will generally be restricted
as to the particular data they can access and
whether they can update it.
29
Data Security

Access is often controlled by passwords and
by data views, which are definitions of
restricted portions of the database.
1
2
3
4
Data items
30
Data Integrity

The integrity and consistency of the database
are protected via constraints on values that
data items can have and by backup and
recovery capabilities provided within the
DBMS.

Data constraint definitions are maintained in
the data dictionary.
31
Data Integrity

Backup and recovery are supported by
software that automatically logs changes to
the database and provides for a means of
recovering the current state of the database in
case of system failure.
32
Concurrent Data Access

One of the chief functions of the DBMS is to
support the access, retrieval and update of
data in the database.

The DBMS provides the physical
mechanisms allowing multiple users to
access a variety of related data quickly and
efficiently.
33
User-Oriented Data Manipulation

DBMS provides user-oriented data
manipulation tools.

Easy-to-use query languages allow users to
formulate queries and request one-time
reports directly from the database.

Often query languages will contain facilities
to format the results of queries as reports.
34
Data Manipulation

Report generators have more powerful
reporting facilities than those in the query
language.
35
Application Development

The DBMS commonly provides significant
assistance to the application programmer.
– Tools for screen, menu and report generation
– application generators
– compilers
– data and view definition facilities
36
Application Development

Modern database systems provide language
components that are much more powerful
than those of traditional languages, making
the programming process itself considerably
more efficient.
– Developer 2000/PowerBuilder for Oracle
– Visual Basic for Microsoft SQL server
37
Database Applications
Databases range from those for a single user
with a desktop computer to those on
mainframe computers with thousands of
users.

Personal databases

Workgroup databases

Departmental databases

Enterprise databases
38
Personal Computer Databases
Designed to support one user with a standard
alone PC.
E.g. a sales person keeping track of this
customer information with contact details.
39
Workgroup Databases
A relatively small team of people (less than 25)
who collaborate on the same project or
application.
E.g. a software development team maintaining
a list of software objects.
40
Department Databases
A department is a functional unit of an
organisation. It is larger than a workgroup.
Department databases are designed to support
the various functions and activities of a
department.
E.g. a personnel database that is designed to
track data concerning employees, jobs, skills
and job assignments.
41
Enterprise Databases
An enterprise is one whose scope is the entire
organisation or enterprise.
Such databases are intended to support
organisation-wide operations and decision
making.
E.g. a large health care organisation that
operates a group of medical centre's
including hospitals, clinics and nursing
homes.
42
Enterprise Databases cont.
An enterprise database does support
information needs from many departments.
The most important type of enterprise
database today is called a data warehouse.

Data warehouse
– An integrated decision support database whose
content is derived from the various operational
databases.
43
Database Environment
Marketing
Sales
Accounting
Advertising
Accounts
Receivable
Accounts
Payable
Accounting
Corporate Database
Marketing
Purchasing
44
3 Level ANSI/SPARC Architecture


Made databases more independent of
application
Became a standard for the organisation of
DBMS
45
3 Levels

External
– User’s and Application’s view of data

Conceptual
– Logical Data Model

Physical
– Physical data Model
46

External (Sub) Schema
– defines the external view of data
as seen by a user or program

Conceptual Schema
– defines the logical view of data
as seen by all users and programs

Physical (Internal) Schema
– defines the physical view of data
as seen by a DBMS
47
3 Level Architecture
user a
user i/program j
program x
sub-schema a sub-schema i sub-schema z
conceptual schema
physical schema
Databa
ses
48
Physical View
• The DBMS must know
– exact physical location
– precise physical structure
database
Employee record
A.B.C. De Silva
|222, Galle Road, Colombo
Name (20 characters)
|
Address (40 characters)
650370690V|Senior Lecturer
NID (10 char) Designation (15 char)
49
Physical View
• Physical view provides the disk drives,
physical addresses, indexes and pointers.
• Physical Database Design is the
responsibility of the Database
Administrator (DBA). No user is concerned
with this view.
• Physical devices to contain the data
• Access methods to retrieve and update data
• maintain and improve database performance
50
Logical View
• The user/application must
know
– existence
– logical reference
NID 650370690V
database
Employee
Name
A.B.C. De Silva
Designation Senior Lecturer
Address
222, Galle Road, Colombo
51
Logical View
• This is a single logical description of all
data elements and their relationships.
• It is the result of the conceptual design
which involves analysis of all users
information needs and data definition
of data items needs to meet them.
• SQL CREATE TABLE statement is
used to define the data elements.
52
External View
• The user/application see
– authorised data
– own format
database
Lecturer
Name A.B.C. De Silva
Department
Dept. of Computer Science
Designation Senior Lecturer
Age 35
53
External View
• This consists of user views of the
database.
• Each definable user group will have its
own view of the database.
• Each of these views gives a useroriented description of the data
elements and relationships of which the
view is composed.
54
User View
• A logical description of some portion of
the database that is required by a user
to perform some task.
E.g. Benefit application user will view
part of the employee data, excluding
data such as date of birth and salary.
55
External View
• It can be derived directly from the
conceptual schema.
• The collection of all user views is the
external level.
• SQL CREATE VIEW statement is used
to create these views and SQL GRANT
statement is used to restrict its use to a
user group.
56
External View cont.
• External Views Allows to
– hide unauthorised data
e.g. salary, dob
– provide user view
e.g. view employee name, designation,
department data taken from employee and
department files
– derive new attributes
e.g. age derived from dob or nid
57
External View cont.
• External Views Allows to
– change unit of measurement
e.g. show age in years or months
– define security levels
e.g. update access to employee file
read-only to department file
58
Mapping between Levels
• DBMS map or translate from one level
to another.
External
 Conceptual
Conceptual  Physical
• Database exists in reality only at the
internal level.
59
DML/SQL
user a
user i/program j
program x
sub-schema a sub-schema i sub-schema z
SDDL
conceptual schema
DDL
physical schema
Users
Databa
ses
Practitioners:
DBA
60
Practitioners:DBA
• Data Definition Language (DDL)
- is the language component of a DBMS
that defines each data element as it
appears in the database.
• Sub-Schema Data Definition Language
(SDDL)
- is the language component of a DBMS
that defines data elements as it should
appear to the end users and
programmers.
61
User
• Data Manipulation Language (DML)
- is a language associated with a DBMS
that is employed by end users and
programmers to manipulate data in the
database.
• Structured Query Language (SQL)
- pronounced as sequel, is the standard
data manipulation for relational DBMSs.
62
Characteristics of Early DBMS




Centralised Storage Management
Transaction Management
Recovery Facilities
System Maintained Access Paths
63
Characteristics of Modern DBMS







Query processing and optimisation
Transaction management
Concurrency control
Database recovery
Database security and authorisation
Distributed databases
Data warehousing and data mining
64
Function of a DBMS
Functions of a DBMS are concerned with
providing efficient flexible data processing
capabilities without compromising data
validity.
Main Functions are:
–
–
–
–
Shared data
Control redundancy
Data integrity
Data security
65
Capabilities of a DBMS
Capabilities vary considerable, but basic
operations are:
 Data definition
 Data entry
 Data manipulation
 Data display
66
Database Approach
Advantages
•
•
•
•
•
•
•
•
•
Program-data independence
Minimal data redundancy
Improved data consistency
Improved data sharing
Increased productivity of application
development
Enforcement of standards
Improved data quality
Improved data accessibility and responsiveness
Reduced program maintenance
67
Program-Data/Data Independence
• The separation of data descriptions
(metadata) from the application programs
that use the data.
In the database approach data descriptions
are stored in a central location called the
data dictionary. This property allows an
organisation’s data to change and evolve
(within limits) without changing the
application program that process the data.
68
Minimal Data Redundancy
• Data files are integrated into a single,
logical structure. Each primary fact is
recorded (ideally) in only one place in the
database.
E.g. Employee data not with the payroll and
benefit files.
Note: Data redundancy is not eliminated
entirely. Some data items will appear in
more than one place (e.g. employee no.) to
represent the relationship with others.
69
Improved Data Consistency
• By eliminating (or controlling) data
redundancy, we greatly reduce the
opportunities for inconsistency.
E.g. employee address is stored only once
and hence we cannot have disagreement
on the stored values.
• Also, updating data values is greatly
simplified and have avoid the wasted
storage space.
70
Improved Data Sharing
• A database is designed as a shared
corporate resource. Authorised users
are granted permission to use the
database, and each user (or group of
users) is provided one or more user
views to facilitate this use.
E.g. employee data common to payroll,
benefit applications will be shared
among different users.
71
Increased Productivity of Application
Development
• A major advantage of the database
approach is that it greatly reduces the
cost and time for developing new
business applications.
– Programmer could concentrate on the
specific functions required for the new
application, without having to worry about
design or low-level implementation details;
as related data have already been designed
and implemented.
72
Increased Productivity of
Application Development cont.
– DBMS provides a number of high-level
productivity tools such as forms and report
generations and high-level languages that
automate some of the activities of database
design and implementation.
73
Enforcement of Standards
– When the database approach is implemented
with full management support, the database
administration function should be granted
single-point authority and responsibility for
establishing and enforcing data standards.
– Standards include naming conventions, data
quality standards and uniform procedures for
accessing, updating and protecting data.
– Powerful set of tools for developing and
enforcing these standards are available for some
DBMS.
74
Improved Data Quality
A number of tools and processes are available to
improve data quality.
– Database designers can specify integrity
constraints that are enforced by the DBMS.
– One of the objectives of a data warehouse
environment is to clean up operational data
before they are placed in the data warehouse.
Constraint
A rule that cannot be violated by database users.
75
Improved Data Accessibility and
Responsiveness
With relational database, end users without
programming experience can often retrieve and
display data, even when it crosses traditional
departmental boundaries.
– English-like query language SQL and query tools
such as Query-By-Example provide such
facilities.
76
Reduced Program Maintenance
Stored data are changed frequently for
variety of reasons such as new data items
types are added, and data formats change
(e.g. date format from two-digit to four
digit).
Data independence allows to reduce the
program maintenance time.
77
Database Approach
Disadvantages
• DBMS are more vulnerable than file-based
system because of the centralised nature of
a large integrated database.
• If a failure occurs the recovery process is
more complex and some times may results
in lost transactions.
• Hardware, software and personnel cost are
higher for DBMS.
78
Data Models



A data model is a collection of concepts for
describing data.
A schema is a description of a particular
collection of data, using the a given data
model.
The relational model of data is the most widely
used model today.
–
–
Main concept: relation, basically a table with rows
and columns.
Every relation has a schema, which describes the
columns, or fields.
79
Levels of Abstraction

Many views, single
conceptual (logical) schema
and physical schema.
–
–
–
Views describe how users
see the data.
Conceptual schema defines
logical structure
Physical schema describes
the files and indexes used.
View 1
View 2
View 3
Conceptual Schema
Physical Schema
 Schemas are defined using DDL; data is modified/queried using DML.
80
Example: University Database

Conceptual schema:
–
–
–

Physical schema:
–
–

Students(sid: string, name: string, login: string,
age: integer, gpa:real)
Courses(cid: string, cname:string, credits:integer)
Enrolled(sid:string, cid:string, grade:string)
Relations stored as unordered files.
Index on first column of Students.
External Schema (View):
–
Course_info(cid:string, enrollment:integer)
81
Data Independence



Applications insulated from how data is
structured and stored.
Logical data independence: Protection from
changes in logical structure of data.
Physical data independence: Protection from
changes in physical structure of data.
 One of the most important benefits of using a DBMS!
82
Concurrency Control

Concurrent execution of user programs is
essential for good DBMS performance.
–


Because disk accesses are frequent, and relatively
slow, it is important to keep the CPU humming by
working on several user programs concurrently.
Interleaving actions of different user programs
can lead to inconsistency: e.g., check is cleared
while account balance is being computed.
DBMS ensures such problems don’t arise: users
can pretend they are using a single-user system.
83
Transaction: An Execution Unit of a DB


Key concept is transaction, which is an atomic
sequence of database actions (reads/writes).
Each transaction, executed completely, must leave
the DB in a consistent state if DB is consistent when
the transaction begins.
–
–
–
Users can specify some simple integrity constraints on
the data, and the DBMS will enforce these constraints.
Beyond this, the DBMS does not really understand the
semantics of the data. (e.g., it does not understand how
the interest on a bank account is computed). Why not?
Thus, ensuring that a transaction (run alone) preserves
consistency is ultimately the user’s responsibility!
84
Scheduling Concurrent Transactions

DBMS ensures that execution of {T1, ... , Tn} is
equivalent to some serial execution T1’ ... Tn’.
–
–
–
Before reading/writing an object, a transaction requests
a lock on the object, and waits till the DBMS gives it the
lock. All locks are released at the end of the transaction.
(Strict 2PL locking protocol.)
Idea: If an action of Ti (say, writing X) affects Tj (which
perhaps reads X), one of them, say Ti, will obtain the
lock on X first and Tj is forced to wait until Ti completes;
this effectively orders the transactions.
What if Tj already has a lock on Y and Ti later requests a
lock on Y? What is it called? What will happen?
85
Ensuring Atomicity


DBMS ensures atomicity (all-or-nothing property)
even if system crashes in the middle of a Xact.
Idea: Keep a log (history) of all actions carried out
by the DBMS while executing a set of Xacts:
–
–
Before a change is made to the database, the
corresponding log entry is forced to a safe location.
(WAL protocol.)
After a crash, the effects of partially executed
transactions are undone using the log. (Thanks to WAL, if
log entry wasn’t saved before the crash, corresponding
change was not applied to database!)
86
The Log

The following actions are recorded in the log:
–
Ti writes an object: the old value and the new value.

–



Log record must go to disk before the changed page!
Ti commits/aborts: a log record indicating this action.
Log records chained together by Xact id, so it’s easy to
undo a specific Xact (e.g., to resolve a deadlock).
Log is often duplexed and archived on “stable” storage.
All log related activities (and in fact, all CC related
activities such as lock/unlock, dealing with deadlocks
etc.) are handled transparently by the DBMS.
87
Databases make these folks happy ...


End users and DBMS vendors
DB application programmers
–

e.g. webmasters
Database administrator (DBA)
–
–
–
–
Designs logical /physical schemas
Handles security and authorization
Data availability, crash recovery
Database tuning as needs evolve
Must understand how a DBMS works!
88
These layers
must consider
concurrency
control and
recovery
Structure of a DBMS



A typical DBMS has a
Query Optimization
layered architecture.
and Execution
The figure does not
Relational Operators
show the concurrency
Files and Access Methods
control and recovery
components.
Buffer Management
This is one of several
Disk Space Management
possible architectures;
each system has its own
variations.
DB
89
Summary






DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes,
concurrent access, quick application
development, data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs
and are well-paid!
DBMS R&D is one of the broadest,
mature areas in CS.
90
What to Study in 70420403?




?
DBMS basics
– ER, relational model, SQL, DB design using FD
Transaction management
– atomicity, consistency, isolation, durability
– concurrency control and recovery
Databases/data services for real-time applications
– scheduling and CC
– QoS metrics
Data services in emerging applications
– event services, sensor networks, data aggregation,
streaming data
91
Descargar

Introduction to Database Systems