Database Management Systems
SPL/2010
1
Overview
●
we studied:
●
●
how memory is managed within the virtual
memory space of a process
how messages can be exchanged between
processes, using inter-process communication
SPL/2010
2
Next
●
●
Storage remains unchanged beyond the end of the
process execution.
RTE provides services to processes to support
persistent data storage.
SPL/2010
3
Persistent storage
●
●
several properties:
●
It is persistent
●
It can be shared among several processes
We discuss in this lecture two types of
persistent storage services:
1.
the file system
2.
database services.
SPL/2010
4
File System (FS) Services (interface)
●
Map names to storage locations:
●
●
●
●
storage location=storage on disk.
FS maps names ("/usr/local/file.ext") to
concrete locations on disk.
name mapping interface = sys. calls: open(name),
create(name), delete(name) query names space
getfiles(name-pattern).
FS organizes space of file names in hierarchical
structure of embedded directories.
SPL/2010
5
File System (FS) Services (interface)
●
Access:
●
●
●
stream interface to read and write a file as a
stream of bytes.
seek within the file, read its content and write
new content.
allocation of space / automatic increase storage
space
SPL/2010
6
File System (FS) Services (interface)
●
locking files:
●
●
allow multiple processes to synchronize access on
shared files
one can lock a whole file or just a segment of
bytes within the content of a file.
SPL/2010
7
FS
●
FS = abstract interface of system calls
(open, read, write, seek etc)
●
implemented by specific drivers which know how
to interact with specific hardware devices.
SPL/2010
8
Distributed FS
●
process can request access to files stored on
a remote machine:
●
●
implementation of FS in RTE passes the system
calls (open, read, etc) to FS-server on server
distributed file systems (for example, NFS and
SAMBA) have complex locking mechanisms.
SPL/2010
9
Database Management System (DBMS)
●
way of storing persistent data other than FS
●
DBMS is present as a service
●
●
processes connect through inter-process
communication protocol (generally over TCP).
DBMS manages persistent storage, often by
accessing FS on its side
SPL/2010
10
File systems vs. DBMS
●
Data Model
●
●
●
in FS is very simple: a stream of bytes with the
open/close, read/write and lock operation.
DBMS defines a rich data model, manages
serialization
programmer must define the specific data model
for the data he is interested in storing.
SPL/2010
11
File systems vs. DBMS
●
Data Independence –
●
●
●
DBMSs provide an abstract interface for data
storage/access.
programmer does not need to define in which
file, at which offset data is to be stored.
one can query the DBMS by content (e.g.,
"retrieve all content that satisfies certain
criteria").
SPL/2010
12
File systems vs. DBMS
●
Efficient concurrency
●
●
●
DBMSs are built to support thousands of
concurrent users.
ensure data is kept consistent
provide efficient management of such high
concurrency.
SPL/2010
13
File systems vs. DBMS
●
Reduced application development time
●
●
DBMSs allow data manipulation using a simple
API.
DBMS manages storage, query optimization,
concurrency and integrity management
SPL/2010
14
DBMS Services
●
management:
●
●
verify that users are authorized to access data.
Specific rights can be granted for each part of
the data and each group of users.
SPL/2010
15
DBMS Services
●
Session and Transaction management:
●
●
transaction = complex data operation - read and
modify many different objects
viewed from the outside as single atomic
operation- completely succeeds or not performed
SPL/2010
16
DBMS Services
●
Query optimization and execution:
●
●
●
clients interact with DBMS by submitting
complex queries, in a language
DBMS execute queries in the most efficient
manner possible.
techniques in the world of relational database
servers.
SPL/2010
17
Back-end services
●
not directly visible to the user but
performed by the DBMS as part of its
implementation
●
File and Access methods
●
Buffer management
●
Disk space management
SPL/2010
18
Transaction
●
●
●
●
unit of work performed against a database
management system
treated in a coherent and reliable way
independent of other transactions.
atomic, consistent, isolated and durable
(ACID acronym).
composed of independent units of work, each
reading and/or writing information to DB
SPL/2010
19
Transaction
●
●
●
●
provide "all-or-nothing" - work units must
complete or take no effect whatsoever.
must be isolated from other transactions
results must conform to existing constraints
in database
if completed successfully must be committed
to durable storage.
SPL/2010
20
Transaction pattern
●
Begin transaction
●
Execute data manipulations and queries
●
If no errors occur then commit transaction
●
If errors occur rollback the transaction
SPL/2010
21
Transactions safety
●
Identify resources that will be accessed
during the transaction.
●
●
SQL: full tables or parts of tables (rows).
Acquire locks on all resources before
transaction starts.
SPL/2010
22
Transactions safety
●
Perform transaction.
●
●
modifications are performed on data snapshot in
memory.
isolation: data in transaction is not affected by
other threads. partial modifications not
committed are not visible
SPL/2010
23
Transactions safety
●
●
transaction committed: modifications are
merged to persistent storage
if transaction is rollbacked: modifications
are deleted and persistent storage is not
modified.
SPL/2010
24
Data Models
●
●
●
data model = which data values can be sent
for storage
OO method:
●
primitive data types (int, bool, char etc)
●
complex data types
DBMS - relational model
SPL/2010
25
Case Study: A Data Model for Colleges
●
●
data model of an academic college.
different departments, each specializing in
teaching a certain domain.
●
●
●
Computer Science / Physics department
department has a department head and a
geographical location.
students may be enrolled in one department
and registered to courses given by it.
SPL/2010
26
Relational Model
●
●
●
●
●
"Relation" = "table", - "relational" = "based
on tables“…
logical representation of information.
logical schema - declared constraints in the
database design - consistency -
database normalization
design with desirable properties from a set
of logically equivalent alternatives.
SPL/2010
27
building blocks
●
Relation - set of records of same fields.
header of a table
●
●
Our example: students, departments, etc.
Attribute - field within a relation (data type,
name)
●
Our example: student name, department head,
SPL/2010
28
building blocks
●
Domain - restriction of types
●
●
●
our example: student ID is 9 digits 0-9 each.
Records - tuple, single line within a relation.
Integrity constraints - constraint on
attribute with regard to all records
●
our example: Student ID is unique.
SPL/2010
29
Relational Algebra
●
Selection - select specific rows
●
Projection - projects specific columns
●
Set operations - union, intersection, crossproduct, set-difference
●
●
●
on relations that have the same structure (same
attributes)
Join - cross product followed by selection and
projection.
Renaming operation - change name of columns.
SPL/2010
30
●
●
take existing relations as arguments and
return new relations as a value.
columns of tables are restricted to being
simple values
●
columns must be primitive data types
●
no embedded tables or array.
●
complex data - primary and foreign keys
SPL/2010
31
SQL (Structured Query Language)
●
query language - interface with a relational
DBMS. sub-languages:
●
●
DDL: Data Definition Language define schemas,
relations and domains.
DML: Data Manipulation Language - queries,
insertions, updates and deletions
SPL/2010
32
●
client interacts with SQL server by sending
SQL queries (TCP protocol)
●
●
receiving as answers a result-set.
communication protocol between an SQL server
and its clients is stateful and session oriented.
SPL/2010
33
Keys
●
subset of the attributes of the relation
which uniquely identify rows.
●
Example: relation for students with attributes
name, birthday, address and Social Security
Number (SSN).
–
–
SPL/2010
Name is not a key (two students could have the same
name). SSN is a key (this is why it was invented:).
several attributes in a key: courses (course name,
department, semester, lecturer) course name is not a
key / but the pair (course name, semester) could
34
Database Normalization and Foreign Keys
●
●
●
minimize duplication of information (also
known as "Don't Repeat Yourself" - the DRY
principle
Duplication = data appear twice in the domain
of the problem modeled.
Example: relation student (ID, Name, Course)
a student is enrolled to 3 courses = 3
records duplicating the ID and Name
SPL/2010
35
●
●
Solution: student data will be stored only in a
single relation.
refer to students- keys to remove data
repetition: use a single attribute SSN to
refer to a student.
SPL/2010
36
Join
●
●
retrieve the information on the student
given a tuple from the courses relation, we
need to operate a join operation on the SSN
attribute (which is shared by the 2 relations)
The LEFT JOIN keyword returns all rows from the left table (table_name1),
even if there are no matches in the right table (table_name2).
SPL/2010
37
SPL/2010
38
Foreign key
●
in student relation, SSN is a key.
●
in course relation, StudentSSN is not a key
●
●
several courses taken by a single student
StudentSSN refers to the key of another table.
= foreign key in the course relation - refers to
the student relation.
SPL/2010
39
Data Integrity Constraints
●
cannot create a tuple in the course relation
that refers to a student that does not exist.
●
●
●
●
new row in course relation: check foreign key
exists in student relation.
update SSN field of course relation: check new
StudentSSN exists in student relation.
delete row in student relation: verify no rows in
course that refer to it
update row in student relation: verify no rows in
course that refer to old value
SPL/2010
40
●
When defining a data model on a server, user
declares which constraints are part of the model:
–
–
keys and foreign keys.
DBMS server enforces constraints and verifies insert,
update, delete operations –no conflict with constraints
SPL/2010
41
NULL
●
●
belongs to all domains
indicates lack of knowledge - we do not know
what its value is
●
●
example (1 = NULL) is false and (1 != NULL) is
also false! (NULL = NULL) is also false.
compare NULL with value, result is always
false.
SPL/2010
42
Indexing
●
●
●
●
records in relation in an query efficient data
structure with respect to certain attributes.
B-Tree / hash-tables
●
query efficient
●
cost time when inserting or deleting data.
Implicit indexing - employed by DBMS for
attributes defined as unique (keys).
Explicit indexing of an attribute is possible
SPL/2010
43
Designing a Data Model
●
●
A normalized data model
provides efficient access to data with
respect to the most typical scenarios
SPL/2010
44
Design Steps
SPL/2010
45
Step A – Identify simple correlation among attributes
●
●
A Student has a name and ID.
ignore Courses and Department - not simple hold correlation to other attributes as well:
●
department has a name, location and head.
●
Course has a name.
SPL/2010
46
Step B – Identify unique attributes
●
●
●
●
at least one attribute in a relation that is
unique
Primary Key - a unique attribute that must
have a value (NULL is not allowed).
student relation: ID = primary key.
courses relation = artificial one: a course
number. department ID.
SPL/2010
47
●
●
●
can define several attributes as a unique key
any relation that should be later queried
efficiently must have a primary key.
in rare cases, no need to define a primary
key.
●
example: logs stored in DB, only insertions are
performed occasionally based on a time range.
–
SPL/2010
no requirement to uniquely identify a single row of a
log - so there is no need to define a key on such a
table.
48
Step C – Associate simple relations
●
●
associations between tables.
where one table refers to another: adding in
a table an attribute that refers to rows in
another relation.
●
●
●
example, a student is registered to a department
(assume only one).
add the primary key of the department relation
as an attribute in the students relation
foreign key - many-to-one relationship
SPL/2010
49
●
●
●
foreign key identifies a column (set) in
(referencing) table that refers to a column
(set) in another (referenced) table.
values in one row of the referencing columns
must occur in a single row in referenced
table.
A row in the referencing table cannot contain
values that don't exist in the referenced
table (except NULL - which means that we
do not know what the value of the column is).
SPL/2010
50
step D – Represent Collections
●
describe an attribute as a collection.
●
●
●
●
our example: courses to which a student is
registered is a collection of values.
model association - create table that correlates
courses and students with foreign keys.
define student2course with attributes: courseId
as foreign key to courses.ID and studentId as a
foreign key to Students.ID.
model many-to-many associations - called "cross
tables".
SPL/2010
51
The College model
●
●
●
Tables: departments, students, courses,
student2course
departments: ID integer as a PRIMARY KEY.
Name short string that must be UNIQUE,
Building integer
students: ID integer as a PRIMARY KEY,
Name short string, DepID integer as a
FOREIGN KEY to departments.ID
SPL/2010
52
●
●
courses: ID integer as a PRIMARY KEY,
Name short string that must not be NULL
student2course: SID integer as a FOREIGN
KEY to students.ID , CID integer as a
FOREIGN KEY to courses.ID and Grade
integer.
SPL/2010
53
●
We add an explicit index on Students.Name
and on Courses.Name as we know many
queries will be performed on those
attributes
SPL/2010
54
Descargar

Document