DBMS and Data Management
DBMS
Definition: A database management system (DBMS) is
a set of software that are used to define, store,
manipulate and control the data in a database.
 define --- define data types, structures and
constraints.
 store --- store data to support efficient access.
 manipulate --- perform retrieval and update
operations using a query language.
 control --- control access to data.
Database System = Database + DBMS
Integrity, Security
Topics
Integrity Constraints within a DBMS
Security of Data
Database Integrity
Definition:
Data Integrity
Correctness and consistency of data against a stated set of
constraints during the “normal” operation of the database.
DBMS cannot protect against all accidental errors
a date being entered incorrectly,
mistyping a name,
BUT
DBMS should protect against unreasonable entries, updates, deletions etc.
age > 120
salary > max_salary
making employees redundant without removing them from the payroll data.
Types of Integrity Constraints
1. The data model underlying the DBMS may have its own integrity constraints.
For example: the relational model has two forms of integrity rule which
must be satisfied
Entity Integrity
Referential Integrity
2. There are also a set of constraints regarding “permitted” values for one or
more attributes.
Domain Constraints
3. In addition most DBMS permit additional (general) constraints known as
user specified integrity constraints
(aka enterprise constraints, or business rules)
•
(These are constraints which the data must satisfy in order to be meaningful to the
organisation - User actually being data or database administrator)
Ideally the DBMS should enforce ALL the above types of constraint however it often falls to the application program to protect against integrity
violations.
Relational Model: Entity Integrity
1. a database reflects some part of “reality”
2. unique objects in the "real" world must be
identifiable and unique within the database
3. objects are identified using identifiers
(keys)
made from 1 or more attributes
4. identifiers must be
unique (for a given object type)
minimal (have no spurious attributes)
That is the reason why PRIMARY KEY is NEEDED
 SQL: setting the primary key, (with 2 attributes)
Create TABLE t (
attributeA type,
attributeB type,
PRIMARY KEY (attributeA, attributeB),
)
Entity Integrity Rule
No component of the primary key of a base relation is
allowed to accept nulls.
Justification
a null valued key implies a missing, or unknown, identity.
If the identity is missing then the object in the real world is not
identifiable.
If the identity is unknown then you cannot know if the object is
already in the database - therefore you cannot tell that if the
identity is unique.
Relational Model: Referential Integrity
A foreign key is an attribute (possibly composite) of one relation, whose value
matches the primary key of some tuple in a relation.
(sometimes called: a posted identifier)
Referential integrity rule
The database must not contain any unmatched foreign key values.
SQL: setting the foreign key,
Create TABLE t (
attributeA type,
attributeB type,
attributeC type,
PRIMARY KEY (attributeA, attributeB),
FOREIGN KEY (attributeC) REFERENCES anothertable(referecedAttribute),
)
During update or delete
Restricted
Update/Delete only if no foreign key exists
Cascades
Update/Delete operation cascades to update/delete the
foreign key entries
Nullifies (or set default value)
Update/Delete primary key values &
Null (or default value) in the foreign key entry.
Domain Constraints
Each value appearing in a particular place in the database should come
from a specific domain of values appropriate to the use of the value.
Defn: Domain
a set of all possible correct values
• These can be defined as
• an explicit set of values:
supplier code domain = (S1, S2, S3, S4, S5)
• a range of valid values:
age domain = (0 .. 120)
• or a pattern which values must satisfy:
employee code domain = ( AAAA9999AAA)
(4 letters followed by 4 digits followed by 3 letters)
• or (sometimes) just a datatype:
size domain = integer
• or combinations of the above.
SQL for domain constrain
Create TABLE t (
attributeA DATE,
attributeB VARCHAR2(30), NOT NULL
attributeC CHAR(5), NOT NULL
)
User Specified Integrity Constraints
Categories of User Specified Integrity Constraints
Relation rules
record constraints
set constraints
which may be
static
dynamic (before and after alterations)
Relation Rules
What combination of attribute values are acceptable in a
relation?
 Record constraints
treat the relation one row at a time
For all
Vehicle.Type = CAR
then Vehicle.Wheels = (3,4)
Vehicle.Type = CYCLE
then Vehicle.Wheels = (2,3)
Vehicle.Type = HOVERCRAFT
then Vehicle.Wheels = null
This prevents meaningless combinations of data for an individual object in the
database.
 Set constraints
apply to the relation as a whole, that is all data in
the table
SUM(employee.salary) < limit
COUNT(distinct dept.car) < 10
AVG(staff.overtime) IN [+/- 10% of 6]
Relation rules may be Static or Dynamic
Static constraints
must hold between data items at ALL times.
employee.salary < manager.salary
IN employee, manager
WHERE
employee.mgr# = manager.mgr#
Dynamic constraints
must hold between database states before AND after changes to the relation
or relations
T1(salary) - T0(salary) > 0
after update(salary) at time T1.
That is, any change to salaries must be increasing
and the integrity constraints may relate to several tables.
Responses to Integrity Violation
Attempts to alter a database in ways which violate integrity must be prevented
Responses include
reject the attempted operation
request/prompt for alternative values
abort the current transaction
Integrity checking may be performed
on input
on deletion
on update
deferred to some convenient time (end of transaction)
Deferred Integrity Constraints
Checking
Some checks MUST be deferred to a transaction boundary
(the transaction is the unit of integrity).
Consider a pair of constraints
For each project entry
there must be at least one employee.
For each employee
there must be a project on which they are
employed
Security.
Why do we need security?
Many benefits accrue from the use of a DBMS, centralised data
storage and access management making the contents of the
database available to the user.
However, this centralised storage potentially means that all users
have access to all possible data. Clearly this is not secure,
additional commands must be available to prevent users accessing
data which requires sensitive management.
Security features are required to prevent unauthorised
•
Disclosure
•
Alteration
•
Destruction
of objects in the database.
The Classical Solution to Security.
1.
2.
Identify the user.
•
access through a login sequence requiring a user
identifier.
Authenticate the users identity.
•
require the user to input a secure password.
3.
Match the user identifier with a list of
privileges.
4.
•
•
Privileges such as:
what type of operation can be performed?
on what objects in the database?
5.
Before a transaction is executed check
objects and operations required for the transaction
execution against the privileges granted to the user.
The above would use an authority matrix to determine which users had
what rights on what objects.
User
Relation filed1
A
filed2
Relation Field
B
a
1
Select
All
None
Run
2
Insert
select
Select
All
None
Select
Run
3
Select
Field
b
Program
a
In a relational database the above would form part of the database itself.
Example: Privileges
SQL includes the grant statement (DCL) to grant privileges on a table or view.
GRANT {privilege_list | ALL | ALL PRIVILEGES}
ON
object_name
TO {authorisation_id_list | PUBLIC}
where privilege_list consists of a comma separated list whose components come
from:
select
insert
delete
update {column_list}
references {column_list}
‘references’ gives a user the right to create foreign keys that reference to the table
concerned.
The person who issues the statement must own the tables or views specified in the
statement.
GRANT select, insert ON stock TO smith
GRANT ALL ON product TO fox
GRANT update (stock_code, stock_price) ON stock TO brown
GRANT select ON branch TO PUBLIC
The reverse of granting access privileges is to REVOKE privileges.
REVOKE {privilege_list | ALL | ALL PRIVILEGES}
ON
object_name
FROM {authorisation_id_list | PUBLIC}
REVOKE select, insert ON stock FROM smith
REVOKE ALL ON product FROM fox
REVOKE update (stock_code, stock_price) ON stock FROM
brown
REVOKE select ON branch FROM PUBLIC
Security through VIEW definitions:
•
In order to limit a users access to data we could specify a set
of views on to a relation.
•
The views would expose only those parts of the relation which
the user is permitted to see, an abstraction of the underlying data.
•
No access privileges are granted on the underlying table, only
on the views.
We use the External Schema to limit access to the Conceptual
Data.
Consider,
A database concerned with personnel information which includes salary
information, the database is used both by the payroll office which needs to
see information on NI#, Salary, working hours etc. and also by the medical
staff who need to see records of home address and phone numbers for
emergency contacts.
Clearly we wish to restrict the payroll view to exclude the medical
information, and restrict the medical view to exclude the financial
information.
personnel:
Name
Address
Phone
Emergency_con NI
tact
Salary
(USING SQL)
CREATE VIEW medical
AS
SELECT Name, Address, Phone, Emergency_contact
FROM personnel;
CREATE VIEW payroll
AS
SELECT Name, Address, NI#, Salary, weekly_hours
FROM personnel;
GRANT select ON payroll TO payroll_clerk;
GRANT select ON medical TO nurse;
Will do what we require.
Weekly_hours
Notice that the view can be quite sophisticated:
CREATE VIEW zwu (name, address, salary)
AS
SELECT Name, Address, Salary
FROM personnel
WHERE personnel.Name = "Zimin Wu";
GRANT select ON zwu TO Zimin;
Or how about summarising information in the payroll file, without making the
individual salaries accessible?
CREATE VIEW salary_summary (total_salary)
AS SELECT sum(Salary)
FROM personnel;
GRANT select ON salary_summary TO bank_manager;
So, the view is a useful 'access limiting' tool.
A actual scenario
 Suppose a teacher asks the teaching assistant
(TA)to key in the scores of those students who
have taken the re-sit exam of Database(2701)
for him in a database system. What solution
should the teacher have in order to protect the
privacy of each student? (Hint: show only
necessary information and grant proper
privileges to the TA)
 Student information Table
 Score{SID,CID,score,resit,sname}
Transaction and Concurrency Control
Topics
 Large Scale Data Management
Lots of Data
Lots of Users
Lots of ……
What are Transactions?
 Concurrency Control
Why?
How?
Large Scale Data Management Systems
To put into perspective the problems of large scale database
management systems consider the requirements for the
implementation of an electronic cash system (e.g. smart cards)




There are 400 Billion recorded consumer payment transactions each year.
Person to Person transactions account for a further 400 Billion.
There are 2 Trillion US dollars worth of sub $10 transactions world-wide per year.
“e-cash” (Internet payment mechanisms) are expected to drop to the sub “cent” level for
one-off per page access.
Banks like to keep accurate, secure records of ALL financial
activities.
Credit card transactions are authorised in approx 30 secs - no
matter how busy the system is (how many simultaneous
attempts to authorise cards occur).
Transaction Processing
A database transaction represents a real
world task.
A transaction can be defined as a program
unit that has four core properties:
 Atomicity (a logical unit of work)
all or nothing, transactions are indivisible units.
Transactions either completely succeed fail; no partial, or
completely successes
 Consistency (logical unit of integrity)
moves database from one consistent state to another.
 Independence (logical unit of concurrency)
transactions execute independently of other transactions
which may be occurring simultaneously within the
database system.
 Durability (logical unit of recovery)
failures in the database system cannot cause the effects
of completed transactions to be undone. Once
“committed” the effects are recoverable after failure.
How to represent a Transaction.
How to represent a Transaction.
Most DBMS (or Database Languages) have mechanisms
similar to the SQL description which follows (they do differ from
system to system)
A program unit can be one or more statements - we
talk about multi-statement transactions (MST). A
DBMS needs “user” support to inform it of the
boundaries of a transaction.
Marking Transaction Boundaries
A transaction begins with the user's first executable
SQL statement.SQL has two simple commands in the
Data Manipulation Language to allow the user to mark
the end of transactions.
 COMMIT;
 marks the end of a transaction - and is used where the effects of
the transaction are to be made visible to the other transactions in
the database.
 ROLLBACK;
 also marks the end of a transaction - it is used where you want to
“undo” the effects of the transaction (rollback the state of the
database) to the state it was in just before the BEGIN
TRANSACTION was executed. (undoing the whole transaction)
The Need for Concurrency Control
Most databases are used in multi-user mode;
resulting in more than one transaction occurring
simultaneously in the system.
Databases frequently have lots of users.
(Airline booking, Telesales Operations, Credit Card Mgt.)
Within a collection of transactions individual
statements might be interleaved as well as the
underlying operating system causing processes
to be interleaved.
 CLASSIC PROBLEMS
 Without some form of concurrency management the
effects of simultaneous transactions may interfere
causing three possible types of problem:
The lost update
The uncommitted dependency
Inconsistent analysis.
The Lost Update Problem
Transaction A
TIME
Fetch V
(= 10)
T1
T2
Update V
V=V+10
(=20)
Fetch V
(=10)
T3
T4
Commit
Transaction B
Update V
V=V+20
(=30)
T5
T6
Commit
At time after T4 the update performed by transaction A is lost; the stored
value of V is 30.
The Uncommitted Dependency
Transaction A
Fetch V
(=20)
TIME
Transaction B
T0
Commit
T1
Fetch V
(=10)
T2
Update V V=V+10(=20)
T3
T4
Update V
V=V+20
(V=40)
Rollback
T5
The effect of the rollback in Transaction B is to undo the update of V at time T2.
However, Transaction A has fetched the updated value of V at time T3. If
another transaction executed the same code as A after T4 the result would be
different.
Inconsistent Analysis
Transaction A
TIME
Fetch Acc1
(= 40)
T1
Transaction B 8
Sum=Sum+Acc1
Fetch Acc2 (= 50)
Sum = Sum+Acc2
Fetch Acc3
(= 20)
Sum=Sum+Acc3
T2
T3
Fetch Acc3 (= 30)
T4
Update Acc3
Acc3 = Acc3 -10(= 20)
T5
Fetch Acc1 (= 40)
T6
Update Acc1
Acc1 = Acc1 + 10
(= 50)
T7
Commit
T8
What is the value of the sum total? Is it “right”?
Requirements of Concurrency Control
To avoid these problems transactions must be serializable.
Serializability: (informal defn)
The outcome of a non-serial schedule is the same as that
of a serial schedule of the same transactions.
A non-serial schedule is any sequence of reads and writes
performed by a set of concurrent transactions.
A serial schedule is a schedule where the read and write
operations of each transaction are executed consecutively
without any interleaved operations from other transactions.
Transaction 1 Transaction 2
May by
serialised as
Transaction 1 or as
Transaction 2
Transaction 2
Transaction 1
There is no interference, but still the two sequences lead to
different results.
Concurrency Control: Locking.
Basically, locking is simple.

When a transaction is going to use some record or table
and cannot risk that object changing then it acquires a lock on
the object.

The effect of the lock is to prevent other transactions from
gaining access to that object.
 As usual the actual locking operations in most DBMS are more
complex than the above, for example
 locking a whole table may be unduly restrictive
 the use of a single mode locking strategy prevents shared 'read only' access
 Serialisability is not guaranteed with locks only.
Most DBMS use
Read/Write, Multi-level, Dual Phase Locking
Strategy
Locking Modes
•Read Locks
reading is none destructive so multiple readers can share a read
locked object;
writers are prevented from obtaining read locked objects
•Write Locks
writers are destructive, readers and writers cannot obtain
write locked objects
•Some DBMSs (e.g. Oracle) provide for lock conversion (aka
upgrade) automatically as appropriate
Multiple Level
•
•
locks can be acquired at different levels, e.g. table and
row levels in Oracle.
may be escalated in certain DBMSs
Locking Phase
•
Growing Phase
locks are acquired throughout the transaction; no locks
are released until the end of the growing phase
•
Shrinking Phase
after 1st lock is released no additional locks can be
acquired
This strategy ensures serializability.
How does INGRES do it?
Setting Locks:
•
set automatically by the DBMS
Types of Locking:
•
•
•
X - exclusive (write) locks, only one transaction at a time may hold
this per resource.
S - shared (read) locks, multiple transactions can hold an S lock on
this resource, but no transaction can obtain an X lock on a resource
while any S
lock is held on it.
IX/IS - intended X/S locks, taken on tables before X and S locks are
taken at page level.
Locking Levels:
•
•
Page level locks, physical device pages used to limit locks to relevant
pieces of a table (needs the table to be physically distributed in a
predictable manner, e.g. hashed key).
Table level locks, lock the whole table.
Summary of locking level usage in INGRES:
Query
Comment
Lock Type
Lock Level
select
for each table in
select
IS
table
S
page
OR
update insert delete
if > 10 pages involved
S
table
for table involved
IX
table
X
page
OR
if >10 pages involved
X
table
AND
other tables involved
create drop modify
same as select
X
table
Oracle uses a similar approach, but SELECT statement (without FOR UPDATE in
SELECT) does not lock data.
Page Vs Table Locking.


Lock Escalation
If a query uses > 10 pages then a table
lock is taken. This prevents a large
number of page locks being accumulated
during execution.
If during execution more than 10 page
locks are acquired on a table then they
are promoted to a table lock, and the
accumulated page locks are dropped.
Example 1
SELECT * FROM emp
WHERE name = "Jack"
Actions:
1) acquire table level IS lock on emp.
2) acquire page level lock on page 2 of emp table, assuming "Jack" is on page 2.
3) execute query.
4) drop all locks.
Example 2
BEGIN TRANSACTION
SELECT * FROM dept
WHERE dname = "ADMIN"
UPDATE emp
SET salary=40000
WHERE position = "MGR"
END TRANSACTION
Actions:
1) acquire table level IS lock on dept.
2) acquire page level S lock on page 1 of dept.
3) execute select query.
4) acquire table level X lock on emp, (position is not the primary key and is not
indexed so cannot use page locking)
5) execute update query.
6) drop ALL locks.
What if you request a lock on a locked table?
Lock Compatibility:
IS
IX
S
SIX
X
IS
yes
yes
yes
yes
no
IX
yes
yes
no
no
no
S
yes
no
yes
no
no
SIX
yes
no
no
no
no
X
no
no
no
no
no
•
An attempt to obtain a lock on a previously locked object may result in the
requesting transaction being blocked.
•
A blocked action is held in a wait state until the resource requested is freed.
BUT
this blocking may lead to DEADLOCK!
Consider:
Transaction A
Time
Begin Transaction
T1
T2
Select * from emp
(S lock on emp table)
Begin Transaction
T3
T4
Insert into dept(name)
values("Sales")(request IX
lock on dept)** request
blocked
Transaction B
Select * from dept
(S lock on dept table)
T5
T6
Insert into emp(name)
values ("Bill")
(request IX lock on emp)
waiting
T7
waiting
……
T8
……
DEADLOCK!
Question:
If the DBMS can detect deadlock can the above situation be
resolved adequately?
Answer:
Yes (with varying levels of sophistication).

INGRES aborts (issues a rollback) one of the transactions
arbitrarily and reports an error to the user.

Oracle reports an error to one of the transactions and expects the
signalled transation to roll back explicitly to allow the other
transaction to continue.

More elegant systems should rollback one or other transaction
(silently, no need to concern the user) and reschedule the
transaction from the information in the system log.

Really elegant systems will rollback the transaction which had
performed the least work at the time the deadlock was detected.
Beware of transaction starvation (Livelock) solvable with a
priority system.
16.6 Deadlock Handling
System is deadlocked if there is a set of
transactions such that every transaction in the
set is waiting for another transaction in the set.
Deadlock prevention protocols ensure that the
system will never enter into a deadlock state.
Some prevention strategies :
Require that each transaction locks all its data items before it
begins execution (predeclaration).
Impose partial ordering of all data items and require that a
transaction can lock data items only in the order specified by
the partial order (graph-based protocol).
More Deadlock Prevention Strategies
Following schemes use transaction
timestamps for the sake of deadlock
prevention alone.
wait-die scheme — non-preemptive(非抢占)
older transaction may wait for younger one to release data
item. Younger transactions never wait for older ones; they are
rolled back instead.
a transaction may die several times before acquiring needed
data item
wound-wait scheme — preemptive
older transaction wounds (forces rollback) of younger
transaction instead of waiting for it. Younger transactions may
wait for older ones.
may be fewer rollbacks than wait-die scheme.
Deadlock prevention (Cont.)
Both in wait-die and in wound-wait schemes, a
rolled back transactions is restarted with its
original timestamp.
Older transactions thus have precedence over newer ones,
and starvation is hence avoided.
Timeout-Based Schemes :
a transaction waits for a lock only for a specified amount of
time. After that, the wait times out and the transaction is rolled
back.
thus deadlocks are not possible
simple to implement; but starvation is possible.
Also difficult to determine good value of the timeout interval.
Deadlock Detection
Deadlocks can be described as a wait-for
graph, which consists of a pair G = (V,E),
V is a set of vertices (all the transactions in the system)
E is a set of edges; each element is an ordered pair Ti Tj.
If Ti  Tj is in E, then there is a directed edge from Ti to Tj,
implying that Ti is waiting for Tj to release a data item.
When Ti requests a data item currently being held by Tj, then
the edge Ti Tj is inserted in the wait-for graph. This edge is
removed only when Tj is no longer holding a data item needed
by Ti.
 The system is in a deadlock state if and only if the
wait-for graph has a cycle. Must invoke a deadlockdetection algorithm periodically(周期性地) to look for
cycles.
Deadlock Detection (Cont.)
Wait-for graph without a cycle Wait-for graph with a cycle
Deadlock Recovery
When deadlock is detected :
Some transaction will have to rolled back (made a victim) to
break deadlock.
 Select that transaction as victim that will incur minimum cost.
Rollback -- determine how far to roll back transaction
 Total rollback: Abort the transaction and then restart it.
 More effective to roll back transaction only as far as necessary to
break deadlock.
Starvation happens if same transaction is always chosen as
victim.
 Include the number of rollbacks in the cost factor to avoid
starvation
Descargar

幻灯片 1