Data Management Issues for
Peer-to-Peer Computing
Anastasios Kementsietsidis,
Kenneth Cheung, Verena Kantere
A motivating example…
Consider the world of electronic medical records (EMR)

A person’s EMR is always dispersed across a number of
databases that belong to hospitals, physicians, insurance
agencies, etc.

Each database uses a different EMR format

EMR interchange, although crucial, is hindered by the number
of formats currently available

Moreover, it is not always known beforehand where one can
find the different parts of a person’s EMR
Copyright  2001
Anastasios Kementsietsidis
Elicited requirements
Any multi-database system for EMRs must satisfy the following
requirements

It should allow autonomous and heterogeneous local databases
to participate in the multi-database system

Local databases may store overlapping and/or complementary
information

Some form of interoperation is necessary in order to:
facilitate EMR interchange, and
keep existing EMRs consistent with each other

A query mechanism is necessary to facilitate EMR retrieval/discovery
Outline (of what follows)




Introduction of a multi-database architecture that meets the
mentioned requirements
Comparison of the architecture with existing ones
Overview of the research issues that need to be addressed
Summary of conclusions and discussion of future topics of interest
The proposed architecture
Main characteristics of the multi-database system (MDBS)

Each local database (LDBS) is autonomous in the sense that it has:
Execution autonomy

Communication autonomy

Association autonomy
However, it does not have complete design autonomy


Each LDBS is acquainted with some of the other LDBSs:
Information dissemination occurs only between acquainted databases

The architecture is dynamic in the sense that:
Each LDBS can enter/leave the MDBS at any point in time
While part of the MDBS, an LDBS can unilaterally decide to go online/offline
Acquaintances are not fixed and they can change over time
A P2P multi-database for EMRs
[Figure: network of acquainted peers labeled HDB, DDB and PDB; callouts: "Servent" and "HDB: any commercial DBMS, e.g. DB2, Oracle"]
Existing architectures

Distributed database systems (DDS)
DDS configuration is static and LDBSs have no autonomy of any form.

Federated database systems (FDBS)
In an FDBS, LDBSs have design and (more) association autonomy.
Existing FDBSs are classified into:

Tightly coupled: DBA-oriented construction of schemas involving large
overhead. System configuration is considered rather static.

Loosely coupled: User-oriented construction of schemas supporting
read-only operations. System configuration is dynamic.

Multi-database systems (MDBS)
MDBSs offer read-only operations and can be classified into:

Global schema integration systems: DBA-oriented construction of schemas

Multi-database language systems: User-oriented construction of schemas
Comparison of approaches
Differences between the introduced architecture and the existing ones:

Respect of LDBS autonomy

No centralized control of any form (global schemas, global
dictionaries, global directories, global administrators etc.)

Support for LDBS interoperability in terms of both transaction
and query processing
Research Issues
to be addressed...

Heterogeneity of LDBSs
Resolve the issue of heterogeneity between pairs of acquainted LDBSs.

Entity identification
Identify data items that reside in different LDBSs and are semantically
related. Use this knowledge both during transaction processing, and
during query answering

Transaction processing
Construct a mechanism that allows the propagation of changes between
semantically related data items. (presented by Verena)

Query answering
Construct a mechanism that rewrites queries and propagates them between
acquainted LDBSs. (presented by Ken)
Setting up acquaintances
A major objective of the acquaintance procedure is to resolve
heterogeneity issues between the acquainted LDBSs
DDB: Patient(Fname, Lname, Age, …)
HDB: Inpatient(Name, DOB, Sex, …)
Requirements for the setup procedure:

Does not obstruct the operation of an LDBS

Requires minimum overhead (time/DBA workload)

It is incremental
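
To make the heterogeneity concrete, the sketch below converts an HDB Inpatient record into the shape of a DDB Patient record. It is a hypothetical illustration only: the function name, the assumption that Name holds "first last" separated by a single space, and the derivation of Age from DOB are not taken from the slides.

```python
from datetime import date

def inpatient_to_patient(inpatient, today):
    """Map Inpatient(Name, DOB, ...) onto Patient(Fname, Lname, Age, ...).

    Assumes Name is "first last"; anything after the first space is
    treated as the last name. Other attributes are left out.
    """
    first, _, last = inpatient["Name"].partition(" ")
    dob = inpatient["DOB"]
    # Subtract one year if the birthday has not yet occurred this year.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    age = today.year - dob.year - (0 if had_birthday else 1)
    return {"Fname": first, "Lname": last, "Age": age}

record = {"Name": "Ann Lee", "DOB": date(1970, 6, 15), "Sex": "F"}
patient = inpatient_to_patient(record, today=date(2001, 6, 14))
```

A mapping like this only needs to be written for each pair of acquainted LDBSs, which is in line with the incremental setup requirement.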
Rules
Rules provoke the automatic, predetermined exchange
of information among peers
The rule mechanism is responsible for the:
1. Creation of rules
2. Triggering of rules
3. Coordination of triggering of rules
Rules
Example: Suppose that a doctor wants to retrieve the medical records of
patients with a stomach ache who were previously in the hospital. After
the examination, he wants to prescribe a new medicine to the patient.
DocDB
Patient(SIN, Name,….)
Appointment(AppID,SIN,Symptom,…)
HospDB
Inpatient(SIN, Name,….)
MedRec(MedrecID, SIN, Date,…)
PharmDB
Prescription(PrID, SIN,…)
Contents(PrID, DrugID, Dose,..)
Creation of Rules




Rules are triggered by events (as in active databases) =>
an event language is needed
Rules are of the Event-Condition-Action (ECA) form:
Rule <rule name>:
is fired by <event expression>
checks <condition expression> //optional
performs <action expression>
Expressions can be simple or composite
Composite expressions can contain elements (events,
conditions, actions) from more than one database
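
As a rough illustration of the ECA form, the rule template can be modeled as a small data structure. This is a minimal Python sketch, not the system's rule language: the class Rule, its field names, and the fire method are hypothetical, and a composite event expression is assumed to be a plain AND of named events.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional

@dataclass
class Rule:
    """Hypothetical in-memory form of 'is fired by / checks / performs'."""
    name: str
    events: FrozenSet[str]                  # all must occur (composite AND)
    action: Callable[[dict], None]          # the 'performs' part
    condition: Optional[Callable[[dict], bool]] = None  # 'checks' is optional

    def fire(self, occurred: set, ctx: dict) -> bool:
        """Run the action if every event occurred and the condition holds."""
        if not self.events <= occurred:
            return False
        if self.condition is not None and not self.condition(ctx):
            return False
        self.action(ctx)
        return True

performed = []
r = Rule(
    name="r",
    events=frozenset({"e1", "e2"}),
    action=lambda ctx: performed.append(ctx["value"]),
    condition=lambda ctx: ctx["value"] > 0,
)
fired = r.fire({"e1", "e2"}, {"value": 1})
```

Keeping the condition optional mirrors the template, where the checks clause may be omitted.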
Creation of Rules
Example:
Rule r1:
is fired by
update on DocDB.Appointment(AppID, SIN,Symptom,..)
AND update on HospDB.Inpatient(SIN,…)
checks Symptom = “Stomach ache”
performs retrieve HospDB.MedRec(MedrecID,SIN,….) AND
update PharmDB.Prescription(PrID, SIN,….) AND
update PharmDB.Contents(PrID, Drug, Dose,…)
Creation of Rules
A rule is broken into ‘sub-rules’ (one for each DB involved in
the event expression) which are installed in the appropriate DB
Example:
Subrule DocDB_subrule:
fired by update on DocDB.Appointment(AppID, SIN, Symptom,..)
performs notify DocDB
Subrule HospDB_subrule:
fired by update on HospDB.Inpatient(SIN,…)
performs notify DocDB
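
The decomposition can be sketched as grouping the terms of the event expression by database. The function split_into_subrules and the dictionary layout are assumptions made for illustration; only the sub-rule names and the "notify DocDB" action come from the example.

```python
from collections import defaultdict

def split_into_subrules(rule_name, events, coordinator):
    """Build one sub-rule per database named in the event expression.

    events: list of (database, relation) pairs from the event expression.
    coordinator: the database that each sub-rule notifies.
    """
    by_db = defaultdict(list)
    for db, relation in events:
        by_db[db].append(relation)
    return {
        db: {
            "rule": rule_name,
            "name": f"{db}_subrule",
            "fired_by": [f"update on {db}.{rel}" for rel in relations],
            "performs": f"notify {coordinator}",
        }
        for db, relations in by_db.items()
    }

subrules = split_into_subrules(
    "r1",
    [("DocDB", "Appointment"), ("HospDB", "Inpatient")],
    coordinator="DocDB",
)
```

Each entry of the result would then be installed in the database it is keyed by.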
Rule Triggering
Rule triggering includes:
Propagation of events after their occurrence:
According to the sub-rules installed in a DB, instances of event
subexpressions are propagated

Propagation of queries asking about the evaluation of
conditions:
The condition expression is broken into parts, like the event expression,
and a query is formed and evaluated for each one of them

Propagation of actions:
The action expression is also broken into parts, which are sent to the
appropriate DBs.
Rule Triggering Coordination
Features of rule triggering coordination:
For each rule there is a database responsible for its
coordination: in the example, DocDB is responsible for
coordinating the triggering of rule r1

The evaluation of the rule is distributed: the sub-rules
DocDB_subrule and HospDB_subrule are evaluated separately
An algorithm for combining the partial evaluations is
being developed
Rules have a fixed priority within a context of rules, so there are
no conflicts among rules
Issues in Rule Triggering and
Rule Triggering Coordination
Key issues in the triggering of rules and its coordination
are:
How are events used by rules?
Can the same event be used in more than one rule?
Can events be considered ‘out of date’?
How are events ‘summarized’ in case of late
propagation?
How does the distributed evaluation of a rule affect
the efficiency of the system?
How are cycles and deadlocks among rules detected?
Query Processing
Consists of two components:
1. Query language
- (e.g. SQL, QBE or even natural language)
- Specifies what the user wants to get.
2. Evaluation plan
- (i.e. executing the query)
- Shows how to get the result.
Query Processing in P2P
Multidatabase



A typical centralized database retrieves the
result locally.
A distributed database retrieves the result
according to the global schema.
A P2P database retrieves results not only locally
but also from other peers,
without following any global schema.
Issues in P2P Query Processing
Heterogeneity
1. Semantic heterogeneity
i.e. the same word has different meanings, or different
words mean the same thing.
2. Schema heterogeneity
i.e. each database has a different schema.
3. Query language heterogeneity
i.e. different DBMSs accept different query
languages.
Issues in P2P Query Processing
(Continued)

Execution
1. How to handle a global query
how to execute a global query
how to decide when to stop propagating it to other peers
2. How to handle efficiency
the concern is how to reduce execution time without losing
information. Some possible solutions:
reduce network traffic by minimizing the number of peers that
need to be accessed
utilize extra resources: some peers may be more powerful,
e.g. faster CPU, higher network bandwidth, etc.
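
One simple way to decide when to stop propagating, and to avoid contacting a peer twice, is a hop limit combined with a set of already visited peers. The sketch below is an assumption for illustration, not the mechanism of the prototype; propagate, acquaintances, and max_hops are hypothetical names, and the peer names are made up.

```python
from collections import deque

def propagate(start, acquaintances, max_hops):
    """Return the peers a query reaches, in breadth-first order.

    acquaintances: dict mapping a peer to the peers it is acquainted with.
    max_hops: hypothetical hop limit; 0 keeps the query local.
    """
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        peer, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop limit reached: do not forward any further
        for neighbour in acquaintances.get(peer, []):
            if neighbour not in visited:  # never contact a peer twice
                visited.add(neighbour)
                order.append(neighbour)
                queue.append((neighbour, hops + 1))
    return order

network = {
    "HDB1": ["DDB1", "PDB1"],
    "DDB1": ["HDB1", "HDB2"],
    "HDB2": ["DDB1"],
}
reached = propagate("HDB1", network, max_hops=1)
```

A smaller hop limit reduces network traffic at the risk of missing parts of a record held by more distant peers, which is the trade-off described above.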
The Prototype

Uses a query rewriting approach
rewrite the input query, which is defined over the local
schema, into one or more queries that can be
executed by other peers
Query rewriting using views
describe the information held by other peers using views
the resulting queries are defined over these
views
Example

Original query posted on an HDB
select Name from Inpatient where Name =
“someone”.

Query after rewriting
select PName from Patient where PName =
“someone”.
Patient is a view on HDB
Send the rewritten query to a DoctorDB
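
For this example, the rewriting amounts to renaming identifiers according to the view. The Python sketch below shows only that renaming step under stated assumptions: VIEW_MAPPING and rewrite are hypothetical, string literals are assumed to use single quotes, and general query rewriting using views involves considerably more than renaming.

```python
import re

# Hypothetical mapping from the local HDB schema to the Patient view.
VIEW_MAPPING = {"Inpatient": "Patient", "Name": "PName"}

def rewrite(query, mapping):
    """Rename whole-word identifiers; quoted literals are left untouched."""
    def rename(match):
        literal, word = match.group(1), match.group(2)
        if literal is not None:
            return literal  # keep string constants exactly as written
        return mapping.get(word, word)
    # First alternative matches a quoted literal, second a bare word.
    return re.sub(r"('[^']*')|(\b\w+\b)", rename, query)

local_query = "select Name from Inpatient where Name = 'someone'"
rewritten = rewrite(local_query, VIEW_MAPPING)
```

Matching literals before bare words keeps a constant such as 'Name' from being renamed by accident.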
Related work

Multidatabase Languages




users construct queries over the schemas of all
databases.
users have full control over what to retrieve.
users need a good understanding of all databases.
Global Schemas



users construct queries over the global schema.
the global schema requires central administration.
the global schema approach sacrifices autonomy.
Conclusions and future work…
What we covered thus far:

Introduced a (new) multi-database architecture that is based on the
Peer-to-Peer paradigm

Compared our architecture with existing ones, outlining its differences
and proving its importance

Offered an overview of the research issues that need to be addressed
Further topics of interest:

Extend the current mechanisms for transaction processing
and query answering

Investigate the performance of the architecture in real-life applications