Object-Relational Database
Systems:
Evolution Beats Revolution
Michael J. Carey
IBM Almaden Research Center
Plan for the Talk
The relational DBMS revolution
 Relational model and query language
 Why relational succeeded
 Why relational isn't enough, and some options
The object-oriented DBMS revolution
 Object-oriented model(s) and query language(s)
 Why object-oriented "failed"
 Why wrappers will fail as well
The object-relational DBMS evolution
 The object-relational model and query language
 Current products and examples
 Performance and other challenges
The Relational DBMS Revolution
The pre-relational era (1970's)
 Graph-based data models
Hierarchical model (e.g., IMS)
 Network model (e.g., Codasyl)


Low-level, navigational interfaces

Labor-intensive and error-prone
The relational era (1980's)
 Simple, abstract data model
Database = set of relations ("tables")
 3 schema levels: views, base tables, physical schema
 Algebra of set-oriented operations


High-level, declarative interfaces
SQL, Quel, QBE
 Embedded languages, 4GLs

The Relational Model (by example)
Employees and departments
Department
Employee
dno
10
20
eno
1
7
22
name
Toy
Shoe
name
Lou
Laura
Mike
salary
10000000
150000
80000
?
select E.name, E.salary, D.no
from Employee E, Department D
where E.salary < 100000
and D.name = 'Shoe'
and E.dept = D.dno
dept
10
20
20
Relational DBMS "Goodies"
Relational query processing
 Queries range over tables and/or views
 Programmers use a declarative language (SQL)
 Query optimizer picks the lowest-cost query plan


Alternative access paths, join orders, join methods, and so
on (based on indices and database characteristics)
Result: data independence
Support for (shared) business logic
 Integrity constraints


Check constraints, referential integrity constraints
Triggers, stored procedures, views, authorization
Performance and robustness
 Buffering, locking, crash recovery, replication, ...
We've Achieved Nirvana ... Right?
Relations are surely the answer!
 Simple, high-level model for programmers
 Easy to distribute data and parallelize queries
But what was the question?
 Sometimes difficult to model "real world" data
Entities and relationships (versus tables)
 Variance among entities (versus homogeneity)
 Set-valued attributes (versus normalization)

Demanding new database applications
 New applications bring new data types
 Complex objects are problematic

"A relational database is like a garage which
forces you to take your car apart and store
the pieces in little drawers..."
What are the Options?
Throw in the towel
 OOPL + your favorite file system
Object-oriented DBMS
 Tightly integrated: OOPL w/built-in DBMS
Object-oriented client wrapper
 Loosely integrated: OOPL + relational DBMS
Object-relational DBMS
 Newly integrated: Relational model + OO features
Which solution is the "right" one...?
Let's Examine the Problem Space
Stonebraker's 4-quadrant model
Complex
Simple
OO DBMS
O-R DBMS
File
System
Relational
DBMS
Queries
Complex
The Object-Oriented DBMS Revolution
Motivated by new database applications, e.g.:
 Computer-aided engineering
 Document management
 Geographic data management
Engineering applications were early drivers
 Complex data structures ("pointer spaghetti")
 Navigational data access required
 Tight coupling between applications and data
 Version management support needed
Approach: OOPL + DBMS = OO-DBMS
 Commonly based on C++ or Smalltalk
 Persistence, collections, versions, queries, ...
No OO "Ted Codd" Stepped Forward
Object-Oriented Database System Manifesto
 Mandatory features
Complex objects, identity, encapsulation
 Inheritance w/substitutability and late binding
 Computationally complete methods
 Extensible type system, persistence
 Secondary storage, concurrency and recovery
 Ad hoc queries


Optional features
Multiple inheritance, static type checking
 Distribution, long transactions, versions


Individual choices
Programming paradigm/language
 Details and uniformity of object model

OO-DBMS Technology Today
Lots of research results
 Object data models and features
 OO query languages and processing techniques
 Client-server architectures and performance
Significant commercial progress
 Important and innovative systems


E.g., O2, ObjectStore, ODE
Quite a few commercial product offerings

GemStone, Objectivity, ObjectStore, Ontos, O2, Matisse,
Poet, Versant, others
The ODMG-93 standard (release 2.0)
 Consortium of OO-DBMS startups
 Three key parts: ODL, OQL, C++ binding
But the Revolution "Failed" ($0B)
Lingering OO-DBMS differences
 Query power, API details, implementation twists
 Piecewise ODMG standard conformance (ex: OQL!)
Still behind R-DBMSs in important ways
 Codasyl-like schema compilation cycle
 Schema evolution painful, if supported
 Typically missing many useful "goodies"
Support for multiple application languages
 Query optimization, views, authorization, constraints, triggers,
multi-user scalability and robustness, ...

Other factors (niche market)
 SQL-based application building tools
 Architecturally biased towards "fat clients"
OO Client Wrappers are the Answer...
Available from a number of vendors
 Persistence Software, Ontologic, HP, Next, ...
Language-specific relational wrappers
 Proxy classes for C++ or Smalltalk (or Java)
 Mapping of row data into language objects
 Client-side (or middle-tier) object caching and
method execution
Why is this approach attractive?
 Can develop OO applications today, against
existing enterprise data, for "business objects"
...Not!
Paradigm mismatch for querying
 C++ or Smalltalk for simple business logic
and navigation, against object-oriented
schema
 SQL for queries, against relational schema
Choice forced for business logic & rules
 Do on server, using DBMS facilities?


Check constraints, referential integrity constraints,
triggers, stored procedures, authorization
Do on client, using OO wrapper facilities?

C++ or Smalltalk (or Java) programming
This had better be a stop-gap solution
 R-DBMS could become a storage manager,
throwing away 20+ years of successful R&D!
The Object-Relational DBMS Evolution
 Third Generation Database System Manifesto
 Support rich object structures and rules
Rich type system, inheritance, encapsulation
 Functions, optional unique ids, rules/trigggers


Subsume second generation database systems
High-level query-oriented interface
 Stored and virtual collections
 Updatable views
 Data model/performance feature separation


Open to other subsystems (tools, middleware)
Accessible from multiple languages
 Layered persistence-oriented language bindings
 SQL support ("intergalactic dataspeak")
 Query-shipping architecture

"Not Your Father's Employee Type"
Beyond name, rank, and serial number
 Several new attribute types


Location (2-d point), job description (text), photo (image), ...
Associated functions

Distance(point, point), contains(text, string), ...
Beyond your basic employee record
 Employees come in different flavors


Employees have many known relationships


Emp, RSM, Programmer, Manager, Temp, ...
Manager, department, projects, ...
Employees have behavior

Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a "business object"
Two Flavors of O-R Object Extensions
Object extension #1: Abstract data types (ADTs)
 New column types and functions
E.g.,text, image, audio, video, time series, point, line, OLE...
 For modeling new kinds of facts about enterprise entities

Object extension #2: Row types
 Types and functions for rows of tables
Includes inheritance, references, set-valued attributes
 For modeling business objects with relationships & behavior

Impact on schemas and query language: SQL3
 Schemas: tables at the top, OO richness within
 Queries: extensions to support the added richness
 Structured types: support both ADT and row type
object modeling needs (unified type system)
ADTs (Black Box)
To define and use a "black box ADT", a user will
 Implement its internal structure and functions in an
external programming language (e.g., C/C++, Java)
 Use the DDL to register the type with the DBMS
Size of an instance of the type
 Input (constructor) and output functions
 Other functions and operators, including signatures and
linkable implementations
 Costs and other properties for query optimizer


Use the new type like a built-in data type
Now available for defining columns of tables
 Functions and operators become available in queries

Example: Illustra Black Box ADT
Point as a "black box ADT" (written in C)
create type Point
(
internallength = 16;
input = point_in;
output = point_out;
);
-- typedef struct {double x, double y} point
-- for reading in Point constants
-- for displaying Point results
create function point_in(Text) returns Point as
external name 'MI_HOME/functions/point.so'
language C;
create function point_out(Point) returns Text as
external name 'MI_HOME/functions/point.so'
language C;
Example: Illustra Black Box ADT (cont.)
Now we can put an end to "Pointless" queries...!
create function further_west(Point, Point) returns Boolean as
external name 'MI_HOME/functions/pointfuns.so'
language C;
select E1.name, E1.location
from Emp E1, Emp E2
where further_west(E1.location, E2.location) and E2.name = 'Mike';
create binary operator binding to further_west;
select E1.name, E1.location
from Emp E1, Emp E2
where E1.location >> E2.location and E2.name = 'Mike';
ADTs (White Box)
To define and use a "white box ADT", a user will
 Describe its internal structure using SQL3 DDL
Attribute definitions are column-like
 Advantages: heterogeneity, nulls, nesting, constraints, ...


Implement its functions either directly in SQL or in
his/her favorite external programming language


Finish explaining the type to the DBMS using DDL


For query optimizer, as before
Use the new type like a built-in data type


Utilize system-generated accessors and mutators
In tables and queries, as before
Note: this is just a SQL3 structured type definition that's
primarily intended for use in columns
Example: DB2 UDB/OSF White Box ADT
Point as a "white box ADT" (written in SQL3)
create type Point as
(
x double,
y double,
);
create function distance(p1 Point, p2 Point) returns Point
language SQL inline not variant
return sqrt((p2..y-p1..y)*(p2..y-p1..y) + (p2..x-p1..x)*(p2..x-p1..x));
select E.name
from Emp E, City C
where C.name = 'San Jose'
and distance(E.location, C.center) < 25;
Of Extenders, Blades, and Cartridges
High performance demands "deep" integration
 Optimizer must know about an ADT operator's...
Execution cost (especially for expensive functions)
 Logical properties (e.g., transitivity, negator, ...)
 Selectivity estimates (i.e., filtering/matching power)
 Relationship to access methods (both old and new)


DBMS runtime must invoke functions efficiently

Static vs. dynamic loading, fenced vs. unfenced execution
Partnerships and third-party packages
 E.g., DB2's text, image, and spatial extenders
 Package contains types, functions, access methods,
optimizer information, and SQL DDL statements for
all of the above
Row Types
To define and use a "row type", a user will
 Create the desired structured type using SQL3 DDL


Create functions/methods involving the type


Columns, plus (optional) specification of a supertype
Arguments of the new type, w/overloading in the case of methods
Create one or more tables of the indicated type

Type hierarchy (if any) yields corresponding table hierarchies
Type Hierarchy
P e rso n _ t
Em p_t
K id _ t
Table Hierarchy
I B M _ P e o p le
IB M _ E m p s
IB M _ K id s
Example: SQL3 Row Types (plus Sets...)
Employees are people, so ...
create type Person_t as(
name Varchar(20),
birthdate Date)
method age( ) returns Integer language SQL;
create method age( ) for Person_t
return year(current date) - year(birthdate);
create table IBM_People of Person_t (ref is self);
(**Note: this is approximate SQL3 syntax)
create type Emp_t
under Person_t as (
salary Float,
job_description Varchar(100),
department ref(Dept),
projects set(ref(Project)
);
create table IBM_Emps of Emp_t
under IBM_People (...);
Queries Over Row Types
SQL's query constructs, extended with the ability to
access these features (a la SQL3 plus sets)
User-defined functions in queries (w/late method binding)
 Dereferencing of references (path expressions)
 Queries over nested collections (table expressions)

For example, find unexplainable discrepencies
between employees' and managers' salaries:
select E.name, E.manager->name, display(E.photo)
from IBM_Emps E
where E.salary > E.department->manager->salary
and E.department->manager->age( ) > E.self->age( )
and not contains(E.job_description, "Java")
Other OR-Related Features
Support for large objects
 Multimedia data types aren't small (e.g., video)
 Special handling required for efficiency


Minimal copying, piecewise retrieval, optional logging, movement to/from
files, separate storage area from other attributes
DB2 has blob, clob, and dbclob types (up to 2GB)
Support for active data (triggers and constraints)
 Ex:
create trigger me_too
after insert on IBM_Emps
referencing new as newemp
foreach row mode db2sql
when salary > department->manager->salary
begin atomic
set newemp.department->manager->salary
= newemp.salary;
end
OR-DBMS Technology Status
Many OR-DBMS research results
 Postgres, EXODUS, Starburst, ...
 OODB query processing research
Commercial systems exist today
 IBM DB2 CS (V2.1) and CA-Ingres


Illustra, UniSQL/X


User-defined types & functions, large objects, triggers
Early providers of ADTs, row objects, inheritance
IBM DB2 UDB, Informix, Oracle

"Universal server" products contain subsets of all this stuff
Standards right around the corner
 SQL3 is "hardening" and has an object part with
structured types, table hierachies, user-defined
functions and methods, object views, ....
Some OR-DBMS Performance Issues
Bucky OR-DBMS benchmark from UW-Madison
 Based on a hypothetical university schema
 Exercised a range of OR-DBMS features
Row types, inheritance, late binding, subtables
 Queries involving path expressions and/or sets
 ADTs (black or white box) and functions


In Proc. 1997 ACM SIGMOD Conference
Tested a first-generation OR-DBMS product
 OR versus relational simulation, same DB engine
Showed benefits of (complex) ADTs, indexes on functions
 Indicated areas where query optimization needs schema
support: scope for path expressions, inverse relationships
 Turned up bugs and performance problems (e.g., sets)

OR Enterprise Scenerio (w/Challenges)
Object-relational server managing the database
 ADTs w/inheritance and multi-language support
 Row types, integrated with all of SQL (OO views,
authorization, triggers, constraints, etc.)
High-function, OO, caching front-ends
 Support for desktop and middle-tier (web!) applications
 OR object model at all levels, for queries and navigation
 Clean bindings for OOPLs (Java, C++, Smalltalk)
 Methods/queries running on client or server
 Likewise for triggers and constraints
Business rules specified & implemented once!
 In SQL (+ OOPL), running where appropriate
Multi-Tier Integration Challenges
Good mappings and interfaces to provide
object-relational objects to OOPLs
Java, C++, Smalltalk, others
 Full query support in addition to navigation

Challenges in querying and caching
 Intelligent querying over cache + database
 Correct and efficient caching of view objects
Update-related challenges
 Triggers and constraints of all types
 View objects (both directions)
Method execution on client or server
 Java should be very useful here
Legacy Data Access Challenges
Some data will live outside the OR-DBMS
 Older DBMSs (both relational & pre-relational)
 Specialized data stores (documents, images, ...)
 Applications (i.e., legacy transactions)
Object-relational middleware is the answer!
 Table functions can handle simple cases now
 Distributed OR query engine (a la DataJoiner) can
mediate between new applications and legacy data
 Resulting appearance is that of an integrated OR
database, accessible via SQL3 APIs and OO tools
Front-End Integration + Legacy Data Access
C++ Client
Co-op Cache
Java Client
Co-op Cache
Smalltalk Client
Co-op Cache
Co-op Interface
Object-Relational
Engine
Query
Object Wrappers
OR-DBMS
R-DBMS
Image Mgr
Text Mgr
Conclusions
Relational DBMS era: 1980's, early 1990's.
 Significantly raised the levels of abstraction & productivity
 Only "real" parallel computing success story to date, too!
Object DBMS era: Should have been early 1990's...
 Never made it out of the (mainstream) starting gate
Object-relational DBMS era: You are there!
 Object enhancements to relational DBMSs
ADTs (white box, black box) and functions
 Row types with inheritance, references, sets, ...


Vastly reduces the "impedence mismatch" w/OOPLs
Today's OO wrappers are an interim solution
 Possibilities abound for nice OO/OR tools


Will have OR middleware as well as engines
Descargar

No Slide Title