OCL4 Oracle 10g:
SQL & PL/SQL
Session #8
Matthew P. Johnson
CISDD, CUNY
January, 2005
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
1
Agenda

Security & web apps

RegEx support in 10g

Oracle & XML

Data warehousing

More on the PL/SQL labs

Any more lab?
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
2
Review: Why security is hard

It’s a “negative deliverable”

It’s an asymmetric threat

Tolstoy: “Happy families are all alike; every
unhappy family is unhappy in its own way.”


Analogs: “homeland”, jails, debugging, proofreading, Popperian science, fishing, MC algs
So: fix biggest problems first
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
3
DB users have privileges







SELECT: read access to all columns
INSERT(col-name): can insert rows with nondefault values in this column
INSERT: can insert rows with non-default values in
all columns
DELETE
REFERENCES(col-name): can define foreign keys
that refer to (or other constraints that mention) this
column
TRIGGER: triggers can reference table
EXECUTE: can run function/SP
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
4
Granting privileges (Oracle)


One method of setting access levels
Creator of object automatically gets all
privileges to it



Possible objects: tables, whole databases, stored
functions/procedures, etc.
<DB-name>.* - all tables in DB
A privileged user can grant privileges to other
users or groups
GRANT
GRANTSELECT
privileges
ON mytable
ON object
TO someone
TO
WITH
userGRANT
<WITHOPTION;
GRANT OPTION>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
5
Granting and revoking

Privileged user has privileges

Privileged-WGO user can grant them, w/wo GO

Granter can revoke privileges or GO
Revocation cascades by default




To prevent, use RESTRICT (at end of cmd)
If would cascade, command fails
Can change owner:
ALTER TABLE my-tbl
OWNER TO new-owner;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
6
Granting and revoking


What we giveth, we may taketh away
mjohnson: (effects?)
GRANT SELECT, INSERT ON my-table TO
george WITH GRANT OPTION;

george: (effects?)
GRANT SELECT ON my-table TO laura;

mjohnson: (effects?)
REVOKE SELECT ON my-table FROM laura;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
7
Role-based authorization


In SQL-1999, privileges assigned with roles
For example:




Student role
Instructor role
Admin role
Each role gets to do same (sorts of) things
GRANT SELECT ON my-table TO employee;

Privileges assigned by assigning role to users
GRANT employee TO billg;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
8
Passwords

DBMS recognizes your privileges because it
recognizes you

how?

Storing passwords in the DB is a bad idea
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
9
Hashed or digested passwords
One-way hash function:

computing f(x) is easy;
Computing f-1(y) is hard/impossible;
Finding some x2 s.t. f(x2) = f(x) is hard/imposs
1.
2.
3.

Intuitively: seeing f(x) gives little (useful) info on x





“collisions”
x “looks random”
PRNGs
MD5, SHA-1
RFID for cars: http://www.rfidanalysis.org/
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
10
Built-in accounts

Many DBMSs (and OSs) have built-in demo
accounts by default


MySQL: root/(blank) (closed on sales)



In some versions, must “opt out”
http://lists.seifried.org/pipermail/security/2004-February/001782.html
Oracle: scott/tiger (was open on sales last
year)
SQLServer: sa/(blank/null)

http://support.microsoft.com/default.aspx?scid=kb;EN-US;313418
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
11
Query-related: Injection attacks

Here’s a situation:


Prompt for user/pass
Do lookup:
SELECT * FROM users
WHERE user=u AND password=p;

If found, user gets in

test.user table in MySQL
http://pages.stern.nyu.edu/~mjohnson/dbms/php/loginph
p.txt

http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php


Apart from no hashing, is this safe?
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
12
Injection attacks
SELECT * FROM users
WHERE user = u AND password = p;

We expect to get input of something like:
user: mjohnson
pass: secret



SELECT * FROM users
WHERE user= 'mjohnson' AND password
= 'secret';
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
13
Injection attacks – MySQL/Perl/PHP
SELECT * FROM users
WHERE user = u AND password = p;

Consider another input:
user: ' OR 1=1 OR user = '
SELECT * FROM users
 pass: ' OR 1=1 OR pass = '
WHERE user = ''

OR http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php
1=1
http://pages.stern.nyu.edu/~mjohnson/dbms/eg/injection.txt

OR user = ''
SELECT
* FROM
users
AND
password
= ''
WHEREOR
user
1=1= '' OR 1=1 OR user = ''
AND password
' OR 1=1 OR pass = '';
OR pass = '
'';
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
14
Injection attacks – MySQL/Perl/PHP
SELECT * FROM users
WHERE user = u AND password = p;

Consider this one:
user: your-boss' OR 1=1 #
pass: abc



http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php
SELECT
SELECT ** FROM
FROM users
users
WHERE
WHERE user
user == 'your-boss'
'your-boss' OR 1=1 #'
AND password
'abc';
OR 1=1 #'= AND
password = 'abc';
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
15
Injection attacks – MySQL/Perl/PHP
SELECT * FROM users
WHERE user = u AND password = p;

Consider another input:


user: your-boss
pass: ' OR 1=1 OR pass = '
http://pages.stern.nyu.edu/~mjohnson/dbms/php/login.php
SELECT * FROM users
SELECTuser
* FROM
users
WHERE
= 'your-boss'
WHEREAND
user
= 'your-boss'
AND password
password
= ''
= '' OR
OR 1=1
1=1 OR pass = '';
OR pass = '';
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
16
Multi-command inj. attacks (other DBs)
SELECT * FROM users
WHERE user = u AND password = p;

Consider another input:


user: '; DELETE FROM users WHERE user =
'abc'; SELECT FROM users WHERE
password = '
pass: abc
 SELECT
SELECT ** FROM
FROM users
users WHERE user = '';
DELETE
FROM =users
WHERE user
'abc';
WHERE user
''; DELETE
FROM =users
SELECT
FROM =users
WHERE
password
WHERE user
'abc';
SELECT
FROM = ''
password
= 'abc';
usersAND
WHERE
password
= '' AND
password = 'abc';
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
17
Multi-command inj. attacks (other DBs)
SELECT * FROM users
WHERE user = u AND password = p;

Consider another input:


user: '; DROP TABLE users; SELECT FROM
users WHERE password = '
pass: abc
 SELECT
SELECT ** FROM
FROM users
users WHERE user = '';
DROP
WHERETABLE
user users;
= ''; DROP TABLE users;
SELECT FROM
FROM users
users WHERE
WHERE password
password == ''
SELECT
ANDpassword
password=='abc';
'abc';
'' AND
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
18
Multi-command inj. attacks (other DBs)
SELECT * FROM users
WHERE user = u AND password = p;

Consider another input:


user: '; SHUTDOWN WITH NOWAIT; SELECT
FROM users WHERE password = '
pass: abc
 SELECT
SELECT ** FROM
FROM users
users WHERE user = '';
SHUTDOWN
WITH
NOWAIT;
WHERE user
= '
'; SHUTDOWN WITH
NOWAIT;
SELECT
FROM
users
WHERE = ''
SELECT
FROM
users
WHERE
password
AND password
'abc'; = 'abc';
password
= '' AND=password
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
19
Injection attacks – MySQL/Perl/PHP
DELETE FROM users
WHERE user = u AND password = p;

Consider another input:


user: your-boss
pass: ' OR 1=1 AND user = 'your-boss
http://pages.stern.nyu.edu/~mjohnson/dbms/php/users.php
 Delete your boss!
DELETE
DELETE FROM
FROM users
users
WHERE user = 'your-boss'
WHERE user = 'your-boss' AND pass = '
= '' = 'your-boss';
' OR AND
1=1 pass
AND user
OR 1=1
AND Matthew
user
= 'your-boss';
P. Johnson, OCL4, CISDD CUNY, Sept 2005
20
Injection attacks – MySQL/Perl/PHP
DELETE FROM users
WHERE user = u AND pass = p;

Consider another input:


user: ' OR 1=1 OR user = '
pass: ' OR 1=1 OR user = '
DELETEhttp://pages.stern.nyu.edu/~mjohnson/dbms/php/users.php
FROM users
WHERE everyone!
user = ''
 Delete
OR 1=1
DELETE FROM users
OR user = ''
WHERE user = '' OR 1=1 OR user = ''
AND pass = ''
AND pass = '' OR 1=1 OR user = '';
OR 1=1
OR user = '';
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
21
Preventing injection attacks


Ultimate source of problem: quotes
Soln 1: don’t allow quotes!


Q: Is this satisfactory?


Reject any entered data containing single quotes
Does Amazon need to sell O’Reilly books?
Soln 2: escape any single quotes



Replace any ' with a '' or \'
In Perl, use taint mode – won’t show
In PHP, turn on magic_quotes_gpc flag in .htaccess

show both PHP versions
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
22
Preventing injection attacks

Soln 3: use prepare parameter-based queries

Supported in JDBC, Perl DBI, PHP ext/mysqli

http://pages.stern.nyu.edu/~mjohnson/dbms/perl/loginsafe.cgi
http://pages.stern.nyu.edu/~mjohnson/dbms/perl/userssafe.cgi


Very dangerous: using tainted data to run
commands at the Unix command prompt


Semi-colons, prime char, etc.
Safest: define set if legal chars, not illegal ones
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
23
Preventing injection attacks



When to do security checking for quotes,
etc.?
Natural choice: in client-side data validation
But not enough!


As saw earlier: can submit GET and POST
params manually
 Must do security checking on server
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
24
More Info

phpGB MySQL Injection Vulnerability


http://www.securiteam.com/unixfocus/6X00O1P5PY.html
"How I hacked PacketStorm“

http://www.wiretrip.net/rfp/txt/rfp2k01.txt
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
25
SQL*Plus settings
SQL> SET RECSEP OFF
SQL> COLUMN text FORMAT A60
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
26
New topic: Regular Expressions

In automata theory, Finite Automata are the
simplest weakest of computer, Turing
Machines the strongest


Chomsky’s Hierarchy
FA are equivalent to a regular expression


Expressions that specify a pattern
Can check whether a string matches the pattern
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
27
RegEx matching



Use REGEX_LIKE
Metachar for any char is .
First, get employee_comment table:


http://pages.stern.nyu.edu/~mjohnson/oracle/empcomm.sql
Now do search:
SELECT emp_id, text
FROM employee_comment
WHERE REGEXP_LIKE(text,'...-....');

So far, like LIKE
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
28
RegEx matching

Can also pull out the matching text with
REGEXP_SUBSTR:
SELECT emp_id,
REGEXP_SUBSTR(text,'...-....') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'...-....');

If want only numbers, can specify a set of
chars rather than a dot:
SELECT emp_id, REGEXP_SUBSTR(text,
'[0123456789]..-...[0123456789]') text
FROM employee_comment
WHERE REGEXP_LIKE(text, '[0123456789].....[0123456789]');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
29
RegEx matching

Or can specify a range of chars:
SELECT emp_id, REGEXP_SUBSTR(text,
'[0-9]..-....') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'...-....');

Or, finally, can state how many copies to
match:
SELECT emp_id, REGEXP_SUBSTR(text,
'[0-9]{3}-[0-9]{4}') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'[0-9]{3}-[0-9]{4}');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
30
RegExp matching

Other operators:





* - 0 or more matches
+ - 1 or more matches
? - 0 or 1 match
Also, can OR options together with | op
Here: some phone nums have area codes, some
not, so want to match both:
SELECT emp_id, REGEXP_SUBSTR(text,
'[0-9]{3}-[0-9]{3}-[0-9]{4}|[0-9]{3}[0-9]{4}') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'[0-9]{3}-[0-9]{3}[0-9]{4}|[0-9]{3}-[0-9]{4}');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
31
RegExp matching

Order of ORed together patterns matters:

First matching pattern wins
SELECT emp_id, REGEXP_SUBSTR(text,
'[0-9]{3}-[0-9]{4}|[0-9]{3}-[0-9]{3}[0-9]{4}') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'[0-9]{3}-[09]{4}|[0-9]{3}-[0-9]{3}-[0-9]{4}');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
32
RegExp matching

There’s a shared structure between the two,
tho


Area code is just optional
Can use ? op
SELECT emp_id, REGEXP_SUBSTR(text,
'([0-9]{3}-)?[0-9]{3}-[0-9]{4}') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'([0-9]{3}-)?[09]{3}-[0-9]{4}');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
33
RegExp matching

Also, different kinds of separators:



dash, dot, just blank
Can OR together whole number patterns
Better: Just use set of choices of each sep.
SELECT emp_id, REGEXP_SUBSTR(text, '([09]{3}[-. ])?[0-9]{3}[-. ][0-9]{4}') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'([0-9]{3}[-. ])?[09]{3}[-. ][0-9]{4}');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
34
RegExp matching

One other thing: area codes in parentheses


Of course, area codes are still optional
Parentheses must be escaped - \( \)
SELECT emp_id, REGEXP_SUBSTR(text, '([09]{3}[-. ]|\([0-9]{3}\) )?[0-9]{3}[-. ][09]{4}') text
FROM employee_comment
WHERE REGEXP_LIKE(text,'([0-9]{3}[-. ]|\([09]{3}\) )?[0-9]{3}[-. ][0-9]{4}');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
35
And now for something completely different:
XML

XML: eXtensible Mark-up Language

Very popular language for semi-structured
data

Mark-up language: consists of elements
composed of tags, like HTML

Emerging lingua franca of the Internet, Web
Services, inter-vender comm
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
36
Unstructured data

At one end of continuum: unstructured data





Text files
Stock market prices
CIA intelligence intercepts
Audio recordings
“Just one damn bit after another”



~ Henry Ford
No (intentional, formal) patterns to the data
Difficult to manage/make sense of

Why we need data-mining
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
37
Structured data

At the other end: structured data


Tables in RDBMSs
Data organized into semantic chunks


Similar/related entities grouped together


Relationships, classes
Entities in same group have same structure


entities
Same fields/attributes/properties
Easy to make sense of


But sometimes too rigid a req.
Difficult to send—convert to tab-delimited
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
38
Semi-structured data

Not too random



Data organized into entities
Similar/related grouped to form other entities
Not too structured


Some attributes may be missing
Size of attributes may vary


Support of lists/sets
Juuust Right

Data is self-describing
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
39
Semi-structured data

Predominant examples:




HTML: HyperText Mark-up Language
XML: eXtensible Mark-up Language
NB: both mark-up languages (use tags)
Mark-up lends self of semi-structured data


Demarcate boundaries for entities
But freely allow other entities inside
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
40
Data model for semi-structured data


Usually represented as directed graphs
Graph: set of vertices (nodes) and edges


In model,



Dots connected by lines; not nec. a tree!
Nodes ~ entities or fields/attributes
Edges ~ attribute-of/sub-entity-of
Example: publisher publishes >=0 books


Each book has one title, one year, >=1 authors
Draw publishers graph
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
41
XML is a SSD language
Standard published by W3C

Officially announced/recommended in 1998

XML != HTML

XML != a replacement for HTML
Both are mark-up languages


Big diffs:

XML doesn’t use predefined tags (!)
1.

But it’s extensible: tags can be added
HTML is about presentation: <I>, <B>, <P>
2.

XML is about content: <book>, <author>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
42
XML syntax

Like HTML in many respects but more strict

All tags must be closed



Can’t have: this is a line<br>
Every start tag has an end tag
Although <br/> style can replace both

IS case-sensitive
IS space-sensitive

XML doc has a unique root element

Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
43
XML syntax

Tags must be properly nested



Elements may have quoted attributes


<Myelm myatt=“myval”>…</Myelm>
Comments same as in HTML:


Not allowed <b><i>I’m not kidding</b></i>
Intuition: file folders
<!-- Pay no attention… -->
Draw publishers XML
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
44
Escape chars in XML

Some chars must be escaped
>
&lt;
Distinguish content from syntax
<
&gt;
&
&amp;
"
&quot;
'
&apos;

<elm>3 &lt; 5</elm>
<elm>&quot;Don&apos;t call me
&apos;Ishmael&apos;!&quot;</elm>

Can also declare value to be pure text:
<aRealTag>
<![CDATA[<notAtag>jsdljsd<neitherAmI<“'><>>]]>
</aRealTag>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
45
XML Namespaces

Different schemas/DTDs may overlap


XHTML and MathML share some tags
Soln: namespaces

as in Java/C++/C#
<book xmlns:isbn="www.isbn-org.org/def">
<title>...</title>
<number>15</number>
<isbn:number>...</isbn:number>
</book>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
46
From Relational Data to XML Data
persons
XML:
Name
SSN
Mailing-address
Michael
123
NY
Hilary
456
DC
Bill
789
Chappaqua
persons
row
name
<persons>
<row><name>Michael</name>
<ssn>123</ssn></row>
<row><name>Hilary</name>
<ssn>456</ssn></row>
<row><name>Bill</name>
<ssn>789</ssn></row>
</persons>
ssn
name ssn
“Michael” 123 “Hilary” 456
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
row
row
name
“Bill”
ssn
789
47
Semi-structured Data Explained

List-valued attributes

XML is not 1NF!
<persons>
<row><name>Hilary</name>
<phone>202-222-2222</phone>
<phone>914-222-2222</phone></row>
<row><name>Bill</name>
<phone>914-222-2222</phone>
<phone>212-333-3333</phone></row>
 Impossible in (single, BCNF) tables:
</persons>
name
phone
Bill
9142222222
2123333333
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
 two phones!
???
48
<movieinfo>
<movie id="o111">
<title>Lost in Translation</title>
SSD graph might not be trees!
<year>2003</year>
But XML
docs must be
<stars idref="o333 o444"/>
</movie>
<movie
id="o222">
Would
cause
much redundancy
Soln: <title>Hamlet</title>
same concept as pointers in C/C++/J
<year>1999</year>
 Object ids and references
<stars idref="o333"/>
</movie> <person id="o456">
Graph example:
<person id="o111">
 Movies: Lost in Translation, Hamlet
<name>Bill Murray</name>
 Stars: Bill Murray, Scarlet Johansson
<movies idref="o111 o222"/>
</person>
</movieinfo>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
Object ids and References





49
What do we do with XML?

Things done with XML:







Send to partners
Parse XML received
Convert to RDBMS rows
Query for particular data
Convert to other XML
Convert to formats other than XML
Lots of tools/standards for these…
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
50
DTDs & understanding XML



XML is extensible
Advantage: when creating, we can use any
tags we like
Disadv: when reading, they can use any tags
they like


Using XML docs a priori is very difficult
Solution: impose some constraints
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
51
DTDs

DTD: Document Type Definition

You and partners/vertical industry/academic
discipline decide on a DTD/schema for your docs



DTD specifies the grammar to be used


Specify which entities you may use/must understand
Specify legal relationships
DTD = set of rules for creating valid entities
DTD tells your software what to look for in doc
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
52
DTD examples

Well-formed XML v. valid XML

Simple example:





http://pages.stern.nyu.edu/~mjohnson/dbms/xml/note.xml
http://pages.stern.nyu.edu/~mjohnson/dbms/xml/badnote.xml
http://pages.stern.nyu.edu/~mjohnson/dbms/xml/badnote2.xml
Copy from: http://pages.stern.nyu.edu/~mjohnson/dbms/eg/xml.txt
Partial publisher example rules:




Root  publisher
Publisher  name, book*, author*
Book  title, date, author+
Author  firstname, middlename?, lastname
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
53
Partial DTD example (typos!)
<?xml version=“1.0” encoding=“UTF-8” ?>
<!DOCTYPE PUBLISHER [
<!ELEMENT PUBLISHER (name, book*, author*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT BOOK (title, date, author+)>
<!ELEMENT AUTHOR (firstname, middlename?,
lastname>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT middlename (#PCDATA)>


DTD is not XML, but can be embedded in or ref.ed from XML
Replacement for DTDs is XML Schema
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
54
XML Applications/dialects

MathML: Mathematical Markup Language

http://wwwasdoc.web.cern.ch/wwwasdoc/WWW/public
ations/ictp99/ictp99N8059.html

VoiceXML:
http://newmedia.purchase.edu/~Jeanine/interfac
es/rps.xml

ChemML: Chemical Markup Language

XHMTL: HTML retrofitted as an XML application
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
55
XML Applications/dialects

VoiceXML:



http://newmedia.purchase.edu/~Jeanine/interfaces/rps.xml
AT&T Directory Assistance
http://phone.yahoo.com/
Image from http://www.voicexml.org/tutorials/intro2.html
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
56
More XML Apps

FIXML


swiftML


XML equiv. of SWIFT: Society for Worldwide Interbank
Financial Telecommunications message format
Apache’s Ant



XML equiv. of FIX: Financial Information eXchange
Scripting language for Java build management
http://ant.apache.org/manual/using.html
Many more:

http://www-106.ibm.com/developerworks/xml/library/x-stand4/
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
57
More XML Applications/Protocols

RSS: Rich Site Summary/Really Simple
Syndication




News sites, blogs…
http://slate.msn.com/rss/
http://slashdot.org/index.rss
Screenshot
<channel>
http://paulboutin.weblogger.com/pictures/viewer$673
<title>my channel</title>
More
info: http://slate.msn.com/id/2096660/
<item>
<title>story 1</title>
<link>…</link>
</item>
// other items
</channel>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005


58
More XML Applications/Protocols

SOAP: Simple Object Access Protocol
XML-based messaging format
 Used by Google API: http://www.google.com/apis/
 Amazon API: http://amazon.com/gp/aws/landing.html
 Amazon light: http://kokogiak.com/amazon/
 Other examples:
<SOAP:Envelope
http://www.wired.com/wired/archive/12.03/google.html?pg=
xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1">
10&topic=&topic_set=
<SOAP:Header></SOAP:Header>

<SOAP:Body>
<GetSalesTax>
 SOAP envelope with header and body
<SalesTotal>100</SalesTotal>
 Request sales tax for total
<GetSalesTax>
</SOAP:Body>
</SOAP:Envelope>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
59
More XML Applications/Protocols
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<gs:doGoogleSearch xmlns:gs="urn:GoogleSearch">
<key>%(key)s</key>
<start>0</start>
<maxResults>10</maxResults>
<filter>true</filter>
<restrict/>
<safeSearch>false</safeSearch>
<lr/>
</gs:doGoogleSearch>
</soap:Body>
</soap:Envelope>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
60
New topic: XML in Oracle - purchase-order e.g
<?xml version="1.0"?>
<purchase_order>
<customer_name>Alpha Tech</customer_name>
<po_number>11257></po_number>
<po_date>2004-01-20</po_date>
<po_items>
<item>
<part_number>AI5-4557</part_number>
<quantity>20</quantity>
</item>
<item>
<part_number>EI-T5-001</part_number>
<quantity>12</quantity>
</item>
</po_items>
</purchase_order>
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
61
Storing XML data

As of 9i, has XMLType data type

By default, underlying storage is as CLOB
CREATE TABLE purchase_order(
po_id number(5) not null,
customer_po_nbr varchar(20),
customer_inception_date date,
order_nbr number(5),
purchase_order_doc xmltype,
constraint purchase_order_pk
primary key(po_id)
);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
62
Loading XML into Oracle

First, log in as sys:
connect sys/junk as sysdba
create directory xml_data as '/xml';
grant read, write on directory xml_data to scott;

Now scott can import:
connect scott/tiger
declare bf1 bfile;
begin
bf1 := bfilename('XML_DATA', 'purch_ord.xml');
insert into purchase_order(po_id, purchase_order_doc)
values(1000, xmltype(bf1,
nls_charset_id('we8mswin1252')));
end;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
63
Loading XML into Oracle

Not just loading raw text



XMLType data must be well-formed
Parsable as XML
Try modifying customer_name open tag
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
64
Accessing XML in Oracle

Now can look at raw XML:
SQL> SELECT purchase_order_doc
FROM purchase_order;

Can also use XPath to extract particular
nodes and values, with extract function:
SQL> SELECT extract(purchase_order_doc,
'/purchase_order/customer_name')
FROM purchase_order;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
65
XPath in Oracle

Can also extract all nodes of one type, underneath some
node, with double-slash //


All purchase order items
SQL> SELECT extract(purchase_order_doc,
'/purchase_order//item')
FROM purchase_order;
NB: this is not valid XML




No unique root
Can request just one with bracket op
Numbering starts at 1, not 0
Wrong name/number  no error, no results
SQL> SELECT extract(purchase_order_doc,
'/purchase_order/po_items/item[2]')
FROM purchase_order;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
66
extract v. extractvalue

extractvalue returns value, not whole node:
SQL> SELECT extract(purchase_order_doc,
'/purchase_order/customer_name')
FROM purchase_order;
vs.
SQL> SELECT extractvalue(purchase_order_doc,
'/purchase_order/customer_name')
FROM purchase_order;

extractvalue applies only to unique nodes:
SQL> SELECT extractvalue(purchase_order_doc,
'/purchase_order/po_items')
FROM purchase_order;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
67
existsnode function

Can check whether node/location exists with
existnode function

Returns 1 or 0
SQL> SELECT po_id FROM purchase_order
WHERE existsnode(purchase_order_doc,
'/purchase_order/customer_name') = 1;

Also applies to bracketed paths:
SQL> SELECT po_id FROM purchase_order
WHERE existsnode(purchase_order_doc,
'/purchase_order/po_items/item[1]') = 1;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
68
Moving data from XML to relations

To move single values from XML to tables, can
simply use extractvalue in UPDATE statements:
SQL> UPDATE purchase_order
SET order_nbr = 7101,
customer_po_nbr =
extractvalue(purchase_order_doc,
'/purchase_order/po_number'),
customer_inception_date =
to_date(extractvalue(purchase_order_doc,
'/purchase_order/po_date'), 'yyyy-mm-dd');
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
69
Moving data from XML to relations

What about moving set of nodes

The two item nodes
SQL> SELECT extract(purchase_order_doc,
'/purchase_order//item')
FROM purchase_order;

Use xmlsequence to get a varray of items

Use TABLE to convert to a relation
SQL> SELECT rownum, item.* FROM TABLE(
SELECT xmlsequence(extract(purchase_order_doc,
'/purchase_order//item'))
FROM purchase_order) item;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
70
Moving data from XML to relations



Result is a two-row relation with XMLTypes
Can use extractvalue to extract this data
First, create destination table:
CREATE TABLE LINE_ITEM(
ORDER_NBR NUMBER(9) NOT NULL,
PART_NBR VARCHAR2(20) NOT NULL,
QTY NUMBER(5) NOT NULL,
FILLED_QTY NUMBER(5),
CONSTRAINT line_item_pk PRIMARY KEY
(ORDER_NBR,PART_NBR));
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
71
Moving data from XML to relations

Then insert results:
SQL> INSERT INTO line_item(order_nbr,part_nbr,qty)
SELECT 7109, extractvalue(column_value,
'/item/part_number'),
extractvalue(column_value, '/item/quantity')
FROM TABLE(
SELECT xmlsequence(extract(purchase_order_doc,
'/purchase_order//item'))
FROM purchase_order
);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
72
XML Schemas and Oracle



By default, XML must be well-formed to be read into
the XMLType field
XML is valid if it conforms to a schema
To use a schema with Oracle, must first register it:
declare bf1 bfile;
begin
bf1 := bfilename('XML_DATA',
'purch_ord.xsd');
dbms_xmlschema.registerschema(
'http://localhost:8080/home/xml
/schemas/purch_ord.xsd', bf1);
end;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
73
XML Schemas and Oracle

With schema registered, can apply it to an XMLType field
CREATE TABLE purchase_order2
(po_id NUMBER(5) NOT NULL,
customer_po_nbr VARCHAR2(20),
customer_inception_date DATE,
order_nbr NUMBER(5),
purchase_order_doc XMLTYPE,
CONSTRAINT purchase_order2_pk PRIMARY
KEY (po_id))
XMLTYPE COLUMN purchase_order_doc XMLSCHEMA
"http://localhost:8080/home/xml/schemas/pur
ch_ord.xsd"
ELEMENT "purchase_order";
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
74
Importing to schema field

Try to import xml file, get error:
declare bf1 bfile;
begin
bf1 := bfilename('XML_DATA', 'purch_ord.xml');
insert into purchase_order2(po_id,
purchase_order_doc)
values (2000, XMLTYPE(bf1,
nls_charset_id('WE8MSWIN1252')));
end;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
75
Importing to schema field


Root node of XML must specify the schema
Change root to the following:
<purchase_order
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://localhost:8080/h
ome/xml/schemas/purch_ord.xsd">


Now can import
Also fails if extra or missing nodes


Modify company_name node
Add new comments node
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
76
Can check to see whether schema is used

Can call isSchemaBased(), getSchemaURL()
and isSchemaValid() on XMLType fields:
SQL> select po.purchase_order_doc.isSchemaBased(),
po.purchase_order_doc.getSchemaURL(),
po.purchase_order_doc.isSchemaValid()
from purchase_order2 po;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
77
Updating XMLType data

Can update XMLType data with ordinary
UPDATE statements:
SQL> UPDATE purchase_order po
SET po.purchase_order_doc =
XMLTYPE(BFILENAME('XML_DATA',
'purch_ord_alt.xml'),
nls_charset_id('WE8MSWIN1252'))
WHERE po.po_id = 2000;

Replaces whole XMLType object with new
one
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
78
Updating XMLType data

Can also modify the existing XMLType object


By writing node values
updateXML() function does search/replace

But searches for node, not value
SQL> SELECT extract(po.purchase_order_doc,
'/purchase_order/customer_name')
FROM purchase_order po
WHERE po_id = 1000;
SQL> UPDATE purchase_order po
SET po.purchase_order_doc =
updateXML(po.purchase_order_doc,
'/purchase_order/customer_name/text()',
'some other company')
WHERE po.po_id
= 1000;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
79
Updating XMLType data

Can also write whole node, using XMLType:
SQL> UPDATE purchase_order po
SET po.purchase_order_doc =
updateXML(po.purchase_order_doc,
'/purchase_order/customer_name',
XMLTYPE('<customer_name>some third
company</customer_name>'))
WHERE po.po_id = 1000;
SQL> SELECT extract(po.purchase_order_doc,
'/purchase_order/customer_name')
FROM purchase_order po
WHERE po_id = 1000;

Validation/well-formedness is still checked
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
80
Updating XMLType data

And can update items in a collection:
SQL> SELECT extract(po.purchase_order_doc,
'/purchase_order//item')
FROM purchase_order po
WHERE po.po_id = 1000;
SQL> UPDATE purchase_order po
SET po.purchase_order_doc =
updateXML(po.purchase_order_doc,
'/purchase_order/po_items/item[1]',
XMLTYPE('<item><part_number>T1000</part_number><quantity>33</quantity></i
tem>'))
WHERE po.po_id = 1000;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
81
Converting relational data to XML


Saw how to put XML in a table
Conversely, can convert ordinary relational
data to XML


XMLElement() generates an XML node
First, create supplier table:
CREATE TABLE SUPPLIER(
SUPPLIER_ID NUMBER(5) NOT NULL,
NAME VARCHAR2(30) NOT NULL,
PRIMARY KEY (SUPPLIER_ID));
insert into supplier values(1, 'Acme');
insert into supplier values(2, 'Tilton');
insert into
supplier
'Eastern');
Matthew
P. Johnson, OCL4,values(3,
CISDD CUNY, Sept 2005
82
Converting relational data to XML

Now can call XMLElement function to wrap values in
tags:
SELECT XMLElement("supplier_id", s.supplier_id) ||
XMLElement("name", s.name) xml_fragment
FROM supplier s;

And can build it up:
SELECT XMLElement("supplier",
XMLElement("supplier_id", s.supplier_id),
XMLElement("name", s.name))
FROM supplier s;

Don’t concatenate! Turns to strings, escapes < >

Error in book
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
83
XMLForest()

More simply, can use XMLForest() function:
SELECT XMLElement("supplier",
XMLForest(s.supplier_id, s.name))
FROM supplier s;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
84
XMLAgg()

Can use XMLAgg() to put nodes together
inside another node:
SELECT XMLElement("supplier_list",
XMLAgg(XMLElement("supplier",
XMLElement("supplier_id", s.supplier_id),
XMLElement("name", s.name)
))) xml_document
FROM supplier s;
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
85
New topic: Data Warehousing

Physical warehouse: stores different kinds of items





combined from different sources in supply chain
access items as a combined package
“Synergy”
DW is the sys containing the data from many DBs
OLAP is the system for easily querying the DW


Online analytical processing
front-end to DW & stats
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
86
Integrating Data

Ad hoc combination of DBs from different sources
can be problematic

Data may be spread across many systems



geographically
by division
different systems from before mergers…
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
87
Conversion/scrubbing/merging

Lots of issues…

different types of data


Different values for data


‘GREEN’/’GR/’2
Semantic differences


Varchar(255) v. char(30)
Cars v. Automobiles
Missing values

Handle with nulls or XML
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
88
Federated DBs

Situ: n different DBs must work together

One idea: write programs for each to talk to each
other one


How many programs required?
Like ambassadors for each country
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
89
Federated DBs

Better idea: introduce another DB


Now how many programs?


write programs for it to talk to each other DB
English in business, French in diplomacy
 Warehousing

Refreshed nightly
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
90
OLTP v. OLAP

DWs usually not updated in real-time




data is usually not live
but care about higher-level, longer-term patterns
For “knowledge workers”/decision-makers
Live data is in system used by OLTP



online transaction processing
E.g., airline reservations
OLTP data loaded into DW periodically, say nightly
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
91
Utilizing Data

Situ: each time manager has hunch




 requests custom reports
 direct programmers to write/modify SQL app to produce
these results
on higher or lower levels, for different specifics
Problem: too difficult/expensive/slow

too great a time lag
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
92
EISs

Could just write queries at command-prompt

But decision makes aren’t (all) SQL programmers

Soln: create an executive information system




provides friendly front-end to common, important queries
basically a simple DB front-end
your project part 5
GROUP BY queries are particularly applicable…
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
93
EISs v. OLAP

Okay for fixed set of queries
But what if queries are open-ended?

Q: What’s driving sales in the Northeast?




What’s the source cause?
Result from one query influences next query tried
OLAP systems are interactive:




run query
analyze results
think of new query
repeat
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
94
Star Schemas

Popular schema for DW data

One central DB surrounded by specific DBs

Center: fact table

Extremities: data tables

Fields in fact table are foreign keys to data tables

Normalization  Snowflake Schema

May not be worthwhile…
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
95
Dates and star schemas

OLAP behaves as though you had a Days table,
with every possible row



Dates(day, week, month, year, DID)
(5, 27, 7, 2000)
Can join on Days like any other table
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
96
Dates and star schemas

E.g.: products x salesperson x region x date


Regular dim tables:




Products sold by salespeople in regions on dates
Product(PID, name, color)
Emp(name, SSN, sal)
Region(name, RID)
Fact table:


Sales(PID, DID, SSN, RID)
Interpret as a cube (cross product of all dimensions)

Can have both data and stats
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
97
Drill-down & roll-up


Imagine: notice some region’s sales way up
Why? Good salesperson? Some popular product
there?

Maybe need to search by month, or month and
product, abstract back up to just product…

“slicing & dicing”
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
98
OLAP and data warehousing

Could write GROUP BY queries for each

OLAP systems provide simpler, non-SQL interface
for this sort of thing

Vendors: MicroStrategy, SAP, etc.

Otoh: DW-style operators have been added to SQL
and some DBMSs…
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
99
DW extensions in SQL: ROLLUP (Oracle)

Suppose have orders table (from two years), with
region and date info:
SQL> column month format a10
SQL> @mosql2_data
SQL> describe all_orders

Can select total sales:
SELECT sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id;


Examples derived/from Mastering Oracle SQL, 2e (O’Reilly)
Get data here: http://examples.oreilly.com/mastorasql2/mosql2_data.sql
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
100
DW extensions in SQL: ROLLUP (Oracle)

Can write GROUP BY queries for year or region or
both:
SELECT r.name region, o.year, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY (r.name, o.year);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
101
DW extensions in SQL: ROLLUP (Oracle)

ROLLUP operator




Extension of GROUP BY
Does GROUP BY on several levels, simultaneously
Order matters
Get sales totals for each region/year pair each region,
and the grand total:
SELECT r.name region, o.year, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY ROLLUP (r.name, o.year);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
102
DW extensions in SQL: ROLLUP (Oracle)


Change the order of the group fields to get a
different sequence of groups
To get totals for each year/region pair, each year,
and the grand total, and just reverse group-by order:
SELECT o.year, r.name region, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY ROLLUP (o.year, r.name);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
103
DW extensions in SQL: ROLLUP (Oracle)

Adding more dimensions, like month, is easy (apart
from formatting):
SELECT o.year,
to_char(to_date(o.month, 'MM'),'Month') month,
r.name region, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY ROLLUP (o.year, o.month, r.name);

NB: summing happens on each level
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
104
DW extensions in SQL: ROLLUP (Oracle)

If desired, can combine fields for the sake of
grouping:
SELECT o.year,
to_char(to_date(o.month, 'MM'),'Month') month,
r.name region, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY ROLLUP ((o.year, o.month), r.name);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
105
DW extensions in SQL: CUBE (Oracle)

Another GROUP BY extension: CUBE




Subtotals all possible combins of group-by fields (powerset)
Syntax: “ROLLUP”  “CUBE”
Order of fields doesn’t matter (apart from ordering)
To get subtotals for each region/month pair, each region, each
month, and the grand total:
SELECT
to_char(to_date(o.month, 'MM'),'Month') month,
r.name region,
sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY CUBE (o.month, r.name);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
106
DW extensions in SQL: CUBE (Oracle)

Again, can easily add more dimensions:
SELECT o.year,
to_char(to_date(o.month, 'MM'),'Month') month,
r.name region, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY CUBE (o.year, o.month, r.name);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
107
DW SQL exts: GROUPING SETS (Oracle)


That’s a lot of rows
Instead of a cube of all combinations, maybe we just
want the totals for each individual field:
SELECT o.year,
to_char(to_date(o.month, 'MM'),'Month') month,
r.name region, sum(o.tot_sales)
FROM all_orders o join region r
ON r.region_id = o.region_id
GROUP BY GROUPING SETS (o.year, o.month, r.name);
Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
108
That’s all, folks!

Selected solutions to exercises:

sqlzoo ~ “Answers” on sqlzoo.net

PL/SQL solutions:
http://pages.stern.nyu.edu/~mjohnson/oracle/archive/fall04/plsql/

Files from my machine:
http://pages.stern.nyu.edu/~mjohnson/oracle/machine/

mpjohnson-at-gmail.com

Final evals!

Matthew P. Johnson, OCL4, CISDD CUNY, Sept 2005
109
Descargar

Document