4/1
about XML/Xquery/RDF
HTML vs. XML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
<bibliography>
<book> <title> Foundations…
</title>
<author> Abiteboul
</author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison
Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
HTML describes presentation
XML describes content
Why are Database folks so
excited about XML?
• XML is just a syntax for (self-describing) data
• This is still exciting because
– No standard syntax for
relational data
– With XML, we can
• Translate any legacy data
to XML
• Can exchange data in
XML format
– Ship over the web,
input to any
application
The X-standards…
• XML: an on-the-wire representation for data
– XQuery: a query language for XML
– XML Schema: a schema description language for XML data
• RDF: a language for metadata description
• WSDL/SOAP/UDDI: languages for describing services
XML Terminology
• tags: book, title, author, …
• start tag: <book>, end tag: </book>
• elements: <book>…</book>, <author>…</author>
• elements are nested
• empty element: <red></red> abbrv. <red/>
• an XML document: single root element
• well-formed XML document: one whose tags match and nest properly
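The terminology above can be tried out with a parser. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the three sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if `text` parses as XML: one root element, matching nested tags."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<book><title>Foundations</title><author>Hull</author><red/></book>"
unclosed = "<book><title>Foundations</book>"   # <title> is never closed
two_roots = "<book/><book/>"                   # more than one root element

print(is_well_formed(good), is_well_formed(unclosed), is_well_formed(two_roots))
```

Well-formedness says nothing about which tags are allowed; that is the job of a DTD or schema.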
More XML: Attributes
<book price="55" currency="USD">
<title> Foundations of Databases </title>
<author> Abiteboul </author>
…
<year> 1995 </year>
</book>
Attributes are single-valued
--No guidance on when to use them
More XML: Oids and References
Object identifiers (the id attributes):
<person id="o555"> <name> Jane </name> </person>
<person id="o456"> <name> Mary </name>
<children idref="o123 o555"/>
</person>
<person id="o123" mother="o456"><name>John</name>
</person>
oids and references in XML are just syntax
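Because ids and references are only attribute values, following a reference is left to the application. A minimal sketch of that lookup, assuming Python's standard library; the `<people>` wrapper element and the helper `name_of` are added here for illustration and are not on the slide.

```python
import xml.etree.ElementTree as ET

doc = """<people>
  <person id="o555"><name>Jane</name></person>
  <person id="o456"><name>Mary</name><children idref="o123 o555"/></person>
  <person id="o123" mother="o456"><name>John</name></person>
</people>"""

root = ET.fromstring(doc)

# The parser does not link references; we build the id -> element index ourselves.
by_id = {p.get("id"): p for p in root.iter("person")}

def name_of(oid: str) -> str:
    return by_id[oid].findtext("name")

mary = by_id["o456"]
children = [name_of(ref) for ref in mary.find("children").get("idref").split()]
mother_of_john = name_of(by_id["o123"].get("mother"))
print(children, mother_of_john)
```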
XML vs. Relational Data
• XML is meant as a language that
supports both Text and Structured Data
– Conflicting demands...
• XML supports semi-structured data
– In essence, the schema can be
union of multiple schemas
• Easy to represent books with or
without prices, books with any
number of authors etc.
• XML supports free mixing of text and
data
– using the #PCDATA type
• XML is ordered (while relational data
is unordered)
[Figure: XML sits between text (less structure) and structured (relational) data (more structure).]
DTDs
<!DOCTYPE paper [
<!ELEMENT paper (section*)>
<!ELEMENT section ((title,section*) | text)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT text (#PCDATA)>
]>
Semistructured
<paper> <section> <text> </text> </section>
<section> <title> </title> <section> … </section>
<section> … </section>
</section>
</paper>
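To make the recursive content model concrete, here is a hand-written checker for this one DTD. It is only a sketch: Python's standard library does not validate against DTDs, so the two functions below (`valid_section`, `valid_paper`, both hypothetical names) hard-code the rules `paper (section*)` and `section ((title,section*) | text)`, and they ignore stray character data between child elements.

```python
import xml.etree.ElementTree as ET

def valid_section(sec) -> bool:
    kids = [k.tag for k in sec]
    if kids == ["text"]:                       # second alternative: text
        return len(sec[0]) == 0                # text holds character data only
    if kids and kids[0] == "title" and all(t == "section" for t in kids[1:]):
        # first alternative: title followed by zero or more nested sections
        return len(sec[0]) == 0 and all(valid_section(k) for k in sec[1:])
    return False

def valid_paper(root) -> bool:
    return root.tag == "paper" and all(
        k.tag == "section" and valid_section(k) for k in root)

ok = ET.fromstring(
    "<paper><section><text>t</text></section>"
    "<section><title>a</title><section><text/></section>"
    "<section><title>b</title></section></section></paper>")
bad = ET.fromstring("<paper><section><title>a</title><text>t</text></section></paper>")
print(valid_paper(ok), valid_paper(bad))
```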
XML Schemas
• More recent proposal (with XML syntax)
• unifies previous schema proposals
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
– http://www.w3.org/TR/xmlschema-1
– http://www.w3.org/TR/xmlschema-2
Querying XML
• Requirements:
– Need to handle lack of schema.
• We may not know much about the data, so we need to navigate
the XML.
– Need to support both “information retrieval” and “SQL-style” queries.
• Ordered vs. un-ordered XML
– “Human readable”
• like SQL?
• Candidates
– Many… based on conflicting requirements
• XSL: Makes IR folks happy
• XML-QL: Makes DB folks happy
• Xquery : W3C’s attempt to make everybody (un)happy
Xquery Resources
• XQuery 1.0: An XML Query Language
– W3C Working Draft 20 December 2001
• XML Query Use Cases
– W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
– http://131.107.228.20/
– http://support.xhive.com/xquery/index.html
– Supports querying on the documents described in the W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
– www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf
FLoWeR Expressions
Xquery queries are made up of FLWR expressions
that work on “paths”
• For binds variables to nodes
• Let computes aggregates
• Where applies a formula to find matching
elements
• Return constructs the output elements
Path expressions are of the form:
element//element/element[attrib=value]
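For readers who want to experiment without an XQuery engine, the four clauses map onto an ordinary loop. The sketch below uses Python's `xml.etree.ElementTree`, which supports only a small subset of path syntax; the two-book document is an abridged, assumed sample and the price threshold of 50 is arbitrary.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
</bib>""")

# Path expression with an attribute predicate, relative to the root <bib>.
titles_2000 = [t.text for t in bib.findall("book[@year='2000']/title")]

# for / let / where / return written out as a loop.
expensive = []
for b in bib.findall("book"):               # for   $b in /bib/book
    price = float(b.findtext("price"))      # let   $p := $b/price
    if price > 50:                          # where $p > 50
        expensive.append(b.findtext("title"))   # return $b/title

print(titles_2000, expensive)
```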
Comparison to SQL
• Look at the use case descriptions in the Xquery manual
• Supports all (?) SQL style queries (with different
syntax of course) [default queries in the demo]
• Has support for
– “construction”—outputting the answers in arbitrary
XML formats (use case “XMP” )
– “path expressions” --- navigating the XML tree (use
case “seq”)
– Simple text queries [use case “text”]
– Allows queries on “Tag” elements
• Removes the “data/meta-data” barrier in queries
– For each book that has at least one author, list the title and first
two authors, and an empty "et-al" element if the book has
additional authors. [XMP use case 6]
DTD for
http://www.bn.com/bib.xml
<!ELEMENT bib (book* )>
<!ELEMENT book (title, (author+ | editor+ ), publisher, price )>
<!ATTLIST book year CDATA #REQUIRED >
<!ELEMENT author (last, first )>
<!ELEMENT editor (last, first, affiliation )>
<!ELEMENT title (#PCDATA )>
<!ELEMENT last (#PCDATA )>
<!ELEMENT first (#PCDATA )>
<!ELEMENT affiliation (#PCDATA )>
<!ELEMENT publisher (#PCDATA )>
<!ELEMENT price (#PCDATA )>
Example Query
Query
<bib>
{ for $b in /bib/book
where $b/publisher = "Addison-Wesley"
and $b/@year > 1991
return <book year={ $b/@year }>
{ $b/title }
</book> }
</bib>
“For all Addison-Wesley books published after 1991,
return the title, with the year
kept as an attribute”
Result
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
<book year="1992">
<title>Advanced Programming in
the Unix environment</title>
</book>
</bib>
Example Query (2): Join
• Return the books that cost more at amazon than
fatbrain
let $amazon := document("http://www.amazon.com/books.xml"),
$fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
$fat in $fatbrain/books/book
where $am/isbn = $fat/isbn
and $am/price > $fat/price
return <book>{ $am/title, $am/price, $fat/price }</book>
XML frenzy in the DB Community
• Now that XML is there, what can we do
with it?
– Convert all databases from Relational to XML?
• Or provide XML views of relational databases?
– Develop theory of native XML databases?
• Or assume that XML data will be stored in relational
databases..
– Issues: What sort of storage mechanisms? What sort of
indices?
XML middleware for Databases
• XML adapters (middle-ware) received significant attention in DB community
– SilkRoute (AT&T)
– Xperanto (IBM)
• Issues:
– Need to convert relational data into XML
• Tagging (easy)
– Need to convert Xquery queries into equivalent SQL queries
• Trickier as Xquery supports schema querying
[Figure: the adapter accepts Xquery and returns XML, issuing SQL against the underlying relations.]
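The "tagging" step is easy because each row maps to an element and each column to a child element. A minimal sketch, assuming an in-memory SQLite table; the table name `book`, its columns, and the helper `tag_rows` are invented for the example and do not come from SilkRoute or Xperanto.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, publisher TEXT, year INTEGER)")
conn.executemany("INSERT INTO book VALUES (?, ?, ?)", [
    ("TCP/IP Illustrated", "Addison-Wesley", 1994),
    ("Data on the Web", "Morgan Kaufmann Publishers", 2000),
])

def tag_rows(cursor, root_tag, row_tag):
    """Wrap every result row in <row_tag>, one child element per column."""
    root = ET.Element(root_tag)
    cols = [d[0] for d in cursor.description]
    for row in cursor:
        elem = ET.SubElement(root, row_tag)
        for col, val in zip(cols, row):
            ET.SubElement(elem, col).text = str(val)
    return root

cur = conn.execute("SELECT title, year FROM book WHERE year > 1991 ORDER BY year")
xml_text = ET.tostring(tag_rows(cur, "bib", "book"), encoding="unicode")
print(xml_text)
```

The harder direction, rewriting an arbitrary Xquery into the SQL that feeds this step, is not attempted here.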
Don’t look beyond this..
Xquery Tutorial
Craig Knoblock
University of Southern California
References
• XQuery 1.0: An XML Query Language
– W3C Working Draft 20 December 2001
• XML Query Use Cases
– W3C Working Draft 20 December 2001
• Microsoft .Net Xquery Language Demo
– http://131.107.228.20/
– Supports querying on the documents described in the
W3C Use Cases
• Xquery Tutorial by Fankhauser & Wadler
– www.research.avayalabs.com/user/wadler/papers/xquery-tutorial/xquery-tutorial.pdf
Data for www.bn.com/bib.xml
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price> 65.95</price>
</book>
<book year="1992">
<title>Advanced Programming in the Unix
environment</title>
<author><last>Stevens</last><first>W.</first></author>
<publisher>Addison-Wesley</publisher>
<price>65.95</price>
</book>
Data for www.bn.com/bib.xml (cont.)
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price> 39.95</price>
</book>
<book year="1999">
<title>The Economics of Technology and Content for Digital TV</title>
<editor> <last>Gerbarg</last><first>Darcy</first>
<affiliation>CITI</affiliation> </editor>
<publisher>Kluwer Academic Publishers</publisher>
<price>129.95</price>
</book>
</bib>
Document References
• Document can either be referenced explicitly
or in the default namespace
• In the Microsoft Demo
– /bib = document("http://www.bn.com/bib.xml")/bib
• We will use /bib throughout, but you must use
the expansion to run the demo
• In Theseus the document for xquery is
passed as input
Projection
• Return the names of all authors of books
/bib/book/author
=
<author><last>Stevens</last><first>W.</first></author>
<author><last>Stevens</last><first>W.</first></author>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
Projection (cont.)
• The same query can also be written as a for loop
/bib/book/author
=
for $bk in /bib/book return
for $aut in $bk/author return $aut
=
<author><last>Stevens</last><first>W.</first></author>
<author><last>Stevens</last><first>W.</first></author>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
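The equivalence between the path and the nested loop can be checked mechanically. A minimal sketch in Python, assuming an abridged copy of the bib.xml data (titles and authors only); since the parsed root is `<bib>`, the path is written relative to it.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book year="1994"><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last><first>W.</first></author></book>
<book year="1992"><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last><first>W.</first></author></book>
<book year="2000"><title>Data on the Web</title>
  <author><last>Abiteboul</last><first>Serge</first></author>
  <author><last>Buneman</last><first>Peter</first></author>
  <author><last>Suciu</last><first>Dan</first></author></book>
<book year="1999"><title>The Economics of Technology and Content for Digital TV</title>
  <editor><last>Gerbarg</last><first>Darcy</first></editor></book>
</bib>""")

# Path form: /bib/book/author
by_path = bib.findall("book/author")

# Loop form: for $bk in /bib/book return for $aut in $bk/author return $aut
by_loop = [aut for bk in bib.findall("book") for aut in bk.findall("author")]

last_names = [a.findtext("last") for a in by_path]
print(by_path == by_loop, last_names)
```

Note that the book with only an editor contributes nothing, and that Stevens appears twice because projection does not remove duplicates.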
Selection
• Return the titles of all books published before 1997
/bib/book[@year < "1997"]/title
=
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix
environment</title>
Selection (cont.)
• Return the titles of all books published before 1997
/bib/book[@year < "1997"]/title
=
for $bk in /bib/book
where $bk/@year < "1997"
return $bk/title
=
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix
environment</title>
Selection (cont.)
• Return book with the title “Data on the Web”
/bib/book[title = "Data on the Web"]
=
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price> 39.95</price>
</book>
Selection (cont.)
• Return the price of the book “Data on the Web”
/bib/book[title = "Data on the Web"]/price
=
<price> 39.95</price>
How would you return the book with a price of $39.95?
Selection (cont.)
• Return the book with a price of $39.95
for $bk in /bib/book
where $bk/price = " 39.95"
return $bk
=
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann Publishers</publisher>
<price> 39.95</price>
</book>
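The leading blank in " 39.95" matters because the comparison above is on the stored text. The sketch below illustrates the same pitfall in Python, using one book from the sample data; converting to a number before comparing is one way around it, offered here as a suggestion rather than as what the demo engine does.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book year="2000"><title>Data on the Web</title><price> 39.95</price></book>')

raw = book.findtext("price")

exact_text_match = (raw == "39.95")        # fails: stored text has a leading blank
padded_text_match = (raw == " 39.95")      # succeeds only with the blank included
numeric_match = (float(raw) == 39.95)      # numeric comparison ignores the blank

print(exact_text_match, padded_text_match, numeric_match)
```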
Construction
• Return year and title of all books published before 1997
for $bk in /bib/book
where $bk/@year < "1997"
return <book>{ $bk/@year, $bk/title }</book>
=
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
</book>
Grouping
• Return titles for each author
for $author in distinct(/bib/book/author/last) return
<author name={ $author/text() }>
{ /bib/book[author/last = $author]/title }
</author>
=
<author name="Stevens">
<title>TCP/IP Illustrated</title>
<title>Advanced Programming in the Unix environment</title>
</author>
<author name="Abiteboul">
<title>Data on the Web</title>
</author>
…
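The grouping query is a "distinct values, then re-query per value" pattern. An equivalent single pass can be sketched in Python with a dictionary keyed on the last name; the data is an abridged, assumed copy of bib.xml and the wrapper element `<authors>` is added only so the output has one root.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book><title>TCP/IP Illustrated</title>
  <author><last>Stevens</last></author></book>
<book><title>Advanced Programming in the Unix environment</title>
  <author><last>Stevens</last></author></book>
<book><title>Data on the Web</title>
  <author><last>Abiteboul</last></author>
  <author><last>Buneman</last></author>
  <author><last>Suciu</last></author></book>
</bib>""")

# Group titles by author last name (dicts keep first-seen order).
titles_by_author = {}
for bk in bib.findall("book"):
    for last in bk.findall("author/last"):
        titles_by_author.setdefault(last.text, []).append(bk.findtext("title"))

# Construct one <author name="..."> element per group.
out = ET.Element("authors")
for name, titles in titles_by_author.items():
    author = ET.SubElement(out, "author", name=name)
    for t in titles:
        ET.SubElement(author, "title").text = t

print(ET.tostring(out, encoding="unicode"))
```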
Join
• Return the books that cost more at amazon than fatbrain
let $amazon := document("http://www.amazon.com/books.xml"),
$fatbrain := document("http://www.fatbrain.com/books.xml")
for $am in $amazon/books/book,
$fat in $fatbrain/books/book
where $am/isbn = $fat/isbn
and $am/price > $fat/price
return <book>{ $am/title, $am/price, $fat/price }</book>
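The join can be mimicked in Python to see what it computes. This is a sketch under stated assumptions: the two catalogues below are invented (the slide does not show the real files), and the lookup table on `isbn` is one possible evaluation strategy, not a claim about how an XQuery engine runs the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue contents with made-up ISBNs, titles and prices.
amazon = ET.fromstring("""<books>
  <book><isbn>111</isbn><title>Book A</title><price>30.00</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>20.00</price></book>
</books>""")
fatbrain = ET.fromstring("""<books>
  <book><isbn>111</isbn><title>Book A</title><price>25.00</price></book>
  <book><isbn>222</isbn><title>Book B</title><price>22.00</price></book>
</books>""")

# Index one side on the join key, then probe it once per book on the other side.
fat_price = {b.findtext("isbn"): float(b.findtext("price"))
             for b in fatbrain.findall("book")}

dearer_at_amazon = []
for am in amazon.findall("book"):
    isbn = am.findtext("isbn")
    am_price = float(am.findtext("price"))
    if isbn in fat_price and am_price > fat_price[isbn]:
        dearer_at_amazon.append((am.findtext("title"), am_price, fat_price[isbn]))

print(dearer_at_amazon)
```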
Example Query 1
<bib>
{ for $b in /bib/book
where $b/publisher = "Addison-Wesley" and
$b/@year > 1991
return <book year={ $b/@year }>
{ $b/title }
</book> }
</bib>
What does this do?
Result Query 1
<bib>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
<book year="1992">
<title>Advanced Programming in the Unix
environment</title>
</book>
</bib>
Example Query 2
<results>
{ for $b in
document("http://www.bn.com/bib.xml")/bib/book,
$t in $b/title,
$a in $b/author
return
<result>
{ $t }
{ $a }
</result> }
</results>
Result Query 2
<results>
<result><title>TCP/IP Illustrated</title>
<author><last>Stevens</last><first>W.</first></author>
</result>
<result><title>Advanced Programming in the Unix environment</title>
<author><last>Stevens</last><first>W.</first></author>
</result>
<result><title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
</result>
<result><title>Data on the Web</title>
<author><last>Buneman</last><first>Peter</first></author>
</result>
<result><title>Data on the Web</title>
<author><last>Suciu</last><first>Dan</first></author>
</result>
</results>
Example Query 3
<books-with-prices>
{
for $b in document("http://www.bn.com/bib.xml")//book,
$a in document("http://www.amazon.com/reviews.xml")//entry
where $b/title = $a/title
return
<book-with-prices>
{ $b/title }
<price-amazon>{ $a/price/text() }</price-amazon>
<price-bn>{ $b/price/text() }</price-bn>
</book-with-prices>
}
</books-with-prices>
Result Query 3
<books-with-prices>
<book-with-prices>
<title>TCP/IP Illustrated</title>
<price-amazon>65.95</price-amazon>
<price-bn> 65.95</price-bn>
</book-with-prices>
<book-with-prices>
<title>Advanced Programming in the Unix environment</title>
<price-amazon>65.95</price-amazon>
<price-bn>65.95</price-bn>
</book-with-prices>
<book-with-prices>
<title>Data on the Web </title>
<price-amazon>34.95</price-amazon>
<price-bn> 39.95</price-bn>
</book-with-prices>
</books-with-prices>
Example Query 4
<bib>
{ for $b in document("www.bn.com/bib.xml")//book
where $b/publisher = "Addison-Wesley" and $b/@year > "1991"
return <book> { $b/@year } { $b/title } </book>
sortby (title) }
</bib>
Example Result 4
<bib>
<book year="1992">
<title>Advanced Programming in the Unix environment</title>
</book>
<book year="1994">
<title>TCP/IP Illustrated</title>
</book>
</bib>
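Query 4 combines selection, construction and ordering. The same pipeline can be sketched in Python, assuming an abridged copy of bib.xml that keeps only the fields the query touches; the year is compared as a number here, which is a choice made for this sketch.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""<bib>
<book year="1994"><title>TCP/IP Illustrated</title>
  <publisher>Addison-Wesley</publisher></book>
<book year="1992"><title>Advanced Programming in the Unix environment</title>
  <publisher>Addison-Wesley</publisher></book>
<book year="2000"><title>Data on the Web</title>
  <publisher>Morgan Kaufmann Publishers</publisher></book>
</bib>""")

# where: publisher and year filter
matches = [b for b in bib.findall("book")
           if b.findtext("publisher") == "Addison-Wesley" and int(b.get("year")) > 1991]

# return + sortby (title): build the new elements in title order
out = ET.Element("bib")
for b in sorted(matches, key=lambda bk: bk.findtext("title")):
    new_book = ET.SubElement(out, "book", year=b.get("year"))
    ET.SubElement(new_book, "title").text = b.findtext("title")

summary = [(bk.get("year"), bk.findtext("title")) for bk in out]
print(summary)
```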
Impact of XML on Integration
If and when all sources accept
Xqueries and exchange data in
XML format, then
– Mediator can accept user
queries in Xquery
– Access sources using Xquery
– Get data back in XML format
– Merge results and send to user
in XML format
• How about now?
– Sources can use XML
adapters (middle-ware)
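[Figure: the mediator exchanges Xquery and XML with the sources; a relational source is reached through an adapter that maps Xquery to SQL over relations and returns XML.]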
Is XML standardization a magical solution for Integration?
If all WEB sources standardize into XML format
– Source access (wrapper generation issues) becomes easier to manage
– BUT all other problems remain
• Still need to relate source (XML) schemas to mediator (XML) schema
• Still need to reason about source overlap, source access limitations etc.
• Still need to manage execution in the presence of source/network uncertainties
[Figure: integration architecture. Labels: Services; Web pages; Structured data; Sensors (streaming data); Source Trust; Ontologies; Source/Service Descriptions; Source Calls; Source Fusion/Query Planning (needs to handle multiple objectives, service composition, source quality & overlap); Executor (needs to handle source/network interruptions, runtime uncertainty, replanning); Monitor; Preference/Utility Model; Query; Answers; Probing Queries; Updating Statistics; Replanning Requests. Inset: Mediator exchanging Xquery and XML.]
“Semantic Web”
• The LAV/GAV approaches assume that some human
expert will do the actual schema mapping
• The “semantic-web” initiative attempts to automate
schema mapping
– Idea: Allow pages to write logical axioms relating their
vocabulary (tags) to other external tags
– Support automatic inference of relations between
source and mediator schema using these rules
• DAML+OIL