XML 101:
A Technical Introduction to XML
20 November 2002
Bank of Montreal Database Users Group
Ian GRAHAM
IT Strategy, IBS, Technology and Solutions, BMO Financial Group
E: <[email protected]>
T: (416) 513.5656 / F: (416) 513.5590
To download this talk: http://www.utoronto.ca/ian/talks/
IT Strategy, IBS, Technology & Solutions
[email protected] / 416.513.5656
Presentation Outline
1. What is XML (basic introduction)
2. Defining language dialects and constraints
   – DTDs, namespaces, and schemas
3. XML processing
   – Parsers and parser interfaces; XML processing tools
4. XML databases
   – High-level issues, and references
5. XML messaging / web services
   – Why, and some issues/example
6. Conclusions
What is XML?
• A base-level syntax
  – For encoding structured, text-based information (words, characters, ...)
• A text-based syntax
  – XML is written using printable Unicode characters. Explicit binary data is not allowed
• Supports extensible data formats
  – XML lets you define your own elements (essentially data types), within the constraints of the syntax rules
• Designed as a universal format
  – The syntax rules ensure that all XML processing software MUST handle a given piece of XML data identically
  – If you can read and process it, so can anybody else
XML: A Simple Example
<?xml version="1.0" encoding="iso-8859-1"?>
<partorders
    xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets,
           with matching hamster
    </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    . . . Order something else . . .
  </order>
</partorders>
• The first line is the XML declaration (“this is XML”); it also flags the character encoding used in the file
• On the original slide, black marks XML tags and markup, and blue marks encoded text data
Example Revisited
<partorders
    xmlns="http://myco.org/Spec/partorders" >
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets,
           with matching hamster
    </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
  <order ref="x23-2112-2342"
         date="25aug1999-12:34:23h">
    . . . Order something else . . .
  </order>
</partorders>
• <quantity units="gross"> 12 </quantity> is an element, delimited by its start and end tags
• units="gross" is an attribute of this quantity element
• Nested elements give hierarchical, structured data
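The element, attribute and text parts labelled above can be read back with any XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module and a shortened copy of the order document (the `po` prefix and the variable names are illustrative, not part of the talk):

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's example document (one order only).
ORDERS = """<partorders xmlns="http://myco.org/Spec/partorders">
  <order ref="x23-2112-2342" date="25aug1999-12:34:23h">
    <desc> Gold sprockel grommets, with matching hamster </desc>
    <part number="23-23221-a12" />
    <quantity units="gross"> 12 </quantity>
    <deliveryDate date="27aug1999-12:00h" />
  </order>
</partorders>"""

# Prefix chosen for this sketch; it maps to the document's default namespace.
NS = {"po": "http://myco.org/Spec/partorders"}

root = ET.fromstring(ORDERS)
order = root.find("po:order", NS)          # an element
quantity = order.find("po:quantity", NS)   # a nested element

order_ref = order.get("ref")               # attribute of the order element
units = quantity.get("units")              # attribute of the quantity element
amount = int(quantity.text.strip())        # text content of the quantity element

# Strip the "{namespace}" part that ElementTree puts in front of each tag name.
child_names = [child.tag.split("}")[1] for child in order]

print(order_ref, units, amount, child_names)
```

The list of child names reflects the hierarchy: desc, part, quantity and deliveryDate all sit under one order.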
XML Data Model - A Tree
<partorders xmlns="...">
  <order date="..."
         ref="...">
    <desc> ..text..
    </desc>
    <part />
    <quantity />
    <delivery-date />
  </order>
  <order ref=".." .../>
</partorders>
[Figure: the same document drawn as a tree. The root partorders (xmlns=) has two order children, each with ref= and date= attributes; the first order has children desc, part, quantity and delivery-date, with text nodes as leaves.]
XML: Design goals
• Simple but reliable
  – Strict syntax rules, to eliminate syntax errors
  – Syntax defines structure (hierarchically), and names structural parts (element names) -- it is self-describing data
• Extensible and ‘mixable’
  – Can create your own language of tags/elements
  – Can mix one language with another, and still reliably separate / process the data
• Designed for a distributed environment
  – Can have remote (‘webbed’) data, and retrieve and use it reliably
XML Processing: The XML Parser
[Figure: XML data → XML parser → parser interface → XML-based application]
• The parser must verify that the XML is syntactically correct
• Such data is said to be well-formed
  – The minimal requirement to “be” XML
• A parser MUST stop processing if the data isn’t well-formed
  – E.g., stop processing and “throw an exception” to the XML-based application. The XML 1.0 spec requires this behaviour
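The "stop and throw an exception" behaviour can be seen with almost any parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False if it raises."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser stopped at the first well-formedness error.
        return False
    return True

good = '<order ref="x23"><part number="23-23221-a12" /></order>'
bad_nesting = '<order ref="x23"><part number="23-23221-a12"></order>'  # <part> never closed
bad_attribute = '<order ref=x23></order>'                              # attribute value not quoted

for sample in (good, bad_nesting, bad_attribute):
    print(is_well_formed(sample), sample)
```

The application only ever sees data from the first document; for the other two it gets the exception and nothing else.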
Special Issues: Characters and Charsets
• The XML specification defines the characters allowed as whitespace in tags, so a tag may be spread over several lines:
    <element
        id
        =
        "23.112"
    />
• You cannot use the EBCDIC character ‘NEL’ as whitespace
  – Make sure not to do so!
• What if you want to include characters not defined in the encoding charset (e.g., Greek characters in an ISO-Latin-1 document)?
  – Use character references. For example:
      &#9824; -- the spades character (♠), code point 9824 (decimal) in the Unicode character set
• Also, a reminder that binary data is forbidden
  – It must be encoded as printable characters (e.g. using Base64)
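Both points, character references and Base64 for binary data, can be tried out directly. A minimal sketch, assuming Python's standard library; the element names suit and blob and the sample bytes are invented for this illustration:

```python
import base64
import xml.etree.ElementTree as ET

# An ISO-Latin-1 document cannot contain the spade character directly,
# but it can carry it as a numeric character reference.
latin1_doc = b'<?xml version="1.0" encoding="iso-8859-1"?><suit>&#9824;</suit>'
suit = ET.fromstring(latin1_doc).text
code_point = ord(suit)

# Binary data must travel as printable characters, e.g. Base64 text.
payload = bytes([0, 255, 16, 128])
blob = ET.Element("blob")
blob.text = base64.b64encode(payload).decode("ascii")
wire_form = ET.tostring(blob)

# The receiver parses the XML and decodes the text back into bytes.
recovered = base64.b64decode(ET.fromstring(wire_form).text)

print(code_point, wire_form, recovered == payload)
```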
Parsers and DTDs
[Figure: XML data and its DTD → XML parser → parser interface → XML-based application]
– A DTD can define external parts (entities) to be ‘included’ in the document
– But what if the parser can’t find the external parts (firewall?)?
– That depends on the type: there are two types of XML parsers
  • one that MUST retrieve all parts
  • one that can ignore them (if it can’t find them)
Two types of XML parsers
• Validating
  – Must retrieve all entities and process all of the DTD. Will stop processing and indicate a failure if it cannot
  – It must also test and verify other things in the DTD -- instructions that define syntactic document rules (allowed elements, attributes, etc.)
• Non-validating (well-formed only)
  – Tries to retrieve all ‘parts’, but will cease processing the DTD content at the first part (entity) it can’t find
  – But this is not an error -- the parser simply makes available the XML data (and the names of any unresolved ‘parts’) to the application
• Application behavior will depend on parser type
• Many parsers can operate in either mode (config)
2. Defining language dialects and constraints
Defining constraints / languages
• Two ways of doing so:
  – XML Document Type Definition (DTD) -- Part of core XML spec.
  – XML Schema (often called XSD) -- New specification (2001), which allows for richer constraints on XML documents.
• What DTDs and/or schemas specify:
  – Allowed element and attribute names, hierarchical nesting rules; element content/type restrictions
• Adding dialect specifications implies two classes of XML data
  – Well-formed: XML that is syntactically correct
  – Valid: XML that is well-formed and consistent with a specific DTD (or Schema)
• Schemas are more powerful than DTDs
  – Often used for type validation, or for defining low-level type constraints on values (integer, varchar, datetime, etc.)
DTD Example
<!DOCTYPE transfers [
  <!ELEMENT transfers (fundsTransfer)+ >
  <!ELEMENT fundsTransfer (from, to) >
  <!ATTLIST fundsTransfer
            date CDATA #REQUIRED>
  <!ELEMENT from (amount, transitID?, accountID,
                  acknowledgeReceipt ) >
  <!ATTLIST from
            type (intrabank|internal|other) #REQUIRED>
  <!ELEMENT amount (#PCDATA) >
  . . . Omitted DTD content . . .
  <!ELEMENT to EMPTY >
  <!ATTLIST to
            account CDATA #REQUIRED>
]>
<transfers>
  <fundsTransfer date="20010923T12:34:34Z">
  . . . As with previous example . . .
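To see what a parser reads out of such a DTD, the declarations can be listed as they are parsed. A minimal sketch, assuming Python's standard xml.parsers.expat module (a non-validating parser: it reports the declarations but does not enforce them). The three #PCDATA declarations standing in for the omitted DTD content, and the instance data, are assumptions made for this illustration:

```python
import xml.parsers.expat

# DTD from the slide; the transitID, accountID and acknowledgeReceipt
# declarations are guesses that stand in for the omitted content.
DOC = """<!DOCTYPE transfers [
  <!ELEMENT transfers (fundsTransfer)+ >
  <!ELEMENT fundsTransfer (from, to) >
  <!ATTLIST fundsTransfer date CDATA #REQUIRED>
  <!ELEMENT from (amount, transitID?, accountID, acknowledgeReceipt) >
  <!ATTLIST from type (intrabank|internal|other) #REQUIRED>
  <!ELEMENT amount (#PCDATA) >
  <!ELEMENT transitID (#PCDATA) >
  <!ELEMENT accountID (#PCDATA) >
  <!ELEMENT acknowledgeReceipt (#PCDATA) >
  <!ELEMENT to EMPTY >
  <!ATTLIST to account CDATA #REQUIRED>
]>
<transfers>
  <fundsTransfer date="20010923T12:34:34Z">
    <from type="intrabank">
      <amount>100.00</amount>
      <accountID>12345</accountID>
      <acknowledgeReceipt>yes</acknowledgeReceipt>
    </from>
    <to account="67890"/>
  </fundsTransfer>
</transfers>"""

declared_elements = []
required_attributes = []

def on_element_decl(name, model):
    # Called once per <!ELEMENT ...> declaration.
    declared_elements.append(name)

def on_attlist_decl(element, attribute, attr_type, default, required):
    # Called once per attribute declared in an <!ATTLIST ...>.
    if required:
        required_attributes.append((element, attribute))

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

print(declared_elements)
print(required_attributes)
```

Checking the instance against these rules is the job of a validating parser, which expat is not.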
XML Namespaces
• Mechanism for identifying different “spaces” for XML names
  – That is, element or attribute names
• This is a way of identifying different language dialects, consisting of names that have specific semantic (and processing) meanings.
• For example, <key/> in one language (e.g. a security key) can be distinguished from <key/> in another language (a database key)
• Mechanism uses a special xmlns attribute to define namespaces.
  – The namespace name is a URI string (typically a URL)
  – But the URL does not reference anything in particular (there may be nothing there!)
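The <key/> example can be made concrete. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; both namespace URIs, the sec: and db: prefixes and the record element are invented for this illustration (and, as noted above, nothing needs to exist at those addresses):

```python
import xml.etree.ElementTree as ET

# Two <key> elements with the same local name but different namespaces.
DOC = """<record xmlns:sec="http://example.org/ns/security"
        xmlns:db="http://example.org/ns/database">
  <sec:key>public-key-material</sec:key>
  <db:key>42</db:key>
</record>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as "{namespace-uri}localname",
# so the two keys are never confused with one another.
tags = [child.tag for child in root]
security_key = root.find("{http://example.org/ns/security}key").text
database_key = root.find("{http://example.org/ns/database}key").text

print(tags, security_key, database_key)
```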
Mixing languages together
Namespaces let you do this relatively easily:
<?xml version="1.0" encoding="utf-8" ?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:mt="http://www.w3.org/1998/Math/MathML">
  <head>
    <title> Title of XHTML Document </title>
  </head><body>
    <div class="myDiv">
      <h1> Heading of Page </h1>
      <mt:mathml>
        <mt:title> ... MathML markup . . . </mt:title>
      </mt:mathml>
      <p> more html stuff goes here </p>
    </div>
  </body>
</html>
• Default ‘space’ is xhtml
• mt: prefix indicates ‘space’ mathml (a different language)
XML Schemas
• A specification for defining XML validation rules
  – Specs: http://www.w3.org/XML/Schema
  – Best-practice: http://www.xfront.com/BestPracticesHomepage.html
• Uses pure XML (plus namespaces) to do this
• More powerful than DTDs - can specify things like integer types, date strings, real numbers in a given range, etc.
• Often used for type validation, or for relating database schemas to XML models
• They don’t, however, let you declare entities -- those can only be declared in DTDs
• The following slide shows the XML schema equivalent to our DTD
XML Schema version of our DTD (Portion)
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="accountID" type="xs:string"/>
  <xs:element name="acknowledgeReceipt" type="xs:string"/>
  <xs:complexType name="amountType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="currency" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:NMTOKEN">
              <xs:enumeration value="USD"/>
              . . . (some stuff omitted) . . .
            </xs:restriction>
          </xs:simpleType>
        </xs:attribute>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
  <xs:complexType name="fromType">
    <xs:sequence>
      <xs:element name="amount" type="amountType"/>
      <xs:element ref="transitID" minOccurs="0"/>
      <xs:element ref="accountID"/>
      <xs:element ref="acknowledgeReceipt"/>
    </xs:sequence>
    . . . And still more !!! . . .
3. XML processing
XML Software
• XML parsers
  – Read in XML data, check syntactic (and possibly DTD/Schema) constraints, and make the data available to an application. There are four 'generic' parser APIs:
    • SAX: Simple API for XML (event-based)
    • DOM: Document Object Model (object/tree based)
    • JDOM: Java Document Object Model (object/tree based)
    • Pull: evolving API (new) (pull-based / object + tree)
  – Lots of XML parsers and interface software available
    • Unix, Linux, Windows 2000/XP, z/OS, etc.
  – SAX-based parsers are fast (often as fast as you can stream data)
  – DOM is slower and more memory intensive (creates an in-memory version of the entire document)
  – Validating can be much slower than non-validating
Parser API: SAX
A) SAX: Simple API for XML
  – http://www.megginson.com/SAX/index.html
  – An event-based interface (a push parser API)
  – Parser reports events whenever it sees a tag/attribute/text node/unresolved external entity/other (driven by input stream)
  – Programmer attaches “event handlers” to handle the event
• Advantages
  – Simple to use
  – Very fast (not doing very much before you get the tags and data)
  – Low memory footprint (doesn’t read an XML document entirely into memory)
• Disadvantages
  – Not doing very much for you -- you have to do everything yourself
  – Not useful if you have to dynamically modify the document once it’s in memory (since you’ll have to do all the work to put it in memory yourself!)
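The event-handler style looks roughly like this. A minimal sketch, assuming Python's standard xml.sax module (a SAX binding for Python); the class name OrderHandler and the sample document are made up for illustration:

```python
import xml.sax

class OrderHandler(xml.sax.ContentHandler):
    """Records every event the parser pushes at it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        attributes = {key: attrs.getValue(key) for key in attrs.getNames()}
        self.events.append(("start", name, attributes))

    def characters(self, content):
        # Ignore whitespace-only text between tags.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

handler = OrderHandler()
xml.sax.parseString(
    b'<order ref="x23"><quantity units="gross">12</quantity></order>',
    handler,
)

for event in handler.events:
    print(event)
```

Nothing is kept in memory except what the handler itself stores, which is where both the speed and the "do everything yourself" cost come from.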
Parser API: DOM
B) DOM: Document Object Model
  – http://www.w3.org/DOM/
  – An object-based interface
  – Parser generates an in-memory tree corresponding to the document
  – DOM interface defines methods for accessing and modifying the tree
• Advantages
  – Very useful for dynamic modification of, and access to, the tree
  – Useful for querying (i.e. looking for data) that depends on the tree structure, e.g. element.childNodes.item(2).getAttribute("units")
  – Same interface for many programming languages (C++, Java, ...)
• Disadvantages
  – Can be slow (needs to produce the tree), and may need lots of memory
  – DOM programming interface is a bit awkward, not terribly object oriented
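Access to and modification of the in-memory tree can be sketched as follows, assuming Python's standard xml.dom.minidom module (a lightweight DOM implementation); the sample document, the note element and the unit conversion are invented for this illustration:

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<order ref="x23"><part number="23-23221-a12"/>'
    '<quantity units="gross">12</quantity></order>'
)
order = doc.documentElement

# Query the tree: find the quantity element and read its attribute.
quantity = order.getElementsByTagName("quantity")[0]
units_before = quantity.getAttribute("units")

# Modify the tree in place: change an attribute and the text node.
quantity.setAttribute("units", "dozen")
quantity.firstChild.data = "144"

# Add a new element to the tree.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("converted from gross"))
order.appendChild(note)

child_names = [node.nodeName for node in order.childNodes]
serialized = order.toxml()
print(child_names)
print(serialized)
```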
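DOM Parser Processing Model
[Figure: XML data → parser → DOM parser interface → application. The parser builds a Document “object”: a tree with root partorders and two order children; the first order has children desc (with text), part, quantity and delivery-date.]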
Parser API: JDOM
B2) JDOM: Java Document Object Model
  – http://www.jdom.org
  – A Java-specific object-oriented interface
  – Parser generates an in-memory tree corresponding to the document
  – JDOM interface has methods for accessing and modifying the tree
• Advantages
  – Very useful for dynamic modification of the tree
  – Useful for querying (i.e. looking for data) that depends on the tree structure
  – Much nicer object-oriented programming interface than DOM
• Disadvantages
  – Can be slow (make that tree...), and can take up lots of memory
  – New, and not entirely cooked (but close)
  – Only works with Java
Parser API: Pull
C) Pull Interfaces
  – http://www.xmlpull.org/ (Java); there is also a .NET pull API
  – A pull-parser interface
  – API uses expressions / methods to ‘pull’ specific chunks of XML data, or to iterate over the XML
  – Can be built on top of a DOM model
• Advantages
  – Easier to write applications that need to read in and process XML data (‘easier’ model than a push API, in many cases)
  – Has proven a very popular component in the .NET toolkit
• Disadvantages
  – Can be slow if you do lots of iteration over the XML input data
  – No common API across different languages (although xmlpull.org tries to be similar to the .NET API); not yet a ‘real’ standard (still being worked on; not part of most commercial environments)
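The pull style, where the application asks for the next event instead of being called back, can be sketched with Python's standard xml.etree.ElementTree.XMLPullParser. This is Python's own pull interface, used here only as an analogy; it is not the xmlpull.org or .NET API, and the sample data is invented:

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(events=("start", "end"))

# Data can arrive in chunks; the application decides when to look at events.
parser.feed('<partorders><order ref="a1"><quantity>12</quantity></order>')
parser.feed('<order ref="b2"><quantity>3</quantity></order></partorders>')
parser.close()

quantities = {}
current_ref = None
for event, element in parser.read_events():   # the application pulls events
    if event == "start" and element.tag == "order":
        current_ref = element.get("ref")
    elif event == "end" and element.tag == "quantity":
        quantities[current_ref] = int(element.text)

print(quantities)
```

The loop reads like ordinary iteration, which is the ‘easier’ model the advantages above refer to.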
XML Processing: XSLT
D) XSLT: eXtensible Stylesheet Language -- Transformations
  – http://www.w3.org/TR/xslt
  – An XML language for processing/transforming XML
  – Does tree transformations -- takes XML and an XSLT style sheet as input, and produces a new XML document with a different structure
• Advantages
  – Very useful for tree transformations -- much easier than DOM or SAX for this purpose
  – Can be used to query a document (XSLT pulls out the part you want)
• Disadvantages
  – Can be slow for large documents or stylesheets
  – Can be difficult to debug stylesheets (poor error detection; much better if you use schemas)
XSLT processing model
[Figure: XML data in and an XSLT style sheet in (each optionally with a schema) pass through XML parsers, producing document “objects” for the data and the style sheet. The XSLT processor transforms the data tree (partorders, order, desc, text, part, quantity, delivery-date) into data out (XML) with a different tree structure.]
XML Processing Toolkits
Lots of them …
• Java
  – JAXP ( http://java.sun.com/xml/jaxp/faq.html )
  – dom4j ( http://www.dom4j.org )
• .NET ( part of .NET framework )
• … others …
• Provide DOM, SAX, (JDOM) interfaces, plus lots of other useful tools in a standardized way (loading parsers, performing XSLT transformations, etc.)
• JAXP is standard Java, and thus integrated with WebSphere
4. XML databases
XML and databases
• So where do you stick XML data?
  – Inside a database!?!
  – But how to do this, and which database type to use: RDBMS, ORDBMS, ODB, XML??
• How you do so depends on the use cases you have for the data. Some good questions to ask:
  – Am I talking about storing documents, or data?
    • Is the XML format integral to the application (e.g. XHTML, DocBook)?
  – How will the database be queried?
    • Queried by XML structure, or by standard SQL?
    • What ‘parts’ of the document need to be queried?
    • Do I need a text index?
  – How will the data be used/retrieved?
    • Passed to XML processing tools (e.g. XSLT), or used at ‘atomic’ simple type level?
  – The answers drive the choices:
    • What database to choose, how to map XML to tables (O-R or table mappings), store as BLOB or broken up …..
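The "store as BLOB or broken up" choice can be illustrated on a small scale. A minimal sketch, assuming Python's standard sqlite3 and xml.etree.ElementTree modules; the table names, column names and sample orders are all invented for this illustration and imply nothing about any particular product:

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<partorders>
  <order ref="a1"><part number="23-23221-a12"/><quantity units="gross">12</quantity></order>
  <order ref="b2"><part number="99-00001-z01"/><quantity units="each">3</quantity></order>
</partorders>"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE order_docs (id INTEGER PRIMARY KEY, xml TEXT)")
db.execute("CREATE TABLE orders (ref TEXT PRIMARY KEY, part_number TEXT,"
           " quantity INTEGER, units TEXT)")

# Option 1: keep the document whole. Easy to hand back to XML tools,
# but SQL cannot see inside it.
db.execute("INSERT INTO order_docs (xml) VALUES (?)", (DOC,))

# Option 2: break the document up into rows ('atomic' values).
# Queryable with standard SQL, but the XML structure is gone.
for order in ET.fromstring(DOC).findall("order"):
    quantity = order.find("quantity")
    db.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        (order.get("ref"), order.find("part").get("number"),
         int(quantity.text), quantity.get("units")),
    )

gross_refs = [row[0] for row in db.execute("SELECT ref FROM orders WHERE units = 'gross'")]
order_count = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stored_xml = db.execute("SELECT xml FROM order_docs").fetchone()[0]
print(gross_refs, order_count, stored_xml == DOC)
```

Which option fits depends on the answers to the questions above: document-style data usually favours the first, data-style the second.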
XML and databases
• Upcoming technologies
  – XML Query: a query language for querying XML datasets (and databases)
    • Uses XML schema for type casting, and validation
    • Info: http://www.w3.org/XML/Query
• Useful XML Database references
  – http://www.xml.com/pub/a/2001/10/31/nativexmldb.html (Introductory article)
  – http://www.rpbourret.com/xml/XMLAndDatabases.htm (XML and databases)
  – http://www.rpbourret.com/xml/XMLDatabaseProds.htm (Products list)
  – http://www.xmldb.org/resources.html (Docs / resource list)
5. XML messaging / web services
XML Messaging
• Use XML as the format for sending messages between systems
• Advantages:
  – Common syntax; self-describing (easier to parse)
  – Can use common/existing transport mechanisms to “move” the XML data (HTTP, HTTPS, SMTP (email), MQ, IIOP (CORBA), JMS, ….)
• Requirements
  – Shared understanding of dialects for transport (requires a registry [namespaces!] for identifying dialects)
  – Shared acceptance of messaging contract
• Disadvantages
  – Asynchronous transport; no guarantee of delivery, no guarantee that partner (external) shares acceptance of contract.
  – Messages will be much larger than binary (10x or more) [can compress]
Common messaging model
• XML over HTTP
  – Use HTTP to transport XML messages
POST /path/to/interface.pl HTTP/1.1
Referer: http://www.foo.org/myClient.html
User-agent: db-server-olk
Accept-encoding: gzip
Accept-charset: iso-8859-1, utf-8, ucs
Content-type: application/xml; charset=utf-8
Content-length: 13221
. . .
<?xml version="1.0" encoding="utf-8" ?>
<message>
  . . . Markup in message . . .
</message>
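A request like the one above can be assembled in a few lines. A minimal sketch, assuming Python's standard urllib.request module; the host www.example.org and the message content are placeholders invented for this illustration, and the request is built but deliberately not sent:

```python
import urllib.request

# The XML message, encoded to bytes in the charset named in the header.
body = (
    '<?xml version="1.0" encoding="utf-8" ?>\n'
    '<message><order ref="x23-2112-2342"/></message>'
).encode("utf-8")

request = urllib.request.Request(
    "http://www.example.org/path/to/interface.pl",   # placeholder endpoint
    data=body,
    headers={
        "Content-Type": "application/xml; charset=utf-8",
        "Accept-Encoding": "gzip",
    },
    method="POST",
)

# urllib.request.urlopen(request) would send it; Content-length is then
# filled in automatically from the size of the body.
print(request.get_method(), request.full_url)
print(request.header_items())
```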
Some standards for message format
• Define dialects designed to “wrap” remote invocation messages
• XML-RPC http://www.xmlrpc.com
  – Very simple way of encoding function/method call name, and passed parameters, in an XML message.
• SOAP (Simple Object Access Protocol) http://www.soapware.org
  – More complex wrapper, which lets you specify schemas for interfaces; more complex rules for handling/proxying messages, etc. This is a core component of Microsoft’s .NET strategy, and is integrated into more recent versions of WebSphere and other commercial packages.
  – The W3C activity that sets the SOAP spec is outlined at: http://www.w3.org/2000/xp/Group/
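How XML-RPC wraps a method name and its parameters can be seen without any network traffic. A minimal sketch, assuming Python's standard xmlrpc.client module; the method name orders.place and its two parameters are hypothetical:

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method with two parameters.
request_xml = xmlrpc.client.dumps(
    ("x23-2112-2342", 12),
    methodname="orders.place",
)
print(request_xml)

# The receiving side decodes the same message back into a name and values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The printed message is a small XML document with a methodCall root, a methodName and one param per argument; SOAP envelopes follow the same idea with more structure.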
XML Messaging + Processing
• XML as a universal format for data exchange
[Figure: a factory application places an order (XML/EDI) through a SOAP interface, using SOAP over HTTP; several suppliers each expose a SOAP API and return a supplier response (XML/EDI) using SOAP over HTTP. Transport: HTTP(S), SMTP, other ...]
Web “Services” Model
• SOAP plus higher-level modeling for how services are ‘advertised’, ‘exposed’ and ‘found’
  – Uses an XML dialect, WSDL (Web Services Description Language), to define a service
    • WSDL can use XML Schema to define how data is passed between a service provider and requestor
  – Uses an XML dialect, UDDI (Universal Description, Discovery and Integration), for
    • Describing services (high-level)
    • Discovering services (registry services, metadata)
    • UDDI defined using XML Schema
  – Core technology for application integration
    • Microsoft .NET
    • IBM WebSphere
    • Oracle
    • …. Many others
Web Services Code Development
[Figure: an automated code generator takes the WSDL and XML schema and produces a client-side proxy and a server-side WS/SOAP skeleton. Client code calls the proxy; SOAP requests/responses flow between proxy and skeleton; middle tier code (validation, business logic, routing, logging, more…) connects through adapters to product system code (MECH). Remaining task: write the application!]
6. Conclusions
XML (and related) Specifications
[Figure: map of XML and related specifications, grouped into XML Core, APIs, Protocols, Web Services, Style and Application areas, with a legend distinguishing W3C recommendations, W3C drafts, industry standards and ‘open’ standards.]
Specifications shown: XML 1.0, XML names, XML base, Infoset, Canonical, Xfragment, Xpath, Xpointer, Xlink, XML schema, XML query, XML signature, RDF, MathML, SMIL 1 & 2, SVG, XSLT, XSL, CSS 1, CSS 2, CSS 3, DOM 1, DOM 2, DOM 3, SAX 1, SAX 2, JDOM, JAXP, XHTML 1.0, XHTML basic, XHTML events, Modularized XHTML, Xforms, SOAP, WSDL, UDDI, XML-RPC, Biztalk, ebXML, WDDX, XMI, FinXML, IFX, FpML, dirXML, ... 100's more ....
XML 101:
A Technical Introduction to XML
The End.