Introduction to XML
John Arnett, MSc
Standards Modeller
Information and Statistics Division
NHSScotland
Tel: 0131 551 8073 (x2073)
mailto:[email protected]
http://isdscotland.org/xml
Contents
• What is XML?
• Anatomy of an XML Document
• Conformance and Validation
• Summary
• Find Out More
What is XML?
• XML is not…
– a programming language
– a software panacea
– an object-oriented technology
– HTML with funny tags
– a replacement for HTML… but it is
re-shaping publishing on the web
What is XML?
• Stands for Extensible Markup Language
– Meta-markup language derived
from SGML (Standard Generalised
Markup Language)
– Open Standard, currently XML 1.0
2nd edition (W3C Recommendation
6 October 2000)
What is XML?
• W3C says
– XML is the universal format for
structured documents and data on
the Web
– A data object is an XML document
if it is well-formed, as defined in [the
W3C] specification (more on this later)
What is XML?
• Data Content and Presentation
Sample dataset
ID      SURNAME   FORENAME  SEX  DOB
134376  Jones     Ian       0    06011971
198457  McKenzie  Alison    1    23081983
111672  Martin    Lesley    0    12111979
147678  Jackson   Sarah     1    15061976
Flat file, database, spreadsheet, etc.
What is XML?
• Record – data oriented structure
111672 Martin Lesley 0 12111979
Structured
Searchable
Easy to understand
Portable
What is XML?
• HTML – document oriented structure
<h1>Record Id: <font color="red">111672</font></h1>
<table><colgroup><col align="left"></colgroup>
<tr><th>Surname:</th><td>Martin</td></tr>
<tr><th>Given Name:</th><td>Lesley</td></tr>
<tr><th>Sex:</th><td>Male</td></tr>
<tr><th>Date of Birth:</th><td>12 November 1979</td></tr>
</table>
Rendered as:
Record Id: 111672
Surname: Martin
Given Name: Lesley
Sex: Male
Date of Birth: 12 November 1979
Easy to understand
Portable
Structured
Searchable
What is XML?
• XML to the rescue!
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
Easy to understand
Portable
Structured
Searchable
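A minimal sketch of what "structured" and "searchable" mean in practice, assuming Python and its standard-library xml.etree.ElementTree module (neither is part of the original slides; the variable names are illustrative). It parses a record shaped like the one above and picks fields out by name rather than by position:

```python
import xml.etree.ElementTree as ET

# A record shaped like the one on the slide, held as a string.
RECORD_XML = """
<Record recordId="111672">
  <Surname>Martin</Surname>
  <GivenName>Lesley</GivenName>
  <Sex>M</Sex>
  <DateOfBirth>
    <Day>12</Day><Month>11</Month><Year>1979</Year>
  </DateOfBirth>
</Record>
"""

record = ET.fromstring(RECORD_XML)

# Attributes are read with get(); child element text with findtext().
record_id = record.get("recordId")
surname = record.findtext("Surname")

# Nested elements are addressed with a simple path.
year_of_birth = int(record.findtext("DateOfBirth/Year"))

print(record_id, surname, year_of_birth)
```

Compare this with the flat record, where the reader has to know in advance that the third field is the forename and the last field is a date in DDMMYYYY form.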
What is XML?
• HTML and XML are…
– Text based
– Open standards
– Widely used
What is XML?
• But XML also…
– Is structured
– Separates data from presentation
– Is self-describing
– Is searchable
– Is extensible
• i.e. you can define any number of your own tags
Anatomy of an XML Document
• XML documents consist of text
– character data
• tab, carriage return and line feed
• Unicode characters
– markup
Anatomy of an XML Document
• Markup
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<!-- this is an xml comment -->
<MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– start-, end- and empty element tags
• tag names are case sensitive!
– entity and character references
– comments
Anatomy of an XML Document
• Character data
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<!-- this is an xml comment -->
<MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Reserved characters
&, <, >, ' and "
Anatomy of an XML Document
• Declaration
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<!-- this is an xml comment -->
<MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Optional, but recommended by W3C; if present it
must be the very first thing in the document
– Tells the parser which XML version and
character encoding the document uses
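One concrete job of the declaration is to state the character encoding. The sketch below, assuming Python's standard-library xml.etree.ElementTree (not mentioned in the slides), feeds the same ISO-8859-1 bytes to the parser with and without a declaration. The sample message text and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# "Caf\u00e9" contains one non-ASCII character (e with an acute accent).
DECLARED = '<?xml version="1.0" encoding="ISO-8859-1"?><Message>Caf\u00e9</Message>'
UNDECLARED = "<Message>Caf\u00e9</Message>"

# With the declaration, the parser knows the bytes are ISO-8859-1.
declared_text = ET.fromstring(DECLARED.encode("iso-8859-1")).text

# Without it, the parser assumes UTF-8 and rejects the same bytes.
try:
    ET.fromstring(UNDECLARED.encode("iso-8859-1"))
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(declared_text, undeclared_ok)
```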
Anatomy of an XML Document
• Root Element
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<!-- this is an xml comment -->
<MessageBody>Hello, World Wide Web!</MessageBody>
</Message>
– Exactly one per document (also called the
document element)
– Contains all the other elements and
character data in the document
Anatomy of an XML Document
• Elements
<Book>XML Bible
  <Price>24.99</Price>
  <img src="book.gif"/>
  <Author>E.R. Harold</Author>
  <Publisher>J. Forbes</Publisher>
</Book>
– Define the content of the XML
document
– May contain other elements,
character data or can be empty
Anatomy of an XML Document
• Attributes
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
– Add data about the elements
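A short sketch of reading attributes, assuming Python's standard-library xml.etree.ElementTree (an illustrative choice, not something the slides prescribe). Note that attribute values always arrive as strings, so the prices have to be converted before they can be compared as numbers.

```python
import xml.etree.ElementTree as ET

CATALOG_XML = """
<BookCatalog Subject="XML">
  <Book Title="XML Bible" Price="24.99"/>
  <Book Title="XML How To Program" Price="19.99"/>
  <Book Title="Definitive XML Schema" Price="44.99"/>
</BookCatalog>
"""

catalog = ET.fromstring(CATALOG_XML)
subject = catalog.get("Subject")

# Build a title -> price mapping from the attributes of each Book element.
prices = {
    book.get("Title"): float(book.get("Price"))
    for book in catalog.findall("Book")
}

# The title with the lowest price.
cheapest = min(prices, key=prices.get)
print(subject, cheapest)
```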
Anatomy of an XML Document
• Handling reserved characters
– Built-in entities
& = &amp;
" = &quot;
< = &lt;
> = &gt;
' = &apos;
– CDATA Sections
<CodeSnippet>
<![CDATA[if (this->getX() < 5 && values[0] >= 10) cerr << "out of range";]]>
</CodeSnippet>
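The two approaches can be compared with a small sketch, assuming Python's standard library (xml.sax.saxutils.escape and xml.etree.ElementTree). The code fragment being escaped is a made-up example. Both routes hand the application the same character data once the document is parsed.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = 'if (x < 5 && y >= 10) print("out of range")'

# escape() always replaces &, < and >; quotes are replaced only if requested.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})
print(escaped)

# Route 1: entity references. The parser turns them back into characters.
from_entities = ET.fromstring("<CodeSnippet>" + escaped + "</CodeSnippet>").text

# Route 2: a CDATA section. No escaping needed inside it.
from_cdata = ET.fromstring(
    "<CodeSnippet><![CDATA[" + raw + "]]></CodeSnippet>"
).text
```

A CDATA section cannot itself contain the sequence `]]>`, so entity references remain the more general mechanism.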
Anatomy of an XML Document
• Namespaces
– Preventing naming collisions
<order
    xmlns:cust="http://www.example.com/custDetails"
    xmlns:book="http://www.example.com/bookDetails"
    xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
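A sketch of how a namespace-aware parser tells the two title elements apart, assuming Python's standard-library xml.etree.ElementTree. The lookup prefixes c, b and o are arbitrary choices made here; only the namespace URIs have to match the document.

```python
import xml.etree.ElementTree as ET

ORDER_XML = """
<order
    xmlns:cust="http://www.example.com/custDetails"
    xmlns:book="http://www.example.com/bookDetails"
    xmlns="http://www.example.com/order">
  <cust:title>Dr</cust:title>
  <cust:name>Peter Parker</cust:name>
  <book:title>White Teeth</book:title>
  <book:price>5.99</book:price>
  <orderNumber>AYT2379</orderNumber>
</order>
"""

# Prefix -> URI mapping used for lookups only.
NS = {
    "c": "http://www.example.com/custDetails",
    "b": "http://www.example.com/bookDetails",
    "o": "http://www.example.com/order",
}

order = ET.fromstring(ORDER_XML)
customer_title = order.findtext("c:title", namespaces=NS)
book_title = order.findtext("b:title", namespaces=NS)

# Unprefixed elements belong to the default namespace declared with xmlns="...".
order_number = order.findtext("o:orderNumber", namespaces=NS)

print(customer_title, book_title, order_number)
print(order.tag)  # the parser expands names to {namespace-URI}local-name
```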
Conformance and Validation
• All XML processors must check well-formedness constraints
– One root element
– Start and end tags match
<Tag>content</Tag>
– Empty elements are terminated as
<Tag/>
– Tags are correctly nested
<Parent><Child></Child></Parent>
– All attribute values enclosed in "quotes"
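The constraints listed above can be tried out with any XML parser. The sketch below assumes Python's standard-library xml.etree.ElementTree, which raises ParseError on input that is not well-formed; the helper name is_well_formed and the sample strings are illustrative.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = [
    "<Tag>content</Tag>",                # matching start and end tags
    "<Tag/>",                            # empty element
    "<Parent><Child></Child></Parent>",  # correct nesting
    "<Tag>content</tag>",                # names are case sensitive
    "<a><b></a></b>",                    # overlapping, not nested
    "<a/><b/>",                          # more than one root element
    "<Tag attr=value/>",                 # attribute value not quoted
]

results = {sample: is_well_formed(sample) for sample in samples}
for sample, ok in results.items():
    print(ok, sample)
```

The first three samples are accepted and the last four are rejected.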
Conformance and Validation
• Validating XML processors check
against validity constraints
– specified in Document Type
Definitions (DTDs) or Schemas
– a valid XML document must be
well-formed
– a well-formed document need not
necessarily be valid
Document Type Definitions
• DTD syntax able to specify
– Structure and order of child elements
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
– Element attributes
<!ATTLIST Product EffDate CDATA #IMPLIED>
• limited number of data types
• default and fixed attribute values
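To illustrate the difference between well-formed and valid, here is a hand-written stand-in for the Product rules above. It is only a sketch: Python's standard library has no validating parser, so the function is_valid_product (a hypothetical name) spells out the content model (Name, Size?) and the EffDate attribute by hand. A real project would use a validating parser instead.

```python
import xml.etree.ElementTree as ET

def is_valid_product(text: str) -> bool:
    """Hand-coded check mirroring the Product DTD rules; not a real DTD validator."""
    try:
        product = ET.fromstring(text)
    except ET.ParseError:
        return False  # not well-formed, so it cannot be valid
    if product.tag != "Product":
        return False
    # (Name, Size?): Name first, then an optional Size, nothing else.
    child_names = [child.tag for child in product]
    if child_names not in (["Name"], ["Name", "Size"]):
        return False
    # Product has element content: only whitespace is allowed around children.
    around = [product.text] + [child.tail for child in product]
    if any(piece and piece.strip() for piece in around):
        return False
    # (#PCDATA): Name and Size hold text only, no child elements.
    if any(len(child) for child in product):
        return False
    # Only the declared attribute may appear; #IMPLIED makes it optional.
    return set(product.attrib) <= {"EffDate"}

print(is_valid_product("<Product><Name>Widget</Name></Product>"))
print(is_valid_product("<Product><Size>10</Size><Name>Widget</Name></Product>"))
```

The second document is well-formed but puts Size before Name, so it fails the content model.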
Document Type Definitions
• DTDs
– Easy to understand and implement
– Lightweight alternative to schemas
– But…
• use non-XML syntax
• only limited support for data
typing and namespaces
• difficult to extend
Schemas
• W3C Schema
– Uses XML syntax
– Provides built-in data types and supports
user-defined data types
– Supports namespaces
– Provides several extensibility
mechanisms
Schemas
• Schemas therefore more flexible…
<xs:element name="Product">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Size" type="xs:positiveInteger" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="EffDate" type="xs:date"/>
  </xs:complexType>
</xs:element>
• but harder to understand than DTDs
<!ELEMENT Product (Name, Size?)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Size (#PCDATA)>
<!ATTLIST Product EffDate CDATA #IMPLIED>
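The practical gain from schema data types is that values such as Size and EffDate are checked, not just their position. Under the DTD, a Size of "large" is acceptable because #PCDATA is any text. The sketch below approximates the two schema types by hand in Python; the function names are hypothetical, and the checks are deliberately simplified (no time zones, four-digit years only), so they are not a substitute for a schema validator.

```python
import re
from datetime import date

def is_positive_integer(text: str) -> bool:
    """Approximates xs:positiveInteger: digits with an optional '+', value above zero."""
    return re.fullmatch(r"\+?[0-9]+", text) is not None and int(text) > 0

def is_date(text: str) -> bool:
    """Approximates xs:date as plain YYYY-MM-DD (no time zone, four-digit year)."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)  # rejects impossible dates such as 30 February
    except ValueError:
        return False
    return True

print(is_positive_integer("10"), is_positive_integer("large"))
print(is_date("2001-04-02"), is_date("02/04/2001"))
```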
In Summary…
• A language for describing markup
languages
• Extensible, i.e. you define your own tags
• Readable, structured and
self-describing
• Documents must be well-formed
• Documents may be validated using
DTDs and/or Schemas
Find Out More
• World Wide Web Consortium
– www.w3.org
• W3C XML v1.0 Specification
– http://www.w3.org/TR/REC-xml
Find Out More
• The XML Industry Portal
– www.xml.org
• O’Reilly XML site
– www.xml.com
• XML Cover Pages
– www.oasis-open.org/cover/
• Café Con Leche
– www.ibiblio.org/xml/
Find Out More
• Scottish Health and Community Care
XML Steering Group
– www.isdscotland.org/xml
XML Tools
• XSV - Open Source XML Schema
Validator
– www.ltg.ed.ac.uk/~ht/xsv-status.html
• MSXML 4.0
– www.microsoft.com/downloads/details.aspx?FamilyID=3144b72b-b4f2-46da-b4b6-c5d7485f2b42
XML Tools
• XML Spy 2004 IDE
– www.altova.com/products_ide.html
• Free XML Tools and Software
– www.garshol.priv.no/download/xmltools/
Printed Sources
• Numerous printed sources – for more
information visit
– Charles F. Goldfarb's
www.xmlbooks.com
– www.amazon.com