Chapter 4 - Quality Control with Schemas
Learning XML
by
Erik T. Ray
Slides were developed by
Jack Davis
College of Information Science
and Technology
Radford University
August 2006
1
Schemas
• define an XML tag set
- primarily elements, attributes, entities and
structure
• act as a pass-or-fail test for XML documents
(validation)
• ensure that a document fulfills a minimum set
of requirements, finding flaws that could
result in anomalous processing
• are not required
• A validating XML parser takes an XML
instance as input and produces a validation
report as output. This report typically lists
errors found in the document (where it does
not conform to the schema)
• Validation considers:
structure, data typing, integrity (status of links
between nodes and resources), business rules
(spell checking, checksum)
Schema Types
• DTD - Document Type Definition
The oldest and most widely supported schema
language.
DTD's don't support namespaces (can't mix
tag sets within a single DTD) and have very
weak data typing.
• The W3C built XML Schema
XML Schemas are themselves XML
documents, so they can be checked for
well-formedness and validity.
XML Schema supports namespaces and has a
much broader ability to specify data types,
including things like date types.
• Other schema definition languages are
available (RELAX NG, Schematron, …).
DTD's
• XML elements and attributes are defined in a
DTD
• DTD's are extensible - meaning they can be
extended to meet the needs of the current task
• A DTD can be specified within an XML
document (internal) or in a separate file
(external).
• Many free DTD's exist on the internet today
and can be freely downloaded
• DTD's declare a set of allowed elements. A
conforming XML document can't use any
elements not defined in this set.
• DTD's define a content model for each
element. This describes what elements or
data can go inside an element, in what order,
in what number, and whether they are required
or optional.
• DTD's declare a set of allowed attributes for
each element with data types and default
values.
• DTD's provide mechanisms to manage the
model, providing links to other components.
Element Declarations
• Element declaration
<!ELEMENT element_name (content model)>
Content Model
• Text:
– Description: text or character data
– Syntax: (#PCDATA)
• Elements:
– Description: contains other elements
– Syntax: (element_1, element_2, …)
• Mixed Content:
– Description: contains both text and other
elements
– Syntax: (#PCDATA | element_1 | element_2 | …)*
• Empty:
– Description: does not contain any content
– Syntax: EMPTY
• Any:
– Description: can contain text or elements
– Syntax: ANY
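As a rough sketch, the five content models can be seen side by side in one small document with an internal DTD. The element names (note, heading, para, emphasis, separator, scratch) are invented for illustration and do not come from the book.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!-- Elements: children must appear in this order -->
  <!ELEMENT note      (heading, para, separator, scratch)>
  <!-- Text: character data only -->
  <!ELEMENT heading   (#PCDATA)>
  <!-- Mixed content: text interleaved with emphasis elements -->
  <!ELEMENT para      (#PCDATA | emphasis)*>
  <!ELEMENT emphasis  (#PCDATA)>
  <!-- Empty: no content allowed -->
  <!ELEMENT separator EMPTY>
  <!-- Any: text or any declared elements -->
  <!ELEMENT scratch   ANY>
]>
<note>
  <heading>Reminder</heading>
  <para>Ship order <emphasis>10011</emphasis> today.</para>
  <separator/>
  <scratch>free-form <emphasis>notes</emphasis></scratch>
</note>
```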
Element Declaration Syntax
• Declaration syntax is flexible when it comes to
whitespace. You can add extra space
anywhere except in the string of characters at
the beginning that identifies the declaration
type.
For example, these are all acceptable:
<!ELEMENT thingie ANY>
<!ELEMENT
    thingie
    ANY>
<!ELEMENT thingie (
    foo |
    bar |
    zap )*>
Element: Character Notations
• Question Mark:
– Character: ?
– Description: element may occur zero or
one time
– Usage: email?
• Asterisk:
– Character: *
– Description: element may occur zero or
more times
– Usage: email*
• Plus:
– Character: +
– Description: element may occur one or
many times
– Usage: email+
Element: Character Notations (cont.)
• Parentheses:
– Character: ( )
– Description: groups items so that they can be
treated as a single unit
– Usage: (name, address, zip_code)
• Vertical bar:
– Character: |
– Description: separates alternatives, exactly
one of which must appear
– Usage: (a | b | c)
• Comma:
– Character: ,
– Description: used to indicate element
sequence
– Usage: (a, b, c)
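The notations above can be combined in a single content model. The following is a minimal sketch with a made-up contact element; the element names and the sample values are assumptions for illustration only.

```xml
<?xml version="1.0"?>
<!DOCTYPE contact [
  <!-- name: exactly once; phone or email: one or more, in any mix;
       address: optional; note: zero or more -->
  <!ELEMENT contact ( name, (phone | email)+, address?, note* )>
  <!ELEMENT name    (#PCDATA)>
  <!ELEMENT phone   (#PCDATA)>
  <!ELEMENT email   (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT note    (#PCDATA)>
]>
<contact>
  <name>Brenda</name>
  <email>brenda@xyzcompany.com</email>
  <phone>555-0100</phone>
  <note>Prefers e-mail.</note>
</contact>
```

Moving name after email, or leaving out both phone and email, would make the document invalid.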
Attribute Declarations
<!ATTLIST element_name
attribute_name-1 datatype default_value
attribute_name-2 datatype default_value
attribute_name-3 datatype default_value>
<!ATTLIST student
    level CDATA #REQUIRED>
<!ATTLIST student
    level (fr | soph | jr | sr) "fr">
Attribute Data Types
• Data type: CDATA
– Description: character data
• Data type: ID
– Description: unique identifier to give an
element a label
• Data type: Enumerated list, e.g. (a | b | c)
– Description: list of all possible values that
the attribute can contain
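A short sketch of the three data types on one element. The roster document, the attribute names sid and major, and the sample values are hypothetical; the level values are the ones used on the previous slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE roster [
  <!ELEMENT roster  (student+)>
  <!ELEMENT student (#PCDATA)>
  <!ATTLIST student
      sid    ID                    #REQUIRED
      major  CDATA                 #IMPLIED
      level  (fr | soph | jr | sr) "fr">
]>
<roster>
  <student sid="s1001" major="Information Science" level="jr">Ann</student>
  <student sid="s1002">Raj</student>
</roster>
```

A validating parser would reject a second student with sid="s1001", or a value such as "1001", because ID values must be unique within the document and must be legal XML names.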
Attributes: Default Values
• Attribute type: #FIXED
– Description: value of the attribute must
match the value assigned in the DTD
• Attribute type: #REQUIRED
– Description: element must contain the
attribute to be valid
• Attribute type: #IMPLIED
– Description: attribute is optional
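A sketch of the three keywords next to a plain literal default, using a made-up invoice element (all names and values are assumptions for illustration).

```xml
<?xml version="1.0"?>
<!DOCTYPE invoice [
  <!ELEMENT invoice EMPTY>
  <!ATTLIST invoice
      number    CDATA         #REQUIRED
      currency  CDATA         #FIXED "USD"
      memo      CDATA         #IMPLIED
      status    (open | paid) "open">
]>
<invoice number="10011"/>
```

A validating parser would report currency="USD" and status="open" for this instance even though neither is written, and would reject the document if number were missing or if currency were given any value other than USD.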
Example XML Document
<?xml version="1.0" standalone="yes"?>
<emails>
  <message num="a1"
      to="joe&#64;acmeshipping.com"
      from="brenda&#64;xyzcompany.com"
      date="02/09/01">
    <subject title="Order 10011"/>
    <body>
      Joe,
      Please let me know if order
      number 10011 has shipped.
      Thanks,
      Brenda
    </body>
    <reply status="yes"/>
  </message>
</emails>
Internal DTD
<!DOCTYPE emails [
<!ELEMENT emails   (message+)>
<!ELEMENT message  (subject?, body, reply*)>
<!ATTLIST message
    num    ID     #REQUIRED
    to     CDATA  #REQUIRED
    from   CDATA  #FIXED "brenda&#64;xyzcompany.com"
    date   CDATA  #REQUIRED>
<!ELEMENT subject  EMPTY>
<!ATTLIST subject
    title  CDATA  #IMPLIED>
<!ELEMENT body     ANY>
<!ELEMENT reply    EMPTY>
<!ATTLIST reply
    status (yes | no) "no">
]>
In a standalone XML document this declaration is
placed after the XML declaration and before the
root element. If the DTD is external, the XML
document must instead contain a declaration
like one of the following.
<!DOCTYPE emails SYSTEM "emails.dtd"> or
<!DOCTYPE emails SYSTEM "http://…">
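As a sketch of the external case: assume a file emails.dtd that holds the element and attribute declarations from this slide without the surrounding DOCTYPE wrapper. The message content below is invented for illustration.

```xml
<?xml version="1.0" standalone="no"?>
<!-- The DOCTYPE declaration follows the XML declaration
     and precedes the root element. -->
<!DOCTYPE emails SYSTEM "emails.dtd">
<emails>
  <message num="a2" to="joe@acmeshipping.com"
           from="brenda@xyzcompany.com" date="02/10/01">
    <body>Thanks for the update.</body>
  </message>
</emails>
```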
DTD Census Example
• Here's an example XML document: a typical
census record, created after an interview
with one family. Consider that all such
documents could be compiled and overall
statistics generated.
Example 4-1
• Here's the DTD that defines the rules by
which the census XML documents are
created.
Example 4-2
DTD Design
• DTD design and construction is part science
and part art form. The basic concepts are
simple, but maintaining hundreds of element
and attribute declarations while keeping them
readable and bug-free can be a challenge.
• Keep it organized
Good comments can save hours of
scrutinizing later. Do not wait until the end to
document. Keep declarations separated into
sections by their purpose.
Pad declarations with lots of whitespace.
Content models and attribute lists suffer from
dense syntax, so spacing out the parts, even
placing them on separate lines, helps. Indent
lines inside declarations to make the
delimiters clearer. Use extra space between
logical divisions.
DTD's will require updating as requirements
change. Number versions to avoid lots of
confusion later.
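One way these habits might look in practice is sketched below as an external DTD file. The file name, the version number, and the change note are hypothetical; the declarations reuse the e-mail example from the earlier slide.

```xml
<!-- ============================================================
     emails.dtd, version 1.1 (hypothetical name and version)
     1.1: added the optional reply element
     ============================================================ -->

<!-- ===== Document structure ===== -->
<!ELEMENT emails   ( message+ )>
<!ELEMENT message  ( subject?, body, reply* )>

<!-- ===== Message parts ===== -->
<!ELEMENT subject  EMPTY>
<!ATTLIST subject
    title   CDATA       #IMPLIED
>

<!ELEMENT body     ANY>

<!ELEMENT reply    EMPTY>
<!ATTLIST reply
    status  (yes | no)  "no"
>
```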
DTD Design (cont.)
• Parameter entities
Parameter entities can hold recurring parts of
declarations and allow you to edit them in one
place. In the external subset, they can be used in
element-type declarations to hold element groups
and content models, or in attribute list declarations
to hold attribute definitions. For example, assume
you want every element to have an optional ID
attribute for linking and an optional class attribute
to assign specific role information. Parameter
entities, which apply only in DTDs, look much like
ordinary general entities, but have an extra % in the
declaration. You can declare a parameter entity as
in the following:
<!ENTITY % common.atts "
    id     ID     #IMPLIED
    class  CDATA  #IMPLIED">
The entity can then be used in attribute list declarations:
<!ATTLIST foo %common.atts;>
<!ATTLIST bar %common.atts;
    extra CDATA #FIXED "blah">
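Parameter entities can hold content models in the same way. The sketch below is an external DTD file, since references inside declarations are only allowed in the external subset; the entity name inline and all element names are assumptions for illustration.

```xml
<!-- One place to edit the list of inline elements -->
<!ENTITY % inline "emphasis | code | link">

<!ELEMENT para     ( #PCDATA | %inline; )*>
<!ELEMENT item     ( #PCDATA | %inline; )*>
<!ELEMENT emphasis ( #PCDATA )>
<!ELEMENT code     ( #PCDATA )>
<!ELEMENT link     ( #PCDATA )>
```

Adding a new inline element then means changing the entity once instead of editing every content model that uses it.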
Attributes vs. Elements
• Making a DTD from scratch is not easy. You
have to break information down into its
conceptual atoms and package it as a
hierarchical structure, but it's not always clear
how to divide the information.
Choose names that make sense. Element
names like thing, object, and chunk are nearly
impossible to figure out.
Hierarchy adds information. A newspaper has
articles that contain paragraphs and heads.
Containers create boundaries to make it easier
to write stylesheets and processing
applications. Strive for a tree structure that
resembles a wide, bushy shrub. If you go too
deep, the markup begins to overwhelm the
content and it becomes harder to edit a
document; too shallow and the information
content is diluted.
Attributes vs. Elements (cont.)
• Know when to use elements over attributes.
An element holds content that is part of your
document. An attribute modifies the behavior
of an element. The trick is to find a balance
between using general elements with
attributes to specify purpose and creating an
element for every single contingency.
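The trade-off can be sketched by marking up the same information both ways. The catalog, entry, and book names below are invented for illustration; neither form is presented as the preferred one.

```xml
<?xml version="1.0"?>
<catalog>
  <!-- General element; attributes state its purpose -->
  <entry type="book" lang="en">
    <title>Learning XML</title>
  </entry>
  <!-- Specific elements; the names themselves state the purpose -->
  <book>
    <title>Learning XML</title>
    <language>en</language>
  </book>
</catalog>
```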
There are advantages to splitting a monolithic
DTD into smaller components, or modules.
The first is that a modularized DTD can be
easier to maintain. XML provides two ways to
modularize your DTD. The first is to store
parts in separate files, then import them with
external parameter entities. The second is to
use a syntactic device called a conditional
section.
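A conditional section might look like the following sketch, which belongs in an external DTD file because conditional sections are not allowed in the internal subset. The entity names draft and final and the element names are hypothetical.

```xml
<!-- Swap the two entity values to switch variants -->
<!ENTITY % draft "INCLUDE">
<!ENTITY % final "IGNORE">

<![%draft;[
  <!-- Draft documents may carry reviewer comments -->
  <!ELEMENT chapter ( title, ( para | comment )* )>
]]>
<![%final;[
  <!ELEMENT chapter ( title, para* )>
]]>

<!ELEMENT title   ( #PCDATA )>
<!ELEMENT para    ( #PCDATA )>
<!ELEMENT comment ( #PCDATA )>
```

Declarations inside a section marked INCLUDE are read as usual; those inside a section marked IGNORE are skipped.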
Importing Modules
• To import whole DTD's or parts of DTDs, use
an external parameter entity.
<!ELEMENT catalog (title, metadata, front, entries+)>
<!ENTITY % basic.stuff SYSTEM "basics.mod">
%basic.stuff;
<!ENTITY % frnt.matter SYSTEM "front.mod">
%frnt.matter;
<!ENTITY % metadata PUBLIC
    "-//Standards Stuff//DTD Metadata v3.2//EN"
    "http://www.standards- ….">
%metadata;
This DTD has two local components, which
are specified by system identifiers. Each
component has a .mod filename extension,
which is a traditional way to show that a file
contains declarations but should not be used
as a DTD on its own.
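For illustration, a module such as basics.mod might contain nothing but declarations, as in this sketch. The contents are hypothetical and are not taken from the book.

```xml
<!-- basics.mod: declarations only, meant to be imported
     rather than used as a DTD on its own -->
<!ELEMENT title    ( #PCDATA )>
<!ELEMENT para     ( #PCDATA | emphasis )*>
<!ELEMENT emphasis ( #PCDATA )>
<!ATTLIST para
    id  ID  #IMPLIED>
```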
Examples
• standalone.xml
• itfac.xml
Review the itfac.xml document; students should
then develop the DTD.
• faculty.dtd
• faculty.css
XML Schema Overview
• The XML Schema specification was released by the
W3C in May 2001 and contains two parts:
– Part I - structure
– Part II - data types
• Developed as an alternative to DTD’s and is much
more powerful
• Features:
– Pattern matching
– Rich set of data types
– Attribute grouping
– Supports XML namespaces
– Follows XML syntax
XML Schemas
• The XML Schema specification was released by the
W3C in May of 2001
• XML Schemas, like DTD’s, are used to describe the
structure of an XML document
• The XML Schema specification consists of two
parts:
– XML Schema: Structures. This specification
consists of a definition language for describing
and constraining the content of XML documents
– XML Schema: Datatypes. This specification
defines the datatypes to be used in XML
schemas.
• The namespace for XML Schema is:
http://www.w3.org/2001/XMLSchema
XML Schema - advantages
• XML Schema allows you to import
vocabularies (tag sets).
• XML Schemas are XML documents, so they
can be validated
• The XML Schema specification contains a
number of built-in datatypes, and also allows
developers to create their own datatypes
• Some of the datatypes are:
xs:string         text
xs:token          text made up of tokens (whitespace normalized)
xs:QName          namespace-qualified name
xs:decimal        positive and negative decimal numbers, including integers
xs:integer        integers
xs:float          floating-point number
xs:ID, xs:IDREF   identification token and a reference to one
xs:boolean        true or false
xs:time           HH:MM:SS
xs:date           CCYY-MM-DD
xs:dateTime       CCYY-MM-DDTHH:MM:SS-Zone
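A few of these types in use, as a minimal sketch. The element names are made up, and the value in each comment is only one example of content the type would accept.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Each element accepts only values in its type's lexical form -->
  <xs:element name="title"    type="xs:string"/>    <!-- Order 10011 -->
  <xs:element name="quantity" type="xs:integer"/>   <!-- 12 -->
  <xs:element name="price"    type="xs:decimal"/>   <!-- 19.95 -->
  <xs:element name="shipped"  type="xs:boolean"/>   <!-- true -->
  <xs:element name="shipDate" type="xs:date"/>      <!-- 2006-08-15 -->
  <xs:element name="shipTime" type="xs:time"/>      <!-- 14:30:00 -->
  <xs:element name="logged"   type="xs:dateTime"/>  <!-- 2006-08-15T14:30:00-04:00 -->
</xs:schema>
```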
Complex Elements
• Most elements are not simple. They can
contain elements, attributes, and character
data with specialized formats. So, complex
elements can be defined.
Here's an example complex type definition.
<xs:element name="date">
  <xs:complexType>
    <xs:all>
      <xs:element ref="year"/>
      <xs:element ref="mo"/>
      <xs:element ref="day"/>
    </xs:all>
  </xs:complexType>
</xs:element>
<xs:element name="year" type="xs:integer"/>
<xs:element name="mo" type="xs:integer"/>
<xs:element name="day" type="xs:integer"/>
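Complex types can also carry attributes, either alongside child elements or alongside character data. The following sketch shows both cases; the element names event and price and their attributes are assumptions for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Child elements in a fixed order, plus a required attribute -->
  <xs:element name="event">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="when"  type="xs:date"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
  </xs:element>

  <!-- Character data plus an attribute: simple content, extended -->
  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:token"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this sketch, a price element with currency="USD" and the content 19.95 would be valid, while the content "cheap" would not, because it is not a decimal number.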
Restriction Elements
• In the previous example the month number
was just given as type integer. However, this
would allow the user to insert any integer into
the document for the month number.
We'd like to restrict the month number
to 1-12.
<xs:simpleType name="monthNum">
<xs:restriction base="xs:integer">
<xs:minInclusive value="1" />
<xs:maxInclusive value="12" />
</xs:restriction>
</xs:simpleType>
<xs:element name="mo" type="monthNum"/>
Restriction Elements (cont.)
• Restrictions can create fixed values, constrain
the length of strings, and match patterns with
regular expressions. Here's an example that
restricts a postal code (three digits followed
by three capital letters).
<xs:element name="postalcode"
type="pcode"/>
<xs:simpleType name="pcode">
<xs:restriction base="xs:token">
<xs:pattern value="[0-9]{3}[A-Z]{3}"/>
</xs:restriction>
</xs:simpleType>
• Can also implement enumeration types
<xs:simpleType name="gender">
<xs:restriction base="xs:token">
<xs:enumeration value="female"/>
<xs:enumeration value="male"/>
</xs:restriction>
</xs:simpleType>
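Constraining the length of strings works the same way as the pattern and enumeration restrictions. This is a minimal sketch; the type names stateCode and shortName and the element names are hypothetical.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Exactly two characters, e.g. a state abbreviation such as VA -->
  <xs:simpleType name="stateCode">
    <xs:restriction base="xs:token">
      <xs:length value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Between 1 and 40 characters -->
  <xs:simpleType name="shortName">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="state" type="stateCode"/>
  <xs:element name="name"  type="shortName"/>
</xs:schema>
```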
XML Schema Occurrence Constraints
• Occurrence constraints define the number of times
a particular element can or must occur
• Attributes:
minOccurs:
Defines the minimum number of times an
element can occur. Default value is 1
maxOccurs:
Defines the maximum number of times an
element can occur. Default value is 1
• Can set the value of the “maxOccurs” attribute to
“unbounded” to indicate that there is no maximum
number of times the element can occur
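As a sketch, the occurrence attributes can express the DTD indicators from the e-mail example as well as bounded ranges. The attachment element is an invented addition, and all children are typed as plain strings to keep the example short.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="message">
    <xs:complexType>
      <xs:sequence>
        <!-- zero or one, like subject? in a DTD -->
        <xs:element name="subject" type="xs:string" minOccurs="0"/>
        <!-- exactly one: both attributes default to 1 -->
        <xs:element name="body" type="xs:string"/>
        <!-- zero or more, like reply* in a DTD -->
        <xs:element name="reply" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <!-- a bounded range, which DTD indicators cannot state directly -->
        <xs:element name="attachment" type="xs:string"
                    minOccurs="0" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```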
XML Schema Simple Type Example
• XML schemas are put together like DTD's with
element and attribute declarations along with
type declarations. A simple example shows
the structure.
• XML file:
<?xml version="1.0"?>
<email
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="email_schema.xsd">
This is my e-mail message
</email>
• Schema file:
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="email" type="xsd:string"/>
</xsd:schema>
XML Schemas
• XML Schemas utilize:
type extension
type restriction
lists
unions
namespace features
and much, much more.
This brief presentation only scratches the
surface of XML schemas.
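To give a flavor of three of these features, here is a minimal sketch of a list type, a union type, and a type extension. All type and element names are made up for illustration.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- List: whitespace-separated integers, e.g. "3 5 8" -->
  <xs:simpleType name="intList">
    <xs:list itemType="xs:integer"/>
  </xs:simpleType>

  <!-- Union: either a date or the literal token "unknown" -->
  <xs:simpleType name="unknownToken">
    <xs:restriction base="xs:token">
      <xs:enumeration value="unknown"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="dateOrUnknown">
    <xs:union memberTypes="xs:date unknownToken"/>
  </xs:simpleType>

  <!-- Extension: add an element to an existing complex type -->
  <xs:complexType name="person">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="student">
    <xs:complexContent>
      <xs:extension base="person">
        <xs:sequence>
          <xs:element name="level" type="xs:token"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```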
XML Schema Example
• Here's a schema for the Census example, for
which a DTD was defined earlier. Note the
differences.
Example 4-6