XML Technologies
and Related Applications
Lesson 2
XML is a technology for creating markup
languages to describe data of virtually any
type in a structured manner.
Unlike HTML, which limits the document
author to a fixed set of tags, XML allows
document authors to describe data more
precisely by creating new tags. XML can be
used to create markup languages for
describing data in almost any field.
• Learn about Elements and Attributes.
• Learn how namespaces are used to
promote better interoperability.
• Use entity references to embed illegal characters.
• Understand the importance of the
• Add comments to your XML documents.
What Makes an XML Document?
XML documents are based on elements.
• An element consists of a start tag, content,
and an end tag.
• Elements can contain other elements, and
additional descriptive information about an
element can be encoded with attributes.
<message>Welcome to XML!</message>
White space inside the tags themselves is irrelevant; white space in
the content is passed on to the application.
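A minimal sketch of reading the element above, assuming Python's standard xml.etree.ElementTree module as the parser (the slides do not name one). The second parse adds extra white space inside the tags to show that it does not change the result.

```python
import xml.etree.ElementTree as ET

# The one-element document from the slide.
message = ET.fromstring("<message>Welcome to XML!</message>")

# Same document with extra white space inside the start and end tags.
spaced = ET.fromstring("<message  >Welcome to XML!</message >")

print(message.tag)   # element name
print(message.text)  # character content between the tags
print(spaced.tag == message.tag and spaced.text == message.text)
```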
Some notes
Elements can contain other elements, nested to
any arbitrary depth. (solar system.xml)
• Note that symmetry among elements is not required.
• Each element can contain any combination of
sub-elements.
• XML documents are commonly stored in text files
that end in the extension .xml, although this is
not a requirement of XML.
Elements: The Building Blocks of XML
Valid element names
• Must start with a letter (A..Z, a..z) or an underscore.
• May contain any combination of letters (A..Z, a..z), digits (0..9),
dash ( - ), period ( . ), or underscore.
• Can be of any length.
• Are case sensitive.
• Terseness is of minimal importance.
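The naming rules above can be approximated with a regular expression. This is a simplified, ASCII-only sketch: the function name is_valid_name is hypothetical, the XML specification also allows many non-ASCII letters and the period, and the colon is left out here because it is used for namespace prefixes.

```python
import re

# Simplified, ASCII-only approximation of the element naming rules:
# first character is a letter or underscore; the rest may be letters,
# digits, dash, period, or underscore. Length is unlimited.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name: str) -> bool:
    """Return True if name satisfies the simplified rules above."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["planet", "_id-2", "2nd", "-x", "solar system"]:
    print(candidate, is_valid_name(candidate))
```

Because names are case sensitive, "Planet" and "planet" both pass this check but name two different elements.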
Content is the text between opening and
closing tags.
• In order to be able to tell the difference
between tags and content, a small number
of characters and symbols are not allowed in
the content area.
Special Symbols(1)
Common Symbols
&#65; for capital A
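A short sketch of both kinds of reference, assuming Python's standard library: xml.sax.saxutils.escape replaces &, < and > with the predefined entity references, and the parser turns the references (and the numeric character reference for capital A) back into characters. The sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "You & Me < Them"          # contains two characters that are illegal in content
escaped = escape(raw)            # & becomes &amp;, < becomes &lt;

# The parser converts the entity references back to the original characters.
element = ET.fromstring(f"<name>{escaped}</name>")

# A numeric character reference: code point 65 is capital A.
letter = ET.fromstring("<letter>&#65;</letter>")

print(escaped)
print(element.text)
print(letter.text)
```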
When content contains many such characters, a better solution
is to wrap it in a CDATA (character data) section.
• This tells the parser to pass the content through unchanged
and not attempt to interpret it as XML markup.
<name><![CDATA[ You & Me ]]></name>
• A CDATA section cannot contain another nested CDATA section.
CDATA Parse eg.xml
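A minimal sketch of how a parser treats the CDATA example, assuming Python's standard xml.etree.ElementTree: the text inside the section arrives unchanged, while the same text without the CDATA wrapper is rejected because of the bare ampersand.

```python
import xml.etree.ElementTree as ET

# The CDATA section from the slide: the ampersand needs no escaping.
name = ET.fromstring("<name><![CDATA[ You & Me ]]></name>")
print(repr(name.text))

# Without CDATA (and without &amp;) the document is not well formed.
try:
    ET.fromstring("<name> You & Me </name>")
    bare_ampersand_accepted = True
except ET.ParseError:
    bare_ampersand_accepted = False
print(bare_ampersand_accepted)
```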
Root Element
All XML documents must contain
exactly one root element.
• A DOM tree.
Attributes: More Muscle for Elements(1)
Sometimes you need to convey more information
about an element than its name and content can say.
• An attribute can be used to give the element a
unique label so it can be easily located, or it can
describe a property of the element.
• An element can have any number of attributes,
as long as each has a unique name.
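A small sketch of attributes in practice, assuming Python's standard xml.etree.ElementTree. The element name planet and the attribute names id and type are made up for illustration. The second parse repeats an attribute name, which the parser rejects.

```python
import xml.etree.ElementTree as ET

# Hypothetical element with a label (id) and a property (type).
planet = ET.fromstring('<planet id="p3" type="terrestrial">Earth</planet>')
print(planet.get("id"))    # look up one attribute
print(planet.attrib)       # all attributes as a dictionary

# Two attributes with the same name are not allowed.
try:
    ET.fromstring('<planet id="p3" id="p4">Earth</planet>')
    duplicate_accepted = True
except ET.ParseError:
    duplicate_accepted = False
print(duplicate_accepted)
```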
Attributes: More Muscle for Elements(2)
An all-element design is generally the more flexible choice.
• An attribute is reasonable when the value is a
simple scalar data value that has only a
single interpretation and is not likely to
change or expand over time.
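The trade-off can be sketched with two encodings of the same fact, assuming Python's standard xml.etree.ElementTree; the element and attribute names are hypothetical. The element form can later grow (here, one name per language), which the attribute form cannot do without a redesign.

```python
import xml.etree.ElementTree as ET

# The same information as an attribute and as a child element.
as_attribute = ET.fromstring('<planet name="Earth"/>')
as_element = ET.fromstring("<planet><name>Earth</name></planet>")
print(as_attribute.get("name"), as_element.findtext("name"))

# The element design can expand: repeated children with their own attributes.
extended = ET.fromstring(
    '<planet><name lang="en">Earth</name><name lang="fr">Terre</name></planet>'
)
names = [n.text for n in extended.findall("name")]
print(names)
```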
About content
Which is better? It’s up to you!
Data and metadata.
Text information between opening and closing
tag-pairs is data.
Tag names, attributes and their relationships
are metadata.
Attributes: More Muscle for Elements(3)
Attribute values can be constrained to certain
types if you use a DTD.
• One type is ID, which tells the parser that the value is
a unique identifier for the element. No two
elements in a document can have the same ID.
• Another type, IDREF, is a reference to an ID.
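Python's standard parser does not validate against a DTD, so the sketch below checks the two constraints by hand: every ID value is unique, and every IDREF points at an existing ID. The document and the attribute names id (treated as an ID) and orbits (treated as an IDREF) are assumptions made for this example; a validating parser would enforce the same rules from the DTD.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical document: "id" acts as an ID, "orbits" as an IDREF.
doc = ET.fromstring("""
<system>
  <star id="sun"/>
  <planet id="earth" orbits="sun"/>
  <moon id="luna" orbits="earth"/>
</system>
""")

# Count every ID value; a count above 1 violates uniqueness.
ids = Counter(e.get("id") for e in doc.iter() if e.get("id") is not None)
duplicate_ids = [value for value, count in ids.items() if count > 1]

# An IDREF must match some ID in the same document.
dangling_refs = [
    e.get("orbits")
    for e in doc.iter()
    if e.get("orbits") is not None and e.get("orbits") not in ids
]

print(sorted(ids), duplicate_ids, dangling_refs)
```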
Data type
Values in XML files are interpreted as strings.
To enforce a specific data type:
• XML Schema
• in attributes.
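A brief sketch of the point that a parser hands back strings, assuming Python's standard xml.etree.ElementTree and a made-up planet element. Without a schema, the application has to convert the values itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: both values look numeric but arrive as strings.
planet = ET.fromstring('<planet moons="2"><radius>3389.5</radius></planet>')

moons_text = planet.get("moons")
radius_text = planet.findtext("radius")
print(type(moons_text).__name__, type(radius_text).__name__)

# The application converts explicitly.
moons = int(moons_text)
radius = float(radius_text)
print(moons + 1, radius * 2)
```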
Reserved Attribute Names
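Some attribute names have been set aside for
special purposes by the XML working group.
• These attributes are reserved for XML's use and
begin with the prefix xml:
• xml:lang
• xml:space
• xml:link (from early linking drafts; not part of the current specifications)
• xml:attribute (from early linking drafts; not part of the current specifications)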
XML Documents Parsers (1)
A software program called an XML parser
(or an XML processor) is required to process
an XML document. The XML parser reads the
XML document, checks its syntax, reports
any errors and allows programmatic access
to the document's contents.
An XML document is considered well
formed if it is syntactically correct.
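A minimal sketch of a parser checking syntax and reporting an error, assuming Python's standard xml.etree.ElementTree. The helper name check is hypothetical; it returns the line and column that the parser reports for the first problem it finds.

```python
import xml.etree.ElementTree as ET

def check(text):
    """Return (True, None) if text is well formed, else (False, (line, column))."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, err.position

good = check("<message>Welcome to XML!</message>")
bad = check("<message>Welcome to XML!</mesage>")  # misspelled end tag
print(good)
print(bad)
```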
XML Documents Parsers (2)
Many XML parsers can be downloaded
at no charge.
Some applications, such as Microsoft Internet Explorer 5
(IE5), have built-in XML parsers. Others include the
Apache XML Project's parser Xerces,
Sun Microsystems' Java API for XML
Parsing (JAXP) and IBM's parser XML
for Java (XML4J).
Basic Document Structure Rules
1. Tags are case-sensitive.
2. Opening tags must have closing tags.
3. Tags must be properly nested.
4. Attribute values require quotes.
5. Root Element required.
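Each of the five rules can be checked by feeding a parser a document that breaks it. The sketch below assumes Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text, False if it reports a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One deliberately broken document per rule.
violations = {
    "1. case mismatch": "<note></Note>",
    "2. missing closing tag": "<note><to>Ann</note>",
    "3. improper nesting": "<note><b><i>hi</b></i></note>",
    "4. unquoted attribute value": "<note id=1/>",
    "5. more than one root": "<note/><note/>",
}
results = {rule: is_well_formed(text) for rule, text in violations.items()}

corrected = is_well_formed('<note id="1"><to>Ann</to><b><i>hi</i></b></note>')
print(results)
print(corrected)
```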
well-formed XML document
Some Examples
The Document Prolog
The top of an XML document is graced with
special information called the document prolog.
• The prolog can also hold additional information
that nails down such details as the document type
definition being used, declarations of special pieces
of text, the text encoding, and instructions to XML
processors.
XML declaration(1)
version: Sets the version number.
• encoding: Defines the character encoding
used in the document, such as US-ASCII.
• standalone: Tells the XML processor
whether there are any other files to load.
XML declaration(2)
The XML declaration itself is optional, but when it is
present the version is required; encoding and
standalone are optional. Including the declaration tells
processors which revision of the XML specification
the document follows.
• The parameter names must be lowercase, and
all values must be quoted with either double or
single quotes.
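The lowercase and quoting rules can be tried out directly. This sketch assumes Python's standard xml.etree.ElementTree; the helper name parses and the one-element documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def parses(data: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

double_quoted = parses(b'<?xml version="1.0" encoding="UTF-8"?><a/>')
single_quoted = parses(b"<?xml version='1.0'?><a/>")
uppercase_name = parses(b'<?xml VERSION="1.0"?><a/>')   # parameter name not lowercase
unquoted_value = parses(b'<?xml version=1.0?><a/>')     # value not quoted

print(double_quoted, single_quoted, uppercase_name, unquoted_value)
```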
Document type declaration
This is where you can specify various parameters such as
entity declarations, the DTD to use for validating the
document, and the name of the root element.
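A small sketch of a document type declaration that names the root element and declares one entity, assuming Python's standard xml.etree.ElementTree (which expands entities declared in the document itself but does not validate). The root name note and the entity name company are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The declaration names the root element (note) and declares one entity.
doc = """<!DOCTYPE note [
  <!ENTITY company "Example Corp">
]>
<note>Sent by &company;.</note>"""

note = ET.fromstring(doc)
print(note.text)  # the entity reference is replaced by its declared text

# Using the entity without declaring it is an error.
try:
    ET.fromstring("<note>Sent by &company;.</note>")
    undeclared_accepted = True
except ET.ParseError:
    undeclared_accepted = False
print(undeclared_accepted)
```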
A processing instruction is a container for data that is
targeted toward a specific XML processor.
• Processing instructions (PIs) contain two pieces
of information: a target keyword and some data.
The PI can contain any data except the
combination ?>, which would be interpreted as the
closing delimiter.
<?xml-stylesheet href="style.css" type="text/css"?>
<?xml-stylesheet href="style.xsl" type="text/xsl"?>
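The two pieces of a processing instruction can be read back with a DOM parser. This sketch assumes Python's standard xml.dom.minidom and a made-up root element doc; the instruction is written with straight quotes and the closing ?> delimiter.

```python
from xml.dom import minidom

document = minidom.parseString(
    '<?xml-stylesheet href="style.css" type="text/css"?><doc/>'
)

# The processing instruction is the first node, ahead of the root element.
pi = document.firstChild
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)
print(pi.target)  # the target keyword
print(pi.data)    # everything after the target, passed through as one string
```

The parser does not interpret the data; it is up to the targeted application to read it as attribute-like pairs.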
Comments (1)
Comments are notes in the document that are
not interpreted by the parser.
• They can be used to identify the purpose of files
and sections to help navigate a cluttered
document, or simply to communicate with each other.
Comments (2)
Two dashes in a row (--) tell the parser
when a comment begins and ends, so they can't be
placed anywhere inside the comment.
Since comments can contain markup, they can
be used to "turn off" parts of a document.
 Don't
put comments inside comments or
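A short sketch of the comment rules, assuming Python's standard xml.etree.ElementTree (which discards comments by default); the documents are made up for illustration. The first parse shows markup being "turned off", the second shows that a double dash inside a comment is rejected.

```python
import xml.etree.ElementTree as ET

# The first <p> is commented out, so only one child element remains.
doc = ET.fromstring("<doc><!-- <p>off</p> --><p>on</p></doc>")
children = [child.text for child in doc]
print(children)

# Two dashes in a row inside a comment make the document ill formed.
try:
    ET.fromstring("<doc><!-- two dashes -- inside --><p>text</p></doc>")
    double_dash_accepted = True
except ET.ParseError:
    double_dash_accepted = False
print(double_dash_accepted)
```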
Something about UTF-8
Unicode (originally a 2-byte code; code points now range up to U+10FFFF)
UCS (Universal Character Set)
UTF (Unicode/UCS Transformation Format)
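A quick sketch of why UTF-8 is called a transformation format: each Unicode code point is encoded as one to four bytes. The sample characters are chosen for illustration and written as escapes so the snippet does not depend on the file's own encoding.

```python
# Capital A, e with acute accent, euro sign, and an emoji outside the 16-bit range.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# Number of bytes each character needs in UTF-8.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
print(utf8_lengths)

# In UTF-16 the last character needs two 16-bit units (4 bytes).
utf16_length = len("\U0001F600".encode("utf-16-be"))
print(utf16_length)
```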
XML 1.0 vs. XML 1.1
Unicode compatibility.
• Backward and forward compatibility.
Namespaces: Expanding Vocabulary(1)
What happens when you want to include
elements or attributes from different document types?
Namespaces: Expanding Vocabulary(2)
A namespace must be declared in the document
before you can use it.
• Be careful not to use prefixes like xml, xsl, or
other names reserved by XML and related technologies.
• The XML processor isn't required to do anything
with the URI, however.
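A minimal sketch of two vocabularies sharing one document, assuming Python's standard xml.etree.ElementTree. The prefixes lib and pub and the example.com URIs are made up for illustration; the parser never fetches the URIs, it only uses them to tell the two title elements apart.

```python
import xml.etree.ElementTree as ET

doc = """<lib:catalog xmlns:lib="http://example.com/ns/library"
             xmlns:pub="http://example.com/ns/publisher">
  <lib:title>Dune</lib:title>
  <pub:title>Editor in Chief</pub:title>
</lib:catalog>"""

root = ET.fromstring(doc)

# ElementTree reports each name as {namespace URI}local-name.
tags = [child.tag for child in root]
print(root.tag)
print(tags)
```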
URI, URL...
URI (Uniform Resource Identifier), RFC 2396 (since superseded by RFC 3986)
• URL (Uniform Resource Locator)
• URN (Uniform Resource Name)
Scope of NS
Default namespace
We can declare one of the namespaces to be the default
by omitting the colon (:) and the name from the xmlns
attribute.
• Elements in the default namespace don't
need the namespace prefix, resulting in clearer markup.
Unprefixed attributes are not placed in the default namespace.
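A short sketch of a default namespace, assuming Python's standard xml.etree.ElementTree; the URI and the element names are made up for illustration. The unprefixed elements pick up the default namespace, while the unprefixed id attribute is reported without one.

```python
import xml.etree.ElementTree as ET

doc = (
    '<catalog xmlns="http://example.com/ns/library">'
    '<book id="b1">Dune</book>'
    "</catalog>"
)

root = ET.fromstring(doc)
book = root[0]

print(root.tag)     # unprefixed element, in the default namespace
print(book.tag)     # the declaration also covers child elements
print(book.attrib)  # the unprefixed attribute carries no namespace
```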
Some problems about NS
Namespaces can be a headache if used in
conjunction with a DTD.
• Namespaces can only assure that names are
unique and unambiguous. They have nothing to do
with document validity.
• The URI referenced in a namespace declaration
does nothing more than provide an identifier to
the processing application. In fact, most XML
parsers never try to retrieve anything from the
namespace URI; they treat it as an opaque string.
You can read and write arbitrarily complex XML
documents by using simple element and
attribute markup.
• It is simple to create a well-formed XML document
by hand.
• Know the grammar rules regarding what’s
acceptable for element and attribute names.
• Understand the importance of designing with
elements versus attributes, and when to use which.
• Namespaces are a powerful feature of XML.