.opennet Technologies
XML Document Object Model
and XML-Java Interfaces
Fall Semester 2001 MW 5:00 pm - 6:20 pm CENTRAL (not Indiana) Time
Geoffrey Fox and Bryan Carpenter
PTLIU Laboratory for Community Grids
Computer Science,
Informatics, Physics
Indiana University
Bloomington IN 47404
[email protected]
10/4/2015
xmldomfall01
1
The two XML World Views
• There is the Data Object view
• And the Document Object view
– both defined in XML Schema
[Figure: layered architecture diagram. Labels: Java; Web Server; XML Web page; Database; Persistent Managed Store; (Virtual) XML Layer; Object Layer; Enterprise Javabeans; Virtual Machine; JAVA; Servlet; Control; Form Input/Output Processing; System User]
Java XML Interfaces -- SAX
• The appropriate way to interface Java to XML is still being debated and there
are several different approaches
• There is the SAX (Simple API for XML) where a SAX parser reads an XML
data stream and hands nuggets of information to a user program. These
nuggets are called events and typical events are Start Tag, End Tag, Content
within a Tag etc.
• http://www-105.ibm.com/developerworks/education.nsf/dw/xml-onlinecoursebynewest?OpenDocument&Count=500 has a recent SAX tutorial
• http://www6.software.ibm.com/developerworks/education/x-usax/index.html
• SAX resource is http://www.megginson.com/SAX/index.html
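The event style described above can be sketched in a few lines. This is a minimal illustration, assuming the SAX implementation bundled with the JDK; the class name SaxEventDemo and the "start:/end:/text:" labels are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxEventDemo {
    /** Parses the XML text and returns the SAX events in the order the parser delivers them. */
    static List<String> collectEvents(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start:" + qName);          // Start Tag event
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);            // End Tag event
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Content within a tag; a parser may deliver it in more than one chunk.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(collectEvents("<course><title>XML DOM</title></course>"));
    }
}
```

The user code never sees a tree: it only reacts to each event as it arrives, which is why SAX needs little memory but keeps no structure.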
[Figure: XML Data (Schema/Instance(s)) is read by an XML SAX Parser, which passes events to User Code]
Java XML Interfaces –
Document Object Model DOM
• The DOM is an unfortunate name as it is useful whether or
not XML defines a document or any other object
– i.e. it can be used whether one is supporting XML Web page
or XML Data
• Further, the OM part of the name is confusing, as XML defines
an object and the DOM describes a different object
• DOM is really TOSXO – Tree Object Structure of an XML
Object – or perhaps TOM – Tree Object Model
• In fact if you look at almost any structured information, you
will find that it has a tree structure and of course we saw that
any Schema or DTD defined XML produces a tree
Example of HTML DOM
• Here is an example of a fragment of HTML and how it can be thought of as a tree
• This is called a “document fragment” in DOM (lightweight tree)
IMS and DOM
• As an example, consider the recent definition of an object structure for
course material from the so-called ADL (Advanced Distributed
Learning) effort from DoD; see http://www.adlnet.org
• Here we have a hierarchy with an element called block to define
nodes in a tree
• This block tree node has various other elements which are specific
to this application
– Actually in this specification the leaves of the tree are <au>
(assignable unit) tags, each of which is typically a Web page
• So ADL has superimposed a tree for document organization on
top of the tree for the document given by DOM
– Of course DOM applies to either tree and describes how to
navigate through it
Example Tree based Course Structure
XML DTD Structure for Block Element
[Figure: tree diagram of the DTD for the block element; the nesting is not recoverable from the extracted text. Occurrence markers: ? optional, * zero or more, + one or more. Element names, in the order they appear in the diagram:
? globalProperties, source, model, + externalMetadata, location, * objectiveRef, title, identification, ? description, ? labels, ? prerequisites, course, ? curricular, ? developer, ? completionReq, source, * extensions, block, model, location, name, + property, * au, + value, * block, blockAlias, ? objectives]
Tree or Structured Data
• Yahoo and Google offer structured (tree) or unstructured data access
[Figure: screenshot with the Tree Nodes labeled]
Unstructured Data
• The Gallimaufry of Web Search Engines
Java XML Interfaces – DOM
• Apache has two so called DOM parsers which read the full tree
into memory and allow you to browse it
– Xerces and Crimson
• Note these are built on top of SAX parsers and provide an
additional layer of capability.
– In all these architectures, one can choose to validate or not to
validate XML
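As a rough sketch of the "read the full tree, then browse it" style, the following uses the standard JAXP DocumentBuilderFactory entry point rather than calling Xerces or Crimson directly; which parser sits behind it depends on the installation. The class name DomParseDemo and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomParseDemo {
    /** Reads the whole XML text into an in-memory tree. */
    static Document parse(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);   // validation is a choice, not a requirement
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    /** Browses the tree: counts elements with the given tag name anywhere in the document. */
    static int countElements(String xml, String tagName) {
        return parse(xml, false).getElementsByTagName(tagName).getLength();
    }

    /** Returns the tag name of the root element. */
    static String rootName(String xml) {
        return parse(xml, false).getDocumentElement().getTagName();
    }

    public static void main(String[] args) {
        String xml = "<block><au/><block><au/></block></block>";
        System.out.println(rootName(xml) + " holds " + countElements(xml, "au") + " au elements");
    }
}
```

Unlike SAX, the whole document is held in memory, so user code can revisit any part of the tree after parsing finishes.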
[Figure: XML Data (Schema/Instance(s)) is read by an XML DOM Parser, which gives User Code a Tree Representation of the XML Instance]
Java XML Interfaces -- XPP
• A “Pull” Parser written by Aleksander Slominski who is a
graduate student of Dennis Gannon at Indiana University
• http://www.extreme.indiana.edu/soap/xpp/
• This has a similar interface to SAX but you can “backtrack”
– For instance you could decide that you did not want to read all the
events in a particular element
– <xmlnode> Other Nodes </xmlnode>
– And later go back if it turns out you need them
• In the DOM view of the Java interface, XPP supports choosing whether
or not to expand nodes of the XML tree
• XPP was the fastest parser in a recent survey (which excluded SAX as
it doesn’t preserve tree structure)
• http://www-106.ibm.com/developerworks/xml/library/xinjava/index.html
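The pull style can be illustrated with the standard StAX API (javax.xml.stream), which is used here as a stand-in and is not XPP's own interface. The sketch shows user code asking for the next event and choosing to skip everything inside one element; StAX is forward-only, so the "go back later" ability described above is not shown. The class name PullParseDemo and the element names are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PullParseDemo {
    /** Returns the start tags seen, ignoring everything inside elements named skipName. */
    static List<String> startTagsSkipping(String xml, String skipName) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The user code pulls the next event; nothing is pushed at it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                    if (reader.getLocalName().equals(skipName)) {
                        skipSubtree(reader);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("pull parse failed", e);
        }
        return names;
    }

    /** Consumes events up to the end tag that matches the current start tag. */
    private static void skipSubtree(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<root><xmlnode><a/><b/></xmlnode><c/></root>";
        System.out.println(startTagsSkipping(xml, "xmlnode"));
    }
}
```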
Performance of XML DOM like Parsers
• The survey timed each parser on a variety of documents and summed the times
– The current XPP does not support one of the documents, which uses entities and
other not-so-useful XML constructs
[Figure: chart of summed parse times per parser; smaller numbers are better]
The article has links to all systems
Java XML Interfaces – JDOM I
• DOM has perhaps two difficulties
– A lot of DOM features are aimed at Web Page not XML data
application (Tree structure common to both)
– It is not especially well designed to exploit Java
• JDOM is designed to produce a natural Java—XML interface
– It exploits Java Collections to organize nodes and other features of
an XML Instance
• For more information on JDOM, visit http://www.jdom.org.
– For information on the Java Community Process (JCP) standards
effort for JDOM, see
http://java.sun.com/aboutJava/communityprocess/jsr/jsr_102_jdom.html.
• JDOM appears immature, and its description in the performance review
is not very positive!
– Surprisingly it is no faster than Java DOM
Party Line on JDOM, DOM4J
• The standard DOM is a very simple data structure that intermixes text
nodes, element nodes, processing instruction nodes, CDATA nodes,
entity references, and several other kinds of nodes.
• That makes it difficult to work with in practice, because you are always
sifting through collections of nodes, discarding the ones you don't need
in order to process the ones you are interested in.
• JDOM, on the other hand, creates a tree of objects from an XML
structure.
– The resulting tree is much easier to use, and it can be created from an
XML structure without a compilation step.
• Although it is not on the JCP standards track, DOM4J is an open-source,
object-oriented alternative to DOM that is in many ways ahead
of JDOM in terms of implemented features.
• As such, it represents an excellent alternative for Java developers who
need to manipulate XML-based data. For more information on DOM4J,
see http://www.dom4j.org.
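The "sifting" complaint about the standard DOM can be made concrete with a small sketch. It uses only the W3C DOM interfaces shipped with the JDK; the class name DomSiftDemo and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomSiftDemo {
    private static Element root(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    /** Counts every child node of the root: whitespace text, comments and elements alike. */
    static int childNodeCount(String xml) {
        return root(xml).getChildNodes().getLength();
    }

    /** Keeps only the element children, discarding the other kinds of node. */
    static List<String> childElementNames(String xml) {
        List<String> names = new ArrayList<>();
        NodeList children = root(xml).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<labels>\n  <!-- note -->\n  <title>A</title>\n"
                + "  <description>B</description>\n</labels>";
        System.out.println(childNodeCount(xml) + " child nodes, elements: " + childElementNames(xml));
    }
}
```

With the sample input, the two elements of interest are mixed in with a comment and whitespace-only text nodes, which is the extra filtering work that JDOM and DOM4J aim to remove.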
Java XML Interfaces – Castor I
• http://castor.exolab.org/ is an open source project that supports a
different model, where you map XML Schema objects one to one
to Java classes
– Map Class <--> Schema
– Map Java Instance <--> XML Instance
• This uses Java object references to traverse tree – not explicit tree
structure
– Looks best if Schema reflects an integrated object and names of
properties mean something
– If the Schema (as in ADL) is just a “tree”, then this may be less natural
• Next Page is Castor advertisement!
• There is a partial standards effort for this type of binding called JAXB
(Java Architecture for XML Binding,
http://jcp.org/jsr/detail/031.jsp)
– See http://java.sun.com/xml/jaxp1.1/docs/tutorial/overview/3_apis.html for Sun’s attempt to
deconfuse these approaches
Java XML Interfaces – Castor II
• Castor XML: Java object model to and from XML
– Generate source code from an XML Schema
• Castor JDO: Java object persistence to RDBMS
• Castor DAX: Java object persistence to LDAP
• Castor DSML: LDAP directory exchange through XML
• XML-based mapping files specify the mapping between one model
and another
• Support for schema-less Java to XML binding
• In memory caching and write-at-commit reduces JDBC
operations
• Two phase commit transactions, object rollback and deadlock
detection
• OQL query mapping to SQL queries
• EJB container managed persistence provider for OpenEJB
Java XML Interfaces – Castor III
• Note Comparison of DOM versus Castor/JAXB
– Maybe we have a tree corresponding to a parent class docroot and
child properties called say fred.
– Let fred have children of same name
• The Castor way of accessing information would be the reference
– Docroot.fred.fred.fred.finalproperty
– Actually one uses methods (setter/getter) as properties are private
• The DOM model would navigate the tree 4 levels down to the node whose
name was finalproperty
• Castor has a document handler which will return the XML
associated with any Java object generated from XML in text
format as well as SAX DocumentHandlers and DOM trees.
• Best is to combine Castor and DOM models?
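The two access styles can be put side by side in a short sketch. The Docroot and Fred classes below are hand-written, hypothetical stand-ins for what a binding tool might generate; they are not Castor output, and no Castor API is used. The DOM half uses the standard W3C interfaces.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class BindingVersusDomDemo {
    /** Hypothetical bound class: private properties reached through getters and setters. */
    static final class Fred {
        private Fred fred;
        private String finalproperty;
        Fred getFred() { return fred; }
        void setFred(Fred value) { fred = value; }
        String getFinalproperty() { return finalproperty; }
        void setFinalproperty(String value) { finalproperty = value; }
    }

    /** Hypothetical bound class for the document root. */
    static final class Docroot {
        private Fred fred;
        Fred getFred() { return fred; }
        void setFred(Fred value) { fred = value; }
    }

    /** Binding style: follow Java object references. */
    static String viaObjects() {
        Fred level3 = new Fred();
        level3.setFinalproperty("42");
        Fred level2 = new Fred();
        level2.setFred(level3);
        Fred level1 = new Fred();
        level1.setFred(level2);
        Docroot docroot = new Docroot();
        docroot.setFred(level1);
        return docroot.getFred().getFred().getFred().getFinalproperty();
    }

    /** DOM style: walk four levels down a generic tree, matching nodes by name. */
    static String viaDom(String xml) {
        Node node;
        try {
            node = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
        for (String name : new String[] {"fred", "fred", "fred", "finalproperty"}) {
            node = firstChildElement(node, name);
        }
        return node.getTextContent();
    }

    private static Node firstChildElement(Node parent, String name) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(name)) {
                return c;
            }
        }
        throw new IllegalArgumentException("no child element named " + name);
    }

    public static void main(String[] args) {
        String xml = "<docroot><fred><fred><fred><finalproperty>42</finalproperty>"
                + "</fred></fred></fred></docroot>";
        System.out.println(viaObjects() + " " + viaDom(xml));
    }
}
```

The binding style gets compile-time names and types, while the DOM style works on any document but checks names only at run time.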
Java XML Interfaces – Castor IV
• This diagram illustrates the Castor versus DOM model
[Figure: an Instance Docroot holding nested Instance fred objects down to Property finalproperty (Castor), beside a tree of generic Nodes linked by parent and child references (DOM)]
See online book chapter (Professional XML 2nd Ed.Wrox Pubs.)
http://www.wrox.com/books/samplechapters/5059/content.pdf
Java XML Interfaces – JAXP
• Java™ APIs for XML Processing (JAXP) is a collection of
technologies that lets you work with many different kinds
of Java XML interfaces
– http://java.sun.com/xml/jaxp.html
– This link has several good online tutorials
• http://java.sun.com/xml/jaxp1.1/docs/tutorial/overview/3_apis.html
• This tutorial discusses JAXP and its relation to SAX, DOM, XSLT
and JDOM
• JAXP is an approved Java standard which is meant to allow you
to keep the same interface and change implementation
– It is not clear that this is efficient or that it will catch on
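A small sketch of the "same interface, changeable implementation" idea: user code names only the JAXP factory classes, and the concrete parser and transformer are looked up at run time. The class name JaxpDemo is an assumption made for this example, and the implementation name it reports depends on the Java installation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class JaxpDemo {
    /** Reports which concrete factory JAXP selected; user code never names it directly. */
    static String parserImplementation() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Parses to a DOM tree, then writes it back out with an identity (copy) transform. */
    static String roundTrip(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("JAXP round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parserImplementation());
        System.out.println(roundTrip("<a><b>x</b></a>"));
    }
}
```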
The Origins of the W3C DOM
• The idea of DOM came from need to be able to build interactive
web pages and to identify parts of a document uniquely so that
one can for example
– Associate a mouse event with a particular page element.
– Associate input of text into a form with a particular text area
• Dynamic HTML was introduced in Netscape 4 and IE5 and
allows one to both associate events with HTML elements and to
change the HTML structure
– e.g. move a “layer” around within browser
– Change text and color in a “document fragment”
• Netscape’s implementation of Dynamic HTML had many bugs
and was inferior to Microsoft’s although it had the essential
needed functionality
The 4 levels of DOM
• Level 0: Functionality equivalent to that evident in Netscape
Navigator 3.0 and Microsoft Internet Explorer 3.0.
– Levels 1 and 2 include what is called Dynamic HTML but make
this much more complete
• Level 1: This concentrates on the general API to an XML
document.
– It contains functionality for document (tree) navigation and
manipulation.
– It defines the special case of DOM applied to HTML with specific
API’s for the different HTML elements
• Level 2: includes a style sheet object model, and defines
functionality for manipulating the style information attached to a
document. It also enables traversals on the document (i.e. for
manipulating collections of nodes), defines an event model (very
important!) and provides support for XML namespaces.
• Level 3: Still being developed – see next page
Level 3 DOM
• Level 3, which is at Working Draft stage, includes the following items:
• Extending the DOM Level 2 Object Model: Allowing users and applications to
access keyboard events. Adding the ability of defining groups of events.
• Content Models (DTD, Schema) and Validation: an object model for accessing
and modifying a Content Model for a document.
• Load and Save interfaces: for loading XML source documents into a DOM
representation and for saving a DOM representation as an XML document.
• Embedded Document Object Model: Currently, the Web is moving towards
documents with mixed markup vocabularies, e.g. SVG fragments can be
embedded in an XHTML document. This creates new challenges for the DOM,
since it also means that DOM APIs and implementations of the different
vocabularies need to work together.
• Adaptation to changes to core XML functionality: the DOM is an API to an XML
document. As auxiliary functionality to XML 1.0 is developed (namespaces,
XML Base), the DOM API should model this.
• XPath DOM: A simple solution to query a DOM tree using XPath will also be
included.
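The Load and Save item can be sketched with the org.w3c.dom.ls interfaces that current JDKs ship. They reflect the later, finished form of this work rather than the Working Draft described above, so treat the exact calls as an assumption relative to the draft. The class name LoadSaveDemo and the "saved" attribute are made up for this example.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

final class LoadSaveDemo {
    /** Loads XML text into a DOM tree, changes it, and saves it back to text. */
    static String loadAndSave(String xml) {
        try {
            // Ask the registry for any DOM implementation that supports Load and Save.
            DOMImplementationLS ls = (DOMImplementationLS)
                    DOMImplementationRegistry.newInstance().getDOMImplementation("LS");

            // Load: XML source text becomes a DOM representation.
            LSInput input = ls.createLSInput();
            input.setStringData(xml);
            LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
            Document doc = parser.parse(input);

            doc.getDocumentElement().setAttribute("saved", "yes");

            // Save: the DOM representation becomes an XML document again.
            LSSerializer serializer = ls.createLSSerializer();
            return serializer.writeToString(doc);
        } catch (Exception e) {
            throw new IllegalStateException("load/save failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndSave("<a><b>x</b></a>"));
    }
}
```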
What the DOM is not ….. I
• Although the Document Object Model was strongly influenced by
"Dynamic HTML", in Level 1, it does not implement all of "Dynamic
HTML". In particular, events have not yet been defined. Level 1 is
designed to lay a firm foundation for this kind of functionality by
providing a robust, flexible model of the document itself.
• The Document Object Model is not a binary specification. DOM
programs written in the same language will be source code compatible
across platforms, but the DOM does not define any form of binary
interoperability.
• The Document Object Model is not a way of persisting objects to XML
or HTML. Instead of specifying how objects may be represented in
XML, the DOM specifies how XML and HTML documents are
represented as objects, so that they may be used in object oriented
programs.
• The Document Object Model is not a set of data structures, it is an
object model that specifies interfaces. Although this document contains
diagrams showing parent/child relationships, these are logical
relationships defined by the programming interfaces, not
representations of any particular internal data structures.
What the DOM is not ….. II
• The Document Object Model does not define "the true inner
semantics" of XML or HTML. The semantics of those languages
are defined by W3C Recommendations for these languages. The
DOM is a programming model designed to respect these
semantics. The DOM does not have any ramifications for the way
you write XML and HTML documents; any document that can be
written in these languages can be represented in the DOM.
• The Document Object Model, despite its name, is not a competitor
to the Component Object Model (COM). COM, like CORBA, is a
language independent way to specify interfaces and objects; the
DOM is a set of interfaces and objects designed for managing
HTML and XML documents. The DOM may be implemented
using language-independent systems like COM or CORBA; it
may also be implemented using language-specific bindings like the
Java or ECMAScript bindings specified in this document.
Language Bindings
• The DOM specifies a set of methods and properties that form the
interface through which a user accesses the static or dynamic (events)
aspects of an XML structure. It also allows one to create or modify such
structures
– The specification gives this interface for IDL (CORBA), Java
and ECMAScript
• For Web Pages, Java (in Java Server Pages) or ECMAScript are
most important
• ECMAScript is a general object based scripting language
– ECMAScript plus the DOM bindings is essentially JavaScript
– Of course Netscape 4 and IE5 do not follow (exactly) the W3C
DOM
– Mozilla (Netscape 6) http://www.mozilla.org/js/ does support
the W3C DOM Interface, fully at level 1 and partially at level 2
Netscape 6 and Level 1 DOM
• Note that Netscape 6 supports XML
• This comes from http://home.netscape.com/browsers/future/standards.html
• In Netscape 6 and Mozilla “everything” (Web page and Browser adornments) is controlled by the DOM interface
DOM Level 1 Core
• In the DOM, one builds a tree out of a set of Node objects
• Each Node object has a set of generic capabilities (properties
and methods) and also implements specific interfaces. In the
CORE one defines a set of Node types to reflect the structure
of XML. Each Node type has its own interface to reflect its
special features.
[Figure: a tree built out of Node objects]
Node Types in Level 1 Core I
• For each Node Type, we give the allowed children
• Document -- Element (maximum of one), ProcessingInstruction, Comment, DocumentType
• DocumentFragment -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• DocumentType -- no children
• EntityReference -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• Element -- Element, Text, Comment, ProcessingInstruction, CDATASection, EntityReference
Node Types in Level 1 Core II
• Attr -- Text, EntityReference
• ProcessingInstruction -- no children
• Comment -- no children
• Text -- no children
• CDATASection -- no children
• Entity -- Element, ProcessingInstruction, Comment, Text, CDATASection, EntityReference
• Notation -- no children
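The allowed-children rules are enforced when a tree is modified. A minimal sketch, assuming the DOM implementation bundled with the JDK (which reports violations with the standard HIERARCHY_REQUEST_ERR code); the class name AllowedChildrenDemo and the element names are made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class AllowedChildrenDemo {
    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
    }

    /** Tries parent.appendChild(child); true if the DOM refused it as a hierarchy error. */
    private static boolean rejected(Node parent, Node child) {
        try {
            parent.appendChild(child);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    /** A Document may hold at most one Element child. */
    static boolean secondRootRejected() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("course"));
        return rejected(doc, doc.createElement("another"));
    }

    /** A Text node may hold no children at all. */
    static boolean childOfTextRejected() {
        Document doc = newDocument();
        return rejected(doc.createTextNode("leaf"), doc.createElement("block"));
    }

    /** An Element may hold other Elements, so this append is accepted. */
    static boolean elementInElementRejected() {
        Document doc = newDocument();
        return rejected(doc.createElement("block"), doc.createElement("au"));
    }

    public static void main(String[] args) {
        System.out.println(secondRootRejected() + " " + childOfTextRejected()
                + " " + elementInElementRejected());
    }
}
```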
The Node Interface in CORBA IDL
[Figure: IDL listing of the Node interface, with its Constants, Properties and Methods marked]
nodeName nodeValue attributes
• Each Node type has particular rules for values of some of the
properties – most importantly nodeName and nodeValue
• attributes is a property that is only meaningful for a Node of type Element
[Table: values of nodeName, nodeValue and attributes for each Node Type]
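How nodeName and nodeValue vary by node type can be seen by printing them for a small tree. This is a minimal sketch using the standard Java binding; the class name NodePropertiesDemo, the "type:name=value" output format and the sample XML are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NodePropertiesDemo {
    /** Formats one node as nodeType:nodeName=nodeValue. */
    static String describe(Node node) {
        return node.getNodeType() + ":" + node.getNodeName() + "=" + node.getNodeValue();
    }

    /** Describes the root element, then its attributes, then its children. */
    static List<String> describeAll(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
        List<String> out = new ArrayList<>();
        out.add(describe(root));
        NamedNodeMap attributes = root.getAttributes();   // non-null because root is an Element
        for (int i = 0; i < attributes.getLength(); i++) {
            out.add(describe(attributes.item(i)));
        }
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            out.add(describe(children.item(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describeAll("<au id='a1'>text<!--note--></au>"));
    }
}
```

In the Java binding the numeric type codes are the constants on org.w3c.dom.Node, for example Node.ELEMENT_NODE and Node.TEXT_NODE.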
Document Fragment
• This is a lightweight “document” used to denote a part of a Tree.
As it does not carry all the overhead of an XML object instance, it
is a convenient way of denoting a sub tree including all leaf nodes
below a certain internal node.
• This is an important building block for documents
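A short sketch of a fragment used as a building block: nodes are collected in a DocumentFragment and then added to the tree in one step. The class name FragmentDemo and the block/au element names are assumptions made for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

final class FragmentDemo {
    /** Returns "childrenOfBlock,childrenLeftInFragment" after the fragment is appended. */
    static String appendFragment() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
        Element block = doc.createElement("block");
        doc.appendChild(block);

        // Assemble a lightweight sub tree away from the main document.
        DocumentFragment fragment = doc.createDocumentFragment();
        fragment.appendChild(doc.createElement("au"));
        fragment.appendChild(doc.createElement("au"));

        // Appending the fragment moves its children into the tree, not the fragment itself.
        block.appendChild(fragment);

        return block.getChildNodes().getLength() + "," + fragment.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println(appendFragment());
    }
}
```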
[Figure: a tree of Node objects with one sub tree marked as a Document Fragment]
This page is full of Document Fragments
[Figure: two example fragments taken from the page]
Properties of a Node I
• nodeName
– The name of this node, depending on its type; see the table
above.
• nodeValue
– The value of this node, depending on its type; see the table
above.
– Exceptions on setting: DOMException
• NO_MODIFICATION_ALLOWED_ERR: Raised when
the node is readonly.
– Exceptions on retrieval: DOMException
• DOMSTRING_SIZE_ERR: Raised when it would return
more characters than fit in a DOMString variable on the
implementation platform.
• nodeType
– A code representing the type of the underlying object, as
defined above.
Properties of a Node II
• parentNode
– The parent of this node. All nodes, except Document, DocumentFragment, and
Attr may have a parent. However, if a node has just been created and not yet
added to the tree, or if it has been removed from the tree, this is null.
• childNodes
– A NodeList that contains all children of this node. If there are no children, this is
a NodeList containing no nodes. The content of the returned NodeList is "live"
in the sense that, for instance, changes to the children of the node object that it
was created from are immediately reflected in the nodes returned by the
NodeList accessors; it is not a static snapshot of the content of the node. This is
true for every NodeList, including the ones returned by the
getElementsByTagName method.
• firstChild
– The first child of this node. If there is no such node, this returns
null.
• lastChild
– The last child of this node. If there is no such node, this returns
null.
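The "live" behaviour of NodeList is easiest to see by taking a list before the tree changes and reading its length afterwards. A minimal sketch, assuming the DOM implementation bundled with the JDK; the class name LiveNodeListDemo and the element names are made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class LiveNodeListDemo {
    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
    }

    /** Returns "lengthBefore,lengthAfter,lengthOfTagNameList" for lists obtained before the change. */
    static String liveCounts() {
        Document doc = newDocument();
        Element root = doc.createElement("block");
        doc.appendChild(root);

        // Both lists are obtained while the root is still empty.
        NodeList children = root.getChildNodes();
        NodeList aus = doc.getElementsByTagName("au");
        int before = children.getLength();

        root.appendChild(doc.createElement("au"));

        // The same list objects now report the new child: they are not snapshots.
        return before + "," + children.getLength() + "," + aus.getLength();
    }

    /** A node that was created but not yet added to the tree has no parent. */
    static boolean detachedNodeHasNoParent() {
        return newDocument().createElement("au").getParentNode() == null;
    }

    public static void main(String[] args) {
        System.out.println(liveCounts() + " " + detachedNodeHasNoParent());
    }
}
```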
Properties of a Node III
• previousSibling
– The node immediately preceding this node. If there is no such
node, this returns null.
• nextSibling
– The node immediately following this node. If there is no such
node, this returns null.
• attributes
– A NamedNodeMap containing the attributes of this node (if it is an
Element) or null otherwise.
• ownerDocument
– The Document object associated with this node. This is also the
Document object used to create new nodes. When this node is a
Document this is null.
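The sibling and owner properties can be exercised with a short walk along one level of the tree. A minimal sketch in the standard Java binding; the class name SiblingDemo and the element names a, b, c are made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class SiblingDemo {
    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
    }

    /** Walks the children forwards with nextSibling and backwards with previousSibling. */
    static String walkBothWays() {
        Document doc = newDocument();
        Element root = doc.createElement("block");
        doc.appendChild(root);
        for (String name : new String[] {"a", "b", "c"}) {
            root.appendChild(doc.createElement(name));
        }
        StringBuilder forward = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            forward.append(n.getNodeName());
        }
        StringBuilder backward = new StringBuilder();
        for (Node n = root.getLastChild(); n != null; n = n.getPreviousSibling()) {
            backward.append(n.getNodeName());
        }
        return forward + "|" + backward;
    }

    /** ownerDocument is null for the Document itself and set for nodes it created. */
    static boolean ownerRules() {
        Document doc = newDocument();
        Element element = doc.createElement("au");
        return doc.getOwnerDocument() == null && element.getOwnerDocument() == doc;
    }

    /** attributes is null for anything that is not an Element, such as a Text node. */
    static boolean textHasNoAttributes() {
        return newDocument().createTextNode("x").getAttributes() == null;
    }

    public static void main(String[] args) {
        System.out.println(walkBothWays() + " " + ownerRules() + " " + textHasNoAttributes());
    }
}
```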
Methods of a Node I
• insertBefore (newChild, refChild)
– Inserts the node newChild before the existing child
node refChild. If refChild is null, insert newChild at
the end of the list of children.
– If newChild is a DocumentFragment object, all of its
children are inserted, in the same order, before
refChild. If the newChild is already in the tree, it is
first removed.
• replaceChild (newChild, oldChild)
– Replaces the child node oldChild with newChild in
the list of children, and returns the oldChild node. If
the newChild is already in the tree, it is first removed.
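The rules for insertBefore and replaceChild can be traced step by step on a tiny tree. A minimal sketch in the standard Java binding; the class name InsertReplaceDemo and the single-letter element names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class InsertReplaceDemo {
    /** Joins the names of the children of parent, in order. */
    private static String names(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    /** Returns the child order after each step, plus the name of the replaced node. */
    static List<String> steps() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
        Element root = doc.createElement("block");
        doc.appendChild(root);
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        Element c = doc.createElement("c");
        root.appendChild(a);
        root.appendChild(c);

        List<String> log = new ArrayList<>();
        root.insertBefore(b, c);                       // b goes in front of c
        log.add(names(root));
        root.insertBefore(doc.createElement("d"), null); // null refChild means "at the end"
        log.add(names(root));
        Node old = root.replaceChild(doc.createElement("x"), b); // returns the node taken out
        log.add(names(root));
        root.insertBefore(a, null);                    // a is already in the tree, so it moves
        log.add(names(root));
        log.add("replaced=" + old.getNodeName());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(steps());
    }
}
```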
Methods of a Node II
• removeChild (oldChild)
– Removes the child node indicated by oldChild from
the list of children, and returns it.
• appendChild (newChild)
– Adds the node newChild to the end of the list of
children of this node. If the newChild is already in the
tree, it is first removed.
• hasChildNodes
– This is a convenience method to allow easy
determination of whether a node has any children.
– It returns true if there are any Child Nodes
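A short sketch of removeChild, appendChild and hasChildNodes together, showing that appending a node already in the tree moves it rather than copying it. The class name RemoveAppendDemo and the element names are made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class RemoveAppendDemo {
    private static String names(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    /** Returns "orderAfterAppend|orderAfterRemove|hasChildrenBefore|hasChildrenAfter". */
    static String run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
        Element root = doc.createElement("block");
        doc.appendChild(root);
        Element a = doc.createElement("a");
        Element b = doc.createElement("b");
        root.appendChild(a);
        root.appendChild(b);

        root.appendChild(a);                 // a is first removed, then added at the end
        String afterAppend = names(root);

        Node removed = root.removeChild(b);  // returns the node that was taken out
        String afterRemove = names(root);
        if (removed.getParentNode() != null) {
            throw new IllegalStateException("removed node should be detached");
        }

        boolean hadChildren = root.hasChildNodes();
        root.removeChild(a);
        return afterAppend + "|" + afterRemove + "|" + hadChildren + "|" + root.hasChildNodes();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```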
Methods of a Node III
• cloneNode (deep)
– Returns a duplicate of this node, i.e., serves as a generic copy
constructor for nodes. The duplicate node has no parent
(parentNode returns null.).
– Cloning an Element copies all attributes and their values,
including those generated by the XML processor to represent
defaulted attributes, but this method does not copy any text it
contains unless it is a deep clone, since the text is contained in a
child Text node. Cloning any other type of node simply returns
a copy of this node.
• Parameter deep: If true, recursively clone the subtree under the
specified node; if false, clone only the node itself (and its
attributes, if it is an Element).
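The difference between a shallow and a deep clone shows up as soon as an element has both an attribute and text. A minimal sketch in the standard Java binding; the class name CloneDemo and the title/lang names are made up for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class CloneDemo {
    /** Returns "shallowAttr,shallowHasChildren,deepText,deepIsDetached". */
    static String run() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
        Element title = doc.createElement("title");
        title.setAttribute("lang", "en");
        title.appendChild(doc.createTextNode("XML DOM"));
        doc.appendChild(title);

        // Shallow clone: attributes come along, the child Text node does not.
        Element shallow = (Element) title.cloneNode(false);

        // Deep clone: the whole sub tree is copied, text included.
        Element deep = (Element) title.cloneNode(true);

        return shallow.getAttribute("lang") + "," + shallow.hasChildNodes() + ","
                + deep.getTextContent() + "," + (deep.getParentNode() == null);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```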
Two Specific Interfaces
• DocumentFragment:
• And Document
HTML Level 1 DOM
• This has several extensions that inherit the XML interfaces
of the Core and specialize them for each HTML tag
• An HTMLDocument interface, derived from the core Document
interface. HTMLDocument specifies the operations and queries
that can be made on a HTML document.
• An HTMLElement interface, derived from the core Element
interface. HTMLElement specifies the operations and queries that
can be made on any HTML element. Methods on HTMLElement
include those that allow for the retrieval and modification of
attributes that apply to all HTML elements.
• Specializations for all HTML elements that have attributes that
extend beyond those specified in the HTMLElement interface. For
all such attributes, the derived interface for the element contains
explicit methods for setting and getting the values.
HTMLDocument Interface
• This uses another special interface data structure
HTMLCollection to hold lists of sub-components
HTMLElement and Specializations
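• Any HTML Element adds to Node
[Figure: IDL listing of the HTMLElement interface]
• The <body> tag adds
[Figure: IDL listing of the interface for the body element]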
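Two HTML DOM API’s
• And the <a> </a> Link tag adds
[Figure: IDL listing of the interface for the link element]
• while the select element in a form has a bunch of new properties and methods
[Figure: IDL listing of the interface for the select element]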
Highlights of Event Model in Level 2 DOM
• Every Node can have Event Listeners added for types of Event
• For example taking mouse events, types are click, mousedown,
mouseup, mouseover, mousemove, mouseout
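Listening for an event and creating one programmatically can be sketched with the org.w3c.dom.events interfaces from the Java binding. Whether a given DOM implementation supports them is implementation dependent, so the sketch checks first; it also creates a generic event of type "click" rather than a full MouseEvent, which is an assumption made to stay within what a non-browser DOM is likely to offer. The class name DomEventDemo and the body/button element names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

final class DomEventDemo {
    /** Registers listeners on two nodes, fires a "click" on the inner one, returns what ran. */
    static List<String> dispatchClick() {
        List<String> log = new ArrayList<>();
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException("cannot create document", e);
        }
        Element body = doc.createElement("body");
        Element button = doc.createElement("button");
        doc.appendChild(body);
        body.appendChild(button);

        // The event interfaces are optional, so check before casting.
        if (!(button instanceof EventTarget) || !(doc instanceof DocumentEvent)) {
            log.add("events not supported");
            return log;
        }

        // Receive events: any node can have listeners added for a type of event.
        ((EventTarget) button).addEventListener("click",
                evt -> log.add("button saw " + evt.getType()), false);
        ((EventTarget) body).addEventListener("click",
                evt -> log.add("body saw " + evt.getType()), false);

        // Create an event programmatically and dispatch it at the inner node.
        Event click = ((DocumentEvent) doc).createEvent("Events");
        click.initEvent("click", true, false);   // bubbles, not cancelable
        ((EventTarget) button).dispatchEvent(click);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(dispatchClick());
    }
}
```

Because the event is set to bubble, a listener on an ancestor can also be notified after the listener on the target node.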
Sample Event in DOM Level 2
• Here is a MouseEvent
• Note you can in DOM both receive events and create them programmatically. This capability was not implemented properly in Netscape 4: sometimes you could and sometimes you couldn’t