XML - What It Means To You
William J. “Bill” McCalpin
EDPP, CDIA, MIT, LIT
Principal
MHE
MHE - Consultants for Document and Datament Technologies
Introduction
The Hegelian Dialectic
Thesis, Antithesis, Synthesis
In the philosophy of Hegel,
these words show the
inevitable transition of
thought, by contradiction
and reconciliation, from
an initial conviction to its
opposite and then to a
new, higher conception
that involves but
transcends both of them
The Hegelian Dialectic
• Thesis: Most businesses have well-established, productive legacy systems
• Antithesis: XML is springing forth
everywhere
• Synthesis: XML will be integrated with
legacy systems - enhancing some processes,
changing many others, and eliminating
some altogether
• In short, XML will affect what you do
How To Relate XML to
Everyman
• You might think that XML is too esoteric
for most people to understand
• But XML is based on the basic human need
to exchange information
• XML couples the communication skills we
have used over the last several thousand
years with modern Internet technology
• So how can you understand it?
Sex And The Single Pixel
Or, How To Explain XML Through
Human Relationships
Men Are From Mars
Women Are From Venus
• Author John Gray wrote
a best-selling book
describing the
difficulties of
communication
• Why would there be
such difficulties?
Communication Difficulty #1
• In order for any communication to take
place, both parties must share the same
fundamental mechanism which carries
information
• For example, in writing, if a boy and girl
don’t even share the same writing schemes,
they can’t possibly understand...
Chinese Characters vs Latin
Alphabet
“I Love You”
Underlying Structure of XML
• Text characters
• Tags are delimited by “<” and “>”, e.g.,
<xml>
• Ending tags have “/”, e.g., </xml>
• Attributes (parameters) are written as name="value" pairs, with the value in quotes,
e.g., <PAPER track="Application">
• XML is a series of tags and data, e.g.,
<STATE>Texas</STATE>
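The building blocks listed above can be seen by parsing a small fragment. The following is a minimal sketch using Python's standard xml.etree.ElementTree module. The enclosing <SUBMISSION> element and the text inside <PAPER> are assumptions made for illustration. Only the <PAPER track="Application"> and <STATE>Texas</STATE> tags come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: <SUBMISSION> is an invented wrapper element.
# <PAPER track="Application"> and <STATE>Texas</STATE> come from the slide.
SAMPLE = """
<SUBMISSION>
  <PAPER track="Application">XML - What It Means To You</PAPER>
  <STATE>Texas</STATE>
</SUBMISSION>
""".strip()

root = ET.fromstring(SAMPLE)
paper = root.find("PAPER")
state = root.find("STATE")

print(root.tag)                # name of the outermost tag
print(paper.attrib)            # attributes ("parameters") of the start tag
print(state.tag, state.text)   # a tag and the data it encloses
```

The parser reports each tag name, each attribute, and the data between a start tag and its ending tag, which is all the underlying structure there is.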
Communication Difficulty #2
• Once both parties agree to the fundamental
syntax, then both parties must next agree to
the words to be used
• In the case of XML, how do both parties
know that <STATE> means a political
subdivision and not one of
{gas,liquid,solid}?
A Date Gone Bad
• One evening in the
hotel lobby bar, two
young Italian men
spent a while talking
to an attractive
Venezuelan girl...and
her aunt
• They spoke Italian and
she spoke Spanish, but
they communicated
passably
A Date Still Going Bad
• However, the aunt
wanted to go up to her
room with her niece
• The Italians wanted to
take the young lady
out dancing...
• So they asked her:
Oops
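• What the boys said:
“Vuoi andare con noi ‘sta sera?” (Italian: “Do you want to go with us this evening?”)
• What the young lady needed to hear:
“Quisieras ir con nosotros esta tarde?” (Spanish: “Would you like to go with us this evening?”)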
Miscommunication
• Even though Italian and Spanish use
similar sounds and similar grammar, and have a
common ancestry in Latin, some words are
different
• Unfortunately, the most common words in
both languages are likely to be the most
different
The Cost Of Data Differences
“NASA lost a $125
million Mars orbiter
because one
engineering team used
metric units while
another used English
units for a key
spacecraft
operation...” CNN
9/30/99
XML “Words”
• HTML has a certain
number of fixed tags -
everyone knows what
they are, but they can’t
be augmented
• In XML, everyone can
make up their own
tags to suit their needs
- but how do we avoid
a Tower of
CyberBabel?
Communication Difficulty #3
• Even when you agree to common tags, you
still need to agree to a common
understanding
• In XML, the Schema (now replacing the
DTD) defines what tags are allowed to
describe a particular collection of data
• For example, in the field of human
relations, what is a “date”?
One DTD For A “Date”
• A woman thinks:
– Invitation - formal
– Dress-up - nicely
– Eat out – dinner with wine at nice restaurant
– Entertainment – see a movie
– Private moment – good night kiss
<!DOCTYPE date [
<!ELEMENT date (invitation, dress, meal, entertainment+, intimacy) >
<!ELEMENT invitation (#PCDATA) >
<!ELEMENT dress (#PCDATA) >
<!ELEMENT meal (#PCDATA) >
<!ELEMENT entertainment (#PCDATA) >
<!ELEMENT intimacy (#PCDATA) >
]>
A Woman’s View Of A “Date”
<date>
<invitation>Telephone
call</invitation>
<dress>Long dress</dress>
<meal>4-star
restaurant</meal>
<entertainment>the
theatre</entertainment>
<intimacy>A passionate,
romantic kiss</intimacy>
</date>
Another DTD For A “Date”
• A man thinks:
– Eat out – six-pack
– Private moment – necking
<!DOCTYPE date [
<!ELEMENT date (meal, intimacy+) >
<!ELEMENT meal (#PCDATA) >
<!ELEMENT intimacy (#PCDATA) >
]>
A Man’s View Of A “Date”
<date>
<meal>six-pack of
beer</meal>
<intimacy>necking
</intimacy>
</date>
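To see why the two definitions of a “date” do not interoperate, the content models of the two DTDs can be checked against the two sample documents. The sketch below is not a DTD validator (Python's standard library does not validate against DTDs). It is a simplified stand-in that rewrites each content model as a regular expression over the sequence of child tag names. The names WOMAN_MODEL, MAN_MODEL, and matches_model are hypothetical, and the tag names are written in lowercase to match the sample documents, since XML names are case-sensitive.

```python
import re
import xml.etree.ElementTree as ET

# Content models from the two DTDs, rewritten as regular expressions over a
# space-terminated sequence of child tag names ("+" means one or more).
WOMAN_MODEL = r"invitation dress meal (entertainment )+intimacy "
MAN_MODEL = r"meal (intimacy )+"

WOMAN_DATE = (
    "<date><invitation>Telephone call</invitation><dress>Long dress</dress>"
    "<meal>4-star restaurant</meal><entertainment>the theatre</entertainment>"
    "<intimacy>A passionate, romantic kiss</intimacy></date>"
)
MAN_DATE = "<date><meal>six-pack of beer</meal><intimacy>necking</intimacy></date>"


def matches_model(xml_text: str, model: str) -> bool:
    """Return True if the root's children appear in an order the model allows."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return re.fullmatch(model, child_names) is not None


print(matches_model(WOMAN_DATE, WOMAN_MODEL))  # her document, her rules
print(matches_model(MAN_DATE, MAN_MODEL))      # his document, his rules
print(matches_model(MAN_DATE, WOMAN_MODEL))    # his document, her rules
```

Each document satisfies its own author's rules, but the man's document lacks the invitation, dress, and entertainment elements that the woman's content model requires.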
When Men And Women Agree
<date>
<invitation>Telephone
call</invitation>
<dress>Long dress</dress>
<meal>4-star
restaurant</meal>
<entertainment>the
theatre</entertainment>
<intimacy>A passionate,
romantic kiss</intimacy>
</date>
<date>
<invitation>Honking
</invitation>
<dress>Not the shirt he
changed the oil in</dress>
<meal>food and beer</meal>
<entertainment>rent a
video</entertainment>
<intimacy>A passionate,
romantic kiss while
necking</intimacy>
</date>
Presentation
• In human relationships, it’s normal for
someone to present themselves in the best
light possible
• We try to minimize any deficiencies while
maximizing our positive attributes
• Thus, we would like to present ourselves as:
Author’s View
Original Data
XSL
• XSL - Extensible Stylesheet Language
• XSL builds on earlier style languages, including CSS - Cascading Style
Sheets
• XSL can enable the author to create one or
many views of XML
• Since XSL can be separate from the XML
object, the reader can apply the presentation
information as well as the author
Communication Difficulty #4
• When all we had was paper and film, the
author alone controlled the presentation of
the data
• One of the great advantages of electronic
formats is that the presentation of data can
now be put into the hands of the reader
• How can we describe this in the field of
human relationships?
Three Bachelors To Choose From
• Our contestant has to
choose from 3
bachelors
• But if the information
about the bachelors
were on paper, then
the information would
be presented only one
way
How To Choose?
Bachelor List
But with XML (and
other electronic
formats like HTML),
our contestant can
view the information
in different ways, to
help her make her
decision
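As a rough illustration of viewing the same information in different ways, the sketch below holds one hypothetical bachelor list in XML and produces two reader-chosen views from it. The element names (bachelors, bachelor, name, age, city), the sample values, and the view function are all assumptions made for this example. In practice a stylesheet language such as XSL would play this role.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: one XML source that is never modified by the views.
BACHELORS = """
<bachelors>
  <bachelor><name>Bachelor 1</name><age>34</age><city>Dallas</city></bachelor>
  <bachelor><name>Bachelor 2</name><age>29</age><city>Austin</city></bachelor>
  <bachelor><name>Bachelor 3</name><age>41</age><city>Houston</city></bachelor>
</bachelors>
""".strip()


def view(xml_text, key, fields):
    """Return one text line per bachelor, ordered by key, showing only fields."""
    root = ET.fromstring(xml_text)
    rows = [{child.tag: child.text for child in b} for b in root.findall("bachelor")]
    rows.sort(key=key)
    return [", ".join(row[f] for f in fields) for row in rows]


# Two views of the same data, chosen by the reader rather than the author.
by_age = view(BACHELORS, lambda row: int(row["age"]), ["name", "age"])
by_city = view(BACHELORS, lambda row: row["city"], ["city", "name"])

for line in by_age + by_city:
    print(line)
```

The data is stored once, and each view only decides the order and the fields shown.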
The Datament™
The “Datament”
• Efforts to expand the meaning of
“document” to include all manner of
electronic formats have been unsuccessful
• Hence, we have invented the concept of the
“datament”™, which is an “organized
collection of information in time” that
can be viewed by both human and machine
The Readers Of Dataments
• Because the datament is in XML,
presentation information can be ignored and
the data directly extracted from the
appropriate tags
• Dataments can also carry one or more
“views” of the data.
– One view should be the original static view
– Another view can allow the reader flexibility
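A machine reader of such a datament might look like the following sketch. The layout of the datament (a view element alongside a data element) and the phone-bill values are assumptions invented for illustration. The slides do not define a concrete datament format.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical datament: one <view> carrying presentation hints, one <data>
# section carrying the information itself.
DATAMENT = """
<datament>
  <view name="default"><font>Helvetica</font><sort-by>date</sort-by></view>
  <data>
    <call><date>1999-09-01</date><number>972-555-0100</number><charge>1.25</charge></call>
    <call><date>1999-09-02</date><number>972-555-0101</number><charge>0.40</charge></call>
  </data>
</datament>
""".strip()

root = ET.fromstring(DATAMENT)

# The machine reader never touches <view>; it goes straight to the tags it needs.
charges = [Decimal(c.text) for c in root.iterfind("./data/call/charge")]
total = sum(charges, Decimal("0"))

print(len(charges), total)
```

A human reader would instead start from the default view, which is why both can share one datament.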
Why Multiple Views?
• Think of a 60,000-page phone bill - it’s
impossible to make any sense of it without
sorting, hiding, etc., as with a spreadsheet
• On the other hand, if one reader alters the
view, then another reader might miss
important information, hence there is a
“default” view
• This default or author-centric view will also
help satisfy regulatory authorities
Communication Difficulty #5
• Without resorting to bars, how can people
easily find compatible partners?
• Now think about all the classified ads you
might have to pore through in order to find
someone who interests you
• Fortunately, personals have a standard
indexing method
A Personal Ad
• DWF - “divorced white female”
• SBM - “single black male”
• WBFP - “wood burning fireplace” - oops
• This system works because there is a
standard method of indexing personals
• If the authors of the classifieds made up
their own indexes, think of the confusion:
Apples And Oranges
• “nice DWM seeks girl
who wants a good
time”
• “cons w trvst in spc
prog seeks swng aln to
tk to their ldr”
Extending XML
• XML is not only a useful way to accurately
describe people, er, information, but it can
be used as the basis of many other standards
• For example, RDF stands for “Resource
Description Framework”, that is, a
framework for describing and interchanging
metadata (i.e., information about
information).
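As one possible illustration of metadata expressed in XML, the sketch below reads a small RDF/XML description with Python's standard library. The resource URL is a made-up example address, and the use of the Dublin Core vocabulary (dc:title, dc:creator) is an assumption chosen for the example. The slide itself names only RDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core, assumed for this example

# Hypothetical metadata record: information about a document, not the document.
RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://www.example.org/xml-talk">
    <dc:title>XML - What It Means To You</dc:title>
    <dc:creator>William J. McCalpin</dc:creator>
  </rdf:Description>
</rdf:RDF>
""".strip()

root = ET.fromstring(RDF_XML)
metadata = {}
for description in root.findall(f"{{{RDF_NS}}}Description"):
    resource = description.get(f"{{{RDF_NS}}}about")
    # Strip the "{namespace}" prefix ElementTree puts in front of each tag name.
    metadata[resource] = {child.tag.split("}")[1]: child.text for child in description}

print(metadata)
```

Because RDF is written in ordinary XML syntax here, the same parser used for any other XML document can read it.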
XML
• XML has a common underlying syntax
• Industries and groups can create XML tags
which suit their needs
• XML enables both the author and the reader
to control the presentation
• But let’s digress...
What Is A Document?
• The American Heritage
Dictionary defines a
document as
“information in writing
placed on a medium
such as paper, often
used as a record.”
• Documents have been
placed on clay tablets,
gold leaf, animal skins,
all types of paper,
microfilm, optical
storage, and so on
Information And Presentation
• In every case, the document represents a
fundamental union of information and
presentation
• But “presentation” presumes that the
primary audience for the document is a
human being
• With the coming of the Internet, this is no
longer the case
The Curse Of Presentation
• Composition
products
require that
you specify a
printer, even
before you
know where
the
document
will print
Why Are Print, Image, And
Presentation Formats
Incompatible?
Printing And Imaging Formats
• Many printing formats: AFP, Metacode,
DJDE, XES (UDK), PostScript, PCL, etc.
• All formats use external resources like
fonts, forms, graphics, etc., although
sometimes inconsistently
• Most are escape-sequence based, some are
formal data architectures, and some are
almost programming languages
Printing And Imaging Formats
• Many imaging formats - while most used
CCITT Group 4 for image compression,
most also had proprietary data wrappers
• Later systems adopted text-based formats
such as PDF, although storing other print
streams is not unknown
• Systems which store text-based formats
must wrestle with resource issues
Different Print Formats
• Why do printers have different formats?
Because of physical constraints imposed by
the hardware:
– resources reduce the amount of data sent
through the pipeline to the printer
– pages must be imaged in a fraction of
a second
– complex graphics can be developed on the
printer, but this needs a special language
Different Imaging Formats
• Why do imaging systems have different
formats? Because of physical constraints
imposed by the hardware:
– Mass storage was expensive
– Indexing schemes were too close to the
application
– Text is avoided sometimes because of resource
issues
– Interoperability with other products was an issue
Result
• In each case, data architecture decisions
were made in order to enhance some aspect
of legibility of the stored objects.
• If there were no requirement to present the
information (to a human reader), then the
requirement for custom data formats for
each vendor would probably disappear!
Universal Literacy
Who’s reading our documents?
The Road To Universal Literacy
• First, only the few
could read
• After the printing
press, the many began
to read
• Eventually,
educational reforms
brought the ability to
read to all
Literacy In The Internet Age
• Can there be a spread of literacy beyond
“all”?
• How many webpages have you ever read?
• You will never be able to keep up with the
Web – alone
• There are already an estimated 98,685,000
host computers on the Internet
(www.mids.org)
Intelligent Agents
• Just around the corner is software that will
read the Web for us – not search, but read
• So we have to spread literacy to an audience
beyond “all” – people, that is
• Does increased quality in presentation mean
better computer literacy?
Noise On The Net
• Think of the average webpage:
– three dimensional spinning objects
– marquees scrolling across the bottom
– multiple frames bookmarks
– audio
• These items are all designed to attract the
eye – your eye
• This does nothing for the machine reading
the webpage
Two Important Truths
• There are two important truths of the
Internet era:
– Documents which are read by humans need to
be dynamic in their presentation
– Documents which are read by computers don’t
need any presentation information at all
• XML totally divorces presentation from
information!
What Have We Learned About
XML?
XML Summary
• XML uses tags to describe data
– <state>Texas</state>
• Businesses and non-profits join together to
build DTDs and Schemas to describe data objects
in their spaces
– <?xml version="1.0" encoding="ISO-8859-1"?>
– <!DOCTYPE claim [
XML Summary
• An XML “document” contains information
for a particular event or transaction which
can be understood by both parties
• XML ‘documents’ can be intended for two
types of readers: human and machine
XML Summary
• XML ‘documents’ intended for a machine
do not require any presentation information
• XML ‘dataments’ carry the information
which enables both static (author-centric)
and dynamic (reader-centric) presentations,
using XSL
What Will You Tell Your Boss?
“Well, this dude named Hegel met Drew
Carey while speaking Spanish in an Italian
bar when they met a transvestite space alien
who was looking for a missing NASA
satellite who told them that women were not
either from Venus and that Mimi and Pierce
Brosnan were on a date but each was
reading different versions of the same menu
because it was a datament in XML.”
Reference
• www.w3c.org - the official World Wide
Web Consortium site (you’ll find links to
the XML spec here)
William J. “Bill” McCalpin
EDPP, CDIA, MIT, LIT
Principal, MHE
1400 Cheyenne Dr.
Richardson, Texas 75080-3921
972-231-3660 (v) 972-690-4521 (f)
[email protected]