XML: what it is
• The Extensible Markup Language (XML) is a
general-purpose specification for creating custom
markup languages.
• A markup language is an artificial language that uses a
set of annotations to text to give instructions
regarding how the text is to be displayed.
• A well-known example of a markup language in use in
computing is HyperText Markup Language (HTML).
• XML is classified as an extensible language because it
allows its users to define their own elements.
XML - Introduction
XML: what it is
XML is a metalanguage that makes it possible to define
markup languages syntactically.
• It defines a set of (meta)syntactic rules through which
a markup language, called an XML application,
can be formally described.
• Every XML application inherits a set of common syntactic
characteristics from XML.
• Every XML application in turn defines a formal syntax.
XML makes the structure(s) of a document explicit in a
formal way by means of markers (markup) that are embedded in the
text (character data).
The markup represents the logical structure of the document.
The markup is distinguished from the rest of the text because it is enclosed between delimiters.
XML in 10 Points
1. XML is for structuring data
XML documents reflect the structure of the data that they contain. For
example, if the document were a book, it might contain <chapter>
elements, which would in turn contain <section> elements, and so on.
XML is a set of rules (you may also think of them as guidelines or
conventions) for designing text formats that let you structure your data.
XML makes it easy for a computer to generate data, read data, and
ensure that the data structure is unambiguous.
XML avoids common pitfalls in language design: it is extensible,
platform-independent, and it supports internationalization and
localization. XML is fully Unicode-compliant.
2. XML looks a bit like HTML
 Like HTML, XML makes use of tags (words bracketed by '<'
and '>') and attributes (of the form name="value").
 While HTML specifies what each tag and attribute means, and
often how the text between them will look in a browser, XML
uses the tags only to delimit pieces of data, and leaves the
interpretation of the data completely to the application that
reads it.
 In other words, if you see "<p>" in an XML file, do not assume it is
a paragraph. Depending on the context, it may be a price, a
parameter, a person, a p... (and who says it has to be a word with
a "p"?).
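The point about interpretation can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. The catalog document, its element names, and the choice to treat <p> as a price are all assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: here <p> holds a price, not a paragraph.
doc = '<catalog><item name="pen"><p currency="EUR">1.50</p></item></catalog>'

root = ET.fromstring(doc)
for item in root.findall("item"):
    p = item.find("p")
    # The parser only reports tags, attributes and text; reading <p>
    # as a price is a decision taken by this application.
    price = float(p.text)
    print(item.get("name"), price, p.get("currency"))
```

The parser gives back only names, attribute values, and character data; nothing in XML itself says what <p> means.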
3. XML is text, but isn't meant to be read
• Although XML is verbose, and it is all plain text, XML is still
designed primarily to be used by automated systems, not
necessarily read by humans.
• Like HTML, XML files are text files that people shouldn't have
to read, but may when the need arises.
• Compared to HTML, the rules for XML files allow fewer
variations. A forgotten tag, or an attribute without quotes, makes
an XML file unusable, while in HTML such practice is often
explicitly allowed.
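The strictness described above can be checked with a short Python sketch. The helper name is_well_formed and the sample documents are hypothetical; the parser is the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<note lang="en"><to>Ann</to></note>'
missing_end_tag = '<note lang="en"><to>Ann</note>'       # </to> forgotten
unquoted_attribute = '<note lang=en><to>Ann</to></note>'  # lang=en without quotes

for sample in (good, missing_end_tag, unquoted_attribute):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted. An HTML parser would typically recover from both of the other two.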
4. XML is verbose by design
• Since XML is a text format and it uses tags to delimit the data,
XML files are nearly always larger than comparable binary formats.
• That was a conscious decision by the designers of XML. The
advantages of a text format are evident, and the disadvantages
can usually be compensated at a different level.
• Disk space is less expensive than it used to be, and compression
programs like zip and gzip can compress files very well and very fast.
• In addition, communication protocols such as modem protocols
and HTTP/1.1, the core protocol of the Web, can compress data
on the fly, saving bandwidth as effectively as a binary format.
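As a rough illustration of the compression argument, the following Python sketch builds a deliberately repetitive XML payload and compresses it with the standard gzip module. The record structure is invented for the example, and the ratio obtained on real data will differ.

```python
import gzip

# Hypothetical, deliberately repetitive XML payload.
records = "".join(
    f"<record><id>{i}</id><status>active</status></record>" for i in range(1000)
)
xml_bytes = f"<records>{records}</records>".encode("utf-8")

compressed = gzip.compress(xml_bytes)
print(len(xml_bytes), "bytes before,", len(compressed), "bytes after")

# Decompression restores the original bytes exactly.
restored = gzip.decompress(compressed)
```

Repeated tag names are exactly the kind of redundancy that general-purpose compressors remove well.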
5. XML is a family of technologies
The core of XML is the XML 1.0 Recommendation. Beyond XML 1.0, "the
XML family" is a growing set of modules that offer useful services to
accomplish important and frequently demanded tasks:
• XLink describes a standard way to add hyperlinks to an XML file.
• XPointer is a syntax for pointing to parts of an XML document. An
XPointer is a bit like a URL, but instead of pointing to documents on the Web, it
points to pieces of data inside an XML file.
• CSS, the style sheet language, is applicable to XML as it is to HTML.
• XSL is the advanced language for expressing style sheets. It is based on XSLT, a
transformation language used for rearranging, adding and deleting tags and
attributes.
• The DOM is a standard set of function calls for manipulating XML (and HTML)
files from a programming language.
• XML Schemas 1 and 2 help developers to precisely define the structures of their
own XML-based formats.
6. XML is new, but not that new
 Development of XML started in 1996 and it has been a W3C
Recommendation since February 1998, which may make you
suspect that this is rather immature technology.
 In fact, the technology isn't very new. Before XML there was
SGML, developed in the early '80s, an ISO standard since
1986, and widely used for large documentation projects.
 The designers of XML simply took the best parts of SGML,
guided by the experience with HTML, and produced something
that is no less powerful than SGML, and vastly more regular
and simple to use.
7. XML leads HTML to XHTML
 There is an important XML application that is a document
format: W3C's XHTML, the successor to HTML. XHTML has
many of the same elements as HTML.
 The syntax has been changed slightly to conform to the rules
of XML. A format that is "XML-based" inherits the syntax from
XML and restricts it in certain ways (e.g., XHTML allows "<p>",
but not "<r>"); it also adds meaning to that syntax (XHTML
says that "<p>" stands for "paragraph", and not for "price",
"person", or anything else).
8. XML is modular
 Using XML, you can define vocabularies that are designed to
be reused.
 By creating DTDs or XML Schemas, you can create sets of
documents that are all based on common vocabularies.
 Similarly, using XML Namespaces, you can publish and share
those vocabularies without conflicts.
 Since two formats developed independently may have elements
or attributes with the same name, care must be taken when
combining those formats (does "<p>" mean "paragraph" from this
format or "person" from that one?).
9. XML is the basis for RDF and the Semantic Web
 RDF, or the Resource Description Framework, and the
Semantic Web are both initiatives of the W3C to help refine the
way information is organized on the Web.
 XML is the basis of these technologies, and will help organize
the information on the Web, making it easier for users to find
and access the information they need.
10. XML is license-free, platform-independent and well-supported
 XML is not owned by any corporation, nor is it controlled by a
 It is a publication of the W3C, and as such, it can be used
freely by anyone.
 And although some may have issues with the W3C process, or
what ends up in the final Recommendations, the bottom line is
that it makes XML a fairly open standard. (An open standard is a
standard that is publicly available and has various rights to use
associated with it.)
References in Italian
• XML in 10 punti ("XML in 10 Points")
• This ten-point summary tries to gather
some basic concepts that let the newcomer
see a little light through the fog. By
Andrea Benassi, 26 November 2003.
• http://www.indire.it/content/index.php?action=read
• XML is recommended by the World Wide Web
Consortium (W3C).
• The recommendation specifies both the lexical
grammar and the requirements for parsing.
• A lexical grammar is the set of rules governing how a character
sequence is divided up into subsequences of
characters, each of which represents an individual token.
• Parsing, or, more formally, syntactic analysis, is
the process of analyzing a sequence of tokens to
determine their grammatical structure with respect to
a given (more or less) formal grammar.
• XML started as a simplified subset of the Standard
Generalized Markup Language (SGML).
• The versatility of SGML for dynamic information
display was understood by early digital media
publishers in the late 1980s, prior to the rise of the Internet.
• By the mid-1990s some practitioners of SGML had
gained experience with the World Wide Web, and
believed that SGML offered solutions to some of the
problems the Web was likely to face as it grew.
• Dan Connolly added SGML to the list of W3C's
activities when he joined the staff in 1995; work
began in mid-1996 when Sun Microsystems engineer
Jon Bosak developed a charter and recruited collaborators.
XML was compiled by a working group of eleven
members, supported by an (approximately) 150-member
Interest Group. Technical debate took place on the
Interest Group mailing list and issues were resolved by
consensus or, when that failed, majority vote of the
Working Group.
The XML Working Group never met face-to-face; the
design was accomplished using a combination of email
and weekly teleconferences. The major design decisions
were reached in twenty weeks of intense work between
July and November 1996, when the first Working Draft of
an XML specification was published.
 Further design work continued through 1997,
and XML 1.0 became a W3C Recommendation
on February 10, 1998.
Working Group's goals
• Internet usability, general-purpose usability
• SGML compatibility
• Facilitation of easy development of processing
software and minimization of optional features
• Legibility, formality, conciseness, and ease of authoring
• Like its antecedent SGML, XML allows for some
redundant syntactic constructs and includes
repetition of element identifiers.
• In these respects, terseness was not considered
essential in its structure.
The name "XML": other names that were considered
 "MAGMA" (Minimal Architecture for Generalized
Markup Applications)
 "SLIM" (Structured Language for Internet Markup)
 "MGML" (Minimal Generalized Markup Language).
Why not SGML?
• SGML has many strengths, but it is notably complex
to use and to understand.
• It was not designed for the network.
• XML contains all the features of SGML that are needed
to build general applications...
...without descending to the level of detail
and pedantry that SGML requires.
• In addition, the success of HTML has shown that:
• The developer community is ready to embrace the
markup-based model.
• Simplicity is a fundamental strength.
The differences between SGML and XML are highlighted in a note published by
the W3C, which can be found at: http://www.w3.org/TR/NOTE-sgmlxml-971215 .
XML versions
The first, XML 1.0, was initially defined in 1998.
It has undergone minor revisions since then, without being given a
new version number, and is currently in its fourth edition, as
published on August 16, 2006. It is widely implemented and still
recommended for general use.
The second, XML 1.1, was initially published on February 4,
2004, the same day as XML 1.0 Third Edition, and is currently in
its second edition, as published on August 16, 2006.
XML 1.1 is not very widely implemented and is recommended for
use only by those who need its unique features.
XML 1.0 and XML 1.1 differ in the requirements of characters
used for element and attribute names: XML 1.0 only allows
characters which are defined in Unicode 2.0, which includes most
world scripts, but excludes those which were added in later
Unicode versions.
The HTML case
XML is not a replacement for HTML
• HTML was born as an SGML DTD for publishing
simple text documents with a few
images and hypertext links.
• Over time, many proprietary extensions were
implemented, creating barriers to the interoperability
of tools.
• Browsers (parsers) relax the syntax rules and
interpret even "incorrect" HTML documents.
• HTML is for presenting information; XML is for
describing information.
Many Technologies Contribute to
the Power of XML
If you wanted to use XML as a file format for storing information,
and then publishing that information in print, on CD-ROM, and on
the World Wide Web, you would need to make use of some
other technologies that are not specifically XML, but might be
based on XML, or be supplementary to XML.
You might have an XML document that you want to display on the Web;
however, XML documents do not contain any information about display
formatting. To transform the XML data into HTML or XHTML for displaying it
on the Web, you might need to use a style sheet, such as the
Extensible Stylesheet Language (XSL)
Document Type Definition
• You might also need to specify exactly how XML
files are to be structured, using a set of rules called a
Document Type Definition (DTD).
• DTDs are an integral part of creating valid XML, but
they are not defined in a specification of their own.
• DTDs are a holdover from SGML, maintained for
compatibility reasons.
• The syntax used for the declarations in DTDs is
defined as a part of the XML 1.0 Recommendation.
• DTDs are useful: without them or another type of
schema, it is impossible to verify that an XML file is
structured properly within the rules the author had
in mind.
• But DTDs are not required in order to use XML.
Note: XML can come in two
varieties: well formed and valid.
Well-formed XML means that the
XML is written in the proper format,
and that it complies with all the rules
for XML as set forth in the XML 1.0
Recommendation.
Valid XML means that the XML
document has been validated against
a rule set, or schema.
The XML 1.0 Recommendation defines
the basic structures of XML
CDATA sections
PCDATA sections
This includes defining the conventions for names,
case sensitivity, start tags, end tags, and so on.
• Everything you need to work with well-formed
XML is contained within this one Recommendation.
XML-Related Recommendations
There are also a number of W3C Recommendations that are
very closely related to the core XML technology.
In this category, the Recommendations define some
technologies that are designed specifically to add functionality
to XML 1.0.
These technologies include XML Namespaces
and XML Schemas
• XML allows developers to create their own markup
languages, for use in a variety of applications.
• However, there is nothing to stop two developers
from developing markup languages that have
similar tags, but with different structure or meaning.
• If both of these developers were using their markup
languages internally only, this might not be a problem.
• But what if these developers start sharing their
vocabularies with their clients, vendors, and the
general public? The result could be confusion about
what tag means what, and in what context.
Namespace example (I)
Developer One designs a <name> element that looks like this:
Developer Two, however, prefers to use a <name> element with no
child elements:
<name>John Doe</name>
For example, what happens if a vendor is working with both
Namespace example (II)
Create elements as being a part of a specific namespace.
This means that when they are used, the parser is aware that they
belong to a namespace, and if a similar element is used, but it
belongs to a different namespace, there is no conflict.
Namespaces make use of a special attribute called xmlns that
allows you to define a prefix and the namespace URI
(the URI shown here is only a placeholder).
<?xml version="1.0"?>
<vendor:name xmlns:vendor="http://example.com/vendor">John Dough</vendor:name>
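To see how a namespace-aware parser keeps two <name> elements apart, here is a small Python sketch based on xml.etree.ElementTree. Both namespace URIs, the client prefix, and the wrapping <order> element are assumptions introduced for the example.

```python
import xml.etree.ElementTree as ET

# Both URIs are placeholders invented for this example.
doc = """<?xml version="1.0"?>
<order xmlns:vendor="http://example.com/vendor"
       xmlns:client="http://example.com/client">
  <vendor:name>John Dough</vendor:name>
  <client:name><first>John</first><last>Doe</last></client:name>
</order>"""

root = ET.fromstring(doc)
ns = {"v": "http://example.com/vendor", "c": "http://example.com/client"}

vendor_name = root.find("v:name", ns)
client_name = root.find("c:name", ns)

# ElementTree stores the expanded name {URI}local, so the two
# <name> elements are distinct even though the local name matches.
print(vendor_name.tag, "->", vendor_name.text)
print(client_name.tag, "->", client_name.find("first").text)
```

The prefix used when querying ("v", "c") need not match the prefix in the document. Only the URI identifies the namespace.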
XML Schemas
In order to be considered valid, the XML document needs
to either have a DTD or an XML Schema.
 XML Schemas represent a formal schema
language for defining the structure of XML documents.
The XML Schema specification deals with some of the
shortcomings of DTDs, such as the lack of robust data
structures, and also abandons the cryptic syntax of DTDs
for an easier-to-use XML-based syntax
XML Family
• There are also a number of W3C Recommendations
that deal with various aspects of XML that are not
necessarily related to the structure of an XML document,
but provide mechanisms for implementing XML in
practical solutions.
• These recommendations are related to the display
or navigation of XML documents.
• XML is in fact a family of languages.
• Some aim to become standards, while others are only
proposals from individuals or interested companies. Some
are general-purpose, while others are specific applications
for narrow domains.
Extensible Stylesheet Language (XSL)
• Stylesheet language designed to aid in the
presentation of XML.
• As a stylesheet language, it is similar to
Cascading Style Sheets (CSS), although there
are some significant differences.
• XSL uses an XML syntax to specify how
elements within an XML document should be displayed.
Extensible Stylesheet Language
(XSL) example
Consider an XML document that contains the following elements:
<title>Introducing XML</title>
<byline>John Doe</byline>
<body>Learning about XML is not complicated...</body>
If we wanted to display the title of the document in italic, we could
use an XSL sheet that looks something like this:
<xsl:template match="title">
  <fo:block font-style="italic"><xsl:apply-templates/></fo:block>
</xsl:template>
When the stylesheet and XML document are processed by an
XSL processor, the result will be a document displayed with
the title in italic.
Extensible Stylesheet Language Transformations (XSLT)
XSLT is a technology that allows developers to author a
stylesheet which when processed, will result in the elements
and attributes of an XML document being transformed into
another format.
For example, by using XSLT it is possible to transform an
XML element:
<byline>John Doe</byline>
into an HTML tag set:
<b>John Doe</b>
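The Python standard library does not include an XSLT processor, so the sketch below is not XSLT. It performs the same byline-to-b mapping by hand with xml.etree.ElementTree, only to make the idea of a tree-to-tree transformation concrete. The RENAME table and the transform function are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to HTML element names.
RENAME = {"byline": "b"}

def transform(elem):
    """Return a copy of elem with element names replaced according to RENAME."""
    out = ET.Element(RENAME.get(elem.tag, elem.tag), elem.attrib)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(transform(child))
    return out

source = ET.fromstring("<byline>John Doe</byline>")
result = ET.tostring(transform(source), encoding="unicode")
print(result)
```

A real XSLT stylesheet expresses the same rule declaratively, as a template that matches byline and emits b.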
XPath is a Recommendation that was developed specifically for
locating components within an XML document
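As a small illustration of locating components by path, the following Python sketch uses xml.etree.ElementTree, which supports only a limited subset of XPath. The library document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to demonstrate path expressions.
doc = """<library>
  <book lang="en"><title>Introducing XML</title><author>John Doe</author></book>
  <book lang="it"><title>XML in 10 punti</title><author>Andrea Benassi</author></book>
</library>"""
root = ET.fromstring(doc)

# Every <title> that is a child of a <book> child of the root.
titles = [t.text for t in root.findall("./book/title")]

# Titles of books whose lang attribute equals "en", at any depth.
english = [t.text for t in root.findall(".//book[@lang='en']/title")]

print(titles)
print(english)
```

A full XPath implementation adds axes, functions, and richer predicates that this subset does not cover.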
XPointer is a Recommendation that allows developers to easily
refer to and locate XML document fragments.
This is very useful for several types of applications, including the ability to have multiple
authors working on a single large XML document, or making extremely large XML
documents more manageable for editing purposes.
XPointer enables you to specify points and ranges within your XML documents, which
can then be treated as "mini" documents in their own right.
One of the most powerful aspects of information on the World Wide Web is the ability to
link together documents of interest. Therefore, a linking mechanism for XML
documents naturally increases the power of XML.
The XLink and XBase Recommendations are both used to
specify information about linking XML documents together.
Linking in XML is more complicated than in HTML, because there are more types of
links available to developers
There are also applications where simply linking between documents might not be ideal
and you might want to build a large XML document from a set of smaller documents.
For that purpose, there is the XInclude Recommendation,
which provides the means to include sets of XML documents
into a single document structure.
Processing XML files
 Three traditional techniques for processing XML files
 Using a programming language and the SAX API.
 Using a programming language and the DOM API.
 Using a transformation engine and a filter (XSL)
An application programming interface (API) is a set
of functions, procedures, methods or classes that
an operating system, library or service provides to
support requests made by computer programs
Document Object Model, or DOM
• XML documents, like other structured documents, are trees,
and the DOM is essentially an API for manipulating
the document tree.
• Rather than an API based on user events (such as
clicking a mouse), the DOM is based on the
structure of the document itself.
• The DOM is likely to be best suited for applications
where the document must be accessed repeatedly
or out of sequence order.
• If the application is strictly sequential and one-pass,
the SAX model is likely to be faster and use less memory.
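A minimal sketch of DOM-style access, using Python's xml.dom.minidom: the whole tree is built first, then nodes are visited out of document order and the tree is modified. The <article> wrapper and the added <note> element are assumptions made for the example.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree before any access.
doc = minidom.parseString(
    "<article><title>Introducing XML</title><byline>John Doe</byline></article>"
)
root = doc.documentElement

# Out-of-order access: read the byline first, then the title.
byline = root.getElementsByTagName("byline")[0]
title = root.getElementsByTagName("title")[0]

# The tree can also be changed in place.
byline.firstChild.data = "Jane Doe"
note = doc.createElement("note")
note.appendChild(doc.createTextNode("revised"))
root.appendChild(note)

serialized = root.toxml()
print(title.firstChild.data)
print(serialized)
```

This kind of repeated, non-sequential access is what the tree model is suited for. The cost is that the full document sits in memory.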
Simple API for XML, or SAX
 SAX is an event-driven API, which means that rather than
working with the document structure as a whole, SAX
allows you to deal with specific parts of a document as the
document is parsed.
The quantity of memory that a SAX parser must use in order to
function is typically much smaller than that of a DOM parser.
DOM parsers must have the entire tree in memory before any processing
can begin.
The memory footprint of a SAX parser, by contrast, is based only on the
maximum depth of the XML file
Because of the event-driven nature of SAX, processing documents
can often be faster than DOM-style parsers. Memory allocation takes
time, so the larger memory footprint of the DOM is also a
performance issue.
Due to the nature of DOM, streamed reading from disk is impossible.
Processing XML documents that could never fit into memory is only
possible through the use of a stream XML parser, such as a SAX parser.
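The event-driven model can be sketched with Python's xml.sax module. The handler below never builds a tree; it only reacts to start and end events. The DepthCounter class and the sample input are hypothetical.

```python
import io
import xml.sax

class DepthCounter(xml.sax.ContentHandler):
    """Counts elements and tracks nesting depth without building a tree."""

    def __init__(self):
        super().__init__()
        self.count = 0
        self.depth = 0
        self.max_depth = 0

    def startElement(self, name, attrs):
        # Called once for every start tag, in document order.
        self.count += 1
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def endElement(self, name):
        # Called once for every end tag.
        self.depth -= 1

handler = DepthCounter()
# A file-like object stands in for a large file read as a stream.
stream = io.BytesIO(b"<a><b><c/></b><b/></a>")
xml.sax.parse(stream, handler)
print(handler.count, handler.max_depth)
```

The handler keeps only a few counters, so its state does not grow with the size of the document.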
XML and Data: Document Repositories
 There are a number of tools called document
repositories, which are designed specifically for
maintaining large documents or sets of documents.
 Because these tools are based in SGML, most have
rapidly adapted to XML and are available for use
 Document repositories can be viewed as specialized
databases, designed to work with large documents.
 They often have special features, such as the
capability to enable users to edit only a part of a
document, and then integrate that part into the
XML and Data: XQuery
The proper design of your database structure (the schema) is
The best data in the world is useless without proper queries.
Because XML documents are now being stored in relational
databases, object databases, document repositories, and as
simple flat files, the W3C wanted to create a common query
language which would enable users to create queries that
would work across all these different kinds of data
One way to look at XQuery is as an XML-specific SQL.
The advantage of XQuery for XML is that XQuery is being
designed specifically for XML, with the structure of XML documents
in mind.
The Related Technologies
There is another category of XML technologies called XML vocabularies.
These are individual markup languages that have been written using XML
XML vocabularies can be treated just like any other XML document, because
they are well-formed (and in many cases, valid) XML.
When you are developing XML documents, what you are really doing is
developing your own XML vocabularies. However, there may already be an
existing XML vocabulary that will meet your needs.
There are literally hundreds of XML vocabularies in existence. Some of these
vocabularies are being developed privately for use within a specific
organization. And some are being developed publicly for anyone to use.
The vocabularies we have chosen to cover here are vocabularies that are
being developed in conjunction with the W3C, and either are, or will likely
become, W3C Recommendations
Different Vocabularies: XHTML
• XHTML stands for Extensible HyperText Markup Language.
• XHTML is simply HTML, rewritten to comply with the
rules for being well-formed XML.
• The reasoning behind this move is that XHTML will
allow XML applications to read and treat HTML as if
it were just another XML document.
• One critical difference is that unlike HTML, XHTML is
case sensitive, and all the tags have to appear in
lower case. That is because XML is case sensitive,
so <body> and <BODY> are not the same tag.
• Additionally, XHTML requires that all tags be
properly closed and nested; HTML does not.
Different Vocabularies
To make wireless communication easier between devices, and to serve
documents to wireless devices, there is an XML-based vocabulary in use (and in
ongoing development) designed specifically for wireless: the Wireless Markup
Language (WML).
Scalable Vector Graphics (SVG) is an XML-based specification for creating
graphics, which could be used on the Web or in print. SVG enables these graphics
to be created in a text file, based on the geometry of the graphic.
Synchronized Multimedia Integration Language (SMIL) is an XML-based
language that allows developers to create multimedia presentations.
It offers features similar to those of PowerPoint or Flash,
such as animated graphics, sounds, and the ability to interact with the
presentation on some level (such as following links).
Resource Description Framework (RDF) is primarily an XML-based format
for expressing metadata about information on the Web. Metadata is data about
data; for example, a table of contents in a book might be considered metadata
because it describes the contents of each chapter in the book.
Reasons for using XML
• Transmit data between different systems (and often between
different platforms)
• Send information in a format that is independent of its
representation (separation of content and presentation)
• Exchange information together with its semantic structure
• Transmit data that is easily intelligible to both
humans and computers
• Allow companies to speed up integration with
their business partners
• Improve the dissemination of information within the company and
on the web
• Enable the handling of documents that previously
fell within the scope of EDI
XML technology: advantages
• User-oriented data presentation
• The combination of XML + XSL:
• makes it possible to separate business logic from
presentation logic
• frees the application from constraints tied to the
display device
• Data exchange between applications
• integration between applications is possible with an effort
that is a fraction of the traditional effort in the EDI area
• Publishing data directly in XML
• the machine-readable format (Unicode) can
be combined with other data and processed
further (impossible with HTML)
In their book "The XML Handbook", Goldfarb and Prescod divide
all XML applications into two broad categories:
• POP (Presentation-oriented publishing)
• MOM (Message-oriented middleware)
POP handles documents whose end user is a human reader.
Publishing texts, manuals, and presentations are goals of POP. The
aims of POP are similar to those of HTML. With XML, however, it is possible
to give texts richer structural connotations (see DocBook).
Stylesheets make it possible to transform documents that represent the
logical structure into documents that describe the physical layout. By changing
the stylesheet, one can change the way in which documents are presented.
MOM is based on the exchange of XML documents between programs in order to
carry out a coordinated function in a distributed environment.
An example of MOM is the automatic handling of orders between suppliers and customers.
MOM can involve different kinds of resources (e.g., databases and
message-queuing systems), for which XML-based interfaces are spreading.
Presentation Oriented Publishing
• POP was the killer application of SGML
• It brought enormous savings to companies
in the 1980s
• Instead of creating formatted documents, human
users create unformatted abstractions
• The file represents what is in the document, not how
it should appear
• The POP user is not concerned with the data but with the
• To obtain the desired result, specify
stylesheets: one for print, one for CD-ROM,
one for the Web, etc.
Message Oriented Middleware
• MOM is the killer application of XML on the Web
• MOM radically changes the concept of middleware
 Content management
 presentation-oriented publishing
 one common data format
 multiple rendering styles (XSL)
 Data interchange/EDI
 data interchange / EDI
 interfacing of heterogeneous products
 inter-process communication (IPC)
 Application integration
 application-to-application communication
 Internet message formats (protocols)
 client/middle tier/server
 Data aggregation/portal
 enterprise information portals
Electronic Data Interchange
• The transfer of structured data, by agreed message
standards, from one computer system to another
without human intervention.
• Even in this era of technologies such as XML web
services, the Internet and the World Wide Web, EDI is
still the data format used by the vast majority of
electronic commerce transactions in the world.
• It comprises:
• A set of syntax rules for structuring data
• A protocol for interactive exchange
• Standard messages
• Organizations that send or receive documents
are called "trading partners" in EDI terminology.
Essential elements of EDI
 the use of an electronic transmission medium (originally a
value-added network, but increasingly the open, public
Internet) rather than the despatch of physical storage
media such as magnetic tapes and disks;
 the use of structured, formatted messages based on
agreed standards (such that messages can be translated,
interpreted and checked for compliance with an explicit set
of rules);
 relatively fast delivery of electronic documents from sender
to receiver (generally implying receipt within hours, or even
minutes); and
 direct communication between applications (rather than
merely between computers).
The old EDI
• Different formats for
each application
• Application code
does not have a single view
• New actors have
devastating impacts
• It can only
share elements
defined in advance
• New needs cannot
be met
easily
XML can be the solution
• Different formats for
each application
• XML provides a
single logical view
• The flexible architecture
supports new
Distributed computing (I)
• Slow reaction to
• Maintenance costs
• Limited flexibility
• Changes to the data
propagate to all
Distributed computing (II)
• More standard
• Simpler
• More easily
• Lower costs of
• Greater responsiveness
• Standard APIs and template
languages
Example: electronic invoicing
"Processable" electronic invoicing, that is, invoicing aimed at automating
accounting entries, is based on systems for transmitting commercial and
administrative data which, using telematic transmission networks or national and
international telecommunications networks, allow two software applications to
exchange automatically messages structured according to an agreed standard.
Examples are the traditional EDI (Electronic Data Interchange) transmission systems,
which exchange data according to international standard record layouts over private
transmission networks; the more innovative and less costly WebEDI solutions, which use
web-based transmission technologies; and the most recent XML-based solutions,
where data is exchanged using the XML (eXtensible Markup Language) metalanguage,
either according to the same standards as EDI or with new standards
XML/EDI approach based on
message exchange
Piero De Sabbata, ENEA
Message transmission and security
Piero De Sabbata, ENEA
The message-based scenario
Piero De Sabbata, ENEA
Unicode is an encoding system that assigns a unique number to every
character used for writing text, independently of the language, the
computing platform and the program used.
The code assigned to a character is written as U+ followed by the
four to six hexadecimal digits of the number that identifies it.
The Unicode standard does not yet represent every character in use
in the world.
It is still evolving and aims to cover all representable characters,
guaranteeing compatibility and no overlap with the codes of characters
already defined, while leaving well-defined ranges of "unused" codes
reserved for independent management within particular applications.
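The U+ notation can be illustrated with a short Python sketch. The helper name `code_point_label` is hypothetical and introduced only for this example; it relies on the built-in `ord`, which returns the Unicode number of a character.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character.

    Hypothetical helper for illustration: pads to at least four
    hexadecimal digits, and uses more when the number needs them.
    """
    return f"U+{ord(ch):04X}"


if __name__ == "__main__":
    for ch in ["A", "א", "€", "😀"]:
        print(ch, code_point_label(ch))
```

Characters outside the first 65,536 positions, such as the last one in the list, need more than four hexadecimal digits.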
Character encoding
• Unicode can be implemented by different character encodings.
• A character encoding is a code that maps a set of characters to a
set of other objects, such as numbers (especially in computing), to
make it easier to store text in a computer or to transmit it over a
telecommunications network.
• Common examples are Morse code and ASCII.
• The most commonly used Unicode encoding is UTF-8.
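As a rough illustration of "one character set, several encodings", the sketch below encodes the same character with three encodings that ship with Python. The function name `encoded_forms` is an assumption made for this example only.

```python
def encoded_forms(text: str) -> dict:
    """Encode the same text with three Unicode encodings.

    Hypothetical helper: the big-endian variants are used so that
    no byte-order mark is added to the output.
    """
    return {name: text.encode(name)
            for name in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    for name, data in encoded_forms("€").items():
        print(f"{name:10} {len(data)} byte(s): {data.hex(' ')}")
```

The Unicode number of the character is the same in every case; only the bytes used to store or transmit it change.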
UTF-8 (Unicode Transformation Format, 8 bit) is an encoding of
Unicode characters as variable-length sequences of bytes.
It uses 1 to 4 bytes to represent a character.
For example, a single byte is enough to represent the 128 characters
of the ASCII alphabet, which correspond to Unicode positions U+0000
to U+007F.
Examples:
Code point range        UTF-8 byte sequence (binary)
0x000000 - 0x00007F     0xxxxxxx
0x000080 - 0x0007FF     110xxxxx 10xxxxxx
0x000800 - 0x00FFFF     1110xxxx 10xxxxxx 10xxxxxx
0x010000 - 0x10FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
For example, the character alef (א), Unicode U+05D0, is encoded in
UTF-8 as follows:
it falls in the range 0x0080 to 0x07FF, so according to the table it
is represented with two bytes, 110xxxxx 10xxxxxx;
hexadecimal 0x05D0 is binary 101 1101 0000 (eleven bits);
the eleven bits are copied in order into the positions marked "x":
110-10111 10-010000;
the final result is the pair of bytes 11010111 10010000, or 0xD7 0x90
in hexadecimal.
The dollar sign ($), which is Unicode U+0024 or binary 10 0100:
falls into the range of U+0000 to U+007F.
The first line of the table shows it will be encoded using one byte, 0xxxxxxx.
Putting the binary right-justified into the 'x' bits results in 00100100.
This byte in hexadecimal is 0x24. Thus the ASCII dollar sign is encoded
in UTF-8 as the same single byte it has in ASCII.
The Euro symbol (€), which is Unicode U+20AC or binary 10 0000 1010 1100:
falls into the range of U+0800 to U+FFFF.
The third line of the table shows it will be encoded using three bytes,
1110xxxx 10xxxxxx 10xxxxxx.
Putting the binary right-justified into the 'x' bits results in
11100010 10000010 10101100.
These bytes in hexadecimal are 0xE2, 0x82, 0xAC. That is the encoding of the Euro
symbol (€) in UTF-8.
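The table lookup and bit copying used in these examples can be sketched in Python as follows. This is a teaching sketch, not a replacement for the built-in `str.encode`; the function name `utf8_encode` is hypothetical, and rejecting the surrogate range is an assumption added here so that the result agrees with Python's own encoder.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point by hand, following the range table."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")

    if code_point <= 0x7F:
        # 0xxxxxxx: the seven bits are used as they are
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx: five high bits, then six low bits
        return bytes([0b11000000 | (code_point >> 6),
                      0b10000000 | (code_point & 0b111111)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx: four bits, six bits, six bits
        return bytes([0b11100000 | (code_point >> 12),
                      0b10000000 | ((code_point >> 6) & 0b111111),
                      0b10000000 | (code_point & 0b111111)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: three bits, then 3 x six bits
    return bytes([0b11110000 | (code_point >> 18),
                  0b10000000 | ((code_point >> 12) & 0b111111),
                  0b10000000 | ((code_point >> 6) & 0b111111),
                  0b10000000 | (code_point & 0b111111)])


if __name__ == "__main__":
    for cp in (0x24, 0x05D0, 0x20AC):
        print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
```

Comparing the result with `chr(cp).encode("utf-8")` is a quick way to check the hand-written version against the library.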
World Wide Web Consortium
• The World Wide Web Consortium (W3C) is the main
international standards organization for the World
Wide Web (abbreviated WWW or W3).
• It is arranged as a consortium where member
organizations maintain full-time staff for the
purpose of working together in the development of
standards for the World Wide Web.
• As of October 2008, the W3C had 418 members
(http://www.w3.org/Consortium/Member/List )
• W3C also engages in education and outreach,
develops software and serves as an open forum for
discussion about the Web.
• It was founded and is headed by Sir Tim Berners-Lee.
What is a Recommendation?
The W3C is not an officially sanctioned standards body in the way
that the International Organization for Standardization (ISO) is.
It publishes "Recommendations", which are not binding in any way:
they are a set of guidelines, published and copyrighted by the W3C.
• The power of these "Recommendations"
comes from the fact that people treat them as
standards by consensus, and from the fact that you
cannot claim compliance with a Recommendation
without actually complying, since doing so
would violate the copyright.
Charles F. Goldfarb was tasked with building a system for
storing, searching, managing and publishing
legal documents
Goldfarb discovered that many systems at IBM could not
communicate with one another
3 important facts
The file formats of the different applications were proprietary
... and different from one another!
The different programs needed to support a
common representation of documents
The common language had to be specific to
legal documents
The language had to be specified
formally, so that it could properly delimit the
The answer was GML (Generalized Markup Language),
the precursor of SGML (Standard GML), the language from which
XML derives
Standard Generalized Markup Language (ISO 8879:1986 SGML)
SGML is an ISO standard metalanguage in which one can define markup
languages for documents.
SGML is a descendant of IBM's Generalized Markup Language (GML),
developed in the 1960s by Charles Goldfarb, Edward Mosher and
Raymond Lorie (whose surname initials were used by Goldfarb to make
up the term GML).
SGML provides an abstract syntax that can be realized in many different
concrete syntaxes
SGML was originally designed to enable the sharing of machine-readable
documents in large projects in government, law and industry,
which have to remain readable for several decades.
It has also been used extensively in the printing and publishing
industries, but its complexity has prevented its widespread application
for small-scale general-purpose use.
Primarily intended for text and database publishing, one of its first major
applications was the second edition of the Oxford English Dictionary (OED),
which was and is wholly marked up using an SGML-like markup.
W3C XML 10 Years
On 10 February 1998, W3C published Extensible Markup Language (XML) 1.0
as a W3C Recommendation. W3C would like to thank the dedicated
communities -- including people who have participated in W3C's XML groups
and mailing lists, the SGML community, and xml-dev -- whose efforts have
created a successful family of technologies based on the solid XML 1.0
"There is essentially no computer in the world, desk-top, hand-held, or
back-room, that doesn't process XML sometimes," said Tim Bray of
Sun Microsystems.
"This is a good thing, because it shows that information can be
packaged and transmitted and used in a way that's independent of the
kinds of computer and software that are involved. XML won't be the
last neutral information-wrapping system; but as the first, it's done very
The concept of a metalanguage (I)
In logic and linguistics, a metalanguage is a language used
to make statements about another language, which is called the
object language (that is, a formalism for rigorously
describing another language).
Markup languages differ from metalanguages in that
they only describe how a document should be presented
and not the syntax of a computer programming language.
It is possible, however, to use schemas such as XML Schema to
describe content rules.
XML is the metalanguage used to describe XHTML,
just as SGML is used to describe HTML.
XHTML is much stricter than HTML: for example,
XHTML is case sensitive, unlike HTML.
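The strictness inherited from XML can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<p>Hello</p>"))  # matching tags
    print(is_well_formed("<P>Hello</p>"))  # tag names differ in case
    print(is_well_formed("<p>Hello"))      # missing end tag
```

Only the first string is accepted: an XML parser treats `P` and `p` as different element names and requires every start tag to be closed, whereas HTML browsers traditionally tolerate both.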
The concept of a metalanguage (II)
The concept of a metalanguage (III)
Because XML is a metalanguage for specifying other
languages, it provides a “common layer” for communication between
different environments
XML says nothing about which tags to use; it only fixes
common rules for parsing the file correctly
XML can be used for the most varied purposes, depending
on the operations that the specific application
performs on the markup it encounters
Data (XML file)
Specific tags
XML rules
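The three layers can be seen in a small Python sketch: the parser only knows the XML rules, while the tags belong to the application. The element and attribute names below (`invoice`, `seller`, `total`, `number`, `currency`) are invented for this example and do not come from any invoicing standard; the helper name `outline` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Data layer: a document that uses application-specific (invented) tags.
INVOICE = """<invoice number="42">
  <seller>ACME</seller>
  <total currency="EUR">120.50</total>
</invoice>"""


def outline(xml_text: str) -> list:
    """List (tag, attributes, text) for every element, in document order.

    The parser applies only the generic XML rules; it has no built-in
    knowledge of what 'invoice' or 'total' mean.
    """
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), (el.text or "").strip())
            for el in root.iter()]


if __name__ == "__main__":
    for tag, attrs, text in outline(INVOICE):
        print(tag, attrs, text)
```

Giving meaning to the tags, for instance treating `total` as an amount to register in the accounts, is left to the specific application that reads the parsed structure.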

