SCHOOL OF LIBRARY, ARCHIVE AND INFORMATION STUDIES
LIS1510
Library and Archives Automation Issues
XML and extensible systems
Andy Dawson
School of Library, Archive & Information Studies, UCL
(University of Malta 2008)
What we will be covering today
• Shortcomings of HTML
• Generalised markup languages
• How XML works
• XML document types
• Other related extensible technologies
Limitations of (X)HTML
• Fixed tag set (specifications determined by
W3C)
• Intended for display of documents on the
Web
• Doesn’t do everything everyone wants
• Not easy to use for other purposes
– searching in documents
– analysis of documents
Principles of Generalized Markup
• Descriptive markup – encodes features
within a document
• Say what those features are - not what to do
with them
• Need to define your own tags
• Creates machine-independent data
• Data can then be used for different
purposes
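To make the idea of descriptive markup concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The tag names (article, headline, author, paragraph) and the sample text are invented for illustration. The point is only that the same marked-up data can feed two different purposes.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive markup: the tags say what each feature IS,
# not how it should be displayed.
SOURCE = """<article>
  <headline>Library opens new archive wing</headline>
  <author>A. Reporter</author>
  <paragraph>The new wing opened on Monday.</paragraph>
</article>"""

article = ET.fromstring(SOURCE)

# Purpose 1: analysis - pull out the author, e.g. for an index.
author = article.findtext("author")

# Purpose 2: display - build a one-line summary from the same data.
summary = f"{article.findtext('headline').upper()} (by {author})"

print(author)
print(summary)
```

Neither use is written into the document itself; each program decides what to do with the features the markup identifies.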
SGML
• SGML – Standard Generalized Markup
Language
– International standard (ISO 8879) in 1986
– Metalanguage (syntactic framework) for
defining markup tags
– Parts of SGML are rather complex
– Used by large projects
– Not particularly easy to get started
XML
• XML (Extensible Markup Language)
– Adopted by World Wide Web Consortium
in 1998
– Cut-down version of SGML
– Based on same principles
– Designed to implement easily on the Web
Advantages of XML
• Machine-independent plain-text files
• Potential longevity
• Multi-purpose use
• Ability to analyse/manipulate content
• BUT need to define tag set!
• Not a replacement for HTML unless
analysis/manipulation of data is required
• However, XHTML has become a ‘reliable’
alternative option for simple web publishing
Defining Your Own Tags
• Need to undertake document analysis
– Identify key features in document
– Identify structure of document
– Choose names for tags
• Only then can we apply the tag
scheme
Example of a Newspaper
Name of newspaper
Issue
Article
Headline
Author
Paragraphs
Pictures
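One possible tag scheme for the newspaper features listed above can be sketched in Python with xml.etree.ElementTree. All tag names, attribute names and values here (newspaper, issue, date, src and so on) are assumptions chosen for illustration. A real document analysis might well choose different ones.

```python
import xml.etree.ElementTree as ET

# Build the structure top-down: newspaper > issue > article > features.
newspaper = ET.Element("newspaper", {"name": "The Daily Example"})
issue = ET.SubElement(newspaper, "issue", {"date": "2008-01-01"})
article = ET.SubElement(issue, "article")

ET.SubElement(article, "headline").text = "Archive opens to the public"
ET.SubElement(article, "author").text = "A. Reporter"
ET.SubElement(article, "paragraph").text = "Doors open at nine."
ET.SubElement(article, "picture", {"src": "front.jpg"})  # no content of its own

# Serialise the tree to XML text.
xml_text = ET.tostring(newspaper, encoding="unicode")
print(xml_text)

# The order of features inside the article, as recorded by the markup.
article_features = [child.tag for child in article]
```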
Basics of XML Syntax
• Documents are composed of elements
• Start and end tags for every element - unlike
HTML, end tags may not be omitted
– “Empty elements” combine start and end in a single tag
• Attributes
– modify an element
– have a name and a value
– Value must be enclosed in matching quotes (single or
double)
– An element may have several attributes
• Documents can be “Well-formed” or “Valid”
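The syntax rules above can be seen in a small sample, parsed here with Python's standard xml.etree.ElementTree. The element and attribute names are invented for illustration. Note the two attributes on one element, the mix of single and double quotes, and the empty element written as a single tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one element with two attributes, one empty element.
SAMPLE = """<article id="a1" section='news'>
  <headline>Example headline</headline>
  <picture src="photo.jpg"/>
</article>"""

root = ET.fromstring(SAMPLE)

# Attributes are name/value pairs attached to an element.
attributes = dict(root.attrib)

# An empty element has no text and no child elements.
picture = root.find("picture")
picture_is_empty = picture.text is None and len(picture) == 0

print(root.tag, attributes, picture_is_empty)
```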
Well-formed Documents
• Well-formed documents follow XML syntax i.e.
– start and end tags
– attributes in quotes
– nested structure
• But they have no pre-defined structure!
• Therefore:
– Can only check the syntax
– Cannot validate the structure of well-formed documents
• Prepares documents for potential use/conversion
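A well-formedness check amounts to asking whether a parser accepts the text. The sketch below uses Python's xml.etree.ElementTree, which raises ParseError on a syntax error. The helper name is_well_formed and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = "<note><to>Reader</to></note>"
bad_nesting = "<note><to>Reader</note></to>"   # tags overlap
missing_end = "<note><to>Reader</note>"        # <to> never closed
unquoted = "<note lang=en>Hi</note>"           # attribute value not quoted
odd_but_fine = "<note><banana/></note>"        # syntax is fine, structure unchecked

for sample in (good, bad_nesting, missing_end, unquoted, odd_but_fine):
    print(is_well_formed(sample), sample)
```

The last sample passes because only the syntax is checked: nothing says whether a banana element belongs inside a note.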
Valid Documents
• A Valid XML document is well-formed, contains (or
refers to) a Document Type Definition (DTD), and
conforms to it
• The DTD is a specification of the document
structure identifying
– which elements are allowed
– where they are allowed
– which attributes they may take
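The sketch below shows what a small DTD looks like and what checking against it involves. Python's standard xml.etree.ElementTree reads a DTD but does not validate against it, so the check here is a hand-written stand-in that mirrors this one DTD. It is not a general DTD validator, and the names note, to, body, lang and conforms are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small DTD: which elements are allowed, where, and with which attributes.
DTD = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note lang CDATA #IMPLIED>
]>
"""

# Hand-written mirror of the DTD above (an assumption-laden simplification).
CONTENT_MODEL = {"note": ["to", "body"], "to": [], "body": []}
ALLOWED_ATTRIBUTES = {"note": {"lang"}, "to": set(), "body": set()}

def conforms(element) -> bool:
    """Check element names, child order and attribute names recursively."""
    if element.tag not in CONTENT_MODEL:
        return False
    if not set(element.attrib) <= ALLOWED_ATTRIBUTES[element.tag]:
        return False
    if [child.tag for child in element] != CONTENT_MODEL[element.tag]:
        return False
    return all(conforms(child) for child in element)

# Both documents are well-formed, so both parse without error.
valid_doc = ET.fromstring(DTD + "<note lang='en'><to>Reader</to><body>Hello</body></note>")
invalid_doc = ET.fromstring(DTD + "<note><body>Hello</body><to>Reader</to></note>")

print(conforms(valid_doc), conforms(invalid_doc))
```

The second document is well-formed but puts body before to, which the DTD's content model does not allow.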
Related technologies
• CSS – Cascading Style Sheets
– As used with HTML
– Concentrate only on appearance
• XHTML
– Version of HTML conformant with XML syntax
• XSL - eXtensible Stylesheet Language
– XML language for style sheets
– Controls the appearance of the elements within the
document & defines templates for processing elements
• XML Schemas
– Another way of defining document structure: an
alternative to DTDs, written in XML itself
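The idea behind XSL templates, one rule per source element, can be imitated in a few lines of Python. This is not XSLT, only a rough analogy under stated assumptions: the source tags and the TEMPLATES mapping to XHTML elements are invented for illustration.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    "<article><headline>Archive opens</headline>"
    "<paragraph>Doors open at nine.</paragraph></article>"
)

# Hypothetical "templates": which XHTML element each source element becomes.
TEMPLATES = {"article": "div", "headline": "h1", "paragraph": "p"}

def transform(element):
    """Apply the matching template to an element, then to its children."""
    out = ET.Element(TEMPLATES[element.tag])
    out.text = element.text
    for child in element:
        out.append(transform(child))
    return out

xhtml = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
print(xhtml)
```

A real XSL stylesheet expresses the same kind of mapping declaratively, in XML, and is applied by an XSLT processor rather than by hand-written code.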
That’s all folks…
• Any questions?
• Optional XML exercise is
available…anyone?
• Otherwise – carry on with your
coursework
• Next Tuesday: Website management and
last chance to finish off your website!
…and have a nice weekend