XML Technologies and Related Applications
Lesson 2

Review
XML is a technology for creating markup languages that describe data of virtually any type in a structured manner. Unlike HTML, which limits the document author to a fixed set of tags, XML lets document authors describe data more precisely by creating new tags. XML can be used to create markup languages for describing data in almost any field.

Objectives
• Learn about elements and attributes.
• Learn how namespaces are used to promote better interoperability.
• Use entity references to embed illegal characters.
• Understand the importance of the prolog.
• Add comments to your XML documents.

What Makes an XML Document?
XML documents are based on elements. An element consists of a start tag, content, and an end tag. Elements can contain other elements, and additional descriptive information about an element can be encoded with attributes.

    <myMessage>
        <message>Welcome to XML!</message>
    </myMessage>

The white space between the tags in this example is irrelevant.

Some notes
• Elements can contain other elements, nested to any arbitrary depth (see solar system.xml).
• Symmetry among elements is not required: every element can contain any combination of sub-elements, whatever their names.
• XML documents are commonly stored in text files that end in the extension .xml, although this is not a requirement of XML.

Elements: The Building Blocks of XML
Valid element names
• Must start with a letter (A..Z, a..z) or an underscore.
• May contain any combination of letters, digits (0..9), dashes (-), periods (.), or underscores.
• Can be of any length.
• Are case sensitive.
• Need not be short: terseness is of minimal importance.

Content
Content is the text between the opening and closing tags. So that the parser can tell tags from content, a small number of characters and symbols are not allowed to appear literally in the content area.

Special Symbols (1)
Common symbols and the references that replace them:

    Symbol   Entity reference   Numeric reference
    <        &lt;               &#60;
    >        &gt;               &#62;
    "        &quot;             &#34;
    '        &apos;             &#39;
    &        &amp;              &#38;

Any character can be written as a numeric reference, for example &#65; for capital A.

CDATA
For content that contains many such characters, a better solution is to tag the content with the CDATA (character data) section modifier.
This tells the parser to treat the content as literal text and not attempt to interpret it as XML markup.

    <customer>
        <name><![CDATA[ You & Me ]]></name>
    </customer>

A CDATA section cannot contain another, nested CDATA section.
CDATA parsing example: eg.xml

Root Element
All XML documents must contain exactly one root element. The nested elements form a tree (a DOM tree) with the root element at the top.

Attributes: More Muscle for Elements (1)
Sometimes you need to convey more information about an element than its name and content can express. An attribute can be used to give the element a unique label so it can be easily located, or it can describe a property of the element. An element can have any number of attributes, as long as each has a unique name.

Attributes: More Muscle for Elements (2)
An all-element design is generally preferable. To decide whether an attribute is appropriate, check whether the value is a simple scalar data value that has only a single interpretation and is not likely to change or expand over time.

About content
Which is better? It's up to you!
Data and metadata: text information between opening and closing tag pairs is data. Tag names, attributes, and their relationships are metadata.

Attributes: More Muscle for Elements (3)
Attribute values can be constrained to certain types if you use a DTD. One type is ID, which tells XML that the value is a unique identifier code for the element. No two elements in a document can have the same ID. Another type, IDREF, is a reference to an ID.

Data type
Everything in an XML file is interpreted as a string. To enforce specific data types, use:
• DTD
• XML Schema

Quotes in attributes
Attribute values must be enclosed in quotes.

Reserved Attribute Names
Some attribute names have been set aside for special purposes by the XML working group. These attributes are reserved for XML's use and begin with the prefix xml:
• xml:lang
• xml:space
• xml:link
• xml:attribute
(Of these, only xml:lang and xml:space are defined by XML 1.0 itself; xml:link and xml:attribute come from early XLink drafts.)

XML Documents Parsers (1)
A software program called an XML parser (or an XML processor) is required to process an XML document.
The XML parser reads the XML document, checks its syntax, reports any errors, and allows programmatic access to the document's contents. An XML document is considered well formed if it is syntactically correct.

XML Documents Parsers (2)
Many XML parsers can be downloaded at no charge. Some applications, such as Microsoft Internet Explorer 5 (IE5), have built-in XML parsers. Standalone parsers include the Apache XML Project's Xerces, Sun Microsystems' Java API for XML Parsing (JAXP), and IBM's XML for Java (XML4J).

Basic Document Structure Rules
1. Tags are case-sensitive.
2. Opening tags must have closing tags.
3. Tags must be properly nested.
4. Attribute values require quotes.
5. A root element is required.
A document that follows these rules is a well-formed XML document.

The Document Prolog
The top of an XML document is graced with special information called the document prolog. At its simplest, the prolog identifies the document as XML and declares the XML version. It can also hold additional information that nails down such details as the document type definition being used, declarations of special pieces of text, the text encoding, and instructions to XML processors.

XML declaration (1)
• version: Sets the version number.
• encoding: Defines the character encoding used in the document, such as US-ASCII.
• standalone: Tells the XML processor whether there are any other files to load.

XML declaration (2)
The XML declaration itself is optional in XML 1.0. If it is present, it must include the version number, while encoding and standalone remain optional. Including the declaration is good practice in case something changes drastically in a future revision of the XML specification. The parameter names must be lowercase, and all values must be quoted with either double or single quotes.

Document type declaration
This is where you can specify various parameters such as entity declarations, the DTD to use for validating the document, and the name of the root element.

PI (1)
A processing instruction (PI) is a container for data that is targeted toward a specific XML processor. PIs contain two pieces of information: a target keyword and some data.
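The target/data split can be seen by asking a parser for the processing instructions in a document. The following is a minimal sketch using Python's standard xml.dom.minidom module; the sample document and the helper name processing_instructions are assumptions made for illustration, not part of the lesson.

```python
from xml.dom import minidom

# Sample document (hypothetical): an XML declaration, one PI, then the root element.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="style.css" type="text/css"?>
<myMessage><message>Welcome to XML!</message></myMessage>"""


def processing_instructions(xml_text):
    """Return (target, data) pairs for the PIs that sit at document level."""
    dom = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    ]


pis = processing_instructions(DOC)
for target, data in pis:
    print("target:", target)
    print("data:  ", data)
```

Note that the XML declaration on the first line looks like a PI but is not reported as one. The data part is handed over as a single uninterpreted string, so it is up to the targeted application to make sense of it.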
PI (2)
The PI can contain any data except the combination ?>, which would be interpreted as the closing delimiter.

    <?xml-stylesheet href="style.css" type="text/css"?>
    <?xml-stylesheet href="style.xsl" type="text/xsl"?>

Comments (1)
Comments are notes in the document that are not interpreted by the parser. They can be used to identify the purpose of files and sections, to help readers navigate a cluttered document, or simply to let authors communicate with each other.

Comments (2)
A comment begins with <!-- and ends with -->. Because two dashes in a row (--) tell the parser where a comment ends, they can't be placed anywhere inside the comment. Since comments can contain markup, they can be used to "turn off" parts of a document. Don't put comments inside other comments or inside tags.

Something about UTF-8
• Unicode (originally 2 bytes per character)
• UCS (Universal Character Set): 2^16 = 65536 characters in UCS-2
  • UCS-2
  • UCS-4
• UTF (Unicode/UCS Transformation Format)
  • UTF-8
  • UTF-16

XML 1.0 vs. XML 1.1
• Unicode compatibility.
• Backward and forward compatibility.

Namespaces: Expanding Vocabulary (1)
What happens when you want to include elements or attributes from different document types?

Namespaces: Expanding Vocabulary (2)
A namespace must be declared in the document before you can use it. Be careful not to use prefixes like xml, xsl, or other names reserved by XML and related languages. The XML processor isn't required to do anything with the URI, however.

URI, URL, ...
• URI (Uniform Resource Identifier): RFC 2396 (since superseded by RFC 3986)
• URL (Uniform Resource Locator)
• URN (Uniform Resource Name)

Scope of NS

Default namespace
We can declare one of the namespaces to be the default by omitting the colon (:) and the name from the xmlns attribute. Elements in the default namespace don't need the namespace prefix, resulting in clearer markup. Note that a default namespace applies only to elements: an attribute without a prefix is not placed in it.

Some problems about NS
Namespaces can be a headache if used in conjunction with a DTD. Namespaces can only ensure that names are unique and unambiguous. They have nothing to do with document validity.
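To illustrate how a prefix and a default namespace keep two same-named elements apart, here is a small sketch using Python's standard xml.etree.ElementTree module. The element names and both example.com URIs are made up for this illustration; nothing is fetched from them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus one prefixed namespace.
DOC = """<catalog xmlns="http://example.com/ns/catalog"
         xmlns:pub="http://example.com/ns/publisher">
  <title>Learning XML</title>
  <pub:title>Managing Editor</pub:title>
</catalog>"""

root = ET.fromstring(DOC)

# ElementTree reports each name as "{namespace-uri}local-name".
qualified_names = [child.tag for child in root]
for name in qualified_names:
    print(name)
```

Both children are called title, yet the parser reports two different qualified names, because the unprefixed one picks up the default namespace and the prefixed one picks up the pub namespace.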
The URI referenced in a namespace declaration does nothing more than provide an identifier to the processing application. In fact, most XML parsers never try to retrieve anything from the namespace URI; they treat it as an opaque string.

Summary
• You can read and write arbitrarily complex XML documents by using simple element and attribute markup.
• It is simple to create a well-formed XML document by hand.
• Know the grammar rules regarding what's acceptable for element and attribute names.
• Understand the importance of designing with elements versus attributes, and when to use which one.
• Namespaces are a powerful feature of XML.
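As a quick way to check hand-written documents against the basic structure rules, the sketch below asks Python's standard xml.etree.ElementTree parser whether each sample is well formed. The helper name is_well_formed and the sample documents are assumptions made for illustration; the check covers well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One correct sample, then one sample per broken rule (all hypothetical).
samples = {
    "good": '<note id="n1"><to>You &amp; Me</to></note>',
    "case mismatch": "<note><To>x</to></note>",
    "unclosed tag": "<note><to>x</note>",
    "unquoted attribute": "<note id=n1/>",
    "two roots": "<a/><b/>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well formed' if ok else 'not well formed'}")
```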