ISO 19757 – Document Schema Definition Languages (DSDL)
Martin Bryan
Convenor, JTC1/SC34 WG1
ISO 19757 - DSDL
Parts of DSDL
1. Overview
2. Regular-grammar-based validation (RELAX NG)
3. Rule-based validation (Schematron)
4. Namespace-based validation dispatch language (NVDL)
5. Datatypes
6. Path-based integrity constraints
7. Character repertoire validation
8. Declarative document architectures
9. Datatype- and namespace-aware DTDs
10. Validation management
Regular-grammar-based validation (RELAX NG)
• XML description of a data model
– Compact syntax is even simpler than DTDs
• Provides a way of defining shortcuts
– More functional than parameter entities
• Provides context-dependent models
– Models can be amended when imported
• Supports namespaces and datatypes
– Any datatype, including W3C Schema datatypes
• Can import modules from multiple namespaces
– Can build multi-source schemas
Main components of RELAX NG
pattern ::= <element name="QName"> pattern+ </element>
| <element> nameClass pattern+ </element>
| <attribute name="QName"> [pattern] </attribute>
| <attribute> nameClass [pattern] </attribute>
| <group> pattern+ </group>
| <interleave> pattern+ </interleave>
| <choice> pattern+ </choice>
| <optional> pattern+ </optional>
| <zeroOrMore> pattern+ </zeroOrMore>
| <oneOrMore> pattern+ </oneOrMore>
| <list> pattern+ </list>
| <mixed> pattern+ </mixed>
| <ref name="NCName"/>
| <parentRef name="NCName"/>
| <empty/>
| <text/>
| <value [type="NCName"]> string </value>
| <data type="NCName"> param* [exceptPattern] </data>
| <notAllowed/>
| <externalRef href="anyURI"/>
| <grammar> grammarContent* </grammar>
Using the full syntax
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <ref name="document"/>
  </start>
  <define name="document">
    <element name="document">
      <ref name="head"/>
      <ref name="body"/>
    </element>
  </define>
  <define name="head">
    <element name="head">
      <interleave>
        <element name="organization">
          <choice>
            <value>ISO</value>
            <value>ISO/IEC</value>
          </choice>
        </element>
        <element name="document-type">
          <choice>
            <value>International Standard</value>
            <value>Technical Report</value>
            <value>Guide</value>
            <value>Publicly Available Specification</value>
            <value>Technical Specification</value>
            <value>International Standardized Profile</value>
          </choice>
        </element>
        …
Alternative compact syntax
• Can produce a whole ISO standard using just:
namespace p = "http://relaxng.org/ns/proofsystem"
datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
formal = element p:* { attribute * { text }*, (formal|text)* }
inline &= formal*
block |= formal
block |= element grammarref|rngref {attribute src { xsd:anyURI }}
include "is.rnc"
• Can replace existing definitions with new ones
• Can extend definitions
– |= means “add this option to an existing OR group”
– &= means “add this option to an existing AND group”
• Can merge grammars
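The combining operators can be modelled in a few lines. This is only a sketch of the idea, not RELAX NG itself: the dictionary layout, the `combine` helper and the starting definitions of `block` and `inline` are assumptions made for illustration.

```python
# Hypothetical model: each definition is (combine_mode, [member patterns]).
# A mode of None means the definition has not been combined yet.

def combine(grammar, name, op, pattern):
    """Add `pattern` to grammar[name] as `|=` (choice) or `&=` (interleave) would."""
    mode = {"|=": "choice", "&=": "interleave"}[op]
    if name not in grammar:
        grammar[name] = (mode, [pattern])
        return grammar
    existing_mode, members = grammar[name]
    # One name cannot be combined with both choice and interleave.
    if existing_mode not in (None, mode):
        raise ValueError(f"{name}: cannot mix {existing_mode} and {mode}")
    grammar[name] = (mode, members + [pattern])
    return grammar


# Assumed starting definitions, standing in for whatever "is.rnc" defines.
grammar = {"block": (None, ["p"]), "inline": (None, ["text"])}
combine(grammar, "block", "|=", "formal")
combine(grammar, "block", "|=", "grammarref|rngref")
combine(grammar, "inline", "&=", "formal*")
```

After these calls `block` is an OR group of three options and `inline` is an AND group of two, mirroring the three extension lines in the compact schema.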
Rule-based validation (Schematron)
• “A Schematron schema contains natural-language assertions concerning a set of
documents, marked up with various elements
and attributes for testing these natural-language
assertions, and for simplifying and grouping the
assertions.”
• “A Schematron schema reduces to a non-chaining rule system whose terms are boolean
functions invoking an external query language on
the instance and other visible XML documents,
with syntactic features to reduce specification
size and to allow efficient implementation.”
Schematron example
<sch:rule context="failed-assert | successful-report">
  <sch:extends rule="second-level" />
  <sch:assert test="count(diagnostic-reference) + count(text)
                    = count(*)">
    The <sch:name/> element should only contain a text element
    and diagnostic reference elements.
  </sch:assert>
  <sch:assert test="count(text) = 1">
    The <sch:name/> element should only contain a text element.
  </sch:assert>
  <sch:assert test="preceding-sibling::fired-rule |
                    preceding-sibling::failed-assert |
                    preceding-sibling::successful-report">
    A <sch:name/> comes after a fired-rule, a failed-assert or
    a successful-report.
  </sch:assert>
</sch:rule>
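The first two assertions of this rule can be approximated without a Schematron processor. The sketch below hard-codes them in Python with the standard library's ElementTree; the `check_rule` function and the sample document are illustrative assumptions, and a real implementation would evaluate the XPath tests from the schema rather than hand-written equivalents.

```python
import xml.etree.ElementTree as ET


def check_rule(root):
    """Return one message per failed assertion, in document order."""
    messages = []
    for elem in root.iter():
        # Rule context: failed-assert | successful-report
        if elem.tag not in ("failed-assert", "successful-report"):
            continue
        n_text = len(elem.findall("text"))
        n_diag = len(elem.findall("diagnostic-reference"))
        # count(diagnostic-reference) + count(text) = count(*)
        if n_diag + n_text != len(list(elem)):
            messages.append(
                f"The {elem.tag} element should only contain a text element "
                "and diagnostic reference elements."
            )
        # count(text) = 1
        if n_text != 1:
            messages.append(
                f"The {elem.tag} element should only contain a text element."
            )
    return messages


doc = ET.fromstring(
    "<output>"
    "<fired-rule/>"
    "<failed-assert><text>ok</text></failed-assert>"
    "<successful-report><text>a</text><note/></successful-report>"
    "</output>"
)
messages = check_rule(doc)
```

Here only the `successful-report` element fails, because it contains a `note` child that is neither `text` nor `diagnostic-reference`.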
Schematron core elements
• active
• assert
• extends
• include
• let
• name
• ns
• param
• pattern
• phase
• report
• rule
• schema
• value-of
Ancillary elements and attributes
• diagnostics element
• diagnostic element
• dir element
• emph element
• p element
• span element
• title element
• flag attribute
• fpi attribute
• icon attribute
• role attribute
• see attribute
• subject attribute
Namespace-based Validation Dispatching Language (NVDL)
• Allows data from different namespaces to be validated by
different processes
– Can validate one namespace using RELAX NG, another using a DTD
and a third using a W3C Schema
• Simple and full syntaxes
– Full syntax simplified to simple syntax before use
• All validation is done in context
– Slots are created to identify where data from alternative
namespaces has been removed
• Allows attributes from different namespaces to be
validated
• Elements and attributes in different namespaces are separated
into separate “sections”
NVDL example – HTML + XForms (1)
<rules xmlns="purl://dsdl.org/nvdl/ns/structure/1.0"
       xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <namespace ns="http://www.w3.org/2002/06/xhtml2">
    <validate schema="xhtml2.rng">
      <mode>
        <namespace ns="http://www.w3.org/2002/xforms">
          <validate schema="xforms.rng">
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <attach message="Skipped descendant XForms sections."/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <unwrap message="Skipped descendant XHTML2 sections."/>
              </namespace>
            </mode>
          </validate>
          …
NVDL example (2)
          <unwrap>
            <mode>
              <namespace ns="http://www.w3.org/2002/xforms">
                <unwrap message="Skipped descendant XForms"/>
              </namespace>
              <namespace ns="http://www.w3.org/2002/06/xhtml2">
                <attach message="Any descendant XHTML2 sections"/>
              </namespace>
            </mode>
          </unwrap>
        </namespace>
      </mode>
    </validate>
  </namespace>
</rules>
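The idea of separating a document into per-namespace sections before dispatch can be sketched as follows. This is not an NVDL processor: the `sections` helper, the sample document and the `dispatch` table are assumptions for illustration, and attribute sections, slots, `attach` and `unwrap` are all ignored.

```python
import xml.etree.ElementTree as ET


def namespace_of(elem):
    # ElementTree spells qualified names as "{namespace}local".
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def sections(elem, parent_ns=None, found=None):
    """List (namespace, local name) for each element that starts a new section."""
    found = [] if found is None else found
    ns = namespace_of(elem)
    if ns != parent_ns:  # namespace changes here, so a new section begins
        found.append((ns, elem.tag.split("}")[-1]))
    for child in elem:
        sections(child, ns, found)
    return found


XHTML = "http://www.w3.org/2002/06/xhtml2"
XFORMS = "http://www.w3.org/2002/xforms"
doc = ET.fromstring(
    f'<html xmlns="{XHTML}" xmlns:xf="{XFORMS}">'
    "<body><xf:input><xf:label><em>Name</em></xf:label></xf:input></body>"
    "</html>"
)

# Assumed mapping from namespace to schema, echoing the example above.
dispatch = {XHTML: "xhtml2.rng", XFORMS: "xforms.rng"}
plan = [(root, dispatch[ns]) for ns, root in sections(doc)]
```

The resulting plan has three entries: the XHTML2 section rooted at `html`, the XForms section rooted at `input`, and a nested XHTML2 section rooted at `em`.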
Datatypes
• Allows multiple datatype sets to be defined
– W3C datatypes can be used as the base
• Will allow user-defined datatype primitives to be
added
– Needed for extended date/period formats, etc.
• Will provide mechanism for defining complex
patterns
– Patterns based on supertypes will be allowed
• Normalization of values, comparing results after
normalization
– Convert local date formats to ISO 8601 then compare
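Normalize-then-compare might look like the sketch below. The list of accepted local formats and the function names are assumptions for illustration; Part 5 is not yet drafted, so nothing here reflects its eventual syntax.

```python
from datetime import datetime

# Assumed local formats: day/month/year, day.month.year, and ISO 8601 itself.
LOCAL_FORMATS = ("%d/%m/%Y", "%d.%m.%Y", "%Y-%m-%d")


def to_iso8601(value):
    """Normalize a date written in one of the assumed formats to YYYY-MM-DD."""
    for fmt in LOCAL_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unrecognised date: {value!r}")


def same_date(a, b):
    """Compare two date strings after normalization."""
    return to_iso8601(a) == to_iso8601(b)
```

With this, `same_date("04/03/2004", "2004-03-04")` holds even though the two lexical forms differ.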
Possible form for Part 5
<datatype name="price">
  <supertype name="decimal">
    <cast>
      <if test="not(sign='-')">
        <copy-of select="whole-part"/>
        <text>.</text>
        <my:fraction-part>
          <value-of select="substring(concat(fraction-part, '00'), 1, 2)"/>
        </my:fraction-part>
      </if>
    </cast>
  </supertype>
</datatype>
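The effect of the cast is easier to see on plain strings. The sketch below mimics the `substring(concat(fraction-part, '00'), 1, 2)` step; the `cast_price` function and its string result are assumptions for illustration, whereas the draft form above builds XML nodes instead.

```python
def cast_price(sign, whole_part, fraction_part):
    """Pad or cut the fraction to two digits, as the draft cast does."""
    if sign == "-":
        return None  # the draft example only handles non-negative values
    # concat(fraction-part, '00') then take the first two characters
    return f"{whole_part}.{(fraction_part + '00')[:2]}"
```

So a fraction of `5` becomes `50`, an empty fraction becomes `00`, and extra digits beyond the second are dropped rather than rounded.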
Path-based integrity constraints
• Non-hierarchical links between information items
in a structured resource can be identified by
addressing items within the document tree and
then expressing the relationship between them.
• Provides a method for identifying information
items dependent on ancestry or the use of keys
• And a method for describing the role of
relationships that are not hierarchical
• Allows selection of fragments to be validated
• Will include an extensible basis for supporting
mechanisms not currently available
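A key-based integrity check of this kind can be sketched with the path subset that Python's ElementTree supports. The element names, attribute names and the `dangling_references` helper are all hypothetical; Part 6 is not yet drafted.

```python
import xml.etree.ElementTree as ET


def dangling_references(root, key_path, key_attr, ref_path, ref_attr):
    """Return reference values that do not match any key in the document."""
    keys = {e.get(key_attr) for e in root.findall(key_path)}
    return [
        e.get(ref_attr)
        for e in root.findall(ref_path)
        if e.get(ref_attr) not in keys
    ]


doc = ET.fromstring(
    "<report>"
    "<definitions><term id='dtd'/><term id='schema'/></definitions>"
    "<body><termref target='schema'/><termref target='infoset'/></body>"
    "</report>"
)
missing = dangling_references(
    doc, "./definitions/term", "id", ".//termref", "target"
)
```

The link from `termref` to `term` is non-hierarchical: the two elements sit in different branches of the tree, and only the paths plus the key value relate them.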
Character repertoire validation
• User-defined character sets that can be used to
validate the contents of elements or attributes
– Will be able to check that only characters relevant for
a particular language are used, not all those in a
particular Unicode character block
• Schematron-like rules for associating character
repertoires with a particular element or attribute
<sch:rule context="*[/*[@xml:lang='nl']]">
  <sch:assert test="\p{IsBasicLatin}\p{IsLatin-1Supplement}&#x132;&#x133;\p{IsGeneralPunctuation}\p{IsCurrencySymbols}">
    If this document is a Dutch document, it should have only
    characters used in typical Dutch publishing.
  </sch:assert>
</sch:rule>
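The same repertoire can be checked directly against code point ranges. This is a sketch under the assumption that the blocks named in the assertion map to the Unicode ranges listed in the comments; the function name is illustrative.

```python
# Ranges corresponding to the blocks and characters named in the rule above.
DUTCH_REPERTOIRE = [
    (0x0000, 0x007F),  # Basic Latin
    (0x0080, 0x00FF),  # Latin-1 Supplement
    (0x0132, 0x0133),  # IJ and ij ligatures
    (0x2000, 0x206F),  # General Punctuation
    (0x20A0, 0x20CF),  # Currency Symbols
]


def outside_repertoire(text, repertoire=DUTCH_REPERTOIRE):
    """Return the sorted distinct characters of `text` not in the repertoire."""
    return sorted(
        {
            ch
            for ch in text
            if not any(lo <= ord(ch) <= hi for lo, hi in repertoire)
        }
    )
```

An empty result means the text uses only characters from the repertoire; anything returned is a character the rule would report.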
Declarative document architectures
• Allows locally meaningful names to be assigned
to schema components
– 80/20 rule allows many functions of abstract classes
• Allows predefined fragments to be defined within
a schema
– Reintroduces entity definitions in a more controllable
form
– May contain optional components
• Can even re-define entity names
– No longer restricted to English-based names when
referencing standard entities such as &nbsp;
• Allows elements/attributes to be removed in defined contexts
Datatype/Namespace-aware DTDs
• Shows how the ISO 8879/XML Document Type
Definition (DTD) syntax can be extended to
validate documents that make full use of XML
Namespaces and Part 5 Datatypes
• May be extended to add character repertoire
validation
• Will allow DTDs to be used to validate any XML
document, including those defined using Part 2
• Will allow SGML documents to be treated as
input to ISO 19757 validation processes
Validation management
• Includes a mechanism to invoke parsers which read non-XML sources (and XML sources that can't be identified by
a single URI) to create XML Infosets that can be used for
subsequent processing
• Allows pre-validation transformations to be used to
normalize and/or subset documents before validation
• Multiple validations and transformations may be applied
• Transformations will be able to split a document into
multiple resulting documents
• Includes facilities to generate customized validation
reports which can be output as XML document instances
that can be processed by other applications
Possible format for Part 10
<framework>
  <rule>
    <instance>
      <transform transformation="normalize.xslt"/>
    </instance>
    <assert>
      <isValid schema="my-schema.rng"/>
      <isValid schema="my-schema.sch"/>
    </assert>
  </rule>
</framework>
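The control flow this framework implies (transform first, then apply every validation, then report) can be sketched as a small pipeline. Everything here is a stand-in: the `run_framework` function is hypothetical, the "transformation" just collapses whitespace, and the two "validators" are trivial string checks rather than real RELAX NG or Schematron validation.

```python
def run_framework(document, transforms, validators):
    """Apply each transform in order, then run every validator on the result."""
    for transform in transforms:
        document = transform(document)
    report = [
        {"schema": name, "valid": bool(check(document))}
        for name, check in validators
    ]
    return document, report


def normalize(text):
    # Stand-in for normalize.xslt: collapse runs of whitespace.
    return " ".join(text.split())


# Stand-in checks labelled with the schema names from the example above.
validators = [
    ("my-schema.rng", lambda text: text.startswith("<doc>")),
    ("my-schema.sch", lambda text: "TODO" not in text),
]

normalized, report = run_framework("  <doc>  TODO  </doc> ", [normalize], validators)
```

The report is plain data, one entry per schema, so it could be serialized as an XML instance for other applications to process.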
Current status
• Published
– Part 2, RELAX NG
• At Committee Draft stage
– Part 3, Schematron
– Part 4, NVDL
• Working Draft under consideration
– Part 1, Overview
– Part 7, Character repertoire validation
– Part 8, Declarative document architectures
– Part 10, Validation management
• Parts 5, 6 & 9 not yet drafted
Tracking progress
• Via your national standards body
– IST/41 at BSI
• Via XML UK or any ISUG chapter
– Martin Bryan is XML UK representative on IST/41 and
ISUG representative for SC34/WG1
• Via the DSDL public website
– http://www.dsdl.org