Examplotron

Table of contents

  1. Purposes
  2. Limitations
  3. Tutorial
    1. Getting started
    2. What about the attributes?
    3. Occurrences
    4. Namespaces
    5. Assertions
    6. Imports
    7. Place holders
  4. Resources
  5. To do
  6. Acknowledgements
  7. History
  8. Legal statement

1. Purposes

The purpose of examplotron is to use instance documents as a lightweight schema language, adding to the sample documents, where necessary, the information needed to guide a validator.

"Classical" XML validation languages such as DTDs, W3C XML Schema, Relax, Trex or Schematron rely on modeling either the structure (and possibly the datatypes) that a document must follow to be considered valid, or the rules that need to be checked.

This modeling relies on specific XML serialization syntaxes that must be understood before one can validate a document and that are very different from the instance documents. Creating a new XML vocabulary therefore involves both creating a new syntax and mastering the syntax of a schema language.

Many tools (including popular XML editors) are able to generate various flavors of XML schemas from instance documents, but they do not find enough information in the documents for these schemas to be directly usable. This leaves a need for human tweaking and for a full understanding of the schema language.

Examplotron may then be used either as a validation language by itself, or to improve the generation of schemas expressed using other XML schema languages by providing more information to the schema translators.

2. Limitations

The obvious limitation of working with sample documents is that, while they are very efficient at describing patterns that can be "shown" in a document, they cannot by themselves describe abstract "constructed" patterns.

To work around this limitation, one needs to introduce modeling elements or attributes, moving to a hybrid schema language involving both pure "schema by example" and modeling or rule construction.

The current release includes such an attribute (eg:occurs) to control the number of occurrences of an element (see section "Occurrences" for a detailed description of this attribute).

I plan to consider the addition of other similar elements or attributes to work around other similar restrictions.

3. Tutorial

3.1. Getting started

This first instance document (examplotron1.xml) is also an examplotron schema:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
	<bar>My first examplotron.</bar>
	<bar>Hello world</bar>
</foo>

This schema will validate all the documents without any namespace that have the "same" structure, i.e. three element nodes (a document element of type "foo" with exactly two child elements of type "bar") and no attributes.

The examplotron compiler (i.e. the compile.xsl XSLT stylesheet) transforms the examplotron schema into a Relax NG schema (examplotron1.rng) that can be applied to any document to check whether it has the same structure.

The structure of examplotron1.rng (and therefore the transformation defined in compile.xsl) is very straightforward:

   
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="foo">
      <element name="bar">
        <text/>
      </element>
      <element name="bar">
        <text/>
      </element>
    </element>
  </start>
</grammar>
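
As an illustration (the two instance documents below are hypothetical and are not part of the examplotron distribution), a document such as the following has the same structure as examplotron1.xml and should therefore be accepted by examplotron1.rng:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: same element structure, different text -->
<foo>
  <bar>Any text</bar>
  <bar>Some other text</bar>
</foo>

whereas this one should be rejected, since "foo" has three "bar" children instead of exactly two:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: one "bar" too many -->
<foo>
  <bar>one</bar>
  <bar>two</bar>
  <bar>three</bar>
</foo>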

3.2. What about the attributes?

Attributes are also supported as shown by examplotron2.xml:

<?xml version="1.0" encoding="UTF-8"?>
<foo>
	<bar true="no longer">My first examplotron.</bar>
	<bar>Hello world</bar>
</foo>

The compiled schema then includes the definition of the "true" attribute (examplotron2.rng):

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="foo">
      <element name="bar">
        <attribute name="true"/>
        <text/>
      </element>
      <element name="bar">
        <text/>
      </element>
    </element>
  </start>
</grammar>

Note that, unlike in preceding versions, including consecutive elements with the same name generates consecutive definitions of these elements with different content models.
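
To illustrate with a hypothetical instance document (not part of the distribution): examplotron2.rng should accept the following, in which only the first "bar" carries the "true" attribute. Moving the attribute to the second "bar", or omitting it altogether, should make the document invalid, because each "bar" is checked against its own definition:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: the attribute value itself is not constrained -->
<foo>
  <bar true="yes">The first bar must carry the attribute.</bar>
  <bar>The second bar must not.</bar>
</foo>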

3.3. Occurrences

Controlling the number of occurrences by adding as many elements as needed rapidly becomes verbose and does not cope with optional elements or an arbitrarily large number of occurrences. For these cases, examplotron defines a simple mechanism, inspired by DTDs, that overrides the definition of the occurrences (examplotron3.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/">
	<bar eg:occurs="+">Hello world</bar>
	<!-- eg:occurs could also have been set to "*", "." or "?" -->
</foo>

The value of the eg:occurs attribute can be "*" (0 or more), "+" (1 or more), "." (exactly one) or "?" (0 or 1); it defines the number of occurrences of the element (examplotron3.rng):

 
<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="foo">
      <oneOrMore>
        <element name="bar">
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
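
The output for the other values is not shown here. Assuming the compiler maps each value to the corresponding Relax NG pattern (an assumption, not something taken from compile.xsl), setting eg:occurs="?" on "bar" would be expected to give a schema along these lines:

<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="foo">
      <!-- Assumed output for eg:occurs="?": zero or one "bar" -->
      <optional>
        <element name="bar">
          <text/>
        </element>
      </optional>
    </element>
  </start>
</grammar>

Under the same assumption, "*" would correspond to a zeroOrMore pattern and "." to a bare element definition, as in examplotron1.rng.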
				

3.4. Namespaces

Examplotron supports namespaces without any known restriction: namespaces are used in examplotron documents just as in any instance document (examplotron4.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/" xmlns:bar="http://http://examplotron.org/otherns/">
	<bar:bar eg:occurs="+">Hello world</bar:bar>
	<!-- eg:occurs could also have been set to "*", "." or "?" -->
</foo>

This is straightforwardly translated into (examplotron4.rng):

<?xml version="1.0" encoding="utf-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="foo">
      <oneOrMore>
        <element name="bar" ns="http://http://examplotron.org/otherns/">
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
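
Since the generated schema records the namespace URI and not the prefix, an instance document is free to use another prefix. The following hypothetical instance (the prefix "other" is an arbitrary choice) should be accepted by examplotron4.rng:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: same namespace URI as in examplotron4.xml, different prefix -->
<foo xmlns:other="http://http://examplotron.org/otherns/">
  <other:bar>Hello</other:bar>
  <other:bar>world</other:bar>
</foo>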

3.5. Assertions

In order to describe more complex rules, it is possible to define assertions (i.e. statements that need to be met) as XPath expressions using "eg:assert" attributes (examplotron5.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns:eg="http://examplotron.org/0/"  eg:assert="sum(percent)=100">
<!-- The sum of the values of the "percent" element needs to be equal to 100 -->
	<percent eg:occurs="+">100</percent>
</foo>

The implementation of this feature is optional and may vary depending on the architecture of the implementation. One possibility is to generate embedded Schematron assertions, as supported by Sun's MSV (examplotron5.rng):

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" 
   xmlns:sch="http://www.ascc.net/xml/schematron">
  <start>
    <element name="foo">
      <sch:assert test="sum(percent)=100"/>
      <oneOrMore>
        <element name="percent">
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>
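
As an illustration (both instance documents are hypothetical, and the outcome assumes a validator that evaluates the embedded assertion), the following document satisfies both the structure and the assertion, since 60 + 40 = 100:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: the percentages add up to 100 -->
<foo>
  <percent>60</percent>
  <percent>40</percent>
</foo>

whereas this one has a valid structure but should fail the assertion, since 50 + 30 = 80:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical instance: the percentages add up to 80 only -->
<foo>
  <percent>50</percent>
  <percent>30</percent>
</foo>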

Assertions can be used without restriction on documents using namespaces, but please remember that XPath expressions do not support default namespaces (examplotron6.xml):

<?xml version="1.0" encoding="UTF-8"?>
<foo:foo xmlns:eg="http://examplotron.org/0/"  eg:assert="sum(bar:percent)=100" 
    xmlns:foo="http://examplotron/otherns/foo" xmlns:bar="http://examplotron/otherns/bar">
	<bar:percent eg:occurs="+">100</bar:percent>
</foo:foo>

To deal with possible redefinitions of namespace prefixes, the compiler copies all the namespace nodes found on the element carrying the assertion onto the generated assertion (examplotron6.rng):

<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0" 
   xmlns:sch="http://www.ascc.net/xml/schematron">
  <start>
    <element name="foo" ns="http://examplotron/otherns/foo">
      <sch:assert xmlns:eg="http://examplotron.org/0/" 
          xmlns:foo="http://examplotron/otherns/foo" 
          xmlns:bar="http://examplotron/otherns/bar" 
          test="sum(bar:percent)=100"/>
      <oneOrMore>
        <element name="percent" ns="http://examplotron/otherns/bar">
          <text/>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>

Many thanks to David Carlisle for this tip.
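
The remark about default namespaces can be illustrated with a hypothetical variant of examplotron6.xml in which the elements use a default namespace. In XPath 1.0 an unprefixed name always refers to an element in no namespace, so eg:assert="sum(percent)=100" would select nothing here. The same namespace therefore has to be bound to a prefix for use in the expression (the prefix "p" below is an arbitrary choice):

<?xml version="1.0" encoding="UTF-8"?>
<foo xmlns="http://examplotron/otherns/foo" xmlns:p="http://examplotron/otherns/foo"
    xmlns:eg="http://examplotron.org/0/" eg:assert="sum(p:percent)=100">
  <!-- "percent" is in the default namespace; the assertion reaches it through the "p" prefix -->
  <percent eg:occurs="+">100</percent>
</foo>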

3.6. Imports

To be (re)defined to match either one or both of the "include" and "externalRef" patterns from Relax NG.

3.7. Place holders

These should not be needed any longer.

4. Resources

Mailing list

This mailing list is for discussing issues, bugs, questions and future development of Examplotron.

To subscribe, send a mail to examplotron-request@xmlschemata.org with "subscribe" in the subject or body.

This documentation has been written as an RDDL document and this section will be developed to include more resources related to examplotron.

XSLT Compiler

This compiler is an XSLT transformation that compiles an examplotron schema into a Relax NG schema that can be used to validate documents that conform to this schema.

The compiler must be run using an EXSLT compliant XSLT processor. The resulting schema must be processed with a Relax NG processor (with support for embedded Schematron rules if eg:assert is used).

W3C XML Schema for examplotron

This schema, written in W3C XML Schema (Proposed Recommendation, 16 March 2001), describes the examplotron vocabulary and can be imported into other W3C XML Schema schemas to validate examplotron schemas.

CSS Stylesheet

A CSS stylesheet, borrowed from RDDL, that provides the "look-and-feel" of this document and is suitable for RDDL documents in general.

CSS Stylesheet (original)

Original version of the previous CSS stylesheet on rddl.org.

XYZFind Server User's Guide

Chapter 7 of the XYZFind Server User's Guide describes a schema language used by XYZFind Server that is very similar to examplotron.

Proposal for XSL

This early proposal for XSL proposed a syntax similar to the one used by examplotron for expressing patterns.

Relax NG

Home page of the Relax NG Oasis TC.

5. To do

I was pleasantly surprised, after a couple of hours of work on examplotron, to find that this tool was already becoming useful while remaining very simple (or simplistic).

The current version is already a powerful tool that can be used to validate documents.

It can be used as a main validation tool, or as a complement to a more classical validation tool, for instance to add requirements and constraints to existing vocabularies when an application uses a subset of a vocabulary.

This being said, the simplicity of the tool leaves room for many applications and extensions, on which your feedback is welcome:

  1. Documentation: develop the resources section.
  2. Extension: ability to accurately control the number of occurrences (may already be done using eg:assert).
  3. Extension: ability to control text nodes.
  4. Extension: ability to define recursive models.
  5. Extension: ability to add type information.
  6. Extension: ability to explicitly control the occurrences of attributes.
  7. Readability: ability to add documentation.
  8. Proof of concept: write an examplotron schema for XHTML.
  9. Anything else?
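
As a sketch of the second item, and assuming an implementation that supports eg:assert, a precise range of occurrences might be expressed by combining eg:occurs with an assertion on count(). The range of 2 to 4 below is an arbitrary example:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical schema: between two and four "bar" elements -->
<foo xmlns:eg="http://examplotron.org/0/"
    eg:assert="count(bar) &gt;= 2 and count(bar) &lt;= 4">
  <bar eg:occurs="+">Hello world</bar>
</foo>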

6. Acknowledgements

Many thanks to the many people who have given me hints, ideas or encouragement, or who have even let me think that examplotron could be the best invention since the French baguette.

Note: the French baguette is another very simple invention, made only of flour, salt, yeast and water (exactly like examplotron, which is made out of XML 1.0, Namespaces in XML 1.0, XPath 1.0 and XSLT 1.0). The Anglo-American sliced bread (often used in this context), involving more ingredients and postprocessing, is far more complex and obviously does not belong to the same category as examplotron.

Non normative list (by chronological order): Simon St.Laurent, Edd Dumbill, John Cowan, Len Bullard, Rick Jelliffe, Evan Lenz, Dan Brickley, Jonathan Borden, David Mundie, David Carlisle, Murata Makoto, Cyril Jandia, Amelia A. Lewis, Gavin Thomas Nicol, Tim Mueller-Seydlitz, Michael Champion, Wendell Piez...

7. History

V0.1

  • Creation

V0.2

  • Addition of several sections (limitations, acknowledgements, history and legal)
  • Clarifications after comments through xml-dev and private mails.
  • Addition of a history section in compile.xsl
  • Creation of a W3C XML Schema for examplotron (examplotron.xsd).
  • Start to feed the resources section.

V0.3

  • Addition of eg:assert.
  • Rewrite of the history section as RDDL resources.
  • Addition of new resources.
  • Expansion of the list of acknowledgements.

V0.4

  • Addition of eg:import.
  • Addition of eg:placeHolder.
  • Restructured the document.
  • Addition of new resources.
  • Expansion of the list of acknowledgements.

V0.5

Major architectural change with constant set of features.

  • "compile.xsl" no longer generates an XSLT transformation but a Relax NG schema.
  • Repeated elements are no longer considered as a choice but as a sequence of different definitions.
  • Imports need to be redefined.
  • Place holders shouldn't be needed any longer.
  • Expansion of the list of acknowledgements.