OASIS

RELAX NG Tutorial

Committee Specification 3 December 2001

This version:
Committee Specification: 3 December 2001
Previous versions:
Committee Specification: 10 August 2001
Editors:
James Clark <jjc@jclark.com>, MURATA Makoto <EB2M-MRT@asahi-net.or.jp>

Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Abstract

RELAX NG is a simple schema language for XML, based on [RELAX] and [TREX]. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document.

This document is a tutorial for RELAX NG version 1.0.

Status of this Document

This Committee Specification was approved for publication by the OASIS RELAX NG technical committee. It is a stable document which represents the consensus of the committee. Comments on this document may be sent to relax-ng-comment@lists.oasis-open.org.

A list of known errors in this document is available at http://www.oasis-open.org/committees/relax-ng/tutorial-20011203-errata.html.

Table of Contents

1 Getting started
2 Choice
3 Attributes
4 Named patterns
5 Datatyping
6 Enumerations
7 Lists
8 Interleaving
9 Modularity
9.1 Referencing external patterns
9.2 Combining definitions
9.3 Merging grammars
9.4 Replacing definitions
10 Namespaces
10.1 Using the ns attribute
10.2 Qualified names
11 Name classes
12 Annotations
13 Nested grammars
14 Non-restrictions
15 Further information

Appendixes

A Comparison with XML DTDs
B Comparison with RELAX Core
B.1 Mapping RELAX NG to RELAX Core
B.1.1 elementRule-tag pairs
B.1.2 hedgeRule
B.1.3 attPool
B.1.4 Hedge models
B.1.5 Attribute declarations
B.2 Examples
B.2.1 Ancestor-and-sibling-sensitive content models
B.2.2 Attribute-sensitive content model
B.3 Features of RELAX NG beyond RELAX Core
C Comparison with TREX
D Changes from 12 June 2001 version
References

1. Getting started

Consider a simple XML representation of an email address book:

<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>

The DTD would be as follows:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>

A RELAX NG pattern for this could be written as follows:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

If the addressBook is required to be non-empty, then we can use oneOrMore instead of zeroOrMore:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </oneOrMore>
</element>

Now let's change it to allow each card to have an optional note element:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <text/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>

Note that the text pattern matches arbitrary text, including empty text. Note also that whitespace separating tags is ignored when matching against a pattern.
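As an illustration of both points, the following instance (invented here for this purpose, not one of the original examples) should match the pattern with the optional note element: the name element is empty, which the text pattern permits, and the indentation between tags is ignored.

```xml
<addressBook>
  <card>
    <name></name>
    <email>js@example.com</email>
    <note>Prefers plain text mail.</note>
  </card>
</addressBook>
```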

All the elements specifying the pattern must be namespace qualified by the namespace URI:

http://relaxng.org/ns/structure/1.0

The examples above use a default namespace declaration xmlns="http://relaxng.org/ns/structure/1.0" for this. A namespace prefix is equally acceptable:

<rng:element name="addressBook" xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:zeroOrMore>
    <rng:element name="card">
      <rng:element name="name">
        <rng:text/>
      </rng:element>
      <rng:element name="email">
        <rng:text/>
      </rng:element>
    </rng:element>
  </rng:zeroOrMore>
</rng:element>

For the remainder of this document, the default namespace declaration will be left out of examples.

2. Choice

Now suppose we want to allow the name to be broken down into a givenName and a familyName, allowing an addressBook like this:

<addressBook>
  <card>
    <givenName>John</givenName>
    <familyName>Smith</familyName>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>

We can use the following pattern:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <group>
          <element name="givenName">
            <text/>
          </element>
          <element name="familyName">
            <text/>
          </element>
        </group>
      </choice>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <text/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>

This corresponds to the following DTD:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card ((name | (givenName, familyName)), email, note?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT givenName (#PCDATA)>
<!ELEMENT familyName (#PCDATA)>
<!ELEMENT note (#PCDATA)>
]>

3. Attributes

Suppose we want the card element to have attributes rather than child elements. The DTD might look like this:

<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED>
]>

Just change each element pattern to an attribute pattern:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>

In XML, the order of attributes is traditionally not significant. RELAX NG follows this tradition. The above pattern would match both

<card name="John Smith" email="js@example.com"/>

and

<card email="js@example.com" name="John Smith"/>

In contrast, the order of elements is significant. The pattern

<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
</element>

would not match

<card><email>js@example.com</email><name>John Smith</name></card>

Note that an attribute element by itself indicates a required attribute, just as an element element by itself indicates a required element. To specify an optional attribute, use optional just as with element:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
      <optional>
        <attribute name="note">
          <text/>
        </attribute>
      </optional>
    </element>
  </zeroOrMore>
</element>

The group and choice patterns can be applied to attribute patterns in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <attribute name="name">
          <text/>
        </attribute>
        <group>
          <attribute name="givenName">
            <text/>
          </attribute>
          <attribute name="familyName">
            <text/>
          </attribute>
        </group>
      </choice>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>

The group and choice patterns can combine element and attribute patterns without restriction. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <attribute name="name">
          <text/>
        </attribute>
      </choice>
      <choice>
        <element name="email">
          <text/>
        </element>
        <attribute name="email">
          <text/>
        </attribute>
      </choice>
    </element>
  </zeroOrMore>
</element>

As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:

<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>

However, it would not match

<card><email>js@example.com</email><name>John Smith</name></card>

because the pattern for card requires any email child element to follow any name child element.

There is one difference between attribute and element patterns: <text/> is the default for the content of an attribute pattern, whereas an element pattern is not allowed to be empty. For example,

<attribute name="email"/>

is short for

<attribute name="email">
  <text/>
</attribute>

It might seem natural that

<element name="x"/>

matched an x element with no attributes and no content. However, this would make the meaning of empty content inconsistent between the element pattern and the attribute pattern, so RELAX NG does not allow the element pattern to be empty. A pattern that matches an element with no attributes and no children must use <empty/> explicitly:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="prefersHTML">
          <empty/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>

Even if the pattern in an element pattern matches attributes only, there is no need to use empty. For example,

<element name="card">
  <attribute name="email">
    <text/>
  </attribute>
</element>

is equivalent to

<element name="card">
  <attribute name="email">
    <text/>
  </attribute>
  <empty/>
</element>

4. Named patterns

For a non-trivial RELAX NG pattern, it is often convenient to be able to give names to parts of the pattern. Instead of

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

we can write

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>

A grammar element has a single start child element, and zero or more define child elements. The start and define elements contain patterns. These patterns can contain ref elements that refer to patterns defined by any of the define elements in that grammar element. A grammar pattern is matched by matching the pattern contained in the start element.

We can use the grammar element to write patterns in a style similar to DTDs:

<grammar>
  <start>
    <ref name="AddressBook"/>
  </start>
  <define name="AddressBook">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="Card"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="Card">
    <element name="card">
      <ref name="Name"/>
      <ref name="Email"/>
    </element>
  </define>
  <define name="Name">
    <element name="name">
      <text/>
    </element>
  </define>
  <define name="Email">
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>

Recursive references are allowed. For example,

<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
      <element name="span">
        <optional>
          <attribute name="style"/>
        </optional>
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>

However, recursive references must be within an element. Thus, the following is not allowed:

<define name="inline">
  <choice>
    <text/>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
    <element name="span">
      <optional>
        <attribute name="style"/>
      </optional>
      <ref name="inline"/>
    </element>
  </choice>
  <optional>
    <ref name="inline"/>
  </optional>
</define>

5. Datatyping

RELAX NG allows patterns to reference externally-defined datatypes, such as those defined by [W3C XML Schema Datatypes]. RELAX NG implementations may differ in what datatypes they support. You must use datatypes that are supported by the implementation you plan to use.

The data pattern matches a string that represents a value of a named datatype. The datatypeLibrary attribute contains a URI identifying the library of datatypes being used. The datatype library defined by [W3C XML Schema Datatypes] would be identified by the URI http://www.w3.org/2001/XMLSchema-datatypes. The type attribute specifies the name of the datatype in the library identified by the datatypeLibrary attribute. For example, if a RELAX NG implementation supported the datatypes of [W3C XML Schema Datatypes], you could use:

<element name="number">
  <data type="integer" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</element>

It is inconvenient to specify the datatypeLibrary attribute on every data element, so RELAX NG allows the datatypeLibrary attribute to be inherited. The datatypeLibrary attribute can be specified on any RELAX NG element. If a data element does not have a datatypeLibrary attribute, it will use the value from the closest ancestor that has a datatypeLibrary attribute. Typically, the datatypeLibrary attribute is specified on the root element of the RELAX NG pattern. For example,

<element name="point" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="x">
    <data type="double"/>
  </element>
  <element name="y">
    <data type="double"/>
  </element>
</element>

If the children of an element or an attribute match a data pattern, then the complete content of the element or attribute must match that data pattern. It is not permitted to have a pattern which allows part of the content to match a data pattern, and another part to match another pattern. For example, the following pattern is not allowed:

<element name="bad">
  <data type="int"/>
  <element name="note">
    <text/>
  </element>
</element>

However, this would be fine:

<element name="ok">
  <data type="int"/>
  <attribute name="note">
    <text/>
  </attribute>
</element>

Note that this restriction does not apply to the text pattern.
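As a minimal sketch of that exception (the element names are illustrative and simply mirror the data examples above), a text pattern may be grouped with an element pattern, so a pattern along these lines should be acceptable:

```xml
<element name="ok">
  <text/>
  <element name="note">
    <text/>
  </element>
</element>
```

Under this reading, an instance such as <ok>42<note>checked</note></ok> would match, since the text pattern accepts the leading character data and the note element follows it.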

Datatypes may have parameters. For example, a string datatype may have a parameter controlling the length of the string. The parameters applicable to any particular datatype are determined by the datatyping vocabulary. Parameters are specified by adding one or more param elements as children of the data element. For example, the following constrains the email element to contain a string at most 127 characters long:

<element name="email">
  <data type="string">
    <param name="maxLength">127</param>
  </data>
</element>

6. Enumerations

Many markup vocabularies have attributes whose value is constrained to be one of a set of specified values. The value pattern matches a string that has a specified value. For example,

<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </attribute>
</element>

allows the preferredFormat attribute to have the value html or text. This corresponds to the DTD:

<!DOCTYPE card [
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED
  preferredFormat (html|text) #REQUIRED>
]>

The value pattern is not restricted to attribute values. For example, the following is allowed:

<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
  <element name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </element>
</element>

The prohibition against a data pattern's matching only part of the content of an element also applies to value patterns.
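As a sketch of this restriction (the element names are reused from the data examples in the previous section and are purely illustrative), a pattern like the following would not be allowed, because the value pattern would match only part of the content of the bad element:

```xml
<!-- Not allowed: a value pattern grouped with a child element pattern -->
<element name="bad">
  <value>html</value>
  <element name="note">
    <text/>
  </element>
</element>
```

Replacing the note element pattern with an attribute pattern should make it acceptable, by analogy with the corresponding data example.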

By default, the value pattern will consider the string in the pattern to match the string in the document if the two strings are the same after the whitespace in both strings is normalized. Whitespace normalization strips leading and trailing whitespace characters, and collapses sequences of one or more whitespace characters to a single space character. This corresponds to the behaviour of an XML parser for an attribute that is declared as other than CDATA. Thus the attribute-based card pattern at the start of this section will match any of:

<card name="John Smith" email="js@example.com" preferredFormat="html"/>
<card name="John Smith" email="js@example.com" preferredFormat="  html  "/>

The way that the value pattern compares the pattern string with the document string can be controlled by specifying a type attribute and optionally a datatypeLibrary attribute, which identify a datatype in the same way as for the data pattern. The pattern string matches the document string if they both represent the same value of the specified datatype. Thus, whereas the data pattern matches an arbitrary value of a datatype, the value pattern matches a specific value of a datatype.
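A minimal sketch of a typed value pattern, assuming an implementation that supports the integer datatype of [W3C XML Schema Datatypes]; the quantity element name is hypothetical and chosen only for illustration:

```xml
<element name="quantity">
  <value type="integer"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">10</value>
</element>
```

Because the comparison is made on integer values rather than on strings, this pattern should match <quantity>10</quantity>, <quantity>010</quantity> and <quantity> +10 </quantity> alike, but not <quantity>11</quantity>.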

If there is no ancestor element with a datatypeLibrary attribute, the datatype library defaults to a built-in RELAX NG datatype library. This provides two datatypes, string and token. The built-in datatype token corresponds to the default comparison behavior of the value pattern. The built-in datatype string compares strings without any whitespace normalization (other than the end-of-line and attribute value normalization automatically performed by XML). For example,

<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value type="string">html</value>
      <value type="string">text</value>
    </choice>
  </attribute>
</element>

will not match

<card name="John Smith" email="js@example.com" preferredFormat="  html  "/>

7. Lists

The list pattern matches a whitespace-separated sequence of tokens; it contains a pattern that the sequence of individual tokens must match. The list pattern splits a string into a list of strings, and then matches the resulting list of strings against the pattern inside the list pattern.

For example, suppose we want to have a vector element that contains two floating point numbers separated by whitespace. We could use list as follows:

<element name="vector">
  <list>
    <data type="float"/>
    <data type="float"/>
  </list>
</element>

Or suppose we want the vector element to contain a list of one or more floating point numbers separated by whitespace:

<element name="vector">
  <list>
    <oneOrMore>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>

Or suppose we want a path element containing an even number of floating point numbers:

<element name="path">
  <list>
    <oneOrMore>
      <data type="double"/>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>

8. Interleaving

The interleave pattern allows child elements to occur in any order. For example, the following would allow the card element to contain the name and email elements in any order:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <interleave>
        <element name="name">
          <text/>
        </element>
        <element name="email">
          <text/>
        </element>
      </interleave>
    </element>
  </zeroOrMore>
</element>

The pattern is called interleave because of how it works with patterns that match more than one element. Suppose we want to write a pattern for the HTML head element which requires exactly one title element, at most one base element and zero or more style, script, link and meta elements, and suppose we are writing a grammar pattern that has one definition for each element. Then we could define the pattern for head as follows:

<define name="head">
  <element name="head">
    <interleave>
      <ref name="title"/>
      <optional>
        <ref name="base"/>
      </optional>
      <zeroOrMore>
        <ref name="style"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="script"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="link"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="meta"/>
      </zeroOrMore>
    </interleave>
  </element>
</define>

Suppose we had a head element that contained a meta element, followed by a title element, followed by a meta element. This would match the pattern because it is an interleaving of a sequence of two meta elements, which match the child pattern

<zeroOrMore>
  <ref name="meta"/>
</zeroOrMore>

and a sequence of one title element, which matches the child pattern

<ref name="title"/>

The semantics of the interleave pattern are that a sequence of elements matches an interleave pattern if it is an interleaving of sequences that match the child patterns of the interleave pattern. Note that this is different from the & connector in SGML: A* & B matches the sequence of elements A A B or the sequence of elements B A A but not the sequence of elements A B A.

One special case of interleave is very common: interleaving <text/> with a pattern p represents a pattern that matches what p matches but also allows characters to occur as children. The mixed element is a shorthand for this.

<mixed> p </mixed>

is short for

<interleave> <text/> p </interleave>
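For a concrete instance of this shorthand, here is a small sketch in which p is a zeroOrMore pattern; the para and em element names are hypothetical and not taken from the examples above:

```xml
<element name="para">
  <mixed>
    <zeroOrMore>
      <element name="em">
        <text/>
      </element>
    </zeroOrMore>
  </mixed>
</element>
```

This should match an instance such as <para>Some <em>emphasized</em> text.</para>, where character data and em elements are freely interleaved.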

9. Modularity

9.1. Referencing external patterns

The externalRef pattern can be used to reference a pattern defined in a separate file. The externalRef element has a required href attribute that specifies the URL of a file containing the pattern. The externalRef matches if the pattern contained in the specified URL matches. Suppose, for example, you have a RELAX NG pattern that matches HTML inline content stored in inline.rng:

<grammar>
  <start>
    <ref name="inline"/>
  </start>
  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="code">
          <ref name="inline"/>
        </element>
        <element name="em">
          <ref name="inline"/>
        </element>
        <!-- etc -->
      </choice>
    </zeroOrMore>
  </define>
</grammar>

Then we could allow the note element to contain inline HTML markup by using externalRef as follows:

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <externalRef href="inline.rng"/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>

For another example, suppose you have two RELAX NG patterns stored in files pattern1.rng and pattern2.rng. Then the following is a pattern that matches anything matched by either of those patterns:

<choice>
  <externalRef href="pattern1.rng"/>
  <externalRef href="pattern2.rng"/>
</choice>

9.2. Combining definitions

If a grammar contains multiple definitions with the same name, then the definitions must specify how they are to be combined into a single definition by using the combine attribute. The combine attribute may have the value choice or interleave. For example,

<define name="inline.class" combine="choice">
  <element name="bold">
    <ref name="inline"/>
  </element>
</define>
<define name="inline.class" combine="choice">
  <element name="italic">
    <ref name="inline"/>
  </element>
</define>

is equivalent to

<define name="inline.class">
  <choice>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
  </choice>
</define>

When combining attributes, combine="interleave" is typically used. For example,

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist" combine="interleave">
    <attribute name="name">
      <text/>
    </attribute>
  </define>
  <define name="card.attlist" combine="interleave">
    <attribute name="email">
      <text/>
    </attribute>
  </define>
</grammar>

is equivalent to

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist">
    <interleave>
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </interleave>
  </define>
</grammar>

which is equivalent to

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist">
    <group>
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </group>
  </define>
</grammar>

since combining attributes with interleave has the same effect as combining them with group.

It is an error for two definitions of the same name to specify different values for combine. Note that the order of definitions within a grammar is not significant.

Multiple start elements can be combined in the same way as multiple definitions.
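As a sketch of combining start elements (the definition names follow the DTD-style grammar shown earlier; the choice of a card root is an assumption made only for illustration), the following grammar should accept a document whose root is either an addressBook element or a single card element:

```xml
<grammar>
  <start combine="choice">
    <ref name="AddressBook"/>
  </start>
  <start combine="choice">
    <ref name="Card"/>
  </start>
  <define name="AddressBook">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="Card"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="Card">
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </define>
</grammar>
```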

9.3. Merging grammars

The include element allows grammars to be merged together. A grammar pattern may have include elements as children. An include element has a required href attribute that specifies the URL of a file containing a grammar pattern. The definitions in the referenced grammar pattern will be included in the grammar pattern containing the include element.

The combine attribute is particularly useful in conjunction with include. For example, suppose a RELAX NG pattern inline.rng provides a pattern for inline content, which allows bold and italic elements arbitrarily nested:

<grammar>
  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>
  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>

Another RELAX NG pattern could use inline.rng and add code and em to the set of inline elements as follows:

<grammar>
  <include href="inline.rng"/>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>

This would be equivalent to

<grammar>
  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>
  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>

which is equivalent to

<grammar>
  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>
  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
</grammar>

Note that it is allowed for one of the definitions of a name to omit the combine attribute. However, it is an error if there is more than one definition that does so.

The notAllowed pattern is useful when merging grammars. The notAllowed pattern never matches anything. Just as adding empty to a group makes no difference, so adding notAllowed to a choice makes no difference. It is typically used to allow an including pattern to specify additional choices with combine="choice". For example, if inline.rng were written like this:

<grammar>
  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="bold">
          <ref name="inline"/>
        </element>
        <element name="italic">
          <ref name="inline"/>
        </element>
        <ref name="inline.extra"/>
      </choice>
    </zeroOrMore>
  </define>
  <define name="inline.extra">
    <notAllowed/>
  </define>
</grammar>

then it could be customized to allow inline code and em elements as follows:

<grammar>
  <include href="inline.rng"/>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline.extra" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>

9.4. Replacing definitions

RELAX NG allows define elements to be put inside the include element to indicate that they are to replace definitions in the included grammar pattern.

Suppose the file addressBook.rng contains:

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>

Suppose we wish to modify this pattern so that the card element contains an emailAddress element instead of an email element. Then we could replace the definition of cardContent as follows:

<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name">
        <text/>
      </element>
      <element name="emailAddress">
        <text/>
      </element>
    </define>
  </include>
</grammar>

This would be equivalent to

<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="emailAddress">
      <text/>
    </element>
  </define>
</grammar>

An include element can also contain a start element, which replaces the start in the included grammar pattern.
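As an illustration of replacing start, the following sketch assumes the addressBook.rng file shown above. The root element name contacts is hypothetical and does not appear elsewhere in this tutorial; the cardContent definition is taken unchanged from the included grammar.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="addressBook.rng">
    <!-- This start replaces the start of addressBook.rng.
         The root element name "contacts" is hypothetical. -->
    <start>
      <element name="contacts">
        <zeroOrMore>
          <element name="card">
            <!-- cardContent is still defined by addressBook.rng -->
            <ref name="cardContent"/>
          </element>
        </zeroOrMore>
      </element>
    </start>
  </include>
</grammar>
```

With this grammar, a document whose root element is contacts would be expected to match, while one whose root element is addressBook would not, because the start of the included grammar no longer applies.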

10. Namespaces

RELAX NG is namespace-aware. Thus, it considers an element or attribute to have both a local name and a namespace URI which together constitute the name of that element or attribute.

10.1. Using the ns attribute

The element pattern uses an ns attribute to specify the namespace URI of the elements that it matches. For example,

<element name="foo" ns="http://www.example.com">
  <empty/>
</element>

would match any of:

<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
<example:foo xmlns:example="http://www.example.com"/>

but not any of:

<foo/>
<e:foo xmlns:e="http://WWW.EXAMPLE.COM"/>
<example:foo xmlns:example="http://www.example.net"/>

A value of an empty string for the ns attribute indicates a null or absent namespace URI (just as with the xmlns attribute). Thus, the pattern

<element name="foo" ns="">
  <empty/>
</element>

matches any of:

<foo xmlns=""/>
<foo/>

but not any of:

<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>

It is tedious and error-prone to specify the ns attribute on every element, so RELAX NG allows it to be defaulted. If an element pattern does not specify an ns attribute, then it defaults to the value of the ns attribute of the nearest ancestor that has an ns attribute, or the empty string if there is no such ancestor. Thus,

<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" ns="">
  <zeroOrMore>
    <element name="card" ns="">
      <element name="name" ns="">
        <text/>
      </element>
      <element name="email" ns="">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

and

<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <element name="name" ns="http://www.example.com">
        <text/>
      </element>
      <element name="email" ns="http://www.example.com">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

The attribute pattern also takes an ns attribute. However, there is a difference in how it defaults. This is because the XML Namespaces Recommendation does not apply the default namespace to attributes. If an ns attribute is not specified on the attribute pattern, then it defaults to the empty string. Thus,

<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <attribute name="name" ns=""/>
      <attribute name="email" ns=""/>
    </element>
  </zeroOrMore>
</element>

and so will match

<addressBook xmlns="http://www.example.com">
  <card name="John Smith" email="js@example.com"/>
</addressBook>

or

<example:addressBook xmlns:example="http://www.example.com">
  <example:card name="John Smith" email="js@example.com"/>
</example:addressBook>

but not

<example:addressBook xmlns:example="http://www.example.com">
  <example:card example:name="John Smith" example:email="js@example.com"/>
</example:addressBook>

10.2. Qualified names

When a pattern matches elements and attributes from multiple namespaces, using the ns attribute would require repeating namespace URIs in different places in the pattern. This is error-prone and hard to maintain, so RELAX NG also allows the element and attribute patterns to use a prefix in the value of the name attribute to specify the namespace URI. In this case, the prefix specifies the namespace URI to which that prefix is bound by the namespace declarations in scope on the element or attribute pattern. Thus,

<element name="ab:addressBook" xmlns:ab="http://www.example.com/addressBook"
                               xmlns:a="http://www.example.com/address">
  <zeroOrMore>
    <element name="ab:card">
      <element name="a:name">
        <text/>
      </element>
      <element name="a:email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

is equivalent to

<element name="addressBook" ns="http://www.example.com/addressBook">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com/addressBook">
      <element name="name" ns="http://www.example.com/address">
        <text/>
      </element>
      <element name="email" ns="http://www.example.com/address">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

If a prefix is specified in the value of the name attribute of an element or attribute pattern, then that prefix determines the namespace URI of the elements or attributes that will be matched by that pattern, regardless of the value of any ns attribute.

Note that the XML default namespace (as specified by the xmlns attribute) is not used in determining the namespace URI of elements and attributes that element and attribute patterns match.
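The last two rules can be seen side by side in a small sketch. The element names and the two example.com URIs are placeholders chosen for illustration, not part of the tutorial's running example.

```xml
<choice xmlns="http://relaxng.org/ns/structure/1.0"
        xmlns:e="http://www.example.com/used">
  <!-- The xmlns default namespace declared above plays no part in
       matching: with no ns attribute in scope, this pattern matches
       <foo/> in no namespace. -->
  <element name="foo">
    <empty/>
  </element>
  <!-- The e: prefix decides the namespace of this element's name; the
       ns value is disregarded for it. This pattern matches
       <bar xmlns="http://www.example.com/used"/>. -->
  <element name="e:bar" ns="http://www.example.com/ignored">
    <empty/>
  </element>
</choice>
```

The default namespace declaration on the choice element only says that choice and element are RELAX NG elements; it says nothing about the documents being matched.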

11. Name classes

Normally, the name of the element to be matched by an element element is specified by a name attribute. An element element can instead start with an element specifying a name-class. In this case, the element pattern will only match an element if the name of the element is a member of the name-class. The simplest name-class is anyName, which any name at all is a member of, regardless of its local name and its namespace URI. For example, the following pattern matches any well-formed XML document:

<grammar>
  <start>
    <ref name="anyElement"/>
  </start>
  <define name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>

The nsName name-class contains any name with the namespace URI specified by the ns attribute, which defaults in the same way as the ns attribute on the element pattern.
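A minimal sketch of nsName follows; the namespace URI is a placeholder invented for this illustration.

```xml
<element xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Any element name in this namespace is a member of the name-class -->
  <nsName ns="http://www.example.com/extension"/>
  <text/>
</element>
```

This pattern would match an element such as <x:note xmlns:x="http://www.example.com/extension">hello</x:note>, whatever its local name, but not a note element in no namespace.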

The choice name-class matches any name that is a member of any of its child name-classes.

The anyName and nsName name-classes can contain an except clause. For example,

<element name="card" ns="http://www.example.com">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName/>
          <nsName ns=""/>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
  <text/>
</element>

would allow the card element to have any number of namespace-qualified attributes provided that they were qualified with a namespace other than that of the card element.

Note that an attribute pattern matches a single attribute even if it has a name-class that contains multiple names. To match zero or more attributes, the zeroOrMore element must be used.

The name name-class contains a single name. The content of the name element specifies the name in the same way as the name attribute of the element pattern. The ns attribute specifies the namespace URI in the same way as the element pattern.
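As a short sketch of how the name and choice name-classes combine, the pattern below accepts an element named either bold or italic. The two names are borrowed from the inline example earlier in this tutorial; the pattern itself is only an illustration.

```xml
<element xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- The name-class comes first: a choice between two single names -->
  <choice>
    <name>bold</name>
    <name>italic</name>
  </choice>
  <!-- The content pattern follows the name-class -->
  <text/>
</element>
```

Since no ns attribute is in scope, both names are in no namespace, so <bold>x</bold> and <italic>x</italic> would match but <em>x</em> would not.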

Some schema languages have a concept of lax validation, where an element or attribute is validated against a definition only if there is one. We can implement this concept in RELAX NG with name classes that use except and name. Suppose, for example, we wanted to allow an element to have any attribute with a qualified name, but we still wanted to ensure that if there was an xml:space attribute, it had the value default or preserve. It wouldn't work to use

<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName/>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>

because an xml:space attribute with a value other than default or preserve would match

    <attribute>
      <anyName/>
    </attribute>

even though it did not match

    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>

The solution is to use name together with except:

<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <name>xml:space</name>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>

Note that the define element cannot contain a name-class; it can only contain a pattern.

12. Annotations

If a RELAX NG element has an attribute or child element with a namespace URI other than the RELAX NG namespace, then that attribute or element is ignored. Thus, you can add annotations to RELAX NG patterns simply by using an attribute or element in a separate namespace:

<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0"
                            xmlns:a="http://www.example.com/annotation">
  <zeroOrMore>
    <element name="card">
      <a:documentation>Information about a single email address.</a:documentation>
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>

RELAX NG also provides a div element which allows an annotation to be applied to a group of definitions in a grammar. For example, you might want to divide up the definitions of the grammar into modules:

<grammar xmlns:m="http://www.example.com/module">
  <div m:name="inline">
    <define name="code">pattern</define>
    <define name="em">pattern</define>
    <define name="var">pattern</define>
  </div>
  <div m:name="block">
    <define name="p">pattern</define>
    <define name="ul">pattern</define>
    <define name="ol">pattern</define>
  </div>
</grammar>

This would allow you to easily generate variants of the grammar based on a selection of modules.

A companion specification, RELAX NG DTD Compatibility [Compatibility], defines annotations to implement some features of XML DTDs.

13. Nested grammars

There is no prohibition against nesting grammar patterns. A ref pattern refers to a definition from the nearest grammar ancestor. There is also a parentRef element that escapes out of the current grammar and references a definition from the parent of the current grammar.

Imagine the problem of writing a pattern for tables. The pattern for tables only cares about the structure of tables; it doesn't care about what goes inside a table cell. First, we create a RELAX NG pattern table.rng as follows:

<grammar>
  <define name="cell.content">
    <notAllowed/>
  </define>
  <start>
    <element name="table">
      <oneOrMore>
        <element name="tr">
          <oneOrMore>
            <element name="td">
              <ref name="cell.content"/>
            </element>
          </oneOrMore>
        </element>
      </oneOrMore>
    </element>
  </start>
</grammar>

Patterns that include table.rng must redefine cell.content. By using a nested grammar pattern containing a parentRef pattern, the including pattern can redefine cell.content to be a pattern defined in the including pattern's grammar, thus effectively importing a pattern from the parent grammar into the child grammar:

<grammar>
  <start>
    <element name="doc">
      <zeroOrMore>
        <choice>
          <element name="p">
            <ref name="inline"/>
          </element>
          <grammar>
            <include href="table.rng">
              <define name="cell.content">
                <parentRef name="inline"/>
              </define>
            </include>
          </grammar>
        </choice>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="em">
          <ref name="inline"/>
        </element>
      </choice>
    </zeroOrMore>
  </define>
</grammar>

Of course, in a trivial case like this, there is no advantage in nesting the grammars: we could simply have included table.rng within the outer grammar element. However, when the included grammar has many definitions, nesting it avoids the possibility of name conflicts between the including grammar and the included grammar.

14. Non-restrictions

RELAX NG does not require patterns to be "deterministic" or"unambiguous".
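As a small illustration, consider the following sketch, in which the element names doc, a, b and c are placeholders. The corresponding DTD content model, (a,b)|(a,c), is not deterministic, because a parser that sees an a element cannot tell which branch applies without looking ahead. RELAX NG accepts the pattern as written.

```xml
<element name="doc" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <!-- Both branches begin with an "a" element -->
    <group>
      <element name="a"><empty/></element>
      <element name="b"><empty/></element>
    </group>
    <group>
      <element name="a"><empty/></element>
      <element name="c"><empty/></element>
    </group>
  </choice>
</element>
```

Both <doc><a/><b/></doc> and <doc><a/><c/></doc> would match this pattern.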

Suppose we wanted to write the email address book in HTML, but use class attributes to specify the structure:

<element name="html">
  <element name="head">
    <element name="title">
      <text/>
    </element>
  </element>
  <element name="body">
    <element name="table">
      <attribute name="class">
        <value>addressBook</value>
      </attribute>
      <oneOrMore>
        <element name="tr">
          <attribute name="class">
            <value>card</value>
          </attribute>
          <element name="td">
            <attribute name="class">
              <value>name</value>
            </attribute>
            <interleave>
              <text/>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>givenName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>familyName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
            </interleave>
          </element>
          <element name="td">
            <attribute name="class">
              <value>email</value>
            </attribute>
            <text/>
          </element>
        </element>
      </oneOrMore>
    </element>
  </element>
</element>

This would match an XML document such as:

<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <span class="familyName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>

but not:

<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <!-- Note the incorrect class attribute -->
          <span class="givenName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>

15. Further information

The definitive specification of RELAX NG is [RELAX NG].

A. Comparison with XML DTDs

RELAX NG provides functionality that goes beyond XML DTDs. In particular, RELAX NG

  • uses XML syntax to represent schemas
  • supports datatyping
  • integrates attributes into content models
  • supports XML namespaces
  • supports unordered content
  • supports context-sensitive content models

ID/IDREF validation is not provided by RELAX NG; however, it is provided by a companion specification, RELAX NG DTD Compatibility [Compatibility]. Comprehensive support for cross-reference checking is planned for a future specification.

RELAX NG does not support features of XML DTDs that involve changing the infoset of an XML document. In particular, RELAX NG

  • does not allow defaults for attributes to be specified; however, this is allowed by RELAX NG DTD Compatibility [Compatibility]
  • does not allow entities to be specified
  • does not allow notations to be specified
  • does not specify whether whitespace is significant
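To illustrate the first point, RELAX NG DTD Compatibility [Compatibility] expresses an attribute default as an annotation attribute named defaultValue in the compatibility annotations namespace. The sketch below is illustrative only: the card content and the kind attribute with its two values are invented for this example.

```xml
<element name="card" xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <optional>
    <!-- a:defaultValue is an annotation; the attribute name "kind"
         and its values are hypothetical -->
    <attribute name="kind" a:defaultValue="personal">
      <choice>
        <value>personal</value>
        <value>work</value>
      </choice>
    </attribute>
  </optional>
  <text/>
</element>
```

A processor that implements only RELAX NG treats a:defaultValue like any other foreign-namespace attribute and ignores it, as described in the section on annotations; only a processor that also implements the compatibility specification would be expected to act on the default.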

Also, RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern.

B. Comparison with RELAX Core

Any description in RELAX Core can be directly captured in RELAX NG without loss of information.

B.1. Mapping RELAX Core to RELAX NG

B.1.1. elementRule-tag pairs

An elementRule as well as the referenced tag element is typically captured by a define element containing an element element as the child.

An elementRule-tag pair in RELAX Core is shown below:

<elementRule role="foo" label="bar">hedge model</elementRule>
<tag role="foo" name="baz">attribute declarations</tag>

A rewrite in RELAX NG is shown below:

<define name="bar">
  <element name="baz">
    hedge model
    attribute declarations
  </element>
</define>

B.1.2. hedgeRule

A hedgeRule element is captured by a define element containing the hedge model.

A hedgeRule element in RELAX Core is shown below:

<hedgeRule label="bar">hedge model</hedgeRule>

A rewrite in RELAX NG is:

<define name="bar">hedge model</define>

B.1.3. attPool

An attPool element in RELAX Core is shown below:

<attPool role="foo">attribute declarations</attPool>

A rewrite in RELAX NG is:

<define name="foo">attribute declarations</define>

B.1.4. Hedge models

Mapping of hedge models in RELAX Core to RELAX NG is summarized below:

  1. occurs="*" in RELAX Core is captured by <zeroOrMore>...</zeroOrMore>.
  2. occurs="+" in RELAX Core is captured by <oneOrMore>...</oneOrMore>.
  3. occurs="?" in RELAX Core is captured by <optional>...</optional>.
  4. <mixed>...</mixed> in RELAX Core is captured by <mixed>...</mixed>.
  5. <ref label="..."/> in RELAX Core is captured by <ref name="..."/>.
  6. <hedgeRef label="..."/> in RELAX Core is captured by <ref name="..."/>.
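The mapping above can be sketched on a small fragment. The labels para and blocks are hypothetical, and the fragment assumes the occurs attribute is written directly on the ref and hedgeRef elements.

```xml
<!-- RELAX Core hedge model fragment (labels are hypothetical) -->
<ref label="para" occurs="*"/>
<hedgeRef label="blocks" occurs="?"/>

<!-- A RELAX NG rewrite following items 1, 3, 5 and 6 -->
<zeroOrMore>
  <ref name="para"/>
</zeroOrMore>
<optional>
  <ref name="blocks"/>
</optional>
```

Both kinds of RELAX Core reference become a RELAX NG ref, and the occurrence indicator moves from an attribute to an enclosing element.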

B.1.5. Attribute declarations

Both languages use attribute. However, in RELAX Core, an attribute without required="true" declares a defaultable attribute. On the other hand, in RELAX NG, a defaultable attribute has to be declared by an attribute element within an optional element.

Declaration of a required attribute in RELAX Core is shown below:

<attribute name="foo" type="integer" required="true"/>

In RELAX NG, this is captured by:

<attribute name="foo">
  <data type="integer"/>
</attribute>

Declaration of an optional attribute in RELAX Core is shown below:

<attribute name="foo" type="integer"/>

In RELAX NG, this is captured by:

<optional>
  <attribute name="foo">
    <data type="integer"/>
  </attribute>
</optional>

B.2. Examples

B.2.1. Ancestor-and-sibling-sensitive content models

Here is a rewrite of an example in STEP 7 of "HOW TO RELAX". The first paragraph cannot contain footnotes, but the other paragraphs can.

<grammar>
  <start>
    <element name="doc">
      <ref name="paraWithoutFNotes"/>
      <zeroOrMore>
        <ref name="paraWithFNotes"/>
      </zeroOrMore>
    </element>
  </start>
  <define name="paraWithoutFNotes">
    <element name="para">
      <text/>
    </element>
  </define>
  <define name="paraWithFNotes">
    <element name="para">
      <mixed>
        <zeroOrMore>
          <element name="fnote">
            <text/>
          </element>
        </zeroOrMore>
      </mixed>
    </element>
  </define>
</grammar>

The following document matches this pattern:

<doc><para/><para><fnote/></para></doc>

On the other hand, the following document does not:

<doc><para><fnote/></para></doc>

B.2.2. Attribute-sensitive content model

Here is a rewrite of an example in STEP 8 of "HOW TO RELAX". This pattern assigns different content models for the same tag name div depending on the value of the attribute class.

<grammar>
  <start>
    <element name="html">
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </start>
  <define name="section">
    <element name="div">
      <attribute name="class"><value>section</value></attribute>
      <zeroOrMore>
        <element name="para">
          <text/>
        </element>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="subsection"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="subsection">
    <element name="div">
      <attribute name="class"><value>subsection</value></attribute>
      <zeroOrMore>
        <element name="para">
          <text/>
        </element>
      </zeroOrMore>
    </element>
  </define>
</grammar>

The following document matches this pattern:

<html>
  <div class="section">
    <para/>
    <div class="subsection">
      <para/>
    </div>
  </div>
  <div class="section">
    <div class="subsection">
      <para/>
    </div>
  </div>
</html>

On the other hand, the following document does not:

<html>
  <div class="subsection">
    <para/>
    <div class="section">
      <para/>
    </div>
  </div>
</html>

B.3. Features of RELAX NG beyond RELAX Core

RELAX NG has some features which are missing in RELAX Core.

  1. Namespaces: since RELAX Core is intended to be used in conjunction with RELAX Namespace, RELAX Core does not support namespaces. On the other hand, RELAX NG supports namespaces. RELAX Namespace will be extended so that it can work with RELAX NG.
  2. Mixture of element and attribute: RELAX Core does not allow their mixture but rather provides two types of basic constructs, namely elementRule/hedgeRule and tag/attPool.
  3. Name classes: RELAX Core does not have name classes but merely provides name literals.
  4. interleave: RELAX Core does not provide any mechanism for interleaving.
  5. Datatype libraries: RELAX Core allows XML Schema Part 2 but does not allow other datatype libraries.
  6. define in include: RELAX Core does not allow such redefinitions.
  7. list: RELAX Core does not provide such structured strings.
  8. data in choice: in RELAX Core, the hedge model of elementRule is either a datatype reference or an expression without datatype references.
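Items 7 and 8 can be illustrated with a single sketch. The element names size and auto are hypothetical; the datatypes are taken from the W3C XML Schema Datatypes library.

```xml
<element name="size" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <choice>
    <!-- data as one branch of a choice (item 8) -->
    <data type="integer"/>
    <!-- a whitespace-separated pair of numbers (item 7) -->
    <list>
      <data type="decimal"/>
      <data type="decimal"/>
    </list>
    <!-- an element as another branch of the same choice -->
    <element name="auto"><empty/></element>
  </choice>
</element>
```

Under these assumptions, <size>10</size>, <size>2.5 4</size> and <size><auto/></size> would each match.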

C. Comparison with TREX

RELAX NG has the following changes from TREX:

  1. the concur pattern has been removed
  2. the string pattern has been replaced by the value pattern
  3. the anyString pattern has been renamed to text
  4. the namespace URI is different
  5. pattern elements must be namespace qualified
  6. anonymous datatypes have been removed
  7. the data pattern can have parameters specified by param child elements
  8. the list pattern has been added for matching whitespace-separated lists of tokens
  9. the replace and group values for the combine attribute have been removed
  10. an include element in a grammar may contain define elements that replace included definitions
  11. the restriction that definitions combined with the combine attribute must be from different files has been removed
  12. a div element may be used to group together definitions within a grammar
  13. an include element occurring as a pattern has been renamed to externalRef; an include element is now allowed only as a child of the grammar element
  14. the parent attribute on the ref element has been replaced by a new parentRef element
  15. the type attribute of the data element is an unqualified name; the data element uses the datatypeLibrary attribute rather than the ns attribute to identify the namespace of the datatype
  16. a start element is not allowed to have a name attribute
  17. an attribute element is not allowed to have a global attribute
  18. the not and difference name classes have been replaced by except
  19. the data element may have an except child
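Several of the data-related changes (items 7, 15 and 19) can be seen together in the RELAX NG form below. This is only a sketch: the element name quantity and the particular values are hypothetical, and the datatype and its minInclusive parameter come from the W3C XML Schema Datatypes library.

```xml
<element name="quantity" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- type is an unqualified name; the library is identified by
       the datatypeLibrary attribute in scope (item 15) -->
  <data type="integer">
    <!-- a parameter passed to the datatype (item 7) -->
    <param name="minInclusive">1</param>
    <!-- values excluded from the datatype (item 19) -->
    <except>
      <value type="integer">13</value>
    </except>
  </data>
</element>
```

Under these assumptions, <quantity>5</quantity> would match, while <quantity>0</quantity> and <quantity>13</quantity> would not.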

D. Changes from 12 June 2001 version

  1. key and keyRef have been removed; support for ID and IDREF is now available in a companion specification, RELAX NG DTD Compatibility Annotations [Compatibility]
  2. difference and not have been replaced by except
  3. a start element is no longer allowed to have a name attribute
  4. an attribute element is no longer allowed to have a global attribute

References

Compatibility
James Clark, Makoto MURATA, editors. RELAX NG DTD Compatibility. OASIS, 2001.
RELAX
MURATA Makoto. RELAX (Regular Language description for XML). INSTAC (Information Technology Research and Standardization Center), 2001.
RELAX NG
James Clark, Makoto MURATA, editors. RELAX NG Specification. OASIS, 2001.
TREX
James Clark. TREX - Tree Regular Expressions for XML. Thai Open Source Software Center, 2001.
W3C XML Schema Datatypes
Paul V. Biron, Ashok Malhotra, editors. XML Schema Part 2: Datatypes. W3C (World Wide Web Consortium), 2001.
