Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.
The limited permissions granted above are perpetual and will not be revoked by OASIS or its successors or assigns.
This document and the information contained herein is provided on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
RELAX NG is a simple schema language for XML, based on [RELAX] and [TREX]. A RELAX NG schema specifies a pattern for the structure and content of an XML document. A RELAX NG schema thus identifies a class of XML documents consisting of those documents that match the pattern. A RELAX NG schema is itself an XML document.
This document is a tutorial for RELAX NG version 1.0.
This Committee Specification was approved for publication by the OASIS RELAX NG technical committee. It is a stable document which represents the consensus of the committee. Comments on this document may be sent to relax-ng-comment@lists.oasis-open.org.
A list of known errors in this document is available at http://www.oasis-open.org/committees/relax-ng/tutorial-20011203-errata.html.
Consider a simple XML representation of an email address book:
<addressBook>
  <card>
    <name>John Smith</name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>
The DTD would be as follows:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card (name, email)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
A RELAX NG pattern for this could be written as follows:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
If the addressBook is required to be non-empty, then we can use oneOrMore instead of zeroOrMore:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </oneOrMore>
</element>
Now let's change it to allow each card to have an optional note element:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <text/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>
Note that the text pattern matches arbitrary text, including empty text. Note also that whitespace separating tags is ignored when matching against a pattern.
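As an illustration (a made-up instance document, not one from the examples above), the following should match the pattern with the optional note element: the first name element is empty, which text allows; note is omitted in the first card and present in the second; and the indentation whitespace between tags is ignored.

```xml
<addressBook>
  <card>
    <!-- empty content still matches <text/> -->
    <name></name>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
    <!-- optional note present -->
    <note>Prefers email after 6pm</note>
  </card>
</addressBook>
```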
All the elements specifying the pattern must be namespace qualified by the namespace URI:
http://relaxng.org/ns/structure/1.0
The examples above use a default namespace declaration xmlns="http://relaxng.org/ns/structure/1.0" for this. A namespace prefix is equally acceptable:
<rng:element name="addressBook" xmlns:rng="http://relaxng.org/ns/structure/1.0">
  <rng:zeroOrMore>
    <rng:element name="card">
      <rng:element name="name">
        <rng:text/>
      </rng:element>
      <rng:element name="email">
        <rng:text/>
      </rng:element>
    </rng:element>
  </rng:zeroOrMore>
</rng:element>
For the remainder of this document, the default namespace declaration will be left out of examples.
Now suppose we want to allow the name to be broken down into a givenName and a familyName, allowing an addressBook like this:
<addressBook>
  <card>
    <givenName>John</givenName>
    <familyName>Smith</familyName>
    <email>js@example.com</email>
  </card>
  <card>
    <name>Fred Bloggs</name>
    <email>fb@example.net</email>
  </card>
</addressBook>
We can use the following pattern:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <group>
          <element name="givenName">
            <text/>
          </element>
          <element name="familyName">
            <text/>
          </element>
        </group>
      </choice>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <text/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>
This corresponds to the following DTD:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card ((name | (givenName, familyName)), email, note?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT givenName (#PCDATA)>
<!ELEMENT familyName (#PCDATA)>
<!ELEMENT note (#PCDATA)>
]>
Suppose we want the card element to have attributes rather than child elements. The DTD might look like this:
<!DOCTYPE addressBook [
<!ELEMENT addressBook (card*)>
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED>
]>
Just change each element pattern to an attribute pattern:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>
In XML, the order of attributes is traditionally not significant. RELAX NG follows this tradition. The above pattern would match both
<card name="John Smith" email="js@example.com"/>
and
<card email="js@example.com" name="John Smith"/>
In contrast, the order of elements is significant. The pattern
<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
</element>
would not match
<card><email>js@example.com</email><name>John Smith</name></card>
Note that an attribute element by itself indicates a required attribute, just as an element element by itself indicates a required element. To specify an optional attribute, use optional just as with element:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
      <optional>
        <attribute name="note">
          <text/>
        </attribute>
      </optional>
    </element>
  </zeroOrMore>
</element>
The group and choice patterns can be applied to attribute patterns in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <attribute name="name">
          <text/>
        </attribute>
        <group>
          <attribute name="givenName">
            <text/>
          </attribute>
          <attribute name="familyName">
            <text/>
          </attribute>
        </group>
      </choice>
      <attribute name="email">
        <text/>
      </attribute>
    </element>
  </zeroOrMore>
</element>
The group and choice patterns can combine element and attribute patterns without restriction. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <choice>
        <element name="name">
          <text/>
        </element>
        <attribute name="name">
          <text/>
        </attribute>
      </choice>
      <choice>
        <element name="email">
          <text/>
        </element>
        <attribute name="email">
          <text/>
        </attribute>
      </choice>
    </element>
  </zeroOrMore>
</element>
As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:
<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>
However, it would not match
<card><email>js@example.com</email><name>John Smith</name></card>
because the pattern for card requires any email child element to follow any name child element.
There is one difference between attribute and element patterns: <text/> is the default for the content of an attribute pattern, whereas an element pattern is not allowed to be empty. For example,
<attribute name="email"/>
is short for
<attribute name="email">
  <text/>
</attribute>
It might seem natural that
<element name="x"/>
matched an x element with no attributes and no content. However, this would make the meaning of empty content inconsistent between the element pattern and the attribute pattern, so RELAX NG does not allow the element pattern to be empty. A pattern that matches an element with no attributes and no children must use <empty/> explicitly:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="prefersHTML">
          <empty/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>
Even if the pattern in an element pattern matches attributes only, there is no need to use empty. For example,
<element name="card">
  <attribute name="email">
    <text/>
  </attribute>
</element>
is equivalent to
<element name="card">
  <attribute name="email">
    <text/>
  </attribute>
  <empty/>
</element>
For a non-trivial RELAX NG pattern, it is often convenient to be able to give names to parts of the pattern. Instead of
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
we can write
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>
A grammar element has a single start child element, and zero or more define child elements. The start and define elements contain patterns. These patterns can contain ref elements that refer to patterns defined by any of the define elements in that grammar element. A grammar pattern is matched by matching the pattern contained in the start element.
We can use the grammar element to write patterns in a style similar to DTDs:
<grammar>
  <start>
    <ref name="AddressBook"/>
  </start>
  <define name="AddressBook">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="Card"/>
      </zeroOrMore>
    </element>
  </define>
  <define name="Card">
    <element name="card">
      <ref name="Name"/>
      <ref name="Email"/>
    </element>
  </define>
  <define name="Name">
    <element name="name">
      <text/>
    </element>
  </define>
  <define name="Email">
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>
Recursive references are allowed. For example,
<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
      <element name="span">
        <optional>
          <attribute name="style"/>
        </optional>
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>
However, recursive references must be within an element. Thus, the following is not allowed:
<define name="inline">
  <choice>
    <text/>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
    <element name="span">
      <optional>
        <attribute name="style"/>
      </optional>
      <ref name="inline"/>
    </element>
  </choice>
  <optional>
    <ref name="inline"/>
  </optional>
</define>
RELAX NG allows patterns to reference externally-defined datatypes, such as those defined by [W3C XML Schema Datatypes]. RELAX NG implementations may differ in what datatypes they support. You must use datatypes that are supported by the implementation you plan to use.
The data pattern matches a string that represents a value of a named datatype. The datatypeLibrary attribute contains a URI identifying the library of datatypes being used. The datatype library defined by [W3C XML Schema Datatypes] would be identified by the URI http://www.w3.org/2001/XMLSchema-datatypes. The type attribute specifies the name of the datatype in the library identified by the datatypeLibrary attribute. For example, if a RELAX NG implementation supported the datatypes of [W3C XML Schema Datatypes], you could use:
<element name="number">
  <data type="integer" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
</element>
It is inconvenient to specify the datatypeLibrary attribute on every data element, so RELAX NG allows the datatypeLibrary attribute to be inherited. The datatypeLibrary attribute can be specified on any RELAX NG element. If a data element does not have a datatypeLibrary attribute, it will use the value from the closest ancestor that has a datatypeLibrary attribute. Typically, the datatypeLibrary attribute is specified on the root element of the RELAX NG pattern. For example,
<element name="point" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="x">
    <data type="double"/>
  </element>
  <element name="y">
    <data type="double"/>
  </element>
</element>
If the children of an element or an attribute match a data pattern, then the complete content of the element or attribute must match that data pattern. It is not permitted to have a pattern which allows part of the content to match a data pattern, and another part to match another pattern. For example, the following pattern is not allowed:
<element name="bad">
  <data type="int"/>
  <element name="note">
    <text/>
  </element>
</element>
However, this would be fine:
<element name="ok">
  <data type="int"/>
  <attribute name="note">
    <text/>
  </attribute>
</element>
Note that this restriction does not apply to the text pattern.
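For comparison, here is a sketch of a pattern that groups text with a child element. Because the restriction applies to data (and value) but not to text, it should be accepted; the element name alsoOk is hypothetical.

```xml
<!-- Allowed: text, unlike data, may share content with a child element -->
<element name="alsoOk">
  <text/>
  <element name="note">
    <text/>
  </element>
</element>
```

This would match, for example, <alsoOk>42<note>checked</note></alsoOk>, although the 42 is then just arbitrary text rather than a validated int.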
Datatypes may have parameters. For example, a string datatype may have a parameter controlling the length of the string. The parameters applicable to any particular datatype are determined by the datatyping vocabulary. Parameters are specified by adding one or more param elements as children of the data element. For example, the following constrains the email element to contain a string at most 127 characters long:
<element name="email">
  <data type="string">
    <param name="maxLength">127</param>
  </data>
</element>
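Several param elements can be combined on one data element. The sketch below assumes, as the example above implicitly does, that the implementation supports [W3C XML Schema Datatypes], where minLength and maxLength are facets of string; the lower bound of 3 is an arbitrary value chosen only for illustration.

```xml
<element name="email"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- both facets must hold: between 3 and 127 characters -->
  <data type="string">
    <param name="minLength">3</param>
    <param name="maxLength">127</param>
  </data>
</element>
```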
Many markup vocabularies have attributes whose value is constrained to be one of a set of specified values. The value pattern matches a string that has a specified value. For example,
<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </attribute>
</element>
allows the preferredFormat attribute to have the value html or text. This corresponds to the DTD:
<!DOCTYPE card [
<!ELEMENT card EMPTY>
<!ATTLIST card
  name CDATA #REQUIRED
  email CDATA #REQUIRED
  preferredFormat (html|text) #REQUIRED>
]>
The value pattern is not restricted to attribute values. For example, the following is allowed:
<element name="card">
  <element name="name">
    <text/>
  </element>
  <element name="email">
    <text/>
  </element>
  <element name="preferredFormat">
    <choice>
      <value>html</value>
      <value>text</value>
    </choice>
  </element>
</element>
The prohibition against a data pattern's matching only part of the content of an element also applies to value patterns.
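For example, by analogy with the bad pattern shown earlier for data, the following sketch is not allowed, because the value would match only part of the content of the element; the element name badFormat is hypothetical.

```xml
<!-- Not allowed: value shares the content of badFormat with a child element -->
<element name="badFormat">
  <value>html</value>
  <element name="note">
    <text/>
  </element>
</element>
```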
By default, the value pattern will consider the string in the pattern to match the string in the document if the two strings are the same after the whitespace in both strings is normalized. Whitespace normalization strips leading and trailing whitespace characters, and collapses sequences of one or more whitespace characters to a single space character. This corresponds to the behaviour of an XML parser for an attribute that is declared as other than CDATA. Thus the earlier pattern that declares preferredFormat as an attribute will match both of:
<card name="John Smith" email="js@example.com" preferredFormat="html"/>
<card name="John Smith" email="js@example.com" preferredFormat=" html "/>
The way that the value pattern compares the pattern string with the document string can be controlled by specifying a type attribute and optionally a datatypeLibrary attribute, which identify a datatype in the same way as for the data pattern. The pattern string matches the document string if they both represent the same value of the specified datatype. Thus, whereas the data pattern matches an arbitrary value of a datatype, the value pattern matches a specific value of a datatype.
If there is no ancestor element with a datatypeLibrary attribute, the datatype library defaults to a built-in RELAX NG datatype library. This provides two datatypes, string and token. The built-in datatype token corresponds to the default comparison behavior of the value pattern. The built-in datatype string compares strings without any whitespace normalization (other than the end-of-line and attribute value normalization automatically performed by XML). For example,
<element name="card">
  <attribute name="name"/>
  <attribute name="email"/>
  <attribute name="preferredFormat">
    <choice>
      <value type="string">html</value>
      <value type="string">text</value>
    </choice>
  </attribute>
</element>
will not match
<card name="John Smith" email="js@example.com" preferredFormat=" html "/>
The list pattern matches a whitespace-separated sequence of tokens; it contains a pattern that the sequence of individual tokens must match. The list pattern splits a string into a list of strings, and then matches the resulting list of strings against the pattern inside the list pattern.
For example, suppose we want to have a vector element that contains two floating point numbers separated by whitespace. We could use list as follows:
<element name="vector">
  <list>
    <data type="float"/>
    <data type="float"/>
  </list>
</element>
Or suppose we want the vector element to contain a list of one or more floating point numbers separated by whitespace:
<element name="vector">
  <list>
    <oneOrMore>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>
Or suppose we want a path element containing an even number of floating point numbers:
<element name="path">
  <list>
    <oneOrMore>
      <data type="double"/>
      <data type="double"/>
    </oneOrMore>
  </list>
</element>
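The pattern inside list can also use value. As a hypothetical example (the element name textStyle and its keywords are made up), the following should accept a whitespace-separated list of zero or more of the keywords bold and italic, such as <textStyle>bold italic</textStyle>:

```xml
<element name="textStyle">
  <list>
    <!-- each token produced by splitting the content must be one of these -->
    <zeroOrMore>
      <choice>
        <value>bold</value>
        <value>italic</value>
      </choice>
    </zeroOrMore>
  </list>
</element>
```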
The interleave pattern allows child elements to occur in any order. For example, the following would allow the card element to contain the name and email elements in any order:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <interleave>
        <element name="name">
          <text/>
        </element>
        <element name="email">
          <text/>
        </element>
      </interleave>
    </element>
  </zeroOrMore>
</element>
The pattern is called interleave because of how it works with patterns that match more than one element. Suppose we want to write a pattern for the HTML head element which requires exactly one title element, at most one base element and zero or more style, script, link and meta elements and suppose we are writing a grammar pattern that has one definition for each element. Then we could define the pattern for head as follows:
<define name="head">
  <element name="head">
    <interleave>
      <ref name="title"/>
      <optional>
        <ref name="base"/>
      </optional>
      <zeroOrMore>
        <ref name="style"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="script"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="link"/>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="meta"/>
      </zeroOrMore>
    </interleave>
  </element>
</define>
Suppose we had a head element that contained a meta element, followed by a title element, followed by a meta element. This would match the pattern because it is an interleaving of a sequence of two meta elements, which match the child pattern
<zeroOrMore>
  <ref name="meta"/>
</zeroOrMore>
and a sequence of one title element, which matches the child pattern
<ref name="title"/>
The semantics of the interleave pattern are that a sequence of elements matches an interleave pattern if it is an interleaving of sequences that match the child patterns of the interleave pattern. Note that this is different from the & connector in SGML: A* & B matches the sequence of elements A A B or the sequence of elements B A A but not the sequence of elements A B A.
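The A* & B case can be written with interleave. In this sketch the element names root, A and B are placeholders, and A and B are declared empty. Unlike the SGML & connector, the pattern should match the sequence A B A, for example <root><A/><B/><A/></root>, as well as A A B and B A A.

```xml
<element name="root">
  <interleave>
    <!-- A*: zero or more empty A elements -->
    <zeroOrMore>
      <element name="A">
        <empty/>
      </element>
    </zeroOrMore>
    <!-- B: exactly one empty B element, anywhere among the A elements -->
    <element name="B">
      <empty/>
    </element>
  </interleave>
</element>
```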
One special case of interleave is very common: interleaving <text/> with a pattern p represents a pattern that matches what p matches but also allows characters to occur as children. The mixed element is a shorthand for this.
<mixed>
  p
</mixed>
is short for
<interleave>
  <text/>
  p
</interleave>
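As a small illustration (the para and em element names are placeholders), the following allows para to contain text freely mixed with any number of em elements, so that <para>Some <em>emphasized</em> text.</para> should match:

```xml
<element name="para">
  <!-- shorthand for interleaving <text/> with the zeroOrMore below -->
  <mixed>
    <zeroOrMore>
      <element name="em">
        <text/>
      </element>
    </zeroOrMore>
  </mixed>
</element>
```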
The externalRef pattern can be used to reference a pattern defined in a separate file. The externalRef element has a required href attribute that specifies the URL of a file containing the pattern. The externalRef matches if the pattern contained in the specified URL matches. Suppose, for example, you have a RELAX NG pattern that matches HTML inline content stored in inline.rng:
<grammar>
  <start>
    <ref name="inline"/>
  </start>
  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="code">
          <ref name="inline"/>
        </element>
        <element name="em">
          <ref name="inline"/>
        </element>
        <!-- etc -->
      </choice>
    </zeroOrMore>
  </define>
</grammar>
Then we could allow the note element to contain inline HTML markup by using externalRef as follows:
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
      <optional>
        <element name="note">
          <externalRef href="inline.rng"/>
        </element>
      </optional>
    </element>
  </zeroOrMore>
</element>
For another example, suppose you have two RELAX NG patterns stored in files pattern1.rng and pattern2.rng. Then the following is a pattern that matches anything matched by either of those patterns:
<choice>
  <externalRef href="pattern1.rng"/>
  <externalRef href="pattern2.rng"/>
</choice>
If a grammar contains multiple definitions with the same name, then the definitions must specify how they are to be combined into a single definition by using the combine attribute. The combine attribute may have the value choice or interleave. For example,
<define name="inline.class" combine="choice">
  <element name="bold">
    <ref name="inline"/>
  </element>
</define>
<define name="inline.class" combine="choice">
  <element name="italic">
    <ref name="inline"/>
  </element>
</define>
is equivalent to
<define name="inline.class">
  <choice>
    <element name="bold">
      <ref name="inline"/>
    </element>
    <element name="italic">
      <ref name="inline"/>
    </element>
  </choice>
</define>
When combining attributes, combine="interleave" is typically used. For example,
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist" combine="interleave">
    <attribute name="name">
      <text/>
    </attribute>
  </define>
  <define name="card.attlist" combine="interleave">
    <attribute name="email">
      <text/>
    </attribute>
  </define>
</grammar>
is equivalent to
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist">
    <interleave>
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </interleave>
  </define>
</grammar>
which is equivalent to
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="card.attlist"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="card.attlist">
    <group>
      <attribute name="name">
        <text/>
      </attribute>
      <attribute name="email">
        <text/>
      </attribute>
    </group>
  </define>
</grammar>
since combining attributes with interleave has the same effect as combining them with group.
It is an error for two definitions of the same name to specify different values for combine. Note that the order of definitions within a grammar is not significant.
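For instance, a grammar containing the following pair of definitions would be in error, because one asks for choice and the other for interleave:

```xml
<!-- Error: the two definitions of card.attlist disagree on combine -->
<define name="card.attlist" combine="choice">
  <attribute name="name"/>
</define>
<define name="card.attlist" combine="interleave">
  <attribute name="email"/>
</define>
```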
Multiple start elements can be combined in the same way as multiple definitions.
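For example, this sketch (reusing the attribute form of card from earlier, with a hypothetical Card definition) uses combine="choice" on two start elements so that a document may have either an addressBook or a single card as its root element:

```xml
<grammar>
  <!-- the two start patterns are combined with choice -->
  <start combine="choice">
    <element name="addressBook">
      <zeroOrMore>
        <ref name="Card"/>
      </zeroOrMore>
    </element>
  </start>
  <start combine="choice">
    <ref name="Card"/>
  </start>
  <define name="Card">
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </define>
</grammar>
```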
The include element allows grammars to be merged together. A grammar pattern may have include elements as children. An include element has a required href attribute that specifies the URL of a file containing a grammar pattern. The definitions in the referenced grammar pattern will be included in the grammar pattern containing the include element.
The combine attribute is particularly useful in conjunction with include. For example, suppose a RELAX NG pattern inline.rng provides a pattern for inline content, which allows bold and italic elements arbitrarily nested:
<grammar>
  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>
  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>
Another RELAX NG pattern could use inline.rng and add code and em to the set of inline elements as follows:
<grammar>
  <include href="inline.rng"/>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>
This would be equivalent to
<grammar>
  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>
  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline.class" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>
which is equivalent to
<grammar>
  <define name="inline">
    <zeroOrMore>
      <ref name="inline.class"/>
    </zeroOrMore>
  </define>
  <define name="inline.class">
    <choice>
      <text/>
      <element name="bold">
        <ref name="inline"/>
      </element>
      <element name="italic">
        <ref name="inline"/>
      </element>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
</grammar>
Note that it is allowed for one of the definitions of a name to omit the combine attribute. However, it is an error if there is more than one definition that does so.
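Thus the choice between bold and italic shown earlier for inline.class could equally be written with combine on only the second definition:

```xml
<!-- first definition: no combine attribute (allowed for at most one) -->
<define name="inline.class">
  <element name="bold">
    <ref name="inline"/>
  </element>
</define>
<define name="inline.class" combine="choice">
  <element name="italic">
    <ref name="inline"/>
  </element>
</define>
```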
The notAllowed pattern is useful when merging grammars. The notAllowed pattern never matches anything. Just as adding empty to a group makes no difference, so adding notAllowed to a choice makes no difference. It is typically used to allow an including pattern to specify additional choices with combine="choice". For example, if inline.rng were written like this:
<grammar>
  <define name="inline">
    <zeroOrMore>
      <choice>
        <text/>
        <element name="bold">
          <ref name="inline"/>
        </element>
        <element name="italic">
          <ref name="inline"/>
        </element>
        <ref name="inline.extra"/>
      </choice>
    </zeroOrMore>
  </define>
  <define name="inline.extra">
    <notAllowed/>
  </define>
</grammar>
then it could be customized to allow inline code and em elements as follows:
<grammar>
  <include href="inline.rng"/>
  <start>
    <element name="doc">
      <zeroOrMore>
        <element name="p">
          <ref name="inline"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="inline.extra" combine="choice">
    <choice>
      <element name="code">
        <ref name="inline"/>
      </element>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </define>
</grammar>
RELAX NG allows define elements to be put inside the include element to indicate that they are to replace definitions in the included grammar pattern.
Suppose the file addressBook.rng contains:
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="email">
      <text/>
    </element>
  </define>
</grammar>
Suppose we wish to modify this pattern so that the card element contains an emailAddress element instead of an email element. Then we could replace the definition of cardContent as follows:
<grammar>
  <include href="addressBook.rng">
    <define name="cardContent">
      <element name="name">
        <text/>
      </element>
      <element name="emailAddress">
        <text/>
      </element>
    </define>
  </include>
</grammar>
This would be equivalent to
<grammar>
  <start>
    <element name="addressBook">
      <zeroOrMore>
        <element name="card">
          <ref name="cardContent"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
  <define name="cardContent">
    <element name="name">
      <text/>
    </element>
    <element name="emailAddress">
      <text/>
    </element>
  </define>
</grammar>
An include element can also contain a start element, which replaces the start in the included grammar pattern.
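For example, assuming the addressBook.rng shown above, the following sketch replaces its start so that a single card, rather than an addressBook, is the root element, while still reusing the included cardContent definition:

```xml
<grammar>
  <include href="addressBook.rng">
    <!-- replaces the start defined in addressBook.rng -->
    <start>
      <element name="card">
        <ref name="cardContent"/>
      </element>
    </start>
  </include>
</grammar>
```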
RELAX NG is namespace-aware. Thus, it considers an element or attribute to have both a local name and a namespace URI which together constitute the name of that element or attribute.
The element pattern uses an ns attribute to specify the namespace URI of the elements that it matches. For example,
<element name="foo" ns="http://www.example.com">
  <empty/>
</element>
would match any of:
<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
<example:foo xmlns:example="http://www.example.com"/>
but not any of:
<foo/>
<e:foo xmlns:e="http://WWW.EXAMPLE.COM"/>
<example:foo xmlns:example="http://www.example.net"/>
A value of an empty string for the ns attribute indicates a null or absent namespace URI (just as with the xmlns attribute). Thus, the pattern
<element name="foo" ns="">
  <empty/>
</element>
matches any of:
<foo xmlns=""/>
<foo/>
but not any of:
<foo xmlns="http://www.example.com"/>
<e:foo xmlns:e="http://www.example.com"/>
It is tedious and error-prone to specify the ns attribute on every element, so RELAX NG allows it to be defaulted. If an element pattern does not specify an ns attribute, then it defaults to the value of the ns attribute of the nearest ancestor that has an ns attribute, or the empty string if there is no such ancestor. Thus,
<element name="addressBook">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
is equivalent to
<element name="addressBook" ns="">
  <zeroOrMore>
    <element name="card" ns="">
      <element name="name" ns="">
        <text/>
      </element>
      <element name="email" ns="">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
and
<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
is equivalent to
<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <element name="name" ns="http://www.example.com">
        <text/>
      </element>
      <element name="email" ns="http://www.example.com">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
The attribute pattern also takes an ns attribute. However, there is a difference in how it defaults. This is because of the fact that the XML Namespaces Recommendation does not apply the default namespace to attributes. If an ns attribute is not specified on the attribute pattern, then it defaults to the empty string. Thus,
<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card">
      <attribute name="name"/>
      <attribute name="email"/>
    </element>
  </zeroOrMore>
</element>
is equivalent to
<element name="addressBook" ns="http://www.example.com">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com">
      <attribute name="name" ns=""/>
      <attribute name="email" ns=""/>
    </element>
  </zeroOrMore>
</element>
and so will match
<addressBook xmlns="http://www.example.com">
  <card name="John Smith" email="js@example.com"/>
</addressBook>
or
<example:addressBook xmlns:example="http://www.example.com">
  <example:card name="John Smith" email="js@example.com"/>
</example:addressBook>
but not
<example:addressBook xmlns:example="http://www.example.com"> <example:card example:name="John Smith" example:email="js@example.com"/></example:addressBook>
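A namespace-aware parser shows why the last document fails. Python's standard xml.etree.ElementTree reports a namespaced name as {uri}local and gives an unprefixed attribute no namespace. The function matches_card_pattern below is a hypothetical hand-coded check of this one pattern, not a general validator.

```python
import xml.etree.ElementTree as ET

EX = "{http://www.example.com}"


def matches_card_pattern(doc_text):
    """Hand-coded check of: addressBook (in EX) containing zero or more card
    elements (in EX), each with exactly the attributes name and email in no
    namespace."""
    root = ET.fromstring(doc_text)
    if root.tag != EX + "addressBook":
        return False
    # Unprefixed attributes never pick up the default namespace, so their keys
    # are plain "name"/"email"; prefixed ones become "{http://www.example.com}name".
    return all(
        card.tag == EX + "card" and set(card.attrib) == {"name", "email"}
        for card in root
    )
```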
When a pattern matches elements and attributes from multiple namespaces, using the ns attribute would require repeating namespace URIs in different places in the pattern. This is error-prone and hard to maintain, so RELAX NG also allows the element and attribute patterns to use a prefix in the value of the name attribute to specify the namespace URI. In this case, the prefix specifies the namespace URI to which that prefix is bound by the namespace declarations in scope on the element or attribute pattern. Thus,
<element name="ab:addressBook" xmlns:ab="http://www.example.com/addressBook"
         xmlns:a="http://www.example.com/address">
  <zeroOrMore>
    <element name="ab:card">
      <element name="a:name">
        <text/>
      </element>
      <element name="a:email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
is equivalent to
<element name="addressBook" ns="http://www.example.com/addressBook">
  <zeroOrMore>
    <element name="card" ns="http://www.example.com/addressBook">
      <element name="name" ns="http://www.example.com/address">
        <text/>
      </element>
      <element name="email" ns="http://www.example.com/address">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
If a prefix is specified in the value of the name attribute of an element or attribute pattern, then that prefix determines the namespace URI of the elements or attributes that will be matched by that pattern, regardless of the value of any ns attribute.
Note that the XML default namespace (as specified by the xmlns attribute) is not used in determining the namespace URI of elements and attributes that element and attribute patterns match.
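The rules for the name attribute can be collected into one function, sketched here in Python. The function and its parameters are hypothetical, chosen for illustration. The caller passes in the prefix bindings in scope on the pattern, and deliberately leaves out any xmlns="..." default namespace. The xml prefix is always bound to the XML namespace, so the sketch adds that binding itself.

```python
XML_NS = "http://www.w3.org/XML/1998/namespace"  # always bound to the "xml" prefix


def resolve_pattern_name(name, kind, prefixes, ns=None, inherited_ns=""):
    """Return (namespace URI, local name) matched by an element or attribute pattern.

    name         -- value of the pattern's name attribute, e.g. "ab:card" or "card"
    kind         -- "element" or "attribute"
    prefixes     -- prefix -> URI bindings in scope on the pattern; the default
                    namespace from xmlns="..." is deliberately not included
    ns           -- the pattern's own ns attribute, or None if absent
    inherited_ns -- ns inherited from the nearest ancestor (used for elements only)
    """
    if ":" in name:
        prefix, local = name.split(":", 1)
        # A prefix determines the URI regardless of any ns attribute.
        return {"xml": XML_NS, **prefixes}[prefix], local
    if ns is not None:
        return ns, name
    # No prefix and no ns: elements inherit, attributes default to "".
    return (inherited_ns if kind == "element" else ""), name
```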
Normally, the name of the element to be matched by an element element is specified by a name attribute. An element element can instead start with an element specifying a name-class. In this case, the element pattern will only match an element if the name of the element is a member of the name-class. The simplest name-class is anyName, of which every name is a member, regardless of its local name and its namespace URI. For example, the following pattern matches any well-formed XML document:
<grammar>
  <start>
    <ref name="anyElement"/>
  </start>

  <define name="anyElement">
    <element>
      <anyName/>
      <zeroOrMore>
        <choice>
          <attribute>
            <anyName/>
          </attribute>
          <text/>
          <ref name="anyElement"/>
        </choice>
      </zeroOrMore>
    </element>
  </define>
</grammar>
The nsName name-class contains any name with the namespace URI specified by the ns attribute, which defaults in the same way as the ns attribute on the element pattern.
The choice name-class matches any name that is a member of any of its child name-classes.
The anyName and nsName name-classes can contain an except clause. For example,
<element name="card" ns="http://www.example.com">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <nsName/>
          <nsName ns=""/>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
  <text/>
</element>
would allow the card element to have any number of namespace-qualified attributes, provided that they were qualified with a namespace other than that of the card element.
Note that an attribute pattern matches a single attribute even if it has a name-class that contains multiple names. To match zero or more attributes, the zeroOrMore element must be used.
The name name-class contains a single name. The content of the name element specifies the name in the same way as the name attribute of the element pattern. The ns attribute specifies the namespace URI in the same way as the element pattern.
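The membership rules for the four name-classes are simple enough to model directly. The Python below is a hypothetical model, not the API of any RELAX NG library. A name is a (namespace URI, local name) pair, and each class answers contains. When except has several children, the sketch combines them as a choice, which is how RELAX NG treats them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

QName = Tuple[str, str]  # (namespace URI, local name)


@dataclass(frozen=True)
class AnyName:
    except_: Optional[object] = None  # another name-class, or None

    def contains(self, n: QName) -> bool:
        return self.except_ is None or not self.except_.contains(n)


@dataclass(frozen=True)
class NsName:
    ns: str
    except_: Optional[object] = None

    def contains(self, n: QName) -> bool:
        return n[0] == self.ns and (self.except_ is None or not self.except_.contains(n))


@dataclass(frozen=True)
class Name:
    ns: str
    local: str

    def contains(self, n: QName) -> bool:
        return n == (self.ns, self.local)


@dataclass(frozen=True)
class Choice:
    options: Tuple[object, ...]

    def contains(self, n: QName) -> bool:
        return any(o.contains(n) for o in self.options)


# The name-class from the card example: any name except those in the card
# element's own namespace (the inherited ns of <nsName/>) or in no namespace.
EX = "http://www.example.com"
card_attrs = AnyName(except_=Choice((NsName(EX), NsName(""))))
```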
Some schema languages have a concept of lax validation, where an element or attribute is validated against a definition only if there is one. We can implement this concept in RELAX NG with name-classes that use except and name. Suppose, for example, we wanted to allow an element to have any attribute with a qualified name, but we still wanted to ensure that if there was an xml:space attribute, it had the value default or preserve. It wouldn't work to use
<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName/>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>
because an xml:space attribute with a value other than default or preserve would match
<attribute>
  <anyName/>
</attribute>
even though it did not match
<attribute name="xml:space">
  <choice>
    <value>default</value>
    <value>preserve</value>
  </choice>
</attribute>
The solution is to use name together with except:
<element name="example">
  <zeroOrMore>
    <attribute>
      <anyName>
        <except>
          <name>xml:space</name>
        </except>
      </anyName>
    </attribute>
  </zeroOrMore>
  <optional>
    <attribute name="xml:space">
      <choice>
        <value>default</value>
        <value>preserve</value>
      </choice>
    </attribute>
  </optional>
</element>
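The difference between the two schemas can be simulated by modelling each attribute pattern as a pair of tests: one on the attribute's name, one on its value. This Python sketch is hypothetical and deliberately simplified. Because every attribute pattern here sits inside zeroOrMore or optional, it ignores cardinality. It simply asks whether some pattern that accepts the attribute's name also accepts its value.

```python
XML_SPACE = ("http://www.w3.org/XML/1998/namespace", "space")

# Each attribute pattern is modelled as (name-class test, value test).
wildcard = (lambda n: True, lambda v: True)                          # anyName
wildcard_except_space = (lambda n: n != XML_SPACE, lambda v: True)   # anyName except xml:space
space = (lambda n: n == XML_SPACE, lambda v: v in ("default", "preserve"))

FIRST_ATTEMPT = [wildcard, space]
SOLUTION = [wildcard_except_space, space]


def attributes_ok(attrs, patterns):
    """attrs maps (namespace URI, local name) -> value.

    An attribute is accepted if some pattern whose name-class contains its
    name also accepts its value.
    """
    return all(
        any(name_ok(n) and value_ok(v) for name_ok, value_ok in patterns)
        for n, v in attrs.items()
    )
```

With FIRST_ATTEMPT, an xml:space value of "bogus" is still accepted, because the wildcard pattern accepts it. With SOLUTION, the only pattern whose name-class contains xml:space is the restricted one, so the bad value is rejected.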
Note that the define element cannot contain a name-class; it can only contain a pattern.
If a RELAX NG element has an attribute or child element with a namespace URI other than the RELAX NG namespace, then that attribute or element is ignored. Thus, you can add annotations to RELAX NG patterns simply by using an attribute or element in a separate namespace:
<element name="addressBook" xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:a="http://www.example.com/annotation">
  <zeroOrMore>
    <element name="card">
      <a:documentation>Information about a single email address.</a:documentation>
      <element name="name">
        <text/>
      </element>
      <element name="email">
        <text/>
      </element>
    </element>
  </zeroOrMore>
</element>
RELAX NG also provides a div element which allows an annotation to be applied to a group of definitions in a grammar. For example, you might want to divide up the definitions of the grammar into modules:
<grammar xmlns:m="http://www.example.com/module">

  <div m:name="inline">
    <define name="code">
      pattern
    </define>
    <define name="em">
      pattern
    </define>
    <define name="var">
      pattern
    </define>
  </div>

  <div m:name="block">
    <define name="p">
      pattern
    </define>
    <define name="ul">
      pattern
    </define>
    <define name="ol">
      pattern
    </define>
  </div>

</grammar>
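A tool that ignores annotations can be approximated by deleting them before it processes the schema. The Python sketch below uses the standard xml.etree.ElementTree and a hypothetical function strip_annotations. It removes every child element outside the RELAX NG namespace. As a simplification, it also treats every namespace-qualified attribute as foreign, since RELAX NG's own attributes (name, ns and so on) are unqualified.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"


def strip_annotations(elem):
    """Remove foreign attributes and child elements in place; return elem."""
    # Qualified attributes appear in ElementTree as "{uri}local".
    for key in [k for k in elem.attrib if k.startswith("{")]:
        del elem.attrib[key]
    for child in list(elem):
        if child.tag.startswith("{" + RNG + "}"):
            strip_annotations(child)
        else:
            elem.remove(child)  # element outside the RELAX NG namespace
    return elem
```

Applied to the annotated address-book schema, this leaves the card pattern with just its two element children.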
This would allow you to easily generate variants of the grammar based on a selection of modules.
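Generating a variant could be as simple as dropping the div modules that were not selected. The Python sketch below is one hypothetical way to do it with the standard xml.etree.ElementTree. It treats m:name as the module key, following the example's convention, and only looks at div elements that are direct children of grammar. For the test, the pattern placeholders are replaced by <text/>.

```python
import xml.etree.ElementTree as ET

M = "{http://www.example.com/module}"  # the annotation namespace from the example


def select_modules(grammar_text, keep):
    """Parse the grammar and drop every top-level <div> whose m:name is not in
    keep. Definitions outside any div are left unchanged."""
    grammar = ET.fromstring(grammar_text)
    for div in [c for c in grammar if c.tag.rsplit("}", 1)[-1] == "div"]:
        if div.get(M + "name") not in keep:
            grammar.remove(div)
    return grammar
```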
A companion specification, RELAX NG DTD Compatibility[Compatibility], defines annotations to implementsome features of XML DTDs.
There is no prohibition against nesting grammar patterns. A ref pattern refers to a definition from the nearest grammar ancestor. There is also a parentRef element that escapes out of the current grammar and references a definition from the parent of the current grammar.
Imagine the problem of writing a pattern for tables. The pattern for tables only cares about the structure of tables; it doesn't care about what goes inside a table cell. First, we create a RELAX NG pattern table.rng as follows:
<grammar>

<define name="cell.content">
  <notAllowed/>
</define>

<start>
  <element name="table">
    <oneOrMore>
      <element name="tr">
        <oneOrMore>
          <element name="td">
            <ref name="cell.content"/>
          </element>
        </oneOrMore>
      </element>
    </oneOrMore>
  </element>
</start>

</grammar>
Patterns that include table.rng must redefine cell.content. By using a nested grammar pattern containing a parentRef pattern, the including pattern can redefine cell.content to be a pattern defined in the including pattern's grammar, thus effectively importing a pattern from the parent grammar into the child grammar:
<grammar>

<start>
  <element name="doc">
    <zeroOrMore>
      <choice>
        <element name="p">
          <ref name="inline"/>
        </element>
        <grammar>
          <include href="table.rng">
            <define name="cell.content">
              <parentRef name="inline"/>
            </define>
          </include>
        </grammar>
      </choice>
    </zeroOrMore>
  </element>
</start>

<define name="inline">
  <zeroOrMore>
    <choice>
      <text/>
      <element name="em">
        <ref name="inline"/>
      </element>
    </choice>
  </zeroOrMore>
</define>

</grammar>
Of course, in a trivial case like this, there is no advantage in nesting the grammars: we could simply have included table.rng within the outer grammar element. However, when the included grammar has many definitions, nesting it avoids the possibility of name conflicts between the including grammar and the included grammar.
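The scoping rule for ref and parentRef can be sketched as a walk that keeps a stack of grammars. In this Python sketch (standard xml.etree.ElementTree; the function names are hypothetical), a ref looks up its name in the innermost grammar and a parentRef in the grammar enclosing it. The sketch does not follow include's href, so it only sees definitions written inline.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    return tag.rsplit("}", 1)[-1]


def own_defines(grammar):
    """Names defined by this grammar itself: define elements that are children of
    the grammar or of a div/include inside it (include's href is NOT followed)."""
    names, pending = set(), list(grammar)
    while pending:
        node = pending.pop()
        tag = local_name(node.tag)
        if tag == "define":
            names.add(node.get("name"))
        elif tag in ("div", "include"):
            pending.extend(node)
    return names


def resolve_refs(schema_text):
    """Return (kind, name, level) for each ref/parentRef in document order, where
    level is the depth (0 = outermost) of the grammar it resolves in, or None."""
    found = []

    def walk(node, scopes):
        tag = local_name(node.tag)
        if tag == "grammar":
            scopes = scopes + [own_defines(node)]
        elif tag in ("ref", "parentRef"):
            # ref looks in the nearest grammar; parentRef in the one enclosing it.
            level = len(scopes) - (1 if tag == "ref" else 2)
            name = node.get("name")
            ok = level >= 0 and name in scopes[level]
            found.append((tag, name, level if ok else None))
        for child in node:
            walk(child, scopes)

    walk(ET.fromstring(schema_text), [])
    return found
```

For the document grammar above, both refs to inline and the parentRef resolve at level 0, the outer grammar. A plain ref to inline inside the nested grammar would not resolve, because that grammar has no definition named inline.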
RELAX NG does not require patterns to be "deterministic" or"unambiguous".
Suppose we wanted to write the email address book in HTML, but use class attributes to specify the structure:
<element name="html">
  <element name="head">
    <element name="title">
      <text/>
    </element>
  </element>
  <element name="body">
    <element name="table">
      <attribute name="class">
        <value>addressBook</value>
      </attribute>
      <oneOrMore>
        <element name="tr">
          <attribute name="class">
            <value>card</value>
          </attribute>
          <element name="td">
            <attribute name="class">
              <value>name</value>
            </attribute>
            <interleave>
              <text/>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>givenName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
              <optional>
                <element name="span">
                  <attribute name="class">
                    <value>familyName</value>
                  </attribute>
                  <text/>
                </element>
              </optional>
            </interleave>
          </element>
          <element name="td">
            <attribute name="class">
              <value>email</value>
            </attribute>
            <text/>
          </element>
        </element>
      </oneOrMore>
    </element>
  </element>
</element>
This would match an XML document such as:
<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <span class="familyName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>
but not:
<html>
  <head>
    <title>Example Address Book</title>
  </head>
  <body>
    <table class="addressBook">
      <tr class="card">
        <td class="name">
          <span class="givenName">John</span>
          <!-- Note the incorrect class attribute -->
          <span class="givenName">Smith</span>
        </td>
        <td class="email">js@example.com</td>
      </tr>
    </table>
  </body>
</html>
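Both span patterns use the same element name and differ only in the value of their class attribute. A validator therefore cannot tell from the element name alone which optional pattern a span belongs to; it has to look at the attribute. The hypothetical Python check below hand-codes just the content of the name cell: text interleaved with at most one givenName span and at most one familyName span. It shows that the decision is still straightforward. It is a sketch, not a description of how RELAX NG validators work internally.

```python
import xml.etree.ElementTree as ET


def name_cell_ok(td):
    """Check <td class="name">: text interleaved with at most one span of each
    class, givenName and familyName, in either order."""
    if td.tag != "td" or td.attrib != {"class": "name"}:
        return False
    classes = []
    for span in td:
        # The element name is always "span"; only the class attribute tells the
        # two optional patterns apart. Spans may contain text but no elements.
        if span.tag != "span" or len(span) or set(span.attrib) != {"class"}:
            return False
        classes.append(span.get("class"))
    allowed = {"givenName", "familyName"}
    return set(classes) <= allowed and len(classes) == len(set(classes))
```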
RELAX NG provides functionality that goes beyond XML DTDs. In particular, RELAX NG
ID/IDREF validation is not provided by RELAX NG; however, it is provided by a companion specification, RELAX NG DTD Compatibility [Compatibility]. Comprehensive support for cross-reference checking is planned for a future specification.
RELAX NG does not support features of XML DTDs that involve changing the infoset of an XML document. In particular, RELAX NG
Also, RELAX NG does not define a way for an XML document to associate itself with a RELAX NG pattern.
Any description in RELAX Core can be directly captured in RELAX NG without loss of information.
An elementRule, together with the tag element it references, is typically captured by a define element containing an element element as its child.
An elementRule-tag pair in RELAX Core is shown below:
<elementRule role="foo" label="bar">hedge model</elementRule>
<tag role="foo" name="baz">attribute declarations</tag>
A rewrite in RELAX NG is shown below:
<define name="bar">
  <element name="baz">
    hedge model
    attribute declarations
  </element>
</define>
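The pairing can be mechanised: look up the tag whose role matches the elementRule's role, then nest its content under a define named by the label. The Python sketch below uses the standard xml.etree.ElementTree and a hypothetical function name. It copies the hedge model and attribute declarations verbatim, so any RELAX Core-specific syntax inside them (for example, optional attributes) still needs its own translation.

```python
import copy
import xml.etree.ElementTree as ET


def element_rule_to_define(element_rule, tags_by_role):
    """Combine a RELAX Core <elementRule> with the <tag> it references by role."""
    tag = tags_by_role[element_rule.get("role")]
    define = ET.Element("define", name=element_rule.get("label"))
    element = ET.SubElement(define, "element", name=tag.get("name"))
    # Hedge model first, then attribute declarations, both copied unconverted.
    element.extend([copy.deepcopy(child) for child in element_rule])
    element.extend([copy.deepcopy(child) for child in tag])
    return define
```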
A hedgeRule element is captured by a define element containing the hedge model.
A hedgeRule element in RELAX Core is shown below:
<hedgeRule label="bar">hedge model</hedgeRule>
A rewrite in RELAX NG is:
<define name="bar">hedge model</define>
An attPool element is captured by a define element containing the attribute declarations. An attPool element in RELAX Core is shown below:
<attPool role="foo">attribute declarations</attPool>
A rewrite in RELAX NG is
<define name="foo">attribute declarations</define>
Mapping of hedge models in RELAX Core to RELAX NG is summarized below:
Both languages use attribute. However, in RELAX Core, an attribute without required="true" declares an optional attribute. In RELAX NG, on the other hand, an optional attribute has to be declared by an attribute element within an optional element.
Declaration of a required attribute in RELAX Core is shown below:
<attribute name="foo" type="integer" required="true"/>
In RELAX NG, this is captured by:
<attribute name="foo">
  <data type="integer"/>
</attribute>
Declaration of an optional attribute in RELAX Core is shown below:
<attribute name="foo" type="integer"/>
In RELAX NG, this is captured by:
<optional>
  <attribute name="foo">
    <data type="integer"/>
  </attribute>
</optional>
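The required/optional mapping is mechanical enough to sketch in Python with the standard xml.etree.ElementTree. The function convert_attribute is hypothetical. The sketch copies type straight into data and ignores datatype libraries. If type is absent, it emits an attribute with no content, which RELAX NG treats as text. Whether that matches RELAX Core's behaviour for an untyped attribute is an assumption of the sketch.

```python
import xml.etree.ElementTree as ET


def convert_attribute(core_attr):
    """Translate a RELAX Core <attribute> declaration into a RELAX NG pattern."""
    ng = ET.Element("attribute", name=core_attr.get("name"))
    if core_attr.get("type") is not None:
        ET.SubElement(ng, "data", type=core_attr.get("type"))
    if core_attr.get("required") == "true":
        return ng
    # Without required="true" the RELAX Core attribute is optional, so wrap it.
    optional = ET.Element("optional")
    optional.append(ng)
    return optional
```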
Here is a rewrite of an example in STEP 7 of "HOW TO RELAX". The first paragraph cannot contain footnotes, but the other paragraphs can.
<grammar>
  <start>
    <element name="doc">
      <ref name="paraWithoutFNotes"/>
      <zeroOrMore>
        <ref name="paraWithFNotes"/>
      </zeroOrMore>
    </element>
  </start>

  <define name="paraWithoutFNotes">
    <element name="para">
      <text/>
    </element>
  </define>

  <define name="paraWithFNotes">
    <element name="para">
      <mixed>
        <zeroOrMore>
          <element name="fnote">
            <text/>
          </element>
        </zeroOrMore>
      </mixed>
    </element>
  </define>
</grammar>
The following document matches this pattern:
<doc><para/><para><fnote/></para></doc>
On the other hand, the following document does not:
<doc><para><fnote/></para></doc>
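The same element name, para, gets a different content model depending on where it occurs. A hand-coded Python check of this grammar makes the positional rule explicit. The function names are hypothetical, and the sketch does not check attributes or stray text directly inside doc.

```python
import xml.etree.ElementTree as ET


def text_only(elem):
    return len(elem) == 0  # <text/> allows character data but no child elements


def matches_step7(doc_text):
    """Hand-coded equivalent of the STEP 7 grammar."""
    doc = ET.fromstring(doc_text)
    paras = list(doc)
    if doc.tag != "doc" or not paras or any(p.tag != "para" for p in paras):
        return False
    first, rest = paras[0], paras[1:]
    # paraWithoutFNotes: the first para has text only.
    # paraWithFNotes: later paras mix text with text-only fnote elements.
    return text_only(first) and all(
        all(c.tag == "fnote" and text_only(c) for c in p) for p in rest
    )
```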
Here is a rewrite of an example in STEP 8 of "HOW TO RELAX". This pattern assigns different content models to the same tag name div depending on the value of the attribute class.
<grammar>
  <start>
    <element name="html">
      <zeroOrMore>
        <ref name="section"/>
      </zeroOrMore>
    </element>
  </start>

  <define name="section">
    <element name="div">
      <attribute name="class"><value>section</value></attribute>
      <zeroOrMore>
        <element name="para">
          <text/>
        </element>
      </zeroOrMore>
      <zeroOrMore>
        <ref name="subsection"/>
      </zeroOrMore>
    </element>
  </define>

  <define name="subsection">
    <element name="div">
      <attribute name="class"><value>subsection</value></attribute>
      <zeroOrMore>
        <element name="para">
          <text/>
        </element>
      </zeroOrMore>
    </element>
  </define>
</grammar>
The following document matches this pattern:
<html>
  <div class="section">
    <para/>
    <div class="subsection">
      <para/>
    </div>
  </div>
  <div class="section">
    <div class="subsection">
      <para/>
    </div>
  </div>
</html>
On the other hand, the following document does not:
<html>
  <div>
    <para/>
    <div>
      <para/>
    </div>
  </div>
</html>
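The class attribute, not the element name, decides which definition a div must satisfy. A hand-coded Python check of the STEP 8 grammar shows this. The function names are hypothetical. The sketch compares class values exactly, without whitespace normalisation, and ignores text content.

```python
import xml.etree.ElementTree as ET


def is_para(e):
    return e.tag == "para" and len(e) == 0 and not e.attrib


def is_subsection(e):
    return (e.tag == "div" and e.attrib == {"class": "subsection"}
            and all(is_para(c) for c in e))


def is_section(e):
    if e.tag != "div" or e.attrib != {"class": "section"}:
        return False
    kids = list(e)
    i = 0
    while i < len(kids) and is_para(kids[i]):      # zeroOrMore para ...
        i += 1
    return all(is_subsection(k) for k in kids[i:])  # ... then zeroOrMore subsection


def matches_step8(doc_text):
    root = ET.fromstring(doc_text)
    return root.tag == "html" and not root.attrib and all(is_section(c) for c in root)
```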
RELAX NG has some features which are missing in RELAX Core.
RELAX NG has the following changes from TREX: