Movatterモバイル変換

1. Introduction

The Efficient XML Interchange (EXI) format is a very compact, highperformance XML representation that was designed to work well for abroad range of applications. It simultaneously improves performanceand significantly reduces bandwidth requirements without compromisingefficient use of other resources such as battery life, code size,processing power, and memory.

EXI uses a grammar-driven approach that achieves very efficientencodings using a straightforward encoding algorithm and a small setofdatatype representations.Consequently,EXI processors are relatively simple andcan be implemented on devices with limited capacity.

EXI is schema"informed", meaning that it can utilize available schemainformation to improve compactness and performance, but does notdepend on accurate, complete or current schemas to work. It supportsarbitrary schema extensions and deviations and also works veryeffectively with partial schemas or in the absence of any schema. Theformat itself also does not depend on any particular schema language,or format, for schema information.

[Definition:] A program modulecalled anEXI processor, whether it is software orhardware, is used by application programs to encode their structured dataintoEXI streams and/or to decodeEXI streams to make the structureddata accessible.The former and latter aforementioned roles of EXI processors are called[Definition:] EXI stream encoder and[Definition:] EXI stream decoder, respectively.This document not only specifies theEXI format, but also defines errors that EXI processors are required todetect and behave upon.

The primary goal of this document is to define the EXI format completely without leaving ambiguity so as to make it feasible for implementations to interoperate. As such, the document lends itself to describing the design and features of the format in a systematic manner, often declaratively with relatively few prosaic annotations and examples. Those readers who prefer a step-by-step introduction to the EXI format design and features are suggested to start with the non-normative[EXI Primer].

1.1 History and Design

EXI is the result of extensive work carried out by the W3C's XMLBinary Characterization (XBC) and Efficient XML Interchange (EXI)Working Groups. XBC was chartered to investigate the costs andbenefits of an alternative form of XML, and formulate a way to objectivelyevaluate the potential of a substitute format for XML. Based on XBC'srecommendations, EXI was chartered, first to measure, evaluate, andcompare the performance of various XML technologies (using metricsdeveloped by XBC[XBC Measurement Methodologies]), and then, if it appearedsuitable, to formulate a recommendation for a W3C formatspecification. The measurements results and analyses, are presentedelsewhere[EXI Measurements Note]. The format described in thisdocument is the specification so recommended.

The functional requirements of the EXI format are those that wereprepared by the XBC WG in their analysis of the desirable propertiesof a high performance representation for XML[XBC Properties].Those properties were derived from a very broad set of use cases alsoidentified by the XBC working group[XBC Use Cases].

The design of the format presented here, is largely based on theresults of the measurements carried out by the group to evaluate theperformance characteristics (mainly of processing efficiency andcompactness) of various existing formats. The EXI format is based onEfficient XML[Efficient XML], including for example the basis heuristic grammar approach,compression algorithm, and resulting entropy encoding.

EXI is compatible with XML at the XML Information Set[XML Information Set] level, rather than at the XML syntax level. Thispermits it to encapsulate an efficient alternative syntax and grammarfor XML, while facilitating at least the potential for minimizing theimpact on XML application interoperability.

1.2 Notational Conventions and Terminology

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appearEMPHASIZED in this document, are to be interpreted as described in RFC2119[IETF RFC 2119]. Other terminology used to describe the EXIformat is defined in the body of this specification.

The termevent andstream is used throughout this document to denoteEXI event andEXI stream respectively unless the words are qualified differently to mean otherwise.

This document specifies an abstract grammar for EXI. In grammar notation, all terminalsymbols are represented in plain text and all non-terminal symbols arerepresented initalics. Grammar productions arerepresented as follows:

LeftHandSide : Terminal NonTerminal

A set of one or more grammar productions that share the sameleft-hand side non-terminal symbol are often presented together annotatedwithevent codes that specify how events matching the terminal symbols of the associated productions are represented in the EXI stream as follows:

LeftHandSide :
	Terminal₁ NonTerminal₁	EventCode₁
	Terminal₂ NonTerminal₂	EventCode₂
	Terminal₃ NonTerminal₃	EventCode₃
	...
	Terminal_n NonTerminal_n	EventCode_n

Section8.1 Grammar Notation introduces additional notations for describing productions andevent codes in grammars. Those additional notations facilitate concise representation of the EXI grammar system.

[Definition:] In this document, the termqname is used to denote aQName^XS2.QName values are composed of an uri, a local-name and an optional prefix. Two qnames are considered equal if they have the same uri and local-name, regardless of their prefix values. In cases where prefixes are not relevant, such as in the grammar notation, they are not specified by this document.

Terminal symbols that are qualified with a qname permit the use of a wildcard symbol (*) in place of or as part of a qname. The forms of terminal symbols involving qname wildcards used in grammars and their definitions are described in the table below.

Wildcard	Definition
SE (*)	The terminal symbol that matches a start element (SE) event with any qname.
SE (uri : *)	The terminal symbol that matches a start element (SE) event with any local-name in namespaceuri.
AT (*)	The terminal symbol that matches an attribute (AT) event with any qname.
AT (uri : *)	The terminal symbol that matches an attribute (AT) event with any local-name in namespaceuri.

Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed, however, any prefixes can be used in practice if they are properly bound to the namespaces.

Prefix	Namespace Name
exi	http://www.w3.org/2009/exi
xsd	http://www.w3.org/2001/XMLSchema
xsi	http://www.w3.org/2001/XMLSchema-instance

In describing the layout of an EXI format construct, a pair of square brackets [ ] are used to surround the name of a field to denote that the occurrence of the field is optional in the structure of the part or component that contains the field.

In arithmetic expressions, the notation ⌈x⌉ wherex represents a real number denotes the ceiling ofx, that is, the smallest integer greater than or equal tox.

When it is stated that strings are sorted in lexicographical order,it is done so character by character, and the order among characters is determined by comparing their Unicode code points.

Unless stated otherwise, when this specification indicates one type is derived from another type, it means the type is derived by extension or restriction, not by union or list. Similarly, when this specification uses the term type hierarchy, it is referring to the hierarchy of types derived from one another by extension or restriction

2. Design Principles

The following design principles were used to guide the development of EXI and encourage consistent design decisions. They are listed here to provide insight into the EXI design rationale and to anchor discussions on desirable EXI traits.

General:: One of primary objectives of EXI is to maximize the number of systems, devices and applications that can communicate using XML data. Specialized approaches optimized for specific use cases should be avoided.
Minimal:: To reach the broadest set of small, mobile and embedded applications, simple, elegant approaches are preferred to large, analytical or complex ones.
Efficient:: EXI must be competitive with hand-optimized binary formats so it can be used by applications that require this level of efficiency.
Flexible:: EXI must deal flexibly and efficiently with documents that contain arbitrary schema extensions or deviate from their schema. Documents that contain schema deviations should not cause encoding to fail.
Interoperable:: EXI must integrate well with existing XML technologies, minimizing the changes required to those technologies. It must be compatible with the XML Information Set[XML Information Set], without significant subsetting or supersetting, in order to maintain interoperability with existing and prospective XML specifications.

3. Basic Concepts

EXI achieves broad generality, flexibility, and performance, by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur at any given point in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, EXI is optimized specifically for XML languages.

The built-in EXI grammars accept any XML document or fragment and may be augmented with productions derived from schemas or other sources of information about what is likely to occur in a set of XML documents.When schemas are used, EXI also supports a user-customizable set of Datatype Representations for efficiently representing typed values.Though use of any schema languages including XML Schemas[XML Schema Structures] [XML Schema Datatypes], RELAX NG schemas[ISO/IEC 19757-2:2008], DTDs[XML 1.0] [XML 1.1] is permitted, EXI grammars anddatatype representations need to be given bindings for each schema language used.This specification only defines how EXI grammars and datatype representationsrelate to XML schema.

TheEXI stream encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream ofevents.

TheEXI stream encoder then represents the stream of events using a set of simple variable length codes calledevent codes.Event codes are similar to Huffman codes[Huffman Coding], but are much simpler to compute and maintain. They are encoded directly as a sequence of values, or if additional compression is desired, they are passed to theEXI compression algorithm, which replaces frequently occurring event patterns to further reduce size.

4. EXI Streams

[Definition:] AnEXI stream is anEXI headerfollowed by an EXI body.[Definition:] TheEXI body carries the content of the document, while the EXI header communicates the options used for encoding the EXI body. Section5. EXI Header describes theEXI header.

[Definition:] The building block of an EXI body is anEXI event. An EXI body consists of a sequence of EXI events representing anEXI document or anEXI fragment.

The EXI events permitted at any given position in an EXI stream are determined by the EXI grammar.As is the case with XML,the events occur with nesting pairs of matching start element and end element events where any pair does not intersect with another except when it is fully contained in the other.The EXI grammar incorporates knowledge of the XML grammar and may be augmented and refined using schema information and fidelity options. The EXI grammar is formally specified in section8. EXI Grammars.

An EXI body can represent an EXI document with a single root element or an EXI fragment with zero or more root elements.[Definition:] EXI documents are EXI bodies with a single root element that conform to the Built-in Document Grammar (See8.4.1 Built-in Document Grammar) or Schema-informed Document Grammar (See8.5.1 Schema-informed Document Grammar).[Definition:] EXI fragments are EXI bodies with zero or more root elements that conform to the Built-in Fragment Grammar (See8.4.2 Built-in Fragment Grammar) or Schema-informed Fragment Grammar (See8.5.2 Schema-informed Fragment Grammar).

[Definition:] When schema information is available to describe the contents of an EXI body, such an EXI stream is aschema-informed EXI stream and the EXI body is interpreted according to the Schema-informed Document Grammar (See8.5.1 Schema-informed Document Grammar) or Schema-informed Fragment Grammar (See8.5.2 Schema-informed Fragment Grammar).[Definition:] Otherwise, an EXI stream is aschema-less EXI stream, and the EXI body is interpreted according to the Built-in Document Grammar (See8.4.1 Built-in Document Grammar) or Built-in Fragment Grammar (See8.4.2 Built-in Fragment Grammar).

The following table summarizes the EXI event types and associated event content that occur in an EXI stream.[Definition:] The content of an event consists ofcontent items,and the content items appear in an EXI stream in the order they are shown in the tablefollowing their respectiveevent codes thateach marks the start of anevent.In addition, the table includes the grammar notation used to represent eachevent in this specification. Eachevent in an EXI stream participates in a mapping system that relatesevents to XML Information Items so that an EXI documentor an EXI fragmentas a whole serves to represent an XML Information Set. The table shows XML Information Items relevant to each EXI event. AppendixB Infoset Mapping describes the mapping system in detail.

Table 4-1. EXI events
EXI Event Type	Event Content(Content Items)	Grammar Notation(Terminal Symbols)	Information Item
Start Document		SD	B.1 Document Information Item
End Document		ED	B.1 Document Information Item
Start Element	qname	SE ( qname )	B.2 Element Information Items
		SE ( * )
		SE ( uri : * )
End Element		EE
Attribute	qname, value	AT ( qname )	B.3 Attribute Information Item
		AT ( * )
		AT ( uri : * )
Characters	value	CH	B.6 Character Information item
Namespace Declaration	uri,prefix,local-element-ns	NS	B.11 Namespace Information Item
Comment	text	CM	B.7 Comment Information item
Processing Instruction	name, text	PI	B.4 Processing Instruction Information Item
DOCTYPE	name, public, system, text	DT	B.8 Document Type Declaration Information item
Entity Reference	name	ER	B.5 Unexpanded Entity Reference Information item
Self Contained		SC

Section6. Encoding EXI Streams describes the algorithm used to encodeevents in the EXI stream.As indicated in the table above, there are some event types that carry content with theirevent instances while other event types function as markers without content.

SE events may be followed by a series of NS events. Each NS event either associates a prefix with an URI, assigns a default namespace, or in the case of a namespace declaration with an empty URI, rescinds one of such associations in effect at the point of its occurrence. The effect of the association or disassociation caused by a NS event stays in effect until the corresponding EE event occurs.

Like XML, the namespace of a particular element may be specified by a namespace declarationprecedingthe element or a local namespace declaration following the element name. When the namespace is specified by a local namespace declaration, thelocal-element-ns flag of the associated NS event is set to true and the prefix of the element is set to the prefix of that NS event. When the namespace is specified by a previous namespace declaration, thelocal-element-ns flag of all local NS events is false and the prefix of the element is set according to the prefix component of the elementqname. The series of NS events associated with a particular element may include at most one NS event with itslocal-element-ns flag set to true. Theuri of a NS event with itslocal-element-ns flag set to true MUST match theuri of the associated SE event.

The namespace of elements and attributes is specified as part of SE and AT events and hence namespace declarations can be omitted from the EXI stream if preservation of prefixes is not required by the applications. As prescribed byTable B-2 andTable B-11, [namespace attributes] representing namespace declarations are mapped to NS events and SHOULD NOT be represented by AT events. This also implies that the following AT events SHOULD NOT occur in EXI streams: (1) AT events with qname whose uri is "http://www.w3.org/2000/xmlns/"; (2) AT events with qname which has empty uri ("") and local name either of the form "xmlns" or "xmlns:*", where "*" represents a string with 0 or more characters.

An SE event may be followed by a SC event, indicating the element is self-contained and can be read independently from the rest of the EXI body. Applications may use self-contained elements to index portions of the EXI body for random access.

The representation ofevent codes which identify the event type and start each event is described in6.2 Representing Event Codes.Each item in the event content has adatatype representationassociated with it as shown in the following table. The content of eachevent, if any, is encoded as a sequence of items each of which being encoded according to itsdatatype representationin order starting with the first item followed by subsequent items.

Table 4-2. Datatype representations of event content items
Content item	Used in	Datatype representation
name	PI, DT, ER	7.1.10 String
prefix	NS	7.1.10 String
local-element-ns	NS	7.1.2 Boolean
public	DT	7.1.10 String
qname	SE, AT	7.1.7 QName
system	DT	7.1.10 String
text	CM, PI, DT	7.1.10 String
uri	NS	7.1.10 String
value	CH, AT	According to the schema datatype (see7. Representing Event Content) if any is in effect, otherwise7.1.10 String

The datatype representationused for eachvalue content item depends on the schemadatatype^XS2if any that is in effect for thatvalue.The String datatype representation (see7.1.10 String)is used forvalues that do not have an associated schema datatype,cannot be or are opted not to be represented by their associated datatype representations,or occur in mixed content. Section7. Representing Event Content describes how each of the types listed above are encoded in an EXI stream.

Note:

The syntax and semantics of the NS event are designed to minimize the overhead required for representing namespace prefixes in EXI streams without introducing significant complexity. In the common case where each namespace is bound to a single prefix, this design reduces the overhead for representing all element and attribute namespace prefixes to zero bits.

5. EXI Header

EachEXI stream begins with an EXI header.[Definition:] TheEXI headercan identify EXI streams,distinguish EXIstreamsfrom text XML documents,identify the version of the EXI format being used, and specify the options used to process the body of the EXI stream.The EXI header has the following structure:

[ EXI Cookie ]	Distinguishing Bits	Presence Bit	EXI Format	[EXI Options]	[Padding Bits]
		for EXI Options	Version

The EXI Options field within an EXI header is optional. Its presence is indicated bythe value of the presence bit that followsDistinguishing Bits. The presence and absence is indicated by the value 1 and 0, respectively.

When thecompression option is true, or thealignment option isbyte-alignment orpre-compression,padding bits of minimum length required to make the whole length ofthe header byte-aligned are added at the end of the header.On the other hand, there are no padding bits when the alignment in use isbit-packed.The padding bits fieldif it is presentcan contain any values of bits as its contents.

The details of theEXI Cookie,Distinguishing Bits,EXI Format Version andEXI Options are described in the following sections.

5.1 EXI Cookie

[Definition:] AnEXI header MAY start with anEXI Cookie,which is a four byte field that serves to indicate that the stream of which it is a part is an EXI stream. The four byte field consists of four characters" $ " , " E ", " X " and " I "in that order, each represented as an ASCII octet, as follows.

'$'

'E'

'X'

'I'

This four byte sequence is particular to EXI and specific enough to distinguish EXI streams from a broad range of data types currently used on the Web. While the EXI cookie is optional, its use is RECOMMENDED in the EXI header when the EXI stream is exchanged in a context where a longer, more specific content-based datatype identifier is desired than that provided by theDistinguishing Bits, whose role is more narrowly focused on distinguishing EXI streams from XML documents.

5.2 Distinguishing Bits

[Definition:] The second part in the EXI header is theDistinguishing Bits,which is a two bit field of which the first bit contains the value 1 and the second bit contains the value 0, as follows.

Unlike the optional EXI cookie that MAY occur to precede this field, the presence of Distinguishing Bits is REQUIRED in the EXI header. It is used to distinguish EXI streams from text XML documents in the absence of anEXI cookie.This two bit sequence is the minimum that suffices to distinguish EXIstreams from XML documents since it is the minimum length bitpattern that cannot occur as the first two bits of a well-formed XMLdocument represented in any one of the conventional characterencodings, such as UTF-8, UTF-16, UTF-32, UCS-2, UCS-4, EBCDIC, ISO 8859,Shift-JIS and EUC, according toXML[XML 1.0] [XML 1.1].Therefore, XML Processors that do not support EXI are expected to reject an EXI stream as early as they readand process the first byte from the stream.

Systems that use EXI streams as well as XML documents can reliably look atthe Distinguishing Bits to determine whether to interpret a particularstream as XML or EXI.

5.3 EXI Format Version

[Definition:] The fourth part in the EXI header is theEXI Format Version, which identifies the version of the EXI format being used.EXI format version numbers are integers. Each version of the EXI Format Specification specifies the corresponding EXI format version number to be used by conforming implementations. The EXI format version number that corresponds with this version of the EXI format specification is 1 (one).

The first bit of the version field indicates whether the version is a preview or final version of the EXI format.A value of 0 indicates this is a final version and a value of 1 indicates this is a previewversion. Final versions correspond to final, approved versions of the EXI format specification.AnEXI processor that implements a final version of the EXI format specification is REQUIRED to process EXI streams that have a version field with its first bit set to 0 followed by a version number that corresponds to the version of the EXI specification the processor implements. The behavior of an EXI processor on an EXI stream with its first bit set to 0 followed by a version not corresponding to a version implemented by the processor is not constrained by this specification. For example, the EXI processor MAY reject such a stream outright or it MAY attempt to process the EXI body.Preview versions of the EXI format are useful forgaining implementation and deployment experience prior to finalizing aparticular version of the EXI format. While preview versions may match drafts of this specification, they are not governed by this specification and the behaviour of EXI processors encountering preview versions of the EXI format is implementation dependent. Implementers are free to coordinate to achieve interoperability between different preview versions of the EXI format.

Following the first bit of the version is a sequence of one or more4-bit unsigned integers representing the version number. The versionnumber is determined by summing this sequence of 4-bit unsignedvalues and adding 1 (one). The sequence is terminated by any 4-bit unsigned integer witha value in the range 0-14. As such, the first 15 version numbers arerepresented by 4 bits, the next 15 are represented by 8 bits, etc.

Given an EXI stream with its stream cursor positioned just past the first bit of the EXI format version field, the EXI format version number can be computed by going through the following steps with version number initially set to 1.

Read next 4 bits as an unsigned integer value.
Add the value that was just read to the version number.
If the value is 15, go to step 1, otherwise (i.e. the value being in the range of 0-14), use the current value of the version number as the EXI version number.

The following are example EXI format version numbers.

Example 5-1.EXI Format Version Examples

EXI Format Version Field	Description
1 0000	Preview version 1
0 0000	Final version 1
0 1110	Final version 15
0 1111 0000	Final version 16
0 1111 0001	Final version 17

EXI processors conforming with the final version of thisspecification MUST use the 5-bit value 0 0000 as the versionnumber.

5.4 EXI Options

[Definition:] Thefifthpart of the EXIheader is theEXI Options, which provides a way to specify theoptions used to encode the body of the EXI stream.[Definition:] The EXI Options are represented as anEXI Options document, which is an XML document encoded using the EXI format described in this specification.This results in a very compact headerformat that can be read and written with very little additional software.

The presence of EXI Options in its entirety is optional in EXI header,and it is predicated on the value of the presence bit that follows theDistinguishing Bits.When EXI Options are present in the header, an EXI Processor MUST observe thespecified options to process the EXI stream that follows. Otherwise,an EXI Processor may obtain the EXI options using another mechanism.There are no fallback option values provided by this specification for usein the absence of the whole EXI Options part.

EXI processors MAY provide external means for applications or users tospecify EXI Options when the EXI Options document is absent.SuchEXI processors are typically used in controlled systemswhere the knowledge about the effective EXI Options is shared prior tothe exchange of EXIstreams. The mechanisms to communicate out-of-band EXI Options and their representation are implementation dependent.

The following table describes the EXI options that may be specified in theEXI Options document.

Table 5-1. EXI Options in Options Document
EXI Option	Description	Default Value
alignment	Alignment ofevent codes andcontent items	bit-packed
compression	EXI compression is used to achieve better compactness	false
strict	Strict interpretation of schemas is used to achieve better compactness	false
fragment	Body is encoded as anEXI fragment instead of anEXI document	false
preserve	Specifies whether the support for the preservation of comments, pis, etc. is each enabled	all false
selfContained	Enables self-contained elements	false
schemaId	Identify the schema information, if any, used to encode the body	no default value
datatypeRepresentationMap	Specify alternate datatype representations for typedvalues in theEXI body	no default value
blockSize	Specifies the block size used for EXI compression	1,000,000
valueMaxLength	Specifies the maximum string length ofvaluecontent items to be considered for addition to the string table.	unbounded
valuePartitionCapacity	Specifies the total capacity of value partitions in a string table	unbounded
[user defined meta-data]	User defined meta-data may be added	no default value

AppendixC XML Schema for EXI Options Document provides an XML Schemadescribingthe EXI Options document.This schema is designed to produce smaller headersfor option combinations used when compactness is critical.

TheEXI Options document isrepresented as anEXI body informed by the above mentioned schema using the default optionsspecified by the following XML document.An EXI Options document consists only of an EXI body, and MUST NOTstart with an EXI header.

Header options used for encoding theEXI Options document

  <header xmlns="http://www.w3.org/2009/exi">    <strict/>  </header>

Note that this specification does not requireEXI processors to read and process the schema prescribed for EXI options document (C XML Schema for EXI Options Document), in order to process EXI options documents. EXI processors MUST use the schema-informed grammars that stem from the schema in processing EXI options documents, beyond which there is no requirement as to the use of the schema, and implementations are free to use any methods to retrieve the instructions that observe the grammars for processing EXI options documents. Section8.5 Schema-informed Grammars describes the system to derive schema-informed grammars from XML Schemas.

Below is a brief description of each EXI option.

[Definition:] Thealignment option is used to control the alignment ofevent codes andcontent items. The value is one ofbit-packed,byte-alignment orpre-compression, of whichbit-packed is the default value assumed when the "alignment" element is absent in theEXI Options document.The option valuesbyte-alignment andpre-compression are effected when "byte" and "pre-compress" elements are present in the EXI Options document, respectively.When the value ofcompression option is set to true, alignment of the EXI Body is governed by the rules specified in9. EXI Compression instead of the alignment option value. The "alignment" element MUST NOT appear in an EXI options document when the "compression" element is present.

[Definition:] The alignment option valuebit-packed indicates that theevent codes and associated content are packed in bits without any padding in-between.

[Definition:] The alignment option valuebyte-alignment indicates that theevent codes and associated content are aligned on byte boundaries. While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may provide a help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.

[Definition:] The alignment option valuepre-compression indicates that all steps involved in compression (see section9. EXI Compression) are to be done with the exception of the final step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.

[Definition:] Thecompression option is a Boolean used to increase compactness using additional computational resources. The default value "false" is assumed when the "compression" element is absent in theEXI Options document whereas its presence denotes the value "true".When set to true, theevent codes and associated content are compressed according to9. EXI Compression regardless of thealignment option value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the "alignment" element is present.

[Definition:] Thestrict option is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is absent in theEXI Options documentwhereas its presence denotes the value "true".When set to true,those productions that have NS, CM, PI, ER, and SC terminal symbols are omitted from theEXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas.A note in section8.5.4.4.2 Adding Productions when Strict is True describes some additional restrictions consequential of the use of this option.The "strict" element MUST NOT appear in anEXI options document whenone of "dtd", "prefixes", "comments", "pis" or "selfContained"element is present in the same options document.

[Definition:] Thefragment option is a Boolean that indicates whether theEXI body is anEXI document or anEXI fragment. When set to true, theEXI body is anEXI fragment. Otherwise, theEXI body is anEXI document. The default value "false" is assumed when the "fragment" element is absent in theEXI Options documentwhereas its presence denotes the value "true".

[Definition:] Thepreserve option is a set of Booleans that can be set independentlyto each enable or disable a share of the format's capacity determining whether or how certain information items can be preserved in the EXI stream.Section6.3 Fidelity Options describes the set of information itemsaffected by the preserve option.The presence of "dtd", "prefixes", "lexicalValues", "comments" and "pis" in the EXI Options document each turns on fidelity options Preserve.comments, Preserve.pis, Preserve.dtd, Preserve.prefixes and Preserve.lexicalValues whereas the absence denotes turning each off.The elements "dtd", "prefixes", "comments" and "pis"MUST NOT appear in anEXI options document when the "strict" element is present in the same options document.The element "lexicalValues", on the other hand, is permitted to occur in the presence of "strict" element.

[Definition:] TheselfContained option is a Boolean used to enable the use of self-contained elements in the EXI stream. Self-contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in anEXI options document whenone of "compression", "pre-compression" or "strict" elements are presentin the same options document. The default value "false" is assumed when the "selfContained" element is absent from theEXI Options documentwhereas its presence denotes the value "true".

[Definition:] TheschemaId option may be used to identify the schema information usedfor processingthe EXI body. When the"schemaId" element in theEXI options document contains the xsi:nil attributewith its value set to true,no schema informationis used for processingthe EXI body (i.e. aschema-less EXI stream).When the value of the "schemaId" element is empty, no user defined schema information is used for processing the EXI body;however, the built-in XML schema types are available for use in the EXI body.When the schemaId option is absent (i.e., undefined), no statement is made about the schema information used to encode the EXI body and this informationMUST be communicated out of band.This specification does not dictate the syntax or semantics of other values specified in this field. An example schemaId scheme is the use of URI that is apt for globally identifying schema resources on the Web.The parties involved in the exchange are free to agree on the scheme of schemaId field that is appropriate for their use to uniquely identify the schema information.

[Definition:] ThedatatypeRepresentationMap optionspecifies an alternate set of datatype representations for typedvalues intheEXI bodyas described in7.4 Datatype Representation Map.When there are no "datatypeRepresentationMap" elements in theEXI Options document, no Datatype Representation Map is used for processing the EXI body.This option does not take effect when the value of the Preserve.lexicalValues fidelity option is true (see6.3 Fidelity Options),or when theEXI stream is aschema-less EXI stream.

[Definition:] TheblockSize option specifies the block size used for EXI compression. When the "blockSize" element is absent in theEXI Options document, the default blocksize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.

[Definition:] ThevalueMaxLength option specifies the maximum length ofvalue content items to be considered for addition to the string table.The default value "unbounded" is assumed when the "valueMaxLength" element is absent in theEXI Options document.

[Definition:] ThevaluePartitionCapacity option specifies the maximum number ofvalue content items in the string table at any given time.The default value "unbounded" is assumed when the "valuePartitionCapacity" element is absent in theEXI Options document.Section7.3.3 Partitions Optimized for Frequent use of String Literals specifies the behavior of the string table when this capacity is reached.

[Definition:] Theuser defined meta-data conveys auxiliary information that applications may use to facilitate interpretation of the EXI stream.The user defined meta-data MUST NOT be interpreted in a way that alters or extends the EXI data format defined in this specification.User defined meta-data may be added to an EXI Options document just prior to thealignment option.

6. Encoding EXI Streams

The rules for encoding a series ofevents as anEXI stream are verysimple and are driven by a declarative set of grammars that describesthe structure of anEXI stream. Everyevent in the stream isencoded using the same set of encoding rules, which are summarized asfollows:

Get the next eventdatato be encoded
If fidelity options (see6.3 Fidelity Options) indicate this event type is not processed,go to step 1
Use the grammars to determine theevent code of theevent
Encode theevent code followed by the event content (seeTable 4-1)
Evaluate the grammar production matched by theevent
Repeat until the End Document (ED) event is encoded

Self-contained (SC), namespace (NS) and attribute (AT) events associated with a given element occur directly after the start element (SE) event in the following order:

...

AT (xsi:type)

AT (xsi:nil)

...

Namespace (NS) events occur in document order.The xsi:type and xsi:nil attributesoccur before all other AT events.When the grammar currently in effect for the element is either abuilt-in element grammar (see8.4.3 Built-in Element Grammar) or aschema-informed element fragment grammar (see8.5.3 Schema-informed Element Fragment Grammar), the remaining attribute (AT) events can occur in any order. Otherwise, when the grammar in effect is aschema-informed element grammar or aschema-informed type grammar (see8.5.4 Schema-informed Element and Type Grammars), the remaining attributes can occur in any order that is permitted by the grammar, though in practice they SHOULD occur in lexicographical order sorted first byqname local-name then byqname uri for achieving better compactness, where aqname is aqname of an attribute.

Note:

Under certain circumstances, it is not strictly required that the xsi:type or xsi:nil attributes occur before other AT events of the same element. See the notes in section8.4.3 Built-in Element Grammar for details.

EXI uses the same simple procedure described above, to encode well-formed documents, document fragments, schema-valid information items, schema-invalid information items, information items partially described by schemas and information items with no schema at all. Only the grammars that describe these items differ. For example, an element with no schema information is encoded according to the XML grammar defined by the XML specification, while an element with schema information is encoded according to the more specific grammar defined by that schema.

[Definition:] Anevent code is a sequence of 1 to 3 non-negative integers called parts used to identify each event in an EXI stream. The EXI grammars describe which events may occur at each point in an EXI stream and associate an even code with each one.(See8.2 Grammar Event Codes for more description of event codes.)

Section6.1 Determining Event Codes describes in detail how the grammar is used to determine the event code of anevent. Section6.2 Representing Event Codes describes in detail how event codes are represented as bits. Section6.3 Fidelity Options describes available fidelity options and how theyaffectthe EXI stream. Section7. Representing Event Content describes how the typed event contents are represented as bits.

6.1 Determining Event Codes

The structure of an EXI stream is described by the EXI grammars, which are formally specified in section8. EXI Grammars. Each grammar defines which events are permitted to occur at any given point in the EXI stream and provides a pre-assignedevent code for each one.

For example, the grammar productions below describe the events that can occur in a schema-informed EXI stream after the Start-Document (SD) event provided there are four global elements defined in the schema and assign anevent code for each one.See8.5.1 Schema-informed Document Grammar for the process used for generating the grammar productions below from the schema.

Example 6-1.Example productions with event codes


Syntax			Event Code
	DocContent
		SE ("A")DocEnd	0
		SE ("B")DocEnd	1
		SE ("C")DocEnd	2
		SE ("D")DocEnd	3
		SE ()DocEnd*	4
		DTDocContent	5.0
		CMDocContent	5.1.0
		PIDocContent	5.1.1

At the point in an EXI stream where the above grammar productions are in effect, theevent code of Start Element "A" (i.e. SE("A")) is 0. The event code of a DOCTYPE (DT) event at this point in the stream is 5.0, and so on.

6.2 Representing Event Codes

Eachevent code is represented by a sequence of 1 to 3 parts that uniquely identify an event.Event code parts are encoded in order starting with the first part followed by subsequent parts.

Thei-th part of anevent code is encoded as ann-bit unsigned integer (7.1.9 n-bit Unsigned Integer), wheren is ⌈ log₂ m ⌉ andm is the number of distinct values used as thei-th part of its own and all its sibling event codes in the current grammar.Twoevent codes are siblings at thei-th part if and only if they share the same values in all preceding parts. Allevent codes are siblings at the first part.

If there is only one distinct value for a given part, the part is omitted (i.e., encoded in log₂ 1 = 0 bits = 0 bytes).

For example, the eightevent codes shown in theDocContent grammar above have values ranging from 0 to 5 for the first part. Six distinct values are needed to identify the first part of theseevent codes.Therefore, the first part can be encoded as ann-bit unsigned integer wheren = ⌈ log₂ 6 ⌉ = 3. In the same fashion, the second and third part (if present) are represented asn-bit unsigned integers wheren is ⌈ log₂ 2 ⌉ = 1 and ⌈ log₂ 2 ⌉ = 1 respectively.

When the value of thecompression option is false andbit-packed alignment is used,n-bit unsigned integers are represented usingn bits. The first table below illustrates how theevent codes of eachevent matched by theDocContent grammar above are represented in this case.

When the value ofcompression option is true, or eitherbyte-alignment orpre-compression alignment option is used,n-bit unsigned integers are represented using the minimum number of bytes required to storen bits. The second table below illustrates how theevent codes of eachevent matched by theDocContent grammar above are represented in this case.

Example 6-2.Example event code encoding

Table 6-1. Example event code encodingwhen EXI compression is not in effect andbit-packed alignment option is used
Event	Part values			Event Code Encoding	# bits
SE ("A")	0			000	3
SE ("B")	1			001	3
SE ("C")	2			010	3
SE ("D")	3			011	3
SE (*)	4			100	3
DT	5	0		101 0	4
CM	5	1	0	101 1 0	5
PI	5	1	1	101 1 1	5

# distinct values (m)

# bits per part

⌈ log₂ m ⌉

Table 6-2. Example event code encodingwhen EXI compression is in effect, or eitherbyte-alignment orpre-compression alignment option is used
Event	Part values			Event Code Encoding	# bytes
SE ("A")	0			00000000	1
SE ("B")	1			00000001	1
SE ("C")	2			00000010	1
SE ("D")	3			00000011	1
SE (*)	4			00000100	1
DT	5	0		00000101 00000000	2
CM	5	1	0	00000101 00000001 00000000	3
PI	5	1	1	00000101 00000001 00000001	3

# distinct values (m)

# bytes per part

⌈ (log₂ m) / 8 ⌉

6.3 Fidelity Options

Some XML applications do not require the entire XML feature set and would prefer to eliminate the overhead associated with unused features. For example, the SOAP 1.2 specification[SOAP 1.2] prohibits the use of XMLprocessing instructions.In addition, there are many data-exchange use cases that do not require XML comments or DTDs.

Thepreserve option in EXI Options comprises a set of fidelity options, each of which independentlyenables or disables the format's capacity forthe preservation (or preservation level) of a certain type of information item.Applications can use thepreserve option to specify the set of fidelity options they require.As specified in section8.3 Pruning Unneeded Productions, EXI processors MUST use these fidelity options to pruneproductions that match the associated events from the grammars, improving compactness and processing efficiency.

The table below lists the fidelity options supported by this version of the EXI specification and describes the effect setting these options has on theEXI stream.

Table 6-3. Fidelity options
Fidelity option	Effect
Preserve.comments	CM events can be preserved
Preserve.pis	PI events can be preserved
Preserve.dtd	DT and ER events can be preserved
Preserve.prefixes	NS events and namespace prefixes can be preserved
Preserve.lexicalValues	Lexical form of element and attribute values can be preserved invalue content items

When qualified names[Namespaces in XML 1.0] [Namespaces in XML 1.1] are used in thevalues of AT or CH events in an EXI Stream, the Preserve.prefixes fidelity option SHOULD be turned on to enable the preservation of the NS prefix declarations used by these values.Note, in particular among other cases, that this practice applies to the use of xsi:type attributes in EXI streams when Preserve.lexicalValues fidelity option is set totrue.

See section4. EXI Streams for the definition of EXI event types and their associatedcontent items.

7. Representing Event Content

Theevent code of each event in an EXI body is represented as a sequence ofn-bit unsigned integers (7.1.9 n-bit Unsigned Integer).See section6.2 Representing Event Codes for the description of the event code representation.Thecontent items of an event are represented according to their datatype representations (seeTable 4-2). In the absence of an associated datatype representation, attribute and charactervalues arerepresented as String (7.1.10 String).

[Definition:] EXI defines a minimal set of datatype representations calledBuilt-in EXI datatype representations that define howcontent itemsas well as the parts of anevent codeare represented in EXI streams.When thePreserve.lexicalValues option is false,values are represented using built-in EXI datatype representations(see7.1 Built-in EXI Datatype Representations) or user-defined datatype representations(see7.4 Datatype Representation Map) associated with the schemadatatypes^XS2.Otherwise,values are represented as Strings with restricted character sets (seeTable 7-2 below).

The followingTable 7-1 lists the built-in EXI datatype representations, associated EXI datatype identifiers and the XML Schemabuilt-in datatypes^XS2 each EXI datatype representation is used to represent by default. When that default association is in effect, datatypes derived from the XML Schema datatypes are also represented according to the associatedbuilt-in EXI datatype representation. When there are more than one XML Schema datatypes from which a datatype is derived directly or indirectly, the closest ancestor is used to determine thebuilt-in EXI datatype representation. For example, a value of XML Schema datatype xsd:int is represented according to the samebuilt-in EXI datatype representation as a value of XML Schema datatype xsd:integer. Although xsd:int is derived indirectly from xsd:integer and also further from xsd:decimal, a value of xsd:int is processed as an instance of xsd:integer because xsd:integer is closer to xsd:int than xsd:decimal is in the datatype inheritance hierarchy.

Table 7-1.Built-in EXI Datatype Representations
Built-in EXI Datatype Representation	EXI Datatype ID	XML Schema Datatypes
Binary	exi:base64Binary	base64Binary
Binary	exi:hexBinary	hexBinary
Boolean	exi:boolean	boolean
Date-Time	exi:dateTime	dateTime
	exi:time	time
	exi:date	date
	exi:gYearMonth	gYearMonth
	exi:gYear	gYear
	exi:gMonthDay	gMonthDay
	exi:gDay	gDay
	exi:gMonth	gMonth
Decimal	exi:decimal	decimal
Float	exi:double	float,double
Integer	exi:integer	integer
String	exi:string	string,anySimpleType and all types directly derived byunion
n-bit Unsigned Integer		Not associated with any datatype directly, but used byInteger datatype representation for some boundedintegers (see7.1.5 Integer)
Unsigned Integer		Not associated with any datatype directly, but used byInteger datatype representation for unsignedintegers (see7.1.5 Integer)
List		All types directly derived bylist, includingNMTOKENS,IDREFS andENTITIES
QName		xsi:type attribute values whenPreserve.lexicalValues option value isfalse

Each EXI datatype identifier above is aqname that uniquely identifies one of thebuilt-in EXI datatype representations.EXI datatype identifiers are used inDatatype Representation Maps tochangethe datatype representations used for specific schemadatatypes^XS2 and their sub-types.Onlybuilt-in EXI datatype representations that have been assigned identifiers are usable inDatatype Representation Maps.

When thePreserve.lexicalValues option is true, allvaluesare represented as Strings.The table below defines restricted character sets for several of the built-in EXI datatypes. Eachvalue that would have otherwise been represented by one of these datatypes is instead represented as a String with the associated restricted character set,regardless of the actual pattern facets, if any, specified in the definitions of the schema datatypes.

Table 7-2. Restricted Character Sets for Built-in EXIDatatype Representations
EXI Datatype ID	Restricted Character Set
exi:base64Binary	{ #x9, #xA, #xD, #x20, +, /, [0-9], =, [A-Z], [a-z] }
exi:hexBinary	{ #x9, #xA, #xD, #x20, [0-9], [A-F], [a-f] }
exi:boolean	{ #x9, #xA, #xD, #x20, 0, 1, a, e, f, l, r, s, t, u }
exi:dateTime	{ #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z }
exi:time
exi:date
exi:gYearMonth
exi:gYear
exi:gMonthDay
exi:gDay
exi:gMonth
exi:decimal	{ #x9, #xA, #xD, #x20, +, -, ., [0-9] }
exi:double	{ #x9, #xA, #xD, #x20, +, -, ., [0-9], E, F, I, N, a, e }
exi:integer	{ #x9, #xA, #xD, #x20, +, -, [0-9] }
exi:string	no restricted character set

The restricted character set for the EXI List datatype representation is the restricted character set of the EXI datatype representation of the List item type.

The restricted character set for a value that would be represented as an EXI enumeration is the restricted character set of the EXI datatype representation of the enumeration base type.

The rules used to represent values of String depend on thecontent items to which the values belong. There are certaincontent items whose value representation involve the use of string tables while othercontent items are represented using the encoding rule described in7.1.10 String without involvement of string tables. Thecontent items that use string tables and how each of suchcontent items uses string tables to represent their values are described in7.3 String Table.

Schemas can provide one or more enumerated values fordatatypes.When thePreserve.lexicalValues option is false,EXI exploits those pre-defined values when they are available to represent values of suchdatatypesin a more efficient manner thanwould have done otherwise without using pre-defined values.The encoding rule for representingenumerated valuesis described in7.2 Enumerations.Datatypesthat are directly derived fromanotherby union and their subtypes are always represented as String regardless of the availability of enumerated values. Representation of values of which thedatatype is either alist datatype^XS2, orone of QName, Notation or adatatypederived therefrom by restriction are also not affected by enumerated values if any.

7.1 Built-in EXI Datatype Representations

The following sections describe thebuilt-in EXI datatype representations used for representingevent codes andcontent items in EXI streams. Unless otherwise stated, individual items in an EXI stream are packed into bytes most significant bit first.

7.1.1 Binary

The Binary datatype representation is a length-prefixed sequence of octets representing the binary content. The length is represented as an Unsigned Integer (see7.1.6 Unsigned Integer).

7.1.2 Boolean

When the associated schema datatype is directly or indirectly derived from xsd:boolean and pattern facets are available in the schema datatype, the Boolean datatype representation is an-bit unsigned integer (7.1.9 n-bit Unsigned Integer), wheren is two (2) and the values zero (0), one (1), two (2) and three (3) represent the values "false", "0", "true" and "1" respectively.

Otherwise, the Boolean datatype representation is an-bit unsigned integer (7.1.9 n-bit Unsigned Integer), wheren is one (1). The value zero (0) represents false and the value one (1) represents true.

7.1.3 Decimal

The Decimal datatype representation is a Boolean sign (see7.1.2 Boolean) followed by two Unsigned Integers (see7.1.6 Unsigned Integer). A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.

Note:

Some implementers may assume and try to find a parallel between Decimal and7.1.5 Integer datatype representations. However, note that there are enough discrepancies that may make sharing implementation codes between the two more involved than some would have presumed. In particular7.1.5 Integer cannot represent minus zero.

7.1.4 Float

The Float datatype representation is two consecutive Integers (see7.1.5 Integer). The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number. The range of the mantissa is - (2⁶³) to 2⁶³-1 and the range of the exponent is - (2¹⁴-1) to 2¹⁴-1.Mantissa or exponent values outside of the respective accepted range MUST NOT be used in the Float datatype representation. Values typed as Float with a mantissa or exponent outside the accepted range are represented as untyped values, processed by an alternative production if available that can be used to represent untyped values.Examples of such productions are those whose terminal symbol on the right-hand side is AT(qname) [untyped value], AT(*) [untyped value] or CH [untyped value] (See8.5.4.4.1 Adding Productions when Strict is False).

The exponent value -(2¹⁴) is used to indicate one of the special values: infinity, negative infinity and not-a-number (NaN). An exponent value -(2¹⁴) with mantissa values 1 and -1 representspositive infinity (INF) and negative infinity (-INF) respectively. An exponent value -(2¹⁴) with any other mantissa value represents NaN.

The Float datatype representation can be decoded by going through the following steps.

Retrieve the mantissa value using the procedure described in7.1.5 Integer.
Retrieve the exponent value using the procedure described in7.1.5 Integer.
If the exponent value is -(2¹⁴), the mantissa value 1 represents INF, the mantissa value -1 represents -INF and any other mantissa value represents NaN. If the exponent value is not -(2¹⁴), the float value ism × 10^e wherem is the mantissa ande is the exponent obtained in the preceding steps.

7.1.5 Integer

The Integer datatype representation supports signed integer numbers of arbitrary magnitude. The specific representation used depends on thefacet^XS2 values of the associated schemadatatype^XS2 as follows.

If the associated schema datatype is directly or indirectly derived from xsd:integer and the bounded range determined by itsminInclusive^XS2,minExclusive^XS2,maxInclusive^XS2 andmaxExclusive^XS2 facets has 4096 or fewer values,the value is represented as ann-bit Unsigned Integer offset from the minimum value in the range wheren is ⌈ log₂m ⌉ andm is the bounded range of the schema datatype.

Otherwise, if the associated schema datatype is directly or indirectly derived from xsd:integer and theminInclusive^XS2 orminExclusive^XS2 facets specify a lower bound greater than or equal to zero (0), the value is represented as anUnsigned Integer.

Otherwise, the value is represented as a Boolean sign (see7.1.2 Boolean) followed by an Unsigned Integer (see7.1.6 Unsigned Integer). A sign value of zero (0) is used to represent positive integers and a sign value of one (1) is used to represent negative integers. For non-negative values, the Unsigned Integer holds the magnitude of the value. For negative values, the Unsigned Integer holds the magnitude of the value minus 1.

7.1.6 Unsigned Integer

The Unsigned Integer datatype representation supports unsigned integer numbers of arbitrary magnitude. It is represented as a sequence of octets terminated by an octet with its most significant bit set to 0. The value of the unsigned integer is stored in the least significant 7 bits of the octets as a sequence of 7-bit bytes, with the least significant byte first.

EXI processors SHOULD support arbitrarily large Unsigned Integer values. EXI processors MUST support Unsigned Integer values less than 2147483648.

The Unsigned Integer datatype representation can be decoded by going through the following steps.

Example 7-1.Example algorithm for decoding an Unsigned Integer

Start with the initial value set to 0 and the initial multiplier set to 1.
Read the next octet.
Multiply the value of the unsigned number represented by the 7 least significant bits of the octet by the current multiplier and add the result to the current value.
Multiply the multiplier by 128.
If the most significant bit of the octet was 1, go back to step 2.

7.1.7 QName

The QName datatype representation is a sequence of values representing the URI, local-name and prefix components of the QName in that order, where the prefix component is present only when thePreserve.prefixes option is set to true.

When the QName value is specified by a schema-informed grammar using the SE (qname) or AT (qname) terminal symbols, URI and local-name are implicit and are omitted.Similarly, when the URI of the QName value is derived from a schema-informed grammar usingSE (uri: *)or AT (uri: *)terminal symbols, URI is implicit thus omitted in the representation, and only the local-name component is encoded as a String (see7.1.10 String).Otherwise, URI and local-name components are encoded as Strings.If the QName is in no namespace, the URI is represented by a zero length String.

When present, prefixes are represented asn-bit unsigned integers (7.1.9 n-bit Unsigned Integer), wheren is⌈ log₂(N) ⌉andN is the number ofprefixes in the prefix string table partition associated with the URI of the QName or one (1) if there are no prefixes in this partition.If the givenprefix exists in the associated prefix string table partition, it is represented using the compact identifier assigned by the partition. If the givenprefix does not exist in the associated partition, the QName MUST be part of an SE event and the prefix MUST be resolved by one of the NS events immediately following the SE event (see resolution rules below). In this case, the unresolved prefix representation is not used and can be zero (0) or the compact identifier of any prefix in the associated partition.

Note:

WhenN is one, the prefix is represented using zero bits (i.e. omitted).

Given an-bit unsigned integerm that represents either the prefix value or an unresolved prefix value, the effective prefix value is determined by following the rules described below in order. A QName is in error if its prefix cannot be resolved by the rules below.

If the prefix string table partition associated with the URI of the QName assigns the compact identifierm to aprefix value, select thisprefix value as the candidateprefix value. Otherwise, there is no candidateprefix value.
If the QName value is part of an SE event followed by an associated NS event withitslocal-element-ns flag valueset to true, theprefix value is theprefix of this NS event. Otherwise, theprefix value is the candidate value, if any, selected in step 1 above.

7.1.8 Date-Time

The Date-Time datatype representation is a sequenceof values representing the individual components of the Date-Time. Thefollowing table specifies each of the possible date-time componentsalong with how they are encoded. The value ranges of the date-time components follow thedefinitions of the XML Schema specification[XML Schema Datatypes] which for example prescribes the value range of the seconds to be between 0 and 60 to account for leap second representation and hour between 0 and 24 among others.

Table 7-3. Date-Time components
Component	Value	Type
Year	Offset from 2000	Integer (7.1.5 Integer)
MonthDay	Month * 32 + Day	9-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where day is a value in the range 1-31 and month is a value in the range 1-12.
Time	((Hour * 64) + Minutes) * 64 + seconds	17-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where Hour is a value in the range 0-24, Minutes is a value in the range 0-59 and seconds is a value in the range 0-60
FractionalSecs	Fractional seconds	Unsigned Integer (7.1.6 Unsigned Integer) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros
TimeZone	TZHours * 64 + TZMinutes	11-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) representing a signed integer offset by 896 ( = 14 * 64 ) where TZHours is a value in the range [-14 .. 14] and TZMinutes is a value in the range [-59 .. 59]
presence	Boolean presence indicator	Boolean (7.1.2 Boolean)

The variety of components that constitute a value and their appearance order depend on the XML Schema type associated with the value. The following table shows which components are included in a value of each XML Schema type that is relevant to Date-Time datatype. Items listed in square brackets are included if and only if the value of its preceding presence indicator (specified above) is set to true.

Table 7-4. Assortment of Date-Time components
XML Schema Datatype	Included Components
gYear^XS2	Year, presence, [TimeZone]
gYearMonth^XS2	Year, MonthDay, presence, [TimeZone]
date^XS2	Year, MonthDay, presence, [TimeZone]
dateTime^XS2	Year, MonthDay, Time, presence, [FractionalSecs], presence, [TimeZone]
gMonth^XS2	MonthDay, presence, [TimeZone]
gMonthDay^XS2
gDay^XS2
time^XS2	Time, presence, [FractionalSecs], presence, [TimeZone]

7.1.9n-bit Unsigned Integer

When the value of thecompression option is false andthebit-packed alignment is used,then-bit Unsigned Integer datatype representation is an unsigned binary integer usingn bits.Otherwise, it is an unsigned integer using the minimum number of bytes required to storen bits. Bytes are ordered with the least significant byte first.

Then-bit unsigned integer is used for representingevent codes, the prefix component of QNames (see7.1.7 QName) and certainvalue content items, as described in respective relevant parts of this document. As described in section7.1.5 Integer, integers with a bounded range sizem equal to4096or smaller are represented asn-bit unsigned integers withn being ⌈ log₂m ⌉, as an offset from the minimum value in the range.

7.1.10 String

The String datatype representation is a length prefixed sequence ofcharacters. The length indicates the number of characters in thestring and is represented as an Unsigned Integer (see7.1.6 Unsigned Integer). If a restricted character set is defined for the string (see7.1.10.1 Restricted Character Sets), each character is represented as ann-bit Unsigned Integer (see7.1.9 n-bit Unsigned Integer). Otherwise, each character is represented by its Unicode[UNICODE]code point encoded as an Unsigned Integer (see7.1.6 Unsigned Integer).

EXI uses a string table to represent certaincontent items more efficiently. Section7.3 String Tabledescribes the string table and how it is applied to different contentitems.

7.1.10.1 Restricted Character Sets

If a string value is associated with a schemadatatype^XS2 directly or indirectly derived from xsd:string and one or more of the datatypes in its datatype hierarchy has one or more pattern facets, there may be a restricted character set defined for the string value. The following steps are used to determine the restricted character set, if any, defined for a given string value associated with such a schema datatype.

Given the schema datatype, let the target datatype definition be the definition of the most-derived datatype that has one or more pattern facets immediately specified in its definition in the schema among those in the datatype inheritance hierarchy that traces backwards towardprimitive datatypes^XS2 starting from the datatype.If the target datatype definition is a definition for abuilt-in datatype^XS2, there is no restricted character set for the string value. Otherwise,determine the set of characters for each immediate pattern facet of the target datatype definition according to sectionE DerivingSet of Charactersfrom XML Schema Regular Expressions.Then, compute the restricted set of characters for the string value as the union of all the sets of characters computed in the previous step. If the resulting set of characters contains less than256characters and contains only BMP characters, the string value has a restricted character set and each character is represented using ann-bit Unsigned Integer (see7.1.9 n-bit Unsigned Integer), wheren is ⌈ log₂(N + 1) ⌉ andN is the number of characters in the restricted character set.

The characters in the restricted character set are sorted by Unicode[UNICODE] code point and represented by integer values in the range (0 ...N−1) according to their ordinal position in the set. Characters that are not in this set are represented by then-bit Unsigned IntegerN followed by the Unicode code point of the character represented as an Unsigned Integer.

The figure below illustrates an overview of the process for determining and using restricted character sets described in this section.

Figure 7-1.String Processing Model

7.1.11 List

Values of type List are encoded as a lengthprefixed sequence of values. The length is encoded as an Unsigned Integer (see7.1.6 Unsigned Integer) and each value is encoded accordingto its type (see7. Representing Event Content).

7.2 Enumerations

When thePreserve.lexicalValues option is false,enumerated valuesare encoded asn-bit Unsigned Integers (7.1.9 n-bit Unsigned Integer) wheren = ⌈ log₂m ⌉ andm is the number of itemsin the enumerated type. The unsigned integer value assigned to each item corresponds toits ordinal position in the enumeration in schema-order starting withposition zero (0).When there are more than one item that represent the same value in the enumeration,such value can be represented using the ordinal position of any items that represent the value.

Exceptions are for schemaunion datatypes^XS2 ,list datatypes^XS2, as well as QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.

7.3 String Table

EXI uses a string table to assign "compact identifiers" to somestring values. Occurrences of string values found in the string tableare represented using the associated compact identifier rather thanencoding the entire "string literal". The string table is initially pre-populated withstring values that are likely to occur in certain contexts and isdynamically expanded to include additional string values encounteredin the document. The followingcontent items are encoded using astring table:

uris
prefixes
uri andlocal-nameinqnames
values

When a string value is found in the string table, the value is encodedusing the compact identifier and no changes are made to the string table as a result.When a string value is not found in the string table, its string literal is encodedas a String without using a compact identifier, only after whichthe string table is augmented by including the string value with an assignedcompact identifierunless the string value represents avalue content itemand fails to satisfy the criteria in effect by virtue ofvaluePartitionCapacity andvalueMaxLength options.

The string table is divided into partitions and each partition isoptimized for more frequent use of either compact identifiers or string literalsdepending on the purpose of the partition. Section7.3.1 String Table Partitions describes how EXI string table ispartitioned. Section7.3.2 Partitions Optimized for Frequent use of Compact Identifiersdescribes how string values are encoded when the associated partitionis optimized for more frequent use of compact identifiers. Section7.3.3 Partitions Optimized for Frequent use of String Literals describes how string values areencoded when the associated partition is optimized for more frequent useof string literals.

The life cycle of a string table spans the processing ofa single EXI stream. String tables are not represented in an EXI stream or exchangedbetween EXI processors. A string table cannot be reused across multiple EXI streams;therefore, EXI processors MUST use a string table that is equivalent tothe one that would have been newly created and pre-populated with initialvalues for processing each EXI stream.

7.3.1 String Table Partitions

The string table is organized into partitionsso that the indices assigned to compact identifiers can stay relatively small.Smaller number of indices results in improved average compactness and the efficiencyof table operations. Each partition has a separate set of compact identifiers andcontent items are assigned to specific partitions as described below.

Uri content items and the URI portion ofqname content items are assigned to the uripartition. The uri partition is optimized for frequent use of compact identifiers and ispre-populated with initial entries as described inD.1 Initial Entries in Uri Partition.When a schema is provided, the uri partition is also pre-populated withthe name of eachtargetnamespace URI declared in the schema,plus some of the namespace URIs used in wildcard termsand attribute wildcards(see section8.5.4.1.7 Wildcard Termsand8.5.4.1.3.2 Complex Type Grammars, respectivelyfor the condition),appended in lexicographical order.

Prefix content items are assigned to partitions basedon their associated namespace URI. Partitions containingprefix content items are optimized for frequent use of compact identifiers and thestring table is pre-populated with entries as described inD.2 Initial Entries in Prefix Partitions.

The local-name portion ofqnamecontent items are assigned to partitions based on the namespace URI oftheqname content item of which the local-name is a part.Partitions containing local-names are optimized for frequent use of stringliterals and the string table is pre-populated with entries as described inD.3 Initial Entries in Local-Name Partitions.

Eachvaluecontent item is assigned to both the global value partitionand a "local" value partition based on theqnameof the attribute or element in context at the timethe value is added to the value partitions.Partitions containingvalue content items are optimized for frequent use of string literals and are initially empty.[Definition:] The variableglobalID is a non-negative integer representing the compact identifier of the next item added to the global value partition.Its value is initially set to 0 (zero) and changes while processing an EXI stream per the rule described in7.3.3 Partitions Optimized for Frequent use of String Literals.

7.3.2 Partitions Optimized for Frequent use of Compact Identifiers

String table partitions that are expected to contain a relativelysmall number of entries used repeatedly throughout the document areoptimized for the frequent use of compact identifiers. This includes theuri partition andall partitions containingprefix content items.

When a string value is found in a partition optimized for frequent use of compact identifiers,the string value is represented as the value (i+1)encoded as ann-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer), wherei is the value of the compact identifier,n is⌈ log₂ (m+1) ⌉ andm is the number ofentries in the string table partition at the time of the operation.

When a string value is not found in a partition optimized for frequent use of compact identifiers,the String value is represented as zero (0) encoded as ann-bit Unsigned Integer, followed by the string literalencoded as a String (7.1.10 String). Afterencoding the String value, it is added to the string table partitionand assigned the next available compact identifierm.

7.3.3 Partitions Optimized for Frequent use of String Literals

The remaining string table partitions are optimized forthe frequent use of string literals. This includes all string table partitions containinglocal-namesand all string table partitions containingvalue contentitems.

When a string value is found in the partitions containinglocal-names, thestring value is represented as zero (0) encoded as an Unsigned Integer (see7.1.6 Unsigned Integer) followed bythe compact identifier of the string value. The compact identifier of the stringvalue is encoded as ann-bit unsigned integer (7.1.9 n-bit Unsigned Integer), wheren is ⌈ log₂m ⌉ andm isthe number of entries in the string table partition at the time of the operation.

When a string value is not found in the partitions containinglocal-names, itsstring literal is encoded as a String (see7.1.10 String) with the length of the string incrementedby one. After encoding the string value, it is added to the stringtable partition and assigned the next available compactidentifierm.

As described above, eachvalue content item is assignedto two partitions, a "local" value partition and the globalvalue partition.When a string value is found in the global or "local" partition, it is represented using a compact identifier. When a string value is found in the "local" value partition,the string value may be represented as zero (0) encoded as an Unsigned Integer (see7.1.6 Unsigned Integer) followed by the compact identifierof the string value in the "local" value partition.When a string value is found in the global value partition, the String value may be represented as one (1) encoded as anUnsigned Integer (see7.1.6 Unsigned Integer) followed by the compactidentifier of the String value in the global valuepartition. The compact identifier is encoded as ann-bitunsigned integer (7.1.9 n-bit Unsigned Integer), wheren is ⌈ log₂m ⌉ andm is the number of entries in theassociated partition at the time of the operation.

When a string valueS is not found in the global or "local"value partition, its string literal is encoded as aString (see7.1.10 String) with the lengthL + 2 (incremented by two) whereL is the number of characters in the string value.IfvaluePartitionCapacity is not zero, andL is greater than zero and no more thanvalueMaxLength, the stringS is added to the associated "local" value partition using the next available compact identifierm and added to the global value partition using the compact identifierglobalID. WhenS is added to the global value partition and there was already a stringV in the global value partition associated with the compact identifierglobalID, the stringS replaces the stringV in the global table, and the stringV is removed from its associated local value partition by rendering its compact identifier permanently unassigned. When the string value is added to the global value partition, the value ofglobalID is incremented by one (1). If the resulting value ofglobalID is equal tovaluePartitionCapacity, its value is reset to zero (0)

7.4 Datatype Representation Map

By default, each typed value in an EXI stream is represented using itsdefault built-in EXI datatype representation (seeTable 7-1).However,[Definition:] EXI processors MAY provide the capability to specify alternate built-in EXI datatype representations oruser-defined datatype representations for specific schemadatatypes^XS2.This capability is called aDatatype Representation Map.

Note:

This feature is relevant only to simple types in the schema.EXI does not provide a way for applications to infuse custom representations of structured data bound to complex types into the format.

EXI processors that support Datatype Representation Maps MAY provide implementation specific means to define and install user-defined datatype representations. EXI processors MAY also provide implementation specific means for applications or users to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema datatypes. As with the default EXI datatype representations, alternate datatype representations are used for the associated XML Schema types specified in the Datatype Representation Map and XML Schema datatypes derived from those datatypes. When there are built-in or user-defined datatype representations associated with more than one XML Schema datatype in the type hierarchy of a particular datatype, the closest ancestor with an associated datatype representation is used to determine the EXI datatype representation. For XML Schema datatypes with enumerated values, the encoding rule described in7.2 Enumerations is used as the representation when the closest ancestor datatype with an associated datatype representation has no enumerated values.

When an EXI processor encodes an EXI stream usinga Datatype Representation Map and theEXI Options part of the header is present, the EXI options part MUST specify all alternate datatype representations used in the EXI stream.An EXI processor that attempts to decode anEXI stream that specifies a user-defined datatype representation in the EXI header thatit does not recognize MAY report a warning, but this is not anerror. However, when an EXI processor encounters a typed value thatwas encoded by a user-defined datatype representation that it does not support, it MUSTreport an error.

The EXI options header, when it appears in an EXI stream, MUST include a"datatypeRepresentationMap" element for eachschema datatypeof which the descendant datatypes derived by restriction as well as itself arenot represented using the defaultbuilt-in EXI datatype representation.The "datatypeRepresentationMap" element includes two child elements.Theqname ofthe first child element identifies the schema datatype that is notrepresented using the defaultbuilt-in EXI datatype representationand theqname of thesecond child element identifies the alternatebuilt-in EXI datatype representation or user-defined datatype representationused to represent that type.Built-in EXI datatype representations are identified by the type identifiers inTable 7-1.

For example, the following "datatypeRepresentationMap" element indicates all values of type xsd:decimal are represented using the built-in exi:string datatype representation. In addition, all datatypes directly or indirectly derived from xsd:decimal by restriction that do not have a closer ancestor in the type hierarchy with an associated datatype representation are represented using exi:string.

Example 7-2.datatypeRepresentationMap indicating all Decimal values are represented usingbuilt-in String datatype representation

    <exi:datatypeRepresentationMap xmlns:xsd="http://www.w3.org/2001/XMLSchema">        <xsd:decimal/>        <exi:string/>    </exi:datatypeRepresentationMap>

It is the responsibility of an EXI processor to interface with a particular implementation ofbuilt-in EXI datatype representationsor user-defineddatatype representationsproperly. In the example above, an EXI processor may need to provide a string value of the data being processed that is typed as xsd:decimal in order to interface withan implementation of built-in String datatype representation.In such a case, some EXI processors may have started with a decimal value and such processors may well translate the value into a string before passing the data tothe implementation of built-in String datatype representationwhile other EXI processors may already have a string value of the data so that it can pass the value directly tothe implementation of built-in String datatype representationwithout any translation.

As another example, the following"datatypeRepresentationMap" element indicates allvalues of the user-definedsimple typegeo:geometricSurfaceand the datatypes derived from it by restrictionare representedusing the user-defined datatype representation geo:geometricInterpolator:

Example 7-3.datatypeRepresentationMap illustrating a user-defined type represented by a user-defined datatype representation

    <exi:datatypeRepresentationMap xmlns:geo="http://example.com/Geometry">        <geo:geometricSurface/><geo:geometricInterpolator/>    </exi:datatypeRepresentationMap>

Note:

EXI only defines a way to indicate the use of user-defined datatype representations for representing values of specific datatypes.Datatype representations are referred to by their respectiveqnames in "datatypeRepresentationMap" elements. A datatype representation is omnipresent only if itsqname is one of those that represent built-in EXI datatype representations.For datatype representations of otherqnames, EXI does not provide nor suggest a method by which they are identified and shared between EXI Processors.This suggests that the use of user-defined (i.e. custom) datatype representationsneeds to be restrained by weighing alternatives and considering the consequences of each in pros and cons, in order to avoid unruly proliferation of documents that use such datatype representations.Those applications that ever find Datatype Representation Map useful should make sure that they exchange such documents only among the parties that are pre-known or discovered to be able to process the user-defined datatype representations that are in use. Otherwise, if it is not for certain if a receiver understands the particular user-defined datatype representations, the sender should never attempt to send documents that use user-defined datatype representations to that recipient.

8. EXI Grammars

EXI is a knowledge based encoding that uses a set of grammars todetermine which events are most likely to occur at any given point inan EXI stream and encodes the most likely alternatives in fewerbits. It does this by mapping the stream of events to a lower entropyset of representative values and encoding those values using a set ofsimple variable length codes or an EXI compression algorithm.

The result is a very simple, small algorithm that uniformly handlesschema-less encoding, schema-informed encoding, schema deviations,and any combination thereof in EXI streams. These variations donot require different algorithms or different parsers, they are simplyinformed by different combinations of grammars.

The following sections describe the grammars used to inform the EXI encoding.

Note:

The grammar semantics in this specification are written for clarity and generality. They do not prescribe a particular implementation approach.

8.1 Grammar Notation

8.1.1 Fixed Event Codes

Each grammar production has anevent code, which is represented by a sequence of one to three parts separated by periods ("."). Each part is an unsigned integer. The following are examples of grammar productions withevent codes as they appear in this specification.

Example 8-1.Example productions with fixed event codes


Productions		Event Codes

LeftHandSide₁ :
	Terminal₁ NonTerminal₁	0
	Terminal₂ NonTerminal₂	1
	Terminal₃ NonTerminal₃	2.0
	Terminal₄ NonTerminal₄	2.1
	Terminal₅ NonTerminal₅	2.2.0
	Terminal₆ NonTerminal₆	2.2.1

LeftHandSide₂ :
	Terminal₁ NonTerminal₁	0
	Terminal₂ NonTerminal₂	1.0
	Terminal₃ NonTerminal₃	1.1

The number of parts in a givenevent code is called the event code's length. No two productions with the same non-terminal symbol on the left-hand side are permitted to have the sameevent code.

8.1.2 Variable Event Codes

Some non-terminal symbols are used on the right-hand side in a production withouta terminal symbol prefixed to them, but with a parenthesized event code affixed instead.Such non-terminal symbols are macros and they are used to capture some recurring set of productionsassymbols so that a symbol can be used in the grammar representation instead of including all the productions the macro represents in place every time it is used.

Example 8-2.Example productions that use macro non-terminal symbols


ABigProduction₁ :
	Terminal₁ NonTerminal₁	0
	Terminal₂ NonTerminal₂	1
	LEFTHANDSIDE₁ (2.0)	2.0

ABigProduction₂ :
	Terminal₁ NonTerminal₁	0
	LEFTHANDSIDE₁ (1.1)	1.1
	Terminal₂ NonTerminal₂	1.2

Because non-terminal macros are injected into the right-hand side of more than one production,theevent codes of productions with these macro non-terminals on the left-hand side are not fixed, but will have different event code values depending on the context in which the macro non-terminal appears. This specification calls these variable event codes and uses variables in place of individual event code parts to indicate the event code parts are determined by the context. Below are some examples of variable event codes:

Example 8-3.Example non-terminal macros and its productions with variable event codes


LEFTHANDSIDE₁ (n.m) :
	TERMINAL₁ NONTERMINAL₁	n.0
	TERMINAL₂ NONTERMINAL₂	n.1
	TERMINAL₃ NONTERMINAL₃	n.m+2
	TERMINAL₄ NONTERMINAL₄	n.m+3
	TERMINAL₅ NONTERMINAL₅	n.m+4.0
	TERMINAL₆ NONTERMINAL₆	n.m+4.1

Unless otherwise specified, the variablen evaluates to the first part of theevent code of the production in which the macro non-terminalLEFTHANDSIDE₁ appears on the right-hand side. Similarly, the expressionn.m represents the first two parts of theevent code of the production in which the macro non-terminalLEFTHANDSIDE₁ appears on the right-hand side.

Non-terminal macros are used in this specification for notational convenience only.They are not non-terminals, even though they are used in place of non-terminals.Productions that use non-terminal macros on the right-hand side need to be expanded by macro substitution before such productions are interpreted.Therefore,ABigProduction₁ andABigProduction₂ shown in the preceding example are equivalent to the following set of productions obtained by expanding the non-terminal macro symbolLEFTHANDSIDE₁ and evaluating the variable event codes.

Example 8-4.Expanded productions equivalent to the productions used above


ABigProduction₁ :
	Terminal₁ NonTerminal₁	0
	Terminal₂ NonTerminal₂	1
	TERMINAL₁ NONTERMINAL₁	2.0
	TERMINAL₂ NONTERMINAL₂	2.1
	TERMINAL₃ NONTERMINAL₃	2.2
	TERMINAL₄ NONTERMINAL₄	2.3
	TERMINAL₅ NONTERMINAL₅	2.4.0
	TERMINAL₆ NONTERMINAL₆	2.4.1

ABigProduction₂ :
	Terminal₁ NonTerminal₁	0
	TERMINAL₁ NONTERMINAL₁	1.0
	TERMINAL₂ NONTERMINAL₂	1.1
	Terminal₂ NonTerminal₂	1.2
	TERMINAL₃ NONTERMINAL₃	1.3
	TERMINAL₄ NONTERMINAL₄	1.4
	TERMINAL₅ NONTERMINAL₅	1.5.0
	TERMINAL₆ NONTERMINAL₆	1.5.1

8.2 Grammar Event Codes

Each production rule in the EXI grammar includes anevent code value that approximates the likelihood the associated production rule will be matched over the other productions with the same left-hand-side non-terminal symbol. Ultimately, theevent codes determine the value(s) by which each non-terminal symbol will be represented in the EXI stream.

To understand how a givenevent code approximates the likelihood a given production will match, it is useful to visualize theevent codes for a set of production rules that have the same non-terminal symbol on the left-hand side as a tree. For example, the following set of productions:

Example 8-5.Example productions with event codes

ElementContent :
	EE	0
	SE () ElementContent*	1.0
	CH ElementContent	1.1
	ER ElementContent	1.2
	CM ElementContent	1.3.0
	PI ElementContent	1.3.1

represents a set of information items that might occur as element content after the start tag. Using the productionevent codes, we can visualize this set of productions as follows:

Figure 8-1.Event code tree for ElementContent grammar

where theterminal symbolsare represented by the leaf nodes of the tree, and theevent code of each production ruledefines a path from the root of the tree to the nodethat represents the terminal symbol that is on the right-hand side of the production.We call this the event code tree for a given set of productions.

An event code tree is similar to a Huffman tree[Huffman Coding] in that shorter paths are generally used for symbols that are considered more likely. However, event code trees are far simpler and less costly to compute and maintain. Event code trees are shallow and contain at most three levels. In addition, the length of eachevent code in the event code tree is assigned statically without analyzing the data. This classification provides some of the benefits of a Huffman tree without the cost.

8.3 Pruning Unneeded Productions

As discussed in section6.3 Fidelity Options, applications MAY provide a set of fidelity options to specify the XML features they require. EXI processors MUST use these fidelity options to prunethe productions of which the terminal symbols represent the events that are not required from the grammars,improving compactness and processing efficiency.

For example, the following set of productions represent the set of information items that might occur as element content after the start tag.

Example 8-6.Example productions with full fidelity

ElementContent :
	EE	0
	SE () ElementContent*	1.0
	CH ElementContent	1.1
	ER ElementContent	1.2
	CM ElementContent	1.3.0
	PI ElementContent	1.3.1

If an application sets the fidelity options Preserve.comments, Preserve.pis and Preserve.dtd to false, the productions matching comment (CM), processing instruction (PI) and entity reference (ER) events are pruned from the grammar, producing the following set of productions:

Example 8-7.Example productions after pruning

ElementContent :
	EE	0
	SE () ElementContent*	1.0
	CH ElementContent	1.1

Removing these productions from the grammar tells EXI processors that comments and processing instructions will never occur in the EXI stream, which reduces the entropy of the stream allowing it to be encoded in fewer bits.

Each time a production is removed from a grammar, theevent codes of the other productions with the same non-terminal symbol on the left-hand side MUST be adjusted to keep them contiguous if its removal has left the remaining productions with non-contiguous event codes.

8.4 Built-in XML Grammars

This section describes the built-in XML grammars used by EXI when no schema information is available or when available schema information describes only portions of the EXI stream.

The built-in XML grammars are dynamic and continuously evolve to reflect knowledge learned while processing an EXI stream. Newbuilt-in element grammars are created to describe the content of newly encountered elements and new grammar productions are added to refine existing built-in grammars. Newly learned grammars and productions are used to more efficiently represent subsequent events in the EXI stream. All newly createdbuilt-in element grammars areglobal element grammars.

[Definition:] Aglobal element grammar is a grammar describing the content of an element that has global scope (i.e. a global element). At the onset of processing an EXI stream, the set of global element grammars is the set of all schema-informed element grammars derived from element declarations that have a {scope} property ofglobal. Eachbuilt-in element grammar created while processing an EXI stream is added to the set of global element grammars. Each global elementgrammarhas a uniqueqname.

8.4.1 Built-in Document Grammar

In the absence of schema information describing the content of the EXI stream, the following grammar describes the events that will occur in anEXI document.

		Event Code

Document :
	SDDocContent	0

DocContent :
	SE ()DocEnd*	0
	DTDocContent	1.0
	CMDocContent	1.1.0
	PIDocContent	1.1.1

DocEnd :
	ED	0
	CMDocEnd	1.0
	PIDocEnd	1.1

Semantics:

All productions in the built-in document grammars of the formLeftHandSide : SE (*)RightHandSideare evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of event sequence usingRightHandSide.

8.4.2 Built-in Fragment Grammar

In the absence of schema information describing the contents of an EXI stream, the following grammar describes the events that may occur in anEXI fragment. The grammar below represents the initial set of productions in the built-in fragment grammar at the start of EXI stream processing. The associated semantics explain how the built-in fragment grammar evolves to more efficiently represent subsequent events in the EXI stream.

		Event Code

Fragment :
	SDFragmentContent	0

FragmentContent :
	SE ()FragmentContent*	0
	ED	1
	CMFragmentContent	2.0
	PIFragmentContent	2.1

Semantics:

All productions in the built-in fragment grammars of the formLeftHandSide : SE (*)RightHandSideare evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Create a production of the formLeftHandSide : SE (qname)RightHandSide with anevent code 0
Increment the first part of theevent code of each production in the current grammar with the non-terminalLeftHandSide on the left-hand side
Add the production created in step3to the grammar
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of event sequence usingRightHandSide.

All productions of the formLeftHandSide : SE (qname)RightHandSide that were previously added to the grammar upon the first occurrence of the element that has theqname qname are evaluated as follows when they are matched:

Evaluate the element content using theglobal element grammar for elementqname
Evaluate the remainder of event sequence usingRightHandSide.

8.4.3 Built-in Element Grammar

[Definition:] When no grammar exists for an element occurring in an EXI stream, abuilt-in element grammar is created for that element.Built-in element grammars are initially generic and are progressively refined as the specific content for the associated element is learned. All built-in element grammars areglobal element grammars and can be uniquely identified by the qname of the global element they describe. At the outset of processing an EXI stream, the set of built-in element grammars is empty.

Below is the initial set of productions used for all newly createdbuilt-in element grammars. The semantics describe how productions are added to eachbuilt-in element grammar as the content of the associated element is learned.

		Event Code

StartTagContent :
	EE	0.0
	AT ()StartTagContent*	0.1
	NSStartTagContent	0.2
	SCFragment	0.3
	ChildContentItems (0.4)

ElementContent :
	EE	0
	ChildContentItems (1.0)

ChildContentItems (n.m) :
	SE ()ElementContent*	n.m
	CHElementContent	n.(m+1)
	ERElementContent	n.(m+2)
	CMElementContent	n.(m+3).0
	PIElementContent	n.(m+3).1

Note:

When the value of theselfContained option is false,the production with the terminal symbol SC on the right-hand side is absent from the above grammar, andthe use of the non-terminal macroChildContentItems that hasStartTagContent non-terminal on the left-hand side (shown in bold above) gets expanded with variable values (0.3) instead of (0.4) used above.
When a xsi:type attribute appears in an element where thebuilt-in element grammar is in effect, it MUST occur before any other AT events of the same elementunless it is known that xsi:type attribute will not impact grammar selection.
When a xsi:nil attribute appears in an element where thebuilt-in element grammar is in effect, it does not impact grammar selection and is not strictly required to occur before other AT events of the same element.

Semantics:

All productions in thebuilt-in element grammar of the formLeftHandSide: AT (*)RightHandSide are evaluated as follows:

Letqname be theqname of the attribute matched by AT (*)
Ifqname is not xsi:type or if a production of the formLeftHandSide : AT (xsi:type) with anevent code of length 1 does not exist in the current element grammar, create a production of the formLeftHandSide : AT (qname)RightHandSidewith anevent code 0 and increment the first part of the event code of each production in the current grammar with the non-terminalLeftHandSide on the left-hand side. Add this production to the grammar.
Ifqname is xsi:type, lettarget-type be the value of the xsi:type attribute and assign it the QName datatype representation (see7.1.7 QName). If there is no namespace in scope for the specified qname prefix, set theuri oftarget-type to empty ("") and thelocalName to the full lexical value of the QName, including the prefix. Encodetarget-type according to section7. Representing Event Content. If a grammar can be found for thetarget-type type using the encodedtarget-type representation, evaluate the element contents using the grammar fortarget-type type instead ofRightHandSide.

The production of the formLeftHandSide : AT (xsi:type)RightHandSide that was previously added to the grammar upon the first occurrence of the xsi:type attribute is evaluated as follows when it is matched:

Lettarget-type be the value of the xsi:type attribute and assign it the QName datatype representation (see7.1.7 QName).
If there is no namespace in scope for the specifiedqname prefix, set theuri oftarget-type to empty ("") and thelocalName to the full lexical value of the QName, including the prefix.
Represent the value oftarget-type according to section7. Representing Event Content.
If a grammar can be found for thetarget-type type using the encodedtarget-type representation,evaluate the element contents using the grammar fortarget-type type instead ofRightHandSide.

All productions of the formLeftHandSide : SCFragment are evaluated as follows:

Save the string table, grammars and any implementation-specific state learned while processing this EXI Body.
Initialize the string table, grammars and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body.
Skip to the next byte-aligned boundary in the streamif it is not already at such a boundary.
Letqname be theqname of the SE event immediately preceding this SC event.
Letcontent be the sequence of events following this SC event that match the grammar for elementqname, up to and including the terminating EE event.
Evaluate the sequence of events (SD, SE(qname),content, ED) according to theFragment grammar (see8.4.2 Built-in Fragment Grammar).
Skip to the next byte-aligned boundary in the streamif it is not already at such a boundary.
Restore the string table, grammars and implementation-specific state learned while processing this EXI Body to that saved in step 1 above.

All productions in thebuilt-in element grammar of the formLeftHandSide : SE (*)RightHandSide are evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Create a production of the formLeftHandSide : SE (qname)RightHandSide with anevent code 0
Increment the first part of theevent code of each production in the current grammar with the non-terminalLeftHandSide on the left-hand side
Add the production created in step3to the grammar
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of event sequence usingRightHandSide.

Evaluate the element content using theglobal element grammar for elementqname
Evaluate the remainder of event sequence usingRightHandSide.

All productions in thebuilt-in element grammar of the formLeftHandSide : CHRightHandSide are evaluated as follows:

If a production of the form,LeftHandSide : CHRightHandSide with anevent code of length 1 does not exist in the current element grammar, create one with event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminalLeftHandSide on the left-hand side.
Add the production created in step 1 to the grammar
Evaluate the remainder of event sequence usingRightHandSide.

All productions in thebuilt-in element grammar of the formLeftHandSide : EEare evaluated as follows:

If a production of the form,LeftHandSide : EEwith anevent code of length 1 does not exist in the current element grammar, create one with event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminalLeftHandSide on the left-hand side.
Add the production created in step 1 to the grammar

8.5 Schema-informed Grammars

This section describes the schema-informed grammars used by EXI when schema information is available to describe the contents of theEXI stream.Schema information used for processing an EXI stream is either indicated by the header optionschemaId, or communicated out-of-band in the absence ofschemaId.

Schema-informed grammars accept all XML documents and fragments regardless of whether and how closely they match the schema. TheEXI stream encoder encodes individual events using schema-informed grammars where they are available and falls back to the built-in XML grammars where they are not. In general, events for which a schema-informed grammar exists will be encoded more efficiently.

Unlike built-in XML grammars, schema-informed grammars are static and do not evolve, which permits the reuse of schema-informed grammars across the processing of multiple EXI streams. This is a single outstanding difference between the two grammar systems.

It is important to note that schema-informed and built-in grammars are often used together within the context of a singleEXI stream. While processing a schema-informed grammar, built-in grammars may be created to represent schema deviations or elements that match wildcards declared in the schema. Even though these built-in grammars occur in the context of a schema-informed stream, they are still dynamic and evolve to represent content learned while processing the EXI stream as is described in8.4 Built-in XML Grammars.

8.5.1 Schema-informed Document Grammar

When schema information is available to describe the contents of anEXI stream, the following grammar describes the events that will occur in anEXI document.

Syntax

Event Code

Document :

SDDocContent

DocContent :

SE (G₀)DocEnd

SE (G₁)DocEnd

⋮

SE (G_n−1)DocEnd

n−1

SE (*)DocEnd

DTDocContent

(n+1).0

CMDocContent

(n+1).1.0

PIDocContent

(n+1).1.1

Note:

The variablen in the grammar above is the number of global elements declared in the schema.G ₀, G ₁, ... G _n−1 represent all theqnames of global elements sorted lexicographically, first by local-name, then by uri.

DocEnd :

CMDocEnd

1.0

PIDocEnd

1.1

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : SE (*)RightHandSide are evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of event sequence usingRightHandSide

8.5.2 Schema-informed Fragment Grammar

When schema information is available to describe the contents of anEXI stream, the following grammar describes the events that will occur in anEXI fragment.

Syntax

Event Code

Fragment :

SDFragmentContent

FragmentContent :

SE (F₀)FragmentContent

SE (F₁)FragmentContent

⋮

SE (F_n−1)FragmentContent

n−1

SE (*)FragmentContent

n+1

CMFragmentContent

(n+2).0

PIFragmentContent

(n+2).1

Note:

The variablen in the grammar above represents the number of unique elementqnames declared in the schema. The variables F ₀, F ₁, ... F _n−1 represent theseqnames sorted lexicographically, first by local-name, then by uri. If there is more than one element declared with the sameqname, theqname is included only once.If all such elements have the same schema type name and {nillable} property value, their content is evaluated according to the specific grammar for that element declaration.Otherwise, their content is evaluated according to the relaxed Element Fragment grammar described in8.5.3 Schema-informed Element Fragment Grammar.

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : SE (*)RightHandSide are evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of event sequence usingRightHandSide

8.5.3 Schema-informed Element Fragment Grammar

[Definition:] When schema information is available to describe the contents of anEXI stream and more than one element is declared with the sameqname,but not all such elements have the same type name and {nillable} property value,theSchema-informed Element Fragment Grammar are used for processing the events that may occur in such elements when they occur inside an EXI fragment or EXI Element Fragment.The schema-informed element fragment grammar consists ofElementFragment andElementFragmentTypeEmpty which are defined below.ElementFragment is a grammar that accounts both element declarations and attribute declarations in the schemas, whereasElementFragmentTypeEmpty is a grammar that regards only attribute declarations.

Syntax

Event Code

ElementFragment₀ :

AT (A₀) [schema-typed value]ElementFragment₀

AT (A₁) [schema-typed value]ElementFragment₀

⋮

AT (A_n−1) [schema-typed value]ElementFragment₀

n−1

AT (*)ElementFragment₀

SE (F₀)ElementFragment₂

n+1

SE (F₁)ElementFragment₂

n+2

⋮

SE (F_m-1)ElementFragment₂

n+m

SE (*)ElementFragment₂

n+m+1

n+m+2

CH [untyped value]ElementFragment₂

n+m+3

ElementFragment₁ :

SE (F₀)ElementFragment₂

SE (F₁)ElementFragment₂

⋮

SE (F_m-1)ElementFragment₂

m-1

SE (*)ElementFragment₂

m+1

CH [untyped value]ElementFragment₂

m+2

ElementFragment₂ :

SE (F₀)ElementFragment₂

SE (F₁)ElementFragment₂

⋮

SE (F_m-1)ElementFragment₂

m-1

SE (*)ElementFragment₂

m+1

CH [untyped value]ElementFragment₂

m+2

ElementFragmentTypeEmpty₀ :

AT (A₀) [schema-typed value]ElementFragmentTypeEmpty₀

AT (A₁) [schema-typed value]ElementFragmentTypeEmpty₀

⋮

AT (A_n−1) [schema-typed value]ElementFragmentTypeEmpty₀

n−1

AT (*)ElementFragmentTypeEmpty₀

n+1

ElementFragmentTypeEmpty₁ :

Note:

The variablen in the grammar above representsthe number of uniqueqnames given to explicitly declared attributes in the schema.The variables A₀ , A₁, ... A_n−1 represent these qnames sorted lexicographically, first by local-name, then by uri. If there is more than one attribute declared with the sameqname, the qname is included only once.If all such attributes have the same schema type name, theirvalue is represented using that type.Otherwise, theirvalue is represented as a String.

The variablem in the grammar above represents the number of unique elementqnames declared in the schema. The variables F₀, F₁, ... F_m-1 represent these qnames sorted lexicographically, first by local-name, then by uri. If there is more than one element declared with the sameqname, the qname is included only once.If all such elements have the same type name and {nillable} property value, their content is evaluated according to specific grammar for that element declaration.Otherwise, their content is evaluated according to the relaxed Element Fragment grammar described above.

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : SE (*)RightHandSide are evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of event sequence usingRightHandSide

All productions in the schema-informed element fragment grammar of the formLeftHandSide: AT (*)RightHandSide are evaluated as follows:

Letqname be theqname of the attribute matched by AT (*)
If a global attribute definition exists forqname, letglobal-type be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated withglobal-type, it SHOULD be represented using the datatype representation associated withglobal-type (see7. Representing Event Content). If the attribute value is not represented using the datatype representation associated withglobal-type, represent theattribute eventusing the AT (*) [untyped value] terminal (see8.5.4.4 Undeclared Productions).

Note:

When a schema-informed grammar is in effect, xsi:type and xsi:nil attributes MUST NOT be represented using AT(*) terminal.

As with all schema informed element grammars, the schema-informed element fragment grammar is augmented with additional productions that describe events that may occur in an EXI stream, but are not explicitly declared in the schema. The process for augmenting the grammar is described in8.5.4.4 Undeclared Productions.For the purposes of this process, the schema-informed element fragment grammar is treated as though it is created from an element declaration with a {nillable} property value of true and a type declaration that has named sub-types, andElementFragmentTypeEmpty is used to serve as theTypeEmpty of the type in the process.

8.5.4 Schema-informed Element and Type Grammars

[Definition:] When one or more XML Schema is available to describe the contents of an EXI stream, aschema-informed element grammarElement_i is derived for each element declarationE_i described by the schemas, where 0 ≤i <n andn is the number of element declarations in the schema.

[Definition:] When one or more XML Schema is available to describe the contents of an EXI stream, aschema-informed type grammarType_i is derivedfor each named type declarationT_i described by the schemas as well as for each of thebuilt-in primitive types^XS2 andbuilt-in derived types^XS2, thecomplex ur-type^XS1 and thesimple ur-type^XS2 defined by XML Schema specification[XML Schema Structures][XML Schema Datatypes], where 0 ≤i <n andn is the total number of such available types.

Each schema-informed element grammar and type grammar is constructed according to the following four steps:

Create a proto-grammar that describes the content model according to available schema information (see section8.5.4.1 EXI Proto-Grammars).
Normalize the proto-grammar into an EXI grammar (see section8.5.4.2 EXI Normalized Grammars).
Assignevent codes to each production in the normalized EXI grammar (see section8.5.4.3 Event Code Assignment).
Add additional productions to the normalized EXI grammar to represent events that may occur in the EXI stream, but are not described by the schema, such as comments, processing-instructions, schema-deviations, etc. (see section8.5.4.4 Undeclared Productions).

Each element grammarElement_i includes a sequence ofn non-terminalsElement_i, j, where 0 ≤j <n. The content of the entire element is described by the first non-terminalElement_i, 0. The remaining non-terminals describe portions of the element content. Likewise, each type grammarType_i includes a sequence ofn non-terminalsType_i, j and the content of the entire type is described by the first non-terminalType_i, 0.

The algorithms expressed in this section provide a concise and formal description of the EXI grammars for a given set of XML Schema definitions. More efficient algorithms likely exist for generating these EXI grammars and EXI implementations are free to use any algorithm that produces grammars andevent codes that generate EXI encodings that match those produced by the grammars described here.

An example is provided in the appendix (seeH Schema-informed Grammar Examples) that demonstrates the process described in this section to generate a complete schema-informed element grammar from an element declarationin a schema.

8.5.4.1 EXI Proto-Grammars

This section describes the process for creating the EXI proto-grammars from XML Schema declarations and definitions. EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:

	LeftHandSide :
		RightHandSide

whereLeftHandSide andRightHandSide are both non-terminals. Whereas, all productions in a normalized EXI grammar contain exactly one terminal symbol and at most one non-terminal symbol on the right-hand side. This is a restricted form of Greibach normal form[Greibach Normal Form].

EXI proto-grammars are derived from XML Schema in a straight-forward manner and can easily be normalized with simple algorithm (see8.5.4.2 EXI Normalized Grammars).

8.5.4.1.1 Grammar Concatenation Operator

Proto-grammars are specified in a modular, constructive fashion. XML Schema components such as terms, particles, attribute uses are transformed each into a distinct proto-grammar, leveraging proto-grammars of their sub-components. At various stages of proto-grammar construction, two or more of proto-grammars are concatenated one after another to form more composite grammars.

The grammar concatenation operator ⊕ is a binary, associative operator that creates a new grammar from its left and right grammar operands. The new grammar accepts any set of symbols accepted by its left operand followed by any set of symbols accepted by its right operand.

Given a left operandGrammar^L and a right operandGrammar^R, the following operation

Grammar^L ⊕Grammar^R

creates a combined grammar by replacing each production of the form

	Grammar^L_k :
		EE

where 0 ≤k <n andn is the number of non-terminals that occur on the left-hand side of productions inGrammar^L, with a production of the form

	Grammar^L_k :
		Grammar^R₀

connecting each accept state ofGrammar^L with the start state ofGrammar^R.

8.5.4.1.2 Element Grammars

This section describes the process for creating an EXI element grammar from an XML Schemaelement declaration^XS1.

Given an element declarationE_i, with properties {name}, {target namespace}, {type definition}, {scope} and {nillable}, create a corresponding EXI grammarElement_i for evaluating the contents of elements in the specified {scope} withqname local-name = {name} andqname uri = {target namespace}whereqname is theqname of the elements.

LetT_j be the {type definition} ofE_i andType_j be the type grammar created fromT_j. The grammarElement_i describing the content model ofE_i is created as follows.

Syntax:
	Element_i , 0 :
		Type_j , 0

8.5.4.1.3 Type Grammars

Given an XML Schema type definitionT_iwith properties {name} and {target namespace},two type grammars are created, which are denoted byType_i andTypeEmpty_i.[Definition:] Type_i is a grammar that fully reflects the type definition ofT_i, whereas[Definition:] TypeEmpty_i is a grammar thatregardsonly the attribute uses and attribute wildcards ofT_i, if any.

The grammarType_i is used for evaluating the content of elements that are defined to be of typeT_i in the schema.[Definition:] Type_i is aglobal type grammar whenT_i is a named type.Type_i, when it is a global type grammar, can additionally be used as the effective grammar designated by a xsi:type attribute with the attribute value that is aqname with local-name = {name} and uri = {target namespace}.TypeEmpty_i is used in place ofType_i when the element instance that is being evaluated has a xsi:nil attribute with the valuetrue.

Sections8.5.4.1.3.1 Simple Type Grammars and8.5.4.1.3.2 Complex Type Grammars describe the processes for creatingType_i andTypeEmpty_i from XML Schemasimple type definitions^XS1 andcomplex type definitions^XS1 defined in schemas as well asbuilt-in primitive types^XS2,built-in derived types^XS2,simple ur-type^XS2 andcomplex ur-type^XS1 defined by XML Schema specification[XML Schema Datatypes].

8.5.4.1.3.1 Simple Type Grammars

This section describes the process for creating an EXI type grammar from an XML Schemasimple type definition^XS1.

Given a simple type definitionT_i, create two new EXI grammarsType_i andTypeEmpty_ifollowing the procedure described below.

Add the following grammar productions toType_i andTypeEmpty_i :

Syntax:
	Type_i, 0 :
		CH [schema-typed value]Type_i, 1
	Type_i, 1 :
		EE

	TypeEmpty_i, 0 :
		EE

Note:

Productions of the formLeftHandSide : CH [schema-typed value]RightHandSide representtyped character data that can be represented using the EXI datatype representation associated with the simple type definition (see7. Representing Event Content).Character data that can be represented using the EXI datatype representation associated with the simple type definition SHOULD be represented this way. Character data that is not represented using the EXI datatype representation associated with the simple type definition is represented by productions of the formLeftHandSide : CH [untyped value]RightHandSide described in section8.5.4.4 Undeclared Productions.

8.5.4.1.3.2 Complex Type Grammars

This section describes the process for creating an EXI type grammar from an XML Schemacomplex type definition^XS1.

Given a complex type definitionT_i, with properties {name}, {target namespace},{attribute uses}, {attribute wildcard} and {content type}, create two EXI grammarsType_i andTypeEmpty_ifollowing the procedure described below.

Generate a grammarAttribute_i, for each attribute useA_i in {attribute uses} according to section8.5.4.1.4 Attribute Uses.

Sort the attribute use grammars first by qname local-name, then by qname uri to form a sequence of grammarsG₀,G₁, …,G_n−1, wheren is the number of attribute uses in {attribute uses}.

If an {attribute wildcard} is specified, incrementn and generate additional attribute use grammarsG_n−1 as follows:

	G_n−1, 0 :
		EE

	G_n−1, 1 :
		EE

When the {attribute wildcard}'s {namespace constraint} isany, ora pair ofnot and either a namespace name or the special valueabsent indicating no namespace,add the following production to each grammarG_igenerated above,where 0 ≤ i < n :

	G_i, 0 :
		AT (*)G_i, 0

Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace names or the special valueabsent indicating no namespace,add the following production to each grammarG_igenerated abovewhere 0 ≤ i < n :

G_i, 0 :

AT(uri_x : *)G_i, 0

whereuri_x is a member value of {namespace constraint},provided that it is the empty string (i.e. "") that is used asuri_x when the member value is the special valueabsent.Eachuri_x is used to augment the uri partition of the String table.Section7.3.1 String Table Partitions describes how theseuri strings are put into String table for pre-population.

If there is neither an attribute use nor an {attribute wildcard},G₀ of the following form is used as an attribute use grammar.

	G_0, 0 :
		EE

Note:

When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other AT events of the same element, with xsi:type placed before xsi:nil when they both occur.

Semantics:

In complex type grammars,all productions of the formLeftHandSide: AT ()RightHandSide* andLeftHandSide: AT(uri_x : )RightHandSide*that stem from attribute wildcardsare evaluated as follows:

Letqname be theqname of the attribute matched by AT (*) or AT(uri_x : *)
If a global attribute definition exists forqname, letglobal-type be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated withglobal-type, it SHOULD be represented using the datatype representation associated withglobal-type (see7. Representing Event Content). If the attribute value is not represented using the datatype representation associated withglobal-type, represent theattribute eventusing the AT (*) [untyped value] terminal (see8.5.4.4 Undeclared Productions).

Note:

When a schema-informed grammar is in effect, xsi:type and xsi:nil attributes MUST NOT be represented using AT(*) terminal.

The grammarTypeEmpty_i is created by combining the sequence of attribute use grammarsterminated by an empty {content type} grammaras follows:

TypeEmpty_i =G₀ ⊕G₁ ⊕ … ⊕G_n−1⊕Content_i

where the grammarContent_i is created as follows:

	Content_i, 0 :
		EE

The grammarType_i is generated as follows.

If {content type} is a simple type definitionT_j, generate a grammarContent_iasType_jaccording to section8.5.4.1.3.1 Simple Type Grammars.If {content type} has a content model particle, generate a grammarContent_i according to section8.5.4.1.5 Particles.Otherwise, if {content type} isempty,create a grammarContent_i as follows:

	Content_i :
		EE

If {content type} is a content model particle with mixed content, add a production for each non-terminalContent_i , j inContent_i as follows:

	Content_i, j :
		CH[untyped value]Content_i, j

Note:
	The value of each Characters event that has an [untyped value] is represented as a String (see7.1.10 String).

Then, create a copyH_i of each attribute use grammarG_i and create the grammarType_i by combining this sequence of attribute use grammars and theContent_i grammar using the grammar concatenation operator defined in section8.5.4.1.1 Grammar Concatenation Operator as follows:

Type_i =H₀ ⊕H₁ ⊕ … ⊕H_n−1 ⊕Content_i

8.5.4.1.4 Attribute Uses

Given an attribute useA_i with properties {required} and {attribute declaration}, where {attribute declaration} has properties {name}, {target namespace}, {type definition} and {scope}, generate a new EXI grammarAttribute_i for evaluating attributes in the specified {scope} withqname local-name = {name} andqname uri = {target namespace}whereqname is theqname of the attributes.Add the following grammar productions toAttribute_i:

	Attribute_i, 0 :
		AT(qname) [schema-typed value]Attribute_i, 1

	Attribute_i, 1 :
		EE

If the {required} property ofA_i is false, add the following grammar production to indicate this attribute occurrence may be omitted from the content model.

	Attribute_i, 0 :
		EE

Note:

Productions of the formLeftHandSide : AT(qname) [schema-typed value]RightHandSide represent typed attributes that occur in schema-valid contexts with values that can be represented using the EXI datatype representation associated with the attribute's {type definition} (see7. Representing Event Content).Attributes that occur in schema-valid contexts that can be represented using the EXI datatype representation associated with the attribute's {type definition}, SHOULD be represented this way. Attributes that are not represented this way, are represented using the alternate forms of AT events described in section8.5.4.4 Undeclared Productions.

8.5.4.1.5 Particles

Givenan XML Schemaparticle^XS1P_i with {min occurs}, {max occurs} and {term} properties, generate a grammarParticle_i for evaluating instances ofP_i as follows.

If {term} is an element declaration, generate the grammarTerm₀ according to section8.5.4.1.6 Element Terms. If {term} is a wildcard, generate the grammarTerm₀ according to section8.5.4.1.7 Wildcard Terms Wildcard Terms. If {term} is a model group, generate the grammarTerm₀ according to section8.5.4.1.8 Model Group Terms.

Create {min occurs} copies ofTerm₀.

G₀,G₁, …,G_{{min occurs}-1}

If {max occurs} is not unbounded, create {max occurs} – {min occurs} additional copies ofTerm₀,

G_{{min occurs}},G_{{min occurs}+1}, …,G_{{max occurs}-1}

Add the following productions to each of the grammars that do not already have a production of this form.

	G_i, 0 :
		EE where {min occurs} ≤ i < {max occurs}

indicating these instances ofTerm₀ may be omitted from the content model. Then, create the grammar forParticle_i using the grammar concatenation operator defined in section8.5.4.1.1 Grammar Concatenation Operator as follows:

Particle_i =G₀ ⊕G₁ ⊕ … ⊕G_{{max occurs}-1}

Otherwise, if {max occurs} is unbounded, generate one additional copy ofTerm₀,G_{{min occurs}} and replace all productions of the form:

	G_{{min occurs}, k} :
		EE

with productions of the form:

	G_{{min occurs}, k} :
		G_{{min occurs}, 0}

indicating this term may be repeated indefinitely. Then, when there is no more production of the form:

	G_{{min occurs}, 0} :
		EE

add one after the other productions with the non-terminalG_{{min occurs}, 0} on the left-hand side, indicating this term may be omitted from the content model. Then, create the grammar forParticle_i using the grammar concatenation operator defined in section8.5.4.1.1 Grammar Concatenation Operator as follows:

Particle_i =G₀ ⊕G₁ ⊕ … ⊕G_{{min occurs}}

8.5.4.1.6 Element Terms

Given a particle {term}PT_i that is an XML Schemaelement declaration^XS1with properties {name},{substitution group affiliation}and {target namespace}, letS be the set of element declarations that directly or indirectly reaches the element declarationPT_i through the chain of {substitution group affiliation} property of the elements, plusPT_i itself if was not in the set. Sort the element declarations inS lexicographically first by {name} then by {target namespace}, which makes a sorted list of element declarationsE₀,E₁, …E_n−1 wheren is the cardinality ofS. Then create the grammarParticleTerm_i with the following grammar productions:

Syntax:

	ParticleTerm_i, 0 :
		SE(qname₀)ParticleTerm_i, 1
		SE(qname₁)ParticleTerm_i, 1
		⋮
		SE(qname_n−1)ParticleTerm_i, 1

	ParticleTerm_i, 1 :
		EE

Note:

	In the productions above,qname_x (where 0 ≤ x < n) represents aqname of which local-name and uri are {name} property and {target namespace} property of the element declarationE_x, respectively.

Semantics:

	In a schema-informed grammar, all productions of the formLeftHandSide : SE(qname)RightHandSide are evaluated as follows: Evaluate the element contents using the SE(qname) grammar. Evaluate the remainder of the event sequence usingRightHandSide

8.5.4.1.7 Wildcard Terms

Given a particle {term}PT_i that is an XML Schemawildcard^XS1with property {namespace constraint}, a grammar that reflects the wildcard definition is created as follows.

Create a grammarParticleTerm_i containing the following grammar production:

	ParticleTerm_i, 1 :
		EE

When the wildcard's {namespace constraint} isany, or a pair ofnot and either a namespace name or the special valueabsent indicating no namespace,add the following production toParticleTerm_i.

	ParticleTerm_i, 0 :
		SE()ParticleTerm*_i, 1

Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace names or the special valueabsent indicating no namespace,add the following production toParticleTerm_i:

	ParticleTerm_i, 0 :
		SE(uri_x: )ParticleTerm*_i, 1

for each member valueuri_x in {namespace constraint},provided that it is the empty string (i.e. "") that is used asuri_x when the member value is the special valueabsent.Eachuri_x is used to augment the uri partition of the String table.Section7.3.1 String Table Partitions describes how theseuri strings are put into String table for pre-population.

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : TerminalRightHandSide where Terminal is one of SE (*) or SE (uri_x: *) are evaluated as follows:

Letqname be theqname of the element matched by SE (*)or SE(uri_x: *)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of the event sequence usingRightHandSide

8.5.4.1.8 Model Group Terms

8.5.4.1.8.1 Sequence Model Groups

Given a particle {term}PT_i that is a model group with {compositor} equal to "sequence" and a list ofn {particles}P₀,P₁, …,P_n−1, create a grammarParticleTerm_i as follows:

If the value ofn is 0, add the following productions to the grammarParticleTerm_i.

	ParticleTerm_i, 0 :
		EE

Otherwise, generate a sequence of grammarsParticle₀,Particle₁, …,Particle_n−1 corresponding to the list of particlesP₀,P₁, …,P_n−1 according to section8.5.4.1.5 Particles. Then combine the sequence of grammars using the grammar concatenation operator defined in section8.5.4.1.1 Grammar Concatenation Operator as follows:

ParticleTerm_i =Particle₀ ⊕Particle₁ ⊕ … ⊕Particle_n−1

8.5.4.1.8.2 Choice Model Groups

Given a particle {term}PT_i that is a model group with {compositor} equal to "choice" and a list ofn {particles}P₀,P₁, …,P_n−1, create a grammarParticleTerm_i as follows:

If the value ofn is 0, add the following productions to the grammarParticleTerm_i.

	ParticleTerm_i, 0 :
		EE

Otherwise, generate a sequence of grammar productionsParticle₀,Particle₁, …,Particle_n−1 corresponding to the list of particlesP₀,P₁, …,P_n−1 according to section8.5.4.1.5 Particles. Then create the grammarParticleTerm_i with the following grammar productions:

	ParticleTerm_i, 0 :
		Particle_0, 0

		Particle_1, 0
		⋮
		Particle_n−1, 0

indicating the grammar for the term may accept any one of the given {particles}.

8.5.4.1.8.3 All Model Groups

Given a particle {term}PT_i that is a model group with {compositor} equal to "all" and a list ofn { particles }P₀,P₁, ...,P_n−1, create a grammarParticleTerm_i as follows:

Add the following production to the grammarParticleTerm_i.

	ParticleTerm_i, 0 :
		EE

If the value ofn is not 0, generate a sequence of grammar productionsParticle₀,Particle₁, …,Particle_n−1 corresponding to the list of particlesP₀,P₁, …,P_n−1 according to section8.5.4.1.5 Particles.

Replace all productions of the form:

	Particle_j , k :
		EE

with productions of the form:

	Particle_j , k :
		ParticleTerm_i, 0

where 0 ≤ j <n, and 0 ≤ k <m withm denoting the number non-terminals in the grammarParticle_j.

Add the following productions to the grammarParticleTerm_i.

	ParticleTerm_i, 0 :
		Particle_0, 0

		Particle_1, 0
		⋮
		Particle_n−1, 0

Note:

The grammar above can accept any sequence of the given {particles} in any order. This grammar is intentionally simple and succinct, enabling high-performance, low-footprint implementations on a wide range of devices, including those with very limited memory resources. More elaborate and precise grammars for the "all" group are possible; however, the associated improvement in precision is not sufficient to justify their code-footprint and memory resource requirements.

8.5.4.2 EXI Normalized Grammars

This section describes the process for converting an EXI proto-grammargeneratedfrom an XML Schema in accordance with section8.5.4.1 EXI Proto-Grammars into an EXI normalized grammar. Each production in an EXI normalized grammar has exactly one non-terminal symbol on the left-hand side and one terminal symbol on the right-hand side followed by at most one non-terminal symbol on the right-hand side. In addition, EXI normalized grammars contain no two grammar productions with the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side. This is a restricted form of Greibach normal form[Greibach Normal Form].

EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:

	LeftHandSide :
		RightHandSide

whereLeftHandSide andRightHandSide are both non-terminals. Therefore, the first step of the normalization process focuses on replacing productions in this form with productions that conform to the EXI normalized grammar rules. This process can produce a grammar that has more than one production with the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side. Therefore, the second step focuses on eliminating such productions.

The first step of the normalization process is described in Section8.5.4.2.1 Eliminating Productions with no Terminal Symbol. The second step is described in section8.5.4.2.2 Eliminating Duplicate Terminal Symbols. Once these two steps are completed, the grammar will be an EXI normalized grammar.

8.5.4.2.1 Eliminating Productions with no Terminal Symbol

Given an EXI proto-grammarG_i, with non-terminalsG_i, 0,G_i, 1, …,G_i, n−1, replace each production of the form:

	G_i, j :
		G_i, k where 0 ≤ j <n and 0 ≤ k <n

with a set of productions:

	G_i, j :
		RHS(G_i, k)₀
		RHS(G_i, k)₁
		⋮
		RHS(G_i, k)_m-1

whereRHS(G_i, k)₀,RHS(G_i, k)₁, …,RHS(G_i, k)_m-1 represents the right-hand side of each production inG_i that has the non-terminalG_i, k on the left-hand side andm is the number of such productions.

Remove such productions if any amongG_i, j :RHS(G_i, k)_h where 0 ≤ h <m of which the right-hand side either is identical to the left-hand side, or has previously been replaced while applying the process described in this section to productions withG_i, j on the left-hand side.

Repeat this process until there are no moreproductionsof the form:

	G_i, j :
		G_i, k where 0 ≤ j <n and 0 ≤ k <n

in the grammarG_i.

8.5.4.2.2 Eliminating Duplicate Terminal Symbols

Given an EXI proto-grammarG_i, with non-terminalsG_i, 0,G_i, 1, …,G_i, n−1, identify all pairs of productions that have the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side of the form:

	G_i, j :
		TerminalG_i, k
		TerminalG_i, l

wherek ≠ l and Terminal represents a particular terminal symbol and replace them with a single production:

	G_i, j :
		TerminalG_{i, k ⊔ l}

whereG_{i, k ⊔ l} is a distinct non-terminal that accepts the inputs accepted byG_i, k and the inputs accepted byG_i, l.Here the notation " k ⊔ l " denotes a union set of integers and is used to uniquely identify the index of such a non-terminal.

If the non-terminalG_{i, k ⊔ l} does not exist, create it as follows:

	G_{i, k ⊔ l} :
		RHS(G_i, k)₀
		RHS(G_i, k)₁
		⋮
		RHS(G_i, k)_m-1


		RHS(G_i, l)₀
		RHS(G_i, l)₁
		⋮
		RHS(G_i, l)_n−1

whereRHS(G_i, k)₀,RHS(G_i, k)₁,…,RHS(G_i, k)_m-1andRHS(G_i, l)₀,RHS(G_i, l)₁,…,RHS(G_i, l)_n−1represent the right-hand side of each production in the GrammarG_i that has the non-terminalsG_j, k andG_j, l on the left-hand side respectively andm andn are the number of such productions.

Repeat this process until there are no more productions in the grammarG_i of the form:

	G_i, j :
		TerminalG_i, k
		TerminalG_i, l

Then, identify any identical productions of the following form:

	G_i, j :
		TerminalG_i, k
		TerminalG_i, k

where 0 ≤k <n,n is the number of productions inG_i and Terminal represents a specific terminal symbol, then remove one of them until there are no more productions remaining in the grammarG_i of this form.

8.5.4.3 Event Code Assignment

This section describes the process for assigning uniqueevent codes to each production in a normalized EXI grammar. Given a normalized EXI grammarG_i, apply the following process to each unique non-terminalG_i, j that occurs on the left-hand side of the productions inG_i where 0 ≤ j <n andn is the number of such non-terminals inG_i.

Sort all productions withG_i, j on the left-hand side in the following order:

all productions with AT(qname) on the right-hand sidesorted lexicographically byqname local-name, then byqname uri, followed by
all productions with AT(uri_x : *) on the right-hand side sorted lexicographically byuri, followed by
any production with AT (*) on the right-hand side, followed by
all productions with SE(qname) on the right-hand side sorted in schema order, followed by
all productions with SE(uri_x : *) on the right-hand side sorted in schema order, followed by
any production with SE(*) on the right-hand side, followed by
any production with EE on the right-hand side, followed by
any production with CH on the right-hand side.

In step 4 and step 5, the schema order of productions with SE(qname) and SE(uri_x : *) on the right-hand side is determined by the order of the corresponding particles in the schema after any references to named model groups in the schema are expanded in place with the group definitions themselves. A content model of a complex type can be seen as a tree that consists of particles where particles of either element declaration terms or wildcard terms appear as leaves, and the order is assigned to those leaf particles by traversing the tree by depth-first method.

Given the sorted list of productionsP₀,P₁, …P_n with the non-terminalG_i, j on the left-hand side, assignevent codes to each of the productions as follows:

Productions		Event Code

	P₀	0
	P₁	1
	⋮	⋮
	P_n−1	n−1

8.5.4.4 Undeclared Productions

The normalized element and type grammarsgeneratedfrom a schema describe the sequences of child elements, attributes and character events that may occur in a particular EXI stream. However, there are additional events that may occur in an EXI stream that are not described by the schema, for example events representing comments, processing-instructions, schema deviations, etc.

This section first describes the process for, in cases withstrict option value set to false, augmenting the normalized element and type grammars with productions that describe events that may occur in the EXI stream, but are not explicitly declared in the schema. It then describes the way, in cases withstrict option value set to true, normalized element and type grammars are supplemented with productions to be prepared for the occurrences of xsi:type and xsi:nil attributes that are permitted by the schema.

In the normalized grammars, terminal symbols AT and CH represent attribute and character events that can be represented by the EXI datatype representations associated with their schema datatypes (see7. Representing Event Content).When thestrict option is false, additional untyped AT and CH terminal symbols are added that can be used for representing attributes and character events that cannot be represented by the associated EXI datatype representations (e.g., schema-invalid values). The following table shows the notation used for such AT and CH terminals along with their definitions.

Notation	Definition
AT (qname) [untyped value]	Terminal symbol that matches an attributeevent withqnameqname and an untyped value.
AT (*) [untyped value]	Terminal symbol that matches an attributeevent with anyqname and an untyped value.
CH [untyped value]	Terminal symbol that matches a charactersevent with an untyped value.

8.5.4.4.1 Adding Productions when Strict is False

This section describes the process for augmenting the normalized grammars when the value of thestrict option is false.

[Definition:] For each normalized element grammarElement_i,an unique index numbercontent is determined such that: for each set of grammar productions with left-hand side non-terminal symbol of index smaller thancontent there is at least one production with AT terminal symbol and the rest of the productions inElement_i with left-hand side non-terminal symbols of indices equal or greater thancontent do not have AT terminal symbols. The left-hand side non-terminal symbols indices are assigned in ascending order with the entry non-terminal symbol ofElement_i being assigned index 0 (zero). If there are no productions inElement_i that have AT terminal symbols on their right-hand side, the content index is 0.

For each normalized element grammarElement_i, create a copyElement_i, content2 ofElement_i, content where the index "content" is thecontent of theElement_i grammar. Then, apply the following procedures.

Add the following production to each non-terminalElement_i, j that does not already include a production of the formElement_i, j : EE, such that 0 ≤ j ≤ content.

		Event Code

Element_i, j :
	EE	n.m

wheren.m represents the next availableevent code with length 2.

LetE_i be the element declaration from whichElement_i was created andT_k be the {type definition} ofE_i. LetType_kandTypeEmpty_kbe the type grammars created fromT_k (see section8.5.4.1.3 Type Grammars). Add the following productions toElement_i.

		Event Code

Element_i, 0 :
	AT(xsi:type)Element_i, 0	n.m
	AT(xsi:nil)Element_i, 0	n.(m+1)

wheren.m represents the next availableevent code with length 2.

Note:

	When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any otherattributeevents of the same element, with xsi:type placed before xsi:nil when they both occur.

Semantics:

	All productions of the formLeftHandSide : AT (xsi:type)RightHandSide are evaluated as follows: Lettarget-type be the value of the xsi:type attribute and assign it the QName datatype representation (see7.1.7 QName). If there is no namespace in scope for the specifiedqname prefix, set theuri oftarget-type to empty ("") and thelocalName to the full lexical value of the QName, including the prefix. Represent the value oftarget-type according to section7. Representing Event Content. If a grammar can be found for thetarget-type type using the encodedtarget-type representation,evaluate the element contents using the grammar fortarget-type type instead ofRightHandSide.
	In a schema-informed grammar, all productions of the formLeftHandSide : AT (xsi:nil)RightHandSide are evaluated as follows: Letnil be the value of the xsi:nil attribute. Ifnil is a valid Boolean, assign it the Boolean datatype representation (see7.1.2 Boolean) and encode it according to section7. Representing Event Content. Ifnil is not a valid Boolean, represent thexsi:nil attributeevent using the AT (*) [untyped value] terminal (see8.5.4.4 Undeclared Productions). If the value ofnil is true, evaluate the element contents using the grammarTypeEmpty_kdefined above rather thanRightHandSide.

For each non-terminalElement_i, j, such that 0 ≤ j ≤ content , with zero or more productions of the following form:

	Element_i, j :
		AT (qname₀) [schema-typed value]NonTerminal₀
		AT (qname₁) [schema-typed value]NonTerminal₁
		⋮
		AT (qname_x-1) [schema-typed value]NonTerminal_x-1

wherex represents the number of attributes declared in the schema for this context, add the following productions:

		Event Code

Element_i, j :
	AT ()Element*_i, j	n.m
	AT (qname₀) [untyped value]NonTerminal₀	n.(m+1).0
	AT (qname₁) [untyped value]NonTerminal₁	n.(m+1).1
	⋮	⋮
	AT (qname_x-1) [untyped value]NonTerminal_x-1	n.(m+1).(x-1)
	AT () [untyped value]Element*_i, j	n.(m+1).(x)

wheren.m represents the next availableevent code with length 2.

Note:

The value of eachattributeevent that has an [untyped value] is represented as a String (see7.1.10 String).

Like an element, an attribute may occur in a schema-invalid context, have a untyped (e.g., schema-invalid) value or both. However, unlike an element whose occurrence and value are represented by separate SE and CH events, the occurrence and value of an attribute are represented by a single AT event. Consequently, four kinds of ATterminal symbolsare needed for the four possible representations of an attribute eventin schema-informed grammars.The table below shows these four kinds of AT terminal symbolsalong with the equivalent combinations of SE and CHterminal symbolsfor representing elements.

Table 8-1.Equivalent terminal symbolsfor different attribute and element representations

Schema-typed value

Untyped value

Schema-valid occurrence

AT (qname) [schema-typed value]

SE (qname) CH [schema-typed value]

AT (qname) [untyped value]

SE (qname) CH [untyped value]

Schema-invalid occurrence

AT (*)

SE (*) CH [schema-typed value]

AT (*) [untyped value]

SE (*) CH [untyped value]

Note that an attribute matching AT (*) terminal without [untyped value] predicationin schema-informed grammars bears an untyped value unless there is a global attributedefinition available forqname whereqname is theqname of the attribute.When a global attribute definition is available forqname, the attributevalue is represented according to the datatype of the global attribute.

In the above table, AT (*) terminal without [untyped value] predication is shownonly in the bottom left cell for simplicity. To be more precise, it might well extend into thebottom right cell because attributes matching the terminal bear schema-typed values insome cases or untyped values in others depending on the availability of a globalattribute definition that denotes the attribute.

When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any otherattributeevents of the same element, with xsi:type placed before xsi:nil when they both occur.

Semantics:

	In a schema-informed grammar,all productions of the formLeftHandSide : AT () are evaluated as follows: Letqname be theqname of the attribute matched by AT () If a global attribute definition exists forqname, letglobal-type be the datatype of the global attribute. If the attribute value can be represented using the datatype representation associated withglobal-type, it SHOULD be represented using the datatype representation associated withglobal-type (see7. Representing Event Content). If the attribute value is not represented using the datatype representation associated withglobal-type, represent theattribute eventusing the AT (*) [untyped value] terminal (see8.5.4.4 Undeclared Productions).
Note: When a schema-informed grammar is in effect, xsi:type and xsi:nil attributes MUST NOT be represented using AT() terminal. AT() [untyped value] terminal, on the other hand, can be used to represent an xsi:nil attribute when there is a production of the formLeftHandSide : AT (xsi:nil) whereLeftHandSide is the left-hand side non-terminal of the AT() [untyped value] terminal in question, and the value of the xsi:nil attribute is unable to be represented using AT (xsi:nil) terminal. AT() [untyped value] terminal MUST NOT be used to represent xsi:type attributes.

Add the following production toElement_i.

		Event Code

Element_i, 0 :
	NSElement_i, 0	n.m

wheren.m represents the next availableevent code with length 2.

When the value of theselfContained option is true, add the following production toElement_i.

		Event Code

Element_i, 0 :
	SCFragment	n.m

wheren.m represents the next availableevent code with length 2.

Semantics:

All productions of the formLeftHandSide : SCFragment are evaluated as follows:

Save the string table, grammars and any implementation-specific state learned while processing this EXI Body.
Initialize the string table, grammars and any implementation-specific state learned while processing this EXI Body to the state they held just prior to processing this EXI Body.
Skip to the next byte-aligned boundary in the streamif it is not already at such a boundary.
Letqname be theqname of the SE event immediately preceding this SC event.
Letcontent be the sequence of events following this SC event that match the grammar for elementqname, up to and including the terminating EE event.
Evaluate the sequence of events (SD, SE(qname),content, ED) according to theFragment grammar. (see8.5.2 Schema-informed Fragment Grammar)
Skip to the next byte-aligned boundary in the streamif it is not already at such a boundary.
Restore the string table, grammars and implementation-specific state learned while processing this EXI Body to that saved in step 1 above.

Add the following productions to each non-terminalElement_i, j, such that 0 ≤ j ≤ content .

		Event Code

Element_i, j :
	SE ()Element*_i, content2	n.m
	CH [untyped value]Element_i, content2	n.(m+1)
	ERElement_i, content2	n.(m+2)
	CMElement_i, content2	n.(m+3).0
	PIElement_i, content2	n.(m+3).1

wheren.m represents the next availableevent code with length 2.

Note:

Productions of the formLeftHandSide : CH [untyped value]RightHandSide match untyped character data represented as a String in the EXI stream (e.g., schema-invalid values). Character data represented using the datatype representation associated with the schema datatype of the character data are matched by productions of the formLeftHandSide : CH [schema-typed value]RightHandSide described in section8.5.4.1.3.1 Simple Type Grammars.

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : SE (*)RightHandSide are evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of the event sequence usingRightHandSide

Add the following production toElement_i, content2 and to each non-terminalElement_i, j that does not already include a production of the formElement_i, j : EE, such that content <j <n, wheren is the number of non-terminals inElement_i.

		Event Code

Element_i, j :
	EE	n.m

wheren.m represents the next availableevent code with length 2.

Add the following productions toElement_i, content2 and to each non-terminalElement_i, j, such that content <j <n, wheren is the number of non-terminals inElement_i.

		Event Code

Element_i, j :
	SE ()Element*_i, j	n.m
	CH [untyped value]Element_i, j	n.(m+1)
	ERElement_i, j	n.(m+2)
	CMElement_i, j	n.(m+3).0
	PIElement_i, j	n.(m+3).1

wheren.m represents the next availableevent code with length 2.

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : SE (*)RightHandSide are evaluated as follows:

Letqname be theqname of the element matched by SE (*)
If aglobal element grammar does not exist for elementqname, create one according to section8.4.3 Built-in Element Grammar.
Evaluate the element content using theglobal element grammar for elementqname.
Evaluate the remainder of the event sequence usingRightHandSide

Apply the process described above for element grammars to each normalized type grammarType_i andTypeEmpty_i.

8.5.4.4.2 Adding Productions when Strict is True

This section describes the process for augmenting the normalized grammars when the value of thestrict option is true. For each normalized element grammarElement_i, apply the following procedures.

LetE_i be the element declaration from whichElement_i was created andT_k be the {type definition} ofE_i. IfT_keither has named sub-types or is a simple type definition of which {variety} isunion,add the following production toElement_i.

		Event Code

Element_i, 0 :
	AT(xsi:type)Element_i, 0	n.m

wheren.m represents the next availableevent code with length 2.

Semantics:

All productions of the formLeftHandSide : AT (xsi:type)RightHandSide are evaluated as follows:

	Lettarget-type be the value of the xsi:type attribute and assign it the QName datatype representation (see7.1.7 QName). If there is no namespace in scope for the specifiedqname prefix, set theuri oftarget-type to empty ("") and thelocalName to the full lexical value of the QName, including the prefix. Represent the value oftarget-type according to section7. Representing Event Content. If a grammar can be found for thetarget-type type using the encodedtarget-type representation,evaluate the element contents using the grammar fortarget-type type instead ofRightHandSide.

LetType_kandTypeEmpty_kbe the type grammars created fromT_k (see section8.5.4.1.3 Type Grammars). If the {nillable} property ofE_i is true, add the following production toElement_i.

		Event Code

Element_i, 0 :
	AT(xsi:nil)Element_i, 0	n.m

wheren.m represents the next availableevent code with length 2.

Semantics:

In a schema-informed grammar, all productions of the formLeftHandSide : AT (xsi:nil)RightHandSide are evaluated as follows:

	Letnil be the value of the xsi:nil attribute. Ifnil is a valid Boolean, assign it the Boolean datatype representation (see7.1.2 Boolean) and encode it according to section7. Representing Event Content. Otherwise, the value of thenil is schema-invalid and cannot be represented when the value of thestrict option is true. If the value ofnil is true, evaluate the element contents using the grammarTypeEmpty_kdefined above rather thanRightHandSide.

Note:

When xsi:type or xsi:nil attributes appear in an element where schema-informed grammars are in effect withstrict option valuetrue, they MUST occur before any other attribute events of the same element.
There are several restrictions peculiar to schema-informed element and type grammars created withstrict option valuetrue in the ability to represent a small number of uncommon element information item formulations yet valid to the schemas. This is a consequence of intentional grammar simplification aimed to make the grammars compact enough for and the amount of footprint necessary to process the grammars amenable even to extremely resource-deprived devices. Itemized below are such restrictions. Those restrictions need to be given heed to in the process of making a decision to choose the rightstrict option value that works best for each use case.
	It is not possible to use xsi:type and xsi:nil attributes together on the same element.This is due to the fact that xsi:type specifies a target type definition, but xsi:nil is only permitted on nillable elements, not type definitions.
	It is not possible to use xsi:type for explicitly denoting the natural type (i.e. the type immediately given to an element definition in the schema) of the element unless such a type has named sub-types or is a simple type definition of which {variety} isunion.
	Namespace declarations are not available instream. A consequence of this is that instream namespace declarations otherwise available whenstrict option value isfalse cannot be turned to for helping decompose qualified names[Namespaces in XML 1.0] [Namespaces in XML 1.1] in AT or CH values into pairs of uri and local-name, including xsi:type attribute values encoded with thePreserve.lexicalValues option valuetrue wherein the qualified names are represented using String datatype representation (see7.1.10 String) instead of QName datatype representation (see7.1.7 QName).
	The attributes xsi:schemaLocation and xsi:noNamespaceSchemaLocation can appear only when they match specific schema declarations (i.e., wildcards or ur-types).

9. EXI Compression

The use of EXI compression increases compactness utilizing additional computational resources. EXI compression combines knowledge of XML with a widely adopted, standard compression algorithm to achieve higher compression ratios than would be achievable by applying compression to the entire stream.

EXI compression is applied whencompression is turned on or whenalignment is set topre-compression. Byte-aligned representations ofevent codes andcontent items are more amenable to compression algorithms compared to unaligned representations because most compression algorithms operate on series of bytes to identify redundancies in the octets. Therefore, when EXI compression is used,event codes andcontent items of EXI events are encoded as aligned bytes in accordance with6.2 Representing Event Codes and7. Representing Event Content.

EXI compression splits a sequence of EXI events into a number of contiguous blocks of events.Events that belong to the same block are transformed into lower entropy groups of similar values calledchannels, which are individually well suited for standard compression algorithms. To reduce compression overhead, smaller channels are combined before compressing them, while larger channels are compressed independently. The criteria EXI compression uses to define and combine channels is intentionally simple to facilitate implementation, reduce processing overhead, and avoid the need to encode channel ordering or grouping information in the format. The figure below presents a schematic view of the steps involved in EXI compression.

Figure 9-1.EXI Compression Overview

In the following sections,9.1 Blocks defines blocks and explains how EXI events are partitioned into blocks.Section9.2 Channels defines channels, their organization as well as how a group of channels correlate to its corresponding block of events.Section9.3 Compressed Streams describes how some channels are combined as needed in preparation for applying compression algorithms on channels.

9.1 Blocks

EXI compression partitions the sequence of EXI events into a sequence of one or more non-overlapping blocks. Each block preceding the final block contains the minimum set of consecutive events that result in exactlyblockSize values in its value channels (see9.2.2 Value Channels), where blockSize is the block size of the EXI stream (see5.4 EXI Options). The final block containsless than the minimum set of consecutive events that result in blockSizevalues in its value channels.

9.2 Channels

Events inside each block are multiplexed into channels. The first channel of each block is the structure channel described in Section9.2.1 Structure Channel. The remaining channels in each block are value channels described in Section9.2.2 Value Channels.The diagram below presents an exemplary view of the transformation in which events within a block are multiplexed into channels in one way and channels are demultiplexed into events in the other way.

Figure 9-2.Multiplexing EXI events into channels

9.2.1 Structure Channel

The structure channel of each block defines the overall order and structure of the events in that block. It contains theevent codes and associated content for each event in the block, except for Attribute (AT) and Character (CH)values,which are stored in the value channels. In addition, there are two kinds of attribute events whosevalues are stored in the structure channel instead of in value channels. Thevalue of each xsi:type attribute is stored in the structure channel.Thevalue of each xsi:nil attribute that matches a schema-informed grammar productionthat does not include the AT (*) [untyped value] terminal is also stored in the structure channel. These attribute events are intrinsic to the grammar system thus are essential in processing the structure channel because their values affect the grammar to be used for processing the rest of the elements on which they appear. Allevent codes and content in the structure stream occur in the same order as they occur in the EXI event sequence.

9.2.2 Value Channels

Thevalues of the Attribute (AT) and Character (CH) events in each block are organized into separate channels based on theqname of the associated attribute or element. Specifically, thevalue of each Attribute (AT) event is placed in the channel identified by theqname of the Attribute and thevalue of each Character (CH) event is placed in the channel identified by theqname of its parent Start Element (SE) event. Each block contains exactly one channel for each distinct element or attributeqname that occurs in the block. Thevalues in each channel occur in the order they occur in the EXI event sequence.

9.3 Compressed Streams

The channels in a block are further organized into compressed streams. Smaller channels are combined into the same compressed stream, while others are each compressed separately. Below are the rules applied within the scope of a block used to determine the channels to be combined together, the order of the compressed streams and the order amongst the channels that are combined into the same compressed stream.

If the value channels of the block contain at most 100values, the block will contain only 1 compressed stream containing the structure channel followed by all of the value channels. The order of the value channels within the compressed stream is defined by the order in which the firstvalue in each channel occurs in the EXI event sequence.

If the value channels of the block contain more than 100values, the first compressed stream contains only the structure channel. The second compressed stream contains all value channels that contain at most 100values. And the remaining compressed streams each contain only one channel, each having more than 100values. The order of the value channels within the second compressed stream is defined by the order in which the firstvalue in each channel occurs in the EXI event sequence. Similarly, the order of the compressed streams following the second compressed stream in the block is defined by the order in which the firstvalue of the channel inside each compressed stream occurs in the EXI event sequence.

Note:

EXI compression changes the order in whichevent codes andvalues are read and written to and from an EXI stream.EXI processors must encode and decodevalues in this revised order so order sensitive constructs like the string table (see7.3 String Table) work properly.

When the value of thecompression option is set to true, each compressed stream in a block is stored using the standard DEFLATE Compressed Data Format defined by RFC 1951[IETF RFC 1951]. Otherwise, when the value of thealignment option is set topre-compression, each compressed stream in a block is stored directly without the DEFLATE algorithm.

Note:

Some EXI events have zero-byte representations and are not explicitly represented in the EXI stream. If all the events in a channel have zero-byte representations, the channel has a zero-byte representation and is not explicitly represented in a compressed stream. Implementations must take care to avoid creating an empty DEFLATE stream when all the channels that would have otherwise been organized into a compressed stream are implicit. E.g., this can occur if the final block contains only zero-length EE and ED events.

10. Conformance

10.1 EXI Stream Conformance

[Definition:] Aconformant EXI stream consists of a sequence of octets that follows the syntax ofEXI stream that is defined in this document.[Definition:] EXI format provides a way to involve user-defined datatype representations in EXI streams processing, which is an extension point that, when used in conjunction with relevant datatype representations specifications external to this document, leads to the formulation ofExtended EXI streams.

Conformance of extended EXI streams is relative to the syntax defined by the relevant user-defined datatype representations specifications. The definitions of user-defined datatype representations syntax are out of the scope of this document.[Definition:] An extended EXI stream is aconformant extended EXI stream if replacing value items represented using user-defined datatype representations with their intrinsic representations would make the stream aconformant EXI stream.An extended EXI stream described as an "EXI stream with regards to datatype representationsS " whereS is the set of datatype representations can be processed by anEXI stream decoder only if the processor has the shared knowledge about each one of the datatype representations in the setS.

The structural syntax ofEXI streams andextended EXI streams is described by the abstract EXI grammar system defined in this document. Although this document specifies the normative way in which XML Schema schemas are mapped into the EXI grammar system to make schema-informed grammars, EXI allows the use of other schema languages to process EXI streams or extended EXI streams so far as there is a well known EXI grammar binding of the schema language and the binding preserves the semantics of the EXI grammar system. EXI streams or extended EXI streams generated using schemas of such schema language are still conformant. The definitions of grammar bindings for schema languages other than XML Schema are out of the scope of this document, and each schema language community is encouraged to define its own binding in order to make it possible to harness the utmost efficiency out of EXI when schemas of the language are available.

10.2 EXI Processor Conformance

The conformance of EXI Processors is defined separately for each of the two processor roles,EXI stream encoders andEXI stream decoders; the conformance of the former is described in terms of the conformance of theEXI streams orextended EXI streams that they produce, while that of the latter is based on the set of format features that EXI stream decoders are preparedfor in the processing ofconformant EXI streams orconformant extended EXI streams.

AnEXI stream encoder is conformant if and only if it is capable of generatingconformant EXI streams orconformant extended EXI streams given any input structured data it is made to work on.On the other hand,EXI stream decoders MUST support all format features described in this document as they are explained, except for the capability of handlingDatatype Representation Mapwhich is an optional feature.EXI stream decoders that do not implementDatatype Representation Mapfeature MUST report an error with a meaningful message upon encountering a"datatypeRepresentationMap" element while processingEXI options documents inEXI headers.

Except where required for interoperability with limited computing platforms (e.g, mobile and embedded devices), this specification avoids placing arbitrary limits on the magnitude of specific numeric values required for implementation. So, in theory it is possible for EXI grammars, event codes, strings, enumeration lists, etc. to be arbitrarily large. In practice, however, it is not the intent of this specification to require conforming implementations to adopt exotic or inefficient numeric representations for handling arbitrarily large EXI documents and grammars on specific platforms.

A References

A.1 Normative References

IETF RFC 1951: DEFLATE Compressed Data Format Specification version 1.3, P. Deutsch, Author. Internet Engineering Task Force, May 1996. Available at http://www.ietf.org/rfc/rfc1951.txt.
IETF RFC 2119: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner, Author. Internet Engineering Task Force, June 1999. Available at http://www.ietf.org/rfc/rfc2119.txt.
IETF RFC 3023: XML Media Types, M. Murata, S. St.Laurent and D. Kohn, Author. Internet Engineering Task Force, January 2001. Available at http://www.ietf.org/rfc/rfc3023.txt.
ISO/IEC 10646: ISO/IEC 10646-1:2000. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 1: Architecture and Basic Multilingual Plane andISO/IEC 10646-2:2001. Information technology — Universal Multiple-Octet Coded Character Set (UCS) — Part 2: Supplementary Planes, as, from time to time, amended, replaced by a new edition or expanded by the addition of new parts. [Geneva]: International Organization for Standardization. (Seehttp://www.iso.org for the latest version.)
UNICODE: The Unicode Standard, The Unicode Consortium
XML 1.0: Extensible Markup Language (XML) 1.0 (Fifth Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau, Editors. World Wide Web Consortium, 10 February 1998, revised 26 November 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126. The latest version is available at http://www.w3.org/TR/REC-xml.
XML 1.1: Extensible Markup Language (XML) 1.1 (Second Edition), T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau, and J. Cowan, Editors. World Wide Web Consortium, 04 February 2004, revised 16 August 2006. This version is http://www.w3.org/TR/2006/REC-xml11-20060816. The latest version is available at http://www.w3.org/TR/xml11.
Namespaces in XML 1.0: Namespaces in XML 1.0 (Third Edition), T. Bray, D. Hollander, A. Layman, R. Tobin, and H. Thompson, Editors. World Wide Web Consortium, 14 January 1999, revised 8 December 2009. This version is http://www.w3.org/TR/2009/REC-xml-names-20091208. The latest version is available at http://www.w3.org/TR/xml-names/.
Namespaces in XML 1.1: Namespaces in XML 1.1 (Second Edition), T. Bray, D. Hollander, A. Layman, and R. Tobin, Editors. World Wide Web Consortium, 4 February 2004, revised 16 August 2006. This version is http://www.w3.org/TR/2006/REC-xml-names11-20060816. The latest version is available at http://www.w3.org/TR/xml-names11/.
XML Information Set: XML Information Set (Second Edition), J. Cowan and R. Tobin, Editors. World Wide Web Consortium, 24 October 2001, revised 4 February 2004. This version is http://www.w3.org/TR/2004/REC-xml-infoset-20040204. The latest version is available at http://www.w3.org/TR/xml-infoset.
XML Schema Structures: XML Schema Part 1: Structures Second Edition, H. Thompson, D. Beech, M. Maloney, and N. Mendelsohn, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. This version is http://www.w3.org/TR/2004/REC-xmlschema-1-20041028. The latest version is available at http://www.w3.org/TR/xmlschema-1.
XML Schema Datatypes: XML Schema Part 2: Datatypes Second Edition, P. Byron and A. Malhotra, Editors. World Wide Web Consortium, 2 May 2001, revised 28 October 2004. This version is http://www.w3.org/TR/2004/REC-xmlschema-2-20041028. The latest version is available at http://www.w3.org/TR/xmlschema-2.

A.2 Other References

Efficient XML: Efficient XML, part of[EXI Measurements Note] independently referenced. The latest version is available at http://www.w3.org/TR/exi-measurements/#contributions-efx.
EXI Evaluation Note: Efficient XML Interchange Evaluation, Carine Bournez, Editor. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-evaluation/.
EXI Impacts Note: Efficient XML Interchange (EXI) Impacts, Jaakko Kangasharju, Editor. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-impacts/.
EXI Measurements Note: Efficient XML Interchange Measurements Note, Greg White, Jaakko Kangasharju, Don Brutzman and Stephen Williams, Editors. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-measurements/.
EXI Primer: Efficient XML Interchange (EXI) Primer, Daniel Peintner, Santiago Pericas-Geertsen, Editors. World Wide Web Consortium. The latest version is available at http://www.w3.org/TR/exi-primer/.
Greibach Normal Form: A New Normal-Form Theorem for Context-Free Phrase Structure Grammars, Sheila A. Greibach, Author. Journal of the ACM Volume 12 Issue 1, January 1965, pp. 42–52.
Huffman Coding: A Method for the Construction of Minimum-Redundancy Codes, D. A. Huffman, Author. Proceedings of the I.R.E., September 1952, pp. 1098-1102.
IEEE 754-2008: IEEE Standard for Floating-Point Arithmetic
ISO/IEC 19757-2:2008: Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG
SOAP 1.2: SOAP Version 1.2 Part 1: Messaging Framework (Second Edition), M. Gudgin, M. Hadley, N. Mendelsohn, J-J. Moreau, H. Frystyk Nielsen, A. Karmarkar, and Y. Lafon, Editors. World Wide Web Consortium, 24 June 2003, revised 27 April 2007. This version is http://www.w3.org/TR/2007/REC-soap12-part1-20070427/᠎. The latest version is available at http://www.w3.org/TR/soap12-part1/.
XBC Measurement Methodologies: XML Binary Characterization Measurement Methodologies, S. D. Williams and P. Haggar, Editors. World Wide Web Consortium, 31 March 2005. This version is http://www.w3.org/TR/2005/NOTE-xbc-measurement-20050331/. The latest version is available at http://www.w3.org/TR/xbc-measurement.
XBC Use Cases: XML Binary Characterization Use Cases, Mike Cokus and Santiago Pericas-Geertsen, Editors. World Wide Web Consortium, 31 March 2005. This version is http://www.w3.org/TR/2005/NOTE-xbc-use-cases-20050331/. The latest version is available at http://www.w3.org/TR/xbc-use-cases.
XBC Properties: XML Binary Characterization Properties, Mike Cokus and Santiago Pericas-Geertsen, Editors. World Wide Web Consortium, 31 March 2005. This version is http://www.w3.org/TR/2005/NOTE-xbc-properties-20050331/ The latest version is available at http://www.w3.org/TR/xbc-properties/.

B Infoset Mapping

This appendix contains the mappings between the XML Information Set[XML Information Set] model and the EXI format. Starting from the document information item, eachinformation item definition is mapped to its respective unordered set of EXI event types(seeTable 4-1).The actual order amongst information set items when it is relevant reflects the occurrence order of EXI events or their references in an EXI stream that correlate to the infoset items. As used in the XML Information Set specification, the Infoset property names are shown in square brackets,[thus].

Note:

As has been prescribed in section2. Design Principles, EXI is designed to be compatible with the XML Information Set. While this approach is both legitimate and practical for designing a succinct format interoperable with XML family of specifications and technologies, it entails that some lexical constructs of XML not recognized by the XML Information Set are not represented by EXI, either. Examples of such unrepresented lexical constructs of XML include white space outside the document element, white space within tags, the kind of quotation marks (single or double) used to quote attribute values, and the boundaries of CDATA marked sections.

No constructs in EXI format facilitate the representation of[character encoding scheme],[standalone] and[version] properties which are available in the definition of Document Information Item of XML Information Set (seeB.1 Document Information Item). EXI is made agnostic about[character encoding scheme] and[version] properties as they are in XML Information Set, and considers them to be the properties of XML serializers in use. EXI forgoes[standalone] property because simply having no references to any external markup declarations practically serves the purpose with less complexity.

B.1 Document Information Item

A document information item maps to a pair ofStart Document (SD)andEnd Document (ED) eventswith each of its properties subject to further mapping as shown in the following table.

Table B-1. Mapping between the document information item properties to EXI event types
Property	EXI event types
[children]	CM* PI* DT? [SE, EE]
[document element]	[SE, EE]
[notations]	Computed based ontext content item of DTto which each notation information set item maps.
[unparsed entities]	Computed based ontext content item of DTto which each unparsed entity information set item maps.
[base URI]	The base URI of the EXI stream
[character encoding scheme]	N/A
[standalone]	Not available
[version]	Not available
[all declarations processed]	True if all declarations contained directly or indirectly in DT are processed, otherwise false, which is the processor quality as opposed to the information provided by the format.

B.2 Element Information Items

An element information item maps to a pair of aStart Element (SE)event and the correspondingEnd Element (EE)event with each of its properties subject to further mapping as shown in the following table.

Table B-2. Mapping of the element information item properties to EXI event types
Property	EXI event types
[namespace name]	SE
[local name]	SE
[prefix]	SE
[children]	[SE, EE]* PI* CM* CH* ER*
[attributes]	AT*
[namespace attributes]	NS*
[in-scope namespaces]	The namespace information items computed using the[namespace attributes] properties of this information item and its ancestors
[base URI]	The base URI of the element information item
[parent]	Computed based on the last SE event encountered that did not get a matching EE event if any, or computed based on the SD event

B.3 Attribute Information Item

An attribute information item maps to anAttribute (AT)event with each of its properties subject to further mapping as shown in the following table.

Table B-3. Mapping of the attribute information item properties to EXI event types
Property	EXI event types
[namespace name]	AT
[local name]	AT
[prefix]	AT
[normalized value]	Thevalue of AT
[specified]	True if the item maps to AT, otherwise false
[attribute type]	Computed based on AT and DT
[references]	Computed based on[attribute type] andvalue of AT
[owner element]	Computed based on the last SE event encountered that did not get a matching EE event

B.4 Processing Instruction Information Item

A processing instruction information maps to aProcessing Instruction (PI)event with each of its properties subject to further mapping as shown in the following table.

Table B-4. Mapping of the processing instruction information item properties to EXI event types
Property	EXI event types
[target]	PI
[content]	PI
[base URI]	The base URI of the processing information item
[notation]	Computed based on the availability of the internal DTD subset
[parent]	Computed based on the last SE event encountered that did not get a matching EE event type

B.5 Unexpanded Entity Reference Information item

An unexpanded entity reference information item maps to anEntity Reference (ER) eventwith each of its properties subject to further mapping as shown in the following table.

Table B-5. Mapping of the entity reference information item properties tothe EXI event types
Property	EXI event types
[name]	ER
[system identifier]	Based on the availability of the internal DTD subset
[public identifier]	Based on the availability of the internal DTD subset
[declaration base URI]	The base URI of the unexpanded entity reference information item
[parent]	Computed based on the last SE event encountered that did not get a matching EE event type

B.6 Character Information item

A character information item maps to the individual characters contained in aCharacters (CH)event following a SE event that did not get a matching EE event.

Table B-6. Mapping of the character information item properties and the EXI event types
Property	EXI event types
[character code]	Each character in CH
[element content whitespace]	Computed based on[parent] and DT
[parent]	Computed based on the last SE event encountered that did not get a matching EE event

B.7 Comment Information item

A comment information item maps to aComment (CM)event with each of its properties subject to further mapping as shown in the following table.

Table B-7. Mapping of the comment information item properties and the EXI event types
Property	EXI event types
[content]	text content item of CM
[parent]	Computed based on the last SE event encountered that did not get a matching EE event, or the SD event

B.8 Document Type Declaration Information item

A document type declaration information item maps to aDOCTYPE (DT)event with each of its properties subject to further mapping as shown in the following table.

Table B-8. Mapping of the document type declaration information item properties to the EXI event types
Property	EXI event types
[system identifier]	DT
[public identifier]	DT
[children]	Computed based ontext content item of DT
[parent]	Computed based on the SD event

B.9 Unparsed Entity Information Item

An unparsed entity information item maps to part of thetext content item ofDOCTYPE (DT)event with each of its properties subject to further mapping as shown in the following table.

Table B-9. Mapping of the unparsed entity information item properties to EXI event types
Property	EXI event types
[name]	Computed based ontext content item of DT
[system identifier]	Computed based ontext content item of DT
[public identifier]	Computed based ontext content item of DT
[declaration base URI]	The base URI of the unparsed entity information item
[notation name]	Computed based ontext content item of DT
[notation]	Computed based ontext content item of DT

B.10 Notation Information Item

An notation information item maps to part of thetext content item ofDOCTYPE (DT)event with each of its properties subject to further mapping as shown in the following table.

Table B-10. Mapping of the notation information item properties to EXI event types
Property	EXI event types
[name]	Computed based ontext content item of DT
[system identifier]	Computed based ontext content item of DT
[public identifier]	Computed based ontext content item of DT
[declaration base URI]	The base URI of the notation information item

B.11 Namespace Information Item

An namespace information itemmaps to a Namespace Declaration (NS)event with each of its properties subject to further mapping as shown in the following table.

Table B-11. Mapping of the namespace information item properties to EXI event types
Property	EXI event types
[prefix]	NS
[namespace name]	NS

C XML Schema for EXI Options Document

The following schema describes the EXI options header. It isdesigned to produce smaller headers for option combinations used whencompactness is critical.

<xsd:schema targetNamespace="http://www.w3.org/2009/exi"            xmlns:xsd="http://www.w3.org/2001/XMLSchema"            elementFormDefault="qualified">  <xsd:element name="header">    <xsd:complexType>      <xsd:sequence>        <xsd:element name="lesscommon" minOccurs="0">          <xsd:complexType>            <xsd:sequence>              <xsd:element name="uncommon" minOccurs="0">                <xsd:complexType>                  <xsd:sequence>                    <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded"                             processContents="skip" />                    <xsd:element name="alignment" minOccurs="0">                      <xsd:complexType>                        <xsd:choice>                          <xsd:element name="byte">                            <xsd:complexType />                          </xsd:element>                          <xsd:element name="pre-compress">                            <xsd:complexType />                          </xsd:element>                        </xsd:choice>                      </xsd:complexType>                    </xsd:element>                    <xsd:element name="selfContained" minOccurs="0">                      <xsd:complexType />                    </xsd:element>                    <xsd:element name="valueMaxLength" minOccurs="0">                      <xsd:simpleType>                        <xsd:restriction base="xsd:unsignedInt" />                      </xsd:simpleType>                    </xsd:element>                    <xsd:element name="valuePartitionCapacity" minOccurs="0">                      <xsd:simpleType>                        <xsd:restriction base="xsd:unsignedInt" />                      </xsd:simpleType>                    </xsd:element>                    <xsd:element name="datatypeRepresentationMap"                                 minOccurs="0" maxOccurs="unbounded">                      <xsd:complexType>                        <xsd:sequence>                          <!-- schema datatype -->                          <xsd:any namespace="##other" processContents="skip" />                          <!-- datatype representation -->                          <xsd:any processContents="skip" />                        </xsd:sequence>                      </xsd:complexType>                    </xsd:element>                  </xsd:sequence>                </xsd:complexType>              </xsd:element>              <xsd:element name="preserve" minOccurs="0">                <xsd:complexType>                  <xsd:sequence>                    <xsd:element name="dtd" minOccurs="0">                      <xsd:complexType />                    </xsd:element>                    <xsd:element name="prefixes" minOccurs="0">                      <xsd:complexType />                    </xsd:element>                    <xsd:element name="lexicalValues" minOccurs="0">                      <xsd:complexType />                    </xsd:element>                    <xsd:element name="comments" minOccurs="0">                      <xsd:complexType />                    </xsd:element>                    <xsd:element name="pis" minOccurs="0">                      <xsd:complexType />                    </xsd:element>                  </xsd:sequence>                </xsd:complexType>              </xsd:element>              <xsd:element name="blockSize" minOccurs="0">                <xsd:simpleType>                  <xsd:restriction base="xsd:unsignedInt">                    <xsd:minInclusive value="1" />                  </xsd:restriction>                </xsd:simpleType>              </xsd:element>            </xsd:sequence>          </xsd:complexType>        </xsd:element>        <xsd:element name="common" minOccurs="0">          <xsd:complexType>            <xsd:sequence>              <xsd:element name="compression" minOccurs="0">                <xsd:complexType />              </xsd:element>              <xsd:element name="fragment" minOccurs="0">                <xsd:complexType />              </xsd:element>              <xsd:element name="schemaId" minOccurs="0" nillable="true">                <xsd:simpleType>                  <xsd:restriction base="xsd:string" />                </xsd:simpleType>              </xsd:element>            </xsd:sequence>          </xsd:complexType>        </xsd:element>        <xsd:element name="strict" minOccurs="0">          <xsd:complexType />        </xsd:element>      </xsd:sequence>    </xsd:complexType>  </xsd:element>  <!-- Built-in EXI Datatype IDs for use in datatype representation maps -->  <xsd:simpleType name="base64Binary">     <xsd:restriction base="xsd:base64Binary"/>  </xsd:simpleType>  <xsd:simpleType name="hexBinary" >     <xsd:restriction base="xsd:hexBinary"/>  </xsd:simpleType>  <xsd:simpleType name="boolean" >     <xsd:restriction base="xsd:boolean"/>  </xsd:simpleType>  <xsd:simpleType name="decimal" >     <xsd:restriction base="xsd:decimal"/>  </xsd:simpleType>  <xsd:simpleType name="double" >     <xsd:restriction base="xsd:double"/>  </xsd:simpleType>  <xsd:simpleType name="integer" >     <xsd:restriction base="xsd:integer"/>  </xsd:simpleType>  <xsd:simpleType name="string" >     <xsd:restriction base="xsd:string"/>  </xsd:simpleType>  <xsd:simpleType name="dateTime" >     <xsd:restriction base="xsd:dateTime"/>  </xsd:simpleType>  <xsd:simpleType name="date" >     <xsd:restriction base="xsd:date"/>  </xsd:simpleType>  <xsd:simpleType name="time" >     <xsd:restriction base="xsd:time"/>  </xsd:simpleType>  <xsd:simpleType name="gYearMonth" >     <xsd:restriction base="xsd:gYearMonth"/>  </xsd:simpleType>  <xsd:simpleType name="gMonthDay" >     <xsd:restriction base="xsd:gMonthDay"/>  </xsd:simpleType>  <xsd:simpleType name="gYear" >     <xsd:restriction base="xsd:gYear"/>  </xsd:simpleType>  <xsd:simpleType name="gMonth" >     <xsd:restriction base="xsd:gMonth"/>  </xsd:simpleType>  <xsd:simpleType name="gDay" >     <xsd:restriction base="xsd:gDay"/>  </xsd:simpleType>  <!-- Qnames reserved for future use in datatype representation maps -->  <xsd:simpleType name="ieeeBinary32" >     <xsd:restriction base="xsd:float"/>  </xsd:simpleType>  <xsd:simpleType name="ieeeBinary64" >     <xsd:restriction base="xsd:double"/>  </xsd:simpleType></xsd:schema>

Note:

Theqnames exi:ieeeBinary32 and exi:ieeeBinary64 defined above are reserved for future use in Datatype Representation Maps to identify the 32-bit and 64-bit Binary Interchange Formats defined by the IEEE 754-2008 standard[IEEE 754-2008].

D Initial Entries in String Table Partitions

D.1 Initial Entries in Uri Partition

The following table lists the entries that are initially populated in uri partitions, where partition name URI denotes that they are entries in the uri partition.

Table D-1. Initial values inuri partition
Partition	Compact ID	String Value
URI	0	"" [empty string]
URI	1	"http://www.w3.org/XML/1998/namespace"
URI	2	"http://www.w3.org/2001/XMLSchema-instance"

When XML Schemas are used to inform the grammars for processing EXI body, there is an additional entry that is appended to the uri partition,regardless of the XML Schema in use.

Table D-2. Additional entry when XML Schemas are used
Partition	Compact ID	String Value
URI	3	"http://www.w3.org/2001/XMLSchema"

Additionally, when XML Schemas are used, the uri partition is also pre-populated with some of the namespace URIs used in the schemas. Section7.3.1 String Table Partitions describes the way this has to be done. All string values in uri partition are unique.

D.2 Initial Entries in Prefix Partitions

The following table lists the entries that are initially populated in prefix partitions,where XML-PF represents the partition forprefixes inthe"http://www.w3.org/XML/1998/namespace" namespace and XSI-PFrepresents the partition forprefixes in the"http://www.w3.org/2001/XMLSchema-instance" namespace.

Table D-3. Initialprefix string table entries
Partition	Compact ID	String Value
""	0	"" [empty string]
XML-PF	0	"xml"
XSI-PF	0	"xsi"

D.3 Initial Entries in Local-Name Partitions

The following tables list the string values that are initially populated and made availablein local-name partitions, where XML‑NS represents the partition forlocal-namesin the"http://www.w3.org/XML/1998/namespace" namespace, XSI‑NSrepresents the partition forlocal-names in the"http://www.w3.org/2001/XMLSchema-instance" namespace, and XSD‑NSrepresents the partition forlocal-names in the"http://www.w3.org/2001/XMLSchema" namespace.

Table D-4.String values initially available in XML‑NS and XSI‑NS partition
Partition	String Values
XML‑NS	"base", "id", "lang", "space"
XSI‑NS	"nil", "type"

When XML Schemas are used to inform the grammars for processing EXI body, those string values listed in the next table are available in XSD‑NS partition.

Table D-5.String values initially available in XSD‑NS partition for schema-informed EXI streams
Partition	String Values
XSD‑NS	"ENTITIES", "ENTITY", "ID", "IDREF", "IDREFS", "NCName", "NMTOKEN", "NMTOKENS", "NOTATION", "Name", "QName", "anySimpleType", "anyType", "anyURI", "base64Binary", "boolean", "byte", "date", "dateTime", "decimal", "double", "duration", "float", "gDay", "gMonth", "gMonthDay", "gYear", "gYearMonth", "hexBinary", "int", "integer", "language", "long", "negativeInteger", "nonNegativeInteger", "nonPositiveInteger", "normalizedString", "positiveInteger", "short", "string", "time", "token", "unsignedByte", "unsignedInt", "unsignedLong", "unsignedShort"

Additionally, when a schema is provided, the string table is also pre-populated with the local-name of each attribute, element and type explicitly declared in the schema, partitioned by namespace URI.

All string values within each partition containing local-names is then sorted lexicographically. Assign each string value a compact identifier in the sorted order, with the initial identifier number 0 assigned to the first string value, incremented by 1 before each subsequent assignment.

E DerivingSet of Charactersfrom XML Schema Regular Expressions

XML Schema datatypes specification[XML Schema Datatypes] defines aregular expression^XS2 syntax for use in pattern facets of simple type definitions. Pattern facets constrain the set of valid values to those that lexically match the specified regular expression. This section describes the rules for deriving the set of characters allowed in a string value that conforms to a given regular expression in an XML Schema.In the following description, the term "set-of-chars" is used as the shorthand form of "set of characters".

At the top level,the XML Schema regular expressionsyntax is summarized by the following production excerpted here from[XML Schema Datatypes]. Note the notation used for the numbers that tag the productions. "XSD:" is prefixed to the original numeric tags to make it easier to discern them as belonging to XML Schema specification.

[XSD:1]^XS2 regExp ::= branch ( '|' branch )*

The set-of-chars for a regex that conforms to the syntax above is the union of the set-of-chars defined for each branch. Each branch of a regex is described by the following production:

[XSD:2]^XS2 branch ::= piece*

The set-of-chars for each branch of a regex is the union of the set-of-chars for each piece of the branch. Each piece of a branch is described by the following production:

[XSD:3]^XS2 piece ::= atom quantifier?

The set-of-chars for each piece of a branch is the set-of-chars for the atom portion of the piece. The atom portion of a piece is described by the following production:

[XSD:9]^XS2 atom ::= Char | charClass | ( '(' regExp ')' )

The set-of-chars for the atom is the set-of-chars for the Char, charClass or regExp that constitutes the atom.

The set-of-chars for a Char that constitutes an atom contains the single character that matches theChar expression^XS2.The set-of-chars for a charClass that constitutes an atom is the set of characters specified by thecharClass expression^XS2.The set-of-chars for a regExp sub-expression enclosed in parenthesis that constitutes an atom is the set-of-chars for the regExp itself derived by recursively applying the rule defined above.

For stability and interoperability of restricted character sets across different versions of the Unicode standard, certain pattern facets cannot be used for deriving restricted character sets. In particular, pattern facets that contain one or morecategory escapes^XS2,category complement escapes^XS2 ormulti-character escapes^XS2 other than \s do not have restricted character sets.

F Content Coding and Internet Media Type

Two labels are defined for identifying and negotiating the use of EXI for representing XML information in higher-level protocols. They serve two distinct roles. One is for content coding and the other is for internet media type.

F.1 Content Coding

The content-coding value "exi" is registered with the Internet Assigned Numbers Authority (IANA) for use with EXI. Protocols that can identify and negotiate the content coding ofXMLinformation independent of its media type,SHOULD use the content coding "exi" (case-insensitive) to convey the acceptance or actual use of EXI encoding for XML information.

F.2 Internet Media Type

A new media type registration "application/exi" described below is being proposed for community review, with the intent to eventually submit it to the IESG for review, approval, and registration with IANA.

Type name:

application

Subtype name:

exi

Required parameters:

none

Optional parameters:

none

Encoding considerations:

binary

Security considerations:

When used as an XML replacement in an application, EXI sharesthe same security concerns as XML, described in IETF RFC 3023[IETF RFC 3023],section 10.

In addition to concerns shared with XML, the schema identifierrefers to information external to the EXI document itself. Ifan attacker is able to substitute another schema in place ofthe intended one, the semantics of the EXI document could bechanged in some ways. As an example, EXI is sensitive to theorder of the values in an enumeration. It is not known whethersuch an attack is possible on the actual structure of thedocument.

Also, EXI supports user-defined datatype representations, and suchrepresentations, if present in a document and purportedly understood bya processor, can be a security weakness. Definitions of theserepresentations are expected to be external, often application- orindustry-specific, so any definition needs to be analyzed carefully fromthe security perspective before being adopted.

Interoperability considerations:

The datatype representation map feature of EXI requirescoordination between the producer and consumer of an EXIdocument, and is not recommended except in controlledenvironments or using standardized datatype representationspotentially defined in the future.

EXI permits information necessary to decode a document to beomitted with the expectation that such information has beencommunicated out of band. Such omissions hinderinteroperability in uncontrolled environments.

Published specification:

Efficient XML Interchange (EXI) Format 1.0, World Wide WebConsortium

Applications that use this media type:

No known applications currently use this media type.

Additional information:

Magic number(s):

	The first four octets may be hexadecimal 24 45 58 49 ("$EXI").The first octet after these, or the first octet of the wholecontent if they are not present, has its high two bits set tovalues 1 and 0 in that order.

File extension(s):
	.exi

Macintosh file type code(s):
	APPL

Consideration of alternatives :
	When transferring EXI streams over a protocol that can identify and negotiate the content coding of XML information independent of its media-type, the content-coding should be used to identify and negotiate how the XML information is encoded and the media-type should be used to negotiate and identify what type of information is transferred.

Person & email address to contact for further information:

World Wide Web Consortium <web-human@w3.org>

Intended usage:

COMMON

Restrictions on usage:

none

Author/Change controller:

The EXI specification is the product of the World Wide WebConsortium's Efficient XML Interchange Working Group. The W3Chas change control over this specification.

G Example Encoding (Non-Normative)

EXI Primer[EXI Primer] contains a section that explains the workings of EXI format using simple example documents. Those examples are intended to serve as a tool to confirm the understanding of the EXI format in action by going through encoding and decoding processes step by step.

H Schema-informed Grammar Examples (Non-Normative)

As an example to exercise the process to produce schema-informed element grammars, consider the following XML Schema fragment declaring two complex-typed elements, <product> and <order>:

Example H-1.Example XML Schema fragment

<xs:element name="product">  <xs:complexType>    <xs:sequence maxOccurs="2">      <xs:element name="description" type="xs:string" minOccurs="0"/>      <xs:element name="quantity" type="xs:integer" />      <xs:element name="price" type="xs:float" />    </xs:sequence>    <xs:attribute name="sku" type="xs:string" use="required" />    <xs:attribute name="color" type="xs:string" use="optional" />  </xs:complexType></xs:element><xs:element name="order">  <xs:complexType>    <xs:sequence>      <xs:element ref="product" maxOccurs="unbounded" />    </xs:sequence>  </xs:complexType></xs:element>

SectionH.1 Proto-Grammar Examples guides you through the process ofgeneratingEXI proto-grammars from the schema components available in the example schema above. EXI grammars in the normalized form that correspond to the proto-grammars are shown in sectionH.2 Normalized Grammar Examples. SectionH.3 Complete Grammar Examples shows the complete EXI grammars for elements <product> and <order>.

H.1 Proto-Grammar Examples

Grammars for element declaration terms "description", "quantity" and "price" are as follows. See section8.5.4.1.6 Element Terms for the rules used togenerate grammars for element terms.

Term_description

	Term_description₀ :
		SE("description")Term_description₁

	Term_description₁ :
		EE

Term_quantity

	Term_quantity₀ :
		SE("quantity")Term_quantity₁

	Term_quantity₁ :
		EE

Term_price

	Term_price₀ :
		SE("price")Term_price₁

	Term_price₁ :
		EE

The grammar for element particle "description" iscreated based onTerm_description given { minOccurs } value of 0 and { maxOccurs } value of 1. See section8.5.4.1.5 Particles for the rules used togenerate grammars for particles.

Particle_description

	Term_description₀ :
		SE("description")Term_description₁
		EE

	Term_description₁ :
		EE

Grammars for element particle "quantity" and "prices" are the same as those of their terms (Term_quantity andTerm_price, respectively) because {minOccurs} and {maxOccurs} are both 1.

Particle_quantity

	Term_quantity₀ :
		SE("quantity")Term_quantity₁

	Term_quantity₁ :
		EE

Particle_price

	Term_price₀ :
		SE("price")Term_price₁

	Term_price₁ :
		EE

The grammar for the sequence group term in <product> element declaration iscreated based onthe grammars of subordinate particles as follows. See section8.5.4.1.8.1 Sequence Model Groups for the rules used togenerate grammars for sequence groups.

Term_sequence =Particle_description ⊕Particle_quantity ⊕Particle_price

which yields the following grammars forTerm_sequence.

Term_sequence

	Term_description₀ :
		SE("description")Term_description₁
		Term_quantity₀

	Term_description₁ :
		Term_quantity₀

	Term_quantity₀ :
		SE("quantity")Term_quantity₁

	Term_quantity₁ :
		Term_price₀

	Term_price₀ :
		SE("price")Term_price₁

	Term_price₁ :
		EE

The grammar for the particle that is the content model of element <product>is created based onTerm_sequence (shown above) given {minOccurs} value of 1 and {maxOccurs} value of 2. See section8.5.4.1.5 Particles for the rules used togenerate grammars for particles.

Particle_sequence

	Term_description_0,0 :
		SE("description")Term_description_0,1
		Term_quantity_0,0

	Term_description_0,1 :
		Term_quantity_0,0

	Term_quantity_0,0 :
		SE("quantity")Term_quantity_0,1

	Term_quantity_0,1 :
		Term_price_0,0

	Term_price_0,0 :
		SE("price")Term_price_0,1

	Term_price_0,1 :
		Term_description_1,0

	Term_description_1,0 :
		SE("description")Term_description_1,1
		Term_quantity_1,0
		EE

	Term_description_1,1 :
		Term_quantity_1,0

	Term_quantity_1,0 :
		SE("quantity")Term_quantity_1,1

	Term_quantity_1,1 :
		Term_price_1,0

	Term_price_1,0 :
		SE("price")Term_price_1,1

	Term_price_1,1 :
		EE

Grammars for attribute uses of attributes "sku" and "color" are as follows. See section8.5.4.1.4 Attribute Uses for the rules used togenerate grammars for attribute uses.

Use_sku

	Use_sku₀ :
		AT("sku") [schema-typed value]Use_sku₁

	Use_sku₁ :
		EE

Use_color

	Use_color₀ :
		AT("color") [schema-typed value]Use_color₁
		EE

	Use_color₁ :
		EE

Note the subtle difference betweenthe forms of the twogrammarsUse_sku andUse_color.At the outset of the grammars,onlyUse_color contains a production of which the right-hand side starts with EE, whichis the result ofthe difference in their occurrencerequirementdefined in the schema.

Finally, the grammar for the element <product> iscreated based onthe grammars of its attribute uses and content model particle as follows. See section8.5.4.1.3.2 Complex Type Grammars for the rules used togenerate grammars for complex types.

ProtoG_ProductElement =Use_color ⊕Use_sku ⊕Particle_sequence

which yields the following grammar for element <product>.

ProtoG_ProductElement

	Use_color₀ :
		AT("color") [schema-typed value]Use_color₁
		Use_sku₀

	Use_color₁ :
		Use_sku₀

	Use_sku₀ :
		AT("sku") [schema-typed value]Use_sku₁

	Use_sku₁ :
		Term_description_0,0

	Term_description_0,0 :
		SE("description")Term_description_0,1
		Term_quantity_0,0

	Term_description_0,1 :
		Term_quantity_0,0

	Term_quantity_0,0 :
		SE("quantity")Term_quantity_0,1

	Term_quantity_0,1 :
		Term_price_0,0

	Term_price_0,0 :
		SE("price")Term_price_0,1

	Term_price_0,1 :
		Term_description_1,0

	Term_description_1,0 :
		SE("description")Term_description_1,1
		Term_quantity_1,0
		EE

	Term_description_1,1 :
		Term_quantity_1,0

	Term_quantity_1,0 :
		SE("quantity")Term_quantity_1,1

	Term_quantity_1,1 :
		Term_price_1,0

	Term_price_1,0 :
		SE("price")Term_price_1,1

	Term_price_1,1 :
		EE

The other element declaration <order> can be processedto generate its proto-grammar in a similar fashion as follows.


Term_product

	Term_product₀ :
		SE("product")Term_product₁

	Term_product₁ :
		EE

The grammar for element particle "product" is created based onTerm_product given { minOccurs } value of 1 and { maxOccurs } value ofunbounded. See section8.5.4.1.5 Particles for the rules used to generate grammars for particles.


Particle_product (before simplification)


	Term_product_0,0 :
		SE("product")Term_product_0,1

	Term_product_0,1 :
		Term_product_1,0

	Term_product_1,0 :
		SE("product")Term_product_1,1
		EE

	Term_product_1,1 :
		Term_product_1,0

In the above grammars, two grammarsTerm_product_0,1 andTerm_product_1,1 are redundant because they serve for no other purpose than simply relaying one non-terminal to another. Though it is not required, the uses of non-terminalsTerm_product_0,1 andTerm_product_1,1 are each replaced byTerm_product_1,0 andTerm_product_1,0, which produces the followingsimplifiedproto-grammars.

Particle_product (after simplification)



	Term_product_0,0 :
		SE("product")Term_product_1,0

	Term_product_1,0 :
		SE("product")Term_product_1,0
		EE

The proto-grammar of the element <order> equates toParticle_product because the type definition of element <order> has no attribute uses, and its content model has both { minOccurs } and { maxOccurs } property values of 1 where the element particle "product" is the sole member of the content model.

ProtoG_OrderElement



	Term_product_0,0 :
		SE("product")Term_product_1,0

	Term_product_1,0 :
		SE("product")Term_product_1,0
		EE

H.2 Normalized Grammar Examples

The element proto-grammarsProtoG_ProductElement andProtoG_OrderElement produced in the previous section can be turned into their normalized forms which are shown below with anevent code assigned to each production. See section8.5.4.2 EXI Normalized Grammars for the process that converts proto-grammars into normalized grammars, and section8.5.4.3 Event Code Assignment for the rules that determine theevent codes of productions in normalized grammars.


		Event Code
Use_color₀ :
	AT("color") [schema-typed value]Use_color₁	0
	AT("sku") [schema-typed value]Use_sku₁	1

Use_color₁ :
	AT("sku") [schema-typed value]Use_sku₁	0

Use_sku₁ :
	SE("description")Term_description_0,1	0
	SE("quantity")Term_quantity_0,1	1

Term_description_0,1 :
	SE("quantity")Term_quantity_0,1	0

Term_quantity_0,1 :
	SE("price")Term_price_0,1	0

Term_price_0,1 :
	SE("description")Term_description_1,1	0
	SE("quantity")Term_quantity_1,1	1
	EE	2

Term_description_1,1 :
	SE("quantity")Term_quantity_1,1	0

Term_quantity_1,1 :
	SE("price")Term_price_1,1	0

Term_price_1,1 :
	EE	0


		Event Code
Term_product_0,0 :
	SE("product")Term_product_1,0	0

Term_product_1,0:
	SE("product")Term_product_1,0	0
	EE	1

Note that someproductionsthat were present in the proto-grammars have been removed in the normalized grammars.Thoseproductionswere culled upon the completion of grammar normalization because their left-hand-side non-terminals are not referenced from right-hand side of any available productions,and yet those non-terminals are not the first non-terminals of the grammar they belong to..

H.3 Complete Grammar Examples

The normalized grammarsNormG_ProductElement andNormG_OrderElement are augmented with undeclared productions to become complete grammars.See section8.5.4.4 Undeclared Productions for the processto augment normalized grammars with productions for accepting terminal symbols not declared in schemas.The complete grammars for elements <product> and <order> are shown below.Note that the default grammar settings (i.e. the settings that can be described by an empty header options document <exi:header/> is used for the sake of this augmentation process, and those productions that accept ER, NS, CM and PI have been pruned according to the rules described in section8.3 Pruning Unneeded Productions since thoseterminal symbolsare not preserved in the default grammar settings.


		Event Code
Use_color₀ :
	AT("color") [schema-typed value]Use_color₁	0
	AT("sku") [schema-typed value]Use_sku₁	1
	EE	2.0
	AT(xsi:type)Use_color₀	2.1
	AT(xsi:nil)Use_color₀	2.2
	AT ()Use_color*₀	2.3
	AT("color") [untyped value]Use_color₁	2.4.0
	AT("sku") [untyped value]Use_sku₁	2.4.1
	AT () [untyped value]Use_color*₀	2.4.2
	SE()Use_sku*_{1_copied}	2.5
	CH [untyped value]Use_sku_{1_copied}	2.6

Use_color₁ :
	AT("sku") [schema-typed value]Use_sku₁	0
	EE	1.0
	AT ()Use_color*₁	1.1
	AT("sku") [untyped value]Use_sku₁	1.2.0
	AT () [untyped value]Use_color*₁	1.2.1
	SE()Use_sku*_{1_copied}	1.3
	CH [untyped value]Use_sku_{1_copied}	1.4

Use_sku₁ :
	SE("description")Term_description_0,1	0
	SE("quantity")Term_quantity_0,1	1
	EE	2.0
	AT ()Use_sku*₁	2.1
	AT () [untyped value]Use_sku*₁	2.2.0
	SE()Use_sku*_{1_copied}	2.3
	CH [untyped value]Use_sku_{1_copied}	2.4

Use_sku_{1_copied} :
	SE("description")Term_description_0,1	0
	SE("quantity")Term_quantity_0,1	1
	EE	2.0
	SE()Use_sku*_{1_copied}	2.1
	CH [untyped value]Use_sku_{1_copied}	2.2

Term_description_0,1 :
	SE("quantity")Term_quantity_0,1	0
	EE	1
	SE()Term_description*_0,1	2.0
	CH [untyped value]Term_description_0,1	2.1

Term_quantity_0,1 :
	SE("price")Term_price_0,1	0
	EE	1
	SE()Term_quantity*_0,1	2.0
	CH [untyped value]Term_quantity_0,1	2.1

Term_price_0,1 :
	SE("description")Term_description_1,1	0
	SE("quantity")Term_quantity_1,1	1
	EE	2
	SE()Term_price*_0,1	3.0
	CH [untyped value]Term_price_0,1	3.1

Term_description_1,1 :
	SE("quantity")Term_quantity_1,1	0
	EE	1
	SE()Term_description*_1,1	2.0
	CH [untyped value]Term_description_1,1	2.1

Term_quantity_1,1 :
	SE("price")Term_price_1,1	0
	EE	1
	SE()Term_quantity*_1,1	2.0
	CH [untyped value]Term_quantity_1,1	2.1

Term_price_1,1 :
	EE	0
	SE()Term_price*_1,1	1.0
	CH [untyped value]Term_price_1,1	1.1


		Event Code
Term_product_0,0 :
	SE("product")Term_product_1,0	0
	EE	1.0
	AT(xsi:type)Term_product_0,0	1.1
	AT(xsi:nil)Term_product_0,0	1.2
	AT ()Term_product*_0,0	1.3
	AT () [untyped value]Term_product*_0,0	1.4.0
	SE()Term_product*_{0,0_copied}	1.5
	CH [untyped value]Term_product_{0,0_copied}	1.6

Term_product_{0,0_copied} :
	SE("product")Term_product_1,0	0
	EE	1.0
	SE()Term_product*_{0,0_copied}	1.1
	CH [untyped value]Term_product_{0,0_copied}	1.2

Term_product_1,0 :
	SE("product")Term_product_1,0	0
	EE	1
	SE()Term_product*_1,0	2.0
	CH [untyped value]Term_product_1,0	2.1

I Recent Specification Changes (Non-Normative)

I.1 Changes from First Edition Recommendation

Clarified the definition and intended use of content index. (see19 August 2013 (1),19 August 2013 (2),19 August 2013 (3),19 August 2013 (4),19 August 2013 (5),19 August 2013 (6))

Clarified the offset of Integer datatype for bounded range schema types. (see27 June 2013)

Add a references to Namespaces in XML 1.1 specification. (see26 June 2013)

Clarified the valid value range of the time components in Date-Time datatype. (see13 June 2013)

Clarified that the namespace declarations are mapped to NS events and should not be represented by AT events. (see06 May 2013)

Fixed discrepancy between complex type grammars and ur-type grammar. (see29 March 2013 (1),29 March 2013 (2), and29 March 2013 (3))

Clarified that enumerated values do not affect how values of list datatypes are encoded. (see19 September 2012 (1),19 September 2012 (2))

Clarified that AT(xsi:type) productions are added to a grammar at most once. (seeAT(xsi:type) handling in Built-in Element Grammar)

Improved the wording describing when to add an extra production representing EE when { max_occurs } is unbounded. (seeUnbounded { max_occurs } of Particles)

Clarified when patterns if any are relevant in Boolean datatype representation. (seePatterns in Boolean)

Clarified how values with enumerated values are represented when a DTRM is in effect. (seeEnumerated Values with DTRM)

Added a clarification regarding the restricted character set used for a value that would be represented as an EXI enumeration. (seeRestricted Character Set of Enumeration)

I.2 Changes from previous versions of the document

The changes from First Public Working Draft are available in theFirst Edition Recommendation

J Acknowledgements (Non-Normative)

This document is the work of theEfficient XML Interchange (EXI) WG.

Members of the Working Group are (at the time of writing, sorted alphabetically by last name):

Carine Bournez, W3C/ERCIM (staff contact)
Don Brutzman, Web3D Consortium
Michael Cokus, MITRE Corporation
Yusuke Doi, Toshiba Corporation
Youenn Fablet, Canon, Inc.
Jun Fujisawa, Canon, Inc.
Joerg Heuer, Siemens AG
Sebastian Käbisch, Siemens AG
Takuki Kamiya, Fujitsu Laboratories of America, Inc.(chair)
Rumen Kyusakov, Invited Expert, Luleå University of Technology
Richard Kuntschke, Siemens AG
Don McGregor, Web3D Consortium
Daniel Peintner, Siemens AG
Liam Quin, W3C/MIT (staff contact)
Mohamed Zergaoui, INNOVIMAX

The EXI Working Group would like to acknowledge the following former members of the group for their leadership, guidance and expertise they provided throughout their individual tenure in the WG. (sorted in chronologically)

Oliver Goldman, Adobe Systems, Inc. (former co-chair) (until 8 June 2006)
Robin Berjon, Expway (former co-chair) (until 17 October 2006)
Peter Haggar, IBM (until 7 March 2007)
Paul Thorpe, OSS Nokalva, Inc. (until 11 Sept 2007)
Kimmo Raatikainen, Nokia (until 13 March 2008)
Daniel Vogelheim, Invited Expert (former co-chair then from Siemens AG) (until 15 July 2008)
Stephen Williams, High Performance Technologies, Inc. (until 8 Aug 2008)
Ed Day, Objective Systems, Inc. (until 23 Oct 2009)
Santiago Pericas-Geertsen, Sun Microsystems, Inc. (until 6 May 2010)
Paul Sandoz, Sun Microsystems, Inc. (until 6 May 2010)
Alan Hudson, Web3D Consortium (until 2 June 2011)
Sheldon Snyder, Web3D Consortium (until 2 June 2011)
John Schneider, AgileDelta, Inc. (until 19 July 2012)
Rich Rollman, AgileDelta, Inc. (until 19 July 2012)
Nan Ma, China Electronics Standardization Institute (until 19 July 2012)
Jaakko Kangasharju, University of Helsinki (until 19 July 2012)
Greg White, Stanford University (former co-chair) (until 19 July 2012)
David Lee, MarkLogic (until 6 September 2013)
Hideyuki Moribe, Fujitsu Laboratories of America, Inc. (until 14 January 2014)

The EXI working group owes so much to our distinguished colleague from Nokia, Kimmo Raatikainen (1955-2008), on the progress of our work, who succumbed to an ailment on March 13, 2008. His breadth of knowledge, depth of insight, ingenuity and courage to speak up constantly shed a light onto us whenever the group seemed to stray into a futile path of disagreements during the course. We shall never forget and will always appreciate his presence in us, and great contribution that is omnipresent in every aspect of our work throughout.

Movatterモバイル変換

Efficient XML Interchange (EXI) Format 1.0 (Second Edition)

W3C Recommendation 11 February 2014

Abstract

Status of this Document

Table of Contents

Appendices

1. Introduction

1.1 History and Design

1.2 Notational Conventions and Terminology

2. Design Principles

3. Basic Concepts

4. EXI Streams

5. EXI Header

5.1 EXI Cookie

5.2 Distinguishing Bits

5.3 EXI Format Version

5.4 EXI Options

6. Encoding EXI Streams

6.1 Determining Event Codes

6.2 Representing Event Codes

6.3 Fidelity Options

7. Representing Event Content

7.1 Built-in EXI Datatype Representations

7.1.1 Binary

7.1.2 Boolean

7.1.3 Decimal

7.1.4 Float

7.1.5 Integer

7.1.6 Unsigned Integer

7.1.7 QName

7.1.8 Date-Time

7.1.9n-bit Unsigned Integer

7.1.10 String

7.1.10.1 Restricted Character Sets

7.1.11 List

7.2 Enumerations

7.3 String Table

7.3.1 String Table Partitions

7.3.2 Partitions Optimized for Frequent use of Compact Identifiers

7.3.3 Partitions Optimized for Frequent use of String Literals

7.4 Datatype Representation Map

8. EXI Grammars

8.1 Grammar Notation

8.1.1 Fixed Event Codes

8.1.2 Variable Event Codes

8.2 Grammar Event Codes

8.3 Pruning Unneeded Productions

8.4 Built-in XML Grammars

8.4.1 Built-in Document Grammar

8.4.2 Built-in Fragment Grammar

8.4.3 Built-in Element Grammar

8.5 Schema-informed Grammars

8.5.1 Schema-informed Document Grammar

8.5.2 Schema-informed Fragment Grammar

8.5.3 Schema-informed Element Fragment Grammar

8.5.4 Schema-informed Element and Type Grammars

8.5.4.1 EXI Proto-Grammars

8.5.4.1.1 Grammar Concatenation Operator

8.5.4.1.2 Element Grammars

8.5.4.1.3 Type Grammars

8.5.4.1.3.1 Simple Type Grammars

8.5.4.1.3.2 Complex Type Grammars

8.5.4.1.4 Attribute Uses

8.5.4.1.5 Particles

8.5.4.1.6 Element Terms

8.5.4.1.7 Wildcard Terms

8.5.4.1.8 Model Group Terms

8.5.4.1.8.1 Sequence Model Groups

8.5.4.1.8.2 Choice Model Groups

8.5.4.1.8.3 All Model Groups

8.5.4.2 EXI Normalized Grammars

8.5.4.2.1 Eliminating Productions with no Terminal Symbol

8.5.4.2.2 Eliminating Duplicate Terminal Symbols

8.5.4.3 Event Code Assignment

8.5.4.4 Undeclared Productions

8.5.4.4.1 Adding Productions when Strict is False

8.5.4.4.2 Adding Productions when Strict is True

9. EXI Compression

9.1 Blocks