Copyright © 2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This Working Group Note is an evaluation of the Efficient XML Interchange (EXI) Format 1.0 with reference to the Properties identified by the XML Binary Characterization (XBC) Working Group, relative to XML, gzipped XML and ASN.1 PER. It is conducted using the XBC Measurement methodology. For the "compactness" and "processing efficiency" Properties, performance is measured with the EXI Measurement framework over the test data collected for the EXI measurements, which represent the XBC Use Cases.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the second Working Draft of the evaluation of the EXI Format 1.0 conducted by the EXI Working Group. This draft includes results for all properties, including "compactness" and "processing efficiency".
This document was developed by the Efficient XML Interchange (EXI) Working Group. A complete list of changes to this document is available.
Comments on this document are invited and should be sent to the public mailing list public-exi@w3.org (public archive). If substantive comments are received, the Working Group may revise this Working Group Note.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document presents the anticipated benefits of the EXI format 1.0 compared to XML and gzipped XML. Additionally, tests for compactness include a comparison to ASN.1 PER. The points of comparison are the requirements set by the EXI Working Group charter, based on the results of the XML Binary Characterization Working Group.
This summarized evaluation of the EXI format uses the testing framework built during the first phase of the EXI Working Group's work to select a baseline candidate technology. Although this evaluation aims at demonstrating EXI benefits in the targeted XBC Use Cases, it can also be read as a summary of the EXI measurements Note.
The methodology used in the evaluation relies on previous work on measurements. The Properties referred to in this document have been defined by the XBC Working Group. The methodology for measurement is detailed in the XBC measurement methodology document. For convenience, Appendix A gives an overview of the property definitions, as well as some details of their measurements.
In addition, two Properties require an implementation to be evaluated: Compactness and Processing Efficiency. These Properties have been tested using the EXI measurement framework and the associated methodology.
At the time of the first publication of this document, the Working Group had not tested conformance of implementations. The methodology and framework designed and implemented on Japex by the Working Group are used for the properties that require implementation testing. The other properties can be assessed by checking the specification only.
The compactness test has been run over the EXI Working Group's framework test data, which contains 94 test documents from 21 test groups. The following graphs show the resulting size as a percentage of the original XML document size, sorted by the EXI result for the sake of legibility (i.e., "best" results on the left). The implementation of EXI used for the measurements is Efficient XML 4.0. It implements the specification of the EXI format 1.0 at the time of writing.
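As a minimal sketch of the metric plotted in these graphs, the Python fragment below expresses an encoded size as a percentage of the original XML size and sorts the results so that the smallest come first. Python's standard library has no EXI codec, so only the gzipped-XML baseline is computed; the function names and the two sample documents are hypothetical and are not part of the EXI test data.

```python
import gzip

def percent_of_xml(encoded_size: int, xml_size: int) -> float:
    """Encoded size expressed as a percentage of the original XML size."""
    return 100.0 * encoded_size / xml_size

def gzip_percentages(docs: dict[str, bytes]) -> list[tuple[str, float]]:
    """Gzip each document and return (name, percent) pairs, smallest first."""
    results = []
    for name, xml in docs.items():
        compressed = gzip.compress(xml)
        results.append((name, percent_of_xml(len(compressed), len(xml))))
    # Sort so that the "best" (smallest) results come first, as in the graphs.
    return sorted(results, key=lambda item: item[1])

# Hypothetical sample documents, not taken from the EXI test data.
docs = {
    "tiny-message": b"<temp>21.5</temp>",
    "repetitive": b"<list>" + b"<item id='1'>value</item>" * 200 + b"</list>",
}

for name, pct in gzip_percentages(docs):
    print(f"{name}: {pct:.1f}% of original size")
```

With these inputs the tiny message comes out above 100%, because gzip's fixed header and trailer outweigh any savings on a 17-byte document, while the repetitive list shrinks to a small fraction of its original size.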
For each test case, the testing framework uses the most appropriate application class: whenever a schema is available, EXI uses the schema information, and when a document-analysis-based technique leads to a better result, the compression option is turned on.
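The selection rule above can be sketched as follows, using the four classes named in Appendix A (Neither, Schema, Document, Both). The `ApplicationClass` type and both helper functions are hypothetical illustrations, and the byte counts in the usage line are invented; the sketch only shows the decision logic, not how Efficient XML or the Japex-based framework actually implements it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ApplicationClass:
    name: str
    use_schema: bool       # schema-informed encoding
    use_compression: bool  # EXI compression (document analysis)

NEITHER = ApplicationClass("Neither", use_schema=False, use_compression=False)
DOCUMENT = ApplicationClass("Document", use_schema=False, use_compression=True)
SCHEMA = ApplicationClass("Schema", use_schema=True, use_compression=False)
BOTH = ApplicationClass("Both", use_schema=True, use_compression=True)

def candidate_classes(schema_available: bool) -> list[ApplicationClass]:
    """Schema information is always used when a schema exists."""
    if schema_available:
        return [SCHEMA, BOTH]
    return [NEITHER, DOCUMENT]

def pick_class(sizes: dict[ApplicationClass, int]) -> ApplicationClass:
    """Keep compression on only if it produced the smaller encoding."""
    return min(sizes, key=lambda cls: sizes[cls])

# Hypothetical encoded sizes (bytes) for one test case that has a schema.
best = pick_class({SCHEMA: 410, BOTH: 380})
print(best.name)  # Both
```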
The graph above compares EXI to gzipped XML. As shown by the graph, EXI is consistently smaller than gzipped XML regardless of document size, document structure or the availability of schema information. In some cases, EXI is over 10 times smaller than gzipped XML. In addition, EXI works well in cases where gzip has little effect or even makes documents bigger, such as high-volume streams of small messages typical of geolocation, financial exchange and sensor applications.
The graph above compares the same EXI numbers to the ASN.1 PER file sizes. Each EXI-encoded file is smaller than the equivalent ASN.1 PER encoding, and sometimes 20 times smaller. This holds true even in cases where EXI preserves XML comments, processing instructions and namespace prefixes, which ASN.1 PER does not preserve. In addition, EXI works well in cases where ASN.1 PER actually increases the size of the document or fails to produce an encoding at all (e.g., due to schema deviations).
The processing efficiency tests were run using the EXI Working Group's framework test data and test methodology on a Windows XP machine with a 3.0 GHz Pentium 4 CPU and 1.5 GB of RAM. Processing efficiency was measured in transactions per second (TPS), and the following graphs show results as a percentage of XML speed and gzipped XML speed, sorted by EXI result for legibility. For example, a measurement of 200% is two times faster, a measurement of 300% is three times faster, and so on.
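The conversion from raw throughput to the percentages plotted in these graphs is simple arithmetic. The sketch below uses hypothetical helper names and made-up TPS values.

```python
def tps(transactions: int, seconds: float) -> float:
    """Transactions per second for one measured run."""
    return transactions / seconds

def relative_speed(exi_tps: float, baseline_tps: float) -> float:
    """EXI throughput as a percentage of the baseline (XML or gzipped XML)."""
    return 100.0 * exi_tps / baseline_tps

# Made-up numbers: 600 EXI transactions and 300 XML transactions in 1 second.
print(f"{relative_speed(tps(600, 1.0), tps(300, 1.0)):.0f}%")  # 200%, i.e. two times faster
```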
It is important to note that processing efficiency is also implementation-dependent, and not all EXI implementations will achieve the performance results illustrated here. The implementation of EXI used for these measurements was Efficient XML 4.0, which implements the EXI 1.0 format specification.
The following two graphs illustrate the decoding (i.e., parsing) speed of Efficient XML with and without EXI compression for each test case.
The graph above shows EXI decoding speed without compression compared to XML. The average decoding speed of EXI was 14.5 times the average decoding speed of XML, and the median speedup was 6.7 times. To improve readability, the graph does not show the four best cases, which ranged from 54 to 257 times faster. These four test cases were SOAP web-service messages that were marshalled from a binding layer and contained repeating structures with elements and attributes from several different namespaces. As is typical for such use cases, the repeated structures contained a large number of repeated namespace declarations. EXI eliminates most of the overhead associated with namespace processing, which is why it achieved such a large speed increase in these cases.
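The gap between the average and the median comes from a few very large values, such as the SOAP cases. The sketch below uses invented per-test speedup ratios (not the measured results) to show how a single outlier pulls a mean well above the median.

```python
from statistics import mean, median

# Invented speedup ratios for six hypothetical test cases; one is an outlier.
speedups = [2.0, 3.0, 4.0, 5.0, 6.0, 70.0]

print(f"mean:   {mean(speedups):.1f}x")    # 15.0x, dominated by the outlier
print(f"median: {median(speedups):.1f}x")  # 4.5x, the typical case
```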
The graph above shows EXI decoding speed with compression compared to XML with compression. The average decoding speed of EXI was 9.2 times the average decoding speed of gzipped XML, and the median speedup was 4.4 times. To improve readability, the graph does not show the four best cases, which ranged from 30 to 102 times faster. These were the same four SOAP web-service test cases described in the previous paragraph.
The following two graphs illustrate the encoding (i.e., serialization) speed of Efficient XML with and without EXI compression for each test case.
The graph above shows EXI encoding speed without compression compared to XML. The average encoding speed of EXI was 6.0 times the average encoding speed of XML, and the median speedup was 2.4 times. To improve readability, the graph does not show the best case, which was 21 times faster.
The graph above shows EXI encoding speed with compression compared to XML with compression. The average encoding speed of EXI was 5.4 times the average encoding speed of gzipped XML, and the median speedup was 2.7 times. The graph does not show the best case, which was just over 18 times faster.
The XBC Working Group analyzed all the properties desired by the XBC use cases and identified a minimum set of required properties for the W3C EXI format in its XBC Characterization document. For several of these properties, the XBC Working Group defined specific thresholds a data format must achieve to satisfy the requirements of the XBC use cases. For example, several XBC use cases required a data format with compactness similar to custom binary formats. In discussing the needs of these use cases, it was determined that, to satisfy their "compactness" requirements, the format must be no larger than ASN.1 PER + 20% when schema optimizations are used. The table below lists those properties and scores EXI for each. For comparison purposes, scores are also given for XML(+gzip).
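The ASN.1 PER + 20% threshold amounts to a one-line comparison. The sketch below uses integer arithmetic to avoid floating-point rounding at the boundary; the function name and the sizes in the usage lines are hypothetical.

```python
def meets_per_threshold(exi_size: int, per_size: int, margin_percent: int = 20) -> bool:
    """True if the EXI encoding is no larger than ASN.1 PER size plus margin_percent."""
    # Compare exi_size <= per_size * (1 + margin/100) without using floats.
    return exi_size * 100 <= per_size * (100 + margin_percent)

print(meets_per_threshold(1200, 1000))  # True: exactly PER + 20%
print(meets_per_threshold(1201, 1000))  # False: just over the threshold
```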
Note: These properties were designed to determine whether candidate EXI formats meet the requirements and specific performance thresholds of the XBC use cases. So, when the table says XML(+gzip) does not meet the compactness requirement, it means it does not meet the specific compactness threshold required by the XBC use cases. Similarly, when the table says XML(+gzip) does not meet the processing efficiency requirement, it means that XML(+gzip) does not meet the specific processing efficiency requirements of the XBC use cases. Several of these use cases specify a need for a format that is "faster to process than XML" (e.g., to be competitive with binary RPC mechanisms), so by definition it is impossible for XML to meet this requirement.
The XBC Working Group classified these properties into two categories, based on the observation that some properties are inherent to the format (e.g., compactness), while others are properties of implementations of the format (e.g., processing efficiency) that characteristics of the format could nonetheless prevent implementations from achieving. Accordingly, the first category lists properties the format must support inherently, and the second category lists properties of implementations that the format must not prevent.
The score given for each property is the evaluation result of the format or implementation thereof, obtained by following the methodology defined in the XBC Measurement Methodologies document.
Property | XML (+gzip) | XML (+gzip) notes | EXI | EXI notes |
MUST support | | | | |
Directly Readable and Writable | No | The XML format itself satisfies this property, but gzip compression applied to a file format naturally requires creating the intermediate form (XML) first. | Yes | Implementations can read and write EXI streams directly via standard XML APIs, such as DOM, SAX and StAX. At least one current implementation also supports typed APIs for increased performance. |
Transport Independence | Yes | | Yes | EXI can be used over TCP, UDP, HTTP and various wireless and satellite transports. |
Compactness | No | XML and gzip cannot take advantage of schema information, so this format fails in the Schema and Both classes (it does not achieve the compactness typically required by applications that use binary data formats, like ASN.1, CORBA, XDR, etc.). By definition, it succeeds in the Document class and fails in the Neither class (because of the different requirements in the Neither class, it would have to be smaller than itself). (See Note above this table.) | Yes | |
Human Language Neutral | Yes | | Yes | EXI supports all standard character set encodings. |
Platform Neutrality | Yes | | Yes | The EXI format specification does not make particular assumptions about the platform architecture. Implementations already exist for several popular server, desktop and mobile platforms, including Java EE/SE, Microsoft .NET, Java Mobile Edition and .NET Compact Framework. |
Integratable into XML Stack | Yes | | Yes | EXI was designed to integrate well into the XML stack, neither duplicating nor requiring changes to functionality at other layers in the XML stack. It builds on the XML Infoset data model. It implements the same character encodings as text XML and supports the same common interfaces as existing XML parsers and serializers. As such, it can be inserted into existing XML applications with minimal time and cost. |
Royalty Free | Yes | | Yes | Per the W3C Patent Policy. |
Fragmentable | Yes | | Yes | EXI can represent any collection of XML fragments extracted from any collection of XML documents. All schema optimization, bit-packing and XML compression algorithms apply equally to fragments. |
Streamable | Yes | | Yes | |
Roundtrip Support | Yes | The equivalence is exact in both cases. | Yes | EXI supports lossless equivalence for PSVI, Infoset and lexical applications, such as XML Digital Signatures. The EXI "preserve" option can be used when this property is needed. |
Generality | No | XML scores 8/20, gzipped XML 10/20 (see Appendix B). | Yes | EXI scores 19/20 (see Appendix B). |
Schema Extensions and Deviations | Yes | | Yes | EXI includes schema optimizations that support arbitrary schema extensions and deviations. Applications may specify strict or extensible schema handling and may provide a full schema, a partial schema or no schema at all. |
Format Version Identifier | Yes | Both XML and gzip include an identifier in the header. | Yes | The EXI header includes the version. |
Content Type Management | Yes | | Yes | EXI can be used in various contexts, some of which use a media type, some a content encoding, and some both. |
Self-Contained | Yes | | Yes | When schema optimizations are not used, EXI documents are always self-contained. |
MUST NOT Prevent | | | | |
Processing Efficiency | Prevents (see Note above) | The XBC Measurement methodology defines processing efficiency relative to XML. This renders XML implementations unable to exceed the threshold by definition. The score given on the left merely reflects this, and means only that XML is the norm used by the methodology. EXI outperforms XML(+gzip), as demonstrated in Processing efficiency results and summarized in the columns on the right. | Does Not Prevent | Current implementations achieve performance several times faster than XML, using both in-memory tests and more realistic scenarios that involve file and network IO. These implementations do not depend on compile-time schema-binding techniques that make dynamically acquiring, loading or updating schemas impractical or impossible. |
Small Footprint | Does Not Prevent | | Does Not Prevent | TBD in CR phase: check implementations on a variety of small, mobile devices. |
Widespread Adoption | Does Not Prevent | Both XML and gzip have been widely adopted and included in many protocol standards. | Does Not Prevent | |
Space Efficiency | Prevents | | Does Not Prevent | TBD in CR phase: check implementations for small, mobile devices. |
Implementation Cost | Does Not Prevent | | Does Not Prevent | TBD in CR phase. |
Forward Compatibility | Does Not Prevent | | Does Not Prevent | |
A format is directly readable and writable if it can be serialized from an instance of a data model and parsed into an instance of a data model without first being transformed to an intermediate representation. The retained data model for EXI is the XML Infoset.
A format is transport independent if its only assumptions about the transport service are "error-free and ordered delivery of messages without any arbitrary restrictions on the message length".
However, a protocol binding can specify how a format is transmitted as payload in a specific transport (e.g., TCP/IP) or messaging (e.g., HTTP) protocol.
The Compactness property measurement represents the amount of compression a particular format achieves when encoding data model items. There are three categories of methods to reduce the size of a data object or data model items:
This property is measured in the EXI testing framework in four measurement modes: "Neither" optimization (pure tokenization), "Schema" (schema-based compression), "Document" (data analysis) and "Both" (data analysis + schema-based compression).
A format is human language neutral if it is not significantly more optimal for processing when its content is in a given language or set of languages, and does not impose restrictions on the languages or combinations of languages that may be used with it. Historically, many data and document formats supported only certain character encodings. XML does not suffer from such limitations, and it is expected that EXI will not limit the usage of particular human languages.
In terms of compactness or processing efficiency, it is not possible to ensure the same performance for a language that can be entirely captured using a single byte per character and for one that requires a multi-byte encoding, but internationalization support equivalent to that of XML is necessary for wide adoption.
Platform neutrality is the property of formats that are not significantly more optimal for processing on some computing platforms or architectures than on others (e.g., endianness, native structures of a programming language). Platform neutrality not only makes wide adoption possible, but also makes the format more resilient to the passing of time.
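One common way for a binary format to stay neutral with respect to endianness is to fix the octet order in the specification rather than inherit it from the host. As an illustration, the sketch below assumes the EXI 1.0 Unsigned Integer representation: 7-bit groups, least significant group first, with the high bit of each octet set when more octets follow (the same layout as LEB128). The function names are hypothetical; the EXI specification remains the normative definition.

```python
def encode_unsigned(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("unsigned integers only")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more octets follow
        else:
            out.append(group)         # high bit clear: last octet
            return bytes(out)

def decode_unsigned(data: bytes) -> int:
    """Inverse of encode_unsigned; stops at the first octet with the high bit clear."""
    result = 0
    shift = 0
    for octet in data:
        result |= (octet & 0x7F) << shift
        if not octet & 0x80:
            return result
        shift += 7
    raise ValueError("truncated unsigned integer")

print(encode_unsigned(300).hex())  # ac02
```

The output is the same on little-endian and big-endian hosts, because it is computed with shifts and masks on the integer value rather than by reinterpreting machine words.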
In some cases, options in the format may be used based on the preferred parameters of the systems involved. Thus, the XBC Working Group proposed three possible values:
It must also be noted that allowing too many mechanisms (options or parameters) for optimization may in fact prove to be a pessimisation.
Per the EXI Working Group charter, this property must be seen as a strong requirement. Integration of the EXI format into the stack of existing XML specifications for validation, transformation, querying, APIs, canonicalization, signatures, encryption, etc. is key to wide adoption.
The EXI format will be unencumbered and royalty-free, as ensured by the W3C process, which will foster adoption of this technology across the industry. A free format is also more likely to have free, open source code for processing it and free tools for building applications that use it. In addition, per the EXI Working Group charter, the EXI format will be proven to have at least one publicly available implementation before becoming a W3C Recommendation.
Fragmentability is the ability to encode instances that do not represent the entirety of a document, together with sufficient context for the decoder to process them. In addition to this ability to process fragments in isolation, it covers storing one or more parts of a document instance as immediately extractable fragments, so that they can be pulled out with little or no additional processing cost.
Streamability is the ability to generate correct partial output from partial input. This property is needed in memory-constrained environments where it is important to handle data as it is generated, to avoid buffering data inside the processor. Hence it is also characterized by the amount of buffering that needs to be done in the processors. In particular, the buffer space required for encoding or decoding must be constant, no matter what the input document is or how it is mapped to the data model. This requirement precludes some serialization techniques (e.g., gzip compression over the entire XML document).
Particular attention must be paid to the need for lookahead in the format parser, since it is not always available. For some types of sequences it can be beneficial to have the length of the full serialized form of the sequence precede the actual sequence, but the serializer must then buffer the whole sequence before outputting anything. If such sequences can be arbitrarily long, this sacrifices output streamability.
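The trade-off can be seen in two toy serializers. `length_prefixed` must consume every item before it can write the length, whereas `terminated` writes each item as soon as it arrives and marks the end with a sentinel. Both functions, the 4-byte big-endian length field and the 0x00 terminator are hypothetical choices for illustration; the terminator approach also assumes that items never contain the sentinel byte.

```python
from typing import Iterable, Iterator

END = b"\x00"  # hypothetical terminator; assumes no item contains this byte

def length_prefixed(items: Iterable[bytes]) -> Iterator[bytes]:
    """Length of the full serialized sequence first, then the sequence itself."""
    payload = b"".join(items)              # every item is buffered before any output
    yield len(payload).to_bytes(4, "big")
    yield payload

def terminated(items: Iterable[bytes]) -> Iterator[bytes]:
    """Each item is written as soon as it arrives; a sentinel marks the end."""
    for item in items:
        yield item
    yield END

def producer(log: list[int], n: int) -> Iterator[bytes]:
    """Yields n one-byte items and records how many have been produced."""
    for i in range(n):
        log.append(i)
        yield bytes([65 + i])

seen: list[int] = []
first = next(terminated(producer(seen, 3)))
print(first, seen)   # b'A' [0]: output began after one item

seen = []
first = next(length_prefixed(producer(seen, 3)))
print(first, seen)   # b'\x00\x00\x00\x03' [0, 1, 2]: all input consumed first
```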
A format supports roundtripping if converting a file from XML to that format and back produces an output equivalent to the original input. A format supports roundtripping via XML if converting a file from that format to XML and back produces an output equivalent to the original input.
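The definition can be phrased as a property check over sample documents. The sketch below tests the strictest, byte-for-byte notion of equivalence and uses gzip as the encoder, since its round trip is exact; for EXI the comparison would normally be made at the level of the data being preserved (Infoset, lexical form and so on), which needs a codec not shown here. The helper name and the sample document are hypothetical.

```python
import gzip
from typing import Callable

def roundtrips(encode: Callable[[bytes], bytes],
               decode: Callable[[bytes], bytes],
               samples: list[bytes]) -> bool:
    """True if decode(encode(x)) reproduces every sample byte for byte."""
    return all(decode(encode(sample)) == sample for sample in samples)

samples = [b'<?xml version="1.0"?>\n<!-- note -->\n<a  x="1">text</a>\n']
print(roundtrips(gzip.compress, gzip.decompress, samples))  # True
```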
This property is measured by comparing the data that can be represented in XML with the data that can be represented in the EXI format:
A format has the property of generality if it is competitive with alternatives across a diverse range of XML documents, applications and use cases. The EXI testing framework covers the XBC use cases. The goal of this set of test cases is to include a range of different document sizes, different uses of schemas and various XML features (comments, whitespace, etc.).
The measurement of this property is defined by the XBC Working Group as a score over 20 items, 1 point per item:
A format supports schema extensions and deviations if it allows applications to encode XML Infosets that are not conformant to, or not defined in, the schema associated with the document.
This property refers to the ability to efficiently determine the version of a format from a document instance. It is desirable to access this information as early as possible, so a format that does not make this information available when processing starts should be considered inefficient as far as this property is concerned.
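For illustration, a processor can often identify the format, and sometimes its version, from the first few bytes. The sketch below checks for the gzip magic number (0x1f 0x8b), an XML declaration, and an EXI header. The EXI branch assumes the header layout of the EXI 1.0 specification (an optional "$EXI" cookie, the distinguishing bits 10, an options presence bit, a preview bit, then the version minus one in 4 bits) and handles only single-chunk version numbers. `sniff_format` is a hypothetical name; since the XML declaration is optional, XML without one is reported as unknown.

```python
import re

XML_DECL = re.compile(rb"<\?xml\s+version\s*=\s*[\"']([0-9.]+)[\"']")

def sniff_format(data: bytes) -> tuple[str, object]:
    """Guess the format, and the version where visible, from leading bytes."""
    if data[:2] == b"\x1f\x8b":
        return "gzip", None                       # gzip magic number
    match = XML_DECL.match(data)
    if match:
        return "xml", match.group(1).decode("ascii")
    if data[:4] == b"$EXI":
        data = data[4:]                           # optional EXI cookie (assumed layout)
    if data and data[0] >> 6 == 0b10:             # distinguishing bits "10"
        preview = bool(data[0] & 0x10)            # bit after the options presence bit
        version_minus_one = data[0] & 0x0F        # first 4-bit version chunk
        if version_minus_one != 0x0F:             # 15 would mean more chunks follow
            return "exi", (version_minus_one + 1, preview)
    return "unknown", None

print(sniff_format(b"\x80"))                       # ('exi', (1, False))
print(sniff_format(b'<?xml version="1.0"?><a/>'))  # ('xml', '1.0')
```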
This property refers to the definition in the format of one or more media types and/or encodings to be used when transferring documents. It is required for content negotiation, hence its importance for the Web.
The XBC Working Group proposed four degrees of support:
An XML format is self-contained if the only information required to reproduce the data model instance is (i) the representation of the data model instance and (ii) the specification of the XML format. When no external information is known by the receiver, the document needs to be self-contained.
This property refers to the speed at which a new format can be generated and/or consumed for processing. It covers serialization, parsing and data binding. The XBC Working Group proposed the following criteria for its measurement:
Each measurement should be recorded as a percentage faster than a standard text-based alternative for each type of operation.
The EXI testing framework implements parsing from SAX and serializationthrough SAX. Alternative APIs can be used.
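Because the framework works at the SAX level, any processor that emits SAX events can drive the same downstream consumer. The Python sketch below illustrates that layering with the standard library's `xml.sax` parser feeding `xml.sax.saxutils.XMLGenerator` (a SAX-driven serializer) and a small counting handler. Python's standard library has no EXI codec, so a text XML parser stands in for the decoder here; an EXI decoder that exposes a SAX interface could, in principle, take its place. `ElementCounter` and `reserialize` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

class ElementCounter(xml.sax.ContentHandler):
    """Counts start-element events, whatever produced them."""
    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

def reserialize(xml_bytes: bytes) -> str:
    """Parse to SAX events and serialize those events back to text XML."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()

doc = b"<order><item>pen</item></order>"
counter = ElementCounter()
xml.sax.parseString(doc, counter)
print(counter.count)     # 2
print(reserialize(doc))
```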
This property refers to the size of a processor implementing a new format with respect to that of a processor implementing XML. Ideally, the evaluation of this property would be done through a range of implementations in different languages on various platforms. Since a complete evaluation could not easily be achieved during the development of the format itself, due to the small number of implementations at this time, an alternative approach consists of considering the number and/or complexity of the mandatory features (which impacts the size of the code segment) and the amount of data that must be available to a processor in order to support the format (which impacts the size of the initialized data segment).
A format is more ubiquitous to the extent that it has been implemented on a greater range and number of devices and used in a wider variety of applications. There is a tradeoff between the format's implementation cost and complexity and its adequacy to the applications' needs.
A requirement on XML was "It shall be easy to write programs which process XML documents." A rough estimate of implementation cost can be made by considering how much time it takes a solitary programmer to implement sufficiently robust processing of the format (the so-called Desperate Perl Hacker measure).
The ability to reuse common APIs (including DOM, SAX, StAX) lowers the cost of implementing an alternate encoding of the XML Infoset. This property benefits from the Integration into the XML Stack property.
This property refers to the memory requirements of a processor implementing EXI with respect to those of a processor implementing XML. The measurement is a percentage of the dynamic memory costs for equivalent XML processing.
The measurement for this property is by inspection of the format specification, logical analysis, and empirical testing on test scenarios. The EXI testing framework measures the heap size in each test case.
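As a rough Python analogue of a heap-size measurement, the sketch below records peak traced allocation with `tracemalloc` and expresses one processor's peak as a percentage of another's. Since no EXI codec is available in the standard library, a DOM parse (`xml.dom.minidom`) and a SAX parse stand in for the two processors being compared. `tracemalloc` only sees allocations made through Python's allocators, so the numbers are indicative at best; `peak_bytes` and `relative_memory` are hypothetical helpers.

```python
import tracemalloc
import xml.dom.minidom
import xml.sax

def peak_bytes(fn, *args) -> int:
    """Peak memory traced by tracemalloc while fn(*args) runs."""
    tracemalloc.start()
    try:
        fn(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

def relative_memory(candidate_peak: int, baseline_peak: int) -> float:
    """Candidate peak as a percentage of the baseline peak."""
    return 100.0 * candidate_peak / baseline_peak

doc = b"<list>" + b"<item>v</item>" * 2000 + b"</list>"
dom_peak = peak_bytes(xml.dom.minidom.parseString, doc)
sax_peak = peak_bytes(xml.sax.parseString, doc, xml.sax.ContentHandler())
print(f"SAX peak: {relative_memory(sax_peak, dom_peak):.0f}% of DOM peak")
```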
A format must support the evolution of data models and must allow corresponding implementation of layered standards. Format version and extension points are related to this property. Evolution of XML and its data models could mean additional character encodings, additional element/attribute/body structure, or new predefined behavior similar to ID attributes. Integration of the EXI format into the XML Stack is also related to this property.
Criteria | XML | XML+gzip | EXI |
Can represent documents without a schema | 1 | 1 | 1 |
Can represent documents that include elements and attributes not defined in the associated schema (i.e., open content) | 1 | 1 | 1 |
Can represent any schema-invalid document | 1 | 1 | 1 |
Can leverage available schema information to improve compactness, processing speed, and resource utilization | 0 | 0 | 1 |
Can leverage available schema information to improve compactness, processing speed, and resource utilization even when documents contain elements and attributes not defined in the schema | 0 | 0 | 1 |
Can leverage available schema information to improve compactness, processing speed, and resource utilization for any schema-invalid document | 0 | 0 | 1 |
Can leverage document analysis to improve compactness | 0 | 1 | 1 |
Can suppress document analysis to increase speed and reduce resource utilization | 1 | 0 | 1 |
[optional] Can adjust document analysis to meet application performance and resource utilization criteria | 0 | 1 | 1 |
Can structure the binary XML stream to increase net compactness when off-the-shelf compression software is built into the communications infrastructure | 0 | 0 | 1 |
[optional] Supports high fidelity XML representations that preserve an exact copy of the original XML document, including all whitespace and formatting | 1 | 1 | 0 |
Supports reduced fidelity XML representations that preserve all data model items, but discard whitespace and formatting to improve compactness | 1 | 1 | 1 |
Supports reduced fidelity XML representations that preserve all information needed by a particular application, but discard specified information items that are not needed (e.g., comments and processing instructions) to improve compactness | 1 | 1 | 1 |
Supports reduced fidelity XML representations that preserve the logical structures and values of an XML document, but discard lexical and syntactic constructs to improve compactness | 1 | 1 | 1 |
Can consistently produce XML representations that are close to the same size or smaller than XML documents compressed using gzip | 0 | 1 | 1 |
Can consistently produce more compact XML representations than XML documents compressed using gzip | 0 | 0 | 1 |
Can consistently produce more compact XML representations than binary XML documents created with document analysis suppressed, then compressed using gzip | 0 | 0 | 1 |
Can consistently produce XML representations that are close to the same size or smaller than the equivalent ASN.1 PER encoding plus 20% | 0 | 0 | 1 |
Can consistently produce XML representations that are more compact than the equivalent ASN.1 PER encoding plus 20% | 0 | 0 | 1 |
[optional] Can consistently produce XML representations that are more compact than the equivalent ASN.1 PER encoding plus 20% compressed using gzip | 0 | 0 | 1 |
Total | 8 | 10 | 19 |
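The totals in the last row can be checked by summing each column. The sketch below transcribes the table's 20 rows as (XML, XML+gzip, EXI) tuples; the comments abbreviate the criteria.

```python
# (XML, XML+gzip, EXI) points for each of the 20 criteria, in table order.
SCORES = [
    (1, 1, 1),  # represent documents without a schema
    (1, 1, 1),  # represent open content
    (1, 1, 1),  # represent any schema-invalid document
    (0, 0, 1),  # leverage schema information
    (0, 0, 1),  # leverage schema information with open content
    (0, 0, 1),  # leverage schema information for schema-invalid documents
    (0, 1, 1),  # leverage document analysis
    (1, 0, 1),  # suppress document analysis
    (0, 1, 1),  # [optional] adjust document analysis
    (0, 0, 1),  # structure stream for off-the-shelf compression
    (1, 1, 0),  # [optional] high fidelity exact copy
    (1, 1, 1),  # discard whitespace and formatting
    (1, 1, 1),  # discard unneeded information items
    (1, 1, 1),  # discard lexical and syntactic constructs
    (0, 1, 1),  # close to gzip size or smaller
    (0, 0, 1),  # more compact than gzip
    (0, 0, 1),  # more compact than analysis-suppressed binary + gzip
    (0, 0, 1),  # close to ASN.1 PER + 20% or smaller
    (0, 0, 1),  # more compact than ASN.1 PER + 20%
    (0, 0, 1),  # [optional] more compact than gzipped ASN.1 PER + 20%
]

totals = [sum(column) for column in zip(*SCORES)]
print(totals)  # [8, 10, 19]
```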