Office Open XML |
---|

Filename extension | .docx, .docm |
---|---|
Internet media type | |
Developed by | Microsoft, Ecma, ISO/IEC |
Initial release | 2006 |
Type of format | Document file format |
Extended from | XML, DOC, WordProcessingML |
Standard | ECMA-376, ISO/IEC 29500 |
Website | ECMA-376, ISO/IEC 29500:2008 |

Filename extension | .pptx, .pptm |
---|---|
Internet media type | |
Developed by | Microsoft, Ecma, ISO/IEC |
Type of format | Presentation |
Extended from | XML, PPT |
Standard | ECMA-376, ISO/IEC 29500 |
Website | ECMA-376, ISO/IEC 29500:2008 |

Filename extension | .xlsx, .xlsm |
---|---|
Internet media type | |
Developed by | Microsoft, Ecma, ISO/IEC |
Type of format | Spreadsheet |
Extended from | XML, XLS, SpreadsheetML |
Standard | ECMA-376, ISO/IEC 29500 |
Website | ECMA-376, ISO/IEC 29500:2008 |
The Office Open XML file formats are a set of file formats that can be used to represent electronic office documents. There are formats for word processing documents, spreadsheets and presentations, as well as specific formats for material such as mathematical formulas, graphics and bibliographies.
The formats were developed by Microsoft and first appeared in Microsoft Office 2007. They were standardized between December 2006 and November 2008, first by the Ecma International consortium, where they became ECMA-376, and subsequently, after a contentious standardization process, by the ISO/IEC's Joint Technical Committee 1, where they became ISO/IEC 29500:2008.
Office Open XML documents are stored in Open Packaging Conventions (OPC) packages, which are ZIP files containing XML and other data files, along with a specification of the relationships between them.[2] Depending on the type of the document, the packages have different internal directory structures and names. An application will use the relationships files to locate individual sections (files), with each having accompanying metadata, in particular MIME metadata.
A basic package contains an XML file called [Content_Types].xml at the root, along with three directories: _rels, docProps, and a directory specific to the document type (for example, a .docx word processing package has a word directory). The word directory contains the document.xml file, which is the core content of the document.
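The layout described above can be inspected with any ZIP library. The following is a minimal sketch in Python using only the standard `zipfile` module. It builds a small stand-in package in memory (the part contents are placeholders, not a valid document) and lists its root-level entries. The helper names `build_stub_package` and `top_level_entries` are assumptions made for illustration.

```python
import io
import zipfile


def build_stub_package() -> bytes:
    """Create an in-memory ZIP that mimics the layout of a .docx package.

    The part contents are placeholders; a real package holds complete XML.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("docProps/core.xml", "<coreProperties/>")
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/_rels/document.xml.rels", "<Relationships/>")
    return buffer.getvalue()


def top_level_entries(package_bytes: bytes) -> list[str]:
    """Return the sorted names of the root-level files and directories."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted({name.split("/", 1)[0] for name in package.namelist()})


entries = top_level_entries(build_stub_package())
print(entries)
```

Pointing `zipfile.ZipFile` at a real .docx file instead of the in-memory stub should show the same root-level names, possibly alongside others.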
An example relationship file (word/_rels/document.xml.rels) is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="http://en.wikipedia.org/images/wiki-en.png" TargetMode="External"/>
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink" Target="http://www.wikipedia.org" TargetMode="External"/>
</Relationships>
As such, images referenced in the document can be found in the relationship file by looking for all relationships that are of type http://schemas.microsoft.com/office/2006/relationships/image. To change the image used, edit the relationship.
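As a rough illustration of that lookup, the sketch below parses the example relationship file with Python's standard `xml.etree.ElementTree` and collects the image relationships. Namespace and type URIs can differ between versions of the format, so the sketch compares only the element's local name and the final path segment of the type URI. That matching rule and the function name `image_targets` are assumptions, not something the standard prescribes.

```python
import xml.etree.ElementTree as ET

# The relationship file from the example above, passed as bytes so the
# encoding declaration is honoured by the parser.
RELS_XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.microsoft.com/package/2005/06/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2006/relationships/image" Target="http://en.wikipedia.org/images/wiki-en.png" TargetMode="External"/>
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2006/relationships/hyperlink" Target="http://www.wikipedia.org" TargetMode="External"/>
</Relationships>"""


def image_targets(rels_xml: bytes) -> dict[str, str]:
    """Map relationship Id to Target for every image relationship."""
    root = ET.fromstring(rels_xml)
    targets = {}
    for element in root:
        # Compare the local name only, so the namespace URI does not matter.
        if element.tag.rsplit("}", 1)[-1] != "Relationship":
            continue
        if element.get("Type", "").endswith("/image"):
            targets[element.get("Id")] = element.get("Target")
    return targets


images = image_targets(RELS_XML)
print(images)
```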
The following code shows an example of inline markup for a hyperlink:
<w:hyperlink r:id="rId2" w:history="1" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
In this example, the Uniform Resource Locator (URL) is in the Target attribute of the Relationship referenced through the relationship Id, "rId2" in this case. Linked images, templates, and other items are referenced in the same way.
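The indirection can be sketched in Python as follows. The paragraph fragment (including the `w:p`, `w:r` and `w:t` wrapper elements and the link text) and the `RELATIONSHIP_TARGETS` dictionary are illustrative assumptions standing in for a real document.xml and its parsed document.xml.rels. Only the `w:hyperlink` element and the two namespace URIs come from the example above.

```python
import xml.etree.ElementTree as ET

R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical fragment of document.xml containing one hyperlink.
DOCUMENT_FRAGMENT = f"""<w:p xmlns:w="{W_NS}" xmlns:r="{R_NS}">
  <w:hyperlink r:id="rId2" w:history="1">
    <w:r><w:t>Wikipedia</w:t></w:r>
  </w:hyperlink>
</w:p>"""

# Relationship Id -> Target, as it would be read from document.xml.rels.
RELATIONSHIP_TARGETS = {"rId2": "http://www.wikipedia.org"}


def hyperlink_urls(fragment: str, targets: dict[str, str]) -> list[str]:
    """Resolve each w:hyperlink's r:id attribute to its relationship Target."""
    root = ET.fromstring(fragment)
    urls = []
    for link in root.iter(f"{{{W_NS}}}hyperlink"):
        rel_id = link.get(f"{{{R_NS}}}id")
        if rel_id in targets:
            urls.append(targets[rel_id])
    return urls


urls = hyperlink_urls(DOCUMENT_FRAGMENT, RELATIONSHIP_TARGETS)
print(urls)
```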
Pictures can be embedded or linked using a tag:
<v:imagedata w:rel="rId1" o:title="example"/>
This is the reference to the image file. All references are managed via relationships. For example, document.xml has a relationship to the image. There is a _rels directory in the same directory as document.xml, and inside _rels is a file called document.xml.rels. This file contains a relationship definition with a type, an ID and a location. The ID is the one referenced in the XML document. The type is a reference schema definition for the media type, and the location is either an internal location within the ZIP package or an external location defined with a URL.
Office Open XML uses the Dublin Core Metadata Element Set and DCMI Metadata Terms to store document properties. Dublin Core is a standard for cross-domain information resource description and is defined in ISO 15836:2003.
An example document properties file (docProps/core.xml) that uses Dublin Core metadata is:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Office Open XML</dc:title>
  <dc:subject>File format and structure</dc:subject>
  <dc:creator>Wikipedia</dc:creator>
  <cp:keywords>Office Open XML, Metadata, Dublin Core</cp:keywords>
  <dc:description>Office Open XML uses ISO 15836:2003</dc:description>
  <cp:lastModifiedBy>Wikipedia</cp:lastModifiedBy>
  <cp:revision>1</cp:revision>
  <dcterms:created xsi:type="dcterms:W3CDTF">2008-06-19T20:00:00Z</dcterms:created>
  <dcterms:modified xsi:type="dcterms:W3CDTF">2008-06-19T20:42:00Z</dcterms:modified>
  <cp:category>Document file format</cp:category>
  <cp:contentStatus>Final</cp:contentStatus>
</cp:coreProperties>
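Reading such a properties file comes down to namespace-aware XML lookups. The sketch below uses Python's standard `xml.etree.ElementTree` on a shortened copy of the example. The function name `read_core_properties` and the choice of which three properties to extract are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for the lookups; the URIs are those in the example.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Shortened copy of the docProps/core.xml example above.
CORE_XML = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Office Open XML</dc:title>
  <dc:creator>Wikipedia</dc:creator>
  <dcterms:created xsi:type="dcterms:W3CDTF">2008-06-19T20:00:00Z</dcterms:created>
</cp:coreProperties>"""


def read_core_properties(core_xml: str) -> dict[str, str]:
    """Extract a few Dublin Core properties from a core.xml document."""
    root = ET.fromstring(core_xml)
    wanted = {
        "title": "dc:title",
        "creator": "dc:creator",
        "created": "dcterms:created",
    }
    result = {}
    for key, path in wanted.items():
        element = root.find(path, NAMESPACES)
        if element is not None and element.text:
            result[key] = element.text
    return result


properties = read_core_properties(CORE_XML)
print(properties)
```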
An Office Open XML file may contain several documents encoded in specialized markup languages corresponding to applications within the Microsoft Office product line. Office Open XML defines multiple vocabularies using 27 namespaces and 89 schema modules.
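The primary markup languages are WordprocessingML for word processing documents, SpreadsheetML for spreadsheets, and PresentationML for presentations.
Shared markup language materials include Office Math Markup Language (OMML) and DrawingML.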
In addition to the above markup languages, custom XML schemas can be used to extend Office Open XML.
Patrick Durusau, the editor of ODF, has viewed the markup style of OOXML and ODF as representing two sides of a debate: the "element side" and the "attribute side". He notes that OOXML represents "the element side of this approach" and singles out the KeepNext element as an example:
<w:pPr><w:keepNext/>…</w:pPr>
In contrast, he notes ODF would use the single attribute fo:keep-next, rather than an element, for the same semantic.[3]
The XML Schema of Office Open XML emphasizes reducing load time and improving parsing speed.[4] In a test with applications current in April 2007, XML-based office documents were slower to load than binary formats.[5] To enhance performance, Office Open XML uses very short element names for common elements, and spreadsheets save dates as index numbers (starting from 1900 or from 1904).[6] In order to be systematic and generic, Office Open XML typically uses separate child elements for data and metadata (element names ending in Pr for properties) rather than using multiple attributes, which allows structured properties. Office Open XML does not use mixed content but uses elements to put a series of text runs (element name r) into paragraphs (element name p). The result is terse and highly nested, in contrast to HTML, for example, which is fairly flat, designed for humans to write in text editors and more congenial for humans to read.
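The run-and-paragraph nesting can be illustrated with a short Python sketch that rebuilds the plain text of each paragraph. The sample body is invented for illustration. The `w:t` text element, the `w:rPr` run properties and the `w:b` bold flag are not named in the text above and are included here as assumptions about typical WordprocessingML markup.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical document body: two paragraphs (w:p) made of text runs (w:r).
# Properties live in child elements whose names end in "Pr".
BODY_XML = f"""<w:body xmlns:w="{W_NS}">
  <w:p>
    <w:pPr><w:keepNext/></w:pPr>
    <w:r><w:t xml:space="preserve">Hello, </w:t></w:r>
    <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
  </w:p>
  <w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
</w:body>"""


def paragraph_texts(body_xml: str) -> list[str]:
    """Concatenate the text of all runs in each paragraph."""
    root = ET.fromstring(body_xml)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        text_elements = paragraph.iter(f"{{{W_NS}}}t")
        texts.append("".join(t.text or "" for t in text_elements))
    return texts


paragraphs = paragraph_texts(BODY_XML)
print(paragraphs)
```

Formatting never appears inline with the text. It sits in sibling property elements, which is why the text can be recovered by visiting only the `w:t` elements.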
The naming of elements and attributes within the text has attracted some criticism. There are three different syntaxes in OOXML (ECMA-376) for specifying the color and alignment of text depending on whether the document is a text, spreadsheet, or presentation. Rob Weir (an IBM employee and co-chair of the OASIS OpenDocument Format TC) asks "What is the engineering justification for this horror?". He contrasts with OpenDocument: "ODF uses the W3C's XSL-FO vocabulary for text styling, and uses this vocabulary consistently".[7]
Some have argued the design is based too closely on Microsoft applications. In August 2007, the Linux Foundation published a blog post calling upon ISO National Bodies to vote "No, with comments" during the International Standardization of OOXML. It said, "OOXML is a direct port of a single vendor's binary document formats. It avoids the re-use of relevant existing international standards (e.g. several cryptographic algorithms, VML, etc.). There are literally hundreds of technical flaws that should be addressed before standardizing OOXML including continued use of binary code tied to platform specific features, propagating bugs in MS-Office into the standard, proprietary units, references to proprietary/confidential tags, unclear IP and patent rights, and much more".[8]
The version of the standard submitted to JTC 1 was 6546 pages long. The need for, and appropriateness of, such length has been questioned.[9][10] Google stated that "the ODF standard, which achieves the same goal, is only 867 pages".[9]
Word processing documents use the XML vocabulary known as WordprocessingML, normatively defined by the schema wml.xsd, which accompanies the standard. This vocabulary is defined in clause 11 of Part 1.[11]
Spreadsheet documents use the XML vocabulary known as SpreadsheetML, normatively defined by the schema sml.xsd, which accompanies the standard. This vocabulary is described in clause 12 of Part 1.[11]
Each worksheet in a spreadsheet is represented by an XML document with a root element named <worksheet>...</worksheet> in the http://schemas.openxmlformats.org/spreadsheetml/2006/main namespace.
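A reader can use the root element and namespace as a quick sanity check before walking the sheet. The Python sketch below does this for a tiny invented worksheet. The `sheetData`, `row`, `c` and `v` elements and the `r` and `t` attributes are not described in the text above and are included as assumptions about the usual cell markup. Cells that carry a type attribute are skipped to keep the example short.

```python
import xml.etree.ElementTree as ET

SML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"

# Hypothetical worksheet part with two numeric cells in the first row.
SHEET_XML = f"""<worksheet xmlns="{SML_NS}">
  <sheetData>
    <row r="1">
      <c r="A1"><v>42</v></c>
      <c r="B1"><v>3.5</v></c>
    </row>
  </sheetData>
</worksheet>"""


def numeric_cells(sheet_xml: str) -> dict[str, float]:
    """Return cell reference -> numeric value for untyped cells."""
    root = ET.fromstring(sheet_xml)
    if root.tag != f"{{{SML_NS}}}worksheet":
        raise ValueError("root element is not a SpreadsheetML worksheet")
    cells = {}
    for cell in root.iter(f"{{{SML_NS}}}c"):
        value = cell.find(f"{{{SML_NS}}}v")
        # Cells with a t attribute (for example strings) are ignored here.
        if value is not None and value.text is not None and cell.get("t") is None:
            cells[cell.get("r")] = float(value.text)
    return cells


cells = numeric_cells(SHEET_XML)
print(cells)
```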
The representation of date and time values in SpreadsheetML has attracted some criticism. ECMA-376 1st edition does not conform to ISO 8601:2004 "Representation of Dates and Times". It requires that implementations replicate a Lotus 1-2-3[12] bug that erroneously treats 1900 as a leap year. Products complying with ECMA-376 would be required to use the WEEKDAY() spreadsheet function, and therefore assign incorrect dates to some days of the week, and also miscalculate the number of days between certain dates.[13] ECMA-376 2nd edition (ISO/IEC 29500) allows the use of ISO 8601:2004 "Representation of Dates and Times" in addition to the Lotus 1-2-3 bug-compatible form.[14][15]
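The practical effect of the leap-year bug shows up when serial numbers are converted to calendar dates. The Python sketch below assumes the conventions commonly implemented by spreadsheet applications, which are not quoted from the standard here: in the 1900 system serial 1 is 1 January 1900 and serial 60 is the nonexistent 29 February 1900, and in the 1904 system serial 0 is 1 January 1904. The function name `serial_to_date` and its `date1904` flag are hypothetical.

```python
from datetime import date, timedelta


def serial_to_date(serial: int, date1904: bool = False) -> date:
    """Convert a whole-day spreadsheet serial number to a calendar date."""
    if date1904:
        # Assumed 1904 system: serial 0 is 1 January 1904, no phantom day.
        return date(1904, 1, 1) + timedelta(days=serial)
    if serial == 60:
        # The bug-compatible form counts 29 February 1900, which never existed.
        raise ValueError("serial 60 is the nonexistent 29 February 1900")
    # Serials after the phantom day are one day ahead of a true day count,
    # so a different epoch is used on each side of serial 60.
    epoch = date(1899, 12, 31) if serial < 60 else date(1899, 12, 30)
    return epoch + timedelta(days=serial)


print(serial_to_date(59))   # last real day before the phantom leap day
print(serial_to_date(61))   # first day after it
```

Serials 59 and 61 map to consecutive calendar days even though they differ by two. That gap is the source of the day-count and weekday errors mentioned above.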
Office Math Markup Language is a mathematical markup language which can be embedded in WordprocessingML, with intrinsic support for including word processing markup like revision markings,[16] footnotes, comments, images and elaborate formatting and styles.[17] The OMML format is different from the World Wide Web Consortium (W3C) MathML recommendation, which does not support those office features, but is partially compatible[18] through XSL Transformations; tools are provided with the office suite and are automatically used via clipboard transformations.[19]
The following Office MathML example defines the fraction π/2:
<m:oMathPara> <!-- mathematical block container used as a paragraph -->
  <m:oMath> <!-- mathematical inline formula -->
    <m:f> <!-- a fraction -->
      <m:num><m:r><m:t>π</m:t></m:r></m:num> <!-- numerator containing a single run of text -->
      <m:den><m:r><m:t>2</m:t></m:r></m:den> <!-- denominator containing a single run of text -->
    </m:f>
  </m:oMath>
</m:oMathPara>
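To show how the numerator and denominator are reached in this structure, the Python sketch below flattens each fraction to a `numerator/denominator` string. The namespace URI bound to the `m` prefix is not given in the example and is an assumption here, as are the helper names. Nested fractions are not handled.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the m: prefix (not stated in the example above).
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"

OMML = f"""<m:oMathPara xmlns:m="{M_NS}">
  <m:oMath>
    <m:f>
      <m:num><m:r><m:t>π</m:t></m:r></m:num>
      <m:den><m:r><m:t>2</m:t></m:r></m:den>
    </m:f>
  </m:oMath>
</m:oMathPara>"""


def run_text(element: ET.Element) -> str:
    """Join the text of every m:t element below the given element."""
    return "".join(t.text or "" for t in element.iter(f"{{{M_NS}}}t"))


def fractions_as_text(omml: str) -> list[str]:
    """Render each m:f fraction as 'numerator/denominator'."""
    root = ET.fromstring(omml)
    rendered = []
    for fraction in root.iter(f"{{{M_NS}}}f"):
        numerator = fraction.find(f"{{{M_NS}}}num")
        denominator = fraction.find(f"{{{M_NS}}}den")
        if numerator is not None and denominator is not None:
            rendered.append(f"{run_text(numerator)}/{run_text(denominator)}")
    return rendered


fractions = fractions_as_text(OMML)
print(fractions)
```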
Some have queried the need for Office MathML (OMML), instead advocating the use of MathML, a W3C recommendation for the "inclusion of mathematical expressions in Web pages" and "machine to machine communication".[20] Murray Sargent has answered some of these issues in a blog post, which details some of the philosophical differences between the two formats.[21]
DrawingML is the vector graphics markup language used in Office Open XML documents. Its major features are the graphics rendering of text elements, graphical vector-based shape elements, graphical tables and charts.
The DrawingML table is the third table model in Office Open XML (next to the table models in WordprocessingML and SpreadsheetML). It is optimized for graphical effects, and its main use is in presentations created with PresentationML markup. DrawingML contains graphics effects (like shadows and reflection) that can be used on the different graphical elements that are used in DrawingML. In DrawingML you can also create 3D effects, for instance to show the different graphical elements through a flexible camera viewpoint. It is possible to create separate DrawingML theme parts in an Office Open XML package. These themes can then be applied to graphical elements throughout the Office Open XML package.[22]
DrawingML is unrelated to other vector graphics formats such as SVG. These can be converted to DrawingML to include natively in an Office Open XML document. This is a different approach to that of the OpenDocument format, which uses a subset of SVG, and includes vector graphics as separate files.
A DrawingML graphic's dimensions are specified in English Metric Units (EMUs). The unit is so called because it allows an exact common representation of dimensions originally in either English or metric units. It is defined as 1/360,000 of a centimeter, so there are 914,400 EMUs per inch and 12,700 EMUs per point, which prevents round-off in calculations. Rick Jelliffe favors EMUs as a rational solution to a particular set of design criteria.[23]
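The arithmetic behind these figures can be checked with a few lines of Python. The constants come from the paragraph above. The helper names, and the assumption of 72 points per inch and 2.54 cm per inch, are added for illustration.

```python
EMU_PER_CM = 360_000
EMU_PER_INCH = 914_400
EMU_PER_POINT = 12_700


def cm_to_emu(centimeters: float) -> int:
    """Convert centimeters to whole EMUs."""
    return round(centimeters * EMU_PER_CM)


def inches_to_emu(inches: float) -> int:
    """Convert inches to whole EMUs."""
    return round(inches * EMU_PER_INCH)


def points_to_emu(points: float) -> int:
    """Convert typographic points (assumed 72 per inch) to whole EMUs."""
    return round(points * EMU_PER_POINT)


# One inch, 2.54 cm and 72 points should all land on the same integer.
one_inch = inches_to_emu(1)
print(one_inch, cm_to_emu(2.54), points_to_emu(72))
```

Because common fractions of an inch, a centimeter and a point all map to whole numbers of EMUs, sizes can be stored as integers and added without accumulating rounding error.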
Some have criticised the use of DrawingML (and the transitional-use-only VML) instead of the W3C recommendation SVG.[24] VML did not become a W3C recommendation.[25]
OOXML documents are typically composed of other resources in addition to XML content (graphics, video, etc.).
Some have criticised the choice of permitted formats for such resources: ECMA-376 1st edition specifies "Embedded Object Alternate Image Requests Types" and "Clipboard Format Types", which refer to Windows Metafiles or Enhanced Metafiles, both of which are proprietary formats that have hard-coded dependencies on Windows itself. The critics state the standard should instead have referenced the platform-neutral standard ISO/IEC 8632 "Computer Graphics Metafile".[13]
The Standard provides three mechanisms to allow foreign markup to be embedded within content for editing purposes:
These are defined in clause 17.5 of Part 1.
Versions of Office Open XML contain what are termed "compatibility settings". These are contained in Part 4 ("Markup Language Reference") of ECMA-376 1st Edition, but during standardization were moved to become a new part (also called Part 4) of ISO/IEC 29500:2008 ("Transitional Migration Features").
These settings (including elements with names such as autoSpaceLikeWord95, footnoteLayoutLikeWW8, lineWrapLikeWord6, mwSmallCaps, shapeLayoutLikeWW8, suppressTopSpacingWP, truncateFontHeightsLikeWP6, uiCompat97To2003, useWord2002TableStyleRules, useWord97LineBreakRules, wpJustification and wpSpaceWidth) were the focus of some controversy during the standardisation of DIS 29500.[26] As a result, new text was added to ISO/IEC 29500 to document them.[27]
An article in Free Software Magazine has criticized the markup used for these settings. Office Open XML uses distinctly named elements for each compatibility setting, each of which is declared in the schema. The repertoire of settings is thus limited: for new compatibility settings to be added, new elements may need to be declared, "potentially creating thousands of them, each having nothing to do with interoperability".[28]
The standard provides two types of extensibility mechanism: Markup Compatibility and Extensibility (MCE), defined in Part 3 (ISO/IEC 29500-3:2008), and Extension Lists, defined in clause 18.2.10 of Part 1.
If ISO were to give OOXML with its 6546 pages the same level of review that other standards have seen, it would take 18 years (6576 days for 6546 pages) to achieve comparable levels of review to the existing ODF standard (871 days for 867 pages) which achieves the same purpose and is thus a good comparison. Considering that OOXML has only received about 5.5% of the review that comparable standards have undergone, reports about inconsistencies, contradictions and missing information are hardly surprising
"... OOXML chose this route. Rather than create an application-definable configuration tag there is a unique tag for each setting ... Currently, the only application's unique settings that are catered for are the applications that the standard's authors have decided to include, ... For other applications to be added, further tag names would need to be defined in the specification, potentially creating thousands of them, each having nothing to do with interoperability ..".