Copyright © 2018-2020 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This section is non-normative.
This specification defines a file format and processing model for packaging into a single-file container the set of related resources and associated metadata that comprise a digital publication.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Publishing Working Group as a Working Group Note.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-publ-wg@w3.org (archives).
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy.
The group does not expect this document to become a W3C Recommendation.
This document is governed by the 1 March 2019 W3C Process Document.
This section is non-normative.
A digital publication Package is used:
To exchange in-progress packaged publications between different individuals and/or different organizations;
To provide finalized packaged publications from a publisher or conversion house to different distribution or sales channels; and
To deliver packaged publications to users or user agents.
This specification is based on proven technologies and allows digital publications to be packaged in an easy way, hence the term "lightweight" used in its name.
This section is non-normative.
This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [publishing-linking], including, in particular, user and user agent.
In addition, the following terminology is defined for use in this specification:
Codec content types: Content types that have intrinsic binary format qualities, such as video and audio media types which are already designed for optimum compression, or which provide optimized streaming capabilities.
Non-Codec content types: Content types that benefit from compression due to the nature of their internal data structure, such as file formats based on character strings (for example, HTML, CSS, etc.).
Package: Single-file container for the set of constituent resources and associated metadata that comprise a digital publication.
Primary Entry Page: Preferred starting resource for a digital publication, enabling in some cases the discovery of its Publication Manifest.
Digital publication: Set of constituent resources and associated metadata, organized together in a uniquely identifiable grouping.
Publication Manifest: [JSON-LD] representation of a digital publication as defined in [pub-manifest].
Root Directory: Base directory of the Package file system.
Only the first instance of a term in a section is linked to its definition.
This section is non-normative.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section is non-normative.
For packaging the set of constituent resources and associated metadata that comprise a digital publication, this specification uses the ZIP format as specified in ISO/IEC 21320-1:2015 ([ISO21320] and [zip]).
This section is non-normative.
When stored in a Package, resources with Non-Codec content types SHOULD be compressed and the Deflate compression algorithm MUST be used. This practice ensures that file entries stored in the Package have a smaller size.
Resources with Codec content types SHOULD be stored without compression. In such cases, compression would introduce unnecessary processing overhead at production time (especially with large resource files) and would impact audio/video playback performance at consumption time.
In some cases, the combination of compression with some encryption schemes might even hinder the ability of user agents to handle partial content requests (e.g. HTTP byte ranges), due to the technical difficulty of determining the length of the full resource ahead of media playback (e.g. HTTP Content-Length header).
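The compression guidance above can be sketched with Python's standard zipfile module. This is a minimal illustration, not part of the specification: the CODEC_EXTENSIONS set and the build_package helper are hypothetical, and the specification does not enumerate which file extensions count as Codec content types.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: extensions treated here as Codec content types (audio/video).
# The specification gives no such list; adjust it to the publication at hand.
CODEC_EXTENSIONS = {".mp3", ".m4a", ".mp4", ".ogg", ".webm"}


def compression_for(name: str) -> int:
    """Return ZIP_STORED for assumed Codec types, ZIP_DEFLATED otherwise."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in CODEC_EXTENSIONS:
        return zipfile.ZIP_STORED
    return zipfile.ZIP_DEFLATED


def build_package(resources: dict[str, bytes]) -> bytes:
    """Pack resources (path -> content) into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in resources.items():
            # Choose the compression method per entry rather than per archive.
            archive.writestr(name, data, compress_type=compression_for(name))
    return buffer.getvalue()
```

Selecting the method per entry keeps text resources small while leaving already-compressed media untouched, which is the trade-off this section describes.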
This section is non-normative.
A Package MUST include at least one of the following files in its Root Directory:
publication.json, which MUST be in the format defined for Publication Manifests.
index.html, which MUST follow the requirements of the Primary Entry Page of a digital publication.
The Root Directory is virtual in nature: a user agent might or might not generate a physical root directory for the contents of the Package if such contents are unpackaged.
The contents of both files MUST not be encrypted.
A Package MUST also include all resources within the bounds of the digital publication, i.e. the finite set of resources obtained from the union of resources listed in the default reading order and resource list of the Publication Manifest.
These resource files MAY be in any location descendant from the Root Directory, or in the Root Directory itself.
Contents within the Package MUST reference these resources via relative-URL strings [url].
The [zip] specification has few constraints on the characters allowed for file and directory names. When crafting such names, authors must be careful to use characters which allow a broad interoperability among operating systems.
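As an illustration of the root-file requirements in this section, the following sketch checks an opened archive for publication.json or index.html in the Root Directory and flags either file if its ZIP entry is marked as encrypted. The check_package_structure helper is hypothetical, and the check is deliberately partial: it does not verify that every resource listed in the Publication Manifest is present.

```python
import zipfile

# File names required by this section, expected at the archive root.
ROOT_FILES = ("publication.json", "index.html")


def check_package_structure(archive: zipfile.ZipFile) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    names = set(archive.namelist())
    present = [name for name in ROOT_FILES if name in names]
    if not present:
        problems.append(
            "neither publication.json nor index.html is in the Root Directory"
        )
    for name in present:
        # Bit 0 of the ZIP general-purpose flags marks an encrypted entry.
        if archive.getinfo(name).flag_bits & 0x1:
            problems.append(f"{name} is encrypted")
    return problems
```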
This section is non-normative.
If the Package contains a publication.json file located in the Root Directory, the Publication Manifest is obtained by opening and parsing this file.
Otherwise, if the Package contains an index.html file located in the Root Directory, the Publication Manifest is obtained through the following steps:
1. Let document be the result of the extraction of the index.html file from the Package.
2. If the media type of document is not text/html or application/xhtml+xml, terminate this algorithm.
3. Let manifest link be the first link element in tree order in document whose rel attribute contains the publication token.
4. If manifest link is null, terminate this algorithm.
5. If the manifest link href attribute's value is the empty string, terminate this algorithm.
6. If the href attribute value of manifest link has a non-null fragment identifying an identifier id in document:
   a. Locate the first script element in tree order, whose id attribute is equal to id and whose type attribute is equal to application/ld+json.
   b. If the result is null, terminate this algorithm.
   This branch is in use when the manifest is embedded in the primary entry page. The algorithm locates the script element and extracts the manifest itself.
7. Otherwise, retrieve from the Package the file referenced by the href attribute.
   This branch is in use when the manifest is in a separate file. It performs the standard operations to retrieve the manifest from the Package.
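The steps above can be sketched in Python using only the standard library. This is an illustrative approximation under stated assumptions, not a conforming implementation: the obtain_manifest function and _ManifestFinder class are hypothetical names, index.html is assumed to be UTF-8 encoded, the media type check is skipped because a ZIP entry carries no media type, and html.parser is a lenient tokenizer rather than a full HTML tree builder.

```python
import json
import posixpath
import zipfile
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag


class _ManifestFinder(HTMLParser):
    """Record the first publication link and the JSON-LD script contents."""

    def __init__(self):
        super().__init__()
        self.found_link = False
        self.href = ""        # href of the first <link rel="publication">
        self.scripts = {}     # id -> text of <script type="application/ld+json">
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and not self.found_link:
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if "publication" in rel_tokens:
                self.found_link = True
                self.href = attributes.get("href") or ""
        elif tag == "script" and attributes.get("type") == "application/ld+json":
            script_id = attributes.get("id")
            # Keep only the first script element for a given id.
            if script_id and script_id not in self.scripts:
                self._current_id = script_id
                self.scripts[script_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.scripts[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None


def obtain_manifest(archive: zipfile.ZipFile):
    """Return the parsed Publication Manifest, or None if it cannot be obtained."""
    names = set(archive.namelist())
    if "publication.json" in names:
        return json.loads(archive.read("publication.json"))
    if "index.html" not in names:
        return None
    finder = _ManifestFinder()
    finder.feed(archive.read("index.html").decode("utf-8"))
    finder.close()
    if not finder.found_link or not finder.href:
        return None
    path, fragment = urldefrag(finder.href)
    if fragment and not path:
        # Embedded branch: the manifest lives in a script element of index.html.
        text = finder.scripts.get(unquote(fragment))
        return json.loads(text) if text is not None else None
    # Separate-file branch: resolve the relative URL against the Root Directory.
    member = posixpath.normpath(unquote(path))
    return json.loads(archive.read(member)) if member in names else None
```

Returning None stands in for "terminate this algorithm"; a real user agent would likely report which step failed.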
If both index.html and publication.json are present in the Package, then the Primary Entry Page SHOULD contain a reference to the publication.json file, following the rules defined in this section.
Here, the Publication Manifest, located in the Root Directory, is simply referenced from the Primary Entry Page via an [HTML] link element.
<link href="publication.json" rel="publication" />
An example of a Primary Entry Page embedding a Publication Manifest is given in [audiobooks].
application/lpf+zip Media Type
This section is non-normative.
This appendix registers the media type application/lpf+zip for the Lightweight Packaging Format (LPF).
Lightweight Packaging Format (or LPF) is a container technology based on the [zip] archive format, used for packaging into a single-file container the set of related resources and associated metadata that comprise a digital publication. LPF and its related standards are maintained and defined by the World Wide Web Consortium (W3C).
application
lpf+zip
N/A
N/A
LPF files are binary files in ZIP format.
Security considerations that apply to application/zip also apply to LPF files. For instance, an archive could contain compressed files that expand to fill all available disk space on a hard drive. In consequence, user agents that read LPF files should rigorously check the size and validity of data retrieved.
In addition, because of the various content types that can be embedded in LPF files, application/lpf+zip may describe content that poses security issues, e.g. malicious executable content deliberately included in the package. However, only in cases where the user agent recognizes and processes the additional content, or where further processing of that content is dispatched to other user agents, would security issues potentially arise. In such cases, matters of security would fall outside the domain of this registration document.
Any format based on LPF, if using content encryption, MUST choose a different MIME media type and file extension than those defined in this specification.
This media type registration is for the Lightweight Packaging Format (LPF), as described by the Lightweight Packaging Format (LPF) specification located at https://www.w3.org/TR/lpf.
This media type is intended to be used by multiple interoperable applications for the distribution and consumption of ebooks, audiobooks, digital visual narratives and other types of digital publications.
0: PK 0x03 0x04
LPF files are most often identified with the extension .lpf.
ZIP
None
Ivan Herman (ivan@w3.org)
COMMON
The published specification is a work product of the World Wide Web Consortium (W3C)’s Publishing Working Group. The W3C has change control over this specification.
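The magic number, the file extension, and the security consideration about archives that expand to fill the disk, all listed in this registration, can be illustrated with a short Python sketch. The function names and the max_total_bytes budget are hypothetical. The size check relies on the uncompressed sizes declared in the archive, which a malicious file can misstate, so it is a first filter rather than a complete defense.

```python
import zipfile

# Magic number from the registration: "PK" 0x03 0x04 at offset 0.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_lpf(data: bytes, filename: str = "") -> bool:
    """Heuristic: ZIP signature at offset 0 and, if a name is given, an .lpf extension."""
    has_magic = data[:4] == ZIP_MAGIC
    has_extension = not filename or filename.lower().endswith(".lpf")
    return has_magic and has_extension


def exceeds_size_budget(archive: zipfile.ZipFile, max_total_bytes: int) -> bool:
    """True if the declared uncompressed sizes add up to more than the budget."""
    total = sum(info.file_size for info in archive.infolist())
    return total > max_total_bytes
```

A user agent could run both checks before extracting anything, and still cap the number of bytes actually written during extraction.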
This section is non-normative.
The editor would like to thank the members of the Publishing Working Group for their contributions to this specification: