This is a draft. We need your help to make it better. Get involved, learn more, and help us improve the Open Definition:
The Open Definition has three key requirements for a work to be open: an open license, open access, and an open format. This page focuses on the open format:
Section 1.3 Open Format from the Open Definition version 2.0 states:
The work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights. Specifically, data should be machine-readable, available in bulk, and provided in an open format (i.e., a format with a freely available published specification which places no restrictions, monetary or otherwise, upon its use) or, at the very least, can be processed with at least one free/libre/open-source software tool.
We want to create open knowledge. To help achieve this, the Open Format requires:
The work must be provided in a convenient format so that it is easy to reuse. This requires the work to be published in a format that maximises knowledge sharing and reuse. The format may vary for different media types (e.g. image, text, tabular or geographic data).
The work must be provided in a modifiable format so it can be reused in different ways, in part or in whole. What is an appropriate modifiable form?
(contribution needed)
Data is machine-readable if it is in a format that can be easily read, written, parsed and displayed by a computer.
For example:
As another example:
The appropriate machine-readable format may vary by data type. For example, a machine-readable format for geographic data may differ from one for tabular data.
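As a minimal sketch of the two points above, the Python snippet below reads data from two machine-readable formats using only the standard library: CSV for tabular data and GeoJSON for geographic data. The budget figures, place name and coordinates are invented for illustration, and the function names `total_budget` and `point_coordinates` are hypothetical helpers, not part of any Open Definition tooling.

```python
import csv
import io
import json

# Hypothetical tabular data (figures invented for illustration).
BUDGET_CSV = "department,amount\nHealth,1200\nEducation,950\n"

# Hypothetical geographic data in GeoJSON (name and coordinates invented).
PLACE_GEOJSON = """
{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [151.2, -33.9]},
  "properties": {"name": "Example Town Hall"}
}
"""


def total_budget(csv_text):
    """Sum the 'amount' column of a CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["amount"]) for row in reader)


def point_coordinates(geojson_text):
    """Return (longitude, latitude) of a GeoJSON Point feature."""
    feature = json.loads(geojson_text)
    lon, lat = feature["geometry"]["coordinates"]
    return lon, lat


if __name__ == "__main__":
    print(total_budget(BUDGET_CSV))
    print(point_coordinates(PLACE_GEOJSON))
```

Each format is parsed in a few lines because its structure is explicit. The same figures embedded in a scanned image or a page layout would first need to be extracted by hand or with error-prone tooling.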
This section is based on an archived OKFN glossary and an Open Definition discussion about a harmonised Open Format definition.
See also https://www.data.gov/developers/blog/primer-machine-readability-online-documents-and-data
The work should be provided in bulk. This means that the data can be easily downloaded as a whole in one request.
This requirement complements the Access section of the Open Definition and together they require that:
However, your data can still be open if you publish it as many individual files, although it could be argued that you are then not publishing it in a convenient form.
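One way a publisher might offer many individual files in bulk is to bundle them into a single archive that can be fetched in one request. The sketch below does this in memory with Python's standard `zipfile` module. The file names and contents are invented, `bundle_files` and `unbundle_files` are hypothetical helper names, and ZIP is chosen only because the standard library supports it, not because the Open Definition recommends it.

```python
import io
import zipfile


def bundle_files(files):
    """Pack a mapping of {filename: bytes} into one ZIP archive, returned as bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in sorted(files.items()):
            archive.writestr(name, content)
    return buffer.getvalue()


def unbundle_files(archive_bytes):
    """Unpack a ZIP archive (bytes) back into a {filename: bytes} mapping."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


if __name__ == "__main__":
    # Hypothetical per-year files that would otherwise be downloaded one by one.
    yearly_files = {
        "budget-2013.csv": b"department,amount\nHealth,1100\n",
        "budget-2014.csv": b"department,amount\nHealth,1200\n",
    }
    bulk_download = bundle_files(yearly_files)
    print(sorted(unbundle_files(bulk_download)))
```

Unpacking the archive returns every file byte for byte, so the bulk download carries the same content as the individual files.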
An Open Data Format is a format with “a freely available published specification which places no restrictions, monetary or otherwise, upon its use”.
A freely available published specification allows:
If an open data format has no restrictions, monetary or otherwise, upon its use, then:
An Open Format is a format that “can be processed with at least one free/libre/open-source software tool”.
If there is a free software tool available to process the data, then the data can be re-used without the need to implement software.
The Open Format definitions for data above allow tabular data (e.g. a National Budget) to be published as a PDF (an open format according to the definition). However, this is not a convenient form for this type of data, and “the work must be provided in a convenient and modifiable form such that there are no unnecessary technological obstacles to the performance of the licensed rights”.
So, is a PDF of a National Budget open?
Tim Berners-Lee’s 5 Star Open Data scheme says it is open and awards it 1 star.
Based on the definition of machine-readable above, a PDF of a National Budget isn’t open. (contribution needed - is this the intent?)
It could be argued that, because the second sentence of the Open Format section begins with “Specifically, data should…”, non-data works may, but are not required to:
(Contribution needed - Is it OK that these requirements are all optional for non-data works?)
Some words in the Open Definition have special meaning and are shown in bold or italics. Their meaning is defined below:
Work - denotes the item or piece of knowledge being transferred. Examples of a work include, but are not limited to: data, music, art, images, video, literary compositions, web pages and software.
Must, Required, or Shall - an absolute requirement (RFC 2119).
Must Not or Shall Not - an absolute prohibition (RFC 2119).
Should or Recommended - there may be valid reasons to ignore this requirement but the full implications must be understood and carefully weighed before choosing a different course (RFC 2119).
Should Not or Not Recommended - there may be valid reasons when the particular behaviour is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behaviour described with this label (RFC 2119).
May or Optional - the item is truly optional (RFC 2119).
These improvement ideas mainly come from conversations on the discussion list.
A specification that describes an open format should be:
The work should be published in a lossless and uncompressed open format so all the original detail is retained.
Tools like the Open Data Census and Open Data Certificates test to see if data is published using an open format. This improvement idea seeks to harmonise the definition of the Open Format for data so that tools could all point to the Open Definition, in the same way the tools currently point to it for a definition of an open licence and a list of conformant licenses.
Do you have another resource you’d like added below? Make the list better.
These links provide some alternate perspectives on open formats:
These lists of open formats have not been assessed as being conformant with the Open Definition: