Wikidata in the Linked Open Data Cloud, as at August 2020. Databases are indicated as circles (with Wikidata indicated as ‘WD’), with grey lines linking databases in the network if their data is aligned. Generated from https://lod-cloud.net/datasets.
DBpedia as the most interlinked LOD dataset and crystallization point of the Linked Open Data Cloud since 2008. Image from 2021, generated from https://lod-cloud.net.
In computing, linked data is structured data which is interlinked with other data so it becomes more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the Internet to become a global database.[1]
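To make "interlinked so it becomes more useful through semantic queries" concrete, here is a minimal sketch in plain Python (no RDF library assumed) of two small datasets joined by a URI-to-URI link, and a query that follows that link. `owl:sameAs` and `rdfs:label` are real W3C vocabulary terms; the `example.org` URIs, the `country` property and the helper names `match` and `country_of` are hypothetical, and a real deployment would use an RDF store and SPARQL rather than list scans.

```python
from typing import Optional

# A triple is (subject, predicate, object). Subjects and predicates are URIs;
# objects are URIs or plain string literals.
Triple = tuple[str, str, str]

# Real W3C vocabulary terms.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespaces standing in for two independently published datasets.
EX_A = "http://example.org/dataset-a/"
EX_B = "http://example.org/dataset-b/"
COUNTRY = EX_B + "country"  # hypothetical property defined by dataset B

dataset_a: list[Triple] = [
    (EX_A + "Berlin", LABEL, "Berlin"),
    # An RDF link: dataset A asserts that its resource denotes the same
    # thing as a resource published by dataset B.
    (EX_A + "Berlin", SAME_AS, EX_B + "city/42"),
]
dataset_b: list[Triple] = [
    (EX_B + "city/42", COUNTRY, EX_B + "country/7"),
]


def match(graph: list[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def country_of(label: str) -> list[str]:
    """Find a resource by label in A, follow owl:sameAs into B, read its country."""
    graph = dataset_a + dataset_b  # here, merging is just concatenating triples
    results = []
    for subject, _, _ in match(graph, p=LABEL, o=label):
        for _, _, alias in match(graph, s=subject, p=SAME_AS):
            for _, _, country in match(graph, s=alias, p=COUNTRY):
                results.append(country)
    return results


print(country_of("Berlin"))  # ['http://example.org/dataset-b/country/7']
```

Neither dataset alone can answer the query: the label lives in A and the country in B, and only the link between them lets a program combine the two.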
Thus, we can identify the following components as essential to a global Linked Data system as envisioned, and to any actual Linked Data subset within it:
Linked open data are linked data that are open data.[6][7][8] Tim Berners-Lee gives the clearest definition of linked open data as differentiated from linked data.
Linked Open Data (LOD) is Linked Data which is released under an open license, which does not impede its reuse for free.
In 2010, Tim Berners-Lee suggested a 5-star scheme for grading the quality of open data on the web, for which the highest ranking is Linked Open Data:[11]
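In its commonly cited form, the scheme is cumulative: one star for data available on the web under an open license, two for machine-readable structured data, three for a non-proprietary format such as CSV, four for using W3C open standards (RDF and SPARQL) so that things are identified by URIs, and five for linking the data to other people's data to provide context. The Python sketch below encodes that cumulative rule; the `Dataset` fields and the `stars` function are hypothetical names chosen for illustration, not part of any published tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist of properties a published dataset may have."""
    open_license: bool      # on the web under an open license
    machine_readable: bool  # structured data, not e.g. a scanned image of a table
    non_proprietary: bool   # open format, e.g. CSV rather than a vendor format
    uses_rdf_uris: bool     # things identified by URIs, described with RDF/SPARQL
    linked: bool            # links to other people's data for context


def stars(d: Dataset) -> int:
    """Count stars cumulatively: a level only counts if every lower level is met."""
    levels = [d.open_license, d.machine_readable, d.non_proprietary,
              d.uses_rdf_uris, d.linked]
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count


csv_dump = Dataset(True, True, True, False, False)
print(stars(csv_dump))  # 3
```

Because the levels are cumulative, a richly linked RDF dataset without an open license still scores zero stars; only a dataset meeting all five levels is Linked Open Data in this scheme.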
The term "linked open data" has been in use since at least February 2007, when the "Linking Open Data" mailing list[12] was created.[13] The mailing list was initially hosted by the SIMILE project[14] at the Massachusetts Institute of Technology.
The above diagram shows which Linking Open Data datasets are connected, as of August 2014. It was produced by the Linked Open Data Cloud project, which was started in 2007. Some sets may include copyrighted data which is freely available.[15]
The same diagram as above, but for February 2017, showing the growth in just two and a half years.
The LOD cloud in December 2024.
The goal of the W3C Semantic Web Education and Outreach group's Linking Open Data community project is to extend the Web with a data commons by publishing various open datasets as RDF on the Web and by setting RDF links between data items from different data sources. In October 2007, the datasets consisted of over two billion RDF triples, which were interlinked by over two million RDF links.[16][17] By September 2011 this had grown to 31 billion RDF triples, interlinked by around 504 million RDF links. A detailed statistical breakdown was published in 2014.[18]
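The distinction between RDF triples and RDF links in these figures can be illustrated with a small sketch. The Python below (standard library only) writes triples in the W3C N-Triples syntax and counts as an RDF link any triple whose subject and object IRIs belong to different sources. That counting rule, the `example.org` base IRIs and the helper names are assumptions for illustration; the LOD cloud statistics may have been computed differently.

```python
from typing import Optional

# Real W3C vocabulary terms used as predicates.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical base IRIs standing in for two different data sources.
SOURCES = ["http://example.org/source-a/", "http://example.org/source-b/"]


def iri(value: str) -> str:
    """Write an absolute IRI as an N-Triples term."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Write a plain string literal, escaping backslash, quote and line breaks."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize one triple per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def source_of(term: str) -> Optional[str]:
    """Return the source base IRI a term belongs to, or None (e.g. for literals)."""
    if not term.startswith("<"):
        return None
    return next((base for base in SOURCES if term[1:].startswith(base)), None)


def is_rdf_link(triple: tuple[str, str, str]) -> bool:
    """Hypothetical rule: subject and object are IRIs from two different sources."""
    s, _, o = triple
    src_s, src_o = source_of(s), source_of(o)
    return src_s is not None and src_o is not None and src_s != src_o


item = iri(SOURCES[0] + "item/1")
thing = iri(SOURCES[1] + "thing/99")
triples = [
    (item, iri(LABEL), literal("Example item")),  # ordinary data triple
    (item, iri(SAME_AS), thing),                  # RDF link across sources
]
print(to_ntriples(triples))
print(sum(is_rdf_link(t) for t in triples))  # 1
```

Every RDF link is itself a triple, which is why the link counts are always a subset of the triple counts.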
There are a number of European Union projects involving linked data. These include the linked open data around the clock (LATC) project,[19] the AKN4EU project for machine-readable legislative data,[20] the PlanetData project,[21] the DaPaaS (Data-and-Platform-as-a-Service) project,[22] and the Linked Open Data 2 (LOD2) project.[23][24][25] Data linking is one of the main goals of the EU Open Data Portal, which makes thousands of datasets available for anyone to reuse and link.
Ontologies are formal descriptions of data structures. Some of the better known ontologies are:
FOAF – an ontology describing persons, their properties and relationships
UMBEL – a lightweight reference structure of 20,000 subject concept classes and their relationships derived from OpenCyc, which can act as binding classes to external data; also has links to 1.5 million named entities from DBpedia and YAGO
DBpedia – a dataset containing data extracted from Wikipedia; it contains about 3.4 million concepts described by 1 billion triples, including abstracts in 11 different languages
GeoNames – provides RDF descriptions of more than 7,500,000 geographical features worldwide
Wikidata – a collaboratively-created linked dataset that acts as central storage for the structured data of its Wikimedia Foundation sibling projects
Global Research Identifier Database (GRID) – an international database of 89,506 institutions engaged in academic research, with 14,401 relationships. GRID models two types of relationships: a parent-child relationship that defines a subordinate association, and a related relationship that describes other associations[26][27]
KnowWhereGraph[28] – an integrated knowledge graph of 12 billion triples spanning 30 data layers at the intersection between humans and their environment, built using Semantic Web and Linked Data technologies.[29]
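As an illustration of how an ontology such as FOAF is used in practice, the sketch below writes a FOAF description of a person in the Turtle syntax. `foaf:Person`, `foaf:name` and `foaf:knows` are terms from the FOAF vocabulary; the person URIs and the helper names `turtle_string` and `foaf_person` are hypothetical, and a real application would normally use an RDF library rather than hand-building strings.

```python
FOAF = "http://xmlns.com/foaf/0.1/"  # the FOAF namespace


def turtle_string(value: str) -> str:
    """Quote a short Turtle string literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def foaf_person(uri: str, name: str, knows: list[str]) -> str:
    """Describe one foaf:Person in Turtle, with foaf:name and foaf:knows links."""
    pairs = [("a", "foaf:Person"), ("foaf:name", turtle_string(name))]
    pairs += [("foaf:knows", f"<{other}>") for other in knows]
    # Turtle lets statements about the same subject share it, separated by ';'.
    body = " ;\n    ".join(f"{pred} {obj}" for pred, obj in pairs)
    return f"@prefix foaf: <{FOAF}> .\n\n<{uri}> {body} ."


# Hypothetical person URIs.
doc = foaf_person("http://example.org/people/alice", "Alice",
                  ["http://example.org/people/bob"])
print(doc)
```

Because `foaf:knows` points at another URI rather than at a name string, a consumer can dereference that URI to find more data about the second person, possibly published by someone else.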
Clickable diagrams that show the individual datasets and their relationships within the DBpedia-spawned LOD cloud (like those shown in the figures) are available.[30][31]