| Data Commons | |
|---|---|
| Founder | Ramanathan V. Guha |
| Key people | Prem Ramaswami (Head of Data Commons) |
| Parent | Google |
| URL | datacommons |
| Launched | May 2018 |
Data Commons is an open-source platform[1] created by Google[2] that provides an open knowledge graph, combining economic, scientific and other public datasets into a unified view.[3] Ramanathan V. Guha, a creator of web standards including RDF,[4] RSS, and Schema.org,[5] founded the project,[6] which is now led by Prem Ramaswami.[7]
The Data Commons website was launched in May 2018 with an initial dataset consisting of fact-checking data published in Schema.org "ClaimReview" format by several fact checkers from the International Fact-Checking Network.[8][9] Google has worked with partners such as the United Nations (UN) to populate the repository,[2] which also includes data from the United States Census, the World Bank, the US Bureau of Labor Statistics,[10] Wikipedia, the National Oceanic and Atmospheric Administration and the Federal Bureau of Investigation.[11]
The service expanded during 2019 to include an RDF-style knowledge graph populated from a number of largely statistical open datasets, and it was announced to a wider audience that year.[12] In 2020 the service improved its coverage of non-US datasets while also expanding its bioinformatics and coronavirus data.[13] In 2023 the service relaunched with a natural-language front end powered by a large language model.[2] It also launched as the back end of a UN data portal for Sustainable Development Goals data.[14]
Data Commons places more emphasis on statistical data than is common for linked data and knowledge graph initiatives. It includes geographical, demographic, weather and real estate data alongside other categories,[3] describing states, Congressional districts, and cities in the United States as well as biological specimens, power plants, and elements of the human genome via the Encyclopedia of DNA Elements (ENCODE) project.[11] It represents data as semantic triples, each of which can have its own provenance.[3] It centers on the entity-oriented integration of statistical observations from a variety of public datasets. Although it supports a subset of the W3C SPARQL query language,[15] its APIs[16] also include tools oriented towards data science, statistics and data visualization, such as a Pandas dataframe interface.
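The idea of triples that each carry their own provenance can be illustrated with a small in-memory sketch. This is not the Data Commons implementation or API: the `Triple` class, the `match` helper and the provenance labels are hypothetical, and the entity identifiers merely imitate the style of Data Commons IDs. The `match` function performs single-pattern matching with wildcards, roughly the simplest building block of a SPARQL-style query.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Triple:
    """One subject-predicate-object statement plus the source it came from."""
    subject: str
    predicate: str
    obj: str
    provenance: str  # hypothetical label for the importing dataset


def match(triples: Iterable[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching one pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Illustrative graph: statements about one entity from two hypothetical imports.
graph = [
    Triple("geoId/06", "typeOf", "State", "census-import"),
    Triple("geoId/06", "name", "California", "census-import"),
    Triple("geoId/06", "containedInPlace", "country/USA", "geo-import"),
]

# List everything known about geoId/06 along with the source of each statement.
for t in match(graph, subject="geoId/06"):
    print(t.predicate, t.obj, "from", t.provenance)
```

Keeping provenance on each triple, rather than on a whole dataset, lets statements about the same entity from different sources coexist and remain individually traceable.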
Data Commons is integrative: rather than providing a hosting platform where different datasets sit side by side, it attempts to consolidate much of the information those datasets provide into a single data graph.
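The difference between hosting datasets and integrating them can be sketched as follows. Two hypothetical source tables use different column names and different labels for the same places. Integration maps both onto shared entity identifiers so that their observations end up under a single node. The mapping table, dataset contents and variable names are all assumptions made for illustration, and the values are made up. Data Commons' actual import pipeline is not shown here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical mapping from each source's local place labels to one shared entity ID.
ENTITY_IDS = {"California": "geoId/06", "CA": "geoId/06", "Texas": "geoId/48"}

# Two hypothetical datasets with different schemas; all values are made up.
dataset_a = [
    {"state_name": "California", "year": 2020, "unemployment_pct": 5.0},
    {"state_name": "Texas", "year": 2020, "unemployment_pct": 4.0},
]
dataset_b = [
    {"abbr": "CA", "yr": 2020, "median_income": 1000},
]


def integrate() -> Dict[str, List[dict]]:
    """Fold both datasets into one entity-keyed collection of observations."""
    graph: Dict[str, List[dict]] = defaultdict(list)
    for row in dataset_a:
        graph[ENTITY_IDS[row["state_name"]]].append(
            {"variable": "unemployment_pct", "date": row["year"],
             "value": row["unemployment_pct"], "source": "dataset_a"})
    for row in dataset_b:
        graph[ENTITY_IDS[row["abbr"]]].append(
            {"variable": "median_income", "date": row["yr"],
             "value": row["median_income"], "source": "dataset_b"})
    return dict(graph)


merged = integrate()
# geoId/06 now carries observations from both sources under one node.
print(sorted(obs["variable"] for obs in merged["geoId/06"]))
```

The difficult part in practice is the reconciliation step, represented here by a hand-written `ENTITY_IDS` table. Once records resolve to the same entity, merging them is straightforward.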
Data Commons is built on a graph data model. The graph can be accessed through a browser interface and several APIs,[3][11] and it is expanded by loading data (typically CSV and MCF-based templates).[17] It can also be queried in natural language through Google Search.[18] The data vocabulary used to define the datacommons.org graph is based on Schema.org.[3] In particular, the terms StatisticalPopulation[19] and Observation[20] were proposed to Schema.org to support Data Commons-like use cases.[21]
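Loading node descriptions into the graph can be illustrated with a simplified, MCF-like text format. In this format, each node begins with a `Node:` line, the following `property: value` lines describe it, and a blank line separates nodes. The parser below is a hypothetical sketch and not the project's importer. It does not split comma-separated multi-values or resolve prefixes, and it treats every value as a plain string. The sample nodes link an Observation to a StatisticalPopulation. Their property names (`populationType`, `location`, `observedNode`, `measuredProperty`, `measuredValue`) are illustrative assumptions and may not match the vocabulary Data Commons actually uses. The measured value is also made up.

```python
from typing import List, Optional, Tuple

SAMPLE_MCF = """
Node: dcid:geoId/06
typeOf: dcs:State
name: "California"

Node: pop_ca
typeOf: dcs:StatisticalPopulation
populationType: dcs:Person
location: dcid:geoId/06

Node: obs_ca_2020
typeOf: dcs:Observation
observedNode: pop_ca
measuredProperty: dcs:count
observationDate: "2020"
measuredValue: 100
"""


def parse_mcf(text: str) -> List[Tuple[str, str, str]]:
    """Turn simplified MCF-like text into (subject, property, value) triples."""
    triples: List[Tuple[str, str, str]] = []
    subject: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            subject = None  # a blank line ends the current node
            continue
        prop, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"expected 'property: value', got {line!r}")
        prop, value = prop.strip(), value.strip()
        if prop == "Node":
            subject = value
            continue
        if subject is None:
            raise ValueError(f"property {prop!r} appears outside a node")
        triples.append((subject, prop, value.strip('"')))  # drop string quotes
    return triples


triples = parse_mcf(SAMPLE_MCF)
# Find observations attached to the population node.
observations = [s for s, p, o in triples if p == "observedNode" and o == "pop_ca"]
print(observations)
```

Splitting on the first colon only preserves values such as `dcid:geoId/06`, which themselves contain a colon.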
Software from the project is available on GitHub under the Apache 2.0 license.[22]