W3C

Data Catalog Vocabulary (DCAT)

W3C Working Draft 05 April 2012

This version:
http://www.w3.org/TR/2012/WD-vocab-dcat-20120405/
Latest published version:
http://www.w3.org/TR/vocab-dcat/
Latest editor's draft:
http://dvcs.w3.org/hg/gld/raw-file/default/dcat/index.html
Editors:
Fadi Maali, DERI, NUIG
John Erickson, Tetherless World Constellation (RPI)
Phil Archer, W3C/ERCIM

Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.


Abstract

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications to consume metadata from multiple catalogs easily. DCAT further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This publication signals the move of DCAT onto the W3C Recommendation Track. DCAT was first developed and published by DERI and has seen widespread adoption at the time of this publication. The original vocabulary was further developed by the eGov Interest Group, before being brought onto the Recommendation Track by the Government Linked Data (GLD) Working Group.

This document was published by the Government Linked Data (GLD) Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-gld-comments@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

This section is non-normative.

This document does not prescribe any particular method of deploying data expressed in DCAT. DCAT is applicable in many contexts including RDF accessible via SPARQL endpoints, embedded in HTML pages as RDFa, or serialized as e.g. RDF/XML or Turtle. The examples in this document use Turtle simply because of Turtle's readability.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

3. Namespaces

The namespace for DCAT is http://www.w3.org/ns/dcat#. However, it should be noted that DCAT makes extensive use of terms from other vocabularies, in particular Dublin Core. DCAT itself defines a minimal set of classes and properties of its own. A full set of namespaces and prefixes used in this document is shown in the table below.

Prefix    Namespace
dcat      http://www.w3.org/ns/dcat#
dcterms   http://purl.org/dc/terms/
foaf      http://xmlns.com/foaf/0.1/
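
In Turtle, the prefixes in the table correspond to declarations along the lines of the sketch below. The rdf, rdfs, skos and xsd prefixes are not listed in the table but are used by the examples in this document; their namespace URIs are the standard ones. The empty prefix bound to http://example.org/ is purely an assumption made for illustration.

```turtle
# Prefixes listed in the namespace table
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

# Other well-known vocabularies that the examples draw on
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

# Assumption: an illustrative namespace for the publisher's own resources
@prefix :        <http://example.org/> .
```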

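4. Vocabulary Overview

This section is non-normative.

DCAT is an RDF vocabulary well-suited to representing government data catalogs such as Data.gov and data.gov.uk. DCAT defines three main classes: dcat:Catalog represents the catalog; dcat:Dataset represents a dataset in a catalog; and dcat:Distribution represents a specific available form of a dataset, such as a downloadable file, a feed or a web service.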

Another important class in DCAT is dcat:CatalogRecord which describes a dataset entry in the catalog. Notice that while dcat:Dataset represents the dataset itself, dcat:CatalogRecord represents the record that describes a dataset in the catalog. The use of the CatalogRecord is considered optional. It is used to capture provenance information about dataset entries in a catalog. If this distinction is not necessary then CatalogRecord can be safely ignored.

@@Fadi: I will update the figure once the properties are listed

UML model of DCAT classes and properties

Example

This example provides a quick overview of how DCAT might be used to represent a government catalog and its datasets.

@@@ TODO.jse: Illustrate more clearly how these "examples" might appear "in the wild." Esp. the use of skos:Concept (etc) in RDFa... @@@

First, the catalog description:

   :catalog
       a dcat:Catalog ;
       dcterms:title "Imaginary catalog" ;
       rdfs:label "Imaginary catalog" ;
       foaf:homepage <http://example.org/catalog> ;
       dcterms:publisher :transparency-office ;
       dcat:themeTaxonomy :themes ;
       dcterms:language "en"^^xsd:language ;
       dcat:dataset :dataset-001 ;
       .

The publisher of the catalog is identified by :transparency-office. A further description of the publisher can be provided as in the following example:

   :transparency-office
       a foaf:Agent ;
       rdfs:label "Transparency Office" ;
       .

The catalog classifies its datasets according to a set of domains identified by :themes. SKOS can be used to describe the domains used:

   :themes
       a skos:ConceptScheme ;
       skos:prefLabel "A set of domains to classify documents" ;
       .

The catalog connects to each of its datasets via dcat:dataset. The catalog description above mentions an example dataset, :dataset-001. A possible description of it using DCAT is shown below:

   :dataset-001
       a dcat:Dataset ;
       dcterms:title "Imaginary dataset" ;
       dcat:keyword "accountability", "transparency", "payments" ;
       dcat:theme :accountability ;
       dcterms:issued "2011-12-05"^^xsd:date ;
       dcterms:modified "2011-12-05"^^xsd:date ;
       dcterms:publisher :finance-ministry ;
       dcterms:accrualPeriodicity "every six months" ;
       dcterms:language "en"^^xsd:language ;
       dcat:distribution :dataset-001-csv ;
       .

Notice that this dataset is classified under the domain identified by :accountability. This concept should belong to the set of domains identified by :themes, which was used to describe the catalog's domains. An example SKOS description:

   :accountability
       a skos:Concept ;
       skos:inScheme :themes ;
       skos:prefLabel "Accountability" ;
       .

The dataset can be downloaded in CSV format via the distribution represented by :dataset-001-csv.

   :dataset-001-csv
       a dcat:Distribution ;
       dcat:accessURL <http://www.example.org/files/001.csv> ;
       dcterms:title "CSV distribution of imaginary dataset 001" ;
       dcterms:format [
           a dcterms:IMT ;
           rdf:value "text/csv" ;
           rdfs:label "CSV"
       ]
       .

Finally, if the catalog publisher decides to keep metadata describing its records (i.e. the records containing metadata describing the datasets), dcat:CatalogRecord can be used. For example, :dataset-001 was issued on 2011-12-05. However, its description in Imaginary Catalog was added on 2011-12-11. This can be represented in DCAT as follows:

   :record-001
       a dcat:CatalogRecord ;
       foaf:primaryTopic :dataset-001 ;
       dcterms:issued "2011-12-11"^^xsd:date ;
       .

   :catalog
       dcat:record :record-001 ;
       .

Encoding of property values

5. Class: Catalog

A data catalog is a curated collection of metadata about datasets.

RDF Class: dcat:Catalog
Usage note: Typically, a web-based data catalog is represented as a single instance of this class.
See also: Catalog record, Dataset

5.1 Property: homepage

The homepage of the catalog.

RDF Property: foaf:homepage
Range: foaf:Document
Usage note: foaf:homepage is an inverse functional property (IFP), which means that its value should be unique and precisely identify the catalog. This allows the various descriptions of the catalog to be merged ("smushed") when different URIs are used for it.
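
As a minimal sketch of what the IFP enables, the two descriptions below use different URIs for what is intended to be the same catalog. Both subject URIs, the aggregator domain and the literal values are hypothetical.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

# Hypothetical description published by the catalog itself
<http://example.org/catalog#this>
    a dcat:Catalog ;
    dcterms:title "Imaginary catalog" ;
    foaf:homepage <http://example.org/catalog> .

# Hypothetical description published by a third-party aggregator
<http://aggregator.example.com/catalogs/42>
    a dcat:Catalog ;
    dcterms:description "Open data published by the Transparency Office." ;
    foaf:homepage <http://example.org/catalog> .
```

Because both resources share the same foaf:homepage value, an application that applies IFP reasoning can treat them as one catalog and combine the title with the description.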

5.2 Property: publisher

The entity responsible for making the catalog available online.

RDF Property: dcterms:publisher
Range: foaf:Agent

5.3 Property: spatial/geographic coverage

The geographical area covered by the catalog.

RDF Property: dcterms:spatial
Range: dcterms:Location (Spatial region or named place)

5.4 Property: themes

The knowledge organization system (KOS) used to classify the catalog's datasets.

RDF Property: dcat:themeTaxonomy
Range: skos:ConceptScheme

5.5 Property: title

A name given to the catalog.

RDF Property: dcterms:title
Range: rdfs:Literal

5.6 Property: description

Free-text account of the catalog.

RDF Property: dcterms:description
Range: rdfs:Literal

5.7 Property: language

The language of the catalog. This refers to the language used in the textual metadata describing titles, descriptions, etc. of the datasets in the catalog.

RDF Property: dcterms:language
Range: rdfs:Literal, a string representing the language code as described in http://www.ietf.org/rfc/rfc3066.txt
Usage note: Multiple values can be used. The publisher might also choose to describe the language on the dataset level (see dataset language).
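
The usage note can be illustrated with a short sketch: a catalog whose metadata is in two languages, plus one dataset that states its own language. The resource names and the language choices are hypothetical; the xsd:language datatype follows the overview example.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix :        <http://example.org/> .

# Catalog metadata is available in English and French
:catalog
    a dcat:Catalog ;
    dcterms:language "en"^^xsd:language , "fr"^^xsd:language ;
    dcat:dataset :dataset-004 .

# This dataset declares its own language at the dataset level
:dataset-004
    a dcat:Dataset ;
    dcterms:language "de"^^xsd:language .
```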

5.8 Property: license

This describes the license under which the catalog itself, not its datasets, can be used or reused. Even if the license of the catalog applies to all of its datasets, it should be replicated on each dataset.

RDF Property: dcterms:license
Range: dcterms:LicenseDocument
Usage note: To allow automatic analysis of datasets, it is important to use canonical identifiers for well-known licenses, see @@@void guide@@@ for a list.
See also: dataset license
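
A minimal sketch of this guidance follows. It assumes the publisher has chosen the Creative Commons CC0 dedication, identified by its canonical URI; the resource names are hypothetical.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <http://example.org/> .

# License of the catalog metadata itself
:catalog
    a dcat:Catalog ;
    dcterms:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
    dcat:dataset :dataset-001 .

# The same license applies to the dataset, so it is repeated there
:dataset-001
    a dcat:Dataset ;
    dcterms:license <http://creativecommons.org/publicdomain/zero/1.0/> .
```

Repeating the statement on the dataset lets a consumer that only sees the dataset description still find its license.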

5.9 Property: dataset

A dataset that is part of the catalog.

RDF Property: dcat:dataset
Range: dcat:Dataset

5.10 Property: catalog record

A catalog record that is part of the catalog.

RDF Property: dcat:record
Range: dcat:CatalogRecord

6. Class: Catalog record

A record in a data catalog, describing a single dataset.

RDF Class: dcat:CatalogRecord
Usage note: This class is optional and not all catalogs will use it. It exists for catalogs where a distinction is made between metadata about a dataset and metadata about the dataset's entry in the catalog. For example, the publication date property of the dataset reflects the date when the information was originally made available by the publishing agency, while the publication date of the catalog record is the date when the dataset was added to the catalog. In cases where both dates differ, or where only the latter is known, the publication date should only be specified for the catalog record.
See also: Dataset

In web-based catalogs, the URL of the catalog page should be used as the URI for the catalog record if it is a permalink.

If named graphs are used, all RDF triples describing the catalog record, the dataset, and its distributions should go into a graph named with the catalog record's URI.

6.1 Property: listing date

The date of listing the corresponding dataset in the catalog.

See Issue-3

RDF Property: dcterms:issued
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified.
Usage note: This indicates the date of listing the dataset in the catalog and not the publication date of the dataset itself.
See also: dataset release date

6.2 Property: update/modification date

Most recent date on which the catalog entry was changed, updated or modified.

See Issue-3

RDF Property: dcterms:modified
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified.
Usage note: This indicates the date of last change of a catalog entry, i.e. the catalog metadata description of the dataset, and not the date of the dataset itself.
See also: dataset modification date
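
The sketch below shows how the dates of a catalog entry can sit alongside a date of the dataset it describes. All resource names and dates are hypothetical. The listing date assumes that only the month is known, so the day is written as 01.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix :        <http://example.org/> .

# Dates of the catalog entry: listed some time in November 2011
# (day unknown, hence 01) and last edited on 2012-01-15
:record-002
    a dcat:CatalogRecord ;
    foaf:primaryTopic :dataset-002 ;
    dcterms:issued   "2011-11-01"^^xsd:date ;
    dcterms:modified "2012-01-15"^^xsd:date .

# Date on which the data itself last changed
:dataset-002
    a dcat:Dataset ;
    dcterms:modified "2011-06-30"^^xsd:date .
```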

6.3 Property: dataset

Links the catalog record to the dcat:Dataset resource described in the record.

See Issue-4

RDF Property: foaf:primaryTopic
Range: dcat:Dataset

7. Class: Dataset

A collection of data, published or curated by a single source, and available for access or download in one or more formats.

RDF Class: dcat:Dataset
Usage note: This class represents the actual dataset as published by the dataset publisher. In cases where a distinction between the actual dataset and its entry in the catalog is necessary (because metadata such as modification date and maintainer might differ), the catalog record class can be used for the latter.
See also: Catalog record

7.1 Property: update/modification date

Most recent date on which the dataset was changed, updated or modified.

See Issue-3

RDF Property: dcterms:modified
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified.
Usage note: The value of this property indicates a change to the actual dataset, not a change to the catalog record. An absent value may indicate that the dataset has never changed after its initial publication, or that the date of last modification is not known, or that the dataset is continuously updated. Example: 2010-05-07
See also: frequency

7.2 Property: title

A name given to the dataset.

RDF Property: dcterms:title
Range: rdfs:Literal

7.3 Property: description

Free-text account of the dataset.

RDF Property: dcterms:description
Range: rdfs:Literal

7.4 Property: publisher

An entity responsible for making the dataset available.

See Issue-4

RDF Property: dcterms:publisher
Range: foaf:Organization
See also: Class: Organization/Person

7.5 Property: release date

Date of formal issuance (e.g., publication) of the dataset.

See Issue-3

RDF Property: dcterms:issued
Range: rdfs:Literal typed as xsd:date. The date is encoded as a literal in "YYYY-MM-DD" form (ISO 8601 Date and Time Formats). If the specific day or month is not known, then 01 should be specified.
Usage note: This property should be set using the first known date of issuance. Example: 2010-05-07

7.6 Property: frequency

The frequency with which the dataset is published.

See Issue-5

RDF Property: dcterms:accrualPeriodicity
Range: dcterms:Frequency (A rate at which something recurs)
Usage note: @@@ Values should come from a controlled vocabulary, i.e. a predefined set of resources. It could use values like placetime.com intervals.
Domain: dcterms:Collection, so a dataset described with this property is also a dcterms:Collection.

7.7 Property: identifier

A unique identifier of the dataset.

RDF Property: dcterms:identifier
Range: rdfs:Literal
Usage note: The identifier might be used to coin a permanent and unique URI for the dataset, but representing it explicitly is still useful.
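
As an illustration, the sketch below uses a made-up identifier, FIN-2011-001, both inside the dataset URI and as an explicit literal. The URI pattern is an assumption, not something DCAT prescribes.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# The identifier is embedded in the URI and also stated explicitly,
# so consumers do not have to parse the URI to recover it
<http://example.org/dataset/FIN-2011-001>
    a dcat:Dataset ;
    dcterms:identifier "FIN-2011-001" .
```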

7.8 Property: spatial/geographical coverage

Spatial coverage of the dataset.

RDF Property: dcterms:spatial
Range: dcterms:Location (A spatial region or named place)
Usage note: @@@ controlled vocabulary. geonames???

7.9 Property: temporal coverage

@@@ The temporal period that the dataset covers.

RDF Property: dcterms:temporal
Range: dcterms:PeriodOfTime (An interval of time that is named or defined by its start and end dates)
Usage note: @@@ controlled vocabulary. http://www.placetime.com/ might be an option???

7.10 Property: language

The language of the dataset.

RDF Property: dcterms:language
Range: rdfs:Literal, a string representing the language code as described in http://www.ietf.org/rfc/rfc3066.txt
Usage note: This overrides the value of the catalog language in case of conflict.

7.11 Property: license

The license under which the dataset is published and can be reused.

RDF Property: dcterms:license
Range: dcterms:LicenseDocument
Usage note: See Section 2.4 of Describing Linked Datasets with the VoID Vocabulary.

7.12 Property: granularity

Describes the level of granularity of the data. @@@ elaborate more@@@

RDF Property: dcat:granularity
Range: rdfs:Resource
Usage note: This is usually geographical or temporal but can also be another dimension, e.g. Person can be used to describe the granularity of a dataset about average income.

A set of sample values used in data.gov: country, county, longitude/latitude, region, plane, airport.
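
Since the range is rdfs:Resource, a granularity value is a resource rather than a plain string. The sketch below models the sample value "county" as a hypothetical resource in the publisher's own namespace; DCAT does not define such a term.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/> .

# Hypothetical resource naming the level of detail
:county
    rdfs:label "county" .

# The dataset reports its figures per county
:dataset-003
    a dcat:Dataset ;
    dcat:granularity :county .
```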

7.13 Property: data dictionary

Provides a description that helps in understanding the data. This usually consists of a table explaining the meaning of columns, the interpretation of values and the acronyms/codes used in the data.

RDF Property: dcat:dataDictionary
Range: foaf:Document
Usage note: @@@ Review @@@ It is rarely provided in current catalogs and is not used consistently. However, when it is provided, it is either a link to some document or is embedded in a document packaged together with the dataset. It is recommended to represent it as a resource having the URL of the online document as its URI. Statistical datasets, as a particular yet common case, can have a more structured description, and the work in progress on SDMX+RDF can be utilized here.

7.14 Property: data quality

Describes the quality of data.

RDF Property: dcat:dataQuality
Range: rdfs:Literal
Usage note: @@@Review@@@ This is a very general property and it is not clear how exactly it will be used, as catalogs currently either do not use it or use it with meaningless values. Catalogs are expected to define more specific sub-properties to describe quality characteristics, e.g. statistical data usually have a lot to describe about the quality of sampling, collection mode, non-response adjustment…

7.15 Property: theme/category

The main category of the dataset. A dataset can have multiple themes.

RDF Property: dcat:theme
Range: skos:Concept
Usage note: The set of skos:Concepts used to categorize the datasets is organized in a skos:ConceptScheme describing all the categories and their relations in the catalog.

7.16 Property: keyword/tag

A keyword or tag describing the dataset.

RDF Property: dcat:keyword
Range: rdfs:Literal

7.17 Property: related documents

A related document such as technical documentation, agency program page, citation, etc.

RDF Property: dcterms:references
Range: Has no defined range
Usage note: The value is the URI of the related document.
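
A minimal sketch with two related documents follows; both document URLs and the dataset name are hypothetical.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <http://example.org/> .

# Each value is the URI of a related document, not a literal
:dataset-001
    a dcat:Dataset ;
    dcterms:references
        <http://www.example.org/docs/001-technical-notes.pdf> ,
        <http://www.example.org/programs/spending-transparency> .
```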

7.18 Property: dataset distribution

Connects a dataset to its available distributions.

RDF Property: dcat:distribution
Range: dcat:Distribution

7.19 Class: Distribution

Represents a specific available form of a dataset. Each dataset might be available in different forms; these forms might represent different formats of the dataset, different endpoints, and so on. Examples of distributions include a downloadable CSV file, an XLS file representing the dataset, an RSS feed…

RDF Class: dcat:Distribution
Range: Has no defined range
Usage note: This represents a general availability of a dataset; it implies no information about the actual access method of the data, i.e. whether it is a direct download, API, or a splash page. Use one of its subclasses when the particular access method is known.
See also: Download, WebService, Feed

7.20 Property: access/download

Points to the location of a distribution. This can be a direct download link, a link to an HTML page containing a link to the actual data, a feed, a web service, etc. The semantics are determined by its domain (Distribution, Feed, WebService, Download).

If the value is always a URI, shouldn't the range be rdfs:Resource?

RDF Property: dcat:accessURL
Range: rdfs:Literal
Usage note: The value is a URL.
See also: Download, WebService, Feed

7.21 Property: size

The size of a distribution.

RDF Property: dcat:size
Range: rdfs:Resource
Usage note: dcat:size is usually used with a blank node described using rdfs:label and dcat:bytes.
Example:
   :distribution dcat:size [ dcat:bytes "5120"^^xsd:integer ; rdfs:label "5KB" ] .

7.22 Property: format

The file format of the distribution.

RDF Property: dcterms:format
Range: dcterms:MediaTypeOrExtent
Usage note: A MIME type is used as the value. A list of MIME type URLs can be found at IANA. However, ESRI Shape files have no specific MIME type (a Shape distribution is actually a collection of files); this is still an open question. @@@
Example:
   :distribution dcterms:format [
       a dcterms:IMT ;
       rdf:value "text/csv" ;
       rdfs:label "CSV"
   ] .

8. Class: Download

Represents a downloadable distribution of a dataset.

RDF Class: dcat:Download
Usage note: The accessURL of a Download distribution should be a direct download link (one-click access to the data file).
See also: Distribution, WebService, Feed
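
A sketch of a Download follows, with hypothetical resource names and file URL. As in the overview example, the access URL is written as an IRI, although the range of dcat:accessURL is currently given as rdfs:Literal.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix :        <http://example.org/> .

:dataset-001
    a dcat:Dataset ;
    dcat:distribution :dataset-001-csv .

# Typing the distribution as dcat:Download signals that the URL
# returns the data file itself rather than a landing page
:dataset-001-csv
    a dcat:Download ;
    dcat:accessURL <http://www.example.org/files/001.csv> ;
    dcterms:title "CSV download of imaginary dataset 001" .
```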

9. Class: WebService

Represents a web service that enables access to the data of a dataset.

RDF Class: dcat:WebService
Range: dcterms:MediaTypeOrExtent
Usage note: Describe the web service using accessURL, format and size. Further description of the web service is outside the scope of DCAT.
See also: Distribution, Download, Feed
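
The sketch below describes a web service distribution using only an access URL and a format, in the style of the format example in section 7.22. The endpoint URL, the resource name and the choice of JSON are assumptions.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :        <http://example.org/> .

# Hypothetical API endpoint that serves the dataset as JSON
:dataset-001-api
    a dcat:WebService ;
    dcat:accessURL <http://www.example.org/api/datasets/001> ;
    dcterms:format [
        a dcterms:IMT ;
        rdf:value "application/json" ;
        rdfs:label "JSON"
    ] .
```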

10. Class: Feed

Represents availability of a dataset as a feed.

RDF Class: dcat:Feed
Usage note: Describe the feed using accessURL, format and size. Further description of the feed is outside the scope of DCAT.
See also: Distribution, Download, WebService
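
A feed can be described along the same lines. The sketch assumes an Atom feed at a hypothetical URL.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :        <http://example.org/> .

# Hypothetical Atom feed that publishes updates to the dataset
:dataset-001-feed
    a dcat:Feed ;
    dcat:accessURL <http://www.example.org/feeds/001.atom> ;
    dcterms:format [
        a dcterms:IMT ;
        rdf:value "application/atom+xml" ;
        rdfs:label "Atom"
    ] .
```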

11. Class: Category and category scheme

The knowledge organization system (KOS) used to represent themes/categories of datasets in the catalog.

RDF Classes: skos:ConceptScheme, skos:Concept
Usage note: It's necessary to use either skos:inScheme or skos:topConceptOf on every skos:Concept, otherwise it's not clear which concept scheme they belong to.
See also: catalog themes, dataset theme
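
The sketch below shows both ways of tying a concept to its scheme. The scheme, the two concepts and their labels are hypothetical.

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix :     <http://example.org/> .

:themes
    a skos:ConceptScheme ;
    skos:prefLabel "Catalog themes" .

# A top-level category: skos:topConceptOf ties it to the scheme
:economy
    a skos:Concept ;
    skos:topConceptOf :themes ;
    skos:prefLabel "Economy" .

# A narrower category: skos:inScheme ties it to the scheme
:public-spending
    a skos:Concept ;
    skos:inScheme :themes ;
    skos:broader :economy ;
    skos:prefLabel "Public spending" .
```

In SKOS, skos:topConceptOf is a sub-property of skos:inScheme, so either statement is enough to identify the scheme a concept belongs to.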

12. Class: Organization/Person

RDF Classes: foaf:Person for people and foaf:Organization for government agencies or other entities.
Usage note: FOAF provides sufficient properties to describe these entities.
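
As a sketch, the descriptions below use a few common FOAF properties (foaf:name, foaf:homepage, foaf:mbox). The organization, the person, the catalog and the dataset are all hypothetical.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix :        <http://example.org/> .

# A government agency that publishes a dataset
:finance-ministry
    a foaf:Organization ;
    foaf:name "Ministry of Finance" ;
    foaf:homepage <http://finance.example.org/> .

:dataset-001
    a dcat:Dataset ;
    dcterms:publisher :finance-ministry .

# An individual who publishes a small catalog
:jane-doe
    a foaf:Person ;
    foaf:name "Jane Doe" ;
    foaf:mbox <mailto:jane.doe@example.org> .

:community-catalog
    a dcat:Catalog ;
    dcterms:publisher :jane-doe .
```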

13. Extending the DCAT vocabulary

A. References

A.1 Normative references

[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt

A.2 Informative references

No informative references.
