W3C

PROV-DM: The PROV Data Model

W3C Recommendation 30 April 2013

This version:
http://www.w3.org/TR/2013/REC-prov-dm-20130430/
Latest published version:
http://www.w3.org/TR/prov-dm/
Implementation report:
http://www.w3.org/TR/2013/NOTE-prov-implementations-20130430/
Previous version:
http://www.w3.org/TR/2013/PR-prov-dm-20130312/ (color-coded diff)
Editors:
Luc Moreau, University of Southampton
Paolo Missier, Newcastle University
Contributors:
Khalid Belhajjame, University of Manchester
Reza B'Far, Oracle Corporation
James Cheney, University of Edinburgh
Sam Coppens, iMinds - Ghent University
Stephen Cresswell, legislation.gov.uk
Yolanda Gil, Invited Expert
Paul Groth, VU University of Amsterdam
Graham Klyne, University of Oxford
Timothy Lebo, Rensselaer Polytechnic Institute
Jamie McCusker, Rensselaer Polytechnic Institute
Simon Miles, Invited Expert
James Myers, Rensselaer Polytechnic Institute
Satya Sahoo, Case Western Reserve University
Curt Tilmes, National Aeronautics and Space Administration

Please refer to the errata for this document, which may include some normative corrections.

The English version of this specification is the only normative version. Non-normative translations may also be available.

Copyright © 2011-2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.


Abstract

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. PROV-DM is the conceptual data model that forms a basis for the W3C provenance (PROV) family of specifications. PROV-DM distinguishes core structures, forming the essence of provenance information, from extended structures catering for more specific uses of provenance. PROV-DM is organized in six components, respectively dealing with: (1) entities and activities, and the time at which they were created, used, or ended; (2) derivations of entities from entities; (3) agents bearing responsibility for entities that were generated and activities that happened; (4) a notion of bundle, a mechanism to support provenance of provenance; (5) properties to link entities that refer to the same thing; and (6) collections forming a logical structure for its members.

This document introduces the provenance concepts found in PROV and defines PROV-DM types and relations. The PROV data model is domain-agnostic, but is equipped with extensibility points allowing domain-specific information to be included.

Two further documents complete the specification of PROV-DM. First, a companion document specifies the set of constraints that provenance should follow. Second, a separate document describes a provenance notation for expressing instances of provenance for human consumption; this notation is used in examples in this document.

The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

PROV Family of Documents

This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. These documents are listed below. Please consult the [PROV-OVERVIEW] for a guide to reading these documents.

Endorsed By W3C

This document has been reviewed by W3C Members, by software developers, and by other W3C groups and interested parties, and is endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

Please Send Comments

This document was published by the Provenance Working Group as a Recommendation. If you wish to make comments regarding this document, please send them to public-prov-comments@w3.org (subscribe, archives). All comments are welcome.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

For the purpose of this specification, provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, where users find information that is often contradictory or questionable, provenance can help those users to make trust judgements.

We present the PROV data model, PROV-DM, a generic data model for provenance that allows domain- and application-specific representations of provenance to be translated into such a data model and interchanged between systems. Thus, heterogeneous systems can export their native provenance into such a core data model, and applications that need to make sense of provenance can then import it, process it, and reason over it.

The PROV data model distinguishes core structures from extended structures: core structures form the essence of provenance information, and are commonly found in various domain-specific vocabularies that deal with provenance or similar kinds of information [Mappings]. Extended structures enhance and refine core structures with more expressive capabilities to cater for more advanced uses of provenance. The PROV data model, comprising both core and extended structures, is a domain-agnostic model, but with clear extensibility points allowing further domain-specific and application-specific extensions to be defined.

The PROV data model has a modular design and is structured according to six components covering various facets of provenance:

This specification presents the concepts of the PROV data model, and provenance types and relations, without specific concern for how they are applied. With these, it becomes possible to write useful provenance, and publish or embed it alongside the data it relates to.

However, if something about which provenance is expressed is subject to change, then it is challenging to express its provenance precisely (e.g. the data from which a daily weather report is derived changes from day to day). This is addressed in a companion specification [PROV-CONSTRAINTS] by proposing formal constraints on the way that provenance is related to the things it describes (such as the use of attributes, temporal information and specialization of entities), and additional conclusions that are valid to infer.

1.1 Compliance with this Document

For the purpose of compliance, the normative sections of this document are Section 1.1, Section 1.3, Section 5, and Appendix A.

1.2 Structure of this Document

This section is non-normative.

Section 2 provides an overview of the PROV data model, distinguishing a core set of types and relations, commonly found in provenance, from extended structures catering for more specific uses. It also introduces a modular organization of the data model in components.

Section 3 overviews the Provenance Notation used to illustrate examples of provenance.

Section 4 illustrates how the PROV data model can be used to express the provenance of a report published on the Web.

Section 5 provides the definitions of PROV concepts, structured according to six components.

Section 6 summarizes PROV-DM extensibility points.

Section 7 introduces the idea that constraints can be applied to the PROV data model to validate provenance; these are covered in the companion specification [PROV-CONSTRAINTS].

1.3 Notational Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

Examples throughout this document use the PROV-N Provenance Notation, briefly introduced in Section 3 and specified fully in a separate document [PROV-N].

1.4 Namespaces

This section is non-normative.

The following namespace prefixes are used throughout this document.

Table 1: Prefix and Namespaces used in this specification
prefix | namespace IRI | definition
prov | http://www.w3.org/ns/prov# | The PROV namespace (see Section 5.7.4)
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema Namespace [XMLSCHEMA11-2]
rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | The RDF namespace [RDF-CONCEPTS]
(others) | (various) | All other namespace prefixes are used in examples only. In particular, IRIs starting with "http://example.com" represent some application-dependent IRI [RFC3987]

2. PROV Overview

This section is non-normative.

This section introduces provenance concepts with informal explanations and illustrative examples. PROV distinguishes core structures, forming the essence of provenance, from extended structures catering for more specific uses of provenance. Core and extended structures are respectively presented in Section 2.1 and Section 2.2. Furthermore, the PROV data model is organized according to components, which form thematic groupings of concepts (see Section 2.3). A provenance description is an instance of a provenance structure, whether core or extended, described below.

2.1 PROV Core Structures

This section is non-normative.

At its core, provenance describes the use and production of entities by activities, which may be influenced in various ways by agents. These core types and their relationships are illustrated by the UML diagram of Figure 1.

Figure 1: PROV Core Structures (Informative)

The concepts found in the core of PROV are introduced in the rest of this section. They are summarized in Table 2, where they are categorized as type or relation. The first column lists concepts, the second column indicates whether a concept maps to a type or a relation, whereas the third column contains the corresponding name, as it appears in Figure 1. Names of relations have a verbal form in the past tense to express what happened in the past, as opposed to what may or will happen. In the core of PROV, all relations are binary.

Table 2: Mapping of PROV core concepts to types and relations
PROV Concepts | PROV-DM types or relations | Name | Overview
Entity | PROV-DM Types | Entity | Section 2.1.1
Activity | PROV-DM Types | Activity | Section 2.1.1
Agent | PROV-DM Types | Agent | Section 2.1.3
Generation | PROV-DM Relations | WasGeneratedBy | Section 2.1.1
Usage | PROV-DM Relations | Used | Section 2.1.1
Communication | PROV-DM Relations | WasInformedBy | Section 2.1.1
Derivation | PROV-DM Relations | WasDerivedFrom | Section 2.1.2
Attribution | PROV-DM Relations | WasAttributedTo | Section 2.1.3
Association | PROV-DM Relations | WasAssociatedWith | Section 2.1.3
Delegation | PROV-DM Relations | ActedOnBehalfOf | Section 2.1.3

2.1.1 Entity and Activity

This section is non-normative.

In PROV, things we want to describe the provenance of are called entities and have some fixed aspects. The term "things" encompasses a broad diversity of notions, including digital objects such as a file or web page, physical things such as a mountain, a building, a printed book, or a car, as well as abstract concepts and ideas.

An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary. [Detailed specification]

Example 1

An entity may be the document at IRI http://www.bbc.co.uk/news/science-environment-17526723, a file in a file system, a car, or an idea.

An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities. [Detailed specification] Just as entities cover a broad range of notions, activities can cover a broad range of notions: information processing activities may for example move, copy, or duplicate digital entities; physical activities can include driving a car between two locations or printing a book.

Example 2

An activity may be the publishing of a document on the Web, sending a twitter message, extracting metadata embedded in a file, driving a car from Boston to Cambridge, assembling a data set based on a set of measurements, performing a statistical analysis over a data set, sorting news items according to some criteria, running a SPARQL query over a triple store, or editing a file.

Activities and entities are associated with each other in two different ways: activities utilize entities and activities produce entities. The act of utilizing or producing an entity may have a duration. The term 'generation' refers to the completion of the act of producing; likewise, the term 'usage' refers to the beginning of the act of utilizing entities. Thus, we define the following concepts of generation and usage.

Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation. [Detailed specification]

Usage is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity. [Detailed specification]

Example 3

Examples of generation are the completed creation of a file by a program, the completed creation of a linked data set, and the completed publication of a new version of a document.

Example 4

Usage examples include a procedure beginning to consume an argument, a service starting to read a value on a port, a program beginning to read a configuration file, or the point at which an ingredient, such as eggs, is being added in a baking activity. Usage may entirely consume an entity (e.g. eggs are no longer available after being added to the mix); in contrast, the same entity may be used multiple times, possibly by different activities (e.g. a file on a file system can be read indefinitely).

Example 5

Let us consider the activity of driving a car from Boston to Cambridge. One might reasonably ask what entities are used and generated by this activity. This is answered by considering that a single artifact may correspond to several entities; in this case, a car in Boston may be a different entity from the same car in Cambridge. Thus, among other things, an entity "car in Boston" would be used, and a new entity "car in Cambridge" would be generated by this activity of driving. The provenance trace of the car might include: designed in Japan, manufactured in Korea, shipped to Boston USA, purchased by customer, driven to Cambridge, serviced by engineer in Cambridge, etc., all of which might be important information when deciding whether or not it represents a sensible second-hand purchase. Or some of it might alternatively be relevant when trying to determine the truth of a web page reporting a traffic violation involving that car. This breadth of provenance allows descriptions of interactions between physical and digital artifacts.
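
The driving scenario of Example 5 can be sketched in PROV-N, the notation introduced in Section 3. This is an illustrative sketch only, not part of the specification: the prefix ex and the identifiers ex:carInBoston, ex:carInCambridge and ex:driving are hypothetical names chosen here.

```provn
document
  prefix ex <http://example.com/>

  // One physical car, described as two entities with different fixed aspects
  entity(ex:carInBoston, [ prov:location="Boston" ])
  entity(ex:carInCambridge, [ prov:location="Cambridge" ])

  activity(ex:driving, [ prov:type="drive" ])

  // The drive used the car in Boston and generated the car in Cambridge
  used(ex:driving, ex:carInBoston, -)
  wasGeneratedBy(ex:carInCambridge, ex:driving, -)
endDocument
```

Earlier steps of the trace (design, manufacture, shipping) could be added as further activities and entities in the same way.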

The generation of an entity by an activity and its subsequent usage by another activity is termed communication.

Communication is the exchange of some unspecified entity by two activities, one activity using some entity generated by the other. [Detailed specification]

Example 6

The activity of writing a celebrity article was informed by (a communication instance) the activity of intercepting voicemails.
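
As a hedged PROV-N sketch of Example 6 (the prefix ex and both activity identifiers are hypothetical), the informed activity is written first and the informing activity second:

```provn
document
  prefix ex <http://example.com/>

  activity(ex:interceptVoicemails)
  activity(ex:writeArticle)

  // Writing the article was informed by intercepting voicemails
  wasInformedBy(ex:writeArticle, ex:interceptVoicemails)
endDocument
```

No entity is named in the sketch, which matches the definition: communication leaves the exchanged entity unspecified.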

2.1.2 Derivation

Activities utilize entities and produce entities. In some cases, utilizing an entity influences the creation of another in some way. This notion of 'influence' is captured by derivations, defined as follows.

A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity. [Detailed specification]

Example 7

Examples of derivation include the transformation of a relational table into a linked data set, the transformation of a canvas into a painting, the transportation of a work of art from London to New York, and a physical transformation such as the melting of ice into water.

The focus of derivation is on connecting a generated entity to a used entity. While the basic idea is simple, the concept of derivation can be quite subtle: implicit is the notion that the generated entity was affected in some way by the used entity. If an artifact was used by an activity that also generated a new artifact, it does not always follow that the second artifact was derived from the first. In the activity of creating a painting, an artist may have mixed some paint that was never actually applied to the canvas: the painting would typically not be considered a derivation from the unused paint. PROV does not attempt to specify the conditions under which derivations exist; rather, derivation is considered to have been determined by unspecified means. Thus, while a chain of usage and generation is necessary for a derivation to hold between entities, it is not sufficient; some form of influence occurring during the activities involved is also needed.
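
The painting scenario can be sketched in PROV-N to show that usage plus generation does not by itself imply derivation. The identifiers below (ex:canvas, ex:unusedPaint, ex:painting, ex:paintingActivity) are hypothetical and chosen only for this illustration.

```provn
document
  prefix ex <http://example.com/>

  entity(ex:canvas)
  entity(ex:unusedPaint)
  entity(ex:painting)
  activity(ex:paintingActivity)

  // Both the canvas and the mixed paint were used by the activity
  used(ex:paintingActivity, ex:canvas, -)
  used(ex:paintingActivity, ex:unusedPaint, -)
  wasGeneratedBy(ex:painting, ex:paintingActivity, -)

  // Only the canvas is asserted to have influenced the painting;
  // no derivation from ex:unusedPaint is stated
  wasDerivedFrom(ex:painting, ex:canvas)
endDocument
```

The derivation is asserted explicitly by whoever writes the provenance; it is not inferred from the usage and generation statements.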

2.1.3 Agents and Responsibility

For many purposes, a key consideration for deciding whether something is reliable and/or trustworthy is knowing who or what was responsible for its production. Data published by a respected independent organization may be considered more trustworthy than that from a lobby organization; a claim by a well-known scientist with an established track record may be more believed than a claim by a new student; a calculation performed by an established software library may be more reliable than by a one-off program.

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. [Detailed specification] An agent may be a particular type of entity or activity. This means that the model can be used to express provenance of the agents themselves.

Example 8

Software for checking the use of grammar in a document may be defined as an agent of a document preparation activity; one can also describe its provenance, including for instance the vendor and the version history. A site selling books on the Web, the services involved in the processing of orders, and the companies hosting them are also agents.

Agents can be related to entities, activities, and other agents.

Attribution is the ascribing of an entity to an agent. [Detailed specification]

Example 9

A blog post can be attributed to an author, a mobile phone to its manufacturer.
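
A minimal PROV-N sketch of the two attributions in Example 9 follows. All identifiers are hypothetical; prov:Person and prov:Organization are the agent types that also appear in the bundle examples of Section 4.3.

```provn
document
  prefix ex <http://example.com/>

  entity(ex:blogPost)
  agent(ex:author, [ prov:type='prov:Person' ])
  wasAttributedTo(ex:blogPost, ex:author)

  entity(ex:mobilePhone)
  agent(ex:manufacturer, [ prov:type='prov:Organization' ])
  wasAttributedTo(ex:mobilePhone, ex:manufacturer)
endDocument
```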

Agents are defined as having some kind of responsibility for activities.

An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity. [Detailed specification]

Example 10

Examples of association between an activity and an agent are:

  • creation of a web page under the guidance of a designer;
  • various forms of participation in a panel discussion, including audience member, panelist, or panel chair;
  • a public event, sponsored by a company, and hosted by a museum.
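
The panel discussion case from Example 10 might be written in PROV-N as below. This is a sketch under assumptions: the agent identifiers and the role strings are invented for illustration, and the '-' marker stands for an absent plan.

```provn
document
  prefix ex <http://example.com/>

  activity(ex:panelDiscussion)
  agent(ex:alice, [ prov:type='prov:Person' ])
  agent(ex:bob, [ prov:type='prov:Person' ])
  agent(ex:carol, [ prov:type='prov:Person' ])

  // The same activity is associated with several agents in different roles
  wasAssociatedWith(ex:panelDiscussion, ex:alice, -, [ prov:role="panel chair" ])
  wasAssociatedWith(ex:panelDiscussion, ex:bob, -, [ prov:role="panelist" ])
  wasAssociatedWith(ex:panelDiscussion, ex:carol, -, [ prov:role="audience member" ])
endDocument
```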

Delegation is the assignment of authority and responsibility to an agent (by itself or by another agent) to carry out a specific activity as a delegate or representative, while the agent it acts on behalf of retains some responsibility for the outcome of the delegated work. [Detailed specification] The nature of this relation is intended to be broad, including contractual relation, but also altruistic initiative by the representative agent.

Example 11

A student publishing a web page describing an academic department could result in both the student and the department being agents associated with the activity. It may not matter which actual student published a web page, but it may matter significantly that the department told the student to put up the web page.
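
Example 11 can be sketched in PROV-N as follows, with hypothetical identifiers ex:student, ex:department and ex:publishPage. In the delegation statement the delegate comes first, the responsible agent second, and the activity third.

```provn
document
  prefix ex <http://example.com/>

  activity(ex:publishPage)
  agent(ex:student, [ prov:type='prov:Person' ])
  agent(ex:department, [ prov:type='prov:Organization' ])

  // Both agents are associated with the publishing activity
  wasAssociatedWith(ex:publishPage, ex:student, -)
  wasAssociatedWith(ex:publishPage, ex:department, -)

  // The student acted on behalf of the department for this activity
  actedOnBehalfOf(ex:student, ex:department, ex:publishPage)
endDocument
```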

2.2 PROV Extended Structures

While the core of PROV focuses on essential provenance structures commonly found in provenance descriptions, extended structures are designed to support more advanced uses of provenance. The purpose of this section is twofold. First, mechanisms to specify these extended structures are introduced. Second, two further kinds of provenance structures are overviewed: they cater for provenance of provenance and collections, respectively.

2.2.1 Mechanisms to Define Extended Structures

Extended structures are defined by a variety of mechanisms outlined in this section: subtyping, expanded relations, optional identification, and new relations.

2.2.1.1 Subtyping

This section is non-normative.

Subtyping can be applied to core types. For example, a software agent is a special kind of agent, defined as follows.

A software agent is running software.

Subtyping can also be applied to core relations. For example, a revision is a special kind of derivation, defined as follows.

A revision is a derivation for which the resulting entity is a revised version of some original.
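
In PROV-N, both kinds of subtyping are expressed with the prov:type attribute. The sketch below is illustrative: the identifiers ex:grammarChecker, ex:draft1 and ex:draft2 are hypothetical, while prov:SoftwareAgent and prov:Revision are the reserved type names for the two concepts defined above.

```provn
document
  prefix ex <http://example.com/>

  // Subtype of a core type: an agent that is running software
  agent(ex:grammarChecker, [ prov:type='prov:SoftwareAgent' ])

  // Subtype of a core relation: a derivation that is a revision
  entity(ex:draft1)
  entity(ex:draft2)
  wasDerivedFrom(ex:draft2, ex:draft1, [ prov:type='prov:Revision' ])
endDocument
```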

2.2.1.2 Expanded Relations

Section 2.1 shows that seven concepts are mapped to binary relations in the core of PROV. However, some advanced uses of these concepts cannot be captured by a binary relation, but require relations to be expanded to n-ary relations.

Indeed, binary relations are actually shorthands that can be 'opened up' by applications and filled in with further application details. For example, derivation is a very high level relationship between two entities: an application may decide to 'open up' that relationship in an expanded relation that describes how an entity was derived from another by virtue of listing the generation, usage, and activity involved in the derivation relationship. Applications are free to decide which level of granularity they want to describe, and PROV gives them the way to do that.
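
The difference between the binary shorthand and an 'opened up' derivation can be sketched in PROV-N. The scenario (a relational table transformed into a linked data set) comes from Example 7; every identifier is hypothetical.

```provn
document
  prefix ex <http://example.com/>

  entity(ex:table)
  entity(ex:dataset)
  activity(ex:transform)

  // Usage and generation carry identifiers so the derivation can refer to them
  used(ex:u1; ex:transform, ex:table, -)
  wasGeneratedBy(ex:g1; ex:dataset, ex:transform, -)

  // Binary shorthand would be: wasDerivedFrom(ex:dataset, ex:table)
  // Expanded form: derived entity, source entity, activity, generation, usage
  wasDerivedFrom(ex:dataset, ex:table, ex:transform, ex:g1, ex:u1)
endDocument
```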

To illustrate expanded relations, we revisit the concept of association, introduced in Section 2.1.3 (full definition of the expanded association can be found in Section 5.3.3). Agents may rely on plans, i.e. sets of actions or steps, to achieve their goals in the context of an activity. Hence, an expanded form of association relation allows for a plan to be specified. Plan is defined by subtyping and full association by an expanded relation, as follows.

A plan is an entity that represents a set of actions or steps intended by one or more agents to achieve some goals.

An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity. It further allows for a plan to be specified, which is the plan intended by the agent to achieve some goals in the context of this activity.

There exist no prescriptive requirements on the nature of plans, their representation, the actions or steps they consist of, or their intended goals. Since plans may evolve over time, it may become necessary to track their provenance, so plans themselves are entities. Representing the plan explicitly in the provenance can be useful for various tasks: for example, to validate the execution as represented in the provenance record, to manage expectation failures, or to provide explanations.

Example 12

An example of association between an activity and an agent involving a plan is: an XSLT transform (an activity) launched by a user (an agent) based on an XSL style sheet (a plan).
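
Example 12 might look as follows in PROV-N. The identifiers ex:xsltTransform, ex:user and ex:styleSheet are hypothetical; the third argument of the association names the plan.

```provn
document
  prefix ex <http://example.com/>

  activity(ex:xsltTransform)
  agent(ex:user, [ prov:type='prov:Person' ])

  // The style sheet is an entity typed as a plan
  entity(ex:styleSheet, [ prov:type='prov:Plan' ])

  // Expanded association: activity, agent, plan
  wasAssociatedWith(ex:xsltTransform, ex:user, ex:styleSheet)
endDocument
```

Because the plan is an ordinary entity, its own provenance (for instance, who wrote the style sheet) can be described with the same constructs.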

2.2.1.3 Optional Identification

Some concepts exhibit both a core use, expressed as a binary relation, and an extended use, expressed as an n-ary relation. In some cases, mapping the concept to a relation, whether binary or n-ary, is not sufficient: instead, it may be required to identify an instance of such a concept. In those cases, PROV allows for an optional identifier to be expressed to identify an instance of an association between two or more elements. This optional identifier can then be used to refer to an instance as part of other concepts.

Example 13

A service may read the same configuration file on two different occasions. Each usage can be identified by its own identifier, allowing them to be distinguished.
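
A PROV-N sketch of Example 13 is shown below. The identifiers and the two timestamps are invented for illustration; the identifier of each usage precedes the ';' separator.

```provn
document
  prefix ex <http://example.com/>

  activity(ex:service)
  entity(ex:configFile)

  // Two distinct usages of the same entity by the same activity
  used(ex:u1; ex:service, ex:configFile, 2012-03-02T10:30:00)
  used(ex:u2; ex:service, ex:configFile, 2012-03-02T11:45:00)
endDocument
```

Without the identifiers ex:u1 and ex:u2, other statements would have no way to refer to one reading of the file rather than the other.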

2.2.1.4 Further Relations

Finally, PROV supports further relations that are not subtypes or expanded versions of existing relations (such as specialization, alternate).

2.2.2 Provenance of Provenance

A bundle is a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed.

For users to decide whether they can place their trust in something, they may want to analyze its provenance, but also determine the agent its provenance is attributed to, and when it was generated. In other words, users need to be able to determine the provenance of provenance. Hence, provenance is also regarded as an entity (of type Bundle), by which provenance of provenance can then be expressed.

Example 14

In a decision making situation, decision makers may be presented with the same piece of knowledge, issued by multiple sources. In order to validate this piece of knowledge, decision makers can consider its provenance, but also the provenance of its provenance, which may help determine whether it can be trusted.

2.2.3 Collections

A collection is an entity that provides a structure to some constituents that must themselves be entities. These constituents are said to be members of the collection. Many different types of collections exist, such as sets, dictionaries, or lists. Using Collections, one can express the provenance of the collection itself in addition to that of the members.

Example 15

An example of collection is an archive of documents. Each document has its own provenance, but the archive itself also has some provenance: who maintained it, which documents it contained at which point in time, how it was assembled, etc.
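
The archive of Example 15 can be sketched in PROV-N as a collection with members, each carrying its own provenance. The identifiers are hypothetical; prov:Collection is the reserved type for collections and hadMember relates a collection to one of its members.

```provn
document
  prefix ex <http://example.com/>

  entity(ex:archive, [ prov:type='prov:Collection' ])
  entity(ex:doc1)
  entity(ex:doc2)

  hadMember(ex:archive, ex:doc1)
  hadMember(ex:archive, ex:doc2)

  // Provenance of a member and of the collection itself are separate statements
  agent(ex:author, [ prov:type='prov:Person' ])
  agent(ex:archivist, [ prov:type='prov:Person' ])
  wasAttributedTo(ex:doc1, ex:author)
  wasAttributedTo(ex:archive, ex:archivist)
endDocument
```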

2.3 Modular Organization

Besides the separation between core and extended structures, PROV-DM is further organized according to components, grouping concepts in a thematic manner.

Table 3 enumerates the six components, five of which have already been implicitly overviewed in this section. All components contain extended structures, whereas only the first three contain core structures.

Table 3: Components Overview
Component | Core Structures | Overview | Specification | Description
1 Entities and Activities | yes | 2.1.1 | 5.1 | about entities and activities, and their interrelations
2 Derivation | yes | 2.1.2 | 5.2 | about derivation and its subtypes
3 Agent and Responsibility | yes | 2.1.3 | 5.3 | about agents and concepts ascribing responsibility to them
4 Bundles | no | 2.2.2 | 5.4 | about bundles, a mechanism to support provenance of provenance
5 Alternate | no | - | 5.5 | about relations linking entities referring to the same thing
6 Collections | no | 2.2.3 | 5.6 | about collections

3. The Provenance Notation

This section is non-normative.

To illustrate the application of PROV concepts to a concrete example (see Section 4) and to provide examples of concepts (see Section 5), we introduce PROV-N, a notation for writing instances of the PROV data model. For full details and for a normative reference, the reader is referred to the companion specification [PROV-N]. PROV-N is a notation aimed at human consumption, with the following characteristics:

Example 16

An activity with identifier a1 and an attribute type with value createFile.

activity(a1, [ prov:type="createFile" ])

Two entities with identifiers e1 and e2.

entity(e1)
entity(e2)

The activity a1 used e1, and e2 was generated by a1.

used(a1, e1)
wasGeneratedBy(e2, a1)

The same descriptions, but with an explicit identifier u1 for the usage, and the syntactic marker '-' to mark the absence of identifier in the generation. Both are followed by ';'.

used(u1; a1, e1)
wasGeneratedBy(-; e2, a1)

4. Illustration of PROV-DM by an Example

This section is non-normative.

Section 2 has introduced some provenance concepts, and how they are expressed as types or relations in the PROV data model. The purpose of this section is to put these concepts into practice in order to express the provenance of some document published on the Web. With this realistic example, PROV concepts are composed together, and a graphical illustration shows a provenance description forming a directed graph, rooted at the entity we want to explain the provenance of, and pointing to the entities, activities, and agents it depended on. This example also shows that, sometimes, multiple provenance descriptions about the same entity can co-exist, which then justifies the need for provenance of provenance.

In this example, we consider one of the many documents published by the World Wide Web Consortium, and describe its provenance. Specifically, we consider the document identified by http://www.w3.org/TR/2011/WD-prov-dm-20111215. Its provenance can be expressed from several perspectives: first, provenance can take the authors' viewpoint; second, it can be concerned with the W3C process. Then, attribution of these two provenance descriptions is provided.

4.1 Example: The Authors View

This section is non-normative.

Description: A document is edited by some editor, using contributions from various contributors.

In this perspective, provenance of the document http://www.w3.org/TR/2011/WD-prov-dm-20111215 is concerned with the editing activity as perceived by authors. This kind of information could be used by authors in their CV or in a narrative about this document.

We paraphrase some PROV descriptions, express them with the PROV-N notation, and depict them with a graphical illustration (see Figure 2). Full details of the provenance record can be found here.

Figure 2: Provenance of a Document (part 1) (Informative)

Provenance descriptions can be illustrated graphically. The illustration is not intended to represent all the details of the model, but it is intended to show the essence of a set of provenance descriptions [PROV-LAYOUT]. Therefore, it should not be seen as an alternate notation for expressing provenance.

The graphical illustration takes the form of a graph. Entities, activities and agents are represented as nodes, with oval, rectangular, and pentagonal shapes, respectively. Usage, Generation, Derivation, and Association are represented as directed edges.

Entities are laid out according to the ordering of their generation. We endeavor to show time progressing from left to right. This means that edges for Usage, Generation, Derivation, and Association typically point leftwards.
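
The authors' view described at the start of Section 4.1 (a document edited by an editor, with contributions from contributors) could be sketched in PROV-N roughly as follows. This is an assumption-laden illustration rather than the full provenance record: the prefix tr, the activity identifier ex:edit1, and the role strings are chosen here for illustration, while ex:Paolo and ex:Simon are the agent names used in the bundle example of Section 4.3.

```provn
document
  prefix ex <http://example.com/>
  prefix tr <http://www.w3.org/TR/2011/>

  // The document whose provenance is described
  entity(tr:WD-prov-dm-20111215)

  // A hypothetical editing activity that generated it
  activity(ex:edit1, [ prov:type="edit" ])
  wasGeneratedBy(tr:WD-prov-dm-20111215, ex:edit1, -)

  // Agents associated with the editing activity, in different roles
  agent(ex:Paolo, [ prov:type='prov:Person' ])
  agent(ex:Simon, [ prov:type='prov:Person' ])
  wasAssociatedWith(ex:edit1, ex:Paolo, -, [ prov:role="editor" ])
  wasAssociatedWith(ex:edit1, ex:Simon, -, [ prov:role="contributor" ])
endDocument
```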

4.2Example: The Process View

Description: The World Wide WebConsortium publishes documents according to its publicationpolicy. Working drafts are published regularly to reflect the workaccomplished by working groups. Every publication of a working draftmust be preceded by a "publication request" to the Webmaster. Thevery first version of a document must also be preceded by a"transition request" to be approved by theW3C director. All workingdrafts are made available at a unique IRI. In this scenario, we consider two successive versions of a given document, the policy according to which they were published, and the associated requests.

We describe the kind of provenance record that theWWW Consortium could keep for auditors to check that due processes are followed. All entities involved in this example are Web resources, with well-defined IRIs (some of which refer to archived email messages, available toW3C Members).

We now paraphrase some PROV descriptions, and express them with the PROV-N notation, and depict them with a graphical illustration (seeFigure 3). Full details of the provenance record can be foundhere.

Figure 3: Provenance of a Document (part 2) (Informative)

This simple example has shown a variety of PROV concepts, such as Entity, Agent, Activity, Usage, Generation, Derivation, and Association. In this example, it happens that all entities were already Web resources, with readily available IRIs, which we used. We note that some of the resources are public, whereas others have restricted access: provenance statements only make use of their identifiers. If identifiers do not pre-exist, e.g. for activities, then they can be generated, for instance ex:act2, occurring in the namespace identified by prefix ex. We note that the IRI scheme developed by W3C is particularly suited for expressing provenance of these documents, since each IRI denotes a specific version of a document. It then becomes easy to relate the various versions with PROV relations. We note that an Association is a ternary relation (represented by a multi-edge labeled wasAssociatedWith) from an activity to an agent and a plan.

4.3 Example: Attribution of Provenance

The two previous sections offer two different perspectives on the provenance of a document. PROV allows for multiple sources to provide the provenance of a subject. For users to decide whether they can place their trust in the document, they may want to analyze its provenance, but also determine who the provenance is attributed to, and when it was generated, etc. In other words, we need to be able to express the provenance of provenance.

PROV-DM offers a construct to name a bundle of provenance descriptions (full details: ex:author-view).

bundle ex:author-view
  agent(ex:Paolo, [ prov:type='prov:Person' ])
  agent(ex:Simon, [ prov:type='prov:Person' ])
  ...
endBundle

Likewise, the process view can be expressed as a separate named bundle (full details: ex:process-view).

bundle ex:process-view
  agent(w3:Consortium, [ prov:type='prov:Organization' ])
  ...
endBundle

To express their respective provenance, these bundles must be seen as entities, and all PROV constructs are now available to express their provenance. In the example below, ex:author-view is attributed to the agent ex:Simon, whereas ex:process-view is attributed to w3:Consortium.

entity(ex:author-view, [ prov:type='prov:Bundle' ])
wasAttributedTo(ex:author-view, ex:Simon)
entity(ex:process-view, [ prov:type='prov:Bundle' ])
wasAttributedTo(ex:process-view, w3:Consortium)

5. PROV-DM Types and Relations

Provenance concepts, expressed as PROV-DM types and relations, are organized according to six components that are defined in this section. The components and their dependencies are illustrated in Figure 4. A component that relies on concepts defined in another is displayed above it in the figure. So, for example, component 5 (alternate) depends on concepts defined in component 4 (bundles), itself dependent on concepts defined in component 1 (entity and activity).

Figure 4: PROV-DM Components (Informative)

While not all PROV-DM relations are binary, they all involve two primary elements. Hence, Table 4 indexes all relations (except wasInfluencedBy) according to their two primary elements (referred to as subject and object). The table adopts the same color scheme as Figure 4, allowing components to be readily identified. Relation names appearing in bold correspond to the core structures introduced in Section 2.1.

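Table 4: PROV-DM Relations At a Glance

| Subject \ Object | Entity | Activity | Agent |
|------------------|--------|----------|-------|
| Entity | WasDerivedFrom, Revision, Quotation, PrimarySource, AlternateOf, SpecializationOf, HadMember | WasGeneratedBy, WasInvalidatedBy (R, T, L) | WasAttributedTo |
| Activity | Used, WasStartedBy, WasEndedBy (R, T, L) | WasInformedBy | WasAssociatedWith (R) |
| Agent | | | ActedOnBehalfOf |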

The letters 'R' and 'L' appearing in some cells of Table 4 indicate that attributes prov:role (Section 5.7.2.3) and prov:location (Section 5.7.2.2) are permitted for these relations. The letter 'T' indicates that an OPTIONAL time is also permitted.

Some PROV-DM relations are not binary and involve extra optional elements. They are summarized in Table 5, which groups secondary objects according to their type. The table also adopts the same color scheme as Figure 4, allowing components to be readily identified. None of these relations correspond to the core structures introduced in Section 2.1.

Table 5: Secondary optional elements in PROV-DM Relations

| Subject \ Secondary Object | Entity | Activity | Agent |
|----------------------------|--------|----------|-------|
| Entity | | WasDerivedFrom (activity) | |
| Activity | WasAssociatedWith (plan) | WasStartedBy (starter), WasEndedBy (ender) | |
| Agent | | ActedOnBehalfOf (activity) | |

Table 6 is a complete index of all the types and relations of PROV-DM, color-coded according to the component they belong to. In the first column, concept names link to their informal definition, whereas, in the second column, representations link to the information used to represent the concept. Concept names appearing in bold in the first column are the core structures introduced in Section 2.1. Likewise, these core structures have their names and parameters highlighted in bold in the second column (PROV-N representation); expanded structures are not represented with a bold font.

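Table 6: PROV-DM Types and Relations

| Type or Relation Name | Representation in the PROV-N notation | Component |
|-----------------------|---------------------------------------|-----------|
| Entity | entity(id, [ attr1=val1, ...]) | Component 1: Entities/Activities |
| Activity | activity(id, st, et, [ attr1=val1, ...]) | Component 1: Entities/Activities |
| Generation | wasGeneratedBy(id;e,a,t,attrs) | Component 1: Entities/Activities |
| Usage | used(id;a,e,t,attrs) | Component 1: Entities/Activities |
| Communication | wasInformedBy(id;a2,a1,attrs) | Component 1: Entities/Activities |
| Start | wasStartedBy(id;a2,e,a1,t,attrs) | Component 1: Entities/Activities |
| End | wasEndedBy(id;a2,e,a1,t,attrs) | Component 1: Entities/Activities |
| Invalidation | wasInvalidatedBy(id;e,a,t,attrs) | Component 1: Entities/Activities |
| Derivation | wasDerivedFrom(id;e2,e1,a,g2,u1,attrs) | Component 2: Derivations |
| Revision | ... prov:type='prov:Revision' ... | Component 2: Derivations |
| Quotation | ... prov:type='prov:Quotation' ... | Component 2: Derivations |
| Primary Source | ... prov:type='prov:PrimarySource' ... | Component 2: Derivations |
| Agent | agent(id, [ attr1=val1, ...]) | Component 3: Agents, Responsibility, Influence |
| Attribution | wasAttributedTo(id;e,ag,attrs) | Component 3: Agents, Responsibility, Influence |
| Association | wasAssociatedWith(id;a,ag,pl,attrs) | Component 3: Agents, Responsibility, Influence |
| Delegation | actedOnBehalfOf(id;ag2,ag1,a,attrs) | Component 3: Agents, Responsibility, Influence |
| Plan | ... prov:type='prov:Plan' ... | Component 3: Agents, Responsibility, Influence |
| Person | ... prov:type='prov:Person' ... | Component 3: Agents, Responsibility, Influence |
| Organization | ... prov:type='prov:Organization' ... | Component 3: Agents, Responsibility, Influence |
| SoftwareAgent | ... prov:type='prov:SoftwareAgent' ... | Component 3: Agents, Responsibility, Influence |
| Influence | wasInfluencedBy(id;e2,e1,attrs) | Component 3: Agents, Responsibility, Influence |
| Bundle constructor | bundle id description_1 ... description_n endBundle | Component 4: Bundles |
| Bundle type | ... prov:type='prov:Bundle' ... | Component 4: Bundles |
| Alternate | alternateOf(alt1, alt2) | Component 5: Alternate |
| Specialization | specializationOf(infra, supra) | Component 5: Alternate |
| Collection | ... prov:type='prov:Collection' ... | Component 6: Collections |
| EmptyCollection | ... prov:type='prov:EmptyCollection' ... | Component 6: Collections |
| Membership | hadMember(c,e) | Component 6: Collections |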

In the rest of the section, each type and relation is defined informally, followed by a summary of the information used to represent the concept, and illustrated with PROV-N examples.

5.1 Component 1: Entities and Activities

The first component of PROV-DM is concerned with entities and activities, and their interrelations: Used (Usage), WasGeneratedBy (Generation), WasStartedBy (Start), WasEndedBy (End), WasInvalidatedBy (Invalidation), and WasInformedBy (Communication). Figure 5 uses UML to depict the first component. Core structures are displayed in the yellow area, consisting of two classes (Entity, Activity) and three binary associations between them: Used (Usage), WasGeneratedBy (Generation), and WasInformedBy (Communication). The rest of the figure displays extended structures, including UML association classes (see [UML], section 7.3.4, p. 42), represented in gray, to express expanded n-ary relations for Used (Usage), WasGeneratedBy (Generation), WasInvalidatedBy (Invalidation), WasStartedBy (Start), and WasEndedBy (End). The figure also makes explicit associations with time for these concepts (time being marked with the primitive stereotype). When not specified, cardinality is assumed to be 0..*.

Figure 5: Entities and Activities Component Overview (Informative)

5.1.1 Entity

An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.

An entity, written entity(id, [attr1=val1, ...]) in PROV-N, has:
  • id: an identifier for an entity;
  • attributes: an OPTIONAL set of attribute-value pairs ((attr1,val1), ...) representing additional information about the fixed aspects of this entity.
Example 17

The following expression

entity(tr:WD-prov-dm-20111215, [ prov:type="document", ex:version="2" ])
states the existence of an entity, denoted by identifier tr:WD-prov-dm-20111215, with type document and version number 2. The attribute ex:version is application specific, whereas the attribute type (see Section 5.7.2.4) is reserved in the PROV namespace.

5.1.2 Activity

An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.

An activity, written activity(id, st, et, [attr1=val1, ...]) in PROV-N, has:
  • id: an identifier for an activity;
  • startTime: an OPTIONAL time (st) for the start of the activity;
  • endTime: an OPTIONAL time (et) for the end of the activity;
  • attributes: an OPTIONAL set of attribute-value pairs ((attr1,val1), ...) representing additional information about this activity.
Example 18

The following expression

activity(a1, 2011-11-16T16:05:00, 2011-11-16T16:06:00,
         [ ex:host="server.example.org", prov:type='ex:edit' ])

states the existence of an activity with identifier a1, start time 2011-11-16T16:05:00, and end time 2011-11-16T16:06:00, running on host server.example.org, and of type edit. The attribute host is application specific (declared in some namespace with prefix ex). The attribute type is a reserved attribute of PROV-DM, allowing for sub-typing to be expressed (see Section 5.7.2.4).

Further considerations:

  • An activity is not an entity. This distinction is similar to the distinction between 'continuant' and 'occurrent' in logic [Logic].

5.1.3 Generation

Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.

Given that a generation is the completion of production of an entity, it is instantaneous.

Generation, written wasGeneratedBy(id; e, a, t, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for a generation;
  • entity: an identifier (e) for a created entity;
  • activity: an OPTIONAL identifier (a) for the activity that creates the entity;
  • time: an OPTIONAL "generation time" (t), the time at which the entity was completely created;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this generation.

While each of id, activity, time, and attributes is OPTIONAL, at least one of them MUST be present.

Example 19

The following expressions

  wasGeneratedBy(e1, a1, 2001-10-26T21:32:52, [ ex:port="p1" ])
  wasGeneratedBy(e2, a1, 2001-10-26T10:00:00, [ ex:port="p2" ])

state the existence of two generations (with respective times 2001-10-26T21:32:52 and 2001-10-26T10:00:00), at which new entities, identified by e1 and e2, were created by an activity, identified by a1. The first one was available on port p1, whereas the other was available on port p2. The semantics of port are application specific.

Example 20

In some cases, we may want to record the time at which an entity was generated without having to specify the activity that generated it. To support this requirement, the activity element in generation is optional. Hence, the following expression indicates the time at which an entity is generated, without naming the activity that did it.

  wasGeneratedBy(e, -, 2001-10-26T21:32:52)
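
The rule that at least one of id, activity, time, and attributes must be present, together with the `-` marker for an omitted activity, can be sketched in a few lines of Python. This is a minimal illustration, not part of PROV-DM: the function name `was_generated_by` and its parameter names are hypothetical, and the output is only PROV-N-like text (it always writes the activity and time positions).

```python
from typing import Dict, Optional


def was_generated_by(entity: str,
                     activity: Optional[str] = None,
                     time: Optional[str] = None,
                     attrs: Optional[Dict[str, str]] = None,
                     gen_id: Optional[str] = None) -> str:
    """Render a generation as PROV-N-like text (hypothetical helper)."""
    # At least one of id, activity, time, attributes MUST be present.
    if gen_id is None and activity is None and time is None and not attrs:
        raise ValueError("one of id, activity, time, attributes is required")
    # Omitted optional positions are written with the '-' marker.
    body = ", ".join([entity, activity or "-", time or "-"])
    if attrs:
        pairs = ", ".join(f'{key}="{value}"' for key, value in attrs.items())
        body += f", [ {pairs} ]"
    prefix = f"{gen_id}; " if gen_id is not None else ""
    return f"wasGeneratedBy({prefix}{body})"


print(was_generated_by("e", time="2001-10-26T21:32:52"))
# prints: wasGeneratedBy(e, -, 2001-10-26T21:32:52)
```

Calling `was_generated_by("e")` with no optional element raises `ValueError`, mirroring the constraint stated above.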

5.1.4 Usage

Usage is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity. (Note: This definition is formulated for a given usage; it is permitted for an activity to have used the same entity multiple times.)

Given that a usage is the beginning of utilizing an entity, it is instantaneous.

Usage, written used(id; a, e, t, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for a usage;
  • activity: an identifier (a) for the activity that used an entity;
  • entity: an OPTIONAL identifier (e) for the entity being used;
  • time: an OPTIONAL "usage time" (t), the time at which the entity started to be used;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this usage.

While each of id, entity, time, and attributes is OPTIONAL, at least one of them MUST be present.

A reference to a given entity MAY appear in multiple usages that share a given activity identifier.

Example 21

The following usages

  used(a1, e1, 2011-11-16T16:00:00, [ ex:parameter="p1" ])
  used(a1, e2, 2011-11-16T16:00:01, [ ex:parameter="p2" ])

state that the activity identified by a1 used two entities identified by e1 and e2, at times 2011-11-16T16:00:00 and 2011-11-16T16:00:01, respectively; the first one was found as the value of parameter p1, whereas the second was found as the value of parameter p2. The semantics of parameter are application specific.

5.1.5 Communication

Communication is the exchange of some unspecified entity by two activities, one activity using some entity generated by the other.

A communication implies that activity a2 is dependent on another activity a1, by way of some unspecified entity that is generated by a1 and used by a2.

A communication, written as wasInformedBy(id; a2, a1, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier identifying the relation;
  • informed: the identifier (a2) of the informed activity;
  • informant: the identifier (a1) of the informant activity;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this communication.
Example 22

Consider two activities a1 and a2, the former performed by a government agency, and the latter by a driver caught speeding.

activity(a1, [ prov:type="traffic regulations enforcing" ])
activity(a2, [ prov:type="fine paying" ])
wasInformedBy(a2, a1)

The last line indicates that some implicit entity was generated by a1 and used by a2; this entity may be a traffic ticket that had a notice of fine, amount, and payment mailing details.
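
The link between communication, generation, and usage can be illustrated with a small Python sketch. It assumes the converse reading of the definition above, namely that a generation of an entity by a1 and a usage of that same entity by a2 justify recording wasInformedBy(a2, a1). The function name and the tuple layouts are hypothetical choices made for this illustration.

```python
def infer_communications(generations, usages):
    """Return (informed, informant) pairs implied by shared entities.

    generations: iterable of (entity, activity) pairs, as in wasGeneratedBy(e, a)
    usages:      iterable of (activity, entity) pairs, as in used(a, e)
    """
    # Index the activities that generated each entity.
    producers = {}
    for entity, activity in generations:
        producers.setdefault(entity, set()).add(activity)

    informed = set()
    for activity, entity in usages:
        for producer in producers.get(entity, ()):
            # Corresponds to wasInformedBy(a2, a1): a2 used what a1 generated.
            informed.add((activity, producer))
    return sorted(informed)


# Hypothetical explicit version of the traffic ticket scenario.
print(infer_communications([("ticket", "a1")], [("a2", "ticket")]))
# prints: [('a2', 'a1')]
```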

5.1.6 Start

Start is when an activity is deemed to have been started by an entity, known as trigger. The activity did not exist before its start. Any usage, generation, or invalidation involving an activity follows the activity's start. A start may refer to a trigger entity that set off the activity, or to an activity, known as starter, that generated the trigger.

Given that a start is when an activity is deemed to have started, it is instantaneous.

An activity start, written wasStartedBy(id; a2, e, a1, t, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for the activity start;
  • activity: an identifier (a2) for the started activity;
  • trigger: an OPTIONAL identifier (e) for the entity triggering the activity;
  • starter: an OPTIONAL identifier (a1) for the activity that generated the (possibly unspecified) entity (e);
  • time: the OPTIONAL time (t) at which the activity was started;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this activity start.

While each of id, trigger, starter, time, and attributes is OPTIONAL, at least one of them MUST be present.

Example 23

The following example contains the description of an activity a1 (a discussion), which was started at a specific time, and was triggered by an email message e1.

entity(e1, [ prov:type="email message" ])
activity(a1, [ prov:type="Discuss" ])
wasStartedBy(a1, e1, -, 2011-11-16T16:05:00)

Furthermore, if the message is also an input to the activity, this can be described as follows:

used(a1, e1, -)

Alternatively, one can also describe the activity that generated the email message.

activity(a0, [ prov:type="Write" ])
wasGeneratedBy(e1, a0)
wasStartedBy(a1, e1, a0, 2011-11-16T16:05:00)

If e1 is not known, it would also be valid to write:

wasStartedBy(a1, -, a0, 2011-11-16T16:05:00)
Example 24

In the following example, a race is started by a bang, and responsibility for this trigger is attributed to an agent ex:Bob.

activity(ex:foot_race)
entity(ex:bang)
wasStartedBy(ex:foot_race, ex:bang, -, 2012-03-09T08:05:08-05:00)
agent(ex:Bob)
wasAttributedTo(ex:bang, ex:Bob)
Example 25

In this example, filling the fuel tank was started as a consequence of observing low fuel. The trigger entity is unspecified; it could for instance have been the low fuel warning light, the fuel tank indicator needle position, or the engine not running properly.

activity(ex:filling-fuel)
activity(ex:observing-low-fuel)
agent(ex:driver, [ prov:type='prov:Person' ])
wasAssociatedWith(ex:filling-fuel, ex:driver)
wasAssociatedWith(ex:observing-low-fuel, ex:driver)
wasStartedBy(ex:filling-fuel, -, ex:observing-low-fuel, -)

The relations wasStartedBy and used are orthogonal, and thus need to be expressed independently, according to the situation being described.

5.1.7 End

End is when an activity is deemed to have been ended by an entity, known as trigger. The activity no longer exists after its end. Any usage, generation, or invalidation involving an activity precedes the activity's end. An end may refer to a trigger entity that terminated the activity, or to an activity, known as ender, that generated the trigger.

Given that an end is when an activity is deemed to have ended, it is instantaneous.

An activity end, written wasEndedBy(id; a2, e, a1, t, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for the activity end;
  • activity: an identifier (a2) for the ended activity;
  • trigger: an OPTIONAL identifier (e) for the entity triggering the activity ending;
  • ender: an OPTIONAL identifier (a1) for the activity that generated the (possibly unspecified) entity (e);
  • time: the OPTIONAL time (t) at which the activity was ended;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this activity end.

While each of id, trigger, ender, time, and attributes is OPTIONAL, at least one of them MUST be present.

Example 26

The following example is a description of an activity a1 (editing) that was ended following an approval document e1.

entity(e1, [ prov:type="approval document" ])
activity(a1, [ prov:type="Editing" ])
wasEndedBy(a1, e1, -, -)
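
The definitions of Start and End state that any usage, generation, or invalidation involving an activity follows its start and precedes its end. The following Python sketch shows one way a tool might flag events that fall outside that window. It is an illustration under stated assumptions: the function name and the event labels are hypothetical, times are ISO 8601 strings without time zone, and events occurring exactly at the start or end time are treated as permitted.

```python
from datetime import datetime


def out_of_bounds_events(start, end, events):
    """Return the labels of events timed before the start or after the end.

    start, end: ISO 8601 strings for the activity's start and end times
    events:     iterable of (label, time) pairs for usages, generations,
                or invalidations involving the activity
    """
    started = datetime.fromisoformat(start)
    ended = datetime.fromisoformat(end)
    return [label for label, time in events
            if not (started <= datetime.fromisoformat(time) <= ended)]


# Hypothetical activity running from 16:05:00 to 16:06:00.
events = [("usage of e1", "2011-11-16T16:00:00"),
          ("generation of e2", "2011-11-16T16:05:30")]
print(out_of_bounds_events("2011-11-16T16:05:00", "2011-11-16T16:06:00", events))
# prints: ['usage of e1']
```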

5.1.8 Invalidation

Invalidation is the start of the destruction, cessation, or expiry of an existing entity by an activity. The entity is no longer available for use (or further invalidation) after invalidation. Any generation or usage of an entity precedes its invalidation.

Given that an invalidation is the start of destruction, cessation, or expiry, it is instantaneous.

Entities have a duration. Generation marks the beginning of an entity, whereas invalidation marks its end. An entity's lifetime can end for different reasons:

  • an entity was destroyed: e.g. a painting was destroyed by fire; a Web page is taken out of a site;
  • an entity was consumed: e.g. Bob ate all his soup, Alice ran out of gas when driving to work;
  • an entity expires: e.g. a "buy one beer, get one free" offer is valid during happy hour (7-8pm);
  • an entity is time limited: e.g. the BBC news site on April 3rd, 2012;
  • an entity attribute is changing: e.g. the traffic light changed from green to red.

In the first two cases, the entity has physically disappeared after its termination: there is no more soup, or painting. In the third case, there may be an "offer voucher" that still exists, but it is no longer valid; likewise, on April 4th, the BBC news site still exists but it is not the same entity as the BBC news Web site on April 3rd; or the green traffic light (an entity with a fixed aspect green light) became the red traffic light (another entity with a fixed aspect red light).

Invalidation, written wasInvalidatedBy(id; e, a, t, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for an invalidation;
  • entity: an identifier for the invalidated entity;
  • activity: an OPTIONAL identifier for the activity that invalidated the entity;
  • time: an OPTIONAL "invalidation time", the time at which the entity began to be invalidated;
  • attributes: an OPTIONAL set of attribute-value pairs representing additional information about this invalidation.

While each of id, activity, time, and attributes is OPTIONAL, at least one of them MUST be present.

Example 27

The Painter, a Picasso painting, is known to have been destroyed in a plane accident.

entity(ex:The-Painter)
agent(ex:Picasso)
wasAttributedTo(ex:The-Painter, ex:Picasso)
activity(ex:crash)
wasInvalidatedBy(ex:The-Painter, ex:crash, 1998-09-03T01:31:00, [ ex:circumstances="plane accident" ])
Example 28

The BBC news home page on 2012-04-03, ex:bbcNews2012-04-03, contained a reference to a given news item bbc:news/uk-17595024, but the BBC news home page on the next day did not.

entity(ex:bbcNews2012-04-03)
hadMember(ex:bbcNews2012-04-03, bbc:news/uk-17595024)
wasGeneratedBy  (ex:bbcNews2012-04-03, -, 2012-04-03T00:00:01)
wasInvalidatedBy(ex:bbcNews2012-04-03, -, 2012-04-03T23:59:59)

We refer to Example 43 for further descriptions of the BBC Web site, and to Section 5.6.2 for a description of the relation hadMember.
Example 29

In this example, the "buy one beer, get one free" offer expired at the end of the happy hour.

entity(buy_one_beer_get_one_free_offer_during_happy_hour)
wasAttributedTo(buy_one_beer_get_one_free_offer_during_happy_hour, proprietor)
wasInvalidatedBy(buy_one_beer_get_one_free_offer_during_happy_hour,
                 -, 2012-03-10T18:00:00)

In contrast, in the following descriptions, Bob redeemed the offer 45 minutes before it expired, and got two beers.

entity(buy_one_beer_get_one_free_offer_during_happy_hour)
wasAttributedTo(buy_one_beer_get_one_free_offer_during_happy_hour, proprietor)
activity(redeemOffer)
entity(twoBeers)
wasAssociatedWith(redeemOffer, bob)
used(redeemOffer,
     buy_one_beer_get_one_free_offer_during_happy_hour,
     2012-03-10T17:15:00)
wasInvalidatedBy(buy_one_beer_get_one_free_offer_during_happy_hour,
                 redeemOffer,
                 2012-03-10T17:15:00)
wasGeneratedBy(twoBeers, redeemOffer)

We see that the offer was both used to be converted into twoBeers and invalidated by the redeemOffer activity: in other words, the combined usage and invalidation indicate consumption of the offer.
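
The observation that a combined usage and invalidation by the same activity indicates consumption can be sketched as a simple query. The Python below is a hedged illustration only: the function name and tuple layouts are hypothetical, and `None` stands for an invalidation whose activity is left unspecified (the `-` marker in PROV-N).

```python
def consumed_entities(usages, invalidations):
    """Return entities that were both used and invalidated by one activity.

    usages:        iterable of (activity, entity) pairs, as in used(a, e)
    invalidations: iterable of (entity, activity) pairs, as in
                   wasInvalidatedBy(e, a); activity may be None
    """
    used_pairs = set(usages)
    return sorted(entity for entity, activity in invalidations
                  if activity is not None and (activity, entity) in used_pairs)


# The offer redeemed by Bob is consumed; the expired one is not.
print(consumed_entities([("redeemOffer", "offer")], [("offer", "redeemOffer")]))
# prints: ['offer']
print(consumed_entities([], [("offer", None)]))
# prints: []
```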

5.2 Component 2: Derivations

The second component of PROV-DM is concerned with derivations of entities from other entities and derivation subtypes WasRevisionOf (Revision), WasQuotedFrom (Quotation), and HasPrimarySource (Primary Source). Figure 6 depicts the second component with PROV core structures in the yellow area, including two classes (Entity, Activity) and binary association WasDerivedFrom (Derivation). PROV extended structures are found outside this area. UML association classes express expanded n-ary relations. The subclasses are marked by the UML stereotype "prov:type" to indicate that the corresponding types are valid values for the attribute prov:type.

Figure 6: Derivation Component Overview (Informative)

5.2.1 Derivation

A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

According to Section 2, for an entity to be transformed from, created from, or resulting from an update to another, there must be some underpinning activity or activities performing the necessary action(s) resulting in such a derivation. A derivation can be described at various levels of precision. In its simplest form, derivation relates two entities. Optionally, attributes can be added to represent further information about the derivation. If the derivation is the result of a single known activity, then this activity can also be optionally expressed. To provide a completely accurate description of the derivation, the generation and usage of the generated and used entities, respectively, can be provided, so as to make the derivation path, through usage, activity, and generation, explicit. Optional information such as activity, generation, and usage can be linked to derivations to aid analysis of provenance and to facilitate provenance-based reproducibility.

A derivation, written wasDerivedFrom(id; e2, e1, a, g2, u1, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for a derivation;
  • generatedEntity: the identifier (e2) of the entity generated by the derivation;
  • usedEntity: the identifier (e1) of the entity used by the derivation;
  • activity: an OPTIONAL identifier (a) for the activity using and generating the above entities;
  • generation: an OPTIONAL identifier (g2) for the generation involving the generated entity (e2) and activity (a);
  • usage: an OPTIONAL identifier (u1) for the usage involving the used entity (e1) and activity (a);
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this derivation.
Example 30

The following descriptions are about derivations between e2 and e1, but no information is provided as to the identity of the activity (and usage and generation) underpinning the derivation. In the second line, a type attribute is also provided.

wasDerivedFrom(e2, e1)
wasDerivedFrom(e2, e1, [ prov:type="physical transform" ])

The following description expresses that activity a, using the entity e1 according to usage u1, derived the entity e2 and generated it according to generation g2. It is followed by descriptions for generation g2 and usage u1.

wasDerivedFrom(e2, e1, a, g2, u1)
wasGeneratedBy(g2; e2, a, -)
used(u1; a, e1, -)

With such a comprehensive description of derivation, a program that analyzes provenance can identify the activity underpinning the derivation. It can also identify how the preceding entity e1 was used by the activity (for instance, which argument it was passed as, if the activity is the result of a function invocation), and which output the derived entity e2 was obtained from (say, for a function returning multiple results).
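
As a rough illustration of the kind of analysis described above, the following Python sketch resolves the generation and usage identifiers of a derivation and checks that they agree with its entities and activity. The function name, the tuple layout, and the dictionaries keyed by identifier are assumptions made for this example; they are not defined by PROV-DM.

```python
def derivation_path(derivation, generations, usages):
    """Return [used entity, activity, generated entity] if the description is consistent.

    derivation:  (e2, e1, a, g2, u1), as in wasDerivedFrom(e2, e1, a, g2, u1)
    generations: dict from generation id to (entity, activity)
    usages:      dict from usage id to (activity, entity)
    """
    e2, e1, a, g2, u1 = derivation
    # The generation must involve e2 and a; the usage must involve a and e1.
    if generations.get(g2) == (e2, a) and usages.get(u1) == (a, e1):
        return [e1, a, e2]
    return None


# Mirrors wasGeneratedBy(g2; e2, a, -) and used(u1; a, e1, -).
print(derivation_path(("e2", "e1", "a", "g2", "u1"),
                      {"g2": ("e2", "a")},
                      {"u1": ("a", "e1")}))
# prints: ['e1', 'a', 'e2']
```

Returning `None` for a missing or mismatched generation or usage is a design choice of this sketch; a real tool might instead report which identifier failed to resolve.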

5.2.2 Revision

A revision is a derivation for which the resulting entity is a revised version of some original.

The implication here is that the resulting entity contains substantial content from the original. A revision relation is a kind of derivation relation from a revised entity to a preceding entity. The type of a revision relation is denoted by: prov:Revision. PROV defines no revision-specific attributes.

Example 31

Revisiting the example of Section 4.2, we can now state that the report tr:WD-prov-dm-20111215 was a revision of the report tr:WD-prov-dm-20111018.

entity(tr:WD-prov-dm-20111215, [ prov:type='rec54:WD' ])
entity(tr:WD-prov-dm-20111018, [ prov:type='rec54:WD' ])
wasDerivedFrom(tr:WD-prov-dm-20111215,
               tr:WD-prov-dm-20111018,
               [ prov:type='prov:Revision' ])

5.2.3 Quotation

A quotation is the repeat of (some or all of) an entity, such as text or image, by someone who may or may not be its original author.

A quotation relation is a kind of derivation relation, for which an entity was derived from a preceding entity by copying, or "quoting", some or all of it. The type of a quotation relation is denoted by: prov:Quotation. PROV defines no quotation-specific attributes.

Example 32

The following paragraph is a quote from one of the author's blogs.

"During the workshop, it became clear to me that the consensus based models (which are often graphical in nature) can not only be formalized but also be directly connected to these database focused formalizations. I just needed to get over the differences in syntax. This could imply that we could have nice way to trace provenance across systems and through databases and be able to understand the mathematical properties of this interconnection."

If wp:thoughts-from-the-dagstuhl-principles-of-provenance-workshop/ denotes the original blog by agent ex:Paul, and dm:bl-dagstuhl denotes the above paragraph, then the following descriptions express that the above paragraph was copied by agent ex:Luc from a part of the blog, attributed to the agent ex:Paul.

entity(wp:thoughts-from-the-dagstuhl-principles-of-provenance-workshop/)
entity(dm:bl-dagstuhl)
agent(ex:Luc)
agent(ex:Paul)
wasDerivedFrom(dm:bl-dagstuhl,
               wp:thoughts-from-the-dagstuhl-principles-of-provenance-workshop/,
               [ prov:type='prov:Quotation' ])
wasAttributedTo(dm:bl-dagstuhl, ex:Luc)
wasAttributedTo(wp:thoughts-from-the-dagstuhl-principles-of-provenance-workshop/, ex:Paul)

5.2.4 Primary Source

A primary source for a topic refers to something produced by some agent with direct experience and knowledge about the topic, at the time of the topic's study, without benefit from hindsight.

Because of the directness of primary sources, they "speak for themselves" in ways that cannot be captured through the filter of secondary sources. As such, it is important for secondary sources to reference those primary sources from which they were derived, so that their reliability can be investigated.

It is also important to note that a given entity might be a primary source for one entity but not another. This is the reason why Primary Source is defined as a relation as opposed to a subtype of Entity.

A primary source relation is a kind of derivation relation from secondary materials to their primary sources. It is recognized that the determination of primary sources can be up to interpretation, and should be done according to conventions accepted within the application's domain. The type of a primary source relation is denoted by: prov:PrimarySource. PROV defines no attributes specific to primary source.

Example 33

Let us consider Charles Joseph Minard's flow map of Napoleon's March in 1812, which was published in 1869. Although the map is not a primary source, Minard probably used the journal of Pierre-Irénée Jacob, pharmacist to Napoleon's army during the Russian campaign. This primary source relation can be encoded as follows.

entity(ex:la-campagne-de-Russie-1812-1813, [ prov:type="map" ])
entity(ex:revue-d-Histoire-de-la-Pharmacie-t-XVIII, [ prov:type="journal" ])
wasDerivedFrom(ex:la-campagne-de-Russie-1812-1813,
               ex:revue-d-Histoire-de-la-Pharmacie-t-XVIII,
               [ prov:type='prov:PrimarySource' ])

5.3 Component 3: Agents, Responsibility, and Influence

The third component of PROV-DM, depicted in Figure 7, is concerned with agents and the relations WasAttributedTo (Attribution), WasAssociatedWith (Association), and ActedOnBehalfOf (Delegation), relating agents to entities, activities, and agents, respectively. Core structures are displayed in the yellow area and include three classes and three binary associations. Outside the yellow area, extended structures comprise UML association classes to express expanded n-ary relations, and subclasses Plan, Person, SoftwareAgent, and Organization. The subclasses are marked by the UML stereotype "prov:type" to indicate that these are valid values for the attribute prov:type.

Figure 7: Agents and Responsibility Overview (Informative)

Component 3 further defines a general notion of influence, a relation implied by all relations defined so far. Figure 8 displays one new association class, generalizing previously introduced associations.

Figure 8: Influence Overview (Informative)

5.3.1 Agent

An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.

An agent may be a particular type of entity or activity. This means that the model can be used to express provenance of the agents themselves.

An agent, written agent(id, [attr1=val1, ...]) in PROV-N, has:
  • id: an identifier for an agent;
  • attributes: a set of attribute-value pairs ((attr1,val1), ...) representing additional information about this agent.

It is useful to define some basic categories of agents from an interoperability perspective. There are three types of agents that are common across most anticipated domains of use; it is acknowledged that these types do not cover all kinds of agent.

  • SoftwareAgent

    Asoftware agent is running software. The type of a software agent is denoted byprov:SoftwareAgent.

  • Organization

    Anorganization is a social or legal institution such as a company, society, etc. The type of an organization agent is denoted byprov:Organization.

  • Person

    Person agents are people. The type of a person agent is denoted byprov:Person.

PROV defines no attributes specific to SoftwareAgent, Organization, and Person.

Example 34

The following expression is about an agent identified by e1, which is a person, named Alice, with employee number 1234.

agent(e1, [ex:employee="1234", ex:name="Alice", prov:type='prov:Person' ])

It is optional to specify the type of an agent. When present, it is expressed using the prov:type attribute.

5.3.2 Attribution

Attribution is the ascribing of an entity to an agent.

When an entity e is attributed to agent ag, entity e was generated by some unspecified activity that in turn was associated to agent ag. Thus, this relation is useful when the activity is not known, or irrelevant.

An attribution relation, written wasAttributedTo(id; e, ag, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for the relation;
  • entity: an entity identifier (e);
  • agent: the identifier (ag) of the agent whom the entity is ascribed to, and therefore bears some responsibility for its existence;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this attribution.
Example 35

Revisiting the example of Section 4.1, we can ascribe tr:WD-prov-dm-20111215 to some agents without an explicit activity.

agent(ex:Paolo, [ prov:type='prov:Person' ])
agent(ex:Simon, [ prov:type='prov:Person' ])
entity(tr:WD-prov-dm-20111215, [ prov:type='rec54:WD' ])
wasAttributedTo(tr:WD-prov-dm-20111215, ex:Paolo, [ prov:type="editorship" ])
wasAttributedTo(tr:WD-prov-dm-20111215, ex:Simon, [ prov:type="authorship" ])

5.3.3 Association

An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity. It further allows for a plan to be specified, which is the plan intended by the agent to achieve some goals in the context of this activity.

An association, written wasAssociatedWith(id; a, ag, pl, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for the association between an activity and an agent;
  • activity: an identifier (a) for the activity;
  • agent: an OPTIONAL identifier (ag) for the agent associated with the activity;
  • plan: an OPTIONAL identifier (pl) for the plan the agent relied on in the context of this activity;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this association of this activity with this agent.

While each of id, agent, plan, and attributes is OPTIONAL, at least one of them MUST be present.
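
The presence rule for associations can be checked mechanically. The following is a minimal Python sketch and is not part of PROV-DM: the Association class and its field names are hypothetical and simply mirror the elements of wasAssociatedWith.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Association:
    """Hypothetical in-memory form of wasAssociatedWith(id; a, ag, pl, attrs)."""

    activity: str                 # mandatory activity identifier (a)
    id: Optional[str] = None      # OPTIONAL identifier of the association
    agent: Optional[str] = None   # OPTIONAL agent identifier (ag)
    plan: Optional[str] = None    # OPTIONAL plan identifier (pl)
    attributes: Dict[str, str] = field(default_factory=dict)  # OPTIONAL attrs

    def __post_init__(self) -> None:
        # At least one of id, agent, plan, and attributes must be present.
        if (self.id is None and self.agent is None
                and self.plan is None and not self.attributes):
            raise ValueError(
                "association needs at least one of id, agent, plan, attributes"
            )
```

Under these assumptions, Association(activity="ex:a", plan="ex:wf") is accepted (a plan without a named agent), whereas Association(activity="ex:a") raises ValueError.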

A plan is an entity that represents a set of actions or steps intended by one or more agents to achieve some goals. The type of a Plan entity is denoted by prov:Plan.

PROV defines no plan-specific attributes.

Example 36

In the following example, a designer agent and an operator agent are associated with an activity. The designer's goals are achieved by a workflow ex:wf, described as an entity of type plan.

activity(ex:a, [ prov:type="workflow execution" ])
agent(ex:ag1,  [ prov:type="operator" ])
agent(ex:ag2,  [ prov:type="designer" ])
wasAssociatedWith(ex:a, ex:ag1, -,     [ prov:role="loggedInUser", ex:how="webapp" ])
wasAssociatedWith(ex:a, ex:ag2, ex:wf, [ prov:role="designer", ex:context="project1" ])
entity(ex:wf, [ prov:type='prov:Plan' ,
                ex:label="Workflow 1",
                prov:location="http://example.org/workflow1.bpel" %% xsd:anyURI ])

Since the workflow ex:wf is itself an entity, its provenance can also be expressed in PROV: it can be generated by some activity and derived from other entities, for instance.
Example 37

In some cases, one wants to indicate a plan was followed, without having to specify which agent was involved.

activity(ex:a, [ prov:type="workflow execution" ])
wasAssociatedWith(ex:a, -, ex:wf)
entity(ex:wf, [ prov:type='prov:Plan',
                ex:label="Workflow 1",
                ex:url="http://example.org/workflow1.bpel" %% xsd:anyURI])
In this case, it is assumed that an agent exists, but it has not been specified.

5.3.4 Delegation

Delegation is the assignment of authority and responsibility to an agent (by itself or by another agent) to carry out a specific activity as a delegate or representative, while the agent it acts on behalf of retains some responsibility for the outcome of the delegated work.

For example, a student acted on behalf of his or her supervisor, who acted on behalf of the department chair, who acted on behalf of the university; all those agents are responsible in some way for the activity that took place but we do not say explicitly who bears responsibility and to what degree.

A delegation link, written actedOnBehalfOf(id; ag2, ag1, a, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier for the delegation link between delegate and responsible;
  • delegate: an identifier (ag2) for the agent associated with an activity, acting on behalf of the responsible agent;
  • responsible: an identifier (ag1) for the agent, on behalf of which the delegate agent acted;
  • activity: an OPTIONAL identifier (a) of an activity for which the delegation link holds;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this delegation link.
Example 38

The following fragment describes three agents: a programmer, a researcher, and a funder. The programmer and researcher are associated with a workflow activity. The programmer acts on behalf of the researcher (line-management) encoding the commands specified by the researcher; the researcher acts on behalf of the funder, who has a contractual agreement with the researcher. The terms 'line-management' and 'contract' used in this example are domain specific.

activity(a,[ prov:type="workflow" ])
agent(ag1, [ prov:type="programmer" ])
agent(ag2, [ prov:type="researcher" ])
agent(ag3, [ prov:type="funder" ])
wasAssociatedWith(a, ag1, [ prov:role="loggedInUser" ])
wasAssociatedWith(a, ag2)
wasAssociatedWith(a, ag3)
actedOnBehalfOf(ag1, ag2, a, [ prov:type="line-management" ])
actedOnBehalfOf(ag2, ag3, a, [ prov:type="contract" ])
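
An application may want to list the agents on whose behalf a given agent acted for an activity. The sketch below, in Python, walks the actedOnBehalfOf links of the programmer, researcher, and funder fragment. It is an illustration only: PROV-DM does not define this traversal, and the function name, the tuple layout, and the choice to follow the first matching link are all assumptions.

```python
from typing import List, Tuple

# (delegate, responsible, activity) triples copied from the delegation example.
DELEGATIONS: List[Tuple[str, str, str]] = [
    ("ag1", "ag2", "a"),
    ("ag2", "ag3", "a"),
]


def responsibility_chain(agent: str, activity: str,
                         delegations: List[Tuple[str, str, str]]) -> List[str]:
    """Follow delegation links for one activity, starting from `agent`.

    Only the first matching link is followed at each step, and agents that
    are already in the chain are skipped so that cycles terminate.
    """
    chain = [agent]
    while True:
        responsible = next(
            (resp for (deleg, resp, act) in delegations
             if deleg == chain[-1] and act == activity and resp not in chain),
            None,
        )
        if responsible is None:
            return chain
        chain.append(responsible)
```

With the data above, responsibility_chain("ag1", "a", DELEGATIONS) yields the programmer, then the researcher, then the funder. It says nothing about the degree of responsibility each agent bears.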

5.3.5 Influence

Influence is the capacity of an entity, activity, or agent to have an effect on the character, development, or behavior of another by means of usage, start, end, generation, invalidation, communication, derivation, attribution, association, or delegation.

An influence relation between two objects o2 and o1 is a generic dependency of o2 on o1 that signifies some form of influence of o1 on o2.

An Influence relation, written wasInfluencedBy(id; o2, o1, attrs) in PROV-N, has:
  • id: an OPTIONAL identifier identifying the relation;
  • influencee: an identifier (o2) for an entity, activity, or agent;
  • influencer: an identifier (o1) for an ancestor entity, activity, or agent that the former depends on;
  • attributes: an OPTIONAL set (attrs) of attribute-value pairs representing additional information about this relation.

A usage, start, end, generation, invalidation, communication, derivation, attribution, association, and delegation is also an influence. It is RECOMMENDED to adopt these more specific relations when writing provenance descriptions. It is anticipated that the Influence relation may be useful to express queries over provenance information.

The following table establishes the correspondence between the attributes influencee and influencer, and attributes of Usage, Start, End, Generation, Invalidation, Communication, Derivation, Attribution, Association, and Delegation.
Table 7: Mapping Relations to Influence
Relation Name | influencee | influencer
Generation | entity | activity
Usage | activity | entity
Communication | informed | informant
Start | activity | trigger
End | activity | trigger
Invalidation | entity | activity
Derivation | generatedEntity | usedEntity
Attribution | entity | agent
Association | activity | agent
Delegation | delegate | responsible
Example 39

We refer to the example of Section 4.2, and specifically to Figure 3. We could have expressed the influence of w3:Consortium on tr:WD-prov-dm-20111215 as follows.

 wasInfluencedBy(tr:WD-prov-dm-20111215, w3:Consortium)
Instead, it is recommended to express the more specific description:
 wasAttributedTo(tr:WD-prov-dm-20111215, w3:Consortium)
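
The mapping of Table 7 lends itself to a lookup table, which can be handy when querying over the generic Influence relation. The following Python sketch is illustrative only; the dictionary layout, the as_influence function, and the representation of a relation as a plain dictionary are assumptions, not part of PROV-DM.

```python
from typing import Dict, Tuple

# Relation name -> (attribute playing influencee, attribute playing influencer),
# transcribed from Table 7.
INFLUENCE_ROLES: Dict[str, Tuple[str, str]] = {
    "Generation": ("entity", "activity"),
    "Usage": ("activity", "entity"),
    "Communication": ("informed", "informant"),
    "Start": ("activity", "trigger"),
    "End": ("activity", "trigger"),
    "Invalidation": ("entity", "activity"),
    "Derivation": ("generatedEntity", "usedEntity"),
    "Attribution": ("entity", "agent"),
    "Association": ("activity", "agent"),
    "Delegation": ("delegate", "responsible"),
}


def as_influence(relation: str, record: Dict[str, str]) -> Tuple[str, str]:
    """Return the (influencee, influencer) pair implied by a specific relation."""
    influencee_key, influencer_key = INFLUENCE_ROLES[relation]
    return record[influencee_key], record[influencer_key]
```

For instance, an Attribution with entity tr:WD-prov-dm-20111215 and agent w3:Consortium maps to the pair (tr:WD-prov-dm-20111215, w3:Consortium), matching the wasInfluencedBy expression of Example 39.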

5.4 Component 4: Bundles

The fourth component of PROV-DM is concerned with bundles, a mechanism to support provenance of provenance. Figure 9 depicts a UML class diagram for the fourth component. It comprises a Bundle class defined as a subclass of Entity.

Figure 9: Bundle Component Overview (Informative)

5.4.1 Bundle constructor

A bundle is a named set of provenance descriptions, and is itself an entity, so allowing provenance of provenance to be expressed.

A bundle constructor allows the content and the name of a bundle to be specified; it is written bundle id description_1 ... description_n endBundle and consists of:
  • id: an identifier for the bundle;
  • descriptions: a set of provenance descriptions description_1, ..., description_n.

A bundle's identifier id identifies a unique set of descriptions.

There may be other kinds of bundles not directly expressible by this constructor, such as provenance descriptions handwritten on a letter or a whiteboard, etc. Whatever the means by which bundles are expressed, all can be described, as in the following section.

5.4.2 Bundle Type

A bundle is a named set of descriptions, but it is also an entity so that its provenance can be described.

PROV defines the following type for bundles:

  • prov:Bundle is the type that denotes Bundle entities.

PROV defines no bundle-specific attributes.

A bundle description is of the form entity(id, [ prov:type='prov:Bundle', attr1=val1, ...] ) where id is an identifier denoting a bundle, a type prov:Bundle and an OPTIONAL set of attribute-value pairs ((attr1,val1), ...) representing additional information about this bundle.

The provenance of provenance can then be described using PROV constructs, as illustrated by Example 40 and Example 41.

Example 40

Let us consider two entities ex:report1 and ex:report2.

 entity(ex:report1, [ prov:type="report", ex:version=1 ])
 wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
 entity(ex:report2, [ prov:type="report", ex:version=2])
 wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
 wasDerivedFrom(ex:report2, ex:report1)

Let us assume that Bob observed the creation of ex:report1. A first bundle can be expressed.

 bundle bob:bundle1
   entity(ex:report1, [ prov:type="report", ex:version=1 ])
   wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
 endBundle

In contrast, Alice observed the creation of ex:report2 and its derivation from ex:report1. A separate bundle can also be expressed.

 bundle alice:bundle2
   entity(ex:report1)
   entity(ex:report2, [ prov:type="report", ex:version=2 ])
   wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
   wasDerivedFrom(ex:report2, ex:report1)
 endBundle

The first bundle contains the descriptions corresponding to Bob observing the creation of ex:report1. Its provenance can be described as follows.

 entity(bob:bundle1, [ prov:type='prov:Bundle' ])
 wasGeneratedBy(bob:bundle1, -, 2012-05-24T10:30:00)
 wasAttributedTo(bob:bundle1, ex:Bob)

In contrast, the second bundle is attributed to Alice who observed the derivation of ex:report2 from ex:report1.

 entity(alice:bundle2, [ prov:type='prov:Bundle' ])
 wasGeneratedBy(alice:bundle2, -, 2012-05-25T11:15:00)
 wasAttributedTo(alice:bundle2, ex:Alice)
Example 41

A provenance aggregator could merge two bundles, resulting in a novel bundle, whose provenance is described as follows.

 bundle agg:bundle3
   entity(ex:report1, [ prov:type="report", ex:version=1 ])
   wasGeneratedBy(ex:report1, -, 2012-05-24T10:00:01)
   entity(ex:report2, [ prov:type="report", ex:version=2 ])
   wasGeneratedBy(ex:report2, -, 2012-05-25T11:00:01)
   wasDerivedFrom(ex:report2, ex:report1)
 endBundle
 entity(agg:bundle3, [ prov:type='prov:Bundle' ])
 agent(ex:aggregator01, [ prov:type='ex:Aggregator' ])
 wasAttributedTo(agg:bundle3, ex:aggregator01)
 wasDerivedFrom(agg:bundle3, bob:bundle1)
 wasDerivedFrom(agg:bundle3, alice:bundle2)

The new bundle is given a new identifier agg:bundle3 and is attributed to the ex:aggregator01 agent.

5.5 Component 5: Alternate Entities

The fifth component of PROV-DM is concerned with relations SpecializationOf (Specialization) and AlternateOf (Alternate) between entities. Figure 10 depicts the fifth component with a single class and two binary associations.

Figure 10: Alternates Component Overview (Informative)

Two provenance descriptions about the same thing may emphasize different aspects of that thing.

Example 42

User Alice writes an article. In its provenance, she wishes to refer to the precise version of the article with a date-specific IRI, as she might edit the article later. Alternatively, user Bob refers to the article in general, independently of its variants over time.

The PROV data model introduces relations, called specialization and alternate, that allow entities to be linked together. They are defined as follows.

5.5.1 Specialization

An entity that is a specialization of another shares all aspects of the latter, and additionally presents more specific aspects of the same thing as the latter. In particular, the lifetime of the entity being specialized contains that of any specialization.

Examples of aspects include a time period, an abstraction, and a context associated with the entity.

A specialization relation, written specializationOf(infra, supra) in PROV-N, has:
  • specificEntity: an identifier (infra) of the entity that is a specialization of the general entity (supra);
  • generalEntity: an identifier (supra) of the entity that is being specialized.

A specialization is not, as defined here, also an influence, and therefore does not have an id and attributes.

Example 43

The BBC news home page on 2012-03-23 ex:bbcNews2012-03-23 is a specialization of the BBC news page in general bbc:news/. This can be expressed as follows.

specializationOf(ex:bbcNews2012-03-23, bbc:news/)

We have created a new qualified name, ex:bbcNews2012-03-23, in the namespace ex, to identify the specific page carrying this day's news, which would otherwise be the generic bbc:news/ page.

5.5.2 Alternate

Two alternate entities present aspects of the same thing. These aspects may be the same or different, and the alternate entities may or may not overlap in time.

An alternate relation, written alternateOf(e1, e2) in PROV-N, has:
  • alternate1: an identifier (e1) of the first of the two entities;
  • alternate2: an identifier (e2) of the second of the two entities.

An alternate is not, as defined here, also an influence, and therefore does not have an id and attributes.

Note that alternateOf is a necessarily very general relationship that, in reasoning, only states that the two alternate entities respectively fix some aspects of some common thing (possibly evolving over time), and so there is some relevant connection between the provenance of the alternates. In a specific application context, alternateOf, or a subtype of it, could allow more inferences.

Example 44

A given news item on the BBC News site, bbc:news/science-environment-17526723 for desktop, is an alternate of bbc:news/mobile/science-environment-17526723 for mobile devices.

entity(bbc:news/science-environment-17526723,
       [ prov:type="a news item for desktop"])
entity(bbc:news/mobile/science-environment-17526723,
       [ prov:type="a news item for mobile devices"])
alternateOf(bbc:news/science-environment-17526723,
            bbc:news/mobile/science-environment-17526723)

Example 45

Consider again the two versions of the technical report, tr:WD-prov-dm-20111215 (second working draft) and tr:WD-prov-dm-20111018 (first working draft). They are alternates of each other.

entity(tr:WD-prov-dm-20111018)
entity(tr:WD-prov-dm-20111215)
alternateOf(tr:WD-prov-dm-20111018, tr:WD-prov-dm-20111215)

They are both specializations of the page http://www.w3.org/TR/prov-dm/.

5.6 Component 6: Collections

The sixth component of PROV-DM is concerned with the notion of collections. A collection is an entity that has some members. The members are themselves entities, and therefore their provenance can be expressed. Some applications need to be able to express the provenance of the collection itself: e.g. who maintains the collection (attribution), which members it contains as it evolves, and how it was assembled. The purpose of Component 6 is to define the types and relations that are useful to express the provenance of collections.

Figure 11 depicts the sixth component with two new classes (Collection, Empty Collection) and one association HadMember (Membership).

Figure 11: Collections Component Overview (Informative)

5.6.1 Collection

A collection is an entity that provides a structure to some constituents that must themselves be entities. These constituents are said to be members of the collection. An empty collection is a collection without members.

PROV-DM defines the following types related to collections:

  • prov:Collection denotes an entity of type Collection, i.e. an entity that can participate in relations amongst collections;
  • prov:EmptyCollection denotes an empty collection.

PROV defines no collection-specific attributes.

Example 46
entity(c0, [ prov:type='prov:EmptyCollection' ])  // c0 is an empty collection
entity(c1, [ prov:type='prov:Collection'  ])      // c1 is a collection, with unknown content

5.6.2 Membership

A membership relation is defined for stating the members of a Collection.

Membership is the belonging of an entity to a collection.

A membership relation, written hadMember(c, e), has:
  • collection: an identifier (c) for the collection whose member is asserted;
  • entity: the identifier e of an entity that is member of the collection.

Membership is not, as defined here, also an influence, and therefore does not have an id and attributes.

Example 47

In this example, c is a collection known to have e0, e1, and e2 as members, and may have other members.

entity(e0)
entity(e1)
entity(e2)
entity(c, [prov:type='prov:Collection'  ])      // c is a collection, with unknown content
hadMember(c, e0)
hadMember(c, e1)
hadMember(c, e2)
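
A consumer of such descriptions might group hadMember statements by collection. The Python sketch below does so for the statements of this example; the function name and the pair representation are hypothetical. Note that the result lists only the known members, since a collection may have members that are not asserted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def known_members(had_member: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (collection, entity) pairs from hadMember(c, e) by collection."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for collection, entity in had_member:
        members[collection].add(entity)
    return dict(members)


# The three membership statements of the example above.
statements = [("c", "e0"), ("c", "e1"), ("c", "e2")]
members_of = known_members(statements)
```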

5.7 Further Elements of PROV-DM

This section introduces further elements of PROV-DM.

5.7.1 Identifier

An identifier is a qualified name.

Entity, Activity, and Agent have a mandatory identifier. Two entities (resp. activities, agents) are equal if they have the same identifier.

Generation, Usage, Communication, Start, End, Invalidation, Derivation, Attribution, Association, Delegation, and Influence have an optional identifier. Two generations (resp. usages, communications, etc.) are equal if they have the same identifier.

5.7.2 Attribute

An attribute is a qualified name.

The PROV data model introduces a pre-defined set of attributes in the PROV namespace, which we define below. This specification does not provide any interpretation for any attribute declared in any other namespace.

Table 8: PROV-DM Attributes At a Glance
Attribute | Allowed In | Value | Section
prov:label | any construct | A Value of type xsd:string | Section 5.7.2.1
prov:location | Entity, Activity, Agent, Usage, Generation, Invalidation, Start, and End | A Value | Section 5.7.2.2
prov:role | Usage, Generation, Invalidation, Association, Start, and End | A Value | Section 5.7.2.3
prov:type | any construct | A Value | Section 5.7.2.4
prov:value | Entity | A Value | Section 5.7.2.5

5.7.2.1 prov:label

The attribute prov:label provides a human-readable representation of an instance of a PROV-DM type or relation. The value associated with the attribute prov:label MUST be a string.

Example 48

The following entity is provided with a label attribute.

 entity(ex:e1, [ prov:label="This is a human-readable label" ])

The following entity has two label attributes, in French and English.

 entity(ex:car01, [ prov:label="Voiture 01"@fr, prov:label="Car 01"@en ])
5.7.2.2 prov:location

A location can be an identifiable geographic place (ISO 19112), but it can also be a non-geographic place such as a directory, row, or column. As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, and so forth. This document does not specify how to concretely express locations, but instead provides a mechanism to introduce locations, by means of a reserved attribute.

The attribute prov:location is an OPTIONAL attribute of Entity, Activity, Agent, Usage, Generation, Invalidation, Start, and End. The value associated with the attribute prov:location MUST be a PROV-DM Value, expected to denote a location.

While the attributeprov:location is allowed for several PROV concepts, it may not make sense to use it in some cases. For example, an activity that describes the relocation of an entity will have start and end locations, as well as every place in between those points.

Example 49

The following expression describes entity Mona Lisa, a painting, with a location attribute.

 entity(ex:MonaLisa, [ prov:location="Le Louvre, Paris", prov:type="StillImage" ])

The following expression describes a cell, at coordinates (5,5), with value 10.

 entity(ex:cell, [ prov:location="(5,5)", prov:value="10" %% xsd:integer ])
5.7.2.3 prov:role

A role is the function of an entity or agent with respect to an activity, in the context of a usage, generation, invalidation, association, start, and end.

The attribute prov:role is allowed to occur multiple times in a list of attribute-value pairs. The value associated with a prov:role attribute MUST be a PROV-DM Value.

Example 50

The following activity is associated with an agent acting as the operator.

 wasAssociatedWith(a, ag, [ prov:role="operator" ])

In the following expression, the activity ex:div01 used entity ex:cell in the role of divisor.

used(ex:div01, ex:cell, [ prov:role="divisor" ])
5.7.2.4 prov:type

The attribute prov:type provides further typing information for any construct with an optional set of attribute-value pairs.

PROV-DM liberally defines a type as a category of things having common characteristics. PROV-DM is agnostic about the representation of types, and only states that the value associated with a prov:type attribute MUST be a PROV-DM Value. The attribute prov:type is allowed to occur multiple times.

Example 51

The following describes an agent of type software agent.

   agent(ag, [ prov:type='prov:SoftwareAgent' ])

The following types are pre-defined in PROV, and are valid values for the prov:type attribute.

Table 9: PROV-DM Predefined Types
Type | Specification | Core concept
prov:Bundle | Section 5.4.1 | Entity
prov:Collection | Section 5.6.1 | Entity
prov:EmptyCollection | Section 5.6.1 | Entity
prov:Organization | Section 5.3.1 | Agent
prov:Person | Section 5.3.1 | Agent
prov:Plan | Section 5.3.3 | Entity
prov:PrimarySource | Section 5.2.4 | Derivation
prov:Quotation | Section 5.2.3 | Derivation
prov:Revision | Section 5.2.2 | Derivation
prov:SoftwareAgent | Section 5.3.1 | Agent

5.7.2.5 prov:value

The attribute prov:value provides a value that is a direct representation of an entity as a PROV-DM Value.

The attribute prov:value is an OPTIONAL attribute of entity. The value associated with the attribute prov:value MUST be a PROV-DM Value. The attribute prov:value MAY occur at most once in a set of attribute-value pairs.
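
Unlike prov:type and prov:role, which may be repeated, prov:value is limited to a single occurrence. A tool could check this as in the following Python sketch, where the list-of-pairs representation of attributes and the function name are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def has_at_most_one_value(attributes: Sequence[Tuple[str, object]]) -> bool:
    """Check that prov:value occurs at most once among attribute-value pairs."""
    occurrences = sum(1 for name, _ in attributes if name == "prov:value")
    return occurrences <= 1
```

A list with two prov:type pairs passes the check, while a list with two prov:value pairs does not.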

Example 52

The following example illustrates the provenance of the number 4 obtained by an activity that computed the length of an input string "abcd". The input and the output are expressed as entities ex:in and ex:out, respectively. They each have a prov:value attribute associated with the corresponding value.

entity(ex:in, [ prov:value="abcd" ])
entity(ex:out, [ prov:value=4 ])
activity(ex:len, [ prov:type="string-length" ])
used(ex:len, ex:in)
wasGeneratedBy(ex:out, ex:len)
wasDerivedFrom(ex:out, ex:in)

Two different entities MAY have the same value for the attribute prov:value. For instance, two entities with the same prov:value may be generated by two different activities, as illustrated by the following example.

Example 53

Example 52 illustrates an entity with a given value 4. This example shows that another entity with the same value may be computed differently (by an addition).

entity(ex:in1, [ prov:value=3 ])
entity(ex:in2, [ prov:value=1 ])
entity(ex:out2, [ prov:value=4 ])      // ex:out2 also has value 4
activity(ex:add1, [ prov:type="addition" ])
used(ex:add1, ex:in1)
used(ex:add1, ex:in2)
wasGeneratedBy(ex:out2, ex:add1)

5.7.3 Value

A value is a constant such as a string, number, time, qualified name, IRI, and encoded binary data, whose interpretation is outside the scope of PROV. Values can occur in attribute-value pairs.

Each kind of such values is called a datatype. Use of the following data types is RECOMMENDED.

  • The RDF-compatible [RDF-CONCEPTS] types, including those taken from the set of XML Schema Datatypes [XMLSCHEMA11-2];
  • Qualified names introduced in this specification.

The normative definitions of these datatypes are provided by their respective specifications.

Conformance to RDF Datatypes: As of the publication of this document, RDF 1.1 Concepts and Abstract Syntax [RDF-CONCEPTS11] is not yet a W3C Recommendation (see http://www.w3.org/TR/rdf11-concepts/ for the latest version). Both the Provenance Working Group and the RDF Working Group are confident that there will be only minor changes before it becomes a W3C Recommendation. In order to take advantage of the anticipated corrections and new features sooner, while also providing stability in case the specification does not advance as expected, conformance to PROV as it relates to RDF Datatypes is defined as follows:

This "change in normative reference" is effective as of the publication of RDF 1.1 as a W3C Recommendation. However, W3C expects to publish a new edition of PROV once RDF 1.1 becomes a Recommendation to update the reference explicitly.

Example 54

The following examples respectively are the string "abc", the integer number 1, and the IRI "http://example.org/foo".

  "abc"
  "1" %% xsd:integer
  "http://example.org/foo" %% xsd:anyURI

The following example shows a value of type prov:QUALIFIED_NAME (see prov:QUALIFIED_NAME [PROV-N]). The prefix ex must be bound to a namespace declared in a namespace declaration.

   "ex:value" %% prov:QUALIFIED_NAME
Alternatively, the same value can be expressed using the following convenience notation.
   'ex:value'

We note that PROV time instants are defined according to xsd:dateTime [XMLSCHEMA11-2].

Example 55

In the following example, the generation time of entity e1 is expressed according to xsd:dateTime [XMLSCHEMA11-2].

   wasGeneratedBy(e1,a1, 2001-10-26T21:32:52)
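
For readers processing such time instants programmatically, the timestamp of this example can be read with the Python standard library, as sketched below. The function name is hypothetical, and datetime.fromisoformat is used here as a convenient approximation: it may not accept every lexical form permitted by xsd:dateTime, so this is not a conformant xsd:dateTime parser.

```python
from datetime import datetime


def parse_prov_time(text: str) -> datetime:
    """Parse a time instant written in the common YYYY-MM-DDThh:mm:ss form."""
    return datetime.fromisoformat(text)


# Generation time of e1, copied from the wasGeneratedBy expression above.
generated_at = parse_prov_time("2001-10-26T21:32:52")
```

Because the example carries no timezone, the resulting datetime is naive, that is, its tzinfo is None.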

5.7.4 Namespace Declaration

A namespace is identified by an IRI [RFC3987]. In PROV-DM, attributes, identifiers, and values with qualified names as data type can be placed in a namespace using the mechanisms described in this specification.

A namespace declaration consists of a binding between a prefix and a namespace. Every qualified name with this prefix in the scope of this declaration refers to this namespace.

A default namespace declaration consists of a namespace. Every un-prefixed qualified name refers to the default namespace declaration.

The PROV namespace is identified by the IRI http://www.w3.org/ns/prov#.

5.7.5 Qualified Name

A qualified name is a name subject to namespace interpretation. It consists of a namespace, denoted by an optional prefix, and a local name.

PROV-DM stipulates that a qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part.

A qualified name's prefix is OPTIONAL. If a prefix occurs in a qualified name, it refers to a namespace declared in a namespace declaration. In the absence of prefix, the qualified name refers to the default namespace.
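
The mapping from a qualified name to an IRI can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a normative algorithm: the function name and the dictionary of prefix bindings are hypothetical, and the sketch splits on the first colon without validating the name against the PROV-N grammar.

```python
from typing import Dict, Optional


def expand_qualified_name(qname: str, prefixes: Dict[str, str],
                          default_namespace: Optional[str] = None) -> str:
    """Concatenate the namespace IRI and the local part of a qualified name."""
    prefix, colon, local = qname.partition(":")
    if colon:
        # Prefixed name: the prefix must be bound by a namespace declaration.
        if prefix not in prefixes:
            raise KeyError(f"undeclared prefix: {prefix}")
        return prefixes[prefix] + local
    # Un-prefixed name: it refers to the default namespace.
    if default_namespace is None:
        raise ValueError("no default namespace declared")
    return default_namespace + qname


# The PROV namespace IRI is given in Section 5.7.4.
PREFIXES = {"prov": "http://www.w3.org/ns/prov#"}
person_iri = expand_qualified_name("prov:Person", PREFIXES)
```

Here person_iri is the concatenation of the PROV namespace IRI and the local part Person.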

6. PROV-DM Extensibility Points

The PROV data model provides extensibility points that allow designers to specialize it for specific applications or domains. We summarize these extensibility points here.

The PROV namespace declares a set of reserved attributes catering for extensibility: prov:type, prov:role, prov:location.

The PROV data model is designed to be application and technology independent, but implementers are welcome and encouraged to specialize PROV-DM to specific domains and applications. To ensure interoperability, specializations of the PROV data model that exploit the extensibility points summarized in this section must preserve the semantics specified in this document and in [PROV-CONSTRAINTS].

7. Creating Valid Provenance

This specification defines PROV-DM, a data model that allows descriptions of the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing to be expressed. However, with this data model, it is also possible to compose descriptions that would not make sense: for instance, one could express that an entity was used before it was generated, or that the activity that generated an entity started after the entity generation. A set of constraints has been defined for PROV and can be found in a companion specification [PROV-CONSTRAINTS]. They SHOULD be used by developers to compose provenance descriptions that are valid, and by implementers of reasoning engines aiming to check whether provenance descriptions have problems.

The example of section 3 contains identifiers such as tr:WD-prov-dm-20111215, which denotes a specific version of a technical report. On the other hand, an IRI such as http://www.w3.org/TR/prov-dm/ denotes the latest version of a document. One needs to ensure that provenance descriptions for the latter resource remain valid as the resource state changes.

To this end, PROV allows asserters to describe "partial states" of entities by means of attributes and associated values. Some further constraints apply to the use of these attributes, since the values associated with them are expected to remain unchanged for some period of time. The constraints associated with attributes allow provenance descriptions to be refined; they can also be found in the companion specification [PROV-CONSTRAINTS].

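A. Cross-References to PROV-O and PROV-N

PROV-DM is a conceptual data model which can be serialized in various ways. The following table contains the PROV-O classes and properties, as described in [PROV-O], and PROV-N productions, as described in [PROV-N], that correspond to PROV-DM concepts.

Table 10: Cross-References to PROV-O and PROV-N
PROV-DM | PROV-O | PROV-N | Component
Entity | Entity | entityExpression | Component 1: Entities/Activities
Activity | Activity | activityExpression | Component 1: Entities/Activities
Generation | wasGeneratedBy, Generation | generationExpression | Component 1: Entities/Activities
Usage | used, Usage | usageExpression | Component 1: Entities/Activities
Communication | wasInformedBy, Communication | communicationExpression | Component 1: Entities/Activities
Start | wasStartedBy, Start | startExpression | Component 1: Entities/Activities
End | wasEndedBy, End | endExpression | Component 1: Entities/Activities
Invalidation | wasInvalidatedBy, Invalidation | invalidationExpression | Component 1: Entities/Activities
Derivation | wasDerivedFrom, Derivation | derivationExpression | Component 2: Derivations
Revision | wasRevisionOf, Revision | type Revision | Component 2: Derivations
Quotation | wasQuotedFrom, Quotation | type Quotation | Component 2: Derivations
Primary Source | hadPrimarySource, PrimarySource | type PrimarySource | Component 2: Derivations
Agent | Agent | agentExpression | Component 3: Agents, Responsibility, Influence
Attribution | wasAttributedTo, Attribution | attributionExpression | Component 3: Agents, Responsibility, Influence
Association | wasAssociatedWith, Association | associationExpression | Component 3: Agents, Responsibility, Influence
Delegation | actedOnBehalfOf, Delegation | delegationExpression | Component 3: Agents, Responsibility, Influence
Plan | Plan | type Plan | Component 3: Agents, Responsibility, Influence
Person | Person | type Person | Component 3: Agents, Responsibility, Influence
Organization | Organization | type Organization | Component 3: Agents, Responsibility, Influence
SoftwareAgent | SoftwareAgent | type SoftwareAgent | Component 3: Agents, Responsibility, Influence
Influence | wasInfluencedBy, Influence | influenceExpression | Component 3: Agents, Responsibility, Influence
Bundle constructor | bundle description | bundle | Component 4: Bundles
Bundle type | Bundle | type Bundle | Component 4: Bundles
Alternate | alternateOf | alternateExpression | Component 5: Alternate
Specialization | specializationOf | specializationExpression | Component 5: Alternate
Collection | Collection | type Collection | Component 6: Collections
EmptyCollection | EmptyCollection | type EmptyCollection | Component 6: Collections
Membership | hadMember | membershipExpression | Component 6: Collections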

B. Change Log

B.1 Changes since Proposed Recommendation

B.2 Changes since Candidate Recommendation

B.3 Changes since Last Call

Please see the Responses to Public Comments on the Last Call Working Draft for more details about the justification of these changes.

C. Acknowledgements

This document has been produced by the Provenance Working Group, and its contents reflect extensive discussion within the Working Group as a whole. The editors extend special thanks to Sandro Hawke (W3C/MIT) and Ivan Herman (W3C/ERCIM), W3C contacts for the Provenance Working Group.

The editors acknowledge valuable contributions from the following: Tom Baker, David Booth, Robert Freimuth, Satrajit Ghosh, Ralph Hodgson, Renato Iannella, Jacek Kopecky, James Leigh, Jacco van Ossenbruggen, Alan Ruttenberg, Reza Samavi, and Antoine Zimmermann.

Members of the Provenance Working Group at the time of publication of this document were: Ilkay Altintas (Invited expert), Reza B'Far (Oracle Corporation), Khalid Belhajjame (University of Manchester), James Cheney (University of Edinburgh, School of Informatics), Sam Coppens (iMinds - Ghent University), David Corsar (University of Aberdeen, Computing Science), Stephen Cresswell (The National Archives), Tom De Nies (iMinds - Ghent University), Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland), Simon Dobson (Invited expert), Martin Doerr (Foundation for Research and Technology - Hellas (FORTH)), Kai Eckert (Invited expert), Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER), James Frew (Invited expert), Irini Fundulaki (Foundation for Research and Technology - Hellas (FORTH)), Daniel Garijo (Universidad Politécnica de Madrid), Yolanda Gil (Invited expert), Ryan Golden (Oracle Corporation), Paul Groth (Vrije Universiteit), Olaf Hartig (Invited expert), David Hau (National Cancer Institute, NCI), Sandro Hawke (W3C/MIT), Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh), Ivan Herman (W3C/ERCIM), Ralph Hodgson (TopQuadrant), Hook Hua (Invited expert), Trung Dong Huynh (University of Southampton), Graham Klyne (University of Oxford), Michael Lang (Revelytix, Inc.), Timothy Lebo (Rensselaer Polytechnic Institute), Jamie McCusker (Rensselaer Polytechnic Institute), Deborah McGuinness (Rensselaer Polytechnic Institute), Simon Miles (Invited expert), Paolo Missier (School of Computing Science, Newcastle University), Luc Moreau (University of Southampton), James Myers (Rensselaer Polytechnic Institute), Vinh Nguyen (Wright State University), Edoardo Pignotti (University of Aberdeen, Computing Science), Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute), Carl Reed (Open Geospatial Consortium), Adam Retter (Invited Expert), Christine Runnegar (Invited expert), Satya Sahoo (Invited expert), David Schaengold (Revelytix, Inc.), Daniel Schutzer (FSTC, Financial Services Technology Consortium), Yogesh Simmhan (Invited expert), Stian Soiland-Reyes (University of Manchester), Eric Stephan (Pacific Northwest National Laboratory), Linda Stewart (The National Archives), Ed Summers (Library of Congress), Maria Theodoridou (Foundation for Research and Technology - Hellas (FORTH)), Ted Thibodeau (OpenLink Software Inc.), Curt Tilmes (National Aeronautics and Space Administration), Craig Trim (IBM Corporation), Stephan Zednik (Rensselaer Polytechnic Institute), Jun Zhao (University of Oxford), Yuting Zhao (University of Aberdeen, Computing Science).

D. References

D.1 Normative references

[PROV-CONSTRAINTS]
James Cheney; Paolo Missier; Luc Moreau; eds. Constraints of the PROV Data Model. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-constraints-20130430/
[PROV-N]
Luc Moreau; Paolo Missier; eds. PROV-N: The Provenance Notation. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-n-20130430/
[PROV-O]
Timothy Lebo; Satya Sahoo; Deborah McGuinness; eds. PROV-O: The PROV Ontology. 30 April 2013, W3C Recommendation. URL: http://www.w3.org/TR/2013/REC-prov-o-20130430/
[RDF-CONCEPTS]
Graham Klyne; Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC3987]
M. Dürst; M. Suignard. Internationalized Resource Identifiers (IRIs) (RFC 3987). January 2005. RFC. URL: http://www.ietf.org/rfc/rfc3987.txt
[XMLSCHEMA11-2]
Henry S. Thompson et al. W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. 5 April 2012. W3C Recommendation. URL: http://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/

D.2 Informative references

[Logic]
W. E. Johnson. Logic: Part III. 1924. URL: http://www.ditext.com/johnson/intro-3.html
[Mappings]
Satya Sahoo; Paul Groth; Olaf Hartig; Simon Miles; Sam Coppens; James Myers; Yolanda Gil; Luc Moreau; Jun Zhao; Michael Panzer; Daniel Garijo. Provenance Vocabulary Mappings. August 2010. URL: http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Vocabulary_Mappings
[PROV-AQ]
Graham Klyne; Paul Groth; eds. Provenance Access and Query. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-aq-20130430/
[PROV-DC]
Daniel Garijo; Kai Eckert; eds. Dublin Core to PROV Mapping. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-dc-20130430/
[PROV-DICTIONARY]
Tom De Nies; Sam Coppens; eds. PROV Dictionary: Modeling Provenance for Dictionary Data Structures. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-dictionary-20130430/
[PROV-LAYOUT]
W3C PROV Working Group. PROV Graph Layout Conventions. 2012. URL: http://www.w3.org/2011/prov/wiki/Diagrams
[PROV-LINKS]
Luc Moreau; Timothy Lebo; eds. Linking Across Provenance Bundles. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-links-20130430/
[PROV-OVERVIEW]
Paul Groth; Luc Moreau; eds. PROV-OVERVIEW: An Overview of the PROV Family of Documents. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
[PROV-PRIMER]
Yolanda Gil; Simon Miles; eds. PROV Model Primer. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/
[PROV-SEM]
James Cheney; ed. Semantics of the PROV Data Model. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-sem-20130430.
[PROV-XML]
Hook Hua; Curt Tilmes; Stephan Zednik; eds. PROV-XML: The PROV XML Schema. 30 April 2013, W3C Note. URL: http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/
[RDF-CONCEPTS11]
Richard Cyganiak; David Wood; eds. RDF 1.1 Concepts and Abstract Syntax. Working Draft. URL: http://www.w3.org/TR/rdf11-concepts/
[UML]
Object Management Group. Unified Modeling Language: Superstructure. Version 2.0, 2005. URL: http://www.omg.org/spec/UML/2.0/Superstructure/PDF/
