W3C

The PROV Data Model and Abstract Syntax Notation

W3C Working Draft 15 December 2011

This version:
http://www.w3.org/TR/2011/WD-prov-dm-20111215/
Latest published version:
http://www.w3.org/TR/prov-dm/
Latest editor's draft:
http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html
Previous version:
http://www.w3.org/TR/2011/WD-prov-dm-20111018/
Editors:
Luc Moreau, University of Southampton
Paolo Missier, Newcastle University
Contributors:
Khalid Belhajjame, University of Manchester
Stephen Cresswell, legislation.gov.uk
Yolanda Gil, Invited Expert
Ryan Golden, Oracle Corporation
Paul Groth, VU University of Amsterdam
Graham Klyne, University of Oxford
Jim McCusker, Rensselaer Polytechnic Institute
Simon Miles, Invited Expert
James Myers, Rensselaer Polytechnic Institute
Satya Sahoo, Case Western Reserve University

Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.


Abstract

PROV-DM is a data model for provenance for building representations of the entities, people and activities involved in producing a piece of data or thing in the world. PROV-DM is domain-agnostic, but with well-defined extensibility points allowing further domain-specific and application-specific extensions to be defined. It is accompanied by PROV-ASN, a technology-independent abstract syntax notation, which allows serializations of PROV-DM instances to be created for human consumption, which facilitates its mapping to concrete syntax, and which is used as the basis for a formal semantics.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is part of a set of specifications aiming to define the various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. This document defines the PROV-DM data model for provenance, accompanied by a notation to express instances of that data model for human consumption. Three other documents are: 1) a normative serialization of PROV-DM in RDF, specified by means of a mapping to the OWL2 Web Ontology Language; 2) the mechanisms for accessing and querying provenance; 3) a primer for the provenance data model.

This document was published by the Provenance Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-prov-wg@w3.org (subscribe, archives). All feedback is welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

For the purpose of this specification, provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing in the world. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable: provenance can help those users to make trust judgments.

The idea that a single way of representing and collecting provenance could be adopted internally by all systems does not seem to be realistic today. Instead, a pragmatic approach is to consider a core data model for provenance that allows domain and application specific representations of provenance to be translated into such a data model and exchanged between systems. Heterogeneous systems can then export their provenance into such a core data model, and applications that need to make sense of provenance in heterogeneous systems can then import it, process it, and reason over it.

Thus, the vision is that different provenance-aware systems natively adopt their own model for representing their provenance, but a core provenance data model can be readily adopted as a provenance interchange model across such systems.

A set of specifications define the various aspects that are necessary to achieve this vision in an inter-operable way, the first of which is contained in this document:

The PROV-DM data model for provenance consists of a set of core concepts, and a few common relations, based on these core concepts. PROV-DM is a domain-agnostic model, but with well-defined extensibility points allowing further domain-specific and application-specific extensions to be defined.

This specification also introduces PROV-ASN, an abstract syntax that is primarily aimed at human consumption. PROV-ASN allows serializations of PROV-DM instances to be written in a technology independent manner, it facilitates its mapping to concrete syntax, and it is used as the basis for a formal semantics. This specification uses instances of provenance written in PROV-ASN to illustrate the data model.

1.1 Structure of this Document

In section 2, a set of preliminaries are introduced, including concepts that underpin PROV-DM and motivations for the PROV-ASN notation.

Section 3 provides an overview of PROV-DM listing its core types and their relations.

In section 4, PROV-DM is applied to a short scenario, encoded in PROV-ASN, and illustrated graphically.

Section 5 provides the normative definition of PROV-DM and the notation PROV-ASN.

Section 6 introduces common relations used in PROV-DM, including relations for data collections and domain-independent common relations.

Section 7 summarizes PROV-DM extensibility points.

Section 8 discusses how PROV-DM can be applied to the notion of resource.

1.2 PROV-DM Namespace

The PROV-DM namespace is http://www.w3.org/ns/prov-dm/ (TBC).

All the elements, relations, reserved names and attributes introduced in this specification belong to the PROV-DM namespace.

There is a desire to use a single namespace that all specs can share to refer to common provenance terms.

1.3 Conventions

The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in [RFC2119].

2. Preliminaries

2.1 A Conceptualization of the World

2.1.1 Entity, Activity, Agent

This specification is based on a conceptualization of the world that is described in this section. In the world (whether real or not), there are things, which can be physical, digital, conceptual, or otherwise, and activities involving things.

When we talk about things in the world in natural language and even when we assign identifiers, we are often imprecise in ways that make it difficult to clearly and unambiguously report provenance: a resource with a URL may be understood as referring to a report available at that URL, the version of the report available there today, the report independent of where it is hosted over time, etc.

Hence, to accommodate different perspectives on things and their situation in the world as perceived by us, we introduce the idea of a characterized thing, which refers to a thing and its situation in the world, as characterized by someone. We then define an entity as an identifiable characterized thing. An entity fixes some aspects of a thing and its situation in the world, so that it becomes possible to express its provenance, and what causes these specific aspects to be as such. An alternative entity may fix other aspects, and its provenance may be different.

Different users may take different perspectives on a resource with a URL. These perspectives in this conceptualization of the world are referred to as entities. Three such entities may be expressed:
  • a report available at URL: fixes the nature of the thing, i.e. a document, and its location;
  • the version of the report available there today: fixes its version number, contents, and its date;
  • the report independent of where it is hosted and of its content over time: fixes the nature of the thing as a conceptual artifact.
The provenance of these three entities may differ, and may be along the following lines:
  • the provenance of a report available at URL may include: the act of publishing it and making it available at a given location, possibly under some license and access control;
  • the provenance of the version of the report available there today may include: the authorship of the specific content, and reference to imported content;
  • the provenance of the report independent of where it is hosted over time may include: the motivation for writing the report, the overall methodology for producing it, and the broad team involved in it.

We do not assume that any characterization is more important than any other, and in fact, it is possible to describe the processing that occurred for the report to be commissioned, for individual versions to be created, for those versions to be published at the given URL, etc., each via a different entity that characterizes the report appropriately.

In the world, activities involve entities in multiple ways: they consume them, they process them, they transform them, they modify them, they change them, they relocate them, they use them, they generate them, they are controlled by them, etc.

An agent is a type of entity that takes an active role in an activity such that it can be assigned some degree of responsibility for the activity taking place. This definition intentionally stays away from using concepts such as enabling, causing, initiating, affecting, etc., because any entity can also enable, cause, initiate, and affect activities in some way. So the notion of having some degree of responsibility is really what makes an agent.

Even software agents can be assigned some responsibility for the effects they have in the world, so for example if one is using a Text Editor and one's laptop crashes, then one would say that the Text Editor was responsible for crashing the laptop. If one invokes a service to buy a book, that service can be considered responsible for drawing funds from one's bank to make the purchase (the company that runs the service and the web site would also be responsible, but the point here is that we assign some measure of responsibility to software as well). So when someone models software as an agent for an activity in our model, they mean the agent has some responsibility for that activity.

In this specification, the qualifier 'identifiable' is implicit whenever a reference is made to an activity, agent, or an entity.

2.1.2 Time and Event

Time is critical in the context of provenance, since it can help corroborate provenance claims. For instance, if an entity is claimed to be obtained by transforming another, then the latter must have existed before the former. If it is not the case, then there is something wrong in such a provenance claim.

Although time is critical, we should also recognize that provenance can be used in many different contexts: in a single system, across the Web, or in spatial data management, to name a few. Hence, it is a design objective of PROV-DM to minimize the assumptions about time, so that PROV-DM can be used in varied contexts.

Furthermore, consider two activities that started at the same time instant. Just by referring to that instant, we cannot distinguish which activity start we refer to. This is particularly relevant if we try to explain that the start of these activities had different reasons. We need to be able to refer to the start of an activity as a first class concept, so that we can talk about it and about its relation with respect to other similar starts.

Hence, in our conceptualization of the world, an instantaneous event, or event for short, happens in the world and marks a change in the world, in its activities and in its entities. The term "event" is commonly used in process algebra with a similar meaning. For instance, in CSP [CSP], events represent communications or interactions; they are assumed to be atomic and instantaneous.

2.1.2.1 Types of Events

Four kinds of events underpin the PROV-DM data model. The activity start and activity end events demarcate the beginning and the end of activities, respectively. The entity generation and entity usage events demarcate the characterization interval for entities. More specifically:

An entity generation event is the event that marks the final instant of an entity's creation timespan, after which it becomes available for use.

An entity usage event is the event that marks the first instant of an entity's consumption timespan by an activity.

An activity start event is the event that marks the instant an activity starts.

An activity end event is the event that marks the instant an activity ends.

2.1.2.2 Event Ordering

To allow for minimalistic clock assumptions, like Lamport [CLOCK], PROV-DM relies on a notion of relative ordering of events, without using physical clocks. This specification assumes that a partial order exists between events.

Specifically, follows is a partial order between events, indicating that an event occurs after another. For symmetry, precedes is defined as the inverse of follows.
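
As a non-normative illustration, the following Python sketch shows one possible way to realize these two relations. It assumes a strict reading of the ordering (no event follows itself) and that follows is supplied as explicit pairs. The function names and the events evt1 to evt3 are hypothetical and are not defined by this specification.

```python
from itertools import product


def transitive_closure(pairs):
    """Return the transitive closure of a set of (later, earlier) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, since the set grows inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def is_strict_partial_order(pairs):
    """True if no event ends up following itself (assumed strict ordering)."""
    return all(a != b for a, b in transitive_closure(pairs))


# Hypothetical assertions: evt2 follows evt1, and evt3 follows evt2.
asserted_follows = {("evt2", "evt1"), ("evt3", "evt2")}
follows = transitive_closure(asserted_follows)

# precedes is the inverse of follows.
precedes = {(earlier, later) for later, earlier in follows}
```

Under these assumptions, a cycle in the asserted pairs would be detected by is_strict_partial_order and would indicate that the assertions cannot be mapped to any timeline.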

How such partial order is realized in practice is beyond the scope of this specification. This specification only assumes that each event can be mapped to an instant in some form of timeline. The actual mapping is not in scope of this specification. Likewise, whether this timeline is formed of a single global timeline or whether it consists of multiple Lamport-style clocks is also beyond this specification. It is anticipated that follows and precedes correspond to some ordering over this timeline.

This specification introduces a set of "temporal interpretation" rules that allow event ordering constraints to be derived from provenance records. According to such temporal interpretation, provenance records must satisfy such constraints. We note that the actual verification of such temporal constraints is also outside the scope of this specification.

PROV-DM also allows for time observations to be inserted in specific provenance records, for each recognized event introduced in this specification. The presence of a time observation for a given event fixes the mapping of this event to the timeline. It can also help with the verification of associated temporal constraints (though, again, this verification is outside the scope of this specification).

2.2 PROV-ASN: The Provenance Abstract Syntax Notation

This specification defines PROV-DM, a data model for provenance, consisting of records describing how people, entities, and activities were involved in producing, influencing, or delivering a piece of data or a thing in the world.

This specification also relies on a language, PROV-ASN, the Provenance Abstract Syntax Notation, to express instances of that data model. For each construct of PROV-DM, a corresponding ASN expression is introduced, by way of a production in the ASN grammar.

PROV-ASN is an abstract syntax, whose goals are:

This specification provides a grammar for PROV-ASN. Each record of the PROV-DM data model is explained in terms of the production of this grammar.

The formal semantics of PROV-DM is defined at [PROV-SEMANTICS] and its encoding in the OWL2 Web Ontology Language at [PROV-O].

2.3 Representation, Assertion, and Inference

PROV-DM is a provenance data model designed to express representations of the world.

A file at some point during its lifecycle, which includes multiple edits by multiple people, can be represented by its location in the file system, a creator, and content.

These representations are relative to an asserter, and in that sense constitute assertions stating properties of the world, as represented by an asserter. Different asserters will normally contribute different representations. This specification does not define a notion of consistency between different sets of assertions (whether by the same asserter or different asserters). The data model provides the means to associate attribution to assertions.

An alternative representation of the above file is a set of blocks in a hard disk.

The data model is designed to capture activities that happened in the past, as opposed to activities that may or will happen. However, this distinction is not formally enforced. Therefore, all PROV-DM assertions should be interpreted as a record of what has happened, as opposed to what may or will happen.

This specification does not prescribe the means by which an asserter arrives at assertions; for example, assertions can be composed on the basis of observations, reasoning, or any other means.

Sometimes, inferences about the world can be made from representations conformant to the PROV-DM data model. When this is the case, this specification defines such inferences, allowing new provenance records to be inferred from existing ones. Hence, representations of the world can result either from direct assertions by asserters or from application of inferences defined by this specification.

2.4 Grammar Notation

This specification includes a grammar for PROV-ASN expressed using the Extended Backus-Naur Form (EBNF) notation.

Each rule in the grammar defines one symbol, in the form:

E ::= expression

Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
  • E: matches term satisfying rule for symbol E.
  • 'abc': matches the literal string inside the single quotes.
  • expression: matches expression or nothing; optional expression.
  • expression: matches one or more occurrences of expression.
  • expression: matches zero or more occurrences of expression.

3. PROV-DM: An Overview

The following ER diagram provides a high level overview of the structure of PROV-DM records. Examples of provenance assertions that conform to this schema are provided in the next section.

PROV-DM overview
Overview diagram does not represent the sub-relations -- proposal to use a UML notation instead of ER.

The model includes the following elements:

A set of attribute-value pairs can be associated to elements and relations of the PROV model in order to further characterize their nature. The wasComplementOf relationship is used to denote that two entities complement each other, in the sense that they each represent a partial, but mutually compatible characterization of the same thing. The attributes role and type are pre-defined.

The set of relations presented here forms a core, which is further extended with additional relations, defined in Section Common Relations.

The model includes a further additional element: notes. These are also structured as sets of attribute-value pairs. Notes are used to provide additional, "free-form" information regarding any identifiable construct of the model, with no prescribed meaning. Notes are described in detail here.

Attributes and notes are the main extensibility points in the model: individual interest groups are expected to extend PROV-DM by introducing new attributes and notes as needed to address application-specific provenance modelling requirements.

4. Example

This section is non-normative.

There is a suggestion that a better example should be adopted for this document. Possibly, several shorter examples. This is ISSUE-132
To illustrate PROV-DM, this section presents an example encoded according to PROV-ASN. For more detailed explanations of how PROV-DM should be used, and for more examples, we refer the reader to the Provenance Primer [PROV-PRIMER].
Comments on section 3.2. This is ISSUE-71

4.1 A File Scenario

This scenario is concerned with the evolution of a crime statistics file (referred to as e0) stored on a shared file system, which journalists Alice, Bob, Charles, David, and Edith can share and edit. We consider various events in the evolution of file e0; events listed below follow each other, unless otherwise specified.

Event evt1: Alice creates (a0) an empty file in /share/crime.txt. We denote this file e1.

Event evt2: Bob appends (a1) the following line to /share/crime.txt:

There was a lot of crime in London last month.

We denote the revised file e2.

Event evt3: Charles emails (a2) the contents of /share/crime.txt, as an attachment, which we refer to as e4. (We specifically refer to a copy of the file that is uploaded on the mail server.)

Event evt4: David edits (a3) file /share/crime.txt as follows.

There was a lot of crime in London and New-York last month.

We denote the revised file e3.

Event evt5: Edith emails (a4) the contents of /share/crime.txt as an attachment, referred to as e5.

Event evt6: between events evt4 and evt5, someone (unspecified) runs a spell checker (a5) on the file /share/crime.txt. The file after spell checking is referred to as e6.

4.2 Encoding using PROV-ASN

In this section, the example is encoded according to the provenance data model (specified in section PROV-DM: The Provenance Data Model) and expressed in PROV-ASN.

Entity Records (described in Section Entity). The file in its various forms and its copies are modelled as entity records, corresponding to multiple characterizations, as per scenario. The entity records are identified by e0, ..., e6.

entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London last month."])
entity(e3, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London and New York last month."])
entity(e4)
entity(e5)
entity(e6, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London and New York last month.", ex:spellchecked="yes"])

These entity records list attributes that have been given values during intervals delimited by events; such intervals are referred to as characterization intervals. The following table lists all entity identifiers and their corresponding characterization intervals. When the end of the characterization interval is not delimited by an event described in this scenario, it is marked by "...".

Entity   Characterization Interval
e0       evt1 - ...
e1       evt1 - evt2
e2       evt2 - evt4
e3       evt4 - ...
e4       evt3 - ...
e5       evt5 - ...
e6       evt6 - ...
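
As a non-normative illustration, the table can be held as a simple mapping and queried for the entities whose characterization interval covers a given event. The Python sketch below rests on two assumptions that this specification does not state: the events are totally ordered as evt1, evt2, evt3, evt4, evt6, evt5 (evt6 falls between evt4 and evt5 in the scenario), and each interval includes its starting event but excludes its ending event. The names ORDER, INTERVALS and characterized_at are hypothetical.

```python
# Assumed total order of the scenario's events (evt6 lies between evt4 and evt5).
ORDER = ["evt1", "evt2", "evt3", "evt4", "evt6", "evt5"]
POSITION = {event: index for index, event in enumerate(ORDER)}

# Characterization intervals from the table; None stands for the open end "...".
INTERVALS = {
    "e0": ("evt1", None),
    "e1": ("evt1", "evt2"),
    "e2": ("evt2", "evt4"),
    "e3": ("evt4", None),
    "e4": ("evt3", None),
    "e5": ("evt5", None),
    "e6": ("evt6", None),
}


def characterized_at(event):
    """Return the entities whose interval [start, end) contains the event."""
    t = POSITION[event]
    found = []
    for entity, (start, end) in INTERVALS.items():
        if POSITION[start] <= t and (end is None or t < POSITION[end]):
            found.append(entity)
    return sorted(found)
```

With these assumptions, characterized_at("evt3") yields e0, e2 and e4: the file itself, its second revision, and the attachment emailed by Charles.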

Activity Records (described in Section Activity) represent activities in the scenario.

activity(a0, create-file,          2011-11-16T16:00:00,)
activity(a1, add-crime-in-london,  2011-11-16T16:05:00,)
activity(a2, email,                2011-11-16T17:00:00,)
activity(a3, edit-London-New-York, 2011-11-17T09:00:00,)
activity(a4, email,                2011-11-17T09:30:00,)
activity(a5, spellcheck,,)

Generation Records (described in Section Generation) represent the event at which a file is created in a specific form. Attributes are used to describe the modalities according to which a given entity is generated by a given activity. The interpretation of attributes is application specific. Illustrations of such attributes for the scenario are: no attribute is provided for e0; e2 was generated by the editor's save function; e4 can be found on the smtp port, in the attachment section of the mail message; e6 was produced on the standard output of a5. Two identifiers g1 and g2 identify the generation records referenced in derivations introduced below.

wasGeneratedBy(e0, a0)
wasGeneratedBy(e1, a0, [ex:fct="create"])
wasGeneratedBy(e2, a1, [ex:fct="save"])
wasGeneratedBy(e3, a3, [ex:fct="save"])
wasGeneratedBy(g1, e4, a2, [ex:port="smtp", ex:section="attachment"])
wasGeneratedBy(g2, e5, a4, [ex:port="smtp", ex:section="attachment"])
wasGeneratedBy(e6, a5, [ex:file="stdout"])

Usage Records (described in Section Usage) represent the event by which a file is read by an activity. Likewise, attributes describe the modalities according to which the various entities are used by activities. Illustrations of such attributes are: e1 is used in the context of a1's load functionality; e2 is used by a2 in the context of its attach functionality; e3 is used on the standard input by a5. Two identifiers u1 and u2 identify the Usage records referenced in derivations introduced below.

used(a1,e1,[ex:fct="load"])
used(a3,e2,[ex:fct="load"])
used(u1,a2,e2,[ex:fct="attach"])
used(u2,a4,e3,[ex:fct="attach"])
used(a5,e3,[ex:file="stdin"])

Derivation Records (described in Section Derivation Relation) express that an entity is derived from another. The first two are expressed in their compact version, whereas the following two are expressed in their full version, including the activity underpinning the derivation, and associated usage (u1, u2) and generation (g1, g2) records.

wasDerivedFrom(e2,e1)
wasDerivedFrom(e3,e2)
wasDerivedFrom(e4,e2,a2,g1,u1)
wasDerivedFrom(e5,e3,a4,g2,u2)

wasComplementOf: (this relation is described in Section wasComplementOf). The crime statistics file (e0) has various contents over its existence (e1, e2, e3); the entity records identified by e1, e2, e3 complement e0 with an attribute content. Likewise, the one denoted by e6 complements the record denoted by e3 with an attribute spellchecked.

wasComplementOf(e1,e0)
wasComplementOf(e2,e0)
wasComplementOf(e3,e0)
wasComplementOf(e6,e3)

Agent Records (described at Section Agent): the various users are represented as agents, themselves being a type of entity.

agent(ag1, [ prov:type="prov:Person" %% xsd:QName, ex:name="Alice" ])
agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])
agent(ag3, [ prov:type="prov:Person" %% xsd:QName, ex:name="Charles" ])
agent(ag4, [ prov:type="prov:Person" %% xsd:QName, ex:name="David" ])
agent(ag5, [ prov:type="prov:Person" %% xsd:QName, ex:name="Edith" ])

Activity Association Records (described in Section Activity Association): the association of an agent with an activity is expressed with wasAssociatedWith, and the nature of this association is described by attributes. Illustrations of such attributes include the role of the participating agent, as creator, author and communicator (role is a reserved attribute in PROV-DM).

wasAssociatedWith(a0, ag1, [prov:role="creator"])
wasAssociatedWith(a1, ag2, [prov:role="author"])
wasAssociatedWith(a2, ag3, [prov:role="communicator"])
wasAssociatedWith(a3, ag4, [prov:role="author"])
wasAssociatedWith(a4, ag5, [prov:role="communicator"])
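
As a non-normative illustration of how an application might consume these records, the Python sketch below stores the derivation, generation, association and agent records of the scenario as plain mappings and answers two simple questions. The mapping and function names are hypothetical, and the sketch assumes that each entity has at most one asserted derivation source, which holds for this example but is not required by PROV-DM.

```python
# Records transcribed from the example; identifiers are those used above.
WAS_DERIVED_FROM = {"e2": "e1", "e3": "e2", "e4": "e2", "e5": "e3"}
WAS_GENERATED_BY = {
    "e0": "a0", "e1": "a0", "e2": "a1", "e3": "a3",
    "e4": "a2", "e5": "a4", "e6": "a5",
}
WAS_ASSOCIATED_WITH = {"a0": "ag1", "a1": "ag2", "a2": "ag3", "a3": "ag4", "a4": "ag5"}
AGENT_NAMES = {"ag1": "Alice", "ag2": "Bob", "ag3": "Charles", "ag4": "David", "ag5": "Edith"}


def derivation_chain(entity):
    """Follow asserted wasDerivedFrom records back to the earliest source."""
    chain = []
    while entity in WAS_DERIVED_FROM:
        entity = WAS_DERIVED_FROM[entity]
        chain.append(entity)
    return chain


def associated_agent(entity):
    """Name of the agent associated with the generating activity, if asserted."""
    activity = WAS_GENERATED_BY.get(entity)
    agent = WAS_ASSOCIATED_WITH.get(activity)
    return AGENT_NAMES.get(agent)
```

For instance, derivation_chain("e5") walks from Edith's attachment back through e3 and e2 to e1, while associated_agent("e6") returns None because no agent is asserted for the spell-checking activity a5.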

4.3 Graphical Illustration

Provenance assertions can be illustrated graphically. The illustration is not intended to represent all the details of the model, but it is intended to show the essence of a set of provenance assertions. Therefore, it cannot be seen as an alternate notation for expressing provenance.

The graphical illustration takes the form of a graph. Entities, activities and agents are represented as nodes, with oval, rectangular, and half-hexagonal shapes, respectively. Usage, Generation, Derivation, Activity Association, and Complementarity are represented as directed edges.

Entities are laid out according to the ordering of their generation event. We endeavor to show time progressing from left to right. This means that edges for Usage, Generation and Derivation typically point from right to left.

[Figure: graphical illustration of the example]

5. PROV-DM Core

This section contains the normative specification of PROV-DM core, the core of the PROV data model.

In a next iteration of this document, it is proposed to reorganize section 5 as follows. First, the presentation of the data model alone. Second, its temporal interpretation. Third, the constraints and inferences associated with well-formed accounts.

5.1 Record

PROV-DM consists of a set of constructs, referred to as records, to formulate representations of the world and constraints that must be satisfied by them.

Furthermore, PROV-DM includes a "house-keeping construct", a record container, used to wrap PROV-DM records and facilitate their interchange.

In PROV-ASN, such representations of the world must be conformant with the toplevel production record of the grammar. These records are grouped in three categories: elementRecord (see section Element), relationRecord (see section Relation), and accountRecord (see section Account).

record ::= elementRecord | relationRecord | accountRecord

elementRecord ::= entityRecord | activityRecord | agentRecord | noteRecord

relationRecord ::= generationRecord | usageRecord | derivationRecord | activityAssociationRecord | responsibilityRecord | startRecord | endRecord | complementRecord | annotationRecord

In PROV-ASN, a record container is compliant with the production recordContainer (see section Record Container).

5.2 Element

This section describes all the PROV-DM records referred to as element records. (They are conformant to the elementRecord production of the grammar.)

5.2.1 Entity Record

In PROV-DM, an entity record is a representation of an entity.

Examples of entities include a linked data set, a sparse matrix of floating-point numbers, a document in a directory, the same document published on the Web, and meta-data embedded in a document.

An entity record, written entity(id, [ attr1=val1, ...]) in PROV-ASN, contains:

  • id: an identifier id identifying an entity; the identifier of the entity record is defined to be the same as the identifier of the entity;
  • attributes: an optional set of attribute-value pairs [ attr1=val1, ...], representing this entity's situation in the world.

The assertion of an entity record, entity(id, [ attr1=val1, ...]), states, from a given asserter's viewpoint, the existence of an entity, whose situation in the world is represented by the attribute-value pairs, which remain unchanged during a characterization interval, i.e. a continuous interval between two events in the world.

In PROV-ASN, an entity record's text matches the entityRecord production of the grammar defined in this specification document.

entityRecord ::= entity ( identifier optional-attribute-values )

optional-attribute-values ::= , [ attribute-values ]
attribute-values ::= attribute-value | attribute-value , attribute-values
attribute-value ::= attribute = Literal

The following entity record,

entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
states the existence of an entity, denoted by identifier e0, with type File and path /shared/crime.txt in the file system, and creator Alice. The attributes path and creator are application specific, whereas the attribute type is reserved in the PROV-DM namespace.
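
As a non-normative illustration of the entityRecord production, the Python sketch below serializes an identifier and an optional set of attribute-value pairs into PROV-ASN text. It is a minimal sketch under stated assumptions: every value is treated as a double-quoted string literal, no escaping of quotes is performed, and the function name format_entity is hypothetical rather than part of this specification.

```python
def format_entity(identifier, attributes=None):
    """Serialize an entity record following the entityRecord production.

    Assumes all attribute values are plain strings without embedded quotes.
    """
    if not attributes:
        # The attribute-value list is optional.
        return f"entity({identifier})"
    pairs = ", ".join(f'{name}="{value}"' for name, value in attributes.items())
    return f"entity({identifier}, [ {pairs} ])"


# Reproduces the record for e0 shown above.
record = format_entity(
    "e0",
    {"prov:type": "File", "ex:path": "/shared/crime.txt", "ex:creator": "Alice"},
)
```

Calling format_entity("e4") with no attributes produces entity(e4), the form used for the emailed attachments in the example.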
Further considerations:
  • If an asserter wishes to characterize an entity with the same attribute-value pairs over several intervals, then they are required to assert multiple entity records, each with its own identifier (so as to allow potential dependencies between the various entity records to be expressed).
  • There is no assumption that the set of attributes is complete and that the attributes are independent/orthogonal of each other.
  • A characterization interval may collapse into a single instant.
  • An entity assertion is about a thing, whose situation in the world may be variant. An entity record is asserted at a particular point and is invariant, in the sense that its attributes are given a value as part of that assertion.
  • Activities are not represented by entity records, but instead by activity records, as explained below.
The characterization interval of an entity record is currently implicit. Making it explicit would allow us to define wasComplementOf more precisely. It would also allow us to address ISSUE-108. Beginning and end of characterization interval could be expressed by attributes (similarly to activities).

5.2.2 Activity Record

In PROV-DM, an activity record is a representation of an identifiable activity, which performs a piece of work.

An activity, represented by an activity record, is delimited by its start and its end events; hence, it occurs over an interval delimited by two events. However, an activity record need not mention time information, nor duration, because they may not be known.

Such start and end times constitute attributes of an activity, where the interpretation of attribute in the context of an activity record is the same as the interpretation of attribute for entity record: an activity record's attribute remains constant for the duration of the activity it represents. Further characteristics of the activity in the world can be represented by other attribute-value pairs, which must also remain unchanged during the activity duration.

Examples of activities include assembling a data set based on a set of measurements, performing a statistical analysis over a data set, sorting news items according to some criteria, running a SPARQL query over a triple store, editing a file, and publishing a web page.

An activity record, written activity(id, rl, st, et, [ attr1=val1, ...]) in PROV-ASN, contains:

  • id: an identifier id identifying an activity; the identifier of the activity record is defined to be the same as the identifier of the activity;
  • recipeLink: an optional recipe link rl, which consists of a domain specific specification of the activity;
  • startTime: an optional time st indicating the start of the activity;
  • endTime: an optional time et indicating the end of the activity;
  • attributes: a set of attribute-value pairs [ attr1=val1, ...], representing other attributes of this activity that hold for its whole duration.

In PROV-ASN, an activity record's text matches the activityRecord production of the grammar defined in this specification document.

activityRecord ::= activity(identifier,recipeLink,time,time optional-attribute-values)

The following activity assertion

activity(a1,add-crime-in-london,2011-11-16T16:05:00,2011-11-16T16:06:00,[ex:host="server.example.org",prov:type="ex:edit" %% xsd:QName])

identified by identifier a1, states the existence of an activity with recipe link add-crime-in-london, start time 2011-11-16T16:05:00, and end time 2011-11-16T16:06:00, running on host server.example.org, and of type edit (declared in some namespace with prefix ex). The attribute host is application specific, but must hold for the duration of activity. The attribute type is a reserved attribute of PROV-DM, allowing for subtyping to be expressed.

The mere existence of an activity assertion entails some event ordering in the world, since an activity start event always precedes the corresponding activity end event. This is expressed by constraint start-precedes-end.

The following temporal constraint holds for any activity record: the start event precedes the end event.
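The start-precedes-end constraint can be checked mechanically when both times are asserted. The following Python sketch is one possible reading, not a normative definition: the Activity type and the function name are hypothetical, and it assumes that "precedes" admits equal timestamps and that an activity with a missing time trivially satisfies the constraint.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Activity(NamedTuple):
    """Hypothetical in-memory form of an activity record."""
    identifier: str
    start: Optional[datetime] = None  # startTime is optional
    end: Optional[datetime] = None    # endTime is optional


def satisfies_start_precedes_end(activity: Activity) -> bool:
    # Nothing can be checked when either time is not asserted.
    if activity.start is None or activity.end is None:
        return True
    # Assumption: equal timestamps are accepted as "precedes".
    return activity.start <= activity.end


a1 = Activity(
    "a1",
    datetime.fromisoformat("2011-11-16T16:05:00"),
    datetime.fromisoformat("2011-11-16T16:06:00"),
)
print(satisfies_start_precedes_end(a1))
```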

An activity record is not an entity record. Indeed, an entity record represents an entity that exists in full at any point in its characterization interval, persists during this interval, and preserves the characteristics that make it identifiable. In contrast, an activity is something that happens, unfolds or develops through time, but is typically not identifiable by the characteristics it exhibits at any point during its duration. This distinction is similar to the distinction between 'continuant' and 'occurrent' in logic [Logic].

5.2.3 Agent Record

An agent record is a representation of an agent, which is an entity that can be assigned some degree of responsibility for an activity taking place.

Many agents can have an association with a given activity. An agent may do the ordering of the activity, another agent may do its design, another agent may push the button to start it, another agent may run it, etc. As many agents as one wishes to mention can occur in the provenance record, if it is important to indicate that they were associated with the activity.

From an inter-operability perspective, it is useful to define some basic categories of agents since it will improve the use of provenance records by applications. There should be very few of these basic categories to keep the model simple and accessible. There are three types of agents in the model:

  • Person: agents of type Person are people. (This type is equivalent to a "foaf:person" [FOAF])
  • Organization: agents of type Organization are social institutions such as companies, societies etc. (This type is equivalent to a "foaf:organization" [FOAF])
  • SoftwareAgent: a software agent is a piece of software.

These types are mutually exclusive, though they do not cover all kinds of agent.

An agent record, noted agent(id, [ attr1=val1, ...]) in PROV-ASN, contains:

  • id: an identifier id identifying an agent; the identifier of the agent record is defined to be the same as the identifier of the agent;
  • attributes: a set of attribute-value pairs [ attr1=val1, ...], representing this agent's situation in the world.

In PROV-ASN, an agent record's text matches the agentRecord production of the grammar defined in this specification document.

agentRecord ::= agent(identifier optional-attribute-values)

With the following assertions,

agent(e1, [ex:employee="1234", ex:name="Alice", prov:type="prov:Person" %% xsd:QName])
entity(e2) and wasStartedBy(a1,e2,[prov:role="author"])
entity(e3) and wasAssociatedWith(a1,e3,[prov:role="sponsor"])

the agent record identified by e1 is an explicit agent assertion that holds irrespective of activities it may be associated with. On the other hand, from the entity records identified by e2 and e3, one can infer agent records, as per the following inference.

One can assert an agent record or, alternatively, one can infer an agent record by its association with an activity.

If the records entity(e,attrs) and wasAssociatedWith(a,e) hold for some identifiers a, e, and attribute-values attrs, then the record agent(e,attrs) also holds.
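The inference above can be sketched in Python as follows. The data layout (a dictionary of entity records and a list of association pairs) and the function name infer_agents are assumptions made for illustration; the entity e4 is hypothetical and is included only to show an entity for which no agent record is inferred.

```python
def infer_agents(entities, associations):
    """Infer agent records from entity records and activity associations.

    entities: dict mapping an entity identifier to its attribute-value pairs.
    associations: iterable of (activity_id, entity_id) pairs, one per
    association record (hypothetical layout).
    """
    inferred = {}
    for _activity, ident in associations:
        if ident in entities:
            # agent(e, attrs) holds with the same attributes as entity(e, attrs).
            inferred[ident] = entities[ident]
    return inferred


entities = {"e2": {}, "e3": {}, "e4": {"ex:name": "unassociated"}}
# wasStartedBy is treated here as a specialized form of wasAssociatedWith.
associations = [("a1", "e2"), ("a1", "e3")]
agents = infer_agents(entities, associations)
print(sorted(agents))
```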

5.2.4 Note Record

As provenance records are exchanged between systems, it may be useful to add extra information about such records. For instance, a "trust service" may add value-judgements about the trustworthiness of some of the assertions made. Likewise, an interactive visualization component may want to enrich a set of provenance records with information helping reproduce their visual representation. To help with inter-operability, PROV-DM introduces a simple annotation mechanism allowing any identifiable record to be associated with notes.

A note record is a set of attribute-value pairs, whose meaning is application specific. It may or may not be a representation of something in the world.

In PROV-ASN, a note record's text matches the noteRecord production of the grammar defined in this specification document.

noteRecord ::= note(identifier,attribute-values)

A separate PROV-DM record is used to associate a note with an identifiable record (see Section on annotation). A given note may be associated with multiple records.

The following note record

note(ann1,[ex:color="blue", ex:screenX=20, ex:screenY=30])

consists of a set of application-specific attribute-value pairs, intended to help the rendering of the record it is associated with, by specifying its color and its position on the screen. In this example, these attribute-value pairs do not constitute a representation of something in the world; they are just used to help render provenance.

Attribute-value pairs occurring in notes differ from attribute-value pairs occurring in entity records and activity records. In entity and activity records, attribute-value pairs must be a representation of something in the world, which remains constant for the duration of the characterization interval (for entity records) or the activity duration (for activity records). In note records, it is optional for attribute-value pairs to be representations of something in the world. If they are a representation of something in the world, then their value may change during the corresponding duration. If attribute-value pairs of a note record are a representation of something in the world that does not change, they are not regarded as determining characteristics of an entity or activity, for the purpose of provenance.

5.3 Relation

This section describes all the PROV-DM records representing relations between the elements introduced in Section Element. While these relations are not binary, they all involve two primary elements. They can be summarized as follows.

PROV-DM Core Relation Summary

          | Entity                          | Activity       | Agent                                       | Note
Entity    | wasDerivedFrom, wasComplementOf | wasGeneratedBy | -                                           | hasAnnotation
Activity  | used                            | -              | wasStartedBy, wasEndedBy, wasAssociatedWith | hasAnnotation
Agent     | -                               | -              | actedOnBehalfOf                             | hasAnnotation
Note      | -                               | -              | -                                           | hasAnnotation

In PROV-ASN, all these relation records are conformant to the relationRecord production of the grammar.

5.3.1 Activity-Entity Relation

5.3.1.1 Generation Record

In PROV-DM, a generation record is a representation of a world event, the creation of a new entity by an activity. This entity did not exist before creation. The representation of this event encompasses a description of the modalities of generation of this entity by this activity.

A generation event may be, for example, the creation of a file by a program, the creation of a linked data set, the production of a new version of a document, and the sending of a value on a communication channel.

A generation record, written wasGeneratedBy(id,e,a,attrs,t) in PROV-ASN, has the following components:

  • id: an optional identifier id identifying the generation record;
  • entity: an identifier e identifying an entity record that represents the entity that is created;
  • activity: an identifier a identifying an activity record that represents the activity that creates the entity;
  • time: an optional "generation time" t, the time at which the entity was created;
  • attributes: an optional set of attribute-value pairs attrs that describes the modalities of generation of this entity by this activity.

In PROV-ASN, a generation record's text matches the generationRecord production of the grammar defined in this specification document.

generationRecord ::= wasGeneratedBy(identifier,eIdentifier,aIdentifier,time optional-attribute-values)

A generation record's id is optional. It must be used when annotating generation records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation Record).

The following generation assertions

  wasGeneratedBy(e1,a1, 2001-10-26T21:32:52, [ex:port="p1", ex:order=1])
  wasGeneratedBy(e2,a1, 2001-10-26T10:00:00, [ex:port="p1", ex:order=2])

state the existence of two events in the world (with respective times 2001-10-26T21:32:52 and 2001-10-26T10:00:00), at which new entities, represented by entity records identified by e1 and e2, are created by an activity, itself represented by an activity record identified by a1. The first one is available as the first value on port p1, whereas the other is the second value on port p1. The semantics of port and order in these records are application specific.

The assertion of a generation record implies ordering of events in the world.

If an assertion wasGeneratedBy(x,a,attrs) or wasGeneratedBy(x,a,attrs,t) holds, then the following temporal constraint also holds: the generation of the entity denoted by x precedes the end of a and follows the start of a.

A given entity record can be referred to in a single generation record in the scope of a given account. The rationale for this constraint is as follows. If two activities sequentially set different values to some attribute by means of two different generation events, then they generate distinct entities. Alternatively, for two activities to generate an entity simultaneously, they would require some synchronization by which they agree the entity is released for use; the end of this synchronization would constitute the actual generation of the entity, but is performed by a single activity. This unicity constraint is formalized as follows.

Given an entity record denoted by e, two activity records denoted by a1 and a2, and two sets of attribute-value pairs attrs1 and attrs2, if the records wasGeneratedBy(e,a1,attrs1) and wasGeneratedBy(e,a2,attrs2) exist in the scope of a given account, then a1=a2 and attrs1=attrs2.
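A possible check of the unicity constraint over the generation records of one account is sketched below in Python. The function name check_generation_unicity and the tuple layout (entity, activity, attrs) are assumptions for illustration; identical repeated records are tolerated because the constraint only requires a1=a2 and attrs1=attrs2.

```python
def check_generation_unicity(generations):
    """Return identifiers of entities that violate generation-unicity.

    generations: iterable of (entity_id, activity_id, attrs) tuples taken
    from a single account, where attrs is a dict (hypothetical layout).
    """
    seen = {}
    violations = []
    for entity, activity, attrs in generations:
        # Attribute names are unique within a dict, so sorting by name is safe.
        key = (activity, tuple(sorted(attrs.items())))
        if entity in seen and seen[entity] != key:
            if entity not in violations:
                violations.append(entity)
        else:
            seen.setdefault(entity, key)
    return violations


generations = [
    ("e1", "a1", {"ex:port": "p1"}),
    ("e1", "a1", {"ex:port": "p1"}),  # same activity and attributes: allowed
    ("e2", "a1", {}),
    ("e2", "a2", {}),                 # second activity for e2: violation
]
print(check_generation_unicity(generations))
```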
TODO: Introduce the well-formed-ness constraint in an entirely separate section.
5.3.1.2 Usage Record

In PROV-DM, a usage record is a representation of a world event: the consumption of an entity by an activity. The representation includes a description of the modalities of usage of this entity by this activity.

A usage event may be the consumption of a parameter by a procedure, the reading of a value on a port by a service, the reading of a configuration file by a program, or the adding of an ingredient, such as eggs, in a baking activity. Usage may entirely consume an entity (e.g. eggs are no longer available after being added to the mix), or leave it as such, ready for further uses (e.g. a file on a file system can be read indefinitely).

A usage record, written used(id,a,e,attrs,t) in PROV-ASN, has the following constituents:

  • id: an optional identifier id identifying the usage record;
  • activity: an identifier a for an activity record, which represents the consuming activity;
  • entity: an identifier e for an entity record, which represents the entity that is consumed;
  • time: an optional "usage time" t, the time at which the entity was used;
  • attributes: an optional set of attribute-value pairs attrs that describe the modalities of usage of this entity by this activity.

In PROV-ASN, a usage record's text matches the usageRecord production of the grammar defined in this specification document.

usageRecord ::= used(identifier,aIdentifier,eIdentifier,time optional-attribute-values)

A usage record's id is optional, but comes in handy when annotating usage records (see Section Annotation Record) or when defining derivations.

The following usage records

  used(a1,e1,2011-11-16T16:00:00,[ex:parameter="p1"])
  used(a1,e2,2011-11-16T16:00:01,[ex:parameter="p2"])

state that the activity, represented by the activity record identified by a1, consumed two entities, represented by entity records identified by e1 and e2, at times 2011-11-16T16:00:00 and 2011-11-16T16:00:01, respectively; the first one was found as the value of parameter p1, whereas the second was found as value of parameter p2. The semantics of parameter in these records is application specific.

A usage record's id is optional. It must be present when annotating usage records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation Record).

A reference to a given entity record may appear in multiple usage records that share a given activity record identifier.

For any entity, the following temporal constraint holds: the generation of an entity always precedes any of its usages.
Given an activity record identified by a, an entity record identified by e, a set of attribute-value pairs attrs, and optional time t, if assertion used(a,e,attrs) or used(a,e,attrs,t) holds, then the following temporal constraint holds: the usage of the entity represented by entity record identified by e precedes the end of activity represented by record identified by a and follows its start.
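When times are asserted, the two temporal constraints on usage can be checked as in the following Python sketch. The function names and all timestamps are hypothetical, and the sketch assumes that "precedes" and "follows" admit equal timestamps.

```python
from datetime import datetime


def usage_within_activity(start, usage, end):
    """Usage follows the start of the activity and precedes its end."""
    return start <= usage <= end


def generation_precedes_usages(generation, usages):
    """Generation of an entity precedes every one of its usages."""
    return all(generation <= usage for usage in usages)


t = datetime.fromisoformat
# Hypothetical times, chosen only to exercise the checks.
activity_start = t("2011-11-16T16:00:00")
activity_end = t("2011-11-16T16:10:00")
generated_at = t("2011-11-16T15:00:00")
used_at = [t("2011-11-16T16:00:00"), t("2011-11-16T16:00:01")]

print(all(usage_within_activity(activity_start, u, activity_end) for u in used_at))
print(generation_precedes_usages(generated_at, used_at))
```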
Should we define a taxonomy of use? This is ISSUE-23.

5.3.2 Activity-Agent Relation

5.3.2.1 Activity Association Record

The key purpose of agents in PROV-DM is to assign responsibility for activities. It is important to reflect that there is a degree in the responsibility of agents, and that is a major reason for distinguishing among all the agents that have some association with an activity and determining which ones are really the originators of the entity. For example, a programmer and a researcher could both be associated with running a workflow, but it may not matter which programmer clicked the button to start the workflow, while it would matter a lot which researcher told the programmer to do so. Another example: a student publishing a web page describing an academic department could result in both the student and the department being agents associated with the activity, and it may not matter which student published the web page, but it matters a lot that the department told the student to put up the web page. So there is some notion of responsibility that needs to be captured.

To this end, PROV-DM offers two kinds of records. The first, introduced in this section, represents an association between an agent and an activity; the second, introduced in Section Responsibility record, represents the fact that an agent was acting on behalf of another, in the context of an activity.

Examples of activity association include designing, participation, initiation and termination, timetabling or sponsoring.

An activity association record, written wasAssociatedWith(id,a,ag2,attrs) in PROV-ASN, has the following constituents:

  • id: an optional identifier id identifying the activity association record;
  • activity: an identifier a for an activity record;
  • agent: an identifier ag2 for an agent record, which represents the agent associated with the activity;
  • attributes: an optional set of attribute-value pairs attrs that describe the modalities of association of this activity with this agent.

In PROV-ASN, an activity association record's text matches the activityAssociationRecord production of the grammar defined in this specification document.

activityAssociationRecord ::= wasAssociatedWith(identifier,aIdentifier,agIdentifier optional-attribute-values)
In the following example, a programmer agent and a researcher agent are asserted to be associated with an activity.
activity(a,[prov:type="workflow"])
agent(ag1,[prov:type="programmer"])
agent(ag2,[prov:type="researcher"])
wasAssociatedWith(a,ag1,[prov:role="loggedInUser", ex:how="webapp"])
wasAssociatedWith(a,ag2,[prov:role="designer", ex:context="phd"])
5.3.2.2 Start and End Records

A start record is a representation of an agent starting an activity. An end record is a representation of an agent ending an activity. Both relations are specialized forms of wasAssociatedWith. They contain attributes describing the modalities of starting/ending activities.

A start record, written wasStartedBy(id,a,ag,attrs) in PROV-ASN, contains:

  • id: an optional identifier id identifying the start record;
  • activity: an identifier a denoting an activity record, representing the started activity;
  • agent: an identifier ag for an agent record, representing the starting agent;
  • attributes: an optional set of attribute-value pairs attrs, describing modalities according to which the agent started the activity.

An end record, written wasEndedBy(id,a,ag,attrs) in PROV-ASN, contains:

  • id: an optional identifier id identifying the end record;
  • activity: an identifier a denoting an activity record, representing the ended activity;
  • agent: an identifier ag for an agent record, representing the ending agent;
  • attributes: an optional set of attribute-value pairs attrs, describing modalities according to which the agent ended the activity.

In PROV-ASN, the texts of start and end records match the startRecord and endRecord productions of the grammar defined in this specification document.

startRecord ::= wasStartedBy(identifier,aIdentifier,agIdentifier optional-attribute-values)
endRecord ::= wasEndedBy(identifier,aIdentifier,agIdentifier optional-attribute-values)

The following assertions

wasStartedBy(a,ag,[ex:mode="manual"])
wasEndedBy(a,ag,[ex:mode="manual"])

state that the activity, represented by the activity record denoted by a, was started and ended by an agent, represented by the record denoted by ag, in "manual" mode, an application specific characterization of these relations.

Temporal constraints for these relations remain to be written. The temporal constraints should ensure that the agent started its existence before the effect it may have on the activity.

5.3.3 Entity-Entity or Agent-Agent Relation

5.3.3.1 Responsibility Record

To promote take-up, PROV-DM offers a mild version of responsibility in the form of a relation to represent when an agent acted on another agent's behalf. So in the example of someone running a mail program, the program is an agent of that activity and the person is also an agent of the activity, but we would also add that the mail software agent is running on the person's behalf. In the other example, the student acted on behalf of his supervisor, who acted on behalf of the department chair, who acts on behalf of the university, and all those agents are responsible in some way for the activity to take place but we don't say explicitly who bears responsibility and to what degree.

We could also say that an agent can act on behalf of several other agents (a group of agents). This would also make it possible to indirectly reflect chains of responsibility. This also indirectly reflects control without requiring that control is explicitly indicated. In some contexts there will be a need to represent responsibility explicitly, for example to indicate legal responsibility, and that could be added as an extension to this core model. Similarly with control, since in particular contexts there might be a need to define specific aspects of control that various agents exert over a given activity.

Given an activity association record wasAssociatedWith(a,ag2,attrs), a responsibility record, written actedOnBehalfOf(id,ag2,ag1,a,attrs) in PROV-ASN, has the following constituents:

  • id: an optional identifier id identifying the responsibility record;
  • subordinate: an identifier ag2 for an agent record, which represents an agent associated with an activity, acting on behalf of the responsible agent;
  • responsible: an identifier ag1 for an agent record, which represents the agent on behalf of which the subordinate agent ag2 acts;
  • activity: an optional identifier a of an activity record for which the responsibility record holds;
  • attributes: an optional set of attribute-value pairs attrs that describe the modalities of this relation.
responsibilityRecord ::= actedOnBehalfOf(identifier,agIdentifier,agIdentifier,aIdentifier optional-attribute-values)
In the following example, a programmer agent, a researcher agent and a funder agent are asserted. The programmer and researcher are associated with a workflow activity. The programmer acts on behalf of the researcher (delegation), encoding the commands specified by the researcher; the researcher acts on behalf of the funder, who has a contractual agreement with the researcher.
activity(a,[prov:type="workflow"])
agent(ag1,[prov:type="programmer"])
agent(ag2,[prov:type="researcher"])
agent(ag3,[prov:type="funder"])
wasAssociatedWith(a,ag1,[prov:role="loggedInUser"])
wasAssociatedWith(a,ag2)
actedOnBehalfOf(ag1,ag2,a,[prov:type="delegation"])
actedOnBehalfOf(ag2,ag3,a,[prov:type="contract"])
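The chain of responsibility reflected indirectly by such records can be followed mechanically. The Python sketch below is illustrative only: the function name responsibility_chain is hypothetical, and it assumes, for simplicity, that each agent acts on behalf of at most one other agent within the activity, which is narrower than what the model allows.

```python
def responsibility_chain(agent, acted_on_behalf_of):
    """Follow actedOnBehalfOf links from a subordinate agent upwards.

    acted_on_behalf_of: dict mapping a subordinate agent identifier to the
    responsible agent identifier, for one activity (simplifying assumption).
    """
    chain = [agent]
    while chain[-1] in acted_on_behalf_of:
        responsible = acted_on_behalf_of[chain[-1]]
        if responsible in chain:
            break  # guard against cyclic assertions
        chain.append(responsible)
    return chain


# Links taken from the example: ag1 acts for ag2, and ag2 acts for ag3.
links = {"ag1": "ag2", "ag2": "ag3"}
print(responsibility_chain("ag1", links))
```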
5.3.3.2 Derivation Record

In PROV-DM, a derivation record is a representation that some entity is transformed from, created from, or affected by another entity in the world.

Examples of derivation include the transformation of a canvas into a painting, the transportation of a person from London to New York, the transformation of a relational table into a linked data set, and the melting of ice into water.

According to Section Conceptualization, for an entity to be transformed from, created from, or affected by another in some way, there must be some underpinning activities performing the necessary actions resulting in such a derivation. However, asserters may not assert or have knowledge of these activities and associated details: they may not assert or know their number, they may not assert or know their identity, they may not assert or know the attributes characterizing how the relevant entities are used or generated. To accommodate the varying circumstances of the various asserters, PROV-DM allows more or less precise records of derivation to be asserted. Hence, PROV-DM uses the terms precise and imprecise to characterize the different kinds of derivation record. We note that the derivation itself is exact (i.e., deterministic, non-probabilistic), but it is its description, expressed in a derivation record, that may be imprecise.

The lack of precision may come from two sources:

  • the number of activities that underpin a derivation is not asserted or known, or
  • any of the other details that are involved in the derivation is not asserted or known; these include activity identities, generation and usage records, and their attributes.

Hence, given a precision axis, with values precise and imprecise, and an activity axis, with values one activity and n activities, we can then form a matrix of possible derivations, precise or imprecise, or corresponding to one activity or n activities. Out of the four possibilities, PROV-DM offers three forms of derivation, while the fourth one is not meaningful. The following table summarises names for the three kinds of derivation, which we then explain.

PROV-DM Derivation Type Summary

                             | precise (precision axis)    | imprecise (precision axis)
one activity (activity axis) | precise-1 derivation record | imprecise-1 derivation record
n activities (activity axis) | ---                         | imprecise-n derivation record

  • The asserter asserts that derivation is due to exactly one activity, and all the details are asserted. We call this a precise-1 derivation record.
  • The asserter asserts that derivation is due to exactly one activity, but other details, whether known or unknown, are not asserted. We call this an imprecise-1 derivation record.
  • The asserter does not know how many activities are involved in the derivation, and other details, whether known or unknown, are also not asserted. We call this an imprecise-n derivation record.

We note that the fourth theoretical case of a precise derivation, where the number of activities is not known or asserted, cannot occur.

The three kinds of derivation records are successively introduced. To minimize the number of relation types in PROV-DM, we introduce a PROV-DM reserved attribute steps, which allows us to distinguish the various derivation types.

A precise-1 derivation record, written wasDerivedFrom(id, e2, e1, a, g2, u1, attrs) in PROV-ASN, contains:

  • id: an optional identifier id identifying the derivation record;
  • generatedEntity: the identifier e2 of an entity record, which is a representation of the generated entity;
  • usedEntity: the identifier e1 of an entity record, which is a representation of the used entity;
  • activity: an identifier a of an activity record, which is a representation of the activity using and generating the above entities;
  • generation: an identifier g2 of the generation record pertaining to e2 and a;
  • usage: an identifier u1 of the usage record pertaining to e1 and a;
  • attributes: an optional set of attribute-value pairs attrs that describe the modalities of this derivation, optionally including the attribute-value pair prov:steps="1".

It is optional to include the attribute prov:steps in a precise-1 derivation since the record already refers to the one and only one activity underpinning the derivation.

An imprecise-1 derivation record, written wasDerivedFrom(id, e2, e1, attrs) in PROV-ASN, contains:

  • id: an optional identifier id identifying the derivation record;
  • generatedEntity: the identifier e2 of an entity record, which is a representation of the generated entity;
  • usedEntity: the identifier e1 of an entity record, which is a representation of the used entity;
  • attributes: a set of attribute-value pairs attrs that describe the modalities of this derivation; it must include the attribute-value pair prov:steps="1".

An imprecise-1 derivation must include the attribute prov:steps, since it is the only means to distinguish this record from an imprecise-n derivation record.

An imprecise-n derivation record, written wasDerivedFrom(id, e2, e1, attrs) in PROV-ASN, contains:

  • id: an optional identifier id identifying the derivation record;
  • generatedEntity: the identifier e2 of an entity record, which is a representation of the generated entity;
  • usedEntity: the identifier e1 of an entity record, which is a representation of the used entity;
  • attributes: an optional set of attribute-value pairs attrs that describe the modalities of this derivation; it optionally includes the attribute-value pair prov:steps="n".

It is optional to include the attribute prov:steps in an imprecise-n derivation record. It defaults to prov:steps="n".
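The way the reserved attribute prov:steps distinguishes the three kinds of derivation record can be summarized by the following Python sketch. The function derivation_kind and its argument layout are hypothetical; it assumes a record counts as precise-1 only when the activity, generation and usage identifiers are all present.

```python
def derivation_kind(activity=None, generation=None, usage=None, attrs=None):
    """Classify a derivation record by its shape and its prov:steps value."""
    attrs = attrs or {}
    steps = attrs.get("prov:steps")
    if activity is not None and generation is not None and usage is not None:
        # prov:steps is optional here, but if present it must be "1".
        if steps not in (None, "1"):
            raise ValueError("precise-1 derivation requires prov:steps=1 if given")
        return "precise-1"
    if steps == "1":
        return "imprecise-1"
    if steps in (None, "n"):
        # prov:steps defaults to "n" when absent.
        return "imprecise-n"
    raise ValueError(f"unexpected prov:steps value: {steps!r}")


print(derivation_kind("a4", "g2", "u2", {}))
print(derivation_kind(attrs={"prov:steps": "1"}))
print(derivation_kind(attrs={}))
```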

None of the three kinds of derivation is defined to be transitive. Domain-specific specializations of these derivations may be defined in such a way that the transitivity property holds.

In PROV-ASN, a derivation record's text matches the derivationRecord production of the grammar defined in this specification document.

derivationRecord ::= precise-1-derivationRecord | imprecise-1-derivationRecord | imprecise-n-derivationRecord

precise-1-derivationRecord ::= wasDerivedFrom(identifier,eIdentifier,eIdentifier,aIdentifier,gIdentifier,uIdentifier optional-attribute-values)
imprecise-1-derivationRecord ::= wasDerivedFrom(identifier,eIdentifier,eIdentifier,attribute-values)
imprecise-n-derivationRecord ::= wasDerivedFrom(identifier,eIdentifier,eIdentifier optional-attribute-values)
The grammar should make it clear that attribute prov:steps="1" is required for imprecise-1-derivationRecord.
PM: suggestion -- remove the distinction between imprecise-1 and imprecise-n in the grammar and instead explain that the qualification (1 vs n) is through attribute prov:steps.

The following assertions state the existence of derivations.

wasDerivedFrom(e5,e3,a4,g2,u2,[])
wasDerivedFrom(e5,e3,a4,g2,u2,[prov:steps="1"])
wasDerivedFrom(e3,e2,[prov:steps="1"])
wasDerivedFrom(e2,e1,[])
wasDerivedFrom(e2,e1,[prov:steps="n"])

The first two are precise-1 derivation records expressing that the activity represented by a4, by using the entity denoted by e3 according to usage record u2, derived the entity denoted by e5 and generated it according to generation record g2. The third record is an imprecise-1 derivation, which is similar for e3 and e2, but it leaves the activity record and associated attributes implicit. The fourth and fifth records are imprecise-n derivation records between e2 and e1, but no information is provided as to the number and identity of activities underpinning the derivation.

A precise-1 derivation record is richer than an imprecise-1 derivation record, which is itself more informative than an imprecise-n derivation record. Hence, the following implications hold.

Given two entity records denoted by e1 and e2, if the assertion wasDerivedFrom(e2, e1, a, g2, u1, attrs) holds for some generation record identified by g2, and usage record identified by u1, then wasDerivedFrom(e2,e1,[prov:steps="1"] ∪ attrs) also holds.
Given two entity records denoted by e1 and e2, if the assertion wasDerivedFrom(e2, e1, [prov:steps="1"] ∪ attrs) holds, then wasDerivedFrom(e2,e1,attrs) also holds.
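The two implications can be read as successive weakening steps. The following Python sketch shows one way to express them; the function names and the tuple representation of derivation records are assumptions made for illustration.

```python
def precise_to_imprecise_1(e2, e1, a, g2, u1, attrs):
    """From a precise-1 record, derive the imprecise-1 record that also holds."""
    # The activity, generation and usage identifiers are dropped;
    # prov:steps="1" is added to the remaining attributes.
    return (e2, e1, {**attrs, "prov:steps": "1"})


def imprecise_1_to_imprecise_n(e2, e1, attrs):
    """From an imprecise-1 record, derive the imprecise-n record that also holds."""
    if attrs.get("prov:steps") != "1":
        raise ValueError("not an imprecise-1 derivation record")
    remaining = {k: v for k, v in attrs.items() if k != "prov:steps"}
    return (e2, e1, remaining)


step1 = precise_to_imprecise_1("e5", "e3", "a4", "g2", "u2", {})
step2 = imprecise_1_to_imprecise_n(*step1)
print(step1)
print(step2)
```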

If a derivation record holds fore2 ande1, then this means that the entity represented by entity record identified bye1 has an influence on the entity represented entity record identified bye2, which at the minimum implies temporal ordering, specified as follows.First, we consider one-activity derivations.

Given an activity record identified bya, entity records identified bye1 ande2, generation record identified byg2, and usage record identified byu1,if the recordwasDerivedFrom(e2,e1,a,g2,u1,attrs)orwasDerivedFrom(e2,e1,[prov:steps="1"] ∪ attrs) holds,thenthe following temporal constraint holds:theusageof entity denoted bye1precedes thegeneration ofthe entity denoted bye2.

Then, imprecise-n derivations.

Given two entity records denoted bye1 ande2,if the recordwasDerivedFrom(e2,e1,[prov:steps="n"] ∪ attrs) holds,then the following temporal constraint holds:thegeneration event of the entity denoted bye1precedes thegeneration event ofthe entity denoted bye2.

Note that temporal ordering is between generations of e1 and e2, as opposed to precise-1 derivation, which implies temporal ordering between the usage of e1 and generation of e2. Indeed, in the case of imprecise-n derivation, nothing is known about the usage of e1, since there is no associated activity.

The imprecise-1 derivation has the same meaning as the precise-1 derivation, except that an activity is known to exist, though it does not need to be asserted. This is formalized by the following inference rule, referred to as activity introduction:

If wasDerivedFrom(e2,e1) holds, then there exist an activity record identified by a, a usage record identified by u, and a generation record identified by g such that:
activity(a,aAttrs)
wasGeneratedBy(g,e2,a,gAttrs)
used(u,a,e1,uAttrs)
for sets of attribute-value pairs gAttrs, uAttrs, and aAttrs.
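
As an illustration of activity introduction, the sketch below produces the three records whose existence the rule asserts. It is a minimal Python sketch under stated assumptions: the function name and the generated identifiers (_a1, _g1, _u1, ...) are invented placeholders standing for "some" activity, generation, and usage, and the attribute sets are left as the symbolic names used in the rule.

```python
import itertools
from typing import List

_fresh = itertools.count(1)  # source of invented placeholder identifiers


def activity_introduction(e2: str, e1: str) -> List[str]:
    """Emit PROV-ASN-like text for the records implied by wasDerivedFrom(e2,e1)."""
    n = next(_fresh)
    a, g, u = f"_a{n}", f"_g{n}", f"_u{n}"  # hypothetical identifiers
    return [
        f"activity({a},aAttrs)",
        f"wasGeneratedBy({g},{e2},{a},gAttrs)",
        f"used({u},{a},{e1},uAttrs)",
    ]


records = activity_introduction("e3", "e2")
```

The rule is existential, so an application is free to represent the unknown activity in any other way; fresh identifiers are just one option.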

Note that inferring derivation from usage and generation does not hold in general. Indeed, when a generation wasGeneratedBy(g, e2, a, attrs2) precedes used(u, a, e1, attrs1), for some e1, e2, attrs1, attrs2, and a, one cannot infer derivation wasDerivedFrom(e2, e1, a, g, u) or wasDerivedFrom(e2,e1), since e2 cannot possibly be determined by e1, given that the creation of e2 precedes the use of e1.

The following property holds for accounts where generation-unicity applies. Move it to a separate section with all related material.

A further inference is permitted from the imprecise-1 derivation record:

Given an activity record identified by pe, entity records identified by e1 and e2, and set of attribute-value pairs attrs2, if wasDerivedFrom(e2,e1,[prov:steps="1"]) and wasGeneratedBy(e2,pe,attrs2) hold, then used(pe,e1,attrs1) also holds for some set of attribute-value pairs attrs1.

This inference is justified by the fact that the entity represented by the entity record identified by e2 is generated by at most one activity in a given account (see generation-unicity). Hence, this activity record is also the one referred to in the usage record of e1.

We note that the converse inference does not hold. From wasDerivedFrom(e2,e1) and used(pe,e1), one cannot derive wasGeneratedBy(e2,pe,attrs2) because identifier e1 may occur in usage records referring to many activity records, but they may not be referred to in generation records containing identifier e2.

Should derivation have a time? Which time? This is ISSUE-43.
5.3.3.3 Complementarity Record
While the working group recognizes the importance of the complementarity record concept, its name and its exact semantics are still being discussed.

A complementarity record is a relationship between two entities stated to have compatible characterization over some continuous interval between two events.

The rationale for introducing this relationship is that in general, at any given time, for an entity in the world, there may be multiple ways of characterizing it, and hence multiple representations can be asserted by different asserters. In the example that follows, suppose thing "Royal Society" is represented by two asserters, each using a different set of attributes. If the asserters agree that both representations refer to "The Royal Society", the question of whether any correspondence can be established between the two representations arises naturally. This is particularly relevant when (a) the sets of attributes used by the two representations overlap partially, or (b) when one set is subsumed by the other. In both these cases, we have a situation where each of the two asserters has a partial view of "The Royal Society", and establishing a correspondence between them on the shared attributes is beneficial, as in case (a) each of the two representations complements the other, and in case (b) one of the two (that with the additional attributes) complements the other.

This intuition is made more precise by considering the entities that form the representations of entities at a certain point in time. An entity record represents, by means of attribute-value pairs, a thing and its situation in the world, which remain constant over a characterization interval. As soon as the thing's situation changes, this marks the end of the characterization interval for the entity record representing it. The thing's novel situation is represented by an attribute with a new value, or an entirely different set of attribute-value pairs, embodied in another entity record, with a new characterization interval. Thus, if we overlap the timelines (or, more generally, the sequences of value-changing events) for the two entities, we can hope to establish correspondences amongst the entity records that represent them at various points along that events line. The figure below illustrates this intuition.

illustration complementOf

Relation complement-of between two entity records is intended to capture these correspondences, as follows. Suppose entity records A and B share a set P of attributes, and each of them has other attributes in addition to P. If the values assigned to each attribute in P are compatible between A and B, then we say that A is-complement-of B, and B is-complement-of A, in a symmetrical fashion. In the particular case where the set of attributes of B is a strict superset of A's attributes, then we say that B is-complement-of A, but the opposite does not hold; in this case, the relation is not symmetric. As a special case, A and B may not share any attributes at all, and yet the asserters may still stipulate that they are representing the same thing "Royal Society"; the symmetric relation may hold trivially in this case.

The term compatible used above means that a mapping can be established amongst the values of attributes in P found in the two entity expressions. This generalizes to the case where attribute sets P1 and P2 of A and B, respectively, are not identical but can be mapped to one another. The simplest case is the identity mapping, in which A and B share attribute set P, and furthermore the values assigned to attributes in P match exactly.

It is important to note that the relation holds only for the characterization intervals of the entity expressions involved. As soon as one attribute changes value in one of them, new correspondences need to be found amongst the new entities. Thus, the relation has a validity span that can be expressed in terms of the event lines of the entity.

A complementarity record is written wasComplementOf(e2,e1), where e1 and e2 are two identifiers denoting entity records.

The following example illustrates the entity "Royal Society" and its perspectives at various points in time.

entity(rs,[ex:created=1870])
entity(rs_l1,[prov:location="loc2"])
entity(rs_l2,[prov:location="The Mall"])
entity(rs_m1,[ex:membership=250, ex:year=1900])
entity(rs_m2,[ex:membership=300, ex:year=1945])
entity(rs_m3,[ex:membership=270, ex:year=2010])
wasComplementOf(rs_m3, rs_l2)
wasComplementOf(rs_m2, rs_l1)
wasComplementOf(rs_m2, rs_l2)
wasComplementOf(rs_m1, rs_l1)
wasComplementOf(rs_m3, rs)
wasComplementOf(rs_m2, rs)
wasComplementOf(rs_m1, rs)
wasComplementOf(rs_l1, rs)
wasComplementOf(rs_l2, rs)
An assertion "wasComplementOf(B,A)" holds over the temporal intersection of A and B, only if:
  1. if a mapping can be established from an attribute X of entity record identified by B to an attribute Y of entity record identified by A, then the values of A and B must be consistent with that mapping;
  2. entity record identified by B has some attribute that entity record identified by A does not have.
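
The two necessary conditions above can be checked mechanically once a mapping between attributes is given. The following Python sketch is illustrative only: the function name may_complement, the dictionary encoding of entity records, and the reading of "consistent" as plain equality of mapped values are all assumptions, and the temporal intersection of the characterization intervals is not checked.

```python
def may_complement(b: dict, a: dict, mapping: dict) -> bool:
    """Check the two necessary conditions for wasComplementOf(B,A).

    b, a:     attribute-value pairs of the entity records identified by B and A.
    mapping:  maps attribute names of B to attribute names of A
              (the identity on shared attributes in the simplest case).
    """
    # Condition 1: values related by the mapping must be consistent.
    # Assumption: "consistent" is interpreted here as equality.
    for x, y in mapping.items():
        if x in b and y in a and b[x] != a[y]:
            return False
    # Condition 2: B has some attribute that A does not have.
    return any(attr not in a for attr in b)


rs = {"ex:created": 1870}
rs_m1 = {"ex:membership": 250, "ex:year": 1900}
```

Because both conditions are only necessary, a True result means the assertion is not ruled out, not that it holds.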

The complementarity relation is not transitive. Let us consider identifiers e1, e2, and e3 identifying three entity records such that wasComplementOf(e3,e2) and wasComplementOf(e2,e1) hold. The record wasComplementOf(e3,e1) may not hold because the characterization intervals of the denoted entity records may not overlap.

In PROV-ASN, a complementarity record's text matches the complementarityRecord production of the grammar defined in this specification document.

complementarityRecord ::= wasComplementOf(eIdentifier, eIdentifier optional-attribute-values)
| wasComplementOf(eIdentifier, accIdentifier, eIdentifier, accIdentifier optional-attribute-values)

An entity record identifier can optionally be accompanied by an account identifier. When this is the case, it becomes possible to link two entity record identifiers that appear in different accounts. (In particular, the entity record identifiers in two different accounts are allowed to be the same.) When account identifiers are not available, then the linking of entity records through complementarity can only take place within the scope of a single account.

In the following example, the same description of the Royal Society is structured according to two different accounts. In the second account, we find a complementarity record linking rs_m1 in account ex:acc2 to rs in account ex:acc1.

account(ex:acc1,
        http://example.org/asserter1,
    ...
    entity(rs,[ex:created=1870])
    ...
    )
account(ex:acc2,
        http://example.org/asserter2,
    ...
    entity(rs_m1,[ex:membership=250, ex:year=1900])
    ...
    wasComplementOf(rs_m1, ex:acc2, rs, ex:acc1))
It is suggested that the name 'wasComplementOf' does not capture the meaning of this relation adequately. No concrete suggestion has been made so far. Furthermore, there is a suggestion that an alternative relation that is transitive may also be useful. This is raised in the following email.
A discussion on alternative definition of wasComplementOf has not reached a satisfactory conclusion yet. This is ISSUE-29.
Comments on ivpof in ISSUE-57.

5.3.4 Annotation Record

An annotation record establishes a link between an identifiable PROV-DM record and a note record referred to by its identifier. Multiple note records can be associated with a given PROV-DM record; symmetrically, multiple PROV-DM records can be associated with a given note record. Since note records have identifiers, they can also be annotated. The annotation mechanism (with note record and the annotation record) forms a key aspect of the extensibility mechanism of PROV-DM (see extensibility section).

An annotation record, written hasAnnotation(r,n,attrs) in PROV-ASN, has the following constituents:

  • record: an identifier r of the record being annotated;
  • note: an identifier n of a note record;
  • attributes: an optional set attrs of attribute-value pairs to further describe this record.

In PROV-ASN, an annotation record's text matches the annotationRecord production of the grammar defined in this specification document.

annotationRecord ::= hasAnnotation(identifier, nIdentifier optional-attribute-values)

The interpretation of notes is application-specific. See Section Note for a discussion of the difference between note attributes and other record attributes. We also note the present tense in this term to indicate that it may not denote something in the past.

The following records

entity(e1,[prov:type="document"])
entity(e2,[prov:type="document"])
activity(a,transform,t1,t2,[])
used(u1,a,e1,[ex:file="stdin"])
wasGeneratedBy(e2, a, [ex:file="stdout"])
note(n1,[ex:icon="doc.png"])
hasAnnotation(e1,n1)
hasAnnotation(e2,n1)
note(n2,[ex:style="dotted"])
hasAnnotation(u1,n2)

assert the existence of two documents in the world (attribute-value pair: prov:type="document") represented by entity records identified by e1 and e2, and annotate these records with a note indicating that the icon (an application-specific way of rendering provenance) is doc.png. They also assert an activity, its usage of the first entity, and its generation of the second entity. The usage record is annotated with a style (an application-specific way of rendering this edge graphically). To be able to express this annotation, the usage record was provided with an identifier u1, which was then referred to in hasAnnotation(u1,n2).

5.4 Bundle

In this section, two constructs are introduced to group PROV-DM records. The first one, account record, is itself a record, whereas the second one, record container, is not.

5.4.1 Account Record

In PROV-DM, an account record is a wrapper of records with a dual purpose:

  • It is the mechanism by which attribution of provenance can be asserted; it allows asserters to bundle up their assertions, and assert suitable attribution;
  • It provides a scoping mechanism for record identifiers and for some constraints (such as generation-unicity and derivation-use).

An account record, written account(id, assertIRI, recs, attrs) in PROV-ASN, contains:

  • id: an identifier id that identifies this account globally;
  • asserter: an IRI, denoted by assertIRI, to identify an asserter; such IRI has no specific interpretation in the context of PROV-DM;
  • records: a set recs of provenance records;
  • attributes: an optional set attrs of attribute-value pairs to further describe this record.

In PROV-ASN, an account record's text matches the accountRecord production of the grammar defined in this specification document.

accountRecord ::= account(identifier, asserter, record optional-attribute-values)
Currently, the non-terminal asserter is defined as IRI and its interpretation is outside PROV-DM. We may want the asserter to be an agent instead, and therefore use PROV-DM to express the provenance of PROV-DM assertions. The editors seek inputs on how to resolve this issue.

The following account record

account(ex:acc0,
        http://example.org/asserter,
          entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
          ...
          wasDerivedFrom(e2,e1)
          ...
          activity(a0,create-file,t)
          ...
          wasGeneratedBy(e0,a0,[])
          ...
          wasAssociatedWith(a4, ag5, [prov:role="communicator"])  )

contains the set of provenance records of section example-prov-asn-encoding, is asserted by agent http://example.org/asserter, and is identified by identifier ex:acc0.

Account records constitute a scope for record identifiers. A record identifier within the scope of an account is intended to denote a single record. However, nothing prevents an asserter from asserting an account containing, for example, multiple entity records with the same identifier but different attribute-values. In that case, they should be understood as a single entity record with this identifier and the union of all attribute-values, as formalized in identified-entity-in-account.

Given an entity record identifier e, two sets of attribute-values denoted by av1 and av2, two entity records entity(e,av1) and entity(e,av2) occurring in an account are equivalent to the entity record entity(e,av) where av is the set of attribute-value pairs formed by the union of av1 and av2.

Whilst constraint identified-entity-in-account specifies how to understand multiple entity records with the same identifier within a given account, it does not guarantee that the entity record formed with the union of all attribute-value pairs is consistent. Indeed, a given attribute may be assigned multiple values, resulting in an inconsistent entity record, as illustrated by the following example.

In the following account record, we find two entity records with the same identifier e.

account(ex:acc1,
        http://example.org/id,
          entity(e,[prov:type="person", ex:age=20])
          entity(e,[prov:type="person", ex:age=30])
          ...)

Application of identified-entity-in-account results in an entity record containing the attribute-value pairs age=20 and age=30. This results in an inconsistent characterization of a person. We note that deciding whether a set of attribute-values is consistent or not is application-specific and outside the scope of this specification.
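
The union prescribed by identified-entity-in-account, and the detection of attributes that end up with several values, can be sketched as follows. This is a minimal Python illustration, assuming a hypothetical encoding of entity records as (identifier, list of attribute-value pairs); the function names are not part of PROV-DM. Flagging a multi-valued attribute does not by itself decide inconsistency, since that decision is application-specific.

```python
from collections import defaultdict


def merge_entities(records):
    """records: iterable of (identifier, [(attribute, value), ...]) pairs."""
    merged = defaultdict(set)
    for ident, pairs in records:
        merged[ident].update(pairs)  # union of attribute-value pairs
    return dict(merged)


def multi_valued(merged):
    """Return {identifier: {attribute: values}} for attributes with several values."""
    report = {}
    for ident, pairs in merged.items():
        by_attr = defaultdict(set)
        for attr, value in pairs:
            by_attr[attr].add(value)
        clashes = {a: v for a, v in by_attr.items() if len(v) > 1}
        if clashes:
            report[ident] = clashes
    return report


acc1 = [
    ("e", [("prov:type", "person"), ("ex:age", 20)]),
    ("e", [("prov:type", "person"), ("ex:age", 30)]),
]
merged = merge_entities(acc1)
candidates = multi_valued(merged)
```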

Account records can be nested since an account record can occur among the records being wrapped by another account.

An account is said to be well-formed if it satisfies the constraints generation-unicity and derivation-use.

The union of two accounts is another account, containing the unions of their respective records, where records with the same identifier should be understood according to constraint identified-entity-in-account. Well-formed accounts are not closed under union because the constraint generation-unicity may no longer be satisfied in the resulting union.

Indeed, let us consider another account record

account(ex:acc2,
        http://example.org/asserter2,
          entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
          ...
          activity(a1,create-file,t1)
          ...
          wasGeneratedBy(e0,a1,[ex:fct="create"])
          ... )

with identifier ex:acc2, containing assertions by asserter http://example.org/asserter2 stating that the entity represented by the entity record identified by e0 was generated by an activity represented by the activity record identified by a1 instead of a0 in the previous account ex:acc0. If accounts ex:acc0 and ex:acc2 are merged together, the resulting set of records violates generation-unicity.
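
The loss of well-formedness under union can be illustrated with a small check. The Python sketch below assumes a hypothetical encoding of generation records as (entity identifier, activity identifier) pairs and reads generation-unicity as "an entity is generated by at most one activity in a given account"; it is not a normative formulation of the constraint.

```python
def violates_generation_unicity(generations):
    """generations: iterable of (entity_id, activity_id) pairs from one account.

    Returns the set of entity identifiers generated by more than one activity.
    """
    seen = {}
    offenders = set()
    for entity, activity in generations:
        # setdefault records the first generating activity for each entity
        if seen.setdefault(entity, activity) != activity:
            offenders.add(entity)
    return offenders


acc0 = [("e0", "a0")]  # wasGeneratedBy(e0,a0,[]) in ex:acc0
acc2 = [("e0", "a1")]  # wasGeneratedBy(e0,a1,[ex:fct="create"]) in ex:acc2
union = acc0 + acc2
```

Each account passes the check on its own, whereas their union reports e0.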

Account records constitute a scope for record identifiers. Since accounts can be nested, scopes can also be nested; thus, the scope of record identifiers should be understood in the context of such nested scopes. When a record with an identifier occurs directly within an account, then its identifier denotes this record in the scope of this account, except in sub-accounts where records with the same identifier occur.

The following account record is inspired from section example-prov-asn-encoding. This account, identified by ex:acc3, declares an entity record with identifier e0, which is being referred to in the nested account ex:acc4. The scope of identifier e0 is account ex:acc3, including subaccount ex:acc4.

account(ex:acc3,
        http://example.org/asserter1,
          entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
          activity(a0,create-file,t)
          wasGeneratedBy(e0,a0,[])
          account(ex:acc4,
                  http://example.org/asserter2,
                    entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
                    activity(a0,copy-file,t)
                    wasGeneratedBy(e1,a0,[ex:fct="create"])
                    wasComplementOf(e1,e0)))

In contrast, an activity record identified by a0 occurs in each of the two accounts. Each activity record is therefore asserted in a separate scope, and the two may represent different activities in the world.
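
Nested identifier scopes behave much like nested variable scopes in a programming language. As an informal illustration, the following Python sketch models accounts ex:acc3 and ex:acc4 with collections.ChainMap; the choice of ChainMap and the abbreviated record strings are assumptions made for this example only.

```python
from collections import ChainMap

# Outer account ex:acc3: records declared directly in its scope.
acc3 = ChainMap({
    "e0": "entity(e0, ...)",
    "a0": "activity(a0,create-file,t)",
})

# Nested account ex:acc4: its own declarations shadow those of ex:acc3.
acc4 = acc3.new_child({
    "e1": "entity(e1, ...)",
    "a0": "activity(a0,copy-file,t)",
})

resolved_e0 = acc4["e0"]        # found in the enclosing scope ex:acc3
resolved_a0_inner = acc4["a0"]  # the record declared in ex:acc4
resolved_a0_outer = acc3["a0"]  # the record declared in ex:acc3
```

Lookup from the inner account falls back to the outer one for e0, while a0 resolves to a different record in each scope.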

The account record is the hook by which further meta information can be expressed about provenance, such as asserter, time of creation, signatures. The annotation mechanism can be used for this purpose, but how general meta-information is expressed is beyond the scope of this specification, except for asserters.

5.4.2 Record Container

A record container is a house-keeping construct of PROV-DM, also capable of bundling PROV-DM records. A record container is not a record, but can be exploited to return assertions in response to a request for the provenance of something ([PROV-PAQ]).

A record container, written container decls recs endContainer in PROV-ASN, contains:

  • namespaceDeclarations: a set decls of namespace declarations, declaring namespaces and associated prefixes, which can be used in attributes and identifiers occurring inside recs;
  • records: a non-empty set of records recs.

All the records in recs are implicitly wrapped in a default account, scoping all the record identifiers they declare directly, and constituting a top-level account in the hierarchy of accounts. Consequently, every provenance record is always expressed in the context of an account, either explicitly in an asserted account, or implicitly in a container's default account.

In PROV-ASN, a record container's text matches the recordContainer production of the grammar defined in this specification document.

recordContainer ::= container namespaceDeclarations record endContainer

The following container

container
  prefix ex: http://example.org/,
  account(ex:acc1,http://example.org/asserter1,...)
  account(ex:acc2,http://example.org/asserter1,...)
endContainer

illustrates how two accounts with identifiers ex:acc1 and ex:acc2 can be returned in a PROV-ASN serialization of the provenance of something.

Asserter needs to be defined. This is ISSUE-51.
Scope and Identifiers. This is ISSUE-81.

5.5 Further Terms in Records

This section defines further terms used in PROV-DM records.

5.5.1 Attribute

An attribute is a qualified name. A qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part (see detailed rule in [RDF-SPARQL-QUERY], Section 4.1.1).

attribute ::= qualifiedName
qualifiedName ::= prefixedName | unprefixedName
prefixedName ::= prefix:localPart
unprefixedName ::= localPart
prefix ::= a name without colon compatible with the NC_NAME production [XML-NAMES]
localPart ::= a name without colon compatible with the NC_NAME production [XML-NAMES]

A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in the record container. In the absence of prefix, the qualified name refers to the default namespace declared in the container.
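
The mapping from a qualified name to an IRI amounts to a lookup followed by a concatenation. A minimal Python sketch is given here for illustration; the function name expand, its parameters, and the default namespace IRI used in the example are assumptions, not part of PROV-ASN.

```python
from typing import Dict, Optional


def expand(qname: str, prefixes: Dict[str, str], default: Optional[str] = None) -> str:
    """Map a qualified name to an IRI by concatenating namespace IRI and local part."""
    if ":" in qname:
        # prefix and localPart contain no colon, so the first colon separates them
        prefix, local = qname.split(":", 1)
        return prefixes[prefix] + local  # raises KeyError if the prefix is undeclared
    if default is None:
        raise ValueError("no default namespace declared")
    return default + qname


namespaces = {"ex": "http://example.org/"}
prefixed = expand("ex:acc1", namespaces)
unprefixed = expand("e0", namespaces, default="http://example.org/default#")
```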

Note that XML NC_NAME does not allow local identifiers to start with a number. Instead, should we use the productions used in SPARQL or TURTLE?

From this specification's viewpoint, the interpretation of an attribute declared in a namespace other than prov-dm is out of scope.

The PROV data model introduces a fixed set of attributes in the PROV-DM namespace:

  • The attribute prov:role denotes the function of an entity with respect to an activity, in the context of a usage, generation, activity association, start, or end record. The value associated with a prov:role attribute must be conformant with Literal.

    The following start record describes the role of the agent identified by ag in this start relation with activity a.

       wasStartedBy(a,ag, [prov:role="program-operator"])
  • The attribute prov:type provides further typing information for the element or relation asserted in the record. The value associated with a prov:type attribute must be conformant with Literal.

    The following record declares an agent of type software agent

       agent(ag, [prov:type="prov:SoftwareAgent" %% xsd:QName])
  • The attribute prov:steps defines the level of precision associated with a derivation record. The value associated with a prov:steps attribute must be "1" or "n".

5.5.2 Identifier

An identifier is a qualified name. A qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part (see detailed rule in [RDF-SPARQL-QUERY], Section 4.1.1).

identifier ::= qualifiedName
eIdentifier ::= identifier (intended to denote an entity record)
aIdentifier ::= identifier (intended to denote an activity record)
agIdentifier ::= identifier (intended to denote an agent record)
gIdentifier ::= identifier (intended to denote a generation record)
uIdentifier ::= identifier (intended to denote a usage record)
nIdentifier ::= identifier (intended to denote a note record)
accIdentifier ::= identifier (intended to denote an account record)

5.5.3 Literal

A PROV-DM Literal represents a data value such as a particular string or number. A PROV-DM Literal represents a value whose interpretation is outside the scope of PROV-DM.

In PROV-ASN, a Literal's text matches the Literal production of the grammar defined in this specification document.

Literal ::= typedLiteral | convenienceNotation
typedLiteral ::= quotedString %% datatype
datatype ::= qualifiedName
convenienceNotation ::= stringLiteral | intLiteral
stringLiteral ::= quotedString
quotedString ::= a finite sequence of characters in which " (U+22) and \ (U+5C) occur only in pairs of the form \" (U+5C, U+22) and \\ (U+5C, U+5C), enclosed in a pair of " (U+22) characters
intLiteral ::= a finite-length sequence of decimal digits (#x30-#x39) with an optional leading negative sign (-)

The non-terminals stringLiteral and intLiteral are syntactic sugar for quoted strings with datatype xsd:string and xsd:int, respectively.

In particular, a PROV-DM Literal may be an IRI-typed string (with datatype xsd:anyURI); such IRI has no specific interpretation in the context of PROV-DM.

The following examples respectively are the string "abc" (expressed using the convenience notation), the string "abc", the integer number 1, the integer number 1 (expressed using the convenience notation) and the IRI "http://example.org/foo".

  "abc"
  "abc" %% xsd:string
  "1" %% xsd:int
  1
  "http://example.org/foo" %% xsd:anyURI
The following example shows a literal of type xsd:QName (see QName [XMLSCHEMA-2]). The prefix ex must be bound to a namespace declared in the record container.
  "ex:value" %% xsd:QName
Should we define structural equivalence of literals as in OWL2? [OWL2-SYNTAX] (see section Literals).
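
To illustrate how the convenience notation relates to typed literals, the following Python sketch rewrites a literal's text into a (quoted string, datatype) pair. It is an informal reading of the Literal production: the regular expressions, the function name normalize, and the tolerance for whitespace around %% are assumptions made for this example.

```python
import re

# quotedString: " and \ may only occur escaped as \" and \\ inside the quotes
_QUOTED = r'"(?:[^"\\]|\\["\\])*"'
_TYPED = re.compile(r'^(' + _QUOTED + r')\s*%%\s*(\S+)$')
_STRING = re.compile(r'^' + _QUOTED + r'$')
_INT = re.compile(r'^-?[0-9]+$')


def normalize(text: str):
    """Return (quoted_string, datatype) for the text of a PROV-ASN literal."""
    text = text.strip()
    m = _TYPED.match(text)
    if m:
        return m.group(1), m.group(2)
    if _STRING.match(text):
        return text, "xsd:string"     # stringLiteral is sugar for xsd:string
    if _INT.match(text):
        return f'"{text}"', "xsd:int"  # intLiteral is sugar for xsd:int
    raise ValueError(f"not a literal: {text!r}")


pairs = [normalize(t) for t in ('"abc"', '"abc" %% xsd:string', '1', '"1" %% xsd:int')]
```

Under this reading, the first two example literals normalize to the same pair, as do the last two.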

5.5.4 Time

Time instants are defined according to xsd:dateTime [XMLSCHEMA-2].

It is optional to assert time in usage, generation, and activity records.

5.5.5 Asserter

An asserter is a creator of PROV-DM records. An asserter is denoted by an IRI. Such IRI has no specific interpretation in the context of PROV-DM.

asserter ::= IRI
IRI ::= an IRI compatible with production IRI in [IRI], enclosed in a pair of < (U+3C) and > (U+3E) characters
Currently, the non-terminal asserter is defined as IRI. We may want the asserter to be an agent instead, and therefore use PROV-DM to express the provenance of PROV-DM. We seek inputs on how to resolve this issue.

5.5.6 Namespace Declaration

A PROV-DM namespace is identified by an IRI reference [IRI]. In PROV-DM, attributes, identifiers, and literals with datatype xsd:QName can be placed in a namespace using the mechanisms described in this specification.

A namespace declaration consists of a binding between a prefix and a namespace. Every qualified name with this prefix in the scope of this declaration refers to this namespace. A default namespace declaration consists of a namespace. Every unprefixed qualified name in the scope of this default namespace declaration refers to this namespace.

namespaceDeclarations ::= | defaultNamespaceDeclaration | namespaceDeclaration namespaceDeclaration
namespaceDeclaration ::= prefix prefix IRI
defaultNamespaceDeclaration ::= default IRI

5.5.7 Recipe Link

A recipe link is an association between an activity record and a process specification that underpins the represented activity. A recipe link is expressed as an IRI; such IRI has no specific interpretation in the context of PROV-DM.

recipeLink ::= IRI

It is optional to assert recipe links in activities.

Process specifications, as referred to by recipe links, are out of scope of this specification.

By defining a recipe link as an IRI whose interpretation is out of scope of PROV-DM, we don't allow it to refer to an entity (in an inter-operable manner). Is this what we intend?
Simplify the references to recipe link. This is ISSUE-131.

5.5.8 Location

Location is an identifiable geographic place (ISO 19112). As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, row, column, and so forth. This document does not specify how to concretely express locations, but instead provides a mechanism to introduce locations in assertions.

Location is an optional attribute of entity records and activity records. The value associated with the attribute location must be a Literal, expected to denote a location.

6. PROV-DM Common Relations

This section contains the normative specification of common relations of PROV-DM.

We have defined a set of common relations, in response to ISSUE-44. Is this set complete?
The types of these relations need to be made explicit.

The following figure summarizes the additional relations described in subsections 6.2 onwards.

common relations

6.1 Collections

The purpose of this section is to enable modelling of part-of relationships amongst entities. In particular, a form of collection entity type is introduced, consisting of a set of key-value pairs. Key-value pairs provide a generic indexing structure that can be used to model commonly used data structures, including associative lists (also known as "dictionaries" in some programming languages), relational tables, ordered lists, and more.
The relations introduced here are all specializations of the wasDerivedFrom relation, specifically precise-1 or imprecise-1. They are designed to model:
A collection record is defined as follows.
collectionRecord ::= collectionDerivationRecord | keyDerivationRecord | entityMembershipRecord
collectionDerivationRecord ::= wasAddedTo_Coll(identifier, identifier) | wasRemovedFrom_Coll(identifier, identifier)
keyDerivationRecord ::= wasAddedTo_Key(identifier, identifier) | wasRemovedFrom_Key(identifier, identifier)
entityMembershipRecord ::= wasAddedTo_Entity(identifier, identifier)

Record: wasAddedTo_Coll(c2,c1) (resp. wasRemovedFrom_Coll(c2,c1)) denotes that collection c2 is an updated version of collection c1, following an insertion (resp. deletion) operation.

Record: wasAddedTo_Key(c,k) (resp. wasRemovedFrom_Key(c,k)) denotes that collection c had a new value with key k added to (resp. removed from) it.

Record: wasAddedTo_Entity(c,e) denotes that collection c had entity e added to it.

Consider the following assertions:

wasAddedTo_Coll(c2,c1)
wasAddedTo_Key(c2,k1)
wasAddedTo_Entity(c2,e1)
wasAddedTo_Coll(c3,c2)
wasAddedTo_Key(c3,k2)
wasAddedTo_Entity(c3,e2)
wasRemovedFrom_Coll(c4,c3)
wasRemovedFrom_Key(c4,k1)

The corresponding graphical representation is shown below.

collections

With these assertions:

  • c2 is known to contain the key-value pair (k1,e1).
  • c3 is known to contain pairs (k1,e1) and (k2,e2).
  • c4 is known not to contain pair (k1,e1) and to contain pair (k2,e2).
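
The conclusions listed above can be reproduced by replaying the assertions in order. The Python sketch below is an illustration under several assumptions: records are encoded as hypothetical (relation, subject, object) triples, each key record precedes its matching entity record as in the example, and the contents of the initial collection c1 (which the assertions leave unknown) are taken to be empty.

```python
def replay(records, initial):
    """Replay collection records and return {collection_id: {key: entity}}.

    records: list of (relation, subject, object) triples in assertion order.
    initial: assumed contents of collections that are not derived from others.
    """
    state = {c: dict(kv) for c, kv in initial.items()}
    pending_key = {}
    for rel, subj, obj in records:
        if rel in ("wasAddedTo_Coll", "wasRemovedFrom_Coll"):
            state[subj] = dict(state[obj])      # new version starts as a copy
        elif rel == "wasAddedTo_Key":
            pending_key[subj] = obj             # wait for the matching entity
        elif rel == "wasAddedTo_Entity":
            state[subj][pending_key.pop(subj)] = obj
        elif rel == "wasRemovedFrom_Key":
            state[subj].pop(obj, None)
    return state


assertions = [
    ("wasAddedTo_Coll", "c2", "c1"),
    ("wasAddedTo_Key", "c2", "k1"),
    ("wasAddedTo_Entity", "c2", "e1"),
    ("wasAddedTo_Coll", "c3", "c2"),
    ("wasAddedTo_Key", "c3", "k2"),
    ("wasAddedTo_Entity", "c3", "e2"),
    ("wasRemovedFrom_Coll", "c4", "c3"),
    ("wasRemovedFrom_Key", "c4", "k1"),
]
collections_state = replay(assertions, {"c1": {}})
```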

6.2 Traceability Record

A traceability record states the existence of a "dependency path" between two entities, indicating that one entity can be shown to be in the lineage of another, and may have influenced it in some way. This relation is transitive.

A traceability record, written tracedTo(id,e2,e1,attrs) in PROV-ASN:

In PROV-ASN, a traceability record's text matches the traceabilityRecord production of the grammar defined in this specification document.

traceabilityRecord ::= tracedTo(identifier, eIdentifier, eIdentifier optional-attribute-values)

A traceability record can be inferred from existing relations, or can be asserted stating that such a dependency path exists without the asserter knowing its individual steps, as expressed by the following constraints.

Given two identifiers e2 and e1 identifying entity records, the following statements hold:
  1. If wasDerivedFrom(e2,e1,a,g2,u1) holds, for some a, g2, u1, then tracedTo(e2,e1) also holds.
  2. If wasDerivedFrom(e2,e1) holds, then tracedTo(e2,e1) also holds.
  3. If wasGeneratedBy(e2,a,gAttr) and wasAssociatedWith(a,e1) hold, for some a and gAttr, then tracedTo(e2,e1) also holds.
  4. If wasGeneratedBy(e2,a,gAttr), wasAssociatedWith(a,e) and actedOnBehalfOf(e,e1) hold, for some a, e, and gAttr, then tracedTo(e2,e1) also holds.
  5. If wasGeneratedBy(e2,a,gAttr) and wasStartedBy(a,e1,sAttr) hold, for some a, gAttr, and sAttr, then tracedTo(e2,e1) also holds.
  6. If tracedTo(e2,e) and tracedTo(e,e1) hold for some e, then tracedTo(e2,e1) also holds.
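
Rules 1 to 6 can be applied mechanically: rules 1 to 5 produce direct tracedTo pairs, and rule 6 closes them under transitivity. The following Python sketch illustrates this; the encoding of each relation as a set of identifier pairs (with attributes and record identifiers omitted) and the function name traced_to are assumptions made for the example.

```python
def traced_to(derived, generated, associated, behalf, started):
    """Infer tracedTo pairs (e2, e1) from sets of identifier pairs.

    derived:    {(e2, e1)}  wasDerivedFrom, precise or imprecise
    generated:  {(e, a)}    wasGeneratedBy
    associated: {(a, ag)}   wasAssociatedWith
    behalf:     {(ag, ag2)} actedOnBehalfOf
    started:    {(a, e1)}   wasStartedBy
    """
    edges = set(derived)                                  # rules 1 and 2
    for e2, a in generated:
        for a2, ag in associated:
            if a2 == a:
                edges.add((e2, ag))                       # rule 3
                edges |= {(e2, e1) for e, e1 in behalf if e == ag}  # rule 4
        edges |= {(e2, e1) for a2, e1 in started if a2 == a}        # rule 5
    while True:                                           # rule 6: transitivity
        new = {(x, z) for x, y in edges for y2, z in edges if y == y2} - edges
        if not new:
            return edges
        edges |= new


traces = traced_to(
    derived={("e5", "e3"), ("e3", "e2")},
    generated={("e2", "a1")},
    associated={("a1", "ag1")},
    behalf={("ag1", "ag2")},
    started=set(),
)
```

An asserted tracedTo record with no known underlying steps would simply be added to the initial set of edges before the closure is computed.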
If the record tracedTo(r2,r1) holds for two identifiers r2 and r1 identifying entity records, then there exist e_0, e_1, ..., e_n for n≥1, with e_0=r2 and e_n=r1, and for any i such that 0≤i≤n-1, at least one of the following statements holds:
  • wasDerivedFrom(e_i,e_{i+1},a,g2,u1) holds, for some a, g2, u1, or
  • wasDerivedFrom(e_i,e_{i+1}) holds, or
  • wasBasedOn(e_i,e_{i+1}) holds, or
  • wasGeneratedBy(e_i,a,gAttr) and wasAssociatedWith(a,e_{i+1}) hold, for some a and gAttr, or
  • wasGeneratedBy(e_i,a,gAttr), wasAssociatedWith(a,e) and actedOnBehalfOf(e,e_{i+1}) hold, for some a, e and gAttr, or
  • wasGeneratedBy(e_i,a,gAttr) and wasStartedBy(a,e_{i+1},sAttr) hold, for some a, gAttr, and sAttr.

We note that the previous constraint is not really an inference rule, since there is nothing that we can actually infer. Instead, this constraint should simply be seen as part of the definition of the traceability record.

6.3 Activity Ordering Record

PROV-DM allows dependencies amongst activities to be expressed. An information flow ordering record is a representation that an entity was generated by an activity, before it was used by another activity. A control ordering record is a representation that an activity was initiated by another activity.

In PROV-ASN, an activity ordering record's text matches the activityOrderingRecord production of the grammar defined in this specification document.

activityOrderingRecord ::= informationFlowOrderingRecord | controlOrderingRecord
informationFlowOrderingRecord ::= wasInformedBy(identifier, aIdentifier, aIdentifier optional-attribute-values)
controlOrderingRecord ::= wasStartedBy(identifier, aIdentifier, aIdentifier optional-attribute-values)

An information flow ordering record, written as wasInformedBy(id,a2,a1,attrs) in PROV-ASN, contains:

An information flow ordering record is formally defined as follows.

Given two activity records identified by a1 and a2, the record wasInformedBy(a2,a1) holds, if and only if there is an entity record identified by e and sets of attribute-value pairs attrs1 and attrs2, such that wasGeneratedBy(e,a1,attrs1) and used(a2,e,attrs2) hold.
Given two activity records denoted by a1 and a2, if the record wasInformedBy(a2,a1) holds, then the following temporal constraint holds: the start event of the activity record denoted by a1 precedes the end event of the activity record denoted by a2.
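
As an illustration of the definition and of the temporal constraint, the following Python fragment derives wasInformedBy pairs from generation and usage records, then checks the ordering of activity intervals. It is a sketch only: the function names, the pair-based encoding of records, and the numeric timestamps are assumptions, and "precedes" is assumed to allow equal timestamps.

```python
def was_informed_by(generated_by, used):
    """Derive the (a2, a1) pairs for which wasInformedBy(a2,a1) holds.

    generated_by: set of (e, a1) pairs, one per wasGeneratedBy(e,a1,attrs1)
    used:         set of (a2, e) pairs, one per used(a2,e,attrs2)
    """
    return {(a2, a1)
            for (e, a1) in generated_by
            for (a2, used_e) in used
            if e == used_e}


def ordering_violations(informed, interval):
    """Return the (a2, a1) pairs where start(a1) does not precede end(a2).

    interval maps an activity identifier to a (start, end) pair of numbers.
    Assumption: "precedes" is satisfied when the two timestamps are equal.
    """
    return {(a2, a1)
            for (a2, a1) in informed
            if interval[a1][0] > interval[a2][1]}


generated_by = {("e", "a1")}
used = {("a2", "e")}
informed = was_informed_by(generated_by, used)
print(informed)                                                     # {('a2', 'a1')}
print(ordering_violations(informed, {"a1": (0, 5), "a2": (3, 9)}))  # set()
```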

The relationship wasInformedBy is not transitive. Indeed, consider the following records.

wasInformedBy(a2,a1)
wasInformedBy(a3,a2)

We cannot infer wasInformedBy(a3,a1) from them. Indeed, from wasInformedBy(a2,a1), we know that there exists e1 such that e1 was generated by a1 and used by a2. Likewise, from wasInformedBy(a3,a2), we know that there exists e2 such that e2 was generated by a2 and used by a3. The following illustration shows a case where transitivity cannot hold. The horizontal axis represents time. We see that e1 was generated after e2 was used. Furthermore, the illustration also shows that a3 completes before a1. So it is impossible for a3 to have used an entity generated by a1.

[Figure: non-transitivity of wasInformedBy]
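
The counter-example can also be replayed on a small timeline. In the Python sketch below, the event times are hypothetical values chosen to match the scenario described above (e1 generated after e2 was used), and the dictionary encoding and the function name informed_pairs are assumptions made for illustration.

```python
# Hypothetical event times (arbitrary units) matching the scenario above.
generated_at = {("e1", "a1"): 5, ("e2", "a2"): 2}  # (entity, generating activity) -> time
used_at = {("a2", "e1"): 6, ("a3", "e2"): 3}       # (using activity, entity) -> time


def informed_pairs(generated_at, used_at):
    """wasInformedBy(x,y) holds iff x used some entity generated by y."""
    return {(user, maker)
            for (entity, maker) in generated_at
            for (user, used_entity) in used_at
            if entity == used_entity}


pairs = informed_pairs(generated_at, used_at)
print(("a2", "a1") in pairs, ("a3", "a2") in pairs)        # True True
print(("a3", "a1") in pairs)                               # False
# e1 was generated after e2 was used, so a3 cannot have relied on e1.
print(generated_at[("e1", "a1")] > used_at[("a3", "e2")])  # True
```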
The relation wasScheduledAfter was dropped and replaced by a simpler relation wasStartedBy(a2,a1). It is intentional that the name wasStartedBy is overloaded.

A control ordering record, written as wasStartedBy(a2,a1) in PROV-ASN, contains:

Such a record states control ordering between a2 and a1, specified as follows.

Given two activity records identified by a1 and a2, the record wasStartedBy(a2,a1) holds if and only if there exists an entity record identified by e and some attributes gAttr and sAttr, such that wasGeneratedBy(e,a1,gAttr) and wasStartedBy(a2,e,sAttr) hold.

In the following assertions, we find two activity records, identified by a1 and a2, representing two activities, which took place on two separate hosts. The third record indicates that the latter activity was started by the former.

activity(a1,workflow,t1,t2,[ex:host="server1.example.org"])
activity(a2,sub-workflow,t3,t4,[ex:host="server2.example.org"])
wasStartedBy(a2,a1)

Alternatively, we could have asserted the existence of an entity, representing a request to create a sub-workflow. This request, issued by a1, triggered the start of a2.

entity(e,[prov:type="creation-request"])
wasGeneratedBy(e,a1)
wasStartedBy(a2,e)
Given two activity records denoted by a1 and a2, if the record wasStartedBy(a2,a1) holds, then the following temporal constraint holds: the start event of the activity record denoted by a1 precedes the start event of the activity record denoted by a2.
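
Because the name wasStartedBy is overloaded, a processor has to look at the kind of the second argument to decide which reading applies. The following Python fragment is a minimal sketch of that disambiguation and of the inference from the entity-based form to the control ordering form. The tuple encoding of records and the function name control_ordering are assumptions; attributes and times are omitted.

```python
def control_ordering(records):
    """Infer wasStartedBy(a2,a1) from wasGeneratedBy(e,a1) and wasStartedBy(a2,e).

    records is a list of tuples such as ("activity", "a1") or
    ("wasStartedBy", "a2", "e"). A wasStartedBy record is treated as
    entity-triggered only when its second argument was declared as an entity.
    """
    kind = {r[1]: r[0] for r in records if r[0] in ("activity", "entity")}
    generated = {(r[1], r[2]) for r in records if r[0] == "wasGeneratedBy"}
    inferred = set()
    for r in records:
        if r[0] == "wasStartedBy" and kind.get(r[2]) == "entity":
            for (e, a1) in generated:
                if e == r[2]:
                    inferred.add(("wasStartedBy", r[1], a1))
    return inferred


records = [
    ("activity", "a1"),
    ("activity", "a2"),
    ("entity", "e"),
    ("wasGeneratedBy", "e", "a1"),
    ("wasStartedBy", "a2", "e"),
]
inferred = control_ordering(records)
print(inferred)  # {('wasStartedBy', 'a2', 'a1')}
```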
Suggested definition for process ordering. This is ISSUE-50.

6.4 Revision Record

A revision record is a representation of the creation of an entity considered to be a variant of another. Deciding whether something is made available as a revision of something else usually involves an agent who represents someone in the world who takes responsibility for approving that the former is a due variant of the latter.

A revision record, written wasRevisionOf(e2,e1,ag,attrs) in PROV-ASN, contains:

In PROV-ASN, a revision record's text matches the revisionRecord production of the grammar defined in this specification document.

revisionRecord ::= wasRevisionOf(eIdentifier, eIdentifier, agIdentifier optional-attribute-values)

A revision record needs to satisfy the following constraint, linking the two entity records by a derivation, and stating them to be a complement of a third entity record.

Given two identifiers old and new identifying two entities, and an identifier ag identifying an agent, if a record wasRevisionOf(new,old,ag) is asserted, then there exists an entity record identified by e and attribute-values eAttrs, dAttrs, such that the following records hold:
  • wasDerivedFrom(new,old,dAttrs);
  • entity(e,eAttrs);
  • wasComplementOf(new,e);
  • wasComplementOf(old,e).
The derivation record may be imprecise-1 or imprecise-n.
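
The constraint can be sketched as an expansion from one revision record to the four records it implies. The Python fragment below is illustrative only: the function name expand_revision, the tuple encoding, and the fresh_id parameter (the caller's choice of identifier for the entity e) are assumptions, and the attribute sets eAttrs and dAttrs are omitted.

```python
def expand_revision(new, old, ag, fresh_id):
    """Return the records implied by wasRevisionOf(new,old,ag).

    fresh_id names the entity e whose existence the constraint asserts;
    how it is minted is left to the caller (hypothetical convention).
    The agent ag does not appear in the implied records.
    """
    return [
        ("wasDerivedFrom", new, old),
        ("entity", fresh_id),
        ("wasComplementOf", new, fresh_id),
        ("wasComplementOf", old, fresh_id),
    ]


implied = expand_revision("e2", "e1", "ag", "e")
for record in implied:
    print(record)
```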

wasRevisionOf is a strict sub-relation of wasDerivedFrom since two entities e2 and e1 may satisfy wasDerivedFrom(e2,e1) without being a variant of each other.

The following revision assertion

agent(ag,[prov:type="QualityController"])
entity(e1,[prov:type="document"])
entity(e2,[prov:type="document"])
wasRevisionOf(e2,e1,ag)

states that the document represented by the entity record identified by e2 is a revision of the document represented by the entity record identified by e1; the agent denoted by ag is responsible for this new versioning of the document.

Revision should be a class not a property. This is ISSUE-48.

6.5 Attribution Record

An attribution record represents that an entity is ascribed to an agent and is compliant with the attributionRecord production.

An attribution record, written wasAttributedTo(e,ag,attr), contains the following components:

Attribution models the notion of an activity generating an entity identified by e being controlled by an agent ag, which takes responsibility for generating e. Formally, this is expressed as the following necessary condition.

In PROV-ASN, an attribution record's text matches the attributionRecord production of the grammar.

attributionRecord ::= wasAttributedTo(eIdentifier, agIdentifier optional-attribute-values)
If wasAttributedTo(e,ag) holds for some identifiers e and ag, then there exists an activity identified by pe such that the following statements hold:
activity(pe,recipe,t1,t2,attr1)
wasGeneratedBy(e,pe)
wasAssociatedWith(pe,ag,attr2)
for some sets of attribute-value pairs attr1 and attr2, and times t1 and t2.
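
One way to read this necessary condition is as a check over a set of asserted records: an attribution is backed by explicit provenance when some activity both generated the entity and was associated with the agent. The Python fragment below is a sketch under that reading; the function name attribution_supported and the pair-based encoding are assumptions, and attributes, recipe and times are ignored.

```python
def attribution_supported(e, ag, generated_by, associated_with):
    """True if some activity pe satisfies wasGeneratedBy(e,pe) and wasAssociatedWith(pe,ag).

    generated_by:    set of (entity, activity) pairs
    associated_with: set of (activity, agent) pairs
    """
    return any((pe, ag) in associated_with
               for (entity, pe) in generated_by
               if entity == e)


generated_by = {("e", "pe")}
associated_with = {("pe", "ag")}
print(attribution_supported("e", "ag", generated_by, associated_with))     # True
print(attribution_supported("e", "other", generated_by, associated_with))  # False
```

A result of False does not contradict wasAttributedTo(e,ag): the condition only states that such an activity exists, not that it has been asserted.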

6.6 Quotation Record

A quotation record is a representation of the repeating or copying of some part of an entity, compatible with the quotationRecord production.

A quotation record, written wasQuotedFrom(e2,e1,ag2,ag1,attrs), contains:

In PROV-ASN, a quotation record's text matches the quotationRecord production of the grammar.

quotationRecord ::= wasQuotedFrom(eIdentifier, eIdentifier, agIdentifier, agIdentifier optional-attribute-values)
If wasQuotedFrom(e2,e1,ag2,ag1) holds for some identifiers e2, e1, ag2, ag1, then the following records hold:
wasDerivedFrom(e2,e1)
wasAttributedTo(e2,ag2)
wasAttributedTo(e1,ag1)
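
The implication can be sketched as a set difference between the records a quotation entails and the records already asserted. The Python fragment below is illustrative only; the function names and the tuple encoding are assumptions, and attributes are omitted.

```python
def implied_by_quotation(e2, e1, ag2, ag1):
    """Records that hold whenever wasQuotedFrom(e2,e1,ag2,ag1) holds."""
    return {
        ("wasDerivedFrom", e2, e1),
        ("wasAttributedTo", e2, ag2),
        ("wasAttributedTo", e1, ag1),
    }


def missing_records(asserted, e2, e1, ag2, ag1):
    """Implied records that are not yet part of the asserted collection."""
    return implied_by_quotation(e2, e1, ag2, ag1) - set(asserted)


asserted = [("wasDerivedFrom", "e2", "e1"), ("wasAttributedTo", "e1", "ag1")]
print(missing_records(asserted, "e2", "e1", "ag2", "ag1"))
# {('wasAttributedTo', 'e2', 'ag2')}
```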

6.7 Summary Record

A summary record represents that an entity is a synopsis or abbreviation of another entity. A summary record is compliant with the summaryRecord production.

An assertion wasSummaryOf, written wasSummaryOf(e2,e1,attrs), contains:

In PROV-ASN, a summary record's text matches the summaryRecord production of the grammar.

summaryRecord ::= wasSummaryOf(eIdentifier, eIdentifier optional-attribute-values)

wasSummaryOf is a strict sub-relation of wasDerivedFrom.

6.8 Original Source Record

An original source record represents an entity in which another entity first appeared. An original source record is compliant with the originalSourceRecord production.

An assertion hadOriginalSource, written hadOriginalSource(e2,e1,attrs), contains:

hadOriginalSource is a strict sub-relation of wasDerivedFrom.

In PROV-ASN, an original source record's text matches the originalSourceRecord production of the grammar.

originalSourceRecord ::= hadOriginalSource(eIdentifier, eIdentifier optional-attribute-values)

7. PROV-DM Extensibility Points

The PROV data model provides several extensibility points that allow designers to specialize it to specific applications or domains. We summarize these extensibility points here:

The PROV data model is designed to be application and technology independent, but specializations of PROV-DM are welcome and encouraged. To ensure inter-operability, specializations of the PROV data model that exploit the extensibility points summarized in this section must preserve the semantics specified in this document. For instance, a qualified attribute on a domain specific entity record must represent an aspect of an entity and this aspect must remain unchanged during the characterization's interval of this entity record.

8. Resources, URIs, Entities, Identifiers, and Scope

This specification introduces the notion of an identifiable entity in the world. In PROV-DM, an entity record is a representation of such an identifiable entity. An entity record includes an identifier identifying this entity. Identifiers are qualified names, which can be mapped to IRIs.

The term 'resource' is used in a general sense for whatever might be identified by a URI [RFC3986]. On the Web, a URI denotes a resource, without any expectation that the resource is accessed.

The purpose of this section is to clarify the relationship between resource and the notions of entity and entity record.

In the context of PROV-DM, a resource is just a thing in the world. One may take multiple perspectives on such a thing and its situation in the world, fixing some of its aspects.

We refer to the example of section 2.1 for a resource (at some URL) and three different perspectives, referred to as entities. Three different entity records can be expressed for this report, which, in the PROV-ASN sample below, are expressed within the same account.

container
  prefix app urn:example:
  prefix cr  http://example.org/crime/

  account(acc1,
          http://example.org/asserter1,
          entity(app:0, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
          entity(app:1, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
          entity(app:2, [ prov:type="Document", cr:author="John" ])
          ...)
endContainer

Each entity record contains an identifier that identifies the entity it represents. In this example, three identifiers were minted, and their prefix uses the URN syntax with the "example" namespace.

Given that the report is a resource denoted by the URI http://example.org/crime.txt, we could simply use this URI as the identifier of an entity. This would avoid us minting new URIs. Hence, the report URI would play a double role: as a URI it denotes a resource accessible at that URI, and as a PROV-DM identifier, it identifies a specific characterization of this report. A given identifier identifies a single entity record within the scope of an account. Hence, below, all entity records have been given the same identifier but appear in the scope of different accounts.

container
  prefix app http://example.org/
  prefix cr  http://example.org/crime/

  account(acc2,
          http://example.org/asserter1,
          entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
          ...)
  account(acc3,
          http://example.org/asserter1,
          entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
          ...)
  account(acc4,
          http://example.org/asserter1,
          entity(app:crime.txt, [ prov:type="Document", cr:author="John" ])
          ...)
endContainer

In this case, the URI http://example.org/crime.txt, to which the qualified name app:crime.txt maps, still denotes the same resource; however, the perspective we take about that resource is expressed as a different entity record, happening to have the same identifier in different accounts.

Alternatively, if we need to assert the existence of two different perspectives on the report within the same account, then alternate identifiers must be used, one of them being allowed to be the resource URI.

container
  prefix app  http://example.org/
  prefix app2 urn:example:
  prefix cr   http://example.org/crime/

  account(acc5,
          http://example.org/asserter1,
          entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
          entity(app2:1, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
          ...)
endContainer
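
The scoping rule (one entity record per identifier within an account, with reuse allowed across accounts) can be sketched as a check over expanded qualified names. The Python fragment below is an illustration under assumptions: the function names expand and duplicate_identifiers are hypothetical, qualified names are expanded by simple concatenation of the prefix URI and the local part, and accounts are reduced to lists of entity identifiers.

```python
from collections import Counter


def expand(qname, prefixes):
    """Map a qualified name such as app:crime.txt to a URI by concatenation."""
    prefix, _, local = qname.partition(":")
    return prefixes[prefix] + local


def duplicate_identifiers(accounts, prefixes):
    """Return, per account, the URIs that identify more than one entity record.

    accounts maps an account identifier to the list of entity identifiers
    (qualified names) it contains.
    """
    duplicates = {}
    for account, qnames in accounts.items():
        counts = Counter(expand(q, prefixes) for q in qnames)
        repeated = {uri for uri, n in counts.items() if n > 1}
        if repeated:
            duplicates[account] = repeated
    return duplicates


prefixes = {"app": "http://example.org/", "app2": "urn:example:"}
accounts = {
    "acc2": ["app:crime.txt"],
    "acc3": ["app:crime.txt"],
    "acc5": ["app:crime.txt", "app2:1"],
}
print(expand("app:crime.txt", prefixes))          # http://example.org/crime.txt
print(duplicate_identifiers(accounts, prefixes))  # {}
```

Reusing app:crime.txt in acc2 and acc3 is accepted because each account is its own scope; listing it twice inside a single account would be reported.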

A. Changes Since First Public Working Draft

B. Acknowledgements

WG membership to be listed here.

C. References

C.1 Normative references

[IRI]
M. Duerst; M. Suignard. Internationalized Resource Identifiers (IRI). January 2005. Internet RFC 3987. URL: http://www.ietf.org/rfc/rfc3987.txt
[OWL2-SYNTAX]
Boris Motik; Peter F. Patel-Schneider; Bijan Parsia. OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax. 27 October 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-owl2-syntax-20091027/
[RDF-SPARQL-QUERY]
Andy Seaborne; Eric Prud'hommeaux. SPARQL Query Language for RDF. 15 January 2008. W3C Recommendation. URL: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt
[XML-NAMES]
Richard Tobin; et al. Namespaces in XML 1.0 (Third Edition). 8 December 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-xml-names-20091208/
[XMLSCHEMA-2]
Paul V. Biron; Ashok Malhotra. XML Schema Part 2: Datatypes Second Edition. 28 October 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

C.2 Informative references

[CLOCK]
Lamport, L. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM 21 (7): 558–565. 1978. URL: http://research.microsoft.com/users/lamport/pubs/time-clocks.pdf DOI: 10.1145/359545.359563
[CSP]
Hoare, C. A. R. Communicating Sequential Processes. Prentice-Hall. 1985. URL: http://www.usingcsp.com/cspbook.pdf
[FOAF]
Dan Brickley; Libby Miller. FOAF Vocabulary Specification 0.98. 9 August 2010. URL: http://xmlns.com/foaf/spec/
[Logic]
W. E. Johnson. Logic: Part III. 1924. URL: http://www.ditext.com/johnson/intro-3.html
[PROV-O]
Satya Sahoo; Deborah McGuinness. Provenance Formal Model. 2011, Work in progress. URL: http://www.w3.org/TR/prov-o/
[PROV-PAQ]
Graham Klyne; Paul Groth. Provenance Access and Query. 2011, Work in progress. URL: http://dvcs.w3.org/hg/prov/tip/paq/prov-aq.html
[PROV-PRIMER]
Yolanda Gil; Simon Miles. Prov Model Primer. 2011, Work in progress. URL: http://dvcs.w3.org/hg/prov/raw-file/default/primer/Primer.html
[PROV-SEMANTICS]
James Cheney. Formal Semantics Strawman. 2011, Work in progress. URL: http://www.w3.org/2011/prov/wiki/FormalSemanticsStrawman
