Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
PROV-DM is a data model for provenance for building representations of the entities, people and activities involved in producing a piece of data or thing in the world. PROV-DM is domain-agnostic, but with well-defined extensibility points allowing further domain-specific and application-specific extensions to be defined. It is accompanied by PROV-ASN, a technology-independent abstract syntax notation, which allows serializations of PROV-DM instances to be created for human consumption, which facilitates its mapping to concrete syntax, and which is used as the basis for a formal semantics.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is part of a set of specifications aiming to define the various aspects that are necessary to achieve the vision of inter-operable interchange of provenance information in heterogeneous environments such as the Web. This document defines the PROV-DM data model for provenance, accompanied with a notation to express instances of that data model for human consumption. Three other documents are: 1) a normative serialization of PROV-DM in RDF, specified by means of a mapping to the OWL2 Web Ontology Language; 2) the mechanisms for accessing and querying provenance; 3) a primer for the provenance data model. This document was published by the Provenance Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-prov-wg@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
For the purpose of this specification, provenance is defined as a record that describes the people, institutions, entities, and activities, involved in producing, influencing, or delivering a piece of data or a thing in the world. In particular, the provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable: provenance can help those users to make trust judgments.
The idea that a single way of representing and collecting provenance could be adopted internally by all systems does not seem to be realistic today. Instead, a pragmatic approach is to consider a core data model for provenance that allows domain and application specific representations of provenance to be translated into such a data model and exchanged between systems. Heterogeneous systems can then export their provenance into such a core data model, and applications that need to make sense of provenance in heterogeneous systems can then import it, process it, and reason over it.
Thus, the vision is that different provenance-aware systems natively adopt their own model for representing their provenance, but a core provenance data model can be readily adopted as a provenance interchange model across such systems.
A set of specifications defines the various aspects that are necessary to achieve this vision in an inter-operable way, the first of which is contained in this document:
The PROV-DM data model for provenance consists of a set of core concepts, and a few common relations, based on these core concepts. PROV-DM is a domain-agnostic model, but with well-defined extensibility points allowing further domain-specific and application-specific extensions to be defined.
This specification also introduces PROV-ASN, an abstract syntax that is primarily aimed at human consumption. PROV-ASN allows serializations of PROV-DM instances to be written in a technology independent manner, facilitates their mapping to concrete syntax, and is used as the basis for a formal semantics. This specification uses instances of provenance written in PROV-ASN to illustrate the data model.
In section 2, a set of preliminaries is introduced, including concepts that underpin PROV-DM and motivations for the PROV-ASN notation.
Section 3 provides an overview of PROV-DM listing its core types and their relations.
In section 4, PROV-DM is applied to a short scenario, encoded in PROV-ASN, and illustrated graphically.
Section 5 provides the normative definition of PROV-DM and the notation PROV-ASN.
Section 6 introduces common relations used in PROV-DM, including relations for data collections and common domain-independent relations.
Section 7 summarizes PROV-DM extensibility points.
Section 8 discusses how PROV-DM can be applied to the notion of resource.
The PROV-DM namespace is http://www.w3.org/ns/prov-dm/ (TBC).
All the elements, relations, reserved names and attributes introduced in this specification belong to the PROV-DM namespace.
The key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document are to be interpreted as described in [RFC2119].
This specification is based on a conceptualization of the world that is described in this section. In the world (whether real or not), there are things, which can be physical, digital, conceptual, or otherwise, and activities involving things.
When we talk about things in the world in natural language and even when we assign identifiers, we are often imprecise in ways that make it difficult to clearly and unambiguously report provenance: a resource with a URL may be understood as referring to a report available at that URL, the version of the report available there today, the report independent of where it is hosted over time, etc.
Hence, to accommodate different perspectives on things and their situation in the world as perceived by us, we introduce the idea of a characterized thing, which refers to a thing and its situation in the world, as characterized by someone. We then define an entity as an identifiable characterized thing. An entity fixes some aspects of a thing and its situation in the world, so that it becomes possible to express its provenance, and what causes these specific aspects to be as such. An alternative entity may fix other aspects, and its provenance may be different.
We do not assume that any characterization is more important than any other, and in fact, it is possible to describe the processing that occurred for the report to be commissioned, for individual versions to be created, for those versions to be published at the given URL, etc., each via a different entity that characterizes the report appropriately.
In the world, activities involve entities in multiple ways: they consume them, they process them, they transform them, they modify them, they change them, they relocate them, they use them, they generate them, they are controlled by them, etc.
An agent is a type of entity that takes an active role in an activity such that it can be assigned some degree of responsibility for the activity taking place. This definition intentionally stays away from using concepts such as enabling, causing, initiating, affecting, etc., because any entity may also enable, cause, initiate, or affect an activity in some way. So the notion of having some degree of responsibility is really what makes an agent.
Even software agents can be assigned some responsibility for the effects they have in the world, so for example if one is using a Text Editor and one's laptop crashes, then one would say that the Text Editor was responsible for crashing the laptop. If one invokes a service to buy a book, that service can be considered responsible for drawing funds from one's bank to make the purchase (the company that runs the service and the web site would also be responsible, but the point here is that we assign some measure of responsibility to software as well). So when someone models software as an agent for an activity in our model, they mean the agent has some responsibility for that activity.
In this specification, the qualifier 'identifiable' is implicit whenever a reference is made to an activity, agent, or an entity.
Time is critical in the context of provenance, since it can help corroborate provenance claims. For instance, if an entity is claimed to be obtained by transforming another, then the latter must have existed before the former. If it is not the case, then there is something wrong in such a provenance claim.
Although time is critical, we should also recognize that provenance can be used in many different contexts: in a single system, across the Web, or in spatial data management, to name a few. Hence, it is a design objective of PROV-DM to minimize the assumptions about time, so that PROV-DM can be used in varied contexts.
Furthermore, consider two activities that started at the same time instant. Just by referring to that instant, we cannot distinguish which activity start we refer to. This is particularly relevant if we try to explain that the start of these activities had different reasons. We need to be able to refer to the start of an activity as a first class concept, so that we can talk about it and about its relation with respect to other similar starts.
Hence, in our conceptualization of the world, an instantaneous event, or event for short, happens in the world and marks a change in the world, in its activities and in its entities. The term "event" is commonly used in process algebra with a similar meaning. For instance, in CSP [CSP], events represent communications or interactions; they are assumed to be atomic and instantaneous.
Four kinds of events underpin the PROV-DM data model. The activity start and activity end events demarcate the beginning and the end of activities, respectively. The entity generation and entity usage events demarcate the characterization interval for entities. More specifically:
An entity generation event is the event that marks the final instant of an entity's creation timespan, after which it becomes available for use.
An entity usage event is the event that marks the first instant of an entity's consumption timespan by an activity.
An activity start event is the event that marks the instant an activity starts.
An activity end event is the event that marks the instant an activity ends.
To allow for minimalistic clock assumptions, like Lamport [CLOCK], PROV-DM relies on a notion of relative ordering of events, without using physical clocks. This specification assumes that a partial order exists between events.
Specifically, follows is a partial order between events, indicating that an event occurs after another. For symmetry, precedes is defined as the inverse of follows.
How such a partial order is realized in practice is beyond the scope of this specification. This specification only assumes that each event can be mapped to an instant in some form of timeline. The actual mapping is not in scope of this specification. Likewise, whether this timeline is formed of a single global timeline or whether it consists of multiple Lamport-style clocks is also beyond this specification. It is anticipated that follows and precedes correspond to some ordering over this timeline.
This specification introduces a set of "temporal interpretation" rules that allow event ordering constraints to be derived from provenance records. According to such temporal interpretation, provenance records must satisfy such constraints. We note that the actual verification of such temporal constraints is also outside the scope of this specification.
PROV-DM also allows for time observations to be inserted in specific provenance records, for each recognized event introduced in this specification. The presence of a time observation for a given event fixes the mapping of this event to the timeline. It can also help with the verification of associated temporal constraints (though, again, this verification is outside the scope of this specification).
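The relative ordering of events described above can be illustrated with a short, non-normative Python sketch. It assumes a strict reading of the order (no event precedes itself), which this specification does not mandate, and it derives precedes by following asserted facts transitively. The class `EventOrder` and the event names are hypothetical and are not part of PROV-DM.

```python
class EventOrder:
    """Hypothetical helper: stores asserted 'precedes' facts between event identifiers."""

    def __init__(self):
        self._successors = {}

    def assert_precedes(self, earlier, later):
        # Record that 'earlier' precedes 'later'.
        self._successors.setdefault(earlier, set()).add(later)

    def precedes(self, a, b):
        # True if b can be reached from a by following asserted facts.
        stack, seen = [a], set()
        while stack:
            current = stack.pop()
            for nxt in self._successors.get(current, ()):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def follows(self, a, b):
        # 'follows' is the inverse of 'precedes'.
        return self.precedes(b, a)

    def is_consistent(self):
        # Under the strict reading assumed here, no event may precede itself.
        return not any(self.precedes(e, e) for e in self._successors)


order = EventOrder()
order.assert_precedes("start_a", "gen_e")   # an activity starts, then generates an entity
order.assert_precedes("gen_e", "use_e")     # the entity is generated before it is used
order.assert_precedes("start_a", "end_a")   # the activity starts before it ends
```

In this sketch `gen_e` and `end_a` are left unordered with respect to each other, which is what distinguishes a partial order from a single global timeline.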
This specification defines PROV-DM, a data model for provenance, consisting of records describing how people, entities, and activities were involved in producing, influencing, or delivering a piece of data or a thing in the world.
This specification also relies on a language, PROV-ASN, the Provenance Abstract Syntax Notation, to express instances of that data model. For each construct of PROV-DM, a corresponding ASN expression is introduced, by way of a production in the ASN grammar.
PROV-ASN is an abstract syntax, whose goals are:
This specification provides a grammar for PROV-ASN. Each record of the PROV-DM data model is explained in terms of the production of this grammar.
The formal semantics of PROV-DM is defined at [PROV-SEMANTICS] and its encoding in the OWL2 Web Ontology Language at [PROV-O].
PROV-DM is a provenance data model designed to express representations of the world.
These representations are relative to an asserter, and in that sense constitute assertions stating properties of the world, as represented by an asserter. Different asserters will normally contribute different representations. This specification does not define a notion of consistency between different sets of assertions (whether by the same asserter or different asserters). The data model provides the means to associate attribution to assertions.
The data model is designed to capture activities that happened in the past, as opposed to activities that may or will happen. However, this distinction is not formally enforced. Therefore, all PROV-DM assertions should be interpreted as a record of what has happened, as opposed to what may or will happen.
This specification does not prescribe the means by which an asserter arrives at assertions; for example, assertions can be composed on the basis of observations, reasoning, or any other means.
Sometimes, inferences about the world can be made from representations conformant to the PROV-DM data model. When this is the case, this specification defines such inferences, allowing new provenance records to be inferred from existing ones. Hence, representations of the world can result either from direct assertions by asserters or from application of inferences defined by this specification.
This specification includes a grammar for PROV-ASN expressed using the Extended Backus-Naur Form (EBNF) notation.
Each rule in the grammar defines one symbol, in the form:
E ::= expression
Within the expression on the right-hand side of a rule, the following expressions are used to match strings of one or more characters:
The following ER diagram provides a high level overview of the structure of PROV-DM records. Examples of provenance assertions that conform to this schema are provided in the next section.
The model includes the following elements:
A set of attribute-value pairs can be associated to elements and relations of the PROV model in order to further characterize their nature. The attributes role and type are pre-defined. The wasComplementOf relationship is used to denote that two entities complement each other, in the sense that they each represent a partial, but mutually compatible characterization of the same thing.
The set of relations presented here forms a core, which is further extended with additional relations, defined in Section Common Relations.
The model includes a further additional element: notes. These are also structured as sets of attribute-value pairs. Notes are used to provide additional, "free-form" information regarding any identifiable construct of the model, with no prescribed meaning. Notes are described in detail here.
Attributes and notes are the main extensibility points in the model: individual interest groups are expected to extend PROV-DM by introducing new attributes and notes as needed to address application-specific provenance modelling requirements.
This section is non-normative.
This scenario is concerned with the evolution of a crime statistics file (referred to as e0) stored on a shared file system and which journalists Alice, Bob, Charles, David, and Edith can share and edit. We consider various events in the evolution of file e0; events listed below follow each other, unless otherwise specified.
Event evt1: Alice creates (a0) an empty file in /share/crime.txt. We denote this file e1.
Event evt2: Bob appends (a1) the following line to /share/crime.txt:
There was a lot of crime in London last month.
We denote the revised file e2.
Event evt3: Charles emails (a2) the contents of /share/crime.txt, as an attachment, which we refer to as e4. (We specifically refer to a copy of the file that is uploaded on the mail server.)
Event evt4: David edits (a3) file /share/crime.txt as follows.
There was a lot of crime in London and New-York last month.
We denote the revised file e3.
Event evt5: Edith emails (a4) the contents of /share/crime.txt as an attachment, referred to as e5.
Event evt6: between events evt4 and evt5, someone (unspecified) runs a spell checker (a5) on the file /share/crime.txt. The file after spell checking is referred to as e6.
Entity Records (described in Section Entity). The file in its various forms and its copies are modelled as entity records, corresponding to multiple characterizations, as per scenario. The entity records are identified by e0, ..., e6.
entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
entity(e2, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London last month."])
entity(e3, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London and New York last month."])
entity(e4)
entity(e5)
entity(e6, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="There was a lot of crime in London and New York last month.", ex:spellchecked="yes"])
These entity records list attributes that have been given values during intervals delimited by events; such intervals are referred to as characterization intervals. The following table lists all entity identifiers and their corresponding characterization intervals. When the end of the characterization interval is not delimited by an event described in this scenario, it is marked by "...".
| Entity | Characterization Interval |
| e0 | evt1 - ... |
| e1 | evt1 - evt2 |
| e2 | evt2 - evt4 |
| e3 | evt4 - ... |
| e4 | evt3 - ... |
| e5 | evt5 - ... |
| e6 | evt6 - ... |
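As a non-normative illustration, the characterization intervals of the scenario can be encoded and queried as in the following Python sketch. It assumes a linear ordering of the scenario's events (with evt6 placed between evt4 and evt5, as stated above) and treats an interval as including its start event and excluding its end event; that boundary convention is a choice of this sketch, not of the specification. The names `EVENT_ORDER`, `INTERVALS` and `characterized_at` are hypothetical.

```python
# Linear ordering of the scenario's events; evt6 falls between evt4 and evt5.
EVENT_ORDER = ["evt1", "evt2", "evt3", "evt4", "evt6", "evt5"]
POSITION = {evt: i for i, evt in enumerate(EVENT_ORDER)}

# (start, end) per entity; None stands for the open end written "..." in the table.
INTERVALS = {
    "e0": ("evt1", None),
    "e1": ("evt1", "evt2"),
    "e2": ("evt2", "evt4"),
    "e3": ("evt4", None),
    "e4": ("evt3", None),
    "e5": ("evt5", None),
    "e6": ("evt6", None),
}


def characterized_at(event):
    """Return the entities whose characterization interval contains the given event."""
    pos = POSITION[event]
    result = []
    for entity, (start, end) in INTERVALS.items():
        started = POSITION[start] <= pos
        not_ended = end is None or pos < POSITION[end]
        if started and not_ended:
            result.append(entity)
    return result
```

For instance, `characterized_at("evt3")` lists the entities whose attribute values hold when Charles sends his email.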
Activity Records (described in Section Activity) represent activities in the scenario.
activity(a0, create-file, 2011-11-16T16:00:00,)
activity(a1, add-crime-in-london, 2011-11-16T16:05:00,)
activity(a2, email, 2011-11-16T17:00:00,)
activity(a3, edit-London-New-York, 2011-11-17T09:00:00,)
activity(a4, email, 2011-11-17T09:30:00,)
activity(a5, spellcheck,,)
Generation Records (described in Section Generation) represent the event at which a file is created in a specific form. Attributes are used to describe the modalities according to which a given entity is generated by a given activity. The interpretation of attributes is application specific. Illustrations of such attributes for the scenario are: no attribute is provided for e0; e2 was generated by the editor's save function; e4 can be found on the smtp port, in the attachment section of the mail message; e6 was produced on the standard output of a5. Two identifiers g1 and g2 identify the generation records referenced in derivations introduced below.
wasGeneratedBy(e0, a0)
wasGeneratedBy(e1, a0, [ex:fct="create"])
wasGeneratedBy(e2, a1, [ex:fct="save"])
wasGeneratedBy(e3, a3, [ex:fct="save"])
wasGeneratedBy(g1, e4, a2, [ex:port="smtp", ex:section="attachment"])
wasGeneratedBy(g2, e5, a4, [ex:port="smtp", ex:section="attachment"])
wasGeneratedBy(e6, a5, [ex:file="stdout"])
Usage Records (described in Section Usage) represent the event by which a file is read by an activity. Likewise, attributes describe the modalities according to which the various entities are used by activities. Illustrations of such attributes are: e1 is used in the context of a1's load functionality; e2 is used by a2 in the context of its attach functionality; e3 is used on the standard input by a5. Two identifiers u1 and u2 identify the Usage records referenced in derivations introduced below.
used(a1,e1,[ex:fct="load"])
used(a3,e2,[ex:fct="load"])
used(u1,a2,e2,[ex:fct="attach"])
used(u2,a4,e3,[ex:fct="attach"])
used(a5,e3,[ex:file="stdin"])
Derivation Records (described in Section Derivation Relation) express that an entity is derived from another. The first two are expressed in their compact version, whereas the following two are expressed in their full version, including the activity underpinning the derivation, and associated usage (u1, u2) and generation (g1, g2) records.
wasDerivedFrom(e2,e1)
wasDerivedFrom(e3,e2)
wasDerivedFrom(e4,e2,a2,g1,u1)
wasDerivedFrom(e5,e3,a4,g2,u2)
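The derivation records of the scenario form a small graph that can be traversed mechanically. The following non-normative Python sketch walks the asserted edges backwards from a given entity. It only follows what has been asserted; whether a chain of such edges should itself be read as a derivation in the sense of this specification is not assumed here. The names `DERIVATIONS` and `derivation_ancestors` are hypothetical.

```python
# Derivation edges from the scenario, as (derived entity, source entity) pairs.
DERIVATIONS = [("e2", "e1"), ("e3", "e2"), ("e4", "e2"), ("e5", "e3")]


def derivation_ancestors(entity):
    """Return the entities reachable from 'entity' by following asserted derivations backwards."""
    sources = {}
    for derived, source in DERIVATIONS:
        sources.setdefault(derived, []).append(source)

    ancestors, frontier = [], [entity]
    while frontier:
        current = frontier.pop()
        for src in sources.get(current, []):
            if src not in ancestors:
                ancestors.append(src)
                frontier.append(src)
    return ancestors
```

Applied to the attachment sent by Edith, `derivation_ancestors("e5")` returns the successive file contents it can be traced back to.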
wasComplementOf: (this relation is described in Section wasComplementOf). The crime statistics file (e0) has various contents over its existence (e1, e2, e3); the entity records identified by e1, e2, e3 complement e0 with an attribute content. Likewise, the one denoted by e6 complements the record denoted by e3 with an attribute spellchecked.
wasComplementOf(e1,e0)
wasComplementOf(e2,e0)
wasComplementOf(e3,e0)
wasComplementOf(e6,e3)
Agent Records (described at Section Agent): the various users are represented as agents, themselves being a type of entity.
agent(ag1, [ prov:type="prov:Person" %% xsd:QName, ex:name="Alice" ])
agent(ag2, [ prov:type="prov:Person" %% xsd:QName, ex:name="Bob" ])
agent(ag3, [ prov:type="prov:Person" %% xsd:QName, ex:name="Charles" ])
agent(ag4, [ prov:type="prov:Person" %% xsd:QName, ex:name="David" ])
agent(ag5, [ prov:type="prov:Person" %% xsd:QName, ex:name="Edith" ])
Activity Association Records (described in Section Activity Association): the association of an agent with an activity is expressed with wasAssociatedWith, and the nature of this association is described by attributes. Illustrations of such attributes include the role of the participating agent, as creator, author and communicator (role is a reserved attribute in PROV-DM).
wasAssociatedWith(a0, ag1, [prov:role="creator"])
wasAssociatedWith(a1, ag2, [prov:role="author"])
wasAssociatedWith(a2, ag3, [prov:role="communicator"])
wasAssociatedWith(a3, ag4, [prov:role="author"])
wasAssociatedWith(a4, ag5, [prov:role="communicator"])
Provenance assertions can be illustrated graphically. The illustration is not intended to represent all the details of the model, but it is intended to show the essence of a set of provenance assertions. Therefore, it cannot be seen as an alternate notation for expressing provenance.
The graphical illustration takes the form of a graph. Entities, activities and agents are represented as nodes, with oval, rectangular, and half-hexagonal shapes, respectively. Usage, Generation, Derivation, Activity Association, and Complementarity are represented as directed edges.
Entities are laid out according to the ordering of their generation events. We endeavor to show time progressing from left to right. This means that edges for Usage, Generation and Derivation typically point from right to left.
This section contains the normative specification of PROV-DM core, the core of the PROV data model.
PROV-DM consists of a set of constructs, referred to as records, to formulate representations of the world and constraints that must be satisfied by them.
Furthermore, PROV-DM includes a "house-keeping construct", a record container, used to wrap PROV-DM records and facilitate their interchange.
In PROV-ASN, such representations of the world must be conformant with the toplevel production record of the grammar. These records are grouped in three categories: elementRecord (see section Element), relationRecord (see section Relation), and accountRecord (see section Account).
In PROV-ASN, a record container is compliant with the production recordContainer (see section Record Container).
This section describes all the PROV-DM records referred to as element records. (They are conformant to the elementRecord production of the grammar.)
In PROV-DM, an entity record is a representation of an entity.
Examples of entities include a linked data set, a sparse matrix of floating-point numbers, a document in a directory, the same document published on the Web, and meta-data embedded in a document.
An entity record, noted entity(id, [ attr1=val1, ...]) in PROV-ASN, contains:
The assertion of an entity record, entity(id, [ attr1=val1, ...]), states, from a given asserter's viewpoint, the existence of an entity, whose situation in the world is represented by the attribute-value pairs, which remain unchanged during a characterization interval, i.e. a continuous interval between two events in the world.
In PROV-ASN, an entity record's text matches the entityRecord production of the grammar defined in this specification document.
The following entity record,
entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
states the existence of an entity, denoted by identifier e0, with type File and path /shared/crime.txt in the file system, and creator Alice. The attributes path and creator are application specific, whereas the attribute type is reserved in the PROV-DM namespace.
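To make the shape of an entity record concrete, the following non-normative Python sketch holds an identifier together with its attribute-value pairs and writes them out in the PROV-ASN form used in this document's examples. It assumes that all attribute values are plain strings and does not implement the grammar's productions; the class `EntityRecord` and its method `to_asn` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """Hypothetical container for an entity record: an identifier and attribute-value pairs."""
    identifier: str
    attributes: dict = field(default_factory=dict)

    def to_asn(self):
        # An entity record without attributes is written with the identifier only.
        if not self.attributes:
            return f"entity({self.identifier})"
        # Assumption of this sketch: every value is a string and is written in double quotes.
        pairs = ", ".join(f'{name}="{value}"' for name, value in self.attributes.items())
        return f"entity({self.identifier}, [ {pairs} ])"


e0 = EntityRecord("e0", {
    "prov:type": "File",
    "ex:path": "/shared/crime.txt",
    "ex:creator": "Alice",
})
```

Calling `e0.to_asn()` yields the same text as the entity record shown above.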
In PROV-DM, an activity record is a representation of an identifiable activity, which performs a piece of work.
An activity, represented by an activity record, is delimited by its start and its end events; hence, it occurs over an interval delimited by two events. However, an activity record need not mention time information, nor duration, because they may not be known.
Such start and end times constitute attributes of an activity, where the interpretation of attribute in the context of an activity record is the same as the interpretation of attribute for entity record: an activity record's attribute remains constant for the duration of the activity it represents. Further characteristics of the activity in the world can be represented by other attribute-value pairs, which must also remain unchanged during the activity duration.
Examples of activities include assembling a data set based on a set of measurements, performing a statistical analysis over a data set, sorting news items according to some criteria, running a SPARQL query over a triple store, editing a file, and publishing a web page.
An activity record, written activity(id, rl, st, et, [ attr1=val1, ...]) in PROV-ASN, contains:
In PROV-ASN, an activity record's text matches the activityRecord production of the grammar defined in this specification document.
The following activity assertion
activity(a1,add-crime-in-london,2011-11-16T16:05:00,2011-11-16T16:06:00,[ex:host="server.example.org",prov:type="ex:edit" %% xsd:QName])
identified by identifier a1, states the existence of an activity with recipe link add-crime-in-london, start time 2011-11-16T16:05:00, and end time 2011-11-16T16:06:00, running on host server.example.org, and of type edit (declared in some namespace with prefix ex). The attribute host is application specific, but must hold for the duration of activity. The attribute type is a reserved attribute of PROV-DM, allowing for subtyping to be expressed.
The mere existence of an activity assertion entails some event ordering in the world, since an activity start event always precedes the corresponding activity end event. This is expressed by constraint start-precedes-end.
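When an activity record carries both a start time and an end time, the start-precedes-end constraint can be checked against those time observations. The following Python sketch is non-normative: it assumes that both times refer to the same timeline, that they are written in the ISO 8601 form used in the examples, and that equal times are acceptable; none of these points is fixed by this specification. The function name is hypothetical.

```python
from datetime import datetime


def satisfies_start_precedes_end(start, end):
    """Check start-precedes-end for optional ISO 8601 time strings (None when unknown)."""
    # An activity record need not mention times; with a missing time there is nothing to check.
    if start is None or end is None:
        return True
    # Assumption of this sketch: equal times are accepted, since two distinct events
    # may map to the same instant at the resolution of the clock.
    return datetime.fromisoformat(start) <= datetime.fromisoformat(end)
```

For the activity a1 above, the check is applied to 2011-11-16T16:05:00 and 2011-11-16T16:06:00.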
An activity record is not an entity record. Indeed, an entity record represents an entity that exists in full at any point in its characterization interval, persists during this interval, and preserves the characteristics that make it identifiable. In contrast, an activity is something that happens, unfolds or develops through time, but is typically not identifiable by the characteristics it exhibits at any point during its duration. This distinction is similar to the distinction between 'continuant' and 'occurrent' in logic [Logic].
An agent record is a representation of an agent, which is an entity that can be assigned some degree of responsibility for an activity taking place.
Many agents can have an association with a given activity. An agent may do the ordering of the activity, another agent may do its design, another agent may push the button to start it, another agent may run it, etc. As many agents as one wishes to mention can occur in the provenance record, if it is important to indicate that they were associated with the activity.
From an inter-operability perspective, it is useful to define some basic categories of agents since it will improve the use of provenance records by applications. There should be very few of these basic categories to keep the model simple and accessible. There are three types of agents in the model:
These types are mutually exclusive, though they do not cover all kinds of agent.
An agent record, noted agent(id, [ attr1=val1, ...]) in PROV-ASN, contains:
In PROV-ASN, an agent record's text matches the agentRecord production of the grammar defined in this specification document.
With the following assertions,
agent(e1, [ex:employee="1234", ex:name="Alice", prov:type="prov:Person" %% xsd:QName])
entity(e2) and wasStartedBy(a1,e2,[prov:role="author"])
entity(e3) and wasAssociatedWith(a1,e3,[prov:role="sponsor"])
the agent record identified by e1 is an explicit agent assertion that holds irrespective of activities it may be associated with. On the other hand, from the entity records identified by e2 and e3, one can infer agent records, as per the following inference.
One can assert an agent record or alternatively, one can infer an agent record by its association with an activity.
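The inference of agent records from associations can be sketched as follows. This Python example is non-normative and assumes that only the two relations appearing in the example above, wasStartedBy and wasAssociatedWith, give rise to inferred agents; the specification may recognize others. Relations are represented as hypothetical (name, activity, element) tuples.

```python
# Assumption of this sketch: only the relations used in the example above are considered.
ASSOCIATING_RELATIONS = {"wasStartedBy", "wasAssociatedWith"}


def infer_agents(asserted_agents, relations):
    """Return asserted agents plus elements inferred to be agents from their associations."""
    agents = set(asserted_agents)
    for name, _activity, element in relations:
        if name in ASSOCIATING_RELATIONS:
            agents.add(element)
    return agents


agents = infer_agents(
    {"e1"},                                   # explicitly asserted agent
    [
        ("wasStartedBy", "a1", "e2"),         # e2 started a1, so e2 is inferred to be an agent
        ("wasAssociatedWith", "a1", "e3"),    # e3 is associated with a1, likewise
        ("used", "a1", "e4"),                 # usage alone does not make e4 an agent
    ],
)
```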
As provenance records are exchanged between systems, it may be useful to add extra information about such records. For instance, a "trust service" may add value-judgements about the trustworthiness of some of the assertions made. Likewise, an interactive visualization component may want to enrich a set of provenance records with information helping reproduce their visual representation. To help with inter-operability, PROV-DM introduces a simple annotation mechanism allowing any identifiable record to be associated with notes.
A note record is a set of attribute-value pairs, whose meaning is application specific. It may or may not be a representation of something in the world.
In PROV-ASN, a note record's text matches the noteRecord production of the grammar defined in this specification document.
A separate PROV-DM record is used to associate a note with an identifiable record (see Section on annotation). A given note may be associated with multiple records.
The following note record
note(ann1,[ex:color="blue", ex:screenX=20, ex:screenY=30])
consists of a set of application-specific attribute-value pairs, intended to help the rendering of the record it is associated with, by specifying its color and its position on the screen. In this example, these attribute-value pairs do not constitute a representation of something in the world; they are just used to help render provenance.
Attribute-value pairs occurring in notes differ from attribute-value pairs occurring in entity records and activity records. In entity and activity records, attribute-value pairs must be a representation of something in the world, which remains constant for the duration of the characterization interval (for entity records) or the activity duration (for activity records). In note records, it is optional for attribute-value pairs to be representations of something in the world. If they are a representation of something in the world, then they may change value for the corresponding duration. If attribute-value pairs of a note record are a representation of something in the world that does not change, they are not regarded as determining characteristics of an entity or activity, for the purpose of provenance.
This section describes all the PROV-DM records representing relations between the elements introduced in Section Element. While these relations are not binary, they all involve two primary elements. They can be summarized as follows.
| | Entity | Activity | Agent | Note |
| Entity | wasDerivedFrom, wasComplementOf | wasGeneratedBy | - | hasAnnotation |
| Activity | used | - | wasStartedBy, wasEndedBy, wasAssociatedWith | hasAnnotation |
| Agent | - | - | actedOnBehalfOf | hasAnnotation |
| Note | - | - | - | hasAnnotation |
In PROV-ASN, all these relation records are conformant to the relationRecord production of the grammar.
In PROV-DM, a generation record is a representation of a world event, the creation of a new entity by an activity. This entity did not exist before creation. The representation of this event encompasses a description of the modalities of generation of this entity by this activity.
A generation event may be, for example, the creation of a file by a program, the creation of a linked data set, the production of a new version of a document, or the sending of a value on a communication channel.
A generation record, written wasGeneratedBy(id,e,a,attrs,t) in PROV-ASN, has the following components:
In PROV-ASN, a generation record's text matches the generationRecord production of the grammar defined in this specification document.
A generation record's id is optional. It must be used when annotating generation records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation Record).
The following generation assertions
wasGeneratedBy(e1,a1, 2001-10-26T21:32:52, [ex:port="p1", ex:order=1])
wasGeneratedBy(e2,a1, 2001-10-26T10:00:00, [ex:port="p1", ex:order=2])
state the existence of two events in the world (with respective times 2001-10-26T21:32:52 and 2001-10-26T10:00:00), at which new entities, represented by entity records identified by e1 and e2, are created by an activity, itself represented by an activity record identified by a1. The first one is available as the first value on port p1, whereas the other is the second value on port p1. The semantics of port and order in these records are application specific.
The assertion of a generation record implies ordering of events in the world.
A given entity record can be referred to in a single generation record in the scope of a given account. The rationale for this constraint is as follows. If two activities sequentially set different values to some attribute by means of two different generation events, then they generate distinct entities. Alternatively, for two activities to generate an entity simultaneously, they would require some synchronization by which they agree the entity is released for use; the end of this synchronization would constitute the actual generation of the entity, but is performed by a single activity. This unicity constraint is formalized as follows.
In PROV-DM, a usage record is a representation of a world event: the consumption of an entity by an activity. The representation includes a description of the modalities of usage of this entity by this activity.
A usage event may be the consumption of a parameter by a procedure, the reading of a value on a port by a service, the reading of a configuration file by a program, or the adding of an ingredient, such as eggs, in a baking activity. Usage may entirely consume an entity (e.g. eggs are no longer available after being added to the mix), or leave it as such, ready for further uses (e.g. a file on a file system can be read indefinitely).
A usage record, written used(id,a,e,attrs,t) in PROV-ASN, has the following constituents:
In PROV-ASN, a usage record's text matches the usageRecord production of the grammar defined in this specification document.
A usage record's id is optional, but comes in handy when annotating usage records (see Section Annotation Record) or when defining derivations.
The following usage records
used(a1,e1,2011-11-16T16:00:00,[ex:parameter="p1"])
used(a1,e2,2011-11-16T16:00:01,[ex:parameter="p2"])
state that the activity, represented by the activity record identified by a1, consumed two entities, represented by entity records identified by e1 and e2, at times 2011-11-16T16:00:00 and 2011-11-16T16:00:01, respectively; the first one was found as the value of parameter p1, whereas the second was found as the value of parameter p2. The semantics of parameter in these records is application specific.
A usage record's id is optional. It must be present when annotating usage records (see Section Annotation Record) or when defining precise-1 derivations (see Derivation Record).
A reference to a given entity record may appear in multiple usage records that share a given activity record identifier.
The key purpose of agents in PROV-DM is to assign responsibility for activities. It is important to reflect that there is a degree in the responsibility of agents, and that is a major reason for distinguishing among all the agents that have some association with an activity and determining which ones are really the originators of the entity. For example, a programmer and a researcher could both be associated with running a workflow, but it may not matter which programmer clicked the button to start the workflow, while it would matter a lot which researcher told the programmer to do so. Another example: a student publishing a web page describing an academic department could result in both the student and the department being agents associated with the activity, and it may not matter which student published the web page, but it matters a lot that the department told the student to put up the web page. So there is some notion of responsibility that needs to be captured.
To this end, PROV-DM offers two kinds of records. The first, introduced in this section, represents an association between an agent and an activity; the second, introduced in Section Responsibility record, represents the fact that an agent was acting on behalf of another, in the context of an activity.
Examples of activity association include designing, participation, initiation and termination, timetabling or sponsoring.
An activity association record, written wasAssociatedWith(a,ag2,attrs) in PROV-ASN, has the following constituents:
In PROV-ASN, an activity association record's text matches the activityAssociationRecord production of the grammar defined in this specification document.
activity(a,[prov:type="workflow"])
agent(ag1,[prov:type="programmer"])
agent(ag2,[prov:type="researcher"])
wasAssociatedWith(a,ag1,[prov:role="loggedInUser", ex:how="webapp"])
wasAssociatedWith(a,ag2,[prov:role="designer", ex:context="phd"])
A start record is a representation of an agent starting an activity. An end record is a representation of an agent ending an activity. Both relations are specialized forms of wasAssociatedWith. They contain attributes describing the modalities of starting or ending activities.
A start record, written wasStartedBy(id,a,ag,attrs) in PROV-ASN, contains:
An end record, written wasEndedBy(id,a,ag,attrs) in PROV-ASN, contains:
In PROV-ASN, start and end records' texts match the startRecord and endRecord productions of the grammar defined in this specification document.
The following assertions
wasStartedBy(a,ag,[ex:mode="manual"])
wasEndedBy(a,ag,[ex:mode="manual"])
state that the activity, represented by the activity record denoted by a, was started and ended by an agent, represented by the record denoted by ag, in "manual" mode, an application-specific characterization of these relations.
To promote take-up, PROV-DM offers a mild version of responsibility in the form of a relation to represent when an agent acted on another agent's behalf. So in the example of someone running a mail program, the program is an agent of that activity and the person is also an agent of the activity, but we would also add that the mail software agent is running on the person's behalf. In the other example, the student acted on behalf of his supervisor, who acted on behalf of the department chair, who acts on behalf of the university, and all those agents are responsible in some way for the activity to take place, but we don't say explicitly who bears responsibility and to what degree.
We could also say that an agent can act on behalf of several other agents (a group of agents). This would also make it possible to indirectly reflect chains of responsibility. This also indirectly reflects control without requiring that control is explicitly indicated. In some contexts there will be a need to represent responsibility explicitly, for example to indicate legal responsibility, and that could be added as an extension to this core model. Similarly with control, since in particular contexts there might be a need to define specific aspects of control that various agents exert over a given activity.
Given an activity association record wasAssociatedWith(a,ag2,attrs), a responsibility record, written actedOnBehalfOf(id,ag2,ag1,a,attrs) in PROV-ASN, has the following constituents:
activity(a,[prov:type="workflow"])
agent(ag1,[prov:type="programmer"])
agent(ag2,[prov:type="researcher"])
agent(ag3,[prov:type="funder"])
wasAssociatedWith(a,ag1,[prov:role="loggedInUser"])
wasAssociatedWith(a,ag2)
actedOnBehalfOf(ag1,ag2,a,[prov:type="delegation"])
actedOnBehalfOf(ag2,ag3,a,[prov:type="contract"])
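The chain of responsibility in this example can be followed mechanically. The Python sketch below is an illustration only, not part of PROV-DM: it assumes responsibility records are held as hypothetical (subordinate, responsible, activity) tuples in the argument order of the example, and the function name acted_for is our own. It does not imply that actedOnBehalfOf is defined to be transitive; it merely walks the asserted links.

```python
def acted_for(agent, activity, delegations):
    """List the agents on whose behalf `agent` acted, directly or indirectly,
    in the context of `activity`, in breadth-first order."""
    found = []
    frontier = [agent]
    while frontier:
        current = frontier.pop(0)
        for subordinate, responsible, act in delegations:
            if (subordinate == current and act == activity
                    and responsible not in found and responsible != agent):
                found.append(responsible)
                frontier.append(responsible)
    return found


# The two actedOnBehalfOf records of the example above.
delegations = [("ag1", "ag2", "a"), ("ag2", "ag3", "a")]
print(acted_for("ag1", "a", delegations))  # ['ag2', 'ag3']
```

Here the programmer ag1 is linked to the researcher ag2 and, through ag2, to the funder ag3; who bears responsibility and to what degree remains unstated, as discussed above.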
In PROV-DM, a derivation record is a representation that some entity is transformed from, created from, or affected by another entity in the world.
Examples of derivation include the transformation of a canvas into a painting, the transportation of a person from London to New York, the transformation of a relational table into a linked data set, and the melting of ice into water.
According to Section Conceptualization, for an entity to be transformed from, created from, or affected by another in some way, there must be some underpinning activities performing the necessary actions resulting in such a derivation. However, asserters may not assert or have knowledge of these activities and associated details: they may not assert or know their number, they may not assert or know their identity, they may not assert or know the attributes characterizing how the relevant entities are used or generated. To accommodate the varying circumstances of the various asserters, PROV-DM allows more or less precise records of derivation to be asserted. Hence, PROV-DM uses the terms precise and imprecise to characterize the different kinds of derivation record. We note that the derivation itself is exact (i.e., deterministic, non-probabilistic), but it is its description, expressed in a derivation record, that may be imprecise.
The lack of precision may come from two sources:
Hence, given a precision axis, with values precise and imprecise, and an activity axis, with values one activity and n activities, we can then form a matrix of possible derivations, precise or imprecise, or corresponding to one activity or n activities. Out of the four possibilities, PROV-DM offers three forms of derivation, while the fourth one is not meaningful. The following table summarises names for the three kinds of derivation, which we then explain.
| | precision axis: precise | precision axis: imprecise |
| activity axis: one activity | precise-1 derivation record | imprecise-1 derivation record |
| activity axis: n activities | --- | imprecise-n derivation record |
We note that the fourth theoretical case of a precise derivation, where the number of activities is not known or asserted, cannot occur.
The three kinds of derivation records are successively introduced. To minimize the number of relation types in PROV-DM, we introduce a PROV-DM reserved attribute steps, which allows us to distinguish the various derivation types.
A precise-1 derivation record, written wasDerivedFrom(id, e2, e1, a, g2, u1, attrs) in PROV-ASN, contains:
It is optional to include the attribute prov:steps in a precise-1 derivation since the record already refers to the one and only one activity underpinning the derivation.
An imprecise-1 derivation record, written wasDerivedFrom(id, e2,e1, attrs) in PROV-ASN, contains:
An imprecise-1 derivation must include the attribute prov:steps, since it is the only means to distinguish this record from an imprecise-n derivation record.
An imprecise-n derivation record, written wasDerivedFrom(id, e2, e1, attrs) in PROV-ASN, contains:
It is optional to include the attribute prov:steps in an imprecise-n derivation record. It defaults to prov:steps="n".
None of the three kinds of derivation is defined to be transitive. Domain-specific specializations of these derivations may be defined in such a way that the transitivity property holds.
In PROV-ASN, a derivation record's text matches the derivationRecord production of the grammar defined in this specification document.
The following assertions state the existence of derivations.
wasDerivedFrom(e5,e3,a4,g2,u2,[])
wasDerivedFrom(e5,e3,a4,g2,u2,[prov:steps="1"])
wasDerivedFrom(e3,e2,[prov:steps="1"])
wasDerivedFrom(e2,e1,[])
wasDerivedFrom(e2,e1,[prov:steps="n"])
The first two are precise-1 derivation records expressing that the activity represented by the activity record a4, by using the entity denoted by e3 according to usage record u2, derived the entity denoted by e5 and generated it according to generation record g2. The third record is an imprecise-1 derivation, which is similar for e3 and e2, but it leaves the activity record and associated attributes implicit. The fourth and fifth records are imprecise-n derivation records between e2 and e1, but no information is provided as to the number and identity of activities underpinning the derivation.
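The way the shape of a record and its prov:steps attribute together determine the kind of derivation can be sketched in Python. This is a minimal illustration under stated assumptions: records are passed as a tuple of positional identifiers (without the optional id) plus a dictionary of attributes, the function name classify_derivation is hypothetical, and rejecting prov:steps="n" on the long form is our reading rather than a rule stated by this specification.

```python
def classify_derivation(args, attrs=None):
    """Classify a wasDerivedFrom record.

    `args` is (e2, e1, a, g2, u1) for the long form or (e2, e1) for the
    short form; `attrs` maps attribute names to values.
    """
    steps = (attrs or {}).get("prov:steps")
    if len(args) == 5:
        # Assumption: a record naming its single activity cannot claim n steps.
        if steps not in (None, "1"):
            raise ValueError("a precise-1 derivation can only have prov:steps=1")
        return "precise-1"
    if len(args) == 2:
        if steps == "1":
            return "imprecise-1"
        if steps in (None, "n"):  # prov:steps defaults to "n"
            return "imprecise-n"
        raise ValueError(f"unexpected prov:steps value: {steps!r}")
    raise ValueError("expected 2 or 5 positional identifiers")


# The five example records above.
examples = [
    (("e5", "e3", "a4", "g2", "u2"), {}),
    (("e5", "e3", "a4", "g2", "u2"), {"prov:steps": "1"}),
    (("e3", "e2"), {"prov:steps": "1"}),
    (("e2", "e1"), {}),
    (("e2", "e1"), {"prov:steps": "n"}),
]
for args, attrs in examples:
    print(args, classify_derivation(args, attrs))
```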
A precise-1 derivation record is richer than an imprecise-1 derivation record, itself being more informative than an imprecise-n derivation record. Hence, the following implications hold.
If a derivation record holds for e2 and e1, then this means that the entity represented by the entity record identified by e1 has an influence on the entity represented by the entity record identified by e2, which at the minimum implies temporal ordering, specified as follows. First, we consider one-activity derivations.
Then, imprecise-n derivations.
Note that temporal ordering is between generations of e1 and e2, as opposed to precise-1 derivation, which implies temporal ordering between the usage of e1 and generation of e2. Indeed, in the case of imprecise-n derivation, nothing is known about the usage of e1, since there is no associated activity.
The imprecise-1 derivation has the same meaning as the precise-1 derivation, except that an activity is known to exist, though it does not need to be asserted. This is formalized by the following inference rule, referred to as activity introduction:
activity(a,aAttrs)
wasGeneratedBy(g,e2,a,gAttrs)
used(u,a,e1,uAttrs)
for sets of attribute-value pairs gAttrs, uAttrs, and aAttrs.
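As an illustration, activity introduction can be sketched in Python. We assume here that the premise of the rule is an imprecise-1 derivation record wasDerivedFrom(e2,e1,[prov:steps="1"]); the tuple encoding of records, the generated identifiers (_a1, _g1, _u1) and the helper name make_activity_introducer are all hypothetical. The attribute sets aAttrs, gAttrs and uAttrs are left out, since the rule only states that some such sets exist.

```python
import itertools


def make_activity_introducer():
    """Return a function that introduces fresh activity, generation and
    usage records for an imprecise-1 derivation of e2 from e1."""
    counter = itertools.count(1)

    def introduce(e2, e1):
        n = next(counter)
        a, g, u = f"_a{n}", f"_g{n}", f"_u{n}"  # fresh, unasserted identifiers
        return [
            ("activity", a),
            ("wasGeneratedBy", g, e2, a),
            ("used", u, a, e1),
        ]

    return introduce


introduce = make_activity_introducer()
for record in introduce("e3", "e2"):
    print(record)
```

Using fresh identifiers reflects that the activity is known to exist but has not been asserted; an application that later learns the actual activity would need to reconcile it with the introduced one.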
Note that inferring derivation from usage and generation does not hold in general. Indeed, when a generation wasGeneratedBy(g, e2, a, attrs2) precedes used(u, a, e1, attrs1), for some e1, e2, attrs1, attrs2, and a, one cannot infer derivation wasDerivedFrom(e2, e1, a, g, u) or wasDerivedFrom(e2,e1), since e2 cannot possibly be determined by e1, given that the creation of e2 precedes the use of e1.
A further inference is permitted from the imprecise-1 derivation record:
Given an activity record identified by pe, entity records identified by e1 and e2, and set of attribute-value pairs attrs2, if wasDerivedFrom(e2,e1, [prov:steps="1"]) and wasGeneratedBy(e2,pe,attrs2) hold, then used(pe,e1,attrs1) also holds for some set of attribute-value pairs attrs1.
This inference is justified by the fact that the entity represented by entity record identified by e2 is generated by at most one activity in a given account (see generation-unicity). Hence, this activity record is also the one referred to in the usage record of e1.
We note that the converse inference does not hold. From wasDerivedFrom(e2,e1) and used(pe,e1), one cannot derive wasGeneratedBy(e2,pe,attrs2) because identifier e1 may occur in usage records referring to many activity records, but they may not be referred to in generation records containing identifier e2.
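The permitted inference (from an imprecise-1 derivation and a generation to a usage) can be sketched as follows. This is an illustrative Python fragment, not a normative algorithm: derivations are assumed to be (e2, e1, steps) tuples and generations (entity, activity) pairs within one account satisfying generation-unicity, and the name infer_usages is hypothetical.

```python
def infer_usages(derivations, generations):
    """Infer (activity, entity) usage pairs from imprecise-1 derivations.

    Relies on generation-unicity: each entity has at most one generating
    activity in the account, so a dictionary lookup is sufficient.
    """
    generated_by = {entity: activity for entity, activity in generations}
    inferred = []
    for e2, e1, steps in derivations:
        if steps == "1" and e2 in generated_by:
            inferred.append((generated_by[e2], e1))
    return inferred


derivations = [("e3", "e2", "1"), ("e2", "e1", "n")]
generations = [("e3", "a4"), ("e2", "a2")]
print(infer_usages(derivations, generations))  # [('a4', 'e2')]
```

The imprecise-n derivation of e2 from e1 yields no usage, because nothing is known about the activities underpinning it.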
A complementarity record is a relationship between two entities stated to have compatible characterization over some continuous interval between two events.
The rationale for introducing this relationship is that in general, at any given time, for an entity in the world, there may be multiple ways of characterizing it, and hence multiple representations can be asserted by different asserters. In the example that follows, suppose thing "Royal Society" is represented by two asserters, each using a different set of attributes. If the asserters agree that both representations refer to "The Royal Society", the question of whether any correspondence can be established between the two representations arises naturally. This is particularly relevant when (a) the sets of attributes used by the two representations overlap partially, or (b) when one set is subsumed by the other. In both these cases, we have a situation where each of the two asserters has a partial view of "The Royal Society", and establishing a correspondence between them on the shared attributes is beneficial, as in case (a) each of the two representations complements the other, and in case (b) one of the two (that with the additional attributes) complements the other.
This intuition is made more precise by considering the entities that form the representations of entities at a certain point in time. An entity record represents, by means of attribute-value pairs, a thing and its situation in the world, which remain constant over a characterization interval. As soon as the thing's situation changes, this marks the end of the characterization interval for the entity record representing it. The thing's novel situation is represented by an attribute with a new value, or an entirely different set of attribute-value pairs, embodied in another entity record, with a new characterization interval. Thus, if we overlap the timelines (or, more generally, the sequences of value-changing events) for the two entities, we can hope to establish correspondences amongst the entity records that represent them at various points along that events line. The figure below illustrates this intuition.
Relation complement-of between two entity records is intended to capture these correspondences, as follows. Suppose entity records A and B share a set P of attributes, and each of them has other attributes in addition to P. If the values assigned to each attribute in P are compatible between A and B, then we say that A is-complement-of B, and B is-complement-of A, in a symmetrical fashion. In the particular case where the set of attributes of B is a strict superset of A's attributes, we say that B is-complement-of A, but the opposite does not hold; in this case, the relation is not symmetric. (As a special case, A and B may not share any attributes at all, and yet the asserters may still stipulate that they are representing the same thing "Royal Society". The symmetric relation may hold trivially in this case.)
The term compatible used above means that a mapping can be established amongst the values of attributes in P found in the two entity expressions. This generalizes to the case where attribute sets P1 and P2 of A and B, respectively, are not identical but can be mapped to one another. The simplest case is the identity mapping, in which A and B share attribute set P, and furthermore the values assigned to attributes in P match exactly.
It is important to note that the relation holds only for the characterization intervals of the entity expressions involved. As soon as one attribute changes value in one of them, new correspondences need to be found amongst the new entities. Thus, the relation has a validity span that can be expressed in terms of the event lines of the entity.
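For the simplest case, the identity mapping, the attribute-based part of this definition can be sketched in Python. This is one possible reading, offered as an illustration only: entity records are modelled as plain dictionaries, the function name complement_of and the sample attributes are hypothetical, and characterization intervals are deliberately ignored.

```python
def complement_of(b, a):
    """Return True if record b is-complement-of record a under the
    identity mapping: shared attributes agree and b adds something to a."""
    shared = a.keys() & b.keys()
    if any(a[name] != b[name] for name in shared):
        return False  # values of shared attributes are not compatible
    return bool(b.keys() - a.keys())  # b must contribute attributes a lacks


location_view = {"ex:name": "Royal Society", "ex:location": "The Mall"}
member_view = {"ex:name": "Royal Society", "ex:membership": 270}
name_only = {"ex:name": "Royal Society"}

# Case (a): partial overlap, the relation holds in both directions.
print(complement_of(location_view, member_view), complement_of(member_view, location_view))
# Case (b): strict superset, the relation holds in one direction only.
print(complement_of(location_view, name_only), complement_of(name_only, location_view))
```

A fuller treatment would also have to check that the characterization intervals of the two records overlap, and would allow mappings other than identity between attribute values.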
A complementarity record is written wasComplementOf(e2,e1), where e1 and e2 are two identifiers denoting entity records.
The following example illustrates the entity "Royal Society" and its perspectives at various points in time.
entity(rs,[ex:created=1870])
entity(rs_l1,[prov:location="loc2"])
entity(rs_l2,[prov:location="The Mall"])
entity(rs_m1,[ex:membership=250, ex:year=1900])
entity(rs_m2,[ex:membership=300, ex:year=1945])
entity(rs_m3,[ex:membership=270, ex:year=2010])
wasComplementOf(rs_m3, rs_l2)
wasComplementOf(rs_m2, rs_l1)
wasComplementOf(rs_m2, rs_l2)
wasComplementOf(rs_m1, rs_l1)
wasComplementOf(rs_m3, rs)
wasComplementOf(rs_m2, rs)
wasComplementOf(rs_m1, rs)
wasComplementOf(rs_l1, rs)
wasComplementOf(rs_l2, rs)
The complementarity relation is not transitive. Let us consider identifiers e1, e2, and e3 identifying three entity records such that wasComplementOf(e3,e2) and wasComplementOf(e2,e1) hold. The record wasComplementOf(e3,e1) may not hold because the characterization intervals of the denoted entity records may not overlap.
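The lack of transitivity can be seen with a small Python sketch over characterization intervals. The intervals below are hypothetical values chosen for illustration, modelled as closed (start, end) pairs of years; overlap of intervals is only a necessary condition for complementarity, not the full definition.

```python
def overlaps(x, y):
    """Closed intervals overlap when neither one ends before the other starts."""
    return x[0] <= y[1] and y[0] <= x[1]


# Hypothetical characterization intervals for three entity records.
e1 = (1900, 1920)
e2 = (1915, 1950)
e3 = (1945, 2010)

print(overlaps(e3, e2))  # True
print(overlaps(e2, e1))  # True
print(overlaps(e3, e1))  # False
```

Since e3 and e1 never hold at the same time, wasComplementOf(e3,e1) cannot be established even though both other pairs overlap.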
In PROV-ASN, a complementarity record's text matches the complementarityRecord production of the grammar defined in this specification document.
An entity record identifier can optionally be accompanied by an account identifier. When this is the case, it becomes possible to link two entity record identifiers that appear in different accounts. (In particular, the entity record identifiers in two different accounts are allowed to be the same.) When account identifiers are not available, the linking of entity records through complementarity can only take place within the scope of a single account.
In the following example, the same description of the Royal Society is structured according to two different accounts. In the second account, we find a complementarity record linking rs_m1 in account ex:acc2 to rs in account ex:acc1.
account(ex:acc1,
        http://example.org/asserter1,
        ...
        entity(rs,[ex:created=1870])
        ... )
account(ex:acc2,
        http://example.org/asserter2,
        ...
        entity(rs_m1,[ex:membership=250, ex:year=1900])
        ...
        wasComplementOf(rs_m1, ex:acc2, rs, ex:acc1))
An annotation record establishes a link between an identifiable PROV-DM record and a note record referred to by its identifier. Multiple note records can be associated with a given PROV-DM record; symmetrically, multiple PROV-DM records can be associated with a given note record. Since note records have identifiers, they can also be annotated. The annotation mechanism (with note record and the annotation record) forms a key aspect of the extensibility mechanism of PROV-DM (see extensibility section).
An annotation record, written hasAnnotation(r,n,attrs) in PROV-ASN, has the following constituents:
In PROV-ASN, a note record's text matches the noteRecord production of the grammar defined in this specification document.
The interpretation of notes is application-specific. See Section Note for a discussion of the difference between note attributes and other records' attributes. We also note the present tense in this term to indicate that it may not denote something in the past.
The following records
entity(e1,[prov:type="document"])
entity(e2,[prov:type="document"])
activity(a,transform,t1,t2,[])
used(u1,a,e1,[ex:file="stdin"])
wasGeneratedBy(e2, a, [ex:file="stdout"])
note(n1,[ex:icon="doc.png"])
hasAnnotation(e1,n1)
hasAnnotation(e2,n1)
note(n2,[ex:style="dotted"])
hasAnnotation(u1,n2)
assert the existence of two documents in the world (attribute-value pair: prov:type="document") represented by entity records identified by e1 and e2, and annotate these records with a note indicating that the icon (an application-specific way of rendering provenance) is doc.png. They also assert an activity, its usage of the first entity, and its generation of the second entity. The usage record is annotated with a style (an application-specific way of rendering this edge graphically). To be able to express this annotation, the usage record was provided with an identifier u1, which was then referred to in hasAnnotation(u1,n2).
In this section, two constructs are introduced to group PROV-DM records. The first one, account record, is itself a record, whereas the second one, record container, is not.
In PROV-DM, an account record is a wrapper of records with a dual purpose:
An account record, written account(id, assertIRI, recs, attrs) in PROV-ASN, contains:
In PROV-ASN, an account record's text matches the accountRecord production of the grammar defined in this specification document.
The following account record
account(ex:acc0,
        http://example.org/asserter,
        entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
        ...
        wasDerivedFrom(e2,e1)
        ...
        activity(a0,create-file,t)
        ...
        wasGeneratedBy(e0,a0,[])
        ...
        wasAssociatedWith(a4, ag5, [prov:role="communicator"]) )
contains the set of provenance records of section example-prov-asn-encoding, is asserted by agent http://example.org/asserter, and is identified by identifier ex:acc0.
Account records constitute a scope for record identifiers. A record identifier within the scope of an account is intended to denote a single record. However, nothing prevents an asserter from asserting an account containing, for example, multiple entity records with a same identifier but different attribute values. In that case, they should be understood as a single entity record with this identifier and the union of all attribute values, as formalized in identified-entity-in-account.
Whilst constraint identified-entity-in-account specifies how to understand multiple entity records with a same identifier within a given account, it does not guarantee that the entity record formed with the union of all attribute-value pairs is consistent. Indeed, a given attribute may be assigned multiple values, resulting in an inconsistent entity record, as illustrated by the following example.
In the following account record, we find two entity records with a same identifier e.
account(ex:acc1,
        http://example.org/id,
        entity(e,[prov:type="person", ex:age=20])
        entity(e,[prov:type="person", ex:age=30])
        ...)
Application of identified-entity-in-account results in an entity record containing the attribute-value pairs age=20 and age=30. This results in an inconsistent characterization of a person. We note that deciding whether a set of attribute-values is consistent or not is application specific and outside the scope of this specification.
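The effect of identified-entity-in-account on this example can be sketched in Python. This is an illustration under stated assumptions: entity records are modelled as (identifier, attribute dictionary) pairs, and the function names merge_entity_records and multi_valued are hypothetical. The sketch only reports attributes that end up with several values; whether that amounts to an inconsistency is left to the application, as noted above.

```python
from collections import defaultdict


def merge_entity_records(records):
    """Merge entity records sharing an identifier by taking the union of
    their attribute-value pairs; each attribute maps to a set of values."""
    merged = defaultdict(lambda: defaultdict(set))
    for identifier, attrs in records:
        for name, value in attrs.items():
            merged[identifier][name].add(value)
    return {identifier: dict(attrs) for identifier, attrs in merged.items()}


def multi_valued(merged_entity):
    """Return the attribute names that received more than one value."""
    return sorted(name for name, values in merged_entity.items() if len(values) > 1)


records = [
    ("e", {"prov:type": "person", "ex:age": 20}),
    ("e", {"prov:type": "person", "ex:age": 30}),
]
merged = merge_entity_records(records)
print(merged["e"])
print(multi_valued(merged["e"]))  # ['ex:age']
```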
Account records can be nested since an account record can occur among the records being wrapped by another account.
An account is said to be well-formed if it satisfies the constraints generation-unicity and derivation-use.
The union of two accounts is another account, containing the unions of their respective records, where records with a same identifier should be understood according to constraint identified-entity-in-account. Well-formed accounts are not closed under union because the constraint generation-unicity may no longer be satisfied in the resulting union.
Indeed, let us consider another account record
account(ex:acc2,
        http://example.org/asserter2,
        entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
        ...
        activity(a1,create-file,t1)
        ...
        wasGeneratedBy(e0,a1,[ex:fct="create"])
        ... )
with identifier ex:acc2, containing assertions by asserter http://example.org/asserter2 stating that the entity represented by the entity record identified by e0 was generated by an activity represented by the activity record identified by a1 instead of a0 in the previous account ex:acc0. If accounts ex:acc0 and ex:acc2 are merged together, the resulting set of records violates generation-unicity.
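The failure of closure under union can be checked with a short Python sketch. It is a simplification offered for illustration: only the generation records of each account are considered, modelled as hypothetical (entity, activity) pairs, and a violation is taken to mean that one entity is generated by more than one distinct activity. The function name violates_generation_unicity is our own.

```python
from collections import defaultdict


def violates_generation_unicity(generations):
    """Return the entities asserted to be generated by more than one activity."""
    activities = defaultdict(set)
    for entity, activity in generations:
        activities[entity].add(activity)
    return sorted(e for e, acts in activities.items() if len(acts) > 1)


acc0 = [("e0", "a0")]  # wasGeneratedBy(e0,a0,[]) in ex:acc0
acc2 = [("e0", "a1")]  # wasGeneratedBy(e0,a1,[ex:fct="create"]) in ex:acc2

print(violates_generation_unicity(acc0))         # []
print(violates_generation_unicity(acc2))         # []
print(violates_generation_unicity(acc0 + acc2))  # ['e0']
```

Each account is well-formed with respect to this constraint on its own, whereas their union is not.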
Account records constitute a scope for record identifiers. Since accounts can be nested, scopes can also be nested; thus, the scope of record identifiers should be understood in the context of such nested scopes. When a record with an identifier occurs directly within an account, then its identifier denotes this record in the scope of this account, except in sub-accounts where records with the same identifier occur.
The following account record is inspired by section example-prov-asn-encoding. This account, identified by ex:acc3, declares an entity record with identifier e0, which is being referred to in the nested account ex:acc4. The scope of identifier e0 is account ex:acc3, including subaccount ex:acc4.
account(ex:acc3,
        http://example.org/asserter1,
        entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
        activity(a0,create-file,t)
        wasGeneratedBy(e0,a0,[])
        account(ex:acc4,
                http://example.org/asserter2,
                entity(e1, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice", ex:content="" ])
                activity(a0,copy-file,t)
                wasGeneratedBy(e1,a0,[ex:fct="create"])
                wasComplementOf(e1,e0)))
In contrast, an activity record identified by a0 occurs in each of the two accounts. Therefore, each activity record is asserted in a separate scope, and may represent a different activity in the world.
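Nested scoping of record identifiers behaves much like lexical scoping in programming languages, and can be sketched in Python. The Account class below is a hypothetical model for illustration only: each account keeps the records it declares directly and a link to its enclosing account, and records are reduced to short descriptive strings.

```python
class Account:
    """A scope for record identifiers, possibly nested in a parent account."""

    def __init__(self, identifier, records, parent=None):
        self.identifier = identifier
        self.records = records  # records declared directly in this account
        self.parent = parent

    def resolve(self, record_id):
        """Find the innermost enclosing account declaring `record_id`."""
        scope = self
        while scope is not None:
            if record_id in scope.records:
                return scope.identifier, scope.records[record_id]
            scope = scope.parent
        raise KeyError(record_id)


acc3 = Account("ex:acc3", {"e0": "entity", "a0": "activity create-file"})
acc4 = Account("ex:acc4", {"e1": "entity", "a0": "activity copy-file"}, parent=acc3)

print(acc4.resolve("e0"))  # ('ex:acc3', 'entity')
print(acc4.resolve("a0"))  # ('ex:acc4', 'activity copy-file')
print(acc3.resolve("a0"))  # ('ex:acc3', 'activity create-file')
```

Within ex:acc4, identifier e0 resolves to the record declared in the enclosing account, whereas a0 is shadowed by the record declared in ex:acc4 itself.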
The account record is the hook by which further meta information can be expressed about provenance, such as asserter, time of creation, signatures. The annotation mechanism can be used for this purpose, but how general meta-information is expressed is beyond the scope of this specification, except for asserters.
A record container is a house-keeping construct of PROV-DM, also capable of bundling PROV-DM records. A record container is not a record, but can be exploited to return assertions in response to a request for the provenance of something ([PROV-PAQ]).
A record container, written container decls recs endContainer in PROV-ASN, contains:
All the records in recs are implicitly wrapped in a default account, scoping all the record identifiers they declare directly, and constituting a top-level account in the hierarchy of accounts. Consequently, every provenance record is always expressed in the context of an account, either explicitly in an asserted account, or implicitly in a container's default account.
In PROV-ASN, a record container's text matches the recordContainer production of the grammar defined in this specification document.
The following container
container
  prefix ex: http://example.org/,
  account(ex:acc1,http://example.org/asserter1,...)
  account(ex:acc2,http://example.org/asserter1,...)
endContainer
illustrates how two accounts with identifiers ex:acc1 and ex:acc2 can be returned in a PROV-ASN serialization of the provenance of something.
An attribute is a qualified name. A qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part (see detailed rule in [RDF-SPARQL-QUERY], Section 4.1.1).
A qualified name's prefix is optional. If a prefix occurs in a qualified name, it refers to a namespace declared in the record container. In the absence of prefix, the qualified name refers to the default namespace declared in the container.
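The mapping from a qualified name to an IRI can be sketched in Python. This is a minimal illustration of the concatenation described above, not the detailed rule of [RDF-SPARQL-QUERY]: the function name expand_qname is hypothetical, the namespace declarations are passed as a plain dictionary, and the default namespace IRI used in the example is an assumption.

```python
def expand_qname(qname, namespaces, default=None):
    """Map a qualified name to an IRI by concatenating the namespace IRI
    bound to its prefix (or the default namespace) with the local part."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        # No prefix: the name refers to the default namespace.
        if default is None:
            raise ValueError(f"no default namespace declared for {qname!r}")
        return default + qname
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return namespaces[prefix] + local


namespaces = {"ex": "http://example.org/"}
print(expand_qname("ex:color", namespaces))
# http://example.org/color
print(expand_qname("color", namespaces, default="http://example.org/default/"))
# http://example.org/default/color
```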
From this specification's viewpoint, the interpretation of an attribute declared in a namespace other than prov-dm is out of scope.
The PROV data model introduces a fixed set of attributes in the PROV-DM namespace:
The following start record describes the role of the agent identified by ag in this start relation with activity a.
wasStartedBy(a,ag, [prov:role="program-operator"])
The following record declares an agent of type software agent
agent(ag, [prov:type="prov:SoftwareAgent" %% xsd:QName])
An identifier is a qualified name. A qualified name can be mapped into an IRI by concatenating the IRI associated with the prefix and the local part (see detailed rule in [RDF-SPARQL-QUERY], Section 4.1.1).
A PROV-DM Literal represents a data value such as a particular string or number. A PROV-DM Literal represents a value whose interpretation is outside the scope of PROV-DM.
In PROV-ASN, a Literal's text matches the Literal production of the grammar defined in this specification document.
The non-terminals stringLiteral and intLiteral are syntactic sugar for quoted strings with datatype xsd:string and xsd:int, respectively.
In particular, a PROV-DM Literal may be an IRI-typed string (with datatype xsd:anyURI); such an IRI has no specific interpretation in the context of PROV-DM.
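The relation between the convenience notation and explicitly typed literals can be illustrated with a small Python sketch. It is not a parser for the Literal production: it assumes well-formed input, does not handle escaped quotes or a %% sequence inside a quoted string, and the function name parse_literal is hypothetical. It returns the lexical form and the datatype as a pair of strings.

```python
def parse_literal(text):
    """Split a PROV-ASN literal into (lexical form, datatype)."""
    text = text.strip()
    if "%%" in text:
        # Explicitly typed literal: "lexical form" %% datatype
        lexical, datatype = text.rsplit("%%", 1)
        return lexical.strip().strip('"'), datatype.strip()
    if text.startswith('"'):
        # Convenience notation for strings.
        return text.strip('"'), "xsd:string"
    # Convenience notation for integers.
    return text, "xsd:int"


for sample in ['"abc"', '"abc" %% xsd:string', '"1" %% xsd:int', '1',
               '"http://example.org/foo" %% xsd:anyURI']:
    print(parse_literal(sample))
```

Under these assumptions, the quoted string with and without an explicit xsd:string datatype yield the same pair, as do the two notations for the integer 1.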
The following examples respectively are the string "abc" (expressed using the convenience notation), the string "abc", the integer number 1, the integer number 1 (expressed using the convenience notation) and the IRI "http://example.org/foo".
"abc"
"abc" %% xsd:string
"1" %% xsd:int
1
"http://example.org/foo" %% xsd:anyURI
The following example shows a literal of type xsd:QName (see QName [XMLSCHEMA-2]). The prefix ex must be bound to a namespace declared in the record container.
"ex:value" %% xsd:QName
Time instants are defined according to xsd:dateTime [XMLSCHEMA-2].
It is optional to assert time in usage, generation, and activity records.
An asserter is a creator of PROV-DM records. An asserter is denoted by an IRI. Such an IRI has no specific interpretation in the context of PROV-DM.
A PROV-DM namespace is identified by an IRI reference [IRI]. In PROV-DM, attributes, identifiers, and literals with datatype xsd:QName can be placed in a namespace using the mechanisms described in this specification.
A namespace declaration consists of a binding between a prefix and a namespace. Every qualified name with this prefix in the scope of this declaration refers to this namespace. A default namespace declaration consists of a namespace. Every unprefixed qualified name in the scope of this default namespace declaration refers to this namespace.
A recipe link is an association between an activity record and a process specification that underpins the represented activity. Such an IRI has no specific interpretation in the context of PROV-DM.
It is optional to assert recipe links in activities.
Process specifications, as referred to by recipe links, are out of scope of this specification.
Location is an identifiable geographic place (ISO 19112). As such, there are numerous ways in which location can be expressed, such as by a coordinate, address, landmark, row, column, and so forth. This document does not specify how to concretely express locations, but instead provides a mechanism to introduce locations in assertions.
Location is an optional attribute of entity records and activity records. The value associated with an attribute location must be a Literal, expected to denote a location.
This section contains the normative specification of common relations of PROV-DM.
The following figure summarizes the additional relations described in subsections 6.2 onwards.
Record: wasAddedTo_Coll(c2,c1) (resp. wasRemovedFrom_Coll(c2,c1)) denotes that collection c2 is an updated version of collection c1, following an insertion (resp. deletion) operation.
Record: wasAddedTo_Key(c,k) (resp. wasRemovedFrom_Key(c,k)) denotes that collection c had a new value with key k added to (resp. removed from) it.
Record: wasAddedTo_Entity(c,e) denotes that collection c had entity e added to it.
Consider the following assertions:
wasAddedTo_Coll(c2,c1)
wasAddedTo_Key(c2,k1)
wasAddedTo_Entity(c2,e1)
wasAddedTo_Coll(c3,c2)
wasAddedTo_Key(c3,k2)
wasAddedTo_Entity(c3,e2)
wasRemovedFrom_Coll(c4,c3)
wasRemovedFrom_Key(c4,k1)
The corresponding graphical representation is shown below.
With these assertions:
A traceability record states the existence of a "dependency path" between two entities, indicating that one entity can be shown to be in the lineage of another, and may have influenced it in some way. This relation is transitive.
A traceability record, written tracedTo(id,e2,e1,attrs) in PROV-ASN:
In PROV-ASN, a traceability record's text matches the traceabilityRecord production of the grammar defined in this specification document.
A traceability record can be inferred from existing relations, or can be asserted stating that such a dependency path exists without the asserter knowing its individual steps, as expressed by the following constraints.
We note that the previous constraint is not really an inference rule, since there is nothing that we can actually infer. Instead, this constraint should simply be seen as part of the definition of the traceability record.
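Because traceability is transitive, the full set of tracedTo pairs implied by a set of known dependency pairs is their transitive closure. The sketch below illustrates that computation only. The function name traced_to_closure and the representation of records as (e2, e1) pairs are assumptions introduced here; which relations give rise to the initial pairs is determined by the constraints of this specification, not by this sketch.

```python
def traced_to_closure(pairs):
    """Compute the transitive closure of a set of (e2, e1) pairs,
    where each pair is read as: e2 can be traced to e1."""
    direct = {}
    for later, earlier in pairs:
        direct.setdefault(later, set()).add(earlier)

    closure = set()
    for start in direct:
        # Depth-first walk over everything reachable from `start`.
        stack = list(direct[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(direct.get(node, ()))
    return closure


# Hypothetical input: e3 depends on e2, which depends on e1.
known = {("e3", "e2"), ("e2", "e1")}
implied = traced_to_closure(known)
```

With this input, the pair ("e3", "e1") is part of the result even though it was not given directly, which is what transitivity amounts to.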
PROV-DM allows dependencies amongst activities to be expressed. An information flow ordering record is a representation that an entity was generated by an activity, before it was used by another activity. A control ordering record is a representation that an activity was initiated by another activity.
In PROV-ASN, an activity ordering record's text matches the activityOrderingRecord production of the grammar defined in this specification document.
An information flow ordering record, written as wasInformedBy(id,a2,a1,attrs) in PROV-ASN, contains:
An information flow ordering record is formally defined as follows.
The relationship wasInformedBy is not transitive. Indeed, consider the following records.
wasInformedBy(a2,a1)
wasInformedBy(a3,a2)
We cannot infer wasInformedBy(a3,a1) from them. Indeed, from wasInformedBy(a2,a1), we know that there exists e1 such that e1 was generated by a1 and used by a2. Likewise, from wasInformedBy(a3,a2), we know that there exists e2 such that e2 was generated by a2 and used by a3. The following illustration shows a case where transitivity cannot hold. The horizontal axis represents time. We see that e1 was generated after e2 was used. Furthermore, the illustration also shows that a3 completes before a1. So it is impossible for a3 to have used an entity generated by a1.
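The argument above can be checked mechanically with a small sketch that derives information flow ordering from generation and usage facts. The function name informed_by and the tuple representations are assumptions made for this illustration, and the sketch deliberately ignores the temporal condition (generation before use), looking only at which entity links which pair of activities.

```python
def informed_by(generated, used):
    """Derive (a2, a1) pairs such that some entity was generated by a1
    and used by a2.

    generated: set of (entity, activity) pairs, entity was generated by activity.
    used:      set of (activity, entity) pairs, activity used entity.
    """
    generators = {}
    for entity, activity in generated:
        generators.setdefault(entity, set()).add(activity)
    return {
        (user, generator)
        for user, entity in used
        for generator in generators.get(entity, ())
    }


# The situation described above: e1 flows from a1 to a2, e2 flows from a2 to a3.
generated = {("e1", "a1"), ("e2", "a2")}
used = {("a2", "e1"), ("a3", "e2")}
derived = informed_by(generated, used)
```

Here the derived set contains ("a2", "a1") and ("a3", "a2") but not ("a3", "a1"), since no single entity was both generated by a1 and used by a3.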
A control ordering record, written as wasStartedBy(a2,a1) in PROV-ASN, contains:
Such a record states control ordering between a2 and a1, specified as follows.
In the following assertions, we find two activity records, identified by a1 and a2, representing two activities, which took place on two separate hosts. The third record indicates that the latter activity was started by the former.
activity(a1,workflow,t1,t2,[ex:host="server1.example.org"])
activity(a2,sub-workflow,t3,t4,[ex:host="server2.example.org"])
wasStartedBy(a2,a1)
Alternatively, we could have asserted the existence of an entity, representing a request to create a sub-workflow. This request, issued by a1, triggered the start of a2.
entity(e,[prov:type="creation-request"])
wasGeneratedBy(e,a1)
wasStartedBy(a2,e)
A revision record is a representation of the creation of an entity considered to be a variant of another. Deciding whether something is made available as a revision of something else usually involves an agent who represents someone in the world who takes responsibility for approving that the former is a due variant of the latter.
A revision record, written wasRevisionOf(e2,e1,ag,attrs) in PROV-ASN, contains:
In PROV-ASN, a revision record's text matches the revisionRecord production of the grammar defined in this specification document.
A revision record needs to satisfy the following constraint, linking the two entity records by a derivation, and stating them to be a complement of a third entity record.
wasRevisionOf is a strict sub-relation of wasDerivedFrom since two entities e2 and e1 may satisfy wasDerivedFrom(e2,e1) without being a variant of each other.
The following revision assertion
agent(ag,[prov:type="QualityController"])
entity(e1,[prov:type="document"])
entity(e2,[prov:type="document"])
wasRevisionOf(e2,e1,ag)
states that the document represented by the entity record identified by e2 is a revision of the document represented by the entity record identified by e1; the agent denoted by ag is responsible for this new versioning of the document.
An attribution record represents that an entity is ascribed to an agent and is compliant with the attributionRecord production.
An attribution record, written wasAttributedTo(e,ag,attr), contains the following components:
Attribution models the notion of an activity generating an entity identified by e being controlled by an agent ag, which takes responsibility for generating e. Formally, this is expressed as the following necessary condition.
In PROV-ASN, an attribution record's text matches the attributionRecord production of the grammar.
activity(pe,recipe,t1,t2,attr1)
wasGeneratedBy(e,pe)
wasAssociatedWith(pe,ag,attr2)
for some sets of attribute-value pairs attr1 and attr2, and times t1 and t2.
A quotation record is a representation of the repeating or copying of some part of an entity, compatible with the quotationRecord production.
A quotation record, written wasQuotedFrom(e2,e1,ag2,ag1,attrs), contains:
In PROV-ASN, a quotation record's text matches the quotationRecord production of the grammar.
wasDerivedFrom(e2,e1)
wasAttributedTo(e2,ag2)
wasAttributedTo(e1,ag1)
A summary record represents that an entity is a synopsis or abbreviation of another entity. A summary record is compliant with the summaryRecord production.
An assertion wasSummaryOf, written wasSummaryOf(e2,e1,attrs), contains:
In PROV-ASN, a summary record's text matches the summaryRecord production of the grammar.
wasSummaryOf is a strict sub-relation of wasDerivedFrom.
An original source record represents an entity in which another entity first appeared. An original source record is compliant with the originalSourceRecord production.
An assertion hadOriginalSource, written hadOriginalSource(e2,e1,attrs), contains:
hadOriginalSource is a strict sub-relation of wasDerivedFrom.
In PROV-ASN, an original source record's text matches the originalSourceRecord production of the grammar.
The PROV data model provides several extensibility points that allow designers to specialize it to specific applications or domains. We summarize these extensibility points here:
The PROV-DM namespace declares a set of reserved attributes: type, location.
The PROV-DM namespace declares a reserved attribute: role.
The PROV data model is designed to be application and technology independent, but specializations of PROV-DM are welcome and encouraged. To ensure inter-operability, specializations of the PROV data model that exploit the extensibility points summarized in this section must preserve the semantics specified in this document. For instance, a qualified attribute on a domain specific entity record must represent an aspect of an entity and this aspect must remain unchanged during the characterization's interval of this entity record.
This specification introduces the notion of an identifiable entity in the world. In PROV-DM, an entity record is a representation of such an identifiable entity. An entity record includes an identifier identifying this entity. Identifiers are qualified names, which can be mapped to IRIs.
The term 'resource' is used in a general sense for whatever might be identified by a URI [RFC3986]. On the Web, a URI denotes a resource, without any expectation that the resource is accessed.
The purpose of this section is to clarify the relationship between resource and the notions of entity and entity record.
In the context of PROV-DM, a resource is just a thing in the world. One may take multiple perspectives on such a thing and its situation in the world, fixing some of its aspects.
We refer to the example of section 2.1 for a resource (at some URL) and three different perspectives, referred to as entities. Three different entity records can be expressed for this report, which, in the PROV-ASN sample below, are expressed within the same account.
container
prefix app urn:example:
prefix cr http://example.org/crime/
account(acc1,
        http://example.org/asserter1,
        entity(app:0, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
        entity(app:1, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
        entity(app:2, [ prov:type="Document", cr:author="John" ])
        ...)
endContainer
Each entity record contains an identifier that identifies the entity it represents. In this example, three identifiers were minted, and their prefix uses the URN syntax with the "example" namespace.
Given that the report is a resource denoted by the URI http://example.org/crime.txt, we could simply use this URI as the identifier of an entity. This would avoid us minting new URIs. Hence, the report URI would play a double role: as a URI it denotes a resource accessible at that URI, and as a PROV-DM identifier, it identifies a specific characterization of this report. A given identifier identifies a single entity record within the scope of an account. Hence, below, all entity records have been given the same identifier but appear in the scope of different accounts.
container
prefix app http://example.org/
prefix cr http://example.org/crime/
account(acc2,
        http://example.org/asserter1,
        entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
        ...)
account(acc3,
        http://example.org/asserter1,
        entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
        ...)
account(acc4,
        http://example.org/asserter1,
        entity(app:crime.txt, [ prov:type="Document", cr:author="John" ])
        ...)
endContainer
In this case, the qualified name app:crime.txt, which maps to the URI http://example.org/crime.txt, still denotes the same resource; however, the perspective we take about that resource is expressed as a different entity record, happening to have the same identifier in different accounts.
Alternatively, if we need to assert the existence of two different perspectives on the report within the same account, then alternate identifiers must be used, one of them being allowed to be the resource URI.
container
prefix app http://example.org/
prefix app2 urn:example:
prefix cr http://example.org/crime/
account(acc5,
        http://example.org/asserter1,
        entity(app:crime.txt, [ prov:type="Document", cr:path="http://example.org/crime.txt" ])
        entity(app2:1, [ prov:type="Document", cr:path="http://example.org/crime.txt", cr:version="2.1", cr:content="...", cr:date="2011-10-07" ])
        ...)
endContainer
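The rule that an identifier identifies a single entity record within the scope of an account can be checked with a short sketch. The function name duplicate_identifiers and the dictionary representation of accounts are assumptions made for this illustration; they are not part of PROV-DM or PROV-ASN.

```python
from collections import Counter


def duplicate_identifiers(accounts):
    """Report, per account, the entity identifiers used by more than one record.

    accounts: dict mapping an account identifier to the list of entity
              identifiers of the entity records asserted in that account.
    Returns a dict containing only the accounts that have repeated identifiers.
    """
    problems = {}
    for account, identifiers in accounts.items():
        repeated = sorted(
            ident for ident, count in Counter(identifiers).items() if count > 1
        )
        if repeated:
            problems[account] = repeated
    return problems


# Same identifier in different accounts: acceptable.
across_accounts = duplicate_identifiers(
    {"acc2": ["app:crime.txt"], "acc3": ["app:crime.txt"], "acc4": ["app:crime.txt"]}
)
# Two perspectives in one account with alternate identifiers: acceptable.
within_account = duplicate_identifiers({"acc5": ["app:crime.txt", "app2:1"]})
```

Both calls return an empty dictionary, matching the two situations shown in the samples above; reusing app:crime.txt for two records inside acc5 would instead be reported.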
WG membership to be listed here.