Movatterモバイル変換


[0]ホーム

URL:


W3C

Semantics of the PROV Data Model

W3C Working Group Note 30 April 2013

This version:
http://www.w3.org/TR/2013/NOTE-prov-sem-20130430/
Latest published version:
http://www.w3.org/TR/prov-sem/
Previous version:
http://www.w3.org/TR/2013/WD-prov-sem-20130312/
Editor:
James Cheney, University of Edinburgh

This document is also available in this non-normative format:PDF

Copyright © 2012-2013W3C® (MIT,ERCIM,Keio,Beihang), All Rights Reserved.W3Cliability,trademark anddocument use rules apply.


Abstract

Provenance is information about entities, activities, and peopleinvolved in producing a piece of data or thing, which can be used toform assessments about its quality, reliability ortrustworthiness. PROV-DM is the conceptual data model that forms abasis for theW3C provenance (PROV) family of specifications.

This document presents a model-theoretic semantics for the PROVdata model, viewingPROV-DM statements as atomic formulas in the sense of first-orderlogic, and viewing the constraints and inferences specified inPROV-CONSTRAINTS as a first-order theory. It is shown that valid PROVinstances (in the sense of PROV-CONSTRAINTS) correspond to satisfiabletheories.This information may be useful to researchers or users of PROV tounderstand the intended meaning and use of PROV for modelinginformation about the actual history, derivation or evolution of Webresources. It may also be useful for development of additionalconstraints or inferences for reasoning about PROV or integration ofPROV with other Semantic Web vocabularies. It isnot proposedas a canonical or required semantics of PROV and does not place anyconstraints on the use of PROV.

The PROV Document Overview describes the overall state of PROV, and should be read before other PROV documents.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of currentW3C publications and the latest revision of this technical report can be found in theW3C technical reports index at http://www.w3.org/TR/.

PROV Family of Documents

This document is part of the PROV family of documents, a set of documents defining various aspects that are necessary to achieve the vision of inter-operableinterchange of provenance information in heterogeneous environments such as the Web. These documents are listed below. Please consult the [PROV-OVERVIEW] for a guide to reading these documents.

Implementations Encouraged

The Provenance Working Group encourages implementations that make useof or extend the semantics in this document.Although work on this document by the Provenance Working Group is complete, errors may be recorded in theerrata or and these may be addressed in future revisions.

Please Send Comments

This document was published by theProvenance Working Group as a Working Group Note. If you wish to make comments regarding this document, please send them topublic-prov-comments@w3.org (subscribe,archives). All comments are welcome.

Publication as a Working Group Note does not imply endorsement by theW3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the5 February 2004W3C Patent Policy.W3C maintains apublic list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes containsEssential Claim(s) must disclose the information in accordance withsection 6 of theW3C Patent Policy.

Table of Contents

1.Introduction

Provenance is a record that describes the people, institutions,entities, and activities involved in producing, influencing, ordelivering a piece of data or a thing. [PROV-DM]This document complements the PROV-DM specification [PROV-DM] that defines a data model for provenance on the Web, and the PROV-CONSTRAINTS specification[PROV-CONSTRAINTS] thatspecifies definitions, inferences, and constraints that can be used toreason about PROV documents, or determine their validity. This documentprovides a formal semantics of PROV, providing a formalcounterpart to the informal descriptions and motivations givenelsewhere in PROV specifications.

1.1Purpose of this document

The PROV-DM and PROV-CONSTRAINTS specifications give motivating examples thatprovide an intuition about the meaning of the constructs. For someconcepts, such as use, start, end, generation, invalidation, andderivation, the meaning is either obvious or situation-dependent.However, during the development of PROV, the importance of additionalconcepts became evident, but the intuitive meaning or correct use ofthese concepts were not clear. For example, thealternateOf andspecializationOf relations are used in PROV to relate differententities that present aspects of "the same thing". Over time the working group came to aconsensus about these concepts and how they are to be used, but thisunderstanding is based on abstract notions that are not explicit inPROV documents; instead, some of their properties are capturedformally through certain constraints and inferences, while others arenot captured in PROV specifications at all.

The purpose of this document is to present the working group'sconsensus view of the semantics of PROV, using tools from mathematicallogic, principally model theory (though our use of these tools is lightweight). This information may be useful to users for understanding theintent behind certain features of PROV, to researchers investigatingricher forms of reasoning over provenance, or to future effortsbuilding upon PROV. It is intended as an exploration ofone semantics for PROV,not a definitive specification of theonlysemantics of PROV. We provide a semantics that satisfies allof the constraints on valid PROV instances, and such that valid PROVinstances correspond to satisfiable theories: every valid instance hasa model, and vice versa.

The semantics has some appealingproperties. Specifically, it provides a declarative counterpart tothe operational definition of validity taken in PROV-CONSTRAINTS. Inthe specification, validity is defined via a normalization processfollowed by constraint checking on the normal form. This approach was adoptedto keep the specification closer to implementations, although otherimplementations are possible and allowed. In addition to providing asemantics, this document shows that the operationalpresentation of PROV validity checking is equivalent to thedeclarative presentation adopted here. This could help justifyalternative approaches to validity checking.

This document mostly considers the semantics of PROV statements and instances. PROV documents can consist of multiple instances, such as named bundles. The semantics do not cover general PROV documents, but the semantics can be used on each instance in a document separately, just as PROV-CONSTRAINTS specifies that each instance in a document is to be validated separately. So, in the rest of this document, we discuss only PROV instances and not PROV documents. The semantics of extensions of PROV, such as dictionaries [PROV-DICTIONARY] and linking across bundles [PROV-LINKS], are beyond the scope of this document.

This document has been reviewed by the Working Group, but thetheorems and proofs have not beenformally peer-reviewed in the sense of an academic paper. Thus, the Working Group believes this document is an appropriate startingpoint for future study of the semantics of PROV, but further work maybe needed.

1.2Structure of this document

1.3 Audience

This document assumes familiarity with [PROV-DM] and [PROV-CONSTRAINTS] and employs (a simplified form of) [PROV-N] notation. In particular it assumes familiarity with the concepts from logic, and the relationship between PROV statements and instances and first-order formulas and theories, respectively, presented inSection 2.5 of PROV-CONSTRAINTS.

This document may be useful to users of PROV who have a formal background and are interested in the rationale for some of the constructs of PROV; for researchers investigating extensions of PROV or alternative approaches to reasoning about PROV; or for future efforts on provenance standardization.

2. Basics

2.1 Identifiers

A lowercase symbolx,y,... on its own denotes an identifier. Identifiers are viewed asvariables from the point of view of logic. Identifiers denote objects, andtwo different identifiersx andy may denote equal or different objects. We writeIdentifiers for theset of identifiers of interest in a given situation (typically, theset of identifiers present in the PROV instance of interest).

2.2 Attributes and Values

We assume a setAttributes of attribute labels and a setValuesof possible values of attributes. To allow for the fact that someattributes can have undefined or multiple values, we sometimes use the setP(Values), that is, the set of sets of values. Thus, we can use theempty set to stand for an undefined value and{a,b} to stand forthe set of values of a two-valued attribute.

2.3 Times

We assume an ordered set(Times,) of time instants, whereTimesValues and is a linear order.

2.4Atomic Formulas

The following atomic formulas correspond to the statements of PROV-DM. We assume that definitions 1-4 of PROV-CONSTRAINTS have been applied in order to expand all optional parameters; thus, we use uniform notationr(id,a1,,an) instead of the semicolon notationr(id;a1,,an).

Each parameter is either an identifier, a constant (e.g. a time or other literal value in an attribute list), or a null symbol "". Placeholder symbols "" can only appear in the specified argumentspl inwasAssociatedWith anda,g,u inwasDerivedFrom, as shown in the grammar below.

atomic_formula::=element_formula|relation_formula|auxiliary_formulaelement_formula::=entity(id,attrs)|activity(id,st,et,attrs)|agent(id,attrs)relation_formula::=wasGeneratedBy(id,e,a,t,attrs)|used(id,e,a,t,attrs)|wasInvalidatedBy(id,e,a,t,attrs)|wasStartedBy(id,a2,e,a1,t,attrs)|wasEndedBy(id,a2,e,a1,t,attrs)|wasAssociatedWith(id,ag,act,pl,attrs)|wasAssociatedWith(id,ag,act,,attrs)|wasAttributedTo(id,e,ag,attrs)|actedOnBehalfOf(if,ag2,ag1,act,attrs)|wasInformedBy(id,a2,a1,attrs)|wasDerivedFrom(id,e2,e1,act,g,u,attrs)|wasDerivedFrom(id,e2,e1,,,,attrs)|wasInfluencedBy(id,x,y,attrs)|alternateOf(e1,e2)|specializationOf(e1,e2)|hadMember(c,e)auxiliary_formula::=x strictlyPrecedes y|x precedes y|notNull(x)|typeOf(x,ty)attrs::=[attr1=v1,,attrn=vn]ty::=entity|activity|agent|Collection|EmptyCollection

We include the standard PROV collection types (Collection andEmptyCollection) and the membership relationhadMember; however, we do not model dictionaries or the insertion or deletion relations in PROV-DICTIONARY [PROV-DICTIONARY], since these are not part of the PROV recommendations. If these features are incorporated into future standards, their semantics (and the soundness of the associated constraints) should be modeled. We omit theprov prefixes from theCollection andEmptyCollection types.

As stated in the Introduction, we do not explicitly model bundles or PROV documents; however, each instance can be viewed as a set of formulas and can be modeled separately. The semantics of the standard features of PROV can be defined without talking about multiple instances; however, thementionOf relation in PROV-LINKS [PROV-LINKS] is intended to support linking across bundles. Future editions of PROV may incorporatementionOf or other cross-instance assertions, and if so this semantics should be generalized in order to provide a rationale for such an extension and to establish the soundness of constraints associated withmentionOf.

2.5First-Order Formulas

We also consider the usual connectives and quantifiers offirst-order logic.

ϕ::=atomic_formula|True|False|x=y|¬ ϕ|ϕ1ϕ2|ϕ1ϕ2|ϕ1ϕ2|x.ϕ|x.ϕ

3. Structures and Interpretations

In this section we define mathematical structuresW that can be used tointerpret PROV formulas and instances. A structure consists of acollection of sets, functions and relations. The components of astructureW are given in the rest of the section incomponents, highlighted in boxes.

We use the term "component" here in a different sense than in PROV-DM. Here, the components are parts of a large definition, whereas PROV-DM defines six components that group different parts of the PROV data model.

3.1 Things

Things is a set of things in the situation being modeled.Each thing has an associated set ofEvents and attributes whosevalues can change over time. Different kinds ofEvents are specified further below.

To model this, a structureW includes:

Component 1 (things)
  1. a setThings of things
  2. a setEvents of events
  3. a functionevents:ThingsP(Events) from things toassociated sets of events.
  4. a functionvalue:Things×Attributes×EventsP(Values) giving the possible values of each attribute of aThing at the instant of a given event.
  5. Attributes are only defined during the events of a thing, thatis,value(T,a,evt) impliesevtevents(T).

The range ofvalue is the setP(Values), indicating thatvalueis essentially a multi-valued function that returns a set of values (possibly empty). Whenvalue(x,a,evt)=, we say that attributea is undefined forx at eventevt.

Note that this description does not say what the structure of aThing is, only how it may be described in terms of its eventsand attribute values. A thing could be a record of fixedattribute values; it could be a bear; it could be the Royal Society;it could be a transcendental number likeπ. All that matters fromour point of view is that we know how to map theThing to its events and attribute mapping.

The identity of a Thing is not observable through its attributes orevents, so it is possible for two differentThings to be indistinguishable by theirattribute values and events. That is, if the set ofThings={T0,T1} and the attributes arespecified asvalue(T0,a,evt)=value(T1,a,evt) for eachevtEvents andaAttributes, this does not imply thatT0=T1.

Things are associated with certain kinds ofObjects calledEntities, defined inthe next subsection. Specifically, the functionthingOf associatesanEntity to aThing.

3.2 Objects

Things are things in the world that have attributes that can change over time.Things may not have distinguishing features that are readily observable and permanent. In PROV, we do not talk explicitly aboutThings, but instead we talk about various objects that have discrete, fixed features, and relationships among these objects. Some objects, calledEntities, are associated withThings, and their fixed attributes need to match those of the associatedThing during their common events. Others correspond to agents, activities, or identifiable interactions among them.

In this section, we detail the different subsets ofObjects, andgive disjointness constraints and associated functions. Generally, these constraints are necessary to validatedisjointness constraints from PROV-CONSTRAINTS [PROV-CONSTRAINTS].

AnObject is described by a set of events and attributes withfixed values. Objects encompass entities, activities, agents, andinteractions (i.e., usage, generation, and other events or influence relations).To model this, a structure includes:

Component 2 (objects)
  1. a setObjects
  2. a functionevents:ObjectsP(Events) from objects to associated sets of events.
  3. a functionvalue:Objects×AttributesP(Values).

Intuitively,events(e) is the set of events in whiche participated. The setvalue(e,a) is the set of values of attributea during the object's events.

As withThings, the range ofvalue is sets of values,makingvalue effectively a multivalued function. It is alsopossible to have two different objects that are indistinguishable bytheir attributes and associated events. Objects are not things, and thesets ofObjects andThings are disjoint; however, certain objects,namely entities, are associated with things.

Disjointness betweenObjects andThings is not necessary but is assumed in order to avoid confusion between the different categories (time-varyingThings vs fixedObjects).

3.2.1 Entities

Anentity is a kind of object that fixes some aspects of a thing. We assume:

Component 3 (entities)
  1. a setEntitiesObjects of entities, disjoint fromActivities below.
  2. a functionthingOf:EntitiesThings that associates eachEntitye with aThing, such thatevents(e)events(thingOf(e)) and for eachevtevents(e) and for each attributea we havevalue(e,a)value(thingOf(e),a,evt).

Although both entities and things can have undefined or multiple attribute values, their meaning is slightly different: for a thing,value(x,a,evt)= means that the attributea has no value at eventevt, whereas for an entity,value(x,a)= only means that the thing associated to entityx need not have a fixed value fora during the events ofx. This does not imply thatvalue(thingOf(e),a,evt)= whenevtevents(e).

Furthermore, all of the attribute values of the entity must be present in the associated thing throughout the events of the entity. For example, supposevalue(thingOf(e),a,evt) is{1} at some eventevtevents(e) andvalue(thingOf(e),a,evt)={2} at some other eventevt. Thenvalue(e,a) must be because there is no other set of values that is simultaneously contained in both{1} and{2}.

In the above description of howEntities relate toThings, we requirevalue(e,a)value(thingOf(e),a,evt) wheneverevtevents(e). Intuitively, this means that if we are talking about aThing indirectly by describing anEntity, then any attributes we ascribe to theEntity must also describe the associatedThing during their common events. Attributes of bothEntities andThings are multi-valued, so there is no inconsistency in saying that an entity has two different values for some attribute. In some situations, further uniqueness constraints or range constraints could be imposed on attributes.

OnlyEntities are associated withThings, and this association is necessary to provide an interpretation for thealternateOf andspecializationOf relations. It might also make sense to associateAgents,Activities, andInteractions withThings, or with some other structures; however, this is not necessary to model any of the current features of PROV, so in the interest of simplicity we do not do this.

3.2.1.1 Plans

We identify a specific subset of the entities calledplans:

A setPlansEntities of plans.

3.2.1.2Collections

We identify another specific subset of the entities calledcollections, with the following associated structure:

  • A setCollectionsEntities
  • A membership functionmembers:CollectionsP(Entities)mapping each collection to its set of members.

3.2.2 Activities

Anactivity is an object corresponding to a continuing process rather than an evolving thing. We introduce:

Component 6 (activities)
  1. A setActivitiesObjects of activities.
  2. FunctionsstartTime:ActivitiesTimes andendTime:ActivitiesTimes giving the start and end time of each activity.
  3. Activities are disjoint from Entities:EntitiesActivities=.

3.2.3 Agents

An agent is an object that can act, by controlling, starting, ending, or participating in activities. An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity. Agents can act on behalf of other agents. An agent may be a particular type of entity or activity; an agent cannot be both entity and activity because the sets of entities and activities are disjoint. We introduce:

Component 7 (agents)

A setAgentsObjects of agents.

There is no requirement that every agent is either an activity or an entity.

3.2.4 Influences

We consider a setInfluencesObjects which has disjoint subsetsEvents connecting entities and activities,Associations between agents and activities,Attributions between entities and agents,Communications between pairs of activities,Delegations between pairs of agents, andDerivations that describe chains of generation and usage steps. These kinds of influences are discussed further below. Influences are disjoint from entities, activities and agents.

Component 8 (influences)
  1. A setInfluences=EventsAssociationsCommunicationsDelegationsDerivationsObjects
  2. The setsEvents,Associations,Communications,Delegations andDerivations are all pairwise disjoint.
  3. Influences are disjoint from entities, agents andactivities:Influences(EntitiesActivitiesAgents)=
  4. An associated functioninfluenced:InfluencesObjects×Objects giving the source and target of each influence.
3.2.4.1 Events

AnEvent is an instantaneous influence that relates an activityto an entity (either of which could also be anagent). Events have types including usage, generation, invalidation, starting and ending. Events are instantaneous. We introduce:

  1. A setEventsInfluences of events, partitioned into disjoint subsetsStarts,Ends,Generations,Usages,Invalidations.
  2. A functiontime:EventsTimes.
  3. A quasi-ordering on events⪯⊂Events×Events. We writeee whenee ande⪯̸e hold.
  4. A functionstarted:StartsActivities×Entities×Activities, such thatstarted(start)=(a,e,a) impliesstartevents(a)events(e)events(a).
  5. A functionended:EndsActivities×Entities×Activities, such thatended(end)=(a,e,a) impliesendevents(a)events(e)events(a).
  6. A functionused:UsagesActivities×Entities suchthatused(use)=(a,e) impliesuseevents(a)events(e).
  7. A functiongenerated:GenerationsEntities×Activities such thatgenerated(gen)=(a,e) impliesgenevents(a)events(e).
  8. A functioninvalidated:InvalidationsEntities×Activities such thatinvalidated(inv)=(a,e) impliesinvevents(a)events(e).
3.2.4.2 Associations

AnAssociation is an influence relating an agent to an activityand optional plan. To model associations, we introduce:

A setAssociationsInfluences with associated functionassociatedWith:AssociationsAgents×Activities×Plans.

3.2.4.3 Attributions

AnAttribution is an influence relating an entity to an agent. To model attributions, we introduce:

A setAttributionsInfluences with associated functionattributedTo:AttributionsEntities×Agents.

3.2.4.4Communications

ACommunication is an influence indicating exchange of information between activities. To model communications, we introduce:

A setCommunicationsInfluences with associated functioncommunicated:CommunicationsActivities×Activities.

3.2.4.5Delegations

ADelegation is an influence relating two agents. To model delegations, we introduce:

A setDelegationsInfluences and associated functionactedFor:DelegationsAgents×Agents×Activities

3.2.4.6 Derivations

ADerivation is an influence chaining one or more generation and use steps. To model derivations, we introduce an auxiliary notion ofderivation path. These paths are of the form

entngnactnunentn1...ent1g1act1u1ent0

where theenti are entities,acti are activities,gi are generations, andui are usages.

Formally, we consider the (regular) language:

DerivationPaths=Entities(GenerationsActivitiesUsagesEntities)+

with the constraints that for each derivation path:

  • for each substringentgact we havegenerated(g)=(ent,act), and
  • for each substringactuent we haveused(u)=(act,ent).

A setDerivationsInfluences with an associated functionderivationPath:DerivationsDerivationPaths linking each derivation to a derivation path.

ThederivationPath function links eachdDerivations to a derivation path. A derivation has exactly one associated derivation path. However, if the PROV-N statementwasDerivedFrom(e_2,e_1,-,-,-) is asserted in an instance, there may be multiple derivation paths linkinge2 toe1, each corresponding to a different path, identified by different derivationsdDerivations.

A derivation path implies the existence of at least one chained generation and use step. However, not all such potential derivation paths are associated with derivations; there can (and in general will) be many such paths that are not associated with derivation steps. In other words, because we require derivations to be explicitly associated with derivation paths, it is not sound to infer the existence of a derivation from the existence of an alternating generation/use chain.

The reason why we need paths and not just individual derivation steps is to reflect thatwasDerivedFrom(id,e2,e1,,,,attrs) formulas can represent multiple derivation steps. However, there is no way to force a derivation to take multiple steps. Any valid PROV instance has a model in which all derivation paths are one-step.

3.3Additional axioms

Above we have stated some properties of the components. We impose some additional properties that relate several components, as follows:

Component 15 (axioms)
  1. Ifgenerated(g)=(e,a1) andused(u)=(a2,e) then there existscCommunications such thatcommunicated(c)=(a2,a1).
  2. IfeEntities then there existgen,inv,a,a such thatgenerated(gen)=(e,a) andinvalidated(inv)=(e,a).
  3. Ifstarted(start)=(a2,e,a1) then there existsgen such thatgenerated(gen)=(e,a1).
  4. Ifended(end)=(a2,e,a1) then there existsgen such thatgenerated(gen)=(e,a1).
  5. IfdDerivations andprov:Revisionvalue(d,prov:type) and there existsw(GenerationsActivitiesUsesEntities) such thatderivationPath(deriv)=e2we1DerivationPaths thenthingOf(e1)=thingOf(e2).
  6. IfattributedTo(att)=(e,ag) then there existgen,assoc anda such thatgenerated(gen)=(e,a) andassociatedWith(assoc)=(a,ag).
  7. IfactedFor(deleg)=(ag2,ag1,act) then there existassoc1,assoc2,pl1,pl2 such thatassociatedWith(assoc1)=(ag1,act,pl1) andassociatedWith(assoc2)=(ag2,act,pl2).
  8. Ifgenerated(id)=(e,a) theninfluenced(id)=(e,a).
  9. Ifused(id)=(e,a) theninfluenced(id)=(e,a).
  10. Ifcommunicated(id)=(a2,a1) theninfluenced(id)=(a2,a1).
  11. Ifstarted(id)=(a2,e,a1) theninfluenced(id)=(a2,e).
  12. Ifended(id)=(a2,e,a1) theninfluenced(id)=(a2,e).
  13. Ifinvalidated(id)=(e,a) theninfluenced(id)=(e,a).
  14. IfderivationPath(id)=e2we1 theninfluenced(id)=(e2,e1).
  15. IfattributedTo(id)=(e,ag) theninfluenced(id)=(e,ag).
  16. IfassociatedWith(id)=(a,ag,pl) theninfluenced(id)=(a,ag).
  17. IfactedFor(id)=(ag2,ag1) theninfluenced(id)=(ag2,ag1).
  18. Ifgenerated(gen)=(e,a)=generated(gen) thengen=gen.
  19. Ifinvalidated(inv)=(e,a)=invalidated(inv) theninv=inv.
  20. Ifstarted(st)=(a,e1,a) andstarted(st)=(a,e2,a) thenst=st.
  21. Ifended(end)=(a,e1,a) andended(end)=(a,e2,a) thenend=end.
  22. Ifstarted(st)=(a,e,a) thenstevt for allevtevents(a)Invalidations.
  23. Ifended(end)=(a,e,a) thenevtend for allevtevents(a)Invalidations.
  24. Ifgenerated(gen)=(e,a) thengenevt for allevtevents(e).
  25. Ifinvalidated(inv)=(e,a) thenevtinv for allevtevents(e).
  26. For any derivationderiv, with pathderivationPath(deriv)=w, ife2gaue1 is a substring ofw wheree1,e2Entities,gGenerations,uUsages andaActivities thenug.
  27. For any derivationderiv, with pathderivationPath(deriv)=e2we1, ifgenerated(gen1)=(e1,a1) andgenerated(gen2)=(e2,a2) thengen1gen2.
  28. IfassociatedWith(assoc)=(a,ag,pl) andstarted(start)=(a,e1,a1) andinvalidated(inv)=(ag,a2) thenstartinv.
  29. IfassociatedWith(assoc)=(a,ag,pl) andgenerated(gen)=(ag,a1) andended(end)=(a,e2,a2) thengenend.
  30. IfassociatedWith(assoc)=(a,ag,pl) andstarted(start)=(a,e1,a1) andended(end)=(ag,e2,a2) thenstartend.
  31. IfassociatedWith(assoc)=(a,ag,pl) andstarted(start)=(ag,e1,a1) andended(end)=(a,e2,a2) thenstartend.
  32. IfattributedTo(attrib)=(e,ag) andgenerated(gen1)=(ag1,a1) andgenerated(gen2)=(e,a2) thengen1gen2.
  33. IfattributedTo(attrib)=(e,ag) andstarted(start)=(ag1,e1,a1) andgenerated(gen)=(e,a2) thenstartgen.
  34. IfactedFor(deleg)=(ag2,ag1,a) andgenerated(gen)=(ag1,a1) andinvalidated(inv)=(ag2,a2) thengeninv.
  35. IfactedFor(deleg)=(ag2,ag1,a) andstarted(start)=(ag1,e1,a1) andended(end)=(ag2,e2,a2) thenstartend.
  36. IfeEntity andprov:emptyCollectionvalue(e,prov:type) theneCollections andmembers(e)=.

These properties are calledaxioms, and they are needed to ensure that the PROV-CONSTRAINTS inferences and constraints hold in all structures.

Axioms 22 and 23 do not require that invalidation events originating from an activity follow the activity's start event(s) or precede its end event(s). This is because there is no such constraint in PROV-CONSTRAINTS. Arguably, there should be a constraint analogous to Constraint 34 that specifies that any invalidation event in which an activity participates must follow the activity's start event(s) and precede its end event(s).

Here, we exempt invalidations from axioms 22 and 23 in order to simplify the proof of weak completeness.

3.4 Putting it all together

APROV structureW is a collection of sets, functions, and relations containing all of the abovedescribed components and satisfying all of the associated propertiesand axioms. If we need to talk about the objects or relations ofmore than one structure then we may writeW1.Objects,W1.Things,etc.; otherwise, todecrease notational clutter, when we consider a fixed structure then the names of the sets, relations and functions above refer to the components of that model.

Some features of PROV structures are relatively obvious or routine,corresponding directly to features of PROV and associated inferences.For example, the functionsused,generated,invalidated,started,ended mapping events to their associated entities or activities, andcommunicated,associatedWith,attributedTo,actedFor associatingother types of influences with appropriate data.

On the other hand,some features are more distinctive, and represent areas where formalmodeling has been used to guide the development of PROV. Derivationpaths are one such distinctive feature; they correspond to anintuition that derivations may describe one or multiple generation-usesteps leading from one entity to another. Another distinctive featureis the use ofThings, which correspond to changing, real-worldthings, as opposed toEntities, which correspond to limited views orperspectives onThings, with some fixed aspects. The semanticstructures ofThings andEntities provide a foundation for thealternateOf andspecializationOf relations.

3.5 Interpretations

We need to link identifiers to the objects they denote. We do this using a function which we shall call aninterpretation. An interpretation is a functionρ:IdentifiersObjects describingwhich object is the target of each identifier. The mapping from identifiers to objects maynot change over time; onlyObjects can be denoted byIdentifiers.

4. Semantics

In what follows, letW be a fixed structure with the associated sets and relations discussed in the previous section, and letρ be an interpretation of identifiers as objects inW.The annotations [WF] refer to well-formedness constraints that correspond to typing constraints.

4.1 Satisfaction

Consider a formulaϕ, a structureW and an interpretationρ.We define notationW,ρϕ which means thatϕ is satisfied inW,ρ. For atomic formulas, the definition of the satisfaction relation is given in the next few subsections. We give the standard definition of the semantics of the other formulas:

Semantics 16 (first-order-logic-semantics)
  1. W,ρTrue always holds.
  2. W,ρFalse never holds.
  3. W,ρx=y holds if and only ifρ(x)=ρ(y).
  4. W,ρ¬ϕ holds if and only ifW,ρϕ does not hold.
  5. W,ρϕψ holds if and only ifW,ρϕ andW,ρψ.
  6. W,ρϕψ holds if eitherW,ρϕ orW,ρψ.
  7. W,ρϕψ holds ifW,ρϕ impliesW,ρψ.
  8. W,ρx.ϕ holds if there exists someobjObjects such thatW,ρ[x:=obj]ϕ.
  9. W,ρx.ϕ holds if there for everyobjObjects we haveW,ρ[x:=obj]ϕ.

In the semantics above, note that the domain of quantification is the set ofObjects; that is, quantifiers range over entities, activities, agents, or influences (which are in turn further subdivided into types of influences).Things and relations cannot be referenced directly by identifiers.

A PROV instanceI consists of a set of statements, each of which can be translated to an atomic formula following the definitional rules in PROV-CONSTRAINTS, possibly by introducing fresh existential variables. Thus, we can view an instanceI as a set of atomic formulas{ϕ1,,ϕn}, or equivalently a single formulax1,,xk. ϕ1ϕn, wherex1,,xk are the existential variables ofI.

4.2 Attribute matching

We say that an objectobj matches attributes[attr1=val1,...] in structureW provided:for each attributeattri, we havevaliW.value(obj,attri).This is sometimes abbreviated as:match(W,obj,attrs).

4.3 Semantics of Element Formulas

4.3.1 Entity

An entity formula is of the formentity(id,attrs) whereid denotes an entity.

Entity formulasentity(id,attrs) can be interpreted as follows:

Semantics 17 (entity-semantics)

W,ρentity(id,attrs) holds if and only if:

  1. [WF]id denotes an entityent=ρ(id)Entities.
  2. the attributes match:match(W,ent,attrs).

Not all of the attributes of an entity object are required to be present in an entity formula about that object. For example, the following formulas all hold ifx denotes an entitye such thatvalue(e,a)={4,5},value(e,b)={6} hold:

 entity(x,[]) entity(x,[a=5]) entity(x,[a=4,a=5]) entity(x,[a=4,b=6])

Note that PROV-CONSTRAINTS normalization will merge these formulasto a single one:

  entity(x,[a=4,a=5,b=6])

4.3.2 Activity

An activity formula is of the formactivity(id,st,et,attrs)whereid is a identifier referring to the activity,st is a starttime andet is an end time, andattrs are the attributes ofactivityid.

Semantics 18 (activity-semantics)

W,ρactivity(id,st,et,attrs) holds if and only if:

  1. [WF] The identifierid maps to an activityact=ρ(id)Activities.
  2. ρ(st)Times is the activity's start time, that is:startTime(act)=ρ(st).
  3. ρ(et) is the activity's end time, that is:endTime(act)=ρ(et).
  4. There existsstart,e,a such thatstarted(start)=(act,e,a),and for all such start events $startTime(act) = time(start).
  5. There existsend,e,a such thatended(end)=(act,e,a), andfor all such end eventsendTime(act)=time(end).
  6. The attributes match:match(W,act,attrs).

The above definition is complicated for two reasons. First, we need to ensure that every activity has a start and end event. Second, when anactivity formula is asserted, we need to make sure all of the associated start and end event times match.

4.3.3 Agent

An agent formula is of the formagent(id,attrs) whereid denotes the agent andattrs describes additional attributes.

Semantics 19 (agent-semantics)

W,ρagent(id,attrs) holds if and only if:

  1. [WF]id denotes an agentag=ρ(id)Agents.
  2. The attributes match:match(W,ag,attrs).

4.4 Semantics of Relations

4.4.1 Generation

The generation formula is of the formwasGeneratedBy(id,e,a,t,attrs) whereid is an event identifier,e is an entity identifier,a is an activity identifier,attrs isa set of attribute-value pairs, andt is a time.

Semantics 20 (generation-semantics)

W,ρwasGeneratedBy(id,e,a,t,attrs) holds if and only if:

  1. [WF] The identifierid denotes a generation eventevt=ρ(id)Generations.
  2. [WF] The identifiere denotes an entityent=ρ(e)Entities.
  3. [WF] The identifiera denotes an activityact=ρ(a)Activities.
  4. The eventevt occurred at timeρ(t)Times, i.e.time(evt)=ρ(t).
  5. The activityact generatedent viaevt, i.e.generated(evt)=(ent,act).
  6. The attribute values match:match(W,evt,attrs).

4.4.2 Use

The use formula is of the formused(id,a,e,t,attrs) whereiddenotes an event,a is an activity identifier,e is an objectidentifier,attrs is a set of attribute-value pairs, andt is a time.

Semantics 21 (usage-semantics)

W,ρused(id,a,e,t,attrs) holds if and only if:

  1. [WF] The identifierid denotes a usage eventevt=ρ(id)Usages.
  2. [WF] The identifiera denotes an activityact=ρ(id)Activities.
  3. [WF] The identifiere denotes an entityent=ρ(e)Entities.
  4. The eventevt occurred at timeρ(t)Times,i.e.time(evt)=ρ(t).
  5. The activityact usedobj viaevt, i.e.used(evt)=(act,ent).
  6. The attribute values match:match(W,evt,attrs).

4.4.3 Invalidation

The invalidation formula is of the formwasInvalidatedBy(id,e,a,t,attrs) whereid is an event identifier,e is an entity identifier,a is an activity identifier,attrs is a set of attribute-value pairs, andt is a time.

Semantics 22 (invalidation-semantics)

An invalidation formulaW,ρwasInvalidatedBy(id,e,a,t,attrs) holds if and only if:

  1. [WF] The identifierid denotes an invalidation eventevt=ρ(id)Invalidations.
  2. [WF] The identifiere denotes an entityent=ρ(e)Entities.
  3. [WF] The identifiera denotes an activityact=ρ(a)Activities.
  4. The eventevt occurred at timeρ(t)Times, i.e.time(evt)=ρ(t).
  5. The activityact invalidatedent viaevt, i.e.invalidated(evt)=(ent,act).
  6. The attribute values match:match(W,evt,attrs).

4.4.4 Association

An association formula has the formwasAssociatedWith(id,a,ag,pl,attrs).

Semantics 23 (association-plan-semantics)

W,ρwasAssociatedWith(id,a,ag,pl,attrs) holds if and only if:

  1. [WF]assoc denotes an associationassoc=ρ(id)Associations.
  2. [WF]a denotes an activityact=ρ(a)Activities.
  3. [WF]ag denotes an agentagent=ρ(ag)Agents.
  4. [WF]pl denotes a planplan=ρ(pl)Plans.
  5. The association associates the agent with the activity and plan, i.e.associatedWith(assoc)=(agent,act,plan).
  6. The attributes match:match(W,assoc,attrs).
Semantics 24 (assocation-semantics)

W,ρwasAssociatedWith(id,a,ag,,attrs) holds if and only if:

  1. [WF]assoc denotes an associationassoc=ρ(id)Associations.
  2. [WF]a denotes an activityact=ρ(a)Activities.
  3. [WF]ag denotes an agentagent=ρ(ag)Agents.
  4. The association associates the agent with the activity and no plan, i.e.associatedWith(assoc)=(agent,act,).
  5. The attributes match:match(W,assoc,attrs).

4.4.5 Start

A start formulawasStartedBy(id,a2,e,a1,t,attrs) is interpreted as follows:

Semantics 25 (start-semantics)

W,ρwasStartedBy(id,a2,e,a1,t,attrs) holds if and only if:

  1. [WF]id denotes a start eventevt=ρ(id)Starts.
  2. [WF]a2 denotes an activityact2=ρ(a2)Activities.
  3. [WF]e denotes an entityent=ρ(e)Entities.
  4. [WF]a1 denotes an activityact1=ρ(a1)Activities.
  5. The event happened at timet, that is,ρ(t)==time(evt).
  6. The activityact1 startedact2 via entityent: that is,started(evt)=(act2,ent,act1).
  7. The attributes match:match(W,evt,attrs).

4.4.6 End

An activity end formulawasEndedBy(id,a2,e,a1,t,attrs) is interpreted as follows:

Semantics 26 (end-semantics)

W,ρwasEndedBy(id,a2,e,a1,t,attrs) holds if and only if:

  1. [WF]id denotes an end eventevt=ρ(id)Ends.
  2. [WF]a2 denotes an activityact2=ρ(a2)Activities.
  3. [WF]e denotes an entityent=ρ(e)Entities.
  4. [WF]a1 denotes an activityact1=ρ(a1)Activities.
  5. The event happened at the end ofact2, that is,ρ(t)=endTime(act2)=time(evt).
  6. The activityact1 endedact2 via entityent: that is,ended(evt)=(act2,ent,act1).
  7. The attributes match:match(W,evt,attrs).

4.4.7 Attribution

An attribution formulawasAttributedTo(id,e,ag,attrs) is interpreted as follows:

Semantics 27 (attribution-semantics)

W,ρwasAttributedTo(id,e,ag,attrs) holds if and only if:

  1. [WF]id denotes an associationassoc=ρ(id)Associations.
  2. [WF]e denotes an entityent=ρ(e)Entities.
  3. [WF]ag denotes an agentagent=ρ(ag)Agents.
  4. The entity was attributed to the agent, i.e.attributedTo(assoc)=(ent,agent).
  5. The attributes match:match(W,assoc,attrs).

4.4.8Communication

A communication formulawasInformedBy(id,a2,a2,attrs) isinterpreted as follows:

Semantics 28 (communication-semantics)

W,ρwasInformedBy(id,a2,a1,attrs) holds if and only if:

  1. [WF]id denotes a communicationcomm=ρ(id)Communications.
  2. [WF]a1,a2 denote activitiesact1=ρ(a1)Activities,act2=ρ(a2)Activities.
  3. There existgen,use,ent such thatcommunicated(comm)=(act2,act1) andgenerated(gen)=(ent,act1) andused(use)=(act2,ent).
  4. The attributes match:match(W,comm,attrs).

4.4.9 Delegation

TheactedOnBehalfOf(id,ag2,ag1,act,attrs) relation is interpreted as follows:

Semantics 29 (delegation-semantics)

W,ρactedOnBehalfOf(id,ag2,ag1,act,attrs) holds if and only if:

  1. [WF]id denotes a delegationdeleg=ρ(id)Delegations.
  2. [WF]a denotes an activityact=ρ(a)Activities.
  3. [WF]ag1,ag2 denote agentsagent1=ρ(ag1),agent2=ρ(ag2)Agents.
  4. The agentagent2 acted for the agentagent1 with respect to the activityact, i.e.actedFor(deleg)=(agent2,agent1,act).
  5. The attributes match:match(W,deleg,attrs).

4.4.10 Derivation

Derivation formulas can be of one of two forms:

  • wasDerivedFrom(id,e2,e1,a,g,u,attrs), which specifies an activity, generation and usage event. For convenience we call this aprecise derivation.
  • andwasDerivedFrom(id,e2,e1,,,,attrs), which does not specify an activity, generation and usage event. For convenience we call this animprecise derivation.
4.4.10.1 Precise

A precise derivation formula has the formwasDerivedFrom(id,e2,e1,a,g,u,attrs).

W,ρwasDerivedFrom(id,e2,e1,act,g,u,attrs) holds if and only if:

  1. [WF]id denotes a derivationderiv=ρ(id)Derivations.
  2. [WF]e1,e2 denote entitiesent1=ρ(e1),ent2=ρ(e2)Entities.
  3. [WF]a denotes an activityact=ρ(a)Activities.
  4. [WF]g denotes a generation eventgen=ρ(g)Generations.
  5. [WF]u denotes a use eventuse=ρ(u)Usages.
  6. The derivation denotes a one-step derivation path linking theentities via the activity, generation and use:derivationPath(deriv)=ent2genactuseent1.
  7. The attribute values match:match(W,deriv,attrs).
4.4.10.2 Imprecise

An imprecise derivation formula has the formwasDerivedFrom(id,e2,e1,,,,attrs).

W,ρwasDerivedFrom(id,e2,e1,,,,attrs) holds if and only if:

  1. [WF]id denotes a derivationderiv=ρ(id)Derivations.
  2. [WF]e1,e2 denote entitiesent1=ρ(e1),ent2=ρ(e2)Entities .
  3. derivationPath(deriv)=ent2went1 for somew.
  4. The attribute values match:match(W,deriv,attrs).

4.4.11Influence

Semantics 32 (influence-semantics)

W,ρwasInfluencedBy(id,o2,o1,attrs) holds if and only if at least one of the following hold:

  1. [WF]id denotes an influenceinf=ρ(id)Influences.
  2. [WF]o1 ando2 denote objectsobj1=ρ(o1)Objects andobj2=ρ(o2)Objects.
  3. The influenceinf linkso2 witho1; that is,influenced(inf)=(o2,o1).
  4. The attribute values match:match(W,deriv,attrs).

4.4.12 Specialization

ThespecializationOf(e1,e2) relation indicates when one entity formula presents more specific aspects of another.

Semantics 33 (specialization-semantics)

W,ρspecializationOf(e1,e2) holds if and only if:

  1. [WF] Bothe1 ande2 are entity identifiers, denoting entitiesent1=ρ(e1)Entities andent2=ρ(e2)Entities.
  2. The two entities present aspects of the same thing, that is,thingOf(ent1)=thingOf(ent2).
  3. The events ofent1 are contained in those ofent2, i.e.events(ent1)events(ent2).
  4. For each attributeattr we havevalue(ent1,attr)value(ent2,attr).
  5. At least one of these inclusions is strict: that is, eitherevents(ent1)events(ent2) or for someattr we havevalue(ent1,attr)value(ent2,attr).

The second criterion says that the two Entities present (possibly different) aspects ofthe same Thing. Note that the third criterion allowsent1 andent2 to have the same events (orevents(ent2) can be larger).The last criterion allowsent1 to have more defined attributes thanent2, but they must include the attributes defined byent2. Two different entities that have the same attributes can also be related by specialization. The fifth criterion (indirectly) ensures that specialization is irreflexive.

4.4.13 Alternate

ThealternateOf relation indicates when two entity formulas present (possibly different) aspects of the same thing. The two entities may or may not overlap in time.

Semantics 34 (alternate-semantics)

W,ρalternateOf(e1,e2) holds if and only if:

  1. [WF] Bothe1 ande2 are entity identifiers, denotingent1=ρ(e1) andent2=ρ(e2).
  2. The two objects refer to the same underlying Thing:thingOf(ent1)=thingOf(ent2)

4.4.14 Membership

ThehadMember relation relates a collection to an element of the collection.

Semantics 35 (membership-semantics)

W,ρhadMember(c,e) holds if and only if:

  1. [WF] Bothe1 ande2 are entity identifiers, denotingcoll=ρ(c)Collections andent=ρ(e)Entities.
  2. The entityent is a member of the collectioncoll: that is,entmembers(coll).

4.5Semantics of Auxiliary Formulas

In this section, we define the semantics of additional formulas concerning ordering, null values, and typing. These are used in the logical versions of constraints.

4.5.1Precedes and Strictly Precedes

The precedes relationx precedes y holds between two events, one taking place before (or simultaneously with) another. Its meaning is defined in terms of the quasiordering on events specified by. The semantics of strictly precedes (x strictlyPrecedes y) is similar, onlyx must take place strictly beforey. It is interpreted as, which we recall is defined from asxyxy and y⪯̸x.

Semantics 36 (precedes-semantics)
  1. W,ρx precedes y holds if and only ifρ(x),ρ(y)Events andρ(x)ρ(y).
  2. W,ρx strictlyPrecedes y holds if and only ifρ(x),ρ(y)Events andρ(x)ρ(y).

The ordering of time values associated to events is unrelated to the event ordering. For example:

entity(e)activity(a1)activity(a2)wasGeneratedBy(gen1; e, a1, 2011-11-16T16:05:00)wasGeneratedBy(gen2; e, a2, 2012-11-16T16:05:00) //different date

This instance is valid, and must satisfy precedence constraintsgen1 precedes gen2 andgen2 precedes gen1, but this does not imply anything about the relative orderings of the associated times, or vice versa.

4.5.2notNull

ThenotNull(x) formula is used to specify that a value may not be the null value. The symbol "" always denotes the null value (i.e.ρ()=).

Semantics 37 (notNull-semantics)

W,ρnotNull(e) holds if and only ifρ(e).

4.5.3typeOf

The typing formulatypeOf(x,t) constrains the type of the value ofx.

Semantics 38 (typeOf-semantics)
  1. W,ρtypeOf(e,entity) holds if and only ifρ(e)Entities.
  2. W,ρtypeOf(a,activity) holds if and only ifρ(a)Activities.
  3. W,ρtypeOf(ag,agent) holds if and only ifρ(ag)Agents.
  4. W,ρtypeOf(c,Collection) holds if and only ifρ(c)Collections.
  5. W,ρtypeOf(c,EmptyCollection) holds if and only ifρ(c)Collections andmembers(ρ(c)=.

5. Inferences and Constraints

In this section we restate all of the inferences and constraints of PROV-CONSTRAINTS in terms of first-order logic. For each, we give a proof sketch showing why the inference or constraint is sound for reasoning about the semantics. We exclude the definitional rules in PROV-CONSTRAINTS because they are only needed for expanding the abbreviated forms of PROV-N statements to the logical formulas used here.

5.1Inferences

Inference 5 (communication-generation-use-inference)
id,a2,a1,attrs. wasInformedBy(id,a2,a1,attrs)e,gen,t1,use,t2. wasGeneratedBy(gen,e,a1,t1,[])used(use,a2,e,t2,[])

This follows immediately from the semantics ofwasInformedBy.

Inference 6 (generation-use-communication-inference)
gen,e,a1,t1,attrs1,use,a2,t2,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)used(use,a2,e,t2,attrs2)id. wasInformedBy(id,a2,a1,[])

This follows from the semantics ofwasInformedBy andAxiom 1.

Inference 7 (entity-generation-invalidation-inference)
e,attrs. entity(e,attrs)gen,a1,t1,inv,a2,t2. wasGeneratedBy(gen,e,a1,t1,[])wasInvalidatedBy(inv,e,a2,t2,[])

This follows fromAxiom 2, which requires that generation and invalidation events exist for each entity.

Inference 8 (activity-start-end-inference)
a,t1,t2,attrs. activity(a,t1,t2,attrs)start,e1,a1,end,a2,e2. wasStartedBy(start,a,e1,a1,t1,[])wasEndedBy(end,a,e2,a2,t2,[])

This follows from the semantics of activity formulas, specifically the requirement that start and end events exist for the activity.

Inference 9 (wasStartedBy-inference)
id,a,e1,a1,t,attrs. wasStartedBy(id,a,e1,a1,t,attrs)gen,t1. wasGeneratedBy(gen,e1,a1,t1,[])

This follows fromAxiom 3.

Inference 10 (wasEndedBy-inference)
id,a,e1,a1,t,attrs. wasEndedBy(id,a,e1,a1,t,attrs)gen,t1. wasGeneratedBy(gen,e1,a1,t1,[])

This follows fromAxiom 4.

Inference 11 (derivation-generation-use-inference)
id,e2,e1,a,gen2,use1,attrs. notNull(a)notNull(gen2)notNull(use1)wasDerivedFrom(id,e2,e1,a,gen2,use1,attrs)t1,t2. used(use1,a,e1,t1,[])wasGeneratedBy(gen2,e2,a,t2,[])

This follows from the semantics of precise derivation steps.

Inference 12 (revision-is-alternate-inference)
id,e1,e2,a,g,u. wasDerivedFrom(id,e2,e1,a,g,u,[prov:type=prov:Revision]))alternateOf(e2,e1)

This follows from the semantics of derivation steps (precise or imprecise) andAxiom 5.

Inference 13 (attribution-inference)
att,e,ag,attrs. wasAttributedTo(att,e,ag,attrs)a,t,gen,assoc,pl. wasGeneratedBy(gen,e,a,t,[])wasAssociatedWith(assoc,a,ag,pl,[])

This follows from the semantics of generation, association, and attribution, byAxiom 6.

Inference 14 (delegation-inference)
id,ag1,ag2,a,attrs. actedOnBehalfOf(id,ag1,ag2,a,attrs)id1,pl1,id2,pl2. wasAssociatedWith(id1,a,ag1,pl1,[])wasAssociatedWith(id2,a,ag2,pl2,[])

This follows from the semantics of association and delegation, byAxiom 7.

Inference 15 (influence-inference)

  1. id,e,a,t,attrs. wasGeneratedBy(id,e,a,t,attrs)wasInfluencedBy(id,e,a,attrs)

  2. id,a,e,t,attrs. used(id,a,e,t,attrs)wasInfluencedBy(id,a,e,attrs)

  3. id,a2,a1,attrs. wasInformedBy(id,a2,a1,attrs)wasInfluencedBy(id,a2,a1,attrs)

  4. id,a2,e,a1,t,attrs. wasStartedBy(id,a2,e,a1,t,attrs)wasInfluencedBy(id,a2,e,attrs)

  5. id,a2,e,a1,t,attrs. wasEndedBy(id,a2,e,a1,t,attrs)wasInfluencedBy(id,a2,e,attrs)

  6. id,e,a,t,attrs. wasInvalidatedBy(id,e,a,t,attrs)wasInfluencedBy(id,e,a,attrs)

  7. id,e2,e1,a,g,u,attrs. wasDerivedFrom(id,e2,e1,a,g,u,attrs)wasInfluencedBy(id,e2,e1,attrs)

  8. id,e,ag,attrs. wasAttributedTo(id,e,ag,attrs)wasInfluencedBy(id,e,ag,attrs)

  9. id,a,ag,pl,attrs. wasAssociatedWith(id,a,ag,pl,attrs)wasInfluencedBy(id,a,ag,attrs)

  10. id,ag2,ag1,a,attrs. actedOnBehalfOf(id,ag2,ag1,a,attrs)wasInfluencedBy(id,ag2,ag1,attrs)

This follows viaAxioms 8 through17.

Inference 16 (alternate-reflexive)
e. entity(e)alternateOf(e,e)

Supposeent=ρ(e). ClearlyentEntities andthingOf(ent)=thingOf(ent), soW,ρalternateOf(e,e).

Inference 17 (alternate-transitive)
e1,e2,e3. alternateOf(e1,e2)alternateOf(e2,e3)alternateOf(e1,e3)

Supposeent1=ρ(e1) andent2=ρ(e2) andent3=ρ(e3). Then by assumptionent1,ent2, andent3 are inEntities andthingOf(e1)=thingOf(e2) andthingOf(e2)=thingOf(e3), sothingOf(e1)=thingOf(e3), as required to concludeW,ρalternateOf(e1,e3).

Inference 18 (alternate-symmetric)
e1,e2. alternateOf(e1,e2)alternateOf(e2,e1)

Supposeent1=ρ(e1) andent2=ρ(e2). Then by assumption bothent1 andent2 are inEntities andthingOf(e1)=thingOf(e2), as required to concludeW,ρalternateOf(e2,e1).

Inference 19 (specialization-transitive)
e1,e2,e3. specializationOf(e1,e2)specializationOf(e2,e3)specializationOf(e1,e3)

Suppose the conditions for specialization hold ofent1 andent2 and forent2 andent3, whereent1=ρ(e1) andent2=ρ(e2) andent3=ρ(e3). Thenevents(e1)events(e2)events(e3). Moreover,value(obj2,attr)value(obj3,attr), and similarlyvalue(obj1,attr)value(obj2,attr) sovalue(obj1,attr)value(obj3,attr). Finally, at least one of the inclusions betweenobj1 andobj2 is strict, so the same is the case forobj1 andobj3.

Inference 20 (specialization-alternate-inference)
e1,e2. specializationOf(e1,e2)alternateOf(e1,e2)

Ifent1=ρ(e1) andent2=ρ(e2) are specializations, thenthingOf(ent1)=thingOf(ent2).

Inference 21 (specialization-attributes-inference)
e1,attrs,e2. entity(e1,attrs)specializationOf(e2,e1)entity(e2,attrs)

Supposeent1=ρ(e1) andent2=ρ(e2). Suppose(att,v) is an attribute-value pair inattrs. Sinceentity(e1,attrs) holds, we know thatvvalue(ent1,att). Thusvvalue(ent2,att) sincevalue(ent2,att)value(ent1,att). Since this is the case for all attribute-value pairs inattrs, and sincee2 obviously denotes an entity, we can concludeW,ρentity(e2,attrs).

5.2Constraints

5.2.1Uniqueness constraints

Constraint 22 (key-object)

  1. id,attrs1,attrs2.entity(id,attrs1)entity(id,attrs2)entity(id,attrs1attrs2)

  2. id,t1,t1,t2,t2,attrs1,attrs2. activity(id,t1,t2,attrs1)activity(id,t1,t2,attrs2)activity(id,t1,t2,attrs1attrs2)t1=t1t2=t2

  3. id,attrs1,attrs2.agent(id,attrs1)agent(id,attrs2)agent(id,attrs1attrs2).

These properties follow immediately from the definitions of the semantics of the respective assertions, because functions are used for the underlying data.

Constraint 23 (key-properties)

  1. id,e,e,a,a,t,t,attrs1,attrs2. wasGeneratedBy(id,e,a,t,attrs)wasGeneratedBy(id,e,a,t,attrs2)wasGeneratedBy(id,e,a,t,attrs1attrs2)e=ea=at=t

  2. id,e,e,a,a,t,t,attrs1,attrs2. used(id,a,e,t,attrs)used(id,a,e,t,attrs2)used(id,a,e,t,attrs1attrs2)e=ea=at=t

  3. id,a1,a2,a1,a2,attrs1,attrs2. wasInformedBy(id,a1,a2,attrs)wasInformedBy(id,a1,a2,attrs2)wasInformedBy(id,a1,a2,attrs1attrs2)a1=a1a2=a2

  4. id,e,ea1,a2,a1,a2,t,t,attrs1,attrs2. wasStartedBy(id,a2,e,a1,t,attrs1)wasStartedBy(id,a2,e,a1,t,attrs2)wasStartedBy(id,a2,e,a1,t,attrs1attrs2)a1=a1e=ea2=a2t=t

  5. id,e,ea1,a2,a1,a2,t,t,attrs1,attrs2. wasEndedBy(id,a2,e,a1,t,attrs1)wasEndedBy(id,a2,e,a1,t,attrs2)wasEndedBy(id,a2,e,a1,t,attrs1attrs2)a1=a1e=ea2=a2t=t

  6. id,e,e,a,a,t,t,attrs1,attrs2. wasInvalidatedBy(id,e,a,t,attrs1)wasInvalidatedBy(id,e,a,t,attrs2)wasInvalidatedBy(id,e,a,t,attrs1attrs2)e=ea=at=t

  7. id,e1,e1,e2,e2,a,a,g,g,u,u,attrs1,attrs2. wasDerivedFrom(id,e2,e1,a,g2,u1,attrs1)wasDerivedFrom(id,e2,e1,a,g2,u1,attrs2)wasDerivedFrom(id,e2,e1,a,g2,u1,attrs1attrs2)e1=e1e2=e2a=ag=gu=u

  8. id,e,e,ag,ag,attrs1,attrs2. wasAttributedTo(id,e,ag,attrs1)wasAttributedTo(id,e,ag,attrs2)wasAttributedTo(id,e,ag,attrs1attrs2)e=eag=ag

  9. id,a,a,ag,ag,pl,pl,attrs1,attrs2. wasAssociatedWith(id,a,ag,pl,attrs1)wasAssociatedWith(id,a,ag,pl,attrs2)wasAssociatedWith(id,a,ag,pl,attrs1attrs2)a=aag=agpl=pl

  10. id,ag1,ag1,ag2,ag2,a,a,attrs1,attrs2. actedOnBehalfOf(id,ag2,ag1,a,attrs1)actedOnBehalfOf(id,ag2,ag1,a,attrs2)actedOnBehalfOf(id,ag2,ag1,a,attrs1attrs2)ag1=ag1ag2=ag2a=a

  11. id,o1,o2,o1,o2,attrs1,attrs2. wasInfluencedBy(id,o2,o1,attrs1)wasInfluencedBy(id,o2,o1,attrs2)wasInfluencedBy(id,o2,o1,attrs1attrs2)o1=o1o2=o2

These properties follow immediately from the definitions of the semantics of the respective assertions, again because functions are used for the underlying data.

Constraint 24 (unique-generation)
gen1,gen2,e,a,t1,t2,attrs1,attrs2. wasGeneratedBy(gen1,e,a,t1,attrs1)wasGeneratedBy(gen2,e,a,t2,attrs2)gen1=gen2

This follows fromAxiom 18.

Constraint 25 (unique-invalidation)
inv1,inv2,e,a,t1,t2,attrs1,attrs2. wasInvalidatedBy(inv1,e,a,t1,attrs1)wasInvalidatedBy(inv2,e,a,t2,attrs2)inv1=inv2

This follows fromAxiom 19.

Constraint 26 (unique-wasStartedBy)
start1,start2,a,e1,e2,a0,t1,t2,attrs1,attrs2. wasStartedBy(start1,a,e1,a0,t1,attrs1)wasStartedBy(start2,a,e2,a0,t2,attrs2)start1=start2

This follows fromAxiom 20.

Constraint 27 (unique-wasEndedBy)
end1,end2,a,e1,e2,a0,t1,t2,attrs1,attrs2. wasEndedBy(end1,a,e1,a0,t1,attrs1)wasEndedBy(end2,a,e2,a0,t2,attrs2)end1=end2

This follows fromAxiom 21.

Constraint 28 (unique-startTime)
start,a1,a2,t,t1,t2,e,attrs,attrs1. activity(a2,t1,t2,attrs)wasStartedBy(start,a2,e,a1,t,attrs1)t1=t

This follows from the semantics ofwasStartedBy, since the start times must both match that of the activity.

Constraint 29 (unique-endTime)
end,a1,a2,t,t1,t2,e,attrs,attrs1. activity(a2,t1,t2,attrs)wasEndedBy(end,a2,e,a1,t,attrs1)t2=t

This follows from the semantics ofwasEndedBy, since the end times must both match that of the activity.

5.2.2Ordering constraints

Constraint 30 (start-precedes-end)
start,end,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)wasEndedBy(end,a,e2,a2,t2,attrs2)start precedes end

This follows fromAxiom 22.

Constraint 31 (start-start-ordering)
start1,start2,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start1,a,e1,a1,t1,attrs1)wasStartedBy(start2,a,e2,a2,t2,attrs2)start1 precedes start2

This follows fromAxiom 22.

Constraint 32 (end-end-ordering)
end1,end2,a,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasEndedBy(end1,a,e1,a1,t1,attrs1)wasEndedBy(end2,a,e2,a2,t2,attrs2)end1 precedes end2

This follows fromAxiom 23.

Constraint 33 (usage-within-activity)

  1. start,use,a,e1,e2,a1,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)used(use,a,e2,t2,attrs2)start precedes use

  2. use,end,a,e1,e2,a2,t1,t2,attrs1,attrs2. used(use,a,e1,t1,attrs1)wasEndedBy(end,a,e2,a2,t2,attrs2)use precedes end

Part 1 follows fromAxiom 22 and part 2 follows fromAxiom 23.

Constraint 34 (generation-within-activity)

  1. start,gen,e1,e2,a,a1,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e1,a1,t1,attrs1)wasGeneratedBy(gen,e2,a,t2,attrs2)start precedes gen

  2. gen,end,e,e1,a,a1,t,t1,attrs,attrs1. wasGeneratedBy(gen,e,a,t,attrs)wasEndedBy(end,a,e1,a1,t1,attrs1)gen precedes end

Part 1 follows fromAxiom 22 and part 2 follows fromAxiom 23.

Constraint 35 (wasInformedBy-ordering)
id,start,end,a1,a1,a2,a2,e1,e2,t1,t2,attrs,attrs1,attrs2. wasInformedBy(id,a2,a1,attrs)wasStartedBy(start,a1,e1,a1,t1,attrs1)wasEndedBy(end,a2,e2,a2,t2,attrs2)start precedes end

This follows from the semantics ofwasInformedBy,Axiom 24, and the previous two constraints, becausewasInformedBy implies the existence of intermediate generation and usage events linkinga1 anda2 through an entitye. The generation ofe must precede its use.

Constraint 36 (generation-precedes-invalidation)
gen,inv,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)gen precedes inv

This follows fromAxiom 24.

Constraint 37 (generation-precedes-usage)
gen,use,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)used(use,a2,e,t2,attrs2)gen precedes use

This follows fromAxiom 24.

Constraint 38 (usage-precedes-invalidation)
use,inv,a1,a2,e,t1,t2,attrs1,attrs2. used(use,a1,e,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)use precedes inv

This follows fromAxiom 25.

Constraint 39 (generation-generation-ordering)
gen1,gen2,e,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen1,e,a1,t1,attrs1)wasGeneratedBy(gen2,e,a2,t2,attrs2)gen1 precedes gen2

This follows fromAxiom 24.

Constraint 40 (invalidation-invalidation-ordering)
inv1,inv2,e,a1,a2,t1,t2,attrs1,attrs2. wasInvalidatedBy(inv1,e,a1,t1,attrs1)wasInvalidatedBy(inv2,e,a2,t2,attrs2)inv1 precedes inv2

This follows fromAxiom 25.

Constraint 41 (derivation-usage-generation-ordering)
d,e1,e2,a,gen2,use1,attrs. notNull(a)notNull(gen2)notNull(use1)wasDerivedFrom(d,e2,e1,a,gen2,use1,attrs)use1 precedes gen2

This follows fromAxiom 26.

Constraint 42 (derivation-generation-generation-ordering)
d,gen1,gen2,e1,e2,a,a1,a2,g,u,t1,t2,attrs,attrs1,attrs2. wasDerivedFrom(d,e2,e1,a,g,u,attrs)wasGeneratedBy(gen1,e1,a1,t1,attrs1)wasGeneratedBy(gen2,e2,a2,t2,attrs2)gen1 strictlyPrecedes gen2

This follows fromAxiom 27.

Constraint 43 (wasStartedBy-ordering)

  1. gen,start,e,a,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)wasStartedBy(start,a,e,a2,t2,attrs2)gen precedes start

  2. start,inv,e,a,a1,a2,t1,t2,attrs1,attrs2. wasStartedBy(start,a,e,a1,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)start precedes inv

Part 1 follows fromAxiom 24. Part 2 follows fromAxiom 25.

Constraint 44 (wasEndedBy-ordering)

  1. gen,end,e,a,a1,a2,t1,t2,attrs1,attrs2. wasGeneratedBy(gen,e,a1,t1,attrs1)wasEndedBy(end,a,e,a2,t2,attrs2)gen precedes end

  2. end,inv,e,a,a1,a2,t1,t2,attrs1,attrs2. wasEndedBy(end,a,e,a1,t1,attrs1)wasInvalidatedBy(inv,e,a2,t2,attrs2)end precedes inv

Part 1 follows fromAxiom 24. Part 2 follows fromAxiom 25.

Constraint 45 (specialization-generation-ordering)
gen1,gen2,e1,e2,a1,a2,t1,t2,attrs1,attrs2. specializationOf(e2,e1)wasGeneratedBy(gen1,e1,a1,t1,attrs1)wasGeneratedBy(gen2,e2,a2,t2,attrs2)gen1 precedes gen2

This follows fromAxiom 24 and the fact that ife2 specializese1 then all of the events ofe2 are events ofe1. Thus, the generation ofe1 precedes all events ofe2.

Constraint 46 (specialization-invalidation-ordering)
inv1,inv2,e1,e2,a1,a2,t1,t2,attrs1,attrs2. specializationOf(e1,e2)wasInvalidatedBy(inv1,e1,a1,t1,attrs1)wasInvalidatedBy(inv2,e2,a2,t2,attrs2)inv1 precedes inv2

This follows fromAxiom 25 and the fact that ife2 specializese1 then all of the events ofe2 are events ofe1. Thus, the invalidation ofe1 follows all events ofe2.

Constraint 47 (wasAssociatedWith-ordering)

In the following inferences,pl may be a placeholder -.


  1. assoc,start1,inv2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasStartedBy(start1,a,e1,a1,t1,attrs1)wasInvalidatedBy(inv2,ag,a2,t2,attrs2)start1 precedes inv2

  2. assoc,gen1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasGeneratedBy(gen1,ag,a1,t1,attrs1)wasEndedBy(end2,a,e2,a2,t2,attrs2)gen1 precedes end2

  3. assoc,start1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasStartedBy(start1,a,e1,a1,t1,attrs1)wasEndedBy(end2,ag,e2,a2,t2,attrs2)start1 precedes end2

  4. assoc,start1,end2,ag,e1,e2,a1,a2,t1,t2,attrs1,attrs2. wasAssociatedWith(assoc,a,ag,pl,attrs)wasStartedBy(start1,ag,e1,a1,t1,attrs1)wasEndedBy(end2,a,e2,a2,t2,attrs2)start1 precedes end2

The four parts follow fromAxiom 28 throughAxiom 31 respectively.

Constraint 48 (wasAttributedTo-ordering)

  1. att,gen1,gen2,e,a1,a2,t1,t2,ag,attrs,attrs1,attrs2. wasAttributedTo(att,e,ag,attrs)wasGeneratedBy(gen1,ag,a1,t1,attrs1)wasGeneratedBy(gen2,e,a2,t2,attrs2)gen1 precedes gen2

  2. att,start1,gen2,e,e1,a1,a2,ag,t1,t2,attrs,attrs1,attrs2. wasAttributedTo(att,e,ag,attrs)wasStartedBy(start1,ag,e1,a1,t1,attrs1)wasGeneratedBy(gen2,e,a2,t2,attrs2)start1 precedes gen2

These properties follow fromAxiom 32 andAxiom 33.

Constraint 49 (actedOnBehalfOf-ordering)

  1. del,gen1,inv2,ag1,ag2,a,a1,a2,t1,t2,attrs,attrs1,attrs2. actedOnBehalfOf(del,ag2,ag1,a,attrs)wasGeneratedBy(gen1,ag1,a1,t1,attrs1)wasInvalidatedBy(inv2,ag2,a2,t2,attrs2)gen1 precedes inv2

  2. del,start1,end2,ag1,ag2,a,a1,a2,e1,e2,t1,t2,attrs,attrs1,attrs2. actedOnBehalfOf(del,ag2,ag1,a,attrs)wasStartedBy(start1,ag1,e1,a1,t1,attrs1)wasEndedBy(end2,ag2,e2,a2,t2,attrs2)start1 precedes end2

These properties follow fromAxiom 34 andAxiom 35.

5.2.3Typing constraints

Constraint 50 (typing)

  1. e,attrs. entity(e,attrs)typeOf(e,entity)

  2. ag,attrs. agent(ag,attrs)typeOf(ag,agent)

  3. a,t1,t2,attrs. activity(a,t1,t2,attrs)typeOf(a,activity)

  4. u,a,e,t,attrs. used(u,a,e,t,attrs)typeOf(a,activity)typeOf(e,entity)

  5. g,a,e,t,attrs. wasGeneratedBy(g,e,a,t,attrs)typeOf(a,activity)typeOf(e,entity)

  6. inf,a2,a1,t,attrs. wasInformedBy(inf,a2,a1,t,attrs)typeOf(a1,activity)typeOf(a2,activity)

  7. start,a2,e,a1,t,attrs. wasStartedBy(start,a2,e,a1,t,attrs)typeOf(a1,activity)typeOf(a2,activity)typeOf(e,entity)

  8. end,a2,e,a1,t,attrs. wasEndedBy(end,a2,e,a1,t,attrs)typeOf(a1,activity)typeOf(a2,activity)typeOf(e,entity)

  9. inv,a,e,t,attrs. wasInvalidatedBy(inv,e,a,t,attrs)typeOf(a,activity)typeOf(e,entity)

  10. id,e2,e1,a,g2,u1,attrs. notNull(a)notNull(g2)notNull(u1)wasDerivedFrom(id,e2,e1,a,g2,u1,attrs)typeOf(e2,entity)typeOf(e1,activity)typeOf(a,activity)

  11. id,e2,e1,attrs. wasDerivedFrom(id,e2,e1,,,,attrs)typeOf(e2,entity)typeOf(e1,activity)

  12. id,e,ag,attrs. wasAttributedTo(id,e,ag,attrs)typeOf(e,entity)typeOf(ag,agent)

  13. id,a,ag,pl,attrs. notNull(pl)wasAssociatedWith(id,a,ag,pl,attrs)typeOf(a,activity)typeOf(ag,agent)typeOf(pl,entity)

  14. id,a,ag,attrs. wasAssociatedWith(id,a,ag,,attrs)typeOf(a,activity)typeOf(ag,agent)

  15. id,ag2,ag1,a,attrs. actedOnBehalfOf(id,ag2,ag1,a,attrs)typeOf(ag2,agent)typeOf(ag1,agent)typeOf(a,activity)

  16. e2,e1. alternateOf(e2,e1)typeOf(e2,entity)typeOf(e1,entity)

  17. e2,e1. specializationOf(e2,e1)typeOf(e2,entity)typeOf(e1,entity)

  18. c,e. hadMember(c,e)typeOf(c,Collection)typeOf(e,entity)

  19. c. entity(c,[prov:type=prov:emptyCollection]))typeOf(c,entity)typeOf(c,Collection)typeOf(c,EmptyCollection)

Each typing constraint follows immediately from well-formedness criteria marked [WF] in the corresponding semantics for formulas. The final constraint requiresAxiom 36.

5.2.4Impossibility constraints

Constraint 51 (impossible-unspecified-derivation-generation-use)

  1. id,e1,e2,g,attrs. notNull(g)wasDerivedFrom(id,e2,e1,,g,,attrs)False

  2. id,e1,e2,u,attrs. notNull(u)wasDerivedFrom(id,e2,e1,,,u,attrs)False

  3. id,e1,e2,g,u,attrs. notNull(g)notNull(u)wasDerivedFrom(id,e2,e1,,g,u,attrs)False

Each part follows from the fact that the semantics ofwasDerivedFrom only allows formulas to hold when either all three ofa,g,u are "" (denoting) or none of them are.

Constraint 52 (impossible-specialization-reflexive)
e. specializationOf(e,e)False

This follows from the fact that in the semantics ofspecializationOf, the requirement that one of the inclusions is strict implies that the two entities cannot be the same.

Constraint 53 (impossible-property-overlap)

For eachr ands{used,wasGeneratedBy,wasInvalidatedBy,wasStartedBy,wasEndedBy,wasInformedBy,wasAttributedTo,wasAssociatedWith,actedOnBehalfOf} such thatr ands are different relation names, the following constraint holds:

id,a1,,am,b1,,bn. r(id,a1,,am)s(id,b1,,bn)False

This follows from the assumption that the different kinds of influences are disjoint sets, characterized by their types. Note that generic influences are allowed to overlap with more specific kinds of influence.

Constraint 54 (impossible-object-property-overlap)

For eachp{entity,activity,agent} and eachr{used,wasGeneratedBy,wasInvalidatedBy,wasStartedBy,wasEndedBy,wasInformedBy,wasAttributedTo,wasAssociatedWith,actedOnBehalfOf,wasInfluencedBy}, the following constraint holds:

id,a1,,am,b1,,bn. p(id,a1,,am)r(id,b1,,bn)False

This follows from the assumption that influences are distinct from other objects (entities, activities or agents).

Constraint 55 (entity-activity-disjoint)
id. typeOf(id,entity)typeOf(id,activity)False

This follows from the assumption that entities and activities are disjoint.

Constraint 56 (membership-empty-collection)
c,e. hasMember(c,e)typeOf(c,EmptyCollection)False

This follows from the definition of the semantics oftypeOf(c,EmptyCollection), which requires that there are no members of the collection denoted byc.

6.Soundness and Completeness

Above we have presented arguments for the soundness of theconstraints and inferences with respect to the semantics.Here, we relate the notions ofvalidity andnormalform defined in PROV-CONSTRAINTS to the semantics.

6.1Soundness

Our main soundness result is:

Theorem 39 (soundness-theorem)

LetW be a PROV structure, that is, a structure providing all of the components above and satisfying all of the axioms.

  1. IfI is an instance andWI andI is obtained fromI by applying one of the PROV inferences, thenWI.
  2. IfI is an instance andWI andI is obtained fromI by applying one of the PROV key or uniqueness constraints, thenWI.
  3. IfI is an instance andWI thenI has a normal formI andWI.
  4. IfI is a normal form andWI thenI satisfies all of the ordering, typing and impossibility constraints.
  5. IfWI thenI is valid.

For part 1, the arguments are as in the previous section.

For part 2, ifWI then sinceW satisfies the logical forms of all uniqueness and key constraints, constraint application cannot fail onI andWI.

For part 3, proceed by induction on a terminating sequence of inference or uniqueness constraint steps: ifI is in normal form then we are done. IfI is not in normal form then if an inference is applicable, then use part 1; if a uniqueness constraint is applicable, then use part 2.

For part 4, the arguments are as in the previous section for each constraint.

Finally, for part 5, supposeWI. ThenWI whereI is the normal form ofI by part 2. By part 3,I satisfies all of the remaining constraints, soI is valid.

6.2Weak Completeness

In this section we give a translation from valid PROV instances to structures, and show that a valid PROV instance has a model. We call this propertyweak completeness.

The termweak refers to the fact that there are still some inferences that are sound in the semantics but not enforced by validation. For example, consider the following (valid) PROV instance fragment:

entity(e,[a=1])agent(e,[b=2])

This instance is valid and has a model, but in every model satisfying the instance, it is also true that:

entity(e,[a=1,b=2])agent(e,[a=1,b=2])

Thus, weak completeness captures the fact that every valid instance has a model, but does not imply that a valid instance satisfies all of the deductions possible in that model.

LetI be a valid PROV instance that is in normal form. We define a structureM(I) as follows, by giving the sets, functions and relations specified in the components inSection 3, and finally verifying that the axioms hold.

First, without loss of generality, we assume that all times specified in activity or event formulas inI are ground values. If not, set each variable in such a position to some dummy value. This is justified by the following fact:

Lemma 40 (time-grounding)

IfI is valid thenS(I) is valid, whereS is any substitution that maps time variables to time constants.

First, consider a substitutionS=[t:=c] that maps a single time variable to a constant. It is straightforward to check that ifI is in normal form, thenS(I) is in normal form, since none of the inferences or uniqueness constraints can be enabled by changing a time variable uniformly inI. Similarly, the remaining constraints are insensitive to the time values, soS(I) is in normal form and satisfies all of the remaining constraints just asI does. The general case of a substitution that replaces multiple time variables with constants is a straightforward generalization since we can view such a substitution as a composition of single-variable substitutions.

6.2.1Sets

The sets of structureM(I) are:

Entities={idItypeOf(id,entity)}Plans={plid,ag,ac,attrs. wasAssociatedWith(id,ag,act,pl,attrs)I,pl}Collections={cItypeOf(c,prov:Collection) or ItypeOf(c,prov:EmptyCollection)}Activities={idItypeOf(id,activity)}{aid,aididEntities}{aidid,e2,e1. wasDerivedFrom(id,e2,e1,,,,attrs)I}Agents={idItypeOf(id,agent)}Usages={ida,e,t,attrs. used(id,a,e,t,attrs)I}{uidid,e2,e1,attrs. wasDerivedFrom(id,e2,e1,,,,attrs)I}Generations={ide,a,t,attrs. wasGeneratedBy(id,e,a,t,attrs)I}{gidid,e2,e1,attrs. wasDerivedFrom(id,e2,e1,,,,attrs)I}{gididEntities}Invalidations={ide,a,t,attrs. wasInvalidatedBy(id,e,a,t,attrs)I}{iididEntities}Starts={ida,e,a,t,attrs. wasStartedBy(id,a,e,a,t,attrs)I}Ends={ida,e,a,t,attrs. wasEndedBy(id,a,e,a,t,attrs)I}Events=UsagesGenerationsInvalidationsStartsEndsAssociations={idag,act,pl,attrs. wasAssociatedWith(id,ag,act,pl,attrs)I}Attributions={ide,ag,attrs. wasAttributedTo(id,e,ag,attrs)I}Delegations={idag2,ag1,attrs. actedOnBehalfOf(id,ag2,ag1,act,attrs)I}Communications={ida2,a1,attrs. wasInformedBy(id,a2,a1,attrs)I}Derivations={ide2,e1,a,g,u,attrs. wasDerivedFrom(id,e2,e1,a,g,u,attrs)I}Influences=EventsAssociationsAttributionsCommunicationsDelegations{ido2,o1,attrs. wasInfluencedBy(id,o2,o1,attrs)I}Objects=EntitiesActivitiesAgentsInfluences

In the definitions ofEntities,Collections,Activities andAgents we use the notationItypeOf(id,t) to indicate thatid must have typet inI according to the typing constraints. For example, for entities, this means that the setEntities contains all identifierse,e appearing in theentity(e,attrs),alternateOf(e,e), orspecializationOf(e,e) formulas, as well as all tose appearing in the appropriate positions of other formulas, as specified in the typing constraints.

In the definitions ofActivities,Generations,Invalidations, andUsages we writeaid,gid,iid anduid respectively to indicate additional activities, generations and usages added for imprecise derivations or entities.

In addition, to define the set ofThings, we introduce an equivalence relation onEntities as follows:

e1e2alternateOf(e1,e2)I

The fact that this is an equivalence relation follows from the fact thatI is in normal form, since the constraints onalternateOf ensure that it is an equivalence relation. Recall that given an equivalence relation on some setX, theequivalence class ofxX is the set[x]={yXxy}. Thequotient ofX by an equivalence relation onX is the set of equivalence classes,X={[x]xX}. Now we define the set ofThings as the quotient of-equivalence classes ofEntities.

Things=Entities/={[e]eEntities}

Observe that sinceI is normalized and valid, entities andactivities are disjoint, the influences are disjoint from entities,activities, and agents, and the different subsets of events and influences are pairwisedisjoint, as required.

6.2.2Functions

First, we consider the functions associated withEntities.

events(e)={idused(id,a,e,t,attrs)I}{idwasGeneratedBy(id,e,a,t,attrs)I}{idwasInvalidatedBy(id,e,a,t,attrs)I}{idwasStartedBy(id,a,e,a,t,attrs)I}{idwasEndedBy(id,a,e,a,t,attrs)I}{ge,ie}events(e)=events(e)specializationOf(e,e)Ievents(e)value(e,a)={ventity(e,attrs)I,(a=v)attrs}(auniq)value(e,uniq)={uniqe}value(e,a)=value(e)specializationOf(e,e)Ivalue(e)thingOf(e)=[e]

Above, we introduce a fresh attribute nameuniq, not already inuse inI, along with a fresh valuee and for each entitye weadd a valueuniqe tovalues(e,uniq). Thisconstruction ensures that if an entity is a specialization of anotherinI then the specialization relationship will hold inM(I). Wealso define the set of all events involved ine as the set of eventsimmediately involved ine or any specialization ofe. Similarly,the values of attributes ofe are those immediately declared forealong with those of anye thate specializes. We also introduce dummygeneration and invalidation events for each entitye, along withactivitiesae,ae to perform them.

Similarly, forThings, weemploy an auxiliary functionevents:ThingsP(Events) that collects the set of allevents in which one of the entities constituting the thing participated.

events(T)=eTevents(e)value(T,a,evt)=eT,evtevents(e)value(e,a)

The functionsevents,startTime andendTime mapping activities to their start and end times are defined as follows:

events(a)={idused(id,a,e,t,attrs)I}{idwasGeneratedBy(id,e,a,t,attrs)I}{idwasInvalidatedBy(id,e,a,t,attrs)I}{idwasStartedBy(id,a,e,a,t,attrs)I}{idwasEndedBy(id,a,e,a,t,attrs)I}{ge,ie}startTime(id)=t1 where activity(a,t1,t2,attrs)IendTime(id)=t2 where activity(a,t1,t2,attrs)I

The start and end times are arbitrary (say, some zero value) for activities with noactivity formula declaring the times. The above definitions ofstartTime andendTime ignore any start times asserted inwasStartedBy orwasEndedBy formulas. If bothactivity andwasStartedBy/wasEndedBy statements are present, then they must match, but PROV-CONSTRAINTS does not require that the times of multiple start or end events match for an activity with noactivity statement.

The following valid instance exemplifies the above discussion, whent1t2:

wasStartedBy(id1;a,e1,a1,t1,[])wasStartedBy(id2;a,e2,a2,t2,[])

This instance becomes invalid if we add anactivity(a,[]) statement, because it expands toactivity(a,T1,T2,[]) whereT1,T2 are existential variables, and uniqueness constraints require thatt1=T1=t2, which leads to uniqueness constraint failure.

For otherObjects besidesEntities andActivities, theassociated sets ofEvents are defined to be empty. (AnAgent thathappens to be anEntity orActivity will have the set of eventsdefined above for the appropriate kind of object. Note that sinceEntities andActivities are disjoint, this definition is unambiguous.)

The functiontime mappingEvents to theirTimes is defined as follows:

time(id)=t where used(id,a,e,t,attrs)Itime(id)=t where wasGeneratedBy(id,e,a,t,attrs)Itime(id)=t where wasInvalidatedBy(id,e,a,t,attrs)Itime(id)=t where wasStartedBy(id,a,e,a,t,attrs)Itime(id)=t where wasEndedBy(id,a,e,a,t,attrs)I

This definition is deterministic because the sets of identifiers of differentEvents are disjoint, and the associated times are unique.

The functions giving the interpretations of the different identified influences are as follows:

used(id)=(a,e) where used(id,a,e,t,attrs)Iused(uid)=(aid,e1) where wasDerivedFrom(id,e2,e1,,,,attrs)Igenerated(id)=(e,a) where wasGeneratedBy(id,e,a,t,attrs)Igenerated(gid)=(e2,aid) where wasDerivedFrom(id,e2,e1,,,,attrs)Igenerated(ge)=(e,ae) where eEntitiesinvalidated(id)=(e,a) where wasInvalidatedBy(id,e,a,t,attrs)Iinvalidated(ie)=(e,ae) where eEntitiesstarted(id)=(a,e,a) where wasStartedBy(id,a,e,a,t,attrs)Iended(id)=(a,e,a) where wasEndedBy(id,a,e,a,t,attrs)IassociatedWith(id)=(ag,act,pl) where wasAssociatedWith(id,ag,act,pl,attrs)IattributedTo(id)=(e,ag) where wasAttributedTo(id,e,ag,attrs)IactedFor(id)=(ag2,ag1,act) where actedOnBehalfOf(id,ag2,ag1,act,attrs)Icommunicated(id)=(a2,a1) where wasInformedBy(id,a2,a1,attrs)IderivationPath(id)=e2gaue1 where wasDerivedFrom(id,e2,e1,a,g,u,attrs)IderivationPath(id)=e2gidaiduide1 where wasDerivedFrom(id,e2,e1,,,,attrs)I

Note that sinceI is normalized and valid, by the uniqueness constraints these functions are all well-defined. In the case for imprecise derivations, we generate additional activities, generations and usages linkinge2 toe1.

The definition of theinfluenced function is more involved, andis as follows:

influenced(id)=used(id)generated(id)invalidated(id){(a,e)(a,e,a)started(id)}{(a,e)(a,e,a)ended(id)}{(ag,act)(ag,act,pl)associatedWith(id)}attributedTo(id){(ag2,ag1)(ag2,ag1,act)actedFor(id)}communicated(id){(e2,e1)e2we1derivationPath(id)}{(o2,o1)wasInfluencedBy(id,o2,o1)I}

This definition ensures that by constructioninfluenced(id)contains all of the other associated relationships. For any specificid, however, most of the above sets will be empty, and the finalline will often be redundant. It is not always redundant, because itis possible to assert an unspecified influence inI.

It is straightforward to verify (by their definitions) that theevent sets associated with entities and activities satisfy theside-conditions inComponent 9.

Finally, the collection membership functionmembers is defined as follows:

members(c)={ehadMember(c,e)I

6.2.3Relations

We introduced a relation corresponding toalternateOf above, in definingThings, but this relation is not a component of the semantics.

The event ordering relation is defined as follows:

evtevt(evt,evt)GI

closed under reflexivity and transitivity. Here, we are using a slight abuse of notation: we writeGI for the directed graph that is used during validation ofI to test for cycles among event ordering constraints. See Sec. 7.1 of PROV-CONSTRAINTS [PROV-CONSTRAINTS].

6.2.4Axioms

To verify that the construction ofM(I) yields a PROV structure, we must ensure that all of the axioms and side-conditions in the components are satisfied. As noted above, the disjointness constraints are satisfied by construction.

For each axiom we give the corresponding justification:

  1. Axiom 1 follows becauseI is normalized with respect to Inference 6.
  2. Axiom 2 follows from the construction, since we add dummy generation and invalidation events for every entity.
  3. Axioms 3 and 4 follow becauseI is normalized with respect to Inference 9 and 10 respectively.
  4. Axiom 5 follows becauseI is normalized with respect to Inference 12.
  5. Axioms 6 and 7 follow becauseI is normalized with respect to Inference 13 and 14 respectively.
  6. Axioms 8 through 17 follow becauseI is normalized with respect to Inference 15.
  7. Axioms 18 through 21 follow becauseI is normalized with respect to uniqueness constraints 24 through 27.
  8. Axiom 22 follows because constraints 30, 31, 33, 34 ensure that a start event for an activity precedes any other start, end, usage or generation events involving that activity.
  9. Axiom 23 follows because constraints 30, 32, 33, 34 ensure that an end event for an activity follows any other events involving that activity.
  10. Axiom 24 follows because constraints 34, 36, 37, 39 ensure that a generation event for an entity precedes any other events involving that entity.
  11. Axiom 25 follows because constraints 36, 38, 40, 43, 44 ensure that an invalidation event for an entity follows any other generation, usage, or invalidation events involving that entity.
  12. Axiom 26 follows from constraint 41.
  13. Axiom 27 follows from constraint 42 and from the fact that the event ordering constraint graphGI associated with a valid instanceI cannot have any cycles involving a strict precedence edge.
  14. Axioms 28 through 31 follow from Constraint 47.
  15. Axioms 32 and 33 follow from Constraint 48.
  16. Axioms 34 and 35 follow from Constraint 49.
  17. Axiom 36 follows from Constraint 50, part 19, and the semantics oftypeof.

6.2.5Main results

The main results of this section are that if a valid PROV instanceI has a modelMI that satisfies all of the inferences and constraints. Thus, a form of completeness holds: every valid PROV instance has a model.

Theorem 41 (weak-completeness-theorem)

SupposeJ is a valid PROV instance. Then there exists a PROV structureM such thatMJ.

First, we consider the case whereJ itself is a valid, normalized PROV instanceI, with no existential variables, and letM(I) be the corresponding structure. ThenM(I) is a PROV structure, satisfying all of the axioms (and hence all of the inferences and constraints) stated above.

Moreover,M(I)I, as can be verified on a case-by-case basis for each type of formula by considering its semantics and the definition of the construction ofM. Most cases are straightforward; we consider the cases ofalternateOf andspecializationOf since they are among the most interesting.

  • SupposealternateOf(e1,e2)I. We wish to show thatM(I)alternateOf(e1,e2). Since there are no existential variables inI, we know thate1,e2M(I).Entities. Moreover,e1e2 according to the equivalence relation defined above, and sothingOf(e1)=[e1]=[e2]=thingOf(e2), so we can conclude thatM(I)alternateOf(e1,e2).
  • SupposespecializationOf(e1,e2)I. We wish to show thatM(I)specializationOf(e1,e2). Again, clearlye1,e2Entities, and sinceI satisfies all inferences, we know thatalternateOf(e1,e2)I so clearlythingOf(e2)=thingOf(e1) as argued above. Next,
    events(e1)=events(e1)specializationOf(e,e1)Ievents(e)events(e2)specializationOf(e,e2)Ievents(e2)=events(e2)
    becausespecializationOf(e1,e2)I and alle that are specializations ofe1 are also specializations ofe2. Furthermore, for eachattr,
    value(e1,attr)=value(e1,attr)specializationOf(e1,e)Ivalue(e,attr)value(e2,attr)specializationOf(e2,e)Ivalue(e,attr)=value(e2,attr)
    for the same reason. Finally, by constructionuniqe1value(e1,uniq) anduniqe1value(e2,uniq) so the inclusion is strict for the special attributeuniq. Thus, we have verified all of the conditions necessary to concludeM(I)specializationOf(e1,e2).

Next, we show how to handle a normalized, validI containsexistential variablesx1,,xn. Choose fresh constantsc1,,cn ofappropriate types for the existential variables and defineρ(xi)=ci. ThenM(ρ(I))ρ(I) by the above argument.Moreover,M(ρ(I)),ρI. SoM(ρ(I)) is itself the desiredmodel.

Finally, to handle the case whereJ is an arbitrary valid instance, we need to show that ifJ is not in normal form, and normalizes to someI such thatMI, thenMJ. We can prove this by induction on the length of the sequence of normalization steps. The base case, whenJ=I, is established already. SupposeJ normalizes inn+1 steps and we can perform one normalization step on it to obtainJ, which normalizes toI inn steps. By induction, we know thatMJ. For each possible normalization step, we must show that ifMJ thenMJ.

First consider inference steps. These add information, that is,JJ. Hence it is immediate thatMJ since every formula inJ is inJ, and all formulas ofJ are satisfied inM.

Next consider uniqueness constraint steps, which may involve merging formulas. That is,J=J0{r(id,a1,,an,attrs1),r(id,b1,,bn,attrs2)} andJ=S(J0){r(id,S(a1),,S(an),attrs1attrs2)}, whereS is a unifying substitution makingS(ai)=S(bi) for eachi{1,,n}. SinceMJ, we must haveM,ρJ for someρ, and therefore we must also have thatM,ρS(J0) andM,ρr(id,S(a1),,S(an),attrs1attrs2). We can extendρ to a valuationρ such thatM,ρS(x1)=x1S(xk)=xk wheredom(S)={x1,,xk}. Also,M,ρJ0 andM,ρr(id,a1,,an,attrs1attrs2). Moreover, sinceS is a unifier, we also haveM,ρr(id,b1,,bn,attrs1attrs2). Finally, since we can always remove attributes from an atomic formula without damaging its satisfiability, we can conclude thatM,ρr(id,a1,,an,attrs1)r(id,b1,,bn,attrs2). To conclude, we have shownMJ0{r(id,a1,,an,attrs1),r(id,b1,,bn,attrs2)}, that is,MJ, as desired.

A.Acknowledgements

This document has been produced by the PROV Working Group, and its contents reflect extensive discussion within the Working Group as a whole as well as feedback and comments from external reviewers. Thanks specifically to Khalid Belhajjame, Tom De Nies, Paolo Missier, Simon Miles, Luc Moreau, Satya Sahoo, Jan van den Bussche, Joachim Van Herwegen, and Antoine Zimmermann for detailed feedback.

We would also like to acknowledgeSchloss Dagstuhl - Leibniz Center for Informatics, because significant progress was made on this document atDagstuhl Seminar 12091 (Principles of Provenance) that took place from February 26 to March 2, 2012.

Thanks also to Robin Berjon for the ReSPec.js specification writingtool and to MathJax for their LaTeX-to-HTML conversion tools, both of which aided in the preparation of this document.

Members of the Provenance Working Group at the time of publication of this document were:Ilkay Altintas (Invited expert),Reza B'Far (Oracle Corporation),Khalid Belhajjame (University of Manchester),James Cheney (University of Edinburgh, School of Informatics),Sam Coppens (iMinds - Ghent University),David Corsar (University of Aberdeen, Computing Science),Stephen Cresswell (The National Archives),Tom De Nies (iMinds - Ghent University),Helena Deus (DERI Galway at the National University of Ireland, Galway, Ireland),Simon Dobson (Invited expert),Martin Doerr (Foundation for Research and Technology - Hellas(FORTH)),Kai Eckert (Invited expert),Jean-Pierre EVAIN (European Broadcasting Union, EBU-UER),James Frew (Invited expert),Irini Fundulaki (Foundation for Research and Technology - Hellas(FORTH)),Daniel Garijo (Universidad Politécnica de Madrid),Yolanda Gil (Invited expert),Ryan Golden (Oracle Corporation),Paul Groth (Vrije Universiteit),Olaf Hartig (Invited expert),David Hau (National Cancer Institute, NCI),Sandro Hawke (W3C/MIT),Jörn Hees (German Research Center for Artificial Intelligence (DFKI) Gmbh),Ivan Herman, (W3C/ERCIM),Ralph Hodgson (TopQuadrant),Hook Hua (Invited expert),Trung Dong Huynh (University of Southampton),Graham Klyne (University of Oxford),Michael Lang (Revelytix, Inc.),Timothy Lebo (Rensselaer Polytechnic Institute),James McCusker (Rensselaer Polytechnic Institute),Deborah McGuinness (Rensselaer Polytechnic Institute),Simon Miles (Invited expert),Paolo Missier (School of Computing Science, Newcastle university),Luc Moreau (University of Southampton),James Myers (Rensselaer Polytechnic Institute),Vinh Nguyen (Wright State University),Edoardo Pignotti (University of Aberdeen, Computing Science),Paulo da Silva Pinheiro (Rensselaer Polytechnic Institute),Carl Reed (Open Geospatial Consortium),Adam Retter (Invited Expert),Christine Runnegar (Invited expert),Satya Sahoo (Invited expert),David Schaengold (Revelytix, Inc.),Daniel Schutzer (FSTC, Financial Services Technology Consortium),Yogesh Simmhan (Invited expert),Stian Soiland-Reyes (University of Manchester),Eric Stephan (Pacific Northwest National Laboratory),Linda Stewart (The National Archives),Ed Summers (Library of Congress),Maria Theodoridou (Foundation for Research and Technology - Hellas(FORTH)),Ted Thibodeau (OpenLink Software Inc.),Curt Tilmes (National Aeronautics and Space Administration),Craig Trim (IBM Corporation),Stephan Zednik (Rensselaer Polytechnic Institute),Jun Zhao (University of Oxford),Yuting Zhao (University of Aberdeen, Computing Science).

B.References

B.1Informative references

[PROV-AQ]
Graham Klyne; Paul Groth; eds.Provenance Access and Query. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-aq-20130430/
[PROV-CONSTRAINTS]
James Cheney; Paolo Missier; Luc Moreau; eds.Constraints of the PROV Data Model. 30 April 2013, W3C Recommendation. URL:http://www.w3.org/TR/2013/REC-prov-constraints-20130430/
[PROV-DC]
Daniel Garijo; Kai Eckert; eds.Dublin Core to PROV Mapping. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-dc-20130430/
[PROV-DICTIONARY]
Tom De Nies; Sam Coppens; eds.PROV-Dictionary: Modeling Provenance for Dictionary Data Structures. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-dictionary-20130430/
[PROV-DM]
Luc Moreau; Paolo Missier; eds.PROV-DM: The PROV Data Model. 30 April 2013, W3C Recommendation. URL:http://www.w3.org/TR/2013/REC-prov-dm-20130430/
[PROV-LINKS]
Luc Moreau; Timothy Lebo; eds.Linking Across Provenance Bundles. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-links-20130430/
[PROV-N]
Luc Moreau; Paolo Missier; eds.PROV-N: The Provenance Notation. 30 April 2013, W3C Recommendation. URL:http://www.w3.org/TR/2013/REC-prov-n-20130430/
[PROV-O]
Timothy Lebo; Satya Sahoo; Deborah McGuinness; eds.PROV-O: The PROV Ontology. 30 April 2013, W3C Recommendation. URL:http://www.w3.org/TR/2013/REC-prov-o-20130430/
[PROV-OVERVIEW]
Paul Groth; Luc Moreau; eds.PROV-OVERVIEW: An Overview of the PROV Family of Documents. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-overview-20130430/
[PROV-PRIMER]
Yolanda Gil; Simon Miles; eds.PROV Model Primer. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-primer-20130430/
[PROV-XML]
Hook Hua; Curt Tilmes; Stephan Zednik; eds.PROV-XML: The PROV XML Schema. 30 April 2013, W3C Note. URL:http://www.w3.org/TR/2013/NOTE-prov-xml-20130430/

[8]ページ先頭

©2009-2025 Movatter.jp