Technical Field
The invention belongs to the field of information technology and relates in particular to a method for aligning event logs with process models based on event similarity.
Background
Many methods exist for checking the conformance between an event log and a process model. To quantify how far a log deviates from a model, Adriansyah A et al. proposed the concept of an optimal alignment and reduced the problem of finding one to a shortest-path problem: the path with the minimum cost corresponds to the optimal alignment of log and model. Depending on whether the importance of individual events is distinguished, optimal alignments are further divided into cost-based optimal alignments and move-based optimal alignments (based on the number of moves). For aligning enterprise-scale data and models, Song et al. used heuristics derived from the structure and behavior of the model to shrink the search space substantially, which improved alignment efficiency.
Alignment research targeting distributed computing and big-data environments has also appeared. Jorge M G et al. divided process models with SESE (single-entry single-exit) structure into several sub-models and aligned each separately, which significantly reduces the time needed for alignment and also makes deviations easier to diagnose. Verbeek et al. applied Petri net decomposition to turn the alignment problem between a process model represented as a Petri net and its log into a comparison between subnets of the Petri net and the corresponding projections of the log, and established a series of theorems, including the feasibility of decomposed alignment and a lower bound on the cost of pseudo-alignments.
In current work on aligning models with event logs, every event carries a single attribute, its activity. In practice, however, recorded event logs usually also contain information such as who executed the event, what object it operated on, and when and where it occurred; that is, an event may have several attributes rather than only an activity. Moreover, once attributes other than the activity are taken into account, the optimal alignment currently used to reflect the deviation between model and log is not necessarily still optimal. A new method is therefore needed to compute the deviation between model and log.
Summary of the Invention
To solve the above problem, the invention considers the multiple attributes of an event together and provides a method for aligning event logs with process models based on event similarity, which better reflects the conformance between an event log and a process model.
The technical solution adopted by the invention is a method for aligning event logs with process models based on event similarity, comprising the following steps:
First, the concepts relating to events are defined, including the process model, the trace of events, the event log and the activity mapping;
Second, the similarity between the attributes of two events is computed on the basis of an ontology tree; the computation proceeds as follows:
Step 1. Take as input an event e_σ from the trace and an event e_N from the process model; the two events must have the same activity attribute;
Step 2. Express the two events by their functional semantics FS(e_σ) and FS(e_N), then annotate each attribute in the functional semantics on the ontology tree and compute the similarity of each attribute pair;
Step 3. Compute the similarity of the two events from the similarities of their attributes;
Finally, the traces in the event log are aligned with the model on the basis of event similarity; the computation proceeds as follows:
Step 1. With reference to the standard likelihood function, define the similarity likelihood function and its costs;
Step 2. Based on the similarity likelihood function, define the alignment cost cost(γ);
Step 3. Convert the trace σ into an event net, compute the synchronous-move similarity of events that have the same activity attribute in the process model and in the event net, and construct the product net of the event net and the process model;
Step 4. Find the different paths from the initial state to the final state in the product net, and map the transitions of the product net to the moves of an alignment;
Step 5. By the definition of alignment cost under the similarity likelihood function, the path with the minimum cost is the optimal alignment γ.
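Steps 4 and 5 amount to a minimum-cost path search. The following is only a sketch under stated assumptions: the reachable states of the product net are assumed to have been enumerated already into an explicit graph, the state names `s0`, `s1`, `s1b`, `s2` and the similarity 0.5 are hypothetical, and Dijkstra's algorithm is used as one possible search strategy, not necessarily the one intended by the invention.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

# (move label, successor state, cost of the move)
Edge = Tuple[str, Hashable, float]


def optimal_alignment(graph: Dict[Hashable, List[Edge]],
                      start: Hashable,
                      final: Hashable) -> Tuple[float, List[str]]:
    """Dijkstra search for the cheapest move sequence from start to final."""
    counter = 0  # tie-breaker so the heap never has to compare states or lists
    queue: List[Tuple[float, int, Hashable, List[str]]] = [(0.0, counter, start, [])]
    settled = set()
    while queue:
        cost, _, state, moves = heapq.heappop(queue)
        if state == final:
            return cost, moves
        if state in settled:
            continue
        settled.add(state)
        for label, successor, step_cost in graph.get(state, []):
            if successor not in settled:
                counter += 1
                heapq.heappush(
                    queue, (cost + step_cost, counter, successor, moves + [label])
                )
    raise ValueError("final state is not reachable from the start state")


# Hypothetical product net for a one-event trace against a one-transition model.
# The synchronous move costs 1 - 0.5, assuming an event similarity of 0.5.
graph = {
    "s0": [
        ("sync(teach course)", "s2", 0.5),
        ("log(teach course)", "s1", 1.0),
        ("model(teach course)", "s1b", 1.0),
    ],
    "s1": [("model(teach course)", "s2", 1.0)],
    "s1b": [("log(teach course)", "s2", 1.0)],
}

best_cost, best_moves = optimal_alignment(graph, "s0", "s2")
print(best_cost, best_moves)
```

In this toy graph the single synchronous move (cost 0.5) is cheaper than a log move followed by a model move (cost 2), so it is returned as the optimal alignment.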
The definitions used in the first step are as follows:
Definition 1. Process model: a process model is represented as a pair (PN, Attr). Here PN = (P, T; F, M) is a Petri net, where P is a set of places, T is a set of transitions, F ⊆ (P × T) ∪ (T × P) is the set of flow relations, and M: P → Num, M ∈ Num^P, is a marking function; Num denotes the set of natural numbers, and M ∈ Num^P means that M is a function defined on the set P with values in Num. Attr is a labeling function Attr: T → ξ ∪ {τ}, where ξ is the set of all possible events and τ is the empty event; Attr maps each transition of the Petri net to an element of ξ or to τ. An event e is an element of ξ and is written (e, #res(e)[:value]), where the first component is the event classifier, given by the activity name of the event, and #res(e)[:value] denotes an attribute of e and its value; [:value] indicates that the attribute value is optional;
Definition 2. Trace of events: when events have multiple attributes, a trace is a finite sequence of events, that is, <(e_1, #res(e_1)[:value]), …, (e_n, #res(e_n)[:value])>; an event log is a multiset of traces;
Definition 3. Activity mapping: the operation that represents a multi-attribute event by its activity attribute alone is called the activity mapping, written ↓act.
With the participation of domain experts, an ontology tree is constructed with reference to WordNet and HowNet. The concepts used in constructing the ontology tree are defined as follows:
Definition 4. Functional semantics of event e
The functional semantics of an event e is described as a tuple FS(e) = <activity, resource, [option], constraint>, where:
activity = {C | C ∈ C_a} denotes the activity corresponding to event e, where C_a is the set of concepts describing activities in a given domain; it is written #act(e);
resource = {C | C ∈ C_r} denotes the resources involved in event e, where C_r is the set of concepts describing resources in a given domain; it is written #res(e);
option = {C | C ∈ C_o} denotes the other attributes involved in event e, where C_o is the set of concepts describing those other attributes in a given domain;
constraint = Q_1 ∧ Q_2 ∧ … ∧ Q_n denotes the attribute constraints that must hold when an event occurs, where Q_i (i = 1, 2, …, n) is the specific constraint on one attribute of the event; the constraints Q_1, Q_2, …, Q_n are joined by logical "and";
Definition 5. Ontology tree
Let O be the ontology tree over a domain D. Then:
O = <({C}, {R}) | C_i ∈ D, i = 1, 2, …, m; R_j ∈ D, j = 1, 2, …, n>;
Here C is the set of concepts of domain D, which are divided into category concepts, feature concepts, instance concepts and element concepts: a category concept stands for a set of objects with the same nature; a feature concept reflects a feature that a concept possesses; an instance concept is a concrete instance of a concept; an element concept is a constituent element of a concept. R is the set of relations. The relations defined by the ontology are is-a, instance-of, element-of and trait-of, where is-a is the inheritance relation between concepts, instance-of relates a concept to its concrete instances, element-of relates a concept to its constituent elements, and trait-of relates a concept to its features;
Definition 6. Ontology semantic annotation
The process of replacing, or partly replacing, a given concept k with a concept in the ontology tree is called the ontology semantic annotation of the concept, written remark(k);
Definition 7. Ontology semantic annotation of an event
Annotating the functional semantics FS(e) of an event e on the ontology is, in essence, the process of semantically annotating each attribute of the event in the ontology tree. First, the concept #attr(e) involved in event e is semantically annotated in the ontology tree; if an attribute of event e has a concrete value, that value is given according to the trait-of relation of the annotated concept.
The steps for computing similarity from the annotation of event attributes on the ontology tree are as follows:
Let e_N be an event in the model and e_σ an event in the trace with e_N = e_σ, that is, with the same activity. Express the two events by their functional semantics FS(e_N) and FS(e_σ), and annotate every attribute of FS(e_N) and FS(e_σ) on the ontology tree. If the annotations of some attribute of e_N and e_σ in the ontology tree are x and y respectively, the similarity sim(x, y) of the concept pair (x, y) is determined case by case:
(1) If there is no y ∈ (C_r(e_σ) ∪ C_o(e_σ)) such that kind(x) = kind(y), then sim(x, y) = 0;
(2) If there exists y ∈ (C_r(e_σ) ∪ C_o(e_σ)) such that kind(x) = kind(y), and […]:
if y ∈ descendant(x) or y ∈ instance-of(x), then sim(x, y) = 1;
if y ∉ descendant(x) and y ∉ instance-of(x), then sim(x, y) = Dis(x, y), that is, the similarity of x and y is given by their distance in the ontology tree;
(3) If there exists y ∈ (C_r(e_σ) ∪ C_o(e_σ)) such that kind(x) = kind(y), and […]:
map each constraint Q_i restricting x and y to the traits trait-of_i(x) and trait-of_i(y):
(a) if […], then sim(Q_i(x_i, y_i)) = Dis(x, y);
(b) if […] and trait-of_i(y) does not satisfy trait-of_i(x), then sim(Q_i(x_i, y_i)) = 0;
(c) if […] and trait-of_i(y) satisfies trait-of_i(x):
if y ∈ descendant(x) or y ∈ instance-of(x), then sim(Q_i(x_i, y_i)) = 1;
if y ∉ descendant(x) and y ∉ instance-of(x), then sim(Q_i(x_i, y_i)) = Dis(x, y);
then sim(x, y) = average(sim(Q_i(x_i, y_i))).
In the computation above, the function kind gives the kind to which an event attribute belongs. When kind(x) = kind(y), sim(Q_i(x_i, y_i)) is the similarity of x_i and y_i under the constraint Q_i, and sim(x, y) equals the average of sim(Q_i(x_i, y_i)) over the different Q_i; trait-of is one of the relations present in the ontology tree.
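The unconstrained cases (1) and (2) can be sketched as follows. This is a hedged illustration, not the invention's definition: the text does not define Dis(x, y) at this point, so the sketch assumes Dis(x, y) = 1 / (1 + number of edges on the tree path between x and y); it also assumes that kind is the root of the tree containing the concept, and the tree fragment `PARENT` is hypothetical. Case (3) is reduced to the final averaging step, with the per-constraint similarities supplied by the caller.

```python
from typing import Dict, List

# Hypothetical ontology fragment: child -> parent, merging is-a and instance-of.
PARENT: Dict[str, str] = {
    "classroom": "place",
    "lab": "place",
    "B201": "classroom",
    "B202": "classroom",
    "teacher": "person",
    "student": "person",
}


def ancestors(concept: str) -> List[str]:
    """The concept itself followed by its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def kind(concept: str) -> str:
    """Assumption: the kind of a concept is the root of its tree."""
    return ancestors(concept)[-1]


def dis(x: str, y: str) -> float:
    """Assumed distance-based similarity: 1 / (1 + path length in edges)."""
    up_x, up_y = ancestors(x), ancestors(y)
    common = next(a for a in up_x if a in up_y)  # lowest common ancestor
    return 1.0 / (1.0 + up_x.index(common) + up_y.index(common))


def sim(x: str, y: str) -> float:
    """Cases (1) and (2): x annotates the model event, y the trace event."""
    if kind(x) != kind(y):
        return 0.0                       # case (1): no attribute of the same kind
    if x in ancestors(y):
        return 1.0                       # y is x, a descendant of x or an instance of x
    return dis(x, y)                     # otherwise fall back to tree distance


def sim_constrained(per_constraint: List[float]) -> float:
    """Final step of case (3): average of the similarities sim(Q_i(x_i, y_i))."""
    return sum(per_constraint) / len(per_constraint)


print(sim("classroom", "B201"))   # instance of the required concept
print(sim("B201", "B202"))        # sibling instances, two edges apart
print(sim("classroom", "teacher"))
```

Under these assumptions the measure is asymmetric, as in the rules above: a trace value that specializes the model concept scores 1, while a model value that is more specific than the trace value is scored by distance.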
With reference to the standard likelihood function, the costs of a log move, a model move, a weak synchronous move and a strong synchronous move are set to (1, 1, 1, 1 − Sim-event(e_σ, e_N)) respectively; this is called the similarity likelihood function. Here Sim-event(e_σ, e_N) denotes the similarity of the event pair (e_σ, e_N) and is the average of the attribute similarities given above. Based on the similarity likelihood function, the alignment cost cost(γ) is the number of log moves, model moves and weak synchronous moves contained in the alignment γ, plus the term (1 − Sim-event(e_σ, e_N)) summed over its strong synchronous moves.
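A minimal sketch of cost(γ) under the similarity likelihood function follows. The move kinds are passed in explicitly because the distinction between weak and strong synchronous moves is not spelled out at this point; the kind names and the similarity values in the example are hypothetical.

```python
from typing import List, Optional, Tuple

# (kind of move, event similarity); the similarity is only used for strong_sync.
Move = Tuple[str, Optional[float]]


def move_cost(kind: str, similarity: Optional[float] = None) -> float:
    """Cost of one move: (1, 1, 1, 1 - Sim-event) for the four move kinds."""
    if kind in ("log", "model", "weak_sync"):
        return 1.0
    if kind == "strong_sync":
        if similarity is None or not 0.0 <= similarity <= 1.0:
            raise ValueError("a strong synchronous move needs a similarity in [0, 1]")
        return 1.0 - similarity
    raise ValueError(f"unknown move kind: {kind!r}")


def alignment_cost(alignment: List[Move]) -> float:
    """cost(gamma): sum of the move costs over the whole alignment."""
    return sum(move_cost(kind, similarity) for kind, similarity in alignment)


gamma: List[Move] = [
    ("strong_sync", 1.0),   # identical events cost nothing
    ("strong_sync", 0.5),   # same activity, partly matching attributes
    ("log", None),          # event only in the trace
    ("model", None),        # event only in the model
]
print(alignment_cost(gamma))  # 0 + 0.5 + 1 + 1 = 2.5
```

When every strong synchronous move has similarity 1, the value coincides with the number of log, model and weak synchronous moves in the alignment.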
The invention starts from events with multiple attributes and studies the alignment of process models with event logs. To keep the description of events consistent between log and model, events and their attributes are expressed by their functional semantics and annotated on a domain ontology. The similarity of event pairs with the same activity attribute is computed from the similarities of their individual attributes. Based on event similarity, an algorithm for aligning a process model with an event log is given.
A comparison of the alignment of the invention with simple alignment shows that simple alignment reflects only the nature of a deviation, whereas the alignment of the invention additionally reflects its magnitude. The alignment of the invention therefore describes deviations in finer detail and provides a more detailed basis for improving the model or for finding problems in the log.
Brief Description of the Drawings
Figure 1 is the process model N_1 of a training course, described as a Petri net;
Figure 2 is a schematic diagram of part of the functional ontology of the teaching domain;
Figure 3 is a schematic diagram of the process model of the embodiment;
Figure 4 is a schematic diagram of the event net of the embodiment;
Figure 5 is a schematic diagram of the product net of the process model and the event net of the embodiment;
Figure 6 compares simple alignment with the optimal alignment of the invention, where (a) is the simple optimal alignment and (b) is the optimal alignment of the invention;
Figure 7 compares the structure of simple alignment with that of the alignment of the invention, where (a) is the structure of the simple alignment and (b) is the structure of the alignment of the invention.
Detailed Description
The invention is described in detail below with reference to the drawings and an embodiment.
The method for aligning event logs with process models based on event similarity comprises the following steps:
First, the concepts involved in aligning events are defined and explained.
Definition 1. Process model
A process model is represented as a pair (PN, Attr), where:
(1) PN = (P, T; F, M) is a Petri net, where P is a set of places, T is a set of transitions, F ⊆ (P × T) ∪ (T × P) is the set of flow relations, and M: P → Num, M ∈ Num^P, is a marking function; Num denotes the set of natural numbers, and M ∈ Num^P means that M is a function defined on the set P with values in Num.
(2) Attr is a labeling function Attr: T → ξ ∪ {τ}, where ξ is the set of all possible events and τ is the empty event. Attr maps each transition of the Petri net to an element of ξ or to τ. An event e is an element of ξ and is written (e, #res(e)[:value]), where the first component is the event classifier, given by the activity name of the event, and #res(e)[:value] denotes an attribute of e and its value; [:value] indicates that the attribute value is optional.
Definition 2. Trace of events
When events have multiple attributes, a trace is a finite sequence of events, that is, <(e_1, #res(e_1)[:value]), …, (e_n, #res(e_n)[:value])>. An event log is a multiset of traces.
Definition 3. Activity mapping
The operation that represents a multi-attribute event by its activity attribute alone is called the activity mapping, written ↓act. Similarly, the activity mapping maps a trace to its corresponding simple trace, an event log to its corresponding simple event log, and a process model to a simple process model.
Unless stated otherwise, events in this invention are events with multiple attributes. Accordingly, a process model is a model over multi-attribute events, a trace is a trace of multi-attribute events, and an event log is a log of multi-attribute events. Models, event logs, traces and alignments whose events carry only the activity attribute are called simple models, simple event logs, simple traces and simple alignments respectively.
An example is used below to illustrate the concepts of process model and trace.
Figure 1 describes the process model N_1 of a training course as a Petri net. Its main activities are: ① the student consults an assistant and submits an application (apply request); ② the student registers personal information (register); ③ an assistant verifies the student's personal information (check information); ④ the student pays the training fee (pay, financial staff); ⑤ assistants prepare the course materials (prepare); ⑥ Sam (teacher) gives the training in classroom B201; ⑦ the student previews the next lesson (preview); ⑧ the training result is examined (examine) or assessed (assess). The attributes of an event other than its activity, together with their values, are shown in dashed boxes in the figure and act as restrictions on the event. Each event in Figure 1 has attributes such as activity, resource and place.
Consider two traces σ_1 and σ_2, where σ_1 = <(apply request, assistant:Cindy), (register, assistant:Jack), (pay, financial staff), (prepare, assistant), (teach course, teacher:Sam, classroom:B201), (preview, student), (teach course, teacher:Sam, classroom:B201), (examine, teacher:Sam)>. Trace σ_2 is identical to σ_1 except that teach course takes place in B202.
Applying the activity mapping to σ_1 and σ_2 gives σ_1↓act = σ_2↓act = <apply request, register, pay, prepare, teach course, preview, teach course, examine>. That is, σ_1 and σ_2 are equivalent on the simple model of N_1, and both have fitness 1 with respect to it. When the multiple attributes of events are taken into account, however, replaying σ_2 on N_1 yields a fitness below 1.
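The effect of the activity mapping on the two example traces can be illustrated with a short sketch. The representation of an event as a pair of an activity name and an attribute dictionary, and the helper names `act_projection` and `move_teaching`, are assumptions made for this illustration only.

```python
from typing import Dict, List, Optional, Tuple

# An event is (activity name, {attribute: value}); None means no value was recorded.
Event = Tuple[str, Dict[str, Optional[str]]]
Trace = List[Event]


def act_projection(trace: Trace) -> List[str]:
    """Activity mapping: keep only the activity name of each event."""
    return [activity for activity, _attributes in trace]


sigma1: Trace = [
    ("apply request", {"assistant": "Cindy"}),
    ("register", {"assistant": "Jack"}),
    ("pay", {"financial staff": None}),
    ("prepare", {"assistant": None}),
    ("teach course", {"teacher": "Sam", "classroom": "B201"}),
    ("preview", {"student": None}),
    ("teach course", {"teacher": "Sam", "classroom": "B201"}),
    ("examine", {"teacher": "Sam"}),
]


def move_teaching(trace: Trace, room: str) -> Trace:
    """Copy a trace, changing the classroom of every 'teach course' event."""
    return [
        (act, {**attrs, "classroom": room} if act == "teach course" else dict(attrs))
        for act, attrs in trace
    ]


# sigma2 equals sigma1 except that teach course takes place in B202.
sigma2 = move_teaching(sigma1, "B202")

print(act_projection(sigma1) == act_projection(sigma2))  # True: same simple trace
print(sigma1 == sigma2)                                   # False: attributes differ
```

The two traces are indistinguishable after projection, which is why a simple alignment cannot see the classroom deviation.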
Second, the similarity between the attributes of two events is computed on the basis of an ontology tree.
Attribute similarity can be computed using an ontology tree. Because existing ontology trees are not yet complete, attribute similarity is computed with lexical semantic networks such as WordNet and HowNet, which take the concepts expressed by words as their objects of description and expose the relations between concepts and between concept attributes. Within the semantic network, the relations to be represented are chosen according to the specific model and event log to be aligned.
Following the way WordNet is constructed and the use of semantics in service composition, a process model can be regarded as a program that implements a series of functions for a particular domain. Each function can be carried out by an event, and an event can in turn be described by several attributes. The functional semantic description of an event is defined here; it covers the commonly used attributes of an event, such as the activity and resource attributes, and treats the other attributes as optional.
Definition 4. Functional semantics of event e
The functional semantics of an event e can be described as a tuple FS(e) = <activity, resource, [option], constraint>, where:
activity = {C | C ∈ C_a} denotes the activity corresponding to event e, where C_a is the set of concepts describing activities in a given domain, such as teach, exam and assess; it is written #act(e).
resource = {C | C ∈ C_r} denotes the resources involved in event e, including the subject that initiates the activity and the object on which the event acts, where C_r is the set of concepts describing resources in a given domain, such as teacher, student and assistant; it is written #res(e).
option = {C | C ∈ C_o} denotes the other attributes involved in event e, such as place and time, where C_o is the set of concepts describing those attributes in a given domain; concepts relating to place include lab and classroom, and concepts relating to time include year and time. [option] indicates that this item is optional in the functional semantics of event e.
For all attributes of an event, the function kind gives the kind to which an attribute belongs, for example kind(#res(e)) = res.
constraint = Q_1 ∧ Q_2 ∧ … ∧ Q_n denotes the attribute constraints that must hold when an event occurs, where Q_i (i = 1, 2, …, n) is the specific constraint on one attribute of the event. Constraints on different attributes are joined by logical "and". For example, (capacity > 50) ∧ (multi-media = true) gives two conditions for choosing a place: capacity > 50 means the place must hold more than 50 people, and multi-media = true means it must have multimedia equipment.
In summary, the functional semantics of an event is a tuple whose core is the activity; the resource is the subject or object of the activity, option covers the other attributes of the event, and constraint restricts and regulates the event attributes.
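One possible in-memory representation of the tuple FS(e) is sketched below. The class and field names, the encoding of constraints as predicates over a dictionary of trait values, and the two example rooms are assumptions for illustration; only the constraint (capacity > 50) ∧ (multi-media = true) comes from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Traits = Dict[str, object]
Constraint = Callable[[Traits], bool]


@dataclass
class FunctionalSemantics:
    """FS(e) = <activity, resource, [option], constraint>."""
    activity: str
    resource: Optional[str] = None
    option: Dict[str, str] = field(default_factory=dict)      # optional attributes
    constraint: List[Constraint] = field(default_factory=list)

    def satisfied_by(self, traits: Traits) -> bool:
        """The constraints Q_1 ... Q_n are joined by 'and'."""
        return all(q(traits) for q in self.constraint)


fs_teach = FunctionalSemantics(
    activity="teach course",
    resource="teacher",
    option={"place": "classroom"},
    constraint=[
        lambda t: t["capacity"] > 50,          # the room must hold more than 50 people
        lambda t: t["multi-media"] is True,    # the room must have multimedia equipment
    ],
)

large_room = {"capacity": 60, "multi-media": True}   # hypothetical trait values
small_room = {"capacity": 40, "multi-media": True}   # hypothetical trait values

print(fs_teach.satisfied_by(large_room))  # True
print(fs_teach.satisfied_by(small_room))  # False
```

Keeping the constraints as separate predicates preserves the individual Q_i, which is what the per-constraint similarity computation needs later.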
To ensure that traces and models are described in the same way when they are aligned, a domain ontology tree is built and the functional descriptions of events are mapped onto it. The ontology tree avoids semantic conflicts on the one hand and is used to compute event similarity on the other.
Definition 5. Ontology tree
Let O be the ontology tree over a domain D. Then:
O = <({C}, {R}) | C_i ∈ D, i = 1, 2, …, m; R_j ∈ D, j = 1, 2, …, n>,
where C is the set of concepts of domain D, which are divided into category concepts, feature concepts, instance concepts and element concepts. A category concept stands for a set of objects with the same nature, such as teacher, place and classroom; a feature concept reflects a feature that a concept possesses, for example capacity (the number of people it holds) is a feature of classroom; an instance concept is a concrete instance of a concept; an element concept is a constituent element of a concept.
R is the set of relations. The relations defined by the ontology here are is-a, instance-of, element-of and trait-of, where is-a is the inheritance relation between concepts, instance-of relates a concept to its concrete instances, element-of relates a concept to its constituent elements, and trait-of relates a concept to its features. Figure 2 shows part of the functional ontology tree of the teaching domain.
Definition 6. Ontology semantic annotation
The process of replacing, or partly replacing, a given concept k with a concept in the domain functional ontology tree is called the ontology semantic annotation of the concept, written remark(k).
Definition 7. Ontology semantic annotation of an event
Annotating the functional semantics FS(e) of an event e on the ontology is, in essence, the process of semantically annotating each attribute of the event in the ontology tree. First, in the ontology tree of the domain, the concept #attr(e) involved in event e is annotated as remark(#attr(e)). If an attribute of event e has a value value(#attr(e)), the concrete value of value(#attr(e)) is given according to the trait-of traits of remark(#attr(e)). For example, in Figure 1 the event e represented by transition t_6 of process model N_1 has #act(e) = teach course and #res_1(e) = classroom, with the value restricted to B201. The functional semantics of e can therefore be written FS(e) = <teach course, classroom, classroom = B201>. In the ontology tree of Figure 2, remark(classroom) = classroom, B201 is the restricting value of classroom, and classroom has the trait-of traits capacity and multi-media, so capacity(B201) and multi-media(B201) can be obtained.
Similarly, the attributes of each event in a trace can be annotated on the ontology tree. Suppose the trace contains an event e with #act(e) = teach course and #res_1(e) = classroom, with the value B202. Then remark(classroom) = classroom, and capacity(B202) and multi-media(B202) are obtained.
Semantic annotation of event attributes in the ontology tree requires manual work, particularly where concrete attribute values are involved. The semantic annotation of event attributes is therefore semi-automatic. The ontology tree of each domain can be constructed with reference to WordNet and HowNet with the participation of domain experts.
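The annotation step for the classroom example can be sketched as follows. The class `OntologyTree`, its fields, the synonym "lecture room" and all trait values are hypothetical: the text names the traits capacity and multi-media but gives no values for B201 or B202.

```python
from typing import Dict, List, Optional, Set


class OntologyTree:
    """A small ontology with is-a, instance-of and trait-of relations."""

    def __init__(self) -> None:
        self.concepts: Set[str] = set()
        self.is_a: Dict[str, str] = {}                  # concept -> parent concept
        self.instance_of: Dict[str, str] = {}           # instance -> concept
        self.trait_of: Dict[str, List[str]] = {}        # concept -> trait names
        self.trait_values: Dict[str, Dict[str, object]] = {}  # instance -> trait -> value
        self.synonyms: Dict[str, str] = {}              # surface term -> concept

    def remark(self, term: str) -> Optional[str]:
        """Replace a term by the matching ontology concept, or None if unknown."""
        concept = self.synonyms.get(term, term)
        return concept if concept in self.concepts else None

    def value(self, instance: str, trait: str) -> object:
        """Look up a trait value of an instance, e.g. capacity(B201)."""
        concept = self.instance_of[instance]
        if trait not in self.trait_of.get(concept, []):
            raise KeyError(f"{trait!r} is not a trait of {concept!r}")
        return self.trait_values[instance][trait]


teaching = OntologyTree()
teaching.concepts = {"place", "classroom"}
teaching.is_a = {"classroom": "place"}
teaching.trait_of = {"classroom": ["capacity", "multi-media"]}
teaching.instance_of = {"B201": "classroom", "B202": "classroom"}
teaching.trait_values = {                      # hypothetical values
    "B201": {"capacity": 60, "multi-media": True},
    "B202": {"capacity": 40, "multi-media": False},
}
teaching.synonyms = {"lecture room": "classroom"}  # hypothetical synonym

print(teaching.remark("classroom"))
print(teaching.value("B201", "capacity"), teaching.value("B202", "capacity"))
```

In practice the trait values would be entered manually, which corresponds to the semi-automatic annotation described above.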
Event similarity is computed on the ontology tree as follows.
Computing the similarity between an event in the process model and an event in the trace reduces to computing the similarity between the corresponding attributes of the two events, which can be done on the ontology tree. The steps are:
Step 1. Input an event e_σ from the trace and an event e_N from the model; the two events must have the same activity attribute.
Step 2. Express e_σ and e_N in functional-semantics form, FS(e_σ) and FS(e_N), and annotate each attribute of FS(e_σ) and FS(e_N) on the ontology tree.
Step 3. Compute the similarity on the ontology tree of each pair of attributes of the two events, record the kind of each attribute together with its similarity, and take the average as the similarity of the two events, Sim-event(e_σ, e_N).
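Step 3 can be sketched in Python as follows, assuming the per-attribute similarities have already been computed. The function name sim_event and the decision to raise an error when no attribute pairs are supplied are assumptions; the source does not say what happens in that case.

```python
def sim_event(attr_sims):
    """Average per-attribute similarities into an event similarity.

    attr_sims: list of (kind, similarity) pairs, one per compared attribute.
    Returns (Sim-event, com_attr), where com_attr records each kind and value.
    """
    if not attr_sims:
        # Assumption: the source does not define the empty case.
        raise ValueError("no attribute pairs to compare")
    com_attr = list(attr_sims)
    return sum(s for _, s in com_attr) / len(com_attr), com_attr
```

For example, sim_event([("resource", 1.0), ("place", 0.5)]) gives an event similarity of 0.75 together with the recorded attribute list.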
The calculation rules are described case by case below, taking as an example an event e_N in the model and an event e_σ in the trace that have the same activity attribute (that is, e_N = e_σ on the activity attribute).
Each attribute of the two events (that is, each attribute in their functional semantics) is annotated on the ontology tree, and corresponding attributes are written as a concept pair (x, y), where x belongs to e_N and y to e_σ. Depending on the ontology tree, x and y may carry features such as trait-of. The similarity sim(x, y) of a concept pair (x, y) is determined by the following cases:
(1) If there is no y ∈ C_r(e_σ) ∪ C_o(e_σ) such that kind(x) = kind(y), then sim(x, y) = 0.
(2) If there is a y ∈ C_r(e_σ) ∪ C_o(e_σ) such that kind(x) = kind(y), and Q(x) = ∅:
if y ∈ descendant(x) or y ∈ instance-of(x), then sim(x, y) = 1;
if y ∉ descendant(x) and y ∉ instance-of(x), then sim(x, y) = Dis(x, y), that is, the similarity of x and y is given by their distance in the ontology tree.
(3) If there is a y ∈ C_r(e_σ) ∪ C_o(e_σ) such that kind(x) = kind(y), and Q(x) ≠ ∅, map each constraint Q_i restricting x and y to the features trait-of_i(x) and trait-of_i(y):
(a) if Q_i(y) = ∅, then sim(Q_i(x_i, y_i)) = Dis(x, y);
(b) if Q_i(y) ≠ ∅ and trait-of_i(y) does not satisfy trait-of_i(x), then sim(Q_i(x_i, y_i)) = 0;
(c) if Q_i(y) ≠ ∅ and trait-of_i(y) satisfies trait-of_i(x):
if y ∈ descendant(x) or y ∈ instance-of(x), then sim(Q_i(x_i, y_i)) = 1;
if y ∉ descendant(x) and y ∉ instance-of(x), then sim(Q_i(x_i, y_i)) = Dis(x, y).
Then sim(x, y) = average(sim(Q_i(x_i, y_i))).
In the rules above, the function kind gives the kind to which an event attribute belongs. When kind(x) = kind(y), sim(Q_i(x_i, y_i)) is the similarity of x_i and y_i under constraint Q_i, and sim(x, y) is the average of sim(Q_i(x_i, y_i)) over all constraints Q_i. The symbol ∅ means that a condition imposes no restriction; for example, Q(x) = ∅ means that the constraint on x is empty. trait-of denotes a feature of an attribute and is one of the relations in the ontology tree.
In the similarity calculation for an attribute concept pair (x, y) above, case (1) is the situation where the trace event e_σ has no concept y corresponding to attribute x of the model event e_N, for example when the model event has a resource attribute and the trace event has none; then sim(x, y) = 0.
Case (2) is the situation where (x, y) is a pair of concepts of the same kind and no constraint is placed on x. If y is a descendant node or an instance of x, then y is considered to conform to x and the similarity is 1 (for example, both attributes are of kind resource, x corresponds to the concept place in the ontology tree with no specific constraint, and y is classroom or a concrete instance of place). If y is neither a descendant node nor an instance of x, the similarity is given by the distance between x and y in the ontology tree (for example, x corresponds to the concept Lab and y to classroom).
Case (3) is the situation where the trace event has a concept y of the same kind as x and x is constrained. Each constraint Q_i is mapped to the features trait-of_i(x) and trait-of_i(y). If the constraint on y is empty, then sim(Q_i(x_i, y_i)) under this constraint is Dis(x, y). If the constraint on y is not empty, trait-of_i(y) is compared with trait-of_i(x): if trait-of_i(y) does not satisfy trait-of_i(x), then sim(Q_i(x_i, y_i)) = 0; if it does and y is a descendant node or an instance of x, then sim(Q_i(x_i, y_i)) = 1; if it does and y is neither, then sim(Q_i(x_i, y_i)) = Dis(x, y).
For example, suppose the constraint on x is classroom=B201; it is mapped through trait-of(classroom) to capacity(B201). If the constraint on y is empty, the similarity of (x, y) under this constraint is their distance Dis(x, y) in the ontology. If the constraint on y is Lab=C305, it is mapped to capacity(C305), and capacity(B201) is compared with capacity(C305). If capacity(C305) does not satisfy capacity(B201), the similarity of (x, y) under this constraint is 0. If capacity(C305) satisfies capacity(B201), then, because y is Lab, which is neither an instance nor a descendant node of classroom, the similarity under this constraint is Dis(x, y); had y been an instance or a descendant node of classroom, the similarity under this constraint would be 1.
When x has several constraints, the similarity of (x, y) is the average of the attribute similarities under the individual constraints. In the calculation of sim(x, y), Dis(x, y) denotes the distance of (x, y) in the ontology tree. What counts as "satisfies" depends on the trait-of of the specific concept and is generally understood as "better than"; for example, multi-media = true is better than multi-media = false.
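The three cases can be sketched in Python as follows. This is a reading of the rules under stated assumptions, not the patent's implementation: the toy tree PARENT, the helper names is_under and at_least, the choice to treat an identical concept as conforming, the reading of "satisfies" as "greater than or equal", and the treatment of a trait missing on y as an empty constraint are all assumptions. Dis is supplied by the caller because its definition lies outside this section.

```python
# Hypothetical toy tree: child concept -> parent concept.
PARENT = {"place": "resource", "classroom": "place", "Lab": "place"}

def is_under(y_concept, x_concept):
    """True if y_concept equals x_concept or lies below it in the toy tree."""
    node = y_concept
    while node is not None:
        if node == x_concept:
            return True
        node = PARENT.get(node)
    return False

def at_least(trait_y, trait_x):
    """Assumed reading of 'satisfies': y's trait value is no worse than x's."""
    return trait_y >= trait_x

def sim(x, y, dis, satisfies=at_least):
    """Similarity of a model attribute x and a trace attribute y.

    x, y: dicts with 'kind', 'concept' and 'traits' (empty dict = no constraint);
    y is None when the trace event has no attribute to compare with x.
    dis: callable returning the ontology-tree measure Dis for two concepts.
    """
    if y is None or x["kind"] != y["kind"]:
        return 0.0                                    # case (1)
    under = is_under(y["concept"], x["concept"])
    base = 1.0 if under else dis(x["concept"], y["concept"])
    if not x["traits"]:
        return base                                   # case (2)
    per_constraint = []                               # case (3)
    for name, x_val in x["traits"].items():
        y_val = y["traits"].get(name)
        if y_val is None:                             # (a) no constraint on y
            per_constraint.append(dis(x["concept"], y["concept"]))
        elif not satisfies(y_val, x_val):             # (b) trait not satisfied
            per_constraint.append(0.0)
        else:                                         # (c) trait satisfied
            per_constraint.append(base)
    return sum(per_constraint) / len(per_constraint)
```

With an unconstrained x of concept place and a y of concept classroom, the function returns 1.0; with x of concept Lab and y of concept classroom it falls back to whatever value the supplied dis returns.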
In model N_1 of Figure 1, the event e corresponding to transition t_6 has #act(e) = teach course, #resource(e) = teacher:Sam and #place(e) = classroom:B201. The corresponding event e′ in the trace has #act(e′) = teach course, #resource(e′) = teacher:Sam and #place(e′) = classroom:B202. By the steps above, sim(x, y) = 1 for the resource attribute. For the place attribute, #place(e) and #place(e′) are the same concept, classroom, but event e carries the constraint Q(x) = B201 and event e′ the constraint B202, so each trait-of of B202 is checked against the requirement of B201. Assume capacity(B202) < capacity(B201), multi-media(B201) = true and multi-media(B202) = false. Under the constraint #place(e) = classroom:B201, the similarity of the place attribute is then given as 0.5. The similarity of the two events is therefore Sim-event(e, e′) = 0.75, with com_attr[0] = {resource, 1} and com_attr[1] = {place, 0.5}.
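The arithmetic behind these two numbers can be written out as below. The split of the place value into one constraint contributing 1 and one contributing 0 is an assumption inferred from the stated result of 0.5 and from the averaging rule; the source does not say which of the two traits contributes which value.

```latex
\[
\mathrm{sim}_{\text{place}}(x,y) = \frac{1 + 0}{2} = 0.5,
\qquad
\text{Sim-event}(e,e') = \frac{\mathrm{sim}_{\text{resource}}(x,y) + \mathrm{sim}_{\text{place}}(x,y)}{2}
= \frac{1 + 0.5}{2} = 0.75
\]
```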
Based on the similarity between events, the relevant definitions of calibration are given below.
Definition 8: Calibration based on event similarity
For a trace σ of events and a model N, a calibration γ based on event similarity is a sequence of moves such that the first components of the moves, with >> removed, form the trace σ in the log, and the second components, with >> removed, form an occurrence sequence of the events corresponding to transitions in model N. Each valid move is an event pair (m, n):
(1) (m, n) is a log move if m ∈ σ↓act and n = >>;
(2) (m, n) is a model move if m = >> and n ∈ N↓act;
(3) (m, n) is a weak synchronous move if m ∈ σ↓act, n ∈ N↓act, m = n and Sim(m, n) < λ;
(4) (m, n) is a strong synchronous move if m ∈ σ↓act, n ∈ N↓act, m = n and Sim(m, n) ≥ λ.
Here λ is an adjustable threshold with 0 ≤ λ ≤ 1.
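The four kinds of move in Definition 8 can be sketched as a small classifier. The function name classify_move, the string labels it returns, and the convention of returning None for a pair that is not a valid move are assumptions made for illustration.

```python
SKIP = ">>"

def classify_move(m, n, sim, lam):
    """Classify a move (m, n) over activity labels.

    sim: Sim(m, n) of the underlying events (ignored for non-synchronous moves).
    lam: the adjustable threshold lambda, required to lie in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    if m != SKIP and n == SKIP:
        return "log"
    if m == SKIP and n != SKIP:
        return "model"
    if m != SKIP and n != SKIP and m == n:
        return "strong_sync" if sim >= lam else "weak_sync"
    return None  # not a valid move, e.g. different activities or (>>, >>)
```

With λ = 0.6, a synchronous move on the same activity with similarity 0.75 is classified as strong and one with similarity 0.5 as weak.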
Definition 9: Cost of a calibration
Following the standard likelihood function, the costs of a log move, a model move, a weak synchronous move and a strong synchronous move are set to (1, 1, 1, 1 - Sim-event(e_σ, e_N)) respectively; this is called the similarity likelihood function. Here Sim-event(e_σ, e_N) is the similarity of the event pair (e_σ, e_N).
Under the similarity likelihood function, the cost cost(γ) of a calibration γ is the number of log moves, plus the number of model moves, plus the number of weak synchronous moves in γ, plus the sum of 1 - Sim-event(e_σ, e_N) over the strong synchronous moves in γ.
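A minimal Python sketch of this cost, assuming each move is already tagged with its kind and, for synchronous moves, its event similarity. The function name calibration_cost and the move labels are assumptions.

```python
def calibration_cost(moves):
    """Cost of a calibration under the similarity likelihood function.

    moves: list of (kind, sim) pairs with kind in
    {"log", "model", "weak_sync", "strong_sync"}; sim is only read for
    strong synchronous moves.
    """
    total = 0.0
    for kind, sim in moves:
        if kind in ("log", "model", "weak_sync"):
            total += 1.0
        elif kind == "strong_sync":
            total += 1.0 - sim
        else:
            raise ValueError(f"unknown move kind: {kind!r}")
    return total
```

For instance, one log move, one strong synchronous move with similarity 0.75, one weak synchronous move and one model move give a cost of 1 + 0.25 + 1 + 1 = 3.25.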
Definition 10: Optimal calibration
Let Ψ(σ, N) denote the set of calibrations of trace σ and model N under the similarity likelihood function. If there is a γ_1 ∈ Ψ(σ, N) such that cost(γ_1) ≤ cost(γ) for every γ ∈ Ψ(σ, N), then γ_1 is called an optimal calibration of σ and N under the similarity likelihood function. The set of optimal calibrations of σ and N under the similarity likelihood function is denoted Ψ_0(σ, N).
The calibration procedure is as follows. First, the trace σ is converted into its corresponding event net, the event similarity is computed for events with the same activity attribute, the product net of the event net and the model net is constructed, and the transitions of the product net are mapped to moves of a calibration. Second, each move is assigned a cost according to the definition of calibration cost under the similarity likelihood function. There are several paths from the initial marking of the product net to its final marking; among them, the path with the lowest cost corresponds to the optimal calibration of the trace and the model.
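The search for a lowest-cost path can be sketched with Dijkstra's algorithm. This sketch makes a deliberate simplification: the model is given as a plain transition system (a dict from state to outgoing events) instead of a Petri net, so the "product" is just the pair (position in trace, model state). The function name optimal_calibration, its parameters and the event representation (activity, attributes) are assumptions.

```python
import heapq

SKIP = ">>"

def optimal_calibration(trace, model, start, finals, sim, lam):
    """Return (cost, moves) of a lowest-cost calibration, or None.

    trace: list of events (activity, attrs).
    model: dict state -> list of (event, next_state).
    sim:   callable (trace_event, model_event) -> similarity in [0, 1].
    lam:   threshold separating weak from strong synchronous moves.
    """
    n = len(trace)
    heap = [(0.0, 0, 0, start, [])]  # (cost, tie-breaker, index, state, moves)
    tick = 1
    done = set()
    while heap:
        cost, _, i, q, moves = heapq.heappop(heap)
        if (i, q) in done:
            continue
        done.add((i, q))
        if i == n and q in finals:
            return cost, moves
        steps = []
        if i < n:  # log move
            steps.append((1.0, i + 1, q, (trace[i][0], SKIP)))
        for event, q_next in model.get(q, []):
            steps.append((1.0, i, q_next, (SKIP, event[0])))  # model move
            if i < n and trace[i][0] == event[0]:             # synchronous move
                s = sim(trace[i], event)
                step_cost = 1.0 - s if s >= lam else 1.0
                steps.append((step_cost, i + 1, q_next, (trace[i][0], event[0])))
        for step_cost, i2, q2, move in steps:
            heapq.heappush(heap, (cost + step_cost, tick, i2, q2, moves + [move]))
            tick += 1
    return None
```

Because every move has a non-negative cost, the first time the search pops a state at the end of the trace in a final model state, the accumulated cost is minimal. A product-net implementation would replace the (index, state) pairs with markings of the product net.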
Figure 3 shows model N_2 represented as a labeled Petri net. Figure 4 shows the event net N_2′ of the trace σ = <(a, Resource_1: x_1), (b, Resource_2: m_1, Place_1: y_1), (c, Place_2: z_1), (e, Place_3: k_1)>. The product net N″ of model N_2 and event net N_2′ is shown in Figure 5. In Figures 3 and 4, each transition t is labeled with an event e, shown by its activity attribute, such as a, b or c. In the product net N″, each synchronous move is annotated with the similarity of the two events in the move, together with the attributes of the events and the similarity between those attributes. The transitions of the product net are mapped to moves of a calibration, the cost of the calibration corresponding to each path is obtained from the similarity likelihood function, and the path with the lowest cost is the optimal calibration.
Calibration is compared with simple calibration below.
Building on the discussion of event similarity, the calibration of event log L with model N is compared with the calibration of the traces of the simple log with the simple model, which yields the following corollaries. For any trace σ in event log L and model N, the corresponding simple trace and simple model are σ↓act and N↓act respectively.
Corollary 1. For any trace σ in event log L and model N, cost(Ψ_0(σ, N)) ≥ cost(Γ_0(σ↓act, N↓act)).
Proof: Let γ ∈ Ψ_0(σ, N). Then cost(γ) is the number of log moves, plus the number of model moves, plus the number of weak synchronous moves, plus the sum of 1 - Sim-event(e_σ, e_N) over the strong synchronous moves. For γ′ ∈ Γ_0(σ↓act, N↓act), cost(γ′) is the sum of the numbers of log moves and model moves in γ′. Log moves and model moves are defined identically in the two kinds of calibration, as (activity in trace, >>) and (>>, activity in model). Since 0 ≤ Sim-event(e_σ, e_N) ≤ 1, it follows that cost(γ) ≥ cost(γ′) for model N and trace σ in log L.
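One way to spell out the step this proof leaves implicit is given below. It assumes that projecting every event of γ onto its activity yields a valid simple calibration, written here as γ↓act, and that the simple cost counts only log and model moves; the symbols n_log, n_model, n_weak and the index set S of strong synchronous moves are notation introduced for this sketch.

```latex
\[
\mathrm{cost}(\gamma)
= n_{\text{log}} + n_{\text{model}} + n_{\text{weak}}
  + \sum_{(e_\sigma, e_N) \in S} \bigl(1 - \text{Sim-event}(e_\sigma, e_N)\bigr)
\;\ge\; n_{\text{log}} + n_{\text{model}}
= \mathrm{cost}(\gamma{\downarrow}_{act})
\;\ge\; \mathrm{cost}(\gamma')
\]
```

The first inequality holds because n_weak ≥ 0 and every term of the sum is non-negative; the second holds because γ′ is optimal among simple calibrations.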
Corollary 1 shows that when event attributes other than the activity are taken into account, the cost of calibrating a trace in the event log against the model cannot decrease and generally increases, that is, the degree to which the trace conforms to the model goes down. The reason is that, once multiple event attributes are considered, the calibration contains weak and strong synchronous moves: even when the activities in a move are the same, its cost must still be computed from the similarity of the other attributes. Seen from another angle, simple calibration judges the "quality" of the deviation between log and model, whereas calibration, on top of that, gives the "quantity" of the deviation.
Corollary 2. For γ ∈ Ψ_0(σ, N), it does not necessarily hold that γ_1 ∈ Γ_0(σ↓act, N↓act), where γ_1 is obtained from γ by mapping the events in each move to their activities. The same holds in the other direction.
Proof: According to Corollary 1, if γ ∈ Ψ_0(σ, N), then cost(γ) ≤ cost(γ′) for every γ′ ∈ Ψ(σ, N). Under the similarity likelihood function, the costs of γ and γ′ are determined by their numbers of log moves, model moves and weak synchronous moves, together with 1 - Sim-event(e_σ, e_N) for each strong synchronous move. When γ and γ′ are mapped to the simple calibrations γ_1 and γ_1′, the costs of γ_1 and γ_1′ are the sums of their numbers of log moves and model moves. However, cost(γ) ≤ cost(γ′) does not necessarily imply cost(γ_1) ≤ cost(γ_1′). Hence γ_1 ∈ Γ_0(σ↓act, N↓act) does not necessarily hold.
The converse can be proved similarly.
Corollary 2 shows that, once the similarity between multi-attribute events is considered, calibrations and simple calibrations do not necessarily correspond to one another.
Corollary 3. For a trace σ_1 in log L, let γ_1, γ_2 ∈ Ψ(σ_1, N) with cost(γ_1) = cost(γ_2). If γ_1 and γ_2 contain the same numbers of log moves, model moves, weak synchronous moves and strong synchronous moves, then cost(γ_1′) = cost(γ_2′), where γ_1′, γ_2′ ∈ Γ(σ_1↓act, N↓act). The converse does not hold.
Proof: The proof is similar to that of Corollary 2.
Corollary 3 shows that if a trace has two optimal calibrations with the same cost, their optimal simple calibrations have the same cost only when the two calibrations contain the same number of each of the four kinds of move. However, when a trace has two optimal simple calibrations with the same cost, the costs of the corresponding optimal calibrations need not be equal.
As stated in Corollary 2, the optimal simple calibration and the optimal calibration of a trace and a model may disagree. Together with Corollary 1, this means that the optimal calibration reflects the degree of conformance between a trace in the log and the model more accurately.
Consider a process model N and its event log L in which events have an activity attribute and two resource attributes, res_1 and res_2; for a teaching process model, for example, res_1 is teacher and res_2 is course. Mapping each event in N and L to its activity gives the corresponding simple process model N↓act and simple event log L↓act.
(1) Optimal trace and optimal instance in the event log
Figure 6(a) shows the optimal simple calibration, with its cost, of each trace in event log L↓act against the simple process model N↓act. As shown in Figure 6(a), comparing the optimal calibration costs of the traces of the individual instances (cases) gives the trace σ_i↓act with the lowest cost in the event log, such that cost(γ_i) = min(cost(γ_j)), 1 ≤ j ≤ n. The trace σ_i↓act is called the optimal trace of L↓act, and the instance case_i corresponding to σ_i↓act is the optimal instance.
Figure 6(b) shows the optimal calibration of multi-attribute events, with its cost, for each trace in event log L against process model N. As with optimal simple calibration, the costs of the optimal calibrations give the optimal trace in event log L, that is, the trace whose optimal calibration has the lowest cost, and its corresponding optimal instance. The log can also be evaluated from the perspective of res_1 or res_2: writing L(res_1) for the event log of the instances that involve res_1, the optimal trace and optimal instance in L(res_1) can be obtained. Similarly, the optimal trace and optimal instance in L(res_2) can be obtained for the event log L(res_2).
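Selecting the optimal trace and restricting the log to one resource attribute can be sketched as follows. The function names optimal_cases and restrict_log, the case identifiers and the reading of "instances that involve res_1" as "cases with at least one event carrying an attribute of that kind" are assumptions.

```python
def optimal_cases(costs):
    """Return (lowest cost, sorted list of cases that attain it).

    costs: dict case id -> cost of that case's optimal calibration.
    """
    best = min(costs.values())
    return best, sorted(case for case, c in costs.items() if c == best)

def restrict_log(log, kind):
    """Keep the cases with at least one event that has an attribute of this kind.

    log: dict case id -> list of events (activity, attrs dict).
    """
    return {case: trace for case, trace in log.items()
            if any(kind in attrs for _, attrs in trace)}
```

Applying optimal_cases to the costs of the cases kept by restrict_log(log, "teacher") would give the optimal instances from the teacher perspective, under the reading assumed here.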
Compared with optimal simple calibration, calibration can therefore evaluate the traces and instances in an event log from more perspectives.
(2) Structural comparison of simple calibration and calibration
Figure 7 gives a schematic view of the structure of a simple calibration and of a calibration. In Figure 7(a), each move of the simple calibration consists of a pair of events from the trace and the model (with >> removed) and a cost. In the calibration shown in Figure 7(b), setting the costs of the weak and strong synchronous moves to zero and "zooming out" gives the corresponding simple calibration. "Zooming in" on the calibration of multi-attribute events reveals why the similarity of a synchronous move may be less than 1. The calibration of multi-attribute events therefore reflects, in terms of "quality", where and how often the log and the model deviate, and also reflects in finer detail, in terms of "quantity", where and why deviations occur, for example an inconsistent Place. The calibration described in the present invention is thus a more detailed representation of simple calibration, and provides a more detailed basis for improving the model or finding problems in the log.
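The "zoom out" and "zoom in" views can be sketched in Python. The record layout (a dict with "move", "sim" and "com_attr" keys) and the function names zoom_out and zoom_in are assumptions chosen to mirror the com_attr notation used earlier.

```python
SKIP = ">>"

def zoom_out(calibration):
    """Project a calibration to a simple calibration.

    Synchronous moves get cost 0; log and model moves keep cost 1.
    """
    return [(step["move"], 1.0 if SKIP in step["move"] else 0.0)
            for step in calibration]

def zoom_in(calibration):
    """List synchronous moves with similarity below 1 and the attributes
    whose similarity is below 1, i.e. the candidates for the deviation."""
    return [(step["move"], [kind for kind, s in step["com_attr"] if s < 1.0])
            for step in calibration
            if SKIP not in step["move"] and step["sim"] < 1.0]
```

For a calibration whose first move is a synchronous move on activity a with com_attr [("resource", 1.0), ("place", 0.5)], zoom_in reports that move together with the place attribute, while zoom_out reports only the move with cost 0.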
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910250146.4A, CN110109921B (en) | 2019-03-29 | 2019-03-29 | Event log and process model calibration method based on event similarity |
| Publication Number | Publication Date |
|---|---|
| CN110109921A (en) | 2019-08-09 |
| CN110109921B (en) | 2021-08-06 |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210806 |