Movatterモバイル変換


[0]ホーム

URL:


CN104598591A - Model element matching method for type attribute graph model - Google Patents

Model element matching method for type attribute graph model
Download PDF

Info

Publication number
CN104598591A
CN104598591ACN201510028158.4ACN201510028158ACN104598591ACN 104598591 ACN104598591 ACN 104598591ACN 201510028158 ACN201510028158 ACN 201510028158ACN 104598591 ACN104598591 ACN 104598591A
Authority
CN
China
Prior art keywords
node
feature vector
model
segmentation
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510028158.4A
Other languages
Chinese (zh)
Other versions
CN104598591B (en
Inventor
覃征
张任伟
李胜男
杨晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua UniversityfiledCriticalTsinghua University
Priority to CN201510028158.4ApriorityCriticalpatent/CN104598591B/en
Publication of CN104598591ApublicationCriticalpatent/CN104598591A/en
Application grantedgrantedCritical
Publication of CN104598591BpublicationCriticalpatent/CN104598591B/en
Activelegal-statusCriticalCurrent
Anticipated expirationlegal-statusCritical

Links

Classifications

Landscapes

Abstract

Translated fromChinese

本发明公开了一种针对类型属性图模型的模型元素匹配方法,所述方法包括:构建待分析模型的带权多维搜索树;针对待匹配模型元素在所述带权多维搜索树上进行区域搜索,从而获取所述待匹配模型元素的相似节点集;分别计算所述相似节点集中的每一个元素节点与所述待匹配模型元素的相似度从而确定与所述待匹配模型元素相似度最高的所述元素节点。本发明的模型元素匹配方法不仅具有较高的通用性而且可以应用于多人协作编辑模型的情况;同时,基于本发明的方法进行模型元素匹配,其匹配过程的计算量大大减小,从而大大减小了模型元素匹配整体过程的耗时,提高了模型元素匹配的执行速度和效率。

The invention discloses a model element matching method for a type attribute graph model, the method comprising: constructing a weighted multi-dimensional search tree of a model to be analyzed; performing a region search on the weighted multi-dimensional search tree for model elements to be matched , so as to obtain the similar node set of the model element to be matched; respectively calculate the similarity between each element node in the similar node set and the model element to be matched to determine the highest similarity with the model element to be matched Described element node. The model element matching method of the present invention not only has high versatility but also can be applied to the situation of multi-person collaborative editing model; at the same time, based on the method of the present invention for model element matching, the calculation amount of the matching process is greatly reduced, thereby greatly The time consumption of the overall process of model element matching is reduced, and the execution speed and efficiency of model element matching are improved.

Description

Translated fromChinese
一种针对类型属性图模型的模型元素匹配方法A Model Element Matching Method for Type Property Graph Model

技术领域technical field

本发明涉及信息技术领域,具体说涉及一种针对类型属性图模型的模型元素匹配方法。The invention relates to the field of information technology, in particular to a model element matching method for a type attribute graph model.

背景技术Background technique

随着模型驱动的软件开发方法与应用技术的不断发展,开发人员的关注热点逐渐由代码的编写转向模型的设计,这就对模型的维护与管理带来了更高的要求。With the continuous development of model-driven software development methods and application technologies, the focus of developers is gradually shifting from code writing to model design, which brings higher requirements for model maintenance and management.

一方面,模型的版本管理系统需要面对日益增加的模型规模,要求能够对包含上万个模型元素的大模型进行匹配。同时,为了缩短开发周期,常需要多个开发人员对同一模型协作编辑,这就需要模型版本管理系统能够较快的比较同一模型多个版本之间的异同。因此,如何提高大模型间的模型元素的匹配效率,是模型版本控制领域的研究重点。On the one hand, the version management system of the model needs to face the increasing scale of the model, and it is required to be able to match the large model containing tens of thousands of model elements. At the same time, in order to shorten the development cycle, multiple developers often need to edit the same model collaboratively, which requires a model version management system that can quickly compare the similarities and differences between multiple versions of the same model. Therefore, how to improve the matching efficiency of model elements between large models is the focus of research in the field of model version control.

当前的模型元素匹配方法存在下述问题。首先,由于待匹配模型不应仅局限于某一类,即要求算法能对通用模型做出匹配,但是在当前的模型匹配方法算法的通用性不高。其次,当前主流的匹配算法大多基于静态标识符,通过为模型元素分配唯一标识用以直接比较。考虑到模型并行开发时标识符的不确定性,这种方法在多人协作编辑模型时并不适用。再次,常见的模型版本管理工具需要遍历所有的模型元素,其匹配算法的计算复杂度较高。当匹配模型元素较多时,需耗费较长的时间,影响随后的模型冲突检测以及模型合并的效率。The current model element matching method suffers from the following problems. First of all, because the model to be matched should not be limited to a certain type, that is, the algorithm is required to match the general model, but the generality of the algorithm in the current model matching method is not high. Second, most of the current mainstream matching algorithms are based on static identifiers, which assign unique identifiers to model elements for direct comparison. Considering the uncertainty of the identifier when the model is developed in parallel, this method is not suitable for multiple people to edit the model collaboratively. Thirdly, common model version management tools need to traverse all model elements, and their matching algorithms have high computational complexity. When there are many matching model elements, it takes a long time, which affects the efficiency of subsequent model conflict detection and model merging.

因此,针对当前的模型匹配方法存在的问题,需要一种新的模型元素匹配方法以获得更为理想的模型元素匹配结果。Therefore, in view of the problems existing in the current model matching method, a new model element matching method is needed to obtain a more ideal model element matching result.

发明内容Contents of the invention

针对当前的模型匹配方法存在的问题,本发明提供了一种针对类型属性图模型的模型元素匹配方法,所述方法包括以下步骤:Aiming at the problems existing in the current model matching method, the present invention provides a method for matching model elements of a type property graph model, the method comprising the following steps:

步骤一,构建待分析模型的带权多维搜索树,所述带权多维搜索树包含相互间拥有层级从属关系的目录节点以及元素节点,所述元素节点用于描述所述待分析模型中相应的模型元素,所述目录节点包含多个子树,所述目录节点或所述元素节点构造在其从属的目录节点的子树上;Step 1, constructing a weighted multidimensional search tree of the model to be analyzed, the weighted multidimensional search tree includes directory nodes and element nodes that have hierarchical affiliation with each other, and the element nodes are used to describe the corresponding a model element, the directory node contains a plurality of subtrees, the directory node or the element node is constructed on the subtree of its subordinate directory node;

步骤二,针对待匹配模型元素在所述带权多维搜索树上进行区域搜索,从而搜索出所述带权多维搜索树上与所述待匹配模型元素相似的所有元素节点,进而构造所述待匹配模型元素的相似节点集;Step 2: Perform an area search on the weighted multidimensional search tree for the model elements to be matched, so as to search for all element nodes on the weighted multidimensional search tree that are similar to the model elements to be matched, and then construct the to-be-matched A set of similar nodes that match model elements;

步骤三,分别计算所述相似节点集中的每一个元素节点与所述待匹配模型元素的相似度从而确定与所述待匹配模型元素相似度最高的所述元素节点。Step 3: Calculate the similarity between each element node in the similar node set and the model element to be matched to determine the element node with the highest similarity to the model element to be matched.

在一实施例中,所述步骤一包含以下步骤:In one embodiment, said step 1 includes the following steps:

特征向量组构建步骤,根据所述待分析模型的特征构建相应的特征向量从而构建所述待分析模型的第一特征向量组;The eigenvector group construction step is to construct corresponding eigenvectors according to the characteristics of the model to be analyzed so as to construct the first eigenvector group of the model to be analyzed;

带权多维搜索树构建步骤,利用所述第一特征向量组构建所述带权多维搜索树。The weighted multi-dimensional search tree construction step is to use the first feature vector group to construct the weighted multi-dimensional search tree.

在一实施例中,所述特征向量组构建步骤包含以下步骤:In one embodiment, the feature vector group construction step includes the following steps:

分析所述待分析模型从而获取所述待分析模型的特征向量,其中,所述待分析模型的每一个模型元素对应一个所述特征向量,所述模型元素的每一个典型特征对应所述特征向量的一个维度;Analyzing the model to be analyzed so as to obtain a feature vector of the model to be analyzed, wherein each model element of the model to be analyzed corresponds to one of the feature vectors, and each typical feature of the model element corresponds to the feature vector a dimension of

基于所述典型特征的语义为所述特征向量的每一个维度分配权值。Assigning a weight to each dimension of the feature vector based on the semantics of the typical feature.

在一实施例中,所述带权多维搜索树构建步骤包含以下步骤:In one embodiment, the weighted multi-dimensional search tree construction step includes the following steps:

分割目标集构建步骤,创建分割目标集并将所述第一特征向量组加入到所述分割目标集中;The segmentation target set construction step is to create a segmentation target set and add the first feature vector group to the segmentation target set;

分割判断步骤,从所述分割目标集中任选一特征向量组,判断所述特征向量组是否需要被分割并将完成所述判断的特征向量组从所述分割目标集中去除,其中,当所述特征向量组中的特征向量的数目大于用户预先定义的特定值时所述特征向量组需要被分割;The segmentation judging step is to select a feature vector group from the segmentation target set, determine whether the feature vector group needs to be segmented and remove the feature vector group that has completed the judgment from the segmentation target set, wherein, when the When the number of eigenvectors in the eigenvector group is greater than a specific value predefined by the user, the eigenvector group needs to be divided;

分割步骤,当所述特征向量组需要被分割时,基于特定的分割策略对所述特征向量组执行一次分割操作从而获取多个子特征向量组,并将每个所述子特征向量组加入到所述分割目标集中;In the segmentation step, when the feature vector group needs to be divided, perform a segmentation operation on the feature vector group based on a specific segmentation strategy to obtain multiple sub-feature vector groups, and add each of the sub-feature vector groups to the The segmentation target set;

目录节点构造步骤,在当前的所述分割步骤执行完毕后根据当前的所述分割步骤获取的子特征向量组构建相应的目录节点,其中,每一个所述子特征向量组对应所述目录节点的一条子树;The directory node construction step is to construct a corresponding directory node according to the sub-feature vector groups obtained in the current segmentation step after the current segmentation step is completed, wherein each of the sub-feature vector groups corresponds to the directory node a subtree;

元素节点构造步骤,当所述目标特征向量组不需要被分割时,基于所述目标特征向量组构造所述元素节点;An element node construction step, when the target feature vector group does not need to be divided, construct the element node based on the target feature vector group;

分割目标遍历步骤,针对所述分割目标集中所有的特征向量组执行所述分割判断步骤并根据所述分割判断步骤的结果执行所述分割步骤以及所述目录节点构造步骤或所述元素节点构造步骤直到所述分割目标集为空。A segmentation target traversal step, executing the segmentation judgment step for all feature vector groups in the segmentation target set and performing the segmentation step and the directory node construction step or the element node construction step according to the result of the segmentation judgment step until the segmentation target set is empty.

在一实施例中,所述分割策略包含分割维度以及分割值,在所述分割步骤中,将所述特征向量组中每个特征向量在所述分割维度上的值与所述分割值做对比从而获取对比结果,根据所述对比结果将所述特征向量组分割为两个子特征向量组。In one embodiment, the segmentation strategy includes a segmentation dimension and a segmentation value, and in the segmentation step, the value of each feature vector in the feature vector group on the segmentation dimension is compared with the segmentation value Therefore, a comparison result is obtained, and the feature vector group is divided into two sub-feature vector groups according to the comparison result.

在一实施例中,计算所述特征向量组中每个维度对应的权值与所述维度对应的所有所述特征向量的方差的乘积,选取数值最大的所述乘积对应的所述维度作为所述分割维度。In one embodiment, the product of the weight corresponding to each dimension in the feature vector group and the variance of all the feature vectors corresponding to the dimension is calculated, and the dimension corresponding to the product with the largest value is selected as the dimension. Dimensions of segmentation.

在一实施例中,选择所述特征向量组中所述分割维度上所有特征向量的值的中位数作为所述分割值。In an embodiment, the median of values of all feature vectors on the segmentation dimension in the feature vector group is selected as the segmentation value.

在一实施例中,在所述分割步骤中,定义所述两个子特征向量组分别为第一子特征向量组以及第二子特征向量组,其中:In one embodiment, in the dividing step, the two sub-feature vector groups are defined as the first sub-feature vector group and the second sub-feature vector group, wherein:

所述目标特征向量组中的特征向量的值小于所述分割值的所述特征向量归属所述第一子特征向量组;The eigenvectors in the target eigenvector group whose value is smaller than the split value belong to the first sub-eigenvector group;

所述目标特征向量组中的特征向量的值大于所述分割值的所述特征向量归属所述第二子特征向量组;The eigenvectors in the target eigenvector group whose value is greater than the split value belong to the second sub-eigenvector group;

根据所述第一子特征向量组与所述第二子特征向量组是否平衡来确定所述目标特征向量组中的特征向量的值等于所述分割值的所述特征向量的归属。The belonging of the feature vector whose value of the feature vector in the target feature vector group is equal to the split value is determined according to whether the first sub-feature vector group and the second sub-feature vector group are balanced.

在一实施例中,所述目录节点记录有当前目录节点对应的分割维度、所述分割维度对应的分割值、所述分割维度对应的权值、所述当前目录节点对应的所述第一子特征向量组中特征向量的最小值以及所述第二子特征向量组中特征向量的最大值。In an embodiment, the directory node records the segmentation dimension corresponding to the current directory node, the segmentation value corresponding to the segmentation dimension, the weight value corresponding to the segmentation dimension, the first child value corresponding to the current directory node The minimum value of the eigenvectors in the eigenvector group and the maximum value of the eigenvectors in the second sub-group of eigenvectors.

在一实施例中,所述步骤二包含以下步骤:In one embodiment, the second step includes the following steps:

搜索节点集构建步骤,创建搜索节点集并将所述带权多维搜索树上第一个目录节点加入到所述搜索节点集中;The search node set construction step is to create a search node set and add the first directory node on the weighted multidimensional search tree to the search node set;

节点类型判断步骤,在所述搜索节点集中任选一节点,判断所述节点是否为所述元素节点并从所述搜索节点集中去除执行过节点类型判断的所述节点;A node type judgment step, selecting a node in the search node set, judging whether the node is the element node and removing the node type judgment from the search node set;

相似节点获取步骤,当所述节点为所述元素节点时,添加所述节点至所述相似节点集;A similar node obtaining step, when the node is the element node, adding the node to the similar node set;

搜索范围获取步骤,当所述节点不是所述元素节点时,获取待匹配模型元素对应于所述节点的搜索范围;A search scope acquisition step, when the node is not the element node, acquire the search scope corresponding to the node of the model element to be matched;

搜索节点集更新步骤,搜索出满足所述搜索范围的所述节点的子树,将所述子树上的节点加入到所述搜索节点集中;The search node set update step is to search out the subtree of the node satisfying the search range, and add the nodes on the subtree to the search node set;

搜索节点遍历步骤,针对所述搜索节点集中的所有节点执行所述节点类型判断步骤并根据所述节点类型判断步骤的结果执行搜索范围获取步骤以及搜索节点更新步骤或相似节点获取步骤直到所述搜索节点集为空。A search node traversal step of executing the node type judgment step for all nodes in the search node set and performing a search range acquisition step and a search node update step or a similar node acquisition step according to the result of the node type judgment step until the search The node set is empty.

在一实施例中,在所述搜索区域获取步骤中,基于所述搜索节点对应的分割维度的权值定义所述待匹配模型元素在所述分割维度上搜索范围。In an embodiment, in the step of obtaining the search area, the search range of the model element to be matched on the segmentation dimension is defined based on the weight of the segmentation dimension corresponding to the search node.

与现有技术相比,本发明具有如下优点:Compared with prior art, the present invention has following advantage:

本发明的模型元素匹配方法具有较高的通用性;The model element matching method of the present invention has high versatility;

本发明的模型元素匹配方法最大限度的避免了模型并行开发时静态标识符分配的不确定性对匹配计算的影响,使得本发明的方法可以应用于多人协作编辑模型的情况;The model element matching method of the present invention avoids to the greatest extent the influence of the uncertainty of static identifier assignment on the matching calculation during model parallel development, so that the method of the present invention can be applied to the situation of multi-person collaborative editing models;

基于本发明的方法进行模型元素匹配,其匹配过程的计算量大大减小,从而大大减小了模型元素匹配整体过程的耗时,提高了模型元素匹配的执行速度和效率。The matching of model elements based on the method of the present invention greatly reduces the calculation amount of the matching process, thereby greatly reducing the time consumption of the overall process of model element matching, and improving the execution speed and efficiency of model element matching.

本发明的其它特征或优点将在随后的说明书中阐述。并且,本发明的部分特征或优点将通过说明书而变得显而易见,或者通过实施本发明而被了解。本发明的目的和部分优点可通过在说明书、权利要求书以及附图中所特别指出的步骤来实现或获得。Additional features or advantages of the invention will be set forth in the ensuing description. And, some features or advantages of the present invention will be apparent from the description, or be understood by practicing the present invention. The objects and some of the advantages of the invention will be realized or obtained by the steps particularly pointed out in the written description, claims as well as the appended drawings.

附图说明Description of drawings

附图用来提供对本发明的进一步理解,并且构成说明书的一部分,与本发明的实施例共同用于解释本发明,并不构成对本发明的限制。在附图中:The accompanying drawings are used to provide a further understanding of the present invention, and constitute a part of the description, and are used together with the embodiments of the present invention to explain the present invention, and do not constitute a limitation to the present invention. In the attached picture:

图1是本发明一实施例的执行流程图;Fig. 1 is the execution flowchart of an embodiment of the present invention;

图2是本发明一实施例的带权多维搜索树结构示意图;Fig. 2 is a schematic diagram of a weighted multi-dimensional search tree structure according to an embodiment of the present invention;

图3是本发明一实施例的源模型的结构示意图;Fig. 3 is a schematic structural diagram of a source model according to an embodiment of the present invention;

图4是本发明一实施例的修改模型的结构示意图。Fig. 4 is a schematic structural diagram of a modified model according to an embodiment of the present invention.

具体实施方式Detailed ways

以下将结合附图及实施例来详细说明本发明的实施方式,借此本发明的实施人员可以充分理解本发明如何应用技术手段来解决技术问题,并达成技术效果的实现过程并依据上述实现过程具体实施本发明。需要说明的是,只要不构成冲突,本发明中的各个实施例以及各实施例中的各个特征可以相互结合,所形成的技术方案均在本发明的保护范围之内。The implementation of the present invention will be described in detail below in conjunction with the accompanying drawings and examples, so that implementers of the present invention can fully understand how the present invention uses technical means to solve technical problems, and achieve the realization process of technical effects and according to the above-mentioned realization process The present invention is implemented concretely. It should be noted that, as long as there is no conflict, each embodiment and each feature in each embodiment of the present invention can be combined with each other, and the formed technical solutions are all within the protection scope of the present invention.

本发明提供了一种针对类型属性图模型的模型元素匹配方法。为了提高模型元素较多时两模型匹配的效率,本发明的方法采用先初步筛选出待分析模型中与待匹配模型元素相似的多个模型元素,然后再分别计算其相似度,从而最终获得待分析模型中与待匹配模型元素最匹配的模型元素。The invention provides a model element matching method for a type attribute graph model. In order to improve the matching efficiency of the two models when there are many model elements, the method of the present invention initially screens out a plurality of model elements in the model to be analyzed that are similar to the model elements to be matched, and then calculates their similarities respectively, thereby finally obtaining the model elements to be analyzed The model element in the model that best matches the model element to be matched.

经过初步筛选,需要进行相似度计算的模型元素大大减少,从而大大减小了计算量。同时,由于多维搜索树区域搜索较快,所以整体流程耗时大大缩短,匹配效率大大提高。After preliminary screening, the model elements that need to be calculated for similarity are greatly reduced, thereby greatly reducing the amount of calculation. At the same time, due to the fast region search of the multi-dimensional search tree, the overall process time consumption is greatly shortened and the matching efficiency is greatly improved.

下面结合附图和具体实施方式对本发明做进一步的说明。附图的流程图中示出的步骤可以在包含诸如一组计算机可执行指令的计算机系统中执行。虽然在流程图中示出了各步骤的逻辑顺序,但是在某些情况下,可以以不同于此处的顺序执行所示出或描述的步骤。The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments. The steps shown in the flowcharts of the figures can be performed in a computer system comprising, for example, a set of computer-executable instructions. Although a logical order of steps is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that shown or described herein.

为了筛选出待分析模型中与待匹配模型元素相似的模型元素,如图1所示,本发明的方法首先要执行步骤S100,构建待分析模型的带权多维搜索树。为了提高通用性,本发明的方法采用了类型属性图模型作为对象。当待匹配模型元素或待分析模型不属于类型属性图模型时,则使用模型转换工具将待匹配模型元素或待分析模型转换为类型属性图模型。In order to filter out model elements similar to the model elements to be matched in the model to be analyzed, as shown in FIG. 1 , the method of the present invention first executes step S100 to construct a weighted multi-dimensional search tree of the model to be analyzed. In order to improve generality, the method of the present invention adopts the type attribute graph model as the object. When the model element to be matched or the model to be analyzed does not belong to the type attribute graph model, the model conversion tool is used to convert the model element to be matched or the model to be analyzed into a type attribute graph model.

由于类型属性图模型是当前较为常用的模型类型,且绝大多数当前的模型类型都可以利用模型转换工具转化为类型属性图模型。因此相较于现有技术,本发明的方法具有较高的通用性。Since the type attribute graph model is currently a more commonly used model type, and most current model types can be converted into type attribute graph models using model conversion tools. Therefore, compared with the prior art, the method of the present invention has higher versatility.

在本实施例中,采用Eclipse模型框架(Eclipse Model Framework,EMF)构造类型属性图模型。EMF提供一种结构化模型的框架,支持直接设计模型,并生成相应代码。在EMF中,每一个模型元素都可具有多个属性,模型元素间通过关联关系(继承、聚合等)相连接。In this embodiment, an Eclipse model framework (Eclipse Model Framework, EMF) is used to construct a type property graph model. EMF provides a framework for structured models, supports direct design of models, and generates corresponding codes. In EMF, each model element can have multiple attributes, and model elements are connected through association relationships (inheritance, aggregation, etc.).

在本实施例中,根据EMF模型构建的带权多维搜索树基于相互间拥有层级从属关系的多个节点构建。节点分为元素节点以及目录节点两类。其中,目录节点用于描述带权多维搜索树的逻辑构架,元素节点用于表述模型的模型元素。在目录节点之间以及目录节点与元素节点之间存在层级从属关系,目录节点包含多个子树,目录节点或元素节点构造在其从属的目录节点的子树上。In this embodiment, the weighted multi-dimensional search tree constructed according to the EMF model is constructed based on a plurality of nodes having hierarchical affiliation relationships with each other. There are two types of nodes: element nodes and directory nodes. Among them, the directory node is used to describe the logical framework of the weighted multi-dimensional search tree, and the element node is used to express the model elements of the model. There is hierarchical subordination between directory nodes and between directory nodes and element nodes. Directory nodes contain multiple subtrees, and directory nodes or element nodes are constructed on the subtrees of their subordinate directory nodes.

目录节点之间以及目录节点与元素节点之间存在层级从属关系由目录节点记录的目录包含范围与元素节点记录的具体模型元素内容决定。如图2所示,目录节点201为带权多维搜索树最高层级的节点,目录节点202以及目录节点203从属于目录节点201,目录节点202以及目录节点203构建在目录节点201的子树上。元素节点211以及元素节点212从属于目录节点202,元素节点211以及元素节点212构建在目录节点202的子树上。目录节点204以及元素节点215从属于目录节点203,目录节点204以及元素节点215构建在目录节点203的子树上。元素节点213以及元素节点214从属于目录节点204,元素节点213以及元素节点214构建在目录节点204的子树上。元素节点211、212、213、214以及215为带权多维搜索树的最底层的节点。The hierarchical affiliation between directory nodes and between directory nodes and element nodes is determined by the directory inclusion range recorded by directory nodes and the specific model element content recorded by element nodes. As shown in FIG. 2 , directory node 201 is the highest-level node of the weighted multi-dimensional search tree, directory node 202 and directory node 203 are subordinate to directory node 201, and directory node 202 and directory node 203 are built on the subtree of directory node 201. The element node 211 and the element node 212 are subordinate to the directory node 202 , and the element node 211 and the element node 212 are built on the subtree of the directory node 202 . Directory node 204 and element node 215 are subordinate to directory node 203 , and directory node 204 and element node 215 are built on the subtree of directory node 203 . The element node 213 and the element node 214 are subordinate to the directory node 204 , and the element node 213 and the element node 214 are constructed on the subtree of the directory node 204 . The element nodes 211, 212, 213, 214 and 215 are the bottommost nodes of the weighted multidimensional search tree.

接下来详细说明本实施例的带权多维搜索树构建过程,为了构建带权多维搜索树,首先要执行步骤S101,特征向量组构建步骤。在步骤S101中,根据待分析模型的特征构建相应的特征向量从而构建待分析模型的特征向量组。构建特征向量组,首先要分析待分析模型从而获取待分析模型的特征向量。在本实施例中,待分析模型的每一个模型元素对应一个特征向量,模型元素的每一个典型特征对应特征向量的一个维度。Next, the construction process of the weighted multi-dimensional search tree in this embodiment will be described in detail. In order to construct the weighted multi-dimensional search tree, step S101, which is the step of constructing feature vector groups, must be executed first. In step S101, corresponding feature vectors are constructed according to the features of the model to be analyzed so as to construct a feature vector group of the model to be analyzed. To construct the eigenvector group, the model to be analyzed must first be analyzed to obtain the eigenvectors of the model to be analyzed. In this embodiment, each model element of the model to be analyzed corresponds to a feature vector, and each typical feature of the model element corresponds to a dimension of the feature vector.

以一个具体的模型为例,如图3所示的模型,模型一共包含五个模型元素,分别为模型元素301(continent)、模型元素302(city)、模型元素303(district)、模型元素304(country)以及模型元素305(countrylanguage)。模型元素的典型特征可以表述为模型名称长度、属性个数、引用次数、主键属性个数以及非空属性个数。将一个模型元素的每个特征描述的数值排列为一个数组即可获得该模型元素对应的特征向量。具体到每个模型元素如下:Taking a specific model as an example, the model shown in Figure 3 contains five model elements in total, namely model element 301 (continent), model element 302 (city), model element 303 (district), and model element 304 (country) and model element 305 (countrylanguage). The typical characteristics of a model element can be expressed as the length of the model name, the number of attributes, the number of references, the number of primary key attributes, and the number of non-null attributes. The feature vector corresponding to the model element can be obtained by arranging the values of each feature description of a model element into an array. Specific to each model element is as follows:

模型元素301(continent)的典型特征为模型名称长度9、属性个数3、引用次数1、主键属性个数1以及非空属性个数3,可表示为特征向量The typical characteristics of the model element 301 (continent) are the length of the model name 9, the number of attributes 3, the number of references 1, the number of primary key attributes 1, and the number of non-empty attributes 3, which can be expressed as a feature vector

continent(9,3,1,1,3);continent(9,3,1,1,3);

模型元素302(city)的典型特征为模型名称长度4、属性个数4、引用次数2、主键属性个数1以及非空属性个数4,可表示为特征向量The typical characteristics of the model element 302 (city) are the model name length 4, the number of attributes 4, the number of references 2, the number of primary key attributes 1, and the number of non-empty attributes 4, which can be expressed as a feature vector

city(4,4,2,1,4);city(4,4,2,1,4);

模型元素303(district)的典型特征为模型名称长度8、属性个数3、引用次数1、主键属性个数1以及非空属性个数3,可表示为特征向量The typical characteristics of the model element 303 (district) are the model name length 8, the number of attributes 3, the number of references 1, the number of primary key attributes 1, and the number of non-empty attributes 3, which can be expressed as a feature vector

district(8,3,1,1,3);district(8,3,1,1,3);

模型元素304(country)的典型特征为模型名称长度7、属性个数10、引用次数3、主键属性个数1以及非空属性个数7,可表示为特征向量The typical characteristics of the model element 304 (country) are the model name length 7, the number of attributes 10, the number of references 3, the number of primary key attributes 1, and the number of non-empty attributes 7, which can be expressed as a feature vector

country(7,10,3,1,7);country(7, 10, 3, 1, 7);

模型元素305(countrylanguage)的典型特征为模型名称长度15、属性个数4、引用次数1、主键属性个数1以及非空属性个数4,可表示为特征向量The typical characteristics of the model element 305 (countrylanguage) are the model name length 15, the number of attributes 4, the number of references 1, the number of primary key attributes 1, and the number of non-empty attributes 4, which can be expressed as a feature vector

countrylangurage(15,4,1,1,4);countrylanguage(15, 4, 1, 1, 4);

图3所示的模型就可表示为特征向量组:The model shown in Figure 3 can be expressed as a set of feature vectors:

(continent(9,3,1,1,3),(continent(9,3,1,1,3),

city(4,4,2,1,4),city(4,4,2,1,4),

country(7,10,3,1,7),country(7, 10, 3, 1, 7),

countrylangurage(15,4,1,1,4),countrylanguage(15, 4, 1, 1, 4),

district(8,3,1,1,3))      (1)district(8, 3, 1, 1, 3)) (1)

在匹配模型元素分析相似度时,需要考虑到典型特征的不同对模型元素相似度的影响。模型的不同特征对模型整体的相似程度影响不同,例如,关键的典型特征对模型元素相似度的影响要大于非关键的典型特征。在本实施例中,为了使得在之后的相似模型元素筛选过程中优先选择重要的典型特征相似的模型元素,根据模型的典型特征的语义给不同的典型特征分配不同的权值,即对特征向量组的不同维度赋予对应的权值。具体到图3所示的模型中,如表1所示:When analyzing the similarity of matching model elements, it is necessary to take into account the impact of differences in typical features on the similarity of model elements. Different features of the model have different effects on the overall similarity of the model. For example, key typical features have a greater impact on the similarity of model elements than non-key typical features. In this embodiment, in order to preferentially select model elements with similar important typical features in the subsequent similar model element screening process, different weights are assigned to different typical features according to the semantics of the typical features of the model, that is, to the feature vector Different dimensions of the group are given corresponding weights. Specific to the model shown in Figure 3, as shown in Table 1:

典型特征typical features权值Weight模型名称长度Model name length0.30.3属性个数Number of attributes0.250.25引用次数Citations0.250.25主键属性个数Number of primary key attributes0.10.1非空属性个数The number of non-empty attributes0.10.1

表1Table 1

特征向量组构建完毕后就可以根据特征向量组构建带权多维搜索树。由于每个模型包含多个模型元素,在本实施例中,用户可以根据实际需要自行设定用于表述模型的模型元素的元素节点的表述范围,即自行设定一个元素节点对应几个模型元素。在构建带权多维搜索树,用户根据实际需要,预先设定好一个特定值,该特定值就是一个元素节点所对应模型元素的数目。After the eigenvector group is constructed, a weighted multidimensional search tree can be constructed according to the eigenvector group. Since each model contains multiple model elements, in this embodiment, the user can set the expression range of the element nodes used to express the model elements of the model according to actual needs, that is, set one element node to correspond to several model elements . When constructing a weighted multidimensional search tree, the user presets a specific value according to actual needs, and the specific value is the number of model elements corresponding to an element node.

接下来就可以对特征向量组进行分割从而将代表模型元素的特征向量分配到元素节点。在本实施例中,为了便于分割特征向量组,首先执行步骤S108,分割目标集创建步骤,创建分割目标集并将待分析模型所对应的特征向量组加入到分割目标集中。The group of feature vectors can then be segmented to assign feature vectors representing model elements to element nodes. In this embodiment, in order to facilitate the segmentation of feature vector groups, step S108 is first performed, the step of creating a segmentation target set, creating a segmentation target set and adding the feature vector group corresponding to the model to be analyzed into the segmentation target set.

然后就可以执行步骤S102,分割判断步骤,从分割目标集中任选一个特征向量组判断其是否需要被分割并将完成分割判断步骤的特征向量组从分割目标集中去除。不难理解,最初选择的特征向量组为待分析模型所对应的特征向量组。Then step S102 can be executed, the segmentation judgment step, selecting a feature vector group from the segmentation target set to determine whether it needs to be segmented and removing the feature vector group that has completed the segmentation judgment step from the segmentation target set. It is not difficult to understand that the initially selected eigenvector group is the eigenvector group corresponding to the model to be analyzed.

在步骤S102中,当特征向量组中的特征向量的数目大于用户预先定义的特定值时目标特征向量组需要被分割。不难理解,通常在设定特定值的时候,特定值小于待分析模型所对应的特征向量组中的特征向量的数目,因此在最初一开始可以省略步骤S102,直接判断待分析模型所对应的特征向量组需要被分割。In step S102, when the number of eigenvectors in the eigenvector group is greater than a specific value predefined by the user, the target eigenvector group needs to be divided. It is not difficult to understand that usually when setting a specific value, the specific value is smaller than the number of eigenvectors in the eigenvector group corresponding to the model to be analyzed, so step S102 can be omitted at the very beginning, and the eigenvector corresponding to the model to be analyzed can be directly judged Groups of eigenvectors need to be split.

当特征向量组不需要被分割时,接下来执行步骤S104,元素节点构造步骤,基于特征向量组构造元素节点。在步骤S104中,将特征向量组中所有的特征向量加入到元素节点中,从而构造元素节点。When the feature vector group does not need to be divided, step S104 is executed next, the element node construction step, and the element node is constructed based on the feature vector group. In step S104, all the feature vectors in the feature vector group are added to the element node, so as to construct the element node.

当目标特征向量组需要被分割时,接下来执行步骤S103,分割步骤,分割特征向量组。在步骤S103中,基于特定的分割策略对特征向量组执行一次分割操作从而获取多个子特征向量组。这里需要说明的是,根据不同的用户需求,在对特征向量组进行分割时可以一次将其分为不同数目的子特征向量组。在本实施例中,采用的是在一次分割操作中将特征向量组分割为两个子特征向量组的方式。When the target eigenvector group needs to be segmented, step S103 is performed next, a segmentation step, to segment the eigenvector group. In step S103, a segmentation operation is performed on the feature vector group based on a specific segmentation strategy to obtain multiple sub-feature vector groups. What needs to be explained here is that, according to different user requirements, when the feature vector group is divided, it can be divided into different numbers of sub-feature vector groups at one time. In this embodiment, the method of dividing the feature vector group into two sub-feature vector groups in one division operation is adopted.

在本实施例中,在分割策略中定义了分割维度以及分割值两个参数。分割维度是特征向量组中一个特定的维度,即其可对应待分析模型中一个特定的典型特征。分割值则是特征向量组中对应分割维度的特征向量的一个可能取值。In this embodiment, two parameters of a segmentation dimension and a segmentation value are defined in the segmentation strategy. The segmentation dimension is a specific dimension in the feature vector group, that is, it can correspond to a specific typical feature in the model to be analyzed. The split value is a possible value of the feature vector corresponding to the split dimension in the feature vector group.

在本实施例中,将特征向量组中每个特征向量在分割维度上的值与分割值做对比从而获取对比结果,根据对比结果将特征向量组分割为两个子特征向量组。In this embodiment, the value of each eigenvector in the eigenvector group on the segmentation dimension is compared with the segmentation value to obtain a comparison result, and the eigenvector group is divided into two sub-eigenvector groups according to the comparison result.

在本实施例中,针对不同的特征向量组采用不同的分割维度以及分割值。欧式空间上某维度的方差最大,意味着在该方向上,数据最分散。在这样的维度上切割,辨识度也最高,便于提高区域搜索效率。此外,考虑到带权多维搜索树上各维度影响力的不同,因此在本实施例中综合了方差与权值两个因素来确定分割维度。每次选取分割维度时,尽量的选择数据权重大且最分散的维度。即选择dim=i|Max(widi),di表示维度i下所有点的方差。In this embodiment, different segmentation dimensions and segmentation values are used for different feature vector groups. The variance of a certain dimension on the Euclidean space is the largest, which means that the data is most scattered in this direction. Cutting in such a dimension has the highest degree of recognition, which is convenient for improving the efficiency of area search. In addition, considering the influence of each dimension on the weighted multi-dimensional search tree, in this embodiment, the two factors of variance and weight are integrated to determine the division dimension. When selecting a split dimension each time, try to choose the dimension with the largest data weight and the most scattered. That is, choose dim=i|Max(wi di ), where di represents the variance of all points under dimension i.

计算特征向量组中每个维度对应的权值与该维度对应的所有特征向量的方差的乘积,选取数值最大的乘积对应的维度作为分割维度。然后选择目标特征向量组中分割维度上所有特征向量的值的中位数作为分割值。具体为,对特征向量组在分割维度上所有的特征向量的值排序,当特征向量组包含奇数个特征向量时选择上述排序中中间的数值作为分割值,当目标特征向量组包含偶数个特征向量时选择上述排序中中间的两个数值的平均数作为分割值。Calculate the product of the weight corresponding to each dimension in the eigenvector group and the variance of all eigenvectors corresponding to this dimension, and select the dimension corresponding to the product with the largest value as the segmentation dimension. The median of the values of all eigenvectors on the split dimension in the target eigenvector group is then chosen as the split value. Specifically, sort the values of all the eigenvectors of the eigenvector group on the segmentation dimension. When the eigenvector group contains an odd number of eigenvectors, select the value in the middle of the above sorting as the split value. When the target eigenvector group contains an even number of eigenvectors When selecting the average of the two values in the middle of the above sorting as the split value.

在分割特征向量组时:首先使目标特征向量组中的特征向量的值小于分割值的特征向量归属一个子特征向量组;然后使特征向量组中的特征向量的值大于分割值的特征向量归属另一个子特征向量组;最后根据两个子特征向量组之间是否平衡来确定目标特征向量组中的特征向量的值等于分割值的特征向量的归属。When splitting the eigenvector group: first, make the eigenvectors whose value of the eigenvector in the target eigenvector group is smaller than the split value belong to a sub-eigenvector group; then make the eigenvectors in the eigenvector group whose value is greater than the split value belong to Another sub-eigenvector group; finally, according to whether the two sub-eigenvector groups are balanced, determine the attribution of the eigenvector whose value of the eigenvector in the target eigenvector group is equal to the split value.

接下来就可以执行步骤S106,目录节点构造步骤,在当前的分割步骤执行完毕后根据当前的分割步骤获取的子特征向量组构建相应的目录节点。在步骤S106中,每一个子特征向量组对应目录节点的一条子树。Next, step S106 can be executed, the directory node construction step, after the current segmentation step is executed, the corresponding directory node is constructed according to the sub-feature vector groups obtained in the current segmentation step. In step S106, each sub-feature vector group corresponds to a sub-tree of the directory node.

除最高层级的目录节点外,其他的所有目录节点均构建在其从属的目录节点的对应子树上。即当前所要构造的目录节点构建在其所对应的当前的分割操作的对象(当前目标特征向量)对应的子树上。Except for the highest-level directory node, all other directory nodes are built on the corresponding subtrees of their subordinate directory nodes. That is, the directory node to be constructed currently is constructed on the subtree corresponding to the object of the current split operation (current target feature vector).

在本实施例中,一个分割步骤(S103)包含一次分割操作,一次分割操作只对一个目标特征向量组进行分割。针对每次分割操作都可以建一个目录节点,一次分割操作所获得的子特征向量组从属于该次分割操作对应的目录节点,其与目录节点的子树一一对应。In this embodiment, one segmentation step ( S103 ) includes one segmentation operation, and one segmentation operation only divides one target feature vector group. A directory node can be created for each split operation, and the sub-feature vector group obtained by a split operation belongs to the directory node corresponding to the split operation, and corresponds to the subtree of the directory node one by one.

目录节点记录有当前目录节点对应的分割维度、当前目录节点对应的分割维度对应的分割值、当前目录节点对应的分割维度对应的权值、当前目录节点对应的两个子特征向量组中特征向量的最小值以及最大值。The directory node records the segmentation dimension corresponding to the current directory node, the segmentation value corresponding to the segmentation dimension corresponding to the current directory node, the weight corresponding to the segmentation dimension corresponding to the current directory node, and the eigenvectors in the two sub-eigenvector groups corresponding to the current directory node. minimum and maximum values.

由于每次分割都会生成多个子特征向量组,并且针对每个子特征向量组都需要执行分割判断步骤(S102)并根据分割判断步骤(S102)的结果执行分割步骤(S103)、目录节点(S106)或元素节点(S104)构造步骤。为了保证带权多维搜索树的完整(不发生漏掉子特征向量组的情况)。在本实施例中,每次分割步骤S103之后都执行步骤S105,分割目标更新步骤,将分割后获得的每个子特征向量组加入到分割目标集中。Since each segmentation will generate a plurality of sub-feature vector groups, and for each sub-feature vector group, it is necessary to perform the segmentation judgment step (S102) and perform the segmentation step (S103) and directory node (S106) according to the result of the segmentation judgment step (S102). Or an element node (S104) construction step. In order to ensure the integrity of the weighted multi-dimensional search tree (the situation of missing sub-feature vector groups does not occur). In this embodiment, step S105 is executed after each segmentation step S103 , the segmentation target updating step, and each sub-feature vector group obtained after segmentation is added to the segmentation target set.

并且接下来执行分割目标遍历步骤,针对分割目标集中所有的特征向量组执行分割判断步骤(S102)并根据分割判断步骤(S102)的结果执行分割步骤(S103)、目录节点(S106)或元素节点(S104)构造步骤。And then execute the segmentation target traversal step, execute the segmentation judgment step (S102) for all feature vector groups in the segmentation target set and perform the segmentation step (S103), directory node (S106) or element node according to the result of the segmentation judgment step (S102) (S104) Construction step.

当一次分割完成且对应的目录节点被构建好后,即有新的子特征向量组加入到分割目标集此时可直接进行分割判断步骤S102。When a segmentation is completed and the corresponding directory nodes are constructed, a new sub-feature vector group is added to the segmentation target set, and the segmentation judgment step S102 can be performed directly.

当一个元素节点被构建好后即意味着带权多维搜索树在该分支上已到终点,此时就需要对其他没有到达终点的分支进行处理。首先要确定是否有没有达到终点的分支,即执行步骤S107,分割目标集判断步骤,判断分割目标集是否为空。When an element node is constructed, it means that the weighted multidimensional search tree has reached the end point on the branch, and at this time, other branches that have not reached the end point need to be processed. First of all, it is determined whether there is a branch that has not reached the end point, that is, step S107 is executed, the segmenting target set judging step, and it is judged whether the segmenting target set is empty.

当分割目标集不为空时,即存在需要进行处理的子特征向量时,继续执行S102,从分割目标集中任选一个特征向量组进行处理。即从分割目标集中任选一个特征向量组执行步骤S102。并根据步骤S102的结果执行步骤S104或者步骤S103、步骤S105以及步骤S106。重复执行上述过程直到分割目标集为空。当分割目标集为空时,带权多维搜索树完成构建。When the segmentation target set is not empty, that is, there are sub-feature vectors that need to be processed, continue to execute S102, and select a feature vector group from the segmentation target set for processing. That is, a feature vector group is selected from the segmentation target set to execute step S102. And execute step S104 or step S103, step S105 and step S106 according to the result of step S102. Repeat the above process until the segmentation target set is empty. When the split target set is empty, the weighted multidimensional search tree is constructed.

以图3所示的模型为例,定义在该模型的带权多维搜索树中一个元素节点对应一个模型元素。模型对应的特征向量组为式(1)所示的特征向量组。Taking the model shown in FIG. 3 as an example, one element node in the weighted multidimensional search tree defined in the model corresponds to one model element. The eigenvector group corresponding to the model is the eigenvector group shown in formula (1).

首先对模型对应的特征向量组进行分割从而构造目录节点。Firstly, the feature vector group corresponding to the model is divided to construct the directory node.

先选择分割维度。计算特征向量组中各维度的方差与权值的乘积,每一维度计算结果分别为3.9,1.74,0.16,,0,0.216。故选择第一维度(模型名称长度)为分割维度;First select the split dimension. Calculate the product of the variance and the weight of each dimension in the feature vector group, and the calculation results of each dimension are 3.9, 1.74, 0.16, 0, 0.216. Therefore, the first dimension (the length of the model name) is selected as the segmentation dimension;

然后选择分割值,根据第一维度排序,即(4,7,8,9,15),选择其中位数(此处为8)作为分割值;Then select the split value, sort according to the first dimension, that is (4, 7, 8, 9, 15), and select the median (8 here) as the split value;

接着将特征向量组分割为如下两个子特征向量组(向量组a和向量组b)。Next, the eigenvector group is divided into the following two sub-eigenvector groups (vector group a and vector group b).

向量组a:(city(4,4,2,1,4),country(7,10,3,1,7),district(8,3,1,1,3))Vector group a: (city(4, 4, 2, 1, 4), country(7, 10, 3, 1, 7), district(8, 3, 1, 1, 3))

向量组b:(continent(9,3,1,1,3),countrylangurage(15,4,1,1,4))Vector group b: (continent(9, 3, 1, 1, 3), countrylanguage(15, 4, 1, 1, 4))

如图2所示,构建相应的目录节点201以及目录节点201中的子树221和222,向量组b对应子树221,向量组a对应子树222。目录节点201记录有分割维度(第一维度,模型名称长度),分割值(8),分割维度对应的权值(0.3)以及两个子特征向量组中特征向量的最小值以及最大值(4,15)。As shown in FIG. 2 , the corresponding directory node 201 and the subtrees 221 and 222 in the directory node 201 are constructed, the vector group b corresponds to the subtree 221 , and the vector group a corresponds to the subtree 222 . The directory node 201 records the segmentation dimension (the first dimension, the length of the model name), the segmentation value (8), the weight corresponding to the segmentation dimension (0.3), and the minimum and maximum values (4, 15).

接下来对向量组b进行分割并在子树221上构造目录节点202。向量组b被分割为对应子树223的向量组c(continent(9,3,1,1,3))以及对应子树224的向量组d(countrylangurage(15,4,1,1,4))。目录节点202记录有分割维度(第二维度,属性个数),分割值(3.5),分割维度对应的权值(0.25)以及两个子特征向量组中特征向量的最小值以及最大值(3,4)。Next, the vector group b is split and the directory node 202 is constructed on the subtree 221 . Vector group b is divided into vector group c(continent(9,3,1,1,3)) corresponding to subtree 223 and vector group d(countrylangurage(15,4,1,1,4) corresponding to subtree 224 ). The directory node 202 records the segmentation dimension (the second dimension, the number of attributes), the segmentation value (3.5), the weight corresponding to the segmentation dimension (0.25) and the minimum and maximum values (3, 4).

根据向量组c(continent(9,3,1,1,3))以及向量组d(countrylangurage(15,4,1,1,4))分别在子树223上构造元素节点211以及在子树224上构造元素节点212。According to the vector group c (continent (9, 3, 1, 1, 3)) and the vector group d (country language (15, 4, 1, 1, 4)), the element node 211 and the Element node 212 is constructed at 224 .

对向量组a进行分割并在子树222上构造目录节点203。向量组a被分割为对应子树225的向量组e(city(4,4,2,1,4),district(8,3,1,1,3))以及对应子树226的向量组f(country(7,10,3,1,7))。目录节点203记录有分割维度(第二维度,属性个数),分割值(4),分割维度对应的权值(0.25)以及两个子特征向量组中特征向量的最小值以及最大值(3,10)。The vector group a is split and the directory node 203 is constructed on the subtree 222 . Vector group a is divided into vector group e (city (4, 4, 2, 1, 4), district (8, 3, 1, 1, 3)) corresponding to subtree 225 and vector group f corresponding to subtree 226 (country(7,10,3,1,7)). The directory node 203 records the segmentation dimension (the second dimension, the number of attributes), the segmentation value (4), the weight corresponding to the segmentation dimension (0.25) and the minimum and maximum values (3, 10).

根据向量组f(country(7,10,3,1,7))在子树226上构造元素节点215。The element node 215 is constructed on the subtree 226 according to the set of vectors f(country(7, 10, 3, 1, 7)).

对向量组e进行分割并在子树225上构造目录节点204。向量组e被分割为对应子树227的向量组g(city(4,4,2,1,4))以及对应子树228的向量组h(district(8,3,1,1,3))。目录节点204记录有分割维度(第一维度,模型名称长度),分割值(6),分割维度对应的权值(0.3)以及两个子特征向量组中特征向量的最小值以及最大值(4,8)。The vector set e is split and the directory node 204 is constructed on the subtree 225 . Vector group e is divided into vector group g(city(4,4,2,1,4)) corresponding to subtree 227 and vector group h(district(8,3,1,1,3) corresponding to subtree 228 ). Directory node 204 records the segmentation dimension (the first dimension, the length of the model name), the segmentation value (6), the weight corresponding to the segmentation dimension (0.3) and the minimum and maximum values of the feature vectors in the two sub feature vector groups (4, 8).

根据向量组向量组g(city(4,4,2,1,4))以及向量组h(district(8,3,1,1,3))分别在子树227上构造元素节点213以及在子树228上构造元素节点214。Construct element node 213 on subtree 227 and in Element node 214 is constructed on subtree 228 .

回到图1,当带权多维搜索树构建完毕后就可以执行步骤S110,区域搜索步骤。在步骤S110中,针对待匹配模型元素在带权多维搜索树上进行区域搜索,从而搜索出带权多维搜索树上与待匹配模型元素相似的所有节点,进而构造待匹配模型元素的相似节点集。Returning to Fig. 1, step S110, the area search step, can be executed after the weighted multi-dimensional search tree is constructed. In step S110, a region search is performed on the weighted multidimensional search tree for the model element to be matched, so as to search for all nodes similar to the model element to be matched on the weighted multidimensional search tree, and then construct a similar node set of the model element to be matched .

步骤S110的具体执行如下所述,为了便于执行区域搜索,首先执行步骤S117,搜索节点集构建步骤,构建搜索节点集并将带权多维搜索树上最高层级的节点(第一个目录节点)(待分析模型对应的特征向量组在第一次被分割时所构建的目录节点)加入到搜索节点集。The specific execution of step S110 is as follows, in order to facilitate the execution of the area search, first execute step S117, the search node set construction step, build the search node set and use the highest-level node (the first directory node) on the weighted multi-dimensional search tree ( The catalog node constructed when the feature vector group corresponding to the model to be analyzed is first divided) is added to the search node set.

然后执行步骤S111,节点类型判断步骤,在搜索节点集中任选一节点进行搜索,先判断该节点是否为元素节点。同时,从搜索节点集中去除执行过节点类型判断步骤的节点。Then execute step S111, the step of judging the node type, choose a node in the search node set to search, and first judge whether the node is an element node. At the same time, the nodes that have performed the step of judging the node type are removed from the search node set.

由于最初搜索节点集中只有带权多维搜索树上的第一个目录节点。因此最初执行步骤S111的目标为带权多维搜索树上的第一个目录节点,且最初执行步骤S111后搜索节点集为空。Since the initial search node set only has the first directory node on the weighted multidimensional search tree. Therefore, the target of executing step S111 initially is the first directory node on the weighted multi-dimensional search tree, and the set of search nodes is empty after initially executing step S111.

如果当前进行搜索的节点不是元素节点,则执行步骤S113,搜索范围获取步骤。当然的,按照本实施例的方法构建的带权多维搜索树的最高层级的节点为目录节点,因此在本实施例中刚开始搜索时可以不执行步骤S111直接执行步骤S113。If the node currently being searched is not an element node, step S113 is performed, a step of obtaining a search range. Of course, the highest-level node of the weighted multi-dimensional search tree constructed according to the method of this embodiment is a directory node, so step S113 can be directly executed without executing step S111 at the beginning of the search in this embodiment.

在步骤S113中,获取待匹配模型元素对应于当前节点的搜索范围。由于带权多维搜索树中,权值大小的不同决定了空间上各方向的搜索区间也不应该相同。因此在步骤S113中,基于当前节点对应的分割维度的权值定义待匹配模型元素在该分割维度上的搜索范围。In step S113, the search range corresponding to the current node of the model element to be matched is obtained. In the weighted multi-dimensional search tree, the difference in the size of the weight determines that the search intervals in all directions in the space should not be the same. Therefore, in step S113, the search range of the model element to be matched on the segmentation dimension is defined based on the weight of the segmentation dimension corresponding to the current node.

在本实施例中,定义带权多维搜索树的搜索范围如下In this embodiment, the search scope for defining a weighted multidimensional search tree is as follows

RRww=={{((xx11,,xx22,,......xxnno))∈∈RRnno||∀∀11≤≤kk≤≤nno,,wwkk||xxkk--aakk||≤≤dd}}------((22))

其中,Rw为搜索范围,d为搜索半径。wk|xk-ak|≤d表示多维空间某一维度上所有与目标节点a的距离及权值的乘积不超过搜索半径d的点的集合。搜索半径通常根据实际情况设置。Among them, Rw is the search range, and d is the search radius. wk |xk -ak |≤d represents the set of all points whose distance from the target node a and the product of weight do not exceed the search radius d in a certain dimension of the multidimensional space. The search radius is usually set according to the actual situation.

基于搜索范围就可以确定下一个需要进行搜索的节点,因此执行步骤S115,搜索满足条件的子树步骤,从当前的节点的所有子树中搜索出满足搜索范围的子树。Based on the search range, the next node to be searched can be determined. Therefore, step S115 is executed to search for subtrees satisfying the condition, and a subtree satisfying the search range is searched out from all subtrees of the current node.

在实际操作中,步骤S115可能会搜索到多个满足搜索范围的子树(本实施例中,针对一个搜索节点最多搜出2个满足搜索范围的子树)。但是在本实施例中,步骤S111每次只对一个节点进行分析判断。因此为了不漏掉节点,保证所有需要搜索的节点都得到搜索,在本实施例中,需要执行步骤S114,搜索节点集更新步骤,将满足搜索范围的子树上的节点加入到搜索节点集中。In actual operation, step S115 may search for multiple subtrees satisfying the search range (in this embodiment, at most two subtrees satisfying the search range are searched for one search node). However, in this embodiment, step S111 only analyzes and judges one node at a time. Therefore, in order not to miss nodes and ensure that all the nodes that need to be searched are searched, in this embodiment, step S114, the step of updating the search node set, is required to add the nodes on the subtree satisfying the search range to the search node set.

然后在步骤S114之后再次执行步骤S111,从搜索节点集中任选一节点,判断其是否为元素节点。具体到本实施例就是如果搜索到2个子树任取一个子树上的节点执行步骤S111并将另一个子树上的节点加入到搜索节点集;如果只搜索到1个子树则直接对其执行步骤S111且搜索节点集不变。Then step S111 is executed again after step S114 to select a node from the search node set and determine whether it is an element node. Specifically in this embodiment, if two subtrees are searched, any node on one subtree is selected to perform step S111 and the node on the other subtree is added to the search node set; if only one subtree is found, it is directly executed Step S111 and the search node set remains unchanged.

接下来如果新的节点仍然是目录节点那么继续执行步骤S113、步骤S115以及步骤S114。重复执行上述过程直到步骤S111中的节点为元素节点。此时接下来执行步骤S112,相似节点获取步骤,添加当前节点(元素节点)至相似节点集。Next, if the new node is still a directory node, then continue to execute step S113, step S115 and step S114. Repeat the above process until the node in step S111 is an element node. At this time, step S112 is executed next, the similar node acquisition step, adding the current node (element node) to the similar node set.

当一个元素节点被添加至相似节点集即意味着带权多维搜索树在该分支上的搜索已完成,此时就需要对其他没有完成的分支(搜索节点集中的节点)进行处理。在步骤S112之后执行步骤S116,搜索节点集判断步骤,判断搜索节点集中是否还存在节点。当搜索节点集中仍然存在节点时,从搜索节点集中任选一个节点进行步骤S111。当搜索节点集中不存在节点时,搜索完成。When an element node is added to the similar node set, it means that the search on the branch of the weighted multidimensional search tree has been completed. At this time, other unfinished branches (nodes in the search node set) need to be processed. Step S116 is executed after step S112 , a search node set judging step, judging whether there is still a node in the search node set. When there are still nodes in the search node set, select a node from the search node set to proceed to step S111. The search is complete when no node exists in the search node set.

在这里需要指出的是,在步骤S115中存在搜索不到满足搜索范围的子树的情况,此时也说明该分支上的搜索已经完成,即在这种情况下继续执行步骤S116。It should be pointed out here that in step S115 there is a case that no subtree satisfying the search range can be searched, which means that the search on this branch has been completed, that is, in this case, continue to execute step S116.

另外,在本实施例中,当最终搜索完成时,如果相似节点集中不包含任何元素节点,则放宽步骤S113中设定的搜索范围重新搜索。In addition, in this embodiment, when the final search is completed, if the set of similar nodes does not contain any element nodes, the search range set in step S113 is relaxed and the search is performed again.

以一具体的应用为例,图4所示的模型为图3所示的模型的修改版。与图3相比,图4所示的模型中:Taking a specific application as an example, the model shown in FIG. 4 is a modified version of the model shown in FIG. 3 . Compared with Figure 3, in the model shown in Figure 4:

1)将模型元素303(district)改名为street(模型元素403)1) Rename model element 303 (district) to street (model element 403)

2)模型元素302(city)添加属性cityRank(模型元素402)2) Model element 302 (city) adds attribute cityRank (model element 402)

3)添加模型元素406(province)3) Add model element 406 (province)

假如要搜索模型元素404(country)在图3所示模型中的相似节点,则利用如图2所示的带权多维搜索树进行搜索。大致的搜索流程为:If it is desired to search for similar nodes of the model element 404 (country) in the model shown in FIG. 3 , the search is performed using the weighted multidimensional search tree shown in FIG. 2 . The general search process is:

建立搜索节点集(空集),搜索目录节点201;Set up search node set (empty set), search directory node 201;

搜索到子树222为满足搜索范围的子树;The searched subtree 222 is a subtree satisfying the search range;

将目录节点203加入到搜索节点集中;Add directory node 203 to the set of search nodes;

从搜索节点集中取出目录节点203(搜索节点集为空),搜索目录节点203;Take out catalog node 203 (search node set is empty) from search node set, search catalog node 203;

搜索到子树226为满足搜索范围的子树;The searched subtree 226 is a subtree satisfying the search range;

将元素节点215加入到搜索节点集中;Add element node 215 to the set of search nodes;

从搜索节点集中取出元素节点215(搜索节点集为空),搜索元素节点215;Take out the element node 215 from the search node set (the search node set is empty), and search the element node 215;

将元素节点215(country(7,10,3,1,7))加入相似节点集;Add the element node 215 (country (7, 10, 3, 1, 7)) to the similar node set;

搜索节点集为空,搜索结束。The search node set is empty and the search ends.

在本实施例的上述搜索过程中搜索半径设为0.8。In the above-mentioned search process of this embodiment, the search radius is set to 0.8.

获取到相似节点集后就可以执行步骤S120,分别计算相似节点集中的每一个元素节点与待匹配模型元素的相似度从而确定与待匹配模型元素相似度最高的元素节点。After the similar node set is obtained, step S120 can be executed to calculate the similarity between each element node in the similar node set and the model element to be matched, so as to determine the element node with the highest similarity to the model element to be matched.

在本实施例中定义了模型元素相似度计算公式,如下:In this embodiment, the formula for calculating the similarity of model elements is defined, as follows:

Sm=λSw+(1-λ)Ss,0≤λ≤1(3)Sm =λSw +(1-λ)Ss ,0≤λ≤1(3)

公式(3)中,模型元素相似度Sm是空间距离相似度Sw与模型元素字符串相似度Ss两部分的加权平均,λ为介于0至1之间的参数。In formula (3), the model element similarity Sm is the weighted average of the two parts of the spatial distance similarity Sw and the model element string similarity Ss , and λ is a parameter between 0 and 1.

空间距离相似度定义为:The spatial distance similarity is defined as:

SSww==1111++distdistww((αα,,ββ))------((44))

其中:distw(α,β)为两点间的带权欧几里得距离,定义为:Among them: distw (α, β) is the weighted Euclidean distance between two points, defined as:

distdistww((αα,,ββ))==ΣΣkk==11nnowwkk((aakk--bbkk))22------((55))

其中:in:

w→=(w1,w2,...,wn)为权值向量,且Σk=1nwk=1.w &Right Arrow; = ( w 1 , w 2 , . . . , w no ) is a weight vector, and Σ k = 1 no w k = 1 .

字符串相似度的定义则借助于字符串编辑距离。两字符串的编辑距离指从源字符串到目标字符串所需要的最少的编辑操作,包括插入一个特定字符、替换和删除任意字符。编辑距离的大小反应了字符串的相似程度。字符串相似度定义如下:The definition of string similarity is by means of string edit distance. The edit distance between two strings refers to the minimum editing operations required from the source string to the target string, including inserting a specific character, replacing and deleting any character. The size of the edit distance reflects the similarity of the strings. String similarity is defined as follows:

SSsthe s==11--editedit((aa,,bb))MaxMax((LLaa,,LLbb))------((66))

其中,edit(a,b)表示两字符串a,b的编辑距离,La,Lb分别为其长度。当edit(a,b)=0时,Ss=1,表示两字符串完全相等。当edit(a,b)为最大字符串长度时,说明两字符串完全不同,此时相似度为0。公式也体现了相似度同时受编辑长度与字符串长度影响。Among them, edit(a, b) represents the edit distance of two strings a, b, and La and Lb are their lengths respectively. When edit(a,b)=0, Ss =1, indicating that the two character strings are completely equal. When edit(a,b) is the maximum string length, it means that the two strings are completely different, and the similarity is 0 at this time. The formula also reflects that the similarity is affected by both the edit length and the string length.

综上,可从某一模型元素的相似节点集中,计算出待匹配模型中最相似的模型元素。In summary, the most similar model element in the model to be matched can be calculated from the similar node set of a certain model element.

在模型的构建过程中,开发人员、构建环境、应用目标等多种因素都会影响到最终构建完成的模型元素。模型元素的不确定性大大加大了模型元素匹配的难度。由于本发明的方法构建了统一的特征向量组构建标准以及带权多维搜索树构建标准并建立了统一的相似度计算方法,因此针对大大降低了模型元素不确定性最终匹配结果的影响。本发明的方法不基于静态标识符,从而避免了模型并行开发时静态标识符分配的不确定性对匹配计算的影响。进而使得本发明的方法可以应用于多人协作编辑模型的情况。In the process of model construction, various factors such as developers, construction environment, and application goals will affect the final model elements. The uncertainty of model elements greatly increases the difficulty of model element matching. Since the method of the present invention establishes a unified feature vector group construction standard and a weighted multi-dimensional search tree construction standard and a unified similarity calculation method, it greatly reduces the influence of model element uncertainty on the final matching result. The method of the present invention is not based on static identifiers, thereby avoiding the impact of the uncertainty of static identifier allocation on matching calculations during model parallel development. Furthermore, the method of the present invention can be applied to the situation of multi-person collaborative editing model.

下面假设将图4所示模型中的模型元素402(city)的模型名称改为cities,其他特征不变,从而得到新的模型元素(cities)。Next, it is assumed that the model name of the model element 402 (city) in the model shown in FIG. 4 is changed to cities, and other features remain unchanged, so as to obtain a new model element (cities).

假设要搜索图3所示模型中与模型元素(cities)最匹配的模型元素,设定搜索半径为2,则通过搜索图2所示的带权多维搜索树可以获得包含元素节点213以及214的相似节点集(city(4,4,2,1,4),district(8,3,1,1,3))即对应模型元素302以及303。Suppose you want to search for the model elements that best match the model elements (cities) in the model shown in Figure 3, and set the search radius to 2, then by searching the weighted multidimensional search tree shown in Figure 2, you can obtain the Similar node sets (city (4, 4, 2, 1, 4), district (8, 3, 1, 1, 3)) correspond to model elements 302 and 303 .

模型元素(cities)对应的特征向量为(6,5,2,1,5)The eigenvector corresponding to the model element (cities) is (6, 5, 2, 1, 5)

分别计算cities与city和district的相似度。Calculate the similarity between cities and city and district respectively.

设λ=0.5Let λ=0.5

(1)cities与city(1) cities and cities

distw(α,β)=(0.3*2*2+0.25*1*1+0+0+0.1*1)1/2=1.24distw (α,β)=(0.3*2*2+0.25*1*1+0+0+0.1*1)1/2 =1.24

Sw=1/(1+1.24)=0.44Sw =1/(1+1.24)=0.44

Ss=1-3/6=0.5Ss =1-3/6=0.5

Sm=0.5*0.44+0.5*0.5=0.47Sm =0.5*0.44+0.5*0.5=0.47

(2)cities与district(2) cities and districts

distw(α,β)=(0.3*2*2+0.25*2*2+0.25*1+0.1*1)1/2=1.59distw (α,β)=(0.3*2*2+0.25*2*2+0.25*1+0.1*1)1/2 =1.59

Sw=1/(1+1.59)=0.38Sw =1/(1+1.59)=0.38

Ss=1-8/8=0;Ss =1-8/8=0;

Sm=0.5*0.38=0.19Sm =0.5*0.38=0.19

综上可以看出相较于模型元素(district),模型元素(city)与模型元素(cities)的相似度较高。因此在图3所示的模型中与模型元素(cities)相匹配的模型元素为模型元素(city)。上述获取模型元素匹配结果的过程中只需进行2次相似度计算,相较于现有技术中将模型中所有元素遍历计算(本应用例中需要计算5次)的方法,计算量大大减小,计算速度和效率都得到了大大提高,从而大大减小了模型元素匹配整体过程的耗时,提高了模型元素匹配的执行速度和效率。In summary, it can be seen that compared with the model element (district), the similarity between the model element (city) and the model element (cities) is higher. Therefore, in the model shown in FIG. 3 , the model element that matches the model element (cities) is the model element (city). In the above process of obtaining the matching results of model elements, only two similarity calculations are required. Compared with the method of traversing and calculating all elements in the model in the prior art (in this application example, 5 calculations are required), the amount of calculation is greatly reduced. , the calculation speed and efficiency have been greatly improved, thereby greatly reducing the time consumption of the overall process of model element matching, and improving the execution speed and efficiency of model element matching.

虽然本发明所公开的实施方式如上,但所述的内容只是为了便于理解本发明而采用的实施方式,并非用以限定本发明。本发明所述的方法还可有其他多种实施例。在不背离本发明实质的情况下,熟悉本领域的技术人员当可根据本发明作出各种相应的改变或变形,但这些相应的改变或变形都应属于本发明的权利要求的保护范围。Although the embodiments disclosed in the present invention are as above, the described content is only an embodiment adopted for the convenience of understanding the present invention, and is not intended to limit the present invention. The method described in the present invention can also have other various embodiments. Without departing from the essence of the present invention, those skilled in the art can make various corresponding changes or modifications according to the present invention, but these corresponding changes or modifications should all belong to the protection scope of the claims of the present invention.

Claims (11)

CN201510028158.4A2015-01-202015-01-20 A Model Element Matching Method for Type Property Graph ModelActiveCN104598591B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN201510028158.4ACN104598591B (en)2015-01-202015-01-20 A Model Element Matching Method for Type Property Graph Model

Applications Claiming Priority (1)

Application NumberPriority DateFiling DateTitle
CN201510028158.4ACN104598591B (en)2015-01-202015-01-20 A Model Element Matching Method for Type Property Graph Model

Publications (2)

Publication NumberPublication Date
CN104598591Atrue CN104598591A (en)2015-05-06
CN104598591B CN104598591B (en)2017-08-04

Family

ID=53124376

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN201510028158.4AActiveCN104598591B (en)2015-01-202015-01-20 A Model Element Matching Method for Type Property Graph Model

Country Status (1)

CountryLink
CN (1)CN104598591B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN106997335A (en)*2016-01-262017-08-01阿里巴巴集团控股有限公司The decision method and device of identical characters string
CN108491592A (en)*2018-03-062018-09-04中国第汽车股份有限公司CAE simulation results automatic processing method and system
CN115767606A (en)*2022-10-212023-03-07西安电子科技大学Method for acquiring shortest interrupt time under limit of certain fault-tolerant number of time-varying network

Citations (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20030037039A1 (en)*2001-08-162003-02-20International Business Machines CorporationSchema for SQL statements
CN103049503A (en)*2012-12-112013-04-17南京大学UML (Unified Modeling Language) model querying method based on structure matching
CN103390018A (en)*2013-04-282013-11-13浙江工业大学Web service data modeling and searching method based on SDD (service data description)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
US20030037039A1 (en)*2001-08-162003-02-20International Business Machines CorporationSchema for SQL statements
CN103049503A (en)*2012-12-112013-04-17南京大学UML (Unified Modeling Language) model querying method based on structure matching
CN103390018A (en)*2013-04-282013-11-13浙江工业大学Web service data modeling and searching method based on SDD (service data description)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张任伟 等: "基于带权多维搜索树的模型匹配算法", 《清华大学学报(自然科学版)》*

Cited By (4)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN106997335A (en)*2016-01-262017-08-01阿里巴巴集团控股有限公司The decision method and device of identical characters string
CN108491592A (en)*2018-03-062018-09-04中国第汽车股份有限公司CAE simulation results automatic processing method and system
CN108491592B (en)*2018-03-062022-06-14中国第一汽车股份有限公司CAE simulation result automatic processing method and system
CN115767606A (en)*2022-10-212023-03-07西安电子科技大学Method for acquiring shortest interrupt time under limit of certain fault-tolerant number of time-varying network

Also Published As

Publication numberPublication date
CN104598591B (en)2017-08-04

Similar Documents

PublicationPublication DateTitle
Zhang et al.Treepi: A novel graph indexing method
Hua et al.Faster parallel core maintenance algorithms in dynamic graphs
JP5995409B2 (en) Graphical model for representing text documents for computer analysis
CN105677683B (en)Batch data querying method and device
CN106682116A (en)OPTICS point sorting clustering method based on Spark memory computing big data platform
CN103761236A (en)Incremental frequent pattern increase data mining method
CN109614520B (en)Parallel acceleration method for multi-pattern graph matching
WO2015010509A1 (en)One-dimensional liner space-based method for implementing trie tree dictionary search
CN107885503A (en)A kind of iteration based on performance of program analysis compiles optimization method
CN104573039A (en)Keyword search method of relational database
Xu et al.Distributed maximal clique computation and management
CN116301755B (en)Automatic batch flow data marking framework construction method based on directed calculation graph
CN118520008B (en) An intelligent query optimization method and system for Spark SQL
CN105808729A (en)Academic big data analysis method based on reference relationship among pieces of thesis
CN104598591B (en) A Model Element Matching Method for Type Property Graph Model
CN109542949B (en) A Knowledge Acquisition Method for Decision Information System Based on Form Vector
CN116414822B (en)Method and device for constructing seismic data index library, related equipment and index library
CN105631465A (en)Density peak-based high-efficiency hierarchical clustering method
Bause et al.Gradual weisfeiler-leman: Slow and steady wins the race
CN117972111B (en) A knowledge reasoning method based on online graph processing technology for knowledge graph
Rao et al.An approach to merging of two community subgraphs to form a community graph using graph mining techniques
Zhang et al.An improved label propagation algorithm based on the similarity matrix using random walk
CN108717551A (en)A kind of fuzzy hierarchy clustering method based on maximum membership degree
Wetzels et al.Taming horizontal instability in merge trees: On the computation of a comprehensive deformation-based edit distance
US11386155B2 (en)Filter evaluation in a database system

Legal Events

DateCodeTitleDescription
C06Publication
PB01Publication
C10Entry into substantive examination
SE01Entry into force of request for substantive examination
GR01Patent grant
GR01Patent grant

[8]ページ先頭

©2009-2025 Movatter.jp