Technical Field
The present invention relates to the field of information technology, and in particular to a data integration method and apparatus.
Background
Data management is the process of using computer hardware and software to collect, store, process, and apply data effectively, with the aim of making full and effective use of the data. The key to effective data management is data organization. As computer technology has developed, data management has passed through three stages: manual management, file systems, and database systems. The data structures established in a database system describe the intrinsic relationships among data more fully, make data easier to modify, update, and extend, ensure data independence, reliability, security, and integrity, and reduce data redundancy, thereby improving the degree of data sharing and the efficiency of data management.
Data integration is a basic technology and an important component of database system design. As information technology is applied ever more deeply, the demand for information exchange within an enterprise and between the enterprise and the outside world grows stronger, and there is an urgent need to consolidate existing information, connect "information silos", and share information. By integrating and sharing data, an enterprise lets more people make fuller use of existing data resources and reduces duplicated work, such as gathering materials and collecting data, along with the associated costs. At present, when data integration is implemented in industry, data of the same type is often inconsistent, duplicated, or of unknown origin. Effective integrated management of data has therefore become a necessity for strengthening an enterprise's commercial competitiveness.
At present, there are two main data integration approaches in the industry:
1. Passive integration based on database table transfer:
In this approach, integration middleware uses technologies such as JDBC (Java Database Connectivity) and message queues to transfer data from table to table between different application systems.
2. Active data integration based on Web Services:
In this approach, a Web Service is actively invoked and data is transferred using protocols such as SOAP (Simple Object Access Protocol), so that data is passed between different systems.
Different integration approaches can be abstracted, on the basis of graph theory, into a unified graph model, as shown in Fig. 1, where A and B represent different application systems or databases and Data represents the data transferred between the application systems. Such a graph model can express the process of data transfer (in a data lineage graph, this process is the data lineage).
Data lineage is an emerging technical solution that grew out of research on information architecture. It is usually defined as the process that data goes through during its life cycle, from creation to transfer between systems.
In a large database, the transfer relationships among data are complex and the data integration relationships are intricate. Following the transfer relationships yields a complex mesh-like graph, such as a data lineage graph, on which it is difficult to perform impact analysis directly. A better data integration solution is therefore needed to support impact analysis.
Summary of the Invention
Embodiments of the present invention provide a data integration method and apparatus that offer a better data integration solution and facilitate impact analysis.
In a first aspect, an embodiment of the present invention provides a data integration method, including:
obtaining the data integration paths of a graph data set to be displayed, where the data integration paths indicate the paths and directions of transfer between the data;
determining the in-degree of each piece of data according to the paths and directions of transfer between the data; and
taking the data with the smallest in-degree as the root node, and constructing the data structure of the graph data set as a tree data structure.
With reference to the first aspect, in a first possible implementation, the method further includes:
determining the out-degree of each piece of data according to the paths and directions of transfer between the data, and determining whether each piece of data in the graph data set is already included in the tree data structure;
where taking the data with the smallest in-degree as the root node includes:
taking the data with the smallest in-degree as the root node and, if two or more pieces of data share the smallest in-degree, taking as the root node the one among them that has the largest out-degree and is not yet included in the tree data structure.
With reference to the first aspect or the first possible implementation, in a second possible implementation, constructing the data in the graph data set as a tree data structure includes:
traversing the data in the graph data set with a breadth-first algorithm to obtain the tree data structure.
With reference to the first aspect or the first possible implementation, in a third possible implementation, the method further includes at least one of the following:
if two or more root nodes exist, marking an initial-source anomaly;
if two or more pieces of data in the graph data set are data creation points, marking those data creation points as abnormal;
if a data integration path is not in the tree data structure, marking that data integration path as abnormal.
With reference to the first aspect or the first possible implementation, in a fourth possible implementation, the graph data set includes the data set of a data lineage graph, and the data integration paths are the directed edges of the data lineage graph.
In a second aspect, an embodiment of the present invention provides a data integration apparatus, including:
a path obtaining unit, configured to obtain the data integration paths of a graph data set to be displayed, where the data integration paths indicate the paths and directions of transfer between the data;
a degree obtaining unit, configured to determine the in-degree of each piece of data according to the paths and directions of transfer between the data obtained by the path obtaining unit; and
a data integration unit, configured to take the data with the smallest in-degree as the root node and construct the data structure of the graph data set as a tree data structure.
With reference to the second aspect, in a first possible implementation, the apparatus further includes:
a node determining unit, configured to determine whether each piece of data in the graph data set is included in the tree data structure built by the data integration unit;
the degree obtaining unit is further configured to determine the out-degree of each piece of data according to the paths and directions of transfer between the data;
the data integration unit is specifically configured to take the data with the smallest in-degree as the root node and, if two or more pieces of data share the smallest in-degree, to take as the root node the one among them that has the largest out-degree and is not yet included in the tree data structure.
With reference to the second aspect or the first possible implementation, in a second possible implementation, the data integration unit is specifically configured to traverse the data in the graph data set with a breadth-first algorithm to obtain the tree data structure.
With reference to the second aspect or the first possible implementation, in a third possible implementation, the apparatus further includes:
an anomaly marking unit, configured to perform at least one of the following: if two or more root nodes exist, marking an initial-source anomaly; if two or more pieces of data in the graph data set are data creation points, marking those data creation points as abnormal; if a data integration path is not in the tree data structure, marking that data integration path as abnormal.
With reference to the second aspect or the first possible implementation, in a fourth possible implementation, the path obtaining unit is specifically configured to obtain the directed edges of a data lineage graph to be displayed, where the directed edges indicate the paths and directions of transfer between the data.
As can be seen from the above technical solutions, embodiments of the present invention have the following advantage: the complex mesh structure of the graph data set is converted into a tree data structure, which makes the data relationships after integration more concise. The tree data structure is more readable, constitutes a better data integration solution, and facilitates impact analysis.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a prior-art data integration graph model;
Fig. 2 is a schematic flowchart of a method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a data lineage graph model according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of a method according to an embodiment of the present invention;
Fig. 5A is a schematic diagram of an anomaly according to an embodiment of the present invention;
Fig. 5B is a schematic diagram of an anomaly according to an embodiment of the present invention;
Fig. 5C is a schematic diagram of an anomaly according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a device according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
An embodiment of the present invention provides a data integration method which, as shown in Fig. 2, includes:
201: Obtain the data integration paths of a graph data set to be displayed, where the data integration paths indicate the paths and directions of transfer between the data.
A graph data set is a data set whose data, according to the transfer relationships among them, form a graph structure. It may be an already formed graph data structure, such as a data lineage graph (also called a data lineage directed graph) or an ER (Entity Relationship) diagram. If the graph data set is the data set of a data lineage graph, the data integration paths are the directed edges of that graph. Because the data set may be the data set of a data lineage graph or the devices in an ER diagram, "data" here should be understood in a broad sense: it stands for data stored in a database, or for data that corresponds to a specific application system, physical device, or the like.
The data lineage graph model G(V, E, Data) is defined as follows:
V is the set of vertices of the data lineage graph. A vertex is a data source, that is, an application system or database in which the data (Data) is created or stored.
E is the set of directed edges of the data lineage graph. An edge is a data integration path, E = edge(u, v) with u, v ∈ V and u ≠ v, that is, the path and direction along which data is transferred between data sources. Special note: a data lineage graph must not contain an integration path from a data source to itself, that is, a directed edge whose source and destination are the same data source. Integration is defined as data transfer between different data sources.
Data is the data transferred in the data lineage directed graph. It is generally defined as a data entity and may correspond to a table in a database.
In a data lineage graph, the labels on a data integration path indicate the operations that a data source performs on the data during transfer. There are four main kinds of operation: Create denotes creating and entering data; Read denotes reading data; Update denotes updating data; Delete denotes deleting data. A data source is the application system or database where the data resides. The initial source is the data source in the data lineage graph where the data is first created and entered. An initial source is itself a data source.
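As a minimal sketch of the G(V, E, Data) model described above, the lineage graph of one data entity could be held as a set of data sources, a list of directed edges, and the operation letters of each source. The names `LineageGraph`, `add_edge`, `set_ops`, and `ops` are hypothetical and chosen only for illustration; the text does not prescribe any particular representation.

```python
from dataclasses import dataclass, field


@dataclass
class LineageGraph:
    """Lineage graph for one data entity (hypothetical structure)."""
    data: str                                   # Data: the entity, e.g. a table name
    sources: set = field(default_factory=set)   # V: data sources
    edges: list = field(default_factory=list)   # E: (upstream, downstream) pairs
    ops: dict = field(default_factory=dict)     # data source -> set of C/R/U/D letters

    def add_edge(self, u, v):
        # The model forbids a path whose source and destination coincide.
        if u == v:
            raise ValueError(f"self-loop on data source {u!r} is not allowed")
        self.sources.update((u, v))
        if (u, v) not in self.edges:
            self.edges.append((u, v))

    def set_ops(self, source, letters):
        # Record which of Create/Read/Update/Delete the source performs.
        unknown = set(letters) - set("CRUD")
        if unknown:
            raise ValueError(f"unknown operations: {sorted(unknown)}")
        self.sources.add(source)
        self.ops[source] = set(letters)
```

Rejecting self-loops at insertion time keeps the u ≠ v constraint of the model true for every graph built this way, so later steps do not need to re-check it.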
202: Determine the in-degree of each piece of data according to the paths and directions of transfer between the data.
The term "in-degree" comes from graph theory. It usually refers to the number of edges of a directed graph that end at a given vertex. Correspondingly, "out-degree" refers to the number of edges that start at a given vertex. Both terms are also widely used in the data-structure techniques of data integration.
In the data lineage graph model G(V, E, Data), the in-degree of a data source in V is the number of directed edges whose destination is that data source. Normally, a system can take only one definite data source as its input. The out-degree is the number of directed edges whose source is that data source. Normally, one data source may be integrated into multiple downstream application systems (data sources).
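The degree computation can be illustrated with a short sketch. It assumes the graph is given as a collection of data source names plus a list of (upstream, downstream) pairs; the function name `degrees` and the sample edges are illustrative and not taken from the figures.

```python
from collections import Counter


def degrees(sources, edges):
    """Return (in_degree, out_degree) dicts covering every data source."""
    indeg = Counter(v for _, v in edges)    # edges ending at each source
    outdeg = Counter(u for u, _ in edges)   # edges starting at each source
    return ({s: indeg[s] for s in sources},
            {s: outdeg[s] for s in sources})


# Hypothetical example: A feeds B and C, and B feeds D.
edges = [("A", "B"), ("A", "C"), ("B", "D")]
in_deg, out_deg = degrees("ABCD", edges)
```

Here `A` has in-degree 0 and out-degree 2, which makes it the natural candidate for the root in the next step.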
203: Take the data with the smallest in-degree as the root node, and construct the data structure of the graph data set as a tree data structure.
A tree data structure, or tree structure for short, is a data structure in which the data elements stand in a one-to-many tree relationship. In a tree structure, the root node has no predecessor node, and every other node has exactly one predecessor node. A leaf node has no successor nodes, and every other node may have one or more successor nodes. In mathematical statistics, a tree structure can also represent hierarchical relationships. In a tree data structure, a transfer relationship exists between a predecessor node and its successor nodes, and this relationship can represent a data integration path.
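The tree properties stated above (no predecessor for the root, exactly one predecessor for every other node) can be checked mechanically. The following is a sketch under the assumption that the candidate tree is stored as a `parent` dictionary mapping each non-root node to its single predecessor; `is_tree` is a hypothetical helper name.

```python
def is_tree(nodes, root, parent):
    """Check that `parent` describes a tree over `nodes` rooted at `root`."""
    if root in parent:
        return False                 # the root must have no predecessor
    for n in nodes:
        seen = set()
        while n != root:             # walk predecessors up towards the root
            if n in seen or n not in parent:
                return False         # cycle found, or a node with no predecessor
            seen.add(n)
            n = parent[n]
    return True
```

Because a dictionary holds at most one value per key, the "only one predecessor" condition is enforced by the representation itself, and the function only has to verify reachability of the root.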
In this embodiment, the complex mesh structure of the graph data set is converted into a tree data structure, which makes the data relationships after integration more concise. The tree data structure is more readable, constitutes a better data integration solution, and facilitates impact analysis.
Further, because several pieces of data may share the same smallest in-degree, the embodiment provides a further solution. The method further includes: determining the out-degree of each piece of data according to the paths and directions of transfer between the data, and determining whether each piece of data in the graph data set is already included in the tree data structure.
Taking the data with the smallest in-degree as the root node then includes: taking the data with the smallest in-degree as the root node and, if two or more pieces of data share the smallest in-degree, taking as the root node the one among them that has the largest out-degree and is not yet included in the tree data structure.
Preferably, constructing the data in the graph data set as a tree data structure includes: traversing the data in the graph data set with a breadth-first algorithm to obtain the tree data structure. Note that the lineage base tree may also be generated with a depth-first algorithm. The preferred breadth-first solution should not be understood as the only limitation on embodiments of the present invention.
Further, embodiments of the present invention also provide schemes for marking anomalies. The method further includes at least one of the following:
A tree data structure can have only one root node. For example, in a data lineage graph, suppose a data source is identified as a second initial source. An initial source is the point where data is first created and entered, and under business rules the existence of two or more initial sources introduces data inconsistency or data duplication. On this basis, the embodiment provides the following anomaly identification scheme: if two or more root nodes exist, an initial-source anomaly is marked. Note that marking an initial-source anomaly does not require the tree data structure to have been generated first.
Multiple C points (data creation points) mean that the system has multiple data entry points, which introduces problems such as data inconsistency and duplication. The embodiment provides the following anomaly identification scheme: if two or more pieces of data in the graph data set are data creation points, those data creation points are marked as abnormal.
The embodiment also provides a scheme for identifying abnormal loops. An example of an abnormal loop: when data is transferred from data source D to data source B, it conflicts with the data transferred from A to B, which introduces data inconsistency and duplication. The scheme is as follows: if a data integration path is not in the tree data structure, that data integration path is marked as abnormal. These anomaly identification schemes are explained in more detail in the later example, which uses the data lineage graph as its application background.
The above anomaly identification schemes can greatly improve the efficiency and accuracy of data governance and data impact analysis in medium and large enterprises. Combined with automated tools, they allow well-presented data lineage graphs to be generated quickly and dynamically, and the related data anomalies to be identified automatically and accurately.
The following example, which uses the data lineage graph as its application background, describes the above embodiments in more detail.
1. The data lineage graph model G(V, E, Data) is defined as follows:
V is the set of vertices of the graph. A vertex is a data source, that is, an IT application system or database in which the data (Data) is created or stored. The circles labeled A, B, C, D, E, and F in Fig. 3 are data sources.
E is the set of directed edges of the graph. An edge is a data integration path, E = edge(u, v) with u, v ∈ V and u ≠ v, that is, the path and direction along which data is transferred between data sources. Special note: a data lineage graph must not contain an integration path from a data source to itself, that is, a directed edge whose source and destination are the same data source. Integration is defined as data transfer between different data sources. The directions are indicated by the arrows in Fig. 3.
The letters C, R, U, and D shown in squares at the upper right of the data in the data lineage graph denote the operations that the data source holding the data performs on it. C denotes creating and entering data; R denotes reading data; U denotes updating data; D denotes deleting data. Figs. 5A to 5C use the same labeling, which is not explained again.
2. Application scenario of the embodiment:
The embodiment can be applied to a data integration management platform to monitor data integration and raise alarms based on the integration anomalies identified. In order of execution, the work consists of three parts: collection of data integration metadata, generation of the data lineage graph, and identification and alarming of data lineage anomalies.
Data integration metadata collection: the metadata of data integration is collected manually or automatically through the data integration management platform. It includes the transferred data (generally in units of tables, corresponding to business data entities), the upstream data source, and the downstream data source.
Data lineage graph generation: the collected data integration metadata is assembled into a data lineage graph according to the data lineage graph model.
Data lineage anomaly identification and alarming: based on the generated data lineage graph, anomalies are found by following the data lineage anomaly identification steps, and an alarm is raised.
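The graph generation part can be sketched as follows, assuming each collected metadata record is a (data, upstream, downstream) triple. The record layout, the sample table and system names, and the function name `build_lineage_graphs` are assumptions made for illustration. Silently skipping self-referencing records is also an assumption; a real platform might prefer to report them.

```python
from collections import defaultdict


def build_lineage_graphs(records):
    """Group (data, upstream, downstream) records into one edge list per data entity."""
    graphs = defaultdict(list)
    for data, upstream, downstream in records:
        if upstream == downstream:
            continue                     # the model excludes self-loops
        edge = (upstream, downstream)
        if edge not in graphs[data]:     # ignore duplicate metadata records
            graphs[data].append(edge)
    return dict(graphs)


# Hypothetical metadata records.
records = [
    ("customer", "CRM", "ERP"),
    ("customer", "ERP", "DW"),
    ("customer", "CRM", "ERP"),
    ("order", "ERP", "DW"),
    ("order", "DW", "DW"),
]
graphs = build_lineage_graphs(records)
```

Each value in `graphs` is the edge list of one data entity's lineage graph and can be passed to the anomaly identification steps independently.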
3. Identification of data lineage anomalies, shown in Fig. 4, includes the following steps:
401: Search for an initial source among all data sources not yet included in the lineage base tree.
Identification rule: first find the data source with the smallest in-degree. If several data sources have equal in-degree, select as the initial source the one that has the smallest in-degree and the largest out-degree and is not yet included in the lineage base tree.
Anomaly identification in step 401: if multiple initial sources exist, mark an initial-source anomaly. Normally, only one initial source is allowed.
Anomaly example, as shown in Fig. 5A: data source C is identified as a second initial source. An initial source is the point where data is first created and entered, and under business rules the existence of two or more initial sources introduces data inconsistency or data duplication. An initial-source anomaly can therefore be marked.
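The identification rule of step 401 can be sketched as a single selection function. The sketch assumes that degrees are computed over the whole graph and that every edge endpoint appears in `sources`; the text does not say whether degrees should be recomputed over the remaining sources only. `pick_initial_source` and the sample graph are hypothetical and are not the graph of Fig. 5A.

```python
def pick_initial_source(sources, edges, in_tree=frozenset()):
    """Pick the next initial source among data sources not yet in the base tree."""
    candidates = [s for s in sources if s not in in_tree]
    if not candidates:
        return None
    indeg = {s: 0 for s in sources}
    outdeg = {s: 0 for s in sources}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    # Smallest in-degree first; ties broken by largest out-degree.
    # Any remaining tie falls to the first candidate in iteration order.
    return min(candidates, key=lambda s: (indeg[s], -outdeg[s]))


sources = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "D"), ("C", "D")]
first = pick_initial_source(sources, edges)
# Suppose the tree grown from `first` covers A, B and D.
second = pick_initial_source(sources, edges, in_tree={"A", "B", "D"})
initial_source_anomaly = second is not None
```

In this sample, `A` and `C` both have in-degree 0, and `A` wins on out-degree. `C` is then found as a second initial source, which is the situation the text marks as an initial-source anomaly.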
402: Determine whether the search in step 401 found an initial source. If no initial source was identified, go to step 404; otherwise, go to step 403.
403: With the initial source as the root node, generate the lineage base tree and identify abnormal C points (data creation points).
Rule for generating the lineage base tree: start from the initial source and follow the data governance rule that the widest data transfer range with the shortest transfer path is optimal. In other words, traverse the data sources downstream of the initial source with a breadth-first algorithm to generate the lineage base tree. The generated lineage base tree is itself a tree data structure.
Note that the lineage base tree may also be generated with a depth-first algorithm. The preferred breadth-first solution should not be understood as the only limitation on embodiments of the present invention.
Anomaly identification in step 403: during the traversal, count the C points among the data sources. If there is more than one C point, mark the abnormal C points. Anomaly example: as shown in Fig. 5B, multiple C points (data creation points) mean that the system has multiple data entry points, which introduces problems such as data inconsistency and duplication.
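Step 403 can be sketched as one breadth-first traversal that builds the base tree and collects C points along the way. This is an illustration under assumptions: `ops` maps each data source to a string of operation letters, all C points are reported once more than one is found, and the sample graph is hypothetical rather than the one in Fig. 5B.

```python
from collections import deque


def build_base_tree(root, edges, ops):
    """Breadth-first traversal from `root`.

    Returns (parent, creators): `parent` maps each reached data source to its
    predecessor in the tree (None for the root); `creators` lists the reached
    data sources whose operations include "C".
    """
    downstream = {}
    for u, v in edges:
        downstream.setdefault(u, []).append(v)

    parent = {root: None}
    creators = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if "C" in ops.get(u, ""):
            creators.append(u)
        for v in downstream.get(u, []):
            if v not in parent:          # the first visit fixes the tree edge u -> v
                parent[v] = u
                queue.append(v)
    return parent, creators


edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
ops = {"A": "CRUD", "B": "R", "C": "CR", "D": "R"}
parent, creators = build_base_tree("A", edges, ops)
abnormal_c_points = creators if len(creators) > 1 else []
```

Breadth-first order means each data source is attached through a shortest path from the initial source, which matches the stated preference for the shortest transfer path.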
404: Traverse the lineage base tree and flag abnormal loops.
Rule for flagging abnormal loops: traverse all directed edges of the lineage graph; any edge that is not on the lineage base tree is flagged as an abnormal loop.
In step 404, anomaly identification proceeds as follows: an abnormal loop represents backflow in the data transfer process and introduces data inconsistency. Anomaly example, as shown in FIG. 5C: the data transferred from data source D to data source B conflicts with the data transferred from data source A to data source B, which introduces data inconsistency and duplication.
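The flagging rule amounts to a set difference between the graph's edges and the tree's edges. The sketch below assumes the tree is stored as a child-to-parent map; the graph is a hypothetical one loosely modelled on the D-to-B conflict described above, not a reproduction of FIG. 5C.

```python
def find_abnormal_edges(edges, parent):
    """Return every directed edge that is not an edge of the base tree.

    edges: list of (source, target) pairs of the lineage graph.
    parent: dict mapping each tree node to its parent (root -> None).
    """
    return [(src, dst) for src, dst in edges if parent.get(dst) != src]


# Hypothetical lineage graph and its breadth-first base tree rooted at A.
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "B")]
parent = {"A": None, "B": "A", "C": "A", "D": "C"}
abnormal = find_abnormal_edges(edges, parent)
```

Here B already receives data from A in the tree, so the extra edge from D into B is the one reported.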
The above embodiment converts the complex mesh structure of a graph data set into a tree data structure, so that the data relationships after integration are more concise and more readable. This makes for a better data integration scheme and facilitates impact analysis. The above anomaly identification scheme can greatly improve the efficiency and accuracy of data governance and data impact analysis in medium and large enterprises. Combined with automated tools, it can quickly and dynamically generate well-presented data lineage graphs and automatically and accurately identify the related data anomalies.
An embodiment of the present invention further provides a data integration apparatus, as shown in FIG. 6, comprising:
a path acquisition unit 601, configured to acquire the data integration paths of the graph data set to be displayed, where the data integration paths indicate the path and direction of transfer between data items;
A graph data set is a data set whose items form a graph structure according to the transfer relationships between them. It may be an already-formed graph data structure, such as a data lineage graph (also called a data lineage directed graph) or an ER (Equipment Room) diagram. If the graph data set is the data set of a data lineage graph, the data integration paths are the directed edges of that graph. It can be understood that, because the data set may be the data of a data lineage graph or the equipment in an ER diagram, "data" here should be understood broadly: it represents data stored in a database, or data that corresponds to a specific application system, physical device, and so on.
a degree acquisition unit 602, configured to determine the in-degree of each data item according to the path and direction of transfer between data items acquired by the path acquisition unit 601;
a data integration unit 603, configured to take the data item with the smallest in-degree as the root node and construct the data structure of the graph data set into a tree data structure.
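What units 602 and 603 compute before building the tree can be sketched as below. The function names and the `(source, target)` edge list are assumptions for illustration; the in-degree of a node is simply the number of directed edges that end at it.

```python
from collections import Counter


def in_degrees(edges):
    """Map every node that appears in the edge list to its in-degree."""
    nodes = {node for edge in edges for node in edge}
    incoming = Counter(dst for _, dst in edges)
    return {node: incoming.get(node, 0) for node in nodes}


def min_in_degree_nodes(edges):
    """Return the nodes with the smallest in-degree (root candidates).

    Assumes the edge list is non-empty.
    """
    degrees = in_degrees(edges)
    lowest = min(degrees.values())
    return sorted(node for node, deg in degrees.items() if deg == lowest)


edges = [("A", "B"), ("A", "C"), ("C", "D"), ("D", "B")]
degrees = in_degrees(edges)
candidates = min_in_degree_nodes(edges)
```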
The above embodiment converts the complex mesh structure of a graph data set into a tree data structure, so that the data relationships after integration are more concise and more readable. This makes for a better data integration scheme and facilitates impact analysis.
Further, because several data items may share the same minimum in-degree, an embodiment of the present invention provides a further solution. As shown in FIG. 7, the apparatus further comprises:
a node determination unit 701, configured to determine whether each data item of the graph data set is contained in the tree data structure built by the data integration unit 603;
the degree acquisition unit 602 is further configured to determine the out-degree of each data item according to the path and direction of transfer between data items;
the data integration unit 603 is specifically configured to take the data item with the smallest in-degree as the root node and, if two or more data items have the smallest in-degree, to take as the root node the one among them that has the largest out-degree and is not yet contained in the tree data structure.
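The tie-breaking rule can be sketched as a single selection function. The name `pick_root` and the `in_tree` parameter are hypothetical, and resolving exact out-degree ties alphabetically is an assumption made only so the result is deterministic; the text does not specify that case.

```python
from collections import Counter


def pick_root(edges, in_tree=frozenset()):
    """Choose the next root among the minimum in-degree nodes.

    Among nodes with the smallest in-degree that are not already in the
    tree, return the one with the largest out-degree, or None if every
    such node is already in the tree. Assumes a non-empty edge list.
    """
    nodes = {node for edge in edges for node in edge}
    in_deg = Counter(dst for _, dst in edges)
    out_deg = Counter(src for src, _ in edges)

    lowest = min(in_deg[node] for node in nodes)
    candidates = sorted(
        node for node in nodes if in_deg[node] == lowest and node not in in_tree
    )
    if not candidates:
        return None
    # max() keeps the first maximal element, so ties resolve alphabetically.
    return max(candidates, key=lambda node: out_deg[node])


# A and E both have in-degree 0; A has the larger out-degree.
edges = [("A", "B"), ("A", "C"), ("E", "C")]
first_root = pick_root(edges)
second_root = pick_root(edges, in_tree={"A", "B", "C"})
```

Calling the function again with the nodes already placed in the tree yields the next root, which is how a second tree for E would be started in this example.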
Optionally, the data integration unit 603 is specifically configured to traverse the data in the graph data set by a breadth-first algorithm to obtain the tree data structure.
It should be noted that the lineage base tree may also be generated by a depth-first algorithm. Generating it breadth-first is a preferred option and should not be construed as the only limitation on the embodiments of the present invention.
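For comparison, a depth-first variant might look like the sketch below. It is an illustration under assumed names (`build_tree_dfs`, a `(source, target)` edge list, a child-to-parent map) and uses an explicit stack instead of recursion.

```python
def build_tree_dfs(edges, root):
    """Depth-first spanning tree; returns a child -> parent map."""
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)

    parent = {root: None}
    stack = [root]
    while stack:
        node = stack[-1]
        # Descend into the first downstream node not reached so far.
        nxt = next((n for n in downstream.get(node, []) if n not in parent), None)
        if nxt is None:
            stack.pop()  # nothing left below this node, so backtrack
        else:
            parent[nxt] = node
            stack.append(nxt)
    return parent


edges = [("A", "B"), ("A", "C"), ("B", "C")]
dfs_tree = build_tree_dfs(edges, "A")
```

In this graph the depth-first tree attaches C under B, whereas a breadth-first traversal would attach C directly to A. The choice of algorithm therefore changes which edges end up outside the tree.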
Further, an embodiment of the present invention provides a scheme for flagging anomalies. As shown in FIG. 8, the apparatus further comprises:
an anomaly flagging unit 801, configured to perform at least one of the following: if two or more root nodes exist, flag an initial source anomaly; if two or more data items in the graph data set are data creation points, flag those data creation points as abnormal; if a data integration path is not in the tree data structure, flag that data integration path as abnormal.
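The three checks of unit 801 could be gathered into one report, as in the sketch below. The `AnomalyReport` type, the function name, and the input shapes are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AnomalyReport:
    initial_source_anomaly: bool = False
    abnormal_creation_points: list = field(default_factory=list)
    abnormal_paths: list = field(default_factory=list)


def flag_anomalies(roots, creation_points, edges, parent):
    """Apply the three flagging rules to an already-built tree.

    roots: root nodes found while building the tree(s).
    creation_points: set of nodes that are data creation points.
    edges: (source, target) pairs of the graph data set.
    parent: child -> parent map of the tree (root -> None).
    """
    report = AnomalyReport()
    if len(roots) >= 2:
        report.initial_source_anomaly = True
    if len(creation_points) >= 2:
        report.abnormal_creation_points = sorted(creation_points)
    # A path is abnormal when it is not one of the tree's edges.
    report.abnormal_paths = [(s, d) for s, d in edges if parent.get(d) != s]
    return report


report = flag_anomalies(
    roots=["A"],
    creation_points={"A", "C"},
    edges=[("A", "B"), ("A", "C"), ("C", "D"), ("D", "B")],
    parent={"A": None, "B": "A", "C": "A", "D": "C"},
)
```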
The above anomaly identification scheme can greatly improve the efficiency and accuracy of data governance and data impact analysis in medium and large enterprises. Combined with automated tools, it can quickly and dynamically generate well-presented data lineage graphs and automatically and accurately identify the related data anomalies.
Optionally, the path acquisition unit 601 is specifically configured to acquire the directed edges of the data lineage graph to be displayed, where the directed edges indicate the path and direction of transfer between data items.
An embodiment of the present invention further provides a device, as shown in FIG. 9, which can be used to implement data integration and comprises a receiving device 901, a sending device 902, a storage device 903, and a processor 904;
The processor 904 is configured to: acquire the data integration paths of the graph data set to be displayed, where the data integration paths indicate the path and direction of transfer between data items; determine the in-degree of each data item according to the path and direction of transfer between data items; and take the data item with the smallest in-degree as the root node and construct the data structure of the graph data set into a tree data structure.
The graph data set may be stored in the storage device 903 or received through the receiving device 901. The final tree data structure may be stored in the storage device 903 or sent to other devices through the sending device 902. The embodiments of the present invention do not limit this.
The above embodiment converts the complex mesh structure of a graph data set into a tree data structure, so that the data relationships after integration are more concise and more readable. This makes for a better data integration scheme and facilitates impact analysis.
Further, because several data items may share the same minimum in-degree, an embodiment of the present invention provides a further solution: the processor 904 is further configured to determine the out-degree of each data item according to the path and direction of transfer between data items, and to determine whether each data item of the graph data set is contained in the tree data structure;
Taking the data item with the smallest in-degree as the root node comprises: taking the data item with the smallest in-degree as the root node and, if two or more data items have the smallest in-degree, taking as the root node the one among them that has the largest out-degree and is not yet contained in the tree data structure.
Preferably, the processor 904 is configured to traverse the data in the graph data set by a breadth-first algorithm to obtain the tree data structure.
It should be noted that the lineage base tree may also be generated by a depth-first algorithm. Generating it breadth-first is a preferred option and should not be construed as the only limitation on the embodiments of the present invention.
Further, an embodiment of the present invention provides a scheme for flagging anomalies: the processor 904 is further configured to perform at least one of the following: if two or more root nodes exist, flag an initial source anomaly; if two or more data items in the graph data set are data creation points, flag those data creation points as abnormal; if a data integration path is not in the tree data structure, flag that data integration path as abnormal.
The above anomaly identification scheme can greatly improve the efficiency and accuracy of data governance and data impact analysis in medium and large enterprises. Combined with automated tools, it can quickly and dynamically generate well-presented data lineage graphs and automatically and accurately identify the related data anomalies.
Optionally, the processor 904 is specifically configured to acquire the directed edges of the data lineage graph to be displayed, where the directed edges indicate the path and direction of transfer between data items.
An embodiment of the present invention further provides another data integration apparatus, as shown in FIG. 10. For ease of description, only the parts related to the embodiment are shown; for specific technical details not disclosed here, refer to the method part of the embodiments. The apparatus may be a terminal device, such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, or an in-vehicle computer, or it may be a server or the like. Taking a terminal device as an example:
FIG. 10 is a block diagram of part of the structure of the terminal device provided by an embodiment of the present invention. Referring to FIG. 10, the terminal device includes a radio frequency (RF) circuit 1010, a memory 1020, an input unit 1030, a display unit 1040, a sensor 1050, an audio circuit 1060, a wireless fidelity (WiFi) module 1070, a processor 1080, a power supply 1090, and other components. Those skilled in the art will understand that the structure shown in FIG. 10 does not limit the terminal device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The components of the terminal device are described in detail below with reference to FIG. 10:
The RF circuit 1010 can be used to receive and send signals while information is being sent and received or during a call. In particular, it receives downlink information from a base station and passes it to the processor 1080 for processing, and it sends uplink data to the base station. Typically, an RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1010 can communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).
The memory 1020 can be used to store software programs and modules. By running the software programs and modules stored in the memory 1020, the processor 1080 executes the various functional applications and data processing of the terminal device. The memory 1020 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the terminal device (such as audio data or a phone book). In addition, the memory 1020 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
The input unit 1030 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal device 1000. Specifically, the input unit 1030 may include a touch panel 1031 and other input devices 1032. The touch panel 1031, also called a touch screen, can collect the user's touch operations on or near it (for example, operations performed on or near the touch panel 1031 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch panel 1031 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, and sends them to the processor 1080; it can also receive and execute commands sent by the processor 1080. The touch panel 1031 may be implemented as a resistive, capacitive, infrared, or surface acoustic wave panel, among other types. Besides the touch panel 1031, the input unit 1030 may include other input devices 1032. Specifically, the other input devices 1032 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The display unit 1040 can be used to display information entered by the user, information provided to the user, and the various menus of the terminal device. The display unit 1040 may include a display panel 1041, which may optionally be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch panel 1031 may cover the display panel 1041. When the touch panel 1031 detects a touch operation on or near it, it passes the operation to the processor 1080 to determine the type of touch event, and the processor 1080 then provides the corresponding visual output on the display panel 1041 according to that type. Although in FIG. 10 the touch panel 1031 and the display panel 1041 are two independent components that implement the input and output functions of the terminal device, in some embodiments the touch panel 1031 and the display panel 1041 may be integrated to implement the input and output functions of the terminal device.
The terminal device 1000 may further include at least one sensor 1050, such as a light sensor, a motion sensor, or other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 1041 according to the ambient light, and the proximity sensor can turn off the display panel 1041 and/or the backlight when the terminal device is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the terminal device (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the terminal device may be equipped with, such as a gyroscope, barometer, hygrometer, thermometer, or infrared sensor, are not described further here.
The audio circuit 1060, a speaker 1061, and a microphone 1062 can provide an audio interface between the user and the terminal device. The audio circuit 1060 can convert received audio data into an electrical signal and transmit it to the speaker 1061, which converts it into a sound signal for output. Conversely, the microphone 1062 converts collected sound signals into electrical signals, which the audio circuit 1060 receives and converts into audio data; the audio data is then output to the processor 1080 for processing and sent via the RF circuit 1010 to, for example, another terminal device, or output to the memory 1020 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1070, the terminal device can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although FIG. 10 shows the WiFi module 1070, it is not an essential component of the terminal device 1000 and may be omitted as needed without changing the essence of the invention.
The processor 1080 is the control center of the terminal device. It connects all parts of the terminal device through various interfaces and lines and, by running or executing the software programs and/or modules stored in the memory 1020 and calling the data stored in the memory 1020, performs the various functions of the terminal device and processes data, thereby monitoring the terminal device as a whole. Optionally, the processor 1080 may include one or more processing units. Preferably, the processor 1080 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 1080.
The terminal device 1000 further includes a power supply 1090 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 1080 through a power management system, so that charging, discharging, and power consumption are managed through the power management system.
Although not shown, the terminal device 1000 may further include a camera, a Bluetooth module, and the like, which are not described further here.
In this embodiment of the present invention, the processor 1080 included in the terminal further has the following functions:
The processor 1080 is configured to: acquire the data integration paths of the graph data set to be displayed, where the data integration paths indicate the path and direction of transfer between data items; determine the in-degree of each data item according to the path and direction of transfer between data items; and take the data item with the smallest in-degree as the root node and construct the data structure of the graph data set into a tree data structure.
The above embodiment converts the complex mesh structure of a graph data set into a tree data structure, so that the data relationships after integration are more concise and more readable. This makes for a better data integration scheme and facilitates impact analysis.
Further, because several data items may share the same minimum in-degree, an embodiment of the present invention provides a further solution: the processor 1080 is further configured to determine the out-degree of each data item according to the path and direction of transfer between data items, and to determine whether each data item of the graph data set is contained in the tree data structure;
Taking the data item with the smallest in-degree as the root node comprises: taking the data item with the smallest in-degree as the root node and, if two or more data items have the smallest in-degree, taking as the root node the one among them that has the largest out-degree and is not yet contained in the tree data structure.
Preferably, the processor 1080 is configured to traverse the data in the graph data set by a breadth-first algorithm to obtain the tree data structure.
It should be noted that the lineage base tree may also be generated by a depth-first algorithm. Generating it breadth-first is a preferred option and should not be construed as the only limitation on the embodiments of the present invention.
Further, an embodiment of the present invention provides a scheme for flagging anomalies: the processor 1080 is further configured to perform at least one of the following: if two or more root nodes exist, flag an initial source anomaly; if two or more data items in the graph data set are data creation points, flag those data creation points as abnormal; if a data integration path is not in the tree data structure, flag that data integration path as abnormal.
The above anomaly identification scheme can greatly improve the efficiency and accuracy of data governance and data impact analysis in medium and large enterprises. Combined with automated tools, it can quickly and dynamically generate well-presented data lineage graphs and automatically and accurately identify the related data anomalies.
Optionally, the processor 1080 is specifically configured to acquire the directed edges of the data lineage graph to be displayed, where the directed edges indicate the path and direction of transfer between data items.
It is worth noting that, in the above apparatus and device embodiments, the units are divided only according to functional logic. The division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
In addition, those of ordinary skill in the art will understand that all or some of the steps in the above method embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are only preferred specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the embodiments of the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310326760.7A (CN104346367B) | 2013-07-30 | 2013-07-30 | A data integration method and apparatus |
| Publication Number | Publication Date |
|---|---|
| CN104346367A | 2015-02-11 |
| CN104346367B | 2018-10-02 |
| Code | Title |
|---|---|
| C06 | Publication |
| PB01 | Publication |
| C10 | Entry into substantive examination |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |