CN114297204A


Info

Publication number: CN114297204A
Application number: CN202111677171.4A
Authority: CN
Inventors: 高羽
Original assignee: Secworld Information Technology Beijing Co Ltd; Qax Technology Group Inc
Current assignee: Secworld Information Technology Beijing Co Ltd; Qax Technology Group Inc
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2022-04-08
Anticipated expiration: 2041-12-31
Also published as: CN114297204B


Description

Translated from Chinese

Data storage and retrieval method and device for heterogeneous data sources

Technical Field

The present invention relates to the technical field of heterogeneous data source processing, and in particular to a data storage method, a data retrieval method, and corresponding devices for heterogeneous data sources.

Background

Data retrieval operations can span many kinds of data sources. A data source is the place where data originates, and different data sources hold large volumes of data (for example, log data) generated by business activities across all industries. Because this data comes from different origins, it differs in data structure, storage format, and other respects, and together these origins form heterogeneous data sources. The amount of data they contain is enormous and can be on the order of tens of billions.

At present, a customized script usually has to be written for each data source in order to parse the data and collect the data required, and the collected data must also be stored and made available for retrieval. This places very high demands on program performance and reliability and amounts to customized development for every data source, which makes development costly and hinders subsequent maintenance and management.

Summary of the Invention

In view of this, the present invention provides a data storage method, a data retrieval method, and corresponding devices for heterogeneous data sources. The main purpose is to collect, store, and retrieve data from heterogeneous data sources in a general manner, which optimizes the related data processing, reduces data processing cost, improves data processing efficiency, and facilitates subsequent maintenance and management.

A first aspect of the present application provides a data storage method for heterogeneous data sources, the method comprising:

receiving a data collection instruction, where the data collection instruction contains an identifier of a heterogeneous data source;

finding the corresponding heterogeneous data source based on the identifier of the heterogeneous data source, where an annotation is embedded in the heterogeneous data source and the annotation is used to configure the data collection mode of the heterogeneous data source;

performing a data collection operation according to the annotation embedded in the heterogeneous data source to obtain target data;

storing the target data in a columnar storage database by using a preset configuration file.

In some modified implementations of the first aspect of the present application, performing a data collection operation according to the annotation embedded in the heterogeneous data source to obtain target data includes:

obtaining the interface and the data conversion format of the heterogeneous data source from the annotation;

collecting data from the heterogeneous data source through the interface;

performing unified format conversion on the collected data according to the data conversion format to obtain the target data.

In some modified implementations of the first aspect of the present application, storing the target data in a columnar storage database by using a preset configuration file includes:

obtaining multiple preset attribute fields, multiple storage volume thresholds, and multiple storage addresses from the preset configuration file, where each storage volume threshold is associated with a different storage address;

obtaining, from the multiple storage volume thresholds, a storage volume threshold that is greater than the data volume of the target data, as the target storage volume threshold;

determining, from the multiple storage addresses, the target storage address associated with the target storage volume threshold;

creating a columnar storage database in the storage space corresponding to the target storage address according to the multiple preset attribute fields;

storing the target data in the columnar storage database.
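The threshold-to-address selection described above can be sketched as follows. This is only an illustrative sketch: the `StorageTier` type, the `pick_storage_address` function, and the example addresses are hypothetical names that do not appear in the application. The text only requires a threshold greater than the data volume; choosing the smallest such threshold is an assumption made here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageTier:
    threshold_bytes: int  # storage volume threshold of the storage space
    address: str          # storage address associated with that threshold


def pick_storage_address(tiers: list[StorageTier], data_bytes: int) -> Optional[str]:
    """Return the address tied to a threshold greater than the data volume.

    Assumption: when several thresholds qualify, the smallest one is used.
    Returns None when no threshold exceeds the data volume.
    """
    candidates = [t for t in tiers if t.threshold_bytes > data_bytes]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.threshold_bytes).address


# Hypothetical tiers, as they might be read from a preset configuration file.
TIERS = [
    StorageTier(10 * 1024**2, "memory://cache"),
    StorageTier(10 * 1024**3, "file:///data/local"),
    StorageTier(10 * 1024**4, "server://cluster-a"),
]
```

With these example tiers, a 5 MiB batch would be directed to the in-memory address and a 1 GiB batch to the local file address.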

In some modified implementations of the first aspect of the present application, after the columnar storage database is created in the storage space corresponding to the target storage address according to the multiple preset attribute fields, the method further includes:

obtaining a target attribute field from the multiple preset attribute fields;

setting a data storage format for the target attribute field, so that data information is stored in the target attribute field in that data storage format; and/or,

if there are multiple target attribute fields, parsing the multiple target attribute fields to obtain the association relationship among them, so that when the target data is stored in the columnar storage database, data information is preferentially stored in the target attribute fields that have the association relationship.

In some modified implementations of the first aspect of the present application, storing the target data in the columnar storage database includes:

in the process of storing the target data by using the multiple preset attribute fields, storing the corresponding target data in the target attribute field according to the data storage format;

if there are multiple target attribute fields, storing the corresponding target data in the multiple target attribute fields according to the association relationship among them.

A second aspect of the present application provides a data retrieval method for heterogeneous data sources, applied to the columnar storage database obtained by the data storage method for heterogeneous data sources described above, the method comprising:

receiving a data retrieval instruction, where the data retrieval instruction carries retrieval information;

searching for a retrieval result that matches the retrieval information by traversing the attribute columns in the columnar storage database.

In some modified implementations of the second aspect of the present application, searching for a retrieval result that matches the retrieval information by traversing the attribute columns in the columnar storage database includes:

parsing a retrieval condition and a retrieval keyword from the retrieval information;

traversing, one by one, the attribute information under each attribute field in the columnar storage database according to the retrieval keyword, to find matching target attribute information;

determining the target attribute field to which the target attribute information belongs;

searching, under the target attribute field, for a retrieval result that matches the retrieval condition.
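The two-phase lookup described above (keyword to locate the target attribute field, then condition to filter within it) might look like the following sketch. The in-memory `dict` standing in for the columnar database, the `search` function, and the representation of the retrieval condition as a predicate are all assumptions made for illustration; the application does not specify them.

```python
from __future__ import annotations

from typing import Callable


def search(
    columns: dict[str, list],
    keyword: str,
    condition: Callable[[object], bool],
) -> dict[str, list[int]]:
    """Return, per matching field, the row indices that satisfy the condition.

    `columns` maps an attribute field name to its values; the position of a
    value in the list is its row index.
    """
    results: dict[str, list[int]] = {}
    for field, values in columns.items():
        # Phase 1: traverse this field's attribute information for the keyword.
        if not any(keyword in str(v) for v in values):
            continue
        # Phase 2: within the target field, keep rows matching the condition.
        results[field] = [i for i, v in enumerate(values) if condition(v)]
    return results


# Hypothetical log data stored column by column.
COLUMNS = {
    "src_ip": ["10.0.0.1", "10.0.0.2", "192.168.1.5"],
    "message": ["login ok", "login failed", "logout"],
}
matches = search(COLUMNS, "failed", lambda v: str(v).startswith("login"))
```

Here only the `message` field contains the keyword, so the condition is evaluated under that field alone and the `src_ip` values are never filtered.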

A third aspect of the present application provides a data storage device for heterogeneous data sources, the device comprising:

a receiving unit, configured to receive a data collection instruction, where the data collection instruction contains an identifier of a heterogeneous data source;

a search unit, configured to find the corresponding heterogeneous data source based on the identifier of the heterogeneous data source, where an annotation is embedded in the heterogeneous data source and the annotation is used to configure the data collection mode of the heterogeneous data source;

a collection unit, configured to perform a data collection operation according to the annotation embedded in the heterogeneous data source to obtain target data;

a storage unit, configured to store the target data in a columnar storage database by using a preset configuration file.

In some modified implementations of the third aspect of the present application, the collection unit includes:

an acquisition module, configured to obtain the interface and the data conversion format of the heterogeneous data source from the annotation;

a collection module, configured to collect data from the heterogeneous data source through the interface;

a processing module, configured to perform unified format conversion on the collected data according to the data conversion format to obtain the target data.

In some modified implementations of the third aspect of the present application, the storage unit includes:

an acquisition module, configured to obtain multiple preset attribute fields, multiple storage volume thresholds, and multiple storage addresses from the preset configuration file, where each storage volume threshold is associated with a different storage address;

the acquisition module being further configured to obtain, from the multiple storage volume thresholds, a storage volume threshold that is greater than the data volume of the target data, as the target storage volume threshold;

a determining module, configured to determine, from the multiple storage addresses, the target storage address associated with the target storage volume threshold;

a creation module, configured to create a columnar storage database in the storage space corresponding to the target storage address according to the multiple preset attribute fields;

a storage module, configured to store the target data in the columnar storage database.

In some modified implementations of the third aspect of the present application, the storage unit further includes:

the acquisition module, further configured to obtain a target attribute field from the multiple preset attribute fields;

a setting module, configured to set a data storage format for the target attribute field, so that data information is stored in the target attribute field in that data storage format;

an establishing module, configured to, when there are multiple target attribute fields, parse the multiple target attribute fields to obtain the association relationship among them, so that when the target data is stored in the columnar storage database, data information is preferentially stored in the target attribute fields that have the association relationship.

In some modified implementations of the third aspect of the present application, the storage module is further specifically configured to:

when there are multiple target attribute fields, store the corresponding target data in the multiple target attribute fields according to the association relationship among them.

A fourth aspect of the present application provides a data retrieval device for heterogeneous data sources, the device comprising:

a receiving unit, configured to receive a data retrieval instruction, where the data retrieval instruction carries retrieval information;

a search unit, configured to search for a retrieval result that matches the retrieval information by traversing the attribute columns in the columnar storage database.

In some modified implementations of the fourth aspect of the present application, the search unit includes:

a parsing module, configured to parse a retrieval condition and a retrieval keyword from the retrieval information;

a search module, configured to traverse, one by one, the attribute information under each attribute field in the columnar storage database according to the retrieval keyword, to find matching target attribute information;

a determining module, configured to determine the target attribute field to which the target attribute information belongs;

the search module being further configured to search, under the target attribute field, for a retrieval result that matches the retrieval condition.

A fifth aspect of the present application provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the retrieval method for heterogeneous data sources described above and the data retrieval method for heterogeneous data sources described above.

A sixth aspect of the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the retrieval method for heterogeneous data sources described above and the data retrieval method for heterogeneous data sources described above.

With the above technical solutions, the technical solution provided by the present invention has at least the following advantages:

The present invention provides a data storage method, a data retrieval method, and corresponding devices. Annotations are added in advance to the heterogeneous data sources to be collected. When a data collection instruction is received, a data collection operation is performed according to the annotation embedded in the heterogeneous data source to obtain target data, and the target data is then stored in a columnar storage database by using a preset configuration file. Collection and storage are thereby implemented in a general manner for every heterogeneous data source, and when a retrieval instruction is received, the columnar storage database can return the corresponding retrieval result. Compared with the prior art, this solves the problem that developing customized collection, storage, and retrieval functions for each heterogeneous data source is costly and hinders subsequent maintenance and management. The present invention optimizes the collection and storage of heterogeneous data sources: no customized development is needed for each heterogeneous data source to provide collection and storage, that is, no intrusive code modification is required. Instead, collection and storage are implemented in a general manner, and a general retrieval function is provided accordingly. This generalized processing reduces processing cost, is efficient, and facilitates subsequent maintenance and management.

The above description is only an overview of the technical solution of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention are more readily apparent, specific embodiments of the present invention are set out below.

Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

FIG. 1 is a flowchart of a data storage method for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 2 is a flowchart of another data storage method for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 3 is a flowchart of a data retrieval method for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 4 is a flowchart of another data retrieval method for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 5 is a block diagram of a data storage device for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 6 is a block diagram of another data storage device for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 7 is a block diagram of a data retrieval device for heterogeneous data sources provided by an embodiment of the present invention;

FIG. 8 is a block diagram of another data retrieval device for heterogeneous data sources provided by an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present invention, it should be understood that the present invention may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the present invention can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.

An embodiment of the present invention provides a data storage method for heterogeneous data sources. As shown in FIG. 1, the method implements collection and storage in a general manner for different heterogeneous data sources. The embodiment provides the following specific steps:

101. Receive a data collection instruction, where the data collection instruction contains an identifier of a heterogeneous data source.

A data source is the place where data originates, and different data sources hold large volumes of data (for example, log data) generated by business activities across all industries. Because this data comes from different origins, it differs in data structure, storage mode, and other respects, and together these origins form heterogeneous data sources.

In this embodiment of the present invention, the received data collection instruction may contain the identifiers of one or more data sources. Each identifier indicates an object of data collection.

102. Find the corresponding heterogeneous data source based on the identifier of the heterogeneous data source.

103. Perform a data collection operation according to the annotation embedded in the heterogeneous data source to obtain target data.

An annotation is embedded in the heterogeneous data source and is used to configure how data is collected from it. Specifically, an annotation can be understood as a special marker in code. Such markers can be read at compile time, at class-loading time, or at run time, and the corresponding processing is then performed. Through annotations, developers can embed supplementary information in source code without changing the original code and logic.

In this embodiment of the present invention, the annotations are common to the different heterogeneous data sources, rather than being scripts custom-developed for each data source to implement certain specified operations. Once the collection object (that is, the data source) has been determined, the annotation is written into the data source in advance.

Specifically, at the code level, a software development kit (SDK) can be used to write the annotation at the start of the data source. The annotation acts as a special marker in the code. Such markers can be read at compile time, at class-loading time, or at run time, and the corresponding processing is performed so that other tools can supplement information or carry out deployment. In this embodiment of the present invention, the data collection operation is performed by running the annotation embedded in the data source, which yields the target data.
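As a rough illustration of attaching collection metadata to a data source without touching its logic, the sketch below uses a Python class decorator as a stand-in for the annotation mechanism. The description (markers read at compile time, class-loading time, or run time) suggests Java-style annotations, so this is an analogy rather than the mechanism of the application. The names `collectable`, `SOURCE_REGISTRY`, `NginxAccessLog`, and `collect` are hypothetical.

```python
# Maps a data source identifier to the annotated source class (hypothetical).
SOURCE_REGISTRY = {}


def collectable(source_id, interface, fmt):
    """Hypothetical annotation: record the collection interface and the
    data conversion format on the source class, leaving its code unchanged."""
    def wrap(cls):
        cls.__collect_meta__ = {"interface": interface, "format": fmt}
        SOURCE_REGISTRY[source_id] = cls
        return cls
    return wrap


@collectable("nginx-access", interface="read_lines", fmt="csv")
class NginxAccessLog:
    """Example data source; its own logic knows nothing about collection."""

    def read_lines(self):
        return ["10.0.0.1,GET,/index.html", "10.0.0.2,POST,/login"]


def collect(source_id):
    """Find the source by identifier, read its annotation at run time,
    and call the interface that the annotation names."""
    cls = SOURCE_REGISTRY[source_id]
    meta = cls.__collect_meta__
    raw = getattr(cls(), meta["interface"])()
    return meta["format"], raw


fmt, raw = collect("nginx-access")
```

The generic `collect` function never refers to `NginxAccessLog` directly; adding another source would only require annotating it in the same way.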

104. Store the target data in the columnar storage database by using the preset configuration file.

The preset configuration file contains at least preset attribute fields and a storage address. The preset attribute fields specify which attribute fields are used to store the target data. The storage address specifies where the target data is stored, for example in memory or on a server. In addition, if the storage location requires an authorized login, the corresponding account and password can also be stored in the preset configuration file.

A columnar storage database stores the collected target data according to a column store index. Columnar storage stores data in units of the attributes (columns) of a relational database: within a data table, the data information of the same attribute is stored together, while the attribute information of the different attributes of one record is stored in separate storage units.
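The difference between the two layouts can be shown with a small sketch that pivots row-oriented records into one list per attribute. The `to_columns` function and the sample records are illustrative assumptions, and plain Python lists stand in for the storage units of a real columnar database.

```python
def to_columns(rows):
    """Pivot row-oriented records into one list per attribute field.

    Values of the same attribute end up together; a record that lacks a
    field contributes None so that list positions still equal row indices.
    """
    columns = {}
    for i, row in enumerate(rows):
        for field, value in row.items():
            # A field first seen at row i is back-filled with i None values.
            columns.setdefault(field, [None] * i).append(value)
        # Pad fields that this row did not provide.
        for values in columns.values():
            if len(values) <= i:
                values.append(None)
    return columns


# Hypothetical log records in row-oriented form.
records = [
    {"time": "2021-12-31T10:00", "host": "web-1", "level": "INFO"},
    {"time": "2021-12-31T10:01", "host": "web-2", "level": "ERROR"},
]
columns = to_columns(records)
```

After the pivot, reading every `host` value touches a single list instead of every record.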

In this embodiment of the present invention, the preset configuration file in effect supplies the standard according to which the target data is stored. Combined with such a storage standard, using a columnar storage database to store the target data has the following effect. For multiple heterogeneous data sources, the data volume can be on the order of tens of billions, the number of kinds of attribute information collected from them is large, and the number of attribute fields that the created database needs in order to store this attribute information is therefore also large. If the data information were stored row by row, a retrieval operation would traverse the data information in a large number of attribute fields, which wastes retrieval processing cost and results in low retrieval efficiency. If the data information is stored column by column, a retrieval operation is based on the column store index: the attribute information within one attribute field is traversed before the next attribute field is examined, so a matching retrieval result can be found without traversing all attribute fields.

In summary, this embodiment of the present invention provides a data storage method. Annotations are added in advance to the heterogeneous data sources to be collected. When a data collection instruction is received, a data collection operation is performed according to the annotation embedded in the heterogeneous data source to obtain target data, and the target data is then stored in a columnar storage database by using a preset configuration file. Collection and storage are thereby implemented in a general manner for every heterogeneous data source. Compared with the prior art, this solves the problem that developing customized collection and storage functions for each heterogeneous data source is costly and hinders subsequent maintenance and management. The present invention optimizes the collection and storage of heterogeneous data sources: no customized development is needed for each heterogeneous data source, that is, no intrusive code modification is required. Instead, collection and storage are implemented in a general manner, which reduces the processing cost of data collection and storage, improves data processing efficiency, and facilitates subsequent maintenance and management.

To describe the above embodiment in more detail, an embodiment of the present invention further provides another data storage method for heterogeneous data sources. As shown in FIG. 2, this method elaborates on the above embodiment. The embodiment provides the following specific steps:

201. Receive a data collection instruction, where the data collection instruction contains an identifier of a heterogeneous data source.

In this embodiment of the present invention, for an explanation of this step, see step 101; it is not repeated here.

202. Find the corresponding heterogeneous data source based on the identifier of the heterogeneous data source.

203. Perform a data collection operation according to the annotation embedded in the heterogeneous data source to obtain target data.

An annotation is embedded in the heterogeneous data source and is used to configure how data is collected from it. The annotation includes the interface and the data conversion format of the heterogeneous data source.

In this embodiment of the present invention, data collection using the annotation embedded in the heterogeneous data source is implemented as follows:

First, the interface and the data conversion format of the heterogeneous data source are obtained from the annotation.

Second, data is collected from the heterogeneous data source through the interface, and unified format conversion is performed on the collected data according to the data conversion format to obtain the target data.

It should be noted that multiple heterogeneous data sources differ in data structure, storage format, and other respects. The data conversion format mentioned in this embodiment is therefore used to put the data information collected from the multiple heterogeneous data sources into a standardized format, which yields uniform data information that serves as the target data for the subsequent storage operation.
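A minimal sketch of such unified format conversion is shown below, using only the Python standard library. The format names `"json_lines"` and `"csv"`, the `to_unified` function, and the choice of a list of dictionaries as the unified form are assumptions for illustration; the application does not name concrete formats.

```python
from __future__ import annotations

import csv
import io
import json


def to_unified(raw: str, fmt: str, field_names: list[str] | None = None) -> list[dict]:
    """Convert raw text from one source into a list of field -> value dicts.

    `fmt` plays the role of the data conversion format read from the
    annotation. For CSV without `field_names`, the first row is the header.
    """
    if fmt == "json_lines":
        return [json.loads(line) for line in raw.splitlines() if line.strip()]
    if fmt == "csv":
        reader = csv.DictReader(io.StringIO(raw), fieldnames=field_names)
        return [dict(row) for row in reader]
    raise ValueError(f"unsupported conversion format: {fmt}")


# Two hypothetical sources with different native formats.
from_json = to_unified('{"host": "web-1", "level": "INFO"}\n', "json_lines")
from_csv = to_unified("host,level\nweb-2,ERROR\n", "csv")
target_data = from_json + from_csv
```

Both sources end up as records with the same shape, so the storage step can treat them identically.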

204、从预置配置文件中获取多个预置属性字段、多个存储量阈值和多个存储地址，其中，每个存储量阈值关联了不同的存储地址。204. Acquire multiple preset attribute fields, multiple storage volume thresholds, and multiple storage addresses from the preset configuration file, where each storage volume threshold is associated with a different storage address.

其中，预置属性字段是从多个技术领域的数据库整合得到的标准化通用的属性字段。具体的，对于本发明实施例从异构数据源采集到的目标数据，由于是来自于不同异构数据源的，且多个异构数据源所包含的属性种类数量也是很大的，因此，本发明实施例为了实现对更多样性的数据信息进行有效存储，可以从这些异构数据源中获取属性字段来构建一个标准化通用的属性字段。The preset attribute field is a standardized and general attribute field obtained by integrating databases in multiple technical fields. Specifically, for the target data collected from heterogeneous data sources in the embodiment of the present invention, since it comes from different heterogeneous data sources, and the number of attribute types contained in the multiple heterogeneous data sources is also large, therefore, In order to effectively store more diverse data information in the embodiments of the present invention, attribute fields can be obtained from these heterogeneous data sources to construct a standardized and general attribute field.

For example, common attribute fields are obtained by taking the intersection of the attribute fields contained in the multiple heterogeneous data sources, and further attribute fields are then derived from the common ones. The derived attribute fields can be produced by semantic analysis, for example by merging attribute fields with similar semantics, or by extending frequently used attribute fields with more finely divided subordinate-concept fields. Furthermore, the attribute types contained in the heterogeneous data sources are diverse and numerous. Data belonging to attribute types that occur very rarely has little value, so it can be cleaned out of the collected target data rather than stored in the columnar storage database. Specifically, these very low-frequency attribute fields can simply be left out of the database, which avoids occupying storage resources with useless redundant data.
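
A minimal sketch of the intersection step and the low-frequency filter is shown below. The source names and field names are invented for illustration, and measuring "frequency" as the number of sources that contain a field is an assumption; the text does not say how frequency is measured.

```python
from collections import Counter
from typing import Dict, Set


def common_fields(source_fields: Dict[str, Set[str]]) -> Set[str]:
    """Attribute fields present in every source (set intersection)."""
    field_sets = list(source_fields.values())
    if not field_sets:
        return set()
    return set.intersection(*field_sets)


def frequent_fields(source_fields: Dict[str, Set[str]], min_sources: int) -> Set[str]:
    """Fields appearing in at least min_sources sources; rarer fields are dropped."""
    counts = Counter(
        field for fields in source_fields.values() for field in fields
    )
    return {field for field, n in counts.items() if n >= min_sources}


# Hypothetical attribute fields of three heterogeneous data sources.
sources = {
    "a": {"timestamp", "source_ip", "user"},
    "b": {"timestamp", "source_ip", "url"},
    "c": {"timestamp", "source_ip", "user", "debug_flag"},
}
common = common_fields(sources)
kept = frequent_fields(sources, min_sources=2)
```

Here `url` and `debug_flag` each occur in only one source, so they would not be given a column in the database.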

The storage volume threshold is the data capacity threshold of a storage space, and the storage address is the address of that storage space. Each storage volume threshold is associated with a different storage address, so that a suitable storage space can be selected according to the data volume of the collected target data. It should be noted that the collected target data may include dynamic data and static data. Dynamic data changes frequently and directly reflects business processes, for example website traffic, number of online users, and daily sales. Because dynamic data has a short update cycle and a large volume, the preferred approach is to process dynamic data and static data separately.

In this embodiment of the present invention, the preset attribute fields, storage volume thresholds, and storage addresses in the preset configuration file make it possible to automatically create a corresponding database for the target data and complete the storage.

205. From the multiple storage volume thresholds, acquire a storage volume threshold that is greater than the data volume of the target data as the target storage volume threshold.

206. According to the target storage volume threshold, determine the associated target storage address from the multiple storage addresses.

Steps 205 and 206 are explained as follows. The preset configuration file stores multiple storage volume thresholds, each associated with one storage address. To avoid the situation where the storage space at a storage address is too small to hold the target data, the data volume of the target data is first compared with the storage volume thresholds to find a storage address whose threshold is greater than that data volume. If there are several such storage addresses, any one of them may be selected. The storage address is then used to obtain the storage space in which the target data is stored. Each storage address may include a start address and an end address stored as a pair.

In another embodiment, to improve storage space utilization, after the multiple storage addresses whose storage volume thresholds are greater than the data volume of the target data have been found, the storage address corresponding to the smallest of those thresholds can be selected. This ensures that the corresponding address space is occupied as fully as possible and prevents wasted resources.
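
Steps 205 and 206, together with the smallest-threshold variant, can be sketched as below. The tuple layout `(threshold, (start_address, end_address))` and the concrete numbers are assumptions for illustration; the text only states that each threshold is associated with one storage address consisting of a start and end address.

```python
from typing import List, Optional, Tuple

# (storage volume threshold, (start address, end address)) -- hypothetical layout.
StorageEntry = Tuple[int, Tuple[int, int]]


def select_storage(entries: List[StorageEntry], data_size: int) -> Optional[StorageEntry]:
    """Pick the entry with the smallest threshold strictly greater than data_size.

    Returns None when no configured storage space is large enough.
    """
    candidates = [entry for entry in entries if entry[0] > data_size]
    if not candidates:
        return None
    return min(candidates, key=lambda entry: entry[0])


entries: List[StorageEntry] = [
    (100, (0, 99)),
    (1000, (100, 1099)),
    (10000, (1100, 11099)),
]
chosen = select_storage(entries, data_size=250)
too_big = select_storage(entries, data_size=10000)
```

Choosing the minimum over the candidates implements the utilization-oriented embodiment; the first embodiment would accept any element of `candidates`.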

207. Create a columnar storage database in the storage space corresponding to the target storage address according to the multiple preset attribute fields.

In this embodiment of the present invention, the columnar storage database is created mainly according to the preset attribute fields in the preset configuration file, so the database contains these preset attribute fields. The preset attribute fields can additionally be optimized as follows:

A target attribute field, that is, a preset attribute field that needs optimization, is obtained from the multiple preset attribute fields. The target attribute field is optimized as follows:

For example, the target attribute field is preprocessed to set the data storage format corresponding to it.

As an example, this applies to fields that have special meanings in particular domains, or, depending on actual business requirements, to the names of provinces, cities, districts, and counties, where both the text name and the code it maps to must be stored.

Alternatively, as a parallel scheme, if there are multiple target attribute fields, they are analyzed to obtain the association relationships among them. Based on such a relationship, data can be stored preferentially in the associated target attribute fields. For example, after data has been stored in one target attribute field, if that field is associated with another target attribute field, data is stored next in that other field, and only then is the storage operation completed for the remaining attribute fields.

It should be noted that storing data on the basis of association relationships not only improves storage efficiency at write time. During later retrieval, one target attribute field can also bring out the other target attribute fields associated with it, which improves query efficiency.

As an example, the province, city, district, and county mentioned above each correspond to a different attribute field. An association relationship among these fields can be established in advance in a fixed order, and data is then stored into the province, city, district, and county fields in that order. Because the collected target data may itself be related, completing the prioritized storage on the basis of associated target attribute fields improves data storage efficiency.
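
One possible reading of the association-based ordering is sketched below: fields that belong to a predefined association chain are written first, in chain order, and the remaining fields follow. The function name `storage_order`, the representation of an association as an ordered list of field names, and the example fields are assumptions, not details given in the text.

```python
from typing import List


def storage_order(fields: List[str], chains: List[List[str]]) -> List[str]:
    """Order fields so that associated fields come first, in chain order."""
    ordered: List[str] = []
    # Fields that are part of an association chain are stored first.
    for chain in chains:
        for field in chain:
            if field in fields and field not in ordered:
                ordered.append(field)
    # All remaining fields keep their original relative order.
    for field in fields:
        if field not in ordered:
            ordered.append(field)
    return ordered


fields = ["message", "city", "province", "county", "district", "timestamp"]
chains = [["province", "city", "district", "county"]]
order = storage_order(fields, chains)
```

With this ordering, the province, city, district, and county columns are filled consecutively before unrelated columns such as `message`.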

208. Store the target data in the columnar storage database.

In this embodiment of the present invention, the collected target data is parsed to determine the attribute field under which each item is to be stored, and the storage operation is completed accordingly. The target attribute fields configured in advance in step 207 are handled as follows:

While the target data is being stored using the multiple preset attribute fields, data is stored in a target attribute field according to its data storage format. If there are multiple target attribute fields, data is stored in them according to the association relationships among them.

Further, on the basis of the columnar storage database produced by the data storage method for heterogeneous data sources provided above, an embodiment of the present invention also provides a data retrieval method for heterogeneous data sources. As shown in FIG. 3, the method includes the following steps:

301. Receive a data retrieval instruction, where the data retrieval instruction carries retrieval information.

302. Search for a retrieval result matching the retrieval information by traversing each attribute column in the columnar storage database.

In this embodiment of the present invention, the columnar storage database stores data from multiple heterogeneous data sources, which amounts to integrating and preprocessing the data of those sources through collection and storage. A received retrieval instruction is therefore indirectly a retrieval operation on the heterogeneous data sources, and the result found by searching the columnar storage database is equivalent to the result that would be obtained by issuing the retrieval instruction to the original heterogeneous data sources.

In this embodiment of the present invention, retrieval results are obtained by searching the columnar storage database instead of issuing a separate retrieval instruction to each heterogeneous data source, which improves both retrieval efficiency and retrieval generality.

Further, to explain the retrieval operation in more detail, an embodiment of the present invention also provides another data retrieval method for heterogeneous data sources. As shown in FIG. 4, the method includes the following steps:

401. Receive a data retrieval instruction, where the data retrieval instruction carries retrieval information.

402. Parse a retrieval condition and a retrieval keyword from the retrieval information.

403. According to the retrieval keyword, traverse the attribute information under each attribute field in the columnar storage database one by one to find matching target attribute information.

In this embodiment of the present invention, the columnar storage database stores the many kinds of attribute information collected from the heterogeneous data sources, so the database contains a large number of attribute fields. Because of the column-oriented layout, data can be read a whole column at a time: once an attribute field is located, the data of the entire attribute column can be read. Retrieval in this way avoids traversing a large number of attribute fields. Once the attribute field containing the retrieval keyword has been determined, the other attribute fields no longer need to be traversed, which improves retrieval efficiency.

404. According to the target attribute information, determine the target attribute field to which it belongs.

405. Under the target attribute field, search for a retrieval result matching the retrieval condition.

In this embodiment of the present invention, after the matching target attribute information has been determined according to the retrieval keyword, the retrieval operation is completed under the corresponding target attribute field according to the retrieval condition, so that the retrieval result is obtained efficiently.
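
Steps 403 to 405 can be sketched against a toy in-memory column store. Representing the database as a dictionary of column lists, and the retrieval condition as a predicate applied to values of the target field, are assumptions made for this illustration; a real columnar database would expose its own query interface.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical column store: field name -> column values (one entry per row).
ColumnStore = Dict[str, List[Any]]


def find_field(store: ColumnStore, keyword: Any) -> Optional[str]:
    """Steps 403-404: scan columns until one contains the keyword."""
    for field, values in store.items():
        if keyword in values:
            return field  # stop here; remaining columns are not traversed
    return None


def search(
    store: ColumnStore, keyword: Any, condition: Callable[[Any], bool]
) -> List[Dict[str, Any]]:
    """Step 405: apply the retrieval condition within the target field only."""
    field = find_field(store, keyword)
    if field is None:
        return []
    rows = [i for i, value in enumerate(store[field]) if condition(value)]
    return [{name: column[i] for name, column in store.items()} for i in rows]


store: ColumnStore = {
    "province": ["Hebei", "Shandong", "Hebei"],
    "city": ["Shijiazhuang", "Jinan", "Tangshan"],
    "count": [3, 5, 8],
}
hits = search(store, "Jinan", lambda value: value.startswith("J"))
```

The early return in `find_field` mirrors the point made above: once the field containing the keyword is known, the remaining columns are left untouched.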

Further, as an implementation of the methods shown in FIG. 1 and FIG. 2, an embodiment of the present invention provides a data storage device for heterogeneous data sources. This device embodiment corresponds to the foregoing method embodiments. For ease of reading, the details of the method embodiments are not repeated one by one here, but it should be clear that the device in this embodiment can implement everything in the foregoing method embodiments. The device is used to perform collection and storage operations on different heterogeneous data sources in a general manner. As shown in FIG. 5, the device includes:

a receiving unit 31, configured to receive a data collection instruction, where the data collection instruction includes an identifier of a heterogeneous data source;

a searching unit 32, configured to find the corresponding heterogeneous data source based on the identifier of the heterogeneous data source, where annotations are embedded in the heterogeneous data source and are used to configure the data collection mode of the heterogeneous data source;

a collection unit 33, configured to perform a data collection operation according to the annotations embedded in the heterogeneous data source to obtain target data;

a storage unit 34, configured to store the target data in a columnar storage database using a preset configuration file.

Further, as shown in FIG. 6, the collection unit 33 includes:

an obtaining module 331, configured to obtain the interface and the data conversion format of the heterogeneous data source from the annotations;

a collection module 332, configured to collect data from the heterogeneous data source through the interface;

a processing module 333, configured to convert the collected data into a unified format according to the data conversion format to obtain the target data.

Further, as shown in FIG. 6, the storage unit 34 includes:

an obtaining module 341, configured to obtain multiple preset attribute fields, multiple storage volume thresholds, and multiple storage addresses from the preset configuration file, where each storage volume threshold is associated with a different storage address;

the obtaining module 341 is further configured to acquire, from the multiple storage volume thresholds, a storage volume threshold that is greater than the data volume of the target data as the target storage volume threshold;

a determination module 342, configured to determine the associated target storage address from the multiple storage addresses according to the target storage volume threshold;

a creation module 343, configured to create a columnar storage database in the storage space corresponding to the target storage address according to the multiple preset attribute fields;

a storage module 344, configured to store the target data in the columnar storage database.

Further, as shown in FIG. 6, the storage unit 34 further includes:

the obtaining module 341 is further configured to obtain a target attribute field from the multiple preset attribute fields;

a setting module 345, configured to set the data storage format corresponding to the target attribute field, so that data is stored in the target attribute field in that data storage format;

an establishing module 346, configured to, when there are multiple target attribute fields, analyze the multiple target attribute fields to obtain the association relationships among them, so that when the target data is stored in the columnar storage database, data is stored preferentially in the target attribute fields that have such an association relationship.

Further, as shown in FIG. 6, the storage module 344 is further specifically configured to:

Further, as an implementation of the methods shown in FIG. 3 and FIG. 4, an embodiment of the present invention provides a data retrieval device for heterogeneous data sources. This device embodiment corresponds to the foregoing method embodiments. For ease of reading, the details of the method embodiments are not repeated one by one here, but it should be clear that the device in this embodiment can implement everything in the foregoing method embodiments. The device is used to perform retrieval operations on a database that contains data from heterogeneous data sources. As shown in FIG. 7, the device includes:

a receiving unit 41, configured to receive a data retrieval instruction, where the data retrieval instruction carries retrieval information;

a searching unit 42, configured to search for a retrieval result matching the retrieval information by traversing each attribute column in the columnar storage database.

Further, as shown in FIG. 8, the searching unit 42 includes:

a parsing module 421, configured to parse a retrieval condition and a retrieval keyword from the retrieval information;

a searching module 422, configured to traverse, according to the retrieval keyword, the attribute information under each attribute field in the columnar storage database one by one to find matching target attribute information;

a determination module 423, configured to determine, according to the target attribute information, the target attribute field to which it belongs;

the searching module 422 is further configured to search, under the target attribute field, for a retrieval result matching the retrieval condition.

In summary, the embodiments of the present invention provide data storage and data retrieval methods and devices. Annotations are added in advance to the heterogeneous data sources to be collected. When a data collection instruction is received, a data collection operation is performed according to the annotations embedded in the heterogeneous data source to obtain the target data, and the target data is then stored in a columnar storage database using a preset configuration file. Collection and storage are thus implemented in a general manner for every heterogeneous data source, and when a retrieval instruction is received, the columnar storage database can return the corresponding retrieval results. Compared with the prior art, this solves the problem that custom development of collection, storage, and retrieval functions for each heterogeneous data source is costly and hard to maintain. The embodiments of the present invention optimize the collection and storage operations for heterogeneous data sources: no custom development is needed per data source, that is, no intrusive code changes are required, and collection and storage are instead implemented in a general manner, with a correspondingly general retrieval function. This generalized approach reduces processing cost, is efficient, and eases subsequent maintenance and management.

The data storage device for heterogeneous data sources includes a processor and a memory. The receiving unit, searching unit, collection unit, storage unit, and so on described above are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.

The data retrieval device for heterogeneous data sources includes a processor and a memory. The receiving unit, searching unit, and so on described above are all stored in the memory as program units, and the processor executes these program units stored in the memory to implement the corresponding functions.

The processor includes a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the collection, storage, and retrieval operations on heterogeneous data sources are implemented in a general manner, which optimizes the related data processing methods for heterogeneous data sources, reduces data processing cost, improves data processing efficiency, and eases subsequent maintenance and management.

An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the data storage method for heterogeneous data sources, or the data retrieval method for heterogeneous data sources, described above.

An embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the data storage method for heterogeneous data sources, or the data retrieval method for heterogeneous data sources, described above.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.

In a typical configuration, a device includes one or more processors (CPUs), memory, and a bus. The device may also include input/output interfaces, network interfaces, and the like.

The memory may include forms of computer-readable media such as non-persistent memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip. Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprising", "including", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) that contain computer-usable program code.

The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent insertion, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims

1. A data storage method for heterogeneous data sources, characterized in that the method comprises:

2. The method according to claim 1, characterized in that the annotations include an interface and a data conversion format of the heterogeneous data source, and performing a data collection operation according to the annotations embedded in the heterogeneous data source to obtain target data comprises:

3. The method according to claim 1, characterized in that storing the target data in a columnar storage database using a preset configuration file comprises:

4. The method according to claim 3, characterized in that the method further comprises:

5. The method according to claim 4, characterized in that storing the target data in the columnar storage database comprises:

6. A data retrieval method for heterogeneous data sources, characterized in that it is applied to the columnar storage database obtained by the data storage method for heterogeneous data sources according to any one of claims 1 to 5, and the method comprises:

7. The method according to claim 6, characterized in that searching for a retrieval result matching the retrieval information by traversing each attribute column in the columnar storage database comprises:

8. A data storage device for heterogeneous data sources, characterized in that the device comprises:

9. A data retrieval device for heterogeneous data sources, characterized in that the device comprises:

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, it implements the data storage method for heterogeneous data sources according to any one of claims 1 to 5;

or, when the computer program is executed by a processor, it implements the data retrieval method for heterogeneous data sources according to claim 6 or 7.

11. An electronic device, characterized by comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the data storage method for heterogeneous data sources according to any one of claims 1 to 5;