CN113836175B

Info

Publication number: CN113836175B
Application number: CN202010592659.6A
Authority: CN
Inventors: 徐陇浙
Original assignee: Zhejiang Uniview Technologies Co Ltd
Current assignee: Zhejiang Uniview Technologies Co Ltd
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2024-08-09
Anticipated expiration: 2040-06-24
Also published as: CN113836175A

Abstract

The embodiments of the present application disclose a data access method, apparatus, device and storage medium. The method includes: determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule; determining a target data processing engine according to the target instruction rule; and processing the data access instruction through the target data processing engine. This scheme automatically determines, from the data access instruction itself, the data processing engine suited to processing it, so technicians do not need to select an engine manually. This improves the efficiency of processing data access instructions and reduces the difficulty of data service development.

Description

Data access method, apparatus, device and storage medium

Technical Field

The embodiments of the present application relate to the field of data processing technology, and in particular to a data access method, apparatus, device and storage medium.

Background Art

In the current big data ecosystem, different data processing engines provide different capabilities. For example, relational data processing engines achieve high concurrency but cannot support access to massive data; NoSQL databases support access to massive data but cannot support queries with complex statements; search engines support access to massive data and a fairly rich query syntax, but their performance is middling; Hadoop-based offline computing engines can process large amounts of data, but with low processing efficiency.

When using these data processing engines, business personnel need to pick the engine that fits their business scenario in order to formulate a reasonable data storage and query strategy. In addition, these engines often expose different interfaces or DSLs. Working efficiently therefore requires a fairly deep understanding of each engine during development, which increases the difficulty of developing big data services.

Summary of the Invention

The embodiments of the present application provide a data access method, apparatus, device and storage medium that automatically determine an applicable data processing engine according to a data access instruction, so that the data is processed directly through that engine.

In one embodiment, the present application provides a data access method, the method comprising:

determining whether a data access instruction hits a candidate instruction rule, and determining the candidate instruction rule hit by the data access instruction as a target instruction rule;

determining a target data processing engine according to the target instruction rule;

processing the data access instruction through the target data processing engine.

In another embodiment, the present application further provides a data access apparatus, the apparatus comprising:

a target instruction rule determination module, configured to determine whether a data access instruction hits a candidate instruction rule, and to determine the candidate instruction rule hit by the data access instruction as a target instruction rule;

a target data processing engine determination module, configured to determine a target data processing engine according to the target instruction rule;

a processing module, configured to process the data access instruction through the target data processing engine.

In yet another embodiment, the present application further provides a device, including: one or more processors;

a memory for storing one or more programs;

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the data access method described in any one of the embodiments of the present application.

In a further embodiment, the present application also provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the data access method described in any one of the embodiments of the present application is implemented.

In the embodiments of the present application, determining whether a data access instruction hits a candidate instruction rule, and taking the rule that is hit as the target instruction rule, establishes the rule conditions that the data access instruction satisfies. The rule conditions associated with each data processing engine then determine the target data processing engine suited to processing the instruction, and the instruction is processed by that engine. Technicians do not need to select a data processing engine manually, which improves the efficiency of processing data access instructions and reduces the difficulty of data service development.

Brief Description of the Drawings

FIG. 1 is a flow chart of a data access method provided by an embodiment of the present invention;

FIG. 2 is a flow chart of a data access method provided by another embodiment of the present invention;

FIG. 3 is a schematic diagram of the structure of a data access apparatus provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the structure of a data access device provided by an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here only explain the present invention and do not limit it. For ease of description, the drawings show only the parts related to the present invention rather than all structures.

FIG. 1 is a flow chart of a data access method provided by an embodiment of the present invention. The method applies where data access instructions are processed by different data processing engines; typically, it applies where an applicable target data processing engine is selected automatically according to the data access instruction and then processes that instruction. The method may be executed by a data access apparatus, which may be implemented in software and/or hardware and may be integrated in a data access device. Referring to FIG. 1, the method of this embodiment specifically includes:

S110. Determine whether the data access instruction hits a candidate instruction rule, and determine the candidate instruction rule hit by the data access instruction as the target instruction rule.

The data access instruction may be an instruction that operates on data, such as an SQL statement. A candidate instruction rule may be an instruction rule written in advance according to the functional characteristics of a candidate data processing engine, and includes the access type and access condition of the instruction. For example, one functional characteristic of HBase is that it can only respond to and process data access instructions whose select clause contains no function, so the corresponding candidate instruction rule may be set to "Project rule: matches a data selection operation that contains no sub-function". If the data access instruction meets the conditions of the candidate instruction rule, that rule is determined as the target instruction rule.

In an embodiment of the present application, determining whether a data access instruction hits a candidate instruction rule includes: matching the access type in the data access instruction against the access type in the candidate instruction rule; and, if the match succeeds, determining whether the access condition in the data access instruction hits the access condition in the candidate instruction rule.

For example, the data access instruction "select c1 from t" is compared with the candidate instruction rule. Its "select" access type hits the "data selection operation" of the Project rule, and the instruction satisfies the Project rule's condition of containing no sub-function. The instruction therefore hits the candidate instruction rule "Project rule: matches a data selection operation that contains no sub-function", and that candidate instruction rule is determined as the target instruction rule.
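The two-stage check described above (access type first, access condition second) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Instruction` and `CandidateRule` classes, their fields, and the `hits` function are hypothetical names introduced here, and a real implementation would work on a parsed SQL tree rather than on pre-computed flags.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Instruction:
    """Hypothetical, pre-parsed view of a data access instruction."""
    access_type: str           # e.g. "select"
    select_has_function: bool  # True if the select clause calls a function


@dataclass(frozen=True)
class CandidateRule:
    """Hypothetical candidate instruction rule: an access type plus a condition."""
    name: str
    access_type: str
    condition: Callable[[Instruction], bool]


def hits(instruction: Instruction, rule: CandidateRule) -> bool:
    # Stage 1: the access types must match.
    if instruction.access_type != rule.access_type:
        return False
    # Stage 2: only then is the access condition evaluated.
    return rule.condition(instruction)


# "Project rule: matches a data selection operation that contains no sub-function"
project_rule = CandidateRule(
    name="Project",
    access_type="select",
    condition=lambda i: not i.select_has_function,
)

plain_select = Instruction("select", False)    # stands for: select c1 from t
function_select = Instruction("select", True)  # a select clause with a function
```

With these definitions, `hits(plain_select, project_rule)` evaluates to true and `hits(function_select, project_rule)` to false, mirroring the "select c1 from t" example.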

Determining whether a data access instruction hits a candidate instruction rule, and taking the rule that is hit as the target instruction rule, identifies the conditions the instruction satisfies. A data processing engine that can meet those conditions is then selected to process the instruction, which makes engine selection automatic and improves data processing efficiency.

S120. Determine a target data processing engine according to the target instruction rule.

Different data processing engines have their own functional characteristics, and these may differ from engine to engine. For a given data access instruction, the engine suited to processing it therefore has to be selected according to those characteristics.

For example, the HBase data processing engine is selected when the following conditions are met: the select clause contains no function; the where clause uses only the primary key as a condition or has no condition, the condition operator is at least one of =, <=, >, >=, !=, in, between and Like, and the condition operand is a literal with no function, expression or field reference, where a Like operation can only be a left Like operation, not a right Like operation or a left-and-right Like operation; the order by clause contains only the primary key or the timestamp field; there is no group by clause; the Join statement has no multi-table join; and there is no union all operation, although a union operation is allowed.

When the conditions for the HBase data processing engine are not met, the HBase+ElasticSearch data processing engine is selected if the following conditions are met: the select clause contains no function, or contains functions that are at least one of count, sum, aver, max and min; there are only semi-join (SemiJoin) statements, and the amount of joined data is not greater than a preset quantity threshold; there are no other types of Join statement; and there are no union or union all statements.

When the conditions for HBase+ElasticSearch are not met, the GreenPlum data processing engine is used if the SQL statement has no multi-table Join operation. When the conditions for the GreenPlum data processing engine are not met, the Hadoop Parquet data processing engine is used.
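The fall-through selection described above can be sketched as a single function. This is a simplified illustration, not the patented implementation: `QueryFeatures`, its fields, `select_engine` and the default threshold of 10,000 rows are all hypothetical, the where-clause operator and Like restrictions are collapsed into one boolean, and "functions that are at least one of count, sum, aver, max and min" is read here as "every function used is one of those five", which is an assumption.

```python
from dataclasses import dataclass

AGGREGATES = frozenset({"count", "sum", "aver", "max", "min"})


@dataclass(frozen=True)
class QueryFeatures:
    """Hypothetical summary of one SQL statement; defaults describe a trivial query."""
    select_functions: frozenset = frozenset()
    where_only_primary_key: bool = True      # primary-key-only condition, or none
    order_by_only_key_or_timestamp: bool = True
    has_group_by: bool = False
    join_kind: str = "none"                  # "none", "semi" or "multi"
    semi_join_rows: int = 0                  # amount of data joined by a semi-join
    has_union: bool = False
    has_union_all: bool = False


def select_engine(q: QueryFeatures, semi_join_limit: int = 10_000) -> str:
    no_multi_join = q.join_kind != "multi"
    # Tier 1: HBase (fastest, narrowest). A plain union is allowed, union all is not.
    if (not q.select_functions and q.where_only_primary_key
            and q.order_by_only_key_or_timestamp and not q.has_group_by
            and no_multi_join and not q.has_union_all):
        return "HBase"
    # Tier 2: HBase+ElasticSearch (aggregates only, small semi-joins, no unions).
    if (q.select_functions <= AGGREGATES and no_multi_join
            and q.semi_join_rows <= semi_join_limit
            and not q.has_union and not q.has_union_all):
        return "HBase+ElasticSearch"
    # Tier 3: GreenPlum (anything without a multi-table join).
    if no_multi_join:
        return "GreenPlum"
    # Tier 4: Hadoop Parquet is the catch-all.
    return "Hadoop Parquet"
```

For instance, a statement whose select clause uses only `count` falls past the HBase tier and lands on HBase+ElasticSearch, while any statement with a multi-table join reaches Hadoop Parquet.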

In the embodiments of the present application, because the candidate instruction rules are written in advance according to the functional characteristics of the candidate data processing engines, the corresponding target data processing engine can be determined from the candidate instruction rule that the data access instruction hits.

In an embodiment of the present application, determining the target data processing engine according to the target instruction rule includes: determining the target data processing engine from the candidate data processing engines according to the target instruction rule and the association between candidate instruction rules and candidate data processing engines. Before this determination, the method further includes: writing the candidate instruction rules corresponding to each candidate data processing engine according to that engine's functional characteristics; and establishing the association between the candidate data processing engine and its corresponding candidate instruction rules.

The candidate data processing engines may be the HBase data processing engine, the HBase+ElasticSearch data processing engine, the GreenPlum data processing engine, the Hadoop Parquet data processing engine, and so on. Because the association between candidate instruction rules and candidate data processing engines is established in advance, the target data processing engine corresponding to the target instruction rule hit by the data access instruction can be looked up through that association. The applicable target data processing engine is thus selected automatically according to the conditions of the target instruction rule, which improves data processing efficiency.
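One simple way to hold the pre-established association is a lookup table from rule to engine. The sketch below assumes hypothetical rule identifiers such as `"hbase.project_no_function"`; the text does not specify how rules are named or stored.

```python
from typing import Dict, List

# Hypothetical association table: candidate rule name -> candidate engine.
ENGINE_BY_RULE: Dict[str, str] = {}


def register(engine: str, rule_names: List[str]) -> None:
    """Associate every rule written for an engine with that engine."""
    for name in rule_names:
        ENGINE_BY_RULE[name] = engine


def engine_for(target_rule: str) -> str:
    """Look up the target engine for the rule that the instruction hit."""
    return ENGINE_BY_RULE[target_rule]


# Built once, ahead of any query (rule names are illustrative only).
register("HBase", ["hbase.project_no_function", "hbase.filter_primary_key"])
register("HBase+ElasticSearch", ["es_hbase.project_aggregate"])
register("GreenPlum", ["greenplum.no_multi_table_join"])
register("Hadoop Parquet", ["parquet.fallback"])
```

Once the table is built, selecting the engine for a hit rule is a constant-time lookup, for example `engine_for("es_hbase.project_aggregate")`.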

S130. Process the data access instruction through the target data processing engine.

Because the data access instruction hits the target instruction rule, and the target instruction rule corresponds to the target data processing engine, that engine meets the processing requirements of the instruction and has the functional characteristics needed to process it. The instruction can therefore be processed by the target data processing engine promptly and efficiently. Technicians do not need to select an applicable engine manually for each instruction, which removes the strict technical requirements otherwise placed on them and improves data processing efficiency.

In the embodiments of the present application, a data ingestion operation takes place before data is queried. Specifically, data enters the SQL parsing module in the form of SQL insert statements. The SQL parsing module parses the SQL into an AST and validates the data; the AST is then converted into structured data, which is serialized and written to Kafka. From Kafka, the data is inserted via the Flink consumer engine into the database corresponding to the HBase data processing engine and the database corresponding to the Hadoop Parquet data processing engine, and the HBase data is synchronously inserted, via an HBase coprocessor, into the database corresponding to the ElasticSearch data processing engine. Spark periodically cleans the data in the database corresponding to the Hadoop Parquet data processing engine, for example by deduplicating data and compacting files, to keep it consistent with the databases of the other data processing engines. Likewise, the timestamp field of the database corresponding to the HBase data processing engine is set to the time at which the data was ingested. Each data processing engine consumes data through its own Kafka consumer group, so the offset each engine has consumed up to, and the ingestion time of the data currently being consumed, can be obtained from the group and used to estimate the current ingestion progress of the whole system.
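The progress estimate mentioned at the end of the paragraph above could be computed along the following lines. The text does not say how the per-group values are combined, so this sketch assumes that overall progress is bounded by the slowest consumer group; `GroupState`, its fields and `system_progress` are hypothetical, and the offsets are plain numbers rather than values read from a real Kafka client.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GroupState:
    """Hypothetical snapshot of one engine's Kafka consumer group."""
    group: str
    committed_offset: int   # offset the group has consumed up to
    log_end_offset: int     # latest offset available in the topic
    last_ingest_time: int   # ingestion time (epoch seconds) of the record being consumed


def lag(state: GroupState) -> int:
    """Records still waiting to be consumed by this group."""
    return max(state.log_end_offset - state.committed_offset, 0)


def system_progress(states: List[GroupState]) -> Tuple[int, Dict[str, int]]:
    """Return (ingestion time reached by the slowest group, lag per group)."""
    slowest = min(states, key=lambda s: s.last_ingest_time)
    return slowest.last_ingest_time, {s.group: lag(s) for s in states}


states = [
    GroupState("hbase", 950, 1000, 1_700_000_100),
    GroupState("parquet", 900, 1000, 1_700_000_040),
]
progress_time, lags = system_progress(states)
```

Under this assumption, data ingested up to `progress_time` is visible in every engine, which is the property a query router would need before choosing among them.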

In the embodiments of the present application, determining whether a data access instruction hits a candidate instruction rule, and taking the rule that is hit as the target instruction rule, establishes the rule conditions that the data access instruction satisfies. The rule conditions associated with each data processing engine then determine the target data processing engine suited to processing the instruction, and the instruction is processed by that engine. Technicians do not need to select a data processing engine manually, which improves the efficiency of processing data access instructions and reduces the difficulty of data service development.

FIG. 2 is a flow chart of a data access method provided by another embodiment of the present invention. This embodiment is an optimization of the embodiment above; for details not described here, refer to that embodiment. Referring to FIG. 2, the data access method provided by this embodiment may include:

S210. Write the candidate instruction rules corresponding to each candidate data processing engine according to the functional characteristics of that engine.

Different data processing engines may have different functional characteristics. For example, if the select clause contains no function, the HBase data processing engine can be selected; if it contains a function, the HBase data processing engine cannot process the instruction and another engine has to be selected. Ordinarily, technicians must themselves master the functional characteristics of each data processing engine and use them to decide whether a given engine can process the data access instruction. If it can, that engine is selected; if not, another engine that can process the instruction is selected according to the functional characteristics of the other engines. This approach places high technical and professional demands on technicians, requires an in-depth understanding of every data processing engine, and makes service development difficult.

In the embodiments of the present application, the candidate instruction rules corresponding to each candidate data processing engine are written according to that engine's functional characteristics and then stored, so technicians do not need an in-depth understanding of each engine. This lowers the demands on technicians and makes it possible to determine the target data processing engine quickly and process the data access instruction promptly.

S220. Establish the association between the candidate data processing engine and its corresponding candidate instruction rules.

To make it easy to determine later which target data processing engine corresponds to the target instruction rule, once the candidate instruction rules of each candidate data processing engine have been determined, the association between each candidate data processing engine and its candidate instruction rules is established and stored. Because the target instruction rule is one of the candidate instruction rules, the target data processing engine corresponding to it can be determined from this association. The data processing engine is thus selected adaptively, without manual selection.

S230. Determine whether the data access instruction hits a candidate instruction rule, and determine the candidate instruction rule hit by the data access instruction as the target instruction rule.

In an embodiment of the present application, this step includes: determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine; if not, determining whether it hits the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines; if not, determining whether it hits the candidate instruction rule corresponding to the GreenPlum data processing engine; and if not, determining whether it hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine.

The HBase data processing engine is the fastest but handles the narrowest range of instructions, so the method first checks whether the data access instruction can be processed by HBase. Ordered from fastest to slowest, the data processing engines are: HBase, HBase and ElasticSearch, GreenPlum, Hadoop Parquet. The candidate instruction rule hit by the data access instruction can therefore be determined in this priority order. The engines can also be ordered differently according to actual needs; no specific restriction is imposed here.

Determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine includes: if the data access instruction hits the data scanning operation of the TableScan rule; and/or hits the data selection operation of the Project rule and contains no sub-function; and/or hits the data sorting operation of the Sort rule and has only one sort field; and/or hits the data filtering operation of the Filter rule, where the filter condition includes only the primary key value or there is no condition; and/or hits the data semi-join operation of the Join rule and has no multi-table join operation; and/or hits the data merging operation of the Union rule, then the data access instruction is determined to hit the candidate instruction rule corresponding to the HBase data processing engine.

Determining whether the data access instruction hits the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines includes: if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine, and it hits the data selection operation of the Project rule with no sub-function or with functions that are at least one of count, sum, aver, max and min; and/or hits the data filtering operation of the Filter rule, contains no function, and includes a fuzzy query; and/or hits the data semi-join operation of the Join rule with an amount of joined data below a preset quantity threshold, then the data access instruction is determined to hit the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines.

Determining whether the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine includes: if the data access instruction hits neither the candidate instruction rule corresponding to the HBase data processing engine nor the one corresponding to the HBase and ElasticSearch data processing engines, and, where it hits the data semi-join operation of the Join rule, there is no multi-table join operation, then the data access instruction is determined to hit the candidate instruction rule corresponding to the GreenPlum data processing engine.

Determining whether the data access instruction hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine includes: if the data access instruction hits none of the candidate instruction rules corresponding to the HBase data processing engine, the HBase and ElasticSearch data processing engines, or the GreenPlum data processing engine, then the data access instruction is determined to hit the candidate instruction rule corresponding to the Hadoop Parquet data processing engine.
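The per-operation rules for the HBase tier can be expressed as a table of predicates keyed by operation type. The sketch below takes one possible reading of the "and/or" wording, namely that every operation present in the instruction must satisfy its rule; that reading, the dictionary-based node format, and the names `HBASE_RULES` and `hits_hbase` are assumptions made for illustration. The union-all exclusion comes from the earlier description of the HBase conditions.

```python
# Hypothetical node format: each operation maps to a dict of its properties.
HBASE_RULES = {
    # Any table scan is acceptable.
    "TableScan": lambda n: True,
    # Data selection with no sub-function.
    "Project": lambda n: not n.get("functions"),
    # Data sorting on exactly one field.
    "Sort": lambda n: len(n.get("fields", [])) == 1,
    # Filtering on the primary key only, or with no condition at all.
    "Filter": lambda n: set(n.get("columns", [])) <= {n.get("primary_key")},
    # Semi-join only (so no multi-table join).
    "Join": lambda n: n.get("kind") == "semi",
    # Union is allowed, union all is not.
    "Union": lambda n: not n.get("all", False),
}


def hits_hbase(operations: dict) -> bool:
    """True if every operation in the instruction satisfies its HBase rule."""
    return all(
        op in HBASE_RULES and HBASE_RULES[op](node)
        for op, node in operations.items()
    )


key_lookup = {
    "TableScan": {},
    "Filter": {"columns": ["id"], "primary_key": "id"},
    "Sort": {"fields": ["id"]},
}
function_query = {"TableScan": {}, "Project": {"functions": ["upper"]}}
```

Here `hits_hbase(key_lookup)` holds, while `hits_hbase(function_query)` does not, so the second instruction would be passed on to the HBase and ElasticSearch check.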

Specifically, the SQL query statement is parsed into an AST by Calcite Core SQL, and the Project, TableScan, Join, Filter, Sort, and Union operations are handled as follows:

1) TableScan rule: matches the data scanning operation, that is, the TableScan node, and replaces the matched TableScan object in the statement with HBaseTableScan. In other words, the HBase data processing engine is selected by default to process the data access instruction.

2) Project rule: matches the data selection operation, that is, the select clause in SQL. When the rule is satisfied, determine whether the Project contains a function. If it does, determine whether the functions include at least one of count, sum, aver, max, and min. If so, copy the Project subtree and replace it with a new equivalent subtree OptProject, and replace the TableScan object with an ElasticSearchHBaseTableScan object; that is, the HBase+ElasticSearch data processing engine is selected to process the data access instruction. If not, traverse the nodes of the Project subtree, copy the Project subtree and replace it with a new equivalent subtree OptProject, and replace the TableScan object with a ParquetTableScan object; that is, the Hadoop Parquet data processing engine is selected to process the data access instruction. If the Project contains no function, the HBase data processing engine is selected to process the data access instruction.

3) Sort rule: matches the data sorting operation. When the rule is satisfied, determine whether the only sorting field is the unique key. If so, the HBase data processing engine is selected to process the data access instruction. For any other sorting, copy the Sort subtree and replace it with a new equivalent subtree OptSort, and replace TableScan with ElasticSearchHBaseTableScan; that is, the HBase+ElasticSearch data processing engine is selected to process the data access instruction.

4) Filter rule: matches the data filtering operation, which corresponds to the where condition in SQL. When the rule is satisfied, determine whether the query condition contains a function. If it does, copy the Filter subtree and replace it with a new equivalent subtree OptFilter, and replace TableScan with GreePlumTableScan; that is, the GreenPlum data processing engine is selected to process the data access instruction. If it does not contain a function, determine whether there is a fuzzy query. If there is a fuzzy query, copy the Filter subtree and replace it with a new equivalent subtree OptFilter, and replace TableScan with ElasticSearchHBaseTableScan; that is, the HBase+ElasticSearch data processing engine is selected to process the data access instruction. If there is no fuzzy query and the query condition contains only the unique ID, the HBase data processing engine is selected to process the data access instruction; otherwise, copy the Filter subtree and replace it with a new equivalent subtree OptFilter, and replace TableScan with GreePlumTableScan; that is, the GreenPlum data processing engine is selected. If a LIKE operation is included and it is a left LIKE operation, the HBase data processing engine is selected to process the data access instruction; otherwise, replace the Filter subtree with a new equivalent subtree OptFilter, and replace TableScan with GreePlumTableScan; that is, the GreenPlum data processing engine is selected to process the data access instruction.

5) Join rule: matches Join operations between data tables, such as left join, right join, inner join, or outer join. When the rule is satisfied, check the matched Join object and its left and right subtrees to determine whether it is a SEMI JOIN operation, that is, an operation such as in or exists. If it is a SEMI JOIN operation, obtain the Filter condition of the query on the right side of the SEMI JOIN operation and call the ElasticSearch native API with that condition to query the total number of records that satisfy it. If the total is greater than the preset quantity threshold, copy the Join subtree and replace it with a new equivalent subtree OptJoin, and replace the ElasticSearchHBaseTableScan on both sides of the SEMI JOIN operation with ParquetTableScan; that is, the Hadoop Parquet data processing engine is selected to process the data access instruction. If the total is not greater than the threshold, the HBase data processing engine is selected to process the data access instruction. For any other type of JOIN statement, copy the Join subtree and replace it with a new equivalent subtree OptJoin, and replace the ElasticSearchHBaseTableScan on both sides of the JOIN statement with ParquetTableScan; that is, the Hadoop Parquet data processing engine is selected to process the data access instruction.

6) Union rule: matches Union and Union ALL operations. When the rule is satisfied, check whether the operation is a Union ALL operation. If it is, copy the Union subtree and replace it with a new equivalent subtree OptUnion, and replace all TableScan objects in the subtree with ParquetTableScan; that is, the Hadoop Parquet data processing engine is selected to process the data access instruction. If it is an ordinary Union operation, the HBase data processing engine is selected to process the data access instruction.
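
The decisions made by rules 2), 3), 5), and 6) above can be summarized, purely as a sketch, by small Python functions that return the name of the scan node chosen for each operation. This is not Calcite code and performs no subtree rewriting; the function names, parameters, and the `count_matches` callback (which stands in for the ElasticSearch count query) are hypothetical. The function names in `AGGREGATES` are copied as they are listed in the text.

```python
from typing import Callable, Iterable

# Function names exactly as listed in the Project rule.
AGGREGATES = {"count", "sum", "aver", "max", "min"}


def project_scan(functions: Iterable[str]) -> str:
    """Project rule: choose a scan node from the functions in the select clause."""
    funcs = {name.lower() for name in functions}
    if not funcs:
        return "HBaseTableScan"               # no function: HBase
    if funcs & AGGREGATES:
        return "ElasticSearchHBaseTableScan"  # at least one listed aggregate
    return "ParquetTableScan"                 # only other functions


def sort_scan(sort_fields: Iterable[str], unique_key: str) -> str:
    """Sort rule: HBase only when the sole sorting field is the unique key."""
    if list(sort_fields) == [unique_key]:
        return "HBaseTableScan"
    return "ElasticSearchHBaseTableScan"


def join_scan(is_semi_join: bool, count_matches: Callable[[], int], threshold: int) -> str:
    """Join rule: count the right-hand matches only for a semi-join."""
    if not is_semi_join:
        return "ParquetTableScan"
    if count_matches() > threshold:
        return "ParquetTableScan"
    return "HBaseTableScan"


def union_scan(is_union_all: bool) -> str:
    """Union rule: Union ALL goes to Parquet, an ordinary Union stays on HBase."""
    return "ParquetTableScan" if is_union_all else "HBaseTableScan"
```

Passing the count as a callback keeps the potentially expensive lookup out of the path for ordinary joins, which the Join rule sends to Hadoop Parquet without consulting the threshold.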

S240. Determine a target data processing engine according to the target instruction rule.

S250. Process the data access instruction through the target data processing engine.

Before the data access instruction is processed through the target data processing engine, the method further includes: performing standardized encapsulation of the interfaces corresponding to the candidate data processing engines to determine a standard interface for receiving the data access instruction.

By encapsulating the interfaces of the candidate data processing engines, a single standard interface is exposed externally, so that the data processing engines together appear to the outside as one database. Users only need to create data tables and do not need to consider the different data storage schemes and data query schemes, which achieves unified and convenient processing.
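
As a minimal sketch of this kind of encapsulation, assuming each engine can be wrapped behind a common `execute` method, a facade might look like the following. The class names `Engine`, `EchoEngine`, and `StandardInterface`, and the `choose` callback that stands in for the rule matching described earlier, are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class Engine(ABC):
    """Common wrapper that every candidate engine is assumed to implement."""

    @abstractmethod
    def execute(self, statement: str) -> List[dict]:
        ...


class EchoEngine(Engine):
    """Stand-in engine that only records which backend handled the statement."""

    def __init__(self, name: str) -> None:
        self.name = name

    def execute(self, statement: str) -> List[dict]:
        return [{"engine": self.name, "statement": statement}]


class StandardInterface:
    """Single entry point: callers submit a statement and never name an engine."""

    def __init__(self, engines: Dict[str, Engine], choose: Callable[[str], str]) -> None:
        self._engines = engines
        self._choose = choose  # placeholder for the rule-based engine selection

    def query(self, statement: str) -> List[dict]:
        engine_name = self._choose(statement)
        return self._engines[engine_name].execute(statement)
```

A caller would construct `StandardInterface` once with the wrapped engines and then call `query` for every statement; which engine runs the statement remains an internal detail.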

In the technical solution of the embodiments of the present application, candidate instruction rules corresponding to the candidate data processing engines are written according to the functional characteristics of each candidate data processing engine and are then stored. Technical personnel therefore do not need an in-depth understanding of the functional characteristics of each data processing engine, which lowers the requirements placed on them and makes it possible to determine the target data processing engine quickly and process data access instructions promptly. According to the association between the candidate instruction rules and the candidate data processing engines, the target data processing engine corresponding to the target instruction rule can be determined, which realizes intelligent, adaptive selection of the data processing engine without manual selection and improves processing efficiency.

FIG. 3 is a schematic structural diagram of a data access apparatus provided by an embodiment of the present invention. The apparatus is applicable to cases where data access instructions are processed by different data processing engines. Typically, the embodiments of the present application are applicable to cases where a suitable target data processing engine is automatically selected according to the data access instruction, so that the data access instruction is processed by the target data processing engine. The apparatus may be implemented in software and/or hardware and may be integrated in a data access device. Referring to FIG. 3, the apparatus specifically includes:

a target instruction rule determination module 310, configured to determine whether a data access instruction hits a candidate instruction rule, and to determine the candidate instruction rule hit by the data access instruction as a target instruction rule;

a target data processing engine determination module 320, configured to determine a target data processing engine according to the target instruction rule;

a processing module 330, configured to process the data access instruction through the target data processing engine.

In the embodiments of the present application, the target instruction rule determination module 310 includes:

an access type matching unit, configured to match the access type in the data access instruction against the access type in the candidate instruction rule;

an access condition matching unit, configured to determine, if the match succeeds, whether the access condition in the data access instruction hits the access condition in the candidate instruction rule;

a first hit unit, configured to determine whether the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine;

a second hit unit, configured to determine, if there is no hit, whether the data access instruction hits the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines;

a third hit unit, configured to determine, if there is no hit, whether the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine;

a fourth hit unit, configured to determine, if there is no hit, whether the data access instruction hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine.
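
The two-stage matching performed by the access type matching unit and the access condition matching unit, followed by the ordered hit checks, might be sketched as follows. This is an illustrative assumption rather than the embodiment's implementation: `CandidateRule`, `hits`, `first_hit`, the `details` dictionary, and its keys are all hypothetical names, and the two example rules paraphrase only the Sort and Filter conditions given for the first and second hit units.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class CandidateRule:
    name: str
    access_type: str                                # e.g. "Sort", "Filter"
    condition: Callable[[Dict[str, object]], bool]  # access condition check


def hits(rule: CandidateRule, access_type: str, details: Dict[str, object]) -> bool:
    """Stage 1: compare access types. Stage 2: evaluate the access condition."""
    if rule.access_type != access_type:
        return False
    return rule.condition(details)


def first_hit(rules: Iterable[CandidateRule], access_type: str,
              details: Dict[str, object]) -> Optional[CandidateRule]:
    """Try the rules in order and return the first one that is hit, if any."""
    for rule in rules:
        if hits(rule, access_type, details):
            return rule
    return None


# Example rules, listed in priority order.
RULES = [
    CandidateRule("hbase_sort", "Sort",
                  lambda d: d.get("sort_field_count") == 1),
    CandidateRule("hbase_es_filter", "Filter",
                  lambda d: not d.get("has_function") and bool(d.get("has_fuzzy_query"))),
]
```

Checking the access type first means the condition callback is evaluated only for rules that could apply to the operation at hand.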

In the embodiments of the present application, the first hit unit is specifically configured to:

if the data access instruction hits the data scanning operation of the TableScan rule, and/or,

if the data access instruction hits the data selection operation of the Project rule and contains no sub-function, and/or,

if the data access instruction hits the data sorting operation of the Sort rule and there is only one sorting field, and/or,

if the data access instruction hits the data filtering operation of the Filter rule, and the condition of the data filtering operation includes only the primary key value or there is no condition, and/or,

if the data access instruction hits the data semi-join operation of the Join rule and there is no multi-table join operation, and/or,

if the data access instruction hits the data merging operation of the Union rule, determine that the data access instruction hits the candidate instruction rule corresponding to the HBase data processing engine.

In the embodiments of the present application, the second hit unit is specifically configured to:

if the data access instruction does not hit the candidate instruction rule corresponding to the HBase data processing engine, and,

if the data access instruction hits the data selection operation of the Project rule, and either has no sub-function or uses at least one of the functions count, sum, aver, max, and min, and/or,

if the data access instruction hits the data filtering operation of the Filter rule, contains no function, and contains a fuzzy query, and/or,

if the data access instruction hits the data semi-join operation of the Join rule and the amount of joined data is less than a preset quantity threshold, determine that the data access instruction hits the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines.

In the embodiments of the present application, the third hit unit is specifically configured to:

if the data access instruction hits neither the candidate instruction rule corresponding to the HBase data processing engine nor the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines, and, when the data access instruction hits the data semi-join operation of the Join rule, there is no multi-table join operation, determine that the data access instruction hits the candidate instruction rule corresponding to the GreenPlum data processing engine.

In the embodiments of the present application, the fourth hit unit is specifically configured to:

if the data access instruction hits none of the candidate instruction rule corresponding to the HBase data processing engine, the candidate instruction rule corresponding to the HBase and ElasticSearch data processing engines, and the candidate instruction rule corresponding to the GreenPlum data processing engine, determine that the data access instruction hits the candidate instruction rule corresponding to the Hadoop Parquet data processing engine.

In the embodiments of the present application, the target data processing engine determination module 320 includes:

an association determination unit, configured to determine the target data processing engine from the candidate data processing engines according to the target instruction rule and the association between the candidate instruction rules and the candidate data processing engines.
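
One simple way to hold such an association, offered only as a sketch, is a lookup table keyed by rule. The `RuleEngineRegistry` class, its method names, and the rule names used with it are hypothetical; the embodiment does not specify how the association is stored.

```python
from typing import Dict


class RuleEngineRegistry:
    """Stores which candidate engine each candidate instruction rule belongs to."""

    def __init__(self) -> None:
        self._engine_by_rule: Dict[str, str] = {}

    def associate(self, rule_name: str, engine_name: str) -> None:
        """Record that the given rule is associated with the given engine."""
        self._engine_by_rule[rule_name] = engine_name

    def engine_for(self, rule_name: str) -> str:
        """Return the engine associated with the target rule."""
        try:
            return self._engine_by_rule[rule_name]
        except KeyError:
            raise LookupError(f"no engine associated with rule {rule_name!r}") from None
```

Once the target instruction rule is known, determining the target engine then reduces to a single call to `engine_for`.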

In the embodiments of the present application, the apparatus further includes:

a target instruction rule writing module, configured to write the candidate instruction rules corresponding to the candidate data processing engines according to the functional characteristics of the candidate data processing engines;

an association establishing module, configured to establish the association between each candidate data processing engine and its corresponding candidate instruction rule;

a standard interface determination module, configured to perform standardized encapsulation of the interfaces corresponding to the candidate data processing engines and determine a standard interface for receiving the data access instruction.

The data access apparatus provided in the embodiments of the present application can execute the data access method provided in any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method.

FIG. 4 is a schematic structural diagram of a data access device provided by an embodiment of the present invention. FIG. 4 shows a block diagram of an exemplary data access device 412 suitable for implementing the embodiments of the present application. The data access device 412 shown in FIG. 4 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 4, the data access device 412 may include: one or more processors 416; and a memory 428 for storing one or more programs which, when executed by the one or more processors 416, cause the one or more processors 416 to implement the data access method provided in the embodiments of the present application.

Components of the data access device 412 may include, but are not limited to, one or more processors 416, a memory 428, and a bus 418 that connects the different device components (including the memory 428 and the processors 416).

The bus 418 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The data access device 412 typically includes a variety of computer-readable storage media. These may be any available storage media that can be accessed by the data access device 412, including volatile and non-volatile storage media and removable and non-removable storage media.

The memory 428 may include computer-readable storage media in the form of volatile memory, such as random access memory (RAM) 430 and/or cache memory 432. The data access device 412 may further include other removable/non-removable, volatile/non-volatile computer storage media. By way of example only, the storage system 434 may be used to read from and write to a non-removable, non-volatile magnetic storage medium (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, or other optical storage medium) may be provided. In these cases, each drive may be connected to the bus 418 through one or more data storage medium interfaces. The memory 428 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 440 having a set of (at least one) program modules 442 may be stored, for example, in the memory 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 442 generally perform the functions and/or methods of the embodiments described in the present invention.

The data access device 412 may also communicate with one or more external devices 414 (such as a keyboard, a pointing device, or a display 426), with one or more devices that enable a user to interact with the data access device 412, and/or with any device (such as a network card or modem) that enables the data access device 412 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 422. In addition, the data access device 412 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 420. As shown in FIG. 4, the network adapter 420 communicates with the other modules of the data access device 412 through the bus 418. It should be understood that, although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with the data access device 412, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID devices, tape drives, and data backup storage devices.

The processor 416 executes various functional applications and data processing by running at least one of the programs stored in the memory 428, for example implementing the data access method provided in the embodiments of the present application.

An embodiment of the present invention provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the data access method.

The computer storage medium of the embodiments of the present application may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.

The program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, optical fiber cable, RF, or any suitable combination of the above.

Computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or device. Where a remote computer is involved, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.