CN111221842A - Big data processing system and method - Google Patents

Big data processing system and method

Info

Publication number
CN111221842A
Authority
CN
China
Prior art keywords
query
engine
query statement
statement
storage engine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811428198.8A
Other languages
Chinese (zh)
Inventor
刘思源
朱海龙
李铭
徐胜国
徐皓
李铮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201811428198.8A
Publication of CN111221842A
Legal status: Pending


Abstract


The invention discloses a big data processing system and method. The system includes: a service interface, which provides at least one external calling method and is adapted to receive query statements in a specific language format input via any such method; a parsing module, adapted to perform syntax parsing and verification on the query statement and generate a logical query plan; a routing module, adapted to determine a computing engine and/or a storage engine corresponding to the logical query plan and route the query statement to that computing engine and/or storage engine; and a plurality of computing engines and a plurality of storage engines, adapted to execute the corresponding query processing according to the query statement routed by the routing module and to obtain and output query results. With this solution, the user can process big data merely by inputting a query statement in a specific language format, thereby decoupling the user's business logic from the computing and storage engines, reducing the user's learning cost, and improving the user experience.

Figure 201811428198

Description

Big data processing system and method
Technical Field
The invention relates to the technical field of computers, in particular to a big data processing system and a big data processing method.
Background
With the continuous development of science, technology, and society, data of all kinds is growing explosively, and a large number of big data platforms have emerged that allow people to process massive data.
At present, on a big data platform the user still has to select the computing engine and/or storage engine required for a data query by himself, and then write the corresponding execution code according to the engine characteristics, grammar rules, and the like of the selected computing engine and/or storage engine in order to perform the query.
However, with an existing big data platform, a user must learn a large amount of knowledge about computing engines and storage engines, which greatly increases the learning cost and degrades the user experience. Moreover, technology develops rapidly and computing and storage engines iterate quickly; beyond raising the learning cost, a user's insufficient knowledge of the engines easily leads to a selected computing or storage engine that does not match the actual business logic, which reduces data processing efficiency. In addition, because the data processing logic of existing big data platforms is too tightly coupled to the computing or storage engine, replacing an engine requires recompiling the code, which raises the maintenance cost of the business.
Disclosure of Invention
In view of the above, the present invention has been developed to provide a big data processing system and method that overcome, or at least partially address, the above-discussed problems.
According to one aspect of the present invention, there is provided a big data processing system comprising:
the service interface provides at least one external calling mode and is suitable for receiving a query statement in a specific language format input by using any external calling mode;
the analysis module is suitable for performing syntax analysis and verification on the query statement to generate a logic query plan;
a routing module adapted to determine at least one compute engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan and route the query statement to the at least one compute engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines are suitable for executing corresponding query processing according to the query statements routed by the routing module, and obtaining and outputting query results.
According to another aspect of the present invention, there is provided a big data processing method, including:
providing at least one external calling mode, and receiving a query statement in a specific language format input by using any external calling mode;
performing syntax analysis and verification on the query statement to generate a logic query plan;
determining at least one computing engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan, and routing the query statement to the at least one computing engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines execute corresponding query processing according to the routed query statement to obtain and output a query result.
According to yet another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the big data processing method.
According to still another aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the big data processing method.
According to the big data processing system and method, the service interface provides at least one external calling mode and receives a query statement in a specific language format input via any of them; the parsing module performs syntax parsing and verification on the query statement to generate a logical query plan; the routing module then determines at least one computing engine and/or at least one storage engine corresponding to the logical query plan and routes the query statement to it; and the plurality of computing engines and plurality of storage engines execute the corresponding query processing according to the routed query statements to obtain and output query results. With this scheme, the user can process big data merely by inputting a query statement in a specific language format, without coding specifically for the characteristics of a computing engine or storage engine; the user's business logic is thus decoupled from the computing and storage engines, the learning cost is reduced, the user experience is improved, and both data processing efficiency and service maintenance benefit.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a functional block diagram of a big data processing system according to an embodiment of the present invention;
FIG. 2 is a functional block diagram of a big data processing system according to another embodiment of the present invention;
FIG. 3 is a flow chart illustrating a big data processing method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 is a functional block diagram of a big data processing system according to an embodiment of the present invention. As shown in fig. 1, the system includes: a service interface 11, a parsing module 12, a routing module 13, a plurality of compute engines 14, and a plurality of storage engines 15.
The service interface 11 provides at least one external calling mode and is adapted to receive a query statement in a specific language format input by using any external calling mode.
Unlike existing big data processing platforms, the big data processing system provided by the invention does not require the user to learn the grammatical structures of a computing engine and a storage engine: the user only needs to input a query statement in a specific language format through any of the at least one external calling mode provided by the system, and the subsequent modules carry out the data query. That is, the query statement in the specific language format in this embodiment may be a query logic statement, rather than a statement written specifically for the features and grammatical structure of a computing engine and storage engine as in the prior art, which greatly reduces the user's learning cost and improves the user experience.
The parsing module 12 is adapted to perform syntax parsing and verification on the query statement to generate a logical query plan.
Specifically, unlike the prior art, in which a statement is checked inside a computing engine and/or storage engine, in this embodiment the parsing module 12 performs an up-front syntax check on the query statement received by the service interface 11, in order to guarantee query efficiency and avoid wasting system resources. The specific syntax checking method is not limited in this embodiment, and those skilled in the art can set it according to actual requirements. Optionally, if the syntax of the query statement does not pass the check, corresponding prompt information can be fed back to the user, so that the user can correct the query statement in time.
After the syntax check succeeds, the query statement is further parsed to derive the corresponding logical query plan, from which the routing module 13, the plurality of compute engines 14, and/or the plurality of storage engines 15 obtain the final query results.
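The two-stage flow just described, an up-front syntax check followed by generation of a logical query plan, can be sketched minimally in Python. This is a toy grammar; all class and function names are illustrative assumptions, not from the patent.

```python
import re
from dataclasses import dataclass

@dataclass
class LogicalPlan:
    tables: list       # data sources referenced by the statement
    projection: list   # selected columns

class QuerySyntaxError(Exception):
    """Raised by the up-front check, before any engine is involved."""

def parse(sql: str) -> LogicalPlan:
    # Pre-engine syntax check: reject malformed statements early so no
    # compute or storage resources are wasted on them.
    m = re.match(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*$", sql, re.IGNORECASE)
    if m is None:
        raise QuerySyntaxError(f"malformed query: {sql!r}")
    columns = [c.strip() for c in m.group(1).split(",")]
    return LogicalPlan(tables=[m.group(2)], projection=columns)
```

A real parser would handle joins, predicates, and nesting; the point of the sketch is only that validation happens before anything is routed to an engine.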
A routing module 13 is adapted to determine, from the logical query plan, at least one compute engine and/or at least one storage engine corresponding to the logical query plan, and to route the query statement to the at least one compute engine and/or at least one storage engine.
After the parsing module 12 generates the logical query plan corresponding to the query statement, the routing module 13 further generates a corresponding physical execution plan according to the logical query plan, that is, it determines at least one computing engine and/or at least one storage engine corresponding to the logical query plan and routes the query statement to them. The query statement routed to the at least one computing engine and/or at least one storage engine may be the unprocessed query statement itself or a translated query statement. Preferably, in order to facilitate query processing by the computing engine and/or storage engine, the query statement may be translated into a query statement in the corresponding engine language, and the translated query statement routed to the at least one computing engine and/or at least one storage engine.
Therefore, in this embodiment, the calculation engine and the storage engine for executing the query statement are automatically selected by the system according to the query statement, so that the appropriate calculation engine and storage engine are automatically matched with the query statement, which is beneficial to improving the query efficiency of data.
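The automatic matching described above can be pictured as a lookup from the data sources in the plan to registered engines. The registry below is hypothetical, since the patent does not specify how the mapping is stored; a real system would read it from metadata rather than hard-coding it.

```python
# Hypothetical source-to-engine catalog (illustrative names only).
ENGINE_REGISTRY = {
    "users": "mysql",    # relational store
    "events": "hive",    # distributed warehouse
}

def route(tables):
    """Map every table in the logical plan to its registered storage engine."""
    return {t: ENGINE_REGISTRY[t] for t in tables}
```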
The plurality of computing engines 14 and the plurality of storage engines 15 are adapted to execute the corresponding query processing according to the query statements routed by the routing module, and to obtain and output query results.
Therefore, in the embodiment, the service interface provides at least one external calling mode and receives the query statement in the specific language format input by using any external calling mode; the syntax analysis and verification are carried out on the query statement through an analysis module to generate a logic query plan; the routing module further determines at least one computing engine and/or at least one storage engine corresponding to the logic query plan according to the logic query plan and routes the query statement to the at least one computing engine and/or the at least one storage engine; and executing corresponding query processing by the plurality of computing engines and the plurality of storage engines according to the query statements routed by the routing module to obtain and output query results. By adopting the scheme, the user can realize the processing of the big data only through the input query sentence with the specific language format without performing targeted coding according to the characteristics of the calculation engine and the storage engine, so that the service logic of the user is decoupled from the calculation engine and the storage engine, the learning cost of the user is reduced, the user experience is improved, and the improvement of the data processing efficiency and the service maintenance are facilitated.
FIG. 2 is a functional block diagram of a big data processing system according to another embodiment of the present invention. As shown in fig. 2, based on the system shown in fig. 1, the big data processing system provided in this embodiment further includes: an adaptation module 21 and a data asset module 22.
The service interface 11 provides at least one external calling mode and is suitable for receiving a query statement in a specific language format input by using any external calling mode. The at least one external calling mode comprises: a command line call mode, a JDBC call mode, and/or a proprietary API call mode. Optionally, in order to further improve user experience, the embodiment may provide corresponding external calling modes for different user groups. For example, a command line calling mode can be provided for the end-user group; for the developer user group, a JDBC (Java Database Connectivity) calling mode and/or proprietary API calling mode may be provided. Optionally, the query statement in the specific language format input by using any external calling manner is specifically an SQL statement in the specific language format.
The parsing module 12 is adapted to perform syntax parsing and verification on the query statement to generate a logical query plan, and is further adapted to determine, during syntax parsing, whether the query statement is a mixed query statement: if not, a single data query plan is generated; if so, a mixed query plan is generated. That is, the big data processing system provided in this embodiment supports both a single data query mode and a hybrid query mode, and generates the corresponding query plan for each.
Specifically, when the parsing module 12 determines whether the query statement is a mixed query statement, it judges whether the query statement is a multi-data-source query statement: if so, the query statement is determined to be a mixed query statement; if not, it is determined not to be one. In an actual implementation, the parsing module 12 determines that the query statement is a mixed query statement if at least two data sources in the data source information corresponding to the query statement correspond to different types of storage engines; and/or if at least two data sources correspond to different clusters; and/or if at least two data sources correspond to different service connections.
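The three hybrid-query criteria above (different engine types, different clusters, different service connections) can be written as one predicate. A minimal sketch, with assumed field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSource:
    engine_type: str   # e.g. "mysql", "hive"
    cluster: str
    connection: str

def is_mixed_query(sources):
    # Hybrid if the sources differ in engine type, cluster, or service
    # connection -- the three rules listed in the text above.
    return (len({s.engine_type for s in sources}) > 1
            or len({s.cluster for s in sources}) > 1
            or len({s.connection for s in sources}) > 1)
```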
In a specific implementation, when the parsing module 12 parses the query statement to generate the logical query plan, to improve the accuracy of the generated plan, the parsing module 12 in this embodiment is further adapted to convert the query statement into a corresponding logical tree, where each node in the tree may correspond to at least one sub-statement of the query statement.
After conversion into the corresponding logical tree, the tree is further split to obtain at least one logical subtree corresponding to the query statement. During splitting, candidate multi-data-source connection nodes in the logical tree are searched; for each candidate found, it is determined whether the data sources of its branches satisfy a hybrid processing rule, and if so, the split is performed. The hybrid processing rule is satisfied if the data sources of the branches correspond to different types of storage engines, and/or to different clusters, and/or to different service connections.
From the result of splitting the logical tree, it can be quickly determined whether the query statement is a mixed query statement. In addition, for the convenience of subsequent modules, after the logical tree is split, each split logical subtree is converted into a query sub-statement in the corresponding engine language and transmitted to the routing module 13, so that the routing module 13 routes the converted query sub-statements to the corresponding computing and/or storage engines.
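The tree-splitting step can be sketched as a recursive walk that cuts the tree wherever a join's branches resolve to different storage-engine types. The node layout (dicts with "op"/"children" keys) is an illustrative assumption, not the patent's representation.

```python
def split(node, table_engine):
    """Return a list of (engine, subtree) pairs for a logical tree."""
    if node["op"] == "scan":
        return [(table_engine[node["table"]], node)]
    parts = [p for child in node["children"] for p in split(child, table_engine)]
    engines = {engine for engine, _ in parts}
    if len(engines) == 1:            # all branches hit one engine: keep whole
        return [(engines.pop(), node)]
    return parts                     # heterogeneous join: split into subtrees

tree = {"op": "join", "children": [
    {"op": "scan", "table": "users"},
    {"op": "scan", "table": "events"},
]}
```

A query is then "mixed" exactly when the split yields more than one subtree.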
Optionally, to improve data query efficiency, the parsing module 12 is further adapted to perform query optimization on the logical query plan according to optimization rules. Without optimization, the storage engine typically executes only the data-acquisition part of the query statement, and the computing engine then computes over the data fed back by the storage engine; the storage engine must therefore continuously exchange information with the computing engine, which greatly increases system overhead, reduces query efficiency, and easily wastes the storage engine's computing resources. Accordingly, after generating the logical query plan, the parsing module 12 further "sinks" the query statement: the portion of the query statement that the storage engine can execute itself is assigned to the corresponding storage engine, thereby optimizing the logical query plan.
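This "sinking" is essentially filter push-down. A minimal sketch on the same illustrative plan nodes as above (assumed structure, not the patent's):

```python
def push_down_filter(plan):
    """If a filter sits directly above a scan, fold it into the scan node so
    the storage engine evaluates the predicate itself."""
    child = plan.get("child")
    if plan["op"] == "filter" and child is not None and child["op"] == "scan":
        scan = dict(child)
        scan["pushed_preds"] = scan.get("pushed_preds", []) + [plan["pred"]]
        return scan          # fewer rows cross the storage/compute boundary
    return plan

optimized = push_down_filter(
    {"op": "filter", "pred": "age > 30",
     "child": {"op": "scan", "table": "users"}})
```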
The routing module 13 is further adapted to determine, from the single data query plan generated by the parsing module 12, at least one storage engine corresponding to that plan, and to route the query statement to the at least one storage engine.
In particular, the routing module 13 routes the query statement to at least one storage engine or at least one compute engine according to the determined type of the at least one storage engine corresponding to the single data query plan. In doing so, it converts the query statement into a query statement in the engine language corresponding to the target storage or computing engine, and routes the converted statement. For example, if the storage engine corresponding to the single data query plan is a non-distributed storage engine such as MySQL, Elasticsearch, or Druid, the routing module 13 may route the query statement to it via JDBC; if the storage engine is a distributed one such as Hive, the routing module 13 further determines the computing engine corresponding to the query statement and routes the statement to that computing engine, which in turn invokes the at least one storage engine of the single data query plan to perform the query.
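The dispatch rule in the example above can be stated in a few lines. The engine names mirror those in the text; the table itself and the tuple shape are illustrative assumptions.

```python
# Non-distributed engines are queried directly (e.g. over JDBC); distributed
# engines such as Hive are reached through a compute engine.
DIRECT_ENGINES = {"mysql", "elasticsearch", "druid"}

def dispatch(engine: str):
    if engine in DIRECT_ENGINES:
        return ("jdbc", engine)          # route the statement straight to it
    return ("compute_engine", engine)    # compute engine invokes the store
```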
Alternatively, the routing module 13 is further adapted to: determine at least one storage engine and at least one calculation engine corresponding to the hybrid query plan generated by the parsing module 12; and route the query statement to the at least one storage engine and at least one calculation engine, so that the at least one storage engine can perform query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine can perform calculation processing on the intermediate query result to obtain a final query result.
Specifically, the routing module 13 routes each query sub-statement to the corresponding storage engine and calculation engine; a corresponding intermediate query result is obtained after each storage engine executes its query sub-statement, and the calculation engine then performs calculation processing according to the intermediate query results and its own query sub-statement, so as to obtain the final query result.
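The combining step for a hybrid query can be sketched as a hash join over the per-engine intermediate results. All helper names and row shapes are hypothetical; a real compute engine would of course do far more.

```python
def execute_hybrid(left_rows, right_rows, key):
    """Join two intermediate results produced by different engines."""
    index = {row[key]: row for row in left_rows}
    return [{**index[r[key]], **r} for r in right_rows if r[key] in index]

rows = execute_hybrid(
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],  # e.g. from MySQL
    [{"id": 1, "clicks": 7}],                          # e.g. from Hive
    key="id")
```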
The adaptation module 21 is adapted to determine whether the at least one storage engine supports query processing in the specific language format; if not, the query statement is converted before being routed to the at least one storage engine.
In an actual implementation, if at least one storage engine corresponding to the logical query plan does not support query processing via SQL statements, as with storage engines such as HBase and Redis, the query statement is converted before the routing module 13 routes it; the routing module 13 then routes the converted query statement to the at least one storage engine corresponding to the logical query plan.
The data asset module 22 is adapted to store metadata information and provide it to the parsing module and/or routing module; and/or to store processing results; and/or to store the data field rules of the plurality of storage engines. In particular, the data asset module 22 may interact with any module in the system. For example, the data asset module 22 stores metadata information, which can be provided when the parsing module parses and verifies the query statement, and also when the routing module determines the at least one computing engine and/or at least one storage engine corresponding to the logical query plan; or, after a computing engine or storage engine obtains a query result, the result can be stored to provide a basis for subsequent queries or analysis; or, the data field rules of the storage engines can be stored in the data asset module to provide a basis for query optimization.
In addition, the system further comprises a stream computing module (not shown in the figure) adapted to determine whether a data source corresponding to the query statement in the specific language format relates to the real-time data storage engine; and if so, converting the query statement into a corresponding real-time computing task for the real-time data storage engine to execute the real-time computing task.
Specifically, the system also supports stream computing, that is, the system determines whether the data source corresponding to the query statement relates to a real-time data storage engine such as Kafka through a stream computing module, and if so, indicates that the query statement relates to stream computing, so as to convert the query statement into a corresponding real-time computing task, and the real-time computing task is executed by the real-time data storage engine, and a real-time computing result is continuously obtained.
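The stream-computing check reduces to inspecting the data sources of the statement. A minimal sketch; the task-descriptor structure is an assumption.

```python
# If any data source is a real-time store such as Kafka, the statement is
# turned into a continuous task rather than a one-shot query.
REALTIME_ENGINES = {"kafka"}

def plan_task(source_engines):
    if any(e in REALTIME_ENGINES for e in source_engines):
        return {"kind": "streaming", "sources": source_engines}
    return {"kind": "batch", "sources": source_engines}
```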
Therefore, according to the big data processing system provided by the embodiment, a user can process big data only through the input query statement in the specific language format without performing targeted coding according to the characteristics of the computing engine and the storage engine, so that the service logic of the user is decoupled from the computing engine and the storage engine, the learning cost of the user is reduced, the user experience is improved, and the improvement of the data processing efficiency and the service maintenance are facilitated; in addition, in the embodiment, through the analysis of the query statement, a single data query plan or a mixed query plan is generated according to whether the query statement is a mixed query statement, and query processing is performed based on the single data query plan or the mixed query plan, so that a matched query plan is automatically selected according to the query statement, and the accuracy of an obtained query result is improved; moreover, the query statement is converted by the adaptation module, so that a user can utilize a storage engine which does not support a specific language format through the query statement in the specific language format, and the user experience is further improved; in addition, the system can also realize stream type calculation, which is beneficial to further improvement of user experience.
Fig. 3 is a flowchart illustrating a big data processing method according to an embodiment of the present invention. As shown in fig. 3, the method includes:
Step S310, providing at least one external calling mode, and receiving a query statement in a specific language format input by using any external calling mode.
Step S320, performing syntax parsing and checking on the query statement to generate a logic query plan.
Step S330, determining at least one computing engine and/or at least one storage engine corresponding to the logic query plan according to the logic query plan, and routing the query statement to the at least one computing engine and/or at least one storage engine.
Step S340, the plurality of computing engines and the plurality of storage engines execute corresponding query processing according to the routed query statement, and obtain and output a query result.
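Steps S310 through S340 can be summarized as a single pipeline. The stage functions below are trivial stand-ins for the modules described in the system embodiments; all names are assumed.

```python
def process(sql, parse, route, execute):
    plan = parse(sql)          # S320: syntax check + logical query plan
    targets = route(plan)      # S330: pick engines, route the statement
    return execute(targets)    # S340: engines run and return results

result = process(
    "SELECT 1",
    parse=lambda s: {"sql": s},
    route=lambda p: [("mysql", p["sql"])],
    execute=lambda ts: [f"{e}:{q}" for e, q in ts],
)
```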
Optionally, the parsing and checking the query statement to generate a logic query plan further includes: judging whether the query statement is a mixed query statement; if not, generating a single data query plan; and if so, generating a mixed query plan.
Optionally, the parsing and checking the query statement to generate a logic query plan further includes: and judging whether the query statement is a multi-data source query statement.
Optionally, the determining, according to the logical query plan, at least one compute engine and/or at least one storage engine corresponding to the logical query plan, and routing the query statement to the at least one compute engine and/or the at least one storage engine further includes: determining at least one storage engine corresponding to a single data query plan according to the single data query plan; routing the query statement to the at least one storage engine.
Optionally, the determining, according to the logical query plan, at least one compute engine and/or at least one storage engine corresponding to the logical query plan, and routing the query statement to the at least one compute engine and/or the at least one storage engine further includes: determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan; and routing the query statement to the at least one storage engine and the at least one calculation engine, so that the at least one storage engine performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine performs calculation processing according to the intermediate query result to obtain a final query result.
Optionally, the method further includes: determining whether the at least one storage engine supports query processing in the specific language format; if not, converting the query statement before routing it to the at least one storage engine.
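The adaptation step above can be illustrated roughly as below. The set of SQL-capable engines and the toy Elasticsearch translation are purely hypothetical assumptions; a real adapter would perform a full dialect conversion rather than this string wrapping.

```python
# Engines assumed (for this sketch only) to accept the unified SQL dialect.
SQL_CAPABLE = {"mysql", "hive"}

def adapt(query, engine):
    """Pass the statement through unchanged for SQL-capable engines;
    otherwise convert it to an engine-native request before routing."""
    if engine in SQL_CAPABLE:
        return query
    if engine == "es":  # toy SQL -> Elasticsearch-style DSL mapping
        return {"query": {"query_string": {"query": query}}}
    raise ValueError(f"no adapter for engine {engine!r}")
```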
Optionally, parsing and checking the query statement to generate a logical query plan further includes: performing query optimization on the logical query plan according to optimization rules.
Optionally, the method further includes: storing the metadata information; and/or storing the processing result; and/or storing the data field rules in the plurality of storage engines.
Optionally, the method further includes: determining whether a data source corresponding to the query statement in the specific language format involves a real-time data storage engine; if so, converting the query statement into a corresponding real-time computing task for the real-time data storage engine to execute.
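As a hedged sketch of this streaming branch: a data source might be recognized as real-time by its URI scheme (the "kafka://" prefix is an assumption made here, not stated in the disclosure), and the statement is then wrapped into a continuous task instead of a one-shot batch query.

```python
def is_realtime(source):
    """Assumed convention: real-time sources use a kafka:// URI scheme."""
    return source.startswith("kafka://")

def to_task(query, source):
    """Wrap the statement into a continuous real-time computing task when
    the data source is a real-time storage engine; otherwise keep it as a
    one-shot batch query."""
    if not is_realtime(source):
        return {"type": "batch", "statement": query, "source": source}
    return {"type": "realtime", "statement": query, "source": source,
            "mode": "continuous"}  # executed by the real-time engine
```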
Optionally, the at least one external calling manner includes: a command line call mode, a JDBC call mode, and/or a proprietary API call mode.
Optionally, the query statement is an SQL statement.
The specific implementation manner of each step in this embodiment may refer to the description of the corresponding part in the system embodiment shown in fig. 1 and/or fig. 2, which is not described herein again.
Therefore, with this scheme, a user can process big data merely by inputting a query statement in the specific language format, without coding specifically for the characteristics of each computing engine and storage engine. The user's business logic is thus decoupled from the computing engines and storage engines, which reduces the user's learning cost, improves the user experience, and facilitates both data processing efficiency and service maintenance.
According to an embodiment of the present invention, a non-volatile computer storage medium is provided, where at least one executable instruction is stored, and the computer executable instruction can execute the big data processing method in any of the above method embodiments.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present invention, and the specific embodiment of the present invention does not limit the specific implementation of the computing device.
As shown in fig. 4, the computing device may include: a processor 402, a communications interface 404, a memory 406, and a communications bus 408.
Wherein:
the processor 402, the communications interface 404, and the memory 406 communicate with each other via the communications bus 408.
The communications interface 404 is used for communicating with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the above-described embodiment of the big data processing method.
In particular, the program 410 may include program code comprising computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing the program 410. The memory 406 may comprise high-speed RAM memory, and may also include non-volatile memory, such as at least one disk memory.
The program 410 may specifically be configured to cause the processor 402 to perform the following operations:
providing at least one external calling mode, and receiving a query statement in a specific language format input by using any external calling mode;
performing syntax analysis and verification on the query statement to generate a logic query plan;
determining at least one computing engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan, and routing the query statement to the at least one computing engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines execute corresponding query processing according to the routed query statement to obtain and output a query result.
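The four operations above can be strung together in a toy end-to-end pipeline. Parsing is stubbed by pulling qualified table names out of the statement, and the engines are stand-in callables; none of these names or behaviors come from the patent itself, they are assumptions for illustration.

```python
def process(statement, catalog, engines):
    """Toy pipeline: 'parse' table names out of the statement, build a
    logical plan, route to every engine that owns a referenced table,
    and collect the query results. catalog maps table -> engine name;
    engines maps engine name -> callable executing the statement."""
    tables = sorted({tok.split(".")[0] for tok in statement.split() if "." in tok})
    targets = sorted({catalog[t] for t in tables})
    results = {e: engines[e](statement) for e in targets}
    return {"tables": tables, "engines": targets}, results
```

A statement referencing tables on several engines yields a multi-engine plan, mirroring the hybrid case; a single-table statement is routed to one storage engine only.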
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
judging whether the query statement is a mixed query statement; if not, generating a single data query plan; and if so, generating a mixed query plan.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
and judging whether the query statement is a multi-data source query statement.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
determining at least one storage engine corresponding to a single data query plan according to the single data query plan; routing the query statement to the at least one storage engine.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan; and routing the query statement to the at least one storage engine and the at least one calculation engine, so that the at least one storage engine performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine performs calculation processing according to the intermediate query result to obtain a final query result.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
determining whether the at least one storage engine supports query processing in the particular language format; if not, before routing the query statement to the at least one storage engine, performing conversion processing on the query statement.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
and performing query optimization on the logic query plan according to the optimization rule.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
storing the metadata information; and/or storing the processing result; and/or storing the data field rules in the plurality of storage engines.
In an alternative embodiment, the program 410 may be specifically configured to cause the processor 402 to perform the following operations:
judging whether a data source corresponding to the query statement in the specific language format relates to a real-time data storage engine or not;
and if so, converting the query statement into a corresponding real-time computing task so that the real-time data storage engine can execute the real-time computing task.
In an optional embodiment, the at least one external call mode includes: a command line call mode, a JDBC call mode, and/or a proprietary API call mode.
In an alternative embodiment, the query statement is an SQL statement.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in a big data processing system according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The invention discloses: A1. a big data processing system, comprising:
the service interface provides at least one external calling mode and is suitable for receiving a query statement in a specific language format input by using any external calling mode;
the analysis module is suitable for performing syntax analysis and verification on the query statement to generate a logic query plan;
a routing module adapted to determine at least one compute engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan and route the query statement to the at least one compute engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines are suitable for executing corresponding query processing according to the query statements routed by the routing module, and obtaining and outputting query results.
A2. The system of a1, wherein the parsing module is further adapted to: judging whether the query statement is a mixed query statement; if not, generating a single data query plan; and if so, generating a mixed query plan.
A3. The system of a2, wherein the parsing module is further adapted to: and judging whether the query statement is a multi-data source query statement.
A4. The system of a2, wherein the routing module is further adapted to: determining at least one storage engine corresponding to a single data query plan according to the single data query plan; routing the query statement to the at least one storage engine.
A5. The system of a2, wherein the routing module is further adapted to:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan; and routing the query statement to the at least one storage engine and the at least one calculation engine, so that the at least one storage engine performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine performs calculation processing according to the intermediate query result to obtain a final query result.
A6. The system of a4 or a5, wherein the system further comprises:
an adaptation module adapted to determine whether the at least one storage engine supports query processing in the particular language format; if not, before routing the query statement to the at least one storage engine, performing conversion processing on the query statement.
A7. The system of any one of a1-a6, wherein the parsing module is further adapted to: and performing query optimization on the logic query plan according to the optimization rule.
A8. The system of any one of a1-a7, wherein the system further comprises:
the data asset module is suitable for storing metadata information and providing the metadata information for the analysis module and/or the routing module;
and/or storing the processing result;
and/or storing the data field rules in the plurality of storage engines.
A9. The system of any one of a1-A8, wherein the system further comprises:
the stream type calculation module is suitable for judging whether a data source corresponding to the query statement in the specific language format relates to a real-time data storage engine or not;
and if so, converting the query statement into a corresponding real-time computing task so that the real-time data storage engine can execute the real-time computing task.
A10. The system of any one of a1-a9, wherein the at least one outbound call comprises: a command line call mode, a JDBC call mode, and/or a proprietary API call mode.
A11. The system of any one of a1-a10, wherein the query statement is an SQL statement.
The invention also discloses: B12. a big data processing method comprises the following steps:
providing at least one external calling mode, and receiving a query statement in a specific language format input by using any external calling mode;
performing syntax analysis and verification on the query statement to generate a logic query plan;
determining at least one computing engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan, and routing the query statement to the at least one computing engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines execute corresponding query processing according to the routed query statement to obtain and output a query result.
B13. The method of B12, wherein the parsing and checking the query statement to generate a logical query plan further comprises:
judging whether the query statement is a mixed query statement; if not, generating a single data query plan; and if so, generating a mixed query plan.
B14. The method of B13, wherein the parsing and checking the query statement to generate a logical query plan further comprises:
and judging whether the query statement is a multi-data source query statement.
B15. The method of B13, wherein the determining, according to the logical query plan, at least one compute engine and/or at least one storage engine corresponding to the logical query plan and routing the query statement to the at least one compute engine and/or at least one storage engine further comprises:
determining at least one storage engine corresponding to a single data query plan according to the single data query plan; routing the query statement to the at least one storage engine.
B16. The method of B13, wherein the determining, according to the logical query plan, at least one compute engine and/or at least one storage engine corresponding to the logical query plan and routing the query statement to the at least one compute engine and/or at least one storage engine further comprises:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan; and routing the query statement to the at least one storage engine and the at least one calculation engine, so that the at least one storage engine performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine performs calculation processing according to the intermediate query result to obtain a final query result.
B17. The method of B15 or B16, wherein the method further comprises:
determining whether the at least one storage engine supports query processing in the particular language format; if not, before routing the query statement to the at least one storage engine, performing conversion processing on the query statement.
B18. The method of any one of B12-B17, wherein parsing and checking the query statement, generating a logical query plan further comprises:
and performing query optimization on the logic query plan according to the optimization rule.
B19. The method of any one of B12-B18, wherein the method further comprises:
storing the metadata information; and/or storing the processing result; and/or storing the data field rules in the plurality of storage engines.
B20. The method of any one of B12-B19, wherein the method further comprises:
judging whether a data source corresponding to the query statement in the specific language format relates to a real-time data storage engine or not;
and if so, converting the query statement into a corresponding real-time computing task so that the real-time data storage engine can execute the real-time computing task.
B21. The method of any one of B12-B20, wherein the at least one outbound call comprises: a command line call mode, a JDBC call mode, and/or a proprietary API call mode.
B22. The method of any one of B12-B21, wherein the query statement is an SQL statement.
The invention also discloses: C23. a computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction which causes the processor to execute the operation corresponding to the big data processing method as any one of A1-A11.
The invention also discloses: D24. a computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the big data processing method as described in any one of a1-a 11.

Claims (10)

1. A big data processing system, comprising:
the service interface provides at least one external calling mode and is suitable for receiving a query statement in a specific language format input by using any external calling mode;
the analysis module is suitable for performing syntax analysis and verification on the query statement to generate a logic query plan;
a routing module adapted to determine at least one compute engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan and route the query statement to the at least one compute engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines are suitable for executing corresponding query processing according to the query statements routed by the routing module, and obtaining and outputting query results.
2. The system of claim 1, wherein the parsing module is further adapted to: judging whether the query statement is a mixed query statement; if not, generating a single data query plan; and if so, generating a mixed query plan.
3. The system of claim 2, wherein the parsing module is further adapted to: and judging whether the query statement is a multi-data source query statement.
4. The system of claim 2, wherein the routing module is further adapted to: determining at least one storage engine corresponding to a single data query plan according to the single data query plan; routing the query statement to the at least one storage engine.
5. The system of claim 2, wherein the routing module is further adapted to:
determining at least one storage engine and at least one computation engine corresponding to a hybrid query plan according to the hybrid query plan; and routing the query statement to the at least one storage engine and the at least one calculation engine, so that the at least one storage engine performs query processing according to the query statement to obtain an intermediate query result, and the at least one calculation engine performs calculation processing according to the intermediate query result to obtain a final query result.
6. The system of claim 4 or 5, wherein the system further comprises:
an adaptation module adapted to determine whether the at least one storage engine supports query processing in the particular language format; if not, before routing the query statement to the at least one storage engine, performing conversion processing on the query statement.
7. The system of any one of claims 1-6, wherein the parsing module is further adapted to: and performing query optimization on the logic query plan according to the optimization rule.
8. A big data processing method comprises the following steps:
providing at least one external calling mode, and receiving a query statement in a specific language format input by using any external calling mode;
performing syntax analysis and verification on the query statement to generate a logic query plan;
determining at least one computing engine and/or at least one storage engine corresponding to the logical query plan according to the logical query plan, and routing the query statement to the at least one computing engine and/or at least one storage engine;
and the plurality of computing engines and the plurality of storage engines execute corresponding query processing according to the routed query statement to obtain and output a query result.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the big data processing method according to any one of claims 1-7.
10. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the big data processing method according to any one of claims 1 to 7.
CN201811428198.8A | filed 2018-11-27 | Big data processing system and method | status: Pending | publication: CN111221842A (en)

Priority Applications (1)

Application number | Priority/filing date | Title
CN201811428198.8A | 2018-11-27 | Big data processing system and method


Publications (1)

Publication number | Publication date
CN111221842A (en) | 2020-06-02

Family

ID=70830397

Family Applications (1)

Application number | Status | Publication | Priority/filing date | Title
CN201811428198.8A | Pending | CN111221842A (en) | 2018-11-27 | Big data processing system and method

Country Status (1)

Country | Publication
CN | CN111221842A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication | Priority date | Publication date | Assignee | Title
US20090019000A1 (en)* | 2007-07-12 | 2009-01-15 | Mitchell Jon Arends | Query based rule sets
CN102982075A (en)* | 2012-10-30 | 2013-03-20 | 北京京东世纪贸易有限公司 | Heterogeneous data source access supporting system and method thereof
CN103440303A (en)* | 2013-08-21 | 2013-12-11 | 曙光信息产业股份有限公司 | Heterogeneous cloud storage system and data processing method thereof
CN106777108A (en)* | 2016-12-15 | 2017-05-31 | 贵州电网有限责任公司电力科学研究院 | A kind of data query method and apparatus based on mixing storage architecture
CN108519914A (en)* | 2018-04-09 | 2018-09-11 | 腾讯科技(深圳)有限公司 | Big data computational methods, system and computer equipment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
向红 (Xiang Hong): "Research and Implementation of an Ontology-Based Heterogeneous Data Integration System", China Master's Theses Full-text Database, Information Science and Technology series *

Cited By (14)

* Cited by examiner, † Cited by third party

Publication | Priority date | Publication date | Assignee | Title
CN114254051A (en) | 2020-09-22 | 2022-03-29 | 京东科技控股股份有限公司 | A big data computing method, device and big data platform
CN112486592A (en) | 2020-11-30 | 2021-03-12 | 成都新希望金融信息有限公司 | Distributed data processing method, device, server and readable storage medium
CN112486592B (en) | 2020-11-30 | 2024-04-02 | 成都新希望金融信息有限公司 | Distributed data processing method, device, server and readable storage medium
CN112506931A (en) | 2020-12-15 | 2021-03-16 | 平安银行股份有限公司 | Data query method and device, electronic equipment and storage medium
CN112948178A (en) | 2021-02-24 | 2021-06-11 | 北京金山云网络技术有限公司 | Data processing method, device, system, equipment and medium
CN113064914A (en) | 2021-04-22 | 2021-07-02 | 中国工商银行股份有限公司 | Data extraction method and device
CN113568927A (en) | 2021-06-24 | 2021-10-29 | 华控清交信息科技(北京)有限公司 | Data processing system, method, database engine and device for data processing
CN113568927B (en) | 2021-06-24 | 2024-03-29 | 华控清交信息科技(北京)有限公司 | Data processing system, method, database engine and device for data processing
CN113342843A (en) | 2021-07-06 | 2021-09-03 | 多点生活(成都)科技有限公司 | Big data online analysis method and system
CN113703739A (en) | 2021-09-03 | 2021-11-26 | 上海森亿医疗科技有限公司 | Cross-language fusion computing method, system and terminal based on omiga engine
CN113704291A (en) | 2021-09-03 | 2021-11-26 | 北京火山引擎科技有限公司 | Data query method and device, storage medium and electronic equipment
CN113918859A (en) | 2021-10-19 | 2022-01-11 | 创盛视联数码科技(北京)有限公司 | Metadata processing method, system, device and readable medium for online education platform
CN114722072A (en) | 2022-03-23 | 2022-07-08 | 远光软件股份有限公司 | Query model based query method, query device and storage medium
CN115422162A (en) | 2022-09-06 | 2022-12-02 | 宁波数益工联科技有限公司 | Industrial data storage system, electronic equipment and storage medium

Similar Documents

Publication | Title
CN111221842A (en) | Big data processing system and method
CN112988163B (en) | Intelligent adaptation method, intelligent adaptation device, intelligent adaptation electronic equipment and intelligent adaptation medium for programming language
US9122540B2 (en) | Transformation of computer programs and eliminating errors
CN111221852A (en) | Mixed query processing method and device based on big data
CN113504900A (en) | Programming language conversion method and device
CN111221841A (en) | Real-time processing method and device based on big data
CN111309751A (en) | Big data processing method and device
CN112069456B (en) | A method, device, electronic device and storage medium for generating a model file
CN111104335A (en) | A C language defect detection method and device based on multi-level analysis
CN111125154A (en) | Method and apparatus for outputting structured query statements
CN111104796B (en) | Method and device for translation
CN111221860A (en) | Mixed query optimization method and device based on big data
CN111221888A (en) | Big data analysis system and method
CN115292058A (en) | Service scene level service topology generation method and device and electronic equipment
CN112579151A (en) | Method and device for generating model file
CN113705799B (en) | Processing unit, computing device, and computational graph processing method for deep learning model
CN110489124B (en) | Source code execution method, source code execution device, storage medium and computer equipment
CN111221843A (en) | Big data processing method and device
WO2017097125A1 | Executive code generation method and device
CN118860350A (en) | A code development method and related equipment
JP2016051367A | Data analysis apparatus, data analysis method, and program
US20110271170A1 | Determining page faulting behavior of a memory operation
CN116414396A (en) | A LLVM target definition file generation method, device and electronic equipment
CN113377939A (en) | Text enhancement method and device, computer equipment and storage medium
CN106682221B (en) | Question-answer interaction response method and device and question-answer system

Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2020-06-02
