CN105975498A


Info

Publication number: CN105975498A
Application number: CN201610269153.5A
Authority: CN
Inventors: 李於彬; 汪玉; 田聃
Original assignee: Tsinghua University; Huawei Technologies Co Ltd
Current assignee: Tsinghua University; Huawei Technologies Co Ltd
Priority date: 2016-04-27
Filing date: 2016-04-27
Publication date: 2016-09-28

Abstract


A method, device and system for data query. A data query device obtains, from a compressed data block, a condition column corresponding to a query condition in a query request and a target column corresponding to a query target in the query request, where the compressed data block is obtained by performing column compression on an original data block in units of the column width. According to the query condition, the device determines the row mask corresponding to at least one consecutive first compressed record in the condition column. If a second compressed record in the condition column corresponds to at least one original record, and that at least one original record is identical, one-to-one, to the original records corresponding to the at least one consecutive first compressed record, the device determines that the row mask of the second compressed record comprises the row masks of the at least one consecutive first compressed record. Finally, the device determines the query result of the query request from the row mask of each compressed record in the condition column and from the target column. This improves data query speed.

Description


Method, device and system for data query

Technical field

The present invention relates to the field of databases, and more specifically, to a data query method, device and system.

Background

As data volumes and data generation rates continue to grow, the amount of data that database query operations must process is growing exponentially, and query latency keeps increasing. Driven by this growth and by large-scale data analysis requests, columnar databases are receiving increasing attention.

For columnar database queries, two methods are currently in common use: a data query method based on a central processing unit (CPU) software platform, and a data query method based on a CPU + field-programmable gate array (FPGA) heterogeneous platform.

In the method based on a CPU software platform, the CPU reads the required data blocks from disk into memory, locates the data columns relevant to the query request using the compression information of each data block, and decompresses those columns. It then determines the row mask of the query result from the decompressed columns and the query request, and finally returns the query result to the client.

In this CPU software method, the entire query runs on the CPU, whose decompression throughput is about 200 MB/s. This limits decompression speed and therefore query speed.

In the method based on a CPU + FPGA heterogeneous platform, the CPU parses the client's query request, reads the required data blocks from disk into memory, and sends all of them to the FPGA for decompression. From the decompressed data blocks and the client's query request, the FPGA determines which data columns and row masks to use, obtains the data that satisfies the query request, and returns it to the CPU, which finally returns the query result to the client.

In this CPU + FPGA method, the FPGA must decompress entire data blocks before querying them, which slows down the query.

Therefore, when a query touches a large amount of data, existing methods must decompress large volumes of data and then search through them, so query speed is relatively low.

Summary of the invention

The present invention provides a data query method, device and system that can improve data query speed.

In a first aspect, a data query method is provided. The method includes: a data query device obtains, from a compressed data block, a condition column corresponding to a query condition in a query request and a target column corresponding to a query target in the query request, where the compressed data block is obtained by performing column compression on an original data block in units of the column width; the data query device determines, according to the query condition, the row mask corresponding to each first compressed record in at least one consecutive first compressed record in the condition column; if the data query device determines that a second compressed record in the condition column corresponds to at least one original record, and the at least one original record is identical, one-to-one, to the original records corresponding to the first compressed records in the at least one consecutive first compressed record, the device determines that the row mask corresponding to the second compressed record comprises the row masks corresponding to those first compressed records; and the data query device determines the query result of the query request according to the row mask corresponding to each compressed record in the condition column and the target column.

Optionally, the condition column may include one or more columns of data, and the target column may include one or more columns of data.

Optionally, the column width may be the bit length of the value in the longest record of the original data block, and every column of data in the original data block has the same column width.
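As a rough illustration of this definition, the sketch below computes one shared column width for a whole block and right-pads every value to it, so that each record can be handled as one fixed-size unit during column compression. It assumes byte rather than bit granularity and zero-byte padding; the helper names `block_column_width` and `pad_columns` are hypothetical.

```python
def block_column_width(columns: list[list[bytes]]) -> int:
    """Length of the longest value in any column (one width for the whole block)."""
    return max((len(v) for col in columns for v in col), default=0)


def pad_columns(columns: list[list[bytes]], fill: bytes = b"\x00") -> list[list[bytes]]:
    """Right-pad every value so all records share the block's column width."""
    width = block_column_width(columns)
    return [[v.ljust(width, fill) for v in col] for col in columns]


cols = [[b"ab", b"c"], [b"xyz", b""]]
print(block_column_width(cols))  # 3
print(pad_columns(cols))
```

Using a single width for every column keeps record boundaries at fixed offsets, which is what lets later stages treat "one column width" as the unit of compression.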

Optionally, the data query device may determine the row mask corresponding to a first compressed record according to the query condition and the original record corresponding to that first compressed record. For example, the data query device may decompress the first compressed record to obtain its original record, and then determine the row mask of the first compressed record from the query condition and the decompressed original record.

If the data query device determines that the second compressed record corresponds to one or more consecutive original records, and these are equal, one-to-one, to the one or more consecutive original records corresponding to the at least one consecutive first compressed record, the device need not decompress the second compressed record. Instead, it can directly use the row masks of the at least one consecutive first compressed record as the row mask of the second compressed record, which improves data processing efficiency and reduces processing latency.

With reference to the first aspect, in a first possible implementation of the first aspect, the compressed data block is obtained by applying a first compression process and then a second compression process to the original data block. Before the data query device determines, according to the query condition, the row mask corresponding to each first compressed record in the at least one consecutive first compressed record in the condition column, the method further includes: the data query device performs, on each compressed record in the condition column, the decompression process corresponding to the second compression process to obtain a first decompressed data block, where the first decompressed data block includes at least one first decompressed record corresponding to the at least one consecutive first compressed record and a second decompressed record corresponding to the second compressed record. Determining, according to the query condition, the row mask corresponding to each first compressed record then includes: the data query device determines the row mask corresponding to each first decompressed record according to the query condition and the original record corresponding to that first decompressed record.

Optionally, the method further includes: if the data query device determines that each first decompressed record in the at least one first decompressed record is an original record, determining the row mask of each first decompressed record according to the corresponding original record and the query condition.

With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, determining that the row mask of the second compressed record comprises the row masks of the first compressed records in the at least one consecutive first compressed record (when the data query device determines that the second compressed record in the condition column corresponds to at least one original record that is identical, one-to-one, to the original records of those first compressed records) includes: if the second decompressed record is a distance-length pair whose value indicates that the second decompressed record corresponds to the at least one original record, and the at least one original record is identical, one-to-one, to the original records corresponding to the first decompressed records in the at least one first decompressed record, determining that the row mask corresponding to the second decompressed record comprises the row masks corresponding to those first decompressed records. The distance in the distance-length pair is the address offset between the at least one original record corresponding to the second decompressed record and the at least one original record corresponding to the at least one first decompressed record, and the length in the distance-length pair is the length of the at least one first decompressed record.
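The sketch below is one way to read this implementation, assuming the condition column has already been Huffman-decoded into a sequence of LZ77 records that are either literal records or `Match(distance, length)` pairs, with both distance and length counted in records (column-width units). The query condition is evaluated only on literal records; for a pair, the mask bits of the referenced records are copied, so the repeated records are never reconstructed. `Match`, `row_masks` and the predicate are hypothetical names, not taken from the source.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Match:
    distance: int  # offset, in records, back to the first repeated record
    length: int    # number of consecutive records the pair stands for


Token = Union[bytes, Match]  # a literal record, or a distance-length pair


def row_masks(tokens: list[Token], predicate: Callable[[bytes], bool]) -> list[bool]:
    """Return one row-mask bit per original record of the condition column."""
    masks: list[bool] = []
    for tok in tokens:
        if isinstance(tok, Match):
            start = len(masks) - tok.distance
            if tok.distance <= 0 or start < 0:
                raise ValueError("invalid distance")
            # Identical records give identical mask bits, so copy the bits.
            # Copying one bit at a time also handles overlapping runs
            # (distance < length), as in ordinary LZ77.
            for k in range(tok.length):
                masks.append(masks[start + k])
        else:
            masks.append(predicate(tok))  # literal: evaluate the query condition
    return masks


calls = []


def over_20(rec: bytes) -> bool:
    calls.append(rec)
    return int(rec) > 20


print(row_masks([b"10", b"25", Match(2, 2), b"07"], over_20))
# [False, True, False, True, False]; over_20 ran only 3 times
```

Five mask bits are produced from only three predicate evaluations; the larger the share of records covered by distance-length pairs, the more evaluation and decompression work this skips.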

With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the method further includes: if the second decompressed record is an original record, determining the row mask corresponding to the second decompressed record according to that original record and the query condition.

With reference to any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the data query device is a field-programmable gate array (FPGA), and obtaining, from the compressed data block, the condition column corresponding to the query condition in the query request and the target column corresponding to the query target in the query request includes: the FPGA receives the condition column and the target column sent by a central processing unit (CPU).

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the query condition includes a first sub-query condition and a second sub-query condition, and the condition column includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. Before the FPGA receives the condition column and the target column sent by the CPU, the method further includes: receiving indication information sent by the CPU, where the indication information indicates the logical relationship between the first sub-query condition and the second sub-query condition. Receiving, by the FPGA, the condition column and the target column sent by the CPU includes: receiving the first condition column and the second condition column that the CPU sends in the operation order corresponding to the logical relationship.
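To show what the indicated logical relationship could mean for the masks, the sketch below merges the row masks of two sub-query conditions bit by bit. The two-valued `Logic` enum and `combine_masks` are assumptions for illustration; the source does not specify how the indication information is encoded or which relationships beyond AND and OR are supported.

```python
from enum import Enum


class Logic(Enum):
    AND = "and"
    OR = "or"


def combine_masks(first: list[bool], second: list[bool], logic: Logic) -> list[bool]:
    """Merge two per-row masks according to the indicated logical relationship."""
    if len(first) != len(second):
        raise ValueError("condition columns must cover the same rows")
    if logic is Logic.AND:
        return [a and b for a, b in zip(first, second)]
    return [a or b for a, b in zip(first, second)]


print(combine_masks([True, False, True], [True, True, False], Logic.AND))
# [True, False, False]
```

Because each input holds one bit per row, the merge is cheap compared with decompressing either condition column.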

With reference to any one of the first to third possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the data query device is a CPU, and obtaining, from the compressed data block, the condition column corresponding to the query condition in the query request and the target column corresponding to the query target in the query request includes: the CPU obtains the compressed data block and the query request, and determines the condition column and the target column according to the compressed data block and the query request.

With reference to any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the first compression is LZ77 compression and the second compression is Huffman compression.

In a second aspect, a data query method is provided. The method includes: a central processing unit (CPU) obtains, from a compressed data block, a condition column corresponding to a query condition in a query request and a target column corresponding to a query target in the query request, where the compressed data block is obtained by performing column compression on an original data block in units of the column width; and the CPU sends the condition column and the target column to a field-programmable gate array (FPGA).

With reference to the second aspect, in a first possible implementation of the second aspect, the query condition includes a first sub-query condition and a second sub-query condition, and the condition column includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. Sending, by the CPU, the condition column and the target column to the FPGA includes: the CPU sends indication information to the FPGA, where the indication information indicates the logical relationship between the first sub-query condition and the second sub-query condition; and the CPU sends the first condition column and the second condition column to the FPGA in the operation order corresponding to the logical relationship.

With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, sending, by the CPU, the condition column and the target column to the FPGA includes: the CPU sends the target column after it has sent the condition column to the FPGA.

In the data query method of the embodiments of the present invention, the row mask of each compressed record in the condition column occupies only one bit, so the masks consume far less buffer space than the decompressed target column. This reduces buffering requirements and improves the query performance of the data query device.
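The one-bit-per-record claim can be made concrete with a small packing sketch. It stores a row mask least-significant-bit first in a byte array; the bit order and the names `pack_mask` and `mask_bit` are assumptions, not details from the source.

```python
def pack_mask(mask: list[bool]) -> bytes:
    """Pack one bit per record, least significant bit first within each byte."""
    out = bytearray((len(mask) + 7) // 8)
    for i, bit in enumerate(mask):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def mask_bit(packed: bytes, i: int) -> bool:
    """Read the mask bit of record i back out of the packed form."""
    return bool((packed[i // 8] >> (i % 8)) & 1)


print(len(pack_mask([False] * 1_000_000)))  # 125000
```

For example, a mask over 1,000,000 records packs into 125,000 bytes, whereas the same 1,000,000 records of a hypothetical 8-byte target column occupy 8,000,000 bytes once decompressed, 64 times more.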

In a third aspect, a data query device is provided, configured to perform the method in the first aspect or any possible implementation of the first aspect. Specifically, the device includes units configured to perform that method. Optionally, the data query device may be a CPU or an FPGA.

In a fourth aspect, a data query device is provided, configured to perform the method in the second aspect or any possible implementation of the second aspect. Specifically, the device includes units configured to perform the method in the second aspect or any possible implementation of the second aspect. Optionally, the data query device may be a CPU.

In a fifth aspect, a data query system is provided, including the devices provided in the third aspect and the fourth aspect. Optionally, the system includes the FPGA provided in the third aspect and the CPU provided in the fourth aspect.

In a sixth aspect, a computer-readable medium is provided for storing a computer program, where the computer program includes instructions for performing the method in the first aspect or any possible implementation of the first aspect.

In a seventh aspect, a computer-readable medium is provided for storing a computer program, where the computer program includes instructions for performing the method in the second aspect or any possible implementation of the second aspect.

Description of drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Evidently, the accompanying drawings described below show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative efforts.

Fig. 1 is a schematic block diagram of a data query system according to an embodiment of the present invention.

Fig. 2 is a schematic flowchart of a data query method according to an embodiment of the present invention.

Fig. 3 is a schematic flowchart of another data query method according to an embodiment of the present invention.

Fig. 4 is a schematic flowchart of another data query method according to an embodiment of the present invention.

Fig. 5 is a schematic block diagram of a data query apparatus according to an embodiment of the present invention.

Fig. 6 is a schematic block diagram of a data query apparatus according to an embodiment of the present invention.

Detailed description of embodiments

The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Evidently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative efforts shall fall within the protection scope of the present invention.

Fig. 1 is a schematic diagram of a data query system according to an embodiment of the present invention. The system includes a CPU 110, an FPGA 120, a client 130 and a data storage unit 140. The CPU includes a query parsing unit 111, a file parsing unit 112, a column selection unit 113 and a cache unit 114; the FPGA includes a data management and scheduling unit 121, a decompression and data query unit 122 and a query result determination unit 123.

The query parsing unit 111 is configured to receive a query request sent by the client 130, where the query request includes a query condition and a query target.

The file parsing unit 112 is configured to read, from the data storage unit 140, the compressed data blocks needed for the query, where each compressed data block carries column index information, and to send the compressed data block and the column index information to the column selection unit 113. The compressed data block is obtained by performing column compression on an original data block in units of the column width.

Optionally, the compressed data block is obtained by applying a first compression process and then a second compression process to the original data block; the first compression may be LZ77 compression, and the second compression may be Huffman compression.

The column selection unit 113 is configured to determine, according to the compressed data block and the query request, the condition column and the target column required for the query, and to send the condition column and the target column to the FPGA.

The data management and scheduling unit 121 is configured to receive the condition column and the target column sent by the CPU, and to send them to the decompression and data query unit 122.

The decompression and data query unit 122 is configured to decompress the condition column and the target column, determine the row mask corresponding to the query condition, and determine the query result according to the row mask and the target column.

Optionally, the query condition may include a first sub-query condition and a second sub-query condition.

The query result determination unit 123 is configured to determine whether the row masks corresponding to the query condition need to be aggregated, determine the target row mask, determine the query result according to the target row mask and the decompressed target column, and send the query result to the cache unit 114 of the CPU.
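A minimal sketch of this last step, assuming the target row mask is already available as one boolean per row and the target column has been fully decompressed into a list of records. `query_result` is a hypothetical name; `itertools.compress` from the Python standard library keeps exactly the items whose selector is true.

```python
from itertools import compress


def query_result(target_column: list[bytes], target_mask: list[bool]) -> list[bytes]:
    """Keep the target-column records whose row-mask bit is set."""
    if len(target_column) != len(target_mask):
        raise ValueError("mask and target column must have the same number of rows")
    return list(compress(target_column, target_mask))


print(query_result([b"alice", b"bob", b"carol"], [True, False, True]))
# [b'alice', b'carol']
```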

The cache unit 114 is configured to send the query result to the client 130.

Fig. 2 is a schematic flowchart of a data query method 200 according to an embodiment of the present invention. The method 200 may be performed by a data query device.

S210: The data query device obtains, from a compressed data block, the condition column corresponding to the query condition in a query request and the target column corresponding to the query target in the query request, where the compressed data block is obtained by performing column compression on an original data block in units of the column width.

It should be understood that the original data block in this embodiment of the present invention may take the form of a matrix, for example an $m \times n$ matrix, in which case the original data block may be expressed as $\{C_{m\times n}, m, n\}$. In this embodiment, column compression may be performed on the original data block in units of the column width to obtain $\{C'_{mn\times 1}, mn, 1\}$. Optionally, the column width may be the bit length of the value in the longest record of the original data block.
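Assuming that the $mn \times 1$ result is formed by concatenating the block's columns one after another (the text does not state the order explicitly), the reshaping step looks like the sketch below. `flatten_columns` and `position` are hypothetical helpers, and the block is given as a list of $m$ rows.

```python
def flatten_columns(block: list[list[int]]) -> list[int]:
    """Turn an m x n block (a list of m rows) into one mn x 1 column, column by column."""
    if not block:
        return []
    n = len(block[0])
    if any(len(row) != n for row in block):
        raise ValueError("ragged block")
    return [row[j] for j in range(n) for row in block]


def position(k: int, m: int) -> tuple[int, int]:
    """Map index k of the flattened column back to (row, column) in the m-row block."""
    return k % m, k // m


print(flatten_columns([[1, 2], [3, 4], [5, 6]]))  # [1, 3, 5, 2, 4, 6]
```

With this ordering, equal values that are adjacent within one column stay adjacent in the flattened column, which is what gives LZ77 runs of repeated records to find.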

Optionally, the compressed data block is obtained by applying a first compression process and then a second compression process to the original data block, where the first compression may be LZ77 compression and the second compression may be Huffman compression.

It should be understood that the LZ77 compression algorithm used in this embodiment is a dictionary-based lossless data compression algorithm, applied to the original data in units of the column width. When a second original record repeats a first original record that has already appeared in the original data block, the second original record is replaced by a distance-length pair pointing to that identical first original record. The distance in the pair is the address offset between the second original record and the first original record, and the length is the length of the first original record. LZ77 compresses using a "sliding window" that has two parts: the incoming original records waiting to be compressed, and the most recently compressed original records, which serve as the dictionary.
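To pin down the distance-length semantics, here is a minimal LZ77 decoder that works in column-width units, assuming each record is an opaque fixed-width value and that both distance and length are counted in records. It is a generic LZ77 sketch, not the patent's decoder; `Match` and `lz77_decompress` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Match:
    distance: int  # records back from the write position to the copy source
    length: int    # number of records to copy


Token = Union[bytes, Match]


def lz77_decompress(tokens: list[Token]) -> list[bytes]:
    """Rebuild the original records of one column from LZ77 records."""
    out: list[bytes] = []
    for tok in tokens:
        if isinstance(tok, Match):
            if not 0 < tok.distance <= len(out):
                raise ValueError("distance points before the start of the column")
            start = len(out) - tok.distance
            # Copy record by record so overlapping runs (distance < length) work.
            for k in range(tok.length):
                out.append(out[start + k])
        else:
            out.append(tok)  # literal record, already in original form
    return out


print(lz77_decompress([b"a", b"b", Match(2, 2), b"c"]))
# [b'a', b'b', b'a', b'b', b'c']
```

The already-rebuilt records in `out` play the role of the dictionary half of the sliding window.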

Optionally, the LZ77 algorithm may be used to perform column compression on the original data block in units of the column width to obtain an LZ77 compressed data block. Specifically, suppose that at least one consecutive second original record in the original data block is identical, one-to-one, to at least one consecutive first original record, and that none of the first original records has a match in the dictionary, so that the first original records are kept unchanged as at least one first LZ77 compressed record. The at least one second original record is then compressed into a single second LZ77 compressed record in the form of a distance-length pair. The LZ77 compressed data block includes the at least one first LZ77 compressed record and the second LZ77 compressed record.
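The compression step described here can be sketched as a greedy LZ77 encoder over whole records: a record with no match in the window is emitted as a literal (a first LZ77 compressed record), and a run that repeats earlier records becomes one `Match(distance, length)` (a second LZ77 compressed record). The window size, the greedy longest-match search, the nearest-match tie-break and the names are assumptions; practical encoders use hash chains instead of this quadratic scan.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Match:
    distance: int  # records back to the start of the repeated run
    length: int    # number of records in the run


Token = Union[bytes, Match]


def lz77_compress(records: list[bytes], window: int = 4096) -> list[Token]:
    """Greedy LZ77 over fixed-width records (one record = one column width)."""
    tokens: list[Token] = []
    i = 0
    while i < len(records):
        best_len, best_dist = 0, 0
        # Scan the dictionary (the last `window` records) from nearest to farthest.
        for start in range(i - 1, max(0, i - window) - 1, -1):
            length = 0
            # The run may overlap position i (distance < length), as in classic LZ77.
            while i + length < len(records) and records[start + length] == records[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - start
        if best_len > 0:
            tokens.append(Match(best_dist, best_len))
            i += best_len
        else:
            tokens.append(records[i])  # no dictionary match: keep the record as a literal
            i += 1
    return tokens


print(lz77_compress([b"a", b"b", b"a", b"b", b"c"]))
```

This prints `[b'a', b'b', Match(distance=2, length=2), b'c']`: the two repeated records collapse into one pair. A real encoder would likely require a minimum run length before emitting a pair, since a pair covering a single narrow record may not save space.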

Optionally, the data query device may further apply Huffman compression to the LZ77 compressed data block to obtain the compressed data block, where the compressed data block includes at least one first compressed record corresponding one-to-one to the at least one first LZ77 compressed record, and a second compressed record corresponding to the second LZ77 compressed record.
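A compact way to see this second stage is to treat every LZ77 record as one symbol and build a Huffman code over those symbols with Python's `heapq`. This whole-token alphabet is a simplification assumed here (DEFLATE, for instance, uses separate literal/length and distance alphabets), bits are kept as a "0"/"1" string for readability, and the helper names are hypothetical.

```python
import heapq
from collections import Counter
from collections.abc import Hashable
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Match:
    distance: int
    length: int


def huffman_code(symbols: list[Hashable]) -> dict[Hashable, str]:
    """Build a prefix-free code: frequent LZ77 records get shorter bit strings."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a lone symbol still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode(symbols: list[Hashable], code: dict[Hashable, str]) -> str:
    return "".join(code[s] for s in symbols)


def decode(bits: str, code: dict[Hashable, str]) -> list[Hashable]:
    """Huffman decompression: recovers the LZ77 records, not the original records."""
    reverse = {c: s for s, c in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in reverse:  # prefix-free: the first complete code word is the symbol
            out.append(reverse[cur])
            cur = ""
    if cur:
        raise ValueError("trailing bits do not form a complete code word")
    return out


toks = [b"a", b"b", Match(2, 2), b"a", b"a"]
code = huffman_code(toks)
assert decode(encode(toks, code), code) == toks
```

Decoding yields literal records and distance-length pairs, which is exactly the intermediate form on which row masks can be computed and copied without undoing the LZ77 stage.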

S220: The data query device determines, according to the query condition, the row mask corresponding to each first compressed record in at least one consecutive first compressed record in the condition column.

Optionally, before S220, the data query device may perform, on each compressed record in the condition column, the decompression process corresponding to the second compression process to obtain a first decompressed data block, where the first decompressed data block includes at least one first decompressed record corresponding to the at least one consecutive first compressed record and a second decompressed record corresponding to the second compressed record.

Specifically, the data query device may determine the row mask of each first decompressed record in the at least one first decompressed record from that first decompressed record and the query condition.

In an optional embodiment, the data query device may apply Huffman decompression to each compressed record in the condition column to obtain a Huffman decompressed data block. The Huffman decompressed data block includes at least one first Huffman decompressed record corresponding to the at least one first LZ77 compressed record, and a second Huffman decompressed record corresponding to the second LZ77 compressed record. The at least one first Huffman decompressed record corresponds to at least one first original record. The data query device may then determine the row mask of each first original record from that first original record and the query condition.

S230: Suppose the data query device determines that the second compressed record in the condition column corresponds to at least one original record, and that this at least one original record is identical, in one-to-one correspondence, to the original records of the first compressed records in the at least one consecutive first compressed record. The data query device then determines that the row mask of the second compressed record includes the row mask of each of those first compressed records.

Specifically, suppose the second decompressed record is a distance-length pair whose value indicates that the second decompressed record corresponds to the at least one original record. Suppose also that this at least one original record is identical, in one-to-one correspondence, to the original records of the first decompressed records. The row mask of the second decompressed record is then determined to include the row mask of each of the at least one first decompressed record. The distance in the distance-length pair is the address offset between two sets of original records: those of the second decompressed record and those of the at least one first decompressed record. The length in the pair is the length of the at least one first decompressed record.
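Step S230 can be sketched as follows. The sketch assumes that a literal record is represented as `bytes` and a distance-length pair as a `(distance, length)` tuple counted in records. It also assumes the query condition is expressed as a Python predicate. The function name and the predicate are illustrative and are not taken from the source.

```python
from typing import Callable, List, Tuple, Union

Token = Union[bytes, Tuple[int, int]]  # literal record or (distance, length)


def row_mask_from_tokens(tokens: List[Token],
                         predicate: Callable[[bytes], bool]) -> List[bool]:
    """Derive the condition column's row mask from Huffman-decoded LZ77 tokens.

    The predicate runs only on literal records. A (distance, length) pair
    copies the mask bits of the earlier records it refers to, so the
    condition column never has to be LZ77-expanded.
    """
    mask: List[bool] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            start = len(mask) - distance
            if distance <= 0 or start < 0:
                raise ValueError("distance points outside the decoded rows")
            # Copy bit by bit so overlapping matches (distance < length) work.
            for k in range(length):
                mask.append(mask[start + k])
        else:
            mask.append(predicate(tok))
    return mask


calls = []


def is_bb(record: bytes) -> bool:
    calls.append(record)  # record how often the predicate actually runs
    return record == b"BB"


print(row_mask_from_tokens([b"AA", b"BB", (2, 2), b"CC"], is_bb))
# -> [False, True, False, True, False]
print(len(calls))  # -> 3 (the two rows covered by (2, 2) are not re-evaluated)
```

The saving comes from the copied rows: their mask bits cost a list append instead of a predicate evaluation.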

In an optional embodiment, the data query device may determine that the second Huffman decompressed record is a distance-length pair whose value indicates that it corresponds to the at least one first Huffman decompressed record, which in turn corresponds to at least one first original record. In that case, the row mask of the second Huffman decompressed record is determined to include the row mask of each first original record in the at least one first original record.

In another optional embodiment, the data query device may determine that the second Huffman decompressed record is a second original record that differs from every first original record in the at least one first original record. In that case, the row mask of the second Huffman decompressed record is determined from the second original record and the query request.

S240: The data query device determines the query result for the query request from the row mask of each compressed record in the condition column and from the target column.

Specifically, the data query device may use the row mask of each decompressed record in the condition column to determine, within the target column, the query result corresponding to the query request.
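Applying the row mask to the target column (S240, and S407/S408 in method 400) might look like the sketch below. It assumes the target column has already been Huffman-decoded into record-level LZ77 tokens: `bytes` literals and `(distance, length)` tuples counted in records. Unlike the condition column, the target column is fully expanded here, because the selected values themselves must be returned. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple, Union

Token = Union[bytes, Tuple[int, int]]  # literal record or (distance, length)


def lz77_decompress_records(tokens: Sequence[Token]) -> List[bytes]:
    """Expand record-level LZ77 tokens back into the full target column."""
    records: List[bytes] = []
    for tok in tokens:
        if isinstance(tok, tuple):
            distance, length = tok
            start = len(records) - distance
            if distance <= 0 or start < 0:
                raise ValueError("distance points outside the decoded rows")
            for k in range(length):  # record-by-record copy handles overlap
                records.append(records[start + k])
        else:
            records.append(tok)
    return records


def select_by_mask(target: Sequence[bytes], mask: Sequence[bool]) -> List[bytes]:
    """Keep the target-column records whose row-mask bit is set."""
    if len(target) != len(mask):
        raise ValueError("row mask and target column differ in length")
    return [rec for rec, keep in zip(target, mask) if keep]


target = lz77_decompress_records([b"x1", b"x2", b"x3", (3, 2)])
print(target)  # -> [b'x1', b'x2', b'x3', b'x1', b'x2']
print(select_by_mask(target, [False, True, False, True, False]))  # -> [b'x2', b'x1']
```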

In an optional embodiment, the data query device uses the row mask of each Huffman decompressed record to determine, within the target column, the query result corresponding to the query request.

In the data query method of this embodiment of the present invention, the data query device proceeds in four steps:

1. It obtains, from a compressed data block, the condition column for the query condition in a query request and the target column for the query target. The compressed data block is obtained by column-compressing an original data block in units of the column width.
2. According to the query condition, it determines the row masks of at least one consecutive first compressed record in the condition column.
3. A second compressed record in the condition column may correspond to at least one original record that is identical, in one-to-one correspondence, to the original records of those first compressed records. If so, the device determines that the row mask of the second compressed record includes the row masks of those first compressed records.
4. It determines the query result from the row mask of each compressed record in the condition column and from the target column.

Because the row mask of a repeated record is reused rather than recomputed, this can increase the speed of data query.

Optionally, the data query device may be a field-programmable gate array (FPGA) or a central processing unit (CPU).

In an optional embodiment, if the data query device is an FPGA, the FPGA may receive the condition column and the target column from the CPU.

Optionally, the query condition may include a first sub-query condition and a second sub-query condition. The condition column then includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. Before receiving the condition column and the target column from the CPU, the FPGA may receive from the CPU the logical relationship between the first sub-query condition and the second sub-query condition.

Optionally, the CPU may send the first and second condition columns in the operation order corresponding to the logical relationship, and the FPGA receives them in that order. Alternatively, the FPGA may receive from the CPU the first condition column, the second condition column, and the logical relationship between them. The FPGA then operates on the two condition columns itself, in the operation order corresponding to that relationship. This embodiment of the present invention does not limit which of these approaches is used.
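One possible reading of "operation order corresponding to the logical relationship" is sketched below. The condition columns are consumed one after another and folded into a single running row mask, and rows whose outcome is already fixed are skipped. This is an assumption for illustration only, since the source does not specify the FPGA's internal logic. Plain Python values and predicates stand in for decompressed records and sub-query conditions. Keeping only one running mask is one way to limit intermediate data.

```python
from typing import Any, Callable, List, Sequence


def combine_in_order(columns: Sequence[Sequence[Any]],
                     predicates: Sequence[Callable[[Any], bool]],
                     op: str) -> List[bool]:
    """Fold condition columns into one running row mask, one column at a time.

    `op` is the logical relationship between the sub-query conditions
    ("AND" or "OR"). Only a single mask is kept, and a row is re-tested
    only while its outcome is still undecided.
    """
    if op not in ("AND", "OR"):
        raise ValueError("op must be 'AND' or 'OR'")
    if not columns or len(columns) != len(predicates):
        raise ValueError("need one predicate per condition column")
    rows = len(columns[0])
    running = [op == "AND"] * rows  # AND starts all-true, OR starts all-false
    for column, pred in zip(columns, predicates):
        if len(column) != rows:
            raise ValueError("condition columns must have the same row count")
        for row, value in enumerate(column):
            if (op == "AND" and running[row]) or (op == "OR" and not running[row]):
                running[row] = pred(value)
    return running


price = [1, 5, 7, 2]
city = ["b", "b", "a", "a"]
print(combine_in_order([price, city], [lambda p: p > 3, lambda c: c == "a"], "AND"))
# -> [False, False, True, False]
print(combine_in_order([price, city], [lambda p: p > 3, lambda c: c == "a"], "OR"))
# -> [False, True, True, True]
```

Under AND, sending the more selective condition column first leaves fewer undecided rows for the next column. That is one plausible motivation for ordering the columns by the logical relationship.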

In the data query method of this embodiment, the FPGA receives the first and second condition columns in the operation order corresponding to the logical relationship. This speeds up generation of the row mask for the query condition. It also reduces the intermediate data and buffering needed while the row mask is determined, thereby increasing the speed of data query.

In another optional embodiment, the data query device is a CPU. The CPU may receive from a client a query request that includes a query condition and a query target. The CPU may read the compressed data block required by the query request from the data storage; the compressed data block carries column index information. Based on the compressed data block and the query request, the CPU determines the condition column corresponding to the query condition and the target column corresponding to the query target.

Optionally, the data query device may decompress the condition column before the target column. It then obtains the row mask for the query condition first and the decompressed target column afterwards.

FIG. 3 is a schematic flowchart of a data query method 300 according to an embodiment of the present invention. As shown in FIG. 3, the method 300 may be executed by a data query device.

S310: The CPU obtains, from a compressed data block, the condition column corresponding to the query condition in a query request and the target column corresponding to the query target in the query request. The compressed data block is obtained by column-compressing the original data block in units of the column width.

Specifically, the CPU may receive from a client a query request that includes a query condition and a query target. According to the query request, the CPU may locate in the data storage the compressed data block corresponding to that request. This block is obtained by column-compressing the original data block in units of the column width. From it, the CPU may determine the condition column corresponding to the query condition and the target column corresponding to the query target.
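Locating the condition and target columns through the block's column index information can be sketched as follows. The block layout, a name-to-(offset, size) table plus a byte payload, is a hypothetical stand-in: the source states only that the compressed data block carries column index information. The columns are returned still compressed, because in methods 300 and 400 decompression happens later on the FPGA.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class CompressedBlock:
    # Hypothetical column index information: column name -> (offset, size)
    # of that column's compressed bytes inside `payload`.
    column_index: Dict[str, Tuple[int, int]]
    payload: bytes

    def column(self, name: str) -> bytes:
        offset, size = self.column_index[name]
        return self.payload[offset:offset + size]


def extract_columns(block: CompressedBlock,
                    condition_names: Iterable[str],
                    target_names: Iterable[str]
                    ) -> Tuple[Dict[str, bytes], Dict[str, bytes]]:
    """Return (condition columns, target columns), still compressed."""
    conditions = {name: block.column(name) for name in condition_names}
    targets = {name: block.column(name) for name in target_names}
    return conditions, targets


block = CompressedBlock({"city": (0, 3), "sales": (3, 4)}, b"abcWXYZ")
conds, targets = extract_columns(block, ["city"], ["sales"])
print(conds, targets)  # -> {'city': b'abc'} {'sales': b'WXYZ'}
```

Returning the condition and target columns separately lets the caller send the condition columns first and the target column afterwards.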

S320: The CPU sends the condition column and the target column to the FPGA.

Optionally, the query condition includes a first sub-query condition and a second sub-query condition. The condition column then includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. The CPU may send the FPGA the logical relationship between the first and second sub-query conditions. It then sends the first and second condition columns to the FPGA in the operation order corresponding to that logical relationship.

Optionally, the CPU may send the target column to the FPGA after sending the condition column.

In the data query method of this embodiment of the present invention, the CPU obtains, from a compressed data block, the condition column corresponding to the query condition in a query request and the target column corresponding to the query target. The compressed data block is obtained by column-compressing the original data block in units of the column width. This can increase the speed of data query.

FIG. 4 is a schematic flowchart of a data query method 400 according to an embodiment of the present invention. As shown in FIG. 4, the method 400 is applied to the data query system shown in FIG. 1.

S401: The CPU receives a query request from a client. The query request includes a query condition and a query target.

S402: The CPU obtains from the data storage the compressed data block required by the query request. The compressed data block carries column index information and is obtained by column-compressing the original data block in units of the column width.

Optionally, the compressed data block may be obtained by applying a first compression and then a second compression to the original data block. More specifically, the first compression may be LZ77 compression and the second may be Huffman compression.

S403: The CPU obtains, from the compressed data block, the condition column corresponding to the query condition in the query request and the target column corresponding to the query target in the query request.

S404: The CPU sends the condition column and the target column to the FPGA.

Optionally, the CPU may send the target column to the FPGA after sending the condition column.

Optionally, the query condition may include a first sub-query condition and a second sub-query condition. The condition column then includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. The CPU may send the first and second condition columns in the operation order corresponding to the logical relationship between the two sub-query conditions.

S405: The FPGA decompresses the condition column.

S406: The FPGA determines the row mask corresponding to the query condition.

Specifically, the FPGA may apply Huffman decompression to the condition column to obtain its Huffman-decompressed result. From that result and the query condition, it determines the row mask corresponding to the query condition.

S407: The FPGA decompresses the target column.

Optionally, the FPGA may apply Huffman decompression and then LZ77 decompression to the target column to obtain the decompressed target column.

S408: The FPGA determines the query result from the row mask and the target column.

S409: The FPGA returns the query result to the client through the CPU.
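Steps S401 to S409 can be strung together in a compact end-to-end sketch. Everything here is assumed for illustration:

- an in-memory dict stands in for the data storage;
- the Huffman stage is omitted, so the tokens are treated as already Huffman-decoded;
- the CPU-to-FPGA transfer is an ordinary function call.

What the sketch shows is the ordering: the row mask is derived from the compressed condition column before the target column is expanded.

```python
from typing import Callable, Dict, List, Tuple, Union

Token = Union[bytes, Tuple[int, int]]  # literal record or (distance, length)


def lz77_records(records: List[bytes]) -> List[Token]:
    """Storage side: greedy record-level LZ77 (matches of >= 2 records)."""
    out: List[Token] = []
    i = 0
    while i < len(records):
        best_len, best_dist = 0, 0
        for s in range(i):
            n = 0
            while i + n < len(records) and records[s + n] == records[i + n]:
                n += 1
            if n > best_len:
                best_len, best_dist = n, i - s
        if best_len >= 2:
            out.append((best_dist, best_len))
            i += best_len
        else:
            out.append(records[i])
            i += 1
    return out


def fpga_query(cond: List[Token], pred: Callable[[bytes], bool],
               target: List[Token]) -> List[bytes]:
    # Tokens are assumed well-formed: 0 < distance <= rows decoded so far.
    # S405/S406: row mask straight from the condition tokens.
    mask: List[bool] = []
    for t in cond:
        if isinstance(t, tuple):
            for _ in range(t[1]):
                mask.append(mask[len(mask) - t[0]])  # reuse earlier mask bits
        else:
            mask.append(pred(t))
    # S407: expand the target column only after the mask is known.
    rows: List[bytes] = []
    for t in target:
        if isinstance(t, tuple):
            for _ in range(t[1]):
                rows.append(rows[len(rows) - t[0]])
        else:
            rows.append(t)
    # S408: keep target rows whose mask bit is set.
    return [r for r, keep in zip(rows, mask) if keep]


def cpu_handle(table: Dict[str, List[Token]], cond_col: str,
               pred: Callable[[bytes], bool], target_col: str) -> List[bytes]:
    # S401-S404: pick the columns; the condition column is passed first.
    # S409: the FPGA result is returned to the caller through the CPU.
    return fpga_query(table[cond_col], pred, table[target_col])


city = [b"BJ", b"SH", b"BJ", b"SH", b"GZ"]
sales = [b"10", b"20", b"10", b"20", b"30"]
table = {"city": lz77_records(city), "sales": lz77_records(sales)}
print(cpu_handle(table, "city", lambda r: r == b"SH", "sales"))  # -> [b'20', b'20']
```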

The data query methods of the embodiments of the present invention have been described above with reference to FIG. 2 to FIG. 4. The data query apparatuses of the embodiments are described below with reference to FIG. 5 and FIG. 6. These examples are intended only to help those skilled in the art understand and implement the embodiments of the present invention, not to limit their scope. Those skilled in the art may make equivalent transformations or modifications based on the examples given here, and such transformations or modifications shall still fall within the scope of the embodiments of the present invention.

FIG. 5 is a schematic block diagram of a data query apparatus 500 according to an embodiment of the present invention.

The obtaining unit 510 is used by the data query device to obtain, from a compressed data block, the condition column corresponding to the query condition in a query request and the target column corresponding to the query target in the query request. The compressed data block is obtained by column-compressing an original data block in units of the column width.

The determining unit 520 is used by the data query device to determine, according to the query condition, the row mask of each first compressed record in the at least one consecutive first compressed record in the condition column.

The determining unit 520 is further configured to handle a second compressed record in the condition column. The data query device may determine that this record corresponds to at least one original record that is identical, in one-to-one correspondence, to the original records of the first compressed records in the at least one consecutive first compressed record. If so, the unit determines that the row mask of the second compressed record includes the row mask of each of those first compressed records. The unit is also configured to determine the query result for the query request from the row mask of each compressed record in the condition column and from the target column.

Optionally, the compressed data block is obtained by applying a first compression and then a second compression to the original data block. The obtaining unit is further configured to act before the data query device determines, according to the query condition, the row masks of the consecutive first compressed records in the condition column. At that point, it applies to each compressed record in the condition column the decompression corresponding to the second compression, obtaining a first decompressed data block. The first decompressed data block includes at least one first decompressed record corresponding to the at least one consecutive first compressed record, and a second decompressed record corresponding to the second compressed record. The determining unit is further configured to determine the row mask of each first decompressed record in the at least one first decompressed record.

Optionally, the determining unit 520 is specifically configured to handle the case where the second decompressed record is a distance-length pair. The pair's value must indicate that the second decompressed record corresponds to the at least one original record, and that record must be identical, in one-to-one correspondence, to the original records of the first decompressed records. In that case, the unit determines that the row mask of the second decompressed record includes the row mask of each of the at least one first decompressed record. The distance in the pair is the address offset between the original records of the second decompressed record and those of the at least one first decompressed record. The length in the pair is the length of the at least one first decompressed record.

Optionally, the determining unit 520 is further configured to handle a second decompressed record that is an original record. In that case, it determines the row mask of the second decompressed record from the corresponding original record and the query condition.

Optionally, the data query device is an FPGA, and the obtaining unit is specifically configured so that the FPGA receives the condition column and the target column sent by the central processing unit (CPU).

Optionally, the query condition includes a first sub-query condition and a second sub-query condition. The condition column then includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. The apparatus further includes a receiving unit. Before the FPGA receives the condition column and the target column from the CPU, the receiving unit receives the logical relationship between the first and second sub-query conditions sent by the CPU. The obtaining unit is specifically configured to receive the first and second condition columns sent by the CPU in the operation order corresponding to that logical relationship.

Optionally, the data query device is a CPU, and the obtaining unit is specifically configured so that the CPU obtains the compressed data block and the query request, and determines the condition column and the target column from them.

Optionally, the first compression is LZ77 compression and the second compression is Huffman compression.

FIG. 6 is a schematic block diagram of a data query apparatus 600 according to an embodiment of the present invention.

The obtaining unit 610 is used by the central processing unit (CPU) to obtain, from a compressed data block, the condition column corresponding to the query condition in a query request and the target column corresponding to the query target in the query request. The compressed data block is obtained by column-compressing an original data block in units of the column width.

The sending unit 620 is used by the CPU to send the condition column and the target column to the field-programmable gate array (FPGA).

Optionally, the query condition includes a first sub-query condition and a second sub-query condition. The condition column then includes a first condition column corresponding to the first sub-query condition and a second condition column corresponding to the second sub-query condition. The sending unit 620 is specifically configured so that the CPU sends the FPGA the logical relationship between the first and second sub-query conditions. The CPU then sends the first and second condition columns to the FPGA in the operation order corresponding to that logical relationship.

Optionally, the sending unit is specifically configured so that the CPU sends the target column to the FPGA after sending the condition column.

A person of ordinary skill in the art may be aware that the method steps and units described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the steps and components of each embodiment have been described above generally in terms of their functions. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A person of ordinary skill in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.

A person skilled in the art may clearly understand that, for convenient and brief description, the detailed working processes of the foregoing systems, apparatuses, and units are not described again here. Reference may be made to the corresponding processes in the foregoing method embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units. These connections may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate. Components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present invention.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention may be embodied in the form of a software product. This applies to the technical solutions in essence, to the part that contributes to the prior art, or to all or part of the technical solutions. The computer software product is stored in a storage medium and includes several instructions that cause a computer device to perform all or some of the steps of the methods described in the embodiments of the present invention. The computer device may be a personal computer, a server, a network device, or the like. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of the present invention, and the protection scope of the present invention is not limited thereto. Any equivalent modification or replacement readily conceived by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.