CN112883059A - Query operation instruction optimization method and device, electronic equipment and storage medium

Info

Publication number
CN112883059A
Authority
CN
China
Prior art keywords
query
query operation
compressed data
data
optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911204591.3A
Other languages
Chinese (zh)
Inventor
缪哲语
李猛
吴迪
乔智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911204591.3A
Priority to PCT/CN2020/132386 (WO2021104478A1)
Publication of CN112883059A
Legal status: Pending

Abstract

The embodiment of the invention discloses a query operation instruction optimization method, a query operation instruction optimization device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a query operation instruction, and acquiring corresponding query operation object data according to the query operation instruction, wherein the query operation object data are compressed data and carry corresponding attribute information; determining query operable compressed data corresponding to the query operation object data according to the query operation object data, and acquiring attribute information of the query operable compressed data; and carrying out hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data to obtain the optimized query operation instruction. The technical scheme has strong applicability, avoids a complex data decompression process, saves the storage space of decompressed data to a great extent, simplifies the data operation flow and improves the data operation performance.

Description

Query operation instruction optimization method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of query operation control, in particular to a query operation instruction optimization method and device, electronic equipment and a storage medium.
Background
As data technology develops, more and more data needs to be stored in a database. In order to save the data storage space, it is usually necessary to compress the data to be stored before storing the data, for example, for a HiStore column database, an average compression ratio higher than 10:1 can be achieved by using a suitable compression algorithm, so that the data storage space can be greatly saved. However, when performing subsequent data operations such as data query, the compressed data cannot be directly operated, for example, if the original execution engine uses an original data type consistent with the logic execution plan during execution, the compressed data needs to be completely decompressed first, and then the query operation is performed after the compressed data is restored to the original data, so that the complicated data decompression process can greatly reduce the query performance of the engine.
For the above problems, the following approaches are generally adopted: 1. Eager decompression, that is, decompressing the data when it is brought into main memory. Although this approach limits the code changes to the compressed storage manager, it does not avoid decompressing the data, so the execution of data operations is not substantially optimized; memory usage is not optimized either, because the data held in memory is the uncompressed original data. 2. Lazy decompression, that is, keeping the data compressed for as long as possible during the execution of the query and other operations, and decompressing only when necessary, for example before a physical operator runs. This approach is only suitable for those compression methods that provide a complete mapping (mapping-complete), that is, methods whose data can be decompressed through a complete mapping, and additional mapping operations need to be added, which slows down execution and may increase the size of intermediate results. 3. Transient decompression, that is, adjusting the standard relational operators so that the compression properties of the data are the same before and after an operation and the data is decompressed only inside the current operator. This approach is basically applicable only to relational operators, so its application is very limited.
Disclosure of Invention
The embodiment of the invention provides a query operation instruction optimization method, a query operation instruction optimization device, electronic equipment and a storage medium.
In a first aspect, an embodiment of the present invention provides a query operation instruction optimization method.
Specifically, the query operation instruction optimization method includes:
acquiring a query operation instruction, and acquiring corresponding query operation object data according to the query operation instruction, wherein the query operation object data are compressed data and carry corresponding attribute information;
determining query operable compressed data corresponding to the query operation object data according to the query operation object data, and acquiring attribute information of the query operable compressed data;
and carrying out hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data to obtain the optimized query operation instruction.
With reference to the first aspect, in a first implementation manner of the first aspect, the compressed data is stored column by column in units of row groups, and the compressed data includes a compressed data packet for storing the compressed data and a compressed data information packet for storing attribute information of the compressed data.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the determining, according to the query operation object data, query operable compressed data corresponding to the query operation object data, and acquiring attribute information of the query operable compressed data includes:
when the query operation object data is operable compressed data, determining the query operation object data as corresponding query operable compressed data;
when the query operation object data is the inoperable compressed data, decompressing the inoperable compressed data until operable compressed data is obtained, and determining the operable compressed data obtained by decompression as the query operable compressed data corresponding to the query operation object data;
acquiring attribute information of the query operable compressed data, wherein the attribute information of the query operable compressed data at least comprises one or more of the following information: statistical information, compression information, storage information.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the performing hierarchical optimization on the query operation instruction according to the attribute information of the query-operable compressed data to obtain an optimized query operation instruction includes:
determining a query operation expression and a query operation operator corresponding to the query operation instruction according to the query operation instruction;
acquiring first attribute information of the query operable compressed data, and performing first-level optimization on the query operation instruction according to the first attribute information to obtain a corresponding first-level optimization query operation expression and a corresponding first-level optimization query operation operator;
and acquiring second attribute information of the query operable compressed data, performing secondary optimization on the first-stage optimization query operation expression according to the second attribute information to obtain a corresponding secondary optimization query operation expression and a corresponding secondary optimization query operation operator until a final-stage optimization query operation expression and a final-stage optimization query operation operator are obtained, and acquiring the optimized query operation instruction according to the final-stage optimization query operation expression and the final-stage optimization query operation operator.
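The level-by-level optimization described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the tuple-based expression format, the two level functions, and the attribute dictionary layout are all hypothetical, and only the expression (not the operator) is rewritten. Level 1 is assumed to use statistical information (min/max) and level 2 is assumed to use compression information (the base of a Frame-of-Reference encoding).

```python
def level_statistics(expr, attrs):
    """Level 1 (assumed): fold `column < const` to a constant using min/max."""
    if expr[0] != "lt":
        return expr
    _, column, const = expr
    stats = attrs["statistics"]
    if stats["max"] < const:
        return ("const", True)    # every value satisfies the predicate
    if stats["min"] >= const:
        return ("const", False)   # no value satisfies the predicate
    return expr


def level_compression(expr, attrs):
    """Level 2 (assumed): move the constant into the FOR-encoded domain."""
    comp = attrs["compression"]
    if expr[0] != "lt" or comp["codec"] != "for":
        return expr
    _, column, const = expr
    # value < const  <=>  (value - base) < (const - base)
    return ("lt", column + "_offset", const - comp["base"])


def optimize(expr, attrs, levels=(level_statistics, level_compression)):
    """Feed the output of each level into the next until the last level."""
    for level in levels:
        expr = level(expr, attrs)
    return expr


# Hypothetical attribute information for one block of compressed data.
attrs = {
    "statistics": {"min": 900, "max": 1001},
    "compression": {"codec": "for", "base": 900},
}
optimized = optimize(("lt", "price", 950), attrs)
```

Here `optimized` is a predicate on the stored offsets, so a later execution step could evaluate it without restoring the original values.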
With reference to the first implementation manner of the first aspect, the second implementation manner of the first aspect, and the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the present disclosure further includes:
and executing operation on the query operable compressed data according to the optimized query operation instruction.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, the third implementation manner of the first aspect, and the fourth implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the performing, according to the optimized query operation instruction, an operation on the query-operable compressed data includes:
determining target query operable compressed data in the query operable compressed data according to the optimized query operation instruction;
and performing operation on the target query operable compressed data according to the optimized query operation instruction.
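One plausible reading of "determining target query operable compressed data" is pruning by per-block statistics. The Python sketch below assumes that reading; the dictionary layout (`min`, `max`, `base`, `offsets`) and both function names are hypothetical, and the blocks are assumed to be Frame-of-Reference encoded.

```python
def select_target_groups(groups, threshold):
    """Keep only the blocks that may contain a value below the threshold."""
    return [g for g in groups if g["min"] < threshold]


def count_below(groups, threshold):
    """Count values < threshold, touching only the selected blocks."""
    total = 0
    for g in select_target_groups(groups, threshold):
        if g["max"] < threshold:
            # The whole block qualifies; its offsets need not be scanned.
            total += len(g["offsets"])
        else:
            # Compare in the encoded domain: offset < threshold - base.
            shifted = threshold - g["base"]
            total += sum(1 for off in g["offsets"] if off < shifted)
    return total


# Hypothetical blocks; each stores offsets relative to its own base.
groups = [
    {"min": 1, "max": 9, "base": 1, "offsets": [2, 8, 3, 0]},
    {"min": 980, "max": 1010, "base": 980, "offsets": [20, 0, 10, 30]},
    {"min": 900, "max": 1001, "base": 900, "offsets": [0, 40, 55, 101]},
]
targets = select_target_groups(groups, 950)
```

In this sketch the second block is never read, because its minimum already rules it out.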
In a second aspect, an embodiment of the present invention provides a query operation instruction optimization device.
Specifically, the query operation instruction optimization device includes:
the acquisition module is configured to acquire a query operation instruction and acquire corresponding query operation object data according to the query operation instruction, wherein the query operation object data are compressed data and carry corresponding attribute information;
the determining module is configured to determine query operable compressed data corresponding to the query operation object data according to the query operation object data and acquire attribute information of the query operable compressed data;
and the optimization module is configured to perform hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data to obtain an optimized query operation instruction.
With reference to the second aspect, in a first implementation manner of the second aspect, the compressed data is stored column by column in units of row groups, and the compressed data includes a compressed data packet for storing the compressed data and a compressed data information packet for storing attribute information of the compressed data.
With reference to the second aspect and the first implementation manner of the second aspect, in a second implementation manner of the second aspect, the determining module includes:
a first determining sub-module configured to determine the query operation object data as query operable compressed data corresponding to the query operation object data when the query operation object data is operable compressed data;
a second determining sub-module, configured to, when the query operation object data is the inoperable compressed data, decompress the inoperable compressed data until operable compressed data is obtained, and determine the operable compressed data obtained by decompression as query operable compressed data corresponding to the query operation object data;
an obtaining sub-module configured to obtain attribute information of the query-operable compressed data, wherein the attribute information of the query-operable compressed data at least includes one or more of the following information: statistical information, compression information, storage information.
With reference to the second aspect, the first implementation manner of the second aspect, and the second implementation manner of the second aspect, in a third implementation manner of the second aspect, the optimization module includes:
the third determining sub-module is configured to determine a query operation expression and a query operation operator corresponding to the query operation instruction according to the query operation instruction;
the first optimization submodule is configured to acquire first attribute information of the query operable compressed data, and perform first-level optimization on the query operation instruction according to the first attribute information to obtain a corresponding first-level optimization query operation expression and a corresponding first-level optimization query operation operator;
and the second optimization submodule is configured to acquire second attribute information of the query operable compressed data, perform secondary optimization on the first-stage optimization query operation expression according to the second attribute information, acquire a corresponding secondary optimization query operation expression and a corresponding secondary optimization query operation operator until a final-stage optimization query operation expression and a final-stage optimization query operation operator are acquired, and acquire the optimized query operation instruction according to the final-stage optimization query operation expression and the final-stage optimization query operation operator.
With reference to the first implementation manner of the second aspect, the second implementation manner of the second aspect, and the third implementation manner of the second aspect, in a fourth implementation manner of the second aspect, the present disclosure further includes:
an execution module configured to perform an operation on the query-operable compressed data according to the optimized query operation instruction.
With reference to the first implementation manner of the second aspect, the second implementation manner of the second aspect, the third implementation manner of the second aspect, and the fourth implementation manner of the second aspect, in a fifth implementation manner of the second aspect, the executing module includes:
a fourth determining sub-module configured to determine target query operable compressed data in the query operable compressed data according to the optimized query operation instruction;
an execution submodule configured to perform an operation on the target query-operable compressed data according to the optimized query operation instruction.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory and a processor, where the memory is used to store one or more computer instructions that support a query operation instruction optimization device in executing the above query operation instruction optimization method, and the processor is configured to execute the computer instructions stored in the memory. The query operation instruction optimization device may further include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer instructions used by a query operation instruction optimization device, where the computer instructions include instructions for causing the query operation instruction optimization device to execute the above query operation instruction optimization method.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
according to the technical scheme, the query operation instruction is subjected to hierarchical optimization by means of the attribute information of the query operation object data to obtain the optimized query operation instruction, so that the optimized query operation instruction is conveniently and directly used for operating the compressed data in the follow-up process. The technical scheme has strong applicability, avoids a complex data decompression process, saves the storage space of decompressed data to a great extent, simplifies the data operation flow and improves the data operation performance.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the invention.
Drawings
Other features, objects and advantages of embodiments of the invention will become more apparent from the following detailed description of non-limiting embodiments thereof, when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 illustrates a flow diagram of a method for query operation instruction optimization according to an embodiment of the invention;
FIG. 2 is a schematic diagram of compressed data stored column by column according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of data compression and compressed data query in the prior art;
FIG. 4 shows a flowchart of step S102 of a query operation instruction optimization method according to the embodiment shown in FIG. 1;
FIG. 5 is a flow chart illustrating data compression and compressed data query according to an embodiment of the present invention;
FIG. 6 shows a flowchart of step S103 of the query operation instruction optimization method according to the embodiment shown in FIG. 1;
FIG. 7 is a flowchart illustrating query operation instruction optimization according to an embodiment of the present invention;
FIG. 8 illustrates a flow diagram of a method for query operation instruction optimization according to another embodiment of the invention;
FIG. 9 is a flowchart illustrating step S804 of the query operation instruction optimization method according to the embodiment illustrated in FIG. 8;
fig. 10A is a schematic diagram of a logical expression according to example 1 of the present invention, fig. 10B is a schematic diagram of a database storage according to example 1 of the present invention, fig. 10C is a schematic diagram of a flow of a prior art process according to example 1 of the present invention, and fig. 10D is a schematic diagram of a process of an inventive scheme according to example 1 of the present invention;
fig. 11A is a schematic database storage diagram according to example 2 of the present invention, fig. 11B is a schematic prior art process flow diagram according to example 2 of the present invention, fig. 11C is a schematic optimization logic representation according to example 2 of the present invention, fig. 11D is a schematic prior art process flow diagram according to example 2 of the present invention, and fig. 11E is a schematic prior art process flow diagram according to example 2 of the present invention;
fig. 12A is a schematic diagram of a logical expression according to example 3 of the present invention, fig. 12B is a schematic diagram of a database storage according to example 3 of the present invention, fig. 12C is a schematic diagram of a prior art process flow according to example 3 of the present invention, and fig. 12D is a schematic diagram of an optimized logical expression according to example 3 of the present invention;
FIG. 13A is a schematic diagram of a logic expression according to example 4 of the present invention, FIG. 13B is a schematic diagram of a prior art process flow according to example 4 of the present invention, and FIG. 13C is a schematic diagram of an optimized logic expression according to example 4 of the present invention;
FIG. 14A is a schematic diagram of a logic expression according to example 5 of the present invention, FIG. 14B is a schematic diagram of a prior art process flow according to example 5 of the present invention, and FIG. 14C is a schematic diagram of an optimized logic expression according to example 5 of the present invention;
FIG. 15 is a block diagram showing the structure of a query operation instruction optimization apparatus according to an embodiment of the present invention;
FIG. 16 is a block diagram illustrating the structure of a determination module 1502 of the query operation instruction optimization apparatus according to the embodiment shown in FIG. 15;
FIG. 17 is a block diagram illustrating the structure of an optimization module 1503 of the query operation instruction optimization apparatus according to the embodiment shown in FIG. 15;
FIG. 18 is a block diagram showing the construction of a query operation instruction optimizing apparatus according to another embodiment of the present invention;
FIG. 19 is a block diagram illustrating the structure of an execution module 1804 of the query operation instruction optimization apparatus according to the embodiment shown in FIG. 18;
FIG. 20 shows a block diagram of an electronic device according to an embodiment of the invention;
FIG. 21 is a block diagram of a computer system suitable for implementing a query operation instruction optimization method according to an embodiment of the present invention.
Detailed Description
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the embodiments of the present invention, it is to be understood that terms such as "including" or "having", etc., are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in the present specification, and are not intended to exclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof may be present or added.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. Embodiments of the present invention will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
According to the technical scheme provided by the embodiment of the invention, the query operation instruction is subjected to hierarchical optimization by virtue of the attribute information of the query operation object data to obtain the optimized query operation instruction, so that the optimized query operation instruction is conveniently and directly used for operating the compressed data in the follow-up process. The technical scheme has strong applicability, avoids a complex data decompression process, saves the storage space of decompressed data to a great extent, simplifies the data operation flow and improves the data operation performance.
Fig. 1 shows a flowchart of a query operation instruction optimization method according to an embodiment of the present invention, which is suitable for a query operation instruction optimization server, and as shown in fig. 1, the query operation instruction optimization method includes the following steps S101 to S103:
in step S101, a query operation instruction is obtained, and corresponding query operation object data is obtained according to the query operation instruction, where the query operation object data is compressed data and carries corresponding attribute information;
in step S102, determining query operable compressed data corresponding to the query operation object data according to the query operation object data, and acquiring attribute information of the query operable compressed data;
in step S103, the query operation instruction is optimized in a hierarchical manner according to the attribute information of the query operable compressed data, so as to obtain an optimized query operation instruction.
As mentioned above, as data technology advances, more and more data needs to be stored in a database. In order to save the data storage space, the data to be stored needs to be compressed and then stored, but the compressed data cannot be directly operated when data operations such as data query are performed subsequently, the compressed data needs to be completely decompressed first, and then the query operation is performed after the compressed data is restored to the original data, so that the complicated data decompression process reduces the query performance of the engine to a great extent. The prior art solutions either do not substantially optimize the execution of data operations and memory usage or have very limited application scenarios.
In view of the above problem, in this embodiment, a query operation instruction optimization method is provided, which performs hierarchical optimization on a query operation instruction by using attribute information of query operation object data to obtain an optimized query operation instruction, so as to subsequently directly perform an operation on compressed data by using the optimized query operation instruction. The technical scheme has strong applicability, avoids a complex data decompression process, saves the storage space of decompressed data to a great extent, simplifies the data operation flow and improves the data operation performance.
In an embodiment of the present invention, the query operation instruction refers to an instruction for achieving a certain operation purpose, such as a query instruction or a search instruction; in the SQL field, the query operation instruction may be implemented as an SQL query operation instruction.
In an embodiment of the present invention, the query operation object data corresponding to the query operation instruction refers to data that is related to the query operation instruction and may ultimately be an object of the query operation. For example, if the query operation instruction queries for numbers less than 950 in a certain database, all data stored in the database may be considered the query operation object data corresponding to the query operation instruction, since any of it may be involved in the subsequent query operation.
In an embodiment of the present invention, since the query operation object data is data already stored in a database, the query operation object data is compressed data. The compressed data further carries corresponding attribute information, which may include one or more of the following: statistical information of the compressed data, compression information of the compressed data, storage information of the compressed data, and the like. It is the existence of this attribute information that makes the optimization of the query operation instruction possible. The statistical information may be, for example, the maximum data value, the minimum data value, the sum of the data within a preset range, the data type, and the like. The compression information may be, for example, the original data type, the data storage type, the data compression method, and the like. The storage information may be, for example, the storage type, the storage location, and the like.
In an embodiment of the present invention, the compressed data may be, for example, compressed data stored column by column in a HiStore column database in units of row groups. A row group refers to a data unit formed by a plurality of rows in column storage, and a row refers to a data row in the conventional sense in the field of databases. That is, in the HiStore column database, data is stored column by column, and within a column the data is further divided into row groups each formed by a plurality of rows. In addition, the data in a row group may be independently compressed and subjected to information statistics with the row group as the unit. The result of the information statistics is the statistical information, or attribute information, of the row group, and is mainly used to describe the characteristics and attributes of the data stored in the row group. Fig. 2 is a schematic diagram of column-by-column storage of compressed data according to an embodiment of the present invention. As shown in fig. 2, the data storage table includes column 1, column 2, column 3, and other columns (not shown in fig. 2), and row group 1, row group 2, row group 3, and other row groups (not shown in fig. 2). In fig. 2, the intersection of a column and a row group represents a row group unit included in the column. The black box in each intersection represents the statistical information of the row group, i.e., a compressed data information packet, and the gray box represents the data packet formed by compressing the rows of data included in the row group, i.e., a compressed data packet.
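The row group layout described above can be sketched in Python as follows. This is an illustration only, not the HiStore format: the row group size, the use of `zlib` as the per-group codec, and the field names of the information packet are assumptions made for the example.

```python
import struct
import zlib

ROWS_PER_GROUP = 4  # hypothetical; chosen small so the example is readable


def build_column(values, rows_per_group=ROWS_PER_GROUP):
    """Split one column into row groups, each with an info packet and a data packet."""
    groups = []
    for start in range(0, len(values), rows_per_group):
        rows = values[start:start + rows_per_group]
        raw = struct.pack(f"<{len(rows)}q", *rows)   # 64-bit little-endian integers
        data_packet = zlib.compress(raw)             # compressed independently per row group
        info_packet = {                              # statistics describing this row group
            "min": min(rows),
            "max": max(rows),
            "sum": sum(rows),
            "count": len(rows),
            "codec": "zlib",
        }
        groups.append({"info": info_packet, "data": data_packet})
    return groups


def read_group(group):
    """Decompress a single row group without touching the others."""
    count = group["info"]["count"]
    return list(struct.unpack(f"<{count}q", zlib.decompress(group["data"])))


column = build_column([3, 9, 4, 1, 1000, 980, 990, 1010, 7])
```

Because every row group carries its own information packet, a reader can inspect `min`, `max`, or `sum` for a group without decompressing its data packet.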
In an embodiment of the present invention, operable compressed data refers to compressed data on which operations such as queries can be performed directly, and may also be referred to as interpretable compressed data. Compressed data obtained by compression methods such as increment-based Delta compression, greatest-common-divisor-based GCD compression, FOR (Frame-of-Reference) compression, and the various types of Dictionary compression is all operable compressed data. Conversely, inoperable compressed data refers to compressed data on which operations such as queries cannot be performed directly; it must first be decompressed into operable compressed data. It may also be referred to as non-interpretable compressed data. For example, compressed data obtained by compression methods such as Huffman coding, arithmetic coding, and the LZ77/LZ4 algorithms is all inoperable compressed data. Fig. 3 is a schematic flow chart of data compression and compressed data query in the prior art. As shown in fig. 3, in the general case, when original data is compressed and stored on disk, the original data may first be compressed into operable compressed data with a certain compression ratio, and the operable compressed data may then be compressed into inoperable compressed data with a higher compression ratio for storage. When the compressed data needs to be queried, the inoperable compressed data must first be decompressed into operable compressed data with a certain readability, and the operable compressed data must then be decompressed and restored to the original data before the data query operation is performed.
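To make "operable" concrete, the following Python sketch encodes a few integers with a simple Frame-of-Reference scheme and evaluates a comparison directly on the encoded form. The function names and the dictionary representation are hypothetical; a production encoder would also bit-pack the offsets.

```python
def for_encode(values):
    """Frame-of-Reference: store the minimum once, then non-negative offsets."""
    base = min(values)
    return {"base": base, "offsets": [v - base for v in values]}


def for_decode(encoded):
    """Restore the original values (needed only when a query cannot avoid it)."""
    return [encoded["base"] + off for off in encoded["offsets"]]


def count_less_than(encoded, threshold):
    """Evaluate `value < threshold` without restoring the original values."""
    # value < threshold  <=>  offset < threshold - base
    shifted = threshold - encoded["base"]
    return sum(1 for off in encoded["offsets"] if off < shifted)


encoded = for_encode([900, 940, 955, 1001])
matches = count_less_than(encoded, 950)
```

The comparison constant is shifted once, after which every test runs on the stored offsets. An entropy-coded stream such as Huffman output offers no comparable shortcut, which is why the description classes it as inoperable.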
In an embodiment of the present invention, the query operation instruction is hierarchically optimized according to the attribute information of the query-operable compressed data, which means that the query operation instruction is simplified and optimized in different aspects according to the obtained attribute information of the query-operable compressed data, so as to finally obtain an optimized query operation instruction, and a specific optimization process of the query operation instruction will be described in detail below.
In an embodiment of the present invention, as shown in fig. 4, the step S102 of determining query operable compressed data corresponding to the query operation target data according to the query operation target data and acquiring attribute information of the query operable compressed data includes the following steps S401 to S403:
in step S401, when the query operation object data is operable compressed data, determining the query operation object data as query operable compressed data corresponding thereto;
in step S402, when the query operation object data is the inoperable compressed data, decompressing the inoperable compressed data until operable compressed data is obtained, and determining the operable compressed data obtained by decompression as the query operable compressed data corresponding to the query operation object data;
in step S403, obtaining attribute information of the query-operable compressed data, where the attribute information of the query-operable compressed data at least includes one or more of the following information: statistical information, compression information, storage information.
As mentioned above, in the prior art, querying stored compressed data requires decompressing the operable compressed data into the original data and then performing the query operation on the original data. In order to avoid this cumbersome decompression process, save the storage space of decompressed data, simplify the data operation flow, and improve data operation performance, this embodiment performs the corresponding operations directly on compressed data. To that end, it is first determined whether the query operation object data is operable compressed data. If the query operation object data obtained previously is operable compressed data, it is determined as the corresponding query operable compressed data, and data operations such as queries can subsequently be performed on it directly. If the query operation object data is inoperable compressed data, it needs to be decompressed until the corresponding operable compressed data is obtained.
Fig. 5 is a schematic flow chart of data compression and compressed data query according to an embodiment of the present invention. As shown in fig. 5, when compressed data is queried, the inoperable compressed data stored in the database only needs to be decompressed into operable compressed data with a certain readability; the query operation can then be performed without decompressing and restoring the operable compressed data into the original data.
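Steps S401 to S403 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the names CompressedBlock, to_query_operable, and get_attributes are hypothetical, the attribute dictionary layout is assumed, and zlib (a DEFLATE codec) merely stands in for whatever inoperable compression method the database actually uses.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class CompressedBlock:
    """Hypothetical container for one row group of compressed data."""
    payload: bytes                  # bytes as stored
    operable: bool                  # True for FOR/GCD/Delta/Dictionary-style data
    attributes: dict = field(default_factory=dict)


def to_query_operable(block):
    """S401/S402: return operable compressed data, removing only the outer layer."""
    if block.operable:
        return block                              # S401: usable as-is
    inner = zlib.decompress(block.payload)        # S402: stop at the operable layer
    return CompressedBlock(inner, True, block.attributes)


def get_attributes(block):
    """S403: collect whichever of the three kinds of attribute information exist."""
    kinds = ("statistical", "compression", "storage")
    return {k: block.attributes[k] for k in kinds if k in block.attributes}


# FOR-encoded values 10, 0, 99 (minimum 900), wrapped in an inoperable layer.
for_payload = bytes([10, 0, 99])
stored = CompressedBlock(
    zlib.compress(for_payload),
    False,
    {"statistical": {"min": 900}, "compression": {"method": "FOR", "stored_bits": 8}},
)
operable = to_query_operable(stored)
attrs = get_attributes(operable)
```

Note that the sketch never restores the original values 910, 900, 999; it stops as soon as the FOR-encoded bytes are available.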
In an embodiment of the present invention, as shown in fig. 6, the step S103 of performing hierarchical optimization on the query operation instruction according to the attribute information of the query-operable compressed data to obtain an optimized query operation instruction includes the following steps S601 to S603:
in step S601, determining a query operation expression and a query operation operator corresponding to the query operation instruction according to the query operation instruction;
in step S602, first attribute information of the query-operable compressed data is obtained, and a first-level optimization is performed on the query operation instruction according to the first attribute information, so as to obtain a corresponding first-level optimized query operation expression and a first-level optimized query operation operator;
in step S603, second attribute information of the query-operable compressed data is obtained, secondary optimization is performed on the first-stage optimized query operation expression according to the second attribute information, so as to obtain a corresponding secondary optimized query operation expression and a corresponding secondary optimized query operation operator until a final-stage optimized query operation expression and a final-stage optimized query operation operator are obtained, and the optimized query operation instruction is obtained according to the final-stage optimized query operation expression and the final-stage optimized query operation operator.
In this embodiment, when performing hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data, first determining a query operation expression and a query operation operator corresponding to the query operation instruction according to the query operation instruction; then acquiring first attribute information of the query operable compressed data, and performing first-level optimization on the query operation instruction according to the first attribute information to obtain a corresponding first-level optimization query operation expression and a corresponding first-level optimization query operation operator; then obtaining second attribute information of the query operable compressed data, and performing secondary optimization on the primary optimization query operation expression according to the second attribute information to obtain a corresponding secondary optimization query operation expression and a corresponding secondary optimization query operation operator; and sequentially carrying out the operation until a final-stage optimization query operation expression and a final-stage optimization query operation operator are obtained, and finally obtaining the optimized query operation instruction according to the final-stage optimization query operation expression and the final-stage optimization query operation operator.
Hierarchical optimization refers to simplifying the query operation expression, or narrowing the query data range, level by level according to different attribute information of the query operable compressed data. For example, the first-level optimization performs a first simplification or narrowing of the query data range according to the first attribute information of the query operable compressed data, the secondary optimization performs a second simplification or narrowing according to the second attribute information, and so on until the final level. The number of optimization levels is therefore related to the number of categories of attribute information of the query operable compressed data. For example, if the original query operation expression is a < 950, it may be optimized according to the minimum value 900 in the statistical information of the query operable compressed data, that is, the query data range is narrowed, and the logical expression obtained after optimization may be a < 950 - 900.
As another example, suppose the original query operation expression combines a < 950 on database A with a = 1000 on database B, that is, it queries row group data less than 950 in database A and row group data equal to 1000 in database B. A first-level optimization can be performed on the original query operation expression according to the minimum value 900 in the statistical information of the query operable compressed data of database A, that is, the query data range is narrowed, and the first-level optimized logical expression may be a < 950 - 900 on database A together with a = 1000 on database B. The first-level optimized logical expression is then secondarily optimized according to the common divisor 100 in the statistical information of the query operable compressed data of database B, that is, the query data range is narrowed again, and the secondary optimized logical expression may be a < 950 - 900 on database A together with a = 1000/100 on database B.
Fig. 7 is a schematic diagram illustrating an optimization flow of a query operation instruction according to an embodiment of the present invention. In fig. 7, it is assumed that the first attribute information is statistical information and the second attribute information is compression information. First, a query operation expression X0 and a query operation operator S0 corresponding to the query operation instruction are determined according to the query operation instruction. Then, first-level optimization is performed on the query operation instruction according to the first attribute information of the query operable compressed data, i.e., the statistical information, to obtain a corresponding first-level optimized query operation expression X1 and first-level optimized query operation operator S1. Next, secondary optimization is performed on the first-level optimized query operation expression X1 according to the second attribute information of the query operable compressed data, i.e., the compression information, to obtain a corresponding secondary optimized query operation expression X2 and secondary optimized query operation operator S2. In this embodiment, X2 and S2 are the final-level optimized query operation expression and the final-level optimized query operation operator, and the optimized query operation instruction is finally obtained according to X2 and S2. Of course, in actual operation, the attribute information used at every level of the hierarchical optimization may all be statistical information, with a different item of statistical information used at each level.
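The level-by-level flow of fig. 7 can be sketched as a chain of constant rewrites. The sketch below makes assumptions that are not in the description: a single row group is encoded with FOR and then GCD (stored value = (original - min) / gcd), the predicate is a "less than" comparison, and the function names are hypothetical. The list returned by optimize_less_than plays the role of the constants in X0, X1, and X2.

```python
def level_statistics(constant, attrs):
    """Level 1 (assumed): fold the FOR minimum into the constant."""
    return constant - attrs["min"]


def level_compression(constant, attrs):
    """Level 2 (assumed): fold the GCD into the constant.

    For integers, stored * gcd < constant  <=>  stored < ceil(constant / gcd).
    """
    return -(-constant // attrs["gcd"])   # ceiling division


def optimize_less_than(constant, attrs, levels):
    """Apply each optimization level in turn; return the constant after each level."""
    stages = [constant]
    for level in levels:
        stages.append(level(stages[-1], attrs))
    return stages


original = [900, 1000, 1100]
attrs = {"min": 900, "gcd": 100}
stored = [(v - attrs["min"]) // attrs["gcd"] for v in original]   # [0, 1, 2]

# a < 1050 on original data becomes stored < stages[-1] on compressed data.
stages = optimize_less_than(1050, attrs, [level_statistics, level_compression])
compressed_matches = [s < stages[-1] for s in stored]
```

Each level consumes one category of attribute information, which is why the number of levels tracks the number of attribute categories available.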
In an embodiment of the present invention, the method further includes a step of performing an operation on the query-operable compressed data according to the optimized query operation instruction, that is, as shown in fig. 8, the method includes the following steps S801-S804:
in step S801, a query operation instruction is obtained, and corresponding query operation object data is obtained according to the query operation instruction, where the query operation object data is compressed data and carries corresponding attribute information;
in step S802, determining query operable compressed data corresponding to the query operation object data according to the query operation object data, and acquiring attribute information of the query operable compressed data;
in step S803, performing hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data to obtain an optimized query operation instruction;
in step S804, an operation is performed on the query-operable compressed data according to the optimized query operation instruction.
After the query operation instruction is optimized, the optimized query operation instruction can be used to perform the operation directly on the query operable compressed data. Operating on compressed data in this way effectively avoids the data decompression process, saves the storage space occupied by decompressed data, simplifies the data operation flow, and improves data operation performance.
In an embodiment of the present invention, as shown in fig. 9, the step S804 of performing an operation on the query-operable compressed data according to the optimized query operation instruction includes the following steps S901 to S902:
in step S901, determining target query operable compressed data in the query operable compressed data according to the optimized query operation instruction;
in step S902, an operation is performed on the target query-operable compressed data according to the optimized query operation instruction.
After the query operation instruction is optimized, the corresponding operation data object may be changed, for example, the storage range of the data object may be reduced, at this time, the target query operable compressed data corresponding to the query operation instruction needs to be re-determined according to the optimized query operation instruction, and then, the operation is performed on the target query operable compressed data according to the optimized query operation instruction, so as to obtain an operation result.
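Steps S901 and S902 might look like the sketch below. It assumes, beyond what the description states, that each row group is FOR-encoded with its own minimum and that a row group can be skipped when its minimum already fails the predicate; select_targets and execute are hypothetical names.

```python
def select_targets(row_groups, constant):
    """S901: pick row groups that may hold values < constant, with a per-group constant."""
    targets = []
    for group in row_groups:
        if group["min"] < constant:                 # otherwise no value can match
            targets.append((group, constant - group["min"]))
    return targets


def execute(targets):
    """S902: evaluate the rewritten predicate directly on the stored values."""
    return [[s < c for s in group["stored"]] for group, c in targets]


row_groups = [
    {"min": 900, "stored": [10, 0, 99]},        # originals 910, 900, 999
    {"min": 2000, "stored": [0, 1000, 500]},    # originals 2000, 3000, 2500
]
targets = select_targets(row_groups, 950)       # query: a < 950
results = execute(targets)
```

Only the first row group remains a target here, which illustrates how the storage range of the data object can shrink after optimization.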
The technical solutions provided by the embodiments of the present invention are explained below using several examples.
In example 1, the query operation instruction is to query row group data smaller than 950 in database A, and the corresponding logical expression may be represented as a < 950. The query operation object data corresponding to the query operation instruction is the numerical row group data stored in database A, and the logical expression diagram is shown in fig. 10A. As shown in fig. 10B, the original data to be stored is 64-bit integer data type (int64) data with a minimum value of 900, such as 910, 900, 999, and so on. When the data is compressed and stored by the FOR compression method, the original data 910, 900, 999, ... can be compressed into operable 8-bit integer data type data 10, 0, 99, ... obtained by subtracting the minimum value 900, and the minimum value 900 is used as the statistical information of the compressed data. In the prior art, when querying row group data smaller than 950 in database A, the 8-bit integer data type data stored in a row group of database A needs to be converted into 64-bit integer data type data, the minimum value 900 is then added to each value to restore the original 64-bit integer data type data, and the restored data is then compared with 950, as shown in fig. 10C. With the technical solution of the embodiment of the present invention, no data decompression process is needed: the logical expression a < 950 and the corresponding operator a < 950 are optimized according to the statistical information of the compressed data, i.e., the minimum value 900, to obtain the optimized logical expression a < 950 - 900 and the corresponding operator a < 50. 8-bit integer data type data smaller than 50 can then be queried directly in the row group of database A to complete the operation a < 950. At this point, the target query operable compressed data becomes the 8-bit integer data type data smaller than 50, and the optimized logical expression diagram is shown in fig. 10D.
In this example, the constant 950 in the expression a < 950 is transformed into 950 - minimum value, and substituting the minimum value 900 merges the constants: 950 - 900 = 50.
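Example 1 can be checked with a few lines of Python. This is only a sketch under simplifying assumptions: for_compress and rewrite_less_than are hypothetical helper names, and Python integers are unbounded, so the 8-bit and 64-bit widths are not modeled.

```python
def for_compress(values):
    """FOR compression: store each value as its offset from the minimum."""
    minimum = min(values)
    return minimum, [v - minimum for v in values]


def rewrite_less_than(constant, minimum):
    """Rewrite 'a < constant' on original data as 'stored < constant - minimum'."""
    return constant - minimum


original = [910, 900, 999]
minimum, stored = for_compress(original)        # statistical information + payload
threshold = rewrite_less_than(950, minimum)     # constants merged once, up front

# The comparison runs on the small stored values; nothing is decompressed.
matches = [s < threshold for s in stored]
```

The subtraction happens once per row group rather than one addition per stored value, which is where the saving comes from.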
In example 2, the query operation instruction is to query row group data equal to 1000 in database B, the corresponding logical expression may be represented as a = 1000, and the query operation object data corresponding to the query operation instruction is the numerical row group data stored in database B. As shown in fig. 11A, the original data to be stored is 64-bit integer data type data with a common divisor of 100, such as 800, 900, 1000, and so on. When the data is compressed and stored by the GCD compression method, the original data 800, 900, 1000, ... can be compressed into operable 8-bit integer data type data 8, 9, 10, ... obtained by dividing out the common divisor 100, and the common divisor 100 is used as the statistical information of the compressed data of the row group. In the prior art, when querying row group data equal to 1000 in database B, the 8-bit integer data type data stored in the row group of database B needs to be converted into 64-bit integer data type data, multiplied by the common divisor 100 to restore the original 64-bit integer data type data, and then compared with 1000, as shown in fig. 11B. With the technical solution of the embodiment of the present invention, no data decompression process is needed: the logical expression a = 1000 and the corresponding operator a = 1000 are optimized according to the statistical information of the compressed data, i.e., the common divisor 100, to obtain the optimized logical expression a = 1000/100 and the corresponding operator a = 10. 8-bit integer data type data equal to 10 can then be queried directly in the row group of database B to complete the operation a = 1000. At this point, the target query operable compressed data becomes the 8-bit integer data type data equal to 10, and the optimized logical expression diagram is shown in fig. 11C.
In this example, the constant 1000 in the expression a = 1000 is transformed into 1000/common divisor, and substituting the common divisor 100 merges the constants: 1000/100 = 10. At the same time, because the optimized logical expression can be evaluated directly on the compressed data of the 8-bit integer data type, whereas the prior art can operate only after restoring the 8-bit compressed data to the original 64-bit integer data type data, dimension reduction of the operator is realized, and the data operation after dimension reduction greatly saves both data storage space and data operation time. As another example, for database B, if the query operation instruction is to query row group data less than or equal to 1001 in database B, then in the prior art the 8-bit integer data type data stored in database B needs to be converted into 64-bit integer data type data, multiplied by the common divisor 100 to restore the original 64-bit integer data type data, and then compared with 1001, as shown in fig. 11D. With the technical solution of the embodiment of the present invention, no data decompression process is needed: the logical expression a <= 1001 is optimized according to the statistical information of the compressed data, i.e., the common divisor 100, to obtain the optimized logical expression a <= 1001/100, where the quotient is rounded down to 10. 8-bit integer data type data less than or equal to 10 can then be queried directly in a row group of database B to complete the operation a <= 1001. The optimized logical expression diagram is shown in fig. 11E.
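Both GCD rewrites of example 2 can be sketched as follows. The helper names are hypothetical. Returning None when the constant is not a multiple of the divisor is an added assumption (the description only covers the divisible case), and the rounding down in rewrite_less_equal matches the "less than or equal to 10" result stated above.

```python
from functools import reduce
from math import gcd


def gcd_compress(values):
    """GCD compression: store each value divided by the common divisor."""
    divisor = reduce(gcd, values)
    return divisor, [v // divisor for v in values]


def rewrite_equals(constant, divisor):
    """'a = constant' becomes 'stored = constant / divisor'; None if nothing can match."""
    quotient, remainder = divmod(constant, divisor)
    return quotient if remainder == 0 else None


def rewrite_less_equal(constant, divisor):
    """'a <= constant' becomes 'stored <= floor(constant / divisor)'."""
    return constant // divisor


original = [800, 900, 1000]
divisor, stored = gcd_compress(original)

eq_target = rewrite_equals(1000, divisor)
eq_matches = [s == eq_target for s in stored]

le_target = rewrite_less_equal(1001, divisor)
le_matches = [s <= le_target for s in stored]
```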
In example 3, the query operation instruction is to calculate the sum of two row groups of data in database C, and the corresponding logical expression may be represented as a + b. As shown in fig. 12A, the row group where a is located uses the FOR compression method with a minimum value of 900: the original 64-bit integer data type data 910, 900, 999, ... is compressed by the FOR compression method into operable 8-bit integer data type data 10, 0, 99, ... for storage, and the minimum value 900 is the statistical information of the row group. As shown in fig. 12B, the row group where b is located also uses the FOR compression method, with a minimum value of 2000: the original 64-bit integer data type data 2000, 3000, 2500, ... is compressed by the FOR compression method into operable 16-bit integer data type data 0, 1000, 500, ... for storage, and the minimum value 2000 is the statistical information of the row group. In the prior art, when the sum of the two row groups of data is calculated in database C, the 8-bit integer data type data stored in one row group of database C needs to be converted into 64-bit integer data type data and have the minimum value 900 added to restore the original 64-bit integer data type data, the 16-bit integer data type data stored in the other row group needs to be converted into 64-bit integer data type data and have the minimum value 2000 added to restore the original 64-bit integer data type data, and the two are then added, as shown in fig. 12C. With the technical solution of the embodiment of the present invention, no data decompression process is needed: the logical expression a + b and the corresponding operator a + b are optimized according to the statistical information of the compressed data, i.e., the minimum values 900 and 2000, to obtain the optimized logical expression a + b + minimum value 1 + minimum value 2 and the corresponding operator a + b + 2900. However, because the data storage type of the row group where a is located is 8-bit integer data and that of the row group where b is located is 16-bit integer data, the data of the row group where a is located still needs to be converted into 16-bit integer data so that it can be added to the data of the row group where b is located; the sum 2900 of the minimum values of the two row groups is then added to obtain the operation result of a + b. If the finally calculated sum exceeds the range of the 16-bit integer data type, it may be converted into the 64-bit integer data type, as shown in the dashed box of fig. 12D. This example effectively realizes dimension reduction of the operator, and the data operation after dimension reduction greatly saves both data storage space and data operation time.
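Example 3 can be sketched with the standard array module, whose type codes "B" and "H" stand in for the 8-bit and 16-bit storage types (unsigned widths are an assumption here). The function name add_for_columns is hypothetical. Storing the partial sums in an "H" array means Python raises OverflowError if a partial sum does not fit in 16 bits, which corresponds to the case where a wider type would be needed.

```python
from array import array

a_stored = array("B", [10, 0, 99])        # 8-bit offsets, minimum 900
b_stored = array("H", [0, 1000, 500])     # 16-bit offsets, minimum 2000
a_min, b_min = 900, 2000


def add_for_columns(a_stored, a_min, b_stored, b_min):
    """Compute a + b as (stored_a + stored_b) + (a_min + b_min)."""
    widened = array("H", list(a_stored))                  # widen 8-bit to 16-bit only
    partial = array("H", (x + y for x, y in zip(widened, b_stored)))
    offset = a_min + b_min                                # merged constant: 2900
    return [p + offset for p in partial]                  # final widening step


result = add_for_columns(a_stored, a_min, b_stored, b_min)
```

The element-wise work is done at 16-bit width, and the two minimum values are merged into a single constant that is added once per result.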
In example 4, the query operation instruction is to calculate the sum of the data of a row group in database D, and the corresponding logical expression may be represented as sum(a). As shown in fig. 13A, the row group where a is located uses the FOR compression method with a minimum value of 900: the original 64-bit integer data type data 910, 900, 999, ... is compressed by the FOR compression method into operable 8-bit integer data type data 10, 0, 99, ... for storage, and the minimum value 900 is the statistical information of the row group. In the prior art, when the sum of the row group data in database D is calculated, the 8-bit integer data type data stored in the row group of database D needs to be converted into 64-bit integer data type data, the minimum value 900 is added to each value to restore the original 64-bit integer data type data, and the sum is then calculated, as shown in fig. 13B. With the technical solution of the embodiment of the present invention, no data decompression process is needed: the logical expression sum(a) and the corresponding operator sum(a) are optimized according to the statistical information of the compressed data, i.e., the minimum value 900, to obtain the optimized logical expression sum(a) + minimum value × N and the corresponding operator sum(a) + 900 × N, where N is the number of data items in the row group. Because the data type may need to be converted after the 8-bit integer data type data is summed, and minimum value × N belongs to the 64-bit integer data type, the optimized operation may, depending on the practical application, first convert the 8-bit integer data type data into a data type narrower than 64 bits and sum it, then convert the resulting sum into the 64-bit integer data type and add it to minimum value × N. The optimized logical expression diagram is shown in fig. 13C.
The example realizes the dimension reduction of the operator in the calculation process and performs corresponding operation in advance under the condition of low dimension, so that the data storage space and the data operation time are greatly reduced.
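The rewrite in example 4 reduces to one line of arithmetic. The sketch below uses a hypothetical helper name, sum_for, and does not model integer widths because Python integers are unbounded.

```python
def sum_for(stored, minimum):
    """sum(a) on original data = sum(stored) + minimum * N."""
    narrow_sum = sum(stored)                  # summed over the small stored values
    return narrow_sum + minimum * len(stored)


stored = [10, 0, 99]                          # FOR-encoded 910, 900, 999
total = sum_for(stored, 900)
```

Only one multiplication and one addition involve the wide constant, instead of N additions of the minimum value.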
In example 5, the query operation instruction is to calculate the sum of the data of a row group in database E, and the corresponding logical expression may be represented as sum(a). As shown in fig. 14A, the row group where a is located uses the GCD compression method with a common divisor of 100: the original 64-bit integer data type data 800, 900, 1000, ... is compressed by the GCD compression method into operable 8-bit integer data type data 8, 9, 10, ... for storage, and the common divisor 100 is the statistical information of the row group. In the prior art, when the sum of the row group data in database E is calculated, the 8-bit integer data type data stored in the row group of database E needs to be converted into 64-bit integer data type data, each value is multiplied by the common divisor 100, and the sum is then calculated, as shown in fig. 14B. With the technical solution of the embodiment of the present invention, no data decompression process is needed: the logical expression sum(a) and the corresponding operator sum(a) are optimized according to the statistical information of the compressed data, i.e., the common divisor 100, to obtain the optimized logical expression sum(a) × common divisor and the corresponding operator sum(a) × 100. Because the data type may need to be converted after the 8-bit integer data type data is summed, the optimized operation may, depending on the practical application, first convert the 8-bit integer data type data into a data type narrower than 64 bits and sum it, then convert the resulting sum into the 64-bit integer data type and multiply it by the common divisor. The optimized logical expression diagram is shown in fig. 14C.
The example realizes the dimension reduction of the operator in the calculation process and performs corresponding operation in advance under the condition of low dimension, so that the data storage space and the data operation time are greatly reduced.
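Example 5 follows the same pattern with multiplication in place of addition. As before, sum_gcd is a hypothetical name and integer widths are not modeled.

```python
def sum_gcd(stored, divisor):
    """sum(a) on original data = sum(stored) * common divisor."""
    narrow_sum = sum(stored)          # summed over the small stored values
    return narrow_sum * divisor       # a single multiplication at the end


stored = [8, 9, 10]                   # GCD-encoded 800, 900, 1000
total = sum_gcd(stored, 100)
```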
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.
Fig. 15 is a block diagram illustrating a structure of a query operation instruction optimization apparatus according to an embodiment of the present invention, which may be implemented as part or all of an electronic device, and may be implemented as a query operation instruction optimization server, through software, hardware, or a combination of both. As shown in fig. 15, the query operation instruction optimization device includes:
an obtaining module 1501, configured to obtain a query operation instruction, and obtain corresponding query operation object data according to the query operation instruction, where the query operation object data is compressed data and carries corresponding attribute information;
a determining module 1502, configured to determine, according to the query operation object data, query operable compressed data corresponding to the query operation object data, and obtain attribute information of the query operable compressed data;
an optimization module 1503, configured to perform hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data to obtain an optimized query operation instruction.
As mentioned above, as data technology advances, more and more data needs to be stored in a database. In order to save the data storage space, the data to be stored needs to be compressed and then stored, but the compressed data cannot be directly operated when data operations such as data query are performed subsequently, the compressed data needs to be completely decompressed first, and then the query operation is performed after the compressed data is restored to the original data, so that the complicated data decompression process reduces the query performance of the engine to a great extent. The prior art solutions either do not substantially optimize the execution of data operations and memory usage or have very limited application scenarios.
In view of the above problem, in this embodiment, a query operation instruction optimization apparatus is provided, which performs hierarchical optimization on a query operation instruction by using attribute information of query operation object data to obtain an optimized query operation instruction, so as to subsequently directly perform an operation on compressed data by using the optimized query operation instruction. The technical scheme has strong applicability, avoids a complex data decompression process, saves the storage space of decompressed data to a great extent, simplifies the data operation flow and improves the data operation performance.
In an embodiment of the present invention, the query operation instruction refers to a query instruction for achieving a certain operation purpose, such as a query instruction, a search instruction, and the like, and for the SQL field, the query operation instruction may be implemented as an SQL query operation instruction.
In an embodiment of the present invention, the query operation object data corresponding to the query operation instruction refers to data that is related to the query operation instruction and may ultimately become an object of the query operation. For example, if the query operation instruction queries numbers less than 950 in a certain database, all data stored in that database may be regarded as the query operation object data corresponding to the query operation instruction, and may participate in the subsequent query operation.
In an embodiment of the present invention, since the query operation target data is data already stored in a database, the query operation target data is compressed data, and the compressed data further carries corresponding attribute information, where the attribute information may include one or more of the following information: statistical information of compressed data, compression information of compressed data, storage information of compressed data, and the like, the existence of the attribute information enables the optimization of the query operation instruction by the present invention, wherein the statistical information may be, for example: maximum data value, minimum data value, summation value within a preset range of data, data type, etc., and the compression information may be, for example: the original data type, the data storage type, the data compression method, and the like, and the storage information may be, for example: storage type, storage location, etc.
In an embodiment of the present invention, the compressed data may be, for example, compressed data stored column by column in a HiStore column database in units of row groups. A row group is a data unit formed by a plurality of rows in column storage, and a row is a data row in the conventional database sense. That is, in the HiStore column database, data is stored column by column, and within each column the data is further divided into row groups, each formed by a plurality of rows. In addition, the data in each row group may be compressed independently and have statistics collected on it, taking the row group as a unit; the result of the statistics is the statistical information or attribute information of the row group, which mainly describes the characteristics and attributes of the data stored in the row group. Fig. 2 is a schematic diagram of column-by-column storage of compressed data according to an embodiment of the present invention. As shown in fig. 2, the data storage table includes column 1, column 2, column 3, and other columns (not shown in fig. 2), and row group 1, row group 2, row group 3, and other row groups (not shown in fig. 2). In fig. 2, the intersection of a column and a row group represents a row group unit included in that column; the black box in each intersection represents the statistical information of the row group, and the gray box represents the data packet formed by compressing the rows of data included in the row group, i.e., a compressed data packet.
In an embodiment of the present invention, operable compressed data refers to compressed data on which operations such as queries can be performed directly, without first restoring the original data; it may also be referred to as interpretable compressed data. Compressed data obtained by compression methods such as increment-based Delta compression, greatest-common-divisor-based GCD compression, FOR (Frame-of-Reference) compression, and the various types of Dictionary compression are all operable compressed data. Conversely, inoperable compressed data refers to compressed data on which operations such as queries cannot be performed directly; it must first be decompressed into operable compressed data before being queried, and it may also be referred to as non-interpretable compressed data. For example, compressed data obtained by compression methods such as Huffman coding, arithmetic coding, and the LZ77/LZ4 algorithms are all inoperable compressed data. Fig. 3 is a schematic flow chart of data compression and compressed data query in the prior art. As shown in fig. 3, when original data is compressed and stored on disk, it is generally first compressed into operable compressed data with a certain compression ratio, which is then further compressed into inoperable compressed data with a higher compression ratio for storage. When a query needs to be performed on the compressed data, the inoperable compressed data must be decompressed into operable compressed data with a certain readability, and the operable compressed data must then be decompressed and restored into the original data before the data query operation can be performed.
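To make the notion of operable compressed data concrete, the following sketch uses frame-of-reference (FOR) encoding, one of the methods named above. It assumes the simplest form of FOR, in which the minimum of the values serves as the base; the function names are hypothetical. The point it illustrates is that a comparison can be evaluated on the stored offsets once the constant is shifted by the base, so no value needs to be decoded.

```python
from typing import List, Tuple


def for_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Frame-of-reference encode: keep the minimum as a base and store small offsets."""
    base = min(values)
    return base, [v - base for v in values]


def for_decode(base: int, offsets: List[int]) -> List[int]:
    """Restore the original values; only needed when raw values must be returned."""
    return [base + o for o in offsets]


def count_less_than(base: int, offsets: List[int], threshold: int) -> int:
    """Evaluate `value < threshold` on the offsets, without decoding them."""
    shifted = threshold - base  # move the constant into the compressed domain
    return sum(1 for o in offsets if o < shifted)


values = [900, 930, 949, 950, 990]
base, offsets = for_encode(values)
matches = count_less_than(base, offsets, 950)
```

By contrast, the bytes produced by an entropy coder such as Huffman coding carry no such per-value structure, which is why that kind of output has to be decompressed before it can be queried.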
In an embodiment of the present invention, the query operation instruction is hierarchically optimized according to the attribute information of the query-operable compressed data, which means that the query operation instruction is simplified and optimized in different aspects according to the obtained attribute information of the query-operable compressed data, so as to finally obtain an optimized query operation instruction, and a specific optimization process of the query operation instruction will be described in detail below.
In an embodiment of the present invention, as shown in fig. 16, the determining module 1502 includes:
a first determining sub-module 1601 configured to determine the query operation object data as the query operable compressed data corresponding thereto when the query operation object data is operable compressed data;
a second determining sub-module 1602 configured to, when the query operation object data is inoperable compressed data, decompress the inoperable compressed data until operable compressed data is obtained, and determine the operable compressed data obtained by decompression as the query operable compressed data corresponding to the query operation object data;
an obtaining sub-module 1603 configured to obtain attribute information of the query operable compressed data, where the attribute information of the query operable compressed data includes at least one or more of the following: statistical information, compression information, storage information.
As mentioned above, in the prior art, querying stored compressed data requires decompressing the operable compressed data into the original data and then performing the query operation on the original data. To avoid this cumbersome decompression process, save the storage space occupied by decompressed data, simplify the data operation flow, and improve data operation performance, this embodiment aims to perform the corresponding operations directly on compressed data. To that end, it is first determined whether the query operation object data is operable compressed data. If the previously obtained query operation object data is operable compressed data, it is determined to be the corresponding query operable compressed data, and data operations such as queries can subsequently be performed on it directly. If the query operation object data is inoperable compressed data, the inoperable compressed data needs to be decompressed until the corresponding operable compressed data is obtained.
Fig. 5 is a schematic flow chart of data compression and compressed data query according to an embodiment of the present invention. As shown in fig. 5, when compressed data is queried, the inoperable compressed data stored in the database only needs to be decompressed into operable compressed data with a certain readability; the data query operation can then be performed without decompressing and restoring the operable compressed data into the original data.
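The determination step above can be sketched as a loop that peels inoperable layers and stops as soon as an operable layer is reached. In this sketch, Python's standard `zlib` module (DEFLATE, which combines LZ77 and Huffman coding) stands in for the inoperable layer, and a FOR-encoded block serialized as JSON stands in for the operable layer. The block layout, the `codec` labels, and the function names are assumptions made for illustration only.

```python
import json
import zlib
from typing import Any, Dict, List

# Hypothetical labels for codecs whose output can be queried directly.
OPERABLE_CODECS = {"for", "delta", "gcd", "dictionary"}


def store(values: List[int]) -> Dict[str, Any]:
    """Compress twice: FOR (operable) first, then zlib (inoperable) on top."""
    base = min(values)
    inner = {"codec": "for", "base": base, "offsets": [v - base for v in values]}
    payload = zlib.compress(json.dumps(inner).encode("utf-8"))
    return {"codec": "zlib", "payload": payload}


def to_query_operable(block: Dict[str, Any]) -> Dict[str, Any]:
    """Peel inoperable layers until an operable block is reached, then stop."""
    while block["codec"] not in OPERABLE_CODECS:
        if block["codec"] != "zlib":
            raise ValueError(f"unknown codec: {block['codec']}")
        block = json.loads(zlib.decompress(block["payload"]).decode("utf-8"))
    return block


stored = store([900, 930, 949, 950, 990])
operable = to_query_operable(stored)
```

A block that is already operable is returned unchanged, which corresponds to the first branch of the determination; the original values are never reconstructed in either branch.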
In an embodiment of the present invention, as shown in fig. 17, the optimization module 1503 includes:
a third determining submodule 1701 configured to determine a query operation expression and a query operation operator corresponding to the query operation instruction according to the query operation instruction;
a first optimization submodule 1702 configured to obtain first attribute information of the query operable compressed data, and perform first-level optimization on the query operation instruction according to the first attribute information to obtain a corresponding first-level optimized query operation expression and a corresponding first-level optimized query operation operator;
a second optimization submodule 1703 configured to obtain second attribute information of the query operable compressed data, perform secondary optimization on the first-level optimized query operation expression according to the second attribute information to obtain a corresponding secondary optimized query operation expression and a corresponding secondary optimized query operation operator, continue until a final-stage optimized query operation expression and a final-stage optimized query operation operator are obtained, and obtain the optimized query operation instruction according to the final-stage optimized query operation expression and the final-stage optimized query operation operator.
In this embodiment, when the query operation instruction is hierarchically optimized according to the attribute information of the query operable compressed data, a query operation expression and a query operation operator corresponding to the query operation instruction are first determined according to the query operation instruction. First attribute information of the query operable compressed data is then acquired, and first-level optimization is performed on the query operation instruction according to the first attribute information to obtain a corresponding first-level optimized query operation expression and first-level optimized query operation operator. Next, second attribute information of the query operable compressed data is acquired, and secondary optimization is performed on the first-level optimized query operation expression according to the second attribute information to obtain a corresponding secondary optimized query operation expression and secondary optimized query operation operator. These operations are repeated in sequence until a final-stage optimized query operation expression and a final-stage optimized query operation operator are obtained, from which the optimized query operation instruction is finally obtained.
Hierarchical optimization refers to simplifying the query operation expression, or narrowing the query data range, level by level according to different attribute information of the query operable compressed data. For example, first-level optimization may perform a first-level simplification or narrowing of the query data range according to the first attribute information of the query operable compressed data, secondary optimization may perform a secondary simplification or narrowing according to the second attribute information, and so on until the final stage. It can be seen that the number of optimization levels is related to the number of categories of attribute information of the query operable compressed data. For example, if the original query operation expression is a < 950, it may be optimized according to the minimum value 900 in the statistical information of the query operable compressed data, that is, the query data range is narrowed, and the logic expression obtained after optimization may be a < 950-.
As another example, suppose the original query operation expression is a < 950/A & a = 1000/B, that is, querying row group data less than 950 in database A and querying row group data equal to 1000 in database B. First-level optimization can be performed on the original query operation expression based on the minimum value 900 in the statistical information of the query operable compressed data of database A, that is, the query data range is narrowed, and the first-level optimized logic expression obtained may be a < 950-. The first-level optimized logic expression is then secondarily optimized according to the common divisor 100 in the statistical information of the query operable compressed data of database B, that is, the query data range is narrowed again, and the secondary optimized logic expression obtained may be a < 950-.
Fig. 7 is a schematic diagram illustrating an optimization flow of a query operation instruction according to an embodiment of the present invention. In fig. 7, it is assumed that the first attribute information is statistical information and the second attribute information is compression information. First, a query operation expression X0 and a query operation operator S0 corresponding to the query operation instruction are determined according to the query operation instruction. Then, first-level optimization is performed on the query operation instruction according to the first attribute information of the query operable compressed data, namely the statistical information, to obtain a corresponding first-level optimized query operation expression X1 and first-level optimized query operation operator S1. Next, secondary optimization is performed on the first-level optimized query operation expression X1 according to the second attribute information of the query operable compressed data, namely the compression information, to obtain a corresponding secondary optimized query operation expression X2 and secondary optimized query operation operator S2. In this embodiment, the secondary optimized query operation expression X2 and the secondary optimized query operation operator S2 are the final-stage optimized query operation expression and the final-stage optimized query operation operator. Finally, the optimized query operation instruction is obtained according to the final-stage optimized query operation expression X2 and the final-stage optimized query operation operator S2. Of course, in actual operation, the attribute information used at every level may all be statistical information, although the specific statistical information may differ from level to level.
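The two-level flow of fig. 7 can be sketched as a pair of rewriting functions applied in sequence. This is only one possible reading: the `Plan` class, the operator names (`scan_decoded`, `scan_offsets`, `select_all`, `select_none`), and the choice of a FOR base as the compression information are assumptions for illustration and are not specified by the embodiment. The sketch handles a single predicate of the form `value < constant`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Plan:
    """A query expression X (`value < constant`) paired with an operator S."""
    constant: Optional[int]  # None once the expression has been simplified away
    operator: str            # hypothetical operator names, for illustration only


def optimize_with_statistics(plan: Plan, min_value: int, max_value: int) -> Plan:
    """Level 1: use min/max statistics to simplify `value < constant`."""
    if plan.constant is None:
        return plan
    if max_value < plan.constant:
        return Plan(None, "select_all")    # every row matches
    if min_value >= plan.constant:
        return Plan(None, "select_none")   # no row can match
    return plan


def optimize_with_compression(plan: Plan, for_base: int) -> Plan:
    """Level 2: rewrite the constant so it compares against FOR offsets."""
    if plan.constant is None:
        return plan
    return Plan(plan.constant - for_base, "scan_offsets")


# (X0, S0) -> (X1, S1) -> (X2, S2), mirroring the flow in fig. 7.
x0_s0 = Plan(950, "scan_decoded")
x1_s1 = optimize_with_statistics(x0_s0, min_value=900, max_value=990)
x2_s2 = optimize_with_compression(x1_s1, for_base=900)
```

Under these assumptions, the statistics level leaves the predicate unchanged because the value range straddles 950, and the compression level rewrites it to a comparison of stored offsets against 50, executed by an operator that scans offsets rather than decoded values.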
In an embodiment of the present invention, the apparatus further includes a part for performing an operation on the query operable compressed data according to the optimized query operation instruction, that is, as shown in fig. 18, the apparatus includes:
an obtaining module 1801 configured to obtain a query operation instruction, and obtain corresponding query operation object data according to the query operation instruction, where the query operation object data is compressed data and carries corresponding attribute information;
a determining module 1802 configured to determine, according to the query operation object data, query operable compressed data corresponding thereto, and acquire attribute information of the query operable compressed data;
an optimizing module 1803 configured to perform hierarchical optimization on the query operation instruction according to the attribute information of the query operable compressed data, so as to obtain an optimized query operation instruction;
an execution module 1804 configured to perform operations on the query operable compressed data according to the optimized query operation instruction.
After the query operation instruction is optimized, the optimized query operation instruction can be used to perform operations directly on the query operable compressed data. Operating on compressed data in this way effectively avoids the data decompression process, saves the storage space occupied by decompressed data, simplifies the data operation flow, and improves data operation performance.
In an embodiment of the present invention, as shown in fig. 19, the execution module 1804 includes:
a fourth determining sub-module 1901 configured to determine, according to the optimized query operation instruction, target query operable compressed data in the query operable compressed data;
an execution submodule 1902 configured to perform an operation on the target query operable compressed data according to the optimized query operation instruction.
After the query operation instruction is optimized, the corresponding operation data object may change; for example, the storage range of the data object may be reduced. In that case, the target query operable compressed data corresponding to the query operation instruction needs to be re-determined according to the optimized query operation instruction, and the operation is then performed on the target query operable compressed data according to the optimized query operation instruction to obtain an operation result.
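One way to picture the re-determination of target data is row-group pruning: row groups whose statistics rule out any match are dropped, and the operation runs only on the remainder. The sketch below assumes each row group carries min/max statistics and FOR-encoded offsets; the dictionary layout and function names are hypothetical.

```python
from typing import Any, Dict, List

# Each row group: min/max statistics plus FOR-encoded offsets (hypothetical layout).
RowGroup = Dict[str, Any]


def make_row_group(values: List[int]) -> RowGroup:
    """Build a row group with its statistics and FOR offsets."""
    base = min(values)
    return {"min": base, "max": max(values), "base": base,
            "offsets": [v - base for v in values]}


def select_targets(groups: List[RowGroup], threshold: int) -> List[RowGroup]:
    """Keep only row groups that may contain a value below threshold."""
    return [g for g in groups if g["min"] < threshold]


def count_matches(groups: List[RowGroup], threshold: int) -> int:
    """Count rows with value < threshold, scanning offsets only where needed."""
    total = 0
    for g in select_targets(groups, threshold):
        if g["max"] < threshold:
            total += len(g["offsets"])  # whole group matches; no scan needed
        else:
            shifted = threshold - g["base"]
            total += sum(1 for o in g["offsets"] if o < shifted)
    return total


groups = [make_row_group([900, 930, 949, 950, 990]),
          make_row_group([1200, 1300]),
          make_row_group([10, 20, 30])]
targets = select_targets(groups, 950)
result = count_matches(groups, 950)
```

Here the second row group is excluded from the target data by its minimum alone, the third is counted from its row count without scanning, and only the first requires a scan of its offsets.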
Fig. 20 is a block diagram illustrating a structure of an electronic device according to an embodiment of the invention. As shown in fig. 20, the electronic device 2000 includes a memory 2001 and a processor 2002, wherein
the memory 2001 is used to store one or more computer instructions, which are executed by the processor 2002 to implement any of the method steps described above.
FIG. 21 is a schematic diagram of a computer system suitable for implementing a query operation instruction optimization method according to an embodiment of the present invention.
As shown in fig. 21, the computer system 2100 includes a processing unit 2101, which can execute the various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 2102 or a program loaded from a storage portion 2108 into a Random Access Memory (RAM) 2103. The RAM 2103 also stores various programs and data necessary for the operation of the system 2100. The processing unit 2101, the ROM 2102, and the RAM 2103 are connected to each other via a bus 2104. An input/output (I/O) interface 2105 is also connected to the bus 2104.
The following components are connected to the I/O interface 2105: an input portion 2106 including a keyboard, a mouse, and the like; an output portion 2107 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 2108 including a hard disk and the like; and a communication section 2109 including a network interface card such as a LAN card, a modem, or the like. The communication section 2109 performs communication processing via a network such as the internet. A drive 2110 is also connected to the I/O interface 2105 as necessary. A removable medium 2111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 2110 as necessary, so that a computer program read therefrom is installed into the storage portion 2108 as necessary.
In particular, according to an embodiment of the present invention, the method described above may be implemented as a computer software program. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a computer-readable medium, the computer program comprising program code for performing the query operation instruction optimization method. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 2109, and/or installed from the removable medium 2111. The processing unit 2101 may be implemented as a CPU, a GPU, an FPGA, an NPU, or another processing unit.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present invention may be implemented by software, or may be implemented by hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may be a computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present invention.
The foregoing description is only exemplary of the preferred embodiments of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention according to the embodiments of the present invention is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present invention.

Claims (14)

CN201911204591.3A | 2019-11-29 | 2019-11-29 | Query operation instruction optimization method and device, electronic equipment and storage medium | Pending | CN112883059A (en)

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
CN201911204591.3A (CN112883059A (en)) | 2019-11-29 | 2019-11-29 | Query operation instruction optimization method and device, electronic equipment and storage medium
PCT/CN2020/132386 (WO2021104478A1 (en)) | 2019-11-29 | 2020-11-27 | Query operation instruction optimization method and apparatus, electronic device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201911204591.3A (CN112883059A (en)) | 2019-11-29 | 2019-11-29 | Query operation instruction optimization method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN112883059A (en) | 2021-06-01

Family

ID=76038888

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201911204591.3A (Pending, CN112883059A (en)) | Query operation instruction optimization method and device, electronic equipment and storage medium | 2019-11-29 | 2019-11-29

Country Status (2)

Country | Link
CN (1) | CN112883059A (en)
WO (1) | WO2021104478A1 (en)

Also Published As

Publication number | Publication date
WO2021104478A1 (en) | 2021-06-03

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
