Detailed Description
To enable those skilled in the art to better understand the technical solutions in the embodiments of the present invention, these technical solutions are described below clearly and completely with reference to the drawings in the embodiments of the present invention. The described embodiments are evidently only a part of the embodiments of the present invention, and not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of protection of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
Referring to fig. 1A, a flowchart of steps of a hash join method according to a first embodiment of the present application is shown.
Specifically, the hash connection method provided in this embodiment includes the following steps:
in step S101, hash entries corresponding to multiple rows of data in the hash table of the target data table are determined based on hash values of the multiple rows of data in the data table to be connected.
In the embodiment of the present application, the data table to be connected may be understood as a data table on which a join operation in relational algebra is to be performed. The hash values of the plurality of rows of data may be understood as values obtained by hashing the plurality of rows of data. The hash entries respectively corresponding to the plurality of rows of data in the hash table of the target data table may be the entry pointers, the indexes, or the array subscripts in the hash table of the target data table to which the plurality of rows of data respectively correspond, and the like. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the hash table of the target data table may be generated based on the locally stored target data table before determining the hash entries corresponding to the plurality of rows of data in the hash table of the target data table, respectively. In the process of generating the hash table of the target data table, the data in the target data table can be sorted according to the similarity of the connection primary keys, so that the data in the hash table of the target data table are ensured to be distributed according to the connection primary keys as much as possible, and the execution of hash connection is facilitated. Here, the manner of generating the hash table differs depending on the database, and for example, the hash table of the target data table may be generated in an array manner, a linked list manner, or the like. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In one implementation, the process of generating a hash table is described by taking compute node 1 in a Massively Parallel Processing (MPP) architecture database as an example. Two internal tables exist in compute node 1: the order table and the customer table, shown in Table 1 and Table 2, respectively.
TABLE 1
TABLE 2
Suppose that a user submits the following query request A to the MPP architecture database: select c_custkey, o_order, o_shift from customer, order where c_custkey = o_custkey;
The coordinating node of the MPP architecture database receives query request A and issues corresponding query tasks to each compute node in the MPP architecture database. After receiving the query task, compute node 1 learns that the order table and the customer table need to be hash-joined on custkey, and compute node 1 then hashes the order table and the customer table on custkey to form the hash table shown in Table 3 below. The value of the hash key (hashkey) can be calculated by a corresponding hash algorithm, and the join predicate is the custkey in the order table and the customer table.
TABLE 3
It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
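By way of illustration only, the build step described above can be sketched in Python as follows. The sample customer rows, the column c_name, the function name build_hash_table, the entry count of 8, and the use of Python's built-in hash() in place of the "corresponding hash algorithm" are all assumptions made for this sketch; they are not taken from Tables 1 to 3.

```python
from collections import defaultdict

NUM_ENTRIES = 8  # assumed number of hash entries


def build_hash_table(rows, key, num_entries=NUM_ENTRIES):
    """Build phase: store each row under the entry chosen by hash(key) % num_entries.

    Each stored item is a (hash value, row) pair, so that a later probe can
    compare hash values before comparing the join keys themselves.
    """
    table = defaultdict(list)
    for row in rows:
        h = hash(row[key])
        table[h % num_entries].append((h, row))
    return table


# Hypothetical rows of the customer table (not the contents of Table 2).
customer = [
    {"c_custkey": 1, "c_name": "alice"},
    {"c_custkey": 2, "c_name": "bob"},
    {"c_custkey": 9, "c_name": "carol"},
]
customer_ht = build_hash_table(customer, "c_custkey")
```

In this sketch the keys 1 and 9 land in the same hash entry, because both leave remainder 1 when divided by 8, while key 2 lands in an entry of its own.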
In some optional embodiments, when the hash entries respectively corresponding to the rows of data in the hash table of the target data table are determined based on the hash values of the rows of data in the data table to be connected, the hash entry corresponding to each row of data in the hash table is retrieved based on the hash value of the connection primary key of that row of data. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, a join (Join) may be understood as a basic database operation for matching two or more data tables. It has many variants, such as the inner join and the anti join. The join key (Join Key), referred to herein as the connection primary key, may be understood as the column or columns compared when joining. If the connection primary keys of two rows from the two data tables are the same, the two rows are connected successfully. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, when the hash entry corresponding to each row of data in the hash table is retrieved based on the hash value of the connection primary key of each row of data, identification information of the hash entry corresponding to each row of data in the hash table is calculated in parallel, by a plurality of threads for calculating identification information of hash entries, based on the hash value of the connection primary key of each row of data; the hash entry corresponding to each row of data in the hash table is then determined based on that identification information. The retrieval efficiency of the hash entries corresponding to the rows of data in the hash table can thereby be effectively improved. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the identification information of a hash entry may be the number of the hash entry. When the identification information is calculated in parallel by the plurality of threads, each of those threads calculates, in parallel with the others, the identification information of the hash entries corresponding to rows of data in the hash table, based on the hash values of the connection primary keys of those rows. More specifically, each thread performs a modulo operation on the hash value of the connection primary key of a row of data to obtain the identification information of the hash entry corresponding to that row in the hash table. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
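As a non-limiting sketch, the modulo calculation of entry numbers for a batch of rows by several threads might look as follows in Python. The function name entry_ids, the thread count, and the choice of the number of hash entries as the modulus are assumptions of this sketch; the description above does not state the modulus. Note also that in CPython a thread pool illustrates the structure of the computation rather than guaranteeing a speed-up for pure arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def entry_ids(hash_values, num_entries, num_threads=4):
    """Return hash_value % num_entries for every row of the batch.

    The work is distributed over a pool of threads; Executor.map returns
    the results in the same order as the input hash values.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(lambda h: h % num_entries, hash_values))


# Hypothetical hash values of the connection primary keys of four rows.
batch_hashes = [17, 42, 8, 23]
ids = entry_ids(batch_hashes, num_entries=8)
```

Because the results keep the order of the input batch, the i-th entry number can be paired directly with the i-th row of the batch in the subsequent bucket lookup.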
In step S102, based on the hash entries, hash buckets corresponding to the rows of data in the hash table respectively are determined.
In the embodiments of the present application, the hash bucket may be understood as a "data block" storing a plurality of records. The size of the hash bucket is fixed, i.e. only a fixed number of collisions can be handled. As shown in fig. 1B, the hash table is composed of hash entries and hash buckets. Each hash entry corresponds to one or more hash buckets. Each hash bucket stores one or more rows of data together with the hash values of those rows of data. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, after the identification information of the hash entry corresponding to each row of data in the hash table is calculated in parallel by the plurality of threads based on the hash value of the connection primary key of each row of data, the method further includes: caching the identification information of the hash entries respectively corresponding to the rows of data in the hash table. Determining, based on the hash entries, the hash buckets respectively corresponding to the plurality of rows of data in the hash table then includes: searching, in the hash entry identified by the cached identification information, for the hash bucket corresponding to each row of data in the hash table. Specifically, the hash buckets in the hash entry identified by the cached identification information are traversed to find the hash bucket corresponding to each row of data in the hash table. The hash entry includes one or more hash buckets. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, when the hash buckets respectively corresponding to the plurality of rows of data in the hash table are determined based on the hash entries, a plurality of hash bucket lookup threads search the hash entries in parallel for the hash bucket corresponding to each row of data in the hash table. The lookup efficiency of the hash buckets corresponding to the rows of data in the hash table can thereby be effectively improved. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the hash buckets are searched for in parallel by the plurality of hash bucket lookup threads, each of those threads searches the hash entries, in parallel with the others, for the hash buckets corresponding to rows of data in the hash table. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
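The two-pass structure described above, in which entry numbers are first computed and cached for the whole batch and the buckets are looked up afterwards, can be sketched in Python as follows. The dictionary layout (entry number mapped to a list of (hash value, row) buckets), the function name lookup_with_cached_ids, and the sample values are assumptions made for illustration only.

```python
def lookup_with_cached_ids(hash_table, probe_hashes, num_entries):
    """Return, for every probe row, the buckets of its hash entry.

    Pass 1 computes and caches the entry number of every row in the batch.
    Pass 2 uses only the cached numbers to fetch the buckets to traverse.
    """
    cached_ids = [h % num_entries for h in probe_hashes]       # pass 1: cache
    return [hash_table.get(eid, []) for eid in cached_ids]     # pass 2: lookup


# Hypothetical hash table: entry number -> list of (hash value, row) buckets.
hash_table = {
    1: [(1, "row-a"), (9, "row-b")],
    2: [(2, "row-c")],
}
candidates = lookup_with_cached_ids(hash_table, [9, 2, 5], num_entries=8)
```

Here the third probe row maps to entry 5, which holds no buckets, so its candidate list is empty and no further comparison is needed for it.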
In step S103, if it is determined that the rows of data and the rows of data of the target data table stored in the corresponding hash bucket respectively satisfy an equivalent connection condition, performing hash connection on the rows of data and the rows of data of the target data table stored in the corresponding hash bucket to obtain a hash connection result between the data table to be connected and the target data table.
In the embodiment of the present application, the equivalent connection condition (equi-join condition) may be understood as a common join condition of the relational join operation; it is the special case of the conditional join (or θ join) in which the join operator θ is "=". Hash (Hash) may be understood as calculating, for given data, its corresponding hash value; the hash values calculated for the same data must be identical. Hash join (Hash Join) may be understood as a hash-based join algorithm whose basic idea is to create a hash table from one data table and then look up each row of data of the other data table in that hash table, row by row. The hash table helps to improve the retrieval speed. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, before the hash connection is performed on the rows of data and the row data of the target data table stored in the corresponding hash buckets, the method further comprises: for each row of data in the plurality of rows of data, if the hash value of the connection primary key of the row of data is the same as the hash value of the connection primary key of the row data of the target data table stored in the corresponding hash bucket, determining whether the hash keys of the row of data are in one-to-one correspondence with the hash keys of the row data of the target data table stored in the corresponding hash bucket; and if so, determining that the row of data and the row data of the target data table stored in the corresponding hash bucket satisfy the equivalent connection condition. Whether the row of data and the row data of the target data table stored in the corresponding hash bucket satisfy the equivalent connection condition can thus be effectively determined from the hash value of the connection primary key of the row of data and the hash key of the row of data. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, the hash value of the connection primary key of the row data may be a value obtained by hashing the connection primary key of the row data. The hash key of the row data may be understood as a key value obtained by hashing the field used by the join condition of the row data. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In some optional embodiments, when the hash connection is performed on the plurality of rows of data and the row data of the target data table stored in the corresponding hash buckets, a plurality of hash connection threads perform, in parallel, the hash connection of each row of data with the row data of the target data table stored in the corresponding hash bucket. The efficiency of performing hash connection on each row of data and the row data of the target data table stored in the corresponding hash bucket can thereby be effectively improved. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
In a specific example, when the hash connection is performed in parallel by the plurality of hash connection threads, each of those threads performs, in parallel with the others, the hash connection of rows of data with the row data of the target data table stored in the corresponding hash buckets. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
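The two-stage check described above, in which hash values are compared first and the keys are compared only when the hash values agree, can be sketched as follows. Treating the "hash key" as a tuple of join column values that is compared element by element is an assumption of this sketch, as is the function name satisfies_equi_join.

```python
def satisfies_equi_join(probe_hash, probe_key, bucket_hash, bucket_key):
    """Decide whether a probe row and a bucket row satisfy the equi-join condition.

    Stage 1 is a cheap comparison of the stored hash values. Stage 2 compares
    the keys themselves, which rules out rows that merely collide on the hash.
    """
    if probe_hash != bucket_hash:
        return False                  # different hash values: cannot match
    return probe_key == bucket_key    # same hash value: confirm on the keys


# Keys are tuples so that a join on several columns is handled the same way.
match = satisfies_equi_join(5, (1,), 5, (1,))
collision = satisfies_equi_join(5, (1,), 5, (2,))
different_hash = satisfies_equi_join(5, (1,), 6, (1,))
```

The second stage is what keeps the result correct when two different keys happen to share a hash value; the first stage only serves to skip most non-matching buckets cheaply.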
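As one possible illustration, the parallel hash connection of a batch of rows with the rows in their candidate buckets can be sketched with a thread pool. The function names join_one and parallel_hash_join, the column names o_orderkey and c_name, and the sample rows are hypothetical; the sketch assumes that the candidate buckets of each row have already been located.

```python
from concurrent.futures import ThreadPoolExecutor


def join_one(probe_row, bucket_rows, probe_key="o_custkey", build_key="c_custkey"):
    """Join one probe row with the target-table rows stored in its buckets."""
    return [
        {**build_row, **probe_row}
        for build_row in bucket_rows
        if build_row[build_key] == probe_row[probe_key]
    ]


def parallel_hash_join(probe_rows, candidate_buckets, num_threads=4):
    """Run join_one for every (row, buckets) pair on a pool of join threads."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        per_row = pool.map(join_one, probe_rows, candidate_buckets)
        # Flatten the per-row match lists into a single result list.
        return [joined for matches in per_row for joined in matches]


# Hypothetical probe rows and the bucket rows already located for each of them.
orders = [
    {"o_orderkey": 100, "o_custkey": 9},
    {"o_orderkey": 101, "o_custkey": 3},
]
buckets = [
    [{"c_custkey": 1, "c_name": "alice"}, {"c_custkey": 9, "c_name": "carol"}],
    [],
]
joined = parallel_hash_join(orders, buckets)
```

Each probe row is independent of the others once its buckets are known, which is why the per-row join can be handed to separate threads without coordination between them.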
In a specific example, as shown in fig. 1C, the hash connection process is as follows. First, the hash values of the connection primary keys of the plurality of rows of data in the data table to be connected are calculated, and the hash entry corresponding to each row of data in the hash table is retrieved based on those hash values. After the hash entry corresponding to each row of data in the hash table is retrieved, the hash buckets in that hash entry are traversed. Specifically, it is determined whether the hash value of the connection primary key of the row data is the same as the hash value of the connection primary key of the row data of the target data table stored in the hash bucket. If the two hash values are the same, it is further determined whether the hash key of the row data is in one-to-one correspondence with the hash key of the row data of the target data table stored in the hash bucket. If it is, the row data and the row data of the target data table stored in the hash bucket are determined to satisfy the equivalent connection condition, and the hash connection result of the row data and the row data of the target data table stored in the hash bucket can thus be obtained.
If the hash value of the connection primary key of the row data is determined to be different from the hash value of the connection primary key of the row data of the target data table stored in the hash bucket, the next hash bucket in the hash entry is traversed, until all hash buckets in the hash entry have been traversed. Likewise, if the hash key of the row data is determined not to correspond to the hash key of the row data of the target data table stored in the hash bucket, the next hash bucket in the hash entry is traversed, until all hash buckets in the hash entry have been traversed. It should be understood that the above description is only exemplary, and the embodiments of the present application are not limited in this respect.
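The overall flow described above can be put together in a short, single-threaded Python sketch. The table layout (a list of entries, each a list of (hash value, row) buckets), the function names build and probe_batch, the columns o_orderkey and c_name, the sample rows, and the use of Python's built-in hash() are assumptions made for illustration and do not reproduce the figures or tables of this application.

```python
def build(rows, key, num_entries):
    """Build the hash table of the target data table on the given key."""
    table = [[] for _ in range(num_entries)]
    for row in rows:
        h = hash(row[key])
        table[h % num_entries].append((h, row))
    return table


def probe_batch(table, rows, probe_key, build_key):
    """Probe the hash table with a whole batch of rows."""
    n = len(table)
    hashes = [hash(r[probe_key]) for r in rows]    # step 1: hash values
    entries = [table[h % n] for h in hashes]       # step 2: hash entries
    result = []
    for row, h, entry in zip(rows, hashes, entries):
        for bucket_hash, build_row in entry:       # step 3: traverse buckets
            if bucket_hash != h:
                continue                           # hash differs: next bucket
            if build_row[build_key] != row[probe_key]:
                continue                           # key differs: next bucket
            result.append({**build_row, **row})    # equi-join condition holds
    return result


# Hypothetical rows; keys 1 and 9 share a hash entry when there are 8 entries.
customer = [
    {"c_custkey": 1, "c_name": "alice"},
    {"c_custkey": 9, "c_name": "carol"},
]
orders = [
    {"o_orderkey": 100, "o_custkey": 9},
    {"o_orderkey": 101, "o_custkey": 3},
    {"o_orderkey": 102, "o_custkey": 1},
]
result = probe_batch(build(customer, "c_custkey", 8), orders, "o_custkey", "c_custkey")
```

In this example the order with o_custkey 9 first meets the bucket holding key 1, skips it because the hash values differ, and then matches the bucket holding key 9; the order with o_custkey 3 finds an empty entry and produces no result.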
According to the hash connection method provided by the embodiment of the invention, hash entries respectively corresponding to a plurality of rows of data in the hash table of the target data table are determined based on hash values of the plurality of rows of data in the data table to be connected, and hash buckets respectively corresponding to the plurality of rows of data in the hash table are determined based on the hash entries. If it is determined that the plurality of rows of data respectively satisfy the equivalent connection condition with the row data of the target data table stored in the corresponding hash buckets, hash connection is performed on the plurality of rows of data and the row data of the target data table stored in the corresponding hash buckets to obtain a hash connection result of the data table to be connected and the target data table. Hash table probing can thus be performed on a plurality of rows of data in the data table to be connected at one time, that is, on a batch of rows at one time. This can effectively improve the efficiency of hash table probing for the row data in the data table to be connected and, in turn, the performance of the hash connection and the overall performance of a database using hash connection.
The hash connection method provided by the present embodiment may be executed by any suitable device with data processing capability, including but not limited to: a camera, a terminal, a mobile terminal, a PC, a server, an in-vehicle device, an entertainment device, an advertising device, a Personal Digital Assistant (PDA), a tablet, a laptop, a handheld game machine, glasses, a watch, a wearable device, a virtual display device, a display enhancement device, or the like.
Referring to fig. 2, a schematic diagram of a database system in the second embodiment of the present application is shown.
The database system provided by this embodiment comprises a client and a database server. The client is configured to receive an operation indicating that hash connection is to be performed on a data table to be connected, and to send to the database server, based on the operation, a hash connection request requesting hash connection of the data table to be connected. The database server is configured to determine hash values of multiple rows of data in the data table to be connected based on the received hash connection request; determine hash entries respectively corresponding to the multiple rows of data in a hash table of a target data table based on the hash values of the multiple rows of data in the data table to be connected; determine hash buckets respectively corresponding to the multiple rows of data in the hash table based on the hash entries; and, if it is determined that the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets respectively satisfy an equivalent connection condition, perform hash connection on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets to obtain a hash connection result between the data table to be connected and the target data table.
Fig. 2 is a schematic structural diagram of a database system for implementing the hash connection method provided in the embodiment of the present application. The system may include a database server and a client in a terminal device A. It should be understood that the database server and the client in the terminal device A shown in fig. 2 are only exemplary, and do not limit the implementation forms of the database server or of the client in the terminal device A.
In practical application, the database server and the terminal device A may be connected through a wired or wireless network; specifically, the communication connection may be established through a mobile network such as GSM, GPRS, or LTE, or through Bluetooth, Wi-Fi, infrared, or the like.
The database server may be a server provided in a service device for providing services for a user, and specifically may be an independent application service device.
The terminal device A may be a user-oriented terminal capable of interacting with a user, such as a mobile phone, a notebook, a computer, an iPad, or a smart speaker; it may be any of various self-service terminals, such as self-service machines in hospitals, banks, stations, and similar places; and it may also be an intelligent machine supporting interaction, such as a chat robot, a floor-sweeping robot, or a meal-ordering service robot. The product type and the physical form of the terminal device are not limited; the terminal device only needs to have an interactive function, which can be realized by installing an interactive application program such as a database application.
When performing hash connection, the client in the terminal device A may send a hash connection request to the database server through the network. The database server receives the hash connection request sent by the client in the terminal device A, and performs hash connection on the data table to be connected based on the hash connection request. Therefore, the hash connection method provided in the embodiment of the present application may be executed by the database server, and for the specific implementation process, reference may be made to the description of the first method embodiment.
Through the database system provided by the embodiment of the application, the client receives an operation indicating that hash connection is to be performed on a data table to be connected, and sends to the database server, based on the operation, a hash connection request requesting hash connection of the data table to be connected. The database server determines hash values of multiple rows of data in the data table to be connected based on the received hash connection request, determines hash entries respectively corresponding to the multiple rows of data in a hash table of a target data table based on those hash values, and determines hash buckets respectively corresponding to the multiple rows of data in the hash table based on the hash entries. If the multiple rows of data respectively satisfy an equivalent connection condition with the row data of the target data table stored in the corresponding hash buckets, the database server performs hash connection on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets to obtain a hash connection result between the data table to be connected and the target data table. Hash table probing can thus be performed on multiple rows of data in the data table to be connected at one time, that is, on a batch of rows at one time. This can effectively improve the efficiency of hash table probing for the row data in the data table to be connected and, in turn, the performance of the hash connection and the overall performance of a database using hash connection.
The database system provided in this embodiment is used to implement the corresponding hash connection method in the foregoing method embodiments, and details are not described here.
Referring to fig. 3, a schematic structural diagram of a hash connection apparatus according to a third embodiment of the present application is shown.
The hash connection apparatus provided in this embodiment includes: a first determining module 201, configured to determine, based on hash values of multiple rows of data in a data table to be connected, hash entries corresponding to the multiple rows of data in a hash table of a target data table, respectively; a second determining module 202, configured to determine, based on the hash entries, hash buckets corresponding to the lines of data in the hash table, respectively; and the hash connection module 203 is configured to perform hash connection on the multiple lines of data and the corresponding line data of the target data table stored in the hash bucket if it is determined that the multiple lines of data and the corresponding line data of the target data table stored in the hash bucket respectively satisfy an equivalent connection condition, so as to obtain a hash connection result between the data table to be connected and the target data table.
The hash connection apparatus provided in this embodiment is used to implement the corresponding hash connection method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again.
Referring to fig. 4, a schematic structural diagram of a hash connection apparatus according to a fourth embodiment of the present application is shown.
The hash connection apparatus provided in this embodiment includes: a first determining module 301, configured to determine, based on hash values of multiple rows of data in a data table to be connected, hash entries corresponding to the multiple rows of data in a hash table of a target data table, respectively; a second determining module 302, configured to determine, based on the hash entries, hash buckets corresponding to the rows of data in the hash table, respectively; and a hash connection module 305, configured to perform hash connection on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets if it is determined that the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets respectively satisfy an equivalent connection condition, so as to obtain a hash connection result between the data table to be connected and the target data table.
Optionally, the first determining module 301 includes: and the retrieval submodule 3011 is configured to retrieve, based on the hash value of each line of data in the plurality of lines of data, a hash entry corresponding to each line of data in the hash table.
Optionally, the retrieving sub-module 3011 includes: a calculating unit 3012, configured to calculate, in parallel, identification information of hash entries corresponding to each line of data in the hash table based on hash values of the primary keys connected to each line of data in the plurality of lines of data through a plurality of threads for calculating identification information of the hash entries; a determining unit 3014, configured to determine, based on identification information of hash entries corresponding to the lines of data in the hash table, hash entries corresponding to the lines of data in the hash table respectively.
Optionally, after the computing unit 3012, the retrieving sub-module 3011 further includes: a caching unit 3013, configured to cache identification information of hash entries corresponding to the rows of data in the hash table respectively; the second determining module 302 is specifically configured to: and searching a hash bucket corresponding to each row of data in the hash table in the hash entry identified by the cached identification information.
Optionally, the second determining module 302 is specifically configured to: and searching hash buckets corresponding to each row of data in the hash table in the hash entry in parallel through a plurality of hash bucket searching threads.
Optionally, before the hash connection module 305, the apparatus further includes: a third determining module 303, configured to determine, for each row of data in the multiple rows of data, whether hash keys of a row of data correspond to hash keys of a row of data of the target data table stored in a corresponding hash bucket one by one if it is determined that hash values of connecting primary keys of a row of data are the same as hash values of connecting primary keys of a row of data of the target data table stored in the corresponding hash bucket; a fourth determining module 304, configured to determine that line data and line data of the target data table stored in the corresponding hash bucket satisfy an equivalent connection condition if it is determined that the hash key of the line data and the hash key of the line data of the target data table stored in the corresponding hash bucket correspond to each other one by one.
Optionally, the hash connection module 305 is specifically configured to: and performing hash connection on each line of data in the plurality of lines of data and the corresponding line of data of the target data table stored in the hash bucket in parallel through a plurality of hash connection threads.
The hash connection apparatus provided in this embodiment is used to implement the corresponding hash connection method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again.
Referring to fig. 5, a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in fig. 5, the electronic device may include: a processor 402, a communications interface 404, a memory 406, and a communications bus 408.
Wherein:
the processor 402, the communication interface 404, and the memory 406 communicate with each other via the communication bus 408.
The communication interface 404 is used for communicating with other electronic devices or servers.
The processor 402 is configured to execute the program 410, and may specifically perform the relevant steps in the hash join method embodiments described above.
In particular, the program 410 may include program code, and the program code includes computer operating instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The electronic device includes one or more processors, which may be processors of the same type, such as one or more CPUs, or processors of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used for storing the program 410. The memory 406 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The program 410 may specifically be configured to cause the processor 402 to perform the following operations: determining hash entries respectively corresponding to multiple rows of data in a hash table of a target data table based on hash values of the multiple rows of data in the data table to be connected; determining hash buckets respectively corresponding to the multiple rows of data in the hash table based on the hash entries; and, if it is determined that the multiple rows of data each satisfy the equivalent connection condition with the row data of the target data table stored in the corresponding hash bucket, performing hash connection on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets to obtain a hash connection result of the data table to be connected and the target data table.
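The three operations above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the names `NUM_ENTRIES`, `entry_index`, `build_hash_table`, and `probe_batch`, the use of tuples as rows, and the use of an array subscript as the hash entry are all assumptions made for illustration.

```python
from collections import defaultdict

NUM_ENTRIES = 8  # hypothetical number of hash entries in the hash table


def entry_index(key):
    """Map a join key to a hash entry, modeled here as an array subscript."""
    return hash(key) % NUM_ENTRIES


def build_hash_table(target_rows, key_col):
    """Build the hash table of the target data table: entry -> stored buckets."""
    table = defaultdict(list)
    for row in target_rows:
        key = row[key_col]
        # Each bucket keeps the key's hash value next to the row data.
        table[entry_index(key)].append((hash(key), row))
    return table


def probe_batch(table, probe_rows, key_col, target_key_col):
    """Process several rows of the data table to be connected in one pass."""
    # Step 1: determine the hash entry for every row in the batch.
    entries = [entry_index(row[key_col]) for row in probe_rows]
    # Step 2: determine the hash buckets reachable from each entry.
    buckets = [table.get(e, []) for e in entries]
    # Step 3: connect the rows that satisfy the equivalent connection condition.
    result = []
    for row, bucket_list in zip(probe_rows, buckets):
        key = row[key_col]
        for stored_hash, target_row in bucket_list:
            if stored_hash == hash(key) and target_row[target_key_col] == key:
                result.append(row + target_row)
    return result
```

In this sketch each step runs over the whole batch before the next step starts, which is the point of handling multiple rows at one time rather than row by row.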
In an optional implementation, the program 410 is further configured to cause the processor 402 to, when determining the hash entries respectively corresponding to the multiple rows of data in the hash table of the target data table based on the hash values of the multiple rows of data in the data table to be connected, retrieve the hash entry corresponding to each row of data in the hash table based on the hash value of the connecting primary key of each row of data in the multiple rows of data.
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to, when retrieving the hash entry corresponding to each row of data in the hash table based on the hash value of the connecting primary key of each row of data in the multiple rows of data: compute in parallel, through a plurality of threads for computing identification information of hash entries, the identification information of the hash entry corresponding to each row of data in the hash table based on the hash value of the connecting primary key of each row of data; and determine the hash entry corresponding to each row of data in the hash table based on the identification information of the hash entry corresponding to that row of data.
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to: after the identification information of the hash entry corresponding to each row of data in the hash table has been computed in parallel, through the plurality of threads for computing identification information of hash entries, based on the hash value of the connecting primary key of each row of data in the multiple rows of data, cache the identification information of the hash entry corresponding to each row of data in the hash table; and, when determining the hash buckets respectively corresponding to the multiple rows of data in the hash table based on the hash entries, search, in the hash entry identified by the cached identification information, for the hash bucket corresponding to each row of data in the hash table.
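The parallel computation and caching of the identification information can be sketched as follows, under stated assumptions. The identification information is modeled as an array subscript, the hash table as a plain list of bucket lists, and the names `entry_id`, `compute_entry_ids`, and `find_buckets` are hypothetical; Python's standard `ThreadPoolExecutor` stands in for the threads used to compute identification information.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_ENTRIES = 8  # hypothetical size of the hash entry array


def entry_id(row_key):
    """Identification information of a hash entry, modeled as an array subscript."""
    return hash(row_key) % NUM_ENTRIES


def compute_entry_ids(keys, workers=4):
    """Compute entry identifiers for a batch of join keys using several threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the i-th identifier belongs to keys[i].
        return list(pool.map(entry_id, keys))


def find_buckets(hash_table, cached_ids):
    """Look up buckets through the cached identifiers instead of rehashing."""
    return [hash_table[i] for i in cached_ids]
```

The returned list plays the role of the cache: the bucket lookup reads the stored identifiers and never recomputes a hash. Note that in standard CPython the global interpreter lock prevents threads from speeding up pure-Python hashing, so the sketch shows only the structure of the computation, not its performance.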
In an alternative embodiment, the program 410 is further configured to cause the processor 402 to, when determining the hash buckets respectively corresponding to the multiple rows of data in the hash table based on the hash entries, search in parallel, through a plurality of hash bucket search threads, in the hash entries for the hash bucket corresponding to each row of data in the hash table.
In an optional implementation, the program 410 is further configured to cause the processor 402 to, before performing hash connection on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets, for each row of data in the multiple rows of data: if it is determined that the hash value of the connecting primary key of the row of data is the same as the hash value of the connecting primary key of the row data of the target data table stored in the corresponding hash bucket, determine whether the hash keys of the row of data correspond one-to-one to the hash keys of the row data of the target data table stored in the corresponding hash bucket; and, if it is determined that the hash keys of the row of data correspond one-to-one to the hash keys of the row data of the target data table stored in the corresponding hash bucket, determine that the row of data and the row data of the target data table stored in the corresponding hash bucket satisfy the equivalent connection condition.
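The two-stage check described above (compare hash values first, then compare the keys themselves one by one) can be sketched as below. The function name `satisfies_equi_join` and the representation of keys as tuples of column values are assumptions for illustration. A real system would typically store the hash value in the bucket rather than recompute it, which this sketch does not model.

```python
def satisfies_equi_join(probe_row, bucket_row, probe_cols, bucket_cols):
    """Return True if the two rows meet the equivalent connection condition."""
    probe_key = tuple(probe_row[c] for c in probe_cols)
    bucket_key = tuple(bucket_row[c] for c in bucket_cols)

    # Stage 1: cheap rejection. Different hash values mean different keys.
    if hash(probe_key) != hash(bucket_key):
        return False

    # Stage 2: equal hash values may still be a collision, so the key
    # columns are compared one by one.
    if len(probe_key) != len(bucket_key):
        return False
    return all(p == b for p, b in zip(probe_key, bucket_key))
```

The second stage is needed because distinct keys can share a hash value. In CPython, for example, `hash(-1)` and `hash(-2)` are equal, so only the column-by-column comparison tells the two keys apart.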
In an optional implementation, the program 410 is further configured to cause the processor 402 to, when performing hash connection on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets, perform, in parallel through a plurality of hash connection threads, hash connection on each row of data in the multiple rows of data and the row data of the target data table stored in the corresponding hash bucket.
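A minimal sketch of the parallel connection step is given below, assuming that the bucket for each row has already been located. The names `join_one` and `parallel_join` are hypothetical, rows are modeled as tuples, and Python's standard `ThreadPoolExecutor` stands in for the hash connection threads.

```python
from concurrent.futures import ThreadPoolExecutor


def join_one(task):
    """Connect one row with the matching target rows stored in its bucket."""
    probe_row, bucket_rows, key_col = task
    return [probe_row + t for t in bucket_rows if t[key_col] == probe_row[key_col]]


def parallel_join(probe_rows, buckets, key_col=0, workers=4):
    """Connect every row with its bucket, one task per row, across threads."""
    tasks = [(row, bucket, key_col) for row, bucket in zip(probe_rows, buckets)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in input order, so the output is deterministic.
        per_row = list(pool.map(join_one, tasks))
    # Flatten the per-row results into a single connection result.
    return [joined for rows in per_row for joined in rows]
```

Because each task reads only its own row and bucket and writes only its own result list, the threads in this sketch need no locking.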
For the specific implementation of each step in the program 410, reference may be made to the corresponding steps and the corresponding descriptions of the units in the foregoing hash join method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
With the electronic device of this embodiment, hash entries respectively corresponding to multiple rows of data in the hash table of the target data table are determined based on the hash values of the multiple rows of data in the data table to be connected, and hash buckets respectively corresponding to the multiple rows of data in the hash table are determined based on the hash entries. If it is determined that the multiple rows of data each satisfy the equivalent connection condition with the row data of the target data table stored in the corresponding hash bucket, hash connection is performed on the multiple rows of data and the row data of the target data table stored in the corresponding hash buckets to obtain a hash connection result of the data table to be connected and the target data table. In this way, hash table detection can be performed on multiple rows of data in the data table to be connected at one time, which effectively improves the efficiency of hash table detection on the row data in the data table to be connected and, in turn, the performance of hash connection and the overall performance of a database that uses hash connection.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code storable in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by such software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor, controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the hash join method described herein. Further, when a general-purpose computer accesses code for implementing the hash join method shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the hash join method shown herein.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.