Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit8e89bc6

Browse files
committed
Rotate instead of shifting hash join batch number.
Our algorithm for choosing batch numbers turned out not to workeffectively for multi-billion key inner relations. We would usemore hash bits than we have, and effectively concentrate all tuplesinto a smaller number of batches than we intended. While ideallywe should switch to wider hashes, for now, change the algorithm toone that effectively gives up bits from the bucket number when wedon't have enough bits. That means we'll finish up with longerbucket chains than would be ideal, but that's better than havingbatches that don't fit in work_mem and can't be divided.Batch-patch to all supported releases.Author: Thomas MunroReviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussionReported-by: James ColemanDiscussion:https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
1 parent81be0c5 commit8e89bc6

File tree

1 file changed

+17
-4
lines changed

1 file changed

+17
-4
lines changed

‎src/backend/executor/nodeHash.c

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1005,6 +1005,15 @@ ExecHashGetHashValue(HashJoinTable hashtable,
10051005
return true;
10061006
}
10071007

1008+
/*
1009+
* Rotate the bits of "word" to the right by n bits.
1010+
*/
1011+
staticinlineuint32
1012+
pg_rotate_right32(uint32word,intn)
1013+
{
1014+
return (word >>n) | (word << (sizeof(word)*BITS_PER_BYTE-n));
1015+
}
1016+
10081017
/*
10091018
* ExecHashGetBucketAndBatch
10101019
*Determine the bucket number and batch number for a hash value
@@ -1014,7 +1023,7 @@ ExecHashGetHashValue(HashJoinTable hashtable,
10141023
* chains), and must only cause the batch number to remain the same or
10151024
* increase. Our algorithm is
10161025
*bucketno = hashvalue MOD nbuckets
1017-
*batchno = (hashvalue DIV nbuckets) MOD nbatch
1026+
*batchno =ROR(hashvalue, log2_nbuckets) MOD nbatch
10181027
* where nbuckets and nbatch are both expected to be powers of 2, so we can
10191028
* do the computations by shifting and masking. (This assumes that all hash
10201029
* functions are good about randomizing all their output bits, else we are
@@ -1026,7 +1035,11 @@ ExecHashGetHashValue(HashJoinTable hashtable,
10261035
* number the way we do here).
10271036
*
10281037
* nbatch is always a power of 2; we increase it only by doubling it. This
1029-
* effectively adds one more bit to the top of the batchno.
1038+
* effectively adds one more bit to the top of the batchno. In very large
1039+
* joins, we might run out of bits to add, so we do this by rotating the hash
1040+
* value. This causes batchno to steal bits from bucketno when the number of
1041+
* virtual buckets exceeds 2^32. It's better to have longer bucket chains
1042+
* than to lose the ability to divide batches.
10301043
*/
10311044
void
10321045
ExecHashGetBucketAndBatch(HashJoinTablehashtable,
@@ -1039,9 +1052,9 @@ ExecHashGetBucketAndBatch(HashJoinTable hashtable,
10391052

10401053
if (nbatch>1)
10411054
{
1042-
/* we can do MOD by masking, DIV by shifting */
10431055
*bucketno=hashvalue& (nbuckets-1);
1044-
*batchno= (hashvalue >>hashtable->log2_nbuckets)& (nbatch-1);
1056+
*batchno=pg_rotate_right32(hashvalue,
1057+
hashtable->log2_nbuckets)& (nbatch-1);
10451058
}
10461059
else
10471060
{

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp