Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit9e551a1

Browse files
committed
Rotate instead of shifting hash join batch number.
Our algorithm for choosing batch numbers turned out not to workeffectively for multi-billion key inner relations. We would usemore hash bits than we have, and effectively concentrate all tuplesinto a smaller number of batches than we intended. While ideallywe should switch to wider hashes, for now, change the algorithm toone that effectively gives up bits from the bucket number when wedon't have enough bits. That means we'll finish up with longerbucket chains than would be ideal, but that's better than havingbatches that don't fit in work_mem and can't be divided.Batch-patch to all supported releases.Author: Thomas MunroReviewed-by: Tom Lane, thanks also to Tomas Vondra, Alvaro Herrera, Andres Freund for testing and discussionReported-by: James ColemanDiscussion:https://postgr.es/m/16104-dc11ed911f1ab9df%40postgresql.org
1 parentf49e5ef commit9e551a1

File tree

1 file changed

+17
-4
lines changed

1 file changed

+17
-4
lines changed

‎src/backend/executor/nodeHash.c

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1848,6 +1848,15 @@ ExecHashGetHashValue(HashJoinTable hashtable,
18481848
return true;
18491849
}
18501850

1851+
/*
1852+
* Rotate the bits of "word" to the right by n bits.
1853+
*/
1854+
staticinlineuint32
1855+
pg_rotate_right32(uint32word,intn)
1856+
{
1857+
return (word >>n) | (word << (sizeof(word)*BITS_PER_BYTE-n));
1858+
}
1859+
18511860
/*
18521861
* ExecHashGetBucketAndBatch
18531862
*Determine the bucket number and batch number for a hash value
@@ -1857,7 +1866,7 @@ ExecHashGetHashValue(HashJoinTable hashtable,
18571866
* chains), and must only cause the batch number to remain the same or
18581867
* increase. Our algorithm is
18591868
*bucketno = hashvalue MOD nbuckets
1860-
*batchno = (hashvalue DIV nbuckets) MOD nbatch
1869+
*batchno =ROR(hashvalue, log2_nbuckets) MOD nbatch
18611870
* where nbuckets and nbatch are both expected to be powers of 2, so we can
18621871
* do the computations by shifting and masking. (This assumes that all hash
18631872
* functions are good about randomizing all their output bits, else we are
@@ -1869,7 +1878,11 @@ ExecHashGetHashValue(HashJoinTable hashtable,
18691878
* number the way we do here).
18701879
*
18711880
* nbatch is always a power of 2; we increase it only by doubling it. This
1872-
* effectively adds one more bit to the top of the batchno.
1881+
* effectively adds one more bit to the top of the batchno. In very large
1882+
* joins, we might run out of bits to add, so we do this by rotating the hash
1883+
* value. This causes batchno to steal bits from bucketno when the number of
1884+
* virtual buckets exceeds 2^32. It's better to have longer bucket chains
1885+
* than to lose the ability to divide batches.
18731886
*/
18741887
void
18751888
ExecHashGetBucketAndBatch(HashJoinTablehashtable,
@@ -1882,9 +1895,9 @@ ExecHashGetBucketAndBatch(HashJoinTable hashtable,
18821895

18831896
if (nbatch>1)
18841897
{
1885-
/* we can do MOD by masking, DIV by shifting */
18861898
*bucketno=hashvalue& (nbuckets-1);
1887-
*batchno= (hashvalue >>hashtable->log2_nbuckets)& (nbatch-1);
1899+
*batchno=pg_rotate_right32(hashvalue,
1900+
hashtable->log2_nbuckets)& (nbatch-1);
18881901
}
18891902
else
18901903
{

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp