Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit507615f

Browse files
committed
Describe hash join implementation
Add a high level description of our implementation of the hybrid hashjoin algorithm to the block comment in nodeHashjoin.c.Author: Melanie Plageman <melanieplageman@gmail.com>Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>Discussion:https://postgr.es/m/20230516160051.4267a800%40karst
1 parentb973f93 commit507615f

File tree

1 file changed

+45
-0
lines changed

1 file changed

+45
-0
lines changed

‎src/backend/executor/nodeHashjoin.c

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,51 @@
1010
* IDENTIFICATION
1111
* src/backend/executor/nodeHashjoin.c
1212
*
13+
* HASH JOIN
14+
*
15+
* This is based on the "hybrid hash join" algorithm described shortly in the
16+
* following page
17+
*
18+
* https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join
19+
*
20+
* and in detail in the referenced paper:
21+
*
22+
* "An Adaptive Hash Join Algorithm for Multiuser Environments"
23+
* Hansjörg Zeller; Jim Gray (1990). Proceedings of the 16th VLDB conference.
24+
* Brisbane: 186–197.
25+
*
26+
* If the inner side tuples of a hash join do not fit in memory, the hash join
27+
* can be executed in multiple batches.
28+
*
29+
* If the statistics on the inner side relation are accurate, planner chooses a
30+
* multi-batch strategy and estimates the number of batches.
31+
*
32+
* The query executor measures the real size of the hashtable and increases the
33+
* number of batches if the hashtable grows too large.
34+
*
35+
* The number of batches is always a power of two, so an increase in the number
36+
* of batches doubles it.
37+
*
38+
* Serial hash join measures batch size lazily -- waiting until it is loading a
39+
* batch to determine if it will fit in memory. While inserting tuples into the
40+
* hashtable, serial hash join will, if that tuple were to exceed work_mem,
41+
* dump out the hashtable and reassign them either to other batch files or the
42+
* current batch resident in the hashtable.
43+
*
44+
* Parallel hash join, on the other hand, completes all changes to the number
45+
* of batches during the build phase. If it increases the number of batches, it
46+
* dumps out all the tuples from all batches and reassigns them to entirely new
47+
* batch files. Then it checks every batch to ensure it will fit in the space
48+
* budget for the query.
49+
*
50+
* In both parallel and serial hash join, the executor currently makes a best
51+
* effort. If a particular batch will not fit in memory, it tries doubling the
52+
* number of batches. If after a batch increase, there is a batch which
53+
* retained all or none of its tuples, the executor disables growth in the
54+
* number of batches globally. After growth is disabled, all batches that would
55+
* have previously triggered an increase in the number of batches instead
56+
* exceed the space allowed.
57+
*
1358
* PARALLELISM
1459
*
1560
* Hash joins can participate in parallel query execution in several ways. A

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp