NotificationsYou must be signed in to change notification settings
Fork6
Star31

Commit507615f

committed

Describe hash join implementation

Add a high level description of our implementation of the hybrid hashjoin algorithm to the block comment in nodeHashjoin.c.Author: Melanie Plageman <melanieplageman@gmail.com>Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>Reviewed-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>Discussion:https://postgr.es/m/20230516160051.4267a800%40karst

1 parentb973f93 commit507615fCopy full SHA for 507615f

File tree

1 file changed

+45

-0

lines changed

src/backend/executor
- nodeHashjoin.c

1 file changed

+45

-0

lines changed

`‎src/backend/executor/nodeHashjoin.c‎`

Lines changed: 45 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -10,6 +10,51 @@`
`10`	`10`	`* IDENTIFICATION`
`11`	`11`	`* src/backend/executor/nodeHashjoin.c`
`12`	`12`	`*`
	`13`	`+ * HASH JOIN`
	`14`	`+ *`
	`15`	`+ * This is based on the "hybrid hash join" algorithm described shortly in the`
	`16`	`+ * following page`
	`17`	`+ *`
	`18`	`+ * https://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join`
	`19`	`+ *`
	`20`	`+ * and in detail in the referenced paper:`
	`21`	`+ *`
	`22`	`+ * "An Adaptive Hash Join Algorithm for Multiuser Environments"`
	`23`	`+ * Hansjörg Zeller; Jim Gray (1990). Proceedings of the 16th VLDB conference.`
	`24`	`+ * Brisbane: 186–197.`
	`25`	`+ *`
	`26`	`+ * If the inner side tuples of a hash join do not fit in memory, the hash join`
	`27`	`+ * can be executed in multiple batches.`
	`28`	`+ *`
	`29`	`+ * If the statistics on the inner side relation are accurate, planner chooses a`
	`30`	`+ * multi-batch strategy and estimates the number of batches.`
	`31`	`+ *`
	`32`	`+ * The query executor measures the real size of the hashtable and increases the`
	`33`	`+ * number of batches if the hashtable grows too large.`
	`34`	`+ *`
	`35`	`+ * The number of batches is always a power of two, so an increase in the number`
	`36`	`+ * of batches doubles it.`
	`37`	`+ *`
	`38`	`+ * Serial hash join measures batch size lazily -- waiting until it is loading a`
	`39`	`+ * batch to determine if it will fit in memory. While inserting tuples into the`
	`40`	`+ * hashtable, serial hash join will, if that tuple were to exceed work_mem,`
	`41`	`+ * dump out the hashtable and reassign them either to other batch files or the`
	`42`	`+ * current batch resident in the hashtable.`
	`43`	`+ *`
	`44`	`+ * Parallel hash join, on the other hand, completes all changes to the number`
	`45`	`+ * of batches during the build phase. If it increases the number of batches, it`
	`46`	`+ * dumps out all the tuples from all batches and reassigns them to entirely new`
	`47`	`+ * batch files. Then it checks every batch to ensure it will fit in the space`
	`48`	`+ * budget for the query.`
	`49`	`+ *`
	`50`	`+ * In both parallel and serial hash join, the executor currently makes a best`
	`51`	`+ * effort. If a particular batch will not fit in memory, it tries doubling the`
	`52`	`+ * number of batches. If after a batch increase, there is a batch which`
	`53`	`+ * retained all or none of its tuples, the executor disables growth in the`
	`54`	`+ * number of batches globally. After growth is disabled, all batches that would`
	`55`	`+ * have previously triggered an increase in the number of batches instead`
	`56`	`+ * exceed the space allowed.`
	`57`	`+ *`
`13`	`58`	`* PARALLELISM`
`14`	`59`	`*`
`15`	`60`	`* Hash joins can participate in parallel query execution in several ways. A`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit507615f

File tree

1 file changed

1 file changed

`‎src/backend/executor/nodeHashjoin.c‎`

0 commit comments