
Commit be3b265


Improve SELECT DISTINCT to consider hash aggregation, as well as sort/uniq, as methods for implementing the DISTINCT step. This eliminates the former performance gap between DISTINCT and GROUP BY, and also makes it possible to do SELECT DISTINCT on datatypes that only support hashing, not sorting. SELECT DISTINCT ON is still always implemented by sorting; it would take executor changes to support hashing that, and it's not clear it's worth the trouble.

This is a release-note-worthy incompatibility from previous PG versions, since SELECT DISTINCT can no longer be counted on to deliver sorted output without explicitly saying ORDER BY. (Anyone who can't cope with that can consider turning off enable_hashagg.)

Several regression test queries needed to have ORDER BY added to preserve stable output order. I fixed the ones that manifested here, but there might be some other cases that show up on other platforms.

1 parent 4abd7b4, commit be3b265

File tree

13 files changed (+396, -111 lines)

src
- backend
  - nodes
    - outfuncs.c
  - optimizer/plan
    - planmain.c
    - planner.c
  - parser
    - parse_clause.c
- include/nodes
  - relation.h
- test/regress
  - expected
  - input
    - misc.source
  - output
    - misc.source
  - sql


`src/backend/nodes/outfuncs.c`

Lines changed: 2 additions & 1 deletion

```diff
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.329 2008/08/02 21:31:59 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/nodes/outfuncs.c,v 1.330 2008/08/05 02:43:17 tgl Exp $
  *
  * NOTES
  *    Every node type that can appear in stored rules' parsetrees must
@@ -1334,6 +1334,7 @@ _outPlannerInfo(StringInfo str, PlannerInfo *node)
     WRITE_NODE_FIELD(append_rel_list);
     WRITE_NODE_FIELD(query_pathkeys);
     WRITE_NODE_FIELD(group_pathkeys);
+    WRITE_NODE_FIELD(distinct_pathkeys);
     WRITE_NODE_FIELD(sort_pathkeys);
     WRITE_FLOAT_FIELD(total_table_pages, "%.0f");
     WRITE_FLOAT_FIELD(tuple_fraction, "%.4f");
```

`src/backend/optimizer/plan/planmain.c`

Lines changed: 13 additions & 7 deletions

```diff
@@ -14,7 +14,7 @@
  *
  *
  * IDENTIFICATION
- * $PostgreSQL: pgsql/src/backend/optimizer/plan/planmain.c,v 1.108 2008/08/03 19:10:52 tgl Exp $
+ * $PostgreSQL: pgsql/src/backend/optimizer/plan/planmain.c,v 1.109 2008/08/05 02:43:17 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -66,9 +66,9 @@
  * PlannerInfo field and not a passed parameter is that the low-level routines
  * in indxpath.c need to see it.)
  *
- * Note: the PlannerInfo node also includes group_pathkeys and sort_pathkeys,
- * which like query_pathkeys need to be canonicalized once the info is
- * available.
+ * Note: the PlannerInfo node also includes group_pathkeys, distinct_pathkeys,
+ * and sort_pathkeys, which like query_pathkeys need to be canonicalized once
+ * the info is available.
  *
  * tuple_fraction is interpreted as follows:
  *    0: expect all tuples to be retrieved (normal case)
@@ -120,6 +120,8 @@ query_planner(PlannerInfo *root, List *tlist,
                     root->query_pathkeys);
         root->group_pathkeys = canonicalize_pathkeys(root,
                     root->group_pathkeys);
+        root->distinct_pathkeys = canonicalize_pathkeys(root,
+                    root->distinct_pathkeys);
         root->sort_pathkeys = canonicalize_pathkeys(root,
                     root->sort_pathkeys);
         return;
@@ -237,10 +239,12 @@ query_planner(PlannerInfo *root, List *tlist,
     /*
      * We have completed merging equivalence sets, so it's now possible to
      * convert the requested query_pathkeys to canonical form.  Also
-     * canonicalize the groupClause and sortClause pathkeys for use later.
+     * canonicalize the groupClause, distinctClause and sortClause pathkeys
+     * for use later.
      */
     root->query_pathkeys = canonicalize_pathkeys(root, root->query_pathkeys);
     root->group_pathkeys = canonicalize_pathkeys(root, root->group_pathkeys);
+    root->distinct_pathkeys = canonicalize_pathkeys(root, root->distinct_pathkeys);
     root->sort_pathkeys = canonicalize_pathkeys(root, root->sort_pathkeys);
 
     /*
@@ -286,9 +290,11 @@ query_planner(PlannerInfo *root, List *tlist,
         /*
          * If both GROUP BY and ORDER BY are specified, we will need two
          * levels of sort --- and, therefore, certainly need to read all the
-         * tuples --- unless ORDER BY is a subset of GROUP BY.
+         * tuples --- unless ORDER BY is a subset of GROUP BY.  Likewise if
+         * we have both DISTINCT and GROUP BY.
          */
-        if (!pathkeys_contained_in(root->sort_pathkeys, root->group_pathkeys))
+        if (!pathkeys_contained_in(root->sort_pathkeys, root->group_pathkeys) ||
+            !pathkeys_contained_in(root->distinct_pathkeys, root->group_pathkeys))
             tuple_fraction = 0.0;
     }
     else if (parse->hasAggs || root->hasHavingQual)
```
