Commit bd10ec5

Detect redundant GROUP BY columns using UNIQUE indexes

d4c3a15 added support that when the GROUP BY contained all of the columns
belonging to a relation's PRIMARY KEY, all other columns belonging to that
relation would be removed from the GROUP BY clause.  That's possible because
all other columns are functionally dependent on the PRIMARY KEY and those
columns alone ensure the groups are distinct.

Here we expand on that optimization and allow it to work for any unique
indexes on the table rather than just the PRIMARY KEY index.  This normally
requires that all columns in the index are defined with NOT NULL, however,
we can relax that requirement when the index is defined with NULLS NOT
DISTINCT.

When there are multiple suitable indexes to allow columns to be removed, we
prefer the index with the least number of columns as this allows us to
remove the highest number of GROUP BY columns.  One day, we may want to
revisit that decision as it may make more sense to use the narrower set of
columns in terms of the width of the data types and stored/queried data.

This also adjusts the code to make use of RelOptInfo.indexlist rather than
looking up the catalog tables.

In passing, add another short-circuit path to allow bailing out earlier in
cases where it's certainly not possible to remove redundant GROUP BY
columns.  This early exit is now cheaper to do than when this code was
originally written, as 00b4146 made it cheaper to check for empty
Bitmapsets.

Patch originally by Zhang Mingli and later worked on by jian he, but after
I (David) worked on it, there was very little of the original left.

Author: Zhang Mingli, jian he, David Rowley
Reviewed-by: jian he, Andrei Lepikhov
Discussion: https://postgr.es/m/327990c8-b9b2-4b0c-bffb-462249f82de0%40Spark

1 parent d8f3351 commit bd10ec5

File tree

5 files changed: +200 -22 lines changed

‎src/backend/optimizer/plan/initsplan.c

Lines changed: 98 additions & 22 deletions

@@ -400,24 +400,21 @@ add_vars_to_attr_needed(PlannerInfo *root, List *vars,
  *
  * Since some other DBMSes do not allow references to ungrouped columns, it's
  * not unusual to find all columns listed in GROUP BY even though listing the
- * primary-key columns would be sufficient.  Deleting such excess columns
- * avoids redundant sorting work, so it's worth doing.
+ * primary-key columns, or columns of a unique constraint would be sufficient.
+ * Deleting such excess columns avoids redundant sorting or hashing work, so
+ * it's worth doing.
  *
  * Relcache invalidations will ensure that cached plans become invalidated
- * when the underlying index of the pkey constraint is dropped.
- *
- * Currently, we only make use of pkey constraints for this, however, we may
- * wish to take this further in the future and also use unique constraints
- * which have NOT NULL columns.  In that case, plan invalidation will still
- * work since relations will receive a relcache invalidation when a NOT NULL
- * constraint is dropped.
+ * when the underlying supporting indexes are dropped or if a column's NOT
+ * NULL attribute is removed.
  */
 void
 remove_useless_groupby_columns(PlannerInfo *root)
 {
 	Query	   *parse = root->parse;
 	Bitmapset **groupbyattnos;
 	Bitmapset **surplusvars;
+	bool		tryremove = false;
 	ListCell   *lc;
 	int			relid;

@@ -457,10 +454,24 @@ remove_useless_groupby_columns(PlannerInfo *root)
 		/* OK, remember we have this Var */
 		relid = var->varno;
 		Assert(relid <= list_length(parse->rtable));
+
+		/*
+		 * If this isn't the first column for this relation then we now have
+		 * multiple columns.  That means there might be some that can be
+		 * removed.
+		 */
+		tryremove |= !bms_is_empty(groupbyattnos[relid]);
 		groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
 											  var->varattno - FirstLowInvalidHeapAttributeNumber);
 	}

+	/*
+	 * No Vars or didn't find multiple Vars for any relation in the GROUP BY?
+	 * If so, nothing can be removed, so don't waste more effort trying.
+	 */
+	if (!tryremove)
+		return;
+
 	/*
 	 * Consider each relation and see if it is possible to remove some of its
 	 * Vars from GROUP BY.  For simplicity and speed, we do the actual removal
@@ -472,9 +483,10 @@ remove_useless_groupby_columns(PlannerInfo *root)
 	foreach(lc, parse->rtable)
 	{
 		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+		RelOptInfo *rel;
 		Bitmapset  *relattnos;
-		Bitmapset  *pkattnos;
-		Oid			constraintOid;
+		Bitmapset  *best_keycolumns = NULL;
+		int32		best_nkeycolumns = PG_INT32_MAX;

 		relid++;

@@ -495,19 +507,83 @@ remove_useless_groupby_columns(PlannerInfo *root)
 		if (bms_membership(relattnos) != BMS_MULTIPLE)
 			continue;

-		/*
-		 * Can't remove any columns for this rel if there is no suitable
-		 * (i.e., nondeferrable) primary key constraint.
-		 */
-		pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);
-		if (pkattnos == NULL)
-			continue;
+		rel = root->simple_rel_array[relid];

 		/*
-		 * If the primary key is a proper subset of relattnos then we have
-		 * some items in the GROUP BY that can be removed.
+		 * Now check each index for this relation to see if there are any with
+		 * columns which are a proper subset of the grouping columns for this
+		 * relation.
 		 */
-		if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1)
+		foreach_node(IndexOptInfo, index, rel->indexlist)
+		{
+			Bitmapset  *ind_attnos;
+			bool		nulls_check_ok;
+
+			/*
+			 * Skip any non-unique and deferrable indexes.  Predicate indexes
+			 * have not been checked yet, so we must skip those too as the
+			 * predOK check that's done later might fail.
+			 */
+			if (!index->unique || !index->immediate || index->indpred != NIL)
+				continue;
+
+			/* For simplicity, we currently don't support expression indexes */
+			if (index->indexprs != NIL)
+				continue;
+
+			ind_attnos = NULL;
+			nulls_check_ok = true;
+			for (int i = 0; i < index->nkeycolumns; i++)
+			{
+				/*
+				 * We must insist that the index columns are all defined NOT
+				 * NULL otherwise duplicate NULLs could exist.  However, we
+				 * can relax this check when the index is defined with NULLS
+				 * NOT DISTINCT as there can only be 1 NULL row, therefore
+				 * functional dependency on the unique columns is maintained,
+				 * despite the NULL.
+				 */
+				if (!index->nullsnotdistinct &&
+					!bms_is_member(index->indexkeys[i],
+								   rel->notnullattnums))
+				{
+					nulls_check_ok = false;
+					break;
+				}
+
+				ind_attnos = bms_add_member(ind_attnos,
+											index->indexkeys[i] -
+											FirstLowInvalidHeapAttributeNumber);
+			}
+
+			if (!nulls_check_ok)
+				continue;
+
+			/*
+			 * Skip any indexes where the indexed columns aren't a proper
+			 * subset of the GROUP BY.
+			 */
+			if (bms_subset_compare(ind_attnos, relattnos) != BMS_SUBSET1)
+				continue;
+
+			/*
+			 * Record the attribute numbers from the index with the fewest
+			 * columns.  This allows the largest number of columns to be
+			 * removed from the GROUP BY clause.  In the future, we may wish
+			 * to consider using the narrowest set of columns and looking at
+			 * pg_statistic.stawidth as it might be better to use an index
+			 * with, say two INT4s, rather than, say, one long varlena column.
+			 */
+			if (index->nkeycolumns < best_nkeycolumns)
+			{
+				best_keycolumns = ind_attnos;
+				best_nkeycolumns = index->nkeycolumns;
+			}
+		}
+
+		/* Did we find a suitable index? */
+		if (!bms_is_empty(best_keycolumns))
 		{
 			/*
 			 * To easily remember whether we've found anything to do, we don't
@@ -518,7 +594,7 @@ remove_useless_groupby_columns(PlannerInfo *root)
 							   (list_length(parse->rtable) + 1));

 			/* Remember the attnos of the removable columns */
-			surplusvars[relid] = bms_difference(relattnos, pkattnos);
+			surplusvars[relid] = bms_difference(relattnos, best_keycolumns);
 		}
 	}
‎src/backend/optimizer/util/plancat.c

Lines changed: 1 addition & 0 deletions

@@ -457,6 +457,7 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 		info->indrestrictinfo = NIL;	/* set later, in indxpath.c */
 		info->predOK = false;	/* set later, in indxpath.c */
 		info->unique = index->indisunique;
+		info->nullsnotdistinct = index->indnullsnotdistinct;
 		info->immediate = index->indimmediate;
 		info->hypothetical = false;
‎src/include/nodes/pathnodes.h

Lines changed: 2 additions & 0 deletions

@@ -1187,6 +1187,8 @@ struct IndexOptInfo
 	bool		predOK;
 	/* true if a unique index */
 	bool		unique;
+	/* true if the index was defined with NULLS NOT DISTINCT */
+	bool		nullsnotdistinct;
 	/* is uniqueness enforced immediately? */
 	bool		immediate;
 	/* true if index doesn't really exist */
‎src/test/regress/expected/aggregates.out

Lines changed: 67 additions & 0 deletions

@@ -1448,6 +1448,73 @@ explain (costs off) select * from p_t1 group by a,b,c,d;
    ->  Seq Scan on p_t1_2
 (5 rows)

+create unique index t3_c_uidx on t3(c);
+-- Ensure we don't remove any columns from the GROUP BY for a unique
+-- index on a NULLable column.
+explain (costs off) select b,c from t3 group by b,c;
+      QUERY PLAN
+----------------------
+ HashAggregate
+   Group Key: b, c
+   ->  Seq Scan on t3
+(3 rows)
+
+-- Make the column NOT NULL and ensure we remove the redundant column
+alter table t3 alter column c set not null;
+explain (costs off) select b,c from t3 group by b,c;
+      QUERY PLAN
+----------------------
+ HashAggregate
+   Group Key: c
+   ->  Seq Scan on t3
+(3 rows)
+
+-- When there are multiple supporting unique indexes and the GROUP BY contains
+-- columns to cover all of those, ensure we pick the index with the least
+-- number of columns so that we can remove more columns from the GROUP BY.
+explain (costs off) select a,b,c from t3 group by a,b,c;
+      QUERY PLAN
+----------------------
+ HashAggregate
+   Group Key: c
+   ->  Seq Scan on t3
+(3 rows)
+
+-- As above but try ordering the columns differently to ensure we get the
+-- same result.
+explain (costs off) select a,b,c from t3 group by c,a,b;
+      QUERY PLAN
+----------------------
+ HashAggregate
+   Group Key: c
+   ->  Seq Scan on t3
+(3 rows)
+
+-- Ensure we don't use a partial index as proof of functional dependency
+drop index t3_c_uidx;
+create index t3_c_uidx on t3 (c) where c > 0;
+explain (costs off) select b,c from t3 group by b,c;
+      QUERY PLAN
+----------------------
+ HashAggregate
+   Group Key: b, c
+   ->  Seq Scan on t3
+(3 rows)
+
+-- A unique index defined as NULLS NOT DISTINCT does not need a supporting NOT
+-- NULL constraint on the indexed columns.  Ensure the redundant columns are
+-- removed from the GROUP BY for such a table.
+drop index t3_c_uidx;
+alter table t3 alter column c drop not null;
+create unique index t3_c_uidx on t3(c) nulls not distinct;
+explain (costs off) select b,c from t3 group by b,c;
+      QUERY PLAN
+----------------------
+ HashAggregate
+   Group Key: c
+   ->  Seq Scan on t3
+(3 rows)
+
 drop table t1 cascade;
 NOTICE:  drop cascades to table t1c
 drop table t2;

‎src/test/regress/sql/aggregates.sql

Lines changed: 32 additions & 0 deletions

@@ -507,6 +507,38 @@ create temp table p_t1_2 partition of p_t1 for values in(2);
 -- Ensure we can remove non-PK columns for partitioned tables.
 explain (costs off) select * from p_t1 group by a,b,c,d;

+create unique index t3_c_uidx on t3(c);
+
+-- Ensure we don't remove any columns from the GROUP BY for a unique
+-- index on a NULLable column.
+explain (costs off) select b,c from t3 group by b,c;
+
+-- Make the column NOT NULL and ensure we remove the redundant column
+alter table t3 alter column c set not null;
+explain (costs off) select b,c from t3 group by b,c;
+
+-- When there are multiple supporting unique indexes and the GROUP BY contains
+-- columns to cover all of those, ensure we pick the index with the least
+-- number of columns so that we can remove more columns from the GROUP BY.
+explain (costs off) select a,b,c from t3 group by a,b,c;
+
+-- As above but try ordering the columns differently to ensure we get the
+-- same result.
+explain (costs off) select a,b,c from t3 group by c,a,b;
+
+-- Ensure we don't use a partial index as proof of functional dependency
+drop index t3_c_uidx;
+create index t3_c_uidx on t3 (c) where c > 0;
+explain (costs off) select b,c from t3 group by b,c;
+
+-- A unique index defined as NULLS NOT DISTINCT does not need a supporting NOT
+-- NULL constraint on the indexed columns.  Ensure the redundant columns are
+-- removed from the GROUP BY for such a table.
+drop index t3_c_uidx;
+alter table t3 alter column c drop not null;
+create unique index t3_c_uidx on t3(c) nulls not distinct;
+explain (costs off) select b,c from t3 group by b,c;
+
 drop table t1 cascade;
 drop table t2;
 drop table t3;

0 commit comments