Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit4767bc8

Browse files
committed
Improve statistics estimation to make some use of DISTINCT in sub-queries.
Formerly, we just punted when trying to estimate stats for variables comingout of sub-queries using DISTINCT, on the grounds that whatever stats wemight have for underlying table columns would be inapplicable. But if thesub-query has only one DISTINCT column, we can consider its output variableas being unique, which is useful information all by itself. The scope ofthis improvement is pretty narrow, but it costs nearly nothing, so we mightas well do it. Per discussion with Andres Freund.This patch differs from the draft I submitted yesterday in updating variouscomments about vardata.isunique (to reflect its extended meaning) and intweaking the interaction with security_barrier views. There does not seemto be a reason why we can't use this sort of knowledge even when thesub-query is such a view.
1 parent1cc1b91 commit4767bc8

File tree

2 files changed

+60
-36
lines changed

2 files changed

+60
-36
lines changed

‎src/backend/utils/adt/selfuncs.c

Lines changed: 59 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -110,6 +110,7 @@
110110
#include"optimizer/predtest.h"
111111
#include"optimizer/restrictinfo.h"
112112
#include"optimizer/var.h"
113+
#include"parser/parse_clause.h"
113114
#include"parser/parse_coerce.h"
114115
#include"parser/parsetree.h"
115116
#include"utils/builtins.h"
@@ -255,10 +256,11 @@ var_eq_const(VariableStatData *vardata, Oid operator,
255256
return0.0;
256257

257258
/*
258-
* If we matched the var to a unique index, assume there is exactly one
259-
* match regardless of anything else. (This is slightly bogus, since the
260-
* index's equality operator might be different from ours, but it's more
261-
* likely to be right than ignoring the information.)
259+
* If we matched the var to a unique index or DISTINCT clause, assume
260+
* there is exactly one match regardless of anything else. (This is
261+
* slightly bogus, since the index or clause's equality operator might be
262+
* different from ours, but it's much more likely to be right than
263+
* ignoring the information.)
262264
*/
263265
if (vardata->isunique&&vardata->rel&&vardata->rel->tuples >=1.0)
264266
return1.0 /vardata->rel->tuples;
@@ -389,10 +391,11 @@ var_eq_non_const(VariableStatData *vardata, Oid operator,
389391
boolisdefault;
390392

391393
/*
392-
* If we matched the var to a unique index, assume there is exactly one
393-
* match regardless of anything else. (This is slightly bogus, since the
394-
* index's equality operator might be different from ours, but it's more
395-
* likely to be right than ignoring the information.)
394+
* If we matched the var to a unique index or DISTINCT clause, assume
395+
* there is exactly one match regardless of anything else. (This is
396+
* slightly bogus, since the index or clause's equality operator might be
397+
* different from ours, but it's much more likely to be right than
398+
* ignoring the information.)
396399
*/
397400
if (vardata->isunique&&vardata->rel&&vardata->rel->tuples >=1.0)
398401
return1.0 /vardata->rel->tuples;
@@ -4128,10 +4131,11 @@ get_join_variables(PlannerInfo *root, List *args, SpecialJoinInfo *sjinfo,
41284131
*atttype, atttypmod: type data to pass to get_attstatsslot(). This is
41294132
*commonly the same as the exposed type of the variable argument,
41304133
*but can be different in binary-compatible-type cases.
4131-
*isunique: TRUE if we were able to match the var to a unique index,
4132-
*implying its values are unique for this query. (Caution: this
4133-
*should be trusted for statistical purposes only, since we do not
4134-
*check indimmediate.)
4134+
*isunique: TRUE if we were able to match the var to a unique index or a
4135+
*single-column DISTINCT clause, implying its values are unique for
4136+
*this query. (Caution: this should be trusted for statistical
4137+
*purposes only, since we do not check indimmediate nor verify that
4138+
*the exact same definition of equality applies.)
41354139
*
41364140
* Caller is responsible for doing ReleaseVariableStats() before exiting.
41374141
*/
@@ -4357,32 +4361,21 @@ examine_simple_variable(PlannerInfo *root, Var *var,
43574361
{
43584362
/*
43594363
* Plain subquery (not one that was converted to an appendrel).
4360-
*
4361-
* Punt if subquery uses set operations, GROUP BY, or DISTINCT --- any
4362-
* of these will mash underlying columns' stats beyond recognition.
4363-
* (Set ops are particularly nasty; if we forged ahead, we would
4364-
* return stats relevant to only the leftmost subselect...)
43654364
*/
43664365
Query*subquery=rte->subquery;
43674366
RelOptInfo*rel;
43684367
TargetEntry*ste;
43694368

4370-
if (subquery->setOperations||
4371-
subquery->groupClause||
4372-
subquery->distinctClause)
4373-
return;
4374-
43754369
/*
4376-
* If the sub-query originated from a view with the security_barrier
4377-
* attribute, we treat it as a black-box from outside of the view.
4378-
* This is probably a harsher restriction than necessary; it's
4379-
* certainly OK for the selectivity estimator (which is a C function,
4380-
* and therefore omnipotent anyway) to look at the statistics. But
4381-
* many selectivity estimators will happily *invoke the operator
4382-
* function* to try to work out a good estimate - and that's not OK.
4383-
* So for now, we do this.
4370+
* Punt if subquery uses set operations or GROUP BY, as these will
4371+
* mash underlying columns' stats beyond recognition. (Set ops are
4372+
* particularly nasty; if we forged ahead, we would return stats
4373+
* relevant to only the leftmost subselect...) DISTINCT is also
4374+
* problematic, but we check that later because there is a possibility
4375+
* of learning something even with it.
43844376
*/
4385-
if (rte->security_barrier)
4377+
if (subquery->setOperations||
4378+
subquery->groupClause)
43864379
return;
43874380

43884381
/*
@@ -4415,6 +4408,37 @@ examine_simple_variable(PlannerInfo *root, Var *var,
44154408
rte->eref->aliasname,var->varattno);
44164409
var= (Var*)ste->expr;
44174410

4411+
/*
4412+
* If subquery uses DISTINCT, we can't make use of any stats for the
4413+
* variable ... but, if it's the only DISTINCT column, we are entitled
4414+
* to consider it unique. We do the test this way so that it works
4415+
* for cases involving DISTINCT ON.
4416+
*/
4417+
if (subquery->distinctClause)
4418+
{
4419+
if (list_length(subquery->distinctClause)==1&&
4420+
targetIsInSortList(ste,InvalidOid,subquery->distinctClause))
4421+
vardata->isunique= true;
4422+
/* cannot go further */
4423+
return;
4424+
}
4425+
4426+
/*
4427+
* If the sub-query originated from a view with the security_barrier
4428+
* attribute, we must not look at the variable's statistics, though
4429+
* it seems all right to notice the existence of a DISTINCT clause.
4430+
* So stop here.
4431+
*
4432+
* This is probably a harsher restriction than necessary; it's
4433+
* certainly OK for the selectivity estimator (which is a C function,
4434+
* and therefore omnipotent anyway) to look at the statistics. But
4435+
* many selectivity estimators will happily *invoke the operator
4436+
* function* to try to work out a good estimate - and that's not OK.
4437+
* So for now, don't dig down for stats.
4438+
*/
4439+
if (rte->security_barrier)
4440+
return;
4441+
44184442
/* Can only handle a simple Var of subquery's query level */
44194443
if (var&&IsA(var,Var)&&
44204444
var->varlevelsup==0)
@@ -4513,10 +4537,10 @@ get_variable_numdistinct(VariableStatData *vardata, bool *isdefault)
45134537
}
45144538

45154539
/*
4516-
* If there is a unique index for the variable, assume it is unique no
4517-
* matter what pg_statistic says; the statistics could be out of date, or
4518-
* we might have found a partial unique index that proves the var is
4519-
* unique for this query.
4540+
* If there is a unique indexor DISTINCT clausefor the variable, assume
4541+
*it is unique nomatter what pg_statistic says; the statistics could be
4542+
*out of date, orwe might have found a partial unique index that proves
4543+
*the var isunique for this query.
45204544
*/
45214545
if (vardata->isunique)
45224546
stadistinct=-1.0;

‎src/include/utils/selfuncs.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,7 +74,7 @@ typedef struct VariableStatData
7474
Oidvartype;/* exposed type of expression */
7575
Oidatttype;/* type to pass to get_attstatsslot */
7676
int32atttypmod;/* typmod to pass to get_attstatsslot */
77-
boolisunique;/*true if matched to a unique index */
77+
boolisunique;/*matches unique index or DISTINCT clause */
7878
}VariableStatData;
7979

8080
#defineReleaseVariableStats(vardata) \

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp