NotificationsYou must be signed in to change notification settings
Fork5
Star26

Commita536ed5

committed

Make use of statistics on index expressions. There are still some

corner cases that could stand improvement, but it does all the basicstuff. A byproduct is that the selectivity routines are no longerconstrained to working on simple Vars; we might in future be able toimprove the behavior for subexpressions that don't match indexes.

1 parentd372bba commita536ed5Copy full SHA for a536ed5

File tree

5 files changed

+1062

-979

lines changed

src
- backend
  - optimizer
    - path
      - costsize.c
    - util
      - relnode.c
  - utils/adt
    - selfuncs.c
- include
  - optimizer
    - pathnode.h
  - utils
    - selfuncs.h

5 files changed

+1062

-979

lines changed

`‎src/backend/optimizer/path/costsize.c`

Lines changed: 3 additions & 178 deletions

Original file line number	Diff line number	Diff line change
`@@ -49,7 +49,7 @@`
`49`	`49`	`* Portions Copyright (c) 1994, Regents of the University of California`
`50`	`50`	`*`
`51`	`51`	`* IDENTIFICATION`
`52`		`- * $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.124 2004/02/03 17:34:03 tgl Exp $`
	`52`	`+ * $PostgreSQL: pgsql/src/backend/optimizer/path/costsize.c,v 1.125 2004/02/17 00:52:53 tgl Exp $`
`53`	`53`	`*`
`54`	`54`	`*-------------------------------------------------------------------------`
`55`	`55`	`*/`
`@@ -102,8 +102,6 @@ boolenable_mergejoin = true;`
`102`	`102`	`boolenable_hashjoin= true;`
`103`	`103`
`104`	`104`
`105`		`-staticSelectivityestimate_hash_bucketsize(Queryroot,Varvar,`
`106`		`-intnbuckets);`
`107`	`105`	`staticboolcost_qual_eval_walker(Nodenode,QualCosttotal);`
`108`	`106`	`staticSelectivityapprox_selectivity(Queryroot,Listquals,`
`109`	`107`	`JoinTypejointype);`
`@@ -1152,7 +1150,7 @@ cost_hashjoin(HashPath path, Query root)`
`1152`	`1150`	`/* not cached yet */`
`1153`	`1151`	`thisbucketsize=`
`1154`	`1152`	`estimate_hash_bucketsize(root,`
`1155`		`- (Var*)get_rightop(restrictinfo->clause),`
	`1153`	`+get_rightop(restrictinfo->clause),`
`1156`	`1154`	`virtualbuckets);`
`1157`	`1155`	`restrictinfo->right_bucketsize=thisbucketsize;`
`1158`	`1156`	`}`
`@@ -1168,7 +1166,7 @@ cost_hashjoin(HashPath path, Query root)`
`1168`	`1166`	`/* not cached yet */`
`1169`	`1167`	`thisbucketsize=`
`1170`	`1168`	`estimate_hash_bucketsize(root,`
`1171`		`-(Var*)get_leftop(restrictinfo->clause),`
	`1169`	`+get_leftop(restrictinfo->clause),`
`1172`	`1170`	`virtualbuckets);`
`1173`	`1171`	`restrictinfo->left_bucketsize=thisbucketsize;`
`1174`	`1172`	`}`
`@@ -1249,179 +1247,6 @@ cost_hashjoin(HashPath path, Query root)`
`1249`	`1247`	`path->jpath.path.total_cost=startup_cost+run_cost;`
`1250`	`1248`	`}`
`1251`	`1249`
`1252`		`-/*`
`1253`		`- * Estimate hash bucketsize fraction (ie, number of entries in a bucket`
`1254`		`- * divided by total tuples in relation) if the specified Var is used`
`1255`		`- * as a hash key.`
`1256`		`- *`
`1257`		`- * XXX This is really pretty bogus since we're effectively assuming that the`
`1258`		`- * distribution of hash keys will be the same after applying restriction`
`1259`		`- * clauses as it was in the underlying relation. However, we are not nearly`
`1260`		`- * smart enough to figure out how the restrict clauses might change the`
`1261`		`- * distribution, so this will have to do for now.`
`1262`		`- *`
`1263`		`- * We are passed the number of buckets the executor will use for the given`
`1264`		`- * input relation.If the data were perfectly distributed, with the same`
`1265`		`- * number of tuples going into each available bucket, then the bucketsize`
`1266`		`- * fraction would be 1/nbuckets. But this happy state of affairs will occur`
`1267`		`- * only if (a) there are at least nbuckets distinct data values, and (b)`
`1268`		`- * we have a not-too-skewed data distribution.Otherwise the buckets will`
`1269`		`- * be nonuniformly occupied. If the other relation in the join has a key`
`1270`		`- * distribution similar to this one's, then the most-loaded buckets are`
`1271`		`- * exactly those that will be probed most often. Therefore, the "average"`
`1272`		`- * bucket size for costing purposes should really be taken as something close`
`1273`		`- * to the "worst case" bucket size. We try to estimate this by adjusting the`
`1274`		`- * fraction if there are too few distinct data values, and then scaling up`
`1275`		`- * by the ratio of the most common value's frequency to the average frequency.`
`1276`		`- *`
`1277`		`- * If no statistics are available, use a default estimate of 0.1. This will`
`1278`		`- * discourage use of a hash rather strongly if the inner relation is large,`
`1279`		`- * which is what we want. We do not want to hash unless we know that the`
`1280`		`- * inner rel is well-dispersed (or the alternatives seem much worse).`
`1281`		`- */`
`1282`		`-staticSelectivity`
`1283`		`-estimate_hash_bucketsize(Queryroot,Varvar,intnbuckets)`
`1284`		`-{`
`1285`		`-Oidrelid;`
`1286`		`-RelOptInfo*rel;`
`1287`		`-HeapTupletuple;`
`1288`		`-Form_pg_statisticstats;`
`1289`		`-doubleestfract,`
`1290`		`-ndistinct,`
`1291`		`-mcvfreq,`
`1292`		`-avgfreq;`
`1293`		`-float4*numbers;`
`1294`		`-intnnumbers;`
`1295`		`-`
`1296`		`-/* Ignore any binary-compatible relabeling */`
`1297`		`-if (var&&IsA(var,RelabelType))`
`1298`		`-var= (Var) ((RelabelType)var)->arg;`
`1299`		`-`
`1300`		`-/*`
`1301`		`- * Lookup info about var's relation and attribute; if none available,`
`1302`		`- * return default estimate.`
`1303`		`- */`
`1304`		`-if (var==NULL\|\| !IsA(var,Var))`
`1305`		`-return0.1;`
`1306`		`-`
`1307`		`-relid=getrelid(var->varno,root->rtable);`
`1308`		`-if (relid==InvalidOid)`
`1309`		`-return0.1;`
`1310`		`-`
`1311`		`-rel=find_base_rel(root,var->varno);`
`1312`		`-`
`1313`		`-if (rel->tuples <=0.0\|\|rel->rows <=0.0)`
`1314`		`-return0.1;/* ensure we can divide below */`
`1315`		`-`
`1316`		`-tuple=SearchSysCache(STATRELATT,`
`1317`		`-ObjectIdGetDatum(relid),`
`1318`		`-Int16GetDatum(var->varattno),`
`1319`		`-0,0);`
`1320`		`-if (!HeapTupleIsValid(tuple))`
`1321`		`-{`
`1322`		`-/*`
`1323`		`- * If the attribute is known unique because of an index,`
`1324`		`- * we can treat it as well-distributed.`
`1325`		`- */`
`1326`		`-if (has_unique_index(rel,var->varattno))`
`1327`		`-return1.0 / (double)nbuckets;`
`1328`		`-`
`1329`		`-/*`
`1330`		`- * Perhaps the Var is a system attribute; if so, it will have no`
`1331`		`- * entry in pg_statistic, but we may be able to guess something`
`1332`		`- * about its distribution anyway.`
`1333`		`- */`
`1334`		`-switch (var->varattno)`
`1335`		`-{`
`1336`		`-caseObjectIdAttributeNumber:`
`1337`		`-caseSelfItemPointerAttributeNumber:`
`1338`		`-/* these are unique, so buckets should be well-distributed */`
`1339`		`-return1.0 / (double)nbuckets;`
`1340`		`-caseTableOidAttributeNumber:`
`1341`		`-/* hashing this is a terrible idea... */`
`1342`		`-return1.0;`
`1343`		`-}`
`1344`		`-return0.1;`
`1345`		`-}`
`1346`		`-stats= (Form_pg_statistic)GETSTRUCT(tuple);`
`1347`		`-`
`1348`		`-/*`
`1349`		`- * Obtain number of distinct data values in raw relation.`
`1350`		`- */`
`1351`		`-ndistinct=stats->stadistinct;`
`1352`		`-if (ndistinct<0.0)`
`1353`		`-ndistinct=-ndistinct*rel->tuples;`
`1354`		`-`
`1355`		`-if (ndistinct <=0.0)/* ensure we can divide */`
`1356`		`-{`
`1357`		`-ReleaseSysCache(tuple);`
`1358`		`-return0.1;`
`1359`		`-}`
`1360`		`-`
`1361`		`-/* Also compute avg freq of all distinct data values in raw relation */`
`1362`		`-avgfreq= (1.0-stats->stanullfrac) /ndistinct;`
`1363`		`-`
`1364`		`-/*`
`1365`		`- * Adjust ndistinct to account for restriction clauses. Observe we`
`1366`		`- * are assuming that the data distribution is affected uniformly by`
`1367`		`- * the restriction clauses!`
`1368`		`- *`
`1369`		`- * XXX Possibly better way, but much more expensive: multiply by`
`1370`		`- * selectivity of rel's restriction clauses that mention the target`
`1371`		`- * Var.`
`1372`		`- */`
`1373`		`-ndistinct *=rel->rows /rel->tuples;`
`1374`		`-`
`1375`		`-/*`
`1376`		`- * Initial estimate of bucketsize fraction is 1/nbuckets as long as`
`1377`		`- * the number of buckets is less than the expected number of distinct`
`1378`		`- * values; otherwise it is 1/ndistinct.`
`1379`		`- */`
`1380`		`-if (ndistinct> (double)nbuckets)`
`1381`		`-estfract=1.0 / (double)nbuckets;`
`1382`		`-else`
`1383`		`-estfract=1.0 /ndistinct;`
`1384`		`-`
`1385`		`-/*`
`1386`		`- * Look up the frequency of the most common value, if available.`
`1387`		`- */`
`1388`		`-mcvfreq=0.0;`
`1389`		`-`
`1390`		`-if (get_attstatsslot(tuple,var->vartype,var->vartypmod,`
`1391`		`-STATISTIC_KIND_MCV,InvalidOid,`
`1392`		`-NULL,NULL,&numbers,&nnumbers))`
`1393`		`-{`
`1394`		`-/*`
`1395`		`- * The first MCV stat is for the most common value.`
`1396`		`- */`
`1397`		`-if (nnumbers>0)`
`1398`		`-mcvfreq=numbers[0];`
`1399`		`-free_attstatsslot(var->vartype,NULL,0,`
`1400`		`-numbers,nnumbers);`
`1401`		`-}`
`1402`		`-`
`1403`		`-/*`
`1404`		`- * Adjust estimated bucketsize upward to account for skewed`
`1405`		`- * distribution.`
`1406`		`- */`
`1407`		`-if (avgfreq>0.0&&mcvfreq>avgfreq)`
`1408`		`-estfract *=mcvfreq /avgfreq;`
`1409`		`-`
`1410`		`-/*`
`1411`		`- * Clamp bucketsize to sane range (the above adjustment could easily`
`1412`		`- * produce an out-of-range result). We set the lower bound a little`
`1413`		`- * above zero, since zero isn't a very sane result.`
`1414`		`- */`
`1415`		`-if (estfract<1.0e-6)`
`1416`		`-estfract=1.0e-6;`
`1417`		`-elseif (estfract>1.0)`
`1418`		`-estfract=1.0;`
`1419`		`-`
`1420`		`-ReleaseSysCache(tuple);`
`1421`		`-`
`1422`		`-return (Selectivity)estfract;`
`1423`		`-}`
`1424`		`-`
`1425`	`1250`
`1426`	`1251`	`/*`
`1427`	`1252`	`* cost_qual_eval`

`‎src/backend/optimizer/util/relnode.c`

Lines changed: 2 additions & 6 deletions

Original file line number	Diff line number	Diff line change
`@@ -8,7 +8,7 @@`
`8`	`8`	`*`
`9`	`9`	`*`
`10`	`10`	`* IDENTIFICATION`
`11`		`- * $PostgreSQL: pgsql/src/backend/optimizer/util/relnode.c,v 1.54 2003/12/08 18:19:58 tgl Exp $`
	`11`	`+ * $PostgreSQL: pgsql/src/backend/optimizer/util/relnode.c,v 1.55 2004/02/17 00:52:53 tgl Exp $`
`12`	`12`	`*`
`13`	`13`	`*-------------------------------------------------------------------------`
`14`	`14`	`*/`
`@@ -214,12 +214,8 @@ find_base_rel(Query *root, int relid)`
`214`	`214`	`* find_join_rel`
`215`	`215`	`* Returns relation entry corresponding to 'relids' (a set of RT indexes),`
`216`	`216`	`* or NULL if none exists. This is for join relations.`
`217`		`- *`
`218`		`- * Note: there is probably no good reason for this to be called from`
`219`		`- * anywhere except build_join_rel, but keep it as a separate routine`
`220`		`- * just in case.`
`221`	`217`	`*/`
`222`		`-staticRelOptInfo*`
	`218`	`+RelOptInfo*`
`223`	`219`	`find_join_rel(Query*root,Relidsrelids)`
`224`	`220`	`{`
`225`	`221`	`List*joinrels;`

0 commit comments

Comments

(0)

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commita536ed5

File tree

5 files changed

5 files changed

`‎src/backend/optimizer/path/costsize.c`

`‎src/backend/optimizer/util/relnode.c`

0 commit comments