Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit0e5e167

Browse files
committed
Collect and use element-frequency statistics for arrays.
This patch improves selectivity estimation for the array <@, &&, and @>(containment and overlaps) operators. It enables collection of statisticsabout individual array element values by ANALYZE, and introducesoperator-specific estimators that use these stats. In addition,ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"and "const <> ANY/ALL (array_column)" are estimated by treating them asvariants of the containment operators.Since we still collect scalar-style stats about the array values as awhole, the pg_stats view is expanded to show both these stats and thearray-style stats in separate columns. This creates an incompatible changein how stats for tsvector columns are displayed in pg_stats: the statsabout lexemes are now displayed in the array-related columns instead of theoriginal scalar-related columns.There are a few loose ends here, notably that it'd be nice to be able tosuppress either the scalar-style stats or the array-element stats forcolumns for which they're not useful. But the patch is in good enoughshape to commit for wider testing.Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
1 parent34c9784 commit0e5e167

File tree

24 files changed

+2341
-168
lines changed

24 files changed

+2341
-168
lines changed

‎doc/src/sgml/catalogs.sgml

Lines changed: 40 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -5354,9 +5354,9 @@
53545354
Column data values of the appropriate kind for the
53555355
<replaceable>N</>th <quote>slot</quote>, or null if the slot
53565356
kind does not store any data values. Each array's element
5357-
values are actually of the specific column's data type,so there
5358-
is no way to define these columns'type more specifically than
5359-
<type>anyarray</>.
5357+
values are actually of the specific column's data type,or a related
5358+
type such as an array's elementtype, so there is no way to define
5359+
these columns' type more specifically than<type>anyarray</>.
53605360
</entry>
53615361
</row>
53625362
</tbody>
@@ -8291,8 +8291,6 @@
82918291
<entry>
82928292
A list of the most common values in the column. (Null if
82938293
no values seem to be more common than any others.)
8294-
For some data types such as <type>tsvector</>, this is a list of
8295-
the most common element values rather than values of the type itself.
82968294
</entry>
82978295
</row>
82988296

@@ -8301,12 +8299,9 @@
83018299
<entry><type>real[]</type></entry>
83028300
<entry></entry>
83038301
<entry>
8304-
A list of the frequencies of the most common values or elements,
8302+
A list of the frequencies of the most common values,
83058303
i.e., number of occurrences of each divided by total number of rows.
83068304
(Null when <structfield>most_common_vals</structfield> is.)
8307-
For some data types such as <type>tsvector</>, it can also store some
8308-
additional information, making it longer than the
8309-
<structfield>most_common_vals</> array.
83108305
</entry>
83118306
</row>
83128307

@@ -8338,13 +8333,47 @@
83388333
type does not have a <literal>&lt;</> operator.)
83398334
</entry>
83408335
</row>
8336+
8337+
<row>
8338+
<entry><structfield>most_common_elems</structfield></entry>
8339+
<entry><type>anyarray</type></entry>
8340+
<entry></entry>
8341+
<entry>
8342+
A list of non-null element values most often appearing within values of
8343+
the column. (Null for scalar types.)
8344+
</entry>
8345+
</row>
8346+
8347+
<row>
8348+
<entry><structfield>most_common_elem_freqs</structfield></entry>
8349+
<entry><type>real[]</type></entry>
8350+
<entry></entry>
8351+
<entry>
8352+
A list of the frequencies of the most common element values, i.e., the
8353+
fraction of rows containing at least one instance of the given value.
8354+
Two or three additional values follow the per-element frequencies;
8355+
these are the minimum and maximum of the preceding per-element
8356+
frequencies, and optionally the frequency of null elements.
8357+
(Null when <structfield>most_common_elems</structfield> is.)
8358+
</entry>
8359+
</row>
8360+
8361+
<row>
8362+
<entry><structfield>elem_count_histogram</structfield></entry>
8363+
<entry><type>real[]</type></entry>
8364+
<entry></entry>
8365+
<entry>
8366+
A histogram of the counts of distinct non-null element values within the
8367+
values of the column, followed by the average number of distinct
8368+
non-null elements. (Null for scalar types.)
8369+
</entry>
8370+
</row>
83418371
</tbody>
83428372
</tgroup>
83438373
</table>
83448374

83458375
<para>
8346-
The maximum number of entries in the <structfield>most_common_vals</>
8347-
and <structfield>histogram_bounds</> arrays can be set on a
8376+
The maximum number of entries in the array fields can be controlled on a
83488377
column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>
83498378
command, or globally by setting the
83508379
<xref linkend="guc-default-statistics-target"> run-time parameter.

‎src/backend/catalog/heap.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1182,7 +1182,7 @@ heap_create_with_catalog(const char *relname,
11821182
F_ARRAY_SEND,/* array send (bin) proc */
11831183
InvalidOid,/* typmodin procedure - none */
11841184
InvalidOid,/* typmodout procedure - none */
1185-
InvalidOid,/* analyze procedure - default */
1185+
F_ARRAY_TYPANALYZE,/*arrayanalyze procedure */
11861186
new_type_oid,/* array element type - the rowtype */
11871187
true,/* yes, this is an array type */
11881188
InvalidOid,/* this has no array type */

‎src/backend/catalog/system_views.sql

Lines changed: 34 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -117,29 +117,54 @@ CREATE VIEW pg_stats AS
117117
stawidthAS avg_width,
118118
stadistinctAS n_distinct,
119119
CASE
120-
WHEN stakind1IN (1,4) THEN stavalues1
121-
WHEN stakind2IN (1,4) THEN stavalues2
122-
WHEN stakind3IN (1,4) THEN stavalues3
123-
WHEN stakind4IN (1,4) THEN stavalues4
120+
WHEN stakind1=1 THEN stavalues1
121+
WHEN stakind2=1 THEN stavalues2
122+
WHEN stakind3=1 THEN stavalues3
123+
WHEN stakind4=1 THEN stavalues4
124+
WHEN stakind5=1 THEN stavalues5
124125
ENDAS most_common_vals,
125126
CASE
126-
WHEN stakind1IN (1,4) THEN stanumbers1
127-
WHEN stakind2IN (1,4) THEN stanumbers2
128-
WHEN stakind3IN (1,4) THEN stanumbers3
129-
WHEN stakind4IN (1,4) THEN stanumbers4
127+
WHEN stakind1=1 THEN stanumbers1
128+
WHEN stakind2=1 THEN stanumbers2
129+
WHEN stakind3=1 THEN stanumbers3
130+
WHEN stakind4=1 THEN stanumbers4
131+
WHEN stakind5=1 THEN stanumbers5
130132
ENDAS most_common_freqs,
131133
CASE
132134
WHEN stakind1=2 THEN stavalues1
133135
WHEN stakind2=2 THEN stavalues2
134136
WHEN stakind3=2 THEN stavalues3
135137
WHEN stakind4=2 THEN stavalues4
138+
WHEN stakind5=2 THEN stavalues5
136139
ENDAS histogram_bounds,
137140
CASE
138141
WHEN stakind1=3 THEN stanumbers1[1]
139142
WHEN stakind2=3 THEN stanumbers2[1]
140143
WHEN stakind3=3 THEN stanumbers3[1]
141144
WHEN stakind4=3 THEN stanumbers4[1]
142-
ENDAS correlation
145+
WHEN stakind5=3 THEN stanumbers5[1]
146+
ENDAS correlation,
147+
CASE
148+
WHEN stakind1=4 THEN stavalues1
149+
WHEN stakind2=4 THEN stavalues2
150+
WHEN stakind3=4 THEN stavalues3
151+
WHEN stakind4=4 THEN stavalues4
152+
WHEN stakind5=4 THEN stavalues5
153+
ENDAS most_common_elems,
154+
CASE
155+
WHEN stakind1=4 THEN stanumbers1
156+
WHEN stakind2=4 THEN stanumbers2
157+
WHEN stakind3=4 THEN stanumbers3
158+
WHEN stakind4=4 THEN stanumbers4
159+
WHEN stakind5=4 THEN stanumbers5
160+
ENDAS most_common_elem_freqs,
161+
CASE
162+
WHEN stakind1=5 THEN stanumbers1
163+
WHEN stakind2=5 THEN stanumbers2
164+
WHEN stakind3=5 THEN stanumbers3
165+
WHEN stakind4=5 THEN stanumbers4
166+
WHEN stakind5=5 THEN stanumbers5
167+
ENDAS elem_count_histogram
143168
FROM pg_statistic sJOIN pg_class cON (c.oid=s.starelid)
144169
JOIN pg_attribute aON (c.oid= attrelidAND attnum=s.staattnum)
145170
LEFT JOIN pg_namespace nON (n.oid=c.relnamespace)

‎src/backend/commands/analyze.c

Lines changed: 5 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -110,8 +110,6 @@ static void update_attstats(Oid relid, bool inh,
110110
staticDatumstd_fetch_func(VacAttrStatsPstats,intrownum,bool*isNull);
111111
staticDatumind_fetch_func(VacAttrStatsPstats,intrownum,bool*isNull);
112112

113-
staticboolstd_typanalyze(VacAttrStats*stats);
114-
115113

116114
/*
117115
*analyze_rel() -- analyze one relation
@@ -476,8 +474,7 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
476474
for (i=0;i<attr_cnt;i++)
477475
{
478476
VacAttrStats*stats=vacattrstats[i];
479-
AttributeOpts*aopt=
480-
get_attribute_options(onerel->rd_id,stats->attr->attnum);
477+
AttributeOpts*aopt;
481478

482479
stats->rows=rows;
483480
stats->tupDesc=onerel->rd_att;
@@ -490,11 +487,12 @@ do_analyze_rel(Relation onerel, VacuumStmt *vacstmt, bool inh)
490487
* If the appropriate flavor of the n_distinct option is
491488
* specified, override with the corresponding value.
492489
*/
490+
aopt=get_attribute_options(onerel->rd_id,stats->attr->attnum);
493491
if (aopt!=NULL)
494492
{
495-
float8n_distinct=
496-
inh ?aopt->n_distinct_inherited :aopt->n_distinct;
493+
float8n_distinct;
497494

495+
n_distinct=inh ?aopt->n_distinct_inherited :aopt->n_distinct;
498496
if (n_distinct!=0.0)
499497
stats->stadistinct=n_distinct;
500498
}
@@ -1794,7 +1792,7 @@ static intcompare_mcvs(const void *a, const void *b);
17941792
/*
17951793
* std_typanalyze -- the default type-specific typanalyze function
17961794
*/
1797-
staticbool
1795+
bool
17981796
std_typanalyze(VacAttrStats*stats)
17991797
{
18001798
Form_pg_attributeattr=stats->attr;

‎src/backend/commands/typecmds.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -609,7 +609,7 @@ DefineType(List *names, List *parameters)
609609
F_ARRAY_SEND,/* send procedure */
610610
typmodinOid,/* typmodin procedure */
611611
typmodoutOid,/* typmodout procedure */
612-
InvalidOid,/* analyze procedure - default */
612+
F_ARRAY_TYPANALYZE,/* analyze procedure */
613613
typoid,/* element type ID */
614614
true,/* yes this is an array type */
615615
InvalidOid,/* no further array type */
@@ -1140,7 +1140,7 @@ DefineEnum(CreateEnumStmt *stmt)
11401140
F_ARRAY_SEND,/* send procedure */
11411141
InvalidOid,/* typmodin procedure - none */
11421142
InvalidOid,/* typmodout procedure - none */
1143-
InvalidOid,/* analyze procedure - default */
1143+
F_ARRAY_TYPANALYZE,/* analyze procedure */
11441144
enumTypeOid,/* element type ID */
11451145
true,/* yes this is an array type */
11461146
InvalidOid,/* no further array type */
@@ -1450,7 +1450,7 @@ DefineRange(CreateRangeStmt *stmt)
14501450
F_ARRAY_SEND,/* send procedure */
14511451
InvalidOid,/* typmodin procedure - none */
14521452
InvalidOid,/* typmodout procedure - none */
1453-
InvalidOid,/* analyze procedure - default */
1453+
F_ARRAY_TYPANALYZE,/* analyze procedure */
14541454
typoid,/* element type ID */
14551455
true,/* yes this is an array type */
14561456
InvalidOid,/* no further array type */

‎src/backend/tsearch/ts_selfuncs.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -220,6 +220,10 @@ mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
220220
/*
221221
* There should be two more Numbers than Values, because the last two
222222
* cells are taken for minimal and maximal frequency. Punt if not.
223+
*
224+
* (Note: the MCELEM statistics slot definition allows for a third extra
225+
* number containing the frequency of nulls, but we're not expecting that
226+
* to appear for a tsvector column.)
223227
*/
224228
if (nnumbers!=nmcelem+2)
225229
returntsquery_opr_selec_no_stats(query);

‎src/backend/tsearch/ts_typanalyze.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -377,6 +377,11 @@ compute_tsvector_stats(VacAttrStats *stats,
377377
* able to find out the minimal and maximal frequency without
378378
* going through all the values. We keep those two extra
379379
* frequencies in two extra cells in mcelem_freqs.
380+
*
381+
* (Note: the MCELEM statistics slot definition allows for a third
382+
* extra number containing the frequency of nulls, but we don't
383+
* create that for a tsvector column, since null elements aren't
384+
* possible.)
380385
*/
381386
mcelem_values= (Datum*)palloc(num_mcelem*sizeof(Datum));
382387
mcelem_freqs= (float4*)palloc((num_mcelem+2)*sizeof(float4));

‎src/backend/utils/adt/Makefile

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,8 @@ override CFLAGS+= -mieee
1515
endif
1616
endif
1717

18-
OBJS = acl.o arrayfuncs.o array_userfuncs.o arrayutils.o bool.o\
18+
OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o\
19+
array_userfuncs.o arrayutils.o bool.o\
1920
cash.o char.o date.o datetime.o datum.o domains.o\
2021
enum.o float.o format_type.o\
2122
geo_ops.o geo_selfuncs.o int.o int8.o json.o like.o lockfuncs.o\

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp