Commit a63b29a
Minor improvements for the multivariate MCV lists
The MCV build should always call get_mincount_for_mcv_list(), as there is no other logic to decide whether the MCV list represents all the data. So just remove the (ngroups > nitems) condition.

Also, when building MCV lists, the number of items was limited by the statistics target (i.e. up to 10000). But when deserializing the MCV list, a different value (8192) was used to check the input, causing an error. Simply ensure that the same value is used in both places.

This should have been included in 7300a69, but I forgot to include it in that commit.
1 parent 7300a69, commit a63b29a

2 files changed: +33 -35 lines changed

src/backend/statistics/mcv.c

Lines changed: 31 additions & 33 deletions
@@ -155,15 +155,17 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
                 numattrs,
                 ngroups,
                 nitems;
-
-    AttrNumber *attnums = build_attnums_array(attrs, &numattrs);
-
+    AttrNumber *attnums;
+    double      mincount;
     SortItem   *items;
     SortItem   *groups;
     MCVList    *mcvlist = NULL;
+    MultiSortSupport mss;
+
+    attnums = build_attnums_array(attrs, &numattrs);

     /* comparator for all the columns */
-    MultiSortSupport mss = build_mss(stats, numattrs);
+    mss = build_mss(stats, numattrs);

     /* sort the rows */
     items = build_sorted_items(numrows, &nitems, rows, stats[0]->tupDesc,
@@ -196,33 +198,28 @@ statext_mcv_build(int numrows, HeapTuple *rows, Bitmapset *attrs,
      * per-column frequencies, as if the columns were independent).
      *
      * Using the same algorithm might exclude items that are close to the
-     * "average" frequency. But it does not say whether the frequency is
-     * close to base frequency or not. We also need to consider unexpectedly
-     * uncommon items (compared to base frequency), and the single-column
-     * algorithm ignores that entirely.
+     * "average" frequency of the sample. But that does not say whether the
+     * observed frequency is close to the base frequency or not. We also
+     * need to consider unexpectedly uncommon items (again, compared to the
+     * base frequency), and the single-column algorithm does not have to.
      *
-     * If we can fit all the items onto the MCV list, do that. Otherwise
-     * use get_mincount_for_mcv_list to decide which items to keep in the
-     * MCV list, based on the number of occurrences in the sample.
+     * We simply decide how many items to keep by computing minimum count
+     * using get_mincount_for_mcv_list() and then keep all items that seem
+     * to be more common than that.
      */
-    if (ngroups > nitems)
-    {
-        double      mincount;
+    mincount = get_mincount_for_mcv_list(numrows, totalrows);

-        mincount = get_mincount_for_mcv_list(numrows, totalrows);
-
-        /*
-         * Walk the groups until we find the first group with a count below
-         * the mincount threshold (the index of that group is the number of
-         * groups we want to keep).
-         */
-        for (i = 0; i < nitems; i++)
+    /*
+     * Walk the groups until we find the first group with a count below
+     * the mincount threshold (the index of that group is the number of
+     * groups we want to keep).
+     */
+    for (i = 0; i < nitems; i++)
+    {
+        if (groups[i].count < mincount)
         {
-            if (groups[i].count < mincount)
-            {
-                nitems = i;
-                break;
-            }
+            nitems = i;
+            break;
         }
     }
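
Reviewer note: to make the control flow after this hunk easier to follow, here is a minimal standalone sketch of the truncation loop. The `Group` struct and `groups_to_keep()` are hypothetical stand-ins (the real code walks `SortItem` groups and assigns `nitems` in place), and the threshold is passed in as a plain argument because the body of `get_mincount_for_mcv_list()` is not part of this diff. The sketch assumes the groups are already sorted by count in descending order, which the "first group below the threshold" logic appears to rely on.

```c
#include <stddef.h>

/* Hypothetical stand-in for a group of identical rows; only the count matters here. */
typedef struct
{
    int         count;
} Group;

/*
 * Return the number of leading groups to keep: the index of the first group
 * whose count is below mincount, or nitems if no group is below it.
 * Assumes groups[] is sorted by count, largest first.
 */
static int
groups_to_keep(const Group *groups, int nitems, double mincount)
{
    int         i;

    for (i = 0; i < nitems; i++)
    {
        if (groups[i].count < mincount)
            return i;
    }

    return nitems;
}
```

With counts 50, 20, 9, 3, 1 and an (arbitrary, illustrative) threshold of 5.0, the function keeps the first three groups. Since the loop now runs unconditionally, a threshold that no group falls below simply leaves every group in the list.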

@@ -469,11 +466,12 @@ statext_mcv_load(Oid mvoid)
  * Each attribute has to be processed separately, as we may be mixing different
  * datatypes, with different sort operators, etc.
  *
- * We use uint16 values for the indexes in step (3), as we currently don't allow
- * more than 8k MCV items anyway, although that's mostly arbitrary limit. We might
- * increase this to 65k and still fit into uint16. Furthermore, this limit is on
- * the number of distinct values per column, and we usually have few of those
- * (and various combinations of them for the those MCV list). So uint16 seems fine.
+ * We use uint16 values for the indexes in step (3), as the number of MCV items
+ * is limited by the statistics target (which is capped to 10k at the moment).
+ * We might increase this to 65k and still fit into uint16, so there's a bit of
+ * slack. Furthermore, this limit is on the number of distinct values per column,
+ * and we usually have few of those (and various combinations of them for the
+ * those MCV list). So uint16 seems fine for now.
  *
  * We don't really expect the serialization to save as much space as for
  * histograms, as we are not doing any bucket splits (which is the source
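
Reviewer note: the comment above argues that per-column indexes fit into uint16 because the number of distinct values per column is bounded by the item limit. The following is a rough sketch of that idea, not the serialization code from mcv.c: it uses plain `int` values instead of Datums, and the helper names `sort_and_dedup()` and `value_to_index()` are hypothetical. It builds a sorted array of distinct values and maps each value to its position in that array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two ints, usable with qsort() and bsearch(). */
static int
cmp_int(const void *a, const void *b)
{
    int         x = *(const int *) a;
    int         y = *(const int *) b;

    return (x > y) - (x < y);
}

/* Sort values[] in place, drop duplicates, and return the distinct count. */
static int
sort_and_dedup(int *values, int n)
{
    int         i;
    int         ndistinct;

    if (n <= 0)
        return 0;

    qsort(values, (size_t) n, sizeof(int), cmp_int);

    ndistinct = 1;
    for (i = 1; i < n; i++)
    {
        if (values[i] != values[ndistinct - 1])
            values[ndistinct++] = values[i];
    }

    return ndistinct;
}

/*
 * Look up value in the sorted, deduplicated dict[] and store its position
 * in *index. Returns 1 on success, 0 if the value is missing or the largest
 * possible position would not fit into a uint16.
 */
static int
value_to_index(const int *dict, int ndistinct, int value, uint16_t *index)
{
    const int  *hit;

    if (ndistinct <= 0 || ndistinct - 1 > UINT16_MAX)
        return 0;

    hit = bsearch(&value, dict, (size_t) ndistinct, sizeof(int), cmp_int);
    if (hit == NULL)
        return 0;

    *index = (uint16_t) (hit - dict);
    return 1;
}
```

With at most 10000 items there can be at most 10000 distinct values per column, so the largest index is 9999, well below `UINT16_MAX` (65535).
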
@@ -1322,7 +1320,7 @@ pg_mcv_list_send(PG_FUNCTION_ARGS)
  * somewhat wasteful as we could do with just a single bit, thus reducing
  * the size to ~1/8. It would also allow us to combine bitmaps simply using
  * & and |, which should be faster than min/max. The bitmaps are fairly
- * small, though (as we cap the MCV list size to 8k items).
+ * small, though (thanks to the cap on the MCV list size).
  */
 static bool *
 mcv_get_match_bitmap(PlannerInfo *root, List *clauses,

src/include/statistics/statistics.h

Lines changed: 2 additions & 2 deletions
@@ -82,8 +82,8 @@ typedef struct MVDependencies
 #define STATS_MCV_MAGIC         0xE1A651C2  /* marks serialized bytea */
 #define STATS_MCV_TYPE_BASIC    1           /* basic MCV list type */

-/* max items in MCV list (mostly arbitrary number) */
-#define STATS_MCVLIST_MAX_ITEMS        8192
+/* max items in MCV list (should be equal to max default_statistics_target) */
+#define STATS_MCVLIST_MAX_ITEMS        10000

 /*
  * Multivariate MCV (most-common value) lists
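
Reviewer note: the mismatch described in the commit message can be illustrated with a small sketch. The validation code in the deserialization path is not shown in this diff, so `nitems_accepted()` and the two `SKETCH_*` constants below are hypothetical; only the values 8192 and 10000 come from the commit itself.

```c
#include <stdbool.h>

/* Values taken from the diff; the names are illustrative only. */
#define SKETCH_OLD_MAX_ITEMS    8192
#define SKETCH_NEW_MAX_ITEMS    10000

/*
 * Hypothetical input check: accept a serialized list only if its item
 * count is positive and does not exceed the given limit.
 */
static bool
nitems_accepted(int nitems, int max_items)
{
    return nitems > 0 && nitems <= max_items;
}
```

A list built with the maximum statistics target may hold 10000 items. Under this sketch, `nitems_accepted(10000, SKETCH_OLD_MAX_ITEMS)` is false while `nitems_accepted(10000, SKETCH_NEW_MAX_ITEMS)` is true, which matches the error the commit message describes and the reason both sides now share one limit.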
