Commit 3d3bf62
Omit null rows when setting the threshold for what's a most-common value.
As with the previous patch, large numbers of null rows could skew this calculation unfavorably, causing us to discard values that have a legitimate claim to be MCVs, since our definition of MCV is that it's most common among the non-null population of the column. Hence, make the numerator of avgcount be the number of non-null sample values not the number of sample rows; likewise for maxmincount in the compute_scalar_stats variant.

Also, make the denominator be the number of distinct values actually observed in the sample, rather than reversing it back out of the computed stadistinct. This avoids depending on the accuracy of the Haas-Stokes approximation, and really it's what we want anyway; the threshold should depend only on what we see in the sample, not on what we extrapolate about the contents of the whole column.

Alex Shulgin, reviewed by Tomas Vondra and myself
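The effect of the numerator change can be illustrated with a small standalone C function that mirrors the threshold logic in the compute_scalar_stats hunk. This is a minimal sketch, not PostgreSQL code. `mcv_min_count` and its parameters are hypothetical. The sketch isolates the numerator change by holding the denominator d fixed; the pre-commit code also derived its denominator differently, from stadistinct.

```c
#include <assert.h>

/*
 * Hypothetical standalone version of the MCV cutoff in compute_scalar_stats.
 * numerator: samplerows before this commit, values_cnt (non-null) after it.
 * d:         number of distinct values observed (held fixed here).
 * num_bins:  statistics target K.
 */
static double
mcv_min_count(double numerator, double d, int num_bins)
{
	double		avgcount,
				mincount,
				maxmincount;

	assert(d > 0 && num_bins > 0);

	/* estimate # occurrences in sample of a typical value */
	avgcount = numerator / d;
	/* set minimum threshold count to store a value */
	mincount = avgcount * 1.25;
	if (mincount < 2)
		mincount = 2;
	/* don't let threshold exceed 1/K, however */
	maxmincount = numerator / (double) num_bins;
	if (mincount > maxmincount)
		mincount = maxmincount;
	return mincount;
}
```

Consider a sample of 30000 rows in which 27000 are null, with 200 distinct values observed and a statistics target of 100. Using all sample rows as the numerator gives a cutoff of 187.5 occurrences, so a value seen 100 times (about 3% of the non-null sample) would be rejected. Using the 3000 non-null values gives a cutoff of 18.75, which that value clears easily.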
1 parent 5cb8826 commit 3d3bf62


1 file changed (+9, -11 lines)


src/backend/commands/analyze.c

Lines changed: 9 additions & 11 deletions
@@ -2133,14 +2133,13 @@ compute_distinct_stats(VacAttrStatsP stats,
 		}
 		else
 		{
-			double		ndistinct = stats->stadistinct;
+			/* d here is the same as d in the Haas-Stokes formula */
+			int			d = nonnull_cnt - summultiple + nmultiple;
 			double		avgcount,
 						mincount;
 
-			if (ndistinct < 0)
-				ndistinct = -ndistinct * totalrows;
-			/* estimate # of occurrences in sample of a typical value */
-			avgcount = (double) samplerows / ndistinct;
+			/* estimate # occurrences in sample of a typical nonnull value */
+			avgcount = (double) nonnull_cnt / (double) d;
 			/* set minimum threshold count to store a value */
 			mincount = avgcount * 1.25;
 			if (mincount < 2)
@@ -2494,21 +2493,20 @@ compute_scalar_stats(VacAttrStatsP stats,
 		}
 		else
 		{
-			double		ndistinct = stats->stadistinct;
+			/* d here is the same as d in the Haas-Stokes formula */
+			int			d = ndistinct + toowide_cnt;
 			double		avgcount,
 						mincount,
 						maxmincount;
 
-			if (ndistinct < 0)
-				ndistinct = -ndistinct * totalrows;
-			/* estimate # of occurrences in sample of a typical value */
-			avgcount = (double) samplerows / ndistinct;
+			/* estimate # occurrences in sample of a typical nonnull value */
+			avgcount = (double) values_cnt / (double) d;
 			/* set minimum threshold count to store a value */
 			mincount = avgcount * 1.25;
 			if (mincount < 2)
 				mincount = 2;
 			/* don't let threshold exceed 1/K, however */
-			maxmincount = (double) samplerows / (double) num_bins;
+			maxmincount = (double) values_cnt / (double) num_bins;
 			if (mincount > maxmincount)
 				mincount = maxmincount;
 			if (num_mcv > track_cnt)
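The new denominator in compute_distinct_stats, `d = nonnull_cnt - summultiple + nmultiple`, counts the distinct non-null values seen in the sample. Values seen exactly once contribute `nonnull_cnt - summultiple`, and each repeated value contributes one. The compute_scalar_stats hunk obtains d as `ndistinct + toowide_cnt`, which appears to count each too-wide value as a distinct value.

The sketch below makes this concrete. It assumes the non-null sample values are plain ints and that every value can be tracked. The real code keeps a bounded track list, so this is an idealization. `haas_stokes_d` and `cmp_int` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints */
static int
cmp_int(const void *a, const void *b)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	return (x > y) - (x < y);
}

/*
 * Hypothetical helper: given only the non-null sample values, derive d as
 * d = nonnull_cnt - summultiple + nmultiple, where nmultiple is the number
 * of values seen more than once and summultiple is their total number of
 * occurrences. Assumes every value is tracked. Sorts vals in place.
 */
static int
haas_stokes_d(int *vals, int nonnull_cnt)
{
	int			nmultiple = 0;
	int			summultiple = 0;
	int			i = 0;

	assert(nonnull_cnt >= 0);
	if (nonnull_cnt == 0)
		return 0;
	qsort(vals, (size_t) nonnull_cnt, sizeof(int), cmp_int);

	while (i < nonnull_cnt)
	{
		int			run = 1;

		/* measure the run of equal values starting at i */
		while (i + run < nonnull_cnt && vals[i + run] == vals[i])
			run++;
		if (run > 1)
		{
			nmultiple++;
			summultiple += run;
		}
		i += run;
	}
	/* singletons (nonnull_cnt - summultiple) plus one per repeated value */
	return nonnull_cnt - summultiple + nmultiple;
}
```

Take the values {5, 3, 5, 7, 3, 3, 9}. Here nmultiple is 2 and summultiple is 5, so d = 7 - 5 + 2 = 4, which matches the four distinct values.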
