
Commit 3c93a60

Add some more defenses against silly estimates to gincostestimate().
A report from Andy Colson showed that gincostestimate() was not being nearly paranoid enough about whether to believe the statistics it finds in the index metapage. The problem is that the metapage stats (other than the pending-pages count) are only updated by VACUUM, and in the worst case could still reflect the index's original empty state even when it has grown to many entries. We attempted to deal with that by scaling up the stats to match the current index size, but if nEntries is zero then scaling it up still gives zero. Moreover, the proportion of pages that are entry pages vs. data pages vs. pending pages is unlikely to be estimated very well by scaling if the index is now orders of magnitude larger than before.

We can improve matters by expanding the use of the rule-of-thumb estimates I introduced in commit 7fb008c: if the index has grown by more than a cutoff amount (here set at 4X growth) since VACUUM, then use the rule-of-thumb numbers instead of scaling. This might not be exactly right but it seems much less likely to produce insane estimates.

I also improved both the scaling estimate and the rule-of-thumb estimate to account for numPendingPages, since it's reasonable to expect that that is accurate in any case, and certainly pages that are in the pending list are not either entry or data pages.

As a somewhat separate issue, adjust the estimation equations that are concerned with extra fetches for partial-match searches. These equations suppose that a fraction partialEntries / numEntries of the entry and data pages will be visited as a consequence of a partial-match search. Now, it's physically impossible for that fraction to exceed one, but our estimate of partialEntries is mostly bunk, and our estimate of numEntries isn't exactly gospel either, so we could arrive at a silly value. In the example presented by Andy we were coming out with a value of 100, leading to insane cost estimates. Clamp the fraction to one to avoid that.

Like the previous patch, back-patch to all supported branches; this problem can be demonstrated in one form or another in all of them.
1 parent 3cd1ba1 · commit 3c93a60

File tree

1 file changed (+52, -29 lines)


src/backend/utils/adt/selfuncs.c

Lines changed: 52 additions & 29 deletions

@@ -7246,6 +7246,7 @@ gincostestimate(PG_FUNCTION_ARGS)
 				numEntries;
 	GinQualCounts counts;
 	bool		matchPossible;
+	double		partialScale;
 	double		entryPagesFetched,
 				dataPagesFetched,
 				dataPagesFetchedBySel;
@@ -7274,42 +7275,59 @@ gincostestimate(PG_FUNCTION_ARGS)
 		memset(&ginStats, 0, sizeof(ginStats));
 	}
 
-	if (ginStats.nTotalPages > 0 && ginStats.nEntryPages > 0 && numPages > 0)
+	/*
+	 * Assuming we got valid (nonzero) stats at all, nPendingPages can be
+	 * trusted, but the other fields are data as of the last VACUUM.  We can
+	 * scale them up to account for growth since then, but that method only
+	 * goes so far; in the worst case, the stats might be for a completely
+	 * empty index, and scaling them will produce pretty bogus numbers.
+	 * Somewhat arbitrarily, set the cutoff for doing scaling at 4X growth; if
+	 * it's grown more than that, fall back to estimating things only from the
+	 * assumed-accurate index size.  But we'll trust nPendingPages in any case
+	 * so long as it's not clearly insane, ie, more than the index size.
+	 */
+	if (ginStats.nPendingPages < numPages)
+		numPendingPages = ginStats.nPendingPages;
+	else
+		numPendingPages = 0;
+
+	if (numPages > 0 && ginStats.nTotalPages <= numPages &&
+		ginStats.nTotalPages > numPages / 4 &&
+		ginStats.nEntryPages > 0 && ginStats.nEntries > 0)
 	{
 		/*
-		 * We got valid stats.  nPendingPages can be trusted, but the other
-		 * fields are data as of the last VACUUM.  Scale them by the ratio
-		 * numPages / nTotalPages to account for growth since then.
+		 * OK, the stats seem close enough to sane to be trusted.  But we
+		 * still need to scale them by the ratio numPages / nTotalPages to
+		 * account for growth since the last VACUUM.
 		 */
 		double		scale = numPages / ginStats.nTotalPages;
 
-		numEntryPages = ginStats.nEntryPages;
-		numDataPages = ginStats.nDataPages;
-		numPendingPages = ginStats.nPendingPages;
-		numEntries = ginStats.nEntries;
-
-		numEntryPages = ceil(numEntryPages * scale);
-		numDataPages = ceil(numDataPages * scale);
-		numEntries = ceil(numEntries * scale);
+		numEntryPages = ceil(ginStats.nEntryPages * scale);
+		numDataPages = ceil(ginStats.nDataPages * scale);
+		numEntries = ceil(ginStats.nEntries * scale);
 		/* ensure we didn't round up too much */
-		numEntryPages = Min(numEntryPages, numPages);
-		numDataPages = Min(numDataPages, numPages - numEntryPages);
+		numEntryPages = Min(numEntryPages, numPages - numPendingPages);
+		numDataPages = Min(numDataPages,
+						   numPages - numPendingPages - numEntryPages);
 	}
 	else
 	{
 		/*
-		 * It's a hypothetical index, or perhaps an index created pre-9.1 and
-		 * never vacuumed since upgrading.  Invent some plausible internal
-		 * statistics based on the index page count.  We estimate that 90% of
-		 * the index is entry pages, and the rest is data pages.  Estimate 100
-		 * entries per entry page; this is rather bogus since it'll depend on
-		 * the size of the keys, but it's more robust than trying to predict
-		 * the number of entries per heap tuple.
+		 * We might get here because it's a hypothetical index, or an index
+		 * created pre-9.1 and never vacuumed since upgrading (in which case
+		 * its stats would read as zeroes), or just because it's grown too
+		 * much since the last VACUUM for us to put our faith in scaling.
+		 *
+		 * Invent some plausible internal statistics based on the index page
+		 * count (and clamp that to at least 10 pages, just in case).  We
+		 * estimate that 90% of the index is entry pages, and the rest is data
+		 * pages.  Estimate 100 entries per entry page; this is rather bogus
+		 * since it'll depend on the size of the keys, but it's more robust
+		 * than trying to predict the number of entries per heap tuple.
		 */
 		numPages = Max(numPages, 10);
-		numEntryPages = floor(numPages * 0.90);
-		numDataPages = numPages - numEntryPages;
-		numPendingPages = 0;
+		numEntryPages = floor((numPages - numPendingPages) * 0.90);
+		numDataPages = numPages - numPendingPages - numEntryPages;
 		numEntries = floor(numEntryPages * 100);
 	}
 
@@ -7435,16 +7453,21 @@ gincostestimate(PG_FUNCTION_ARGS)
 	/*
 	 * Add an estimate of entry pages read by partial match algorithm.  It's a
 	 * scan over leaf pages in entry tree.  We haven't any useful stats here,
-	 * so estimate it as proportion.
+	 * so estimate it as proportion.  Because counts.partialEntries is really
+	 * pretty bogus (see code above), it's possible that it is more than
+	 * numEntries; clamp the proportion to ensure sanity.
 	 */
-	entryPagesFetched += ceil(numEntryPages * counts.partialEntries / numEntries);
+	partialScale = counts.partialEntries / numEntries;
+	partialScale = Min(partialScale, 1.0);
+
+	entryPagesFetched += ceil(numEntryPages * partialScale);
 
 	/*
 	 * Partial match algorithm reads all data pages before doing actual scan,
-	 * so it's a startup cost.  Again, we haven't any useful stats here, so,
-	 * estimate it as proportion
+	 * so it's a startup cost.  Again, we haven't any useful stats here, so
+	 * estimate it as proportion.
 	 */
-	dataPagesFetched = ceil(numDataPages * counts.partialEntries / numEntries);
+	dataPagesFetched = ceil(numDataPages * partialScale);
 
 	/*
 	 * Calculate cache effects if more than one scan due to nestloops or array

