Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit31f38f2

Browse files
committed
Redesign the planner's handling of index-descent cost estimation.
Historically we've used a couple of very ad-hoc fudge factors to try toget the right results when indexes of different sizes would satisfy aquery with the same number of index leaf tuples being visited. Incommit21a39de I tweaked one of thesefudge factors, with results that proved disastrous for larger indexes.Commitbf01e34 fudged it some more,but still with not a lot of principle behind it.What seems like a better way to address these issues is to explicitly modelindex-descent costs, since that's what's really at stake when consideringdiferent indexes with similar leaf-page-level costs. We tried that oncelong ago, and found that charging random_page_cost per page descendedthrough was way too much, because upper btree levels tend to stay in cachein real-world workloads. However, there's still CPU costs to think about,and the previous fudge factors can be seen as a crude attempt to accountfor those costs. So this patch replaces those fudge factors with explicitcharges for the number of tuple comparisons needed to descend the indextree, plus a small charge per page touched in the descent. The costmultipliers are chosen so that the resulting charges are in the vicinity ofthe historical (pre-9.2) fudge factors for indexes of up to about a milliontuples, while not ballooning unreasonably beyond that, as the old fudgefactor did (even more so in 9.2).To make this work accurately for btree indexes, add some code that allowsextraction of the known root-page height from a btree. There's noequivalent number readily available for other index types, but we can usethe log of the number of index pages as an approximate substitute.This seems like too much of a behavioral change to risk back-patching,but it should improve matters going forward. In 9.2 I'll just revertthe fudge-factor change.
1 parente1b735a commit31f38f2

File tree

6 files changed

+372
-118
lines changed

6 files changed

+372
-118
lines changed

‎src/backend/access/nbtree/nbtpage.c

Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -411,6 +411,82 @@ _bt_gettrueroot(Relation rel)
411411
returnrootbuf;
412412
}
413413

414+
/*
415+
*_bt_getrootheight() -- Get the height of the btree search tree.
416+
*
417+
*We return the level (counting from zero) of the current fast root.
418+
*This represents the number of tree levels we'd have to descend through
419+
*to start any btree index search.
420+
*
421+
*This is used by the planner for cost-estimation purposes. Since it's
422+
*only an estimate, slightly-stale data is fine, hence we don't worry
423+
*about updating previously cached data.
424+
*/
425+
int
426+
_bt_getrootheight(Relationrel)
427+
{
428+
BTMetaPageData*metad;
429+
430+
/*
431+
* We can get what we need from the cached metapage data. If it's not
432+
* cached yet, load it. Sanity checks here must match _bt_getroot().
433+
*/
434+
if (rel->rd_amcache==NULL)
435+
{
436+
Buffermetabuf;
437+
Pagemetapg;
438+
BTPageOpaquemetaopaque;
439+
440+
metabuf=_bt_getbuf(rel,BTREE_METAPAGE,BT_READ);
441+
metapg=BufferGetPage(metabuf);
442+
metaopaque= (BTPageOpaque)PageGetSpecialPointer(metapg);
443+
metad=BTPageGetMeta(metapg);
444+
445+
/* sanity-check the metapage */
446+
if (!(metaopaque->btpo_flags&BTP_META)||
447+
metad->btm_magic!=BTREE_MAGIC)
448+
ereport(ERROR,
449+
(errcode(ERRCODE_INDEX_CORRUPTED),
450+
errmsg("index \"%s\" is not a btree",
451+
RelationGetRelationName(rel))));
452+
453+
if (metad->btm_version!=BTREE_VERSION)
454+
ereport(ERROR,
455+
(errcode(ERRCODE_INDEX_CORRUPTED),
456+
errmsg("version mismatch in index \"%s\": file version %d, code version %d",
457+
RelationGetRelationName(rel),
458+
metad->btm_version,BTREE_VERSION)));
459+
460+
/*
461+
* If there's no root page yet, _bt_getroot() doesn't expect a cache
462+
* to be made, so just stop here and report the index height is zero.
463+
* (XXX perhaps _bt_getroot() should be changed to allow this case.)
464+
*/
465+
if (metad->btm_root==P_NONE)
466+
{
467+
_bt_relbuf(rel,metabuf);
468+
return0;
469+
}
470+
471+
/*
472+
* Cache the metapage data for next time
473+
*/
474+
rel->rd_amcache=MemoryContextAlloc(rel->rd_indexcxt,
475+
sizeof(BTMetaPageData));
476+
memcpy(rel->rd_amcache,metad,sizeof(BTMetaPageData));
477+
478+
_bt_relbuf(rel,metabuf);
479+
}
480+
481+
metad= (BTMetaPageData*)rel->rd_amcache;
482+
/* We shouldn't have cached it if any of these fail */
483+
Assert(metad->btm_magic==BTREE_MAGIC);
484+
Assert(metad->btm_version==BTREE_VERSION);
485+
Assert(metad->btm_fastroot!=P_NONE);
486+
487+
returnmetad->btm_fastlevel;
488+
}
489+
414490
/*
415491
*_bt_checkpage() -- Verify that a freshly-read page looks sane.
416492
*/

‎src/backend/nodes/outfuncs.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1772,7 +1772,9 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
17721772
/* Do NOT print rel field, else infinite recursion */
17731773
WRITE_UINT_FIELD(pages);
17741774
WRITE_FLOAT_FIELD(tuples,"%.0f");
1775+
WRITE_INT_FIELD(tree_height);
17751776
WRITE_INT_FIELD(ncolumns);
1777+
/* array fields aren't really worth the trouble to print */
17761778
WRITE_OID_FIELD(relam);
17771779
/* indexprs is redundant since we print indextlist */
17781780
WRITE_NODE_FIELD(indpred);
@@ -1781,6 +1783,7 @@ _outIndexOptInfo(StringInfo str, const IndexOptInfo *node)
17811783
WRITE_BOOL_FIELD(unique);
17821784
WRITE_BOOL_FIELD(immediate);
17831785
WRITE_BOOL_FIELD(hypothetical);
1786+
/* we don't bother with fields copied from the pg_am entry */
17841787
}
17851788

17861789
staticvoid

‎src/backend/optimizer/util/plancat.c

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@
2020
#include"access/genam.h"
2121
#include"access/heapam.h"
2222
#include"access/htup_details.h"
23+
#include"access/nbtree.h"
2324
#include"access/sysattr.h"
2425
#include"access/transam.h"
2526
#include"access/xlog.h"
@@ -352,6 +353,17 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
352353
info->tuples=rel->tuples;
353354
}
354355

356+
if (info->relam==BTREE_AM_OID)
357+
{
358+
/* For btrees, get tree height while we have the index open */
359+
info->tree_height=_bt_getrootheight(indexRelation);
360+
}
361+
else
362+
{
363+
/* For other index types, just set it to "unknown" for now */
364+
info->tree_height=-1;
365+
}
366+
355367
index_close(indexRelation,NoLock);
356368

357369
indexinfos=lcons(info,indexinfos);

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp