Commit f21668f

Add "split after new tuple" nbtree optimization.
Add additional heuristics to the algorithm for locating an optimal split
location. New logic identifies localized monotonically increasing values in
indexes with multiple columns. When this insertion pattern is detected, page
splits split just after the new item that provoked a page split (or apply
leaf fillfactor in the style of a rightmost page split). This optimization is
a variation of the long established leaf fillfactor optimization used during
rightmost page splits.

50/50 page splits are only appropriate with a pattern of truly random
insertions, where the average space utilization ends up at 65% - 70%.
Without this patch, affected cases have leaf pages that are no more than
about 50% full on average. Future insertions can never make use of the free
space left behind. With this patch, affected cases have leaf pages that are
about 90% full on average (assuming a fillfactor of 90).

Localized monotonically increasing insertion patterns are presumed to be
fairly common in real-world applications. There is a fair amount of
anecdotal evidence for this. Both pg_depend system catalog indexes
(pg_depend_depender_index and pg_depend_reference_index) are at least 20%
smaller after the regression tests are run when the optimization is
available. Furthermore, many of the indexes created by a fair use
implementation of TPC-C for Postgres are consistently about 40% smaller when
the optimization is available.

Note that even pg_upgrade'd v3 indexes make use of this optimization.

Author: Peter Geoghegan
Reviewed-By: Heikki Linnakangas
Discussion: https://postgr.es/m/CAH2-WzkpKeZJrXvR_p7VSY1b-s85E3gHyTbZQzR0BkJ5LrWF_A@mail.gmail.com
1 parent f7ff0ae · commit f21668f
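As a rough illustration of the space-utilization arithmetic described in the commit message, the following standalone C sketch simulates a localized monotonically increasing insertion pattern under two split policies. It is a toy model rather than PostgreSQL code: the page capacity, the number of groupings, the insert volume, and all identifiers are invented for the example.

#include <stdio.h>

#define PAGE_CAPACITY       100     /* tuples per leaf page (assumed) */
#define NGROUPS             8       /* distinct leading-attribute groupings (assumed) */
#define INSERTS_PER_GROUP   5000    /* ascending insertions per grouping (assumed) */

static double
simulate(int keep_on_left)
{
    int     active[NGROUPS];        /* fill of each grouping's rightmost page */
    long    finished_pages = 0;
    long    finished_tuples = 0;

    for (int g = 0; g < NGROUPS; g++)
        active[g] = 0;

    for (int i = 0; i < INSERTS_PER_GROUP; i++)
    {
        for (int g = 0; g < NGROUPS; g++)
        {
            if (active[g] == PAGE_CAPACITY)
            {
                /* page split: the left page keeps keep_on_left tuples for good */
                finished_pages++;
                finished_tuples += keep_on_left;
                active[g] -= keep_on_left;
            }
            /* new tuple always lands on the grouping's rightmost page */
            active[g]++;
        }
    }

    return (double) finished_tuples / ((double) finished_pages * PAGE_CAPACITY);
}

int
main(void)
{
    printf("50/50 splits:         %.0f%% average fill of left pages\n",
           simulate(PAGE_CAPACITY / 2) * 100.0);
    printf("split after new item: %.0f%% average fill of left pages\n",
           simulate(PAGE_CAPACITY * 90 / 100) * 100.0);
    return 0;
}

Under this model the left-hand pages abandoned by 50/50 splits stay about 50% full, while splitting so that the left page keeps 90% of its tuples leaves them about 90% full, in line with the figures quoted above.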

File tree

1 file changed (+211, -3 lines)

src/backend/access/nbtree/nbtsplitloc.c

Lines changed: 211 additions & 3 deletions
@@ -70,6 +70,9 @@ static void _bt_recsplitloc(FindSplitData *state,
 static void _bt_deltasortsplits(FindSplitData *state, double fillfactormult,
                     bool usemult);
 static int  _bt_splitcmp(const void *arg1, const void *arg2);
+static bool _bt_afternewitemoff(FindSplitData *state, OffsetNumber maxoff,
+                    int leaffillfactor, bool *usemult);
+static bool _bt_adjacenthtid(ItemPointer lowhtid, ItemPointer highhtid);
 static OffsetNumber _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,
                     bool *newitemonleft);
 static int  _bt_strategy(FindSplitData *state, SplitPoint *leftpage,
@@ -249,9 +252,10 @@ _bt_findsplitloc(Relation rel,
      * Start search for a split point among list of legal split points. Give
      * primary consideration to equalizing available free space in each half
      * of the split initially (start with default strategy), while applying
-     * rightmost optimization where appropriate. Either of the two other
-     * fallback strategies may be required for cases with a large number of
-     * duplicates around the original/space-optimal split point.
+     * rightmost and split-after-new-item optimizations where appropriate.
+     * Either of the two other fallback strategies may be required for cases
+     * with a large number of duplicates around the original/space-optimal
+     * split point.
      *
      * Default strategy gives some weight to suffix truncation in deciding a
      * split point on leaf pages. It attempts to select a split point where a
@@ -273,6 +277,44 @@ _bt_findsplitloc(Relation rel,
         usemult = true;
         fillfactormult = leaffillfactor / 100.0;
     }
+    else if (_bt_afternewitemoff(&state, maxoff, leaffillfactor, &usemult))
+    {
+        /*
+         * New item inserted at rightmost point among a localized grouping on
+         * a leaf page -- apply "split after new item" optimization, either by
+         * applying leaf fillfactor multiplier, or by choosing the exact split
+         * point that leaves the new item as last on the left. (usemult is set
+         * for us.)
+         */
+        if (usemult)
+        {
+            /* fillfactormult should be set based on leaf fillfactor */
+            fillfactormult = leaffillfactor / 100.0;
+        }
+        else
+        {
+            /* find precise split point after newitemoff */
+            for (int i = 0; i < state.nsplits; i++)
+            {
+                SplitPoint *split = state.splits + i;
+
+                if (split->newitemonleft &&
+                    newitemoff == split->firstoldonright)
+                {
+                    pfree(state.splits);
+                    *newitemonleft = true;
+                    return newitemoff;
+                }
+            }
+
+            /*
+             * Cannot legally split after newitemoff; proceed with split
+             * without using fillfactor multiplier. This is defensive, and
+             * should never be needed in practice.
+             */
+            fillfactormult = 0.50;
+        }
+    }
     else
     {
         /* Other leaf page. 50:50 page split. */
@@ -519,6 +561,172 @@ _bt_splitcmp(const void *arg1, const void *arg2)
     return 0;
 }
 
+/*
+ * Subroutine to determine whether or not a non-rightmost leaf page should be
+ * split immediately after the would-be original page offset for the
+ * new/incoming tuple (or should have leaf fillfactor applied when new item is
+ * to the right on original page). This is appropriate when there is a
+ * pattern of localized monotonically increasing insertions into a composite
+ * index, where leading attribute values form local groupings, and we
+ * anticipate further insertions of the same/current grouping (new item's
+ * grouping) in the near future. This can be thought of as a variation on
+ * applying leaf fillfactor during rightmost leaf page splits, since cases
+ * that benefit will converge on packing leaf pages leaffillfactor% full over
+ * time.
+ *
+ * We may leave extra free space remaining on the rightmost page of a "most
+ * significant column" grouping of tuples if that grouping never ends up
+ * having future insertions that use the free space. That effect is
+ * self-limiting; a future grouping that becomes the "nearest on the right"
+ * grouping of the affected grouping usually puts the extra free space to good
+ * use.
+ *
+ * Caller uses optimization when routine returns true, though the exact action
+ * taken by caller varies. Caller uses original leaf page fillfactor in
+ * standard way rather than using the new item offset directly when *usemult
+ * was also set to true here. Otherwise, caller applies optimization by
+ * locating the legal split point that makes the new tuple the very last tuple
+ * on the left side of the split.
+ */
+static bool
+_bt_afternewitemoff(FindSplitData *state, OffsetNumber maxoff,
+                    int leaffillfactor, bool *usemult)
+{
+    int16       nkeyatts;
+    ItemId      itemid;
+    IndexTuple  tup;
+    int         keepnatts;
+
+    Assert(state->is_leaf && !state->is_rightmost);
+
+    nkeyatts = IndexRelationGetNumberOfKeyAttributes(state->rel);
+
+    /* Single key indexes not considered here */
+    if (nkeyatts == 1)
+        return false;
+
+    /* Ascending insertion pattern never inferred when new item is first */
+    if (state->newitemoff == P_FIRSTKEY)
+        return false;
+
+    /*
+     * Only apply optimization on pages with equisized tuples, since ordinal
+     * keys are likely to be fixed-width. Testing if the new tuple is
+     * variable width directly might also work, but that fails to apply the
+     * optimization to indexes with a numeric_ops attribute.
+     *
+     * Conclude that page has equisized tuples when the new item is the same
+     * width as the smallest item observed during pass over page, and other
+     * non-pivot tuples must be the same width as well. (Note that the
+     * possibly-truncated existing high key isn't counted in
+     * olddataitemstotal, and must be subtracted from maxoff.)
+     */
+    if (state->newitemsz != state->minfirstrightsz)
+        return false;
+    if (state->newitemsz * (maxoff - 1) != state->olddataitemstotal)
+        return false;
+
+    /*
+     * Avoid applying optimization when tuples are wider than a tuple
+     * consisting of two non-NULL int8/int64 attributes (or four non-NULL
+     * int4/int32 attributes)
+     */
+    if (state->newitemsz >
+        MAXALIGN(sizeof(IndexTupleData) + sizeof(int64) * 2) +
+        sizeof(ItemIdData))
+        return false;
+
+    /*
+     * At least the first attribute's value must be equal to the corresponding
+     * value in previous tuple to apply optimization. New item cannot be a
+     * duplicate, either.
+     *
+     * Handle case where new item is to the right of all items on the existing
+     * page. This is suggestive of monotonically increasing insertions in
+     * itself, so the "heap TID adjacency" test is not applied here.
+     */
+    if (state->newitemoff > maxoff)
+    {
+        itemid = PageGetItemId(state->page, maxoff);
+        tup = (IndexTuple) PageGetItem(state->page, itemid);
+        keepnatts = _bt_keep_natts_fast(state->rel, tup, state->newitem);
+
+        if (keepnatts > 1 && keepnatts <= nkeyatts)
+        {
+            *usemult = true;
+            return true;
+        }
+
+        return false;
+    }
+
+    /*
+     * "Low cardinality leading column, high cardinality suffix column"
+     * indexes with a random insertion pattern (e.g., an index with a boolean
+     * column, such as an index on '(book_is_in_print, book_isbn)') present us
+     * with a risk of consistently misapplying the optimization. We're
+     * willing to accept very occasional misapplication of the optimization,
+     * provided the cases where we get it wrong are rare and self-limiting.
+     *
+     * Heap TID adjacency strongly suggests that the item just to the left was
+     * inserted very recently, which limits overapplication of the
+     * optimization. Besides, all inappropriate cases triggered here will
+     * still split in the middle of the page on average.
+     */
+    itemid = PageGetItemId(state->page, OffsetNumberPrev(state->newitemoff));
+    tup = (IndexTuple) PageGetItem(state->page, itemid);
+    /* Do cheaper test first */
+    if (!_bt_adjacenthtid(&tup->t_tid, &state->newitem->t_tid))
+        return false;
+    /* Check same conditions as rightmost item case, too */
+    keepnatts = _bt_keep_natts_fast(state->rel, tup, state->newitem);
+
+    if (keepnatts > 1 && keepnatts <= nkeyatts)
+    {
+        double      interp = (double) state->newitemoff / ((double) maxoff + 1);
+        double      leaffillfactormult = (double) leaffillfactor / 100.0;
+
+        /*
+         * Don't allow caller to split after a new item when it will result in
+         * a split point to the right of the point that a leaf fillfactor
+         * split would use -- have caller apply leaf fillfactor instead
+         */
+        *usemult = interp > leaffillfactormult;
+
+        return true;
+    }
+
+    return false;
+}
+
+/*
+ * Subroutine for determining if two heap TIDS are "adjacent".
+ *
+ * Adjacent means that the high TID is very likely to have been inserted into
+ * heap relation immediately after the low TID, probably during the current
+ * transaction.
+ */
+static bool
+_bt_adjacenthtid(ItemPointer lowhtid, ItemPointer highhtid)
+{
+    BlockNumber lowblk,
+                highblk;
+
+    lowblk = ItemPointerGetBlockNumber(lowhtid);
+    highblk = ItemPointerGetBlockNumber(highhtid);
+
+    /* Make optimistic assumption of adjacency when heap blocks match */
+    if (lowblk == highblk)
+        return true;
+
+    /* When heap block one up, second offset should be FirstOffsetNumber */
+    if (lowblk + 1 == highblk &&
+        ItemPointerGetOffsetNumber(highhtid) == FirstOffsetNumber)
+        return true;
+
+    return false;
+}
+
 /*
  * Subroutine to find the "best" split point among candidate split points.
  * The best split point is the split point with the lowest penalty among split
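The heap TID adjacency rule implemented by _bt_adjacenthtid() above is small enough to restate as a self-contained sketch. The ToyHeapTid struct and toy_adjacent_htid() helper below are invented for illustration and stand in for the PostgreSQL ItemPointer machinery; only the two-rule logic mirrors the function in the diff.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Invented stand-in for an ItemPointer-style heap TID */
typedef struct ToyHeapTid
{
    uint32_t    block;          /* heap block number */
    uint16_t    offset;         /* line pointer offset, 1-based */
} ToyHeapTid;

#define TOY_FIRST_OFFSET 1      /* counterpart of FirstOffsetNumber */

static bool
toy_adjacent_htid(ToyHeapTid low, ToyHeapTid high)
{
    /* Optimistically treat TIDs on the same heap block as adjacent */
    if (low.block == high.block)
        return true;

    /* When heap block is one up, high TID must sit at the first offset */
    if (low.block + 1 == high.block && high.offset == TOY_FIRST_OFFSET)
        return true;

    return false;
}

int
main(void)
{
    ToyHeapTid  a = {42, 7};
    ToyHeapTid  b = {42, 8};    /* same block: adjacent */
    ToyHeapTid  c = {43, 1};    /* next block, first offset: adjacent */
    ToyHeapTid  d = {43, 5};    /* next block, later offset: not adjacent */

    printf("%d %d %d\n",
           toy_adjacent_htid(a, b),
           toy_adjacent_htid(a, c),
           toy_adjacent_htid(a, d));
    return 0;
}

Run as-is, this prints "1 1 0": a TID on the same heap block counts as adjacent, as does a TID at the first offset of the immediately following block, while anything else does not.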

