Commit 1542e16

Consider outliers in split interval calculation.

Commit 0d861bb, which introduced deduplication to nbtree, added some logic to take large posting list tuples into account when choosing a split point. We subtract firstright posting list overhead from the projected new high key size when calculating leftfree/rightfree values for an affected candidate split point. Posting list tuples aren't special to nbtsplitloc.c, but taking them into account like this makes a huge difference in practice. Posting list tuples are frequently tuple size outliers.

However, commit 0d861bb missed a closely related issue: split interval itself is calculated based on the assumption that tuples on the page being split are roughly equisized. That assumption was acceptable back when commit fab2502 taught the logic for choosing a split point about suffix truncation, but it's pretty questionable now that very large tuple sizes are common. This oversight led to unbalanced page splits in low cardinality multi-column indexes when deduplication was used: page splits that don't give sufficient weight to how unbalanced the split is when the interval happens to include some large posting list tuples (and when most other tuples on the page are not so large).

Nail this down by calculating an initial split interval in a way that's attuned to the actual cost that we want to keep under control (not a fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to each candidate split point that actually gets included in the split interval (for the default strategy). This replaces logic that used a percentage of all legal split points for the page as the basis of the initial split interval.

Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
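
For contrast, the percentage-based calculation that this commit removes can be sketched in isolation. This is a minimal illustration, not the nbtree code itself: `old_split_interval` and the `OLD_`-prefixed macro names are hypothetical, while the 5% factor and the caps of 9 and 18 are taken from the lines removed in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Caps copied from the removed MAX_LEAF_INTERVAL / MAX_INTERNAL_INTERVAL */
#define OLD_MAX_LEAF_INTERVAL		9
#define OLD_MAX_INTERNAL_INTERVAL	18

/*
 * Hypothetical stand-in for the removed calculation:
 *   Min(Max(1, nsplits * 0.05), cap)
 * Note that tuple sizes play no part in the result.
 */
static int
old_split_interval(int nsplits, bool is_leaf)
{
	int		cap = is_leaf ? OLD_MAX_LEAF_INTERVAL : OLD_MAX_INTERNAL_INTERVAL;
	int		interval;

	assert(nsplits > 0);

	/* 5% of all legal split points, truncated toward zero */
	interval = (int) (nsplits * 0.05);

	/* Clamp to the range [1, cap] */
	if (interval < 1)
		interval = 1;
	if (interval > cap)
		interval = cap;

	return interval;
}
```

Because the result depends only on the number of candidate split points, a page holding a few very large posting list tuples gets the same interval as a page of equisized tuples.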

1 parent 1c45507 commit 1542e16

File tree

1 file changed, +83 -21 lines changed

src/backend/access/nbtree
- nbtsplitloc.c

`‎src/backend/access/nbtree/nbtsplitloc.c‎`

Lines changed: 83 additions & 21 deletions

Original file line number	Diff line number	Diff line change
`@@ -17,10 +17,6 @@`
`17`	`17`	`#include "access/nbtree.h"`
`18`	`18`	`#include "storage/lmgr.h"`
`19`	`19`
`20`		`-/* limits on split interval (default strategy only) */`
`21`		`-#define MAX_LEAF_INTERVAL 9`
`22`		`-#define MAX_INTERNAL_INTERVAL 18`
`23`		`-`
`24`	`20`	`typedef enum`
`25`	`21`	`{`
`26`	`22`	`/* strategy for searching through materialized list of split points */`
`@@ -76,6 +72,7 @@ static bool _bt_afternewitemoff(FindSplitData *state, OffsetNumber maxoff,`
`76`	`72`	`static bool _bt_adjacenthtid(ItemPointer lowhtid, ItemPointer highhtid);`
`77`	`73`	`static OffsetNumber _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,`
`78`	`74`	`bool *newitemonleft, FindSplitStrat strategy);`
	`75`	`+static int _bt_defaultinterval(FindSplitData *state);`
`79`	`76`	`static int _bt_strategy(FindSplitData *state, SplitPoint *leftpage,`
`80`	`77`	`SplitPoint *rightpage, FindSplitStrat *strategy);`
`81`	`78`	`static void _bt_interval_edges(FindSplitData *state,`
`@@ -279,7 +276,7 @@ _bt_findsplitloc(Relation rel,`
`279`	`276`	`* left side of the split, in order to maximize the number of trailing`
`280`	`277`	`* attributes that can be truncated away. Only candidate split points`
`281`	`278`	`* that imply an acceptable balance of free space on each side are`
`282`		`- * considered.`
	`279`	`+ * considered. See _bt_defaultinterval().`
`283`	`280`	`*/`
`284`	`281`	`if (!state.is_leaf)`
`285`	`282`	`{`
`@@ -338,19 +335,6 @@ _bt_findsplitloc(Relation rel,`
`338`	`335`	`fillfactormult = 0.50;`
`339`	`336`	`}`
`340`	`337`
`341`		`-/*`
`342`		`- * Set an initial limit on the split interval/number of candidate split`
`343`		`- * points as appropriate. The "Prefix B-Trees" paper refers to this as`
`344`		`- * sigma l for leaf splits and sigma b for internal ("branch") splits.`
`345`		`- * It's hard to provide a theoretical justification for the initial size`
`346`		`- * of the split interval, though it's clear that a small split interval`
`347`		`- * makes suffix truncation much more effective without noticeably`
`348`		`- * affecting space utilization over time.`
`349`		`- */`
`350`		`-state.interval = Min(Max(1, state.nsplits * 0.05),`
`351`		`-state.is_leaf ? MAX_LEAF_INTERVAL :`
`352`		`-MAX_INTERNAL_INTERVAL);`
`353`		`-`
`354`	`338`	`/*`
`355`	`339`	`* Save leftmost and rightmost splits for page before original ordinal`
`356`	`340`	`* sort order is lost by delta/fillfactormult sort`
`@@ -361,6 +345,9 @@ _bt_findsplitloc(Relation rel,`
`361`	`345`	`/* Give split points a fillfactormult-wise delta, and sort on deltas */`
`362`	`346`	`_bt_deltasortsplits(&state, fillfactormult, usemult);`
`363`	`347`
	`348`	`+/* Determine split interval for default strategy */`
	`349`	`+state.interval = _bt_defaultinterval(&state);`
	`350`	`+`
`364`	`351`	`/*`
`365`	`352`	`* Determine if default strategy/split interval will produce a`
`366`	`353`	`* sufficiently distinguishing split, or if we should change strategies.`
`@@ -850,11 +837,13 @@ _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,`
`850`	`837`	`*/`
`851`	`838`	`if (strategy == SPLIT_MANY_DUPLICATES && !state->is_rightmost &&`
`852`	`839`	`!final->newitemonleft && final->firstrightoff >= state->newitemoff &&`
`853`		`-final->firstrightoff < state->newitemoff + MAX_LEAF_INTERVAL)`
	`840`	`+final->firstrightoff < state->newitemoff + 9)`
`854`	`841`	`{`
`855`	`842`	`/*`
`856`	`843`	`* Avoid the problem by performing a 50:50 split when the new item is`
`857`	`844`	`* just to the right of the would-be "many duplicates" split point.`
	`845`	`+ * (Note that the test used for an insert that is "just to the right"`
	`846`	`+ * of the split point is conservative.)`
`858`	`847`	`*/`
`859`	`848`	`final = &state->splits[0];`
`860`	`849`	`}`
`@@ -863,6 +852,79 @@ _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,`
`863`	`852`	`return final->firstrightoff;`
`864`	`853`	`}`
`865`	`854`
	`855`	`+#define LEAF_SPLIT_DISTANCE 0.050`
	`856`	`+#define INTERNAL_SPLIT_DISTANCE 0.075`
	`857`	`+`
	`858`	`+/*`
	`859`	`+ * Return a split interval to use for the default strategy. This is a limit`
	`860`	`+ * on the number of candidate split points to give further consideration to.`
	`861`	`+ * Only a fraction of all candidate splits points (those located at the start`
	`862`	`+ * of the now-sorted splits array) fall within the split interval. Split`
	`863`	`+ * interval is applied within _bt_bestsplitloc().`
	`864`	`+ *`
	`865`	`+ * Split interval represents an acceptable range of split points -- those that`
	`866`	`+ * have leftfree and rightfree values that are acceptably balanced. The final`
	`867`	`+ * split point chosen is the split point with the lowest "penalty" among split`
	`868`	`+ * points in this split interval (unless we change our entire strategy, in`
	`869`	`+ * which case the interval also changes -- see _bt_strategy()).`
	`870`	`+ *`
	`871`	`+ * The "Prefix B-Trees" paper calls split interval sigma l for leaf splits,`
	`872`	`+ * and sigma b for internal ("branch") splits. It's hard to provide a`
	`873`	`+ * theoretical justification for the size of the split interval, though it's`
	`874`	`+ * clear that a small split interval can make tuples on level L+1 much smaller`
	`875`	`+ * on average, without noticeably affecting space utilization on level L.`
	`876`	`+ * (Note that the way that we calculate split interval might need to change if`
	`877`	`+ * suffix truncation is taught to truncate tuples "within" the last`
	`878`	`+ * attribute/datum for data types like text, which is more or less how it is`
	`879`	`+ * assumed to work in the paper.)`
	`880`	`+ */`
	`881`	`+static int`
	`882`	`+_bt_defaultinterval(FindSplitData *state)`
	`883`	`+{`
	`884`	`+SplitPoint *spaceoptimal;`
	`885`	`+int16 tolerance,`
	`886`	`+lowleftfree,`
	`887`	`+lowrightfree,`
	`888`	`+highleftfree,`
	`889`	`+highrightfree;`
	`890`	`+`
	`891`	`+/*`
	`892`	`+ * Determine leftfree and rightfree values that are higher and lower than`
	`893`	`+ * we're willing to tolerate. Note that the final split interval will be`
	`894`	`+ * about 10% of nsplits in the common case where all non-pivot tuples`
	`895`	`+ * (data items) from a leaf page are uniformly sized. We're a bit more`
	`896`	`+ * aggressive when splitting internal pages.`
	`897`	`+ */`
	`898`	`+if (state->is_leaf)`
	`899`	`+tolerance = state->olddataitemstotal * LEAF_SPLIT_DISTANCE;`
	`900`	`+else`
	`901`	`+tolerance = state->olddataitemstotal * INTERNAL_SPLIT_DISTANCE;`
	`902`	`+`
	`903`	`+/* First candidate split point is the most evenly balanced */`
	`904`	`+spaceoptimal = state->splits;`
	`905`	`+lowleftfree = spaceoptimal->leftfree - tolerance;`
	`906`	`+lowrightfree = spaceoptimal->rightfree - tolerance;`
	`907`	`+highleftfree = spaceoptimal->leftfree + tolerance;`
	`908`	`+highrightfree = spaceoptimal->rightfree + tolerance;`
	`909`	`+`
	`910`	`+/*`
	`911`	`+ * Iterate through split points, starting from the split immediately after`
	`912`	`+ * 'spaceoptimal'. Find the first split point that divides free space so`
	`913`	`+ * unevenly that including it in the split interval would be unacceptable.`
	`914`	`+ */`
	`915`	`+for (int i = 1; i < state->nsplits; i++)`
	`916`	`+{`
	`917`	`+SplitPoint *split = state->splits + i;`
	`918`	`+`
	`919`	`+/* Cannot use curdelta here, since its value is often weighted */`
	`920`	`+if (split->leftfree < lowleftfree || split->rightfree < lowrightfree ||`
	`921`	`+split->leftfree > highleftfree || split->rightfree > highrightfree)`
	`922`	`+return i;`
	`923`	`+}`
	`924`	`+`
	`925`	`+return state->nsplits;`
	`926`	`+}`
	`927`	`+`
`866`	`928`	`/*`
`867`	`929`	`* Subroutine to decide whether split should use default strategy/initial`
`868`	`930`	`* split interval, or whether it should finish splitting the page using`
`@@ -1097,7 +1159,7 @@ _bt_split_penalty(FindSplitData *state, SplitPoint *split)`
`1097`	`1159`	`}`
`1098`	`1160`
`1099`	`1161`	`/*`
`1100`		`- * Subroutine to get a lastleft IndexTuple for a split point from page`
	`1162`	`+ * Subroutine to get a lastleft IndexTuple for a split point`
`1101`	`1163`	`*/`
`1102`	`1164`	`static inline IndexTuple`
`1103`	`1165`	`_bt_split_lastleft(FindSplitData *state, SplitPoint *split)`
`@@ -1113,7 +1175,7 @@ _bt_split_lastleft(FindSplitData *state, SplitPoint *split)`
`1113`	`1175`	`}`
`1114`	`1176`
`1115`	`1177`	`/*`
`1116`		`- * Subroutine to get a firstright IndexTuple for a split point from page`
	`1178`	`+ * Subroutine to get a firstright IndexTuple for a split point`
`1117`	`1179`	`*/`
`1118`	`1180`	`static inline IndexTuple`
`1119`	`1181`	`_bt_split_firstright(FindSplitData *state, SplitPoint *split)`
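
The evenness test in `_bt_defaultinterval()` can be tried out on a toy model of a page. The sketch below is a simplification under stated assumptions: `ToySplit`, `toy_default_interval`, `TOY_PAGE_SPACE` and `TOY_MAX_TUPLES` are hypothetical names, the incoming tuple, high key and line pointer overhead are ignored, and candidate splits are sorted by an unweighted |leftfree - rightfree| delta rather than by `_bt_deltasortsplits()`. Only the `LEAF_SPLIT_DISTANCE` tolerance and the four-way bounds check mirror the diff.

```c
#include <assert.h>
#include <stdlib.h>

#define LEAF_SPLIT_DISTANCE	0.050	/* same value as in the diff */
#define TOY_PAGE_SPACE		8192	/* assumed usable space per page */
#define TOY_MAX_TUPLES		512		/* assumed upper bound for this sketch */

/* Hypothetical, stripped-down stand-in for SplitPoint */
typedef struct ToySplit
{
	int		leftfree;
	int		rightfree;
	int		delta;		/* unweighted |leftfree - rightfree| */
} ToySplit;

/* qsort comparator: most evenly balanced split first */
static int
toy_cmp(const void *a, const void *b)
{
	const ToySplit *sa = (const ToySplit *) a;
	const ToySplit *sb = (const ToySplit *) b;

	return (sa->delta > sb->delta) - (sa->delta < sb->delta);
}

/*
 * Return the number of candidate splits (in delta order) that keep leftfree
 * and rightfree within 'tolerance' of the most balanced split.
 * 'sizes' holds tuple sizes in page order.
 */
static int
toy_default_interval(const int *sizes, int ntuples)
{
	ToySplit	splits[TOY_MAX_TUPLES];
	int			nsplits = ntuples - 1;
	int			total = 0;
	int			left = 0;
	int			tolerance;

	assert(ntuples >= 2 && ntuples <= TOY_MAX_TUPLES);

	for (int i = 0; i < ntuples; i++)
		total += sizes[i];

	/* Candidate split i puts tuples 0..i on the left page */
	for (int i = 0; i < nsplits; i++)
	{
		left += sizes[i];
		splits[i].leftfree = TOY_PAGE_SPACE - left;
		splits[i].rightfree = TOY_PAGE_SPACE - (total - left);
		splits[i].delta = abs(splits[i].leftfree - splits[i].rightfree);
	}
	qsort(splits, nsplits, sizeof(ToySplit), toy_cmp);

	/* Tolerance is a fraction of total tuple space, as in the diff */
	tolerance = (int) (total * LEAF_SPLIT_DISTANCE);

	/* splits[0] is the space-optimal split; stop at the first outlier */
	for (int i = 1; i < nsplits; i++)
	{
		if (splits[i].leftfree < splits[0].leftfree - tolerance ||
			splits[i].rightfree < splits[0].rightfree - tolerance ||
			splits[i].leftfree > splits[0].leftfree + tolerance ||
			splits[i].rightfree > splits[0].rightfree + tolerance)
			return i;
	}

	return nsplits;
}
```

Under this model, 100 uniformly sized 80-byte tuples yield an interval of 11 out of 99 candidate split points, in line with the "about 10% of nsplits" comment. If two adjacent 1000-byte tuples sit in the middle of 98 tuples of 60 bytes each, the interval collapses to 1, because moving the split point across either large tuple shifts free space by far more than the tolerance.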
