Commit 0d861bb, which introduced deduplication to nbtree, added some logic to take large posting list tuples into account when choosing a split point. We subtract firstright posting list overhead from the projected new high key size when calculating leftfree/rightfree values for an affected candidate split point. Posting list tuples aren't special to nbtsplitloc.c, but taking them into account like this makes a huge difference in practice. Posting list tuples are frequently tuple size outliers.

However, commit 0d861bb missed a closely related issue: split interval itself is calculated based on the assumption that tuples on the page being split are roughly equisized. That assumption was acceptable back when commit fab2502 taught the logic for choosing a split point about suffix truncation, but it's pretty questionable now that very large tuple sizes are common. This oversight led to unbalanced page splits in low cardinality multi-column indexes when deduplication was used: page splits that don't give sufficient weight to how unbalanced the split is when the interval happens to include some large posting list tuples (and when most other tuples on the page are not so large).

Nail this down by calculating an initial split interval in a way that's attuned to the actual cost that we want to keep under control (not a fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to each candidate split point that actually gets included in the split interval (for the default strategy). This replaces logic that used a percentage of all legal split points for the page as the basis of the initial split interval.

Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
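To make the shape of the change concrete, here is a minimal standalone sketch of the evenness-test idea described above. This is not the actual nbtsplitloc.c code: the SplitPoint layout, split_delta, initial_interval, and the "4x best delta + 32 bytes" tolerance are all hypothetical stand-ins. The point is only that the interval is bounded by how even each candidate actually is, rather than by a flat percentage of all legal split points.

    /*
     * Hedged sketch of an evenness-based initial split interval.
     * All names and the tolerance constant are hypothetical.
     */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct SplitPoint
    {
        int     firstrightoff;  /* offset of first tuple on right page */
        int     leftfree;       /* projected left-half free space, bytes */
        int     rightfree;      /* projected right-half free space, bytes */
    } SplitPoint;

    /* How uneven a candidate split is: 0 means perfectly balanced */
    static int
    split_delta(const SplitPoint *sp)
    {
        return abs(sp->leftfree - sp->rightfree);
    }

    /* Sort candidates so that the most even split comes first */
    static int
    cmp_delta(const void *a, const void *b)
    {
        int     da = split_delta((const SplitPoint *) a);
        int     db = split_delta((const SplitPoint *) b);

        return (da > db) - (da < db);
    }

    /*
     * Count how many delta-sorted candidates make it into the initial
     * interval: a candidate qualifies only while its evenness stays
     * within a tolerance of the best candidate's, so a candidate made
     * very uneven by a large posting list tuple falls out, no matter
     * how many legal split points the page has overall.
     */
    static int
    initial_interval(const SplitPoint *sorted, int ncandidates)
    {
        int     bestdelta = split_delta(&sorted[0]);
        int     interval = 1;

        for (int i = 1; i < ncandidates; i++)
        {
            /* hypothetical tolerance: 4x the best delta, plus slack */
            if (split_delta(&sorted[i]) > bestdelta * 4 + 32)
                break;
            interval++;
        }
        return interval;
    }

    int
    main(void)
    {
        /* Mostly small tuples, plus one posting-list outlier split */
        SplitPoint  cand[] = {
            {50, 400, 380},
            {51, 420, 360},
            {52, 2200, 90},     /* lopsided: large posting list tuple */
            {53, 500, 310},
        };
        int         n = sizeof(cand) / sizeof(cand[0]);

        qsort(cand, n, sizeof(SplitPoint), cmp_delta);
        printf("initial interval size: %d\n", initial_interval(cand, n));
        return 0;
    }

Run against this toy page, the sketch reports an interval of 2: the two nearly balanced candidates qualify, while the candidate skewed by the large posting list tuple is excluded from the interval outright instead of being admitted by a percentage-of-candidates rule.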