Commit 6dd86c2

Fix nbtsort.c's page space accounting.
Commit dd299df, which made heap TID a tiebreaker nbtree index column, introduced new rules on page space management to make suffix truncation safe. In general, suffix truncation needs to have a small amount of extra space available on the new left page when splitting a leaf page. This is needed in case it turns out that truncation cannot even "truncate away the heap TID column", resulting in a larger-than-firstright leaf high key with an explicit heap TID representation.

Despite all this, CREATE INDEX/nbtsort.c did not account for the possible need for extra heap TID space on leaf pages when deciding whether or not a new item could fit on current page. This could lead to "failed to add item to the index page" errors when CREATE INDEX/nbtsort.c tried to finish off a leaf page that lacked space for a larger-than-firstright leaf high key (it only had space for firstright tuple, which was just short of what was needed following "truncation").

Several conditions needed to be met all at once for CREATE INDEX to fail. The problem was in the hard limit on what will fit on a page, which tends to be masked by the soft fillfactor-wise limit. The easiest way to recreate the problem seems to be a CREATE INDEX on a low cardinality text column, with tuples that are of non-uniform width, using a fillfactor of 100.

To fix, bring nbtsort.c in line with nbtsplitloc.c, which already pessimistically assumes that all leaf page splits will have high keys that have a heap TID appended.

Reported-By: Andreas Joseph Krogh
Discussion: https://postgr.es/m/VisenaEmail.c5.3ee7fe277d514162.16a6d785bea@tc7-visena
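The hard-limit change described in the commit message can be sketched in isolation. This is a minimal standalone model, not the nbtsort.c code: the `sketch_` names are hypothetical, and the 8-byte alignment and 6-byte heap TID size are assumptions that match common 64-bit builds. It covers only the hard limit and leaves out the soft fillfactor-wise limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed for illustration: 8-byte MAXALIGN and a 6-byte heap TID. */
#define SKETCH_MAXALIGN(len) (((size_t) (len) + 7) & ~(size_t) 7)
#define SKETCH_HEAP_TID_SIZE 6

/* Hard limit before the fix: the page is full only if the new item won't fit. */
static bool
sketch_page_full_old(size_t pgspc, size_t itupsz)
{
	assert(itupsz == SKETCH_MAXALIGN(itupsz));	/* caller passes aligned size */
	return pgspc < itupsz;
}

/*
 * Hard limit after the fix: on leaf pages, also keep enough space to append
 * an aligned heap TID to the high key during suffix truncation.
 */
static bool
sketch_page_full_new(size_t pgspc, size_t itupsz, bool isleaf)
{
	size_t		reserve = isleaf ? SKETCH_MAXALIGN(SKETCH_HEAP_TID_SIZE) : 0;

	assert(itupsz == SKETCH_MAXALIGN(itupsz));
	return pgspc < itupsz + reserve;
}
```

With 40 bytes free and a 40-byte incoming tuple, the old check accepts the tuple and leaves no room for a larger high key. The new check treats the same leaf page as full, while internal pages behave as before.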
1 parent dd69597, commit 6dd86c2

1 file changed: +42 -18 lines changed
‎src/backend/access/nbtree/nbtsort.c

Lines changed: 42 additions & 18 deletions
@@ -841,6 +841,7 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	OffsetNumber last_off;
 	Size		pgspc;
 	Size		itupsz;
+	bool		isleaf;
 
 	/*
 	 * This is a handy place to check for cancel interrupts during the btree
@@ -855,9 +856,12 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	pgspc = PageGetFreeSpace(npage);
 	itupsz = IndexTupleSize(itup);
 	itupsz = MAXALIGN(itupsz);
+	/* Leaf case has slightly different rules due to suffix truncation */
+	isleaf = (state->btps_level == 0);
 
 	/*
-	 * Check whether the item can fit on a btree page at all.
+	 * Check whether the new item can fit on a btree page on current level at
+	 * all.
 	 *
 	 * Every newly built index will treat heap TID as part of the keyspace,
 	 * which imposes the requirement that new high keys must occasionally have
@@ -870,16 +874,29 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	 * the reserved space. This should never fail on internal pages.
 	 */
 	if (unlikely(itupsz > BTMaxItemSize(npage)))
-		_bt_check_third_page(wstate->index, wstate->heap,
-							 state->btps_level == 0, npage, itup);
+		_bt_check_third_page(wstate->index, wstate->heap, isleaf, npage,
+							 itup);
 
 	/*
-	 * Check to see if page is "full". It's definitely full if the item won't
-	 * fit. Otherwise, compare to the target freespace derived from the
-	 * fillfactor. However, we must put at least two items on each page, so
-	 * disregard fillfactor if we don't have that many.
+	 * Check to see if current page will fit new item, with space left over to
+	 * append a heap TID during suffix truncation when page is a leaf page.
+	 *
+	 * It is guaranteed that we can fit at least 2 non-pivot tuples plus a
+	 * high key with heap TID when finishing off a leaf page, since we rely on
+	 * _bt_check_third_page() rejecting oversized non-pivot tuples. On
+	 * internal pages we can always fit 3 pivot tuples with larger internal
+	 * page tuple limit (includes page high key).
+	 *
+	 * Most of the time, a page is only "full" in the sense that the soft
+	 * fillfactor-wise limit has been exceeded. However, we must always leave
+	 * at least two items plus a high key on each page before starting a new
+	 * page. Disregard fillfactor and insert on "full" current page if we
+	 * don't have the minimum number of items yet. (Note that we deliberately
+	 * assume that suffix truncation neither enlarges nor shrinks new high key
+	 * when applying soft limit.)
 	 */
-	if (pgspc < itupsz || (pgspc < state->btps_full && last_off > P_FIRSTKEY))
+	if (pgspc < itupsz + (isleaf ? MAXALIGN(sizeof(ItemPointerData)) : 0) ||
+		(pgspc < state->btps_full && last_off > P_FIRSTKEY))
 	{
 		/*
 		 * Finish off the page and write it out.
@@ -889,7 +906,6 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 		ItemId		ii;
 		ItemId		hii;
 		IndexTuple	oitup;
-		BTPageOpaque opageop = (BTPageOpaque) PageGetSpecialPointer(opage);
 
 		/* Create new page of same level */
 		npage = _bt_blnewpage(state->btps_level);
@@ -910,14 +926,20 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 		_bt_sortaddtup(npage, ItemIdGetLength(ii), oitup, P_FIRSTKEY);
 
 		/*
-		 * Move 'last' into the high key position on opage
+		 * Move 'last' into the high key position on opage. _bt_blnewpage()
+		 * allocated empty space for a line pointer when opage was first
+		 * created, so this is a matter of rearranging already-allocated space
+		 * on page, and initializing high key line pointer. (Actually, leaf
+		 * pages must also swap oitup with a truncated version of oitup, which
+		 * is sometimes larger than oitup, though never by more than the space
+		 * needed to append a heap TID.)
 		 */
 		hii = PageGetItemId(opage, P_HIKEY);
 		*hii = *ii;
 		ItemIdSetUnused(ii);	/* redundant */
 		((PageHeader) opage)->pd_lower -= sizeof(ItemIdData);
 
-		if (P_ISLEAF(opageop))
+		if (isleaf)
 		{
 			IndexTuple	lastleft;
 			IndexTuple	truncated;
@@ -943,15 +965,13 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 			 * tuple, it cannot just be copied in place (besides, we want
 			 * to actually save space on the leaf page). We delete the
 			 * original high key, and add our own truncated high key at the
-			 * same offset. It's okay if the truncated tuple is slightly
-			 * larger due to containing a heap TID value, since this case is
-			 * known to _bt_check_third_page(), which reserves space.
+			 * same offset.
 			 *
 			 * Note that the page layout won't be changed very much. oitup is
 			 * already located at the physical beginning of tuple space, so we
 			 * only shift the line pointer array back and forth, and overwrite
-			 * the latter portion of the space occupied by the original tuple.
-			 * This is fairly cheap.
+			 * the tuple space previously occupied by oitup. This is fairly
+			 * cheap.
 			 */
 			ii = PageGetItemId(opage, OffsetNumberPrev(last_off));
 			lastleft = (IndexTuple) PageGetItem(opage, ii);
@@ -979,9 +999,9 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 		Assert((BTreeTupleGetNAtts(state->btps_minkey, wstate->index) <=
 				IndexRelationGetNumberOfKeyAttributes(wstate->index) &&
 				BTreeTupleGetNAtts(state->btps_minkey, wstate->index) > 0) ||
-			   P_LEFTMOST(opageop));
+			   P_LEFTMOST((BTPageOpaque) PageGetSpecialPointer(opage)));
 		Assert(BTreeTupleGetNAtts(state->btps_minkey, wstate->index) == 0 ||
-			   !P_LEFTMOST(opageop));
+			   !P_LEFTMOST((BTPageOpaque) PageGetSpecialPointer(opage)));
 		BTreeInnerTupleSetDownLink(state->btps_minkey, oblkno);
 		_bt_buildadd(wstate, state->btps_next, state->btps_minkey);
 		pfree(state->btps_minkey);
@@ -1018,6 +1038,10 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, IndexTuple itup)
 	}
 
 	/*
+	 * By here, either original page is still the current page, or a new page
+	 * was created that became the current page. Either way, the current page
+	 * definitely has space for new item.
+	 *
 	 * If the new item is the first for its page, stash a copy for later. Note
 	 * this will only happen for the first item on a level; on later pages,
 	 * the first item for a page is copied from the prior page in the code
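
The effect of the patched hard limit across a run of non-uniform tuples can be illustrated with a deliberately simplified page model. Everything here is hypothetical: `sketch_fill_leaf`, the 8-byte heap TID reserve, and the 4-byte line pointer are assumptions for illustration, and the model ignores the fillfactor soft limit and the high key bookkeeping that the real `_bt_buildadd()` performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TID_RESERVE	8	/* assumed MAXALIGN(sizeof(ItemPointerData)) */
#define SKETCH_LINE_POINTER	4	/* assumed sizeof(ItemIdData) */

/*
 * Add tuples of the given (already aligned) sizes to one simulated leaf page
 * until the hard limit trips.  Returns the free space left on the page at
 * the point where it would be finished off.
 */
static size_t
sketch_fill_leaf(size_t page_space, const size_t *sizes, size_t ntuples,
				 bool reserve_tid)
{
	size_t		free_space = page_space;
	size_t		reserve = reserve_tid ? SKETCH_TID_RESERVE : 0;

	for (size_t i = 0; i < ntuples; i++)
	{
		/* each tuple costs its own size plus one line pointer */
		size_t		need = sizes[i] + SKETCH_LINE_POINTER;

		if (free_space < need + reserve)
			break;				/* hard limit reached: finish off the page */
		free_space -= need;
	}
	return free_space;
}

/* Example input: a 100-byte page and tuples of non-uniform width. */
static const size_t sketch_sizes[] = {40, 48, 16};
```

In this model the old rule (`reserve_tid = false`) packs the first two tuples and leaves 4 bytes, which is less than the 8 bytes a high key with an appended heap TID could need. The new rule stops after the first tuple, so every page it finishes off keeps at least the reserve free.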
