Commit cf2acaf

Deprecate nbtree's BTP_HAS_GARBAGE flag.
Streamline handling of the various strategies that we have to avoid a page split in nbtinsert.c. When it looks like a leaf page is about to overflow, we now perform deleting LP_DEAD items and deduplication in one central place. This greatly simplifies _bt_findinsertloc().

This has an independently useful consequence: nbtree no longer relies on the BTP_HAS_GARBAGE page level flag/hint for anything important. We still set and unset the flag in the same way as before, but it's no longer treated as a gating condition when considering if we should check for already-set LP_DEAD bits. This happens at the point where the page looks like it might have to be split anyway, so simply checking the LP_DEAD bits in passing is practically free. This avoids missing LP_DEAD bits just because the page-level hint is unset, which is probably reasonably common (e.g. it happens when VACUUM unsets the page-level flag without actually removing index tuples whose LP_DEAD bit was set recently, after the VACUUM operation began but before it reached the leaf page in question).

Note that this isn't a big behavioral change compared to PostgreSQL 13. We were already checking for set LP_DEAD bits regardless of whether the BTP_HAS_GARBAGE page level flag was set before we considered doing a deduplication pass. This commit only goes slightly further by doing the same check for all indexes, even indexes where deduplication won't be performed.

We don't completely remove the BTP_HAS_GARBAGE flag. We still rely on it as a gating condition with pg_upgrade'd indexes from before B-tree version 4/PostgreSQL 12. That makes sense because we sometimes have to make a choice among pages full of duplicates when inserting a tuple with pre version 4 indexes. It probably still pays to avoid accessing the line pointer array of a page there, since it won't yet be clear whether we'll insert on to the page in question at all, let alone split it as a result.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Victor Yegorov <vyegorov@gmail.com>
Discussion: https://postgr.es/m/CAH2-Wz%3DYpc1PDdk8OVJDChGJBjT06%3DA0Mbv9HyTLCsOknGcUFg%40mail.gmail.com
1 parent 7684b6f commit cf2acaf
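The consolidated control flow that this commit introduces is easiest to see in isolation. Below is a simplified, standalone C sketch of the strategy ordering in the new _bt_delete_or_dedup_one_page() (delete already-set LP_DEAD items first, fall back to a deduplication pass); the LeafPageState struct, its fields, and the helper name are illustrative stand-ins rather than PostgreSQL code, while the real function operates on Relation/Buffer/BTInsertState, as shown in the nbtinsert.c diff below.

/*
 * Simplified sketch of the strategy ordering in the new
 * _bt_delete_or_dedup_one_page().  Every type, field, and helper here is
 * an illustrative stand-in; see the nbtinsert.c diff for the real code.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct LeafPageState
{
    int     ndead;          /* items on the page already marked LP_DEAD */
    bool    enough_space;   /* would deleting them avoid the split? */
    bool    allequalimage;  /* is deduplication known to be safe? */
    bool    dedup_enabled;  /* deduplicate_items reloption */
} LeafPageState;

/* Returns true when some strategy (probably) avoided a page split */
static bool
delete_or_dedup_sketch(LeafPageState *page, bool lpdeadonly,
                       bool checkingunique, bool uniquedup)
{
    /* Strategy 1: delete items whose LP_DEAD bits are already set */
    if (page->ndead > 0)
    {
        printf("deleting %d LP_DEAD items\n", page->ndead);
        if (page->enough_space)
            return true;        /* split avoided, nothing more to do */
        uniquedup = true;       /* might as well assume duplicates */
    }

    /* Some call sites only want the LP_DEAD strategy */
    if (lpdeadonly)
        return false;

    /* Unique index with no reason to expect duplicates: give up */
    if (checkingunique && !uniquedup)
        return false;

    /* Strategy 2: a deduplication pass, when enabled and known safe */
    if (page->dedup_enabled && page->allequalimage)
    {
        printf("attempting deduplication pass\n");
        return true;            /* caller still rechecks free space */
    }

    return false;               /* caller will have to split the page */
}

int
main(void)
{
    LeafPageState page = {.ndead = 3, .enough_space = false,
                          .allequalimage = true, .dedup_enabled = true};

    if (!delete_or_dedup_sketch(&page, false, false, false))
        printf("falling back to a page split\n");
    return 0;
}

Note that BTP_HAS_GARBAGE plays no part in this flow for heapkeyspace (version 4+) indexes; the flag only remains a gating condition on the pre-version-4 path in _bt_findinsertloc().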

File tree

6 files changed: 135 additions & 130 deletions


src/backend/access/nbtree/nbtdedup.c

Lines changed: 16 additions & 60 deletions
@@ -28,9 +28,7 @@ static bool _bt_posting_valid(IndexTuple posting);
 #endif

 /*
- * Deduplicate items on a leaf page.  The page will have to be split by caller
- * if we cannot successfully free at least newitemsz (we also need space for
- * newitem's line pointer, which isn't included in caller's newitemsz).
+ * Perform a deduplication pass.
  *
  * The general approach taken here is to perform as much deduplication as
  * possible to free as much space as possible.  Note, however, that "single

@@ -43,76 +41,32 @@ static bool _bt_posting_valid(IndexTuple posting);
  * handle those if and when the anticipated right half page gets its own
  * deduplication pass, following further inserts of duplicates.)
  *
- * This function should be called during insertion, when the page doesn't have
- * enough space to fit an incoming newitem.  If the BTP_HAS_GARBAGE page flag
- * was set, caller should have removed any LP_DEAD items by calling
- * _bt_vacuum_one_page() before calling here.  We may still have to kill
- * LP_DEAD items here when the page's BTP_HAS_GARBAGE hint is falsely unset,
- * but that should be rare.  Also, _bt_vacuum_one_page() won't unset the
- * BTP_HAS_GARBAGE flag when it finds no LP_DEAD items, so a successful
- * deduplication pass will always clear it, just to keep things tidy.
+ * The page will have to be split if we cannot successfully free at least
+ * newitemsz (we also need space for newitem's line pointer, which isn't
+ * included in caller's newitemsz).
+ *
+ * Note: Caller should have already deleted all existing items with their
+ * LP_DEAD bits set.
  */
 void
-_bt_dedup_one_page(Relation rel, Buffer buf, Relation heapRel,
-                   IndexTuple newitem, Size newitemsz, bool checkingunique)
+_bt_dedup_pass(Relation rel, Buffer buf, Relation heapRel, IndexTuple newitem,
+               Size newitemsz, bool checkingunique)
 {
     OffsetNumber offnum,
                 minoff,
                 maxoff;
     Page        page = BufferGetPage(buf);
-    BTPageOpaque opaque;
+    BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
     Page        newpage;
-    OffsetNumber deletable[MaxIndexTuplesPerPage];
     BTDedupState state;
-    int         ndeletable = 0;
     Size        pagesaving = 0;
     bool        singlevalstrat = false;
     int         nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);

-    /*
-     * We can't assume that there are no LP_DEAD items.  For one thing, VACUUM
-     * will clear the BTP_HAS_GARBAGE hint without reliably removing items
-     * that are marked LP_DEAD.  We don't want to unnecessarily unset LP_DEAD
-     * bits when deduplicating items.  Allowing it would be correct, though
-     * wasteful.
-     */
-    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
-    minoff = P_FIRSTDATAKEY(opaque);
-    maxoff = PageGetMaxOffsetNumber(page);
-    for (offnum = minoff;
-         offnum <= maxoff;
-         offnum = OffsetNumberNext(offnum))
-    {
-        ItemId      itemid = PageGetItemId(page, offnum);
-
-        if (ItemIdIsDead(itemid))
-            deletable[ndeletable++] = offnum;
-    }
-
-    if (ndeletable > 0)
-    {
-        _bt_delitems_delete(rel, buf, deletable, ndeletable, heapRel);
-
-        /*
-         * Return when a split will be avoided.  This is equivalent to
-         * avoiding a split using the usual _bt_vacuum_one_page() path.
-         */
-        if (PageGetFreeSpace(page) >= newitemsz)
-            return;
-
-        /*
-         * Reconsider number of items on page, in case _bt_delitems_delete()
-         * managed to delete an item or two
-         */
-        minoff = P_FIRSTDATAKEY(opaque);
-        maxoff = PageGetMaxOffsetNumber(page);
-    }
-
     /* Passed-in newitemsz is MAXALIGNED but does not include line pointer */
     newitemsz += sizeof(ItemIdData);

     /*
-     * By here, it's clear that deduplication will definitely be attempted.
      * Initialize deduplication state.
      *
      * It would be possible for maxpostingsize (limit on posting list tuple

@@ -138,6 +92,9 @@ _bt_dedup_one_page(Relation rel, Buffer buf, Relation heapRel,
     /* nintervals should be initialized to zero */
     state->nintervals = 0;

+    minoff = P_FIRSTDATAKEY(opaque);
+    maxoff = PageGetMaxOffsetNumber(page);
+
     /* Determine if "single value" strategy should be used */
     if (!checkingunique)
         singlevalstrat = _bt_do_singleval(rel, page, state, minoff, newitem);

@@ -259,10 +216,9 @@ _bt_dedup_one_page(Relation rel, Buffer buf, Relation heapRel,
     /*
      * By here, it's clear that deduplication will definitely go ahead.
      *
-     * Clear the BTP_HAS_GARBAGE page flag in the unlikely event that it is
-     * still falsely set, just to keep things tidy.  (We can't rely on
-     * _bt_vacuum_one_page() having done this already, and we can't rely on a
-     * page split or VACUUM getting to it in the near future.)
+     * Clear the BTP_HAS_GARBAGE page flag.  The index must be a heapkeyspace
+     * index, and as such we'll never pay attention to BTP_HAS_GARBAGE anyway.
+     * But keep things tidy.
      */
     if (P_HAS_GARBAGE(opaque))
     {

src/backend/access/nbtree/nbtinsert.c

Lines changed: 95 additions & 47 deletions
@@ -58,7 +58,10 @@ static void _bt_insert_parent(Relation rel, Buffer buf, Buffer rbuf,
 static Buffer _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf);
 static inline bool _bt_pgaddtup(Page page, Size itemsize, IndexTuple itup,
                                 OffsetNumber itup_off, bool newfirstdataitem);
-static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
+static void _bt_delete_or_dedup_one_page(Relation rel, Relation heapRel,
+                                         BTInsertState insertstate,
+                                         bool lpdeadonly, bool checkingunique,
+                                         bool uniquedup);

 /*
  * _bt_doinsert() -- Handle insertion of a single index tuple in the tree.

@@ -871,38 +874,14 @@ _bt_findinsertloc(Relation rel,
        }

        /*
-        * If the target page is full, see if we can obtain enough space by
-        * erasing LP_DEAD items.  If that fails to free enough space, see if
-        * we can avoid a page split by performing a deduplication pass over
-        * the page.
-        *
-        * We only perform a deduplication pass for a checkingunique caller
-        * when the incoming item is a duplicate of an existing item on the
-        * leaf page.  This heuristic avoids wasting cycles -- we only expect
-        * to benefit from deduplicating a unique index page when most or all
-        * recently added items are duplicates.  See nbtree/README.
+        * If the target page is full, see if we can obtain enough space using
+        * one or more strategies (e.g. erasing LP_DEAD items, deduplication).
+        * Page splits are expensive, and should only go ahead when truly
+        * necessary.
         */
        if (PageGetFreeSpace(page) < insertstate->itemsz)
-       {
-           if (P_HAS_GARBAGE(opaque))
-           {
-               _bt_vacuum_one_page(rel, insertstate->buf, heapRel);
-               insertstate->bounds_valid = false;
-
-               /* Might as well assume duplicates (if checkingunique) */
-               uniquedup = true;
-           }
-
-           if (itup_key->allequalimage && BTGetDeduplicateItems(rel) &&
-               (!checkingunique || uniquedup) &&
-               PageGetFreeSpace(page) < insertstate->itemsz)
-           {
-               _bt_dedup_one_page(rel, insertstate->buf, heapRel,
-                                  insertstate->itup, insertstate->itemsz,
-                                  checkingunique);
-               insertstate->bounds_valid = false;
-           }
-       }
+           _bt_delete_or_dedup_one_page(rel, heapRel, insertstate, false,
+                                        checkingunique, uniquedup);
    }
    else
    {

@@ -942,8 +921,9 @@ _bt_findinsertloc(Relation rel,
             */
            if (P_HAS_GARBAGE(opaque))
            {
-               _bt_vacuum_one_page(rel, insertstate->buf, heapRel);
-               insertstate->bounds_valid = false;
+               /* Erase LP_DEAD items (won't deduplicate) */
+               _bt_delete_or_dedup_one_page(rel, heapRel, insertstate, true,
+                                            checkingunique, false);

                if (PageGetFreeSpace(page) >= insertstate->itemsz)
                    break;      /* OK, now we have enough space */

@@ -993,14 +973,17 @@ _bt_findinsertloc(Relation rel,
         * performing a posting list split, so delete all LP_DEAD items early.
         * This is the only case where LP_DEAD deletes happen even though
         * there is space for newitem on the page.
+        *
+        * This can only erase LP_DEAD items (it won't deduplicate).
         */
-       _bt_vacuum_one_page(rel, insertstate->buf, heapRel);
+       _bt_delete_or_dedup_one_page(rel, heapRel, insertstate, true,
+                                    checkingunique, false);

        /*
         * Do new binary search.  New insert location cannot overlap with any
         * posting list now.
         */
-       insertstate->bounds_valid = false;
+       Assert(!insertstate->bounds_valid);
        insertstate->postingoff = 0;
        newitemoff = _bt_binsrch_insert(rel, insertstate);
        Assert(insertstate->postingoff == 0);

@@ -2623,32 +2606,59 @@ _bt_pgaddtup(Page page,
 }

 /*
- * _bt_vacuum_one_page - vacuum just one index page.
+ * _bt_delete_or_dedup_one_page - Try to avoid a leaf page split by attempting
+ * a variety of operations.
+ *
+ * There are two operations performed here: deleting items already marked
+ * LP_DEAD, and deduplication.  If both operations fail to free enough space
+ * for the incoming item then caller will go on to split the page.  We always
+ * attempt our preferred strategy (which is to delete items whose LP_DEAD bit
+ * are set) first.  If that doesn't work out we move on to deduplication.
+ *
+ * Caller's checkingunique and uniquedup arguments help us decide if we should
+ * perform deduplication, which is primarily useful with low cardinality data,
+ * but can sometimes absorb version churn.
+ *
+ * Callers that only want us to look for/delete LP_DEAD items can ask for that
+ * directly by passing true 'lpdeadonly' argument.
  *
- * Try to remove LP_DEAD items from the given page.  The passed buffer
- * must be exclusive-locked, but unlike a real VACUUM, we don't need a
- * super-exclusive "cleanup" lock (see nbtree/README).
+ * Note: We used to only delete LP_DEAD items when the BTP_HAS_GARBAGE page
+ * level flag was found set.  The flag was useful back when there wasn't
+ * necessarily one single page for a duplicate tuple to go on (before heap TID
+ * became a part of the key space in version 4 indexes).  But we don't
+ * actually look at the flag anymore (it's not a gating condition for our
+ * caller).  That would cause us to miss tuples that are safe to delete,
+ * without getting any benefit in return.  We know that the alternative is to
+ * split the page; scanning the line pointer array in passing won't have
+ * noticeable overhead.  (We still maintain the BTP_HAS_GARBAGE flag despite
+ * all this because !heapkeyspace indexes must still do a "getting tired"
+ * linear search, and so are likely to get some benefit from using it as a
+ * gating condition.)
  */
 static void
-_bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel)
+_bt_delete_or_dedup_one_page(Relation rel, Relation heapRel,
+                             BTInsertState insertstate,
+                             bool lpdeadonly, bool checkingunique,
+                             bool uniquedup)
 {
     OffsetNumber deletable[MaxIndexTuplesPerPage];
     int         ndeletable = 0;
     OffsetNumber offnum,
-                minoff,
                 maxoff;
+    Buffer      buffer = insertstate->buf;
+    BTScanInsert itup_key = insertstate->itup_key;
     Page        page = BufferGetPage(buffer);
     BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);

     Assert(P_ISLEAF(opaque));
+    Assert(lpdeadonly || itup_key->heapkeyspace);

     /*
      * Scan over all items to see which ones need to be deleted according to
      * LP_DEAD flags.
      */
-    minoff = P_FIRSTDATAKEY(opaque);
     maxoff = PageGetMaxOffsetNumber(page);
-    for (offnum = minoff;
+    for (offnum = P_FIRSTDATAKEY(opaque);
         offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
     {

@@ -2659,12 +2669,50 @@ _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel)
     }

     if (ndeletable > 0)
+    {
         _bt_delitems_delete(rel, buffer, deletable, ndeletable, heapRel);
+        insertstate->bounds_valid = false;
+
+        /* Return when a page split has already been avoided */
+        if (PageGetFreeSpace(page) >= insertstate->itemsz)
+            return;
+
+        /* Might as well assume duplicates (if checkingunique) */
+        uniquedup = true;
+    }
+
+    /*
+     * Some callers only want to delete LP_DEAD items.  Return early for these
+     * callers.
+     *
+     * Note: The page's BTP_HAS_GARBAGE hint flag may still be set when we
+     * return at this point (or when we go on the try either or both of our
+     * other strategies and they also fail).  We do not bother expending a
+     * separate write to clear it, however.  Caller will definitely clear it
+     * when it goes on to split the page (plus deduplication knows to clear
+     * the flag when it actually modifies the page).
+     */
+    if (lpdeadonly)
+        return;
+
+    /*
+     * We can get called in the checkingunique case when there is no reason to
+     * believe that there are any duplicates on the page; we should at least
+     * still check for LP_DEAD items.  If that didn't work out, give up and
+     * let caller split the page.  Deduplication cannot be justified given
+     * there is no reason to think that there are duplicates.
+     */
+    if (checkingunique && !uniquedup)
+        return;
+
+    /* Assume bounds about to be invalidated (this is almost certain now) */
+    insertstate->bounds_valid = false;

     /*
-     * Note: if we didn't find any LP_DEAD items, then the page's
-     * BTP_HAS_GARBAGE hint bit is falsely set.  We do not bother expending a
-     * separate write to clear it, however.  We will clear it when we split
-     * the page, or when deduplication runs.
+     * Perform deduplication pass, though only when it is enabled for the
+     * index and known to be safe (it must be an allequalimage index).
      */
+    if (BTGetDeduplicateItems(rel) && itup_key->allequalimage)
+        _bt_dedup_pass(rel, buffer, heapRel, insertstate->itup,
+                       insertstate->itemsz, checkingunique);
 }

src/backend/access/nbtree/nbtpage.c

Lines changed: 16 additions & 17 deletions
@@ -1187,8 +1187,8 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
      * array of offset numbers.
      *
      * PageIndexTupleOverwrite() won't unset each item's LP_DEAD bit when it
-     * happens to already be set.  Although we unset the BTP_HAS_GARBAGE page
-     * level flag, unsetting individual LP_DEAD bits should still be avoided.
+     * happens to already be set.  It's important that we not interfere with
+     * _bt_delitems_delete().
      */
     for (int i = 0; i < nupdatable; i++)
     {

@@ -1215,20 +1215,12 @@ _bt_delitems_vacuum(Relation rel, Buffer buf,
     opaque->btpo_cycleid = 0;

     /*
-     * Mark the page as not containing any LP_DEAD items.  This is not
-     * certainly true (there might be some that have recently been marked, but
-     * weren't targeted by VACUUM's heap scan), but it will be true often
-     * enough.  VACUUM does not delete items purely because they have their
-     * LP_DEAD bit set, since doing so would necessitate explicitly logging a
-     * latestRemovedXid cutoff (this is how _bt_delitems_delete works).
+     * Clear the BTP_HAS_GARBAGE page flag.
      *
-     * The consequences of falsely unsetting BTP_HAS_GARBAGE should be fairly
-     * limited, since we never falsely unset an LP_DEAD bit.  Workloads that
-     * are particularly dependent on LP_DEAD bits being set quickly will
-     * usually manage to set the BTP_HAS_GARBAGE flag before the page fills up
-     * again anyway.  Furthermore, attempting a deduplication pass will remove
-     * all LP_DEAD items, regardless of whether the BTP_HAS_GARBAGE hint bit
-     * is set or not.
+     * This flag indicates the presence of LP_DEAD items on the page (though
+     * not reliably).  Note that we only trust it with pg_upgrade'd
+     * !heapkeyspace indexes.  That's why clearing it here won't usually
+     * interfere with _bt_delitems_delete().
      */
     opaque->btpo_flags &= ~BTP_HAS_GARBAGE;

@@ -1310,10 +1302,17 @@ _bt_delitems_delete(Relation rel, Buffer buf,

     /*
      * Unlike _bt_delitems_vacuum, we *must not* clear the vacuum cycle ID,
-     * because this is not called by VACUUM.  Just clear the BTP_HAS_GARBAGE
-     * page flag, since we deleted all items with their LP_DEAD bit set.
+     * because this is not called by VACUUM
      */
     opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+    /*
+     * Clear the BTP_HAS_GARBAGE page flag.
+     *
+     * This flag indicates the presence of LP_DEAD items on the page (though
+     * not reliably).  Note that we only trust it with pg_upgrade'd
+     * !heapkeyspace indexes.
+     */
     opaque->btpo_flags &= ~BTP_HAS_GARBAGE;

     MarkBufferDirty(buf);

src/backend/access/nbtree/nbtutils.c

Lines changed: 2 additions & 1 deletion
@@ -1877,7 +1877,8 @@ _bt_killitems(IndexScanDesc scan)
         * Since this can be redone later if needed, mark as dirty hint.
         *
         * Whenever we mark anything LP_DEAD, we also set the page's
-        * BTP_HAS_GARBAGE flag, which is likewise just a hint.
+        * BTP_HAS_GARBAGE flag, which is likewise just a hint.  (Note that we
+        * only rely on the page-level flag in !heapkeyspace indexes.)
         */
        if (killedsomething)
        {
