Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commite4fa6c9

Browse files
Fix bug in nbtree VACUUM "skip full scan" feature.
Commit857f9c3 (which taught nbtree VACUUM to skip a scan of theindex from btcleanup in situations where it doesn't seem worth it) madeVACUUM maintain the oldest btpo.xact among all deleted pages for theindex as a whole. It failed to handle all the details surrounding pagesthat are deleted by the current VACUUM operation correctly (though pagesdeleted by some previous VACUUM operation were processed correctly).The most immediate problem was that the special area of the page wasexamined without a buffer pin at one point. More fundamentally, thehandling failed to account for the full range of _bt_pagedel()behaviors. For example, _bt_pagedel() sometimes deletes internal pagesin passing, as part of deleting an entire subtree with btvacuumpage()caller's page as the leaf level page. The original leaf page passed to_bt_pagedel() might not be the page that it deletes first in cases wheredeletion can take place.It's unclear how disruptive this bug may have been, or what symptomsusers might want to look out for. The issue was spotted duringunrelated code review.To fix, push down the logic for maintaining the oldest btpo.xact to_bt_pagedel(). btvacuumpage() is now responsible for pages that werefully deleted by a previous VACUUM operation, while _bt_pagedel() is nowresponsible for pages that were deleted by the current VACUUM operation(this includes half-dead pages from a previous interrupted VACUUMoperation that become fully deleted in _bt_pagedel()). Note that_bt_pagedel() should never encounter an existing deleted page.This commit theoretically breaks the ABI of a stable release by changingthe signature of _bt_pagedel(). However, if any third party extensionis actually affected by this, then it must already be completely broken(since there are numerous assumptions made in _bt_pagedel() that cannotbe met outside of VACUUM). It seems highly unlikely that such anextension actually exists, in any case.Author: Peter GeogheganReviewed-By: Masahiko SawadaDiscussion:https://postgr.es/m/CAH2-WzkrXBcMQWAYUJMFTTvzx_r4q=pYSjDe07JnUXhe+OZnJA@mail.gmail.comBackpatch: 11-, where the "skip full scan" feature was introduced.
1 parent98a4d69 commite4fa6c9

File tree

3 files changed

+149
-83
lines changed

3 files changed

+149
-83
lines changed

‎src/backend/access/nbtree/nbtpage.c

Lines changed: 94 additions & 35 deletions
Original file line numberDiff line numberDiff line change
@@ -33,9 +33,9 @@
3333
#include"storage/predicate.h"
3434
#include"utils/snapmgr.h"
3535

36-
staticbool_bt_mark_page_halfdead(Relationrel,Bufferbuf,BTStackstack);
36+
staticbool_bt_mark_page_halfdead(Relationrel,Bufferleafbuf,BTStackstack);
3737
staticbool_bt_unlink_halfdead_page(Relationrel,Bufferleafbuf,
38-
bool*rightsib_empty);
38+
bool*rightsib_empty,TransactionId*oldestBtpoXact);
3939
staticbool_bt_lock_branch_parent(Relationrel,BlockNumberchild,
4040
BTStackstack,Buffer*topparent,OffsetNumber*topoff,
4141
BlockNumber*target,BlockNumber*rightsib);
@@ -1219,27 +1219,35 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
12191219
}
12201220

12211221
/*
1222-
* _bt_pagedel() -- Delete a page from the b-tree, if legal to do so.
1222+
* _bt_pagedel() -- Delete aleafpage from the b-tree, if legal to do so.
12231223
*
1224-
* This action unlinks the page from the b-tree structure, removing all
1224+
* This action unlinks theleafpage from the b-tree structure, removing all
12251225
* pointers leading to it --- but not touching its own left and right links.
12261226
* The page cannot be physically reclaimed right away, since other processes
12271227
* may currently be trying to follow links leading to the page; they have to
12281228
* be allowed to use its right-link to recover. See nbtree/README.
12291229
*
12301230
* On entry, the target buffer must be pinned and locked (either read or write
1231-
* lock is OK). This lock and pin will be dropped before exiting.
1231+
* lock is OK). The page must be an empty leaf page, which may be half-dead
1232+
* already (a half-dead page should only be passed to us when an earlier
1233+
* VACUUM operation was interrupted, though). Note in particular that caller
1234+
* should never pass a buffer containing an existing deleted page here. The
1235+
* lock and pin on caller's buffer will be dropped before we return.
12321236
*
12331237
* Returns the number of pages successfully deleted (zero if page cannot
1234-
* be deleted now; could be more than one if parent or sibling pages were
1235-
* deleted too).
1238+
* be deleted now; could be more than one if parent or right sibling pages
1239+
* were deleted too).
1240+
*
1241+
* Maintains *oldestBtpoXact for any pages that get deleted. Caller is
1242+
* responsible for maintaining *oldestBtpoXact in the case of pages that were
1243+
* deleted by a previous VACUUM.
12361244
*
12371245
* NOTE: this leaks memory. Rather than trying to clean up everything
12381246
* carefully, it's better to run it in a temp context that can be reset
12391247
* frequently.
12401248
*/
12411249
int
1242-
_bt_pagedel(Relationrel,Bufferbuf)
1250+
_bt_pagedel(Relationrel,Bufferleafbuf,TransactionId*oldestBtpoXact)
12431251
{
12441252
intndeleted=0;
12451253
BlockNumberrightsib;
@@ -1260,14 +1268,21 @@ _bt_pagedel(Relation rel, Buffer buf)
12601268

12611269
for (;;)
12621270
{
1263-
page=BufferGetPage(buf);
1271+
page=BufferGetPage(leafbuf);
12641272
opaque= (BTPageOpaque)PageGetSpecialPointer(page);
12651273

12661274
/*
12671275
* Internal pages are never deleted directly, only as part of deleting
12681276
* the whole branch all the way down to leaf level.
1277+
*
1278+
* Also check for deleted pages here. Caller never passes us a fully
1279+
* deleted page. Only VACUUM can delete pages, so there can't have
1280+
* been a concurrent deletion. Assume that we reached any deleted
1281+
* page encountered here by following a sibling link, and that the
1282+
* index is corrupt.
12691283
*/
1270-
if (!P_ISLEAF(opaque))
1284+
Assert(!P_ISDELETED(opaque));
1285+
if (!P_ISLEAF(opaque)||P_ISDELETED(opaque))
12711286
{
12721287
/*
12731288
* Pre-9.4 page deletion only marked internal pages as half-dead,
@@ -1286,13 +1301,22 @@ _bt_pagedel(Relation rel, Buffer buf)
12861301
errmsg("index \"%s\" contains a half-dead internal page",
12871302
RelationGetRelationName(rel)),
12881303
errhint("This can be caused by an interrupted VACUUM in version 9.3 or older, before upgrade. Please REINDEX it.")));
1289-
_bt_relbuf(rel,buf);
1304+
1305+
if (P_ISDELETED(opaque))
1306+
ereport(LOG,
1307+
(errcode(ERRCODE_INDEX_CORRUPTED),
1308+
errmsg_internal("found deleted block %u while following right link in index \"%s\"",
1309+
BufferGetBlockNumber(leafbuf),
1310+
RelationGetRelationName(rel))));
1311+
1312+
_bt_relbuf(rel,leafbuf);
12901313
returnndeleted;
12911314
}
12921315

12931316
/*
12941317
* We can never delete rightmost pages nor root pages. While at it,
1295-
* check that page is not already deleted and is empty.
1318+
* check that page is empty, since it's possible that the leafbuf page
1319+
* was empty a moment ago, but has since had some inserts.
12961320
*
12971321
* To keep the algorithm simple, we also never delete an incompletely
12981322
* split page (they should be rare enough that this doesn't make any
@@ -1307,14 +1331,14 @@ _bt_pagedel(Relation rel, Buffer buf)
13071331
* to. On subsequent iterations, we know we stepped right from a page
13081332
* that passed these tests, so it's OK.
13091333
*/
1310-
if (P_RIGHTMOST(opaque)||P_ISROOT(opaque)||P_ISDELETED(opaque)||
1334+
if (P_RIGHTMOST(opaque)||P_ISROOT(opaque)||
13111335
P_FIRSTDATAKEY(opaque) <=PageGetMaxOffsetNumber(page)||
13121336
P_INCOMPLETE_SPLIT(opaque))
13131337
{
13141338
/* Should never fail to delete a half-dead page */
13151339
Assert(!P_ISHALFDEAD(opaque));
13161340

1317-
_bt_relbuf(rel,buf);
1341+
_bt_relbuf(rel,leafbuf);
13181342
returnndeleted;
13191343
}
13201344

@@ -1351,7 +1375,7 @@ _bt_pagedel(Relation rel, Buffer buf)
13511375
* To avoid deadlocks, we'd better drop the leaf page lock
13521376
* before going further.
13531377
*/
1354-
LockBuffer(buf,BUFFER_LOCK_UNLOCK);
1378+
LockBuffer(leafbuf,BUFFER_LOCK_UNLOCK);
13551379

13561380
/*
13571381
* Fetch the left sibling, to check that it's not marked with
@@ -1375,10 +1399,10 @@ _bt_pagedel(Relation rel, Buffer buf)
13751399
* incompletely-split page to be split again. So we don't
13761400
* need to walk right here.
13771401
*/
1378-
if (lopaque->btpo_next==BufferGetBlockNumber(buf)&&
1402+
if (lopaque->btpo_next==BufferGetBlockNumber(leafbuf)&&
13791403
P_INCOMPLETE_SPLIT(lopaque))
13801404
{
1381-
ReleaseBuffer(buf);
1405+
ReleaseBuffer(leafbuf);
13821406
_bt_relbuf(rel,lbuf);
13831407
returnndeleted;
13841408
}
@@ -1395,40 +1419,59 @@ _bt_pagedel(Relation rel, Buffer buf)
13951419
_bt_relbuf(rel,lbuf);
13961420

13971421
/*
1398-
* Re-lock the leaf page, and start over, to re-check that the
1399-
* page can still be deleted.
1422+
* Re-lock the leaf page, and start over to use our stack
1423+
* within _bt_mark_page_halfdead. We must do it that way
1424+
* because it's possible that leafbuf can no longer be
1425+
* deleted. We need to recheck.
14001426
*/
1401-
LockBuffer(buf,BT_WRITE);
1427+
LockBuffer(leafbuf,BT_WRITE);
14021428
continue;
14031429
}
14041430

1405-
if (!_bt_mark_page_halfdead(rel,buf,stack))
1431+
/*
1432+
* See if it's safe to delete the leaf page, and determine how
1433+
* many parent/internal pages above the leaf level will be
1434+
* deleted. If it's safe then _bt_mark_page_halfdead will also
1435+
* perform the first phase of deletion, which includes marking the
1436+
* leafbuf page half-dead.
1437+
*/
1438+
Assert(P_ISLEAF(opaque)&& !P_IGNORE(opaque));
1439+
if (!_bt_mark_page_halfdead(rel,leafbuf,stack))
14061440
{
1407-
_bt_relbuf(rel,buf);
1441+
_bt_relbuf(rel,leafbuf);
14081442
returnndeleted;
14091443
}
14101444
}
14111445

14121446
/*
14131447
* Then unlink it from its siblings. Each call to
14141448
* _bt_unlink_halfdead_page unlinks the topmost page from the branch,
1415-
* making it shallower. Iterate until the leaf page is gone.
1449+
* making it shallower. Iterate until the leafbuf page is deleted.
1450+
*
1451+
* _bt_unlink_halfdead_page should never fail, since we established
1452+
* that deletion is generally safe in _bt_mark_page_halfdead.
14161453
*/
14171454
rightsib_empty= false;
1455+
Assert(P_ISLEAF(opaque)&&P_ISHALFDEAD(opaque));
14181456
while (P_ISHALFDEAD(opaque))
14191457
{
1420-
/* will check for interrupts, once lock is released */
1421-
if (!_bt_unlink_halfdead_page(rel,buf,&rightsib_empty))
1458+
/* Check for interrupts in _bt_unlink_halfdead_page */
1459+
if (!_bt_unlink_halfdead_page(rel,leafbuf,&rightsib_empty,
1460+
oldestBtpoXact))
14221461
{
1423-
/* _bt_unlink_halfdead_pagealready released buffer */
1462+
/* _bt_unlink_halfdead_pagefailed, released buffer */
14241463
returnndeleted;
14251464
}
14261465
ndeleted++;
14271466
}
14281467

1468+
Assert(P_ISLEAF(opaque)&&P_ISDELETED(opaque));
1469+
Assert(TransactionIdFollowsOrEquals(opaque->btpo.xact,
1470+
*oldestBtpoXact));
1471+
14291472
rightsib=opaque->btpo_next;
14301473

1431-
_bt_relbuf(rel,buf);
1474+
_bt_relbuf(rel,leafbuf);
14321475

14331476
/*
14341477
* Check here, as calling loops will have locks held, preventing
@@ -1448,7 +1491,7 @@ _bt_pagedel(Relation rel, Buffer buf)
14481491
if (!rightsib_empty)
14491492
break;
14501493

1451-
buf=_bt_getbuf(rel,rightsib,BT_WRITE);
1494+
leafbuf=_bt_getbuf(rel,rightsib,BT_WRITE);
14521495
}
14531496

14541497
returnndeleted;
@@ -1641,17 +1684,28 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
16411684
* of the whole branch, including the leaf page itself, iterate until the
16421685
* leaf page is deleted.
16431686
*
1644-
* Returns 'false' if the page could not be unlinked (shouldn't happen).
1645-
* If the (new) right sibling of the page is empty, *rightsib_empty is set
1646-
* to true.
1687+
* Returns 'false' if the page could not be unlinked (shouldn't happen). If
1688+
* the right sibling of the current target page is empty, *rightsib_empty is
1689+
* set to true, allowing caller to delete the target's right sibling page in
1690+
* passing. Note that *rightsib_empty is only actually used by caller when
1691+
* target page is leafbuf, following last call here for leafbuf/the subtree
1692+
* containing leafbuf. (We always set *rightsib_empty for caller, just to be
1693+
* consistent.)
1694+
*
1695+
* We maintain *oldestBtpoXact for pages that are deleted by the current
1696+
* VACUUM operation here. This must be handled here because we conservatively
1697+
* assume that there needs to be a new call to ReadNewTransactionId() each
1698+
* time a page gets deleted. See comments about the underlying assumption
1699+
* below.
16471700
*
16481701
* Must hold pin and lock on leafbuf at entry (read or write doesn't matter).
16491702
* On success exit, we'll be holding pin and write lock. On failure exit,
16501703
* we'll release both pin and lock before returning (we define it that way
16511704
* to avoid having to reacquire a lock we already released).
16521705
*/
16531706
staticbool
1654-
_bt_unlink_halfdead_page(Relationrel,Bufferleafbuf,bool*rightsib_empty)
1707+
_bt_unlink_halfdead_page(Relationrel,Bufferleafbuf,bool*rightsib_empty,
1708+
TransactionId*oldestBtpoXact)
16551709
{
16561710
BlockNumberleafblkno=BufferGetBlockNumber(leafbuf);
16571711
BlockNumberleafleftsib;
@@ -1788,9 +1842,9 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
17881842
lbuf=InvalidBuffer;
17891843

17901844
/*
1791-
* Next write-lock the target page itself. It should beokay to takejust
1792-
*a write lock nota superexclusive lock, since noscans would stop on an
1793-
*emptypage.
1845+
* Next write-lock the target page itself. It'sokay to takea write lock
1846+
*rather thana superexclusive lock, since noscan will stop on an empty
1847+
* page.
17941848
*/
17951849
LockBuffer(buf,BT_WRITE);
17961850
page=BufferGetPage(buf);
@@ -1928,6 +1982,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
19281982
*/
19291983
page=BufferGetPage(buf);
19301984
opaque= (BTPageOpaque)PageGetSpecialPointer(page);
1985+
Assert(P_ISHALFDEAD(opaque)|| !P_ISLEAF(opaque));
19311986
opaque->btpo_flags &= ~BTP_HALF_DEAD;
19321987
opaque->btpo_flags |=BTP_DELETED;
19331988
opaque->btpo.xact=ReadNewTransactionId();
@@ -2030,6 +2085,10 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
20302085
_bt_relbuf(rel,lbuf);
20312086
_bt_relbuf(rel,rbuf);
20322087

2088+
if (!TransactionIdIsValid(*oldestBtpoXact)||
2089+
TransactionIdPrecedes(opaque->btpo.xact,*oldestBtpoXact))
2090+
*oldestBtpoXact=opaque->btpo.xact;
2091+
20332092
/*
20342093
* Release the target, if it was not the leaf block. The leaf is always
20352094
* kept locked.

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp