Commit e628464

Modify btree to delete known-dead index entries without an actual VACUUM.

When we are about to split an index page to do an insertion, first look
to see if any entries marked LP_DELETE exist on the page, and if so remove
them to try to make enough space for the desired insert.  This should reduce
index bloat in heavily-updated tables, although of course you still need
VACUUM eventually to clean up the heap.

Junji Teramoto

1 parent edd49fc   commit e628464

File tree

6 files changed: +144 -16 lines changed

src/backend/access/nbtree/README

Lines changed: 42 additions & 4 deletions

@@ -1,4 +1,4 @@
-$PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.12 2006/05/08 00:00:09 tgl Exp $
+$PostgreSQL: pgsql/src/backend/access/nbtree/README,v 1.13 2006/07/25 19:13:00 tgl Exp $
 
 This directory contains a correct implementation of Lehman and Yao's
 high-concurrency B-tree management algorithm (P. Lehman and S. Yao,
@@ -146,11 +146,12 @@ Because a page can be split even while someone holds a pin on it, it is
 possible that an indexscan will return items that are no longer stored on
 the page it has a pin on, but rather somewhere to the right of that page.
 To ensure that VACUUM can't prematurely remove such heap tuples, we require
-btbulkdelete to obtain super-exclusive lock on every leaf page in the index
-(even pages that don't contain any deletable tuples).  This guarantees that
+btbulkdelete to obtain super-exclusive lock on every leaf page in the index,
+even pages that don't contain any deletable tuples.  This guarantees that
 the btbulkdelete call cannot return while any indexscan is still holding
 a copy of a deleted index tuple.  Note that this requirement does not say
-that btbulkdelete must visit the pages in any particular order.
+that btbulkdelete must visit the pages in any particular order.  (See also
+on-the-fly deletion, below.)
 
 There is no such interlocking for deletion of items in internal pages,
 since backends keep no lock nor pin on a page they have descended past.
@@ -320,6 +321,43 @@ positives, so long as it never gives a false negative. This makes it
 possible to implement the test with a small counter value stored on each
 index page.
 
+On-the-fly deletion of index tuples
+-----------------------------------
+
+If a process visits a heap tuple and finds that it's dead and removable
+(ie, dead to all open transactions, not only that process), then we can
+return to the index and mark the corresponding index entry "known dead",
+allowing subsequent index scans to skip visiting the heap tuple.  The
+"known dead" marking uses the LP_DELETE bit in ItemIds.  This is currently
+only done in plain indexscans, not bitmap scans, because only plain scans
+visit the heap and index "in sync" and so there's not a convenient way
+to do it for bitmap scans.
+
+Once an index tuple has been marked LP_DELETE it can actually be removed
+from the index immediately; since index scans only stop "between" pages,
+no scan can lose its place from such a deletion.  We separate the steps
+because we allow LP_DELETE to be set with only a share lock (it's exactly
+like a hint bit for a heap tuple), but physically removing tuples requires
+exclusive lock.  In the current code we try to remove LP_DELETE tuples when
+we are otherwise faced with having to split a page to do an insertion (and
+hence have exclusive lock on it already).
+
+This leaves the index in a state where it has no entry for a dead tuple
+that still exists in the heap.  This is not a problem for the current
+implementation of VACUUM, but it could be a problem for anything that
+explicitly tries to find index entries for dead tuples.  (However, the
+same situation is created by REINDEX, since it doesn't enter dead
+tuples into the index.)
+
+It's sufficient to have an exclusive lock on the index page, not a
+super-exclusive lock, to do deletion of LP_DELETE items.  It might seem
+that this breaks the interlock between VACUUM and indexscans, but that is
+not so: as long as an indexscanning process has a pin on the page where
+the index item used to be, VACUUM cannot complete its btbulkdelete scan
+and so cannot remove the heap tuple.  This is another reason why
+btbulkdelete has to get super-exclusive lock on every leaf page, not only
+the ones where it actually sees items to delete.
+
 WAL considerations
 ------------------

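The mark-then-remove protocol described in the README section above can be modeled in a few lines of standalone C. This is a toy sketch, not PostgreSQL code: `ToyPage`, `mark_known_dead`, and `vacuum_one_page` are hypothetical names, and a flag byte per item stands in for the LP_DELETE ItemId bit. Step 1 only sets a marker (a share lock would suffice in the real code, like a hint bit); step 2 physically compacts the page and would require an exclusive lock.

```c
#include <assert.h>

#define MAX_ITEMS 8
#define LP_DELETE_FLAG 0x01     /* stand-in for the LP_DELETE ItemId bit */

/* Toy page: fixed-capacity array of keys plus per-item flag bytes. */
typedef struct ToyPage
{
    int      keys[MAX_ITEMS];
    unsigned flags[MAX_ITEMS];
    int      nitems;
} ToyPage;

/* Step 1: a scan that found the heap tuple dead marks the index entry. */
static void
mark_known_dead(ToyPage *page, int idx)
{
    page->flags[idx] |= LP_DELETE_FLAG;
}

/* Step 2: later, physically remove marked entries by compacting the
 * remaining items.  Returns the number of entries removed. */
static int
vacuum_one_page(ToyPage *page)
{
    int src, dst = 0, removed;

    for (src = 0; src < page->nitems; src++)
    {
        if (page->flags[src] & LP_DELETE_FLAG)
            continue;           /* drop known-dead entry */
        page->keys[dst] = page->keys[src];
        page->flags[dst] = page->flags[src];
        dst++;
    }
    removed = page->nitems - dst;
    page->nitems = dst;
    return removed;
}
```

Because scans only stop between pages, a compaction like this cannot invalidate any scan's position, which is why the real code can do it without a super-exclusive lock.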
src/backend/access/nbtree/nbtinsert.c

Lines changed: 73 additions & 8 deletions

@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.141 2006/07/13 16:49:12 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtinsert.c,v 1.142 2006/07/25 19:13:00 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -65,6 +65,7 @@ static void _bt_pgaddtup(Relation rel, Page page,
 			 OffsetNumber itup_off, const char *where);
 static bool _bt_isequal(TupleDesc itupdesc, Page page, OffsetNumber offnum,
 			int keysz, ScanKey scankey);
+static void _bt_vacuum_one_page(Relation rel, Buffer buffer);
 
 
 /*
@@ -262,6 +263,8 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
 									 hbuffer) == HEAPTUPLE_DEAD)
 				{
 					curitemid->lp_flags |= LP_DELETE;
+					opaque->btpo_flags |= BTP_HAS_GARBAGE;
+					/* be sure to mark the proper buffer dirty... */
 					if (nbuf != InvalidBuffer)
 						SetBufferCommitInfoNeedsSave(nbuf);
 					else
@@ -424,11 +427,29 @@ _bt_insertonpg(Relation rel,
 		 */
 		bool		movedright = false;
 
-		while (PageGetFreeSpace(page) < itemsz &&
-			   !P_RIGHTMOST(lpageop) &&
-			   _bt_compare(rel, keysz, scankey, page, P_HIKEY) == 0 &&
-			   random() > (MAX_RANDOM_VALUE / 100))
+		while (PageGetFreeSpace(page) < itemsz)
 		{
+			Buffer		rbuf;
+
+			/*
+			 * before considering moving right, see if we can obtain enough
+			 * space by erasing LP_DELETE items
+			 */
+			if (P_ISLEAF(lpageop) && P_HAS_GARBAGE(lpageop))
+			{
+				_bt_vacuum_one_page(rel, buf);
+				if (PageGetFreeSpace(page) >= itemsz)
+					break;		/* OK, now we have enough space */
+			}
+
+			/*
+			 * nope, so check conditions (b) and (c) enumerated above
+			 */
+			if (P_RIGHTMOST(lpageop) ||
+				_bt_compare(rel, keysz, scankey, page, P_HIKEY) != 0 ||
+				random() <= (MAX_RANDOM_VALUE / 100))
+				break;
+
 			/*
 			 * step right to next non-dead page
 			 *
@@ -438,7 +459,7 @@ _bt_insertonpg(Relation rel,
 			 * pages won't do because we don't know when they will get
 			 * de-linked from the tree.
 			 */
-			Buffer		rbuf = InvalidBuffer;
+			rbuf = InvalidBuffer;
 
 			for (;;)
 			{
@@ -702,9 +723,9 @@ _bt_split(Relation rel, Buffer buf, OffsetNumber firstright,
 	ropaque = (BTPageOpaque) PageGetSpecialPointer(rightpage);
 
 	/* if we're splitting this page, it won't be the root when we're done */
-	/* also, clear the SPLIT_END flag in both pages */
+	/* also, clear the SPLIT_END and HAS_GARBAGE flags in both pages */
 	lopaque->btpo_flags = oopaque->btpo_flags;
-	lopaque->btpo_flags &= ~(BTP_ROOT | BTP_SPLIT_END);
+	lopaque->btpo_flags &= ~(BTP_ROOT | BTP_SPLIT_END | BTP_HAS_GARBAGE);
 	ropaque->btpo_flags = lopaque->btpo_flags;
 	lopaque->btpo_prev = oopaque->btpo_prev;
 	lopaque->btpo_next = BufferGetBlockNumber(rbuf);
@@ -1651,3 +1672,47 @@ _bt_isequal(TupleDesc itupdesc, Page page, OffsetNumber offnum,
 	/* if we get here, the keys are equal */
 	return true;
 }
+
+/*
+ * _bt_vacuum_one_page - vacuum just one index page.
+ *
+ * Try to remove LP_DELETE items from the given page.  The passed buffer
+ * must be exclusive-locked, but unlike a real VACUUM, we don't need a
+ * super-exclusive "cleanup" lock (see nbtree/README).
+ */
+static void
+_bt_vacuum_one_page(Relation rel, Buffer buffer)
+{
+	OffsetNumber deletable[MaxOffsetNumber];
+	int			ndeletable = 0;
+	OffsetNumber offnum,
+				minoff,
+				maxoff;
+	Page		page = BufferGetPage(buffer);
+	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+
+	/*
+	 * Scan over all items to see which ones need deleted
+	 * according to LP_DELETE flags.
+	 */
+	minoff = P_FIRSTDATAKEY(opaque);
+	maxoff = PageGetMaxOffsetNumber(page);
+	for (offnum = minoff;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemId		itemId = PageGetItemId(page, offnum);
+
+		if (ItemIdDeleted(itemId))
+			deletable[ndeletable++] = offnum;
+	}
+
+	if (ndeletable > 0)
+		_bt_delitems(rel, buffer, deletable, ndeletable);
+
+	/*
+	 * Note: if we didn't find any LP_DELETE items, then the page's
+	 * BTP_HAS_GARBAGE hint bit is falsely set.  We do not bother
+	 * expending a separate write to clear it, however.  We will clear
+	 * it when we split the page.
+	 */
+}

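The reworked _bt_insertonpg loop boils down to: reclaim known-dead entries first, and split only as a last resort. The decision logic can be sketched in standalone C; `PageState`, `plan_insert`, and the byte counts below are invented for illustration (the real code works on PageGetFreeSpace and actual page items, and also considers moving right before splitting).

```c
#include <assert.h>

/* Possible outcomes when an insertion doesn't obviously fit. */
typedef enum { FIT_ALREADY, FIT_AFTER_VACUUM, MUST_SPLIT } InsertOutcome;

typedef struct
{
    int free_space;     /* bytes currently free on the page */
    int reclaimable;    /* bytes held by LP_DELETE'd entries */
    int has_garbage;    /* BTP_HAS_GARBAGE-style hint bit */
} PageState;

static InsertOutcome
plan_insert(PageState *page, int itemsz)
{
    if (page->free_space >= itemsz)
        return FIT_ALREADY;

    /* Before splitting, try to reclaim space from known-dead entries. */
    if (page->has_garbage)
    {
        page->free_space += page->reclaimable;
        page->reclaimable = 0;
        page->has_garbage = 0;
        if (page->free_space >= itemsz)
            return FIT_AFTER_VACUUM;
    }
    return MUST_SPLIT;
}
```

The point of checking the hint bit first is that the cleanup scan is skipped entirely on pages that have never had anything marked LP_DELETE.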
src/backend/access/nbtree/nbtpage.c

Lines changed: 10 additions & 1 deletion

@@ -9,7 +9,7 @@
  *
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtpage.c,v 1.98 2006/07/13 16:49:12 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtpage.c,v 1.99 2006/07/25 19:13:00 tgl Exp $
  *
  *	NOTES
  *	   Postgres btree pages look like ordinary relation pages.  The opaque
@@ -668,6 +668,15 @@ _bt_delitems(Relation rel, Buffer buf,
 	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
 	opaque->btpo_cycleid = 0;
 
+	/*
+	 * Mark the page as not containing any LP_DELETE items.  This is not
+	 * certainly true (there might be some that have recently been marked,
+	 * but weren't included in our target-item list), but it will almost
+	 * always be true and it doesn't seem worth an additional page scan
+	 * to check it.  Remember that BTP_HAS_GARBAGE is only a hint anyway.
+	 */
+	opaque->btpo_flags &= ~BTP_HAS_GARBAGE;
+
 	MarkBufferDirty(buf);
 
 	/* XLOG stuff */

src/backend/access/nbtree/nbtutils.c

Lines changed: 7 additions & 1 deletion

@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtutils.c,v 1.77 2006/07/13 16:49:12 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtutils.c,v 1.78 2006/07/25 19:13:00 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -884,9 +884,15 @@ _bt_killitems(IndexScanDesc scan, bool haveLock)
 	 * Since this can be redone later if needed, it's treated the same
 	 * as a commit-hint-bit status update for heap tuples: we mark the
 	 * buffer dirty but don't make a WAL log entry.
+	 *
+	 * Whenever we mark anything LP_DELETEd, we also set the page's
+	 * BTP_HAS_GARBAGE flag, which is likewise just a hint.
 	 */
 	if (killedsomething)
+	{
+		opaque->btpo_flags |= BTP_HAS_GARBAGE;
 		SetBufferCommitInfoNeedsSave(so->currPos.buf);
+	}
 
 	if (!haveLock)
 		LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);

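The _bt_killitems change follows a simple pattern: the scan records which positions proved dead, then applies all the markings in one pass and sets the page-level hint only if something was actually marked. A minimal standalone model of that pattern, with `ScanPos`, `PageHints`, `remember_killed`, and `apply_killed` as hypothetical stand-ins for the scan-opaque state and page flags:

```c
#include <assert.h>

#define MAX_KILLED 16

/* Scan-side bookkeeping: positions whose heap tuples proved dead. */
typedef struct
{
    int killed[MAX_KILLED];
    int nkilled;
} ScanPos;

typedef struct
{
    unsigned item_dead[MAX_KILLED]; /* stand-in for per-item LP_DELETE */
    unsigned has_garbage;           /* stand-in for BTP_HAS_GARBAGE */
} PageHints;

static void
remember_killed(ScanPos *so, int offnum)
{
    if (so->nkilled < MAX_KILLED)
        so->killed[so->nkilled++] = offnum;
}

/* Apply the batch; set the page hint iff we actually marked something
 * (in the real code this is dirty-but-not-WAL-logged, like a hint bit). */
static int
apply_killed(ScanPos *so, PageHints *page)
{
    int i, killedsomething = 0;

    for (i = 0; i < so->nkilled; i++)
    {
        page->item_dead[so->killed[i]] = 1;
        killedsomething = 1;
    }
    if (killedsomething)
        page->has_garbage = 1;
    so->nkilled = 0;
    return killedsomething;
}
```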
src/backend/access/nbtree/nbtxlog.c

Lines changed: 9 additions & 1 deletion

@@ -8,7 +8,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtxlog.c,v 1.35 2006/07/14 14:52:17 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/access/nbtree/nbtxlog.c,v 1.36 2006/07/25 19:13:00 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -346,6 +346,7 @@ btree_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
 	Relation	reln;
 	Buffer		buffer;
 	Page		page;
+	BTPageOpaque opaque;
 
 	if (record->xl_info & XLR_BKP_BLOCK_1)
 		return;
@@ -374,6 +375,13 @@ btree_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
 		PageIndexMultiDelete(page, unused, unend - unused);
 	}
 
+	/*
+	 * Mark the page as not containing any LP_DELETE items --- see comments
+	 * in _bt_delitems().
+	 */
+	opaque = (BTPageOpaque) PageGetSpecialPointer(page);
+	opaque->btpo_flags &= ~BTP_HAS_GARBAGE;
+
 	PageSetLSN(page, lsn);
 	PageSetTLI(page, ThisTimeLineID);
 	MarkBufferDirty(buffer);

src/include/access/nbtree.h

Lines changed: 3 additions & 1 deletion

@@ -7,7 +7,7 @@
  * Portions Copyright (c) 1996-2006, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
- * $PostgreSQL: pgsql/src/include/access/nbtree.h,v 1.101 2006/07/11 21:05:57 tgl Exp $
+ * $PostgreSQL: pgsql/src/include/access/nbtree.h,v 1.102 2006/07/25 19:13:00 tgl Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -71,6 +71,7 @@ typedef BTPageOpaqueData *BTPageOpaque;
 #define BTP_META		(1 << 3)	/* meta-page */
 #define BTP_HALF_DEAD	(1 << 4)	/* empty, but still in tree */
 #define BTP_SPLIT_END	(1 << 5)	/* rightmost page of split group */
+#define BTP_HAS_GARBAGE	(1 << 6)	/* page has LP_DELETEd tuples */
 
 
 /*
@@ -163,6 +164,7 @@ typedef struct BTMetaPageData
 #define P_ISROOT(opaque)		((opaque)->btpo_flags & BTP_ROOT)
 #define P_ISDELETED(opaque)		((opaque)->btpo_flags & BTP_DELETED)
 #define P_IGNORE(opaque)		((opaque)->btpo_flags & (BTP_DELETED|BTP_HALF_DEAD))
+#define P_HAS_GARBAGE(opaque)	((opaque)->btpo_flags & BTP_HAS_GARBAGE)
 
 /*
  *	Lehman and Yao's algorithm requires a ``high key'' on every non-rightmost

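The header change is the standard bit-flag pattern: a new bit value plus a test macro, set with `|=` and cleared in bulk with `&= ~(...)`. A compilable miniature using only the flag values shown in this diff (the `Opaque` struct here is a stand-in for BTPageOpaqueData):

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits as defined in nbtree.h; BTP_HAS_GARBAGE is the new one. */
#define BTP_ROOT        (1 << 0)    /* root page (has no parent) */
#define BTP_SPLIT_END   (1 << 5)    /* rightmost page of split group */
#define BTP_HAS_GARBAGE (1 << 6)    /* page has LP_DELETEd tuples */

/* Stand-in for BTPageOpaqueData. */
typedef struct { uint16_t btpo_flags; } Opaque;

/* Test macro, in the style of P_HAS_GARBAGE. */
#define P_HAS_GARBAGE(opaque) ((opaque)->btpo_flags & BTP_HAS_GARBAGE)
```

The bulk clear mirrors _bt_split, which strips BTP_ROOT, BTP_SPLIT_END, and BTP_HAS_GARBAGE from both halves in one expression.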