Commit b50991e

Fix more crash-safe visibility map bugs, and improve comments.
In lazy_scan_heap, we could issue bogus warnings about incorrect
information in the visibility map, because we checked the visibility
map bit before locking the heap page, creating a race condition. Fix
by rechecking the visibility map bit before we complain. Rejigger
some related logic so that we rely on the possibly-outdated
all_visible_according_to_vm value as little as possible.

In heap_multi_insert, it's not safe to clear the visibility map bit
before beginning the critical section. The visibility map is not
crash-safe unless we treat clearing the bit as a critical operation.
Specifically, if the transaction were to error out after we set the
bit and before entering the critical section, we could end up writing
the heap page to disk (with the bit cleared) and crashing before the
visibility map page made it to disk. That would be bad. heap_insert
has this correct, but somehow the order of operations got rearranged
when heap_multi_insert was added.

Also, add some more comments to visibilitymap_test, lazy_scan_heap,
and IndexOnlyNext, expounding on concurrency issues.

Per extensive code review by Andres Freund, and further review by Tom
Lane, who also made the original report about the bogus warnings.
1 parent 92135ea · commit b50991e

File tree

4 files changed: +61 −15 lines


src/backend/access/heap/heapam.c

Lines changed: 9 additions & 9 deletions
@@ -2149,15 +2149,6 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
                                            &vmbuffer, NULL);
         page = BufferGetPage(buffer);
 
-        if (PageIsAllVisible(page))
-        {
-            all_visible_cleared = true;
-            PageClearAllVisible(page);
-            visibilitymap_clear(relation,
-                                BufferGetBlockNumber(buffer),
-                                vmbuffer);
-        }
-
         /* NO EREPORT(ERROR) from here till changes are logged */
         START_CRIT_SECTION();
 
@@ -2172,6 +2163,15 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
             RelationPutHeapTuple(relation, buffer, heaptup);
         }
 
+        if (PageIsAllVisible(page))
+        {
+            all_visible_cleared = true;
+            PageClearAllVisible(page);
+            visibilitymap_clear(relation,
+                                BufferGetBlockNumber(buffer),
+                                vmbuffer);
+        }
+
         /*
          * XXX Should we set PageSetPrunable on this page ? See heap_insert()
          */
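
The heapam.c change only moves the PageIsAllVisible block, but the ordering is the whole point. A minimal self-contained sketch of the two orderings, using stand-in stub functions rather than PostgreSQL's APIs (start_crit, end_crit, insert_tuples, clear_vm_bit and wal_log are hypothetical stand-ins for START_CRIT_SECTION()/END_CRIT_SECTION(), RelationPutHeapTuple, PageClearAllVisible plus visibilitymap_clear, and XLogInsert):

/* Hedged illustration, not PostgreSQL code: the stubs just print the
 * step they stand for, so the two orderings can be compared. */
#include <stdio.h>

static void start_crit(void)    { puts("START_CRIT_SECTION"); }
static void end_crit(void)      { puts("END_CRIT_SECTION"); }
static void insert_tuples(void) { puts("place tuples on heap page"); }
static void clear_vm_bit(void)  { puts("clear page-level + VM bit"); }
static void wal_log(void)       { puts("emit WAL record"); }

int main(void)
{
    /* Old heap_multi_insert order: the bit is cleared before the
     * critical section, so an ERROR between the clear and
     * START_CRIT_SECTION() leaves a dirty heap page (bit cleared) that
     * can reach disk while the VM page never does; after a crash the
     * two disagree. */
    puts("-- unsafe order --");
    clear_vm_bit();
    start_crit();
    insert_tuples();
    wal_log();
    end_crit();

    /* Fixed order, matching heap_insert: the clear happens inside the
     * critical section, where any ERROR escalates to PANIC, so the
     * heap page and the visibility map can no longer be torn apart. */
    puts("-- safe order --");
    start_crit();
    insert_tuples();
    clear_vm_bit();
    wal_log();
    end_crit();
    return 0;
}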

src/backend/access/heap/visibilitymap.c

Lines changed: 10 additions & 1 deletion
@@ -293,6 +293,13 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, XLogRecPtr recptr,
  * relation. On return, *buf is a valid buffer with the map page containing
  * the bit for heapBlk, or InvalidBuffer. The caller is responsible for
  * releasing *buf after it's done testing and setting bits.
+ *
+ * NOTE: This function is typically called without a lock on the heap page,
+ * so somebody else could change the bit just after we look at it.  In fact,
+ * since we don't lock the visibility map page either, it's even possible that
+ * someone else could have changed the bit just before we look at it, but yet
+ * we might see the old value.  It is the caller's responsibility to deal with
+ * all concurrency issues!
  */
 bool
 visibilitymap_test(Relation rel, BlockNumber heapBlk, Buffer *buf)
@@ -327,7 +334,9 @@ visibilitymap_test(Relation rel, BlockNumber heapBlk, Buffer *buf)
     map = PageGetContents(BufferGetPage(*buf));
 
     /*
-     * We don't need to lock the page, as we're only looking at a single bit.
+     * A single-bit read is atomic.  There could be memory-ordering effects
+     * here, but for performance reasons we make it the caller's job to worry
+     * about that.
     */
     result = (map[mapByte] & (1 << mapBit)) ? true : false;
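
For context on mapByte and mapBit: the map stores one bit per heap block, so the test is a single byte read plus a mask, which is why the new comment can call the read atomic (stale, perhaps, but never torn). Below is a standalone sketch of the addressing arithmetic; the MAPSIZE value is an assumption standing in for the real macro, which visibilitymap.c derives from BLCKSZ and the page header size:

/* Sketch of visibility-map addressing: one bit per heap block.
 * MAPSIZE below is an assumed stand-in, not PostgreSQL's definition. */
#include <stdio.h>

#define MAPSIZE             8168u   /* assumption, see lead-in */
#define HEAPBLOCKS_PER_BYTE 8u
#define HEAPBLOCKS_PER_PAGE (MAPSIZE * HEAPBLOCKS_PER_BYTE)

int main(void)
{
    unsigned heapBlk  = 123456;     /* arbitrary heap block number */
    unsigned mapBlock = heapBlk / HEAPBLOCKS_PER_PAGE;
    unsigned mapByte  = (heapBlk % HEAPBLOCKS_PER_PAGE) / HEAPBLOCKS_PER_BYTE;
    unsigned mapBit   = heapBlk % HEAPBLOCKS_PER_BYTE;

    /* The test is then: map[mapByte] & (1 << mapBit), a one-byte read. */
    printf("heap block %u -> VM page %u, byte %u, bit %u\n",
           heapBlk, mapBlock, mapByte, mapBit);
    return 0;
}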

src/backend/commands/vacuumlazy.c

Lines changed: 29 additions & 5 deletions
@@ -419,6 +419,15 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
      * Note: if scan_all is true, we won't actually skip any pages; but we
      * maintain next_not_all_visible_block anyway, so as to set up the
      * all_visible_according_to_vm flag correctly for each page.
+     *
+     * Note: The value returned by visibilitymap_test could be slightly
+     * out-of-date, since we make this test before reading the corresponding
+     * heap page or locking the buffer.  This is OK.  If we mistakenly think
+     * that the page is all-visible when in fact the flag's just been cleared,
+     * we might fail to vacuum the page.  But it's OK to skip pages when
+     * scan_all is not set, so no great harm done; the next vacuum will find
+     * them.  If we make the reverse mistake and vacuum a page unnecessarily,
+     * it'll just be a no-op.
      */
     for (next_not_all_visible_block = 0;
          next_not_all_visible_block < nblocks;
@@ -852,22 +861,37 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
         freespace = PageGetHeapFreeSpace(page);
 
         /* mark page all-visible, if appropriate */
-        if (all_visible && !all_visible_according_to_vm)
+        if (all_visible)
         {
             if (!PageIsAllVisible(page))
             {
                 PageSetAllVisible(page);
                 MarkBufferDirty(buf);
+                visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
+                                  visibility_cutoff_xid);
+            }
+            else if (!all_visible_according_to_vm)
+            {
+                /*
+                 * It should never be the case that the visibility map page
+                 * is set while the page-level bit is clear, but the reverse
+                 * is allowed.  Set the visibility map bit as well so that
+                 * we get back in sync.
+                 */
+                visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
+                                  visibility_cutoff_xid);
             }
-            visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer,
-                              visibility_cutoff_xid);
         }
 
         /*
          * As of PostgreSQL 9.2, the visibility map bit should never be set if
-         * the page-level bit is clear.
+         * the page-level bit is clear.  However, it's possible that the bit
+         * got cleared after we checked it and before we took the buffer
+         * content lock, so we must recheck before jumping to the conclusion
+         * that something bad has happened.
          */
-        else if (all_visible_according_to_vm && !PageIsAllVisible(page))
+        else if (all_visible_according_to_vm && !PageIsAllVisible(page)
+                 && visibilitymap_test(onerel, blkno, &vmbuffer))
         {
             elog(WARNING, "page is not marked all-visible but visibility map bit is set in relation \"%s\" page %u",
                  relname, blkno);
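
The logic change in the warning branch is easiest to see in isolation: the old condition trusted a hint taken before the buffer lock, while the new one rechecks the map bit afterwards. A toy sketch, with plain booleans standing in for the pre-lock visibilitymap_test result, PageIsAllVisible under the lock, and a fresh visibilitymap_test under the lock (this is a model, not the vacuum code):

/* Toy model of the bogus-warning fix; see the lead-in for what each
 * boolean stands for. */
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool vm_hint    = true;     /* bit looked set before the buffer lock */
    bool page_bit   = false;    /* PageIsAllVisible, checked under lock */
    bool vm_bit_now = false;    /* bit rechecked under lock: now cleared */

    /* Old condition: fires on a stale hint. */
    if (vm_hint && !page_bit)
        puts("old check: bogus WARNING");

    /* New condition: a concurrent clear between the hint and the lock
     * stays silent; only a genuine page/map disagreement warns. */
    if (vm_hint && !page_bit && vm_bit_now)
        puts("new check: WARNING (real inconsistency)");
    else
        puts("new check: quiet");
    return 0;
}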

src/backend/executor/nodeIndexonlyscan.c

Lines changed: 13 additions & 0 deletions
@@ -82,6 +82,19 @@ IndexOnlyNext(IndexOnlyScanState *node)
          * We can skip the heap fetch if the TID references a heap page on
          * which all tuples are known visible to everybody. In any case,
          * we'll use the index tuple not the heap tuple as the data source.
+         *
+         * Note on Memory Ordering Effects: visibilitymap_test does not lock
+         * the visibility map buffer, and therefore the result we read here
+         * could be slightly stale.  However, it can't be stale enough to
+         * matter.  It suffices to show that (1) there is a read barrier
+         * between the time we read the index TID and the time we test the
+         * visibility map; and (2) there is a write barrier between the time
+         * some other concurrent process clears the visibility map bit and the
+         * time it inserts the index TID.  Since acquiring or releasing a
+         * LWLock interposes a full barrier, this is easy to show: (1) is
+         * satisfied by the release of the index buffer content lock after
+         * reading the TID; and (2) is satisfied by the acquisition of the
+         * buffer content lock in order to insert the TID.
          */
         if (!visibilitymap_test(scandesc->heapRelation,
                                 ItemPointerGetBlockNumber(tid),
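
The two-barrier argument in that comment can be modeled with C11 atomics. The sketch below is an analogue only, not server code: sequentially consistent atomic accesses stand in for the full barriers that LWLock acquire and release provide, and the two globals stand in for the visibility map bit and the index entry:

/* Analogue of the IndexOnlyNext barrier argument.  Compile with
 * -pthread.  seq_cst atomics model the LWLock full barriers. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_bool all_visible = true; /* the visibility map bit */
static atomic_int  index_tid   = 0;    /* 0 = TID not inserted yet */

static void *writer(void *arg)         /* the inserting backend */
{
    (void) arg;
    atomic_store(&all_visible, false); /* clear the VM bit first... */
    atomic_store(&index_tid, 42);      /* ...then publish the TID; the
                                        * lock taken to insert the TID
                                        * is the write barrier */
    return NULL;
}

static void *reader(void *arg)         /* the index-only scan */
{
    (void) arg;
    int tid = atomic_load(&index_tid); /* read TID from the index;
                                        * releasing the index buffer
                                        * lock is the read barrier */
    if (tid != 0 && atomic_load(&all_visible))
        puts("skip heap fetch");       /* unreachable: seeing the TID
                                        * implies seeing the clear */
    else
        puts("heap fetch (or no TID yet)");
    return NULL;
}

int main(void)
{
    pthread_t w, r;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}

Under these rules a scan that sees the new TID must also see the cleared bit, so a stale "all visible" answer can only occur for tuples whose TID the scan could not have found anyway.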

0 commit comments

