Commit 0d1fe9f

Move page initialization from RelationAddExtraBlocks() to use, take 2.

Previously we initialized pages when bulk extending in
RelationAddExtraBlocks(). That has a major disadvantage: it ties
RelationAddExtraBlocks() to heap, as other types of storage are likely
to need different amounts of special space and have different amounts
of free space (previously determined by PageGetHeapFreeSpace()).

That we're relying on initializing pages, but not WAL-logging the
initialization, also means the risk of getting
"WARNING: relation \"%s\" page %u is uninitialized --- fixing"
style warnings in vacuums after crashes/immediate shutdowns is
considerably higher. The warnings sound much more serious than they
are.

Fix those two issues together by not initializing pages in
RelationAddExtraBlocks() (but continue to do so in
RelationGetBufferForTuple(), which is linked much more closely to
heap), and by accepting uninitialized pages as normal in
vacuumlazy.c. When vacuumlazy encounters an empty page it now adds it
to the FSM, but does nothing else. We chose not to issue a debug
message, much less a warning, in that case - it seems rarely useful,
and quite likely to scare people unnecessarily.

For now empty pages aren't added to the VM, because standbys would not
re-discover such pages after a promotion. In contrast to other sources
of empty pages, there are no corresponding WAL records triggering FSM
updates during replay.

Previously, when extending the relation, there was a moment between
extending the relation and acquiring an exclusive lock on the new
page in which another backend could lock the page. To avoid new
content being put on that new page, vacuumlazy needed to acquire the
extension lock for a brief moment when encountering a new page. A
second corner case, only working somewhat by accident, was that
RelationGetBufferForTuple() sometimes checks the last page in a
relation for free space without consulting the FSM; that only worked
because PageGetHeapFreeSpace() interprets the zero page header in a
new page as no free space. The lack of proper handling for this
required reverting the previous attempt in 6842005.

This issue can be solved by using RBM_ZERO_AND_LOCK when extending the
relation, thereby avoiding this window. There's some added complexity
when RelationGetBufferForTuple() is called with another buffer (for
updates), to avoid deadlocks, but that's rarely hit at runtime.

Author: Andres Freund
Reviewed-By: Tom Lane
Discussion: https://postgr.es/m/20181219083945.6khtgm36mivonhva@alap3.anarazel.de
1 parent ac3a9af, commit 0d1fe9f

2 files changed: 126 additions, 75 deletions

src/backend/access/heap/hio.c
Lines changed: 82 additions & 37 deletions
@@ -74,23 +74,31 @@ RelationPutHeapTuple(Relation relation,
 }
 
 /*
- * Read in a buffer, using bulk-insert strategy if bistate isn't NULL.
+ * Read in a buffer in mode, using bulk-insert strategy if bistate isn't NULL.
  */
 static Buffer
 ReadBufferBI(Relation relation, BlockNumber targetBlock,
-			 BulkInsertState bistate)
+			 ReadBufferMode mode, BulkInsertState bistate)
 {
 	Buffer		buffer;
 
 	/* If not bulk-insert, exactly like ReadBuffer */
 	if (!bistate)
-		return ReadBuffer(relation, targetBlock);
+		return ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
+								  mode, NULL);
 
 	/* If we have the desired block already pinned, re-pin and return it */
 	if (bistate->current_buf != InvalidBuffer)
 	{
 		if (BufferGetBlockNumber(bistate->current_buf) == targetBlock)
 		{
+			/*
+			 * Currently the LOCK variants are only used for extending
+			 * relation, which should never reach this branch.
+			 */
+			Assert(mode != RBM_ZERO_AND_LOCK &&
+				   mode != RBM_ZERO_AND_CLEANUP_LOCK);
+
 			IncrBufferRefCount(bistate->current_buf);
 			return bistate->current_buf;
 		}
@@ -101,7 +109,7 @@ ReadBufferBI(Relation relation, BlockNumber targetBlock,
 
 	/* Perform a read using the buffer strategy */
 	buffer = ReadBufferExtended(relation, MAIN_FORKNUM, targetBlock,
-								RBM_NORMAL, bistate->strategy);
+								mode, bistate->strategy);
 
 	/* Save the selected block as target for future inserts */
 	IncrBufferRefCount(buffer);
@@ -204,30 +212,29 @@ RelationAddExtraBlocks(Relation relation, BulkInsertState bistate)
 		/*
 		 * Extend by one page.  This should generally match the main-line
 		 * extension code in RelationGetBufferForTuple, except that we hold
-		 * the relation extension lock throughout.
+		 * the relation extension lock throughout, and we don't immediately
+		 * initialize the page (see below).
 		 */
-		buffer = ReadBufferBI(relation, P_NEW, bistate);
-
-		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+		buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
 		page = BufferGetPage(buffer);
 
 		if (!PageIsNew(page))
 			elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
 				 BufferGetBlockNumber(buffer),
 				 RelationGetRelationName(relation));
 
-		PageInit(page, BufferGetPageSize(buffer), 0);
-
 		/*
-		 * We mark all the new buffers dirty, but do nothing to write them
-		 * out; they'll probably get used soon, and even if they are not, a
-		 * crash will leave an okay all-zeroes page on disk.
+		 * Add the page to the FSM without initializing. If we were to
+		 * initialize here, the page would potentially get flushed out to disk
+		 * before we add any useful content. There's no guarantee that that'd
+		 * happen before a potential crash, so we need to deal with
+		 * uninitialized pages anyway, thus avoid the potential for
+		 * unnecessary writes.
 		 */
-		MarkBufferDirty(buffer);
 
 		/* we'll need this info below */
 		blockNum = BufferGetBlockNumber(buffer);
-		freespace = PageGetHeapFreeSpace(page);
+		freespace = BufferGetPageSize(buffer) - SizeOfPageHeaderData;
 
 		UnlockReleaseBuffer(buffer);
 
@@ -412,7 +419,7 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	if (otherBuffer == InvalidBuffer)
 	{
 		/* easy case */
-		buffer = ReadBufferBI(relation, targetBlock, bistate);
+		buffer = ReadBufferBI(relation, targetBlock, RBM_NORMAL, bistate);
 		if (PageIsAllVisible(BufferGetPage(buffer)))
 			visibilitymap_pin(relation, targetBlock, vmbuffer);
 		LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
@@ -479,6 +486,19 @@ RelationGetBufferForTuple(Relation relation, Size len,
 		 * we're done.
 		 */
 		page = BufferGetPage(buffer);
+
+		/*
+		 * If necessary initialize page, it'll be used soon.  We could avoid
+		 * dirtying the buffer here, and rely on the caller to do so whenever
+		 * it puts a tuple onto the page, but there seems not much benefit in
+		 * doing so.
+		 */
+		if (PageIsNew(page))
+		{
+			PageInit(page, BufferGetPageSize(buffer), 0);
+			MarkBufferDirty(buffer);
+		}
+
 		pageFreeSpace = PageGetHeapFreeSpace(page);
 		if (len + saveFreeSpace <= pageFreeSpace)
 		{
@@ -571,42 +591,67 @@ RelationGetBufferForTuple(Relation relation, Size len,
 	 * it worth keeping an accurate file length in shared memory someplace,
 	 * rather than relying on the kernel to do it for us?
 	 */
-	buffer = ReadBufferBI(relation, P_NEW, bistate);
+	buffer = ReadBufferBI(relation, P_NEW, RBM_ZERO_AND_LOCK, bistate);
 
 	/*
-	 * We can be certain that locking the otherBuffer first is OK, since it
-	 * must have a lower page number.
+	 * We need to initialize the empty new page.  Double-check that it really
+	 * is empty (this should never happen, but if it does we don't want to
+	 * risk wiping out valid data).
 	 */
-	if (otherBuffer != InvalidBuffer)
-		LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+	page = BufferGetPage(buffer);
 
-	/*
-	 * Now acquire lock on the new page.
-	 */
-	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
+	if (!PageIsNew(page))
+		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
+			 BufferGetBlockNumber(buffer),
+			 RelationGetRelationName(relation));
+
+	PageInit(page, BufferGetPageSize(buffer), 0);
+	MarkBufferDirty(buffer);
 
 	/*
 	 * Release the file-extension lock; it's now OK for someone else to extend
-	 * the relation some more.  Note that we cannot release this lock before
-	 * we have buffer lock on the new page, or we risk a race condition
-	 * against vacuumlazy.c --- see comments therein.
+	 * the relation some more.
 	 */
 	if (needLock)
 		UnlockRelationForExtension(relation, ExclusiveLock);
 
 	/*
-	 * We need to initialize the empty new page.  Double-check that it really
-	 * is empty (this should never happen, but if it does we don't want to
-	 * risk wiping out valid data).
+	 * Lock the other buffer. It's guaranteed to be of a lower page number
+	 * than the new page. To conform with the deadlock prevention rules, we
+	 * ought to lock otherBuffer first, but that would give other backends a
+	 * chance to put tuples on our page. To reduce the likelihood of that,
+	 * attempt to lock the other buffer conditionally, that's very likely to
+	 * work. Otherwise we need to lock buffers in the correct order, and
+	 * retry if the space has been used in the mean time.
+	 *
+	 * Alternatively, we could acquire the lock on otherBuffer before
+	 * extending the relation, but that'd require holding the lock while
+	 * performing IO, which seems worse than an unlikely retry.
 	 */
-	page = BufferGetPage(buffer);
+	if (otherBuffer != InvalidBuffer)
+	{
+		Assert(otherBuffer != buffer);
 
-	if (!PageIsNew(page))
-		elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
-			 BufferGetBlockNumber(buffer),
-			 RelationGetRelationName(relation));
+		if (unlikely(!ConditionalLockBuffer(otherBuffer)))
+		{
+			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+			LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
+			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
-	PageInit(page, BufferGetPageSize(buffer), 0);
+			/*
+			 * Because the buffer was unlocked for a while, it's possible,
+			 * although unlikely, that the page was filled. If so, just retry
+			 * from start.
+			 */
+			if (len > PageGetHeapFreeSpace(page))
+			{
+				LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
+				UnlockReleaseBuffer(buffer);
+
+				goto loop;
+			}
+		}
+	}
 
 	if (len > PageGetHeapFreeSpace(page))
 	{

src/backend/access/heap/vacuumlazy.c
Lines changed: 44 additions & 38 deletions
@@ -860,43 +860,46 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		if (PageIsNew(page))
 		{
+			bool		still_new;
+
 			/*
-			 * An all-zeroes page could be left over if a backend extends the
-			 * relation but crashes before initializing the page. Reclaim such
-			 * pages for use.
-			 *
-			 * We have to be careful here because we could be looking at a
-			 * page that someone has just added to the relation and not yet
-			 * been able to initialize (see RelationGetBufferForTuple).  To
-			 * protect against that, release the buffer lock, grab the
-			 * relation extension lock momentarily, and re-lock the buffer. If
-			 * the page is still uninitialized by then, it must be left over
-			 * from a crashed backend, and we can initialize it.
+			 * All-zeroes pages can be left over if either a backend extends
+			 * the relation by a single page, but crashes before the newly
+			 * initialized page has been written out, or when bulk-extending
+			 * the relation (which creates a number of empty pages at the tail
+			 * end of the relation, but enters them into the FSM).
 			 *
-			 * We don't really need the relation lock when this is a new or
-			 * temp relation, but it's probably not worth the code space to
-			 * check that, since this surely isn't a critical path.
+			 * Make sure these pages are in the FSM, to ensure they can be
+			 * reused. Do that by testing if there's any space recorded for
+			 * the page. If not, enter it.
 			 *
-			 * Note: the comparable code in vacuum.c need not worry because
-			 * it's got exclusive lock on the whole relation.
+			 * Note we do not enter the page into the visibilitymap. That has
+			 * the downside that we repeatedly visit this page in subsequent
+			 * vacuums, but otherwise we'd never discover the space on a
+			 * promoted standby. The harm of repeated checking ought to
+			 * normally not be too bad - the space usually should be used at
+			 * some point, otherwise there wouldn't be any regular vacuums.
 			 */
-			LockBuffer(buf, BUFFER_LOCK_UNLOCK);
-			LockRelationForExtension(onerel, ExclusiveLock);
-			UnlockRelationForExtension(onerel, ExclusiveLock);
-			LockBufferForCleanup(buf);
-			if (PageIsNew(page))
+
+			/*
+			 * Perform checking of FSM after releasing lock, the fsm is
+			 * approximate, after all.
+			 */
+			still_new = PageIsNew(page);
+			UnlockReleaseBuffer(buf);
+
+			if (still_new)
 			{
-				ereport(WARNING,
-						(errmsg("relation \"%s\" page %u is uninitialized --- fixing",
-								relname, blkno)));
-				PageInit(page, BufferGetPageSize(buf), 0);
 				empty_pages++;
-			}
-			freespace = PageGetHeapFreeSpace(page);
-			MarkBufferDirty(buf);
-			UnlockReleaseBuffer(buf);
 
-			RecordPageWithFreeSpace(onerel, blkno, freespace);
+				if (GetRecordedFreeSpace(onerel, blkno) == 0)
+				{
+					Size		freespace;
+
+					freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+					RecordPageWithFreeSpace(onerel, blkno, freespace);
+				}
+			}
 			continue;
 		}
 
@@ -905,7 +908,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			empty_pages++;
 			freespace = PageGetHeapFreeSpace(page);
 
-			/* empty pages are always all-visible and all-frozen */
+			/*
+			 * Empty pages are always all-visible and all-frozen (note that
+			 * the same is currently not true for new pages, see above).
+			 */
 			if (!PageIsAllVisible(page))
 			{
 				START_CRIT_SECTION();
@@ -1639,12 +1645,13 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
 
 	*hastup = false;
 
-	/* If we hit an uninitialized page, we want to force vacuuming it. */
-	if (PageIsNew(page))
-		return true;
-
-	/* Quick out for ordinary empty page. */
-	if (PageIsEmpty(page))
+	/*
+	 * New and empty pages, obviously, don't contain tuples. We could make
+	 * sure that the page is registered in the FSM, but it doesn't seem worth
+	 * waiting for a cleanup lock just for that, especially because it's
+	 * likely that the pin holder will do so.
+	 */
+	if (PageIsNew(page) || PageIsEmpty(page))
 		return false;
 
 	maxoff = PageGetMaxOffsetNumber(page);
@@ -2029,7 +2036,6 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 
 		if (PageIsNew(page) || PageIsEmpty(page))
 		{
-			/* PageIsNew probably shouldn't happen... */
 			UnlockReleaseBuffer(buf);
 			continue;
 		}
0 commit comments