Commit 6dced63

Fix control file update done in restartpoints still running after promotion
If a cluster is promoted (aka the control file shows a state different
than DB_IN_ARCHIVE_RECOVERY) while CreateRestartPoint() is still
processing, this function could miss an update of the control file for
"checkPoint" and "checkPointCopy" but still do the recycling and/or
removal of the past WAL segments, assuming that the to-be-updated LSN
values should be used as reference points for the cleanup. This causes
a follow-up restart attempting crash recovery to fail with a PANIC on a
missing checkpoint record if the end-of-recovery checkpoint triggered by
the promotion did not complete while the cluster abruptly stopped or
crashed before the completion of this checkpoint. The PANIC would be
caused by the redo LSN referred in the control file as located in a
segment already gone, recycled by the previous restartpoint with
"checkPoint" out-of-sync in the control file.

This commit fixes the update of the control file during restartpoints so
as "checkPoint" and "checkPointCopy" are updated even if the cluster has
been promoted while a restartpoint is running, to be on par with the set
of WAL segments actually recycled in the end of CreateRestartPoint().

7863ee4 has fixed this problem already on master, but the release timing
of the latest point versions did not let me enough time to study and fix
that on all the stable branches.

Reported-by: Fujii Masao, Rui Zhao
Author: Kyotaro Horiguchi
Reviewed-by: Nathan Bossart, Michael Paquier
Discussion: https://postgr.es/m/20220316.102444.2193181487576617583.horikyota.ntt@gmail.com
Backpatch-through: 10
1 parent ac51c9f, commit 6dced63
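
The failure described in the commit message can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: `ModelCluster`, `model_restartpoint`, `model_crash_recovery_can_start`, and `SEG_SIZE_MODEL` are hypothetical names, WAL is reduced to segment numbers with an assumed 16 MB segment size, and cleanup is assumed to be keyed on the new redo LSN, as the message describes. The `fixed` argument switches between the old behavior (update the redo pointer only while in archive recovery) and the behavior this commit introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed segment size for the model (16 MB). */
#define SEG_SIZE_MODEL UINT64_C(16777216)

typedef enum { MODEL_IN_ARCHIVE_RECOVERY, MODEL_IN_PRODUCTION } ModelState;

/* Hypothetical, heavily reduced stand-in for a cluster's on-disk state. */
typedef struct
{
    ModelState state;        /* models ControlFile->state */
    uint64_t   redo;         /* models ControlFile->checkPointCopy.redo */
    uint64_t   oldest_segno; /* oldest WAL segment still on disk */
} ModelCluster;

static uint64_t
segno_of(uint64_t lsn)
{
    return lsn / SEG_SIZE_MODEL;
}

/*
 * Model of one restartpoint.  With fixed == false the redo pointer is
 * only advanced while in archive recovery (pre-commit behavior); with
 * fixed == true it is advanced regardless of the state.  In both cases
 * segments older than the one holding new_redo are dropped.
 */
static void
model_restartpoint(ModelCluster *c, uint64_t new_redo, bool fixed)
{
    bool in_recovery;

    assert(c != NULL);
    in_recovery = (c->state == MODEL_IN_ARCHIVE_RECOVERY);

    if (c->redo < new_redo && (fixed || in_recovery))
        c->redo = new_redo;

    if (segno_of(new_redo) > c->oldest_segno)
        c->oldest_segno = segno_of(new_redo);
}

/* Crash recovery needs the segment that contains the recorded redo LSN. */
static bool
model_crash_recovery_can_start(const ModelCluster *c)
{
    return segno_of(c->redo) >= c->oldest_segno;
}
```

In this model, a cluster already marked as promoted that runs `model_restartpoint()` with `fixed == false` keeps its old redo pointer while the segment holding it is dropped, so `model_crash_recovery_can_start()` returns false. With `fixed == true` the pointer and the surviving segments stay consistent.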

1 file changed (+29 -15 lines changed)
  • src/backend/access/transam


src/backend/access/transam/xlog.c

Lines changed: 29 additions & 15 deletions
@@ -9692,21 +9692,26 @@ CreateRestartPoint(int flags)
 	PriorRedoPtr = ControlFile->checkPointCopy.redo;
 
 	/*
-	 * Update pg_control, using current time. Check that it still shows
-	 * DB_IN_ARCHIVE_RECOVERY state and an older checkpoint, else do nothing;
-	 * this is a quick hack to make sure nothing really bad happens if somehow
-	 * we get here after the end-of-recovery checkpoint.
+	 * Update pg_control, using current time. Check that it still shows an
+	 * older checkpoint, else do nothing; this is a quick hack to make sure
+	 * nothing really bad happens if somehow we get here after the
+	 * end-of-recovery checkpoint.
 	 */
 	LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
-	if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY &&
-		ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
+	if (ControlFile->checkPointCopy.redo < lastCheckPoint.redo)
 	{
+		/*
+		 * Update the checkpoint information. We do this even if the cluster
+		 * does not show DB_IN_ARCHIVE_RECOVERY to match with the set of WAL
+		 * segments recycled below.
+		 */
 		ControlFile->checkPoint = lastCheckPointRecPtr;
 		ControlFile->checkPointCopy = lastCheckPoint;
 		ControlFile->time = (pg_time_t) time(NULL);
 
 		/*
-		 * Ensure minRecoveryPoint is past the checkpoint record. Normally,
+		 * Ensure minRecoveryPoint is past the checkpoint record and update it
+		 * if the control file still shows DB_IN_ARCHIVE_RECOVERY. Normally,
 		 * this will have happened already while writing out dirty buffers,
 		 * but not necessarily - e.g. because no buffers were dirtied. We do
 		 * this because a non-exclusive base backup uses minRecoveryPoint to
@@ -9715,18 +9720,27 @@ CreateRestartPoint(int flags)
 		 * at a minimum. Note that for an ordinary restart of recovery there's
 		 * no value in having the minimum recovery point any earlier than this
 		 * anyway, because redo will begin just after the checkpoint record.
+		 * this because a non-exclusive base backup uses minRecoveryPoint to
+		 * determine which WAL files must be included in the backup, and the
+		 * file (or files) containing the checkpoint record must be included,
+		 * at a minimum. Note that for an ordinary restart of recovery there's
+		 * no value in having the minimum recovery point any earlier than this
+		 * anyway, because redo will begin just after the checkpoint record.
 		 */
-		if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+		if (ControlFile->state == DB_IN_ARCHIVE_RECOVERY)
 		{
-			ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
-			ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
+			if (ControlFile->minRecoveryPoint < lastCheckPointEndPtr)
+			{
+				ControlFile->minRecoveryPoint = lastCheckPointEndPtr;
+				ControlFile->minRecoveryPointTLI = lastCheckPoint.ThisTimeLineID;
 
-			/* update local copy */
-			minRecoveryPoint = ControlFile->minRecoveryPoint;
-			minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+				/* update local copy */
+				minRecoveryPoint = ControlFile->minRecoveryPoint;
+				minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;
+			}
+			if (flags & CHECKPOINT_IS_SHUTDOWN)
+				ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
 		}
-		if (flags & CHECKPOINT_IS_SHUTDOWN)
-			ControlFile->state = DB_SHUTDOWNED_IN_RECOVERY;
 		UpdateControlFile();
 	}
 	LWLockRelease(ControlFileLock);
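
To make the post-patch branching easier to follow, here is a standalone sketch of which fields are touched in which state. It is a simplified model, not the PostgreSQL implementation: `CfModel`, `CfRestartPoint`, `cf_apply_restartpoint`, and `CF_FLAG_SHUTDOWN` are hypothetical names, LSNs are plain integers, and locking, timelines, and the actual control file write are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum
{
    CF_IN_ARCHIVE_RECOVERY,
    CF_IN_PRODUCTION,
    CF_SHUTDOWNED_IN_RECOVERY
} CfState;

/* Hypothetical stand-in for CHECKPOINT_IS_SHUTDOWN. */
#define CF_FLAG_SHUTDOWN 0x01

/* Reduced model of the control file fields touched by the patch. */
typedef struct
{
    CfState  state;
    uint64_t checkpoint;         /* models checkPoint */
    uint64_t redo;               /* models checkPointCopy.redo */
    uint64_t min_recovery_point; /* models minRecoveryPoint */
} CfModel;

/* Reduced model of the restartpoint being recorded. */
typedef struct
{
    uint64_t rec_ptr; /* models lastCheckPointRecPtr */
    uint64_t end_ptr; /* models lastCheckPointEndPtr */
    uint64_t redo;    /* models lastCheckPoint.redo */
} CfRestartPoint;

/*
 * Apply a restartpoint following the structure of the patched code.
 * Returns 1 if the model control file was updated, 0 if the recorded
 * checkpoint was not older than the restartpoint.
 */
static int
cf_apply_restartpoint(CfModel *cf, const CfRestartPoint *rp, int flags)
{
    assert(cf != NULL && rp != NULL);

    if (!(cf->redo < rp->redo))
        return 0;

    /* Checkpoint information is updated whatever the state is. */
    cf->checkpoint = rp->rec_ptr;
    cf->redo = rp->redo;

    /* These two steps only apply while still in archive recovery. */
    if (cf->state == CF_IN_ARCHIVE_RECOVERY)
    {
        if (cf->min_recovery_point < rp->end_ptr)
            cf->min_recovery_point = rp->end_ptr;
        if (flags & CF_FLAG_SHUTDOWN)
            cf->state = CF_SHUTDOWNED_IN_RECOVERY;
    }
    return 1;
}
```

Under this model, a promoted cluster gets its checkpoint fields advanced but keeps its state and minimum recovery point, while a cluster still in archive recovery sees all of them updated.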
