
Commit 3c64dcb

Prevent references to invalid relation pages after fresh promotion

If a standby crashes after promotion before having completed its first post-recovery checkpoint, then the minimal recovery point, which marks the LSN position where the cluster is able to reach consistency, may be set to a position older than the first end-of-recovery checkpoint, while all the available WAL should be replayed. This leads the instance to think that it contains inconsistent pages, causing a PANIC and a hard instance crash before all the available WAL has been replayed, for certain sets of replayed records. When in crash recovery, minRecoveryPoint is expected to always be set to InvalidXLogRecPtr, which forces recovery to replay all the available WAL, so this commit makes sure that the local copy of minRecoveryPoint from the control file is initialized properly and stays as it is while crash recovery is performed. Once recovery switches to archive recovery, or once crash recovery finishes, the local copy of minRecoveryPoint can be safely updated.

Pavan Deolasee reported and diagnosed the failure in the first place, and the base fix idea of relying on the local copy of minRecoveryPoint comes from Kyotaro Horiguchi; I expanded it into a full-fledged patch. The test included in this commit was written by Álvaro Herrera and Pavan Deolasee, and I modified it to make it faster and more reliable with sleep phases.

Backpatch to all supported versions where the bug appears, i.e. down to 9.3, which is where the end-of-recovery checkpoint stopped being run by the startup process. The test can easily be supported down to 10; still, it has been tested on all branches.

Reported-by: Pavan Deolasee
Diagnosed-by: Pavan Deolasee
Reviewed-by: Pavan Deolasee, Kyotaro Horiguchi
Author: Michael Paquier, Kyotaro Horiguchi, Pavan Deolasee, Álvaro Herrera
Discussion: https://postgr.es/m/CABOikdPOewjNL=05K5CbNMxnNtXnQjhTx2F--4p4ruorCjukbA@mail.gmail.com
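The initialization rule described above can be modeled in a few lines of C. This is a minimal, self-contained sketch rather than PostgreSQL code: `XLogRecPtr` is reduced to a `uint64_t` with 0 standing in for `InvalidXLogRecPtr`, and `ControlFileModel` and `init_local_min_recovery_point()` are hypothetical names. It mirrors the new `StartupXLOG()` branch: only archive recovery trusts the control file's `minRecoveryPoint`, while crash recovery starts from an invalid point so that all the available WAL is replayed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL types (assumption, not the real headers). */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical subset of the control file fields used by this sketch. */
typedef struct ControlFileModel
{
    XLogRecPtr  minRecoveryPoint;
    TimeLineID  minRecoveryPointTLI;
} ControlFileModel;

/* Local copies kept by the startup process, named as in xlog.c. */
static XLogRecPtr minRecoveryPoint;
static TimeLineID minRecoveryPointTLI;

/*
 * Hypothetical helper mirroring the new StartupXLOG() logic.  Only archive
 * recovery trusts the control file value; crash recovery resets the local
 * copy to InvalidXLogRecPtr so that replay continues to the end of WAL,
 * even if the control file still holds a stale value from before promotion.
 */
static void
init_local_min_recovery_point(const ControlFileModel *cf, bool inArchiveRecovery)
{
    assert(cf != NULL);

    if (inArchiveRecovery)
    {
        minRecoveryPoint = cf->minRecoveryPoint;
        minRecoveryPointTLI = cf->minRecoveryPointTLI;
    }
    else
    {
        minRecoveryPoint = InvalidXLogRecPtr;
        minRecoveryPointTLI = 0;
    }
}
```

For instance, with a control file holding a stale `minRecoveryPoint` of `0x3000060`, calling `init_local_min_recovery_point(&cf, false)` leaves the local copy at `InvalidXLogRecPtr`, while passing `true` copies `0x3000060` and its timeline.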

1 parent 249126e commit 3c64dcb

File tree

2 files changed, +157 -31 lines changed

src
- backend/access/transam
  - xlog.c
- test/recovery/t
  - 015_promotion_pages.pl


`src/backend/access/transam/xlog.c`

Lines changed: 70 additions & 31 deletions

Original file line number	Diff line number	Diff line change
`@@ -821,8 +821,14 @@ static XLogSource XLogReceiptSource = 0; /* XLOG_FROM_* code */`
`821`	`821`	`static XLogRecPtr ReadRecPtr; /* start of last record read */`
`822`	`822`	`static XLogRecPtr EndRecPtr; /* end+1 of last record read */`
`823`	`823`
`824`		`-static XLogRecPtr minRecoveryPoint; /* local copy of`
`825`		`- * ControlFile->minRecoveryPoint */`
	`824`	`+/*`
	`825`	`+ * Local copies of equivalent fields in the control file. When running`
	`826`	`+ * crash recovery, minRecoveryPoint is set to InvalidXLogRecPtr as we`
	`827`	`+ * expect to replay all the WAL available, and updateMinRecoveryPoint is`
	`828`	`+ * switched to false to prevent any updates while replaying records.`
	`829`	`+ * Those values are kept consistent as long as crash recovery runs.`
	`830`	`+ */`
	`831`	`+static XLogRecPtr minRecoveryPoint;`
`826`	`832`	`static TimeLineID minRecoveryPointTLI;`
`827`	`833`	`static bool updateMinRecoveryPoint = true;`
`828`	`834`
`@@ -2711,20 +2717,26 @@ UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force)`
`2711`	`2717`	`if (!updateMinRecoveryPoint || (!force && lsn <= minRecoveryPoint))`
`2712`	`2718`	`return;`
`2713`	`2719`
	`2720`	`+/*`
	`2721`	`+ * An invalid minRecoveryPoint means that we need to recover all the WAL,`
	`2722`	`+ * i.e., we're doing crash recovery. We never modify the control file's`
	`2723`	`+ * value in that case, so we can short-circuit future checks here too. The`
	`2724`	`+ * local values of minRecoveryPoint and minRecoveryPointTLI should not be`
	`2725`	`+ * updated until crash recovery finishes.`
	`2726`	`+ */`
	`2727`	`+if (XLogRecPtrIsInvalid(minRecoveryPoint))`
	`2728`	`+{`
	`2729`	`+updateMinRecoveryPoint = false;`
	`2730`	`+return;`
	`2731`	`+}`
	`2732`	`+`
`2714`	`2733`	`LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);`
`2715`	`2734`
`2716`	`2735`	`/* update local copy */`
`2717`	`2736`	`minRecoveryPoint = ControlFile->minRecoveryPoint;`
`2718`	`2737`	`minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
`2719`	`2738`
`2720`		`-/*`
`2721`		`- * An invalid minRecoveryPoint means that we need to recover all the WAL,`
`2722`		`- * i.e., we're doing crash recovery. We never modify the control file's`
`2723`		`- * value in that case, so we can short-circuit future checks here too.`
`2724`		`- */`
`2725`		`-if (minRecoveryPoint == 0)`
`2726`		`-updateMinRecoveryPoint = false;`
`2727`		`-else if (force || minRecoveryPoint < lsn)`
	`2739`	`+if (force || minRecoveryPoint < lsn)`
`2728`	`2740`	`{`
`2729`	`2741`	`XLogRecPtr newMinRecoveryPoint;`
`2730`	`2742`	`TimeLineID newMinRecoveryPointTLI;`
`@@ -3110,7 +3122,16 @@ XLogNeedsFlush(XLogRecPtr record)`
`3110`	`3122`	`*/`
`3111`	`3123`	`if (RecoveryInProgress())`
`3112`	`3124`	`{`
`3113`		`-/* Quick exit if already known updated */`
	`3125`	`+/*`
	`3126`	`+ * An invalid minRecoveryPoint means that we need to recover all the`
	`3127`	`+ * WAL, i.e., we're doing crash recovery. We never modify the control`
	`3128`	`+ * file's value in that case, so we can short-circuit future checks`
	`3129`	`+ * here too.`
	`3130`	`+ */`
	`3131`	`+if (XLogRecPtrIsInvalid(minRecoveryPoint))`
	`3132`	`+updateMinRecoveryPoint = false;`
	`3133`	`+`
	`3134`	`+/* Quick exit if already known to be updated or cannot be updated */`
`3114`	`3135`	`if (record <= minRecoveryPoint || !updateMinRecoveryPoint)`
`3115`	`3136`	`return false;`
`3116`	`3137`
`@@ -3124,20 +3145,8 @@ XLogNeedsFlush(XLogRecPtr record)`
`3124`	`3145`	`minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
`3125`	`3146`	`LWLockRelease(ControlFileLock);`
`3126`	`3147`
`3127`		`-/*`
`3128`		`- * An invalid minRecoveryPoint means that we need to recover all the`
`3129`		`- * WAL, i.e., we're doing crash recovery. We never modify the control`
`3130`		`- * file's value in that case, so we can short-circuit future checks`
`3131`		`- * here too.`
`3132`		`- */`
`3133`		`-if (minRecoveryPoint == 0)`
`3134`		`-updateMinRecoveryPoint = false;`
`3135`		`-`
`3136`	`3148`	`/* check again */`
`3137`		`-if (record <= minRecoveryPoint || !updateMinRecoveryPoint)`
`3138`		`-return false;`
`3139`		`-else`
`3140`		`-return true;`
	`3149`	`+return record > minRecoveryPoint;`
`3141`	`3150`	`}`
`3142`	`3151`
`3143`	`3152`	`/* Quick exit if already known flushed */`
`@@ -4269,6 +4278,12 @@ ReadRecord(XLogReaderState *xlogreader, XLogRecPtr RecPtr, int emode,`
`4269`	`4278`	`minRecoveryPoint = ControlFile->minRecoveryPoint;`
`4270`	`4279`	`minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
`4271`	`4280`
	`4281`	`+/*`
	`4282`	`+ * The startup process can update its local copy of`
	`4283`	`+ * minRecoveryPoint from this point.`
	`4284`	`+ */`
	`4285`	`+updateMinRecoveryPoint = true;`
	`4286`	`+`
`4272`	`4287`	`UpdateControlFile();`
`4273`	`4288`	`LWLockRelease(ControlFileLock);`
`4274`	`4289`
`@@ -6892,9 +6907,26 @@ StartupXLOG(void)`
`6892`	`6907`	`/* No need to hold ControlFileLock yet, we aren't up far enough */`
`6893`	`6908`	`UpdateControlFile();`
`6894`	`6909`
`6895`		`-/* initialize our local copy of minRecoveryPoint */`
`6896`		`-minRecoveryPoint = ControlFile->minRecoveryPoint;`
`6897`		`-minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
	`6910`	`+/*`
	`6911`	`+ * Initialize our local copy of minRecoveryPoint. When doing crash`
	`6912`	`+ * recovery we want to replay up to the end of WAL. Particularly, in`
	`6913`	`+ * the case of a promoted standby minRecoveryPoint value in the`
	`6914`	`+ * control file is only updated after the first checkpoint. However,`
	`6915`	`+ * if the instance crashes before the first post-recovery checkpoint`
	`6916`	`+ * is completed then recovery will use a stale location causing the`
	`6917`	`+ * startup process to think that there are still invalid page`
	`6918`	`+ * references when checking for data consistency.`
	`6919`	`+ */`
	`6920`	`+if (InArchiveRecovery)`
	`6921`	`+{`
	`6922`	`+minRecoveryPoint = ControlFile->minRecoveryPoint;`
	`6923`	`+minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
	`6924`	`+}`
	`6925`	`+else`
	`6926`	`+{`
	`6927`	`+minRecoveryPoint = InvalidXLogRecPtr;`
	`6928`	`+minRecoveryPointTLI = 0;`
	`6929`	`+}`
`6898`	`6930`
`6899`	`6931`	`/*`
`6900`	`6932`	`* Reset pgstat data, because it may be invalid after recovery.`
`@@ -7861,6 +7893,8 @@ CheckRecoveryConsistency(void)`
`7861`	`7893`	`if (XLogRecPtrIsInvalid(minRecoveryPoint))`
`7862`	`7894`	`return;`
`7863`	`7895`
	`7896`	`+Assert(InArchiveRecovery);`
	`7897`	`+`
`7864`	`7898`	`/*`
`7865`	`7899`	`* assume that we are called in the startup process, and hence don't need`
`7866`	`7900`	`* a lock to read lastReplayedEndRecPtr`
`@@ -9949,11 +9983,16 @@ xlog_redo(XLogReaderState *record)`
`9949`	`9983`	`* Update minRecoveryPoint to ensure that if recovery is aborted, we`
`9950`	`9984`	`* recover back up to this point before allowing hot standby again.`
`9951`	`9985`	`* This is important if the max_* settings are decreased, to ensure`
`9952`		`- * you don't run queries against the WAL preceding the change.`
	`9986`	`+ * you don't run queries against the WAL preceding the change. The`
	`9987`	`+ * local copies cannot be updated as long as crash recovery is`
	`9988`	`+ * happening and we expect all the WAL to be replayed.`
`9953`	`9989`	`*/`
`9954`		`-minRecoveryPoint = ControlFile->minRecoveryPoint;`
`9955`		`-minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
`9956`		`-if (minRecoveryPoint != 0 && minRecoveryPoint < lsn)`
	`9990`	`+if (InArchiveRecovery)`
	`9991`	`+{`
	`9992`	`+minRecoveryPoint = ControlFile->minRecoveryPoint;`
	`9993`	`+minRecoveryPointTLI = ControlFile->minRecoveryPointTLI;`
	`9994`	`+}`
	`9995`	`+if (minRecoveryPoint != InvalidXLogRecPtr && minRecoveryPoint < lsn)`
`9957`	`9996`	`{`
`9958`	`9997`	`ControlFile->minRecoveryPoint = lsn;`
`9959`	`9998`	`ControlFile->minRecoveryPointTLI = ThisTimeLineID;`
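The reworked `XLogNeedsFlush()` path taken while recovery is in progress can be paraphrased as a small standalone function. In this sketch, `needs_min_recovery_point_update()` and `controlFileMinRecoveryPoint` are hypothetical names, `XLogRecPtr` is a plain `uint64_t`, and the `ControlFileLock` acquisition is left out because there is only one thread. It shows why an invalid local copy short-circuits every later check: `updateMinRecoveryPoint` is switched off and the function returns false without ever reading the control file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)
#define XLogRecPtrIsInvalid(r) ((r) == InvalidXLogRecPtr)

/* Hypothetical model of ControlFile->minRecoveryPoint in shared memory. */
static XLogRecPtr controlFileMinRecoveryPoint;

/* Process-local state, named as in xlog.c. */
static XLogRecPtr minRecoveryPoint;
static bool updateMinRecoveryPoint = true;

/*
 * Hypothetical paraphrase of XLogNeedsFlush() during recovery: must
 * minRecoveryPoint be advanced before flushing a page whose LSN is 'record'?
 */
static bool
needs_min_recovery_point_update(XLogRecPtr record)
{
    /*
     * An invalid local copy means crash recovery: the control file value is
     * never modified then, so disable all future checks.
     */
    if (XLogRecPtrIsInvalid(minRecoveryPoint))
        updateMinRecoveryPoint = false;

    /* Quick exit if already known to be updated or cannot be updated. */
    if (record <= minRecoveryPoint || !updateMinRecoveryPoint)
        return false;

    /* Refresh the local copy; the real code holds ControlFileLock here. */
    minRecoveryPoint = controlFileMinRecoveryPoint;

    /* Check again against the refreshed value. */
    return record > minRecoveryPoint;
}
```

In crash recovery (local copy still `InvalidXLogRecPtr`), the function returns false for any LSN and leaves the local copy untouched. In archive recovery with a local copy of `0x100` and a control file value of `0x200`, an LSN of `0x180` returns false after refreshing the copy to `0x200`, and `0x300` returns true.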

`src/test/recovery/t/015_promotion_pages.pl`

Lines changed: 87 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,87 @@`
	`1`	`+# Test for promotion handling with WAL records generated post-promotion`
	`2`	`+# before the first checkpoint is generated. This test case checks for`
	`3`	`+# invalid page references at replay based on the minimum consistent`
	`4`	`+# recovery point defined.`
	`5`	`+use strict;`
	`6`	`+use warnings;`
	`7`	`+use PostgresNode;`
	`8`	`+use TestLib;`
	`9`	`+use Test::More tests => 1;`
	`10`	`+`
	`11`	`+# Initialize primary node`
	`12`	`+my $alpha = get_new_node('alpha');`
	`13`	`+$alpha->init(allows_streaming => 1);`
	`14`	`+# Setting wal_log_hints to off is important to get invalid page`
	`15`	`+# references.`
	`16`	`+$alpha->append_conf("postgresql.conf", <<EOF);`
	`17`	`+wal_log_hints = off`
	`18`	`+EOF`
	`19`	`+`
	`20`	`+# Start the primary`
	`21`	`+$alpha->start;`
	`22`	`+`
	`23`	`+# setup/start a standby`
	`24`	`+$alpha->backup('bkp');`
	`25`	`+my $bravo = get_new_node('bravo');`
	`26`	`+$bravo->init_from_backup($alpha, 'bkp', has_streaming => 1);`
	`27`	`+$bravo->append_conf('postgresql.conf', <<EOF);`
	`28`	`+checkpoint_timeout=1h`
	`29`	`+checkpoint_completion_target=0.9`
	`30`	`+EOF`
	`31`	`+$bravo->start;`
	`32`	`+`
	`33`	`+# Dummy table for the upcoming tests.`
	`34`	`+$alpha->safe_psql('postgres', 'create table test1 (a int)');`
	`35`	`+$alpha->safe_psql('postgres', 'insert into test1 select generate_series(1, 10000)');`
	`36`	`+`
	`37`	`+# take a checkpoint`
	`38`	`+$alpha->safe_psql('postgres', 'checkpoint');`
	`39`	`+`
	`40`	`+# The following vacuum will set visibility map bits and create`
	`41`	`+# problematic WAL records.`
	`42`	`+$alpha->safe_psql('postgres', 'vacuum verbose test1');`
	`43`	`+# Wait for last record to have been replayed on the standby.`
	`44`	`+$alpha->wait_for_catchup($bravo, 'replay',`
	`45`	`+$alpha->lsn('insert'));`
	`46`	`+`
	`47`	`+# Now force a checkpoint on the standby. This seems unnecessary but for "some"`
	`48`	`+# reason, the previous checkpoint on the primary does not reflect on the standby`
	`49`	`+# and without an explicit checkpoint, it may start redo recovery from a much`
	`50`	`+# older point, which includes even create table and initial page additions.`
	`51`	`+$bravo->safe_psql('postgres', 'checkpoint');`
	`52`	`+`
	`53`	`+# Now just use a dummy table and run some operations to move minRecoveryPoint`
	`54`	`+# beyond the previous vacuum.`
	`55`	`+$alpha->safe_psql('postgres', 'create table test2 (a int, b text)');`
	`56`	`+$alpha->safe_psql('postgres', 'insert into test2 select generate_series(1,10000), md5(random()::text)');`
	`57`	`+$alpha->safe_psql('postgres', 'truncate test2');`
	`58`	`+`
	`59`	`+# Wait again for all records to be replayed.`
	`60`	`+$alpha->wait_for_catchup($bravo, 'replay',`
	`61`	`+$alpha->lsn('insert'));`
	`62`	`+`
	`63`	`+# Do the promotion, which reinitializes minRecoveryPoint in the control`
	`64`	`+# file so as WAL is replayed up to the end.`
	`65`	`+$bravo->promote;`
	`66`	`+`
	`67`	`+# Truncate the table on the promoted standby, vacuum and extend it`
	`68`	`+# again to create new page references. The first post-recovery checkpoint`
	`69`	`+# has not happened yet.`
	`70`	`+$bravo->safe_psql('postgres', 'truncate test1');`
	`71`	`+$bravo->safe_psql('postgres', 'vacuum verbose test1');`
	`72`	`+$bravo->safe_psql('postgres', 'insert into test1 select generate_series(1,1000)');`
	`73`	`+`
	`74`	`+# Now crash-stop the promoted standby and restart. This makes sure that`
	`75`	`+# replay does not see invalid page references because of an invalid`
	`76`	`+# minimum consistent recovery point.`
	`77`	`+$bravo->stop('immediate');`
	`78`	`+$bravo->start;`
	`79`	`+`
	`80`	`+# Check state of the table after full crash recovery. All its data should`
	`81`	`+# be here.`
	`82`	`+my $psql_out;`
	`83`	`+$bravo->psql(`
	`84`	`+'postgres',`
	`85`	`+"SELECT count(*) FROM test1",`
	`86`	`+stdout => \$psql_out);`
	`87`	`+is($psql_out, '1000', "Check that table state is correct");`
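What this test guards against can be reduced to a toy replay loop. This is an assumption-laden model, not how PostgreSQL actually tracks invalid pages: each hypothetical `WalRecordModel` only carries the net number of references to missing pages that it adds, or resolves (as a later truncation or drop would), and `replay_ok()` fails, standing in for the PANIC, if any reference is still pending when consistency is declared. Declaring consistency at a stale `minRecoveryPoint` fails, while deferring the check to the end of WAL, which is what crash recovery now does with `InvalidXLogRecPtr`, succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL definitions (assumption). */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical WAL record: net change in unresolved invalid page references. */
typedef struct WalRecordModel
{
    XLogRecPtr  endLsn;         /* end of the record */
    int         invalidRefs;    /* > 0 adds references, < 0 resolves them */
} WalRecordModel;

/*
 * Replay records in order and report whether recovery would survive the
 * consistency checks.  Consistency is declared once minRecoveryPoint is
 * reached if it is valid; otherwise only the end of WAL is checked.
 */
static bool
replay_ok(const WalRecordModel *recs, size_t n, XLogRecPtr minRecoveryPoint)
{
    int         pending = 0;
    bool        consistent = false;

    for (size_t i = 0; i < n; i++)
    {
        /* Once consistent, a new reference to a missing page is fatal at once. */
        if (consistent && recs[i].invalidRefs > 0)
            return false;

        pending += recs[i].invalidRefs;
        assert(pending >= 0);

        /* An invalid minRecoveryPoint defers the check to the end of WAL. */
        if (!consistent && minRecoveryPoint != InvalidXLogRecPtr &&
            recs[i].endLsn >= minRecoveryPoint)
        {
            consistent = true;
            if (pending > 0)
                return false;   /* stands in for the PANIC */
        }
    }

    /* Reaching the end of WAL also requires that nothing is left pending. */
    return pending == 0;
}
```

For example, with records ending at `0x100` (+2 references), `0x200` (no change) and `0x300` (-2 references), `replay_ok(wal, 3, 0x200)` returns false because consistency is declared while two references are pending, whereas `replay_ok(wal, 3, InvalidXLogRecPtr)` returns true.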
