Commit 595b9cb

Fix timeline assignment in checkpoints with 2PC transactions

Any transactions found as still prepared by a checkpoint have their state data read from the WAL records generated by PREPARE TRANSACTION before being moved into their new location within pg_twophase/. While reading such records, the WAL reader uses the callback read_local_xlog_page() to read a page; this callback is shared across various parts of the system. Since 1148e22, it also updates ThisTimeLineID when reading a record while in recovery, which is potentially helpful in the context of cascading WAL senders.

This update of ThisTimeLineID interacts badly with the checkpointer if a promotion happens while some 2PC data is read from its record: because ThisTimeLineID has been changed, any follow-up WAL records would be written to a timeline older than the promoted one. This results in consistency issues. For instance, a subsequent server restart would fail to find a valid checkpoint record, resulting in a PANIC.

This commit changes the code reading the 2PC data to reset the timeline once the 2PC record has been read, to avoid messing with the static state of the checkpointer. It would be tempting to do the same thing directly in read_local_xlog_page(). However, based on the discussion that led to 1148e22, users may rely on the updates of ThisTimeLineID when a WAL record page is read in recovery, so changing this callback could break some cases that currently work.

A TAP test reproducing the issue is added, relying on a PITR to precisely trigger a promotion with a prepared transaction still tracked.

Per discussion with Heikki Linnakangas, Kyotaro Horiguchi, Fujii Masao and myself.

Author: Soumyadeep Chakraborty, Jimmy Yih, Kevin Yeap
Discussion: https://postgr.es/m/CAE-ML+_EjH_fzfq1F3RJ1=XaaNG=-Jz-i3JqkNhXiLAsM3z-Ew@mail.gmail.com
Backpatch-through: 10
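The save-and-restore idea described in the message can be illustrated outside the backend. The following is only a minimal standalone sketch, not PostgreSQL code: `mock_page_read()`, `read_record_unguarded()` and `read_record_guarded()` are hypothetical names, and the file-local `ThisTimeLineID` merely stands in for the backend's process-global variable. It assumes a page-read callback that overwrites the global timeline as a side effect, and shows how a caller can shield its own state from that side effect.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Stand-in for the process-global timeline (assumption: the promoted timeline is 2). */
static TimeLineID ThisTimeLineID = 2;

/*
 * Hypothetical page-read callback.  As a side effect it overwrites the
 * global timeline with the timeline being replayed, mimicking the behavior
 * the commit message attributes to read_local_xlog_page() in recovery.
 */
static bool
mock_page_read(TimeLineID replay_tli)
{
    ThisTimeLineID = replay_tli;
    return true;
}

/* Unguarded caller: the callback's side effect leaks into the caller's state. */
static bool
read_record_unguarded(TimeLineID replay_tli)
{
    return mock_page_read(replay_tli);
}

/* Guarded caller: save the timeline before reading, restore it right after. */
static bool
read_record_guarded(TimeLineID replay_tli)
{
    TimeLineID  save_currtli = ThisTimeLineID;
    bool        ok = mock_page_read(replay_tli);

    /* Restore before any error handling, so every exit path sees the old value. */
    ThisTimeLineID = save_currtli;
    return ok;
}
```

With the global at 2, calling `read_record_guarded(1)` leaves it at 2, whereas `read_record_unguarded(1)` leaves it at 1, which in this toy model corresponds to later writes landing on the older timeline. Guarding the caller rather than the callback mirrors the choice made in the commit: other users of the callback keep the timeline update they may depend on.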

1 parent ac897c4, commit 595b9cb

File tree

2 files changed: +103 -1 lines changed

src
- backend/access/transam
  - twophase.c
- test/recovery/t
  - 023_pitr_prepared_xact.pl

`src/backend/access/transam/twophase.c`

Lines changed: 14 additions & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -1316,14 +1316,19 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)`
`1316`	`1316`	`* twophase files and ReadTwoPhaseFile should be used instead.`
`1317`	`1317`	`*`
`1318`	`1318`	`* Note clearly that this function can access WAL during normal operation,`
`1319`		`- * similarly to the way WALSender or Logical Decoding would do.`
	`1319`	`+ * similarly to the way WALSender or Logical Decoding would do. While`
	`1320`	`+ * accessing WAL, read_local_xlog_page() may change ThisTimeLineID,`
	`1321`	`+ * particularly if this routine is called for the end-of-recovery checkpoint`
	`1322`	`+ * in the checkpointer itself, so save the current timeline number value`
	`1323`	`+ * and restore it once done.`
`1320`	`1324`	`*/`
`1321`	`1325`	`static void`
`1322`	`1326`	`XlogReadTwoPhaseData(XLogRecPtr lsn, char *buf, int len)`
`1323`	`1327`	`{`
`1324`	`1328`	`    XLogRecord *record;`
`1325`	`1329`	`    XLogReaderState *xlogreader;`
`1326`	`1330`	`    char *errormsg;`
	`1331`	`+    TimeLineID save_currtli = ThisTimeLineID;`
`1327`	`1332`
`1328`	`1333`	`    xlogreader = XLogReaderAllocate(wal_segment_size, NULL,`
`1329`	`1334`	`                                    XL_ROUTINE(.page_read = &read_local_xlog_page,`
`@@ -1338,6 +1343,14 @@ XlogReadTwoPhaseData(XLogRecPtr lsn, char *buf, int len)`
`1338`	`1343`
`1339`	`1344`	`    XLogBeginRead(xlogreader, lsn);`
`1340`	`1345`	`    record = XLogReadRecord(xlogreader, &errormsg);`
	`1346`	`+`
	`1347`	`+    /*`
	`1348`	`+     * Restore immediately the timeline where it was previously, as`
	`1349`	`+     * read_local_xlog_page() could have changed it if the record was read`
	`1350`	`+     * while recovery was finishing or if the timeline has jumped in-between.`
	`1351`	`+     */`
	`1352`	`+    ThisTimeLineID = save_currtli;`
	`1353`	`+`
`1341`	`1354`	`    if (record == NULL)`
`1342`	`1355`	`        ereport(ERROR,`
`1343`	`1356`	`                (errcode_for_file_access(),`

`src/test/recovery/t/023_pitr_prepared_xact.pl`

Lines changed: 89 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,89 @@`
	`1`	`+# Test for point-in-time-recovery (PITR) with prepared transactions`
	`2`	`+use strict;`
	`3`	`+use warnings;`
	`4`	`+use PostgresNode;`
	`5`	`+use TestLib;`
	`6`	`+use Test::More tests => 1;`
	`7`	`+use File::Compare;`
	`8`	`+`
	`9`	`+# Initialize and start primary node with WAL archiving`
	`10`	`+my $node_primary = get_new_node('primary');`
	`11`	`+$node_primary->init(has_archiving => 1);`
	`12`	`+$node_primary->append_conf(`
	`13`	`+    'postgresql.conf', qq{`
	`14`	`+max_wal_senders = 10`
	`15`	`+wal_level = 'replica'`
	`16`	`+max_prepared_transactions = 10});`
	`17`	`+$node_primary->start;`
	`18`	`+`
	`19`	`+# Take backup`
	`20`	`+my $backup_name = 'my_backup';`
	`21`	`+$node_primary->backup($backup_name);`
	`22`	`+`
	`23`	`+# Initialize node for PITR targeting a very specific restore point, just`
	`24`	`+# after a PREPARE TRANSACTION is issued so as we finish with a promoted`
	`25`	`+# node where this 2PC transaction needs an explicit COMMIT PREPARED.`
	`26`	`+my $node_pitr = get_new_node('node_pitr');`
	`27`	`+$node_pitr->init_from_backup(`
	`28`	`+    $node_primary, $backup_name,`
	`29`	`+    standby => 0,`
	`30`	`+    has_restoring => 1);`
	`31`	`+$node_pitr->append_conf(`
	`32`	`+    'postgresql.conf', qq{`
	`33`	`+max_prepared_transactions = 10`
	`34`	`+recovery_target_name = 'rp'`
	`35`	`+recovery_target_action = 'promote'});`
	`36`	`+`
	`37`	`+# Workload with a prepared transaction and the target restore point.`
	`38`	`+$node_primary->psql(`
	`39`	`+    'postgres', qq{`
	`40`	`+CREATE TABLE foo(i int);`
	`41`	`+BEGIN;`
	`42`	`+INSERT INTO foo VALUES(1);`
	`43`	`+PREPARE TRANSACTION 'fooinsert';`
	`44`	`+SELECT pg_create_restore_point('rp');`
	`45`	`+INSERT INTO foo VALUES(2);`
	`46`	`+});`
	`47`	`+`
	`48`	`+# Find next WAL segment to be archived`
	`49`	`+my $walfile_to_be_archived = $node_primary->safe_psql('postgres',`
	`50`	`+    "SELECT pg_walfile_name(pg_current_wal_lsn());");`
	`51`	`+`
	`52`	`+# Make WAL segment eligible for archival`
	`53`	`+$node_primary->safe_psql('postgres', 'SELECT pg_switch_wal()');`
	`54`	`+`
	`55`	`+# Wait until the WAL segment has been archived.`
	`56`	`+my $archive_wait_query =`
	`57`	`+  "SELECT '$walfile_to_be_archived' <= last_archived_wal FROM pg_stat_archiver;";`
	`58`	`+$node_primary->poll_query_until('postgres', $archive_wait_query)`
	`59`	`+  or die "Timed out while waiting for WAL segment to be archived";`
	`60`	`+my $last_archived_wal_file = $walfile_to_be_archived;`
	`61`	`+`
	`62`	`+# Now start the PITR node.`
	`63`	`+$node_pitr->start;`
	`64`	`+`
	`65`	`+# Wait until the PITR node exits recovery.`
	`66`	`+$node_pitr->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")`
	`67`	`+  or die "Timed out while waiting for PITR promotion";`
	`68`	`+`
	`69`	`+# Commit the prepared transaction in the latest timeline and check its`
	`70`	`+# result. There should only be one row in the table, coming from the`
	`71`	`+# prepared transaction. The row from the INSERT after the restore point`
	`72`	`+# should not show up, since our recovery target was older than the second`
	`73`	`+# INSERT done.`
	`74`	`+$node_pitr->psql('postgres', qq{COMMIT PREPARED 'fooinsert';});`
	`75`	`+my $result = $node_pitr->safe_psql('postgres', "SELECT * FROM foo;");`
	`76`	`+is($result, qq{1}, "check table contents after COMMIT PREPARED");`
	`77`	`+`
	`78`	`+# Insert more data and do a checkpoint. These should be generated on the`
	`79`	`+# timeline chosen after the PITR promotion.`
	`80`	`+$node_pitr->psql(`
	`81`	`+    'postgres', qq{`
	`82`	`+INSERT INTO foo VALUES(3);`
	`83`	`+CHECKPOINT;`
	`84`	`+});`
	`85`	`+`
	`86`	`+# Enforce recovery, the checkpoint record generated previously should`
	`87`	`+# still be found.`
	`88`	`+$node_pitr->stop('immediate');`
	`89`	`+$node_pitr->start;`
