Commit 49d9cfc

Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories when
replaying create database WAL records. Prior to this patch, the standby
would fail to recover in such a case. However, the directories could be
legitimately missing. Consider a sequence of WAL records as follows:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the tablespace
directory, the standby crashes and has to replay the create database
record again, the crash recovery must be able to move on.

This patch adds a mechanism similar to invalid-page tracking, to keep a
tally of missing directories during crash recovery. If all the missing
directory references are matched with corresponding drop records at the
end of crash recovery, the standby can safely continue following the
primary.

Backpatch to 13, at least for now. The bug is older, but fixing it in
older branches requires more careful study of the interactions with
commit e6d8069, which appeared in 13.

A new TAP test file is added to verify the condition. However, because
it depends on commit d6d317d, it can only be added to branch
master. I (Álvaro) manually verified that the code behaves as expected
in branch 14. It's a bit nervous-making to leave the code uncovered by
tests in older branches, but leaving the bug unfixed is even worse.
Also, the main reason this fix took so long is precisely that we
couldn't agree on a good strategy to approach testing for the bug, so
perhaps this is the best we can do.

Diagnosed-by: Paul Guo <paulguo@gmail.com>
Author: Paul Guo <paulguo@gmail.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com

1 parent c64fb69, commit 49d9cfc
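
The scenario in the commit message can be modelled with a very small state machine. The sketch below is only an illustration under simplifying assumptions: one tablespace, one database, and hypothetical names (`RecKind`, `ToyState`, `toy_recover`) that do not exist in PostgreSQL. It shows why a second replay of the same records hits a missing directory, and why recovery may still proceed once the matching drop record has been seen.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record kinds for this toy model; not PostgreSQL's WAL format. */
typedef enum { REC_CREATE_DB, REC_DROP_DB, REC_DROP_TABLESPACE } RecKind;

typedef struct
{
	bool		tablespace_dir;	/* does the tablespace directory exist? */
	bool		database_dir;	/* does the database directory exist? */
	int			skipped;		/* create-database records skipped */
	int			unresolved;		/* missing-directory references not yet matched */
} ToyState;

/* Replay one record against the toy on-disk state. */
static void
toy_replay(ToyState *s, RecKind rec)
{
	switch (rec)
	{
		case REC_CREATE_DB:
			if (!s->tablespace_dir)
			{
				/* Directory is gone: skip the record, remember the reference. */
				s->skipped++;
				s->unresolved++;
			}
			else
				s->database_dir = true;
			break;
		case REC_DROP_DB:
			s->database_dir = false;
			break;
		case REC_DROP_TABLESPACE:
			/* The matching drop record resolves any remembered reference. */
			s->unresolved = 0;
			s->tablespace_dir = false;
			break;
	}
}

/* Replay a whole sequence; return true if recovery may proceed. */
bool
toy_recover(ToyState *s, const RecKind *recs, size_t n)
{
	s->skipped = 0;
	s->unresolved = 0;
	for (size_t i = 0; i < n; i++)
		toy_replay(s, recs[i]);
	return s->unresolved == 0;
}
```

Replaying the three records once removes the tablespace directory. Replaying them again from the same starting point skips the create record but still ends with no unresolved references, which is the case the patch is meant to tolerate.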

7 files changed: +311 -1 lines changed


‎src/backend/access/transam/xlogrecovery.c

Lines changed: 6 additions & 0 deletions
@@ -2047,6 +2047,12 @@ CheckRecoveryConsistency(void)
 	 */
 	XLogCheckInvalidPages();

+	/*
+	 * Check if the XLOG sequence contained any unresolved references to
+	 * missing directories.
+	 */
+	XLogCheckMissingDirs();
+
 	reachedConsistency = true;
 	ereport(LOG,
 			(errmsg("consistent recovery state reached at %X/%X",

‎src/backend/access/transam/xlogutils.c

Lines changed: 158 additions & 1 deletion
@@ -54,6 +54,164 @@ bool		InRecovery = false;
 /* Are we in Hot Standby mode? Only valid in startup process, see xlogutils.h */
 HotStandbyState standbyState = STANDBY_DISABLED;

+
+/*
+ * If a create database WAL record is being replayed more than once during
+ * crash recovery on a standby, it is possible that either the tablespace
+ * directory or the template database directory is missing. This happens when
+ * the directories are removed by replay of subsequent drop records. Note
+ * that this problem happens only on standby and not on master. On master, a
+ * checkpoint is created at the end of create database operation. On standby,
+ * however, such a strategy (creating restart points during replay) is not
+ * viable because it will slow down WAL replay.
+ *
+ * The alternative is to track references to each missing directory
+ * encountered when performing crash recovery in the following hash table.
+ * Similar to invalid page table above, the expectation is that each missing
+ * directory entry should be matched with a drop database or drop tablespace
+ * WAL record by the end of crash recovery.
+ */
+typedef struct xl_missing_dir_key
+{
+	Oid			spcNode;
+	Oid			dbNode;
+} xl_missing_dir_key;
+
+typedef struct xl_missing_dir
+{
+	xl_missing_dir_key key;
+	char		path[MAXPGPATH];
+} xl_missing_dir;
+
+static HTAB *missing_dir_tab = NULL;
+
+
+/*
+ * Keep track of a directory that wasn't found while replaying database
+ * creation records. These should match up with tablespace removal records
+ * later in the WAL stream; we verify that before reaching consistency.
+ */
+void
+XLogRememberMissingDir(Oid spcNode, Oid dbNode, char *path)
+{
+	xl_missing_dir_key key;
+	bool		found;
+	xl_missing_dir *entry;
+
+	/*
+	 * Database OID may be invalid but tablespace OID must be valid. If
+	 * dbNode is InvalidOid, we are logging a missing tablespace directory,
+	 * otherwise we are logging a missing database directory.
+	 */
+	Assert(OidIsValid(spcNode));
+
+	if (missing_dir_tab == NULL)
+	{
+		/* create hash table when first needed */
+		HASHCTL		ctl;
+
+		memset(&ctl, 0, sizeof(ctl));
+		ctl.keysize = sizeof(xl_missing_dir_key);
+		ctl.entrysize = sizeof(xl_missing_dir);
+
+		missing_dir_tab = hash_create("XLOG missing directory table",
+									  100,
+									  &ctl,
+									  HASH_ELEM | HASH_BLOBS);
+	}
+
+	key.spcNode = spcNode;
+	key.dbNode = dbNode;
+
+	entry = hash_search(missing_dir_tab, &key, HASH_ENTER, &found);
+
+	if (found)
+	{
+		if (dbNode == InvalidOid)
+			elog(DEBUG1, "missing directory %s (tablespace %u) already exists: %s",
+				 path, spcNode, entry->path);
+		else
+			elog(DEBUG1, "missing directory %s (tablespace %u database %u) already exists: %s",
+				 path, spcNode, dbNode, entry->path);
+	}
+	else
+	{
+		strlcpy(entry->path, path, sizeof(entry->path));
+		if (dbNode == InvalidOid)
+			elog(DEBUG1, "logged missing dir %s (tablespace %u)",
+				 path, spcNode);
+		else
+			elog(DEBUG1, "logged missing dir %s (tablespace %u database %u)",
+				 path, spcNode, dbNode);
+	}
+}
+
+/*
+ * Remove an entry from the list of directories not found. This is to be done
+ * when the matching tablespace removal WAL record is found.
+ */
+void
+XLogForgetMissingDir(Oid spcNode, Oid dbNode)
+{
+	xl_missing_dir_key key;
+
+	key.spcNode = spcNode;
+	key.dbNode = dbNode;
+
+	/* Database OID may be invalid but tablespace OID must be valid. */
+	Assert(OidIsValid(spcNode));
+
+	if (missing_dir_tab == NULL)
+		return;
+
+	if (hash_search(missing_dir_tab, &key, HASH_REMOVE, NULL) != NULL)
+	{
+		if (dbNode == InvalidOid)
+		{
+			elog(DEBUG2, "forgot missing dir (tablespace %u)", spcNode);
+		}
+		else
+		{
+			char	   *path = GetDatabasePath(dbNode, spcNode);
+
+			elog(DEBUG2, "forgot missing dir %s (tablespace %u database %u)",
+				 path, spcNode, dbNode);
+			pfree(path);
+		}
+	}
+}
+
+/*
+ * This is called at the end of crash recovery, before entering archive
+ * recovery on a standby. PANIC if the hash table is not empty.
+ */
+void
+XLogCheckMissingDirs(void)
+{
+	HASH_SEQ_STATUS status;
+	xl_missing_dir *hentry;
+	bool		foundone = false;
+
+	if (missing_dir_tab == NULL)
+		return;					/* nothing to do */
+
+	hash_seq_init(&status, missing_dir_tab);
+
+	while ((hentry = (xl_missing_dir *) hash_seq_search(&status)) != NULL)
+	{
+		elog(WARNING, "missing directory \"%s\" tablespace %u database %u",
+			 hentry->path, hentry->key.spcNode, hentry->key.dbNode);
+		foundone = true;
+	}
+
+	if (foundone)
+		elog(PANIC, "WAL contains references to missing directories");
+
+	hash_destroy(missing_dir_tab);
+	missing_dir_tab = NULL;
+}
+
+
 /*
  * During XLOG replay, we may see XLOG records for incremental updates of
  * pages that no longer exist, because their relation was later dropped or
@@ -79,7 +237,6 @@ typedef struct xl_invalid_page

 static HTAB *invalid_page_tab = NULL;

-
 /* Report a reference to an invalid page */
 static void
 report_invalid_page(int elevel, RelFileNode node, ForkNumber forkno,
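
To make the remember/forget/check life cycle easier to follow outside the backend, here is a minimal standalone sketch. It is not the patch's implementation: it swaps PostgreSQL's dynahash (`hash_create`, `hash_search`) for a fixed-size linear array, and every name in it (`toy_remember`, `toy_forget`, `toy_unresolved`, `TOY_INVALID_OID`, the example OIDs and paths) is hypothetical. What it keeps is the keying rule from the diff: an entry is identified by the exact pair (tablespace OID, database OID), with an invalid database OID meaning "the tablespace directory itself".

```c
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_DIRS 16
#define TOY_INVALID_OID 0u		/* stands in for InvalidOid */

typedef struct
{
	unsigned	spc;			/* tablespace OID, must be valid */
	unsigned	db;				/* database OID, or TOY_INVALID_OID */
	char		path[256];		/* path recorded on first sighting */
	bool		used;
} ToyMissingDir;

static ToyMissingDir toy_tab[TOY_MAX_DIRS];

/* Linear lookup by exact (spc, db) key. */
static ToyMissingDir *
toy_find(unsigned spc, unsigned db)
{
	for (int i = 0; i < TOY_MAX_DIRS; i++)
		if (toy_tab[i].used && toy_tab[i].spc == spc && toy_tab[i].db == db)
			return &toy_tab[i];
	return NULL;
}

/*
 * Record a missing directory. A repeated key keeps the first path.
 * Returns false only if the toy table is full.
 */
bool
toy_remember(unsigned spc, unsigned db, const char *path)
{
	if (toy_find(spc, db) != NULL)
		return true;
	for (int i = 0; i < TOY_MAX_DIRS; i++)
	{
		if (!toy_tab[i].used)
		{
			toy_tab[i].used = true;
			toy_tab[i].spc = spc;
			toy_tab[i].db = db;
			snprintf(toy_tab[i].path, sizeof(toy_tab[i].path), "%s", path);
			return true;
		}
	}
	return false;
}

/* Drop the entry for this exact key, if there is one. */
void
toy_forget(unsigned spc, unsigned db)
{
	ToyMissingDir *e = toy_find(spc, db);

	if (e != NULL)
		e->used = false;
}

/* Count unresolved entries; the patch PANICs when any remain. */
int
toy_unresolved(void)
{
	int			n = 0;

	for (int i = 0; i < TOY_MAX_DIRS; i++)
		if (toy_tab[i].used)
			n++;
	return n;
}
```

Because the match is on the exact key, forgetting `(spc, TOY_INVALID_OID)` does not clear a separate `(spc, db)` entry. In the diff the two kinds of entry are likewise cleared by different records: drop tablespace replay passes `InvalidOid`, while drop database replay passes the database OID.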

‎src/backend/commands/dbcommands.c

Lines changed: 57 additions & 0 deletions
@@ -30,6 +30,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xloginsert.h"
+#include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
@@ -2483,7 +2484,9 @@ dbase_redo(XLogReaderState *record)
 		xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);
 		char	   *src_path;
 		char	   *dst_path;
+		char	   *parent_path;
 		struct stat st;
+		bool		skip = false;

 		src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
 		dst_path = GetDatabasePath(xlrec->db_id, xlrec->tablespace_id);
@@ -2501,6 +2504,56 @@ dbase_redo(XLogReaderState *record)
 						(errmsg("some useless files may be left behind in old database directory \"%s\"",
 								dst_path)));
 		}
+		else if (!reachedConsistency)
+		{
+			/*
+			 * It is possible that a drop tablespace record appearing later in
+			 * WAL has already been replayed -- in other words, that we are
+			 * replaying the database creation record a second time with no
+			 * intervening checkpoint. In that case, the tablespace directory
+			 * has already been removed and the create database operation
+			 * cannot be replayed. Skip the replay itself, but remember the
+			 * fact that the tablespace directory is missing, to be matched
+			 * with the expected tablespace drop record later.
+			 */
+			parent_path = pstrdup(dst_path);
+			get_parent_directory(parent_path);
+			if (!(stat(parent_path, &st) == 0 && S_ISDIR(st.st_mode)))
+			{
+				XLogRememberMissingDir(xlrec->tablespace_id, InvalidOid, parent_path);
+				skip = true;
+				ereport(WARNING,
+						(errmsg("skipping replay of database creation WAL record"),
+						 errdetail("The target tablespace \"%s\" directory was not found.",
+								   parent_path),
+						 errhint("A future WAL record that removes the directory before reaching consistent mode is expected.")));
+			}
+			pfree(parent_path);
+		}
+
+		/*
+		 * If the source directory is missing, skip the copy and make a note of
+		 * it for later.
+		 *
+		 * One possible reason for this is that the template database used for
+		 * creating this database may have been dropped, as noted above.
+		 * Moving a database from one tablespace may also be a partner in the
+		 * crime.
+		 */
+		if (!(stat(src_path, &st) == 0 && S_ISDIR(st.st_mode)) &&
+			!reachedConsistency)
+		{
+			XLogRememberMissingDir(xlrec->src_tablespace_id, xlrec->src_db_id, src_path);
+			skip = true;
+			ereport(WARNING,
+					(errmsg("skipping replay of database creation WAL record"),
+					 errdetail("The source database directory \"%s\" was not found.",
+							   src_path),
+					 errhint("A future WAL record that removes the directory before reaching consistent mode is expected.")));
+		}
+
+		if (skip)
+			return;

 		/*
 		 * Force dirty buffers out to disk, to ensure source database is
@@ -2563,6 +2616,10 @@ dbase_redo(XLogReaderState *record)
 				ereport(WARNING,
 						(errmsg("some useless files may be left behind in old database directory \"%s\"",
 								dst_path)));
+
+			if (!reachedConsistency)
+				XLogForgetMissingDir(xlrec->tablespace_ids[i], xlrec->db_id);
+
 			pfree(dst_path);
 		}
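
The directory checks that `dbase_redo` performs for a create record can be tried in isolation with plain POSIX calls. The following is a simplified sketch, not the backend code: `toy_parent_dir` is a hypothetical stand-in for `get_parent_directory()` that just cuts at the last `/`, and `toy_decide` returns only the first reason for skipping, whereas the patch runs both checks and may remember both directories. It also omits the `reachedConsistency` condition and the bookkeeping calls.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* True only if path exists and is a directory (same test shape as the patch). */
bool
toy_dir_exists(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Hypothetical, simplified stand-in for get_parent_directory(). */
void
toy_parent_dir(char *path)
{
	char	   *slash = strrchr(path, '/');

	if (slash != NULL)
		*slash = '\0';
}

typedef enum
{
	TOY_REPLAY,					/* both directories usable: replay the copy */
	TOY_SKIP_NO_TABLESPACE,		/* parent of the target directory is missing */
	TOY_SKIP_NO_SOURCE			/* source database directory is missing */
} ToyDecision;

/* Decide whether a create-database record could be replayed. */
ToyDecision
toy_decide(const char *src_path, const char *dst_path)
{
	char		parent[1024];

	/* The parent is only examined when the target itself is absent. */
	if (!toy_dir_exists(dst_path))
	{
		snprintf(parent, sizeof(parent), "%s", dst_path);
		toy_parent_dir(parent);
		if (!toy_dir_exists(parent))
			return TOY_SKIP_NO_TABLESPACE;
	}

	if (!toy_dir_exists(src_path))
		return TOY_SKIP_NO_SOURCE;

	return TOY_REPLAY;
}
```

A missing target directory on its own is not a reason to skip, since replay creates it. Only a missing parent (the tablespace side) or a missing source directory leads to a skip.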

‎src/backend/commands/tablespace.c

Lines changed: 17 additions & 0 deletions
@@ -57,6 +57,7 @@
 #include "access/tableam.h"
 #include "access/xact.h"
 #include "access/xloginsert.h"
+#include "access/xlogrecovery.h"
 #include "access/xlogutils.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
@@ -1574,6 +1575,22 @@ tblspc_redo(XLogReaderState *record)
 	{
 		xl_tblspc_drop_rec *xlrec = (xl_tblspc_drop_rec *) XLogRecGetData(record);

+		if (!reachedConsistency)
+			XLogForgetMissingDir(xlrec->ts_id, InvalidOid);
+
+		/*
+		 * Before we remove the tablespace directory, update minimum recovery
+		 * point to cover this WAL record. Once the tablespace is removed,
+		 * there's no going back. This manually enforces the WAL-first rule.
+		 * Doing this before the removal means that if the removal fails for
+		 * some reason, the directory is left alone and needs to be manually
+		 * removed. Alternatively we could update the minimum recovery point
+		 * after removal, but that would leave a small window where the
+		 * WAL-first rule could be violated.
+		 */
+		if (!reachedConsistency)
+			XLogFlush(record->EndRecPtr);
+
 		/*
 		 * If we issued a WAL record for a drop tablespace it implies that
 		 * there were no files in it at all when the DROP was done. That means
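
The ordering argument in the comment above (advance the minimum recovery point first, remove the directory second) can be stated as a small invariant. The sketch below is a toy model with hypothetical names (`ToyStandby`, `toy_replay_drop_tablespace`); it does not call `XLogFlush` or touch the filesystem, and the `remove_ok` flag merely simulates whether the removal succeeds.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
	uint64_t	min_recovery_point; /* recovery must replay at least this far */
	bool		dir_exists;			/* is the tablespace directory present? */
} ToyStandby;

/* Advance the minimum recovery point, never moving it backwards. */
static void
toy_update_min_recovery_point(ToyStandby *s, uint64_t lsn)
{
	if (lsn > s->min_recovery_point)
		s->min_recovery_point = lsn;
}

/*
 * Replay a drop-tablespace record ending at end_lsn. The recovery point is
 * advanced before the removal, so the directory is never gone while the
 * recovery point still precedes the record.
 */
bool
toy_replay_drop_tablespace(ToyStandby *s, uint64_t end_lsn, bool remove_ok)
{
	toy_update_min_recovery_point(s, end_lsn);

	if (!remove_ok)
		return false;			/* directory left alone; manual cleanup needed */

	s->dir_exists = false;
	return true;
}
```

With this order, a failed removal leaves a stale directory but a consistent recovery point, which is the trade-off the comment describes.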

‎src/include/access/xlogutils.h

Lines changed: 4 additions & 0 deletions
@@ -65,6 +65,10 @@ extern void XLogDropDatabase(Oid dbid);
 extern void XLogTruncateRelation(RelFileNode rnode, ForkNumber forkNum,
 								 BlockNumber nblocks);

+extern void XLogRememberMissingDir(Oid spcNode, Oid dbNode, char *path);
+extern void XLogForgetMissingDir(Oid spcNode, Oid dbNode);
+extern void XLogCheckMissingDirs(void);
+
 /* Result codes for XLogReadBufferForRedo[Extended] */
 typedef enum
 {
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+# Test recovery involving tablespace removal. If recovery stops
+# after once tablespace is removed, the next recovery should properly
+# ignore the operations within the removed tablespaces.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary1');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+$node_primary->psql('postgres',
+qq[
+	SET allow_in_place_tablespaces=on;
+	CREATE TABLESPACE dropme_ts1 LOCATION '';
+	CREATE TABLESPACE dropme_ts2 LOCATION '';
+	CREATE TABLESPACE source_ts LOCATION '';
+	CREATE TABLESPACE target_ts LOCATION '';
+	CREATE DATABASE template_db IS_TEMPLATE = true;
+]);
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+
+my $node_standby = PostgreSQL::Test::Cluster->new('standby1');
+$node_standby->init_from_backup($node_primary, $backup_name, has_streaming => 1);
+$node_standby->start;
+
+# Make sure connection is made
+$node_primary->poll_query_until(
+	'postgres', 'SELECT count(*) = 1 FROM pg_stat_replication');
+
+$node_standby->safe_psql('postgres', 'CHECKPOINT');
+
+# Do immediate shutdown just after a sequence of CREATE DATABASE / DROP
+# DATABASE / DROP TABLESPACE. This causes CREATE DATABASE WAL records
+# to be applied to already-removed directories.
+$node_primary->safe_psql('postgres',
+	q[CREATE DATABASE dropme_db1 WITH TABLESPACE dropme_ts1;
+	  CREATE DATABASE dropme_db2 WITH TABLESPACE dropme_ts2;
+	  CREATE DATABASE moveme_db TABLESPACE source_ts;
+	  ALTER DATABASE moveme_db SET TABLESPACE target_ts;
+	  CREATE DATABASE newdb TEMPLATE template_db;
+	  ALTER DATABASE template_db IS_TEMPLATE = false;
+	  DROP DATABASE dropme_db1;
+	  DROP DATABASE dropme_db2; DROP TABLESPACE dropme_ts2;
+	  DROP TABLESPACE source_ts;
+	  DROP DATABASE template_db;]);
+
+$node_primary->wait_for_catchup($node_standby, 'replay',
+	$node_primary->lsn('replay'));
+$node_standby->stop('immediate');
+
+# Should restart ignoring directory creation error.
+is($node_standby->start, 1, "standby started successfully");
+
+my $log = PostgreSQL::Test::Utils::slurp_file($node_standby->logfile);
+like(
+	$log,
+	qr[WARNING: skipping replay of database creation WAL record],
+	"warning message is logged");
+
+done_testing();

‎src/tools/pgindent/typedefs.list

Lines changed: 2 additions & 0 deletions
@@ -3736,6 +3736,8 @@ xl_invalid_page
 xl_invalid_page_key
 xl_invalidations
 xl_logical_message
+xl_missing_dir_key
+xl_missing_dir
 xl_multi_insert_tuple
 xl_multixact_create
 xl_multixact_truncate
