Commit 9a7e26b

Fix replay of create database records on standby
Crash recovery on standby may encounter missing directories
when replaying database-creation WAL records. Prior to this
patch, the standby would fail to recover in such a case;
however, the directories could be legitimately missing.
Consider the following sequence of commands:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the
tablespace directory, the standby crashes and has to replay the
create database record again, crash recovery must be able to continue.

A fix for this problem was already attempted in 49d9cfc, but it
was reverted because of design issues. This new version is based
on Robert Haas' proposal: any missing tablespaces are created
during recovery before reaching consistency. Tablespaces
are created as real directories, and should be deleted
by later replay. CheckRecoveryConsistency ensures
they have disappeared.

The problems detected by this new code are reported as PANIC,
except when allow_in_place_tablespaces is set to ON, in which
case they are WARNING. Apart from making tests possible, this
gives users an escape hatch in case things don't go as planned.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Author: Paul Guo <paulguo@gmail.com>
Reviewed-by: Anastasia Lubennikova <lubennikovaav@gmail.com> (older versions)
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> (older versions)
Reviewed-by: Michaël Paquier <michael@paquier.xyz>
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
1 parent 16e7a8f, commit 9a7e26b

4 files changed, +302 -31 lines changed

src/backend/access/transam/xlog.c

Lines changed: 61 additions & 0 deletions
@@ -8037,6 +8037,59 @@ StartupXLOG(void)
         RequestCheckpoint(CHECKPOINT_FORCE);
 }
 
+/*
+ * Verify that, in non-test mode, ./pg_tblspc doesn't contain any real
+ * directories.
+ *
+ * Replay of database creation XLOG records for databases that were later
+ * dropped can create fake directories in pg_tblspc. By the time consistency
+ * is reached these directories should have been removed; here we verify
+ * that this did indeed happen. This is to be called at the point where
+ * consistent state is reached.
+ *
+ * allow_in_place_tablespaces turns the PANIC into a WARNING, which is
+ * useful for testing purposes, and also allows for an escape hatch in case
+ * things go south.
+ */
+static void
+CheckTablespaceDirectory(void)
+{
+    DIR        *dir;
+    struct dirent *de;
+
+    dir = AllocateDir("pg_tblspc");
+    while ((de = ReadDir(dir, "pg_tblspc")) != NULL)
+    {
+        char        path[MAXPGPATH + 10];
+#ifndef WIN32
+        struct stat st;
+#endif
+
+        /* Skip entries of non-oid names */
+        if (strspn(de->d_name, "0123456789") != strlen(de->d_name))
+            continue;
+
+        snprintf(path, sizeof(path), "pg_tblspc/%s", de->d_name);
+
+#ifndef WIN32
+        if (lstat(path, &st) < 0)
+            ereport(LOG,
+                    (errcode_for_file_access(),
+                     errmsg("could not stat file \"%s\": %m",
+                            path)));
+        else if (!S_ISLNK(st.st_mode))
+#else                           /* WIN32 */
+        if (!pgwin32_is_junction(path))
+#endif
+            ereport(allow_in_place_tablespaces ? WARNING : PANIC,
+                    (errcode(ERRCODE_DATA_CORRUPTED),
+                     errmsg("unexpected directory entry \"%s\" found in %s",
+                            de->d_name, "pg_tblspc/"),
+                     errdetail("All directory entries in pg_tblspc/ should be symbolic links."),
+                     errhint("Remove those directories, or set allow_in_place_tablespaces to ON transiently to let recovery complete.")));
+    }
+}
+
 /*
  * Checks if recovery has reached a consistent state. When consistency is
  * reached and we have a valid starting standby snapshot, tell postmaster
@@ -8107,6 +8160,14 @@ CheckRecoveryConsistency(void)
          */
         XLogCheckInvalidPages();
 
+        /*
+         * Check that pg_tblspc doesn't contain any real directories. Replay
+         * of Database/CREATE_* records may have created ficticious tablespace
+         * directories that should have been removed by the time consistency
+         * was reached.
+         */
+        CheckTablespaceDirectory();
+
         reachedConsistency = true;
         ereport(LOG,
                 (errmsg("consistent recovery state reached at %X/%X",
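The directory scan in CheckTablespaceDirectory can be approximated outside the server with plain POSIX calls. What follows is a minimal sketch under stated assumptions, not the PostgreSQL implementation: `is_oid_name` and `count_real_dirs` are hypothetical names, error reporting is reduced to a return value instead of ereport, and the Windows junction branch is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Returns 1 if every character of name is a decimal digit, else 0. */
int
is_oid_name(const char *name)
{
    assert(name != NULL);
    return strspn(name, "0123456789") == strlen(name);
}

/*
 * Counts entries under tblspc_dir whose names look like OIDs but which are
 * not symbolic links.  Returns -1 if the directory cannot be opened.
 * (Hypothetical helper; the patch reports each offender instead.)
 */
int
count_real_dirs(const char *tblspc_dir)
{
    DIR        *dir = opendir(tblspc_dir);
    struct dirent *de;
    int         count = 0;

    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL)
    {
        char        path[4096];
        struct stat st;

        /* Skip ".", ".." and anything else that is not all digits */
        if (!is_oid_name(de->d_name))
            continue;

        snprintf(path, sizeof(path), "%s/%s", tblspc_dir, de->d_name);

        /* lstat does not follow symlinks, so a link is seen as a link */
        if (lstat(path, &st) == 0 && !S_ISLNK(st.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```

A caller could run `count_real_dirs("pg_tblspc")` from inside a data directory and treat any positive result as the condition the patch reports as PANIC, or as WARNING when allow_in_place_tablespaces is on.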

src/backend/commands/dbcommands.c

Lines changed: 69 additions & 0 deletions
@@ -46,6 +46,7 @@
 #include "commands/defrem.h"
 #include "commands/seclabel.h"
 #include "commands/tablespace.h"
+#include "common/file_perm.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -2169,6 +2170,45 @@ get_database_name(Oid dbid)
     return result;
 }
 
+/*
+ * recovery_create_dbdir()
+ *
+ * During recovery, there's a case where we validly need to recover a missing
+ * tablespace directory so that recovery can continue. This happens when
+ * recovery wants to create a database but the holding tablespace has been
+ * removed before the server stopped. Since we expect that the directory will
+ * be gone before reaching recovery consistency, and we have no knowledge about
+ * the tablespace other than its OID here, we create a real directory under
+ * pg_tblspc here instead of restoring the symlink.
+ *
+ * If only_tblspc is true, then the requested directory must be in pg_tblspc/
+ */
+static void
+recovery_create_dbdir(char *path, bool only_tblspc)
+{
+    struct stat st;
+
+    Assert(RecoveryInProgress());
+
+    if (stat(path, &st) == 0)
+        return;
+
+    if (only_tblspc && strstr(path, "pg_tblspc/") == NULL)
+        elog(PANIC, "requested to created invalid directory: %s", path);
+
+    if (reachedConsistency && !allow_in_place_tablespaces)
+        ereport(PANIC,
+                errmsg("missing directory \"%s\"", path));
+
+    elog(reachedConsistency ? WARNING : DEBUG1,
+         "creating missing directory: %s", path);
+
+    if (pg_mkdir_p(path, pg_dir_create_mode) != 0)
+        ereport(PANIC,
+                errmsg("could not create missing directory \"%s\": %m", path));
+}
+
+
 /*
  * DATABASE resource manager's routines
  */
@@ -2185,6 +2225,7 @@ dbase_redo(XLogReaderState *record)
         xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);
         char       *src_path;
         char       *dst_path;
+        char       *parent_path;
         struct stat st;
 
         src_path = GetDatabasePath(xlrec->src_db_id, xlrec->src_tablespace_id);
@@ -2204,6 +2245,34 @@ dbase_redo(XLogReaderState *record)
                                 dst_path)));
         }
 
+        /*
+         * If the parent of the target path doesn't exist, create it now. This
+         * enables us to create the target underneath later. Note that if
+         * the database dir is not in a tablespace, the parent will always
+         * exist, so this never runs in that case.
+         */
+        parent_path = pstrdup(dst_path);
+        get_parent_directory(parent_path);
+        if (stat(parent_path, &st) < 0)
+        {
+            if (errno != ENOENT)
+                ereport(FATAL,
+                        errmsg("could not stat directory \"%s\": %m",
+                               dst_path));
+
+            recovery_create_dbdir(parent_path, true);
+        }
+        pfree(parent_path);
+
+        /*
+         * There's a case where the copy source directory is missing for the
+         * same reason above. Create the emtpy source directory so that
+         * copydir below doesn't fail. The directory will be dropped soon by
+         * recovery.
+         */
+        if (stat(src_path, &st) < 0 && errno == ENOENT)
+            recovery_create_dbdir(src_path, false);
+
         /*
          * Force dirty buffers out to disk, to ensure source database is
          * up-to-date for the copy.
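The branching in recovery_create_dbdir is easier to review when written as a pure decision function. The sketch below is illustrative only: `MissingDirAction` and `missing_dir_action` are hypothetical names, the filesystem check is passed in as a flag rather than performed with stat, and no directory is created or message logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Possible outcomes; the names are illustrative, not PostgreSQL's. */
typedef enum
{
    ACTION_NONE,            /* directory already exists, nothing to do */
    ACTION_PANIC,           /* recovery must stop */
    ACTION_CREATE_WARNING,  /* create the directory, log at WARNING */
    ACTION_CREATE_DEBUG     /* create the directory, log at DEBUG1 */
} MissingDirAction;

/*
 * Mirrors the order of checks in the patch:
 *   1. an existing directory needs no action;
 *   2. with only_tblspc set, a path outside pg_tblspc/ is rejected;
 *   3. after consistency, a missing directory is fatal unless
 *      allow_in_place_tablespaces is on;
 *   4. otherwise the directory is created, and only the log level differs.
 */
MissingDirAction
missing_dir_action(const char *path, bool only_tblspc, bool dir_exists,
                   bool reached_consistency, bool allow_in_place_tablespaces)
{
    assert(path != NULL);

    if (dir_exists)
        return ACTION_NONE;
    if (only_tblspc && strstr(path, "pg_tblspc/") == NULL)
        return ACTION_PANIC;
    if (reached_consistency && !allow_in_place_tablespaces)
        return ACTION_PANIC;
    return reached_consistency ? ACTION_CREATE_WARNING : ACTION_CREATE_DEBUG;
}
```

Read this way, the WARNING path is reachable only when consistency has been reached and allow_in_place_tablespaces is on, which is the combination the second half of the new TAP test relies on.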

src/backend/commands/tablespace.c

Lines changed: 9 additions & 31 deletions
@@ -154,8 +154,6 @@ TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
                 /* Directory creation failed? */
                 if (MakePGDirectory(dir) < 0)
                 {
-                    char       *parentdir;
-
                     /* Failure other than not exists or not in WAL replay? */
                     if (errno != ENOENT || !isRedo)
                         ereport(ERROR,
@@ -164,36 +162,16 @@ TablespaceCreateDbspace(Oid spcNode, Oid dbNode, bool isRedo)
                                         dir)));
 
                     /*
-                     * Parent directories are missing during WAL replay, so
-                     * continue by creating simple parent directories rather
-                     * than a symlink.
+                     * During WAL replay, it's conceivable that several levels
+                     * of directories are missing if tablespaces are dropped
+                     * further ahead of the WAL stream than we're currently
+                     * replaying. An easy way forward is to create them as
+                     * plain directories and hope they are removed by further
+                     * WAL replay if necessary. If this also fails, there is
+                     * trouble we cannot get out of, so just report that and
+                     * bail out.
                      */
-
-                    /* create two parents up if not exist */
-                    parentdir = pstrdup(dir);
-                    get_parent_directory(parentdir);
-                    get_parent_directory(parentdir);
-                    /* Can't create parent and it doesn't already exist? */
-                    if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)
-                        ereport(ERROR,
-                                (errcode_for_file_access(),
-                                 errmsg("could not create directory \"%s\": %m",
-                                        parentdir)));
-                    pfree(parentdir);
-
-                    /* create one parent up if not exist */
-                    parentdir = pstrdup(dir);
-                    get_parent_directory(parentdir);
-                    /* Can't create parent and it doesn't already exist? */
-                    if (MakePGDirectory(parentdir) < 0 && errno != EEXIST)
-                        ereport(ERROR,
-                                (errcode_for_file_access(),
-                                 errmsg("could not create directory \"%s\": %m",
-                                        parentdir)));
-                    pfree(parentdir);
-
-                    /* Create database directory */
-                    if (MakePGDirectory(dir) < 0)
+                    if (pg_mkdir_p(dir, pg_dir_create_mode) < 0)
                         ereport(ERROR,
                                 (errcode_for_file_access(),
                                  errmsg("could not create directory \"%s\": %m",
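The tablespace.c change swaps two hand-written "create the parent, then the grandparent" steps for a single recursive create. The sketch below shows that general technique with POSIX mkdir. It is not the source of pg_mkdir_p: `mkdir_p` is a hypothetical stand-in, the fixed 4096-byte buffer is an assumption made for brevity, and the only behavior taken from the diff is the return convention (0 on success, negative on failure).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Creates path and every missing ancestor.  Returns 0 on success, or -1
 * with errno set.  An already existing directory counts as success.
 */
int
mkdir_p(const char *path, mode_t mode)
{
    char        buf[4096];
    size_t      len;
    char       *p;
    struct stat st;

    assert(path != NULL);

    len = strlen(path);
    if (len == 0)
    {
        errno = ENOENT;
        return -1;
    }
    if (len >= sizeof(buf))
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Create each ancestor in turn; start at buf + 1 to skip a leading '/' */
    for (p = buf + 1; *p != '\0'; p++)
    {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, mode) < 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }

    /* Create the last component; tolerate it only if it is a directory */
    if (mkdir(buf, mode) < 0)
    {
        if (errno != EEXIST)
            return -1;
        if (stat(buf, &st) < 0 || !S_ISDIR(st.st_mode))
        {
            errno = ENOTDIR;
            return -1;
        }
    }
    return 0;
}
```

Because every level is attempted and EEXIST is tolerated, the same call covers one, two, or more missing levels, which is the case the rewritten comment describes as "several levels of directories are missing".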
Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
+
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# Test replay of tablespace/database creation/drop
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+sub test_tablespace
+{
+    my $node_primary = PostgreSQL::Test::Cluster->new("primary1");
+    $node_primary->init(allows_streaming => 1);
+    $node_primary->start;
+    $node_primary->psql(
+        'postgres',
+        qq[
+            SET allow_in_place_tablespaces=on;
+            CREATE TABLESPACE dropme_ts1 LOCATION '';
+            CREATE TABLESPACE dropme_ts2 LOCATION '';
+            CREATE TABLESPACE source_ts LOCATION '';
+            CREATE TABLESPACE target_ts LOCATION '';
+            CREATE DATABASE template_db IS_TEMPLATE = true;
+        ]);
+    my $backup_name = 'my_backup';
+    $node_primary->backup($backup_name);
+
+    my $node_standby = PostgreSQL::Test::Cluster->new("standby2");
+    $node_standby->init_from_backup($node_primary, $backup_name,
+        has_streaming => 1);
+    $node_standby->append_conf('postgresql.conf',
+        "allow_in_place_tablespaces = on");
+    $node_standby->start;
+
+    # Make sure connection is made
+    $node_primary->poll_query_until('postgres',
+        'SELECT count(*) = 1 FROM pg_stat_replication');
+
+    $node_standby->safe_psql('postgres', 'CHECKPOINT');
+
+    # Do immediate shutdown just after a sequence of CREAT DATABASE / DROP
+    # DATABASE / DROP TABLESPACE. This causes CREATE DATABASE WAL records
+    # to be applied to already-removed directories.
+    my $query = q[
+        CREATE DATABASE dropme_db1 WITH TABLESPACE dropme_ts1;
+        CREATE TABLE t (a int) TABLESPACE dropme_ts2;
+        CREATE DATABASE dropme_db2 WITH TABLESPACE dropme_ts2;
+        CREATE DATABASE moveme_db TABLESPACE source_ts;
+        ALTER DATABASE moveme_db SET TABLESPACE target_ts;
+        CREATE DATABASE newdb TEMPLATE template_db;
+        ALTER DATABASE template_db IS_TEMPLATE = false;
+        DROP DATABASE dropme_db1;
+        DROP TABLE t;
+        DROP DATABASE dropme_db2; DROP TABLESPACE dropme_ts2;
+        DROP TABLESPACE source_ts;
+        DROP DATABASE template_db;
+    ];
+
+    $node_primary->safe_psql('postgres', $query);
+    $node_primary->wait_for_catchup($node_standby, 'replay',
+        $node_primary->lsn('write'));
+
+    # show "create missing directory" log message
+    $node_standby->safe_psql('postgres',
+        "ALTER SYSTEM SET log_min_messages TO debug1;");
+    $node_standby->stop('immediate');
+    # Should restart ignoring directory creation error.
+    is($node_standby->start(fail_ok => 1), 1, "standby node started");
+    $node_standby->stop('immediate');
+}
+
+test_tablespace();
+
+# Ensure that a missing tablespace directory during create database
+# replay immediately causes panic if the standby has already reached
+# consistent state (archive recovery is in progress).
+
+my $node_primary = PostgreSQL::Test::Cluster->new('primary2');
+$node_primary->init(allows_streaming => 1);
+$node_primary->start;
+
+# Create tablespace
+$node_primary->safe_psql(
+    'postgres', q[
+        SET allow_in_place_tablespaces=on;
+        CREATE TABLESPACE ts1 LOCATION ''
+    ]);
+$node_primary->safe_psql('postgres',
+    "CREATE DATABASE db1 WITH TABLESPACE ts1");
+
+# Take backup
+my $backup_name = 'my_backup';
+$node_primary->backup($backup_name);
+my $node_standby = PostgreSQL::Test::Cluster->new('standby3');
+$node_standby->init_from_backup($node_primary, $backup_name,
+    has_streaming => 1);
+$node_standby->append_conf('postgresql.conf',
+    "allow_in_place_tablespaces = on");
+$node_standby->start;
+
+# Make sure standby reached consistency and starts accepting connections
+$node_standby->poll_query_until('postgres', 'SELECT 1', '1');
+
+# Remove standby tablespace directory so it will be missing when
+# replay resumes.
+my $tspoid = $node_standby->safe_psql('postgres',
+    "SELECT oid FROM pg_tablespace WHERE spcname = 'ts1';");
+my $tspdir = $node_standby->data_dir . "/pg_tblspc/$tspoid";
+File::Path::rmtree($tspdir);
+
+my $logstart = get_log_size($node_standby);
+
+# Create a database in the tablespace and a table in default tablespace
+$node_primary->safe_psql(
+    'postgres',
+    q[
+        CREATE TABLE should_not_replay_insertion(a int);
+        CREATE DATABASE db2 WITH TABLESPACE ts1;
+        INSERT INTO should_not_replay_insertion VALUES (1);
+    ]);
+
+# Standby should fail and should not silently skip replaying the wal
+# In this test, PANIC turns into WARNING by allow_in_place_tablespaces.
+# Check the log messages instead of confirming standby failure.
+my $max_attempts = $PostgreSQL::Test::Utils::timeout_default;
+while ($max_attempts-- >= 0)
+{
+    last
+      if (
+        find_in_log(
+            $node_standby, "WARNING: creating missing directory: pg_tblspc/",
+            $logstart));
+    sleep 1;
+}
+ok($max_attempts > 0, "invalid directory creation is detected");
+
+done_testing();
+
+
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+    my ($node) = @_;
+
+    return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}
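The find_in_log helper only looks at log text written after a recorded byte offset, so messages logged before the tablespace directory was removed cannot satisfy the check. The sketch below restates that idea in C under simplifying assumptions: `find_in_log_text` is a hypothetical name, the log is already in memory as a NUL-terminated string, and the match is a plain substring search where the Perl helper uses a regular expression.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if pat occurs in log at or after byte offset off, else 0.
 * As in the Perl helper, a log no longer than off yields 0.
 */
int
find_in_log_text(const char *log, size_t off, const char *pat)
{
    assert(log != NULL && pat != NULL);

    if (strlen(log) <= off)
        return 0;

    /* Substring search over the tail only; earlier bytes are ignored */
    return strstr(log + off, pat) != NULL;
}
```

In the test, the offset comes from get_log_size taken just after the directory is removed, and the pattern is the "creating missing directory" warning emitted by recovery_create_dbdir.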
