Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit1691512

Browse files
pg_rewind: Fetch small files according to new size.
There's a race condition if a file changes in the source systemafter we have collected the file list. If the file becomes larger,we only fetched up to its original size. That can easily result ina truncated file. That's not a problem for relation files, filesin pg_xact, etc. because any actions on them will be replayed fromthe WAL. However, configuration files are affected.This commit mitigates the race condition by fetching small files inwhole, even if they have grown. A test is added in which an extrafile copied is concurrently grown with the output of pg_rewind thusguaranteeing it to have changed in size during the operation. Thisis not a full fix: we still believe the original file size for fileslarger than 1 MB. That should be enough for configuration files,and doing more than that would require big changes to the chunkinglogic in libpq_source.c.This mitigates the race condition if the file is modified betweenthe original scan of files and copying the file, but there's stilla race condition if a file is changed while it's being copied.That's a much smaller window, though, and pg_basebackup has thesame issue.This race can be seen with pg_auto_failover, which frequently usesALTER SYSTEM, which updates postgresql.auto.conf. Often, pg_rewindwill fail, because the postgresql.auto.conf file changed concurrentlyand a partial version of it was copied to the target. The partialfile would fail to parse, preventing the server from starting up.Author: Heikki LinnakangasReviewed-by: Cary HuangDiscussion:https://postgr.es/m/f67feb24-5833-88cb-1020-19a4a2b83ac7%40iki.fi
1 parent98fe742 commit1691512

File tree

5 files changed

+188
-14
lines changed

5 files changed

+188
-14
lines changed

‎src/bin/pg_rewind/libpq_source.c

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,7 @@ static void process_queued_fetch_requests(libpq_source *src);
6363
/* public interface functions */
6464
staticvoidlibpq_traverse_files(rewind_source*source,
6565
process_file_callback_tcallback);
66+
staticvoidlibpq_queue_fetch_file(rewind_source*source,constchar*path,size_tlen);
6667
staticvoidlibpq_queue_fetch_range(rewind_source*source,constchar*path,
6768
off_toff,size_tlen);
6869
staticvoidlibpq_finish_fetch(rewind_source*source);
@@ -88,6 +89,7 @@ init_libpq_source(PGconn *conn)
8889

8990
src->common.traverse_files=libpq_traverse_files;
9091
src->common.fetch_file=libpq_fetch_file;
92+
src->common.queue_fetch_file=libpq_queue_fetch_file;
9193
src->common.queue_fetch_range=libpq_queue_fetch_range;
9294
src->common.finish_fetch=libpq_finish_fetch;
9395
src->common.get_current_wal_insert_lsn=libpq_get_current_wal_insert_lsn;
@@ -307,6 +309,36 @@ libpq_traverse_files(rewind_source *source, process_file_callback_t callback)
307309
PQclear(res);
308310
}
309311

312+
/*
313+
* Queue up a request to fetch a file from remote system.
314+
*/
315+
staticvoid
316+
libpq_queue_fetch_file(rewind_source*source,constchar*path,size_tlen)
317+
{
318+
/*
319+
* Truncate the target file immediately, and queue a request to fetch it
320+
* from the source. If the file is small, smaller than MAX_CHUNK_SIZE,
321+
* request fetching a full-sized chunk anyway, so that if the file has
322+
* become larger in the source system, after we scanned the source
323+
* directory, we still fetch the whole file. This only works for files up
324+
* to MAX_CHUNK_SIZE, but that's good enough for small configuration files
325+
* and such that are changed every now and then, but not WAL-logged. For
326+
* larger files, we fetch up to the original size.
327+
*
328+
* Even with that mechanism, there is an inherent race condition if the
329+
* file is modified at the same instant that we're copying it, so that we
330+
* might copy a torn version of the file with one half from the old
331+
* version and another half from the new. But pg_basebackup has the same
332+
* problem, and it hasn't been a problem in practice.
333+
*
334+
* It might seem more natural to truncate the file later, when we receive
335+
* it from the source server, but then we'd need to track which
336+
* fetch-requests are for a whole file.
337+
*/
338+
open_target_file(path, true);
339+
libpq_queue_fetch_range(source,path,0,Max(len,MAX_CHUNK_SIZE));
340+
}
341+
310342
/*
311343
* Queue up a request to fetch a piece of a file from remote system.
312344
*/

‎src/bin/pg_rewind/local_source.c

Lines changed: 66 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -29,8 +29,10 @@ static void local_traverse_files(rewind_source *source,
2929
process_file_callback_tcallback);
3030
staticchar*local_fetch_file(rewind_source*source,constchar*path,
3131
size_t*filesize);
32-
staticvoidlocal_fetch_file_range(rewind_source*source,constchar*path,
33-
off_toff,size_tlen);
32+
staticvoidlocal_queue_fetch_file(rewind_source*source,constchar*path,
33+
size_tlen);
34+
staticvoidlocal_queue_fetch_range(rewind_source*source,constchar*path,
35+
off_toff,size_tlen);
3436
staticvoidlocal_finish_fetch(rewind_source*source);
3537
staticvoidlocal_destroy(rewind_source*source);
3638

@@ -43,7 +45,8 @@ init_local_source(const char *datadir)
4345

4446
src->common.traverse_files=local_traverse_files;
4547
src->common.fetch_file=local_fetch_file;
46-
src->common.queue_fetch_range=local_fetch_file_range;
48+
src->common.queue_fetch_file=local_queue_fetch_file;
49+
src->common.queue_fetch_range=local_queue_fetch_range;
4750
src->common.finish_fetch=local_finish_fetch;
4851
src->common.get_current_wal_insert_lsn=NULL;
4952
src->common.destroy=local_destroy;
@@ -65,12 +68,65 @@ local_fetch_file(rewind_source *source, const char *path, size_t *filesize)
6568
returnslurpFile(((local_source*)source)->datadir,path,filesize);
6669
}
6770

71+
/*
72+
* Copy a file from source to target.
73+
*
74+
* 'len' is the expected length of the file.
75+
*/
76+
staticvoid
77+
local_queue_fetch_file(rewind_source*source,constchar*path,size_tlen)
78+
{
79+
constchar*datadir= ((local_source*)source)->datadir;
80+
PGAlignedBlockbuf;
81+
charsrcpath[MAXPGPATH];
82+
intsrcfd;
83+
size_twritten_len;
84+
85+
snprintf(srcpath,sizeof(srcpath),"%s/%s",datadir,path);
86+
87+
/* Open source file for reading */
88+
srcfd=open(srcpath,O_RDONLY |PG_BINARY,0);
89+
if (srcfd<0)
90+
pg_fatal("could not open source file \"%s\": %m",
91+
srcpath);
92+
93+
/* Truncate and open the target file for writing */
94+
open_target_file(path, true);
95+
96+
written_len=0;
97+
for (;;)
98+
{
99+
ssize_tread_len;
100+
101+
read_len=read(srcfd,buf.data,sizeof(buf));
102+
103+
if (read_len<0)
104+
pg_fatal("could not read file \"%s\": %m",srcpath);
105+
elseif (read_len==0)
106+
break;/* EOF reached */
107+
108+
write_target_range(buf.data,written_len,read_len);
109+
written_len+=read_len;
110+
}
111+
112+
/*
113+
* A local source is not expected to change while we're rewinding, so
114+
* check that the size of the file matches our earlier expectation.
115+
*/
116+
if (written_len!=len)
117+
pg_fatal("size of source file \"%s\" changed concurrently: "UINT64_FORMAT" bytes expected, "UINT64_FORMAT" copied",
118+
srcpath,len,written_len);
119+
120+
if (close(srcfd)!=0)
121+
pg_fatal("could not close file \"%s\": %m",srcpath);
122+
}
123+
68124
/*
69125
* Copy a file from source to target, starting at 'off', for 'len' bytes.
70126
*/
71127
staticvoid
72-
local_fetch_file_range(rewind_source*source,constchar*path,off_toff,
73-
size_tlen)
128+
local_queue_fetch_range(rewind_source*source,constchar*path,off_toff,
129+
size_tlen)
74130
{
75131
constchar*datadir= ((local_source*)source)->datadir;
76132
PGAlignedBlockbuf;
@@ -94,14 +150,14 @@ local_fetch_file_range(rewind_source *source, const char *path, off_t off,
94150
while (end-begin>0)
95151
{
96152
ssize_treadlen;
97-
size_tlen;
153+
size_tthislen;
98154

99155
if (end-begin>sizeof(buf))
100-
len=sizeof(buf);
156+
thislen=sizeof(buf);
101157
else
102-
len=end-begin;
158+
thislen=end-begin;
103159

104-
readlen=read(srcfd,buf.data,len);
160+
readlen=read(srcfd,buf.data,thislen);
105161

106162
if (readlen<0)
107163
pg_fatal("could not read file \"%s\": %m",srcpath);
@@ -120,7 +176,7 @@ static void
120176
local_finish_fetch(rewind_source*source)
121177
{
122178
/*
123-
* Nothing to do,local_fetch_file_range() copies the ranges immediately.
179+
* Nothing to do,local_queue_fetch_range() copies the ranges immediately.
124180
*/
125181
}
126182

‎src/bin/pg_rewind/pg_rewind.c

Lines changed: 1 addition & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -537,10 +537,7 @@ perform_rewind(filemap_t *filemap, rewind_source *source,
537537
break;
538538

539539
caseFILE_ACTION_COPY:
540-
/* Truncate the old file out of the way, if any */
541-
open_target_file(entry->path, true);
542-
source->queue_fetch_range(source,entry->path,
543-
0,entry->source_size);
540+
source->queue_fetch_file(source,entry->path,entry->source_size);
544541
break;
545542

546543
caseFILE_ACTION_TRUNCATE:

‎src/bin/pg_rewind/rewind_source.h

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,19 @@ typedef struct rewind_source
4747
void(*queue_fetch_range) (structrewind_source*,constchar*path,
4848
off_toffset,size_tlen);
4949

50+
/*
51+
* Like queue_fetch_range(), but requests replacing the whole local file
52+
* from the source system. 'len' is the expected length of the file,
53+
* although when the source is a live server, the file may change
54+
* concurrently. The implementation is not obliged to copy more than 'len'
55+
* bytes, even if the file is larger. However, to avoid copying a
56+
* truncated version of the file, which can cause trouble if e.g. a
57+
* configuration file is modified concurrently, the implementation should
58+
* try to copy the whole file, even if it's larger than expected.
59+
*/
60+
void(*queue_fetch_file) (structrewind_source*,constchar*path,
61+
size_tlen);
62+
5063
/*
5164
* Execute all requests queued up with queue_fetch_range().
5265
*/
Lines changed: 76 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,76 @@
1+
2+
# Copyright (c) 2021-2022, PostgreSQL Global Development Group
3+
4+
use strict;
5+
use warnings;
6+
use PostgreSQL::Test::Utils;
7+
use Test::More;
8+
9+
use FindBin;
10+
use lib$FindBin::RealBin;
11+
12+
use RewindTest;
13+
14+
RewindTest::setup_cluster("local");
15+
RewindTest::start_primary();
16+
17+
# Create a test table and insert a row in primary.
18+
primary_psql("CREATE TABLE tbl1 (d text)");
19+
primary_psql("INSERT INTO tbl1 VALUES ('in primary')");
20+
primary_psql("CHECKPOINT");
21+
22+
RewindTest::create_standby("local");
23+
24+
# Insert additional data on primary that will be replicated to standby
25+
primary_psql("INSERT INTO tbl1 values ('in primary, before promotion')");
26+
primary_psql('CHECKPOINT');
27+
28+
RewindTest::promote_standby();
29+
30+
# Insert a row in the old primary. This causes the primary and standby to have
31+
# "diverged", it's no longer possible to just apply the standy's logs over
32+
# primary directory - you need to rewind. Also insert a new row in the
33+
# standby, which won't be present in the old primary.
34+
primary_psql("INSERT INTO tbl1 VALUES ('in primary, after promotion')");
35+
standby_psql("INSERT INTO tbl1 VALUES ('in standby, after promotion')");
36+
37+
# Stop the nodes before running pg_rewind
38+
$node_standby->stop;
39+
$node_primary->stop;
40+
41+
my$primary_pgdata =$node_primary->data_dir;
42+
my$standby_pgdata =$node_standby->data_dir;
43+
44+
# Add an extra file that we can tamper with without interfering with the data
45+
# directory data files.
46+
mkdir"$standby_pgdata/tst_both_dir";
47+
append_to_file"$standby_pgdata/tst_both_dir/file1",'a';
48+
49+
# Run pg_rewind and pipe the output from the run into the extra file we want
50+
# to copy. This will ensure that the file is continously growing during the
51+
# copy operation and the result will be an error.
52+
my$ret = run_log(
53+
[
54+
'pg_rewind','--debug',
55+
'--source-pgdata',$standby_pgdata,
56+
'--target-pgdata',$primary_pgdata,
57+
'--no-sync',
58+
],
59+
'2>>',"$standby_pgdata/tst_both_dir/file1");
60+
ok(!$ret,'Error out on copying growing file');
61+
62+
# Ensure that the files are of different size, the final error message should
63+
# only be in one of them making them guaranteed to be different
64+
my$primary_size =-s"$primary_pgdata/tst_both_dir/file1";
65+
my$standby_size =-s"$standby_pgdata/tst_both_dir/file1";
66+
isnt($standby_size,$primary_size,"File sizes should differ");
67+
68+
# Extract the last line from the verbose output as that should have the error
69+
# message for the unexpected file size
70+
my$last;
71+
openmy$f,'<',"$standby_pgdata/tst_both_dir/file1";
72+
$last =$_while (<$f>);
73+
close$f;
74+
like($last,qr/fatal: size of source file/,"Check error message");
75+
76+
done_testing();

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp