Commit 6d8727f

Ensure cleanup of orphan archive status files

When a WAL segment is recycled, its ".ready" and ".done" status files
get also automatically removed, however this is not done in a durable
manner. Hence, in a subsequent crash, it could be possible that a
".ready" status file is still around with its corresponding segment
already gone.

If the backend reaches such a state, the archive command would most
likely complain about a segment non-existing and would keep retrying,
causing WAL segments to bloat pg_wal/, potentially making Postgres crash
hard when running out of space.

As status files are removed after each individual segment, using
durable_unlink() does not completely close the window either, as a crash
could happen between the moment the WAL segment is recycled and the
moment its status files are removed. This has also some performance
impact with the additional fsync() calls needed to make the removal in a
durable manner. Doing the cleanup at recovery is not cost-free either
as this makes crash recovery potentially take longer than necessary.

So, instead, as per an idea of Stephen Frost, make the archiver aware of
orphan status files and remove them on-the-fly if the corresponding
segment goes missing. Removal failures follow a model close to what
happens for WAL segments, where multiple attempts are done before giving
up temporarily, and where a successful orphan removal makes the archiver
move immediately to the next WAL segment thought as ready to be
archived.

Author: Michael Paquier
Reviewed-by: Nathan Bossart, Andres Freund, Stephen Frost, Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180928032827.GF1500@paquier.xyz
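
The approach described in the commit message can be sketched outside of PostgreSQL as a small standalone C function. This is only an illustration under stated assumptions, not the code from the commit: the names `cleanup_orphan_status`, `orphan_result` and `SKETCH_ORPHAN_RETRIES`, as well as the `retry_delay_sec` parameter, are hypothetical, and the sketch reports its outcome through a return value instead of `ereport()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical limit, mirroring the retry count of 3 used in the commit. */
#define SKETCH_ORPHAN_RETRIES 3

typedef enum
{
    SEGMENT_PRESENT,    /* segment exists, status file left alone */
    ORPHAN_REMOVED,     /* segment missing, status file unlinked */
    ORPHAN_GAVE_UP      /* segment missing, unlink kept failing */
} orphan_result;

/*
 * If the segment is missing, try to remove its status file, retrying a
 * bounded number of times with a pause between attempts.
 */
orphan_result
cleanup_orphan_status(const char *segment_path, const char *ready_path,
                      unsigned int retry_delay_sec)
{
    struct stat st;
    int         failures = 0;

    for (;;)
    {
        /* Only a definite "no such file" marks the status file as orphan. */
        if (!(stat(segment_path, &st) != 0 && errno == ENOENT))
            return SEGMENT_PRESENT;

        /* Plain unlink(): after a crash the same removal is simply redone. */
        if (unlink(ready_path) == 0)
            return ORPHAN_REMOVED;

        /* Give up for now once the retry budget is exhausted. */
        if (++failures >= SKETCH_ORPHAN_RETRIES)
            return ORPHAN_GAVE_UP;

        /* Wait a bit before retrying. */
        sleep(retry_delay_sec);
    }
}
```

A caller would treat `ORPHAN_REMOVED` as "move on to the next status file" and `ORPHAN_GAVE_UP` as "stop this archiving cycle and try again later", which matches the behaviour the commit message describes.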
1 parent 7fee252 commit 6d8727f

1 file changed: +55 -0 lines changed

src/backend/postmaster/pgarch.c

Lines changed: 55 additions & 0 deletions
@@ -28,6 +28,7 @@
 #include <fcntl.h>
 #include <signal.h>
 #include <time.h>
+#include <sys/stat.h>
 #include <sys/time.h>
 #include <sys/wait.h>
 #include <unistd.h>
@@ -59,8 +60,18 @@
 #define PGARCH_RESTART_INTERVAL 10  /* How often to attempt to restart a
                                      * failed archiver; in seconds. */
 
+/*
+ * Maximum number of retries allowed when attempting to archive a WAL
+ * file.
+ */
 #define NUM_ARCHIVE_RETRIES 3
 
+/*
+ * Maximum number of retries allowed when attempting to remove an
+ * orphan archive status file.
+ */
+#define NUM_ORPHAN_CLEANUP_RETRIES 3
+
 
 /* ----------
  * Local data
@@ -424,9 +435,13 @@ pgarch_ArchiverCopyLoop(void)
     while (pgarch_readyXlog(xlog))
     {
         int         failures = 0;
+        int         failures_orphan = 0;
 
         for (;;)
         {
+            struct stat stat_buf;
+            char        pathname[MAXPGPATH];
+
             /*
              * Do not initiate any more archive commands after receiving
              * SIGTERM, nor after the postmaster has died unexpectedly. The
@@ -456,6 +471,46 @@ pgarch_ArchiverCopyLoop(void)
                 return;
             }
 
+            /*
+             * Since archive status files are not removed in a durable manner,
+             * a system crash could leave behind .ready files for WAL segments
+             * that have already been recycled or removed. In this case,
+             * simply remove the orphan status file and move on. unlink() is
+             * used here as even on subsequent crashes the same orphan files
+             * would get removed, so there is no need to worry about
+             * durability.
+             */
+            snprintf(pathname, MAXPGPATH, XLOGDIR "/%s", xlog);
+            if (stat(pathname, &stat_buf) != 0 && errno == ENOENT)
+            {
+                char        xlogready[MAXPGPATH];
+
+                StatusFilePath(xlogready, xlog, ".ready");
+                if (unlink(xlogready) == 0)
+                {
+                    ereport(WARNING,
+                            (errmsg("removed orphan archive status file \"%s\"",
+                                    xlogready)));
+
+                    /* leave loop and move to the next status file */
+                    break;
+                }
+
+                if (++failures_orphan >= NUM_ORPHAN_CLEANUP_RETRIES)
+                {
+                    ereport(WARNING,
+                            (errmsg("removal of orphan archive status file \"%s\" failed too many times, will try again later",
+                                    xlogready)));
+
+                    /* give up cleanup of orphan status files */
+                    return;
+                }
+
+                /* wait a bit before retrying */
+                pg_usleep(1000000L);
+                continue;
+            }
+
             if (pgarch_archiveXlog(xlog))
             {
                 /* successful */
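
For contrast with the plain unlink() used in the hunk above, here is a rough sketch of what a durable removal involves: unlinking the file and then fsync()ing its parent directory, which is the extra cost the commit message mentions. It assumes a POSIX system where a directory can be opened read-only and passed to fsync(). The helper name `sketch_durable_unlink` and the 1024-byte path buffer are hypothetical; this is not PostgreSQL's durable_unlink() implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: remove a file, then fsync() its parent directory
 * so that the removal itself survives a crash.
 * Returns 0 on success and -1 on failure.
 */
int
sketch_durable_unlink(const char *path)
{
    char        buf[1024];
    char       *dir;
    int         fd;
    int         rc;

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(path) >= sizeof(buf))
        return -1;
    strcpy(buf, path);

    if (unlink(path) != 0)
        return -1;

    /* Flush the directory entry change to stable storage. */
    dir = dirname(buf);
    fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);

    return (rc == 0) ? 0 : -1;
}
```

Each call pays for one directory fsync(), and a crash between recycling the segment and this call would still leave an orphan behind, which is why the commit opts for on-the-fly cleanup in the archiver instead.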
