Commit a4d1ce0

Don't lose walreceiver start requests due to race condition in postmaster.
When a walreceiver dies, the startup process will notice that and send a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new walreceiver to be launched. There's a race condition, which at least in HEAD is very easy to hit, whereby the postmaster might see that signal before it processes the SIGCHLD from the walreceiver process. In that situation, sigusr1_handler() just dropped the start request on the floor, reasoning that it must be redundant. Eventually, after 10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a fresh request --- but that's a long time if the connection could have been re-established almost immediately.

Fix it by setting a state flag inside the postmaster that we won't clear until we do launch a walreceiver. In cases where that results in an extra walreceiver launch, it's up to the walreceiver to realize it's unwanted and go away --- but we have, and need, that logic anyway for the opposite race case.

I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there in test cases where a master server is stopped and restarted while leaving streaming slaves active.

This logic has been broken all along, so back-patch to all supported branches.

Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us
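
The race and the fix described above can be replayed in a small standalone model. This is only a sketch under stated assumptions, not postmaster code: `ToyPostmaster` and every function name below are hypothetical stand-ins, and real signal delivery is replaced by plain function calls made in the problematic order (start request first, child-exit notification second).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant postmaster state. */
typedef struct
{
    int  receiver_pid;      /* 0 means "no walreceiver running" */
    bool start_requested;   /* latched request, like WalReceiverRequested */
    int  launches;          /* how many receivers this toy has started */
} ToyPostmaster;

/* Launch a receiver only if none is running; clear the latch on success. */
static void
maybe_start(ToyPostmaster *pm)
{
    if (pm->receiver_pid == 0)
    {
        pm->receiver_pid = 1000 + pm->launches;   /* made-up PID */
        pm->launches++;
        pm->start_requested = false;
    }
}

/* Old behavior: a request that arrives while the PID is still set is dropped. */
static void
on_start_signal_old(ToyPostmaster *pm)
{
    if (pm->receiver_pid == 0)
        maybe_start(pm);
}

/* New behavior: remember the request, then start immediately if possible. */
static void
on_start_signal_new(ToyPostmaster *pm)
{
    pm->start_requested = true;
    maybe_start(pm);
}

/* Stand-in for reaping the dead receiver (the SIGCHLD side). */
static void
on_child_exit(ToyPostmaster *pm)
{
    pm->receiver_pid = 0;
}

/* Stand-in for one pass of the main loop, which retries a pending request. */
static void
main_loop_iteration(ToyPostmaster *pm)
{
    if (pm->start_requested)
        maybe_start(pm);
}

/*
 * Replay the racy ordering: the start request is seen before the child
 * exit is processed.  Returns true if a receiver is running afterwards.
 */
bool
receiver_restarted_after_race(bool latch_request)
{
    ToyPostmaster pm = {.receiver_pid = 999, .start_requested = false, .launches = 0};

    if (latch_request)
        on_start_signal_new(&pm);
    else
        on_start_signal_old(&pm);

    on_child_exit(&pm);
    main_loop_iteration(&pm);

    return pm.receiver_pid != 0;
}
```

In this model, `receiver_restarted_after_race(false)` leaves no receiver running, because the request was discarded while the old PID was still recorded, whereas `receiver_restarted_after_race(true)` starts one on the next loop pass. The model leaves out the 10-second retry by the startup process and the case where an extra receiver has to notice it is unwanted.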
1 parent f6af9c7, commit a4d1ce0

1 file changed: +32 -7 lines changed
‎src/backend/postmaster/postmaster.c

Lines changed: 32 additions & 7 deletions
@@ -354,6 +354,9 @@ static volatile sig_atomic_t start_autovac_launcher = false;
 /* the launcher needs to be signalled to communicate some condition */
 static volatile bool avlauncher_needs_signal = false;
 
+/* received START_WALRECEIVER signal */
+static volatile sig_atomic_t WalReceiverRequested = false;
+
 /* set when there's a worker that needs to be started up */
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
@@ -417,6 +420,7 @@ static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(AuxProcType type);
 static void StartAutovacuumWorker(void);
+static void MaybeStartWalReceiver(void);
 static void InitPostmasterDeathWatchHandle(void);
 
 /*
@@ -1783,6 +1787,10 @@ ServerLoop(void)
             kill(AutoVacPID, SIGUSR2);
         }
 
+        /* If we need to start a WAL receiver, try to do that now */
+        if (WalReceiverRequested)
+            MaybeStartWalReceiver();
+
         /* Get other worker processes running, if needed */
         if (StartWorkerNeeded || HaveCrashedWorker)
             maybe_start_bgworkers();
@@ -2923,7 +2931,8 @@ reaper(SIGNAL_ARGS)
         /*
          * Was it the wal receiver? If exit status is zero (normal) or one
          * (FATAL exit), we assume everything is all right just like normal
-         * backends.
+         * backends. (If we need a new wal receiver, we'll start one at the
+         * next iteration of the postmaster's main loop.)
          */
         if (pid == WalReceiverPID)
         {
@@ -5011,14 +5020,12 @@ sigusr1_handler(SIGNAL_ARGS)
         StartAutovacuumWorker();
     }
 
-    if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
-        WalReceiverPID == 0 &&
-        (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-        Shutdown == NoShutdown)
+    if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER))
     {
         /* Startup Process wants us to start the walreceiver process. */
-        WalReceiverPID = StartWalReceiver();
+        /* Start immediately if possible, else remember request for later. */
+        WalReceiverRequested = true;
+        MaybeStartWalReceiver();
     }
 
     if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
@@ -5369,6 +5376,24 @@ StartAutovacuumWorker(void)
     }
 }
 
+/*
+ * MaybeStartWalReceiver
+ *      Start the WAL receiver process, if not running and our state allows.
+ */
+static void
+MaybeStartWalReceiver(void)
+{
+    if (WalReceiverPID == 0 &&
+        (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+         pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+        Shutdown == NoShutdown)
+    {
+        WalReceiverPID = StartWalReceiver();
+        WalReceiverRequested = false;
+    }
+}
+
+
 /*
  * Create the opts file
  */
