Commit cb59949

Don't lose walreceiver start requests due to race condition in postmaster.
When a walreceiver dies, the startup process will notice that and send a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new walreceiver to be launched. There's a race condition, which at least in HEAD is very easy to hit, whereby the postmaster might see that signal before it processes the SIGCHLD from the walreceiver process. In that situation, sigusr1_handler() just dropped the start request on the floor, reasoning that it must be redundant. Eventually, after 10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a fresh request --- but that's a long time if the connection could have been re-established almost immediately.

Fix it by setting a state flag inside the postmaster that we won't clear until we do launch a walreceiver. In cases where that results in an extra walreceiver launch, it's up to the walreceiver to realize it's unwanted and go away --- but we have, and need, that logic anyway for the opposite race case.

I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there in test cases where a master server is stopped and restarted while leaving streaming slaves active.

This logic has been broken all along, so back-patch to all supported branches.

Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us

1 parent 456bf26, commit cb59949

1 file changed: 32 additions and 7 deletions

src/backend/postmaster/postmaster.c

@@ -370,6 +370,9 @@ static volatile sig_atomic_t start_autovac_launcher = false;
 /* the launcher needs to be signalled to communicate some condition */
 static volatile bool avlauncher_needs_signal = false;
 
+/* received START_WALRECEIVER signal */
+static volatile sig_atomic_t WalReceiverRequested = false;
+
 /* set when there's a worker that needs to be started up */
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
@@ -443,6 +446,7 @@ static void maybe_start_bgworker(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(AuxProcType type);
 static void StartAutovacuumWorker(void);
+static void MaybeStartWalReceiver(void);
 static void InitPostmasterDeathWatchHandle(void);
 
 #ifdef EXEC_BACKEND
@@ -1766,6 +1770,10 @@ ServerLoop(void)
 				kill(AutoVacPID, SIGUSR2);
 		}
 
+		/* If we need to start a WAL receiver, try to do that now */
+		if (WalReceiverRequested)
+			MaybeStartWalReceiver();
+
 		/* Get other worker processes running, if needed */
 		if (StartWorkerNeeded || HaveCrashedWorker)
 			maybe_start_bgworker();
@@ -2848,7 +2856,8 @@ reaper(SIGNAL_ARGS)
 		/*
 		 * Was it the wal receiver?  If exit status is zero (normal) or one
 		 * (FATAL exit), we assume everything is all right just like normal
-		 * backends.
+		 * backends.  (If we need a new wal receiver, we'll start one at the
+		 * next iteration of the postmaster's main loop.)
 		 */
 		if (pid == WalReceiverPID)
 		{
@@ -4896,14 +4905,12 @@ sigusr1_handler(SIGNAL_ARGS)
 		StartAutovacuumWorker();
 	}
 
-	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
-		WalReceiverPID == 0 &&
-		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-		Shutdown == NoShutdown)
+	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER))
 	{
 		/* Startup Process wants us to start the walreceiver process. */
-		WalReceiverPID = StartWalReceiver();
+		/* Start immediately if possible, else remember request for later. */
+		WalReceiverRequested = true;
+		MaybeStartWalReceiver();
 	}
 
 	if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
@@ -5279,6 +5286,24 @@ StartAutovacuumWorker(void)
 	}
 }
 
+/*
+ * MaybeStartWalReceiver
+ *		Start the WAL receiver process, if not running and our state allows.
+ */
+static void
+MaybeStartWalReceiver(void)
+{
+	if (WalReceiverPID == 0 &&
+		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+		Shutdown == NoShutdown)
+	{
+		WalReceiverPID = StartWalReceiver();
+		WalReceiverRequested = false;
+	}
+}
+
+
 /*
  * Create the opts file
  */
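
The new flag is declared `volatile sig_atomic_t` because it is written from a signal handler and read from the main loop. As a generic illustration of that pattern with plain POSIX signals, here is a minimal sketch. It is not how the postmaster is wired: in the diff above the handler also calls MaybeStartWalReceiver() directly, whereas this sketch only sets the flag in the handler and does all the work in ordinary code. The names `request_pending`, `install_handler`, `service_pending_request` and `demo_deferred_start` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the signal handler, cleared only once the request is honored. */
static volatile sig_atomic_t request_pending = 0;

/* Counts simulated launches; only touched outside the signal handler. */
static int	launches = 0;

/* Handler does the minimum: it records that a request arrived. */
static void
on_sigusr1(int signo)
{
	(void) signo;
	request_pending = 1;
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on failure. */
int
install_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Called from the main loop.  The request is kept until conditions allow
 * the launch, so it cannot be lost by arriving at an awkward moment.
 * Returns the number of launches so far.
 */
int
service_pending_request(bool can_start)
{
	if (request_pending && can_start)
	{
		request_pending = 0;
		launches++;				/* stands in for starting a child process */
	}
	return launches;
}

/* Walk through the pattern once; returns the final launch count. */
int
demo_deferred_start(void)
{
	int			rc = install_handler();

	assert(rc == 0);
	(void) rc;

	raise(SIGUSR1);					/* handler runs before raise() returns */
	service_pending_request(false);	/* not startable yet: request is kept */
	service_pending_request(true);	/* startable now: exactly one launch */
	return service_pending_request(true);	/* nothing pending any more */
}
```

The design point mirrored here is that the flag is cleared only where the launch actually happens, which is the same rule the patch applies to WalReceiverRequested.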
