Commit e5d494d

Don't lose walreceiver start requests due to race condition in postmaster.
When a walreceiver dies, the startup process will notice that and send
a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new
walreceiver to be launched. There's a race condition, which at least
in HEAD is very easy to hit, whereby the postmaster might see that
signal before it processes the SIGCHLD from the walreceiver process.
In that situation, sigusr1_handler() just dropped the start request
on the floor, reasoning that it must be redundant. Eventually, after
10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a
fresh request --- but that's a long time if the connection could have
been re-established almost immediately.

Fix it by setting a state flag inside the postmaster that we won't
clear until we do launch a walreceiver. In cases where that results
in an extra walreceiver launch, it's up to the walreceiver to realize
it's unwanted and go away --- but we have, and need, that logic anyway
for the opposite race case.

I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there in test cases where
a master server is stopped and restarted while leaving streaming
slaves active.

This logic has been broken all along, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us
1 parent ad1b5c8, commit e5d494d

1 file changed, +32 -7 lines changed

src/backend/postmaster/postmaster.c

Lines changed: 32 additions & 7 deletions
@@ -357,6 +357,9 @@ static volatile sig_atomic_t start_autovac_launcher = false;
 /* the launcher needs to be signalled to communicate some condition */
 static volatile bool avlauncher_needs_signal = false;
 
+/* received START_WALRECEIVER signal */
+static volatile sig_atomic_t WalReceiverRequested = false;
+
 /* set when there's a worker that needs to be started up */
 static volatile bool StartWorkerNeeded = true;
 static volatile bool HaveCrashedWorker = false;
@@ -426,6 +429,7 @@ static void maybe_start_bgworkers(void);
 static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
 static pid_t StartChildProcess(AuxProcType type);
 static void StartAutovacuumWorker(void);
+static void MaybeStartWalReceiver(void);
 static void InitPostmasterDeathWatchHandle(void);
 
 /*
@@ -1810,6 +1814,10 @@ ServerLoop(void)
 				kill(AutoVacPID, SIGUSR2);
 		}
 
+		/* If we need to start a WAL receiver, try to do that now */
+		if (WalReceiverRequested)
+			MaybeStartWalReceiver();
+
 		/* Get other worker processes running, if needed */
 		if (StartWorkerNeeded || HaveCrashedWorker)
 			maybe_start_bgworkers();
@@ -2958,7 +2966,8 @@ reaper(SIGNAL_ARGS)
 		/*
 		 * Was it the wal receiver?  If exit status is zero (normal) or one
 		 * (FATAL exit), we assume everything is all right just like normal
-		 * backends.
+		 * backends.  (If we need a new wal receiver, we'll start one at the
+		 * next iteration of the postmaster's main loop.)
 		 */
 		if (pid == WalReceiverPID)
 		{
@@ -5066,14 +5075,12 @@ sigusr1_handler(SIGNAL_ARGS)
 		StartAutovacuumWorker();
 	}
 
-	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
-		WalReceiverPID == 0 &&
-		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
-		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
-		Shutdown == NoShutdown)
+	if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER))
 	{
 		/* Startup Process wants us to start the walreceiver process. */
-		WalReceiverPID = StartWalReceiver();
+		/* Start immediately if possible, else remember request for later. */
+		WalReceiverRequested = true;
+		MaybeStartWalReceiver();
 	}
 
 	if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
@@ -5409,6 +5416,24 @@ StartAutovacuumWorker(void)
 	}
 }
 
+/*
+ * MaybeStartWalReceiver
+ *		Start the WAL receiver process, if not running and our state allows.
+ */
+static void
+MaybeStartWalReceiver(void)
+{
+	if (WalReceiverPID == 0 &&
+		(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+		 pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+		Shutdown == NoShutdown)
+	{
+		WalReceiverPID = StartWalReceiver();
+		WalReceiverRequested = false;
+	}
+}
+
+
 /*
  * Create the opts file
  */
