
Commit ee32782


Fix postmaster state machine to handle dead_end child crashes better.

A report from Alvaro Herrera shows that if we're in PM_STARTUP state, and we spawn a dead_end child to reject some incoming connection request, and that child dies with an unexpected exit code, the postmaster does not respond well. We correctly send SIGQUIT to the startup process, but then:

* if the startup process exits with nonzero exit code, as expected, we thought that that indicated a crash and aborted startup.

* if the startup process exits with zero exit code, which is possible due to the inherent race condition, we'd advance to PM_RUN state which is fine --- but the code forgot that AbortStartTime would be nonzero in this situation. We'd either die on the Asserts saying that it was zero, or perhaps misbehave later on. (A quick look suggests that the only misbehavior might be busy-waiting due to DetermineSleepTime doing the wrong thing.)

To fix the first point, adjust the state-machine logic to recognize that a nonzero exit code is expected after sending SIGQUIT, and have it transition to a state where we can restart the startup process. To fix the second point, change the Asserts to clear the variable rather than just claiming it should be clear already.

Perhaps we could improve this further by not treating a crash of a dead_end child as a reason for panic'ing the database. However, since those child processes are connected to shared memory, that seems a bit risky. There are few good reasons for a dead_end child to report failure anyway (the cause of this in Alvaro's report is quite unclear). On balance, therefore, a minimal fix seems best.

This is an oversight in commit 45811be. While that was back-patched, I'm hesitant to back-patch this change. The lack of reasons for a dead_end child to fail suggests that the case should be very rare in the field, which squares with the lack of reports; so it seems like this might not be worth the risk of introducing new issues. In any case we can let it bake awhile in HEAD before considering a back-patch.

Discussion: https://postgr.es/m/20190615160950.GA31378@alvherre.pgsql
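The two startup-process exit cases described in the commit message can be sketched as a small standalone C model. This is an illustrative simplification, not the code in postmaster.c: `ModelState` and `model_startup_exit` are hypothetical names, the enums are cut-down stand-ins for the postmaster's own, and shutdown handling, logging, and the call to HandleChildCrash are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cut-down stand-ins for the postmaster's state enums (assumption). */
typedef enum { PM_STARTUP, PM_WAIT_BACKENDS, PM_RUN } PMState;
typedef enum
{
    STARTUP_NOT_RUNNING,
    STARTUP_RUNNING,
    STARTUP_SIGNALED,           /* we sent it SIGQUIT */
    STARTUP_CRASHED
} StartupStatusEnum;

/* Hypothetical container for the three variables the fix touches. */
typedef struct
{
    PMState           pmState;
    StartupStatusEnum startupStatus;
    long              abortStartTime;   /* nonzero once SIGQUIT was sent */
} ModelState;

/*
 * Model of what happens when the startup process exits.
 * Returns false if the postmaster would give up and exit,
 * true if it keeps running.
 */
bool
model_startup_exit(ModelState *st, bool exited_zero)
{
    assert(st != NULL);

    /* Unexpected failure during PM_STARTUP that we did not cause: give up. */
    if (st->pmState == PM_STARTUP &&
        st->startupStatus != STARTUP_SIGNALED &&
        !exited_zero)
        return false;

    if (!exited_zero)
    {
        if (st->startupStatus == STARTUP_SIGNALED)
        {
            /* Death by our own SIGQUIT: arrange for a restart. */
            st->startupStatus = STARTUP_NOT_RUNNING;
            if (st->pmState == PM_STARTUP)
                st->pmState = PM_WAIT_BACKENDS;
        }
        else
            st->startupStatus = STARTUP_CRASHED;
        return true;
    }

    /* Startup finished normally, possibly because SIGQUIT arrived too late. */
    st->startupStatus = STARTUP_NOT_RUNNING;
    st->abortStartTime = 0;     /* clear it rather than assert it is clear */
    st->pmState = PM_RUN;
    return true;
}
```

In this model, a state of `{PM_STARTUP, STARTUP_SIGNALED, 100}` followed by a nonzero exit moves to `PM_WAIT_BACKENDS` instead of aborting, while a zero exit from the same state reaches `PM_RUN` with the abort timestamp reset.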

1 parent 348778d, commit ee32782

1 file changed: +19 -4 lines changed


`src/backend/postmaster/postmaster.c`

Lines changed: 19 additions & 4 deletions

`@@ -2920,7 +2920,9 @@ reaper(SIGNAL_ARGS)`
`  * during PM_STARTUP is treated as catastrophic. There are no`
`  * other processes running yet, so we can just exit.`
`  */`
`-if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus))`
`+if (pmState == PM_STARTUP &&`
`+    StartupStatus != STARTUP_SIGNALED &&`
`+    !EXIT_STATUS_0(exitstatus))`
` {`
`     LogChildExit(LOG, _("startup process"),`
`                  pid, exitstatus);`
`@@ -2937,11 +2939,24 @@ reaper(SIGNAL_ARGS)`
`  * then we previously sent the startup process a SIGQUIT; so`
`  * that's probably the reason it died, and we do want to try to`
`  * restart in that case.`
`+ *`
`+ * This stanza also handles the case where we sent a SIGQUIT`
`+ * during PM_STARTUP due to some dead_end child crashing: in that`
`+ * situation, if the startup process dies on the SIGQUIT, we need`
`+ * to transition to PM_WAIT_BACKENDS state which will allow`
`+ * PostmasterStateMachine to restart the startup process. (On the`
`+ * other hand, the startup process might complete normally, if we`
`+ * were too late with the SIGQUIT. In that case we'll fall`
`+ * through and commence normal operations.)`
`  */`
` if (!EXIT_STATUS_0(exitstatus))`
` {`
`     if (StartupStatus == STARTUP_SIGNALED)`
`+    {`
`         StartupStatus = STARTUP_NOT_RUNNING;`
`+        if (pmState == PM_STARTUP)`
`+            pmState = PM_WAIT_BACKENDS;`
`+    }`
`     else`
`         StartupStatus = STARTUP_CRASHED;`
`     HandleChildCrash(pid, exitstatus,`
`@@ -2954,7 +2969,7 @@ reaper(SIGNAL_ARGS)`
`  */`
` StartupStatus = STARTUP_NOT_RUNNING;`
` FatalError = false;`
`-Assert(AbortStartTime == 0);`
`+AbortStartTime = 0;`
` ReachedNormalRunning = true;`
` pmState = PM_RUN;`
` `
`@@ -3504,7 +3519,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)`
` if (pid == StartupPID)`
` {`
`     StartupPID = 0;`
`-    StartupStatus = STARTUP_CRASHED;`
`+    /* Caller adjusts StartupStatus, so don't touch it here */`
` }`
` else if (StartupPID != 0 && take_action)`
` {`
`@@ -5100,7 +5115,7 @@ sigusr1_handler(SIGNAL_ARGS)`
` {`
`     /* WAL redo has started. We're out of reinitialization. */`
`     FatalError = false;`
`-    Assert(AbortStartTime == 0);`
`+    AbortStartTime = 0;`
` `
`     /*`
`      * Crank up the background tasks. It doesn't matter if this fails,`
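The two hunks that turn the AbortStartTime Assert into an assignment relate to the busy-waiting the commit message mentions. A rough sketch of why a stale nonzero value could cause that, assuming the sleep interval is computed as a grace period minus the time elapsed since AbortStartTime and clamped at zero: `model_sleep_secs` is a hypothetical function, and the 5-second grace period and 60-second idle sleep are illustrative values rather than constants confirmed by this diff.

```c
#include <assert.h>

/* Illustrative values; the diff itself does not show these constants. */
#define MODEL_GRACE_SECS 5
#define MODEL_IDLE_SECS  60

/*
 * Hypothetical model of a sleep-time computation keyed on an
 * "abort start time" (0 means no abort countdown is pending).
 */
long
model_sleep_secs(long abort_start_time, long now)
{
    assert(now >= abort_start_time);

    if (abort_start_time != 0)
    {
        /* Time left in the grace period, clamped so it never goes negative. */
        long left = MODEL_GRACE_SECS - (now - abort_start_time);

        return left > 0 ? left : 0;
    }

    /* Nothing pending: sleep for the normal idle interval. */
    return MODEL_IDLE_SECS;
}
```

Under these assumptions, a start time of 1000 that is never cleared yields a sleep of 0 once the clock reaches 1005 or later, so a loop that sleeps for this value would spin. Resetting the variable to 0 returns the model to the 60-second idle sleep.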
