Commit 29722d7

In immediate shutdown, postmaster should not exit till children are gone.
This adjusts commit 82233ce so that the postmaster does not exit until all its child processes have exited, even if the 5-second timeout elapses and we have to send SIGKILL. There is no great value in having the postmaster process quit sooner, and doing so can mislead onlookers into thinking that the cluster is fully terminated when actually some child processes still survive.

This effect might explain recent test failures on buildfarm member hamster, wherein we failed to restart a cluster just after shutting it down with "pg_ctl stop -m immediate".

I also did a bit of code review/beautification, including fixing a faulty use of the Max() macro on a volatile expression.

Back-patch to 9.4. In older branches, the postmaster never waited for children to exit during immediate shutdowns, and changing that would be too much of a behavioral change.
1 parent cf73376, commit 29722d7
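
The behaviour described in the commit message can be sketched outside PostgreSQL as a small C helper. This is an illustration under stated assumptions, not the postmaster's code: `stop_children` and its parameters are hypothetical names, and polling with `waitpid(..., WNOHANG)` stands in for the postmaster's own child bookkeeping. What it shows is the ordering: SIGQUIT first, SIGKILL once the grace period lapses, and no return until every child has been reaped.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): stop n child processes.
 * Sends SIGQUIT to all, escalates to SIGKILL after grace_secs, and returns
 * only once every child has been reaped. Reaped slots in pids[] are set
 * to 0. Returns the number of children that needed SIGKILL.
 */
static int
stop_children(pid_t *pids, int n, int grace_secs)
{
    time_t      start = time(NULL);
    int         remaining = n;
    int         killed = 0;
    bool        escalated = false;
    struct timespec nap = {0, 100 * 1000 * 1000};   /* 100 ms poll interval */

    assert(n >= 0 && grace_secs >= 0);

    for (int i = 0; i < n; i++)
        if (pids[i] > 0)
            kill(pids[i], SIGQUIT);

    while (remaining > 0)
    {
        /* Reap whichever children have already exited. */
        for (int i = 0; i < n; i++)
        {
            if (pids[i] > 0 && waitpid(pids[i], NULL, WNOHANG) != 0)
            {
                pids[i] = 0;        /* exited (or not ours): stop tracking */
                remaining--;
            }
        }
        if (remaining == 0)
            break;

        /* Grace period over: escalate once, then keep waiting. */
        if (!escalated && time(NULL) - start >= grace_secs)
        {
            for (int i = 0; i < n; i++)
            {
                if (pids[i] > 0)
                {
                    kill(pids[i], SIGKILL);
                    killed++;
                }
            }
            escalated = true;       /* do not return yet: children may linger */
        }
        nanosleep(&nap, NULL);
    }
    return killed;
}
```

A caller would fill `pids` with the process IDs returned by `fork()` and pass the grace period in seconds (5 in the commit). Returning only after the final reap is what lets an observer treat the parent's exit as proof that no child survives.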

2 files changed: 17 additions and 19 deletions

doc/src/sgml/runtime.sgml

Lines changed: 5 additions & 4 deletions

@@ -1426,10 +1426,11 @@ $ <userinput>sysctl -w vm.nr_hugepages=3170</userinput>
 <para>
 This is the <firstterm>Immediate Shutdown</firstterm> mode.
 The server will send <systemitem>SIGQUIT</systemitem> to all child
-processes and wait for them to terminate. Those that don't terminate
-within 5 seconds, will be sent <systemitem>SIGKILL</systemitem> by the
-master <command>postgres</command> process, which will then terminate
-without further waiting. This will lead to recovery (by
+processes and wait for them to terminate. If any do not terminate
+within 5 seconds, they will be sent <systemitem>SIGKILL</systemitem>.
+The master server process exits as soon as all child processes have
+exited, without doing normal database shutdown processing.
+This will lead to recovery (by
 replaying the WAL log) upon next start-up. This is recommended
 only in emergencies.
 </para>

src/backend/postmaster/postmaster.c

Lines changed: 12 additions & 15 deletions

@@ -324,8 +324,10 @@ typedef enum
 
 static PMState pmState = PM_INIT;
 
-/* Start time of abort processing at immediate shutdown or child crash */
-static time_t AbortStartTime;
+/* Start time of SIGKILL timeout during immediate shutdown or child crash */
+/* Zero means timeout is not running */
+static time_t AbortStartTime = 0;
+/* Length of said timeout */
 #define SIGKILL_CHILDREN_AFTER_SECS 5
 
 static bool ReachedNormalRunning = false; /* T if we've reached PM_RUN */
@@ -1411,7 +1413,8 @@ checkDataDir(void)
  * In normal conditions we wait at most one minute, to ensure that the other
  * background tasks handled by ServerLoop get done even when no requests are
  * arriving. However, if there are background workers waiting to be started,
- * we don't actually sleep so that they are quickly serviced.
+ * we don't actually sleep so that they are quickly serviced. Other exception
+ * cases are as shown in the code.
  */
 static void
 DetermineSleepTime(struct timeval * timeout)
@@ -1425,11 +1428,12 @@ DetermineSleepTime(struct timeval * timeout)
     if (Shutdown > NoShutdown ||
         (!StartWorkerNeeded && !HaveCrashedWorker))
     {
-        if (AbortStartTime > 0)
+        if (AbortStartTime != 0)
         {
             /* time left to abort; clamp to 0 in case it already expired */
-            timeout->tv_sec = Max(SIGKILL_CHILDREN_AFTER_SECS -
-                                  (time(NULL) - AbortStartTime), 0);
+            timeout->tv_sec = SIGKILL_CHILDREN_AFTER_SECS -
+                (time(NULL) - AbortStartTime);
+            timeout->tv_sec = Max(timeout->tv_sec, 0);
             timeout->tv_usec = 0;
         }
         else
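
The commit message calls the old clamp a faulty use of Max() on a volatile expression. A minimal sketch of why, assuming Max() is the usual textual macro `((x) > (y) ? (x) : (y))`: an argument containing a call such as `time(NULL)` is expanded twice, so the value that gets compared and the value that gets returned can come from two different clock readings. `fake_time`, `remaining_macro` and `remaining_safe` are hypothetical names used only for this demonstration; `fake_time` stands in for `time(NULL)` and advances one second per call to make the effect deterministic.

```c
#include <assert.h>

/* Assumed definition: a plain function-like macro that repeats its arguments. */
#define Max(x, y) ((x) > (y) ? (x) : (y))

static int  calls = 0;          /* how many times the clock was read */
static long fake_clock = 100;   /* pretend "seconds since epoch" */

/* Stand-in for time(NULL): counts calls and ticks one second per call. */
static long
fake_time(void)
{
    calls++;
    return fake_clock++;
}

/* Old pattern: the clock expression sits inside the macro argument. */
static long
remaining_macro(long start, long limit)
{
    return Max(limit - (fake_time() - start), 0);
}

/* New pattern: read the clock once, store the result, then clamp it. */
static long
remaining_safe(long start, long limit)
{
    long        left = limit - (fake_time() - start);

    return Max(left, 0);
}
```

With `fake_clock` at 100, `remaining_macro(97, 5)` reads the clock twice: the comparison sees 2 seconds left, but the returned value is recomputed from the later reading. `remaining_safe(97, 5)` reads the clock once, so the value it clamps is the value it returns, which mirrors the two-statement form in the hunk above.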
@@ -1699,20 +1703,13 @@ ServerLoop(void)
          * Note we also do this during recovery from a process crash.
          */
         if ((Shutdown >= ImmediateShutdown || (FatalError && !SendStop)) &&
-            AbortStartTime > 0 &&
-            now - AbortStartTime >= SIGKILL_CHILDREN_AFTER_SECS)
+            AbortStartTime != 0 &&
+            (now - AbortStartTime) >= SIGKILL_CHILDREN_AFTER_SECS)
         {
             /* We were gentle with them before. Not anymore */
             TerminateChildren(SIGKILL);
             /* reset flag so we don't SIGKILL again */
             AbortStartTime = 0;
-
-            /*
-             * Additionally, unless we're recovering from a process crash,
-             * it's now the time for postmaster to abandon ship.
-             */
-            if (!FatalError)
-                ExitPostmaster(1);
         }
     }
 }
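
The timer logic in this hunk can be modelled in isolation. The sketch below is a toy, not the real ServerLoop: `ToyPostmaster`, `begin_immediate_shutdown` and `server_loop_tick` are hypothetical names, the child count is a plain integer, and "sending SIGKILL" is only recorded in a flag. It illustrates two points from the diff: a zero start time means the timeout is not running, and after this commit the decision to exit depends on the children being gone rather than on the timeout firing.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

#define SIGKILL_CHILDREN_AFTER_SECS 5

/* Toy state, loosely mirroring the variables named in the diff. */
typedef struct
{
    time_t      abort_start;    /* 0 means the SIGKILL timeout is not running */
    int         live_children;  /* children that have not exited yet */
    bool        sigkill_sent;   /* set once the timeout has fired */
} ToyPostmaster;

/* Arm the timeout, as happens when an immediate shutdown begins. */
static void
begin_immediate_shutdown(ToyPostmaster *pm, time_t now)
{
    assert(now != 0);           /* 0 is reserved for "not running" */
    pm->abort_start = now;
}

/*
 * One pass of the loop. Fires the SIGKILL step at most once, then disarms
 * the timer. Returns true only when no children remain, so the timeout
 * alone never makes the parent exit.
 */
static bool
server_loop_tick(ToyPostmaster *pm, time_t now)
{
    if (pm->abort_start != 0 &&
        (now - pm->abort_start) >= SIGKILL_CHILDREN_AFTER_SECS)
    {
        pm->sigkill_sent = true;    /* stands in for TerminateChildren(SIGKILL) */
        pm->abort_start = 0;        /* reset so we do not SIGKILL again */
    }
    return pm->live_children == 0;
}
```

Starting a shutdown at time 1000 with two live children, a tick at 1004 does nothing, a tick at 1005 records the SIGKILL and disarms the timer but still reports that the parent must stay, and only a tick after `live_children` drops to zero reports that it may exit.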
