Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit3d8068e

Browse files
committed
Fix race condition between shutdown and unstarted background workers.
If a database shutdown (smart or fast) is commanded between the timesome process decides to request a new background worker and the timethat the postmaster can launch that worker, then nothing happensbecause the postmaster won't launch any bgworkers once it's exitedPM_RUN state. This is fine ... unless the requesting process iswaiting for that worker to finish (or even for it to start); in thatcase the requestor is stuck, and only manual intervention will get usto the point of being able to shut down.To fix, cancel pending requests for workers when the postmaster sendsshutdown (SIGTERM) signals, and similarly cancel any new requests thatarrive after that point. (We can optimize things slightly by onlydoing the cancellation for workers that have waiters.) To fit withinthe existing bgworker APIs, the "cancel" is made to look like theworker was started and immediately stopped, causing deregistration ofthe bgworker entry. Waiting processes would have to deal withpremature worker exit anyway, so this should introduce no bugs thatweren't there before. We do have a side effect that registrationrecords for restartable bgworkers might disappear when theoreticallythey should have remained in place; but since we're shutting down,that shouldn't matter.Back-patch to v10. There might be value in putting this into 9.6as well, but the management of bgworkers is a bit different there(notably see8ff5186) and I'm not convinced it's worth the effortto validate the patch for that branch.Discussion:https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
1 parent8985e02 commit3d8068e

File tree

4 files changed

+90
-20
lines changed

4 files changed

+90
-20
lines changed

‎contrib/pg_prewarm/autoprewarm.c

Lines changed: 1 addition & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -400,12 +400,7 @@ apw_load_buffers(void)
400400

401401
/*
402402
* Likewise, don't launch if we've already been told to shut down.
403-
*
404-
* There is a race condition here: if the postmaster has received a
405-
* fast-shutdown signal, but we've not heard about it yet, then the
406-
* postmaster will ignore our worker start request and we'll wait
407-
* forever. However, that's a bug in the general background-worker
408-
* logic, not the fault of this module.
403+
* (The launch would fail anyway, but we might as well skip it.)
409404
*/
410405
if (got_sigterm)
411406
break;

‎src/backend/postmaster/bgworker.c

Lines changed: 71 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -232,13 +232,15 @@ FindRegisteredWorkerBySlotNumber(int slotno)
232232
}
233233

234234
/*
235-
* Notice changes to shared memory made by other backends. This code
236-
* runs in the postmaster, so we must be very careful not to assume that
237-
* shared memory contents are sane. Otherwise, a rogue backend could take
238-
* out the postmaster.
235+
* Notice changes to shared memory made by other backends.
236+
* Accept new worker requests only if allow_new_workers is true.
237+
*
238+
* This code runs in the postmaster, so we must be very careful not to assume
239+
* that shared memory contents are sane. Otherwise, a rogue backend could
240+
* take out the postmaster.
239241
*/
240242
void
241-
BackgroundWorkerStateChange(void)
243+
BackgroundWorkerStateChange(boolallow_new_workers)
242244
{
243245
intslotno;
244246

@@ -298,6 +300,15 @@ BackgroundWorkerStateChange(void)
298300
continue;
299301
}
300302

303+
/*
304+
* If we aren't allowing new workers, then immediately mark it for
305+
* termination; the next stanza will take care of cleaning it up.
306+
* Doing this ensures that any process waiting for the worker will get
307+
* awoken, even though the worker will never be allowed to run.
308+
*/
309+
if (!allow_new_workers)
310+
slot->terminate= true;
311+
301312
/*
302313
* If the worker is marked for termination, we don't need to add it to
303314
* the registered workers list; we can just free the slot. However, if
@@ -504,12 +515,55 @@ BackgroundWorkerStopNotifications(pid_t pid)
504515
}
505516
}
506517

518+
/*
519+
* Cancel any not-yet-started worker requests that have waiting processes.
520+
*
521+
* This is called during a normal ("smart" or "fast") database shutdown.
522+
* After this point, no new background workers will be started, so anything
523+
* that might be waiting for them needs to be kicked off its wait. We do
524+
* that by cancelling the bgworker registration entirely, which is perhaps
525+
* overkill, but since we're shutting down it does not matter whether the
526+
* registration record sticks around.
527+
*
528+
* This function should only be called from the postmaster.
529+
*/
530+
void
531+
ForgetUnstartedBackgroundWorkers(void)
532+
{
533+
slist_mutable_iteriter;
534+
535+
slist_foreach_modify(iter,&BackgroundWorkerList)
536+
{
537+
RegisteredBgWorker*rw;
538+
BackgroundWorkerSlot*slot;
539+
540+
rw=slist_container(RegisteredBgWorker,rw_lnode,iter.cur);
541+
Assert(rw->rw_shmem_slot<max_worker_processes);
542+
slot=&BackgroundWorkerData->slot[rw->rw_shmem_slot];
543+
544+
/* If it's not yet started, and there's someone waiting ... */
545+
if (slot->pid==InvalidPid&&
546+
rw->rw_worker.bgw_notify_pid!=0)
547+
{
548+
/* ... then zap it, and notify the waiter */
549+
intnotify_pid=rw->rw_worker.bgw_notify_pid;
550+
551+
ForgetBackgroundWorker(&iter);
552+
if (notify_pid!=0)
553+
kill(notify_pid,SIGUSR1);
554+
}
555+
}
556+
}
557+
507558
/*
508559
* Reset background worker crash state.
509560
*
510561
* We assume that, after a crash-and-restart cycle, background workers without
511562
* the never-restart flag should be restarted immediately, instead of waiting
512-
* for bgw_restart_time to elapse.
563+
* for bgw_restart_time to elapse. On the other hand, workers with that flag
564+
* should be forgotten immediately, since we won't ever restart them.
565+
*
566+
* This function should only be called from the postmaster.
513567
*/
514568
void
515569
ResetBackgroundWorkerCrashTimes(void)
@@ -549,6 +603,11 @@ ResetBackgroundWorkerCrashTimes(void)
549603
* resetting.
550604
*/
551605
rw->rw_crashed_at=0;
606+
607+
/*
608+
* If there was anyone waiting for it, they're history.
609+
*/
610+
rw->rw_worker.bgw_notify_pid=0;
552611
}
553612
}
554613
}
@@ -1098,6 +1157,9 @@ GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
10981157
* returned. However, if the postmaster has died, we give up and return
10991158
* BGWH_POSTMASTER_DIED, since it that case we know that startup will not
11001159
* take place.
1160+
*
1161+
* The caller *must* have set our PID as the worker's bgw_notify_pid,
1162+
* else we will not be awoken promptly when the worker's state changes.
11011163
*/
11021164
BgwHandleStatus
11031165
WaitForBackgroundWorkerStartup(BackgroundWorkerHandle*handle,pid_t*pidp)
@@ -1140,6 +1202,9 @@ WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle, pid_t *pidp)
11401202
* and then return BGWH_STOPPED. However, if the postmaster has died, we give
11411203
* up and return BGWH_POSTMASTER_DIED, because it's the postmaster that
11421204
* notifies us when a worker's state changes.
1205+
*
1206+
* The caller *must* have set our PID as the worker's bgw_notify_pid,
1207+
* else we will not be awoken promptly when the worker's state changes.
11431208
*/
11441209
BgwHandleStatus
11451210
WaitForBackgroundWorkerShutdown(BackgroundWorkerHandle*handle)

‎src/backend/postmaster/postmaster.c

Lines changed: 16 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -3761,6 +3761,13 @@ PostmasterStateMachine(void)
37613761
*/
37623762
if (pmState==PM_STOP_BACKENDS)
37633763
{
3764+
/*
3765+
* Forget any pending requests for background workers, since we're no
3766+
* longer willing to launch any new workers. (If additional requests
3767+
* arrive, BackgroundWorkerStateChange will reject them.)
3768+
*/
3769+
ForgetUnstartedBackgroundWorkers();
3770+
37643771
/* Signal all backend children except walsenders */
37653772
SignalSomeChildren(SIGTERM,
37663773
BACKEND_TYPE_ALL-BACKEND_TYPE_WALSND);
@@ -5151,13 +5158,6 @@ sigusr1_handler(SIGNAL_ARGS)
51515158
PG_SETMASK(&BlockSig);
51525159
#endif
51535160

5154-
/* Process background worker state change. */
5155-
if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
5156-
{
5157-
BackgroundWorkerStateChange();
5158-
StartWorkerNeeded= true;
5159-
}
5160-
51615161
/*
51625162
* RECOVERY_STARTED and BEGIN_HOT_STANDBY signals are ignored in
51635163
* unexpected states. If the startup process quickly starts up, completes
@@ -5203,6 +5203,7 @@ sigusr1_handler(SIGNAL_ARGS)
52035203

52045204
pmState=PM_RECOVERY;
52055205
}
5206+
52065207
if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY)&&
52075208
pmState==PM_RECOVERY&&Shutdown==NoShutdown)
52085209
{
@@ -5228,6 +5229,14 @@ sigusr1_handler(SIGNAL_ARGS)
52285229
StartWorkerNeeded= true;
52295230
}
52305231

5232+
/* Process background worker state changes. */
5233+
if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
5234+
{
5235+
/* Accept new worker requests only if not stopping. */
5236+
BackgroundWorkerStateChange(pmState<PM_STOP_BACKENDS);
5237+
StartWorkerNeeded= true;
5238+
}
5239+
52315240
if (StartWorkerNeeded||HaveCrashedWorker)
52325241
maybe_start_bgworkers();
52335242

‎src/include/postmaster/bgworker_internals.h

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -46,11 +46,12 @@ extern slist_head BackgroundWorkerList;
4646

4747
externSizeBackgroundWorkerShmemSize(void);
4848
externvoidBackgroundWorkerShmemInit(void);
49-
externvoidBackgroundWorkerStateChange(void);
49+
externvoidBackgroundWorkerStateChange(boolallow_new_workers);
5050
externvoidForgetBackgroundWorker(slist_mutable_iter*cur);
5151
externvoidReportBackgroundWorkerPID(RegisteredBgWorker*);
5252
externvoidReportBackgroundWorkerExit(slist_mutable_iter*cur);
5353
externvoidBackgroundWorkerStopNotifications(pid_tpid);
54+
externvoidForgetUnstartedBackgroundWorkers(void);
5455
externvoidResetBackgroundWorkerCrashTimes(void);
5556

5657
/* Function to start a background worker, called from postmaster.c */

0 commit comments

Comments
 (0)

[8]ページ先頭

©2009-2025 Movatter.jp