gh-146313: Fix multiprocessing ResourceTracker deadlock after os.fork() #146316
Draft
gpshead wants to merge 1 commit into python:main from
## Problem

`ResourceTracker.__del__` (added in gh-88887) calls `os.waitpid(pid, 0)`. This call blocks indefinitely if a process created via `os.fork()` still holds the write end of the tracker pipe. In that case the tracker never sees EOF and never exits, so the parent hangs at interpreter shutdown.

## Root cause

Three requirements conflict:

- gh-88887 wants the parent to reap the tracker to prevent zombies.
- gh-80849 wants `mp.Process(fork)` children to reuse the parent's tracker via the inherited pipe fd.
- gh-146313 shows the parent can't block in `waitpid()` while a child holds the fd, because the tracker won't see EOF until all copies are closed.

## Fix

Two layers:

1. **Timeout safety-net.** `_stop_locked()` gains a `wait_timeout` parameter. When called from `__del__`, it polls with `WNOHANG` using exponential backoff for up to 1 second instead of blocking indefinitely.
2. **At-fork handler.** An `os.register_at_fork(after_in_child=...)` handler closes the inherited pipe fd in the child unless a preserve flag is set.
   - `popen_fork.Popen._launch()` sets the flag before its fork. As a result, `mp.Process(fork)` children keep the fd and reuse the parent's tracker (preserving gh-80849).
   - Raw `os.fork()` children close the fd, which lets the parent reap the tracker promptly.

## Result

| Scenario | Before | After |
|---|---|---|
| Raw `os.fork()`, parent exits while child alive | deadlock | ~30ms reap |
| `mp.Process(fork)`, parent joins then exits | ~30ms reap | ~30ms reap |
| `mp.Process(fork)`, parent exits abnormally | deadlock | 1s bounded wait |
| No fork (gh-88887 scenario) | ~30ms reap | ~30ms reap |

The at-fork handler makes the timeout unreachable in all well-behaved paths. The timeout remains as a safety net for abnormal shutdowns.
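The hang described above comes down to ordinary pipe semantics: a reader only sees EOF once every copy of the write end is closed. The sketch below is standalone, POSIX-only and does not touch `multiprocessing`. `time_until_eof` is a hypothetical helper.

It forks a stand-in "tracker" that reads until EOF. It then forks a raw `os.fork()` child that either closes or keeps its inherited write end. Finally, it measures how long the parent's `waitpid()` on the tracker blocks.

```python
import os
import time


def time_until_eof(close_in_child: bool, child_lifetime: float = 0.5) -> float:
    """Return how long the parent waits for a pipe-reading "tracker" to exit.

    Hypothetical illustration: the tracker here is a plain forked reader,
    not multiprocessing's ResourceTracker.
    """
    r, w = os.pipe()

    tracker = os.fork()
    if tracker == 0:
        # Stand-in tracker: drain the pipe and exit once it sees EOF.
        os.close(w)
        while os.read(r, 1024):
            pass
        os._exit(0)
    os.close(r)

    start = time.monotonic()
    child = os.fork()
    if child == 0:
        # Raw os.fork() child: it inherited a copy of the write end.
        if close_in_child:
            os.close(w)
        time.sleep(child_lifetime)
        os._exit(0)

    os.close(w)             # the parent drops its own copy...
    os.waitpid(tracker, 0)  # ...but this blocks until *every* copy of w is closed
    elapsed = time.monotonic() - start
    os.waitpid(child, 0)
    return elapsed
```

With `close_in_child=False`, the parent waits roughly `child_lifetime`, because the tracker only gets EOF when the child exits. With `close_in_child=True`, it returns almost immediately. With a long-lived child and an unbounded `waitpid(pid, 0)`, the first case becomes the shutdown hang.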
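The timeout safety-net can be sketched as a bounded `os.waitpid(pid, os.WNOHANG)` poll with exponential backoff. This is a hypothetical helper, not the PR's `_stop_locked()`. The name `waitpid_with_timeout` and the `initial_delay`/`max_delay` values are assumptions; only the 1-second default mirrors the description.

```python
import os
import time


def waitpid_with_timeout(pid: int, timeout: float = 1.0,
                         initial_delay: float = 0.001,
                         max_delay: float = 0.1) -> bool:
    """Try to reap child ``pid`` for at most ``timeout`` seconds (hypothetical helper).

    Returns True if the child was reaped, False if it is still running.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        # With WNOHANG, waitpid returns (0, 0) while the child is still running.
        reaped_pid, _status = os.waitpid(pid, os.WNOHANG)
        if reaped_pid != 0:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Exponential backoff, capped at max_delay and at the time left.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)
```

The backoff keeps the common case cheap: a tracker that exits within a few milliseconds is reaped after a handful of short sleeps. A stuck one costs at most `timeout`. In this sketch, returning `False` rather than raising leaves the decision about the unreaped child to the caller.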
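The at-fork layer can be sketched with `os.register_at_fork(after_in_child=...)` plus a flag that a sanctioned forker sets around its `os.fork()` call. The description gives `popen_fork.Popen._launch()` roughly this role.

`TrackerHandle`, `preserve_fd` and `fork_preserving()` are hypothetical names. The PR's actual state and flag live inside `multiprocessing` and may be structured differently.

```python
import os


class TrackerHandle:
    """Hypothetical owner of a tracker pipe's write end (not multiprocessing's class)."""

    def __init__(self, fd: int) -> None:
        self.fd = fd
        # True only for the duration of a fork whose child should keep the fd.
        self.preserve_fd = False
        # Runs in every child created by os.fork(), including raw ones.
        os.register_at_fork(after_in_child=self._after_fork_in_child)

    def _after_fork_in_child(self) -> None:
        if self.fd is not None and not self.preserve_fd:
            # Raw fork: drop the inherited write end so the tracker can see EOF
            # as soon as the parent closes its own copy.
            os.close(self.fd)
            self.fd = None
        self.preserve_fd = False

    def fork_preserving(self) -> int:
        """Fork so that the child keeps the fd and can reuse the parent's tracker."""
        self.preserve_fd = True
        try:
            return os.fork()
        finally:
            # Runs in both parent and child, so the flag never leaks past this fork.
            self.preserve_fd = False
```

One caveat of this sketch: `os.register_at_fork()` handlers cannot be unregistered. Each `TrackerHandle` therefore stays referenced for the life of the process, so a single module-level instance avoids piling up handlers.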