Description
Bug report
Bug description:
A regression caused by #88887
The best context for this issue comes from two places: my (1) #88887 (comment) report and independent confirmation from @itamaro in (2) #88887 (comment).
(1) """a regression in processes using fork() where a reference to the resource_tracker's pipe remains alive in another process. https://github.com/gpshead/cpython/blob/00d16dca6e911fb69c055aa874a2d25cb5e5fe6a/Lib/test/_test_multiprocessing.py#L6293-L6306 has an example of a regression test that demonstrates it.
Basically, at process shutdown the new `__del__` finalizer is called and can hang in `waitpid` on a child process that is not exiting.
We could sever that relationship so the fd isn't inherited and the shared resource_tracker used by multiple sub-child processes when the "fork" start_method is used is no longer a feature - that'd undo #80849's #5172 which added that as a feature (cc: @pitrou & @tomMoral) - but also "fork" as a start_method is rather frowned upon these days - people are better off avoiding it. But the default only just changed away in 3.14 so a lot of people still are - I encountered this in 3.13.9 & 3.13.11.
I would not undo a feature in a bugfix regardless.
One "easy" workaround for now, for anyone actually hitting this, is probably to restore the previous behavior and re-gain this issue - which feels like it was uncommon:
    import multiprocessing.resource_tracker

    if hasattr(multiprocessing.resource_tracker.ResourceTracker, "__del__"):
        del multiprocessing.resource_tracker.ResourceTracker.__del__
A fix forward could basically be to undo #5172's feature."""
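The mechanism behind the hang can be seen without multiprocessing at all: a pipe reader only sees EOF once every copy of the write end is closed. The sketch below is an illustration, not code from CPython. It uses `os.dup()` as a stand-in for the copy a forked child would inherit, and the helper name `pipe_at_eof` is made up for this example. It assumes a POSIX system where `select` works on pipes.

```python
import os
import select


def pipe_at_eof(read_fd, timeout=0.1):
    """Return True if read_fd reports EOF, i.e. every write end is closed.

    Hypothetical helper for this illustration only. Nothing is ever written
    to the pipe here, so a readable fd that returns b"" means EOF.
    """
    readable, _, _ = select.select([read_fd], [], [], timeout)
    return bool(readable) and os.read(read_fd, 1) == b""


r, w = os.pipe()
inherited_w = os.dup(w)  # stands in for the write fd a forked child inherits

os.close(w)  # roughly what the parent does at shutdown before waitpid()
eof_while_copy_open = pipe_at_eof(r)  # the duplicate keeps the pipe open

os.close(inherited_w)  # the other holder finally lets go of its copy
eof_after_copy_closed = pipe_at_eof(r)  # last write end gone, reader sees EOF

os.close(r)
print(eof_while_copy_open, eof_after_copy_closed)
```

In the real bug the reader is the resource tracker process, so it keeps running for as long as any other process holds the write end, and the parent's `waitpid` on it does not return.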
(2) """hey @gpshead, I believe I ran into this at least twice now, while migrating Meta to 3.12.
Trying to create a minimal reproducer, here's what I got:
    import os
    import sys
    import time
    from multiprocessing.resource_tracker import ensure_running

    # Step 1: Start the resource tracker (creates the pipe with fds r, w).
    ensure_running()
    print("Resource tracker started.", flush=True)

    # Step 2: Fork. The child inherits the write-end fd of the tracker pipe.
    pid = os.fork()
    if pid == 0:
        # Child: stay alive so the inherited write-end fd remains open,
        # preventing the tracker from seeing EOF.
        print(f"[child {os.getpid()}] sleeping (holds write fd open)...", flush=True)
        time.sleep(100.0)
        print(f"[child {os.getpid()}] exiting...", flush=True)
        sys.exit(0)
    else:
        # Parent: exit normally. During shutdown, ResourceTracker.__del__
        # closes the write fd and calls waitpid() on the tracker process.
        # The tracker never exits because the child still has the fd open.
        print(f"[parent {os.getpid()}] exiting normally (child={pid})...", flush=True)
and here's what I ended up doing in our global sitecustomize.py to work around it:
https://github.com/facebook/buck2/blob/271de04a2a00041cee2e9e18d896fcd24f241598/prelude/python/tools/make_par/sitecustomize.py#L203-L246
(briefly: register an at-fork callback that, after fork in the child, resets the resource tracker inherited from the parent, if it was started)"""
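A minimal sketch of what such an at-fork reset might look like follows. This is not the linked sitecustomize.py code, and it makes assumptions: it relies on the private attributes `_resource_tracker`, `_fd` and `_pid` in `multiprocessing.resource_tracker`, which are implementation details that may change between CPython versions. The function name `reset_inherited_tracker` and its optional `tracker` parameter are hypothetical, added only so the logic can be exercised on a stand-in object.

```python
import os
from multiprocessing import resource_tracker


def reset_inherited_tracker(tracker=None):
    """Drop a resource tracker inherited across fork() (hypothetical helper).

    ASSUMPTION: relies on the private `_fd` / `_pid` attributes of the
    module-level `_resource_tracker` object; these are not public API.
    """
    if tracker is None:
        tracker = getattr(resource_tracker, "_resource_tracker", None)
    if tracker is None:
        return
    fd = getattr(tracker, "_fd", None)
    if fd is None:
        return  # tracker was never started in the parent; nothing inherited
    try:
        # Close this process's copy of the write end so it no longer keeps
        # the parent's tracker from seeing EOF.
        os.close(fd)
    except OSError:
        pass
    # Forget the parent's tracker; a later ensure_running() in this process
    # would then start a fresh one instead of reusing the inherited pipe.
    tracker._fd = None
    tracker._pid = None


# Run the reset in every child created via os.fork() (POSIX only).
os.register_at_fork(after_in_child=reset_inherited_tracker)
```

The trade-off is the one discussed above: the child stops sharing the parent's tracker, so the sharing that #5172 added for the "fork" start method is effectively given up in processes that install this hook.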
My first draft of a regression test trying to reproduce it and a fix was in main...gpshead:cpython:claude/fix-resource-tracker-hang-XZw5P from January.
I'll turn something here into a real fix.
CPython versions tested on:
3.12, 3.13
Operating systems tested on:
Linux