Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-88887: Cleanupmultiprocessing.resource_tracker.ResourceTracker upon deletion#130429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
vstinner merged 21 commits intopython:mainfromluccabb:fix_resource_tracker_leak
Mar 20, 2025

Conversation

@luccabb
Copy link
Contributor

@luccabbluccabb commentedFeb 21, 2025
edited
Loading

fixing#88887: multiprocessing: the Resource Tracker process is never reaped

#88887 is only visible when running Python with PID 1. This is because Python relies onkernel's process handling code to clean up terminated processes, which works fine for most cases. But it won't work when running Python as PID 1and there's nosignal processing handler to reap zombies on client code

So to test changes we need a docker instance that starts with python code that leaks the process. For that I'm usinghttps://github.com/viktorvia/python-multi-issue but with a custom Dockerfile that installs my branch of cpython instead:https://gist.github.com/luccabb/a0a48b15cd30746e7619ecb7168ce439

Test plan

before

docker build

(base) luccab@luccab-mbp python-multi-issue % docker build -t multi

observe[python3] <defunct> processes increasing by 1 every 1s:

(base) luccab@luccab-mbp python-multi-issue % docker run -it multi python3 main.py[1, 4, 9]UID        PID  PPID  C STIME TTY          TIME CMDroot         1     0  0 23:00 pts/0    00:00:00 python3 main.pyroot         8     1  0 23:00 pts/0    00:00:00 [python3] <defunct>root        17     1  0 23:00 pts/0    00:00:00 ps -ef --forest[1, 4, 9]UID        PID  PPID  C STIME TTY          TIME CMDroot         1     0  3 23:00 pts/0    00:00:00 python3 main.pyroot         8     1  2 23:00 pts/0    00:00:00 [python3] <defunct>root        19     1  0 23:00 pts/0    00:00:00 [python3] <defunct>root        28     1  0 23:00 pts/0    00:00:00 ps -ef --forest[1, 4, 9]UID        PID  PPID  C STIME TTY          TIME CMDroot         1     0  1 23:00 pts/0    00:00:00 python3 main.pyroot         8     1  1 23:00 pts/0    00:00:00 [python3] <defunct>root        19     1  2 23:00 pts/0    00:00:00 [python3] <defunct>root        30     1  0 23:00 pts/0    00:00:00 [python3] <defunct>root        39     1  0 23:00 pts/0    00:00:00 ps -ef --forest

after

update Dockerfile to get code from this branch:https://gist.github.com/luccabb/a0a48b15cd30746e7619ecb7168ce439

docker build

(base) luccab@luccab-mbp python-multi-issue % docker build -t multi .

observe no[python3] <defunct> processes:

(base) luccab@luccab-mbp python-multi-issue % docker run -it multi python3 main.py[1, 4, 9]UID        PID  PPID  C STIME TTY          TIME CMDroot         1     0  4 23:02 pts/0    00:00:00 python3 main.pyroot        17     1  0 23:02 pts/0    00:00:00 ps -ef --forest[1, 4, 9]UID        PID  PPID  C STIME TTY          TIME CMDroot         1     0  2 23:02 pts/0    00:00:00 python3 main.pyroot        28     1  0 23:02 pts/0    00:00:00 ps -ef --forest[1, 4, 9]UID        PID  PPID  C STIME TTY          TIME CMDroot         1     0  1 23:02 pts/0    00:00:00 python3 main.pyroot        39     1  0 23:02 pts/0    00:00:00 ps -ef --forest

cavdhut, kjsakher, SamuelDoud, meobilivang, and NeonWizard reacted with thumbs up emoji
@ghost
Copy link

ghost commentedFeb 21, 2025
edited by ghost
Loading

All commit authors signed the Contributor License Agreement.
CLA signed

# see https://github.com/python/cpython/issues/88887
try:
self._stop()
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Catching some exceptions is fine. But catching any exception is not ok. You should not ignore KeyboardInterrupt for example.

luccabb reacted with thumbs up emoji
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

moved to catchOSError, TypeError, AttributeError only

@picnixzpicnixz changed the titlegh-88887: Fix resource tracker leakgh-88887: Cleanupmultiprocessing.resource_tracker.ResourceTracker upon deletionFeb 24, 2025
Co-authored-by: Victor Stinner <vstinner@python.org>
@luccabb
Copy link
ContributorAuthor

@vstinner good to merge?

# see https://github.com/python/cpython/issues/88887
try:
self._stop()
except (OSError,TypeError,AttributeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not confortable with these exceptions. In which case we should expect these exceptions? I would prefer a direct call to self._stop() without try/except.

_stop() should be modified to do nothing ifself._pid is None.

def_stop(self):withself._lock:ifself._pidisNone:return            ...

luccabb reacted with thumbs up emoji
Copy link
ContributorAuthor

@luccabbluccabbMar 6, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

moved to catching AttributeError only which is likely caused by__del__:

del() can be executed during interpreter shutdown. As a consequence, the global variables it needs to access (including other modules) may already have been deleted or set to None.

https://docs.python.org/3/reference/datamodel.html#object.del

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

not catching AttributeError here fails some of the tests as theos module is alreadyNone when_stop runs

@luccabbluccabb requested a review fromvstinnerMarch 7, 2025 00:15
@luccabbluccabb requested a review fromvstinnerMarch 19, 2025 18:32
Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@vstinnervstinner merged commitf53e7de intopython:mainMar 20, 2025
43 checks passed
@vstinnervstinner added needs backport to 3.12only security fixes needs backport to 3.13bugs and security fixes labelsMar 20, 2025
@miss-islington-app
Copy link

Thanks@luccabb for the PR, and@vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks@luccabb for the PR, and@vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry,@luccabb and@vstinner, I could not cleanly backport this to3.12 due to a conflict.
Please backport usingcherry_picker on command line.

cherry_picker f53e7de6a84a0f535efb75c3671283b801a1af0f 3.12

miss-islington pushed a commit to miss-islington/cpython that referenced this pull requestMar 20, 2025
…cker` upon deletion (pythonGH-130429)(cherry picked from commitf53e7de)Co-authored-by: luccabb <32229669+luccabb@users.noreply.github.com>Co-authored-by: Victor Stinner <vstinner@python.org>Co-authored-by: Gregory P. Smith <greg@krypto.org>
@bedevere-app
Copy link

GH-131516 is a backport of this pull request to the3.13 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.13bugs and security fixes labelMar 20, 2025
@vstinner
Copy link
Member

Merged, thank you.

@luccabb: Automated backport to 3.12 failed, do you want to try to backport the change manually to 3.12?

luccabb reacted with thumbs up emoji

vstinner added a commit that referenced this pull requestMar 20, 2025
…acker` upon deletion (GH-130429) (#131516)gh-88887: Cleanup `multiprocessing.resource_tracker.ResourceTracker` upon deletion (GH-130429)(cherry picked from commitf53e7de)Co-authored-by: luccabb <32229669+luccabb@users.noreply.github.com>Co-authored-by: Victor Stinner <vstinner@python.org>Co-authored-by: Gregory P. Smith <greg@krypto.org>
@luccabb
Copy link
ContributorAuthor

@vstinner: manually backporting changes to 3.12 on#131530

vstinner reacted with thumbs up emoji

@bedevere-app
Copy link

GH-131530 is a backport of this pull request to the3.12 branch.

@bedevere-appbedevere-appbot removed the needs backport to 3.12only security fixes labelMar 21, 2025
vstinner added a commit that referenced this pull requestMar 21, 2025
…acker` upon deletion (GH-130429) (#131530)Co-authored-by: Victor Stinner <vstinner@python.org>Co-authored-by: Gregory P. Smith <greg@krypto.org>(cherry picked from commitf53e7de)
@luccabbluccabb deleted the fix_resource_tracker_leak branchMarch 21, 2025 16:21
@tomMoral
Copy link
Contributor

Hello,

We noticed this change in the logs ofjoblib.
I did not have time to investigate in great details but from my understanding, this PR might break the intended purpose of theresource_tracker in case of nested parallelism.

Theresource_tracker is shared by the main process and all its children, and should be cleaned up only once all processes that share this resource tracker are destroyed. In case of nested parallelism, the same resource tracker is linked to all child processes. The goal of this tracker is to be a safe guard: it should be the last one standing. For this, the original design relied on the fact that a "read" pipe is not readable once the file descriptor of the "write" pipe is destroyed. But with the new design, I think the resource tracker will be called as soon as one child finishes cleanly, even if all other processes are still alive no?

I think in normal behavior, it should probably only be called by the main process no? Sorry if this is just because I misunderstood the PR by reading too fast.

@gpshead
Copy link
Member

Thanks, please open a new issue with a test scenario and reproduction details.

luccabb reacted with thumbs up emoji

@tomMoral
Copy link
Contributor

Sorry for the noise, I read too fast... The behavior is still the right one.

seehwan pushed a commit to seehwan/cpython that referenced this pull requestApr 16, 2025
…cker` upon deletion (python#130429)Co-authored-by: Victor Stinner <vstinner@python.org>Co-authored-by: Gregory P. Smith <greg@krypto.org>
@vstinner
Copy link
Member

There is a regression, waitpid() can fail with ChildProcessError:#140485

I wrote#141132 to fix it.

luccabb reacted with heart emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@vstinnervstinnervstinner approved these changes

@gpsheadgpsheadAwaiting requested review from gpsheadgpshead is a code owner

Assignees

@vstinnervstinner

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@luccabb@vstinner@tomMoral@gpshead

[8]ページ先頭

©2009-2025 Movatter.jp