gh-130895: fix multiprocessing.Process join/wait/poll races #131440


Open
duaneg wants to merge 2 commits into python:main from duaneg:gh-130895

Conversation

duaneg (Contributor) commented Mar 19, 2025

This bug is caused by race conditions in the poll implementations (which are called by join/wait) where if multiple threads try to reap the dead process only one "wins" and gets the exit code, while the others get an error.
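The underlying operating system behaviour can be shown in isolation. The following is a minimal sketch, not code from this PR: `reap_twice` is a hypothetical helper, and it assumes a POSIX platform with `os.fork` and Python 3.9+ for `os.waitstatus_to_exitcode`. It reaps the same child twice from one thread, which is what two racing threads effectively do.

```python
import os


def reap_twice(exit_status=3):
    """Fork a child that exits immediately, then try to reap it twice.

    Hypothetical helper for illustration only (POSIX, Python 3.9+).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit at once without running any cleanup handlers.
        os._exit(exit_status)

    # The first reaper "wins" and receives the real exit status.
    _, status = os.waitpid(pid, 0)
    first = os.waitstatus_to_exitcode(status)

    # The second reaper "loses": the child has already been reaped.
    try:
        os.waitpid(pid, 0)
        second = "reaped again"
    except ChildProcessError:
        second = "ChildProcessError"
    return first, second
```

Only the first `os.waitpid` call sees the exit status. Any later caller has to learn the code from the winner rather than from the operating system.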

In the forkserver implementation the losing thread(s) set the code to an error, possibly overwriting the correct code set by the winning thread. This is relatively easy to fix: we can just take a lock before waiting for the process, since at that point we know the call should not block.
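A rough sketch of that idea, under stated assumptions: `SentinelReader` is a hypothetical stand-in for the forkserver popen object, a plain pipe plays the role of the sentinel, and the 8-byte `"q"` encoding and the `255` fallback are chosen for illustration. It is not the code in this PR. The point is only that the read happens under a lock and re-checks whether another thread has already recorded the code.

```python
import os
import select
import struct
import threading


class SentinelReader:
    """Hypothetical sketch: read an exit code from a sentinel fd exactly once.

    Assumes the writer sends one 8-byte signed integer and then closes its
    end, so the fd stays readable (EOF) for every later caller.
    """

    def __init__(self, fd):
        self.fd = fd
        self.returncode = None
        self._lock = threading.Lock()

    def poll(self, timeout=None):
        if self.returncode is None:
            # Wait (possibly blocking) without holding the lock.
            ready, _, _ = select.select([self.fd], [], [], timeout)
            if not ready:
                return None
            # The fd is readable, so the read below should not block.
            with self._lock:
                if self.returncode is None:
                    data = os.read(self.fd, 8)
                    if len(data) == 8:
                        self.returncode = struct.unpack("q", data)[0]
                    else:
                        # Illustrative error value for a short read or EOF.
                        self.returncode = 255
        return self.returncode
```

Threads that lose the race find `returncode` already set once they get the lock, so they never overwrite it with the error value.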

In the fork and spawn implementations the losers of the race return before the exit code is set, meaning the process may still report itself as alive after join returns. Fixing this is trickier as we have to support a mixture of blocking and non-blocking calls to poll, and we cannot have the latter waiting to take a lock held by the former.

The approach taken is to split the blocking and non-blocking call variants. The non-blocking variant does its work with the lock held: since it won't block this should be safe. The blocking variant releases the lock before making the blocking operating system call. It then retakes the lock and either sets the code if it wins or waits for a potentially racing thread to do so otherwise.
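The split can be sketched roughly as follows. This is an illustration of the locking pattern described above, not the PR's implementation: `ChildHandle`, `_record`, and `demo` are hypothetical names, a `threading.Condition` is assumed as the way losers wait for the winner, and the sketch assumes POSIX `os.fork` and that nothing else reaps the child.

```python
import os
import threading
import time


class ChildHandle:
    """Hypothetical sketch of split blocking / non-blocking reaping."""

    def __init__(self, pid):
        self.pid = pid
        self.returncode = None
        self._cond = threading.Condition()

    def _record(self, status):
        # Must be called with self._cond held.
        self.returncode = os.waitstatus_to_exitcode(status)
        self._cond.notify_all()

    def poll(self):
        """Non-blocking variant: all work is done with the lock held."""
        with self._cond:
            if self.returncode is None:
                try:
                    pid, status = os.waitpid(self.pid, os.WNOHANG)
                except ChildProcessError:
                    # A blocking waiter reaped the child but has not
                    # recorded the code yet: report "not known yet".
                    return None
                if pid == self.pid:
                    self._record(status)
            return self.returncode

    def wait(self):
        """Blocking variant: the lock is released around the OS call."""
        with self._cond:
            if self.returncode is not None:
                return self.returncode
        try:
            _, status = os.waitpid(self.pid, 0)  # blocks, lock not held
        except ChildProcessError:
            status = None  # another thread won the race
        with self._cond:
            if status is not None:
                self._record(status)
            # Losers wait here until the winner records the code.
            while self.returncode is None:
                self._cond.wait()
            return self.returncode


def demo(n_threads=4, exit_status=7):
    """Fork one child and wait for it from several threads at once."""
    pid = os.fork()
    if pid == 0:
        time.sleep(0.2)
        os._exit(exit_status)
    handle = ChildHandle(pid)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(handle.wait()))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch every thread returning from `wait` sees the same exit code, whichever thread actually reaped the child.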

If a non-blocking call is racing with the unlocked part of a blocking call it may still "lose" the race, and return None instead of the exit code, even though the process is dead. However, as the process could be alive at the time the call is made but die immediately afterwards, this situation should already be handled by correctly written code.
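From the caller's side, handling this correctly means treating `None` as "not known yet" rather than "still running forever". A minimal sketch, assuming a hypothetical helper `wait_for_exit` that takes any zero-argument `poll` callable:

```python
import time


def wait_for_exit(poll, timeout=5.0, interval=0.01):
    """Call poll() until it returns an exit code or the timeout expires.

    Hypothetical helper for illustration: `poll` is any callable that
    returns None while the exit code is not yet available.
    """
    deadline = time.monotonic() + timeout
    while True:
        code = poll()
        if code is not None:
            return code
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Code written this way behaves the same whether a `None` result means the process is still alive or that a racing blocking call has not yet recorded the code.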

To verify the behaviour a test is added which reliably triggers failures for all three implementations. A work-around for this bug in a test added for gh-128041 is also reverted.

duaneg requested a review from gpshead as a code owner March 19, 2025 02:06
ghost commented Mar 19, 2025

All commit authors signed the Contributor License Agreement.
CLA signed

bedevere-app

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

Reviewers

gpshead: awaiting requested review (gpshead is a code owner)
