bpo-22393: Fix multiprocessing.Pool hangs if a worker process dies unexpectedly #10441


Closed
oesteban wants to merge 16 commits into python:main from oesteban:fix-issue-22393

Conversation

@oesteban (Author) commented Nov 9, 2018 (edited by bedevere-bot)

This PR fixes issue 22393.

Three new unit tests have been added.

https://bugs.python.org/issue22393
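
For context, a minimal reproduction of the hang described in the report (a hypothetical illustration, not a test from this PR; on an unpatched `Pool`, `map()` never returns because the results of the killed tasks are never delivered):

    import os
    from multiprocessing import Pool

    def die(_):
        os._exit(1)  # simulate a worker process dying mid-task

    if __name__ == '__main__':
        with Pool(2) as pool:
            pool.map(die, range(2))  # hangs: the pool never learns the tasks are lost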

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

oesteban added a commit to oesteban/nipype that referenced this pull request Nov 9, 2018:

This PR relates to nipy#2700, and should fix the problem underlying nipy#2548. I first considered adding a control thread that monitors the `Pool` of workers, but that would require a large overhead keeping track of PIDs and polling very often. Just adding the core file of [bpo-22393](python/cpython#10441) should fix nipy#2548.
@effigies (Contributor) left a comment

Just a couple comments, pending review from the cpython devs.

@oesteban (Author)

Hi @pitrou (or anyone with a say), can you give us a hint about the fate of this PR (even if you honestly think it does not have a very promising future)?

Thanks

@pitrou (Member) left a comment

Sorry for the delay, @oesteban. I've made a couple of comments; you might want to address them.

Also, it seems you'll need to merge/rebase from master and fix any conflicts.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@oesteban (Author)

I have made the requested changes; please review again

@bedevere-bot

Thanks for making the requested changes!

@pitrou: please review the changes made to this pull request.

@oesteban (Author)

Pinging @pitrou, at least to know whether the changes point in the right direction.

@pitrou changed the title from "bpo-22393: FIX multiprocessing.Pool hangs if a worker process dies unexpectedly" to "bpo-22393: Fix multiprocessing.Pool hangs if a worker process dies unexpectedly" on Feb 7, 2019
@pitrou (Member)

Sorry, will take a look again. Also @pablogsal, you may be interested in this.


@oesteban (Author)

bumping up!

@oesteban (Author) commented May 22, 2019 (edited)

Are there any plans for deprecating multiprocessing? Otherwise, I think this bug should be addressed.

If the proposed fix is not the right way of fixing it, please let me know. I'll resolve the conflicts only once I know there is interest in doing so.

Thanks very much

@pitrou (Member)

@pierreglaser @tomMoral Would you like to take a look at this?

@pierreglaser (Contributor)

Yes, I can have a look.


@tomMoral (Contributor) commented May 23, 2019 (edited)

I'll have a look too.


@oesteban (Author)

@pitrou thanks for the prompt response!

@pierreglaser (Contributor) left a comment

Here is a first review. @tomMoral's should land sometime next week :)


    class BrokenProcessPool(RuntimeError):
        """
        Raised when a process in a ProcessPoolExecutor terminated abruptly

@pierreglaser commented on this diff:

Maybe avoid using the `ProcessPoolExecutor` and `future` terms, which are objects of the `concurrent.futures` package and not the `multiprocessing` package.

    util.debug('terminate pool entering')
    is_broken = BROKEN in (task_handler._state,
                           worker_handler._state,
                           result_handler._state)

    worker_handler._state = TERMINATE

@pierreglaser commented on this diff:

No need to use the `_worker_state_lock` here? And in other places where `_worker_handler._state` is manipulated?
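
As an illustration of the locking being suggested (a hypothetical sketch; `_worker_state_lock`, `TERMINATE`, and `BROKEN` are names from this PR, but the helper and constants below are stand-ins invented for illustration):

    import threading

    TERMINATE, BROKEN = 'TERMINATE', 'BROKEN'  # stand-ins for the pool's internal constants
    _worker_state_lock = threading.Lock()

    def _set_worker_state(worker_handler, new_state):
        # Hold the lock for every read-modify-write of _state, so that
        # terminate() and the pool's maintenance threads cannot interleave.
        with _worker_state_lock:
            if worker_handler._state != BROKEN:
                worker_handler._state = new_state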

        util.debug('helping task handler/workers to finish')
        cls._help_stuff_finish(inqueue, task_handler, len(pool))
    else:
        util.debug('finishing BROKEN process pool')

@pierreglaser commented on this diff:

What happens here if the `task_handler` is blocked, but we do not run `_help_stuff_finish`?


    err = BrokenProcessPool(
        'A worker in the pool terminated abruptly.')
    # Exhaust MapResult with errors

@pierreglaser commented on this diff:

This also applies to `ApplyResult`, right?

    err = BrokenProcessPool(
        'A worker in the pool terminated abruptly.')
    # Exhaust MapResult with errors
    for i, cache_ent in list(self._cache.items()):

@pierreglaser commented on this diff:

Out of curiosity, is there any reason why we iterate on a list of `self._cache`?
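
A likely reason, for illustration: delivering the error removes each entry from the cache (in CPython's `multiprocessing.pool`, `ApplyResult._set()` ends with `del self._cache[self._job]`), so iterating over a snapshot avoids mutating the dict mid-loop. A minimal sketch of the pattern:

    # Deleting entries while iterating the dict directly would raise
    # "RuntimeError: dictionary changed size during iteration";
    # list(...) takes a snapshot first, so the deletions are safe.
    cache = {0: 'job-a', 1: 'job-b'}
    for job, entry in list(cache.items()):
        del cache[job]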

@pablogsal (Member)

There are multiple tests being added that use sleep() to synchronize processes (in particular, they assume the processes will have reached the relevant point by the time the sleep finishes). This is very unreliable and will almost certainly fail on the slowest buildbots.

Please, try to add some synchronization to the tests to make them more deterministic.
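
For illustration, one deterministic pattern (a hypothetical sketch, not code from this PR) uses `multiprocessing.Event` for an explicit rendezvous instead of hoping a `sleep()` was long enough:

    import multiprocessing

    def worker(started, release):
        started.set()   # tell the test we are definitely running
        release.wait()  # block until the test lets us proceed

    if __name__ == '__main__':
        started = multiprocessing.Event()
        release = multiprocessing.Event()
        p = multiprocessing.Process(target=worker, args=(started, release))
        p.start()
        started.wait()  # deterministic: the worker has reached worker()
        # ... perform the action under test here (e.g. kill a pool worker) ...
        release.set()
        p.join()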

@tomMoral (Contributor)

Note that this PR, while improving the current state of `multiprocessing.Pool`, is not a full solution, as it is still very easy to deadlock by calling `sys.exit(0)` in the function:

    import sys
    from multiprocessing import Pool

    pool = Pool(2)
    pool.apply(sys.exit, (0,))

or at unpickling time:

    import sys
    from multiprocessing import Pool

    class Failure:
        def __reduce__(self):
            return sys.exit, (0,)

    pool = Pool(2)
    pool.apply(id, (Failure(),))

Also, many other problems exist with `multiprocessing.Pool`, as you can easily deadlock it (choose from: failure to serialize/deserialize, flooding the queue with many tasks plus one failure, segfaulting with bad timing). I did some work to try to make it fault tolerant (see `class _ReusablePool` in this branch), but some design choices in the communication process make it tricky to fix all possible deadlocks/interpreter freezes.

Maybe a more stable solution would be to actually change the `Pool` to rely on a `concurrent.futures` executor for the parallel computations (which is now far more stable IMO), just keeping the API to reduce the maintenance burden to only one implementation of the parallel pool of workers.
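
For comparison, a sketch of the behavior tomMoral is pointing to: `concurrent.futures.ProcessPoolExecutor` already detects an abruptly dead worker and raises `BrokenProcessPool` rather than hanging (here the worker is killed outright with `os._exit`):

    import os
    from concurrent.futures import ProcessPoolExecutor
    from concurrent.futures.process import BrokenProcessPool

    if __name__ == '__main__':
        with ProcessPoolExecutor(max_workers=2) as executor:
            future = executor.submit(os._exit, 1)  # kill the worker outright
            try:
                future.result()
            except BrokenProcessPool:
                print('executor detected the dead worker instead of hanging')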

@github-actions (GitHub Actions)

This PR is stale because it has been open for 30 days with no activity.

@github-actions bot added the stale label (Stale PR or inactive for long period of time.) on Apr 13, 2025
@gpshead (Member)

Closing in favor of #16103.

Reviewers

@effigies left review comments
@pierreglaser left review comments
@pitrou requested changes

Labels

awaiting change review, stale (Stale PR or inactive for long period of time.), topic-multiprocessing

11 participants

@oesteban, @the-knights-who-say-ni, @bedevere-bot, @pitrou, @pierreglaser, @tomMoral, @pablogsal, @gpshead, @effigies, @vstinner, @ezio-melotti
