gh-109934: notify cancelled futures on thread pool shutdown #134618


Open

duaneg wants to merge 8 commits into python:main from duaneg:gh-109934

Conversation

@duaneg (Contributor) commented May 24, 2025 (edited by bedevere-app bot)

When ThreadPoolExecutor shuts down it cancels any pending futures; however, at present it does not notify waiters. Thus their state stays as CANCELLED instead of CANCELLED_AND_NOTIFIED and any waiters are not awakened.

Call set_running_or_notify_cancel on the cancelled futures to fix this.
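For reference, a small standalone sketch of the Future-level behaviour involved, using only public concurrent.futures APIs (this illustrates the symptom; it is not code from this PR, and the timeouts are arbitrary):

```python
import concurrent.futures

# A future that is cancelled but never picked up by a worker stays in the
# internal CANCELLED state, so wait() does not count it as done and only
# returns once the timeout expires.
f = concurrent.futures.Future()
f.cancel()
done, not_done = concurrent.futures.wait([f], timeout=0.1)
print(not_done)  # {<Future ... state=cancelled>}: waiters were not woken

# set_running_or_notify_cancel() moves it to CANCELLED_AND_NOTIFIED and
# wakes any registered waiters, which is what this change does for futures
# cancelled during shutdown.
f.set_running_or_notify_cancel()
done, not_done = concurrent.futures.wait([f], timeout=0.1)
print(done)  # now the future is reported as done immediately
```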


@duaneg (Contributor, Author) commented

What on earth possessed me to say "I think I've managed to come up with [a unit test] that works reliably", I have no idea. Utter foolishness. Oh well, we'll get there.

@chrisvanrun commented Jul 17, 2025 (edited)

> What on earth possessed me to say "I think I've managed to come up with [a unit test] that works reliably", I have no idea. Utter foolishness. Oh well, we'll get there.

Been there, done that!

In my local tests I just used a time-based approach: the tasks either stall the process for 10 seconds or fail directly, with a generic maximum runtime of 4 s for the test. That is fine for my current project, which also has a 'kill all children' step directly following the shutdown.

I think a good approach would perhaps be to add a generic Lock and have each process stall on it, except for one. That one gets picked up in the first 'batch' and then immediately errors out. Then call executor.shutdown and subsequently release the lock to free up the stalled processes.

You could then assert that the final n tasks are correctly cancelled. I suspect a source of runtime variance here is that the failed task might or might not free up a worker for the next Lock-blocked task; but perhaps the executor blocks while a future's completion is being handled?

@duaneg (Contributor, Author) commented

> I think a good approach would perhaps be to add a generic Lock and have each process stall on it, except for one. That one gets picked up in the first 'batch' and then immediately errors out. Then call executor.shutdown and subsequently release the lock to free up the stalled processes.

Yeah, that is basically what the test does: submits a bunch of tasks, the first max_workers of which immediately block waiting on a barrier, so we know all workers are engaged (and blocked, so they will remain so) and that there are remaining tasks that are pending and hence will be cancelled. Then issue the shutdown, then release the barrier and unblock the workers.

> You could then assert that the final n tasks are correctly cancelled. I suspect a source of runtime variance here is that the failed task might or might not free up a worker for the next Lock-blocked task; but perhaps the executor blocks while a future's completion is being handled?

This should be reliable, as the shutdown is initiated synchronously while all workers are blocked. By the time any of the tasks complete, the executor must be shut down and no additional pending tasks will be started.

However, there are lots of tricky details in ensuring this all works robustly. E.g. once the executor is shut down, use of any synchronisation primitives may fail, depending on timing and details of the implementation, so we have to handle BrokenBarrierError (both directly in the test and indirectly if the workers hit it).

Also, the internal multiprocessing machinery uses a work queue and considers tasks running (and no longer pending) once they are enqueued, even if they haven't been distributed to workers. It eagerly enqueues some extra tasks over and above the maximum number of workers to keep the pipeline filled. Those tasks will not be cancelled, but may or may not actually run, depending on timing.

Anyway, hopefully with all of that taken into account the test is now robust and reliable!
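For illustration, here is a rough sketch of the shape of such a test, for ThreadPoolExecutor only (this is not the actual test from this PR; the worker counts and timeouts are arbitrary, and the real test additionally has to tolerate BrokenBarrierError and the process-pool pipelining described above):

```python
import concurrent.futures
import threading

MAX_WORKERS = 4   # arbitrary illustrative values
NUM_PENDING = 4

def block(started, release):
    # Signal that this worker is running, then stall until the test
    # releases it after shutdown has been issued.
    started.wait(timeout=10)
    release.wait(timeout=10)

def main():
    started = threading.Barrier(MAX_WORKERS + 1)
    release = threading.Event()
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS)

    # The first max_workers submissions occupy (and block) every worker;
    # the rest must therefore stay pending until shutdown cancels them.
    running = [executor.submit(block, started, release)
               for _ in range(MAX_WORKERS)]
    pending = [executor.submit(block, started, release)
               for _ in range(NUM_PENDING)]

    started.wait(timeout=10)  # all workers are now engaged and blocked
    executor.shutdown(wait=False, cancel_futures=True)
    release.set()             # unblock the workers

    # Without the fix the cancelled futures stay CANCELLED (not notified),
    # so this wait() only returns after the timeout and the assert fails.
    done, not_done = concurrent.futures.wait(pending, timeout=5)
    assert not not_done, "cancelled futures were never notified"
    assert all(f.cancelled() for f in pending)

if __name__ == "__main__":
    main()
```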
