Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.7k
Description
Bug report
With a ProcessPoolExecutor, after submitting and quickly canceling a future, a call toshutdown(wait=True) would hang indefinitely.
This happens pretty much on all platforms and all recent Python versions.
Here is a minimal reproduction:
importconcurrent.futuresppe=concurrent.futures.ProcessPoolExecutor(1)ppe.submit(int).result()ppe.submit(int).cancel()ppe.shutdown(wait=True)
The first submission gets the executor going and creates its internalqueue_management_thread.
The second submission appears to get that thread to loop, enter a wait state, and never receive a wakeup event.
Introducing a tiny sleep between the second submit and its cancel request makes the issue disappear. From my initial observation it looks like something in the way thequeue_management_worker internal loop is structured doesn't handle this edge case well.
Shutting down withwait=False would return immediately as expected, but thequeue_management_thread would then die with an unhandledOSError: handle is closed exception.
Environment
- Discovered on macOS-12.2.1 with cpython 3.8.5.
- Reproduced in Ubuntu and Windows (x64) as well, and in cpython versions 3.7 to 3.11.0-beta.3.
- Reproduced in pypy3.8 as well, but not consistently. Seen for example in Ubuntu with Python 3.8.13 (PyPy 7.3.9).
Additional info
When tested withpytest-timeout under Ubuntu and cpython 3.8.13, these are the tracebacks at the moment of timing out:
Details
_____________________________________ test _____________________________________@pytest.mark.timeout(10)deftest(): ppe= concurrent.futures.ProcessPoolExecutor(1) ppe.submit(int).result() ppe.submit(int).cancel()> ppe.shutdown(wait=True)test_reproduce_python_bug.py:14:_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py:686: in shutdownself._queue_management_thread.join()/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1011: in joinself._wait_for_tstate_lock()_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _self = <Thread(QueueManagerThread, started daemon 140003176535808)>block = True, timeout = -1def_wait_for_tstate_lock(self,block=True,timeout=-1):# Issue #18808: wait for the thread state to be gone.# At the end of the thread's life, after all knowledge of the thread# is removed from C data structures, C code releases our _tstate_lock.# This method passes its arguments to _tstate_lock.acquire().# If the lock is acquired, the C code is done, and self._stop() is# called. That sets ._is_stopped to True, and ._tstate_lock to None. lock=self._tstate_lockif lockisNone:# already determined that the C code is doneassertself._is_stopped> elif lock.acquire(block, timeout):E Failed: Timeout >10.0s/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py:1027: Failed----------------------------- Captured stderr call -----------------------------+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++~~~~~~~~~~~~~~~~~ Stack of QueueFeederThread (140003159754496) ~~~~~~~~~~~~~~~~~ File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line890, in_bootstrapself._bootstrap_inner() File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line932, in_bootstrap_innerself.run() File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line870, inrunself._target(*self._args,**self._kwargs) File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/queues.py", line227, in_feed nwait() File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line302, inwait waiter.acquire()~~~~~~~~~~~~~~~~ Stack of QueueManagerThread (140003176535808) ~~~~~~~~~~~~~~~~~ File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line890, in_bootstrapself._bootstrap_inner() File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line932, in_bootstrap_innerself.run() File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/threading.py", line870, inrunself._target(*self._args,**self._kwargs) File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/concurrent/futures/process.py", line362, in_queue_management_worker ready= mp.connection.wait(readers+ worker_sentinels) File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/multiprocessing/connection.py", line931, inwait ready= selector.select(timeout) File"/opt/hostedtoolcache/Python/3.8.13/x64/lib/python3.8/selectors.py", line415, inselect fd_event_list=self._selector.poll(timeout)+++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
Tracebacks in PyPy are similar on theconcurrent.futures.process level. Tracebacks in Windows are different in the lower-level areas, but again similar on theconcurrent.futures.process level.
Linked PRs:
Linked PRs
Metadata
Metadata
Assignees
Projects
Status