Description
The test_interpreter_shutdown() test of test_concurrent_futures.test_shutdown has a race condition. On purpose, the test doesn't wait until the executor completes (!). Moreover, it expects the executor to always be able to submit its job, and the job to complete successfully! It's a very optimistic bet.
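To illustrate, here is a minimal sketch of the racy pattern (my own simplification, not the exact test code): the script exits without shutting the executor down, so the executor's background machinery races with interpreter finalization.

```python
# Minimal sketch of the racy pattern (not the exact test code): the
# executor is never shut down explicitly, so job submission races with
# interpreter finalization when the script exits.
from concurrent.futures import ProcessPoolExecutor
import time

def sleep_and_print(delay, message):
    time.sleep(delay)
    print(message)

if __name__ == "__main__":
    executor = ProcessPoolExecutor(5)
    executor.submit(sleep_and_print, 1.0, "apple")
    # No executor.shutdown(wait=True) and no with-block: the test bets
    # that the job is still submitted and completes successfully.
```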
See also issue #107219: test_concurrent_futures: test_crash_big_data() hangs randomly on Windows.
When Python is shutting down, Py_Finalize() quickly blocks the creation of new threads in _thread.start_new_thread():
cpython/Modules/_threadmodule.c, lines 1161 to 1165 at commit 3bfa24e:

```c
if (interp->finalizing) {
    PyErr_SetString(PyExc_RuntimeError,
                    "can't create new thread at interpreter shutdown");
    return NULL;
}
```
This exception was added recently (last June) by commit ce558e6: see issue gh-104690 for the rationale.
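A small Python-level sketch of the new behavior, assuming (per the finalization ordering discussed below) that thread creation is already blocked by the time atexit callbacks run:

```python
# Sketch (Python 3.12+): by the time atexit callbacks run, Py_Finalize()
# has already set interp->finalizing, so starting a thread fails.
import atexit
import threading

def late_spawn():
    try:
        threading.Thread(target=print, args=("too late",)).start()
    except RuntimeError as exc:
        # "can't create new thread at interpreter shutdown"
        print("blocked:", exc)

atexit.register(late_spawn)
```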
The multiprocessing executor spawns an _ExecutorManagerThread thread which runs its "main loop" in its run() method:
cpython/Lib/concurrent/futures/process.py, lines 335 to 378 at commit 3bfa24e:

```python
def run(self):
    # Main loop for the executor manager thread.
    while True:
        self.add_call_item_to_queue()

        result_item, is_broken, cause = self.wait_result_broken_or_wakeup()

        if is_broken:
            self.terminate_broken(cause)
            return
        if result_item is not None:
            self.process_result_item(result_item)

            process_exited = result_item.exit_pid is not None
            if process_exited:
                p = self.processes.pop(result_item.exit_pid)
                p.join()

            # Delete reference to result_item to avoid keeping references
            # while waiting on new results.
            del result_item

            if executor := self.executor_reference():
                if process_exited:
                    with self.shutdown_lock:
                        executor._adjust_process_count()
                else:
                    executor._idle_worker_semaphore.release()
                del executor

        if self.is_shutting_down():
            self.flag_executor_shutting_down()

            # When only canceled futures remain in pending_work_items, our
            # next call to wait_result_broken_or_wakeup would hang forever.
            # This makes sure we have some running futures or none at all.
            self.add_call_item_to_queue()

            # Since no new work items can be added, it is safe to shutdown
            # this thread if there are no pending work items.
            if not self.pending_work_items:
                self.join_executor_internals()
                return
```
It tries to submit new jobs to the worker processes through a queue, but oops, the Python main thread is finalizing (it called Py_Finalize())! There is no notification system to notify threads that Python is being finalized.
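Concretely, the submission path itself can need a new thread: multiprocessing.Queue.put() starts its feeder thread lazily on first use. A simplified sketch (my reading, not the exact concurrent.futures code) of where the RuntimeError would then surface:

```python
# Simplified sketch: the manager thread's put() can be the call that
# tries to spawn the queue's feeder thread, and therefore the call that
# raises "can't create new thread at interpreter shutdown".
import multiprocessing as mp

def try_submit(call_queue: "mp.Queue", work_item):
    try:
        call_queue.put(work_item)  # may start a feeder thread internally
    except RuntimeError as exc:
        print("submission failed during shutdown:", exc)
        return False
    return True
```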
Moreover, there are 3 "finalization" states:

- interp->finalizing -- used by _thread.start_new_thread() to block thread creation during Python finalization
- runtime->_finalizing -- used by sys.is_finalizing(), Py_IsFinalizing() and _PyRuntimeState_GetFinalizing(runtime)
- interp->_finalizing -- used by ceval.c to decide if a Python thread "must exit" or not; as soon as it's set, all Python threads must exit as soon as they attempt to acquire the GIL
These 3 states are not set at the same time.
1. Calling Py_Finalize() sets interp->finalizing to 1 as soon as possible: spawning new threads is immediately blocked (which is a good thing to get a reliable finalization!)
2. Py_Finalize() calls threading._shutdown() which blocks until all non-daemon threads complete
3. Py_Finalize() calls atexit callbacks
4. And only then, Py_Finalize() sets runtime->_finalizing and interp->_finalizing to the Python thread state (tstate) which calls Py_Finalize()
The delay between (1) and (4) can be quite long: a thread can take several milliseconds, if not seconds, to complete.
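Extending the atexit sketch above: inside that window, between steps (1) and (4), sys.is_finalizing() can still report False even though thread creation is already blocked. This follows from the ordering just listed, but treat the sketch as my reading rather than documented behavior.

```python
# Sketch of the window between (1) and (4): during atexit, only
# interp->finalizing is set, so the two observations disagree.
import atexit
import sys
import threading

def observe_window():
    print("sys.is_finalizing():", sys.is_finalizing())  # expected: False
    try:
        threading.Thread(target=print, args=("hi",)).start()
    except RuntimeError as exc:
        print("but thread creation is already blocked:", exc)

atexit.register(observe_window)
```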
Can multiprocessing or concurrent.futures check if Python is finalizing, or be notified? Well, did you hear about Time-of-check to time-of-use race conditions? Even if it were possible, I don't think that we can "check" if it's safe to spawn a thread just before spawning a thread, since the main thread can decide to finalize Python "at any time". It will become even more tricky with Python nogil ;-)
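In code, the TOCTOU problem looks like this hypothetical helper (maybe_spawn() is my name for illustration, not an existing API): any check is stale by the time the thread actually starts.

```python
# Hypothetical helper illustrating the time-of-check/time-of-use race.
import sys
import threading

def maybe_spawn(target):
    if sys.is_finalizing():     # time of check
        return None
    # <-- the main thread may enter Py_Finalize() right here
    t = threading.Thread(target=target)
    t.start()                   # time of use: can still raise RuntimeError
    return t
```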
So what's left? Well, multiprocessing and concurrent.futures should be optimistic: call Python functions and only then check for exceptions. Depending on the exception, they can decide how to handle it. I would suggest to exit as soon as possible, and try to clean up resources if possible.
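The optimistic pattern, sketched with a hypothetical helper (spawn_or_cleanup() is my name, not existing code): attempt the operation, then decide from the exception.

```python
# Hypothetical sketch of the "optimistic" approach: don't pre-check,
# just act and handle the shutdown exception afterwards.
import threading

def spawn_or_cleanup(target, on_shutdown):
    t = threading.Thread(target=target)
    try:
        t.start()
    except RuntimeError:
        # Interpreter is shutting down: exit early, clean up what we can.
        on_shutdown()
        return None
    return t
```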
Another option would be to make multiprocessing and concurrent.futures more deterministic. Rather than spawning threads and processes in the background "on demand" and hoping that everything will be fine, add more synchronization to "wait" until everything is ready to submit jobs. I think that I already tried this approach in the past, but @pitrou didn't like it since it made some workloads slower: you may not always need to actually submit jobs, and you may not always need all threads and processes.
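A generic sketch of what "more deterministic" could mean (my own illustration, not a proposed patch): spawn workers eagerly and block until they are all up before accepting any work, at the cost of paying for workers you may never use.

```python
# Sketch: eager worker startup with a barrier, instead of on-demand
# spawning. Jobs are only submitted once every worker thread exists.
import queue
import threading

NUM_WORKERS = 4
jobs: "queue.Queue" = queue.Queue()
ready = threading.Barrier(NUM_WORKERS + 1)  # workers + coordinator

def worker():
    ready.wait()                      # signal "fully started"
    while (item := jobs.get()) is not None:
        print("processed", item)

workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
ready.wait()                          # no submission before all workers run
for i in range(3):
    jobs.put(i)
for _ in workers:
    jobs.put(None)                    # sentinel: tell each worker to exit
for w in workers:
    w.join()
```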
Well, I don't know these complex modules well enough to tell which option is the least bad :-)
Finally, as usual, I beg you to make these APIs less magical, and enforce more explicit resource management! It shouldn't even be possible to not wait until an executor completes. It should be enforced by loudly emitting ResourceWarning warnings :-) Well, that's my opinion. I know that it's not shared by @pitrou :-)
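For what it's worth, the with-block form of the existing API already expresses that explicit style: __exit__ calls shutdown(wait=True), so no job can outlive the interpreter.

```python
# Explicit resource management with the existing API: leaving the
# with-block calls executor.shutdown(wait=True), so the executor cannot
# leak past the block, let alone past interpreter shutdown.
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as executor:
        future = executor.submit(pow, 2, 10)
        print(future.result())  # 1024
```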