Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork33.3k
Description
Bug report
Main thread hang atdrop_gil after daemon thread exited, my Python version is 3.9.2, but I think the same issue is exists in the Python 3.10 newest version.
(gdb) info threads Id Target Id Frame * 1 Thread 0xffff909dd010 (LWP 226769) "python" 0x0000ffff909355c0 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xaaaadf99e8d4 <_PyRuntime+524>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88(gdb) bt#0 0x0000ffff909355c0 in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0xaaaadf99e8d4 <_PyRuntime+524>) at ../sysdeps/unix/sysv/linux/futex-internal.h:88#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0xaaaadf99e8d8 <_PyRuntime+528>, cond=0xaaaadf99e8a8 <_PyRuntime+480>) at pthread_cond_wait.c:508#2 __pthread_cond_wait (cond=cond@entry=0xaaaadf99e8a8 <_PyRuntime+480>, mutex=mutex@entry=0xaaaadf99e8d8 <_PyRuntime+528>) at pthread_cond_wait.c:638#3 0x0000aaaadf35d184 in drop_gil (ceval=0xaaaadf99e820 <_PyRuntime+344>, ceval@entry=0xaaaadf300274 <PyObject_Hash+116>, ceval2=ceval2@entry=0xaaaafbc61ff0, tstate=tstate@entry=0xaaaafbc64a00) at Python/ceval_gil.h:178#4 0x0000aaaadf2c05c4 in eval_frame_handle_pending (tstate=0xaaaafbc64a00) at Python/ceval.c:882#5 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0xffff83929d40, throwflag=<optimized out>) at Python/ceval.c:1402#6 0x0000aaaadf2bee1c in _PyEval_EvalFrame (throwflag=0, f=0xffff83929d40, tstate=0xaaaafbc64a00) at ./Include/internal/pycore_ceval.h:40#7 function_code_fastcall (tstate=0xaaaafbc64a00, co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:330#8 0x0000aaaadf31ac04 in _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>, tstate=<optimized out>) at ./Include/cpython/abstract.h:118#9 _PyObject_CallNoArg (func=<optimized out>) at ./Include/cpython/abstract.h:172#10 call_unbound_noarg (self=<optimized out>, func=0xffff83691ee0, unbound=<optimized out>) at Objects/typeobject.c:1515#11 slot_tp_finalize (self=<optimized out>) at Objects/typeobject.c:6992#12 0x0000aaaadf30011c in PyObject_CallFinalizer (self=self@entry=0xffff837ea6d0) at Objects/object.c:195#13 0x0000aaaadf302694 in PyObject_CallFinalizerFromDealloc (self=self@entry=0xffff837ea6d0) at Objects/object.c:213#14 0x0000aaaadf314b1c in subtype_dealloc (self=0xffff837ea6d0) at Objects/typeobject.c:1265#15 0x0000aaaadf2f3f4c in _Py_DECREF (op=<optimized out>) at ./Include/object.h:430#16 _Py_XDECREF (op=<optimized out>) at ./Include/object.h:497#17 insertdict (mp=mp@entry=0xffff8383de80, key=0xffff837450b0, hash=-5519671433175608536, value=value@entry=0xaaaadf94be68 <_Py_NoneStruct>) at Objects/dictobject.c:1123#18 0x0000aaaadf2f52d4 in PyDict_SetItem (op=op@entry=0xffff8383de80, key=<optimized out>, value=value@entry=0xaaaadf94be68 <_Py_NoneStruct>) at Objects/dictobject.c:1573#19 0x0000aaaadf2ffa48 in _PyModule_ClearDict (d=0xffff8383de80) at Objects/moduleobject.c:612#20 0x0000aaaadf2ffc84 in _PyModule_Clear (m=m@entry=0xffff837f1a40) at Objects/moduleobject.c:560#21 0x0000aaaadf3798e8 in _PyImport_Cleanup (tstate=tstate@entry=0xaaaafbc64a00) at Python/import.c:606#22 0x0000aaaadf38e63c in Py_FinalizeEx () at Python/pylifecycle.c:1426#23 0x0000aaaadf2c9110 in Py_RunMain () at Modules/main.c:679#24 0x0000aaaadf2c99f8 in pymain_main (args=args@entry=0xffffd3754d88) at Modules/main.c:707#25 0x0000aaaadf2c9ad4 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:731#26 0x0000ffff906c1f80 in __libc_start_main (main=0x0, argc=0, argv=0x0, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:308#27 0x0000aaaadf2c87bc in _start ()Backtrace stopped: previous frame identical to this frame (corrupt stack?)(gdb) py-btTraceback (most recent call first): File "/home/ossadm/zhao/runtime.py", line 8, in __del__ def __del__(self):(gdb)Rough analysis
In Python version 3.9.2, thetake_gil is implemented as following:
staticvoidtake_gil(PyThreadState*tstate){...while (_Py_atomic_load_relaxed(&gil->locked)) {unsigned longsaved_switchnum=gil->switch_number;unsigned longinterval= (gil->interval >=1 ?gil->interval :1);inttimed_out=0;COND_TIMED_WAIT(gil->cond,gil->mutex,interval,timed_out);/* If we timed out and no switch occurred in the meantime, it is time to ask the GIL-holding thread to drop it. */if (timed_out&&_Py_atomic_load_relaxed(&gil->locked)&&gil->switch_number==saved_switchnum) {if (tstate_must_exit(tstate)) {MUTEX_UNLOCK(gil->mutex);PyThread_exit_thread();// [2] }assert(is_tstate_valid(tstate));SET_GIL_DROP_REQUEST(interp);// [1] } }...}
A daemon thread may callSET_GIL_DROP_REQUEST and then exits fromPyThread_exit_thread, so this thread will not get GIL really. This happens if the main thread is on the finalizing procedure and it calls_PyRuntimeState_SetFinalizing inPy_FinalizeEx before the daemon thread gets GIL (The daemon thread wait 5ms at most before exit).
After that, the flaggil_drop_request is set, but only the main thread is left. Then, if the main thread runs intodrop_gil again, it will hang atFORCE_SWITCHING:
COND_WAIT(gil->switch_cond,gil->switch_mutex);
Reproducible example
This issue could be reproduced by:
$python-mhttp.server5678&$for ((i=0;;i++));doecho $i;python-Wallservice.py;done
And the scripts used are:
runtime.py
importsocketclassTestUnclosedSocket:def__init__(self):self.s=socket.socket()def__del__(self):self.s.connect(('127.0.0.1',5678))ins=TestUnclosedSocket()
service.py
importthreadingimportruntimeimportsyssys.setswitchinterval(0.001)defcalc():sum=0foriinrange(100000):sum+=i*idefdaemon_func():whileTrue:calc()if__name__=='__main__':threading.Thread(target=daemon_func,name="daemon",daemon=True).start()calc()
I change GIL switch interval to 1ms to increase the probability here.