It took me 4 years to fix a nasty bug in the famous Python GIL (GlobalInterpreter Lock), one of the most critical part of Python. I had to dig theGit history to find achange made 26 years ago byGuido van Rossum: atthis time,threads were something esoteric. Let me tell you my story.
Fatal Python error caused by a C thread and the GIL
In March 2014,Steve Dower reported the bugbpo-20891 when a "C thread" uses the Python CAPI:
In Python 3.4rc3, callingPyGILState_Ensure() from a thread that wasnot created by Python and without any calls toPyEval_InitThreads()will cause a fatal exit:
Fatal Python error: take_gil: NULL tstate
My first comment:
IMO it's a bug inPyEval_InitThreads().

PyGILState_Ensure() fix
I forgot the bug during 2 years. In March 2016, I modified Steve's testprogram to make it compatible with Linux (the test was written for Windows). Isucceeded to reproduce the bug on my computer and I wrote a fix forPyGILState_Ensure().
One year later, november 2017,Marcin Kasperski asked:
Is this fix released? I can't find it in the changelog…
Oops, again, I completely forgot this issue! This time, not only Iapplied myPyGILState_Ensure() fix, but I also wrote theunit testtest_embed.test_bpo20891():
Ok, the bug is now fixed in Python 2.7, 3.6 and master (future 3.7). On 3.6and master, the fix comes with an unit test.
My fix for the master branch, commitb4d1e1f7:
bpo-20891: Fix PyGILState_Ensure() (#4650)When PyGILState_Ensure() is called in a non-Python thread beforePyEval_InitThreads(), only call PyEval_InitThreads() after callingPyThreadState_New() to fix a crash.Add an unit test in test_embed.
And I closed the issuebpo-20891...
Random crash of the test on macOS
Everything was fine... but one week later, I noticedrandom crashes onmacOS buildbots on my newly added unit test. I succeeded to reproduce the bugmanually, example of crash at the 3rd run:
macbook:master haypo$ while true; do ./Programs/_testembed bpo20891 ||break; date; doneLun 4 déc 2017 12:46:34 CETLun 4 déc 2017 12:46:34 CETLun 4 déc 2017 12:46:34 CETFatal Python error: PyEval_SaveThread: NULL tstateCurrent thread 0x00007fffa5dff3c0 (most recent call first):Abort trap: 6
test_embed.test_bpo20891() on macOS showed a race condition inPyGILState_Ensure(): the creation of the GIL lock itself... was notprotected by a lock! Adding a new lock to check if Python currently has the GILlock doesn't make sense...
I proposed an incomplete fix forPyThread_start_new_thread():
I found a working fix: callPyEval_InitThreads() inPyThread_start_new_thread(). So the GIL is created as soon as a secondthread is spawned. The GIL cannot be created anymore while two threads arerunning. At least, with thepython binary. It doesn't fix the issue ifa thread is not spawned by Python, but this thread callsPyGILState_Ensure().
Why not always create the GIL?
Antoine Pitrou asked a simple question:
Why notalways callPyEval_InitThreads() at interpreterinitialization? Are there any downsides?
Thanks togit blame andgit log, I found the origin of the codecreating the GIL "on demand",a change made 26 years ago!
commit 1984f1e1c6306d4e8073c28d2395638f80ea509bAuthor: Guido van Rossum <guido@python.org>Date: Tue Aug 4 12:41:02 1992 +0000 * Makefile adapted to changes below. * split pythonmain.c in two: most stuff goes to pythonrun.c, in the library. * new optional built-in threadmodule.c, build upon Sjoerd's thread.{c,h}. * new module from Sjoerd: mmmodule.c (dynamically loaded). * new module from Sjoerd: sv (svgen.py, svmodule.c.proto). * new files thread.{c,h} (from Sjoerd). * new xxmodule.c (example only). * myselect.h: bzero -> memset * select.c: bzero -> memset; removed global variable(...)+void+init_save_thread()+{+#ifdef USE_THREAD+ if (interpreter_lock)+ fatal("2nd call to init_save_thread");+ interpreter_lock = allocate_lock();+ acquire_lock(interpreter_lock, 1);+#endif+}+#endifMy guess was that the intent of dynamically created GIL is to reduce the"overhead" of the GIL for applications only using a single Python thread (neverspawn a new Python thread).
Luckily,Guido van Rossum was around and was able to elaborate therationale:
Yeah, the original reasoning was thatthreads were something esoteric andnot used by most code, and at the time we definitely felt thatalwaysusing the GIL would cause a (tiny) slowdown andincrease the risk ofcrashes due to bugs in the GIL code. I'd be happy to learn that we nolonger need to worry about this andcan just always initialize it.
Second fix for Py_Initialize() proposed
I proposed asecond fix forPy_Initialize() to always create the GIL assoon as Python starts, and no longer "on demand", to prevent any risk of a racecondition:
+ /* Create the GIL */+ PyEval_InitThreads();
Nick Coghlan asked if I could you run my patch through the performancebenchmarks. I ranpyperformance on myPR 4700. Differences of at least 5%:
haypo@speed-python$ python3 -m perf compare_to \ 2017-12-18_12-29-master-bd6ec4d79e85.json.gz \ 2017-12-18_12-29-master-bd6ec4d79e85-patch-4700.json.gz \ --table --min-speed=5+----------------------+--------------------------------------+-------------------------------------------------+| Benchmark | 2017-12-18_12-29-master-bd6ec4d79e85 | 2017-12-18_12-29-master-bd6ec4d79e85-patch-4700 |+======================+======================================+=================================================+| pathlib | 41.8 ms | 44.3 ms: 1.06x slower (+6%) |+----------------------+--------------------------------------+-------------------------------------------------+| scimark_monte_carlo | 197 ms | 210 ms: 1.07x slower (+7%) |+----------------------+--------------------------------------+-------------------------------------------------+| spectral_norm | 243 ms | 269 ms: 1.11x slower (+11%) |+----------------------+--------------------------------------+-------------------------------------------------+| sqlite_synth | 7.30 us | 8.13 us: 1.11x slower (+11%) |+----------------------+--------------------------------------+-------------------------------------------------+| unpickle_pure_python | 707 us | 796 us: 1.13x slower (+13%) |+----------------------+--------------------------------------+-------------------------------------------------+Not significant (55): 2to3; chameleon; chaos; (...)
Oh, 5 benchmarks were slower. Performance regressions are not welcome inPython: we are working hard onmaking Python faster!
Skip the failing test before Christmas
I didn't expect that 5 benchmarks would be slower. It required furtherinvestigation, but I didn't have time for that and I was too shy or ashame totake the responsibility of pushing a performance regression.
Before the christmas holiday, no decision was taken whereastest_embed.test_bpo20891() was still failing randomly on macOS buildbots.Iwas not confortable to touch a critical part of Python, its GIL, justbefore leaving for two weeks. So I decided to skiptest_bpo20891() untilI'm back.
No gift for you, Python 3.7.

New benchmark run and second fix applied to master
At the end of january 2018, I ran again the 5 benchmarks made slower by my PR.I ran these benchmarks manually on my laptop using CPU isolation:
vstinner@apu$ python3 -m perf compare_to ref.json patch.json --tableNot significant (5): unpickle_pure_python; sqlite_synth; spectral_norm; pathlib; scimark_monte_carlo
Ok, it confirms that my second fix hasno significant impact onperformances according to thePython "performance" benchmark suite.
I decided topush my fix to the master branch, commit2914bb32:
bpo-20891: Py_Initialize() now creates the GIL (#4700)The GIL is no longer created "on demand" to fix a race condition whenPyGILState_Ensure() is called in a non-Python thread.
Then I reenabledtest_embed.test_bpo20891() on the master branch.
No second fix for Python 2.7 and 3.6, sorry!
Antoine Pitrou considered that backport for Python 3.6should not bemerged:
I don't think so. People can already callPyEval_InitThreads().
Guido van Rossum didn't want to backport this change neither. So I onlyremovedtest_embed.test_bpo20891() from the 3.6 branch.
I didn't apply my second fix to Python 2.7 neither for the same reason.Moreover, Python 2.7 has no unit test, since it was too difficult to backportit.
At least, Python 2.7 and 3.6 got my firstPyGILState_Ensure() fix.
Conclusion
Python still has some race conditions in corner cases. Such bug was found inthe creation of the GIL when a C thread starts using the Python API. I pushed afirst fix, but a new and different race condition was found on macOS.
I had to dig into the very old history (1992) of the Python GIL. Luckily,Guido van Rossum was also able to elaborate the rationale.
After a glitch in benchmarks, we agreed to modify Python 3.7 to always createthe GIL, instead of creating the GIL "on demand". The change has no significantimpact on performances.
It was also decided to leave Python 2.7 and 3.6 unchanged, to prevent any riskof regression: continue to create the GIL "on demand".
It took me 4 years to fix a nasty bug in the famous Python GIL. I am neverconfortable when touching suchcritical part of Python. I am now happy thatthe bug is behind us: it's now fully fixed in the future Python 3.7!
Seebpo-20891 for the full story.Thanks to all developers who helped me to fix this bug!