Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue30703

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:Non-reentrant signal handler (test_multiprocessing_forkserver hangs)
Type:behaviorStage:resolved
Components:macOS, TestsVersions:Python 3.7, Python 3.6
process
Status:closedResolution:fixed
Dependencies:Superseder:
Assigned To:Nosy List: eric.snow, kristjan.jonsson, mattbillenstein, ned.deily, neologix, njs, pitrou, ronaldoussoren, vstinner
Priority:normalKeywords:buildbot

Created on2017-06-19 15:20 byvstinner, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.

Pull Requests
URLStatusLinkedEdit
PR 2408closedvstinner,2017-06-26 13:46
PR 2415mergedpitrou,2017-06-26 21:54
PR 2527mergedpitrou,2017-07-01 15:26
PR 2581mergedmasamoto,2017-07-05 08:01
Messages (25)
msg296348 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2017-06-19 15:20
regrtest hangs while running test_multiprocessing_forkserver in a worker process on the "x86-64 Sierra 3.x" buildbot.Bug seen twice: build 343 and 337.http://buildbot.python.org/all/builders/x86-64%20Sierra%203.x/builds/337/steps/test/logs/stdioRun tests in parallel using 2 child processes...0:29:36 load avg: 5.24 [402/406] test_deque passed -- running: test_multiprocessing_forkserver (1346 sec)0:29:37 load avg: 5.24 [403/406] test_descrtut passed -- running: test_multiprocessing_forkserver (1346 sec)0:29:37 load avg: 4.90 [404/406] test_weakset passed -- running: test_multiprocessing_forkserver (1347 sec)0:29:40 load avg: 4.90 [405/406] test_distutils passed -- running: test_multiprocessing_forkserver (1350 sec)command timed out: 1200 seconds without output running ['make', 'buildbottest', 'TESTOPTS=-j2', 'TESTPYTHONOPTS=', 'TESTTIMEOUT=900'], attempting to killprocess killed by signal 9program finished with exit code -1elapsedTime=2983.908679
msg296376 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2017-06-19 21:17
Different issue, but same behaviour: test hangs and then killed by buildbot, whereas a single test was still running.See alsobpo-30351 which tracks similar bugs, but on Python 2.7.I really hate such bugs :-(http://buildbot.python.org/all/builders/AMD64%20Debian%20root%203.x/builds/928/steps/test/logs/stdio...0:15:10 load avg: 0.52 [404/406] test_unicodedata passed -- running: test_pydoc (401 sec)0:15:10 load avg: 0.52 [405/406] test_exception_hierarchy passed -- running: test_pydoc (401 sec)command timed out: 1200 seconds without output running ['make', 'buildbottest', 'TESTOPTS=-j2', 'TESTPYTHONOPTS=', 'TESTTIMEOUT=900'], attempting to killprocess killed by signal 9program finished with exit code -1elapsedTime=2111.415970
msg296419 -(view)Author: Matt Billenstein (mattbillenstein)Date: 2017-06-20 12:49
I've been debugging this and I can repro on El Capitan on a different machine as well -- it's an infrequent hang, I've been running in a loop:mattb@mattb-mbp:~/src/misc/cpython master$ for i in $(seq 1000); do echo "Run: $i -- $(date)"; time./python.exe -m test --timeout 180 test_multiprocessing_forkserver || break; sleep 10; doneAnd in this case after ~30 successful runs:Run: 31 -- Tue Jun 20 05:12:19 PDT 2017Run tests sequentially0:00:00 load avg: 4.28 [1/1] test_multiprocessing_forkserverTimeout (0:03:00)!Thread 0x00007fffa1b3a3c0 (most recent call first):  File "/Users/mattb/src/misc/cpython/Lib/selectors.py", line 415 in select  File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/connection.py", line 916 in wait  File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/popen_fork.py", line 45 in wait  File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/process.py", line 121 in join  File "/Users/mattb/src/misc/cpython/Lib/test/_test_multiprocessing.py", line 431 in test_many_processes  File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 605 in run  File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 653 in __call__  File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run  File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__  File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run  File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__  File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run  File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__  File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1772 in run  File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1896 in _run_suite  File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1936 in run_unittest  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 168 in test_runner  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 169 in runtest_inner  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 137 in runtest  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 374 in run_tests_sequential  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 454 in run_tests  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 530 in _main  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 504 in main  File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 573 in main  File "/Users/mattb/src/misc/cpython/Lib/test/__main__.py", line 2 in <module>  File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 85 in _run_code  File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 193 in _run_module_as_mainreal    3m0.115suser    0m5.804ssys     0m0.683sI also ran ~600 passes overnight on this test on an Ubuntu 16.04 machine without a hang, so it seems to be OSX specific.
msg296420 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2017-06-20 12:53
> I've been debugging this and I can repro on El Capitan on a different machine as well -- it's an infrequent hang, I've been running in a loop:Hum, a timeout of 3 minutes seems short for test_multiprocessing_forkserver. How long does it take *usually* to run this test?At least, we have a first clue: test_many_processes() ofLib/test/_test_multiprocessing.py.
msg296460 -(view)Author: Matt Billenstein (mattbillenstein)Date: 2017-06-20 15:39
It consistently takes between ~61 and ~73 seconds with this setup.
msg296641 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-22 16:11
Matt, if you try the following command, it will run the specific test in a loop in verbose mode:$ ./python -m test --timeout=30 -F -m test_many_processes -v test_multiprocessing_forkserver
msg296774 -(view)Author: Matt Billenstein (mattbillenstein)Date: 2017-06-24 15:56
Cool -- do you need me to do something more to help debug this?
msg296775 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-24 15:57
It would be nice to know if there is additional output (for example exceptions happening in other threads or processes) when you run that command and manage to trigger a hang.
msg296777 -(view)Author: Matt Billenstein (mattbillenstein)Date: 2017-06-24 16:56
I don't see anything odd -- it runs for awhile and then times out once it's deadlocked:0:03:18 load avg: 3.20 [224] test_multiprocessing_forkserver             test_many_processes (test.test_multiprocessing_forkserver.WithProcessesTestProcess) ... ok                                                         test_many_processes (test.test_multiprocessing_forkserver.WithThreadsTestProcess) ... skipped 'test not appropriate for threads'                   ----------------------------------------------------------------------   Ran 2 tests in 0.849s               OK (skipped=1)                      0:03:18 load avg: 3.11 [225] test_multiprocessing_forkserver             test_many_processes (test.test_multiprocessing_forkserver.WithProcessesTestProcess) ... ok                                                         test_many_processes (test.test_multiprocessing_forkserver.WithThreadsTestProcess) ... skipped 'test not appropriate for threads'                   ----------------------------------------------------------------------   Ran 2 tests in 0.865s               OK (skipped=1)                      0:03:19 load avg: 3.11 [226] test_multiprocessing_forkserver             test_many_processes (test.test_multiprocessing_forkserver.WithProcessesTestProcess) ... Timeout (0:00:30)!                                         Thread 0x00007fff790c1000 (most recent call first):                        File "/Users/mattb/src/misc/cpython/Lib/selectors.py", line 415 in select                                                                          File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/connection.py", line 916 in wait                                                           File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/popen_fork.py", line 45 in wait                                                            File "/Users/mattb/src/misc/cpython/Lib/multiprocessing/process.py", line 121 in join                                                              File "/Users/mattb/src/misc/cpython/Lib/test/_test_multiprocessing.py", line 431 in test_many_processes                                            File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 605 in run                                                                         File "/Users/mattb/src/misc/cpython/Lib/unittest/case.py", line 653 in __call__                                                                    File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run                                                                        File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__                                                                    File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run                                                                        File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__                                                                    File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 122 in run                                                                        File "/Users/mattb/src/misc/cpython/Lib/unittest/suite.py", line 84 in __call__                                                                    File "/Users/mattb/src/misc/cpython/Lib/unittest/runner.py", line 176 in run                                                                       File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1896 in _run_suite                                                         File "/Users/mattb/src/misc/cpython/Lib/test/support/__init__.py", line 1936 in run_unittest                                                       File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 168 in test_runner                                                      File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 169 in runtest_inner                                                    File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/runtest.py", line 137 in runtest                                                          File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 374 in run_tests_sequential                                                File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 454 in run_tests                                                           File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 530 in _main                                                               File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 504 in main                                                                File "/Users/mattb/src/misc/cpython/Lib/test/libregrtest/main.py", line 573 in main                                                                File "/Users/mattb/src/misc/cpython/Lib/test/__main__.py", line 2 in <module>                                                                      File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 85 in _run_code                                                                            File "/Users/mattb/src/misc/cpython/Lib/runpy.py", line 193 in _run_module_as_main
msg296778 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-24 16:59
Thanks for trying :-)  Hmm, that's annoying.  I don't know if you'd like to give me shell access to the machine (and a CPython checkout I can play with, perhaps)?
msg296824 -(view)Author: Matt Billenstein (mattbillenstein)Date: 2017-06-25 14:20
Yes, I'll email you the details.
msg296876 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-26 11:46
Ok, I think I've managed to dig to the core issue.  It is actually the same issue ashttps://bugs.python.org/issue11768 (which was wrongly closed as fixed, apparently :-)).Py_AddPendingCall() calls PyThread_acquire_lock() to try and take the pending calls lock.  Unfortunately, PyThread_acquire_lock() is not reentrant in the case where semaphores are not used (e.g. on OS X).  We can probably fix that first issue by calling pthread_mutex_trylock() instead of pthread_mutex_lock().There is a second more fundamental issue, though, which is that PyThread_acquire_lock() calls into functions that are not async-signal-safe (seehttp://man7.org/linux/man-pages/man7/signal-safety.7.html for a list).  So, depending on the particular OS and libc implementation, PyThread_acquire_lock() can fail in mysterious ways (including hang the process) when called from a signal handler.So perhaps the ultimate fix would be to remove the OS-based locking in Py_AddPendingCall and use a busy spinwait...  The performance implications may be bad, though.
msg296879 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-26 11:51
Using pthread_mutex_trylock() *and* disabling the CHECK_STATUS_PTHREAD() calls (which use the non-async-signal-safe fprintf()) at least seems to suppress the hangs on Matt's OS X machine (after more than 1000 runs).
msg296881 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2017-06-26 11:55
> It is actually the same issue ashttps://bugs.python.org/issue11768 (which was wrongly closed as fixed, apparently :-))Well, don't be confused by the issue title. The only made change is to only call Py_AddPendingCall() only once, instead of calling it for each received signal.I don't know if the signal handler is really reentrant or not.Signal handling is hard :-(
msg296895 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2017-06-26 13:44
If I understood corretly, the problem is that the Python C signal handler is not reentrant because it calls Py_AddPendingCall() which uses a lock and a list.Before, the signal handler queued a new call to checksignals_witharg() (which just calls PyErr_CheckSignals()) for each received signal. Now, we only queue a single call to checksignals_witharg().To prevent reentrency issues, can't we hardcoded a call to PyErr_CheckSignals() in ceval.c when SIGNAL_PENDING_CALLS() is called?Py_AddPendingCall() feature is rarely used, it's mostly used for processing signals, no? Calling PyErr_CheckSignals() when no signal was received is cheap, so it shouldn't hurt.
msg296897 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-26 13:50
I think Kristjan uses Py_AddPendingCall, so ideally we would make it more reliable.  However, we're right that for the purpose of signal delivery, we can hardcall a call inside Py_MakePendingCalls().
msg296903 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-26 14:25
It seemshttps://github.com/python/cpython/pull/2408 alone solves the issue on Matt's machine.  It's also a pleasantly simple patch :-)
msg296905 -(view)Author: Kristján Valur Jónsson (kristjan.jonsson)*(Python committer)Date: 2017-06-26 14:55
Thanks for the mention, @pitrou.CCP was using Py_AddPendingCall but not from signal handlers, but external threads.  Also on windows only.You'll also be happy to know that I have left CCP and the Eve codebase is being kept stable while regularly adding security patches from the 2.7 codebase, as far as I know :)
msg297208 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-28 19:03
For the record,https://github.com/python/cpython/pull/2415 is ready.
msg297234 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-28 21:29
New changesetc08177a1ccad2ed0d50898c2731b518c631aed14 by Antoine Pitrou in branch 'master':bpo-30703: Improve signal delivery (#2415)https://github.com/python/cpython/commit/c08177a1ccad2ed0d50898c2731b518c631aed14
msg297361 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-30 08:37
I will wait a bit and then backport this to 3.6.
msg297405 -(view)Author: Matt Billenstein (mattbillenstein)Date: 2017-06-30 14:14
Verified this by letting the test_many_processes loop overnight (42bc8beadd49">master@42bc8beadd49)-- 34k passes later and no hangs.  Nice work!
msg297413 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2017-06-30 14:47
I concur with Matt: nice job Antoine, thanks for making Python more reliable ;-)
msg297418 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-06-30 15:28
Thanks for running such a lengthy test, Matt :-)
msg297484 -(view)Author: Antoine Pitrou (pitrou)*(Python committer)Date: 2017-07-01 17:12
New changeset3024c0529077f5cff0b32dc84b5923c8fba99a87 by Antoine Pitrou in branch '3.6':[3.6]bpo-30703: Improve signal delivery (GH-2415) (#2527)https://github.com/python/cpython/commit/3024c0529077f5cff0b32dc84b5923c8fba99a87
History
DateUserActionArgs
2022-04-11 14:58:47adminsetgithub: 74888
2019-03-29 16:08:52eric.snowsetnosy: +eric.snow
2017-07-05 08:01:51masamotosetpull_requests: +pull_request2652
2017-07-01 17:14:23pitrousetstatus: open -> closed
stage: backport needed -> resolved
2017-07-01 17:12:08pitrousetmessages: +msg297484
2017-07-01 15:26:01pitrousetpull_requests: +pull_request2592
2017-06-30 15:28:06pitrousetmessages: +msg297418
2017-06-30 14:47:33vstinnersetmessages: +msg297413
2017-06-30 14:14:49mattbillensteinsetmessages: +msg297405
2017-06-30 08:37:48pitrousetresolution: fixed
stage: patch review -> backport needed
messages: +msg297361
versions: + Python 3.6
2017-06-28 21:29:31pitrousetmessages: +msg297234
2017-06-28 19:03:57pitrousetmessages: +msg297208
2017-06-26 21:54:45pitrousetpull_requests: +pull_request2473
2017-06-26 14:55:13kristjan.jonssonsetmessages: +msg296905
2017-06-26 14:26:03pitrousetstage: patch review
2017-06-26 14:25:53pitrousetdependencies: -PyThread_acquire_lock can block even when asked not ot,Make CHECK_STATUS_PTHREAD signal-safe
2017-06-26 14:25:49pitrousetmessages: +msg296903
2017-06-26 13:50:50pitrousetnosy: +kristjan.jonsson
messages: +msg296897
2017-06-26 13:46:02vstinnersetpull_requests: +pull_request2456
2017-06-26 13:45:01vstinnersettitle: test_multiprocessing_forkserver hangs on the master branch -> Non-reentrant signal handler (test_multiprocessing_forkserver hangs)
2017-06-26 13:44:40vstinnersetmessages: +msg296895
2017-06-26 12:52:55pitrousettype: behavior
versions: - Python 2.7, Python 3.5, Python 3.6
2017-06-26 12:44:15pitrousetdependencies: +PyThread_acquire_lock can block even when asked not ot,Make CHECK_STATUS_PTHREAD signal-safe
2017-06-26 11:55:08vstinnersetmessages: +msg296881
2017-06-26 11:51:13pitrousetmessages: +msg296879
2017-06-26 11:46:23pitrousetnosy: +njs,neologix

messages: +msg296876
versions: + Python 2.7, Python 3.5, Python 3.6
2017-06-25 14:20:41mattbillensteinsetmessages: +msg296824
2017-06-24 16:59:17pitrousetmessages: +msg296778
2017-06-24 16:56:21mattbillensteinsetmessages: +msg296777
2017-06-24 15:57:56pitrousetmessages: +msg296775
2017-06-24 15:56:55mattbillensteinsetmessages: +msg296774
2017-06-22 16:11:26pitrousetnosy: +pitrou
messages: +msg296641
2017-06-22 16:05:14pitrousettitle: regrtest hangs on the master branch -> test_multiprocessing_forkserver hangs on the master branch
2017-06-20 15:39:59mattbillensteinsetmessages: +msg296460
2017-06-20 12:53:02vstinnersetmessages: +msg296420
2017-06-20 12:49:28mattbillensteinsetnosy: +mattbillenstein
messages: +msg296419
2017-06-19 21:17:37vstinnersettitle: test_multiprocessing_forkserver hangs on x86-64 Sierra 3.x. buildbot -> regrtest hangs on the master branch
2017-06-19 21:17:11vstinnersetmessages: +msg296376
2017-06-19 15:21:04vstinnersetnosy: +ronaldoussoren,ned.deily
components: + macOS
2017-06-19 15:20:57vstinnercreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp