Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue28782

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:SEGFAULT when running a given coroutine
Type:crashStage:resolved
Components:Interpreter CoreVersions:Python 3.7, Python 3.6
process
Status:closedResolution:fixed
Dependencies:Superseder:
Assigned To: vstinnerNosy List: Demur Rumed, martius, ned.deily, python-dev, r.david.murray, serhiy.storchaka, vstinner, yselivanov
Priority:Keywords:patch

Created on2016-11-23 18:03 bymartius, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
test.pymartius,2016-11-23 18:03reduced test case
pygen_yf.patchvstinner,2016-11-23 23:11review
lasti.patchvstinner,2016-11-23 23:57review
pygen_yf-2.patchvstinner,2016-11-24 14:25review
Pull Requests
URLStatusLinkedEdit
PR 552closeddstufft,2017-03-31 16:36
Messages (18)
msg281573 -(view)Author: Martin Richard (martius)*Date: 2016-11-23 18:03
Hi,I stumbled upon a SEGFAULT while trying Python 3.6.0 on a project usingasyncio. I can't really figure out what's happening, so I reduced the originalcode triggering the bug down to a reproducible case (which looks a bit clunky,sorry). The case has been tested on two Linux systems (Archlinux and Debian),and with several versions of Python.The bug appears between 3.6.0a4 (most recent version tested not affected) and3.60b1 (so before the C asyncio module I believe), and is not fixed in thecurrent repository tip (changeset: 105345:3addf93f4111).I also produced a traceback using gdb (see bellow).The segfault happens around the "await" in the body of Cursor._read_data(),interestingly, if I change anything in the body of the method, the SEGFAULTdisappears and the code works as expected. Also, it seems that calling it froman other coroutine (main() in the example) is required to trigger the bug.Cheers,MartinCase (also attached as test.py) : import asyncioloop = asyncio.get_event_loop()class Connection:    def read_until(self, *args, **kwargs):        return self    async def all(self):        return b"\n"class Cursor:    def __init__(self):        self._connection = Connection()        self._max_bytes = 100        self._data = bytearray()    async def _read_data(self):        # XXX segfault there, if I change anything in the code, it works...        while True:            data = await self._connection.read_until(                b'\n', max_bytes=self._max_bytes).all()            self._max_bytes -= len(data)            if data == b'\n':                break            self._data.extend(data)async def main():    await Cursor()._read_data()loop.run_until_complete(main())Traceback extract (with Python3.6.0b4, --with-pydebug on Linux):Program received signal SIGSEGV, Segmentation fault.0x000000000046d177 in _PyGen_yf (gen=gen@entry=0x7ffff34bdaf8) atObjects/genobject.c:361361        Py_INCREF(yf);(gdb) bt#0  0x000000000046d177 in _PyGen_yf (gen=gen@entry=0x7ffff34bdaf8) atObjects/genobject.c:361#1  0x000000000052f49c in _PyEval_EvalFrameDefault (f=0x7ffff67067d8, throwflag=<optimized out>)    atPython/ceval.c:1992#2  0x000000000052a0fc in PyEval_EvalFrameEx (f=f@entry=0x7ffff67067d8,throwflag=throwflag@entry=0)    atPython/ceval.c:718#3  0x000000000046d393 in gen_send_ex (gen=gen@entry=0x7ffff34bdc08, arg=<optimized out>,exc=exc@entry=0,closing=closing@entry=0) atObjects/genobject.c:189#4  0x000000000046de8d in _PyGen_Send (gen=gen@entry=0x7ffff34bdc08, arg=<optimized out>)    atObjects/genobject.c:308#5  0x00007ffff384ba2c in task_step_impl (task=task@entry=0x7ffff3263bd8,exc=exc@entry=0x0)    at (...)/Python-3.6.0b4/Modules/_asynciomodule.c:1963#6  0x00007ffff384c72e in task_step (task=0x7ffff3263bd8, exc=0x0)    at (...)/Python-3.6.0b4/Modules/_asynciomodule.c:2247#7  0x00007ffff384ca79 in task_call_step (arg=<optimized out>, task=<optimized out>)    at (...)/Python-3.6.0b4/Modules/_asynciomodule.c:1848#8  TaskSendMethWrapper_call (o=<optimized out>, args=<optimized out>, kwds=<optimized out>)    at (...)/Python-3.6.0b4/Modules/_asynciomodule.c:1167#9  0x0000000000446702 in PyObject_Call (func=0x7ffff37d7f60, args=0x7ffff7fb8058, kwargs=0x0)    atObjects/abstract.c:2246#10 0x00000000005295c8 in do_call_core (func=func@entry=0x7ffff37d7f60,callargs=callargs@entry=0x7ffff7fb8058,kwdict=kwdict@entry=0x0) atPython/ceval.c:5054#11 0x0000000000534c64 in _PyEval_EvalFrameDefault (f=0xb4cb48, throwflag=<optimized out>)    atPython/ceval.c:3354#12 0x000000000052a0fc in PyEval_EvalFrameEx (f=f@entry=0xb4cb48,throwflag=throwflag@entry=0)    atPython/ceval.c:718#13 0x000000000052a1cc in _PyFunction_FastCall (co=<optimized out>, args=0xb4c5b0,nargs=nargs@entry=1,     globals=<optimized out>) atPython/ceval.c:4867(...)
msg281575 -(view)Author: R. David Murray (r.david.murray)*(Python committer)Date: 2016-11-23 18:10
I'm marking this as a release blocker since it is a (new) segfault.
msg281582 -(view)Author: Ned Deily (ned.deily)*(Python committer)Date: 2016-11-23 19:25
FWIW, hg bisect finds:changeset 103444:a77756e480c2: badThe first bad revision is:changeset:   103444:a77756e480c2parent:      103442:914a81781291user:        Victor Stinner <victor.stinner@gmail.com>date:        Fri Sep 09 10:17:08 2016 -0700summary:     Rework CALL_FUNCTION* opcodesAlso, FWIW, the successful runs prior to this revision report:base_events.py:481: ResourceWarning: unclosed event loop <_UnixSelectorEventLoop running=False closed=False debug=False>
msg281587 -(view)Author: Yury Selivanov (yselivanov)*(Python committer)Date: 2016-11-23 21:40
Wow, Martin, this is a very interesting one.  Thanks so much for tracking this down and reducing the test case.So far this doesn't seem to be related to async/await: the interpreter stack seems to enter some strange state under some conditions, and async/await just happens to be the thing that explodes.To those trying to debug this: you don't need asyncio, just replace `loop.run_until_complete(main())` with `main().send(None)`.
msg281591 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-23 23:11
Oh wow, the bug is tricky._PyGen_yf() checks if the next instruction is YIELD_FROM using code[f_lasti+2]. The problem is that WORDCODE kept f_lasti == -1 for a frame not executed yet: f_lasti+2 is 1 in this case.Problem: code[1] is not an operation code, but the argment. Python 3.6 bytecode now uses the "wordcode" format: 16-bit units of (operation, argument).The obvious and simple fix is to add a special case for f_lasti==-1 in _PyGen_yf(): see attached patch.pygen_yf.patch fixes the crash.--Another much larger change would be to change f_lasti to -2... In the rewiew of the huge WORDCODE patch, if I recall correctly, I asked Demur to use -1 for backward compatibility. Or maybe I asked to kept the backward compatibility at the Python level using a getter converting -2 to -1... I don't recall correctly.Seehttp://bugs.python.org/issue26647 for wordcode.
msg281592 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-23 23:12
It would be nice if Demur Rumed and/or Serhiy Storchaka can review  pygen_yf.patch since Demur wrote the wordcode patch and Serhiy reviewed the wordcode patch.
msg281593 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-23 23:57
> Another much larger change would be to change f_lasti to -2...Attached lasti.patch implements this idea. I consider that it makes the C code simpler because getting the next instruction (f_lasti + 2) doesn't require a special case anymore.My patch keeps f_lasti == -1 at the Python level for backward compatibility.lasti.patch is only a backward incompatible change at the C level.--Between pygen_yf.patch and lasti.patch, I prefer lasti.patch even if 3.6 is at its last beta version before the final version. I prefer to fix the C API. Later it will be much harder to fix it.--I read again the wordcode issue#26647:I wrote on the review of wpy7.patch: "The overall change LGTM, but I'm no more 100% sure that starting f_lasti=-1 is safe."http://bugs.python.org/review/26647/#msg17I wrote: "IMHO it's ok to break the C API, but I would prefer to keep the backward compatibility for the Python API (replace any negative number with -1 for the Python API)."http://bugs.python.org/issue26647#msg262758Serhiy: "I think we should make yet few related changes: (...) * Change f_lasti, tb_lasti etc to count code units instead of bytes."http://bugs.python.org/issue26647#msg262758
msg281597 -(view)Author: Philip Dubé (Demur Rumed)*Date: 2016-11-24 02:02
I had considered this, for some reason I didn't realize code[1] could be equal to YIELD_FROM, felt it'd always be falsef_lasti being -2 has always been my preference, lasti.patch lgtm
msg281598 -(view)Author: Yury Selivanov (yselivanov)*(Python committer)Date: 2016-11-24 02:06
> f_lasti being -2 has always been my preference, lasti.patch lgtmHow about we commit pygen_yf.patch to 3.6 and whatever else in 3.7?  lasti.patch is too invasive for 3.6 at this point, -1 for it in 3.6.
msg281608 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-24 06:08
I prefer pygen_yf.patch (with addressing Yury's suggestions). It is much simpler, and in any case I want to change f_lasti to count words rather than bytes (issue27129). It would be again -1 at the start.
msg281632 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-24 14:25
Updated patch. I added a NEWS entry and the two assertions from  lasti.patch. These assertions are basic sanity checks, not directly related to this issue, they shouldn't harm. A started frame must never go back to the not-started state (f_lasti < 0).Yury: "How about we commit pygen_yf.patch to 3.6 and whatever else in 3.7?"Wordcode is already a massive change in our bytecode, I would prefer to not touch f_lasti just as a late update of WORDCODE in 3.7.I'm ok to keep f_lasti == -1 for frames not started yet.Serhiy: "I prefer pygen_yf.patch (with addressing Yury's suggestions)."Ok, fine. I'm also more confident in smaller changes when we are very close to the final release! So let's forget lasti.patch ;-) (Sorry Demur!)--FYI attached test.py reproduces the bug because Cursor._read_data() starts with the instruction  "SETUP_LOOP 72" and 72 is the code of the operation YIELD_FROM :-)To reproduce the bug, you must have a code object where the second byte is 72. It's not easy to control the generated bytecode from the Python code, so I decided to not write an unit test for this bug.
msg281635 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2016-11-24 14:52
pygen_yf-2.patch LGTM.Agreed about tests.
msg281644 -(view)Author: Yury Selivanov (yselivanov)*(Python committer)Date: 2016-11-24 18:52
LGTM
msg281657 -(view)Author: Roundup Robot (python-dev)(Python triager)Date: 2016-11-24 21:35
New changeset303cedfb9e7a by Victor Stinner in branch '3.6':Fix _PyGen_yf()https://hg.python.org/cpython/rev/303cedfb9e7a
msg281660 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-24 21:53
Thanks Martin Richard to continue to find (and report!) crazy corner cases ;-)
msg281694 -(view)Author: Martin Richard (martius)*Date: 2016-11-25 10:28
Thank you all for fixing this so quickly, it's been done amazingly fast!
msg281695 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2016-11-25 10:31
Martin Richard: "Thank you all for fixing this so quickly, it's been done amazingly fast!"Even if I didn't write the commita77756e480c2 which introduced the regression, I pushed it and so I felt responsible. It's a good motivation to fix it quickly. So nobody notice :-D
msg281936 -(view)Author: Ned Deily (ned.deily)*(Python committer)Date: 2016-11-29 02:56
(Victor also merged this into default branch for 3.7.0 in6453ff3328b8)
History
DateUserActionArgs
2022-04-11 14:58:39adminsetgithub: 72968
2017-03-31 16:36:24dstufftsetpull_requests: +pull_request978
2016-11-29 02:56:36ned.deilysetpriority: release blocker ->

messages: +msg281936
stage: commit review -> resolved
2016-11-25 10:31:49vstinnersetmessages: +msg281695
2016-11-25 10:28:56martiussetmessages: +msg281694
2016-11-24 21:53:41vstinnersetstatus: open -> closed
resolution: fixed
messages: +msg281660
2016-11-24 21:35:18python-devsetnosy: +python-dev
messages: +msg281657
2016-11-24 18:52:37yselivanovsetmessages: +msg281644
2016-11-24 14:52:48serhiy.storchakasetassignee:vstinner
messages: +msg281635
stage: commit review
2016-11-24 14:25:29vstinnersetfiles: +pygen_yf-2.patch

messages: +msg281632
2016-11-24 06:08:52serhiy.storchakasetmessages: +msg281608
2016-11-24 02:06:27yselivanovsetmessages: +msg281598
2016-11-24 02:02:48Demur Rumedsetmessages: +msg281597
2016-11-23 23:57:51vstinnersetfiles: +lasti.patch

messages: +msg281593
2016-11-23 23:12:34vstinnersetnosy: +Demur Rumed
messages: +msg281592
2016-11-23 23:11:37vstinnersetfiles: +pygen_yf.patch
keywords: +patch
2016-11-23 23:11:25vstinnersetmessages: +msg281591
2016-11-23 21:42:46yselivanovsetnosy: +serhiy.storchaka
2016-11-23 21:40:08yselivanovsetmessages: +msg281587
2016-11-23 19:25:30ned.deilysetnosy: +vstinner
messages: +msg281582
2016-11-23 18:10:14r.david.murraysetpriority: normal -> release blocker
nosy: +r.david.murray,ned.deily
messages: +msg281575

2016-11-23 18:03:27martiuscreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp