
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2015-01-16 06:17 bymartin.panter, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| cjk_codecs_reset.patch | vstinner,2015-07-15 23:00 | review | ||
| fix-multibytecodec-segfault.patch | Aaron1011,2015-07-16 03:29 | review | ||
| fix-multibytecodec-segfault-with-test.patch | Aaron1011,2015-07-16 16:47 | review | ||
| Messages (13) | |||
|---|---|---|---|
| msg234112 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2015-01-16 06:17 | |
$ python3 -c 'import codecs; from io import BytesIO; codecs.getwriter("big5")(BytesIO()).reset()'Segmentation fault (core dumped)[Exit 139]Happens for all the multibyte codecs:broken_stream_codecs = { "big5", "big5hkscs", "cp932", "cp949", "cp950", "euc_jp", "euc_jis_2004", "euc_jisx0213", "euc_kr", "gb2312", "gbk", "gb18030", "hz", "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004", "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr", "johab", "shift_jis", "shift_jis_2004", "shift_jisx0213",}These codecs also share the property that their StreamReader.read() methods do not accept the second “chars” parameter:>>> codecs.getreader("big5")(BytesIO()).read(1, 1)Traceback (most recent call last): File "<stdin>", line 1, in <module>TypeError: read expected at most 1 arguments, got 2 | |||
| msg246781 -(view) | Author: Aaron Hill (Aaron1011)* | Date: 2015-07-15 22:27 | |
This is also present in the latest Python 3.6.I'm going to work on providing a patch for this, unless someone else already is | |||
| msg246782 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2015-07-15 23:00 | |
> Happens for all the multibyte codecs: (...)All these codecs share the same C implementation: the "CJK" codecs.The bug was introduced in Python 3.4 by my huge changesetd621bdaed7c3: Issue "#17693: CJK encoders now use the new Unicode API (PEP 393)".It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object.Attached patch should fix the crash.@Aaron1011: Can you please try to write a patch adding a unit test to test_codecs reproducing the crash? | |||
| msg246783 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2015-07-15 23:02 | |
> It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object.Oops: StreamWriter, not StreamReader. | |||
| msg246784 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2015-07-16 00:56 | |
Perhaps this test case proposed in my patch forIssue 13881 could be useful:+def test_writer_reuse(self):+ """StreamWriter should be reusable after reset"""Looks like that is where my “broken_stream_codecs” list from the original post came from. | |||
| msg246785 -(view) | Author: Aaron Hill (Aaron1011)* | Date: 2015-07-16 02:57 | |
The included patch fixes the issue, and modifies the existing unittest to prevent a future regression.The patch corrects an issue where the 'pending' struct field was NULL, but was used as the input to multibytecodec_encode anyay. | |||
| msg246786 -(view) | Author: Serhiy Storchaka (serhiy.storchaka)*![]() | Date: 2015-07-16 03:27 | |
cjk_codecs_reset.patch LGTM. | |||
| msg246787 -(view) | Author: Aaron Hill (Aaron1011)* | Date: 2015-07-16 03:29 | |
The patch didn't get attached for some reason. It's attached now. | |||
| msg246801 -(view) | Author: Martin Panter (martin.panter)*![]() | Date: 2015-07-16 12:57 | |
Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly). I imagine this shortcut could be slightly more efficient, but is it always correct?In any case, Aaron’s test looks okay to me. | |||
| msg246802 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2015-07-16 13:01 | |
> Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly).Aaron patch is better. I misunderstood the code. reset() always return None, the empty string is unused in my patch.fix-multibytecodec-segfault.patch: please write a _new_ test mentionning this issue to check for non regression. | |||
| msg246810 -(view) | Author: Aaron Hill (Aaron1011)* | Date: 2015-07-16 16:47 | |
I've added a test case to exercise reset() | |||
| msg246818 -(view) | Author: Roundup Robot (python-dev)![]() | Date: 2015-07-16 20:22 | |
New changeset35a6fe0e2b27 by Victor Stinner in branch '3.4':Closes#23247: Fix a crash in the StreamWriter.reset() of CJK codecshttps://hg.python.org/cpython/rev/35a6fe0e2b27 | |||
| msg246819 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2015-07-16 20:25 | |
@Aaron Hill: Thanks for your patch! I only kept the minimal test of your patch. If you want to enhance the test suite, you may write new test to test the behaviour of reset(). I prefer to only commit the minimum patch.@Martin: Thanks for your report. The issue is now fixed. | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:11 | admin | set | github: 67436 |
| 2015-07-16 20:25:47 | vstinner | set | messages: +msg246819 |
| 2015-07-16 20:22:58 | python-dev | set | status: open -> closed nosy: +python-dev messages: +msg246818 resolution: fixed stage: patch review -> resolved |
| 2015-07-16 16:47:36 | Aaron1011 | set | files: +fix-multibytecodec-segfault-with-test.patch messages: +msg246810 |
| 2015-07-16 13:01:32 | vstinner | set | messages: +msg246802 |
| 2015-07-16 12:57:06 | martin.panter | set | messages: +msg246801 stage: test needed -> patch review |
| 2015-07-16 03:29:46 | Aaron1011 | set | files: +fix-multibytecodec-segfault.patch messages: +msg246787 |
| 2015-07-16 03:27:43 | serhiy.storchaka | set | assignee:serhiy.storchaka ->vstinner messages: +msg246786 stage: test needed |
| 2015-07-16 02:57:43 | Aaron1011 | set | messages: +msg246785 |
| 2015-07-16 00:56:58 | martin.panter | set | messages: +msg246784 |
| 2015-07-15 23:02:01 | vstinner | set | messages: +msg246783 |
| 2015-07-15 23:00:54 | vstinner | set | files: +cjk_codecs_reset.patch versions: + Python 3.5 title: Multibyte codec StreamWriter.reset() crashes -> Crash in the reset() method of StreamWriter of CJK codecs messages: +msg246782 keywords: +patch |
| 2015-07-15 22:27:47 | Aaron1011 | set | nosy: +Aaron1011 messages: +msg246781 versions: + Python 3.6 |
| 2015-02-28 13:34:48 | serhiy.storchaka | set | priority: normal -> high assignee:serhiy.storchaka nosy: +serhiy.storchaka |
| 2015-01-16 06:17:50 | martin.panter | create | |