Movatterモバイル変換


[0]ホーム

URL:


homepage

Issue23247

This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title:Crash in the reset() method of StreamWriter of CJK codecs
Type:crashStage:resolved
Components:UnicodeVersions:Python 3.6, Python 3.4, Python 3.5
process
Status:closedResolution:fixed
Dependencies:Superseder:
Assigned To: vstinnerNosy List: Aaron1011, ezio.melotti, martin.panter, python-dev, serhiy.storchaka, vstinner
Priority:highKeywords:patch

Created on2015-01-16 06:17 bymartin.panter, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.

Files
File nameUploadedDescriptionEdit
cjk_codecs_reset.patchvstinner,2015-07-15 23:00review
fix-multibytecodec-segfault.patchAaron1011,2015-07-16 03:29review
fix-multibytecodec-segfault-with-test.patchAaron1011,2015-07-16 16:47review
Messages (13)
msg234112 -(view)Author: Martin Panter (martin.panter)*(Python committer)Date: 2015-01-16 06:17
$ python3 -c 'import codecs; from io import BytesIO; codecs.getwriter("big5")(BytesIO()).reset()'Segmentation fault (core dumped)[Exit 139]Happens for all the multibyte codecs:broken_stream_codecs = {    "big5", "big5hkscs", "cp932", "cp949", "cp950",    "euc_jp", "euc_jis_2004", "euc_jisx0213", "euc_kr",    "gb2312", "gbk", "gb18030", "hz",    "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_2004",    "iso2022_jp_3", "iso2022_jp_ext", "iso2022_kr",    "johab", "shift_jis", "shift_jis_2004", "shift_jisx0213",}These codecs also share the property that their StreamReader.read() methods do not accept the second “chars” parameter:>>> codecs.getreader("big5")(BytesIO()).read(1, 1)Traceback (most recent call last):  File "<stdin>", line 1, in <module>TypeError: read expected at most 1 arguments, got 2
msg246781 -(view)Author: Aaron Hill (Aaron1011)*Date: 2015-07-15 22:27
This is also present in the latest Python 3.6.I'm going to work on providing a patch for this, unless someone else already is
msg246782 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2015-07-15 23:00
> Happens for all the multibyte codecs: (...)All these codecs share the same C implementation: the "CJK" codecs.The bug was introduced in Python 3.4 by my huge changesetd621bdaed7c3: Issue "#17693: CJK encoders now use the new Unicode API (PEP 393)".It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object.Attached patch should fix the crash.@Aaron1011: Can you please try to write a patch adding a unit test to test_codecs reproducing the crash?
msg246783 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2015-07-15 23:02
> It looks like there was no test for this specific bug :-/ Calling reset() just after creating a StreamReader object.Oops: StreamWriter, not StreamReader.
msg246784 -(view)Author: Martin Panter (martin.panter)*(Python committer)Date: 2015-07-16 00:56
Perhaps this test case proposed in my patch forIssue 13881 could be useful:+def test_writer_reuse(self):+    """StreamWriter should be reusable after reset"""Looks like that is where my “broken_stream_codecs” list from the original post came from.
msg246785 -(view)Author: Aaron Hill (Aaron1011)*Date: 2015-07-16 02:57
The included patch fixes the issue, and modifies the existing unittest to prevent a future regression.The patch corrects an issue where the 'pending' struct field was NULL, but was used as the input to multibytecodec_encode anyay.
msg246786 -(view)Author: Serhiy Storchaka (serhiy.storchaka)*(Python committer)Date: 2015-07-16 03:27
cjk_codecs_reset.patch LGTM.
msg246787 -(view)Author: Aaron Hill (Aaron1011)*Date: 2015-07-16 03:29
The patch didn't get attached for some reason. It's attached now.
msg246801 -(view)Author: Martin Panter (martin.panter)*(Python committer)Date: 2015-07-16 12:57
Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly). I imagine this shortcut could be slightly more efficient, but is it always correct?In any case, Aaron’s test looks okay to me.
msg246802 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2015-07-16 13:01
> Aaron: Your version of the fix immediately returns None, while Victor’s tries to encode an empty string (if I understand it correctly).Aaron patch is better. I misunderstood the code. reset() always return None, the empty string is unused in my patch.fix-multibytecodec-segfault.patch: please write a _new_ test mentionning this issue to check for non regression.
msg246810 -(view)Author: Aaron Hill (Aaron1011)*Date: 2015-07-16 16:47
I've added a test case to exercise reset()
msg246818 -(view)Author: Roundup Robot (python-dev)(Python triager)Date: 2015-07-16 20:22
New changeset35a6fe0e2b27 by Victor Stinner in branch '3.4':Closes#23247: Fix a crash in the StreamWriter.reset() of CJK codecshttps://hg.python.org/cpython/rev/35a6fe0e2b27
msg246819 -(view)Author: STINNER Victor (vstinner)*(Python committer)Date: 2015-07-16 20:25
@Aaron Hill: Thanks for your patch! I only kept the minimal test of your patch. If you want to enhance the test suite, you may write new test to test the behaviour of reset(). I prefer to only commit the minimum patch.@Martin: Thanks for your report. The issue is now fixed.
History
DateUserActionArgs
2022-04-11 14:58:11adminsetgithub: 67436
2015-07-16 20:25:47vstinnersetmessages: +msg246819
2015-07-16 20:22:58python-devsetstatus: open -> closed

nosy: +python-dev
messages: +msg246818

resolution: fixed
stage: patch review -> resolved
2015-07-16 16:47:36Aaron1011setfiles: +fix-multibytecodec-segfault-with-test.patch

messages: +msg246810
2015-07-16 13:01:32vstinnersetmessages: +msg246802
2015-07-16 12:57:06martin.pantersetmessages: +msg246801
stage: test needed -> patch review
2015-07-16 03:29:46Aaron1011setfiles: +fix-multibytecodec-segfault.patch

messages: +msg246787
2015-07-16 03:27:43serhiy.storchakasetassignee:serhiy.storchaka ->vstinner
messages: +msg246786
stage: test needed
2015-07-16 02:57:43Aaron1011setmessages: +msg246785
2015-07-16 00:56:58martin.pantersetmessages: +msg246784
2015-07-15 23:02:01vstinnersetmessages: +msg246783
2015-07-15 23:00:54vstinnersetfiles: +cjk_codecs_reset.patch
versions: + Python 3.5
title: Multibyte codec StreamWriter.reset() crashes -> Crash in the reset() method of StreamWriter of CJK codecs
messages: +msg246782

keywords: +patch
2015-07-15 22:27:47Aaron1011setnosy: +Aaron1011

messages: +msg246781
versions: + Python 3.6
2015-02-28 13:34:48serhiy.storchakasetpriority: normal -> high
assignee:serhiy.storchaka

nosy: +serhiy.storchaka
2015-01-16 06:17:50martin.pantercreate
Supported byThe Python Software Foundation,
Powered byRoundup
Copyright © 1990-2022,Python Software Foundation
Legal Statements

[8]ページ先頭

©2009-2026 Movatter.jp