
This issue trackerhas been migrated toGitHub, and is currentlyread-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.
Created on2017-04-05 03:50 bymalin, last changed2022-04-11 14:58 byadmin. This issue is nowclosed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 994 | closed | python-dev,2017-04-05 03:53 | |
| PR 999 | closed | malin,2017-04-05 06:28 | |
| PR 1495 | merged | xiang.zhang,2017-05-08 03:25 | |
| PR 1507 | merged | xiang.zhang,2017-05-09 03:46 | |
| PR 1508 | merged | xiang.zhang,2017-05-09 03:47 | |
| PR 1509 | merged | xiang.zhang,2017-05-09 04:03 | |
| Messages (11) | |||
|---|---|---|---|
| msg291153 -(view) | Author: Ma Lin (malin)* | Date: 2017-04-05 03:50 | |
This issue is split fromissue24117, that issue became a soup of small issues, so I'm going to close it.For 4-byte GB18030 sequence, the legal range is:0x81-0xFE for the 1st byte0x30-0x39 for the 2nd byte0x81-0xFE for the 3rd byte0x30-0x39 for the 4th byteGB18030 standard:https://en.wikipedia.org/wiki/GB_18030https://pan.baidu.com/share/link?shareid=2606985291&uk=3341026630The current code forgets to check 0xFE for the 1st and 3rd byte.Therefore, there are 8630 illegal 4-byte sequences can be decoded by GB18030 codec, here is an example:# legal sequence b'\x81\x31\x81\x30' is decoded to U+060A, it's fine.uchar = b'\x81\x31\x81\x30'.decode('gb18030')print(hex(ord(uchar)))# illegal sequence 0x8130FF30 can be decoded to U+060A as well, this should not happen.uchar = b'\x81\x30\xFF\x30' .decode('gb18030')print(hex(ord(uchar))) | |||
| msg291205 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-04-06 03:23 | |
The table in wikipedia is somewhat complex. I findftp://ftp.software.ibm.com/software/globalization/documents/gb18030m.pdf and the table in it is same ashttps://pan.baidu.com/share/link?shareid=2606985291&uk=3341026630 (except 0x80) but in English. I agree with Ma Lin bytes sequences like b'\x81\x30\xFF\x30' are invalid.For current implementation, you could see:>>> invalid = b'\x81\x30\xff\x30'>>> invalid.decode('gb18030').encode('gb18030') == invalidFalse | |||
| msg291208 -(view) | Author: Ma Lin (malin)* | Date: 2017-04-06 04:18 | |
> except 0x80 (€)I suppose the English edition is not the final release of GB18030-2000.At the end of official Chinese edition of GB18030-2005, listed the difference between GB18030-2000 and GB18030-2005 clearly, it doesn't mention 0x80 (€), so GB18030-2000 should not has 0x80 as well.Why 0x80 (€) appear in English edition?I searched on Google, this topic said 0x80 appears in *draft* of GB18030-2000.http://www.pkucn.com/thread-304395-1-1.htmlSo maybe the English edition is a translation of GB18030-2000 draft, this logic seems ok.Anyway, 0x80 is another story, not conflict with this issue.Zhang, do I need to make PR for 3.6/3.5/2.7 respectively? | |||
| msg291210 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-04-06 05:05 | |
Yes, 0x80 doesn't matter here.It's nice to make the backporting PRs. But let's wait some time for ezio and haypo's comments and reviews. Get the master PR merged first and then continue on backporting. :-) | |||
| msg291213 -(view) | Author: STINNER Victor (vstinner)*![]() | Date: 2017-04-06 06:54 | |
An incorrect implementation of a decoder might lead to security vulnerabilities:http://unicodebook.readthedocs.io/issues.html#security-vulnerabilities*But* UTF-8 decoder of Python 2 is *not* strict and nobody complained.I suggest that, once the changed is merged in master, backport the fix to 3.6 and 3.5.But I'm not sure that it's worth it to backport it to 2.7? Is there a risk to break an application? | |||
| msg291215 -(view) | Author: Ma Lin (malin)* | Date: 2017-04-06 07:31 | |
This is a very trivial bug, it's hard to imagine a scene that someone trying to decode those 8630 illegal 4-byte sequences with GB18030 decoder.And I think this bug can't lead to security vulnerabilities.As far as I can see, GB2312/GBK/GB18030 codecs are bugfree except this bug, of course maybe I'm wrong. | |||
| msg293275 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-05-09 03:38 | |
New changeset9da408d15bdef624a5632182cb4edf98001fa82f by Xiang Zhang in branch 'master':bpo-29990: Fix range checking in GB18030 decoder (#1495)https://github.com/python/cpython/commit/9da408d15bdef624a5632182cb4edf98001fa82f | |||
| msg293276 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-05-09 04:16 | |
New changeset72e1b61da0920c5607481304879e039b63e2a3d5 by Xiang Zhang in branch '3.6':bpo-29990: Fix range checking in GB18030 decoder (#1495) (#1507)https://github.com/python/cpython/commit/72e1b61da0920c5607481304879e039b63e2a3d5 | |||
| msg293277 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-05-09 04:17 | |
New changesetf5f7870d9322b46ab87c45b2c4c46f6b10ecbd70 by Xiang Zhang in branch '3.5':bpo-29990: Fix range checking in GB18030 decoder (#1495) (#1508)https://github.com/python/cpython/commit/f5f7870d9322b46ab87c45b2c4c46f6b10ecbd70 | |||
| msg293278 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-05-09 04:18 | |
New changeset4e7457b85316e6591a4f0c3a4d0807bfdf7a2bea by Xiang Zhang in branch '2.7':bpo-29990: Fix range checking in GB18030 decoder (#1509)https://github.com/python/cpython/commit/4e7457b85316e6591a4f0c3a4d0807bfdf7a2bea | |||
| msg293279 -(view) | Author: Xiang Zhang (xiang.zhang)*![]() | Date: 2017-05-09 04:22 | |
Thanks Ma Lin for finding the problem! Don't know why you close the PR but anyway, we solve it finally. | |||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:44 | admin | set | github: 74176 |
| 2017-05-09 04:22:14 | xiang.zhang | set | messages: +msg293279 |
| 2017-05-09 04:19:26 | xiang.zhang | set | status: open -> closed resolution: fixed stage: needs patch -> resolved |
| 2017-05-09 04:18:58 | xiang.zhang | set | messages: +msg293278 |
| 2017-05-09 04:17:11 | xiang.zhang | set | messages: +msg293277 |
| 2017-05-09 04:16:52 | xiang.zhang | set | messages: +msg293276 |
| 2017-05-09 04:03:55 | xiang.zhang | set | pull_requests: +pull_request1611 |
| 2017-05-09 03:47:37 | xiang.zhang | set | pull_requests: +pull_request1610 |
| 2017-05-09 03:46:14 | xiang.zhang | set | pull_requests: +pull_request1609 |
| 2017-05-09 03:38:35 | xiang.zhang | set | messages: +msg293275 |
| 2017-05-08 03:25:20 | xiang.zhang | set | pull_requests: +pull_request1597 |
| 2017-04-14 17:21:39 | Mariatta | set | stage: patch review -> needs patch |
| 2017-04-06 07:31:53 | malin | set | messages: +msg291215 |
| 2017-04-06 06:54:09 | vstinner | set | messages: +msg291213 |
| 2017-04-06 05:05:02 | xiang.zhang | set | messages: +msg291210 |
| 2017-04-06 04:18:36 | malin | set | messages: +msg291208 |
| 2017-04-06 03:24:45 | xiang.zhang | set | stage: patch review versions: + Python 2.7, Python 3.5, Python 3.6 |
| 2017-04-06 03:23:12 | xiang.zhang | set | messages: +msg291205 |
| 2017-04-05 06:28:48 | malin | set | pull_requests: +pull_request1171 |
| 2017-04-05 03:56:05 | xiang.zhang | set | nosy: +xiang.zhang |
| 2017-04-05 03:53:59 | python-dev | set | pull_requests: +pull_request1168 |
| 2017-04-05 03:50:03 | malin | create | |