
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-02-06 04:27 by pusnow, last changed 2022-04-11 14:58 by admin. This issue is now closed.
| Files | |
|---|---|
| File name | Uploaded |
| u1176.patch | pusnow, 2017-02-06 04:27 |
| u11a7u11c3.patch | pusnow, 2017-02-06 05:47 |
| Pull Requests | | |
|---|---|---|
| URL | Status | Linked |
| PR 1958 | merged | pusnow, 2017-06-05 15:48 |
| PR 7702 | merged | miss-islington, 2018-06-15 12:03 |
| PR 7703 | merged | miss-islington, 2018-06-15 12:04 |
| PR 7704 | merged | xiang.zhang, 2018-06-15 12:23 |
| Messages (23) | |||
|---|---|---|---|
| msg287077 | Author: Wonsup Yoon (pusnow)* | Date: 2017-02-06 04:27 | |
unicodedata can't normalize (NFC) Hangul strings that contain \u1176 (HANGUL JUNGSEONG A-O): after `>>> from unicodedata import normalize`, `>>> normalize("NFC", "\u1100\u1176\u11a8")` returns `'깍'`, but the result should be `"\u1100\u1176\u11a8"`, not `'깍'` (\uae4d). I attached a patch for this issue (fixing the boundary of the modern medial vowels). | |||
| msg287078 | Author: Xiang Zhang (xiang.zhang)* | Date: 2017-02-06 05:21 | |
What about the third character's range? The code seems to assume it is [11a7..11c3], while the spec says [11a8..11c2]. `>>> unicodedata.normalize("NFC", "\u1100\u1175\u11a7")` returns `'기'`; shouldn't it be `'기ᆧ'`? | |||
| msg287079 | Author: Wonsup Yoon (pusnow)* | Date: 2017-02-06 05:47 | |
I think you are right. The modern final consonants are [11a8..11c2]. I attached another patch for this issue. | |||
| msg295123 | Author: Wonsup Yoon (pusnow)* | Date: 2017-06-04 11:19 | |
Is anything else needed? | |||
| msg295171 | Author: Xiang Zhang (xiang.zhang)* | Date: 2017-06-05 07:32 | |
We have moved our code hosting to GitHub. Would you mind turning your patch into a GitHub PR first, Wonsup? | |||
| msg295172 | Author: Wonsup Yoon (pusnow)* | Date: 2017-06-05 08:06 | |
OK, I'll do it. | |||
| msg299214 | Author: Wonsup Yoon (pusnow)* | Date: 2017-07-26 07:54 | |
Any updates? I need this fix for my project. | |||
| msg299657 | Author: Wonsup Yoon (pusnow)* | Date: 2017-08-02 13:25 | |
I added some test cases for this issue. Could someone please check this? | |||
| msg300039 | Author: Wonsup Yoon (pusnow)* | Date: 2017-08-10 03:46 | |
I think it can be merged. Is there anything I need to do? | |||
| msg300046 | Author: Xiang Zhang (xiang.zhang)* | Date: 2017-08-10 05:00 | |
Hi Wonsup, sorry for the delay. I've been really busy with work these days. If no one else gets involved, I'll try to find time to review your patch this week. | |||
| msg300576 | Author: Wonsup Yoon (pusnow)* | Date: 2017-08-19 09:54 | |
This patch applies the changes made in Unicode 4.1.0. I think it has been well reviewed and it is time to merge. Who can commit this patch? @animalize says: "Let me add a supplement. Before Unicode 4.1.0 (draft), the condition was `TBase <= code <= TBase+TCount`; see http://www.unicode.org/reports/tr15/tr15-24.html#hangul_composition. Since Unicode 4.1.0 it is `TBase < code < TBase+TCount`, which is in line with the latest version (Unicode 10.0); see http://www.unicode.org/reports/tr15/tr15-25.html#hangul_composition. This change happened in 2005." | |||
| msg300933 | Author: Wonsup Yoon (pusnow)* | Date: 2017-08-28 02:41 | |
Hello? | |||
| msg313056 | Author: Ma Lin (malin)* | Date: 2018-02-28 11:09 | |
Ping, this was forgotten. | |||
| msg315214 | Author: Wonsup Yoon (pusnow)* | Date: 2018-04-12 08:18 | |
Hello! | |||
| msg319591 | Author: Xiang Zhang (xiang.zhang)* | Date: 2018-06-15 07:58 | |
Sorry for the absence and the late response. I just reviewed it and think it's ready. I think the change in the Unicode standard is more a bug fix in the implementation than an intentional change: Unicode 3.0 already says the third character is out of bounds when TIndex <= 0 or TIndex >= TCount. We also have a ucd_3_2_0 in unicodedata. I'll merge it once the CI bot checks are resolved. | |||
| msg319608 | Author: Xiang Zhang (xiang.zhang)* | Date: 2018-06-15 12:03 | |
New changeset d134809cd3764c6a634eab7bb8995e3e2eff14d5 by Xiang Zhang (Wonsup Yoon) in branch 'master': bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958) https://github.com/python/cpython/commit/d134809cd3764c6a634eab7bb8995e3e2eff14d5 | |||
| msg319609 | Author: miss-islington (miss-islington) | Date: 2018-06-15 12:21 | |
New changeset 0e2b76ea4e48d0fc1ca34ae4ffbb2fd6c19664bb by Miss Islington (bot) in branch '3.7': bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958) https://github.com/python/cpython/commit/0e2b76ea4e48d0fc1ca34ae4ffbb2fd6c19664bb | |||
| msg319610 | Author: miss-islington (miss-islington) | Date: 2018-06-15 12:32 | |
New changeset e2e7ff0d0378ba44f10a1aae10e4bee957fb44d2 by Miss Islington (bot) in branch '3.6': bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958) https://github.com/python/cpython/commit/e2e7ff0d0378ba44f10a1aae10e4bee957fb44d2 | |||
| msg319615 | Author: Xiang Zhang (xiang.zhang)* | Date: 2018-06-15 13:26 | |
New changeset 1889c4cbd62e200fa4cde3d6219e0aadf9bd8149 by Xiang Zhang in branch '2.7': bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958) (GH-7704) https://github.com/python/cpython/commit/1889c4cbd62e200fa4cde3d6219e0aadf9bd8149 | |||
| msg319701 | Author: Ma Lin (malin)* | Date: 2018-06-16 03:18 | |
Regarding "We have a ucd_3_2_0 in unicodedata.": this 3.2 unicodedata is probably used for IDNA2003, which has a step that normalizes the domain name string to Unicode Normalization Form C. We have now changed the Hangul composition code to follow Unicode 4.1+, and the fix also applies to the pre-4.1 database. Could this Unicode 4.1+ behavior cause a security vulnerability for someone using IDNA2003 via ucd_3_2_0? | |||
| msg319719 | Author: Xiang Zhang (xiang.zhang)* | Date: 2018-06-16 05:56 | |
As I said, I checked the Hangul composition algorithm in Unicode 3.0, and it looks consistent with Unicode 4.1+. Unicode 3.0 only has a description, with no sample implementation. So I think the changed code is also correct for Unicode 3.0+. | |||
| msg319802 | Author: Ma Lin (malin)* | Date: 2018-06-17 02:40 | |
You are right. I found a Normalization Test Suite for Unicode 3.2 (http://www.unicode.org/Public/3.2-Update/NormalizationTest-3.2.0.txt): \u1176 is not in the range of the second character, and \u11a7 and \u11c3 are not in the range of the third character. | |||
| msg319886 | Author: Xiang Zhang (xiang.zhang)* | Date: 2018-06-18 14:21 | |
Thanks for your confirmation, Ma Lin. And thanks to Wonsup! | |||
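The bounds debated in this thread come from the Hangul composition step of UAX #15. Below is a minimal, Hangul-only sketch written for this summary, not the CPython C code: `compose_hangul` is a hypothetical helper that composes L+V into an LV syllable and LV+T into an LVT syllable, using the exclusive checks 0 <= VIndex < VCount and TBase < code < TBase + TCount. It does not implement the rest of NFC (decomposition, canonical reordering, non-Hangul compositions). The same arithmetic explains the original report: a vowel bound that accepts U+1176 gives it VIndex 21, so \u1100\u1176\u11a8 becomes 0xAC00 + (0 × 21 + 21) × 28 + 1 = 0xAE4D ('깍'), which spills into the run of syllables whose initial is ㄲ instead of ㄱ.

```python
import unicodedata

# Hangul constants from the composition algorithm in UAX #15.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def compose_hangul(text: str) -> str:
    """Hypothetical helper: compose conjoining jamo into syllables.

    Only the Hangul step is implemented; this is not full NFC.
    """
    result = []
    for ch in text:
        cp = ord(ch)
        if result:
            last = ord(result[-1])
            # L + V -> LV. Modern medial vowels are U+1161..U+1175,
            # so U+1176 (VIndex 21) must be rejected.
            l_index = last - L_BASE
            v_index = cp - V_BASE
            if 0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT:
                result[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # LV + T -> LVT. Modern finals are U+11A8..U+11C2, i.e.
            # T_BASE < cp < T_BASE + T_COUNT (excludes U+11A7 and U+11C3).
            s_index = last - S_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and T_BASE < cp < T_BASE + T_COUNT):
                result[-1] = chr(last + (cp - T_BASE))
                continue
        result.append(ch)
    return "".join(result)


if __name__ == "__main__":
    samples = [
        "\u1100\u1161\u11a8",  # valid L+V+T, composes to U+AC01
        "\u1100\u1176\u11a8",  # U+1176 is not a modern medial vowel
        "\u1100\u1175\u11a7",  # U+11A7 is not a modern final consonant
        "\u1100\u1161\u11c3",  # U+11C3 is not a modern final consonant
    ]
    for s in samples:
        mine = compose_hangul(s)
        print([hex(ord(c)) for c in mine], mine == unicodedata.normalize("NFC", s))
```

For these jamo-only inputs NFC reduces to exactly this composition step, so on an interpreter that includes the GH-1958 changesets each line should end with `True`; a build with the old inclusive bounds would disagree on the U+1176 and U+11A7 samples.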
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:58:42 | admin | set | github: 73642 |
| 2018-06-18 14:21:55 | xiang.zhang | set | messages: +msg319886 components: + Unicode, - Library (Lib) |
| 2018-06-17 02:40:32 | malin | set | messages: +msg319802 |
| 2018-06-16 05:56:16 | xiang.zhang | set | messages: +msg319719 |
| 2018-06-16 03:18:55 | malin | set | messages: +msg319701 |
| 2018-06-15 13:28:49 | xiang.zhang | set | status: open -> closed resolution: fixed components: + Library (Lib), - Unicode stage: patch review -> resolved |
| 2018-06-15 13:26:57 | xiang.zhang | set | messages: +msg319615 |
| 2018-06-15 12:32:53 | miss-islington | set | messages: +msg319610 |
| 2018-06-15 12:23:29 | xiang.zhang | set | pull_requests: +pull_request7320 |
| 2018-06-15 12:21:57 | miss-islington | set | nosy: +miss-islington messages: +msg319609 |
| 2018-06-15 12:04:26 | miss-islington | set | pull_requests: +pull_request7319 |
| 2018-06-15 12:03:37 | miss-islington | set | pull_requests: +pull_request7318 |
| 2018-06-15 12:03:16 | xiang.zhang | set | messages: +msg319608 |
| 2018-06-15 07:58:37 | xiang.zhang | set | messages: +msg319591 versions: + Python 3.8, - Python 3.5 |
| 2018-04-12 08:18:58 | pusnow | set | messages: +msg315214 |
| 2018-02-28 11:09:52 | malin | set | nosy: +malin messages: +msg313056 |
| 2017-08-28 02:41:24 | pusnow | set | messages: +msg300933 |
| 2017-08-19 09:54:09 | pusnow | set | messages: +msg300576 |
| 2017-08-10 05:00:52 | xiang.zhang | set | messages: +msg300046 |
| 2017-08-10 04:59:30 | xiang.zhang | set | files: -800.jpg |
| 2017-08-10 04:11:28 | 高可爱 | set | files: +800.jpg |
| 2017-08-10 03:46:54 | pusnow | set | messages: +msg300039 |
| 2017-08-02 13:25:40 | pusnow | set | messages: +msg299657 |
| 2017-07-26 07:54:11 | pusnow | set | messages: +msg299214 |
| 2017-06-05 15:48:57 | pusnow | set | pull_requests: +pull_request2029 |
| 2017-06-05 15:46:27 | pusnow | set | title: bug in unicodedata.normalize: u1176, u11a7 and u11c3 -> bugs in unicodedata.normalize: u1176, u11a7 and u11c3 |
| 2017-06-05 08:06:08 | pusnow | set | messages: +msg295172 |
| 2017-06-05 07:32:39 | xiang.zhang | set | messages: +msg295171 |
| 2017-06-04 11:19:17 | pusnow | set | messages: +msg295123 |
| 2017-03-11 12:55:26 | serhiy.storchaka | set | nosy: +lemburg,loewis stage: patch review type: behavior versions: + Python 3.5, Python 3.7 |
| 2017-03-11 12:33:28 | pusnow | set | title: bug in unicodedata.normalize: u1176 -> bug in unicodedata.normalize: u1176, u11a7 and u11c3 |
| 2017-02-06 05:47:24 | pusnow | set | files: +u11a7u11c3.patch messages: +msg287079 |
| 2017-02-06 05:21:48 | xiang.zhang | set | nosy: +xiang.zhang messages: +msg287078 |
| 2017-02-06 04:27:52 | pusnow | create | |