
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2017-12-12 01:16 by Maxime Belanger, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Pull Requests

| URL | Status | Linked |
|---|---|---|
| PR 4806 | merged | maxbelanger, 2017-12-12 01:19 |

Messages (4)

| msg308085 | Author: Maxime Belanger (Maxime Belanger) | Date: 2017-12-12 01:16 |

In our deployment of Python 2.7, we've patched `unicodedata` to introduce a new function: `is_normalized` checks whether a unistr is in a given normal form. This currently has to be done by creating a normalized copy, then checking whether it is equal to the source string.

This function uses the internal helper (also called `is_normalized`) that can "quick check" normalization, but falls back on creating a normalized copy and comparing (when necessary).

We're contributing this change in case it can be helpful to others. Feedback is welcome!

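For readers comparing the two approaches described in this message, here is a minimal sketch. It assumes Python 3.8 or later for the native `unicodedata.is_normalized`; the helper names `is_normalized_by_copy` and `check_normalized` are hypothetical and only serve the illustration.

```python
import unicodedata

def is_normalized_by_copy(form, unistr):
    """Older idiom: build a normalized copy and compare it with the input."""
    return unistr == unicodedata.normalize(form, unistr)

def check_normalized(form, unistr):
    """Use unicodedata.is_normalized when the running Python provides it."""
    native = getattr(unicodedata, "is_normalized", None)
    if native is not None:
        return native(form, unistr)
    # Hypothetical fallback for interpreters without the native function.
    return is_normalized_by_copy(form, unistr)

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

for form in ("NFC", "NFD"):
    for text in (composed, decomposed):
        print(form, ascii(text), check_normalized(form, text))
```

Both helpers should agree on every input; the difference is only whether a full normalized copy has to be allocated to get the answer.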
| msg308122 | Author: Steven D'Aprano (steven.daprano) | Date: 2017-12-12 12:25 |

Python 2.7 is in feature freeze, so this can only go into 3.7.

I would find this useful, and would like this feature. However, I'm concerned by your comment that you fall back on creating a normalized copy and comparing. That could be expensive, and shouldn't be needed. According to http://unicode.org/reports/tr15/#Detecting_Normalization_Forms, in the worst case you can incrementally check only the code points in doubt (around the "MAYBE" code points).

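To make the scan described in that section of UAX #15 concrete, here is a rough sketch. It is not the CPython implementation. It is deliberately restricted to NFD, where the quick-check property only takes the values YES and NO, because Python's `unicodedata` module does not expose the quick-check property needed to find the MAYBE code points in NFC and NFKC. The names `quick_check_nfd` and `_has_canonical_decomposition` are hypothetical.

```python
import unicodedata

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllables

def _has_canonical_decomposition(ch):
    """Return True if ch can never appear in an NFD string."""
    if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
        return True  # decomposed algorithmically, not listed per character
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings carry a "<tag>" prefix; canonical ones do not.
    return bool(mapping) and not mapping.startswith("<")

def quick_check_nfd(unistr):
    """Single pass over the string, without building a normalized copy."""
    last_ccc = 0
    for ch in unistr:
        ccc = unicodedata.combining(ch)
        # Combining marks must appear in non-decreasing combining-class order.
        if ccc != 0 and ccc < last_ccc:
            return False
        if _has_canonical_decomposition(ch):
            return False
        last_ccc = ccc
    return True

print(quick_check_nfd("e\u0301"))  # decomposed 'é'
print(quick_check_nfd("\u00e9"))   # precomposed 'é'
```

For NFC and NFKC, the same loop would additionally have to record MAYBE results and then examine only the text around those code points, which is the incremental check suggested above.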
| msg308127 | Author: STINNER Victor (vstinner) | Date: 2017-12-12 13:10 |

> However, I'm concerned by your comment that you fall back on creating a normalized copy and comparing.

The purpose of the function is to be faster than `str == unicodedata.normalize(form, str)`. So yeah, any optimization is welcome.

But I wouldn't worry about the suboptimal MAYBE case, which is implemented with `str == unicodedata.normalize(form, str)`. It can be optimized later, if needed.

If someone cares about performance, I will require a benchmark, since I only trust numbers :-)

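A benchmark of the kind requested here could look roughly like the following sketch. It assumes Python 3.8 or later for `unicodedata.is_normalized` and skips that measurement otherwise. The sample strings and the `benchmark` helper are hypothetical, and no timings are claimed.

```python
import timeit
import unicodedata

def copy_and_compare(form, unistr):
    """Baseline: normalize into a new string, then compare."""
    return unistr == unicodedata.normalize(form, unistr)

def benchmark(form, unistr, number=1000):
    """Return total seconds for `number` calls of each available check."""
    timings = {
        "copy_and_compare": timeit.timeit(
            lambda: copy_and_compare(form, unistr), number=number
        )
    }
    is_normalized = getattr(unicodedata, "is_normalized", None)
    if is_normalized is not None:
        timings["is_normalized"] = timeit.timeit(
            lambda: is_normalized(form, unistr), number=number
        )
    return timings

# Hypothetical inputs: plain ASCII, precomposed text, and decomposed text.
samples = {
    "ascii": "hello world " * 500,
    "composed": "\u00e9" * 5000,
    "decomposed": "e\u0301" * 5000,
}

for name, text in samples.items():
    print(name, benchmark("NFC", text, number=200))
```

Measuring several kinds of input matters because the quick-check path and the fallback path are expected to behave very differently.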
| msg329276 | Author: Benjamin Peterson (benjamin.peterson) | Date: 2018-11-04 23:58 |

New changeset 2810dd7be9876236f74ac80716d113572c9098dd by Benjamin Peterson (Max Bélanger) in branch 'master':
closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)
https://github.com/python/cpython/commit/2810dd7be9876236f74ac80716d113572c9098dd

History

| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:58:55 | admin | set | github: 76466 |
| 2018-11-04 23:58:27 | benjamin.peterson | set | status: open -> closed; nosy: +benjamin.peterson; messages: +msg329276; resolution: fixed; stage: patch review -> resolved |
| 2018-10-25 00:06:09 | Maxime Belanger | set | versions: + Python 3.8, - Python 3.7 |
| 2017-12-12 13:10:46 | vstinner | set | messages: +msg308127 |
| 2017-12-12 12:25:08 | steven.daprano | set | versions: - Python 2.7; nosy: +steven.daprano; messages: +msg308122; type: enhancement |
| 2017-12-12 01:19:59 | maxbelanger | set | keywords: +patch; stage: patch review; pull_requests: +pull_request4703 |
| 2017-12-12 01:16:10 | Maxime Belanger | create | |