Issue32285

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76466

classification

Title:	In `unicodedata`, it should be possible to check a unistr's normal form without necessarily copying it
Type:	enhancement	Stage:	resolved
Components:	Unicode	Versions:	Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	Maxime Belanger, benjamin.peterson, ezio.melotti, steven.daprano, vstinner
Priority:	normal	Keywords:	patch

Created on 2017-12-12 01:16 by Maxime Belanger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 4806	merged	maxbelanger, 2017-12-12 01:19

Messages (4)
msg308085 -(view)	Author: Maxime Belanger (Maxime Belanger)	Date: 2017-12-12 01:16
In our deployment of Python 2.7, we've patched `unicodedata` to introduce a new function: `is_normalized` can check whether a unistr is in a given normal form. This currently has to be done by creating a normalized copy, then checking whether it is equal to the source string.

This function uses the internal helper (also called `is_normalized`) that can "quick check" normalization, but falls back on creating a normalized copy and comparing (when necessary).

We're contributing this change in case it can be helpful to others. Feedback is welcome!
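For readers comparing the two approaches described above, here is a minimal sketch. It assumes Python 3.8 or later, where `unicodedata.is_normalized(form, unistr)` is available as a result of this issue; the helper name `is_normalized_by_copy` and the sample strings are illustrative only and do not come from the patch.

```python
import unicodedata


def is_normalized_by_copy(form: str, text: str) -> bool:
    """Older approach: build a normalized copy, then compare it to the source."""
    return text == unicodedata.normalize(form, text)


# Two spellings of the same word (sample data chosen for illustration).
composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Direct check added by this issue (Python 3.8+).
nfc_checks = [unicodedata.is_normalized("NFC", s) for s in (composed, decomposed)]
nfd_checks = [unicodedata.is_normalized("NFD", s) for s in (composed, decomposed)]

# The copy-and-compare helper should always agree with the direct check.
copy_checks = [is_normalized_by_copy("NFC", s) for s in (composed, decomposed)]
```

Both forms answer the same question; the direct check can simply skip the allocation of a normalized copy when the quick check is conclusive.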
msg308122 -(view)	Author: Steven D'Aprano (steven.daprano)*	Date: 2017-12-12 12:25
Python 2.7 is in feature freeze, so this can only go into 3.7.

I would find this useful, and would like this feature. However, I'm concerned by your comment that you fall back on creating a normalized copy and comparing. That could be expensive, and shouldn't be needed. According to here:

http://unicode.org/reports/tr15/#Detecting_Normalization_Forms

in the worst case, you can incrementally check only the code points in doubt (around the "MAYBE" code points).
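To make the quick-check idea concrete, here is a rough sketch restricted to NFD, which is the simple case because NFD has no "MAYBE" code points. It is not the CPython implementation: the function name `is_nfd_quick` is hypothetical, and the check is derived from `unicodedata.decomposition` and `unicodedata.combining` rather than from the Quick_Check property tables, which the module does not appear to expose publicly. NFC and NFKC would additionally need the incremental handling around "MAYBE" code points described in the report linked above.

```python
import unicodedata

# Precomposed Hangul syllables are decomposed algorithmically by NFD.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def is_nfd_quick(text: str) -> bool:
    """Return True if text is already in NFD, without building a copy."""
    last_ccc = 0
    for ch in text:
        # A precomposed Hangul syllable is never in NFD.
        if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
            return False
        decomp = unicodedata.decomposition(ch)
        # Compatibility mappings start with "<tag>"; NFD applies only
        # canonical mappings, so only an untagged mapping rules the text out.
        if decomp and not decomp.startswith("<"):
            return False
        # Combining marks must appear in non-decreasing combining-class order.
        ccc = unicodedata.combining(ch)
        if ccc != 0 and last_ccc > ccc:
            return False
        last_ccc = ccc
    return True
```

The loop returns as soon as it finds a code point that rules out NFD, so it never allocates a normalized string.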
msg308127 -(view)	Author: STINNER Victor (vstinner)*	Date: 2017-12-12 13:10
> However, I'm concerned by your comment that you fall back on creating a normalized copy and comparing.

The purpose of the function is to be faster than str == unicodedata.normalize(form, str). So yeah, any optimization is welcome.

But I wouldn't bother with the suboptimal MAYBE case, which is implemented with: str == unicodedata.normalize(form, str). It can be optimized later, if needed.

If someone cares about performance, I will require a benchmark, since I only trust numbers :-)
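A benchmark of the kind requested here could be sketched with `timeit` as follows. This assumes Python 3.8+ for `unicodedata.is_normalized`; the function name `compare_timings`, the sample input, and the iteration count are arbitrary choices for illustration, and no timing results are claimed.

```python
import timeit
import unicodedata


def compare_timings(text: str, form: str = "NFC", number: int = 1000):
    """Time copy-and-compare against the direct check; return both in seconds."""
    by_copy = timeit.timeit(
        lambda: text == unicodedata.normalize(form, text), number=number
    )
    direct = timeit.timeit(
        lambda: unicodedata.is_normalized(form, text), number=number
    )
    return by_copy, direct


if __name__ == "__main__":
    sample = "cafe\u0301 " * 10_000  # not in NFC, so a normalized copy differs
    by_copy, direct = compare_timings(sample)
    print(f"normalize + compare: {by_copy:.4f}s  is_normalized: {direct:.4f}s")
```

Results will depend on the input: strings that the quick check can settle immediately and strings containing "MAYBE" code points are worth measuring separately.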
msg329276 -(view)	Author: Benjamin Peterson (benjamin.peterson)*	Date: 2018-11-04 23:58
New changeset 2810dd7be9876236f74ac80716d113572c9098dd by Benjamin Peterson (Max Bélanger) in branch 'master':
closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)
https://github.com/python/cpython/commit/2810dd7be9876236f74ac80716d113572c9098dd

History
Date	User	Action	Args
2022-04-11 14:58:55	admin	set	github: 76466
2018-11-04 23:58:27	benjamin.peterson	set	status: open -> closed nosy: +benjamin.peterson messages: +msg329276 resolution: fixed stage: patch review -> resolved
2018-10-25 00:06:09	Maxime Belanger	set	versions: + Python 3.8, - Python 3.7
2017-12-12 13:10:46	vstinner	set	messages: +msg308127
2017-12-12 12:25:08	steven.daprano	set	versions: - Python 2.7 nosy: +steven.daprano messages: +msg308122 type: enhancement
2017-12-12 01:19:59	maxbelanger	set	keywords: +patch stage: patch review pull_requests: +pull_request4703
2017-12-12 01:16:10	Maxime Belanger	create
