Issue 32285

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: In `unicodedata`, it should be possible to check a unistr's normal form without necessarily copying it
Type: enhancement
Stage: resolved
Components: Unicode
Versions: Python 3.8
process
Status: closed
Resolution: fixed
Dependencies: (none)
Superseder: (none)
Assigned To: (none)
Nosy List: Maxime Belanger, benjamin.peterson, ezio.melotti, steven.daprano, vstinner
Priority: normal
Keywords: patch

Created on 2017-12-12 01:16 by Maxime Belanger, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Pull Requests
PR 4806: merged (maxbelanger, 2017-12-12 01:19)
Messages (4)
msg308085 - Author: Maxime Belanger (Maxime Belanger) - Date: 2017-12-12 01:16
In our deployment of Python 2.7, we've patched `unicodedata` to introduce a new function: `is_normalized` can check whether a unistr is in a given normal form. This currently has to be done by creating a normalized copy, then checking whether it is equal to the source string.

This function uses the internal helper (also called `is_normalized`) that can "quick check" normalization, but falls back on creating a normalized copy and comparing (when necessary).

We're contributing this change in case it can be helpful to others. Feedback is welcome!
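For reference, the copy-then-compare idiom described above can be written as follows on any Python 3. The helper name `is_normalized_by_copy` is hypothetical; it always builds a full normalized copy, even when the input is already normalized, which is the cost the proposed function aims to avoid.

```python
import unicodedata


def is_normalized_by_copy(form: str, s: str) -> bool:
    """Hypothetical helper: check normalization by building a normalized copy.

    unicodedata.normalize() allocates a new string even when s is already
    in the requested form; only then can the two be compared.
    """
    return unicodedata.normalize(form, s) == s


# "é" can be spelled as one precomposed code point or as "e" + combining acute.
precomposed = "\u00e9"
decomposed = "e\u0301"

print(is_normalized_by_copy("NFC", precomposed))  # True
print(is_normalized_by_copy("NFC", decomposed))   # False
print(is_normalized_by_copy("NFD", decomposed))   # True
```

On Python 3.8 and later, `unicodedata.is_normalized(form, unistr)`, the function eventually merged for this issue, gives the same answers.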
msg308122 - Author: Steven D'Aprano (steven.daprano) (Python committer) - Date: 2017-12-12 12:25
Python 2.7 is in feature freeze, so this can only go into 3.7.

I would find this useful, and would like this feature. However, I'm concerned by your comment that you fall back on creating a normalized copy and comparing. That could be expensive, and shouldn't be needed. According to http://unicode.org/reports/tr15/#Detecting_Normalization_Forms, in the worst case you can incrementally check only the code points in doubt (around the "MAYBE" code points).
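The linked section of UAX #15 describes a single-pass quick check: reject if combining marks are out of canonical order, reject if any code point has quick-check value NO, and only fall back to a full check if some code point is MAYBE. Python's `unicodedata` does not expose the quick-check properties, so the sketch below covers only NFD, where NFD_QC has just Yes/No values and can be derived from `unicodedata.decomposition()` plus the Hangul syllable range. The helper names are hypothetical, and this is not CPython's implementation; NFC and NFKC need the MAYBE fallback discussed above.

```python
import unicodedata

# Hangul syllables decompose algorithmically, so unicodedata.decomposition()
# reports no mapping for them; they still change under NFD.
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def has_canonical_decomposition(ch: str) -> bool:
    """Hypothetical helper: NFD_QC is No exactly for these code points."""
    if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST:
        return True
    decomp = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as "<compat>"; NFD ignores them.
    return decomp != "" and not decomp.startswith("<")


def is_nfd_quick(s: str) -> bool:
    """Hypothetical single-pass NFD check in the style of the UAX #15 quick check.

    NFD_QC has no MAYBE values, so the answer is conclusive and no
    normalized copy is ever built.
    """
    last_ccc = 0
    for ch in s:  # iterating a str yields whole code points
        ccc = unicodedata.combining(ch)
        # Within a run of non-starters, combining classes must not decrease.
        if ccc != 0 and last_ccc > ccc:
            return False
        if has_canonical_decomposition(ch):
            return False
        last_ccc = ccc
    return True


print(is_nfd_quick("e\u0301"))        # True
print(is_nfd_quick("\u00e9"))         # False: precomposed
print(is_nfd_quick("a\u0301\u0327"))  # False: marks out of canonical order
```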
msg308127 - Author: STINNER Victor (vstinner) (Python committer) - Date: 2017-12-12 13:10
> However, I'm concerned by your comment that you fall back on creating a normalized copy and comparing.

The purpose of the function is to be faster than str == unicodedata.normalize(form, str). So yeah, any optimization is welcome.

But I'm not bothered by the suboptimal MAYBE case, which is implemented with: str == unicodedata.normalize(form, str). It can be optimized later, if needed.

If someone cares about performance, I will require a benchmark, since I only trust numbers :-)
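Since the message asks for numbers, a minimal `timeit` sketch for comparing the two approaches might look like this. It assumes Python 3.8+, where `unicodedata.is_normalized` exists; the helper name `compare_timings` and the sample strings are hypothetical, and the script only prints whatever timings your machine produces.

```python
import timeit
import unicodedata
from typing import Tuple


def compare_timings(form: str, text: str, number: int = 1000) -> Tuple[float, float]:
    """Hypothetical helper returning (is_normalized seconds, normalize-and-compare seconds).

    Requires Python 3.8+, where unicodedata.is_normalized exists.
    """
    t_check = timeit.timeit(lambda: unicodedata.is_normalized(form, text), number=number)
    t_copy = timeit.timeit(lambda: unicodedata.normalize(form, text) == text, number=number)
    return t_check, t_copy


if __name__ == "__main__":
    # Illustrative inputs only: plain ASCII, and "e" followed by U+0301 COMBINING ACUTE ACCENT.
    samples = {
        "ascii": "hello world " * 1000,
        "decomposed": "e\u0301" * 1000,
    }
    for name, text in samples.items():
        for form in ("NFC", "NFD"):
            t_check, t_copy = compare_timings(form, text)
            print(f"{name:10} {form}: is_normalized {t_check:.4f}s, normalize+== {t_copy:.4f}s")
```

Including a decomposed sample matters for NFC, since U+0301 is one of the code points whose NFC quick-check value is MAYBE, the case discussed above.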
msg329276 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) - Date: 2018-11-04 23:58
New changeset 2810dd7be9876236f74ac80716d113572c9098dd by Benjamin Peterson (Max Bélanger) in branch 'master':
closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)
https://github.com/python/cpython/commit/2810dd7be9876236f74ac80716d113572c9098dd
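Code that must also run on interpreters older than 3.8 could prefer the merged `unicodedata.is_normalized` when it exists and fall back to copying otherwise. This is a minimal sketch; the wrapper name `is_normalized_compat` is hypothetical.

```python
import unicodedata


def is_normalized_compat(form: str, unistr: str) -> bool:
    """Hypothetical wrapper: use unicodedata.is_normalized (3.8+) if available."""
    check = getattr(unicodedata, "is_normalized", None)
    if check is not None:
        return check(form, unistr)
    # Older interpreters: build a normalized copy and compare.
    return unicodedata.normalize(form, unistr) == unistr


print(is_normalized_compat("NFKC", "\ufb01"))  # False: the "fi" ligature becomes "fi"
```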
History
Date | User | Action | Args
2022-04-11 14:58:55 | admin | set | github: 76466
2018-11-04 23:58:27 | benjamin.peterson | set | status: open -> closed; nosy: +benjamin.peterson; messages: +msg329276; resolution: fixed; stage: patch review -> resolved
2018-10-25 00:06:09 | Maxime Belanger | set | versions: + Python 3.8, - Python 3.7
2017-12-12 13:10:46 | vstinner | set | messages: +msg308127
2017-12-12 12:25:08 | steven.daprano | set | versions: - Python 2.7; nosy: +steven.daprano; messages: +msg308122; type: enhancement
2017-12-12 01:19:59 | maxbelanger | set | keywords: +patch; stage: patch review; pull_requests: +pull_request4703
2017-12-12 01:16:10 | Maxime Belanger | create |
Supported by The Python Software Foundation, powered by Roundup.
Copyright © 1990-2022, Python Software Foundation.
