unicodedata: is_normalized claims nothing is normalized in any form when using the 3.2.0 database #101372


Open


Labels: topic-unicode, type-bug (An unexpected behavior, bug, or error)

Description

zahlman opened on Jan 27, 2023

Bug report

Python 3.8 added the is_normalized function to the unicodedata module; it is also available as a method on the legacy unicodedata.ucd_3_2_0 database. It is supposed to check whether a string is equal to its normalization in a given form, without having to normalize and compare.
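As an illustration of that contract, here is a small sketch against the default (current) database: is_normalized(form, s) should always agree with normalize(form, s) == s. The names PRECOMPOSED, DECOMPOSED and contract_holds are illustrative only and are not part of unicodedata.

```python
import unicodedata

# "é" spelled as one precomposed code point (U+00E9) vs. "e" + U+0301 COMBINING ACUTE ACCENT
PRECOMPOSED = "caf\u00e9"
DECOMPOSED = "cafe\u0301"


def contract_holds(form, s, db=unicodedata):
    """True if db.is_normalized agrees with the slow normalize-and-compare check."""
    return db.is_normalized(form, s) == (db.normalize(form, s) == s)


for form in ("NFC", "NFD"):
    for s in (PRECOMPOSED, DECOMPOSED):
        # For these strings, NFC accepts only the precomposed spelling, NFD only the decomposed one
        print(form, ascii(s), unicodedata.is_normalized(form, s), contract_holds(form, s))
```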

However, the legacy version does not maintain the expected invariant. In fact, it reports that every single-character string is not normalized, regardless of the normalization form chosen. Presumably the result is the same for every non-empty string. (The empty string appears to work only because it is special-cased at lines 871-874.)

Example:

    >>> import unicodedata
    >>> unicodedata.ucd_3_2_0.normalize('NFC', '!') == '!'
    True
    >>> unicodedata.ucd_3_2_0.is_normalized('NFC', '!')
    False
    >>> any(unicodedata.ucd_3_2_0.is_normalized(form, chr(x)) for form in ('NFC', 'NFD', 'NFKC', 'NFKD') for x in range(0x110000))
    False

The bug appears to be at lines 801-804 of unicodedata.c:

    /* UCD 3.2.0 is requested, quickchecks must be disabled. */
    if (UCD_Check(self)) {
        return NO;
    }

I believe the NO should say MAYBE instead. The NO value appears to indicate that the quickcheck has determined that the string is not normalized, contrary to both the comment and the expected behaviour.
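To see why MAYBE is the right answer when quickchecks are disabled, here is a simplified Python model, written for illustration and not taken from unicodedata.c. It assumes the caller treats the quickcheck result as a tri-state: YES and NO are final answers, while MAYBE falls back to normalizing and comparing. QuickCheck and is_normalized_model are hypothetical names.

```python
import unicodedata
from enum import Enum


class QuickCheck(Enum):
    """Hypothetical mirror of the C-level YES / NO / MAYBE quickcheck result."""
    YES = "yes"      # the string is definitely normalized
    NO = "no"        # the string is definitely NOT normalized
    MAYBE = "maybe"  # undecided: the caller must normalize and compare


def is_normalized_model(form, s, quickcheck_result, db=unicodedata.ucd_3_2_0):
    """Simplified model of how is_normalized consumes a quickcheck result."""
    if quickcheck_result is QuickCheck.MAYBE:
        # Slow path: fall back to a full normalization and compare.
        return db.normalize(form, s) == s
    # Fast path: YES and NO are trusted as final answers.
    return quickcheck_result is QuickCheck.YES


# With quickchecks disabled for UCD 3.2.0, the early return decides the answer:
print(is_normalized_model("NFC", "!", QuickCheck.NO))     # False: what returning NO yields
print(is_normalized_model("NFC", "!", QuickCheck.MAYBE))  # True: what returning MAYBE would yield
```

Under this model, returning NO short-circuits the slow path for every non-empty string, which matches the behaviour reported above; returning MAYBE always defers to the 3.2.0 normalization tables.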

Your environment

    $ python
    Python 3.8.10 (default, Nov 14 2022, 12:59:47)
    [GCC 9.4.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
