C API: Add PyUnicode_EqualToUTF8() function #110289

Closed

Labels

topic-C-API, topic-unicode, type-feature (a feature request or enhancement)

Description

serhiy-storchaka opened on Oct 3, 2023

Feature or enhancement

There is a public PyUnicode_CompareWithASCIIString() function. Despite its name, it compares a Python string object with an ISO-8859-1 encoded C string. It returns -1, 0 or 1 and never sets an error.

There is a private _PyUnicode_EqualToASCIIString() function. It only works with an ASCII encoded C string and crashes in a debug build if the string is not ASCII. It returns 0 or 1 and never sets an error.

_PyUnicode_EqualToASCIIString() is more efficient than PyUnicode_CompareWithASCIIString(), because if the arguments are not equal it can simply return false instead of determining which one is larger. That was the main reason for introducing it. It is also more convenient, because you do not need to add == 0 or != 0 after the call (and if that is omitted, the code is difficult to read).
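The efficiency difference described above can be illustrated with a standalone sketch. It does not use the CPython API: ascii_str, ascii_compare() and ascii_equal() are hypothetical names, and the struct is only a stand-in for an ASCII-only Python string that knows its own length. The point is that an equality test can reject on a length mismatch alone, while a three-way comparison has to inspect the data to decide the ordering.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ASCII-only Python string:
 * a known length plus a byte buffer (not necessarily NUL-terminated). */
typedef struct {
    size_t length;
    const char *data;
} ascii_str;

/* Three-way comparison: returns -1, 0 or 1.
 * It has to look at the common prefix to decide the ordering,
 * even when the lengths already differ. */
int ascii_compare(const ascii_str *s, const char *cstr)
{
    size_t clen = strlen(cstr);
    size_t n = s->length < clen ? s->length : clen;
    int c = memcmp(s->data, cstr, n);
    if (c != 0) {
        return c < 0 ? -1 : 1;
    }
    if (s->length == clen) {
        return 0;
    }
    return s->length < clen ? -1 : 1;
}

/* Equality test: returns 0 or 1.
 * A length mismatch settles the answer without reading the data. */
int ascii_equal(const ascii_str *s, const char *cstr)
{
    size_t clen = strlen(cstr);
    return s->length == clen && memcmp(s->data, cstr, clen) == 0;
}
```

At a call site the equality form reads as a plain condition, `if (ascii_equal(&name, "name"))`, whereas the three-way form needs `== 0` appended to mean the same thing.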

I propose to add the latter function to the public C API, but also to extend it to support UTF-8 encoded C strings. While most use cases are ASCII-only, formally almost all C strings in the C API are UTF-8 encoded. PyUnicode_FromString() and PyUnicode_AsUTF8AndSize(), used to convert between Python and C strings, use UTF-8 encoding. PyTypeObject.tp_name, PyMethodDef.ml_name and PyDescrObject.d_name are all UTF-8 encoded. PyUnicode_CompareWithASCIIString() cannot be used to compare a Python string with such names.
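To make the encoding mismatch concrete, here is a standalone sketch that does not use the CPython API. A Python string is modelled as a plain array of code points, and the helper names (utf8_decode(), codepoints_equal_utf8(), codepoints_equal_latin1()) are hypothetical. Treating invalid UTF-8 as "not equal" instead of reporting an error is an assumption of this sketch, chosen to match the "never sets an error" behaviour of the existing functions; it is not taken from the proposal text.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from a NUL-terminated buffer.
 * Returns the number of bytes consumed, or 0 if the sequence is invalid
 * (bad lead byte, missing continuation byte, overlong form, surrogate,
 * or value above U+10FFFF). Never reads past the NUL terminator. */
size_t utf8_decode(const unsigned char *p, uint32_t *cp)
{
    size_t n, i;
    uint32_t c, min;

    if (p[0] < 0x80) {
        *cp = p[0];
        return 1;
    }
    if ((p[0] & 0xE0) == 0xC0)      { n = 2; c = p[0] & 0x1F; min = 0x80; }
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; c = p[0] & 0x0F; min = 0x800; }
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; c = p[0] & 0x07; min = 0x10000; }
    else return 0;

    for (i = 1; i < n; i++) {
        if ((p[i] & 0xC0) != 0x80) {
            return 0;               /* also catches an early NUL */
        }
        c = (c << 6) | (p[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        return 0;
    }
    *cp = c;
    return n;
}

/* Equality against a UTF-8 encoded C string: returns 0 or 1.
 * Assumption of this sketch: invalid UTF-8 simply compares unequal. */
int codepoints_equal_utf8(const uint32_t *cps, size_t len, const char *utf8)
{
    const unsigned char *p = (const unsigned char *)utf8;
    size_t i;

    for (i = 0; i < len; i++) {
        uint32_t cp;
        size_t n;
        if (*p == 0) {
            return 0;               /* C string is shorter */
        }
        n = utf8_decode(p, &cp);
        if (n == 0 || cp != cps[i]) {
            return 0;
        }
        p += n;
    }
    return *p == 0;                 /* equal only if the C string ends too */
}

/* Equality under an ISO-8859-1 reading: every byte is one code point. */
int codepoints_equal_latin1(const uint32_t *cps, size_t len, const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i;

    for (i = 0; i < len; i++) {
        if (p[i] == 0 || p[i] != cps[i]) {
            return 0;
        }
    }
    return p[len] == 0;
}
```

With the code points of "café" (U+0063, U+0061, U+0066, U+00E9) and the UTF-8 bytes "caf\xC3\xA9", the UTF-8 comparison reports equality, while the ISO-8859-1 reading sees two separate characters in place of é and reports a mismatch. That is the situation a UTF-8 encoded name such as tp_name would run into with a Latin-1 based comparison.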

For PyASCIIObject objects the new function will be as fast as _PyUnicode_EqualToASCIIString().

Linked PRs

gh-110289: C API: Add PyUnicode_EqualToUTF8() function #110297
