
gh-105156: Deprecate the old Py_UNICODE type in C API#105157


Merged

vstinner merged 2 commits into python:main from vstinner:deprecate_py_unicode

Jun 1, 2023

Conversation

Member

vstinner commented May 31, 2023 (edited by github-actions bot)

Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead.

Replace Py_UNICODE with wchar_t in multiple C files.

Issue: C API: Deprecate Py_UNICODE type #105156

📚 Documentation preview 📚: https://cpython-previews--105157.org.readthedocs.build/

gh-105156: Deprecate the old Py_UNICODE type in C API

3165ff7

Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead. Replace Py_UNICODE with wchar_t in multiple C files.

bedevere-bot added the awaiting core review label

May 31, 2023

bedevere-bot mentioned this pull request

May 31, 2023

C API: Deprecate Py_UNICODE type #105156

Closed

Member, Author

vstinner commented May 31, 2023

cc @methane

Member

methane commented May 31, 2023

Sourcegraph results:

It seems two releases is not enough for removing Py_UNICODE. But let's see it two years later.

methane approved these changes

May 31, 2023

bedevere-bot added awaiting merge and removed awaiting core review labels

May 31, 2023

methane reviewed

May 31, 2023

Include/cpython/unicodeobject.h (outdated, resolved)

Member, Author

vstinner commented May 31, 2023

> It seems two releases is not enough for removing Py_UNICODE. But let's see it two years later.

This PR is mostly about deprecation. I prefer to announce the Python release in which these types will be removed: Python 3.15. But we will have to redo this usage study before the types are removed for real.

The warning should help users find old code that still uses Py_UNICODE, whether by mistake or not.
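
To illustrate how such a warning surfaces, here is a minimal sketch in plain C that does not include Python.h. The names DEMO_DEPRECATED, demo_old_unicode_t, demo_old_len, demo_len and DEMO_USE_OLD_TYPE are hypothetical stand-ins invented for this example; CPython's own headers use the Py_DEPRECATED() macro, which this sketch does not reproduce. It assumes GCC or Clang for the attribute and falls back to no diagnostic elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for a deprecation macro (assumption: GCC/Clang). */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_DEPRECATED __attribute__((__deprecated__))
#else
#  define DEMO_DEPRECATED
#endif

/* Hypothetical legacy alias, marked deprecated. Declaring it is silent;
   only code that names the alias triggers the compiler diagnostic. */
DEMO_DEPRECATED typedef wchar_t demo_old_unicode_t;

#ifdef DEMO_USE_OLD_TYPE
/* Building with -DDEMO_USE_OLD_TYPE makes GCC and Clang warn on this line,
   which is how old code gets noticed. */
size_t demo_old_len(const demo_old_unicode_t *s)
{
    return wcslen(s);
}
#endif

/* Migrated code names wchar_t directly and compiles without a warning. */
size_t demo_len(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}
```

Because the alias is only a typedef of wchar_t, switching a signature from the alias to wchar_t changes neither the type nor the calling convention; only the spelling changes.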

Member

methane commented May 31, 2023

Fix here too.
https://github.com/python/cpython/pull/105157/files#file-modules-posixmodule-c-L5653

Member, Author

vstinner commented May 31, 2023

Sourcegraph results: Py_UNICODE

The first result is Py_UNICODE *inp = PyUnicode_AS_UNICODE(in);. This code is already broken by Python 3.12: the function got removed.

Update Include/cpython/unicodeobject.h

93f06f7

Co-authored-by: Inada Naoki <songofacandy@gmail.com>

Member, Author

vstinner commented May 31, 2023

> Fix here too. https://github.com/python/cpython/pull/105157/files#file-modules-posixmodule-c-L5653

I planned to write a separate PR for code generated by Argument Clinic. It's now done with: PR #105161.

Member, Author

vstinner commented May 31, 2023

I will wait until the 2 other PRs of this issue are merged, to avoid emitting new compiler warnings.

Member

arhadthedev commented May 31, 2023

> use wchar_t instead.

Can we use char16_t from C11? Docs: https://en.cppreference.com/w/c/string/multibyte/char16_t.

It would avoid 2-vs-4-byte size discrepancy.

Member

methane commented May 31, 2023

> Can we use char16_t from C11? Docs: https://en.cppreference.com/w/c/string/multibyte/char16_t.
> It would avoid 2-vs-4-byte size discrepancy.

Where?

Py_UNICODE has been wchar_t since Python 3.3.
So users should use wchar_t where Py_UNICODE was required before.

Where Py_UNICODE was not required, my recommendation is "use UTF-8 always".

Member

arhadthedev commented May 31, 2023

Ah, I get it now: the parent issue is about removing a thin, and thus unnecessary, typedef, not about changing the multibyte machinery for the next major version of CPython.

Member

arhadthedev commented May 31, 2023

Initially I've got an impression that the PEP-393 removal of Py_UNICODE leaves the C API without a wide character type at all (so we need to fill the gap with any other wide char type).

Now I see that this would require a PEP before the removal.

Member, Author

vstinner commented Jun 1, 2023

> Can we use char16_t from C11?

That would be wrong. Python has many C functions which really expect 16-bit or 32-bit wchar_t like PyUnicode_FromWideChar().
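
The width difference matters in practice because the same code point occupies a different number of wchar_t units depending on the platform. The following is a rough sketch in plain C, not CPython code: demo_encode_wchar is a hypothetical helper, and it assumes that a 2-byte wchar_t holds UTF-16 code units while any wider wchar_t holds the code point directly. It does not reject lone surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: store one code point into out[], which must have
   room for 2 units. Returns the number of wchar_t units written.
   Assumption: 2-byte wchar_t means UTF-16, otherwise one unit per code
   point. */
size_t demo_encode_wchar(uint32_t cp, wchar_t *out)
{
    assert(out != NULL);
    assert(cp <= 0x10FFFF);

    if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
        /* Non-BMP code point on a 16-bit wchar_t: use a surrogate pair. */
        uint32_t v = cp - 0x10000;
        out[0] = (wchar_t)(0xD800 + (v >> 10));
        out[1] = (wchar_t)(0xDC00 + (v & 0x3FF));
        return 2;
    }

    /* BMP code point, or a wchar_t wide enough for any code point. */
    out[0] = (wchar_t)cp;
    return 1;
}
```

With a 32-bit wchar_t, U+1F600 takes one unit; with a 16-bit wchar_t it takes two. Code that hard-codes a 16-bit type would therefore not match what functions taking wchar_t expect on every platform.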

> Initially I've got an impression that the PEP-393 removal of Py_UNICODE leaves the C API without a wide character type at all

There is Py_UCS4 which should be 32-bit and is able to store all Unicode characters.

> Where Py_UNICODE was not required, my recommendation is "use UTF-8 always".

Right. The PEP 393 implementation first added many functions using Py_UCS4 arrays. It was inefficient since most of the time, all code points could be stored in Py_UCS1 arrays (4x smaller). Many strings are just ASCII. There are now more memory efficient structures. I also wrote the private _PyUnicodeWriter API to change the internal storage depending on the maximum code point.
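
The idea of choosing storage by the maximum code point can be sketched in a few lines of plain C. This is only an illustration under stated assumptions, not the _PyUnicodeWriter implementation: demo_char_width and demo_storage_bytes are hypothetical names, and the thresholds simply mirror the 1-, 2- and 4-byte widths of Py_UCS1, Py_UCS2 and Py_UCS4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes per character needed so that every code
   point in cps[0..n) fits (1, 2 or 4). */
size_t demo_char_width(const uint32_t *cps, size_t n)
{
    uint32_t max_cp = 0;

    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cps[i] > max_cp) {
            max_cp = cps[i];
        }
    }
    if (max_cp < 0x100) {
        return 1;   /* fits in a 1-byte unit */
    }
    if (max_cp < 0x10000) {
        return 2;   /* fits in a 2-byte unit */
    }
    return 4;       /* needs a 4-byte unit */
}

/* Total payload size when all characters share the narrowest width. */
size_t demo_storage_bytes(const uint32_t *cps, size_t n)
{
    return n * demo_char_width(cps, n);
}
```

For a three-character ASCII string this yields 3 bytes of payload instead of the 12 bytes an array of 4-byte units would need, which is the 4x saving mentioned above.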