gh-105156: Deprecate the old Py_UNICODE type in C API #105157


Merged
vstinner merged 2 commits into python:main from vstinner:deprecate_py_unicode
Jun 1, 2023

Conversation

@vstinner
Member

@vstinner commented May 31, 2023

Deprecate the old Py_UNICODE and PY_UNICODE_TYPE types in the C API: use wchar_t instead.

Replace Py_UNICODE with wchar_t in multiple C files.


📚 Documentation preview 📚: https://cpython-previews--105157.org.readthedocs.build/

@vstinner
Member, Author

cc @methane

@methane
Member

Sourcegraph results:

It seems two releases are not enough for removing Py_UNICODE. But let's revisit this in two years.

@vstinner
Member, Author

> It seems two releases are not enough for removing Py_UNICODE. But let's revisit this in two years.

This PR is mostly about deprecation. I prefer to announce the Python release in which these types will be removed: Python 3.15. But we will have to redo this usage study before the types are removed for real.

The warning should help users find old code that still uses Py_UNICODE, whether by mistake or not.
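
How a deprecated typedef surfaces such warnings can be sketched outside CPython as follows. This is only an illustration under stated assumptions: `MY_DEPRECATED`, `OLD_UNICODE`, `count_units` and the `SHOW_DEPRECATION_WARNING` switch are hypothetical names, not CPython's actual macros, and the attribute syntax assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical deprecation marker; expands to nothing on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#  define MY_DEPRECATED __attribute__((deprecated))
#else
#  define MY_DEPRECATED
#endif

/* Hypothetical stand-in for a deprecated alias of wchar_t. */
MY_DEPRECATED typedef wchar_t OLD_UNICODE;

/* New code spells the type directly and compiles without warnings. */
size_t count_units(const wchar_t *s)
{
    return wcslen(s);
}

#ifdef SHOW_DEPRECATION_WARNING
/* Old code: naming the alias makes the compiler report a deprecation
   warning, which is how leftover uses get noticed. */
size_t count_units_old(const OLD_UNICODE *s)
{
    return wcslen(s);
}
#endif
```

Compiling with `-DSHOW_DEPRECATION_WARNING` enables the old-style function so the warning can be observed; without it, only the `wchar_t` version is built.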

@methane
Member

@vstinner
Member, Author

Sourcegraph results: Py_UNICODE

The first result is: Py_UNICODE *inp = PyUnicode_AS_UNICODE(in); This code is already broken in Python 3.12: the function was removed.

Co-authored-by: Inada Naoki <songofacandy@gmail.com>
@vstinner
Member, Author

Fix here too. https://github.com/python/cpython/pull/105157/files#file-modules-posixmodule-c-L5653

I planned to write a separate PR for the code generated by Argument Clinic. It's now done: PR #105161.

@vstinner
Member, Author

I will wait until the 2 other PRs for this issue are merged, to avoid emitting new compiler warnings.

@arhadthedev
Member

> use wchar_t instead.

Can we use char16_t from C11? Docs: https://en.cppreference.com/w/c/string/multibyte/char16_t.

It would avoid the 2-vs-4-byte size discrepancy.

@methane
Member

> Can we use char16_t from C11? Docs: https://en.cppreference.com/w/c/string/multibyte/char16_t.
>
> It would avoid the 2-vs-4-byte size discrepancy.

Where?

Py_UNICODE has been wchar_t since Python 3.3.
So users should use wchar_t where Py_UNICODE was required before.

Where Py_UNICODE was not required, my recommendation is "use UTF-8 always".


@arhadthedev
Member

Ah, I see now: the parent issue is about removing a thin and thus unnecessary typedef, not about changing the multibyte machinery for the next major version of CPython.

@arhadthedev
Member

Initially I got the impression that the PEP 393 removal of Py_UNICODE leaves the C API without a wide character type at all (so we would need to fill the gap with some other wide character type).

Now I see that this would require a PEP before the removal.

@vstinner
Member, Author

> Can we use char16_t from C11?

That would be wrong. Python has many C functions which really expect 16-bit or 32-bit wchar_t like PyUnicode_FromWideChar().
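
The size point can be made concrete with a small sketch, independent of CPython. The helper name `wchar_units_for` is hypothetical; it only assumes that a platform with 16-bit wchar_t represents code points above U+FFFF as a surrogate pair, while a 32-bit wchar_t holds any code point in one unit.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Number of wchar_t units needed to store one Unicode code point:
   1 when wchar_t is at least 32 bits wide or the code point is in the
   Basic Multilingual Plane, otherwise 2 (a surrogate pair). */
size_t wchar_units_for(uint32_t code_point)
{
    if (sizeof(wchar_t) >= 4 || code_point < 0x10000) {
        return 1;
    }
    return 2;
}
```

Because the answer depends on `sizeof(wchar_t)`, a fixed 16-bit type such as char16_t would not match what wchar_t-based functions expect on platforms where wchar_t is 32-bit.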

> Initially I got the impression that the PEP 393 removal of Py_UNICODE leaves the C API without a wide character type at all

There is Py_UCS4 which should be 32-bit and is able to store all Unicode characters.

> Where Py_UNICODE was not required, my recommendation is "use UTF-8 always".

Right. The PEP 393 implementation first added many functions using Py_UCS4 arrays. That was inefficient since, most of the time, all code points could be stored in Py_UCS1 arrays (4x smaller). Many strings are just ASCII. There are now more memory-efficient structures. I also wrote the private _PyUnicodeWriter API, which changes the internal storage depending on the maximum code point.
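
The idea of choosing storage from the maximum code point can be sketched as below. This is a simplified illustration, not CPython's implementation: `min_char_width` is a hypothetical helper, and plain `uint32_t` stands in for Py_UCS4. The thresholds follow the 1-, 2- and 4-byte representations described in PEP 393.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest per-character width, in bytes, able to hold every code point
   in the array: 1 (up to U+00FF), 2 (up to U+FFFF), or 4 otherwise.
   An empty array needs only the narrowest width. */
size_t min_char_width(const uint32_t *code_points, size_t count)
{
    uint32_t max_code_point = 0;
    for (size_t i = 0; i < count; i++) {
        if (code_points[i] > max_code_point) {
            max_code_point = code_points[i];
        }
    }
    if (max_code_point < 0x100) {
        return 1;
    }
    if (max_code_point < 0x10000) {
        return 2;
    }
    return 4;
}
```

For an ASCII-only string this returns 1, which is where the 4x saving over a Py_UCS4 array comes from.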

@vstinner merged commit 8ed705c into python:main on Jun 1, 2023
@vstinner deleted the deprecate_py_unicode branch on June 1, 2023 06:56
Reviewers

@methane approved these changes

4 participants
@vstinner @methane @arhadthedev @bedevere-bot
