This PEP proposes to remove the deprecated Py_UNICODE encoder APIs in Python 3.11:

- PyUnicode_Encode()
- PyUnicode_EncodeASCII()
- PyUnicode_EncodeLatin1()
- PyUnicode_EncodeUTF7()
- PyUnicode_EncodeUTF8()
- PyUnicode_EncodeUTF16()
- PyUnicode_EncodeUTF32()
- PyUnicode_EncodeUnicodeEscape()
- PyUnicode_EncodeRawUnicodeEscape()
- PyUnicode_EncodeCharmap()
- PyUnicode_TranslateCharmap()
- PyUnicode_EncodeDecimal()
- PyUnicode_TransformDecimalToASCII()

Note: PEP 623 proposes to remove the Unicode object APIs relating to Py_UNICODE. This PEP, on the other hand, does not concern the Unicode object. The two PEPs are split because they have different motivations and need different discussions.
In general, reducing the number of APIs that have been deprecated for a long time and have few users is a good idea: it not only improves the maintainability of CPython, but also helps API users and other Python implementations.
Py_UNICODE and the APIs using it have been deprecated since Python 3.3.
All of these APIs are implemented using PyUnicode_FromWideChar, so they are inefficient when the user wants to encode a Unicode object.
Among the top 4000 PyPI packages [1], only pyodbc uses these APIs:

- PyUnicode_EncodeUTF8()
- PyUnicode_EncodeUTF16()

pyodbc uses these APIs to encode a Unicode object into a bytes object, so it is easy to fix. [2]
There are alternative APIs that accept PyObject *unicode instead of Py_UNICODE *. Users can migrate to them.
| Deprecated API | Alternative APIs |
|---|---|
| PyUnicode_Encode() | PyUnicode_AsEncodedString() |
| PyUnicode_EncodeASCII() | PyUnicode_AsASCIIString() (1) |
| PyUnicode_EncodeLatin1() | PyUnicode_AsLatin1String() (1) |
| PyUnicode_EncodeUTF7() | (2) |
| PyUnicode_EncodeUTF8() | PyUnicode_AsUTF8String() (1) |
| PyUnicode_EncodeUTF16() | PyUnicode_AsUTF16String() (3) |
| PyUnicode_EncodeUTF32() | PyUnicode_AsUTF32String() (3) |
| PyUnicode_EncodeUnicodeEscape() | PyUnicode_AsUnicodeEscapeString() |
| PyUnicode_EncodeRawUnicodeEscape() | PyUnicode_AsRawUnicodeEscapeString() |
| PyUnicode_EncodeCharmap() | PyUnicode_AsCharmapString() (1) |
| PyUnicode_TranslateCharmap() | PyUnicode_Translate() |
| PyUnicode_EncodeDecimal() | (4) |
| PyUnicode_TransformDecimalToASCII() | (4) |
Notes:

(1) The const char *errors parameter is missing.
(2) There is no public alternative API; the generic PyUnicode_AsEncodedString() can be used instead.
(3) The const char *errors and int byteorder parameters are missing.
(4) There is no direct replacement; Py_UNICODE_TODECIMAL can be used instead. CPython uses _PyUnicode_TransformDecimalAndSpaceToASCII for converting from Unicode to numbers instead.

Remove these APIs in Python 3.11. They have been deprecated already.

- PyUnicode_Encode()
- PyUnicode_EncodeASCII()
- PyUnicode_EncodeLatin1()
- PyUnicode_EncodeUTF7()
- PyUnicode_EncodeUTF8()
- PyUnicode_EncodeUTF16()
- PyUnicode_EncodeUTF32()
- PyUnicode_EncodeUnicodeEscape()
- PyUnicode_EncodeRawUnicodeEscape()
- PyUnicode_EncodeCharmap()
- PyUnicode_TranslateCharmap()
- PyUnicode_EncodeDecimal()
- PyUnicode_TransformDecimalToASCII()

Replace Py_UNICODE* with PyObject*

As described in the “Alternative APIs” section, some APIs don’t have public alternative APIs accepting PyObject *unicode input, and some public alternative APIs have restrictions such as missing errors and byteorder parameters.
Instead of removing the deprecated APIs, we can reuse their names for alternative public APIs.
Since we already have private alternative APIs, this is just a matter of renaming them from their private names to the public, deprecated names.
| Rename to | Rename from |
|---|---|
| PyUnicode_EncodeASCII() | _PyUnicode_AsASCIIString() |
| PyUnicode_EncodeLatin1() | _PyUnicode_AsLatin1String() |
| PyUnicode_EncodeUTF7() | _PyUnicode_EncodeUTF7() |
| PyUnicode_EncodeUTF8() | _PyUnicode_AsUTF8String() |
| PyUnicode_EncodeUTF16() | _PyUnicode_EncodeUTF16() |
| PyUnicode_EncodeUTF32() | _PyUnicode_EncodeUTF32() |
Pros:
Cons:
- PyUnicode_AsEncodedString() can be used in other cases.

Replace Py_UNICODE* with Py_UCS4*

We can replace Py_UNICODE with Py_UCS4 and undeprecate these APIs.
The UTF-8, UTF-16, and UTF-32 encoders support Py_UCS4 internally, so PyUnicode_EncodeUTF8(), PyUnicode_EncodeUTF16(), and PyUnicode_EncodeUTF32() can avoid creating a temporary Unicode object.
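To make the idea of encoding straight from a UCS4 buffer concrete, here is a standalone sketch that writes UTF-8 bytes directly from an array of 32-bit code points, with no intermediate string object. It is not CPython's encoder. The function name ucs4_to_utf8 and its error convention are assumptions made for this illustration, and uint32_t stands in for Py_UCS4.

```c
#include <stddef.h>
#include <stdint.h>

/* Standalone sketch, not CPython code: encode `len` UCS4 code points
 * from `src` into `dst` as UTF-8. `dst` must have room for 4 * len
 * bytes. Returns the number of bytes written, or (size_t)-1 if a code
 * point is a surrogate or lies above U+10FFFF. */
static size_t
ucs4_to_utf8(const uint32_t *src, size_t len, unsigned char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        if (c < 0x80) {
            /* 1 byte: 0xxxxxxx */
            dst[n++] = (unsigned char)c;
        }
        else if (c < 0x800) {
            /* 2 bytes: 110xxxxx 10xxxxxx */
            dst[n++] = (unsigned char)(0xC0 | (c >> 6));
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        else if (c < 0x10000) {
            if (c >= 0xD800 && c <= 0xDFFF) {
                return (size_t)-1;  /* surrogates are not encodable */
            }
            /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            dst[n++] = (unsigned char)(0xE0 | (c >> 12));
            dst[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        else if (c <= 0x10FFFF) {
            /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
            dst[n++] = (unsigned char)(0xF0 | (c >> 18));
            dst[n++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        else {
            return (size_t)-1;  /* outside the Unicode range */
        }
    }
    return n;
}
```

Because each input element is already a full code point, the loop needs no surrogate-pair handling; that is what makes a temporary object unnecessary in this case.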
Pros:
- A temporary Unicode object can be avoided when encoding from Py_UCS4* into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs.

Cons:
Replace Py_UNICODE* with wchar_t*

We can replace Py_UNICODE with wchar_t. Since Py_UNICODE is already a typedef of wchar_t, this is the status quo.
On platforms where sizeof(wchar_t) == 4, we can avoid creating a temporary Unicode object when encoding from wchar_t* to bytes objects using the UTF-8, UTF-16, and UTF-32 codecs, as in the “Replace Py_UNICODE* with Py_UCS4*” idea.
Pros:
- A temporary Unicode object can be avoided when encoding from Py_UCS4* into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs on platforms where sizeof(wchar_t) == 4.

Cons:

- Although Windows uses wchar_t heavily, these APIs always need to create a temporary Unicode object there, because sizeof(wchar_t) == 2 on Windows.

In addition to the existing compiler warning, emitting a runtime DeprecationWarning is suggested.
But these APIs don't release the GIL for now, so emitting a warning from them is not safe. See this example:

    PyObject *u = PyList_GET_ITEM(list, i);  // u is borrowed reference.
    PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),
            PyUnicode_GET_SIZE(u), NULL);
    // Assumes u is still living reference.
    PyObject *t = PyTuple_Pack(2, u, b);
    Py_DECREF(b);
    return t;

If we emit a Python warning from PyUnicode_EncodeUTF8(), warning filters and other threads may change the list, and u can be a dangling reference after PyUnicode_EncodeUTF8() returns.
PyUnicode_DecodeASCII() and PyUnicode_DecodeUTF8() are used very widely. Deprecating them is not worthwhile.

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
Source: https://github.com/python/peps/blob/main/peps/pep-0624.rst
Last modified: 2025-02-01 08:55:40 GMT