
[C API] Add an efficient public PyUnicodeWriter API #119182

Closed
@vstinner

Description


Feature or enhancement

Creating a Python string object in an efficient way is complicated. Python has a private `_PyUnicodeWriter` API. It's being used by these projects:

Affected projects (5):

  • Cython (3.0.9)
  • asyncpg (0.29.0)
  • catboost (1.2.3)
  • frozendict (2.4.0)
  • immutables (0.20)

I propose making the API public to promote it and to help C extension maintainers write more efficient code to create Python string objects.

API:

```c
typedef struct PyUnicodeWriter PyUnicodeWriter;

PyAPI_FUNC(PyUnicodeWriter*) PyUnicodeWriter_Create(void);
PyAPI_FUNC(void) PyUnicodeWriter_Discard(PyUnicodeWriter *writer);
PyAPI_FUNC(PyObject*) PyUnicodeWriter_Finish(PyUnicodeWriter *writer);

PyAPI_FUNC(void) PyUnicodeWriter_SetOverallocate(
    PyUnicodeWriter *writer,
    int overallocate);

PyAPI_FUNC(int) PyUnicodeWriter_WriteChar(
    PyUnicodeWriter *writer,
    Py_UCS4 ch);
PyAPI_FUNC(int) PyUnicodeWriter_WriteUTF8(
    PyUnicodeWriter *writer,
    const char *str,  // decoded from UTF-8
    Py_ssize_t len);  // use strlen() if len < 0
PyAPI_FUNC(int) PyUnicodeWriter_Format(
    PyUnicodeWriter *writer,
    const char *format,
    ...);

// Write str(obj)
PyAPI_FUNC(int) PyUnicodeWriter_WriteStr(
    PyUnicodeWriter *writer,
    PyObject *obj);

// Write repr(obj)
PyAPI_FUNC(int) PyUnicodeWriter_WriteRepr(
    PyUnicodeWriter *writer,
    PyObject *obj);

// Write str[start:end]
PyAPI_FUNC(int) PyUnicodeWriter_WriteSubstring(
    PyUnicodeWriter *writer,
    PyObject *str,
    Py_ssize_t start,
    Py_ssize_t end);
```
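The API follows a create / write / finish-or-discard lifecycle. The sketch below mimics that ownership protocol in plain C with a hypothetical byte-oriented `toy_writer` (no CPython headers). It is not CPython's implementation: it copies raw bytes instead of decoding UTF-8, and its doubling growth policy is an assumption. Like `PyUnicodeWriter_WriteUTF8()`, a negative `len` means "use `strlen()`"; and, assuming the same convention as the `union_repr()` example in this issue, finishing consumes the writer while discarding is reserved for the error path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy writer: a growable byte buffer, NOT CPython's PyUnicodeWriter. */
typedef struct {
    char *buf;
    size_t len;
    size_t cap;
} toy_writer;

toy_writer *toy_writer_create(void)
{
    return calloc(1, sizeof(toy_writer));  /* empty buffer: len = cap = 0 */
}

/* Release the writer without producing a result (error path). */
void toy_writer_discard(toy_writer *w)
{
    if (w != NULL) {
        free(w->buf);
        free(w);
    }
}

/* Append `len` bytes of `str`; a negative `len` means "use strlen()".
   Returns 0 on success, -1 on memory error. Bytes are copied as-is. */
int toy_writer_write_utf8(toy_writer *w, const char *str, ptrdiff_t len)
{
    assert(w != NULL && str != NULL);
    size_t n = (len < 0) ? strlen(str) : (size_t)len;
    if (n == 0) {
        return 0;
    }
    if (w->len + n > w->cap) {
        size_t newcap = (w->cap != 0) ? w->cap : 16;
        while (newcap < w->len + n) {
            newcap *= 2;  /* assumed growth policy: doubling */
        }
        char *p = realloc(w->buf, newcap);
        if (p == NULL) {
            return -1;  /* writer is still valid; caller should discard it */
        }
        w->buf = p;
        w->cap = newcap;
    }
    memcpy(w->buf + w->len, str, n);
    w->len += n;
    return 0;
}

/* Return an exact-size, NUL-terminated copy and free the writer in all cases. */
char *toy_writer_finish(toy_writer *w)
{
    char *result = malloc(w->len + 1);
    if (result != NULL) {
        if (w->len > 0) {
            memcpy(result, w->buf, w->len);
        }
        result[w->len] = '\0';
    }
    toy_writer_discard(w);
    return result;
}
```

A caller would use it the way `union_repr()` uses the proposed API: check every write for `< 0`, jump to an error label on failure, and call the discard function only on that path.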

The internal writer buffer is overallocated by default. `PyUnicodeWriter_Finish()` truncates the buffer to the exact size if the buffer was overallocated.

Overallocation avoids quadratic complexity when adding short strings in a loop, since the buffer does not have to be resized on every write. Use `PyUnicodeWriter_SetOverallocate(writer, 0)` to disable overallocation just before the last write.
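To make the effect of overallocation concrete, here is a hypothetical cost model (not CPython code) that counts resizes and moved bytes when appending many 3-byte chunks such as `" | "`. The 25% extra capacity is an assumed policy chosen for illustration, and the model pessimistically assumes every resize moves the existing content.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the hypothetical cost model below. */
typedef struct {
    size_t resizes;        /* number of times the buffer had to grow */
    size_t bytes_copied;   /* existing bytes moved by those resizes (worst case) */
    size_t final_capacity; /* allocated size before a final trim */
} growth_stats;

/* Simulate appending `nwrites` chunks of `chunk` bytes each. When
   `overallocate` is nonzero, each resize reserves an extra 25% (an assumed
   policy); otherwise the buffer grows to exactly the size needed. */
growth_stats simulate_appends(size_t nwrites, size_t chunk, int overallocate)
{
    growth_stats st = {0, 0, 0};
    size_t len = 0, cap = 0;
    assert(chunk > 0);
    for (size_t i = 0; i < nwrites; i++) {
        size_t needed = len + chunk;
        if (needed > cap) {
            cap = overallocate ? needed + needed / 4 : needed;
            st.resizes++;
            st.bytes_copied += len;  /* pessimistic: the old content is moved */
        }
        len = needed;
    }
    st.final_capacity = cap;
    return st;
}
```

With 1,000 writes of 3 bytes, the exact-size policy resizes 1,000 times and moves 1,498,500 bytes in total, a cost that grows quadratically with the number of writes. The overallocating policy resizes 26 times, moves 13,047 bytes, and ends with 326 bytes of spare capacity, which is the kind of slack `PyUnicodeWriter_Finish()` trims away.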

The writer takes care of the internal buffer kind: Py_UCS1 (latin1), Py_UCS2 (BMP) or Py_UCS4 (full Unicode Character Set). It also implements an optimization when a single write is made using `PyUnicodeWriter_WriteStr()`: the writer then returns the string unchanged, without any copy.
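The buffer kind depends only on the largest code point written so far. A minimal sketch of that rule, assuming the writer starts in the narrowest kind and only ever widens; the helper names are hypothetical and are not CPython functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per character needed to store code point `ch`:
   1 = Py_UCS1 (latin1), 2 = Py_UCS2 (BMP), 4 = Py_UCS4. */
static int kind_for_char(uint32_t ch)
{
    assert(ch <= 0x10FFFF);
    if (ch <= 0xFF) {
        return 1;
    }
    if (ch <= 0xFFFF) {
        return 2;
    }
    return 4;
}

/* Hypothetical helper: the kind only widens as characters are written,
   so the final kind is the widest one required by any character. */
int buffer_kind_after_writes(const uint32_t *chars, size_t n)
{
    int kind = 1;  /* assumption: an empty writer starts as Py_UCS1 */
    for (size_t i = 0; i < n; i++) {
        int k = kind_for_char(chars[i]);
        if (k > kind) {
            kind = k;  /* a real writer would convert its existing buffer here */
        }
    }
    return kind;
}
```

Widening is one-way because narrowing would require rescanning everything already written, which is exactly the work the writer is meant to avoid.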


Example of usage (simplified code from Objects/unionobject.c):

```c
static PyObject *
union_repr(PyObject *self)
{
    unionobject *alias = (unionobject *)self;
    Py_ssize_t len = PyTuple_GET_SIZE(alias->args);

    PyUnicodeWriter *writer = PyUnicodeWriter_Create();
    if (writer == NULL) {
        return NULL;
    }

    for (Py_ssize_t i = 0; i < len; i++) {
        if (i > 0 && PyUnicodeWriter_WriteUTF8(writer, " | ", 3) < 0) {
            goto error;
        }
        PyObject *p = PyTuple_GET_ITEM(alias->args, i);
        if (PyUnicodeWriter_WriteRepr(writer, p) < 0) {
            goto error;
        }
    }
    return PyUnicodeWriter_Finish(writer);

error:
    PyUnicodeWriter_Discard(writer);
    return NULL;
}
```

Linked PRs
