python/cpython

[WIP] gh-129813, PEP 782: Add PyBytesWriter C API #131681

Closed

vstinner wants to merge 30 commits into python:main from vstinner:bytes_writer_size

vstinner (Member) commented Mar 24, 2025 • edited by bedevere-app (bot)

Add functions:

PyBytesWriter_Create()
PyBytesWriter_Discard()
PyBytesWriter_Finish()
PyBytesWriter_FinishWithSize()
PyBytesWriter_FinishWithEndPointer()
PyBytesWriter_Data()
PyBytesWriter_Allocated()
PyBytesWriter_SetSize()
PyBytesWriter_Resize()

Issue: [C API] PEP 782: Add PyBytesWriter API #129813

vstinner added the skip news label

Mar 24, 2025

bedevere-app (bot) mentioned this pull request

Mar 24, 2025

[C API] PEP 782: Add PyBytesWriter API #129813

Closed

vstinner force-pushed the bytes_writer_size branch from 459f3d8 to 9097e5f

March 24, 2025 16:55

pythongh-129813: Add PyBytesWriter C API (flavor with size)

e24d40e

Add functions:

* PyBytesWriter_Create()
* PyBytesWriter_Discard()
* PyBytesWriter_Finish()
* PyBytesWriter_FinishWithSize()
* PyBytesWriter_FinishWithEndPointer()
* PyBytesWriter_Data()
* PyBytesWriter_Allocated()
* PyBytesWriter_SetSize()
* PyBytesWriter_Resize()

vstinner force-pushed the bytes_writer_size branch from 9097e5f to e24d40e

March 24, 2025 16:56

vstinner changed the title ~~[WIP] gh-129813: Add PyBytesWriter C API (with size flavor)~~ [WIP] gh-129813: Add PyBytesWriter C API (flavor with size)

Mar 24, 2025

vstinner added 24 commits

March 25, 2025 13:07

Remove PyBytesWriter_SetSize()

8761a9b

Add tests

92e1294

* Add PyBytesWriter_GetSize()
* Rename:
  * PyBytesWriter_Data() => PyBytesWriter_GetData()
  * PyBytesWriter_Allocated() => PyBytesWriter_GetAllocated()

Add PyBytesWriter_WriteBytes()

eff71b5

Add PyBytesWriter_Format()

31c7ca7

Fix build on Windows

86d0fd9

Add PyBytesWriter_ResizeAndUpdatePointer() function

79fa5f8

Convert _PyBytes_FromIterator()

bf60f7f

Add _PyBytesWriter_CreateByteArray()

62a15be

Convert _PyBytes_FromHex().

Convert _PyBytes_FormatEx()

0a70d70

Rename PyBytesWriter_FinishWithPointer()

457e21a

Add PyBytesWriter_GrowAndUpdatePointer()

40ef4e1

Make PyBytesWriter_ResizeAndUpdatePointer() private

0313087

Make PyBytesWriter_GetAllocated() private

c8ac889

Don't overallocate for bytearray()

7095ac4

Move _PyBytesWriter_CreateByteArray() to the internal C API

befd574

Move code

3ba1d1c

Add examples

ede2776

Add high-level API example

be56685

Fix tests

1135390

fix linter

000ba58

Convert more functions

b864c26

Replace PyBytes_FromStringAndSize(NULL, 0) with Py_GetConstant(Py_CONSTANT_EMPTY_BYTES).

Convert _hashopenssl function

6d7e37d

Detect strlen() overflow

d8a4659

Fix mmap

ed00f95

vstinner added 2 commits

March 31, 2025 18:49

Grow() can now shrink the buffer

6307895

Fix WriteBytes()

18d41ff

vstinner changed the title ~~[WIP] gh-129813: Add PyBytesWriter C API (flavor with size)~~ [WIP] gh-129813, PEP 782: Add PyBytesWriter C API

Apr 2, 2025

Merge branch 'main' into bytes_writer_size

4cf51f3

vstinner (Member, Author) commented Apr 22, 2025 • edited

This change has no impact on performance, even though the new public API allocates memory on the heap instead of on the stack. It uses a freelist to optimize PyBytesWriter_Create().
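To make the freelist remark concrete, here is a minimal conceptual sketch in Python of how a freelist can avoid repeated allocations. It is an illustration only, not CPython's implementation: the `Writer` and `WriterFreelist` names and the `max_size` limit are hypothetical, and the actual freelist for PyBytesWriter is written in C.

```python
class Writer:
    """Hypothetical stand-in for a heap-allocated writer object."""

    def __init__(self):
        self.buffer = bytearray()


class WriterFreelist:
    """Keep a few released writers around so create() can reuse them."""

    def __init__(self, max_size=1):
        self._free = []
        self._max_size = max_size

    def create(self):
        # Reuse a cached writer when one is available, else allocate one.
        if self._free:
            return self._free.pop()
        return Writer()

    def release(self, writer):
        # Reset the writer and cache it, unless the freelist is full.
        writer.buffer.clear()
        if len(self._free) < self._max_size:
            self._free.append(writer)


freelist = WriterFreelist()
first = freelist.create()
first.buffer += b"abc"
freelist.release(first)
second = freelist.create()  # reuses the object released above
```

In this sketch the second `create()` call returns the object that was just released, so a create/release cycle in a hot loop pays for a single allocation.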

Microbenchmark on 3 functions, to compare the private _PyBytesWriter (ref) to the new public PyBytesWriter (change):

bytes(list)
bytes.fromhex(str)
binascii.b2a_uu(bytes)

    import pyperf
    import binascii

    runner = pyperf.Runner()
    runner.bench_func('from list 100', bytes, list(b'x' * 100))
    runner.bench_func('from list 1,000', bytes, list(b'x' * 1_000))
    runner.bench_func('from hex 100', bytes.fromhex, bytes(range(100)).hex())
    runner.bench_func('from hex 1,000', bytes.fromhex, (b'x' * 1_000).hex())
    runner.bench_func('b2a_uu', binascii.b2a_uu, b'x' * 45)

Result:

| Benchmark | ref | change |
|---|---|---|
| from list 100 | 631 ns | 623 ns: 1.01x faster |
| from hex 100 | 141 ns | 145 ns: 1.03x slower |
| from hex 1,000 | 1.03 us | 1.04 us: 1.00x slower |
| b2a_uu | 112 ns | 111 ns: 1.01x faster |
| Geometric mean | (ref) | 1.00x slower |

Benchmark hidden because not significant (1): from list 1,000
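For readers unfamiliar with the three benchmarked calls, the following sketch shows what each one produces on a tiny input. It uses only the standard library and is not part of the PR; the variable names are chosen here for illustration.

```python
import binascii

# bytes(list): each list item is an integer in range(256).
from_list = bytes(list(b'x' * 4))

# bytes.fromhex(str): two hexadecimal digits per output byte.
from_hex = bytes.fromhex((b'x' * 4).hex())

# binascii.b2a_uu(bytes): one uuencoded line, for at most 45 input bytes.
uu_line = binascii.b2a_uu(b'x' * 45)

print(from_list, from_hex, len(uu_line))
```

All three calls build a new bytes object whose final size is either known up front or easy to bound, which is the use case the writer API targets.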

vstinner (Member, Author) commented Apr 22, 2025

Benchmark comparing PyBytes_FromStringAndSize(NULL, length) (ref) to PyBytesWriter_Create() (change).

Benchmark:

    import pyperf

    SIZES = (10, 100, 500)

    runner = pyperf.Runner()

    for size in SIZES:
        large_int = (2 ** (size * 8) - 1)
        runner.bench_func(f'to_bytes({size})', large_int.to_bytes, size)

    for size in SIZES:
        mem = memoryview(b'x' * size)
        runner.bench_func(f'memoryview({size}).tobytes()', mem.tobytes)

Result:

| Benchmark | ref | change |
|---|---|---|
| to_bytes(10) | 56.3 ns | 66.4 ns: 1.18x slower (+10.1 ns) |
| to_bytes(100) | 152 ns | 162 ns: 1.06x slower (+10 ns) |
| to_bytes(500) | 563 ns | 559 ns: 1.01x faster (+4 ns) |
| memoryview(10).tobytes() | 37.5 ns | 47.0 ns: 1.25x slower (+9.5 ns) |
| memoryview(100).tobytes() | 35.3 ns | 46.6 ns: 1.32x slower (+11.3 ns) |
| memoryview(500).tobytes() | 45.5 ns | 55.3 ns: 1.21x slower (+9.8 ns) |
| Geometric mean | (ref) | 1.16x slower |

It's hard to beat PyBytes_FromStringAndSize(NULL, length) performance, since PyBytesWriter_Create() is a wrapper built on top of PyBytes_FromStringAndSize(NULL, length).

There is an overhead of around 10 ns when using PyBytesWriter.
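The summary figures can be rechecked from the table above. The sketch below recomputes the geometric mean of the change/ref ratios and the median absolute difference from the rounded timings. The use of the `statistics` module is a choice made here, not necessarily how pyperf computes its summary, so the geometric mean should land close to the reported 1.16x rather than match it digit for digit.

```python
import statistics

# (ref, change) timings in nanoseconds, copied from the table above.
timings_ns = {
    'to_bytes(10)': (56.3, 66.4),
    'to_bytes(100)': (152.0, 162.0),
    'to_bytes(500)': (563.0, 559.0),
    'memoryview(10).tobytes()': (37.5, 47.0),
    'memoryview(100).tobytes()': (35.3, 46.6),
    'memoryview(500).tobytes()': (45.5, 55.3),
}

# Ratio > 1 means the change is slower than the reference.
ratios = [change / ref for ref, change in timings_ns.values()]
# Absolute overhead of the change, in nanoseconds.
deltas_ns = [change - ref for ref, change in timings_ns.values()]

geometric_mean = statistics.geometric_mean(ratios)
median_delta_ns = statistics.median(deltas_ns)

print(f'geometric mean: {geometric_mean:.2f}x')
print(f'median delta: {median_delta_ns:.1f} ns')
```

The median is used for the absolute overhead because the to_bytes(500) row is within noise and would skew a plain average.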

vstinner mentioned this pull request

Apr 22, 2025

PEP 782: Add PyBytesWriter C API capi-workgroup/decisions#62

Closed

5 tasks

serhiy-storchaka (Member) commented May 6, 2025

Could you please benchmark the following?

ASCII, Latin1 and UTF-8 encoders. For ASCII-only and non-ASCII data.
The backslashreplace and xmlcharrefreplace error handlers (encoding).
PyBytes_FromFormat(). Especially with few % formats and large raw data between them.
PyBytes_DecodeEscape().

vstinner (Member, Author) commented May 6, 2025

I wrote a big PR to show what PEP 782 would look like and how it would be used. But if PEP 782 is accepted, I will start by adding only the API, without using it. Then I will write separate changes to use the new API and run benchmarks on each change.

> ASCII, Latin1 and UTF-8 encoders. For ASCII-only and non-ASCII data.

I didn't modify these encoders; they still use the private _PyBytesWriter API.

> The backslashreplace and xmlcharrefreplace error handlers (encoding).

Same.

If I modify these encoders and error handlers later, I will run benchmarks to decide if it's acceptable to use the public API or not.

vstinner (Member, Author) commented May 6, 2025

Microbenchmark on PyBytes_FromFormat() and PyBytes_DecodeEscape() functions.

Details

    import pyperf
    runner = pyperf.Runner()

    import ctypes
    from ctypes import pythonapi, py_object
    from ctypes import (
        c_int, c_uint, c_long, c_ulong, c_size_t, c_ssize_t, c_char_p)

    PyBytes_FromFormat = pythonapi.PyBytes_FromFormat
    PyBytes_FromFormat.argtypes = (c_char_p,)
    PyBytes_FromFormat.restype = py_object

    PyBytes_DecodeEscape = pythonapi.PyBytes_DecodeEscape
    PyBytes_DecodeEscape.argtypes = (c_char_p, c_size_t, c_char_p, c_size_t, c_char_p)
    PyBytes_DecodeEscape.restype = py_object

    runner.bench_func('Format hello world', PyBytes_FromFormat, b'Hello %s !', b'world')

    fmt = (b'Hell%c' + b' ' * 1024 + b' %s')
    runner.bench_func('Format long format', PyBytes_FromFormat, fmt, c_int(ord('o')), b'world')

    s = b'abc\\ndef\\x40.'
    runner.bench_func('Decode simple', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')

    s = b'x' * 1024
    runner.bench_func('Decode long copy', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')

    s = b'\\x40' * 1024
    runner.bench_func('Decode long \\x40', PyBytes_DecodeEscape, s, len(s), None, 0, b'unused')

Results:

| Benchmark | ref | pep782 |
|---|---|---|
| Format long format | 1.06 us | 1.04 us: 1.02x faster |
| Decode simple | 776 ns | 743 ns: 1.04x faster |
| Decode long copy | 1.38 us | 1.34 us: 1.03x faster |
| Decode long \x40 | 2.70 us | 2.67 us: 1.01x faster |
| Geometric mean | (ref) | 1.02x faster |

Benchmark hidden because not significant (1): Format hello world

I'm not sure why PEP 782 is faster, but at least it's not slower :-)

I built Python with gcc -O3 (without PGO, LTO, or CPU isolation).

vstinner added 2 commits

August 13, 2025 15:02

Merge branch 'main' into bytes_writer_size

93f8447

Merge branch 'main' into bytes_writer_size

a261a43

vstinner mentioned this pull request

Sep 12, 2025

gh-129813, PEP 782: Add PyBytesWriter_Format() #138824

Merged

vstinner closed this

Sep 12, 2025

vstinner (Member, Author) commented Sep 12, 2025

I started to split this huge PR into smaller PRs; see the PRs attached to issue #129813.

vstinner deleted the bytes_writer_size branch

December 3, 2025 15:37

Reviewers

Awaiting requested review. Each reviewer below is a code owner and will be requested when the pull request is marked ready for review:

corona10
erlend-aasland
serhiy-storchaka
gpshead
picnixz
ericsnowcurrently
ZeroIntensity

Labels

skip news
