pyperformance (with `--enable-optimizations` and `--with-lto`)

main.json
=========

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision c600310663
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-05-30 16:28:48.633929
End date: 2025-05-30 16:29:21.986698

feature.json
============

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision 566637c24a
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-05-30 16:29:28.723283
End date: 2025-05-30 16:29:59.110914

### json_loads ###

Mean +- std dev: 34.2 us +- 7.3 us -> 30.7 us +- 0.9 us: 1.11x faster
Significant (t=5.22)

jsonyx-performance-tests (with `--enable-optimizations` and `--with-lto`)

decode	main	feature	difference
Dict with 65,536 booleans	12295.20 μs	12303.52 μs	no difference
List of 65,536 empty strings	2220.59 μs	1891.46 μs	1.17x faster
List of 65,536 ASCII strings	7524.63 μs	7094.07 μs	1.06x faster
List of 65,536 strings	168058.21 μs	179960.84 μs	1.07x slower

Issue: Using the public PyUnicodeWriter C API made the json module slower #133968

Use private unicode writer for json

9ea12ad

ZeroIntensity (Member) commented May 10, 2025

What's the point? This just adds more maintenance if we make changes to how PyUnicodeWriter works.

nineteendo (Contributor, Author) commented May 10, 2025

The public PyUnicodeWriter calls might have caused a performance regression compared to 3.13: faster-cpython/ideas#726
I'm still benchmarking, but I wanted to run the tests already.

nineteendo added 5 commits

May 10, 2025 17:00

Part 2

46df04f

Restore fast path for integers

ab1aa42

Reduce diff

d18c455

Include necessary headers

51c760f

Use PyUnicodeWriter_WriteRepr

72ae3d0

nineteendo (Contributor, Author) commented May 10, 2025

cc @vstinner, @mdboom

nineteendo marked this pull request as ready for review

May 11, 2025 15:38

bedevere-appbot added the awaiting review label

May 11, 2025

ZeroIntensity requested changes

May 11, 2025

ZeroIntensity (Member) left a comment

I don't think this is a good idea.

json optimizations have been rejected in the past; use ujson or something like that if performance is critical.
If we change how PyUnicodeWriter works, it adds more maintenance, especially if we use this as precedent for doing this elsewhere.
We should aim for optimizing PyUnicodeWriter as a broader change, not speed up each individual case by using the private API.

bedevere-appbot added awaiting core review and removed awaiting review labels

May 11, 2025

vstinner reviewed

May 11, 2025

vstinner (Member) left a comment

Would it be possible to only replace PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString()? Do you get similar performance in this case?

vstinner (Member) commented May 11, 2025

ZeroIntensity (Member) commented May 11, 2025

If _PyUnicodeWriter_WriteASCIIString is significantly faster than PyUnicodeWriter_WriteUTF8, then we should expose it as a public API.

vstinner (Member) commented May 11, 2025

If _PyUnicodeWriter_WriteASCIIString is significantly faster than PyUnicodeWriter_WriteUTF8, then we should expose it as a public API.

I chose not to expose it since it generates an invalid string if the input contains non-ASCII characters. But yeah, maybe we should expose it. For best performance, the function only validates the input string in debug mode.
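The trade-off described above (an ASCII writer that trusts its input and only checks it in debug builds, versus a UTF-8 writer that always decodes) can be sketched with a toy pure-Python model. `ToyWriter`, `write_ascii`, and `write_utf8` are hypothetical names, this is not CPython's `PyUnicodeWriter` implementation, and the `debug` flag merely stands in for a debug build.

```python
class ToyWriter:
    """Toy string writer; hypothetical, NOT CPython's PyUnicodeWriter."""

    def __init__(self, debug: bool = False) -> None:
        self._parts: list[str] = []
        self._debug = debug  # stands in for a debug build of CPython

    def write_utf8(self, data: bytes) -> None:
        # Full UTF-8 decode: rejects malformed input and handles
        # multi-byte sequences, at the cost of extra work per call.
        self._parts.append(data.decode("utf-8"))

    def write_ascii(self, data: bytes) -> None:
        # Fast path: the caller promises every byte is < 0x80.
        # Only the "debug build" checks that promise.
        if self._debug:
            assert data.isascii(), "write_ascii() got non-ASCII input"
        # Map each byte to one character without validation, so
        # non-ASCII input silently produces the wrong text.
        self._parts.append(data.decode("latin-1"))

    def finish(self) -> str:
        return "".join(self._parts)


ok = ToyWriter()
ok.write_ascii(b"true")               # safe: pure ASCII literal
ok.write_utf8("é".encode("utf-8"))    # arbitrary text goes through the checked path
print(ok.finish())                    # trueé

bad = ToyWriter()
bad.write_ascii("é".encode("utf-8"))  # broken promise: two non-ASCII bytes
print(bad.finish())                   # Ã© (mojibake instead of é)
```

In a JSON encoder, literals such as `true`, `false`, and `null` are known to be ASCII, which is presumably why a trusted ASCII path pays off for the boolean benchmarks.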

nineteendo (Contributor, Author) commented May 11, 2025 (edited)

Would it be possible to only replace PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString()? Do you get similar performance in this case?

Maybe, but my current benchmark has too much overhead to measure this accurately. I'll have to rewrite it first.

I hope we can figure out how to get the performance of the public API very close to the private one, such that everyone feels comfortable using it.

nineteendo (Contributor, Author) commented May 11, 2025

I updated the benchmark, but I don't understand why:

writing integers is now 20% faster
reading and writing unicode strings is now 2-5% slower (shouldn't be caused by noise)

Does this have something to do with the overallocate parameter?

vstinner (Member) commented May 11, 2025

Does this have something to do with the overallocate parameter?

The private API doesn't enable overallocation by default.

vstinner (Member) commented May 12, 2025

I replaced PyUnicodeWriter_WriteUTF8() with _PyUnicodeWriter_WriteASCIIString() in Modules/_json.c and ran a benchmark:

Benchmark	ref	write_ascii
encode 100 booleans	9.54 us	8.83 us: 1.08x faster
encode 1000 booleans	60.8 us	53.1 us: 1.15x faster
encode escaped string len=896	4.11 us	4.10 us: 1.00x faster
encode 10000 booleans	569 us	487 us: 1.17x faster
encode 10000 integers	1.03 ms	1.03 ms: 1.00x slower
encode 10000 floats	2.11 ms	2.13 ms: 1.01x slower
Geometric mean	(ref)	1.02x faster

Benchmark hidden because not significant (15): encode 100 integers, encode 100 floats, encode 100 "ascii" strings, encode ascii string len=100, encode escaped string len=128, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode Unicode string len=1000, encode 10000 "ascii" strings, encode ascii string len=10000, encode escaped string len=9984, encode Unicode string len=10000

I built Python with `./configure && make` and used CPU isolation on Linux.

Benchmark code:

```python
import json
import pyperf

runner = pyperf.Runner()
for count in (100, 1_000, 10_000):
    runner.bench_func(f'encode {count} booleans', json.dumps, [True, False] * (count // 2))
    runner.bench_func(f'encode {count} integers', json.dumps, list(range(count)))
    runner.bench_func(f'encode {count} floats', json.dumps, [1.0] * count)
    runner.bench_func(f'encode {count} "ascii" strings', json.dumps, ['ascii'] * count)

    text = 'ascii'
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode ascii string len={len(text)}', json.dumps, text)

    text = ''.join(chr(ch) for ch in range(128))
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode escaped string len={len(text)}', json.dumps, text)

    text = 'abcd€'
    text *= (count // len(text) or 1)
    runner.bench_func(f'encode Unicode string len={len(text)}', json.dumps, text)
```
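The unusual lengths in the result tables (`len=896`, `len=9984`) follow from the `text *= (count // len(text) or 1)` step: the 128-character escaped string is repeated a whole number of times, so its length is rounded down to a multiple of 128 (but is at least 128). A quick check of that arithmetic; `repeated_len` is a helper introduced here only for illustration:

```python
def repeated_len(base: str, count: int) -> int:
    # Same expression as the benchmark: repeat `base` count // len(base)
    # times, but at least once.
    text = base * (count // len(base) or 1)
    return len(text)


# All 128 ASCII code points, including control characters JSON must escape.
escaped = "".join(chr(ch) for ch in range(128))
for count in (100, 1_000, 10_000):
    print(count, repeated_len("ascii", count), repeated_len(escaped, count))
# 100 100 128
# 1000 1000 896
# 10000 10000 9984
```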

vstinner (Member) commented May 12, 2025

I also ran my benchmark on this PR:

Benchmark	ref	change
encode 100 booleans	9.53 us	6.26 us: 1.52x faster
encode 100 integers	13.8 us	11.6 us: 1.20x faster
encode 100 floats	24.8 us	19.0 us: 1.30x faster
encode 100 "ascii" strings	17.1 us	12.4 us: 1.37x faster
encode ascii string len=100	902 ns	877 ns: 1.03x faster
encode escaped string len=128	1.10 us	1.07 us: 1.03x faster
encode Unicode string len=100	1.07 us	1.04 us: 1.03x faster
encode 1000 booleans	58.9 us	29.6 us: 1.99x faster
encode 1000 integers	103 us	81.5 us: 1.26x faster
encode 1000 floats	209 us	152 us: 1.37x faster
encode 1000 "ascii" strings	131 us	86.9 us: 1.51x faster
encode ascii string len=1000	3.48 us	3.46 us: 1.00x faster
encode escaped string len=896	4.12 us	3.96 us: 1.04x faster
encode 10000 booleans	546 us	257 us: 2.12x faster
encode 10000 integers	1.00 ms	788 us: 1.27x faster
encode 10000 floats	2.04 ms	1.46 ms: 1.39x faster
encode 10000 "ascii" strings	1.27 ms	806 us: 1.57x faster
encode ascii string len=10000	28.4 us	28.4 us: 1.00x slower
encode escaped string len=9984	38.5 us	36.3 us: 1.06x faster
encode Unicode string len=10000	42.4 us	43.2 us: 1.02x slower
Geometric mean	(ref)	1.26x faster

Benchmark hidden because not significant (1): encode Unicode string len=1000

nineteendo (Contributor, Author) commented May 12, 2025

The private API doesn't enable overallocation by default.

Yeah, but both the old code and the public API do, so it's not that. (And you really don't want to turn it off.)

encode	overallocate	no overallocation	slowdown
List of 65,536 booleans	1222.61 μs	10456.60 μs	8.55x slower
List of 65,536 ints	3174.27 μs	12961.39 μs	4.08x slower
Dict with 65,536 booleans	10011.93 μs	29166.14 μs	2.91x slower
List of 65,536 ASCII strings	13303.33 μs	23714.00 μs	1.78x slower
List of 65,536 floats	37103.57 μs	48716.57 μs	1.31x slower
List of 65,536 strings	91757.30 μs	113949.94 μs	1.24x slower

decode	overallocate	no overallocation	slowdown
Dict with 65,536 booleans	12194.03 μs	12098.16 μs	1.01x faster
List of 65,536 ASCII strings	7011.86 μs	7101.61 μs	1.01x slower
List of 65,536 strings	36049.66 μs	36412.55 μs	1.01x slower
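Why turning overallocation off hurts so much on many small writes can be seen with a toy capacity model: growing the buffer to exactly the required size forces a resize on every write, while growing it with some slack makes resizes rare. `count_reallocations` is hypothetical, and the 25% slack is an assumption chosen for illustration, not a quote of CPython's growth policy.

```python
def count_reallocations(n_writes: int, chunk: int, overallocate: bool) -> int:
    """Count buffer resizes when appending `n_writes` chunks of `chunk` characters."""
    capacity = 0
    length = 0
    reallocs = 0
    for _ in range(n_writes):
        needed = length + chunk
        if needed > capacity:
            if overallocate:
                capacity = needed + needed // 4  # 25% slack (assumed factor)
            else:
                capacity = needed  # exact fit: the next write must resize again
            reallocs += 1
        length = needed
    return reallocs


# 65,536 small writes of 5 characters each
print(count_reallocations(65_536, 5, overallocate=False))  # 65536
print(count_reallocations(65_536, 5, overallocate=True))   # far fewer
```

Since a resize can also copy the existing contents, the exact-fit policy can degrade into quadratic copying, which fits the large encode slowdowns in the table.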

nineteendo (Contributor, Author) commented May 12, 2025 (edited)

This line is inefficient for exact string instances (Py_INCREF is enough):

cpython/Objects/unicodeobject.c, line 13936 in 86c1d43:

```c
PyObject *str = PyObject_Str(obj);
```
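The fast path requested here hinges on the *exact* type check: a `str` subclass may override `__str__`, so only an exact `str` can skip the conversion call. A pure-Python model of that idea; `write_str` and `Tagged` are hypothetical, and the actual change (#133969) is C code in `PyUnicodeWriter_WriteStr()`:

```python
def write_str(parts: list[str], obj: object) -> None:
    """Append str(obj) to parts, skipping the str() call for exact str objects."""
    if type(obj) is str:
        # Fast path: an exact str is its own string form (the C analogue
        # is taking a new reference instead of calling PyObject_Str()).
        parts.append(obj)
    else:
        # Slow path: honours __str__ overrides on subclasses and other types.
        parts.append(str(obj))


class Tagged(str):
    def __str__(self) -> str:
        return "<" + self + ">"


parts: list[str] = []
s = "abc"
write_str(parts, s)
write_str(parts, Tagged("x"))
write_str(parts, 42)
print(parts)          # ['abc', '<x>', '42']
print(parts[0] is s)  # True: the original object was reused
```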

encode	private	public	slowdown
List of 65,536 booleans	1217.10 μs	1854.86 μs	1.52x slower
List of 65,536 ints	3190.74 μs	3701.18 μs	1.16x slower
Dict with 65,536 booleans	8783.92 μs	11459.45 μs	1.30x slower
List of 65,536 ASCII strings	12502.92 μs	14842.52 μs	1.19x slower
List of 65,536 floats	37008.47 μs	39790.32 μs	1.08x slower
List of 65,536 strings	90841.32 μs	94637.59 μs	1.04x slower

decode	private	public	slowdown
List of 65,536 ASCII strings	7064.05 μs	7186.32 μs	1.02x slower
List of 65,536 strings	36033.15 μs	36904.06 μs	1.02x slower

nineteendo (Contributor, Author) commented May 13, 2025 (edited)

Here's the comparison with a minimal PR:

encode	full	minimal	improvement
List of 65,536 booleans	1222.01 μs	1201.54 μs	1.02x faster
List of 65,536 ints	3156.75 μs	3047.81 μs	1.04x faster
Dict with 65,536 booleans	9442.35 μs	8518.33 μs	1.11x faster
List of 65,536 ASCII strings	13405.99 μs	12027.66 μs	1.11x faster
List of 65,536 floats	37037.70 μs	36684.89 μs	1.01x faster
List of 65,536 strings	92007.41 μs	87721.40 μs	1.05x faster

decode	full	minimal	improvement
List of 65,536 ASCII strings	7029.30 μs	7431.43 μs	1.06x slower
List of 65,536 strings	36401.90 μs	34232.03 μs	1.06x faster

Reduce diff

2a6ec43

nineteendo marked this pull request as draft

May 13, 2025 07:31

bedevere-appbot removed the awaiting core review label

May 13, 2025

nineteendo marked this pull request as ready for review

May 13, 2025 09:13

vstinner (Member) commented May 13, 2025

I created issue #133968 to track this work.

@vstinner, could you add a fast path for exact string instances in PyUnicodeWriter_WriteStr()?

I wrote #133969 to add a fast path.

vstinner mentioned this pull request

May 13, 2025

gh-133968: Add fast path to PyUnicodeWriter_WriteStr() #133969

Merged

vstinner (Member) commented May 13, 2025

I wrote #133969 to add a fast path.

Merged. I confirmed with two benchmarks that this small optimization makes a big difference in some use cases, such as encoding short strings in JSON.

nineteendo added 2 commits

May 13, 2025 16:03

Merge branch 'main' into json-private-unicode-writer

49a92f3

Reduce diff

822ea86

vstinner mentioned this pull request

May 13, 2025

gh-133968: Add PyUnicodeWriter_WriteASCII() function #133973

Merged

vstinner (Member) commented May 13, 2025

@ZeroIntensity:

If _PyUnicodeWriter_WriteASCIIString is significantly faster than PyUnicodeWriter_WriteUTF8, then we should expose it as a public API.

Ok, I created #133973 to add PyUnicodeWriter_WriteASCII().

nineteendo (Contributor, Author) commented May 13, 2025

Ok, I created #133973 to add PyUnicodeWriter_WriteASCII().

If that's merged, we would use this approach in 3.14, right?

Avoid heap allocation

01c45a9

nineteendo (Contributor, Author) commented May 15, 2025

@vstinner, it looks like the regression in json.loads() is caused by the heap allocation in PyUnicodeWriter_Create().
I've now delayed the allocation until it's necessary. Thoughts?
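Delaying the allocation until it is needed is a form of lazy initialization. The toy decoder below creates a writer only when it meets the first escape, so strings without escapes (including the empty string) come back as plain slices. It is a hypothetical sketch, not the `_json.c` scanner: `CountingWriter` and `decode_body` are illustration-only names, and `\uXXXX` escapes are deliberately left out.

```python
class CountingWriter:
    """Stand-in for a string writer that counts how often one is created."""

    created = 0

    def __init__(self) -> None:
        CountingWriter.created += 1
        self._parts: list[str] = []

    def write(self, text: str) -> None:
        self._parts.append(text)

    def finish(self) -> str:
        return "".join(self._parts)


_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "b": "\b",
            "f": "\f", "n": "\n", "r": "\r", "t": "\t"}


def decode_body(s: str, begin: int) -> tuple[str, int]:
    """Decode a JSON string body that starts just after its opening quote.

    Returns (value, index just past the closing quote).
    """
    writer = None  # created lazily, on the first escape only
    chunk_start = i = begin
    while True:
        ch = s[i]  # toy: raises IndexError on an unterminated string
        if ch == '"':
            if writer is None:
                return s[begin:i], i + 1  # no escapes: plain slice, no writer
            writer.write(s[chunk_start:i])
            return writer.finish(), i + 1
        if ch == "\\":
            if writer is None:
                writer = CountingWriter()
            writer.write(s[chunk_start:i])
            writer.write(_ESCAPES[s[i + 1]])  # toy: KeyError on \uXXXX
            i += 2
            chunk_start = i
        else:
            i += 1


print(decode_body('"', 0), CountingWriter.created)       # ('', 1) 0
print(decode_body('abc"', 0), CountingWriter.created)    # ('abc', 4) 0
print(decode_body('a\\nb"', 0), CountingWriter.created)  # ('a\nb', 5) 1
```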

vstinner (Member) commented May 16, 2025

@vstinner, it looks like the regression in json.loads() is caused by the heap allocation in PyUnicodeWriter_Create().

Are you sure about that? The implementation uses a freelist which avoids the heap allocation in most cases.

vstinner (Member) commented May 16, 2025

I've now delayed the allocation until it's necessary. Thoughts?

Would you mind creating a separate PR just for that?

ZeroIntensity (Member) commented May 16, 2025

Are the benchmarks creating an unrealistic number of concurrent writers? That would starve the freelist and create some allocation overhead, but only on the benchmarks.

nineteendo (Contributor, Author) commented May 16, 2025 (edited)

Are you sure about that? The implementation uses a freelist which avoids the heap allocation in most cases.

You're right, it does seem to be using the only entry of the freelist (I disabled the malloc to check).
There might be some overhead compared to using the stack though.
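The freelist behaviour discussed above can be modelled with a small pool. With a single free slot (an assumption, matching the observation that only one entry is used), writers created one after another keep reusing the same object, but two writers alive at the same time force a fresh allocation on every round, which is the starvation scenario raised earlier. `WriterPool` is hypothetical, and a Python list stands in for the writer object.

```python
class WriterPool:
    """Toy freelist for writer objects; hypothetical, not CPython's code."""

    def __init__(self, max_free: int = 1) -> None:
        self._free: list[list[str]] = []
        self._max_free = max_free
        self.allocations = 0  # times a brand-new object had to be created

    def acquire(self) -> list[str]:
        if self._free:
            return self._free.pop()  # reuse a cached object: no allocation
        self.allocations += 1
        return []  # stands in for a heap allocation

    def release(self, writer: list[str]) -> None:
        writer.clear()
        if len(self._free) < self._max_free:
            self._free.append(writer)  # keep it for the next acquire()
        # otherwise drop it, as if it were freed


sequential = WriterPool()
for _ in range(1_000):
    w = sequential.acquire()
    sequential.release(w)
print(sequential.allocations)  # 1

nested = WriterPool()
for _ in range(1_000):
    outer = nested.acquire()
    inner = nested.acquire()  # second live writer: the single slot is empty
    nested.release(inner)
    nested.release(outer)     # slot already full, so `outer` is dropped
print(nested.allocations)     # 1001
```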

vstinner (Member) commented May 30, 2025

I suggest closing this PR. It's not worth it anymore (according to the benchmark below) and I prefer to stick to the public C API.

I made two small optimizations in the public PyUnicodeWriter API:

Add fast path to PyUnicodeWriter_WriteStr(): #133969
Add PyUnicodeWriter_WriteASCII() function: #133973

With these optimizations, it seems like this PR is less appealing. I ran a benchmark to compare this PR to the current main branch:

Benchmark	main	pr133832
encode 100 booleans	6.52 us	6.61 us: 1.01x slower
encode 100 integers	11.9 us	11.7 us: 1.01x faster
encode 100 floats	19.9 us	20.7 us: 1.04x slower
encode 100 "ascii" strings	13.3 us	13.5 us: 1.01x slower
encode ascii string len=100	901 ns	884 ns: 1.02x faster
encode escaped string len=128	1.11 us	1.07 us: 1.03x faster
encode 1000 booleans	32.6 us	31.8 us: 1.03x faster
encode 1000 integers	88.3 us	83.2 us: 1.06x faster
encode 1000 floats	161 us	168 us: 1.04x slower
encode 1000 "ascii" strings	96.6 us	94.1 us: 1.03x faster
encode ascii string len=1000	3.49 us	3.50 us: 1.00x slower
encode escaped string len=896	4.14 us	3.95 us: 1.05x faster
encode Unicode string len=1000	4.92 us	5.35 us: 1.09x slower
encode 10000 booleans	284 us	272 us: 1.05x faster
encode 10000 integers	850 us	797 us: 1.07x faster
encode 10000 floats	1.56 ms	1.59 ms: 1.02x slower
encode 10000 "ascii" strings	897 us	857 us: 1.05x faster
encode ascii string len=10000	28.5 us	29.2 us: 1.02x slower
encode escaped string len=9984	38.5 us	37.2 us: 1.03x faster
encode Unicode string len=10000	42.4 us	46.8 us: 1.10x slower
Geometric mean	(ref)	1.00x faster

The best speedup is 1.07x faster for "encode 10000 integers".

The worst slowdown is 1.10x slower for "encode Unicode string len=10000".

Overall, the impact is "1.00x faster" which is not impressive.

vstinner (Member) commented May 30, 2025

Hum. I would be interested in a change that would just remove _PyUnicodeWriter_IsEmpty(), without touching the WriteUTF8/WriteASCII calls.

nineteendo (Contributor, Author) commented May 30, 2025

I suggest closing this PR. It's not worth it anymore (according to the benchmark below) and I prefer to stick to the public C API.

According to the pyperformance benchmark, json_loads is still 10% slower because of the freelist. And after I've updated the PR, it will only be using the public API.

Merge branch 'main' into json-private-unicode-writer

566637c

nineteendo (Contributor, Author) commented May 30, 2025

Done. Decoding empty strings is now 17% faster. Annoyingly, decoding strings with escapes is 7% slower.

vstinner added the skip news label

May 31, 2025

vstinner merged commit c81446a into python:main

May 31, 2025

40 checks passed

bedevere-appbot removed the awaiting review label

May 31, 2025

vstinner (Member) commented May 31, 2025

I merged your change, thanks.

nineteendo deleted the json-private-unicode-writer branch

May 31, 2025 12:07

nineteendo (Contributor, Author) commented May 31, 2025

This still needs to be backported.

ZeroIntensity (Member) commented May 31, 2025

Hm, do we backport performance improvements?

nineteendo (Contributor, Author) commented May 31, 2025

We backported the other fixes for the performance regression. See the issue.

ZeroIntensity (Member) commented May 31, 2025

I thought the introduction of WriteASCII fixed the regression.

vstinner (Member) commented May 31, 2025

I would prefer to not backport this change.

vstinner mentioned this pull request

Jun 11, 2025

gh-135336: Add fast path to json string encoding #133239

Open

Labels

skip news

3 participants


gh-133968: Use private unicode writer for json #133832
