json: Optimize escaping string in Encoder #133186
base: main
I'm going to benchmark this on pyperformance on the Faster CPython infrastructure and report back in a couple of hours.
I benchmarked this feature on my own library and I'm a bit worried. Strings without escapes are faster, but strings with escapes are a lot slower:
methane commented Apr 30, 2025 via email
How about adding if (copy_len > 0) before PyUnicodeWriter_WriteSubstring?
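The suggested guard targets the escaping loop that copies each run of characters needing no escape with a single substring write. When two escapes are adjacent, or the string starts or ends with one, that run is empty and the write call is pure overhead. Below is a minimal, self-contained sketch of the idea, assuming a hypothetical byte-string `Sink` in place of PyUnicodeWriter and a simplified escape table (`escape_char` and `write_escaped` are illustrative names, not CPython functions). The `skip_empty` flag toggles the `copy_len > 0` guard so the number of writer calls can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicodeWriter: a fixed buffer plus a call counter. */
typedef struct {
    char data[256];
    size_t len;
    int calls;
} Sink;

static void sink_write(Sink *k, const char *s, size_t n) {
    k->calls++;
    assert(k->len + n <= sizeof k->data);
    memcpy(k->data + k->len, s, n);
    k->len += n;
}

/* Write the JSON escape for c into out and return its length, or return 0
   if c can be copied verbatim (non-ASCII bytes pass through unchanged). */
static size_t escape_char(unsigned char c, char out[6]) {
    static const char hex[] = "0123456789abcdef";
    switch (c) {
    case '"':  out[0] = '\\'; out[1] = '"';  return 2;
    case '\\': out[0] = '\\'; out[1] = '\\'; return 2;
    case '\n': out[0] = '\\'; out[1] = 'n';  return 2;
    case '\r': out[0] = '\\'; out[1] = 'r';  return 2;
    case '\t': out[0] = '\\'; out[1] = 't';  return 2;
    }
    if (c < 0x20) {  /* other control characters: \u00XX */
        memcpy(out, "\\u00", 4);
        out[4] = hex[c >> 4];
        out[5] = hex[c & 0xf];
        return 6;
    }
    return 0;
}

static void write_escaped(Sink *k, const char *s, size_t n, int skip_empty) {
    size_t start = 0;  /* first char of the current unescaped run */
    char esc[6];
    sink_write(k, "\"", 1);
    for (size_t i = 0; i < n; i++) {
        size_t esc_len = escape_char((unsigned char)s[i], esc);
        if (esc_len == 0)
            continue;
        size_t copy_len = i - start;
        if (!skip_empty || copy_len > 0)  /* the proposed guard */
            sink_write(k, s + start, copy_len);
        sink_write(k, esc, esc_len);
        start = i + 1;
    }
    if (!skip_empty || n > start)  /* same guard for the trailing run */
        sink_write(k, s + start, n - start);
    sink_write(k, "\"", 1);
}
```

Both variants produce identical output; for the input `a""b` the guarded version makes 6 writer calls instead of 7, and the saving grows with the number of adjacent escapes.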
Better, but it's still twice as slow:
nineteendo commented Apr 30, 2025
How about just writing strings without escapes directly to the unicode writer?

    _PyUnicodeWriter_WriteChar(writer, '"');
    _PyUnicodeWriter_WriteStr(writer, pystr);  // original string
    _PyUnicodeWriter_WriteChar(writer, '"');
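The fast path proposed here can be sketched as: scan once for any character that needs escaping and, if there is none, emit the opening quote, the original string in one write, and the closing quote; otherwise fall back to the escaping loop. This is a minimal illustration assuming a hypothetical byte-buffer `Sink` instead of `_PyUnicodeWriter`, and assuming `ensure_ascii=False` semantics, where only `"`, `\`, and control characters below U+0020 must be escaped. `needs_escape`, `write_escaped_slow`, and `write_string` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the unicode writer. */
typedef struct {
    char data[256];
    size_t len;
} Sink;

static void sink_write(Sink *k, const char *s, size_t n) {
    assert(k->len + n <= sizeof k->data);
    memcpy(k->data + k->len, s, n);
    k->len += n;
}

/* JSON requires escaping '"', '\\' and control characters below 0x20. */
static bool needs_escape(unsigned char c) {
    return c == '"' || c == '\\' || c < 0x20;
}

/* Slow path: escape character by character. */
static void write_escaped_slow(Sink *k, const char *s, size_t n) {
    static const char hex[] = "0123456789abcdef";
    sink_write(k, "\"", 1);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!needs_escape(c)) {
            sink_write(k, s + i, 1);
        } else if (c == '"' || c == '\\') {
            char e[2] = {'\\', (char)c};
            sink_write(k, e, 2);
        } else {
            char e[6] = {'\\', 'u', '0', '0', hex[c >> 4], hex[c & 0xf]};
            sink_write(k, e, 6);
        }
    }
    sink_write(k, "\"", 1);
}

static void write_string(Sink *k, const char *s, size_t n) {
    size_t i = 0;
    while (i < n && !needs_escape((unsigned char)s[i]))
        i++;
    if (i == n) {
        /* Fast path: nothing to escape, so copy the original string as-is. */
        sink_write(k, "\"", 1);
        sink_write(k, s, n);
        sink_write(k, "\"", 1);
        return;
    }
    write_escaped_slow(k, s, n);
}
```

The trade-off is one extra scan for strings that do need escaping, in exchange for a single bulk copy for the (typically common) strings that do not.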
nineteendo commented Apr 30, 2025
Results of that (nineteendo/jsonyx@7c31ee4):
It's going to be a little harder to apply the change here (unless we just duplicate the functions).
I would still like a proper fix for faster-cpython/ideas#726 though. Should we just switch back to the private API?
See #133239 for my approach.
https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72
Benchmark hidden because not significant (5): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 dicts with 1 int, json_loads: Complex object, json_loads: Complex object ensure_ascii=False
methane commented May 1, 2025
This PR is faster, but #133239 is enough to fix the regression from Python 3.13. In the longer term, the encoder should use a private (maybe UTF-8) buffer instead of PyUnicodeWriter.
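A private buffer of the kind suggested could look roughly like the following sketch: a growable byte array with geometric growth that the encoder appends UTF-8 into directly, so that a single `str` could be built once at the end (in CPython, for example, with `PyUnicode_DecodeUTF8`). `Utf8Buf` and its helpers are hypothetical names, not part of `_json.c`, and a real encoder would also need a policy for lone surrogates, which strict UTF-8 cannot represent.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical private output buffer holding UTF-8 bytes. */
typedef struct {
    char *buf;
    size_t len, cap;
} Utf8Buf;

/* Grow geometrically so that appends are amortized O(1). */
static int ub_reserve(Utf8Buf *b, size_t extra) {
    if (b->len + extra <= b->cap)
        return 0;
    size_t cap = b->cap ? b->cap : 256;
    while (cap < b->len + extra)
        cap *= 2;
    char *p = realloc(b->buf, cap);
    if (p == NULL)
        return -1;
    b->buf = p;
    b->cap = cap;
    return 0;
}

static int ub_write(Utf8Buf *b, const char *s, size_t n) {
    if (ub_reserve(b, n))
        return -1;
    memcpy(b->buf + b->len, s, n);
    b->len += n;
    return 0;
}

/* Append one code point as UTF-8 (surrogates are rejected, not handled). */
static int ub_write_codepoint(Utf8Buf *b, uint32_t cp) {
    char u[4];
    size_t n;
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        u[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {
        u[0] = (char)(0xC0 | (cp >> 6));
        u[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        u[0] = (char)(0xE0 | (cp >> 12));
        u[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        u[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else {
        u[0] = (char)(0xF0 | (cp >> 18));
        u[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        u[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        u[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    }
    return ub_write(b, u, n);
}

/* Example encoder step: a list of booleans written straight into the buffer. */
static int encode_bool_list(Utf8Buf *b, const int *vals, size_t n) {
    if (ub_write(b, "[", 1))
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && ub_write(b, ", ", 2))
            return -1;
        if (vals[i] ? ub_write(b, "true", 4) : ub_write(b, "false", 5))
            return -1;
    }
    return ub_write(b, "]", 1);
}

static void ub_free(Utf8Buf *b) {
    free(b->buf);
    b->buf = NULL;
    b->len = b->cap = 0;
}
```

Because every append is a bounds check plus `memcpy`, literals like `true` and `false` cost no more than the copy itself, which is the kind of overhead the PyUnicodeWriter calls are being compared against in this thread.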
It's still not fully fixed: encoding booleans is twice as slow. And I don't fully understand why this PR is faster.
Just as a data point, on our Faster CPython infrastructure, this makes the json_dumps benchmark 14.8% faster than main, putting it within the noise of 3.13.0's performance. I will also kick off a run on #133239 for comparison.
methane commented May 2, 2025 • edited by hugovk
Using $ ./python -m pyperf compare_to with-fast-path.json use_write_ascii.json -G

Slower (3):
- json_dumps(ensure_ascii=False): List of 256 dicts with 1 int: 101 us +- 0 us -> 102 us +- 0 us: 1.00x slower
- json_loads: Dict with 256 lists of 256 dicts with 1 int: 46.6 ms +- 0.1 ms -> 46.8 ms +- 0.5 ms: 1.00x slower
- json_dumps(ensure_ascii=False): List of 256 floats: 239 us +- 1 us -> 239 us +- 1 us: 1.00x slower

Faster (10):
- json_dumps(ensure_ascii=False): List of 256 strings: 303 us +- 5 us -> 279 us +- 3 us: 1.08x faster
- json_dumps: List of 256 strings: 302 us +- 3 us -> 278 us +- 3 us: 1.08x faster
- json_dumps(ensure_ascii=False): List of 256 booleans: 16.5 us +- 0.1 us -> 15.3 us +- 0.1 us: 1.08x faster
- json_dumps: List of 256 booleans: 16.5 us +- 0.1 us -> 15.3 us +- 0.1 us: 1.07x faster
- json_dumps: Complex object: 1.96 ms +- 0.01 ms -> 1.87 ms +- 0.01 ms: 1.05x faster
- json_dumps(ensure_ascii=False): Complex object: 1.96 ms +- 0.01 ms -> 1.87 ms +- 0.02 ms: 1.05x faster
- json_dumps: Medium complex object: 173 us +- 1 us -> 171 us +- 1 us: 1.01x faster
- json_dumps(ensure_ascii=False): Medium complex object: 172 us +- 1 us -> 171 us +- 1 us: 1.01x faster
- json_loads: Medium complex object: 148 us +- 1 us -> 147 us +- 1 us: 1.00x faster
- json_dumps: List of 256 floats: 239 us +- 0 us -> 239 us +- 0 us: 1.00x faster

Benchmark hidden because not significant (13): json_dumps: List of 256 ASCII strings, json_dumps: List of 256 dicts with 1 int, json_dumps: Dict with 256 lists of 256 dicts with 1 int, json_dumps(ensure_ascii=False): List of 256 ASCII strings, json_dumps(ensure_ascii=False): Dict with 256 lists of 256 dicts with 1 int, json_loads: List of 256 booleans, json_loads: List of 256 ASCII strings, json_loads: List of 256 floats, json_loads: List of 256 dicts with 1 int, json_loads: List of 256 strings, json_loads: Complex object, json_loads: List of 256 strings ensure_ascii=False, json_loads: Complex object ensure_ascii=False

Patch:

diff --git a/Modules/_json.c b/Modules/_json.c
index cd08fa688d3..cd57760282a 100644
--- a/Modules/_json.c
+++ b/Modules/_json.c
@@ -351,7 +351,7 @@ write_escaped_ascii(PyUnicodeWriter *writer, PyObject *pystr)
         }
         if (buf_len + 12 > ESCAPE_BUF_SIZE) {
-            ret = PyUnicodeWriter_WriteUTF8(writer, buf, buf_len);
+            ret = _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, buf, buf_len);
             if (ret) return ret;
             buf_len = 0;
         }
@@ -359,7 +359,7 @@ write_escaped_ascii(PyUnicodeWriter *writer, PyObject *pystr)
     assert(buf_len < ESCAPE_BUF_SIZE);
     buf[buf_len++] = '"';
-    return PyUnicodeWriter_WriteUTF8(writer, buf, buf_len);
+    return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, buf, buf_len);
 }
 static int
@@ -1612,13 +1612,13 @@ encoder_listencode_obj(PyEncoderObject *s, PyUnicodeWriter *writer,
     int rv;
     if (obj == Py_None) {
-        return PyUnicodeWriter_WriteUTF8(writer, "null", 4);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "null", 4);
     }
     else if (obj == Py_True) {
-        return PyUnicodeWriter_WriteUTF8(writer, "true", 4);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "true", 4);
     }
     else if (obj == Py_False) {
-        return PyUnicodeWriter_WriteUTF8(writer, "false", 5);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "false", 5);
     }
     else if (PyUnicode_Check(obj)) {
         return encoder_write_string(s, writer, obj);
@@ -1779,7 +1779,7 @@ encoder_listencode_dict(PyEncoderObject *s, PyUnicodeWriter *writer,
     if (PyDict_GET_SIZE(dct) == 0) {
         /* Fast path */
-        return PyUnicodeWriter_WriteUTF8(writer, "{}", 2);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "{}", 2);
     }
     if (s->markers != Py_None) {
@@ -1883,7 +1883,7 @@ encoder_listencode_list(PyEncoderObject *s, PyUnicodeWriter *writer,
         return -1;
     if (PySequence_Fast_GET_SIZE(s_fast) == 0) {
         Py_DECREF(s_fast);
-        return PyUnicodeWriter_WriteUTF8(writer, "[]", 2);
+        return _PyUnicodeWriter_WriteASCIIString((_PyUnicodeWriter*)writer, "[]", 2);
     }
     if (s->markers != Py_None) {
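The `write_escaped_ascii` hunks follow a pattern worth spelling out: escaped output is staged in a fixed-size local buffer and flushed whenever fewer than 12 bytes remain, 12 presumably being the longest `ensure_ascii` escape for one code point (a surrogate pair, `\uXXXX\uXXXX`). Since everything in that buffer, like the `null`/`true`/`false` literals, is known to be ASCII, the flush can be a plain copy instead of a UTF-8 decoding entry point, which is presumably why switching to `_PyUnicodeWriter_WriteASCIIString` helps. The sketch below assumes a hypothetical growable `Out` sink and takes input as an array of code points; `ESCAPE_BUF_SIZE` mirrors the patch's constant name, but its value of 64 here is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ESCAPE_BUF_SIZE 64  /* assumed value; the patch's real size may differ */

/* Hypothetical output sink standing in for the unicode writer. */
typedef struct {
    char *data;
    size_t len, cap;
} Out;

/* Append bytes already known to be ASCII: a plain copy, no UTF-8 decoding. */
static int out_write_ascii(Out *o, const char *s, size_t n) {
    if (o->len + n > o->cap) {
        size_t cap = o->cap ? o->cap : 64;
        while (cap < o->len + n)
            cap *= 2;
        char *p = realloc(o->data, cap);
        if (p == NULL)
            return -1;
        o->data = p;
        o->cap = cap;
    }
    memcpy(o->data + o->len, s, n);
    o->len += n;
    return 0;
}

static size_t put_u(uint32_t c, char *out) {
    static const char hex[] = "0123456789abcdef";
    out[0] = '\\';
    out[1] = 'u';
    out[2] = hex[(c >> 12) & 0xf];
    out[3] = hex[(c >> 8) & 0xf];
    out[4] = hex[(c >> 4) & 0xf];
    out[5] = hex[c & 0xf];
    return 6;
}

/* ensure_ascii-style escape of one code point; writes at most 12 bytes. */
static size_t escape_unichar(uint32_t c, char *out) {
    if (c >= ' ' && c <= '~' && c != '\\' && c != '"') {
        out[0] = (char)c;
        return 1;
    }
    out[0] = '\\';
    switch (c) {
    case '\\': out[1] = '\\'; return 2;
    case '"':  out[1] = '"';  return 2;
    case '\b': out[1] = 'b';  return 2;
    case '\f': out[1] = 'f';  return 2;
    case '\n': out[1] = 'n';  return 2;
    case '\r': out[1] = 'r';  return 2;
    case '\t': out[1] = 't';  return 2;
    }
    if (c >= 0x10000) {  /* astral code point: UTF-16 surrogate pair, 12 bytes */
        uint32_t v = c - 0x10000;
        size_t n = put_u(0xD800 | (v >> 10), out);
        return n + put_u(0xDC00 | (v & 0x3FF), out + n);
    }
    return put_u(c, out);
}

static int write_escaped_ascii(Out *o, const uint32_t *s, size_t n) {
    char buf[ESCAPE_BUF_SIZE];
    size_t buf_len = 0;
    buf[buf_len++] = '"';
    for (size_t i = 0; i < n; i++) {
        buf_len += escape_unichar(s[i], buf + buf_len);
        /* Flush early enough that the next escape (<= 12 bytes) always fits. */
        if (buf_len + 12 > ESCAPE_BUF_SIZE) {
            if (out_write_ascii(o, buf, buf_len))
                return -1;
            buf_len = 0;
        }
    }
    assert(buf_len < ESCAPE_BUF_SIZE);
    buf[buf_len++] = '"';
    return out_write_ascii(o, buf, buf_len);
}

/* JSON constants are ASCII literals, so they can take the same copy path. */
static int write_bool(Out *o, int v) {
    return v ? out_write_ascii(o, "true", 4) : out_write_ascii(o, "false", 5);
}
```

The flush threshold keeps the per-character loop free of bounds checks beyond one comparison, and the sink only ever sees ASCII, so it never has to inspect the bytes it copies.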
May I merge this PR before the Python 3.14b1 release? @vstinner What do you think about using
You missed the feature freeze; this change should now target Python 3.15.
That's perfectly fine for a stdlib module, especially if it's faster :-)
* Improve the performance of :class:`~json.JSONEncoder` when encoding strings.
  (Contributed by Inada Naoki in :gh:`133186`.)
You should now retarget this change to Python 3.15 (move the text to Doc/whatsnew/3.15.rst).
I found the difference between our PRs: you're using
PyObject_Str() doesn't create a new string.
nineteendo commented May 16, 2025
I see. Is