Description
Feature or enhancement
Proposal:
Motivation
The standard library json module is widely used and highly stable, but its core parsing architecture is still fundamentally scalar and recursive-descent based. On modern CPUs (especially x86_64 and increasingly ARM64), JSON parsing is often dominated by:
- UTF-8 validation
- structural character detection ({ } [ ] , :)
- whitespace skipping
These stages are well known to be amenable to SIMD acceleration.
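To make the structural-detection stage concrete, here is a minimal scalar sketch in Python of what such a scan computes. It is only an illustration under stated assumptions: the name `structural_index` is hypothetical, it is not part of the `json` module or of simdjson, and a real SIMD implementation would compute the same information with bitmasks over wide blocks of bytes rather than one byte at a time.

```python
# Hypothetical reference sketch: records the offsets of structural characters
# and of opening quotes, ignoring anything that appears inside a string literal.
STRUCTURAL = frozenset(b"{}[],:")


def structural_index(data: bytes) -> list[int]:
    positions = []
    in_string = False
    escaped = False
    for i, byte in enumerate(data):
        if in_string:
            if escaped:
                escaped = False        # the character after a backslash is literal
            elif byte == 0x5C:         # backslash starts an escape sequence
                escaped = True
            elif byte == 0x22:         # unescaped quote closes the string
                in_string = False
            continue
        if byte == 0x22:               # quote opens a string; record its offset
            in_string = True
            positions.append(i)
        elif byte in STRUCTURAL:
            positions.append(i)
        # whitespace and scalar characters are skipped here
    return positions


print(structural_index(b'{"a": [1, "x,y"]}'))
```

Note that the comma inside `"x,y"` is not reported, which is the part of the scan that depends on tracking quotes and escapes.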
Projects such as simdjson demonstrate that a two-stage parsing pipeline (a structural scan followed by semantic parsing), driven by SIMD instructions, can deliver severalfold speedups while remaining fully compliant with RFC 8259.
Given the growing importance of JSON in performance-sensitive workloads (ML pipelines, telemetry, configuration at scale), I would like to propose a discussion on whether CPython’s json module could adopt a simdjson-inspired architecture, at least optionally.
Scope of the proposal
This issue is not a request to immediately replace the existing implementation. Instead, I would like to explore:
- Feasibility
  - Whether a SIMD-based parsing backend could coexist with the current implementation.
  - Whether this would fit CPython’s portability and maintenance constraints.
- Architecture
  - A staged parsing model similar to simdjson:
    - Stage 1: SIMD structural scan (identify string boundaries, braces, commas, etc.)
    - Stage 2: scalar semantic parsing using the structural index
- Integration options
  - Optional backend selected at build time or runtime
  - Fallback to the existing implementation when SIMD is unavailable
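As a rough illustration of the runtime-selection and fallback options, the sketch below picks a parser at import time. The module name `_json_simd` and the helper `select_loads` are hypothetical and exist only for this example; the pattern is similar in spirit to how `json.decoder` already falls back to pure Python when the `_json` C accelerator cannot be imported.

```python
import importlib
import json


def select_loads(preferred: str = "_json_simd"):
    """Return a loads() callable, preferring an optional accelerated backend.

    `preferred` names a hypothetical extension module; if it cannot be
    imported, or does not expose loads(), the existing json.loads is used.
    """
    try:
        backend = importlib.import_module(preferred)
    except ImportError:
        return json.loads
    return getattr(backend, "loads", json.loads)


loads = select_loads()
print(loads('{"a": [1, 2, 3]}'))
```

With this shape, callers see the same `loads` interface regardless of which backend was chosen, so the existing implementation remains the default on platforms without SIMD support.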
Has this already been discussed elsewhere?
No response given
Links to previous discussion of this feature:
No response