Rust port of simdjson
License
Apache-2.0, MIT licenses found
simd-lite/simd-json
Rust port of the extremely fast simdjson JSON parser with Serde compatibility.
simd-json is a Rust port of the simdjson C++ library. It follows most of the design closely, with a few exceptions to make it fit better into the Rust ecosystem.
The goal of the Rust port of simdjson is not to create a one-to-one copy, but to integrate the principles of the C++ library into a Rust library that plays well with the Rust ecosystem. As such we provide both compatibility with Serde as well as parsing to a DOM to manipulate data.
As a rule of thumb this library tries to get as close as possible to the performance of the C++ implementation (currently tracking 0.2.x, work in progress). However, in some design decisions, such as parsing to a DOM or a tape, ergonomics is prioritized over performance. In other places Rust makes it harder to achieve the same level of performance.
To take advantage of this library your system needs to support SIMD instructions. On `x86`, it will select the best available supported instruction set (`avx2` or `sse4.2`) when the `runtime-detection` feature is enabled (default). On `aarch64` this library uses the `NEON` instruction set. On `wasm` this library uses the `simd128` instruction set when available. When no supported SIMD instructions are found, this library will use a fallback implementation, but this is significantly slower.
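To make the selection order above concrete, here is a minimal sketch that uses only the standard library's CPU feature detection macros. The `best_simd_backend` helper is hypothetical and is not part of simd-json; the library's real dispatch logic may look quite different, and the `wasm` case is left out for brevity.

```rust
/// Names the instruction set a parser could dispatch to on the current CPU.
/// Hypothetical helper for illustration only, not simd-json's API.
pub fn best_simd_backend() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // Runtime checks, ordered from the widest instruction set down.
        if std::arch::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
        if std::arch::is_x86_feature_detected!("sse4.2") {
            return "sse4.2";
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return "neon";
        }
    }
    // Nothing suitable was detected: use the slower scalar path.
    "fallback"
}
```

Calling `best_simd_backend()` once at startup and logging the result is a cheap way to check whether a deployment target would end up on the slow fallback path.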
For best performance, we highly suggest using snmalloc, mimalloc or jemalloc instead of the system default allocator.
`simd-json` uses a lot of unsafe code.
There are a few reasons for this:
- SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as `simd-json`.
- We work around some performance bottlenecks imposed by safe Rust. These are avoidable, but at a performance cost. This is a more considered path in `simd-json`.
`simd-json` goes through extra scrutiny for unsafe code. These steps are:
- Unit tests - to test 'the obvious' cases, edge cases, and regression cases
- Structural constructive property-based testing - we generate random valid JSON objects to exercise the full `simd-json` codebase stochastically. Floats are currently excluded since slightly different parsing algorithms lead to slightly different results here. In short: "is simd-json correct?"
- Data-oriented property-based testing of string-like data - to assert that sequences of legal printable characters don't panic or crash the parser (they may, and often do, produce errors, since they are not valid JSON!)
- Destructive property-based testing - make sure that no illegal byte sequences crash the parser in any way
- Fuzzing - fuzz based on upstream & jsonorg simd pass/fail cases
This doesn't ensure complete safety, nor is it a bulletproof guarantee, but it does go a long way to assert that the library is of high production quality and fit for purpose for practical industrial applications.
Various features can be enabled or disabled to tweak various parts of this library. Any features not mentioned here are for internal configuration and testing.
The `runtime-detection` feature allows selecting the optimal algorithm based on the CPU features available at runtime. It has no effect on non-`x86` platforms. When neither `AVX2` nor `SSE4.2` is supported, it will fall back to a native Rust implementation.
Disabling this feature (with `default-features = false`) and setting `RUSTFLAGS="-C target-cpu=native"` will result in better performance, but the resulting binary will not be portable across `x86` processors.
Enable Serde support. This consists of implementing `serde::Serializer` and `serde::Deserializer`, allowing types that implement `serde::Serialize`/`serde::Deserialize` to be constructed from or serialized to `BorrowedValue`/`OwnedValue`. In addition, this provides the same convenience functions that `serde_json` provides.
Disabling this feature (with `default-features = false`) will remove `serde` and `serde_json` from the dependencies.
Enables a parsing method that will parse 8 digits at a time for floats. This is a common pattern but comes at a slight performance hit if most of the floats have fewer than 8 digits.
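The "8 digits at a time" idea can be sketched with a well-known SIMD-within-a-register (SWAR) trick: load eight ASCII digits into one `u64` and combine them with three multiplications. This is an illustration of the general technique, assuming the input is exactly eight ASCII digits; the function names are hypothetical and simd-json's own implementation may differ.

```rust
/// Converts exactly eight ASCII digits to their numeric value using SWAR
/// arithmetic on a single u64. The caller must ensure that every byte is
/// in b'0'..=b'9'. Illustrative sketch, not simd-json's code.
pub fn parse_eight_digits(chars: [u8; 8]) -> u32 {
    let val = u64::from_le_bytes(chars);
    // Strip the ASCII offset, then merge neighbouring digits into
    // two-digit pairs (2561 = 10 * 2^8 + 1).
    let val = (val & 0x0F0F_0F0F_0F0F_0F0F).wrapping_mul(2561) >> 8;
    // Merge pairs into four-digit groups (6553601 = 100 * 2^16 + 1).
    let val = (val & 0x00FF_00FF_00FF_00FF).wrapping_mul(6_553_601) >> 16;
    // Merge both groups into the final value (10000 * 2^32 + 1).
    ((val & 0x0000_FFFF_0000_FFFF).wrapping_mul(42_949_672_960_001) >> 32) as u32
}

/// Digit-by-digit reference implementation for comparison.
pub fn parse_eight_digits_scalar(chars: [u8; 8]) -> u32 {
    chars.iter().fold(0, |acc, &c| acc * 10 + u32::from(c - b'0'))
}
```

The SWAR version always does the same fixed amount of work, which is why it pays off on long mantissas but can lose to the simple loop when most numbers are short.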
The `known-key` feature changes the hash mechanism for the DOM representation of the underlying JSON object from `ahash` to `fxhash`. The `ahash` hasher is faster at hashing and provides protection against DoS attacks that force multiple keys into a single hashing bucket. The `fxhash` hasher allows for repeatable hashing results, which in turn allows memoizing hashes for well-known keys and saving time on lookups. In workloads that are heavy on accessing some well-known keys, this can be a performance advantage.
The `known-key` feature is optional, disabled by default, and should be explicitly configured.
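The following sketch shows why a repeatable hasher makes memoization possible. It uses a small FNV-1a hasher purely as a stand-in for `fxhash`, and the `KnownKey` struct here is a hypothetical illustration rather than simd-json's own type: the point is only that a hash computed once stays valid for every later hasher instance.

```rust
use std::hash::{BuildHasher, BuildHasherDefault, Hasher};

/// A tiny deterministic FNV-1a hasher, used only as a stand-in for `fxhash`.
#[derive(Clone, Copy)]
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical "known key": the hash is computed once and stored with the key.
pub struct KnownKey {
    pub key: &'static str,
    pub hash: u64,
}

impl KnownKey {
    pub fn new(key: &'static str) -> Self {
        let hash = BuildHasherDefault::<Fnv1a>::default().hash_one(key);
        KnownKey { key, hash }
    }
}

/// True when two independently created hasher builders agree on a key's hash.
pub fn is_repeatable<S: BuildHasher>(a: &S, b: &S, key: &str) -> bool {
    a.hash_one(key) == b.hash_one(key)
}
```

With a randomly seeded hasher such as the standard library's `RandomState`, two separately created builders would generally disagree, so a stored hash could not be reused across maps. That randomness is the DoS protection the default trades against memoization.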
This flag has no effect on simd-json itself but purely affects the `Value` structs.
The `value-no-dup-keys` feature flag enables stricter behavior for objects when deserializing into a `Value`. When enabled, the `Value` deserializer will remove duplicate keys in a JSON object and only keep the last one. If it is not set, duplicate keys are considered undefined behavior and `Value` makes no guarantees about how it handles them.
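The "keep the last one" rule can be pictured with a plain standard-library map. This is only a sketch of the semantics; `collect_last_wins` is a hypothetical helper and not how the `Value` deserializer is actually written.

```rust
use std::collections::HashMap;

/// Builds an object from key/value pairs in document order. A later
/// duplicate key overwrites the earlier one, so the last value is kept.
pub fn collect_last_wins<'a>(pairs: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut object = HashMap::with_capacity(pairs.len());
    for &(key, value) in pairs {
        // `insert` replaces any value previously stored under `key`.
        object.insert(key, value);
    }
    object
}
```

For the pairs `("a", 1), ("b", 2), ("a", 3)` the resulting object has two entries and `"a"` maps to `3`.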
The `big-int-as-float` feature flag treats very large integers that won't fit into `u64` as `f64` floats. This prevents parsing errors if the JSON you are parsing contains very large integers. Keep in mind that `f64` loses some precision when representing very large numbers.
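The trade-off can be seen with the standard library's number parsing alone. The sketch below assumes a simplified input (unsigned decimal digits only); the `Number` enum and `parse_unsigned` function are hypothetical and do not mirror simd-json's internals.

```rust
#[derive(Debug, PartialEq)]
pub enum Number {
    U64(u64),
    F64(f64),
}

/// Parses an unsigned decimal integer, falling back to f64 when the value
/// does not fit into u64. Returns None for anything but ASCII digits.
pub fn parse_unsigned(text: &str) -> Option<Number> {
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match text.parse::<u64>() {
        Ok(n) => Some(Number::U64(n)),
        // All-digit input that fails to parse can only have overflowed.
        Err(_) => text.parse::<f64>().ok().map(Number::F64),
    }
}
```

Here `u64::MAX` still parses exactly, while 2^64 and 2^64 + 1 both come back as the same `f64`, which is the precision loss mentioned above.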
Add support for parsing and serializing 128-bit integers. This feature is disabled by default because such large numbers are rare in the wild and adding the support incurs a performance penalty.
Enabling this feature can break dependencies in your dependency tree that are using `simd-json`.
Replace `std::borrow::Cow` with `beef::lean::Cow`. This feature is disabled by default, because it is a breaking change in the API.
By default the representation of `Floats` used in `borrowed::Value` and `owned::Value` is simply a value of `f64`. This has the side effect that these `Value` types do not implement `std::cmp::Eq`. Normally that is not a big deal, but it does introduce some incompatibilities when offering `simd-json` as a quasi-drop-in replacement for `serde-json`.
So, this feature changes the internal representation of `Floats` to be an `f64` wrapped by an Eq-compatible adapter.
This probably carries with it some small performance trade-offs, hence its enablement by feature rather than by default.
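The underlying issue is that `f64` only implements `PartialEq`, because `NaN != NaN`. A minimal sketch of an Eq-compatible adapter is shown below; `EqFloat` is a hypothetical type for illustration and is not the adapter simd-json uses, which may define equality differently (this sketch treats `0.0` and `-0.0` as distinct, for example).

```rust
use std::cmp::Ordering;

/// Hypothetical Eq-compatible wrapper around f64 (illustration only).
#[derive(Debug, Clone, Copy)]
pub struct EqFloat(pub f64);

impl PartialEq for EqFloat {
    fn eq(&self, other: &Self) -> bool {
        // `total_cmp` is a total order, so a NaN equals an identical NaN.
        self.0.total_cmp(&other.0) == Ordering::Equal
    }
}

// Sound because the comparison above is reflexive, symmetric and transitive.
impl Eq for EqFloat {}

/// Compiles only for types that implement Eq; a plain f64 would be rejected.
pub fn requires_eq<T: Eq>(a: &T, b: &T) -> bool {
    a == b
}
```

Any container that derives `Eq` can hold `EqFloat` but not a bare `f64`, which is the kind of incompatibility a drop-in replacement runs into.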
Currently disabled
A highly experimental implementation of the algorithm using `std::simd` and up to 512-bit wide registers.
simd-json offers three main entry points for usage:
The values API is a set of optimized DOM objects for working with parsed JSON data whose structure is not known in advance. `simd-json` has two versions of this:
Borrowed Values

```rust
use simd_json;

let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
let v: simd_json::BorrowedValue = simd_json::to_borrowed_value(&mut d).unwrap();
```

Owned Values

```rust
use simd_json;

let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
let v: simd_json::OwnedValue = simd_json::to_owned_value(&mut d).unwrap();
```
Serde-compatible API

```rust
use simd_json;
use serde_json::Value;

let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
let v: Value = simd_json::serde::from_slice(&mut d).unwrap();
```

Tape API

```rust
use simd_json;

let mut d = br#"{"the_answer": 42}"#.to_vec();
let tape = simd_json::to_tape(&mut d).unwrap();
let value = tape.as_value();
// try_get treats value like an object, returns Ok(Some(_)) because the key is found
assert!(value.try_get("the_answer").unwrap().unwrap() == 42);
// returns Ok(None) because the key is not found but value is an object
assert!(value.try_get("does_not_exist").unwrap() == None);
// try_get_idx treats value like an array, returns Err(_) because value is not an array
assert!(value.try_get_idx(0).is_err());
```
There are also bindings for upstream `simdjson` available here.
simd-json itself is licensed under either of
- Apache License, Version 2.0
- MIT license

at your option.
However, it ports a lot of code from simdjson, so their work and copyright on that should also be respected.
The Serde integration is based on `serde-json`, so their copyright should be respected as well.