Rust port of simdjson

License: Apache-2.0 (LICENSE-APACHE), MIT (LICENSE-MIT)

simd-lite/simd-json

Rust port of the extremely fast simdjson JSON parser with Serde compatibility.

simd-json is a Rust port of the simdjson C++ library. It follows most of the design closely, with a few exceptions to make it fit better into the Rust ecosystem.

Goals

The goal of the Rust port of simdjson is not to create a one-to-one copy, but to integrate the principles of the C++ library into a Rust library that plays well with the Rust ecosystem. As such, we provide both compatibility with Serde and parsing to a DOM to manipulate data.

Performance

As a rule of thumb, this library tries to get as close as possible to the performance of the C++ implementation (currently tracking 0.2.x, work in progress). However, in some design decisions, such as parsing to a DOM or a tape, ergonomics is prioritized over performance. In other places, Rust makes it harder to achieve the same level of performance.

To take advantage of this library, your system needs to support SIMD instructions. On x86, it will select the best available supported instruction set (avx2 or sse4.2) when the runtime-detection feature is enabled (default). On aarch64, this library uses the NEON instruction set. On wasm, this library uses the simd128 instruction set when available. When no supported SIMD instructions are found, this library will use a fallback implementation, but this is significantly slower.

Allocator

For best performance, we highly suggest using snmalloc, mimalloc or jemalloc instead of the system default allocator.

Safety

simd-json uses a lot of unsafe code.

There are a few reasons for this:

  • SIMD intrinsics are inherently unsafe. These uses of unsafe are inescapable in a library such as simd-json.
  • We work around some performance bottlenecks imposed by safe Rust. These are avoidable, but at a performance cost. This is a more considered path in simd-json.

simd-json goes through extra scrutiny for unsafe code. These steps are:

  • Unit tests - to test 'the obvious' cases, edge cases, and regression cases
  • Structural constructive property-based testing - We generate random valid JSON objects to exercise the full simd-json codebase stochastically. Floats are currently excluded since slightly different parsing algorithms lead to slightly different results here. In short, this checks "is simd-json correct".
  • Data-oriented property-based testing of string-like data - to assert that sequences of legal printable characters don't panic or crash the parser (they may, and often do, produce errors, since they are not valid JSON!)
  • Destructive Property based testing - make sure that no illegal byte sequences crash the parser in any way
  • Fuzzing - fuzz based on upstream & jsonorg simd pass/fail cases

This doesn't ensure complete safety, nor is it a bulletproof guarantee, but it does go a long way to assert that the library is of high production quality and fit for purpose for practical industrial applications.
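The destructive testing step can be pictured with a small harness like the one below. This is only a sketch under stated assumptions: the names XorShift64 and survives_random_bytes are hypothetical, the random generator is a hand-rolled xorshift so that no external crate is needed, and the parser is passed in as a closure rather than being simd-json itself. The project's actual test suite may be organized quite differently.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};

/// Tiny xorshift64 generator (hypothetical helper, stands in for a real
/// property-testing crate).
pub struct XorShift64(pub u64);

impl XorShift64 {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Feed `cases` pseudo-random byte buffers to `parser` and report whether
/// every call returned without panicking. Returning an error is fine;
/// panicking is not.
pub fn survives_random_bytes<T>(
    parser: impl Fn(&mut Vec<u8>) -> T,
    seed: u64,
    cases: usize,
) -> bool {
    // The xorshift state must be non-zero.
    let mut rng = XorShift64(seed.max(1));
    (0..cases).all(|_| {
        let len = (rng.next_u64() % 64) as usize;
        let mut input: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        catch_unwind(AssertUnwindSafe(|| {
            let _ = parser(&mut input);
        }))
        .is_ok()
    })
}
```

A caller would pass the parser under test as the closure, for example a wrapper that hands the buffer to the parsing entry point and discards the result. The harness only checks that arbitrary bytes never cause a panic, not that the parse result is correct.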

Features

Various features can be enabled or disabled to tweak various parts of this library. Any features not mentioned here are for internal configuration and testing.

runtime-detection (default)

This feature allows selecting the optimal algorithm based on available features during runtime. It has no effect on non-x86 platforms. When neither AVX2 nor SSE4.2 is supported, it will fall back to a native Rust implementation.

Disabling this feature (with default-features = false) and setting RUSTFLAGS="-C target-cpu=native" will result in better performance, but the resulting binary will not be portable across x86 processors.
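Runtime selection of this kind can be sketched with the standard library's CPU feature probe. The following is an illustration only, not simd-json's dispatch code: the Backend enum and select_backend function are hypothetical names, and the non-x86 branch is a placeholder that does not model the NEON or simd128 paths mentioned above.

```rust
/// Hypothetical label for the implementation that would be picked.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx2,
    Sse42,
    Fallback,
}

/// Probe the CPU at runtime and prefer the widest supported instruction set.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn select_backend() -> Backend {
    if is_x86_feature_detected!("avx2") {
        Backend::Avx2
    } else if is_x86_feature_detected!("sse4.2") {
        Backend::Sse42
    } else {
        Backend::Fallback
    }
}

/// Placeholder for other architectures; this sketch only covers x86.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn select_backend() -> Backend {
    Backend::Fallback
}
```

With compile-time selection (target-cpu=native), the check disappears from the binary, which is why such a build is faster but tied to the CPU features of the build machine.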

serde_impl (default)

Enable Serde support. This consists of implementing serde::Serializer and serde::Deserializer, allowing types that implement serde::Serialize/serde::Deserialize to be constructed/serialized to BorrowedValue/OwnedValue. In addition, this provides the same convenience functions that serde_json provides.

Disabling this feature (with default-features = false) will remove serde and serde_json from the dependencies.

swar-number-parsing (default)

Enables a parsing method that will parse 8 digits at a time for floats. This is a common pattern, but it comes at a slight performance hit if most of the floats have fewer than 8 digits.
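The "8 digits at a time" idea is usually implemented as SWAR (SIMD within a register) arithmetic on a single 64-bit word. The sketch below shows the widely used multiply-and-shift formulation. The function name parse_eight_digits is hypothetical, the input is assumed to be exactly eight ASCII digits (no validation is done), and simd-json's own code may differ in detail.

```rust
/// Convert eight ASCII digits to their numeric value using three
/// multiply-and-shift steps on one 64-bit word.
/// Assumes every byte of `chunk` is in b'0'..=b'9'.
pub fn parse_eight_digits(chunk: [u8; 8]) -> u32 {
    // The first digit lands in the lowest byte.
    let mut val = u64::from_le_bytes(chunk);
    // Strip the ASCII offset, then merge adjacent digits into 2-digit values.
    val = (val & 0x0F0F_0F0F_0F0F_0F0F).wrapping_mul(2561) >> 8;
    // Merge adjacent 2-digit values into 4-digit values.
    val = (val & 0x00FF_00FF_00FF_00FF).wrapping_mul(6_553_601) >> 16;
    // Merge the two 4-digit values into the final 8-digit number.
    ((val & 0x0000_FFFF_0000_FFFF).wrapping_mul(42_949_672_960_001) >> 32) as u32
}
```

For example, parse_eight_digits(*b"12345678") yields 12345678. A number with fewer than eight digits cannot use this path directly, which is consistent with the slight overhead noted above for short floats.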

known-key

The known-key feature changes the hash mechanism for the DOM representation of the underlying JSON object from ahash to fxhash. The ahash hasher is faster at hashing and provides protection against DoS attacks that force multiple keys into a single hashing bucket. The fxhash hasher allows for repeatable hashing results, which in turn allows memoizing hashes for well-known keys and saving time on lookups. In workloads that are heavy on accessing some well-known keys, this can be a performance advantage.

The known-key feature is optional, disabled by default, and should be explicitly configured.
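Why repeatable hashing enables memoization can be shown with the standard library alone. In this sketch the fixed-key DefaultHasher stands in for fxhash (an assumption made to avoid external crates), and hash_with and MemoizedKey are hypothetical names, not part of simd-json's API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hash `key` with a hasher produced by `builder`.
pub fn hash_with<S: BuildHasher>(builder: &S, key: &str) -> u64 {
    let mut hasher = builder.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// A well-known key whose hash is computed once and then reused
/// (hypothetical type for illustration).
pub struct MemoizedKey {
    pub key: &'static str,
    pub hash: u64,
}

impl MemoizedKey {
    pub fn new(key: &'static str) -> Self {
        // An unseeded builder gives the same hash for the same key every time,
        // so the stored value stays valid for any map using this hasher.
        let fixed = BuildHasherDefault::<DefaultHasher>::default();
        MemoizedKey { key, hash: hash_with(&fixed, key) }
    }
}
```

With a randomly seeded hasher, each map instance would hash the same key differently, so a stored hash could not be reused across objects. That is the trade-off described above: repeatability in exchange for the collision-attack protection of a seeded hasher.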

value-no-dup-keys

This flag has no effect on simd-json itself but purely affects the Value structs.

The value-no-dup-keys feature flag enables stricter behavior for objects when deserializing into a Value. When enabled, the Value deserializer will remove duplicate keys in a JSON object and only keep the last one. If not set, duplicate keys are considered undefined behavior and Value will not make guarantees on its behavior.
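The "keep the last one" rule matches what a plain map insert does. A minimal sketch, assuming key/value pairs arrive in document order; collect_last_wins is a hypothetical helper and says nothing about how the Value deserializer is actually implemented.

```rust
use std::collections::HashMap;

/// Build an object from key/value pairs in document order.
/// A later duplicate replaces the earlier entry, so the last value wins.
pub fn collect_last_wins<'a>(pairs: &[(&'a str, i64)]) -> HashMap<&'a str, i64> {
    let mut object = HashMap::new();
    for &(key, value) in pairs {
        // `insert` overwrites any value previously stored under `key`.
        object.insert(key, value);
    }
    object
}
```

For input equivalent to {"a": 1, "b": 2, "a": 3}, the resulting object has two entries and "a" maps to 3.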

big-int-as-float

The big-int-as-float feature flag treats very large integers that won't fit into u64 as f64 floats. This prevents parsing errors if the JSON you are parsing contains very large integers. Keep in mind that f64 loses some precision when representing very large numbers.
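The fallback and its precision cost can be illustrated with standard library parsing. This is a sketch of the idea only: the Number enum and parse_unsigned function are hypothetical, and only unsigned digit strings are handled.

```rust
/// Hypothetical result type: an exact integer or a float approximation.
#[derive(Debug, PartialEq)]
pub enum Number {
    U64(u64),
    F64(f64),
}

/// Parse a string of ASCII digits, falling back to f64 when it overflows u64.
pub fn parse_unsigned(digits: &str) -> Option<Number> {
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    match digits.parse::<u64>() {
        Ok(n) => Some(Number::U64(n)),
        // For an all-digit string, the only possible failure is overflow.
        Err(_) => digits.parse::<f64>().ok().map(Number::F64),
    }
}
```

Here "18446744073709551615" (u64::MAX) stays an exact integer, while "18446744073709551616" and "18446744073709551617" both become the same f64 value, which shows the loss of precision mentioned above.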

128bit

Add support for parsing and serializing 128-bit integers. This feature is disabled by default because such large numbers are rare in the wild and adding the support incurs a performance penalty.

beef

Enabling this feature can break dependencies in your dependency tree that are using simd-json.

Replace std::borrow::Cow with beef::lean::Cow. This feature is disabled by default because it is a breaking change in the API.

ordered-float

By default, the representation of Floats used in borrowed::Value and owned::Value is simply a value of f64. This has the normally-not-a-big-deal side effect that these Value types are not std::cmp::Eq. It does, however, introduce some incompatibilities when offering simd-json as a quasi-drop-in replacement for serde-json.

So, this feature changes the internal representation of Floats to be an f64 wrapped by an Eq-compatible adapter.

This probably carries with it some small performance trade-offs, hence its enablement by feature rather than by default.
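The reason f64 cannot be Eq, and what an adapter has to do about it, can be sketched as follows. EqFloat and requires_eq are hypothetical names. The adapter that simd-json actually uses may define equality differently, so treat this purely as an illustration of the concept.

```rust
/// Hypothetical adapter: an f64 that can implement `Eq`.
#[derive(Debug, Clone, Copy)]
pub struct EqFloat(pub f64);

impl PartialEq for EqFloat {
    fn eq(&self, other: &Self) -> bool {
        // Plain f64 comparison says NaN != NaN, which breaks the reflexivity
        // that `Eq` requires. Treating all NaNs as equal restores it.
        (self.0.is_nan() && other.0.is_nan()) || self.0 == other.0
    }
}

impl Eq for EqFloat {}

/// Compiles only for types that implement `Eq`.
pub fn requires_eq<T: Eq>(_value: &T) {}
```

requires_eq(&EqFloat(0.25)) compiles, whereas requires_eq(&0.25_f64) would be rejected by the compiler. The extra NaN check on every comparison is one plausible source of the small performance trade-off mentioned above.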

portable

Currently disabled

A highly experimental implementation of the algorithm using std::simd and up to 512 byte wide registers.

Usage

simd-json offers three main entry points for usage:

Values API

The values API is a set of optimized DOM objects for representing parsed JSON data whose structure is not known in advance. simd-json has two versions of this:

Borrowed Values

    use simd_json;

    let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
    let v: simd_json::BorrowedValue = simd_json::to_borrowed_value(&mut d).unwrap();

Owned Values

    use simd_json;

    let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
    let v: simd_json::OwnedValue = simd_json::to_owned_value(&mut d).unwrap();

Serde Compatible API

    use simd_json;
    use serde_json::Value;

    let mut d = br#"{"some": ["key", "value", 2]}"#.to_vec();
    let v: Value = simd_json::serde::from_slice(&mut d).unwrap();

Tape API

    use simd_json;

    let mut d = br#"{"the_answer": 42}"#.to_vec();
    let tape = simd_json::to_tape(&mut d).unwrap();
    let value = tape.as_value();
    // try_get treats value like an object, returns Ok(Some(_)) because the key is found
    assert!(value.try_get("the_answer").unwrap().unwrap() == 42);
    // returns Ok(None) because the key is not found but value is an object
    assert!(value.try_get("does_not_exist").unwrap() == None);
    // try_get_idx treats value like an array, returns Err(_) because value is not an array
    assert!(value.try_get_idx(0).is_err());

Other interesting things

Bindings for upstream simdjson are also available.

License

simd-json itself is licensed under either of

  • Apache-2.0 (LICENSE-APACHE)
  • MIT (LICENSE-MIT)

at your option.

However, it ports a lot of code from simdjson, so their work and copyright on that should also be respected.

The Serde integration is based on serde-json, so their copyright should be respected as well.

All thanks to our contributors.
