Parsing gigabytes of JSON per second
JavaScriptExpert/simdjson
- Fast: Over 2.5x faster than other production-grade JSON parsers.
- Easy: First-class, easy to use API.
- Strict: Full JSON and UTF-8 validation, lossless parsing. Performance with no compromises.
- Automatic: Selects a CPU-tailored parser at runtime. No configuration needed.
- Reliable: From memory allocation to error handling, simdjson's design avoids surprises.
This library is part of the Awesome Modern C++ list.
- Quick Start
- Documentation
- Performance results
- Real-world usage
- Bindings and Ports of simdjson
- About simdjson
- Funding
- Contributing to simdjson
- License
The simdjson library is easily consumable with a single .h and .cpp file.
Prerequisites: g++ (version 7 or better) or clang++ (version 6 or better), and a 64-bit system with a command-line shell (e.g., Linux, macOS, FreeBSD). We also support programming environments like Visual Studio and Xcode, but different steps are needed.
Pull simdjson.h and simdjson.cpp into a directory, along with the sample file twitter.json.
wget https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.h https://raw.githubusercontent.com/simdjson/simdjson/master/singleheader/simdjson.cpp https://raw.githubusercontent.com/simdjson/simdjson/master/jsonexamples/twitter.json
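Create quickstart.cpp:
#include <iostream>
#include "simdjson.h"
int main(void) {
  simdjson::dom::parser parser;
  // Load and parse the file; in this form a failure is reported by throwing an exception.
  simdjson::dom::element tweets = parser.load("twitter.json");
  std::cout << tweets["search_metadata"]["count"] << " results." << std::endl;
}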
c++ -o quickstart quickstart.cpp simdjson.cpp
./quickstart
100 results.
Usage documentation is available:
- Basics is an overview of how to use simdjson and its APIs.
- Performance shows some more advanced scenarios and how to tune for them.
- Implementation Selection describes runtime CPU detection and how you can work with it.
- API contains the automatically generated API documentation.
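To make the idea of runtime CPU detection concrete, here is a minimal sketch of the general dispatch pattern: probe the CPU once, then route every call through the chosen kernel. It is not simdjson's code or API. The names `implementation`, `active_implementation`, `count_quotes_scalar`, and `count_quotes_unrolled` are hypothetical, the "wide" kernel is a plain C++ stand-in rather than real SIMD code, and the CPU probe assumes GCC or Clang on x86 (it falls back to the portable kernel elsewhere).

```cpp
#include <cstddef>

// Hypothetical descriptor for one kernel: a name plus a function pointer.
struct implementation {
  const char* name;
  std::size_t (*count_quotes)(const char* data, std::size_t len);
};

// Portable fallback: one byte per iteration.
static std::size_t count_quotes_scalar(const char* data, std::size_t len) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < len; i++) { n += (data[i] == '"'); }
  return n;
}

// Stand-in for a CPU-tailored kernel: four bytes per iteration, then the tail.
static std::size_t count_quotes_unrolled(const char* data, std::size_t len) {
  std::size_t n = 0, i = 0;
  for (; i + 4 <= len; i += 4) {
    n += (data[i] == '"') + (data[i + 1] == '"') +
         (data[i + 2] == '"') + (data[i + 3] == '"');
  }
  for (; i < len; i++) { n += (data[i] == '"'); }
  return n;
}

// Assumption: GCC/Clang builtin on x86; every other platform reports false.
static bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

// The choice is made once, on first use, and reused for every later call.
const implementation& active_implementation() {
  static const implementation fallback{"fallback", count_quotes_scalar};
  static const implementation wide{"wide", count_quotes_unrolled};
  static const implementation& chosen = cpu_has_avx2() ? wide : fallback;
  return chosen;
}
```

A caller would write `active_implementation().count_quotes(buf, len)` and never needs to know which kernel ran. For the library's real interface, see the Implementation Selection document.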
The simdjson library uses three-quarters fewer instructions than the state-of-the-art parser RapidJSON and fifty percent fewer than sajson. To our knowledge, simdjson is the first fully-validating JSON parser to run at gigabytes per second (GB/s) on commodity processors. It can parse millions of JSON documents per second on a single core.
The following figure represents parsing speed in GB/s for parsing various files on an Intel Skylake processor (3.4 GHz) using the GNU GCC 9 compiler (with the -O3 flag). We compare against the best and fastest C++ libraries. The simdjson library offers full Unicode (UTF-8) validation and exact number parsing. The RapidJSON library is tested in two modes: fast and exact number parsing. The sajson library offers fast (but not exact) number parsing and partial Unicode validation. In this data set, the file sizes range from 65KB (github_events) all the way to 3.3GB (gsoc-2018). Many files are mostly made of numbers: canada, mesh.pretty, mesh, random and numbers. In such instances, we see lower JSON parsing speeds due to the high cost of number parsing. The simdjson library uses exact number parsing, which is particularly taxing.
On a Skylake processor, the parsing speeds (in GB/s) of various parsers on the twitter.json file are as follows, again using GNU GCC 9.1 (with the -O3 flag). The popular JSON for Modern C++ library is particularly slow: it obviously trades parsing speed for other desirable features.
parser | GB/s |
---|---|
simdjson | 2.5 |
RapidJSON UTF8-validation | 0.29 |
RapidJSON UTF8-valid., exact numbers | 0.28 |
RapidJSON insitu, UTF8-validation | 0.41 |
RapidJSON insitu, UTF8-valid., exact | 0.39 |
sajson (insitu, dynamic) | 0.62 |
sajson (insitu, static) | 0.88 |
dropbox | 0.13 |
fastjson | 0.27 |
gason | 0.59 |
ultrajson | 0.34 |
jsmn | 0.25 |
cJSON | 0.31 |
JSON for Modern C++ (nlohmann/json) | 0.11 |
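A GB/s figure like those in the table is the number of input bytes divided by the elapsed time. The sketch below shows that arithmetic and a simple timing loop. It assumes 1 GB = 10^9 bytes and reports the best of several runs. Both choices, and the helper names `gigabytes_per_second` and `measure_gbps`, are illustrative assumptions rather than a description of the project's benchmark harness.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>

// Convert a byte count and an elapsed time into GB/s (assuming 1 GB = 1e9 bytes).
double gigabytes_per_second(std::size_t bytes, double seconds) {
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Run `work(data, len)` `repeat` times and return the best observed throughput.
// `work` is any callable that processes the whole buffer, e.g. a parser call.
template <typename F>
double measure_gbps(F&& work, const char* data, std::size_t len, int repeat = 10) {
  double best = 0.0;
  for (int r = 0; r < repeat; r++) {
    auto start = std::chrono::steady_clock::now();
    work(data, len);
    auto stop = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(stop - start).count();
    // Skip runs too short for the clock to resolve.
    if (seconds > 0.0) {
      best = std::max(best, gigabytes_per_second(len, seconds));
    }
  }
  return best;
}
```

For example, processing 2,500,000,000 bytes in one second gives `gigabytes_per_second(2500000000, 1.0) == 2.5`.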
The simdjson library offers high speed whether it processes tiny files (e.g., 300 bytes) or larger files (e.g., 3MB). The following plot presents parsing speed for synthetic files over various sizes generated with a script on a 3.4 GHz Skylake processor (GNU GCC 9, -O3).
All our experiments are reproducible.
If you are planning to use simdjson in a product, please work from one of our releases.
We distinguish between "bindings" (which just wrap the C++ code) and a port to another programming language (which reimplements everything).
- ZippyJSON: Swift bindings for the simdjson project.
- libpy_simdjson: high-speed Python bindings for simdjson using libpy.
- pysimdjson: Python bindings for the simdjson project.
- simdjson-rs: Rust port.
- simdjson-rust: Rust wrapper (bindings).
- SimdJsonSharp: C# version for .NET Core (bindings and full port).
- simdjson_nodejs: Node.js bindings for the simdjson project.
- simdjson_php: PHP bindings for the simdjson project.
- simdjson_ruby: Ruby bindings for the simdjson project.
- fast_jsonparser: Ruby bindings for the simdjson project.
- simdjson-go: Go port using Golang assembly.
- rcppsimdjson: R bindings.
The simdjson library takes advantage of modern microarchitectures, parallelizing with SIMD vector instructions, reducing branch misprediction, and reducing data dependency to take advantage of each CPU's multiple execution cores.
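As a rough illustration of branch-free, data-parallel processing, the sketch below turns a 64-byte block into a 64-bit mask marking where a given character occurs. It is a scalar stand-in written for this README, not code from simdjson. The helper names `match_mask` and `count_bits` are hypothetical, and a real SIMD kernel would compare many bytes with one vector instruction instead of looping.

```cpp
#include <cstddef>
#include <cstdint>

// Build a mask whose bit i is set when block[i] == target.
// `block` must point to at least 64 readable bytes.
// The loop body contains no data-dependent branch: the comparison result
// (0 or 1) is shifted into place, so there is nothing to mispredict.
std::uint64_t match_mask(const char* block, char target) {
  std::uint64_t mask = 0;
  for (int i = 0; i < 64; i++) {
    mask |= static_cast<std::uint64_t>(block[i] == target) << i;
  }
  return mask;
}

// Count the set bits in a mask by clearing the lowest set bit each iteration.
int count_bits(std::uint64_t mask) {
  int n = 0;
  while (mask != 0) {
    mask &= mask - 1;
    n++;
  }
  return n;
}
```

Once the positions of interest are available as a bitmask, later steps can work on whole 64-bit words rather than inspecting the input one byte at a time.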
Some people enjoy reading our paper. A description of the design and implementation of simdjson is in our research article: Geoff Langdale, Daniel Lemire, Parsing Gigabytes of JSON per Second, VLDB Journal 28 (6), 2019.
We also have an informal blog post providing some background and context.
For the video inclined, there is a recorded talk (it was the best voted talk, we're kinda proud of it).
The work is supported by the Natural Sciences and Engineering Research Council of Canada under grant number RGPIN-2017-03910.
Head over to CONTRIBUTING.md for information on contributing to simdjson, and HACKING.md for information on source, building, and architecture/design.
This code is made available under the Apache License 2.0.
Under Windows, we build some tools using the windows/dirent_portable.h file (which is outside our library code): it is under the liberal (business-friendly) MIT license.
For compilers that do not support C++17, we bundle the string-view library which is published under the Boost license (http://www.boost.org/LICENSE_1_0.txt). Like the Apache license, the Boost license is a permissive license allowing commercial redistribution.