# zimdjson

Parsing gigabytes of JSON per second. A Zig port of simdjson with some fundamental features added.
JSON is everywhere on the Internet. Servers spend a lot of time parsing it. We need a fresh approach.
Welcome to zimdjson: a high-performance JSON parser that takes advantage of SIMD vector instructions, based on the paper *Parsing Gigabytes of JSON per Second*.
The majority of the source code is based on the C++ implementation at https://github.com/simdjson/simdjson, with the addition of some fundamental features:
- Streaming support, which can handle arbitrarily large documents with O(1) memory usage.
- An ergonomic, Serde-like deserialization interface thanks to Zig's compile-time reflection. See Reflection-based JSON.
- More efficient memory usage.
Install the zimdjson library by running the following command in your project root:
zig fetch --save git+https://github.com/ezequielramis/zimdjson#0.1.1
Then write the following in your `build.zig`:
```zig
const zimdjson = b.dependency("zimdjson", .{});
exe.root_module.addImport("zimdjson", zimdjson.module("zimdjson"));
```
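For context, a minimal `build.zig` wiring the module into an executable might look like the sketch below. This assumes a recent Zig build API (0.13/0.14-era); the names `example` and `src/main.zig` are placeholders for your own project.

```zig
const std = @import("std");

pub fn build(b: *std.Build) void {
    const target = b.standardTargetOptions(.{});
    const optimize = b.standardOptimizeOption(.{});

    // "example" and "src/main.zig" are placeholders for your project.
    const exe = b.addExecutable(.{
        .name = "example",
        .root_source_file = b.path("src/main.zig"),
        .target = target,
        .optimize = optimize,
    });

    // The two lines shown in the snippet above.
    const zimdjson = b.dependency("zimdjson", .{});
    exe.root_module.addImport("zimdjson", zimdjson.module("zimdjson"));

    b.installArtifact(exe);

    // Optional: a `zig build run` step for convenience.
    const run_cmd = b.addRunArtifact(exe);
    const run_step = b.step("run", "Run the example");
    run_step.dependOn(&run_cmd.step);
}
```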
As an example, download a sample file called `twitter.json`. Then execute the following:
```zig
const std = @import("std");
const zimdjson = @import("zimdjson");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}).init;
    const allocator = gpa.allocator();

    var parser = zimdjson.ondemand.StreamParser(.default).init;
    defer parser.deinit(allocator);

    const file = try std.fs.cwd().openFile("twitter.json", .{});
    defer file.close();

    const document = try parser.parseFromReader(allocator, file.reader().any());

    const metadata_count = try document.at("search_metadata").at("count").asUnsigned();
    std.debug.print("{} results.", .{metadata_count});
}
```
```
> zig build run
100 results.
```
To see how the streaming parser above handles multi-gigabyte JSON documents with minimal memory usage, download one of these dumps or play with a file of your choice.
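The program stays the same; only the input file changes. A minimal sketch, where `systemsPopulated.json` stands in for whichever dump you downloaded:

```zig
const std = @import("std");
const zimdjson = @import("zimdjson");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}).init;
    const allocator = gpa.allocator();

    var parser = zimdjson.ondemand.StreamParser(.default).init;
    defer parser.deinit(allocator);

    // Placeholder path: any multi-gigabyte JSON document works here.
    const file = try std.fs.cwd().openFile("systemsPopulated.json", .{});
    defer file.close();

    // The streaming parser consumes the reader incrementally, so memory
    // usage stays bounded regardless of the document's size.
    const document = try parser.parseFromReader(allocator, file.reader().any());
    _ = document; // query it with .at(...) as in the example above
}
```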
Currently, Linux, Windows, and macOS targets with SIMD-capable CPUs are supported. Missing targets can be added by contributing.
The most recent documentation can be found at https://zimdjson.ramis.ar.
Although the provided interfaces are simple enough, deserializing many data structures by hand tends to produce unnecessary boilerplate. Thanks to Zig's compile-time reflection, we can eliminate it:
```zig
const std = @import("std");
const zimdjson = @import("zimdjson");

const Film = struct {
    name: []const u8,
    year: u32,
    characters: []const []const u8, // we could also use std.ArrayListUnmanaged([]const u8)
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}).init;
    const allocator = gpa.allocator();

    var parser = zimdjson.ondemand.FullParser(.default).init;
    defer parser.deinit(allocator);

    const json =
        \\{
        \\  "name": "Esperando la carroza",
        \\  "year": 1985,
        \\  "characters": [
        \\    "Mamá Cora",
        \\    "Antonio",
        \\    "Sergio",
        \\    "Emilia",
        \\    "Jorge"
        \\  ]
        \\}
    ;

    const document = try parser.parseFromSlice(allocator, json);
    const film = try document.as(Film, allocator, .{});
    defer film.deinit();

    try std.testing.expectEqualDeep(
        Film{
            .name = "Esperando la carroza",
            .year = 1985,
            .characters = &.{
                "Mamá Cora",
                "Antonio",
                "Sergio",
                "Emilia",
                "Jorge",
            },
        },
        film.value,
    );
}
```
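As the comment in the struct above hints, standard-library containers can be used in place of plain slices. A minimal variant of the same schema:

```zig
const std = @import("std");

// Same Film schema as above, but `characters` is deserialized into a
// standard-library container instead of a slice.
const Film = struct {
    name: []const u8,
    year: u32,
    characters: std.ArrayListUnmanaged([]const u8),
};
```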
This is just a simple example, but this way of deserializing is as powerful as Serde, so there are many more features we can use, such as:
- Deserializing data structures from the Zig Standard Library.
- Renaming fields.
- Using different union representations.
- Custom handling of unknown fields.
To see all available options it offers, check out its reference.
To see all supported Zig Standard Library data structures, check out this list.
To see how it can really be used, check out the test suite for more examples.
> **Note:** As a rule of thumb, do not trust any benchmark; always verify it yourself. There may be biases that favor a particular candidate, including mine.
The following picture represents parsing speed in GB/s for similar tasks presented in the paper *On-Demand JSON: A Better Way to Parse Documents?*, where the first three tasks iterate over `twitter.json` and the others iterate over a 626 MB JSON file called `systemsPopulated.json` from these dumps.
At first glance the benchmark may look broken, but it is not: caching favors small files, and the streaming parser happily stops as soon as it finds the tweet in the middle of the file.
Let's remove that task to see the other results more clearly.
The following picture corresponds to a second simple benchmark, representing parsing speed in GB/s for near-complete parsing of the `twitter.json` file with reflection-based parsers (`serde_json`, `std.json`).
Note: If you look closely, you'll notice that "zimdjson (On-Demand, Unordered)" is the slowest of all. This is, unfortunately, a behaviour that also occurs with `simdjson` when object keys are unordered. If you do not know the key order, it can be mitigated by using a schema. Thanks to the `glaze` library author for pointing this out.
All benchmarks were run on a 3.30 GHz Intel Skylake processor.