Parsing gigabytes of JSON per second. Zig port of simdjson with fundamental features.

EzequielRamis/zimdjson

JSON is everywhere on the Internet. Servers spend a lot of time parsing it. We need a fresh approach.

Welcome to zimdjson: a high-performance JSON parser that takes advantage of SIMD vector instructions, based on the paper Parsing Gigabytes of JSON per Second.

The majority of the source code is based on the C++ implementation https://github.com/simdjson/simdjson, with the addition of some fundamental features:

  • Streaming support, which can handle arbitrarily large documents with O(1) memory usage.
  • An ergonomic, Serde-like deserialization interface thanks to Zig's compile-time reflection. See Reflection-based JSON.
  • More efficient memory usage.

Getting started

Install the zimdjson library by running the following command in your project root:

```sh
zig fetch --save git+https://github.com/ezequielramis/zimdjson#0.1.1
```

Then write the following in your `build.zig`:

```zig
const zimdjson = b.dependency("zimdjson", .{});
exe.root_module.addImport("zimdjson", zimdjson.module("zimdjson"));
```

As an example, download a sample file called `twitter.json`.

Then execute the following:

```zig
const std = @import("std");
const zimdjson = @import("zimdjson");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}).init;
    const allocator = gpa.allocator();

    var parser = zimdjson.ondemand.StreamParser(.default).init;
    defer parser.deinit(allocator);

    const file = try std.fs.cwd().openFile("twitter.json", .{});
    defer file.close();

    const document = try parser.parseFromReader(allocator, file.reader().any());

    const metadata_count = try document.at("search_metadata").at("count").asUnsigned();
    std.debug.print("{} results.", .{metadata_count});
}
```

```
> zig build run
100 results.
```

To see how the streaming parser above handles multi-gigabyte JSON documents with minimal memory usage, download one of these dumps or try it with a file of your choice.

Requirements

Currently, Linux, Windows, and macOS targets with SIMD-capable CPUs are supported. Missing targets can be added by contributing.

Documentation

The most recent documentation can be found at https://zimdjson.ramis.ar.

Reflection-based JSON

Although the provided interfaces are simple enough, deserializing many data structures with them leads to unnecessary boilerplate. Thanks to Zig's compile-time reflection, we can eliminate it:

```zig
const std = @import("std");
const zimdjson = @import("zimdjson");

const Film = struct {
    name: []const u8,
    year: u32,
    characters: []const []const u8, // we could also use std.ArrayListUnmanaged([]const u8)
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}).init;
    const allocator = gpa.allocator();

    var parser = zimdjson.ondemand.FullParser(.default).init;
    defer parser.deinit(allocator);

    const json =
        \\{
        \\  "name": "Esperando la carroza",
        \\  "year": 1985,
        \\  "characters": [
        \\    "Mamá Cora",
        \\    "Antonio",
        \\    "Sergio",
        \\    "Emilia",
        \\    "Jorge"
        \\  ]
        \\}
    ;

    const document = try parser.parseFromSlice(allocator, json);
    const film = try document.as(Film, allocator, .{});
    defer film.deinit();

    try std.testing.expectEqualDeep(
        Film{
            .name = "Esperando la carroza",
            .year = 1985,
            .characters = &.{
                "Mamá Cora",
                "Antonio",
                "Sergio",
                "Emilia",
                "Jorge",
            },
        },
        film.value,
    );
}
```

This is just a simple example, but this way of deserializing is as powerful as Serde, so there are many more features we can use, such as:

  • Deserializing data structures from the Zig Standard Library.
  • Renaming fields.
  • Using different union representations.
  • Custom handling of unknown fields.

To see all available options, check out the reference.

To see all supported Zig Standard Library data structures, check out this list.

To see how it is used in practice, check out the test suite for more examples.

Performance

Note

As a rule of thumb, do not trust any benchmark — always verify it yourself. There may be biases that favor a particular candidate, including mine.

The following picture represents parsing speed in GB/s of similar tasks presented in the paper On-Demand JSON: A Better Way to Parse Documents?, where the first three tasks iterate over twitter.json and the others iterate over a 626MB JSON file called systemsPopulated.json from these dumps.

At first glance the benchmark may look broken, but it is not: caches behave differently on small files, and the streaming parser simply stopped early once it found the tweet in the middle of the file.

Let's remove that task to see the other results more clearly.

The following picture corresponds to a second, simpler benchmark, representing parsing speed in GB/s for near-complete parsing of the twitter.json file alongside reflection-based parsers (serde_json, std.json).

Note: If you look closely, you'll notice that "zimdjson (On-Demand, Unordered)" is the slowest of all. This is, unfortunately, a behaviour that also occurs with simdjson when object keys are unordered. If you do not know the key order, it can be mitigated by using a schema. Thanks to the glaze library author for pointing this out.

All benchmarks were run on a 3.30GHz Intel Skylake processor.

