

simdjson

package module v0.4.5

Warning: This package is not in the latest version of its module.

Published: Mar 11, 2023
License: Apache-2.0
Imports: 17
Imported by: 46

Repository: github.com/minio/simdjson-go

README

simdjson-go

Introduction

This is a Golang port of simdjson, a high performance JSON parser developed by Daniel Lemire and Geoff Langdale. It makes extensive use of SIMD instructions to achieve parsing performance of gigabytes of JSON per second.

Performance-wise, simdjson-go runs on average at about 40% to 60% of the speed of simdjson. Compared to Go's standard package encoding/json, simdjson-go is about 10x faster (roughly 2x to 16x depending on the input, in the benchmarks below).


Features

simdjson-go is a validating parser, meaning that, among other things, it validates numeric values, booleans, etc. As a result, these values are available in their appropriate int and float64 representations after parsing.

Additionally, simdjson-go has the following features:

  • No 4 GB object limit
  • Support for ndjson (newline delimited JSON)
  • Pure Go (no need for cgo)
  • Object search/traversal
  • In-place value replacement
  • Remove object/array members
  • Serialize parsed JSON as binary data
  • Re-serialize parts as JSON

Requirements

simdjson-go has the following requirements for parsing:

A CPU with both AVX2 and CLMUL is required (Haswell from 2013 onwards should do for Intel; for AMD, a Ryzen/EPYC CPU (Q1 2017) should be sufficient). This can be checked using the provided SupportedCPU() function.

The package does not provide a fallback for unsupported CPUs, but serialized data can be deserialized on an unsupported CPU.

When compiled with gccgo, SupportedCPU() will always report an unsupported CPU, since gccgo cannot compile the assembly.

Usage

Run the following command to install simdjson-go:

go get -u github.com/minio/simdjson-go

To parse a JSON byte stream, call either simdjson.Parse(), or simdjson.ParseND() for newline delimited JSON files. Both of these functions return a ParsedJson struct that can be used to navigate the JSON object by calling Iter().

The easiest way to use it is to call the ForEach() function of the returned ParsedJson.

package main

import (
	"fmt"
	"log"

	"github.com/minio/simdjson-go"
)

func main() {
	// Parse JSON:
	pj, err := simdjson.Parse([]byte(`{"Image":{"URL":"http://example.com/example.gif"}}`), nil)
	if err != nil {
		log.Fatal(err)
	}

	// Iterate each top level element.
	_ = pj.ForEach(func(i simdjson.Iter) error {
		fmt.Println("Got iterator for type:", i.Type())
		element, err := i.FindElement(nil, "Image", "URL")
		if err == nil {
			value, _ := element.Iter.StringCvt()
			fmt.Println("Found element:", element.Name, "Type:", element.Type, "Value:", value)
		}
		return nil
	})

	// Output:
	// Got iterator for type: object
	// Found element: URL Type: string Value: http://example.com/example.gif
}
Parsing with iterators

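Using the type Iter you can call Advance() to iterate over the tape, like so:

// Assumes iter is a simdjson.Iter (for example from pj.Iter()) and key is the
// object key to look up; this runs inside a function, hence the bare returns.
var (
	tmp  *simdjson.Iter
	obj  *simdjson.Object
	elem simdjson.Element
	err  error
)
for {
	typ := iter.Advance()
	switch typ {
	case simdjson.TypeRoot:
		if typ, tmp, err = iter.Root(tmp); err != nil {
			return
		}
		if typ == simdjson.TypeObject {
			if obj, err = tmp.Object(obj); err != nil {
				return
			}
			e := obj.FindKey(key, &elem)
			if e != nil && elem.Type == simdjson.TypeString {
				v, _ := elem.Iter.StringBytes()
				fmt.Println(string(v))
			}
		}
	default:
		return
	}
}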

When you advance the Iter you get the next type currently queued.

Each type has helpers for accessing its data:

Type          Action on Iter
TypeNone      Nothing follows. Iter done
TypeNull      Null value
TypeString    String()/StringBytes()
TypeInt       Int()/Float()
TypeUint      Uint()/Float()
TypeFloat     Float()
TypeBool      Bool()
TypeObject    Object()
TypeArray     Array()
TypeRoot      Root()

You can also get the next value as an interface{} using the Interface() method.

Note that arrays and objects that are null are always returned as TypeNull.

The complex types return helpers that parse each of the underlying structures.

It is up to you to keep track of the nesting level you are operating at.

For any Iter it is possible to marshal the recursive content of the Iter using MarshalJSON() or MarshalJSONBuffer(...).

Currently, it is not possible to unmarshal into structs.

Search by path

It is possible to search by path to find elements by traversing objects.

For example:

// Find element in path.
elem, err := i.FindElement(nil, "Image", "URL")

This locates the field inside a JSON object with the following structure:

{
    "Image": {
        "URL": "value"
    }
}

The values can be any type. The Element will contain the element information and an Iter to access the content.

Parsing Objects

If you are only interested in one key in an object, you can use FindKey to quickly select it.

It is possible to use ForEach(fn func(key []byte, i Iter), onlyKeys map[string]struct{}), which calls fn for each element in the object.

An object can be traversed manually by using NextElement(dst *Iter) (name string, t Type, err error). The key of the element is returned as a string along with the type of the value, and the provided Iter will contain an iterator that gives access to the content.

There is also NextElementBytes, which provides the same but without the need to allocate a string.

All elements of the object can be retrieved using the fairly lightweight Parse method, which provides a map of all keys and all elements in a slice.

All elements of the object can be returned as map[string]interface{} using the Map method on the object. This will naturally perform allocations for all elements.

Parsing Arrays

Arrays in JSON can have mixed types.

It is possible to call ForEach(fn func(i Iter)) to get each element.

To iterate over an array with mixed types, use the Iter method to get an iterator.

There are methods that retrieve all elements as a single type: []int64, []uint64, []float64 and []string with AsInteger(), AsUint64(), AsFloat() and AsString() respectively.

Number parsing

Numbers in JSON are untyped and are returned according to the following rules, applied in order:

  • If there is any floating-point notation, such as an exponent or a decimal point, the number is always returned as a float.
  • If the number is a pure integer and fits within an int64, it is returned as such.
  • If the number is a pure positive integer and fits within a uint64, it is returned as such.
  • Otherwise, if it is a valid number, it is returned as a float64.

If the number was converted from integer notation to a float because it did not fit inside an int64/uint64, the FloatOverflowedInteger flag is set, which can be retrieved using the (Iter).FloatFlags() method.
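The rules above can be sketched with strconv from the standard library. This is a minimal illustration, not simdjson-go's parser: classify, parsedNumber and numKind are hypothetical names, the overflowed field only plays the role of the FloatOverflowedInteger flag, and the input is assumed to already be a syntactically valid JSON number.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// numKind is a hypothetical stand-in for the TypeInt/TypeUint/TypeFloat distinction.
type numKind int

const (
	kindInt numKind = iota
	kindUint
	kindFloat
)

// parsedNumber holds the result; overflowed mimics the FloatOverflowedInteger flag.
type parsedNumber struct {
	kind       numKind
	i          int64
	u          uint64
	f          float64
	overflowed bool
}

// classify applies the rules in order to an already validated JSON number token.
func classify(tok string) (parsedNumber, error) {
	// Rule 1: any fraction or exponent means float.
	if strings.ContainsAny(tok, ".eE") {
		f, err := strconv.ParseFloat(tok, 64)
		return parsedNumber{kind: kindFloat, f: f}, err
	}
	// Rule 2: a pure integer that fits in int64.
	if i, err := strconv.ParseInt(tok, 10, 64); err == nil {
		return parsedNumber{kind: kindInt, i: i}, nil
	}
	// Rule 3: a pure positive integer that fits in uint64.
	if u, err := strconv.ParseUint(tok, 10, 64); err == nil {
		return parsedNumber{kind: kindUint, u: u}, nil
	}
	// Rule 4: integer notation too large for either type: fall back to float64 and flag it.
	f, err := strconv.ParseFloat(tok, 64)
	return parsedNumber{kind: kindFloat, f: f, overflowed: true}, err
}

func main() {
	for _, tok := range []string{"42", "-7", "18446744073709551615", "1.5", "2e3", "99999999999999999999"} {
		n, err := classify(tok)
		fmt.Printf("%-22s kind=%d overflowed=%v err=%v\n", tok, n.kind, n.overflowed, err)
	}
}
```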

JSON numbers follow JavaScript's double-precision floating-point format. They:

  • Are represented in base 10 with no superfluous leading zeros (e.g. 67, 1, 100).
  • Include digits between 0 and 9.
  • Can be negative (e.g. -10).
  • Can be a fraction (e.g. 0.5); the integer part is required, so .5 is not valid.
  • Can also have an exponent of 10, prefixed by e or E with an optional plus or minus sign to indicate positive or negative exponentiation.
  • Do not support octal and hexadecimal formats.
  • Cannot have a value of NaN (Not A Number) or Infinity.
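The syntax rules listed above can be checked with a single regular expression derived from the JSON number grammar in RFC 8259. isJSONNumber is a hypothetical helper for illustration; simdjson-go validates numbers inside its own parser rather than with regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// jsonNumber encodes the grammar: optional minus, an integer part without
// superfluous leading zeros, an optional fraction with at least one digit,
// and an optional exponent with an optional sign.
var jsonNumber = regexp.MustCompile(`^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$`)

// isJSONNumber is a hypothetical helper, not part of simdjson-go.
func isJSONNumber(s string) bool {
	return jsonNumber.MatchString(s)
}

func main() {
	for _, s := range []string{"67", "-10", "0.5", "1E+3", ".5", "012", "0x1F", "NaN", "Infinity"} {
		fmt.Printf("%-9s %v\n", s, isJSONNumber(s))
	}
}
```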

Parsing NDJSON stream

Newline delimited JSON is sent as packets, with each line being a root element.

Here is an example that counts the number of "Make": "HOND" entries in NDJSON similar to this:

{"Age":20, "Make": "HOND"}
{"Age":22, "Make": "TLSA"}

import (
	"fmt"
	"io"
	"log"

	"github.com/minio/simdjson-go"
)

func findHondas(r io.Reader) {
	var nFound int

	// Communication
	reuse := make(chan *simdjson.ParsedJson, 10)
	res := make(chan simdjson.Stream, 10)
	simdjson.ParseNDStream(r, res, reuse)

	// Read results in blocks...
	for got := range res {
		if got.Error != nil {
			if got.Error == io.EOF {
				break
			}
			log.Fatal(got.Error)
		}

		var elem *simdjson.Element
		err := got.Value.ForEach(func(i simdjson.Iter) error {
			var err error
			elem, err = i.FindElement(elem, "Make")
			if err != nil {
				// No "Make" field in this element; skip it.
				return nil
			}
			bts, _ := elem.Iter.StringBytes()
			if string(bts) == "HOND" {
				nFound++
			}
			return nil
		})
		if err != nil {
			log.Fatal(err)
		}

		// Hand the parsed block back for reuse without blocking.
		select {
		case reuse <- got.Value:
		default:
		}
	}
	fmt.Println("Found", nFound, "Hondas")
}

More examples can be found in the examples subdirectory, and further documentation can be found at godoc.

In-place Value Replacement

It is possible to replace a few basic internal values. This means that when re-parsing or re-serializing the parsed JSON, these values will be output.

Boolean (true/false) and null values can be freely exchanged.

Numeric values (float, int, uint) can be exchanged freely.

Strings can also be exchanged with different values.

Strings and numbers can be exchanged. However, note that numbers inserted as object keys are not checked, so doing this can produce invalid JSON.

Objects and arrays cannot be modified, other than changing the value types above inside them, and elements cannot be added. Elements can, however, be deleted, as described in the next section.

To replace the value referenced by an Iter, call SetNull, SetBool, SetFloat, SetInt, SetUInt, SetString or SetStringBytes.

Object & Array Element Deletion

It is possible to delete one or more elements in an object.

(*Object).DeleteElems(fn, onlyKeys) will call fn for each key + value.

If true is returned, the key + value is deleted. A key filter can be provided for optional filtering. If the callback function is nil, all elements matching the filter will be deleted. If both are nil, all elements are deleted.

Example:

// The object we are modifying
var obj *simdjson.Object

// Delete all entries where the key is "unwanted":
err = obj.DeleteElems(func(key []byte, i simdjson.Iter) bool {
	return string(key) == "unwanted"
}, nil)

// Alternative version with prefiltered keys:
err = obj.DeleteElems(nil, map[string]struct{}{"unwanted": {}})
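The deletion rules for (*Object).DeleteElems can be illustrated on a plain ordered slice of key/value pairs. This is a sketch of the documented behaviour only: member and deleteElems are hypothetical, and the real method works on the tape rather than on Go slices.

```go
package main

import "fmt"

// member is a hypothetical stand-in for one key/value pair of a JSON object.
type member struct {
	key   string
	value interface{}
}

// deleteElems mimics the documented rules on a plain slice:
//   - if onlyKeys is non-nil, members whose key is not in it are never deleted;
//   - if fn is non-nil, it decides for each remaining member (true means delete);
//   - if fn is nil, every member that passes the filter is deleted,
//     so with both nil, everything is deleted.
func deleteElems(members []member, fn func(key string, value interface{}) bool, onlyKeys map[string]struct{}) []member {
	kept := make([]member, 0, len(members))
	for _, m := range members {
		if onlyKeys != nil {
			if _, ok := onlyKeys[m.key]; !ok {
				kept = append(kept, m) // filtered out: keep unconditionally
				continue
			}
		}
		if fn == nil || fn(m.key, m.value) {
			continue // delete this member
		}
		kept = append(kept, m)
	}
	return kept
}

func main() {
	obj := []member{{"a", 1}, {"unwanted", 2}, {"b", 3}}
	byCallback := deleteElems(obj, func(k string, _ interface{}) bool { return k == "unwanted" }, nil)
	byFilter := deleteElems(obj, nil, map[string]struct{}{"unwanted": {}})
	fmt.Println(len(byCallback), len(byFilter), len(deleteElems(obj, nil, nil))) // 2 2 0
}
```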

(*Array).DeleteElems(fn func(i Iter) bool) will call fn for each array value. If the function returns true, the element is deleted from the array.

// The array we are modifying
var array *simdjson.Array

// Delete all entries that are strings.
array.DeleteElems(func(i simdjson.Iter) bool {
	return i.Type() == simdjson.TypeString
})

Serializing parsed json

It is possible to serialize parsed JSON for more compact storage and faster load time.

To create a new serializer, use NewSerializer. This serializer can be reused for several JSON blocks.

The serializer provides string deduplication and compression of elements. This can be fine-tuned using the CompressMode setting.

To serialize a block of parsed data, use the Serialize method.

To read it back, use the Deserialize method. For deserializing, the compression mode does not need to match, since it is read from the stream.
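The serialized layout is not described here, so the following only illustrates the general idea behind string deduplication: each distinct string is stored once and every occurrence is replaced by an index into that table. dedup and expand are hypothetical helpers, not simdjson-go's serializer.

```go
package main

import "fmt"

// dedup stores each distinct string once in table and replaces every
// occurrence with its index in refs.
func dedup(values []string) (table []string, refs []uint32) {
	seen := make(map[string]uint32)
	for _, v := range values {
		idx, ok := seen[v]
		if !ok {
			idx = uint32(len(table))
			table = append(table, v)
			seen[v] = idx
		}
		refs = append(refs, idx)
	}
	return table, refs
}

// expand reverses dedup.
func expand(table []string, refs []uint32) []string {
	out := make([]string, len(refs))
	for i, r := range refs {
		out[i] = table[r]
	}
	return out
}

func main() {
	table, refs := dedup([]string{"HOND", "TLSA", "HOND", "HOND"})
	fmt.Println(table, refs) // [HOND TLSA] [0 1 0 0]
}
```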

Example of speed for serializer/deserializer on parking-citations-1M.

Compress Mode   % of JSON size   Serialize Speed   Deserialize Speed
None            177.26%          425.70 MB/s       2334.33 MB/s
Fast            17.20%           412.75 MB/s       1234.76 MB/s
Default         16.85%           411.59 MB/s       1242.09 MB/s
Best            10.91%           337.17 MB/s       806.23 MB/s

In some cases the speed difference and compression difference will be bigger.

Performance vs encoding/json and json-iterator/go

Although simdjson-go provides different output than traditional unmarshal functions, this gives an overview of the expected performance for reading specific data in JSON.

Below is a performance comparison to Go's standard package encoding/json based on the same set of JSON test files, unmarshaling to interface{}.

Comparisons with default settings:

λ benchcmp enc-json.txt simdjson.txt
benchmark                      old ns/op     new ns/op     delta
BenchmarkApache_builds-32      1219080       142972        -88.27%
BenchmarkCanada-32             38362219      13417193      -65.02%
BenchmarkCitm_catalog-32       17051899      1359983       -92.02%
BenchmarkGithub_events-32      603037        74042         -87.72%
BenchmarkGsoc_2018-32          20777333      1259171       -93.94%
BenchmarkInstruments-32        2626808       301370        -88.53%
BenchmarkMarine_ik-32          56630295      14419901      -74.54%
BenchmarkMesh-32               13411486      4206251       -68.64%
BenchmarkMesh_pretty-32        18226803      4786081       -73.74%
BenchmarkNumbers-32            2131951       909641        -57.33%
BenchmarkRandom-32             7360966       1004387       -86.36%
BenchmarkTwitter-32            6635848       588773        -91.13%
BenchmarkTwitterescaped-32     6292856       972250        -84.55%
BenchmarkUpdate_center-32      6396501       708717        -88.92%

benchmark                      old MB/s     new MB/s     speedup
BenchmarkApache_builds-32      104.40       890.21       8.53x
BenchmarkCanada-32             58.68        167.77       2.86x
BenchmarkCitm_catalog-32       101.29       1270.02      12.54x
BenchmarkGithub_events-32      108.01       879.67       8.14x
BenchmarkGsoc_2018-32          160.17       2642.88      16.50x
BenchmarkInstruments-32        83.88        731.15       8.72x
BenchmarkMarine_ik-32          52.68        206.90       3.93x
BenchmarkMesh-32               53.95        172.03       3.19x
BenchmarkMesh_pretty-32        86.54        329.57       3.81x
BenchmarkNumbers-32            70.42        165.04       2.34x
BenchmarkRandom-32             69.35        508.25       7.33x
BenchmarkTwitter-32            95.17        1072.59      11.27x
BenchmarkTwitterescaped-32     89.37        578.46       6.47x
BenchmarkUpdate_center-32      83.35        752.31       9.03x

benchmark                      old allocs     new allocs     delta
BenchmarkApache_builds-32      9716           22             -99.77%
BenchmarkCanada-32             392535         250            -99.94%
BenchmarkCitm_catalog-32       95372          110            -99.88%
BenchmarkGithub_events-32      3328           17             -99.49%
BenchmarkGsoc_2018-32          58615          67             -99.89%
BenchmarkInstruments-32        13336          33             -99.75%
BenchmarkMarine_ik-32          614776         467            -99.92%
BenchmarkMesh-32               149504         122            -99.92%
BenchmarkMesh_pretty-32        149504         122            -99.92%
BenchmarkNumbers-32            20025          28             -99.86%
BenchmarkRandom-32             66083          76             -99.88%
BenchmarkTwitter-32            31261          53             -99.83%
BenchmarkTwitterescaped-32     31757          53             -99.83%
BenchmarkUpdate_center-32      49074          58             -99.88%

benchmark                      old bytes     new bytes     delta
BenchmarkApache_builds-32      461556        965           -99.79%
BenchmarkCanada-32             10943847      39793         -99.64%
BenchmarkCitm_catalog-32       5122732       6089          -99.88%
BenchmarkGithub_events-32      186148        802           -99.57%
BenchmarkGsoc_2018-32          7032092       17215         -99.76%
BenchmarkInstruments-32        882265        1310          -99.85%
BenchmarkMarine_ik-32          22564413      189870        -99.16%
BenchmarkMesh-32               7130934       15483         -99.78%
BenchmarkMesh_pretty-32        7288661       12066         -99.83%
BenchmarkNumbers-32            1066304       1280          -99.88%
BenchmarkRandom-32             2787054       4096          -99.85%
BenchmarkTwitter-32            2152260       2550          -99.88%
BenchmarkTwitterescaped-32     2330548       3062          -99.87%
BenchmarkUpdate_center-32      2729631       3235          -99.88%

Here is another benchmark comparison to json-iterator/go, unmarshaling to interface{}.

λ benchcmp jsiter.txt simdjson.txt
benchmark                      old ns/op     new ns/op     delta
BenchmarkApache_builds-32      891370        142972        -83.96%
BenchmarkCanada-32             52365386      13417193      -74.38%
BenchmarkCitm_catalog-32       10154544      1359983       -86.61%
BenchmarkGithub_events-32      398741        74042         -81.43%
BenchmarkGsoc_2018-32          15584278      1259171       -91.92%
BenchmarkInstruments-32        1858339       301370        -83.78%
BenchmarkMarine_ik-32          49881479      14419901      -71.09%
BenchmarkMesh-32               15038300      4206251       -72.03%
BenchmarkMesh_pretty-32        17655583      4786081       -72.89%
BenchmarkNumbers-32            2903165       909641        -68.67%
BenchmarkRandom-32             6156849       1004387       -83.69%
BenchmarkTwitter-32            4655981       588773        -87.35%
BenchmarkTwitterescaped-32     5521004       972250        -82.39%
BenchmarkUpdate_center-32      5540200       708717        -87.21%

benchmark                      old MB/s     new MB/s     speedup
BenchmarkApache_builds-32      142.79       890.21       6.23x
BenchmarkCanada-32             42.99        167.77       3.90x
BenchmarkCitm_catalog-32       170.09       1270.02      7.47x
BenchmarkGithub_events-32      163.34       879.67       5.39x
BenchmarkGsoc_2018-32          213.54       2642.88      12.38x
BenchmarkInstruments-32        118.57       731.15       6.17x
BenchmarkMarine_ik-32          59.81        206.90       3.46x
BenchmarkMesh-32               48.12        172.03       3.58x
BenchmarkMesh_pretty-32        89.34        329.57       3.69x
BenchmarkNumbers-32            51.71        165.04       3.19x
BenchmarkRandom-32             82.91        508.25       6.13x
BenchmarkTwitter-32            135.64       1072.59      7.91x
BenchmarkTwitterescaped-32     101.87       578.46       5.68x
BenchmarkUpdate_center-32      96.24        752.31       7.82x

benchmark                      old allocs     new allocs     delta
BenchmarkApache_builds-32      13248          22             -99.83%
BenchmarkCanada-32             665988         250            -99.96%
BenchmarkCitm_catalog-32       118755         110            -99.91%
BenchmarkGithub_events-32      4442           17             -99.62%
BenchmarkGsoc_2018-32          90915          67             -99.93%
BenchmarkInstruments-32        18776          33             -99.82%
BenchmarkMarine_ik-32          692512         467            -99.93%
BenchmarkMesh-32               184137         122            -99.93%
BenchmarkMesh_pretty-32        204037         122            -99.94%
BenchmarkNumbers-32            30037          28             -99.91%
BenchmarkRandom-32             88091          76             -99.91%
BenchmarkTwitter-32            45040          53             -99.88%
BenchmarkTwitterescaped-32     47198          53             -99.89%
BenchmarkUpdate_center-32      66757          58             -99.91%

benchmark                      old bytes     new bytes     delta
BenchmarkApache_builds-32      518350        965           -99.81%
BenchmarkCanada-32             16189358      39793         -99.75%
BenchmarkCitm_catalog-32       5571982       6089          -99.89%
BenchmarkGithub_events-32      221631        802           -99.64%
BenchmarkGsoc_2018-32          11771591      17215         -99.85%
BenchmarkInstruments-32        991674        1310          -99.87%
BenchmarkMarine_ik-32          25257277      189870        -99.25%
BenchmarkMesh-32               7991707       15483         -99.81%
BenchmarkMesh_pretty-32        8628570       12066         -99.86%
BenchmarkNumbers-32            1226518       1280          -99.90%
BenchmarkRandom-32             3167528       4096          -99.87%
BenchmarkTwitter-32            2426730       2550          -99.89%
BenchmarkTwitterescaped-32     2607198       3062          -99.88%
BenchmarkUpdate_center-32      3052382       3235          -99.89%
In-place strings

The best performance is obtained by keeping the JSON message fully mapped in memory and using the WithCopyStrings(false) option. This prevents duplicate copies of string values from being made, but requires that the original JSON buffer is kept alive until the ParsedJson object is no longer needed (i.e. until iteration over the tape format has been completed).

If the JSON message buffer is freed earlier (or for streaming use cases where memory is reused), WithCopyStrings(true) should be used (which is the default behaviour).
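Why the buffer must stay alive with WithCopyStrings(false) can be shown with plain byte slices. In this sketch strRef is a hypothetical zero-copy reference into the message buffer (standing in for a tape payload that points into the message), while the string conversion stands in for the copying behaviour of WithCopyStrings(true).

```go
package main

import "fmt"

// strRef is a hypothetical zero-copy string reference: an offset and a length
// into a message buffer.
type strRef struct{ off, n int }

// in resolves the reference against buf; the result aliases buf.
func (r strRef) in(buf []byte) []byte { return buf[r.off : r.off+r.n] }

func main() {
	buf := []byte(`{"Make":"HOND"}`)
	ref := strRef{off: 9, n: 4}   // refers to HOND inside buf
	copied := string(ref.in(buf)) // an independent copy

	// A streaming reader reuses the same buffer for the next message.
	copy(buf, `{"Make":"TLSA"}`)

	fmt.Println(string(ref.in(buf))) // TLSA: the reference now sees the new message
	fmt.Println(copied)              // HOND: the copy is unaffected
}
```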

The performance impact differs based on the input type, but these are the general differences:

BenchmarkApache_builds/copy-32                    8242    142972 ns/op  890.21 MB/s     965 B/op      22 allocs/op
BenchmarkApache_builds/nocopy-32                 10000    111189 ns/op 1144.68 MB/s     932 B/op      22 allocs/op
BenchmarkCanada/copy-32                             91  13417193 ns/op  167.77 MB/s   39793 B/op     250 allocs/op
BenchmarkCanada/nocopy-32                           87  13392401 ns/op  168.08 MB/s   41334 B/op     250 allocs/op
BenchmarkCitm_catalog/copy-32                      889   1359983 ns/op 1270.02 MB/s    6089 B/op     110 allocs/op
BenchmarkCitm_catalog/nocopy-32                    924   1268470 ns/op 1361.64 MB/s    5582 B/op     110 allocs/op
BenchmarkGithub_events/copy-32                   16092     74042 ns/op  879.67 MB/s     802 B/op      17 allocs/op
BenchmarkGithub_events/nocopy-32                 19446     62143 ns/op 1048.10 MB/s     794 B/op      17 allocs/op
BenchmarkGsoc_2018/copy-32                         948   1259171 ns/op 2642.88 MB/s   17215 B/op      67 allocs/op
BenchmarkGsoc_2018/nocopy-32                      1144   1040864 ns/op 3197.18 MB/s    9947 B/op      67 allocs/op
BenchmarkInstruments/copy-32                      3932    301370 ns/op  731.15 MB/s    1310 B/op      33 allocs/op
BenchmarkInstruments/nocopy-32                    4443    271500 ns/op  811.59 MB/s    1258 B/op      33 allocs/op
BenchmarkMarine_ik/copy-32                          79  14419901 ns/op  206.90 MB/s  189870 B/op     467 allocs/op
BenchmarkMarine_ik/nocopy-32                        79  14176758 ns/op  210.45 MB/s  189867 B/op     467 allocs/op
BenchmarkMesh/copy-32                              288   4206251 ns/op  172.03 MB/s   15483 B/op     122 allocs/op
BenchmarkMesh/nocopy-32                            285   4207299 ns/op  171.99 MB/s   15615 B/op     122 allocs/op
BenchmarkMesh_pretty/copy-32                       248   4786081 ns/op  329.57 MB/s   12066 B/op     122 allocs/op
BenchmarkMesh_pretty/nocopy-32                     250   4803647 ns/op  328.37 MB/s   12009 B/op     122 allocs/op
BenchmarkNumbers/copy-32                          1336    909641 ns/op  165.04 MB/s    1280 B/op      28 allocs/op
BenchmarkNumbers/nocopy-32                        1321    910493 ns/op  164.88 MB/s    1281 B/op      28 allocs/op
BenchmarkRandom/copy-32                           1201   1004387 ns/op  508.25 MB/s    4096 B/op      76 allocs/op
BenchmarkRandom/nocopy-32                         1554    773142 ns/op  660.26 MB/s    3198 B/op      76 allocs/op
BenchmarkTwitter/copy-32                          2035    588773 ns/op 1072.59 MB/s    2550 B/op      53 allocs/op
BenchmarkTwitter/nocopy-32                        2485    475949 ns/op 1326.85 MB/s    2029 B/op      53 allocs/op
BenchmarkTwitterescaped/copy-32                   1189    972250 ns/op  578.46 MB/s    3062 B/op      53 allocs/op
BenchmarkTwitterescaped/nocopy-32                 1372    874972 ns/op  642.77 MB/s    2518 B/op      53 allocs/op
BenchmarkUpdate_center/copy-32                    1665    708717 ns/op  752.31 MB/s    3235 B/op      58 allocs/op
BenchmarkUpdate_center/nocopy-32                  2241    536027 ns/op  994.68 MB/s    2130 B/op      58 allocs/op

Design

simdjson-go follows the same two-stage design as simdjson. During the first stage, the structural elements ({, }, [, ], :, and ,) are detected and forwarded as offsets in the message buffer to the second stage. The second stage builds a tape format of the structure of the JSON document.

Note that, in contrast to simdjson, simdjson-go outputs uint32 increments (as opposed to absolute values) to the second stage. This allows arbitrarily large JSON files to be parsed, as long as a single element (such as a string) does not exceed 4 GB.
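A minimal sketch of the incremental encoding, assuming sorted absolute offsets with the first delta measured from 0 (the exact base used by simdjson-go's stage 1 is not described here). toDeltas and fromDeltas are hypothetical helpers: only each gap has to fit in a uint32, while the reconstructed positions are accumulated in 64 bits.

```go
package main

import (
	"fmt"
	"math"
)

// toDeltas turns sorted absolute offsets into uint32 increments. Only the gap
// between consecutive offsets must fit in 32 bits, so the total input may exceed 4 GB.
func toDeltas(offsets []uint64) ([]uint32, error) {
	deltas := make([]uint32, 0, len(offsets))
	var prev uint64
	for _, off := range offsets {
		d := off - prev
		if d > math.MaxUint32 {
			return nil, fmt.Errorf("gap of %d bytes at offset %d does not fit in uint32", d, off)
		}
		deltas = append(deltas, uint32(d))
		prev = off
	}
	return deltas, nil
}

// fromDeltas restores absolute offsets, accumulating in 64 bits.
func fromDeltas(deltas []uint32) []uint64 {
	out := make([]uint64, len(deltas))
	var pos uint64
	for i, d := range deltas {
		pos += uint64(d)
		out[i] = pos
	}
	return out
}

func main() {
	abs := []uint64{0, 10, 4_000_000_000, 8_000_000_000}
	d, err := toDeltas(abs)
	fmt.Println(d, err, fromDeltas(d))
}
```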

Also, for better performance, both stages run concurrently as separate goroutines, and a Go channel is used to communicate between the two stages.

Stage 1

Stage 1 has been converted from the original C code (containing the SIMD intrinsics) to Go assembly using c2goasm. It essentially consists of five separate steps:

  • find_odd_backslash_sequences: detect backslash characters used to escape quotes
  • find_quote_mask_and_bits: generate a mask with bits turned on for characters between quotes
  • find_whitespace_and_structurals: generate a mask for whitespace plus a mask for the structural characters
  • finalize_structurals: combine the masks computed above into a final mask where each active bit represents the position of a structural character in the input message.
  • flatten_bits_incremental: output the active bits in the final mask as incremental offsets.

For more details, take a look at the various test cases in find_subroutines_amd64_test.go to see how the individual routines can be invoked (typically with a 64-byte input buffer that generates one or more 64-bit masks).

There is one final routine, find_structural_bits_in_slice, that ties it all together and is invoked with a slice of the message buffer in order to find the incremental offsets.

Stage 2

During Stage 2, the tape structure is constructed. It is essentially a single function that jumps around as it finds the various structural characters and builds the hierarchy of the JSON document that it processes. The values of the JSON elements, such as strings, integers and booleans, are parsed and written to the tape.

Any errors (such as an array not being closed or a missing closing brace) are detected and reported back as errors to the client.

Tape format

Similarly to simdjson, simdjson-go parses the structure onto a 'tape' format. With this format it is possible to skip over arrays and (sub)objects, as their sizes are recorded in the tape.

The simdjson-go format is the same as the simdjson tape format, with the following exceptions:

  • In order to support ndjson, it is possible to have more than one root element on the tape. Also, to allow for fast navigation over root elements, a root points to the next root element (and as such the last root element points 1 index past the length of the tape).
  • A "NOP" tag is added. Its value contains the number of tape entries to skip forward to reach the next tag.
  • Strings are handled differently: unlike simdjson, the string size is not prepended in the String buffer but is added as an additional element to the tape itself (much like integers and floats).
    • With WithCopyStrings(false), only strings that contain special characters are copied to the String buffer, in which case the payload from the tape is the offset into the String buffer. For string values without special characters, the tape's payload points directly into the message buffer.
    • With WithCopyStrings(true) (the default), strings are always copied to the String buffer.

For more information, see TestStage2BuildTape in stage2_build_tape_test.go.

Fuzz Tests

simdjson-go has been extensively fuzz tested to ensure that input cannot generate crashes and that output matches the standard library.

The fuzz tests are included as Go 1.18+ compatible tests.

License

simdjson-go is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

Contributing

Contributions are welcome, please send PRs for any enhancements.

If your PR includes parsing changes, please run the fuzz tests for a couple of hours.

Documentation


Constants

const (
	TagString      = Tag('"')
	TagInteger     = Tag('l')
	TagUint        = Tag('u')
	TagFloat       = Tag('d')
	TagNull        = Tag('n')
	TagBoolTrue    = Tag('t')
	TagBoolFalse   = Tag('f')
	TagObjectStart = Tag('{')
	TagObjectEnd   = Tag('}')
	TagArrayStart  = Tag('[')
	TagArrayEnd    = Tag(']')
	TagRoot        = Tag('r')
	TagNop         = Tag('N')
	TagEnd         = Tag(0)
)

const JSONTAGMASK = 0xff << JSONTAGOFFSET

const JSONTAGOFFSET = 56

const JSONVALUEMASK = 0xff_ffff_ffff_ffff

const STRINGBUFBIT = 0x80_0000_0000_0000

const STRINGBUFMASK = 0x7fffffffffffff
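The constants above suggest how a 64-bit tape entry is split: the tag in the top 8 bits (JSONTAGOFFSET = 56) and a 56-bit payload (JSONVALUEMASK). The sketch below packs and unpacks entries under that reading. It additionally assumes, based only on the constant names and the tape-format description, that STRINGBUFBIT marks a string payload that is an offset into the String buffer; pack, unpack and stringLocation are hypothetical helpers.

```go
package main

import "fmt"

// Values copied from the constants documented above.
const (
	JSONTAGOFFSET = 56
	JSONTAGMASK   = 0xff << JSONTAGOFFSET
	JSONVALUEMASK = 0xff_ffff_ffff_ffff
	STRINGBUFBIT  = 0x80_0000_0000_0000
	STRINGBUFMASK = 0x7fffffffffffff
)

// pack puts the tag in the top 8 bits and a 56-bit payload below it.
func pack(tag byte, payload uint64) uint64 {
	return uint64(tag)<<JSONTAGOFFSET | payload&JSONVALUEMASK
}

// unpack splits a tape entry into its tag and payload.
func unpack(e uint64) (byte, uint64) {
	return byte((e & JSONTAGMASK) >> JSONTAGOFFSET), e & JSONVALUEMASK
}

// stringLocation interprets a string payload, assuming STRINGBUFBIT marks an
// offset into the String buffer and its absence an offset into the message.
func stringLocation(payload uint64) (inStringBuf bool, offset uint64) {
	return payload&STRINGBUFBIT != 0, payload & STRINGBUFMASK
}

func main() {
	tag, payload := unpack(pack('"', STRINGBUFBIT|42))
	inBuf, off := stringLocation(payload)
	fmt.Printf("%c %v %d\n", tag, inBuf, off) // " true 42
}
```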

Variables

var ErrPathNotFound = errors.New("path not found")

ErrPathNotFound is returned

TagToType converts a tag to type. For arrays and objects, only the start tag will return types. All non-existing tags return TypeNone.

Functions

func ParseNDStream

func ParseNDStream(r io.Reader, res chan<- Stream, reuse <-chan *ParsedJson)

ParseNDStream will parse a stream and return parsed JSON to the supplied result channel. The method will return immediately. Each element is contained within a root tag.

<root>Element 1</root><root>Element 2</root>...

Each result will contain an unspecified number of full elements, so it can be assumed that each result starts and ends with a root tag. The parser will keep parsing until writes to the result stream block. A stream is finished when a non-nil Error is returned. If the stream was parsed until the end, the Error value will be io.EOF. The channel will be closed after an error has been returned. An optional channel for returning consumed results can be provided. There is no guarantee that elements will be consumed, so always use non-blocking writes to the reuse channel.
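A non-blocking write to the reuse channel is done with a select statement that has a default case. In this sketch parsedJSON is a hypothetical stand-in for *ParsedJson and giveBack is a hypothetical helper; if the channel has no free buffer slot, the value is dropped and left to the garbage collector instead of blocking the consumer.

```go
package main

import "fmt"

// parsedJSON is a hypothetical stand-in for *simdjson.ParsedJson.
type parsedJSON struct{ id int }

// giveBack offers pj for reuse without ever blocking. It reports whether the
// value was accepted; otherwise pj is simply dropped.
func giveBack(reuse chan<- *parsedJSON, pj *parsedJSON) bool {
	select {
	case reuse <- pj:
		return true
	default:
		return false
	}
}

func main() {
	reuse := make(chan *parsedJSON, 1)
	fmt.Println(giveBack(reuse, &parsedJSON{id: 1})) // true: the buffer had room
	fmt.Println(giveBack(reuse, &parsedJSON{id: 2})) // false: buffer full, dropped instead of blocking
}
```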

func SupportedCPU

func SupportedCPU() bool

SupportedCPU will return whether the CPU is supported.

Types

type Array

type Array struct {
	// contains filtered or unexported fields
}

Array represents a JSON array. There are methods that allow retrieving full arrays if all values have the same type. Otherwise, an iterator can be retrieved.

Example
if !SupportedCPU() {
	// Fake it
	fmt.Println("Found array\nType: int value: 116\nType: int value: 943\nType: int value: 234\nType: int value: 38793")
	return
}
input := `{
    "Image":
    {
        "Animated": false,
        "Height": 600,
        "IDs":
        [
            116,
            943,
            234,
            38793
        ],
        "Thumbnail":
        {
            "Height": 125,
            "Url": "http://www.example.com/image/481989943",
            "Width": 100
        },
        "Title": "View from 15th Floor",
        "Width": 800
    },
"Alt": "Image of city" }`
pj, err := Parse([]byte(input), nil)
if err != nil {
	log.Fatal(err)
}
i := pj.Iter()
i.AdvanceInto()

// Grab root
_, root, err := i.Root(nil)
if err != nil {
	log.Fatal(err)
}

// Grab top object
obj, err := root.Object(nil)
if err != nil {
	log.Fatal(err)
}

// Find element in path.
elem, err := obj.FindPath(nil, "Image", "IDs")
if err != nil {
	log.Fatal(err)
}
fmt.Println("Found", elem.Type)
if elem.Type == TypeArray {
	array, err := elem.Iter.Array(nil)
	if err != nil {
		log.Fatal(err)
	}
	array.ForEach(func(i Iter) {
		asString, _ := i.StringCvt()
		fmt.Println("Type:", i.Type(), "value:", asString)
	})
}

Output:
Found array
Type: int value: 116
Type: int value: 943
Type: int value: 234
Type: int value: 38793

func (*Array) AsFloat

func (a *Array) AsFloat() ([]float64, error)

AsFloat returns the array values as float. Integers are automatically converted to float.

func (*Array) AsInteger

func (a *Array) AsInteger() ([]int64, error)

AsInteger returns the array values as int64 values. Uints/Floats are automatically converted to int64 if they fit within the range.

func (*Array) AsString

func (a *Array) AsString() ([]string, error)

AsString returns the array values as a slice of strings. No conversion is done.

func (*Array) AsStringCvt added in v0.2.1

func (a *Array) AsStringCvt() ([]string, error)

AsStringCvt returns the array values as a slice of strings. Scalar types are converted. Root, Object and Array values are not supported and will return an error if found.

func (*Array) AsUint64 added in v0.2.1

func (a *Array) AsUint64() ([]uint64, error)

AsUint64 returns the array values as uint64 values. Uints/Floats are automatically converted to uint64 if they fit within the range.

func (*Array) DeleteElems added in v0.4.3

func (a *Array) DeleteElems(fn func(i Iter) bool)

DeleteElems calls the provided function for every element. If the function returns true, the element is deleted from the array.

Example
if !SupportedCPU() {
	// Fake it
	fmt.Println("Found array\nModified: {\"Image\":{\"Animated\":false,\"Height\":600,\"IDs\":[943,38793]},\"Alt\":\"Image of city\"}")
	return
}
input := `{
    "Image":
    {
        "Animated": false,
        "Height": 600,
        "IDs":
        [
            116,
            943,
            234,
            38793
        ]
    },
"Alt": "Image of city" }`
pj, err := Parse([]byte(input), nil)
if err != nil {
	log.Fatal(err)
}
i := pj.Iter()
i.AdvanceInto()

// Grab root
_, root, err := i.Root(nil)
if err != nil {
	log.Fatal(err)
}

// Grab top object
obj, err := root.Object(nil)
if err != nil {
	log.Fatal(err)
}

// Find element in path.
elem, err := obj.FindPath(nil, "Image", "IDs")
if err != nil {
	log.Fatal(err)
}
fmt.Println("Found", elem.Type)
if elem.Type == TypeArray {
	array, err := elem.Iter.Array(nil)
	if err != nil {
		log.Fatal(err)
	}
	// Delete all integer elements that are < 500
	array.DeleteElems(func(i Iter) bool {
		if id, err := i.Int(); err == nil {
			return id < 500
		}
		return false
	})
}
b, err := root.MarshalJSON()
if err != nil {
	log.Fatal(err)
}
fmt.Println("Modified:", string(b))

Output:
Found array
Modified: {"Image":{"Animated":false,"Height":600,"IDs":[943,38793]},"Alt":"Image of city"}

func (*Array) FirstType

func (a *Array) FirstType() Type

FirstType will return the type of the first element. If there are no elements, TypeNone is returned.

func (*Array) ForEach added in v0.4.0

func (a *Array) ForEach(fn func(i Iter))

ForEach calls the provided function for every element.

func (*Array) Interface

func (a *Array) Interface() ([]interface{}, error)

Interface returns the array as a slice of interfaces. See Iter.Interface() for a reference on value types.

func (*Array) Iter

func (a *Array) Iter() Iter

Iter returns the array as an iterator. This can be used for parsing mixed-content arrays. The first value is made ready by a call to Advance; calling Advance after the last element should return TypeNone.

func (*Array) MarshalJSON

func (a *Array) MarshalJSON() ([]byte, error)

MarshalJSON will marshal the entire remaining scope of the iterator.

func (*Array) MarshalJSONBuffer

func (a *Array) MarshalJSONBuffer(dst []byte) ([]byte, error)

MarshalJSONBuffer will marshal all elements. An optional buffer can be provided for fewer allocations. Output will be appended to the destination.
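The MarshalJSONBuffer methods share an append-to-destination contract: output is appended to dst and the possibly reallocated slice is returned, so one buffer can be recycled across calls by passing buf[:0]. The sketch below shows that general Go idiom with a hypothetical `appendIntArray` helper built on strconv.AppendInt; it is not simdjson-go code.

```go
package main

import (
	"fmt"
	"strconv"
)

// appendIntArray is a hypothetical helper that appends vals to dst as a JSON
// array and returns the extended slice, following the same append-to-dst
// convention described for MarshalJSONBuffer.
func appendIntArray(dst []byte, vals []int64) []byte {
	dst = append(dst, '[')
	for i, v := range vals {
		if i > 0 {
			dst = append(dst, ',')
		}
		dst = strconv.AppendInt(dst, v, 10)
	}
	return append(dst, ']')
}

func main() {
	var buf []byte
	for _, batch := range [][]int64{{116, 943}, {234, 38793}} {
		// Truncate to length 0 but keep the capacity, so later calls
		// reuse the same backing array instead of allocating.
		buf = appendIntArray(buf[:0], batch)
		fmt.Println(string(buf))
	}
}
```

Passing a non-empty dst keeps its existing bytes, which is what "appended to the destination" implies.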

type CompressMode added in v0.1.4

type CompressMode uint8

const (
	// CompressNone no compression whatsoever.
	CompressNone CompressMode = iota

	// CompressFast will apply light compression,
	// but will not deduplicate strings which may affect deserialization speed.
	CompressFast

	// CompressDefault applies light compression and deduplicates strings.
	CompressDefault

	// CompressBest
	CompressBest
)

type Element

type Element struct {
	// Name of the element
	Name string
	// Type of the element
	Type Type
	// Iter containing the element
	Iter Iter
}

Element represents an element in an object.

type Elements

type Elements struct {
	Elements []Element
	Index    map[string]int
}

Elements contains all elements in an object, kept in original order. Index maps each object key to its position in Elements for lookups.

func (Elements) Lookup

func (e Elements) Lookup(key string) *Element

Lookup a key in elements and return the element. Returns nil if the key doesn't exist. Keys are case sensitive.
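The Elements layout (an ordered slice plus a key-to-position index) and the Lookup behaviour described above can be modelled in plain Go. The sketch mirrors the documented field names but replaces Type and Iter with a plain string value; `element`, `elements` and `newElements` are hypothetical local types, not the package's own.

```go
package main

import "fmt"

// element is a simplified, hypothetical stand-in for simdjson.Element:
// only the name and a string value are kept.
type element struct {
	Name  string
	Value string
}

// elements mirrors the documented shape: members in original order,
// plus an index from key to position in Elements.
type elements struct {
	Elements []element
	Index    map[string]int
}

// newElements builds the ordered slice and the key index together.
func newElements(members ...element) elements {
	e := elements{Elements: members, Index: make(map[string]int, len(members))}
	for i, m := range members {
		e.Index[m.Name] = i
	}
	return e
}

// Lookup returns nil for a missing key. Matching is case sensitive
// because it is a plain map lookup.
func (e elements) Lookup(key string) *element {
	i, ok := e.Index[key]
	if !ok {
		return nil
	}
	return &e.Elements[i]
}

func main() {
	e := newElements(element{"Image", "{...}"}, element{"Alt", "Image of city"})
	fmt.Println(e.Lookup("Alt").Value)  // Image of city
	fmt.Println(e.Lookup("alt") == nil) // true
}
```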

func (Elements) MarshalJSON

func (e Elements) MarshalJSON() ([]byte, error)

MarshalJSON will marshal the entire remaining scope of the iterator.

func (Elements) MarshalJSONBuffer

func (e Elements) MarshalJSONBuffer(dst []byte) ([]byte, error)

MarshalJSONBuffer will marshal all elements. An optional buffer can be provided for fewer allocations. Output will be appended to the destination.

type FloatFlag added in v0.2.1

type FloatFlag uint64

FloatFlag is a flag recorded when parsing floats.

const (
	// FloatOverflowedInteger is set when number in JSON was in integer notation,
	// but under/overflowed both int64 and uint64 and therefore was parsed as float.
	FloatOverflowedInteger FloatFlag = 1 << iota
)

func (FloatFlag) Flags added in v0.2.1

func (f FloatFlag) Flags(more ...FloatFlag) FloatFlags

Flags converts the flag to FloatFlags and optionally merges more flags.

type FloatFlags added in v0.2.1

type FloatFlags uint64

FloatFlags are flags recorded when converting floats.

func (FloatFlags) Contains added in v0.2.1

func (f FloatFlags) Contains(flag FloatFlag) bool

Contains returns whether f contains the specified flag.
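To find out whether a float came from an integer that overflowed both int64 and uint64, a caller would test the FloatFlags returned by Iter.FloatFlags with Contains(FloatOverflowedInteger). The bit-flag pattern behind FloatFlag, Flags and Contains is sketched below with local re-declarations; the method bodies and the second flag `flagExample` are assumptions for illustration, not the library's source.

```go
package main

import "fmt"

// Local copies of the documented types; the bit layout is an assumption.
type FloatFlag uint64
type FloatFlags uint64

const (
	// FloatOverflowedInteger: integer notation that fit neither int64 nor uint64.
	FloatOverflowedInteger FloatFlag = 1 << iota
	// flagExample is hypothetical, added only to demonstrate merging;
	// simdjson-go documents just FloatOverflowedInteger.
	flagExample
)

// Flags converts f to FloatFlags and ORs in any extra flags.
func (f FloatFlag) Flags(more ...FloatFlag) FloatFlags {
	r := FloatFlags(f)
	for _, m := range more {
		r |= FloatFlags(m)
	}
	return r
}

// Contains reports whether the bit for flag is set in f.
func (f FloatFlags) Contains(flag FloatFlag) bool {
	return f&FloatFlags(flag) != 0
}

func main() {
	var none FloatFlags
	set := FloatOverflowedInteger.Flags()
	fmt.Println(none.Contains(FloatOverflowedInteger), set.Contains(FloatOverflowedInteger)) // false true
}
```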

type Iter

type Iter struct {
	// contains filtered or unexported fields
}

Iter represents a section of JSON. To start iterating it, use the Advance() or AdvanceIter() methods, which will queue the first element. If an Iter is copied, the copy will be independent.

func (*Iter) Advance

func (i *Iter) Advance() Type

Advance will read the type of the next element and queue up the value on the same level.

func (*Iter) AdvanceInto

func (i *Iter) AdvanceInto() Tag

AdvanceInto will read the tag of the next element and move into and out of arrays, objects and root elements. This should only be used for strictly manual parsing.

func (*Iter) AdvanceIter

func (i *Iter) AdvanceIter(dst *Iter) (Type, error)

AdvanceIter will read the type of the next element and return an iterator containing only that element. If dst and i are the same, both will contain the value inside.

func (*Iter) Array

func (i *Iter) Array(dst *Array) (*Array, error)

Array will return the next element as an array. An optional destination can be given.

func (*Iter) Bool

func (i *Iter) Bool() (bool, error)

Bool returns the bool value.

func (*Iter) FindElement added in v0.3.0

func (i *Iter) FindElement(dst *Element, path ...string) (*Element, error)

FindElement allows searching for fields and objects by path from the iter and forward, moving into root and objects, but not arrays. For example "Image", "Url" will search the current root/object for an "Image" object and return the value of the "Url" element. ErrPathNotFound is returned if any part of the path cannot be found. If the tape contains an error, it will be returned. The iter will *not* be advanced.

Example
if !SupportedCPU() {
	// Fake it
	fmt.Println("int\n100 <nil>")
	return
}
input := `{
    "Image":
    {
        "Animated": false,
        "Height": 600,
        "IDs":
        [
            116,
            943,
            234,
            38793
        ],
        "Thumbnail":
        {
            "Height": 125,
            "Url": "http://www.example.com/image/481989943",
            "Width": 100
        },
        "Title": "View from 15th Floor",
        "Width": 800
    },
"Alt": "Image of city" }`
pj, err := Parse([]byte(input), nil)
if err != nil {
	log.Fatal(err)
}
i := pj.Iter()

// Find element in path.
elem, err := i.FindElement(nil, "Image", "Thumbnail", "Width")
if err != nil {
	log.Fatal(err)
}

// Print result:
fmt.Println(elem.Type)
fmt.Println(elem.Iter.StringCvt())

Output:

int
100 <nil>

func (*Iter) Float

func (i *Iter) Float() (float64, error)

Float returns the float value of the next element. Integers are automatically converted to float.

func (*Iter) FloatFlags added in v0.2.1

func (i *Iter) FloatFlags() (float64, FloatFlags, error)

FloatFlags returns the float value of the next element. This will include flags from parsing. Integers are automatically converted to float.

func (*Iter) Int

func (i *Iter) Int() (int64, error)

Int returns the integer value of the next element. Integers and floats within range are automatically converted.

func (*Iter) Interface

func (i *Iter) Interface() (interface{}, error)

Interface returns the value as an interface. Objects are returned as map[string]interface{}. Arrays are returned as []interface{}. Float values are returned as float64. Integer values are returned as int64 or uint64. String values are returned as string. Boolean values are returned as bool. Null values are returned as nil. Root objects are returned as []interface{}.
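Code that consumes Iter.Interface() (or Array.Interface and Object.Map) usually needs a type switch that covers exactly the Go types listed above. The sketch below walks a hand-built value tree with those types so it runs without simdjson-go or an AVX2 CPU; `describe` is a hypothetical helper, and the tree merely stands in for a parse result.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// describe renders v using the Go types documented for Iter.Interface().
func describe(v interface{}) string {
	switch t := v.(type) {
	case nil:
		return "null"
	case bool:
		return fmt.Sprintf("bool(%t)", t)
	case string:
		return fmt.Sprintf("string(%q)", t)
	case int64:
		return fmt.Sprintf("int64(%d)", t)
	case uint64:
		return fmt.Sprintf("uint64(%d)", t)
	case float64:
		return fmt.Sprintf("float64(%g)", t)
	case []interface{}:
		parts := make([]string, len(t))
		for i, e := range t {
			parts[i] = describe(e)
		}
		return "[" + strings.Join(parts, ", ") + "]"
	case map[string]interface{}:
		// Sort keys so the output is deterministic; map order is random in Go.
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		parts := make([]string, len(keys))
		for i, k := range keys {
			parts[i] = fmt.Sprintf("%q: %s", k, describe(t[k]))
		}
		return "{" + strings.Join(parts, ", ") + "}"
	default:
		return fmt.Sprintf("unexpected %T", t)
	}
}

func main() {
	v := map[string]interface{}{
		"Height":   int64(600),
		"Animated": false,
		"IDs":      []interface{}{int64(116), uint64(38793)},
	}
	// {"Animated": bool(false), "Height": int64(600), "IDs": [int64(116), uint64(38793)]}
	fmt.Println(describe(v))
}
```

Handling both int64 and uint64 matters: integers are returned as either type, so a switch with only one of them would miss values.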

func (*Iter) MarshalJSON

func (i *Iter) MarshalJSON() ([]byte, error)

MarshalJSON will marshal the entire remaining scope of the iterator.

func (*Iter) MarshalJSONBuffer

func (i *Iter) MarshalJSONBuffer(dst []byte) ([]byte, error)

MarshalJSONBuffer will marshal the remaining scope of the iterator including the current value. An optional buffer can be provided for fewer allocations. Output will be appended to the destination.

func (*Iter) Object

func (i *Iter) Object(dst *Object) (*Object, error)

Object will return the next element as an object. An optional destination can be given.

func (*Iter) PeekNext

func (i *Iter) PeekNext() Type

PeekNext will return the next value type. Returns TypeNone if the next element ends the iterator.

func (*Iter) PeekNextTag

func (i *Iter) PeekNextTag() Tag

PeekNextTag will return the tag at the current offset. Will return TagEnd if at the end of the iterator.

func (*Iter) Root

func (i *Iter) Root(dst *Iter) (Type, *Iter, error)

Root returns the object embedded in root as an iterator, along with the type of the content of the first element of the iterator. An optional destination can be supplied to avoid allocations.

func (*Iter) SetBool added in v0.3.0

func (i *Iter) SetBool(v bool) error

SetBool can change a bool or null type to bool with the specified value. Attempting to change other types will return an error.

func (*Iter) SetFloat added in v0.3.0

func (i *Iter) SetFloat(v float64) error

SetFloat can change a float, int, uint or string to the specified value. Attempting to change other types will return an error.

func (*Iter) SetInt added in v0.3.0

func (i *Iter) SetInt(v int64) error

SetInt can change a float, int, uint or string to the specified value. Attempting to change other types will return an error.

func (*Iter) SetNull added in v0.3.0

func (i *Iter) SetNull() error

SetNull can change the following types to null: Bool, String, (Unsigned) Integer, Float, Objects and Arrays. Attempting to change other types will return an error.

func (*Iter) SetString added in v0.3.0

func (i *Iter) SetString(v string) error

SetString can change a string, int, uint or float to the specified string. Attempting to change other types will return an error.

func (*Iter) SetStringBytes added in v0.3.0

func (i *Iter) SetStringBytes(v []byte) error

SetStringBytes can change a string, int, uint or float to the specified string. Attempting to change other types will return an error. Sending nil will add an empty string.

func (*Iter) SetUInt added in v0.3.0

func (i *Iter) SetUInt(v uint64) error

SetUInt can change a float, int, uint or string to the specified value. Attempting to change other types will return an error.
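The source types accepted by the in-place setters above can be collected into a single lookup table, which is handy when deciding which setter to call. The sketch encodes only what the descriptions state; `allowed` and `canSet` are hypothetical names, types are plain strings rather than simdjson.Type, and the library itself remains the authority (it returns an error for any other combination).

```go
package main

import "fmt"

// allowed maps each setter to the source value types its description accepts.
var allowed = map[string][]string{
	"SetBool":        {"bool", "null"},
	"SetFloat":       {"float", "int", "uint", "string"},
	"SetInt":         {"float", "int", "uint", "string"},
	"SetUInt":        {"float", "int", "uint", "string"},
	"SetString":      {"string", "int", "uint", "float"},
	"SetStringBytes": {"string", "int", "uint", "float"},
	"SetNull":        {"bool", "string", "int", "uint", "float", "object", "array"},
}

// canSet reports whether setter may be applied to a value of type current,
// according to the table above. Unknown setters allow nothing.
func canSet(setter, current string) bool {
	for _, t := range allowed[setter] {
		if t == current {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canSet("SetBool", "null"))   // true
	fmt.Println(canSet("SetInt", "bool"))    // false
	fmt.Println(canSet("SetNull", "object")) // true
}
```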

func (*Iter) String

func (i *Iter) String() (string, error)

String returns a string value.

func (*Iter) StringBytes

func (i *Iter) StringBytes() ([]byte, error)

StringBytes returns a string as a byte slice.

func (*Iter) StringCvt

func (i *Iter) StringCvt() (string, error)

StringCvt returns a string representation of the value. Root, Object and Array values are not supported.

func (*Iter) Type

func (i *Iter) Type() Type

Type returns the queued value type from the previous call to Advance.

func (*Iter) Uint

func (i *Iter) Uint() (uint64, error)

Uint returns the unsigned integer value of the next element. Positive integers and floats within range are automatically converted.

type Object

type Object struct {
	// contains filtered or unexported fields
}

Object represents a JSON object.

func (*Object) DeleteElems added in v0.4.3

func (o *Object) DeleteElems(fn func(key []byte, i Iter) bool, onlyKeys map[string]struct{}) error

DeleteElems will call back fn for each key. If true is returned, the key+value is deleted. A key filter can be provided for optional filtering. If fn is nil, all elements in onlyKeys will be deleted. If both are nil, all elements are deleted.
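Read literally, the fn and onlyKeys rules combine as follows: a member is deleted when its key passes the filter (or no filter is given) and fn returns true (or fn is nil). The sketch below applies that reading to a plain ordered key/value slice. `member` and `deleteElems` are hypothetical, the combination logic is our interpretation of the description rather than code from simdjson-go, and unlike the real method this sketch returns the filtered slice instead of an error.

```go
package main

import "fmt"

// member is a hypothetical stand-in for one object key/value pair.
type member struct {
	Key   string
	Value interface{}
}

// deleteElems removes members in place, preserving the order of the rest.
// A member is deleted if it passes onlyKeys (nil means every key passes)
// and fn returns true for it (nil fn means "delete").
func deleteElems(ms []member, fn func(key []byte, v interface{}) bool, onlyKeys map[string]struct{}) []member {
	kept := ms[:0]
	for _, m := range ms {
		inFilter := true
		if onlyKeys != nil {
			_, inFilter = onlyKeys[m.Key]
		}
		del := inFilter && (fn == nil || fn([]byte(m.Key), m.Value))
		if !del {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	obj := []member{{"Animated", false}, {"Height", int64(600)}, {"Width", int64(800)}}
	// Among Height and Width only, delete integer values greater than 700.
	out := deleteElems(obj, func(_ []byte, v interface{}) bool {
		n, ok := v.(int64)
		return ok && n > 700
	}, map[string]struct{}{"Height": {}, "Width": {}})
	fmt.Println(out) // [{Animated false} {Height 600}]
}
```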

func (*Object) FindKey

func (o *Object) FindKey(key string, dst *Element) *Element

FindKey will return a single named element. An optional destination can be given. The method will return nil if the element cannot be found. This should only be used to locate a single key where the object is no longer needed. The object will not be advanced.

func (*Object) FindPath added in v0.3.0

func (o *Object) FindPath(dst *Element, path ...string) (*Element, error)

FindPath allows searching for fields and objects by path. Pass each object name as a separate path argument. For example, "Image", "Url" will search the current object for an "Image" object and return the value of the "Url" element. ErrPathNotFound is returned if any part of the path cannot be found. If the tape contains an error, it will be returned. The object will not be advanced.

Example
if !SupportedCPU() {
	// Fake it
	fmt.Println("string\nhttp://www.example.com/image/481989943 <nil>")
	return
}
input := `{
    "Image":
    {
        "Animated": false,
        "Height": 600,
        "IDs":
        [
            116,
            943,
            234,
            38793
        ],
        "Thumbnail":
        {
            "Height": 125,
            "Url": "http://www.example.com/image/481989943",
            "Width": 100
        },
        "Title": "View from 15th Floor",
        "Width": 800
    },
"Alt": "Image of city" }`
pj, err := Parse([]byte(input), nil)
if err != nil {
	log.Fatal(err)
}
i := pj.Iter()
i.AdvanceInto()

// Grab root
_, root, err := i.Root(nil)
if err != nil {
	log.Fatal(err)
}

// Grab top object
obj, err := root.Object(nil)
if err != nil {
	log.Fatal(err)
}

// Find element in path.
elem, err := obj.FindPath(nil, "Image", "Thumbnail", "Url")
if err != nil {
	log.Fatal(err)
}

// Print result:
fmt.Println(elem.Type)
fmt.Println(elem.Iter.String())

Output:

string
http://www.example.com/image/481989943 <nil>

func (*Object) ForEach added in v0.4.0

func (o *Object) ForEach(fn func(key []byte, i Iter), onlyKeys map[string]struct{}) error

ForEach will call back fn for each key. A key filter can be provided for optional filtering.

func (*Object) Map

func (o *Object) Map(dst map[string]interface{}) (map[string]interface{}, error)

Map will unmarshal into a map[string]interface{}. See Iter.Interface() for a reference on value types.

func (*Object) NextElement

func (o *Object) NextElement(dst *Iter) (name string, t Type, err error)

NextElement sets dst to the next element and returns the name. TypeNone with nil error will be returned if there are no more elements.

func (*Object) NextElementBytes

func (o *Object) NextElementBytes(dst *Iter) (name []byte, t Type, err error)

NextElementBytes sets dst to the next element and returns the name. TypeNone with nil error will be returned if there are no more elements. Contrary to NextElement, this will not cause allocations.

func (*Object) Parse

func (o *Object) Parse(dst *Elements) (*Elements, error)

Parse will return all elements and iterators. An optional destination can be given. The Object will be consumed.

type ParsedJson

type ParsedJson struct {
	Message []byte
	Tape    []uint64
	Strings *TStrings
	// contains filtered or unexported fields
}

func Parse

func Parse(b []byte, reuse *ParsedJson, opts ...ParserOption) (*ParsedJson, error)

Parse an object or array from a block of data and return the parsed JSON. An optional block of previously parsed JSON can be supplied to reduce allocations.

func ParseND

func ParseND(b []byte, reuse *ParsedJson, opts ...ParserOption) (*ParsedJson, error)

ParseND will parse newline delimited JSON objects or arrays. An optional block of previously parsed JSON can be supplied to reduce allocations.

func (*ParsedJson) Clone added in v0.3.0

func (pj *ParsedJson) Clone(dst *ParsedJson) *ParsedJson

Clone returns a deep clone of the ParsedJson. If dst is nil, a new ParsedJson will be created.

func (*ParsedJson) ForEach added in v0.4.0

func (pj *ParsedJson) ForEach(fn func(i Iter) error) error

ForEach calls fn with each line of NDJSON input, or with the top element for non-NDJSON input. This will usually be an object or an array. If the callback returns a non-nil error, parsing stops and the error is returned.

Example
if !SupportedCPU() {
	// Fake results
	fmt.Println("Got iterator for type: object\nFound element: URL Type: string Value: http://example.com/example.gif")
	return
}

// Parse JSON:
pj, err := Parse([]byte(`{"Image":{"URL":"http://example.com/example.gif"}}`), nil)
if err != nil {
	log.Fatal(err)
}

// Create an element we can reuse.
var element *Element
err = pj.ForEach(func(i Iter) error {
	fmt.Println("Got iterator for type:", i.Type())
	element, err = i.FindElement(element, "Image", "URL")
	if err == nil {
		value, _ := element.Iter.StringCvt()
		fmt.Println("Found element:", element.Name, "Type:", element.Type, "Value:", value)
	}
	return nil
})
if err != nil {
	log.Fatal(err)
}

Output:

Got iterator for type: object
Found element: URL Type: string Value: http://example.com/example.gif

func (*ParsedJson) Iter

func (pj *ParsedJson) Iter() Iter

Iter returns a new Iter.

func (*ParsedJson) Reset

func (pj *ParsedJson) Reset()

type ParserOption added in v0.2.2

type ParserOption func(pj *internalParsedJson) error

ParserOption is a parser option.

func WithCopyStrings added in v0.2.2

func WithCopyStrings(b bool) ParserOption

WithCopyStrings will copy strings so they no longer reference the input. For enhanced performance, simdjson-go can point back into the original JSON buffer for strings; however, this can lead to issues in streaming use cases, or in scenarios in which the underlying JSON buffer is reused. So the default behaviour is to copy all strings (not just those already transformed because of Unicode escape characters) into the separate Strings buffer, at the expense of more memory use and lower performance. Default: true - strings are copied.
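The hazard that WithCopyStrings(true) guards against is ordinary slice aliasing: a string view that still points into the input buffer changes when that buffer is reused for the next document. The standalone sketch below shows the effect with plain byte slices; it does not call simdjson-go, and `reuseDemo` is an illustrative name.

```go
package main

import "fmt"

// reuseDemo returns a zero-copy view and an independent copy of the same
// string, read back after the input buffer has been overwritten.
func reuseDemo() (view, copied string) {
	buf := []byte(`{"Alt":"Image of city"}`)

	// A view into the input, like a string that references the original
	// JSON buffer (bytes 8..20 hold "Image of city").
	v := buf[8:21]
	// An independent copy, like a string stored in a separate buffer.
	c := append([]byte(nil), v...)

	// Reuse the input buffer for the next document (same length here).
	copy(buf, `{"Alt":"Night skyline"}`)

	return string(v), string(c)
}

func main() {
	view, copied := reuseDemo()
	fmt.Println(view)   // Night skyline
	fmt.Println(copied) // Image of city
}
```

Only the copied value survives buffer reuse, which is why copying is the safer default for streaming input.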

type Serializer added in v0.1.4

type Serializer struct {
	// contains filtered or unexported fields
}

Serializer allows serializing parsed JSON and reading it back. A Serializer can be reused, but not used concurrently.

func NewSerializer added in v0.1.4

func NewSerializer() *Serializer

NewSerializer will create and initialize a Serializer.

func (*Serializer) CompressMode added in v0.1.4

func (s *Serializer) CompressMode(c CompressMode)

func (*Serializer) Deserialize added in v0.1.4

func (s *Serializer) Deserialize(src []byte, dst *ParsedJson) (*ParsedJson, error)

Deserialize the content in src. Only basic sanity checks will be performed. Slight corruption will likely go through unnoticed. An optional destination can be provided.

func (*Serializer) Serialize added in v0.1.4

func (s *Serializer) Serialize(dst []byte, pj ParsedJson) []byte

Serialize the data in pj and return the data. An optional destination can be provided.

type Stream

type Stream struct {
	Value *ParsedJson
	Error error
}

A Stream is used to stream back results. Either Error or Value will be set on returned results.

type TStrings added in v0.3.0

type TStrings struct {
	B []byte
}

type Tag

type Tag uint8

Tag indicates the data type of a tape entry.

func (Tag) String

func (t Tag) String() string

func (Tag) Type

func (t Tag) Type() Type

Type converts a tag to a type. Only basic types and array+object start match a type.

type Type

type Type uint8

Type is a JSON value type.

const (
	TypeNone Type = iota
	TypeNull
	TypeString
	TypeInt
	TypeUint
	TypeFloat
	TypeBool
	TypeObject
	TypeArray
	TypeRoot
)

func (Type) String

func (t Type) String() string

String returns the type as a string.


Directories

examples/openrtb (command)
