VelocyPack (VPack) - a fast and compact format for serialization and storage

Motivation

These days, JSON (JavaScript Object Notation, see ECMA-404) is used in many cases where data has to be exchanged. Lots of protocols between different services use it, and databases store JSON (document stores naturally, but others increasingly as well). It is popular because it is simple, human-readable, and yet surprisingly versatile, despite its limitations.

At the same time there is a plethora of alternatives, ranging from XML through Universal Binary JSON, MongoDB's BSON, MessagePack, BJSON (binary JSON) and Apache Thrift to Google's Protocol Buffers and ArangoDB's shaped JSON.

When looking into this, we were surprised to find that none of these formats manages to combine compactness, platform independence, fast access to sub-objects and rapid conversion from and to JSON.

We have invented VPack because we need a binary format that

is self-contained and schemaless
is compact
is largely platform independent (see Portability)
covers all of JSON plus dates, integers, binary data and arbitrary-precision numbers
can be used in a database kernel to access sub-documents, for example for indexes, so it must be possible to access sub-documents (array and object members) efficiently
can be converted to and from JSON rapidly
avoids too many memory allocations
gives flexibility to assemble objects, such that sub-objects reside in the database in an unchanged way
allows using an external table for frequently used attribute names
allows the type and length of a given object to be read quickly from its first byte(s)
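
As an illustration of the last requirement, the following is a minimal sketch of how a reader might derive a value's type and total byte length from its head byte alone. It covers only a handful of head byte values, taken from our reading of VelocyPack.md (which remains the authoritative reference), and the names `HeadInfo` and `inspectHead` are hypothetical helpers, not part of the VPack library API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper type: what can be learned from the head byte alone.
struct HeadInfo {
  std::string type;        // human-readable type name
  std::size_t byteLength;  // total encoded length, including the head byte
};

// Sketch only: handles a small subset of head bytes as we understand them
// from VelocyPack.md. Returns std::nullopt for everything not covered here.
std::optional<HeadInfo> inspectHead(std::uint8_t head) {
  if (head == 0x18) {
    return HeadInfo{"null", 1};
  }
  if (head == 0x19 || head == 0x1a) {
    return HeadInfo{"bool", 1};
  }
  if (head == 0x1b) {
    // Head byte followed by an 8-byte IEEE 754 double.
    return HeadInfo{"double", 9};
  }
  if (head >= 0x30 && head <= 0x39) {
    // Small integers 0..9 are encoded entirely in the head byte.
    return HeadInfo{"small int", 1};
  }
  if (head >= 0x40 && head <= 0xbe) {
    // Short UTF-8 string: the payload length is encoded in the head byte.
    return HeadInfo{"string",
                    std::size_t{1} + static_cast<std::size_t>(head - 0x40)};
  }
  return std::nullopt;  // not covered by this sketch
}
```

For example, under these assumptions a value starting with the byte `0x43` would be reported as a string occupying 4 bytes in total (one head byte plus three bytes of payload), without looking at any byte after the first.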

All this gives us the possibility to use the same byte sequence of data for transport, storage and (read-only) work. Using a single data format not only eliminates a lot of conversions but can also reduce runtime memory usage, as data needs only a single in-memory representation.

The other popular formats we looked at all have some deficiency with respect to the above list. To name but a few:

JSON itself lacks some data types (dates and binary data) and does not provide quick sub-value access without parsing. Parsing JSON is also quite a challenge performance-wise
XML is not compact and is not good with binary data; it also lacks quick sub-value access
BSON gets quite a lot right with respect to data types, but is seriously lacking w.r.t. sub-value access. Furthermore, it is not very compact and quite wasteful space-wise when storing array values
Apache Thrift and Google's Protocol Buffers are not schemaless and self-contained. Their transport format is a serialization that is not good for rapid sub-value access
MessagePack is probably the closest to our shopping list. It has decent data types and is quite compact. However, we found that one can do better in terms of compactness for some cases. More important for us, MessagePack provides no quick sub-value access
Our own shaped JSON (used in ArangoDB as internal storage format) has very quick sub-value access, but the shape data is kept outside the actual data, so the shaped values are not self-contained. Furthermore, we have run into scalability issues on multi-core because of the shared data structures used for interpretation of the values

Any new data format must be backed by C++ classes to allow

easy and fast parsing from JSON
easy and convenient buildup without too many memory allocations
fast access to data and its sub-objects (for arrays and objects)
flexible memory management
fast dumping to JSON

The VelocyPack format is an attempt to achieve all this.

This repository contains a C++ library for building, manipulating and serializing VPack data. It is the reference implementation for the VelocyPack format. The library is written in C++20 so it should compile on many up-to-date systems.

The VelocyPack format and library are used extensively in the ArangoDB database.

Specification

See the file VelocyPack.md for a detailed description of the VPack format.

Performance

See the file Performance.md for a thorough comparison to other formats like JSON itself, MessagePack and BSON. We look at file sizes as well as parsing and conversion performance.

Building the VPack library

The VPack library can be built on Linux, macOS and Windows. It will likely compile and work on other platforms for which a recent version of cmake and a working C++20-enabled compiler are available.

See the file Install.md for compilation and installation instructions.

Using the VPack library

Please consult the file examples/API.md for usage examples, and the file examples/Embedding.md for information about how to embed the library into client applications.

Testing and validating with fuzzer

The fuzzer tool can be used to generate random VPack or JSON structures and validate them. The tool can be run with multiple iterations and with parallelism, and a seed can be provided for the random generation. Please consult the file tools/README.md for usage information.

Contributing

We welcome bug fixes and patches from 3rd party contributors!

Please follow the guidelines in CONTRIBUTING.md if you want to contribute to VelocyPack. Have a look for the tag help wanted in the issue tracker!

We also provide a Go version of VPack in the go-velocypack repository and a Java version in the java-velocypack repository.

Additionally, there is a third-party VPack implementation for PHP.
