jl2922/hps

A C++11 High Performance Serialization Library.

Overview

HPS is a high-performance, header-only C++11 library for data serialization. It efficiently encodes structured data or objects into a flat, compressed format, so that they can be sent over the network or written to the file system faster, or stored more compactly in memory.

It delivers state-of-the-art performance and outperforms the well-known serialization libraries it was benchmarked against. For example, compared to Boost Serialization, HPS is up to 150% faster and uses up to 40% fewer bytes for several common data structures. See the Benchmark section for details.

In addition, it requires the least human effort to use. There is no need to write a separate schema file or use special data structures; HPS works with STL containers and user-defined types directly. This design keeps the data and their serialization methods together and often gives a much cleaner design, especially when classes are related by composition or inheritance.

HPS is actively used in a quantum chemistry package (developed by the Umrigar Group at the Cornell University LASSP Lab), where it has successfully and efficiently serialized and parsed petabytes of scientific data and, thanks to its compact encoding scheme, reduced network traffic by petabytes.

Citation

@article{li2018hps,
  title={HPS: A C++ 11 High Performance Serialization Library},
  author={Li, Junhao},
  journal={arXiv preprint arXiv:1811.04556},
  year={2018}
}

Installation

Not needed! HPS is a header-only library. Simply include the hps.h file, which includes all the other headers.

Benchmark

The performance of HPS compared to other well-known C++ serializers on some of the most common data structures is as follows (less is better):

[Figure: Serialize and Parse Time]

[Figure: Serialized Message Size]

The test code is in the benchmark directory. You can follow the continuous integration script in ci.sh to install the libraries and reproduce these results.

The sparse matrix is stored as a list of rows, each of which contains a list of 64-bit integers for the column indices and a list of doubles for the values. The hash map is a map from strings to doubles. Both HPS and Boost can serialize std::unordered_map directly, ProtoBuf uses its own Map type, and CapnProto does not support hash maps or similar types.
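As a rough sketch, the two benchmark data structures described above could be declared as follows. The names SparseRow, SparseMatrix, HashMap, and count_nonzeros are hypothetical and chosen only for illustration; the actual benchmark sources may lay the data out differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names; the benchmark sources may differ.
struct SparseRow {
  std::vector<std::int64_t> cols;  // column indices of the stored entries
  std::vector<double> values;      // values[i] belongs to column cols[i]
};

using SparseMatrix = std::vector<SparseRow>;               // a list of rows
using HashMap = std::unordered_map<std::string, double>;   // strings to doubles

// Counts the stored entries across all rows.
std::size_t count_nonzeros(const SparseMatrix& m) {
  std::size_t n = 0;
  for (const SparseRow& row : m) n += row.values.size();
  return n;
}
```

Because both types are built only from STL containers and primitive types, they need no schema or wrapper type of their own.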

In addition to the traditional benchmarks of computational cost, we also report the human effort, measured in source lines of code (SLOC), for these test cases (less is better):

SLOC        double array   sparse matrix   hash map   fixed cost
protobuf    12             23              12         17
capnproto   15             25              -          21
boost       13             20              13         13
hps         7              16              7          2

Note: fixed cost includes the estimated number of command lines an experienced user needs to install the library and set the environment variables, plus the extra lines of code needed in the Makefile, the various includes, etc.

Usage

HPS is super easy to use. For primitive types and most STL containers, serialization requires only one line of code.

#include <cassert>
#include <iostream>
#include <string>
#include <vector>
#include "../src/hps.h"

int main() {
  std::vector<int> data({22, 333, -4444});

  // Serialize to a string, then parse it back.
  std::string serialized = hps::to_string(data);
  auto parsed = hps::from_string<std::vector<int>>(serialized);
  assert(parsed == data);

  std::cout << "size (B): " << serialized.size() << std::endl;  // size (B): 6
  return 0;
}
// Compile with C++11 or above.

There are also the to_stream and from_stream functions for writing the data to or reading it from file streams. For example:

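// Requires <fstream>; `data` is the std::vector<int> from the previous example.
std::ofstream out_file("data.log", std::ofstream::binary);
hps::to_stream(data, out_file);
out_file.close();  // flush to disk before reading the file back

std::ifstream in_file("data.log", std::ifstream::binary);
auto parsed = hps::from_stream<std::vector<int>>(in_file);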

The API Reference section near the end of this document lists all the functions that HPS provides.

We can also extend HPS to support custom types. HPS internally uses static polymorphism on the class Serializer<DataType, BufferType> to support different types. By default, Serializer<DataType, BufferType> calls the serialize and parse methods of the corresponding type. All we need to do is either provide serialize and parse methods for the new type or specialize the Serializer class, and HPS will support it, together with any combination of this type with STL containers and other specialized types.

The following example shows the serialization of a custom quantum system object by providing its serialize and parse methods.

#include <cassert>
#include <iostream>
#include <string>
#include <unordered_set>
#include "../src/hps.h"

class QuantumState {
 public:
  unsigned n_elecs;
  std::unordered_set<unsigned> orbs_from;
  std::unordered_set<unsigned> orbs_to;

  // Write the fields to the buffer in a fixed order.
  template <class B>
  void serialize(B& buf) const {
    buf << n_elecs << orbs_from << orbs_to;
  }

  // Read the fields back in the same order.
  template <class B>
  void parse(B& buf) {
    buf >> n_elecs >> orbs_from >> orbs_to;
  }
};

int main() {
  QuantumState qs;
  qs.n_elecs = 33;
  qs.orbs_from.insert({11, 22});
  qs.orbs_to.insert({44, 66});

  std::string serialized = hps::to_string(qs);
  std::cout << "size (B): " << serialized.size() << std::endl;  // size (B): 7
  return 0;
}
// Compile with C++11 or above.

For examples of extending HPS by specializing the Serializer class, you can check our source code for primitive types and STL containers, such as float_serializer.h, where we specialize the Serializer for all the floating point numbers (using SFINAE).
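The dispatch idea can be sketched independently of the HPS headers. The code below is not HPS's implementation: the namespace sketch, the helper write, the type Point, and the fixed-width little-endian byte layout are all assumptions made for illustration. It only shows how a primary template can forward to a type's own serialize method while a SFINAE-selected specialization handles a family of built-in types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

namespace sketch {  // illustrative only; not HPS's actual code

// Primary template: by default, forward to the type's own serialize() method.
template <class T, class B, class Enable = void>
struct Serializer {
  static void serialize(const T& t, B& buf) { t.serialize(buf); }
};

// Specialization selected via SFINAE for every unsigned integer type.
// Assumption: fixed-width little-endian bytes, chosen here for simplicity.
template <class T, class B>
struct Serializer<T, B,
                  typename std::enable_if<std::is_unsigned<T>::value>::type> {
  static void serialize(const T& t, B& buf) {
    for (std::size_t i = 0; i < sizeof(T); ++i) {
      buf.push_back(static_cast<char>((t >> (8 * i)) & 0xFF));
    }
  }
};

// Picks the right Serializer at compile time; no virtual calls are involved.
template <class T, class B>
void write(const T& t, B& buf) {
  Serializer<T, B>::serialize(t, buf);
}

// A user-defined type only has to provide serialize().
struct Point {
  std::uint16_t x, y;
  template <class B>
  void serialize(B& buf) const {
    write(x, buf);
    write(y, buf);
  }
};

}  // namespace sketch
```

With a std::string as the buffer, sketch::write(sketch::Point{1, 2}, buf) appends four bytes: two for each 16-bit coordinate.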

Encoding Scheme

The encoding scheme of HPS is very similar to Google's protobuf. Google provides an extremely detailed explanation of it.

The major difference between protobuf's encoding scheme and HPS' is that HPS does not store field numbers or wire types. This gives HPS a significant advantage in both the speed and the size of the serialized messages over protobuf, especially when there are many fields and nested structures.
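To make the idea concrete, here is a minimal sketch of a protobuf-style encoding with the tags left out. It uses the base-128 varint and ZigZag mappings from protobuf's encoding documentation; whether HPS uses exactly this byte layout is an assumption, and the function names put_varint, zigzag, and encode_ints are hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Base-128 varint: 7 payload bits per byte, lowest group first; the high bit
// is set on every byte except the last.
void put_varint(std::uint64_t v, std::string& out) {
  while (v >= 0x80) {
    out.push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out.push_back(static_cast<char>(v));
}

// ZigZag maps signed to unsigned so that small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
std::uint64_t zigzag(std::int64_t n) {
  const std::uint64_t sign_mask = n < 0 ? ~std::uint64_t(0) : std::uint64_t(0);
  return (static_cast<std::uint64_t>(n) << 1) ^ sign_mask;
}

// Untagged sequence: a length prefix followed by the elements, with no field
// numbers or wire types in between.
std::string encode_ints(const std::vector<std::int64_t>& xs) {
  std::string out;
  put_varint(xs.size(), out);
  for (std::int64_t x : xs) put_varint(zigzag(x), out);
  return out;
}
```

Under these assumptions, the length prefix plus the values 22, 333 and -4444 take 1 + 1 + 2 + 2 = 6 bytes, the same total that the first usage example reports, although that agreement alone does not establish that HPS uses this exact layout. Protobuf would additionally prefix each field with a tag that combines the field number and wire type, which costs at least one more byte per field.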

API Reference

// Serialize data t to an STL ostream.
void to_stream(const T& t, std::ostream& stream);

// Parse from an STL istream and save to the data t passed in.
// Recommended for repeated use inside a loop.
void from_stream(T& t, std::istream& stream);

// Parse from an STL istream and return the data.
T from_stream<T>(std::istream& stream);

// Serialize data t to the STL string passed in.
// Recommended for repeated use inside a loop.
void to_string(const T& t, std::string& str);

// Serialize data t to an STL string and return it.
std::string to_string(const T& t);

// Parse from an STL string and save to the data t passed in.
// Recommended for repeated use inside a loop.
void from_string(T& t, const std::string& str);

// Parse from an STL string and return the data.
T from_string<T>(const std::string& str);

// Parse from a char array and save to the data t passed in.
// Recommended for repeated use inside a loop.
void from_char_array(T& t, const char* arr);

// Parse from a char array and return the data.
T from_char_array<T>(const char* arr);

HPS supports the following types and any combinations of them out of the box:

  • All primitive numeric types, i.e., any T for which std::is_arithmetic<T> holds, e.g. int, double, bool, char, uint8_t, size_t, ...
  • STL containers and utilities: string, array, deque, list, map, unordered_map, set, unordered_set, pair, vector, unique_ptr.

Tips for Heterogeneous Data

Heterogeneous data here refers to messages that contain data structures that occur repeatedly but have some fields missing irregularly.

There is no panacea for achieving the best performance for this type of data in all cases.

Protobuf uses an additional integer to indicate the existence of each field, which is best suited to cases where many fields are missing.

Another possible encoding scheme is a bit representation, i.e., using a bit vector to indicate which fields exist. This is best suited to cases where fields are missing less often. Note that there is no need to deal with bit operations manually: an STL vector<bool> will use a compact format automatically.
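A minimal sketch of the bit-vector scheme, assuming a record made of a fixed number of double-valued fields. The names Packed, pack, and unpack are hypothetical, and the sketch only builds the in-memory representation that would then be handed to the serializer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record layout: one presence bit per field plus dense values.
struct Packed {
  std::vector<bool> present;   // present[i] is true if field i exists
  std::vector<double> values;  // only the existing fields, in field order
};

// fields[i] is read only when has[i] is true.
Packed pack(const std::vector<double>& fields, const std::vector<bool>& has) {
  assert(fields.size() == has.size());
  Packed p;
  p.present = has;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (has[i]) p.values.push_back(fields[i]);
  }
  return p;
}

// Rebuilds the full field list, filling the missing fields with `fill`.
std::vector<double> unpack(const Packed& p, double fill) {
  std::vector<double> fields(p.present.size(), fill);
  std::size_t next = 0;
  for (std::size_t i = 0; i < p.present.size(); ++i) {
    if (p.present[i]) fields[i] = p.values[next++];
  }
  return fields;
}
```

The presence mask costs one bit per field whether or not the field exists, so its overhead is constant per record and does not grow with the number of fields that are present.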

For cases where most of the fields are present, the reverse of protobuf's scheme is the best choice, i.e., using a vector to store the indices of the missing fields.
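The trade-off between the three schemes can be illustrated with a deliberately rough cost model. It assumes a record with at most 15 fields, so that every protobuf-style tag and every stored index fits in one byte, and it charges one extra byte for the length of the missing-index list. The names Scheme and cheapest are hypothetical, and real overheads depend on the actual field numbers and encodings.

```cpp
#include <cstddef>

enum class Scheme { Tags, BitVector, MissingList };

// Estimated per-record overhead in bytes for n fields of which k are present.
// Assumes n <= 15 so that each tag or index occupies a single byte.
Scheme cheapest(std::size_t n, std::size_t k) {
  const std::size_t tags = k;               // one tag per present field
  const std::size_t bits = (n + 7) / 8;     // one bit per field, rounded up
  const std::size_t missing = 1 + (n - k);  // list length + one index each
  if (tags <= bits && tags <= missing) return Scheme::Tags;
  if (bits <= missing) return Scheme::BitVector;
  return Scheme::MissingList;
}
```

For a 12-field record, this model picks tags when only 1 field is present (1 byte versus 2 and 12), the bit vector when 6 are present (2 bytes versus 6 and 7), and the missing-index list when all 12 are present (1 byte versus 12 and 2), which matches the guidance above.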

