qs2


qs2: a framework for efficient serialization

qs2 is the successor to the qs package. The goal is to have reliable and fast performance for saving and loading objects in R.

The qs2 format directly uses R serialization (via the R_Serialize/R_Unserialize C API) while improving underlying compression and disk IO patterns. If you are familiar with the qs package, the benefits and usage are the same.

qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2")
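As a quick round-trip check of the two calls above, the following minimal sketch saves a built-in dataset to a temporary file and reads it back. The choice of `mtcars` and the variable names are illustrative only.

```r
library(qs2)

# Write to a temporary file so the working directory is left untouched
path <- tempfile(fileext = ".qs2")

# Save a built-in data frame, then load it again
qs_save(mtcars, path)
restored <- qs_read(path)

# The restored object should match the original exactly
stopifnot(identical(mtcars, restored))
```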

Use the file extension qs2 to distinguish it from the original qs package. It is not compatible with the original qs format.

Installation

install.packages("qs2")

On x64 Mac or Linux, you can enable multi-threading by compiling from source. It is enabled by default on Windows.

remotes::install_cran("qs2", type = "source", configure.args = "--with-TBB --with-simd=AVX2")

On non-x64 systems (e.g. Mac ARM) remove the AVX2 flag.

remotes::install_cran("qs2", type = "source", configure.args = "--with-TBB")

Multi-threading in qs2 uses the Intel Threading Building Blocks (TBB) framework via the RcppParallel package.

Converting qs2 to RDS

Because the qs2 format directly uses R serialization, you can convert it to RDS and vice versa.

file_qs2 <- tempfile(fileext = ".qs2")
file_rds <- tempfile(fileext = ".RDS")
x <- runif(1e6)

# save `x` with qs_save
qs_save(x, file_qs2)

# convert the file to RDS
qs_to_rds(input_file = file_qs2, output_file = file_rds)

# read `x` back in with `readRDS`
xrds <- readRDS(file_rds)
stopifnot(identical(x, xrds))
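The reverse direction can be sketched the same way. This assumes the package's `rds_to_qs` function takes the input and output paths as its first two arguments, mirroring `qs_to_rds`. Checksum validation is left off when reading the converted file, on the assumption that a converted file may not carry a usable checksum. Check the package reference before relying on either detail.

```r
library(qs2)

rds_in  <- tempfile(fileext = ".RDS")
qs2_out <- tempfile(fileext = ".qs2")
x <- runif(1e5)

# save `x` with base R
saveRDS(x, rds_in)

# convert the RDS file to qs2 (assumed argument order: input, output)
rds_to_qs(rds_in, qs2_out)

# read `x` back in with qs_read, without checksum validation
xqs <- qs_read(qs2_out, validate_checksum = FALSE)
stopifnot(identical(x, xqs))
```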

Validating file integrity

The qs2 format saves an internal checksum. This can be used to test for file corruption before deserialization via the validate_checksum parameter, but has a minor performance penalty.

qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2", validate_checksum = TRUE)
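In a script that loads many files, it can be convenient to turn a failed read into a recoverable condition. The sketch below assumes that a failed validation (or any other read failure) surfaces as an ordinary R error. `read_or_null` is a hypothetical helper written for this example, not part of qs2.

```r
library(qs2)

# Hypothetical helper (not part of qs2): return NULL instead of stopping
# when the file cannot be read or does not pass validation.
read_or_null <- function(path) {
  tryCatch(
    qs_read(path, validate_checksum = TRUE),
    error = function(e) {
      message("could not read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# An intact file is read normally
path <- tempfile(fileext = ".qs2")
qs_save(letters, path)
restored <- read_or_null(path)
stopifnot(identical(restored, letters))

# A path that does not exist falls through to the error handler
missing_result <- read_or_null(tempfile(fileext = ".qs2"))
```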

The qdata format

The package also introduces the qdata format, which has its own serialization layout and works only with data types (vectors, lists, data frames, matrices).

It will replace internal types (functions, promises, external pointers, environments, objects) with NULL. The qdata format differs from the qs2 format in that it is NOT a general-purpose serialization format.

The eventual goal of qdata is to also have interoperability with other languages, particularly Python.

qd_save(data, "myfile.qs2")
data <- qd_read("myfile.qs2")
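To illustrate the difference, the sketch below round-trips a list of plain data and then a list that also contains a function. The object names are illustrative. The behaviour in the second half is what the description above leads one to expect and has not been verified against a specific package version.

```r
library(qs2)

path <- tempfile()

# Plain data (vectors, a matrix, a named list) should survive unchanged
data <- list(
  ids    = 1:5,
  labels = c("a", "b", "c", "d", "e"),
  m      = matrix(runif(6), nrow = 2)
)
qd_save(data, path)
restored <- qd_read(path)
stopifnot(identical(data, restored))

# A function is not a data type, so per the description above it is
# expected to come back as NULL (possibly with a warning when saving).
mixed <- list(values = c(1.5, 2.5), f = function(x) x + 1)
suppressWarnings(qd_save(mixed, path))
str(qd_read(path))
```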

Benchmarks

A summary across 4 datasets is presented below.

Single-threaded

Algorithm         Compression   Save Time (s)   Read Time (s)
qs2               7.96          13.4            50.4
qdata             8.45          10.5            34.8
base::serialize   1.1           8.87            51.4
saveRDS           8.68          107             63.7
fst               2.59          5.09            46.3
parquet           8.29          20.3            38.4
qs (legacy)       7.97          9.13            48.1

Multi-threaded (8 threads)

Algorithm         Compression   Save Time (s)   Read Time (s)
qs2               7.96          3.79            48.1
qdata             8.45          1.98            33.1
fst               2.59          5.05            46.6
parquet           8.29          20.2            37.0
qs (legacy)       7.97          3.21            52.0

Datasets used

These datasets are openly licensed and represent a combination of numeric and text data across multiple domains. See inst/analysis/datasets.R on GitHub.

Usage in C/C++

Serialization functions can be accessed in compiled code. Below is an example using Rcpp.

// [[Rcpp::depends(qs2)]]
#include <Rcpp.h>
#include "qs2_external.h"
using namespace Rcpp;

// [[Rcpp::export]]
SEXP test_qs_serialize(SEXP x) {
  size_t len = 0;
  unsigned char * buffer = c_qs_serialize(x, &len, 10, true, 4); // object, buffer length, compress_level, shuffle, nthreads
  SEXP y = c_qs_deserialize(buffer, len, false, 4); // buffer, buffer length, validate_checksum, nthreads
  c_qs_free(buffer); // must manually free buffer
  return y;
}

// [[Rcpp::export]]
SEXP test_qd_serialize(SEXP x) {
  size_t len = 0;
  unsigned char * buffer = c_qd_serialize(x, &len, 10, true, 4); // object, buffer length, compress_level, shuffle, nthreads
  SEXP y = c_qd_deserialize(buffer, len, false, false, 4); // buffer, buffer length, use_alt_rep, validate_checksum, nthreads
  c_qd_free(buffer); // must manually free buffer
  return y;
}

/*** R
x <- runif(1e7)
stopifnot(test_qs_serialize(x) == x)
stopifnot(test_qd_serialize(x) == x)
*/

Global Options for qs2

The following global options control the behavior of the qs2 functions. These global options can be queried or modified using the qopt function.
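A minimal sketch of querying and changing an option follows. It assumes that `qopt` takes the option name as a string, returns the current value when called without `value`, and sets it when `value` is supplied. It also assumes `"compress_level"` is one of the available option names. Consult the package reference for the authoritative list.

```r
library(qs2)

# Query the current default (assumed option name: "compress_level")
old_level <- qopt("compress_level")

# Change the default used by later calls, then confirm it took effect
qopt("compress_level", value = 5)
new_level <- qopt("compress_level")
stopifnot(new_level == 5)

# Restore the previous value so the session is left as it was
qopt("compress_level", value = old_level)
```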


