
Write Parquet file to disk

Source: R/parquet.R
write_parquet.Rd

Parquet is a columnar storage file format. This function enables you to write Parquet files from R.

Usage

write_parquet(
  x,
  sink,
  chunk_size = NULL,
  version = "2.4",
  compression = default_parquet_compression(),
  compression_level = NULL,
  use_dictionary = NULL,
  write_statistics = NULL,
  data_page_size = NULL,
  use_deprecated_int96_timestamps = FALSE,
  coerce_timestamps = NULL,
  allow_truncated_timestamps = FALSE
)

Arguments

x

data.frame, RecordBatch, or Table

sink

A string file path, connection, URI, or OutputStream, or path in a filesystem (SubTreeFileSystem)

chunk_size

how many rows of data to write to disk at once. This directly corresponds to how many rows will be in each row group in Parquet. If NULL, a best guess will be made for optimal size (based on the number of columns and number of rows), though if the data has fewer than 250 million cells (rows x cols), then the total number of rows is used.
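As a minimal sketch of the relationship between chunk_size and row groups, the snippet below writes 25 rows with chunk_size = 10 and then inspects the file with ParquetFileReader. The data and the temporary file name are illustrative assumptions; with these values the file should contain three row groups (10, 10, and 5 rows).

```r
library(arrow)

# Illustrative data: 25 rows in a single integer column
tf <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:25), tf, chunk_size = 10)

# Open the file with the lower-level reader and count its row groups
reader <- ParquetFileReader$create(tf)
n_groups <- reader$num_row_groups
n_groups
```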

version

parquet version: "1.0", "2.4" (default), "2.6", or "latest" (currently equivalent to 2.6). Numeric values are coerced to character.

compression

compression algorithm. Default "snappy". See details.

compression_level

compression level. Meaning depends on compression algorithm.

use_dictionary

logical: use dictionary encoding? Default TRUE

write_statistics

logical: include statistics? Default TRUE

data_page_size

Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Default 1 MiB.

use_deprecated_int96_timestamps

logical: write timestamps to INT96 Parquet format, which has been deprecated? Default FALSE.

coerce_timestamps

Cast timestamps to a particular resolution. Can be NULL, "ms" or "us". Default NULL (no casting)

allow_truncated_timestamps

logical: Allow loss of data when coercing timestamps to a particular resolution. E.g. if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception. Default FALSE.
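The timestamp arguments can be sketched as follows. The example assumes a POSIXct column with sub-millisecond precision (the value and the column name ts are made up for illustration) and coerces it to millisecond resolution, explicitly allowing the truncation that this implies.

```r
library(arrow)

# Illustrative timestamp carrying microsecond precision
df <- data.frame(ts = as.POSIXct("2024-01-01 12:00:00.123456", tz = "UTC"))

tf <- tempfile(fileext = ".parquet")

# Coerce to milliseconds; allow_truncated_timestamps = TRUE permits the
# loss of the sub-millisecond part instead of raising an error
write_parquet(
  df, tf,
  coerce_timestamps = "ms",
  allow_truncated_timestamps = TRUE
)

back <- read_parquet(tf)
```

After reading the file back, the timestamp should agree with the original to within one millisecond.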

Value

the input x invisibly.

Details

Due to features of the format, Parquet files cannot be appended to. If you want to use the Parquet format but also want the ability to extend your dataset, you can write to additional Parquet files and then treat the whole directory of files as a Dataset you can query. See the dataset article for examples of this.
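A minimal sketch of that pattern, assuming the arrow build includes dataset support: each batch goes to its own file in one directory, and open_dataset() then treats the directory as a single Dataset. The directory and file names here are arbitrary choices for illustration.

```r
library(arrow)

# Hypothetical directory that collects the individual Parquet files
dataset_dir <- tempfile("parquet_dataset_")
dir.create(dataset_dir)

# "Extend" the dataset by writing a new file rather than appending
write_parquet(data.frame(x = 1:5), file.path(dataset_dir, "part-1.parquet"))
write_parquet(data.frame(x = 6:10), file.path(dataset_dir, "part-2.parquet"))

# Open the whole directory as one Dataset and count its rows
ds <- open_dataset(dataset_dir)
n_rows <- ds$num_rows
n_rows
```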

The parameters compression, compression_level, use_dictionary and write_statistics support various patterns:

  • The default NULL leaves the parameter unspecified, and the C++ library uses an appropriate default for each column (defaults listed above)

  • A single, unnamed, value (e.g. a single string for compression) applies to all columns

  • An unnamed vector, of the same size as the number of columns, to specify a value for each column, in positional order

  • A named vector, to specify the value for the named columns, the default value for the setting is used when not supplied
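The patterns listed above can be sketched with use_dictionary and compression. The data frame and its column names (id, label) are assumptions made for illustration; "uncompressed" is used because it is the only codec guaranteed to be available.

```r
library(arrow)

# Illustrative two-column data frame
df <- data.frame(id = 1:5, label = c("a", "b", "a", "b", "a"))
tf <- tempfile(fileext = ".parquet")

# Single unnamed value: applies to all columns
write_parquet(df, tf, compression = "uncompressed")

# Unnamed vector, one value per column, in positional order (id, label)
write_parquet(df, tf, use_dictionary = c(FALSE, TRUE))

# Named vector: sets the value for `label` only; `id` keeps the default
write_parquet(df, tf, use_dictionary = c(label = TRUE))

# The data itself is unaffected by these encoding choices
back <- as.data.frame(read_parquet(tf))
```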

The compression argument can be any of the following (case-insensitive): "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2". Only "uncompressed" is guaranteed to be available, but "snappy" and "gzip" are almost always included. See codec_is_available(). The default "snappy" is used if available, otherwise "uncompressed". To disable compression, set compression = "uncompressed". Note that "uncompressed" columns may still have dictionary encoding.
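Because codec availability depends on how the library was built, one possible approach is to check for a preferred codec and fall back to "uncompressed". The choice of "zstd" as the preferred codec below is only an example.

```r
library(arrow)

# Prefer zstd when this build supports it; otherwise write uncompressed
codec <- if (codec_is_available("zstd")) "zstd" else "uncompressed"

tf <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf, compression = codec)

back <- read_parquet(tf)
```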

See also

ParquetFileWriter for a lower-level interface to Parquet writing.

Examples

tf1 <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf1)

# using compression
if (codec_is_available("gzip")) {
  tf2 <- tempfile(fileext = ".gz.parquet")
  write_parquet(data.frame(x = 1:5), tf2, compression = "gzip", compression_level = 5)
}
