Dataset file formats

Source: R/dataset-format.R

FileFormat.Rd

A FileFormat holds information about how to read and parse the files included in a Dataset. There are subclasses corresponding to the supported file formats (ParquetFileFormat and IpcFileFormat).

Factory

FileFormat$create() takes the following arguments:

format: A string identifier of the file format. Currently supported values:
- "parquet"
- "ipc"/"arrow"/"feather", all aliases for each other; for Feather, note that only version 2 files are supported
- "csv"/"text", aliases for the same thing (because comma is the default delimiter for text files)
- "tsv", equivalent to passing format = "text", delimiter = "\t"
...: Additional format-specific options
format = "parquet":
- dict_columns: Names of columns which should be read as dictionaries.
- Any Parquet options from FragmentScanOptions.
format = "text": see CsvParseOptions. Note that you can specify them either with the Arrow C++ library naming ("delimiter", "quoting", etc.) or the readr-style naming used in read_csv_arrow() ("delim", "quote", etc.). Not all readr options are currently supported; please file an issue if you encounter one that arrow should support.
Also, the following options are supported. From CsvReadOptions:
- skip_rows
- column_names. Note that if a Schema is specified, column_names must match those specified in the schema.
- autogenerate_column_names
From CsvFragmentScanOptions (these values can be overridden at scan time):
- convert_options: a CsvConvertOptions
- block_size
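
As a minimal sketch of the naming options described above, the snippet below builds what should be the same tab-delimited format three ways: with the "tsv" identifier, with the Arrow C++ name delimiter, and with the readr-style name delim. The temporary directory, the file name file1.tsv, and the variable names are illustrative assumptions, not part of the package.

```r
library(arrow)

# Illustrative setup: write one tab-delimited file into a temporary directory
tf <- tempfile()
dir.create(tf)
write.table(mtcars, file.path(tf, "file1.tsv"), sep = "\t", row.names = FALSE)

# Three spellings intended to describe the same tab-delimited format
fmt_tsv <- FileFormat$create("tsv")
fmt_cpp <- FileFormat$create("text", delimiter = "\t") # Arrow C++ naming
fmt_readr <- FileFormat$create("text", delim = "\t")   # readr-style naming

# Open the directory with one of them and inspect the column names
ds <- open_dataset(tf, format = fmt_readr)
col_names <- names(ds)
print(col_names)

# Remove the temporary directory and its contents
unlink(tf, recursive = TRUE)
```

Each call uses one naming style at a time; the sketch does not try mixing the two styles within a single call.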

It returns the appropriate subclass of FileFormat (e.g. ParquetFileFormat).
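
One way to see which subclass comes back is to check the class of the returned object. This is a small sketch using base R's inherits(); the variable names are illustrative.

```r
library(arrow)

# Create format objects from string identifiers
parquet_format <- FileFormat$create("parquet")
ipc_format <- FileFormat$create("ipc")

# Each object should inherit from its format-specific subclass and from FileFormat
is_parquet <- inherits(parquet_format, "ParquetFileFormat")
is_ipc <- inherits(ipc_format, "IpcFileFormat")
is_file_format <- inherits(parquet_format, "FileFormat")
print(c(is_parquet, is_ipc, is_file_format))
```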

Examples

## Semi-colon delimited files
# Set up directory for examples
tf <- tempfile()
dir.create(tf)
# recursive = TRUE is needed for unlink() to remove a directory
on.exit(unlink(tf, recursive = TRUE))
write.table(mtcars, file.path(tf, "file1.txt"), sep = ";", row.names = FALSE)

# Create FileFormat object
format <- FileFormat$create(format = "text", delimiter = ";")

open_dataset(tf, format = format)
#> FileSystemDataset with 1 csv file
#> 11 columns
#> mpg: double
#> cyl: int64
#> disp: double
#> hp: int64
#> drat: double
#> wt: double
#> qsec: double
#> vs: int64
#> am: int64
#> gear: int64
#> carb: int64
