
Dataset file formats

Source: R/dataset-format.R
FileFormat.Rd

A FileFormat holds information about how to read and parse the files included in a Dataset. There are subclasses corresponding to the supported file formats (ParquetFileFormat and IpcFileFormat).

Factory

FileFormat$create() takes the following arguments:

  • format: A string identifier of the file format. Currently supported values:

    • "parquet"

    • "ipc"/"arrow"/"feather", all aliases for each other; for Feather, note that only version 2 files are supported

    • "csv"/"text", aliases for the same thing (because comma is the default delimiter for text files)

    • "tsv", equivalent to passing format = "text", delimiter = "\t"

  • ...: Additional format-specific options

    format = "parquet":

    • dict_columns: Names of columns which should be read as dictionaries.

    • Any Parquet options from FragmentScanOptions.

    format = "text": see CsvParseOptions. Note that you can specify them either with the Arrow C++ library naming ("delimiter", "quoting", etc.) or the readr-style naming used in read_csv_arrow() ("delim", "quote", etc.). Not all readr options are currently supported; please file an issue if you encounter one that arrow should support. Also, the following options are supported. From CsvReadOptions:

    • skip_rows

    • column_names. Note that if a Schema is specified, column_names must match those specified in the schema.

    • autogenerate_column_names

    From CsvFragmentScanOptions (these values can be overridden at scan time):

    • convert_options: a CsvConvertOptions

    • block_size

It returns the appropriate subclass of FileFormat (e.g. ParquetFileFormat).
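As a rough illustration of the factory described above, the sketch below creates a few formats from their string identifiers and checks what comes back. It assumes the arrow package is installed with dataset support; the variable names (fmt_parquet, fmt_ipc, fmt_tsv) are made up for this example.

```r
library(arrow)

# Create formats from their string identifiers.
fmt_parquet <- FileFormat$create("parquet")
fmt_ipc <- FileFormat$create("feather")  # "feather" is an alias for "ipc"/"arrow"
fmt_tsv <- FileFormat$create("tsv")      # shorthand for format = "text", delimiter = "\t"

# Each result should inherit from FileFormat; the first class name
# identifies the concrete subclass that was returned.
class(fmt_parquet)[1]
inherits(fmt_ipc, "FileFormat")
inherits(fmt_tsv, "FileFormat")
```

Checking with inherits() rather than comparing class vectors keeps the sketch independent of how many parent classes the object carries.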

Examples

## Semi-colon delimited files
# Set up directory for examples
tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
write.table(mtcars, file.path(tf, "file1.txt"), sep = ";", row.names = FALSE)

# Create FileFormat object
format <- FileFormat$create(format = "text", delimiter = ";")

open_dataset(tf, format = format)
#> FileSystemDataset with 1 csv file
#> 11 columns
#> mpg: double
#> cyl: int64
#> disp: double
#> hp: int64
#> drat: double
#> wt: double
#> qsec: double
#> vs: int64
#> am: int64
#> gear: int64
#> carb: int64
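The Factory section notes that text options can be given either with the Arrow C++ naming ("delimiter") or with the readr-style naming ("delim"). The following sketch, which reuses the semicolon-delimited setup and assumes the arrow package is installed with dataset support, opens the same directory both ways and compares the resulting column names. The variable names (fmt_arrow, fmt_readr, cols_arrow, cols_readr) are illustrative only.

```r
library(arrow)

# Temporary directory holding one semicolon-delimited file
tf <- tempfile()
dir.create(tf)
write.table(mtcars, file.path(tf, "file1.txt"), sep = ";", row.names = FALSE)

# Same delimiter, expressed with the two naming conventions
fmt_arrow <- FileFormat$create(format = "text", delimiter = ";")  # Arrow C++ naming
fmt_readr <- FileFormat$create(format = "text", delim = ";")      # readr-style naming

# Open the directory with each format and record the column names
cols_arrow <- names(open_dataset(tf, format = fmt_arrow))
cols_readr <- names(open_dataset(tf, format = fmt_readr))

# Both spellings are expected to yield the same columns
identical(cols_arrow, cols_readr)

# Clean up the temporary directory
unlink(tf, recursive = TRUE)
```

Use one convention per call: this sketch does not cover mixing Arrow-style and readr-style names in a single FileFormat$create() call.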
