
File reader options

Source: R/csv.R, R/json.R
CsvReadOptions.Rd

CsvReadOptions, CsvParseOptions, CsvConvertOptions, JsonReadOptions, JsonParseOptions, and TimestampParser are containers for various file reading options. See their usage in read_csv_arrow() and read_json_arrow(), respectively.

Factory

The CsvReadOptions$create() and JsonReadOptions$create() factory methods take the following arguments:

  • use_threads Whether to use the global CPU thread pool

  • block_size Block size we request from the IO layer; also determines the size of chunks when use_threads is TRUE. NB: if use_threads is FALSE, JSON input must end with an empty line.

CsvReadOptions$create() further accepts these additional arguments:

  • skip_rows Number of lines to skip before reading data (default 0).

  • column_names Character vector to supply column names. If length-0 (the default), the first non-skipped row will be parsed to generate column names, unless autogenerate_column_names is TRUE.

  • autogenerate_column_names Logical: generate column names instead of using the first non-skipped row (the default)? If TRUE, column names will be "f0", "f1", ..., "fN".

  • encoding The file encoding. (default "UTF-8")

  • skip_rows_after_names Number of lines to skip after the column names (default 0). This number can be larger than the number of rows in one block, and empty rows are counted. The order of application is as follows:

    • skip_rows is applied (if non-zero);

    • column names are read (unless column_names is set);

    • skip_rows_after_names is applied (if non-zero).
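The order of application above can be sketched with a small, made-up file. The file contents and variable names here are illustrative assumptions; the read_options argument name is assumed from the arrow package's read_csv_arrow() signature.

```r
library(arrow)

# Hypothetical sample file: a comment line, the header, a units row, then data.
tf <- tempfile(fileext = ".csv")
writeLines(c(
  "# exported by a hypothetical tool",
  "id,value",
  "(none),(kg)",
  "1,2.5",
  "2,3.5"
), tf)

read_opts <- CsvReadOptions$create(
  skip_rows = 1L,             # step 1: drop the leading comment line
  skip_rows_after_names = 1L  # step 3: drop the units row after the header
)

# Step 2 happens in between: "id,value" is read as the column names,
# because column_names was left at its length-0 default.
df <- read_csv_arrow(tf, read_options = read_opts)
print(df)
```

Only the two data rows should remain, with columns named id and value.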

CsvParseOptions$create() takes the following arguments:

  • delimiter Field delimiting character (default ",")

  • quoting Logical: are strings quoted? (default TRUE)

  • quote_char Quoting character, if quoting is TRUE (default '"')

  • double_quote Logical: are quotes inside values double-quoted? (default TRUE)

  • escaping Logical: whether escaping is used (default FALSE)

  • escape_char Escaping character, if escaping is TRUE (default "\\")

  • newlines_in_values Logical: are values allowed to contain CR (0x0d) and LF (0x0a) characters? (default FALSE)

  • ignore_empty_lines Logical: should empty lines be ignored (default) or generate a row of missing values (if FALSE)?

JsonParseOptions$create() accepts only the newlines_in_values argument.
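As a minimal sketch of the CSV parse options, the following reads a semicolon-delimited file whose values are quoted with single quotes. The sample data is invented, and the parse_options argument name is assumed from the arrow package's read_csv_arrow() signature.

```r
library(arrow)

# Hypothetical sample file: ";" as delimiter, "'" as quote character,
# and one empty line between the data rows.
tf <- tempfile(fileext = ".csv")
writeLines(c(
  "name;note",
  "a;'x;y'",
  "",
  "b;plain"
), tf)

parse_opts <- CsvParseOptions$create(
  delimiter = ";",
  quote_char = "'",          # only consulted because quoting is TRUE (the default)
  ignore_empty_lines = TRUE  # the default, spelled out for clarity
)

df <- read_csv_arrow(tf, parse_options = parse_opts)
print(df)
```

The quoted value keeps its embedded delimiter, and the empty line does not produce a row.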

CsvConvertOptions$create() takes the following arguments:

  • check_utf8 Logical: check UTF8 validity of string columns? (default TRUE)

  • null_values Character vector of recognized spellings for null values. Analogous to the na.strings argument to read.csv() or na in readr::read_csv().

  • strings_can_be_null Logical: can string / binary columns have null values? Similar to the quoted_na argument to readr::read_csv(). (default FALSE)

  • true_values Character vector of recognized spellings for TRUE values

  • false_values Character vector of recognized spellings for FALSE values

  • col_types A Schema or NULL to infer types

  • auto_dict_encode Logical: Whether to try to automatically dictionary-encode string / binary data (think stringsAsFactors). Default FALSE. This setting is ignored for non-inferred columns (those in col_types).

  • auto_dict_max_cardinality If auto_dict_encode, string/binary columns are dictionary-encoded up to this number of unique values (default 50), after which it switches to regular encoding.

  • include_columns If non-empty, indicates the names of columns from the CSV file that should be actually read and converted (in the vector's order).

  • include_missing_columns Logical: if include_columns is provided, should columns named in it but not found in the data be included as a column of type null()? The default (FALSE) means that the reader will instead raise an error.

  • timestamp_parsers User-defined timestamp parsers. If more than one parser is specified, the CSV conversion logic will try parsing values starting from the beginning of this vector. Possible values are (a) NULL, the default, which uses the ISO-8601 parser; (b) a character vector of strptime parse strings; or (c) a list of TimestampParser objects.

  • decimal_point Character to use for decimal point in floating point numbers. Default: "."
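A minimal sketch of a few conversion options working together, assuming an invented file in which the word "missing" marks absent values. The convert_options argument name is assumed from the arrow package's read_csv_arrow() signature; the column names are hypothetical.

```r
library(arrow)

# Hypothetical sample file with a custom null spelling and an unwanted column.
tf <- tempfile(fileext = ".csv")
writeLines(c(
  "id,score,comment,internal",
  "1,1.5,ok,x",
  "2,missing,missing,y"
), tf)

convert_opts <- CsvConvertOptions$create(
  null_values = "missing",       # spelling treated as null
  strings_can_be_null = TRUE,    # let the string column hold nulls too
  include_columns = c("id", "score", "comment")  # read only these, in this order
)

df <- read_csv_arrow(tf, convert_options = convert_opts)
print(df)
```

With strings_can_be_null left at FALSE, the second comment value would be kept as the literal string rather than converted to a missing value.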

TimestampParser$create() takes an optional format string argument. See strptime() for example syntax. The default is to use an ISO-8601 format parser.
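A minimal sketch of passing a TimestampParser through timestamp_parsers, assuming an invented file with day-first timestamps. The format string and the column name are illustrative assumptions, and strptime-based parsing may depend on platform support.

```r
library(arrow)

# Hypothetical sample file: timestamps that the ISO-8601 parser would not accept.
tf <- tempfile(fileext = ".csv")
writeLines(c(
  "when",
  "31/12/2020 23:59",
  "01/01/2021 00:00"
), tf)

# Option (c) from the list above: a list of TimestampParser objects.
# Parsers are tried in order, so the custom format comes first.
convert_opts <- CsvConvertOptions$create(
  timestamp_parsers = list(
    TimestampParser$create(format = "%d/%m/%Y %H:%M"),
    TimestampParser$create()  # no format: ISO-8601 parser as a fallback
  )
)

df <- read_csv_arrow(tf, convert_options = convert_opts)
str(df)
```

Passing the character vector "%d/%m/%Y %H:%M" directly as timestamp_parsers (option (b) above) is the shorter equivalent when no fallback parser object is needed.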

The CsvWriteOptions$create() factory method takes the following arguments:

  • include_header Whether to write an initial header line with column names

  • batch_size Maximum number of rows processed at a time. Default is 1024.

  • null_string The string to be written for null values. Must not contain quotation marks. Default is an empty string ("").

  • eol The end of line character to use for ending rows.

  • delimiter Field delimiter

  • quoting_style Quoting style: "Needed" (only enclose values in quotes when they need them, because their CSV rendering can itself contain quotes, e.g. strings or binary values), "AllValid" (enclose all valid values in quotes), or "None" (do not enclose any values in quotes).
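A minimal sketch of the write options in use. This page does not say where CsvWriteOptions is consumed; the example assumes the arrow package's write_csv_arrow() function and its write_options argument, and the data frame is invented.

```r
library(arrow)

write_opts <- CsvWriteOptions$create(
  include_header = TRUE,
  null_string = "NA",        # written for null values; must not contain quotes
  delimiter = ";",
  quoting_style = "AllValid" # quote every non-null value
)

# Hypothetical data with one missing value.
df <- data.frame(id = c(1L, NA), name = c("a", "b"))

tf <- tempfile(fileext = ".csv")
write_csv_arrow(df, tf, write_options = write_opts)

# Inspect the raw text to see the delimiter, quoting, and null spelling.
lines <- readLines(tf)
print(lines)
```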

Active bindings

  • column_names: from CsvReadOptions
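A short sketch of reading the binding back from an options object; the column names used are placeholders.

```r
library(arrow)

# Supply explicit column names, then read them back through the active binding.
opts <- CsvReadOptions$create(column_names = c("a", "b"))
cols <- opts$column_names
print(cols)
```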

