Title:Read Rectangular Text Data
Version:2.1.6
Description:The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.
License:MIT + file LICENSE
URL:https://readr.tidyverse.org,https://github.com/tidyverse/readr
BugReports:https://github.com/tidyverse/readr/issues
Depends:R (≥ 3.6)
Imports:cli (≥ 3.2.0), clipr, crayon, hms (≥ 0.4.1), lifecycle (≥ 0.2.0), methods, R6, rlang, tibble, utils, vroom (≥ 1.6.0)
Suggests:covr, curl, datasets, knitr, rmarkdown, spelling, stringi, testthat (≥ 3.2.0), tzdb (≥ 0.1.1), waldo, withr, xml2
LinkingTo:cpp11, tzdb (≥ 0.1.1)
VignetteBuilder:knitr
Config/Needs/website:tidyverse, tidyverse/tidytemplate
Config/testthat/edition:3
Config/testthat/parallel:false
Encoding:UTF-8
Language:en-US
RoxygenNote:7.3.3
NeedsCompilation:yes
Packaged:2025-11-14 17:31:56 UTC; jenny
Author:Hadley Wickham [aut], Jim Hester [aut], Romain Francois [ctb], Jennifer Bryan [aut, cre] (ORCID iD), Shelby Bearrows [ctb], Posit Software, PBC [cph, fnd], https://github.com/mandreyel/ [cph] (mio library), Jukka Jylänki [ctb, cph] (grisu3 implementation), Mikkel Jørgensen [ctb, cph] (grisu3 implementation)
Maintainer:Jennifer Bryan <jenny@posit.co>
Repository:CRAN
Date/Publication:2025-11-14 22:30:02 UTC

readr: Read Rectangular Text Data

Description


The goal of 'readr' is to provide a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). It is designed to flexibly parse many types of data found in the wild, while still cleanly failing when data unexpectedly changes.

Author(s)

Maintainer: Jennifer Bryan <jenny@posit.co> (ORCID)

Authors:

  • Hadley Wickham

  • Jim Hester

Other contributors:

  • Romain Francois [contributor]

  • Shelby Bearrows [contributor]

  • Posit Software, PBC [copyright holder, funder]

  • https://github.com/mandreyel/ (mio library) [copyright holder]

  • Jukka Jylänki (grisu3 implementation) [contributor, copyright holder]

  • Mikkel Jørgensen (grisu3 implementation) [contributor, copyright holder]

See Also

Useful links:

  • https://readr.tidyverse.org

  • https://github.com/tidyverse/readr

  • Report bugs at https://github.com/tidyverse/readr/issues


Tokenizers.

Description

Explicitly create tokenizer objects. Usually you will not call these functions directly, but will instead use one of the user-friendly wrappers like read_csv().

Usage

tokenizer_delim(
  delim,
  quote = "\"",
  na = "NA",
  quoted_na = TRUE,
  comment = "",
  trim_ws = TRUE,
  escape_double = TRUE,
  escape_backslash = FALSE,
  skip_empty_rows = TRUE
)

tokenizer_csv(
  na = "NA",
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_tsv(
  na = "NA",
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_line(na = character(), skip_empty_rows = TRUE)

tokenizer_log(trim_ws)

tokenizer_fwf(
  begin,
  end,
  na = "NA",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_ws(na = "NA", comment = "", skip_empty_rows = TRUE)

Arguments

delim

Single character used to separate fields within a record.

quote

Single character used to quote strings.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

[Deprecated] Should missing values inside quotes be treated as missing values (the default) or strings. This parameter is soft deprecated as of readr 2.0.0.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

escape_double

Does the file escape quotes by doubling them? i.e. If this option is TRUE, the value """" represents a single quote, \".

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \\n.

skip_empty_rows

Should blank rows be ignored altogether? i.e. If this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

begin, end

Begin and end offsets for each file. These are C++ offsets, so the first column is column zero, and the ranges are [begin, end) (i.e. inclusive-exclusive).
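
The skip_empty_rows behaviour described above can be seen through read_csv(), which forwards it to the underlying tokenizer (a minimal sketch; assumes readr is installed):

```r
library(readr)

txt <- "x,y\n1,2\n\n3,4\n"

# TRUE (the default): the blank row is dropped entirely
read_csv(txt, skip_empty_rows = TRUE, show_col_types = FALSE)

# FALSE: the blank row becomes a row of NA values
read_csv(txt, skip_empty_rows = FALSE, show_col_types = FALSE)
```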

Examples

tokenizer_csv()

Generate a column specification

Description

This is most useful for generating a specification using the short form.

Usage

as.col_spec(x)

Arguments

x

Input object

Examples

as.col_spec("cccnnn")

Callback classes

Description

These classes are used to define callback behaviors.

Details

ChunkCallback

Callback interface definition; all callback functions should inherit from this class.

SideEffectChunkCallback

Callback function that is used only for side effects, no results are returned.

DataFrameCallback

Callback function that combines each result together at the end.

AccumulateCallback

Callback function that accumulates a single result. Requires the parameter acc to specify the initial value of the accumulator. The parameter acc is NULL by default.

See Also

Other chunked: melt_delim_chunked(), read_delim_chunked(), read_lines_chunked()

Examples

## If given a regular function it is converted to a SideEffectChunkCallback

# view structure of each chunk
read_lines_chunked(readr_example("mtcars.csv"), str, chunk_size = 5)

# Print starting line of each chunk
f <- function(x, pos) print(pos)
read_lines_chunked(readr_example("mtcars.csv"), SideEffectChunkCallback$new(f), chunk_size = 5)

# If combined results are desired you can use the DataFrameCallback
# Cars with 3 gears
f <- function(x, pos) subset(x, gear == 3)
read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)

# The ListCallback can be used for more flexible output
f <- function(x, pos) x$mpg[x$hp > 100]
read_csv_chunked(readr_example("mtcars.csv"), ListCallback$new(f), chunk_size = 5)

# The AccumulateCallback accumulates results from each chunk
f <- function(x, pos, acc) sum(x$mpg) + acc
read_csv_chunked(readr_example("mtcars.csv"), AccumulateCallback$new(f, acc = 0), chunk_size = 5)

Returns values from the clipboard

Description

This is useful in the read_delim() functions to read from the clipboard.

Usage

clipboard()

See Also

read_delim


Skip a column

Description

Use this function to ignore a column when reading in a file. To skip all columns not otherwise specified, use cols_only().

Usage

col_skip()

See Also

Other parsers: cols(), cols_condense(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()


Create column specification

Description

cols() includes all columns in the input data, guessing the column types as the default. cols_only() includes only the columns you explicitly specify, skipping the rest. In general you can substitute list() for cols() without changing the behavior.

Usage

cols(..., .default = col_guess())

cols_only(...)

Arguments

...

Either column objects created by col_*(), or their abbreviated character names (as described in the col_types argument of read_delim()). If you're only overriding a few columns, it's best to refer to columns by name. If not named, the column types must match the column names exactly.

.default

Any named columns not explicitly overridden in ... will be read with this column type.

Details

The available specifications are (with string abbreviations in brackets):

  • col_logical() [l]

  • col_integer() [i]

  • col_double() [d]

  • col_character() [c]

  • col_factor(levels, ordered) [f]

  • col_date(format = "") [D]

  • col_time(format = "") [t]

  • col_datetime(format = "") [T]

  • col_number() [n]

  • col_skip() [_, -]

  • col_guess() [?]

See Also

Other parsers: col_skip(), cols_condense(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()

Examples

cols(a = col_integer())
cols_only(a = col_integer())

# You can also use the standard abbreviations
cols(a = "i")
cols(a = "i", b = "d", c = "_")

# You can also use multiple sets of column definitions by combining
# them like so:
t1 <- cols(
  column_one = col_integer(),
  column_two = col_number()
)
t2 <- cols(
  column_three = col_character()
)
t3 <- t1
t3$cols <- c(t1$cols, t2$cols)
t3

Examine the column specifications for a data frame

Description

cols_condense() takes a spec object and condenses its definition by setting the default column type to the most frequent type and only listing columns with a different type.

spec() extracts the full column specification from a tibble created by readr.

Usage

cols_condense(x)

spec(x)

Arguments

x

The data frame object to extract the specification from.

Value

A col_spec object.

See Also

Other parsers: col_skip(), cols(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()

Examples

df <- read_csv(readr_example("mtcars.csv"))
s <- spec(df)
s
cols_condense(s)

Count the number of fields in each line of a file

Description

This is useful for diagnosing problems with functions that fail to parse correctly.

Usage

count_fields(file, tokenizer, skip = 0, n_max = -1L)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

tokenizer

A tokenizer that specifies how to break the file up into fields, e.g., tokenizer_csv(), tokenizer_fwf().

skip

Number of lines to skip before reading data.

n_max

Optionally, maximum number of rows to count fields for.
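
As a quick sketch, count_fields() with tokenizer_csv() exposes a ragged record in inline literal data (a string with a newline is treated as literal data, per the file argument above):

```r
library(readr)

# The second record has an extra field; count_fields() makes this visible
count_fields("a,b\n1,2,3\n", tokenizer_csv())
```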

Examples

count_fields(readr_example("mtcars.csv"), tokenizer_csv())

Create a source object.

Description

Create a source object.

Usage

datasource(
  file,
  skip = 0,
  skip_empty_rows = FALSE,
  comment = "",
  skip_quote = TRUE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

skip

Number of lines to skip before reading data.

Examples

# Literal csv
datasource("a,b,c\n1,2,3")
datasource(charToRaw("a,b,c\n1,2,3"))

# Strings
datasource(readr_example("mtcars.csv"))
datasource(readr_example("mtcars.csv.bz2"))
datasource(readr_example("mtcars.csv.zip"))
## Not run: 
datasource("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv")
## End(Not run)

# Connection
con <- rawConnection(charToRaw("abc\n123"))
datasource(con)
close(con)

Create or retrieve date names

Description

When parsing dates, you often need to know how weekdays and months are represented as text. This pair of functions allows you to either create your own, or retrieve from a standard list. The standard list is derived from ICU (http://site.icu-project.org) via the stringi package.

Usage

date_names(mon, mon_ab = mon, day, day_ab = day, am_pm = c("AM", "PM"))

date_names_lang(language)

date_names_langs()

Arguments

mon,mon_ab

Full and abbreviated month names.

day,day_ab

Full and abbreviated week day names. Starts with Sunday.

am_pm

Names used for AM and PM.

language

A BCP 47 locale, made up of a language and a region, e.g. "en" for American English. See date_names_langs() for a complete list of available locales.

Examples

date_names_lang("en")
date_names_lang("ko")
date_names_lang("fr")

Retrieve the currently active edition

Description

Retrieve the currently active edition

Usage

edition_get()

Value

An integer corresponding to the currently active edition.

Examples

edition_get()

Convert a data frame to a delimited string

Description

These functions are equivalent to write_csv() etc., but instead of writing to disk, they return a string.

Usage

format_delim(
  x,
  delim,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  quote_escape = deprecated()
)

format_csv(
  x,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  quote_escape = deprecated()
)

format_csv2(
  x,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  quote_escape = deprecated()
)

format_tsv(
  x,
  na = "NA",
  append = FALSE,
  col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"),
  eol = "\n",
  quote_escape = deprecated()
)

Arguments

x

A data frame.

delim

Delimiter used to separate values. Defaults to " " for write_delim(), "," for write_excel_csv() and ";" for write_excel_csv2(). Must be a single character.

na

String used for missing values. Defaults to NA. Missing values will never be quoted; strings with the same value as na will always be quoted.

append

If FALSE, will overwrite existing file. If TRUE, will append to existing file. In both cases, if the file does not exist a new file is created.

col_names

If FALSE, column names will not be included at the top of the file. If TRUE, column names will be included. If not specified, col_names will take the opposite value given to append.

quote

How to handle fields which contain characters that need to be quoted.

  • needed - Values are only quoted if needed: if they contain a delimiter, quote, or newline.

  • all - Quote all fields.

  • none - Never quote fields.

escape

The type of escape to use when quotes are in the data.

  • double - quotes are escaped by doubling them.

  • backslash - quotes are escaped by a preceding backslash.

  • none - quotes are not escaped.

eol

The end of line character to use. Most commonly either "\n" for Unix style newlines, or "\r\n" for Windows style newlines.

quote_escape

[Deprecated] Use the escape argument instead.

Value

A string.

Output

Factors are coerced to character. Doubles are formatted to a decimal string using the grisu3 algorithm. POSIXct values are formatted as ISO8601 with a UTC timezone. Note: POSIXct objects in local or non-UTC timezones will be converted to UTC time before writing.
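
A small sketch of the UTC conversion (the printed offset follows from the source time zone):

```r
library(readr)

# Midnight US/Eastern in January is 05:00 UTC; format_csv() writes ISO8601 UTC
df <- data.frame(x = as.POSIXct("2020-01-01 00:00:00", tz = "US/Eastern"))
cat(format_csv(df))
```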

All columns are encoded as UTF-8. write_excel_csv() and write_excel_csv2() also include a UTF-8 Byte order mark which indicates to Excel the csv is UTF-8 encoded.

write_excel_csv2() and write_csv2() were created to allow users with different locale settings to save .csv files using their default settings (e.g. ; as the column separator and , as the decimal separator). This is common in some European countries.

Values are only quoted if they contain a comma, quote or newline.

The write_*() functions will automatically compress outputs if an appropriate extension is given. Three extensions are currently supported: .gz for gzip compression, .bz2 for bzip2 compression and .xz for lzma compression. See the examples for more information.
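
A sketch of the extension-based compression using write_csv() and a temporary file:

```r
library(readr)

# The .csv.gz extension triggers gzip compression automatically
path <- tempfile(fileext = ".csv.gz")
write_csv(mtcars, path)

# read_csv() transparently decompresses on the way back in
df <- read_csv(path, show_col_types = FALSE)
```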

References

Florian Loitsch, Printing Floating-Point Numbers Quickly and Accurately with Integers, PLDI '10, http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

Examples

# format_*() functions are useful for testing and reprexes
cat(format_csv(mtcars))
cat(format_tsv(mtcars))
cat(format_delim(mtcars, ";"))

# Specifying missing values
df <- data.frame(x = c(1, NA, 3))
format_csv(df, na = "missing")

# Quotes are automatically added as needed
df <- data.frame(x = c("a ", '"', ",", "\n"))
cat(format_csv(df))

Guess encoding of file

Description

Uses stringi::stri_enc_detect(); see the documentation there for caveats.

Usage

guess_encoding(file, n_max = 10000, threshold = 0.2)

Arguments

file

A character string specifying an input as specified in datasource(), a raw vector, or a list of raw vectors.

n_max

Number of lines to read. If n_max is -1, all lines in file will be read.

threshold

Only report guesses above this threshold of certainty.

Value

A tibble

Examples

guess_encoding(readr_example("mtcars.csv"))
guess_encoding(read_lines_raw(readr_example("mtcars.csv")))
guess_encoding(read_file_raw(readr_example("mtcars.csv")))

guess_encoding("a\n\u00b5\u00b5")

Create locales

Description

A locale object tries to capture all the defaults that can vary between countries. You set the locale once, and the details are automatically passed on down to the column parsers. The defaults have been chosen to match R (i.e. US English) as closely as possible. See vignette("locales") for more details.

Usage

locale(
  date_names = "en",
  date_format = "%AD",
  time_format = "%AT",
  decimal_mark = ".",
  grouping_mark = ",",
  tz = "UTC",
  encoding = "UTF-8",
  asciify = FALSE
)

default_locale()

Arguments

date_names

Character representations of day and month names. Either the language code as string (passed on to date_names_lang()) or an object created by date_names().

date_format, time_format

Default date and time formats.

decimal_mark, grouping_mark

Symbols used to indicate the decimal place, and to chunk larger numbers. Decimal mark can only be "," or ".".

tz

Default tz. This is used both for input (if the time zone isn't present in individual strings), and for output (to control the default display). The default is to use "UTC", a time zone that does not use daylight savings time (DST) and hence is typically most useful for data. The absence of time zones makes it approximately 50x faster to generate UTC times than any other time zone.

Use "" to use the system default time zone, but beware that this will not be reproducible across systems.

For a complete list of possible time zones, see OlsonNames(). Americans, note that "EST" is a Canadian time zone that does not have DST. It is not Eastern Standard Time. It's better to use "US/Eastern", "US/Central" etc.

encoding

Default encoding. This only affects how the file is read - readr always converts the output to UTF-8.

asciify

Should diacritics be stripped from date names and converted to ASCII? This is useful if you're dealing with ASCII data where the correct spellings have been lost. Requires the stringi package.
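
The decimal_mark/grouping_mark and tz defaults above can be exercised directly with the parse_*() helpers (a minimal sketch):

```r
library(readr)

# European-style numbers: "." groups digits, "," marks the decimal
eu <- locale(decimal_mark = ",", grouping_mark = ".")
parse_number("1.234,56", locale = eu)

# tz supplies the zone when the string itself carries none
parse_datetime("2011-01-01 12:00", locale = locale(tz = "US/Eastern"))
```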

Examples

locale()
locale("fr")

# A South American locale
locale("es", decimal_mark = ",")

Return melted data for each token in a delimited file (including csv & tsv)

Description

[Superseded] This function has been superseded in readr and moved to the meltr package.

Usage

melt_delim(
  file,
  delim,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

melt_csv(
  file,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

melt_csv2(
  file,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

melt_tsv(
  file,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

delim

Single character used to separate fields within a record.

quote

Single character used to quote strings.

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \\n.

escape_double

Does the file escape quotes by doubling them? i.e. If this option is TRUE, the value """" represents a single quote, \".

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

[Deprecated] Should missing values inside quotes be treated as missing values (the default) or strings. This parameter is soft deprecated as of readr 2.0.0.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.

n_max

Maximum number of lines to read.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

skip_empty_rows

Should blank rows be ignored altogether? i.e. If this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

Details

For certain non-rectangular data formats, it can be useful to parse the datainto a melted format where each row represents a single token.

melt_csv() and melt_tsv() are special cases of the general melt_delim(). They're useful for reading the most common types of flat file data, comma separated values and tab separated values, respectively. melt_csv2() uses ; for the field separator and , for the decimal point. This is common in some European countries.

Value

A tibble::tibble() of four columns:

  • row, the row that the token comes from in the original file

  • col, the column that the token comes from in the original file

  • data_type, the data type of the token, e.g. "integer", "character", "date"

  • value, the token itself as a character string, unchanged from its representation in the original file

If there are parsing problems, a warning tells you how many, and you can retrieve the details with problems().

See Also

read_delim() for the conventional way to read rectangular data from delimited files.

Examples

# Input sources -------------------------------------------------------------
# Read from a path
melt_csv(readr_example("mtcars.csv"))
melt_csv(readr_example("mtcars.csv.zip"))
melt_csv(readr_example("mtcars.csv.bz2"))
## Not run: 
melt_csv("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv")
## End(Not run)

# Or directly from a string (must contain a newline)
melt_csv("x,y\n1,2\n3,4")

# To import empty cells as 'empty' rather than `NA`
melt_csv("x,y\n,NA,\"\",''", na = "NA")

# File types ----------------------------------------------------------------
melt_csv("a,b\n1.0,2.0")
melt_csv2("a;b\n1,0;2,0")
melt_tsv("a\tb\n1.0\t2.0")
melt_delim("a|b\n1.0|2.0", delim = "|")

Melt a delimited file by chunks

Description

For certain non-rectangular data formats, it can be useful to parse the datainto a melted format where each row represents a single token.

Usage

melt_delim_chunked(
  file,
  callback,
  chunk_size = 10000,
  delim,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

melt_csv_chunked(
  file,
  callback,
  chunk_size = 10000,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

melt_csv2_chunked(
  file,
  callback,
  chunk_size = 10000,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

melt_tsv_chunked(
  file,
  callback,
  chunk_size = 10000,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

callback

A callback function to call on each chunk.

chunk_size

The number of rows to include in each chunk.

delim

Single character used to separate fields within a record.

quote

Single character used to quote strings.

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \\n.

escape_double

Does the file escape quotes by doubling them? i.e. If this option is TRUE, the value """" represents a single quote, \".

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

[Deprecated] Should missing values inside quotes be treated as missing values (the default) or strings. This parameter is soft deprecated as of readr 2.0.0.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

skip_empty_rows

Should blank rows be ignored altogether? i.e. If this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

Details

melt_delim_chunked() and the specialisations melt_csv_chunked(), melt_csv2_chunked() and melt_tsv_chunked() read files by a chunk of rows at a time, executing a given function on one chunk before reading the next.

See Also

Other chunked: callback, read_delim_chunked(), read_lines_chunked()

Examples

# Keep only the integer tokens from each chunk
f <- function(x, pos) subset(x, data_type == "integer")
melt_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)

Return melted data for each token in a fixed width file

Description

[Superseded] This function has been superseded in readr and moved to the meltr package.

Usage

melt_fwf(
  file,
  col_positions,
  locale = default_locale(),
  na = c("", "NA"),
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  skip_empty_rows = FALSE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

col_positions

Column positions, as created by fwf_empty(), fwf_widths() or fwf_positions(). To read in only selected fields, use fwf_positions(). If the width of the last column is variable (a ragged fwf file), supply the last end position as NA.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data.

n_max

Maximum number of lines to read.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

skip_empty_rows

Should blank rows be ignored altogether? i.e. If this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

Details

For certain non-rectangular data formats, it can be useful to parse the datainto a melted format where each row represents a single token.

melt_fwf() parses each token of a fixed width file into a single row, but it still requires that each field is in the same position in every row of the source file.

See Also

melt_table() to melt fixed width files where each column is separated by whitespace, and read_fwf() for the conventional way to read rectangular data from fixed width files.

Examples

fwf_sample <- readr_example("fwf-sample.txt")
cat(read_lines(fwf_sample))

# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
melt_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
# 2. A vector of field widths
melt_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
melt_fwf(fwf_sample, fwf_positions(c(1, 30), c(10, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
melt_fwf(fwf_sample, fwf_cols(name = c(1, 10), ssn = c(30, 42)))
# 5. Named arguments with column widths
melt_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))

Return melted data for each token in a whitespace-separated file

Description

[Superseded] This function has been superseded in readr and moved to the meltr package.

For certain non-rectangular data formats, it can be useful to parse the data into a melted format where each row represents a single token.

melt_table() and melt_table2() are designed to read the type of textual data where each column is separated by one (or more) columns of space.

melt_table2() allows any number of whitespace characters between columns, and the lines can be of different lengths.

melt_table() is more strict, each line must be the same length, and each field is in the same position in every line. It first finds empty columns and then parses like a fixed width file.

Usage

melt_table(
  file,
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = Inf,
  guess_max = min(n_max, 1000),
  progress = show_progress(),
  comment = "",
  skip_empty_rows = FALSE
)

melt_table2(
  file,
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = Inf,
  progress = show_progress(),
  comment = "",
  skip_empty_rows = FALSE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with ⁠http://⁠, ⁠https://⁠, ⁠ftp://⁠, or ⁠ftps://⁠ will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

skip

Number of lines to skip before reading data.

n_max

Maximum number of lines to read.

guess_max

Maximum number of lines to use for guessing column types. Will never use more than the number of lines read. See vignette("column-types", package = "readr") for more details.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

skip_empty_rows

Should blank rows be ignored altogether? i.e. If this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

See Also

melt_fwf() to melt fixed width files where each column is not separated by whitespace. melt_fwf() is also useful for reading tabular data with non-standard formatting. read_table() is the conventional way to read tabular data from whitespace-separated files.

Examples

fwf <- readr_example("fwf-sample.txt")
writeLines(read_lines(fwf))
melt_table(fwf)

ws <- readr_example("whitespace-sample.txt")
writeLines(read_lines(ws))
melt_table2(ws)

Preprocess column for output

Description

This is a generic function that is applied to each column before it is saved to disk. It provides a hook for S3 classes that need special handling.

Usage

output_column(x, name)

Arguments

x

A vector

Examples

# Most columns are not altered, but POSIXct are converted to ISO8601.
x <- parse_datetime("2016-01-01")
str(output_column(x))
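Because output_column() is an S3 generic, packages (or scripts) can register methods for their own classes. A minimal sketch, where the class "rounded" is made up purely for illustration:

```r
library(readr)

# Hypothetical class "rounded" (illustrative only): round to 1 decimal
# place before the column is written to disk.
output_column.rounded <- function(x, name) round(unclass(x), 1)

x <- structure(c(1.2345, 6.789), class = "rounded")
out <- output_column(x, "x")
out  # 1.2 6.8
```

Methods receive the column and its name, and must return a vector that readr knows how to write (character, numeric, etc.).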

Parse logicals, integers, and reals

Description

Use ⁠parse_*()⁠ if you have a character vector you want to parse. Use ⁠col_*()⁠ in conjunction with a ⁠read_*()⁠ function to parse the values as they're read in.

Usage

parse_logical(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)

parse_integer(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)

parse_double(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)

parse_character(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)

col_logical()

col_integer()

col_double()

col_character()

Arguments

x

Character vector of values to parse.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

See Also

Other parsers: col_skip(), cols(), cols_condense(), parse_datetime(), parse_factor(), parse_guess(), parse_number(), parse_vector()

Examples

parse_integer(c("1", "2", "3"))
parse_double(c("1", "2", "3.123"))
parse_number("$1,123,456.00")

# Use locale to override default decimal and grouping marks
es_MX <- locale("es", decimal_mark = ",")
parse_number("$1.123.456,00", locale = es_MX)

# Invalid values are replaced with missing values with a warning.
x <- c("1", "2", "3", "-")
parse_double(x)

# Or flag values as missing
parse_double(x, na = "-")

Parse date/times

Description

Parse date/times

Usage

parse_datetime(
  x,
  format = "",
  na = c("", "NA"),
  locale = default_locale(),
  trim_ws = TRUE
)

parse_date(
  x,
  format = "",
  na = c("", "NA"),
  locale = default_locale(),
  trim_ws = TRUE
)

parse_time(
  x,
  format = "",
  na = c("", "NA"),
  locale = default_locale(),
  trim_ws = TRUE
)

col_datetime(format = "")

col_date(format = "")

col_time(format = "")

Arguments

x

A character vector of dates to parse.

format

A format specification, as described below. If set to "", date times are parsed as ISO8601, and dates and times use the date and time formats specified in the locale().

Unlike strptime(), the format specification must match the complete string.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

Value

A POSIXct() vector with the tzone attribute set to tz. Elements that could not be parsed (or did not generate valid dates) will be set to NA, and a warning message will inform you of the total number of failures.

Format specification

readr uses a format specification similar to strptime(). There are three types of element:

  1. Date components are specified with "%" followed by a letter. For example "%Y" matches a 4 digit year, "%m" matches a 2 digit month and "%d" matches a 2 digit day. Month and day default to 1 (i.e. Jan 1st) if not present, for example if only a year is given.

  2. Whitespace is any sequence of zero or more whitespace characters.

  3. Any other character is matched exactly.

parse_datetime() recognises the following format specifications:

ISO8601 support

Currently, readr does not support all of ISO8601. Missing features:

The parser is also a little laxer than ISO8601:

See Also

Other parsers: col_skip(), cols(), cols_condense(), parse_factor(), parse_guess(), parse_logical(), parse_number(), parse_vector()

Examples

# Format strings --------------------------------------------------------
parse_datetime("01/02/2010", "%d/%m/%Y")
parse_datetime("01/02/2010", "%m/%d/%Y")
# Handle any separator
parse_datetime("01/02/2010", "%m%.%d%.%Y")

# Dates look the same, but internally they use the number of days since
# 1970-01-01 instead of the number of seconds. This avoids a whole lot
# of troubles related to time zones, so use if you can.
parse_date("01/02/2010", "%d/%m/%Y")
parse_date("01/02/2010", "%m/%d/%Y")

# You can parse timezones from strings (as listed in OlsonNames())
parse_datetime("2010/01/01 12:00 US/Central", "%Y/%m/%d %H:%M %Z")
# Or from offsets
parse_datetime("2010/01/01 12:00 -0600", "%Y/%m/%d %H:%M %z")

# Use the locale parameter to control the default time zone
# (but note UTC is considerably faster than other options)
parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M",
  locale = locale(tz = "US/Central"))
parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M",
  locale = locale(tz = "US/Eastern"))

# Unlike strptime, the format specification must match the complete
# string (ignoring leading and trailing whitespace). This avoids common
# errors:
strptime("01/02/2010", "%d/%m/%y")
parse_datetime("01/02/2010", "%d/%m/%y")

# Failures -------------------------------------------------------------
parse_datetime("01/01/2010", "%d/%m/%Y")
parse_datetime(c("01/ab/2010", "32/01/2010"), "%d/%m/%Y")

# Locales --------------------------------------------------------------
# By default, readr expects English date/times, but that's easy to change
parse_datetime("1 janvier 2015", "%d %B %Y", locale = locale("fr"))
parse_datetime("1 enero 2015", "%d %B %Y", locale = locale("es"))

# ISO8601 --------------------------------------------------------------
# With separators
parse_datetime("1979-10-14")
parse_datetime("1979-10-14T10")
parse_datetime("1979-10-14T10:11")
parse_datetime("1979-10-14T10:11:12")
parse_datetime("1979-10-14T10:11:12.12345")
# Without separators
parse_datetime("19791014")
parse_datetime("19791014T101112")

# Time zones
us_central <- locale(tz = "US/Central")
parse_datetime("1979-10-14T1010", locale = us_central)
parse_datetime("1979-10-14T1010-0500", locale = us_central)
parse_datetime("1979-10-14T1010Z", locale = us_central)
# Your current time zone
parse_datetime("1979-10-14T1010", locale = locale(tz = ""))

Parse factors

Description

parse_factor() is similar to factor(), but generates a warning if levels have been specified and some elements of x are not found in those levels.

Usage

parse_factor(
  x,
  levels = NULL,
  ordered = FALSE,
  na = c("", "NA"),
  locale = default_locale(),
  include_na = TRUE,
  trim_ws = TRUE
)

col_factor(levels = NULL, ordered = FALSE, include_na = FALSE)

Arguments

x

Character vector of values to parse.

levels

Character vector of the allowed levels. When levels = NULL (the default), levels are discovered from the unique values of x, in the order in which they appear in x.

ordered

Is it an ordered factor?

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

include_na

If TRUE and x contains at least one NA, then NA is included in the levels of the constructed factor.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

See Also

Other parsers: col_skip(), cols(), cols_condense(), parse_datetime(), parse_guess(), parse_logical(), parse_number(), parse_vector()

Examples

# discover the levels from the data
parse_factor(c("a", "b"))
parse_factor(c("a", "b", "-99"))
parse_factor(c("a", "b", "-99"), na = c("", "NA", "-99"))
parse_factor(c("a", "b", "-99"), na = c("", "NA", "-99"), include_na = FALSE)

# provide the levels explicitly
parse_factor(c("a", "b"), levels = letters[1:5])

x <- c("cat", "dog", "caw")
animals <- c("cat", "dog", "cow")

# base::factor() silently converts elements that do not match any levels to
# NA
factor(x, levels = animals)

# parse_factor() generates same factor as base::factor() but throws a warning
# and reports problems
parse_factor(x, levels = animals)

Parse using the "best" type

Description

parse_guess() returns the parser vector; guess_parser() returns the name of the parser. These functions use a number of heuristics to determine which type of vector is "best". Generally they try to err on the side of safety, as it's straightforward to override the parsing choice if needed.

Usage

parse_guess(
  x,
  na = c("", "NA"),
  locale = default_locale(),
  trim_ws = TRUE,
  guess_integer = FALSE
)

col_guess()

guess_parser(
  x,
  locale = default_locale(),
  guess_integer = FALSE,
  na = c("", "NA")
)

Arguments

x

Character vector of values to parse.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

guess_integer

If TRUE, guess integer types for whole numbers, if FALSE guess numeric type for all numbers.

See Also

Other parsers: col_skip(), cols(), cols_condense(), parse_datetime(), parse_factor(), parse_logical(), parse_number(), parse_vector()

Examples

# Logical vectors
parse_guess(c("FALSE", "TRUE", "F", "T"))

# Integers and doubles
parse_guess(c("1", "2", "3"))
parse_guess(c("1.6", "2.6", "3.4"))

# Numbers containing grouping mark
guess_parser("1,234,566")
parse_guess("1,234,566")

# ISO 8601 date times
guess_parser(c("2010-10-10"))
parse_guess(c("2010-10-10"))
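A small additional sketch of the guess_integer argument, which controls whether whole numbers are guessed as integer or (the default) double:

```r
library(readr)

# By default, whole numbers are guessed as double
guess_parser("1")                         # "double"
guess_parser("1", guess_integer = TRUE)   # "integer"

typeof(parse_guess(c("1", "2")))                        # "double"
typeof(parse_guess(c("1", "2"), guess_integer = TRUE))  # "integer"
```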

Parse numbers, flexibly

Description

This parses the first number it finds, dropping any non-numeric characters before the first number and all characters after the first number. The grouping mark specified by the locale is ignored inside the number.

Usage

parse_number(x, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE)

col_number()

Arguments

x

Character vector of values to parse.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

Value

A numeric vector (double) of parsed numbers.

See Also

Other parsers: col_skip(), cols(), cols_condense(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_vector()

Examples

## These all return 1000
parse_number("$1,000") ## leading `$` and grouping character `,` ignored
parse_number("euro1,000") ## leading non-numeric euro ignored
parse_number("t1000t1000") ## only parses first number found

parse_number("1,234.56")
## explicit locale specifying European grouping and decimal marks
parse_number("1.234,56", locale = locale(decimal_mark = ",", grouping_mark = "."))
## SI/ISO 31-0 standard spaces for number grouping
parse_number("1 234.56", locale = locale(decimal_mark = ".", grouping_mark = " "))

## Specifying strings for NAs
parse_number(c("1", "2", "3", "NA"))
parse_number(c("1", "2", "3", "NA", "Nothing"), na = c("NA", "Nothing"))

Parse a character vector.

Description

Parse a character vector.

Usage

parse_vector(
  x,
  collector,
  na = c("", "NA"),
  locale = default_locale(),
  trim_ws = TRUE
)

Arguments

x

Character vector of elements to parse.

collector

Column specification.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

See Also

Other parsers: col_skip(), cols(), cols_condense(), parse_datetime(), parse_factor(), parse_guess(), parse_logical(), parse_number()

Examples

x <- c("1", "2", "3", "NA")
parse_vector(x, col_integer())
parse_vector(x, col_double())
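Any collector can be supplied, not just the numeric ones; a small sketch with dates, where "NA" falls back to the na argument:

```r
library(readr)

# col_date() with its default format parses ISO dates
d <- parse_vector(c("2010-01-01", "NA"), col_date())
d         # "2010-01-01" NA
class(d)  # "Date"
```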

Retrieve parsing problems

Description

Readr functions will only throw an error if parsing fails in an unrecoverable way. However, there are lots of potential problems that you might want to know about - these are stored in the problems attribute of the output, which you can easily access with this function. stop_for_problems() will throw an error if there are any parsing problems: this is useful for automated scripts where you want to throw an error as soon as you encounter a problem.

Usage

problems(x = .Last.value)

stop_for_problems(x)

Arguments

x

A data frame (from ⁠read_*()⁠) or a vector (from ⁠parse_*()⁠).

Value

A data frame with one row for each problem and four columns:

row,col

Row and column of problem

expected

What readr expected to find

actual

What it actually got

Examples

x <- parse_integer(c("1X", "blah", "3"))
problems(x)

y <- parse_integer(c("1", "2", "3"))
problems(y)
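A companion sketch for stop_for_problems(), which turns any parsing problems into an error (handy in scripts that should fail fast); tryCatch() is used here only to capture the error for display:

```r
library(readr)

x <- parse_integer(c("1X", "blah", "3"))  # two parsing failures, with a warning
err <- tryCatch(stop_for_problems(x), error = function(e) conditionMessage(e))
err  # the error message reports the number of parsing failures

y <- parse_integer(c("1", "2", "3"))
stop_for_problems(y)  # no problems, so no error
```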

Read built-in object from package

Description

Consistent wrapper around data() that forces the promise. This is also a stronger parallel to loading data from a file.

Usage

read_builtin(x, package = NULL)

Arguments

x

Name (character string) of data set to read.

package

Name of package from which to find data set. By default, all attached packages are searched and then the 'data' subdirectory (if present) of the current working directory.

Value

An object of the built-in class ofx.

Examples

read_builtin("mtcars", "datasets")

Read a delimited file (including CSV and TSV) into a tibble

Description

read_csv() and read_tsv() are special cases of the more general read_delim(). They're useful for reading the most common types of flat file data, comma separated values and tab separated values, respectively. read_csv2() uses ⁠;⁠ for the field separator and ⁠,⁠ for the decimal point. This format is common in some European countries.

Usage

read_delim(
  file,
  delim = NULL,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

read_csv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

read_csv2(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

read_tsv(
  file,
  col_names = TRUE,
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(1000, n_max),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with ⁠http://⁠, ⁠https://⁠, ⁠ftp://⁠, or ⁠ftps://⁠ will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

delim

Single character used to separate fields within a record.

quote

Single character used to quote strings.

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like ⁠\\n⁠.

escape_double

Does the file escape quotes by doubling them? i.e. If this option is TRUE, the value ⁠""""⁠ represents a single quote, ⁠\"⁠.

col_names

Either TRUE, FALSE or a character vector of column names.

If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically: X1, X2, X3 etc.

If col_names is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame.

Missing (NA) column names will generate a warning, and be filled in with dummy names ...1, ...2 etc. Duplicate column names will generate a warning and be made unique, see name_repair to control how this is done.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).

col_select

Columns to include in the results. You can use the same mini-language as dplyr::select() to refer to the columns by name. Use c() to use more than one selection expression. Although this usage is less common, col_select also accepts a numeric column index. See ?tidyselect::language for full details on the selection language.

id

The name of a column in which to store the file path. This is useful when reading multiple input files and there is data in the file paths, such as the data collection date. If NULL (the default) no extra column is created.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

[Deprecated] Should missing values inside quotes be treated as missing values (the default) or strings. This parameter is soft deprecated as of readr 2.0.0.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data. If comment is supplied any commented lines are ignored after skipping.

n_max

Maximum number of lines to read.

guess_max

Maximum number of lines to use for guessing column types. Will never use more than the number of lines read. See vignette("column-types", package = "readr") for more details.

name_repair

Handling of column names. The default behaviour is to ensure column names are "unique". Various repair strategies are supported:

  • "minimal": No name repair or checks, beyond basic existence of names.

  • "unique" (default value): Make sure names are unique and not empty.

  • "check_unique": No name repair, but check they are unique.

  • "unique_quiet": Repair with the unique strategy, quietly.

  • "universal": Make the names unique and syntactic.

  • "universal_quiet": Repair with the universal strategy, quietly.

  • A function: Apply custom name repair (e.g., name_repair = make.names for names in the style of base R).

  • A purrr-style anonymous function, see rlang::as_function().

This argument is passed on as repair to vctrs::vec_as_names(). See there for more details on these terms and the strategies used to enforce them.

num_threads

The number of processing threads to use for initial parsing and lazy reading of data. If your data contains newlines within fields the parser should automatically detect this and fall back to using one thread only. However if you know your file has newlines within quoted fields it is safest to set num_threads = 1 explicitly.

progress

Display a progress bar? By default it will only displayin an interactive session and not while knitting a document. The automaticprogress bar can be disabled by setting optionreadr.show_progress toFALSE.

show_col_types

If FALSE, do not show the guessed column types. If TRUE always show the column types, even if they are supplied. If NULL (the default) only show the column types if they are not explicitly supplied by the col_types argument.

skip_empty_rows

Should blank rows be ignored altogether? i.e. If thisoption isTRUE then blank rows will not be represented at all. If it isFALSE then they will be represented byNA values in all the columns.

lazy

Read values lazily? By default, this is FALSE, because there are special considerations when reading a file lazily that have tripped up some users. Specifically, things get tricky when reading and then writing back into the same file. But, in general, lazy reading (lazy = TRUE) has many benefits, especially for interactive use and when your downstream work only involves a subset of the rows or columns.

Learn more in should_read_lazy() and in the documentation for the altrep argument of vroom::vroom().

Value

A tibble::tibble(). If there are parsing problems, a warning will alert you. You can retrieve the full details by calling problems() on your dataset.

Examples

# Input sources -------------------------------------------------------------
# Read from a path
read_csv(readr_example("mtcars.csv"))
read_csv(readr_example("mtcars.csv.zip"))
read_csv(readr_example("mtcars.csv.bz2"))
## Not run:
# Including remote paths
read_csv("https://github.com/tidyverse/readr/raw/main/inst/extdata/mtcars.csv")
## End(Not run)

# Read from multiple file paths at once
continents <- c("africa", "americas", "asia", "europe", "oceania")
filepaths <- vapply(
  paste0("mini-gapminder-", continents, ".csv"),
  FUN = readr_example,
  FUN.VALUE = character(1)
)
read_csv(filepaths, id = "file")

# Or directly from a string with `I()`
read_csv(I("x,y\n1,2\n3,4"))

# Column selection -----------------------------------------------------------
# Pass column names or indexes directly to select them
read_csv(readr_example("chickens.csv"), col_select = c(chicken, eggs_laid))
read_csv(readr_example("chickens.csv"), col_select = c(1, 3:4))

# Or use the selection helpers
read_csv(
  readr_example("chickens.csv"),
  col_select = c(starts_with("c"), last_col())
)

# You can also rename specific columns
read_csv(
  readr_example("chickens.csv"),
  col_select = c(egg_yield = eggs_laid, everything())
)

# Column types --------------------------------------------------------------
# By default, readr guesses the columns types, looking at `guess_max` rows.
# You can override with a compact specification:
read_csv(I("x,y\n1,2\n3,4"), col_types = "dc")

# Or with a list of column types:
read_csv(I("x,y\n1,2\n3,4"), col_types = list(col_double(), col_character()))

# If there are parsing problems, you get a warning, and can extract
# more details with problems()
y <- read_csv(I("x\n1\n2\nb"), col_types = list(col_double()))
y
problems(y)

# Column names --------------------------------------------------------------
# By default, readr duplicate name repair is noisy
read_csv(I("x,x\n1,2\n3,4"))

# Same default repair strategy, but quiet
read_csv(I("x,x\n1,2\n3,4"), name_repair = "unique_quiet")

# There's also a global option that controls verbosity of name repair
withr::with_options(
  list(rlib_name_repair_verbosity = "quiet"),
  read_csv(I("x,x\n1,2\n3,4"))
)

# Or use "minimal" to turn off name repair
read_csv(I("x,x\n1,2\n3,4"), name_repair = "minimal")

# File types ----------------------------------------------------------------
read_csv(I("a,b\n1.0,2.0"))
read_csv2(I("a;b\n1,0;2,0"))
read_tsv(I("a\tb\n1.0\t2.0"))
read_delim(I("a|b\n1.0|2.0"), delim = "|")
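name_repair also accepts a plain function or a purrr-style lambda applied to the raw names; a minimal sketch (the repaired names here are arbitrary illustrations):

```r
library(readr)

# A function applied to the raw column names
upper <- read_csv(I("a,b\n1,2"), name_repair = toupper, show_col_types = FALSE)
names(upper)  # "A" "B"

# A purrr-style lambda, converted via rlang::as_function()
prefixed <- read_csv(I("a,b\n1,2"),
  name_repair = ~ paste0("col_", .x),
  show_col_types = FALSE
)
names(prefixed)  # "col_a" "col_b"
```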

Read a delimited file by chunks

Description

Read a delimited file by chunks
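A minimal sketch of the chunked workflow: a callback wrapped in DataFrameCallback filters each chunk, and the per-chunk results are row-bound into one tibble (mtcars.csv ships with readr):

```r
library(readr)

# Process 5 rows at a time, keeping only cars with mpg > 25
f <- function(chunk, pos) subset(chunk, mpg > 25)
out <- read_csv_chunked(
  readr_example("mtcars.csv"),
  DataFrameCallback$new(f),
  chunk_size = 5,
  show_col_types = FALSE
)
out
```

Other callback wrappers (e.g. SideEffectChunkCallback for pure side effects) follow the same pattern: the function receives the chunk and its starting position.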

Usage

read_delim_chunked(
  file,
  callback,
  delim = NULL,
  chunk_size = 10000,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  guess_max = chunk_size,
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE
)

read_csv_chunked(
  file,
  callback,
  chunk_size = 10000,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  guess_max = chunk_size,
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE
)

read_csv2_chunked(
  file,
  callback,
  chunk_size = 10000,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  guess_max = chunk_size,
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE
)

read_tsv_chunked(
  file,
  callback,
  chunk_size = 10000,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  guess_max = chunk_size,
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with ⁠http://⁠, ⁠https://⁠, ⁠ftp://⁠, or ⁠ftps://⁠ will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

callback

A callback function to call on each chunk

delim

Single character used to separate fields within a record.

chunk_size

The number of rows to include in each chunk

quote

Single character used to quote strings.

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like ⁠\\n⁠.

escape_double

Does the file escape quotes by doubling them? i.e. If this option is TRUE, the value ⁠""""⁠ represents a single quote, ⁠\"⁠.

col_names

Either TRUE, FALSE or a character vector of column names.

If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically: X1, X2, X3 etc.

If col_names is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame.

Missing (NA) column names will generate a warning, and be filled in with dummy names ...1, ...2 etc. Duplicate column names will generate a warning and be made unique, see name_repair to control how this is done.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
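As a sketch of the compact representation (assuming readr is available), the string "cdi" below forces character, double, and integer columns:

```r
library(readr)

# One letter per column: c = character, d = double, i = integer
df <- read_csv(I("x,y,z\n1,2.5,3\n"), col_types = "cdi")
sapply(df, class)
```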

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
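For instance, a European-style locale changes how numbers are parsed (a sketch assuming readr is attached):

```r
library(readr)

# Comma as decimal mark, period as grouping ("big") mark
eu <- locale(decimal_mark = ",", grouping_mark = ".")
parse_double("1,50", locale = eu)
parse_number("1.234,56", locale = eu)
```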

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

[Deprecated] Should missing values inside quotes be treated as missing values (the default) or strings? This parameter is soft deprecated as of readr 2.0.0.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.

guess_max

Maximum number of lines to use for guessing column types. Will never use more than the number of lines read. See vignette("column-types", package = "readr") for more details.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

show_col_types

If FALSE, do not show the guessed column types. If TRUE, always show the column types, even if they are supplied. If NULL (the default), only show the column types if they are not explicitly supplied by the col_types argument.

skip_empty_rows

Should blank rows be ignored altogether? I.e., if this option is TRUE, then blank rows will not be represented at all. If it is FALSE, then they will be represented by NA values in all the columns.

Details

The number of lines in file can exceed the maximum integer value in R (~2 billion).

See Also

Other chunked: callback, melt_delim_chunked(), read_lines_chunked()

Examples

# Cars with 3 gears
f <- function(x, pos) subset(x, gear == 3)
read_csv_chunked(readr_example("mtcars.csv"), DataFrameCallback$new(f), chunk_size = 5)

Read/write a complete file

Description

read_file() reads a complete file into a single object: either a character vector of length one, or a raw vector. write_file() takes a single string, or a raw vector, and writes it exactly as is. Raw vectors are useful when dealing with binary data, or if you have text data with unknown encoding.

Usage

read_file(file, locale = default_locale())

read_file_raw(file)

write_file(x, file, append = FALSE, path = deprecated())

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

x

A single string, or a raw vector to write to disk.

append

If FALSE, will overwrite existing file. If TRUE, will append to existing file. In both cases, if the file does not exist, a new file is created.

path

[Deprecated] Use the file argument instead.

Value

read_file(): A length 1 character vector. read_file_raw(): A raw vector.

Examples

read_file(file.path(R.home("doc"), "AUTHORS"))
read_file_raw(file.path(R.home("doc"), "AUTHORS"))

tmp <- tempfile()

x <- format_csv(mtcars[1:6, ])
write_file(x, tmp)
identical(x, read_file(tmp))

read_lines(I(x))

Read a fixed width file into a tibble

Description

A fixed width file can be a very compact representation of numeric data. It's also very fast to parse, because every field is in the same place in every line. Unfortunately, it's painful to parse because you need to describe the length of every field. Readr aims to make it as easy as possible by providing a number of different ways to describe the field structure.

Usage

read_fwf(
  file,
  col_positions = fwf_empty(file, skip, n = guess_max),
  col_types = NULL,
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  guess_max = min(n_max, 1000),
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  lazy = should_read_lazy(),
  skip_empty_rows = TRUE
)

fwf_empty(
  file,
  skip = 0,
  skip_empty_rows = FALSE,
  col_names = NULL,
  comment = "",
  n = 100L
)

fwf_widths(widths, col_names = NULL)

fwf_positions(start, end = NULL, col_names = NULL)

fwf_cols(...)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

col_positions

Column positions, as created by fwf_empty(), fwf_widths() or fwf_positions(). To read in only selected fields, use fwf_positions(). If the width of the last column is variable (a ragged fwf file), supply the last end position as NA.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where eachcharacter represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).

col_select

Columns to include in the results. You can use the same mini-language as dplyr::select() to refer to the columns by name. Use c() to use more than one selection expression. Although this usage is less common, col_select also accepts a numeric column index. See ?tidyselect::language for full details on the selection language.

id

The name of a column in which to store the file path. This is useful when reading multiple input files and there is data in the file paths, such as the data collection date. If NULL (the default) no extra column is created.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data.

n_max

Maximum number of lines to read.

guess_max

Maximum number of lines to use for guessing column types. Will never use more than the number of lines read. See vignette("column-types", package = "readr") for more details.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

name_repair

Handling of column names. The default behaviour is to ensure column names are "unique". Various repair strategies are supported:

  • "minimal": No name repair or checks, beyond basic existence of names.

  • "unique" (default value): Make sure names are unique and not empty.

  • "check_unique": No name repair, but check that they are unique.

  • "unique_quiet": Repair with the unique strategy, quietly.

  • "universal": Make the names unique and syntactic.

  • "universal_quiet": Repair with the universal strategy, quietly.

  • A function: Apply custom name repair (e.g., name_repair = make.names for names in the style of base R).

  • A purrr-style anonymous function; see rlang::as_function().

This argument is passed on as repair to vctrs::vec_as_names(). See there for more details on these terms and the strategies used to enforce them.
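A short sketch (assuming readr is available) of two repair strategies, one built-in and one as a custom function:

```r
library(readr)

# Duplicate and empty names repaired quietly with the "unique" strategy
df1 <- read_csv(I("x,x,\n1,2,3\n"),
                name_repair = "unique_quiet", show_col_types = FALSE)
names(df1)

# A custom function, e.g. lower-casing all names
df2 <- read_csv(I("A,B\n1,2\n"),
                name_repair = tolower, show_col_types = FALSE)
names(df2)
```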

num_threads

The number of processing threads to use for initial parsing and lazy reading of data. If your data contains newlines within fields, the parser should automatically detect this and fall back to using one thread only. However, if you know your file has newlines within quoted fields, it is safest to set num_threads = 1 explicitly.

show_col_types

If FALSE, do not show the guessed column types. If TRUE, always show the column types, even if they are supplied. If NULL (the default), only show the column types if they are not explicitly supplied by the col_types argument.

lazy

Read values lazily? By default, this is FALSE, because there are special considerations when reading a file lazily that have tripped up some users. Specifically, things get tricky when reading and then writing back into the same file. But, in general, lazy reading (lazy = TRUE) has many benefits, especially for interactive use and when your downstream work only involves a subset of the rows or columns.

Learn more in should_read_lazy() and in the documentation for the altrep argument of vroom::vroom().

skip_empty_rows

Should blank rows be ignored altogether? I.e., if this option is TRUE, then blank rows will not be represented at all. If it is FALSE, then they will be represented by NA values in all the columns.

col_names

Either NULL, or a character vector of column names.

n

Number of lines the tokenizer will read to determine file structure. By default it is set to 100.

widths

Width of each field. Use NA as the width of the last field when reading a ragged fwf file.

start,end

Starting and ending (inclusive) positions of each field. Use NA as the last end position when reading a ragged fwf file.
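For a ragged file whose last field varies in width, the final end position is left as NA; a small sketch assuming readr, using made-up literal data:

```r
library(readr)

# The last field is ragged, so its end position is NA
ragged <- I("John  Smith  WA\nJane  Doe    New Mexico\n")
df <- read_fwf(ragged,
               fwf_positions(c(1, 7, 14), c(5, 12, NA),
                             c("first", "last", "state")),
               show_col_types = FALSE)
df
```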

...

If the first element is a data frame, then it must have all numeric columns and either one or two rows. The column names are the variable names. The column values are the variable widths if a length one vector, and if length two, the variable start and end positions. The elements of ... are used to construct a data frame with one or two rows as above.

Second edition changes

Comments are no longer looked for anywhere in the file. They are now only ignored at the start of a line.

See Also

read_table() to read fixed width files where each column is separated by whitespace.

Examples

fwf_sample <- readr_example("fwf-sample.txt")
writeLines(read_lines(fwf_sample))

# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
read_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
# 2. A vector of field widths
read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
read_fwf(fwf_sample, fwf_positions(c(1, 30), c(20, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
read_fwf(fwf_sample, fwf_cols(name = c(1, 20), ssn = c(30, 42)))
# 5. Named arguments with column widths
read_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))

Read/write lines to/from a file

Description

read_lines() reads up to n_max lines from a file. New lines are not included in the output. read_lines_raw() produces a list of raw vectors, and is useful for handling data with unknown encoding. write_lines() takes a character vector or list of raw vectors, appending a new line after each entry.

Usage

read_lines(
  file,
  skip = 0,
  skip_empty_rows = FALSE,
  n_max = Inf,
  locale = default_locale(),
  na = character(),
  lazy = should_read_lazy(),
  num_threads = readr_threads(),
  progress = show_progress()
)

read_lines_raw(
  file,
  skip = 0,
  n_max = -1L,
  num_threads = readr_threads(),
  progress = show_progress()
)

write_lines(
  x,
  file,
  sep = "\n",
  na = "NA",
  append = FALSE,
  num_threads = readr_threads(),
  path = deprecated()
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

skip

Number of lines to skip before reading data.

skip_empty_rows

Should blank rows be ignored altogether? I.e., if this option is TRUE, then blank rows will not be represented at all. If it is FALSE, then they will be represented by NA values in all the columns.

n_max

Number of lines to read. If n_max is -1, all lines in file will be read.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

lazy

Read values lazily? By default, this is FALSE, because there are special considerations when reading a file lazily that have tripped up some users. Specifically, things get tricky when reading and then writing back into the same file. But, in general, lazy reading (lazy = TRUE) has many benefits, especially for interactive use and when your downstream work only involves a subset of the rows or columns.

Learn more in should_read_lazy() and in the documentation for the altrep argument of vroom::vroom().

num_threads

The number of processing threads to use for initial parsing and lazy reading of data. If your data contains newlines within fields, the parser should automatically detect this and fall back to using one thread only. However, if you know your file has newlines within quoted fields, it is safest to set num_threads = 1 explicitly.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

x

A character vector or list of raw vectors to write to disk.

sep

The line separator. Defaults to \n, commonly used on POSIX systems like macOS and Linux. For native Windows (CRLF) separators use \r\n.

append

If FALSE, will overwrite existing file. If TRUE, will append to existing file. In both cases, if the file does not exist, a new file is created.

path

[Deprecated] Use the file argument instead.

Value

read_lines(): A character vector with one element for each line. read_lines_raw(): A list containing a raw vector for each line.

write_lines() returns x, invisibly.

Examples

read_lines(file.path(R.home("doc"), "AUTHORS"), n_max = 10)
read_lines_raw(file.path(R.home("doc"), "AUTHORS"), n_max = 10)

tmp <- tempfile()

write_lines(rownames(mtcars), tmp)
read_lines(tmp, lazy = FALSE)
read_file(tmp) # note trailing \n

write_lines(airquality$Ozone, tmp, na = "-1")
read_lines(tmp)

Read lines from a file or string by chunk.

Description

Read lines from a file or string by chunk.

Usage

read_lines_chunked(
  file,
  callback,
  chunk_size = 10000,
  skip = 0,
  locale = default_locale(),
  na = character(),
  progress = show_progress()
)

read_lines_raw_chunked(
  file,
  callback,
  chunk_size = 10000,
  skip = 0,
  progress = show_progress()
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

callback

A callback function to call on each chunk

chunk_size

The number of rows to include in each chunk

skip

Number of lines to skip before reading data.

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

See Also

Other chunked: callback, melt_delim_chunked(), read_delim_chunked()
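As a sketch of chunked line reading (assuming readr is available), the callback below records the size of each chunk as a side effect:

```r
library(readr)

# Record the number of lines seen in each chunk
sizes <- integer()
cb <- SideEffectChunkCallback$new(function(x, pos) {
  sizes <<- c(sizes, length(x))
})
read_lines_chunked(readr_example("mtcars.csv"), cb, chunk_size = 10)
sizes
```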


Read common/combined log file into a tibble

Description

This is a fairly standard format for log files - it uses both quotes and square brackets for quoting, and there may be literal quotes embedded in a quoted string. The dash, "-", is used for missing values.

Usage

read_log(
  file,
  col_names = FALSE,
  col_types = NULL,
  trim_ws = TRUE,
  skip = 0,
  n_max = Inf,
  show_col_types = should_show_types(),
  progress = show_progress()
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

col_names

Either TRUE, FALSE or a character vector of column names.

If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically: X1, X2, X3 etc.

If col_names is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame.

Missing (NA) column names will generate a warning, and be filled in with dummy names ...1, ...2 etc. Duplicate column names will generate a warning and be made unique; see name_repair to control how this is done.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where eachcharacter represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.

n_max

Maximum number of lines to read.

show_col_types

If FALSE, do not show the guessed column types. If TRUE, always show the column types, even if they are supplied. If NULL (the default), only show the column types if they are not explicitly supplied by the col_types argument.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

Examples

read_log(readr_example("example.log"))

Read/write RDS files.

Description

Consistent wrapper around saveRDS() and readRDS(). write_rds() does not compress by default as space is generally cheaper than time.

Usage

read_rds(file, refhook = NULL)

write_rds(
  x,
  file,
  compress = c("none", "gz", "bz2", "xz"),
  version = 2,
  refhook = NULL,
  text = FALSE,
  path = deprecated(),
  ...
)

Arguments

file

The file path to read from/write to.

refhook

A function to handle reference objects.

x

R object to serialise and write to disk.

compress

Compression method to use: "none", "gz", "bz2", or "xz".

version

Serialization format version to be used. The default value is 2, as it's compatible with R versions prior to 3.5.0. See base::saveRDS() for more details.

text

If TRUE, a text representation is used; otherwise a binary representation is used.

path

[Deprecated] Use the file argument instead.

...

Additional arguments to the connection function. For example, control the space-time trade-off of different compression methods with compression. See connections() for more details.

Value

write_rds() returnsx, invisibly.

Examples

temp <- tempfile()
write_rds(mtcars, temp)
read_rds(temp)

## Not run:
write_rds(mtcars, "compressed_mtc.rds", "xz", compression = 9L)
## End(Not run)

Read whitespace-separated columns into a tibble

Description

read_table() is designed to read the type of textual data where each column is separated by one (or more) columns of space.

Like read.table(), read_table() allows any number of whitespace characters between columns, and the lines can be of different lengths.

spec_table() returns the column specifications rather than a data frame.

Usage

read_table(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = Inf,
  guess_max = min(n_max, 1000),
  progress = show_progress(),
  comment = "",
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

col_names

Either TRUE, FALSE or a character vector of column names.

If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically: X1, X2, X3 etc.

If col_names is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame.

Missing (NA) column names will generate a warning, and be filled in with dummy names ...1, ...2 etc. Duplicate column names will generate a warning and be made unique; see name_repair to control how this is done.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where eachcharacter represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

skip

Number of lines to skip before reading data.

n_max

Maximum number of lines to read.

guess_max

Maximum number of lines to use for guessing column types. Will never use more than the number of lines read. See vignette("column-types", package = "readr") for more details.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

show_col_types

If FALSE, do not show the guessed column types. If TRUE, always show the column types, even if they are supplied. If NULL (the default), only show the column types if they are not explicitly supplied by the col_types argument.

skip_empty_rows

Should blank rows be ignored altogether? I.e., if this option is TRUE, then blank rows will not be represented at all. If it is FALSE, then they will be represented by NA values in all the columns.

See Also

read_fwf() to read fixed width files where each column is not separated by whitespace. read_fwf() is also useful for reading tabular data with non-standard formatting.

Examples

ws <- readr_example("whitespace-sample.txt")
writeLines(read_lines(ws))
read_table(ws)

Read whitespace-separated columns into a tibble

Description

[Deprecated]

This function is deprecated because we renamed it to read_table() and removed the old read_table() function, which was too strict for most cases and was analogous to just using read_fwf().

Usage

read_table2(
  file,
  col_names = TRUE,
  col_types = NULL,
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = Inf,
  guess_max = min(n_max, 1000),
  progress = show_progress(),
  comment = "",
  skip_empty_rows = TRUE
)

Get path to readr example

Description

readr comes bundled with a number of sample files in its inst/extdata directory. This function makes them easy to access.

Usage

readr_example(file = NULL)

Arguments

file

Name of file. If NULL, the example files will be listed.

Examples

readr_example()readr_example("challenge.csv")

Determine how many threads readr should use when processing

Description

The number of threads returned can be set by the global option readr.num_threads or the environment variable VROOM_THREADS.

Usage

readr_threads()

Determine whether to read a file lazily

Description

This function consults the option readr.read_lazy to figure out whether to do lazy reading or not. If the option is unset, the default is FALSE, meaning readr will read files eagerly, not lazily. If you want to use this option to express a preference for lazy reading, do this:

options(readr.read_lazy = TRUE)

Typically, one would use the option to control lazy reading at the session, file, or user level. The lazy argument of functions like read_csv() can be used to control laziness in an individual call.
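A quick sketch (assuming readr, plus withr to scope the option temporarily):

```r
library(readr)
library(withr)

# Unset option: readr defaults to eager reading
with_options(list(readr.read_lazy = NULL), should_read_lazy())

# Opting in to lazy reading
with_options(list(readr.read_lazy = TRUE), should_read_lazy())
```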

Usage

should_read_lazy()

See Also

The blog post "Eager vs lazy reading in readr 2.1.0" explains the benefits (and downsides) of lazy reading.


Determine whether column types should be shown

Description

Wrapper around getOption("readr.show_col_types") that implements some fallback logic if the option is unset.

Usage

should_show_types()

Determine whether progress bars should be shown

Description

By default, readr shows progress bars. However, progress reporting is suppressed in certain circumstances, such as non-interactive sessions, when knitting a document, or when the option readr.show_progress is set to FALSE.

Usage

show_progress()

Generate a column specification

Description

When printed, only the first 20 columns are shown by default. To override this, set the option readr.num_columns (a value of 0 turns off printing).

Usage

spec_delim(
  file,
  delim = NULL,
  quote = "\"",
  escape_backslash = FALSE,
  escape_double = TRUE,
  col_names = TRUE,
  col_types = list(),
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  comment = "",
  trim_ws = FALSE,
  skip = 0,
  n_max = 0,
  guess_max = 1000,
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

spec_csv(
  file,
  col_names = TRUE,
  col_types = list(),
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = 0,
  guess_max = 1000,
  name_repair = "unique",
  num_threads = readr_threads(),
  progress = show_progress(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

spec_csv2(
  file,
  col_names = TRUE,
  col_types = list(),
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = 0,
  guess_max = 1000,
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

spec_tsv(
  file,
  col_names = TRUE,
  col_types = list(),
  col_select = NULL,
  id = NULL,
  locale = default_locale(),
  na = c("", "NA"),
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip = 0,
  n_max = 0,
  guess_max = 1000,
  progress = show_progress(),
  name_repair = "unique",
  num_threads = readr_threads(),
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE,
  lazy = should_read_lazy()
)

spec_table(
  file,
  col_names = TRUE,
  col_types = list(),
  locale = default_locale(),
  na = "NA",
  skip = 0,
  n_max = 0,
  guess_max = 1000,
  progress = show_progress(),
  comment = "",
  show_col_types = should_show_types(),
  skip_empty_rows = TRUE
)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

delim

Single character used to separate fields within a record.

quote

Single character used to quote strings.

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double, as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \n.

escape_double

Does the file escape quotes by doubling them? i.e. if this option is TRUE, the value """" represents a single quote, ".

col_names

Either TRUE, FALSE, or a character vector of column names.

If TRUE, the first row of the input will be used as the column names, and will not be included in the data frame. If FALSE, column names will be generated automatically: X1, X2, X3 etc.

If col_names is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame.

Missing (NA) column names will generate a warning, and be filled in with dummy names ...1, ...2 etc. Duplicate column names will generate a warning and be made unique; see name_repair to control how this is done.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be inferred from guess_max rows of the input, interspersed throughout the file. This is convenient (and fast), but not robust. If the guessed types are wrong, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
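As a small illustration of the compact string form (a sketch; it assumes the readr package is installed), "cd" below reads the first column as character and the second as double:

```r
library(readr)

# "cd" = first column character, second column double.
# Supplying col_types also suppresses the guessed-spec message.
df <- read_csv(I("x,y\n1,2\n3,4"), col_types = "cd")
str(df)
```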

col_select

Columns to include in the results. You can use the same mini-language as dplyr::select() to refer to the columns by name. Use c() to use more than one selection expression. Although this usage is less common, col_select also accepts a numeric column index. See ?tidyselect::language for full details on the selection language.
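For instance (a minimal sketch, assuming readr is installed), tidyselect expressions can pick out a subset of columns by name:

```r
library(readr)

# Keep only columns a and c; b is never materialized
df <- read_csv(I("a,b,c\n1,2,3"), col_select = c(a, c), show_col_types = FALSE)
names(df)
```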

id

The name of a column in which to store the file path. This is useful when reading multiple input files and there is data in the file paths, such as the data collection date. If NULL (the default) no extra column is created.
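A short sketch of reading several files at once while recording their origin (assumes readr is installed; the file names here are temporary files created for illustration):

```r
library(readr)

# Two input files whose paths carry information
paths <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write_csv(data.frame(x = 1), paths[1])
write_csv(data.frame(x = 2), paths[2])

# Each row records which file it came from in a "source" column
df <- read_csv(paths, id = "source", show_col_types = FALSE)
</imports>
df
```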

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
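A one-line sketch of a non-default locale (assumes readr is installed): many European locales use a comma as the decimal mark, which readr's parsers handle via locale():

```r
library(readr)

# Parse "1,5" as the number 1.5 using a comma decimal mark
parse_double("1,5", locale = locale(decimal_mark = ","))
```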

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

[Deprecated] Should missing values inside quotes be treated as missing values (the default) or strings? This parameter is soft deprecated as of readr 2.0.0.

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

skip

Number of lines to skip before reading data. If comment is supplied, any commented lines are ignored after skipping.

n_max

Maximum number of lines to read.

guess_max

Maximum number of lines to use for guessing column types. Will never use more than the number of lines read. See vignette("column-types", package = "readr") for more details.

name_repair

Handling of column names. The default behaviour is to ensure column names are "unique". Various repair strategies are supported:

  • "minimal": No name repair or checks, beyond basic existence of names.

  • "unique" (default value): Make sure names are unique and not empty.

  • "check_unique": No name repair, but check they areunique.

  • "unique_quiet": Repair with theunique strategy, quietly.

  • "universal": Make the namesunique and syntactic.

  • "universal_quiet": Repair with theuniversal strategy, quietly.

  • A function: Apply custom name repair (e.g.,name_repair = make.namesfor names in the style of base R).

  • A purrr-style anonymous function, seerlang::as_function().

This argument is passed on asrepair tovctrs::vec_as_names().See there for more details on these terms and the strategies usedto enforce them.
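To make the strategies concrete (a sketch assuming readr and a recent vctrs are installed), duplicated names repaired with the quiet "unique" strategy gain ...1/...2 suffixes:

```r
library(readr)

# Two columns both named "a" are repaired to "a...1" and "a...2"
df <- read_csv(I("a,a\n1,2"), name_repair = "unique_quiet",
               show_col_types = FALSE)
names(df)
```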

num_threads

The number of processing threads to use for initial parsing and lazy reading of data. If your data contains newlines within fields the parser should automatically detect this and fall back to using one thread only. However, if you know your file has newlines within quoted fields it is safest to set num_threads = 1 explicitly.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

show_col_types

If FALSE, do not show the guessed column types. If TRUE, always show the column types, even if they are supplied. If NULL (the default) only show the column types if they are not explicitly supplied by the col_types argument.

skip_empty_rows

Should blank rows be ignored altogether? i.e. if this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

lazy

Read values lazily? By default, this is FALSE, because there are special considerations when reading a file lazily that have tripped up some users. Specifically, things get tricky when reading and then writing back into the same file. But, in general, lazy reading (lazy = TRUE) has many benefits, especially for interactive use and when your downstream work only involves a subset of the rows or columns.

Learn more in should_read_lazy() and in the documentation for the altrep argument of vroom::vroom().
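A minimal sketch of lazy reading (assumes readr is installed; the temporary file is created only for illustration). Parsing is deferred until the column is actually used:

```r
library(readr)

f <- tempfile(fileext = ".csv")
write_csv(data.frame(x = 1:5), f)

# With lazy = TRUE, values are materialized on first access
df <- read_csv(f, lazy = TRUE, show_col_types = FALSE)
sum(df$x)  # forces materialization of column x
```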

Value

The col_spec generated for the file.

Examples

# Input sources -------------------------------------------------------------
# Retrieve specs from a path
spec_csv(system.file("extdata/mtcars.csv", package = "readr"))
spec_csv(system.file("extdata/mtcars.csv.zip", package = "readr"))

# Or directly from a string (must contain a newline)
spec_csv(I("x,y\n1,2\n3,4"))

# Column types --------------------------------------------------------------
# By default, readr guesses the column types, looking at 1000 rows
# throughout the file.
# You can specify the number of rows used with guess_max.
spec_csv(system.file("extdata/mtcars.csv", package = "readr"), guess_max = 20)

Tokenize a file/string.

Description

Turns input into a character vector. Usually the tokenization is done purely in C++, and never exposed to R (because that requires a copy). This function is useful for testing, or when a file doesn't parse correctly and you want to see the underlying tokens.

Usage

tokenize(file, tokenizer = tokenizer_csv(), skip = 0, n_max = -1L)

Arguments

file

Either a path to a file, a connection, or literal data (either a single string or a raw vector).

Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with http://, https://, ftp://, or ftps:// will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.

Literal data is most useful for examples and tests. To be recognised as literal data, the input must be either wrapped with I(), be a string containing at least one new line, or be a vector containing at least one string with a new line.

Using a value of clipboard() will read from the system clipboard.

tokenizer

A tokenizer specification.

skip

Number of lines to skip before reading data.

n_max

Optionally, maximum number of rows to tokenize.

Examples

tokenize("1,2\n3,4,5\n\n6")# Only tokenize first two linestokenize("1,2\n3,4,5\n\n6", n = 2)

Re-convert character columns in existing data frame

Description

This is useful if you need to do some manual munging - you can read the columns in as character, clean it up with (e.g.) regular expressions and then let readr take another stab at parsing it. The name is a homage to the base utils::type.convert().

Usage

type_convert(
  df,
  col_types = NULL,
  na = c("", "NA"),
  trim_ws = TRUE,
  locale = default_locale(),
  guess_integer = FALSE
)

Arguments

df

A data frame.

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, column types will be imputed using all rows.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?

locale

The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.

guess_integer

If TRUE, guess integer types for whole numbers; if FALSE, guess numeric type for all numbers.

Note

type_convert() removes a 'spec' attribute, because it likely modifies the column data types (see spec() for more information about column specifications).

Examples

df <- data.frame(
  x = as.character(runif(10)),
  y = as.character(sample(10)),
  stringsAsFactors = FALSE
)
str(df)
str(type_convert(df))

df <- data.frame(x = c("NA", "10"), stringsAsFactors = FALSE)
str(type_convert(df))

# Type convert can be used to infer types from an entire dataset
# first read the data as character
data <- read_csv(readr_example("mtcars.csv"),
  col_types = list(.default = col_character())
)
str(data)

# Then convert it with type_convert
type_convert(data)

Temporarily change the active readr edition

Description

with_edition() allows you to change the active edition of readr for a given block of code. local_edition() allows you to change the active edition of readr until the end of the current function or file.

Usage

with_edition(edition, code)

local_edition(edition, env = parent.frame())

Arguments

edition

Should be a single integer, such as 1 or 2.

code

Code to run with the changed edition.

env

Environment that controls scope of changes. For expert use only.

Examples

with_edition(1, edition_get())
with_edition(2, edition_get())

# readr 1e and 2e behave differently when input rows have different
# numbers of fields
with_edition(1, read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z")))
with_edition(2, read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z")))

# local_edition() applies in a specific scope, for example, inside a function
read_csv_1e <- function(...) {
  local_edition(1)
  read_csv(...)
}

read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z"))      # 2e behaviour
read_csv_1e("1,2\n3,4,5", col_names = c("X", "Y", "Z"))   # 1e behaviour
read_csv("1,2\n3,4,5", col_names = c("X", "Y", "Z"))      # 2e behaviour

Write a data frame to a delimited file

Description

The write_*() family of functions are an improvement on analogous functions such as write.csv() because they are approximately twice as fast. Unlike write.csv(), these functions do not include row names as a column in the written file. A generic function, output_column(), is applied to each variable to coerce columns to suitable output.

Usage

write_delim(x, file, delim = " ", na = "NA", append = FALSE,
  col_names = !append, quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"), eol = "\n",
  num_threads = readr_threads(), progress = show_progress(),
  path = deprecated(), quote_escape = deprecated())

write_csv(x, file, na = "NA", append = FALSE, col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"), eol = "\n",
  num_threads = readr_threads(), progress = show_progress(),
  path = deprecated(), quote_escape = deprecated())

write_csv2(x, file, na = "NA", append = FALSE, col_names = !append,
  quote = c("needed", "all", "none"),
  escape = c("double", "backslash", "none"), eol = "\n",
  num_threads = readr_threads(), progress = show_progress(),
  path = deprecated(), quote_escape = deprecated())

write_excel_csv(x, file, na = "NA", append = FALSE, col_names = !append,
  delim = ",", quote = "all", escape = c("double", "backslash", "none"),
  eol = "\n", num_threads = readr_threads(), progress = show_progress(),
  path = deprecated(), quote_escape = deprecated())

write_excel_csv2(x, file, na = "NA", append = FALSE, col_names = !append,
  delim = ";", quote = "all", escape = c("double", "backslash", "none"),
  eol = "\n", num_threads = readr_threads(), progress = show_progress(),
  path = deprecated(), quote_escape = deprecated())

write_tsv(x, file, na = "NA", append = FALSE, col_names = !append,
  quote = "none", escape = c("double", "backslash", "none"), eol = "\n",
  num_threads = readr_threads(), progress = show_progress(),
  path = deprecated(), quote_escape = deprecated())

Arguments

x

A data frame or tibble to write to disk.

file

File or connection to write to.

delim

Delimiter used to separate values. Defaults to " " for write_delim(), "," for write_excel_csv() and ";" for write_excel_csv2(). Must be a single character.

na

String used for missing values. Defaults to NA. Missing values will never be quoted; strings with the same value as na will always be quoted.

append

If FALSE, will overwrite existing file. If TRUE, will append to existing file. In both cases, if the file does not exist a new file is created.
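A brief sketch of appending (assumes readr is installed; the temporary file is for illustration). Note that col_names defaults to !append, so the header is written only once:

```r
library(readr)

f <- tempfile(fileext = ".csv")
write_csv(data.frame(x = 1), f)                 # header + one row
write_csv(data.frame(x = 2), f, append = TRUE)  # no header, adds a row
df <- read_csv(f, show_col_types = FALSE)
df
```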

col_names

If FALSE, column names will not be included at the top of the file. If TRUE, column names will be included. If not specified, col_names will take the opposite value given to append.

quote

How to handle fields which contain characters that need to be quoted.

  • needed - Values are only quoted if needed: if they contain a delimiter, quote, or newline.

  • all - Quote all fields.

  • none - Never quote fields.

escape

The type of escape to use when quotes are in the data.

  • double - quotes are escaped by doubling them.

  • backslash - quotes are escaped by a preceding backslash.

  • none - quotes are not escaped.
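To see the two escape styles side by side (a sketch assuming readr is installed), format_csv() shows the string that write_csv() would produce:

```r
library(readr)

# A field containing quotes, escaped by doubling (default) vs. backslash
x <- data.frame(msg = 'say "hi"')
out_double    <- format_csv(x)                        # field becomes "say ""hi"""
out_backslash <- format_csv(x, escape = "backslash")  # field becomes "say \"hi\""
cat(out_double)
cat(out_backslash)
```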

eol

The end of line character to use. Most commonly either "\n" for Unix style newlines, or "\r\n" for Windows style newlines.

num_threads

Number of threads to use when reading and materializing vectors. If your data contains newlines within fields the parser will automatically be forced to use a single thread only.

progress

Display a progress bar? By default it will only display in an interactive session and not while knitting a document. The display is updated every 50,000 values and will only display if estimated reading time is 5 seconds or more. The automatic progress bar can be disabled by setting option readr.show_progress to FALSE.

path

[Deprecated] Use the file argument instead.

quote_escape

[Deprecated] Use the escape argument instead.

Value

write_*() returns the input x invisibly.

Output

Factors are coerced to character. Doubles are formatted to a decimal string using the grisu3 algorithm. POSIXct values are formatted as ISO8601 with a UTC timezone. Note: POSIXct objects in local or non-UTC timezones will be converted to UTC time before writing.

All columns are encoded as UTF-8. write_excel_csv() and write_excel_csv2() also include a UTF-8 Byte order mark which indicates to Excel the csv is UTF-8 encoded.

write_excel_csv2() and write_csv2() were created to allow users with different locale settings to save .csv files using their default settings (e.g. ";" as the column separator and "," as the decimal separator). This is common in some European countries.

Values are only quoted if they contain a comma, quote or newline.

The write_*() functions will automatically compress outputs if an appropriate extension is given. Three extensions are currently supported: .gz for gzip compression, .bz2 for bzip2 compression and .xz for lzma compression. See the examples for more information.

References

Florian Loitsch, Printing Floating-Point Numbers Quickly andAccurately with Integers, PLDI '10,http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

Examples

# If only a file name is specified, write_*() will write
# the file to the current working directory.
write_csv(mtcars, "mtcars.csv")
write_tsv(mtcars, "mtcars.tsv")

# If you add an extension to the file name, write_*() will
# automatically compress the output.
write_tsv(mtcars, "mtcars.tsv.gz")
write_tsv(mtcars, "mtcars.tsv.bz2")
write_tsv(mtcars, "mtcars.tsv.xz")
