
readr

Overview

The goal of readr is to provide a fast and friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse many types of data found in the wild, while providing an informative problem report when parsing leads to unexpected results. If you are new to readr, the best place to start is the data import chapter in R for Data Science.

Installation

# The easiest way to get readr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just readr:
install.packages("readr")

# Or you can install the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/readr")

Cheatsheet

Usage

readr is part of the core tidyverse, so you can load it with:

library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4          ✔ readr     2.1.6.9000
#> ✔ forcats   1.0.1          ✔ stringr   1.6.0
#> ✔ ggplot2   4.0.1          ✔ tibble    3.3.0
#> ✔ lubridate 1.9.4          ✔ tidyr     1.3.1
#> ✔ purrr     1.2.0
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Of course, you can also load readr as an individual package:

library(readr)

To read a rectangular dataset with readr, you combine two pieces: a function that parses the lines of the file into individual fields and a column specification.

readr supports the following file formats with these read_*() functions:

read_csv(): comma-separated values (CSV)
read_tsv(): tab-separated values (TSV)
read_csv2(): semicolon-separated values with a comma (,) as the decimal mark
read_delim(): delimited files (CSV and TSV are important special cases)
read_fwf(): fixed-width files
read_table(): whitespace-separated files
read_log(): web log files
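As a rough illustration of the list above, the sketch below reads the same two-row table through four of these functions. The inline data is made up for this example and is wrapped in I(), which readr 2.0.0 and later treat as literal data rather than a file path.

```r
library(readr)

# Made-up inline data; I() marks each string as literal data, not a path.
csv  <- read_csv(I("x,y\n1,a\n2,b\n"), show_col_types = FALSE)
tsv  <- read_tsv(I("x\ty\n1\ta\n2\tb\n"), show_col_types = FALSE)

# read_csv2() expects ";" as the delimiter and "," as the decimal mark.
csv2 <- read_csv2(I("x;y\n1,5;a\n2,5;b\n"), show_col_types = FALSE)

# read_delim() is the general case: the delimiter is given explicitly.
pipe <- read_delim(I("x|y\n1|a\n2|b\n"), delim = "|", show_col_types = FALSE)
```

read_csv() and read_tsv() behave like read_delim() with the delimiter fixed, which is why the list calls them special cases.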

A column specification describes how each column should be converted from a character vector to a specific data type (e.g. character, numeric, datetime, etc.). In the absence of a column specification, readr will guess column types from the data. vignette("column-types") gives more detail on how readr guesses the column types. Column type guessing is very handy, especially during data exploration, but it’s important to remember these are just guesses. As any data analysis project matures past the exploratory phase, the best strategy is to provide explicit column types.

The following example loads a sample file bundled with readr and guesses the column types:

(chickens <- read_csv(readr_example("chickens.csv")))
#> Rows: 5 Columns: 4
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (3): chicken, sex, motto
#> dbl (1): eggs_laid
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 5 × 4
#>   chicken                 sex     eggs_laid motto
#>   <chr>                   <chr>       <dbl> <chr>
#> 1 Foghorn Leghorn         rooster         0 That's a joke, ah say, that's a jok…
#> 2 Chicken Little          hen             3 The sky is falling!
#> 3 Ginger                  hen            12 Listen. We'll either die free chick…
#> 4 Camilla the Chicken     hen             7 Bawk, buck, ba-gawk.
#> 5 Ernie The Giant Chicken rooster         0 Put Captain Solo in the cargo hold.

Note that readr prints the column types (the guessed column types, in this case). This is useful because it allows you to check that the columns have been read in as you expect. If they haven’t, that means you need to provide the column specification. This sounds like a lot of trouble, but luckily readr affords a nice workflow for this. Use spec() to retrieve the (guessed) column specification from your initial effort.

spec(chickens)
#> cols(
#>   chicken = col_character(),
#>   sex = col_character(),
#>   eggs_laid = col_double(),
#>   motto = col_character()
#> )

Now you can copy, paste, and tweak this, to create a more explicit readr call that expresses the desired column types. Here we express that sex should be a factor with levels rooster and hen, in that order, and that eggs_laid should be integer.

chickens <- read_csv(
  readr_example("chickens.csv"),
  col_types = cols(
    chicken = col_character(),
    sex = col_factor(levels = c("rooster", "hen")),
    eggs_laid = col_integer(),
    motto = col_character()
  )
)
chickens
#> # A tibble: 5 × 4
#>   chicken                 sex     eggs_laid motto
#>   <chr>                   <fct>       <int> <chr>
#> 1 Foghorn Leghorn         rooster         0 That's a joke, ah say, that's a jok…
#> 2 Chicken Little          hen             3 The sky is falling!
#> 3 Ginger                  hen            12 Listen. We'll either die free chick…
#> 4 Camilla the Chicken     hen             7 Bawk, buck, ba-gawk.
#> 5 Ernie The Giant Chicken rooster         0 Put Captain Solo in the cargo hold.
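The overview mentions a problem report for when parsing leads to unexpected results. The sketch below shows one way that can look once explicit column types are in place. The inline data is invented for this example (it is not one of the bundled sample files), and the exact wording of the report may differ between readr versions.

```r
library(readr)

# Made-up inline data: the third value of `x` is not a valid integer.
raw <- I("x,y\n1,a\n2,b\nthree,c\n")

# Expect a parsing warning here, because "three" cannot become an integer.
df <- read_csv(
  raw,
  col_types = cols(x = col_integer(), y = col_character())
)

# The value that could not be parsed is stored as NA ...
df$x

# ... and problems() returns a tibble describing where parsing failed.
problems(df)
```

Checking problems() after a read is a cheap way to confirm that an explicit column specification really matches the file.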

vignette("readr") gives an expanded introduction to readr.

Editions

readr got a new parsing engine in version 2.0.0 (released July 2021). In this so-called second edition, readr calls vroom::vroom(), by default.

The parsing engine in readr versions prior to 2.0.0 is now called the first edition. If you’re using readr >= 2.0.0, you can still access first edition parsing via the functions with_edition(1, ...) and local_edition(1). And, obviously, if you’re using readr < 2.0.0, you will get first edition parsing, by definition, because that’s all there is.

We will continue to support the first edition for a number of releases, but the overall goal is to make the second edition uniformly better than the first. Therefore the plan is to eventually deprecate and then remove the first edition code. New code and actively-maintained code should use the second edition. The workarounds with_edition(1, ...) and local_edition(1) are offered as a pragmatic way to patch up legacy code or as a temporary solution for infelicities identified as the second edition matures.
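A minimal sketch of the two workarounds, assuming readr >= 2.0.0 and using the bundled chickens.csv sample file. The helper name read_legacy() is hypothetical and exists only for this illustration.

```r
library(readr)

path <- readr_example("chickens.csv")

# with_edition() switches the edition for a single expression only.
first <- with_edition(1, read_csv(path, show_col_types = FALSE))

# Outside that call, the default (second) edition is used again.
second <- read_csv(path, show_col_types = FALSE)

# local_edition() switches the edition until the calling function exits.
# read_legacy() is a hypothetical helper, not part of readr.
read_legacy <- function(path) {
  local_edition(1)
  read_csv(path, show_col_types = FALSE)
}
legacy <- read_legacy(path)
```

with_edition() suits a one-off call, while local_edition() is convenient at the top of a function or test that makes several reads.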

Alternatives

There are two main alternatives to readr: base R and data.table’s fread(). The most important differences are discussed below.

Base R

Compared to the corresponding base functions, readr functions:

Use a consistent naming scheme for the parameters (e.g. col_names and col_types, not header and colClasses).
Are generally much faster (up to 10x-100x) depending on the dataset.
Leave strings as is by default, and automatically parse common date/time formats.
Have a helpful progress bar if loading is going to take a while.
Work exactly the same way regardless of the current locale. To override the US-centric defaults, use locale().
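To make the naming and locale points concrete, the sketch below reads one small file with both base R and readr. The file contents and column names are invented for this example and written to a temporary file.

```r
library(readr)

# Made-up sample file: ";" as the delimiter and "," as the decimal mark.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name;price", "apple;1,5", "pear;2,25"), tmp)

# Base R uses header, colClasses and dec.
base_df <- read.csv(
  tmp,
  sep = ";",
  dec = ",",
  header = TRUE,
  colClasses = c("character", "numeric")
)

# readr uses col_names and col_types, with locale() for the decimal mark.
readr_df <- read_delim(
  tmp,
  delim = ";",
  col_names = TRUE,
  col_types = cols(name = col_character(), price = col_double()),
  locale = locale(decimal_mark = ",")
)
```

Both calls should yield the same values; the readr result is a tibble, and the decimal mark comes from the explicit locale() argument rather than from the session locale.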

data.table and `fread()`

data.table has a function similar to read_csv() called fread(). Compared to fread(), readr functions:

Are sometimes slower, particularly on numeric heavy data.
Can automatically guess some parameters, but basically encourage explicit specification of, e.g., the delimiter, skipped rows, and the header row.
Follow tidyverse-wide conventions, such as returning a tibble, a standard approach for column name repair, and a common mini-language for column selection.

Acknowledgements

Thanks to:

Joe Cheng for showing me the beauty of deterministic finite automata for parsing, and for teaching me why I should write a tokenizer.
JJ Allaire for helping me come up with a design that makes very few copies, and is easy to extend.
Dirk Eddelbuettel for coming up with the name!

About

Read flat files (csv, tsv, fwf) into R

readr.tidyverse.org

Topics: r, csv, parsing, fwf

License: MIT

Releases: 19 (latest: readr 2.1.6, Nov 14, 2025)

Contributors: 107
