Tidy Messy Data
License
Unknown, MIT licenses found
Licenses found
tidyverse/tidyr
The goal of tidyr is to help you create tidy data. Tidy data is data where:
- Each variable is a column; each column is a variable.
- Each observation is a row; each row is an observation.
- Each value is a cell; each cell is a single value.
Tidy data describes a standard way of storing data that is used wherever possible throughout the tidyverse. If you ensure that your data is tidy, you’ll spend less time fighting with the tools and more time working on your analysis. Learn more about tidy data in `vignette("tidy-data")`.
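As a minimal sketch of the three rules above, the example below reshapes a small made-up table in which the year, a variable, is spread across column headers. The data frame `messy` and the column names `year` and `cases` are illustrative assumptions, not part of tidyr.

```r
library(tidyr)

# Hypothetical messy table: one row per country, one column per year.
messy <- data.frame(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(11, 21),
  check.names = FALSE
)

# Move the year headers into a "year" column so that each row
# holds a single observation (country, year, cases).
tidy <- pivot_longer(
  messy,
  cols = c(`2019`, `2020`),
  names_to = "year",
  values_to = "cases"
)

print(tidy)
```

After the reshape, each variable has its own column and each country-year pair has its own row.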
```r
# The easiest way to get tidyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just tidyr:
install.packages("tidyr")

# Or the development version from GitHub:
# install.packages("pak")
pak::pak("tidyverse/tidyr")
```

```r
library(tidyr)
```

tidyr functions fall into five main categories:
- “Pivoting”, which converts between long and wide forms. tidyr 1.0.0 introduces `pivot_longer()` and `pivot_wider()`, replacing the older `spread()` and `gather()` functions. See `vignette("pivot")` for more details.
- “Rectangling”, which turns deeply nested lists (as from JSON) into tidy tibbles. See `unnest_longer()`, `unnest_wider()`, `hoist()`, and `vignette("rectangle")` for more details.
- Nesting converts grouped data to a form where each group becomes a single row containing a nested data frame, and unnesting does the opposite. See `nest()`, `unnest()`, and `vignette("nest")` for more details.
- Splitting and combining character columns. Use `separate_wider_delim()`, `separate_wider_position()`, and `separate_wider_regex()` to pull a single character column into multiple columns; use `unite()` to combine multiple columns into a single character column.
- Make implicit missing values explicit with `complete()`; make explicit missing values implicit with `drop_na()`; replace missing values with next/previous value with `fill()`, or a known value with `replace_na()`.
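The sketch below touches four of these categories (pivoting aside) on tiny made-up tables. All data and column names (`g`, `x`, `id`, `letter`, `number`, `site`, `year`, `n`, `person`) are illustrative assumptions; `separate_wider_delim()` assumes tidyr 1.3.0 or later.

```r
library(tidyr)

# Nesting: one row per group, remaining columns stored in a list-column.
df <- tibble::tibble(g = c("a", "a", "b"), x = 1:3)
nested <- nest(df, data = x)       # one row per value of g
restored <- unnest(nested, data)   # back to one row per original row

# Splitting and combining character columns.
parts <- tibble::tibble(id = c("a_1", "b_2"))
split <- separate_wider_delim(
  parts, id, delim = "_", names = c("letter", "number")
)
joined <- unite(split, "id", letter, number, sep = "_")

# Missing values: the (site = "y", year = 2021) combination is
# implicitly missing from obs.
obs <- tibble::tibble(
  site = c("x", "x", "y"),
  year = c(2020, 2021, 2020),
  n = c(5, NA, 7)
)
full <- complete(obs, site, year)          # adds the missing combination
dropped <- drop_na(full)                   # keeps only complete rows
filled <- fill(full, n)                    # carries the previous value down
zeroed <- replace_na(full, list(n = 0))    # substitutes a known value

# Rectangling: a list-column of named lists becomes regular columns.
people <- tibble::tibble(
  person = list(
    list(name = "Ann", age = 30),
    list(name = "Bob", age = 25)
  )
)
wide <- unnest_wider(people, person)
```

Note that `fill()` depends on row order, so it is usually worth sorting or grouping the data first.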
tidyr supersedes reshape2 (2010-2014) and reshape (2005-2010). Somewhat counterintuitively, each iteration of the package has done less. tidyr is designed specifically for tidying data, not general reshaping (reshape2), or the general aggregation (reshape).
data.table provides high-performance implementations of `melt()` and `dcast()`.
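For comparison, a minimal sketch of the data.table equivalents is shown below. It assumes the data.table package is installed; the table `wide` and the column names `key` and `value` are made up for illustration.

```r
library(data.table)

# Hypothetical wide table with two measurement columns.
wide <- data.table(id = 1:2, a = c(1, 2), b = c(3, 4))

# melt() goes from wide to long, similar in spirit to pivot_longer().
long <- melt(
  wide,
  id.vars = "id",
  variable.name = "key",
  value.name = "value"
)

# dcast() goes from long back to wide, similar in spirit to pivot_wider().
back <- dcast(long, id ~ key, value.var = "value")
```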
If you’d like to read more about data reshaping from a CS perspective, I’d recommend the following three papers:
- Wrangler: Interactive visual specification of data transformation scripts
- An interactive framework for data cleaning (Potter’s wheel)
- On efficiently implementing SchemaSQL on a SQL database system

To guide your reading, here’s a translation between the terminology used in different places:
| tidyr 1.0.0 | pivot longer | pivot wider |
|---|---|---|
| tidyr < 1.0.0 | gather | spread |
| reshape(2) | melt | cast |
| spreadsheets | unpivot | pivot |
| databases | fold | unfold |
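To make the first two rows of the table concrete, the sketch below runs the older and newer tidyr functions on the same made-up table. The data and the column names `key` and `value` are illustrative assumptions; `gather()` and `spread()` are superseded but still exported by tidyr.

```r
library(tidyr)

wide <- tibble::tibble(id = 1:2, a = c(1, 2), b = c(3, 4))

# Wide to long: pivot_longer() (tidyr 1.0.0) versus gather() (older).
long_new <- pivot_longer(
  wide, cols = c(a, b), names_to = "key", values_to = "value"
)
long_old <- gather(wide, key = "key", value = "value", a, b)

# The two results hold the same rows, but in a different order:
# pivot_longer() goes row by row, gather() goes column by column.

# Long to wide: pivot_wider() (tidyr 1.0.0) versus spread() (older).
wide_new <- pivot_wider(long_new, names_from = key, values_from = value)
wide_old <- spread(long_old, key, value)
```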
If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use forum.posit.co.
Please note that the tidyr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
