nacnudus/unpivotrPublic

NotificationsYou must be signed in to change notification settings
Fork19
Star187

Unpivot complex and irregular data layouts in R

License

Unknown, MIT licenses found

Licenses found

187 stars 19 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 649 Commits
.github		.github
R		R
data-raw		data-raw
data		data
docs		docs
inst/extdata		inst/extdata
man		man
tests		tests
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
codecov.yml		codecov.yml
cran-comments.md		cran-comments.md

Repository files navigation

unpivotr

unpivotr deals with non-tabulardata, especially from spreadsheets. Use unpivotr when your source datahas any of these ‘features’:

Multi-headered hydra
Meaningful formatting
Headers anywhere but at the top of each column
Non-text headers e.g. dates
Other stuff around the table
Several similar tables in one sheet
Sentinel values
Superscript symbols
Meaningful comments
Nested HTML tables

If that list makes your blood boil, you’ll enjoy the function names.

behead() deals with multi-headered hydra tables one layer of headersat a time, working from the edge of the table inwards. It’s a bit likeusingheader = TRUE inread.csv(), but because it’s a function,you can apply it to as many layers of headers as you need. You end upwith all the headers in columns.
spatter() is liketidyr::spread() but preserves mixed data types.You get into a mixed-data-type situation by delaying type coercionuntilafter the table is tidy (rather than before, likeread.csv()et al). And yes, it usually followsbehead().

More positive, corrective functions:

justify() aligns column headers beforebehead()ing, and hasdeliberate moral overtones.
enhead() attaches a header to the body of the data,a laFrankenstein. The effect is the same asbehead(), but is morepowerful because you can choose exactly which header cells you want,paying attention to formatting (whichbehead() doesn’t understand).
isolate_sentinels() separates meaningful symbols like"N/A" or"confidential" from the rest of the data, giving them some timealone think about what they’ve done.
partition() takes a sheet with several tables on it, and slashesinto pieces that each contain one table. You can then unpivot eachtable in turn withpurrr::map() or similar.

Make cells tidy

Unpivotr uses data where each cells is represented by one row in adataframe. Like this.

Gif of tidyxl converting cells into atidy representation of one row per cell

What can you do with tidy cells? The best places to start are:

Spreadsheet MungingStrategies,a free, online cookbook usingtidyxl andunpivotr
Screencasts on YouTube.
Worked examples on GitHub.

Otherwise the basic idea is:

Read the data with a specialist tool.
- For spreadsheets, usetidyxl.
- For plain text files, you might soon be able to usereadr, but for now you’ll have toinstall a pull-request on that package withdevtools::install_github("tidyverse/readr#760").
- For tables in html pages, useunpivotr::tidy_html()
- For data frames, useunpivotr::as_cells() – this should be alast resort, because by the time the data is in a conventionaldata frame, it is often too late – formatting has been lost, andmost data types have been coerced to strings.
Eitherbehead() straight away, elsedplyr::filter() separatelyfor the header cells and the data cells, and then recombine withenhead().
spatter() so that each column has one data type.

library(unpivotr)library(tidyverse)#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──#> ✔ dplyr     1.1.4     ✔ readr     2.1.5#> ✔ forcats   1.0.0     ✔ stringr   1.5.1#> ✔ ggplot2   3.5.1     ✔ tibble    3.2.1#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.1#> ✔ purrr     1.0.2#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──#> ✖ dplyr::filter() masks stats::filter()#> ✖ dplyr::lag()    masks stats::lag()#> ✖ tidyr::pack()   masks unpivotr::pack()#> ✖ tidyr::unpack() masks unpivotr::unpack()#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errorsx<-purpose$`up-left left-up`x# A pivot table in a conventional data frame.  Four levels of headers, in two#>                            X2      X3     X4     X5    X6     X7#> 1                        <NA>    <NA> Female   <NA>  Male   <NA>#> 2                        <NA>    <NA>  0 - 6 7 - 10 0 - 6 7 - 10#> 3           Bachelor's degree 15 - 24   7000  27000  <NA>  13000#> 4                        <NA> 25 - 44  12000 137000  9000  81000#> 5                        <NA> 45 - 64  10000  64000  7000  66000#> 6                        <NA>     65+   <NA>  18000  7000  17000#> 7                 Certificate 15 - 24  29000 161000 30000 190000#> 8                        <NA> 25 - 44  34000 179000 31000 219000#> 9                        <NA> 45 - 64  30000 210000 23000 199000#> 10                       <NA>     65+  12000  77000  8000 107000#> 11                    Diploma 15 - 24   <NA>  14000  9000  11000#> 12                       <NA> 25 - 44  10000  66000  8000  47000#> 13                       <NA> 45 - 64   6000  68000  5000  58000#> 14                       <NA>     65+   5000  41000  1000  34000#> 15           No Qualification 15 - 24  10000  43000 12000  37000#> 16                       <NA> 25 - 44  11000  36000 21000  50000#> 17                       <NA> 45 - 64  19000  91000 17000  75000#> 18                       <NA>     65+  16000 118000  9000  66000#> 19 Postgraduate qualification 15 - 24   <NA>   6000  <NA>   <NA>#> 20                       <NA> 25 - 44   5000  86000  7000  60000#> 21                       <NA> 45 - 64   6000  55000  6000  68000#> 22                       <NA>     65+   <NA>  13000  <NA>  18000# rows and two columns.y<- as_cells(x)# 'Tokenize' or 'melt' the data frame into one row per celly#> # A tibble: 132 × 4#>      row   col data_type chr#>    <int> <int> <chr>     <chr>#>  1     1     1 chr       <NA>#>  2     2     1 chr       <NA>#>  3     3     1 chr       Bachelor's degree#>  4     4     1 chr       <NA>#>  5     5     1 chr       <NA>#>  6     6     1 chr       <NA>#>  7     7     1 chr       Certificate#>  8     8     1 chr       <NA>#>  9     9     1 chr       <NA>#> 10    10     1 chr       <NA>#> # ℹ 122 more rowsrectify(y)# useful for reviewing the melted form as though in a spreadsheet#> # A tibble: 22 × 7#>    `row/col` `1(A)`            `2(B)`  `3(C)` `4(D)` `5(E)` `6(F)`#>        <int> <chr>             <chr>   <chr>  <chr>  <chr>  <chr>#>  1         1 <NA>              <NA>    Female <NA>   Male   <NA>#>  2         2 <NA>              <NA>    0 - 6  7 - 10 0 - 6  7 - 10#>  3         3 Bachelor's degree 15 - 24 7000   27000  <NA>   13000#>  4         4 <NA>              25 - 44 12000  137000 9000   81000#>  5         5 <NA>              45 - 64 10000  64000  7000   66000#>  6         6 <NA>              65+     <NA>   18000  7000   17000#>  7         7 Certificate       15 - 24 29000  161000 30000  190000#>  8         8 <NA>              25 - 44 34000  179000 31000  219000#>  9         9 <NA>              45 - 64 30000  210000 23000  199000#> 10        10 <NA>              65+     12000  77000  8000   107000#> # ℹ 12 more rowsy %>%  behead("up-left","sex") %>%# Strip headers  behead("up","life-satisfication") %>%# one  behead("left-up","qualification") %>%# by  behead("left","age-band") %>%# one.  select(-row,-col,-data_type,count=chr) %>%# cleanup  mutate(count= as.integer(count))#> # A tibble: 80 × 5#>     count sex    `life-satisfication` qualification     `age-band`#>     <int> <chr>  <chr>                <chr>             <chr>#>  1   7000 Female 0 - 6                Bachelor's degree 15 - 24#>  2  12000 Female 0 - 6                Bachelor's degree 25 - 44#>  3  10000 Female 0 - 6                Bachelor's degree 45 - 64#>  4     NA Female 0 - 6                Bachelor's degree 65+#>  5  27000 Female 7 - 10               Bachelor's degree 15 - 24#>  6 137000 Female 7 - 10               Bachelor's degree 25 - 44#>  7  64000 Female 7 - 10               Bachelor's degree 45 - 64#>  8  18000 Female 7 - 10               Bachelor's degree 65+#>  9     NA Male   0 - 6                Bachelor's degree 15 - 24#> 10   9000 Male   0 - 6                Bachelor's degree 25 - 44#> # ℹ 70 more rows

Note the compass directions in the code above, which hint tobehead()where to find the header cell for each data cell.

"up-left" means the header (Female,Male) is positioned up andto the left of the columns of data cells it describes.
"up" means the header (0 - 6,7 - 10) is positioned directlyabove the columns of data cells it describes.
"left-up" means the header (Bachelor's degree,Certificate,etc.) is positioned to the left and upwards of the rows of data cellsit describes.
"left" means the header (15 - 24,25 - 44, etc.) is positioneddirectly to the left of the rows of data cells it describes.

Installation

# install.packages("devtools") # If you don't already have devtoolsdevtools::install_github("nacnudus/unpivotr",build_vignettes=TRUE)

The version 0.4.0 release had somee breaking changes. SeeNEWS.md fordetails. The previous version can be installed as follow:

devtools::install_version("unpivotr",version="0.3.1",repos="http://cran.us.r-project.org")

Similar projects

unpivotr is inspired byDatabaker, acollaboration between theUnited Kingdom Office of NationalStatistics andThe Sensible CodeCompany.unpivotr.

jailbreaker attempts to extractnon-tabular data from spreadsheets into tabular structures automaticallyvia some clever algorithms.unpivotr differs by being lessmagic, and equipping you to express what you want to do.

About

Unpivot complex and irregular data layouts in R

nacnudus.github.io/unpivotr/

Topics

r excel spreadsheet pivot-tables

Resources

Readme

License

Unknown, MIT licenses found

Releases5

unpivotr v0.6.0 Latest

May 8, 2020

+ 4 releases

Packages

No packages published

Contributors9

Languages

R100.0%

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Licenses found

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

unpivotr

Make cells tidy

Installation

Similar projects

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases5

Packages

Uh oh!

Contributors9

Uh oh!

Languages

Movatterモバイル変換

License

Licenses found

nacnudus/unpivotr

Folders and files

Latest commit

History

Repository files navigation

unpivotr

Make cells tidy

Installation

Similar projects

About

Topics

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases5

Packages0

Uh oh!

Contributors9

Uh oh!

Languages

Packages