Nested data

library(tidyr)
library(dplyr)
library(purrr)

Basics

A nested data frame is a data frame where one (or more) columns is a list of data frames. You can create simple nested data frames by hand:

df1 <- tibble(
  g = c(1, 2, 3),
  data = list(
    tibble(x = 1, y = 2),
    tibble(x = 4:5, y = 6:7),
    tibble(x = 10)
  )
)
df1
#> # A tibble: 3 × 2
#>       g data
#>   <dbl> <list>
#> 1     1 <tibble [1 × 2]>
#> 2     2 <tibble [2 × 2]>
#> 3     3 <tibble [1 × 1]>

(It is possible to create list-columns in regular data frames, not just in tibbles, but it’s considerably more work because the default behaviour of data.frame() is to treat lists as lists of columns.)
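As a minimal sketch of the base R route mentioned above, one option is to wrap the list in I(), which asks data.frame() to keep it as a single column instead of spreading its elements out as columns. The names inner and df_base are hypothetical, chosen only for this illustration.

```r
# A list of small data frames that should end up in ONE column.
inner <- list(
  data.frame(x = 1, y = 2),
  data.frame(x = 4:5, y = 6:7),
  data.frame(x = 10)
)

# I() marks the list "as is", so data.frame() stores it as a list-column
# rather than treating each element as a separate set of columns.
df_base <- data.frame(g = c(1, 2, 3), data = I(inner))

ncol(df_base)            # two columns: g and data
nrow(df_base$data[[2]])  # the second element is a two-row data frame
```

Printing such a data frame is less readable than printing a tibble, which is one reason tibbles are the more comfortable choice for nested data.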

But more commonly you’ll create them with tidyr::nest():

df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)
df2 %>% nest(data = c(x, y))
#> # A tibble: 3 × 2
#>       g data
#>   <dbl> <list>
#> 1     1 <tibble [1 × 2]>
#> 2     2 <tibble [2 × 2]>
#> 3     3 <tibble [1 × 2]>

nest() specifies which variables should be nested inside; an alternative is to use dplyr::group_by() to describe which variables should be kept outside.

df2 %>% group_by(g) %>% nest()
#> # A tibble: 3 × 2
#> # Groups:   g [3]
#>       g data
#>   <dbl> <list>
#> 1     1 <tibble [1 × 2]>
#> 2     2 <tibble [2 × 2]>
#> 3     3 <tibble [1 × 2]>

I think nesting is easiest to understand in connection to grouped data: each row in the output corresponds to one group in the input. We’ll see shortly this is particularly convenient when you have other per-group objects.
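The correspondence between groups and rows can be checked directly. The sketch below rebuilds df2 so it runs on its own, then compares the two spellings of the same nesting; inside and outside are hypothetical names used only here.

```r
library(tidyr)
library(dplyr)

df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)

# Say which columns go inside ...
inside <- df2 %>% nest(data = c(x, y))

# ... or say which columns stay outside, then drop the grouping.
outside <- df2 %>% group_by(g) %>% nest() %>% ungroup()

# One output row per distinct value of g, in either spelling.
nrow(inside) == n_distinct(df2$g)
identical(inside$g, outside$g)
```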

The opposite of nest() is unnest(). You give it the name of a list-column containing data frames, and it row-binds the data frames together, repeating the outer columns the right number of times to line up.

df1 %>% unnest(data)
#> # A tibble: 4 × 3
#>       g     x     y
#>   <dbl> <dbl> <dbl>
#> 1     1     1     2
#> 2     2     4     6
#> 3     2     5     7
#> 4     3    10    NA
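One way to see that unnest() undoes nest() is a round trip. This is a small sketch that assumes the rows are already ordered by the outer column, as they are in df2; nested and flat are hypothetical names.

```r
library(tidyr)

df2 <- tribble(
  ~g, ~x, ~y,
   1,  1,  2,
   2,  4,  6,
   2,  5,  7,
   3, 10, NA
)

# Nest x and y, then immediately flatten them again.
nested <- df2 %>% nest(data = c(x, y))
flat <- nested %>% unnest(data)

# Compare as plain data frames; the values and column order should match.
all.equal(as.data.frame(flat), as.data.frame(df2))
```

If the input were not ordered by g, the round trip would still return the same rows, but gathered by group rather than in their original order.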

Nested data and models

Nested data is a great fit for problems where you have one of something for each group. A common place this arises is when you’re fitting multiple models.

mtcars_nested <- mtcars %>%
  group_by(cyl) %>%
  nest()
mtcars_nested
#> # A tibble: 3 × 2
#> # Groups:   cyl [3]
#>     cyl data
#>   <dbl> <list>
#> 1     6 <tibble [7 × 10]>
#> 2     4 <tibble [11 × 10]>
#> 3     8 <tibble [14 × 10]>

Once you have a list of data frames, it’s very natural to produce a list of models:

mtcars_nested <- mtcars_nested %>%
  mutate(model = map(data, function(df) lm(mpg ~ wt, data = df)))
mtcars_nested
#> # A tibble: 3 × 3
#> # Groups:   cyl [3]
#>     cyl data               model
#>   <dbl> <list>             <list>
#> 1     6 <tibble [7 × 10]>  <lm>
#> 2     4 <tibble [11 × 10]> <lm>
#> 3     8 <tibble [14 × 10]> <lm>

And then you could even produce a list of predictions:

mtcars_nested <- mtcars_nested %>%
  mutate(model = map(model, predict))
mtcars_nested
#> # A tibble: 3 × 3
#> # Groups:   cyl [3]
#>     cyl data               model
#>   <dbl> <list>             <list>
#> 1     6 <tibble [7 × 10]>  <dbl [7]>
#> 2     4 <tibble [11 × 10]> <dbl [11]>
#> 3     8 <tibble [14 × 10]> <dbl [14]>
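Because each vector of predictions has the same length as the data frame it was fitted on, the two list-columns can be unnested together to line predictions up with the original rows. The sketch below is a self-contained variation on the steps above: it stores the predictions in a separate, hypothetical column named pred instead of overwriting model, and with_pred is likewise a made-up name.

```r
library(tidyr)
library(dplyr)
library(purrr)

with_pred <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # unname() drops the row labels that predict() attaches to its result
    pred = map(model, function(m) unname(predict(m)))
  ) %>%
  select(cyl, data, pred) %>%
  # Unnest both list-columns in parallel: one output row per original row
  unnest(c(data, pred))

dim(with_pred)  # 32 rows: every car in mtcars, plus its fitted mpg
```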

This workflow works particularly well in conjunction with broom, which makes it easy to turn models into tidy data frames which can then be unnest()ed to get back to flat data frames. You can see a bigger example in the broom and dplyr vignette.
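A minimal sketch of that combination, assuming the broom package is installed: broom::tidy() turns each fitted lm into a data frame of coefficients, and unnest() flattens the result. The column name tidied and the object name coefs are hypothetical.

```r
library(tidyr)
library(dplyr)
library(purrr)
library(broom)

coefs <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(
    model = map(data, function(df) lm(mpg ~ wt, data = df)),
    # tidy() returns one row per model term (intercept and wt here)
    tidied = map(model, tidy)
  ) %>%
  select(cyl, tidied) %>%
  unnest(tidied)

coefs  # one row per cyl group and model term
```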

