
Rectangle Nested Lists


mgirlich/tibblify

Lifecycle: experimental

The goal of tibblify is to provide an easy way of converting a nested list into a tibble.

Installation

You can install the released version of tibblify from CRAN with:

install.packages("tibblify")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mgirlich/tibblify")

Introduction

With tibblify() you can rectangle deeply nested lists into a tidy tibble. These lists might come from an API in the form of JSON or from scraping XML. The reasons to use tibblify() over other tools like jsonlite::fromJSON() or tidyr::hoist() are:

  • It can guess the output format, like jsonlite::fromJSON().
  • You can also provide a specification of how to rectangle.
  • The specification is easy to understand.
  • You can bring most inputs into the shape you want in a single step.
  • Rectangling is much faster than with jsonlite::fromJSON().

Example

Let’s start with gh_users, which is a list containing information about four GitHub users.

library(tibblify)
gh_users_small <- purrr::map(gh_users, ~ .x[c("followers", "login", "url", "name", "location", "email", "public_gists")])
names(gh_users_small[[1]])
#> [1] "followers"    "login"        "url"          "name"         "location"
#> [6] "email"        "public_gists"

Quickly rectangling gh_users_small is as easy as applying tibblify() to it:

tibblify(gh_users_small)
#> The spec contains 1 unspecified field:
#> • email
#> # A tibble: 4 × 7
#>   followers login      url                    name  location email  public_gists
#>       <int> <chr>      <chr>                  <chr> <chr>    <list>        <int>
#> 1       780 jennybc    https://api.github.co… Jenn… Vancouv… <NULL>           54
#> 2      3958 jtleek     https://api.github.co… Jeff… Baltimo… <NULL>           12
#> 3       115 juliasilge https://api.github.co… Juli… Salt La… <NULL>            4
#> 4       213 leeper     https://api.github.co… Thom… London,… <NULL>           46

We can now look at the specification tibblify() used for rectangling:

guess_tspec(gh_users_small)
#> The spec contains 1 unspecified field:
#> • email
#> tspec_df(
#>   tib_int("followers"),
#>   tib_chr("login"),
#>   tib_chr("url"),
#>   tib_chr("name"),
#>   tib_chr("location"),
#>   tib_unspecified("email"),
#>   tib_int("public_gists"),
#> )

If we are only interested in some of the fields, we can easily adapt the specification:

spec <- tspec_df(
  login_name = tib_chr("login"),
  tib_chr("name"),
  tib_int("public_gists")
)
tibblify(gh_users_small, spec)
#> # A tibble: 4 × 3
#>   login_name name                   public_gists
#>   <chr>      <chr>                         <int>
#> 1 jennybc    Jennifer (Jenny) Bryan           54
#> 2 jtleek     Jeff L.                          12
#> 3 juliasilge Julia Silge                       4
#> 4 leeper     Thomas J. Leeper                 46

Objects

We refer to lists like gh_users_small as collections, and objects are the elements of such lists. Objects and collections are the typical input for tibblify().

Basically, an object is simply something that can be converted to a one-row tibble. This boils down to a condition on the names of the object:

  • the object must have names (the names attribute must not be NULL),
  • every element must be named (no name can be NA or ""),
  • and the names must be unique.

In other words, the names must fulfill vec_as_names(repair = "check_unique"). The name-value pairs of an object are the fields.

For example, list(x = 1, y = "a") is an object with the fields (x, 1) and (y, "a"), but list(1, z = 3) is not an object because it is not fully named.

A collection is basically just a list of similar objects so that the fields can become the columns in a tibble.
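The three naming conditions can be checked by hand with base R. The following is a minimal sketch only: is_object() is a hypothetical helper written for illustration, not a function exported by tibblify, and it tests nothing beyond the conditions listed above.

```r
# Hypothetical helper (not part of tibblify): TRUE if the names of `x`
# satisfy the three conditions for an object.
is_object <- function(x) {
  nms <- names(x)
  !is.null(nms) &&          # the names attribute must not be NULL
    !anyNA(nms) &&          # no name may be NA
    all(nms != "") &&       # no name may be the empty string
    !anyDuplicated(nms)     # the names must be unique
}

is_object(list(x = 1, y = "a"))  # fully and uniquely named
is_object(list(1, z = 3))        # first element is unnamed
is_object(list(x = 1, x = 2))    # duplicated name
```

Under these assumptions the first call returns TRUE and the other two return FALSE.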

Specification

Providing an explicit specification has a couple of advantages:

  • you can ensure type and shape stability of the resulting tibble in automated scripts.
  • you can give the columns different names.
  • you can restrict to parsing only the fields you need.
  • you can specify what happens if a value is missing.
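As an illustration of the type-stability point, the sketch below feeds a collection whose id field has the wrong type to an explicit specification. It assumes that the mismatch surfaces as an R error (the exact message is not shown here and may differ between versions); the data and the names good and bad are made up for the example.

```r
library(tibblify)

spec <- tspec_df(
  tib_int("id"),
  tib_chr("name")
)

good <- list(list(id = 1L, name = "Peter"))
bad <- list(list(id = "one", name = "Peter"))  # id is a string, not an integer

# Matches the spec, so the id column is an integer column.
tibblify(good, spec)

# Assumption: the mismatch is reported as an error instead of silently
# changing the column type. tryCatch() captures the condition for inspection.
result <- tryCatch(tibblify(bad, spec), error = function(e) e)
inherits(result, "error")
```

In an automated script this means a change in the upstream data is caught at the rectangling step rather than further downstream.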

As seen before, the specification for a collection is created with tspec_df(). The columns of the output tibble are described with the tib_*() functions. They describe the path to the field to extract and the output type of the field. There are the following five types of functions:

  • tib_scalar(ptype): a length one vector with type ptype
  • tib_vector(ptype): a vector of arbitrary length with type ptype
  • tib_variant(): a vector of arbitrary length and type; you should rarely need this
  • tib_row(...): an object with the fields ...
  • tib_df(...): a collection where the objects have the fields ...
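Of the five types, tib_variant() is the only one not demonstrated elsewhere in this README. A minimal sketch with made-up data, assuming the field is kept as an ordinary list column:

```r
library(tibblify)

# The payload field has a different type and length in every object,
# so neither tib_scalar() nor tib_vector() fits.
x <- list(
  list(id = 1, payload = "a"),
  list(id = 2, payload = 1:3),
  list(id = 3, payload = list(TRUE, "b"))
)

out <- tibblify(
  x,
  tspec_df(
    tib_int("id"),
    tib_variant("payload")
  )
)

# One list element per object.
out$payload
```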

For convenience there are shortcuts for tib_scalar() and tib_vector() for the most common prototypes:

  • logical(): tib_lgl() and tib_lgl_vec()
  • integer(): tib_int() and tib_int_vec()
  • double(): tib_dbl() and tib_dbl_vec()
  • character(): tib_chr() and tib_chr_vec()
  • Date: tib_date() and tib_date_vec()
  • Date encoded as character: tib_chr_date() and tib_chr_date_vec()
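The last shortcut is useful for JSON input, where dates usually arrive as strings. The sketch below uses made-up data and assumes that the strings are in ISO format (YYYY-MM-DD) and that tib_chr_date() parses that format by default:

```r
library(tibblify)

# Dates stored as character strings, as is typical for JSON.
x <- list(
  list(id = 1, released = "2021-10-26"),
  list(id = 2, released = "2022-03-01")
)

out <- tibblify(
  x,
  tspec_df(
    tib_int("id"),
    tib_chr_date("released")
  )
)

# Under the assumptions above this is a Date column, not a character column.
class(out$released)
```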

Scalar Elements

Scalar elements are the most common case and result in a normal vector column:

tibblify(
  list(
    list(id = 1, name = "Peter"),
    list(id = 2, name = "Lilly")
  ),
  tspec_df(
    tib_int("id"),
    tib_chr("name")
  )
)
#> # A tibble: 2 × 2
#>      id name
#>   <int> <chr>
#> 1     1 Peter
#> 2     2 Lilly

With tib_scalar() you can also provide your own prototype.

Let’s say you have a list with durations

x <- list(
  list(id = 1, duration = vctrs::new_duration(100)),
  list(id = 2, duration = vctrs::new_duration(200))
)
x
#> [[1]]
#> [[1]]$id
#> [1] 1
#>
#> [[1]]$duration
#> Time difference of 100 secs
#>
#>
#> [[2]]
#> [[2]]$id
#> [1] 2
#>
#> [[2]]$duration
#> Time difference of 200 secs

and then use it in tib_scalar()

tibblify(
  x,
  tspec_df(
    tib_int("id"),
    tib_scalar("duration", ptype = vctrs::new_duration())
  )
)
#> # A tibble: 2 × 2
#>      id duration
#>   <int> <drtn>
#> 1     1 100 secs
#> 2     2 200 secs

Vector Elements

If an element does not always have size one, it is a vector element. If it still always has the same type ptype, it produces a list-of-ptype column:

x <- list(
  list(id = 1, children = c("Peter", "Lilly")),
  list(id = 2, children = "James"),
  list(id = 3, children = c("Emma", "Noah", "Charlotte"))
)
tibblify(
  x,
  tspec_df(
    tib_int("id"),
    tib_chr_vec("children")
  )
)
#> # A tibble: 3 × 2
#>      id    children
#>   <int> <list<chr>>
#> 1     1         [2]
#> 2     2         [1]
#> 3     3         [3]

You can use tidyr::unnest() or tidyr::unnest_longer() to flatten these columns to regular columns.
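A minimal sketch of the flattening step, reusing the children data from this section. It assumes tidyr is installed; the names nested and flat are chosen only for the example.

```r
library(tibblify)

x <- list(
  list(id = 1, children = c("Peter", "Lilly")),
  list(id = 2, children = "James"),
  list(id = 3, children = c("Emma", "Noah", "Charlotte"))
)

nested <- tibblify(
  x,
  tspec_df(
    tib_int("id"),
    tib_chr_vec("children")
  )
)

# One row per child; the id value is repeated for every child of a parent.
flat <- tidyr::unnest_longer(nested, children)
flat
```

The three objects hold six children in total, so flat should have six rows and a plain character children column.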

Object Elements

For example, in gh_repos_small

gh_repos_small <- purrr::map(gh_repos, ~ .x[c("id", "name", "owner")])
gh_repos_small <- purrr::map(
  gh_repos_small,
  function(repo) {
    repo$owner <- repo$owner[c("login", "id", "url")]
    repo
  }
)
gh_repos_small[[1]]
#> $id
#> [1] 61160198
#>
#> $name
#> [1] "after"
#>
#> $owner
#> $owner$login
#> [1] "gaborcsardi"
#>
#> $owner$id
#> [1] 660288
#>
#> $owner$url
#> [1] "https://api.github.com/users/gaborcsardi"

the field owner is an object itself. The specification to extract it uses tib_row()

spec <- guess_tspec(gh_repos_small)
spec
#> tspec_df(
#>   tib_int("id"),
#>   tib_chr("name"),
#>   tib_row(
#>     "owner",
#>     tib_chr("login"),
#>     tib_int("id"),
#>     tib_chr("url"),
#>   ),
#> )

and results in a tibble column

tibblify(gh_repos_small, spec)
#> # A tibble: 30 × 3
#>          id name        owner$login    $id $url
#>       <int> <chr>       <chr>        <int> <chr>
#>  1 61160198 after       gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  2 40500181 argufy      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  3 36442442 ask         gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  4 34924886 baseimports gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  5 61620661 citest      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  6 33907457 clisymbols  gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  7 37236467 cmaker      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  8 67959624 cmark       gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  9 63152619 conditions  gaborcsardi 660288 https://api.github.com/users/gaborcs…
#> 10 24343686 crayon      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#> # ℹ 20 more rows

If you don’t like the tibble column, you can unpack it with tidyr::unpack(). Alternatively, if you only want to extract some of the fields in owner, you can use a nested path:

spec2 <- tspec_df(
  id = tib_int("id"),
  name = tib_chr("name"),
  owner_id = tib_int(c("owner", "id")),
  owner_login = tib_chr(c("owner", "login"))
)
spec2
#> tspec_df(
#>   tib_int("id"),
#>   tib_chr("name"),
#>   owner_id = tib_int(c("owner", "id")),
#>   owner_login = tib_chr(c("owner", "login")),
#> )
tibblify(gh_repos_small, spec2)
#> # A tibble: 30 × 4
#>          id name        owner_id owner_login
#>       <int> <chr>          <int> <chr>
#>  1 61160198 after         660288 gaborcsardi
#>  2 40500181 argufy        660288 gaborcsardi
#>  3 36442442 ask           660288 gaborcsardi
#>  4 34924886 baseimports   660288 gaborcsardi
#>  5 61620661 citest        660288 gaborcsardi
#>  6 33907457 clisymbols    660288 gaborcsardi
#>  7 37236467 cmaker        660288 gaborcsardi
#>  8 67959624 cmark         660288 gaborcsardi
#>  9 63152619 conditions    660288 gaborcsardi
#> 10 24343686 crayon        660288 gaborcsardi
#> # ℹ 20 more rows
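For the other route mentioned in this section, tidyr::unpack(), here is a minimal sketch on a small hand-built collection (the repos list is made up so that the example does not depend on gh_repos). Because both the outer object and owner have an id field, names_sep is used to prefix the unpacked columns and avoid a name clash.

```r
library(tibblify)

# Hand-built stand-in for gh_repos_small with two repositories.
repos <- list(
  list(id = 1L, name = "after", owner = list(login = "gaborcsardi", id = 660288L)),
  list(id = 2L, name = "argufy", owner = list(login = "gaborcsardi", id = 660288L))
)

spec <- tspec_df(
  tib_int("id"),
  tib_chr("name"),
  tib_row(
    "owner",
    tib_chr("login"),
    tib_int("id")
  )
)

packed <- tibblify(repos, spec)

# Turn the owner tibble column into the regular columns
# owner_login and owner_id.
flat <- tidyr::unpack(packed, owner, names_sep = "_")
names(flat)
```

Compared with nested paths, this keeps every field of owner; nested paths are the better fit when only a few of them are needed.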

Required and Optional Fields

Objects usually have some fields that always exist and some that are optional. By default, tib_*() requires that a field exists:

x <- list(
  list(x = 1, y = "a"),
  list(x = 2)
)
spec <- tspec_df(
  x = tib_int("x"),
  y = tib_chr("y")
)
tibblify(x, spec)
#> Error in `tibblify()`:
#> ! Field y is required but does not exist in `x[[2]]`.
#> ℹ Use `required = FALSE` if the field is optional.

You can mark a field as optional with the argument required = FALSE:

spec <- tspec_df(
  x = tib_int("x"),
  y = tib_chr("y", required = FALSE)
)
tibblify(x, spec)
#> # A tibble: 2 × 2
#>       x y
#>   <int> <chr>
#> 1     1 a
#> 2     2 <NA>

You can specify the value to use when an optional field is missing with the fill argument:

spec <- tspec_df(
  x = tib_int("x"),
  y = tib_chr("y", required = FALSE, fill = "missing")
)
tibblify(x, spec)
#> # A tibble: 2 × 2
#>       x y
#>   <int> <chr>
#> 1     1 a
#> 2     2 missing

Converting a Single Object

To rectangle a single object you have two options: tspec_object(), which produces a list, or tspec_row(), which produces a tibble with one row.

While tibbles are great, for a single object it often makes more sense to convert it to a list.

For example a typical API response might be something like

api_output <- list(
  status = "success",
  requested_at = "2021-10-26 09:17:12",
  data = list(
    list(x = 1),
    list(x = 2)
  )
)

To convert to a one row tibble

row_spec <- tspec_row(
  status = tib_chr("status"),
  data = tib_df(
    "data",
    x = tib_int("x")
  )
)
api_output_df <- tibblify(api_output, row_spec)
api_output_df
#> # A tibble: 1 × 2
#>   status                data
#>   <chr>   <list<tibble[,1]>>
#> 1 success            [2 × 1]

it is necessary to wrap data in a list. To access data one has to use api_output_df$data[[1]], which is not very nice.

object_spec <- tspec_object(
  status = tib_chr("status"),
  data = tib_df(
    "data",
    x = tib_int("x")
  )
)
api_output_list <- tibblify(api_output, object_spec)
api_output_list
#> $status
#> [1] "success"
#>
#> $data
#> # A tibble: 2 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2

Now accessing data does not require an extra subsetting step:

api_output_list$data
#> # A tibble: 2 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2

Code of Conduct

Please note that the tibblify project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
