The goal of tibblify is to provide an easy way of converting a nested list into a tibble.
Installation
You can install the released version of tibblify from CRAN with:

install.packages("tibblify")

Or install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("mgirlich/tibblify")

Introduction
With tibblify() you can rectangle deeply nested lists into a tidy tibble. These lists might come from an API in the form of JSON or from scraping XML. The reasons to use tibblify() over other tools like jsonlite::fromJSON() or tidyr::hoist() are:
- It can guess the output format like jsonlite::fromJSON().
- You can also provide a specification of how to rectangle.
- The specification is easy to understand.
- You can bring most inputs into the shape you want in a single step.
- Rectangling is much faster than with jsonlite::fromJSON().
Example
Let’s start with gh_users, which is a list containing information about four GitHub users.

library(tibblify)

gh_users_small <- purrr::map(gh_users, ~ .x[c("followers", "login", "url", "name", "location", "email", "public_gists")])

names(gh_users_small[[1]])
#> [1] "followers"    "login"        "url"          "name"         "location"
#> [6] "email"        "public_gists"

Quickly rectangling gh_users_small is as easy as applying tibblify() to it:

tibblify(gh_users_small)
#> The spec contains 1 unspecified field:
#> • email
#> # A tibble: 4 × 7
#>   followers login      url                    name  location email  public_gists
#>       <int> <chr>      <chr>                  <chr> <chr>    <list>        <int>
#> 1       780 jennybc    https://api.github.co… Jenn… Vancouv… <NULL>           54
#> 2      3958 jtleek     https://api.github.co… Jeff… Baltimo… <NULL>           12
#> 3       115 juliasilge https://api.github.co… Juli… Salt La… <NULL>            4
#> 4       213 leeper     https://api.github.co… Thom… London,… <NULL>           46

We can now look at the specification tibblify() used for rectangling:

guess_tspec(gh_users_small)
#> The spec contains 1 unspecified field:
#> • email
#> tspec_df(
#>   tib_int("followers"),
#>   tib_chr("login"),
#>   tib_chr("url"),
#>   tib_chr("name"),
#>   tib_chr("location"),
#>   tib_unspecified("email"),
#>   tib_int("public_gists"),
#> )

If we are only interested in some of the fields, we can easily adapt the specification:

spec <- tspec_df(
  login_name = tib_chr("login"),
  tib_chr("name"),
  tib_int("public_gists")
)

tibblify(gh_users_small, spec)
#> # A tibble: 4 × 3
#>   login_name name                   public_gists
#>   <chr>      <chr>                         <int>
#> 1 jennybc    Jennifer (Jenny) Bryan           54
#> 2 jtleek     Jeff L.                          12
#> 3 juliasilge Julia Silge                       4
#> 4 leeper     Thomas J. Leeper                 46

Objects
We refer to lists like gh_users_small as collections, and objects are the elements of such lists. Objects and collections are the typical input for tibblify().
Basically, an object is simply something that can be converted to a one-row tibble. This boils down to a condition on the names of the object:
- the object must have names (the names attribute must not be NULL),
- every element must be named (no name can be NA or ""),
- and the names must be unique.
In other words, the names must fulfill vec_as_names(repair = "check_unique"). The name-value pairs of an object are the fields.
For example, list(x = 1, y = "a") is an object with the fields (x, 1) and (y, "a"), but list(1, z = 3) is not an object because it is not fully named.
A collection is basically just a list of similar objects, so that the fields can become the columns in a tibble.
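The naming rules for objects can be sketched as a small base R check. The helper is_object() below is hypothetical and not part of tibblify; it only mirrors the three conditions listed above and makes no claim about how tibblify validates its input internally.

```r
# Hypothetical helper (not part of tibblify): checks the naming rules
# that make a list an "object".
is_object <- function(x) {
  nms <- names(x)
  is.list(x) &&
    !is.null(nms) &&                  # the names attribute must exist
    !anyNA(nms) && all(nms != "") &&  # every element must be named
    anyDuplicated(nms) == 0           # the names must be unique
}

is_object(list(x = 1, y = "a"))
#> [1] TRUE
is_object(list(1, z = 3))
#> [1] FALSE
is_object(list(x = 1, x = 2))
#> [1] FALSE
```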
Specification
Providing an explicit specification has a couple of advantages:
- you can ensure type and shape stability of the resulting tibble in automated scripts.
- you can give the columns different names.
- you can restrict to parsing only the fields you need.
- you can specify what happens if a value is missing.
As seen before, the specification for a collection is done with tspec_df(). The columns of the output tibble are described with the tib_*() functions. They describe the path to the field to extract and the output type of the field. There are the following five types of functions:
- tib_scalar(ptype): a length one vector with type ptype
- tib_vector(ptype): a vector of arbitrary length with type ptype
- tib_variant(): a vector of arbitrary length and type; you should barely ever need this
- tib_row(...): an object with the fields ...
- tib_df(...): a collection where the objects have the fields ...
For convenience there are shortcuts for tib_scalar() and tib_vector() for the most common prototypes:
- logical(): tib_lgl() and tib_lgl_vec()
- integer(): tib_int() and tib_int_vec()
- double(): tib_dbl() and tib_dbl_vec()
- character(): tib_chr() and tib_chr_vec()
- Date: tib_date() and tib_date_vec()
- Date encoded as character: tib_chr_date() and tib_chr_date_vec()
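To illustrate how the shortcuts relate to tib_scalar(), here is a minimal sketch that writes the same specification both ways. The input data is made up for illustration, and the prototype is passed positionally (key first, prototype second) on the assumption that this order holds for the installed version.

```r
library(tibblify)

# Made-up input for illustration.
x <- list(
  list(id = 1, name = "Peter"),
  list(id = 2, name = "Lilly")
)

# Spec written with the shortcuts.
spec_short <- tspec_df(
  tib_int("id"),
  tib_chr("name")
)

# The same spec written with tib_scalar() and explicit prototypes.
spec_long <- tspec_df(
  tib_scalar("id", integer()),
  tib_scalar("name", character())
)

out_short <- tibblify(x, spec_short)
out_long <- tibblify(x, spec_long)

# Compare the columns rather than the whole tibbles, in case the result
# carries additional metadata along.
identical(out_short$id, out_long$id)
identical(out_short$name, out_long$name)
```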
Scalar Elements
Scalar elements are the most common case and result in a normal vector column:

tibblify(
  list(
    list(id = 1, name = "Peter"),
    list(id = 2, name = "Lilly")
  ),
  tspec_df(
    tib_int("id"),
    tib_chr("name")
  )
)
#> # A tibble: 2 × 2
#>      id name
#>   <int> <chr>
#> 1     1 Peter
#> 2     2 Lilly

With tib_scalar() you can also provide your own prototype.
Let’s say you have a list with durations

x <- list(
  list(id = 1, duration = vctrs::new_duration(100)),
  list(id = 2, duration = vctrs::new_duration(200))
)
x
#> [[1]]
#> [[1]]$id
#> [1] 1
#>
#> [[1]]$duration
#> Time difference of 100 secs
#>
#>
#> [[2]]
#> [[2]]$id
#> [1] 2
#>
#> [[2]]$duration
#> Time difference of 200 secs

and then use it in tib_scalar()

tibblify(
  x,
  tspec_df(
    tib_int("id"),
    tib_scalar("duration", ptype = vctrs::new_duration())
  )
)
#> # A tibble: 2 × 2
#>      id duration
#>   <int> <drtn>
#> 1     1 100 secs
#> 2     2 200 secs

Vector Elements
If an element does not always have size one, then it is a vector element. If it still always has the same type ptype, then it produces a list of ptype column:

x <- list(
  list(id = 1, children = c("Peter", "Lilly")),
  list(id = 2, children = "James"),
  list(id = 3, children = c("Emma", "Noah", "Charlotte"))
)

tibblify(x, tspec_df(tib_int("id"), tib_chr_vec("children")))
#> # A tibble: 3 × 2
#>      id    children
#>   <int> <list<chr>>
#> 1     1         [2]
#> 2     2         [1]
#> 3     3         [3]

You can use tidyr::unnest() or tidyr::unnest_longer() to flatten these columns to regular columns.
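As a sketch of that flattening step, the following reuses the children example and passes the result to tidyr::unnest_longer(). It assumes tidyr is installed; the variable names families and children_long are chosen here for illustration.

```r
library(tibblify)

x <- list(
  list(id = 1, children = c("Peter", "Lilly")),
  list(id = 2, children = "James"),
  list(id = 3, children = c("Emma", "Noah", "Charlotte"))
)

families <- tibblify(x, tspec_df(tib_int("id"), tib_chr_vec("children")))

# One row per child; id is repeated for each element of the list column.
children_long <- tidyr::unnest_longer(families, children)

children_long$id
children_long$children
```

The list column becomes a regular character column, at the cost of repeating the other columns once per element.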
Object Elements
For example, in gh_repos_small

gh_repos_small <- purrr::map(gh_repos, ~ .x[c("id", "name", "owner")])
gh_repos_small <- purrr::map(
  gh_repos_small,
  function(repo) {
    repo$owner <- repo$owner[c("login", "id", "url")]
    repo
  }
)

gh_repos_small[[1]]
#> $id
#> [1] 61160198
#>
#> $name
#> [1] "after"
#>
#> $owner
#> $owner$login
#> [1] "gaborcsardi"
#>
#> $owner$id
#> [1] 660288
#>
#> $owner$url
#> [1] "https://api.github.com/users/gaborcsardi"

the field owner is an object itself. The specification to extract it uses tib_row()

spec <- guess_tspec(gh_repos_small)
spec
#> tspec_df(
#>   tib_int("id"),
#>   tib_chr("name"),
#>   tib_row(
#>     "owner",
#>     tib_chr("login"),
#>     tib_int("id"),
#>     tib_chr("url"),
#>   ),
#> )

and results in a tibble column

tibblify(gh_repos_small, spec)
#> # A tibble: 30 × 3
#>          id name        owner$login    $id $url
#>       <int> <chr>       <chr>        <int> <chr>
#>  1 61160198 after       gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  2 40500181 argufy      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  3 36442442 ask         gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  4 34924886 baseimports gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  5 61620661 citest      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  6 33907457 clisymbols  gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  7 37236467 cmaker      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  8 67959624 cmark       gaborcsardi 660288 https://api.github.com/users/gaborcs…
#>  9 63152619 conditions  gaborcsardi 660288 https://api.github.com/users/gaborcs…
#> 10 24343686 crayon      gaborcsardi 660288 https://api.github.com/users/gaborcs…
#> # ℹ 20 more rows

If you don’t like the tibble column, you can unpack it with tidyr::unpack(). Alternatively, if you only want to extract some of the fields in owner, you can use a nested path

spec2 <- tspec_df(
  id = tib_int("id"),
  name = tib_chr("name"),
  owner_id = tib_int(c("owner", "id")),
  owner_login = tib_chr(c("owner", "login"))
)
spec2
#> tspec_df(
#>   tib_int("id"),
#>   tib_chr("name"),
#>   owner_id = tib_int(c("owner", "id")),
#>   owner_login = tib_chr(c("owner", "login")),
#> )

tibblify(gh_repos_small, spec2)
#> # A tibble: 30 × 4
#>          id name        owner_id owner_login
#>       <int> <chr>          <int> <chr>
#>  1 61160198 after         660288 gaborcsardi
#>  2 40500181 argufy        660288 gaborcsardi
#>  3 36442442 ask           660288 gaborcsardi
#>  4 34924886 baseimports   660288 gaborcsardi
#>  5 61620661 citest        660288 gaborcsardi
#>  6 33907457 clisymbols    660288 gaborcsardi
#>  7 37236467 cmaker        660288 gaborcsardi
#>  8 67959624 cmark         660288 gaborcsardi
#>  9 63152619 conditions    660288 gaborcsardi
#> 10 24343686 crayon        660288 gaborcsardi
#> # ℹ 20 more rows

Required and Optional Fields
Objects usually have some fields that always exist and some that are optional. By default, tib_*() demands that a field exists:

x <- list(
  list(x = 1, y = "a"),
  list(x = 2)
)

spec <- tspec_df(
  x = tib_int("x"),
  y = tib_chr("y")
)

tibblify(x, spec)
#> Error in `tibblify()`:
#> ! Field y is required but does not exist in `x[[2]]`.
#> ℹ Use `required = FALSE` if the field is optional.

You can mark a field as optional with the argument required = FALSE:

spec <- tspec_df(
  x = tib_int("x"),
  y = tib_chr("y", required = FALSE)
)

tibblify(x, spec)
#> # A tibble: 2 × 2
#>       x y
#>   <int> <chr>
#> 1     1 a
#> 2     2 <NA>

You can specify the value to use for a missing field with the fill argument.
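A minimal sketch of fill, reusing the input from the required example. The argument names follow the required = FALSE form shown above; that they match the installed version is an assumption, and the fill value "missing" is chosen purely for illustration. Check ?tib_chr if the call errors.

```r
library(tibblify)

x <- list(
  list(x = 1, y = "a"),
  list(x = 2)
)

# Assumption: `required` and `fill` are spelled as in the text above.
# The fill value is used wherever the optional field is absent.
spec <- tspec_df(
  x = tib_int("x"),
  y = tib_chr("y", required = FALSE, fill = "missing")
)

filled <- tibblify(x, spec)
filled$y
```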
Converting a Single Object
To rectangle a single object you have two options: tspec_object(), which produces a list, or tspec_row(), which produces a tibble with one row.
While tibbles are great, for a single object it often makes more sense to convert it to a list.
For example, a typical API response might be something like

api_output <- list(
  status = "success",
  requested_at = "2021-10-26 09:17:12",
  data = list(
    list(x = 1),
    list(x = 2)
  )
)

To convert to a one-row tibble

row_spec <- tspec_row(
  status = tib_chr("status"),
  data = tib_df("data", x = tib_int("x"))
)

api_output_df <- tibblify(api_output, row_spec)
api_output_df
#> # A tibble: 1 × 2
#>   status                data
#>   <chr>   <list<tibble[,1]>>
#> 1 success            [2 × 1]

it is necessary to wrap data in a list. To access data one has to use api_output_df$data[[1]], which is not very nice. Using tspec_object() instead produces a list:

object_spec <- tspec_object(
  status = tib_chr("status"),
  data = tib_df("data", x = tib_int("x"))
)

api_output_list <- tibblify(api_output, object_spec)
api_output_list
#> $status
#> [1] "success"
#>
#> $data
#> # A tibble: 2 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2

Now accessing data does not require an extra subsetting step

api_output_list$data
#> # A tibble: 2 × 1
#>       x
#>   <int>
#> 1     1
#> 2     2

Code of Conduct
Please note that the tibblify project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.