
`brolgar` helps you **br**owse **o**ver **l**ongitudinal data **g**raphically and **a**nalytically in **R**, by providing tools to:

This helps you go from the “plate of spaghetti” plot on the left to the “interesting observations” plot on the right.

Installation

```r
# install.packages("remotes")
remotes::install_github("njtierney/brolgar")
```

```r
# Enable this universe
options(repos = c(
  njtierney = 'https://njtierney.r-universe.dev',
  CRAN = 'https://cloud.r-project.org'
))

# Install some packages
install.packages('brolgar')
```

Using `brolgar`: We need to talk about data

There are many ways to describe longitudinal data: as panel data, cross-sectional data, or time series. We define longitudinal data as:

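The tools and workflows in `brolgar` are designed to work with a special tidy time series data frame called a `tsibble`. We can define our longitudinal data in terms of a time series to gain access to some really useful tools. To do so, we need to identify three components:

- The **key**: how individuals are identified
- The **index**: the time component
- The **regularity** of the time intervals between measurements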

The term **key** is used a lot in `brolgar`, so it is an important idea to internalise: the key is the identifier of your individuals or series.

Identifying the key, index, and regularity of the data can be a challenge. You can learn more about specifying these in the vignette, “Longitudinal Data Structures”.
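To make “regularity” concrete, here is a minimal base-R sketch. It is not the check that `tsibble` or `brolgar` performs; it assumes a plain data frame with a key column and a numeric index column, and the helper `is_regular_sketch()` and the toy values are hypothetical. It treats the data as regular when every gap between consecutive index values, within every key, is the same.

```r
# Hypothetical helper: TRUE when all within-key gaps in the index are equal.
# Illustrative only; this is not the check used by tsibble or brolgar.
is_regular_sketch <- function(df, key, index) {
  # Split the index by key, sort within each key, and collect the gaps
  gaps <- unlist(lapply(split(df[[index]], df[[key]]),
                        function(x) diff(sort(x))),
                 use.names = FALSE)
  # With fewer than two gaps there is nothing that could contradict regularity
  if (length(gaps) < 2) return(TRUE)
  # Compare every gap to the first, allowing for floating-point error
  all(abs(gaps - gaps[1]) < 1e-8)
}

# Toy data: individual 1 is measured yearly, individual 2 at uneven times
toy <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  xp = c(0, 1, 2, 0.3, 1.1, 3.0)
)

is_regular_sketch(toy, key = "id", index = "xp")
#> [1] FALSE
is_regular_sketch(toy[toy$id == 1, ], key = "id", index = "xp")
#> [1] TRUE
```

Data like the `wages` data below, where work experience (`xp`) is recorded at uneven intervals, would fail this check, which is why it is declared with `regular = FALSE`.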

The wages data

```r
library(brolgar)
wages
#> # A tsibble: 6,402 x 9 [!]
#> # Key:       id [888]
#>       id ln_wages    xp   ged xp_since_ged black hispanic high_grade
#>    <int>    <dbl> <dbl> <int>        <dbl> <int>    <int>      <int>
#>  1    31     1.49 0.015     1        0.015     0        1          8
#>  2    31     1.43 0.715     1        0.715     0        1          8
#>  3    31     1.47 1.73      1        1.73      0        1          8
#>  4    31     1.75 2.77      1        2.77      0        1          8
#>  5    31     1.93 3.93      1        3.93      0        1          8
#>  6    31     1.71 4.95      1        4.95      0        1          8
#>  7    31     2.09 5.96      1        5.96      0        1          8
#>  8    31     2.13 6.98      1        6.98      0        1          8
#>  9    36     1.98 0.315     1        0.315     0        0          9
#> 10    36     1.80 0.983     1        0.983     0        0          9
#> # ℹ 6,392 more rows
#> # ℹ 1 more variable: unemploy_rate <dbl>
```

```r
wages <- as_tsibble(x = wages,
                    key = id,
                    index = xp,
                    regular = FALSE)
```

Here `as_tsibble()` takes `wages`, a `key`, and an `index`, and we state `regular = FALSE` (since the time periods between measurements are not regular). This turns the data into a `tsibble` object, a powerful data abstraction made available in the `tsibble` package by Earo Wang. If you would like to learn more about `tsibble`, see the official package documentation or read the paper.
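A `tsibble` requires each combination of key and index to identify at most one row, and `as_tsibble()` reports an error when it finds duplicates. Before converting your own data, you could look for offending rows with a base-R sketch like the one below. The helper `find_duplicate_key_index()` and the toy data are hypothetical, not part of `brolgar` or `tsibble`.

```r
# Hypothetical helper: return every row whose (key, index) pair occurs more than once.
find_duplicate_key_index <- function(df, key, index) {
  pairs <- df[c(key, index)]
  # duplicated() flags later repeats; also checking fromLast flags every copy
  dup <- duplicated(pairs) | duplicated(pairs, fromLast = TRUE)
  df[dup, , drop = FALSE]
}

# Toy data: individual 31 has two measurements at the same xp value
toy <- data.frame(
  id = c(31, 31, 31, 36),
  xp = c(0.015, 0.715, 0.715, 0.315),
  ln_wages = c(1.49, 1.43, 1.50, 1.98)
)

nrow(find_duplicate_key_index(toy, key = "id", index = "xp"))
#> [1] 2
```

Rows flagged this way would need to be resolved (for example, by aggregating or correcting them) before the data can become a valid `tsibble`.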

Efficiently exploring longitudinal data

Exploring longitudinal data can be challenging when there are manyindividuals. It is difficult to look at all of them!

You often get a “plate of spaghetti” plot, with many lines plotted on top of each other. You can avoid the spaghetti by looking at a random subset of the data using tools in `brolgar`.

sample_n_keys()

In `dplyr`, you can use `sample_n()` to sample `n` observations, or `sample_frac()` to look at a fraction of observations.

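`brolgar` builds on this by providing `sample_n_keys()` and `sample_frac_keys()`, which sample keys rather than individual rows: `sample_n_keys()` takes a random sample of `n` keys, and `sample_frac_keys()` takes a random fraction of the keys. For example: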

```r
library(ggplot2)
set.seed(2019-7-15-1300)
wages %>%
  sample_n_keys(size = 5) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line()
```
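The key difference from `sample_n()` is that sampling happens at the level of individuals, so each sampled individual keeps its whole trajectory. A minimal base-R sketch of that idea follows; the helper `sample_keys_sketch()` and the toy data are hypothetical, and this is not how `brolgar` implements `sample_n_keys()`.

```r
# Hypothetical helper: choose `size` distinct keys at random,
# then keep *all* rows belonging to those keys.
sample_keys_sketch <- function(df, key, size) {
  keys <- unique(df[[key]])
  # sample.int() avoids sample()'s special case when `keys` has length one
  chosen <- keys[sample.int(length(keys), size)]
  df[df[[key]] %in% chosen, , drop = FALSE]
}

# Toy data: 10 individuals, each measured 3 times
toy <- data.frame(
  id = rep(1:10, each = 3),
  xp = rep(c(0, 1, 2), times = 10),
  ln_wages = seq(1, 3, length.out = 30)
)

set.seed(1)
sampled <- sample_keys_sketch(toy, key = "id", size = 4)
length(unique(sampled$id))
#> [1] 4
nrow(sampled)
#> [1] 12
```

Sampling 12 rows directly with `sample_n()` would instead scatter points across many individuals and break their lines apart.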

Clever facets: `facet_sample()`

`facet_sample()` allows you to specify the number of keys per facet and the number of facets, with `n_per_facet` and `n_facets`, respectively.

```r
set.seed(2019-07-23-1937)
ggplot(wages,
       aes(x = xp,
           y = ln_wages,
           group = id)) +
  geom_line() +
  facet_sample()
```

Under the hood, `facet_sample()` is powered by `sample_n_keys()` and `stratify_keys()`.
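The idea of sampling keys and then splitting them across facets can be approximated in base R. The sketch below samples `n_facets * n_per_facet` keys and assigns them evenly to facets. The function name and the even-split rule are assumptions for illustration; `brolgar`'s `stratify_keys()` and `facet_sample()` may handle details (such as key counts that do not divide evenly) differently.

```r
# Hypothetical helper: pick n_facets * n_per_facet keys at random and label
# each with a facet number, so every facet shows n_per_facet individuals.
assign_facets_sketch <- function(keys, n_per_facet, n_facets) {
  n_needed <- n_per_facet * n_facets
  if (n_needed > length(keys)) stop("not enough keys to fill every facet")
  chosen <- keys[sample.int(length(keys), n_needed)]
  data.frame(
    key = chosen,
    facet = rep(seq_len(n_facets), each = n_per_facet)
  )
}

# 888 keys, as in the wages data; 12 facets of 3 individuals each
set.seed(2019)
facets <- assign_facets_sketch(keys = 1:888, n_per_facet = 3, n_facets = 12)
table(facets$facet)  # each facet holds 3 distinct keys
```

Joining `facets` back to the long data by key and faceting on `facet` would then give one small, readable panel per group of individuals.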

You can see more facets (e.g., `facet_strata()`) and data visualisations you can make in `brolgar` in the Visualisation Gallery.

Finding features in longitudinal data

Sometimes you want to know the range or a summary of a variable for each individual. We call these summaries *features* of the data, and they can be extracted using the `features()` function from `fabletools`.

For example, to answer the question “What is the summary of wages for each individual?”, you can use `features()` to find the five-number summary (min, q25, median, q75, and max) of `ln_wages` with `feat_five_num`:

```r
wages %>%
  features(ln_wages,
           feat_five_num)
#> # A tibble: 888 × 6
#>       id   min   q25   med   q75   max
#>    <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1    31 1.43   1.48  1.73  2.02  2.13
#>  2    36 1.80   1.97  2.32  2.59  2.93
#>  3    53 1.54   1.58  1.71  1.89  3.24
#>  4   122 0.763  2.10  2.19  2.46  2.92
#>  5   134 2.00   2.28  2.36  2.79  2.93
#>  6   145 1.48   1.58  1.77  1.89  2.04
#>  7   155 1.54   1.83  2.22  2.44  2.64
#>  8   173 1.56   1.68  2.00  2.05  2.34
#>  9   206 2.03   2.07  2.30  2.45  2.48
#> 10   207 1.58   1.87  2.15  2.26  2.66
#> # ℹ 878 more rows
```
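Conceptually, a feature reduces each individual's series to a few summary values, giving one row per key. The base-R sketch below mimics that shape; it is not how `fabletools::features()` or `brolgar` work internally. The helpers `per_key_summary()` and `five_num_sketch()` and the toy data are hypothetical, and the use of R's default `quantile()` type is an assumption that may not match the quantile definition `feat_five_num` uses.

```r
# Hypothetical helper: apply `fn` (which returns a named numeric vector)
# to one variable within each key, giving one row per key.
per_key_summary <- function(df, key, var, fn) {
  groups <- split(df[[var]], df[[key]])
  out <- do.call(rbind, lapply(groups, fn))
  data.frame(key = names(groups), out, row.names = NULL)
}

# Five-number summary using quantile()'s default type (an assumption)
five_num_sketch <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(min = q[1], q25 = q[2], med = q[3], q75 = q[4], max = q[5])
}

toy <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2),
  ln_wages = c(1, 2, 3, 4, 5, 2, 4, 6)
)

res <- per_key_summary(toy, key = "id", var = "ln_wages", fn = five_num_sketch)
res  # one row per id, with columns min, q25, med, q75, max
```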

There are many features in `brolgar`, and they all begin with `feat_`. For example, you can find the individuals whose `ln_wages` values only increase or only decrease with `feat_monotonic`:

```r
wages %>%
  features(ln_wages, feat_monotonic)
#> # A tibble: 888 × 5
#>       id increase decrease unvary monotonic
#>    <int> <lgl>    <lgl>    <lgl>  <lgl>
#>  1    31 FALSE    FALSE    FALSE  FALSE
#>  2    36 FALSE    FALSE    FALSE  FALSE
#>  3    53 FALSE    FALSE    FALSE  FALSE
#>  4   122 FALSE    FALSE    FALSE  FALSE
#>  5   134 FALSE    FALSE    FALSE  FALSE
#>  6   145 FALSE    FALSE    FALSE  FALSE
#>  7   155 FALSE    FALSE    FALSE  FALSE
#>  8   173 FALSE    FALSE    FALSE  FALSE
#>  9   206 TRUE     FALSE    FALSE  TRUE
#> 10   207 FALSE    FALSE    FALSE  FALSE
#> # ℹ 878 more rows
```
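The four columns can be read as statements about the successive differences of a series. Below is a hypothetical base-R re-creation for a single series, to show what each column means; it is not `brolgar`'s code. It assumes strict inequalities, and it assumes that a series with fewer than two observations is reported as `FALSE` in every column; `brolgar` may treat both cases differently.

```r
# Hypothetical re-creation of the columns returned by feat_monotonic.
# Assumptions: strict inequalities; fewer than two observations -> all FALSE.
monotonic_sketch <- function(x) {
  if (length(x) < 2) {
    return(c(increase = FALSE, decrease = FALSE,
             unvary = FALSE, monotonic = FALSE))
  }
  d <- diff(x)
  increase <- all(d > 0)   # every step goes up
  decrease <- all(d < 0)   # every step goes down
  c(increase = increase,
    decrease = decrease,
    unvary = all(d == 0),  # no step changes the value
    monotonic = increase || decrease)
}

monotonic_sketch(c(1.5, 1.7, 2.1))  # only increases: increase and monotonic are TRUE
monotonic_sketch(c(1.5, 1.3, 2.1))  # down, then up: monotonic is FALSE
```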

You can read more about creating and using features in the Finding Features vignette. You can also find other features for time series in the `feasts` package.

Linking individuals back to the data

Once you have created these features, you can join them back to the data with a `left_join()`, like so:

```r
library(dplyr)
library(ggplot2)
library(gghighlight)
wages %>%
  features(ln_wages, feat_monotonic) %>%
  left_join(wages, by = "id") %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) +
  geom_line() +
  gghighlight(increase)
#> Warning: Tried to calculate with group_by(), but the calculation failed.
#> Falling back to ungrouped filter operation...
#> label_key: id
#> Too many data series, skip labeling
```
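The join works because the feature table has one row per key, and that row is repeated for every observation of the same key in the long data. A base-R sketch of the same shape using `merge()` follows; the toy data are hypothetical, and this only shows the joined table, not the plot.

```r
# Toy long-format data and a matching per-key feature table (both hypothetical)
long <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  xp = c(0, 1, 2, 0, 1, 2),
  ln_wages = c(1.2, 1.5, 1.9, 2.0, 1.6, 1.8)
)
feats <- data.frame(id = c(1, 2), increase = c(TRUE, FALSE))

# merge(all.x = TRUE) keeps every row of `feats`, like
# left_join(features, data, by = "id"): each key's feature row is
# repeated once per observation of that key.
joined <- merge(feats, long, by = "id", all.x = TRUE)

# Subset the trajectories flagged as increasing; gghighlight(increase)
# instead draws these in colour and greys out the rest.
joined[joined$increase, ]
```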

Other helper functions

n_obs()

```r
n_obs(wages)
#> n_obs
#>  6402
```

n_keys()

```r
n_keys(wages)
#> [1] 888
```
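On a plain data frame, these two counts correspond to the number of rows and the number of distinct keys. The base-R equivalents below are shown on a small hypothetical data frame to make clear what each function counts; they are not `brolgar`'s implementations.

```r
# Hypothetical toy data: 6 observations of 3 individuals
toy <- data.frame(
  id = c(31, 31, 31, 36, 36, 53),
  xp = c(0.015, 0.715, 1.73, 0.315, 0.983, 0.5)
)

# Total number of observations (rows), analogous to n_obs()
nrow(toy)
#> [1] 6

# Number of distinct individuals, analogous to n_keys()
length(unique(toy$id))
#> [1] 3
```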

Finding the number of observations per `key`.

You can also use `n_obs()` inside `features()` to return the number of observations for each key:

```r
wages %>%
  features(ln_wages, n_obs)
#> # A tibble: 888 × 2
#>       id n_obs
#>    <int> <int>
#>  1    31     8
#>  2    36    10
#>  3    53     8
#>  4   122    10
#>  5   134    12
#>  6   145     9
#>  7   155    11
#>  8   173     6
#>  9   206     3
#> 10   207    11
#> # ℹ 878 more rows
```

This returns a data frame with one row per key, containing the number of observations for each key.

This can be further summarised to see how the number of observations varies across keys:

```r
library(ggplot2)
wages %>%
  features(ln_wages, n_obs) %>%
  ggplot(aes(x = n_obs)) +
  geom_bar()
```

```r
wages %>%
  features(ln_wages, n_obs) %>%
  summary()
#>        id            n_obs
#>  Min.   :   31   Min.   : 1.000
#>  1st Qu.: 3332   1st Qu.: 5.000
#>  Median : 6666   Median : 8.000
#>  Mean   : 6343   Mean   : 7.209
#>  3rd Qu.: 9194   3rd Qu.: 9.000
#>  Max.   :12543   Max.   :13.000
```
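The two summaries above work in two steps: count observations per key, then count how many keys share each count (the bar heights drawn by `geom_bar()`). A base-R sketch of both steps on hypothetical toy data, not `brolgar`'s implementation:

```r
# Hypothetical toy data: ids repeated once per observation
toy <- data.frame(
  id = c(31, 31, 31, 36, 36, 53, 53, 53, 122)
)

# Step 1: observations per key, one row per id, like features(ln_wages, n_obs)
obs_per_key <- as.data.frame(table(id = toy$id), responseName = "n_obs")
obs_per_key

# Step 2: how many keys have each number of observations
# (the counts that geom_bar() draws as bar heights)
table(obs_per_key$n_obs)
```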
