Movatterモバイル変換


[0]ホーム

URL:


Introduction to fable

library(fable)library(tsibble)library(dplyr)

The fable package provides some commonly used univariate andmultivariate time series forecasting models which can be used with tidytemporal data in the tsibble format. These models are used within aconsistent and tidy modelling framework, allowing several models to beestimated, compared, combined, forecasted and otherwise worked withacross many time series.

Suppose we wanted to forecast the number of domestic travellers toMelbourne, Australia. In thetsibble::tourism data set,this can be further broken down into 4 reasons of travel: “business”,“holiday”, “visiting friends and relatives” and “other reasons”. Thefirst observation from each series are shown below.

tourism_melb<- tourism%>%filter(Region=="Melbourne")tourism_melb%>%group_by(Purpose)%>%slice(1)#> # A tsibble: 4 x 5 [1Q]#> # Key:       Region, State, Purpose [4]#> # Groups:    Purpose [4]#>   Quarter Region    State    Purpose  Trips#>     <qtr> <chr>     <chr>    <chr>    <dbl>#> 1 1998 Q1 Melbourne Victoria Business 405.#> 2 1998 Q1 Melbourne Victoria Holiday  428.#> 3 1998 Q1 Melbourne Victoria Other     79.9#> 4 1998 Q1 Melbourne Victoria Visiting 666.

The variable that we’d like to estimate is the number of overnighttrips (in thousands) represented by theTrips variable. Aplot of the data reveals that some trends and weak seasonality areapparent.

tourism_melb%>%autoplot(Trips)

Two widely used models available in this package are ETS and ARIMA.These models are specified using a compact formula representation (muchlike cross-sectional linear models usinglm()). Theresponse variable (Trips) and any transformations areincluded on the left, while the model specification is on the right ofthe formula. When a model is not fully specified (or if the formula’sright side is missing completely), the unspecified components will bechosen automatically.

Suppose we think that the ETS model must have an additive trend, andwant the other elements to be chosen automatically. This model would bespecified usingETS(Trips ~ trend("A")). Similarly, acompletely automatic ARIMA model (much likeauto.arima()from theforecast package) can be specified usingARIMA(Trips). Themodel() function is used toestimate these model specifications on a particular dataset, and willreturn a “mable” (model table).

fit<- tourism_melb%>%model(ets =ETS(Trips~trend("A")),arima =ARIMA(Trips)  )fit#> # A mable: 4 x 5#> # Key:     Region, State, Purpose [4]#>   Region    State    Purpose           ets                             arima#>   <chr>     <chr>    <chr>         <model>                           <model>#> 1 Melbourne Victoria Business <ETS(A,A,A)> <ARIMA(0,1,2)(1,0,1)[4] w/ drift>#> 2 Melbourne Victoria Holiday  <ETS(M,A,A)>           <ARIMA(0,1,1) w/ drift>#> 3 Melbourne Victoria Other    <ETS(A,A,N)>           <ARIMA(0,1,1) w/ drift>#> 4 Melbourne Victoria Visiting <ETS(M,A,A)>          <ARIMA(0,1,1)(1,0,2)[4]>

A mable contains a row for each time series (uniquely identified bythe key variables), and a column for each model specification. A modelis contained within the cells of each model column. In the example abovewe can see that the all four ETS models have an additive trend, and theerror and seasonality have been chosen automatically. Similarly, theARIMA model varies between time series as it has been automaticallyselected.

Thecoef() ortidy() function is used toextract the coefficients from the models. It’s possible to useselect() and other verbs to focus on the coefficients froma particular model.

fit%>%select(Region, State, Purpose, arima)%>%coef()#> # A tibble: 13 x 9#>    Region    State    Purpose .model term  estimate std.error statistic  p.value#>    <chr>     <chr>    <chr>   <chr>  <chr>    <dbl>     <dbl>     <dbl>    <dbl>#>  1 Melbourne Victoria Busine~ arima  ma1     -0.555    0.130     -4.28  5.29e- 5#>  2 Melbourne Victoria Busine~ arima  ma2     -0.233    0.129     -1.81  7.47e- 2#>  3 Melbourne Victoria Busine~ arima  sar1     0.946    0.0634    14.9   1.08e-24#>  4 Melbourne Victoria Busine~ arima  sma1    -0.772    0.145     -5.34  8.81e- 7#>  5 Melbourne Victoria Busine~ arima  cons~    0.192    0.213      0.903 3.69e- 1#>  6 Melbourne Victoria Holiday arima  ma1     -0.931    0.0851   -10.9   1.77e-17#>  7 Melbourne Victoria Holiday arima  cons~    3.65     0.571      6.39  1.06e- 8#>  8 Melbourne Victoria Other   arima  ma1     -0.750    0.0708   -10.6   8.19e-17#>  9 Melbourne Victoria Other   arima  cons~    1.24     0.640      1.93  5.70e- 2#> 10 Melbourne Victoria Visiti~ arima  ma1     -0.838    0.0652   -12.8   5.03e-21#> 11 Melbourne Victoria Visiti~ arima  sar1     0.659    0.193      3.41  1.03e- 3#> 12 Melbourne Victoria Visiti~ arima  sma1    -0.402    0.206     -1.95  5.47e- 2#> 13 Melbourne Victoria Visiti~ arima  sma2     0.322    0.143      2.26  2.68e- 2

Theglance() function provides a one-row summary of eachmodel, and commonly includes descriptions of the model’s fit such as theresidual variance and information criteria. Be wary though, asinformation criteria (AIC, AICc, BIC) are only comparable between thesame model class and only if those models share the same response (aftertransformations and differencing).

fit%>%glance()#> # A tibble: 8 x 14#>   Region    State   Purpose .model  sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE#>   <chr>     <chr>   <chr>   <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 Melbourne Victor~ Busine~ ets    3.53e+3   -498. 1014. 1016. 1035. 3180. 3520.#> 2 Melbourne Victor~ Busine~ arima  3.67e+3   -435.  882.  883.  896.   NA    NA#> 3 Melbourne Victor~ Holiday ets    1.10e-2   -487.  992.  994. 1013. 2548. 2574.#> 4 Melbourne Victor~ Holiday arima  3.07e+3   -429.  864.  865.  872.   NA    NA#> 5 Melbourne Victor~ Other   ets    4.97e+2   -422.  853.  854.  865.  472.  512.#> 6 Melbourne Victor~ Other   arima  4.89e+2   -356.  718.  719.  725.   NA    NA#> 7 Melbourne Victor~ Visiti~ ets    1.09e-2   -503. 1024. 1026. 1045. 3714. 3860.#> 8 Melbourne Victor~ Visiti~ arima  4.24e+3   -442.  893.  894.  905.   NA    NA#> # i 3 more variables: MAE <dbl>, ar_roots <list>, ma_roots <list>

If you’re working with a single model (or want to look at one modelin particular), thereport() function gives a familiar andnicely formatted model-specific display.

fit%>%filter(Purpose=="Holiday")%>%select(ets)%>%report()#> Series: Trips#> Model: ETS(M,A,A)#>   Smoothing parameters:#>     alpha = 0.03084501#>     beta  = 0.03084499#>     gamma = 0.0001000967#>#>   Initial states:#>      l[0]      b[0]     s[0]    s[-1]     s[-2]    s[-3]#>  424.0777 -2.535481 -26.7441 4.256618 -10.10668 32.59417#>#>   sigma^2:  0.011#>#>       AIC      AICc       BIC#>  991.7305  994.3020 1013.1688

The fitted values and residuals from a model can obtained usingfitted() andresiduals() respectively.Additionally, theaugment() function may be moreconvenient, which provides the original data along with both fittedvalues and their residuals.

fit%>%augment()#> # A tsibble: 640 x 9 [1Q]#> # Key:       Region, State, Purpose, .model [8]#>    Region    State    Purpose  .model Quarter Trips .fitted  .resid  .innov#>    <chr>     <chr>    <chr>    <chr>    <qtr> <dbl>   <dbl>   <dbl>   <dbl>#>  1 Melbourne Victoria Business ets    1998 Q1  405.    396.    9.54    9.54#>  2 Melbourne Victoria Business ets    1998 Q2  408.    483.  -75.1   -75.1#>  3 Melbourne Victoria Business ets    1998 Q3  486.    487.   -1.13   -1.13#>  4 Melbourne Victoria Business ets    1998 Q4  429.    454.  -25.2   -25.2#>  5 Melbourne Victoria Business ets    1999 Q1  361.    391.  -30.3   -30.3#>  6 Melbourne Victoria Business ets    1999 Q2  486.    466.   19.9    19.9#>  7 Melbourne Victoria Business ets    1999 Q3  359.    492. -133.   -133.#>  8 Melbourne Victoria Business ets    1999 Q4  426.    424.    1.49    1.49#>  9 Melbourne Victoria Business ets    2000 Q1  495.    364.  130.    130.#> 10 Melbourne Victoria Business ets    2000 Q2  499.    477.   22.0    22.0#> # i 630 more rows

To compare how well the models fit the data, we can consider somecommon accuracy measures. It seems that on the training set the ETSmodel out-performs ARIMA for the series where travellers are on holiday,business, and visiting friends and relatives. TheEvaluating modellingaccuracy chapter from theForecasting: Principles andPractices (3rd Ed.) textbook provides more detail in howmodelling and forecasting accuracy is evaluated.

fit%>%accuracy()%>%arrange(MASE)#> # A tibble: 8 x 13#>   Region  State Purpose .model .type     ME  RMSE   MAE    MPE  MAPE  MASE RMSSE#>   <chr>   <chr> <chr>   <chr>  <chr>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>#> 1 Melbou~ Vict~ Holiday ets    Trai~  4.67   50.5  37.2  0.217  7.29 0.675 0.697#> 2 Melbou~ Vict~ Busine~ ets    Trai~  3.31   56.4  42.9 -0.753  9.31 0.691 0.740#> 3 Melbou~ Vict~ Busine~ arima  Trai~  2.54   58.2  46.0 -1.17  10.1  0.741 0.765#> 4 Melbou~ Vict~ Holiday arima  Trai~ -4.64   54.3  41.4 -2.44   8.46 0.752 0.751#> 5 Melbou~ Vict~ Other   arima  Trai~ -0.344  21.7  17.0 -6.16  19.5  0.763 0.772#> 6 Melbou~ Vict~ Other   ets    Trai~ -0.142  21.7  17.0 -5.97  19.6  0.767 0.773#> 7 Melbou~ Vict~ Visiti~ ets    Trai~  8.17   60.9  51.4  0.433  8.28 0.819 0.782#> 8 Melbou~ Vict~ Visiti~ arima  Trai~  6.89   63.1  51.7  0.106  8.44 0.825 0.809#> # i 1 more variable: ACF1 <dbl>

Forecasts from these models can be produced directly as our specifiedmodels do not require any additional data.

fc<- fit%>%forecast(h ="5 years")fc#> # A fable: 160 x 7 [1Q]#> # Key:     Region, State, Purpose, .model [8]#>    Region    State    Purpose  .model Quarter#>    <chr>     <chr>    <chr>    <chr>    <qtr>#>  1 Melbourne Victoria Business ets    2018 Q1#>  2 Melbourne Victoria Business ets    2018 Q2#>  3 Melbourne Victoria Business ets    2018 Q3#>  4 Melbourne Victoria Business ets    2018 Q4#>  5 Melbourne Victoria Business ets    2019 Q1#>  6 Melbourne Victoria Business ets    2019 Q2#>  7 Melbourne Victoria Business ets    2019 Q3#>  8 Melbourne Victoria Business ets    2019 Q4#>  9 Melbourne Victoria Business ets    2020 Q1#> 10 Melbourne Victoria Business ets    2020 Q2#> # i 150 more rows#> # i 2 more variables: Trips <dist>, .mean <dbl>

The resulting forecasts are contained in a “fable” (forecast table),and both point forecasts and forecast distributions are available in thetable for the next five years. Confidence intervals can be extractedfrom the distribution using thehilo() function.

Thehilo() function can also be used on fable objects,which allows you to extract multiple intervals at once.

fc%>%hilo(level =c(80,95))#> # A tsibble: 160 x 9 [1Q]#> # Key:       Region, State, Purpose, .model [8]#>    Region    State    Purpose  .model Quarter#>    <chr>     <chr>    <chr>    <chr>    <qtr>#>  1 Melbourne Victoria Business ets    2018 Q1#>  2 Melbourne Victoria Business ets    2018 Q2#>  3 Melbourne Victoria Business ets    2018 Q3#>  4 Melbourne Victoria Business ets    2018 Q4#>  5 Melbourne Victoria Business ets    2019 Q1#>  6 Melbourne Victoria Business ets    2019 Q2#>  7 Melbourne Victoria Business ets    2019 Q3#>  8 Melbourne Victoria Business ets    2019 Q4#>  9 Melbourne Victoria Business ets    2020 Q1#> 10 Melbourne Victoria Business ets    2020 Q2#> # i 150 more rows#> # i 4 more variables: Trips <dist>, .mean <dbl>, `80%` <hilo>, `95%` <hilo>

You can also see a plot of the forecasts usingautoplot(). To see the historical data along with theforecasts you can provide it as the first argument to the function.

fc%>%autoplot(tourism_melb)

More model methods may be supported by particular models, includingthe ability torefit() the model to new data,stream() in new data to extend the fit,generate() simulated paths from a model,interpolate() missing values, extractcomponents() from the fitted model, and display the model’sequation().

More information about modelling time series and using the fablepackage can be found inForecasting: Principles andPractices (3rd Ed.) and in thepkgdown site.


[8]ページ先頭

©2009-2025 Movatter.jp