Movatterモバイル変換


[0]ホーム

URL:


mvgam R package logoStan Logo

mvgam

MultiVariate (Dynamic)GeneralizedAdditiveModels

R-CMD-checkCoverage statusDocumentationMethods in Ecology & EvolutionCRAN VersionCRAN Downloads

Themvgam 📦 fits Bayesian Dynamic Generalized AdditiveModels (DGAMs) that can include highly flexible nonlinear predictoreffects, latent variables and multivariate time series models. Thepackage does this by relying on functionalities from the impressivebrms andmgcv packages. Parameters are estimatedusing the probabilistic programming languageStan, giving users accessto the most advanced Bayesian inference algorithms available. Thisallowsmvgam to fit a very wide range of models,including:

Installation

You can install the stable package version fromCRANusing:install.packages('mvgam'), or install the latestdevelopment version using:devtools::install_github("nicholasjclark/mvgam"). You willalso need a working version ofStan installed (along witheitherrstan and/orcmdstanr). Please refer toinstallation links forStan withrstanhere, or forStan withcmdstandrhere.

Cheatsheet

mvgam usage cheatsheet

A simple example

We can explore the package’s primary functions using one of it’sbuilt-in datasets. Useplot_mvgam_series() to inspectfeatures for time series fromthe PortalProject, which represent counts of baited captures for four desertrodent species over time (see?portal_data for more detailsabout the dataset).

data(portal_data)plot_mvgam_series(  data = portal_data,   y = 'captures',  series = 'all')

Visualizing multivariate time series in R using mvgam

plot_mvgam_series(  data = portal_data,   y = 'captures',  series = 1)

Visualizing multivariate time series in R using mvgam

plot_mvgam_series(  data = portal_data,   y = 'captures',  series = 4)

Visualizing multivariate time series in R using mvgam

These plots show that the time series are count responses, withmissing data, many zeroes, seasonality and temporal autocorrelation allpresent. These features make time series analysis and forecasting verydifficult using conventional software. Butmvgam shines inthese tasks.

For most forecasting exercises, we’ll want to split the data intotraining and testing folds:

data_train <- portal_data %>%  dplyr::filter(time <= 60)data_test <- portal_data %>%  dplyr::filter(time > 60 &                  time <= 65)

Formulate anmvgam model; this model fits a State-SpaceGAM in which each species has its own intercept, linear association withndvi_ma12 and potentially nonlinear association withmintemp. These effects are estimated jointly with a fulltime series model for the temporal dynamics (in this case a VectorAutoregressive process). We assume the outcome follows a Poissondistribution and will condition the model inStan usingMCMC sampling withCmdstan:

mod <- mvgam(  # Observation model is empty as we don't have any  # covariates that impact observation error  formula = captures ~ 0,    # Process model contains varying intercepts,   # varying slopes of ndvi_ma12 and varying smooths   # of mintemp for each series.   # Temporal dynamics are modelled with a Vector   # Autoregression (VAR(1))  trend_formula = ~     trend +    s(trend, bs = 're', by = ndvi_ma12) +    s(mintemp, bs = 'bs', by = trend) - 1,  trend_model = VAR(cor = TRUE),    # Obvservations are conditionally Poisson  family = poisson(),  # Condition on the training data  data = data_train,  backend = 'cmdstanr')

Usingprint() returns a quick summary of the object:

mod#> GAM observation formula:#> captures ~ 1#> #> GAM process formula:#> ~trend + s(trend, bs = "re", by = ndvi_ma12) + s(mintemp, bs = "bs", #>     by = trend) - 1#> #> Family:#> poisson#> #> Link function:#> log#> #> Trend model:#> VAR(cor = TRUE)#> #> #> N latent factors:#> 4 #> #> N series:#> 4 #> #> N timepoints:#> 60 #> #> Status:#> Fitted using Stan #> 4 chains, each with iter = 2000; warmup = 1500; thin = 1 #> Total post-warmup draws = 2000

Split Rhat and Effective Sample Size diagnostics show goodconvergence of model estimates

mcmc_plot(mod,           type = 'rhat_hist')#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Rhats of parameters estimated with Stan in mvgam

mcmc_plot(mod,           type = 'neff_hist')#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Effective sample sizes of parameters estimated with Stan in mvgam

Useconditional_effects() for a quick visualisation ofthe main terms in model formulae

conditional_effects(mod,                     type = 'link')

Plotting GAM effects in mvgam and RPlotting GAM effects in mvgam and RPlotting GAM effects in mvgam and R

If you have thegratia package installed, it can also beused to plot partial effects of smooths

require(gratia)draw(mod,      trend_effects = TRUE)

Plotting GAM smooth functions in mvgam using gratia

Or design more targeted plots usingplot_predictions()from themarginaleffects package

plot_predictions(  mod,  condition = c('ndvi_ma12',                'series',                'series'),  type = 'link')

Using marginaleffects and mvgam to plot GAM smooth functions in R

plot_predictions(  mod,  condition = c('mintemp',                'series',                'series'),  type = 'link')

Using marginaleffects and mvgam to plot GAM smooth functions in R

We can also view the model’s posterior predictions for the entireseries (testing and training). Forecasts can be scored using a range ofproper scoring rules. See?score.mvgam_forecast for moredetails

fcs <- forecast(mod,                 newdata = data_test)plot(fcs, series = 1) +  plot(fcs, series = 2) +  plot(fcs, series = 3) +  plot(fcs, series = 4)#> Out of sample DRPS:#> 8.40559#> Out of sample DRPS:#> 5.25875275#> Out of sample DRPS:#> 8.77930325#> Out of sample DRPS:#> 3.47124075

Plotting forecast distributions using mvgam in R

For Vector Autoregressions fit inmvgam, we can inspectimpulse response functions and forecast error variancedecompositions. Theirf() function runs an ImpulseResponse Function (IRF) simulation whereby a positive “shock” isgenerated for a target process at timet = 0. All elseremaining stable, it then monitors how each of the remaining processesin the latent VAR would be expected to respond over the forecast horizonh. The function computes impulse responses for allprocesses in the object and returns them in an array that can be plottedusing the S3plot() function. Here we will use thegeneralized IRF, which makes no assumptions about the order in which theseries appear in the VAR process, and inspect how each process isexpected to respond to a sudden, positive pulse from the other processesover a horizon of 12 timepoints.

irfs <- irf(mod,             h = 12,             orthogonal = FALSE)plot(irfs,      series = 1)

Impulse response functions computed using mvgam in R

plot(irfs,      series = 3)

Impulse response functions computed using mvgam in R

Using the same logic as above, we can inspect forecast error variancedecompositions (FEVDs) for each process usingfevd(). Thistype of analysis asks how orthogonal shocks to all process in the systemcontribute to the variance of forecast uncertainty for a focal processover increasing horizons. In other words, the proportion of the forecastvariance of each latent time series can be attributed to the effects ofthe other series in the VAR process. FEVDs are useful because someshocks may not be expected to cause variations in the short-term but maycause longer-term fluctuations

fevds <- fevd(mod,               h = 12)plot(fevds)

Forecast error variance decompositions computed using mvgam in R

This plot shows that the variance of forecast uncertainty for eachprocess is initially dominated by contributions from that same process(i.e. self-dependent effects) but that effects from other processesbecome more important over increasing forecast horizons. Given what wesaw from the IRF plots above, these long-term contributions frominteractions among the processes makes sense.

Plotting randomized quantile residuals overtime foreach series can give useful information about what might be missing fromthe model. We can use the highly versatilepp_check()function to plot these:

pp_check(  mod,   type = 'resid_ribbon_grouped',  group = 'series',  x = 'time',  ndraws = 200)

When describing the model, it can be helpful to use thehow_to_cite() function to generate a scaffold fordescribing the model and sampling details in scientificcommunications

description <- how_to_cite(mod)description#> Methods text skeleton#> We used the R package mvgam (version 1.1.593; Clark & Wells, 2023) to#>   construct, fit and interrogate the model. mvgam fits Bayesian#>   State-Space models that can include flexible predictor effects in both#>   the process and observation components by incorporating functionalities#>   from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan,#>   2023) packages. To encourage stability and prevent forecast variance#>   from increasing indefinitely, we enforced stationarity of the Vector#>   Autoregressive process following methods described by Heaps (2023) and#>   Clark et al. (2025). The mvgam-constructed model and observed data were#>   passed to the probabilistic programming environment Stan (version#>   2.37.0; Carpenter et al. 2017, Stan Development Team 2025), specifically#>   through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4#>   Hamiltonian Monte Carlo chains for 1500 warmup iterations and 500#>   sampling iterations for joint posterior estimation. Rank normalized#>   split Rhat (Vehtari et al. 2021) and effective sample sizes were used to#>   monitor convergence.#> #> Primary references#> Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models#>   (DGAMs) for forecasting discrete ecological time series. Methods in#>   Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974#> Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models#>   Using Stan. Journal of Statistical Software, 80(1), 1-28.#>   doi:10.18637/jss.v080.i01#> Wood, SN (2017). Generalized Additive Models: An Introduction with R#>   (2nd edition). Chapman and Hall/CRC.#> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R#>   Package splines2. Journal of Data Science, 19(3), 498-517.#>   doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.#> Heaps, SE (2023). Enforcing stationarity through the prior in vector#>   autoregressions. Journal of Computational and Graphical Statistics 32,#>   74-83.#> Clark NJ, Ernest SKM, Senyondo H, Simonis J, White EP, Yenni GM,#>   Karunarathna KANK (2025). Beyond single-species models: leveraging#>   multispecies forecasts to navigate the dynamics of ecological#>   predictability. PeerJ 13:e18929.#> Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M,#>   Brubaker M, Guo J, Li P and Riddell A (2017). Stan: A probabilistic#>   programming language. Journal of Statistical Software 76.#> Gabry J, Cesnovar R, Johnson A, and Bronder S (2025). cmdstanr: R#>   Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/,#>   https://discourse.mc-stan.org.#> Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021).#>   Rank-normalization, folding, and localization: An improved Rhat for#>   assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2)#>   667-718. https://doi.org/10.1214/20-BA1221.#> #> Other useful references#> Arel-Bundock V, Greifer N, and Heiss A (2024). How to interpret#>   statistical models using marginaleffects for R and Python. Journal of#>   Statistical Software, 111(9), 1-32.#>   https://doi.org/10.18637/jss.v111.i09#> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019).#>   Visualization in Bayesian workflow. Journal of the Royal Statatistical#>   Society A, 182, 389-402. doi:10.1111/rssa.12378.#> Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model#>   evaluation using leave-one-out cross-validation and WAIC. Statistics and#>   Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.#> Burkner PC, Gabry J, and Vehtari A. (2020). Approximate leave-future-out#>   cross-validation for Bayesian time series models. Journal of Statistical#>   Computation and Simulation, 90(14), 2499-2523.#>   https://doi.org/10.1080/00949655.2020.1783262

The post-processing methods we have shown above are just the tip ofthe iceberg. For a full list of methods to apply on fitted modelobjects, typemethods(class = "mvgam").

Extended observationfamilies

mvgam was originally designed to analyse and forecastnon-negative integer-valued data. But further development ofmvgam has resulted in support for a growing number ofobservation families. Currently, the package can handle data for thefollowing:

See??mvgam_families for more information. Below is asimple example for simulating and modelling proportional data withBeta observations over a set of seasonal series withindependent Gaussian Process dynamic trends:

set.seed(100)data <- sim_mvgam(  family = betar(),  T = 80,  trend_model = GP(),  prop_trend = 0.5,  seasonality = "shared")plot_mvgam_series(  data = data$data_train,   series = "all")

mod <- mvgam(  y ~ s(season, bs = "cc", k = 7) +    s(season, by = series, m = 1, k = 5),  trend_model = GP(),  data = data$data_train,  newdata = data$data_test,  family = betar())

Inspect the summary to see that the posterior now also containsestimates for theBeta precision parametersϕ.

summary(mod,         include_betas = FALSE)#> GAM formula:#> y ~ s(season, bs = "cc", k = 7) + s(season, by = series, m = 1, #>     k = 5)#> #> Family:#> beta#> #> Link function:#> logit#> #> Trend model:#> GP()#> #> N series:#> 3 #> #> N timepoints:#> 80 #> #> Status:#> Fitted using Stan #> 4 chains, each with iter = 1000; warmup = 500; thin = 1 #> Total post-warmup draws = 2000#> #> Observation precision parameter estimates:#>        2.5%  50% 97.5% Rhat n_eff#> phi[1]  7.8 12.0  18.0    1  1829#> phi[2]  5.6  8.6  13.0    1  1023#> phi[3]  4.1  6.0   8.7    1  1404#> #> GAM coefficient (beta) estimates:#>             2.5%  50% 97.5% Rhat n_eff#> (Intercept) 0.11 0.45   0.7 1.01   602#> #> Approximate significance of GAM smooths:#>                             edf Ref.df Chi.sq p-value  #> s(season)                3.9071      5  9.792  0.0653 .#> s(season):seriesseries_1 1.0934      4 11.307  0.2695  #> s(season):seriesseries_2 2.5629      4  2.227  0.4544  #> s(season):seriesseries_3 0.8565      4  6.556  0.5358  #> ---#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> marginal deviation:#>              2.5%  50% 97.5% Rhat n_eff#> alpha_gp[1] 0.150 0.40  0.88 1.00   828#> alpha_gp[2] 0.570 0.93  1.50 1.00  1018#> alpha_gp[3] 0.052 0.40  0.92 1.01   672#> #> length scale:#>           2.5%  50% 97.5% Rhat n_eff#> rho_gp[1]  1.2  3.6    11 1.00  1482#> rho_gp[2]  3.0 12.0    30 1.01   367#> rho_gp[3]  1.3  4.9    26 1.00   532#> #> Stan MCMC diagnostics:#> ✔ No issues with effective samples per iteration#> ✔ Rhat looks good for all parameters#> ✔ No issues with divergences#> ✔ No issues with maximum tree depth#> #> Samples were drawn using sampling(hmc). For each parameter, n_eff is a#>   crude measure of effective sample size, and Rhat is the potential scale#>   reduction factor on split MCMC chains (at convergence, Rhat = 1)#> #> Use how_to_cite() to get started describing this model

Plot the hindcast and forecast distributions for each series

library(patchwork)fc <- forecast(mod)wrap_plots(  plot(fc, series = 1),  plot(fc, series = 2),  plot(fc, series = 3),  ncol = 2)

There are many more extended uses ofmvgam, includingthe ability to fit hierarchical State-Space GAMs that include dynamicand spatially varying coefficient models, dynamic factors, Joint SpeciesDistribution Models and much more. See thepackage documentation for more details.mvgam can also be used to generate all necessary datastructures and modelling code necessary to fit DGAMs usingStan. This can be helpful if users wish to make changes tothe model to better suit their own bespoke research / analysis goals.TheStan Discourse is a helpful place totroubleshoot.

Citingmvgam andrelated software

When using any software please make sure to appropriately acknowledgethe hard work that developers and maintainers put into making thesepackages available. Citations are currently the best way to formallyacknowledge this work (but feel free to ⭐ this repo as well).

When usingmvgam, please cite the following:

Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models(DGAMs) for forecasting discrete ecological time series.Methods inEcology and Evolution. DOI:https://doi.org/10.1111/2041-210X.13974

Asmvgam acts as an interface toStan,please additionally cite:

Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B.,Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017). Stan:A probabilistic programming language.Journal of StatisticalSoftware. 76(1). DOI:https://doi.org/10.18637/jss.v076.i01

mvgam relies on several otherR packagesand, of course, onR itself. Usehow_to_cite()to simplify the process of finding appropriate citations for yoursoftware setup.

Getting help

If you encounter a clear bug, please file an issue with a minimalreproducible example onGitHub. Pleasealso feel free to use themvgamDiscussion Board to hunt for or post other discussion topics relatedto the package, and do check out themvgamChangelog for any updates about recent upgrades that the package hasincorporated.

Other resources

A series ofvignettes cover data formatting, forecasting and severalextended case studies of DGAMs. A number of other examples,including some step-by-step introductory webinars, have also beencompiled:

Interested in contributing?

I’m actively seeking PhD students and other researchers to work inthe areas of ecological forecasting, multivariate model evaluation anddevelopment ofmvgam. Please reach out if you areinterested (n.clark’at’uq.edu.au). Other contributions are also verywelcome, but please seeTheContributor Instructions for general guidelines. Note that byparticipating in this project you agree to abide by the terms of itsContributor Code ofConduct.

License

Themvgam project is licensed under anMITopen source license


[8]ページ先頭

©2009-2025 Movatter.jp