Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork17
{mvgam} R 📦 to fit Dynamic Bayesian Generalized Additive Models for multivariate modeling and forecasting
License
Unknown, MIT licenses found
Licenses found
nicholasjclark/mvgam
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
MultiVariate (Dynamic)GeneralizedAdditiveModels
Themvgam 📦 fits Bayesian Dynamic Generalized Additive Models (DGAMs)that can include highly flexible nonlinear predictor effects, latentvariables and multivariate time series models. The package does this byrelying on functionalities from the impressivebrms andmgcv packages. Parameters are estimatedusing the probabilistic programming languageStan, giving users access to the most advancedBayesian inference algorithms available. This allowsmvgam to fit avery wide range of models, including:
- Multivariate State-Space Time Series Models
- Continuous-Time Autoregressive Time Series Models
- Shared Signal Time Series Models
- Dynamic Factor Models
- Hierarchical N-mixture Models
- Hierarchical Generalized Additive Models
- Joint Species Distribution Models
You can install the stable package version fromCRAN using:install.packages('mvgam'), or install the latest development versionusing:devtools::install_github("nicholasjclark/mvgam"). You will alsoneed a working version ofStan installed (along with eitherrstanand/orcmdstanr). Please refer to installation links forStan withrstanhere, or forStan withcmdstandrhere.
We can explore the package’s primary functions using one of it’sbuilt-in datasets. Useplot_mvgam_series() to inspect features fortime series fromthe PortalProject, which represent counts of baited captures for four desertrodent species over time (see?portal_data for more details about thedataset).
data(portal_data)plot_mvgam_series( data = portal_data, y = 'captures', series = 'all')plot_mvgam_series( data = portal_data, y = 'captures', series = 1)plot_mvgam_series( data = portal_data, y = 'captures', series = 4)These plots show that the time series are count responses, with missingdata, many zeroes, seasonality and temporal autocorrelation all present.These features make time series analysis and forecasting very difficultusing conventional software. Butmvgam shines in these tasks.
For most forecasting exercises, we’ll want to split the data intotraining and testing folds:
data_train <- portal_data %>% dplyr::filter(time <= 60)data_test <- portal_data %>% dplyr::filter(time > 60 & time <= 65)Formulate anmvgam model; this model fits a State-Space GAM in whicheach species has its own intercept, linear association withndvi_ma12and potentially nonlinear association withmintemp. These effects areestimated jointly with a full time series model for the temporaldynamics (in this case a Vector Autoregressive process). We assume theoutcome follows a Poisson distribution and will condition the model inStan using MCMC sampling withCmdstan:
mod <- mvgam( # Observation model is empty as we don't have any # covariates that impact observation error formula = captures ~ 0, # Process model contains varying intercepts, # varying slopes of ndvi_ma12 and varying smooths # of mintemp for each series. # Temporal dynamics are modelled with a Vector # Autoregression (VAR(1)) trend_formula = ~ trend + s(trend, bs = 're', by = ndvi_ma12) + s(mintemp, bs = 'bs', by = trend) - 1, trend_model = VAR(cor = TRUE), # Obvservations are conditionally Poisson family = poisson(), # Condition on the training data data = data_train, backend = 'cmdstanr')Usingprint() returns a quick summary of the object:
mod#> GAM observation formula:#> captures ~ 1#> #> GAM process formula:#> ~trend + s(trend, bs = "re", by = ndvi_ma12) + s(mintemp, bs = "bs", #> by = trend) - 1#> #> Family:#> poisson#> #> Link function:#> log#> #> Trend model:#> VAR(cor = TRUE)#> #> #> N latent factors:#> 4 #> #> N series:#> 4 #> #> N timepoints:#> 60 #> #> Status:#> Fitted using Stan #> 4 chains, each with iter = 2000; warmup = 1500; thin = 1 #> Total post-warmup draws = 2000Split Rhat and Effective Sample Size diagnostics show good convergenceof model estimates
mcmc_plot(mod, type = 'rhat_hist')#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.mcmc_plot(mod, type = 'neff_hist')#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.Useconditional_effects() for a quick visualisation of the main termsin model formulae
conditional_effects(mod, type = 'link')If you have thegratia package installed, it can also be used to plotpartial effects of smooths
require(gratia)draw(mod, trend_effects = TRUE)Or design more targeted plots usingplot_predictions() from themarginaleffects package
plot_predictions( mod, condition = c('ndvi_ma12', 'series', 'series'), type = 'link')plot_predictions( mod, condition = c('mintemp', 'series', 'series'), type = 'link')We can also view the model’s posterior predictions for the entire series(testing and training). Forecasts can be scored using a range of properscoring rules. See?score.mvgam_forecast for more details
fcs <- forecast(mod, newdata = data_test)plot(fcs, series = 1) + plot(fcs, series = 2) + plot(fcs, series = 3) + plot(fcs, series = 4)#> Out of sample DRPS:#> 8.40559#> Out of sample DRPS:#> 5.25875275#> Out of sample DRPS:#> 8.77930325#> Out of sample DRPS:#> 3.47124075For Vector Autoregressions fit inmvgam, we can inspectimpulse response functions and forecast error variancedecompositions. Theirf() function runs an Impulse ResponseFunction (IRF) simulation whereby a positive “shock” is generated for atarget process at timet = 0. All else remaining stable, it thenmonitors how each of the remaining processes in the latent VAR would beexpected to respond over the forecast horizonh. The function computesimpulse responses for all processes in the object and returns them in anarray that can be plotted using the S3plot() function. Here we willuse the generalized IRF, which makes no assumptions about the order inwhich the series appear in the VAR process, and inspect how each processis expected to respond to a sudden, positive pulse from the otherprocesses over a horizon of 12 timepoints.
irfs <- irf(mod, h = 12, orthogonal = FALSE)plot(irfs, series = 1)plot(irfs, series = 3)Using the same logic as above, we can inspect forecast error variancedecompositions (FEVDs) for each process usingfevd(). This type ofanalysis asks how orthogonal shocks to all process in the systemcontribute to the variance of forecast uncertainty for a focal processover increasing horizons. In other words, the proportion of the forecastvariance of each latent time series can be attributed to the effects ofthe other series in the VAR process. FEVDs are useful because someshocks may not be expected to cause variations in the short-term but maycause longer-term fluctuations
fevds <- fevd(mod, h = 12)plot(fevds)This plot shows that the variance of forecast uncertainty for eachprocess is initially dominated by contributions from that same process(i.e. self-dependent effects) but that effects from other processesbecome more important over increasing forecast horizons. Given what wesaw from the IRF plots above, these long-term contributions frominteractions among the processes makes sense.
Plotting randomized quantile residuals overtime for each series cangive useful information about what might be missing from the model. Wecan use the highly versatilepp_check() function to plot these:
pp_check( mod, type = 'resid_ribbon_grouped', group = 'series', x = 'time', ndraws = 200)When describing the model, it can be helpful to use thehow_to_cite()function to generate a scaffold for describing the model and samplingdetails in scientific communications
description <- how_to_cite(mod)description#> Methods text skeleton#> We used the R package mvgam (version 1.1.593; Clark & Wells, 2023) to#> construct, fit and interrogate the model. mvgam fits Bayesian#> State-Space models that can include flexible predictor effects in both#> the process and observation components by incorporating functionalities#> from the brms (Burkner 2017), mgcv (Wood 2017) and splines2 (Wang & Yan,#> 2023) packages. To encourage stability and prevent forecast variance#> from increasing indefinitely, we enforced stationarity of the Vector#> Autoregressive process following methods described by Heaps (2023) and#> Clark et al. (2025). The mvgam-constructed model and observed data were#> passed to the probabilistic programming environment Stan (version#> 2.37.0; Carpenter et al. 2017, Stan Development Team 2025), specifically#> through the cmdstanr interface (Gabry & Cesnovar, 2021). We ran 4#> Hamiltonian Monte Carlo chains for 1500 warmup iterations and 500#> sampling iterations for joint posterior estimation. Rank normalized#> split Rhat (Vehtari et al. 2021) and effective sample sizes were used to#> monitor convergence.#> #> Primary references#> Clark, NJ and Wells K (2023). Dynamic Generalized Additive Models#> (DGAMs) for forecasting discrete ecological time series. Methods in#> Ecology and Evolution, 14, 771-784. doi.org/10.1111/2041-210X.13974#> Burkner, PC (2017). brms: An R Package for Bayesian Multilevel Models#> Using Stan. Journal of Statistical Software, 80(1), 1-28.#> doi:10.18637/jss.v080.i01#> Wood, SN (2017). Generalized Additive Models: An Introduction with R#> (2nd edition). Chapman and Hall/CRC.#> Wang W and Yan J (2021). Shape-Restricted Regression Splines with R#> Package splines2. Journal of Data Science, 19(3), 498-517.#> doi:10.6339/21-JDS1020 https://doi.org/10.6339/21-JDS1020.#> Heaps, SE (2023). Enforcing stationarity through the prior in vector#> autoregressions. Journal of Computational and Graphical Statistics 32,#> 74-83.#> Clark NJ, Ernest SKM, Senyondo H, Simonis J, White EP, Yenni GM,#> Karunarathna KANK (2025). Beyond single-species models: leveraging#> multispecies forecasts to navigate the dynamics of ecological#> predictability. PeerJ 13:e18929.#> Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M,#> Brubaker M, Guo J, Li P and Riddell A (2017). Stan: A probabilistic#> programming language. Journal of Statistical Software 76.#> Gabry J, Cesnovar R, Johnson A, and Bronder S (2025). cmdstanr: R#> Interface to 'CmdStan'. https://mc-stan.org/cmdstanr/,#> https://discourse.mc-stan.org.#> Vehtari A, Gelman A, Simpson D, Carpenter B, and Burkner P (2021).#> Rank-normalization, folding, and localization: An improved Rhat for#> assessing convergence of MCMC (with discussion). Bayesian Analysis 16(2)#> 667-718. https://doi.org/10.1214/20-BA1221.#> #> Other useful references#> Arel-Bundock V, Greifer N, and Heiss A (2024). How to interpret#> statistical models using marginaleffects for R and Python. Journal of#> Statistical Software, 111(9), 1-32.#> https://doi.org/10.18637/jss.v111.i09#> Gabry J, Simpson D, Vehtari A, Betancourt M, and Gelman A (2019).#> Visualization in Bayesian workflow. Journal of the Royal Statatistical#> Society A, 182, 389-402. doi:10.1111/rssa.12378.#> Vehtari A, Gelman A, and Gabry J (2017). Practical Bayesian model#> evaluation using leave-one-out cross-validation and WAIC. Statistics and#> Computing, 27, 1413-1432. doi:10.1007/s11222-016-9696-4.#> Burkner PC, Gabry J, and Vehtari A. (2020). Approximate leave-future-out#> cross-validation for Bayesian time series models. Journal of Statistical#> Computation and Simulation, 90(14), 2499-2523.#> https://doi.org/10.1080/00949655.2020.1783262The post-processing methods we have shown above are just the tip of theiceberg. For a full list of methods to apply on fitted model objects,typemethods(class = "mvgam").
mvgam was originally designed to analyse and forecast non-negativeinteger-valued data. But further development ofmvgam has resulted insupport for a growing number of observation families. Currently, thepackage can handle data for the following:
gaussian()for real-valued datastudent_t()for heavy-tailed real-valued datalognormal()for non-negative real-valued dataGamma()for non-negative real-valued databetar()for proportional data on(0,1)bernoulli()for binary datapoisson()for count datanb()for overdispersed count databinomial()for count data with known number of trialsbeta_binomial()for overdispersed count data with known number oftrialsnmix()for count data with imperfect detection (unknown number oftrials)
See??mvgam_families for more information. Below is a simple examplefor simulating and modelling proportional data withBeta observationsover a set of seasonal series with independent Gaussian Process dynamictrends:
set.seed(100)data <- sim_mvgam( family = betar(), T = 80, trend_model = GP(), prop_trend = 0.5, seasonality = "shared")plot_mvgam_series( data = data$data_train, series = "all")mod <- mvgam( y ~ s(season, bs = "cc", k = 7) + s(season, by = series, m = 1, k = 5), trend_model = GP(), data = data$data_train, newdata = data$data_test, family = betar())Inspect the summary to see that the posterior now also containsestimates for theBeta precision parametersϕ.
summary(mod, include_betas = FALSE)#> GAM formula:#> y ~ s(season, bs = "cc", k = 7) + s(season, by = series, m = 1, #> k = 5)#> #> Family:#> beta#> #> Link function:#> logit#> #> Trend model:#> GP()#> #> N series:#> 3 #> #> N timepoints:#> 80 #> #> Status:#> Fitted using Stan #> 4 chains, each with iter = 1000; warmup = 500; thin = 1 #> Total post-warmup draws = 2000#> #> Observation precision parameter estimates:#> 2.5% 50% 97.5% Rhat n_eff#> phi[1] 7.8 12.0 18.0 1 1829#> phi[2] 5.6 8.6 13.0 1 1023#> phi[3] 4.1 6.0 8.7 1 1404#> #> GAM coefficient (beta) estimates:#> 2.5% 50% 97.5% Rhat n_eff#> (Intercept) 0.11 0.45 0.7 1.01 602#> #> Approximate significance of GAM smooths:#> edf Ref.df Chi.sq p-value #> s(season) 3.9071 5 9.792 0.0653 .#> s(season):seriesseries_1 1.0934 4 11.307 0.2695 #> s(season):seriesseries_2 2.5629 4 2.227 0.4544 #> s(season):seriesseries_3 0.8565 4 6.556 0.5358 #> ---#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1#> #> marginal deviation:#> 2.5% 50% 97.5% Rhat n_eff#> alpha_gp[1] 0.150 0.40 0.88 1.00 828#> alpha_gp[2] 0.570 0.93 1.50 1.00 1018#> alpha_gp[3] 0.052 0.40 0.92 1.01 672#> #> length scale:#> 2.5% 50% 97.5% Rhat n_eff#> rho_gp[1] 1.2 3.6 11 1.00 1482#> rho_gp[2] 3.0 12.0 30 1.01 367#> rho_gp[3] 1.3 4.9 26 1.00 532#> #> Stan MCMC diagnostics:#> ✔ No issues with effective samples per iteration#> ✔ Rhat looks good for all parameters#> ✔ No issues with divergences#> ✔ No issues with maximum tree depth#> #> Samples were drawn using sampling(hmc). For each parameter, n_eff is a#> crude measure of effective sample size, and Rhat is the potential scale#> reduction factor on split MCMC chains (at convergence, Rhat = 1)#> #> Use how_to_cite() to get started describing this modelPlot the hindcast and forecast distributions for each series
library(patchwork)fc <- forecast(mod)wrap_plots( plot(fc, series = 1), plot(fc, series = 2), plot(fc, series = 3), ncol = 2)There are many more extended uses ofmvgam, including the ability tofit hierarchical State-Space GAMs that include dynamic and spatiallyvarying coefficient models, dynamic factors, Joint Species DistributionModels and much more. See thepackage documentation for more details.mvgam canalso be used to generate all necessary data structures and modellingcode necessary to fit DGAMs usingStan. This can be helpful if userswish to make changes to the model to better suit their own bespokeresearch / analysis goals. TheStan Discourse is a helpful place totroubleshoot.
When using any software please make sure to appropriately acknowledgethe hard work that developers and maintainers put into making thesepackages available. Citations are currently the best way to formallyacknowledge this work (but feel free to ⭐ this repo as well).
When usingmvgam, please cite the following:
Clark, N.J. and Wells, K. (2023). Dynamic Generalized Additive Models(DGAMs) for forecasting discrete ecological time series.Methods inEcology and Evolution. DOI:https://doi.org/10.1111/2041-210X.13974
Asmvgam acts as an interface toStan, please additionally cite:
Carpenter B., Gelman A., Hoffman M. D., Lee D., Goodrich B.,Betancourt M., Brubaker M., Guo J., Li P., and Riddell A. (2017).Stan: A probabilistic programming language.Journal of StatisticalSoftware. 76(1). DOI:https://doi.org/10.18637/jss.v076.i01
mvgam relies on several otherR packages and, of course, onRitself. Usehow_to_cite() to simplify the process of findingappropriate citations for your software setup.
If you encounter a clear bug, please file an issue with a minimalreproducible example onGitHub. Please alsofeel free to use themvgam DiscussionBoard to hunt foror post other discussion topics related to the package, and do check outthemvgamChangelog forany updates about recent upgrades that the package has incorporated.
A series ofvignettes cover data formatting, forecasting and severalextended case studies of DGAMs. A number of other examples,including some step-by-step introductory webinars, have also beencompiled:
- Time series in R and Stan using the
mvgampackage - Ecological Forecasting with Dynamic Generalized AdditiveModels
- Distributed lags (and hierarchical distributed lags)using
mgcvandmvgam - State-Space Vector Autoregressions in
mvgam - Ecological Forecasting with Dynamic GAMs; a tutorial anddetailed case study
- Incorporating time-varying seasonality in forecastmodels
I’m actively seeking PhD students and other researchers to work in theareas of ecological forecasting, multivariate model evaluation anddevelopment ofmvgam. Please reach out if you are interested(n.clark’at’uq.edu.au). Other contributions are also very welcome, butplease seeThe ContributorInstructionsfor general guidelines. Note that by participating in this project youagree to abide by the terms of itsContributor Code ofConduct.
Themvgam project is licensed under anMIT open source license
About
{mvgam} R 📦 to fit Dynamic Bayesian Generalized Additive Models for multivariate modeling and forecasting
Topics
Resources
License
Unknown, MIT licenses found
Licenses found
Code of conduct
Contributing
Uh oh!
There was an error while loading.Please reload this page.
Stars
Watchers
Forks
Sponsor this project
Uh oh!
There was an error while loading.Please reload this page.
Packages0
Uh oh!
There was an error while loading.Please reload this page.
Contributors5
Uh oh!
There was an error while loading.Please reload this page.




















