| Type: | Package |
| Title: | Efficient Leave-One-Out Cross-Validation and WAIC for BayesianModels |
| Version: | 2.8.0 |
| Date: | 2024-07-03 |
| Maintainer: | Jonah Gabry <jsg2201@columbia.edu> |
| URL: | https://mc-stan.org/loo/,https://discourse.mc-stan.org |
| BugReports: | https://github.com/stan-dev/loo/issues |
| Description: | Efficient approximate leave-one-out cross-validation (LOO) for Bayesian models fit using Markov chain Monte Carlo, as described in Vehtari, Gelman, and Gabry (2017) <doi:10.1007/s11222-016-9696-4>. The approximation uses Pareto smoothed importance sampling (PSIS), a new procedure for regularizing importance weights. As a byproduct of the calculations, we also obtain approximate standard errors for estimated predictive errors and for the comparison of predictive errors between models. The package also provides methods for using stacking and other model weighting techniques to average Bayesian predictive distributions. |
| License: | GPL (≥ 3) |
| LazyData: | TRUE |
| Depends: | R (≥ 3.1.2) |
| Imports: | checkmate, matrixStats (≥ 0.52), parallel, posterior (≥1.5.0), stats |
| Suggests: | bayesplot (≥ 1.7.0), brms (≥ 2.10.0), ggplot2, graphics,knitr, rmarkdown, rstan, rstanarm (≥ 2.19.0), rstantools,spdep, testthat (≥ 2.1.0) |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| SystemRequirements: | pandoc (>= 1.12.3), pandoc-citeproc |
| RoxygenNote: | 7.3.1 |
| NeedsCompilation: | no |
| Packaged: | 2024-07-03 16:26:26 UTC; jgabry |
| Author: | Aki Vehtari [aut], Jonah Gabry [cre, aut], Måns Magnusson [aut], Yuling Yao [aut], Paul-Christian Bürkner [aut], Topi Paananen [aut], Andrew Gelman [aut], Ben Goodrich [ctb], Juho Piironen [ctb], Bruno Nicenboim [ctb], Leevi Lindgren [ctb] |
| Repository: | CRAN |
| Date/Publication: | 2024-07-03 18:50:02 UTC |
Efficient LOO-CV and WAIC for Bayesian models
Description
Stan Development Team
This package implements the methods described in Vehtari, Gelman, andGabry (2017), Vehtari, Simpson, Gelman, Yao, and Gabry (2024), andYao et al. (2018). To get started see theloo packagevignettes, theloo() function for efficient approximate leave-one-outcross-validation (LOO-CV), thepsis() function for the Paretosmoothed importance sampling (PSIS) algorithm, orloo_model_weights() for an implementation of Bayesian stacking ofpredictive distributions from multiple models.
Details
Leave-one-out cross-validation (LOO-CV) and the widely applicableinformation criterion (WAIC) are methods for estimating pointwiseout-of-sample prediction accuracy from a fitted Bayesian model using thelog-likelihood evaluated at the posterior simulations of the parametervalues. LOO-CV and WAIC have various advantages over simpler estimates ofpredictive error such as AIC and DIC but are less used in practice becausethey involve additional computational steps. This package implements thefast and stable computations for approximate LOO-CV laid out in Vehtari,Gelman, and Gabry (2017). From existing posterior simulation draws, wecompute LOO-CV using Pareto smoothed importance sampling (PSIS; Vehtari,Simpson, Gelman, Yao, and Gabry, 2024), a new procedure for stabilizingand diagnosing importance weights. As a byproduct of our calculations,we also obtain approximate standard errors for estimated predictiveerrors and for comparing of predictive errors between two models.
We recommend PSIS-LOO-CV instead of WAIC, because PSIS provides usefuldiagnostics and effective sample size and Monte Carlo standard errorestimates.
Author(s)
Maintainer: Jonah Gabryjsg2201@columbia.edu
Authors:
Aki VehtariAki.Vehtari@aalto.fi
Måns Magnusson
Yuling Yao
Paul-Christian Bürkner
Topi Paananen
Andrew Gelman
Other contributors:
Ben Goodrich [contributor]
Juho Piironen [contributor]
Bruno Nicenboim [contributor]
Leevi Lindgren [contributor]
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Usingstacking to average Bayesian predictive distributions.Bayesian Analysis, advance publication, doi:10.1214/17-BA1091.(online).
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019).Leave-One-Out Cross-Validation for Large Data.InThirty-sixth International Conference on Machine Learning,PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020).Leave-One-Out Cross-Validation for Model Comparison in Large Data.InProceedings of the 23rd International Conference on ArtificialIntelligence and Statistics (AISTATS), PMLR 108:341-351.
Epifani, I., MacEachern, S. N., and Peruggia, M. (2008). Case-deletionimportance sampling estimators: Central limit theorems and related results.Electronic Journal of Statistics2, 774-806.
Gelfand, A. E. (1996). Model determination using sampling-based methods. InMarkov Chain Monte Carlo in Practice, ed. W. R. Gilks, S. Richardson,D. J. Spiegelhalter, 145-162. London: Chapman and Hall.
Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination usingpredictive distributions with implementation via sampling-based methods. InBayesian Statistics 4, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid,and A. F. M. Smith, 147-167. Oxford University Press.
Gelman, A., Hwang, J., and Vehtari, A. (2014). Understanding predictiveinformation criteria for Bayesian models.Statistics and Computing24, 997-1016.
Ionides, E. L. (2008). Truncated importance sampling.Journal ofComputational and Graphical Statistics17, 295-311.
Koopman, S. J., Shephard, N., and Creal, D. (2009). Testing the assumptionsbehind importance sampling.Journal of Econometrics149, 2-11.
Peruggia, M. (1997). On the variability of case-deletion importance samplingweights in the Bayesian linear model.Journal of the AmericanStatistical Association92, 199-207.
Stan Development Team (2017). The Stan C++ Library, Version 2.17.0.https://mc-stan.org.
Stan Development Team (2018). RStan: the R interface to Stan, Version 2.17.3.https://mc-stan.org.
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation andwidely application information criterion in singular learning theory.Journal of Machine Learning Research11, 3571-3594.
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation methodfor the generalized Pareto distribution.Technometrics51,316-325.
See Also
Useful links:
Report bugs athttps://github.com/stan-dev/loo/issues
Compute a point estimate from a draws object
Description
Compute a point estimate from a draws object
Usage
.compute_point_estimate(draws)## S3 method for class 'matrix'.compute_point_estimate(draws)## Default S3 method:.compute_point_estimate(draws)Arguments
draws | A draws object with draws from the posterior. |
Details
This is a generic function to compute point estimates from drawsobjects. The function is internal and should only be used by developers toenableloo_subsample() for arbitrary draws objects.
Value
A 1 by P matrix with point estimates from a draws object.
The number of posterior draws in a draws object.
Description
The number of posterior draws in a draws object.
Usage
.ndraws(x)## S3 method for class 'matrix'.ndraws(x)## Default S3 method:.ndraws(x)Arguments
x | A draws object with posterior draws. |
Details
This is a generic function to return the total number of draws froman arbitrary draws objects. The function is internal and should only beused by developers to enableloo_subsample() for arbitrary draws objects.
Value
An integer with the number of draws.
Thin a draws object
Description
Thin a draws object
Usage
.thin_draws(draws, loo_approximation_draws)## S3 method for class 'matrix'.thin_draws(draws, loo_approximation_draws)## S3 method for class 'numeric'.thin_draws(draws, loo_approximation_draws)## Default S3 method:.thin_draws(draws, loo_approximation_draws)Arguments
draws | A draws object with posterior draws. |
loo_approximation_draws | The number of posterior draws to return (ie after thinning). |
Details
This is a generic function to thin draws from arbitrary drawsobjects. The function is internal and should only be used by developers toenableloo_subsample() for arbitrary draws objects.
Value
A thinned draws object.
Compute weighted expectations
Description
TheE_loo() function computes weighted expectations (means, variances,quantiles) using the importance weights obtained from thePSIS smoothing procedure. The expectations estimated by theE_loo() function assume that the PSIS approximation is working well.A smallPareto k estimate is necessary,but not sufficient, forE_loo() to give reliable estimates. Additionaldiagnostic checks for gauging the reliability of the estimates are indevelopment and will be added in a future release.
Usage
E_loo(x, psis_object, ...)## Default S3 method:E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL)## S3 method for class 'matrix'E_loo( x, psis_object, ..., type = c("mean", "variance", "sd", "quantile"), probs = NULL, log_ratios = NULL)Arguments
x | A numeric vector or matrix. |
psis_object | An object returned by |
... | Arguments passed to individual methods. |
type | The type of expectation to compute. The options are |
probs | For computing quantiles, a vector of probabilities. |
log_ratios | Optionally, a vector or matrix (the same dimensions as |
Value
A named list with the following components:
valueThe result of the computation.
For the matrix method,
valueis a vector withncol(x)elements, with one exception: whentype="quantile"andmultiple values are specified inprobsthevaluecomponent ofthe returned object is alength(probs)byncol(x)matrix.For the default/vector method the
valuecomponent is scalar, withone exception: whentype="quantile"and multiple valuesare specified inprobsthevaluecomponent is a vector withlength(probs)elements.pareto_kFunction-specific diagnostic.
For the matrix method it will be a vector of length
ncol(x)containing estimates of the shape parameterkof thegeneralized Pareto distribution. For the default/vector method,the estimate is a scalar. Iflog_ratiosis not specified whencallingE_loo(), the smoothed log-weights are used to estimatePareto-k's, which may produce optimistic estimates.For
type="mean",type="var", andtype="sd", the returned Pareto-k isusually the maximum of the Pareto-k's for the left and right tail ofhrand the right tail ofr, whereris the importance ratio andh=xfortype="mean"andh=x^2fortype="var"andtype="sd".Ifhis binary, constant, or not finite, or iftype="quantile", thereturned Pareto-k is the Pareto-k for the right tail ofr.
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {# Use rstanarm package to quickly fit a model and get both a log-likelihood# matrix and draws from the posterior predictive distributionlibrary("rstanarm")# data from help("lm")ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)d <- data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")))fit <- stan_glm(weight ~ group, data = d, refresh = 0)yrep <- posterior_predict(fit)dim(yrep)log_ratios <- -1 * log_lik(fit)dim(log_ratios)r_eff <- relative_eff(exp(-log_ratios), chain_id = rep(1:4, each = 1000))psis_object <- psis(log_ratios, r_eff = r_eff, cores = 2)E_loo(yrep, psis_object, type = "mean")E_loo(yrep, psis_object, type = "var")E_loo(yrep, psis_object, type = "sd")E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # medianE_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9))# We can get more accurate Pareto k diagnostic if we also provide# the log_ratios argumentE_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios)}Pareto smoothed importance sampling (PSIS)using approximate posteriors
Description
Pareto smoothed importance sampling (PSIS)using approximate posteriors
Usage
ap_psis(log_ratios, log_p, log_g, ...)## S3 method for class 'array'ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1))## S3 method for class 'matrix'ap_psis(log_ratios, log_p, log_g, ..., cores = getOption("mc.cores", 1))## Default S3 method:ap_psis(log_ratios, log_p, log_g, ...)Arguments
log_ratios | The log-likelihood ratios (ie -log_liks) |
log_p | The log-posterior (target) evaluated at S samples from theproposal distribution (g). A vector of length S. |
log_g | The log-density (proposal) evaluated at S samples from theproposal distribution (g). A vector of length S. |
... | Currently not in use. |
cores | The number of cores to use for parallelization. This defaults tothe option
|
Methods (by class)
ap_psis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.ap_psis(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.ap_psis(default): A vector of lengthS(posterior sample size).
Model comparison (deprecated, old version)
Description
This function is deprecated. Please use the newloo_compare() functioninstead.
Usage
compare(..., x = list())Arguments
... | |
x | A list of at least two objects returned by |
Details
When comparing two fitted models, we can estimate the difference in theirexpected predictive accuracy by the difference inelpd_loo orelpd_waic (or multiplied by -2, if desired, to be on thedeviance scale).
When that difference,elpd_diff, is positive then the expectedpredictive accuracy for the second model is higher. A negativeelpd_diff favors the first model.
When usingcompare() with more than two models, the values in theelpd_diff andse_diff columns of the returned matrix arecomputed by making pairwise comparisons between each model and the modelwith the best ELPD (i.e., the model in the first row).Although theelpd_diff column is equal to the difference inelpd_loo, do not expect these_diff column to be equal to thethe difference inse_elpd_loo.
To compute the standard error of the difference in ELPD we use apaired estimate to take advantage of the fact that the same set ofNdata points was used to fit both models. These calculations should be mostuseful whenN is large, because then non-normality of thedistribution is not such an issue when estimating the uncertainty in thesesums. These standard errors, for all their flaws, should give a bettersense of uncertainty than what is obtained using the current standardapproach of comparing differences of deviances to a Chi-squareddistribution, a practice derived for Gaussian linear models orasymptotically, and which only applies to nested models in any case.
Value
A vector or matrix with class'compare.loo' that has its ownprint method. If exactly two objects are provided in... orx, then the difference in expected predictive accuracy and thestandard error of the difference are returned. If more than two objects areprovided then a matrix of summary information is returned (seeDetails).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
Examples
## Not run: loo1 <- loo(log_lik1)loo2 <- loo(log_lik2)print(compare(loo1, loo2), digits = 3)print(compare(x = list(loo1, loo2)))waic1 <- waic(log_lik1)waic2 <- waic(log_lik2)compare(waic1, waic2)## End(Not run)Continuously ranked probability score
Description
Thecrps() andscrps() functions and theirloo_*() counterparts can beused to compute the continuously ranked probability score (CRPS) and scaledCRPS (SCRPS) (see Bolin and Wallin, 2022). CRPS is a proper scoring rule, andstrictly proper when the first moment of the predictive distribution isfinite. Both can be expressed in terms of samples form the predictivedistribution. See e.g. Gneiting and Raftery (2007) for a comprehensivediscussion on CRPS.
Usage
crps(x, ...)scrps(x, ...)loo_crps(x, ...)loo_scrps(x, ...)## S3 method for class 'matrix'crps(x, x2, y, ..., permutations = 1)## S3 method for class 'numeric'crps(x, x2, y, ..., permutations = 1)## S3 method for class 'matrix'loo_crps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1))## S3 method for class 'matrix'scrps(x, x2, y, ..., permutations = 1)## S3 method for class 'numeric'scrps(x, x2, y, ..., permutations = 1)## S3 method for class 'matrix'loo_scrps( x, x2, y, log_lik, ..., permutations = 1, r_eff = 1, cores = getOption("mc.cores", 1))Arguments
x | A |
... | Passed on to |
x2 | Independent draws from the same distribution as draws in |
y | A vector of observations or a single value. |
permutations | An integer, with default value of 1, specifying how manytimes the expected value of |X - X'| ( |
log_lik | A log-likelihood matrix the same size as |
r_eff | An optional vector of relative effective sample size estimatescontaining one element per observation. See |
cores | The number of cores to use for parallelization of |
Details
To compute (S)CRPS, the user needs to provide two sets of draws,x andx2, from the predictive distribution. This is due to the fact that formulasused to compute CRPS involve an expectation of the absolute difference ofxandx2, both having the same distribution. See thepermutations argument,as well as Gneiting and Raftery (2007) for details.
Value
A list containing two elements:estimates andpointwise.The former reports estimator and standard error and latter the pointwisevalues.
References
Bolin, D., & Wallin, J. (2022). Local scale invariance and robustness ofproper scoring rules. arXiv.doi:10.48550/arXiv.1912.05642
Gneiting, T., & Raftery, A. E. (2007). Strictly Proper Scoring Rules,Prediction, and Estimation. Journal of the American Statistical Association,102(477), 359–378.
Examples
## Not run: # An example using rstanarmlibrary(rstanarm)data("kidiq")fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq)ypred1 <- posterior_predict(fit)ypred2 <- posterior_predict(fit)crps(ypred1, ypred2, y = fit$y)loo_crps(ypred1, ypred2, y = fit$y, log_lik = log_lik(fit))## End(Not run)Generic (expected) log-predictive density
Description
Theelpd() methods for arrays and matrices can compute the expected logpointwise predictive density for a new dataset or the log pointwisepredictive density of the observed data (an overestimate of the elpd).
Usage
elpd(x, ...)## S3 method for class 'array'elpd(x, ...)## S3 method for class 'matrix'elpd(x, ...)Arguments
x | A log-likelihood array or matrix. TheMethods (by class)section, below, has detailed descriptions of how to specify the inputs foreach method. |
... | Currently ignored. |
Details
Theelpd() function is an S3 generic and methods are provided for3-D pointwise log-likelihood arrays and matrices.
Methods (by class)
elpd(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.elpd(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.
See Also
The vignetteHoldout validation and K-fold cross-validation of Stanprograms with the loo package for demonstrations of using theelpd()methods.
Examples
# Calculate the lpd of the observed dataLLarr <- example_loglik_array()elpd(LLarr)Objects to use in examples and tests
Description
Example pointwise log-likelihood objects to use in demonstrations and tests.See theValue andExamples sections below.
Usage
example_loglik_array()example_loglik_matrix()Value
example_loglik_array() returns a 500 (draws) x 2 (chains) x 32(observations) pointwise log-likelihood array.
example_loglik_matrix() returns the same pointwise log-likelihood valuesasexample_loglik_array() but reshaped into a 1000 (draws*chains) x 32(observations) matrix.
Examples
LLarr <- example_loglik_array()(dim_arr <- dim(LLarr))LLmat <- example_loglik_matrix()(dim_mat <- dim(LLmat))all.equal(dim_mat[1], dim_arr[1] * dim_arr[2])all.equal(dim_mat[2], dim_arr[3])all.equal(LLarr[, 1, ], LLmat[1:500, ])all.equal(LLarr[, 2, ], LLmat[501:1000, ])Extract pointwise log-likelihood from a Stan model
Description
Convenience function for extracting the pointwise log-likelihoodmatrix or array from astanfit object from therstan package.Note: recent versions ofrstan now include aloo() method forstanfit objects that handles this internally.
Usage
extract_log_lik(stanfit, parameter_name = "log_lik", merge_chains = TRUE)Arguments
stanfit | A |
parameter_name | A character string naming the parameter (or generatedquantity) in the Stan model corresponding to the log-likelihood. |
merge_chains | If |
Details
Stan does not automatically compute and store the log-likelihood. Itis up to the user to incorporate it into the Stan program if it is to beextracted after fitting the model. In a Stan model, the pointwise loglikelihood can be coded as a vector in the transformed parameters block(and then summed up in the model block) or it can be coded entirely in thegenerated quantities block. We recommend using the generated quantitiesblock so that the computations are carried out only once per iterationrather than once per HMC leapfrog step.
For example, the following is thegenerated quantities block forcomputing and saving the log-likelihood for a linear regression model withN data points, outcomey, predictor matrixX,coefficientsbeta, and standard deviationsigma:
vector[N] log_lik;
for (n in 1:N) log_lik[n] = normal_lpdf(y[n] | X[n, ] * beta, sigma);
Value
Ifmerge_chains=TRUE, anS byN matrix of(post-warmup) extracted draws, whereS is the size of the posteriorsample andN is the number of data points. Ifmerge_chains=FALSE, anI byC byN array, whereI \times C = S.
References
Stan Development Team (2017). The Stan C++ Library, Version 2.16.0.https://mc-stan.org/
Stan Development Team (2017). RStan: the R interface to Stan, Version 2.16.1.https://mc-stan.org/
Find the model names associated with"loo" objects
Description
Find the model names associated with"loo" objects
Usage
find_model_names(x)Arguments
x | List of |
Value
Character vector of model names the same length asx.
Estimate parameters of the Generalized Pareto distribution
Description
Given a samplex, Estimate the parametersk and\sigma ofthe generalized Pareto distribution (GPD), assuming the location parameter is0. By default the fit uses a prior fork, which will stabilizeestimates for very small sample sizes (and low effective sample sizes in thecase of MCMC samples). The weakly informative prior is a Gaussian priorcentered at 0.5.
Usage
gpdfit(x, wip = TRUE, min_grid_pts = 30, sort_x = TRUE)Arguments
x | A numeric vector. The sample from which to estimate the parameters. |
wip | Logical indicating whether to adjust |
min_grid_pts | The minimum number of grid points used in the fittingalgorithm. The actual number used is |
sort_x | If |
Details
Here the parameterk is the negative ofk in Zhang &Stephens (2009).
Value
A named list with componentsk andsigma.
References
Zhang, J., and Stephens, M. A. (2009). A new and efficient estimation methodfor the generalized Pareto distribution.Technometrics51, 316-325.
See Also
A parent class for different importance sampling methods.
Description
A parent class for different importance sampling methods.
Usage
importance_sampling(log_ratios, method, ...)## S3 method for class 'array'importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1))## S3 method for class 'matrix'importance_sampling( log_ratios, method, ..., r_eff = 1, cores = getOption("mc.cores", 1))## Default S3 method:importance_sampling(log_ratios, method, ..., r_eff = 1)Arguments
log_ratios | An array, matrix, or vector of importance ratios on the logscale (for PSIS-LOO these arenegative log-likelihood values). See theMethods (by class) section below for a detailed description of howto specify the inputs for each method. |
method | The importance sampling method to use. The following methodsare implemented: |
... | Arguments passed on to the various methods. |
r_eff | Vector of relative effective sample size estimates containingone element per observation. The values provided should be the relativeeffective sample sizes of |
cores | The number of cores to use for parallelization. This defaults tothe option
|
Generic function for K-fold cross-validation for developers
Description
For developers of Bayesian modeling packages,loo includesa generic functionkfold() so that methods may be defined for K-foldCV without name conflicts between packages. See, for example, thekfold() methods in therstanarm andbrms packages.
TheValue section below describes the objects thatkfold()methods should return in order to be compatible withloo_compare() and theloo package print methods.
Usage
kfold(x, ...)is.kfold(x)Arguments
x | A fitted model object. |
... | Arguments to pass to specific methods. |
Value
For developers defining akfold() method for a class"foo", thekfold.foo() function should return a list with classc("kfold", "loo") with at least the following named elements:
"estimates": A1x2matrix containing the ELPD estimate and itsstandard error. The matrix must have row name "elpd_kfold" and columnnames"Estimate"and"SE"."pointwise": ANx1matrix with column name"elpd_kfold"containingthe pointwise contributions for each data point.
It is important for the object to have at least these classes andcomponents so that it is compatible with other functions likeloo_compare() andprint() methods.
Helper functions for K-fold cross-validation
Description
These functions can be used to generate indexes for use withK-fold cross-validation. See theDetails section for explanations.
Usage
kfold_split_random(K = 10, N = NULL)kfold_split_stratified(K = 10, x = NULL)kfold_split_grouped(K = 10, x = NULL)Arguments
K | The number of folds to use. |
N | The number of observations in the data. |
x | A discrete variable of length |
Details
kfold_split_random() splits the data intoK groupsof equal size (or roughly equal size).
For a categorical variablexkfold_split_stratified()splits the observations intoK groups ensuring that relativecategory frequencies are approximately preserved.
For a grouping variablex,kfold_split_grouped() placesall observations inx from the same group/level together inthe same fold. The selection of which groups/levels go into whichfold (relevant when when there are more groups than folds) israndomized.
Value
An integer vector of lengthN where each element is an index in1:K.
Examples
ids <- kfold_split_random(K = 5, N = 20)print(ids)table(ids)x <- sample(c(0, 1), size = 200, replace = TRUE, prob = c(0.05, 0.95))table(x)ids <- kfold_split_stratified(K = 5, x = x)print(ids)table(ids, x)grp <- gl(n = 50, k = 15, labels = state.name)length(grp)head(table(grp))ids_10 <- kfold_split_grouped(K = 10, x = grp)(tab_10 <- table(grp, ids_10))colSums(tab_10)ids_9 <- kfold_split_grouped(K = 9, x = grp)(tab_9 <- table(grp, ids_9))colSums(tab_9)Efficient approximate leave-one-out cross-validation (LOO)
Description
Theloo() methods for arrays, matrices, and functions compute PSIS-LOOCV, efficient approximate leave-one-out (LOO) cross-validation for Bayesianmodels using Pareto smoothed importance sampling (PSIS). This isan implementation of the methods described in Vehtari, Gelman, and Gabry(2017) and Vehtari, Simpson, Gelman, Yao, and Gabry (2024).
Theloo_i() function enables testing log-likelihoodfunctions for use with theloo.function() method.
Usage
loo(x, ...)## S3 method for class 'array'loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis"))## S3 method for class 'matrix'loo( x, ..., r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis"))## S3 method for class ''function''loo( x, ..., data = NULL, draws = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), is_method = c("psis", "tis", "sis"))loo_i(i, llfun, ..., data = NULL, draws = NULL, r_eff = 1, is_method = "psis")is.loo(x)is.psis_loo(x)Arguments
x | A log-likelihood array, matrix, or function. TheMethods (by class)section, below, has detailed descriptions of how to specify the inputs foreach method. |
r_eff | Vector of relative effective sample size estimates for thelikelihood ( |
save_psis | Should the |
cores | The number of cores to use for parallelization. This defaults tothe option
|
is_method | The importance sampling method to use. The following methodsare implemented: |
data,draws,... | For the |
i | For |
llfun | For |
Details
Theloo() function is an S3 generic and methods are provided for3-D pointwise log-likelihood arrays, pointwise log-likelihood matrices, andlog-likelihood functions. The array and matrix methods are the mostconvenient, but for models fit to very large datasets theloo.function()method is more memory efficient and may be preferable.
Value
Theloo() methods return a named list with classc("psis_loo", "loo") and components:
estimatesA matrix with two columns (
Estimate,SE) and three rows (elpd_loo,p_loo,looic). This contains point estimates and standard errors of theexpected log pointwise predictive density (elpd_loo), theeffective number of parameters (p_loo) and the LOOinformation criterionlooic(which is just-2 * elpd_loo, i.e.,converted to deviance scale).pointwiseA matrix with five columns (and number of rows equal to the number ofobservations) containing the pointwise contributions of the measures(
elpd_loo,mcse_elpd_loo,p_loo,looic,influence_pareto_k).in addition to the three measures inestimates, we also reportpointwise values of the Monte Carlo standard error ofelpd_loo(mcse_elpd_loo), and statistics describing the influence ofeach observation on the posterior distribution (influence_pareto_k).These are the estimates of the shape parameterkof thegeneralized Pareto fit to the importance ratios for each leave-one-outdistribution (see thepareto-k-diagnostic page for details).diagnosticsA named list containing two vectors:
pareto_k: Importance sampling reliability diagnostics. By default,these are equal to theinfluence_pareto_kinpointwise.Some algorithms can improve importance sampling reliability andmodify these diagnostics. See thepareto-k-diagnostic page for details.n_eff: PSIS effective sample size estimates.
psis_objectThis component will be
NULLunless thesave_psisargument is set toTRUEwhen callingloo(). In that casepsis_objectwill be the objectof class"psis"that is created when theloo()function callspsis()internally to do the PSIS procedure.
Theloo_i() function returns a named list with componentspointwise anddiagnostics. These components have the samestructure as thepointwise anddiagnostics components of theobject returned byloo() except they contain results for only a singleobservation.
Methods (by class)
loo(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.loo(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.loo(`function`): A functionf()that takes argumentsdata_ianddrawsand returns avector containing the log-likelihood for a single observationievaluatedat each posterior draw. The function should be written such that, for eachobservationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length
S(size of posterior sample). Thelog-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments
dataanddrawsmust alsobe specified in the call toloo():data: A data frame or matrix containing the data (e.g.observed outcome and predictors) needed to compute the pointwiselog-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of thelog-likelihood function.draws: An object containing the posterior draws for anyparameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation theentire objectdrawswill be passed to thedrawsargument ofthe log-likelihood function.The
...can be used if your log-likelihood function takes additionalarguments. These arguments are used like thedrawsargument in that theyare recycled for each observation.
Definingloo() methods in a package
Package developers can defineloo() methods for fitted models objects. See the exampleloo.stanfit()method in theExamples section below for an example of defining amethod that callsloo.array(). Theloo.stanreg() method in therstanarm package is an example of defining a method that callsloo.function().
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
Theloo packagevignettesfor demonstrations.
TheFAQ page ontheloo website for answers to frequently asked questions.
psis()for the underlying Pareto Smoothed Importance Sampling (PSIS)procedure used in the LOO-CV approximation.pareto-k-diagnostic for convenience functions for looking at diagnostics.
loo_compare()for model comparison.
Examples
### Array and matrix methods (using example objects included with loo package)# Array methodLLarr <- example_loglik_array()rel_n_eff <- relative_eff(exp(LLarr))loo(LLarr, r_eff = rel_n_eff, cores = 2)# Matrix methodLLmat <- example_loglik_matrix()rel_n_eff <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500))loo(LLmat, r_eff = rel_n_eff, cores = 2)### Using log-likelihood function instead of array or matrixset.seed(124)# Simulate data and draw from posteriorN <- 50; K <- 10; S <- 100; a0 <- 3; b0 <- 2p <- rbeta(1, a0, b0)y <- rbinom(N, size = K, prob = p)a <- a0 + sum(y); b <- b0 + N * K - sum(y)fake_posterior <- as.matrix(rbeta(S, a, b))dim(fake_posterior) # S x 1fake_data <- data.frame(y,K)dim(fake_data) # N x 2llfun <- function(data_i, draws) { # each time called internally within loo the arguments will be equal to: # data_i: ith row of fake_data (fake_data[i,, drop=FALSE]) # draws: entire fake_posterior matrix dbinom(data_i$y, size = data_i$K, prob = draws, log = TRUE)}# Use the loo_i function to check that llfun works on a single observation# before running on all obs. For example, using the 3rd obs in the data:loo_3 <- loo_i(i = 3, llfun = llfun, data = fake_data, draws = fake_posterior)print(loo_3$pointwise[, "elpd_loo"])# Use loo.function method (default r_eff=1 is used as this posterior not obtained via MCMC)loo_with_fn <- loo(llfun, draws = fake_posterior, data = fake_data)# If we look at the elpd_loo contribution from the 3rd obs it should be the# same as what we got above with the loo_i function and i=3:print(loo_with_fn$pointwise[3, "elpd_loo"])print(loo_3$pointwise[, "elpd_loo"])# Check that the loo.matrix method gives same answer as loo.function methodlog_lik_matrix <- sapply(1:N, function(i) { llfun(data_i = fake_data[i,, drop=FALSE], draws = fake_posterior)})loo_with_mat <- loo(log_lik_matrix)all.equal(loo_with_mat$estimates, loo_with_fn$estimates) # should be TRUE!## Not run: ### For package developers: defining loo methods# An example of a possible loo method for 'stanfit' objects (rstan package).# A similar method is included in the rstan package.# In order for users to be able to call loo(stanfit) instead of# loo.stanfit(stanfit) the NAMESPACE needs to be handled appropriately# (roxygen2 and devtools packages are good for that).#loo.stanfit <- function(x, pars = "log_lik", ..., save_psis = FALSE, cores = getOption("mc.cores", 1)) { stopifnot(length(pars) == 1L) LLarray <- loo::extract_log_lik(stanfit = x, parameter_name = pars, merge_chains = FALSE) r_eff <- loo::relative_eff(x = exp(LLarray), cores = cores) loo::loo.array(LLarray, r_eff = r_eff, cores = cores, save_psis = save_psis)}## End(Not run)Datasets for loo examples and vignettes
Description
Small datasets for use inloo examples and vignettes. TheKlineandmilk datasets are also included in therethinking package(McElreath, 2016a), but we include them here asrethinking is noton CRAN.
Details
Currently the data sets included are:
Kline:Small dataset from Kline and Boyd (2010) on tool complexity and demographyin Oceanic islands societies. This data is discussed in detail inMcElreath (2016a,2016b).(Link to variable descriptions)milk:Small dataset from Hinde and Milligan (2011) on primate milkcomposition.This data is discussed in detail in McElreath (2016a,2016b).(Link to variable descriptions)voice:Voice rehabilitation data from Tsanas et al. (2014).
References
Hinde and Milligan. 2011.Evolutionary Anthropology 20:9-23.
Kline, M.A. and R. Boyd. 2010.Proc R Soc B 277:2559-2564.
McElreath, R. (2016a). rethinking: Statistical Rethinking book package.R package version 1.59.
McElreath, R. (2016b).Statistical rethinking: A Bayesian course withexamples in R and Stan. Chapman & Hall/CRC.
A. Tsanas, M.A. Little, C. Fox, L.O. Ramig: Objective automatic assessment ofrehabilitative speech treatment in Parkinson's disease, IEEETransactions on Neural Systems and Rehabilitation Engineering, Vol. 22, pp.181-190, January 2014
Examples
str(Kline)str(milk)LOO package glossary
Description
The pages provides definitions to key terms. Also see theFAQ page ontheloo website for answers to frequently asked questions.
Note: VGG2017 refers to Vehtari, Gelman, and Gabry (2017). SeeReferences, below.
ELPD andelpd_loo
The ELPD is the theoretical expected log pointwise predictive density for a newdataset (Eq 1 in VGG2017), which can be estimated, e.g., usingcross-validation.elpd_loo is the Bayesian LOO estimate of theexpected log pointwise predictive density (Eq 4 in VGG2017) andis a sum of N individual pointwise log predictive densities. Probabilitydensities can be smaller or larger than 1, and thus log predictive densitiescan be negative or positive. For simplicity the ELPD acronym is used also forexpected log pointwise predictive probabilities for discrete models.Probabilities are always equal or less than 1, and thus log predictiveprobabilities are 0 or negative.
Standard error ofelpd_loo
Aselpd_loo is defined as the sum of N independent components (Eq 4 inVGG2017), we can compute the standard error by using the standard deviationof the N components and multiplying bysqrt(N) (Eq 23 in VGG2017).This standard error is a coarse description of our uncertainty about thepredictive performance for unknown future data. When N is small or there issevere model misspecification, the current SE estimate is overoptimistic andthe actual SE can even be twice as large. Even for moderate N, when the SEestimate is an accurate estimate for the scale, it ignores the skewness. Whenmaking model comparisons, the SE of the component-wise (pairwise) differencesshould be used instead (see these_diff section below and Eq 24 inVGG2017). Sivula et al. (2022) discuss the conditions when the normalapproximation used for SE andse_diff is good.
Monte Carlo SE of elpd_loo
The Monte Carlo standard error is the estimate for the computational accuracyof MCMC and importance sampling used to computeelpd_loo. Usually thisis negligible compared to the standard describing the uncertainty due tofinite number of observations (Eq 23 in VGG2017).
p_loo (effective number of parameters)
p_loo is the difference betweenelpd_loo and the non-cross-validatedlog posterior predictive density. It describes how much more difficult itis to predict future data than the observed data. Asymptotically undercertain regularity conditions,p_loo can be interpreted as theeffective number of parameters. In well behaving casesp_loo < N andp_loo < p, wherep is the total number of parameters in themodel.p_loo > N orp_loo > p indicates that the model has veryweak predictive capability and may indicate a severe model misspecification.See below for more on interpretingp_loo when there are warningsabout high Pareto k diagnostic values.
Pareto k estimates
The Paretok estimate is a diagnostic for Pareto smoothed importancesampling (PSIS), which is used to compute components ofelpd_loo. Inimportance-sampling LOO the full posterior distribution is used as theproposal distribution. The Pareto k diagnostic estimates how far anindividual leave-one-out distribution is from the full distribution. Ifleaving out an observation changes the posterior too much then importancesampling is not able to give a reliable estimate. Pareto smoothing stabilizesimportance sampling and guarantees a finite variance estimate at thecost of some bias.
The diagnostic threshold for Paretok depends on sample sizeS (sample size dependent threshold was introduced by Vehtariet al., 2024, and before that fixed thresholds of 0.5 and 0.7 wererecommended). For simplicity,loo package uses the nominal samplesizeS when computing the sample size specificthreshold. This provides an optimistic threshold if the effectivesample size is less than 2200, but even then if ESS/S > 1/2 the differenceis usually negligible. Thinning of MCMC draws can be used to improvethe ratio ESS/S.
If
k < \min(1 - 1 / \log_{10}(S), 0.7), whereSis thesample size, the PSIS estimate and the corresponding MonteCarlo standard error estimate are reliable.If
1 - 1 / \log_{10}(S) <= k < 0.7, the PSIS estimate and thecorresponding Monte Carlo standard error estimate are notreliable, but increasing the (effective) sample sizeSabove2200 may help (this will increase the sample size specificthreshold(1 - 1 / \log_{10}(2200) > 0.7and then the bias specificthreshold 0.7 dominates).If
0.7 <= k < 1, the PSIS estimate and the corresponding MonteCarlo standard error have large bias and are not reliable. Increasingthe sample size may reduce the variability in thekestimate, whichmay also result in a lowerkestimate.If
k \geq 1, the target distribution is estimated tohave non-finite mean. The PSIS estimate and the corresponding MonteCarlo standard error are not well defined. Increasing the sample sizemay reduce the variability inkestimate, which may also result ina lowerkestimate.
Paretok is also useful as a measure of influence of anobservation. Highly influential observations have highkvalues. Very highk values often indicate modelmisspecification, outliers or mistakes in data processing. SeeSection 6 of Gabry et al. (2019) for an example.
Interpretingp_loo when Paretok is large
Ifk > 0.7 then we can also look atthep_loo estimate for some additional information about the problem:
If
p_loo << p(the total number of parameters in the model),then the model is likely to be misspecified. Posterior predictive checks(PPCs) are then likely to also detect the problem. Try using an overdispersedmodel, or add more structural information (nonlinearity, mixture model,etc.).If
p_loo < pand the number of parameterspis relativelylarge compared to the number of observations (e.g.,p>N/5), it islikely that the model is so flexible or the population prior so weak that it’sdifficult to predict the left out observation (even for the true model).This happens, for example, in the simulated 8 schools (in VGG2017), randomeffect models with a few observations per random effect, and Gaussianprocesses and spatial models with short correlation lengths.If
p_loo > p, then the model is likely to be badly misspecified.If the number of parametersp<<N, then PPCs are also likely to detect theproblem. See the case study athttps://avehtari.github.io/modelselection/roaches.html for an example.Ifpis relatively large compared to the number ofobservations, sayp>N/5(more accurately we should count number ofobservations influencing each parameter as in hierarchical models some groupsmay have few observations and other groups many), it is possible that PPCs won'tdetect the problem.
elpd_diff
elpd_diff is the difference inelpd_loo for two models. If morethan two models are compared, the difference is computed relative to themodel with highestelpd_loo.
se_diff
The standard error of component-wise differences of elpd_loo (Eq 24 inVGG2017) between two models. This SE issmaller than the SE forindividual models due to correlation (i.e., if some observations are easierand some more difficult to predict for all models).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari,A. (2022). Uncertainty in Bayesian leave-one-outcross-validation based model comparison.preprint arXiv:2008.10296v3..
Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. andGelman, A. (2019), Visualization in Bayesian workflow.J. R. Stat. Soc. A, 182: 389-402. doi:10.1111/rssa.12378(journal version,preprint arXiv:1709.01449,code on GitHub)
Efficient approximate leave-one-out cross-validation (LOO) for posteriorapproximations
Description
Efficient approximate leave-one-out cross-validation (LOO) for posteriorapproximations
Usage
loo_approximate_posterior(x, log_p, log_g, ...)## S3 method for class 'array'loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1))## S3 method for class 'matrix'loo_approximate_posterior( x, log_p, log_g, ..., save_psis = FALSE, cores = getOption("mc.cores", 1))## S3 method for class ''function''loo_approximate_posterior( x, ..., data = NULL, draws = NULL, log_p = NULL, log_g = NULL, save_psis = FALSE, cores = getOption("mc.cores", 1))Arguments
x | A log-likelihood array, matrix, or function.TheMethods (by class) section, below, has detailed descriptions of howto specify the inputs for each method. |
log_p | The log-posterior (target) evaluated at S samples from theproposal distribution (g). A vector of length S. |
log_g | The log-density (proposal) evaluated at S samples from theproposal distribution (g). A vector of length S. |
save_psis | Should the |
cores | The number of cores to use for parallelization. This defaults tothe option
|
data,draws,... | For the |
Details
Theloo_approximate_posterior() function is an S3 generic andmethods are provided for 3-D pointwise log-likelihood arrays, pointwiselog-likelihood matrices, and log-likelihood functions. The implementationworks for posterior approximations where it is possible to compute the logdensity for the posterior approximation.
Value
Theloo_approximate_posterior() methods return a named list withclassc("psis_loo_ap", "psis_loo", "loo"). It has the same structureas the objects returned byloo() but with the additional slot:
posterior_approximationA list with two vectors,
log_pandlog_gof the same lengthcontaining the posterior density and the approximation densityfor the individual draws.
Methods (by class)
loo_approximate_posterior(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.loo_approximate_posterior(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.loo_approximate_posterior(`function`): A functionf()that takes argumentsdata_ianddrawsand returns avector containing the log-likelihood for a single observationievaluatedat each posterior draw. The function should be written such that, for eachobservationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length
S(size of posterior sample). Thelog-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments
dataanddrawsmust alsobe specified in the call toloo():data: A data frame or matrix containing the data (e.g.observed outcome and predictors) needed to compute the pointwiselog-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of thelog-likelihood function.draws: An object containing the posterior draws for anyparameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation theentire objectdrawswill be passed to thedrawsargument ofthe log-likelihood function.The
...can be used if your log-likelihood function takes additionalarguments. These arguments are used like thedrawsargument in that theyare recycled for each observation.
References
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019).Leave-One-Out Cross-Validation for Large Data.InThirty-sixth International Conference on Machine Learning,PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020).Leave-One-Out Cross-Validation for Model Comparison in Large Data.InProceedings of the 23rd International Conference on ArtificialIntelligence and Statistics (AISTATS), PMLR 108:341-351.
See Also
Model comparison
Description
Compare fitted models based onELPD.
By default the print method shows only the most important information. Useprint(..., simplify=FALSE) to print a more detailed summary.
Usage
loo_compare(x, ...)## Default S3 method:loo_compare(x, ...)## S3 method for class 'compare.loo'print(x, ..., digits = 1, simplify = TRUE)## S3 method for class 'compare.loo_ss'print(x, ..., digits = 1, simplify = TRUE)Arguments
x | An object of class |
... | Additional objects of class |
digits | For the print method only, the number of digits to use whenprinting. |
simplify | For the print method only, should only the essential columnsof the summary matrix be printed? The entire matrix is always returned, butby default only the most important columns are printed. |
Details
When comparing two fitted models, we can estimate the difference in theirexpected predictive accuracy by the difference inelpd_loo orelpd_waic (or multiplied by-2, ifdesired, to be on the deviance scale).
When usingloo_compare(), the returned matrix will have one row per modeland several columns of estimates. The values in theelpd_diff andse_diff columns of thereturned matrix are computed by making pairwise comparisons between eachmodel and the model with the largest ELPD (the model in the first row). Forthis reason theelpd_diff column will always have the value0 in thefirst row (i.e., the difference between the preferred model and itself) andnegative values in subsequent rows for the remaining models.
To compute the standard error of the difference inELPD —which should not be expected to equal the difference of the standard errors— we use a paired estimate to take advantage of the fact that the sameset ofN data points was used to fit both models. These calculationsshould be most useful whenN is large, because then non-normality ofthe distribution is not such an issue when estimating the uncertainty inthese sums. These standard errors, for all their flaws, should give abetter sense of uncertainty than what is obtained using the currentstandard approach of comparing differences of deviances to a Chi-squareddistribution, a practice derived for Gaussian linear models orasymptotically, and which only applies to nested models in any case.Sivula et al. (2022) discuss the conditions when the normalapproximation used for SE andse_diff is good.
If more than11 models are compared, we internally recompute the modeldifferences using the median model by ELPD as the baseline model. We thenestimate whether the differences in predictive performance are potentiallydue to chance as described by McLatchie and Vehtari (2023). This will flaga warning if it is deemed that there is a risk of over-fitting due to theselection process. In that case users are recommended to avoid modelselection based on LOO-CV, and instead to favor model averaging/stacking orprojection predictive inference.
Value
A matrix with class"compare.loo" that has its ownprint method. See theDetails section.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
Sivula, T, Magnusson, M., Matamoros A. A., and Vehtari, A. (2022).Uncertainty in Bayesian leave-one-out cross-validation based modelcomparison.preprint arXiv:2008.10296v3..
McLatchie, Y., and Vehtari, A. (2023). Efficient estimation andcorrection of selection-induced bias with order statistics.preprint arXiv:2309.03742
See Also
TheFAQ page ontheloo website for answers to frequently asked questions.
Examples
# very artificial example, just for demonstration!LL <- example_loglik_array()loo1 <- loo(LL) # should be worst model when comparedloo2 <- loo(LL + 1) # should be second best model when comparedloo3 <- loo(LL + 2) # should be best model when comparedcomp <- loo_compare(loo1, loo2, loo3)print(comp, digits = 2)# show more details with simplify=FALSE# (will be the same for all models in this artificial example)print(comp, simplify = FALSE, digits = 3)# can use a list of objects with custom names# will use apple, banana, and cherry, as the names in the outputloo_compare(list("apple" = loo1, "banana" = loo2, "cherry" = loo3))## Not run: # works for waic (and kfold) tooloo_compare(waic(LL), waic(LL - 10))## End(Not run)Model averaging/weighting via stacking or pseudo-BMA weighting
Description
Model averaging via stacking of predictive distributions, pseudo-BMAweighting or pseudo-BMA+ weighting with the Bayesian bootstrap. See Yao etal. (2018), Vehtari, Gelman, and Gabry (2017), and Vehtari, Simpson,Gelman, Yao, and Gabry (2024) for background.
Usage
loo_model_weights(x, ...)## Default S3 method:loo_model_weights( x, ..., method = c("stacking", "pseudobma"), optim_method = "BFGS", optim_control = list(), BB = TRUE, BB_n = 1000, alpha = 1, r_eff_list = NULL, cores = getOption("mc.cores", 1))stacking_weights(lpd_point, optim_method = "BFGS", optim_control = list())pseudobma_weights(lpd_point, BB = TRUE, BB_n = 1000, alpha = 1)Arguments
x | A list of |
... | Unused, except for the generic to pass arguments to individualmethods. |
method | Either |
optim_method | If |
optim_control | If |
BB | Logical used when |
BB_n | For pseudo-BMA+ weighting only, the number of samples to use forthe Bayesian bootstrap. The default is |
alpha | Positive scalar shape parameter in the Dirichlet distributionused for the Bayesian bootstrap. The default is |
r_eff_list | Optionally, a list of relative effective sample sizeestimates for the likelihood |
cores | The number of cores to use for parallelization. This defaults tothe option
|
lpd_point | If calling |
Details
loo_model_weights() is a wrapper around thestacking_weights() andpseudobma_weights() functions that implements stacking, pseudo-BMA, andpseudo-BMA+ weighting for combining multiple predictive distributions. We canuse approximate or exact leave-one-out cross-validation (LOO-CV) or K-fold CVto estimate the expected log predictive density (ELPD).
The stacking method (method="stacking"), which is the default forloo_model_weights(), combines all models by maximizing the leave-one-outpredictive density of the combination distribution. That is, it finds theoptimal linear combining weights for maximizing the leave-one-out log score.
The pseudo-BMA method (method="pseudobma") finds the relative weightsproportional to the ELPD of each model. However, whenmethod="pseudobma", the default is to also use the Bayesian bootstrap(BB=TRUE), which corresponds to the pseudo-BMA+ method. The Bayesianbootstrap takes into account the uncertainty of finite data points andregularizes the weights away from the extremes of 0 and 1.
In general, we recommend stacking for averaging predictive distributions,while pseudo-BMA+ can serve as a computationally easier alternative.
Value
A numeric vector containing one weight for each model.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Usingstacking to average Bayesian predictive distributions.Bayesian Analysis, advance publication, doi:10.1214/17-BA1091.(online).
See Also
Theloo packagevignettes, particularlyBayesian Stacking and Pseudo-BMA weights using theloo package.
loo()for details on leave-one-out ELPD estimation.constrOptim()for the choice of optimization methods and control-parameters.relative_eff()for computingr_eff.
Examples
## Not run: ### Demonstrating usage after fitting models with RStanlibrary(rstan)# generate fake data from N(0,1).N <- 100y <- rnorm(N, 0, 1)# Suppose we have three models: N(-1, sigma), N(0.5, sigma) and N(0.6,sigma).stan_code <- " data { int N; vector[N] y; real mu_fixed; } parameters { real<lower=0> sigma; } model { sigma ~ exponential(1); y ~ normal(mu_fixed, sigma); } generated quantities { vector[N] log_lik; for (n in 1:N) log_lik[n] = normal_lpdf(y[n]| mu_fixed, sigma); }"mod <- stan_model(model_code = stan_code)fit1 <- sampling(mod, data=list(N=N, y=y, mu_fixed=-1))fit2 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.5))fit3 <- sampling(mod, data=list(N=N, y=y, mu_fixed=0.6))model_list <- list(fit1, fit2, fit3)log_lik_list <- lapply(model_list, extract_log_lik)# optional but recommendedr_eff_list <- lapply(model_list, function(x) { ll_array <- extract_log_lik(x, merge_chains = FALSE) relative_eff(exp(ll_array))})# stacking method:wts1 <- loo_model_weights( log_lik_list, method = "stacking", r_eff_list = r_eff_list, optim_control = list(reltol=1e-10))print(wts1)# can also pass a list of psis_loo objects to avoid recomputing looloo_list <- lapply(1:length(log_lik_list), function(j) { loo(log_lik_list[[j]], r_eff = r_eff_list[[j]])})wts2 <- loo_model_weights( loo_list, method = "stacking", optim_control = list(reltol=1e-10))all.equal(wts1, wts2)# can provide names to be used in the resultsloo_model_weights(setNames(loo_list, c("A", "B", "C")))# pseudo-BMA+ method:set.seed(1414)loo_model_weights(loo_list, method = "pseudobma")# pseudo-BMA method (set BB = FALSE):loo_model_weights(loo_list, method = "pseudobma", BB = FALSE)# calling stacking_weights or pseudobma_weights directlylpd1 <- loo(log_lik_list[[1]], r_eff = r_eff_list[[1]])$pointwise[,1]lpd2 <- loo(log_lik_list[[2]], r_eff = r_eff_list[[2]])$pointwise[,1]lpd3 <- loo(log_lik_list[[3]], r_eff = r_eff_list[[3]])$pointwise[,1]stacking_weights(cbind(lpd1, lpd2, lpd3))pseudobma_weights(cbind(lpd1, lpd2, lpd3))pseudobma_weights(cbind(lpd1, lpd2, lpd3), BB = FALSE)## End(Not run)Moment matching for efficient approximate leave-one-out cross-validation (LOO)
Description
Moment matching algorithm for updating a loo object when Pareto k estimatesare large.
Usage
loo_moment_match(x, ...)## Default S3 method:loo_moment_match( x, loo, post_draws, log_lik_i, unconstrain_pars, log_prob_upars, log_lik_i_upars, max_iters = 30L, k_threshold = NULL, split = TRUE, cov = TRUE, cores = getOption("mc.cores", 1), ...)Arguments
x | A fitted model object. |
... | Further arguments passed to the custom functions documented above. |
loo | A loo object to be modified. |
post_draws | A function the takes |
log_lik_i | A function that takes |
unconstrain_pars | A function that takes arguments |
log_prob_upars | A function that takes arguments |
log_lik_i_upars | A function that takes arguments |
max_iters | Maximum number of moment matching iterations. Usually thisdoes not need to be modified. If the maximum number of iterations isreached, there will be a warning, and increasing |
k_threshold | Threshold value for Pareto k values above which the momentmatching algorithm is used. The default value is |
split | Logical; Indicate whether to do the split transformation or notat the end of moment matching for each LOO fold. |
cov | Logical; Indicate whether to match the covariance matrix of thesamples or not. If |
cores | The number of cores to use for parallelization. This defaults tothe option
|
Details
Theloo_moment_match() function is an S3 generic and we provide adefault method that takes as arguments user-specified functionspost_draws,log_lik_i,unconstrain_pars,log_prob_upars, andlog_lik_i_upars. All of these functions should take.... as an argumentin addition to those specified for each function.
Value
Theloo_moment_match() methods return an updatedloo object. Thestructure of the updatedloo object is similar, but the method alsostores the original Pareto k diagnostic values in the diagnostics field.
Methods (by class)
loo_moment_match(default): A default method that takes as arguments auser-specified model objectx, alooobject and user-specifiedfunctionspost_draws,log_lik_i,unconstrain_pars,log_prob_upars,andlog_lik_i_upars.
References
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021).Implicitly adaptive importance sampling.Statistics and Computing, 31, 16.doi:10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
See Also
loo(),loo_moment_match_split()
Examples
# See the vignette for loo_moment_match()Split moment matching for efficient approximate leave-one-out cross-validation (LOO)
Description
A function that computes the split moment matching importance sampling loo.Takes in the moment matching total transformation, transforms only halfof the draws, and computes a single elpd using multiple importance sampling.
Usage
loo_moment_match_split( x, upars, cov, total_shift, total_scaling, total_mapping, i, log_prob_upars, log_lik_i_upars, r_eff_i, cores, is_method, ...)Arguments
x | A fitted model object. |
upars | A matrix containing the model parameters in unconstrained spacewhere they can have any real value. |
cov | Logical; Indicate whether to match the covariance matrix of thesamples or not. If |
total_shift | A vector representing the total shift made by the momentmatching algorithm. |
total_scaling | A vector representing the total scaling of marginalvariance made by the moment matching algorithm. |
total_mapping | A vector representing the total covariancetransformation made by the moment matching algorithm. |
i | Observation index. |
log_prob_upars | A function that takes arguments |
log_lik_i_upars | A function that takes arguments |
r_eff_i | MCMC relative effective sample size of the |
cores | The number of cores to use for parallelization. This defaults tothe option
|
is_method | The importance sampling method to use. The following methodsare implemented: |
... | Further arguments passed to the custom functions documented above. |
Value
A list containing the updated log-importance weights andlog-likelihood values. Also returns the updated MCMC effective sample sizeand the integrand-specific log-importance weights.
References
Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. (2021).Implicitly adaptive importance sampling.Statistics and Computing, 31, 16.doi:10.1007/s11222-020-09982-2. arXiv preprint arXiv:1906.08850.
See Also
Estimate leave-one-out predictive performance..
Description
Theloo_predictive_metric() function computes estimates of leave-one-outpredictive metrics given a set of predictions and observations. Currentlysupported metrics are mean absolute error, mean squared error and root meansquared error for continuous predictions and accuracy and balanced accuracyfor binary classification. Predictions are passed on to theE_loo()function, so this function assumes that the PSIS approximation is workingwell.
Usage
loo_predictive_metric(x, ...)## S3 method for class 'matrix'loo_predictive_metric( x, y, log_lik, ..., metric = c("mae", "rmse", "mse", "acc", "balanced_acc"), r_eff = 1, cores = getOption("mc.cores", 1))Arguments
x | A numeric matrix of predictions. |
... | Additional arguments passed on to |
y | A numeric vector of observations. Length should be equal to thenumber of rows in |
log_lik | A matrix of pointwise log-likelihoods. Should be of samedimension as |
metric | The type of predictive metric to be used. Currentlysupported options are
|
r_eff | A Vector of relative effective sample size estimates containingone element per observation. See |
cores | The number of cores to use for parallelization of |
Value
A list with the following components:
estimateEstimate of the given metric.
seStandard error of the estimate.
Examples
if (requireNamespace("rstanarm", quietly = TRUE)) {# Use rstanarm package to quickly fit a model and get both a log-likelihood# matrix and draws from the posterior predictive distributionlibrary("rstanarm")# data from help("lm")ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)d <- data.frame( weight = c(ctl, trt), group = gl(2, 10, 20, labels = c("Ctl","Trt")))fit <- stan_glm(weight ~ group, data = d, refresh = 0)ll <- log_lik(fit)r_eff <- relative_eff(exp(-ll), chain_id = rep(1:4, each = 1000))mu_pred <- posterior_epred(fit)# Leave-one-out mean absolute error of predictionsmae <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff)# Leave-one-out 90%-quantile of mean absolute errormae_90q <- loo_predictive_metric(x = mu_pred, y = d$weight, log_lik = ll, pred_error = 'mae', r_eff = r_eff, type = 'quantile', probs = 0.9)}Efficient approximate leave-one-out cross-validation (LOO) using subsampling,so that less costly and more approximate computation is made for all LOO-fold,and more costly and accurate computations are made only for m<N LOO-folds.
Description
Efficient approximate leave-one-out cross-validation (LOO) using subsampling,so that less costly and more approximate computation is made for all LOO-fold,and more costly and accurate computations are made only for m<N LOO-folds.
Usage
loo_subsample(x, ...)## S3 method for class ''function''loo_subsample( x, ..., data = NULL, draws = NULL, observations = 400, log_p = NULL, log_g = NULL, r_eff = 1, save_psis = FALSE, cores = getOption("mc.cores", 1), loo_approximation = "plpd", loo_approximation_draws = NULL, estimator = "diff_srs", llgrad = NULL, llhess = NULL)Arguments
x | A function. TheMethods (by class) section, below, has detaileddescriptions of how to specify the inputs. |
data,draws,... | For |
observations | The subsample observations to use. The argument can takefour (4) types of arguments:
|
log_p,log_g | Should be supplied only if approximate posterior draws areused. The default ( |
r_eff | Vector of relative effective sample size estimates for thelikelihood ( |
save_psis | Should the |
cores | The number of cores to use for parallelization. This defaults tothe option
|
loo_approximation | What type of approximation of the loo_i's should be used?The default is
As point estimates of |
loo_approximation_draws | The number of posterior draws used whenintegrating over the posterior. This is used if |
estimator | How should
|
llgrad | The gradient of the log-likelihood. Thisis only used when |
llhess | The Hessian of the log-likelihood. This is only usedwith |
Details
Theloo_subsample() function is an S3 generic and a methods iscurrently provided for log-likelihood functions. The implementation worksfor both MCMC and for posterior approximations where it is possible tocompute the log density for the approximation.
Value
loo_subsample() returns a named list with classc("psis_loo_ss", "psis_loo", "loo"). This has the same structure as objects returned byloo() but with the additional slot:
loo_subsampling: A list with two vectors,log_pandlog_g, of thesame length containing the posterior density and the approximation densityfor the individual draws.
Methods (by class)
loo_subsample(`function`): A functionf()that takes argumentsdata_ianddrawsand returns avector containing the log-likelihood for a single observationievaluatedat each posterior draw. The function should be written such that, for eachobservationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length
S(size of posterior sample). Thelog-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments
dataanddrawsmust alsobe specified in the call toloo():data: A data frame or matrix containing the data (e.g.observed outcome and predictors) needed to compute the pointwiselog-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of thelog-likelihood function.draws: An object containing the posterior draws for anyparameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation theentire objectdrawswill be passed to thedrawsargument ofthe log-likelihood function.The
...can be used if your log-likelihood function takes additionalarguments. These arguments are used like thedrawsargument in that theyare recycled for each observation.
References
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019).Leave-One-Out Cross-Validation for Large Data.InThirty-sixth International Conference on Machine Learning,PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020).Leave-One-Out Cross-Validation for Model Comparison in Large Data.InProceedings of the 23rd International Conference on ArtificialIntelligence and Statistics (AISTATS), PMLR 108:341-351.
See Also
Named lists
Description
Create a named list using specified names or, if names are omitted, using thenames of the objects in the list. The codelist(a = a, b = b) becomesnlist(a,b) andlist(a = a, b = 2) becomesnlist(a, b = 2), etc.
Usage
nlist(...)Arguments
... | Objects to include in the list. |
Value
A named list.
Examples
# All variables already defineda <- rnorm(100)b <- mat.or.vec(10, 3)nlist(a,b)# Define some variables in the call and take the rest from the environmentnlist(a, b, veggies = c("lettuce", "spinach"), fruits = c("banana", "papaya"))The number of observations in apsis_loo_ss object.
Description
The number of observations in apsis_loo_ss object.
Usage
## S3 method for class 'psis_loo_ss'nobs(object, ...)Arguments
object | a |
... | Currently unused. |
Get observation indices used in subsampling
Description
Get observation indices used in subsampling
Usage
obs_idx(x, rep = TRUE)Arguments
x | A |
rep | If sampling with replacement is used, an observation can havemultiple samples and these are then repeated in the returned object if |
Value
An integer vector.
Extractor methods
Description
These are only defined in order to deprecate with a warning (rather thanremove and break backwards compatibility) the old way of accessing the pointestimates in a"psis_loo" or"psis" object. The new way as ofv2.0.0 is to get them from the"estimates" component of the object.
Usage
## S3 method for class 'loo'x[i]## S3 method for class 'loo'x[[i, exact = TRUE]]## S3 method for class 'loo'x$nameArguments
x,i,exact,name | SeeExtract. |
Parallel psis list computations
Description
Parallel psis list computations
Usage
parallel_psis_list( N, .loo_i, .llfun, data, draws, r_eff, save_psis, cores, ...)parallel_importance_sampling_list( N, .loo_i, .llfun, data, draws, r_eff, save_psis, cores, method, ...)Arguments
N | The total number of observations (i.e. |
.loo_i | The function used to compute individual loo contributions. |
.llfun | See |
data,draws,... | For the |
r_eff | Vector of relative effective sample size estimates for thelikelihood ( |
save_psis | Should the |
cores | The number of cores to use for parallelization. This defaults tothe option
|
method | See |
Details
Refactored function to handle parallel computationsfor psis_list
Diagnostics for Pareto smoothed importance sampling (PSIS)
Description
Print a diagnostic table summarizing the estimated Pareto shape parametersand PSIS effective sample sizes, find the indexes of observations for whichthe estimated Pareto shape parameterk is larger than somethreshold value, or plot observation indexes vs. diagnostic estimates.TheDetails section below provides a brief overview of thediagnostics, but we recommend consulting Vehtari, Gelman, and Gabry (2017)and Vehtari, Simpson, Gelman, Yao, and Gabry (2024) for full details.
Usage
pareto_k_table(x)pareto_k_ids(x, threshold = NULL)pareto_k_values(x)pareto_k_influence_values(x)psis_n_eff_values(x)mcse_loo(x, threshold = NULL)## S3 method for class 'psis_loo'plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot")## S3 method for class 'psis'plot( x, diagnostic = c("k", "ESS", "n_eff"), ..., label_points = FALSE, main = "PSIS diagnostic plot")Arguments
x | |
threshold | For |
diagnostic | For the |
label_points,... | For the |
main | For the |
Details
The reliability and approximate convergence rate of the PSIS-basedestimates can be assessed using the estimates for the shapeparameterk of the generalized Pareto distribution. Thediagnostic threshold for Paretok depends on sample sizeS (sample size dependent threshold was introduced by Vehtariet al. (2024), and before that fixed thresholds of 0.5 and 0.7 wererecommended). For simplicity,loo package uses the nominal samplesizeS when computing the sample size specificthreshold. This provides an optimistic threshold if the effectivesample size is less than 2200, but if MCMC-ESS > S/2 the differenceis usually negligible. Thinning of MCMC draws can be used toimprove the ratio ESS/S.
If
k < min(1 - 1 / log10(S), 0.7), whereSis thesample size, the PSIS estimate and the corresponding Monte Carlostandard error estimate are reliable.If
1 - 1 / log10(S) <= k < 0.7, the PSIS estimate and thecorresponding Monte Carlo standard error estimate are notreliable, but increasing the (effective) sample sizeSabove2200 may help (this will increase the sample size specificthreshold(1-1/log10(2200)>0.7and then the bias specificthreshold 0.7 dominates).If
0.7 <= k < 1, the PSIS estimate and the corresponding MonteCarlo standard error have large bias and are not reliable. Increasingthe sample size may reduce the variability inkestimate, whichmay result in lowerkestimate, too.If
k \geq 1, the target distribution is estimated tohave a non-finite mean. The PSIS estimate and the corresponding MonteCarlo standard error are not well defined. Increasing the sample sizemay reduce the variability in thekestimate, whichmay also result in a lowerkestimate.
What if the estimated tail shape parameterkexceeds the diagnostic threshold?
Importance sampling is likely towork less well if the marginal posteriorp(\theta^s | y) andLOO posteriorp(\theta^s | y_{-i}) are very different, whichis more likely to happen with a non-robust model and highlyinfluential observations. If the estimated tail shape parameterk exceeds the diagnostic threshold, the user should bewarned. (Note: Ifk is greater than the diagnostic thresholdthen WAIC is also likely to fail, but WAIC lacks as accuratediagnostic.) When using PSIS in the context of approximate LOO-CV,we recommend one of the following actions:
With some additional computations, it is possible to transformthe MCMC draws from the posterior distribution to obtain morereliable importance sampling estimates. This results in a smallershape parameter
k. Seeloo_moment_match()and thevignetteAvoiding model refits in leave-one-out cross-validationwith moment matching for an example of this.Sampling from a leave-one-out mixture distribution (see thevignetteMixture IS leave-one-out cross-validation forhigh-dimensional Bayesian models), directly from
p(\theta^s | y_{-i})for the problematic observationsi, or usingK-fold cross-validation (see the vignetteHoldoutvalidation and K-fold cross-validation of Stan programs with theloo package) will generally be more stable.Using a model that is more robust to anomalous observations willgenerally make approximate LOO-CV more stable.
Observation influence statistics
The estimated shape parameterk for each observation can be used as a measure of the observation'sinfluence on posterior distribution of the model. These can be obtained withpareto_k_influence_values().
Effective sample size and error estimates
In the case that weobtain the samples from the proposal distribution via MCMC theloopackage also computes estimates for the Monte Carlo error and the effectivesample size for importance sampling, which are more accurate for PSIS thanfor IS and TIS (see Vehtari et al (2024) for details). However, the PSISeffective sample size estimate will beover-optimistic when the estimate ofk is greater thanmin(1-1/log10(S), 0.7), whereS is the sample size.
Value
pareto_k_table() returns an object of class"pareto_k_table", which is a matrix with columns"Count","Proportion", and"Min. n_eff", and has its own print method.
pareto_k_ids() returns an integer vector indicating whichobservations have Paretok estimates abovethreshold.
pareto_k_values() returns a vector of the estimated Paretok parameters. These represent the reliability of sampling.
pareto_k_influence_values() returns a vector of the estimated Paretok parameters. These represent influence of the observations on themodel posterior distribution.
psis_n_eff_values() returns a vector of the estimated PSISeffective sample sizes.
mcse_loo() returns the Monte Carlo standard error (MCSE)estimate for PSIS-LOO. MCSE will be NA if any Paretok values areabovethreshold.
Theplot() method is called for its side effect and does notreturn anything. Ifx is the result of a call toloo()orpsis() thenplot(x, diagnostic) produces a plot ofthe estimates of the Pareto shape parameters (diagnostic = "k") orestimates of the PSIS effective sample sizes (diagnostic = "ESS").
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
psis()for the implementation of the PSIS algorithm.TheFAQ page ontheloo website for answers to frequently asked questions.
Convenience function for extracting pointwise estimates
Description
Convenience function for extracting pointwise estimates
Usage
pointwise(x, estimate, ...)## S3 method for class 'loo'pointwise(x, estimate, ...)Arguments
x | A |
estimate | Which pointwise estimate to return. By default all arereturned. The objects returned by the different functions ( |
... | Currently ignored. |
Value
A vector of length equal to the number of observations.
Examples
x <- loo(example_loglik_array())pointwise(x, "elpd_loo")Print methods
Description
Print methods
Usage
## S3 method for class 'loo'print(x, digits = 1, ...)## S3 method for class 'waic'print(x, digits = 1, ...)## S3 method for class 'psis_loo'print(x, digits = 1, plot_k = FALSE, ...)## S3 method for class 'importance_sampling_loo'print(x, digits = 1, plot_k = FALSE, ...)## S3 method for class 'psis_loo_ap'print(x, digits = 1, plot_k = FALSE, ...)## S3 method for class 'psis'print(x, digits = 1, plot_k = FALSE, ...)## S3 method for class 'importance_sampling'print(x, digits = 1, plot_k = FALSE, ...)Arguments
x | |
digits | An integer passed to |
... | Arguments passed to |
plot_k | Logical. If |
Value
x, invisibly.
See Also
Print dimensions of log-likelihood or log-weights matrix
Description
Print dimensions of log-likelihood or log-weights matrix
Usage
print_dims(x, ...)## S3 method for class 'importance_sampling'print_dims(x, ...)## S3 method for class 'psis_loo'print_dims(x, ...)## S3 method for class 'importance_sampling_loo'print_dims(x, ...)## S3 method for class 'waic'print_dims(x, ...)## S3 method for class 'kfold'print_dims(x, ...)## S3 method for class 'psis_loo_ss'print_dims(x, ...)Arguments
x | |
... | Ignored. |
Pareto smoothed importance sampling (PSIS)
Description
Implementation of Pareto smoothed importance sampling (PSIS), a method forstabilizing importance ratios. The version of PSIS implemented herecorresponds to the algorithm presented in Vehtari, Simpson, Gelman, Yao,and Gabry (2024).For PSIS diagnostics see thepareto-k-diagnostic page.
Usage
psis(log_ratios, ...)## S3 method for class 'array'psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))## S3 method for class 'matrix'psis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))## Default S3 method:psis(log_ratios, ..., r_eff = 1)is.psis(x)is.sis(x)is.tis(x)Arguments
log_ratios | An array, matrix, or vector of importance ratios on the logscale (for PSIS-LOO these arenegative log-likelihood values). See theMethods (by class) section below for a detailed description of howto specify the inputs for each method. |
... | Arguments passed on to the various methods. |
r_eff | Vector of relative effective sample size estimates containingone element per observation. The values provided should be the relativeeffective sample sizes of |
cores | The number of cores to use for parallelization. This defaults tothe option
|
x | For |
Value
Thepsis() methods return an object of class"psis",which is a named list with the following components:
log_weightsVector or matrix of smoothed (and truncated) butunnormalized logweights. To get normalized weights use the
weights()method provided for objects ofclass"psis".diagnosticsA named list containing two vectors:
pareto_k: Estimates of the shape parameterkof thegeneralized Pareto distribution. See thepareto-k-diagnosticpage for details.n_eff: PSIS effective sample size estimates.
Objects of class"psis" also have the followingattributes:
norm_const_logVector of precomputed values of
colLogSumExps(log_weights)that areused internally by theweightsmethod to normalize the log weights.tail_lenVector of tail lengths used for fitting the generalized Pareto distribution.
r_effIf specified, the user's
r_effargument.dimsInteger vector of length 2 containing
S(posterior sample size)andN(number of observations).methodMethod used for importance sampling, here
psis.
Methods (by class)
psis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.psis(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.psis(default): A vector of lengthS(posterior sample size).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
loo()for approximate LOO-CV using PSIS.pareto-k-diagnostic for PSIS diagnostics.
Theloo packagevignettesfor demonstrations.
TheFAQ page ontheloo website for answers to frequently asked questions.
Examples
log_ratios <- -1 * example_loglik_array()r_eff <- relative_eff(exp(-log_ratios))psis_result <- psis(log_ratios, r_eff = r_eff)str(psis_result)plot(psis_result)# extract smoothed weightslw <- weights(psis_result) # default args are log=TRUE, normalize=TRUEulw <- weights(psis_result, normalize=FALSE) # unnormalized log-weightsw <- weights(psis_result, log=FALSE) # normalized weights (not log-weights)uw <- weights(psis_result, log=FALSE, normalize = FALSE) # unnormalized weightsDiagnostics for Laplace and ADVI approximations and Laplace-loo and ADVI-loo
Description
Diagnostics for Laplace and ADVI approximations and Laplace-loo and ADVI-loo
Usage
psis_approximate_posterior( log_p = NULL, log_g = NULL, log_liks = NULL, cores, save_psis, ..., log_q = NULL)Arguments
log_p | The log-posterior (target) evaluated at S samples from theproposal distribution (g). A vector of length S. |
log_g | The log-density (proposal) evaluated at S samples from theproposal distribution (g). A vector of length S. |
log_liks | A log-likelihood matrix of size S * N, where N is the numberof observations and S is the number of samples from q. See |
cores | The number of cores to use for parallelization. This defaults tothe option
|
save_psis | Should the |
log_q | Deprecated argument name (the same as log_g). |
Value
If log likelihoods are supplied, the function returns a"loo" object,otherwise the function returns a"psis" object.
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
Pareto smoothed importance sampling (deprecated, old version)
Description
As of version2.0.0 this function isdeprecated. Please use thepsis() function for the new PSIS algorithm.
Usage
psislw( lw, wcp = 0.2, wtrunc = 3/4, cores = getOption("mc.cores", 1), llfun = NULL, llargs = NULL, ...)Arguments
lw | A matrix or vector of log weights. For computing LOO, |
wcp | The proportion of importance weights to use for the generalizedPareto fit. The |
wtrunc | For truncating very large weights to |
cores | The number of cores to use for parallelization. This defaults tothe option |
llfun,llargs | See |
... | Ignored when |
Value
A named list with componentslw_smooth (modified log weights) andpareto_k (estimated generalized Pareto shape parameter(s) k).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
pareto-k-diagnostic for PSIS diagnostics.
Convenience function for computing relative efficiencies
Description
relative_eff() computes the the MCMC effective sample size divided bythe total sample size.
Usage
relative_eff(x, ...)## Default S3 method:relative_eff(x, chain_id, ...)## S3 method for class 'matrix'relative_eff(x, chain_id, ..., cores = getOption("mc.cores", 1))## S3 method for class 'array'relative_eff(x, ..., cores = getOption("mc.cores", 1))## S3 method for class ''function''relative_eff( x, chain_id, ..., cores = getOption("mc.cores", 1), data = NULL, draws = NULL)## S3 method for class 'importance_sampling'relative_eff(x, ...)Arguments
x | A vector, matrix, 3-D array, or function. See theMethods (byclass) section below for details on specifying |
chain_id | A vector of length |
cores | The number of cores to use for parallelization. |
data,draws,... | Same as for the |
Value
A vector of relative effective sample sizes.
Methods (by class)
relative_eff(default): A vector of lengthS(posterior sample size).relative_eff(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.relative_eff(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.relative_eff(`function`): A functionf()that takes argumentsdata_ianddrawsand returns avector containing the log-likelihood for a single observationievaluatedat each posterior draw. The function should be written such that, for eachobservationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length
S(size of posterior sample). Thelog-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments
dataanddrawsmust alsobe specified in the call toloo():data: A data frame or matrix containing the data (e.g.observed outcome and predictors) needed to compute the pointwiselog-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of thelog-likelihood function.draws: An object containing the posterior draws for anyparameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation theentire objectdrawswill be passed to thedrawsargument ofthe log-likelihood function.The
...can be used if your log-likelihood function takes additionalarguments. These arguments are used like thedrawsargument in that theyare recycled for each observation.
relative_eff(importance_sampling): Ifxis an object of class"psis",relative_eff()simply returnsther_effattribute ofx.
Examples
LLarr <- example_loglik_array()LLmat <- example_loglik_matrix()dim(LLarr)dim(LLmat)rel_n_eff_1 <- relative_eff(exp(LLarr))rel_n_eff_2 <- relative_eff(exp(LLmat), chain_id = rep(1:2, each = 500))all.equal(rel_n_eff_1, rel_n_eff_2)Standard importance sampling (SIS)
Description
Implementation of standard importance sampling (SIS).
Usage
sis(log_ratios, ...)## S3 method for class 'array'sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1))## S3 method for class 'matrix'sis(log_ratios, ..., r_eff = NULL, cores = getOption("mc.cores", 1))## Default S3 method:sis(log_ratios, ..., r_eff = NULL)Arguments
log_ratios | An array, matrix, or vector of importance ratios on the logscale (for Importance sampling LOO, these arenegative log-likelihoodvalues). See theMethods (by class) section below for a detaileddescription of how to specify the inputs for each method. |
... | Arguments passed on to the various methods. |
r_eff | Vector of relative effective sample size estimates containingone element per observation. The values provided should be the relativeeffective sample sizes of |
cores | The number of cores to use for parallelization. This defaults tothe option
|
Value
Thesis() methods return an object of class"sis",which is a named list with the following components:
log_weightsVector or matrix of smoothed butunnormalized logweights. To get normalized weights use the
weights()method provided for objects ofclasssis.diagnosticsA named list containing one vector:
pareto_k: Not used insis, all set to 0.n_eff: effective sample size estimates.
Objects of class"sis" also have the followingattributes:
norm_const_logVector of precomputed values of
colLogSumExps(log_weights)that areused internally by theweightsmethod to normalize the log weights.r_effIf specified, the user's
r_effargument.tail_lenNot used for
sis.dimsInteger vector of length 2 containing
S(posterior sample size)andN(number of observations).methodMethod used for importance sampling, here
sis.
Methods (by class)
sis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.sis(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.sis(default): A vector of lengthS(posterior sample size).
References
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
psis()for approximate LOO-CV using PSIS.loo()for approximate LOO-CV.pareto-k-diagnostic for PSIS diagnostics.
Examples
log_ratios <- -1 * example_loglik_array()r_eff <- relative_eff(exp(-log_ratios))sis_result <- sis(log_ratios, r_eff = r_eff)str(sis_result)# extract smoothed weightslw <- weights(sis_result) # default args are log=TRUE, normalize=TRUEulw <- weights(sis_result, normalize=FALSE) # unnormalized log-weightsw <- weights(sis_result, log=FALSE) # normalized weights (not log-weights)uw <- weights(sis_result, log=FALSE, normalize = FALSE) # unnormalized weightsTruncated importance sampling (TIS)
Description
Implementation of truncated (self-normalized) importance sampling (TIS),truncated at S^(1/2) as recommended by Ionides (2008).
Usage
tis(log_ratios, ...)## S3 method for class 'array'tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))## S3 method for class 'matrix'tis(log_ratios, ..., r_eff = 1, cores = getOption("mc.cores", 1))## Default S3 method:tis(log_ratios, ..., r_eff = 1)Arguments
log_ratios | An array, matrix, or vector of importance ratios on the logscale (for Importance sampling LOO, these arenegative log-likelihoodvalues). See theMethods (by class) section below for a detaileddescription of how to specify the inputs for each method. |
... | Arguments passed on to the various methods. |
r_eff | Vector of relative effective sample size estimates containingone element per observation. The values provided should be the relativeeffective sample sizes of |
cores | The number of cores to use for parallelization. This defaults tothe option
|
Value
Thetis() methods return an object of class"tis",which is a named list with the following components:
log_weightsVector or matrix of smoothed (and truncated) butunnormalized logweights. To get normalized weights use the
weights()method provided for objects ofclasstis.diagnosticsA named list containing one vector:
pareto_k: Not used intis, all set to 0.n_eff: Effective sample size estimates.
Objects of class"tis" also have the followingattributes:
norm_const_logVector of precomputed values of
colLogSumExps(log_weights)that areused internally by theweights()method to normalize the log weights.r_effIf specified, the user's
r_effargument.tail_lenNot used for
tis.dimsInteger vector of length 2 containing
S(posterior sample size)andN(number of observations).methodMethod used for importance sampling, here
tis.
Methods (by class)
tis(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.tis(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.tis(default): A vector of lengthS(posterior sample size).
References
Ionides, Edward L. (2008). Truncated importance sampling.Journal of Computational and Graphical Statistics 17(2): 295–311.
See Also
psis()for approximate LOO-CV using PSIS.loo()for approximate LOO-CV.pareto-k-diagnostic for PSIS diagnostics.
Examples
log_ratios <- -1 * example_loglik_array()r_eff <- relative_eff(exp(-log_ratios))tis_result <- tis(log_ratios, r_eff = r_eff)str(tis_result)# extract smoothed weightslw <- weights(tis_result) # default args are log=TRUE, normalize=TRUEulw <- weights(tis_result, normalize=FALSE) # unnormalized log-weightsw <- weights(tis_result, log=FALSE) # normalized weights (not log-weights)uw <- weights(tis_result, log=FALSE, normalize = FALSE) # unnormalized weightsUpdatepsis_loo_ss objects
Description
Updatepsis_loo_ss objects
Usage
## S3 method for class 'psis_loo_ss'update( object, ..., data = NULL, draws = NULL, observations = NULL, r_eff = 1, cores = getOption("mc.cores", 1), loo_approximation = NULL, loo_approximation_draws = NULL, llgrad = NULL, llhess = NULL)Arguments
object | A |
... | Currently not used. |
data,draws | |
observations | The subsample observations to use. The argument can takefour (4) types of arguments:
|
r_eff | Vector of relative effective sample size estimates for thelikelihood ( |
cores | The number of cores to use for parallelization. This defaults tothe option
|
loo_approximation | What type of approximation of the loo_i's should be used?The default is
As point estimates of |
loo_approximation_draws | The number of posterior draws used whenintegrating over the posterior. This is used if |
llgrad | The gradient of the log-likelihood. Thisis only used when |
llhess | The Hessian of the log-likelihood. This is only usedwith |
Details
Ifobservations is updated then if a vector of indices or apsis_loo_ssobject is supplied the updated object will have exactly the observationsindicated by the vector orpsis_loo_ss object. If a single integer issupplied, new observations will be sampled to reach the supplied sample size.
Value
Apsis_loo_ss object.
Widely applicable information criterion (WAIC)
Description
Thewaic() methods can be used to compute WAIC from the pointwiselog-likelihood. However, we recommend LOO-CV using PSIS (as implemented bytheloo() function) because PSIS provides useful diagnostics as well aseffective sample size and Monte Carlo estimates.
Usage
waic(x, ...)## S3 method for class 'array'waic(x, ...)## S3 method for class 'matrix'waic(x, ...)## S3 method for class ''function''waic(x, ..., data = NULL, draws = NULL)is.waic(x)Arguments
x | A log-likelihood array, matrix, or function. TheMethods (by class)section, below, has detailed descriptions of how to specify the inputs foreach method. |
draws,data,... | For the function method only. See theMethods (by class) section below for details on these arguments. |
Value
A named list (of classc("waic", "loo")) with components:
estimatesA matrix with two columns (
"Estimate","SE") and threerows ("elpd_waic","p_waic","waic"). This containspoint estimates and standard errors of the expected log pointwise predictivedensity (elpd_waic), the effective number of parameters(p_waic) and the information criterionwaic(which is just-2 * elpd_waic, i.e., converted to deviance scale).pointwiseA matrix with three columns (and number of rows equal to the number ofobservations) containing the pointwise contributions of each of the abovemeasures (
elpd_waic,p_waic,waic).
Methods (by class)
waic(array): AnIbyCbyNarray, whereIis the number of MCMC iterations per chain,Cis the number ofchains, andNis the number of data points.waic(matrix): AnSbyNmatrix, whereSis the sizeof the posterior sample (with all chains merged) andNis the numberof data points.waic(`function`): A functionf()that takes argumentsdata_ianddrawsand returns avector containing the log-likelihood for a single observationievaluatedat each posterior draw. The function should be written such that, for eachobservationiin1:N, evaluatingf(data_i = data[i,, drop=FALSE], draws = draws)
results in a vector of length
S(size of posterior sample). Thelog-likelihood function can also have additional arguments butdata_ianddrawsare required.If using the function method then the arguments
dataanddrawsmust alsobe specified in the call toloo():data: A data frame or matrix containing the data (e.g.observed outcome and predictors) needed to compute the pointwiselog-likelihood. For each observationi, theith row ofdatawill be passed to thedata_iargument of thelog-likelihood function.draws: An object containing the posterior draws for anyparameters needed to compute the pointwise log-likelihood. Unlikedata, which is indexed by observation, for each observation theentire objectdrawswill be passed to thedrawsargument ofthe log-likelihood function.The
...can be used if your log-likelihood function takes additionalarguments. These arguments are used like thedrawsargument in that theyare recycled for each observation.
References
Watanabe, S. (2010). Asymptotic equivalence of Bayes cross validation andwidely application information criterion in singular learning theory.Journal of Machine Learning Research11, 3571-3594.
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian modelevaluation using leave-one-out cross-validation and WAIC.Statistics and Computing. 27(5), 1413–1432. doi:10.1007/s11222-016-9696-4(journal version,preprint arXiv:1507.04544).
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2024).Pareto smoothed importance sampling.Journal of Machine Learning Research,25(72):1-58.PDF
See Also
Theloo packagevignettes andVehtari, Gelman, and Gabry (2017) and Vehtari, Simpson, Gelman, Yao,and Gabry (2024) for more details on why we prefer
loo()towaic().loo_compare()for comparing models on approximate LOO-CV or WAIC.
Examples
### Array and matrix methodsLLarr <- example_loglik_array()dim(LLarr)LLmat <- example_loglik_matrix()dim(LLmat)waic_arr <- waic(LLarr)waic_mat <- waic(LLmat)identical(waic_arr, waic_mat)## Not run: log_lik1 <- extract_log_lik(stanfit1)log_lik2 <- extract_log_lik(stanfit2)(waic1 <- waic(log_lik1))(waic2 <- waic(log_lik2))print(compare(waic1, waic2), digits = 2)## End(Not run)Extract importance sampling weights
Description
Extract importance sampling weights
Usage
## S3 method for class 'importance_sampling'weights(object, ..., log = TRUE, normalize = TRUE)Arguments
object | |
... | Ignored. |
log | Should the weights be returned on the log scale? Defaults to |
normalize | Should the weights be normalized? Defaults to |
Value
Theweights() method returns an object with the same dimensions asthelog_weights component ofobject. Thenormalize andlogarguments control whether the returned weights are normalized and whetheror not to return them on the log scale.
Examples
# See the examples at help("psis")