How to Use the rstanarm Package

Jonah Gabry and Ben Goodrich

2025-09-29

library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default())

Introduction

This vignette provides an overview of how to use the functions in the rstanarm package that focuses on commonalities. The other rstanarm vignettes go into the particularities of each of the individual model-estimating functions.

The goal of the rstanarm package is to make Bayesian estimation routine for the most common regression models that applied researchers use. This will enable researchers to avoid the counter-intuitiveness of the frequentist approach to probability and statistics with only minimal changes to their existing R scripts.

The four steps of a Bayesian analysis are

  1. Specify a joint distribution for the outcome(s) and all the unknowns, which typically takes the form of a marginal prior distribution for the unknowns multiplied by a likelihood for the outcome(s) conditional on the unknowns. This joint distribution is proportional to a posterior distribution of the unknowns conditional on the observed data.
  2. Draw from the posterior distribution using Markov Chain Monte Carlo (MCMC).
  3. Evaluate how well the model fits the data and possibly revise the model.
  4. Draw from the posterior predictive distribution of the outcome(s) given interesting values of the predictors in order to visualize how a manipulation of a predictor affects (a function of) the outcome(s).

Step 1 is necessarily model-specific and is covered in more detail in the other vignettes that cover specific forms of the marginal prior distribution and likelihood of the outcome. It is somewhat more involved than the corresponding first step of a frequentist analysis, which only requires that the likelihood of the outcome be specified. However, the default priors in the rstanarm package should work well in the majority of cases. Steps 2, 3, and 4 are the focus of this vignette because they are largely not specific to how the joint distribution in Step 1 is specified.

The key concept in Step 3 and Step 4 is the posterior predictive distribution, which is the distribution of the outcome implied by the model after having used the observed data to update our beliefs about the unknown parameters. Frequentists, by definition, have no posterior predictive distribution, and frequentist predictions are subtly different from what applied researchers seek. Maximum likelihood estimates do not condition on the observed outcome data, and so the uncertainty in the estimates pertains to the variation in the sampling distribution of the estimator, i.e. the distribution of estimates that would occur if we could repeat the process of drawing a random sample from a well-defined population and apply the estimator to each sample. It is possible to construct a distribution of predictions under the frequentist paradigm but it evokes the hypothetical of repeating the process of drawing a random sample, applying the estimator each time, and generating point predictions of the outcome. In contrast, the posterior predictive distribution conditions on the observed outcome data in hand to update beliefs about the unknowns, and the variation in the resulting distribution of predictions reflects the remaining uncertainty in our beliefs about the unknowns.

Step 1: Specify a posterior distribution

For the sake of discussion, we need some posterior distribution to draw from. We will utilize an example from the HSAUR3 package by Brian S. Everitt and Torsten Hothorn, which is used in their 2014 book A Handbook of Statistical Analyses Using R (3rd Edition) (Chapman & Hall / CRC). This book is frequentist in nature and we will show how to obtain the corresponding Bayesian results.

The model in section 6.3.2 pertains to whether a survey respondent agrees or disagrees with a conservative statement about the role of women in society, which is modeled as a function of the gender and education of the respondents. The posterior distribution — with independent priors — can be written as \[f\left(\alpha, \beta_1, \beta_2 | \mathbf{y}, \mathbf{X}\right) \propto f\left(\alpha\right) f\left(\beta_1\right) f\left(\beta_2\right) \times \prod_{i=1}^J g^{-1}\left(\eta_i\right)^{y_i} \left(1 - g^{-1}\left(\eta_i\right)\right)^{n_i - y_i},\] where \(\eta_i = \alpha + \beta_1 \mbox{education}_i + \beta_2 \mbox{Female}_i\) is the linear predictor and a function of an intercept \(\left(\alpha\right)\), a coefficient on the years of education \(\left(\beta_1\right)\), and an intercept-shift \(\left(\beta_2\right)\) for the case where the respondent is female. These data are organized such that \(y_i\) is the number of respondents who agree with the statement that have the same level of education and the same gender, and \(n_i - y_i\) is the number of such people who disagree with the statement. The inverse link function, \(p = g^{-1}\left(\eta_i\right)\), for a binomial likelihood can be one of several Cumulative Distribution Functions (CDFs) but in this case is the standard logistic CDF, \(g^{-1}\left(\eta_i\right) = \frac{1}{1 + e^{-\eta_i}}\).

Suppose we believe — prior to seeing the data — that \(\alpha\), \(\beta_1\), and \(\beta_2\) are probably close to zero, are as likely to be positive as they are to be negative, but have a small chance of being quite far from zero. These beliefs can be represented by Student t distributions with a few degrees of freedom in order to produce moderately heavy tails. In particular, we will specify seven degrees of freedom. Note that these purported beliefs may well be more skeptical than your actual beliefs, which are probably that women and people with more education have less conservative societal views.
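
As a quick visual check (a sketch, not part of the original analysis), we can compare the tails of such a Student t prior to a normal distribution with the same scale; the scale of 5 anticipates the value passed to stan_glm in Step 2:

curve(dt(x / 5, df = 7) / 5, from = -20, to = 20,
      xlab = expression(beta), ylab = "prior density")  # Student t, df = 7, scale = 5
curve(dnorm(x, mean = 0, sd = 5), add = TRUE, lty = 2)  # normal with the same scale
legend("topright", legend = c("Student t (df = 7, scale = 5)", "Normal (sd = 5)"),
       lty = 1:2, bty = "n")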

Note on “prior beliefs” and default priors

In this vignette we use the term “prior beliefs” to refer in generality to the information content of the prior distribution (conditional on the model). Sometimes previous research on the topic of interest motivates beliefs about model parameters, but other times such work may not exist or several studies may make contradictory claims. Regardless, we nearly always have some knowledge that should be reflected in our choice of prior distributions. For example, no one believes a logistic regression coefficient will be greater than five in absolute value if the predictors are scaled reasonably. You may also have seen examples of so-called “non-informative” (or “vague”, “diffuse”, etc.) priors like a normal distribution with a variance of 1000. When data are reasonably scaled, these priors are almost always a bad idea for various reasons (they give non-trivial weight to extreme values, reduce computational efficiency, etc.). The default priors in rstanarm are designed to be weakly informative, by which we mean that they avoid placing unwarranted prior weight on nonsensical parameter values and provide some regularization to avoid overfitting, but also do allow for extreme values if warranted by the data. If additional information is available, the weakly informative defaults can be replaced with more informative priors.

Step 2: Draw from the posterior distribution

The likelihood for the sample is just the product over the \(J\) groups of \[g^{-1}\left(\eta_i\right)^{y_i} \left(1 - g^{-1}\left(\eta_i\right)\right)^{n_i - y_i},\] which can be maximized over \(\alpha\), \(\beta_1\), and \(\beta_2\) to obtain frequentist estimates by calling

data("womensrole",package ="HSAUR3")womensrole$total<- womensrole$agree+ womensrole$disagreewomensrole_glm_1<-glm(cbind(agree, disagree)~ education+ gender,data = womensrole,family =binomial(link ="logit"))round(coef(summary(womensrole_glm_1)),3)
             Estimate Std. Error z value Pr(>|z|)
(Intercept)     2.509      0.184  13.646    0.000
education      -0.271      0.015 -17.560    0.000
genderFemale   -0.011      0.084  -0.136    0.892

The p-value for the null hypothesis that \(\beta_1 = 0\) is very small, while the p-value for the null hypothesis that \(\beta_2 = 0\) is very large. However, frequentist p-values are awkward because they do not pertain to the probability that a scientific hypothesis is true but rather to the probability of observing a \(z\)-statistic that is so large (in magnitude) if the null hypothesis were true. The desire to make probabilistic statements about a scientific hypothesis is one reason why many people are drawn to the Bayesian approach.

A model with the same likelihood but Student t priors with seven degrees of freedom can be specified using the rstanarm package in a similar way by prepending stan_ to the glm call and specifying priors (and optionally the number of cores on your computer to utilize):

library(rstanarm)
womensrole_bglm_1 <- stan_glm(cbind(agree, disagree) ~ education + gender,
                              data = womensrole,
                              family = binomial(link = "logit"),
                              prior = student_t(df = 7, 0, 5),
                              prior_intercept = student_t(df = 7, 0, 5),
                              cores = 2, seed = 12345)
womensrole_bglm_1
stan_glm
 family:       binomial [logit]
 formula:      cbind(agree, disagree) ~ education + gender
 observations: 42
 predictors:   3
------
             Median MAD_SD
(Intercept)   2.5    0.2
education    -0.3    0.0
genderFemale  0.0    0.1
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

As can be seen, the “Bayesian point estimates” — which are represented by the posterior medians — are very similar to the maximum likelihood estimates. Frequentists would ask whether the point estimate is greater in magnitude than double the estimated standard deviation of the sampling distribution. But here we simply have estimates of the standard deviation of the marginal posterior distributions, which are based on a scaling of the Median Absolute Deviation (MAD) from the posterior medians to obtain a robust estimator of the posterior standard deviation. In addition, we can use the posterior_interval function to obtain a Bayesian uncertainty interval for \(\beta_1\):

ci95 <- posterior_interval(womensrole_bglm_1, prob = 0.95, pars = "education")
round(ci95, 2)
          2.5% 97.5%
education -0.3 -0.24

Unlike frequentist confidence intervals — which are not interpretable in terms of post-data probabilities — the Bayesian uncertainty interval indicates we believe after seeing the data that there is a \(0.95\) probability that \(\beta_1\) is between ci95[1,1] and ci95[1,2]. Alternatively, we could say that there is essentially zero probability that \(\beta_1 > 0\), although frequentists cannot make such a claim coherently.
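
That last claim can be checked directly from the posterior draws. A minimal sketch (the as.matrix method extracts the draws from a stanreg object):

draws <- as.matrix(womensrole_bglm_1)  # one column per parameter
mean(draws[, "education"] > 0)         # Pr(beta_1 > 0 | data); essentially zero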

Many of the post-estimation methods that are available for a model that is estimated by glm are also available for a model that is estimated by stan_glm. For example,

cbind(Median = coef(womensrole_bglm_1), MAD_SD = se(womensrole_bglm_1))
                  Median     MAD_SD
(Intercept)   2.52098276 0.18285768
education    -0.27153061 0.01556542
genderFemale -0.01262136 0.08463091
summary(residuals(womensrole_bglm_1))  # not deviance residuals
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.       NA's
-0.3076575 -0.0359870 -0.0041319 -0.0003265  0.0660755  0.2822688          1
cov2cor(vcov(womensrole_bglm_1))
             (Intercept)   education genderFemale
(Intercept)    1.0000000 -0.93963167  -0.23059559
education     -0.9396317  1.00000000  -0.02463045
genderFemale  -0.2305956 -0.02463045   1.00000000

rstanarm does provide a confint method, although it is reserved for computing confidence intervals in the case that the user elects to estimate a model by (penalized) maximum likelihood. When using full Bayesian inference (the rstanarm default) or approximate Bayesian inference, the posterior_interval function should be used to obtain Bayesian uncertainty intervals.
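
For example, omitting the pars argument returns uncertainty intervals for all parameters at once (a sketch using a 90% central interval):

posterior_interval(womensrole_bglm_1, prob = 0.9)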

Step 3: Criticize the model

The launch_shinystan function in the shinystan package provides almost all the tools you need to visualize the posterior distribution and diagnose any problems with the Markov chains. In this case, the results are fine, and to verify that, you can call

launch_shinystan(womensrole_bglm_1, ppd = FALSE)

which will open a web browser that drives the visualizations.

For the rest of this subsection, we focus on what users can do programmatically to evaluate whether a model is adequate. A minimal requirement for Bayesian estimates is that the model should fit the data that the estimates conditioned on. The key function here is posterior_predict, which can be passed a new data.frame to predict out-of-sample, but in this case is omitted to obtain in-sample posterior predictions:

y_rep <- posterior_predict(womensrole_bglm_1)
dim(y_rep)
[1] 4000   42

The resulting matrix has rows equal to the number of posterior simulations, which in this case is \(4000\), and columns equal to the number of observations in the original dataset, which is \(42\) combinations of education and gender. Each element of this matrix is a predicted number of respondents with that value of education and gender who agreed with the survey question and thus should be reasonably close to the observed proportion of agreements in the data. We can create a plot to check this:

par(mfrow = 1:2, mar = c(5, 3.7, 1, 0) + 0.1, las = 3)
boxplot(sweep(y_rep[, womensrole$gender == "Male"], 2,
              STATS = womensrole$total[womensrole$gender == "Male"], FUN = "/"),
        axes = FALSE, main = "Male", pch = NA,
        xlab = "Years of Education", ylab = "Proportion of Agrees")
with(womensrole, axis(1, at = education[gender == "Male"] + 1, labels = 0:20))
axis(2, las = 1)
with(womensrole[womensrole$gender == "Male", ],
     points(education + 1, agree / (agree + disagree), pch = 16, col = "red"))
boxplot(sweep(y_rep[, womensrole$gender == "Female"], 2,
              STATS = womensrole$total[womensrole$gender == "Female"], FUN = "/"),
        axes = FALSE, main = "Female", pch = NA,
        xlab = "Years of Education", ylab = "")
with(womensrole, axis(1, at = education[gender == "Female"] + 1, labels = 0:20))
with(womensrole[womensrole$gender == "Female", ],
     points(education + 1, agree / (agree + disagree), pch = 16, col = "red"))
Posterior predictive boxplots vs. observed data points

Here the boxplots provide the median, interquartile range, and hinges of the posterior predictive distribution for a given gender and level of education, while the red points represent the corresponding observed data. As can be seen, the model predicts the observed data fairly well for six to sixteen years of education but predicts less well for very low or very high levels of education, where there are fewer data.
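
An equivalent check can be expressed more compactly with the bayesplot package loaded at the start of this vignette. This is a sketch rather than part of the original analysis; the rows with zero respondents are dropped to avoid dividing by zero:

ok <- womensrole$total > 0                               # drop empty cells
prop_obs <- womensrole$agree[ok] / womensrole$total[ok]  # observed proportions
prop_rep <- sweep(y_rep[, ok], 2, womensrole$total[ok], FUN = "/")
ppc_intervals_grouped(y = prop_obs, yrep = prop_rep,
                      x = womensrole$education[ok],
                      group = womensrole$gender[ok])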

Consequently, we might consider a model where education has a quadratic effect on agreement, which is easy to specify using R’s formula-based syntax.

(womensrole_bglm_2 <- update(womensrole_bglm_1, formula. = . ~ . + I(education^2)))
stan_glm
 family:       binomial [logit]
 formula:      cbind(agree, disagree) ~ education + gender + I(education^2)
 observations: 42
 predictors:   4
------
               Median MAD_SD
(Intercept)     2.1    0.4
education      -0.2    0.1
genderFemale    0.0    0.1
I(education^2)  0.0    0.0
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg

Frequentists would test the null hypothesis that the coefficient on the squared level of education is zero. Bayesians might ask whether such a model is expected to produce better out-of-sample predictions than a model with only the level of education. The latter question can be answered using leave-one-out cross-validation or the approximation thereof provided by the loo function in the loo package, for which a method is provided by the rstanarm package.

loo_bglm_1 <- loo(womensrole_bglm_1)
loo_bglm_2 <- loo(womensrole_bglm_2)

First, we verify that the posterior is not too sensitive to any particular observation in the dataset.

par(mfrow = 1:2, mar = c(5, 3.8, 1, 0) + 0.1, las = 3)
plot(loo_bglm_1, label_points = TRUE)
plot(loo_bglm_2, label_points = TRUE)

There are only one or two moderate outliers (whose Pareto \(k\) statistics are greater than \(0.5\)), which should not have too much of an effect on the resulting model comparison:

loo_compare(loo_bglm_1, loo_bglm_2)
                  elpd_diff se_diff
womensrole_bglm_1  0.0       0.0
womensrole_bglm_2 -0.7       1.7

In this case, there is little difference in the expected log pointwise deviance between the two models, so we are essentially indifferent between them after taking into account that the second model estimates an additional parameter. The “LOO Information Criterion (LOOIC)”

loo_bglm_1
Computed from 4000 by 42 log-likelihood matrix.

         Estimate   SE
elpd_loo   -104.8  9.5
p_loo         4.2  1.7
looic       209.7 18.9
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume independent draws (r_eff=1).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     41    97.6%   446
   (0.7, 1]   (bad)       0     0.0%   <NA>
   (1, Inf)   (very bad)  1     2.4%   <NA>
See help('pareto-k-diagnostic') for details.

has the same purpose as the Akaike Information Criterion (AIC) that is used by frequentists. Both are intended to estimate the expected log predictive density (ELPD) for a new dataset. However, the AIC ignores priors and assumes that the posterior distribution is multivariate normal, whereas the functions from the loo package used here do not assume that the posterior distribution is multivariate normal and integrate over uncertainty in the parameters. This only assumes that any one observation can be omitted without having a major effect on the posterior distribution, which can be judged using the plots above.
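
For a rough side-by-side on the deviance scale (a sketch only, since the two criteria are not strictly comparable because the AIC ignores the priors):

AIC(womensrole_glm_1)                      # frequentist fit from Step 2
loo_bglm_1$estimates["looic", "Estimate"]  # LOOIC point estimate from above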

Step 4: Analyze manipulations of predictors

Frequentists attempt to interpret the estimates of the model, which is difficult except when the model is linear, has no inverse link function, and contains no interaction terms. Bayesians can avoid this difficulty simply by inspecting the posterior predictive distribution at different levels of the predictors. For example,

# note: in newdata we want agree and disagree to sum to the number of people we
# want to predict for. the values of agree and disagree don't matter so long as
# their sum is the desired number of trials. we need to explicitly imply the
# number of trials like this because our original data are aggregate. if we had
# bernoulli data then it would be a given we wanted to predict for single
# individuals.
newdata <- data.frame(agree = c(0, 0), disagree = c(100, 100),
                      education = c(12, 16),
                      gender = factor("Female", levels = c("Male", "Female")))
y_rep <- posterior_predict(womensrole_bglm_2, newdata)
summary(apply(y_rep, 1, diff))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -43.00  -24.00  -20.00  -19.76  -16.00    3.00

As can be seen, out of \(100\) women who have a college degree versus \(100\) women with only a high school degree, we would expect about \(20\) fewer college-educated women to agree with the question. There is an even chance that the difference is between \(16\) and \(24\), a one-in-four chance that it is greater, and a one-in-four chance that it is less.
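
The same comparison can be expressed as posterior probabilities, reusing y_rep from the previous chunk (a sketch):

diff_draws <- apply(y_rep, 1, diff)          # per-draw difference: 16 vs 12 years
quantile(diff_draws, probs = c(0.25, 0.75))  # the even-chance interval
mean(diff_draws < 0)                         # Pr(fewer college-educated women agree)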

Troubleshooting

This section provides suggestions for how to proceed when you encounter warning messages generated by the modeling functions in the rstanarm package. The example models below are used just for the purposes of concisely demonstrating certain difficulties and possible remedies (we won’t worry about the merit of the models themselves). The references at the end provide more information on the relevant issues.

Markov chains did not converge

Recommendation: run the chains for more iterations (in certain cases; see the qualification below).

By default, all rstanarm modeling functions will run four randomly initialized Markov chains, each for 2000 iterations (including a warmup period of 1000 iterations that is discarded). All chains must converge to the target distribution for inferences to be valid. For most models, the default settings are sufficient, but if you see a warning message about Markov chains not converging, the first thing to try is increasing the number of iterations. This can be done by specifying the iter argument. However, if all parameters have proper priors (no priors were set to NULL), you used the default values for iterations (2000) and chains (4), and the Rhats (explained below) are greater than 2, then increasing the number of iterations alone is unlikely to solve the problem.

One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior to other randomly initialized chains. This is the motivation for the Gelman and Rubin potential scale reduction statistic, Rhat. The Rhat statistic measures the ratio of the average variance of the draws within each chain to the variance of the pooled draws across chains; if all chains are at equilibrium, these will be the same and Rhat will be one. If the chains have not converged to a common distribution, the Rhat statistic will tend to be greater than one.
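
To make the idea concrete, here is a minimal sketch of the classic (non-split) formula for a single parameter, assuming a hypothetical matrix draws with one column of post-warmup draws per chain; the computation actually used by Stan is the more elaborate split-Rhat:

rhat_basic <- function(draws) {
  n <- nrow(draws)                     # draws per chain
  B <- n * var(colMeans(draws))        # between-chain variance
  W <- mean(apply(draws, 2, var))      # average within-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)  # approximately 1 when the chains agree
}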

Gelman and Rubin’s recommendation is that the independent Markov chains be initialized with diffuse starting values for the parameters and sampled until all values for Rhat are below 1.1. When any Rhat values are above 1.1, rstanarm will print a warning message like this:

Markov chains did not converge! Do not analyze results!

To illustrate how to check the Rhat values after fitting a model using rstanarm, we’ll fit two models and run them for different numbers of iterations.

bad_rhat <- stan_glm(mpg ~ ., data = mtcars, iter = 20, chains = 2, seed = 12345)
Warning: There were 2 chains where the estimated Bayesian Fraction of Missing Information was low. See https://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems
Warning: The largest R-hat is 2.83, indicating chains have not mixed. Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#r-hat
Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable. Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#bulk-ess
Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable. Running the chains for more iterations may help. See https://mc-stan.org/misc/warnings.html#tail-ess
Warning: Markov chains did not converge! Do not analyze results!
good_rhat <- update(bad_rhat, iter = 1000, chains = 2, seed = 12345)

Here the first model leads to the warning message about convergence but the second model does not. Indeed, we can see that many Rhat values are much bigger than 1 for the first model:

rhat <- summary(bad_rhat)[, "Rhat"]
rhat[rhat > 1.1]
  (Intercept)           cyl          drat            wt          qsec
     2.341370      3.218821      2.270997      1.334041      1.350006
           vs          gear          carb      mean_PPD log-posterior
     2.335472      1.542550      1.948308      1.715434      1.564986

Since we didn’t get a warning for the second model, we shouldn’t find any parameters with an Rhat far from 1:

any(summary(good_rhat)[, "Rhat"] > 1.1)
[1] FALSE

Details on the computation of Rhat and some of its limitations can be found in the Stan Modeling Language User’s Guide and Reference Manual.

Divergent transitions

Recommendation: increase the target acceptance rate adapt_delta.

Hamiltonian Monte Carlo (HMC), the MCMC algorithm used by Stan, works by simulating the evolution of a Hamiltonian system. Stan uses a symplectic integrator to approximate the exact solution of the Hamiltonian dynamics. When the step size parameter is too large relative to the curvature of the log posterior, this approximation can diverge and threaten the validity of the sampler. rstanarm will print a warning if there are any divergent transitions after the warmup period, in which case the posterior sample may be biased. The recommended method is to increase the adapt_delta parameter – the target average proposal acceptance probability in the adaptation – which will in turn reduce the step size. Each of the modeling functions accepts an adapt_delta argument, so to increase adapt_delta you can simply change the value from the default value to a value closer to \(1\). To reduce the frequency with which users need to manually set adapt_delta, the default value depends on the prior distribution used (see help("adapt_delta", package = "rstanarm") for details).
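
For example, refitting an earlier model with a higher target acceptance rate is a one-liner (a sketch; any value closer to 1 than the default will shrink the step size):

womensrole_bglm_adj <- update(womensrole_bglm_1, adapt_delta = 0.999)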

The downside to increasing the target acceptance rate – and, as a consequence, decreasing the step size – is that sampling will tend to be slower. Intuitively, this is because a smaller step size means that more steps are required to explore the posterior distribution. Since the validity of the estimates is not guaranteed if there are any post-warmup divergent transitions, the slower sampling is a minor cost.

Maximum treedepth exceeded

Recommendation: increase the maximum allowed treedepth max_treedepth unless all other convergence diagnostics are ok.

Configuring the No-U-Turn Sampler (the variant of HMC used by Stan) involves putting a cap on the depth of the trees that it evaluates during each iteration. This is controlled through a maximum depth parameter max_treedepth. When the maximum allowed tree depth is reached, it indicates that NUTS is terminating prematurely to avoid excessively long execution time. If rstanarm prints a warning about transitions exceeding the maximum treedepth, you should try increasing the max_treedepth parameter using the optional control argument. For example, to increase max_treedepth to 16 (the default used by rstanarm is 15) you can provide the argument control = list(max_treedepth = 16) to any of the rstanarm modeling functions. If you do not see a warning about hitting the maximum treedepth (which is rare), then you do not need to worry.
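
For example (a sketch with an arbitrary model, not a recommendation to change the default):

fit <- stan_glm(mpg ~ wt, data = mtcars,
                control = list(max_treedepth = 16),
                seed = 12345)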

With the models rstanarm is capable of fitting, when you get a warning about max treedepth you will typically also get warnings about other diagnostics. However, if you see a max treedepth warning but all other convergence diagnostics are fine, you can typically ignore the warning. In that case the warning likely indicates efficiency issues but not that the results are invalid to analyze.

Conclusion

In this vignette, we have gone through the four steps of a Bayesian analysis. The first step — specifying the posterior distribution — varies considerably from one analysis to the next because the likelihood function employed differs depending on the nature of the outcome variable, and our prior beliefs about the parameters in the model vary not only from situation to situation but from researcher to researcher. However, given a posterior distribution, and given that this posterior distribution can be drawn from using the rstanarm package, the remaining steps are conceptually similar across analyses. The key is to draw from the posterior predictive distribution of the outcome, which is what the model predicts the outcome to be after having updated our beliefs about the unknown parameters with the observed data. Posterior predictive distributions can be used for model checking and for making inferences about how manipulations of the predictors would affect the outcome.

Of course, all of this assumes that you have obtained draws from the posterior distribution faithfully. The functions in the rstanarm package will throw warnings if there is evidence that the draws are tainted, and we have discussed some steps to remedy these problems. For the most part, the model-fitting functions in the rstanarm package are unlikely to produce many such warnings, but they may appear in more complicated models.

If the posterior distribution that you specify in the first step cannot be sampled from using the rstanarm package, then it is often possible to create a hand-written program in the Stan language so that the posterior distribution can be drawn from using the rstan package. See the documentation for the rstan package or https://mc-stan.org for more details about this more advanced usage of Stan. However, many relatively simple models can be fit using the rstanarm package without writing any code in the Stan language, which is illustrated for each estimating function in the rstanarm package in the other vignettes.

References

Betancourt, M. J., & Girolami, M. (2013). Hamiltonian Monte Carlo for hierarchical models. arXiv preprint.

Stan Development Team. (2015). Stan Modeling Language User’s Guide and Reference Manual, Version 2.9.0. https://mc-stan.org/docs/. See the ‘Hamiltonian Monte Carlo Sampling’ chapter.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

Gelman, A., & Shirley, K. (2011). Inference from simulations and monitoring convergence. In S. Brooks, A. Gelman, G. Jones, & X. Meng (Eds.), Handbook of Markov Chain Monte Carlo. Boca Raton: Chapman & Hall/CRC.

