library(ggplot2)
library(bayesplot)
theme_set(bayesplot::theme_default())

This vignette illustrates the effects on posterior inference of pooling data (a.k.a. sharing strength) across units for repeated binary trial data. It provides R code to fit and check predictive models for three situations: (a) complete pooling, which assumes each unit is the same, (b) no pooling, which assumes the units are unrelated, and (c) partial pooling, where the similarity among the units is estimated. The note explains with working examples how to (i) fit the models using rstanarm and plot the results, (ii) estimate event probabilities, (iii) evaluate posterior predictive densities to evaluate model predictions on held-out data, (iv) rank units by chance of success, (v) perform multiple comparisons in several settings, (vi) replicate new data for posterior \(p\)-values, and (vii) perform graphical posterior predictive checks.
The content of the vignette is based on Bob Carpenter's Stan tutorial Hierarchical Partial Pooling for Repeated Binary Trials, but here we show how to fit the models and carry out predictions and model checking and comparison using rstanarm. Most of the text is taken from the original, with some additions and subtractions to make the content more useful for rstanarm users. The Stan code from the original tutorial has also been entirely removed, as rstanarm will fit all of the models in Stan without the user having to write the underlying Stan programs. The Stan code in the original document is a good reference for anyone interested in how these models are estimated "under the hood", though the parameterizations used internally by rstanarm differ somewhat from those in the original.
Suppose that for each of \(N\) units \(n \in 1{:}N\), we observe \(y_n\) successes out of \(K_n\) trials. For example, the data may consist of
rat tumor development, with \(y_n\) rats developing tumors of \(K_n\) total rats in experimental control group \(n \in 1{:}N\) (Tarone 1982)
surgical mortality, with \(y_n\) surgical patients dying in \(K_n\) surgeries for hospitals \(n \in 1{:}N\) (Spiegelhalter et al. 1996)
baseball batting ability, with \(y_n\) hits in \(K_n\) at bats for baseball players \(n \in 1{:}N\) (Efron and Morris 1975; Carpenter 2009)
machine learning system accuracy, with \(y_n\) correct classifications out of \(K_n\) examples for systems \(n \in 1{:}N\) (ML conference proceedings; Kaggle competitions)
In this vignette we use the small baseball data set of Efron and Morris (1975), but we also provide the rat control data of Tarone (1982), the surgical mortality data of Spiegelhalter et al. (1996), and the extended baseball data set of Carpenter (2009).
As a running example, we will use the data from Table 1 of Efron and Morris (1975), which is included in rstanarm under the name bball1970 (it was downloaded 24 Dec 2015 from here). It is drawn from the 1970 Major League Baseball season (from both leagues).
library(rstanarm)
data(bball1970)
bball <- bball1970
print(bball)

       Player AB Hits RemainingAB RemainingHits
1    Clemente 45   18         367           127
2    Robinson 45   17         426           127
3      Howard 45   16         521           144
4   Johnstone 45   15         275            61
5       Berry 45   14         418           114
6     Spencer 45   14         466           126
7   Kessinger 45   13         586           155
8    Alvarado 45   12         138            29
9       Santo 45   11         510           137
10    Swaboda 45   11         200            46
11 Petrocelli 45   10         538           142
12  Rodriguez 45   10         186            42
13      Scott 45   10         435           132
14      Unser 45   10         277            73
15   Williams 45   10         591           195
16 Campaneris 45    9         558           159
17     Munson 45    8         408           129
18      Alvis 45    7          70            14

# A few quantities we'll use throughout
N <- nrow(bball)
K <- bball$AB
y <- bball$Hits
K_new <- bball$RemainingAB
y_new <- bball$RemainingHits

The data separates the outcome from the initial 45 at-bats from the rest of the season. After running this code, N is the number of units (players). Then for each unit n, K[n] is the number of initial trials (at-bats), y[n] is the number of initial successes (hits), K_new[n] is the remaining number of trials (remaining at-bats), and y_new[n] is the number of successes in the remaining trials (remaining hits).
The remaining data can be used to evaluate the predictive performance of our models conditioned on the observed data. That is, we will "train" on the first 45 at-bats and see how well our various models do at predicting the rest of the season.
With complete pooling, each unit is assumed to have the same chance of success. With no pooling, each unit is assumed to have a completely unrelated chance of success. With partial pooling, each unit is assumed to have a different chance of success, but the data for all of the observed units informs the estimates for each unit.
Partial pooling is typically accomplished through hierarchical models. Hierarchical models directly model the population of units. From a population model perspective, no pooling corresponds to infinite population variance, whereas complete pooling corresponds to zero population variance.
In the following sections, all three types of pooling models will be fit for the baseball data.
First we'll create some useful objects to use throughout the rest of this vignette. One of them is a function, batting_avg, which just formats a number to include three decimal places to the right of zero when printing, as is customary for batting averages.
batting_avg <- function(x) print(format(round(x, digits = 3), nsmall = 3), quote = FALSE)
player_avgs <- y / K  # player avgs through 45 AB
tot_avg <- sum(y) / sum(K)  # overall avg through 45 AB
cat("Player averages through 45 at-bats:\n")
batting_avg(player_avgs)
cat("Overall average through 45 at-bats:\n")
batting_avg(tot_avg)

Player averages through 45 at-bats:
 [1] 0.400 0.378 0.356 0.333 0.311 0.311 0.289 0.267 0.244 0.244 0.222 0.222
[13] 0.222 0.222 0.222 0.200 0.178 0.156
Overall average through 45 at-bats:
[1] 0.265

The complete pooling model assumes a single parameter \(\theta\) representing the chance of success for all units (in this case players).
Assuming each player's at-bats are independent Bernoulli trials, the probability distribution for each player's number of hits \(y_n\) is modeled as
\[p(y_n \, | \, \theta) \ = \ \mathsf{Binomial}(y_n \, | \, K_n, \theta).\]
When viewed as a function of \(\theta\) for fixed \(y_n\), this is called the likelihood function.
Assuming each player is independent leads to the complete data likelihood
\[p(y \, | \, \theta) = \prod_{n=1}^N \mathsf{Binomial}(y_n \, | \, K_n,\theta).\]
Using family = binomial("logit"), the stan_glm function in rstanarm will parameterize the model in terms of the log-odds \(\alpha\), which are defined by the logit transform as
\[\alpha= \mathrm{logit}(\theta)= \log \, \frac{\theta}{1 - \theta}.\]
For example, \(\theta = 0.25\) corresponds to odds of \(0.25\) to \(0.75\) (equivalently, \(1\) to \(3\)), or log-odds of \(\log(0.25 / 0.75) \approx -1.1\).
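As a quick numeric check (a sketch, not part of the original code), the built-in R functions qlogis and plogis implement the logit and inverse-logit transforms:

qlogis(0.25)           # log(0.25 / 0.75), approximately -1.1
plogis(qlogis(0.25))   # the inverse logit recovers 0.25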
The model is therefore
\[p(y_n \, | \, K_n, \alpha)\ = \ \mathsf{Binomial}(y_n \, | \, K_n, \ \mathrm{logit}^{-1}(\alpha))\]
The inverse logit function is the logistic sigmoid, from which logistic regression gets its name; the inverse logit function is also the standard logistic cumulative distribution function (CDF),
\[\mathrm{logit}^{-1}(\alpha) = \frac{1}{1 + \exp(-\alpha)} = \theta.\]
By construction, for any \(\alpha \in (-\infty, \infty)\), \(\mathrm{logit}^{-1}(\alpha) \in (0, 1)\); the sigmoid converts arbitrary log-odds back to the probability scale.
We will use a normal distribution with mean \(-1\) and standard deviation \(1\) as the prior on the log-odds \(\alpha\). This is a weakly informative prior that places about 95% of the prior probability in the interval \((-3, 1)\), which inverse-logit transforms to the interval \((0.05, 0.73)\). The prior median \(-1\) corresponds to a \(0.27\) chance of success. In fact, an even narrower prior is actually motivated here from substantial baseball knowledge.
The figure below shows both this prior on \(\alpha\) as well as the prior it implies on the probability \(\theta\).
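A minimal sketch of how such a figure could be produced (this is not the original plotting code; the variable names are ours) is to simulate from the prior on the log-odds and push the draws through the inverse logit:

alpha_prior_draws <- rnorm(1e5, mean = -1, sd = 1)   # Normal(-1, 1) prior on log-odds
theta_prior_draws <- plogis(alpha_prior_draws)       # implied prior on the probability scale
par(mfrow = c(1, 2))
hist(alpha_prior_draws, breaks = 50, main = "Prior on log-odds", xlab = "alpha")
hist(theta_prior_draws, breaks = 50, main = "Implied prior on theta", xlab = "theta")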
To fit the model we call stan_glm with the formula cbind(Hits, AB - Hits) ~ 1. The left-hand side of the formula specifies the binomial outcome by providing the number of successes (hits) and failures (at-bats minus hits) for each player, and the right-hand side indicates that we want an intercept-only model.
SEED <- 101
wi_prior <- normal(-1, 1)  # weakly informative prior on log-odds
fit_pool <- stan_glm(cbind(Hits, AB - Hits) ~ 1, data = bball,
                     family = binomial("logit"),
                     prior_intercept = wi_prior, seed = SEED)

The summary function will compute all sorts of summary statistics from the fitted model, but here we'll create a small function that will compute just a few posterior summary statistics that we'll want for each of the models we estimate. The summary_stats function, defined below, will take a matrix of posterior draws as its input, apply an inverse-logit transformation (to convert from log-odds to probabilities), and then compute the median and 80% interval.
invlogit <- plogis  # function(x) 1/(1 + exp(-x))
summary_stats <- function(posterior) {
  x <- invlogit(posterior)  # log-odds -> probabilities
  t(apply(x, 2, quantile, probs = c(0.1, 0.5, 0.9)))
}
pool <- summary_stats(as.matrix(fit_pool))  # as.matrix extracts the posterior draws
pool <- matrix(pool,  # replicate to give each player the same estimates
               nrow(bball), ncol(pool), byrow = TRUE,
               dimnames = list(bball$Player, c("10%", "50%", "90%")))
batting_avg(pool)

           10%   50%   90%
Clemente   0.245 0.265 0.285
Robinson   0.245 0.265 0.285
Howard     0.245 0.265 0.285
Johnstone  0.245 0.265 0.285
Berry      0.245 0.265 0.285
Spencer    0.245 0.265 0.285
Kessinger  0.245 0.265 0.285
Alvarado   0.245 0.265 0.285
Santo      0.245 0.265 0.285
Swaboda    0.245 0.265 0.285
Petrocelli 0.245 0.265 0.285
Rodriguez  0.245 0.265 0.285
Scott      0.245 0.265 0.285
Unser      0.245 0.265 0.285
Williams   0.245 0.265 0.285
Campaneris 0.245 0.265 0.285
Munson     0.245 0.265 0.285
Alvis      0.245 0.265 0.285

With more data, such as from more players or from the rest of the season, the posterior approaches a delta function around the maximum likelihood estimate and the central posterior intervals will shrink. Nevertheless, even if we know a player's chance of success exactly, there is a large amount of uncertainty in running \(K\) binary trials with that chance of success; using a binomial model fundamentally bounds our prediction accuracy.
Although this model will be a good baseline for comparison, we have good reason to believe from a large amount of prior data (players with as many as 10,000 trials) that it is very unlikely that all baseball players have the same chance of success.
A model with no pooling involves a separate chance-of-success parameter \(\theta_n \in [0, 1]\) for each player \(n\), where the \(\theta_n\) are assumed to be independent.
rstanarm will again parameterize the model in terms of the log-odds, \(\alpha_n = \mathrm{logit}(\theta_n)\), so the likelihood then uses the log-odds of success \(\alpha_n\) for unit \(n\) in modeling the number of successes \(y_n\) as
\[p(y_n \, | \, \alpha_n) =\mathsf{Binomial}(y_n \, | \, K_n, \mathrm{logit}^{-1}(\alpha_n)).\]
Assuming the \(y_n\) are independent (conditional on \(\theta\)), this leads to the total data likelihood
\[p(y \, | \, \alpha) = \prod_{n=1}^N \mathsf{Binomial}(y_n \, | \, K_n,\mathrm{logit}^{-1}(\alpha_n)).\]
To fit the model we need only tweak the model formula used for the complete pooling model to drop the intercept and instead include as the only predictor the factor variable Player. This is equivalent to estimating a separate intercept on the log-odds scale for each player. We'll also use the prior (rather than prior_intercept) argument since Player is considered a predictor rather than an intercept from R's perspective. Using the same weakly informative prior now means that each \(\alpha_n\) gets a \(\mathsf{Normal}(-1, 1)\) prior, independent of the others.
fit_nopool <- update(fit_pool, formula = . ~ 0 + Player, prior = wi_prior)
nopool <- summary_stats(as.matrix(fit_nopool))
rownames(nopool) <- as.character(bball$Player)
batting_avg(nopool)

           10%   50%   90%
Clemente   0.300 0.386 0.473
Robinson   0.279 0.366 0.458
Howard     0.263 0.344 0.435
Johnstone  0.244 0.326 0.414
Berry      0.227 0.305 0.393
Spencer    0.226 0.306 0.389
Kessinger  0.209 0.284 0.370
Alvarado   0.190 0.266 0.352
Santo      0.172 0.244 0.330
Swaboda    0.172 0.243 0.328
Petrocelli 0.154 0.223 0.305
Rodriguez  0.157 0.226 0.305
Scott      0.156 0.224 0.306
Unser      0.156 0.225 0.303
Williams   0.156 0.225 0.305
Campaneris 0.138 0.204 0.282
Munson     0.124 0.185 0.258
Alvis      0.108 0.166 0.241

Each 80% interval is much wider than the estimated interval for the population in the complete pooling model; this is to be expected: there are only 45 data units for each parameter here as opposed to 810 in the complete pooling case. If the units each had different numbers of trials, the intervals would also vary based on size.
As the estimated chance of success goes up toward 0.5, the 80% intervals get wider. This is to be expected for chance-of-success parameters, because the variance is maximized when \(\theta = 0.5\).
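To see this numerically (a small sketch, not from the original text), the standard error of a binomial proportion based on 45 trials, \(\sqrt{\theta (1 - \theta) / 45}\), grows as \(\theta\) approaches 0.5:

theta <- c(0.2, 0.3, 0.4, 0.5)
round(sqrt(theta * (1 - theta) / 45), 3)   # 0.060 0.068 0.073 0.075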
Based on our existing knowledge of baseball, the no-pooling model is almost certainly overestimating the high abilities and underestimating the lower abilities (Ted Williams, 30 years prior to the year this data was collected, was the last player with a 40% observed success rate over a season, whereas 20% or less is too low for all but a few rare defensive specialists).
Complete pooling provides estimated abilities that are too narrowly distributed for the units and removes any chance of modeling population variation. Estimating each chance of success separately without any pooling provides estimated abilities that are too broadly distributed for the units and hence too variable. Clearly some amount of pooling between these two extremes is called for. But how much?
A hierarchical model treats the players as belonging to a population of players. The properties of this population will be estimated along with player abilities, implicitly controlling the amount of pooling that is applied. The more variable the (estimate of the) population, the less pooling is applied. Mathematically, the hierarchical model places a prior on the abilities with parameters that are themselves estimated.
This model can be estimated using the stan_glmer function.

fit_partialpool <- stan_glmer(cbind(Hits, AB - Hits) ~ (1 | Player), data = bball,
                              family = binomial("logit"),
                              prior_intercept = wi_prior, seed = SEED)

Because stan_glmer (like glmer) estimates the varying intercepts for Player by estimating a single global intercept \(\alpha_0\) and individual deviations from that intercept for each player \(\delta_n = \alpha_n - \alpha_0\), to get the posterior distribution for each \(\alpha_n\) we need to shift each of the posterior draws by the corresponding draw for the intercept. We can do this easily using the sweep function.
# shift each player's estimate by intercept (and then drop intercept)
shift_draws <- function(draws) {
  sweep(draws[, -1], MARGIN = 1, STATS = draws[, 1], FUN = "+")
}
alphas <- shift_draws(as.matrix(fit_partialpool))
partialpool <- summary_stats(alphas)
partialpool <- partialpool[-nrow(partialpool), ]
rownames(partialpool) <- as.character(bball$Player)
batting_avg(partialpool)

           10%   50%   90%
Clemente   0.249 0.283 0.349
Robinson   0.245 0.281 0.341
Howard     0.243 0.277 0.331
Johnstone  0.240 0.274 0.323
Berry      0.238 0.271 0.317
Spencer    0.235 0.271 0.316
Kessinger  0.233 0.268 0.310
Alvarado   0.229 0.265 0.303
Santo      0.222 0.262 0.299
Swaboda    0.223 0.262 0.298
Petrocelli 0.216 0.258 0.294
Rodriguez  0.218 0.259 0.293
Scott      0.217 0.259 0.294
Unser      0.217 0.258 0.293
Williams   0.215 0.258 0.296
Campaneris 0.211 0.255 0.290
Munson     0.205 0.253 0.287
Alvis      0.198 0.250 0.285

Here the estimates are less extreme than in the no-pooling case, which we should expect due to the partial pooling. It is also clear from the wide posteriors for the \(\theta_n\) that there is considerable uncertainty in the estimates of chance of success on a unit-by-unit (player-by-player) basis.
Figure 5.4 of Gelman et al. (2013) plots the observed number of successes \(y_n\) for the first \(K_n\) trials versus the median and 80% intervals for the estimated chance-of-success parameters \(\theta_n\) in the posterior. The following R code reproduces a similar plot for our data.
library(ggplot2)
models <- c("complete pooling", "no pooling", "partial pooling")
estimates <- rbind(pool, nopool, partialpool)
colnames(estimates) <- c("lb", "median", "ub")
plotdata <- data.frame(estimates,
                       observed = rep(player_avgs, times = length(models)),
                       model = rep(models, each = N),
                       row.names = NULL)
ggplot(plotdata, aes(x = observed, y = median, ymin = lb, ymax = ub)) +
  geom_hline(yintercept = tot_avg, color = "lightpink", size = 0.75) +
  geom_abline(intercept = 0, slope = 1, color = "skyblue") +
  geom_linerange(color = "gray60", size = 0.75) +
  geom_point(size = 2.5, shape = 21, fill = "gray30", color = "white", stroke = 0.2) +
  facet_grid(. ~ model) +
  coord_fixed() +
  scale_x_continuous(breaks = c(0.2, 0.3, 0.4)) +
  labs(x = "Observed Hits / AB", y = "Predicted chance of hit") +
  ggtitle("Posterior Medians and 80% Intervals")

The horizontal axis is the observed rate of success, broken out by player (the overplotting is from players with the same number of successes; they all had the same number of trials in this data). The dots are the posterior medians with bars extending to cover the central 80% posterior interval. Players with the same observed rates are indistinguishable; any differences in estimates are due to MCMC error.
The horizontal red line has an intercept equal to the overall success rate. The overall success rate is also the posterior mode (i.e., maximum likelihood estimate) for the complete pooling model. The diagonal blue line has intercept 0 and slope 1. Estimates falling on this line make up the maximum likelihood estimates for the no-pooling model. Overall, the plot makes the amount of pooling toward the prior evident.
After we have fit a model using some "training" data, we are usually interested in the predictions of the fitted model for new data, which we can use to
make predictions for new data points; e.g., predict how many hits Roberto Clemente will get in the rest of the season,
evaluate predictions against observed future data; e.g., assess how well we predicted the number of hits Roberto Clemente actually got in the rest of the season, and
generate new simulated data to validate our model fits.
With full Bayesian inference, we do not make a point estimate of the parameters and use it for prediction; instead we use an average of predictions weighted by the posterior.
Given data \(y\) and a model with parameters \(\theta\), the posterior predictive distribution for new data \(\tilde{y}\) is defined by
\[p(\tilde{y} \, | \, y) \ = \ \int_{\Theta} p(\tilde{y} \, | \, \theta) \ p(\theta \, | \, y) \ \mathrm{d}\theta,\]
where \(\Theta\) is the support of the parameters \(\theta\). What an integral of this form says is that \(p(\tilde{y} \, | \, y)\) is defined as a weighted average over the legal parameter values \(\theta \in \Theta\) of the likelihood function \(p(\tilde{y} \, | \, \theta)\), with weights given by the posterior, \(p(\theta \, | \, y)\). While we do not want to get sidetracked with the notational and mathematical subtleties of expectations here, the posterior predictive density reduces to the expectation of \(p(\tilde{y} \, | \, \theta)\) conditioned on \(y\).
Because the posterior predictive density is formulated as an expectation over the posterior, it is possible to compute via MCMC. With \(M\) draws \(\theta^{(m)}\) from the posterior \(p(\theta \, | \, y)\), the posterior predictive log density for new data \(y^{\mathrm{new}}\) is given by the MCMC approximation
\[\log \frac{1}{M} \, \sum_{m=1}^M \ p\left( y^{\mathrm{new}} \, | \,\theta^{(m)} \right).\]
In practice, this requires care to prevent underflow in floating point calculations; a robust calculation on the log scale is provided below.
It is also straightforward to use forward simulation from the probability distribution of the data \(p(y \, | \, \theta)\) to generate replicated data \(y^{\mathrm{rep}}\) according to the posterior predictive distribution. (Recall that \(p(y \, | \, \theta)\) is called the probability distribution when \(\theta\) is fixed and the likelihood when \(y\) is fixed.)
With \(M\) draws \(\theta^{(m)}\) from the posterior \(p(\theta \, | \, y)\), replicated data can be simulated as a sequence of \(M\) simulations \(y^{\mathrm{rep} \ (m)}\), each drawn according to the distribution \(p(y \, | \, \theta^{(m)})\). This latter random variate generation can usually be done efficiently (both computationally and statistically) by means of forward simulation from the probability distribution of the data; we provide an example below.
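For the binomial models in this vignette, forward simulation amounts to drawing binomial outcomes for each posterior draw of the chance of success. Here is a minimal sketch (the helper name simulate_yrep and the argument theta_draws are ours; theta_draws is assumed to be an \(M \times N\) matrix of posterior draws of \(\theta\), such as the thetas_partialpool matrix constructed later in this vignette). This is essentially what rstanarm's posterior_predict function does for us.

# theta_draws: M x N matrix of posterior draws of the chance of success (assumed)
# K: vector of N trial counts
simulate_yrep <- function(theta_draws, K) {
  M <- nrow(theta_draws)
  N <- ncol(theta_draws)
  yrep <- matrix(NA_integer_, nrow = M, ncol = N)
  for (m in 1:M) {
    yrep[m, ] <- rbinom(N, size = K, prob = theta_draws[m, ])  # one replicated data set per draw
  }
  yrep
}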
Efron and Morris's (1975) baseball data includes not only the observed hit rate in the initial 45 at-bats, but also the data for how the player did for the rest of the season. The question arises as to how well these models predict a player's performance for the rest of the season based on their initial 45 at-bats.
A well calibrated statistical model is one in which the uncertainty in the predictions matches the uncertainty in further data. That is, if we estimate posterior 50% intervals for predictions on new data (here, the number of hits in the rest of the season for each player), roughly 50% of the new data should fall in its predicted 50% interval. If the model is true in the sense of correctly describing the generative process of the data, then Bayesian inference is guaranteed to be well calibrated. Given that our models are rarely correct in this deep sense, in practice we are concerned with testing their calibration on quantities of interest.
Given two well calibrated models, the one that makes the more precise predictions in the sense of having narrower intervals is better predictively (Gneiting et al. 2007). To see this in an example, we would rather have a well-calibrated prediction that there's a 90% chance the number of hits for a player in the rest of the season will fall in \((120, 130)\) than a 90% prediction that the number of hits will fall in \((100, 150)\).
For the models introduced here, a posterior that is a delta function provides the sharpest predictions. Even so, there is residual uncertainty due to the repeated trials; with \(K^{\mathrm{new}}_n\) further trials and a fixed chance of success \(\theta_n\), the random variable \(Y^{\mathrm{new}}_n\) denoting the number of further successes for unit \(n\) has a standard deviation from the repeated binary trials of
\[\mathrm{sd}[Y^{\mathrm{new}}_n] \ = \ \sqrt{K^{\mathrm{new}}_n \, \theta_n \, (1 - \theta_n)}.\]
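For instance (a small sketch, not from the original text, using Clemente's 367 remaining at-bats and, purely for illustration, a fixed chance of success of 0.346):

sqrt(K_new[1] * 0.346 * (1 - 0.346))   # roughly 9.1 hits of standard deviation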
The predictive posterior density directly measures the probability of seeing the new data. The higher the probability assigned to the new data, the better job the model has done at predicting the outcome. In the limit, an ideal model would perfectly predict the new outcome with no uncertainty (probability of 1 for a discrete outcome or a delta function at the true value for the density of a continuous outcome). This notion is related to the notion of sharpness discussed in the previous section, because if the new observations have higher predictive densities, they're probably within narrower posterior intervals (Gneiting et al. 2007).
The log of the posterior predictive density is defined in the obvious way as
\[\log p(\tilde{y} \, | \, y)= \log \int_{\Theta} p(\tilde{y} \, | \, \theta) \ p(\theta \, | \, y) \ \mathrm{d}\theta.\]
This is not a posterior expectation, but rather the log of a posterior expectation. In particular, it should not be confused with the posterior expectation of the log predictive density, which is given by
\[\int_{\Theta} \left( \log p(\tilde{y} \, | \, \theta) \right) \ p(\theta \, | \, y) \ \mathrm{d}\theta.\]
Although this is easy to compute in Stan in a stable fashion, it does not produce the same answer (as we show below).
Because \(-\log(u)\) is convex, a little wrangling with Jensen's inequality shows that the expectation of the log is less than or equal to the log of the expectation,
\[\int_{\Theta} \left( \, \log p(\tilde{y} \, | \, \theta) \, \right) \ p(\theta \, | \, y) \ \mathrm{d}\theta \ \leq \ \log \int_{\Theta} p(\tilde{y} \, | \, \theta) \ p(\theta \, | \, y) \ \mathrm{d}\theta.\]
We'll compute both expectations and demonstrate Jensen's inequality in our running example.
The variables K_new[n] and y_new[n] hold the number of at-bats (trials) and the number of hits (successes) for player (unit) n. With the held-out data we can compute the log density of each data point using the log_lik function, which, like posterior_predict, accepts a newdata argument. The log_lik function will return an \(M \times N\) matrix, where \(M\) is the size of the posterior sample (the number of draws we obtained from the posterior distribution) and \(N\) is the number of data points in newdata. We can then take the row sums of this matrix to sum over the data points.
newdata <- data.frame(Hits = y_new, AB = K_new, Player = bball$Player)
fits <- list(Pooling = fit_pool, NoPooling = fit_nopool, PartialPooling = fit_partialpool)

# compute log_p_new matrix with each of the models in 'fits'
log_p_new_mats <- lapply(fits, log_lik, newdata = newdata)

# for each matrix in the list take the row sums
log_p_new <- sapply(log_p_new_mats, rowSums)
M <- nrow(log_p_new)
head(log_p_new)

       Pooling NoPooling PartialPooling
[1,] -87.50510 -270.1454      -98.24389
[2,] -73.97575 -310.3988     -116.16486
[3,] -80.95743 -250.4451      -93.64268
[4,] -77.54662 -281.8998     -101.46736
[5,] -74.39093 -172.3741     -115.82690
[6,] -88.18157 -171.3092      -74.83273

We now have the distributions of log_p_new in a matrix with a column for each model.
For each model, the posterior mean for log_p_new will give us
\[\int_{\Theta} \left( \log p(\tilde{y} \, | \, \theta) \right) \ p(\theta \, | \, y) \ \mathrm{d}\theta \ \approx \ \frac{1}{M} \, \sum_{m=1}^M \log p(y^{\mathrm{new}} \, | \, \theta^{(m)}).\]
To compute this for each of the models we only need to take the mean of the corresponding column of log_p_new.

mean_log_p_new <- colMeans(log_p_new)
round(sort(mean_log_p_new, decreasing = TRUE), digits = 1)

       Pooling PartialPooling      NoPooling 
         -81.8          -99.6         -207.8 

From a predictive standpoint, the models are ranked by the amount of pooling they do, with complete pooling being the best, and no pooling being the worst predictively. All of these models do predictions by averaging over their posteriors, with the amount of posterior uncertainty also being ranked in reverse order of the amount of pooling they do.
As we will now see, the ranking of the models can change when we instead compute the log of the posterior predictive density.
The straight path to calculate this would be to define a generated quantity \(p(y^{\mathrm{new}} \, | \, y)\), look at the posterior mean computed by Stan, and take its log. That is,
\[\log p(y^{\mathrm{new}} \, | \, y) \ \approx \ \log \frac{1}{M} \, \sum_{m=1}^M p(y^{\mathrm{new}} \, | \, \theta^{(m)}).\]
Unfortunately, this won't work in most cases because when we try to compute \(p(y^{\mathrm{new}} \, | \, \theta^{(m)})\) directly, it is prone to underflow. For example, 2000 outcomes \(y^{\mathrm{new}}_n\), each with likelihood 0.5 for \(\theta^{(m)}\), will underflow, because \(0.5^{2000}\) is smaller than the smallest positive number that a computer can represent using standard double-precision floating point (used by Stan, R, etc.).
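A two-line illustration of the problem (a sketch, not from the original code):

0.5^2000          # underflows to 0 in double precision
2000 * log(0.5)   # about -1386, easily representable on the log scale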
In contrast, if we work on the log scale, \(\log p(y^{\mathrm{new}} \, | \, y)\) will not underflow. It's a sum of a bunch of terms of order 1. But we already saw we can't just average the log to get the log of the average.
To avoid underflow, we're going to use the log-sum-of-exponentials trick, which begins by noting the obvious,
\[\log \frac{1}{M} \, \sum_{m=1}^M \ p(y^{\mathrm{new}} \, | \, \theta^{(m)}) \ = \ \log \frac{1}{M} \, \sum_{m=1}^M \ \exp \left( \log p(y^{\mathrm{new}} \, | \, \theta^{(m)}) \right).\]
We’ll then write that last expression as
\[-\log M+ \mathrm{log\_sum\_exp \, } \ \log p(y^{\mathrm{new}} \, | \, \theta^{(m)})\]
We can compute \(\mathrm{log\_sum\_exp}\) stably by subtracting the max value. Suppose \(u = u_1, \ldots, u_M\), and \(\max(u)\) is the largest \(u_m\). We can calculate
\[\mathrm{log\_sum\_exp \,} \ u \ = \ \log \sum_{m=1}^M \exp(u_m) \ = \ \max(u) + \log \sum_{m=1}^M \exp(u_m - \max(u)).\]
Because \(u_m - \max(u) \leq 0\), the exponentiations cannot overflow. They may underflow to zero, but this will not lose precision because of the leading \(\max(u)\) term; the only way underflow can arise is if \(u_m - \max(u)\) is very small, meaning that it would not have added significant digits to \(\max(u)\) had it not underflowed.
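A quick numeric check of this trick (a sketch; the values are arbitrary):

u <- c(-1000, -1001, -1002)
log(sum(exp(u)))                      # -Inf: each exp(u) underflows to 0
max(u) + log(sum(exp(u - max(u))))    # about -999.59: the stable computation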
We can implement \(\mathrm{log\_sum\_exp}\) in R as follows:

log_sum_exp <- function(u) {
  max_u <- max(u)
  a <- 0
  for (n in 1:length(u)) {
    a <- a + exp(u[n] - max_u)
  }
  max_u + log(a)
}

# Or equivalently using vectorization
log_sum_exp <- function(u) {
  max_u <- max(u)
  max_u + log(sum(exp(u - max_u)))
}

and then include the \(-\log M\) term to make it log_mean_exp:

log_mean_exp <- function(u) {
  M <- length(u)
  -log(M) + log_sum_exp(u)
}

We can then use it to compute the log posterior predictive densities for each of the models:

new_lps <- lapply(log_p_new_mats, function(x) apply(x, 2, log_mean_exp))

# sum over the data points
new_lps_sums <- sapply(new_lps, sum)
round(sort(new_lps_sums, decreasing = TRUE), digits = 1)

PartialPooling        Pooling      NoPooling 
         -71.8          -73.1          -81.5 

Now the ranking is different! As expected, the values here are greater than the expectation of the log density due to Jensen's inequality. The partial pooling model appears to be making slightly better predictions than the full pooling model, which in turn is making slightly better predictions than the no pooling model.
Vehtari, Gelman, and Gabry (2016) show that the expected log predictive density can be approximated using the loo function for each model and then compared across models:
loo_compare(loo(fit_partialpool), loo(fit_pool), loo(fit_nopool))

                elpd_diff se_diff
fit_pool         0.0       0.0   
fit_partialpool -0.1       0.5   
fit_nopool      -6.0       2.6   

The leave-one-out (loo) approximation to the expected log predictive density is only asymptotically valid and, with only 18 observations in this case, substantially underestimates the expected log predictive densities found in the previous subsection. Nevertheless, the relative ranking of the models is essentially the same, with the pooled and partially pooled models being virtually indistinguishable but much better than the no pooling model.
With rstanarm it is straightforward to generate draws from the posterior predictive distribution using the posterior_predict function. With this capability, we can either generate predictions for new data or we can apply it to the predictors we already have.
There will be two sources of uncertainty in our predictions, the first being the uncertainty about \(\theta\) in the posterior \(p(\theta \, | \, y)\) and the second being the uncertainty due to the likelihood \(p(\tilde{y} \, | \, \theta)\).
We let \(z_n\) be the number of successes for unit \(n\) in \(K^{\mathrm{new}}_n\) further trials. It might seem tempting to eliminate that second source of uncertainty and set \(z_n^{(m)}\) to its expectation, \(\theta_n^{(m)} \, K^{\mathrm{new}}_n\), at each draw \(m\) from the posterior rather than simulating a new value. Or it might seem tempting to remove the first source of uncertainty and use the posterior mean (or median or mode or …) rather than draws from the posterior. Either way, the resulting values would suffice for estimating the posterior mean, but would not capture the uncertainty in the prediction for \(y^{\mathrm{new}}_n\) and would thus not be useful in estimating predictive standard deviations or quantiles or as the basis for decision making under uncertainty. In other words, the predictions would not be properly calibrated (in the sense discussed below).
To predict \(z\) for each player we can use the following code:

newdata <- data.frame(Hits = y_new, AB = K_new, Player = bball$Player)
ppd_pool <- posterior_predict(fit_pool, newdata)
ppd_nopool <- posterior_predict(fit_nopool, newdata)
ppd_partialpool <- posterior_predict(fit_partialpool, newdata)
colnames(ppd_pool) <- colnames(ppd_nopool) <- colnames(ppd_partialpool) <- as.character(bball$Player)
colMeans(ppd_partialpool)

  Clemente   Robinson     Howard  Johnstone      Berry    Spencer  Kessinger 
 107.23925  122.72675  147.32300   76.56375  114.86050  127.86900  158.09975 
  Alvarado      Santo    Swaboda Petrocelli  Rodriguez      Scott      Unser 
  36.63425  133.22975   52.15450  137.93575   47.70450  111.53375   71.25625 
  Williams Campaneris     Munson      Alvis 
 151.86475  141.23425  101.53275   17.14850 

Translating the posterior number of hits into a season batting average, \(\frac{y_n + z_n}{K_n + K^{\mathrm{new}}_n}\), we get an 80% posterior interval of

z_1 <- ppd_partialpool[, 1]
clemente_80pct <- (y[1] + quantile(z_1, prob = c(0.1, 0.9))) / (K[1] + K_new[1])
batting_avg(clemente_80pct)

  10%   90% 
0.255 0.362 

for Roberto Clemente from the partial pooling model. Part of our uncertainty here is due to our uncertainty in Clemente's underlying chance of success, and part of our uncertainty is due to there being 367 remaining trials (at-bats) modeled as binomial. In the remaining at-bats for the season, Clemente's success rate (batting average) was \(127 / 367 = 0.346\).
For each model, the following plot shows each player's posterior predictive 50% interval for predicted batting average (success rate) in his remaining at-bats (trials); the observed success rate in the remainder of the season is shown as a blue dot.

ppd_intervals <- function(x) t(apply(x, 2, quantile, probs = c(0.25, 0.75)))
ppd_summaries <- (1 / K_new) * rbind(ppd_intervals(ppd_pool),
                                     ppd_intervals(ppd_nopool),
                                     ppd_intervals(ppd_partialpool))
df_ppd <- data.frame(player = rep(1:length(y_new), 3),
                     y = rep(y_new / K_new, 3),
                     lb = ppd_summaries[, "25%"],
                     ub = ppd_summaries[, "75%"],
                     model = rep(models, each = length(y_new)))
ggplot(df_ppd, aes(x = player, y = y, ymin = lb, ymax = ub)) +
  geom_linerange(color = "gray60", size = 2) +
  geom_point(size = 2.5, color = "skyblue4") +
  facet_grid(. ~ model) +
  labs(x = NULL, y = "batting average") +
  scale_x_continuous(breaks = NULL) +
  ggtitle(expression(atop("Posterior Predictions for Batting Average in Remainder of Season",
                          atop("50% posterior predictive intervals (gray bars); observed (blue dots)", ""))))

We choose to plot 50% posterior intervals as they are a good single point for checking calibration. Rather than plotting the number of hits on the vertical axis, we have standardized all the predictions and outcomes to a success rate. Because each unit (player) has a different number of subsequent trials (at-bats), the posterior intervals are relatively wider or narrower within the plots for each model (more trials imply narrower intervals for the average). Because each unit had the same number of initial observed trials, this variation is primarily due to the uncertainty from the binomial model of outcomes.
With 50% intervals, we expect half of our estimates to lie outside their intervals in a well-calibrated model. If fewer than the expected number of outcomes lie in their estimated posterior intervals, we have reason to believe the model is not well calibrated; its posterior intervals are too narrow. This is also true if too many outcomes lie in their estimated posterior intervals; in that case the intervals are too broad. Of course, there is variation in the tests, as the number of units lying in their intervals is itself a random variable (see the exercises), so in practice we are only looking for extreme values as indicators of miscalibration.
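As a rough numeric check (a sketch reusing the df_ppd data frame constructed above, not part of the original code), we can count how many of the 18 observed rest-of-season averages fall inside their 50% intervals for each model; around 9 per model is what good calibration would suggest:

in_interval <- df_ppd$y >= df_ppd$lb & df_ppd$y <= df_ppd$ub
tapply(in_interval, df_ppd$model, sum)   # count out of 18 players per model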
Each of the models other than the complete pooling model appears to be reasonably well calibrated, and even the calibration for the complete pooling model is not bad (the variation in chance of success among players has low enough variance that the complete pooling model cannot be rejected as a possibility with only the amount of data we used here).
Consider the width of the posterior predictive intervals for the units across the models. The model with no pooling has the broadest posterior predictive intervals and the complete pooling model the narrowest. This is to be expected given the number of observations used to fit each model: 45 each in the no pooling case, 810 in the complete pooling case, and something in between for the partial pooling model.
For two well calibrated models, the one with the narrower posterior intervals is preferable because its predictions are tighter. The term introduced for this by Gneiting et al. (2007) is "sharpness." In the limit, a perfect model would provide a delta function at the true answer with a vanishing posterior interval.
The 80% interval in the partial pooling model coincidentally shows us that our model estimates a roughly 10% chance of Roberto Clemente batting 0.400 or better for the season based on batting 0.400 in his first 45 at-bats. Not great, but non-trivial. Rather than fishing for the right quantile and hoping to get lucky, we can directly estimate event probabilities, such as the probability that Roberto Clemente's batting average is 0.400 or better for the season.
Event probabilities are defined as expectations of indicator functions over parameters and data. For example, the probability of player \(n\)'s batting average being 0.400 or better conditioned on the data \(y\) is defined by the conditional event probability
\[\mathrm{Pr}\left[\frac{(y_n + z_n)}{(45 + K^{\mathrm{new}}_n)} \geq 0.400 \, \Big| \, y\right] \ = \ \int_{\Theta} \mathrm{I}\left[\frac{(y_n + z_n)}{(45 + K^{\mathrm{new}}_n)} \geq 0.400\right] \ p(z_n \, | \, \theta_n, K^{\mathrm{new}}_n) \ p(\theta \, | \, y, K) \ \mathrm{d}\theta.\]
The indicator function \(\mathrm{I}[c]\) evaluates to 1 if the condition \(c\) is true and 0 if it is false. Because it is just another expectation with respect to the posterior, we can calculate this event probability using MCMC as
\[\mathrm{Pr}\left[\frac{(y_n + z_n)}{(45 + K^{\mathrm{new}}_n)} \geq 0.400 \, \Big| \, y \right] \ \approx \ \frac{1}{M} \, \sum_{m=1}^M \mathrm{I}\left[\frac{(y_n + z_n^{(m)})}{(45 + K^{\mathrm{new}}_n)} \geq 0.400\right].\]
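As a concrete illustration (a sketch reusing the ppd_partialpool draws computed above; the variable names are ours), the probability of Clemente finishing the season at 0.400 or better can be estimated by averaging the indicator over the posterior predictive draws of his remaining hits:

z_clemente <- ppd_partialpool[, "Clemente"]
season_avg_clemente <- (y[1] + z_clemente) / (K[1] + K_new[1])
mean(season_avg_clemente >= 0.400)   # proportion of draws at or above 0.400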
This event is about the season batting average being greater than or equal to 0.400. What if we care about ability (chance of success), not batting average (success rate), for the rest of the season? Then we would ask about \(\mathrm{Pr}[\theta_n \geq 0.400 \, | \, y]\). This is defined as a weighted average over the posterior and computed via MCMC as in the previous case.
\[\mathrm{Pr}\left[\theta_n \geq 0.400 \, | \, y \right] \ = \ \int_{\Theta} \mathrm{I}\left[\theta_n \geq 0.400\right] \ p(\theta \, | \, y, K) \ \mathrm{d}\theta \ \approx \ \frac{1}{M} \, \sum_{m=1}^M \mathrm{I}[\theta_n^{(m)} \geq 0.400].\]
draws_partialpool <- shift_draws(as.matrix(fit_partialpool))
thetas_partialpool <- plogis(draws_partialpool)
thetas_partialpool <- thetas_partialpool[, -ncol(thetas_partialpool)]
colnames(thetas_partialpool) <- as.character(bball$Player)

ability_gt_400 <- thetas_partialpool > 0.4
cat("Pr(theta_n >= 0.400 | y)\n")
colMeans(ability_gt_400)[c(1, 5, 10)]

some_gt_350 <- apply(thetas_partialpool, 1, function(x) max(x) > 0.35)
cat("Pr(at least one theta_n >= 0.350 | y)\n")
mean(some_gt_350)

Pr(theta_n >= 0.400 | y)
Clemente    Berry  Swaboda 
 0.02175  0.00325  0.00050 
Pr(at least one theta_n >= 0.350 | y)
[1] 0.22875

We snuck in a "multiple comparison" event in the last section, namely whether there was some player with a chance of success for hits of 0.350 or greater.
With traditional significance testing over multiple trials, it is common to adjust for falsely rejecting the null hypothesis (a so-called Type I error) by lowering the conventional (and arguably far too low) 5% target for reporting "significance."
For example, suppose we have our 18 players with ability parameters \(\theta_n\) and we have \(N\) null hypotheses of the form \(H_0^n: \theta_n < 0.350\). Now suppose we evaluate each of these 18 hypotheses independently at the conventional \(p = 0.05\) significance level, giving each a 5% chance of rejecting the null hypothesis in error. When we run all 18 hypothesis tests, the overall chance of falsely rejecting at least one of the null hypotheses is a whopping \(1 - (1 - 0.05)^{18} = 0.60\).
The traditional solution to this problem is to apply a Bonferroni adjustment to control the false rejection rate; the typical adjustment is to divide the \(p\)-value threshold by the number of hypothesis tests in the "family" (that is, the collective test being done). Here that sets the threshold to \(p = 0.05/18\), or approximately \(p = 0.003\), and results in a slightly less than 5% chance of falsely rejecting at least one null hypothesis.
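The arithmetic (a quick sketch):

1 - (1 - 0.05)^18        # about 0.60: chance of at least one false rejection, unadjusted
1 - (1 - 0.05 / 18)^18   # about 0.049: with the Bonferroni-adjusted threshold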
Although the Bonferroni correction does reduce the overall chance of falsely rejecting a null hypothesis, it also reduces the statistical power of the test to the same degree. This means that many null hypotheses that should be rejected will erroneously fail to be rejected.
Rather than doing classical multiple comparison adjustments to control the false-discovery rate, such as a Bonferroni correction, Gelman et al. (2012) suggest using a hierarchical model to perform partial pooling instead. As already shown, hierarchical models partially pool the data, which pulls estimates toward the population mean with a strength determined by the amount of observed variation in the population (see also Figure 2 of Gelman et al. (2012)). This automatically reduces the false-discovery rate, though not in a way that is intrinsically calibrated to false discovery, which is good, because reducing the overall false discovery rate in and of itself reduces the true discovery rate at the same time.
The quantity some_gt_350 computed above is set to 1 for each posterior draw in which the maximum ability estimate in \(\theta\) is greater than 0.35, and thus its posterior mean is the event probability
\[\mathrm{Pr}[\mathrm{max}(\theta) > 0.350] \ = \ \int_{\Theta} \mathrm{I}[\mathrm{max}(\theta) > 0.35] \ p(\theta \, | \, y, K) \ \mathrm{d}\theta \ \approx \ \frac{1}{M} \, \sum_{m=1}^M \ \mathrm{I}[\mathrm{max}(\theta^{(m)}) > 0.35],\]
where \(\theta^{(m)}\) is the \(m\)th posterior draw of the ability parameter vector. This is just the posterior mean of some_gt_350 computed above, which takes on the value \(\mathrm{I}[\mathrm{max}(\theta^{(m)}) > 0.35]\) in each iteration.
The probability estimate of there being a player with an ability (chance of success) greater than 0.350 is essentially zero in the complete pooling model and is essentially guaranteed in the no pooling model. The partially pooled estimate would not be considered significant at conventional \(p = 0.05\) thresholds. One way to get a handle on what's going on is to inspect the posterior 80% intervals for the chance-of-success estimates in the first graph above.
In addition to multiple comparisons, we can use the simultaneous estimation of the ability parameters to rank the units. In this section, we rank ballplayers by (estimated) chance of success (i.e., batting ability).
Of course, ranking players by ability makes no sense for the complete pooling model, where every player is assumed to have the same ability.

reverse_rank <- function(x) 1 + length(x) - rank(x)  # so lower rank is better
rank <- apply(thetas_partialpool, 1, reverse_rank)
t(apply(rank, 1, quantile, prob = c(0.1, 0.5, 0.9)))

           10% 50% 90%
Clemente     1   5  14
Robinson     1   5  14
Howard       1   6  14
Johnstone    2   7  15
Berry        2   8  15
Spencer      2   8  15
Kessinger    2   9  16
Alvarado     3   9  16
Santo        3  10  17
Swaboda      3  10  17
Petrocelli   4  11  17
Rodriguez    4  11  17
Scott        4  11  17
Unser        4  11  17
Williams     3  11  17
Campaneris   4  12  17
Munson       4  13  18
Alvis        5  14  18

It is again abundantly clear from the posterior intervals that our uncertainty is very great after only 45 at-bats.
In the original Volume I BUGS example of surgical mortality, the posterior distribution over ranks was plotted for each hospital. It is now straightforward to reproduce that figure here for the baseball data.

df_rank <- data.frame(name = rep(bball$Player, each = M),
                      rank = c(t(rank)))
ggplot(df_rank, aes(rank)) +
  stat_count(width = 0.8) +
  facet_wrap(~ name) +
  scale_x_discrete("Rank", limits = c(1, 5, 10, 15)) +
  scale_y_discrete("Probability", limits = c(0, 0.1 * M, 0.2 * M),
                   labels = c("0.0", "0.1", "0.2")) +
  ggtitle("Rankings for Partial Pooling Model")

We can use our ranking statistic to calculate the event probability for unit \(n\) that the unit has the highest chance of success using MCMC as
\[\mathrm{Pr}[\theta_n = \max(\theta)] \ = \ \int_{\Theta} \mathrm{I}[\theta_n = \mathrm{max}(\theta)] \ p(\theta \, | \, y, K) \ \mathrm{d}\theta \ \approx \ \frac{1}{M} \, \sum_{m=1}^M \mathrm{I}[\theta^{(m)}_n = \mathrm{max}(\theta^{(m)})].\]
Like our other models, the partial pooling mitigates the implicit multiple comparisons being done to calculate the probabilities of rankings. Contrast this with an approach that does a pairwise significance test and then applies a false-discovery correction.
We can compute this straightforwardly using the rank data we have already computed, or we could compute it directly as above. Because \(\mathrm{Pr}[\theta_n = \theta_{n'}] = 0\) for \(n \neq n'\), we don't have to worry about ties.
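For instance, a direct computation from the posterior draws (a sketch, not part of the original code) is:

best_idx <- apply(thetas_partialpool, 1, which.max)   # index of the best player in each draw
is_best_direct <- prop.table(table(factor(best_idx, levels = 1:N,
                                          labels = colnames(thetas_partialpool))))
round(sort(is_best_direct, decreasing = TRUE)[1:3], 3)   # top three players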
thetas_nopool <- plogis(as.matrix(fit_nopool))
colnames(thetas_nopool) <- as.character(bball$Player)
rank_nopool <- apply(thetas_nopool, 1, reverse_rank)
is_best_nopool <- rowMeans(rank_nopool == 1)
is_best_partialpool <- rowMeans(rank == 1)

df_is_best <- data.frame(unit = rep(bball$Player, 2),
                         is_best = c(is_best_partialpool, is_best_nopool),
                         model = rep(c("partial pooling", "no pooling"), each = N))
ggplot(df_is_best, aes(x = unit, y = is_best)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ model) +
  scale_y_continuous(name = "Pr[player is best]") +
  ggtitle("Who is the Best Player?") +
  theme(axis.text.x = element_text(angle = -45, vjust = 1, hjust = 0))

This question of which player has the highest chance of success (batting ability) doesn't even make sense in the complete pooling model, because the chance-of-success parameters are all the same by definition. In the other models, the amount of pooling directly determines the probabilities of being the best player. That is, the probability of being best goes down for high-performing players with more pooling, whereas it goes up for below-average players.
We can simulate data from the predictive distribution and compare it to the original data used for fitting the model. If they are not consistent, then either our model is not capturing the aspects of the data we are probing with test statistics or the measurement we made is highly unlikely. That is, extreme \(p\)-values lead us to suspect there is something wrong with our model that deserves further exploration.
In some cases, we are willing to work with models that are wrong in some measurable aspects but accurately capture quantities of interest for an application. That is, it's possible for a model to capture some, but not all, aspects of a data set, and still be useful.
A test statistic \(T\) is a function from data to a real value. Following Gelman et al. (2013), we will concentrate on four specific test statistics for repeated binary trial data (though these choices are fairly general): minimum value, maximum value, sample mean, and sample standard deviation.
Given a test statistic \(T\) and data \(y\), the Bayesian \(p\)-value has a direct definition as a probability,
\[p_B = \mathrm{Pr}[T(y^{\mathrm{rep}}) \geq T(y) \, | \, y].\]
Bayesian \(p\)-values, like their traditional counterparts, are probabilities, but not probabilities that a model is true. They simply measure discrepancies between the observed data and what we would expect if the model is true.
Values of Bayesian \(p\)-values near 0 or 1 indicate that the data \(y\) used to estimate the model is unlikely to have been generated by the estimated model. As with other forms of full Bayesian inference, our estimate is the full posterior, not just a point estimate.
As with other Bayesian inferences, we average over the posterior rather than working from a point estimate of the parameters. Expanding this as an expectation of an indicator function,
\[p_B \ = \ \int_{\Theta, Y^{\mathrm{rep}}} \mathrm{I}[T(y^{\mathrm{rep}}) \geq T(y)] \ p(y^{\mathrm{rep}} \, | \, \theta) \ p(\theta \, | \, y) \ \mathrm{d}\theta.\]
We treat \(y^{\mathrm{rep}}\) as a parameter in parallel with \(\theta\), integrating over possible values \(y^{\mathrm{rep}} \in Y^{\mathrm{rep}}\). As usual, we use the integration sign in a general way intended to include summation, as with the discrete variable \(y^{\mathrm{rep}}\).
The formulation as an expectation leads to the obvious MCMC calculation based on posterior draws \(y^{\mathrm{rep} \ (m)}\) for \(m \in 1{:}M\),
\[p_B\approx\frac{1}{M}\,\sum_{m=1}^M\mathrm{I}[T(y^{\mathrm{rep} \ (m)}) \geq T(y)].\]
Using the pp_check function in rstanarm, we can easily reproduce Figure 6.12 from Gelman et al. (2013), which shows the posterior predictive distribution for the test statistic, the observed value as a vertical line, and the \(p\)-value for each of the tests. First, here is just the plot for the no pooling model using the mean as the test statistic:
pp_check(fit_nopool, plotfun = "stat", stat = "mean")

The stat argument can be the name of any R function (including your own functions defined in the Global Environment) that takes a vector as an input and returns a scalar.
To make plots for each of the models for several test statistics we can use the following code, which will create a list of ggplot objects for each model and then arrange everything in a single plot.

tstat_plots <- function(model, stats) {
  lapply(stats, function(stat) {
    graph <- pp_check(model, plotfun = "stat", stat = stat, seed = SEED)  # optional arguments
    graph + xlab(stat) + theme(legend.position = "none")
  })
}

Tstats <- c("mean", "sd", "min", "max")
ppcs_pool <- tstat_plots(fit_pool, Tstats)
ppcs_nopool <- tstat_plots(fit_nopool, Tstats)
ppcs_partialpool <- tstat_plots(fit_partialpool, Tstats)

if (require(gridExtra)) {
  grid.arrange(
    arrangeGrob(grobs = ppcs_pool, nrow = 1, left = "Pooling"),
    arrangeGrob(grobs = ppcs_nopool, nrow = 1, left = "No Pooling"),
    arrangeGrob(grobs = ppcs_partialpool, nrow = 1, left = "Partial Pooling")
  )
}

The only worrisomely extreme value visible in the plots is the \(p\)-value for the standard deviation in the no-pooling model, where the vast majority of the simulated data sets under the model had standard deviations greater than the actual data.
We didn't actually compute this \(p\)-value because extreme \(p\)-values are easy to detect visually, and whether or not the \(p\)-value is less than \(0.05\) or some other arbitrary value is of little use to us beyond what we can already see in the plot. However, if we did want to actually compute the \(p\)-value we can do so easily:

yrep <- posterior_predict(fit_nopool, seed = SEED)  # seed is optional
Ty <- sd(y)
Tyrep <- apply(yrep, 1, sd)

# tail-area probability
p <- 1 - mean(Tyrep > Ty)
print(p)

[1] 0.01325

Following the advice of Gelman et al. (2013), we will take the fitted parameters of the data set and generate replicated data sets, then compare the replicated data sets visually to the observed data we used to fit the model. In this section we'll create the plots for the model using partial pooling, but the same plots can be made for the other models too.
Again using rstanarm's pp_check function, we can plot some of the simulated data sets along with the original data set to do a visual inspection as suggested by Gelman et al. (2013). For this type of posterior predictive check we set the plotfun argument to "hist" and we use nreps to specify how many replicated sets of data to generate from the posterior predictive distribution. Because our models have a binomial outcome, instead of plotting the number of successes (hits in this case) on the x-axis, pp_check will plot the proportion of successes.
pp_check(fit_partialpool, plotfun = "hist", nreps = 15, binwidth = 0.025) +
  ggtitle("Model: Partial Pooling")

These simulations are not unreasonable for a binomial likelihood, but they are more spread out than the actual data. In this case, this may actually have more to do with how the data were selected out of all the major league baseball players than with the actual data distribution. Efron and Morris (1975, p. 312) write
This sample was chosen because we wanted between 30 and 50 at bats to assure a satisfactory approximation of the binomial by the normal distribution while leaving the bulk of at bats to be estimated. We also wanted to include an unusually good hitter (Clemente) to test the method with at least one extreme parameter, a situation expected to be less favorable to Stein's estimator. Stein's estimator requires equal variances, or in this situation, equal at bats, so the remaining 17 players are all whom either the April 26 or May 3 New York Times reported with 45 at bats.
A hierarchical model introduces an estimation bias toward the population mean and the stronger the bias, the less variance there is in the estimates for the units. Exactly how much bias and variance is warranted can be estimated by further calibrating the model and testing where its predictions do not bear out.
With very little data, there is very little we can do to gain sharp inferences other than provide more informative priors, which is well worth doing when prior information is available.
On the other hand, with more data, the models provide similar results (see the exercises), and in the limit, all of the models (other than complete pooling) converge to posteriors that are delta functions around the empirical chance of success (i.e., the maximum likelihood estimate). Meanwhile, Bayesian inference is allowing us to make more accurate predictions with the data available before we hit that asymptotic regime.
Generate fake data according to the pooling, no-pooling, and partial pooling models. Fit the models and consider the coverage of the posterior 80% intervals.
Try generating data where each player has a different number of at-bats (trials) and then fitting the models. What effect does the number of initial trials have on the posterior? Is there a way to quantify the effect?
In the section where we fit the complete pooling model we show a plot of the prior distribution on the probability of success \(\theta\) implied by the \(\mathsf{Normal}(-1, 1)\) prior on the log-odds \(\alpha\). If \(\theta = \mathrm{logit}^{-1}(\alpha)\) and \(p(\alpha) = \mathsf{Normal}(\alpha \, | \, -1, 1)\), what is \(p(\theta)\)? For a hint, see here.
How sensitive is the basic no-pooling model to the choice of prior? We used a somewhat informative prior due to our knowledge of baseball, but the prior could be made more or less informative. How, if at all, does this affect posterior inference?
What are some other test statistics that might be used to evaluate our model fit to data? Try some out using pp_check(model, plotfun = "stat", stat = "my_test"), where my_test is your function that computes the test statistic. For example, to check the 25% quantile you could first define a function q25 <- function(x) quantile(x, 0.25) and then call pp_check(model, plotfun = "stat", stat = "q25").
Discuss the difference between batting average and on-base percentage as random variables. Consider particularly the denominator (at-bat versus plate appearance). Is the denominator in these kinds of problems always a random variable itself? Why might this be important in inference?
Betancourt, M. and Girolami, M. (2015) Hamiltonian Monte Carlo for hierarchical models. Current Trends in Bayesian Methodology with Applications 79.
Efron, B. and Morris, C. (1975) Data analysis using Stein's estimator and its generalizations. Journal of the American Statistical Association 70(350), 311–319. [pdf]
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013) Bayesian Data Analysis, 3rd Edition. Chapman & Hall/CRC Press, London.
Gelman, A. and Hill, J. (2007) Data Analysis Using Regression and Multilevel-Hierarchical Models. Cambridge University Press, Cambridge, United Kingdom.
Gelman, A., Hill, J., and Yajima, M. (2012) Why we (usually) don't have to worry about multiple comparisons. Journal of Research on Educational Effectiveness 5, 189–211. [preprint]
Gneiting, T., Balabdaoui, F., and Raftery, A. E. (2007) Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69(2), 243–268.
Lunn, D., Jackson, C., Best, N., Thomas, A., and Spiegelhalter, D. (2013) The BUGS Book: A Practical Introduction to Bayesian Analysis. Chapman & Hall/CRC Press.
Neal, R. M. (2003) Slice sampling. Annals of Statistics 31(3), 705–767.
Papaspiliopoulos, O., Roberts, G. O., and Skold, M. (2003) Non-centered parameterisations for hierarchical models and data augmentation. In Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting, edited by Bernardo, J. M., Bayarri, M. J., Berger, J. O., Dawid, A. P., Heckerman, D., Smith, A. F. M., and West, M. Oxford University Press, Chicago.
Plummer, M., Best, N., Cowles, K., and Vines, K. (2006) CODA: Convergence diagnosis and output analysis for MCMC. R News 6(1), 7–11.
Spiegelhalter, D., Thomas, A., Best, N., and Gilks, W. (1996) BUGS 0.5 Examples. MRC Biostatistics Unit, Institute of Public Health, Cambridge, UK.
Stan Development Team (2015) Stan Modeling Language User's Guide and Reference Manual. [web page]
Tarone, R. E. (1982) The use of historical control information in testing for a trend in proportions. Biometrics 38(1), 215–220.
Vehtari, A., Gelman, A., and Gabry, J. (2016) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. [pdf]
The following additional data sets have a similar structure to the baseball data used in this vignette and are included with rstanarm.
Tarone (1982) provides a data set of tumor incidence in historical control groups of rats; specifically endometrial stromal polyps in female lab rats of type F344. The data set is taken from the book site for Gelman et al. (2013):

data(tumors, package = "rstanarm")

Spiegelhalter et al. (1996) provide a data set of mortality rates in 12 hospitals performing cardiac surgery in babies. We just manually entered the data from the paper; it is also available in the Stan example models repository in R format.

data(mortality, package = "rstanarm")

Carpenter (2009) updates Efron and Morris's (1975) data set for the entire set of players for the entire 2006 American League season of Major League Baseball. The data was originally downloaded from seanlahman.com, which is currently not working.
data(bball2006, package = "rstanarm")