Bayesian statistics

From Wikipedia, the free encyclopedia
Theory and paradigm of statistics

Bayesian statistics (/ˈbeɪziən/ BAY-zee-ən or /ˈbeɪʒən/ BAY-zhən)[1] is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials.[2] More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

Bayesian statistical methods use Bayes' theorem to compute and update probabilities after obtaining new data. Bayes' theorem describes the conditional probability of an event based on data as well as prior information or beliefs about the event or conditions related to the event.[3][4] For example, in Bayesian inference, Bayes' theorem can be used to estimate the parameters of a probability distribution or statistical model. Since Bayesian statistics treats probability as a degree of belief, Bayes' theorem can directly assign a probability distribution that quantifies the belief to the parameter or set of parameters.[2][3]

Bayesian statistics is named after Thomas Bayes, who formulated a specific case of Bayes' theorem in a paper published in 1763. In several papers spanning from the late 18th to the early 19th centuries, Pierre-Simon Laplace developed the Bayesian interpretation of probability.[5] Laplace used methods now considered Bayesian to solve a number of statistical problems. While many Bayesian methods were developed by later authors, the term "Bayesian" was not commonly used to describe these methods until the 1950s. Throughout much of the 20th century, Bayesian methods were viewed unfavorably by many statisticians due to philosophical and practical considerations. Many of these methods required much computation, and most widely used approaches during that time were based on the frequentist interpretation. However, with the advent of powerful computers and new algorithms like Markov chain Monte Carlo, Bayesian methods have gained increasing prominence in statistics in the 21st century.[2][6]

Bayes' theorem

Main article: Bayes' theorem

Bayes' theorem is used in Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events $A$ and $B$, the conditional probability of $A$ given that $B$ is true is expressed as follows:[7]

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where $P(B) \neq 0$. Although Bayes' theorem is a fundamental result of probability theory, it has a specific interpretation in Bayesian statistics. In the above equation, $A$ usually represents a proposition (such as the statement that a coin lands on heads fifty percent of the time) and $B$ represents the evidence, or new data that is to be taken into account (such as the result of a series of coin flips). $P(A)$ is the prior probability of $A$, which expresses one's beliefs about $A$ before evidence is taken into account. The prior probability may also quantify prior knowledge or information about $A$. $P(B \mid A)$ is the likelihood function, which can be interpreted as the probability of the evidence $B$ given that $A$ is true. The likelihood quantifies the extent to which the evidence $B$ supports the proposition $A$. $P(A \mid B)$ is the posterior probability, the probability of the proposition $A$ after taking the evidence $B$ into account. Essentially, Bayes' theorem updates one's prior beliefs $P(A)$ after considering the new evidence $B$.[2]
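
To make the roles concrete, here is a minimal numeric sketch in Python; the two hypotheses, their priors, and the data are illustrative assumptions, not values from any real analysis. Two propositions about a coin play the role of $A$, a run of flips plays the role of $B$, and the evidence term anticipates the law of total probability discussed next.

```python
from math import comb

# Two competing propositions about a coin: "fair" (P(heads) = 0.5) and
# "biased" (P(heads) = 0.8). Priors are illustrative assumptions.
prior = {"fair": 0.5, "biased": 0.5}
heads_prob = {"fair": 0.5, "biased": 0.8}

# Evidence B: observing k = 8 heads in n = 10 independent flips.
n, k = 10, 8
likelihood = {h: comb(n, k) * p**k * (1 - p)**(n - k)
              for h, p in heads_prob.items()}

# P(B) by summing over the hypotheses, then the posterior P(A | B).
evidence = sum(likelihood[h] * prior[h] for h in prior)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}
print(posterior)  # belief shifts toward "biased" after seeing 8/10 heads
```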

The probability of the evidence $P(B)$ can be calculated using the law of total probability. If $\{A_1, A_2, \dots, A_n\}$ is a partition of the sample space, which is the set of all outcomes of an experiment, then,[2][7]

$$P(B) = P(B \mid A_1)P(A_1) + P(B \mid A_2)P(A_2) + \dots + P(B \mid A_n)P(A_n) = \sum_i P(B \mid A_i)P(A_i)$$

When there are an infinite number of outcomes, it is necessary to integrate over all outcomes to calculate $P(B)$ using the law of total probability. $P(B)$ is often difficult to calculate, since the sums or integrals involved can be time-consuming to evaluate; because the evidence stays fixed within a given analysis, often only the product of the prior and the likelihood is considered. The posterior is proportional to this product:[2]
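
A sketch of the continuous case, under the assumption of a uniform prior on a coin's heads probability $\theta$ (purely illustrative): the evidence becomes an integral over $[0, 1]$, approximated below by a simple Riemann sum on a grid.

```python
import numpy as np
from math import comb

# theta = probability of heads, uniform prior density on [0, 1]
# (an illustrative assumption); data: k = 8 heads in n = 10 flips.
n, k = 10, 8
theta = np.linspace(0.0, 1.0, 10_001)
prior = np.ones_like(theta)                          # uniform prior density
likelihood = comb(n, k) * theta**k * (1 - theta)**(n - k)

# P(B) = integral of likelihood * prior over theta (Riemann sum).
evidence = np.sum(likelihood * prior) * (theta[1] - theta[0])
posterior = likelihood * prior / evidence            # normalized density
print(evidence, np.sum(posterior) * (theta[1] - theta[0]))  # second ~ 1.0
```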

$$P(A \mid B) \propto P(B \mid A)\,P(A)$$

The maximum a posteriori, which is the mode of the posterior and is often computed in Bayesian statistics using mathematical optimization methods, is unchanged by this proportionality. The posterior can be approximated even without computing the exact value of $P(B)$ with methods such as Markov chain Monte Carlo or variational Bayesian methods.[2]
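
To sketch why the normalizing constant can be skipped, the following minimal random-walk Metropolis sampler (model, data, and tuning values are illustrative assumptions) works entirely with the unnormalized log-posterior; the unknown $P(B)$ cancels in the acceptance ratio.

```python
import numpy as np

# Coin model as above: uniform prior on theta, k = 8 heads in n = 10 flips.
rng = np.random.default_rng(0)
n, k = 10, 8

def log_unnorm_posterior(theta):
    """Log of prior * likelihood, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -np.inf                     # zero prior density outside [0, 1]
    return k * np.log(theta) + (n - k) * np.log(1 - theta)

samples, theta = [], 0.5
for _ in range(20_000):
    proposal = theta + rng.normal(scale=0.1)         # random-walk proposal
    # Accept with probability min(1, posterior ratio); P(B) cancels here.
    if np.log(rng.uniform()) < log_unnorm_posterior(proposal) - log_unnorm_posterior(theta):
        theta = proposal
    samples.append(theta)

print(np.mean(samples[5_000:]))  # posterior mean estimate
```

With a uniform prior the exact posterior is Beta(9, 3), so the printed mean should land near $9/12 = 0.75$.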

Construction


The classical textbook equation for the posterior in Bayesian statistics is usually stated as

$$\pi(\theta \mid x) = \frac{\mathcal{L}(x \mid \theta)\,\pi(\theta)}{\int_\Theta \mathcal{L}(x \mid \theta')\,\pi(\theta')\,d\theta'}$$

where $\pi(\theta \mid x)$ is the updated probability of $\theta$ being the true parameter after collecting the data $x$, $\mathcal{L}(x \mid \theta)$ is the likelihood of collecting the data $x$ given the parameter $\theta$, $\pi(\theta)$ is the prior density expressing one's belief about $\theta$, and the integral in the denominator gives the probability of collecting the data $x$.
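
The equation can be checked numerically in a conjugate case. The sketch below, with purely illustrative numbers, pairs a Beta prior with a binomial likelihood, for which the exact posterior is again a Beta distribution, and compares it against the posterior obtained by evaluating the denominator integral on a grid.

```python
import numpy as np
from scipy import stats

# Beta(a, b) prior on theta, binomial likelihood with k successes in n
# trials; the exact posterior is Beta(a + k, b + n - k).
a, b, n, k = 2.0, 2.0, 10, 8
theta = np.linspace(1e-6, 1 - 1e-6, 10_001)
dtheta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, a, b)
likelihood = stats.binom.pmf(k, n, theta)

# Denominator: integral of likelihood * prior over the parameter space.
evidence = np.sum(likelihood * prior) * dtheta
posterior_numeric = likelihood * prior / evidence
posterior_exact = stats.beta.pdf(theta, a + k, b + n - k)

print(np.max(np.abs(posterior_numeric - posterior_exact)))  # small grid error
```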

Mathematically, this version of Bayes' theorem can be constructed in the following way. Suppose $(\Omega, \Sigma_\Omega, \{P_\theta \mid \theta \in \Theta\})$ is a parametric statistical model and $(\Theta, \Sigma_\Theta, \pi)$ is a probability space over the parameter space. We can construct a new probability space $(\Theta \times \Omega, \Sigma_\Theta \otimes \Sigma_\Omega, Q)$, where $Q$ is a sort of product measure defined as

$$Q(M) := (\pi \otimes P_\cdot)(M) = \int_\Theta P_{\theta'}(M_{\theta'})\,d\pi(\theta')$$

Now, let $A_\theta := \{\theta\} \times \Omega$ and $B_x := \Theta \times \{x\}$; then we get

$$Q(\theta) = Q(A_\theta) = \int_{\{\theta\}} P_{\theta'}(\Omega)\,d\pi(\theta') = \pi(\{\theta\}) \cdot P_\theta(\Omega) = \pi(\theta)$$

and hence

$$Q(x \mid \theta) = \frac{Q(B_x \cap A_\theta)}{Q(A_\theta)} = \frac{\pi(\theta) \cdot P_\theta(\{x\})}{\pi(\theta)} = P_\theta(x)$$

both as might be expected empirically. Thus, Bayes' theorem states:

$$Q(\theta \mid x) = P_\theta(x) \cdot \frac{\pi(\theta)}{Q(x)}$$

If $\pi \ll \lambda$ (absolutely continuous with respect to the Lebesgue measure), then there exists a density $\pi(\theta) = \frac{d\pi}{d\lambda}(\theta)$ and we can write:

$$Q(x) = \int_\Theta P_{\theta'}(x)\,d\pi(\theta') = \int_\Theta P_{\theta'}(x)\,\pi(\theta')\,d\theta'$$

Otherwise, if $\pi \ll \nu$ (absolutely continuous with respect to the counting measure), we can analogously write:

$$Q(x) = \int_\Theta P_{\theta'}(x)\,\pi(\theta')\,d\nu(\theta') = \sum_i P_{\theta_i}(x)\,\pi(\theta_i)$$

Thus, by identifying $Q(\theta \mid x)$ with $\pi(\theta \mid x)$ and $\mathcal{L}(x \mid \theta)$ with $P_\theta(x)$, we arrive at the classical equation stated above.
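
For finite parameter and sample spaces the whole construction reduces to arrays, so the identities above can be verified numerically. A small sanity check with illustrative numbers:

```python
import numpy as np
from math import comb

# Theta = {0, 1} indexes two coin models; Omega = {0, ..., 3} counts heads
# in 3 flips. Q is the joint measure on Theta x Omega.
heads_prob = [0.5, 0.8]                 # P_theta for theta = 0, 1
pi = np.array([0.7, 0.3])               # prior measure on Theta
n = 3

# P[theta, x] = P_theta({x}): one binomial pmf row per theta.
P = np.array([[comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
              for p in heads_prob])
Q = pi[:, None] * P                     # Q(theta, x) = pi(theta) * P_theta(x)

print(Q.sum(axis=1))                    # Q(A_theta) recovers pi(theta)
print(Q[1] / Q[1].sum())                # Q(x | theta = 1) recovers P_1(x)
x = 3
print(Q[:, x] / Q[:, x].sum())          # Q(theta | x): the Bayes posterior
```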

Bayesian methods


The general set of statistical techniques can be divided into a number of activities, many of which have special Bayesian versions.

Bayesian inference

Main article: Bayesian inference

Bayesian inference refers to statistical inference where uncertainty in inferences is quantified using probability.[8] In classical frequentist inference, model parameters and hypotheses are considered to be fixed. Probabilities are not assigned to parameters or hypotheses in frequentist inference. For example, it would not make sense in frequentist inference to directly assign a probability to an event that can only happen once, such as the result of the next flip of a fair coin. However, it would make sense to state that the proportion of heads approaches one-half as the number of coin flips increases.[9]
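
The frequentist statement in the last sentence is easy to illustrate by simulation; the sketch below assumes a fair coin and tracks the running proportion of heads.

```python
import numpy as np

rng = np.random.default_rng(42)
flips = rng.integers(0, 2, size=100_000)     # 1 = heads, fair coin
running_prop = np.cumsum(flips) / np.arange(1, flips.size + 1)
for m in (10, 1_000, 100_000):
    print(m, running_prop[m - 1])            # drifts toward 0.5
```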

Statistical models specify a set of statistical assumptions and processes that represent how the sample data are generated. Statistical models have a number of parameters that can be modified. For example, a coin can be represented as samples from a Bernoulli distribution, which models two possible outcomes. The Bernoulli distribution has a single parameter equal to the probability of one outcome, which in most cases is the probability of landing on heads. Devising a good model for the data is central in Bayesian inference. In most cases, models only approximate the true process, and may not take into account certain factors influencing the data.[2] In Bayesian inference, probabilities can be assigned to model parameters. Parameters can be represented as random variables. Bayesian inference uses Bayes' theorem to update probabilities after more evidence is obtained or known.[2][10] Furthermore, Bayesian methods allow for placing priors on entire models and calculating their posterior probabilities using Bayes' theorem. These posterior probabilities are proportional to the product of the prior and the marginal likelihood, where the marginal likelihood is the integral of the sampling density over the prior distribution of the parameters. In complex models, marginal likelihoods are generally computed numerically.[11]
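
A sketch of that last point, with two hypothetical models for the same coin-flip data: each model's marginal likelihood is the integral of the sampling density over that model's prior, computed below on a grid. All priors, model probabilities, and data are illustrative assumptions.

```python
import numpy as np
from scipy import stats

n, k = 10, 8
theta = np.linspace(1e-6, 1 - 1e-6, 10_001)
dtheta = theta[1] - theta[0]
lik = stats.binom.pmf(k, n, theta)

# Model 1: prior concentrated near fairness; Model 2: flat prior.
priors = {"near-fair": stats.beta.pdf(theta, 20, 20),
          "flat":      stats.beta.pdf(theta, 1, 1)}
model_prior = {"near-fair": 0.5, "flat": 0.5}

# Marginal likelihood of each model, then posterior model probabilities.
marginal = {m: np.sum(lik * p) * dtheta for m, p in priors.items()}
norm = sum(marginal[m] * model_prior[m] for m in marginal)
posterior = {m: marginal[m] * model_prior[m] / norm for m in marginal}
print(posterior)  # with 8/10 heads, the flat-prior model is favored here
```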

Statistical modeling


The formulation of statistical models using Bayesian statistics has the identifying feature of requiring the specification of prior distributions for any unknown parameters. Indeed, parameters of prior distributions may themselves have prior distributions, leading to Bayesian hierarchical modeling,[12][13][14] also known as multi-level modeling. A special case is Bayesian networks.
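
The hierarchical idea can be sketched generatively: hyperparameters are drawn from a hyperprior, group-level parameters from the resulting population distribution, and data from the likelihood. All distributions and constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hyperprior: parameters of the group-level prior get priors themselves.
mu = rng.normal(0.0, 5.0)         # common mean across groups
tau = abs(rng.normal(0.0, 2.0))   # between-group spread (half-normal)

# Prior: group-specific parameters drawn from the population distribution.
n_groups = 8
theta = rng.normal(mu, tau, size=n_groups)

# Likelihood: observed data generated within each group.
y = rng.normal(theta, 1.0, size=n_groups)
print(round(mu, 2), round(tau, 2), theta.round(2), y.round(2))
```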

Best practices for conducting a Bayesian statistical analysis are discussed by van de Schoot et al.[15]

For reporting the results of a Bayesian statistical analysis, Bayesian analysis reporting guidelines (BARG) are provided in an open-access article by John K. Kruschke.[16]

Design of experiments


The Bayesian design of experiments includes a concept called 'influence of prior beliefs'. This approach uses sequential analysis techniques to include the outcome of earlier experiments in the design of the next experiment. This is achieved by updating 'beliefs' through the use of prior and posterior distributions. This allows the design of experiments to make good use of resources of all types. An example of this is the multi-armed bandit problem, sketched below.
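
A sketch of this idea on the multi-armed bandit, using Thompson sampling with Beta-Bernoulli arms (a standard approach, though not the only one); the true success rates and horizon below are illustrative assumptions, unknown to the algorithm.

```python
import numpy as np

rng = np.random.default_rng(7)
true_rates = np.array([0.3, 0.5, 0.7])   # hidden from the algorithm
alpha = np.ones(3)                        # Beta(1, 1) prior: successes + 1
beta = np.ones(3)                         # Beta(1, 1) prior: failures + 1

for _ in range(2_000):
    # Design step: sample a belief for each arm, pull the best-looking one.
    arm = np.argmax(rng.beta(alpha, beta))
    reward = rng.uniform() < true_rates[arm]
    # Update step: the posterior of the pulled arm is the next round's prior.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))  # posterior means; most pulls go to the best arm
```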

Exploratory analysis of Bayesian models


Exploratory analysis of Bayesian models is an adaptation or extension of the exploratory data analysis approach to the needs and peculiarities of Bayesian modeling. In the words of Persi Diaconis:[17]

Exploratory data analysis seeks to reveal structure, or simple descriptions in data. We look at numbers or graphs and try to find patterns. We pursue leads suggested by background information, imagination, patterns perceived, and experience with other data analyses.

The inference process generates a posterior distribution, which has a central role in Bayesian statistics, together with other distributions like the posterior predictive distribution and the prior predictive distribution. The correct visualization, analysis, and interpretation of these distributions is key to properly answering the questions that motivate the inference process.[18]

When working with Bayesian models there are a series of related tasks that need to be addressed besides inference itself:

  • Diagnoses of the quality of the inference; this is needed when using numerical methods such as Markov chain Monte Carlo techniques
  • Model criticism, including evaluations of both model assumptions and model predictions
  • Comparison of models, including model selection or model averaging
  • Preparation of the results for a particular audience

All these tasks are part of the exploratory analysis of Bayesian models approach, and successfully performing them is central to the iterative and interactive modeling process. These tasks require both numerical and visual summaries.[19][20][21]
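
As a small sketch of the model-criticism task, the following posterior predictive check reuses the coin model from earlier sections (numbers are illustrative assumptions): it compares an observed summary statistic with its distribution under data replicated from the posterior.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 10, 8
observed_stat = k                                # summary: number of heads

# Draw parameters from the Beta(k + 1, n - k + 1) posterior (uniform
# prior), then replicate datasets and recompute the summary for each draw.
theta_draws = rng.beta(k + 1, n - k + 1, size=5_000)
replicated_stat = rng.binomial(n, theta_draws)

# Fraction of replications at least as extreme as the observation.
print(np.mean(replicated_stat >= observed_stat))
```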

References

  1. ^"Bayesian".Merriam-Webster.com Dictionary. Merriam-Webster.
  2. ^abcdefghiGelman, Andrew;Carlin, John B.; Stern, Hal S.; Dunson, David B.; Vehtari, Aki;Rubin, Donald B. (2013).Bayesian Data Analysis (Third ed.). Chapman and Hall/CRC.ISBN 978-1-4398-4095-5.
  3. ^abMcElreath, Richard (2020).Statistical Rethinking : A Bayesian Course with Examples in R and Stan (2nd ed.). Chapman and Hall/CRC.ISBN 978-0-367-13991-9.
  4. ^Kruschke, John (2014).Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (2nd ed.). Academic Press.ISBN 978-0-12-405888-0.
  5. ^McGrayne, Sharon (2012).The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (First ed.). Chapman and Hall/CRC.ISBN 978-0-3001-8822-6.
  6. ^Fienberg, Stephen E. (2006)."When Did Bayesian Inference Become "Bayesian"?".Bayesian Analysis.1 (1):1–40.doi:10.1214/06-BA101.
  7. ^abGrinstead, Charles M.; Snell, J. Laurie (2006).Introduction to probability (2nd ed.). Providence, RI: American Mathematical Society.ISBN 978-0-8218-9414-9.
  8. ^Lee, Se Yoon (2021). "Gibbs sampler and coordinate ascent variational inference: A set-theoretical review".Communications in Statistics - Theory and Methods.51 (6):1549–1568.arXiv:2008.01006.doi:10.1080/03610926.2021.1921214.S2CID 220935477.
  9. ^Wakefield, Jon (2013).Bayesian and frequentist regression methods. New York, NY: Springer.ISBN 978-1-4419-0924-4.
  10. ^Congdon, Peter (2014).Applied Bayesian modelling (2nd ed.). Wiley.ISBN 978-1119951513.
  11. ^Chib, Siddhartha (1995). "Marginal Likelihood from the Gibbs Output".Journal of the American Statistical Association.90 (432):1313–1321.doi:10.1080/01621459.1995.10476635.
  12. ^Kruschke, J K; Vanpaemel, W (2015). "Bayesian Estimation in Hierarchical Models". In Busemeyer, J R; Wang, Z; Townsend, J T; Eidels, A (eds.).The Oxford Handbook of Computational and Mathematical Psychology(PDF). Oxford University Press. pp. 279–299.
  13. ^Hajiramezanali, E. & Dadaneh, S. Z. & Karbalayghareh, A. & Zhou, Z. & Qian, X. Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data. 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montréal, Canada.arXiv:1810.09433
  14. ^Lee, Se Yoon; Mallick, Bani (2021). "Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas".Sankhya B.84:1–43.doi:10.1007/s13571-020-00245-8.
  15. ^van de Schoot, Rens; Depaoli, Sarah; King, Ruth; Kramer, Bianca; Märtens, Kaspar; Tadesse, Mahlet G.; Vannucci, Marina; Gelman, Andrew; Veen, Duco; Willemsen, Joukje; Yau, Christopher (January 14, 2021)."Bayesian statistics and modelling".Nature Reviews Methods Primers.1 (1):1–26.doi:10.1038/s43586-020-00001-2.hdl:1874/415909.S2CID 234108684.
  16. ^Kruschke, J K (Aug 16, 2021)."Bayesian Analysis Reporting Guidelines".Nature Human Behaviour.5 (10):1282–1291.doi:10.1038/s41562-021-01177-7.PMC 8526359.PMID 34400814.
  17. ^Diaconis, Persi (2011) Theories of Data Analysis: From Magical Thinking Through Classical Statistics. John Wiley & Sons, Ltd 2:e55doi:10.1002/9781118150702.ch1
  18. ^Kumar, Ravin; Carroll, Colin; Hartikainen, Ari; Martin, Osvaldo (2019)."ArviZ a unified library for exploratory analysis of Bayesian models in Python".Journal of Open Source Software.4 (33): 1143.Bibcode:2019JOSS....4.1143K.doi:10.21105/joss.01143.hdl:11336/114615.
  19. ^Gabry, Jonah; Simpson, Daniel; Vehtari, Aki; Betancourt, Michael; Gelman, Andrew (2019). "Visualization in Bayesian workflow".Journal of the Royal Statistical Society, Series A (Statistics in Society).182 (2):389–402.arXiv:1709.01449.doi:10.1111/rssa.12378.S2CID 26590874.
  20. ^Vehtari, Aki; Gelman, Andrew; Simpson, Daniel; Carpenter, Bob; Bürkner, Paul-Christian (2021). "Rank-Normalization, Folding, and Localization: An Improved Rˆ for Assessing Convergence of MCMC (With Discussion)".Bayesian Analysis.16 (2): 667.arXiv:1903.08008.Bibcode:2021BayAn..16..667V.doi:10.1214/20-BA1221.S2CID 88522683.
  21. ^Martin, Osvaldo (2018).Bayesian Analysis with Python: Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ. Packt Publishing Ltd.ISBN 9781789341652.

External links

Wikiversity has learning resources about Bayesian statistics.