Binomial regression

From Wikipedia, the free encyclopedia
Regression analysis technique

In statistics, binomial regression is a regression analysis technique in which the response (often referred to as Y) has a binomial distribution: it is the number of successes in a series of n independent Bernoulli trials, where each trial has probability of success p.[1] In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

Binomial regression is closely related to binary regression: a binary regression can be considered a binomial regression with n = 1, or a regression on ungrouped binary data, while a binomial regression can be considered a regression on grouped binary data (see comparison).[2] Binomial regression models are essentially the same as binary choice models, one type of discrete choice model: the primary difference is in the theoretical motivation (see comparison). In machine learning, binomial regression is considered a special case of probabilistic classification, and thus a generalization of binary classification.

Example application


In one published example of an application of binomial regression,[3] the details were as follows. The observed outcome variable was whether or not a fault occurred in an industrial process. There were two explanatory variables: the first was a simple two-case factor representing whether or not a modified version of the process was used and the second was an ordinary quantitative variable measuring the purity of the material being supplied for the process.

Specification of model


The response variable Y is assumed to be binomially distributed conditional on the explanatory variables X. The number of trials n is known, and the probability of success for each trial p is specified as a function θ(X). This implies that the conditional expectation and conditional variance of the observed fraction of successes, Y/n, are

E(Y/n ∣ X) = θ(X)
Var(Y/n ∣ X) = θ(X)(1 − θ(X))/n
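
These moment formulas can be checked empirically. Below is a minimal simulation sketch (the trial count and the value of θ(X) at one fixed X are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

n_trials = 20
theta_x = 0.3   # hypothetical value of theta(X) at one fixed X

# Draw many realisations of the observed fraction of successes Y/n.
draws = rng.binomial(n_trials, theta_x, size=500_000) / n_trials

mean_hat = draws.mean()   # should approach theta(X) = 0.3
var_hat = draws.var()     # should approach theta(1 - theta)/n = 0.21/20 = 0.0105
```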

The goal of binomial regression is to estimate the function θ(X). Typically the statistician assumes θ(X) = m(β^T X) for a known function m, and estimates β. Common choices for m include the logistic function.[1]
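
As an illustration of this setup, taking m to be the logistic function maps any linear predictor β^T X into a valid success probability. A minimal sketch (the design matrix and coefficients below are invented):

```python
import numpy as np

def logistic(t):
    """Logistic mean function m(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))

def theta(beta, X):
    """theta(X) = m(beta^T X): success probability for each row of X."""
    return logistic(X @ beta)

# Hypothetical design matrix (intercept + one covariate) and coefficients.
X = np.array([[1.0, -2.0], [1.0, 0.0], [1.0, 2.0]])
beta = np.array([0.5, 1.0])

p = theta(beta, X)   # probabilities, all strictly inside (0, 1)
```

Because m is monotone, larger linear predictors translate into larger success probabilities, which is what makes the coefficients interpretable.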

The data are often fitted as a generalised linear model where the predicted values μ are the probabilities that any individual event will result in a success. The likelihood of the predictions is then given by

L(μ ∣ Y) = ∏_{i=1}^{n} ( 1_{y_i=1}(μ_i) + 1_{y_i=0}(1 − μ_i) ),

where 1_A is the indicator function which takes on the value one when the event A occurs, and zero otherwise: in this formulation, for any given observation y_i, only one of the two terms inside the product contributes, according to whether y_i = 0 or 1. The likelihood function is more fully specified by defining the formal parameters μ_i as parameterised functions of the explanatory variables: this defines the likelihood in terms of a much reduced number of parameters. Fitting of the model is usually achieved by employing the method of maximum likelihood to determine these parameters. In practice, the use of a formulation as a generalised linear model allows advantage to be taken of certain algorithmic ideas which are applicable across the whole class of more general models but which do not apply to all maximum likelihood problems.
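
A minimal sketch of this maximum-likelihood fitting for the ungrouped (binary) case, using a numerically stable negative log-likelihood and a general-purpose optimiser rather than the specialised GLM algorithms mentioned above (the synthetic data and "true" coefficients are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic binary data from a known logistic model (coefficients invented).
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = rng.binomial(1, p_true)

def neg_log_likelihood(beta):
    """Negative log-likelihood: each observation contributes log(mu_i) if
    y_i = 1 and log(1 - mu_i) if y_i = 0, matching the product above."""
    eta = X @ beta
    log_mu = -np.logaddexp(0.0, -eta)        # log mu_i, stably
    log_one_minus_mu = -eta + log_mu         # log(1 - mu_i), stably
    return -(y * log_mu + (1 - y) * log_one_minus_mu).sum()

fit = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
beta_hat = fit.x   # should land near beta_true for a sample this size
```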

Models used in binomial regression can often be extended to multinomial data.

There are many methods of generating the values of μ in systematic ways that allow for interpretation of the model; they are discussed below.

Link functions


There is a requirement that the model linking the probabilities μ to the explanatory variables should be of a form which only produces values in the range 0 to 1. Many models can be fitted into the form

μ = g(η).

Here η is an intermediate variable representing a linear combination, containing the regression parameters, of the explanatory variables. The function g is the cumulative distribution function (cdf) of some probability distribution. Usually this probability distribution has a support from minus infinity to plus infinity so that any finite value of η is transformed by the function g to a value inside the range 0 to 1.

In the case of logistic regression, the link function is the logit (the log of the odds), and g is its inverse, the logistic function. In the case of probit, g is the cdf of the normal distribution. The linear probability model is not a proper binomial regression specification, because its predictions need not lie in the range zero to one; it is nevertheless sometimes used for this type of data when interpretation is to occur on the probability scale, or when the analyst cannot readily fit or compute approximate linearizations of the probabilities for interpretation.
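
The contrast between the two cdf-based links and the linear probability model can be seen numerically: the cdf links keep every prediction strictly inside (0, 1), while a linear model escapes that range. A small sketch (the grid of η values and the linear model's coefficients are invented):

```python
import numpy as np
from scipy.stats import norm, logistic

eta = np.linspace(-4.0, 4.0, 9)   # hypothetical linear-predictor values

p_logit = logistic.cdf(eta)    # inverse logit: 1 / (1 + exp(-eta))
p_probit = norm.cdf(eta)       # probit: standard normal cdf
p_linear = 0.5 + 0.25 * eta    # linear probability model (invented slope)

# Both cdf links map any finite eta into (0, 1); the linear model does not.
```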

Comparison with binary regression


Binomial regression is closely connected with binary regression. If the response is a binary variable (two possible outcomes), then these alternatives can be coded as 0 or 1 by considering one of the outcomes as "success" and the other as "failure" and considering these as count data: "success" is 1 success out of 1 trial, while "failure" is 0 successes out of 1 trial. This can now be considered a binomial distribution with n = 1 trial, so a binary regression is a special case of a binomial regression. If these data are grouped (by adding counts), they are no longer binary data, but are count data for each group, and can still be modeled by a binomial regression; the individual binary outcomes are then referred to as "ungrouped data". An advantage of working with grouped data is that one can test the goodness of fit of the model;[2] for example, grouped data may exhibit overdispersion relative to the variance estimated from the ungrouped data.

Comparison with binary choice models


A binary choice model assumes a latent variable U_n, the utility (or net benefit) that person n obtains from taking an action (as opposed to not taking the action). The utility the person obtains from taking the action depends on the characteristics of the person, some of which are observed by the researcher and some of which are not:

U_n = β · s_n + ε_n

where β is a set of regression coefficients and s_n is a set of independent variables (also known as "features") describing person n, which may be either discrete "dummy variables" or regular continuous variables. ε_n is a random variable specifying "noise" or "error" in the prediction, assumed to be distributed according to some distribution. Normally, if there is a mean or variance parameter in the distribution, it cannot be identified, so the parameters are set to convenient values: by convention, mean 0 and variance 1.

The person takes the action, y_n = 1, if U_n > 0. The unobserved term, ε_n, is assumed to have a logistic distribution.

The specification is written succinctly as:

U_n = β · s_n + ε_n
Y_n = 1 if U_n > 0, and Y_n = 0 otherwise.

Let us write it slightly differently:

U_n = β · s_n − e_n
Y_n = 1 if U_n > 0, and Y_n = 0 otherwise.

Here we have made the substitution e_n = −ε_n. This changes a random variable into a slightly different one, defined over a negated domain. As it happens, the error distributions we usually consider (e.g. the logistic distribution, the standard normal distribution, the standard Student's t-distribution, etc.) are symmetric about 0, and hence the distribution over e_n is identical to the distribution over ε_n.

Denote the cumulative distribution function (CDF) of e as F_e, and the quantile function (inverse CDF) of e as F_e⁻¹.

Note that

Pr(Y_n = 1) = Pr(U_n > 0)
            = Pr(β · s_n − e_n > 0)
            = Pr(−e_n > −β · s_n)
            = Pr(e_n ≤ β · s_n)
            = F_e(β · s_n)

Since Y_n is a Bernoulli trial, where E[Y_n] = Pr(Y_n = 1), we have

E[Y_n] = F_e(β · s_n)

or equivalently

F_e⁻¹(E[Y_n]) = β · s_n.

Note that this is exactly equivalent to the binomial regression model expressed in the formalism of the generalized linear model.

If e_n ∼ N(0, 1), i.e. distributed as a standard normal distribution, then

Φ⁻¹(E[Y_n]) = β · s_n

which is exactly a probit model.

If e_n ∼ Logistic(0, 1), i.e. distributed as a standard logistic distribution with mean 0 and scale parameter 1, then the corresponding quantile function is the logit function, and

logit(E[Y_n]) = β · s_n

which is exactly a logit model.
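
The identity Pr(Y_n = 1) = F_e(β · s_n) above can be checked by simulation. A minimal sketch with standard logistic noise for one person n (the coefficients and features are invented):

```python
import numpy as np
from scipy.stats import logistic

rng = np.random.default_rng(42)

beta = np.array([0.5, -1.0])   # hypothetical coefficients
s = np.array([1.0, 2.0])       # hypothetical features s_n for one person
utility_mean = beta @ s        # beta . s_n

# Simulate U_n = beta . s_n + eps_n with standard logistic noise, and take
# the action (Y_n = 1) whenever U_n > 0.
eps = rng.logistic(loc=0.0, scale=1.0, size=200_000)
y = (utility_mean + eps > 0).astype(float)

empirical = y.mean()
# Because the logistic density is symmetric about 0, e_n = -eps_n has the
# same distribution, so the predicted probability is F_e(beta . s_n).
predicted = logistic.cdf(utility_mean)
```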

Note that the two different formalisms, generalized linear models (GLMs) and discrete choice models, are equivalent in the case of simple binary choice models, but can be extended in differing ways.

Latent variable interpretation / derivation


A latent variable model involving a binomial observed variable Y can be constructed such that Y is related to the latent variable Y* via

Y = 0 if Y* > 0,
Y = 1 if Y* < 0.

The latent variableY* is then related to a set of regression variablesX by the model

Y* = Xβ + ε.

This results in a binomial regression model.

The variance of ε cannot be identified, and when it is not of interest it is often assumed to be equal to one. If ε is normally distributed, then a probit model is appropriate; if ε is log-Weibull distributed, then a logit is appropriate; and if ε is uniformly distributed, then a linear probability model is appropriate.

Notes

  1. ^ a b Sanford Weisberg (2005). "Binomial Regression". Applied Linear Regression. Wiley-IEEE. pp. 253–254. ISBN 0-471-66379-4.
  2. ^ a b Rodríguez 2007, Chapter 3, p. 5.
  3. ^ Cox & Snell (1981), Example H, p. 91.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Binomial_regression&oldid=1199287554"