Mixture distribution

From Wikipedia, the free encyclopedia
Type of probability distribution
See also: Mixture model and Compound probability distribution

In probability and statistics, a mixture distribution is the probability distribution of a random variable that is derived from a collection of other random variables as follows: first, a random variable is selected by chance from the collection according to given probabilities of selection, and then the value of the selected random variable is realized. The underlying random variables may be random real numbers, or they may be random vectors (each having the same dimension), in which case the mixture distribution is a multivariate distribution.

In cases where each of the underlying random variables is continuous, the outcome variable will also be continuous and its probability density function is sometimes referred to as a mixture density. The cumulative distribution function (and the probability density function if it exists) can be expressed as a convex combination (i.e. a weighted sum, with non-negative weights that sum to 1) of other distribution functions and density functions. The individual distributions that are combined to form the mixture distribution are called the mixture components, and the probabilities (or weights) associated with each component are called the mixture weights. The number of components in a mixture distribution is often restricted to being finite, although in some cases the components may be countably infinite in number. More general cases (i.e. an uncountable set of component distributions), as well as the countable case, are treated under the title of compound distributions.

A distinction needs to be made between a random variable whose distribution function or density is the sum of a set of components (i.e. a mixture distribution) and a random variable whose value is the sum of the values of two or more underlying random variables, in which case the distribution is given by the convolution operator. As an example, the sum of two jointly normally distributed random variables, each with different means, will still have a normal distribution. On the other hand, a mixture density created as a mixture of two normal distributions with different means will have two peaks provided that the two means are far enough apart, showing that this distribution is radically different from a normal distribution.

Mixture distributions arise in many contexts in the literature and occur naturally where a statistical population contains two or more subpopulations. They are also sometimes used as a means of representing non-normal distributions. Data analysis concerning statistical models involving mixture distributions is discussed under the title of mixture models, while the present article concentrates on simple probabilistic and statistical properties of mixture distributions and how these relate to properties of the underlying distributions.

Finite and countable mixtures

Density of a mixture of three normal distributions (μ = 5, 10, 15, σ = 2) with equal weights. Each component is shown as a weighted density (each integrating to 1/3).

Given a finite set of probability density functions p_1(x), ..., p_n(x), or corresponding cumulative distribution functions P_1(x), ..., P_n(x), and weights w_1, ..., w_n such that w_i ≥ 0 and ∑ w_i = 1, the mixture distribution can be represented by writing either the density, f, or the distribution function, F, as a sum (which in both cases is a convex combination):

    F(x) = \sum_{i=1}^{n} w_i\, P_i(x),

    f(x) = \sum_{i=1}^{n} w_i\, p_i(x).

This type of mixture, being a finite sum, is called a finite mixture, and in applications an unqualified reference to a "mixture density" usually means a finite mixture. The case of a countably infinite set of components is covered formally by allowing n = ∞.
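The two-stage construction above maps directly onto code. The following is a minimal sketch (assuming the NumPy and SciPy libraries; the equal weights and the three normal components are taken from the figure caption above) that evaluates the mixture density f and samples from the mixture by first selecting a component and then realizing a value from it:

    import numpy as np
    from scipy.stats import norm

    weights = np.array([1/3, 1/3, 1/3])   # mixture weights w_i (non-negative, sum to 1)
    means = np.array([5.0, 10.0, 15.0])   # component means, as in the figure above
    sigmas = np.array([2.0, 2.0, 2.0])    # common standard deviation sigma = 2

    def mixture_pdf(x):
        # f(x) = sum_i w_i p_i(x): a convex combination of the component densities
        return sum(w * norm.pdf(x, loc=m, scale=s)
                   for w, m, s in zip(weights, means, sigmas))

    def mixture_sample(n, rng=np.random.default_rng(0)):
        # Two-stage sampling: pick component i with probability w_i,
        # then realize a value from that component's distribution.
        i = rng.choice(len(weights), size=n, p=weights)
        return rng.normal(means[i], sigmas[i])

    print(mixture_pdf(np.array([5.0, 10.0, 15.0])))  # density near each peak
    print(mixture_sample(5))                         # five draws from the mixture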

Uncountable mixtures

Main article: Compound distribution

Where the set of component distributions is uncountable, the result is often called a compound probability distribution. The construction of such distributions has a formal similarity to that of mixture distributions, with either infinite summations or integrals replacing the finite summations used for finite mixtures.

Consider a probability density function p(x; a) for a variable x, parameterized by a. That is, for each value of a in some set A, p(x; a) is a probability density function with respect to x. Given a probability density function w (meaning that w is nonnegative and integrates to 1), the function

    f(x) = \int_{A} w(a)\, p(x; a)\, da

is again a probability density function for x. A similar integral can be written for the cumulative distribution function. Note that the formulae here reduce to the case of a finite or infinite mixture if the density w is allowed to be a generalized function representing the "derivative" of the cumulative distribution function of a discrete distribution.
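As a concrete check (an illustration chosen here, not taken from the article): if p(x; a) is the normal density with mean a and unit variance, and w is the standard normal density, the compound density is known in closed form to be normal with mean 0 and variance 2. A minimal sketch assuming NumPy and SciPy:

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    def compound_pdf(x):
        # f(x) = ∫_A w(a) p(x; a) da with w = N(0, 1) and p(x; a) = N(a, 1)
        integrand = lambda a: norm.pdf(a) * norm.pdf(x, loc=a)
        value, _ = quad(integrand, -np.inf, np.inf)
        return value

    for x in (0.0, 1.0, 2.0):
        # The two columns agree: the compound density equals N(0, sqrt(2))
        print(compound_pdf(x), norm.pdf(x, scale=np.sqrt(2.0)))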

Mixtures within a parametric family


The mixture components are often not arbitrary probability distributions, but instead are members of a parametric family (such as normal distributions), with different values for a parameter or parameters. In such cases, assuming that it exists, the density can be written in the form of a sum as:

    f(x; a_1, \ldots, a_n) = \sum_{i=1}^{n} w_i\, p(x; a_i)

for one parameter, or

    f(x; a_1, \ldots, a_n, b_1, \ldots, b_n) = \sum_{i=1}^{n} w_i\, p(x; a_i, b_i)

for two parameters, and so forth.
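For instance, taking the parametric family to be the normal distributions, with parameters a_i = μ_i and b_i = σ_i, gives the widely used normal (Gaussian) mixture density:

    f(x; \mu_1, \ldots, \mu_n, \sigma_1, \ldots, \sigma_n) = \sum_{i=1}^{n} \frac{w_i}{\sigma_i \sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu_i)^2}{2\sigma_i^2}\right).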

Properties


Convexity


A general linear combination of probability density functions is not necessarily a probability density, since it may be negative or it may integrate to something other than 1. However, a convex combination of probability density functions preserves both of these properties (non-negativity and integrating to 1), and thus mixture densities are themselves probability density functions.
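Spelling out the one-line verification for a finite mixture: with w_i ≥ 0 and ∑ w_i = 1,

    f(x) = \sum_{i=1}^{n} w_i\, p_i(x) \geq 0, \qquad \int_{-\infty}^{\infty} f(x)\, dx = \sum_{i=1}^{n} w_i \int_{-\infty}^{\infty} p_i(x)\, dx = \sum_{i=1}^{n} w_i = 1,

so f is again a probability density function. (For countably many components, exchanging the sum and the integral is justified by the monotone convergence theorem.)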

Moments


Let X_1, ..., X_n denote random variables from the n component distributions, and let X denote a random variable from the mixture distribution. Then, for any function H(·) for which E[H(X_i)] exists, and assuming that the component densities p_i(x) exist,

    \operatorname{E}[H(X)] = \int_{-\infty}^{\infty} H(x) \sum_{i=1}^{n} w_i\, p_i(x)\, dx = \sum_{i=1}^{n} w_i \int_{-\infty}^{\infty} H(x)\, p_i(x)\, dx = \sum_{i=1}^{n} w_i \operatorname{E}[H(X_i)].

The j-th moment about zero (obtained by choosing H(x) = x^j) is simply a weighted average of the j-th moments of the components. Moments about the mean, H(x) = (x − μ)^j, involve a binomial expansion:[1]

    \operatorname{E}\left[(X - \mu)^j\right] = \sum_{i=1}^{n} w_i\, \operatorname{E}\left[(X_i - \mu_i + \mu_i - \mu)^j\right] = \sum_{i=1}^{n} w_i \sum_{k=0}^{j} \binom{j}{k} (\mu_i - \mu)^{j-k}\, \operatorname{E}\left[(X_i - \mu_i)^k\right],

where μ_i denotes the mean of the i-th component.

In the case of a mixture of one-dimensional distributions with weights w_i, means μ_i and variances σ_i^2, the total mean and variance will be:

    \operatorname{E}[X] = \mu = \sum_{i=1}^{n} w_i\, \mu_i,

    \begin{aligned}
    \operatorname{E}\left[(X - \mu)^2\right] = \sigma^2
      &= \operatorname{E}[X^2] - \mu^2 && \text{(standard variance reformulation)} \\
      &= \left(\sum_{i=1}^{n} w_i\, \operatorname{E}\left[X_i^2\right]\right) - \mu^2 \\
      &= \sum_{i=1}^{n} w_i \left(\sigma_i^2 + \mu_i^2\right) - \mu^2 && \text{(since } \sigma_i^2 = \operatorname{E}[X_i^2] - \mu_i^2 \text{ implies } \operatorname{E}[X_i^2] = \sigma_i^2 + \mu_i^2\text{)}
    \end{aligned}
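These formulas are easy to verify by simulation. A minimal sketch (assuming NumPy; the weights and component parameters are arbitrary illustrative choices) compares the analytic mixture mean and variance with their empirical counterparts:

    import numpy as np

    rng = np.random.default_rng(1)
    w = np.array([0.3, 0.5, 0.2])       # mixture weights (sum to 1)
    mu = np.array([-1.0, 0.0, 4.0])     # component means
    sigma = np.array([1.0, 0.5, 2.0])   # component standard deviations

    # Analytic moments from the formulas above
    mean = np.sum(w * mu)
    var = np.sum(w * (sigma**2 + mu**2)) - mean**2

    # Monte Carlo: sample the mixture, compare empirical moments
    i = rng.choice(3, size=1_000_000, p=w)
    x = rng.normal(mu[i], sigma[i])
    print(mean, x.mean())   # analytic vs. empirical mean
    print(var, x.var())     # analytic vs. empirical variance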

These relations highlight the potential of mixture distributions to display non-trivial higher-order moments such as skewness and kurtosis (fat tails) and multimodality, even in the absence of such features within the components themselves. Marron and Wand (1992) give an illustrative account of the flexibility of this framework.[2]

Modes


The question of multimodality is simple for some cases, such as mixtures of exponential distributions: all such mixtures are unimodal.[3] However, for mixtures of normal distributions the question is more complex. Conditions for the number of modes in a multivariate normal mixture are explored by Ray & Lindsay,[4] extending earlier work on univariate[5][6] and multivariate[7] distributions.

Here the problem of evaluating the modes of an n-component mixture in a D-dimensional space is reduced to the identification of critical points (local minima, maxima and saddle points) on a manifold referred to as the ridgeline surface, which is the image of the ridgeline function

    x^{*}(\alpha) = \left[\sum_{i=1}^{n} \alpha_i \Sigma_i^{-1}\right]^{-1} \left[\sum_{i=1}^{n} \alpha_i \Sigma_i^{-1} \mu_i\right],

where α belongs to the (n − 1)-dimensional standard simplex

    \mathcal{S}_n = \left\{\alpha \in \mathbb{R}^{n} : \alpha_i \in [0, 1],\ \sum_{i=1}^{n} \alpha_i = 1\right\}

and Σ_i ∈ ℝ^{D×D} and μ_i ∈ ℝ^{D} are the covariance matrix and mean of the i-th component. Ray & Lindsay[4] consider the case in which n − 1 < D, showing a one-to-one correspondence between modes of the mixture and modes of the ridge elevation function h(α) = q(x^{*}(α)), where q denotes the mixture density; one may thus identify the modes by solving dh(α)/dα = 0 with respect to α and determining the value x^{*}(α).

Using graphical tools, the potential multimodality of mixtures with n ∈ {2, 3} components is demonstrated; in particular it is shown that the number of modes may exceed n and that the modes may not coincide with the component means. For two components they develop a graphical tool for analysis by instead solving the aforementioned equation with respect to the first mixing weight w_1 (which also determines the second mixing weight through w_2 = 1 − w_1) and expressing the solutions as a function Π(α), α ∈ [0, 1], so that the number and location of modes for a given value of w_1 correspond to the number of intersections of the graph with the line Π(α) = w_1. This in turn can be related to the number of oscillations of the graph and therefore to solutions of dΠ(α)/dα = 0, leading to an explicit solution for a two-component mixture with Σ_1 = Σ_2 = Σ (sometimes called a homoscedastic mixture), given by

    1 - \alpha(1 - \alpha)\, d_M(\mu_1, \mu_2, \Sigma)^2,

where d_M(\mu_1, \mu_2, \Sigma) = \sqrt{(\mu_2 - \mu_1)^{\mathsf{T}} \Sigma^{-1} (\mu_2 - \mu_1)} is the Mahalanobis distance between μ_1 and μ_2.

Since the above expression is quadratic in α, it follows that in this instance there are at most two modes, irrespective of the dimension or the weights.
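The ridgeline machinery can be sketched numerically for the two-component case. The following illustration (assuming NumPy and SciPy; all parameters are hand-picked for the example) evaluates x^{*}(α) on a grid, computes the elevation h(α) as the mixture density along the ridgeline, and reads off interior local maxima as candidate modes:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Two bivariate normal components (illustrative parameters only)
    w = [0.5, 0.5]
    mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
    Sigma = [np.eye(2), np.eye(2)]
    Sinv = [np.linalg.inv(S) for S in Sigma]

    def ridgeline(a):
        # x*(alpha) with alpha = (1 - a, a), a in [0, 1]
        A = (1.0 - a) * Sinv[0] + a * Sinv[1]
        b = (1.0 - a) * Sinv[0] @ mu[0] + a * Sinv[1] @ mu[1]
        return np.linalg.solve(A, b)

    def h(a):
        # Ridge elevation: the mixture density q evaluated along the ridgeline
        x = ridgeline(a)
        return sum(wi * multivariate_normal.pdf(x, mean=mi, cov=Si)
                   for wi, mi, Si in zip(w, mu, Sigma))

    grid = np.linspace(0.0, 1.0, 1001)
    vals = np.array([h(a) for a in grid])
    # Interior local maxima of h correspond to modes of the mixture
    is_peak = (vals[1:-1] > vals[:-2]) & (vals[1:-1] > vals[2:])
    print([ridgeline(a) for a in grid[1:-1][is_peak]])

Here the Mahalanobis distance between the means is √18 ≈ 4.24 > 2, so two modes are found, near (0, 0) and (3, 3).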

For normal mixtures with general n > 2 and D > 1, a lower bound for the maximum number of possible modes is known, as is an upper bound (conditional on the assumption that the maximum number is finite). For those combinations of n and D for which the maximum number is known, it matches the lower bound.[8]

Examples


Two normal distributions


Simple examples can be given by a mixture of two normal distributions. (See Multimodal distribution#Mixture of two normal distributions for more details.)

Given an equal (50/50) mixture of two normal distributions with the same standard deviation and different means (homoscedastic), the overall distribution will exhibit low kurtosis relative to a single normal distribution: the means of the subpopulations fall on the shoulders of the overall distribution. If sufficiently separated, namely by more than twice the (common) standard deviation, so that

    |\mu_1 - \mu_2| > 2\sigma,

these form a bimodal distribution; otherwise the distribution simply has a wide peak.[9] The variation of the overall population will also be greater than the variation of the two subpopulations (due to spread from the different means), and thus the mixture exhibits overdispersion relative to a normal distribution with fixed variation σ, though it will not be overdispersed relative to a normal distribution with variation equal to the variation of the overall population.

Alternatively, given two subpopulations with the same mean and different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and heavier tails (and correspondingly shallower shoulders) than a single distribution.
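The separation condition is easy to probe numerically. A minimal sketch (assuming NumPy and SciPy, with hand-picked separations just below and above 2σ) counts the interior local maxima of the equal-weight mixture density on a fine grid:

    import numpy as np
    from scipy.stats import norm

    def count_modes(mu1, mu2, sigma, n_grid=20001):
        # Density of the 50/50 homoscedastic two-component normal mixture
        x = np.linspace(min(mu1, mu2) - 5 * sigma, max(mu1, mu2) + 5 * sigma, n_grid)
        f = 0.5 * norm.pdf(x, mu1, sigma) + 0.5 * norm.pdf(x, mu2, sigma)
        # Count strict interior local maxima on the grid
        return int(np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:])))

    print(count_modes(0.0, 1.9, 1.0))   # separation < 2σ: prints 1 (wide peak)
    print(count_modes(0.0, 2.1, 1.0))   # separation > 2σ: prints 2 (bimodal)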

  • Univariate mixture distribution, showing bimodal distribution
  • Multivariate mixture distribution, showing four modes

A normal and a Cauchy distribution


The following example is adapted from Hampel,[10] who credits John Tukey.

Consider the mixture distribution defined by

    F(x) = (1 − 10⁻¹⁰) (standard normal) + 10⁻¹⁰ (standard Cauchy).

The mean of i.i.d. observations from F(x) behaves "normally" except for exorbitantly large samples, although the mean of F(x) does not even exist.
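A simulation sketch of this contamination effect (an illustration constructed here, with the Cauchy weight raised from 10⁻¹⁰ to 10⁻³ so the contamination actually shows up at feasible sample sizes; assumes NumPy):

    import numpy as np

    rng = np.random.default_rng(2)
    eps = 1e-3   # contamination weight (the article's 1e-10, raised for illustration)

    def sample_mixture(n):
        # With probability 1 - eps draw standard normal, else standard Cauchy
        x = rng.standard_normal(n)
        contaminated = rng.random(n) < eps
        x[contaminated] = rng.standard_cauchy(np.count_nonzero(contaminated))
        return x

    # The sample mean looks well behaved in most replications, but is
    # occasionally wrecked by a single enormous Cauchy draw.
    means = np.array([sample_mixture(10_000).mean() for _ in range(200)])
    print(np.median(means), means.min(), means.max())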

Applications

Further information: Mixture model

Mixture densities are complicated densities expressible in terms of simpler densities (the mixture components). They are used both because they provide a good model for certain data sets, where different subsets of the data exhibit different characteristics and are best modeled separately, and because they can be more mathematically tractable, since the individual mixture components are more easily studied than the overall mixture density.

Mixture densities can be used to model a statistical population with subpopulations, where the mixture components are the densities on the subpopulations, and the weights are the proportions of each subpopulation in the overall population.

Mixture densities can also be used to model experimental error or contamination: one assumes that most of the samples measure the desired phenomenon, with some samples coming from a different, erroneous distribution.

Parametric statistics that assume no error often fail on such mixture densities; for example, statistics that assume normality often fail disastrously in the presence of even a few outliers. Instead, one uses robust statistics.

In meta-analysis of separate studies, study heterogeneity causes the distribution of results to be a mixture distribution, and leads to overdispersion of results relative to predicted error. For example, in a statistical survey, the margin of error (determined by sample size) predicts the sampling error and hence the dispersion of results on repeated surveys. The presence of study heterogeneity (studies having different sampling biases) increases the dispersion relative to the margin of error.

See also


Mixture


Hierarchical models


Notes

  1. ^ Frühwirth-Schnatter (2006, Ch. 1.2.4)
  2. ^ Marron, J. S.; Wand, M. P. (1992). "Exact Mean Integrated Squared Error". The Annals of Statistics. 20 (2): 712–736. doi:10.1214/aos/1176348653. http://projecteuclid.org/euclid.aos/1176348653
  3. ^ Frühwirth-Schnatter (2006, Ch. 1)
  4. ^ a b Ray, R.; Lindsay, B. (2005). "The topography of multivariate normal mixtures". The Annals of Statistics. 33 (5): 2042–2065. arXiv:math/0602238. doi:10.1214/009053605000000417.
  5. ^ Robertson, C. A.; Fryer, J. G. (1969). "Some descriptive properties of normal mixtures". Skandinavisk Aktuarietidskrift: 137–146.
  6. ^ Behboodian, J. (1970). "On the modes of a mixture of two normal distributions". Technometrics. 12: 131–139. doi:10.2307/1267357. JSTOR 1267357.
  7. ^ Carreira-Perpiñán, M. Á.; Williams, C. (2003). On the modes of a Gaussian mixture (PDF). Lecture Notes in Computer Science 2695. Springer-Verlag. pp. 625–640. doi:10.1007/3-540-44935-3_44. ISSN 0302-9743.
  8. ^ Améndola, C.; Engström, A.; Haase, C. (2020). "Maximum number of modes of Gaussian mixtures". Information and Inference: A Journal of the IMA. 9 (3): 587–600. arXiv:1702.05066. doi:10.1093/imaiai/iaz013.
  9. ^ Schilling, Mark F.; Watkins, Ann E.; Watkins, William (2002). "Is human height bimodal?". The American Statistician. 56 (3): 223–229. doi:10.1198/00031300265.
  10. ^ Hampel, Frank (1998). "Is statistics too difficult?". Canadian Journal of Statistics. 26: 497–513. doi:10.2307/3315772. hdl:20.500.11850/145503.

References

Frühwirth-Schnatter, Sylvia (2006). Finite Mixture and Markov Switching Models. Springer Series in Statistics. New York: Springer.