In statistics, the Wishart distribution is a generalization of the gamma distribution to multiple dimensions. It is named in honor of John Wishart, who first formulated the distribution in 1928.[1] Other names include Wishart ensemble (in random matrix theory, probability distributions over matrices are usually called "ensembles"), Wishart–Laguerre ensemble (since its eigenvalue distribution involves Laguerre polynomials), and LOE, LUE, LSE (in analogy with GOE, GUE, GSE).[2]
known as the scatter matrix. One indicates that S has that probability distribution by writing
The positive integer n is the number of degrees of freedom. Sometimes this is written W(V, p, n). For n ≥ p the matrix S is invertible with probability 1 if V is invertible.
The Wishart distribution arises as the distribution of the sample covariance matrix for a sample from a multivariate normal distribution. It occurs frequently in likelihood-ratio tests in multivariate statistical analysis. It also arises in the spectral theory of random matrices[citation needed] and in multidimensional Bayesian analysis.[5] It is also encountered in wireless communications, while analyzing the performance of Rayleigh fading MIMO wireless channels.[6]
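Because the Wishart distribution is the distribution of the scatter matrix of a multivariate normal sample, a draw can be simulated directly from that definition. A minimal sketch in Python/NumPy (the helper name `sample_wishart` and the particular scale matrix are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 3, 10                       # dimension and degrees of freedom
V = np.array([[2.0, 0.5, 0.0],     # an illustrative positive-definite scale matrix
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])

def sample_wishart(rng, V, n):
    """Draw S ~ W_p(V, n) as the scatter matrix G G^T of n i.i.d. N(0, V) columns."""
    L = np.linalg.cholesky(V)
    G = L @ rng.standard_normal((V.shape[0], n))
    return G @ G.T

# Sanity check: the Wishart mean is n V, so a Monte Carlo average should be close.
S_mean = np.mean([sample_wishart(rng, V, n) for _ in range(20000)], axis=0)
```

Each draw is symmetric positive semidefinite, and for n ≥ p it is invertible with probability 1 when V is, matching the statement above.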
The density above is not the joint density of all p² elements of the random matrix X (such a p²-dimensional density does not exist because of the symmetry constraint X = X^T); rather, it is the joint density of the p(p + 1)/2 elements X_ij for i ≤ j ([1], page 38). Also, the density formula above applies only to positive definite matrices; for other matrices the density is equal to zero.
In fact the above definition can be extended to any real n > p − 1. If n ≤ p − 1, then the Wishart no longer has a density; instead it represents a singular distribution that takes values in a lower-dimensional subspace of the space of p × p matrices.[8]
The least informative, proper Wishart prior is obtained by setting n = p.[citation needed]
A common choice for V leverages the fact that the mean of X ~ W_p(V, n) is nV. Then V is chosen so that nV equals an initial guess for X. For instance, when estimating a precision matrix Σ^−1 ~ W_p(V, n), a reasonable choice for V would be n^−1 Σ_0^−1, where Σ_0 is some prior estimate for the covariance matrix Σ.
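This prior choice is simple arithmetic; a short sketch (the values of `Sigma0` and `n` are illustrative, not from the text):

```python
import numpy as np

Sigma0 = np.array([[1.0, 0.2],     # illustrative prior guess for the covariance Sigma
                   [0.2, 0.5]])
n = 10                             # degrees of freedom of the Wishart prior

# V = n^{-1} Sigma0^{-1} makes the prior mean of the precision matrix,
# E[X] = n V, equal to the inverse of the prior covariance guess.
V = np.linalg.inv(Sigma0) / n
prior_mean_precision = n * V
```

By construction, `prior_mean_precision` times `Sigma0` is the identity.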
where E[⋅] denotes expectation. (Here Θ is any matrix with the same dimensions as V, 1 indicates the identity matrix, and i is a square root of −1.)[10] Properly interpreting this formula requires a little care, because noninteger complex powers are multivalued; when n is noninteger, the correct branch must be determined via analytic continuation.[16]
Consider the case where z^T = (0, ..., 0, 1, 0, ..., 0) (that is, the j-th element is one and all others zero). Then corollary 1 above shows that
gives the marginal distribution of each of the elements on the matrix's diagonal.
George Seber points out that the Wishart distribution is not called the "multivariate chi-squared distribution" because the marginal distribution of the off-diagonal elements is not chi-squared. Seber prefers to reserve the term multivariate for the case when all univariate marginals belong to the same family.[18]
Let V be a 2 × 2 variance matrix characterized by correlation coefficient −1 < ρ < 1 and L its lower Cholesky factor:
Multiplying through the Bartlett decomposition above, we find that a random sample from the 2 × 2 Wishart distribution is
The diagonal elements, most evidently in the first element, follow the χ² distribution with n degrees of freedom (scaled by σ²) as expected. The off-diagonal element is less familiar but can be identified as a normal variance-mean mixture where the mixing density is a χ² distribution. The corresponding marginal probability density for the off-diagonal element is therefore the variance-gamma distribution
where K_ν(z) is the modified Bessel function of the second kind.[22] Similar results may be found for higher dimensions. In general, if X follows a Wishart distribution with parameters Σ, n, then for i ≠ j the off-diagonal elements X_ij likewise follow a variance-gamma distribution.
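The claim that the scaled first diagonal element is χ² with n degrees of freedom can be checked by simulating the 2 × 2 Bartlett construction above. A sketch, with `bartlett_2x2` and the parameter values chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, sigma2 = 8, 0.6, 2.0

# Lower Cholesky factor of V = sigma2 * [[1, rho], [rho, 1]].
L = np.sqrt(sigma2) * np.array([[1.0, 0.0],
                                [rho, np.sqrt(1 - rho**2)]])

def bartlett_2x2(rng):
    """One W_2(V, n) draw via the Bartlett decomposition X = (L A)(L A)^T,
    with A lower triangular: chi diagonal, standard normal below-diagonal."""
    A = np.array([[np.sqrt(rng.chisquare(n)), 0.0],
                  [rng.standard_normal(),     np.sqrt(rng.chisquare(n - 1))]])
    B = L @ A
    return B @ B.T

draws = np.array([bartlett_2x2(rng) for _ in range(20000)])
x11 = draws[:, 0, 0] / sigma2   # should be chi-squared with n degrees of freedom
```

The sample mean and variance of `x11` should be near n and 2n, the χ²_n moments.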
It is also possible to write down the moment-generating function even in the noncentral case (essentially the nth power of Craig (1936)[24] equation 10), although the probability density becomes an infinite sum of Bessel functions.
It can be shown[25] that the Wishart distribution can be defined if and only if the shape parameter n belongs to the set
This set is named after Simon Gindikin, who introduced it[26] in the 1970s in the context of gamma distributions on homogeneous cones. However, for the new parameters in the discrete spectrum of the Gindikin ensemble, namely,
the corresponding Wishart distribution has no Lebesgue density.
In random matrix theory, the Wishart family is often studied through its Laguerre ensembles. For the real case (orthogonal symmetry, β = 1), the joint density of the eigenvalues is
which is the Laguerre orthogonal ensemble (LOE). The complex and quaternion analogues are the Laguerre unitary (LUE, β = 2) and Laguerre symplectic (LSE, β = 4) ensembles, respectively.[27]
A further generalization, the β-Laguerre ensemble, allows the Dyson index β to vary continuously over β > 0. Its joint eigenvalue density has the Coulomb gas form
which reduces to LOE/LUE/LSE for β = 1, 2, 4. For the classical Gaussian Wishart case (scale V = I), one has the identification
A concrete probabilistic construction for any β > 0 is provided by the Dumitriu–Edelman bidiagonal model. One samples a random bidiagonal matrix B with independent chi-distributed entries and sets W = BB^T; the eigenvalues of W then follow the β-Laguerre law with parameter β.[28][29]
Sampling
Classical. Bartlett's decomposition gives a Wishart draw as X = (LA)(LA)^T, where L is the Cholesky factor of V and A is lower triangular with chi-distributed diagonal entries and standard normal entries below the diagonal, yielding LOE/LUE/LSE exactly.[30]
General. In the Dumitriu–Edelman model, for a given matrix size and parameter β, one samples independent chi-distributed diagonal and subdiagonal entries of a bidiagonal matrix B and sets W = BB^T. Then the eigenvalues of W have the β-Laguerre joint density above.[28]
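A sketch of the bidiagonal sampler follows. The exact chi indices below follow the standard Dumitriu–Edelman convention, which is an assumption here since the text's formula is not shown; `beta_laguerre_eigs` is an illustrative name:

```python
import numpy as np

rng = np.random.default_rng(2)

def beta_laguerre_eigs(rng, m, beta, a):
    """Eigenvalues of W = B B^T for an m x m lower-bidiagonal B whose diagonal
    entries are chi_{2a - beta*i} (i = 0..m-1) and whose subdiagonal entries
    are chi_{beta*(m-1-i)} (i = 0..m-2); requires 2a > beta*(m-1)."""
    d = np.sqrt(rng.chisquare([2 * a - beta * i for i in range(m)]))
    s = np.sqrt(rng.chisquare([beta * (m - 1 - i) for i in range(m - 1)]))
    B = np.diag(d) + np.diag(s, -1)
    return np.linalg.eigvalsh(B @ B.T)

# For beta = 1 and a = 5 this should match the eigenvalues of a real Wishart
# matrix W_m(I, 2a); in particular E[sum of eigenvalues] = tr E[W] = 2*a*m.
eigs = beta_laguerre_eigs(rng, 5, 1.0, 5.0)
```

The trace identity E[tr W] = 2am follows from summing the chi-squared means of the entries of B, and gives a quick sanity check on the sampler.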
Rectangular data matrices
If one wishes to realize these spectra as the singular values of a rectangular matrix Y, draw independent Haar-distributed matrices U and V (orthogonal, unitary, or symplectic as appropriate) and set Y = U Λ^(1/2) V^*, so that Y Y^* has eigenvalues λ_i. For i.i.d. Gaussian entries (true Wishart), β is fixed by the field (real β = 1, complex β = 2, quaternion β = 4) and the construction reduces to the classical case.[29][31]
At the “hard edge” (near the origin), β-Laguerre ensembles exhibit Bessel-kernel correlations and power-law level repulsion of the eigenvalues from zero. In particular, as the matrix size grows with β and the Laguerre parameter held fixed, the distribution of the (suitably rescaled) smallest eigenvalue converges to a universal hard-edge (Bessel) law that depends only on β and that parameter.[32][29]
Under proportional growth p/n → γ ∈ (0, 1], the empirical spectral distribution of the sample covariance matrix (with i.i.d. entries of variance σ²) converges almost surely to the Marchenko–Pastur law, supported on [σ²(1 − √γ)², σ²(1 + √γ)²]. When γ = 1, the density diverges as x^(−1/2) at the origin (a hard-edge singularity).[33]
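The limiting support can be seen numerically; a sketch assuming unit-variance real Gaussian entries (the aspect ratio and matrix sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 200, 400                         # aspect ratio gamma = p/n = 0.5
gamma = p / n
X = rng.standard_normal((n, p))         # i.i.d. entries with variance 1
eigs = np.linalg.eigvalsh(X.T @ X / n)  # empirical spectrum of the sample covariance

# Marchenko-Pastur support for variance-1 entries:
lower, upper = (1 - np.sqrt(gamma)) ** 2, (1 + np.sqrt(gamma)) ** 2
```

For this aspect ratio the spectrum should fill approximately [0.086, 2.914], up to finite-size fluctuations at the edges.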
The Wishart distribution is related to the inverse-Wishart distribution, denoted by W_p^−1, as follows: If X ~ W_p(V, n) and we make the change of variables C = X^−1, then C ~ W_p^−1(V^−1, n). This relationship may be derived by noting that the absolute value of the Jacobian determinant of this change of variables is |C|^(p+1); see, for example, equation (15.15).[34]
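A Monte Carlo check of this relationship is sketched below. It additionally uses the inverse-Wishart mean formula V^−1/(n − p − 1), a standard identity assumed here rather than stated in the text:

```python
import numpy as np

rng = np.random.default_rng(4)
p, n = 3, 50
V = np.eye(p)

def sample_wishart(rng):
    """One draw X ~ W_p(I, n) as G G^T with G a p x n standard normal matrix."""
    G = rng.standard_normal((p, n))
    return G @ G.T

# If C = X^{-1}, then C is inverse-Wishart with scale V^{-1} and mean
# V^{-1} / (n - p - 1); the Monte Carlo average of X^{-1} should match it.
C_mean = np.mean([np.linalg.inv(sample_wishart(rng)) for _ in range(4000)], axis=0)
target = np.linalg.inv(V) / (n - p - 1)
```

With n = 50 and p = 3 the target mean is I/46, and the averaged inverses should agree to within Monte Carlo error.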
^ Livan, Giacomo; Vivo, Pierpaolo (2011). "Moments of Wishart–Laguerre and Jacobi ensembles of random matrices: application to the quantum transport problem in chaotic cavities". Acta Physica Polonica B. 42 (5): 1081. arXiv:1103.2638. doi:10.5506/APhysPolB.42.1081. ISSN 0587-4254. S2CID 119599157.
^ a b c Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
^ Hoff, Peter D. (2009). A First Course in Bayesian Statistical Methods. New York: Springer. pp. 109–111. ISBN 978-0-387-92299-7.
^ Kilian, Lutz; Lütkepohl, Helmut (2017). "Bayesian VAR Analysis". Structural Vector Autoregressive Analysis. Cambridge University Press. pp. 140–170. doi:10.1017/9781108164818.006. ISBN 978-1-107-19657-5.
^ a b c Forrester, Peter J. (2010). Log-Gases and Random Matrices. London Mathematical Society Monographs. Princeton University Press. ISBN 978-0-691-12829-0.
^ Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis (3rd ed.). Wiley-Interscience. p. 257. ISBN 0-471-36091-0.