
Covariance matrix

From Wikipedia, the free encyclopedia

Measure of covariance of components of a random vector
Not to be confused with the cross-covariance matrix.
Figure: A bivariate Gaussian probability density function centered at (0, 0), with covariance matrix given by $\begin{bmatrix}1&0.5\\0.5&1\end{bmatrix}$.
Figure: Sample points from a bivariate Gaussian distribution with a standard deviation of 3 in roughly the lower left–upper right direction and of 1 in the orthogonal direction. Because the $x$ and $y$ components co-vary, the variances of $x$ and $y$ do not fully describe the distribution; a $2\times 2$ covariance matrix is needed. The directions of the arrows correspond to the eigenvectors of this covariance matrix and their lengths to the square roots of the eigenvalues.

In probability theory and statistics, a covariance matrix (also known as auto-covariance matrix, dispersion matrix, variance matrix, or variance–covariance matrix) is a square matrix giving the covariance between each pair of elements of a given random vector.

Intuitively, the covariance matrix generalizes the notion of variance to multiple dimensions. As an example, the variation in a collection of random points in two-dimensional space cannot be characterized fully by a single number, nor would the variances in the $x$ and $y$ directions contain all of the necessary information; a $2\times 2$ matrix is necessary to fully characterize the two-dimensional variation.

Any covariance matrix is symmetric and positive semi-definite, and its main diagonal contains variances (i.e., the covariance of each element with itself).

The covariance matrix of a random vector $\mathbf{X}$ is typically denoted by $\operatorname{K}_{\mathbf{XX}}$, $\Sigma$, or $S$.

Definition


Throughout this article, boldfaced unsubscripted $\mathbf{X}$ and $\mathbf{Y}$ are used to refer to random vectors, and Roman subscripted $X_i$ and $Y_i$ are used to refer to scalar random variables.

If the entries in the column vector $\mathbf{X}=(X_1,X_2,\dots,X_n)^{\mathsf{T}}$ are random variables, each with finite variance and expected value, then the covariance matrix $\operatorname{K}_{\mathbf{XX}}$ is the matrix whose $(i,j)$ entry is the covariance[1]: 177
$$\operatorname{K}_{X_i X_j}=\operatorname{cov}[X_i,X_j]=\operatorname{E}\bigl[(X_i-\operatorname{E}[X_i])(X_j-\operatorname{E}[X_j])\bigr],$$
where the operator $\operatorname{E}$ denotes the expected value (mean) of its argument.
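This entry-wise definition can be made concrete with a small numerical sketch. The following numpy example uses made-up parameters (they are not from the article) and estimates each $(i,j)$ entry by averaging over samples:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-dimensional random vector; mean and covariance chosen for illustration.
true_cov = np.array([[2.0, 0.5, 0.0],
                     [0.5, 1.0, 0.3],
                     [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(mean=[0.0, 1.0, -1.0], cov=true_cov, size=100_000)

# Entry (i, j) of the covariance matrix: E[(X_i - E[X_i])(X_j - E[X_j])],
# estimated here by averaging centered products over the samples.
mu = X.mean(axis=0)
centered = X - mu
K = centered.T @ centered / (len(X) - 1)

# Agrees with the library estimator and is close to the true covariance.
assert np.allclose(K, np.cov(X, rowvar=False))
assert np.allclose(K, true_cov, atol=0.05)
```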

Conflicting nomenclatures and notations


Nomenclatures differ. Some statisticians, following the probabilist William Feller in his two-volume book An Introduction to Probability Theory and Its Applications,[2] call the matrix $\operatorname{K}_{\mathbf{XX}}$ the variance of the random vector $\mathbf{X}$, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector $\mathbf{X}$:
$$\operatorname{var}(\mathbf{X})=\operatorname{cov}(\mathbf{X},\mathbf{X})=\operatorname{E}\left[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^{\mathsf{T}}\right].$$

Both forms are quite standard, and there is no ambiguity between them. The matrix $\operatorname{K}_{\mathbf{XX}}$ is also often called the variance–covariance matrix, since the diagonal terms are in fact variances.

By comparison, the notation for the cross-covariance matrix between two vectors is
$$\operatorname{cov}(\mathbf{X},\mathbf{Y})=\operatorname{K}_{\mathbf{XY}}=\operatorname{E}\left[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{Y}-\operatorname{E}[\mathbf{Y}])^{\mathsf{T}}\right].$$

Properties


Relation to the autocorrelation matrix


The auto-covariance matrix $\operatorname{K}_{\mathbf{XX}}$ is related to the autocorrelation matrix $\operatorname{R}_{\mathbf{XX}}$ by
$$\operatorname{K}_{\mathbf{XX}}=\operatorname{E}[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^{\mathsf{T}}]=\operatorname{R}_{\mathbf{XX}}-\operatorname{E}[\mathbf{X}]\operatorname{E}[\mathbf{X}]^{\mathsf{T}},$$
where the autocorrelation matrix is defined as $\operatorname{R}_{\mathbf{XX}}=\operatorname{E}[\mathbf{X}\mathbf{X}^{\mathsf{T}}]$.
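The relation can be checked numerically; in the following numpy sketch (synthetic data, not from the article), subtracting the outer product of the sample mean from the sample autocorrelation matrix reproduces the biased (1/n) sample covariance exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
# A hypothetical 2-dimensional random vector with non-zero mean.
X = rng.normal(size=(50_000, 2)) @ np.array([[1.0, 0.4],
                                             [0.0, 1.0]]) + np.array([3.0, -2.0])

R = X.T @ X / len(X)        # autocorrelation matrix R_XX = E[X X^T], estimated
mu = X.mean(axis=0)         # E[X], estimated
K = R - np.outer(mu, mu)    # K_XX = R_XX - E[X] E[X]^T

# Matches the (biased, 1/n) sample covariance, since mu comes from the same data.
assert np.allclose(K, np.cov(X, rowvar=False, bias=True))
```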

Relation to the correlation matrix

Further information: Correlation matrix

An entity closely related to the covariance matrix is the matrix of Pearson product-moment correlation coefficients between each of the random variables in the random vector $\mathbf{X}$, which can be written as
$$\operatorname{corr}(\mathbf{X})=\bigl(\operatorname{diag}(\operatorname{K}_{\mathbf{XX}})\bigr)^{-\frac{1}{2}}\,\operatorname{K}_{\mathbf{XX}}\,\bigl(\operatorname{diag}(\operatorname{K}_{\mathbf{XX}})\bigr)^{-\frac{1}{2}},$$
where $\operatorname{diag}(\operatorname{K}_{\mathbf{XX}})$ is the matrix of the diagonal elements of $\operatorname{K}_{\mathbf{XX}}$ (i.e., a diagonal matrix of the variances of $X_i$ for $i=1,\dots,n$).

Equivalently, the correlation matrix can be seen as the covariance matrix of the standardized random variables $X_i/\sigma(X_i)$ for $i=1,\dots,n$:
$$\operatorname{corr}(\mathbf{X})=\begin{bmatrix}1&\frac{\operatorname{E}[(X_1-\mu_1)(X_2-\mu_2)]}{\sigma(X_1)\sigma(X_2)}&\cdots&\frac{\operatorname{E}[(X_1-\mu_1)(X_n-\mu_n)]}{\sigma(X_1)\sigma(X_n)}\\\frac{\operatorname{E}[(X_2-\mu_2)(X_1-\mu_1)]}{\sigma(X_2)\sigma(X_1)}&1&\cdots&\frac{\operatorname{E}[(X_2-\mu_2)(X_n-\mu_n)]}{\sigma(X_2)\sigma(X_n)}\\\vdots&\vdots&\ddots&\vdots\\\frac{\operatorname{E}[(X_n-\mu_n)(X_1-\mu_1)]}{\sigma(X_n)\sigma(X_1)}&\frac{\operatorname{E}[(X_n-\mu_n)(X_2-\mu_2)]}{\sigma(X_n)\sigma(X_2)}&\cdots&1\end{bmatrix}.$$

Each element on the principal diagonal of a correlation matrix is the correlation of a random variable with itself, which always equals 1. Each off-diagonal element is between −1 and +1 inclusive.
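The diagonal rescaling above amounts to dividing each entry $\operatorname{K}_{ij}$ by $\sigma_i\sigma_j$. A minimal numpy sketch, using a made-up covariance matrix:

```python
import numpy as np

# An example covariance matrix (hypothetical numbers, positive-definite).
K = np.array([[ 4.0, 1.2, -0.8],
              [ 1.2, 1.0,  0.3],
              [-0.8, 0.3,  2.25]])

d = np.sqrt(np.diag(K))      # marginal standard deviations sigma(X_i)
corr = K / np.outer(d, d)    # (diag K)^(-1/2) K (diag K)^(-1/2)

assert np.allclose(np.diag(corr), 1.0)   # unit diagonal
assert np.all(np.abs(corr) <= 1.0)       # off-diagonals lie in [-1, 1]
```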

Inverse of the covariance matrix


The inverse of this matrix, $\operatorname{K}_{\mathbf{XX}}^{-1}$, if it exists, is the inverse covariance matrix, also known as the precision matrix (or concentration matrix).[3]

Just as the covariance matrix can be written as the rescaling of a correlation matrix by the marginal standard deviations:
$$\operatorname{cov}(\mathbf{X})=\begin{bmatrix}\sigma_{x_1}&&&0\\&\sigma_{x_2}&&\\&&\ddots&\\0&&&\sigma_{x_n}\end{bmatrix}\begin{bmatrix}1&\rho_{x_1,x_2}&\cdots&\rho_{x_1,x_n}\\\rho_{x_2,x_1}&1&\cdots&\rho_{x_2,x_n}\\\vdots&\vdots&\ddots&\vdots\\\rho_{x_n,x_1}&\rho_{x_n,x_2}&\cdots&1\end{bmatrix}\begin{bmatrix}\sigma_{x_1}&&&0\\&\sigma_{x_2}&&\\&&\ddots&\\0&&&\sigma_{x_n}\end{bmatrix},$$

So, using the ideas of partial correlation and partial variance, the inverse covariance matrix can be expressed analogously:
$$\operatorname{cov}(\mathbf{X})^{-1}=\begin{bmatrix}\frac{1}{\sigma_{x_1\mid x_2\dots}}&&&0\\&\frac{1}{\sigma_{x_2\mid x_1,x_3\dots}}&&\\&&\ddots&\\0&&&\frac{1}{\sigma_{x_n\mid x_1\dots x_{n-1}}}\end{bmatrix}\begin{bmatrix}1&-\rho_{x_1,x_2\mid x_3\dots}&\cdots&-\rho_{x_1,x_n\mid x_2\dots x_{n-1}}\\-\rho_{x_2,x_1\mid x_3\dots}&1&\cdots&-\rho_{x_2,x_n\mid x_1,x_3\dots x_{n-1}}\\\vdots&\vdots&\ddots&\vdots\\-\rho_{x_n,x_1\mid x_2\dots x_{n-1}}&-\rho_{x_n,x_2\mid x_1,x_3\dots x_{n-1}}&\cdots&1\end{bmatrix}\begin{bmatrix}\frac{1}{\sigma_{x_1\mid x_2\dots}}&&&0\\&\frac{1}{\sigma_{x_2\mid x_1,x_3\dots}}&&\\&&\ddots&\\0&&&\frac{1}{\sigma_{x_n\mid x_1\dots x_{n-1}}}\end{bmatrix}.$$

This duality motivates a number of other dualities between marginalizing and conditioning for Gaussian random variables.
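As a small sketch of how the precision matrix encodes partial correlations (a numpy example with a made-up covariance matrix): negating and rescaling its off-diagonal entries by the diagonal yields the partial correlation of each pair given all remaining variables.

```python
import numpy as np

# A hypothetical covariance matrix.
K = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

P = np.linalg.inv(K)    # precision (concentration) matrix

# Partial correlations between X_i and X_j given all other variables,
# read off from the precision matrix with a sign flip:
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

assert np.allclose(partial_corr, partial_corr.T)
```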

Basic properties


For $\operatorname{K}_{\mathbf{XX}}=\operatorname{var}(\mathbf{X})=\operatorname{E}\left[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^{\mathsf{T}}\right]$ and $\boldsymbol{\mu}_{\mathbf{X}}=\operatorname{E}[\mathbf{X}]$, where $\mathbf{X}=(X_1,\ldots,X_n)^{\mathsf{T}}$ is an $n$-dimensional random vector, the following basic properties apply:[4]

  1. $\operatorname{K}_{\mathbf{XX}}=\operatorname{E}(\mathbf{X}\mathbf{X}^{\mathsf{T}})-\boldsymbol{\mu}_{\mathbf{X}}\boldsymbol{\mu}_{\mathbf{X}}^{\mathsf{T}}$
  2. $\operatorname{K}_{\mathbf{XX}}$ is positive-semidefinite, i.e. $\mathbf{a}^{\mathsf{T}}\operatorname{K}_{\mathbf{XX}}\mathbf{a}\geq 0$ for all $\mathbf{a}\in\mathbb{R}^{n}$
Proof

Indeed, from property 4 below it follows that under a linear transformation of the random vector $\mathbf{X}$ with covariance matrix $\mathbf{\Sigma_X}=\operatorname{cov}(\mathbf{X})$ by a linear operator $\mathbf{A}$, such that $\mathbf{Y}=\mathbf{A}\mathbf{X}$, the covariance matrix is transformed as

$$\mathbf{\Sigma_Y}=\operatorname{cov}(\mathbf{Y})=\mathbf{A}\,\mathbf{\Sigma_X}\,\mathbf{A}^{\top}.$$

Since, by property 3 below, the matrix $\mathbf{\Sigma_X}$ is symmetric, it can be diagonalized by an orthogonal transformation: there exists an orthogonal matrix $\mathbf{A}$ (so that $\mathbf{A}^{\top}=\mathbf{A}^{-1}$) such that

$$\mathbf{A}\,\mathbf{\Sigma_X}\,\mathbf{A}^{\top}=\mathbf{A}\,\mathbf{\Sigma_X}\,\mathbf{A}^{-1}=\operatorname{diag}(\sigma_1,\ldots,\sigma_n),$$
where $\sigma_1,\ldots,\sigma_n$ are the eigenvalues of $\mathbf{\Sigma_X}$. But this diagonal matrix is the covariance matrix of the random vector $\mathbf{Y}=\mathbf{A}\mathbf{X}$, so its main diagonal consists of the variances of the elements of $\mathbf{Y}$. As variance is always non-negative, $\sigma_i\geq 0$ for every $i$, and therefore the matrix $\mathbf{\Sigma_X}$ is positive-semidefinite.
  3. $\operatorname{K}_{\mathbf{XX}}$ is symmetric, i.e. $\operatorname{K}_{\mathbf{XX}}^{\mathsf{T}}=\operatorname{K}_{\mathbf{XX}}$
  4. For any constant (i.e. non-random) $m\times n$ matrix $\mathbf{A}$ and constant $m\times 1$ vector $\mathbf{a}$, one has $\operatorname{var}(\mathbf{A}\mathbf{X}+\mathbf{a})=\mathbf{A}\,\operatorname{var}(\mathbf{X})\,\mathbf{A}^{\mathsf{T}}$
  5. If $\mathbf{Y}$ is another random vector with the same dimension as $\mathbf{X}$, then $\operatorname{var}(\mathbf{X}+\mathbf{Y})=\operatorname{var}(\mathbf{X})+\operatorname{cov}(\mathbf{X},\mathbf{Y})+\operatorname{cov}(\mathbf{Y},\mathbf{X})+\operatorname{var}(\mathbf{Y})$, where $\operatorname{cov}(\mathbf{X},\mathbf{Y})$ is the cross-covariance matrix of $\mathbf{X}$ and $\mathbf{Y}$.
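The affine-transformation property $\operatorname{var}(\mathbf{A}\mathbf{X}+\mathbf{a})=\mathbf{A}\,\operatorname{var}(\mathbf{X})\,\mathbf{A}^{\mathsf{T}}$ can be checked numerically; it holds exactly for sample covariances too, since the sample mean transforms the same way. A numpy sketch with arbitrary made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[2.0, 0.3, 0.0],
                             [0.3, 1.0, 0.1],
                             [0.0, 0.1, 1.0]], size=10_000)
A = np.array([[1.0,  2.0, 0.0],
              [0.0, -1.0, 3.0]])   # constant m x n matrix
a = np.array([5.0, -7.0])          # constant m x 1 shift

Y = X @ A.T + a                    # samples of AX + a
lhs = np.cov(Y, rowvar=False)              # var(AX + a)
rhs = A @ np.cov(X, rowvar=False) @ A.T    # A var(X) A^T

assert np.allclose(lhs, rhs)       # the constant shift a drops out entirely
```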

Block matrices


The joint mean $\boldsymbol{\mu}$ and joint covariance matrix $\boldsymbol{\Sigma}$ of $\mathbf{X}$ and $\mathbf{Y}$ can be written in block form
$$\boldsymbol{\mu}=\begin{bmatrix}\boldsymbol{\mu}_X\\\boldsymbol{\mu}_Y\end{bmatrix},\qquad\boldsymbol{\Sigma}=\begin{bmatrix}\operatorname{K}_{\mathbf{XX}}&\operatorname{K}_{\mathbf{XY}}\\\operatorname{K}_{\mathbf{YX}}&\operatorname{K}_{\mathbf{YY}}\end{bmatrix},$$
where $\operatorname{K}_{\mathbf{XX}}=\operatorname{var}(\mathbf{X})$, $\operatorname{K}_{\mathbf{YY}}=\operatorname{var}(\mathbf{Y})$ and $\operatorname{K}_{\mathbf{XY}}=\operatorname{K}_{\mathbf{YX}}^{\mathsf{T}}=\operatorname{cov}(\mathbf{X},\mathbf{Y})$.

$\operatorname{K}_{\mathbf{XX}}$ and $\operatorname{K}_{\mathbf{YY}}$ can be identified as the variance matrices of the marginal distributions for $\mathbf{X}$ and $\mathbf{Y}$ respectively.

If $\mathbf{X}$ and $\mathbf{Y}$ are jointly normally distributed,
$$\mathbf{X},\mathbf{Y}\sim\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma}),$$
then the conditional distribution for $\mathbf{Y}$ given $\mathbf{X}$ is given by[5]
$$\mathbf{Y}\mid\mathbf{X}\sim\mathcal{N}(\boldsymbol{\mu}_{\mathbf{Y|X}},\operatorname{K}_{\mathbf{Y|X}}),$$
defined by the conditional mean
$$\boldsymbol{\mu}_{\mathbf{Y}|\mathbf{X}}=\boldsymbol{\mu}_{\mathbf{Y}}+\operatorname{K}_{\mathbf{YX}}\operatorname{K}_{\mathbf{XX}}^{-1}\left(\mathbf{X}-\boldsymbol{\mu}_{\mathbf{X}}\right)$$
and the conditional variance
$$\operatorname{K}_{\mathbf{Y|X}}=\operatorname{K}_{\mathbf{YY}}-\operatorname{K}_{\mathbf{YX}}\operatorname{K}_{\mathbf{XX}}^{-1}\operatorname{K}_{\mathbf{XY}}.$$

The matrix $\operatorname{K}_{\mathbf{YX}}\operatorname{K}_{\mathbf{XX}}^{-1}$ is known as the matrix of regression coefficients, while in linear algebra $\operatorname{K}_{\mathbf{Y|X}}$ is the Schur complement of $\operatorname{K}_{\mathbf{XX}}$ in $\boldsymbol{\Sigma}$.

The matrix of regression coefficients may often be given in transpose form, $\operatorname{K}_{\mathbf{XX}}^{-1}\operatorname{K}_{\mathbf{XY}}$, suitable for post-multiplying a row vector of explanatory variables $\mathbf{X}^{\mathsf{T}}$ rather than pre-multiplying a column vector $\mathbf{X}$. In this form the coefficients correspond to those obtained by inverting the matrix of the normal equations of ordinary least squares (OLS).
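The conditional-mean and Schur-complement formulas translate directly into code. A numpy sketch with made-up block matrices (a 2-dimensional $\mathbf{X}$ and a 1-dimensional $\mathbf{Y}$):

```python
import numpy as np

# Hypothetical joint Gaussian blocks.
Kxx = np.array([[2.0, 0.3],
                [0.3, 1.0]])
Kyy = np.array([[1.5]])
Kyx = np.array([[0.4, 0.2]])   # cov(Y, X); K_XY is its transpose
mu_x = np.array([1.0, -1.0])
mu_y = np.array([0.5])

x_obs = np.array([2.0, 0.0])   # an observed value of X

B = Kyx @ np.linalg.inv(Kxx)           # matrix of regression coefficients
mu_cond = mu_y + B @ (x_obs - mu_x)    # conditional mean of Y given X = x_obs
K_cond = Kyy - B @ Kyx.T               # Schur complement: conditional covariance

# Conditioning on X can only reduce the variance of Y.
assert 0.0 < K_cond[0, 0] < Kyy[0, 0]
```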

Partial covariance matrix


A covariance matrix with all non-zero elements tells us that all the individual random variables are interrelated: the variables are not only directly correlated, but also correlated indirectly via other variables. Often such indirect, common-mode correlations are trivial and uninteresting. They can be suppressed by calculating the partial covariance matrix, that is, the part of the covariance matrix that shows only the interesting part of the correlations.

If two vectors of random variables $\mathbf{X}$ and $\mathbf{Y}$ are correlated via another vector $\mathbf{I}$, the latter correlations are suppressed in the matrix[6]
$$\operatorname{K}_{\mathbf{XY\mid I}}=\operatorname{pcov}(\mathbf{X},\mathbf{Y}\mid\mathbf{I})=\operatorname{cov}(\mathbf{X},\mathbf{Y})-\operatorname{cov}(\mathbf{X},\mathbf{I})\operatorname{cov}(\mathbf{I},\mathbf{I})^{-1}\operatorname{cov}(\mathbf{I},\mathbf{Y}).$$
The partial covariance matrix $\operatorname{K}_{\mathbf{XY\mid I}}$ is effectively the simple covariance matrix $\operatorname{K}_{\mathbf{XY}}$ as if the uninteresting random variables $\mathbf{I}$ were held constant.
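This suppression can be demonstrated on synthetic data (a numpy sketch; the construction is made up for illustration): two scalar variables are correlated only through a common variable $\mathbf{I}$, and partialling out $\mathbf{I}$ removes essentially all of the covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
# X and Y are correlated only through the common-mode variable I.
I = rng.normal(size=(n, 1))
X = I + 0.5 * rng.normal(size=(n, 1))
Y = I + 0.5 * rng.normal(size=(n, 1))

def cov(a, b):
    """Sample cross-covariance of two blocks of observations (rows = samples)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return a.T @ b / (len(a) - 1)

pcov = cov(X, Y) - cov(X, I) @ np.linalg.inv(cov(I, I)) @ cov(I, Y)

assert cov(X, Y)[0, 0] > 0.5     # strong raw covariance via I
assert abs(pcov[0, 0]) < 0.02    # essentially gone once I is partialled out
```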

Standard deviation matrix

Main article: Standard deviation § Standard deviation matrix

The standard deviation matrix $\mathbf{S}$ is the extension of the standard deviation to multiple dimensions. It is the symmetric square root of the covariance matrix $\boldsymbol{\Sigma}$.[7]

Covariance matrix as a parameter of a distribution


If a column vector $\mathbf{X}$ of $n$ possibly correlated random variables is jointly normally distributed, or more generally elliptically distributed, then its probability density function $f(\mathbf{X})$ can be expressed in terms of the covariance matrix $\boldsymbol{\Sigma}$ as follows:[6]
$$f(\mathbf{X})=(2\pi)^{-n/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\left(-\tfrac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})\right),$$
where $\boldsymbol{\mu}=\operatorname{E}[\mathbf{X}]$ and $|\boldsymbol{\Sigma}|$ is the determinant of $\boldsymbol{\Sigma}$, the so-called generalized variance.
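The density formula can be implemented directly (a numpy sketch with a made-up $\boldsymbol{\Sigma}$; at the mean the exponential equals 1, so the density reduces to the normalizing constant $(2\pi)^{-n/2}|\boldsymbol{\Sigma}|^{-1/2}$):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density, implementing the formula above directly."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2.0 * np.pi) ** (-n / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])   # det = 0.75
mu = np.zeros(2)

p = mvn_pdf(mu, mu, Sigma)       # peak density at the mean
assert abs(p - 1.0 / (2.0 * np.pi * np.sqrt(0.75))) < 1e-12
```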

Covariance matrix as a linear operator

Main article: Covariance operator

Applied to one vector, the covariance matrix maps a linear combination $\mathbf{c}$ of the random variables $\mathbf{X}$ onto a vector of covariances with those variables: $\mathbf{c}^{\mathsf{T}}\Sigma=\operatorname{cov}(\mathbf{c}^{\mathsf{T}}\mathbf{X},\mathbf{X})$. Treated as a bilinear form, it yields the covariance between two linear combinations: $\mathbf{d}^{\mathsf{T}}\boldsymbol{\Sigma}\mathbf{c}=\operatorname{cov}(\mathbf{d}^{\mathsf{T}}\mathbf{X},\mathbf{c}^{\mathsf{T}}\mathbf{X})$. The variance of a linear combination is then $\mathbf{c}^{\mathsf{T}}\boldsymbol{\Sigma}\mathbf{c}$, its covariance with itself.

Similarly, the (pseudo-)inverse covariance matrix provides an inner product $\langle c-\mu|\Sigma^{+}|c-\mu\rangle$, which induces the Mahalanobis distance, a measure of the "unlikelihood" of $c$.[citation needed]

Admissibility


From basic property 4 above, let $\mathbf{b}$ be a $(p\times 1)$ real-valued vector; then
$$\operatorname{var}(\mathbf{b}^{\mathsf{T}}\mathbf{X})=\mathbf{b}^{\mathsf{T}}\operatorname{var}(\mathbf{X})\mathbf{b},$$
which must always be nonnegative, since it is the variance of a real-valued random variable, so a covariance matrix is always a positive-semidefinite matrix.

The above argument can be expanded as follows:
$$\begin{aligned}w^{\mathsf{T}}\operatorname{E}\left[(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^{\mathsf{T}}\right]w&=\operatorname{E}\left[w^{\mathsf{T}}(\mathbf{X}-\operatorname{E}[\mathbf{X}])(\mathbf{X}-\operatorname{E}[\mathbf{X}])^{\mathsf{T}}w\right]\\&=\operatorname{E}\left[\bigl(w^{\mathsf{T}}(\mathbf{X}-\operatorname{E}[\mathbf{X}])\bigr)^{2}\right]\geq 0,\end{aligned}$$
where the last inequality follows from the observation that $w^{\mathsf{T}}(\mathbf{X}-\operatorname{E}[\mathbf{X}])$ is a scalar.

Conversely, every symmetric positive semi-definite matrix is a covariance matrix. To see this, suppose $M$ is a $p\times p$ symmetric positive-semidefinite matrix. From the finite-dimensional case of the spectral theorem, it follows that $M$ has a nonnegative symmetric square root, which can be denoted by $M^{1/2}$. Let $\mathbf{X}$ be any $p\times 1$ column vector-valued random variable whose covariance matrix is the $p\times p$ identity matrix. Then
$$\operatorname{var}(\mathbf{M}^{1/2}\mathbf{X})=\mathbf{M}^{1/2}\,\operatorname{var}(\mathbf{X})\,\mathbf{M}^{1/2}=\mathbf{M}.$$
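This construction can be carried out numerically: compute the symmetric square root via an eigendecomposition and apply it to samples with identity covariance (a numpy sketch with a made-up matrix $M$):

```python
import numpy as np

rng = np.random.default_rng(4)
M = np.array([[2.0, 0.6],
              [0.6, 1.0]])   # any symmetric positive-semidefinite matrix

# Symmetric square root M^(1/2) via the spectral theorem.
vals, vecs = np.linalg.eigh(M)
M_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
assert np.allclose(M_half @ M_half, M)

# Z has (approximately) identity covariance; M^(1/2) Z then has covariance M.
Z = rng.normal(size=(500_000, 2))
X = Z @ M_half   # valid because M_half is symmetric
assert np.allclose(np.cov(X, rowvar=False), M, atol=0.02)
```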

Complex random vectors

Further information: Complex random vector § Covariance matrix and pseudo-covariance matrix

The variance of a complex scalar-valued random variable with expected value $\mu$ is conventionally defined using complex conjugation:
$$\operatorname{var}(Z)=\operatorname{E}\left[(Z-\mu_Z)\overline{(Z-\mu_Z)}\right],$$
where the complex conjugate of a complex number $z$ is denoted $\overline{z}$; thus the variance of a complex random variable is a real number.

If $\mathbf{Z}=(Z_1,\ldots,Z_n)^{\mathsf{T}}$ is a column vector of complex-valued random variables, then the conjugate transpose $\mathbf{Z}^{\mathsf{H}}$ is formed by both transposing and conjugating. In the following expression, the product of a vector with its conjugate transpose results in a square matrix called the covariance matrix, as its expectation:[8]: 293
$$\operatorname{K}_{\mathbf{ZZ}}=\operatorname{cov}[\mathbf{Z},\mathbf{Z}]=\operatorname{E}\left[(\mathbf{Z}-\boldsymbol{\mu}_{\mathbf{Z}})(\mathbf{Z}-\boldsymbol{\mu}_{\mathbf{Z}})^{\mathsf{H}}\right].$$
The matrix so obtained is Hermitian positive-semidefinite,[9] with real numbers on the main diagonal and complex numbers off-diagonal.


Pseudo-covariance matrix


For complex random vectors, another kind of second central moment, the pseudo-covariance matrix (also called the relation matrix), is defined as follows:
$$\operatorname{J}_{\mathbf{ZZ}}=\operatorname{cov}[\mathbf{Z},\overline{\mathbf{Z}}]=\operatorname{E}\left[(\mathbf{Z}-\boldsymbol{\mu}_{\mathbf{Z}})(\mathbf{Z}-\boldsymbol{\mu}_{\mathbf{Z}})^{\mathsf{T}}\right].$$

In contrast to the covariance matrix defined above, Hermitian transposition is replaced by transposition in this definition. Its diagonal elements may be complex valued; it is a complex symmetric matrix.
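The difference between the two second moments is just conjugation. A numpy sketch on a synthetic complex random vector (the construction is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
# A synthetic complex random vector with correlated components.
Z = rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2))
Z[:, 1] += 0.5 * Z[:, 0]

Zc = Z - Z.mean(axis=0)
K = Zc.T @ Zc.conj() / (n - 1)   # covariance: E[(Z-mu)(Z-mu)^H], conjugate transpose
J = Zc.T @ Zc / (n - 1)          # pseudo-covariance: plain transpose

assert np.allclose(K, K.conj().T)          # K is Hermitian
assert np.allclose(np.diag(K).imag, 0.0)   # real variances on the diagonal
assert np.allclose(J, J.T)                 # J is complex symmetric
```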

Estimation

Main article: Estimation of covariance matrices

If $\mathbf{M}_{\mathbf{X}}$ and $\mathbf{M}_{\mathbf{Y}}$ are centered data matrices of dimension $p\times n$ and $q\times n$ respectively, i.e. with $n$ columns of observations of $p$ and $q$ rows of variables, from which the row means have been subtracted, then, if the row means were estimated from the data, sample covariance matrices $\mathbf{Q}_{\mathbf{XX}}$ and $\mathbf{Q}_{\mathbf{XY}}$ can be defined to be
$$\mathbf{Q}_{\mathbf{XX}}=\frac{1}{n-1}\mathbf{M}_{\mathbf{X}}\mathbf{M}_{\mathbf{X}}^{\mathsf{T}},\qquad\mathbf{Q}_{\mathbf{XY}}=\frac{1}{n-1}\mathbf{M}_{\mathbf{X}}\mathbf{M}_{\mathbf{Y}}^{\mathsf{T}}$$
or, if the row means were known a priori,
$$\mathbf{Q}_{\mathbf{XX}}=\frac{1}{n}\mathbf{M}_{\mathbf{X}}\mathbf{M}_{\mathbf{X}}^{\mathsf{T}},\qquad\mathbf{Q}_{\mathbf{XY}}=\frac{1}{n}\mathbf{M}_{\mathbf{X}}\mathbf{M}_{\mathbf{Y}}^{\mathsf{T}}.$$

These empirical sample covariance matrices are the most straightforward and most often used estimators for the covariance matrices, but other estimators also exist, including regularised or shrinkage estimators, which may have better properties.

Applications


The covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived, called a whitening transformation, that allows one to completely decorrelate the data[10] or, from a different point of view, to find an optimal basis for representing the data in a compact way[citation needed] (see Rayleigh quotient for a formal proof and additional properties of covariance matrices). This is called principal component analysis (PCA) and the Karhunen–Loève transform (KL-transform).
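One common whitening recipe (a numpy sketch with made-up data, assuming the symmetric inverse square root $K^{-1/2}$, sometimes called ZCA whitening) maps the centered data to exactly unit covariance:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5],
                                         [1.5, 1.0]], size=100_000)

K = np.cov(X, rowvar=False)
vals, vecs = np.linalg.eigh(K)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # whitening matrix K^(-1/2)

Xw = (X - X.mean(axis=0)) @ W               # decorrelated, unit-variance data
assert np.allclose(np.cov(Xw, rowvar=False), np.eye(2), atol=1e-8)
```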

The covariance matrix plays a key role in financial economics, especially in portfolio theory and its mutual fund separation theorem and in the capital asset pricing model. The matrix of covariances among various assets' returns is used to determine, under certain assumptions, the relative amounts of different assets that investors should (in a normative analysis) or are predicted to (in a positive analysis) choose to hold in a context of diversification.

Use in optimization


The evolution strategy, a particular family of randomized search heuristics, fundamentally relies on a covariance matrix in its mechanism. The characteristic mutation operator draws the update step from a multivariate normal distribution using an evolving covariance matrix. There is a formal proof that the evolution strategy's covariance matrix adapts to the inverse of the Hessian matrix of the search landscape, up to a scalar factor and small random fluctuations (proven for a single-parent strategy and a static model, as the population size increases, relying on the quadratic approximation).[11] Intuitively, this result is supported by the rationale that the optimal covariance distribution can offer mutation steps whose equidensity probability contours match the level sets of the landscape, and so they maximize the progress rate.

Covariance mapping


In covariance mapping the values of the $\operatorname{cov}(\mathbf{X},\mathbf{Y})$ or $\operatorname{pcov}(\mathbf{X},\mathbf{Y}\mid\mathbf{I})$ matrix are plotted as a 2-dimensional map. When the vectors $\mathbf{X}$ and $\mathbf{Y}$ are discrete random functions, the map shows statistical relations between different regions of the random functions. Statistically independent regions of the functions show up on the map as zero-level flatland, while positive or negative correlations show up, respectively, as hills or valleys.

In practice the column vectors $\mathbf{X},\mathbf{Y}$, and $\mathbf{I}$ are acquired experimentally as rows of $n$ samples, e.g.
$$\left[\mathbf{X}_1,\mathbf{X}_2,\dots,\mathbf{X}_n\right]=\begin{bmatrix}X_1(t_1)&X_2(t_1)&\cdots&X_n(t_1)\\X_1(t_2)&X_2(t_2)&\cdots&X_n(t_2)\\\vdots&\vdots&\ddots&\vdots\\X_1(t_m)&X_2(t_m)&\cdots&X_n(t_m)\end{bmatrix},$$
where $X_j(t_i)$ is the $i$-th discrete value in sample $j$ of the random function $X(t)$. The expected values needed in the covariance formula are estimated using the sample mean, e.g.
$$\langle\mathbf{X}\rangle=\frac{1}{n}\sum_{j=1}^{n}\mathbf{X}_j,$$
and the covariance matrix is estimated by the sample covariance matrix
$$\operatorname{cov}(\mathbf{X},\mathbf{Y})\approx\langle\mathbf{X}\mathbf{Y}^{\mathsf{T}}\rangle-\langle\mathbf{X}\rangle\langle\mathbf{Y}^{\mathsf{T}}\rangle,$$
where the angular brackets denote sample averaging as before, except that Bessel's correction should be made to avoid bias. Using this estimation the partial covariance matrix can be calculated as
$$\operatorname{pcov}(\mathbf{X},\mathbf{Y}\mid\mathbf{I})=\operatorname{cov}(\mathbf{X},\mathbf{Y})-\operatorname{cov}(\mathbf{X},\mathbf{I})\left(\operatorname{cov}(\mathbf{I},\mathbf{I})\backslash\operatorname{cov}(\mathbf{I},\mathbf{Y})\right),$$
where the backslash denotes the left matrix division operator, which bypasses the requirement to invert a matrix and is available in some computational packages such as Matlab.[12]
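In numpy the same left division is performed by `np.linalg.solve`, which likewise avoids forming an explicit inverse. A sketch on synthetic "spectra" (all numbers made up; this is not the FLASH data discussed below):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 20_000
# Synthetic spectra: X and Y share a common-mode fluctuation recorded in I.
I = rng.normal(size=(n, 3))
X = rng.normal(size=(n, 10)) + I[:, :1]
Y = rng.normal(size=(n, 10)) + I[:, :1]

def cov(a, b):
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return a.T @ b / (len(a) - 1)   # with Bessel's correction

# Matlab's left division cov(I,I)\cov(I,Y) corresponds to np.linalg.solve here:
pcov_map = cov(X, Y) - cov(X, I) @ np.linalg.solve(cov(I, I), cov(I, Y))

# The common-mode structure dominates the raw map but not the partial one.
assert np.abs(pcov_map).max() < np.abs(cov(X, Y)).max()
```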

Figure 1: Construction of a partial covariance map of N2 molecules undergoing Coulomb explosion induced by a free-electron laser.[13] Panels a and b map the two terms of the covariance matrix, which is shown in panel c. Panel d maps common-mode correlations via intensity fluctuations of the laser. Panel e maps the partial covariance matrix that is corrected for the intensity fluctuations. Panel f shows that 10% overcorrection improves the map and makes ion–ion correlations clearly visible. Owing to momentum conservation these correlations appear as lines approximately perpendicular to the autocorrelation line (and to the periodic modulations, which are caused by detector ringing).

Fig. 1 illustrates how a partial covariance map is constructed on an example of an experiment performed at the FLASH free-electron laser in Hamburg.[13] The random function $X(t)$ is the time-of-flight spectrum of ions from a Coulomb explosion of nitrogen molecules multiply ionised by a laser pulse. Since only a few hundred molecules are ionised at each laser pulse, the single-shot spectra are highly fluctuating. However, collecting typically $m=10^4$ such spectra, $\mathbf{X}_j(t)$, and averaging them over $j$ produces a smooth spectrum $\langle\mathbf{X}(t)\rangle$, which is shown in red at the bottom of Fig. 1. The average spectrum $\langle\mathbf{X}\rangle$ reveals several nitrogen ions in a form of peaks broadened by their kinetic energy, but to find the correlations between the ionisation stages and the ion momenta requires calculating a covariance map.

In the example of Fig. 1 the spectra $\mathbf{X}_j(t)$ and $\mathbf{Y}_j(t)$ are the same, except that the range of the time-of-flight $t$ differs. Panel a shows $\langle\mathbf{X}\mathbf{Y}^{\mathsf{T}}\rangle$, panel b shows $\langle\mathbf{X}\rangle\langle\mathbf{Y}^{\mathsf{T}}\rangle$ and panel c shows their difference, which is $\operatorname{cov}(\mathbf{X},\mathbf{Y})$ (note a change in the colour scale). Unfortunately, this map is overwhelmed by uninteresting, common-mode correlations induced by the laser intensity fluctuating from shot to shot. To suppress such correlations the laser intensity $I_j$ is recorded at every shot, put into $\mathbf{I}$, and $\operatorname{pcov}(\mathbf{X},\mathbf{Y}\mid\mathbf{I})$ is calculated as panels d and e show. The suppression of the uninteresting correlations is, however, imperfect because there are other sources of common-mode fluctuations than the laser intensity, and in principle all these sources should be monitored in the vector $\mathbf{I}$. Yet in practice it is often sufficient to overcompensate the partial covariance correction, as panel f shows, where interesting correlations of ion momenta are now clearly visible as straight lines centred on ionisation stages of atomic nitrogen.

Two-dimensional infrared spectroscopy


Two-dimensional infrared spectroscopy employscorrelation analysis to obtain 2D spectra of thecondensed phase. There are two versions of this analysis:synchronous andasynchronous. Mathematically, the former is expressed in terms of the sample covariance matrix and the technique is equivalent to covariance mapping.[14]
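Under this equivalence, a synchronous 2D spectrum is simply the sample covariance matrix of the mean-centred dynamic spectra. A minimal sketch with synthetic data (the array shapes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic dynamic spectra (illustrative shapes): m spectra collected
# along the perturbation variable, n wavenumber bins each.
m, n = 50, 100
spectra = rng.normal(size=(m, n))

# Synchronous 2D correlation spectrum: the sample covariance matrix of
# the mean-centred spectra, i.e. covariance mapping of the data set.
centered = spectra - spectra.mean(axis=0)
synchronous = centered.T @ centered / (m - 1)
```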

References

  1. ^abcPark, Kun Il (2018).Fundamentals of Probability and Stochastic Processes with Applications to Communications. Springer.ISBN 978-3-319-68074-3.
  2. ^William Feller (1971).An introduction to probability theory and its applications. Wiley.ISBN 978-0-471-25709-7. Retrieved10 August 2012.
  3. ^Wasserman, Larry (2004).All of Statistics: A Concise Course in Statistical Inference. Springer.ISBN 0-387-40272-1.
  4. ^Taboga, Marco (2010)."Lectures on probability theory and mathematical statistics".
  5. ^Eaton, Morris L. (1983).Multivariate Statistics: a Vector Space Approach. John Wiley and Sons. pp. 116–117.ISBN 0-471-02776-6.
  6. ^abW J Krzanowski "Principles of Multivariate Analysis" (Oxford University Press, New York, 1988), Chap. 14.4; K V Mardia, J T Kent and J M Bibby "Multivariate Analysis" (Academic Press, London, 1997), Chap. 6.5.3; T W Anderson "An Introduction to Multivariate Statistical Analysis" (Wiley, New York, 2003), 3rd ed., Chaps. 2.5.1 and 4.3.1.
  7. ^Das, Abhranil; Wilson S Geisler (2020). "Methods to integrate multinormals and compute classification measures".arXiv:2012.14331 [stat.ML].
  8. ^Lapidoth, Amos (2009).A Foundation in Digital Communication. Cambridge University Press.ISBN 978-0-521-19395-5.
  9. ^Brookes, Mike."The Matrix Reference Manual".
  10. ^Kessy, Agnan; Strimmer, Korbinian; Lewin, Alex (2018)."Optimal Whitening and Decorrelation".The American Statistician.72 (4). Taylor & Francis:309–314.arXiv:1512.00809.doi:10.1080/00031305.2016.1277159.
  11. ^Shir, O.M.; A. Yehudayoff (2020)."On the covariance-Hessian relation in evolution strategies".Theoretical Computer Science.801. Elsevier:157–174.arXiv:1806.03674.doi:10.1016/j.tcs.2019.09.002.
  12. ^L J Frasinski "Covariance mapping techniques"J. Phys. B: At. Mol. Opt. Phys.49 152004 (2016),doi:10.1088/0953-4075/49/15/152004
  13. ^abO Kornilov, M Eckstein, M Rosenblatt, C P Schulz, K Motomura, A Rouzée, J Klei, L Foucar, M Siano, A Lübcke, F. Schapper, P Johnsson, D M P Holland, T Schlatholter, T Marchenko, S Düsterer, K Ueda, M J J Vrakking and L J Frasinski "Coulomb explosion of diatomic molecules in intense XUV fields mapped by partial covariance"J. Phys. B: At. Mol. Opt. Phys.46 164028 (2013),doi:10.1088/0953-4075/46/16/164028
  14. ^Noda, I. (1993). "Generalized two-dimensional correlation method applicable to infrared, Raman, and other types of spectroscopy".Appl. Spectrosc.47 (9):1329–36.Bibcode:1993ApSpe..47.1329N.doi:10.1366/0003702934067694.
