Canonical correlation

From Wikipedia, the free encyclopedia

In statistics, canonical-correlation analysis (CCA), also called canonical variates analysis, is a way of inferring information from cross-covariance matrices. If we have two vectors X = (X1, ..., Xn) and Y = (Y1, ..., Ym) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y that have a maximum correlation with each other.[1] T. R. Knapp notes that "virtually all of the commonly encountered parametric tests of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables."[2] The method was first introduced by Harold Hotelling in 1936,[3] although in the context of angles between flats the mathematical concept was published by Camille Jordan in 1875.[4]

CCA is now a cornerstone of multivariate statistics and multi-view learning, and a great number of interpretations and extensions have been proposed, such as probabilistic CCA, sparse CCA, multi-view CCA, deep CCA,[5] and DeepGeoCCA.[6] Unfortunately, perhaps because of its popularity, the literature can be inconsistent with notation; we attempt to highlight such inconsistencies in this article to help the reader make best use of the existing literature and techniques available.

Like its sister method PCA, CCA can be viewed in population form (corresponding to random vectors and their covariance matrices) or in sample form (corresponding to datasets and their sample covariance matrices). These two forms are almost exact analogues of each other, which is why their distinction is often overlooked, but they can behave very differently in high-dimensional settings.[7] We next give explicit mathematical definitions for the population problem and highlight the different objects in the so-called canonical decomposition; understanding the differences between these objects is crucial for interpretation of the technique.

Population CCA definition via correlations


Given two column vectors $X = (x_1, \dots, x_n)^T$ and $Y = (y_1, \dots, y_m)^T$ of random variables with finite second moments, one may define the cross-covariance $\Sigma_{XY} = \operatorname{cov}(X, Y)$ to be the $n \times m$ matrix whose $(i, j)$ entry is the covariance $\operatorname{cov}(x_i, y_j)$. In practice, we would estimate the covariance matrix based on sampled data from $X$ and $Y$ (i.e. from a pair of data matrices).

Canonical-correlation analysis seeks a sequence of vectors $a_k \in \mathbb{R}^n$ and $b_k \in \mathbb{R}^m$ such that the random variables $a_k^T X$ and $b_k^T Y$ maximize the correlation $\rho = \operatorname{corr}(a_k^T X, b_k^T Y)$. The (scalar) random variables $U = a_1^T X$ and $V = b_1^T Y$ are the first pair of canonical variables. Then one seeks vectors maximizing the same correlation subject to the constraint that they are to be uncorrelated with the first pair of canonical variables; this gives the second pair of canonical variables. This procedure may be continued up to $\min\{m, n\}$ times:

$$(a_k, b_k) = \underset{a,b}{\operatorname{argmax}}\ \operatorname{corr}(a^T X, b^T Y) \quad \text{subject to } \operatorname{cov}(a^T X, a_j^T X) = \operatorname{cov}(b^T Y, b_j^T Y) = 0 \text{ for } j = 1, \dots, k-1$$

The sets of vectors $a_k, b_k$ are called canonical directions or weight vectors or simply weights. The 'dual' sets of vectors $\Sigma_{XX} a_k, \Sigma_{YY} b_k$ are called canonical loading vectors or simply loadings; these are often more straightforward to interpret than the weights.[8]

Computation


Derivation


Let $\Sigma_{XY}$ be the cross-covariance matrix for any pair of (vector-shaped) random variables $X$ and $Y$. The target function to maximize is

$$\rho = \frac{a^T \Sigma_{XY} b}{\sqrt{a^T \Sigma_{XX} a}\,\sqrt{b^T \Sigma_{YY} b}}.$$

The first step is to define a change of basis:

$$c = \Sigma_{XX}^{1/2} a, \qquad d = \Sigma_{YY}^{1/2} b,$$

where $\Sigma_{XX}^{1/2}$ and $\Sigma_{YY}^{1/2}$ can be obtained from the eigen-decomposition (or by diagonalization):

$$\Sigma_{XX}^{1/2} = V_X D_X^{1/2} V_X^\top, \qquad V_X D_X V_X^\top = \Sigma_{XX},$$

and

$$\Sigma_{YY}^{1/2} = V_Y D_Y^{1/2} V_Y^\top, \qquad V_Y D_Y V_Y^\top = \Sigma_{YY}.$$

Thus

$$\rho = \frac{c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} d}{\sqrt{c^T c}\,\sqrt{d^T d}}.$$

By the Cauchy–Schwarz inequality,

$$\left(c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}\right) d \leq \left(c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} \Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c\right)^{1/2} \left(d^T d\right)^{1/2},$$

$$\rho \leq \frac{\left(c^T \Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2} c\right)^{1/2}}{\left(c^T c\right)^{1/2}}.$$

There is equality if the vectors $d$ and $\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c$ are collinear. In addition, the maximum of correlation is attained if $c$ is the eigenvector with the maximum eigenvalue of the matrix $\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2}$ (see Rayleigh quotient). The subsequent pairs are found by using eigenvalues of decreasing magnitudes. Orthogonality is guaranteed by the symmetry of the correlation matrices.

Another way of viewing this computation is that $c$ and $d$ are the left and right singular vectors of the correlation matrix of $X$ and $Y$ corresponding to the highest singular value.
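The whitening-plus-SVD view above can be checked numerically. The following is a minimal NumPy sketch (the synthetic two-view data and all variable names are illustrative, not from the article): whitening both views, taking the SVD of the whitened cross-covariance, and confirming that the top singular value equals the empirical correlation of the first pair of canonical variates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 500
# Synthetic data: two views sharing one latent signal z
z = rng.standard_normal(n_samples)
X = np.column_stack([z + 0.5 * rng.standard_normal(n_samples),
                     rng.standard_normal(n_samples)])
Y = np.column_stack([z + 0.5 * rng.standard_normal(n_samples),
                     rng.standard_normal(n_samples),
                     rng.standard_normal(n_samples)])
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

Sxx = X.T @ X / n_samples
Syy = Y.T @ Y / n_samples
Sxy = X.T @ Y / n_samples

def inv_sqrt(S):
    """Inverse matrix square root via eigen-decomposition (S assumed SPD)."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Whiten, then SVD: the singular values are the canonical correlations
M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U_, s, Vt = np.linalg.svd(M)
a = inv_sqrt(Sxx) @ U_[:, 0]   # first canonical weight vector for X
b = inv_sqrt(Syy) @ Vt[0, :]   # first canonical weight vector for Y

# The top singular value matches the empirical correlation of the variates
u, v = X @ a, Y @ b
print(s[0], np.corrcoef(u, v)[0, 1])
```

Note that the scaling of the covariance estimates (dividing by $n$ or $n-1$) cancels in the correlation, so either convention gives the same canonical correlations.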

Solution


The solution is therefore:

- $c$ is an eigenvector of $\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1/2}$
- $d$ is proportional to $\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1/2} c$

Reciprocally, there is also:

- $d$ is an eigenvector of $\Sigma_{YY}^{-1/2} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1/2}$
- $c$ is proportional to $\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2} d$

Reversing the change of coordinates, we have that

- $a$ is an eigenvector of $\Sigma_{XX}^{-1} \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{YX}$,
- $b$ is proportional to $\Sigma_{YY}^{-1} \Sigma_{YX} a$;
- $b$ is an eigenvector of $\Sigma_{YY}^{-1} \Sigma_{YX} \Sigma_{XX}^{-1} \Sigma_{XY}$,
- $a$ is proportional to $\Sigma_{XX}^{-1} \Sigma_{XY} b$.

The canonical variables are defined by:

$$U = c^T \Sigma_{XX}^{-1/2} X = a^T X$$
$$V = d^T \Sigma_{YY}^{-1/2} Y = b^T Y$$

Implementation


CCA can be computed using singular value decomposition on a correlation matrix.[9] It is available as a function in standard statistical packages, for example MATLAB (canoncorr), R (cancor), SAS (proc cancorr), and Python (scikit-learn's cross decomposition module and the CCA-Zoo library[11]).[10]

CCA computation using singular value decomposition on a correlation matrix is related to the cosine of the angles between flats. The cosine function is ill-conditioned for small angles, leading to very inaccurate computation of highly correlated principal vectors in finite-precision computer arithmetic. To fix this problem, alternative algorithms[12] are available, for example in SciPy (linalg.subspace_angles) and MATLAB (subspacea).
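SciPy's routine computes the principal angles themselves (which remain accurate even when the cosines are near 1); their cosines then recover the correlations. A short sketch with made-up matrices:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
# B shares A's first column, so the smallest principal angle is exactly 0
B = np.column_stack([A[:, 0], rng.standard_normal((100, 2))])

angles = subspace_angles(A, B)   # principal angles, in descending order
print(np.cos(angles))            # the largest cosine is ~1 for the shared direction
```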

Hypothesis testing


Each row can be tested for significance with the following method. Since the correlations are sorted, saying that row $i$ is zero implies all further correlations are also zero. Suppose we have $p$ independent observations in a sample, and let $\widehat{\rho}_i$ be the estimated correlation for $i = 1, \dots, \min\{m, n\}$. For the $i$th row, the test statistic is:

$$\chi^2 = -\left(p - 1 - \frac{1}{2}(m + n + 1)\right) \ln \prod_{j=i}^{\min\{m,n\}} \left(1 - \widehat{\rho}_j^{\,2}\right),$$

which is asymptotically distributed as a chi-squared with $(m - i + 1)(n - i + 1)$ degrees of freedom for large $p$.[13] Since all the correlations from $\min\{m, n\}$ to $p$ are logically zero (and estimated that way also) the product for the terms after this point is irrelevant.

Note that in the small sample size limit with $p < n + m$ we are guaranteed that the top $m + n - p$ correlations will be identically 1 and hence the test is meaningless.[14]
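The sequential test above is straightforward to implement. The following sketch (the function name and interface are ours, not a standard API) uses SciPy's chi-squared survival function; note the 0-based loop index, so the degrees of freedom $(m-i)(n-i)$ here equal $(m-i+1)(n-i+1)$ in the article's 1-based indexing:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_cca_test(rho, p, n, m):
    """Sequential chi-squared tests that canonical correlations rho[i:],
    sorted in decreasing order, are all zero.
    Returns a list of (statistic, degrees of freedom, p-value) per row."""
    rho = np.asarray(rho, dtype=float)
    out = []
    for i in range(min(m, n)):
        stat = -(p - 1 - 0.5 * (m + n + 1)) * np.sum(np.log(1 - rho[i:] ** 2))
        df = (m - i) * (n - i)   # (m-i+1)(n-i+1) in 1-based indexing
        out.append((stat, df, chi2.sf(stat, df)))
    return out

# e.g. with estimated correlations 0.9 and 0.1 from p = 100 observations
print(bartlett_cca_test([0.9, 0.1], p=100, n=2, m=2))
```

On this example the first row is highly significant while the second is not, matching the intuition that only one strong canonical correlation is present.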

Practical uses


A typical use for canonical correlation in the experimental context is to take two sets of variables and see what is common among the two sets.[15] For example, in psychological testing, one could take two well-established multidimensional personality tests such as the Minnesota Multiphasic Personality Inventory (MMPI-2) and the NEO. By seeing how the MMPI-2 factors relate to the NEO factors, one could gain insight into what dimensions were common between the tests and how much variance was shared. For example, one might find that an extraversion or neuroticism dimension accounted for a substantial amount of shared variance between the two tests.

One can also use canonical-correlation analysis to produce a model equation which relates two sets of variables, for example a set of performance measures and a set of explanatory variables, or a set of outputs and a set of inputs. Constraint restrictions can be imposed on such a model to ensure it reflects theoretical requirements or intuitively obvious conditions. This type of model is known as a maximum correlation model.[16]

Visualization of the results of canonical correlation is usually through bar plots of the coefficients of the two sets of variables for the pairs of canonical variates showing significant correlation. Some authors suggest that they are best visualized by plotting them as heliographs, a circular format with ray-like bars, with each half representing the two sets of variables.[17]

Examples


Let $X = x_1$ with zero expected value, i.e., $\operatorname{E}(X) = 0$.

  1. If $Y = X$, i.e., $X$ and $Y$ are perfectly correlated, then, e.g., $a = 1$ and $b = 1$, so that the first (and only in this example) pair of canonical variables is $U = X$ and $V = Y = X$.
  2. If $Y = -X$, i.e., $X$ and $Y$ are perfectly anticorrelated, then, e.g., $a = 1$ and $b = -1$, so that the first (and only in this example) pair of canonical variables is $U = X$ and $V = -Y = X$.

We notice that in both cases $U = V$, which illustrates that canonical-correlation analysis treats correlated and anticorrelated variables similarly.
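These two cases can be checked numerically. A small sketch (the sampled data and variable names are ours): in both cases the correlation of the canonical pair is 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
X = x - x.mean()   # center so that the sample mean is exactly zero

corrs = []
for Y, b in [(X, 1.0), (-X, -1.0)]:   # case 1: Y = X, b = 1; case 2: Y = -X, b = -1
    U, V = 1.0 * X, b * Y             # canonical variables with a = 1
    corrs.append(np.corrcoef(U, V)[0, 1])
print(corrs)   # both correlations equal 1
```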

Connection to principal angles


Assuming that $X = (x_1, \dots, x_n)^T$ and $Y = (y_1, \dots, y_m)^T$ have zero expected values, i.e., $\operatorname{E}(X) = \operatorname{E}(Y) = 0$, their covariance matrices $\Sigma_{XX} = \operatorname{Cov}(X, X) = \operatorname{E}[XX^T]$ and $\Sigma_{YY} = \operatorname{Cov}(Y, Y) = \operatorname{E}[YY^T]$ can be viewed as Gram matrices in an inner product for the entries of $X$ and $Y$, correspondingly. In this interpretation, the random variables, entries $x_i$ of $X$ and $y_j$ of $Y$, are treated as elements of a vector space with an inner product given by the covariance $\operatorname{cov}(x_i, y_j)$; see Covariance#Relationship to inner products.

The definition of the canonical variables $U$ and $V$ is then equivalent to the definition of principal vectors for the pair of subspaces spanned by the entries of $X$ and $Y$ with respect to this inner product. The canonical correlations $\operatorname{corr}(U, V)$ are equal to the cosines of the principal angles.
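In sample form this equivalence says the sample canonical correlations coincide with the cosines of the principal angles between the column spans of the centered data matrices. A sketch under our own synthetic data, comparing the two computations:

```python
import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
z = rng.standard_normal(200)
# Two centered data matrices sharing the latent signal z
X = np.column_stack([z + rng.standard_normal(200), rng.standard_normal(200)])
Y = np.column_stack([z + rng.standard_normal(200), rng.standard_normal(200)])
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

def inv_sqrt(S):
    """Inverse matrix square root via eigen-decomposition (S assumed SPD)."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Sample canonical correlations via whitening + SVD
# (scale factors 1/p cancel, so Gram matrices suffice)
M = inv_sqrt(X.T @ X) @ (X.T @ Y) @ inv_sqrt(Y.T @ Y)
rho = np.linalg.svd(M, compute_uv=False)

# Same numbers as the cosines of the principal angles between column spans
print(rho, np.cos(subspace_angles(X, Y)))
```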

Whitening and probabilistic canonical correlation analysis


CCA can also be viewed as a special whitening transformation where the random vectors $X$ and $Y$ are simultaneously transformed in such a way that the cross-correlation between the whitened vectors $X^{CCA}$ and $Y^{CCA}$ is diagonal.[18] The canonical correlations are then interpreted as regression coefficients linking $X^{CCA}$ and $Y^{CCA}$ and may also be negative. The regression view of CCA also provides a way to construct a latent variable probabilistic generative model for CCA, with uncorrelated hidden variables representing shared and non-shared variability.[19]
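The whitening view can be demonstrated directly on sample data: whiten each view, rotate by the singular vectors of the whitened cross-covariance, and the resulting cross-covariance matrix is diagonal with the canonical correlations on its diagonal. A minimal sketch with our own synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.standard_normal((500, 2))
# Two three-dimensional views built from a shared two-dimensional latent z
X = z @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((500, 3))
Y = z @ rng.standard_normal((2, 3)) + 0.5 * rng.standard_normal((500, 3))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

def inv_sqrt(S):
    """Inverse matrix square root via eigen-decomposition (S assumed SPD)."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# Whiten each view, then rotate by the SVD factors of the cross-covariance
Xw = X @ inv_sqrt(X.T @ X / len(X))
Yw = Y @ inv_sqrt(Y.T @ Y / len(Y))
U_, s, Vt = np.linalg.svd(Xw.T @ Yw / len(X))
X_cca = Xw @ U_
Y_cca = Yw @ Vt.T

cross = X_cca.T @ Y_cca / len(X)   # diagonal, with the canonical correlations
print(np.round(cross, 6))
```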


References

  1. ^Härdle, Wolfgang; Simar, Léopold (2007). "Canonical Correlation Analysis".Applied Multivariate Statistical Analysis. pp. 321–330.CiteSeerX 10.1.1.324.403.doi:10.1007/978-3-540-72244-1_14.ISBN 978-3-540-72243-4.
  2. ^Knapp, T. R. (1978). "Canonical correlation analysis: A general parametric significance-testing system".Psychological Bulletin.85 (2):410–416.doi:10.1037/0033-2909.85.2.410.
  3. ^Hotelling, H. (1936). "Relations Between Two Sets of Variates".Biometrika.28 (3–4):321–377.doi:10.1093/biomet/28.3-4.321.JSTOR 2333955.
  4. ^Jordan, C. (1875). "Essai sur la géométrie à $n$ dimensions". Bull. Soc. Math. France. 3: 103.
  5. ^Andrew, Galen; Arora, Raman; Bilmes, Jeff; Livescu, Karen (2013-05-26)."Deep Canonical Correlation Analysis".Proceedings of the 30th International Conference on Machine Learning. PMLR:1247–1255.
  6. ^Ju, Ce; Kobler, Reinmar J; Tang, Liyao; Guan, Cuntai; Kawanabe, Motoaki (2024).Deep Geodesic Canonical Correlation Analysis for Covariance-Based Neuroimaging Data. The Twelfth International Conference on Learning Representations (ICLR 2024, spotlight).
  7. ^"Statistical Learning with Sparsity: the Lasso and Generalizations".hastie.su.domains. Retrieved2023-09-12.
  8. ^Gu, Fei; Wu, Hao (2018-04-01)."Simultaneous canonical correlation analysis with invariant canonical loadings".Behaviormetrika.45 (1):111–132.doi:10.1007/s41237-017-0042-8.ISSN 1349-6964.
  9. ^Hsu, D.; Kakade, S. M.; Zhang, T. (2012)."A spectral algorithm for learning Hidden Markov Models"(PDF).Journal of Computer and System Sciences.78 (5): 1460.arXiv:0811.4413.doi:10.1016/j.jcss.2011.12.025.S2CID 220740158.
  10. ^Huang, S. Y.; Lee, M. H.; Hsiao, C. K. (2009)."Nonlinear measures of association with kernel canonical correlation analysis and applications"(PDF).Journal of Statistical Planning and Inference.139 (7): 2162.doi:10.1016/j.jspi.2008.10.011. Archived fromthe original(PDF) on 2017-03-13. Retrieved2015-09-04.
  11. ^Chapman, James; Wang, Hao-Ting (2021-12-18)."CCA-Zoo: A collection of Regularized, Deep Learning based, Kernel, and Probabilistic CCA methods in a scikit-learn style framework".Journal of Open Source Software.6 (68): 3823.Bibcode:2021JOSS....6.3823C.doi:10.21105/joss.03823.ISSN 2475-9066.
  12. ^Knyazev, A.V.; Argentati, M.E. (2002), "Principal Angles between Subspaces in an A-Based Scalar Product: Algorithms and Perturbation Estimates",SIAM Journal on Scientific Computing,23 (6):2009–2041,Bibcode:2002SJSC...23.2008K,CiteSeerX 10.1.1.73.2914,doi:10.1137/S1064827500377332
  13. ^Kanti V. Mardia, J. T. Kent and J. M. Bibby (1979).Multivariate Analysis.Academic Press.
  14. ^Yang Song, Peter J. Schreier, David Ramírez, and Tanuj Hasija. "Canonical correlation analysis of high-dimensional data with very small sample support". arXiv:1604.02047.
  15. ^Sieranoja, S.; Sahidullah, Md; Kinnunen, T.; Komulainen, J.; Hadid, A. (July 2018)."Audiovisual Synchrony Detection with Optimized Audio Features"(PDF).2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP). pp. 377–381.doi:10.1109/SIPROCESS.2018.8600424.ISBN 978-1-5386-6396-7.S2CID 51682024.
  16. ^Tofallis, C. (1999). "Model Building with Multiple Dependent Variables and Constraints".Journal of the Royal Statistical Society, Series D.48 (3):371–378.arXiv:1109.0725.doi:10.1111/1467-9884.00195.S2CID 8942357.
  17. ^Degani, A.; Shafto, M.; Olson, L. (2006)."Canonical Correlation Analysis: Use of Composite Heliographs for Representing Multiple Patterns"(PDF).Diagrammatic Representation and Inference. Lecture Notes in Computer Science. Vol. 4045. p. 93.CiteSeerX 10.1.1.538.5217.doi:10.1007/11783183_11.ISBN 978-3-540-35623-3.
  18. ^Jendoubi, T.; Strimmer, K. (2018)."A whitening approach to probabilistic canonical correlation analysis for omics data integration".BMC Bioinformatics.20 (1): 15.arXiv:1802.03490.doi:10.1186/s12859-018-2572-9.PMC 6327589.PMID 30626338.
  19. ^Jendoubi, Takoua; Strimmer, Korbinian (9 January 2019)."A whitening approach to probabilistic canonical correlation analysis for omics data integration".BMC Bioinformatics.20 (1): 15.doi:10.1186/s12859-018-2572-9.ISSN 1471-2105.PMC 6327589.PMID 30626338.
