Whitening transformation

From Wikipedia, the free encyclopedia

A whitening transformation or sphering transformation is a linear transformation that transforms a vector of random variables with a known covariance matrix into a set of new variables whose covariance is the identity matrix, meaning that they are uncorrelated and each have variance 1.^[1] The transformation is called "whitening" because it changes the input vector into a white noise vector.

Several other transformations are closely related to whitening:

the decorrelation transform removes only the correlations but leaves variances intact,
the standardization transform sets variances to 1 but leaves correlations intact,
a coloring transformation transforms a vector of white random variables into a random vector with a specified covariance matrix.^[2]
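
As an illustration of the coloring transformation, the following is a minimal sketch in Python with NumPy. It assumes the Cholesky factor of the target covariance is used as the coloring matrix, which is only one of many valid choices; the function name `coloring_matrix` and the numeric values are hypothetical.

```python
import numpy as np

def coloring_matrix(sigma):
    """Return a matrix C with C C^T = sigma (lower-triangular Cholesky factor)."""
    return np.linalg.cholesky(sigma)

# Target covariance (illustrative values, assumed for this sketch)
sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
C = coloring_matrix(sigma)

rng = np.random.default_rng(0)
Z = rng.standard_normal((10_000, 2))   # approximately white samples, one per row
X = Z @ C.T                            # x_i = C z_i for every row

# The sample covariance of X should be close to sigma (not exact: finite sample)
print(np.cov(X, rowvar=False))
```

The inverse of such a coloring matrix is itself a whitening matrix for the same covariance, which is why the two transformations are described as closely related.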

Definition

Suppose $X$ is a random (column) vector with non-singular covariance matrix $\Sigma$ and mean $0$. Then the transformation $Y=WX$ with a whitening matrix $W$ satisfying the condition $W^{\mathrm{T}}W=\Sigma^{-1}$ yields the whitened random vector $Y$ with unit diagonal covariance.

If $X$ has non-zero mean $\mu$, then whitening can be performed by $Y=W(X-\mu)$.
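
To see why the condition on $W$ is sufficient, the covariance of $Y$ can be worked out directly. The short derivation below assumes $W$ is square; it is then invertible because $W^{\mathrm{T}}W=\Sigma^{-1}$ is non-singular.

$$
\operatorname{Cov}(Y) = W\,\Sigma\,W^{\mathrm{T}} = W\,(W^{\mathrm{T}}W)^{-1}\,W^{\mathrm{T}} = W\,W^{-1}\,(W^{\mathrm{T}})^{-1}\,W^{\mathrm{T}} = I
$$

Subtracting the mean $\mu$ does not change the covariance, so the same calculation applies to $Y=W(X-\mu)$.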

There are infinitely many possible whitening matrices $W$ that all satisfy the above condition. Commonly used choices are $W=\Sigma^{-1/2}$ (Mahalanobis or ZCA whitening), $W=L^{T}$ where $L$ is the Cholesky decomposition of $\Sigma^{-1}$ (Cholesky whitening),^[3] or the eigen-system of $\Sigma$ (PCA whitening).^[4]
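
The three choices above can be sketched in Python with NumPy as follows. This is a minimal illustration, not a reference implementation. The function name `whitening_matrices` and the example covariance are hypothetical, and PCA whitening is assumed here to mean $W=\Lambda^{-1/2}U^{T}$ for the eigendecomposition $\Sigma=U\Lambda U^{T}$.

```python
import numpy as np

def whitening_matrices(sigma):
    """Return ZCA, Cholesky, and PCA whitening matrices for a covariance matrix."""
    # Eigendecomposition of the symmetric matrix: sigma = U diag(lam) U^T
    lam, U = np.linalg.eigh(sigma)
    inv_sqrt = np.diag(1.0 / np.sqrt(lam))

    W_zca = U @ inv_sqrt @ U.T   # symmetric inverse square root, Sigma^{-1/2}
    W_pca = inv_sqrt @ U.T       # rotate onto the eigenvectors, then rescale

    # Lower-triangular Cholesky factor L of the precision matrix: Sigma^{-1} = L L^T
    L = np.linalg.cholesky(np.linalg.inv(sigma))
    W_chol = L.T

    return {"zca": W_zca, "cholesky": W_chol, "pca": W_pca}

# Illustrative positive-definite covariance (assumed values)
sigma = np.array([[4.0, 1.2], [1.2, 1.0]])
matrices = whitening_matrices(sigma)

for name, W in matrices.items():
    # Every whitening matrix maps sigma to the identity: W Sigma W^T = I
    assert np.allclose(W @ sigma @ W.T, np.eye(2)), name
```

The three matrices differ from each other by an orthogonal rotation, which is why all of them satisfy the same condition.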

Optimal whitening transforms can be singled out by investigating the cross-covariance and cross-correlation of $X$ and $Y$.^[3] For example, the unique optimal whitening transformation achieving maximal component-wise correlation between original $X$ and whitened $Y$ is produced by the whitening matrix $W=P^{-1/2}V^{-1/2}$ where $P$ is the correlation matrix and $V$ the diagonal variance matrix.
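
The matrix $W=P^{-1/2}V^{-1/2}$ can be computed from a covariance matrix along the following lines. This is a sketch under the assumption that $P=V^{-1/2}\Sigma V^{-1/2}$ and that $P^{-1/2}$ is the symmetric inverse square root; the function name `zca_cor_whitening_matrix` is hypothetical.

```python
import numpy as np

def zca_cor_whitening_matrix(sigma):
    """Return W = P^{-1/2} V^{-1/2} for a covariance matrix sigma."""
    # V^{-1/2}: inverse standard deviations on the diagonal
    V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(sigma)))
    # Correlation matrix P = V^{-1/2} Sigma V^{-1/2}
    P = V_inv_sqrt @ sigma @ V_inv_sqrt
    # Symmetric inverse square root of P via its eigendecomposition
    lam, G = np.linalg.eigh(P)
    P_inv_sqrt = G @ np.diag(1.0 / np.sqrt(lam)) @ G.T
    return P_inv_sqrt @ V_inv_sqrt

# Illustrative covariance (assumed values)
sigma = np.array([[4.0, 1.2], [1.2, 1.0]])
W = zca_cor_whitening_matrix(sigma)

# W is a valid whitening matrix: W Sigma W^T = I
assert np.allclose(W @ sigma @ W.T, np.eye(2))
```

When the variables are already uncorrelated, $P$ is the identity and this matrix reduces to plain standardization by $V^{-1/2}$.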

Whitening a data matrix

Whitening a data matrix follows the same transformation as for random variables. An empirical whitening transform is obtained by estimating the covariance (e.g. by maximum likelihood) and subsequently constructing a corresponding estimated whitening matrix (e.g. by Cholesky decomposition).
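
The two steps can be sketched in Python with NumPy as follows. The sketch assumes observations are stored as rows, uses the maximum-likelihood covariance estimate (dividing by the number of samples), and applies Cholesky whitening. The function name `whiten_data` and the synthetic data are hypothetical.

```python
import numpy as np

def whiten_data(X):
    """Whiten an (n_samples, n_features) data matrix with Cholesky whitening."""
    mu = X.mean(axis=0)
    Xc = X - mu                                  # center each column
    # Maximum-likelihood covariance estimate (divides by n, not n - 1)
    sigma_hat = Xc.T @ Xc / X.shape[0]
    # Cholesky factor of the estimated precision matrix: Sigma^{-1} = L L^T
    L = np.linalg.cholesky(np.linalg.inv(sigma_hat))
    W = L.T
    # Rows are observations, so y_i = W (x_i - mu) becomes Y = Xc W^T
    return Xc @ W.T, W, mu

# Synthetic correlated data with non-zero mean (assumed for illustration)
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.0], [1.5, 0.5]])
X = rng.standard_normal((500, 2)) @ A.T + np.array([1.0, -3.0])

Y, W, mu = whiten_data(X)
# The empirical covariance of Y is the identity up to floating-point error
assert np.allclose(Y.T @ Y / len(Y), np.eye(2))
```

Because the whitening matrix is built from the same estimate that is used to check the result, the whitened sample has exactly identity empirical covariance; new data drawn from the same distribution would only be approximately white.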

High-dimensional whitening

This modality is a generalization of the pre-whitening procedure extended to more general spaces where $X$ is usually assumed to be a random function or other random objects in a Hilbert space $H$. One of the main issues of extending whitening to infinite dimensions is that the covariance operator has an unbounded inverse in $H$, therefore only partial standardization is possible in infinite dimensions. A whitening operator can be then defined from the factorization of a degenerated covariance operator. High-dimensional features of the data can be exploited through kernel regressors or basis function systems.^[5]

R implementation

An implementation of several whitening procedures in R, including ZCA-whitening and PCA whitening but also CCA whitening, is available in the "whitening" R package^[6] published on CRAN. The R package "pfica"^[7] allows the computation of high-dimensional whitening representations using basis function systems (B-splines, Fourier basis, etc.).