Kosambi–Karhunen–Loève theorem


In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem,[1][2] states that a stochastic process can be represented as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. The transformation is also known as the Hotelling transform and eigenvector transform, and is closely related to the principal component analysis (PCA) technique widely used in image processing and in data analysis in many fields.[3]

There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L2([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.

In contrast to a Fourier series, where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think of the Karhunen–Loève transform as adapting to the process in order to produce the best possible basis for its expansion.

In the case of a centered stochastic process {Xt}t ∈ [a, b] (centered means E[Xt] = 0 for all t ∈ [a, b]) satisfying a technical continuity condition, X admits a decomposition

{\displaystyle X_{t}=\sum _{k=1}^{\infty }Z_{k}e_{k}(t)}

where Zk are pairwise uncorrelated random variables and the functions ek are continuous real-valued functions on [a, b] that are pairwise orthogonal in L2([a, b]). It is therefore sometimes said that the expansion is bi-orthogonal since the random coefficients Zk are orthogonal in the probability space while the deterministic functions ek are orthogonal in the time domain. The general case of a process Xt that is not centered can be brought back to the case of a centered process by considering Xt − E[Xt], which is a centered process.

Moreover, if the process is Gaussian, then the random variables Zk are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0, 1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.

The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.

Formulation

Throughout this article, we will consider a random process Xt defined over a probability space (Ω, F, P) and indexed over a closed interval [a, b], which is square-integrable, has zero mean, and has covariance function KX(s, t). In other words, we have:

{\displaystyle \forall t\in [a,b]\qquad X_{t}\in L^{2}(\Omega ,F,\mathbf {P} ),\quad {\text{i.e. }}\mathbf {E} [X_{t}^{2}]<\infty ,}
{\displaystyle \forall t\in [a,b]\qquad \mathbf {E} [X_{t}]=0,}
{\displaystyle \forall t,s\in [a,b]\qquad K_{X}(s,t)=\mathbf {E} [X_{s}X_{t}].}

The square-integrability condition {\displaystyle \mathbf {E} [X_{t}^{2}]<\infty } is logically equivalent to {\displaystyle K_{X}(s,t)} being finite for all {\displaystyle s,t\in [a,b]}.[4]

We associate to KX the linear operator TKX defined by

{\displaystyle T_{K_{X}}\colon \left\{{\begin{aligned}L^{2}([a,b])&\to L^{2}([a,b])\\f&\mapsto T_{K_{X}}f=\int _{a}^{b}K_{X}(s,\cdot )f(s)\,ds\end{aligned}}\right.}

Since TKX is a linear endomorphism, it makes sense to talk about its eigenvalues λk and eigenfunctions ek, which are found by solving the homogeneous Fredholm integral equation of the second kind

{\displaystyle \int _{a}^{b}K_{X}(s,t)e_{k}(s)\,ds=\lambda _{k}e_{k}(t).}
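In practice the eigenpairs of this integral equation are often computed numerically. The sketch below (not part of the original article) discretizes the kernel on a uniform grid, a simple Nyström-type approximation; the grid size, quadrature rule, and function names are assumptions made purely for illustration.

```python
import numpy as np

def kl_eigenpairs(kernel, a, b, n=200):
    """Approximate eigenvalues/eigenfunctions of the covariance operator
    T_K f(t) = int_a^b K(s, t) f(s) ds via a simple Nystrom discretization."""
    t = np.linspace(a, b, n)
    w = (b - a) / n                       # quadrature weight (rectangle rule)
    K = kernel(t[:, None], t[None, :])    # discretized covariance kernel
    # Symmetric eigenproblem for K * w approximates the integral operator
    lam, vecs = np.linalg.eigh(K * w)
    idx = np.argsort(lam)[::-1]           # sort by decreasing eigenvalue
    lam, vecs = lam[idx], vecs[:, idx]
    funcs = vecs / np.sqrt(w)             # rescale so that int e_k(t)^2 dt = 1
    return t, lam, funcs

# Example: Wiener-process kernel K(s, t) = min(s, t) on [0, 1]
t, lam, e = kl_eigenpairs(np.minimum, 0.0, 1.0, n=500)
print(lam[:3])   # approximately 1/((k - 1/2)^2 pi^2) for k = 1, 2, 3
```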

Statement of the theorem


Theorem. Let Xt be a zero-mean square-integrable stochastic process defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function KX(s, t).

Then KX(s, t) is a Mercer kernel and, letting ek be an orthonormal basis of L2([a, b]) formed by the eigenfunctions of TKX with respective eigenvalues λk, Xt admits the following representation

{\displaystyle X_{t}=\sum _{k=1}^{\infty }Z_{k}e_{k}(t)}

where the convergence is in L2, uniform in t, and

{\displaystyle Z_{k}=\int _{a}^{b}X_{t}e_{k}(t)\,dt}

Furthermore, the random variables Zk have zero mean, are uncorrelated, and have variance λk:

{\displaystyle \mathbf {E} [Z_{k}]=0,~\forall k\in \mathbb {N} \qquad {\mbox{and}}\qquad \mathbf {E} [Z_{i}Z_{j}]=\delta _{ij}\lambda _{j},~\forall i,j\in \mathbb {N} }

Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.

Proof

  • The covariance function KX satisfies the definition of a Mercer kernel. By Mercer's theorem, there consequently exists a set {λk, ek(t)} of eigenvalues and eigenfunctions of TKX forming an orthonormal basis of L2([a,b]), and KX can be expressed as
{\displaystyle K_{X}(s,t)=\sum _{k=1}^{\infty }\lambda _{k}e_{k}(s)e_{k}(t)}
  • The process Xt can be expanded in terms of the eigenfunctions ek as:
{\displaystyle X_{t}=\sum _{k=1}^{\infty }Z_{k}e_{k}(t)}
where the coefficients (random variables) Zk are given by the projection of Xt on the respective eigenfunctions
{\displaystyle Z_{k}=\int _{a}^{b}X_{t}e_{k}(t)\,dt}
  • We may then derive
{\displaystyle {\begin{aligned}\mathbf {E} [Z_{k}]&=\mathbf {E} \left[\int _{a}^{b}X_{t}e_{k}(t)\,dt\right]=\int _{a}^{b}\mathbf {E} [X_{t}]e_{k}(t)\,dt=0\\[8pt]\mathbf {E} [Z_{i}Z_{j}]&=\mathbf {E} \left[\int _{a}^{b}\int _{a}^{b}X_{t}X_{s}e_{j}(t)e_{i}(s)\,dt\,ds\right]\\&=\int _{a}^{b}\int _{a}^{b}\mathbf {E} \left[X_{t}X_{s}\right]e_{j}(t)e_{i}(s)\,dt\,ds\\&=\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)e_{j}(t)e_{i}(s)\,dt\,ds\\&=\int _{a}^{b}e_{i}(s)\left(\int _{a}^{b}K_{X}(s,t)e_{j}(t)\,dt\right)\,ds\\&=\lambda _{j}\int _{a}^{b}e_{i}(s)e_{j}(s)\,ds\\&=\delta _{ij}\lambda _{j}\end{aligned}}}
where we have used the fact that the ek are eigenfunctions of TKX and are orthonormal.
  • Let us now show that the convergence is in L2. Let
{\displaystyle S_{N}=\sum _{k=1}^{N}Z_{k}e_{k}(t).}
Then:
{\displaystyle {\begin{aligned}\mathbf {E} \left[\left|X_{t}-S_{N}\right|^{2}\right]&=\mathbf {E} \left[X_{t}^{2}\right]+\mathbf {E} \left[S_{N}^{2}\right]-2\mathbf {E} \left[X_{t}S_{N}\right]\\&=K_{X}(t,t)+\mathbf {E} \left[\sum _{k=1}^{N}\sum _{\ell =1}^{N}Z_{k}Z_{\ell }e_{k}(t)e_{\ell }(t)\right]-2\mathbf {E} \left[X_{t}\sum _{k=1}^{N}Z_{k}e_{k}(t)\right]\\&=K_{X}(t,t)+\sum _{k=1}^{N}\lambda _{k}e_{k}(t)^{2}-2\mathbf {E} \left[\sum _{k=1}^{N}\int _{a}^{b}X_{t}X_{s}e_{k}(s)e_{k}(t)\,ds\right]\\&=K_{X}(t,t)-\sum _{k=1}^{N}\lambda _{k}e_{k}(t)^{2}\end{aligned}}}
which goes to 0 uniformly in t by Mercer's theorem.

Properties of the Karhunen–Loève transform


Special case: Gaussian distribution


Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude:

Theorem. The variables Zi have a joint Gaussian distribution and are stochastically independent if the original process {Xt}t is Gaussian.

In the Gaussian case, since the variables Zi are independent, we can say more:

{\displaystyle \lim _{N\to \infty }\sum _{i=1}^{N}e_{i}(t)Z_{i}(\omega )=X_{t}(\omega )}

almost surely.

The Karhunen–Loève transform decorrelates the process


This is a consequence of the independence of the Zk.

The Karhunen–Loève expansion minimizes the total mean square error


In the introduction, we mentioned that the truncated Karhunen–Loève expansion was the best approximation of the original process in the sense that it minimizes the total mean-square error resulting from its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.

More specifically, given any orthonormal basis {fk} of L2([a, b]), we may decompose the process Xt as:

{\displaystyle X_{t}(\omega )=\sum _{k=1}^{\infty }A_{k}(\omega )f_{k}(t)}

where

{\displaystyle A_{k}(\omega )=\int _{a}^{b}X_{t}(\omega )f_{k}(t)\,dt}

and we may approximate Xt by the finite sum

{\displaystyle {\hat {X}}_{t}(\omega )=\sum _{k=1}^{N}A_{k}(\omega )f_{k}(t)}

for some integer N.

Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).

Proof

Consider the error resulting from the truncation at the N-th term in the following orthonormal expansion:

{\displaystyle \varepsilon _{N}(t)=\sum _{k=N+1}^{\infty }A_{k}(\omega )f_{k}(t)}

The mean-square error εN2(t) can be written as:

{\displaystyle {\begin{aligned}\varepsilon _{N}^{2}(t)&=\mathbf {E} \left[\sum _{i=N+1}^{\infty }\sum _{j=N+1}^{\infty }A_{i}(\omega )A_{j}(\omega )f_{i}(t)f_{j}(t)\right]\\&=\sum _{i=N+1}^{\infty }\sum _{j=N+1}^{\infty }\mathbf {E} \left[\int _{a}^{b}\int _{a}^{b}X_{t}X_{s}f_{i}(t)f_{j}(s)\,ds\,dt\right]f_{i}(t)f_{j}(t)\\&=\sum _{i=N+1}^{\infty }\sum _{j=N+1}^{\infty }f_{i}(t)f_{j}(t)\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)f_{i}(t)f_{j}(s)\,ds\,dt\end{aligned}}}

We then integrate this last equality over [a, b]. The orthonormality of the fk yields:

{\displaystyle \int _{a}^{b}\varepsilon _{N}^{2}(t)\,dt=\sum _{k=N+1}^{\infty }\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s)\,ds\,dt}

The problem of minimizing the total mean-square error thus comes down to minimizing the right-hand side of this equality subject to the constraint that the fk be normalized. We hence introduce βk, the Lagrange multipliers associated with these constraints, and aim at minimizing the following function:

{\displaystyle Er[f_{k}(t),k\in \{N+1,\ldots \}]=\sum _{k=N+1}^{\infty }\left[\int _{a}^{b}\int _{a}^{b}K_{X}(s,t)f_{k}(t)f_{k}(s)\,ds\,dt-\beta _{k}\left(\int _{a}^{b}f_{k}(t)f_{k}(t)\,dt-1\right)\right]}

Differentiating with respect to fi(t) (this is a functional derivative) and setting the derivative to 0 yields:

{\displaystyle {\frac {\partial Er}{\partial f_{i}(t)}}=\int _{a}^{b}\left(\int _{a}^{b}K_{X}(s,t)f_{i}(s)\,ds-\beta _{i}f_{i}(t)\right)\,dt=0}

which is satisfied in particular when

{\displaystyle \int _{a}^{b}K_{X}(s,t)f_{i}(s)\,ds=\beta _{i}f_{i}(t).}

In other words, the error is minimized when the fk are chosen to be the eigenfunctions of TKX, resulting in the KL expansion.

Explained variance


An important observation is that since the random coefficients Zk of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of Xt is simply the sum of the variances of the individual components of the sum:

{\displaystyle \operatorname {var} [X_{t}]=\sum _{k=1}^{\infty }e_{k}(t)^{2}\operatorname {var} [Z_{k}]=\sum _{k=1}^{\infty }\lambda _{k}e_{k}(t)^{2}}

Integrating over [a, b] and using the orthonormality of the ek, we obtain that the total variance of the process is:

{\displaystyle \int _{a}^{b}\operatorname {var} [X_{t}]\,dt=\sum _{k=1}^{\infty }\lambda _{k}}

In particular, the total variance of the N-truncated approximation is

{\displaystyle \sum _{k=1}^{N}\lambda _{k}.}

As a result, the N-truncated expansion explains

{\displaystyle {\frac {\sum _{k=1}^{N}\lambda _{k}}{\sum _{k=1}^{\infty }\lambda _{k}}}}

of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an {\displaystyle N\in \mathbb {N} } such that

{\displaystyle {\frac {\sum _{k=1}^{N}\lambda _{k}}{\sum _{k=1}^{\infty }\lambda _{k}}}\geq 0.95.}
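As a small illustration (not from the article), once the eigenvalues are sorted in decreasing order the truncation order for a given explained-variance threshold can be read off their cumulative sum; the 95% threshold and the variable names below are assumptions.

```python
import numpy as np

def truncation_order(eigenvalues, threshold=0.95):
    """Smallest N such that the first N eigenvalues explain `threshold`
    of the total variance (eigenvalues assumed sorted in decreasing order)."""
    lam = np.asarray(eigenvalues, dtype=float)
    explained = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(explained, threshold) + 1)

# Example with the Wiener-process eigenvalues lambda_k = 1 / ((k - 1/2)^2 pi^2)
k = np.arange(1, 10_001)
lam = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)
print(truncation_order(lam, 0.95))   # number of terms needed for 95% of the variance
```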

The Karhunen–Loève expansion has the minimum representation entropy property


Given a representation of {\displaystyle X_{t}=\sum _{k=1}^{\infty }W_{k}\varphi _{k}(t)}, for some orthonormal basis {\displaystyle \varphi _{k}(t)} and random {\displaystyle W_{k}}, we let {\displaystyle p_{k}=\mathbb {E} [|W_{k}|^{2}]/\mathbb {E} [|X_{t}|_{L^{2}}^{2}]}, so that {\displaystyle \sum _{k=1}^{\infty }p_{k}=1}. We may then define the representation entropy to be {\displaystyle H(\{\varphi _{k}\})=-\sum _{k}p_{k}\log(p_{k})}. Then we have {\displaystyle H(\{\varphi _{k}\})\geq H(\{e_{k}\})}, for all choices of {\displaystyle \varphi _{k}}. That is, the KL expansion has minimal representation entropy.

Proof:

Denote the coefficients obtained for the basis {\displaystyle e_{k}(t)} as {\displaystyle p_{k}}, and for {\displaystyle \varphi _{k}(t)} as {\displaystyle q_{k}}.

Choose {\displaystyle N\geq 1}. Note that since {\displaystyle e_{k}} minimizes the mean squared error, we have that

{\displaystyle \mathbb {E} \left|\sum _{k=1}^{N}Z_{k}e_{k}(t)-X_{t}\right|_{L^{2}}^{2}\leq \mathbb {E} \left|\sum _{k=1}^{N}W_{k}\varphi _{k}(t)-X_{t}\right|_{L^{2}}^{2}}

Expanding the right-hand side, we get:

{\displaystyle \mathbb {E} \left|\sum _{k=1}^{N}W_{k}\varphi _{k}(t)-X_{t}\right|_{L^{2}}^{2}=\mathbb {E} |X_{t}^{2}|_{L^{2}}+\sum _{k=1}^{N}\sum _{\ell =1}^{N}\mathbb {E} [W_{\ell }\varphi _{\ell }(t)W_{k}^{*}\varphi _{k}^{*}(t)]_{L^{2}}-\sum _{k=1}^{N}\mathbb {E} [W_{k}\varphi _{k}X_{t}^{*}]_{L^{2}}-\sum _{k=1}^{N}\mathbb {E} [X_{t}W_{k}^{*}\varphi _{k}^{*}(t)]_{L^{2}}}

Using the orthonormality of {\displaystyle \varphi _{k}(t)}, and expanding {\displaystyle X_{t}} in the {\displaystyle \varphi _{k}(t)} basis, we get that the right-hand side is equal to:

{\displaystyle \mathbb {E} [X_{t}]_{L^{2}}^{2}-\sum _{k=1}^{N}\mathbb {E} [|W_{k}|^{2}]}

We may perform identical analysis for the {\displaystyle e_{k}(t)}, and so rewrite the above inequality as:

{\displaystyle \mathbb {E} [X_{t}]_{L^{2}}^{2}-\sum _{k=1}^{N}\mathbb {E} [|Z_{k}|^{2}]\leq \mathbb {E} [X_{t}]_{L^{2}}^{2}-\sum _{k=1}^{N}\mathbb {E} [|W_{k}|^{2}]}

Subtracting the common first term, and dividing by {\displaystyle \mathbb {E} [|X_{t}|_{L^{2}}^{2}]}, we obtain that:

{\displaystyle \sum _{k=1}^{N}p_{k}\geq \sum _{k=1}^{N}q_{k}}

Since this holds for every N, the sequence (pk) majorizes (qk), and because the entropy function is Schur-concave this implies that:

{\displaystyle -\sum _{k=1}^{\infty }p_{k}\log(p_{k})\leq -\sum _{k=1}^{\infty }q_{k}\log(q_{k})}
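A quick numerical check of this property for a finite-dimensional Gaussian vector (illustrative only; the covariance matrix and the comparison bases are assumptions): the representation entropy computed from p_k = E[|W_k|^2]/E[||X||^2] should be smallest in the eigenbasis of the covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 16
n = np.arange(N)
Sigma = 0.8 ** np.abs(n[:, None] - n[None, :])   # assumed covariance of X

def representation_entropy(Sigma, basis):
    """H({phi_k}) = -sum p_k log p_k with p_k = E|<X, phi_k>|^2 / E||X||^2."""
    energies = np.diag(basis.T @ Sigma @ basis)   # phi_k^T Sigma phi_k
    p = energies / np.trace(Sigma)
    return -np.sum(p * np.log(p))

# Karhunen-Loeve basis (eigenvectors of Sigma), canonical basis, random orthogonal basis
_, V = np.linalg.eigh(Sigma)
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

for name, B in [("KL", V), ("canonical", np.eye(N)), ("random", Q)]:
    print(name, representation_entropy(Sigma, B))   # KL gives the smallest entropy
```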

Linear Karhunen–Loève approximations


Consider a whole class of signals we want to approximate over the first M vectors of a basis. These signals are modeled as realizations of a random vector Y[n] of size N. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are Karhunen–Loève bases that diagonalize the covariance matrix of Y. The random vector Y can be decomposed in an orthogonal basis

{\displaystyle \left\{g_{m}\right\}_{0\leq m<N}}

as follows:

{\displaystyle Y=\sum _{m=0}^{N-1}\left\langle Y,g_{m}\right\rangle g_{m},}

where each

{\displaystyle \left\langle Y,g_{m}\right\rangle =\sum _{n=0}^{N-1}{Y[n]}g_{m}^{*}[n]}

is a random variable. The approximation from the first M ≤ N vectors of the basis is

{\displaystyle Y_{M}=\sum _{m=0}^{M-1}\left\langle Y,g_{m}\right\rangle g_{m}}

The energy conservation in an orthogonal basis implies

{\displaystyle \varepsilon [M]=\mathbf {E} \left\{\left\|Y-Y_{M}\right\|^{2}\right\}=\sum _{m=M}^{N-1}\mathbf {E} \left\{\left|\left\langle Y,g_{m}\right\rangle \right|^{2}\right\}}

This error is related to the covariance of Y defined by

{\displaystyle R[n,m]=\mathbf {E} \left\{Y[n]Y^{*}[m]\right\}}

For any vector x[n] we denote by K the covariance operator represented by this matrix,

{\displaystyle \mathbf {E} \left\{\left|\langle Y,x\rangle \right|^{2}\right\}=\langle Kx,x\rangle =\sum _{n=0}^{N-1}\sum _{m=0}^{N-1}R[n,m]x[n]x^{*}[m]}

The error ε[M] is therefore a sum of the last N − M coefficients of the covariance operator

{\displaystyle \varepsilon [M]=\sum _{m=M}^{N-1}{\left\langle Kg_{m},g_{m}\right\rangle }}

The covariance operator K is Hermitian and positive and is thus diagonalized in an orthogonal basis called a Karhunen–Loève basis. The following theorem states that a Karhunen–Loève basis is optimal for linear approximations.

Theorem (Optimality of Karhunen–Loève basis). Let K be a covariance operator. For all M ≥ 1, the approximation error

{\displaystyle \varepsilon [M]=\sum _{m=M}^{N-1}\left\langle Kg_{m},g_{m}\right\rangle }

is minimum if and only if

{\displaystyle \left\{g_{m}\right\}_{0\leq m<N}}

is a Karhunen–Loève basis ordered by decreasing eigenvalues:

{\displaystyle \left\langle Kg_{m},g_{m}\right\rangle \geq \left\langle Kg_{m+1},g_{m+1}\right\rangle ,\qquad 0\leq m<N-1.}
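The optimality statement can be checked numerically in finite dimension (illustrative sketch only; the AR(1)-style covariance matrix and the basis choices are assumptions): the M-term linear error ε[M] = Σ_{m≥M} ⟨K g_m, g_m⟩ is smaller in the eigenbasis of K than in the canonical basis.

```python
import numpy as np

N, M = 32, 8

# An assumed covariance matrix (AR(1)-type) purely for illustration
n = np.arange(N)
K = 0.9 ** np.abs(n[:, None] - n[None, :])

def linear_error(K, basis, M):
    """epsilon[M] = sum of <K g_m, g_m> over the discarded vectors m >= M."""
    return sum(basis[:, m] @ K @ basis[:, m] for m in range(M, K.shape[0]))

# Karhunen-Loeve basis: eigenvectors of K sorted by decreasing eigenvalue
w, V = np.linalg.eigh(K)
V = V[:, np.argsort(w)[::-1]]

print(linear_error(K, V, M))            # KL basis error
print(linear_error(K, np.eye(N), M))    # canonical (Dirac) basis error, larger
```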

Non-Linear approximation in bases


Linear approximations project the signal on M vectors a priori. The approximation can be made more precise by choosing the M orthogonal vectors depending on the signal properties. This section analyzes the general performance of these non-linear approximations. A signal {\displaystyle f\in \mathrm {H} } is approximated with M vectors selected adaptively in an orthonormal basis for {\displaystyle \mathrm {H} }

{\displaystyle \mathrm {B} =\left\{g_{m}\right\}_{m\in \mathbb {N} }}

Let {\displaystyle f_{M}} be the projection of f over M vectors whose indices are in IM:

{\displaystyle f_{M}=\sum _{m\in I_{M}}\left\langle f,g_{m}\right\rangle g_{m}}

The approximation error is the sum of the remaining coefficients

{\displaystyle \varepsilon [M]=\left\|f-f_{M}\right\|^{2}=\sum _{m\notin I_{M}}\left|\left\langle f,g_{m}\right\rangle \right|^{2}}

To minimize this error, the indices in IM must correspond to the M vectors having the largest inner product amplitude

{\displaystyle \left|\left\langle f,g_{m}\right\rangle \right|.}

These are the vectors that best correlate with f. They can thus be interpreted as the main features of f. The resulting error is necessarily smaller than the error of a linear approximation which selects the M approximation vectors independently of f. Let us sort

{\displaystyle \left\{\left|\left\langle f,g_{m}\right\rangle \right|\right\}_{m\in \mathbb {N} }}

in decreasing order

{\displaystyle \left|\left\langle f,g_{m_{k}}\right\rangle \right|\geq \left|\left\langle f,g_{m_{k+1}}\right\rangle \right|.}

The best non-linear approximation is

{\displaystyle f_{M}=\sum _{k=1}^{M}\left\langle f,g_{m_{k}}\right\rangle g_{m_{k}}}

It can also be written as inner product thresholding:

{\displaystyle f_{M}=\sum _{m=0}^{\infty }\theta _{T}\left(\left\langle f,g_{m}\right\rangle \right)g_{m}}

with

{\displaystyle T=\left|\left\langle f,g_{m_{M}}\right\rangle \right|,\qquad \theta _{T}(x)={\begin{cases}x&|x|\geq T\\0&|x|<T\end{cases}}}
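The following sketch (not from the article) implements this M-term thresholding in an arbitrary orthonormal basis; the DCT basis and the test signal are assumed purely as an example.

```python
import numpy as np
from scipy.fft import dct

def best_m_term(f, basis, M):
    """Keep the M inner products <f, g_m> of largest amplitude and
    reconstruct f_M; `basis` holds the orthonormal vectors as columns."""
    coeffs = basis.T @ f                          # inner products <f, g_m>
    T = np.sort(np.abs(coeffs))[-M]               # threshold = M-th largest amplitude
    kept = np.where(np.abs(coeffs) >= T, coeffs, 0.0)   # theta_T thresholding
    f_M = basis @ kept
    error = np.sum((f - f_M) ** 2)                # sum of discarded |<f, g_m>|^2
    return f_M, error

# Assumed example: an orthonormal DCT-II basis and a piecewise-constant signal
N = 64
basis = dct(np.eye(N), norm="ortho")              # columns are orthonormal
f = np.concatenate([np.ones(20), -np.ones(44)])
f_M, err = best_m_term(f, basis, M=8)
print(err)
```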

The non-linear error is

{\displaystyle \varepsilon [M]=\left\|f-f_{M}\right\|^{2}=\sum _{k=M+1}^{\infty }\left|\left\langle f,g_{m_{k}}\right\rangle \right|^{2}}

this error goes quickly to zero as M increases, if the sorted values of {\displaystyle \left|\left\langle f,g_{m_{k}}\right\rangle \right|} have a fast decay as k increases. This decay is quantified by computing the {\displaystyle \ell ^{p}} norm of the signal inner products in B:

{\displaystyle \|f\|_{\mathrm {B} ,p}=\left(\sum _{m=0}^{\infty }\left|\left\langle f,g_{m}\right\rangle \right|^{p}\right)^{\frac {1}{p}}}

The following theorem relates the decay of ε[M] to {\displaystyle \|f\|_{\mathrm {B} ,p}}.

Theorem (decay of error). If {\displaystyle \|f\|_{\mathrm {B} ,p}<\infty } with p < 2 then

{\displaystyle \varepsilon [M]\leq {\frac {\|f\|_{\mathrm {B} ,p}^{2}}{{\frac {2}{p}}-1}}M^{1-{\frac {2}{p}}}}

and

{\displaystyle \varepsilon [M]=o\left(M^{1-{\frac {2}{p}}}\right).}

Conversely, if {\displaystyle \varepsilon [M]=o\left(M^{1-{\frac {2}{p}}}\right)} then

{\displaystyle \|f\|_{\mathrm {B} ,q}<\infty } for any q > p.

Non-optimality of Karhunen–Loève bases


To further illustrate the differences between linear and non-linear approximations, we study the decomposition of a simple non-Gaussian random vector in a Karhunen–Loève basis. Processes whose realizations have a random translation are stationary. The Karhunen–Loève basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is a random shift modulo N of a deterministic signal f[n] of zero mean

{\displaystyle \sum _{n=0}^{N-1}f[n]=0}
{\displaystyle Y[n]=f[(n-p){\bmod {N}}]}

The random shift P is uniformly distributed on [0, N − 1]:

{\displaystyle \Pr(P=p)={\frac {1}{N}},\qquad 0\leq p<N}

Clearly

{\displaystyle \mathbf {E} \{Y[n]\}={\frac {1}{N}}\sum _{p=0}^{N-1}f[(n-p){\bmod {N}}]=0}

and

{\displaystyle R[n,k]=\mathbf {E} \{Y[n]Y[k]\}={\frac {1}{N}}\sum _{p=0}^{N-1}f[(n-p){\bmod {N}}]f[(k-p){\bmod {N}}]={\frac {1}{N}}(f\circledast {\bar {f}})[n-k],\quad {\bar {f}}[n]=f[-n]}

where {\displaystyle \circledast } denotes circular convolution. Hence

{\displaystyle R[n,k]=R_{Y}[n-k],\qquad R_{Y}[k]={\frac {1}{N}}(f\circledast {\bar {f}})[k]}

Since RY is N-periodic, Y is a circular stationary random vector. The covariance operator is a circular convolution with RY and is therefore diagonalized in the discrete Fourier Karhunen–Loève basis

{\displaystyle \left\{{\frac {1}{\sqrt {N}}}e^{i2\pi mn/N}\right\}_{0\leq m<N}.}

The power spectrum is the Fourier transform of RY:

{\displaystyle P_{Y}[m]={\hat {R}}_{Y}[m]={\frac {1}{N}}\left|{\hat {f}}[m]\right|^{2}}

Example: Consider an extreme case where {\displaystyle f[n]=\delta [n]-\delta [n-1]}. A theorem stated above guarantees that the Fourier Karhunen–Loève basis produces a smaller expected approximation error than a canonical basis of Diracs {\displaystyle \left\{g_{m}[n]=\delta [n-m]\right\}_{0\leq m<N}}. Indeed, we do not know a priori the abscissa of the non-zero coefficients of Y, so there is no particular Dirac that is better adapted to perform the approximation. But the Fourier vectors cover the whole support of Y and thus absorb a part of the signal energy.

{\displaystyle \mathbf {E} \left\{\left|\left\langle Y[n],{\frac {1}{\sqrt {N}}}e^{i2\pi mn/N}\right\rangle \right|^{2}\right\}=P_{Y}[m]={\frac {4}{N}}\sin ^{2}\left({\frac {\pi m}{N}}\right)}

Selecting higher frequency Fourier coefficients yields a better mean-square approximation than choosing a priori a few Dirac vectors to perform the approximation. The situation is totally different for non-linear approximations. If {\displaystyle f[n]=\delta [n]-\delta [n-1]} then the discrete Fourier basis is extremely inefficient because f and hence Y have an energy that is almost uniformly spread among all Fourier vectors. In contrast, since f has only two non-zero coefficients in the Dirac basis, a non-linear approximation of Y with M ≥ 2 gives zero error.[5]
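A small simulation of this example (illustrative only; the size N, the number of trials, and M are assumptions) compares the mean-square error of an M-term linear Fourier approximation, with the vectors fixed a priori by their expected power, against the non-linear two-term Dirac approximation of the randomly shifted difference signal.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, trials = 64, 8, 2000

# f[n] = delta[n] - delta[n-1]; Y is a random circular shift of f
f = np.zeros(N); f[0], f[1] = 1.0, -1.0

# Linear scheme: fix, a priori, the M Fourier vectors of largest expected power P_Y[m]
P = (4.0 / N) * np.sin(np.pi * np.arange(N) / N) ** 2
keep_fourier = np.argsort(P)[-M:]

err_lin = err_nonlin = 0.0
for _ in range(trials):
    Y = np.roll(f, rng.integers(N))
    C = np.fft.fft(Y) / np.sqrt(N)             # coefficients in the unitary Fourier basis
    err_lin += np.sum(np.abs(C) ** 2) - np.sum(np.abs(C[keep_fourier]) ** 2)
    # Non-linear scheme: keep the 2 largest-amplitude Dirac coefficients of this realization
    idx = np.argsort(np.abs(Y))[-2:]
    approx = np.zeros(N); approx[idx] = Y[idx]
    err_nonlin += np.sum((Y - approx) ** 2)

print(err_lin / trials, err_nonlin / trials)   # the non-linear Dirac approximation is exact
```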

Principal component analysis

Main article: Principal component analysis

We have established the Karhunen–Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind

{\displaystyle \int _{a}^{b}K_{X}(s,t)e_{k}(s)\,ds=\lambda _{k}e_{k}(t).}

However, when applied to a discrete and finite process {\displaystyle \left(X_{n}\right)_{n\in \{1,\ldots ,N\}}}, the problem takes a much simpler form and standard algebra can be used to carry out the calculations.

Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.

We henceforth consider a random N-dimensional vector {\displaystyle X=\left(X_{1}~X_{2}~\ldots ~X_{N}\right)^{T}}. As mentioned above, X could contain N samples of a signal but it can hold many more representations depending on the field of application. For instance it could be the answers to a survey or economic data in an econometrics analysis.

As in the continuous version, we assume that X is centered, otherwise we can let {\displaystyle X:=X-\mu _{X}} (where {\displaystyle \mu _{X}} is the mean vector of X), which is centered.

Let us adapt the procedure to the discrete case.

Covariance matrix


Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.

Define Σ, the covariance matrix of X, as an N × N matrix whose elements are given by:

{\displaystyle \Sigma _{ij}=\mathbf {E} [X_{i}X_{j}],\qquad \forall i,j\in \{1,\ldots ,N\}}

Rewriting the above integral equation to suit the discrete case, we observe that it turns into:

{\displaystyle \sum _{j=1}^{N}\Sigma _{ij}e_{j}=\lambda e_{i}\quad \Leftrightarrow \quad \Sigma e=\lambda e}

where {\displaystyle e=(e_{1}~e_{2}~\ldots ~e_{N})^{T}} is an N-dimensional vector.

The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why PCA has such a broad domain of applications.

Since Σ is a positive-definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of {\displaystyle \mathbb {R} ^{N}}, and we write {\displaystyle \{\lambda _{i},\varphi _{i}\}_{i\in \{1,\ldots ,N\}}} for this set of eigenvalues and corresponding eigenvectors, listed in decreasing values of λi. Let also Φ be the orthonormal matrix consisting of these eigenvectors:

{\displaystyle {\begin{aligned}\Phi &:=\left(\varphi _{1}~\varphi _{2}~\ldots ~\varphi _{N}\right)^{T}\\\Phi ^{T}\Phi &=I\end{aligned}}}

Principal component transform


It remains to perform the actual KL transformation, called the principal component transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance function. In this case, we hence have:

{\displaystyle X=\sum _{i=1}^{N}\langle \varphi _{i},X\rangle \varphi _{i}=\sum _{i=1}^{N}\varphi _{i}^{T}X\varphi _{i}}

In a more compact form, the principal component transform of X is defined by:

{\displaystyle {\begin{cases}Y=\Phi ^{T}X\\X=\Phi Y\end{cases}}}

The i-th component of Y is {\displaystyle Y_{i}=\varphi _{i}^{T}X}, the projection of X on {\displaystyle \varphi _{i}}, and the inverse transform X = ΦY yields the expansion of X on the space spanned by the {\displaystyle \varphi _{i}}:

{\displaystyle X=\sum _{i=1}^{N}Y_{i}\varphi _{i}=\sum _{i=1}^{N}\langle \varphi _{i},X\rangle \varphi _{i}}

As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some {\displaystyle K\in \{1,\ldots ,N\}} such that

{\displaystyle {\frac {\sum _{i=1}^{K}\lambda _{i}}{\sum _{i=1}^{N}\lambda _{i}}}\geq \alpha }

where α is the explained variance threshold we wish to set.

We can also reduce the dimensionality through the use of multilevel dominant eigenvector estimation (MDEE).[6]
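A minimal numerical sketch of the discrete procedure (not from the article; the toy data set and the use of a sample covariance estimate are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy data: 1000 observations of a correlated 5-dimensional vector X
samples = rng.multivariate_normal(
    mean=np.zeros(5),
    cov=0.6 ** np.abs(np.subtract.outer(np.arange(5), np.arange(5))),
    size=1000,
)

X = samples - samples.mean(axis=0)        # center the data
Sigma = X.T @ X / len(X)                  # sample covariance matrix
lam, Phi = np.linalg.eigh(Sigma)          # eigen-decomposition (ascending order)
order = np.argsort(lam)[::-1]             # sort by decreasing eigenvalue
lam, Phi = lam[order], Phi[:, order]

Y = X @ Phi                               # principal component transform (per sample)
explained = np.cumsum(lam) / lam.sum()
K = int(np.searchsorted(explained, 0.95) + 1)   # components needed for alpha = 0.95
X_hat = Y[:, :K] @ Phi[:, :K].T           # K-term reconstruction
print(K, np.mean((X - X_hat) ** 2))
```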

Examples


The Wiener process


There are numerous equivalent characterizations of the Wiener process, which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process Wt with covariance function

{\displaystyle K_{W}(t,s)=\operatorname {cov} (W_{t},W_{s})=\min(s,t).}

We restrict the time domain to [a, b] = [0, 1] without loss of generality.

The eigenfunctions of the covariance kernel are easily determined. These are

{\displaystyle e_{k}(t)={\sqrt {2}}\sin \left(\left(k-{\tfrac {1}{2}}\right)\pi t\right)}

and the corresponding eigenvalues are

{\displaystyle \lambda _{k}={\frac {1}{(k-{\frac {1}{2}})^{2}\pi ^{2}}}.}
Proof

In order to find the eigenvalues and eigenfunctions, we need to solve the integral equation:

{\displaystyle {\begin{aligned}\int _{a}^{b}K_{W}(s,t)e(s)\,ds&=\lambda e(t)\qquad \forall t,0\leq t\leq 1\\\int _{0}^{1}\min(s,t)e(s)\,ds&=\lambda e(t)\qquad \forall t,0\leq t\leq 1\\\int _{0}^{t}se(s)\,ds+t\int _{t}^{1}e(s)\,ds&=\lambda e(t)\qquad \forall t,0\leq t\leq 1\end{aligned}}}

differentiating once with respect to t yields:

{\displaystyle \int _{t}^{1}e(s)\,ds=\lambda e'(t)}

a second differentiation produces the following differential equation:

{\displaystyle -e(t)=\lambda e''(t)}

whose general solution has the form:

{\displaystyle e(t)=A\sin \left({\frac {t}{\sqrt {\lambda }}}\right)+B\cos \left({\frac {t}{\sqrt {\lambda }}}\right)}

where A and B are two constants to be determined from the boundary conditions. Setting t = 0 in the initial integral equation gives e(0) = 0, which implies that B = 0; similarly, setting t = 1 in the first differentiation yields e′(1) = 0, whence:

{\displaystyle \cos \left({\frac {1}{\sqrt {\lambda }}}\right)=0}

which in turn implies that the eigenvalues of TKX are:

{\displaystyle \lambda _{k}=\left({\frac {1}{(k-{\frac {1}{2}})\pi }}\right)^{2},\qquad k\geq 1}

The corresponding eigenfunctions are thus of the form:

{\displaystyle e_{k}(t)=A\sin \left((k-{\frac {1}{2}})\pi t\right),\qquad k\geq 1}

A is then chosen so as to normalize ek:

{\displaystyle \int _{0}^{1}e_{k}^{2}(t)\,dt=1\quad \implies \quad A={\sqrt {2}}}

This gives the following representation of the Wiener process:

Theorem. There is a sequence {Zi}i of independent Gaussian random variables with mean zero and variance 1 such that

{\displaystyle W_{t}={\sqrt {2}}\sum _{k=1}^{\infty }Z_{k}{\frac {\sin \left(\left(k-{\frac {1}{2}}\right)\pi t\right)}{\left(k-{\frac {1}{2}}\right)\pi }}.}

Note that this representation is only valid for {\displaystyle t\in [0,1].} On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L2 norm and uniform in t.
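As an illustration (not part of the article), the truncated expansion can be used directly to simulate approximate Wiener paths; the truncation level, grid, and function name are assumptions.

```python
import numpy as np

def wiener_kl_paths(n_paths=5, n_terms=200, n_grid=500, seed=0):
    """Approximate Wiener paths on [0, 1] from the truncated KL expansion
    W_t ~ sqrt(2) * sum_k Z_k sin((k - 1/2) pi t) / ((k - 1/2) pi)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    k = np.arange(1, n_terms + 1)
    Z = rng.standard_normal((n_paths, n_terms))    # i.i.d. N(0, 1) coefficients
    basis = np.sqrt(2) * np.sin((k - 0.5) * np.pi * t[:, None]) / ((k - 0.5) * np.pi)
    return t, Z @ basis.T                          # paths of shape (n_paths, n_grid)

t, W = wiener_kl_paths()
print(W.shape, np.var(W[:, -1]))   # var(W_1) approaches 1 for many paths and terms
```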

The Brownian bridge


Similarly the Brownian bridge {\displaystyle B_{t}=W_{t}-tW_{1}}, which is a stochastic process with covariance function

{\displaystyle K_{B}(t,s)=\min(t,s)-ts}

can be represented as the series

{\displaystyle B_{t}=\sum _{k=1}^{\infty }Z_{k}{\frac {{\sqrt {2}}\sin(k\pi t)}{k\pi }}}

Applications


Adaptive optics systems sometimes use K–L functions to reconstruct wave-front phase information (Dai 1996, JOSA A). The Karhunen–Loève expansion is closely related to the singular value decomposition (SVD). The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector-valued stochastic process, then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.

Applications in signal estimation and detection


Detection of a known continuous signal S(t)


In communication, we usually have to decide whether a signal from a noisy channel contains valuable information. The following hypothesis test is used for detecting a continuous signal S(t) from the channel output X(t), where N(t) is the channel noise, usually assumed to be a zero-mean Gaussian process with correlation function {\displaystyle R_{N}(t,s)=E[N(t)N(s)]}:

{\displaystyle H:X(t)=N(t),}
{\displaystyle K:X(t)=N(t)+s(t),\quad t\in (0,T)}

Signal detection in white noise


When the channel noise is white, its correlation function is

{\displaystyle R_{N}(t)={\tfrac {1}{2}}N_{0}\delta (t),}

and it has a constant power spectral density. In a physically practical channel, the noise power is finite, so:

{\displaystyle S_{N}(f)={\begin{cases}{\frac {N_{0}}{2}}&|f|<w\\0&|f|>w\end{cases}}}

Then the noise correlation function is a sinc function with zeros at {\displaystyle {\frac {n}{2\omega }},n\in \mathbf {Z} .} Since samples taken at these zeros are uncorrelated and Gaussian, they are independent. Thus we can take samples from X(t) with time spacing

{\displaystyle \Delta t={\frac {1}{2\omega }}{\text{ within }}(0,T).}

Let {\displaystyle X_{i}=X(i\,\Delta t)}. We have a total of {\displaystyle n={\frac {T}{\Delta t}}=T(2\omega )=2\omega T} i.i.d. observations {\displaystyle \{X_{1},X_{2},\ldots ,X_{n}\}} to develop the likelihood-ratio test. Defining the signal samples {\displaystyle S_{i}=S(i\,\Delta t)}, the problem becomes:

{\displaystyle H:X_{i}=N_{i},}
{\displaystyle K:X_{i}=N_{i}+S_{i},\quad i=1,2,\ldots ,n.}

The log-likelihood ratio is

{\displaystyle {\mathcal {L}}({\underline {x}})={\frac {\sum _{i=1}^{n}(2S_{i}x_{i}-S_{i}^{2})}{2\sigma ^{2}}}\quad \Leftrightarrow \quad \Delta t\sum _{i=1}^{n}S_{i}x_{i}=\sum _{i=1}^{n}S(i\,\Delta t)x(i\,\Delta t)\,\Delta t\gtrless \lambda }

As Δt → 0, let:

{\displaystyle G=\int _{0}^{T}S(t)x(t)\,dt.}

Then G is the test statistic and the Neyman–Pearson optimum detector is

{\displaystyle G({\underline {x}})>G_{0}\Rightarrow K,\qquad G({\underline {x}})<G_{0}\Rightarrow H.}

As G is Gaussian, we can characterize it by finding its mean and variance. Then we get

{\displaystyle H:G\sim N\left(0,{\tfrac {1}{2}}N_{0}E\right)}
{\displaystyle K:G\sim N\left(E,{\tfrac {1}{2}}N_{0}E\right)}

where

{\displaystyle E=\int _{0}^{T}S^{2}(t)\,dt}

is the signal energy.

The false alarm probability is

{\displaystyle \alpha =\int _{G_{0}}^{\infty }N\left(0,{\tfrac {1}{2}}N_{0}E\right)\,dG\quad \Rightarrow \quad G_{0}={\sqrt {{\tfrac {1}{2}}N_{0}E}}\,\Phi ^{-1}(1-\alpha )}

and the probability of detection is

{\displaystyle \beta =\int _{G_{0}}^{\infty }N\left(E,{\tfrac {1}{2}}N_{0}E\right)\,dG=1-\Phi \left({\frac {G_{0}-E}{\sqrt {{\tfrac {1}{2}}N_{0}E}}}\right)=\Phi \left({\sqrt {\frac {2E}{N_{0}}}}-\Phi ^{-1}(1-\alpha )\right),}

where Φ is the cdf of a standard normal, or Gaussian, variable.
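A numerical sketch of this detector (not from the article; the rectangular pulse, noise level, and false-alarm rate are assumed purely for illustration) computes the Neyman–Pearson threshold and checks the predicted detection probability by Monte Carlo:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
T, n = 1.0, 1000
dt = T / n

S = np.ones(n)                      # assumed known signal S(t) on (0, T)
N0 = 2.0                            # white-noise level: R_N(tau) = (N0/2) delta(tau)
alpha = 0.05                        # target false-alarm probability

E = np.sum(S**2) * dt                               # signal energy
G0 = np.sqrt(0.5 * N0 * E) * norm.ppf(1 - alpha)    # Neyman-Pearson threshold
beta = norm.cdf(np.sqrt(2 * E / N0) - norm.ppf(1 - alpha))  # predicted detection prob.

# Monte-Carlo check of the detector G = integral of S(t) x(t) dt under hypothesis K
sigma = np.sqrt(N0 / (2 * dt))      # per-sample std so the integral has variance N0*E/2
detections, trials = 0, 20000
for _ in range(trials):
    x = S + sigma * rng.standard_normal(n)          # X(t) = S(t) + N(t)
    G = np.sum(S * x) * dt
    detections += G > G0

print(beta, detections / trials)    # empirical detection rate should be close to beta
```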

Signal detection in colored noise


When N(t) is colored (correlated in time) Gaussian noise with zero mean and covariance function {\displaystyle R_{N}(t,s)=E[N(t)N(s)],} we cannot sample independent discrete observations by evenly spacing the time. Instead, we can use the K–L expansion to decorrelate the noise process and get independent Gaussian observation "samples". The K–L expansion of N(t) is:

{\displaystyle N(t)=\sum _{i=1}^{\infty }N_{i}\Phi _{i}(t),\quad 0<t<T,}

where {\displaystyle N_{i}=\int N(t)\Phi _{i}(t)\,dt} and the orthonormal bases {\displaystyle \{\Phi _{i}(t)\}} are generated by the kernel {\displaystyle R_{N}(t,s)}, i.e., they are the solutions to

{\displaystyle \int _{0}^{T}R_{N}(t,s)\Phi _{i}(s)\,ds=\lambda _{i}\Phi _{i}(t),\quad \operatorname {var} [N_{i}]=\lambda _{i}.}

Do the expansion:

{\displaystyle S(t)=\sum _{i=1}^{\infty }S_{i}\Phi _{i}(t),}

where {\displaystyle S_{i}=\int _{0}^{T}S(t)\Phi _{i}(t)\,dt}, then

{\displaystyle X_{i}=\int _{0}^{T}X(t)\Phi _{i}(t)\,dt=N_{i}}

under H and {\displaystyle N_{i}+S_{i}} under K. Let {\displaystyle {\overline {X}}=\{X_{1},X_{2},\dots \}}; we have

{\displaystyle N_{i}} are independent Gaussian r.v.'s with variance {\displaystyle \lambda _{i}}
under H: {\displaystyle \{X_{i}\}} are independent Gaussian r.v.'s.
{\displaystyle f_{H}[x(t)|0<t<T]=f_{H}({\underline {x}})=\prod _{i=1}^{\infty }{\frac {1}{\sqrt {2\pi \lambda _{i}}}}\exp \left(-{\frac {x_{i}^{2}}{2\lambda _{i}}}\right)}
under K: {\displaystyle \{X_{i}-S_{i}\}} are independent Gaussian r.v.'s.
{\displaystyle f_{K}[x(t)\mid 0<t<T]=f_{K}({\underline {x}})=\prod _{i=1}^{\infty }{\frac {1}{\sqrt {2\pi \lambda _{i}}}}\exp \left(-{\frac {(x_{i}-S_{i})^{2}}{2\lambda _{i}}}\right)}

Hence, the log-LR is given by

{\displaystyle {\mathcal {L}}({\underline {x}})=\sum _{i=1}^{\infty }{\frac {2S_{i}x_{i}-S_{i}^{2}}{2\lambda _{i}}}}

and the optimum detector is

{\displaystyle G=\sum _{i=1}^{\infty }{\frac {S_{i}x_{i}}{\lambda _{i}}}>G_{0}\Rightarrow K,\qquad <G_{0}\Rightarrow H.}

Define

{\displaystyle k(t)=\sum _{i=1}^{\infty }{\frac {S_{i}}{\lambda _{i}}}\Phi _{i}(t),\quad 0<t<T,}

then {\displaystyle G=\int _{0}^{T}k(t)x(t)\,dt.}

How to find k(t)

Since

{\displaystyle \int _{0}^{T}R_{N}(t,s)k(s)\,ds=\sum _{i=1}^{\infty }{\frac {S_{i}}{\lambda _{i}}}\int _{0}^{T}R_{N}(t,s)\Phi _{i}(s)\,ds=\sum _{i=1}^{\infty }S_{i}\Phi _{i}(t)=S(t),}

k(t) is the solution to

{\displaystyle \int _{0}^{T}R_{N}(t,s)k(s)\,ds=S(t).}

If N(t) is wide-sense stationary,

{\displaystyle \int _{0}^{T}R_{N}(t-s)k(s)\,ds=S(t),}

which is known as the Wiener–Hopf equation. The equation can be solved by taking the Fourier transform, but this is not practically realizable since an infinite spectrum needs spectral factorization. A special case in which k(t) is easy to calculate is white Gaussian noise:

{\displaystyle \int _{0}^{T}{\frac {N_{0}}{2}}\delta (t-s)k(s)\,ds=S(t)\Rightarrow k(t)=CS(t),\quad 0<t<T.}

The corresponding impulse response is h(t) = k(T − t) = CS(T − t). Letting C = 1, this is just the result we arrived at in the previous section for detecting a signal in white noise.

Test threshold for Neyman–Pearson detector

Since X(t) is a Gaussian process,

{\displaystyle G=\int _{0}^{T}k(t)x(t)\,dt,}

is a Gaussian random variable that can be characterized by its mean and variance.

{\displaystyle {\begin{aligned}\mathbf {E} [G\mid H]&=\int _{0}^{T}k(t)\mathbf {E} [x(t)\mid H]\,dt=0\\\mathbf {E} [G\mid K]&=\int _{0}^{T}k(t)\mathbf {E} [x(t)\mid K]\,dt=\int _{0}^{T}k(t)S(t)\,dt\equiv \rho \\\mathbf {E} [G^{2}\mid H]&=\int _{0}^{T}\int _{0}^{T}k(t)k(s)R_{N}(t,s)\,dt\,ds=\int _{0}^{T}k(t)\left(\int _{0}^{T}k(s)R_{N}(t,s)\,ds\right)dt=\int _{0}^{T}k(t)S(t)\,dt=\rho \\\operatorname {var} [G\mid H]&=\mathbf {E} [G^{2}\mid H]-(\mathbf {E} [G\mid H])^{2}=\rho \\\mathbf {E} [G^{2}\mid K]&=\int _{0}^{T}\int _{0}^{T}k(t)k(s)\mathbf {E} [x(t)x(s)]\,dt\,ds=\int _{0}^{T}\int _{0}^{T}k(t)k(s)(R_{N}(t,s)+S(t)S(s))\,dt\,ds=\rho +\rho ^{2}\\\operatorname {var} [G\mid K]&=\mathbf {E} [G^{2}\mid K]-(\mathbf {E} [G\mid K])^{2}=\rho +\rho ^{2}-\rho ^{2}=\rho \end{aligned}}}

Hence, we obtain the distributions of G under H and K:

{\displaystyle H:G\sim N(0,\rho )}
{\displaystyle K:G\sim N(\rho ,\rho )}

The false alarm error is

{\displaystyle \alpha =\int _{G_{0}}^{\infty }N(0,\rho )\,dG=1-\Phi \left({\frac {G_{0}}{\sqrt {\rho }}}\right).}

So the test threshold for the Neyman–Pearson optimum detector is

{\displaystyle G_{0}={\sqrt {\rho }}\,\Phi ^{-1}(1-\alpha ).}

Its power of detection is

{\displaystyle \beta =\int _{G_{0}}^{\infty }N(\rho ,\rho )\,dG=\Phi \left({\sqrt {\rho }}-\Phi ^{-1}(1-\alpha )\right)}

When the noise is a white Gaussian process, the signal power is

{\displaystyle \rho =\int _{0}^{T}k(t)S(t)\,dt=\int _{0}^{T}S(t)^{2}\,dt=E.}
Prewhitening

For some types of colored noise, a typical practice is to add a prewhitening filter before the matched filter to transform the colored noise into white noise. For example, suppose N(t) is a wide-sense stationary colored noise with correlation function

{\displaystyle R_{N}(\tau )={\frac {BN_{0}}{4}}e^{-B|\tau |}}
{\displaystyle S_{N}(f)={\frac {N_{0}}{2(1+({\frac {w}{B}})^{2})}}}

The transfer function of the prewhitening filter is

{\displaystyle H(f)=1+j{\frac {w}{B}}.}

Detection of a Gaussian random signal in additive white Gaussian noise (AWGN)


When the signal we want to detect from the noisy channel is also random, for example, a white Gaussian process X(t), we can still implement the K–L expansion to get an independent sequence of observations. In this case, the detection problem is described as follows:

{\displaystyle H_{0}:Y(t)=N(t)}
{\displaystyle H_{1}:Y(t)=N(t)+X(t),\quad 0<t<T.}

X(t) is a random process with correlation function {\displaystyle R_{X}(t,s)=E\{X(t)X(s)\}}.

The K–L expansion of X(t) is

{\displaystyle X(t)=\sum _{i=1}^{\infty }X_{i}\Phi _{i}(t),}

where

{\displaystyle X_{i}=\int _{0}^{T}X(t)\Phi _{i}(t)\,dt}

and {\displaystyle \Phi _{i}(t)} are solutions to

{\displaystyle \int _{0}^{T}R_{X}(t,s)\Phi _{i}(s)\,ds=\lambda _{i}\Phi _{i}(t).}

So the {\displaystyle X_{i}}'s are an independent sequence of r.v.'s with zero mean and variance {\displaystyle \lambda _{i}}. Expanding Y(t) and N(t) by {\displaystyle \Phi _{i}(t)}, we get

{\displaystyle Y_{i}=\int _{0}^{T}Y(t)\Phi _{i}(t)\,dt=\int _{0}^{T}[N(t)+X(t)]\Phi _{i}(t)\,dt=N_{i}+X_{i},}

where

{\displaystyle N_{i}=\int _{0}^{T}N(t)\Phi _{i}(t)\,dt.}

As N(t) is Gaussian white noise, the {\displaystyle N_{i}}'s are an i.i.d. sequence of r.v.'s with zero mean and variance {\displaystyle {\tfrac {1}{2}}N_{0}}, so the problem is simplified as follows:

{\displaystyle H_{0}:Y_{i}=N_{i}}
{\displaystyle H_{1}:Y_{i}=N_{i}+X_{i}}

The Neyman–Pearson optimal test:

{\displaystyle \Lambda ={\frac {f_{Y}\mid H_{1}}{f_{Y}\mid H_{0}}}=Ce^{\sum _{i=1}^{\infty }{\frac {y_{i}^{2}}{2}}{\frac {\lambda _{i}}{{\tfrac {1}{2}}N_{0}({\tfrac {1}{2}}N_{0}+\lambda _{i})}}},}

so the log-likelihood ratio is

{\displaystyle {\mathcal {L}}=\ln(\Lambda )=K+\sum _{i=1}^{\infty }{\tfrac {1}{2}}y_{i}^{2}{\frac {\lambda _{i}}{{\frac {N_{0}}{2}}\left({\frac {N_{0}}{2}}+\lambda _{i}\right)}}.}

Since

{\displaystyle {\widehat {X}}_{i}={\frac {\lambda _{i}}{{\frac {N_{0}}{2}}+\lambda _{i}}}Y_{i}}

is just the minimum mean-square estimate of {\displaystyle X_{i}} given {\displaystyle Y_{i}},

{\displaystyle {\mathcal {L}}=K+{\frac {1}{N_{0}}}\sum _{i=1}^{\infty }Y_{i}{\widehat {X}}_{i}.}

The K–L expansion has the following property: If

{\displaystyle f(t)=\sum f_{i}\Phi _{i}(t),\quad g(t)=\sum g_{i}\Phi _{i}(t),}

where

{\displaystyle f_{i}=\int _{0}^{T}f(t)\Phi _{i}(t)\,dt,\quad g_{i}=\int _{0}^{T}g(t)\Phi _{i}(t)\,dt,}

then

{\displaystyle \sum _{i=1}^{\infty }f_{i}g_{i}=\int _{0}^{T}g(t)f(t)\,dt.}

So let

{\displaystyle {\widehat {X}}(t\mid T)=\sum _{i=1}^{\infty }{\widehat {X}}_{i}\Phi _{i}(t),\quad {\mathcal {L}}=K+{\frac {1}{N_{0}}}\int _{0}^{T}Y(t){\widehat {X}}(t\mid T)\,dt.}

A noncausal filter Q(t, s) can be used to obtain the estimate through

{\displaystyle {\widehat {X}}(t\mid T)=\int _{0}^{T}Q(t,s)Y(s)\,ds.}

By the orthogonality principle, Q(t, s) satisfies

{\displaystyle \int _{0}^{T}Q(t,s)R_{X}(s,\lambda )\,ds+{\tfrac {N_{0}}{2}}Q(t,\lambda )=R_{X}(t,\lambda ),\quad 0<\lambda <T,\ 0<t<T.}

However, for practical reasons, it is necessary to further derive the causal filter h(t, s), where h(t, s) = 0 for s > t, to get the estimate {\displaystyle {\widehat {X}}(t\mid t)}. Specifically,

{\displaystyle Q(t,s)=h(t,s)+h(s,t)-\int _{0}^{T}h(\lambda ,t)h(s,\lambda )\,d\lambda }


Notes

  1. ^ Sapatnekar, Sachin (2011), "Overcoming variations in nanometer-scale technologies", IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 1 (1): 5–1, Bibcode:2011IJEST...1....5S, CiteSeerX 10.1.1.300.5659, doi:10.1109/jetcas.2011.2138250, S2CID 15566585
  2. ^ Ghoman, Satyajit; Wang, Zhicun; Chen, PC; Kapania, Rakesh (2012). "A POD-based Reduced Order Design Scheme for Shape Optimization of Air Vehicles". Proc of 53rd AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, AIAA-2012-1808, Honolulu, Hawaii.
  3. ^ Karhunen–Loeve transform (KLT), Computer Image Processing and Analysis (E161) lectures, Harvey Mudd College. Archived 2016-11-28 at the Wayback Machine.
  4. ^ Giambartolomei, Giordano (2016). "4 The Karhunen-Loève Theorem". The Karhunen-Loève theorem (Bachelors). University of Bologna.
  5. ^ Mallat, Stéphane. A Wavelet Tour of Signal Processing.
  6. ^ X. Tang, "Texture information in run-length matrices", IEEE Transactions on Image Processing, vol. 7, no. 11, pp. 1602–1609, Nov. 1998.

References

  • Stark, Henry; Woods, John W. (1986). Probability, Random Processes, and Estimation Theory for Engineers. Prentice-Hall, Inc. ISBN 978-0-13-711706-2. OL 21138080M.
  • Ghanem, Roger; Spanos, Pol (1991). Stochastic finite elements: a spectral approach. Springer-Verlag. ISBN 978-0-387-97456-9. OL 1865197M.
  • Guikhman, I.; Skorokhod, A. (1977). Introduction a la Théorie des Processus Aléatoires. Éditions MIR.
  • Simon, B. (1979). Functional Integration and Quantum Physics. Academic Press.
  • Karhunen, Kari (1947). "Über lineare Methoden in der Wahrscheinlichkeitsrechnung". Ann. Acad. Sci. Fennicae. Ser. A I. Math.-Phys. 37: 1–79.
  • Loève, M. (1978). Probability Theory Vol. II. Graduate Texts in Mathematics. Vol. 46 (4th ed.). Springer-Verlag. ISBN 978-0-387-90262-3.
  • Dai, G. (1996). "Modal wave-front reconstruction with Zernike polynomials and Karhunen–Loeve functions". JOSA A. 13 (6): 1218. Bibcode:1996JOSAA..13.1218D. doi:10.1364/JOSAA.13.001218.
  • Wu, B.; Zhu, J.; Najm, F. (2005). "A Non-parametric Approach for Dynamic Range Estimation of Nonlinear Systems". In Proceedings of the Design Automation Conference (841–844), 2005.
  • Wu, B.; Zhu, J.; Najm, F. (2006). "Dynamic Range Estimation". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, Issue 9 (1618–1636), 2006.
  • Jorgensen, Palle E. T.; Song, Myung-Sin (2007). "Entropy Encoding, Hilbert Space and Karhunen–Loeve Transforms". Journal of Mathematical Physics. 48 (10): 103503. arXiv:math-ph/0701056. Bibcode:2007JMP....48j3503J. doi:10.1063/1.2793569. S2CID 17039075.
