Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
                                  NCBI home page
Search in PMCSearch
As a library, NLM provides access to scientific literature. Inclusion in an NLM database does not imply endorsement of, or agreement with, the contents by NLM or the National Institutes of Health.
Learn more:PMC Disclaimer | PMC Copyright Notice
NIHPA Author Manuscripts logo
. Author manuscript; available in PMC: 2016 Mar 18.

On Matrix-Valued Monge–Kantorovich Optimal Mass Transport

Lipeng Ning1,Tryphon T Georgiou2,Allen Tannenbaum3
1Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA
2Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA
3Departments of Computer Science and Applied Mathematics, Stony Brook University, Stony Brook, NY 11794 USA

Issue date 2015 Feb.

PMCID: PMC4798256  NIHMSID: NIHMS757419  PMID:26997667
The publisher's version of this article is available atIEEE Trans Automat Contr

Abstract

We present a particular formulation of optimal transport for matrix-valued density functions. Our aim is to devise a geometry which is suitable for comparing power spectral densities of multivariable time series. More specifically, the value of a power spectral density at a given frequency, which in the matricial case encodes power as well as directionality, is thought of as a proxy for a “matrix-valued mass density.” Optimal transport aims at establishing a natural metric in the space of such matrix-valued densities which takes into account differences between power across frequencies as well as misalignment of the corresponding principle axes. Thus, our transportation cost includes a cost of transference of power between frequencies together with a cost of rotating the principle directions of matrix densities. The two endpoint matrix-valued densities can be thought of as marginals of a joint matrix-valued density on a tensor product space. This joint density, very much as in the classical Monge–Kantorovich setting, can be thought to specify the transportation plan. Contrary to the classical setting, the optimal transport plan for matrices is no longer supported on a thin zero-measure set.

Index Terms: Convex optimization, matrix-valued density functions, optimal mass-transport

I. Introduction

The problem of optimal mass transport (OMT) dates back to the work of G. Monge in 1781 [1] while its modern formulation is due to L. Kantorovich [2]. In recent years the subject is developing rapidly due to its intrinsic significance and range of applications in physics, economics, and probability [3]–[5].

Our motivation for studying “matrix-valued transport” originates in the spectral analysis of multi-variable time series. Just as in scalar time series, spectral content is assessed based on (estimated) statistics of the underlying process, where these are simply moments of the corresponding power spectral density (PSD). Different metrics have been proposed to compare PSD’s for purposes of spectral approximation, estimation, and system modeling (see [6], [7] and the references therein). However, since spectra are estimated based on integrals, weak* metrics1 are preferable since they provide continuity of statistics to perturbations in the PSD. Earlier metrics and so called, divergence measures, typically fail in this respect (see [7]). Hence, for this reason, optimal mass transport which endows the space of (scalar) probability/mass/power densities with a natural weak* metric—the Wasserstein metric, is of particular interest. Our aim in this paper is to develop one possible such generalization of the Wasserstein metric that allows comparison of matrix-valued density functions in a similar spirit.

The scalar OMT theory has been adapted in [8] to model slowly time-varying changes in power spectra of time series and has been used for statistical estimation, data assimilation, and morphing. While in scalar time series, the power spectral content may drift across frequencies over time (e.g., when considering Doppler effects, echolocation of a moving target, etc.), in vector-valued time series the power spectral content may shift principle directions as well. In fact, such a rotation of the power-specral content is typical in general antenna-arrays when a scatterer changes position with respect to array elements. Therefore, a concept of transport between matrix-valued densities requires that we take into account both, the cost of shifting power across frequencies as well as the cost of rotating the corresponding principle axes. Besides our particular formulation of a “non-commutative” Monge–Kantorovich transportation problem and of a correponding metric, the main results in this paper are i) that the optimal transport can be cast as a convex-optimization problem, ii) the geodesics and transport paths can be determined using convex programming, and iii) the optimal transport plan has support which, in contrast to the classical Monge–Kantorovich setting, is no longer contained on a thin zero-measure set. The relevance of the proposed metric is highlighted in examples on spectral morphing and spectral tracking in the final section of the paper.

II. Preliminaries on Optimal Mass Transport

Consider two probability density functionsμ0 andμ1 supported on ℝ and let ℳ(μ0, μ1) be the set of probability measuresm on ℝ × ℝ withμ0 andμ1 as marginals, i.e.,

m(x,y)dy=μ0(x),m(x,y)dx=μ1(y),m(x,y)0.

Clearly, ℳ(μ0, μ1) is not empty since already the productμ0(x)μ1(y) ∈ ℳ(μ0, μ1). Probability densities are thought of as distributions of mass and the optimal mass transport problem is to determine

Tc(μ0,μ1):=infmM(μ0,μ1)×c(x,y)m(x,y)dxdy(1)

wherec(x, y) is the cost of transporting one unit of mass from locationx toy. In particular, whenc(x, y) = |xy|2, the optimal cost gives rise to the 2-Wasserstein metric

W2(μ0,μ1)=T2(μ0,μ1)12

where

T2(μ0,μ1):=infmM(μ0,μ1)×x-y2m(x,y)dxdy.(2)

In general, (1) is a linear program with dual

supϕ,ψ{(ϕ0(x)μ0(x)-ϕ1(x)μ1(x))dxϕ0(x)-ϕ1(y)c(x,y)}(3)

whereϕ0, ϕ1 are continuous, see [3]. For the quadratic costc(x, y) = |xy|2 and in one spatial dimension, 𝓣2(μ0,μ1) can also be written explicitly in terms of the cumulative distributions functions

Mi(x)=-xμi(x)dxfori=0,1

in the form

T2(μ0,μ1)=01M0-1(t)-M1-1(t)2dt.(4)

In this case, the optimal joint probability densitym ∈ ℳ(μ0, μ1) has support on (x, T (x)) whereT (x) is the sub-differential of a convex lower semi-continuous function, see [3, p. 75]. More specifically,T (x) is uniquely defined by

M0(x)=M1(T(x)).(5)

Interestingly, a geodesicμτ (τ ∈ [0, 1]) betweenμ0 andμ1 can be written explicitly as well in terms of a corresponding cumulative functionMτ, for eachτ, defined via

Mτ((1-τ)x+τT(x))=M0(x).(6)

Indeed, it readily follows that:

W2(μ0,μτ)=τW2(μ0,μ1)W2(μτ,μ1)=(1-τ)W2(μ0,μ1)

and thatμτ (τ ∈ [0, 1]) is a geodesic.

III. Matrix-Valued Optimal Mass Transport

We consider the one-dimensional family of matrix-valued functions

F:={μ(·)forx,μ(x)=μ(x)n×n,μ(x)0,tr(μ(x)dx)=1}.

These are Hermitian, positive semi-definite matrix-valued functions on ℝ normalized so that their trace integrates to 1. They will be referred to as matrix-valued densities and can be thought of as a generalization of probability density functions. The scalar-valued tr(μ) represents mass at locationx. Thus, all elements in ℱ have the same total mass over the support. Below, we motivate a particular cost of transportation between such matrix-valued functions and introduce a suitable generalization of the Monge–Kantorovich OMT to matrix-valued densities.

A. Tensor Product and Partial Trace

Consider twon-dimensional real or complex (Hilbert) spaces ℋ0 and ℋ1, let ℒ(ℋ0) and ℒ(ℋ1) denote the space of linear operators on ℋ0 and ℋ1, respectively, and letμ0 ∈ ℒ(ℋ0) andμ1 ∈ ℒ(ℋ1). Thus, in the present subsection,μi (i ∈ {0, 1}) are fixed matrices. We denote their tensor product byμ0μ1 ∈ ℒ(ℋ0 ⊗ ℋ1) which is formally defined via

μ0μ1:uvμ0uμ1v.

Since our spaces are finite-dimensional this can be identified with the Kronecker product of the corresponding matrix representation of the two operators. The space ℒ(ℋ0 ⊗ ℋ1) is the span of all productsμ0μ1 withμi ∈ ℒ(ℋi) fori ∈ {0, 1}.

Considerμ ∈ ℒ(ℋ0 ⊗ ℋ1). The partial traces tr0 and tr1, or tr0 and tr1 for brevity, are linear maps

tr1:L(H0H1)L(H0):μtr1(μ)tr0:L(H0H1)L(H1):μtr0(μ)

defined uniquely by the property that on simple products they act as follows:

tr1(μ0μ1)=tr(μ1)μ0andtr0(μ0μ1)=tr(μ0)μ1

for anyμ0 ∈ ℒ(ℋ0) andμ1 ∈ ℒ(ℋ1). Alternatively,μ ∈ ℒ(ℋ0 ⊗ ℋ1) can be represented by a matrix [μik,m] of sizen2 ×n2 as it maps a basis elementuivk ∈ ℋ0 ⊗ ℋ1 to Σ,mμik,muvm. Then, the partial trace e.g., tr1(μ) is the represented by then ×n matrix with (i, ℓ)-th entry Σkμik,k, for 1 ≤i, ℓ ≤n. Likewise the (k, m)-th entry of tr0(μ) is Σiμik,im, for 1 ≤k,mn. See [9] for the significance of partial trace in the context of quantum mechanics.

B. Joint Matrix-Valued Density

We now return to considering matrix-valued density functionsμ0,μ1 ∈ ℱ. A naive attempt is to seek a joint densitym ≥ 0 having support on ℝ × ℝ and havingμ0,μ1 as “marginals,” i.e., so that

m(x,y)dy=μ0(x),m(x,y)dx=μ1(y).(7)

However, in contrast to the scalar case, such anm does not exist in general. To see this, consider the case of matrix valued measures

μ0(x)=[12000]δ(x-1)+[00012]δ(x-2),μ1(y)=[14-14-1414]δ(y-1)+[14141414]δ(y-2)

whereδ(·) denotes the Dirac delta. Ifm(x, y) were to exist, its support would be contained in {(1, 1), (1, 2) (2, 1), (2, 2)}. It is easy to see that there cannot be a consistent selection of four 2 × 2 matrices so that, in pairs, they sum up to the coefficients making upμ0(x) andμ1(y).

Thus, any natural definition of a transportation plan requires that the joint density lives in a bigger space. A particular formulation is as follows: we seek

m(x,y)beingn2×n2positivesemi-definitematrix(8a)

for (x, y) ∈ ℝ × ℝ, such that

m0(x,y):=tr1(m(x,y)),m1(x,y):=tr0(m(x,y))(8b)
m0(x,y)dy=μ0(x),m1(x,y)dx=μ1(y)(8c)

and denote

M(μ0,μ1):={m(8a)-(8c)aresatisfied}.

Sinceμ0μ1M(μ0,μ1), the setM(μ0,μ1) is clearly not empty.

We next motivate a suitable transportation cost. This is a functional on the joint densityM(μ0,μ1), just as in the scalar case. However, besides penalizing transport of mass between two pointsx andy, we also impose a penalty on a corresponding rotation as well.

C. Transportation Cost

As indicated earlier, we interpret tr(m(x, y)) as the total “mass” that is being transferred fromx toy. We consider a scalar cost2c(x, y) = (xy)2 and the “mass transference” cost

minmM(μ0,μ1)×c(x,y)tr(m(x,y))dxdy.(9)

This coincides with the optimal transportation cost between scalar-valued densities tr(μ0) and tr(μ1). Thus, if tr(μ0(x)) = tr(μ1(x)), the optimal value of (9) is zero since it reduces to optimal transport between identical scalar marginals. Thus, (9) fails to quantify mismatch of directionality between the given matrix-valued marginals. Below, we introduce a term that penalizes directionality missmatch.

We assume throughout that the marginals are positive definite pointwise. Then, fori ∈ {0, 1}, tr(μi(x)) represents the total mass atx whileμi(x)/tr(μi(x)), normalized to have trace 1, encapsulates directional information. Likewise, for the joint densitym(x, y), assuming thatm(x, y) ≠ 0, we define thenormalized partial traces

tr_0(m(x,y)):=tr0(m(x,y))/tr(m(x,y))tr_1(m(x,y)):=tr1(m(x,y))/tr(m(x,y)).

Their difference captures the directional mismatch between the two partial traces. Hence, we introduce

tr((tr_0-tr_1)m(x,y)F2m(x,y))

to quantify the rotational mismatch and we consider the cost functional

tr((c(x,y)+λ(tr_0-tr_1)m(x,y)F2)m(x,y))

withλ > 0, to weigh in the relative significance of the linear and rotational penalties.

D. Optimal Transportation Problem

In view of the above, we define

T2,λ(μ0,μ1):=minmM(μ0,μ1)×tr((c+λ(tr_0-tr_1)mF2)m)dxdy(10)

withc(x, y) = (xy)2, and show next that (10) is in fact a convex optimization problem.

From the definition

tr_0(m)tr(m)=tr0(m),tr_1(m)tr(m)=tr1(m)

and hence

(tr_0-tr_1)mF2tr(m)=(tr_0-tr_1)mF2tr(m)2tr(m)=(tr0-tr1)mF2tr(m).

Now letm(x, y) = tr(m(x, y)) andm0(x, y),m1(x, y) be as in (8). The expression for the optimal cost in (10) is lower bounded by

minm0,m1,m{(c(x,y)m(x,y)+λm0-m1F2m)dxdym0(x,y),m1(x,y)0,tr(m0(x,y))=tr(m1(x,y))=m(x,y)m0(x,y)dy=μ0(x),m1(x,y)dx=μ1(y)}.(11)

For an optimal triplem̂,0,1 of (11),:=01 is a minimizer of (10) that gives the same optimal value as (11). Thus, the optimal cost in (10) is equivalently written as (11).

Forx > 0, the expression (yz)2/x is jointly convex in the argumentsx, y, z, see e.g., [10, p. 72]. It readily follows that the integral in (11) is a jointly convex functional of its arguments. All additional constraints in (11) are convex as well and, therefore, so is the optimization problem.

IV. On the Geometry of Optimal Mass Transport

An important result in the (scalar) OMT theory is that the transportation plan is the sub-differential of a convex function and has support on a thin zero-measure set, see e.g., [3, p. 92]. This property is not shared by the optimal transportation plan between matrix-valued density functions as we explain next.

In standard scalar OMT with convex transportation cost, the optimal transportation plan has a certain cyclically monotonic property [3]. More specifically, if (x1, y1), (x2, y2) are two points where the transportation plan has support (i.e.,m(x, y) ≠ 0), thenx2> x1 impliesy2y1. The interpretation is that optimal transportation paths of mass elements do not cross. For the case of matrix-valued distributions as in (3), this property may not hold in the same way. However, interestingly, a weaker monotonicity property holds for the supporting set of the optimal matrix transportation plan. The property is defined next and the precise statement is given in Proposition 2 below.

Definition 1

A set 𝓢 ⊂ ℝ2 is called aρ-monotonically nondecreasing, forρ > 0, if for any two points (x1, y1), (x2, y2) ∈ 𝓢, it holds that

(x2-x1)(y1-y2)ρ.

A geometric interpretation for aρ-monotonically non-decreasing set is that if (x1, y1), (x2, y2) ∈ 𝓢 andx2> x1,y1> y2, then the area of the rectangle with vertices (xi, yj) (i, j ∈ {1, 2}) is not larger thanρ. The transportation plan of the scalar-valued optimal transportation problem with a quadratic cost has support on a 0-monotonically non-decreasing set.

Proposition 2

Givenμ0,μ1 ∈ ℱ, letm be the optimal transportation plan in (10) withc(x, y) = (xy)2 andλ > 0. Thenm has support on at most a (4 ·λ)-monotonically nondecreasing set.

Proof

See theAppendix.

Further, the optimal transportation cost 𝓣2, λ(μ0,μ1) satisfies:

  1. 𝓣2 (μ0,μ1) = 𝓣2(μ1,μ0),

  2. 𝓣2 (μ0,μ1) ≥ 0,

  3. 𝓣2 (μ0,μ1) = 0 if and only ifμ0 =μ1.

Thus, although 𝓣2(μ0,μ1) can be used to compare matrix-valued densities, it is not a metric and neither isT2,λ1/2 since the triangular inequality does not hold in general. We will introduce a slightly different formulation of a transportation problem which does give rise to a metric.

A. Optimal Transport on a Subset

In this subsection, we restrict attention to a certain subset of transport plansM(μ0,μ1) and show that the corresponding optimal transportation cost induces a metric. More specifically, let

M0(μ0,μ1):={mm(x,y)=(μ0(x)μ1(y))a(x,y),mM}.

Form(x, y) ∈M0(μ0,μ1),

tr_0(m(x,y)):=μ1(y)/tr(μ1(y))tr_1(m(x,y)):=μ0(x)/tr(μ0(x)).

Givenμ0 andμ1, the “orientation” of the mass ofm(x, y) is fixed. Thus, in this case, the optimal transportation cost is

T2,λ(μ0,μ1):=minmM0(μ0,μ1)tr((c+λ(tr_0-tr_1)m(x,y)F2)m)dxdy.(12)

Proposition 3

For𝓣2 as in (12) withλ > 0 andμ0,μ1 ∈ ℱ,

d2,λ(μ0,μ1):=(T2,λ(μ0,μ1))12(13)

defines a metric on ℱ.

Proof

It is straightforward to see that

d2,λ(μ0,μ1)=d2,λ(μ1,μ0)0

and thatd2 (μ0,μ1) = 0 if and only ifμ0 =μ1. We now show that the triangle inequality holds as well. Forμ0,μ1,μ2 ∈ ℱ, let

m01(x,y)=μ0(x)tr(μ0(x))μ1(y)tr(μ1(y))m01(x,y)m12(y,z)=μ1(y)tr(μ1(y))μ2(z)tr(μ2(z))m12(y,z)

denote the optimal transportation plan for the pairs (μ0,μ1) and (μ1,μ2), respectively, wherem01 andm12 are two (scalar-valued) joint densities on ℝ2 with marginals tr(μ0), tr(μ1) and tr(μ1), tr(μ2), respectively. Givenm01(x, y) andm12(y, z) there is a joint density functionm(x, y, z) on ℝ3 withm01 andm12 as the marginals on the corresponding subspaces [3, p. 208]. We set

m(x,y,z)=μ0(x)tr(μ0(x))μ1(y)tr(μ1(y))μ2(z)tr(μ2(z))m(x,y,z)

and note that it hasm01 andm12 as matrix-valued marginal distributions. Now, letm02(x,z) = (μ0(x)/trμ0(x)) ⊗ (μ2(z)/trμ2(z))m02(x,z) be the marginal ofm(x,y,z) when tracing out they-component. Thism02(x,z) is a possible transportation plan betweenμ0 andμ2. Hence

d2,λ(μ0,μ2)(2((x-z)2+λμ0(x)trμ0(x)-μ2(z)trμ2(z)F2)m02dxdz)12=(3((x-z)2+λμ0(x)trμ0(x)-μ2(z)trμ2(z)F2)mdxdydz)12=(3((x-y+y-z)2+λμ0(x)trμ0(x)-μ1(y)trμ1(y)+μ1(y)trμ1(y)-μ2(z)trμ2(z)F2)mdxdydz)12(2((x-y)2+λμ0(x)trμ0(x)-μ1(y)trμ1(y)F2)m01dxdy)12+(2((y-z)2+λμ1(y)trμ1(y)-μ2(z)trμ2(z)F2)m12dydz)12=d2,λ(μ0,μ1)+d2,λ(μ1,μ2)

where the last inequality is due to the metric property ofL2.

Ifλ = 0, then 𝓣̃2,0(μ0,μ1) is exactly the OMT cost between the scalar-valued densities tr(μ0) and tr(μ1) as was explained earlier. In particular, for an optimal transportation planm(x,y) between tr(μ0) and tr(μ1), the matrix-valued transportation planm(x,y) = (μ0(x)/tr(μ0(x))) ⊗ (μ1(y)/tr(μ1(y)))m(x,y) is optimal betweenμ0 andμ1 which satisfies that 𝓣̃2,0(μ0,μ1) = 𝓣2(tr(μ0), tr(μ1)). Thus, 𝓣̃2,0(μ0,μ1) = 0 if and only if tr(μ0) = tr(μ1). Hence, 𝓣̃2,0 fails to be a metric. Moreover, since for anyλ ≥ 0 it holds that 𝓣2,λ ≤ 𝓣̃2,λ, if tr(μ0) = tr(μ1) then 𝓣2,0(μ0,μ1) also equals to zero.

Proposition 4

Givenμ0,μ1 ∈ ℱ, letm be the optimal transportation plan in (13), thenm has support on at most a (2 ·λ)-monotonically non-decreasing set.

Proof

We need to prove that ifm(x1,y1) ≠ 0 andm(x2,y2) ≠ 0, thenx2 >x1,y1 >y2 implies

(y1-y2)(x2-x1)2λ.(14)

Assume thatm evaluated at the four points (xi,yj) withi,j ∈ {1, 2}, is as follows:

m(xi,yi)=mij·AiBj

with

Ai=μ0(xi)tr(μ1(xi)),Bi=μ0(yi)tr(μ1(yi))

andm11,m22 > 0. The steps of the proof are similar to those of Proposition 2 detailed in theAppendix: first, we assume that Proposition 4 fails and that

(y1-y2)(x2-x1)>2λ.

Then we show that a smaller cost can be incurred by rearranging the “mass.” Consider the situation whenm22m11 first and let be a new transportation plan with

m^(x1,y1)=0m^(x1,y2)=(m11+m12)·A1B2m^(x2,y1)=(m11+m21)·A2B1m^(x2,y2)=(m22-m11)·A2B2.

Then,,m have the same marginals at the four points, the cost incurred bym is

i=12j=12mij((xi-yj)2+λAi-BjF2)(15)

and the cost incurred by is

(m11+m12)((x1-y2)2+λA1-B2F2)+(m11+m21)((x2-y1)2+λA2-B1F2)+(m22-m11)((x2-y2)2+λA2-B2F2).(16)

To show that (15) is larger than (16), after canceling common terms, it suffices to show that

(y1-x1)2+(y2-x2)2+λA1-B1F2+λA2-B2F2(y2-x1)2+(y1-x2)2+λA1-B2F2+λA2-B1F2.

However, the above holds true since

(y1-x1)2+(y2-x2)2+λA1-B1F2+λA2-B2F2(y1-x1)2+(y2-x2)2=(y1-x2)2+(y2-x1)2+2(x2-x1)(y1-y2)>(y1-x2)2+(y2-x1)2+4λ(y1-x2)2+(y1-x2)2+λ(A1-B2F2+A2-B1F2).

The last inequality follows from:

A1-B2F2=tr(A12+B22-2A1B2)tr(A12+B22)2.

The casem11 >m22 proceeds similarly.

V. Examples

We give two different examples where matrix-valued OMT can be directly applied. Both relate to spectral analysis of multivariable time series.3

A. Spectral Morphing

We first highlight the relevance of matrix-valued OMT to spectral analysis with a numerical example on spectral morphing. The idea is to model slowly time-varying changes in the spectral domain by geodesics in a suitable geometry (see e.g., [7], [8]). The use of geodesic interpolation can be thought of as a regularization technique. Indeed, geodesics smoothly shift spectral power across frequencies lessening the possibility of a fade-in fade-out artifacts and OMT, for scalar power spectra, has been used to this end in [7], [8]. Below we exemplify how geodesics appear in matrix-valued OMT.

Starting withμ0,μ1 ∈ ℱ we approximate the geodesic between the two by constructingN − 1 points intermediate matrix densities. To this end, we setμτ0 =μ0 andμτN =μ1, and determineμτk ∈ ℱ fork = 1, …,N − 1 as the solution to

minμτk,0<k<Nk=0N-1T2,λ(μτk+1,μτk).(17)

As noted in Section III-D, this can be obtained numerically via convex programming. The present example uses

μ0=[100.2e-jθ1][1a0(ejθ)2000.01][10.2ejθ01]μ1=[10.201][0.01001a1(ejθ)2][100.21]

with

a0(z)=(z2-1.8cos(π4)z+0.92)(z2-1.4cos(π3)z+0.72)a1(z)=(z2-1.8cos(π6)z+0.92)(z2-1.5cos(2π15)z+0.752)

shown inFig. 1. Since the value of a power spectral density at each point in frequency is a 2 × 2 Hermitian matrix, we have used the (1, 1), (1, 2), and (2, 2) subplots to display the magnitude of the corresponding entries, i.e., |μ(1, 1)|, |μ(1, 2)|, (= |μ(2, 1)|), and |μ(2, 2)|, respectively, and the (2,1) subplot to display the phase ∠μ(1, 2) (= −∠μ(2, 1)).

Fig. 1.

Fig. 1

Subplots (1,1), (1,2), and (2,2) showμi(1, 1), |μi(1, 2)| (same as |μi(2, 1)|) andμi(2, 2). Subplot (2,1) shows ∠(μi(2, 1)) fori ∈ {0, 1} in blue and red, respectively.

The 3-D plots inFig. 2 refer to (17), withλ = 0.1, for an approximation of a geodesic. The two boundary plots represent the power spectraμ0 andμ1 shown in blue and red, respectively, using the same convention about magnitudes and phases explained above. There are in total 7 power spectraμτk,k = 1, …, 7 shown along the geodesic betweenμ0 andμ1, and the time-indices correspond toτk =k/8. It is interesting to observe the smooth shift of the energy over the geodesic path from the one “channel” to the other while, at the same time, the corresponding peak shifts from one frequency to another. One should bear in mind that the so-constructed geodesic is a non-parametric path interpolating/linking the given spectra.

Fig. 2.

Fig. 2

Interpolated resultsμτk fork = 0, …, 8 computed from (17) withμ0 andμ1 as the two boundary points: subplots (1, 1), (1, 2), and (2, 2) showμτk (1, 1), |μτk (1, 2)| (same as |μτk (2, 1)|) andμτk (2, 2), subplot (2, 1) shows ∠(μτk (2, 1)).

B. Regularization Using Geodesics

Consider two time series

x1(t)=a1(t)cos(θ1(t)t+ϕ1a)+a2(t)cos(θ2(t)t+ϕ1b)+w1(t)x2(t)=a2(t)cos(θ1(t)t+ϕ2a)+a1(t)cos(θ2(t)t+ϕ2b)+w2(t)

fort = 1, …, 2000, both consisting of sinusoidal signals with time-varying amplitude and frequency (chirp-like) with added white noisew1(t) andw2(t). The amplitudea1(t) decreases from 1.2 to 0.1 whilea2(t) increases from 0.1 to 1.2. Frequencyθ1(t) decreases from (π/4) to (π/4) − (π/30) whileθ2(t) increases from (π/3) to (π/3) + (π/30). Then [w1(t),w2(t)]′ is white, with independent components, and sampled from a zero-mean Gaussian distribution with covariance[31.51.53]. The initial phases of the sinusoids are randomly selected in [0, 2π].

Since we are dealing with non-stationary time series, we truncate the observed time series, to segments of length equal to 200, to retain resolution. Thus, we letxi,k(t) :=xi(200k+t) withi = 1, 2,k = 0, …, 9 and process separately the segments {xi,k(1),xi,k(2), …,xi,k(200)}. We obtain matrix-valued sample covariances {Rk,−20, …,Rk,0, …,Rk,20} for each. We then determine autoregressive models based on these sample covariances and, thereby, the corresponding power spectral density functions. More specifically, for ℓ = 0, 1, …, 20, we compute

Rk,:=1200i=200[x1,k(i)x2,k(i)][x1,k(i-),x2,k(i-)]

and letRk,-=Rk,. We then solve the Yule-Walker equations

Rk,=i=120Ak,iRk,-i,for=1,,20

for the autoregressive (matricial) coefficientsA1, …,A20. We letΩ:=R0-i=120Ak,iRk,-i be the corresponding innovation variance. The estimated power spectral density function for thekth segment, denoted asμ̂τk, is given asμ̂τk(θ) =Ak(e)−1Ω(Ak(e)*)−1 with

Ak(ejθ)=(I-Ak,1z--Ak,20z20)z=ejθ.

We scaleμ̂τk so that the integral of its trace is normalized to one. Thus, the observation record is used to obtain 10 PSD’s denoted asμ̂τk, fork = 1, …, 10. These represent estimates of the spectral power at intermediary points in time. We change the time scale so thatτ1 = 0 andτ10 = 1. The spectrogram is shown inFig. 3(a).

Fig. 3.

Fig. 3

(a) Shows the estimated spectrogram of the observed time series and (b) corresponds to the geodesic-fitted spectrogram.

We construct an OMT-geodesic to regularize the estimated PSD’s. This idea was proposed and carried out in [8] for scalar time series and scalar PSD’s. For the present matrix-valued setting, the geodesic is obtained by solving

minμτk{k=110T2,λ(μτk,μ^τk)μτkareonanOMTgeodesic}.(18)

An explicit formula of the OMT geodesic is not available. However, in light of Proposition 2, (18) can be approximated for smallλ as follows. Letμ̂τk = tr(μ̂τk) fork = 1, …, 10. These are scalar-valued PSD’s. Letτk denote the corresponding cumulative distribution functions. Forλ small, following [8], we computeμ0 := tr(μ0) andμ1 := tr(μ1) via solving:

minμ0,μ1k=11001((1-τk)M0-1(v)+τkM1-1(v)-M^τk-1(v))2dv

withM0 andM1 representing the cumulative distribution function ofμ0 andμ1, respectively. Then, as was shown in [8], theμτk’s for 1 <k < 10 can be computed via

Mτk-1(v)=(1-τk)M0-1(v)+τkM1-1(v).

The matrix-valued PSD’sμ0 andμ1 are obtained by solving

minμ0,μ1k=11001(1-τk)μ0(M0-1(v))μ0(M0-1(v))+τkμ1(M1-1(v))μ1(M1-1(v))-μ^τk(M^τk-1(v))μ^τk(M^τk-1(v))F2dv

and theμτk’s for 1 <k < 10 are computed via

μτk(Mτk-1(v))μτk(Mτk-1(v))=(1-τk)μ0(M0-1(v))μ0(M0-1(v))+τkμ1(M1-1(v))μ1(M1-1(v)).

We display this geodesic-fitted spectrogram inFig. 3(b). It can be seen that the shift of energy from one channel to another and between resonant frequencies is smoother than that shown inFig. 3(a) (which is a spectrogram based on matricial auto-regressive models).

In order to compare the resolution between the two techniques (spectrogram based on AR-modeling vs. geodesic regularization), we identify the frequency and directionality of peak power for the two power spectral densities and compare principal direction. This we explain next as the result is quite revealing and suggesting.

For eachμ̂τk (θ), we find two frequenciesθ1 andθ2 where the power spectral densities (PSD’s) have locally maximal power, i.e., the two frequencies where tr(μ̂τk (θ)) has the largest peaks. Then we compute the (normalized) eigenvectors corresponding to the dominant eigenvalues ofμτk (θ1) andμτk (θ2), respectively. These eigenvectors are shown inFig. 4(a) using black dashed lines. The red and green plots inFig. 4(a) represent the path of the two eigenvectors asτk increases from 0 to 1. The axes inFig. 4(a) correspond to the two channels/components of the time series andτk. (Should all the power be present in one of the two channels, the eigenvector would line up, accordingly, to one of the axes.) The values of the eigenvector when projected onto the two channels/axes, reflect the energy of the signals in the corresponding channels. Thus, in antenna-array applications, the direction of eigenvectors corresponds to the direction of a scatterer relative to the array. Statistical errors are reflected in the jagged nature of the paths when these are based on a spectrogram as inFig. 4(a). However, when comparing with the eigenvectors of the OMT regularized spectrogram/AR-modelsμτk, the corresponding paths shown inFig. 4(b) are smooth. Direct comparison betweenFig. 4(a) and (b) highlights the potential advantages of using geodesics as a means to regularize power distribution in non-stationary time series.

Fig. 4.

Fig. 4

In (a), the trajectories of the dominant eigenvector ofμ̂τk (θ1) andμ̂τk (θ2) are shown in red and blue, respectively. The corresponding trajectories ofμτk (θ1) andμτk (θ2) are shown in (b).

VI. Conclusion

The geometry of Monge–Kantorovich optimal mass transport provides scalar densities with a natural metric structure (see [11], [12]; also [13] for a systems viewpoint and connections to image analysis and power spectra). Our interest has been in extending such a geometric structure to matrix-valued densities. To this end, we formulated one possible matrix-valued version of the Monge–Kantorovich transportation problem. Computations require convex programming and the framework directly extends the scalar case. An alternative generalization of the Monge–Kantorovich theory to a “non-commutative” setting has been given in the context of the theory of free-probabilities [14]. However, this may not be suitable for matrix-valued power distributions as it is not weak* continuous. Alternative non-commutative generalizations of the Wasserstein metric are given in [15]–[17]. Possible connections between the formulation herein and these alternative viewpoints is the subject of current investigation.

Biographies

graphic file with name nihms757419b1.gif

Lipeng Ning received the B.S. and M.S. degrees in control science and engineering from Beijing Institute of Technology, Beijing, China, in 2006 and 2008, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, in 2013.

He is currently a Postdoc Research Fellow in Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA. His research interests include system identification, spectral estimation, sparse signal recovery, and diffusion magnetic resonance imaging.

graphic file with name nihms757419b2.gif

Tryphon T. Georgiou (F’00) received the Diploma in mechanical and electrical engineering from the National Technical University of Athens, Athens, Greece, in 1979 and the Ph.D. degree from the University of Florida, Gainesville, in 1983.

He is a faculty member in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, and the Vincentine Hermes-Luh Chair. He has served as a Co-Director of the Control Science and Dynamical Systems Center at the University of Minnesota (1990–2014), and on the Board of Governors of the Control Systems Society of the IEEE (2002–2005).

Dr. Georgiou has been a recipient of the George S. Axelby Outstanding Paper Award of the IEEE Control Systems Society for the years 1992, 1999, and 2003. He is a Foreign Member of the Royal Swedish Academy of Engineering Sciences (IVA).

graphic file with name nihms757419b3.gif

Allen Tannenbaum (F’08) is a faculty member in computer science and applied mathematics at Stony Brook University, Stony Brook, NY, USA. He works in control, computer vision, and medical imaging.

Appendix

Appendix: Proof of Proposition 2

We need to show that ifm(x1,y1) ≠ 0 andm(x2,y2) ≠ 0, thenx2 >x1,y1 >y2 implies

(x2-x1)(y1-y2)4λ.(19)

Without loss of generality, let

m(xi,yj)=mij·AijBij(20)

withAij,Bij ≥ 0, tr(Aij) = tr(Bij) = 1 andi,j ∈ {1, 2}. Note thatm12 andm21 could be zero ifm does not have support on the particular point. We assume that the condition in the proposition fails and that

(x2-x1)(y1-y2)>4λ.(21)

We then show that by rearranging the mass, the cost can be reduced.

We first consider the situation whenm22m11. By rearranging the value ofm at the four points (xi,yj) withi,j ∈ {1, 2}, we construct a new transportation plan at these four locations as follows:

m(x1,y1)=0(22a)
m(x1,y2)=(m11+m12)·A12B12(22b)
m(x2,y1)=(m11+m21)·A21B21(22c)
m(x2,y2)=(m22-m11)·A22B22(22d)

where

A12=m11A11+m12A12m11+m12,B12=m11B22+m12B12m11+m12A21=m11A22+m21A21m11+m21,B21=m11B11+m21B21m11+m21.

This new transportation plan has the same marginals asm atx1,x2 andy1,y2. The original cost incurred bym at these four locations is

i=12j=12mij((xi-yj)2+λAij-BijF2)(23)

while the cost incurred by is

(m11+m12)((x1-y2)2+λA12-B12F2)+(m11+m21)((x2-y1)2+λA21-B21F2)+(m22-m11)((x2-y2)2+λA22-B22F2).(24)

After simplification, to show that (23) is larger than (24), it suffices to show that

2m11(x2-x1)(y1-y2)(25)

is larger than

λm11(i=12jiAij-BijF2-i=12Aii-BiiF2)(26a)
+λm12(A12-B12F2-A12-B12F2)(26b)
+λm21(A21-B21F2-A21-B21F2).(26c)

From (21), it follows that the value in (25) is greater than 20λm11. We derive upper bounds for each term in (26). First,

(26a)λm11(A12-B12F2+A21-B21F2)4λm11

where the last inequality follows from the fact that:

A-BF2=tr(A2-2AB+B2)tr(A2+B2)2

for anyA,B ≥ 0 with tr(A) = tr(B) = 1. Now consider

A12-B12F2-A12-B12F2=tr((A12-B12+A12-B12)(A12-B12-A12+B12))=m11m11+m12(A11-B22F2-A12-B12F2-m12m11+m12A11-B22-A12+B12F2)m11m11+m12A11-B22F22m11m11+m12

where the second equality follows from the definitions ofÃ12 and12 while the last inequality is obtained by bounding the terms in the trace. Thus, referring to expressions by the respective equation numbering

(26b)2λm12m11m11+m122λm11.

In a similar manner, (26c) ≤ 2λm11. Therefore

(26)8λm11<(25)

which implies that the cost incurred by is smaller than the cost incurred bym.

For the case wherem11 >m22, we can prove the claim by constructing a new transportation plan with values

m^(x1,y1)=(m11-m22)·A11B11m^(x1,y2)=(m12+m22)·A^12B^12m^(x2,y1)=(m21+m22)·A^21B^21m^(x2,y2)=0

with

A^12=m12A12+m22A11m12+m22,B^12=m12B12+m22B22m12+m22A^21=m21A21+m22A22m21+m22,B^21=m21B21+m22B11m21+m22.

The rest of the proof is carried out in a similar manner.

Footnotes

1

A sequence of measuresn converges weak* to if and only if ∫fdμn → ∫fdμ for all continuous and boundedf.

2

It is interesting to consider functionals of the form tr(c(x, y)m(x, y)), withc(x, y) being matricial, and how to utilize such so as to reflect transportation cost with practical relevance.

Color versions of one or more of the figures in this paper are available online athttp://ieeexplore.ieee.org.

Contributor Information

Lipeng Ning, Email: lning@bwh.harvard.edu, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA.

Tryphon T. Georgiou, Email: tryphon@umn.edu, Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA.

Allen Tannenbaum, Email: allen.tannenbaum@stonybrook.edu, Departments of Computer Science and Applied Mathematics, Stony Brook University, Stony Brook, NY 11794 USA.

References

  • 1.Monge G. Open Library. De l’Imprimerie Royale; 1781. Mémoire sur la théorie des déblais et des remblais. [Google Scholar]
  • 2.Kantorovich L. On the transfer of masses. Dokl Akad Nauk SSSR. 1942;37:227–229. [Google Scholar]
  • 3.Villani C. Topics in optimal transportation. Vol. 58. Providence, RI: American Mathematical Society; 2003. [Google Scholar]
  • 4.Ambrosio L. Lecture notes on optimal transport problems. Mathematical aspects of evolving interfaces. 2003:1–52. [Google Scholar]
  • 5.Rachev S, Rüschendorf L. Mass Transportation Problems: Theory. Vol. 1. New York: Springer-Verlag; 1998. [Google Scholar]
  • 6.Ferrante A, Pavon M, Ramponi F. Hellinger versus Kullback–Leibler multivariable spectrum approximation. IEEE Trans Autom Control. 2008 Apr;53(4):954–967. [Google Scholar]
  • 7.Jiang X, Ning L, Georgiou TT. Distances and Riemannian metrics for multivariate spectral densities. IEEE Trans Autom Control. 2012 Jul;57(7):1723–1735. [Google Scholar]
  • 8.Jiang X, Luo Z, Georgiou T. Geometric methods for spectral analysis. IEEE Trans Signal Process. 2012 Mar;60(3):1064–1074. [Google Scholar]
  • 9.Petz D. Quantum Information Theory and Quantum Statistics (Theoretical and Mathematical Physics) Berlin, Germany: Springer; 2008. [Google Scholar]
  • 10.Boyd S, Vandenberghe L. Convex optimization. Cambridge, MA: Cambridge Univ. Press; 2004. [Google Scholar]
  • 11.Benamou J, Brenier Y. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik. 2000;84(3):375–393. [Google Scholar]
  • 12.Jordan R, Kinderlehrer D, Otto F. The variational formulation of the Fokker-Planck equation. SIAM J Mathemat Anal. 1998;29(1):1–17. [Google Scholar]
  • 13.Tannenbaum E, Georgiou T, Tannenbaum A. Signals and control aspects of optimal mass transport and the Boltzmann entropy. Proc. 49th IEEE Conf. Decision and Control; 2010. pp. 1885–1890. [Google Scholar]
  • 14.Biane P, Voiculescu D. A free probability analogue of the Wasserstein metric on the trace-state space. Geometric and Funct Anal. 2001;11(6):1125–1138. [Google Scholar]
  • 15.Rieffel M. Metrics on state spaces. Doc Math J DMV. 1999;4:559–600. [Google Scholar]
  • 16.Andrea FD, Martinetti P. A view on optimal transport from noncommutative geometry. SIGMA. 2010;6(057):24. [Google Scholar]
  • 17.Martinetti P. Towards a Monge–Kantorovich metric in noncommutative geometry. Zap Nauch Semin POMI. 2013:411. arXiv preprint arXiv:1210.6573. [Google Scholar]

ACTIONS

RESOURCES


[8]ページ先頭

©2009-2025 Movatter.jp