Author manuscript; available in PMC: 2016 Mar 18.

Published in final edited form as: IEEE Trans Automat Contr. 2014 Aug 21;60(2):373–382. doi: 10.1109/TAC.2014.2350171

On Matrix-Valued Monge–Kantorovich Optimal Mass Transport

Lipeng Ning¹, Tryphon T Georgiou², Allen Tannenbaum³

¹Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA

²Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA

³Departments of Computer Science and Applied Mathematics, Stony Brook University, Stony Brook, NY 11794 USA

Issue date 2015 Feb.

PMCID: PMC4798256 NIHMSID: NIHMS757419 PMID:26997667

Abstract

We present a particular formulation of optimal transport for matrix-valued density functions. Our aim is to devise a geometry which is suitable for comparing power spectral densities of multivariable time series. More specifically, the value of a power spectral density at a given frequency, which in the matricial case encodes power as well as directionality, is thought of as a proxy for a “matrix-valued mass density.” Optimal transport aims at establishing a natural metric in the space of such matrix-valued densities which takes into account differences between power across frequencies as well as misalignment of the corresponding principal axes. Thus, our transportation cost includes a cost of transference of power between frequencies together with a cost of rotating the principal directions of matrix densities. The two endpoint matrix-valued densities can be thought of as marginals of a joint matrix-valued density on a tensor product space. This joint density, very much as in the classical Monge–Kantorovich setting, can be thought to specify the transportation plan. Contrary to the classical setting, the optimal transport plan for matrices is no longer supported on a thin zero-measure set.

Index Terms: Convex optimization, matrix-valued density functions, optimal mass-transport

I. Introduction

The problem of optimal mass transport (OMT) dates back to the work of G. Monge in 1781 [1], while its modern formulation is due to L. Kantorovich [2]. In recent years the subject has been developing rapidly due to its intrinsic significance and range of applications in physics, economics, and probability [3]–[5].

Our motivation for studying “matrix-valued transport” originates in the spectral analysis of multi-variable time series. Just as in scalar time series, spectral content is assessed based on (estimated) statistics of the underlying process, where these are simply moments of the corresponding power spectral density (PSD). Different metrics have been proposed to compare PSDs for purposes of spectral approximation, estimation, and system modeling (see [6], [7] and the references therein). However, since spectra are estimated based on integrals, weak* metrics¹ are preferable since they provide continuity of statistics to perturbations in the PSD. Earlier metrics and so-called divergence measures typically fail in this respect (see [7]). For this reason, optimal mass transport, which endows the space of (scalar) probability/mass/power densities with a natural weak* metric (the Wasserstein metric), is of particular interest. Our aim in this paper is to develop one possible generalization of the Wasserstein metric that allows comparison of matrix-valued density functions in a similar spirit.

The scalar OMT theory has been adapted in [8] to model slowly time-varying changes in power spectra of time series and has been used for statistical estimation, data assimilation, and morphing. While in scalar time series the power spectral content may drift across frequencies over time (e.g., when considering Doppler effects, echolocation of a moving target, etc.), in vector-valued time series the power spectral content may shift principal directions as well. In fact, such a rotation of the power-spectral content is typical in general antenna arrays when a scatterer changes position with respect to array elements. Therefore, a concept of transport between matrix-valued densities requires that we take into account both the cost of shifting power across frequencies and the cost of rotating the corresponding principal axes. Besides our particular formulation of a “non-commutative” Monge–Kantorovich transportation problem and of a corresponding metric, the main results in this paper are i) that the optimal transport can be cast as a convex-optimization problem, ii) that the geodesics and transport paths can be determined using convex programming, and iii) that the optimal transport plan has support which, in contrast to the classical Monge–Kantorovich setting, is no longer contained in a thin zero-measure set. The relevance of the proposed metric is highlighted in examples on spectral morphing and spectral tracking in the final section of the paper.

II. Preliminaries on Optimal Mass Transport

Consider two probability density functions μ₀ and μ₁ supported on ℝ and let ℳ(μ₀, μ₁) be the set of probability measures m on ℝ × ℝ with μ₀ and μ₁ as marginals, i.e.,

\int_{ℝ} m (x, y) d y = μ_{0} (x), \int_{ℝ} m (x, y) d x = μ_{1} (y), m (x, y) \geq 0.

Clearly, ℳ(μ₀, μ₁) is not empty since already the product μ₀(x)μ₁(y) ∈ ℳ(μ₀, μ₁). Probability densities are thought of as distributions of mass and the optimal mass transport problem is to determine

T_{c} (μ_{0}, μ_{1}) : = inf_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} c (x, y) m (x, y) d x d y

(1)

where c(x, y) is the cost of transporting one unit of mass from location x to y. In particular, when c(x, y) = |x − y|², the optimal cost gives rise to the 2-Wasserstein metric

W_{2} (μ_{0}, μ_{1}) = T_{2} {(μ_{0}, μ_{1})}^{\frac{1}{2}}

where

T_{2} (μ_{0}, μ_{1}) : = inf_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} {∣ x - y ∣}^{2} m (x, y) d x d y .

(2)

In general, (1) is a linear program with dual

sup_{ϕ_{0}, ϕ_{1}} {\int_{ℝ} (ϕ_{0} (x) μ_{0} (x) - ϕ_{1} (x) μ_{1} (x)) d x ∣ ϕ_{0} (x) - ϕ_{1} (y) \leq c (x, y)}

(3)

where ϕ₀, ϕ₁ are continuous, see [3]. For the quadratic cost c(x, y) = |x − y|² and in one spatial dimension, 𝓣₂(μ₀, μ₁) can also be written explicitly in terms of the cumulative distribution functions

M_{i} (x) = \int_{- \infty}^{x} μ_{i} (s) d s for i = 0, 1

in the form

T_{2} (μ_{0}, μ_{1}) = \int_{0}^{1} {∣ M_{0}^{- 1} (t) - M_{1}^{- 1} (t) ∣}^{2} d t .

(4)
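As an illustration of (4), the following is a minimal numerical sketch, assuming densities sampled on a uniform grid and using NumPy. The helper names `quantile_function` and `wasserstein2_squared`, the grid, and the Gaussian test densities are illustrative choices and not part of the original text.

```python
import numpy as np

def quantile_function(x, density, levels):
    """Approximate M^{-1}(t) for a density sampled on the grid x."""
    dx = np.diff(x)
    # Trapezoidal cumulative integral, normalised to total mass one.
    cdf = np.concatenate(([0.0], np.cumsum(0.5 * (density[1:] + density[:-1]) * dx)))
    cdf /= cdf[-1]
    # Invert the nondecreasing CDF by linear interpolation.
    return np.interp(levels, cdf, x)

def wasserstein2_squared(x, mu0, mu1, num_levels=2000):
    """Midpoint-rule approximation of the integral in (4)."""
    t = (np.arange(num_levels) + 0.5) / num_levels
    q0 = quantile_function(x, mu0, t)
    q1 = quantile_function(x, mu1, t)
    return np.mean((q0 - q1) ** 2)

def gaussian(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Two unit-variance Gaussians whose means differ by 3 (illustrative data).
x = np.linspace(-10.0, 10.0, 4001)
t2 = wasserstein2_squared(x, gaussian(x, -1.0, 1.0), gaussian(x, 2.0, 1.0))
print(t2)
```

For two densities that differ only by a translation, the quantile functions differ by a constant, so the value returned should be close to the squared shift.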

In this case, the optimal joint probability density m ∈ ℳ(μ₀, μ₁) has support on (x, T(x)) where T(x) is the sub-differential of a convex lower semi-continuous function, see [3, p. 75]. More specifically, T(x) is uniquely defined by

M_{0} (x) = M_{1} (T (x)) .

(5)

Interestingly, a geodesic μ_τ (τ ∈ [0, 1]) between μ₀ and μ₁ can be written explicitly as well in terms of a corresponding cumulative function M_τ, for each τ, defined via

M_{τ} ((1 - τ) x + τ T (x)) = M_{0} (x) .

(6)

Indeed, it readily follows that:

\begin{array}{l} W_{2} (μ_{0}, μ_{τ}) = τ W_{2} (μ_{0}, μ_{1}) \\ W_{2} (μ_{τ}, μ_{1}) = (1 - τ) W_{2} (μ_{0}, μ_{1}) \end{array}

and that μ_τ (τ ∈ [0, 1]) is a geodesic.
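The geodesic property can be checked numerically. Combining (5) and (6) gives M_τ⁻¹(t) = (1 − τ) M₀⁻¹(t) + τ M₁⁻¹(t), so the geodesic is a linear interpolation of quantile functions. The sketch below assumes NumPy and two closed-form quantile functions (uniform and exponential) chosen only for illustration; the names `w2` and `geodesic_quantile` are hypothetical.

```python
import numpy as np

# Quantile functions sampled at midpoints of (0, 1); the two distributions
# (Uniform[0, 1] and Exp(1)) are illustrative choices.
levels = (np.arange(1000) + 0.5) / 1000
q0 = levels
q1 = -np.log1p(-levels)

def w2(qa, qb):
    """2-Wasserstein distance from two quantile functions, as in (4)."""
    return np.sqrt(np.mean((qa - qb) ** 2))

def geodesic_quantile(q0, q1, tau):
    """Quantile function of mu_tau: linear interpolation of the end points."""
    return (1.0 - tau) * q0 + tau * q1

tau = 0.3
q_tau = geodesic_quantile(q0, q1, tau)
d01 = w2(q0, q1)
d0t = w2(q0, q_tau)
dt1 = w2(q_tau, q1)
print(d0t / d01, dt1 / d01)
```

The two printed ratios should match τ and 1 − τ up to rounding, which is the pair of identities stated above.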

III. Matrix-Valued Optimal Mass Transport

We consider the one-dimensional family of matrix-valued functions

F : = {μ (\cdot) ∣ for x \in ℝ, μ {(x)}^{*} = μ (x) \in ℂ^{n \times n}, μ (x) \geq 0, tr (\int_{ℝ} μ (x) d x) = 1} .

These are Hermitian, positive semi-definite matrix-valued functions on ℝ normalized so that their trace integrates to 1. They will be referred to as matrix-valued densities and can be thought of as a generalization of probability density functions. The scalar-valued tr(μ(x)) represents mass at location x. Thus, all elements in ℱ have the same total mass over the support. Below, we motivate a particular cost of transportation between such matrix-valued functions and introduce a suitable generalization of the Monge–Kantorovich OMT to matrix-valued densities.

A. Tensor Product and Partial Trace

Consider two n-dimensional real or complex (Hilbert) spaces ℋ₀ and ℋ₁, let ℒ(ℋ₀) and ℒ(ℋ₁) denote the space of linear operators on ℋ₀ and ℋ₁, respectively, and let μ₀ ∈ ℒ(ℋ₀) and μ₁ ∈ ℒ(ℋ₁). Thus, in the present subsection, μ_i (i ∈ {0, 1}) are fixed matrices. We denote their tensor product by μ₀ ⊗ μ₁ ∈ ℒ(ℋ₀ ⊗ ℋ₁), which is formally defined via

μ_{0} \otimes μ_{1} : u \otimes v \mapsto μ_{0} u \otimes μ_{1} v .

Since our spaces are finite-dimensional, this can be identified with the Kronecker product of the corresponding matrix representations of the two operators. The space ℒ(ℋ₀ ⊗ ℋ₁) is the span of all products μ₀ ⊗ μ₁ with μ_i ∈ ℒ(ℋ_i) for i ∈ {0, 1}.

Consider μ ∈ ℒ(ℋ₀ ⊗ ℋ₁). The partial traces tr_ℋ₀ and tr_ℋ₁, or tr₀ and tr₁ for brevity, are linear maps

\begin{array}{l} {tr}_{1} : L (H_{0} \otimes H_{1}) \to L (H_{0}) : μ \mapsto {tr}_{1} (μ) \\ {tr}_{0} : L (H_{0} \otimes H_{1}) \to L (H_{1}) : μ \mapsto {tr}_{0} (μ) \end{array}

defined uniquely by the property that on simple products they act as follows:

{tr}_{1} (μ_{0} \otimes μ_{1}) = tr (μ_{1}) μ_{0} and {tr}_{0} (μ_{0} \otimes μ_{1}) = tr (μ_{0}) μ_{1}

for any μ₀ ∈ ℒ(ℋ₀) and μ₁ ∈ ℒ(ℋ₁). Alternatively, μ ∈ ℒ(ℋ₀ ⊗ ℋ₁) can be represented by a matrix [μ_{ik,ℓm}] of size n² × n², as it maps a basis element u_i ⊗ v_k ∈ ℋ₀ ⊗ ℋ₁ to Σ_{ℓ,m} μ_{ik,ℓm} u_ℓ ⊗ v_m. Then the partial trace tr₁(μ), for example, is represented by the n × n matrix with (i, ℓ)-th entry Σ_k μ_{ik,ℓk}, for 1 ≤ i, ℓ ≤ n. Likewise, the (k, m)-th entry of tr₀(μ) is Σ_i μ_{ik,im}, for 1 ≤ k, m ≤ n. See [9] for the significance of partial trace in the context of quantum mechanics.
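The index formulas for the partial traces translate directly into array operations. The following is a minimal sketch, assuming NumPy and the index ordering of `np.kron` (row index i·n + k for the basis element u_i ⊗ v_k); the function name `partial_traces` is illustrative.

```python
import numpy as np

def partial_traces(m, n):
    """Return (tr_1(m), tr_0(m)) for an n^2 x n^2 matrix m acting on H0 (x) H1."""
    # t[i, k, l, q] holds the entry with row index (i, k) and column index (l, q).
    t = m.reshape(n, n, n, n)
    tr1 = np.einsum("ikjk->ij", t)  # trace out H1: sum over the second factor
    tr0 = np.einsum("ikil->kl", t)  # trace out H0: sum over the first factor
    return tr1, tr0

# Check the defining property on a simple product (random illustrative data).
rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
tr1, tr0 = partial_traces(np.kron(A, B), 2)
print(np.allclose(tr1, np.trace(B) * A), np.allclose(tr0, np.trace(A) * B))
```

By linearity, agreement on simple products A ⊗ B determines both maps on all of ℒ(ℋ₀ ⊗ ℋ₁).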

B. Joint Matrix-Valued Density

We now return to considering matrix-valued density functions μ₀, μ₁ ∈ ℱ. A naive attempt is to seek a joint density m ≥ 0 having support on ℝ × ℝ and having μ₀, μ₁ as “marginals,” i.e., so that

\int_{ℝ} m (x, y) d y = μ_{0} (x), \int_{ℝ} m (x, y) d x = μ_{1} (y) .

(7)

However, in contrast to the scalar case, such an m does not exist in general. To see this, consider the case of matrix-valued measures

\begin{array}{l} μ_{0} (x) = [\begin{matrix} \frac{1}{2} & 0 \\ 0 & 0 \end{matrix}] δ (x - 1) + [\begin{matrix} 0 & 0 \\ 0 & \frac{1}{2} \end{matrix}] δ (x - 2), \\ μ_{1} (y) = [\begin{matrix} \frac{1}{4} & - \frac{1}{4} \\ - \frac{1}{4} & \frac{1}{4} \end{matrix}] δ (y - 1) + [\begin{matrix} \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} \end{matrix}] δ (y - 2) \end{array}

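where δ(·) denotes the Dirac delta. If m(x, y) were to exist, its support would be contained in {(1, 1), (1, 2), (2, 1), (2, 2)}. It is easy to see that there cannot be a consistent selection of four 2 × 2 positive semi-definite matrices so that, in pairs, they sum up to the coefficients making up μ₀(x) and μ₁(y). Indeed, positive semi-definite matrices that sum to a diagonal coefficient of μ₀ with a zero diagonal entry must themselves be diagonal, so the pairs summing to the coefficients of μ₁ would be diagonal as well, whereas those coefficients have nonzero off-diagonal entries.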

Thus, any natural definition of a transportation plan requires that the joint density lives in a bigger space. A particular formulation is as follows: we seek

m (x, y) being an n^{2} \times n^{2} positive semi-definite matrix

(8a)

for (x, y) ∈ ℝ × ℝ, such that

m_{0} (x, y) : = {tr}_{1} (m (x, y)), m_{1} (x, y) : = {tr}_{0} (m (x, y))

(8b)

\int_{ℝ} m_{0} (x, y) d y = μ_{0} (x), \int_{ℝ} m_{1} (x, y) d x = μ_{1} (y)

(8c)

and denote

M (μ_{0}, μ_{1}) : = {m ∣ (8 a) - (8 c) are satisfied} .

Since μ₀ ⊗ μ₁ ∈ M(μ₀, μ₁), the set M(μ₀, μ₁) is clearly not empty.
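For the two-point example above, feasibility of the product plan can be verified directly. The sketch below assumes NumPy and represents each Dirac coefficient by its 2 × 2 matrix; `partial_traces` is an illustrative helper based on the index formulas of Section III-A, not a routine from the original text.

```python
import numpy as np

def partial_traces(m, n):
    """Return (tr_1(m), tr_0(m)) for an n^2 x n^2 matrix in np.kron ordering."""
    t = m.reshape(n, n, n, n)
    return np.einsum("ikjk->ij", t), np.einsum("ikil->kl", t)

# Coefficients of the Dirac masses at x = 1, 2 and y = 1, 2.
mu0 = [np.array([[0.5, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 0.5]])]
mu1 = [np.array([[0.25, -0.25], [-0.25, 0.25]]), np.array([[0.25, 0.25], [0.25, 0.25]])]

# Product plan m(x_i, y_j) = mu0(x_i) (x) mu1(y_j), a 4 x 4 matrix per pair.
plan = [[np.kron(a, b) for b in mu1] for a in mu0]

# Marginals as in (8b)-(8c): partial trace first, then sum over the other variable.
marg0 = [sum(partial_traces(plan[i][j], 2)[0] for j in range(2)) for i in range(2)]
marg1 = [sum(partial_traces(plan[i][j], 2)[1] for i in range(2)) for j in range(2)]
min_eig = min(np.linalg.eigvalsh(plan[i][j]).min() for i in range(2) for j in range(2))
print(all(np.allclose(marg0[i], mu0[i]) for i in range(2)),
      all(np.allclose(marg1[j], mu1[j]) for j in range(2)))
```

The plan lives in the larger space of 4 × 4 matrices, which is what allows both marginals to be matched even though no joint density of 2 × 2 matrices exists.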

We next motivate a suitable transportation cost. This is a functional on the joint density M(μ₀, μ₁), just as in the scalar case. However, besides penalizing transport of mass between two points x and y, we also impose a penalty on a corresponding rotation.

C. Transportation Cost

As indicated earlier, we interpret tr(m(x, y)) as the total “mass” that is being transferred from x to y. We consider a scalar cost² c(x, y) = (x − y)² and the “mass transference” cost

min_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} c (x, y) tr (m (x, y)) d x d y .

(9)

This coincides with the optimal transportation cost between the scalar-valued densities tr(μ₀) and tr(μ₁). In particular, if tr(μ₀(x)) = tr(μ₁(x)) for all x, the optimal value of (9) is zero since it reduces to optimal transport between identical scalar marginals. Thus, (9) fails to quantify mismatch of directionality between the given matrix-valued marginals. Below, we introduce a term that penalizes directionality mismatch.

We assume throughout that the marginals are positive definite pointwise. Then, for i ∈ {0, 1}, tr(μ_i(x)) represents the total mass at x while μ_i(x)/tr(μ_i(x)), normalized to have trace 1, encapsulates directional information. Likewise, for the joint density m(x, y), assuming that m(x, y) ≠ 0, we define the normalized partial traces

\begin{array}{l} {\underline{tr}}_{0} (m (x, y)) : = {tr}_{0} (m (x, y)) / tr (m (x, y)) \\ {\underline{tr}}_{1} (m (x, y)) : = {tr}_{1} (m (x, y)) / tr (m (x, y)) . \end{array}

Their difference captures the directional mismatch between the two partial traces. Hence, we introduce

tr ({‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m (x, y) ‖}_{F}^{2} m (x, y))

to quantify the rotational mismatch and we consider the cost functional

tr ((c (x, y) + λ {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m (x, y) ‖}_{F}^{2}) m (x, y))

with λ > 0, to weigh in the relative significance of the linear and rotational penalties.
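To make the cost functional concrete, the following sketch evaluates it at a single pair (x, y) for a plan of the form (mass) · A ⊗ B with trace-one A and B. It assumes NumPy and the `np.kron` index ordering; the function name `pointwise_cost` and the numerical values are illustrative.

```python
import numpy as np

def pointwise_cost(m, x, y, lam):
    """tr((c(x, y) + lam * ||(tr0_bar - tr1_bar) m||_F^2) m) with c = (x - y)^2."""
    n = int(round(np.sqrt(m.shape[0])))
    t = m.reshape(n, n, n, n)
    tr1 = np.einsum("ikjk->ij", t)   # partial trace over the second factor
    tr0 = np.einsum("ikil->kl", t)   # partial trace over the first factor
    mass = np.trace(m).real
    rotation = np.linalg.norm(tr0 / mass - tr1 / mass, "fro") ** 2
    return ((x - y) ** 2 + lam * rotation) * mass

# Mass 0.2 moved from direction e1 to the direction (1, 1)/sqrt(2).
A = np.array([[1.0, 0.0], [0.0, 0.0]])
B = np.array([[0.5, 0.5], [0.5, 0.5]])
m = 0.2 * np.kron(A, B)
cost_in_place = pointwise_cost(m, 1.0, 1.0, lam=0.1)   # rotation only
cost_shifted = pointwise_cost(m, 1.0, 3.0, lam=0.1)    # shift and rotation
print(cost_in_place, cost_shifted)
```

Here the normalized partial traces are B and A, so the rotational term equals ‖A − B‖_F² = 1 and the cost is positive even when x = y, which is the directional mismatch that (9) alone cannot detect.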

D. Optimal Transportation Problem

In view of the above, we define

T_{2, λ} (μ_{0}, μ_{1}) : = min_{m \in M (μ_{0}, μ_{1})} \int_{ℝ \times ℝ} tr ((c + λ {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m ‖}_{F}^{2}) m) d x d y

(10)

with c(x, y) = (x − y)², and show next that (10) is in fact a convex optimization problem.

From the definition

\begin{array}{l} {\underline{tr}}_{0} (m) tr (m) = {tr}_{0} (m), \\ {\underline{tr}}_{1} (m) tr (m) = {tr}_{1} (m) \end{array}

and hence

\begin{array}{l} {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m ‖}_{F}^{2} tr (m) = \frac{{‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m ‖}_{F}^{2} tr {(m)}^{2}}{tr (m)} \\ = \frac{{‖ ({tr}_{0} - {tr}_{1}) m ‖}_{F}^{2}}{tr (m)} . \end{array}

Now let the scalar m(x, y) := tr(𝐦(x, y)), where 𝐦(x, y) denotes the matrix-valued joint density, and let m₀(x, y), m₁(x, y) be as in (8). The expression for the optimal cost in (10) is lower bounded by

\begin{array}{l} min_{m_{0}, m_{1}, m} {\int (c (x, y) m (x, y) + λ \frac{{‖ m_{0} - m_{1} ‖}_{F}^{2}}{m}) d x d y ∣ \\ m_{0} (x, y), m_{1} (x, y) \geq 0, \\ tr (m_{0} (x, y)) = tr (m_{1} (x, y)) = m (x, y) \\ \int m_{0} (x, y) d y = μ_{0} (x), \int m_{1} (x, y) d x = μ_{1} (y)} . \end{array}

(11)

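For an optimal triple m̂, m̂₀, m̂₁ of (11), the matrix-valued plan 𝐦̂ := (m̂₀ ⊗ m̂₁)/m̂ (defined wherever m̂ ≠ 0) is a minimizer of (10) that gives the same optimal value as (11): since tr(m̂₀) = tr(m̂₁) = m̂, its partial traces are tr₁(𝐦̂) = m̂₀ and tr₀(𝐦̂) = m̂₁, and its trace is m̂. Thus, the optimal cost in (10) is equivalently written as (11).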

For x > 0, the expression (y − z)²/x is jointly convex in the arguments x, y, z, see e.g., [10, p. 72]. It readily follows that the integral in (11) is a jointly convex functional of its arguments. All additional constraints in (11) are convex as well and, therefore, so is the optimization problem.
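The joint convexity of the rotational term in (11) can be spot-checked numerically through the midpoint inequality f((p + q)/2) ≤ (f(p) + f(q))/2. The sketch below assumes NumPy and random real test points; it is a sanity check under these assumptions, not a proof, and the name `rotation_term` is illustrative.

```python
import numpy as np

def rotation_term(m0, m1, mass):
    """The integrand ||m0 - m1||_F^2 / m appearing in (11), for mass > 0."""
    return np.linalg.norm(m0 - m1, "fro") ** 2 / mass

rng = np.random.default_rng(1)

def random_point():
    # Arbitrary 2 x 2 matrices and a positive scalar mass (illustrative data).
    return rng.standard_normal((2, 2)), rng.standard_normal((2, 2)), rng.uniform(0.1, 2.0)

violations = 0
for _ in range(1000):
    a0, a1, am = random_point()
    b0, b1, bm = random_point()
    mid = rotation_term(0.5 * (a0 + b0), 0.5 * (a1 + b1), 0.5 * (am + bm))
    avg = 0.5 * (rotation_term(a0, a1, am) + rotation_term(b0, b1, bm))
    if mid > avg + 1e-9:
        violations += 1
print(violations)
```

In a convex-programming tool this term is typically entered as a quadratic-over-linear expression, which is the form that makes (11) tractable.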

IV. On the Geometry of Optimal Mass Transport

An important result in the (scalar) OMT theory is that the transportation plan is the sub-differential of a convex function and has support on a thin zero-measure set, see e.g., [3, p. 92]. This property is not shared by the optimal transportation plan between matrix-valued density functions as we explain next.

In standard scalar OMT with convex transportation cost, the optimal transportation plan has a certain cyclical monotonicity property [3]. More specifically, if (x₁, y₁), (x₂, y₂) are two points where the transportation plan has support (i.e., m(x, y) ≠ 0), then x₂ > x₁ implies y₂ ≥ y₁. The interpretation is that optimal transportation paths of mass elements do not cross. For the case of matrix-valued distributions as in (3), this property may not hold in the same way. However, interestingly, a weaker monotonicity property holds for the supporting set of the optimal matrix transportation plan. The property is defined next and the precise statement is given in Proposition 2 below.

Definition 1

A set 𝓢 ⊂ ℝ² is called ρ-monotonically non-decreasing, for ρ > 0, if for any two points (x₁, y₁), (x₂, y₂) ∈ 𝓢, it holds that

(x_{2} - x_{1}) (y_{1} - y_{2}) \leq ρ .

A geometric interpretation for a ρ-monotonically non-decreasing set is that if (x₁, y₁), (x₂, y₂) ∈ 𝓢 and x₂ > x₁, y₁ > y₂, then the area of the rectangle with vertices (x_i, y_j) (i, j ∈ {1, 2}) is not larger than ρ. The transportation plan of the scalar-valued optimal transportation problem with a quadratic cost has support on a 0-monotonically non-decreasing set.
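For a finite set of support points, Definition 1 can be checked by brute force. This is a small sketch in plain Python; the function name `is_rho_monotone` and the sample point sets are illustrative.

```python
from itertools import combinations

def is_rho_monotone(points, rho):
    """Check (x2 - x1) * (y1 - y2) <= rho for every pair of points."""
    # The product is unchanged when the two points are swapped,
    # so unordered pairs are sufficient.
    return all((x2 - x1) * (y1 - y2) <= rho
               for (x1, y1), (x2, y2) in combinations(points, 2))

graph_of_increasing_map = [(0.0, 0.0), (1.0, 2.0), (2.0, 3.0)]
crossing_pair = [(0.0, 1.0), (1.0, 0.0)]

print(is_rho_monotone(graph_of_increasing_map, 0.0))  # no crossings at all
print(is_rho_monotone(crossing_pair, 0.0))            # one crossing, rectangle area 1
print(is_rho_monotone(crossing_pair, 1.0))            # tolerated once rho >= 1
```

The crossing pair spans a unit rectangle, so it violates 0-monotonicity but satisfies ρ-monotonicity for any ρ ≥ 1.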

Proposition 2

Given μ₀, μ₁ ∈ ℱ, let m be the optimal transportation plan in (10) with c(x, y) = (x − y)² and λ > 0. Then m has support on at most a (4 · λ)-monotonically non-decreasing set.

Proof

See the Appendix.

Further, the optimal transportation cost 𝓣_{2,λ}(μ₀, μ₁) satisfies:

𝓣_{2,λ}(μ₀, μ₁) = 𝓣_{2,λ}(μ₁, μ₀),
𝓣_{2,λ}(μ₀, μ₁) ≥ 0,
𝓣_{2,λ}(μ₀, μ₁) = 0 if and only if μ₀ = μ₁.

Thus, although 𝓣_{2,λ}(μ₀, μ₁) can be used to compare matrix-valued densities, it is not a metric and neither is $T_{2, λ}^{1 / 2}$, since the triangle inequality does not hold in general. We will introduce a slightly different formulation of a transportation problem which does give rise to a metric.

A. Optimal Transport on a Subset

In this subsection, we restrict attention to a certain subset of the transport plans in M(μ₀, μ₁) and show that the corresponding optimal transportation cost induces a metric. More specifically, let

M_{0} (μ_{0}, μ_{1}) : = {m ∣ m (x, y) = (μ_{0} (x) \otimes μ_{1} (y)) a (x, y), m \in M (μ_{0}, μ_{1})}

where a(x, y) is a scalar-valued function. For m(x, y) ∈ M₀(μ₀, μ₁),

\begin{array}{l} {\underline{tr}}_{0} (m (x, y)) = μ_{1} (y) / tr (μ_{1} (y)) \\ {\underline{tr}}_{1} (m (x, y)) = μ_{0} (x) / tr (μ_{0} (x)) . \end{array}

Given μ₀ and μ₁, the “orientation” of the mass of m(x, y) is fixed. Thus, in this case, the optimal transportation cost is

{\tilde{T}}_{2, λ} (μ_{0}, μ_{1}) : = min_{m \in M_{0} (μ_{0}, μ_{1})} \int tr ((c + λ {‖ ({\underline{tr}}_{0} - {\underline{tr}}_{1}) m (x, y) ‖}_{F}^{2}) m) d x d y .

(12)
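For plans in M₀(μ₀, μ₁) the orientation is fixed, so for discrete measures (12) reduces to a scalar transport problem between the masses tr(μ₀), tr(μ₁) with ground cost (x − y)² + λ‖μ₀(x)/tr(μ₀(x)) − μ₁(y)/tr(μ₁(y))‖_F². The sketch below illustrates this on the two-point example of Section III-B, assuming NumPy. The helper names are hypothetical, and the endpoint search in `solve_two_point` is specific to 2 × 2 couplings; a general instance would need a linear-programming solver.

```python
import numpy as np

def ground_cost(xs, ys, mu0, mu1, lam):
    """c_ij = (x_i - y_j)^2 + lam * ||A_i - B_j||_F^2 with trace-normalised A_i, B_j."""
    A = [m / np.trace(m).real for m in mu0]
    B = [m / np.trace(m).real for m in mu1]
    return np.array([[(x - y) ** 2 + lam * np.linalg.norm(a - b, "fro") ** 2
                      for y, b in zip(ys, B)] for x, a in zip(xs, A)])

def solve_two_point(p, q, cost):
    """Minimise sum_ij cost_ij * m_ij over 2 x 2 couplings of the masses p and q."""
    # Couplings form a segment parametrised by t = m_11; a linear cost is
    # minimised at one of its two end points.
    lo, hi = max(0.0, p[0] - q[1]), min(p[0], q[0])

    def plan(t):
        return np.array([[t, p[0] - t], [q[0] - t, p[1] - q[0] + t]])

    best = min((plan(lo), plan(hi)), key=lambda m: float(np.sum(cost * m)))
    return best, float(np.sum(cost * best))

mu0 = [np.array([[0.5, 0.0], [0.0, 0.0]]), np.array([[0.0, 0.0], [0.0, 0.5]])]
mu1 = [np.array([[0.25, -0.25], [-0.25, 0.25]]), np.array([[0.25, 0.25], [0.25, 0.25]])]
p = [np.trace(m) for m in mu0]
q = [np.trace(m) for m in mu1]
cost = ground_cost([1.0, 2.0], [1.0, 2.0], mu0, mu1, lam=0.1)
best, value = solve_two_point(p, q, cost)
print(best, value)
```

In this example every normalized pair is at squared Frobenius distance 1, so the optimal scalar plan keeps the mass in place and the cost is purely rotational.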

Proposition 3

For 𝓣̃_{2,λ} as in (12) with λ > 0 and μ₀, μ₁ ∈ ℱ,

d_{2, λ} (μ_{0}, μ_{1}) : = {({\tilde{T}}_{2, λ} (μ_{0}, μ_{1}))}^{\frac{1}{2}}

(13)

defines a metric on ℱ.

Proof

It is straightforward to see that

d_{2, λ} (μ_{0}, μ_{1}) = d_{2, λ} (μ_{1}, μ_{0}) \geq 0

and that d_{2,λ}(μ₀, μ₁) = 0 if and only if μ₀ = μ₁. We now show that the triangle inequality holds as well. For μ₀, μ₁, μ₂ ∈ ℱ, let

\begin{array}{l} \mathbf{m}_{01} (x, y) = \frac{μ_{0} (x)}{tr (μ_{0} (x))} \otimes \frac{μ_{1} (y)}{tr (μ_{1} (y))} m_{01} (x, y) \\ \mathbf{m}_{12} (y, z) = \frac{μ_{1} (y)}{tr (μ_{1} (y))} \otimes \frac{μ_{2} (z)}{tr (μ_{2} (z))} m_{12} (y, z) \end{array}

denote the optimal transportation plans for the pairs (μ₀, μ₁) and (μ₁, μ₂), respectively, where boldface marks a matrix-valued plan and m₀₁ and m₁₂ are two (scalar-valued) joint densities on ℝ² with marginals tr(μ₀), tr(μ₁) and tr(μ₁), tr(μ₂), respectively. Given m₀₁(x, y) and m₁₂(y, z), there is a joint density function m(x, y, z) on ℝ³ with m₀₁ and m₁₂ as the marginals on the corresponding subspaces [3, p. 208]. We set

\mathbf{m} (x, y, z) = \frac{μ_{0} (x)}{tr (μ_{0} (x))} \otimes \frac{μ_{1} (y)}{tr (μ_{1} (y))} \otimes \frac{μ_{2} (z)}{tr (μ_{2} (z))} m (x, y, z)

and note that it has 𝐦₀₁ and 𝐦₁₂ as matrix-valued marginal distributions. Now, let 𝐦₀₂(x, z) = (μ₀(x)/tr μ₀(x)) ⊗ (μ₂(z)/tr μ₂(z)) m₀₂(x, z) be the marginal of 𝐦(x, y, z) when tracing out the y-component. This 𝐦₀₂(x, z) is a possible transportation plan between μ₀ and μ₂. Hence

\begin{array}{l} d_{2, λ} (μ_{0}, μ_{2}) \leq {(\int_{ℝ^{2}} ({(x - z)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m_{02} d x d z)}^{\frac{1}{2}} \\ = {(\int_{ℝ^{3}} ({(x - z)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m d x d y d z)}^{\frac{1}{2}} \\ = {(\int_{ℝ^{3}} ({(x - y + y - z)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{1} (y)}{tr μ_{1} (y)} + \frac{μ_{1} (y)}{tr μ_{1} (y)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m d x d y d z)}^{\frac{1}{2}} \\ \leq {(\int_{ℝ^{2}} ({(x - y)}^{2} + λ {‖ \frac{μ_{0} (x)}{tr μ_{0} (x)} - \frac{μ_{1} (y)}{tr μ_{1} (y)} ‖}_{F}^{2}) m_{01} d x d y)}^{\frac{1}{2}} + {(\int_{ℝ^{2}} ({(y - z)}^{2} + λ {‖ \frac{μ_{1} (y)}{tr μ_{1} (y)} - \frac{μ_{2} (z)}{tr μ_{2} (z)} ‖}_{F}^{2}) m_{12} d y d z)}^{\frac{1}{2}} \\ = d_{2, λ} (μ_{0}, μ_{1}) + d_{2, λ} (μ_{1}, μ_{2}) \end{array}

where the last inequality follows from the triangle inequality for the L₂ norm.

If λ = 0, then 𝓣̃_{2,0}(μ₀, μ₁) is exactly the OMT cost between the scalar-valued densities tr(μ₀) and tr(μ₁), as was explained earlier. In particular, for an optimal (scalar) transportation plan m(x, y) between tr(μ₀) and tr(μ₁), the matrix-valued transportation plan 𝐦(x, y) = (μ₀(x)/tr(μ₀(x))) ⊗ (μ₁(y)/tr(μ₁(y))) m(x, y) is optimal between μ₀ and μ₁ and satisfies 𝓣̃_{2,0}(μ₀, μ₁) = 𝓣₂(tr(μ₀), tr(μ₁)). Thus, 𝓣̃_{2,0}(μ₀, μ₁) = 0 if and only if tr(μ₀) = tr(μ₁). Hence, 𝓣̃_{2,0} fails to be a metric. Moreover, since for any λ ≥ 0 it holds that 𝓣_{2,λ} ≤ 𝓣̃_{2,λ}, if tr(μ₀) = tr(μ₁) then 𝓣_{2,0}(μ₀, μ₁) also equals zero.

Proposition 4

Given μ₀, μ₁ ∈ ℱ, let m be the optimal transportation plan in (13). Then m has support on at most a (2 · λ)-monotonically non-decreasing set.

Proof

We need to prove that if m(x₁, y₁) ≠ 0 and m(x₂, y₂) ≠ 0, then x₂ > x₁, y₁ > y₂ implies

(y_{1} - y_{2}) (x_{2} - x_{1}) \leq 2 λ .

(14)

Assume that m evaluated at the four points (x_i, y_j), with i, j ∈ {1, 2}, is as follows:

m (x_{i}, y_{j}) = m_{i j} \cdot A_{i} \otimes B_{j}

with

A_{i} = \frac{μ_{0} (x_{i})}{tr (μ_{0} (x_{i}))}, B_{j} = \frac{μ_{1} (y_{j})}{tr (μ_{1} (y_{j}))}

and m₁₁, m₂₂ > 0. The steps of the proof are similar to those of Proposition 2 detailed in the Appendix: first, we assume that Proposition 4 fails and that

(y_{1} - y_{2}) (x_{2} - x_{1}) > 2 λ .

Then we show that a smaller cost can be incurred by rearranging the “mass.” Consider the situation when m₂₂ ≥ m₁₁ first and let m̂ be a new transportation plan with

\begin{array}{l} \hat{m} (x_{1}, y_{1}) = 0 \\ \hat{m} (x_{1}, y_{2}) = (m_{11} + m_{12}) \cdot A_{1} \otimes B_{2} \\ \hat{m} (x_{2}, y_{1}) = (m_{11} + m_{21}) \cdot A_{2} \otimes B_{1} \\ \hat{m} (x_{2}, y_{2}) = (m_{22} - m_{11}) \cdot A_{2} \otimes B_{2} . \end{array}

Then, m̂ and m have the same marginals at the four points, the cost incurred by m is

\sum_{i = 1}^{2} \sum_{j = 1}^{2} m_{i j} ({(x_{i} - y_{j})}^{2} + λ {‖ A_{i} - B_{j} ‖}_{F}^{2})

(15)

and the cost incurred by m̂ is

(m_{11} + m_{12}) ({(x_{1} - y_{2})}^{2} + λ {‖ A_{1} - B_{2} ‖}_{F}^{2}) + (m_{11} + m_{21}) ({(x_{2} - y_{1})}^{2} + λ {‖ A_{2} - B_{1} ‖}_{F}^{2}) + (m_{22} - m_{11}) ({(x_{2} - y_{2})}^{2} + λ {‖ A_{2} - B_{2} ‖}_{F}^{2}) .

(16)

To show that (15) is larger than (16), after canceling common terms, it suffices to show that

{(y_{1} - x_{1})}^{2} + {(y_{2} - x_{2})}^{2} + λ {‖ A_{1} - B_{1} ‖}_{F}^{2} + λ {‖ A_{2} - B_{2} ‖}_{F}^{2} \geq {(y_{2} - x_{1})}^{2} + {(y_{1} - x_{2})}^{2} + λ {‖ A_{1} - B_{2} ‖}_{F}^{2} + λ {‖ A_{2} - B_{1} ‖}_{F}^{2} .

However, the above holds true since

\begin{array}{l} {(y_{1} - x_{1})}^{2} + {(y_{2} - x_{2})}^{2} + λ {‖ A_{1} - B_{1} ‖}_{F}^{2} + λ {‖ A_{2} - B_{2} ‖}_{F}^{2} \geq {(y_{1} - x_{1})}^{2} + {(y_{2} - x_{2})}^{2} \\ = {(y_{1} - x_{2})}^{2} + {(y_{2} - x_{1})}^{2} + 2 (x_{2} - x_{1}) (y_{1} - y_{2}) \\ > {(y_{1} - x_{2})}^{2} + {(y_{2} - x_{1})}^{2} + 4 λ \\ \geq {(y_{1} - x_{2})}^{2} + {(y_{2} - x_{1})}^{2} + λ ({‖ A_{1} - B_{2} ‖}_{F}^{2} + {‖ A_{2} - B_{1} ‖}_{F}^{2}) . \end{array}

The last inequality follows from:

{‖ A_{1} - B_{2} ‖}_{F}^{2} = tr (A_{1}^{2} + B_{2}^{2} - 2 A_{1} B_{2}) \leq tr (A_{1}^{2} + B_{2}^{2}) \leq 2.

The case m₁₁ > m₂₂ proceeds similarly.
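The bound ‖A − B‖_F² ≤ 2 for trace-one positive semi-definite matrices, used in the last step of the proof, can be spot-checked numerically. The sketch below assumes NumPy and random complex test matrices; the helper name `random_trace_one_psd` and the sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_trace_one_psd(n):
    """Random Hermitian positive semi-definite matrix normalised to trace one."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    p = g @ g.conj().T
    return p / np.trace(p).real

def squared_distance(a, b):
    return np.linalg.norm(a - b, "fro") ** 2

# Largest squared distance observed over random pairs.
worst = max(squared_distance(random_trace_one_psd(3), random_trace_one_psd(3))
            for _ in range(2000))

# Orthogonal rank-one projections attain the bound.
e1 = np.diag([1.0, 0.0, 0.0])
e2 = np.diag([0.0, 1.0, 0.0])
extremal = squared_distance(e1, e2)
print(worst, extremal)
```

The extremal pair shows that the constant 2, and hence the factor 2 · λ in Proposition 4, cannot be improved by this argument alone.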

V. Examples

We give two different examples where matrix-valued OMT can be directly applied. Both relate to spectral analysis of multivariable time series.³

A. Spectral Morphing

We first highlight the relevance of matrix-valued OMT to spectral analysis with a numerical example on spectral morphing. The idea is to model slowly time-varying changes in the spectral domain by geodesics in a suitable geometry (see e.g., [7], [8]). The use of geodesic interpolation can be thought of as a regularization technique. Indeed, geodesics smoothly shift spectral power across frequencies, lessening the possibility of fade-in fade-out artifacts, and OMT, for scalar power spectra, has been used to this end in [7], [8]. Below we exemplify how geodesics appear in matrix-valued OMT.

Starting with μ₀, μ₁ ∈ ℱ, we approximate the geodesic between the two by constructing N − 1 intermediate matrix densities. To this end, we set μ_{τ_0} = μ₀ and μ_{τ_N} = μ₁, and determine μ_{τ_k} ∈ ℱ for k = 1, …, N − 1 as the solution to

min_{μ_{τ_{k}}, 0 < k < N} \sum_{k = 0}^{N - 1} T_{2, λ} (μ_{τ_{k + 1}}, μ_{τ_{k}}) .

(17)

As noted in Section III-D, this can be obtained numerically via convex programming. The present example uses

\begin{array}{l} μ_{0} = [\begin{matrix} 1 & 0 \\ 0.2 e^{- j θ} & 1 \end{matrix}] [\begin{matrix} \frac{1}{{∣ a_{0} (e^{j θ}) ∣}^{2}} & 0 \\ 0 & 0.01 \end{matrix}] [\begin{matrix} 1 & 0.2 e^{j θ} \\ 0 & 1 \end{matrix}] \\ μ_{1} = [\begin{matrix} 1 & 0.2 \\ 0 & 1 \end{matrix}] [\begin{matrix} 0.01 & 0 \\ 0 & \frac{1}{{∣ a_{1} (e^{j θ}) ∣}^{2}} \end{matrix}] [\begin{matrix} 1 & 0 \\ 0.2 & 1 \end{matrix}] \end{array}

with

\begin{array}{l} a_{0} (z) = (z^{2} - 1.8 cos (\frac{π}{4}) z + {0.9}^{2}) (z^{2} - 1.4 cos (\frac{π}{3}) z + {0.7}^{2}) \\ a_{1} (z) = (z^{2} - 1.8 cos (\frac{π}{6}) z + {0.9}^{2}) (z^{2} - 1.5 cos (\frac{2 π}{15}) z + {0.75}^{2}) \end{array}

shown in Fig. 1. Since the value of a power spectral density at each point in frequency is a 2 × 2 Hermitian matrix, we have used the (1, 1), (1, 2), and (2, 2) subplots to display the magnitude of the corresponding entries, i.e., |μ(1, 1)|, |μ(1, 2)| (= |μ(2, 1)|), and |μ(2, 2)|, respectively, and the (2, 1) subplot to display the phase ∠μ(1, 2) (= −∠μ(2, 1)).
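The two example spectra can be evaluated on a frequency grid as follows. This is a minimal sketch assuming NumPy; the grid over [0, π] and the function names are illustrative, and the scaling that makes the trace integrate to one is omitted here.

```python
import numpy as np

def a0(z):
    return ((z**2 - 1.8 * np.cos(np.pi / 4) * z + 0.9**2)
            * (z**2 - 1.4 * np.cos(np.pi / 3) * z + 0.7**2))

def a1(z):
    return ((z**2 - 1.8 * np.cos(np.pi / 6) * z + 0.9**2)
            * (z**2 - 1.5 * np.cos(2 * np.pi / 15) * z + 0.75**2))

def mu0(theta):
    """mu_0 at frequency theta: lower-triangular factor, diagonal, adjoint."""
    z = np.exp(1j * theta)
    left = np.array([[1.0, 0.0], [0.2 * np.exp(-1j * theta), 1.0]])
    middle = np.diag([1.0 / abs(a0(z)) ** 2, 0.01])
    return left @ middle @ left.conj().T

def mu1(theta):
    """mu_1 at frequency theta: upper-triangular factor, diagonal, transpose."""
    z = np.exp(1j * theta)
    left = np.array([[1.0, 0.2], [0.0, 1.0]])
    middle = np.diag([0.01, 1.0 / abs(a1(z)) ** 2])
    return left @ middle @ left.conj().T

thetas = np.linspace(0.0, np.pi, 181)
# Quantities displayed in the subplots: entry magnitudes and off-diagonal phase.
mag12 = np.array([abs(mu0(t)[0, 1]) for t in thetas])
phase12 = np.array([np.angle(mu0(t)[0, 1]) for t in thetas])
trace0 = np.array([np.trace(mu0(t)).real for t in thetas])
print(thetas[np.argmax(trace0)])
```

The zeros of a₀ and a₁ lie inside the unit circle, so both spectra are finite and positive definite at every frequency.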

The 3-D plots in Fig. 2 refer to (17), with λ = 0.1, for an approximation of a geodesic. The two boundary plots represent the power spectra μ₀ and μ₁ shown in blue and red, respectively, using the same convention about magnitudes and phases explained above. There are in total 7 power spectra μ_{τ_k}, k = 1, …, 7, shown along the geodesic between μ₀ and μ₁, and the time indices correspond to τ_k = k/8. It is interesting to observe the smooth shift of the energy over the geodesic path from one “channel” to the other while, at the same time, the corresponding peak shifts from one frequency to another. One should bear in mind that the so-constructed geodesic is a non-parametric path interpolating/linking the given spectra.

B. Regularization Using Geodesics

Consider two time series

\begin{array}{l} x_{1} (t) = a_{1} (t) cos (θ_{1} (t) t + ϕ_{1 a}) + a_{2} (t) cos (θ_{2} (t) t + ϕ_{1 b}) + w_{1} (t) \\ x_{2} (t) = a_{2} (t) cos (θ_{1} (t) t + ϕ_{2 a}) + a_{1} (t) cos (θ_{2} (t) t + ϕ_{2 b}) + w_{2} (t) \end{array}

for t = 1, …, 2000, both consisting of sinusoidal signals with time-varying amplitude and frequency (chirp-like) with added white noise w₁(t) and w₂(t). The amplitude a₁(t) decreases from 1.2 to 0.1 while a₂(t) increases from 0.1 to 1.2. Frequency θ₁(t) decreases from π/4 to π/4 − π/30 while θ₂(t) increases from π/3 to π/3 + π/30. Then [w₁(t), w₂(t)]′ is white, with independent components, and sampled from a zero-mean Gaussian distribution with covariance $[\begin{matrix} 3 & 1.5 \\ 1.5 & 3 \end{matrix}]$. The initial phases of the sinusoids are randomly selected in [0, 2π].

Since we are dealing with non-stationary time series, we truncate the observed time series to segments of length 200 to retain resolution. Thus, we let x_{i,k}(t) := x_i(200k + t) with i = 1, 2, k = 0, …, 9, and process separately the segments {x_{i,k}(1), x_{i,k}(2), …, x_{i,k}(200)}. We obtain matrix-valued sample covariances {R_{k,−20}, …, R_{k,0}, …, R_{k,20}} for each. We then determine autoregressive models based on these sample covariances and, thereby, the corresponding power spectral density functions. More specifically, for ℓ = 0, 1, …, 20, we compute

R_{k, ℓ} : = \frac{1}{200} \sum_{i = ℓ}^{200} [\begin{matrix} x_{1, k} (i) \\ x_{2, k} (i) \end{matrix}] [x_{1, k} (i - ℓ), x_{2, k} (i - ℓ)]

and let $R_{k, - ℓ} = R_{k, ℓ}^{'}$ . We then solve the Yule-Walker equations

R_{k, ℓ} = \sum_{i = 1}^{20} A_{k, i} R_{k, ℓ - i}, for ℓ = 1, \dots, 20

for the autoregressive (matricial) coefficients A_{k,1}, …, A_{k,20}. We let $Ω : = R_{k, 0} - \sum_{i = 1}^{20} A_{k, i} R_{k, - i}$ be the corresponding innovation variance. The estimated power spectral density function for the kth segment, denoted as μ̂_{τ_k}, is given as μ̂_{τ_k}(θ) = A_k(e^{jθ})⁻¹ Ω (A_k(e^{jθ})^*)⁻¹ with

A_{k} (e^{j θ}) = (I - A_{k, 1} z - \dots - A_{k, 20} z^{20}) ∣_{z = e^{j θ}} .

We scale μ̂_{τ_k} so that the integral of its trace is normalized to one. Thus, the observation record is used to obtain 10 PSDs denoted as μ̂_{τ_k}, for k = 1, …, 10. These represent estimates of the spectral power at intermediary points in time. We change the time scale so that τ₁ = 0 and τ₁₀ = 1. The spectrogram is shown in Fig. 3(a).
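The per-segment estimation steps (sample covariances, Yule-Walker equations, AR-based PSD) can be sketched as follows, assuming NumPy and zero-based array indexing. The function names, the model order 2, and the synthetic first-order test process are illustrative assumptions; the text above uses order 20 on segments of length 200.

```python
import numpy as np

def sample_covariances(x, max_lag):
    """R_l = (1/N) sum_t x[t] x[t-l]^T for l = 0..max_lag; x has shape (N, n)."""
    N = x.shape[0]
    return [x[l:].T @ x[:N - l] / N for l in range(max_lag + 1)]

def yule_walker(R, order):
    """Solve R_l = sum_i A_i R_{l-i}, l = 1..order, using R_{-l} = R_l^T."""
    n = R[0].shape[0]

    def lag(k):
        return R[k] if k >= 0 else R[-k].T

    # Block (i, l) of T is R_{l-i}, so that [A_1 ... A_p] T = [R_1 ... R_p].
    T = np.block([[lag(l - i) for l in range(1, order + 1)]
                  for i in range(1, order + 1)])
    rhs = np.hstack([R[l] for l in range(1, order + 1)])
    A_row = np.linalg.solve(T.T, rhs.T).T
    A = [A_row[:, i * n:(i + 1) * n] for i in range(order)]
    omega = R[0] - sum(A[i] @ lag(-(i + 1)) for i in range(order))
    return A, 0.5 * (omega + omega.T)  # symmetrise against rounding

def ar_psd(A, omega, theta):
    """A(e^{j theta})^{-1} Omega (A(e^{j theta})^*)^{-1}, A(z) = I - sum_i A_i z^i."""
    z = np.exp(1j * theta)
    Az = np.eye(omega.shape[0], dtype=complex) - sum(Ai * z ** (i + 1)
                                                     for i, Ai in enumerate(A))
    Az_inv = np.linalg.inv(Az)
    return Az_inv @ omega @ Az_inv.conj().T

# Synthetic two-channel first-order process used only to exercise the code.
rng = np.random.default_rng(3)
F = np.array([[0.5, 0.2], [-0.1, 0.3]])
x = np.zeros((5000, 2))
for t in range(1, 5000):
    x[t] = F @ x[t - 1] + rng.standard_normal(2)

R = sample_covariances(x, 2)
A, omega = yule_walker(R, 2)
psd = ar_psd(A, omega, np.pi / 4)
print(np.round(A[0], 2))
```

With the 1/N normalization the block-Toeplitz covariance matrix is positive semi-definite, so the innovation variance and the resulting PSD are positive semi-definite as well; dividing each PSD by the integral of its trace would then place it in ℱ.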

Fig. 3 — (a) Shows the estimated spectrogram of the observed time series and (b) corresponds to the geodesic-fitted spectrogram.

We construct an OMT-geodesic to regularize the estimated PSD’s. This idea was proposed and carried out in [8] for scalar time series and scalar PSD’s. For the present matrix-valued setting, the geodesic is obtained by solving

min_{μ_{τ_{k}}} {\sum_{k = 1}^{10} T_{2, λ} (μ_{τ_{k}}, {\hat{μ}}_{τ_{k}}) ∣ μ_{τ_{k}} are on an OMT geodesic} .

(18)

An explicit formula for the OMT geodesic is not available. However, in light of Proposition 2, (18) can be approximated for small λ as follows. Let μ̂_{τ_k} := tr(𝛍̂_{τ_k}) for k = 1, …, 10, where boldface denotes the matrix-valued estimates. These are scalar-valued PSDs. Let M̂_{τ_k} denote the corresponding cumulative distribution functions. For λ small, following [8], we compute μ₀ := tr(𝛍₀) and μ₁ := tr(𝛍₁) via solving:

min_{μ_{0}, μ_{1}} \sum_{k = 1}^{10} \int_{0}^{1} {((1 - τ_{k}) M_{0}^{- 1} (v) + τ_{k} M_{1}^{- 1} (v) - {\hat{M}}_{τ_{k}}^{- 1} (v))}^{2} d v

with M₀ and M₁ representing the cumulative distribution functions of μ₀ and μ₁, respectively. Then, as was shown in [8], the μ_{τ_k}’s for 1 < k < 10 can be computed via

M_{τ_{k}}^{- 1} (v) = (1 - τ_{k}) M_{0}^{- 1} (v) + τ_{k} M_{1}^{- 1} (v) .

The matrix-valued PSDs 𝛍₀ and 𝛍₁ are obtained by solving

min_{\boldsymbol{μ}_{0}, \boldsymbol{μ}_{1}} \sum_{k = 1}^{10} \int_{0}^{1} {‖ (1 - τ_{k}) \frac{\boldsymbol{μ}_{0} (M_{0}^{- 1} (v))}{μ_{0} (M_{0}^{- 1} (v))} + τ_{k} \frac{\boldsymbol{μ}_{1} (M_{1}^{- 1} (v))}{μ_{1} (M_{1}^{- 1} (v))} - \frac{{\hat{\boldsymbol{μ}}}_{τ_{k}} ({\hat{M}}_{τ_{k}}^{- 1} (v))}{{\hat{μ}}_{τ_{k}} ({\hat{M}}_{τ_{k}}^{- 1} (v))} ‖}_{F}^{2} d v

and the 𝛍_{τ_k}’s for 1 < k < 10 are computed via

\frac{\boldsymbol{μ}_{τ_{k}} (M_{τ_{k}}^{- 1} (v))}{μ_{τ_{k}} (M_{τ_{k}}^{- 1} (v))} = (1 - τ_{k}) \frac{\boldsymbol{μ}_{0} (M_{0}^{- 1} (v))}{μ_{0} (M_{0}^{- 1} (v))} + τ_{k} \frac{\boldsymbol{μ}_{1} (M_{1}^{- 1} (v))}{μ_{1} (M_{1}^{- 1} (v))} .
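The scalar fitting step is an ordinary least-squares problem: for each level v, the estimated quantiles M̂_{τ_k}⁻¹(v) are fitted by a function that is linear in τ_k. The sketch below assumes NumPy and quantile functions sampled on a common grid of levels; the synthetic data and the name `fit_endpoint_quantiles` are illustrative, and monotonicity of the fitted end points in v is not enforced here.

```python
import numpy as np

def fit_endpoint_quantiles(taus, q_hat):
    """Least-squares fit of M_0^{-1}(v), M_1^{-1}(v) to q_hat[k, v], all v at once."""
    design = np.column_stack([1.0 - taus, taus])     # row k: (1 - tau_k, tau_k)
    coeffs, *_ = np.linalg.lstsq(design, q_hat, rcond=None)
    return coeffs[0], coeffs[1]

# Synthetic noisy quantile functions scattered around a known geodesic.
rng = np.random.default_rng(4)
levels = (np.arange(200) + 0.5) / 200
q0_true = levels
q1_true = 2.0 + levels ** 2
taus = np.linspace(0.0, 1.0, 10)
q_hat = np.outer(1.0 - taus, q0_true) + np.outer(taus, q1_true)
q_hat += 0.01 * rng.standard_normal(q_hat.shape)

q0_fit, q1_fit = fit_endpoint_quantiles(taus, q_hat)
# Quantile functions of the fitted geodesic at the times tau_k.
q_geodesic = np.outer(1.0 - taus, q0_fit) + np.outer(taus, q1_fit)
print(np.max(np.abs(q0_fit - q0_true)), np.max(np.abs(q1_fit - q1_true)))
```

The matrix-valued step has the same structure: once the quantile maps are fixed, the normalized matrices enter the objective linearly in τ_k, so the same design matrix can be reused entrywise.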

We display this geodesic-fitted spectrogram in Fig. 3(b). It can be seen that the shift of energy from one channel to another and between resonant frequencies is smoother than that shown in Fig. 3(a) (which is a spectrogram based on matricial auto-regressive models).

In order to compare the resolution between the two techniques (spectrogram based on AR-modeling vs. geodesic regularization), we identify the frequency and directionality of peak power for the two power spectral densities and compare the principal directions. We explain this next, as the result is quite revealing and suggestive.

For each μ̂_{τ_k}(θ), we find two frequencies θ₁ and θ₂ where the power spectral densities (PSDs) have locally maximal power, i.e., the two frequencies where tr(μ̂_{τ_k}(θ)) has the largest peaks. Then we compute the (normalized) eigenvectors corresponding to the dominant eigenvalues of μ̂_{τ_k}(θ₁) and μ̂_{τ_k}(θ₂), respectively. These eigenvectors are shown in Fig. 4(a) using black dashed lines. The red and green plots in Fig. 4(a) represent the path of the two eigenvectors as τ_k increases from 0 to 1. The axes in Fig. 4(a) correspond to the two channels/components of the time series and τ_k. (Should all the power be present in one of the two channels, the eigenvector would line up, accordingly, with one of the axes.) The values of the eigenvector, when projected onto the two channels/axes, reflect the energy of the signals in the corresponding channels. Thus, in antenna-array applications, the direction of eigenvectors corresponds to the direction of a scatterer relative to the array. Statistical errors are reflected in the jagged nature of the paths when these are based on a spectrogram as in Fig. 4(a). However, when comparing with the eigenvectors of the OMT regularized spectrogram/AR-models μ_{τ_k}, the corresponding paths shown in Fig. 4(b) are smooth. Direct comparison between Fig. 4(a) and (b) highlights the potential advantages of using geodesics as a means to regularize power distribution in non-stationary time series.

Fig. 4. In (a), the trajectories of the dominant eigenvector of μ̂_{τ_k}(θ₁) and μ̂_{τ_k}(θ₂) are shown in red and blue, respectively. The corresponding trajectories of μ_{τ_k}(θ₁) and μ_{τ_k}(θ₂) are shown in (b).

VI. Conclusion

The geometry of Monge–Kantorovich optimal mass transport provides scalar densities with a natural metric structure (see [11], [12]; also [13] for a systems viewpoint and connections to image analysis and power spectra). Our interest has been in extending such a geometric structure to matrix-valued densities. To this end, we formulated one possible matrix-valued version of the Monge–Kantorovich transportation problem. Computations require convex programming and the framework directly extends the scalar case. An alternative generalization of the Monge–Kantorovich theory to a “non-commutative” setting has been given in the context of the theory of free-probabilities [14]. However, this may not be suitable for matrix-valued power distributions as it is not weak* continuous. Alternative non-commutative generalizations of the Wasserstein metric are given in [15]–[17]. Possible connections between the formulation herein and these alternative viewpoints are the subject of current investigation.

Biographies

Lipeng Ning received the B.S. and M.S. degrees in control science and engineering from Beijing Institute of Technology, Beijing, China, in 2006 and 2008, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, in 2013.

He is currently a Postdoctoral Research Fellow at Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA. His research interests include system identification, spectral estimation, sparse signal recovery, and diffusion magnetic resonance imaging.

Tryphon T. Georgiou (F’00) received the Diploma in mechanical and electrical engineering from the National Technical University of Athens, Athens, Greece, in 1979 and the Ph.D. degree from the University of Florida, Gainesville, in 1983.

He is a faculty member in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, and the Vincentine Hermes-Luh Chair. He has served as a Co-Director of the Control Science and Dynamical Systems Center at the University of Minnesota (1990–2014), and on the Board of Governors of the Control Systems Society of the IEEE (2002–2005).

Dr. Georgiou has been a recipient of the George S. Axelby Outstanding Paper Award of the IEEE Control Systems Society for the years 1992, 1999, and 2003. He is a Foreign Member of the Royal Swedish Academy of Engineering Sciences (IVA).

Allen Tannenbaum (F’08) is a faculty member in computer science and applied mathematics at Stony Brook University, Stony Brook, NY, USA. He works in control, computer vision, and medical imaging.

Appendix

Appendix: Proof of Proposition 2

We need to show that if m(x₁, y₁) ≠ 0 and m(x₂, y₂) ≠ 0, then x₂ > x₁, y₁ > y₂ implies

(x_{2} - x_{1}) (y_{1} - y_{2}) \leq 4 λ .

(19)

Without loss of generality, let

m (x_{i}, y_{j}) = m_{i j} \cdot A_{i j} \otimes B_{i j}

(20)

with A_ij, B_ij ≥ 0, tr(A_ij) = tr(B_ij) = 1 and i, j ∈ {1, 2}. Note that m₁₂ and m₂₁ could be zero if m does not have support on the corresponding point. We assume that the condition in the proposition fails and that

(x_{2} - x_{1}) (y_{1} - y_{2}) > 4 λ .

(21)

We then show that by rearranging the mass, the cost can be reduced.

We first consider the situation when m₂₂ ≥ m₁₁. By rearranging the value of m at the four points (x_i, y_j) with i, j ∈ {1, 2}, we construct a new transportation plan m̃ at these four locations as follows:

\tilde{m} (x_{1}, y_{1}) = 0

(22a)

\tilde{m} (x_{1}, y_{2}) = (m_{11} + m_{12}) \cdot {\tilde{A}}_{12} \otimes {\tilde{B}}_{12}

(22b)

\tilde{m} (x_{2}, y_{1}) = (m_{11} + m_{21}) \cdot {\tilde{A}}_{21} \otimes {\tilde{B}}_{21}

(22c)

\tilde{m} (x_{2}, y_{2}) = (m_{22} - m_{11}) \cdot A_{22} \otimes B_{22}

(22d)

where

\begin{array}{ll} {\tilde{A}}_{12} = \frac{m_{11} A_{11} + m_{12} A_{12}}{m_{11} + m_{12}}, & {\tilde{B}}_{12} = \frac{m_{11} B_{22} + m_{12} B_{12}}{m_{11} + m_{12}} \\ {\tilde{A}}_{21} = \frac{m_{11} A_{22} + m_{21} A_{21}}{m_{11} + m_{21}}, & {\tilde{B}}_{21} = \frac{m_{11} B_{11} + m_{21} B_{21}}{m_{11} + m_{21}} . \end{array}

This new transportation plan m̃ has the same marginals as m at x₁, x₂ and y₁, y₂. The original cost incurred by m at these four locations is
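The claim about the marginals can be checked term by term. Since tr(A_ij) = tr(B_ij) = 1, tracing out one factor of m_ij · A_ij ⊗ B_ij leaves m_ij A_ij or m_ij B_ij, so it is enough to compare the following sums (the labels x₁, x₂, y₁, y₂ on the left are only a bookkeeping device for this remark). In each line the left-hand side is the contribution of m̃ and the right-hand side that of m:

```latex
\begin{aligned}
x_1:&\quad (m_{11}+m_{12})\,\tilde{A}_{12} = m_{11}A_{11} + m_{12}A_{12}, \\
x_2:&\quad (m_{11}+m_{21})\,\tilde{A}_{21} + (m_{22}-m_{11})\,A_{22} = m_{21}A_{21} + m_{22}A_{22}, \\
y_1:&\quad (m_{11}+m_{21})\,\tilde{B}_{21} = m_{11}B_{11} + m_{21}B_{21}, \\
y_2:&\quad (m_{11}+m_{12})\,\tilde{B}_{12} + (m_{22}-m_{11})\,B_{22} = m_{12}B_{12} + m_{22}B_{22}.
\end{aligned}
```

Each identity follows by substituting the definitions of the tilde quantities given above.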

\sum_{i = 1}^{2} \sum_{j = 1}^{2} m_{i j} ({(x_{i} - y_{j})}^{2} + λ {‖ A_{i j} - B_{i j} ‖}_{F}^{2})

(23)

while the cost incurred by m̃ is

(m_{11} + m_{12}) ({(x_{1} - y_{2})}^{2} + λ {‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2}) + (m_{11} + m_{21}) ({(x_{2} - y_{1})}^{2} + λ {‖ {\tilde{A}}_{21} - {\tilde{B}}_{21} ‖}_{F}^{2}) + (m_{22} - m_{11}) ({(x_{2} - y_{2})}^{2} + λ {‖ A_{22} - B_{22} ‖}_{F}^{2}) .

(24)

After simplification, to show that (23) is larger than (24), it suffices to show that

2 m_{11} (x_{2} - x_{1}) (y_{1} - y_{2})

(25)

is larger than

λ m_{11} (\sum_{i = 1}^{2} \sum_{j \neq i} {‖ {\tilde{A}}_{i j} - {\tilde{B}}_{i j} ‖}_{F}^{2} - \sum_{i = 1}^{2} {‖ A_{i i} - B_{i i} ‖}_{F}^{2})

(26a)

+ λ m_{12} ({‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2} - {‖ A_{12} - B_{12} ‖}_{F}^{2})

(26b)

+ λ m_{21} ({‖ {\tilde{A}}_{21} - {\tilde{B}}_{21} ‖}_{F}^{2} - {‖ A_{21} - B_{21} ‖}_{F}^{2}) .

(26c)
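The simplification can be traced as follows. In the difference (23) minus (24), the distance terms weighted by m₁₂, m₂₁ and m₂₂ cancel, and what remains is a multiple of m₁₁:

```latex
\begin{aligned}
& m_{11}\left[(x_1-y_1)^2 - (x_1-y_2)^2 - (x_2-y_1)^2 + (x_2-y_2)^2\right] \\
&\quad = 2\,m_{11}\left(-x_1 y_1 + x_1 y_2 + x_2 y_1 - x_2 y_2\right) \\
&\quad = 2\,m_{11}\,(x_2-x_1)(y_1-y_2),
\end{aligned}
```

which is (25). The λ-weighted terms of (24) minus those of (23), grouped by the factors m₁₁, m₁₂ and m₂₁, give (26a), (26b) and (26c), respectively.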

From (21), it follows that the value in (25) is greater than 8λm₁₁. We derive upper bounds for each term in (26). First,

(26 a) \leq λ m_{11} ({‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2} + {‖ {\tilde{A}}_{21} - {\tilde{B}}_{21} ‖}_{F}^{2}) \leq 4 λ m_{11}

where the last inequality follows from the fact that:

{‖ A - B ‖}_{F}^{2} = tr (A^{2} - 2 A B + B^{2}) \leq tr (A^{2} + B^{2}) \leq 2

for any A, B ≥ 0 with tr(A) = tr(B) = 1 (here tr(AB) ≥ 0 because A and B are positive semidefinite, and tr(A²) ≤ (tr A)² = 1). Now consider

\begin{array}{l} {‖ {\tilde{A}}_{12} - {\tilde{B}}_{12} ‖}_{F}^{2} - {‖ A_{12} - B_{12} ‖}_{F}^{2} = tr (({\tilde{A}}_{12} - {\tilde{B}}_{12} + A_{12} - B_{12}) ({\tilde{A}}_{12} - {\tilde{B}}_{12} - A_{12} + B_{12})) \\ = \frac{m_{11}}{m_{11} + m_{12}} ({‖ A_{11} - B_{22} ‖}_{F}^{2} - {‖ A_{12} - B_{12} ‖}_{F}^{2} - \frac{m_{12}}{m_{11} + m_{12}} {‖ A_{11} - B_{22} - A_{12} + B_{12} ‖}_{F}^{2}) \\ \leq \frac{m_{11}}{m_{11} + m_{12}} {‖ A_{11} - B_{22} ‖}_{F}^{2} \\ \leq 2 \frac{m_{11}}{m_{11} + m_{12}} \end{array}

where the second equality follows from the definitions of Ã₁₂ and B̃₁₂ while the last inequality is obtained by bounding the terms in the trace. Thus, referring to expressions by the respective equation numbering

(26 b) \leq 2 λ m_{12} \frac{m_{11}}{m_{11} + m_{12}} \leq 2 λ m_{11} .
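The second equality in the chain preceding this bound is an instance of a standard identity for convex combinations. As a sketch, write U = A₁₁ − B₂₂, V = A₁₂ − B₁₂ and α = m₁₁/(m₁₁ + m₁₂); this shorthand is introduced only for this remark. Then Ã₁₂ − B̃₁₂ = αU + (1 − α)V, and

```latex
\begin{aligned}
\| \alpha U + (1-\alpha) V \|_F^2
  &= \alpha \|U\|_F^2 + (1-\alpha)\|V\|_F^2 - \alpha(1-\alpha)\|U - V\|_F^2 , \\
\| \alpha U + (1-\alpha) V \|_F^2 - \|V\|_F^2
  &= \alpha\left( \|U\|_F^2 - \|V\|_F^2 - (1-\alpha)\|U - V\|_F^2 \right).
\end{aligned}
```

With 1 − α = m₁₂/(m₁₁ + m₁₂), the second line is the expression used above, and dropping the two non-positive terms leaves α‖U‖²_F ≤ 2α.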

In a similar manner, (26c) ≤ 2λm₁₁. Therefore

(26) \leq 8 λ m_{11} < (25)

which implies that the cost incurred by m̃ is smaller than the cost incurred by m.

For the case where m₁₁ > m₂₂, we can prove the claim by constructing a new transportation plan m̂ with values

\begin{array}{l} \hat{m} (x_{1}, y_{1}) = (m_{11} - m_{22}) \cdot A_{11} \otimes B_{11} \\ \hat{m} (x_{1}, y_{2}) = (m_{12} + m_{22}) \cdot {\hat{A}}_{12} \otimes {\hat{B}}_{12} \\ \hat{m} (x_{2}, y_{1}) = (m_{21} + m_{22}) \cdot {\hat{A}}_{21} \otimes {\hat{B}}_{21} \\ \hat{m} (x_{2}, y_{2}) = 0 \end{array}

with

\begin{array}{ll} {\hat{A}}_{12} = \frac{m_{12} A_{12} + m_{22} A_{11}}{m_{12} + m_{22}}, & {\hat{B}}_{12} = \frac{m_{12} B_{12} + m_{22} B_{22}}{m_{12} + m_{22}} \\ {\hat{A}}_{21} = \frac{m_{21} A_{21} + m_{22} A_{22}}{m_{21} + m_{22}}, & {\hat{B}}_{21} = \frac{m_{21} B_{21} + m_{22} B_{11}}{m_{21} + m_{22}} . \end{array}

The rest of the proof is carried out in a similar manner.
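Both rearrangements can also be checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the helper names, the random test data, and the choice x = (0, 1), y = (1, 0), λ = 0.2 (so that (21) holds) are all hypothetical. It uses the observation that the two plans m̃ and m̂ coincide with a single formula that moves the mass min(m₁₁, m₂₂) off the diagonal; indices 0 and 1 in the code stand for the subscripts 1 and 2 in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density(p, rng):
    """Random positive semidefinite p x p matrix with unit trace."""
    g = rng.standard_normal((p, p))
    a = g @ g.T
    return a / np.trace(a)

def plan_cost(plan, x, y, lam):
    """Cost as in (23): plan[(i, j)] = (mass, A, B) at the point (x[i], y[j])."""
    return sum(m * ((x[i] - y[j]) ** 2 + lam * np.linalg.norm(a - b, "fro") ** 2)
               for (i, j), (m, a, b) in plan.items())

def marginals(plan):
    """Lists [sum_j m_ij A_ij] and [sum_i m_ij B_ij] (partial traces, tr A = tr B = 1)."""
    p = plan[0, 0][1].shape[0]
    mx = [np.zeros((p, p)) for _ in range(2)]
    my = [np.zeros((p, p)) for _ in range(2)]
    for (i, j), (m, a, b) in plan.items():
        mx[i] += m * a
        my[j] += m * b
    return mx, my

def rearrange(plan):
    """Plan (22) when m22 >= m11, and the hatted plan when m11 > m22."""
    (m11, a11, b11), (m12, a12, b12) = plan[0, 0], plan[0, 1]
    (m21, a21, b21), (m22, a22, b22) = plan[1, 0], plan[1, 1]
    mu = min(m11, m22)  # mass moved from the diagonal points
    return {
        (0, 0): (m11 - mu, a11, b11),
        (0, 1): (m12 + mu, (m12 * a12 + mu * a11) / (m12 + mu),
                 (m12 * b12 + mu * b22) / (m12 + mu)),
        (1, 0): (m21 + mu, (m21 * a21 + mu * a22) / (m21 + mu),
                 (m21 * b21 + mu * b11) / (m21 + mu)),
        (1, 1): (m22 - mu, a22, b22),
    }

# x2 > x1, y1 > y2 and (x2 - x1)(y1 - y2) = 1 > 4 * lam, i.e. condition (21).
x, y, lam = (0.0, 1.0), (1.0, 0.0), 0.2
results = []
for _ in range(200):
    plan = {(i, j): (rng.uniform(0.1, 1.0), random_density(3, rng), random_density(3, rng))
            for i in range(2) for j in range(2)}
    new = rearrange(plan)
    (mx, my), (nx, ny) = marginals(plan), marginals(new)
    same = all(np.allclose(u, v) for u, v in zip(mx + my, nx + ny))
    results.append((same, plan_cost(plan, x, y, lam) - plan_cost(new, x, y, lam)))
```

Each entry of `results` records whether the marginals agree and the cost saved by the rearrangement; under (21) the argument above predicts that the saving is strictly positive in every trial.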

Footnotes

A sequence of measures dμ_n converges weak* to dμ if and only if ∫ f dμ_n → ∫ f dμ for all continuous and bounded f.

It is interesting to consider functionals of the form tr(c(x, y) m(x, y)), with c(x, y) being matricial, and how to utilize these so as to reflect transportation cost with practical relevance.

Matlab code is available at http://www.ece.umn.edu/users/ningx015/research.html.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Contributor Information

Lipeng Ning, Email: lning@bwh.harvard.edu, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA.

Tryphon T. Georgiou, Email: tryphon@umn.edu, Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA.

Allen Tannenbaum, Email: allen.tannenbaum@stonybrook.edu, Departments of Computer Science and Applied Mathematics, Stony Brook University, Stony Brook, NY 11794 USA.

References

1. Monge G. Open Library. De l’Imprimerie Royale; 1781. Mémoire sur la théorie des déblais et des remblais.
2. Kantorovich L. On the transfer of masses. Dokl Akad Nauk SSSR. 1942;37:227–229.
3. Villani C. Topics in optimal transportation. Vol. 58. Providence, RI: American Mathematical Society; 2003.
4. Ambrosio L. Lecture notes on optimal transport problems. Mathematical aspects of evolving interfaces. 2003:1–52.
5. Rachev S, Rüschendorf L. Mass Transportation Problems: Theory. Vol. 1. New York: Springer-Verlag; 1998.
6. Ferrante A, Pavon M, Ramponi F. Hellinger versus Kullback–Leibler multivariable spectrum approximation. IEEE Trans Autom Control. 2008 Apr;53(4):954–967.
7. Jiang X, Ning L, Georgiou TT. Distances and Riemannian metrics for multivariate spectral densities. IEEE Trans Autom Control. 2012 Jul;57(7):1723–1735.
8. Jiang X, Luo Z, Georgiou T. Geometric methods for spectral analysis. IEEE Trans Signal Process. 2012 Mar;60(3):1064–1074.
9. Petz D. Quantum Information Theory and Quantum Statistics (Theoretical and Mathematical Physics) Berlin, Germany: Springer; 2008.
10. Boyd S, Vandenberghe L. Convex optimization. Cambridge, MA: Cambridge Univ. Press; 2004.
11. Benamou J, Brenier Y. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik. 2000;84(3):375–393.
12. Jordan R, Kinderlehrer D, Otto F. The variational formulation of the Fokker-Planck equation. SIAM J Mathemat Anal. 1998;29(1):1–17.
13. Tannenbaum E, Georgiou T, Tannenbaum A. Signals and control aspects of optimal mass transport and the Boltzmann entropy. Proc. 49th IEEE Conf. Decision and Control; 2010. pp. 1885–1890.
14. Biane P, Voiculescu D. A free probability analogue of the Wasserstein metric on the trace-state space. Geometric and Funct Anal. 2001;11(6):1125–1138.
15. Rieffel M. Metrics on state spaces. Doc Math J DMV. 1999;4:559–600.
16. Andrea FD, Martinetti P. A view on optimal transport from noncommutative geometry. SIGMA. 2010;6(057):24.
17. Martinetti P. Towards a Monge–Kantorovich metric in noncommutative geometry. Zap Nauch Semin POMI. 2013:411. arXiv preprint arXiv:1210.6573.
