
On Matrix-Valued Monge–Kantorovich Optimal Mass Transport
Lipeng Ning
Tryphon T Georgiou,Fellow, IEEE
Allen Tannenbaum,Fellow, IEEE
Issue date 2015 Feb.
Abstract
We present a particular formulation of optimal transport for matrix-valued density functions. Our aim is to devise a geometry which is suitable for comparing power spectral densities of multivariable time series. More specifically, the value of a power spectral density at a given frequency, which in the matricial case encodes power as well as directionality, is thought of as a proxy for a “matrix-valued mass density.” Optimal transport aims at establishing a natural metric in the space of such matrix-valued densities which takes into account differences between power across frequencies as well as misalignment of the corresponding principle axes. Thus, our transportation cost includes a cost of transference of power between frequencies together with a cost of rotating the principle directions of matrix densities. The two endpoint matrix-valued densities can be thought of as marginals of a joint matrix-valued density on a tensor product space. This joint density, very much as in the classical Monge–Kantorovich setting, can be thought to specify the transportation plan. Contrary to the classical setting, the optimal transport plan for matrices is no longer supported on a thin zero-measure set.
Index Terms: Convex optimization, matrix-valued density functions, optimal mass-transport
I. Introduction
The problem of optimal mass transport (OMT) dates back to the work of G. Monge in 1781 [1] while its modern formulation is due to L. Kantorovich [2]. In recent years the subject is developing rapidly due to its intrinsic significance and range of applications in physics, economics, and probability [3]–[5].
Our motivation for studying “matrix-valued transport” originates in the spectral analysis of multi-variable time series. Just as in scalar time series, spectral content is assessed based on (estimated) statistics of the underlying process, where these are simply moments of the corresponding power spectral density (PSD). Different metrics have been proposed to compare PSD’s for purposes of spectral approximation, estimation, and system modeling (see [6], [7] and the references therein). However, since spectra are estimated based on integrals, weak* metrics1 are preferable since they provide continuity of statistics to perturbations in the PSD. Earlier metrics and so called, divergence measures, typically fail in this respect (see [7]). Hence, for this reason, optimal mass transport which endows the space of (scalar) probability/mass/power densities with a natural weak* metric—the Wasserstein metric, is of particular interest. Our aim in this paper is to develop one possible such generalization of the Wasserstein metric that allows comparison of matrix-valued density functions in a similar spirit.
The scalar OMT theory has been adapted in [8] to model slowly time-varying changes in power spectra of time series and has been used for statistical estimation, data assimilation, and morphing. While in scalar time series, the power spectral content may drift across frequencies over time (e.g., when considering Doppler effects, echolocation of a moving target, etc.), in vector-valued time series the power spectral content may shift principle directions as well. In fact, such a rotation of the power-specral content is typical in general antenna-arrays when a scatterer changes position with respect to array elements. Therefore, a concept of transport between matrix-valued densities requires that we take into account both, the cost of shifting power across frequencies as well as the cost of rotating the corresponding principle axes. Besides our particular formulation of a “non-commutative” Monge–Kantorovich transportation problem and of a correponding metric, the main results in this paper are i) that the optimal transport can be cast as a convex-optimization problem, ii) the geodesics and transport paths can be determined using convex programming, and iii) the optimal transport plan has support which, in contrast to the classical Monge–Kantorovich setting, is no longer contained on a thin zero-measure set. The relevance of the proposed metric is highlighted in examples on spectral morphing and spectral tracking in the final section of the paper.
II. Preliminaries on Optimal Mass Transport
Consider two probability density functionsμ0 andμ1 supported on ℝ and let ℳ(μ0, μ1) be the set of probability measuresm on ℝ × ℝ withμ0 andμ1 as marginals, i.e.,
Clearly, ℳ(μ0, μ1) is not empty since already the productμ0(x)μ1(y) ∈ ℳ(μ0, μ1). Probability densities are thought of as distributions of mass and the optimal mass transport problem is to determine
(1) |
wherec(x, y) is the cost of transporting one unit of mass from locationx toy. In particular, whenc(x, y) = |x −y|2, the optimal cost gives rise to the 2-Wasserstein metric
where
(2) |
In general, (1) is a linear program with dual
(3) |
whereϕ0, ϕ1 are continuous, see [3]. For the quadratic costc(x, y) = |x −y|2 and in one spatial dimension, 𝓣2(μ0,μ1) can also be written explicitly in terms of the cumulative distributions functions
in the form
(4) |
In this case, the optimal joint probability densitym ∈ ℳ(μ0, μ1) has support on (x, T (x)) whereT (x) is the sub-differential of a convex lower semi-continuous function, see [3, p. 75]. More specifically,T (x) is uniquely defined by
(5) |
Interestingly, a geodesicμτ (τ ∈ [0, 1]) betweenμ0 andμ1 can be written explicitly as well in terms of a corresponding cumulative functionMτ, for eachτ, defined via
(6) |
Indeed, it readily follows that:
and thatμτ (τ ∈ [0, 1]) is a geodesic.
III. Matrix-Valued Optimal Mass Transport
We consider the one-dimensional family of matrix-valued functions
These are Hermitian, positive semi-definite matrix-valued functions on ℝ normalized so that their trace integrates to 1. They will be referred to as matrix-valued densities and can be thought of as a generalization of probability density functions. The scalar-valued tr(μ) represents mass at locationx. Thus, all elements in ℱ have the same total mass over the support. Below, we motivate a particular cost of transportation between such matrix-valued functions and introduce a suitable generalization of the Monge–Kantorovich OMT to matrix-valued densities.
A. Tensor Product and Partial Trace
Consider twon-dimensional real or complex (Hilbert) spaces ℋ0 and ℋ1, let ℒ(ℋ0) and ℒ(ℋ1) denote the space of linear operators on ℋ0 and ℋ1, respectively, and letμ0 ∈ ℒ(ℋ0) andμ1 ∈ ℒ(ℋ1). Thus, in the present subsection,μi (i ∈ {0, 1}) are fixed matrices. We denote their tensor product byμ0 ⊗μ1 ∈ ℒ(ℋ0 ⊗ ℋ1) which is formally defined via
Since our spaces are finite-dimensional this can be identified with the Kronecker product of the corresponding matrix representation of the two operators. The space ℒ(ℋ0 ⊗ ℋ1) is the span of all productsμ0 ⊗μ1 withμi ∈ ℒ(ℋi) fori ∈ {0, 1}.
Considerμ ∈ ℒ(ℋ0 ⊗ ℋ1). The partial traces trℋ0 and trℋ1, or tr0 and tr1 for brevity, are linear maps
defined uniquely by the property that on simple products they act as follows:
for anyμ0 ∈ ℒ(ℋ0) andμ1 ∈ ℒ(ℋ1). Alternatively,μ ∈ ℒ(ℋ0 ⊗ ℋ1) can be represented by a matrix [μik,ℓm] of sizen2 ×n2 as it maps a basis elementui ⊗vk ∈ ℋ0 ⊗ ℋ1 to Σℓ,mμik,ℓmuℓ ⊗vm. Then, the partial trace e.g., tr1(μ) is the represented by then ×n matrix with (i, ℓ)-th entry Σkμik,ℓk, for 1 ≤i, ℓ ≤n. Likewise the (k, m)-th entry of tr0(μ) is Σiμik,im, for 1 ≤k,m ≤n. See [9] for the significance of partial trace in the context of quantum mechanics.
B. Joint Matrix-Valued Density
We now return to considering matrix-valued density functionsμ0,μ1 ∈ ℱ. A naive attempt is to seek a joint densitym ≥ 0 having support on ℝ × ℝ and havingμ0,μ1 as “marginals,” i.e., so that
(7) |
However, in contrast to the scalar case, such anm does not exist in general. To see this, consider the case of matrix valued measures
whereδ(·) denotes the Dirac delta. Ifm(x, y) were to exist, its support would be contained in {(1, 1), (1, 2) (2, 1), (2, 2)}. It is easy to see that there cannot be a consistent selection of four 2 × 2 matrices so that, in pairs, they sum up to the coefficients making upμ0(x) andμ1(y).
Thus, any natural definition of a transportation plan requires that the joint density lives in a bigger space. A particular formulation is as follows: we seek
(8a) |
for (x, y) ∈ ℝ × ℝ, such that
(8b) |
(8c) |
and denote
Sinceμ0 ⊗μ1 ∈M(μ0,μ1), the setM(μ0,μ1) is clearly not empty.
We next motivate a suitable transportation cost. This is a functional on the joint densityM(μ0,μ1), just as in the scalar case. However, besides penalizing transport of mass between two pointsx andy, we also impose a penalty on a corresponding rotation as well.
C. Transportation Cost
As indicated earlier, we interpret tr(m(x, y)) as the total “mass” that is being transferred fromx toy. We consider a scalar cost2c(x, y) = (x −y)2 and the “mass transference” cost
(9) |
This coincides with the optimal transportation cost between scalar-valued densities tr(μ0) and tr(μ1). Thus, if tr(μ0(x)) = tr(μ1(x)), the optimal value of (9) is zero since it reduces to optimal transport between identical scalar marginals. Thus, (9) fails to quantify mismatch of directionality between the given matrix-valued marginals. Below, we introduce a term that penalizes directionality missmatch.
We assume throughout that the marginals are positive definite pointwise. Then, fori ∈ {0, 1}, tr(μi(x)) represents the total mass atx whileμi(x)/tr(μi(x)), normalized to have trace 1, encapsulates directional information. Likewise, for the joint densitym(x, y), assuming thatm(x, y) ≠ 0, we define thenormalized partial traces
Their difference captures the directional mismatch between the two partial traces. Hence, we introduce
to quantify the rotational mismatch and we consider the cost functional
withλ > 0, to weigh in the relative significance of the linear and rotational penalties.
D. Optimal Transportation Problem
In view of the above, we define
(10) |
withc(x, y) = (x −y)2, and show next that (10) is in fact a convex optimization problem.
From the definition
and hence
Now letm(x, y) = tr(m(x, y)) andm0(x, y),m1(x, y) be as in (8). The expression for the optimal cost in (10) is lower bounded by
(11) |
For an optimal triplem̂,m̂0,m̂1 of (11),m̂:=m̂0 ⊗m̂1 is a minimizer of (10) that gives the same optimal value as (11). Thus, the optimal cost in (10) is equivalently written as (11).
Forx > 0, the expression (y −z)2/x is jointly convex in the argumentsx, y, z, see e.g., [10, p. 72]. It readily follows that the integral in (11) is a jointly convex functional of its arguments. All additional constraints in (11) are convex as well and, therefore, so is the optimization problem.
IV. On the Geometry of Optimal Mass Transport
An important result in the (scalar) OMT theory is that the transportation plan is the sub-differential of a convex function and has support on a thin zero-measure set, see e.g., [3, p. 92]. This property is not shared by the optimal transportation plan between matrix-valued density functions as we explain next.
In standard scalar OMT with convex transportation cost, the optimal transportation plan has a certain cyclically monotonic property [3]. More specifically, if (x1, y1), (x2, y2) are two points where the transportation plan has support (i.e.,m(x, y) ≠ 0), thenx2> x1 impliesy2 ≥y1. The interpretation is that optimal transportation paths of mass elements do not cross. For the case of matrix-valued distributions as in (3), this property may not hold in the same way. However, interestingly, a weaker monotonicity property holds for the supporting set of the optimal matrix transportation plan. The property is defined next and the precise statement is given in Proposition 2 below.
Definition 1
A set 𝓢 ⊂ ℝ2 is called aρ-monotonically nondecreasing, forρ > 0, if for any two points (x1, y1), (x2, y2) ∈ 𝓢, it holds that
A geometric interpretation for aρ-monotonically non-decreasing set is that if (x1, y1), (x2, y2) ∈ 𝓢 andx2> x1,y1> y2, then the area of the rectangle with vertices (xi, yj) (i, j ∈ {1, 2}) is not larger thanρ. The transportation plan of the scalar-valued optimal transportation problem with a quadratic cost has support on a 0-monotonically non-decreasing set.
Proposition 2
Givenμ0,μ1 ∈ ℱ, letm be the optimal transportation plan in (10) withc(x, y) = (x −y)2 andλ > 0. Thenm has support on at most a (4 ·λ)-monotonically nondecreasing set.
Proof
See theAppendix.
Further, the optimal transportation cost 𝓣2, λ(μ0,μ1) satisfies:
𝓣2,λ (μ0,μ1) = 𝓣2,λ(μ1,μ0),
𝓣2,λ (μ0,μ1) ≥ 0,
𝓣2,λ (μ0,μ1) = 0 if and only ifμ0 =μ1.
Thus, although 𝓣2,λ(μ0,μ1) can be used to compare matrix-valued densities, it is not a metric and neither is since the triangular inequality does not hold in general. We will introduce a slightly different formulation of a transportation problem which does give rise to a metric.
A. Optimal Transport on a Subset
In this subsection, we restrict attention to a certain subset of transport plansM(μ0,μ1) and show that the corresponding optimal transportation cost induces a metric. More specifically, let
Form(x, y) ∈M0(μ0,μ1),
Givenμ0 andμ1, the “orientation” of the mass ofm(x, y) is fixed. Thus, in this case, the optimal transportation cost is
(12) |
Proposition 3
For𝓣2,λ as in (12) withλ > 0 andμ0,μ1 ∈ ℱ,
(13) |
defines a metric on ℱ.
Proof
It is straightforward to see that
and thatd2,λ (μ0,μ1) = 0 if and only ifμ0 =μ1. We now show that the triangle inequality holds as well. Forμ0,μ1,μ2 ∈ ℱ, let
denote the optimal transportation plan for the pairs (μ0,μ1) and (μ1,μ2), respectively, wherem01 andm12 are two (scalar-valued) joint densities on ℝ2 with marginals tr(μ0), tr(μ1) and tr(μ1), tr(μ2), respectively. Givenm01(x, y) andm12(y, z) there is a joint density functionm(x, y, z) on ℝ3 withm01 andm12 as the marginals on the corresponding subspaces [3, p. 208]. We set
and note that it hasm01 andm12 as matrix-valued marginal distributions. Now, letm02(x,z) = (μ0(x)/trμ0(x)) ⊗ (μ2(z)/trμ2(z))m02(x,z) be the marginal ofm(x,y,z) when tracing out they-component. Thism02(x,z) is a possible transportation plan betweenμ0 andμ2. Hence
where the last inequality is due to the metric property ofL2.
Ifλ = 0, then 𝓣̃2,0(μ0,μ1) is exactly the OMT cost between the scalar-valued densities tr(μ0) and tr(μ1) as was explained earlier. In particular, for an optimal transportation planm(x,y) between tr(μ0) and tr(μ1), the matrix-valued transportation planm(x,y) = (μ0(x)/tr(μ0(x))) ⊗ (μ1(y)/tr(μ1(y)))m(x,y) is optimal betweenμ0 andμ1 which satisfies that 𝓣̃2,0(μ0,μ1) = 𝓣2(tr(μ0), tr(μ1)). Thus, 𝓣̃2,0(μ0,μ1) = 0 if and only if tr(μ0) = tr(μ1). Hence, 𝓣̃2,0 fails to be a metric. Moreover, since for anyλ ≥ 0 it holds that 𝓣2,λ ≤ 𝓣̃2,λ, if tr(μ0) = tr(μ1) then 𝓣2,0(μ0,μ1) also equals to zero.
Proposition 4
Givenμ0,μ1 ∈ ℱ, letm be the optimal transportation plan in (13), thenm has support on at most a (2 ·λ)-monotonically non-decreasing set.
Proof
We need to prove that ifm(x1,y1) ≠ 0 andm(x2,y2) ≠ 0, thenx2 >x1,y1 >y2 implies
(14) |
Assume thatm evaluated at the four points (xi,yj) withi,j ∈ {1, 2}, is as follows:
with
andm11,m22 > 0. The steps of the proof are similar to those of Proposition 2 detailed in theAppendix: first, we assume that Proposition 4 fails and that
Then we show that a smaller cost can be incurred by rearranging the “mass.” Consider the situation whenm22 ≥m11 first and letm̂ be a new transportation plan with
Then,m̂,m have the same marginals at the four points, the cost incurred bym is
(15) |
and the cost incurred bym̂ is
(16) |
To show that (15) is larger than (16), after canceling common terms, it suffices to show that
However, the above holds true since
The last inequality follows from:
The casem11 >m22 proceeds similarly.
V. Examples
We give two different examples where matrix-valued OMT can be directly applied. Both relate to spectral analysis of multivariable time series.3
A. Spectral Morphing
We first highlight the relevance of matrix-valued OMT to spectral analysis with a numerical example on spectral morphing. The idea is to model slowly time-varying changes in the spectral domain by geodesics in a suitable geometry (see e.g., [7], [8]). The use of geodesic interpolation can be thought of as a regularization technique. Indeed, geodesics smoothly shift spectral power across frequencies lessening the possibility of a fade-in fade-out artifacts and OMT, for scalar power spectra, has been used to this end in [7], [8]. Below we exemplify how geodesics appear in matrix-valued OMT.
Starting withμ0,μ1 ∈ ℱ we approximate the geodesic between the two by constructingN − 1 points intermediate matrix densities. To this end, we setμτ0 =μ0 andμτN =μ1, and determineμτk ∈ ℱ fork = 1, …,N − 1 as the solution to
(17) |
As noted in Section III-D, this can be obtained numerically via convex programming. The present example uses
with
shown inFig. 1. Since the value of a power spectral density at each point in frequency is a 2 × 2 Hermitian matrix, we have used the (1, 1), (1, 2), and (2, 2) subplots to display the magnitude of the corresponding entries, i.e., |μ(1, 1)|, |μ(1, 2)|, (= |μ(2, 1)|), and |μ(2, 2)|, respectively, and the (2,1) subplot to display the phase ∠μ(1, 2) (= −∠μ(2, 1)).
Fig. 1.
Subplots (1,1), (1,2), and (2,2) showμi(1, 1), |μi(1, 2)| (same as |μi(2, 1)|) andμi(2, 2). Subplot (2,1) shows ∠(μi(2, 1)) fori ∈ {0, 1} in blue and red, respectively.
The 3-D plots inFig. 2 refer to (17), withλ = 0.1, for an approximation of a geodesic. The two boundary plots represent the power spectraμ0 andμ1 shown in blue and red, respectively, using the same convention about magnitudes and phases explained above. There are in total 7 power spectraμτk,k = 1, …, 7 shown along the geodesic betweenμ0 andμ1, and the time-indices correspond toτk =k/8. It is interesting to observe the smooth shift of the energy over the geodesic path from the one “channel” to the other while, at the same time, the corresponding peak shifts from one frequency to another. One should bear in mind that the so-constructed geodesic is a non-parametric path interpolating/linking the given spectra.
Fig. 2.
Interpolated resultsμτk fork = 0, …, 8 computed from (17) withμ0 andμ1 as the two boundary points: subplots (1, 1), (1, 2), and (2, 2) showμτk (1, 1), |μτk (1, 2)| (same as |μτk (2, 1)|) andμτk (2, 2), subplot (2, 1) shows ∠(μτk (2, 1)).
B. Regularization Using Geodesics
Consider two time series
fort = 1, …, 2000, both consisting of sinusoidal signals with time-varying amplitude and frequency (chirp-like) with added white noisew1(t) andw2(t). The amplitudea1(t) decreases from 1.2 to 0.1 whilea2(t) increases from 0.1 to 1.2. Frequencyθ1(t) decreases from (π/4) to (π/4) − (π/30) whileθ2(t) increases from (π/3) to (π/3) + (π/30). Then [w1(t),w2(t)]′ is white, with independent components, and sampled from a zero-mean Gaussian distribution with covariance. The initial phases of the sinusoids are randomly selected in [0, 2π].
Since we are dealing with non-stationary time series, we truncate the observed time series, to segments of length equal to 200, to retain resolution. Thus, we letxi,k(t) :=xi(200k+t) withi = 1, 2,k = 0, …, 9 and process separately the segments {xi,k(1),xi,k(2), …,xi,k(200)}. We obtain matrix-valued sample covariances {Rk,−20, …,Rk,0, …,Rk,20} for each. We then determine autoregressive models based on these sample covariances and, thereby, the corresponding power spectral density functions. More specifically, for ℓ = 0, 1, …, 20, we compute
and let. We then solve the Yule-Walker equations
for the autoregressive (matricial) coefficientsA1, …,A20. We let be the corresponding innovation variance. The estimated power spectral density function for thekth segment, denoted asμ̂τk, is given asμ̂τk(θ) =Ak(ejθ)−1Ω(Ak(ejθ)*)−1 with
We scaleμ̂τk so that the integral of its trace is normalized to one. Thus, the observation record is used to obtain 10 PSD’s denoted asμ̂τk, fork = 1, …, 10. These represent estimates of the spectral power at intermediary points in time. We change the time scale so thatτ1 = 0 andτ10 = 1. The spectrogram is shown inFig. 3(a).
Fig. 3.
(a) Shows the estimated spectrogram of the observed time series and (b) corresponds to the geodesic-fitted spectrogram.
We construct an OMT-geodesic to regularize the estimated PSD’s. This idea was proposed and carried out in [8] for scalar time series and scalar PSD’s. For the present matrix-valued setting, the geodesic is obtained by solving
(18) |
An explicit formula of the OMT geodesic is not available. However, in light of Proposition 2, (18) can be approximated for smallλ as follows. Letμ̂τk = tr(μ̂τk) fork = 1, …, 10. These are scalar-valued PSD’s. LetM̂τk denote the corresponding cumulative distribution functions. Forλ small, following [8], we computeμ0 := tr(μ0) andμ1 := tr(μ1) via solving:
withM0 andM1 representing the cumulative distribution function ofμ0 andμ1, respectively. Then, as was shown in [8], theμτk’s for 1 <k < 10 can be computed via
The matrix-valued PSD’sμ0 andμ1 are obtained by solving
and theμτk’s for 1 <k < 10 are computed via
We display this geodesic-fitted spectrogram inFig. 3(b). It can be seen that the shift of energy from one channel to another and between resonant frequencies is smoother than that shown inFig. 3(a) (which is a spectrogram based on matricial auto-regressive models).
In order to compare the resolution between the two techniques (spectrogram based on AR-modeling vs. geodesic regularization), we identify the frequency and directionality of peak power for the two power spectral densities and compare principal direction. This we explain next as the result is quite revealing and suggesting.
For eachμ̂τk (θ), we find two frequenciesθ1 andθ2 where the power spectral densities (PSD’s) have locally maximal power, i.e., the two frequencies where tr(μ̂τk (θ)) has the largest peaks. Then we compute the (normalized) eigenvectors corresponding to the dominant eigenvalues ofμτk (θ1) andμτk (θ2), respectively. These eigenvectors are shown inFig. 4(a) using black dashed lines. The red and green plots inFig. 4(a) represent the path of the two eigenvectors asτk increases from 0 to 1. The axes inFig. 4(a) correspond to the two channels/components of the time series andτk. (Should all the power be present in one of the two channels, the eigenvector would line up, accordingly, to one of the axes.) The values of the eigenvector when projected onto the two channels/axes, reflect the energy of the signals in the corresponding channels. Thus, in antenna-array applications, the direction of eigenvectors corresponds to the direction of a scatterer relative to the array. Statistical errors are reflected in the jagged nature of the paths when these are based on a spectrogram as inFig. 4(a). However, when comparing with the eigenvectors of the OMT regularized spectrogram/AR-modelsμτk, the corresponding paths shown inFig. 4(b) are smooth. Direct comparison betweenFig. 4(a) and (b) highlights the potential advantages of using geodesics as a means to regularize power distribution in non-stationary time series.
Fig. 4.
In (a), the trajectories of the dominant eigenvector ofμ̂τk (θ1) andμ̂τk (θ2) are shown in red and blue, respectively. The corresponding trajectories ofμτk (θ1) andμτk (θ2) are shown in (b).
VI. Conclusion
The geometry of Monge–Kantorovich optimal mass transport provides scalar densities with a natural metric structure (see [11], [12]; also [13] for a systems viewpoint and connections to image analysis and power spectra). Our interest has been in extending such a geometric structure to matrix-valued densities. To this end, we formulated one possible matrix-valued version of the Monge–Kantorovich transportation problem. Computations require convex programming and the framework directly extends the scalar case. An alternative generalization of the Monge–Kantorovich theory to a “non-commutative” setting has been given in the context of the theory of free-probabilities [14]. However, this may not be suitable for matrix-valued power distributions as it is not weak* continuous. Alternative non-commutative generalizations of the Wasserstein metric are given in [15]–[17]. Possible connections between the formulation herein and these alternative viewpoints is the subject of current investigation.
Biographies
Lipeng Ning received the B.S. and M.S. degrees in control science and engineering from Beijing Institute of Technology, Beijing, China, in 2006 and 2008, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, in 2013.
He is currently a Postdoc Research Fellow in Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA. His research interests include system identification, spectral estimation, sparse signal recovery, and diffusion magnetic resonance imaging.
Tryphon T. Georgiou (F’00) received the Diploma in mechanical and electrical engineering from the National Technical University of Athens, Athens, Greece, in 1979 and the Ph.D. degree from the University of Florida, Gainesville, in 1983.
He is a faculty member in the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, and the Vincentine Hermes-Luh Chair. He has served as a Co-Director of the Control Science and Dynamical Systems Center at the University of Minnesota (1990–2014), and on the Board of Governors of the Control Systems Society of the IEEE (2002–2005).
Dr. Georgiou has been a recipient of the George S. Axelby Outstanding Paper Award of the IEEE Control Systems Society for the years 1992, 1999, and 2003. He is a Foreign Member of the Royal Swedish Academy of Engineering Sciences (IVA).
Allen Tannenbaum (F’08) is a faculty member in computer science and applied mathematics at Stony Brook University, Stony Brook, NY, USA. He works in control, computer vision, and medical imaging.
Appendix
Appendix: Proof of Proposition 2
We need to show that ifm(x1,y1) ≠ 0 andm(x2,y2) ≠ 0, thenx2 >x1,y1 >y2 implies
(19) |
Without loss of generality, let
(20) |
withAij,Bij ≥ 0, tr(Aij) = tr(Bij) = 1 andi,j ∈ {1, 2}. Note thatm12 andm21 could be zero ifm does not have support on the particular point. We assume that the condition in the proposition fails and that
(21) |
We then show that by rearranging the mass, the cost can be reduced.
We first consider the situation whenm22 ≥m11. By rearranging the value ofm at the four points (xi,yj) withi,j ∈ {1, 2}, we construct a new transportation planm̃ at these four locations as follows:
(22a) |
(22b) |
(22c) |
(22d) |
where
This new transportation planm̃ has the same marginals asm atx1,x2 andy1,y2. The original cost incurred bym at these four locations is
(23) |
while the cost incurred bym̃ is
(24) |
After simplification, to show that (23) is larger than (24), it suffices to show that
(25) |
is larger than
(26a) |
(26b) |
(26c) |
From (21), it follows that the value in (25) is greater than 20λm11. We derive upper bounds for each term in (26). First,
where the last inequality follows from the fact that:
for anyA,B ≥ 0 with tr(A) = tr(B) = 1. Now consider
where the second equality follows from the definitions ofÃ12 andB̃12 while the last inequality is obtained by bounding the terms in the trace. Thus, referring to expressions by the respective equation numbering
In a similar manner, (26c) ≤ 2λm11. Therefore
which implies that the cost incurred bym̃ is smaller than the cost incurred bym.
For the case wherem11 >m22, we can prove the claim by constructing a new transportation planm̂ with values
with
The rest of the proof is carried out in a similar manner.
Footnotes
A sequence of measuresdμn converges weak* todμ if and only if ∫fdμn → ∫fdμ for all continuous and boundedf.
It is interesting to consider functionals of the form tr(c(x, y)m(x, y)), withc(x, y) being matricial, and how to utilize such so as to reflect transportation cost with practical relevance.
Matlab code is available athttp://www.ece.umn.edu/users/ningx015/research.html.
Color versions of one or more of the figures in this paper are available online athttp://ieeexplore.ieee.org.
Contributor Information
Lipeng Ning, Email: lning@bwh.harvard.edu, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA.
Tryphon T. Georgiou, Email: tryphon@umn.edu, Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA.
Allen Tannenbaum, Email: allen.tannenbaum@stonybrook.edu, Departments of Computer Science and Applied Mathematics, Stony Brook University, Stony Brook, NY 11794 USA.
References
- 1.Monge G. Open Library. De l’Imprimerie Royale; 1781. Mémoire sur la théorie des déblais et des remblais. [Google Scholar]
- 2.Kantorovich L. On the transfer of masses. Dokl Akad Nauk SSSR. 1942;37:227–229. [Google Scholar]
- 3.Villani C. Topics in optimal transportation. Vol. 58. Providence, RI: American Mathematical Society; 2003. [Google Scholar]
- 4.Ambrosio L. Lecture notes on optimal transport problems. Mathematical aspects of evolving interfaces. 2003:1–52. [Google Scholar]
- 5.Rachev S, Rüschendorf L. Mass Transportation Problems: Theory. Vol. 1. New York: Springer-Verlag; 1998. [Google Scholar]
- 6.Ferrante A, Pavon M, Ramponi F. Hellinger versus Kullback–Leibler multivariable spectrum approximation. IEEE Trans Autom Control. 2008 Apr;53(4):954–967. [Google Scholar]
- 7.Jiang X, Ning L, Georgiou TT. Distances and Riemannian metrics for multivariate spectral densities. IEEE Trans Autom Control. 2012 Jul;57(7):1723–1735. [Google Scholar]
- 8.Jiang X, Luo Z, Georgiou T. Geometric methods for spectral analysis. IEEE Trans Signal Process. 2012 Mar;60(3):1064–1074. [Google Scholar]
- 9.Petz D. Quantum Information Theory and Quantum Statistics (Theoretical and Mathematical Physics) Berlin, Germany: Springer; 2008. [Google Scholar]
- 10.Boyd S, Vandenberghe L. Convex optimization. Cambridge, MA: Cambridge Univ. Press; 2004. [Google Scholar]
- 11.Benamou J, Brenier Y. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik. 2000;84(3):375–393. [Google Scholar]
- 12.Jordan R, Kinderlehrer D, Otto F. The variational formulation of the Fokker-Planck equation. SIAM J Mathemat Anal. 1998;29(1):1–17. [Google Scholar]
- 13.Tannenbaum E, Georgiou T, Tannenbaum A. Signals and control aspects of optimal mass transport and the Boltzmann entropy. Proc. 49th IEEE Conf. Decision and Control; 2010. pp. 1885–1890. [Google Scholar]
- 14.Biane P, Voiculescu D. A free probability analogue of the Wasserstein metric on the trace-state space. Geometric and Funct Anal. 2001;11(6):1125–1138. [Google Scholar]
- 15.Rieffel M. Metrics on state spaces. Doc Math J DMV. 1999;4:559–600. [Google Scholar]
- 16.Andrea FD, Martinetti P. A view on optimal transport from noncommutative geometry. SIGMA. 2010;6(057):24. [Google Scholar]
- 17.Martinetti P. Towards a Monge–Kantorovich metric in noncommutative geometry. Zap Nauch Semin POMI. 2013:411. arXiv preprint arXiv:1210.6573. [Google Scholar]