Author manuscript; available in PMC: 2021 Apr 26.

Smooth Interpolation of Covariance Matrices and Brain Network Estimation

Lipeng Ning1
1Department of Psychiatry, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115 USA.

Email:lning@bwh.harvard.edu

Issue date 2019 Aug.

Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

PMCID: PMC8074851  NIHMSID: NIHMS1063649  PMID:33907337
The publisher's version of this article is available at IEEE Trans Automat Contr.

Abstract

We propose an approach to use the state covariance of autonomous linear systems to track time-varying covariance matrices of nonstationary time series. Following concepts from Riemannian geometry, we investigate three types of covariance paths obtained by using different quadratic regularizations of system matrices. The first quadratic form induces the geodesics based on the Hellinger–Bures metric related to optimal mass transport (OMT) theory and quantum mechanics. The second type of quadratic form leads to the geodesics based on the Fisher–Rao metric from information geometry. In the process, we introduce a weighted-OMT interpretation of the Fisher–Rao metric for multivariate Gaussian distributions. A main contribution of this work is the introduction of the third type of covariance paths, which are steered by system matrices with rotating eigenspaces. The three types of covariance paths are compared using two examples with synthetic data and real data from resting-state functional magnetic resonance imaging, respectively.

Index Terms—: Brain networks, functional magnetic resonance imaging, information theory, optimal control, optimal mass transport (OMT), Riemannian metric, system identification

I. Introduction

THE problem of tracking changes and deformations of positive-definite matrices is relevant to a wide spectrum of scientific applications, including computer vision, sensor arrays, and diffusion tensor imaging (see, e.g., [1]–[7]). A key motivation behind the present work is a neuroscience application on understanding functional brain connectivity using resting-state functional magnetic resonance imaging (rsfMRI) data. Specifically, rsfMRI is a widely used neuroimaging modality, which acquires a sequence of three-dimensional image volumes to understand brain functions and activities [8]. The standard approach for analyzing functional connections between brain regions is to compute the correlation coefficients between the underlying rsfMRI time-series data. Consequently, the whole-brain functional network is characterized by the covariance matrix of a multivariate time series obtained from different brain regions [9], [10]. It has recently been observed that functional connectivity fluctuates over time [11], [12], implying that the static covariance matrix is too simplistic to capture the full extent of time-varying brain activities. Thus, there is an urgent need for new computational tools for understanding dynamic functional brain networks using nonstationary rsfMRI data. The aim of this work is to use control-theoretic approaches to develop models for smooth paths of time-varying covariance matrices.

We consider a possibly nonstationary zero-mean time series {x_τ; τ ∈ ℝ} taking values in ℝ^n. We assume that the temporal change of the probability distributions p_t(x) of x_t is much slower than the rate of measurements. Therefore, the instantaneous covariance matrix

P_t := E_{p_t}(xx') = ∫_{ℝ^n} xx' p_t(x) dx

can be estimated by using sample covariance matrices computed from discrete measurements of the time series. Assume that two covariance matrices P_0 and P_1 at t = 0, 1 are known. Then, geodesics connecting P_0 and P_1 on the manifold of positive-definite matrices provide natural structures to model time-varying covariance matrices P_t on the time interval t ∈ [0, 1]. As generalizations of straight lines in Euclidean space, geodesics are paths of the shortest distance connecting the start to the finish on a curved manifold. The path length is measured by Riemannian metrics, which are quadratic forms of the tangent matrix Ṗ_t. Several Riemannian metrics have been investigated to derive geodesics for covariance matrices. For instance, the geodesic based on the Fisher–Rao metric for Gaussian distributions [13]–[16] from the theory of information geometry is given by

P_t^{info} = P_0^{1/2} (P_0^{-1/2} P_1 P_0^{-1/2})^t P_0^{1/2}.    (1)

Another example would be the Wasserstein-2 metric for Gaussian probability density functions [17]–[20], which induces the following geodesic:

P_t^{omt} = ((1 − t)P_0^{1/2} + tP_1^{1/2}Û)((1 − t)P_0^{1/2} + tP_1^{1/2}Û)'    (2)

where P_t^{1/2} denotes the unique positive-definite square root of P_t, and

Û = P_1^{-1/2} P_0^{-1/2} (P_0^{1/2} P_1 P_0^{1/2})^{1/2}

is an orthogonal matrix. These Riemannian metrics will be explained in more detail in the following sections.
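For concreteness, the two geodesics (1) and (2) can be evaluated with standard matrix functions. The following sketch (in Python with NumPy/SciPy, which is an assumption — the paper's own experiments use MATLAB) implements both closed forms:

```python
import numpy as np
from scipy.linalg import sqrtm, fractional_matrix_power

def info_geodesic(P0, P1, t):
    """Fisher-Rao geodesic (1): P0^{1/2} (P0^{-1/2} P1 P0^{-1/2})^t P0^{1/2}."""
    R = sqrtm(P0).real                      # unique positive-definite square root
    Rinv = np.linalg.inv(R)
    M = fractional_matrix_power(Rinv @ P1 @ Rinv, t).real
    return R @ M @ R

def omt_geodesic(P0, P1, t):
    """OMT (Wasserstein-2) geodesic (2) with the orthogonal matrix U-hat."""
    R0, R1 = sqrtm(P0).real, sqrtm(P1).real
    S = sqrtm(R0 @ P1 @ R0).real            # (P0^{1/2} P1 P0^{1/2})^{1/2}
    U = np.linalg.inv(R1) @ np.linalg.inv(R0) @ S
    L = (1 - t) * R0 + t * R1 @ U
    return L @ L.T
```

Both functions reduce to P_0 at t = 0 and reach P_1 at t = 1, while interpolating differently in between.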

In this paper, we combine concepts from Riemannian geometry and autonomous systems to develop smooth covariance paths. Specifically, we consider a geodesic P_t as the state covariance of the following autonomous linear system:

ẋ_t = A_t x_t    (3)

with A_t ∈ ℝ^{n×n}. We analyze the linear systems that steer P_t along different geodesics. Based on (3), the state covariance evolves according to

Ṗ_t = A_t P_t + P_t A_t'.    (4)

The matrix A_t can be considered as a noncommutative division of Ṗ_t by P_t, scaled by a factor of 1/2. In this paper, we consider Riemannian metrics on the manifold of positive-definite matrices expressed as quadratic forms of A_t. To this end, we consider covariance paths that are the solutions to

min_{P_t, A_t} { ∫_0^1 f(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 given }    (5)

where f(A_t) is a positive quadratic function of A_t, which may depend on P_t. Note that the optimal system matrix A_t may be asymmetric, which could provide insight into the directed dependence between the underlying variables.

The rest of this paper is organized as follows. In Section II, we revisit the optimal mass transport (OMT)-based geodesics P_t^{omt} and introduce the corresponding quadratic form f(A_t). Section III focuses on the quadratic forms that lead to the Fisher–Rao-based geodesics P_t^{info}. We also introduce a fluid-mechanics interpretation of the Fisher–Rao metric, which provides a point of contact between OMT and information geometry. In Section IV, we investigate the optimal solutions to (5) corresponding to a family of quadratic functions f(A_t), which are weighted square norms of the symmetric and antisymmetric parts of A_t. In Section V, we compare the three types of covariance paths using two examples based on synthetic data and real data from rsfMRI, respectively. Section VI includes the discussion and conclusions.

For notations, Sym^n, Sym_+^n, and Sym_{++}^n denote the sets of symmetric, positive-semidefinite, and positive-definite matrices of size n × n, respectively. Small boldface letters, e.g., x, v, represent column vectors. Capital letters, e.g., P, A, denote matrices. Regular small letters, e.g., w, h, are for scalars or scalar-valued functions.

II. Mass-Transport-Based Covariance Paths

A. On Optimal Mass Transport

Let p_0(x) and p_1(x) denote two probability density functions on ℝ^n. The Wasserstein-2 metric between the two, denoted by w_2(p_0, p_1), is defined as the square root of the optimal value of

inf_{m(x,y) ≥ 0} ∫_{ℝ^n × ℝ^n} ‖x − y‖_2^2 m(x, y) dx dy,
s.t. ∫_{ℝ^n} m(x, y) dx = p_1(y),  ∫_{ℝ^n} m(x, y) dy = p_0(x)

where m(x, y) represents a probability density function on the joint space ℝ^n × ℝ^n with the marginals specified by p_0 and p_1 [17], [18]. A fluid-mechanics interpretation of w_2(p_0, p_1)^2 was introduced in [21] and [22], which provided a Riemannian structure on the manifold of probability density functions. To introduce this formulation, we consider the following continuity equation:

∂p_t(x)/∂t + ∇_x·(p_t(x)v_t(x)) = 0    (6)

where v_t(x) represents a time-varying velocity field and p_t(x) represents the time-varying density function. Then, w_2(p_0, p_1)^2 is equal to (see [22])

inf_{p_t, v_t} { ∫_0^1 E_{p_t}(‖v_t(x)‖_2^2) dt | ṗ_t + ∇_x·(p_t v_t) = 0 }.    (7)

The optimal solution of p_t(x) is the geodesic on the manifold of probability density functions that connects the endpoints p_0(x) and p_1(x).

B. Hellinger–Bures Metric

In the special case when p_0(x) and p_1(x) are zero-mean Gaussian probability density functions with

p_i(x) = det(2πP_i)^{-1/2} e^{-(1/2) x' P_i^{-1} x},  for i = 0, 1

the corresponding geodesic p_t(x) at any fixed time t is also zero-mean Gaussian with the corresponding covariance matrix given by P_t^{omt} [19], [20]. The geodesic distance is equal to the Wasserstein-2 metric w_2(p_0, p_1), which also induces the following distance measure on covariance matrices:

w_2(p_0, p_1) = d_{w2}(P_0, P_1) := ‖P_0^{1/2} − P_1^{1/2}Û‖_F,    (8)

where ∥·∥F denotes the Frobenius norm of a matrix.

The covariance path P_t^{omt} is also equal to the geodesic induced by the Hellinger–Bures metric from quantum mechanics [23], [24] on the manifold of positive-definite matrices. In particular, let Δ ∈ Sym^n represent a tangent vector at P ∈ Sym_{++}^n. Then, the Hellinger–Bures metric takes the form

g_{P,Bures}(Δ) = tr(ΔM)

where M is the unique matrix in Sym^n that satisfies (1/2)(PM + MP) = Δ (see, e.g., [25] and [26]). The trajectory P_t^{omt} in (2) is the shortest path connecting P_0 and P_1. Thus, it satisfies

P_t^{omt} = argmin_{P_t} ∫_0^1 g_{P_t,Bures}(Ṗ_t) dt    (9)

with a given pair of endpoints P_0, P_1 ∈ Sym_{++}^n (see, e.g., [23]). The geodesic distance, also known as the Bures distance, is equal to d_{w2}(P_0, P_1).

The Hellinger–Bures metric was originally proposed in quantum mechanics to compare density matrices, which are positive-definite matrices whose traces are equal to one. Density matrices are noncommutative analogues of probability vectors. In the commutative case, when P is restricted to be a diagonal matrix whose diagonal entries form a probability vector, the Hellinger–Bures metric g_{P,Bures}(Δ) is equal to the Fisher information metric, which will be discussed in detail in Section III. A generalization of the Hellinger–Bures metric was also introduced in [27] for multivariate spectral densities.
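Numerically, the defining relation (1/2)(PM + MP) = Δ is a continuous-time Lyapunov equation, so g_{P,Bures}(Δ) can be evaluated with an off-the-shelf solver, and the Bures distance follows from (8). A minimal Python/SciPy sketch (the function names are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, sqrtm

def bures_metric(P, Delta):
    """Hellinger-Bures metric g_{P,Bures}(Delta) = tr(Delta M),
    where M solves (1/2)(P M + M P) = Delta."""
    M = solve_continuous_lyapunov(P, 2.0 * Delta)   # solves P M + M P = 2 Delta
    return np.trace(Delta @ M)

def bures_distance(P0, P1):
    """Closed-form d_{w2}(P0, P1) from (8); the squared norm expands to
    tr(P0) + tr(P1) - 2 tr((P0^{1/2} P1 P0^{1/2})^{1/2})."""
    R0 = sqrtm(P0).real
    S = sqrtm(R0 @ P1 @ R0).real
    return np.sqrt(np.trace(P0) + np.trace(P1) - 2.0 * np.trace(S))
```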

C. Hellinger–Bures-Based Linear Systems

Here, we present an alternative expression of (9) using the linear system in (3). For this purpose, we define

f_{P_t}(A_t) := tr(A_t P_t A_t')

which is equal to E_{p_t}(‖ẋ_t‖^2) if ẋ_t is given by (3). The following theorem relates the geodesics P_t^{omt} to a linear system.

Theorem 1: Given P_0, P_1 ∈ Sym_{++}^n, let f_P be defined as above, and let P_t^{omt}, A_t^{omt} be a pair of minimizers of

min_{P_t, A_t} { ∫_0^1 f_{P_t}(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 given }.    (10)

Then, P_t^{omt} is equal to (2) and

A_t^{omt} = Q(tQ − I)^{-1}    (11)

where Q = I − P_0^{-1/2}(P_0^{1/2} P_1 P_0^{1/2})^{1/2} P_0^{-1/2}. The optimal value of the objective function in (10) is equal to d_{w2}(P_0, P_1)^2.

Proof: The optimization problem (10) appears as a special case of (7) with the additional constraint that the velocity field v_t(x) = A_t x. Thus, d_{w2}(P_0, P_1)^2 is a lower bound of (10), and we only need to show that A_t^{omt} satisfies

Ṗ_t^{omt} = A_t^{omt} P_t^{omt} + P_t^{omt} (A_t^{omt})'.    (12)

To this end, we rewrite (2) as P_t^{omt} = (I − tQ)P_0(I − tQ)'. Taking the derivative of P_t^{omt} gives

Ṗ_t^{omt} = −QP_0(I − tQ)' − (I − tQ)P_0Q' = Q(tQ − I)^{-1} P_t + P_t Q(tQ − I)^{-1}.

Since all the eigenvalues of Q are smaller than 1, the matrix tQ − I is invertible for all t ∈ [0, 1]. Therefore, (12) holds. Moreover, we have

tr(A_t^{omt} P_t^{omt} (A_t^{omt})') = tr(QP_0Q) = ‖P_1^{1/2}Û − P_0^{1/2}‖_F^2 = d_{w2}(P_0, P_1)^2

which completes the proof. ■

We note that the matrix A_t^{omt} is symmetric. Moreover, the matrices A_{t_1}^{omt} and A_{t_2}^{omt} commute for any t_1, t_2. Therefore, the eigenspace of A_t^{omt} is fixed on the interval t ∈ [0, 1].
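As a sanity check on Theorem 1, one can integrate (4) forward with A_t given by (11) and verify that the state covariance indeed arrives at P_1. A rough forward-Euler sketch in Python (the step size and integration scheme are illustrative choices, not from the paper):

```python
import numpy as np
from scipy.linalg import sqrtm

def omt_system_path(P0, P1, n_steps=2000):
    """Integrate dP/dt = A_t P + P A_t' with A_t = Q (tQ - I)^{-1} from (11)."""
    n = len(P0)
    R0 = sqrtm(P0).real
    R0inv = np.linalg.inv(R0)
    S = sqrtm(R0 @ P1 @ R0).real
    Q = np.eye(n) - R0inv @ S @ R0inv       # Q from Theorem 1
    P = P0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        A = Q @ np.linalg.inv(t * Q - np.eye(n))
        P = P + dt * (A @ P + P @ A.T)      # forward-Euler step of (4)
    return P
```

Up to the discretization error of the Euler scheme, the returned matrix matches P_1.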

The results from Theorem 1 can be further extended to obtain the optimal solutions corresponding to the following objective function:

f_{P_t}^W(A_t) = E_{p_t}(‖ẋ_t‖_W^2) = tr(W A_t P_t A_t')    (13)

where ẋ_t = A_t x_t, W ∈ Sym_{++}^n, and ‖x‖_W^2 := x'Wx. By applying a change of variables, we define

P_{W,t} := W^{1/2} P_t W^{1/2}    (14)
A_{W,t} := W^{1/2} A_t W^{-1/2}.    (15)

Thus, f_{P_t}^W(A_t) = tr(A_{W,t} P_{W,t} A_{W,t}') = f_{P_{W,t}}(A_{W,t}). Moreover, if (4) holds, then

Ṗ_{W,t} = A_{W,t} P_{W,t} + P_{W,t} A_{W,t}'.

Therefore, if A_t^{omt} is the optimal system matrix that steers P_{W,0} to P_{W,1} with respect to f_P(A) given by Theorem 1, then W^{-1/2} A_t^{omt} W^{1/2} is the optimal solution with respect to f_P^W(A). In the following section, we investigate a further extension of f_P^W(·) by using a time-dependent weighting matrix, which provides a point of contact between OMT and information geometry.

III. Information-Geometry-Based Covariance Paths

A. Fisher–Rao Metric

For two probability density functions p(x) and p̂(x) on ℝ^n, the Kullback–Leibler (KL) divergence

d_{KL}(p‖p̂) := ∫_{ℝ^n} p log(p/p̂) dx    (16)

represents a well-established notion of distance between the two [28], [29]. If p̂ = p + δ with δ representing a small perturbation, then the quadratic term of the Taylor expansion of d_{KL}(p‖p + δ) in terms of δ is the Fisher information metric

g_{p,Fisher}(δ) = ∫ (δ^2/p) dx.

For a probability distribution p(x, θ) parameterized by a vector θ, the corresponding metric is referred to as the Fisher–Rao metric and is given by

g_{θ,Rao}(δ_θ) = δ_θ' E[(∂ log p/∂θ)(∂ log p/∂θ)'] δ_θ.

If p(x) is a zero-mean Gaussian probability density function parameterized by a covariance matrix P, then the metric becomes

g_{P,Rao}(Δ) = tr(P^{-1}ΔP^{-1}Δ).

Given P_0, P_1 ∈ Sym_{++}^n, the geodesic in (1) is equal to the solution of

P_t^{info} = argmin_{P_t} ∫_0^1 g_{P_t,Rao}(Ṗ_t) dt    (17)

which is the shortest path connecting P_0 and P_1. The corresponding path length is equal to (see, e.g., [30, Th. 6.1.6])

d_{info}(P_0, P_1) = ‖log(P_0^{-1/2} P_1 P_0^{-1/2})‖_F.    (18)
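The distance (18) is straightforward to evaluate with a matrix logarithm. A short Python sketch (an illustration, not the paper's code):

```python
import numpy as np
from scipy.linalg import sqrtm, logm

def fisher_rao_distance(P0, P1):
    """Geodesic distance (18): || log(P0^{-1/2} P1 P0^{-1/2}) ||_F."""
    Rinv = np.linalg.inv(sqrtm(P0).real)
    # Rinv @ P1 @ Rinv is positive definite, so the logarithm is real symmetric
    L = logm(Rinv @ P1 @ Rinv).real
    return np.linalg.norm(L, 'fro')
```

For the commuting pair P_0 = diag(1, 2), P_1 = diag(2, 1) used later in Section V, this evaluates to √2·log 2.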

B. Fisher–Rao-Metric-Based Linear Systems

Following (5), we will introduce positive quadratic forms f(A) such that the corresponding optimal covariance path is equal to P_t^{info}. One choice for the quadratic form is given by

f_P^{info,1}(A) := g_{P,Rao}(AP + PA') = 2 tr(AA + P^{-1}APA')    (19)

which satisfies f_P^{info,1}(A) ≥ 0 for all A ∈ ℝ^{n×n} and P ∈ Sym_{++}^n.

Theorem 2: Given P_0, P_1 ∈ Sym_{++}^n, let f_P^{info,1}(·) be defined as above, and define P_t^{info}, A^{info} as a pair of minimizers of

min_{P_t, A_t} { ∫_0^1 f_{P_t}^{info,1}(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 given }.    (20)

Then, the optimal solution P_t^{info} is equal to (1), and

A^{info} := (1/2) P_0^{1/2} log(P_0^{-1/2} P_1 P_0^{-1/2}) P_0^{-1/2}.    (21)

Moreover, the optimal value of the objective function is equal to dinfo(P0, P1)2.

Proof: We rewrite (1) as

P_t^{info} = P_0^{1/2}(P_0^{-1/2} P_1 P_0^{-1/2})^t P_0^{1/2} = P_0^{1/2} e^{(t/2) log(P_0^{-1/2}P_1P_0^{-1/2})} e^{(t/2) log(P_0^{-1/2}P_1P_0^{-1/2})} P_0^{1/2} = e^{A^{info}t} P_0 e^{(A^{info})'t}

where the last equation is obtained using e^{XYX^{-1}} = Xe^YX^{-1}. Taking the derivative of P_t^{info} gives

Ṗ_t^{info} = A^{info} P_t^{info} + P_t^{info} (A^{info})'    (22)

which completes the proof. ■

Note that the metric d_{info}(·) is invariant with respect to congruence transforms, i.e., d_{info}(P_0, P_1) = d_{info}(TP_0T', TP_1T') for any invertible matrix T. If A^{info} is the optimal solution of (20), then TA^{info}T^{-1} is the optimal solution corresponding to the pair TP_0T', TP_1T'.
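Theorem 2 says that the Fisher–Rao geodesic is generated by the constant system matrix (21); numerically, one can form A^{info} and check that e^{A}P_0e^{A'} reproduces P_1. A small Python sketch under the same assumptions as before:

```python
import numpy as np
from scipy.linalg import sqrtm, logm, expm

def info_system_matrix(P0, P1):
    """Constant system matrix (21): (1/2) P0^{1/2} log(P0^{-1/2} P1 P0^{-1/2}) P0^{-1/2}."""
    R = sqrtm(P0).real
    Rinv = np.linalg.inv(R)
    return (0.5 * R @ logm(Rinv @ P1 @ Rinv) @ Rinv).real

# The state covariance e^{At} P0 e^{A't} of the LTI system x' = A x
# then travels along (1) and reaches P1 at t = 1.
```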

C. Weighted-Mass-Transport View

Since the map from A_t to Ṗ_t in (4) is not injective, the quadratic forms of A_t that lead to the geodesic P_t^{info} are not unique. Here, we provide an alternative quadratic form, which reveals an interesting relation between OMT and the Fisher–Rao metric. For this purpose, we define

f_{P_t}^{info,2}(A_t) := 4E_{p_t}(‖ẋ_t‖_{P_t^{-1}}^2) = 4 tr(P_t^{-1} A_t P_t A_t')    (23)

which is a special form of (13) with the weighting matrix W = P_t^{-1}, scaled by a factor of 4. The following lemma draws the relation between f_P^{info,2}(A) and f_P^{info,1}(A).

Lemma 1: Let f_P^{info,1}(·) and f_P^{info,2}(·) be defined as in (19) and (23), respectively. Then, we have

f_P^{info,2}(A) ≥ f_P^{info,1}(A),  ∀P ∈ Sym_{++}^n, A ∈ ℝ^{n×n}.

Proof: Take the difference

f_P^{info,2}(A) − f_P^{info,1}(A) = 2 tr(P^{-1}APA' − AA) = tr((P^{-1/2}AP^{1/2} − P^{1/2}A'P^{-1/2})(P^{-1/2}AP^{1/2} − P^{1/2}A'P^{-1/2})')

which is nonnegative. ■

We note that if P^{-1/2}AP^{1/2} is symmetric, then f_P^{info,1}(A) is equal to f_P^{info,2}(A). This gives rise to the following proposition, in parallel to Theorem 2.
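Lemma 1 can be spot-checked numerically by evaluating both quadratic forms on random matrices. A short Python sketch (illustrative only; the helper names are not from the paper):

```python
import numpy as np

def f_info1(P, A):
    """Quadratic form (19): 2 tr(AA + P^{-1} A P A')."""
    Pinv = np.linalg.inv(P)
    return 2.0 * np.trace(A @ A + Pinv @ A @ P @ A.T)

def f_info2(P, A):
    """Quadratic form (23): 4 tr(P^{-1} A P A')."""
    Pinv = np.linalg.inv(P)
    return 4.0 * np.trace(Pinv @ A @ P @ A.T)
```

For random positive-definite P and arbitrary A, f_info2 always dominates f_info1, in agreement with the lemma.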

Proposition 1: Given P_0, P_1 ∈ Sym_{++}^n, the paths P_t^{info} and A^{info} given by (1) and (21), respectively, are the unique pair of minimizers of

min_{P_t, A_t} { ∫_0^1 f_{P_t}^{info,2}(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 given }    (24)

with f_{P_t}^{info,2}(A_t) defined as in (23).

Proof: From Lemma 1, we have

∫_0^1 f_{P_t}^{info,2}(A_t) dt ≥ ∫_0^1 f_{P_t}^{info,1}(A_t) dt ≥ d_{info}(P_0, P_1)^2    (25)

for any feasible pair of P_t and A_t. It is straightforward to verify that the above inequalities become equalities for the given P_t^{info} and A^{info}. Therefore, the proposition holds. ■

D. Fluid-Mechanics Interpretation

Note that f_{P_t}^{info,2}(A_t) is a special case of f_{P_t}^W(A) in (13) when W = 4P_t^{-1}. It is equal to

f_{P_t}^{info,2}(A_t) = 4E_{p_t}(‖v_t(x)‖_{P_t^{-1}}^2)

with the velocity field given by v_t(x) = A_t x. Thus, if the initial distribution p_0(x) is Gaussian, so is p_t(x) for all t ≥ 0. Proposition 1 implies that, among all the trajectories that connect two Gaussian probability density functions p_0 and p_1, the lowest weighted-mass-transport cost is obtained by Gaussian density functions whose covariance matrices are equal to P_t^{info}. But this optimal trajectory is obtained under the constraint of linear velocity fields. Next, we remove this constraint and show that the trajectory is still optimal.

Theorem 3: Given two zero-mean Gaussian probability density functions p_0(x), p_1(x) on ℝ^n with covariance matrices P_0, P_1 ∈ Sym_{++}^n, respectively, define p_t^{info}(x), v_t^{info}(x) as the minimizer of

inf_{p_t, v_t} { 4 ∫_0^1 E_{p_t}(‖v_t(x)‖_{P_t^{-1}}^2) dt | ṗ_t + ∇_x·(p_t v_t) = 0 }.    (26)

Then, p_t^{info}(x) is zero-mean Gaussian whose covariance matrix is equal to P_t^{info} in (1), and v_t^{info}(x) = A^{info}x almost surely, with A^{info} given by (21). Moreover, the optimal value is equal to d_{info}(P_0, P_1)^2.

Proof: First, we define

[V_t, C_t; C_t', P_t] := E_{p_t}([v_t(x); x][v_t(x); x]').    (27)

Then, applying integration by parts, we obtain

Ṗ_t = ∫_{ℝ^n} xx' ṗ_t(x) dx = −∫_{ℝ^n} xx' ∇_x·(p_t(x)v_t(x)) dx = C_t + C_t'.    (28)

Therefore, the optimization problem

min_{C_t, V_t, P_t} { 4 ∫_0^1 tr(P_t^{-1}V_t) dt | [V_t, C_t; C_t', P_t] ∈ Sym_+^{2n}, Ṗ_t = C_t + C_t' }    (29)

provides a lower bound of (26) because the higher-order moments of the probability density functions are not considered. On the other hand, (24) provides an upper bound of (26) because the velocity field is constrained to satisfy the linear system. We next show that the two bounds coincide.

Note that the Schur complement V_t − C_t P_t^{-1} C_t' belongs to Sym_+^n. Thus, the optimal V_t of (29) satisfies V_t = C_t P_t^{-1} C_t'. Therefore, (29) is equal to

min_{C_t, P_t ∈ Sym_+^n} { 4 ∫_0^1 tr(P_t^{-1} C_t P_t^{-1} C_t') dt | Ṗ_t = C_t + C_t' }.    (30)

Note that the constraint P_t ∈ Sym_+^n is automatically satisfied due to the inverse-barrier objective function. Therefore, we drop this constraint in the following analysis.

The optimization problem (30) can be viewed as an optimal control problem, with C_t being the matrix-valued control. Then, we derive the optimal solution using Pontryagin's minimum principle (see, e.g., [31, Ch. 2]). A necessary condition for the optimal solution is that it must annihilate the variation of the Hamiltonian

h_1(C_t, P_t, Π_t) := 4 tr(P_t^{-1} C_t P_t^{-1} C_t') + tr(Π_t(C_t + C_t'))

with respect to the control C_t. Here, Π_t is a symmetric matrix representing the Lagrange multiplier, i.e., the costate. By setting the partial derivative of h_1(·) with respect to C_t to zero, we obtain

C_t = −(1/4) P_t Π_t P_t    (31)

which implies that the optimal C_t is symmetric. Therefore, C_t = (1/2)Ṗ_t, and the objective function in (30) becomes tr(P_t^{-1} Ṗ_t P_t^{-1} Ṗ_t) = g_{P_t,Rao}(Ṗ_t). Thus, the theorem directly follows from (17). For completeness, we finish the proof based on the Hamiltonian in the following.

The costate Π_t must satisfy Π̇_t = −∂h_1/∂P_t, which gives rise to

Π̇_t = 8 P_t^{-1} C_t P_t^{-1} C_t P_t^{-1}.    (32)

Then, substituting (31) into (28) and (32), we obtain

Ṗ_t = −(1/2) P_t Π_t P_t
Π̇_t = (1/2) Π_t P_t Π_t.    (33)

Note that Ṗ_tΠ_t + P_tΠ̇_t = 0 for all t. Hence, P_tΠ_t is constant. We set

−(1/4) P_t Π_t = A.    (34)

Thus, (31) becomes C_t = AP_t = P_tA'. Multiplying both sides by P_t^{-1/2} gives P_t^{-1/2} C_t P_t^{-1/2} = P_t^{-1/2} A P_t^{1/2}, which is symmetric for all t. Substituting (34) into (33) gives

Ṗ_t = A P_t + P_t A'.

Therefore, we have

P_t = e^{At} P_0 e^{A't}.

Multiplying both sides by P_0^{-1/2} gives

P_0^{-1/2} P_t P_0^{-1/2} = P_0^{-1/2} e^{At} P_0 e^{A't} P_0^{-1/2} = e^{2P_0^{-1/2}AP_0^{1/2}t}.

Setting t = 1 and solving for A, we obtain

A = (1/2) P_0^{1/2} log(P_0^{-1/2} P_1 P_0^{-1/2}) P_0^{-1/2}

which is equal to A^{info}. Furthermore, from (22), the corresponding P_t is equal to P_t^{info}. Then, the optimal covariance matrix in (27) is singular and has rank n. Thus, the optimal velocity field v_t(x) is equal to A^{info}x almost surely, implying that the corresponding p_t(x) is Gaussian. Therefore, the theorem is proved. ■

Note that the system matrix A^{info} is constant. Thus, both A^{info} and A_t^{omt} have fixed eigenspaces. Next, we introduce a different quadratic form of A_t, which leads to system matrices with rotating eigenspaces.

IV. Rotation-Linear-System-Based Covariance Paths

A. Weighted-Least-Squares Cost Functions

Note that if X is an antisymmetric matrix, i.e., X' = −X, then e^{Xt} is a rotation matrix. Consequently, if the system matrix A is antisymmetric, then the state covariance matrix has rotating eigenspaces. In this regard, we decompose

A = A_s + A_a

where A_s := (1/2)(A + A') and A_a := (1/2)(A − A') are the symmetric and antisymmetric parts of A, respectively. Then, we define the following weighted-least-squares (WLS) function:

f_ϵ(A) := ‖A_s‖_F^2 + ϵ‖A_a‖_F^2    (35)

where the scalar ϵ > 0 weighs the relative significance of the symmetric and antisymmetric parts of A. If A satisfies Ṗ = AP + PA' for a given pair Ṗ and P, then f_ϵ(A) can be considered as a quadratic form of the noncommutative division of Ṗ by P, similar to the Fisher–Rao metric. In fact, for scalar-valued covariances, f_ϵ(A) is proportional to the Fisher–Rao metric.

Following (5), we consider the optimal solution to

min_{P_t, A_t} { ∫_0^1 f_ϵ(A_t) dt | Ṗ_t = A_t P_t + P_t A_t', P_0, P_1 given }    (36)

for a given pair of endpoints P_0, P_1 ∈ Sym_{++}^n and a scalar ϵ > 0.

B. Optimal Covariance Paths

To introduce the solution to (36), we define

T_{ϵ,t}(A) := e^{(1+ϵ)A_a t} e^{(A_s − ϵA_a)t}.    (37)

The next lemma shows that T_{ϵ,t}(·) is equal to the state transition matrix of a linear time-varying system.

Lemma 2: Given A ∈ ℝ^{n×n} and a scalar ϵ > 0, define

A_{ϵ,t} := e^{(1+ϵ)A_a t} A e^{−(1+ϵ)A_a t}.

Then, Ṫ_{ϵ,t}(A) = A_{ϵ,t} T_{ϵ,t}(A).

Proof:

Ṫ_{ϵ,t}(A) = e^{(1+ϵ)A_a t}((1+ϵ)A_a)e^{(A_s − ϵA_a)t} + e^{(1+ϵ)A_a t}(A_s − ϵA_a)e^{(A_s − ϵA_a)t} = e^{(1+ϵ)A_a t} A e^{(A_s − ϵA_a)t} = A_{ϵ,t} T_{ϵ,t}(A). ■

The following corollary is a direct result of Lemma 2.

Corollary 1: Given P_0 ∈ Sym_{++}^n, A ∈ ℝ^{n×n}, and a scalar ϵ > 0, define

P_{ϵ,t} := T_{ϵ,t}(A) P_0 T_{ϵ,t}(A)'.

Then, the following equation holds:

Ṗ_{ϵ,t} = A_{ϵ,t} P_{ϵ,t} + P_{ϵ,t} A_{ϵ,t}'.    (38)
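Corollary 1 can be verified numerically: with T_{ϵ,t}(·) from (37), the path T_{ϵ,t}(A)P_0T_{ϵ,t}(A)' satisfies (38) with the rotating system matrix A_{ϵ,t}. A Python sketch (the finite-difference check is an illustrative choice, not from the paper):

```python
import numpy as np
from scipy.linalg import expm

def T_eps(A, eps, t):
    """State transition matrix (37): e^{(1+eps) A_a t} e^{(A_s - eps A_a) t}."""
    As, Aa = 0.5 * (A + A.T), 0.5 * (A - A.T)
    return expm((1 + eps) * Aa * t) @ expm((As - eps * Aa) * t)

def P_eps(A, P0, eps, t):
    """Covariance path of Corollary 1: T P0 T'."""
    T = T_eps(A, eps, t)
    return T @ P0 @ T.T
```

Note also that at ϵ = −1 the transition matrix collapses to e^{At}, which recovers the Fisher–Rao family discussed below Theorem 4.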

The solution to (36) is presented as follows.

Theorem 4: Given P_0, P_1 ∈ Sym_{++}^n and a scalar ϵ > 0, suppose there exists a matrix Π_0 ∈ Sym^n such that the covariance path

P_{ϵ,t}^{wls} = T_{ϵ,t}(A_0) P_0 T_{ϵ,t}(A_0)'    (39)

with T_{ϵ,t}(·) given by (37) and

A_0 = −(1/2)(P_0Π_0 + Π_0P_0) − (1/(2ϵ))(Π_0P_0 − P_0Π_0)    (40)

satisfies P_{ϵ,1}^{wls} = P_1 at t = 1. Then, P_{ϵ,t}^{wls} is a minimizer of (36). The corresponding optimal A_t is equal to

A_{ϵ,t}^{wls} = e^{(1+ϵ)A_a t} A_0 e^{−(1+ϵ)A_a t}    (41)

where A_a denotes the antisymmetric part of A_0.

Proof: Consider (36) as an optimal control problem, with A_t being the matrix-valued control. Then, the Hamiltonian is

h_2(A_t, P_t, Π_t) = (1/4)‖A_t + A_t'‖_F^2 + (ϵ/4)‖A_t − A_t'‖_F^2 + tr(Π_t(A_t P_t + P_t A_t')) = ((1+ϵ)/2) tr(A_t'A_t) + ((1−ϵ)/2) tr(A_tA_t) + tr(Π_t(A_t P_t + P_t A_t')).

The costate Π_t must satisfy Π̇_t = −∂h_2/∂P_t, which gives rise to

Π̇_t = −Π_t A_t − A_t' Π_t.    (42)

Moreover, the partial derivative of h_2(·) with respect to the control A_t vanishes, which leads to

(A_t + A_t') + ϵ(A_t − A_t') + 2Π_t P_t = 0.    (43)

Solving (43) for A_t, we obtain

A_t = −(1/2)(P_tΠ_t + Π_tP_t) − (1/(2ϵ))(Π_tP_t − P_tΠ_t).    (44)

Then, substituting (44) into (4) and (42), respectively, we obtain

Ṗ_t = −(1 − 1/ϵ) P_tΠ_tP_t − (1/2 + 1/(2ϵ))(Π_tP_t^2 + P_t^2Π_t)
Π̇_t = (1 − 1/ϵ) Π_tP_tΠ_t + (1/2 + 1/(2ϵ))(Π_t^2P_t + P_tΠ_t^2).

Next, it can be verified that d(Π_tP_t − P_tΠ_t)/dt = 0. Thus, the antisymmetric part of A_t, which is equal to

(A_t)_a = −(1/(2ϵ))(Π_tP_t − P_tΠ_t)

is constant and is denoted by A_a. Taking the derivative of the symmetric part

(A_t)_s = (1/2)(A_t + A_t') = −(1/2)(P_tΠ_t + Π_tP_t)

gives

d(A_t)_s/dt = −(1/2) d(P_tΠ_t)/dt − (1/2) d(Π_tP_t)/dt = ((1+ϵ)/(2ϵ))(Π_tP_t^2Π_t − P_tΠ_t^2P_t) = (1+ϵ)(A_a(A_t)_s − (A_t)_sA_a).

Since A_a is constant, the solution to the above equation is equal to

(A_t)_s = e^{(1+ϵ)A_a t} A_s e^{−(1+ϵ)A_a t}.

Therefore, the optimal A_t has the form

A_t = e^{(1+ϵ)A_a t} A_s e^{−(1+ϵ)A_a t} + A_a = e^{(1+ϵ)A_a t} A e^{−(1+ϵ)A_a t}

with A = A_a + A_s being the initial value of A_t. Next, we define a new variable

P̂_t := e^{−(1+ϵ)A_a t} P_t e^{(1+ϵ)A_a t}    (45)

whose derivative is equal to

dP̂_t/dt = e^{−(1+ϵ)A_a t}(A_tP_t + P_tA_t')e^{(1+ϵ)A_a t} − (1+ϵ)A_a P̂_t + P̂_t (1+ϵ)A_a = (A_s − ϵA_a)P̂_t + P̂_t(A_s + ϵA_a).

Thus, the solution for P̂_t is

P̂_t = e^{(A_s − ϵA_a)t} P_0 e^{(A_s + ϵA_a)t}.

Substituting this solution into (45), we obtain that the optimal P_t has the form

P_t = e^{(1+ϵ)A_a t} e^{(A_s − ϵA_a)t} P_0 e^{(A_s + ϵA_a)t} e^{−(1+ϵ)A_a t}.

In a similar way, we define

Π̂_t := e^{−(1+ϵ)A_a t} Π_t e^{(1+ϵ)A_a t}.

Then, we have

dΠ̂_t/dt = −(A_s + ϵA_a)Π̂_t − Π̂_t(A_s − ϵA_a)

whose solution gives

Π_t = e^{(1+ϵ)A_a t} e^{−(A_s + ϵA_a)t} Π_0 e^{−(A_s − ϵA_a)t} e^{−(1+ϵ)A_a t}.

If (40) holds, then A_s + ϵA_a = −Π_0P_0. It is straightforward to show that (44) holds for all t > 0 with the provided expressions for A_t, P_t, and Π_t. In this case, f_ϵ(A_t) = ‖A_s‖_F^2 + ϵ‖A_a‖_F^2 for all t. Therefore, if A is equal to A_0 in (40), then the proposed trajectories A_{ϵ,t}^{wls} and P_{ϵ,t}^{wls} are local minimizers, which completes the proof. ■

It is interesting to note that the geodesics P_t^{info} are special cases of the covariance paths in (39) when ϵ = −1, in which case T_{ϵ,t}(A) = e^{At}. The existence and uniqueness of the covariance paths that satisfy the conditions in Theorem 4 for ϵ > 0 will be discussed elsewhere [32].

Next, we provide an upper bound on the optimal value of (36) for all ϵ > 0. For this purpose, we define

Â = log(P_0^{-1/2}(P_0^{1/2} P_1 P_0^{1/2})^{1/2} P_0^{-1/2})

which is symmetric. The corresponding covariance path

P_t = (P_0^{-1/2}(P_0^{1/2}P_1P_0^{1/2})^{1/2}P_0^{-1/2})^t P_0 (P_0^{-1/2}(P_0^{1/2}P_1P_0^{1/2})^{1/2}P_0^{-1/2})^t    (46)

is a feasible solution to (36). Therefore, the following proposition holds.

Proposition 2: Given P_0, P_1 ∈ Sym_{++}^n, the optimal value of (36) for any ϵ > 0 is not larger than ‖log(P_0^{-1/2}(P_0^{1/2}P_1P_0^{1/2})^{1/2}P_0^{-1/2})‖_F^2.

As ϵ → ∞, the symmetric matrix Â becomes the optimal solution. Moreover, if P_0 and P_1 commute, then P_t in (46) is equal to P_t^{info}.

V. Examples

A. Interpolating Covariance Matrices

In this example, we highlight the difference between P_{ϵ,t}^{wls} and the other two types of trajectories, i.e., P_t^{omt} and P_t^{info}, using the following two matrices as the endpoints:

P_0 = [1, 0; 0, 2],  P_1 = [2, 0; 0, 1].    (47)

Applying (1) and (2), we obtain

P_t^{omt} = [(1 + (√2 − 1)t)^2, 0; 0, (√2 + (1 − √2)t)^2]
P_t^{info} = [2^t, 0; 0, 2^{1−t}]

which are both diagonal. On the other hand, if ϵ = 0, there are infinitely many antisymmetric matrices A_0^{wls} of the following form:

A_0^{wls} = [0, ±(2k+1)π/2; ∓(2k+1)π/2, 0]

with k being an arbitrary integer. All of these matrices are able to steer P_0 to P_1, and the corresponding objective function f_0(A_0^{wls}) is equal to zero. The covariance path P_{0,t}^{wls} is equal to

[1 + sin^2((2k+1)(π/2)t), ±(1/2)sin((2k+1)πt); ±(1/2)sin((2k+1)πt), 1 + cos^2((2k+1)(π/2)t)]

which is not diagonal. Covariance paths corresponding to nonzero ϵ will be discussed elsewhere [32]. Therefore, the path P_{ϵ,t}^{wls} could better track the rotation of energy between different variables via their correlations, for a suitable choice of ϵ.
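The rotating path of this example can be reproduced directly, since for ϵ = 0 the optimal path is generated by a constant antisymmetric matrix. A Python sketch (the helper name wls_path is hypothetical):

```python
import numpy as np
from scipy.linalg import expm

P0 = np.diag([1.0, 2.0])
P1 = np.diag([2.0, 1.0])

def wls_path(t, k=0, sign=1.0):
    """Rotation path P_{0,t}^{wls} = e^{At} P0 e^{A't} with antisymmetric A,
    where A rotates by the angle sign*(2k+1)*(pi/2)*t."""
    w = sign * (2 * k + 1) * np.pi / 2
    A = np.array([[0.0, w], [-w, 0.0]])
    R = expm(A * t)                      # planar rotation by w*t
    return R @ P0 @ R.T
```

At t = 1 the rotation by (2k+1)π/2 swaps the two eigendirections, steering diag(1, 2) exactly to diag(2, 1), while the off-diagonal entry oscillates as (1/2)sin((2k+1)πt) in between.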

B. Smooth Covariance Paths for rsfMRI Analysis

We investigate an application of the proposed covariance paths to fitting noisy sample covariance matrices from an rsfMRI dataset. In the following, we provide detailed descriptions of the data, the method, and the experimental results.

1). Data:

The sample covariance matrices are computed using an rsfMRI dataset from the Human Connectome Project [33]. This dataset consists of 1200 rsfMRI image volumes measured in a 15-min time window. The provided data have already been processed by the ICA-FIX method [34] and are further processed using global signal regression, as suggested in [35]. Then, we apply the label map from [36] to separate the brain cortical surface into seven nonoverlapping regions. The data sequences from each region are averaged into a one-dimensional time series, providing a seven-dimensional time series sampled at 1200 time points. The same dataset and preprocessing method were used in our earlier work [37]. Next, we normalize each dimension of the time series by its standard deviation. The normalized time series is denoted by {x_τ; τ = 1, …, 1200}. Moreover, we split the entire sequence into ten equal-length segments and compute the corresponding sample covariance matrices as

P̃_k = (1/120) Σ_{i=1}^{120} x_{120k+i} x_{120k+i}',  for k = 0, …, 9.

Then, the time scale is changed so that P̃_{t_k} is equal to P̃_k, with t_0 = 0 and t_9 = 1. The color arrays in the first row of Fig. 1 illustrate several representative P̃_{t_k} at t = 0, 1/3, 2/3, 1, respectively. These figures show that P̃_{t_k} has significant fluctuations, which is consistent with the observations from [11] and [12]. The main goal of this proof-of-concept experiment is to use the proposed covariance paths to fit these sample covariances and compare their differences. The neuroscience aspects of this experiment will not be discussed in this paper.
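The segmentation step above can be sketched as follows (a hypothetical helper for illustration; it assumes a generic (T × n) data array rather than the HCP file format):

```python
import numpy as np

def segment_covariances(X, n_segments=10):
    """Split a (T x n) time series into equal-length segments and compute the
    per-segment sample covariances P~_k = (1/m) sum_i x_{mk+i} x_{mk+i}'."""
    T, n = X.shape
    m = T // n_segments
    X = X / X.std(axis=0)                 # normalize each dimension, as in the paper
    return [X[k * m:(k + 1) * m].T @ X[k * m:(k + 1) * m] / m
            for k in range(n_segments)]
```

For the data described above (T = 1200, n = 7, ten segments), this produces ten 7 × 7 symmetric matrices.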

Fig. 1.

The first row illustrates the sample covariance matrices between seven brain regions computed from different segments of an rsfMRI dataset from a human brain. The directed networks in the second row illustrate the estimated system matrices corresponding to the proposed WLS trajectories.

2). Method:

We solve optimization problems of the following form:

min_{P_t ∈ 𝒫} Σ_{k=0}^K ‖P_{t_k} − P̃_{t_k}‖_F^2    (48)

to obtain smooth paths that fit the measurements, where K = 9 and 𝒫 represents a suitable set of smooth paths. Based on the results from the previous sections, we propose three sets of parametric models for the smooth paths, which are described in the following.

Based on Theorem 1, we define

𝒫^{omt} := { P_t | P_t = (I − tQ)P_0(I − tQ)', P_0 ∈ Sym_{++}^n, Q ∈ ℝ^{n×n} }

where n = 7 in this example. Note that the matrix Q could be asymmetric, so that 𝒫^{omt} contains the OMT-based geodesics in the form of (2). We use this more general family of covariance paths in order to obtain better fitting results. It is also clear from Theorem 1 that a path P_t ∈ 𝒫^{omt} is the state covariance of a linear time-varying system with A_t = −Q(I − Qt)^{-1}. We apply the fminsdp function in MATLAB to obtain an optimal solution. The initial values for P_0 and Q are set to P̃_0 and the zero matrix, respectively. The same initial values and optimization algorithm are used to solve the subsequent optimization problems. The corresponding optimal path is denoted by P̂_t^{omt}.

Following Theorem 2, we define the second set of smooth paths as

𝒫^{info} = { P_t | P_t = e^{At}P_0e^{A't}, P_0 ∈ Sym_{++}^n, A ∈ ℝ^{n×n} }

which includes the geodesics P_t^{info}. The optimal path in this set is denoted by P̂_t^{info}. Clearly, a trajectory P_t ∈ 𝒫^{info} is equal to the state covariance of a linear time-invariant system.

Based on Theorem 4, we define the set

𝒫^{ϵ,wls} = { P_t | P_t = T_{ϵ,t}(A)P_0T_{ϵ,t}(A)', P_0 ∈ Sym_{++}^n, A ∈ ℝ^{n×n} }

for a given ϵ > 0. The corresponding optimal paths are denoted by P̂_{ϵ,t}^{wls}. This set includes all the trajectories that are solutions of (36). A trajectory in 𝒫^{ϵ,wls} is equal to the state covariance of a linear time-varying system with the system matrices expressed in the form e^{(1+ϵ)A_a t} A e^{−(1+ϵ)A_a t}. The system matrices corresponding to P̂_{ϵ,t}^{wls} are denoted by Â_{ϵ,t}^{wls}. The parameter ϵ is then searched over a discrete set in [0, 100] to minimize the fitting error. Based on the fitting results, we set the value of ϵ to 20.
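The fitting problem (48) over 𝒫^{info} can be sketched as a smooth unconstrained least-squares problem. The sketch below is an assumption-laden Python substitute for the paper's MATLAB/fminsdp setup: P_0 is parameterized by a triangular factor so it stays positive semidefinite, and a generic BFGS solver stands in for fminsdp.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def fit_info_path(P_tilde, t_grid, n):
    """Least-squares fit (48) over the family P^info = {e^{At} P0 e^{A't}}.
    P0 = L L' with L lower triangular (hypothetical parameterization,
    not from the paper)."""
    ntri = n * (n + 1) // 2
    def unpack(z):
        L = np.zeros((n, n))
        L[np.tril_indices(n)] = z[:ntri]
        A = z[ntri:].reshape(n, n)
        return L, A
    def loss(z):
        L, A = unpack(z)
        P0 = L @ L.T
        return sum(np.linalg.norm(expm(A * t) @ P0 @ expm(A * t).T - Pk, 'fro')**2
                   for t, Pk in zip(t_grid, P_tilde))
    # initialize at L = I (P0 = I) and A = 0; the paper instead initializes
    # P0 at the first sample covariance
    z0 = np.concatenate([np.eye(n)[np.tril_indices(n)], np.zeros(n * n)])
    res = minimize(loss, z0, method='BFGS')
    L, A = unpack(res.x)
    return L @ L.T, A, res.fun
```

The same pattern applies to 𝒫^{omt} and 𝒫^{ϵ,wls} by swapping in the corresponding path parameterizations.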

3). Results:

Fig. 2 illustrates the fitting results for six representative entries of P̃_{t_k}. The black stars represent the noisy measurements. The blue, green, and red plots represent the estimated paths P̂_t^{omt}, P̂_t^{info}, and P̂_{ϵ,t}^{wls}, respectively. P̂_t^{omt} and P̂_t^{info} are very similar to each other. Clearly, P̂_{ϵ,t}^{wls} has more oscillations, which better fits the fluctuations in the measurements. The normalized square errors (Σ_{k=0}^K ‖P̂_{t_k} − P̃_{t_k}‖_F^2)/(Σ_{k=0}^K ‖P̂_{t_k}‖_F^2) corresponding to P̂_t^{omt}, P̂_t^{info}, and P̂_{ϵ,t}^{wls} are equal to 0.1683, 0.1671, and 0.1543, respectively. Thus, P̂_{ϵ,t}^{wls} has the smallest fitting error. The overall relatively large residual is partly due to the low signal-to-noise ratio of rsfMRI data [38]. Therefore, the corresponding system matrices Â_{ϵ,t}^{wls} could better explain the dynamic interdependence between brain regions. The directed networks in the second row of Fig. 1 illustrate the matrices Â_{ϵ,t}^{wls} at t = 0, 1/3, 2/3, 1, respectively. The red and blue colors represent positive and negative values, respectively. The edge widths are weighted by the absolute values of the corresponding entries. To simplify the visualization, edges with weights smaller than 0.15 are not displayed.

Fig. 2.

Black stars in each image panel illustrate the noisy sample covariance matrices at different time points. The blue, green, and red lines are the fitted curves using the proposed three sets of smooth paths.

VI. Discussion

In this paper, we have investigated a framework for deriving covariance paths on the Riemannian manifolds of positive-definite matrices by using quadratic forms of system matrices to regularize the path lengths. We have analyzed three types of quadratic forms and the corresponding covariance paths. The first and second quadratic forms lead to the well-known geodesics derived from the Hellinger–Bures metric from OMT and the Fisher–Rao metric from information geometry, respectively. In the process, we have derived a fluid-mechanics interpretation of the Fisher–Rao metric in Theorem 3, which provides an interesting weighted-OMT view of the Fisher–Rao metric. Another contribution of this paper is the introduction of the third type of quadratic form, which gives rise to a general family of covariance paths that are steered by system matrices with rotating eigenspaces.

We note that similar types of trajectories of positive-definite matrices with rotating eigenspaces have been investigated from different angles in [39]–[41]. This work is also developed along lines similar to [42] and [43], which focus on the optimal steering of state covariances via linear systems using external input. However, the approach for deriving covariance paths used in this paper differs from this earlier work.

In a proof-of-concept example, we have applied the three types of smooth paths to fit noisy sample covariance matrices from an rsfMRI dataset. The goal of this experiment is to understand directed interactions among brain regions via the estimated system matrices. As expected, the rotation-system-based covariance path has the best performance in terms of fitting error. Therefore, the corresponding system matrices could provide a data-driven tool for understanding the structured fluctuations of functional brain activities. The spatiotemporally varying features of functional brain networks could potentially provide useful information for understanding the pathologies of mental disorders. In future work, we will explore the proposed covariance paths to analyze data from other neuroimaging modalities, such as electroencephalography and magnetoencephalography.

Acknowledgment

The author would like to thank T. Georgiou and Y. Rathi for insightful discussions. The author would also like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of this paper.

This work was supported under Grant R21MH116352, Grant R21MH115280 (PI: Ning), Grant R01MH097979 (PI: Rathi), Grant R01MH111917 (PI: Rathi), and Grant R01MH074794 (PI: Westin).

Biography


Lipeng Ning received the B.Sc. and M.Sc. degrees in control science and engineering from the Beijing Institute of Technology, Beijing, China, in 2006 and 2008, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Minnesota, Minneapolis, MN, USA, in November 2013.

He is currently a Faculty Member with the Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA. He is interested in the application of mathematics in neuroimaging and neuroscience research. His current research focuses on developing neuroimaging techniques to improve the diagnosis and treatment of brain diseases.

References

  • [1] Porikli F, Tuzel O, and Meer P, "Covariance tracking using model update based on Lie algebra," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., Jun. 2006, vol. 1, pp. 728–735.
  • [2] Wu Y et al., "Real-time probabilistic covariance tracking with efficient model update," IEEE Trans. Image Process., vol. 21, no. 5, pp. 2824–2837, May 2012.
  • [3] Yang JF and Kaveh M, "Adaptive eigensubspace algorithms for direction or frequency estimation and tracking," IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 2, pp. 241–251, Feb. 1988.
  • [4] Jiang X, Ning L, and Georgiou TT, "Distances and Riemannian metrics for multivariate spectral densities," IEEE Trans. Autom. Control, vol. 57, no. 7, pp. 1723–1735, Jul. 2012.
  • [5] Lenglet C, Rousson M, Deriche R, and Faugeras O, "Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing," J. Math. Imag. Vis., vol. 25, no. 3, pp. 423–444, Oct. 2006.
  • [6] Dryden IL, Koloydenko A, and Zhou D, "Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging," Ann. Appl. Statist., vol. 3, no. 3, pp. 1102–1123, 2009.
  • [7] Hao X, Whitaker RT, and Fletcher PT, Adaptive Riemannian Metrics for Improved Geodesic Tracking of White Matter. Berlin, Germany: Springer, 2011, pp. 13–24.
  • [8] Biswal B, Yetkin FZ, Haughton VM, and Hyde JS, "Functional connectivity in the motor cortex of resting human brain using echo-planar MRI," Magn. Reson. Med., vol. 34, no. 4, pp. 537–541, 1995.
  • [9] Buckner RL, Krienen FM, and Yeo BTT, "Opportunities and limitations of intrinsic functional connectivity MRI," Nature Neurosci., vol. 16, pp. 832–837, 2013.
  • [10] Smith SM et al., "Functional connectomics from resting-state fMRI," Trends Cogn. Sci., vol. 17, no. 12, pp. 666–682, 2013.
  • [11] Chang C and Glover GH, "Time-frequency dynamics of resting-state brain connectivity measured with fMRI," NeuroImage, vol. 50, no. 1, pp. 81–98, 2010.
  • [12] Preti MG, Bolton TA, and Ville DVD, "The dynamic functional connectome: State-of-the-art and perspectives," NeuroImage, vol. 160, pp. 41–54, 2017.
  • [13] Rao C, "Information and the accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, pp. 81–91, 1945.
  • [14] Amari S-I and Nagaoka H, Methods of Information Geometry. Providence, RI, USA: Amer. Math. Soc., 2000.
  • [15] Cencov N, Statistical Decision Rules and Optimal Inference. Providence, RI, USA: Amer. Math. Soc., 1982.
  • [16] Kass R and Vos P, Geometrical Foundations of Asymptotic Inference. New York, NY, USA: Wiley, 1997.
  • [17] Villani C, Topics in Optimal Transportation. Providence, RI, USA: Amer. Math. Soc., 2003.
  • [18] Rachev S and Rüschendorf L, Mass Transportation Problems, Vol. I and II (Probability and Its Applications). New York, NY, USA: Springer, 1998.
  • [19] Knott M and Smith CS, "On the optimal mapping of distributions," J. Optim. Theory Appl., vol. 43, no. 1, pp. 39–49, May 1984.
  • [20] Takatsu A, "Wasserstein geometry of Gaussian measures," Osaka J. Math., vol. 48, pp. 1005–1026, 2011.
  • [21] Jordan R, Kinderlehrer D, and Otto F, "The variational formulation of the Fokker–Planck equation," SIAM J. Math. Anal., vol. 29, no. 1, pp. 1–17, 1998.
  • [22] Benamou J-D and Brenier Y, "A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem," Numer. Math., vol. 84, no. 3, pp. 375–393, Jan. 2000.
  • [23] Ning L, Jiang X, and Georgiou T, "On the geometry of covariance matrices," IEEE Signal Process. Lett., vol. 20, no. 8, pp. 787–790, Aug. 2013.
  • [24] Bhatia R, Jain T, and Lim Y, "On the Bures–Wasserstein distance between positive definite matrices," Expo. Math., 2018.
  • [25] Uhlmann A, "The metric of Bures and the geometric phase," in Quantum Groups and Related Topics: Proceedings of the First Max Born Symposium, Gielerak R, Lukierski J, and Popowicz Z, Eds. Dordrecht, The Netherlands: Springer, 1992, p. 267.
  • [26] Petz D, "Geometry of canonical correlation on the state space of a quantum system," J. Math. Phys., vol. 35, pp. 780–795, 1994.
  • [27] Ferrante A, Pavon M, and Ramponi F, "Hellinger versus Kullback–Leibler multivariate spectrum approximation," IEEE Trans. Autom. Control, vol. 53, no. 4, pp. 954–967, May 2008.
  • [28] Kullback S and Leibler RA, "On information and sufficiency," Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 1951.
  • [29] Cover T and Thomas J, Elements of Information Theory. Hoboken, NJ, USA: Wiley-Interscience, 2008.
  • [30] Bhatia R, Positive Definite Matrices. Princeton, NJ, USA: Princeton Univ. Press, 2007.
  • [31] Schättler H and Ledzewicz U, Geometric Optimal Control: Theory, Methods and Examples (Interdisciplinary Applied Mathematics). New York, NY, USA: Springer, 2012.
  • [32] Ning L, "Regularization of covariance matrices on Riemannian manifolds using linear systems," May 2018, arXiv:1805.11699.
  • [33] Essen DV et al., "The human connectome project: A data acquisition perspective," NeuroImage, vol. 62, no. 4, pp. 2222–2231, 2012.
  • [34] Smith SM et al., "Resting-state fMRI in the human connectome project," NeuroImage, vol. 80, pp. 144–168, 2013.
  • [35] Fox M, Zhang D, Snyder A, and Raichle M, "The global signal and observed anticorrelated resting state brain networks," J. Neurophysiol., vol. 101, pp. 3270–3283, 2009.
  • [36] Yeo BT et al., "The organization of the human cerebral cortex estimated by intrinsic functional connectivity," J. Neurophysiol., vol. 106, pp. 1125–1165, 2011.
  • [37] Ning L and Rathi Y, "A dynamic regression approach for frequency-domain partial coherence and causality analysis of functional brain networks," IEEE Trans. Med. Imag., vol. 37, no. 9, pp. 1957–1969, Sep. 2018.
  • [38] Murphy K, Bodurka J, and Bandettini PA, "How long to scan? The relationship between fMRI temporal signal to noise ratio and necessary scan duration," NeuroImage, vol. 34, no. 2, pp. 565–574, 2007.
  • [39] Ning L, Georgiou TT, and Tannenbaum A, "On matrix-valued Monge–Kantorovich optimal mass transport," IEEE Trans. Autom. Control, vol. 60, no. 2, pp. 373–382, Feb. 2015.
  • [40] Yamamoto K, Chen Y, Ning L, Georgiou TT, and Tannenbaum A, "Regularization and interpolation of positive matrices," IEEE Trans. Autom. Control, vol. 63, no. 4, pp. 1208–1212, Apr. 2018.
  • [41] Chen Y, Georgiou T, and Tannenbaum A, "Matrix optimal mass transport: A quantum mechanical approach," IEEE Trans. Autom. Control, vol. 63, no. 8, pp. 2612–2619, Aug. 2018.
  • [42] Chen Y, Georgiou TT, and Pavon M, "Optimal steering of a linear stochastic system to a final probability distribution, part I," IEEE Trans. Autom. Control, vol. 61, no. 5, pp. 1158–1169, May 2016.
  • [43] Chen Y, Georgiou TT, and Pavon M, "Optimal steering of a linear stochastic system to a final probability distribution, part II," IEEE Trans. Autom. Control, vol. 61, no. 5, pp. 1170–1180, May 2016.
