Projected normal distribution

From Wikipedia, the free encyclopedia

Probability distribution

Projected normal distribution
Notation	${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$
Parameters	${\boldsymbol {\mu }}\in \mathbb {R} ^{n}$ (location) ${\boldsymbol {\Sigma }}\in \mathbb {R} ^{n\times n}$ (scale)
Support	Unit (n-1)-sphere, with angular or Cartesian coordinates: ${\boldsymbol {\Theta }}=[0,\pi ]^{n-2}\times [0,2\pi )$ or $\mathbb {S} ^{n-1}=\{{\boldsymbol {z}}\in \mathbb {R} ^{n}:\lVert {\boldsymbol {z}}\rVert =1\}$
PDF	complicated, see text

In directional statistics, the projected normal distribution (also known as offset normal distribution, angular normal distribution or angular Gaussian distribution)^[1]^[2] is a probability distribution over directions that describes the radial projection of a random variable with an n-variate normal distribution onto the unit (n-1)-sphere.

Definition and properties

Given a random variable ${\boldsymbol {X}}\in \mathbb {R} ^{n}$ that follows a multivariate normal distribution ${\mathcal {N}}_{n}({\boldsymbol {\mu }},\,{\boldsymbol {\Sigma }})$, the projected normal distribution ${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ represents the distribution of the random variable ${\boldsymbol {Y}}={\frac {\boldsymbol {X}}{\lVert {\boldsymbol {X}}\rVert }}$ obtained by projecting ${\boldsymbol {X}}$ onto the unit sphere. In the general case, the projected normal distribution can be asymmetric and multimodal. If ${\boldsymbol {\mu }}$ is parallel to an eigenvector of ${\boldsymbol {\Sigma }}$, the distribution is symmetric.^[3] The first version of this distribution was introduced by Pukkila and Rao (1988).^[4]
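The definition translates directly into a sampler: draw from the generating normal distribution and rescale the draw to unit length. The following is a minimal pure-Python sketch, assuming ${\boldsymbol {\Sigma }}$ is symmetric positive definite. It uses a hand-written Cholesky factorization, and the function names are hypothetical rather than taken from any library.

```python
import math
import random


def cholesky(sigma):
    """Return lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def sample_projected_normal(mu, sigma, rng=random):
    """Draw X ~ N_n(mu, sigma) and return Y = X / ||X||, a point on the unit (n-1)-sphere."""
    n = len(mu)
    L = cholesky(sigma)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # X = mu + L z has mean mu and covariance L L^T = sigma
    x = [mu[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    norm = math.sqrt(sum(xi * xi for xi in x))  # X = 0 has probability zero
    return [xi / norm for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.0, 0.5, 0.0]
    sigma = [[1.0, 0.3, 0.0], [0.3, 0.5, 0.1], [0.0, 0.1, 2.0]]
    for _ in range(3):
        print(sample_projected_normal(mu, sigma, rng))
```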

Support

The support of this distribution is the unit (n-1)-sphere, which can be described either in terms of a set of $(n-1)$-dimensional angular spherical coordinates:

{\boldsymbol {\Theta }}=[0,\pi ]^{n-2}\times [0,2\pi )\subset \mathbb {R} ^{n-1}

or in terms of $n$-dimensional Cartesian coordinates:

\mathbb {S} ^{n-1}=\{{\boldsymbol {z}}\in \mathbb {R} ^{n}:\lVert {\boldsymbol {z}}\rVert =1\}\subset \mathbb {R} ^{n}

The two are linked via the embedding function, $e:{\boldsymbol {\Theta }}\to \mathbb {R} ^{n}$, with range $e({\boldsymbol {\Theta }})=\mathbb {S} ^{n-1}.$ This function is defined by the formula for spherical coordinates at $r=1.$
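A minimal sketch of the embedding $e$ follows. It assumes the usual hyperspherical convention, in which $\theta _{1}$ to $\theta _{n-2}$ lie in $[0,\pi ]$ and the last angle $\theta _{n-1}$ lies in $[0,2\pi )$, matching ${\boldsymbol {\Theta }}$ above. The name `embed` is hypothetical.

```python
import math


def embed(theta):
    """Embedding e: Theta -> S^{n-1} (spherical coordinates at r = 1).

    theta = [theta_1, ..., theta_{n-1}]; returns the n Cartesian coordinates
      x_1 = cos(theta_1)
      x_k = sin(theta_1) ... sin(theta_{k-1}) cos(theta_k),  for 1 < k < n
      x_n = sin(theta_1) ... sin(theta_{n-1})
    """
    v = []
    sin_prod = 1.0  # running product of the sines of the angles seen so far
    for t in theta:
        v.append(sin_prod * math.cos(t))
        sin_prod *= math.sin(t)
    v.append(sin_prod)
    return v


if __name__ == "__main__":
    print(embed([0.3]))            # a point on the unit circle
    print(embed([1.0, 2.0, 4.0]))  # a point on the unit 3-sphere in R^4
```

For $n=2$ this reduces to $(\cos \theta _{1},\sin \theta _{1})$.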

Density function

The density of the projected normal distribution ${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ can be constructed from the density of its generating n-variate normal distribution ${\mathcal {N}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ by re-parametrising to n-dimensional spherical coordinates and then integrating over the radial coordinate.

In full spherical coordinates with radial component $r\in [0,\infty )$ and angles ${\boldsymbol {\theta }}=(\theta _{1},\dots ,\theta _{n-1})\in {\boldsymbol {\Theta }}$, a point ${\boldsymbol {x}}=(x_{1},\dots ,x_{n})\in \mathbb {R} ^{n}$ can be written as ${\boldsymbol {x}}=r{\boldsymbol {v}}$, with ${\boldsymbol {v}}\in \mathbb {S} ^{n-1}$. To be clear, ${\boldsymbol {v}}=e({\boldsymbol {\theta }})$, as given by the above-defined embedding function. The joint density becomes

p(r,{\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})=r^{n-1}{\mathcal {N}}_{n}(r{\boldsymbol {v}}\mid {\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {r^{n-1}}{{\sqrt {|{\boldsymbol {\Sigma }}|}}(2\pi )^{\frac {n}{2}}}}e^{-{\frac {1}{2}}(r{\boldsymbol {v}}-{\boldsymbol {\mu }})^{\top }\Sigma ^{-1}(r{\boldsymbol {v}}-{\boldsymbol {\mu }})}

where the factor $r^{n-1}$ is due to the change of variables ${\boldsymbol {x}}=r{\boldsymbol {v}}$. The density of ${\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ can then be obtained via marginalization over $r$ as^[5]

p({\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})=\int _{0}^{\infty }p(r,{\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})dr.

The same density had been previously obtained in Pukkila and Rao (1988, Eq. (2.4))^[4] using a different notation.
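The radial integral can also be evaluated numerically at any ${\boldsymbol {v}}=e({\boldsymbol {\theta }})$. The sketch below assumes a composite Simpson rule on a truncated interval $[0,r_{\max }]$. The cut-off $r_{\max }$ is a heuristic chosen here, not taken from the references, and the helper names are hypothetical. For ${\boldsymbol {\mu }}=\mathbf {0}$ and ${\boldsymbol {\Sigma }}=\mathbf {I} _{n}$ it should reproduce the uniform density $\Gamma (n/2)/(2\pi ^{n/2})$, for example $1/(2\pi )$ for $n=2$ and $1/(4\pi )$ for $n=3$.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def normal_logpdf(x, mu, L):
    """log N_n(x | mu, L L^T), via forward substitution with the Cholesky factor L."""
    n = len(x)
    y = []
    for i in range(n):
        y.append((x[i] - mu[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(yi * yi for yi in y)                           # (x-mu)^T Sigma^{-1} (x-mu)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log |Sigma|
    return -0.5 * (quad + log_det + n * math.log(2.0 * math.pi))


def pn_density_numeric(v, mu, sigma, steps=4000):
    """p(theta | mu, Sigma) = int_0^inf r^{n-1} N_n(r v | mu, Sigma) dr at v = e(theta),
    approximated by the composite Simpson rule on [0, r_max] (steps must be even)."""
    n = len(v)
    L = cholesky(sigma)
    # For r >= r_max the quadratic form is at least 100, so the integrand is negligible.
    r_max = math.sqrt(sum(m * m for m in mu)) + 10.0 * math.sqrt(sum(sigma[i][i] for i in range(n)))
    h = r_max / steps
    total = 0.0
    for k in range(steps + 1):
        r = k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * r ** (n - 1) * math.exp(normal_logpdf([r * vi for vi in v], mu, L))
    return total * h / 3.0


if __name__ == "__main__":
    print(pn_density_numeric([0.6, 0.8], [1.0, -0.5], [[1.0, 0.4], [0.4, 0.6]]))
```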

Note on density definition

This subsection clarifies the various forms of probability density used in this article, lest they be misunderstood. Take for example a random variate $u\in (0,1]$ with uniform density, $p_{U}(u)=1$. Then $\ell =-\log u\in [0,\infty )$ has density $p_{L}(\ell )=e^{-\ell }$. This works if both densities are defined with respect to Lebesgue measure on the real line. By default convention:

Density functions are Lebesgue-densities, defined with respect to Lebesgue measure, applied in the space where the argument of the density function lives, so that:
The Lebesgue-densities involved in a change of variables are related by a factor dependent on the derivative(s) of the transformation ($\lvert du/d\ell \rvert =e^{-\ell }$ in this example, so that $p_{L}(\ell )=p_{U}(u)\,\lvert du/d\ell \rvert$; and $r^{n-1}$ for the above change of variables, ${\boldsymbol {x}}=r{\boldsymbol {v}}$).
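The default convention can be checked on the $u\mapsto \ell$ example with a small Monte Carlo sketch. The fraction of variates $\ell =-\log u$ falling in $(a,b]$ should approach $\int _{a}^{b}e^{-\ell }\,d\ell =e^{-a}-e^{-b}$. The function name, seed and sample size are arbitrary choices.

```python
import math
import random


def check_change_of_variables(a, b, num_samples=100_000, seed=1):
    """Monte Carlo estimate of P(a < ell <= b) for ell = -log(u), u ~ Uniform(0, 1],
    returned together with the exact value int_a^b exp(-ell) d ell = exp(-a) - exp(-b)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        u = 1.0 - rng.random()  # random() lies in [0, 1), so u lies in (0, 1]
        ell = -math.log(u)
        if a < ell <= b:
            hits += 1
    return hits / num_samples, math.exp(-a) - math.exp(-b)


if __name__ == "__main__":
    print(check_change_of_variables(0.5, 1.5))
```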

Neither of these conventions applies to the ${\mathcal {PN}}_{n}$ densities in this article:

For $n\geq 3$ the density $p({\boldsymbol {\theta }}\mid {\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ is not defined w.r.t. Lebesgue measure in $\mathbb {R} ^{n-1}$, where ${\boldsymbol {\theta }}$ lives, because that measure does not agree with the standard notion of hyperspherical area. Instead, the density is defined w.r.t. a measure that is pulled back (via the embedding function) to angular coordinate space from Lebesgue measure in the $(n-1)$-dimensional tangent space of the hypersphere. This is explained below.
With the embedding ${\boldsymbol {v}}=e({\boldsymbol {\theta }})$, a density ${\tilde {p}}({\boldsymbol {v}}\mid {\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ cannot be defined w.r.t. Lebesgue measure, because $\mathbb {S} ^{n-1}\subset \mathbb {R} ^{n}$ has Lebesgue measure zero. Instead, ${\tilde {p}}$ is defined w.r.t. scaled Hausdorff measure.

The pullback and Hausdorff measures agree, so that:

p({\boldsymbol {\theta }}\mid {\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\tilde {p}}({\boldsymbol {v}}\mid {\boldsymbol {\mu }},{\boldsymbol {\Sigma }})

where there is no change-of-variables factor, because the densities use different measures.

To better understand what is meant by a density being defined w.r.t. a measure (a function that maps subsets in sample space to a non-negative real-valued 'volume'), consider a measurable subset $U\subseteq {\boldsymbol {\Theta }}$, with embedded image $V=e(U)\subseteq \mathbb {S} ^{n-1}$, and let ${\boldsymbol {v}}=e({\boldsymbol {\theta }})\sim {\mathcal {PN}}_{n}$. Then the probability of finding the sample in the subset is:

P({\boldsymbol {\theta }}\in U)=\int _{U}p\,d\pi =P({\boldsymbol {v}}\in V)=\int _{V}{\tilde {p}}\,dh

where $\pi ,h$ are respectively the pullback and Hausdorff measures, and the integrals are Lebesgue integrals, which can be rewritten as Riemann integrals thus:

\int _{U}p\,d\pi =\int _{0}^{\infty }\pi \left(\{{\boldsymbol {\theta }}\in U:p({\boldsymbol {\theta }})>t\}\right)\,dt\quad (1)
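Equation (1) is the "layer-cake" representation of a Lebesgue integral. For $n=2$ the pullback measure is plain arc length $d\theta _{1}$, so both sides of (1) can be compared numerically. The sketch below assumes a hypothetical cardioid density $p(\theta )=(1+\cos \theta )/(2\pi )$ on $U=[0,2\pi )$. It estimates $\pi (\{p>t\})$ by counting grid cells, and the grid sizes are arbitrary.

```python
import math


def cardioid(theta):
    """Hypothetical example density on [0, 2*pi), used only for illustration."""
    return (1.0 + math.cos(theta)) / (2.0 * math.pi)


def direct_integral(f, m=1000):
    """Midpoint rule for int_0^{2 pi} f(theta) d theta."""
    h = 2.0 * math.pi / m
    return h * sum(f((k + 0.5) * h) for k in range(m))


def layer_cake_integral(f, m=1000, t_steps=1000):
    """Right-hand side of equation (1): int_0^inf pi({theta : f(theta) > t}) dt.

    pi({f > t}) is estimated as (number of grid cells with f > t) * cell width;
    the t-integral uses the midpoint rule on [0, max f].
    """
    h = 2.0 * math.pi / m
    values = [f((k + 0.5) * h) for k in range(m)]
    dt = max(values) / t_steps
    total = 0.0
    for j in range(t_steps):
        t = (j + 0.5) * dt
        level_set_measure = h * sum(1 for val in values if val > t)
        total += level_set_measure * dt
    return total


if __name__ == "__main__":
    print(direct_integral(cardioid), layer_cake_integral(cardioid))
```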

Pullback measure

The tangent space at ${\boldsymbol {v}}\in \mathbb {S} ^{n-1}$ is the $(n-1)$-dimensional linear subspace perpendicular to ${\boldsymbol {v}}$, where Lebesgue measure can be used. At very small scale, the tangent space is indistinguishable from the sphere (e.g. Earth looks locally flat), so that Lebesgue measure in tangent space agrees with area on the hypersphere. The tangent space Lebesgue measure is pulled back via the embedding function, as follows, to define the measure in coordinate space. For $U\subseteq {\boldsymbol {\Theta }},$ a measurable subset in coordinate space, the pullback measure, as a Riemann integral, is:

\pi (U)=\int _{U}{\sqrt {\left|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})\right|}}\,d\theta _{1}\,\cdots \,d\theta _{n-1}\quad (2)

where the Jacobian of the embedding function $e({\boldsymbol {\theta }})$ is the $n{\text{-by-}}(n-1)$ matrix $\mathbf {E} _{\boldsymbol {\theta }},$ the columns of which span the $(n-1)$-dimensional tangent space where the Lebesgue measure is applied. It can be shown that ${\sqrt {\left|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})\right|}}=\prod _{i=1}^{n-2}\sin ^{n-1-i}(\theta _{i}).$ Plugging the pullback measure (2) into equation (1) and exchanging the order of integration gives:^[6]

P({\boldsymbol {\theta }}\in U)=\int _{U}p\,d\pi =\int _{U}p({\boldsymbol {\theta }}\mid {\boldsymbol {\mu }},{\boldsymbol {\Sigma }})\,{\sqrt {\left|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})\right|}}\,d\theta _{1}\,\cdots \,d\theta _{n-1}

where the first integral is Lebesgue and the second Riemann. Finally, for better geometric understanding of the square-root factor, consider:

For $n=2$, when integrating over the unit circle w.r.t. $\theta _{1}$, with embedding $e(\theta _{1})=(\cos \theta _{1},\sin \theta _{1})$, the Jacobian is $\mathbf {E} _{\boldsymbol {\theta }}=[-\sin \theta _{1}\,\cos \theta _{1}]'$, so that ${\sqrt {\left|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})\right|}}=1$. The angular differential $d\theta _{1}$ directly gives the subtended arc length on the circle.
For $n=3$, when integrating over the unit sphere w.r.t. $\theta _{1},\theta _{2}$, we get ${\sqrt {\left|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})\right|}}=\sin \theta _{1}$, which is the radius of the circle of latitude at $\theta _{1}$ (compare the equator to a polar circle). The area of the surface patch subtended by the two angular differentials is $\sin \theta _{1}\,d\theta _{1}\,d\theta _{2}$.
More generally, for $n\geq 2$, let $\mathbf {T}$ be a square or tall matrix and let $/\mathbf {T} \!/$ denote the parallelotope spanned by its columns (which represent the edges meeting at a common vertex). The parallelotope volume is ${\sqrt {\left|\operatorname {det} (\mathbf {T} '\mathbf {T} )\right|}},$ the square root of the absolute value of the Gram determinant. For square $\mathbf {T}$, the volume simplifies to $\left|\operatorname {det} (\mathbf {T} )\right|.$ Now let $\mathbf {R} =\operatorname {diag} (d\theta _{1},\cdots ,d\theta _{n-1})$, so that $/\mathbf {R} /\subset {\boldsymbol {\Theta }}$ is a rectangular box with infinitesimally small volume, $\left|\operatorname {det} (\mathbf {R} )\right|=\prod _{i=1}^{n-1}d\theta _{i}$. Since the smooth embedding function is linear at small scale, the embedded image is the parallelotope $e(/\mathbf {R} /)=/\mathbf {E} _{\boldsymbol {\theta }}\mathbf {R} /$, with volume (the area of the subtended hyperspherical surface patch): ${\sqrt {|\operatorname {det} (\mathbf {R} '\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }}\mathbf {R} )|}}={\sqrt {|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})|}}\,d\theta _{1}\,\cdots \,d\theta _{n-1}.$
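Both the product-of-sines formula for ${\sqrt {|\operatorname {det} (\mathbf {E} _{\boldsymbol {\theta }}'\mathbf {E} _{\boldsymbol {\theta }})|}}$ and the Gram-determinant volume can be checked numerically. The sketch below makes three assumptions:

- the hyperspherical embedding has $\theta _{n-1}$ as the last (azimuthal) angle;
- $\mathbf {E} _{\boldsymbol {\theta }}$ is approximated by central finite differences, with an arbitrarily chosen step;
- determinants are computed by Gaussian elimination.

All function names are hypothetical.

```python
import math


def embed(theta):
    """Spherical coordinates at r = 1 (theta_{n-1} is the last, azimuthal angle)."""
    v, sin_prod = [], 1.0
    for t in theta:
        v.append(sin_prod * math.cos(t))
        sin_prod *= math.sin(t)
    return v + [sin_prod]


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, result = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result


def gram_volume(columns):
    """Volume sqrt(|det(T'T)|) of the parallelotope spanned by the given column vectors."""
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in columns] for ci in columns]
    return math.sqrt(abs(det(gram)))


def jacobian_columns(theta, eps=1e-6):
    """Columns d e / d theta_j of E_theta, by central finite differences."""
    cols = []
    for j in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[j] += eps
        minus[j] -= eps
        cols.append([(a - b) / (2.0 * eps) for a, b in zip(embed(plus), embed(minus))])
    return cols


def sine_product(theta):
    """prod_{i=1}^{n-2} sin^{n-1-i}(theta_i), with n = len(theta) + 1."""
    n = len(theta) + 1
    return math.prod(math.sin(theta[i - 1]) ** (n - 1 - i) for i in range(1, n - 1))


if __name__ == "__main__":
    theta = [0.7, 1.2, 2.0, 4.0]  # a point on the unit 4-sphere in R^5
    print(gram_volume(jacobian_columns(theta)), sine_product(theta))
```

For a tall $3\times 2$ matrix, `gram_volume` equals the norm of the cross product of the two columns. For a square matrix it equals $|\operatorname {det} (\mathbf {T} )|$.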

Circular distribution

For $n=2$, parametrising the position on the unit circle in polar coordinates as ${\boldsymbol {v}}=(\cos \theta ,\sin \theta )$, the density function can be written in terms of the parameters ${\boldsymbol {\mu }}$ and ${\boldsymbol {\Sigma }}$ of the initial normal distribution as

p(\theta |{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {e^{-{\frac {1}{2}}{\boldsymbol {\mu }}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {\mu }}}}{2\pi {\sqrt {|{\boldsymbol {\Sigma }}|}}{\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {v}}}}\left(1+T(\theta ){\frac {\Phi (T(\theta ))}{\phi (T(\theta ))}}\right)I_{[0,2\pi )}(\theta )

where $\phi$ and $\Phi$ are the density and cumulative distribution function of a standard normal distribution, $T(\theta )={\frac {{\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {\mu }}}{\sqrt {{\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {v}}}}}$, and $I$ is the indicator function.^[3]
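Below is a direct transcription of the circular density as a standard-library sketch, with $\Phi$ computed via `math.erf`; the function names are hypothetical. The ratio $\Phi (T)/\phi (T)$ is evaluated exactly as written. It is therefore only numerically safe for moderate $|T|$: it suffers cancellation for strongly negative $T$ and fails once $\phi (T)$ underflows (roughly $|T|>38$). A midpoint-rule check that the density integrates to one over $[0,2\pi )$ is included in the usage.

```python
import math


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def pn2_density(theta, mu, sigma):
    """Closed-form PN_2(mu, Sigma) density at angle theta (zero outside [0, 2*pi))."""
    if not 0.0 <= theta < 2.0 * math.pi:
        return 0.0  # the indicator I_[0, 2 pi)(theta)
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # Sigma^{-1}

    def quad(x, y):  # x^T Sigma^{-1} y
        return sum(x[i] * inv[i][j] * y[j] for i in range(2) for j in range(2))

    v = (math.cos(theta), math.sin(theta))
    A = quad(v, v)
    T = quad(v, mu) / math.sqrt(A)
    prefactor = math.exp(-0.5 * quad(mu, mu)) / (2.0 * math.pi * math.sqrt(det) * A)
    return prefactor * (1.0 + T * std_normal_cdf(T) / std_normal_pdf(T))


if __name__ == "__main__":
    mu, sigma = (1.0, -0.5), ((1.0, 0.4), (0.4, 0.6))
    m = 2000
    h = 2.0 * math.pi / m
    print(h * sum(pn2_density((k + 0.5) * h, mu, sigma) for k in range(m)))  # should be close to 1
```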

In the circular case, if the mean vector ${\boldsymbol {\mu }}$ is parallel to the eigenvector associated with the largest eigenvalue of the covariance, the distribution is symmetric and has a mode at $\theta =\alpha$ and either a mode or an antimode at $\theta =\alpha +\pi$, where $\alpha$ is the polar angle of ${\boldsymbol {\mu }}=(r\cos \alpha ,r\sin \alpha )$. If the mean is instead parallel to the eigenvector associated with the smallest eigenvalue, the distribution is also symmetric but has either a mode or an antimode at $\theta =\alpha$ and an antimode at $\theta =\alpha +\pi$.^[7]

Spherical distribution

For $n=3$, parametrising the position on the unit sphere in spherical coordinates as ${\boldsymbol {v}}=(\cos \theta _{1}\sin \theta _{2},\sin \theta _{1}\sin \theta _{2},\cos \theta _{2})$, where ${\boldsymbol {\theta }}=(\theta _{1},\theta _{2})$ are the azimuth $\theta _{1}\in [0,2\pi )$ and inclination $\theta _{2}\in [0,\pi ]$ angles respectively, the density function becomes

p({\boldsymbol {\theta }}|{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {e^{-{\frac {1}{2}}{\boldsymbol {\mu }}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {\mu }}}}{{\sqrt {|{\boldsymbol {\Sigma }}|}}\left(2\pi {\boldsymbol {v}}^{\top }{\boldsymbol {\Sigma }}^{-1}{\boldsymbol {v}}\right)^{\frac {3}{2}}}}\left({\frac {\Phi (T({\boldsymbol {\theta }}))}{\phi (T({\boldsymbol {\theta }}))}}+T({\boldsymbol {\theta }})\left(1+T({\boldsymbol {\theta }}){\frac {\Phi (T({\boldsymbol {\theta }}))}{\phi (T({\boldsymbol {\theta }}))}}\right)\right)I_{[0,2\pi )}(\theta _{1})I_{[0,\pi ]}(\theta _{2})

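where $\phi$, $\Phi$, $T$, and $I$ have the same meaning as in the circular case.^[8] As with the other ${\mathcal {PN}}_{n}$ densities in this article, this density is defined w.r.t. surface area on the sphere, which in this parametrisation is $\sin \theta _{2}\,d\theta _{1}\,d\theta _{2}$.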
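A similar standard-library sketch for $n=3$ follows. The function names are hypothetical, and the $3\times 3$ inverse is computed by the adjugate formula. The angles are assumed to lie in their ranges, so the indicators are omitted. Because the density is defined w.r.t. surface area, the normalization check multiplies by $\sin \theta _{2}$ on an arbitrary midpoint grid. The ratio $\Phi (T)/\phi (T)$ is evaluated as written, which is only numerically safe for moderate $|T|$.

```python
import math


def std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def inverse_and_det_3x3(m):
    """Inverse (adjugate / determinant) and determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = ((e * i - f * h, c * h - b * i, b * f - c * e),
           (f * g - d * i, a * i - c * g, c * d - a * f),
           (d * h - e * g, b * g - a * h, a * e - b * d))
    return [[x / det for x in row] for row in adj], det


def pn3_density(theta1, theta2, mu, sigma):
    """PN_3(mu, Sigma) density at azimuth theta1 and inclination theta2 (w.r.t. surface area)."""
    inv, det = inverse_and_det_3x3(sigma)

    def quad(x, y):  # x^T Sigma^{-1} y
        return sum(x[r] * inv[r][s] * y[s] for r in range(3) for s in range(3))

    v = (math.cos(theta1) * math.sin(theta2), math.sin(theta1) * math.sin(theta2), math.cos(theta2))
    A = quad(v, v)
    T = quad(v, mu) / math.sqrt(A)
    ratio = std_normal_cdf(T) / std_normal_pdf(T)
    prefactor = math.exp(-0.5 * quad(mu, mu)) / (math.sqrt(det) * (2.0 * math.pi * A) ** 1.5)
    return prefactor * (ratio + T * (1.0 + T * ratio))


def total_probability(mu, sigma, m1=160, m2=80):
    """Midpoint rule for the integral of p * sin(theta2) d theta1 d theta2 over the sphere."""
    h1, h2 = 2.0 * math.pi / m1, math.pi / m2
    total = 0.0
    for k2 in range(m2):
        t2 = (k2 + 0.5) * h2
        for k1 in range(m1):
            total += pn3_density((k1 + 0.5) * h1, t2, mu, sigma) * math.sin(t2)
    return total * h1 * h2
```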

Angular Central Gaussian Distribution

In the special case ${\boldsymbol {\mu }}=\mathbf {0}$, the projected normal distribution with $n\geq 2$ is known as the angular central Gaussian (ACG)^[9], and in this case the density function can be obtained in closed form as a function of Cartesian coordinates. Let $\mathbf {x} \sim {\mathcal {N}}_{n}(\mathbf {0} ,{\boldsymbol {\Sigma }})$ and project radially: $\mathbf {v} =\lVert \mathbf {x} \rVert ^{-1}\mathbf {x}$, so that $\mathbf {v} \in \mathbb {S} ^{n-1}=\{\mathbf {z} \in \mathbb {R} ^{n}:\lVert \mathbf {z} \rVert =1\}$ (the unit hypersphere). We write $\mathbf {v} \sim \operatorname {ACG} ({\boldsymbol {\Sigma }})$, which, as explained above, at ${\boldsymbol {v}}=e({\boldsymbol {\theta }})$ has density:

{\tilde {p}}_{\text{ACG}}(\mathbf {v} \mid {\boldsymbol {\Sigma }})=p({\boldsymbol {\theta }}\mid {\boldsymbol {0}},{\boldsymbol {\Sigma }})=\int _{0}^{\infty }r^{n-1}{\mathcal {N}}_{n}(r\mathbf {v} \mid \mathbf {0} ,{\boldsymbol {\Sigma }})\,dr={\frac {\Gamma ({\frac {n}{2}})}{2\pi ^{\frac {n}{2}}}}\left|{\boldsymbol {\Sigma }}\right|^{-{\frac {1}{2}}}(\mathbf {v} '{\boldsymbol {\Sigma }}^{-1}\mathbf {v} )^{-{\frac {n}{2}}}

where the integral can be solved by the change of variables $t={\tfrac {1}{2}}r^{2}\,\mathbf {v} '{\boldsymbol {\Sigma }}^{-1}\mathbf {v}$ and then using the standard definition of the gamma function. Notice that:

For any $k>0$ there is the parameter indeterminacy:

{\tilde {p}}_{\text{ACG}}(\mathbf {v} \mid k{\boldsymbol {\Sigma }})={\tilde {p}}_{\text{ACG}}(\mathbf {v} \mid {\boldsymbol {\Sigma }})

If ${\boldsymbol {\Sigma }}=k\mathbf {I} _{n}$, the uniform hypersphere distribution, $\operatorname {ACG} (\mathbf {I} _{n})$, results, with constant density equal to the reciprocal of the surface area of $\mathbb {S} ^{n-1}$:

{\tilde {p}}_{\text{ACG}}(\mathbf {v} \mid k\mathbf {I} _{n})=p_{\text{uniform}}={\frac {\Gamma ({\frac {n}{2}})}{2\pi ^{\frac {n}{2}}}}
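The closed-form ACG density and the two properties above can be checked with the following standard-library sketch, which uses a hand-written Cholesky factorization and hypothetical names. Here $\mathbf {v} '{\boldsymbol {\Sigma }}^{-1}\mathbf {v}$ is computed by a triangular solve, and $|{\boldsymbol {\Sigma }}|^{1/2}$ as the product of the Cholesky diagonal.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def acg_density(v, sigma):
    """ACG(Sigma) density at the unit vector v, w.r.t. surface area on S^{n-1}."""
    n = len(v)
    L = cholesky(sigma)
    y = []
    for i in range(n):  # solve L y = v, so that ||y||^2 = v' Sigma^{-1} v
        y.append((v[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    quad = sum(c * c for c in y)
    sqrt_det = math.prod(L[i][i] for i in range(n))           # |Sigma|^{1/2}
    uniform = math.gamma(n / 2) / (2.0 * math.pi ** (n / 2))  # 1 / area of S^{n-1}
    return uniform / (sqrt_det * quad ** (n / 2))


if __name__ == "__main__":
    sigma = [[1.0, 0.4], [0.4, 0.6]]
    print(acg_density([0.6, 0.8], sigma), acg_density([0.6, 0.8], [[3.0 * x for x in r] for r in sigma]))
```

The two printed values coincide, which illustrates the scale indeterminacy.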

ACG via transformation of normal or uniform variates

Let $\mathbf {T}$ be any $n$-by-$n$ invertible matrix such that $\mathbf {T} \mathbf {T} '={\boldsymbol {\Sigma }}$. Let $\mathbf {u} \sim \operatorname {ACG} (\mathbf {I} _{n})$ (uniform) and, independently, $s\sim \chi (n)$ (chi distribution), so that $\mathbf {x} =s\mathbf {Tu} \sim {\mathcal {N}}_{n}(\mathbf {0} ,{\boldsymbol {\Sigma }})$ (multivariate normal). Now consider:

\mathbf {v} ={\frac {\mathbf {Tu} }{\lVert \mathbf {Tu} \rVert }}={\frac {\mathbf {x} }{\lVert \mathbf {x} \rVert }}\sim \operatorname {ACG} ({\boldsymbol {\Sigma }})

which shows that the ACG distribution also results from applying the normalized linear transform to uniform variates:^[9]

f_{\mathbf {T} }(\mathbf {u} )={\frac {\mathbf {Tu} }{\lVert \mathbf {Tu} \rVert }}

Some further explanation of these two ways to obtain $\mathbf {v} \sim \operatorname {ACG} ({\boldsymbol {\Sigma }})$ may be helpful:

If we start with $\mathbf {x} \in \mathbb {R} ^{n}$, sampled from a multivariate normal, we can project radially onto $\mathbb {S} ^{n-1}$ to obtain ACG variates. To derive the ACG density, we first do a change of variables, $\mathbf {x} \mapsto (r,\mathbf {v} )$, which is still an $n$-dimensional representation. This transformation induces the differential volume change factor $r^{n-1}$, which is proportional to volume in the $(n-1)$-dimensional tangent space perpendicular to $\mathbf {x}$. Then, to finally obtain the ACG density on the $(n-1)$-dimensional unit sphere, we need to marginalize over $r$.
If we start with $\mathbf {u} \in \mathbb {S} ^{n-1}$ , sampled from the uniform distribution, we do not need to marginalize, because we are already in $n-1$ dimensions. Instead, to obtain ACG variates (and the associated density), we can directly do the change of variables, $\mathbf {v} =f_{\mathbf {T} }(\mathbf {u} )$ , for which further details are given in the next subsection.
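The two constructions can be compared in a short sketch. With the same draws of $\mathbf {u}$ and $s$, both give the same $\mathbf {v}$, since $s>0$ cancels in the normalization. The intermediate $\mathbf {x} =s\mathbf {Tu}$ should have sample covariance close to ${\boldsymbol {\Sigma }}$. The sketch rests on the following choices:

- $\mathbf {T}$ is taken to be the Cholesky factor of ${\boldsymbol {\Sigma }}$ (an assumed choice; any $\mathbf {T}$ with $\mathbf {TT} '={\boldsymbol {\Sigma }}$ works);
- uniform variates are generated by normalizing standard normal vectors;
- $\chi (n)$ variates are the norms of independent standard normal vectors.

Names, seed and sample size are illustrative.

```python
import math
import random


def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma (sigma symmetric positive definite)."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def matvec(T, x):
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]


def normalize(x):
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x]


def uniform_on_sphere(n, rng):
    """u ~ ACG(I_n), i.e. uniform on S^{n-1}: a normalized standard normal vector."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(n)])


def chi_variate(n, rng):
    """s ~ chi(n): the norm of an independent n-dimensional standard normal vector."""
    return math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))


def f_T(T, u):
    """Normalized linear transform f_T(u) = T u / ||T u||."""
    return normalize(matvec(T, u))


def both_routes(sigma, num_samples, seed=0):
    """Return (x samples, v by radial projection of x, v by f_T(u))."""
    rng = random.Random(seed)
    T = cholesky(sigma)  # one valid choice of T with T T' = Sigma
    n = len(sigma)
    xs, v_proj, v_flow = [], [], []
    for _ in range(num_samples):
        u = uniform_on_sphere(n, rng)
        s = chi_variate(n, rng)
        x = [s * c for c in matvec(T, u)]  # x ~ N_n(0, Sigma)
        xs.append(x)
        v_proj.append(normalize(x))
        v_flow.append(f_T(T, u))
    return xs, v_proj, v_flow
```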

Caveat: when ${\boldsymbol {\mu }}$ is nonzero, although $s\mathbf {Tu} +{\boldsymbol {\mu }}\sim {\mathcal {N}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$, a similar duality does not hold:

{\frac {\mathbf {Tu} +{\boldsymbol {\mu }}}{\lVert \mathbf {Tu} +{\boldsymbol {\mu }}\rVert }}\neq {\frac {s\mathbf {Tu} +{\boldsymbol {\mu }}}{\lVert s\mathbf {Tu} +{\boldsymbol {\mu }}\rVert }}\sim {\mathcal {PN}}_{n}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})

Although we can radially project affine-transformed normal variates to get ${\mathcal {PN}}_{n}$ variates, this does not work for uniform variates.
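A quick numerical illustration of the caveat follows, assuming $n=2$, ${\boldsymbol {\Sigma }}=\mathbf {I}$ (so $\mathbf {T} =\mathbf {I}$) and the arbitrary mean ${\boldsymbol {\mu }}=(2,0)$. Since $\lVert \mathbf {u} \rVert =1<\lVert {\boldsymbol {\mu }}\rVert$, every point $\mathbf {u} +{\boldsymbol {\mu }}$ lies on a circle of radius 1 centred at ${\boldsymbol {\mu }}$. The projected angle of the uniform route therefore never exceeds $\pi /6$ in magnitude, whereas the projected normal puts positive probability on the whole circle. Names and seed are illustrative.

```python
import math
import random


def projected_angles(mu, num_samples, seed=0):
    """Polar angles of (u + mu)/||u + mu|| and (s u + mu)/||s u + mu||,
    for u uniform on the unit circle and s ~ chi(2) drawn independently."""
    rng = random.Random(seed)
    uniform_route, normal_route = [], []
    for _ in range(num_samples):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        u = (math.cos(phi), math.sin(phi))
        s = math.sqrt(rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2)
        # Radial projection does not change the polar angle, so atan2 of the raw vector suffices.
        uniform_route.append(math.atan2(u[1] + mu[1], u[0] + mu[0]))
        normal_route.append(math.atan2(s * u[1] + mu[1], s * u[0] + mu[0]))
    return uniform_route, normal_route


if __name__ == "__main__":
    a, b = projected_angles((2.0, 0.0), 20000)
    print(max(abs(t) for t in a), max(abs(t) for t in b))
```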

Wider application of the normalized linear transform

The normalized linear transform, $\mathbf {v} =f_{\mathbf {T} }(\mathbf {u} )$, is a bijection from the unit sphere to itself; the inverse is $\mathbf {u} =f_{\mathbf {T} ^{-1}}(\mathbf {v} )$. This transform is of independent interest, as it may be applied as a probabilistic flow on the hypersphere (similar to a normalizing flow) to generalize other (non-uniform) distributions on hyperspheres as well, for example the von Mises-Fisher distribution. Because the ACG density is available in closed form, the differential volume change induced by this transform can also be recovered in closed form.

For the change of variables $\mathbf {v} =f_{\mathbf {T} }(\mathbf {u} )$ on the manifold $\mathbb {S} ^{n-1}$, the uniform and ACG densities are related as:^[6]

{\tilde {p}}_{\text{ACG}}(\mathbf {v} \mid {\boldsymbol {\Sigma }})={\frac {p_{\text{uniform}}}{R(\mathbf {v} ,{\boldsymbol {\Sigma }})}}

where the (constant) uniform density is $p_{\text{uniform}}={\frac {\Gamma (n/2)}{2\pi ^{n/2}}}$ and where $R(\mathbf {v} ,{\boldsymbol {\Sigma }})$ is the differential volume change factor from the input to the output of the transformation. Specifically, it is given by the absolute value of the determinant of an $(n-1)$-by-$(n-1)$ matrix:

R(\mathbf {v} ,{\boldsymbol {\Sigma }})=\operatorname {abs} \left|\mathbf {Q} _{\mathbf {v} }'\mathbf {J} _{\mathbf {u} }\mathbf {Q} _{\mathbf {u} }\right|

where $\mathbf {J} _{\mathbf {u} }$ is the $n$-by-$n$ Jacobian matrix of the transformation in Euclidean space, $f_{\mathbf {T} }:\mathbb {R} ^{n}\to \mathbb {R} ^{n}$, evaluated at $\mathbf {u}$. In Euclidean space, the transformation and its Jacobian are non-invertible. When the domain and co-domain are restricted to $\mathbb {S} ^{n-1}$, however, $f_{\mathbf {T} }:\mathbb {S} ^{n-1}\to \mathbb {S} ^{n-1}$ is a bijection. The induced differential volume ratio $R(\mathbf {v} ,{\boldsymbol {\Sigma }})$ is then obtained by projecting $\mathbf {J} _{\mathbf {u} }$ onto the $(n-1)$-dimensional tangent spaces at the transformation input and output: $\mathbf {Q} _{\mathbf {u} },\mathbf {Q} _{\mathbf {v} }$ are $n$-by-$(n-1)$ matrices whose orthonormal columns span these tangent spaces. The above determinant formula is relatively easy to evaluate numerically on a software platform equipped with linear algebra and automatic differentiation, but a simple closed form is hard to derive directly. However, since we already have ${\tilde {p}}_{\text{ACG}}$, we can recover:

R(\mathbf {v} ,{\boldsymbol {\Sigma }})=\left|{\boldsymbol {\Sigma }}\right|^{\frac {1}{2}}(\mathbf {v} '{\boldsymbol {\Sigma }}^{-1}\mathbf {v} )^{\frac {n}{2}}={\frac {\operatorname {abs} \left|\mathbf {T} \right|}{\lVert \mathbf {Tu} \rVert ^{n}}}

where in the final RHS it is understood that ${\boldsymbol {\Sigma }}=\mathbf {T} \mathbf {T} '$ and $\mathbf {u} =f_{\mathbf {T} ^{-1}}(\mathbf {v} )$ .
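The determinant formula for $R$ and its closed form can be compared numerically. The sketch below is for $n=3$ only and relies on two assumptions:

- the Euclidean Jacobian is taken in its analytic form $\mathbf {J} _{\mathbf {u} }=(\mathbf {I} -\mathbf {v} \mathbf {v} ')\mathbf {T} /\lVert \mathbf {Tu} \rVert$, obtained by differentiating $\mathbf {Tu} /\lVert \mathbf {Tu} \rVert$;
- orthonormal tangent bases are built with a cross product.

The matrix $\mathbf {T}$ and the points $\mathbf {u}$ used for checking are arbitrary examples, and all names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]


def normalize(x):
    n = math.sqrt(dot(x, x))
    return [c / n for c in x]


def tangent_basis(p):
    """Two orthonormal vectors spanning the tangent plane of S^2 at the unit vector p."""
    k = min(range(3), key=lambda i: abs(p[i]))  # coordinate axis least aligned with p
    q1 = normalize([(1.0 if i == k else 0.0) - p[k] * p[i] for i in range(3)])
    return q1, cross(p, q1)  # p x q1 is unit length and orthogonal to both p and q1


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def volume_ratio_numeric(T, u):
    """R = |det(Q_v' J_u Q_u)| for v = f_T(u), with J_u = (I - v v') T / ||T u||."""
    Tu = matvec(T, u)
    norm_Tu = math.sqrt(dot(Tu, Tu))
    v = [c / norm_Tu for c in Tu]
    J = [[(T[r][c] - v[r] * sum(v[k] * T[k][c] for k in range(3))) / norm_Tu
          for c in range(3)] for r in range(3)]
    Qu, Qv = tangent_basis(u), tangent_basis(v)
    M = [[dot(qv, matvec(J, qu)) for qu in Qu] for qv in Qv]  # 2x2 matrix Q_v' J_u Q_u
    return abs(M[0][0] * M[1][1] - M[0][1] * M[1][0])


def volume_ratio_closed_form(T, u):
    """R = |det T| / ||T u||^n with n = 3."""
    Tu = matvec(T, u)
    return abs(det3(T)) / math.sqrt(dot(Tu, Tu)) ** 3
```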

The normalized linear transform can now be used, for example, to give a closed-form density for a more flexible distribution on the hypersphere, generalized from the von Mises-Fisher distribution. Let $\mathbf {x} \sim {\text{VMF}}({\boldsymbol {\mu }},\kappa )$ and $\mathbf {v} =f_{\mathbf {T} }(\mathbf {x} )$; the resulting density is:

p(\mathbf {v} \mid {\boldsymbol {\mu }},\kappa ,\mathbf {T} )={\frac {{\tilde {p}}_{\text{VMF}}{\bigl (}f_{\mathbf {T} ^{-1}}(\mathbf {v} )\mid {\boldsymbol {\mu }},\kappa {\bigr )}}{R(\mathbf {v} ,\mathbf {T} \mathbf {T} ')}}
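As a sketch of this construction for $n=3$, the code below evaluates the transformed von Mises-Fisher density. It relies on three ingredients:

- the closed form $R(\mathbf {v} ,\mathbf {T} \mathbf {T} ')=|\operatorname {det} \mathbf {T} |\,\lVert \mathbf {T} ^{-1}\mathbf {v} \rVert ^{n}$, which follows from the expression for $R$ above because $\lVert \mathbf {Tu} \rVert =1/\lVert \mathbf {T} ^{-1}\mathbf {v} \rVert$ when $\mathbf {u} =f_{\mathbf {T} ^{-1}}(\mathbf {v} )$;
- the standard $n=3$ VMF density $\kappa e^{\kappa {\boldsymbol {\mu }}'\mathbf {x} }/(4\pi \sinh \kappa )$ w.r.t. surface area;
- an assumed lower-triangular $\mathbf {T}$, chosen for simplicity so that $\mathbf {T} ^{-1}\mathbf {v}$ is a forward substitution.

A midpoint-rule check that the density integrates to one is included. Names and parameter values are illustrative.

```python
import math


def vmf3_density(x, mu, kappa):
    """von Mises-Fisher density on S^2 w.r.t. surface area (kappa > 0)."""
    return kappa * math.exp(kappa * sum(a * b for a, b in zip(mu, x))) / (4.0 * math.pi * math.sinh(kappa))


def solve_lower(T, v):
    """Solve T w = v for lower-triangular T (forward substitution)."""
    w = []
    for i in range(len(v)):
        w.append((v[i] - sum(T[i][k] * w[k] for k in range(i))) / T[i][i])
    return w


def transformed_vmf_density(v, mu, kappa, T):
    """p(v | mu, kappa, T) = p_VMF(f_{T^{-1}}(v)) / R(v, T T') on S^2, for lower-triangular T."""
    w = solve_lower(T, v)                  # T^{-1} v
    norm_w = math.sqrt(sum(c * c for c in w))
    u = [c / norm_w for c in w]            # u = f_{T^{-1}}(v)
    det_T = T[0][0] * T[1][1] * T[2][2]    # determinant of a triangular matrix
    R = abs(det_T) * norm_w ** 3           # equals |det T| / ||T u||^3
    return vmf3_density(u, mu, kappa) / R


def total_probability(mu, kappa, T, m1=200, m2=100):
    """Midpoint rule for the integral of p * sin(theta2) over azimuth theta1 and inclination theta2."""
    h1, h2 = 2.0 * math.pi / m1, math.pi / m2
    total = 0.0
    for k2 in range(m2):
        t2 = (k2 + 0.5) * h2
        for k1 in range(m1):
            t1 = (k1 + 0.5) * h1
            v = (math.cos(t1) * math.sin(t2), math.sin(t1) * math.sin(t2), math.cos(t2))
            total += transformed_vmf_density(v, mu, kappa, T) * math.sin(t2)
    return total * h1 * h2
```

With $\mathbf {T} =\mathbf {I}$ the transformed density reduces to the ordinary VMF density.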

References

[1] Wang & Gelfand 2013.
[2] Pukkila & Rao 1988.
[3] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, p. 115.
[4] Pukkila & Rao 1988, p. 381.
[5] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, p. 117.
[6] Sorrenson et al. 2024, Appendix A.
[7] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, Supplementary material, p. 1.
[8] Hernandez-Stumpfhauser, Breidt & van der Woerd 2017, p. 123.
[9] Tyler 1987.

Sources

Pukkila, Tarmo M.; Rao, C. Radhakrishna (1988). "Pattern recognition based on scale invariant discriminant functions". Information Sciences. 45 (3): 379–389. doi:10.1016/0020-0255(88)90012-6.
Hernandez-Stumpfhauser, Daniel; Breidt, F. Jay; van der Woerd, Mark J. (2017). "The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference". Bayesian Analysis. 12 (1): 113–133. doi:10.1214/15-BA989.
Wang, Fangpo; Gelfand, Alan E. (2013). "Directional data analysis under the general projected normal distribution". Statistical Methodology. 10 (1). Elsevier: 113–127. doi:10.1016/j.stamet.2012.07.005. PMC 3773532. PMID 24046539.
Tyler, David E. (1987). "Statistical analysis for the angular central Gaussian distribution on the sphere". Biometrika. 74 (3): 579–589. doi:10.2307/2336697. JSTOR 2336697.
Sorrenson, Peter; Draxler, Felix; Rousselot, Armand; Hummerich, Sander; Köthe, Ullrich (2024). "Learning Distributions on Manifolds with Free-Form Flows". arXiv:2312.09852 [cs.LG].