Directional statistics

From Wikipedia, the free encyclopedia
Subdiscipline of statistics

Directional statistics (also circular statistics or spherical statistics) is the subdiscipline of statistics that deals with directions (unit vectors in Euclidean space, $R^n$), axes (lines through the origin in $R^n$) or rotations in $R^n$. More generally, directional statistics deals with observations on compact Riemannian manifolds, including the Stiefel manifold.

The overall shape of a protein can be parameterized as a sequence of points on the unit sphere. Shown are two views of the spherical histogram of such points for a large collection of protein structures. The statistical treatment of such data is in the realm of directional statistics.[1]

The fact that 0 degrees and 360 degrees are identical angles, so that for example 180 degrees is not a sensible mean of 2 degrees and 358 degrees, provides one illustration that special statistical methods are required for the analysis of some types of data (in this case, angular data). Other examples of data that may be regarded as directional include statistics involving temporal periods (e.g. time of day, week, month, year, etc.), compass directions, dihedral angles in molecules, orientations, rotations and so on.
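As a sketch of why ordinary averaging fails here, the 2-degree/358-degree example can be worked numerically by averaging the angles as unit vectors in the plane (the variable names are illustrative):

```python
import numpy as np

# Two angles just either side of 0 deg; their arithmetic mean is a
# misleading 180 deg, while the circular mean is (essentially) 0 deg.
angles_deg = np.array([2.0, 358.0])
theta = np.deg2rad(angles_deg)

arithmetic_mean = angles_deg.mean()  # 180.0 -- not a sensible "average direction"

# Average the angles as points on the unit circle, then take the
# direction of the resultant vector.
circular_mean = np.rad2deg(
    np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())
) % 360.0
```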

Circular distributions

Main article: Circular distribution

Any probability density function (pdf) $p(x)$ on the line can be "wrapped" around the circumference of a circle of unit radius.[2] That is, the pdf of the wrapped variable $\theta = x_w = x \bmod 2\pi \in (-\pi, \pi]$ is

$$p_w(\theta) = \sum_{k=-\infty}^{\infty} p(\theta + 2\pi k).$$
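The wrapping sum can be sketched numerically by truncating it at a finite number of terms; the truncation level and the choice of a normal linear pdf below are illustrative assumptions:

```python
import numpy as np
from scipy.stats import norm

def wrapped_pdf(theta, linear_pdf, n_terms=50):
    """Approximate p_w(theta) = sum_k p(theta + 2*pi*k) with |k| <= n_terms."""
    k = np.arange(-n_terms, n_terms + 1)
    return linear_pdf(theta + 2 * np.pi * k).sum()

# Wrap a normal distribution and check that the wrapped pdf still
# integrates to 1 over one period (-pi, pi].
grid = np.linspace(-np.pi, np.pi, 2001)
vals = np.array([wrapped_pdf(t, norm(loc=1.0, scale=2.0).pdf) for t in grid])
area = vals.mean() * 2 * np.pi  # simple quadrature over the period
```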

This concept can be extended to the multivariate context by replacing the single sum with $F$ sums that cover all dimensions of the feature space:

$$p_w(\boldsymbol{\theta}) = \sum_{k_1=-\infty}^{\infty} \cdots \sum_{k_F=-\infty}^{\infty} p(\boldsymbol{\theta} + 2\pi k_1 \mathbf{e}_1 + \dots + 2\pi k_F \mathbf{e}_F)$$

where $\mathbf{e}_k = (0, \dots, 0, 1, 0, \dots, 0)^{\mathsf{T}}$ is the $k$-th Euclidean basis vector.

The following sections show some relevant circular distributions.

von Mises circular distribution

Main article: von Mises distribution

The von Mises distribution is a circular distribution which, like any other circular distribution, may be thought of as a wrapping of a certain linear probability distribution around the circle. The underlying linear probability distribution for the von Mises distribution is mathematically intractable; however, for statistical purposes, there is no need to deal with the underlying linear distribution. The usefulness of the von Mises distribution is twofold: it is the most mathematically tractable of all circular distributions, allowing simpler statistical analysis, and it is a close approximation to the wrapped normal distribution, which, analogously to the linear normal distribution, is important because it is the limiting case for the sum of a large number of small angular deviations. In fact, the von Mises distribution is often known as the "circular normal" distribution because of its ease of use and its close relationship to the wrapped normal distribution.[3]

The pdf of the von Mises distribution is

$$f(\theta; \mu, \kappa) = \frac{e^{\kappa \cos(\theta - \mu)}}{2\pi I_0(\kappa)}$$

where $I_0$ is the modified Bessel function of order 0.
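The closed form above can be checked against SciPy's implementation of the von Mises density (the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import vonmises
from scipy.special import i0  # modified Bessel function I_0

mu, kappa = 0.5, 2.0
theta = np.linspace(-np.pi, np.pi, 101)

# pdf written out exactly as in the text
manual = np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

# the same density from scipy.stats
library = vonmises.pdf(theta, kappa, loc=mu)
```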

Circular uniform distribution

Main article: Circular uniform distribution

The probability density function (pdf) of the circular uniform distribution is given by

$$U(\theta) = \frac{1}{2\pi}.$$

It can also be thought of as the $\kappa = 0$ limiting case of the von Mises distribution above.

Wrapped normal distribution

Main article: Wrapped normal distribution

The pdf of the wrapped normal distribution (WN) is

$$WN(\theta; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} \exp\left[\frac{-(\theta - \mu - 2\pi k)^2}{2\sigma^2}\right] = \frac{1}{2\pi} \vartheta\left(\frac{\theta - \mu}{2\pi}, \frac{i\sigma^2}{2\pi}\right)$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the unwrapped distribution, respectively, and $\vartheta(\theta, \tau)$ is the Jacobi theta function:

$$\vartheta(\theta, \tau) = \sum_{n=-\infty}^{\infty} (w^2)^n q^{n^2}$$

where $w \equiv e^{i\pi\theta}$ and $q \equiv e^{i\pi\tau}$.
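A numerical sketch of the series (the truncation level is an assumption) that checks two known properties of WN: it integrates to 1, and its first trigonometric moment has length $e^{-\sigma^2/2}$ and angle $\mu$:

```python
import numpy as np

def wrapped_normal_pdf(theta, mu, sigma, n_terms=20):
    """Truncation of the series sum_k N(theta; mu + 2*pi*k, sigma)."""
    k = np.arange(-n_terms, n_terms + 1)
    x = np.subtract.outer(theta - mu, 2 * np.pi * k)  # theta - mu - 2*pi*k
    return np.exp(-x**2 / (2 * sigma**2)).sum(axis=-1) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 0.3, 1.0
grid = np.linspace(-np.pi, np.pi, 2001)
pdf = wrapped_normal_pdf(grid, mu, sigma)

area = pdf.mean() * 2 * np.pi                      # normalization, ~1
m1 = (pdf * np.exp(1j * grid)).mean() * 2 * np.pi  # first trigonometric moment
```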

Wrapped Cauchy distribution

Main article: Wrapped Cauchy distribution

The pdf of the wrapped Cauchy distribution (WC) is

$$WC(\theta; \theta_0, \gamma) = \sum_{n=-\infty}^{\infty} \frac{\gamma}{\pi(\gamma^2 + (\theta + 2\pi n - \theta_0)^2)} = \frac{1}{2\pi} \, \frac{\sinh\gamma}{\cosh\gamma - \cos(\theta - \theta_0)}$$

where $\gamma$ is the scale factor and $\theta_0$ is the peak position.
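The series and the closed form can be compared numerically; the truncation level `n_terms` is an assumption (the Cauchy tails decay slowly, so many terms are kept):

```python
import numpy as np

def wc_series(theta, theta0, gamma, n_terms=2000):
    # direct truncation of the wrapping sum
    n = np.arange(-n_terms, n_terms + 1)
    x = theta + 2 * np.pi * n - theta0
    return (gamma / (np.pi * (gamma**2 + x**2))).sum()

def wc_closed(theta, theta0, gamma):
    # closed form from the text
    return np.sinh(gamma) / (2 * np.pi * (np.cosh(gamma) - np.cos(theta - theta0)))

theta, theta0, gamma = 1.0, 0.4, 0.7   # arbitrary illustrative values
series_val = wc_series(theta, theta0, gamma)
closed_val = wc_closed(theta, theta0, gamma)
```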

Wrapped Lévy distribution

Main article: Wrapped Lévy distribution

The pdf of the wrapped Lévy distribution (WL) is

$$f_{WL}(\theta; \mu, c) = \sum_{n=-\infty}^{\infty} \sqrt{\frac{c}{2\pi}} \, \frac{e^{-c/(2(\theta + 2\pi n - \mu))}}{(\theta + 2\pi n - \mu)^{3/2}}$$

where the value of the summand is taken to be zero when $\theta + 2\pi n - \mu \leq 0$, $c$ is the scale factor, and $\mu$ is the location parameter.
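A numerical sketch of this series, with the summand set to zero where its argument is non-positive, as in the definition. The truncation level is an assumption, and convergence is slow because of the heavy Lévy tail, so a little mass is lost:

```python
import numpy as np

def wrapped_levy_pdf(theta, mu, c, n_terms=2000):
    n = np.arange(-n_terms, n_terms + 1)
    x = theta + 2 * np.pi * n - mu
    out = np.zeros_like(x)
    pos = x > 0                               # summand is zero for x <= 0
    out[pos] = np.sqrt(c / (2 * np.pi)) * np.exp(-c / (2 * x[pos])) / x[pos] ** 1.5
    return out.sum()

mu, c = 0.0, 1.0
grid = np.linspace(-np.pi, np.pi, 1001)
pdf = np.array([wrapped_levy_pdf(t, mu, c) for t in grid])
area = pdf.mean() * 2 * np.pi   # close to 1; truncation drops some tail mass
```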

Projected normal distribution

Main article: Projected normal distribution

The projected normal distribution is a circular distribution representing the direction of a random variable with a multivariate normal distribution, obtained by radially projecting the variable onto the unit (n−1)-sphere. Because of this, and unlike other commonly used circular distributions, it is in general neither symmetric nor unimodal.
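Sampling from a projected normal is straightforward: draw from the underlying bivariate normal and keep only the direction of each point. The mean and covariance below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

mean = np.array([1.0, 0.5])
cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])

# Radially project bivariate-normal draws onto the unit circle by
# keeping only the angle of each point.
xy = rng.multivariate_normal(mean, cov, size=10_000)
theta = np.arctan2(xy[:, 1], xy[:, 0])   # angles in (-pi, pi]

z_bar = np.exp(1j * theta).mean()        # sample resultant vector
```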

Distributions on higher-dimensional manifolds

Three sets of points sampled from different Kent distributions on the sphere.

There also exist distributions on the two-dimensional sphere (such as the Kent distribution[4]), the N-dimensional sphere (the von Mises–Fisher distribution[5]) or the torus (the bivariate von Mises distribution[6]).

The matrix von Mises–Fisher distribution[7] is a distribution on the Stiefel manifold, and can be used to construct probability distributions over rotation matrices.[8]

The Bingham distribution is a distribution over axes in N dimensions, or equivalently, over points on the (N − 1)-dimensional sphere with the antipodes identified.[9] For example, if N = 2, the axes are undirected lines through the origin in the plane. In this case, each axis cuts the unit circle in the plane (which is the one-dimensional sphere) at two points that are each other's antipodes. For N = 4, the Bingham distribution is a distribution over the space of unit quaternions (versors). Since a versor corresponds to a rotation matrix, the Bingham distribution for N = 4 can be used to construct probability distributions over the space of rotations, just like the matrix von Mises–Fisher distribution.

These distributions are for example used in geology,[10] crystallography[11] and bioinformatics.[1][12][13]

Moments


The raw vector (or trigonometric) moments of a circular distribution are defined as

$$m_n = \operatorname{E}(z^n) = \int_\Gamma P(\theta) z^n \, d\theta$$

where $\Gamma$ is any interval of length $2\pi$, $P(\theta)$ is the pdf of the circular distribution, and $z = e^{i\theta}$. Since the integral of $P(\theta)$ over $\Gamma$ is unity and the integration interval is finite, the moments of any circular distribution are always finite and well defined.

Sample moments are analogously defined:

$$\overline{m}_n = \frac{1}{N} \sum_{i=1}^{N} z_i^n.$$
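Sample trigonometric moments can be computed directly from the angles; for the von Mises distribution the moment lengths are known to satisfy $|m_n| = I_n(\kappa)/I_0(\kappa)$, which gives a check (sample size and parameters below are arbitrary):

```python
import numpy as np
from scipy.special import iv  # modified Bessel functions I_n

rng = np.random.default_rng(1)
theta = rng.vonmises(0.8, 4.0, size=5000)   # mu = 0.8, kappa = 4
z = np.exp(1j * theta)

moments = {n: (z**n).mean() for n in (1, 2, 3)}  # sample moments m_bar_n
R1 = abs(moments[1])              # mean resultant length
theta_bar = np.angle(moments[1])  # sample mean angle

expected_R1 = iv(1, 4.0) / iv(0, 4.0)  # population value for von Mises
```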

The population resultant vector, length, and mean angle are defined in analogy with the corresponding sample parameters.

$$\rho = m_1, \qquad R = |m_1|, \qquad \theta_n = \operatorname{Arg}(m_n).$$

In addition, the lengths of the higher moments are defined as:

$$R_n = |m_n|$$

while the angular parts of the higher moments are just $(n\theta_n) \bmod 2\pi$. The lengths of all moments lie between 0 and 1.

Measures of location and spread


Various measures ofcentral tendency andstatistical dispersion may be defined for both the population and a sample drawn from that population.[3]

Central tendency

Further information: Circular mean

The most common measure of location is the circular mean. The population circular mean is simply the first moment of the distribution while the sample mean is the first moment of the sample. The sample mean will serve as an unbiased estimator of the population mean.

When data is concentrated, themedian andmode may be defined by analogy to the linear case, but for more dispersed or multi-modal data, these concepts are not useful.

Dispersion

See also: Yamartino method

The most common measures of circular spread are:

  - the circular variance, $\operatorname{Var}(z) = 1 - \overline{R}$, which varies between 0 and 1;
  - the circular standard deviation, $S(z) = \sqrt{-2\ln \overline{R}}$, which varies between 0 and infinity and is analogous to the linear standard deviation;
  - the circular dispersion, $\hat{\delta} = (1 - \overline{R}_2)/(2\overline{R}_1^2)$, which is useful in the statistical analysis of the mean direction.
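A sketch of the two simplest spread measures, assuming the usual definitions circular variance $= 1 - \overline{R}$ and circular standard deviation $= \sqrt{-2 \ln \overline{R}}$ (the data and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = rng.vonmises(0.0, 2.0, size=4000)   # moderately concentrated sample

R_bar = abs(np.exp(1j * theta).mean())      # mean resultant length, in [0, 1]

circ_variance = 1.0 - R_bar                 # 0 = no spread, 1 = maximal spread
circ_std = np.sqrt(-2.0 * np.log(R_bar))    # analogue of the linear std. dev.
```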

Distribution of the mean


Given a set of $N$ measurements $z_n = e^{i\theta_n}$, the mean value of $z$ is defined as:

$$\overline{z} = \frac{1}{N} \sum_{n=1}^{N} z_n$$

which may be expressed as

$$\overline{z} = \overline{C} + i\overline{S}$$

where

$$\overline{C} = \frac{1}{N} \sum_{n=1}^{N} \cos(\theta_n) \quad \text{and} \quad \overline{S} = \frac{1}{N} \sum_{n=1}^{N} \sin(\theta_n)$$

or, alternatively as:

$$\overline{z} = \overline{R} e^{i\overline{\theta}}$$

where

$$\overline{R} = \sqrt{\overline{C}^2 + \overline{S}^2} \quad \text{and} \quad \overline{\theta} = \arctan(\overline{S}/\overline{C}),$$

with the arctangent taken in the quadrant determined by the signs of $\overline{C}$ and $\overline{S}$ (i.e., the two-argument atan2).
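The two decompositions agree, as a quick numerical check shows; note that a quadrant-aware arctangent (atan2) is needed, since $\arctan(\overline{S}/\overline{C})$ alone cannot distinguish opposite quadrants. The angles below are arbitrary:

```python
import numpy as np

theta = np.array([0.1, 2.0, -1.2, 3.0])   # illustrative angles in radians

C_bar = np.cos(theta).mean()
S_bar = np.sin(theta).mean()

R_bar = np.hypot(C_bar, S_bar)             # sqrt(C_bar**2 + S_bar**2)
theta_bar = np.arctan2(S_bar, C_bar)       # quadrant-correct arctan(S_bar/C_bar)

z_bar = np.exp(1j * theta).mean()          # same quantity, computed directly
```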

The distribution of the mean angle ($\overline{\theta}$) for a circular pdf $P(\theta)$ is given by:

$$P(\overline{C}, \overline{S}) \, d\overline{C} \, d\overline{S} = P(\overline{R}, \overline{\theta}) \, d\overline{R} \, d\overline{\theta} = \int_\Gamma \cdots \int_\Gamma \prod_{n=1}^{N} \left[ P(\theta_n) \, d\theta_n \right]$$

where $\Gamma$ is any interval of length $2\pi$ and the integral is subject to the constraint that $\overline{S}$ and $\overline{C}$ are constant or, alternatively, that $\overline{R}$ and $\overline{\theta}$ are constant.

The calculation of the distribution of the mean for most circular distributions is not analytically possible, and in order to carry out an analysis of variance, numerical or mathematical approximations are needed.[14]

The central limit theorem may be applied to the distribution of the sample means (main article: Central limit theorem for directional statistics). It can be shown[14] that the distribution of $[\overline{C}, \overline{S}]$ approaches a bivariate normal distribution in the limit of large sample size.
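A simulation sketch of this limiting behaviour: the spread of the sample mean component $\overline{C}$ shrinks like $1/\sqrt{N}$, as the bivariate CLT predicts (the sample sizes and the generating distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_component_std(n_obs, n_reps=2000):
    """Std. dev. of C_bar across many replicate samples of size n_obs."""
    theta = rng.vonmises(0.0, 1.5, size=(n_reps, n_obs))
    c_bar = np.cos(theta).mean(axis=1)
    return c_bar.std()

s100 = mean_component_std(100)
s400 = mean_component_std(400)

# Quadrupling the sample size should roughly halve the spread.
ratio = s100 / s400
```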

Goodness of fit and significance testing


For cyclic data (e.g., to test whether a sample is uniformly distributed around the circle), commonly used tests include the Rayleigh test, Kuiper's test, and Watson's $U^2$ test.

References

  1. ^ Hamelryck, Thomas; Kent, John T.; Krogh, Anders (2006). "Sampling realistic protein conformations using local structural bias". PLOS Computational Biology. 2 (9): e131. doi:10.1371/journal.pcbi.0020131. PMC 1570370. PMID 17002495.
  2. ^ Bahlmann, C. (2006). "Directional features in online handwriting recognition". Pattern Recognition. 39.
  3. ^ Fisher 1993.
  4. ^ Kent, J. T. (1982). "The Fisher–Bingham distribution on the sphere". Journal of the Royal Statistical Society. 44: 71–80.
  5. ^ Fisher, R. A. (1953). "Dispersion on a sphere". Proceedings of the Royal Society of London, Series A. 217: 295–305.
  6. ^ Mardia, K. V.; Taylor, C. C.; Subramaniam, G. K. (2007). "Protein Bioinformatics and Mixtures of Bivariate von Mises Distributions for Angular Data". Biometrics. 63 (2): 505–512. doi:10.1111/j.1541-0420.2006.00682.x. PMID 17688502.
  7. ^ Pal, Subhadip; Sengupta, Subhajit; Mitra, Riten; Banerjee, Arunava (2020). "Conjugate Priors and Posterior Inference for the Matrix Langevin Distribution on the Stiefel Manifold". Bayesian Analysis. 15 (3): 871–908. doi:10.1214/19-BA1176.
  8. ^ Downs (1972). "Orientational statistics". Biometrika. 59 (3): 665–676. doi:10.1093/biomet/59.3.665.
  9. ^ Bingham, C. (1974). "An Antipodally Symmetric Distribution on the Sphere". Annals of Statistics. 2 (6): 1201–1225. doi:10.1214/aos/1176342874.
  10. ^ Peel, D.; Whiten, W. J.; McLachlan, G. J. (2001). "Fitting mixtures of Kent distributions to aid in joint set identification". Journal of the American Statistical Association. 96 (453): 56–63. doi:10.1198/016214501750332974.
  11. ^ Krieger Lassen, N. C.; Juul Jensen, D.; Conradsen, K. (1994). "On the statistical analysis of orientation data". Acta Crystallographica. A50 (6): 741–748. doi:10.1107/S010876739400437X.
  12. ^ Kent, J. T.; Hamelryck, T. (2005). "Using the Fisher–Bingham distribution in stochastic models for protein structure". In S. Barber, P. D. Baxter, K. V. Mardia, & R. E. Walls (Eds.), Quantitative Biology, Shape Analysis, and Wavelets, pp. 57–60. Leeds: Leeds University Press.
  13. ^ Boomsma, Wouter; Mardia, Kanti V.; Taylor, Charles C.; Ferkinghoff-Borg, Jesper; Krogh, Anders; Hamelryck, Thomas (2008). "A generative, probabilistic model of local protein structure". Proceedings of the National Academy of Sciences. 105 (26): 8932–8937. doi:10.1073/pnas.0801715105. PMC 2440424. PMID 18579771.
  14. ^ Jammalamadaka & Sengupta 2001.
