Statistical distance

From Wikipedia, the free encyclopedia
Distance between two statistical objects

In statistics, probability theory, and information theory, a statistical distance quantifies the distance between two statistical objects, which can be two random variables, or two probability distributions or samples, or the distance can be between an individual sample point and a population or a wider sample of points.

A distance between populations can be interpreted as measuring the distance between two probability distributions, so such distances are essentially measures of distance between probability measures. Where statistical distance measures relate to the differences between random variables, these variables may have statistical dependence,[1] and hence these distances are not directly related to measures of distance between probability measures. Again, a measure of distance between random variables may relate to the extent of dependence between them, rather than to their individual values.

Many statistical distance measures are not metrics, and some are not symmetric. Some types of distance measures, which generalize squared distance, are referred to as (statistical) divergences.

Terminology

Many terms are used to refer to various notions of distance; these are often confusingly similar, and may be used inconsistently between authors and over time, either loosely or with precise technical meaning. In addition to "distance", similar terms include deviance, deviation, discrepancy, discrimination, and divergence, as well as others such as contrast function and metric. Terms from information theory include cross entropy, relative entropy, discrimination information, and information gain.

Distances as metrics

Metrics

A metric on a set X is a function (called the distance function or simply distance) d : X × X → R+ (where R+ is the set of non-negative real numbers). For all x, y, z in X, this function is required to satisfy the following conditions:

  1. d(x, y) ≥ 0     (non-negativity)
  2. d(x, y) = 0 if and only if x = y     (identity of indiscernibles; conditions 1 and 2 together give positive definiteness)
  3. d(x, y) = d(y, x)     (symmetry)
  4. d(x, z) ≤ d(x, y) + d(y, z)     (subadditivity / triangle inequality)
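
The four conditions can be checked mechanically on a finite set of points. The following is a minimal sketch, not a general proof technique: the helper name `is_metric`, the sample points, and the numerical tolerance are assumptions made for illustration. It compares the absolute difference with the squared difference on a few real numbers.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check the four metric axioms for d on a finite collection of distinct points."""
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:                      # 1. non-negativity
            return False
        if (abs(d(x, y)) <= tol) != (x == y):   # 2. identity of indiscernibles
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # 3. symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:   # 4. triangle inequality
            return False
    return True

# Hypothetical sample points on the real line.
points = [0.0, 1.0, 2.5, 4.0]

def absolute(x, y):
    return abs(x - y)

def squared(x, y):
    return (x - y) ** 2

print(is_metric(points, absolute))  # True: all four axioms hold on these points
print(is_metric(points, squared))   # False: the triangle inequality fails
```

The squared difference fails condition 4 here because d(0, 2.5) = 6.25 exceeds d(0, 1) + d(1, 2.5) = 3.25. Passing the check on a finite sample does not establish that a function is a metric on the whole set.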

Generalized metrics

Many statistical distances are not metrics, because they lack one or more properties of proper metrics. For example, pseudometrics relax property (2), identity of indiscernibles; quasimetrics relax property (3), symmetry; and semimetrics relax property (4), the triangle inequality. Statistical distances that satisfy (1) and (2) are referred to as divergences.
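
As an illustration of a divergence that is not symmetric, consider relative entropy (the Kullback-Leibler divergence), one of the information-theoretic terms listed under Terminology. The choice of this example, the function name `kl_divergence`, and the two distributions are assumptions of this sketch rather than part of the article.

```python
import math

def kl_divergence(p, q):
    """Relative entropy sum_i p_i * log(p_i / q_i), in nats.

    Assumes p and q are probability vectors of equal length
    and that q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical distributions on a two-point domain.
p = [0.5, 0.5]
q = [0.9, 0.1]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)

print(d_pq, d_qp)            # the two directions give different values
print(kl_divergence(p, p))   # 0.0: a distribution has zero divergence from itself
```

Both values are positive and the divergence of a distribution from itself is zero, in line with properties (1) and (2), while the two directions disagree, so property (3) does not hold.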

Statistically close

The total variation distance of two distributions $X$ and $Y$ over a finite domain $D$ (often referred to as statistical difference[2] or statistical distance[3] in cryptography) is defined as

$$\Delta(X,Y) = \frac{1}{2} \sum_{\alpha \in D} \left| \Pr[X=\alpha] - \Pr[Y=\alpha] \right|.$$
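
The definition translates directly into code when each distribution is given as a table of probabilities. This is a minimal sketch: the function name `total_variation`, the dictionary representation, and the coin distributions are assumptions chosen for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two distributions on a finite domain.

    p and q map outcomes to probabilities; an outcome missing from a
    dictionary is treated as having probability 0.
    """
    domain = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in domain)

# Hypothetical example: a fair coin versus a biased coin.
fair = {"H": 0.5, "T": 0.5}
biased = {"H": 0.75, "T": 0.25}

print(total_variation(fair, biased))                 # 0.25
print(total_variation(fair, fair))                   # 0.0
print(total_variation({"a": 1.0}, {"b": 1.0}))       # 1.0 for disjoint supports
```

The factor of one half keeps the value between 0 (identical distributions) and 1 (distributions with disjoint supports).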

We say that two probability ensembles $\{X_k\}_{k\in\mathbb{N}}$ and $\{Y_k\}_{k\in\mathbb{N}}$ are statistically close if $\Delta(X_k, Y_k)$ is a negligible function in $k$.
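
A negligible function is one that eventually falls below the reciprocal of every polynomial in $k$. The sketch below uses a hypothetical pair of ensembles chosen for illustration: $X_k$ is uniform over all $k$-bit strings and $Y_k$ is uniform over the nonzero $k$-bit strings. Exact rational arithmetic shows that their total variation distance is $2^{-k}$ for the small values of $k$ tested; the code illustrates the decay and does not prove negligibility.

```python
from fractions import Fraction

def total_variation(p, q):
    """Exact total variation distance between two finite distributions."""
    domain = set(p) | set(q)
    return sum(abs(p.get(a, 0) - q.get(a, 0)) for a in domain) / 2

def uniform_all(k):
    """Uniform distribution over all k-bit strings, encoded as integers."""
    n = 2 ** k
    return {a: Fraction(1, n) for a in range(n)}

def uniform_nonzero(k):
    """Uniform distribution over the k-bit strings other than the all-zero string."""
    n = 2 ** k
    return {a: Fraction(1, n - 1) for a in range(1, n)}

distances = {
    k: total_variation(uniform_all(k), uniform_nonzero(k))
    for k in range(1, 9)
}

for k, dist in distances.items():
    print(k, dist)   # each distance equals 1 / 2**k
```

Since $2^{-k}$ shrinks faster than the reciprocal of any polynomial in $k$, these two ensembles are statistically close in the sense defined above.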

Examples

Metrics

Divergences

See also

Notes

  1. Dodge, Y. (2003), entry for distance.
  2. Goldreich, Oded (2001). Foundations of Cryptography: Basic Tools (1st ed.). Berlin: Cambridge University Press. p. 106. ISBN 0-521-79172-3.
  3. Reyzin, Leo. (Lecture Notes) Extractors and the Leftover Hash Lemma.

External links

References

Retrieved from "https://en.wikipedia.org/w/index.php?title=Statistical_distance&oldid=1289978641"