U-statistic

Class of statistics in estimation theory

In statistical theory, a U-statistic is a statistic defined as the average of a given function (the kernel) applied to all tuples of a fixed size drawn from the observations. The letter "U" stands for unbiased. In elementary statistics, U-statistics arise naturally in producing minimum-variance unbiased estimators.

The theory of U-statistics allows a minimum-variance unbiased estimator to be derived from each unbiased estimator of an estimable parameter (alternatively, statistical functional) for large classes of probability distributions.[1][2] An estimable parameter is a measurable function of the population's cumulative probability distribution: for example, for every probability distribution, the population median is an estimable parameter. The theory of U-statistics applies to general classes of probability distributions.

History

Many statistics originally derived for particular parametric families have been recognized as U-statistics for general distributions. In non-parametric statistics, the theory of U-statistics is used to establish, for statistical procedures such as estimators and tests, results concerning their asymptotic normality and their variance in finite samples.[3] The theory has been used to study more general statistics as well as stochastic processes, such as random graphs.[4][5][6]

Suppose that a problem involves independent and identically-distributed random variables and that estimation of a certain parameter is required. Suppose that a simple unbiased estimate can be constructed based on only a few observations: this defines the basic estimator based on a given number of observations. For example, a single observation is itself an unbiased estimate of the mean and a pair of observations can be used to derive an unbiased estimate of the variance. The U-statistic based on this estimator is defined as the average (across all combinatorial selections of the given size from the full set of observations) of the basic estimator applied to the sub-samples.
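
As a concrete illustration of this construction, the following minimal Python sketch (function and variable names are illustrative, not from any standard library) averages the two-observation unbiased variance estimate $(x_1 - x_2)^2/2$ over all pairs of observations, which recovers the usual sample variance with divisor $n-1$:

```python
# Sketch of the construction described above: a "basic estimator" on a pair
# of observations, averaged over every pair in the data.
from itertools import combinations
from statistics import variance

def pair_kernel(x1, x2):
    # Unbiased estimate of the variance based on only two observations.
    return (x1 - x2) ** 2 / 2

def variance_u_statistic(xs):
    pairs = list(combinations(xs, 2))
    return sum(pair_kernel(a, b) for a, b in pairs) / len(pairs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_u_statistic(data))  # matches the usual sample variance ...
print(variance(data))              # ... (divisor n - 1), up to rounding
```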

Pranab K. Sen (1992) provides a review of the paper byWassily Hoeffding (1948), which introduced U-statistics and set out the theory relating to them, and in doing so Sen outlines the importance U-statistics have in statistical theory. Sen says,[7] “The impact of Hoeffding (1948) is overwhelming at the present time and is very likely to continue in the years to come.” Note that the theory of U-statistics is not limited to[8] the case ofindependent and identically-distributed random variables or to scalar random-variables.[9]

Definition

The term U-statistic, due to Hoeffding (1948), is defined as follows.

Let $K$ be either the real or complex numbers, and let $f\colon (K^d)^r \to K$ be a $K$-valued function of $r$ $d$-dimensional variables. For each $n \geq r$ the associated U-statistic $f_n\colon (K^d)^n \to K$ is defined to be the average of the values $f(x_{i_1},\dotsc,x_{i_r})$ over the set $I_{r,n}$ of $r$-tuples of indices from $\{1,2,\dotsc,n\}$ with distinct entries. Formally,

$$f_n(x_1,\dotsc,x_n) = \frac{1}{\prod_{i=0}^{r-1}(n-i)} \sum_{(i_1,\dotsc,i_r)\in I_{r,n}} f(x_{i_1},\dotsc,x_{i_r}).$$

In particular, if $f$ is symmetric, the above simplifies to

$$f_n(x_1,\dotsc,x_n) = \frac{1}{\binom{n}{r}} \sum_{(i_1,\dotsc,i_r)\in J_{r,n}} f(x_{i_1},\dotsc,x_{i_r}),$$

where now $J_{r,n}$ denotes the subset of $I_{r,n}$ consisting of increasing tuples.

Each U-statistic $f_n$ is necessarily a symmetric function.
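
The definition translates directly into code. The following brute-force Python sketch (the names u_statistic and u_statistic_symmetric are illustrative, and enumerating all tuples costs $O(n^r)$, so it is only meant to mirror the formulas above) computes both the general form over $I_{r,n}$ and the symmetric form over $J_{r,n}$:

```python
# Direct, unoptimized implementation of the two formulas above.
from itertools import permutations, combinations
from math import comb, perm

def u_statistic(f, xs, r):
    """Average of f over all r-tuples of distinct indices (the set I_{r,n})."""
    n = len(xs)
    total = sum(f(*(xs[i] for i in idx)) for idx in permutations(range(n), r))
    return total / perm(n, r)        # perm(n, r) = n(n-1)...(n-r+1)

def u_statistic_symmetric(f, xs, r):
    """Equivalent form for symmetric f: average over increasing tuples J_{r,n}."""
    n = len(xs)
    total = sum(f(*(xs[i] for i in idx)) for idx in combinations(range(n), r))
    return total / comb(n, r)

data = [1.0, 3.0, 2.0, 5.0]
kernel = lambda a, b: (a - b) ** 2 / 2   # a symmetric kernel (see Examples)
print(u_statistic(kernel, data, 2),
      u_statistic_symmetric(kernel, data, 2))  # equal, up to rounding
```

For a symmetric kernel the two functions return the same value; the second avoids evaluating the kernel on every permutation of the same index set.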

U-statistics are very natural in statistical work, particularly in Hoeffding's context of independent and identically distributed random variables, or more generally for exchangeable sequences, such as in simple random sampling from a finite population, where the defining property is termed 'inheritance on the average'.

Fisher's k-statistics and Tukey's polykays are examples of homogeneous polynomial U-statistics (Fisher, 1929; Tukey, 1950).

For a simple random sample $\varphi$ of size $n$ taken from a population of size $N$, the U-statistic has the property that the average over all such samples of the sample value $f_n(x_\varphi)$ is exactly equal to the population value $f_N(x)$.
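
This finite-population property can be checked numerically. The sketch below reuses the illustrative u_statistic_symmetric function defined above, with a made-up population, and averages the sample U-statistic over every simple random sample of size $n$:

```python
# Check of the finite-population ("inheritance on the average") property.
from itertools import combinations

population = [1.0, 2.0, 4.0, 7.0, 11.0]     # N = 5
N, n, r = len(population), 3, 2
kernel = lambda a, b: (a - b) ** 2 / 2      # the variance kernel

population_value = u_statistic_symmetric(kernel, population, r)
sample_values = [u_statistic_symmetric(kernel, list(s), r)
                 for s in combinations(population, n)]

# The average over all C(N, n) samples recovers the population value
# (exactly in exact arithmetic; up to rounding in floating point).
print(sum(sample_values) / len(sample_values), population_value)
```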

Examples

Some examples: If $f(x) = x$, the U-statistic $f_n(x) = \bar{x}_n = (x_1 + \cdots + x_n)/n$ is the sample mean.

If $f(x_1, x_2) = |x_1 - x_2|$, the U-statistic is the mean pairwise deviation $f_n(x_1, \ldots, x_n) = \frac{2}{n(n-1)} \sum_{i>j} |x_i - x_j|$, defined for $n \geq 2$.

If $f(x_1, x_2) = (x_1 - x_2)^2/2$, the U-statistic is the sample variance $f_n(x) = \sum (x_i - \bar{x}_n)^2/(n-1)$ with divisor $n-1$, defined for $n \geq 2$.

The third $k$-statistic $k_{3,n}(x) = \frac{n}{(n-1)(n-2)} \sum (x_i - \bar{x}_n)^3$, the sample skewness defined for $n \geq 3$, is a U-statistic.

The following case highlights an important point. If $f(x_1, x_2, x_3)$ is the median of three values, $f_n(x_1, \ldots, x_n)$ is not the median of $n$ values. However, it is a minimum-variance unbiased estimate of the expected value of the median of three values, not the median of the population. Similar estimates play a central role where the parameters of a family of probability distributions are being estimated by probability weighted moments or L-moments.
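
The examples above can be reproduced numerically with the illustrative u_statistic_symmetric sketch from the Definition section (all names here are for illustration only):

```python
# Checking the sample mean, the mean pairwise deviation, and the
# median-of-three kernel against direct computations.
import statistics

data = [3.1, 0.5, 2.2, 7.8, 4.4, 1.9]
n = len(data)

mean_u = u_statistic_symmetric(lambda x: x, data, 1)
dev_u  = u_statistic_symmetric(lambda a, b: abs(a - b), data, 2)
med3_u = u_statistic_symmetric(lambda a, b, c: sorted((a, b, c))[1], data, 3)

# Direct formula for the mean pairwise deviation, for comparison.
dev_direct = 2 / (n * (n - 1)) * sum(
    abs(data[i] - data[j]) for i in range(n) for j in range(i))

print(mean_u, statistics.mean(data))    # agree (up to rounding): sample mean
print(dev_u, dev_direct)                # agree (up to rounding): pairwise deviation
print(med3_u, statistics.median(data))  # generally different, as noted above
```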

Notes

  1. Cox & Hinkley (1974), pp. 200, 258
  2. Hoeffding (1948), between Eqs. (4.3) and (4.4)
  3. Sen (1992)
  4. Page 508 in Koroljuk, V. S.; Borovskich, Yu. V. (1994). Theory of U-statistics. Mathematics and its Applications. Vol. 273 (Translated by P. V. Malyshev and D. V. Malyshev from the 1989 Russian original). Dordrecht: Kluwer Academic Publishers Group. pp. x+552. ISBN 0-7923-2608-3. MR 1472486.
  5. Pages 381–382 in Borovskikh, Yu. V. (1996). U-statistics in Banach spaces. Utrecht: VSP. pp. xii+420. ISBN 90-6764-200-2. MR 1419498.
  6. Page xii in Kwapień, Stanisław; Woyczyński, Wojbor A. (1992). Random series and stochastic integrals: Single and multiple. Probability and its Applications. Boston, MA: Birkhäuser Boston, Inc. pp. xvi+360. ISBN 0-8176-3572-6. MR 1167198.
  7. Sen (1992), p. 307
  8. Sen (1992), p. 306
  9. Borovskikh's last chapter discusses U-statistics for exchangeable random elements taking values in a vector space (separable Banach space).
