Bias of an estimator

From Wikipedia, the free encyclopedia

Statistical property

For broader coverage of this topic, see Bias (statistics).

In statistics, the bias of an estimator (or bias function) is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator. Bias is a distinct concept from consistency: consistent estimators converge in probability to the true value of the parameter, but may be biased or unbiased (see bias versus consistency for more).

All else being equal, an unbiased estimator is preferable to a biased estimator, although in practice, biased estimators (with generally small bias) are frequently used. When a biased estimator is used, bounds of the bias are calculated. A biased estimator may be used for various reasons: because an unbiased estimator does not exist without further assumptions about a population; because an estimator is difficult to compute (as in unbiased estimation of standard deviation); because a biased estimator may be unbiased with respect to different measures of central tendency; because a biased estimator gives a lower value of some loss function (particularly mean squared error) compared with unbiased estimators (notably in shrinkage estimators); or because in some cases being unbiased is too strong a condition, and the only unbiased estimators are not useful.

Bias can also be measured with respect to the median, rather than the mean (expected value), in which case one distinguishes median-unbiasedness from the usual mean-unbiasedness property. Mean-unbiasedness is not preserved under non-linear transformations, though median-unbiasedness is (see § Effect of transformations); for example, the square root of the unbiased sample variance is a biased estimator of the population standard deviation. These are all illustrated below.

An unbiased estimator for a parameter need not always exist. For example, there is no unbiased estimator for the reciprocal of the parameter of a binomial random variable.^[1]

Definition

Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, $P_{\theta }(x)=P(x\mid \theta )$, and a statistic ${\hat {\theta }}$ which serves as an estimator of θ based on any observed data $x$. That is, we assume that our data follows some unknown distribution $P(x\mid \theta )$ (where θ is a fixed, unknown constant that is part of this distribution), and then we construct some estimator ${\hat {\theta }}$ that maps observed data to values that we hope are close to θ. The bias of ${\hat {\theta }}$ relative to $\theta$ is defined as^[2]

$\operatorname {Bias} ({\hat {\theta }},\theta )=\operatorname {Bias} _{\theta }\left[\,{\hat {\theta }}\,\right]=\operatorname {E} _{x\mid \theta }\left[\,{\hat {\theta }}\,\right]-\theta =\operatorname {E} _{x\mid \theta }\left[{\hat {\theta }}-\theta \right],$

where $\operatorname {E} _{x\mid \theta }$ denotes expected value over the distribution $P(x\mid \theta )$ (i.e., averaging over all possible observations $x$). The final equality follows from linearity of expectation, since θ is a constant under the conditional distribution $P(x\mid \theta )$.

An estimator is said to be unbiased if its bias is zero for all values of the parameter θ, or equivalently, if the expected value of the estimator equals the value of the parameter.^[3] Unbiasedness is not guaranteed to carry over under transformations. For example, if ${\hat {\theta }}$ is an unbiased estimator for parameter θ, it is not guaranteed in general that $g({\hat {\theta }})$ is an unbiased estimator for g(θ), unless g is a linear function.^[4]

In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference.
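The simulation approach can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the helper name `estimate_bias`, the Uniform(0, θ) model, the sample size of 5 and the two estimators compared are all assumptions chosen for the example and do not come from the text above.

```python
import random
import statistics

def estimate_bias(estimator, sampler, true_value, n_trials=20_000, seed=0):
    """Approximate E[estimator(x)] - true_value by the mean signed difference."""
    rng = random.Random(seed)
    signed_errors = [estimator(sampler(rng)) - true_value for _ in range(n_trials)]
    return statistics.fmean(signed_errors)

# Hypothetical setup: 5 draws from Uniform(0, theta) with theta = 1.
theta = 1.0

def draw_sample(rng):
    return [rng.uniform(0.0, theta) for _ in range(5)]

# The sample maximum can never exceed theta, so it is biased downward.
bias_of_max = estimate_bias(max, draw_sample, theta)
# Twice the sample mean is mean-unbiased for theta in this model.
bias_of_2mean = estimate_bias(lambda xs: 2 * statistics.fmean(xs), draw_sample, theta)

print(f"sample maximum: {bias_of_max:+.3f}, 2 * sample mean: {bias_of_2mean:+.3f}")
```

The averaged signed error is itself a random quantity, so a value near zero only suggests unbiasedness up to Monte Carlo error; increasing `n_trials` tightens that error.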

Examples

Sample variance

Main article: Sample variance

The sample variance highlights two different issues about bias and risk. First, the “naive” estimator that divides by n is biased downward because the sample mean is estimated from the same data. Multiplying by n/(n − 1) (Bessel’s correction) yields an unbiased estimator. Second, unbiasedness does not imply minimum mean squared error.

Suppose X₁, ..., X_n are independent and identically distributed (i.i.d.) random variables with expectation μ and variance σ². If the sample mean and uncorrected sample variance are defined as

${\overline {X}}\,={\frac {1}{n}}\sum _{i=1}^{n}X_{i}\qquad S^{2}={\frac {1}{n}}\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\right)^{2}\qquad$

then S² is a biased estimator of σ². This follows immediately from the law of total variance because

$\underbrace {\operatorname {Var} (X)} _{\sigma ^{2}}=\underbrace {\operatorname {E} \left[\operatorname {Var} \left(X\mid {\bar {X}}\right)\right]} _{E[S^{2}]}+\underbrace {\operatorname {Var} \left(\operatorname {E} \left[X\mid {\bar {X}}\right]\right)} _{\sigma ^{2}/n},\quad \implies E[S^{2}]={\frac {n-1}{n}}\sigma ^{2}.$

In other words, the expected value of the uncorrected sample variance does not equal the population variance σ², unless multiplied by a normalization factor. The ratio of the unbiased to the biased (uncorrected) estimate of the variance, n/(n − 1), is known as Bessel's correction. The sample mean, on the other hand, is an unbiased^[5] estimator of the population mean μ.^[3] The value of the second term on the right-hand side of the equation above can be understood in terms of Bienaymé's identity,

${\begin{aligned}\operatorname {Var} \left(\operatorname {E} [X\mid {\bar {X}}]\right)&=\operatorname {Var} \left({\overline {X}}\right)=\operatorname {Var} \left({\frac {1}{n}}\sum _{i=1}^{n}X_{i}\right)\\[1ex]&={\frac {1}{n^{2}}}\sum _{i=1}^{n}\operatorname {Var} \left(X_{i}\right)={\frac {1}{n^{2}}}n\sigma ^{2}={\frac {\sigma ^{2}}{n}}.\end{aligned}}$

The reason that an uncorrected sample variance, S², is biased stems from the fact that the sample mean is an ordinary least squares (OLS) estimator for μ: ${\overline {X}}$ is the number that makes the sum ${\textstyle \sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}}$ as small as possible. That is, when any other number is plugged into this sum, the sum can only increase. In particular, the choice $\mu \neq {\overline {X}}$ gives

${\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}<{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2},$ and then ${\begin{aligned}\operatorname {E} [S^{2}]&=\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-{\overline {X}})^{2}{\bigg ]}<\operatorname {E} {\bigg [}{\frac {1}{n}}\sum _{i=1}^{n}(X_{i}-\mu )^{2}{\bigg ]}=\sigma ^{2}.\end{aligned}}$

The above discussion can be understood in geometric terms: the vector ${\vec {C}}=(X_{1}-\mu ,\ldots ,X_{n}-\mu )$ can be decomposed into the "mean part" and "variance part" by projecting onto the direction of ${\vec {u}}=(1,\ldots ,1)$ and onto that direction's orthogonal complement hyperplane. One gets ${\vec {A}}=({\overline {X}}-\mu ,\ldots ,{\overline {X}}-\mu )$ for the part along ${\vec {u}}$ and ${\vec {B}}=(X_{1}-{\overline {X}},\ldots ,X_{n}-{\overline {X}})$ for the complementary part. Since this is an orthogonal decomposition, the Pythagorean theorem gives $|{\vec {C}}|^{2}=|{\vec {A}}|^{2}+|{\vec {B}}|^{2}$, and taking expectations we get $n\sigma ^{2}=n\operatorname {E} \left[({\overline {X}}-\mu )^{2}\right]+n\operatorname {E} [S^{2}]$, as above (but times $n$). If the distribution of ${\vec {C}}$ is rotationally symmetric, as in the case when $X_{i}$ are sampled from a Gaussian, then on average, the dimension along ${\vec {u}}$ contributes to $|{\vec {C}}|^{2}$ as much as each of the $n-1$ directions perpendicular to ${\vec {u}}$, so that $\operatorname {E} \left[({\overline {X}}-\mu )^{2}\right]={\frac {\sigma ^{2}}{n}}$ and $\operatorname {E} [S^{2}]={\frac {n-1}{n}}\sigma ^{2}$. This is in fact true in general, as explained above.
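The downward bias of the uncorrected sample variance can be checked numerically. The following is a minimal sketch, assuming normally distributed data with μ = 0; the function name, the sample size n = 5, σ = 2 and the number of trials are illustrative choices, not part of the derivation above.

```python
import random

def average_variance_estimates(n, sigma, n_trials=40_000, seed=1):
    """Average the uncorrected (divide by n) and corrected (divide by n - 1)
    sample variances over many simulated samples of size n."""
    rng = random.Random(seed)
    total_uncorrected = 0.0
    total_corrected = 0.0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        sum_sq = sum((x - xbar) ** 2 for x in xs)
        total_uncorrected += sum_sq / n        # S^2 as defined above
        total_corrected += sum_sq / (n - 1)    # with Bessel's correction
    return total_uncorrected / n_trials, total_corrected / n_trials

n, sigma = 5, 2.0
avg_s2, avg_corrected = average_variance_estimates(n, sigma)
print(f"sigma^2 = {sigma**2}, (n-1)/n * sigma^2 = {(n - 1) / n * sigma**2}")
print(f"average S^2 = {avg_s2:.3f}, average corrected = {avg_corrected:.3f}")
```

With these settings the average of S² should settle near (n − 1)σ²/n and the corrected average near σ², up to simulation noise.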

Estimating a Poisson probability

A far more extreme case of a biased estimator being better than any unbiased estimator arises from the Poisson distribution.^[6]^[7] Suppose that X has a Poisson distribution with expectation λ. Suppose it is desired to estimate $\operatorname {P} (X=0)^{2}=e^{-2\lambda }\quad$

with a sample of size 1. (For example, when incoming calls at a telephone switchboard are modeled as a Poisson process, and λ is the average number of calls per minute, then $e^{-2\lambda }$ (the estimand) is the probability that no calls arrive in the next two minutes.)

Since the expectation of an unbiased estimator δ(X) is equal to the estimand, i.e. $\operatorname {E} (\delta (X))=\sum _{x=0}^{\infty }\delta (x){\frac {\lambda ^{x}e^{-\lambda }}{x!}}=e^{-2\lambda },$

the only function of the data constituting an unbiased estimator is $\delta (x)=(-1)^{x}.\,$

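To see this, note that with $\delta (x)=(-1)^{x}$, factoring $e^{-\lambda }$ out of the above expression for the expectation leaves the sum $\sum _{x=0}^{\infty }(-\lambda )^{x}/x!$, which is the Taylor series expansion of $e^{-\lambda }$, yielding $e^{-\lambda }e^{-\lambda }=e^{-2\lambda }$ (see Characterizations of the exponential function). No other choice of δ works, because the identity must hold for every λ and a power series in λ determines its coefficients uniquely.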

If the observed value of X is 100, then the estimate is 1, although the true value of the quantity being estimated is very likely to be near 0, which is the opposite extreme. And, if X is observed to be 101, then the estimate is even more absurd: it is −1, although the quantity being estimated must be positive.

The (biased) maximum likelihood estimator $e^{-2{X}}\quad$

is far better than this unbiased estimator. Not only is its value always positive but it is also more accurate in the sense that its mean squared error $e^{-4\lambda }-2e^{\lambda (1/e^{2}-3)}+e^{\lambda (1/e^{4}-1)}\,$

is smaller; compare the unbiased estimator's MSE of $1-e^{-4\lambda }.\,$

The MSEs are functions of the true value λ. The bias of the maximum-likelihood estimator is: $e^{\lambda (1/e^{2}-1)}-e^{-2\lambda }.\,$
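The closed-form expressions above can be cross-checked by summing directly over the Poisson probability mass function. The sketch below is illustrative only: the function names and the truncation point `x_max` are assumptions, and the truncation is adequate only for moderate values of λ.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), lam > 0, computed on the log scale."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def mse_by_summation(estimator, lam, x_max=200):
    """E[(estimator(X) - e^{-2 lam})^2], truncating the series at x_max."""
    target = math.exp(-2 * lam)
    return sum((estimator(x) - target) ** 2 * poisson_pmf(x, lam)
               for x in range(x_max + 1))

def unbiased(x):
    return (-1) ** x            # the only unbiased estimator

def mle(x):
    return math.exp(-2 * x)     # biased maximum likelihood estimator

def mse_unbiased_closed_form(lam):
    return 1 - math.exp(-4 * lam)

def mse_mle_closed_form(lam):
    return (math.exp(-4 * lam)
            - 2 * math.exp(lam * (math.exp(-2) - 3))
            + math.exp(lam * (math.exp(-4) - 1)))

lam = 1.5  # illustrative value
print(mse_by_summation(unbiased, lam), mse_unbiased_closed_form(lam))
print(mse_by_summation(mle, lam), mse_mle_closed_form(lam))
```

For each estimator the two printed numbers should agree to within floating-point error, and the maximum likelihood estimator's MSE should be the smaller of the two.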

Maximum of a discrete uniform distribution

Main article: Maximum of a discrete uniform distribution

The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 to n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X given n is only (n + 1)/2; we can be certain only that n is at least X and is probably more. In this case, the natural unbiased estimator is 2X − 1.
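Because X takes only finitely many values, both expectations can be computed exactly. A small sketch using rational arithmetic follows; the helper name `expectation` and the choice n = 10 are illustrative assumptions.

```python
from fractions import Fraction

def expectation(estimator, n):
    """Exact E[estimator(X)] for X uniform on {1, ..., n}.
    The estimator is assumed to return integers."""
    return sum(Fraction(estimator(x), n) for x in range(1, n + 1))

n = 10  # illustrative number of tickets
mle_mean = expectation(lambda x: x, n)               # maximum-likelihood estimator X
unbiased_mean = expectation(lambda x: 2 * x - 1, n)  # estimator 2X - 1

print(f"E[X] = {mle_mean}, bias = {mle_mean - n}")
print(f"E[2X - 1] = {unbiased_mean}, bias = {unbiased_mean - n}")
```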

Median-unbiased estimators

The theory of median-unbiased estimators was revived by George W. Brown in 1947:^[8]

An estimate of a one-dimensional parameter θ will be said to be median-unbiased, if, for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. This requirement seems for most purposes to accomplish as much as the mean-unbiased requirement and has the additional property that it is invariant under one-to-one transformation.

Further properties of median-unbiased estimators have been noted by Lehmann, Birnbaum, van der Vaart and Pfanzagl.^[9] In particular, median-unbiased estimators exist in cases where mean-unbiased and maximum-likelihood estimators do not exist. They are invariant under one-to-one transformations.
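The invariance property can be illustrated with a small simulation. The model here is an assumption made for the example and is not taken from the text: a single observation X from an exponential distribution with scale θ, for which X / ln 2 has median θ and X has mean θ. The transformation g(t) = t² is order-preserving on positive values.

```python
import math
import random

def fraction_below(values, target):
    """Share of values that fall strictly below the target."""
    return sum(v < target for v in values) / len(values)

rng = random.Random(4)
theta = 3.0  # hypothetical true scale parameter
xs = [rng.expovariate(1 / theta) for _ in range(100_000)]  # rate = 1 / scale

median_unbiased = [x / math.log(2) for x in xs]  # median of this estimator is theta
mean_unbiased = xs                               # E[X] = theta

def g(t):
    return t ** 2  # monotone on positive values

# Median-unbiasedness: the estimate undershoots about half the time,
# both before and after applying g.
frac_before = fraction_below(median_unbiased, theta)
frac_after = fraction_below([g(t) for t in median_unbiased], g(theta))

# Mean-unbiasedness is lost: the average of g(X) is far from g(theta).
mean_after = sum(g(t) for t in mean_unbiased) / len(mean_unbiased)

print(f"undershoot share before g: {frac_before:.3f}, after g: {frac_after:.3f}")
print(f"g(theta) = {g(theta)}, average of g(X) = {mean_after:.2f}")
```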

There are methods of constructing median-unbiased estimators for probability distributions that have monotone likelihood functions, such as one-parameter exponential families, which ensure that the estimators are optimal (in a sense analogous to the minimum-variance property considered for mean-unbiased estimators).^[10]^[11] One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation but for a larger class of loss functions.^[11]

Bias with respect to other loss functions

Any minimum-variance mean-unbiased estimator minimizes the risk (expected loss) with respect to the squared-error loss function (among mean-unbiased estimators), as observed by Gauss.^[12] A minimum-average absolute deviation median-unbiased estimator minimizes the risk with respect to the absolute loss function (among median-unbiased estimators), as observed by Laplace.^[12]^[13] Other loss functions are used in statistics, particularly in robust statistics.^[12]^[14]

Effect of transformations

For univariate parameters, median-unbiased estimators remain median-unbiased under transformations that preserve order (or reverse order). Note that, when a transformation is applied to a mean-unbiased estimator, the result need not be a mean-unbiased estimator of its corresponding population statistic. By Jensen's inequality, a convex function as transformation will introduce positive bias, while a concave function will introduce negative bias, and a function of mixed convexity may introduce bias in either direction, depending on the specific function and distribution. That is, for a non-linear function f and a mean-unbiased estimator U of a parameter p, the composite estimator f(U) need not be a mean-unbiased estimator of f(p). For example, the square root of the unbiased estimator of the population variance is not a mean-unbiased estimator of the population standard deviation: the square root of the unbiased sample variance, the corrected sample standard deviation, is biased. The bias depends both on the sampling distribution of the estimator and on the transform, and can be quite involved to calculate; see unbiased estimation of standard deviation for a discussion in this case.
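The standard-deviation example can be illustrated by simulation. The sketch below assumes normally distributed data; the function names and settings are illustrative. The constant `c4` is the standard normal-theory value of E[s]/σ, which is not stated in the text above and is included only as a reference point for the simulation.

```python
import math
import random

def c4(n):
    """E[s] / sigma for normal samples of size n (standard normal-theory constant)."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def average_s2_and_s(n, sigma, n_trials=40_000, seed=5):
    """Average the unbiased sample variance s^2 and its square root s."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_s = 0.0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)  # mean-unbiased for sigma^2
        total_s2 += s2
        total_s += math.sqrt(s2)  # concave transform, so the bias is negative
    return total_s2 / n_trials, total_s / n_trials

mean_s2, mean_s = average_s2_and_s(n=5, sigma=1.0)
print(f"average s^2 = {mean_s2:.3f} (sigma^2 = 1)")
print(f"average s   = {mean_s:.3f} (sigma = 1, c4(5) = {c4(5):.3f})")
```

The square root is concave, so the negative bias of s agrees with the direction predicted by Jensen's inequality.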

Bias, variance and mean squared error

Main article: Bias–variance tradeoff

While bias quantifies the average difference to be expected between an estimator and an underlying parameter, an estimator based on a finite sample can additionally be expected to differ from the parameter due to the randomness in the sample. An estimator that minimises the bias will not necessarily minimise the mean square error. One measure which is used to try to reflect both types of difference is the mean square error,^[2] $\operatorname {MSE} ({\hat {\theta }})=\operatorname {E} {\big [}({\hat {\theta }}-\theta )^{2}{\big ]}.$ This can be shown to be equal to the square of the bias, plus the variance:^[2] ${\begin{aligned}\operatorname {MSE} ({\hat {\theta }})=&(\operatorname {E} [{\hat {\theta }}]-\theta )^{2}+\operatorname {E} [\,({\hat {\theta }}-\operatorname {E} [\,{\hat {\theta }}\,])^{2}\,]\\=&(\operatorname {Bias} ({\hat {\theta }},\theta ))^{2}+\operatorname {Var} ({\hat {\theta }})\end{aligned}}$

When the parameter is a vector, an analogous decomposition applies:^[15] $\operatorname {MSE} ({\hat {\theta }})=\operatorname {trace} (\operatorname {Cov} ({\hat {\theta }}))+\left\Vert \operatorname {Bias} ({\hat {\theta }},\theta )\right\Vert ^{2}$ where $\operatorname {trace} (\operatorname {Cov} ({\hat {\theta }}))$ is the trace (diagonal sum) of the covariance matrix of the estimator and $\left\Vert \operatorname {Bias} ({\hat {\theta }},\theta )\right\Vert ^{2}$ is the squared vector norm.
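The scalar decomposition also holds exactly for empirical moments, which gives a quick numerical check. The sketch below is an illustration under assumed settings: the estimates are uncorrected sample variances of simulated standard normal samples of size 5, and the function name `mse_decomposition` is hypothetical.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (MSE, bias^2, variance) computed from a list of estimates.
    The population variance (divide by len) makes the identity exact."""
    mse = statistics.fmean([(t - theta) ** 2 for t in estimates])
    bias = statistics.fmean(estimates) - theta
    variance = statistics.pvariance(estimates)
    return mse, bias ** 2, variance

# Illustrative data: uncorrected sample variances from N(0, 1) samples of size 5.
rng = random.Random(6)
n, sigma2 = 5, 1.0
estimates = []
for _ in range(5_000):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    estimates.append(sum((x - xbar) ** 2 for x in xs) / n)

mse, bias_sq, variance = mse_decomposition(estimates, sigma2)
print(f"MSE = {mse:.4f}, bias^2 + variance = {bias_sq + variance:.4f}")
```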

Example: Estimation of population variance

For example,^[16] suppose an estimator of the form

$T^{2}=c\sum _{i=1}^{n}\left(X_{i}-{\overline {X}}\,\right)^{2}=cnS^{2}$

is sought for the population variance as above, but this time to minimise the MSE:

${\begin{aligned}\operatorname {MSE} =&\operatorname {E} \left[(T^{2}-\sigma ^{2})^{2}\right]\\=&\left(\operatorname {E} \left[T^{2}-\sigma ^{2}\right]\right)^{2}+\operatorname {Var} (T^{2})\end{aligned}}$

If the variables X₁ ... X_n follow a normal distribution, then nS²/σ² has a chi-squared distribution with n − 1 degrees of freedom, giving:

$\operatorname {E} [nS^{2}]=(n-1)\sigma ^{2}{\text{ and }}\operatorname {Var} (nS^{2})=2(n-1)\sigma ^{4}.$

and so

$\operatorname {MSE} =(c(n-1)-1)^{2}\sigma ^{4}+2c^{2}(n-1)\sigma ^{4}$

With a little algebra it can be confirmed that it is c = 1/(n + 1) which minimises this combined loss function, rather than c = 1/(n − 1) which minimises just the square of the bias.
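The minimiser can also be confirmed numerically by evaluating the MSE expression above, divided by σ⁴, for different values of c. This is a minimal sketch; the function names, the grid search and the choice n = 9 are illustrative assumptions.

```python
def relative_mse(c, n):
    """MSE / sigma^4 of T^2 = c * sum((X_i - Xbar)^2) under normality."""
    return (c * (n - 1) - 1) ** 2 + 2 * c ** 2 * (n - 1)

def best_c_on_grid(n, steps=10_000):
    """Grid search for the minimising c over (0, 1]."""
    grid = (k / steps for k in range(1, steps + 1))
    return min(grid, key=lambda c: relative_mse(c, n))

n = 9  # illustrative sample size
for label, c in [("1/(n-1)", 1 / (n - 1)), ("1/n", 1 / n), ("1/(n+1)", 1 / (n + 1))]:
    print(f"c = {label}: MSE / sigma^4 = {relative_mse(c, n):.4f}")
print(f"grid minimiser: {best_c_on_grid(n):.4f}")
```

Setting the derivative of the expression with respect to c to zero gives (n − 1)(c(n − 1) − 1) + 2c(n − 1) = 0, that is, c(n + 1) = 1, which matches the grid result.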

More generally it is only in restricted classes of problems that there will be an estimator that minimises the MSE independently of the parameter values.

However, a bias–variance tradeoff is very commonly observed, such that a small increase in bias can be traded for a larger decrease in variance, resulting in a more desirable estimator overall.

Bayesian view

Most Bayesians are rather unconcerned about unbiasedness (at least in the formal sampling-theory sense above) of their estimates. For example, Gelman and coauthors (1995) write: "From a Bayesian perspective, the principle of unbiasedness is reasonable in the limit of large samples, but otherwise it is potentially misleading."^[17]

Fundamentally, the difference between the Bayesian approach and the sampling-theory approach above is that in the sampling-theory approach the parameter is taken as fixed, and then probability distributions of a statistic are considered, based on the predicted sampling distribution of the data. For a Bayesian, however, it is the data which are known, and fixed, and it is the unknown parameter for which an attempt is made to construct a probability distribution, using Bayes' theorem:

$p(\theta \mid D,I)\propto p(\theta \mid I)p(D\mid \theta ,I)$

Here the second term, the likelihood of the data given the unknown parameter value θ, depends just on the data obtained and the modelling of the data generation process. However, a Bayesian calculation also includes the first term, the prior probability for θ, which takes account of everything the analyst may know or suspect about θ before the data comes in. This information plays no part in the sampling-theory approach; indeed any attempt to include it would be considered "bias" away from what was pointed to purely by the data. To the extent that Bayesian calculations include prior information, it is therefore essentially inevitable that their results will not be "unbiased" in sampling theory terms.

But the results of a Bayesian approach can differ from the sampling theory approach even if the Bayesian tries to adopt an "uninformative" prior.

For example, consider again the estimation of an unknown population variance σ² of a Normal distribution with unknown mean, where it is desired to optimise c in the expected loss function

$\operatorname {ExpectedLoss} =\operatorname {E} \left[\left(cnS^{2}-\sigma ^{2}\right)^{2}\right]=\operatorname {E} \left[\sigma ^{4}\left(cn{\tfrac {S^{2}}{\sigma ^{2}}}-1\right)^{2}\right]$

A standard choice of uninformative prior for this problem is the Jeffreys prior, $\scriptstyle {p(\sigma ^{2})\;\propto \;1/\sigma ^{2}}$, which is equivalent to adopting a rescaling-invariant flat prior for ln(σ²).

One consequence of adopting this prior is that S²/σ² remains a pivotal quantity, i.e. the probability distribution of S²/σ² depends only on S²/σ², independent of the value of S² or σ²:

$p\left({\tfrac {S^{2}}{\sigma ^{2}}}\mid S^{2}\right)=p\left({\tfrac {S^{2}}{\sigma ^{2}}}\mid \sigma ^{2}\right)=g\left({\tfrac {S^{2}}{\sigma ^{2}}}\right)$

However, while

$\operatorname {E} _{p(S^{2}\mid \sigma ^{2})}\left[\sigma ^{4}\left(cn{\tfrac {S^{2}}{\sigma ^{2}}}-1\right)^{2}\right]=\sigma ^{4}\operatorname {E} _{p(S^{2}\mid \sigma ^{2})}\left[\left(cn{\tfrac {S^{2}}{\sigma ^{2}}}-1\right)^{2}\right]$

in contrast

$\operatorname {E} _{p(\sigma ^{2}\mid S^{2})}\left[\sigma ^{4}\left(cn{\tfrac {S^{2}}{\sigma ^{2}}}-1\right)^{2}\right]\neq \sigma ^{4}\operatorname {E} _{p(\sigma ^{2}\mid S^{2})}\left[\left(cn{\tfrac {S^{2}}{\sigma ^{2}}}-1\right)^{2}\right]$

When the expectation is taken over the probability distribution of σ² given S², as it is in the Bayesian case, rather than S² given σ², one can no longer take σ⁴ as a constant and factor it out. The consequence of this is that, compared to the sampling-theory calculation, the Bayesian calculation puts more weight on larger values of σ², properly taking into account (as the sampling-theory calculation cannot) that under this squared-loss function the consequence of underestimating large values of σ² is more costly in squared-loss terms than that of overestimating small values of σ².

The worked-out Bayesian calculation gives a scaled inverse chi-squared distribution with n − 1 degrees of freedom for the posterior probability distribution of σ². The expected loss is minimised when cnS² = <σ²>, the posterior mean of σ²; this occurs when c = 1/(n − 3).
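The value c = 1/(n − 3) can be checked by Monte Carlo sampling from the posterior. The sketch below assumes the posterior described above, under which nS²/σ² given S² follows a chi-squared distribution with n − 1 degrees of freedom; the function name and the values n = 12 and S² = 2 are illustrative. A chi-squared draw with ν degrees of freedom is generated as a gamma draw with shape ν/2 and scale 2.

```python
import random
import statistics

def posterior_mean_sigma2(n, s2, n_draws=200_000, seed=3):
    """Monte Carlo estimate of E[sigma^2 | S^2] under the prior p(sigma^2)
    proportional to 1/sigma^2, using sigma^2 = n * S^2 / (chi-squared draw)."""
    rng = random.Random(seed)
    nu = n - 1  # posterior degrees of freedom
    draws = [n * s2 / rng.gammavariate(nu / 2, 2.0) for _ in range(n_draws)]
    return statistics.fmean(draws)

n, s2 = 12, 2.0  # illustrative sample size and observed uncorrected variance
post_mean = posterior_mean_sigma2(n, s2)
implied_c = post_mean / (n * s2)  # c solving c * n * S^2 = posterior mean

print(f"implied c = {implied_c:.4f}, 1/(n - 3) = {1 / (n - 3):.4f}")
```

The posterior mean exists only for n > 3, and the Monte Carlo average is stable only when the posterior variance is finite (n > 5), which is why a moderately large n is used here.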

Even with an uninformative prior, therefore, a Bayesian calculation may not give the same expected-loss minimising result as the corresponding sampling-theory calculation.

Notes

[1] "For the binomial distribution, why does no unbiased estimator exist for $1/p$?". Mathematics Stack Exchange. Retrieved 2023-12-27.
[2] Kozdron, Michael (March 2016). "Evaluating the Goodness of an Estimator: Bias, Mean-Square Error, Relative Efficiency (Chapter 3)" (PDF). stat.math.uregina.ca. Retrieved 2020-09-11.
[3] Taylor, Courtney (January 13, 2019). "Unbiased and Biased Estimators". ThoughtCo. Retrieved 2020-09-12.
[4] Dekking, Michel, ed. (2005). A modern introduction to probability and statistics: understanding why and how. Springer texts in statistics. London [Heidelberg]: Springer. ISBN 978-1-85233-896-1.
[5] Richard Arnold Johnson; Dean W. Wichern (2007). Applied Multivariate Statistical Analysis. Pearson Prentice Hall. ISBN 978-0-13-187715-3. Retrieved 10 August 2012.
[6] Romano, J. P.; Siegel, A. F. (1986). Counterexamples in Probability and Statistics. Monterey, California, USA: Wadsworth & Brooks / Cole. p. 168.
[7] Hardy, M. (1 March 2003). "An Illuminating Counterexample". American Mathematical Monthly. 110 (3): 234–238. arXiv:math/0206006. doi:10.2307/3647938. ISSN 0002-9890. JSTOR 3647938.
[8] Brown (1947), page 583.
[9] Lehmann 1951; Birnbaum 1961; Van der Vaart 1961; Pfanzagl 1994.
[10] Pfanzagl, Johann (1979). "On optimal median unbiased estimators in the presence of nuisance parameters". The Annals of Statistics. 7 (1): 187–193. doi:10.1214/aos/1176344563.
[11] Brown, L. D.; Cohen, Arthur; Strawderman, W. E. (1976). "A Complete Class Theorem for Strict Monotone Likelihood Ratio With Applications". Ann. Statist. 4 (4): 712–722. doi:10.1214/aos/1176343543.
[12] Dodge, Yadolah, ed. (1987). Statistical Data Analysis Based on the L₁-Norm and Related Methods. Papers from the First International Conference held at Neuchâtel, August 31–September 4, 1987. Amsterdam: North-Holland. ISBN 0-444-70273-3.
[13] Jaynes, E. T. (2007). Probability Theory: The Logic of Science. Cambridge: Cambridge Univ. Press. p. 172. ISBN 978-0-521-59271-0.
[14] Klebanov, Lev B.; Rachev, Svetlozar T.; Fabozzi, Frank J. (2009). "Loss Functions and the Theory of Unbiased Estimation". Robust and Non-Robust Models in Statistics. New York: Nova Scientific. ISBN 978-1-60741-768-2.
[15] Taboga, Marco (2010). "Lectures on probability theory and mathematical statistics".
[16] DeGroot, Morris H. (1986). Probability and Statistics (2nd ed.). Addison-Wesley. pp. 414–5. ISBN 0-201-11366-X. But compare it with, for example, the discussion in Casella; Berger (2001). Statistical Inference (2nd ed.). Duxbury. p. 332. ISBN 0-534-24312-6.
[17] Gelman, A.; et al. (1995). Bayesian Data Analysis. Chapman and Hall. p. 108. ISBN 0-412-03991-5.

References

Brown, George W. "On Small-Sample Estimation." The Annals of Mathematical Statistics, vol. 18, no. 4 (Dec., 1947), pp. 582–585. JSTOR 2236236.
Lehmann, E. L. (December 1951). "A General Concept of Unbiasedness". The Annals of Mathematical Statistics. 22 (4): 587–592. doi:10.1214/aoms/1177729549. JSTOR 2236928.
Birnbaum, Allan (March 1961). "A Unified Theory of Estimation, I". The Annals of Mathematical Statistics. 32 (1): 112–135. doi:10.1214/aoms/1177705145.
Van der Vaart, H. R. (June 1961). "Some Extensions of the Idea of Bias". The Annals of Mathematical Statistics. 32 (2): 436–447. doi:10.1214/aoms/1177705051.
Pfanzagl, Johann (1994). Parametric Statistical Theory. Walter de Gruyter.
Stuart, Alan; Ord, Keith; Arnold, Steven [F.] (2010). Classical Inference and the Linear Model. Kendall's Advanced Theory of Statistics. Vol. 2A. Wiley. ISBN 978-0-4706-8924-0.
Voinov, Vassily [G.]; Nikulin, Mikhail [S.] (1993). Unbiased estimators and their applications. Vol. 1: Univariate case. Dordrecht: Kluwer Academic Publishers. ISBN 0-7923-2382-3.
Voinov, Vassily [G.]; Nikulin, Mikhail [S.] (1996). Unbiased estimators and their applications. Vol. 2: Multivariate case. Dordrecht: Kluwer Academic Publishers. ISBN 0-7923-3939-8.
Klebanov, Lev [B.]; Rachev, Svetlozar [T.]; Fabozzi, Frank [J.] (2009). Robust and Non-Robust Models in Statistics. New York: Nova Scientific Publishers. ISBN 978-1-60741-768-2.

External links

"Unbiased estimator", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
