Bayes error rate

Error rate in statistical mathematics

In statistical classification, the Bayes error rate is the lowest possible error rate for any classifier of a random outcome (into, for example, one of two categories) and is analogous to the irreducible error.[1][2]

A number of approaches to the estimation of the Bayes error rate exist. One method seeks to obtain analytical bounds which are inherently dependent on distribution parameters, and hence difficult to estimate. Another approach focuses on class densities, while yet another method combines and compares various classifiers.[2]

The Bayes error rate finds important use in the study of patterns and machine learning techniques.[3]

Definition


Mohri, Rostamizadeh, and Talwalkar define it as follows:[4]

Given a distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$, the Bayes error $R^*$ is defined as the infimum of the errors achieved by measurable functions $h : \mathcal{X} \to \mathcal{Y}$:

$$R^* = \inf_{h \,:\, h \text{ measurable}} R(h)$$

A hypothesis $h$ with $R(h) = R^*$ is called a Bayes hypothesis or Bayes classifier.
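For intuition in the binary case, the definition reduces to a standard closed form (a short derivation under the usual notation $\eta(x) = P(Y = 1 \mid X = x)$; this identity is not spelled out in the cited definition, but it follows directly from it):

$$R^* = \mathbb{E}_X\big[\min\{\eta(X),\, 1 - \eta(X)\}\big],$$

since at each $x$ the best any measurable $h$ can do is predict the more probable label, leaving the posterior mass of the other label as unavoidable error.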

Error determination


In terms of machine learning and pattern classification, the labels of a set of random observations can be divided into two or more classes. Each observation is called an instance, and the class it belongs to is the label. The Bayes error rate of the data distribution is the probability that an instance is misclassified by a classifier that knows the true class probabilities given the predictors.

For a multiclass classifier, the expected prediction error (EPE) may be calculated as follows:[3]

$$\mathrm{EPE} = E_x\left[\sum_{k=1}^{K} L\big(C_k, \hat{C}(x)\big)\, P(C_k \mid x)\right]$$

where $x$ is the instance, $E[\cdot]$ the expectation value, $C_k$ is a class into which an instance is classified, $P(C_k \mid x)$ is the conditional probability of label $k$ for instance $x$, and $L(\cdot)$ is the 0–1 loss function:

$$L(x, y) = 1 - \delta_{x,y} = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y, \end{cases}$$

where $\delta_{x,y}$ is the Kronecker delta.
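As a concrete illustration, here is a minimal sketch of this computation for a discrete feature space with known posteriors; the probabilities and names (`p_x`, `posterior`) are hypothetical, chosen only for the example:

```python
import numpy as np

# Hypothetical discrete setup: 3 feature values, K = 2 classes.
# p_x[i] is P(X = x_i); posterior[i, k] is P(C_k | x_i).
p_x = np.array([0.5, 0.3, 0.2])
posterior = np.array([[0.9, 0.1],
                      [0.6, 0.4],
                      [0.2, 0.8]])

def expected_prediction_error(classifier):
    """EPE under 0-1 loss: E_x[ sum_k L(C_k, C_hat(x)) P(C_k | x) ].

    `classifier` maps a feature index i to a predicted class index.
    """
    epe = 0.0
    for i, px in enumerate(p_x):
        c_hat = classifier(i)
        # The 0-1 loss sums the posterior mass of every class except the prediction.
        epe += px * (1.0 - posterior[i, c_hat])
    return epe

# A deliberately suboptimal rule that always predicts class 0.
print(expected_prediction_error(lambda i: 0))  # 0.5*0.1 + 0.3*0.4 + 0.2*0.8 = 0.33
```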

When the learner knows the conditional probabilities, one solution is:

$$\hat{C}_B(x) = \arg\max_{k \in \{1, \dots, K\}} P(C_k \mid X = x)$$

This solution is known as the Bayes classifier.

The corresponding expected prediction error is called the Bayes error rate:

$$BE = E_x\left[\sum_{k=1}^{K} L\big(C_k, \hat{C}_B(x)\big)\, P(C_k \mid x)\right] = E_x\left[\sum_{k=1,\ C_k \neq \hat{C}_B(x)}^{K} P(C_k \mid x)\right] = E_x\left[1 - P\big(\hat{C}_B(x) \mid x\big)\right],$$

where the sum can be dropped in the last step because the remaining terms are exactly the complement of the predicted class's posterior. By the definition of the Bayes classifier, it maximizes $P(\hat{C}_B(x) \mid x)$ and therefore minimizes the Bayes error BE.
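Continuing the hypothetical discrete example above, a sketch of the Bayes classifier and its error rate (again with made-up probabilities, restated so the snippet runs on its own):

```python
import numpy as np

# Same hypothetical setup: P(X = x_i) and P(C_k | x_i) for 3 feature values, 2 classes.
p_x = np.array([0.5, 0.3, 0.2])
posterior = np.array([[0.9, 0.1],
                      [0.6, 0.4],
                      [0.2, 0.8]])

# Bayes classifier: predict the class with the largest posterior at each x.
bayes_pred = posterior.argmax(axis=1)  # -> [0, 0, 1]

# Bayes error rate: BE = E_x[1 - P(C_hat_B(x) | x)] = E_x[1 - max_k P(C_k | x)].
bayes_error = np.sum(p_x * (1.0 - posterior.max(axis=1)))
print(bayes_error)  # 0.5*0.1 + 0.3*0.4 + 0.2*0.2 = 0.21, below the 0.33 of the naive rule
```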

The Bayes error is non-zero if the classification labels are not deterministic, i.e., there is a non-zero probability of a given instance belonging to more than one class.[4] In a regression context with squared error, the Bayes error is equal to the noise variance.[3]
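The regression statement can be checked directly (a standard argument, assuming the additive-noise model $Y = f(X) + \varepsilon$ with $E[\varepsilon \mid X] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$; the model is an assumption here, not taken from the cited source):

$$E\big[(Y - \hat{f}(X))^2 \mid X = x\big] = \sigma^2 + \big(f(x) - \hat{f}(x)\big)^2 \;\geq\; \sigma^2,$$

with equality when $\hat{f} = f$, so the irreducible (Bayes) error under squared loss is the noise variance $\sigma^2$.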

Proof of Minimality


A proof that the Bayes error rate is indeed the minimum possible, and that the Bayes classifier is therefore optimal, can be found on the Wikipedia page Bayes classifier.

Plug-in Rules for Binary Classifiers


A plug-in rule uses an estimate of the posterior probability $\eta$ to form a classification rule. Given an estimate $\tilde{\eta}$, the excess Bayes error rate of the associated classifier is bounded above by:

$$2\,\mathbb{E}\big[\,\lvert \eta(X) - \tilde{\eta}(X) \rvert\,\big].$$

To see this, note that the excess Bayes error equals 0 wherever the plug-in classifier agrees with the Bayes classifier, and equals $2\lvert \eta(X) - 1/2 \rvert$ where they disagree. To obtain the bound, observe that when the classifiers disagree, $\tilde{\eta}(X)$ lies on the opposite side of $1/2$ from $\eta(X)$, so $\lvert \eta(X) - \tilde{\eta}(X) \rvert \geq \lvert \eta(X) - 1/2 \rvert$.
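A quick numerical check of this bound (a sketch under assumed forms for $\eta$ and $\tilde{\eta}$; the uniform design and the particular functions below are illustrative, not from the cited sources):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100_000)      # assume X ~ Uniform(0, 1)

eta = 0.5 + 0.4 * np.sin(4 * np.pi * x)      # illustrative true posterior P(Y=1 | X=x)
eta_tilde = np.clip(eta + rng.normal(0.0, 0.1, x.size), 0.0, 1.0)  # noisy plug-in estimate

# Points where the plug-in classifier disagrees with the Bayes classifier.
disagree = (eta >= 0.5) != (eta_tilde >= 0.5)

# Exact pointwise excess error: 2|eta - 1/2| on disagreement, 0 elsewhere.
excess = np.mean(np.where(disagree, 2 * np.abs(eta - 0.5), 0.0))
bound = 2 * np.mean(np.abs(eta - eta_tilde))

print(excess <= bound, excess, bound)        # the excess error never exceeds the bound
```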


References

  1. Fukunaga, Keinosuke (1990). Introduction to Statistical Pattern Recognition. pp. 3, 97. ISBN 0122698517.
  2. Tumer, K. (1996). "Estimating the Bayes error rate through classifier combining". Proceedings of the 13th International Conference on Pattern Recognition. Volume 2, pp. 695–699.
  3. Hastie, Trevor (2009). The Elements of Statistical Learning (2nd ed.). Springer. p. 21. ISBN 978-0387848570.
  4. Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2018). Foundations of Machine Learning (2nd ed.). p. 22.

