Admissible decision rule

From Wikipedia, the free encyclopedia
Type of "good" decision rule in Bayesian statistics
In statistical decision theory, an admissible decision rule is a rule for making a decision such that there is no other rule that is always "better" than it[1] (or at least sometimes better and never worse), in the precise sense of "better" defined below. This concept is analogous to Pareto efficiency.

Definition


Define sets $\Theta$, $\mathcal{X}$ and $\mathcal{A}$, where $\Theta$ are the states of nature, $\mathcal{X}$ the possible observations, and $\mathcal{A}$ the actions that may be taken. An observation $x \in \mathcal{X}$ is distributed as $F(x \mid \theta)$ and therefore provides evidence about the state of nature $\theta \in \Theta$. A decision rule is a function $\delta : \mathcal{X} \rightarrow \mathcal{A}$, where upon observing $x \in \mathcal{X}$, we choose to take action $\delta(x) \in \mathcal{A}$.

Also define a loss function $L : \Theta \times \mathcal{A} \rightarrow \mathbb{R}$, which specifies the loss we would incur by taking action $a \in \mathcal{A}$ when the true state of nature is $\theta \in \Theta$. Usually we will take this action after observing data $x \in \mathcal{X}$, so that the loss will be $L(\theta, \delta(x))$. (It is possible, though unconventional, to recast the following definitions in terms of a utility function, which is the negative of the loss.)

Define the risk function as the expectation

$$R(\theta, \delta) = \operatorname{E}_{F(x \mid \theta)}[L(\theta, \delta(x))].$$

Whether a decision rule $\delta$ has low risk depends on the true state of nature $\theta$. A decision rule $\delta^*$ dominates a decision rule $\delta$ if and only if $R(\theta, \delta^*) \leq R(\theta, \delta)$ for all $\theta$, and the inequality is strict for some $\theta$.

A decision rule is admissible (with respect to the loss function) if and only if no other rule dominates it; otherwise it is inadmissible. Thus an admissible decision rule is a maximal element with respect to the above partial order. An inadmissible rule is not preferred (except for reasons of simplicity or computational efficiency), since by definition some other rule achieves equal or lower risk for all $\theta$. But just because a rule $\delta$ is admissible does not mean it is a good rule to use. Being admissible means only that no other single rule is always as good or better; other admissible rules might still achieve lower risk for most $\theta$ that occur in practice. (The Bayes risk discussed below is a way of explicitly considering which $\theta$ occur in practice.)
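
A small numerical sketch can make the dominance relation concrete (the model, rules, and function names below are illustrative choices, not from the article): for a single observation $X \sim N(\theta, 1)$ under squared-error loss, the rule $\delta^*(x) = x$ has constant risk 1, while the biased rule $\delta(x) = x + 1$ has constant risk 2, so $\delta^*$ dominates $\delta$ and $\delta$ is inadmissible.

```python
# Illustrative sketch: Monte Carlo estimate of the risk function
# R(theta, delta) = E_{F(x|theta)}[L(theta, delta(x))] for the model
# X ~ N(theta, 1) with squared-error loss L(theta, a) = (theta - a)^2.
import random

def risk(rule, theta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of R(theta, delta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(theta, 1.0)          # X ~ F(x | theta) = N(theta, 1)
        total += (theta - rule(x)) ** 2    # squared-error loss
    return total / n_samples

delta_star = lambda x: x        # unbiased: risk is 1 for every theta
delta_bad  = lambda x: x + 1.0  # biased: risk is 1 + 1 = 2 for every theta

# delta_star achieves strictly lower risk at every state of nature,
# so it dominates delta_bad, making delta_bad inadmissible.
for theta in (-2.0, 0.0, 3.0):
    assert risk(delta_star, theta) < risk(delta_bad, theta)
```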

Bayes rules and generalized Bayes rules

See also: Bayes estimator § Admissibility

Bayes rules


Let $\pi(\theta)$ be a probability distribution on the states of nature. From a Bayesian point of view, we would regard it as a prior distribution. That is, it is our believed probability distribution on the states of nature, prior to observing data. For a frequentist, it is merely a function on $\Theta$ with no such special interpretation. The Bayes risk of the decision rule $\delta$ with respect to $\pi(\theta)$ is the expectation

$$r(\pi, \delta) = \operatorname{E}_{\pi(\theta)}[R(\theta, \delta)].$$

A decision rule $\delta$ that minimizes $r(\pi, \delta)$ is called a Bayes rule with respect to $\pi(\theta)$. There may be more than one such Bayes rule. If the Bayes risk is infinite for all $\delta$, then no Bayes rule is defined.
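
The Bayes risk can also be estimated numerically (a hedged sketch; the conjugate model and names are my own choices, not the article's): take the prior $\theta \sim N(0, 1)$, the observation $X \mid \theta \sim N(\theta, 1)$, and squared-error loss. Among the linear rules $\delta_c(x) = cx$, standard algebra gives $r(\pi, \delta_c) = (1-c)^2 + c^2$, minimized at $c = 1/2$, which is exactly the posterior mean, i.e. the Bayes rule for this model.

```python
# Illustrative sketch: Monte Carlo estimate of the Bayes risk
# r(pi, delta) = E_pi[R(theta, delta)] by sampling theta from the prior
# and then X from F(x | theta).
import random

def bayes_risk(rule, n_samples=100_000, seed=0):
    """Monte Carlo estimate of r(pi, delta) for squared-error loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, 1.0)   # theta ~ pi = N(0, 1)
        x = rng.gauss(theta, 1.0)     # X ~ F(x | theta) = N(theta, 1)
        total += (theta - rule(x)) ** 2
    return total / n_samples

# r(pi, delta_c) = (1-c)^2 + c^2: equals 1 at c = 0 and c = 1,
# and attains its minimum 0.5 at c = 0.5 (the posterior-mean rule).
for c in (0.0, 0.5, 1.0):
    print(c, round(bayes_risk(lambda x, c=c: c * x), 3))
```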

Generalized Bayes rules

See also: Bayes estimator § Generalized Bayes estimators

In the Bayesian approach to decision theory, the observed $x$ is considered fixed. Whereas the frequentist approach (i.e., risk) averages over possible samples $x \in \mathcal{X}$, the Bayesian would fix the observed sample $x$ and average over hypotheses $\theta \in \Theta$. Thus, the Bayesian approach is to consider, for our observed $x$, the expected loss

$$\rho(\pi, \delta \mid x) = \operatorname{E}_{\pi(\theta \mid x)}[L(\theta, \delta(x))],$$

where the expectation is over the posterior of $\theta$ given $x$ (obtained from $\pi(\theta)$ and $F(x \mid \theta)$ using Bayes' theorem).

Having made explicit the expected loss for each given $x$ separately, we can define a decision rule $\delta$ by specifying for each $x$ an action $\delta(x)$ that minimizes the expected loss. This is known as a generalized Bayes rule with respect to $\pi(\theta)$. There may be more than one generalized Bayes rule, since there may be multiple choices of $\delta(x)$ that achieve the same expected loss.
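
This pointwise construction can be sketched directly (a grid-based illustration of my own, not from the article): discretize $\Theta$, compute the posterior $\pi(\theta \mid x)$ by Bayes' theorem, and pick, for each $x$ separately, the action minimizing the posterior expected loss. Under squared-error loss the minimizer is the posterior mean, which for a flat prior on the real line is approximately $x$ itself.

```python
# Illustrative sketch: a generalized Bayes rule on a grid over Theta,
# with a flat prior, X | theta ~ N(theta, 1), and squared-error loss.
import math

thetas = [i / 10 for i in range(-50, 51)]   # grid approximation of Theta
prior = [1.0] * len(thetas)                 # flat (unnormalized) prior

def posterior(x):
    """pi(theta | x) on the grid, via Bayes' theorem."""
    w = [p * math.exp(-0.5 * (x - t) ** 2) for p, t in zip(prior, thetas)]
    z = sum(w)
    return [wi / z for wi in w]

def generalized_bayes_action(x, actions):
    """For this observed x, the action minimizing rho(pi, delta | x)."""
    post = posterior(x)
    def expected_loss(a):
        # posterior expected squared-error loss of action a
        return sum(p_t * (t - a) ** 2 for p_t, t in zip(post, thetas))
    return min(actions, key=expected_loss)

# With a flat prior and squared-error loss, the chosen action tracks x.
print(generalized_bayes_action(1.2, thetas))
```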

At first, this may appear rather different from the Bayes rule approach of the previous section, not a generalization. However, notice that the Bayes risk already averages over $\Theta$ in Bayesian fashion, and the Bayes risk may be recovered as the expectation over $\mathcal{X}$ of the expected loss (where $x \sim \theta$ and $\theta \sim \pi$). Roughly speaking, $\delta$ minimizes this expectation of expected loss (i.e., is a Bayes rule) if and only if it minimizes the expected loss for each $x \in \mathcal{X}$ separately (i.e., is a generalized Bayes rule).

Then why is the notion of generalized Bayes rule an improvement? It is indeed equivalent to the notion of Bayes rule when a Bayes rule exists and all $x$ have positive probability. However, no Bayes rule exists if the Bayes risk is infinite (for all $\delta$). In this case it is still useful to define a generalized Bayes rule $\delta$, which at least chooses a minimum-expected-loss action $\delta(x)$ for those $x$ for which a finite-expected-loss action does exist. In addition, a generalized Bayes rule may be desirable because it must choose a minimum-expected-loss action $\delta(x)$ for every $x$, whereas a Bayes rule would be allowed to deviate from this policy on a set $X \subseteq \mathcal{X}$ of measure 0 without affecting the Bayes risk.

More importantly, it is sometimes convenient to use an improper prior $\pi(\theta)$. In this case, the Bayes risk is not even well-defined, nor is there any well-defined distribution over $x$. However, the posterior $\pi(\theta \mid x)$, and hence the expected loss, may be well-defined for each $x$, so that it is still possible to define a generalized Bayes rule.
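
As a standard illustration (a worked example of my own, not taken from the article), consider $X \mid \theta \sim N(\theta, 1)$ with the improper flat prior $\pi(\theta) \propto 1$:

```latex
% Posterior by Bayes' theorem (flat prior, normal likelihood):
\pi(\theta \mid x) \;\propto\; \pi(\theta)\, F(x \mid \theta)
                   \;\propto\; e^{-(x-\theta)^2/2},
\qquad \text{so } \theta \mid x \sim N(x, 1).
% Under squared-error loss, the posterior expected loss
% E[(\theta - a)^2 \mid x] is minimized at the posterior mean:
\delta(x) = \operatorname{E}[\theta \mid x] = x.
```

No Bayes rule is defined here, since the prior is improper and there is no well-defined marginal distribution over $x$, yet the generalized Bayes rule $\delta(x) = x$ is well-defined for every $x$.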

Admissibility of (generalized) Bayes rules


According to the complete class theorems, under mild conditions every admissible rule is a (generalized) Bayes rule (with respect to some prior $\pi(\theta)$, possibly an improper one, that favors distributions $\theta$ where that rule achieves low risk). Thus, in frequentist decision theory it is sufficient to consider only (generalized) Bayes rules.

Conversely, while Bayes rules with respect to proper priors are virtually always admissible, generalized Bayes rules corresponding to improper priors need not yield admissible procedures. Stein's example is one such famous situation.

Examples


The James–Stein estimator is a nonlinear estimator of the mean of Gaussian random vectors and can be shown to dominate the ordinary least squares technique with respect to a mean-squared-error loss function.[2] Thus least-squares estimation is not an admissible estimation procedure in this context. Some other standard estimators associated with the normal distribution are also inadmissible: for example, the sample estimate of the variance when the population mean and variance are unknown.[3]
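
The dominance result can be checked by simulation (a Monte Carlo sketch; the dimension, true mean, and sample counts below are arbitrary choices of mine): in dimension $d \geq 3$, the James–Stein estimator $\left(1 - \frac{d-2}{\lVert x \rVert^2}\right) x$ achieves strictly smaller total squared-error risk than the least-squares estimate $\delta(x) = x$ for the mean of $X \sim N(\theta, I_d)$.

```python
# Monte Carlo comparison of the James-Stein estimator against the
# least-squares/maximum-likelihood estimate delta(x) = x, for the mean
# of a d-dimensional Gaussian with identity covariance, d = 10.
import random

def js(x):
    """James-Stein shrinkage estimate (1 - (d-2)/||x||^2) * x."""
    d = len(x)
    s = sum(v * v for v in x)
    shrink = 1.0 - (d - 2) / s
    return [shrink * v for v in x]

def mc_risk(rule, theta, n=20_000, seed=0):
    """Monte Carlo estimate of the total squared-error risk at theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = [rng.gauss(t, 1.0) for t in theta]
        est = rule(x)
        total += sum((t - e) ** 2 for t, e in zip(theta, est))
    return total / n

theta = [1.0] * 10                  # an arbitrary true mean vector, d = 10
print(mc_risk(lambda x: x, theta))  # least squares: risk close to d = 10
print(mc_risk(js, theta))           # James-Stein: strictly smaller
```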

Notes

  1. ^ Dodge, Y. (2003). The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9 (entry for admissible decision function)
  2. ^Cox & Hinkley 1974, Section 11.8
  3. ^Cox & Hinkley 1974, Exercise 11.7

References

Cox, D. R.; Hinkley, D. V. (1974). Theoretical Statistics. Chapman & Hall.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Admissible_decision_rule&oldid=1191551405"