Monotone likelihood ratio

From Wikipedia, the free encyclopedia
Statistical property
[Figure: A monotonic likelihood ratio in distributions f(x) and g(x). The ratio of the density functions is monotone in the parameter x, so f(x)/g(x) satisfies the monotone likelihood ratio property.]

In statistics, the monotone likelihood ratio property is a property of the ratio of two probability density functions (PDFs). Formally, distributions f(x) and g(x) bear the property if

\text{for every } x_2 > x_1, \quad \frac{f(x_2)}{g(x_2)} \geq \frac{f(x_1)}{g(x_1)}

that is, if the ratio is nondecreasing in the argument x.

If the functions are first-differentiable, the property may sometimes be stated

\frac{\partial}{\partial x}\left( \frac{f(x)}{g(x)} \right) \geq 0

For two distributions that satisfy the definition with respect to some argument x, we say they "have the MLRP in x." For a family of distributions that all satisfy the definition with respect to some statistic T(X), we say they "have the MLR in T(X)."
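As a quick numerical illustration (not from the article), the definition can be checked for two hypothetical normal densities, f = N(1, 1) and g = N(0, 1), whose ratio works out to exp(x − 1/2) and is therefore increasing in x:

```python
from statistics import NormalDist

# Hypothetical pair of densities (assumptions for illustration):
f = NormalDist(1.0, 1.0)  # N(1, 1)
g = NormalDist(0.0, 1.0)  # N(0, 1)

def likelihood_ratio(x):
    """f(x)/g(x); here it equals exp(x - 0.5)."""
    return f.pdf(x) / g.pdf(x)

# The ratio is nondecreasing in x, so (f, g) have the MLRP in x:
xs = [i / 10 for i in range(-30, 31)]
ratios = [likelihood_ratio(x) for x in xs]
assert all(r1 < r2 for r1, r2 in zip(ratios, ratios[1:]))
```

The choice of means 1 and 0 is arbitrary; any two normals with equal variance and ordered means satisfy the MLRP in x the same way.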

Intuition


The MLRP is used to represent a data-generating process that enjoys a straightforward relationship between the magnitude of some observed variable and the distribution it draws from. If f(x) satisfies the MLRP with respect to g(x), then the higher the observed value x, the more likely it was drawn from distribution f rather than g. As usual for monotonic relationships, the likelihood ratio's monotonicity comes in handy in statistics, particularly when using maximum-likelihood estimation. Distribution families with MLR also have a number of well-behaved stochastic properties, such as first-order stochastic dominance and increasing hazard ratios. Unfortunately, as is also usual, the strength of this assumption comes at the price of realism: many processes in the world do not exhibit a monotonic correspondence between input and output.

Example: Working hard or slacking off


Suppose you are working on a project, and you can either work hard or slack off. Call your choice of effort e and the quality of the resulting project q. If the MLRP holds for the distribution of q conditional on your effort e, then the higher the quality, the more likely you worked hard; conversely, the lower the quality, the more likely you slacked off.

1: Choose effort e ∈ {H, L}, where H means high effort and L means low effort.
2: Observe q drawn from f(q | e). By Bayes' law with a uniform prior,

\operatorname{P}\bigl[\, e = H \mid q \,\bigr] = \frac{f(q \mid H)}{f(q \mid H) + f(q \mid L)}

3: Suppose f(q | e) satisfies the MLRP. Rearranging, the probability the worker worked hard is

\frac{1}{1 + f(q \mid L)/f(q \mid H)}

which, thanks to the MLRP, is monotonically increasing in q (because f(q | L)/f(q | H) is decreasing in q).

Hence an employer conducting a "performance review" can infer the employee's behavior from the quality of the work.
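The three steps above can be sketched numerically. The output distributions below are assumptions chosen for illustration: quality q ~ N(2, 1) under high effort and q ~ N(0, 1) under low effort, which satisfy the MLRP, so the posterior probability of high effort is increasing in observed quality:

```python
from statistics import NormalDist

# Hypothetical conditional distributions of quality given effort:
f_H = NormalDist(2.0, 1.0)  # q | e = H  (assumption)
f_L = NormalDist(0.0, 1.0)  # q | e = L  (assumption)

def posterior_high_effort(q):
    """P[e = H | q] under a uniform prior, by Bayes' law."""
    return f_H.pdf(q) / (f_H.pdf(q) + f_L.pdf(q))

# MLRP of f_H vs. f_L makes the posterior monotonically increasing in q:
qs = [i / 4 for i in range(-8, 17)]
posts = [posterior_high_effort(q) for q in qs]
assert all(p1 < p2 for p1, p2 in zip(posts, posts[1:]))
```

At the midpoint q = 1 the two densities are equal, so the posterior is exactly 1/2; higher quality pushes it toward 1, lower quality toward 0.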

Families of distributions satisfying MLR


Statistical models often assume that data are generated by a distribution from some family of distributions and seek to determine that distribution. This task is simplified if the family has the monotone likelihood ratio property (MLRP).

A family of density functions { f_θ(x) | θ ∈ Θ } indexed by a parameter θ taking values in an ordered set Θ is said to have a monotone likelihood ratio (MLR) in the statistic T(X) if for any θ_1 < θ_2,

\frac{f_{\theta_2}(X = x_1, x_2, x_3, \ldots)}{f_{\theta_1}(X = x_1, x_2, x_3, \ldots)} \quad \text{is a non-decreasing function of } T(X).

Then we say the family of distributions "has MLR in T(X)".

List of families

Family | T(X) in which f_θ(X) has the MLR
Exponential[λ] | Σ x_i observations
Binomial[n, p] | Σ x_i observations
Poisson[λ] | Σ x_i observations
Normal[μ, σ] | if σ known, Σ x_i observations
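As an illustrative sketch (not from the article), the Poisson row can be checked numerically: for a fixed sample size, the likelihood ratio between two parameter values depends on the data only through T(X) = Σ x_i and increases with it. The rates 1.0 and 2.0 and the samples below are arbitrary assumptions:

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability mass function."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def sample_likelihood_ratio(sample, lam1, lam2):
    """Joint likelihood ratio f_{lam2}(sample) / f_{lam1}(sample)."""
    num = math.prod(poisson_pmf(x, lam2) for x in sample)
    den = math.prod(poisson_pmf(x, lam1) for x in sample)
    return num / den

# Two hypothetical samples of size 3 with different sums T = sum(x_i):
lo = sample_likelihood_ratio([1, 2, 0], 1.0, 2.0)  # T = 3
hi = sample_likelihood_ratio([3, 2, 2], 1.0, 2.0)  # T = 7
assert hi > lo  # the ratio grows with T(X)

# A different sample with the same sum gives the same ratio,
# confirming dependence on the data only through T(X):
assert math.isclose(lo, sample_likelihood_ratio([3, 0, 0], 1.0, 2.0))
```

In closed form the ratio is exp(−n(λ₂ − λ₁)) (λ₂/λ₁)^T, which makes the monotonicity in T explicit.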

Hypothesis testing


If the family of random variables has the MLRP in T(X), a uniformly most powerful test can easily be determined for the hypothesis H_0 : θ ≤ θ_0 versus H_1 : θ > θ_0.

Example: Effort and output


Example: Let e be an input into a stochastic technology – a worker's effort, for instance – and y its output, the likelihood of which is described by a probability density function f(y; e). Then the monotone likelihood ratio property (MLRP) of the family f is expressed as follows: for any e_1, e_2, the fact that e_2 > e_1 implies that the ratio f(y; e_2)/f(y; e_1) is increasing in y.

Relation to other statistical properties


Monotone likelihoods are used in several areas of statistical theory, including point estimation and hypothesis testing, as well as in probability models.

Exponential families


One-parameter exponential families have monotone likelihood functions. In particular, the one-dimensional exponential family of probability density functions or probability mass functions with

f_\theta(x) = c(\theta)\, h(x)\, \exp\bigl( \pi(\theta)\, T(x) \bigr)

has a monotone non-decreasing likelihood ratio in the sufficient statistic T(x), provided that π(θ) is non-decreasing.
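The claim follows directly from the functional form: for θ_2 > θ_1 the factor h(x) cancels and the likelihood ratio reduces to

```latex
\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)}
  = \frac{c(\theta_2)}{c(\theta_1)}
    \exp\!\Bigl( \bigl(\pi(\theta_2) - \pi(\theta_1)\bigr)\, T(x) \Bigr)
```

so when π is non-decreasing the coefficient π(θ_2) − π(θ_1) is non-negative, and the ratio is a non-decreasing function of T(x).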

Uniformly most powerful tests: The Karlin–Rubin theorem


Monotone likelihood functions are used to construct uniformly most powerful tests, according to the Karlin–Rubin theorem.[1] Consider a scalar measurement having a probability density function parameterized by a scalar parameter θ, and define the likelihood ratio ℓ(x) = f_{θ_1}(x)/f_{θ_0}(x). If ℓ(x) is monotone non-decreasing in x for any pair θ_1 ≥ θ_0 (meaning that the greater x is, the more likely H_1 is), then the threshold test:

\varphi(x) = \begin{cases} 1 & \text{if } x > x_0 \\ 0 & \text{if } x < x_0 \end{cases}

where x_0 is chosen so that \operatorname{E}\bigl[\, \varphi(X) \mid \theta_0 \,\bigr] = \alpha

is the UMP test of size α for testing H_0 : θ ≤ θ_0 vs. H_1 : θ > θ_0.

Note that exactly the same test is also UMP for testing H_0 : θ = θ_0 vs. H_1 : θ > θ_0.
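A minimal sketch of this threshold construction, assuming X ~ N(θ, 1) with known unit variance (so the family has MLR in x); the values θ_0 = 0 and α = 0.05 are illustrative choices, not from the article:

```python
from statistics import NormalDist

theta0, alpha = 0.0, 0.05  # illustrative assumptions

# Choose x0 so that P[X > x0 | theta0] = alpha:
x0 = NormalDist(theta0, 1.0).inv_cdf(1 - alpha)

def phi(x):
    """Threshold test: reject H0 (return 1) iff x > x0."""
    return 1 if x > x0 else 0

# Size check: the rejection probability at theta0 equals alpha.
size = 1 - NormalDist(theta0, 1.0).cdf(x0)
assert abs(size - alpha) < 1e-9

# Power exceeds alpha at any theta > theta0, e.g. theta = 1:
power = 1 - NormalDist(1.0, 1.0).cdf(x0)
assert power > alpha
```

By the Karlin–Rubin theorem, no other test of size α has higher power at any θ > θ_0 for this family.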

Median unbiased estimation


Monotone likelihood functions are used to construct median-unbiased estimators, using methods specified by Johann Pfanzagl and others.[2][3] One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation but for a larger class of loss functions.[3]: 713

Lifetime analysis: Survival analysis and reliability


If a family of distributions f_θ(x) has the monotone likelihood ratio property in T(X), then:

1. the family has monotone decreasing hazard rates in θ (but not necessarily in T(X));
2. the family exhibits first-order (and hence second-order) stochastic dominance in x, and the best Bayesian update of θ is increasing in T(X).

But not conversely: neither monotone hazard rates nor stochastic dominance implies the MLRP.

Proofs


Let the distribution family f_θ satisfy MLR in x, so that for θ_1 > θ_0 and x_1 > x_0:

\frac{f_{\theta_1}(x_1)}{f_{\theta_0}(x_1)} \geq \frac{f_{\theta_1}(x_0)}{f_{\theta_0}(x_0)},

or equivalently:

f_{\theta_1}(x_1)\, f_{\theta_0}(x_0) \geq f_{\theta_1}(x_0)\, f_{\theta_0}(x_1).

Integrating this expression twice, we obtain:

1. With respect to x_0, from \min X up to x_1:

\int_{\min X}^{x_1} f_{\theta_1}(x_1)\, f_{\theta_0}(x_0)\ \mathrm{d}x_0 \;\geq\; \int_{\min X}^{x_1} f_{\theta_1}(x_0)\, f_{\theta_0}(x_1)\ \mathrm{d}x_0

integrate and rearrange to obtain

\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \geq \frac{F_{\theta_1}(x)}{F_{\theta_0}(x)}
2. With respect to x_1, from x_0 up to \max X:

\int_{x_0}^{\max X} f_{\theta_1}(x_1)\, f_{\theta_0}(x_0)\ \mathrm{d}x_1 \;\geq\; \int_{x_0}^{\max X} f_{\theta_1}(x_0)\, f_{\theta_0}(x_1)\ \mathrm{d}x_1

integrate and rearrange to obtain

\frac{1 - F_{\theta_1}(x)}{1 - F_{\theta_0}(x)} \geq \frac{f_{\theta_1}(x)}{f_{\theta_0}(x)}

First-order stochastic dominance


Combine the two inequalities above: chaining them gives (1 − F_{θ_1}(x))/(1 − F_{θ_0}(x)) ≥ F_{θ_1}(x)/F_{θ_0}(x), and cross-multiplying yields first-order dominance:

F_{\theta_1}(x) \leq F_{\theta_0}(x) \quad \forall x

Monotone hazard rate


Use only the second inequality above to get a monotone hazard rate:

\frac{f_{\theta_1}(x)}{1 - F_{\theta_1}(x)} \leq \frac{f_{\theta_0}(x)}{1 - F_{\theta_0}(x)} \quad \forall x
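Both consequences can be spot-checked numerically. Below is a sketch using exponential distributions parameterized by their scale θ, under which the family has MLR in x; the specific scales 1.0 and 2.0 are assumptions for illustration:

```python
import math

# Exponential family parameterized by scale theta (assumption):
# f_theta(x) = (1/theta) exp(-x/theta); a larger theta is the "higher" parameter.
theta0, theta1 = 1.0, 2.0  # theta1 > theta0

def pdf(x, theta): return math.exp(-x / theta) / theta
def cdf(x, theta): return 1 - math.exp(-x / theta)

xs = [i / 10 for i in range(1, 100)]

# MLR: the likelihood ratio pdf(x, theta1)/pdf(x, theta0) is increasing in x.
lr = [pdf(x, theta1) / pdf(x, theta0) for x in xs]
assert all(a < b for a, b in zip(lr, lr[1:]))

# First-order stochastic dominance: F_theta1(x) <= F_theta0(x) for all x.
assert all(cdf(x, theta1) <= cdf(x, theta0) for x in xs)

# Hazard rates f/(1 - F) are ordered: lower for the higher parameter.
hazard0 = [pdf(x, theta0) / (1 - cdf(x, theta0)) for x in xs]
hazard1 = [pdf(x, theta1) / (1 - cdf(x, theta1)) for x in xs]
assert all(h1 <= h0 for h1, h0 in zip(hazard1, hazard0))
```

For this family the hazard rate is the constant 1/θ, so the ordering is exact; the MLR and dominance checks hold at every grid point.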

Uses


Economics


The MLR is an important condition on the type distribution of agents in mechanism design and the economics of information, where Paul Milgrom defined "favorableness" of signals (in terms of stochastic dominance) as a consequence of MLR.[4] Most solutions to mechanism design models assume type distributions that satisfy the MLR to take advantage of solution methods that may be easier to apply and interpret.

References

  1. ^ Casella, G.; Berger, R. L. (2008). "Theorem 8.3.17". Statistical Inference. Brooks/Cole. ISBN 0-495-39187-5.
  2. ^ Pfanzagl, Johann (1979). "On optimal median unbiased estimators in the presence of nuisance parameters". Annals of Statistics. 7 (1): 187–193. doi:10.1214/aos/1176344563.
  3. ^ a b Brown, L. D.; Cohen, Arthur; Strawderman, W. E. (1976). "A complete class theorem for strict monotone likelihood ratio with applications". Annals of Statistics. 4 (4): 712–722. doi:10.1214/aos/1176343543.
  4. ^ Milgrom, P. R. (1981). "Good news and bad news: Representation theorems and applications". The Bell Journal of Economics. 12 (2): 380–391. doi:10.2307/3003562.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Monotone_likelihood_ratio&oldid=1214330432"