Median absolute deviation

From Wikipedia, the free encyclopedia
Statistical measure of variability
For other statistical measures also abbreviated as "MAD", see Average absolute deviation.

In statistics, the median absolute deviation (MAD), also referred to as the median absolute deviation from the median (MADFM), is a robust, or outlier-resistant, measure of the variability of a univariate sample of quantitative data. For a univariate data set $X_1, X_2, \ldots, X_n$, the MAD is defined as the median of the absolute deviations from the data's median $\tilde{X} = \operatorname{median}(X)$:

$$\operatorname{MAD} = \operatorname{median}\left(|X_i - \tilde{X}|\right).$$

It can also refer to the population parameter that is estimated by the MAD calculated from a sample.[1]

Example

Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7), which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.
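The worked example can be checked with a minimal Python sketch built on the standard-library `statistics.median`. The helper name `mad` is an illustrative choice, not a standard API, and it returns the raw MAD without any scale factor.

```python
from statistics import median

def mad(data):
    """Raw median absolute deviation about the median (no scale factor)."""
    center = median(data)                          # the data's median, X~
    deviations = [abs(x - center) for x in data]   # |X_i - X~|
    return median(deviations)

data = [1, 1, 2, 2, 4, 6, 9]
print(median(data))  # 2
print(mad(data))     # 1
```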

Uses

The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the size of the deviations of a small number of outliers is irrelevant.
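A small sketch, assuming the same illustrative `mad` helper as above, makes the contrast concrete: replacing the largest value of the example data with a gross outlier (900, an arbitrary choice) leaves the MAD unchanged, while the sample standard deviation (`statistics.stdev`) grows by a factor of more than 100.

```python
from statistics import median, stdev

def mad(data):
    """Raw median absolute deviation about the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

clean = [1, 1, 2, 2, 4, 6, 9]
dirty = [1, 1, 2, 2, 4, 6, 900]   # largest value replaced by a gross outlier

print(mad(clean), mad(dirty))                          # 1 1
print(round(stdev(clean), 2), round(stdev(dirty), 1))  # 2.99 339.2
```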

Because the MAD is a more robust estimator of scale than the sample variance or standard deviation, it works better with distributions without a mean or variance, such as the Cauchy distribution.
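As a rough illustration, one can draw a large sample from the standard Cauchy distribution and compute its MAD. This sketch assumes inverse-CDF sampling via $\tan(\pi(U - 1/2))$ with $U$ uniform on [0, 1); the seed and sample size are arbitrary. The sample MAD settles near 1, the MAD of the standard Cauchy distribution, even though the distribution has no variance for a sample standard deviation to estimate.

```python
import math
import random
from statistics import median

def mad(data):
    """Raw median absolute deviation about the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

rng = random.Random(12345)  # arbitrary seed, for reproducibility
# Inverse-CDF sampling: tan(pi * (U - 1/2)) is standard Cauchy for U ~ Uniform[0, 1).
sample = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(100_001)]

print(mad(sample))  # close to 1
```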

Relation to standard deviation

The MAD may be used much as one would use the standard deviation about the mean. In order to use the MAD as a consistent estimator of the standard deviation $\sigma$, one takes

$$\hat{\sigma} = k \cdot \operatorname{MAD},$$

where $k$ is a constant scale factor, which depends on the distribution.[2]

For normally distributed data $k$ is taken to be

$$k = \frac{1}{\Phi^{-1}(3/4)} \approx \frac{1}{0.67449} \approx 1.4826,$$

i.e., the reciprocal of the quantile function $\Phi^{-1}$ (also known as the inverse of the cumulative distribution function) of the standard normal distribution $Z = (X - \mu)/\sigma$, evaluated at 3/4.[3][4]
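The following Python sketch computes $k$ from the standard normal quantile function (`statistics.NormalDist.inv_cdf`) and applies $\hat{\sigma} = k \cdot \operatorname{MAD}$ to a simulated normal sample. The helper names `mad` and `sigma_hat`, the seed, and the true parameters ($\mu = 10$, $\sigma = 2$) are illustrative assumptions.

```python
import random
from statistics import NormalDist, median

# k = 1 / Phi^{-1}(3/4) for normally distributed data.
K_NORMAL = 1.0 / NormalDist().inv_cdf(0.75)

def mad(data):
    """Raw median absolute deviation about the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

def sigma_hat(data, k=K_NORMAL):
    """Scaled MAD: a consistent estimate of sigma when the data are normal."""
    return k * mad(data)

rng = random.Random(2024)  # arbitrary seed
sample = [rng.gauss(10.0, 2.0) for _ in range(100_001)]  # true sigma = 2

print(round(K_NORMAL, 4))  # 1.4826
print(sigma_hat(sample))   # close to 2
```

For data that are not normal, `k` would have to be replaced by the factor appropriate to that distribution.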

Derivation

The argument 3/4 is chosen so that the interval $\mu \pm \operatorname{MAD}$ covers 50% of the probability, namely the part of the standard normal cumulative distribution function between 1/4 and 3/4, i.e.

$$\frac{1}{2} = P(|X - \mu| \le \operatorname{MAD}) = P\left(\left|\frac{X - \mu}{\sigma}\right| \le \frac{\operatorname{MAD}}{\sigma}\right) = P\left(|Z| \le \frac{\operatorname{MAD}}{\sigma}\right).$$

Therefore, we must have that

$$\Phi(\operatorname{MAD}/\sigma) - \Phi(-\operatorname{MAD}/\sigma) = 1/2.$$

Noticing that

$$\Phi(-\operatorname{MAD}/\sigma) = 1 - \Phi(\operatorname{MAD}/\sigma),$$

we have $2\Phi(\operatorname{MAD}/\sigma) - 1 = 1/2$, i.e. $\Phi(\operatorname{MAD}/\sigma) = 3/4$. Hence $\operatorname{MAD}/\sigma = \Phi^{-1}(3/4) \approx 0.67449$, from which we obtain the scale factor $k = 1/\Phi^{-1}(3/4) \approx 1.4826$.
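The steps of the derivation can be checked numerically. This sketch assumes only the standard library's `statistics.NormalDist`; it confirms that the interval $\pm\Phi^{-1}(3/4)$ has probability 1/2 under the standard normal distribution and that the symmetry identity holds up to floating-point rounding.

```python
from statistics import NormalDist

phi = NormalDist()            # standard normal, mu = 0, sigma = 1
q = phi.inv_cdf(0.75)         # Phi^{-1}(3/4), i.e. MAD / sigma

# P(|Z| <= q) = Phi(q) - Phi(-q); the derivation requires this to be 1/2.
coverage = phi.cdf(q) - phi.cdf(-q)

# Symmetry of the normal CDF: Phi(-q) = 1 - Phi(q), up to rounding.
symmetry_gap = phi.cdf(-q) - (1.0 - phi.cdf(q))

k = 1.0 / q
print(round(q, 5), round(coverage, 12), round(k, 4))  # 0.67449 0.5 1.4826
```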

Another way of establishing the relationship is to note that, for normally distributed data, $|X - \mu|$ follows a half-normal distribution, and the MAD equals the median of that half-normal distribution:

$$\operatorname{MAD} = \sigma \sqrt{2}\, \operatorname{erf}^{-1}(1/2) \approx 0.67449\, \sigma.$$

This form is used in, e.g., the probable error.
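Python's `math` module provides `erf` but no inverse error function, so this sketch inverts `math.erf` by bisection (a hypothetical helper `erfinv`, adequate for arguments in [0, 1)) and checks that $\sqrt{2}\,\operatorname{erf}^{-1}(1/2)$ agrees with $\Phi^{-1}(3/4)$.

```python
import math
from statistics import NormalDist

def erfinv(y, lo=0.0, hi=6.0, iters=200):
    """Hypothetical helper: invert math.erf by bisection for 0 <= y < 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid   # erf is increasing, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Median of the half-normal distribution with unit scale.
half_normal_median = math.sqrt(2.0) * erfinv(0.5)
# The same ratio MAD / sigma from the normal quantile function.
quantile_ratio = NormalDist().inv_cdf(0.75)

print(round(half_normal_median, 5), round(quantile_ratio, 5))  # 0.67449 0.67449
```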

In the case of complex values ($X + iY$), the relation of MAD to the standard deviation is unchanged for normally distributed data.

Multivariate generalization

Analogously to how the median generalizes to the geometric median (GM) in multivariate data, the MAD can be generalized to the median of distances to the GM (MADGM) in $n$ dimensions. This is done by replacing the absolute differences in one dimension by Euclidean distances of the data points to the geometric median in $n$ dimensions.[5] This gives the same result as the univariate MAD in one dimension and generalizes to any number of dimensions. MADGM requires the geometric median to be found, which is done by an iterative process.
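The text does not specify which iterative process is used to find the geometric median; the sketch below assumes Weiszfeld's algorithm, one common choice, and is not the implementation in the Rstats crate cited above. The function names `geometric_median` and `madgm` are illustrative, and the small `eps` floor on distances is a pragmatic guard against division by zero when an iterate lands on a data point.

```python
import math
from statistics import median

def geometric_median(points, tol=1e-10, max_iter=1000, eps=1e-12):
    """Approximate the geometric median with Weiszfeld's iteration."""
    n, dim = len(points), len(points[0])
    # Start from the centroid (coordinate-wise mean).
    y = [sum(p[j] for p in points) / n for j in range(dim)]
    for _ in range(max_iter):
        # Weight each point by the inverse of its distance to the current estimate;
        # eps stops the weight from blowing up when y coincides with a data point.
        w = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(w)
        y_new = [sum(wi * p[j] for wi, p in zip(w, points)) / total for j in range(dim)]
        if math.dist(y, y_new) < tol:
            return y_new
        y = y_new
    return y

def madgm(points):
    """Median of the Euclidean distances from each point to the geometric median."""
    gm = geometric_median(points)
    return median([math.dist(p, gm) for p in points])

# In one dimension this reproduces the univariate MAD of the earlier example.
print(round(madgm([[1], [1], [2], [2], [4], [6], [9]]), 6))  # 1.0
```

Points are passed as equal-length sequences of coordinates, so the same functions handle any number of dimensions.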

The population MAD

The population MAD is defined analogously to the sample MAD, but is based on the complete population rather than on a sample. For a distribution symmetric about zero (so that its median is zero), the population MAD is the 75th percentile of the distribution.

Unlike the variance, which may be infinite or undefined, the population MAD is always a finite number. For example, the standard Cauchy distribution has undefined variance, but its MAD is 1.
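For a distribution symmetric about zero, the population MAD can be read directly off the quantile function. This sketch assumes the standard normal (via `statistics.NormalDist`) and the standard Cauchy, whose quantile function $x_0 + \gamma \tan(\pi(p - 1/2))$ is written out by hand in the illustrative helper `cauchy_quantile`.

```python
import math
from statistics import NormalDist

def cauchy_quantile(p, x0=0.0, gamma=1.0):
    """Quantile function of the Cauchy distribution: x0 + gamma * tan(pi * (p - 1/2))."""
    return x0 + gamma * math.tan(math.pi * (p - 0.5))

# Both distributions are symmetric about zero, so the median is 0
# and the population MAD equals the 75th percentile.
normal_mad = NormalDist(0.0, 1.0).inv_cdf(0.75)  # about 0.67449
cauchy_mad = cauchy_quantile(0.75)                # tan(pi/4), i.e. 1 up to rounding

print(round(normal_mad, 5), round(cauchy_mad, 12))  # 0.67449 1.0
```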

The earliest known mention of the concept of the MAD occurred in 1816, in a paper by Carl Friedrich Gauss on the determination of the accuracy of numerical observations.[6][7]

Notes

  1. Dodge, Yadolah (2010). The Concise Encyclopedia of Statistics. New York: Springer. ISBN 978-0-387-32833-1.
  2. Rousseeuw, P. J.; Croux, C. (1993). "Alternatives to the median absolute deviation". Journal of the American Statistical Association. 88 (424): 1273–1283. doi:10.1080/01621459.1993.10476408. hdl:2027.42/142454.
  3. Ruppert, D. (2010). Statistics and Data Analysis for Financial Engineering. Springer. p. 118. ISBN 9781441977878. Retrieved 2015-08-27.
  4. Leys, C.; et al. (2013). "Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median" (PDF). Journal of Experimental Social Psychology. 49 (4): 764–766. doi:10.1016/j.jesp.2013.03.013.
  5. Spacek, Libor. "Rstats - Rust Implementation of Statistical Measures, Vector Algebra, Geometric Median, Data Analysis and Machine Learning". crates.io. Retrieved 26 July 2022.
  6. Gauss, Carl Friedrich (1816). "Bestimmung der Genauigkeit der Beobachtungen". Zeitschrift für Astronomie und Verwandte Wissenschaften. 1: 187–197.
  7. Walker, Helen (1931). Studies in the History of the Statistical Method. Baltimore, MD: Williams & Wilkins Co. pp. 24–25.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Median_absolute_deviation&oldid=1336617582"