In statistics, the median absolute deviation (MAD), also referred to as the median absolute deviation from the median (MADFM), is a robust or outlier-resistant measure of the variability of a univariate sample of quantitative data. For a univariate data set $X_1, X_2, \ldots, X_n$, the MAD is defined as the median of the absolute deviations from the data's median $\tilde{X} = \operatorname{median}(X)$: $\operatorname{MAD} = \operatorname{median}(|X_i - \tilde{X}|)$. It can also refer to the population parameter that is estimated by the MAD calculated from a sample.[1]
Consider the data (1, 1, 2, 2, 4, 6, 9). It has a median value of 2. The absolute deviations about 2 are (1, 1, 0, 0, 2, 4, 7), which in turn have a median value of 1 (because the sorted absolute deviations are (0, 0, 1, 1, 2, 4, 7)). So the median absolute deviation for this data is 1.
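The worked example can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `median_absolute_deviation` is an illustrative choice, not taken from any particular package.

```python
from statistics import median

def median_absolute_deviation(data):
    """Return the median of the absolute deviations from the data's median."""
    center = median(data)
    return median([abs(x - center) for x in data])

sample = [1, 1, 2, 2, 4, 6, 9]
center = median(sample)                  # 2
mad = median_absolute_deviation(sample)  # 1
```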
The median absolute deviation is a measure of statistical dispersion. Moreover, the MAD is a robust statistic, being more resilient to outliers in a data set than the standard deviation. In the standard deviation, the distances from the mean are squared, so large deviations are weighted more heavily, and thus outliers can heavily influence it. In the MAD, the deviations of a small number of outliers are irrelevant.
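The difference in robustness can be illustrated with a small experiment. The sketch below reuses the data (1, 1, 2, 2, 4, 6, 9) and, as a purely hypothetical contamination, replaces the largest value with 900; the helper name `mad` is an assumption made for this example.

```python
from statistics import median, stdev

def mad(data):
    """Median of the absolute deviations from the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

clean = [1, 1, 2, 2, 4, 6, 9]
contaminated = [1, 1, 2, 2, 4, 6, 900]  # hypothetical gross error in the last value

# The sample standard deviation grows by orders of magnitude ...
sd_clean, sd_contaminated = stdev(clean), stdev(contaminated)

# ... while the MAD is 1 for both data sets, because the median and the
# middle-ranked absolute deviation do not depend on how extreme the outlier is.
mad_clean, mad_contaminated = mad(clean), mad(contaminated)
```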
Because the MAD is a more robust estimator of scale than the sample variance or standard deviation, it works better with distributions without a mean or variance, such as the Cauchy distribution.
The MAD may be used similarly to how one would use the deviation for the average. In order to use the MAD as a consistent estimator for the estimation of the standard deviation $\sigma$, one takes
$$\hat{\sigma} = k \cdot \operatorname{MAD},$$
where $k$ is a constant scale factor, which depends on the distribution.[2]
For normally distributed data, $k$ is taken to be
$$k = 1/\Phi^{-1}(3/4) \approx 1.4826,$$
i.e., the reciprocal of the quantile function $\Phi^{-1}$ (also known as the inverse of the cumulative distribution function) for the standard normal distribution, evaluated at 3/4.[3][4]
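As a rough illustration of the scaled estimator, the following sketch computes the normal-distribution scale factor with `statistics.NormalDist` and applies it to simulated data. The names `K_NORMAL`, `mad` and `sigma_from_mad`, the seed, the sample size and the true parameters are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist, median

# Scale factor for normal data: reciprocal of the standard normal
# quantile function evaluated at 3/4 (approximately 1.4826).
K_NORMAL = 1.0 / NormalDist().inv_cdf(0.75)

def mad(data):
    """Median of the absolute deviations from the median."""
    center = median(data)
    return median([abs(x - center) for x in data])

def sigma_from_mad(data, k=K_NORMAL):
    """Estimate the standard deviation as k * MAD."""
    return k * mad(data)

rng = random.Random(0)   # fixed seed so the run is repeatable
true_sigma = 2.0         # assumed value for the simulation
sample = [rng.gauss(10.0, true_sigma) for _ in range(20_000)]
estimate = sigma_from_mad(sample)  # expected to land close to true_sigma
```

For data that are not approximately normal, a different value of `k` would be needed, since the factor depends on the distribution.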
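The argument 3/4 is such that $\pm\operatorname{MAD}$ covers 50% (between 1/4 and 3/4) of the standard normal cumulative distribution function $\Phi$, i.e., for normally distributed $X$ with mean $\mu$ and standard deviation $\sigma$, and standard normal $Z = (X - \mu)/\sigma$,
$$\frac{1}{2} = P(|X - \mu| \le \operatorname{MAD}) = P\left(\left|\frac{X - \mu}{\sigma}\right| \le \frac{\operatorname{MAD}}{\sigma}\right) = P\left(|Z| \le \frac{\operatorname{MAD}}{\sigma}\right).$$
Therefore, we must have that
$$\Phi(\operatorname{MAD}/\sigma) - \Phi(-\operatorname{MAD}/\sigma) = 1/2.$$
Noticing that
$$\Phi(-\operatorname{MAD}/\sigma) = 1 - \Phi(\operatorname{MAD}/\sigma),$$
we have that $\operatorname{MAD}/\sigma = \Phi^{-1}(3/4) \approx 0.67449$, from which we obtain the scale factor $k = 1/\Phi^{-1}(3/4) \approx 1.4826$.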
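The steps of this argument can be checked numerically. The sketch below is only a sanity check with the standard library's `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()         # mean 0, standard deviation 1
ratio = std_normal.inv_cdf(0.75)  # MAD / sigma, approximately 0.67449

# The interval [-ratio, ratio] should hold half of the probability mass.
coverage = std_normal.cdf(ratio) - std_normal.cdf(-ratio)

# Symmetry of the normal distribution: Phi(-r) = 1 - Phi(r).
symmetry_gap = std_normal.cdf(-ratio) - (1.0 - std_normal.cdf(ratio))

scale_factor = 1.0 / ratio        # approximately 1.4826
```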
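Another way of establishing the relationship is noting that the MAD equals the half-normal distribution median:
$$\operatorname{MAD} = \sigma\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 0.67449\,\sigma.$$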
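The half-normal median can also be evaluated directly. Python's `math` module provides `erf` but no inverse, so this sketch inverts it by bisection; the helper `erf_inverse` and its bracketing interval are assumptions made for the example, not a library function.

```python
import math

def erf_inverse(y, lo=0.0, hi=6.0, iterations=100):
    """Solve erf(x) = y for 0 <= y < 1 by bisection on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if math.erf(mid) < y:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return (lo + hi) / 2.0

sigma = 1.0
# Median of the half-normal distribution: sigma * sqrt(2) * erfinv(1/2),
# approximately 0.67449 * sigma.
half_normal_median = sigma * math.sqrt(2.0) * erf_inverse(0.5)
```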
This form is used in, e.g., the probable error.
In the case of complex values (X+iY), the relation of MAD to the standard deviation is unchanged for normally distributed data.
Analogously to how the median generalizes to the geometric median (GM) in multivariate data, MAD can be generalized to the median of distances to GM (MADGM) in n dimensions. This is done by replacing the absolute differences in one dimension by Euclidean distances of the data points to the geometric median in n dimensions.[5] This gives the identical result as the univariate MAD in one dimension and generalizes to any number of dimensions. MADGM needs the geometric median to be found, which is done by an iterative process.
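One possible way to carry this out is sketched below. The source does not name the iterative process; this example assumes Weiszfeld's algorithm, a common choice for the geometric median, started from the centroid. The function names, the tolerance, the iteration cap and the small floor on distances (used to avoid division by zero when an iterate coincides with a data point) are all choices made for this illustration.

```python
import math
from statistics import median

def geometric_median(points, tol=1e-12, max_iter=1000):
    """Approximate the geometric median with Weiszfeld's iteration."""
    dim = len(points[0])
    n = len(points)
    # Start from the centroid of the data.
    current = [sum(p[k] for p in points) / n for k in range(dim)]
    for _ in range(max_iter):
        # Inverse-distance weights; the floor guards against division by zero.
        weights = [1.0 / max(math.dist(p, current), 1e-12) for p in points]
        total = sum(weights)
        updated = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
        step = math.dist(updated, current)
        current = updated
        if step < tol:
            break
    return current

def madgm(points):
    """Median of the Euclidean distances to the geometric median."""
    gm = geometric_median(points)
    return median([math.dist(p, gm) for p in points])

# Corners of a square: the geometric median is the centre (1, 1),
# so every distance, and hence the MADGM, is sqrt(2).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
square_madgm = madgm(square)

# One-dimensional data given as 1-tuples: the result should agree with
# the univariate MAD of (1, 1, 2, 2, 4, 6, 9), up to the iteration tolerance.
line = [(x,) for x in (1.0, 1.0, 2.0, 2.0, 4.0, 6.0, 9.0)]
line_madgm = madgm(line)
```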
The population MAD is defined analogously to the sample MAD, but is based on the complete population rather than on a sample. For a symmetric distribution with zero mean, the population MAD is the 75th percentile of the distribution.
Unlike the variance, which may be infinite or undefined, the population MAD is always a finite number. For example, the standard Cauchy distribution has undefined variance, but its MAD is 1.
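The Cauchy example follows from the 75th-percentile rule, since the standard Cauchy distribution is symmetric about zero. The sketch below uses the closed-form Cauchy quantile function; the helper name `cauchy_quantile` and its parameter names are illustrative.

```python
import math

def cauchy_quantile(p, location=0.0, scale=1.0):
    """Quantile function of the Cauchy distribution for 0 < p < 1."""
    return location + scale * math.tan(math.pi * (p - 0.5))

# Population MAD of the standard Cauchy distribution: its 75th percentile,
# tan(pi / 4), which equals 1 up to floating-point rounding.
population_mad = cauchy_quantile(0.75)
```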
The earliest known mention of the concept of the MAD occurred in 1816, in a paper by Carl Friedrich Gauss on the determination of the accuracy of numerical observations.[6][7]