In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded:[1] that is, they have heavier tails than the exponential distribution. Roughly speaking, “heavy-tailed” means the distribution decreases more slowly than an exponential distribution, so extreme values are more likely. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class, introduced by Jozef Teugels.[2]
There is still some discrepancy over the use of the term heavy-tailed. There are two other definitions in use. Some authors use the term to refer to those distributions which do not have all their power moments finite; and some others to those distributions that do not have a finite variance. The definition given in this article is the most general in use, and includes all distributions encompassed by the alternative definitions, as well as those distributions such as log-normal that possess all their power moments, yet which are generally considered to be heavy-tailed. (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)
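The distribution of a random variable $X$ with distribution function $F$ is said to have a heavy (right) tail if the moment generating function of $X$, $M_X(t)$, is infinite for all $t > 0$.[3]
That means
$$\int_{-\infty}^{\infty} e^{tx}\,dF(x) = \infty \quad \text{for all } t > 0.$$
This is also written in terms of the tail distribution function
$$\overline{F}(x) \equiv \Pr[X > x]$$
as
$$\limsup_{x\to\infty} e^{tx}\,\overline{F}(x) = \infty \quad \text{for all } t > 0.$$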
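As a rough numerical illustration of the heavy-tail condition, the sketch below evaluates exp(t·x)·P(X > x) at a few points for an exponential tail and for a Pareto tail. The rate 1, the Pareto shape 2, the value t = 0.5 and all function names are illustrative assumptions, not part of the definition. A finite grid can only suggest the limiting behaviour; it does not prove it.

```python
import math

def exp_tail(x, rate=1.0):
    """Tail P(X > x) of an exponential distribution with the given rate."""
    return math.exp(-rate * x)

def pareto_tail(x, alpha=2.0, x_min=1.0):
    """Tail P(X > x) of a Pareto distribution with shape alpha and scale x_min."""
    return (x_min / x) ** alpha if x >= x_min else 1.0

def weighted_tail(tail, t, xs):
    """Evaluate exp(t * x) * tail(x) at each x in xs."""
    return [math.exp(t * x) * tail(x) for x in xs]

xs = [10.0, 50.0, 100.0, 200.0]
t = 0.5  # illustrative choice; any t below the exponential rate shows the contrast

exp_values = weighted_tail(exp_tail, t, xs)        # shrinks towards 0
pareto_values = weighted_tail(pareto_tail, t, xs)  # grows without bound

for x, e, p in zip(xs, exp_values, pareto_values):
    print(f"x={x:6.1f}  exponential: {e:.3e}  Pareto: {p:.3e}")
```

For the exponential distribution the product stays bounded whenever t is below the rate, so the condition fails for those t and the distribution is not heavy-tailed. For the Pareto tail the product grows for every t > 0, because a power of x cannot offset exponential growth.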
The distribution of a random variable $X$ with distribution function $F$ is said to have a long right tail[1] if for all $t > 0$,
$$\lim_{x\to\infty} \Pr[X > x + t \mid X > x] = 1,$$
or equivalently
$$\overline{F}(x+t) \sim \overline{F}(x) \quad \text{as } x\to\infty.$$
Intuitively, if a quantity with a long right tail exceeds some high level, then the probability that it also exceeds any given higher level approaches 1.
All long-tailed distributions are heavy-tailed, but the converse is false, and it is possible to construct heavy-tailed distributions that are not long-tailed.
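The long-tail property can be checked numerically by computing the conditional probability P(X > x + t | X > x) at increasing levels x. The sketch below does this for a Pareto tail with shape 2 and for a unit-rate exponential tail; both distributions, the offset t = 5 and the function names are assumptions chosen for illustration.

```python
import math

def tail_ratio(tail, x, t):
    """Conditional probability P(X > x + t | X > x) = tail(x + t) / tail(x)."""
    return tail(x + t) / tail(x)

def exp_tail(x):
    """Tail of a unit-rate exponential distribution."""
    return math.exp(-x)

def pareto_tail(x):
    """Tail of a Pareto distribution with shape 2 and scale 1."""
    return x ** -2.0 if x >= 1.0 else 1.0

t = 5.0
levels = [10.0, 100.0, 500.0]  # kept moderate so exp(-x) does not underflow

pareto_ratios = [tail_ratio(pareto_tail, x, t) for x in levels]
exp_ratios = [tail_ratio(exp_tail, x, t) for x in levels]

for x, p, e in zip(levels, pareto_ratios, exp_ratios):
    print(f"x={x:6.1f}  Pareto: {p:.4f}  exponential: {e:.4f}")
```

The Pareto ratios climb towards 1 as the level grows, as the definition requires. The exponential ratios stay at exp(-5) for every level (the memoryless property), so the exponential distribution is not long-tailed.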
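Subexponentiality is defined in terms of convolutions of probability distributions. For two independent, identically distributed non-negative random variables $X_1, X_2$ with a common distribution function $F$, the convolution of $F$ with itself, written $F^{*2}$ and called the convolution square, is defined using Lebesgue–Stieltjes integration by:
$$\Pr[X_1 + X_2 \le x] = F^{*2}(x) = \int_0^x F(x-y)\,dF(y),$$
and the $n$-fold convolution $F^{*n}$ is defined inductively by the rule:
$$F^{*n}(x) = \int_0^x F(x-y)\,dF^{*(n-1)}(y).$$
The tail distribution function $\overline{F}$ is defined as $\overline{F}(x) = 1 - F(x)$.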
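A distribution $F$ on the positive half-line is subexponential[1][5][2] if
$$\overline{F^{*2}}(x) \sim 2\,\overline{F}(x) \quad \text{as } x\to\infty.$$
This implies[6] that, for any $n \ge 1$,
$$\overline{F^{*n}}(x) \sim n\,\overline{F}(x) \quad \text{as } x\to\infty.$$
The probabilistic interpretation[6] of this is that, for a sum of $n$ independent random variables $X_1,\ldots,X_n$ with common distribution $F$,
$$\Pr[X_1+\cdots+X_n > x] \sim \Pr[\max(X_1,\ldots,X_n) > x] \quad \text{as } x\to\infty.$$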
This is often known as the principle of the single big jump[7] or catastrophe principle.[8]
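A small Monte Carlo experiment can make the single-big-jump principle concrete. The sketch below estimates the ratio P(sum > x) / P(max > x) for two independent draws, once from a Pareto distribution (shape 1.5) and once from a unit-rate exponential distribution. The distributions, the levels, the number of trials, the seed and the helper name `sum_vs_max` are all illustrative assumptions.

```python
import random

def sum_vs_max(sampler, n_terms, level, n_trials, seed=0):
    """Monte Carlo estimate of P(sum > level) / P(max > level)."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(n_trials):
        draws = [sampler(rng) for _ in range(n_terms)]
        sum_hits += sum(draws) > level   # the sum exceeds the level
        max_hits += max(draws) > level   # a single term already exceeds it
    return sum_hits / max_hits

# Subexponential case: Pareto with shape 1.5, level far out in the tail.
pareto_ratio = sum_vs_max(lambda rng: rng.paretovariate(1.5), 2, 100.0, 200_000)

# Light-tailed case: unit-rate exponential, level chosen so events are still observable.
exp_ratio = sum_vs_max(lambda rng: rng.expovariate(1.0), 2, 6.0, 200_000)

print(f"Pareto ratio:      {pareto_ratio:.3f}")
print(f"exponential ratio: {exp_ratio:.3f}")
```

For the Pareto draws the ratio is close to 1: when the sum is large, it is almost always because one term is large. For the exponential draws the ratio is well above 1, since a large sum typically comes from two moderately large terms.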
A distribution $F$ on the whole real line is subexponential if the distribution $F\,I([0,\infty))$ is.[9] Here $I([0,\infty))$ is the indicator function of the positive half-line. Alternatively, a random variable $X$ supported on the real line is subexponential if and only if $X^{+} = \max(0, X)$ is subexponential.
All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.
All commonly used heavy-tailed distributions are subexponential.[6]
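Those that are one-tailed include the Pareto, log-normal and log-logistic distributions.
Those that are two-tailed include the Cauchy distribution and Student's t-distribution.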
A fat-tailed distribution is a distribution for which the probability density function, for large $x$, goes to zero as a power $x^{-a}$. Since such a power is, for sufficiently large $x$, always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed), but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution. Many other heavy-tailed distributions such as the log-logistic and Pareto distribution are, however, also fat-tailed.
There are parametric[6] and non-parametric[13] approaches to the problem of tail-index estimation.
To estimate the tail-index using the parametric approach, some authors employ the GEV distribution or the Pareto distribution; they may apply the maximum-likelihood estimator (MLE).
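As a minimal sketch of the parametric route, assume the data follow an exact Pareto distribution with scale `x_min` and shape `alpha`. Under that assumption the maximum-likelihood estimate of the shape has the closed form n / Σ ln(x_i / x_min). The function name, the simulated data and the seed below are hypothetical and only serve to illustrate the formula; real data would first need a check that a Pareto model is reasonable.

```python
import math
import random

def pareto_mle_shape(sample, x_min=None):
    """Maximum-likelihood estimate of the Pareto shape parameter alpha.

    If x_min is not supplied, the sample minimum (its own MLE) is used.
    """
    if x_min is None:
        x_min = min(sample)
    log_sum = sum(math.log(x / x_min) for x in sample)
    return len(sample) / log_sum

# Simulated data with a known shape, so the estimate can be compared to the truth.
rng = random.Random(1)
true_alpha = 2.5
sample = [rng.paretovariate(true_alpha) for _ in range(20_000)]

alpha_hat = pareto_mle_shape(sample, x_min=1.0)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.3f}")
```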
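Let $(X_n, n \ge 1)$ be a sequence of independent random variables with common distribution function $F \in D(H(\xi))$, the maximum domain of attraction[14] of the generalized extreme value distribution $H$, where $\xi \in \mathbb{R}$. If $k(n) \to \infty$ and $k(n)/n \to 0$ as $n \to \infty$, then the Pickands tail-index estimator is[6][14]
$$\xi^{\text{Pickands}}_{(k(n),n)} = \frac{1}{\ln 2}\,\ln\!\left(\frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right),$$
where $X_{(i,n)}$ denotes the $i$-th smallest of $X_1,\ldots,X_n$, so that $X_{(n-k(n)+1,n)}$ is the $k(n)$-th largest observation. This estimator converges in probability to $\xi$.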
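The Pickands estimator can be sketched in a few lines of Python. It uses only the k-th, 2k-th and 4k-th largest observations. The function name, the simulated Pareto sample (shape 2, so that ξ = 1/2), the choice k = 2000 and the seed are assumptions made for illustration; in practice the choice of k is a delicate trade-off between bias and variance.

```python
import math
import random

def pickands_estimator(sample, k):
    """Pickands estimate of xi from the k-th, 2k-th and 4k-th largest values."""
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("k must satisfy 1 <= k and 4 * k <= len(sample)")
    ordered = sorted(sample)       # ascending: ordered[n - j] is the j-th largest
    top_k = ordered[n - k]
    top_2k = ordered[n - 2 * k]
    top_4k = ordered[n - 4 * k]
    return math.log((top_k - top_2k) / (top_2k - top_4k)) / math.log(2.0)

# Simulated Pareto data with shape 2, for which xi = 1 / 2.
rng = random.Random(2)
sample = [rng.paretovariate(2.0) for _ in range(100_000)]

xi_pickands = pickands_estimator(sample, k=2000)
print(f"Pickands estimate of xi: {xi_pickands:.3f} (true value 0.5)")
```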
Let $(X_t, t \ge 1)$ be a sequence of independent and identically distributed random variables with distribution function $F \in D(H(\xi))$, the maximum domain of attraction of the generalized extreme value distribution $H$, where $\xi > 0$. The sample path is $\{X_t : 1 \le t \le n\}$ where $n$ is the sample size. If $\{k(n)\}$ is an intermediate order sequence, i.e. $k(n) \in \{1,\ldots,n-1\}$, $k(n) \to \infty$ and $k(n)/n \to 0$, then the Hill tail-index estimator is[15]
$$\xi^{\text{Hill}}_{(k(n),n)} = \frac{1}{k(n)} \sum_{i=n-k(n)+1}^{n} \ln X_{(i,n)} - \ln X_{(n-k(n)+1,n)},$$
where $X_{(i,n)}$ is the $i$-th order statistic of $X_1,\ldots,X_n$. This estimator converges in probability to $\xi$ (its reciprocal estimates the power-law exponent $\alpha = 1/\xi$ of the tail), and is asymptotically normal provided $k(n) \to \infty$ is restricted based on a higher order regular variation property.[16][17] Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences,[18][19] irrespective of whether $X_t$ is observed, or a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with errors that are dependent.[20][21][22] Note that both Pickands's and Hill's tail-index estimators commonly make use of the logarithm of the order statistics.[23]
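A minimal Python sketch of the Hill estimator follows. It averages the logarithms of the k largest observations and subtracts the logarithm of the k-th largest one, which estimates ξ; the reciprocal then estimates the power-law exponent α = 1/ξ. Many texts use the (k+1)-th largest observation as the threshold instead, and the two versions differ only by a factor (k − 1)/k. The function name, the simulated Pareto sample (shape 2), k = 2000 and the seed are illustrative assumptions.

```python
import math
import random

def hill_estimator(sample, k):
    """Hill estimate of xi = 1 / alpha based on the k largest observations."""
    n = len(sample)
    if k < 1 or k >= n:
        raise ValueError("k must satisfy 1 <= k < len(sample)")
    ordered = sorted(sample)       # ascending order statistics
    threshold = ordered[n - k]     # k-th largest observation
    if threshold <= 0:
        raise ValueError("the k largest observations must be positive")
    top = ordered[n - k:]          # the k largest observations
    return sum(math.log(x) for x in top) / k - math.log(threshold)

# Simulated Pareto data with shape alpha = 2, for which xi = 1 / 2.
rng = random.Random(3)
sample = [rng.paretovariate(2.0) for _ in range(50_000)]

xi_hill = hill_estimator(sample, k=2000)
alpha_hill = 1.0 / xi_hill
print(f"Hill estimate of xi: {xi_hill:.3f}, implied alpha: {alpha_hill:.3f}")
```

Plotting the estimate against a range of k values (a "Hill plot") is a common way to judge how sensitive the result is to the choice of k.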
The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith.[24] It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".
A comparison of Hill-type and RE-type estimators can be found in Novak.[13]
Nonparametric approaches to estimate heavy- and superheavy-tailed probability density functions were given in Markovich.[26] These are approaches based on variable bandwidth and long-tailed kernel estimators; on a preliminary transform of the data to a new random variable on a finite or infinite interval that is more convenient for estimation, followed by the inverse transform of the resulting density estimate; and the "piecing-together approach", which provides a parametric model for the tail of the density and a non-parametric model to approximate the mode of the density. Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of a kernel estimator or the bin width of a histogram. Well-known data-driven methods for this selection are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotic form, or their upper bounds.[27] A discrepancy method is described in Markovich.[26] It uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of the latter statistics as a known uncertainty or discrepancy value. The bootstrap is another tool for finding smoothing parameters: it approximates the unknown MSE using different resampling schemes.[28]