


In statistics, a multimodal distribution is a probability distribution with more than one mode (i.e., more than one local peak of the distribution). These appear as distinct peaks (local maxima) in the probability density function, as shown in Figures 1 and 2. Categorical, continuous, and discrete data can all form multimodal distributions. Among univariate analyses, multimodal distributions are commonly bimodal.[citation needed]
When the two modes are unequal, the larger mode is known as the major mode and the other as the minor mode. The least frequent value between the modes is known as the antimode. The difference between the major and minor modes is known as the amplitude. In time series the major mode is called the acrophase and the antimode the batiphase.[citation needed]
Galtung introduced a classification system (AJUS) for distributions:[1]
This classification has since been modified slightly:
Under this classification bimodal distributions are classified as type S or U.
Bimodal distributions occur both in mathematics and in the natural sciences.
Important bimodal distributions include the arcsine distribution and the beta distribution (iff both parameters a and b are less than 1). Others include the U-quadratic distribution.
The ratio of two normal distributions is also bimodally distributed. Let
$$R = \frac{a + x}{b + y}$$
where a and b are constant and x and y are distributed as normal variables with a mean of 0 and a standard deviation of 1. R has a known density that can be expressed as a confluent hypergeometric function.[2]
The distribution of the reciprocal of a t-distributed random variable is bimodal when the degrees of freedom are more than one. Similarly, the reciprocal of a normally distributed variable is also bimodally distributed.
A t statistic generated from a data set drawn from a Cauchy distribution is bimodal.[3]
Examples of variables with bimodal distributions include the time between eruptions of certain geysers, the color of galaxies, the size of worker weaver ants, the age of incidence of Hodgkin's lymphoma, the speed of inactivation of the drug isoniazid in US adults, the absolute magnitude of novae, and the circadian activity patterns of those crepuscular animals that are active both in morning and evening twilight. In fishery science, multimodal length distributions reflect the different year classes and can thus be used for age distribution and growth estimates of the fish population.[4] Sediments are usually distributed in a bimodal fashion. When sampling mining galleries crossing both the host rock and the mineralized veins, the distribution of geochemical variables would be bimodal. Bimodal distributions are also seen in traffic analysis, where traffic peaks during the AM rush hour and then again during the PM rush hour. This phenomenon is also seen in daily water distribution, as water demand, in the form of showers, cooking, and toilet use, generally peaks in the morning and evening periods. Some genes in bacteria have also exhibited bimodal distributions of gene expression in both normal and stress conditions.[5]
In econometric models, the parameters may be bimodally distributed.[6]
A bimodal distribution commonly arises as a mixture of two different unimodal distributions (i.e. distributions having only one mode). In other words, the bimodally distributed random variable X is defined as Y with probability α or Z with probability (1 − α), where Y and Z are unimodal random variables and 0 < α < 1 is a mixture coefficient.
Mixtures with two distinct components need not be bimodal and two component mixtures of unimodal component densities can have more than two modes. There is no immediate connection between the number of components in a mixture and the number of modes of the resulting density.
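The mixture construction can be illustrated with a short Python sketch. It draws from one of two unimodal components according to a mixture coefficient; the function name `sample_mixture`, the two normal components and the value 0.3 are illustrative assumptions, not taken from any cited source.

```python
import random


def sample_mixture(alpha, draw_y, draw_z, size, rng):
    """Draw `size` values: Y with probability alpha, otherwise Z."""
    return [draw_y(rng) if rng.random() < alpha else draw_z(rng)
            for _ in range(size)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample = sample_mixture(
    alpha=0.3,
    draw_y=lambda r: r.gauss(-3.0, 1.0),  # Y ~ N(-3, 1), illustrative choice
    draw_z=lambda r: r.gauss(3.0, 1.0),   # Z ~ N(3, 1), illustrative choice
    size=10_000,
    rng=rng,
)

# Fraction of draws lying on the side of the Y component.
share_near_y = sum(1 for x in sample if x < 0) / len(sample)
```

With components this far apart, the share of values near the first component should be close to the mixture coefficient, and a histogram of `sample` would show two peaks of unequal height.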
Bimodal distributions, despite their frequent occurrence in data sets, have only rarely been studied[citation needed]. This may be because of the difficulties in estimating their parameters either with frequentist or Bayesian methods. Among those that have been studied are
Bimodality also naturally arises in the cusp catastrophe distribution.
In biology, several factors are known to contribute to bimodal distributions of population sizes[citation needed]:
The bimodal distribution of sizes of weaver ant workers arises from the existence of two distinct classes of workers, namely major workers and minor workers.[11]
The distribution of fitness effects of mutations for both whole genomes[12][13] and individual genes[14] is also frequently found to be bimodal, with most mutations being either neutral or lethal and relatively few having an intermediate effect.
A mixture of two unimodal distributions with differing means is not necessarily bimodal. The combined distribution of heights of men and women is sometimes used as an example of a bimodal distribution, but in fact the difference in mean heights of men and women is too small relative to their standard deviations to produce bimodality when the two distribution curves are combined.[15]
Bimodal distributions have the peculiar property that – unlike the unimodal distributions – the mean may be a more robust sample estimator than the median.[16] This is clearly the case when the distribution is U-shaped like the arcsine distribution. It may not be true when the distribution has one or more long tails.
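Let
$$f(x) = p\,g_1(x) + (1 - p)\,g_2(x)$$
where g_i is a probability distribution and p is the mixing parameter.
The moments of f(x) are[17]
$$\mu = p\mu_1 + (1 - p)\mu_2$$
$$\nu_2 = p\left[\sigma_1^2 + \delta_1^2\right] + (1 - p)\left[\sigma_2^2 + \delta_2^2\right]$$
$$\nu_3 = p\left[S_1\sigma_1^3 + 3\delta_1\sigma_1^2 + \delta_1^3\right] + (1 - p)\left[S_2\sigma_2^3 + 3\delta_2\sigma_2^2 + \delta_2^3\right]$$
$$\nu_4 = p\left[K_1\sigma_1^4 + 4S_1\delta_1\sigma_1^3 + 6\delta_1^2\sigma_1^2 + \delta_1^4\right] + (1 - p)\left[K_2\sigma_2^4 + 4S_2\delta_2\sigma_2^3 + 6\delta_2^2\sigma_2^2 + \delta_2^4\right]$$
where
$$\mu = \int x f(x)\,dx, \qquad \delta_i = \mu_i - \mu, \qquad \nu_r = \int (x - \mu)^r f(x)\,dx$$
μ_i and σ_i are the mean and standard deviation of the i-th distribution, and S_i and K_i are the skewness and kurtosis of the i-th distribution.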
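The moments of a two-component mixture can be computed directly from the component summaries. The following Python sketch expands each component's central moments about the mixture mean; the function name `mixture_central_moments` and the tuple layout are hypothetical, and the kurtosis is assumed to be the non-excess value (3 for a normal distribution).

```python
def mixture_central_moments(p, comp1, comp2):
    """Mean and 2nd to 4th central moments of p*g1 + (1 - p)*g2.

    Each component is a tuple (mean, sd, skewness, kurtosis), with kurtosis
    given as the standardised fourth moment (not the excess kurtosis).
    """
    weights = (p, 1.0 - p)
    comps = (comp1, comp2)
    mu = sum(w * c[0] for w, c in zip(weights, comps))
    nu2 = nu3 = nu4 = 0.0
    for w, (m, s, skew, kurt) in zip(weights, comps):
        d = m - mu  # offset of the component mean from the mixture mean
        nu2 += w * (s**2 + d**2)
        nu3 += w * (skew * s**3 + 3 * d * s**2 + d**3)
        nu4 += w * (kurt * s**4 + 4 * skew * d * s**3
                    + 6 * d**2 * s**2 + d**4)
    return mu, nu2, nu3, nu4


# Equal mixture of N(-2, 1) and N(2, 1).
mu, nu2, nu3, nu4 = mixture_central_moments(0.5, (-2, 1, 0, 3), (2, 1, 0, 3))
kurtosis = nu4 / nu2**2  # standardised fourth moment of the mixture
```

For this symmetric example the mixture kurtosis falls below 3, the value for a single normal distribution.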
It is not uncommon to encounter situations where an investigator believes that the data comes from a mixture of two normal distributions. Because of this, this mixture has been studied in some detail.[18]
A mixture of two normal distributions has five parameters to estimate: the two means, the two variances and the mixing parameter. A mixture of two normal distributions with equal standard deviations is bimodal only if their means differ by at least twice the common standard deviation.[15] Estimation of the parameters is simplified if the variances can be assumed to be equal (the homoscedastic case).
If the means of the two normal distributions are equal, then the combined distribution is unimodal. Conditions for unimodality of the combined distribution were derived by Eisenberger.[19] Necessary and sufficient conditions for a mixture of normal distributions to be bimodal have been identified by Ray and Lindsay.[20]
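The dependence of modality on the separation of the means can be checked numerically. The sketch below is a simple grid search, not one of the analytical conditions cited above: it counts local maxima of a two-normal mixture density. The function names and the grid size are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def count_modes(p, mu1, sigma1, mu2, sigma2, n_grid=20001):
    """Count local maxima of p*N(mu1, sigma1) + (1 - p)*N(mu2, sigma2) on a grid."""
    lo = min(mu1 - 5 * sigma1, mu2 - 5 * sigma2)
    hi = max(mu1 + 5 * sigma1, mu2 + 5 * sigma2)
    step = (hi - lo) / (n_grid - 1)
    dens = []
    for i in range(n_grid):
        x = lo + i * step
        dens.append(p * normal_pdf(x, mu1, sigma1)
                    + (1 - p) * normal_pdf(x, mu2, sigma2))
    # A grid point is a mode if it rises from the left and does not rise to the right.
    return sum(1 for i in range(1, n_grid - 1)
               if dens[i - 1] < dens[i] >= dens[i + 1])


modes_close = count_modes(0.5, -0.5, 1.0, 0.5, 1.0)  # means 1 sd apart
modes_far = count_modes(0.5, -1.5, 1.0, 1.5, 1.0)    # means 3 sd apart
```

With equal weights and equal standard deviations, means one standard deviation apart give a single mode, while means three standard deviations apart give two.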
A mixture of two approximately equal mass normal distributions has a negative excess kurtosis, since the two modes on either side of the center of mass effectively reduce the tails of the distribution.
A mixture of two normal distributions with highly unequal mass has a positive excess kurtosis, since the smaller distribution lengthens the tail of the more dominant normal distribution.
Mixtures of other distributions require additional parameters to be estimated.
Bimodal distributions are a commonly used example of how summary statistics such as the mean, median, and standard deviation can be deceptive when used on an arbitrary distribution. For example, in the distribution in Figure 1, the mean and median would be about zero, even though zero is not a typical value. The standard deviation is also larger than the standard deviation of each normal distribution.
Although several have been suggested, there is presently no generally agreed summary statistic (or set of statistics) to quantify the parameters of a general bimodal distribution. For a mixture of two normal distributions the means and standard deviations along with the mixing parameter (the weight for the combination) are usually used, a total of five parameters.
A statistic that may be useful is Ashman's D:[23]
$$D = \frac{\sqrt{2}\,\left|\mu_1 - \mu_2\right|}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$
where μ1, μ2 are the means and σ1, σ2 are the standard deviations.
For a mixture of two normal distributions, D > 2 is required for a clean separation of the distributions.
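As a minimal sketch, Ashman's D can be computed from fitted component parameters. The code assumes the form D = √2 |μ1 − μ2| / √(σ1² + σ2²), which is the form usually quoted for this statistic; the function name `ashman_d` is hypothetical.

```python
import math


def ashman_d(mu1, sigma1, mu2, sigma2):
    """Ashman's D, assuming D = sqrt(2) * |mu1 - mu2| / sqrt(sigma1^2 + sigma2^2)."""
    return math.sqrt(2.0) * abs(mu1 - mu2) / math.sqrt(sigma1**2 + sigma2**2)


d_separated = ashman_d(0.0, 1.0, 3.0, 1.0)   # means 3 sd apart
d_overlapping = ashman_d(0.0, 1.0, 1.0, 1.0)  # means 1 sd apart
```

With equal standard deviations, D under this form reduces to the separation of the means in units of the common standard deviation, so the first case exceeds the threshold of 2 and the second does not.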
Van der Eijk's A is a weighted average of the degree of agreement in the frequency distribution.[24] A ranges from −1 (perfect bimodality) to +1 (perfect unimodality). It is defined as
$$A = U\left(1 - \frac{S - 1}{K - 1}\right)$$
where U is the unimodality of the distribution, S the number of categories that have nonzero frequencies and K the total number of categories.
The value of U is 1 if the distribution has any of the three following characteristics:
With distributions other than these the data must be divided into 'layers'. Within a layer the responses are either equal or zero. The categories do not have to be contiguous. A value for A for each layer (Ai) is calculated and a weighted average for the distribution is determined. The weights (wi) for each layer are the number of responses in that layer. In symbols
A uniform distribution has A = 0; when all the responses fall into one category, A = +1.
One theoretical problem with this index is that it assumes that the intervals are equally spaced. This may limit its applicability.
This index assumes that the distribution is a mixture of two normal distributions with means (μ1 and μ2) and standard deviations (σ1 and σ2):[25]
Sarle's bimodality coefficient b is[26]
$$b = \frac{\gamma^2 + 1}{\kappa}$$
where γ is the skewness and κ is the kurtosis. The kurtosis is here defined to be the standardised fourth moment around the mean. The value of b lies between 0 and 1.[27] The logic behind this coefficient is that a bimodal distribution with light tails will have very low kurtosis, an asymmetric character, or both, all of which increase this coefficient.
The formula for a finite sample is[28]
$$b = \frac{g^2 + 1}{k + \dfrac{3(n - 1)^2}{(n - 2)(n - 3)}}$$
where n is the number of items in the sample, g is the sample skewness and k is the sample excess kurtosis.
The value of b for the uniform distribution is 5/9. This is also its value for the exponential distribution. Values greater than 5/9 may indicate a bimodal or multimodal distribution, though corresponding values can also result for heavily skewed unimodal distributions.[29] The maximum value (1.0) is reached only by a Bernoulli distribution with only two distinct values or the sum of two different Dirac delta functions (a bi-delta distribution).
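The benchmark values quoted for Sarle's coefficient can be reproduced with a few lines of Python. The sketch assumes the population form b = (γ² + 1)/κ and, for samples, the bias-corrected skewness and excess kurtosis estimators commonly used in statistical packages. Both function names are hypothetical.

```python
import math


def bimodality_coefficient(skewness, kurtosis):
    """Population form: (skewness^2 + 1) / kurtosis, with non-excess kurtosis."""
    return (skewness**2 + 1.0) / kurtosis


def sample_bimodality_coefficient(data):
    """Finite-sample form using bias-corrected skewness g and excess kurtosis k."""
    n = len(data)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    # Assumed estimator convention: the usual bias-corrected g and k.
    g = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2**1.5
    k = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2**2 - 3.0) + 6.0)
    return (g**2 + 1.0) / (k + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


b_uniform = bimodality_coefficient(0.0, 1.8)      # uniform: skewness 0, kurtosis 9/5
b_exponential = bimodality_coefficient(2.0, 9.0)  # exponential: skewness 2, kurtosis 9
b_two_point = sample_bimodality_coefficient([0.0] * 50 + [1.0] * 50)
```

Both population examples evaluate to 5/9, which shows why a value near 5/9 cannot by itself distinguish a flat distribution from a skewed unimodal one. The two-point sample gives a value well above 5/9.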
The distribution of this statistic is unknown. It is related to a statistic proposed earlier by Pearson – the difference between the kurtosis and the square of the skewness (vide infra).
The bimodality amplitude is defined as[25]
$$A_B = \frac{A_1 - A_{an}}{A_1}$$
where A1 is the amplitude of the smaller peak and Aan is the amplitude of the antimode.
AB is always < 1. Larger values indicate more distinct peaks.
This is the ratio of the left and right peaks.[25] Mathematically
where Al and Ar are the amplitudes of the left and right peaks respectively.
This parameter (B) is due to Wilcock.[30]
where Al and Ar are the amplitudes of the left and right peaks respectively and Pi is the logarithm taken to the base 2 of the proportion of the distribution in the ith interval. The maximal value of ΣP is 1 but the value of B may be greater than this.
To use this index, the log of the values is taken. The data are then divided into intervals of width Φ whose value is log 2. The width of the peaks is taken to be four times 1/4Φ centered on their maximum values.
The bimodality index proposed by Wang et al. assumes that the distribution is a sum of two normal distributions with equal variances but differing means.[31] It is defined as follows:
$$\delta = \frac{\left|\mu_1 - \mu_2\right|}{\sigma}$$
where μ1, μ2 are the means and σ is the common standard deviation.
$$BI = \delta\sqrt{p(1 - p)}$$
where p is the mixing parameter.
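A minimal sketch of this index in Python follows. It assumes the standardised distance δ = |μ1 − μ2|/σ and the index δ√(p(1 − p)); the function name `bimodality_index` is hypothetical.

```python
import math


def bimodality_index(mu1, mu2, sigma, p):
    """Standardised distance between the means, scaled by sqrt(p * (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("the mixing parameter must lie strictly between 0 and 1")
    delta = abs(mu1 - mu2) / sigma
    return delta * math.sqrt(p * (1.0 - p))


bi_balanced = bimodality_index(0.0, 4.0, 1.0, 0.5)    # equal-sized components
bi_unbalanced = bimodality_index(0.0, 4.0, 1.0, 0.1)  # one small component
```

For a fixed separation of the means, the index is largest when the two components have equal weight and shrinks as the mixture becomes unbalanced.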
A different bimodality index has been proposed by Sturrock.[32]
This index (B) is defined as
When m = 2 and γ is uniformly distributed, B is exponentially distributed.[33]
This statistic is a form of periodogram. It suffers from the usual problems of estimation and spectral leakage common to this form of statistic.
Another bimodality index has been proposed by de Michele and Accatino.[34] Their index (B) is
where μ is the arithmetic mean of the sample and
where mi is the number of data points in the ith bin, xi is the center of the ith bin and L is the number of bins.
The authors suggested a cut-off value of 0.1 for B to distinguish between a bimodal (B > 0.1) and unimodal (B < 0.1) distribution. No statistical justification was offered for this value.
A further index (B) has been proposed by Sambrook Smith et al.[35]
where p1 and p2 are the proportions contained in the primary (that with the greater amplitude) and secondary (that with the lesser amplitude) modes and φ1 and φ2 are the φ-sizes of the primary and secondary modes. The φ-size is defined as minus one times the log of the data size taken to the base 2. This transformation is commonly used in the study of sediments.
The authors recommended a cut off value of 1.5 with B being greater than 1.5 for a bimodal distribution and less than 1.5 for a unimodal distribution. No statistical justification for this value was given.
Otsu's method for finding a threshold for separation between two modes relies on minimizing the quantity
$$\frac{n_1\sigma_1^2 + n_2\sigma_2^2}{m\sigma^2}$$
where ni is the number of data points in the ith subpopulation, σi² is the variance of the ith subpopulation, m is the total size of the sample and σ² is the sample variance. Some researchers (particularly in the field of digital image processing) have applied this quantity more broadly as an index for detecting bimodality, with a small value indicating a more bimodal distribution.[36]
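For one-dimensional data, the threshold search can be sketched as an exhaustive scan over split points. The code below assumes the minimised quantity is the within-group sum of squares divided by the total sum of squares (that is, variances with divisor n rather than n − 1). The function names are hypothetical, and the quadratic-time scan is chosen for clarity rather than speed.

```python
def sum_of_squares(values):
    """Sum of squared deviations from the mean (n times the variance)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def otsu_threshold(data):
    """Return (threshold, ratio) minimising within-group over total sum of squares."""
    xs = sorted(data)
    m = len(xs)
    total = sum_of_squares(xs)
    if m < 2 or total == 0:
        raise ValueError("need at least two distinct values")
    best = None
    for cut in range(1, m):
        if xs[cut] == xs[cut - 1]:
            continue  # no threshold can separate equal values
        ratio = (sum_of_squares(xs[:cut]) + sum_of_squares(xs[cut:])) / total
        if best is None or ratio < best[1]:
            # Place the threshold midway between the neighbouring observations.
            best = ((xs[cut - 1] + xs[cut]) / 2.0, ratio)
    return best


threshold, ratio = otsu_threshold([1.0, 1.1, 0.9, 5.0, 5.1, 4.9])
```

For two tight, well-separated clusters the threshold falls between them and the ratio is close to zero, in line with the interpretation that small values indicate a more bimodal sample.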
A number of tests are available to determine if a data set is distributed in a bimodal (or multimodal) fashion.
In the study of sediments, particle size is frequently bimodal. Empirically, it has been found useful to plot the frequency against the log(size) of the particles.[37][38] This usually gives a clear separation of the particles into a bimodal distribution. In geological applications the logarithm is normally taken to the base 2. The log-transformed values are referred to as phi (Φ) units. This system is known as the Krumbein (or phi) scale.
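The transformation to phi units is a one-line calculation. The sketch below uses the sign convention of minus the base-2 logarithm and assumes grain diameters are expressed in millimetres, so that a 1 mm grain has φ = 0. The function name `phi_size` is hypothetical.

```python
import math


def phi_size(diameter_mm):
    """Phi value: minus the base-2 logarithm of the grain diameter in millimetres."""
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    return -math.log2(diameter_mm)


phi_values = [phi_size(d) for d in (2.0, 1.0, 0.25)]
```

Under this convention, coarser grains have smaller (more negative) phi values and each halving of the diameter adds one phi unit.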
An alternative method is to plot the log of the particle size against the cumulative frequency. This graph will usually consist of two reasonably straight lines with a connecting line corresponding to the antimode.
Approximate values for several statistics can be derived from the graphic plots.[37]
where φx is the value of the variate φ at the xth percentage of the distribution.
Pearson in 1894 was the first to devise a procedure to test whether a distribution could be resolved into two normal distributions.[39] This method required the solution of a ninth-order polynomial. In a subsequent paper Pearson reported that for any distribution skewness² + 1 < kurtosis.[27] Later Pearson showed that[40]
$$b_2 - b_1 \geq 1$$
where b2 is the kurtosis and b1 is the square of the skewness. Equality holds only for the two-point Bernoulli distribution or the sum of two different Dirac delta functions. These are the most extreme cases of bimodality possible. The kurtosis in both these cases is 1. Since they are both symmetrical their skewness is 0 and the difference is 1.
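Pearson's bound on the difference between the kurtosis and the squared skewness can be checked for any discrete distribution with a short Python sketch. The function name `pearson_difference` and the example distributions are illustrative assumptions.

```python
def pearson_difference(values, probs):
    """Kurtosis minus squared skewness (b2 - b1) of a discrete distribution."""
    mean = sum(p * v for v, p in zip(values, probs))
    m2 = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    m3 = sum(p * (v - mean) ** 3 for v, p in zip(values, probs))
    m4 = sum(p * (v - mean) ** 4 for v, p in zip(values, probs))
    b1 = m3**2 / m2**3  # squared skewness
    b2 = m4 / m2**2     # kurtosis (standardised fourth moment)
    return b2 - b1


diff_symmetric_two_point = pearson_difference([0, 1], [0.5, 0.5])
diff_skewed_two_point = pearson_difference([0, 1], [0.8, 0.2])
diff_three_point = pearson_difference([0, 1, 2], [0.25, 0.5, 0.25])
```

Both two-point examples attain the lower bound of 1, whether or not the two probabilities are equal, while the three-point example lies strictly above it.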
Baker proposed a transformation to convert a bimodal to a unimodal distribution.[41]
Several tests of unimodality versus bimodality have been proposed: Haldane suggested one based on second central differences.[42] Larkin later introduced a test based on the F test;[43] Benett created one based on Fisher's G test.[44] Tokeshi has proposed a fourth test.[45][46] A test based on a likelihood ratio has been proposed by Holzmann and Vollmer.[21]
A method based on the score and Wald tests has been proposed.[47] This method can distinguish between unimodal and bimodal distributions when the underlying distributions are known.
Statistical tests for the antimode are known.[48]
Otsu's method is commonly employed in computer graphics to determine the optimal separation between two distributions.
To test if a distribution is other than unimodal, several additional tests have been devised: the bandwidth test,[49] the dip test,[50] the excess mass test,[51] the MAP test,[52] the mode existence test,[53] the runt test,[54][55] the span test,[56] and the saddle test.
An implementation of the dip test is available for the R programming language.[57] The p-values for the dip statistic range between 0 and 1. P-values less than 0.05 indicate significant multimodality and p-values greater than 0.05 but less than 0.10 suggest multimodality with marginal significance.[58]
Silverman introduced a bootstrap method for the number of modes.[49] The test uses a fixed bandwidth, which reduces the power of the test and its interpretability. Undersmoothed densities may have an excessive number of modes whose count during bootstrapping is unstable.
Bajgier and Aggarwal have proposed a test based on the kurtosis of the distribution.[59]
Additional tests are available for a number of special cases:
A study of a mixture density of two normal distributions found that separation into the two normal distributions was difficult unless the means were separated by 4–6 standard deviations.[60]
In astronomy the Kernel Mean Matching algorithm is used to decide if a data set belongs to a single normal distribution or to a mixture of two normal distributions.
The beta distribution is bimodal for certain values of its parameters. A test for these values has been described.[61]

Assuming that the distribution is known to be bimodal or has been shown to be bimodal by one or more of the tests above, it is frequently desirable to fit a curve to the data. This may be difficult.
Bayesian methods may be useful in difficult cases.
A package for R is available for testing for bimodality.[62] This package assumes that the data are distributed as a sum of two normal distributions. If this assumption is not correct the results may not be reliable. It also includes functions for fitting a sum of two normal distributions to the data.
Assuming that the distribution is a mixture of two normal distributions then the expectation-maximization algorithm may be used to determine the parameters. Several programmes are available for this including Cluster,[63] and the R package nor1mix.[64]
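The expectation-maximization iteration for a two-normal mixture can be sketched in plain Python. This is a minimal illustration under simplifying assumptions (starting values from a median split, a fixed number of iterations, small floors to avoid division by zero). It does not reproduce the algorithm of any of the packages named above, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=100):
    """Fit p*N(mu1, s1) + (1 - p)*N(mu2, s2); returns (p, mu1, s1, mu2, s2)."""
    xs = sorted(data)
    n = len(xs)
    if n < 4:
        raise ValueError("at least four observations are required")
    half = n // 2
    # Crude starting values: split the sorted data at the median.
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (n - half)
    mean = sum(xs) / n
    s1 = s2 = max(math.sqrt(sum((x - mean) ** 2 for x in xs) / n), 1e-6)
    p = 0.5
    for _ in range(n_iter):
        # E-step: probability that each point belongs to component 1.
        resp = []
        for x in xs:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, mu2, s2)
            resp.append(a / max(a + b, 1e-300))
        # M-step: update weighted means, standard deviations and mixing parameter.
        w1 = max(sum(resp), 1e-12)
        w2 = max(n - w1, 1e-12)
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu2 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w2
        s1 = max(math.sqrt(sum(r * (x - mu1) ** 2
                               for r, x in zip(resp, xs)) / w1), 1e-6)
        s2 = max(math.sqrt(sum((1.0 - r) * (x - mu2) ** 2
                               for r, x in zip(resp, xs)) / w2), 1e-6)
        p = w1 / n
    return p, mu1, s1, mu2, s2
```

In practice the result depends on the starting values and on how well the components are separated, so a dedicated package with multiple restarts and convergence checks is usually preferable to a hand-written loop like this one.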
The mixtools package available for R can test for and estimate the parameters of a number of different distributions.[65] A package for a mixture of two right-tailed gamma distributions is available.[66]
Several other packages for R are available to fit mixture models; these include flexmix,[67] mcclust,[68] agrmt,[69] and mixdist.[70]
The statistical programming language SAS can also fit a variety of mixed distributions with the PROC FMM procedure.
In Python, the package Scikit-learn contains a tool for mixture modeling.[71]