In statistics, the 68–95–99.7 rule, also known as the empirical rule, and sometimes abbreviated 3SR or 3σ, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.
In mathematical notation, these facts can be expressed as follows, where Pr() is the probability function,[1] X is an observation from a normally distributed random variable, μ (mu) is the mean of the distribution, and σ (sigma) is its standard deviation:
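The formulas this sentence introduces are the standard ones (written here in LaTeX, with the percentages rounded to two decimal places):

```latex
\Pr(\mu - 1\sigma \le X \le \mu + 1\sigma) \approx 68.27\%
\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 95.45\%
\Pr(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 99.73\%
```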
The usefulness of this heuristic depends especially on the question under consideration and the manner in which the data have been collected; most particularly, the heuristic depends on the data genuinely being normally distributed: among the many bell-shaped distributions often seen in real-life data, the normal distribution has notoriously "thin tails" – an unusual concentration of probability near its center. If the datum X is instead governed by one of the many similar-appearing and commonly encountered distributions that have "fatter tails" – with probability more spread out – the significance would be lower for all three deviations from the mean.
In the empirical sciences, the so-called three-sigma rule of thumb (or 3σ rule) expresses a conventional heuristic that nearly all values are taken to lie within three standard deviations of the mean, and thus it is empirically useful to treat 99.7% probability as near certainty.[2]
In the social sciences, a result may be considered statistically significant (clear enough to warrant closer examination) if its confidence level is of the order of a two-sigma effect (95%), while in particle physics, there is a convention of requiring statistical significance of a five-sigma effect (99.99994% confidence) to qualify as a discovery.[3]
A weaker three-sigma rule can be derived from Chebyshev's inequality, stating that even for non-normally distributed variables, at least 88.8% of cases should fall within properly calculated three-sigma intervals. For unimodal distributions, the probability of being within three sigma is at least 95% by the Vysochanskij–Petunin inequality. There may be certain assumptions for a distribution that force this probability to be at least 98%.[4]
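As a short worked check of the two bounds quoted here, plugging k = 3 into the respective inequalities gives:

```latex
\text{Chebyshev:}\quad \Pr(|X - \mu| \le 3\sigma) \ge 1 - \frac{1}{3^2} = \frac{8}{9} \approx 88.9\%
\text{Vysochanskij–Petunin (unimodal):}\quad \Pr(|X - \mu| \le 3\sigma) \ge 1 - \frac{4}{9 \cdot 3^2} = \frac{77}{81} \approx 95.1\%
```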
We have that

Pr(μ − nσ ≤ X ≤ μ + nσ) = ∫_{μ−nσ}^{μ+nσ} (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx.

Doing the change of variable in terms of the standard score z = (x − μ)/σ, we have

Pr(μ − nσ ≤ X ≤ μ + nσ) = ∫_{−n}^{n} (1/√(2π)) e^{−z²/2} dz,

and this integral is independent of μ and σ. We only need to calculate each integral for the cases n = 1, 2, 3.
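As a numerical check of this derivation, the integral of the standard normal density from −n to n equals erf(n/√2); a minimal Python sketch (illustrative, not part of the original article) reproduces the three percentages:

```python
from math import erf, sqrt

# The integral of the standard normal density from -n to n equals erf(n / sqrt(2)).
for n in (1, 2, 3):
    p = erf(n / sqrt(2))
    print(f"Pr(mu - {n}sigma <= X <= mu + {n}sigma) ≈ {p:.6f}")
# prints ≈ 0.682689, 0.954500, 0.997300
```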

These numerical values "68%, 95%, 99.7%" come from the cumulative distribution function of the normal distribution.
The prediction interval for any standard score z corresponds numerically to (1 − (1 − Φ_{μ,σ²}(z)) · 2).
For example, Φ(2) ≈ 0.9772, or Pr(X ≤ μ + 2σ) ≈ 0.9772, corresponding to a prediction interval of (1 − (1 − 0.97725) · 2) = 0.9545 = 95.45%. This is not a symmetrical interval – this is merely the probability that an observation is less than μ + 2σ. To compute the probability that an observation is within two standard deviations of the mean (small differences due to rounding):
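The computation this colon introduces is the usual two-sided one (values rounded to four decimal places):

```latex
\Pr(\mu - 2\sigma \le X \le \mu + 2\sigma) = \Phi(2) - \Phi(-2) \approx 0.9772 - 0.0228 = 0.9545
```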
This is related to confidence interval as used in statistics: X̄ ± 2σ/√n is approximately a 95% confidence interval when X̄ is the average of a sample of size n.
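A minimal sketch of this two-sigma confidence interval, assuming a hypothetical sample and a known population standard deviation:

```python
from math import sqrt
from statistics import mean

# Hypothetical sample; sigma is an assumed, known population standard deviation.
sample = [4.8, 5.1, 4.9, 5.3, 5.0, 4.7, 5.2, 5.1]
sigma = 0.2
n = len(sample)
x_bar = mean(sample)

# Two-sigma rule: x_bar ± 2*sigma/sqrt(n) is roughly a 95% confidence interval for the mean.
half_width = 2 * sigma / sqrt(n)
print(f"approx. 95% CI: ({x_bar - half_width:.3f}, {x_bar + half_width:.3f})")
```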
The "68–95–99.7 rule" is often used to quickly get a rough probability estimate of something, given its standard deviation, if the population is assumed to be normal. It is also used as a simple test foroutliers if the population is assumed normal, and as anormality test if the population is potentially not normal.
To pass from a sample to a number of standard deviations, one first computes the deviation, either the error or residual depending on whether one knows the population mean or only estimates it. The next step is standardizing (dividing by the population standard deviation), if the population parameters are known, or studentizing (dividing by an estimate of the standard deviation), if the parameters are unknown and only estimated.
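A rough Python sketch of this distinction (the sample data and parameter values are hypothetical, and this is not the exact studentized-residual formula, which adjusts the denominator per observation):

```python
from statistics import mean, stdev

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 14.9]  # hypothetical measurements

def standardize(x, mu, sigma):
    """Number of standard deviations from a *known* population mean and sigma."""
    return (x - mu) / sigma

def studentize(x, sample):
    """Number of *estimated* standard deviations from the sample mean,
    used when the population parameters are unknown."""
    return (x - mean(sample)) / stdev(sample)

print(standardize(14.9, mu=12.0, sigma=0.25))  # uses known population parameters
print(studentize(14.9, sample))                # uses estimates from the sample itself
```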
To use as a test for outliers or a normality test, one computes the size of deviations in terms of standard deviations, and compares this to expected frequency. Given a sample set, one can compute the studentized residuals and compare these to the expected frequency: points that fall more than 3 standard deviations from the norm are likely outliers (unless the sample size is significantly large, by which point one expects a sample this extreme), and if there are many points more than 3 standard deviations from the norm, one likely has reason to question the assumed normality of the distribution. This holds ever more strongly for moves of 4 or more standard deviations.
One can compute this more precisely by approximating the number of extreme moves of a given magnitude or greater with a Poisson distribution, but simply put, if one has multiple 4 standard deviation moves in a sample of size 1,000, one has strong reason to consider these outliers or to question the assumed normality of the distribution.
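A small Python sketch of this check (the helper names are illustrative): the expected number of |z| > 4 observations among n normal draws, n · Pr(|Z| > 4), also serves as the rate λ of the Poisson approximation mentioned above.

```python
from math import erf, sqrt
from statistics import mean, stdev

def count_extreme(sample, k=4):
    """Count observations more than k estimated standard deviations from the sample mean."""
    m, s = mean(sample), stdev(sample)
    return sum(1 for x in sample if abs(x - m) > k * s)

def expected_extreme(n, k=4):
    """Expected number of |z| > k observations among n normal draws; for a small
    tail probability this is also the rate lambda of the Poisson approximation."""
    tail_prob = 1 - erf(k / sqrt(2))   # two-sided tail probability of the normal
    return n * tail_prob

# For n = 1000 and k = 4 the expectation is about 0.06, so observing several such
# moves is strong evidence of outliers or of non-normality.
print(expected_extreme(1000, 4))     # ≈ 0.063
```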
For example, a 6σ event corresponds to a chance of about two parts per billion. For illustration, if events are taken to occur daily, this would correspond to an event expected every 1.4 million years. This gives a simple normality test: if one witnesses a 6σ event in daily data and significantly fewer than 1 million years have passed, then a normal distribution most likely does not provide a good model for the magnitude or frequency of large deviations in this respect.
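The arithmetic behind this illustration, assuming one observation per day:

```latex
2\,(1 - \Phi(6)) \approx 1.97 \times 10^{-9}, \qquad \frac{1}{1.97 \times 10^{-9}} \approx 5.1 \times 10^{8}\ \text{days} \approx 1.4\ \text{million years}
```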
In The Black Swan, Nassim Nicholas Taleb gives the example of risk models according to which the Black Monday crash would correspond to a 36-σ event: the occurrence of such an event should instantly suggest that the model is flawed, i.e. that the process under consideration is not satisfactorily modeled by a normal distribution. Refined models should then be considered, e.g. by the introduction of stochastic volatility. In such discussions it is important to be aware of the problem of the gambler's fallacy, which states that a single observation of a rare event does not contradict that the event is in fact rare. It is the observation of a plurality of purportedly rare events that increasingly undermines the hypothesis that they are rare, i.e. the validity of the assumed model. A proper modelling of this process of gradual loss of confidence in a hypothesis would involve the designation of prior probability not just to the hypothesis itself but to all possible alternative hypotheses. For this reason, statistical hypothesis testing works not so much by confirming a hypothesis considered to be likely, but by refuting hypotheses considered unlikely.
Because of the exponentially decreasing tails of the normal distribution, odds of higher deviations decrease very quickly. From the rules for normally distributed data, for a daily event:
| Range | Expected fraction of population inside range | Expected fraction of population outside range | Approx. expected frequency outside range | Approx. frequency outside range for daily event |
|---|---|---|---|---|
| μ ± 0.5 σ | 0.382924922548026 | 0.6171 = 61.71 % | 3 in 5 | Four or five times a week |
| μ ± σ | 0.682689492137086[5] | 0.3173 = 31.73 % | 1 in 3 | Twice or thrice a week |
| μ ± 1.5 σ | 0.866385597462284 | 0.1336 = 13.36 % | 2 in 15 | Weekly |
| μ ± 2 σ | 0.954499736103642[6] | 0.04550 = 4.550 % | 1 in 22 | Every three weeks |
| μ ± 2.5 σ | 0.987580669348448 | 0.01242 = 1.242 % | 1 in 81 | Quarterly |
| μ ± 3 σ | 0.997300203936740[7] | 0.002700 = 0.270 % = 2.700 ‰ | 1 in 370 | Yearly |
| μ ± 3.5 σ | 0.999534741841929 | 0.0004653 = 0.04653 % = 465.3 ppm | 1 in 2149 | Every 6 years |
| μ ± 4 σ | 0.999936657516334 | 6.334×10⁻⁵ = 63.34 ppm | 1 in 15787 | Every 43 years (twice in a lifetime) |
| μ ± 4.5 σ | 0.999993204653751 | 6.795×10⁻⁶ = 6.795 ppm | 1 in 147160 | Every 403 years (once in the modern era) |
| μ ± 5 σ | 0.999999426696856 | 5.733×10⁻⁷ = 0.5733 ppm = 573.3 ppb | 1 in 1744278 | Every 4776 years (once in recorded history) |
| μ ± 5.5 σ | 0.999999962020875 | 3.798×10⁻⁸ = 37.98 ppb | 1 in 26330254 | Every 72090 years (thrice in history of modern humankind) |
| μ ± 6 σ | 0.999999998026825 | 1.973×10⁻⁹ = 1.973 ppb | 1 in 506797346 | Every 1.38 million years (twice in history of humankind) |
| μ ± 6.5 σ | 0.999999999919680 | 8.032×10⁻¹¹ = 0.08032 ppb = 80.32 ppt | 1 in 12450197393 | Every 34 million years (twice since the extinction of dinosaurs) |
| μ ± 7 σ | 0.999999999997440 | 2.560×10⁻¹² = 2.560 ppt | 1 in 390682215445 | Every 1.07 billion years (four occurrences in history of Earth) |
| μ ± 7.5 σ | 0.999999999999936 | 6.382×10⁻¹⁴ = 63.82 ppq | 1 in 15669601204101 | Once every 43 billion years (never in the history of the Universe, twice in the future of the Local Group before its merger) |
| μ ± 8 σ | 0.999999999999999 | 1.244×10⁻¹⁵ = 1.244 ppq | 1 in 803734397655348 | Once every 2.2 trillion years (never in the history of the Universe, once during the life of a red dwarf) |
| μ ± xσ | erf(x/√2) | 1 − erf(x/√2) | 1 in 1/(1 − erf(x/√2)) | Every 1/(1 − erf(x/√2)) days |
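The last row can be checked numerically; a brief Python sketch (the function name table_row is illustrative) recomputes the columns for a few values of x:

```python
from math import erf, sqrt

def table_row(x):
    """Recompute the columns of the table above for the interval mu ± x*sigma."""
    inside = erf(x / sqrt(2))     # expected fraction of the population inside the range
    outside = 1 - inside          # expected fraction outside the range
    one_in = 1 / outside          # "1 in N" frequency outside the range
    years = one_in / 365.25       # for a daily event, expected wait in years
    return inside, outside, one_in, years

for x in (1, 2, 3, 4, 5, 6):
    inside, _, one_in, years = table_row(x)
    print(f"mu ± {x}σ: inside = {inside:.12f}, outside = 1 in {one_in:,.0f}, every {years:,.1f} years")
```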