Tolerance interval

From Wikipedia, the free encyclopedia
Type of statistical probability
Not to be confused with Engineering tolerance.

A tolerance interval (TI) is a statistical interval within which, with some confidence level, a specified proportion of a sampled population falls. "More specifically, a 100×p%/100×(1−α) tolerance interval provides limits within which at least a certain proportion (p) of the population falls with a given level of confidence (1−α)."[1] "A (p, 1−α) tolerance interval (TI) based on a sample is constructed so that it would include at least a proportion p of the sampled population with confidence 1−α; such a TI is usually referred to as p-content − (1−α) coverage TI."[2] "A (p, 1−α) upper tolerance limit (TL) is simply a 1−α upper confidence limit for the 100p percentile of the population."[2]

Definition

Assume that the observations or random variates $\mathbf{x} = (x_1, \ldots, x_n)$ are a realization of independent random variables $\mathbf{X} = (X_1, \ldots, X_n)$ that have a common distribution $F_\theta$ with unknown parameter $\theta$. Then a tolerance interval with endpoints $(L(\mathbf{x}), U(\mathbf{x})]$ is one that has the defining property:[3]

$$\inf_{\theta} \left\{ \Pr\nolimits_{\theta} \left( F_\theta(U(\mathbf{X})) - F_\theta(L(\mathbf{X})) \geq p \right) \right\} = 1 - \alpha$$

where $\inf\{\}$ denotes the infimum function.

This is in contrast to a prediction interval with endpoints $[l(\mathbf{x}), u(\mathbf{x})]$, which has the defining property:[3]

$$\inf_{\theta} \left\{ \Pr\nolimits_{\theta} \left( X_0 \in [l(\mathbf{X}), u(\mathbf{X})] \right) \right\} = 1 - \alpha$$

Here, $X_0$ is a random variable from the same distribution $F_\theta$ but independent of the first $n$ variables.

Note that $X_0$ is not involved in the definition of a tolerance interval, which deals only with the first sample, of size $n$.

Calculation


One-sided normal tolerance intervals have an exact solution in terms of the sample mean and sample variance based on the noncentral t-distribution.[4] Two-sided normal tolerance intervals can be estimated using the chi-squared distribution.[4]
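As a rough illustration of how such tolerance factors can be computed, the following sketch uses only the Python standard library. It is not the exact method described above: the standard library has no noncentral t or chi-squared quantile, so the one-sided factor uses a closed-form approximation commonly attributed to Natrella, the two-sided factor uses the approximation commonly attributed to Howe, and the chi-squared quantile is itself replaced by the Wilson-Hilferty approximation. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for z quantiles


def chi2_quantile_wh(q, df):
    """Wilson-Hilferty approximation to the q-quantile of chi-squared(df)."""
    z = _STD_NORMAL.inv_cdf(q)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * sqrt(c)) ** 3


def k_one_sided_approx(n, p, conf):
    """Approximate one-sided normal tolerance factor (not the exact
    noncentral-t solution): the limit is xbar + k*s or xbar - k*s."""
    z_p = _STD_NORMAL.inv_cdf(p)
    z_c = _STD_NORMAL.inv_cdf(conf)
    a = 1.0 - z_c ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_c ** 2 / n
    return (z_p + sqrt(z_p ** 2 - a * b)) / a


def k_two_sided_approx(n, p, conf):
    """Approximate two-sided normal tolerance factor: xbar +/- k*s."""
    df = n - 1
    z = _STD_NORMAL.inv_cdf((1.0 + p) / 2.0)
    # Lower (1 - conf) quantile of chi-squared with n - 1 degrees of freedom.
    chi2_low = chi2_quantile_wh(1.0 - conf, df)
    return z * sqrt(df * (1.0 + 1.0 / n) / chi2_low)


if __name__ == "__main__":
    print(k_one_sided_approx(43, 0.90, 0.99))
    print(k_two_sided_approx(43, 0.90, 0.99))
```

For n = 43, p = 0.90 and 99% confidence, the two functions return about 1.88 and 2.22 respectively. Both approximations are less reliable for very small samples, where tabulated or exactly computed factors are preferable.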

Relation to other intervals

Further information: Interval estimation

"In the parameters-known case, a 95% tolerance interval and a 95% prediction interval are the same."[5] If we knew a population's exact parameters, we would be able to compute a range within which a certain proportion of the population falls. For example, if we know a population is normally distributed with mean $\mu$ and standard deviation $\sigma$, then the interval $\mu \pm 1.96\sigma$ includes 95% of the population (1.96 is the z-score for 95% coverage of a normally distributed population).

However, if we have only a sample from the population, we know only the sample mean $\hat{\mu}$ and sample standard deviation $\hat{\sigma}$, which are only estimates of the true parameters. In that case, $\hat{\mu} \pm 1.96\hat{\sigma}$ will not necessarily include 95% of the population, due to variance in these estimates. A tolerance interval bounds this variance by introducing a confidence level $\gamma$, which is the confidence with which this interval actually includes the specified proportion of the population. For a normally distributed population, a z-score can be transformed into a "k factor" or tolerance factor[6] for a given $\gamma$ via lookup tables or several approximation formulas.[7] "As the degrees of freedom approach infinity, the prediction and tolerance intervals become equal."[8]

The tolerance interval is less widely known than the confidence interval and prediction interval, a situation some educators have lamented, as it can lead to misuse of the other intervals where a tolerance interval is more appropriate.[9][10]

The tolerance interval differs from a confidence interval in that the confidence interval bounds a single-valued population parameter (the mean or the variance, for example) with some confidence, while the tolerance interval bounds the range of data values that includes a specific proportion of the population. Whereas a confidence interval's size is entirely due to sampling error, and will approach a zero-width interval at the true population parameter as sample size increases, a tolerance interval's size is due partly to sampling error and partly to actual variance in the population, and will approach the population's probability interval as sample size increases.[9][10]

The tolerance interval is related to a prediction interval in that both put bounds on variation in future samples. However, the prediction interval only bounds a single future sample, whereas a tolerance interval bounds the entire population (equivalently, an arbitrary sequence of future samples). In other words, a prediction interval covers a specified proportion of a population on average, whereas a tolerance interval covers it with a certain confidence level, making the tolerance interval more appropriate if a single interval is intended to bound multiple future samples.[10][11]
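The distinction between covering a proportion "on average" and covering it "with a certain confidence level" can be checked by simulation. The sketch below makes several assumptions that are not part of the article: a standard normal population, samples of size n = 15, a target proportion p = 0.95, and three intervals of the form sample mean ± multiplier × sample standard deviation. The multipliers are the plain z-score 1.96, a 95% prediction multiplier built from the table value 2.145 for the 0.975 quantile of the t-distribution with 14 degrees of freedom, and 2.954, a commonly tabulated two-sided 95%/95% tolerance factor for n = 15. The function name is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_interval(multiplier, n=15, p=0.95, trials=10000, seed=1):
    """Return (mean content, fraction of samples whose content >= p) for the
    interval xbar +/- multiplier * s under a standard normal population."""
    rng = random.Random(seed)
    population = NormalDist(0.0, 1.0)
    total_content = 0.0
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        lower, upper = xbar - multiplier * s, xbar + multiplier * s
        # Content: true population proportion that falls inside the interval.
        content = population.cdf(upper) - population.cdf(lower)
        total_content += content
        if content >= p:
            hits += 1
    return total_content / trials, hits / trials


N = 15
MULTIPLIERS = {
    "naive z": 1.96,                        # ignores estimation error
    "prediction": 2.145 * sqrt(1 + 1 / N),  # t table value, 14 df (assumed)
    "tolerance": 2.954,                     # tabulated k factor (assumed)
}

if __name__ == "__main__":
    for name, m in MULTIPLIERS.items():
        avg, conf = simulate_interval(m, n=N)
        print(f"{name:10s} mean content={avg:.3f}  P(content>=0.95)={conf:.3f}")
```

Under these assumptions, the prediction interval should show a mean content close to 0.95 while containing at least 95% of the population in only a modest fraction of samples. The tolerance interval should contain at least 95% of the population in roughly 95% of samples, and the naive interval should fall short on both measures.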

Examples


Vardeman[9] gives the following example:

So consider once again a proverbial EPA mileage test scenario, in which several nominally identical autos of a particular model are tested to produce mileage figures $y_1, y_2, \ldots, y_n$. If such data are processed to produce a 95% confidence interval for the mean mileage of the model, it is, for example, possible to use it to project the mean or total gasoline consumption for the manufactured fleet of such autos over their first 5,000 miles of use. Such an interval, would however, not be of much help to a person renting one of these cars and wondering whether the (full) 10-gallon tank of gas will suffice to carry him the 350 miles to his destination. For that job, a prediction interval would be much more useful. (Consider the differing implications of being "95% sure" that $\mu \geq 35$ as opposed to being "95% sure" that $y_{n+1} \geq 35$.) But neither a confidence interval for $\mu$ nor a prediction interval for a single additional mileage is exactly what is needed by a design engineer charged with determining how large a gas tank the model really needs to guarantee that 99% of the autos produced will have a 400-mile cruising range. What the engineer really needs is a tolerance interval for a fraction $p = .99$ of mileages of such autos.

Another example is given by:[11]

The air lead levels were collected from $n = 15$ different areas within the facility. It was noted that the log-transformed lead levels fitted a normal distribution well (that is, the data are from a lognormal distribution). Let $\mu$ and $\sigma^2$, respectively, denote the population mean and variance for the log-transformed data. If $X$ denotes the corresponding random variable, we thus have $X \sim \mathcal{N}(\mu, \sigma^2)$. We note that $\exp(\mu)$ is the median air lead level. A confidence interval for $\mu$ can be constructed the usual way, based on the t-distribution; this in turn will provide a confidence interval for the median air lead level. If $\bar{X}$ and $S$ denote the sample mean and standard deviation of the log-transformed data for a sample of size $n$, a 95% confidence interval for $\mu$ is given by $\bar{X} \pm t_{n-1,0.975} S/\sqrt{n}$, where $t_{m,1-\alpha}$ denotes the $1-\alpha$ quantile of a t-distribution with $m$ degrees of freedom. It may also be of interest to derive a 95% upper confidence bound for the median air lead level. Such a bound for $\mu$ is given by $\bar{X} + t_{n-1,0.95} S/\sqrt{n}$. Consequently, a 95% upper confidence bound for the median air lead is given by $\exp\left(\bar{X} + t_{n-1,0.95} S/\sqrt{n}\right)$. Now suppose we want to predict the air lead level at a particular area within the laboratory. A 95% upper prediction limit for the log-transformed lead level is given by $\bar{X} + t_{n-1,0.95} S\sqrt{1 + 1/n}$. A two-sided prediction interval can be similarly computed. The meaning and interpretation of these intervals are well known.
For example, if the confidence interval $\bar{X} \pm t_{n-1,0.975} S/\sqrt{n}$ is computed repeatedly from independent samples, 95% of the intervals so computed will include the true value of $\mu$, in the long run. In other words, the interval is meant to provide information concerning the parameter $\mu$ only. A prediction interval has a similar interpretation, and is meant to provide information concerning a single lead level only. Now suppose we want to use the sample to conclude whether or not at least 95% of the population lead levels are below a threshold. The confidence interval and prediction interval cannot answer this question, since the confidence interval is only for the median lead level, and the prediction interval is only for a single lead level. What is required is a tolerance interval; more specifically, an upper tolerance limit. The upper tolerance limit is to be computed subject to the condition that at least 95% of the population lead levels is below the limit, with a certain confidence level, say 99%.
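The three upper bounds discussed in this example can be compared numerically. The sketch below is illustrative only: the lead levels are made-up values (the data from the cited example are not reproduced in this article), the t quantile 1.761 is the table value for 14 degrees of freedom at the 0.95 level, and the one-sided tolerance factor comes from a closed-form approximation rather than the exact noncentral t solution. Function and variable names are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical air lead levels for n = 15 areas; NOT the data from the source.
HYPOTHETICAL_LEVELS = [12.0, 35.0, 8.5, 60.0, 22.0, 140.0, 5.5, 48.0,
                       90.0, 17.0, 250.0, 30.0, 75.0, 11.0, 410.0]
T_14_095 = 1.761  # table value: 0.95 quantile of t with 14 degrees of freedom


def k_one_sided_approx(n, p, conf):
    """Approximate one-sided normal tolerance factor (closed-form
    approximation, not the exact noncentral-t value)."""
    std_normal = NormalDist()
    z_p = std_normal.inv_cdf(p)
    z_c = std_normal.inv_cdf(conf)
    a = 1.0 - z_c ** 2 / (2.0 * (n - 1))
    b = z_p ** 2 - z_c ** 2 / n
    return (z_p + sqrt(z_p ** 2 - a * b)) / a


def upper_limits(levels, t_quantile, p=0.95, conf=0.99):
    """Upper bounds on the original scale, computed from log-transformed data.
    t_quantile must be the 0.95 t quantile for len(levels) - 1 df."""
    logs = [log(v) for v in levels]
    n = len(logs)
    xbar, s = mean(logs), stdev(logs)
    k = k_one_sided_approx(n, p, conf)
    return {
        # 95% upper confidence bound for the median lead level
        "median_ucb": exp(xbar + t_quantile * s / sqrt(n)),
        # 95% upper prediction limit for a single new lead level
        "prediction": exp(xbar + t_quantile * s * sqrt(1.0 + 1.0 / n)),
        # upper tolerance limit: at least 95% of levels below, 99% confidence
        "tolerance": exp(xbar + k * s),
    }


if __name__ == "__main__":
    for name, value in upper_limits(HYPOTHETICAL_LEVELS, T_14_095).items():
        print(f"{name:11s} {value:.1f}")
```

Whatever the data, the three bounds are ordered: the confidence bound for the median is the smallest, the prediction limit is larger, and the upper tolerance limit is the largest, because each makes a progressively stronger statement about the population.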

References

  1. D. S. Young (2010). Book review: "Statistical Tolerance Regions: Theory, Applications, and Computation". Technometrics, February 2010, vol. 52, no. 1, pp. 143–144.
  2. Krishnamoorthy, K.; Lian, Xiaodong (2011). "Closed-form approximate tolerance intervals for some general linear models and comparison studies". Journal of Statistical Computation and Simulation. First published on 13 June 2011. doi:10.1080/00949655.2010.545061.
  3. Meeker, W.Q.; Hahn, G.J.; Escobar, L.A. (2017). Statistical Intervals: A Guide for Practitioners and Researchers. Wiley Series in Probability and Statistics. Wiley. ISBN 978-0-471-68717-7. Retrieved 2024-11-05.
  4. Derek S. Young (August 2010). "tolerance: An R Package for Estimating Tolerance Intervals". Journal of Statistical Software. 36 (5): 1–39. ISSN 1548-7660. Retrieved 19 February 2013. p. 23.
  5. Thomas P. Ryan (22 June 2007). Modern Engineering Statistics. John Wiley & Sons. pp. 222–. ISBN 978-0-470-12843-5. Retrieved 22 February 2013.
  6. "Statistical interpretation of data - Part 6: Determination of statistical tolerance intervals". ISO 16269-6. 2014. p. 2.
  7. "Tolerance intervals for a normal distribution". Engineering Statistics Handbook. NIST/Sematech. 2010. Retrieved 2011-08-26.
  8. De Gryze, S.; Langhans, I.; Vandebroek, M. (2007). "Using the correct intervals for prediction: A tutorial on tolerance intervals for ordinary least-squares regression". Chemometrics and Intelligent Laboratory Systems. 87 (2): 147. doi:10.1016/j.chemolab.2007.03.002.
  9. Stephen B. Vardeman (1992). "What about the Other Intervals?". The American Statistician. 46 (3): 193–197. doi:10.2307/2685212. JSTOR 2685212.
  10. Mark J. Nelson (2011-08-14). "You might want a tolerance interval". Retrieved 2011-08-26.
  11. K. Krishnamoorthy (2009). Statistical Tolerance Regions: Theory, Applications, and Computation. John Wiley and Sons. pp. 1–6. ISBN 978-0-470-38026-0.

Further reading

  • Hahn, Gerald J.; Meeker, William Q.; Escobar, Luis A. (2017). Statistical Intervals: A Guide for Practitioners and Researchers (2nd ed.). John Wiley & Sons, Incorporated. ISBN 978-0-471-68717-7.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Tolerance_interval&oldid=1334090272"