Margin of error

From Wikipedia, the free encyclopedia
Statistic expressing the amount of random sampling error in a survey's results
This article is about the statistical precision of estimates from sample surveys. For observational errors, see Observational error. For safety margins in engineering, see Factor of safety. For tolerance in engineering, see Engineering tolerance. For the eponymous movie, see Margin for error (film).
Probability densities of polls of different sizes, each color-coded to its 95% confidence interval (below), margin of error (left), and sample size (right). Each interval reflects the range within which one may have 95% confidence that the true percentage may be found, given a reported percentage of 50%. The margin of error is half the confidence interval (also, the radius of the interval). The larger the sample, the smaller the margin of error. Also, the further from 50% the reported percentage, the smaller the margin of error.

The margin of error is a statistic expressing the amount of random sampling error in the results of a survey. The larger the margin of error, the less confidence one should have that a poll result would reflect the result of a simultaneous census of the entire population. The margin of error will be positive whenever a population is incompletely sampled and the outcome measure has positive variance, which is to say, whenever the measure varies.

The term margin of error is often used in non-survey contexts to indicate observational error in reporting measured quantities.

Concept

Consider a simple yes/no poll $P$ as a sample of $n$ respondents drawn from a population $N$ ($n \ll N$), reporting the percentage $p$ of yes responses. We would like to know how close $p$ is to the true result of a survey of the entire population $N$, without having to conduct one. If, hypothetically, we were to conduct poll $P$ over subsequent samples of $n$ respondents (newly drawn from $N$), we would expect those subsequent results $p_1, p_2, \ldots$ to be normally distributed about $\overline{p}$, the true but unknown percentage of the population. The margin of error describes the distance within which a specified percentage of these results is expected to vary from $\overline{p}$.

By the central limit theorem, the distribution of sample means (here, the percentage of yes responses) approximates a normal distribution as the sample size increases, which is what allows the margin of error to be computed from normal quantiles. Where this applies, it says something about the sampling being unbiased, but nothing about the inherent distribution of the data.[1]

According to the 68-95-99.7 rule, we would expect that 95% of the results $p_1, p_2, \ldots$ will fall within about two standard deviations ($\pm 2\sigma_P$) either side of the true mean $\overline{p}$. This interval is called the confidence interval, and the radius (half the interval) is called the margin of error, corresponding to a 95% confidence level.
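
The thought experiment above can be checked with a small simulation. The following is a minimal sketch rather than anything taken from the article's sources: it assumes a hypothetical true proportion of 0.5, draws many independent polls of 1013 respondents each, and counts how often the reported proportion lands within 1.96 standard errors of the true value. The function name, the seed, and the number of simulated polls are illustrative choices.

```python
import math
import random


def simulate_coverage(true_p=0.5, n=1013, polls=2000, z=1.96, seed=1):
    """Fraction of simulated polls whose result lies within z standard
    errors of the (hypothetical) true proportion true_p."""
    rng = random.Random(seed)
    standard_error = math.sqrt(true_p * (1 - true_p) / n)
    margin = z * standard_error
    hits = 0
    for _ in range(polls):
        # One poll: n independent yes/no answers, each "yes" with
        # probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        if abs(yes / n - true_p) <= margin:
            hits += 1
    return hits / polls


if __name__ == "__main__":
    print(f"share of polls within the margin: {simulate_coverage():.3f}")
```

With these settings the printed share should come out close to 0.95, although the exact figure depends on the seed and on the number of simulated polls.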

Generally, at a confidence level $\gamma$, a sample of size $n$ from a population having expected standard deviation $\sigma$ has a margin of error

$$MOE_\gamma = z_\gamma \times \sqrt{\frac{\sigma^2}{n}}$$

where $z_\gamma$ denotes the quantile (also, commonly, a z-score), and $\sqrt{\frac{\sigma^2}{n}}$ is the standard error.
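
The formula above translates almost directly into code. The sketch below uses Python's standard `statistics.NormalDist` for the quantile; the function name `margin_of_error` is hypothetical, and the code assumes that $\gamma$ is a two-sided confidence level, so that the multiplier is the normal quantile at $(1+\gamma)/2$ (this reproduces the familiar 1.96 for 95%).

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(sigma, n, gamma=0.95):
    """Margin of error z * sqrt(sigma^2 / n) for a sample of size n.

    Assumption: gamma is a two-sided confidence level, so the multiplier
    is the standard normal quantile at (1 + gamma) / 2.
    """
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    standard_error = sqrt(sigma ** 2 / n)
    return z * standard_error


if __name__ == "__main__":
    # Hypothetical inputs: sigma = 0.5 and n = 1013 respondents.
    print(f"{margin_of_error(0.5, 1013):.4f}")
```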

Standard deviation and standard error

We would expect the average of normally distributed values $p_1, p_2, \ldots$ to have a standard deviation which varies with $n$. The smaller $n$, the wider the margin. This standard deviation is called the standard error $\sigma_{\overline{p}}$.

For the single result from our survey, we assume that $p = \overline{p}$, and that all subsequent results $p_1, p_2, \ldots$ together would have a variance $\sigma_P^2 = P(1-P)$.

$$\text{Standard error} = \sigma_{\overline{p}} \approx \sqrt{\frac{\sigma_P^2}{n}} \approx \sqrt{\frac{p(1-p)}{n}}$$

Note that $p(1-p)$ corresponds to the variance of a Bernoulli distribution.
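
For readers who want the intermediate step, the variance $p(1-p)$ follows directly from the definition of variance. The sketch below assumes each response is coded as an indicator $X_i$ (1 for yes, 0 for no) and that responses are independent with a common probability $p$; the symbol $X_i$ is introduced here purely for illustration.

```latex
\begin{aligned}
\operatorname{Var}(X_i) &= E[X_i^2] - (E[X_i])^2 = p - p^2 = p(1-p), \\
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i) = \frac{p(1-p)}{n}, \\
\sigma_{\overline{p}} &= \sqrt{\frac{p(1-p)}{n}}.
\end{aligned}
```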

Maximum margin of error at different confidence levels

For a confidence level $\gamma$, there is a corresponding confidence interval about the mean $\mu \pm z_\gamma\sigma$, that is, the interval $[\mu - z_\gamma\sigma, \mu + z_\gamma\sigma]$ within which values of $P$ should fall with probability $\gamma$. Precise values of $z_\gamma$ are given by the quantile function of the normal distribution (which the 68–95–99.7 rule approximates).

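Note that $z_\gamma$ is undefined for $|\gamma| \geq 1$, that is, $z_{1.00}$ is undefined, as is $z_{1.10}$. The table below lists the normal quantile function evaluated directly at $\gamma$; for a two-sided interval at confidence level $\gamma$ the multiplier is the quantile at $(1+\gamma)/2$, which is why the calculations in this article use 1.96 (the 0.975 entry) for 95% and 2.58 (the 0.995 entry) for 99%.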

$\gamma$      $z_\gamma$          $\gamma$          $z_\gamma$
0.84          0.994457883210      0.9995            3.290526731492
0.95          1.644853626951      0.99995           3.890591886413
0.975         1.959963984540      0.999995          4.417173413469
0.99          2.326347874041      0.9999995         4.891638475699
0.995         2.575829303549      0.99999995        5.326723886384
0.9975        2.807033768344      0.999999995       5.730728868236
0.9985        2.967737925342      0.9999999995      6.109410204869
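
Values like those in the table can be reproduced with the normal quantile function. The following is a small sketch using `statistics.NormalDist.inv_cdf` from the Python standard library; the helper names `quantile` and `two_sided_z` are hypothetical. The second helper shows the two-sided convention, under which a 95% confidence level maps to the quantile at 0.975.

```python
from statistics import NormalDist

# Confidence levels from the left-hand column of the table above.
LEVELS = [0.84, 0.95, 0.975, 0.99, 0.995, 0.9975, 0.9985]


def quantile(gamma):
    """Standard normal quantile function evaluated at gamma."""
    return NormalDist().inv_cdf(gamma)


def two_sided_z(gamma):
    """Multiplier for a two-sided interval at confidence level gamma."""
    return quantile((1 + gamma) / 2)


if __name__ == "__main__":
    for gamma in LEVELS:
        print(f"{gamma:<8} {quantile(gamma):.6f} {two_sided_z(gamma):.6f}")
```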
Log-log graphs of $MOE_\gamma(0.5)$ vs sample size $n$ and confidence level $\gamma$. The arrows show that the maximum margin of error for a sample size of 1000 is ±3.1% at 95% confidence level, and ±4.1% at 99%.
The inset parabola $\sigma_p^2 = p - p^2$ illustrates the relationship between $\sigma_p^2$ at $p = 0.71$ and $\sigma_{max}^2$ at $p = 0.5$. In the example, $MOE_{95}(0.71) \approx 0.9 \times \pm 3.1\% \approx \pm 2.8\%$.

Since $\max \sigma_P^2 = \max P(1-P) = 0.25$ at $p = 0.5$, we can arbitrarily set $p = \overline{p} = 0.5$, calculate $\sigma_P$, $\sigma_{\overline{p}}$, and $z_\gamma\sigma_{\overline{p}}$ to obtain the maximum margin of error for $P$ at a given confidence level $\gamma$ and sample size $n$, even before having actual results. With $p = 0.5$, $n = 1013$:

$$MOE_{95}(0.5) = z_{0.95}\sigma_{\overline{p}} \approx z_{0.95}\sqrt{\frac{\sigma_P^2}{n}} = 1.96\sqrt{\frac{0.25}{n}} = 0.98/\sqrt{n} = \pm 3.1\%$$
$$MOE_{99}(0.5) = z_{0.99}\sigma_{\overline{p}} \approx z_{0.99}\sqrt{\frac{\sigma_P^2}{n}} = 2.58\sqrt{\frac{0.25}{n}} = 1.29/\sqrt{n} = \pm 4.1\%$$

Also, usefully, for any reported $MOE_{95}$

$$MOE_{99} = \frac{z_{0.99}}{z_{0.95}} MOE_{95} \approx 1.3 \times MOE_{95}$$
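
The maximum margin of error and the conversion between confidence levels can be computed in a few lines. This is an illustrative sketch: the function name `max_margin_of_error` is hypothetical, and the multipliers are taken as two-sided normal quantiles rather than the rounded 1.96 and 2.58, so the last printed digit may differ slightly from the rounded figures in the text.

```python
from math import sqrt
from statistics import NormalDist


def max_margin_of_error(n, gamma=0.95):
    """Largest margin of error for a proportion, reached at p = 0.5.

    Assumption: gamma is treated as a two-sided confidence level.
    """
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    return z * sqrt(0.25 / n)


if __name__ == "__main__":
    n = 1013
    moe95 = max_margin_of_error(n, 0.95)
    moe99 = max_margin_of_error(n, 0.99)
    print(f"MOE95 = {moe95:.2%}")
    print(f"MOE99 = {moe99:.2%}")
    print(f"ratio = {moe99 / moe95:.2f}")
```

The ratio does not depend on $n$ or $p$, since both margins share the same standard error.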

Specific margins of error

If a poll has multiple percentage results (for example, a poll measuring a single multiple-choice preference), the result closest to 50% will have the highest margin of error. Typically, it is this number that is reported as the margin of error for the entire poll. Imagine poll $P$ reports $p_a, p_b, p_c$ as $71\%, 27\%, 2\%$, $n = 1013$

$$MOE_{95}(P_a) = z_{0.95}\sigma_{\overline{p_a}} \approx 1.96\sqrt{\frac{p_a(1-p_a)}{n}} = 0.89/\sqrt{n} = \pm 2.8\%$$ (as in the figure above)
$$MOE_{95}(P_b) = z_{0.95}\sigma_{\overline{p_b}} \approx 1.96\sqrt{\frac{p_b(1-p_b)}{n}} = 0.87/\sqrt{n} = \pm 2.7\%$$
$$MOE_{95}(P_c) = z_{0.95}\sigma_{\overline{p_c}} \approx 1.96\sqrt{\frac{p_c(1-p_c)}{n}} \approx 0.27/\sqrt{n} \approx \pm 0.9\%$$

As a given percentage approaches the extremes of 0% or 100%, its margin of error approaches ±0%.
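
The per-result margins can be reproduced as follows. This is a minimal sketch assuming the 1.96 multiplier used in the text; `proportion_moe` is a hypothetical helper name. The second loop illustrates how the margin shrinks toward zero as the reported percentage approaches 0% or 100%.

```python
from math import sqrt

Z_95 = 1.96  # multiplier used in the text for a 95% confidence level


def proportion_moe(p, n, z=Z_95):
    """Margin of error z * sqrt(p * (1 - p) / n) for a reported proportion p."""
    return z * sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    n = 1013
    for label, p in [("a", 0.71), ("b", 0.27), ("c", 0.02)]:
        print(f"p_{label} = {p:.0%}: +/-{proportion_moe(p, n):.2%}")
    # Toward the extremes the margin shrinks to zero.
    for p in (0.0, 0.001, 0.999, 1.0):
        print(f"p = {p}: +/-{proportion_moe(p, n):.2%}")
```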

Comparing percentages

Imagine multiple-choice poll $P$ reports $p_a, p_b, p_c$ as $46\%, 42\%, 12\%$, $n = 1013$. As described above, the margin of error reported for the poll would typically be $MOE_{95}(P_a)$, as $p_a$ is closest to 50%. The popular notion of statistical tie or statistical dead heat, however, concerns itself not with the accuracy of the individual results, but with that of the ranking of the results. Which is in first?

If, hypothetically, we were to conduct poll $P$ over subsequent samples of $n$ respondents (newly drawn from $N$), and report the result $p_w = p_a - p_b$, we could use the standard error of difference to understand how $p_{w_1}, p_{w_2}, p_{w_3}, \ldots$ is expected to fall about $\overline{p_w}$. For this, we need to apply the sum of variances to obtain a new variance, $\sigma_{P_w}^2$,

$$\sigma_{P_w}^2 = \sigma_{P_a - P_b}^2 = \sigma_{P_a}^2 + \sigma_{P_b}^2 - 2\sigma_{P_a,P_b} = p_a(1-p_a) + p_b(1-p_b) + 2p_a p_b$$

where $\sigma_{P_a,P_b} = -P_a P_b$ is the covariance of $P_a$ and $P_b$.

Thus (after simplifying),

$$\text{Standard error of difference} = \sigma_{\overline{w}} \approx \sqrt{\frac{\sigma_{P_w}^2}{n}} = \sqrt{\frac{p_a + p_b - (p_a - p_b)^2}{n}} = 0.029, \quad P_w = P_a - P_b$$
$$MOE_{95}(P_a) = z_{0.95}\sigma_{\overline{p_a}} \approx \pm 3.1\%$$
$$MOE_{95}(P_w) = z_{0.95}\sigma_{\overline{w}} \approx \pm 5.8\%$$
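
The simplification step can be made explicit. The following worked step expands the variance of the difference given earlier in this section and regroups it, using only symbols already defined here.

```latex
\begin{aligned}
\sigma_{P_w}^2 &= p_a(1-p_a) + p_b(1-p_b) + 2p_a p_b \\
  &= p_a + p_b - \left(p_a^2 - 2p_a p_b + p_b^2\right) \\
  &= p_a + p_b - (p_a - p_b)^2 .
\end{aligned}
```

With $p_a = 0.46$, $p_b = 0.42$ and $n = 1013$, this gives $\sqrt{(0.88 - 0.0016)/1013} \approx 0.029$.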

Note that this assumes that $P_c$ is close to constant, that is, respondents choosing either A or B would almost never choose C (making $P_a$ and $P_b$ close to perfectly negatively correlated). With three or more choices in closer contention, choosing a correct formula for $\sigma_{P_w}^2$ becomes more complicated.
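
The calculation for the lead between two results can be sketched in code. The helper name `difference_standard_error` is hypothetical, the multiplier is the rounded 1.96 used in the text, and the variance is the one stated in this section (covariance $-p_a p_b$), so the same caveat about three or more close choices applies.

```python
from math import sqrt

Z_95 = 1.96  # multiplier used in the text for a 95% confidence level


def difference_standard_error(p_a, p_b, n):
    """Standard error of p_a - p_b for two shares from the same poll.

    Uses variance p_a(1-p_a) + p_b(1-p_b) + 2*p_a*p_b, i.e. the
    covariance -p_a*p_b stated in the text.
    """
    variance = p_a * (1 - p_a) + p_b * (1 - p_b) + 2 * p_a * p_b
    return sqrt(variance / n)


if __name__ == "__main__":
    p_a, p_b, n = 0.46, 0.42, 1013
    se = difference_standard_error(p_a, p_b, n)
    lead = p_a - p_b
    moe = Z_95 * se
    print(f"standard error of difference: {se:.3f}")
    print(f"lead: {lead:.1%}, margin of error of lead: +/-{moe:.1%}")
    print("lead exceeds its margin" if lead > moe else "lead is within its margin")
```

In this example the four-point lead is smaller than its own margin of error, which is the situation commonly described as a statistical tie.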

Effect of finite population size

The formulae above for the margin of error assume that there is an infinitely large population and thus do not depend on the size of population $N$, but only on the sample size $n$. According to sampling theory, this assumption is reasonable when the sampling fraction is small. The margin of error for a particular sampling method is essentially the same regardless of whether the population of interest is the size of a school, city, state, or country, as long as the sampling fraction is small.

In cases where the sampling fraction is larger (in practice, greater than 5%), analysts might adjust the margin of error using a finite population correction (FPC) to account for the added precision gained by sampling a much larger percentage of the population. FPC can be calculated using the formula[2]

$$\operatorname{FPC} = \sqrt{\frac{N-n}{N-1}}$$

...and so, if poll $P$ were conducted over 24% of, say, an electorate of 300,000 voters,

$$MOE_{95}(0.5) = z_{0.95}\sigma_{\overline{p}} \approx \frac{0.98}{\sqrt{72{,}000}} = \pm 0.4\%$$
$$MOE_{95_{FPC}}(0.5) = z_{0.95}\sigma_{\overline{p}}\sqrt{\frac{N-n}{N-1}} \approx \frac{0.98}{\sqrt{72{,}000}}\sqrt{\frac{300{,}000 - 72{,}000}{300{,}000 - 1}} = \pm 0.3\%$$

Intuitively, for appropriately large $N$,

$$\lim_{n \to 0} \sqrt{\frac{N-n}{N-1}} \approx 1$$
$$\lim_{n \to N} \sqrt{\frac{N-n}{N-1}} = 0$$

In the former case, $n$ is so small as to require no correction. In the latter case, the poll effectively becomes a census and sampling error becomes moot.
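
The finite population correction and its two limiting cases can be illustrated with a short sketch. The function names are hypothetical, the multiplier is the rounded 1.96 used in the text, and the inputs are the electorate of 300,000 and sample of 72,000 from the example above.

```python
from math import sqrt

Z_95 = 1.96  # multiplier used in the text for a 95% confidence level


def finite_population_correction(N, n):
    """FPC factor sqrt((N - n) / (N - 1)) for a sample of n from N."""
    if not 0 < n <= N or N < 2:
        raise ValueError("require N >= 2 and 0 < n <= N")
    return sqrt((N - n) / (N - 1))


def max_moe_with_fpc(N, n, z=Z_95):
    """Maximum margin of error (p = 0.5) scaled by the FPC factor."""
    return z * sqrt(0.25 / n) * finite_population_correction(N, n)


if __name__ == "__main__":
    N, n = 300_000, 72_000
    print(f"uncorrected: +/-{Z_95 * sqrt(0.25 / n):.2%}")
    print(f"corrected:   +/-{max_moe_with_fpc(N, n):.2%}")
    print(f"FPC at n = 1: {finite_population_correction(N, 1)}")
    print(f"FPC at n = N: {finite_population_correction(N, N)}")
```

A single respondent leaves the factor at exactly 1, and sampling the whole population drives it to 0, matching the two limits above.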

References

  1. Siegfried, Tom (2014-07-03). "Scientists' grasp of confidence intervals doesn't inspire confidence | Science News". Science News. Retrieved 2024-08-06.
  2. Isserlis, L. (1918). "On the value of a mean as calculated from a sample". Journal of the Royal Statistical Society. 81 (1). Blackwell Publishing: 75–81. doi:10.2307/2340569. JSTOR 2340569. (Equation 1)

Sources

  • Sudman, Seymour and Bradburn, Norman (1982). Asking Questions: A Practical Guide to Questionnaire Design. San Francisco: Jossey Bass. ISBN 0-87589-546-8.
  • Wonnacott, T.H.; R.J. Wonnacott (1990). Introductory Statistics (5th ed.). Wiley. ISBN 0-471-61518-8.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Margin_of_error&oldid=1291130091"