Probability densities of polls of different sizes, each color-coded to its 95% confidence interval (below), margin of error (left), and sample size (right). Each interval reflects the range within which one may have 95% confidence that the true percentage may be found, given a reported percentage of 50%. The margin of error is half the confidence interval (also, the radius of the interval). The larger the sample, the smaller the margin of error. Also, the further from 50% the reported percentage, the smaller the margin of error.
The margin of error is a statistic expressing the amount of random sampling error in the results of a survey. The larger the margin of error, the less confidence one should have that a poll result would reflect the result of a simultaneous census of the entire population. The margin of error will be positive whenever a population is incompletely sampled and the outcome measure has positive variance, which is to say, whenever the measure varies.
The term margin of error is often used in non-survey contexts to indicate observational error in reporting measured quantities.
Consider a simple yes/no poll $P$ as a sample of $n$ respondents drawn from a population $N$ (with $n \ll N$) reporting the percentage $p$ of yes responses. We would like to know how close $p$ is to the true result of a survey of the entire population $N$, without having to conduct one. If, hypothetically, we were to conduct poll $P$ over subsequent samples of $n$ respondents (newly drawn from $N$), we would expect those subsequent results $p_1, p_2, \ldots$ to be normally distributed about $\overline{p}$, the true but unknown percentage of the population. The margin of error describes the distance within which a specified percentage of these results is expected to vary from $\overline{p}$.
By the central limit theorem, the distribution of sample means (here, the percentage of yes responses) approximates a normal distribution as the sample size increases, which is what underlies the margin of error calculation. Where this applies, it speaks to the sampling being unbiased, but not to the inherent distribution of the data.[1]
According to the 68–95–99.7 rule, we would expect that 95% of the results $p_1, p_2, \ldots$ will fall within about two standard deviations ($\pm 2\sigma_P$) either side of the true mean $\overline{p}$. This interval is called the confidence interval, and the radius (half the interval) is called the margin of error, corresponding to a 95% confidence level.
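Generally, at a confidence level $\gamma$, a sample of size $n$ from a population having expected standard deviation $\sigma$ has a margin of error
$$MOE_\gamma = z_\gamma \times \sqrt{\frac{\sigma^2}{n}}$$
where $z_\gamma$ denotes the quantile (commonly, a z-score) and $\sqrt{\sigma^2/n}$ is the standard error.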
We would expect the average of normally distributed values $p_1, p_2, \ldots$ to have a standard deviation which varies with $n$. The smaller $n$, the wider the margin. This is called the standard error $\sigma_{\overline{p}}$.
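For the single result from our survey, we assume that $p = \overline{p}$, and that all subsequent results $p_1, p_2, \ldots$ together would have a variance $\sigma_P^2 = p(1-p)$, which is the variance of a Bernoulli distribution. The standard error is then
$$\sigma_{\overline{p}} \approx \sqrt{\frac{\sigma_P^2}{n}} \approx \sqrt{\frac{p(1-p)}{n}}$$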
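As a minimal sketch of the relationship between sample size and standard error, the snippet below evaluates $\sqrt{p(1-p)/n}$ for a few sample sizes. The function name `standard_error` and the chosen sample sizes are illustrative assumptions, not part of the original text.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a sample proportion p from n respondents.

    Assumes simple random sampling and Bernoulli variance p * (1 - p).
    """
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion p must lie in [0, 1]")
    return sqrt(p * (1.0 - p) / n)


# Illustrative sample sizes: each tenfold increase in n shrinks the
# standard error by a factor of sqrt(10), not by a factor of 10.
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: standard error = {standard_error(0.5, n):.4f}")
```

Because the standard error falls with the square root of $n$, quadrupling the sample only halves it.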
For a confidence level $\gamma$, there is a corresponding confidence interval about the mean, $\mu \pm z_\gamma\sigma$, that is, the interval $[\mu - z_\gamma\sigma,\ \mu + z_\gamma\sigma]$ within which values of $P$ should fall with probability $\gamma$. Precise values of $z_\gamma$ are given by the quantile function of the normal distribution (which the 68–95–99.7 rule approximates).
Note that $z_\gamma$ is undefined for $\gamma \ge 1$, that is, $z_{1.00}$ is undefined, as is $z_\gamma$ for any $\gamma > 1$.
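The quantile for a given confidence level can be computed with Python's standard library. This is a sketch under the assumption that the interval is two-sided and symmetric, so the quantile is taken at probability $(1+\gamma)/2$. The helper name `z_quantile` is hypothetical.

```python
from statistics import NormalDist


def z_quantile(gamma):
    """Return z such that a standard normal variable lies in (-z, z)
    with probability gamma (two-sided, symmetric interval)."""
    if not 0.0 < gamma < 1.0:
        # The quantile is undefined at or beyond a confidence level of 1.
        raise ValueError("gamma must lie strictly between 0 and 1")
    return NormalDist().inv_cdf((1.0 + gamma) / 2.0)


# Confidence levels from the 68-95-99.7 rule, plus the common 99% level.
for gamma in (0.68, 0.95, 0.99, 0.997):
    print(f"gamma={gamma}: z = {z_quantile(gamma):.3f}")
```

The 95% and 99% levels give roughly 1.96 and 2.58, the values commonly quoted for polls.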
Log-log graphs of $MOE_\gamma(0.5)$ vs sample size $n$ and confidence level $\gamma$. The arrows show that the maximum margin of error for a sample size of 1000 is ±3.1% at 95% confidence level, and ±4.1% at 99%. The inset parabola $\sigma_p^2 = p - p^2$ illustrates the relationship between $\sigma_p^2$ at $p = 0.71$ and $\sigma_{\max}^2$ at $p = 0.5$. In the example, $MOE_{95}(0.71) \approx 0.9 \times \pm 3.1\% \approx \pm 2.8\%$.
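Since $\max \sigma_P^2 = \max p(1-p) = 0.25$ at $p = 0.5$, we can arbitrarily set $p = \overline{p} = 0.5$, calculate $\sigma_P$, $\sigma_{\overline{p}}$, and $z_\gamma\sigma_{\overline{p}}$ to obtain the maximum margin of error for $P$ at a given confidence level $\gamma$ and sample size $n$, even before having actual results. With $n = 1000$ (the sample size marked in the figure),
$$MOE_{95}(0.5) = z_{0.95}\,\sigma_{\overline{p}} \approx z_{0.95}\sqrt{\frac{\sigma_P^2}{n}} = 1.96\sqrt{\frac{0.25}{n}} = \frac{0.98}{\sqrt{n}} \approx \pm 3.1\%$$
$$MOE_{99}(0.5) = z_{0.99}\,\sigma_{\overline{p}} \approx z_{0.99}\sqrt{\frac{\sigma_P^2}{n}} = 2.58\sqrt{\frac{0.25}{n}} = \frac{1.29}{\sqrt{n}} \approx \pm 4.1\%$$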
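The maximum margin of error can be sketched in a few lines. The sample size of 1000 is taken from the figure caption; the function name `max_margin_of_error` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def max_margin_of_error(n, gamma=0.95):
    """Worst-case margin of error for a proportion, assuming p = 0.5
    (variance 0.25) and a two-sided normal quantile."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    z = NormalDist().inv_cdf((1.0 + gamma) / 2.0)
    return z * sqrt(0.25 / n)


moe95 = max_margin_of_error(1000, 0.95)
moe99 = max_margin_of_error(1000, 0.99)
print(f"95%: ±{moe95:.1%}")            # ±3.1%
print(f"99%: ±{moe99:.1%}")            # ±4.1%
print(f"ratio: {moe99 / moe95:.2f}")   # about 1.31
```

The ratio of the two margins depends only on the quantiles, so a reported 95% margin can be rescaled to 99% by multiplying by about 1.3, whatever the sample size.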
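If a poll has multiple percentage results (for example, a poll measuring a single multiple-choice preference), the result closest to 50% will have the highest margin of error. Typically, it is this number that is reported as the margin of error for the entire poll. Imagine poll $P$ reports, among its results, $p_a = 71\%$ from a sample of $n = 1000$:
$$MOE_{95}(P_a) = z_{0.95}\,\sigma_{\overline{p_a}} \approx 1.96\sqrt{\frac{p_a(1-p_a)}{n}} = \frac{0.89}{\sqrt{n}} \approx \pm 2.8\%$$
(as in the figure above)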
As a given percentage approaches the extremes of 0% or 100%, its margin of error approaches ±0%.
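To illustrate how the margin shrinks away from 50%, the following sketch evaluates the 95% margin of error at several reported percentages. The sample size of 1000 comes from the figure caption; the percentages other than 50% and 71% are hypothetical values chosen only to show the trend.

```python
from math import sqrt

Z_95 = 1.96  # approximate two-sided 95% quantile of the normal distribution


def margin_of_error_95(p, n):
    """95% margin of error for a reported proportion p and sample size n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion p must lie in [0, 1]")
    return Z_95 * sqrt(p * (1.0 - p) / n)


n = 1000  # sample size from the figure
for p in (0.50, 0.71, 0.90, 0.99, 1.00):
    print(f"p={p:.2f}: ±{margin_of_error_95(p, n):.1%}")
```

The margin is symmetric about 50%, so a result of 29% has the same margin as one of 71%.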
Imagine a multiple-choice poll $P$ reports three results $p_a$, $p_b$, $p_c$, with $p_a$ closest to 50%. As described above, the margin of error reported for the poll would typically be $MOE_{95}(P_a)$, as $p_a$ is closest to 50%. The popular notion of statistical tie or statistical dead heat, however, concerns itself not with the accuracy of the individual results, but with that of the ranking of the results. Which is in first?
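If, hypothetically, we were to conduct poll $P$ over subsequent samples of $n$ respondents (newly drawn from $N$), and report the result $p_w = p_a - p_b$, we could use the standard error of difference to understand how $p_{w_1}, p_{w_2}, \ldots$ are expected to fall about $\overline{p_w}$. For this, we need to apply the sum of variances to obtain a new variance, $\sigma_{P_w}^2$,
$$\sigma_{P_w}^2 = \sigma_{P_a - P_b}^2 = \sigma_{P_a}^2 + \sigma_{P_b}^2 - 2\sigma_{P_a,P_b} = p_a(1-p_a) + p_b(1-p_b) + 2p_a p_b$$
where $\sigma_{P_a,P_b} = -p_a p_b$ is the covariance of $P_a$ and $P_b$. After simplifying, the standard error of difference is
$$\sigma_{\overline{w}} \approx \sqrt{\frac{\sigma_{P_w}^2}{n}} = \sqrt{\frac{p_a + p_b - (p_a - p_b)^2}{n}}$$
and the margin of error of the difference is $MOE_{95}(P_w) = z_{0.95}\,\sigma_{\overline{w}}$.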
Note that this assumes that $P_c$ is close to constant, that is, respondents choosing either A or B would almost never choose C (making $P_a$ and $P_b$ close to perfectly negatively correlated). With three or more choices in closer contention, choosing a correct formula for $\sigma_{P_w}^2$ becomes more complicated.
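A minimal sketch of the margin of error for the lead between two choices follows. It assumes the variance of the difference is $p_a + p_b - (p_a - p_b)^2$, which results from the sum of variances when the two shares have covariance $-p_a p_b$. The poll figures (46%, 42%, sample of 1000) are hypothetical, as are the function names.

```python
from math import sqrt

Z_95 = 1.96  # approximate two-sided 95% quantile of the normal distribution


def moe_single(p, n, z=Z_95):
    """Margin of error of one reported share p."""
    return z * sqrt(p * (1.0 - p) / n)


def moe_difference(p_a, p_b, n, z=Z_95):
    """Margin of error of the lead p_a - p_b within the same poll.

    Assumes a single-choice question, so the two shares have
    covariance -p_a * p_b.
    """
    variance = p_a + p_b - (p_a - p_b) ** 2
    return z * sqrt(variance / n)


# Hypothetical poll: A at 46%, B at 42%, 1000 respondents.
p_a, p_b, n = 0.46, 0.42, 1000
lead = p_a - p_b
print(f"MOE of A alone:  ±{moe_single(p_a, n):.1%}")
print(f"MOE of the lead: ±{moe_difference(p_a, p_b, n):.1%}")
print("statistical tie" if lead < moe_difference(p_a, p_b, n) else "clear lead")
```

With these assumed figures the lead of 4 points is smaller than the margin of error of the difference, which is nearly twice the margin of either share alone.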
The formulae above for the margin of error assume that there is an infinitely large population and thus do not depend on the size of population $N$, but only on the sample size $n$. According to sampling theory, this assumption is reasonable when the sampling fraction is small. The margin of error for a particular sampling method is essentially the same regardless of whether the population of interest is the size of a school, city, state, or country, as long as the sampling fraction is small.
In cases where the sampling fraction is larger (in practice, greater than 5%), analysts might adjust the margin of error using a finite population correction (FPC) to account for the added precision gained by sampling a much larger percentage of the population. FPC can be calculated using the formula[2]
$$FPC = \sqrt{\frac{N-n}{N-1}}$$
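...and so, if poll $P$ were conducted over 24% of, say, an electorate of 300,000 voters (so that $n = 72{,}000$),
$$MOE_{95}(0.5) = z_{0.95}\,\sigma_{\overline{p}} \approx \frac{0.98}{\sqrt{72{,}000}} \approx \pm 0.4\%$$
$$MOE_{95_{FPC}}(0.5) = z_{0.95}\,\sigma_{\overline{p}}\sqrt{\frac{N-n}{N-1}} \approx \frac{0.98}{\sqrt{72{,}000}}\sqrt{\frac{300{,}000 - 72{,}000}{300{,}000 - 1}} \approx \pm 0.3\%$$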
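Intuitively, for appropriately large $N$,
$$\lim_{n \to 0}\sqrt{\frac{N-n}{N-1}} \approx 1$$
$$\lim_{n \to N}\sqrt{\frac{N-n}{N-1}} = 0$$
In the former case, $n$ is so small as to require no correction. In the latter case, the poll effectively becomes a census and sampling error becomes moot.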
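The effect of the correction can be checked numerically. This sketch assumes the usual form of the finite population correction, $\sqrt{(N-n)/(N-1)}$, and uses the electorate of 300,000 with a 24% sample from the text; the function names are illustrative.

```python
from math import sqrt

Z_95 = 1.96  # approximate two-sided 95% quantile of the normal distribution


def finite_population_correction(N, n):
    """Factor that shrinks the standard error when n is a large share of N."""
    if not 0 < n <= N or N < 2:
        raise ValueError("require 0 < n <= N and N >= 2")
    return sqrt((N - n) / (N - 1))


def max_moe_95(n):
    """Worst-case (p = 0.5) 95% margin of error, infinite-population form."""
    return Z_95 * sqrt(0.25 / n)


N = 300_000   # electorate size from the text
n = 72_000    # 24% of the electorate

uncorrected = max_moe_95(n)
corrected = uncorrected * finite_population_correction(N, n)
print(f"without FPC: ±{uncorrected:.2%}")
print(f"with FPC:    ±{corrected:.2%}")

# Limiting behaviour: a tiny sample needs no correction, a census has no error.
print(finite_population_correction(N, 1))   # 1.0
print(finite_population_correction(N, N))   # 0.0
```

At a sampling fraction of 24% the correction factor is about 0.87, which is enough to change the rounded margin from ±0.4% to ±0.3%.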