
In statistics, sampling bias is a bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others. It results in a biased sample[1] of a population (or non-human factors) in which not all individuals, or instances, were equally likely to have been selected.[2] If this is not accounted for, results can be erroneously attributed to the phenomenon under study rather than to the method of sampling.
Medical sources sometimes refer to sampling bias as ascertainment bias.[3][4] Ascertainment bias has basically the same definition,[5][6] but is still sometimes classified as a separate type of bias.[5]
Sampling bias is usually classified as a subtype of selection bias,[7] sometimes specifically termed sample selection bias,[8][9][10] but some classify it as a separate type of bias.[11] One distinction, albeit not universally accepted, is that sampling bias undermines the external validity of a test (the ability of its results to be generalized to the entire population), while selection bias mainly concerns internal validity for differences or similarities found in the sample at hand. In this sense, errors occurring in the process of gathering the sample or cohort cause sampling bias, while errors in any process thereafter cause selection bias.
However, selection bias and sampling bias are often used synonymously.[12]
The study of medical conditions begins with anecdotal reports. By their nature, such reports only include those referred for diagnosis and treatment. A child who can't function in school is more likely to be diagnosed with dyslexia than a child who struggles but passes. A child examined for one condition is more likely to be tested for and diagnosed with other conditions, skewing comorbidity statistics. As certain diagnoses become associated with behavior problems or intellectual disability, parents try to prevent their children from being stigmatized with those diagnoses, introducing further bias. Studies carefully selected from whole populations are showing that many conditions are much more common and usually much milder than formerly believed.

Geneticists are limited in how they can obtain data from human populations. As an example, consider a human characteristic. We are interested in deciding if the characteristic is inherited as a simple Mendelian trait. Following the laws of Mendelian inheritance, if the parents in a family do not have the characteristic, but carry the allele for it, they are carriers (e.g. a non-expressive heterozygote). In this case their children will each have a 25% chance of showing the characteristic. The problem arises because we can't tell which families have both parents as carriers (heterozygous) unless they have a child who exhibits the characteristic. The description follows the textbook by Sutton.[13]
The figure shows the pedigrees of all the possible families with two children when the parents are carriers (Aa).
The probability of each family being selected is given in the figure, along with the sample frequency of affected children. In this simple case, the researcher will look for a frequency of 4⁄7 or 5⁄8 for the characteristic, depending on the type of truncate selection used.
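The two frequencies can be reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions that the text does not spell out: that 4⁄7 corresponds to selecting every family with at least one affected child with equal chance (labeled `complete` here), and that 5⁄8 corresponds to selecting a family with chance proportional to its number of affected children (labeled `single`). The function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Chance that a child of two carrier (Aa) parents shows the characteristic.
P_AFFECTED = Fraction(1, 4)


def family_probability(children):
    """Probability of one ordered pattern of affected (True) / unaffected (False) children."""
    prob = Fraction(1)
    for affected in children:
        prob *= P_AFFECTED if affected else 1 - P_AFFECTED
    return prob


def affected_frequency(n_children, selection_weight):
    """Expected fraction of affected children among the selected families.

    selection_weight(k) is the relative chance that a family with k
    affected children enters the sample (an assumed selection model).
    """
    affected_total = Fraction(0)
    children_total = Fraction(0)
    for children in product([True, False], repeat=n_children):
        k = sum(children)  # number of affected children in this family
        w = family_probability(children) * selection_weight(k)
        affected_total += w * k
        children_total += w * n_children
    return affected_total / children_total


# Assumed scheme 1: any family with at least one affected child is selected.
complete = affected_frequency(2, lambda k: 1 if k > 0 else 0)
# Assumed scheme 2: selection chance proportional to the number of affected children.
single = affected_frequency(2, lambda k: k)

print(complete, single)  # 4/7 5/8
```

Both values exceed the 1⁄4 expected in an unselected sample, because families of carriers with no affected children never enter the data.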
An example of selection bias is called the "caveman effect". Much of our understanding of prehistoric peoples comes from caves, such as cave paintings made nearly 40,000 years ago. If there had been contemporary paintings on trees, animal skins or hillsides, they would have been washed away long ago. Similarly, evidence of fire pits, middens, burial sites, etc. is most likely to remain intact to the modern era in caves. Prehistoric people are associated with caves because that is where the data still exists, not necessarily because most of them lived in caves for most of their lives.[14]
Sampling bias is problematic because a statistic computed from the sample may be systematically erroneous. Sampling bias can lead to a systematic over- or under-estimation of the corresponding parameter in the population. Sampling bias occurs in practice because it is practically impossible to ensure perfect randomness in sampling. If the degree of misrepresentation is small, then the sample can be treated as a reasonable approximation to a random sample. Also, if the over- and under-represented groups do not differ markedly in the quantity being measured, then a biased sample can still give a reasonable estimate.
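A small calculation can make the size of the systematic error concrete. The sketch below is a simplified illustration, not a general method: it assumes a population split into groups, each with its own selection probability and its own mean for the measured quantity, and it uses a large-sample approximation for the expected sample mean. All numbers and names are hypothetical.

```python
def population_mean(strata):
    """True mean over the whole population.

    strata: list of (group_size, selection_probability, group_mean).
    """
    total = sum(n for n, _, _ in strata)
    return sum(n * m for n, _, m in strata) / total


def expected_sample_mean(strata):
    """Large-sample approximation of the mean of a sample in which each
    group is drawn with its own selection probability."""
    expected_selected = sum(n * p for n, p, _ in strata)
    return sum(n * p * m for n, p, m in strata) / expected_selected


# Hypothetical population: two equally sized groups with different means.
# The second group is three times as likely to be sampled as the first.
biased = [(1000, 0.1, 0.2), (1000, 0.3, 0.6)]
# Same groups, equal selection probabilities (no sampling bias).
unbiased = [(1000, 0.2, 0.2), (1000, 0.2, 0.6)]
# Unequal selection, but the groups do not differ in the measured quantity.
same_means = [(1000, 0.1, 0.4), (1000, 0.3, 0.4)]

for label, strata in [("biased", biased), ("unbiased", unbiased), ("same means", same_means)]:
    bias = expected_sample_mean(strata) - population_mean(strata)
    print(f"{label}: bias = {bias:+.3f}")
```

Under these assumptions the bias disappears either when the selection probabilities are equal or when the groups share the same mean, which mirrors the two conditions described above.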
The word bias has a strong negative connotation. Indeed, biases sometimes come from deliberate intent to mislead or other scientific fraud. In statistical usage, bias merely represents a mathematical property, whether it is deliberate, unconscious, or due to imperfections in the instruments used for observation. While some individuals might deliberately use a biased sample to produce misleading results, more often a biased sample is just a reflection of the difficulty in obtaining a truly representative sample, or of ignorance of the bias in the process of measurement or analysis. An example of how ignorance of a bias can exist is the widespread use of a ratio (a.k.a. fold change) as a measure of difference in biology. Because it is easier to achieve a large ratio with two small numbers with a given difference, and relatively more difficult to achieve a large ratio with two large numbers with a larger difference, large significant differences may be missed when comparing relatively large numeric measurements. Some have called this a 'demarcation bias' because the use of a ratio (division) instead of a difference (subtraction) removes the results of the analysis from science into pseudoscience (see demarcation problem).
Some samples use a biased statistical design which nevertheless allows the estimation of parameters. The U.S. National Center for Health Statistics, for example, deliberately oversamples from minority populations in many of its nationwide surveys in order to gain sufficient precision for estimates within these groups.[15] These surveys require the use of sample weights (see later on) to produce proper estimates across all ethnic groups. Provided that certain conditions are met (chiefly that the weights are calculated and used correctly) these samples permit accurate estimation of population parameters.

A classic example of a biased sample and the misleading results it produced occurred in 1936. In the early days of opinion polling, the American Literary Digest magazine collected over two million postal surveys and predicted that the Republican candidate in the U.S. presidential election, Alf Landon, would beat the incumbent president, Franklin Roosevelt, by a large margin. The result was the exact opposite. The Literary Digest survey represented a sample collected from readers of the magazine, supplemented by records of registered automobile owners and telephone users. This sample included an over-representation of wealthy individuals, who, as a group, were more likely to vote for the Republican candidate. In contrast, a poll of only 50 thousand citizens selected by George Gallup's organization successfully predicted the result, leading to the popularity of the Gallup poll.
Another classic example occurred in the 1948 presidential election. On election night, the Chicago Tribune printed the headline DEWEY DEFEATS TRUMAN, which turned out to be mistaken. In the morning the grinning president-elect, Harry S. Truman, was photographed holding a newspaper bearing this headline. The reason the Tribune was mistaken is that its editor trusted the results of a phone survey. Survey research was then in its infancy, and few academics realized that a sample of telephone users was not representative of the general population. Telephones were not yet widespread, and those who had them tended to be prosperous and have stable addresses. (In many cities, the Bell System telephone directory contained the same names as the Social Register.) In addition, the Gallup poll that the Tribune based its headline on was over two weeks old at the time of the printing.[17]
In air quality data, pollutants (such as carbon monoxide, nitrogen monoxide, nitrogen dioxide, or ozone) frequently show high correlations, as they stem from the same chemical process(es). These correlations depend on space (i.e., location) and time (i.e., period). Therefore, a pollutant distribution is not necessarily representative for every location and every period. If a low-cost measurement instrument is calibrated with field data in a multivariate manner, more precisely by collocation next to a reference instrument, the relationships between the different compounds are incorporated into the calibration model. By relocation of the measurement instrument, erroneous results can be produced.[18]
A twenty-first century example is the COVID-19 pandemic, where variations in sampling bias in COVID-19 testing have been shown to account for wide variations in both case fatality rates and the age distribution of cases across countries.[19][20]
If entire segments of the population are excluded from a sample, then there are no adjustments that can produce estimates that are representative of the entire population. But if some groups are underrepresented and the degree of underrepresentation can be quantified, then sample weights can correct the bias. However, the success of the correction is limited by the selection model chosen. If certain variables are missing, the methods used to correct the bias could be inaccurate.[21]
For example, a hypothetical population might include 10 million men and 10 million women. Suppose that a biased sample of 100 patients included 20 men and 80 women. A researcher could correct for this imbalance by attaching a weight of 2.5 to each male and 0.625 to each female. This would adjust any estimates to achieve the same expected value as a sample that included exactly 50 men and 50 women, unless men and women differed in their likelihood of taking part in the survey.
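The weights in this example can be derived as the ratio of each group's population share to its sample share. The following is a minimal sketch of that arithmetic; the function names and the outcome data (10 of 20 men and 20 of 80 women showing some outcome) are hypothetical and only serve to show how the weights shift an estimate.

```python
def group_weights(population_counts, sample_counts):
    """Weight per group = population share / sample share."""
    pop_total = sum(population_counts.values())
    sample_total = sum(sample_counts.values())
    return {
        group: population_counts[group] * sample_total
        / (pop_total * sample_counts[group])
        for group in sample_counts
    }


def weighted_mean(values, groups, weights):
    """Mean of values where each observation carries its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(weights[g] * v for v, g in zip(values, groups)) / total_weight


population = {"men": 10_000_000, "women": 10_000_000}
sample = {"men": 20, "women": 80}
weights = group_weights(population, sample)
print(weights)  # {'men': 2.5, 'women': 0.625}

# Hypothetical outcome data: 1 = outcome present, 0 = absent.
groups = ["men"] * 20 + ["women"] * 80
values = [1] * 10 + [0] * 10 + [1] * 20 + [0] * 60

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, groups, weights)
print(unweighted, weighted)  # 0.3 0.375
```

With these made-up data the unweighted estimate is pulled toward the over-represented group, while the weighted estimate equals the average of the two group rates (0.5 and 0.25), as it would in a sample of 50 men and 50 women.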