The effective population size ($N_e$) is the size of an idealised population that would experience the same rate of genetic drift as the real population.[1] Idealised populations are those where each locus evolves independently, following the assumptions of the neutral theory of molecular evolution. The effective population size is normally smaller than the census population size $N$. This can be due to chance events that prevent some individuals from breeding, to occasional population bottlenecks, to background selection, and to genetic hitchhiking.
The same real population could have a different effective population size for different properties of interest, such as genetic drift (or more precisely, the speed of coalescence) over one generation vs. over many generations. Within a species, areas of the genome that have more genes and/or less genetic recombination tend to have lower effective population sizes, because of the effects of selection at linked sites. In a population with selection at many loci and abundant linkage disequilibrium, the coalescent effective population size may not reflect the census population size at all, or may reflect its logarithm.
The concept of effective population size was introduced in the field of population genetics in 1931 by the American geneticist Sewall Wright.[2][3] Some versions of the effective population size are used in wildlife conservation.
In a rare experiment that directly measured genetic drift one generation at a time, in Drosophila populations of census size 16, the effective population size was 11.5.[4] This measurement was achieved through studying changes in the frequency of a neutral allele from one generation to another in over 100 replicate populations.
More commonly, effective population size is estimated indirectly by comparing data on current within-species genetic diversity to theoretical expectations. According to the neutral theory of molecular evolution, an idealised diploid population will have a pairwise nucleotide diversity equal to $4\mu N_e$, where $\mu$ is the mutation rate. The effective population size can therefore be estimated empirically by dividing the nucleotide diversity by $4\mu$.[5] This captures the cumulative effects of genetic drift, genetic hitchhiking, and background selection over longer timescales. More advanced methods, permitting a changing effective population size over time, have also been developed.[6]
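As a minimal sketch of this indirect estimate, the neutral-theory relation (nucleotide diversity equal to four times the mutation rate times the effective size) can be inverted in a few lines of Python. The function name and the input values are illustrative assumptions, not figures from the cited studies.

```python
def ne_from_diversity(pi: float, mu: float) -> float:
    """Estimate N_e for a diploid population from pi = 4 * mu * N_e."""
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return pi / (4.0 * mu)


# Hypothetical inputs: pi = 0.001 per site, mu = 1.25e-8 per site per generation.
example_ne = ne_from_diversity(0.001, 1.25e-8)
```

Because the estimate scales inversely with the mutation rate, any uncertainty in that rate carries over proportionally to the estimated effective size.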
The effective size measured to reflect these longer timescales may have little relationship to the number of individuals physically present in a population.[7] Measured effective population sizes vary between genes in the same population, being low in genome areas of low recombination and high in genome areas of high recombination.[8][9] Sojourn times are proportional to $N$ in neutral theory, but for alleles under selection, sojourn times are proportional to $\log(N)$. Genetic hitchhiking can cause neutral mutations to have sojourn times proportional to $\log(N)$: this may explain the relationship between measured effective population size and the local recombination rate.[10]
If the recombination map of recombination frequencies along chromosomes is known, $N_e$ can be inferred from $r_P^2 = 1 / (1 + 4 N_e r)$, where $r_P$ is the Pearson correlation coefficient between loci and $r$ is the recombination frequency between them.[11] This expression can be interpreted as the probability that two lineages coalesce before one allele on either lineage recombines onto some third lineage.[6]
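The linkage-disequilibrium relation can be solved for the effective size directly. The sketch below assumes a single pair of loci with a known squared Pearson correlation and a known recombination frequency between them. The function name and the example values are hypothetical.

```python
def ne_from_ld(r_squared: float, recomb: float) -> float:
    """Solve r_P^2 = 1 / (1 + 4 * N_e * r) for N_e."""
    if not 0.0 < r_squared <= 1.0:
        raise ValueError("r_squared must lie in (0, 1]")
    if recomb <= 0.0:
        raise ValueError("recombination frequency must be positive")
    return (1.0 / r_squared - 1.0) / (4.0 * recomb)


# Hypothetical pair of loci: r_P^2 = 1/41 at recombination frequency 0.01.
example_ne = ne_from_ld(1.0 / 41.0, 0.01)
```

In practice many locus pairs would be combined; the single-pair form is shown only to make the algebra explicit.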
The population size might not be constant over time, and thus neither might the effective population size (defined as coalescence speed). With a constant population size, we expect larger pairwise Hamming distance between sequences to be rarer.[12] Under population expansion, an intermediate Hamming distance is instead most common; this is seen for humans. A skyline plot more directly describes coalescence speed over time.[13] The pairwise sequential Markovian coalescent[14] and multiple sequential Markovian coalescent[15] take the average of skyline plots over many loci. An alternative approach infers effective population size over time, together with migration among populations, using the allele frequency spectrum, describing how often alleles are rare versus common. Yet another approach exploits runs of homozygosity to incorporate information from recombination events.[16]
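The distribution of pairwise Hamming distances mentioned above can be tabulated with a short Python sketch. The function names and the toy aligned sequences are made up for illustration; real analyses would use many more and much longer sequences.

```python
from collections import Counter
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(seqs):
    """Count how many sequence pairs fall at each Hamming distance."""
    return Counter(hamming(a, b) for a, b in combinations(seqs, 2))


# Toy alignment (hypothetical data).
toy = ["ACGTAC", "ACGTAT", "ACCTAT", "TCGTAC"]
dist = mismatch_distribution(toy)
```

A distribution that falls off with distance would be the constant-size expectation described above, while a peak at an intermediate distance would be the expansion signature.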
A survey of publications on 102 mostly wildlife animal and plant species yielded 192 $N_e/N$ ratios. Seven different estimation methods were used in the surveyed studies. Accordingly, the ratios ranged widely from $10^{-6}$ for Pacific oysters to 0.994 for humans, with an average of 0.34 across the examined species. Based on these data, the authors subsequently estimated more comprehensive ratios, accounting for fluctuations in population size, variance in family size and unequal sex ratio. These ratios average only 0.10–0.11.[17]
A genealogical analysis of Inuit hunter-gatherers determined the effective-to-census population size ratio for haploid (mitochondrial DNA, Y-chromosomal DNA) and diploid (autosomal DNA) loci separately: the ratio was estimated as 0.6–0.7 for autosomal and X-chromosomal DNA, 0.7–0.9 for mitochondrial DNA and 0.5 for Y-chromosomal DNA.[18]
In an idealised Wright-Fisher model, the fate of an allele, beginning at an intermediate frequency, is largely determined by selection if the selection coefficient $s \gg 1/N$, and largely determined by neutral genetic drift if $s \ll 1/N$. In real populations, the cutoff value of $s$ may depend instead on local recombination rates.[19][20] This limit to selection in a real population may be captured in a toy Wright-Fisher simulation through the appropriate choice of $N_e$.
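A toy simulation of this kind might look like the following sketch. It assumes a haploid Wright-Fisher population of `n` individuals (with `n` standing in for the chosen effective size), two alleles, and a selection coefficient `s` applied before binomial sampling. All names, parameter values and the fixed seed are illustrative assumptions.

```python
import random


def fixation_fraction(n, s, p0=0.5, replicates=200, seed=1):
    """Fraction of replicate populations in which the focal allele fixes."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(replicates):
        count = round(p0 * n)
        while 0 < count < n:
            p = count / n
            # Selection shifts the expected frequency of the focal allele.
            p_sel = p * (1 + s) / (1 + s * p)
            # Drift: each of the n offspring draws its allele independently.
            count = sum(rng.random() < p_sel for _ in range(n))
        if count == n:
            fixed += 1
    return fixed / replicates


strong = fixation_fraction(50, 0.5)   # s much larger than 1/n
neutral = fixation_fraction(50, 0.0)  # no selection: drift alone decides
```

With `s` far above `1/n` the allele fixes in nearly every replicate, whereas with `s = 0` it fixes in roughly half of them, matching its starting frequency.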
The ability of a species to differentiate between nearly neutral alleles can be measured by how codon bias differs from neutral expectations.[21] The $K_a/K_s$ ratio is also sometimes used as a proxy.[22]
The drift-barrier hypothesis claims that populations with different selection effective population sizes are predicted to evolve profoundly different genome architectures.[23][24]
Ronald Fisher and Sewall Wright originally defined effective population size as "the number of breeding individuals in an idealised population that would show the same amount of dispersion of allele frequencies under random genetic drift or the same amount of inbreeding as the population under consideration". This implied two potentially different effective population sizes, based either on the one-generation increase in variance across replicate populations (variance effective population size), or on the one-generation change in the inbreeding coefficient (inbreeding effective population size). These two are closely linked, and derived from F-statistics, but they are not identical.[25]
Today, the effective population size is usually estimated empirically with respect to the amount of within-species genetic diversity divided by the mutation rate, yielding a coalescent effective population size that reflects the cumulative effects of genetic drift, background selection, and genetic hitchhiking over longer time periods.[5] Another important effective population size is the selection effective population size $1/s_\text{critical}$, where $s_\text{critical}$ is the critical value of the selection coefficient at which selection becomes more important than genetic drift.[19]
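In the Wright-Fisher idealized population model, the conditional variance of the allele frequency $p'$, given the allele frequency $p$ in the previous generation, is
$$\operatorname{var}(p' \mid p) = \frac{p(1-p)}{2N}.$$
Let $\widehat{\operatorname{var}}(p' \mid p)$ denote the same, typically larger, variance in the actual population under consideration. The variance effective population size $N_e^{(v)}$ is defined as the size of an idealized population with the same variance. This is found by substituting $\widehat{\operatorname{var}}(p' \mid p)$ for $\operatorname{var}(p' \mid p)$ and solving for $N$, which gives
$$N_e^{(v)} = \frac{p(1-p)}{2\,\widehat{\operatorname{var}}(p' \mid p)}.$$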
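As a rough illustration, the variance effective size can be computed from replicate one-generation frequency changes, taking the idealized variance to be p(1 − p)/(2N) and the effective size to be p(1 − p) divided by twice the observed variance. The sketch below generates replicates from an ideal Wright-Fisher population of 16 diploid individuals, so the estimate should come out near 16. The function names, replicate count and seed are assumptions for illustration.

```python
import random


def variance_effective_size(p, next_freqs):
    """N_e^(v) = p(1-p) / (2 * var), with var the mean squared change from p."""
    var = sum((q - p) ** 2 for q in next_freqs) / len(next_freqs)
    return p * (1 - p) / (2 * var)


def simulate_next_freqs(n, p, replicates, seed=0):
    """One generation of Wright-Fisher sampling of 2n gene copies per replicate."""
    rng = random.Random(seed)
    copies = 2 * n
    return [
        sum(rng.random() < p for _ in range(copies)) / copies
        for _ in range(replicates)
    ]


freqs = simulate_next_freqs(n=16, p=0.5, replicates=20000)
estimate = variance_effective_size(0.5, freqs)
```

Applied to data from a real population, an observed variance larger than the idealized one would yield an estimate below the census size.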
In the following examples, one or more of the assumptions of a strictly idealised population are relaxed, while other assumptions are retained. The variance effective population size of the more relaxed population model is then calculated with respect to the strict model.
Population size varies over time. Suppose there are $t$ non-overlapping generations; then the effective population size is given by the harmonic mean of the population sizes:[26]
$$\frac{1}{N_e} = \frac{1}{t}\sum_{i=1}^{t}\frac{1}{N_i}$$
For example, say the population size was $N$ = 10, 100, 50, 80, 20, 500 for six generations ($t = 6$). Then the effective population size is the harmonic mean of these, giving:
$$\frac{1}{N_e} = \frac{\tfrac{1}{10}+\tfrac{1}{100}+\tfrac{1}{50}+\tfrac{1}{80}+\tfrac{1}{20}+\tfrac{1}{500}}{6} = \frac{0.1945}{6} \approx 0.03242, \qquad N_e \approx 30.8$$
Note this is less than the arithmetic mean of the population size, which in this example is 126.7. The harmonic mean tends to be dominated by the smallest bottleneck that the population goes through.
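The six-generation example can be checked numerically. This short sketch assumes the harmonic-mean relation (the reciprocal of the effective size equals the average of the reciprocal per-generation sizes); the function name is illustrative.

```python
def harmonic_mean_ne(sizes):
    """Effective size as the harmonic mean of per-generation population sizes."""
    if any(n <= 0 for n in sizes):
        raise ValueError("population sizes must be positive")
    return len(sizes) / sum(1.0 / n for n in sizes)


sizes = [10, 100, 50, 80, 20, 500]
ne = harmonic_mean_ne(sizes)            # about 30.8
arithmetic = sum(sizes) / len(sizes)    # about 126.7
```

Dropping the single generation of size 10 from the list raises the harmonic mean substantially, which shows how strongly the smallest bottleneck dominates.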
If a population is dioecious, i.e. there is no self-fertilisation, then
$$N_e = N + \tfrac{1}{2}$$
or more generally,
$$N_e = N + \tfrac{D}{2}$$
where $D$ represents dioeciousness and may take the value 0 (for not dioecious) or 1 (for dioecious).
When $N$ is large, $N_e$ approximately equals $N$, so this is usually trivial and often ignored:
$$N_e = N + \tfrac{1}{2} \approx N$$
If population size is to remain constant, each individual must contribute on average two gametes to the next generation. An idealized population assumes that this follows a Poisson distribution so that the variance of the number of gametes contributed, $k$, is equal to the mean number contributed, i.e. 2:
$$\operatorname{var}(k) = \bar{k} = 2.$$
However, in natural populations the variance is often larger than this. The vast majority of individuals may have no offspring, and the next generation stems only from a small number of individuals, so
$$\operatorname{var}(k) > 2.$$
The effective population size is then smaller, and given by:
$$N_e^{(v)} = \frac{4N - 2D}{2 + \operatorname{var}(k)}$$
Note that if the variance of $k$ is less than 2, $N_e$ is greater than $N$. In the extreme case of a population experiencing no variation in family size, in a laboratory population in which the number of offspring is artificially controlled, $V_k = \operatorname{var}(k) = 0$ and $N_e = 2N$.
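The effect of family-size variance can be explored with a small sketch. It assumes the relation in which the effective size equals (4N − 2D) divided by (2 + V_k), with D the dioeciousness indicator (0 or 1) and V_k the variance in gametes contributed. The function name and the census size of 100 are illustrative.

```python
def ne_family_variance(n, var_k, dioecious=False):
    """Variance effective size given the variance in gametes contributed."""
    d = 1 if dioecious else 0
    return (4 * n - 2 * d) / (2 + var_k)


poisson_case = ne_family_variance(100, 2)    # idealized: equals N
equalized_case = ne_family_variance(100, 0)  # no family-size variance: 2N
skewed_case = ne_family_variance(100, 6)     # high variance: below N
```

The three cases reproduce the qualitative statements above: Poisson variance recovers the census size, zero variance doubles it, and excess variance reduces it.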
When the sex ratio of a population varies from the Fisherian 1:1 ratio, effective population size is given by:
$$N_e^{(v)} = N_e^{(F)} = \frac{4 N_m N_f}{N_m + N_f}$$
where $N_m$ is the number of males and $N_f$ the number of females. For example, with 80 males and 20 females (an absolute population size of 100):
$$N_e = \frac{4 \times 80 \times 20}{80 + 20} = 64$$
Again, this results in $N_e$ being less than $N$.
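The 80:20 example can be reproduced with a one-line function. The sketch assumes the standard sex-ratio relation, in which the effective size is four times the product of the male and female counts divided by their sum; the function name is illustrative.

```python
def ne_sex_ratio(males, females):
    """Effective size for unequal numbers of breeding males and females."""
    if males <= 0 or females <= 0:
        raise ValueError("both sexes must be present")
    return 4 * males * females / (males + females)


skewed = ne_sex_ratio(80, 20)    # 64.0
balanced = ne_sex_ratio(50, 50)  # 100.0, the 1:1 case recovers N
```

The effective size is limited mainly by the rarer sex, so adding more individuals of the common sex changes it very little.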
Alternatively, the effective population size may be defined by noting how the average inbreeding coefficient changes from one generation to the next, and then defining $N_e$ as the size of the idealized population that has the same change in average inbreeding coefficient as the population under consideration. The presentation follows Kempthorne (1957).[27]
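For the idealized population, the inbreeding coefficients follow the recurrence equation
$$F_t = \frac{1}{N}\left(\frac{1+F_{t-2}}{2}\right) + \left(1-\frac{1}{N}\right)F_{t-1}.$$
Using the panmictic index ($1 - F$) instead of the inbreeding coefficient, we get the approximate recurrence equation
$$1 - F_t = P_t = P_0\left(1-\frac{1}{2N}\right)^t.$$
The difference per generation is
$$\frac{P_{t+1}}{P_t} = 1-\frac{1}{2N}.$$
The inbreeding effective size can be found by solving
$$\frac{P_{t+1}}{P_t} = 1-\frac{1}{2N_e^{(F)}}.$$
This is
$$N_e^{(F)} = \frac{1}{2\left(1-\frac{P_{t+1}}{P_t}\right)}.$$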
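As a consistency check on this definition, the sketch below assumes the approximate idealized recurrence, in which the panmictic index after t generations is its starting value multiplied by (1 − 1/(2N)) raised to the power t, and recovers the population size from two successive values. The function names and the chosen size of 25 are illustrative.

```python
def panmictic_index(p0, n, t):
    """Panmictic index (1 - F) after t generations in an idealized population."""
    return p0 * (1 - 1 / (2 * n)) ** t


def inbreeding_effective_size(p_t, p_next):
    """Solve P_{t+1} / P_t = 1 - 1 / (2 * N_e) for N_e."""
    if p_next >= p_t:
        raise ValueError("panmictic index must decline between generations")
    return 1 / (2 * (1 - p_next / p_t))


p5 = panmictic_index(1.0, 25, 5)
p6 = panmictic_index(1.0, 25, 6)
recovered = inbreeding_effective_size(p5, p6)  # close to 25
```

For a real population the two index values would come from pedigree or marker data rather than from the idealized recurrence.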
When organisms live longer than one breeding season, effective population sizes have to take into account the life tables for the species.
Assume a haploid population with discrete age structure. An example might be an organism that can survive several discrete breeding seasons. Further, define the following age structure characteristics:
The generation time is calculated as
Then, the inbreeding effective population size is[28]
Similarly, the inbreeding effective number can be calculated for a diploid population with discrete age structure. This was first given by Johnson,[29] but the notation more closely resembles Emigh and Pollak.[30]
Assume the same basic parameters for the life table as given for the haploid case, but distinguishing between male and female, such as $N_0^f$ and $N_0^m$ for the number of newborn females and males, respectively (notice lower case $f$ for females, compared to upper case $F$ for inbreeding).
The inbreeding effective number is