Quantitative genetics is the study of quantitative traits, which are phenotypes that vary continuously (such as height or mass), as opposed to phenotypes and gene-products that are discretely identifiable (such as eye-colour, or the presence of a particular biochemical).
Both quantitative genetics and population genetics use the frequencies of different alleles of a gene in breeding populations (gamodemes), and combine them with concepts from simple Mendelian inheritance to analyze inheritance patterns across generations and descendant lines. While population genetics can focus on particular genes and their subsequent metabolic products, quantitative genetics focuses more on the outward phenotypes, and makes only summaries of the underlying genetics.
Due to the continuous distribution of phenotypic values, quantitative genetics must employ many other statistical methods (such as the effect size, the mean and the variance) to link phenotypes (attributes) to genotypes. Some phenotypes may be analyzed either as discrete categories or as continuous phenotypes, depending on the definition of cut-off points, or on the metric used to quantify them.[1]: 27–69 Mendel himself had to discuss this matter in his famous paper,[2] especially with respect to his peas' attribute tall/dwarf, which actually was derived by adding a cut-off point to "length of stem".[3][4] Analysis of quantitative trait loci, or QTLs,[5][6][7] is a more recent addition to quantitative genetics, linking it more directly to molecular genetics.
In diploid organisms, the average genotypic "value" (locus value) may be defined by the allele "effect" together with a dominance effect, and also by how genes interact with genes at other loci (epistasis). The founder of quantitative genetics, Sir Ronald Fisher, perceived much of this when he proposed the first mathematics of this branch of genetics.[8]
Being a statistician, he defined the gene effects as deviations from a central value, enabling the use of statistical concepts such as mean and variance, which use this idea.[9] The central value he chose for the gene was the midpoint between the two opposing homozygotes at the one locus. The deviation from there to the "greater" homozygous genotype can be named "+a"; and therefore it is "−a" from that same midpoint to the "lesser" homozygous genotype. This is the "allele" effect mentioned above. The heterozygote deviation from the same midpoint can be named "d", this being the "dominance" effect referred to above.[10] The diagram depicts the idea. However, in reality we measure phenotypes, and the figure also shows how observed phenotypes relate to the gene effects. Formal definitions of these effects recognize this phenotypic focus.[11][12] Epistasis has been approached statistically as interaction (i.e., inconsistencies),[13] but epigenetics suggests a new approach may be needed.
If $0 < d < a$, the dominance is regarded as partial or incomplete, while $d = a$ indicates full or classical dominance. Previously, $d > a$ was known as "over-dominance".[14]
Mendel's pea attribute "length of stem" provides us with a good example.[3] Mendel stated that the tall true-breeding parents ranged from 6 to 7 feet in stem length (183–213 cm), giving a median of 198 cm (= P1). The short parents ranged from 0.75 to 1.5 feet in stem length (23–46 cm), with a rounded median of 34 cm (= P2). Their hybrid ranged from 6 to 7.5 feet in length (183–229 cm), with a median of 206 cm (= F1). The mean of P1 and P2 is 116 cm, this being the phenotypic value of the homozygotes midpoint (mp). The allele effect (a) is [P1 − mp] = 82 cm = −[P2 − mp]. The dominance effect (d) is [F1 − mp] = 90 cm.[15] This historical example illustrates clearly how phenotype values and gene effects are linked.
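As a quick check on the arithmetic, the following minimal Python sketch recomputes mp, a and d from the three medians quoted above. The function name `gene_effects` is hypothetical and not taken from any library.

```python
def gene_effects(p1, p2, f1):
    """Return (mp, a, d): homozygote midpoint, allele effect, dominance effect."""
    mp = (p1 + p2) / 2   # midpoint between the two opposing homozygotes
    a = p1 - mp          # deviation of the "greater" homozygote from mp
    d = f1 - mp          # deviation of the heterozygote (F1) from mp
    return mp, a, d

# Rounded medians (cm) for Mendel's tall parent, dwarf parent and hybrid
mp, a, d = gene_effects(p1=198, p2=34, f1=206)
print(mp, a, d)   # 116.0 82.0 90.0
```

Here d (90 cm) exceeds a (82 cm), which by the definitions given earlier places this example in the over-dominance range.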
To obtain means, variances and other statistics, both quantities and their occurrences are required. The gene effects (above) provide the framework for quantities, and the frequencies of the contrasting alleles in the fertilization gamete-pool provide the information on occurrences.
Commonly, the frequency of the allele causing "more" in the phenotype (including dominance) is given the symbol p, while the frequency of the contrasting allele is q. An initial assumption made when establishing the algebra was that the parental population was infinite and randomly mating, which was made simply to facilitate the derivation. The subsequent mathematical development also implied that the frequency distribution within the effective gamete-pool was uniform: there were no local perturbations where p and q varied. Looking at the diagrammatic analysis of sexual reproduction, this is the same as declaring that $p_P = p_g = p$, and similarly for q.[14] This mating system, dependent upon these assumptions, became known as "panmixia".
Panmixia rarely actually occurs in nature,[16]: 152–180 [17] as gamete distribution may be limited, for example by dispersal restrictions or by behaviour, or by chance sampling (those local perturbations mentioned above). It is well known that there is a huge wastage of gametes in nature, which is why the diagram depicts a potential gamete-pool separately from the actual gamete-pool. Only the latter sets the definitive frequencies for the zygotes: this is the true "gamodeme" ("gamo" refers to the gametes, and "deme" derives from Greek for "population"). But, under Fisher's assumptions, the gamodeme can be effectively extended back to the potential gamete-pool, and even back to the parental base-population (the "source" population). The random sampling arising when small "actual" gamete-pools are sampled from a large "potential" gamete-pool is known as genetic drift, and is considered subsequently.
While panmixia may not be widely extant, the potential for it does occur, although it may be only ephemeral because of those local perturbations. It has been shown, for example, that the F2 derived from random fertilization of F1 individuals (an allogamous F2), following hybridization, is an origin of a new potentially panmictic population.[18][19] It has also been shown that if panmictic random fertilization occurred continually, it would maintain the same allele and genotype frequencies across each successive panmictic sexual generation, this being the Hardy–Weinberg equilibrium.[13]: 34–39 [20][21][22][23] However, as soon as genetic drift was initiated by local random sampling of gametes, the equilibrium would cease.
Male and female gametes within the actual fertilizing pool are usually considered to have the same frequencies for their corresponding alleles. (Exceptions have been considered.) This means that when p male gametes carrying the A allele randomly fertilize p female gametes carrying that same allele, the resulting zygote has genotype AA, and, under random fertilization, the combination occurs with a frequency of $p \times p = p^2$. Similarly, the zygote aa occurs with a frequency of $q^2$. Heterozygotes (Aa) can arise in two ways: when p male (A allele) gametes randomly fertilize q female (a allele) gametes, and vice versa. The resulting frequency for the heterozygous zygotes is thus $2pq$.[13]: 32 Notice that such a population is never more than half heterozygous, because $2pq = 2p(1 - p)$ is largest when $p = q = 0.5$.
In summary then, under random fertilization, the zygote (genotype) frequencies are the quadratic expansion of the gametic (allelic) frequencies: $(p + q)^2 = p^2 + 2pq + q^2 = 1$. (The "= 1" states that the frequencies are in fraction form, not percentages, and that there are no omissions within the framework proposed.)
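The quadratic expansion can be tabulated with a short Python sketch. It assumes random fertilization with equal allele frequencies in male and female gametes, as in the text; the function name `random_fertilization` is hypothetical. The heterozygote column peaks at 0.50 when p = 0.5.

```python
def random_fertilization(p):
    """Zygote frequencies (AA, Aa, aa) from the expansion (p + q)^2."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    AA, Aa, aa = random_fertilization(p)
    print(f"p={p:.1f}  AA={AA:.2f}  Aa={Aa:.2f}  aa={aa:.2f}")
```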
Notice that "random fertilization" and "panmixia" are not synonyms.
Mendel's pea experiments were constructed by establishing true-breeding parents with "opposite" phenotypes for each attribute.[3] This meant that each opposite parent was homozygous for its respective allele only. In our example, "tall vs dwarf", the tall parent would be genotype TT with $p = 1$ (and $q = 0$), while the dwarf parent would be genotype tt with $q = 1$ (and $p = 0$). After controlled crossing, their hybrid is Tt, with $p = q = 1/2$. However, the frequency of this heterozygote is 1, because this is the F1 of an artificial cross: it has not arisen through random fertilization.[24] The F2 generation was produced by natural self-pollination of the F1 (with monitoring against insect contamination), resulting in $p = q = 1/2$ being maintained. Such an F2 is said to be "autogamous". However, the genotype frequencies (0.25 TT, 0.5 Tt, 0.25 tt) have arisen through a mating system very different from random fertilization, and therefore the use of the quadratic expansion has been avoided. The numerical values obtained were the same as those for random fertilization only because this is the special case of having originally crossed homozygous opposite parents.[25] Because of the dominance of T- [frequency (0.25 + 0.5)] over tt [frequency 0.25], the 3:1 ratio is still obtained.
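The autogamous F2 can be reproduced by enumerating the gametes of a single selfed Tt plant. This Python sketch is illustrative only: `self_fertilize` is a hypothetical helper, and it assumes simple Mendelian segregation, with each allele carried by half the gametes and the dominant allele written in upper case.

```python
from collections import Counter
from itertools import product

def self_fertilize(genotype):
    """Frequencies of the equally likely zygotes from selfing one diploid plant."""
    # Each gamete carries one of the two alleles; sorting places the upper-case
    # (dominant) allele first, so "tT" and "Tt" are counted as the same genotype.
    zygotes = ["".join(sorted(pair)) for pair in product(genotype, repeat=2)]
    total = len(zygotes)
    return {g: n / total for g, n in Counter(zygotes).items()}

f2 = self_fertilize("Tt")
print(f2)                      # {'TT': 0.25, 'Tt': 0.5, 'tt': 0.25}
tall = f2["TT"] + f2["Tt"]     # T- phenotypes (dominance of T)
print(tall / f2["tt"])         # 3.0
```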
A cross such as Mendel's, where true-breeding (largely homozygous) opposite parents are crossed in a controlled way to produce an F1, is a special case of hybrid structure. The F1 is often regarded as "entirely heterozygous" for the gene under consideration. However, this is an over-simplification and does not apply generally, for example when individual parents are not homozygous, or when populations inter-hybridise to form hybrid swarms.[24] The general properties of intra-species hybrids (F1) and F2 (both "autogamous" and "allogamous") are considered in a later section.
Having noted that the pea is naturally self-pollinated, we cannot continue to use it as an example for illustrating random fertilization properties. Self-fertilization ("selfing") is a major alternative to random fertilization, especially within plants. Most of the Earth's cereals are naturally self-pollinated (rice, wheat, barley, for example), as well as the pulses. Considering the millions of individuals of each of these on Earth at any time, it is obvious that self-fertilization is at least as significant as random fertilization. Self-fertilization is the most intensive form of inbreeding, which arises whenever there is restricted independence in the genetical origins of gametes. Such reduction in independence arises if parents are already related, and/or from genetic drift or other spatial restrictions on gamete dispersal. Path analysis demonstrates that these are tantamount to the same thing.[26][27] Arising from this background, the inbreeding coefficient (often symbolized as F or f) quantifies the effect of inbreeding from whatever cause. There are several formal definitions of f, and some of these are considered in later sections. For the present, note that for a long-term self-fertilized species $f = 1$. Natural self-fertilized populations are not single "pure lines", however, but mixtures of such lines. This becomes particularly obvious when considering more than one gene at a time. Therefore, allele frequencies (p and q) other than 1 or 0 are still relevant in these cases (refer back to the Mendel Cross section). The genotype frequencies take a different form, however.
In general, the genotype frequencies become $p^2(1 - f) + pf$ for AA, $2pq(1 - f)$ for Aa and $q^2(1 - f) + qf$ for aa.[13]: 65
Notice that the frequency of the heterozygote declines in proportion to f. When $f = 1$, these three frequencies become respectively p, 0 and q. Conversely, when $f = 0$, they reduce to the random-fertilization quadratic expansion shown previously.
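The inbred genotype frequencies are easy to evaluate directly. This hypothetical Python helper assumes a single locus with two alleles and applies the formulas just given; setting f = 0 or f = 1 recovers the two limiting cases.

```python
def genotype_frequencies(p, f=0.0):
    """(AA, Aa, aa) frequencies at inbreeding level f (f = 0: random fertilization)."""
    q = 1 - p
    AA = p * p * (1 - f) + f * p
    Aa = 2 * p * q * (1 - f)       # heterozygotes decline in proportion to f
    aa = q * q * (1 - f) + f * q
    return AA, Aa, aa

print(genotype_frequencies(0.6, f=0.0))    # random-fertilization expansion
print(genotype_frequencies(0.6, f=0.25))   # partial inbreeding
print(genotype_frequencies(0.6, f=1.0))    # long-term selfing: (p, 0, q)
```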
The population mean shifts the central reference point from the homozygote midpoint (mp) to the mean of a sexually reproduced population. This is important not only to relocate the focus into the natural world, but also to use a measure of central tendency used by statistics/biometrics. In particular, the square of this mean is the correction factor, which is used to obtain the genotypic variances later.[9]
For each genotype in turn, its effect (+a, d or −a) is multiplied by its genotype frequency, and the products are accumulated across all genotypes in the model. Some algebraic simplification usually follows to reach a succinct result.
The contribution of AA is $p^2(+a)$, that of Aa is $2pqd$, and that of aa is $q^2(-a)$. Gathering together the two a terms and accumulating over all, the result is $a(p^2 - q^2) + 2pqd$. Simplification is achieved by noting that $(p^2 - q^2) = (p - q)(p + q)$, and by recalling that $(p + q) = 1$, thereby reducing the first term to $a(p - q)$.
The succinct result is therefore $G = a(p - q) + 2pqd$.[14]: 110
This defines the population mean as an "offset" from the homozygote midpoint (recall a and d are defined as deviations from that midpoint). The Figure depicts G across all values of p for several values of d, including one case of slight over-dominance. Notice that G is often negative, thereby emphasizing that it is itself a deviation (from mp).
Finally, to obtain the actual population mean in "phenotypic space", the midpoint value is added to this offset: $P = G + mp$.
An example arises from data on ear length in maize.[28]: 103 Assuming for now that one gene only is represented, $a = 5.45$ cm, $d = 0.12$ cm [virtually "0", really], and $mp = 12.05$ cm. Further assuming that $p = 0.6$ and $q = 0.4$ in this example population, then:
G = 5.45 (0.6 − 0.4) + (0.48) 0.12 = 1.15 cm (rounded), where 0.48 = 2pq; and
P = 1.15 + 12.05 = 13.20 cm (rounded).
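The maize arithmetic can be reproduced with a small Python sketch. The function name is hypothetical; it simply applies $G = a(p - q) + 2pqd$ and $P = G + mp$ from the text.

```python
def population_mean(a, d, mp, p):
    """Random-fertilization offset G = a(p - q) + 2pqd and mean P = G + mp."""
    q = 1 - p
    G = a * (p - q) + 2 * p * q * d   # offset from the homozygote midpoint
    return G, G + mp

G, P = population_mean(a=5.45, d=0.12, mp=12.05, p=0.6)
print(f"G = {G:.2f} cm, P = {P:.2f} cm")   # G = 1.15 cm, P = 13.20 cm
```

Summing the genotype contributions $p^2 a + 2pqd - q^2 a$ directly gives the same offset, which is a convenient way to check the simplification step.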
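Under long-term self-fertilization ($f = 1$) there are no heterozygotes, so the contribution of AA is $p(+a)$, while that of aa is $q(-a)$. [See above for the frequencies.] Gathering these two a terms together leads to an immediately very simple final result:
$G_{(f=1)} = a(p - q)$. As before, $P_{(f=1)} = G_{(f=1)} + mp$.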
Often, "G(f=1)" is abbreviated to "G1".
Mendel's peas can provide us with the allele effects and midpoint (see previously), and a mixed self-pollinated population with p = 0.6 and q = 0.4 provides example frequencies. Thus:
G(f=1) = 82 (0.6 − 0.4) = 16.4 cm; and
P(f=1) = 16.4 + 116 = 132.4 cm.
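A similar sketch covers the self-fertilized case, using Mendel's effects (a = 82 cm, d = 90 cm, mp = 116 cm) with p = 0.6. The random-fertilization offset is computed alongside it for comparison; the function and variable names are hypothetical.

```python
def selfed_mean(a, mp, p):
    """Long-term selfing (f = 1): no heterozygotes, so G(f=1) = a(p - q)."""
    q = 1 - p
    G1 = a * (p - q)
    return G1, G1 + mp

a, d, mp, p = 82, 90, 116, 0.6          # Mendel's stem-length effects (cm)
G_selfed, P_selfed = selfed_mean(a, mp, p)

# For comparison only: the random-fertilization offset G = a(p - q) + 2pqd
q = 1 - p
G_random = a * (p - q) + 2 * p * q * d

print(f"G(f=1) = {G_selfed:.1f} cm, P(f=1) = {P_selfed:.1f} cm")
print(f"random-fertilization G = {G_random:.1f} cm")
```

The contrast shows how much of the random-fertilization mean comes from the dominance term $2pqd$, which vanishes under complete selfing.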
A general formula incorporates the inbreeding coefficient f, and can then accommodate any situation. The procedure is exactly the same as before, using the weighted genotype frequencies given earlier. After translation into our symbols, and further rearrangement:[13]: 77–78

$$G_f = a(p - q) + (1 - f)\,2pqd = G_1 + (1 - f)\,2pqd = G_0 - f\,2pqd$$

Here, $G_0$ is G, which was given earlier, and $G_1$ is the self-fertilized value $a(p - q)$. (Often, when dealing with inbreeding, "$G_0$" is preferred to "G".)
Supposing that the maize example [given earlier] had been constrained on a holme (a narrow riparian meadow), and had partial inbreeding to the extent of $f = 0.25$, then, using the third version (above) of $G_f$:
$G_{0.25}$ = 1.15 − 0.25 (0.48) 0.12 = 1.136 cm (rounded), with $P_{0.25}$ = 1.136 + 12.05 = 13.186 cm (rounded).
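The general formula can be checked numerically. This Python sketch (hypothetical function name) evaluates $G_f = a(p - q) + (1 - f)\,2pqd$ for the maize example and asserts that it equals both $G_1 + (1 - f)\,2pqd$ and $G_0 - f\,2pqd$.

```python
def inbred_mean_offset(a, d, p, f):
    """G_f = a(p - q) + (1 - f) 2pqd: offset from mp at inbreeding level f."""
    q = 1 - p
    return a * (p - q) + (1 - f) * 2 * p * q * d

a, d, mp, p, f = 5.45, 0.12, 12.05, 0.6, 0.25   # maize ear-length example
q = 1 - p
G0 = inbred_mean_offset(a, d, p, 0.0)   # random fertilization
G1 = inbred_mean_offset(a, d, p, 1.0)   # complete selfing
Gf = inbred_mean_offset(a, d, p, f)

# The alternative algebraic forms give the same value
assert abs(Gf - (G1 + (1 - f) * 2 * p * q * d)) < 1e-12
assert abs(Gf - (G0 - f * 2 * p * q * d)) < 1e-12
print(f"G0.25 = {Gf:.3f} cm, P0.25 = {Gf + mp:.3f} cm")
```

Carrying $G_0$ unrounded (1.1476 cm) gives slightly smaller figures than starting from $G_0$ rounded to 1.15 cm, as in the worked example.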
There is hardly any effect from inbreeding in this example, which arises because there was virtually no dominance in this attribute (d → 0). Examination of all three versions ofGf reveals that this would lead to trivial change in the Population mean. Where dominance was notable, however, there would be considerable change.
Genetic drift was introduced when discussing the likelihood of panmixia being widely extant as a natural fertilization pattern. [See section on Allele and Genotype frequencies.] Here the sampling of gametes from the potential gamodeme is discussed in more detail. The sampling involves random fertilization between pairs of random gametes, each of which may contain either an A or an a allele. The sampling is therefore binomial sampling.[13]: 382–395 [14]: 49–63 [29]: 35 [30]: 55 Each sampling "packet" involves 2N alleles, and produces N zygotes (a "progeny" or a "line") as a result. During the course of the reproductive period, this sampling is repeated over and over, so that the final result is a mixture of sample progenies. The result is dispersed random fertilization. These events, and the overall end-result, are examined here with an illustrative example.
The "base" allele frequencies of the example are those of the potential gamodeme: the frequency of A is $p_g = 0.75$, while the frequency of a is $q_g = 0.25$. [White label "1" in the diagram.] Five example actual gamodemes are binomially sampled out of this base (s = the number of samples = 5), and each sample is designated with an "index" k, with $k = 1, \ldots, s$ sequentially. (These are the sampling "packets" referred to in the previous paragraph.) The number of gametes involved in fertilization varies from sample to sample, and is given as $2N_k$ [at white label "2" in the diagram]. The total (Σ) number of gametes sampled overall is 52 [white label "3" in the diagram]. Because each sample has its own size, weights are needed to obtain averages (and other statistics) when obtaining the overall results. These are $w_k = 2N_k / \sum_k 2N_k$, and are given at white label "4" in the diagram.
Following completion of these five binomial sampling events, the resultant actual gamodemes each contained different allele frequencies ($p_k$ and $q_k$). [These are given at white label "5" in the diagram.] This outcome is actually the genetic drift itself. Notice that two samples (k = 1 and 5) happen to have the same frequencies as the base (potential) gamodeme. Another (k = 3) happens to have the p and q "reversed". Sample k = 2 happens to be an "extreme" case, with $p_k = 0.9$ and $q_k = 0.1$, while the remaining sample (k = 4) is "middle of the range" in its allele frequencies. All of these results have arisen only by "chance", through binomial sampling. Having occurred, however, they set in place all the downstream properties of the progenies.
Because sampling involves chance, the probabilities (∫k) of obtaining each of these samples become of interest. These binomial probabilities depend on the starting frequencies ($p_g$ and $q_g$) and the sample size ($2N_k$). They are tedious to obtain,[13]: 382–395 [30]: 55 but are of considerable interest. [See white label "6" in the diagram.] The two samples (k = 1, 5), with the allele frequencies the same as in the potential gamodeme, had higher "chances" of occurring than the other samples. Their binomial probabilities did differ, however, because of their different sample sizes ($2N_k$). The "reversal" sample (k = 3) had a very low probability of occurring, confirming perhaps what might be expected. The "extreme" allele frequency gamodeme (k = 2) was not "rare", however; and the "middle of the range" sample (k = 4) was rare. These same probabilities apply also to the progeny of these fertilizations.
Here, some summarizing can begin. The overall allele frequencies in the progenies bulk are supplied by weighted averages of the appropriate frequencies of the individual samples. That is, $p_\bullet = \sum_k w_k p_k$ and $q_\bullet = \sum_k w_k q_k$, where $w_k$ are the sample weights (white label "4"). (Notice that k is replaced by • for the overall result, a common practice.)[9] The results for the example are $p_\bullet = 0.631$ and $q_\bullet = 0.369$ [black label "5" in the diagram]. These values are quite different from the starting ones ($p_g$ and $q_g$) [white label "1"]. The sample allele frequencies also have variance as well as an average. This has been obtained using the sum of squares (SS) method[31] [see to the right of black label "5" in the diagram]. [Further discussion on this variance occurs in the section below on Extensive genetic drift.]
The genotype frequencies of the five sample progenies are obtained from the usual quadratic expansion of their respective allele frequencies (random fertilization). The results are given at the diagram's white label "7" for the homozygotes, and at white label "8" for the heterozygotes. Re-arrangement in this manner prepares the way for monitoring inbreeding levels. This can be done either by examining the level of total homozygosis [$(p_k^2 + q_k^2) = (1 - 2p_kq_k)$], or by examining the level of heterozygosis ($2p_kq_k$), as they are complementary.[32] Notice that samples k = 1, 3, 5 all had the same level of heterozygosis, despite one being the "mirror image" of the others with respect to allele frequencies. The "extreme" allele-frequency case (k = 2) had the most homozygosis (least heterozygosis) of any sample. The "middle of the range" case (k = 4) had the least homozygosis (most heterozygosis): in fact, its homozygosis and heterozygosis were each equal at 0.50.
The overall summary can continue by obtaining the weighted average (with the same sample weights $w_k$) of the respective genotype frequencies for the progeny bulk. Thus, for AA it is $\sum_k w_k p_k^2$, for Aa it is $\sum_k w_k 2p_kq_k$, and for aa it is $\sum_k w_k q_k^2$. The example results are given at black label "7" for the homozygotes, and at black label "8" for the heterozygote. Note that the heterozygosity mean is 0.3588, which the next section uses to examine inbreeding resulting from this genetic drift.
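The bulk summaries can be recomputed from the five samples. In this Python sketch the gamete numbers $2N_k$ are taken as the reciprocals of the $1/(2N)$ values quoted in the discussion of the inbreeding coefficient (10, 12, 10, 12 and 8, summing to 52), and the $p_k$ follow the description of each sample above (0.75, 0.9, 0.25, 0.5, 0.75). The diagram itself is not reproduced here, so these inputs are an inference from the text.

```python
# Five sample gamodemes drawn from p_g = 0.75 (inputs inferred from the text)
two_N = [10, 12, 10, 12, 8]
p_k = [0.75, 0.90, 0.25, 0.50, 0.75]

total = sum(two_N)                  # 52 gametes overall
w = [n / total for n in two_N]      # weights w_k = 2N_k / sum(2N_k)

def wavg(values):
    """Weighted average over the five samples."""
    return sum(wk * v for wk, v in zip(w, values))

p_bulk = wavg(p_k)
q_bulk = 1 - p_bulk
AA = wavg([p * p for p in p_k])
Aa = wavg([2 * p * (1 - p) for p in p_k])
aa = wavg([(1 - p) ** 2 for p in p_k])

print(f"p. = {p_bulk:.3f}, q. = {q_bulk:.3f}")            # p. = 0.631, q. = 0.369
print(f"AA = {AA:.4f}, Aa = {Aa:.4f}, aa = {aa:.4f}")     # AA = 0.4513, Aa = 0.3588, aa = 0.1898
```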
The next focus of interest is the dispersion itself, which refers to the "spreading apart" of the progenies' population means. These are obtained as $G_k = a(p_k - q_k) + 2p_kq_kd$ [see section on the Population mean], for each sample progeny in turn, using the example gene effects given at white label "9" in the diagram. Then, each $P_k = G_k + mp$ is obtained also [at white label "10" in the diagram]. Notice that the "best" line (k = 2) had the highest allele frequency for the "more" allele (A) (it also had the highest level of homozygosity). The worst progeny (k = 3) had the highest frequency for the "less" allele (a), which accounted for its poor performance. This "poor" line was less homozygous than the "best" line; and it shared the same level of homozygosity, in fact, as the two second-best lines (k = 1, 5). The progeny line with both the "more" and the "less" alleles present in equal frequency (k = 4) had a mean below the overall average (see next paragraph), and had the lowest level of homozygosity. These results reveal that the alleles most prevalent in the "gene-pool" (also called the "germplasm") determine performance, not the level of homozygosity per se. Binomial sampling alone effects this dispersion.
The overall summary can now be concluded by obtaining $G_\bullet = \sum_k w_k G_k$ and $P_\bullet = \sum_k w_k P_k$. The example result for $P_\bullet$ is 36.94 (black label "10" in the diagram). This is used later to quantify inbreeding depression overall, from the gamete sampling. [See the next section.] However, recall that some "non-depressed" progeny means have been identified already (k = 1, 2, 5). This is an enigma of inbreeding: while there may be "depression" overall, there are usually superior lines among the gamodeme samplings.
Included in the overall summary were the average allele frequencies in the mixture of progeny lines ($p_\bullet$ and $q_\bullet$). These can now be used to construct a hypothetical panmictic equivalent.[13]: 382–395 [14]: 49–63 [29]: 35 This can be regarded as a "reference" to assess the changes wrought by the gamete sampling. The example appends such a panmictic equivalent to the right of the diagram. The frequency of AA is therefore $(p_\bullet)^2 = 0.3979$. This is less than that found in the dispersed bulk (0.4513 at black label "7"). Similarly, for aa, $(q_\bullet)^2 = 0.1363$, again less than the equivalent in the progenies bulk (0.1898). Clearly, genetic drift has increased the overall level of homozygosis by the amount (0.6411 − 0.5342) = 0.1069. In a complementary approach, the heterozygosity could be used instead. The panmictic equivalent for Aa is $2p_\bullet q_\bullet = 0.4658$, which is higher than that in the sampled bulk (0.3588) [black label "8"]. The sampling has caused the heterozygosity to decrease by 0.1070, which differs trivially from the earlier estimate because of rounding errors.
The inbreeding coefficient (f) was introduced in the early section on Self Fertilization. Here, a formal definition of it is considered: f is the probability that two "same" alleles (that is A and A, or a and a), which fertilize together, are of common ancestral origin, or (more formally) f is the probability that two homologous alleles are autozygous.[14][27] Consider any random gamete in the potential gamodeme that has its syngamy partner restricted by binomial sampling. The probability that that second gamete is homologous autozygous to the first is $1/(2N)$, the reciprocal of the gamodeme size. For the five example progenies, these quantities are 0.1, 0.0833, 0.1, 0.0833 and 0.125 respectively, and their weighted average is 0.0962. This is the inbreeding coefficient of the example progenies bulk, provided it is unbiased with respect to the full binomial distribution. An example based upon s = 5 is likely to be biased, however, when compared to an appropriate entire binomial distribution based upon the sample number (s) approaching infinity (s → ∞). Another derived definition of f for the full distribution is that f equals the rise in homozygosity, which equals the fall in heterozygosity, each expressed as a proportion of the panmictic heterozygosity $2p_\bullet q_\bullet$ (consistent with the heterozygote frequency $2pq(1 - f)$ given earlier).[33] For the example, these frequency changes are 0.1069 and 0.1070 respectively, which, divided by 0.4658, give about 0.23. This result is different from the above, indicating that bias with respect to the full underlying distribution is present in the example. For the example itself, this latter value is the better one to use, namely $f_\bullet = 0.10695 / 0.4658 \approx 0.2296$.
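The comparison with the panmictic equivalent, and the two routes to f, can be checked the same way. As before, the sample sizes and allele frequencies are inferred from the text rather than read from the diagram. The last line reports the fall in heterozygosity as a proportion of $2p_\bullet q_\bullet$, which is how the earlier genotype frequency $2pq(1 - f)$ relates heterozygosity to f.

```python
two_N = [10, 12, 10, 12, 8]              # inferred from the quoted 1/(2N) values
p_k = [0.75, 0.90, 0.25, 0.50, 0.75]     # inferred from the sample descriptions
total = sum(two_N)
w = [n / total for n in two_N]

p_bulk = sum(wk * p for wk, p in zip(w, p_k))
q_bulk = 1 - p_bulk

# Dispersed progeny bulk: weighted averages over the samples
hom_bulk = sum(wk * (p * p + (1 - p) ** 2) for wk, p in zip(w, p_k))
het_bulk = sum(wk * 2 * p * (1 - p) for wk, p in zip(w, p_k))

# Hypothetical panmictic equivalent built from the bulk allele frequencies
hom_pan = p_bulk ** 2 + q_bulk ** 2
het_pan = 2 * p_bulk * q_bulk

rise_hom = hom_bulk - hom_pan
fall_het = het_pan - het_bulk
f_sampling = sum(wk / n for wk, n in zip(w, two_N))   # weighted mean of 1/(2N_k)

print(f"rise in homozygosity   = {rise_hom:.4f}")
print(f"fall in heterozygosity = {fall_het:.4f}")
print(f"weighted mean 1/(2N)   = {f_sampling:.4f}")
print(f"fall / (2 p. q.)       = {fall_het / het_pan:.4f}")
```

Computed without intermediate rounding, the rise in homozygosity and the fall in heterozygosity are identical (0.10695), so the 0.1069 versus 0.1070 difference in the worked example is purely a rounding effect.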
The population mean of the equivalent panmictic is found as $[a(p_\bullet - q_\bullet) + 2p_\bullet q_\bullet d] + mp$. Using the example gene effects (white label "9" in the diagram), this mean is 37.87. The equivalent mean in the dispersed bulk is 36.94 (black label "10"), which is depressed by the amount 0.93. This is the inbreeding depression from this genetic drift. However, as noted previously, three progenies were not depressed (k = 1, 2, 5), and had means even greater than that of the panmictic equivalent. These are the lines a plant breeder looks for in a line selection programme.[34]
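Why the depression appears can be seen by comparing the two means term by term: the allele-effect term $a(p_\bullet - q_\bullet)$ is the same in the bulk and in the panmictic equivalent, so only the dominance term differs. The diagram's gene effects (white label "9") are not given in the text, so the a, d and mp below are hypothetical placeholders; the identity checked by the assertion holds for any values. Sample sizes and allele frequencies are inferred from the text as before.

```python
two_N = [10, 12, 10, 12, 8]
p_k = [0.75, 0.90, 0.25, 0.50, 0.75]
w = [n / sum(two_N) for n in two_N]

# Hypothetical gene effects (the diagram's values are not reproduced in the text)
a, d, mp = 4.0, 3.0, 30.0

def mean_P(p):
    """Population mean P = a(p - q) + 2pqd + mp for allele frequency p."""
    q = 1 - p
    return a * (p - q) + 2 * p * q * d + mp

P_k = [mean_P(p) for p in p_k]                    # progeny-line means
P_bulk = sum(wk * P for wk, P in zip(w, P_k))     # dispersed bulk mean
p_bulk = sum(wk * p for wk, p in zip(w, p_k))
P_pan = mean_P(p_bulk)                            # panmictic-equivalent mean
het_bulk = sum(wk * 2 * p * (1 - p) for wk, p in zip(w, p_k))

depression = P_pan - P_bulk
# Only the dominance term differs: depression = d * (2 p. q. - bulk heterozygosity)
assert abs(depression - d * (2 * p_bulk * (1 - p_bulk) - het_bulk)) < 1e-12
print([round(P, 2) for P in P_k], round(P_bulk, 3), round(P_pan, 3))
```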
If the number of binomial samples is large (s → ∞), then $p_\bullet \to p_g$ and $q_\bullet \to q_g$. It might be queried whether panmixia would effectively re-appear under these circumstances. However, the sampling of allele frequencies has still occurred, with the result that $\sigma^2_{p,q} \neq 0$.[35] In fact, as s → ∞, $\sigma^2_{p,q} \to p_g q_g / 2N$, which is the variance of the whole binomial distribution.[13]: 382–395 [14]: 49–63 Furthermore, the "Wahlund equations" show that the progeny-bulk homozygote frequencies can be obtained as the sums of their respective average values ($p_\bullet^2$ or $q_\bullet^2$) plus $\sigma^2_{p,q}$.[13]: 382–395 Likewise, the bulk heterozygote frequency is $2p_\bullet q_\bullet$ minus twice the $\sigma^2_{p,q}$. The variance arising from the binomial sampling is conspicuously present. Thus, even when s → ∞, the progeny-bulk genotype frequencies still reveal increased homozygosis and decreased heterozygosis, there is still dispersion of progeny means, and still inbreeding and inbreeding depression. That is, panmixia is not re-attained once lost because of genetic drift (binomial sampling). However, a new potential panmixia can be initiated via an allogamous F2 following hybridization.[36]
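The limiting case s → ∞ can be examined exactly by enumerating the whole binomial distribution instead of drawing samples. This Python sketch (hypothetical helper `binomial_sampling`, using the standard-library `math.comb`) confirms that the mean sample frequency is $p_g$, that its variance is $p_g q_g / 2N$, and that the bulk genotype frequencies obey the Wahlund relations stated above. The choice $2N = 12$ is arbitrary.

```python
from math import comb

def binomial_sampling(p_g, two_N):
    """Exact distribution of the sample allele frequency when 2N gametes are drawn.

    Returns (frequency, probability) pairs for j = 0..2N copies of allele A.
    """
    q_g = 1 - p_g
    return [(j / two_N, comb(two_N, j) * p_g ** j * q_g ** (two_N - j))
            for j in range(two_N + 1)]

p_g, two_N = 0.75, 12
dist = binomial_sampling(p_g, two_N)

mean_p = sum(prob * p for p, prob in dist)
var_p = sum(prob * (p - mean_p) ** 2 for p, prob in dist)
bulk_AA = sum(prob * p * p for p, prob in dist)
bulk_Aa = sum(prob * 2 * p * (1 - p) for p, prob in dist)

print(f"mean p = {mean_p:.4f}, variance = {var_p:.5f}, p_g*q_g/2N = {p_g * (1 - p_g) / two_N:.5f}")
# Wahlund relations: bulk AA = p^2 + variance; bulk Aa = 2pq - 2 * variance
print(f"bulk AA = {bulk_AA:.5f} vs {mean_p ** 2 + var_p:.5f}")
print(f"bulk Aa = {bulk_Aa:.5f} vs {2 * mean_p * (1 - mean_p) - 2 * var_p:.5f}")
```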
Previous discussion on genetic drift examined just one cycle (generation) of the process. When the sampling continues over successive generations, conspicuous changes occur in $\sigma^2_{p,q}$ and f. Furthermore, another "index" is needed to keep track of "time": $t = 1, \ldots, y$, where y is the number of "years" (generations) considered. The methodology often is to add the current binomial increment (Δ = "de novo") to what has occurred previously.[13] The entire binomial distribution is examined here. [There is no further benefit to be had from an abbreviated example.]
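Earlier this variance ($\sigma^2_{p,q}$[35]) was seen to be:

$$\sigma^2_{p,q} = \frac{p_g q_g}{2N}$$

With the extension over time, this is also the result of the first cycle, and so is $\sigma^2_1$ (for brevity). At cycle 2, this variance is generated yet again, this time becoming the de novo variance ($\Delta\sigma^2$), and accumulates to what was present already, the "carry-over" variance. The second cycle variance ($\sigma^2_2$) is the weighted sum of these two components, the weights being 1 for the de novo and $\left(1 - \frac{1}{2N}\right) = \frac{2N - 1}{2N}$ for the "carry-over".

Thus,

$$\sigma^2_2 = \Delta\sigma^2 + \left(1 - \frac{1}{2N}\right)\sigma^2_1 \qquad (1)$$

The extension to generalize to any time t, after considerable simplification, becomes:[13]: 328

$$\sigma^2_t = p_g q_g\left[1 - \left(1 - \frac{1}{2N}\right)^t\right] \qquad (2)$$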
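Equation (1), applied repeatedly, should reproduce Equation (2). This Python sketch iterates the recurrence, adding the de novo term $p_g q_g / 2N$ each cycle, and asserts agreement with the closed form at every generation; $p_g = 0.75$ and $2N = 20$ are arbitrary choices, and the function name is hypothetical.

```python
def drift_variance(p_g, two_N, generations):
    """Iterate sigma2_t = de novo + (1 - 1/2N) * sigma2_{t-1}, starting from 0."""
    de_novo = p_g * (1 - p_g) / two_N
    sigma2 = 0.0
    history = []
    for _ in range(generations):
        sigma2 = de_novo + (1 - 1 / two_N) * sigma2
        history.append(sigma2)
    return history

p_g, two_N = 0.75, 20
hist = drift_variance(p_g, two_N, 20)
for t, s2 in enumerate(hist, start=1):
    closed = p_g * (1 - p_g) * (1 - (1 - 1 / two_N) ** t)   # Equation (2)
    assert abs(s2 - closed) < 1e-12
print(f"variance after 20 generations: {hist[-1]:.4f} (limit p_g*q_g = {p_g * (1 - p_g):.4f})")
```

The variance rises towards, but never exceeds, $p_g q_g$, which is the value reached when every line has become fixed for one allele or the other.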
Because it was this variation in allele frequencies that caused the "spreading apart" of the progenies' means (dispersion), the change in $\sigma^2_t$ over the generations indicates the change in the level of the dispersion.
The method for examining the inbreeding coefficient is similar to that used for $\sigma^2_{p,q}$. The same weights as before are used respectively for de novo f (Δf) [recall this is $1/(2N)$] and carry-over f. Therefore, $f_2 = \Delta f + (1 - \Delta f) f_1$, which is similar to Equation (1) in the previous sub-section.
In general, after rearrangement,[13] $f_t = \Delta f + (1 - \Delta f) f_{t-1} = \Delta f (1 - f_{t-1}) + f_{t-1}$. The graphs to the left show levels of inbreeding over twenty generations arising from genetic drift for various actual gamodeme sizes (2N).
Still further rearrangements of this general equation reveal some interesting relationships.
(A) After some simplification,[13] $(f_t - f_{t-1}) = \Delta f (1 - f_{t-1}) = \delta f_t$. The left-hand side is the difference between the current and previous levels of inbreeding: the change in inbreeding ($\delta f_t$). Notice that this change in inbreeding ($\delta f_t$) is equal to the de novo inbreeding (Δf) only for the first cycle, when $f_{t-1}$ is zero.
(B) An item of note is the $(1 - f_{t-1})$, which is an "index of non-inbreeding". It is known as the panmictic index: $P_{t-1} = (1 - f_{t-1})$.[13][14]
(C) Further useful relationships emerge involving the panmictic index: $\Delta f = \delta f_t / P_{t-1}$, and $P_t = P_{t-1}(1 - \Delta f) = P_0 (1 - \Delta f)^t$.[13][14]
(D) A key link emerges between $\sigma^2_{p,q}$ and f. Firstly, $f_t = 1 - P_t = 1 - (1 - f_0)(1 - \Delta f)^t$.[13] Secondly, presuming that $f_0 = 0$, the right-hand side of this equation reduces to $1 - (1 - \Delta f)^t$, which is the section within the brackets of Equation (2) at the end of the last sub-section. That is, if initially there is no inbreeding, $\sigma^2_t = p_g q_g f_t$! Furthermore, if this then is rearranged, $f_t = \sigma^2_t / (p_g q_g)$. That is, when initial inbreeding is zero, the two principal viewpoints of binomial gamete sampling (genetic drift) are directly inter-convertible.
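The relationships (A) to (D) can be verified numerically in the same spirit. This Python sketch iterates $f_t = \Delta f + (1 - \Delta f) f_{t-1}$ from $f_0 = 0$ and checks the change in inbreeding, the panmictic index and the link $\sigma^2_t = p_g q_g f_t$ at each generation; the parameter values and the function name are arbitrary illustrations.

```python
def inbreeding_path(two_N, generations, f0=0.0):
    """Iterate f_t = delta_f + (1 - delta_f) * f_{t-1}, with delta_f = 1/(2N)."""
    delta_f = 1 / two_N
    f = [f0]
    for _ in range(generations):
        f.append(delta_f + (1 - delta_f) * f[-1])
    return f

two_N, p_g = 20, 0.75
delta_f = 1 / two_N
f = inbreeding_path(two_N, 20)

for t in range(1, len(f)):
    P_prev, P_t = 1 - f[t - 1], 1 - f[t]                         # panmictic indices
    assert abs((f[t] - f[t - 1]) - delta_f * P_prev) < 1e-12     # relation (A)
    assert abs(P_t - (1 - delta_f) ** t) < 1e-12                 # (C), with f0 = 0
    sigma2_t = p_g * (1 - p_g) * (1 - (1 - delta_f) ** t)        # Equation (2)
    assert abs(sigma2_t - p_g * (1 - p_g) * f[t]) < 1e-12        # link (D)

print([round(x, 4) for x in f[:6]])
```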
It is easy to overlook that random fertilization includes self-fertilization. Sewall Wright showed that a proportion $1/N$ of random fertilizations is actually self fertilization, with the remainder $(N - 1)/N$ being cross fertilization. Following path analysis and simplification, the new view of random fertilization inbreeding was found to be $f_t = \frac{1}{N}\left(\frac{1 + f_{t-1}}{2}\right) + \frac{N - 1}{N} f_{t-1}$.[27][37] Upon further rearrangement, the earlier results from the binomial sampling were confirmed, along with some new arrangements. Two of these were potentially very useful, namely: (A) $f_t = \Delta f\,[1 + f_{t-1}(2N - 1)]$; and (B) $f_t = \Delta f\,(1 - f_{t-1}) + f_{t-1}$.
The recognition that selfing may intrinsically be a part of random fertilization leads to some issues about the use of the previous random fertilization "inbreeding coefficient". Clearly, then, it is inappropriate for any species incapable of self fertilization, which includes plants with self-incompatibility mechanisms, dioecious plants, and bisexual animals (those with separate sexes). The equation of Wright was modified later to provide a version of random fertilization that involved only cross fertilization with no self fertilization. The proportion $1/N$ formerly due to selfing now defined the carry-over gene-drift inbreeding arising from the previous cycle. The new version is:[13]: 166 $f_t = \frac{1}{N}\left(\frac{1 + f_{t-2}}{2}\right) + \frac{N - 1}{N} f_{t-1}$.
The graphs to the right depict the differences between standard random fertilization (RF) and random fertilization adjusted for "cross fertilization alone" (CF). As can be seen, the issue is non-trivial for small gamodeme sample sizes.
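The RF and CF recurrences can be iterated side by side. This Python sketch assumes no initial inbreeding ($f_{-1} = f_0 = 0$, an assumption the text does not state) and uses a very small gamodeme (N = 2), where the difference is largest; the function names are hypothetical.

```python
def random_fertilization_f(N, generations):
    """RF (selfing included): f_t = (1/N)(1 + f_{t-1})/2 + ((N - 1)/N) f_{t-1}."""
    f = [0.0]                                   # f_0 = 0 (assumed)
    for _ in range(generations):
        prev = f[-1]
        f.append((1 / N) * (1 + prev) / 2 + ((N - 1) / N) * prev)
    return f

def cross_fertilization_f(N, generations):
    """CF (no selfing): f_t = (1/N)(1 + f_{t-2})/2 + ((N - 1)/N) f_{t-1}."""
    f = [0.0, 0.0]                              # f_{-1} = f_0 = 0 (assumed)
    for _ in range(generations):
        f.append((1 / N) * (1 + f[-2]) / 2 + ((N - 1) / N) * f[-1])
    return f[1:]                                # drop f_{-1} so index t is generation t

N = 2                                           # 2N = 4 gametes per gamodeme
rf = random_fertilization_f(N, 10)
cf = cross_fertilization_f(N, 10)
for t in range(len(rf)):
    print(t, round(rf[t], 4), round(cf[t], 4))
```

With N = 2, both series reach 0.25 at t = 1, after which the cross-fertilization series lags (0.375 against 0.4375 at t = 2), because CF draws its 1/N term from $f_{t-2}$ rather than $f_{t-1}$.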
"Panmixia" is not synonymous with "random fertilization", nor is "random fertilization" synonymous with "cross fertilization".[citation needed]
In the sub-section on "The sample gamodemes – Genetic drift", a series of gamete samplings was followed, an outcome of which was an increase in homozygosity at the expense of heterozygosity. From this viewpoint, the rise in homozygosity was due to the gamete samplings. Levels of homozygosity can be viewed also according to whether homozygotes arose allozygously or autozygously. Recall that autozygous alleles have the same allelic origin, the likelihood (frequency) of which is the inbreeding coefficient (f) by definition. The proportion arising allozygously is therefore $(1 - f)$. For the A-bearing gametes, which are present with a general frequency of p, the overall frequency of those that are autozygous is therefore $fp$. Similarly, for a-bearing gametes, the autozygous frequency is $fq$.[38] These two viewpoints regarding genotype frequencies must be connected to establish consistency.
Following firstly the auto/allo viewpoint, consider the allozygous component. This occurs with the frequency of $(1 - f)$, and the alleles unite according to the random fertilization quadratic expansion. Thus: $(1 - f)(p^2 + 2pq + q^2)$. Consider next the autozygous component. As these alleles are autozygous, they are effectively selfings, and produce either AA or aa genotypes, but no heterozygotes. They therefore produce $fp$ "AA" homozygotes plus $fq$ "aa" homozygotes. Adding these two components together results in $p^2(1 - f) + pf$ for the AA homozygote, $q^2(1 - f) + qf$ for the aa homozygote, and $2pq(1 - f)$ for the Aa heterozygote.[13]: 65 [14] This is the same equation as that presented earlier in the section on "Self fertilization – an alternative". The reason for the decline in heterozygosity is made clear here. Heterozygotes can arise only from the allozygous component, and its frequency in the sample bulk is just $(1 - f)$: hence this must also be the factor controlling the frequency of the heterozygotes.
Secondly, the sampling viewpoint is re-examined. Previously, it was noted that the decline in heterozygotes was $f\,2pq$. This decline is distributed equally towards each homozygote, and is added to their basic random fertilization expectations. Therefore, the genotype frequencies are: $p^2 + fpq$ for the "AA" homozygote; $q^2 + fpq$ for the "aa" homozygote; and $2pq - f\,2pq$ for the heterozygote.
Thirdly, the consistency between the two previous viewpoints needs establishing. It is apparent at once [from the corresponding equations above] that the heterozygote frequency is the same in both viewpoints. However, such a straightforward result is not immediately apparent for the homozygotes. Begin by considering the AA homozygote's final equation in the auto/allo paragraph above: $p^2(1-f) + fp$. Expand the brackets, and follow by re-gathering [within the resultant] the two new terms with the common factor f in them. The result is $p^2 + f\,(p - p^2)$. Next, for the parenthesized $p^2$, a $(1-q)$ is substituted for a p, the result becoming $p^2 + f\,[\,p - p(1-q)\,]$. Following that substitution, it is a straightforward matter of multiplying out, simplifying and watching signs. The end result is $p^2 + fpq$, which is exactly the result for AA in the sampling paragraph. The two viewpoints are therefore consistent for the AA homozygote. In a like manner, the consistency of the aa viewpoints can also be shown. The two viewpoints are consistent for all classes of genotypes.
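The consistency of the two viewpoints can also be checked numerically. The following is a minimal Python sketch (the function names are hypothetical, chosen for illustration) that returns genotype frequencies in the order (AA, Aa, aa) from the auto/allo components and from the sampling view, and compares them over a small grid of p and f values.

```python
def auto_allo_frequencies(p, f):
    """(AA, Aa, aa) frequencies from the allozygous and autozygous components."""
    q = 1.0 - p
    # Allozygous part: random-fertilization expansion, weighted by (1 - f).
    allo = ((1 - f) * p * p, (1 - f) * 2 * p * q, (1 - f) * q * q)
    # Autozygous part: behaves like selfing, so no heterozygotes arise.
    auto = (f * p, 0.0, f * q)
    return tuple(x + y for x, y in zip(allo, auto))


def sampling_frequencies(p, f):
    """(AA, Aa, aa) frequencies: the heterozygote decline f*2pq is split equally between homozygotes."""
    q = 1.0 - p
    decline = f * 2 * p * q
    return (p * p + decline / 2, 2 * p * q - decline, q * q + decline / 2)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5, 0.8):
        for f in (0.0, 0.25, 0.6, 1.0):
            a = auto_allo_frequencies(p, f)
            s = sampling_frequencies(p, f)
            assert all(abs(x - y) < 1e-12 for x, y in zip(a, s))
    print("auto/allo and sampling viewpoints agree")
```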
In previous sections, dispersive random fertilization (genetic drift) has been considered comprehensively, and self-fertilization and hybridizing have been examined to varying degrees. The diagram to the left depicts the first two of these, along with another "spatially based" pattern: islands. This is a pattern of random fertilization featuring dispersed gamodemes, with the addition of "overlaps" in which non-dispersive random fertilization occurs. With the islands pattern, individual gamodeme sizes (2N) are observable, and overlaps (m) are minimal. This is one of Sewall Wright's array of possibilities.[37] In addition to "spatially" based patterns of fertilization, there are others based on either "phenotypic" or "relationship" criteria. The phenotypic bases include assortative fertilization (between similar phenotypes) and disassortative fertilization (between opposite phenotypes). The relationship patterns include sib crossing, cousin crossing and backcrossing, and are considered in a separate section. Self fertilization may be considered from either a spatial or a relationship point of view.
The breeding population consists of s small dispersed random fertilization gamodemes of sample size $2N_k$ (k = 1 ... s) with "overlaps" of proportion $m_k$ in which non-dispersive random fertilization occurs. The dispersive proportion is thus $(1 - m_k)$. The bulk population consists of weighted averages of sample sizes, allele and genotype frequencies and progeny means, as was done for genetic drift in an earlier section. However, each gamete sample size is reduced to allow for the overlaps, thus finding a $2N_k$ effective for $(1 - m_k)$.
For brevity, the argument is followed further with the subscripts omitted. Recall that Δf is $\frac{1}{2N}$ in general. [Here, and following, the 2N refers to the previously defined sample size, not to any "islands adjusted" version.]
After simplification, an islands-adjusted Δf is obtained.[37] Notice that when m = 0 this reduces to the previous Δf. The reciprocal of this furnishes an estimate of the "2N effective for (1 − m)" mentioned above.
This Δf is also substituted into the previous inbreeding coefficient, $f_t = 1 - (1 - \Delta f)^t$, to obtain the inbreeding under the islands pattern,[37] where t is the index over generations, as before.
The effective overlap proportion can also be obtained.[37]
The graphs to the right show the inbreeding for a gamodeme size of 2N = 50 for ordinary dispersed random fertilization (RF) (m = 0), and for four overlap levels (m = 0.0625, 0.125, 0.25, 0.5) of islands random fertilization. There has indeed been a reduction in the inbreeding resulting from the non-dispersed random fertilization in the overlaps. It is particularly notable as m → 0.50. Sewall Wright suggested that this value should be the limit for the use of this approach.[37]
The gene-model examines the heredity pathway from the point of view of "inputs" (alleles/gametes) and "outputs" (genotypes/zygotes), with fertilization being the "process" converting one to the other. An alternative viewpoint concentrates on the "process" itself, and considers the zygote genotypes as arising from allele shuffling. In particular, it regards the results as if one allele had "substituted" for the other during the shuffle, together with a residual that deviates from this view. This formed an integral part of Fisher's method,[8] in addition to his use of frequencies and effects to generate his genetical statistics.[14] A discursive derivation of the allele substitution alternative follows.[14]: 113
Suppose that the usual random fertilization of gametes in a "base" gamodeme, consisting of p gametes (A) and q gametes (a), is replaced by fertilization with a "flood" of gametes all containing a single allele (A or a, but not both). The zygotic results can be interpreted in terms of the "flood" allele having "substituted for" the alternative allele in the underlying "base" gamodeme. The diagram assists in following this viewpoint: the upper part pictures an A substitution, while the lower part shows an a substitution. (The diagram's "RF allele" is the allele in the "base" gamodeme.)
Consider the upper part first. Because base A is present with a frequency of p, the substitute A fertilizes it with a frequency of p, resulting in a zygote AA with an allele effect of a. Its contribution to the outcome, therefore, is the product $pa$. Similarly, when the substitute fertilizes base a (resulting in Aa with a frequency of q and heterozygote effect of d), the contribution is $qd$. The overall result of substitution by A is, therefore, $pa + qd$. This is now oriented towards the population mean, $G = a(p-q) + 2pqd$ [see earlier section], by expressing it as a deviate from that mean: $\beta_A = pa + qd - [\,a(p-q) + 2pqd\,]$.
After some algebraic simplification, this becomes $\beta_A = q\,[\,a + (q-p)d\,]$, the substitution effect of A.
A parallel reasoning can be applied to the lower part of the diagram, taking care with the differences in frequencies and gene effects. The result is the substitution effect of a, which is $\beta_a = -p\,[\,a + (q-p)d\,]$. The common factor inside the brackets is the average allele substitution effect,[14]: 113 and is $\beta = a + (q-p)d$. It can also be derived in a more direct way, but the result is the same.[39]
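The "flood" argument translates directly into arithmetic. This minimal Python sketch (function names and the example values p = 0.3, a = 1.0, d = 0.5 are illustrative assumptions) computes each substitution effect as a deviate from the random-fertilization mean G, alongside the average substitution effect β, so that $\beta_A = q\beta$ and $\beta_a = -p\beta$ can be confirmed.

```python
def population_mean(p, a, d):
    """Random-fertilization mean G = a(p - q) + 2pqd."""
    q = 1.0 - p
    return a * (p - q) + 2 * p * q * d


def substitution_effects(p, a, d):
    """Return (beta_A, beta_a, beta) by 'flooding' the base gamodeme with A or a gametes."""
    q = 1.0 - p
    G = population_mean(p, a, d)
    # Flood of A: meets base A (freq p) -> AA with effect a; base a (freq q) -> Aa with effect d.
    beta_A = p * a + q * d - G
    # Flood of a: meets base A (freq p) -> Aa with effect d; base a (freq q) -> aa with effect -a.
    beta_a = p * d + q * (-a) - G
    # Average allele substitution effect (the common bracketed factor).
    beta = a + (q - p) * d
    return beta_A, beta_a, beta


beta_A, beta_a, beta = substitution_effects(p=0.3, a=1.0, d=0.5)
print(beta_A, beta_a, beta)
```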
In subsequent sections, these substitution effects help define the gene-model genotypes as consisting of a partition predicted by these new effects (substitution expectations), and a residual (substitution deviations) between these expectations and the previous gene-model effects. The expectations are also called the breeding values and the deviations are also called dominance deviations.
Ultimately, the variance arising from the substitution expectations becomes the so-called Additive genetic variance ($\sigma^2_A$)[14] (also the Genic variance[40]), while that arising from the substitution deviations becomes the so-called Dominance variance ($\sigma^2_D$). It is noticeable that neither of these terms reflects the true meanings of these variances. The "genic variance" is less dubious than the additive genetic variance, and more in line with Fisher's own name for this partition.[8][29]: 33 A less-misleading name for the dominance deviations variance is the "quasi-dominance variance" [see following sections for further discussion]. These latter terms are preferred herein.
The gene-model effects (a, d and −a) soon become important in the derivation of the deviations from substitution, which were first discussed in the previous Allele Substitution section. However, they need to be redefined themselves before they become useful in that exercise. They firstly need to be re-centralized around the population mean (G), and secondly they need to be re-arranged as functions of β, the average allele substitution effect.
Consider first the re-centralization. The re-centralized effect for AA is $a^{\bullet} = a - G$ which, after simplification, becomes $a^{\bullet} = 2q(a - pd)$. The similar effect for Aa is $d^{\bullet} = d - G = a(q-p) + d(1-2pq)$, after simplification. Finally, the re-centralized effect for aa is $(-a)^{\bullet} = -2p(a + qd)$.[14]: 116–119
Secondly, consider the re-arrangement of these re-centralized effects as functions of β. Recalling from the "Allele Substitution" section that $\beta = a + (q-p)d$, rearrangement gives $a = \beta - (q-p)d$. After substituting this for a in $a^{\bullet}$ and simplifying, the final version becomes $a^{\bullet\bullet} = 2q(\beta - qd)$. Similarly, $d^{\bullet}$ becomes $d^{\bullet\bullet} = \beta(q-p) + 2pqd$; and $(-a)^{\bullet}$ becomes $(-a)^{\bullet\bullet} = -2p(\beta + pd)$.[14]: 118
The zygote genotypes are the target of all this preparation. The homozygous genotype AA is a union of two substitution effects of A, one from each sex. Its substitution expectation is therefore $\beta_{AA} = 2\beta_A = 2q\beta$ (see previous sections). Similarly, the substitution expectation of Aa is $\beta_{Aa} = \beta_A + \beta_a = (q-p)\beta$; and for aa, $\beta_{aa} = 2\beta_a = -2p\beta$. These substitution expectations of the genotypes are also called breeding values.[14]: 114–116
Substitution deviations are the differences between these expectations and the gene effects after their two-stage redefinition in the previous section. Therefore, $d_{AA} = a^{\bullet\bullet} - \beta_{AA} = -2q^2 d$ after simplification. Similarly, $d_{Aa} = d^{\bullet\bullet} - \beta_{Aa} = 2pqd$ after simplification. Finally, $d_{aa} = (-a)^{\bullet\bullet} - \beta_{aa} = -2p^2 d$ after simplification.[14]: 116–119 Notice that all of these substitution deviations ultimately are functions of the gene effect d, which accounts for the use of ["d" plus subscript] as their symbols. However, it is a serious non sequitur in logic to regard them as accounting for the dominance (heterozygosis) in the entire gene model: they are simply functions of "d" and not an audit of the "d" in the system. They are, as derived, deviations from the substitution expectations.
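The two-stage redefinition and the resulting deviations can be sketched numerically. In this Python illustration (hypothetical function names), the re-centred effects are computed directly as effect minus G; since $a^{\bullet\bullet}$, $d^{\bullet\bullet}$ and $(-a)^{\bullet\bullet}$ are only re-expressions of $a^{\bullet}$, $d^{\bullet}$ and $(-a)^{\bullet}$ in terms of β, their numeric values are the same, and subtracting the breeding values should reproduce $-2q^2d$, $2pqd$ and $-2p^2d$.

```python
def recentred_effects(p, a, d):
    """Gene effects re-centralized on G, for (AA, Aa, aa)."""
    q = 1.0 - p
    G = a * (p - q) + 2 * p * q * d
    return a - G, d - G, -a - G


def breeding_values(p, a, d):
    """Substitution expectations (beta_AA, beta_Aa, beta_aa)."""
    q = 1.0 - p
    beta = a + (q - p) * d
    return 2 * q * beta, (q - p) * beta, -2 * p * beta


def substitution_deviations(p, a, d):
    """(d_AA, d_Aa, d_aa): re-centred effect minus breeding value, genotype by genotype."""
    return tuple(
        r - b for r, b in zip(recentred_effects(p, a, d), breeding_values(p, a, d))
    )


print(substitution_deviations(p=0.3, a=1.0, d=0.5))
```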
The "substitution expectations" ultimately give rise to $\sigma^2_A$ (the so-called "Additive" genetic variance); and the "substitution deviations" give rise to $\sigma^2_D$ (the so-called "Dominance" genetic variance). Be aware, however, that the average substitution effect (β) also contains "d" [see previous sections], indicating that dominance is also embedded within the "Additive" variance [see following sections on the Genotypic Variance for their derivations]. Remember also [see previous paragraph] that the "substitution deviations" do not account for the dominance in the system (being nothing more than deviations from the substitution expectations), although they happen to consist algebraically of functions of "d". More appropriate names for these respective variances might be $\sigma^2_B$ (the "Breeding expectations" variance) and $\sigma^2_\delta$ (the "Breeding deviations" variance). However, as noted previously, "Genic" ($\sigma^2_A$) and "Quasi-Dominance" ($\sigma^2_D$), respectively, will be preferred herein.
There are two major approaches to defining and partitioning genotypic variance. One is based on the gene-model effects,[40] while the other is based on the genotype substitution effects.[14] They are algebraically inter-convertible.[36] In this section, the basic random fertilization derivation is considered, with the effects of inbreeding and dispersion set aside; these are dealt with later to arrive at a more general solution. Until this mono-genic treatment is replaced by a multi-genic one, and until epistasis is resolved in the light of the findings of epigenetics, the Genotypic variance has only the components considered here.
It is convenient to follow the biometrical approach, which is based on correcting the unadjusted sum of squares (USS) by subtracting the correction factor (CF). Because all effects have been examined through frequencies, the USS can be obtained as the sum of the products of each genotype's frequency and the square of its gene effect. The CF in this case is the mean squared. The result is the SS, which, again because of the use of frequencies, is also immediately the variance.[9]
The $USS = p^2 a^2 + 2pq\,d^2 + q^2 (-a)^2$, and the $CF = G^2 = [\,a(p-q) + 2pqd\,]^2$. The $SS = USS - CF$.
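After partial simplification,
$$\begin{aligned} \sigma^2_G &= (p^2 + q^2)\,a^2 + 2pq\,d^2 - [\,a(p-q) + 2pqd\,]^2 \\ &= 2pq\,a^2 + 4pq(q-p)\,ad + 2pq\,d^2 - (2pq)^2 d^2 \\ &= \sigma^2_a + \sigma^2_d - \sigma^2_D + 2\,cov_{ad} \end{aligned}$$
with $\sigma^2_a = 2pq\,a^2$, $\sigma^2_d = 2pq\,d^2$, $\sigma^2_D = (2pq)^2 d^2$ and $cov_{ad} = 2pq(q-p)\,ad$. The last line is in Mather's terminology.[40]: 212 [41][42]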
Here, $\sigma^2_a$ is the homozygote or allelic variance, and $\sigma^2_d$ is the heterozygote or dominance variance. The substitution deviations variance ($\sigma^2_D$) is also present. The weighted covariance of a and d[43] is abbreviated hereafter to "$cov_{ad}$".
These components are plotted across all values of p in the accompanying figure. Notice that $cov_{ad}$ is negative for p > 0.5 (when a and d have the same sign).
Most of these components are affected by the change of central focus from the homozygote mid-point (mp) to the population mean (G), the latter being the basis of the Correction Factor. The $cov_{ad}$ and substitution deviation variances are simply artifacts of this shift. The allelic and dominance variances are genuine genetical partitions of the original gene-model, and are the only eu-genetical components. Even then, the algebraic formula for the allelic variance is affected by the presence of G: it is only the dominance variance (i.e. $\sigma^2_d$) which is unaffected by the shift from mp to G.[36] These insights are commonly not appreciated.
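Further gathering of terms [in Mather format] leads to $\sigma^2_G = \sigma^2_a + (q-p)^2\,\sigma^2_d + 2\,cov_{ad} + \sigma^2_D$, where $(q-p)^2 = 1 - 4pq$. This form is useful later in Diallel analysis, which is an experimental design for estimating these genetical statistics.[44]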
If, following the last-given rearrangements, the first three terms are amalgamated together, rearranged further and simplified, the result is the variance of the Fisheriansubstitution expectation.
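That is: $\sigma^2_A = \sigma^2_a + (q-p)^2\,\sigma^2_d + 2\,cov_{ad} = 2pq\,[\,a + (q-p)d\,]^2 = 2pq\,\beta^2$.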
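The biometrical route (USS minus CF) and the Mather-format grouping can be cross-checked in a few lines. This Python sketch assumes the component definitions $\sigma^2_a = 2pq\,a^2$, $\sigma^2_d = 2pq\,d^2$, $\sigma^2_D = (2pq)^2 d^2$ and $cov_{ad} = 2pq(q-p)\,ad$ that follow from expanding USS − CF; treating the cross term as twice this covariance is an assumption of the sketch. Function names are hypothetical.

```python
def genotypic_variance(p, a, d):
    """Biometrical route: SS = USS - CF, with genotype frequencies as weights."""
    q = 1.0 - p
    uss = p * p * a * a + 2 * p * q * d * d + q * q * (-a) ** 2
    cf = (a * (p - q) + 2 * p * q * d) ** 2   # the mean G, squared
    return uss - cf


def mather_components(p, a, d):
    """Return (var_a, var_d, var_D, cov_ad) under the grouping assumed above."""
    q = 1.0 - p
    var_a = 2 * p * q * a * a          # allelic (homozygote) variance
    var_d = 2 * p * q * d * d          # dominance (heterozygote) variance
    var_D = (2 * p * q * d) ** 2       # substitution-deviations (quasi-dominance) variance
    cov_ad = 2 * p * q * (q - p) * a * d
    return var_a, var_d, var_D, cov_ad


def genic_from_mather(p, a, d):
    """Amalgamate the first three Mather-format terms: var_a + (q-p)^2 var_d + 2 cov_ad."""
    q = 1.0 - p
    var_a, var_d, _, cov_ad = mather_components(p, a, d)
    return var_a + (q - p) ** 2 * var_d + 2 * cov_ad


p, a, d = 0.3, 1.0, 0.5
var_a, var_d, var_D, cov_ad = mather_components(p, a, d)
print(genotypic_variance(p, a, d), var_a + var_d - var_D + 2 * cov_ad)
print(genic_from_mather(p, a, d), 2 * p * (1 - p) * (a + (1 - 2 * p) * d) ** 2)
```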
Notice particularly that $\sigma^2_A$ is not $\sigma^2_a$. The first is the substitution expectations variance, while the second is the allelic variance.[45] Notice also that $\sigma^2_D$ (the substitution-deviations variance) is not $\sigma^2_d$ (the dominance variance), and recall that it is an artifact arising from the use of G for the Correction Factor [see the paragraph above on the shift from mp to G]. It will now be referred to as the "quasi-dominance" variance.
Also note that $\sigma^2_D < \sigma^2_d$ (2pq never exceeding 1/2); and note that (1) $\sigma^2_D = 2pq\,\sigma^2_d$, and that (2) $\sigma^2_d = \sigma^2_D / (2pq)$. That is, it is confirmed that $\sigma^2_D$ does not quantify the dominance variance in the model. It is $\sigma^2_d$ which does that. However, the dominance variance ($\sigma^2_d$) can be estimated readily from $\sigma^2_D$ if 2pq is available.
From the Figure, these results can be visualized as accumulating $\sigma^2_a$, $\sigma^2_d$ and $cov_{ad}$ to obtain $\sigma^2_A$, while leaving $\sigma^2_D$ still separated. It is clear also in the Figure that $\sigma^2_D < \sigma^2_d$, as expected from the equations.
The overall result (in Fisher's format) is $\sigma^2_G = \sigma^2_A + \sigma^2_D$. The Fisherian components have just been derived, but their derivation via the substitution effects themselves is also given in the next section.
Reference to the several earlier sections on allele substitution reveals that the two ultimate effects are genotype substitution expectations and genotype substitution deviations. Notice that these are each already defined as deviations from the random fertilization population mean (G). For each genotype in turn, therefore, the product of the frequency and the square of the relevant effect is obtained, and these are accumulated to obtain directly a SS and $\sigma^2$.[46] Details follow.
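$\sigma^2_A = p^2\,\beta_{AA}^2 + 2pq\,\beta_{Aa}^2 + q^2\,\beta_{aa}^2$, which simplifies to $\sigma^2_A = 2pq\,\beta^2$, the Genic variance.
$\sigma^2_D = p^2\,d_{AA}^2 + 2pq\,d_{Aa}^2 + q^2\,d_{aa}^2$, which simplifies to $\sigma^2_D = (2pq)^2 d^2$, the quasi-Dominance variance.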
Upon accumulating these results, $\sigma^2_G = \sigma^2_A + \sigma^2_D$. These components are visualized in the graphs to the right. The average allele substitution effect is graphed also, but the symbol is "α" (as is common in the citations) rather than "β" (as is used herein).
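The direct, substitution-effect route can be sketched as a frequency-weighted sum of squares. In this Python illustration (hypothetical names), $\sigma^2_A$ and $\sigma^2_D$ are accumulated from the breeding values and substitution deviations given earlier, and their sum is compared with the USS − CF genotypic variance.

```python
def fisher_components(p, a, d):
    """sigma2_A and sigma2_D accumulated directly from substitution expectations and deviations."""
    q = 1.0 - p
    beta = a + (q - p) * d
    freqs = (p * p, 2 * p * q, q * q)                           # AA, Aa, aa
    expectations = (2 * q * beta, (q - p) * beta, -2 * p * beta)  # breeding values
    deviations = (-2 * q * q * d, 2 * p * q * d, -2 * p * p * d)  # substitution deviations
    var_A = sum(w * x * x for w, x in zip(freqs, expectations))
    var_D = sum(w * x * x for w, x in zip(freqs, deviations))
    return var_A, var_D


def genotypic_variance(p, a, d):
    """USS - CF for the single-locus gene model."""
    q = 1.0 - p
    G = a * (p - q) + 2 * p * q * d
    return p * p * a * a + 2 * p * q * d * d + q * q * a * a - G * G


var_A, var_D = fisher_components(p=0.3, a=1.0, d=0.5)
print(var_A, var_D, var_A + var_D, genotypic_variance(0.3, 1.0, 0.5))
```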
Once again, however, refer to the earlier discussions about the true meanings and identities of these components. Fisher himself did not use these modern terms for his components. The substitution expectations variance he named the "genetic" variance; and the substitution deviations variance he regarded simply as the unnamed residual between the "genotypic" variance (his name for it) and his "genetic" variance.[8][29]: 33 [47][48] [The terminology and derivation used in this article are completely in accord with Fisher's own.] Mather's term for the expectations variance, "genic",[40] is obviously derived from Fisher's term, and avoids using "genetic" (which has become too generalized in usage to be of value in the present context). The origin of the modern, misleading terms "additive" and "dominance" variances is obscure.
Note that this allele-substitution approach defined the components separately, and then totaled them to obtain the final Genotypic variance. Conversely, the gene-model approach derived the whole situation (components and total) as one exercise. Bonuses arising from this were (a) the revelations about the real structure of $\sigma^2_A$, and (b) the real meanings and relative sizes of $\sigma^2_d$ and $\sigma^2_D$ (see previous sub-section). It is also apparent that a "Mather" analysis is more informative, and that a "Fisher" analysis can always be constructed from it. The opposite conversion is not possible, however, because information about $cov_{ad}$ would be missing.
In the section on genetic drift, and in other sections that discuss inbreeding, a major outcome from allele frequency sampling has been the dispersion of progeny means. This collection of means has its own average, and also has a variance: the amongst-line variance. (This is a variance of the attribute itself, not of allele frequencies.) As dispersion develops further over succeeding generations, this amongst-line variance would be expected to increase. Conversely, as homozygosity rises, the within-lines variance would be expected to decrease. The question arises, therefore, as to whether the total variance is changing and, if so, in what direction. To date, these issues have been presented in terms of the genic ($\sigma^2_A$) and quasi-dominance ($\sigma^2_D$) variances rather than the gene-model components. This will be done herein as well.
The crucial overview equation comes from Sewall Wright,[13]: 99, 130 [37] and is the outline of the inbred genotypic variance based on a weighted average of its extremes, the weights being quadratic with respect to the inbreeding coefficient. This equation is:
$$\sigma^2_{G(f)} = (1-f)\,\sigma^2_{G(0)} + f\,\sigma^2_{G(1)} + f(1-f)\,[\,G_0 - G_1\,]^2$$
where f is the inbreeding coefficient, $\sigma^2_{G(0)}$ is the genotypic variance at f = 0, $\sigma^2_{G(1)}$ is the genotypic variance at f = 1, $G_0$ is the population mean at f = 0, and $G_1$ is the population mean at f = 1.
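The component $(1-f)\,\sigma^2_{G(0)}$ [in the equation above] outlines the reduction of variance within progeny lines. The component $f\,\sigma^2_{G(1)}$ addresses the increase in variance amongst progeny lines. Lastly, the component $f(1-f)\,[\,G_0 - G_1\,]^2$ is seen (in the next line) to address the quasi-dominance variance.[13]: 99 & 130 These components can be expanded further, thereby revealing additional insight. Thus:
$$\sigma^2_{G(f)} = (1-f)\,[\,\sigma^2_{A(0)} + \sigma^2_{D(0)}\,] + f\,4pq\,a^2 + f(1-f)\,(2pq\,d)^2$$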
Firstly, $\sigma^2_{G(0)}$ [in the equation above] has been expanded to show its two sub-components [see section on "Genotypic variance"]. Next, $\sigma^2_{G(1)}$ has been converted to $4pq\,a^2$, which is derived in a following section. The third component's substitution is the difference between the two "inbreeding extremes" of the population mean [see section on the "Population Mean"], $G_0 - G_1 = 2pq\,d$.[36]
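Summarising: the within-line components are $(1-f)\,\sigma^2_{A(0)}$ and $(1-f)\,\sigma^2_{D(0)}$; and the amongst-line components are $2f\,\sigma^2_{a(0)}$ and $f(1-f)\,\sigma^2_{D(0)}$.[36]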
Rearranging gives the following: $\sigma^2_{A(f)} = (1-f)\,\sigma^2_{A(0)} + 2f\,\sigma^2_{a(0)}$. This version is discussed further in a subsequent section.
Similarly, $\sigma^2_{D(f)} = (1-f)\,\sigma^2_{D(0)} + f(1-f)\,\sigma^2_{D(0)} = (1-f^2)\,\sigma^2_{D(0)}$.
Graphs to the left show these three genic variances, together with the three quasi-dominance variances, across all values of f, for p = 0.5 (at which the quasi-dominance variance is at a maximum). Graphs to the right show the Genotypic variance partitions (being the sums of the respective genic and quasi-dominance partitions) changing over ten generations with an example f = 0.10.
Answering, firstly, the questions posed at the beginning about the total variances [the Σ in the graphs]: the genic variance rises linearly with the inbreeding coefficient, maximizing at twice its starting level (at p = 0.5, as graphed, where $\sigma^2_{A(0)} = \sigma^2_{a(0)}$). The quasi-dominance variance declines at the rate of $(1 - f^2)$ until it finishes at zero. At low levels of f, the decline is very gradual, but it accelerates with higher levels of f.
Secondly, notice the other trends. It is probably intuitive that the within-line variances decline to zero with continued inbreeding, and this is seen to be the case (both at the same linear rate (1 − f)). The amongst-line variances both increase with inbreeding up to f = 0.5, the genic variance at the rate of 2f, and the quasi-dominance variance at the rate of $(f - f^2)$. At f > 0.5, however, the trends change. The amongst-line genic variance continues its linear increase until it equals the total genic variance. But the amongst-line quasi-dominance variance now declines towards zero, because $(f - f^2)$ also declines with f > 0.5.[36]
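The partition and Sewall Wright's overview equation can be compared directly. This Python sketch (hypothetical names; a single locus with effects a and d is assumed) evaluates Wright's equation from $G_0$, $G_1$, $\sigma^2_{G(0)}$ and $\sigma^2_{G(1)} = 4pq\,a^2$, and separately the four within-line and amongst-line partitions, whose sum should equal it.

```python
def wright_inbred_variance(p, a, d, f):
    """Sewall Wright's overview: (1-f) var_G0 + f var_G1 + f(1-f) (G0 - G1)^2."""
    q = 1.0 - p
    G0 = a * (p - q) + 2 * p * q * d    # mean at f = 0
    G1 = a * (p - q)                     # mean of fully inbred lines, f = 1
    var_G0 = 2 * p * q * (a + (q - p) * d) ** 2 + (2 * p * q * d) ** 2
    var_G1 = 4 * p * q * a * a
    return (1 - f) * var_G0 + f * var_G1 + f * (1 - f) * (G0 - G1) ** 2


def partitions(p, a, d, f):
    """Within- and amongst-line genic and quasi-dominance partitions."""
    q = 1.0 - p
    var_A0 = 2 * p * q * (a + (q - p) * d) ** 2
    var_D0 = (2 * p * q * d) ** 2
    var_a0 = 2 * p * q * a * a
    return {
        "within_genic": (1 - f) * var_A0,
        "within_quasi_dominance": (1 - f) * var_D0,
        "amongst_genic": 2 * f * var_a0,
        "amongst_quasi_dominance": f * (1 - f) * var_D0,
    }


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 0.75, 1.0):
        parts = partitions(0.5, 1.0, 0.6, f)
        print(f, round(sum(parts.values()), 6), round(wright_inbred_variance(0.5, 1.0, 0.6, f), 6))
```

Evaluated at p = 0.5, the genic partitions sum to twice their f = 0 value at f = 1, and the two quasi-dominance partitions together follow $(1 - f^2)\,\sigma^2_{D(0)}$.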
Recall that when f = 1, heterozygosity is zero, within-line variance is zero, and all genotypic variance is thus amongst-line variance and devoid of dominance variance. In other words, $\sigma^2_{G(1)}$ is the variance amongst fully inbred line means. Recall further [from "The mean after self-fertilization" section] that such means ($G_1$'s, in fact) are $G = a(p-q)$. Substituting (1 − q) for the p gives $G_1 = a(1 - 2q) = a - 2aq$.[14]: 265 Therefore, $\sigma^2_{G(1)}$ is actually $\sigma^2_{(a-2aq)}$. Now, in general, the variance of a difference (x − y) is $\sigma^2_x + \sigma^2_y - 2\,cov_{xy}$.[49]: 100 [50]: 232 Therefore, $\sigma^2_{G(1)} = \sigma^2_{(a)} + \sigma^2_{(2aq)} - 2\,cov_{(a,\,2aq)}$. But a (an allele effect) and q (an allele frequency) are independent, so this covariance is zero. Furthermore, a is a constant from one line to the next, so its variance across lines, $\sigma^2_{(a)}$, is also zero. Further, 2a is another constant (k), so $\sigma^2_{(2aq)}$ is of the type $\sigma^2_{(kX)}$. In general, the variance $\sigma^2_{(kX)}$ is equal to $k^2\,\sigma^2_X$.[50]: 232 Putting all this together reveals that $\sigma^2_{(a-2aq)} = (2a)^2\,\sigma^2_q$. Recall [from the section on "Continued genetic drift"] that $\sigma^2_q = pqf$. With f = 1 here within this present derivation, this becomes pq, and this is substituted into the previous expression.
The final result is: $\sigma^2_{G(1)} = \sigma^2_{(a-2aq)} = 4a^2\,pq = 2\,(2pq\,a^2) = 2\,\sigma^2_a$.
It follows immediately that $f\,\sigma^2_{G(1)} = 2f\,\sigma^2_a$. [This last f comes from the initial Sewall Wright equation: it is not the f just set to "1" in the derivation concluded two lines above.]
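The result $\sigma^2_{G(1)} = 4a^2pq$ can be illustrated by simulation. This Python sketch assumes, as under neutral drift to completion (consistent with $\sigma^2_q = pq$ at f = 1), that each fully inbred line fixes A with probability p; each line mean is then $G_1 = a - 2aq$ with the line's q equal to 0 or 1. The function name, seed and line count are arbitrary choices for illustration.

```python
import random
import statistics


def fixed_line_variance(p, a, n_lines=200_000, seed=1):
    """Simulate fully inbred (f = 1) lines and return the variance of their means G1 = a - 2aq."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_lines):
        # Each line is fixed: q_line = 0 (all AA) with probability p, otherwise 1 (all aa).
        q_line = 0.0 if rng.random() < p else 1.0
        means.append(a - 2 * a * q_line)
    return statistics.pvariance(means)


p, a = 0.3, 1.0
print(fixed_line_variance(p, a), 4 * a * a * p * (1 - p))
```

The simulated variance should sit close to the analytical $4a^2pq$ (0.84 for these values), differing only by sampling noise.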
Previous sections found that the within-line genic variance is based upon the substitution-derived genic variance ($\sigma^2_A$), but the amongst-line genic variance is based upon the gene-model allelic variance ($\sigma^2_a$). These two cannot simply be added to get the total genic variance. One approach to avoiding this problem was to revisit the derivation of the average allele substitution effect, and to construct a version ($\beta_f$) that incorporates the effects of the dispersion. Crow and Kimura achieved this[13]: 130–131 using the re-centered allele effects ($a^{\bullet}$, $d^{\bullet}$, $(-a)^{\bullet}$) discussed previously ["Gene effects re-defined"]. However, this was subsequently found to under-estimate the total genic variance slightly, and a new variance-based derivation led to a refined version.[36]
The refined version is: $\beta_f = \left\{ a^2 + \frac{1-f}{1+f}\,2(q-p)\,ad + \frac{1-f}{1+f}\,(q-p)^2 d^2 \right\}^{1/2}$
Consequently, $\sigma^2_{A(f)} = (1+f)\,2pq\,\beta_f^2$ now agrees exactly with $(1-f)\,\sigma^2_{A(0)} + 2f\,\sigma^2_{a(0)}$.
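The exact agreement can be verified numerically. In this Python sketch (hypothetical names), $\beta_f$ is computed from the refined formula; the comment notes why the quantity under the square root cannot be negative for 0 ≤ f ≤ 1. The total genic variance from $\beta_f$ is then compared with the sum of the within-line and amongst-line genic partitions.

```python
import math


def beta_f(p, a, d, f):
    """Refined average substitution effect under dispersion (inbreeding coefficient f)."""
    q = 1.0 - p
    r = (1 - f) / (1 + f)
    # The radicand equals (1 - r) a^2 + r [a + (q - p) d]^2, so it is never negative for 0 <= f <= 1.
    return math.sqrt(a * a + r * 2 * (q - p) * a * d + r * (q - p) ** 2 * d * d)


def total_genic_variance(p, a, d, f):
    """sigma2_A(f) = (1 + f) 2pq beta_f^2."""
    q = 1.0 - p
    return (1 + f) * 2 * p * q * beta_f(p, a, d, f) ** 2


def genic_partition_sum(p, a, d, f):
    """(1 - f) sigma2_A(0) + 2 f sigma2_a(0)."""
    q = 1.0 - p
    var_A0 = 2 * p * q * (a + (q - p) * d) ** 2
    var_a0 = 2 * p * q * a * a
    return (1 - f) * var_A0 + 2 * f * var_a0


print(total_genic_variance(0.3, 1.0, 0.5, 0.4), genic_partition_sum(0.3, 1.0, 0.5, 0.4))
```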
The total genic variance is of intrinsic interest in its own right. But, prior to the refinements by Gordon,[36] it had had another important use as well. There had been no extant estimators for the "dispersed" quasi-dominance. This had been estimated as the difference between Sewall Wright's inbred genotypic variance[37] and the total "dispersed" genic variance [see the previous sub-section]. An anomaly appeared, however, because the total quasi-dominance variance appeared to increase early in inbreeding despite the decline in heterozygosity.[14]: 128 : 266
The refinements in the previous sub-section corrected this anomaly.[36] At the same time, a direct solution for the total quasi-dominance variance was obtained, thus avoiding the need for the "subtraction" method of previous times. Furthermore, direct solutions for the amongst-line and within-line partitions of the quasi-dominance variance were also obtained, for the first time. [These have been presented in the section "Dispersion and the genotypic variance".]
The environmental variance is the phenotypic variability that cannot be ascribed to genetics. This sounds simple, but the experimental design needed to separate the two needs very careful planning. Even the "external" environment can be divided into spatial and temporal components ("Sites" and "Years"); or into partitions such as "litter" or "family", and "culture" or "history". These components are very dependent upon the actual experimental model used to do the research. Such issues are very important when doing the research itself, but in this article on quantitative genetics this overview may suffice.
It is an appropriate place, however, for a summary:
Phenotypic variance = genotypic variances + environmental variances + genotype-environment interaction + experimental "error" variance
i.e., $\sigma^2_P = \sigma^2_G + \sigma^2_E + \sigma^2_{GE} + \sigma^2$
or $\sigma^2_P = \sigma^2_A + \sigma^2_D + \sigma^2_I + \sigma^2_E + \sigma^2_{GE} + \sigma^2$
after partitioning the genotypic variance (G) into component variances "genic" (A), "quasi-dominance" (D), and "epistatic" (I).[51]
The environmental variance will appear in other sections, such as "Heritability" and "Correlated attributes".
The heritability of a trait is the proportion of the total (phenotypic) variance ($\sigma^2_P$) that is attributable to genetic variance, whether it be the full genotypic variance, or some component of it. It quantifies the degree to which phenotypic variability is due to genetics: but the precise meaning depends upon which genetical variance partition is used in the numerator of the proportion.[52] Research estimates of heritability have standard errors, just as all estimated statistics do.[53]
Where the numerator variance is the whole Genotypic variance ($\sigma^2_G$), the heritability is known as the "broad sense" heritability ($H^2$). It quantifies the degree to which variability in an attribute is determined by genetics as a whole. [See section on the Genotypic variance.]
If only the genic variance ($\sigma^2_A$) is used in the numerator, the heritability may be called "narrow sense" ($h^2$). It quantifies the extent to which phenotypic variance is determined by Fisher's substitution expectations variance. Fisher proposed that this narrow-sense heritability might be appropriate in considering the results of natural selection, focusing as it does on change-ability, that is, upon "adaptation".[29] He proposed it with regard to quantifying Darwinian evolution.
Recalling that the allelic variance ($\sigma^2_a$) and the dominance variance ($\sigma^2_d$) are eu-genetic components of the gene-model [see section on the Genotypic variance], and that $\sigma^2_D$ (the substitution deviations or "quasi-dominance" variance) and $cov_{ad}$ are due to changing from the homozygote midpoint (mp) to the population mean (G), it can be seen that the real meanings of these heritabilities are obscure. The heritabilities $h^2_a = \sigma^2_a / \sigma^2_P$ and $h^2_d = \sigma^2_d / \sigma^2_P$ have unambiguous meaning.
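Each heritability is a ratio with $\sigma^2_P$ as the denominator, and only the numerator changes. This Python sketch uses hypothetical function names and made-up variance components, not estimates from any study. It builds $\sigma^2_P$ from the component summary given earlier, then forms the broad-sense, narrow-sense and gene-model ($\sigma^2_a$, $\sigma^2_d$) ratios.

```python
def phenotypic_variance(var_A, var_D, var_I, var_E, var_GE, var_error):
    """sigma2_P = sigma2_A + sigma2_D + sigma2_I + sigma2_E + sigma2_GE + error variance."""
    return var_A + var_D + var_I + var_E + var_GE + var_error


def heritabilities(var_A, var_D, var_I, var_E, var_GE, var_error):
    """Broad-sense (whole genotypic variance) and narrow-sense (genic variance) ratios."""
    var_P = phenotypic_variance(var_A, var_D, var_I, var_E, var_GE, var_error)
    var_G = var_A + var_D + var_I
    return {"broad_H2": var_G / var_P, "narrow_h2": var_A / var_P}


def eu_genetic_heritabilities(var_a, var_d, var_P):
    """Ratios sigma2_a / sigma2_P and sigma2_d / sigma2_P."""
    return var_a / var_P, var_d / var_P


# Illustrative (made-up) variance components.
print(heritabilities(var_A=2.0, var_D=1.0, var_I=0.0, var_E=1.0, var_GE=0.0, var_error=0.0))
# {'broad_H2': 0.75, 'narrow_h2': 0.5}
```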
Narrow-sense heritability has also been used for predicting generally the results of artificial selection. In the latter case, however, the broad sense heritability may be more appropriate, as the whole attribute is being altered: not just adaptive capacity. Generally, advance from selection is more rapid the higher the heritability. [See section on "Selection".] In animals, heritability of reproductive traits is typically low, while heritability of disease resistance and production are moderately low to moderate, and heritability of body conformation is high.
Repeatability ($r^2$) is the proportion of phenotypic variance attributable to differences in repeated measures of the same subject, arising from later records. It is used particularly for long-lived species. This value can only be determined for traits that manifest multiple times in the organism's lifetime, such as adult body mass, metabolic rate or litter size. Individual birth mass, for example, would not have a repeatability value: but it would have a heritability value. Generally, but not always, repeatability indicates the upper level of the heritability.[54]
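$r^2 = (s^2_G + s^2_{PE}) / s^2_P$
where $s^2_{PE}$ is the permanent-environment variance, that is, the environmental variance that persists across repeated measurements of the same individual.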
The above concept of repeatability is, however, problematic for traits that necessarily change greatly between measurements. For example, body mass increases greatly in many organisms between birth and adulthood. Nonetheless, within a given age range (or life-cycle stage), repeated measures could be done, and repeatability would be meaningful within that stage.
From the heredity perspective, relations are individuals that inherited genes from one or more common ancestors. Therefore, their "relationship" can be quantified on the basis of the probability that they each have inherited a copy of an allele from the common ancestor. In earlier sections, the Inbreeding coefficient has been defined as "the probability that two same alleles (A and A, or a and a) have a common origin", or, more formally, "the probability that two homologous alleles are autozygous." Previously, the emphasis was on an individual's likelihood of having two such alleles, and the coefficient was framed accordingly. It is obvious, however, that this probability of autozygosity for an individual must also be the probability that each of its two parents had this autozygous allele. In this re-focused form, the probability is called the co-ancestry coefficient for the two individuals i and j ($f_{ij}$). In this form, it can be used to quantify the relationship between two individuals, and may also be known as the coefficient of kinship or the consanguinity coefficient.[13]: 132–143 [14]: 82–92
Pedigrees are diagrams of familial connections between individuals and their ancestors, and possibly between other members of the group that share genetical inheritance with them. They are relationship maps. A pedigree can be analyzed, therefore, to reveal coefficients of inbreeding and co-ancestry. Such pedigrees actually are informal depictions of path diagrams as used in path analysis, which was invented by Sewall Wright when he formulated his studies on inbreeding.[55]: 266–298 Using the adjacent diagram, the probability that individuals "B" and "C" have received autozygous alleles from ancestor "A" is 1/2 (one out of the two diploid alleles). This is the "de novo" inbreeding ($\Delta f_{Ped}$) at this step. However, the other allele may have had "carry-over" autozygosity from previous generations, so the probability of this occurring is (de novo complement multiplied by the inbreeding of ancestor A), that is $(1 - \Delta f_{Ped})\,f_A = (1/2)\,f_A$. Therefore, the total probability of autozygosity in B and C, following the bi-furcation of the pedigree, is the sum of these two components, namely $(1/2) + (1/2)\,f_A = (1/2)(1 + f_A)$. This can be viewed as the probability that two random gametes from ancestor A carry autozygous alleles, and in that context is called the coefficient of parentage ($f_{AA}$).[13]: 132–143 [14]: 82–92 It appears often in the following paragraphs.
Following the "B" path, the probability that any autozygous allele is "passed on" to each successive parent is again (1/2) at each step (including the last one to the "target" X). The overall probability of transfer down the "B path" is therefore $(1/2)^3$. The power that (1/2) is raised to can be viewed as "the number of intermediates in the path between A and X", $n_B = 3$. Similarly, for the "C path", $n_C = 2$, and the "transfer probability" is $(1/2)^2$. The combined probability of autozygous transfer from A to X is therefore $f_{AA}\,(1/2)^{n_B}\,(1/2)^{n_C}$. Recalling that $f_{AA} = (1/2)(1 + f_A)$, $f_X = f_{PQ} = (1/2)^{n_B + n_C + 1}\,(1 + f_A)$. In this example, assuming that $f_A = 0$, $f_X = 0.0156$ (rounded) $= f_{PQ}$, one measure of the "relatedness" between P and Q.
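The path calculation can be written as a short function. This Python sketch (hypothetical names) covers only the single common ancestor A of the diagram; pedigrees with several common ancestors or several paths are outside its scope. It multiplies the coefficient of parentage by $(1/2)$ for each step down the B and C paths.

```python
def coefficient_of_parentage(f_ancestor):
    """f_AA = (1/2)(1 + f_A): probability two random gametes of A carry autozygous alleles."""
    return 0.5 * (1.0 + f_ancestor)


def coancestry_through_ancestor(n_b, n_c, f_ancestor=0.0):
    """f_X = f_PQ = f_AA (1/2)^n_B (1/2)^n_C for one common ancestor A."""
    return coefficient_of_parentage(f_ancestor) * 0.5 ** n_b * 0.5 ** n_c


print(coancestry_through_ancestor(3, 2))  # 0.015625, i.e. 0.0156 rounded
```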
In this section, powers of (1/2) were used to represent the "probability of autozygosity". Later, this same method will be used to represent the proportions of ancestral gene-pools which are inherited down a pedigree [the section on "Relatedness between relatives"].
In the following sections on sib-crossing and similar topics, a number of "averaging rules" are useful. These derive from path analysis.[55] The rules show that any co-ancestry coefficient can be obtained as the average of cross-over co-ancestries between appropriate grand-parental and parental combinations. Thus, referring to the adjacent diagram, Cross-multiplier 1 is that $f_{PQ}$ = average of ($f_{AC}, f_{AD}, f_{BC}, f_{BD}$) $= (1/4)\,[\,f_{AC} + f_{AD} + f_{BC} + f_{BD}\,] = f_Y$. In a similar fashion, Cross-multiplier 2 states that $f_{PC} = (1/2)\,[\,f_{AC} + f_{BC}\,]$, while Cross-multiplier 3 states that $f_{PD} = (1/2)\,[\,f_{AD} + f_{BD}\,]$. Returning to the first multiplier, it can now be seen also to be $f_{PQ} = (1/2)\,[\,f_{PC} + f_{PD}\,]$, which, after substituting multipliers 2 and 3, resumes its original form.
In much of the following, the grand-parental generation is referred to as(t-2), the parent generation as(t-1), and the "target" generation ast.
The diagram to the right shows thatfull sib crossing is a direct application ofcross-Multiplier 1, with the slight modification thatparents A and B repeat (in lieu ofC and D) to indicate that individualsP1 andP2 have both oftheir parents in common—that is they arefull siblings. IndividualY is the result of the crossing of two full siblings. Therefore,fY = fP1,P2 = (1/4) [ fAA + 2 fAB + fBB ]. Recall thatfAA andfBB were defined earlier (in Pedigree analysis) ascoefficients of parentage, equal to (1/2)[1+fA ] and (1/2)[1+fB ] respectively, in the present context. Recognize that, in this guise, the grandparentsA andB representgeneration (t-2). Thus, assuming that in any one generation all levels of inbreeding are the same, these twocoefficients of parentage each represent(1/2) [1 + f(t-2) ].
Now, examinefAB. Recall that this also isfP1 orfP2, and so representstheir generation -f(t-1). Putting it all together,ft = (1/4) [ 2 fAA + 2 fAB ] = (1/4) [ 1 + f(t-2) + 2 f(t-1) ]. That is theinbreeding coefficient forFull-Sib crossing .[13]: 132–143 [14]: 82–92 The graph to the left shows the rate of this inbreeding over twenty repetitive generations. The "repetition" means that the progeny after cyclet become the crossing parents that generate cycle (t+1 ), and so on successively. The graphs also show the inbreeding forrandom fertilization 2N=20 for comparison. Recall that this inbreeding coefficient for progenyY is also theco-ancestry coefficient for its parents, and so is a measure of therelatedness of the two Fill siblings.
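The full-sib recurrence can be iterated directly. This sketch (the name `full_sib_inbreeding` is hypothetical) makes the same assumptions as the text. All individuals within a generation share one inbreeding level, and the base generations are non-inbred (f = 0).

```python
def full_sib_inbreeding(generations):
    """Iterate f_t = (1/4)(1 + f_{t-2} + 2 f_{t-1}) from a non-inbred base."""
    f_prev2, f_prev1 = 0.0, 0.0  # f_{t-2}, f_{t-1}: base generations set to 0
    out = []
    for _ in range(generations):
        f_t = 0.25 * (1.0 + f_prev2 + 2.0 * f_prev1)
        out.append(f_t)
        f_prev2, f_prev1 = f_prev1, f_t  # progeny become the next crossing parents
    return out


print(full_sib_inbreeding(4))  # [0.25, 0.375, 0.5, 0.59375]
```

Run for more generations, the sequence rises towards 1, the complete-autozygosity limit.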
Derivation of the half sib crossing takes a slightly different path from that for full sibs. In the adjacent diagram, the two half-sibs at generation (t-1) have only one parent in common: parent "A" at generation (t-2). Cross-multiplier 1 is used again, giving $f_Y = f_{(P1,P2)} = \tfrac{1}{4}[f_{AA} + f_{AC} + f_{BA} + f_{BC}]$. There is just one coefficient of parentage this time, but three co-ancestry coefficients at the (t-2) level. One of them, $f_{BC}$, is a "dummy" and does not represent an actual individual in the (t-1) generation. As before, the coefficient of parentage is $\tfrac{1}{2}[1 + f_A]$, and the three co-ancestries each represent $f_{t-1}$. Recalling that $f_A$ represents $f_{t-2}$, the final gathering and simplifying of terms gives $f_Y = f_t = \tfrac{1}{8}[1 + f_{t-2} + 6 f_{t-1}]$.[13]: 132–143 [14]: 82–92 The graphs at left include this half-sib (HS) inbreeding over twenty successive generations.
As before, this also quantifies the relatedness of the two half-sibs at generation (t-1) in its alternative form of $f_{(P1,P2)}$.
A pedigree diagram for selfing is on the right. It is so straightforward that it does not require any cross-multiplication rules. It employs just the basic juxtaposition of the inbreeding coefficient and its alternative, the co-ancestry coefficient. It then recognizes that, in this case, the latter is also a coefficient of parentage. Thus, $f_Y = f_{(P1,P1)} = f_t = \tfrac{1}{2}[1 + f_{t-1}]$.[13]: 132–143 [14]: 82–92 This is the fastest rate of inbreeding of all types, as can be seen in the graphs above. The selfing curve is, in fact, a graph of the coefficient of parentage.
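The sketch below compares the selfing, full-sib and half-sib recurrences side by side, iterating each from a non-inbred base. Each rule takes $f_{t-1}$ and $f_{t-2}$. `iterate` and `systems` are illustrative names, not taken from the source.

```python
def iterate(rule, generations):
    """Apply f_t = rule(f_{t-1}, f_{t-2}) starting from a non-inbred base (f = 0)."""
    f1, f2 = 0.0, 0.0  # f_{t-1}, f_{t-2}
    out = []
    for _ in range(generations):
        f = rule(f1, f2)
        out.append(f)
        f1, f2 = f, f1  # shift one generation
    return out


systems = {
    "selfing": lambda f1, f2: 0.5 * (1 + f1),
    "full sib": lambda f1, f2: 0.25 * (1 + f2 + 2 * f1),
    "half sib": lambda f1, f2: 0.125 * (1 + f2 + 6 * f1),
}
for name, rule in systems.items():
    print(name, [round(f, 4) for f in iterate(rule, 6)])
```

In every generation the selfing value exceeds the full-sib value, which in turn exceeds the half-sib value, matching the ordering of the curves.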
These are derived with methods similar to those for siblings.[13]: 132–143 [14]: 82–92 As before, the co-ancestry viewpoint of the inbreeding coefficient provides a measure of "relatedness" between the parents P1 and P2 in these cousin expressions.
The pedigree for First Cousins (FC) is given to the right. The prime equation is $f_Y = f_t = f_{P1,P2} = \tfrac{1}{4}[f_{1D} + f_{12} + f_{CD} + f_{C2}]$. After substitution with corresponding inbreeding coefficients, gathering of terms and simplifying, this becomes $f_t = \tfrac{1}{4}\big[3 f_{t-1} + \tfrac{1}{4}[2 f_{t-2} + f_{t-3} + 1]\big]$. This is a version for iteration, useful for observing the general pattern and for computer programming. A "final" version is $f_t = \tfrac{1}{16}[12 f_{t-1} + 2 f_{t-2} + f_{t-3} + 1]$.
The Second Cousins (SC) pedigree is on the left. Parents in the pedigree not related to the common ancestor are indicated by numerals instead of letters. Here, the prime equation is $f_Y = f_t = f_{P1,P2} = \tfrac{1}{4}[f_{3F} + f_{34} + f_{EF} + f_{E4}]$. After working through the appropriate algebra, this becomes $f_t = \tfrac{1}{4}\Big[3 f_{t-1} + \tfrac{1}{4}\big[3 f_{t-2} + \tfrac{1}{4}[2 f_{t-3} + f_{t-4} + 1]\big]\Big]$, which is the iteration version. A "final" version is $f_t = \tfrac{1}{64}[48 f_{t-1} + 12 f_{t-2} + 2 f_{t-3} + f_{t-4} + 1]$.
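As a check on the algebra, the iteration ("nested") and "final" forms for first and second cousins can be compared numerically on arbitrary values of past inbreeding. This is a sketch with hypothetical function names. List position j holds $f_{t-j}$, and position 0 is unused.

```python
import random


def fc_nested(f):
    return 0.25 * (3 * f[1] + 0.25 * (2 * f[2] + f[3] + 1))


def fc_final(f):
    return (12 * f[1] + 2 * f[2] + f[3] + 1) / 16


def sc_nested(f):
    return 0.25 * (3 * f[1] + 0.25 * (3 * f[2] + 0.25 * (2 * f[3] + f[4] + 1)))


def sc_final(f):
    return (48 * f[1] + 12 * f[2] + 2 * f[3] + f[4] + 1) / 64


for _ in range(1000):
    f = [None] + [random.random() for _ in range(4)]  # f[j] stands for f_{t-j}
    assert abs(fc_nested(f) - fc_final(f)) < 1e-12
    assert abs(sc_nested(f) - sc_final(f)) < 1e-12
print("iteration and final forms agree")
```

With all past values at 0, the final forms give 1/16 for first cousins and 1/64 for second cousins in the first generation.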
To visualize the pattern in full cousin equations, start the series with the full sib equation re-written in iteration form: $f_t = \tfrac{1}{4}[2 f_{t-1} + f_{t-2} + 1]$. Notice that this is the "essential plan" of the last term in each of the cousin iterative forms. The only difference is that the generation indices increment by 1 at each cousin "level". Now, define the cousin level as k = 1 (for first cousins), k = 2 (for second cousins), k = 3 (for third cousins), and so on, with k = 0 for full sibs, which are "zero level cousins". The last term can now be written as $\tfrac{1}{4}[2 f_{t-(1+k)} + f_{t-(2+k)} + 1]$. Stacked in front of this last term are one or more iteration increments of the form $\tfrac{1}{4}[3 f_{t-j} + \ldots$, where j is the iteration index and takes values from 1 to k over the successive iterations as needed. Putting all this together provides a general formula for all levels of full cousin, including full sibs. For k-th level full cousins,

$$f^{\{k\}}_t = \tfrac{1}{4}\Big[3 f_{t-1} + \tfrac{1}{4}\Big[3 f_{t-2} + \cdots + \tfrac{1}{4}\Big[3 f_{t-k} + \tfrac{1}{4}\big[2 f_{t-(1+k)} + f_{t-(2+k)} + 1\big]\Big]\cdots\Big]\Big].$$

At the commencement of iteration, all $f_{t-x}$ are set at 0, and each has its value substituted as it is calculated through the generations. The graphs to the right show the successive inbreeding for several levels of full cousins.
For first half-cousins (FHC), the pedigree is to the left. Notice there is just one common ancestor (individual A). Also, as for second cousins, parents not related to the common ancestor are indicated by numerals. Here, the prime equation is $f_Y = f_t = f_{P1,P2} = \tfrac{1}{4}[f_{3D} + f_{34} + f_{CD} + f_{C4}]$. After working through the appropriate algebra, this becomes $f_t = \tfrac{1}{4}\big[3 f_{t-1} + \tfrac{1}{8}[6 f_{t-2} + f_{t-3} + 1]\big]$, which is the iteration version. A "final" version is $f_t = \tfrac{1}{32}[24 f_{t-1} + 6 f_{t-2} + f_{t-3} + 1]$. The iteration algorithm is similar to that for full cousins, except that the last term is $\tfrac{1}{8}[6 f_{t-(1+k)} + f_{t-(2+k)} + 1]$. Notice that this last term is basically similar to the half sib equation, in parallel to the pattern for full cousins and full sibs. In other words, half sibs are "zero level" half cousins.
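The general iteration for k-th level cousins can be sketched as follows, assuming that generations before the first crossing are non-inbred. `cousin_inbreeding` is a hypothetical name. Setting `half=True` swaps in the half-sib style last term given above, so k = 0 reproduces the full-sib and half-sib recurrences.

```python
def cousin_inbreeding(k, generations, half=False):
    """Inbreeding f_t under repeated k-th level full (or half) cousin crossing.

    k = 0 gives full sibs (half sibs when half=True), k = 1 first cousins, and so on.
    Generations before the first crossing are treated as non-inbred (f = 0).
    """
    f = {}  # f[t] for t = 1, 2, ...

    def past(t):
        return f.get(t, 0.0)  # generations t <= 0 default to 0

    for t in range(1, generations + 1):
        # the "last term" of the iteration
        if half:
            term = (6 * past(t - (1 + k)) + past(t - (2 + k)) + 1) / 8
        else:
            term = (2 * past(t - (1 + k)) + past(t - (2 + k)) + 1) / 4
        # stack the increments (1/4)[3 f_{t-j} + ...] from j = k down to j = 1
        for j in range(k, 0, -1):
            term = (3 * past(t - j) + term) / 4
        f[t] = term
    return [f[t] for t in range(1, generations + 1)]


print(cousin_inbreeding(1, 3))  # [0.0625, 0.109375, 0.15234375]
```

Each generation's values agree with the "final" versions given above, for example $\tfrac{1}{16}[12 f_{t-1} + 2 f_{t-2} + f_{t-3} + 1]$ for first cousins.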
There is a tendency to regard cousin crossing from a human-oriented point of view, possibly because of a wide interest in genealogy. The use of pedigrees to derive the inbreeding perhaps reinforces this "family history" view. However, such kinds of inter-crossing occur also in natural populations, especially those that are sedentary or have a "breeding area" that they re-visit from season to season. The progeny-group of a harem with a dominant male, for example, may contain elements of sib-crossing, cousin crossing, and backcrossing, as well as genetic drift, especially of the "island" type. In addition to that, the occasional "outcross" adds an element of hybridization to the mix. It is not panmixia.
Following the hybridizing between A and R, the F1 (individual B) is crossed back (BC1) to an original parent (R) to produce the BC1 generation (individual C). [It is usual to use the same label for the act of making the back-cross and for the generation produced by it. The act of back-crossing is here in italics.] Parent R is the recurrent parent. Two successive backcrosses are depicted, with individual D being the BC2 generation. These generations have also been given t indices, as indicated. As before, $f_D = f_t = f_{CR} = \tfrac{1}{2}[f_{RB} + f_{RR}]$, using cross-multiplier 2 previously given. The $f_{RB}$ just defined is the one that involves generation (t-1) with (t-2). However, there is another such $f_{RB}$ contained wholly within generation (t-2) as well, and it is this one that is used now: as the co-ancestry of the parents of individual C in generation (t-1). As such, it is also the inbreeding coefficient of C, and hence is $f_{t-1}$. The remaining $f_{RR}$ is the coefficient of parentage of the recurrent parent, and so is $\tfrac{1}{2}[1 + f_R]$. Putting all this together: $f_t = \tfrac{1}{2}\big[\tfrac{1}{2}[1 + f_R] + f_{t-1}\big] = \tfrac{1}{4}[1 + f_R + 2 f_{t-1}]$. The graphs at right illustrate backcross inbreeding over twenty backcrosses for three different levels of (fixed) inbreeding in the recurrent parent.
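Here is a sketch of the backcross recurrence. As in the graphs, it assumes that the recurrent parent's inbreeding $f_R$ is fixed, and it also assumes a non-inbred F1. Setting $f_t = f_{t-1}$ in the recurrence gives the asymptote $(1 + f_R)/2$, which the iteration approaches. The function name is hypothetical.

```python
def backcross_inbreeding(cycles, f_r=0.0):
    """f_t = (1/4)(1 + f_R + 2 f_{t-1}), starting from a non-inbred F1 (f = 0).

    f_r -- inbreeding of the recurrent parent, held fixed across backcrosses
    """
    f_prev, out = 0.0, []
    for _ in range(cycles):
        f_prev = 0.25 * (1.0 + f_r + 2.0 * f_prev)
        out.append(f_prev)
    return out


for f_r in (0.0, 0.5, 1.0):
    print(f_r, [round(f, 4) for f in backcross_inbreeding(5, f_r)])
```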
This routine is commonly used in animal and plant breeding programmes. Often after making the hybrid (especially if individuals are short-lived), the recurrent parent needs separate "line breeding" for its maintenance as a future recurrent parent in the backcrossing. This maintenance may be through selfing, through full-sib or half-sib crossing, or through restricted randomly fertilized populations, depending on the species' reproductive possibilities. Of course, this incremental rise in $f_R$ carries over into the $f_t$ of the backcrossing. Because $f_R$ is not at a fixed level from the outset, the result is a more gradual curve rising to the asymptotes than shown in the present graphs.
In the section on "Pedigree analysis", $(\tfrac{1}{2})^n$ was used to represent probabilities of autozygous allele descent over n generations down branches of the pedigree. This formula arose because of the rules imposed by sexual reproduction: (i) two parents contributing virtually equal shares of autosomal genes, and (ii) successive dilution for each generation between the zygote and the "focus" level of parentage. These same rules apply also to any other viewpoint of descent in a two-sex reproductive system. One such is the proportion of any ancestral gene-pool (also known as 'germplasm') which is contained within any zygote's genotype.
Therefore, the proportion of an ancestral genepool in a genotype is $(\tfrac{1}{2})^n$, where n = number of sexual generations between the zygote and the focus ancestor.
For example, each parent defines a genepool contributing $\tfrac{1}{2}$ to its offspring, while each great-grandparent contributes $\tfrac{1}{8}$ to its great-grand-offspring.
The zygote's total genepool (Γ) is, of course, the sum of the sexual contributions to its descent.
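Under the equal-shares rule described above, each ancestor n generations back contributes $(\tfrac{1}{2})^n$, and there are $2^n$ such ancestors. The contributions at any one ancestral level therefore add up to a single whole genepool. The small sketch below uses exact fractions; the function name is hypothetical.

```python
from fractions import Fraction


def genepool_share(n):
    """Share of one ancestor's genepool in a zygote n sexual generations later."""
    return Fraction(1, 2) ** n


for n, label in [(1, "parent"), (2, "grandparent"), (3, "great-grandparent")]:
    share = genepool_share(n)
    # 2**n ancestors sit at level n, so their shares add up to one whole genepool
    print(label, share, 2 ** n * share)
```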
Individuals descended from a common ancestral genepool obviously are related. This is not to say they are identical in their genes (alleles), because, at each level of ancestor, segregation and assortment will have occurred in producing gametes. But they will have originated from the same pool of alleles available for these meioses and subsequent fertilizations. [This idea was first encountered in the sections on pedigree analysis and relationships.] The genepool contributions [see section above] of their nearest common ancestral genepool (an ancestral node) can therefore be used to define their relationship. This leads to an intuitive definition of relationship which conforms well with familiar notions of "relatedness" found in family history. It also permits comparisons of the "degree of relatedness" for complex patterns of relations arising from such genealogy.
The only modifications necessary (for each individual in turn) are in Γ and are due to the shift to "shared common ancestry" rather than "individual total ancestry". For this, define Ρ (in lieu of Γ); m = number of ancestors-in-common at the node (i.e. m = 1 or 2 only); and an "individual index" k. Thus:

$$\mathrm{P}_k = m \left(\tfrac{1}{2}\right)^{n}$$
where, as before,n = number of sexual generations between the individual and the ancestral node.
An example is provided by two full first cousins. Their nearest common ancestral node is their grandparents, who gave rise to their two sibling parents, and they have both of these grandparents in common. [See earlier pedigree.] For this case, m = 2 and n = 2, so for each of them $\mathrm{P} = 2\,(\tfrac{1}{2})^2 = \tfrac{1}{2}$.
In this simple case, each cousin has numerically the same Ρ.
A second example might be between two full cousins, but one (k = 1) has three generations back to the ancestral node (n = 3), and the other (k = 2) only two (n = 2) [i.e. a second and first cousin relationship]. For both, m = 2 (they are full cousins). Then $\mathrm{P}_1 = 2\,(\tfrac{1}{2})^3 = \tfrac{1}{4}$
and
$\mathrm{P}_2 = 2\,(\tfrac{1}{2})^2 = \tfrac{1}{2}$. Notice each cousin has a different $\mathrm{P}_k$.
In any pairwise relationship estimation, there is one $\mathrm{P}_k$ for each individual. It remains to average them in order to combine them into a single "Relationship coefficient". Because each Ρ is a fraction of a total genepool, the appropriate average for them is the geometric mean, $\mathrm{GRC} = \sqrt{\mathrm{P}_1\,\mathrm{P}_2}$.[56][57]: 34–55 This average is their Genepool Relationship Coefficient, the "GRC".
For the first example (two full first-cousins), their GRC = 0.5; for the second case (a full first and second cousin), their GRC = 0.3536.
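The GRC can be computed from each individual's (m, n) pair. This sketch assumes $\mathrm{P}_k = m(\tfrac{1}{2})^n$, the form that reproduces the worked examples above (m = 2, n = 2 gives 1/2; m = 2, n = 3 gives 1/4). When comparing a parent with its offspring, it treats the parent as its own ancestral node (m = 1, n = 0). The names `rho` and `grc` are hypothetical.

```python
import math


def rho(m, n):
    """Share of the ancestral node's genepool: m ancestors-in-common, n generations."""
    return m * 0.5 ** n


def grc(*individuals):
    """Genepool Relationship Coefficient: geometric mean of each individual's rho.

    Each individual is given as an (m, n) pair.
    """
    values = [rho(m, n) for m, n in individuals]
    return math.prod(values) ** (1 / len(values))


print(round(grc((2, 2), (2, 2)), 4))  # full first cousins: 0.5
print(round(grc((2, 3), (2, 2)), 4))  # full first and second cousin: 0.3536
```

With these conventions the sketch also reproduces rows of the summary table, for example parent ↔ offspring, where `grc((1, 0), (1, 1))` is about 0.7071.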
All of these relationships (GRC) are applications of path analysis.[55]: 214–298 A summary of some levels of relationship (GRC) follows.
GRC | Relationship examples |
---|---|
1.00 | full Sibs |
0.7071 | Parent ↔ Offspring; Uncle/Aunt ↔ Nephew/Niece |
0.5 | full First Cousins; half Sibs; grand Parent ↔ grand Offspring |
0.3536 | full Cousins First ↔ Second; full First Cousins {1 remove} |
0.25 | full Second Cousins; half First Cousins; full First Cousins {2 removes} |
0.1768 | full First Cousins {3 removes}; full Second Cousins {1 remove} |
0.125 | full Third Cousins; half Second Cousins; full First Cousins {4 removes} |
0.0884 | full First Cousins {5 removes}; half Second Cousins {1 remove} |
0.0625 | full Fourth Cousins; half Third Cousins |
These covariances, in like manner to the genotypic variances, can be derived through either the gene-model ("Mather") approach or the allele-substitution ("Fisher") approach. Here, each method is demonstrated for alternate cases.
These can be viewed either as the covariance between any offspring and any one of its parents (PO), or as the covariance between any offspring and the "mid-parent" value of both its parents (MPO).
This can be derived as the sum of cross-products between parent gene-effects and one-half of the progeny expectations using the allele-substitution approach. The one-half of the progeny expectation accounts for the fact that only one of the two parents is being considered. The appropriate parental gene-effects are therefore the second-stage redefined gene effects used to define the genotypic variances earlier, that is: $a'' = 2q(a - qd)$ and $d'' = (q-p)a + 2pqd$ and also $(-a)'' = -2p(a + pd)$ [see section "Gene effects redefined"]. Similarly, the appropriate progeny effects, for allele-substitution expectations, are one-half of the earlier breeding values, the latter being: $a_{AA} = 2qa$, and $a_{Aa} = (q-p)a$ and also $a_{aa} = -2pa$ [see section on "Genotype substitution – Expectations and Deviations"].
Because all of these effects are defined already as deviates from the genotypic mean, the cross-product sum using {genotype-frequency * parental gene-effect * half-breeding-value} immediately provides the allele-substitution-expectation covariance between any one parent and its offspring. After careful gathering of terms and simplification, this becomes $\mathrm{cov}(PO)_A = pq\,a^2 = \tfrac{1}{2}\sigma^2_A$.[13]: 132–141 [14]: 134–147
Unfortunately, the allele-substitution-deviations are usually overlooked, but they have not "ceased to exist" nonetheless! Recall that these deviations are: $d_{AA} = -2q^2 d$, and $d_{Aa} = 2pq\,d$ and also $d_{aa} = -2p^2 d$ [see section on "Genotype substitution – Expectations and Deviations"]. Consequently, the cross-product sum using {genotype-frequency * parental gene-effect * half-substitution-deviations} also immediately provides the allele-substitution-deviations covariance between any one parent and its offspring. Once more, after careful gathering of terms and simplification, this becomes $\mathrm{cov}(PO)_D = 2p^2q^2d^2 = \tfrac{1}{2}\sigma^2_D$.
It follows therefore that $\mathrm{cov}(PO) = \mathrm{cov}(PO)_A + \mathrm{cov}(PO)_D = \tfrac{1}{2}\sigma^2_A + \tfrac{1}{2}\sigma^2_D$, when dominance is not overlooked!
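The two cross-product sums can be checked numerically. This sketch evaluates them for arbitrary illustrative values of p, a and d. It uses the parental gene effects, half breeding values and half substitution deviations listed above, and only reproduces the algebra described here. It takes $\sigma^2_A = 2pq\,a^2$ and $\sigma^2_D = (2pq\,d)^2$, the values implied by the equalities in the text. The function name is hypothetical.

```python
def po_covariances(p, a, d):
    """Parent/offspring cross-product sums (allele-substitution approach)."""
    q = 1.0 - p
    freq = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    # second-stage redefined parental gene effects a'', d'', (-a)''
    parent = {"AA": 2 * q * (a - q * d),
              "Aa": (q - p) * a + 2 * p * q * d,
              "aa": -2 * p * (a + p * d)}
    # one-half of the breeding values and of the substitution deviations
    half_bv = {"AA": q * a, "Aa": (q - p) * a / 2, "aa": -p * a}
    half_dev = {"AA": -q * q * d, "Aa": p * q * d, "aa": -p * p * d}
    cov_a = sum(freq[g] * parent[g] * half_bv[g] for g in freq)
    cov_d = sum(freq[g] * parent[g] * half_dev[g] for g in freq)
    return cov_a, cov_d


p, a, d = 0.3, 1.0, 0.4  # illustrative values only
q = 1 - p
cov_a, cov_d = po_covariances(p, a, d)
print(cov_a, p * q * a ** 2)                # both equal pq a^2
print(cov_d, 2 * p ** 2 * q ** 2 * d ** 2)  # both equal 2 p^2 q^2 d^2
```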
Because there are many combinations of parental genotypes, there are many different mid-parents and offspring means to consider, together with the varying frequencies of obtaining each parental pairing. The gene-model approach is the most expedient in this case. Therefore, an unadjusted sum of cross-products (USCP), using all products {parent-pair-frequency * mid-parent-gene-effect * offspring-genotype-mean}, is adjusted by subtracting the {overall genotypic mean}² as correction factor (CF). After multiplying out all the various combinations, carefully gathering terms, simplifying, factoring and cancelling out where applicable, this becomes:
$\mathrm{cov}(MPO) = pq\,[a + (q-p)d]^2 = pq\,a^2 = \tfrac{1}{2}\sigma^2_A$, with no dominance having been overlooked in this case, as it had been used up in defining the a.[13]: 132–141 [14]: 134–147
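The corrected sum of cross-products for mid-parent and offspring can be sketched by enumerating all nine ordered parent pairs. The sketch assumes random mating in Hardy-Weinberg proportions and genotypic values +a, d and −a for AA, Aa and aa. The result should match $pq[a + (q-p)d]^2$. The values of p, a and d are illustrative only, and the function name is hypothetical.

```python
from itertools import product


def cov_mpo(p, a, d):
    """Mid-parent/offspring covariance by enumerating parent pairs (gene model)."""
    q = 1.0 - p
    # genotype: (frequency, genotypic value, probability of transmitting allele A)
    geno = [(p * p, a, 1.0), (2 * p * q, d, 0.5), (q * q, -a, 0.0)]
    mean = sum(f * v for f, v, _ in geno)
    uscp = 0.0
    for (f1, v1, t1), (f2, v2, t2) in product(geno, repeat=2):
        # mean genotypic value of the offspring of this parent pair
        offspring = (t1 * t2 * a
                     + (t1 * (1 - t2) + (1 - t1) * t2) * d
                     - (1 - t1) * (1 - t2) * a)
        uscp += f1 * f2 * ((v1 + v2) / 2) * offspring
    return uscp - mean ** 2  # subtract the correction factor CF


p, a, d = 0.3, 1.0, 0.4  # illustrative values only
q = 1 - p
print(cov_mpo(p, a, d), p * q * (a + (q - p) * d) ** 2)
```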
The most obvious application is an experiment that contains all parents and their offspring, with or without reciprocal crosses. It should preferably be replicated without bias, enabling estimation of all appropriate means, variances and covariances, together with their standard errors. These estimated statistics can then be used to estimate the genetic variances. Twice the difference between the estimates of the two forms of (corrected) parent-offspring covariance provides an estimate of $\sigma^2_D$; and twice the cov(MPO) estimates $\sigma^2_A$. With appropriate experimental design and analysis,[9][49][50] standard errors can be obtained for these genetical statistics as well. This is the basic core of an experiment known as Diallel analysis, the Mather, Jinks and Hayman version of which is discussed in another section.
A second application involves using regression analysis, which estimates from statistics the ordinate (Y-estimate), derivative (regression coefficient) and constant (Y-intercept) of calculus.[9][49][58][59] The regression coefficient estimates the rate of change of the function predicting Y from X. It is based on minimizing the (squared) residuals between the fitted curve and the observed data (MINRES). No alternative method of estimating such a function satisfies this basic requirement of MINRES. In general, the regression coefficient is estimated as the ratio of the covariance (XY) to the variance of the determinator (X). In practice, the sample size is usually the same for both X and Y, so this can be written as SCP(XY) / SS(X), where all terms have been defined previously.[9][58][59] In the present context, the parents are viewed as the "determinative variable" (X), the offspring as the "determined variable" (Y), and the regression coefficient as the "functional relationship" ($\beta_{PO}$) between the two. Take $\mathrm{cov}(MPO) = \tfrac{1}{2}\sigma^2_A$ as cov(XY), and $\sigma^2_P / 2$ (the variance of the mean of two parents, the mid-parent) as $\sigma^2_X$. It can then be seen that $\beta_{MPO} = [\tfrac{1}{2}\sigma^2_A] / [\tfrac{1}{2}\sigma^2_P] = h^2$.[60] Next, utilizing $\mathrm{cov}(PO) = [\tfrac{1}{2}\sigma^2_A + \tfrac{1}{2}\sigma^2_D]$ as cov(XY), and $\sigma^2_P$ as $\sigma^2_X$, it is seen that $2\beta_{PO} = [2(\tfrac{1}{2}\sigma^2_A + \tfrac{1}{2}\sigma^2_D)] / \sigma^2_P = H^2$.
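The regression coefficient, computed as SCP(XY) / SS(X), takes only a few lines. The mid-parent and offspring values below are made-up numbers chosen so that the slope is exactly 0.5; they are not data. Per the text, a mid-parent regression estimates $h^2$, and twice a single-parent regression estimates $H^2$. The function name is hypothetical.

```python
def regression_slope(x, y):
    """Least-squares regression coefficient of y on x: SCP(XY) / SS(X)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    scp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    return scp / ss_x


# Hypothetical mid-parent (x) and offspring-mean (y) values, for illustration only.
mid_parent = [10.0, 12.0, 14.0, 16.0]
offspring = [11.0, 12.0, 13.0, 14.0]
print(regression_slope(mid_parent, offspring))  # 0.5, read here as an estimate of h^2
```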
Analysis of epistasis has previously been attempted via an interaction variance approach of the type $\sigma^2_{AA}$, $\sigma^2_{AD}$ and $\sigma^2_{DD}$. This has been integrated with these present covariances in an effort to provide estimators for the epistasis variances. However, the findings of epigenetics suggest that this may not be an appropriate way to define epistasis.
Covariance between half-sibs (HS) is defined easily using allele-substitution methods; but, once again, the dominance contribution has historically been omitted. However, as with the mid-parent/offspring covariance, the covariance between full-sibs (FS) requires a "parent-combination" approach, thereby necessitating the use of the gene-model corrected-cross-product method; and the dominance contribution has not historically been overlooked. The superiority of the gene-model derivations is as evident here as it was for the Genotypic variances.
The sum of the cross-products {common-parent frequency * half-breeding-value of one half-sib * half-breeding-value of any other half-sib in that same common-parent-group} immediately provides one of the required covariances. This is because the effects used [breeding values, representing the allele-substitution expectations] are already defined as deviates from the genotypic mean [see section on "Allele substitution – Expectations and deviations"]. After simplification, this becomes $\mathrm{cov}(HS)_A = \tfrac{1}{2} pq\,a^2 = \tfrac{1}{4}\sigma^2_A$.[13]: 132–141 [14]: 134–147 However, the substitution deviations also exist. They define the sum of the cross-products {common-parent frequency * half-substitution-deviation of one half-sib * half-substitution-deviation of any other half-sib in that same common-parent-group}, which ultimately leads to $\mathrm{cov}(HS)_D = p^2 q^2 d^2 = \tfrac{1}{4}\sigma^2_D$. Adding the two components gives:
$\mathrm{cov}(HS) = \mathrm{cov}(HS)_A + \mathrm{cov}(HS)_D = \tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D$.
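The half-sib cross-product sums can be checked numerically in the same way as the parent/offspring ones. Each product pairs a half-sib with another half-sib of the same common parent, so each term is a square weighted by the common-parent frequency. The values of p, a and d are illustrative, and $\sigma^2_A = 2pq\,a^2$ and $\sigma^2_D = (2pq\,d)^2$ are taken as implied by the text. The function name is hypothetical.

```python
def hs_covariances(p, a, d):
    """Half-sib covariance components from the common-parent cross-product sums."""
    q = 1.0 - p
    freq = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}  # common-parent frequencies
    half_bv = {"AA": q * a, "Aa": (q - p) * a / 2, "aa": -p * a}
    half_dev = {"AA": -q * q * d, "Aa": p * q * d, "aa": -p * p * d}
    cov_a = sum(freq[g] * half_bv[g] ** 2 for g in freq)   # half-BV x half-BV
    cov_d = sum(freq[g] * half_dev[g] ** 2 for g in freq)  # half-dev x half-dev
    return cov_a, cov_d


p, a, d = 0.3, 1.0, 0.4  # illustrative values only
q = 1 - p
cov_a, cov_d = hs_covariances(p, a, d)
print(cov_a, 0.5 * p * q * a ** 2)      # (1/4) sigma^2_A with sigma^2_A = 2pq a^2
print(cov_d, p ** 2 * q ** 2 * d ** 2)  # (1/4) sigma^2_D with sigma^2_D = (2pqd)^2
```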
As explained in the introduction, a method similar to that used for mid-parent/progeny covariance is used. Therefore, an unadjusted sum of cross-products (USCP) using all products {parent-pair-frequency * the square of the offspring-genotype-mean} is adjusted by subtracting the {overall genotypic mean}² as correction factor (CF). In this case, multiplying out all combinations, carefully gathering terms, simplifying, factoring, and cancelling out is very protracted. It eventually becomes:
$\mathrm{cov}(FS) = pq\,a^2 + p^2 q^2 d^2 = \tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D$, with no dominance having been overlooked.[13]: 132–141 [14]: 134–147
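For full sibs, the corrected sum of squared family means can be enumerated in the same way as for the mid-parent case. This again assumes random mating, Hardy-Weinberg genotype frequencies and the gene-model values +a, d and −a. In the comparison, the "a" of the formula above is read as the redefined $a + (q-p)d$, as in the cov(MPO) result; that reading is an assumption of this sketch. The function name is hypothetical.

```python
from itertools import product


def cov_fs(p, a, d):
    """Full-sib covariance as the corrected sum of squared family (parent-pair) means."""
    q = 1.0 - p
    # genotype: (frequency, probability of transmitting allele A)
    geno = [(p * p, 1.0), (2 * p * q, 0.5), (q * q, 0.0)]
    mean = p * p * a + 2 * p * q * d - q * q * a
    uscp = 0.0
    for (f1, t1), (f2, t2) in product(geno, repeat=2):
        family_mean = (t1 * t2 * a
                       + (t1 * (1 - t2) + (1 - t1) * t2) * d
                       - (1 - t1) * (1 - t2) * a)
        uscp += f1 * f2 * family_mean ** 2
    return uscp - mean ** 2  # subtract the correction factor CF


p, a, d = 0.3, 1.0, 0.4  # illustrative values only
q = 1 - p
alpha = a + (q - p) * d  # the redefined "a" of the cov(MPO) result
print(cov_fs(p, a, d), p * q * alpha ** 2 + p ** 2 * q ** 2 * d ** 2)
```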
The most useful application here for genetical statistics is the correlation between half-sibs. Recall that the correlation coefficient (r) is the ratio of the covariance to the variance [see section on "Associated attributes" for example]. Therefore, $r_{HS} = \mathrm{cov}(HS) / \sigma^2_{\text{all HS together}} = [\tfrac{1}{4}\sigma^2_A + \tfrac{1}{4}\sigma^2_D] / \sigma^2_P = \tfrac{1}{4}H^2$.[61] The correlation between full-sibs is of little utility, being $r_{FS} = \mathrm{cov}(FS) / \sigma^2_{\text{all FS together}} = [\tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_D] / \sigma^2_P$. The suggestion that it "approximates" $\tfrac{1}{2}h^2$ is poor advice.
Of course, the correlations between siblings are of intrinsic interest in their own right, quite apart from any utility they may have for estimating heritabilities or genotypic variances.
It may be worth noting that $[\mathrm{cov}(FS) - \mathrm{cov}(HS)] = \tfrac{1}{4}\sigma^2_A$. Experiments consisting of FS and HS families could utilize this by using intra-class correlation to equate experiment variance components to these covariances [see section on "Coefficient of relationship as an intra-class correlation" for the rationale behind this].
The earlier comments regarding epistasis apply again here [see section on "Applications (Parent-offspring)"].
Selection operates on the attribute (phenotype), such that individuals that equal or exceed a selection threshold ($z_P$) become effective parents for the next generation. The proportion they represent of the base population is the selection pressure. The smaller the proportion, the stronger the pressure. The mean of the selected group ($P_s$) is superior to the base-population mean ($P_0$) by the difference called the selection differential (S). All these quantities are phenotypic. To "link" to the underlying genes, a heritability ($h^2$) is used, fulfilling the role of a coefficient of determination in the biometrical sense. The expected genetical change, still expressed in phenotypic units of measurement, is called the genetic advance (ΔG). It is obtained as the product of the selection differential (S) and its coefficient of determination ($h^2$). The expected mean of the progeny ($P_1$) is found by adding the genetic advance (ΔG) to the base mean ($P_0$). The graphs to the right show how the (initial) genetic advance is greater with stronger selection pressure (smaller probability). They also show how progress from successive cycles of selection (even at the same selection pressure) steadily declines, because the phenotypic variance and the heritability are being diminished by the selection itself. This is discussed further shortly.
Thus $\Delta G = S\,h^2$,[14]: 1710–181 and $P_1 = P_0 + \Delta G$, where $S = P_s - P_0$.[14]: 1710–181
The narrow-sense heritability ($h^2$) is usually used, thereby linking to the genic variance ($\sigma^2_A$). However, if appropriate, use of the broad-sense heritability ($H^2$) would connect to the genotypic variance ($\sigma^2_G$). Even an allelic heritability [$h^2_{eu} = \sigma^2_a / \sigma^2_P$] might possibly be contemplated, connecting to $\sigma^2_a$. [See section on Heritability.]
To apply these concepts before selection actually takes place, and so predict the outcome of alternatives (such as the choice of selection threshold), these phenotypic statistics are re-considered against the properties of the Normal Distribution. The properties concerned are especially those of truncation of the superior tail of the distribution. In such consideration, the standardized selection differential (i) and the standardized selection threshold (z) are used instead of the previous "phenotypic" versions. The phenotypic standard deviation ($\sigma_{P(0)}$) is also needed. This is described in a subsequent section.
Therefore, $\Delta G = (i\,\sigma_P)\,h^2$, where $i\,\sigma_{P(0)} = S$ previously.[14]: 1710–181
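Below is a minimal arithmetic sketch of one selection cycle. The values for i, $\sigma_P$, $h^2$ and the base mean are hypothetical, and so is the function name.

```python
def genetic_advance(i, sigma_p, h2, p0):
    """Return (S, delta_G, P1) for one cycle of truncation selection.

    i       -- standardized selection differential (intensity of selection)
    sigma_p -- phenotypic standard deviation of the base population
    h2      -- heritability, used as the coefficient of determination
    p0      -- base-population mean
    """
    s = i * sigma_p                  # selection differential S
    delta_g = s * h2                 # genetic advance, in phenotypic units
    return s, delta_g, p0 + delta_g  # P1 = P0 + delta_G


# Hypothetical numbers, for illustration only.
print(genetic_advance(i=2.0, sigma_p=3.0, h2=0.5, p0=100.0))  # (6.0, 3.0, 103.0)
```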
The text above noted that successive ΔG declines because the "input" [the phenotypic variance ($\sigma^2_P$)] is reduced by the previous selection.[14]: 1710–181 The heritability also is reduced. The graphs to the left show these declines over ten cycles of repeated selection during which the same selection pressure is applied. The accumulated genetic advance (ΣΔG) has virtually reached its asymptote by generation 6 in this example. This reduction depends partly upon truncation properties of the Normal Distribution, and partly upon the heritability together with the meiosis determination ($b^2$). The last two items quantify the extent to which the truncation is "offset" by new variation arising from segregation and assortment during meiosis.[14]: 1710–181 [27] This is discussed soon, but here note the simplified result for undispersed random fertilization (f = 0).
Thus: $\sigma^2_{P(1)} = \sigma^2_{P(0)}\,[1 - i(i-z)\,\tfrac{1}{2}h^2]$, where $i(i-z) = K$ = truncation coefficient and $\tfrac{1}{2}h^2 = R$ = reproduction coefficient.[14]: 1710–181 [27] This can also be written as $\sigma^2_{P(1)} = \sigma^2_{P(0)}\,[1 - K R]$, which facilitates more detailed analysis of selection problems.
Here, i and z have already been defined, $\tfrac{1}{2}$ is the meiosis determination ($b^2$) for f = 0, and the remaining symbol is the heritability. These are discussed further in following sections. Also notice that, more generally, $R = b^2 h^2$. If the general meiosis determination ($b^2$) is used, the results of prior inbreeding can be incorporated into the selection. The phenotypic variance equation then becomes:
$\sigma^2_{P(1)} = \sigma^2_{P(0)}\,[1 - i(i-z)\,b^2 h^2]$.
The phenotypic variance truncated by the selected group ($\sigma^2_{P(S)}$) is simply $\sigma^2_{P(0)}[1 - K]$, and its contained genic variance is $h^2_0\,\sigma^2_{P(S)}$. Assuming that selection has not altered the environmental variance, the genic variance for the progeny can be approximated by $\sigma^2_{A(1)} = \sigma^2_{P(1)} - \sigma^2_E$. From this, $h^2_1 = \sigma^2_{A(1)} / \sigma^2_{P(1)}$. Similar estimates could be made for $\sigma^2_{G(1)}$ and $H^2_1$, or for $\sigma^2_{a(1)}$ and $h^2_{eu(1)}$ if required.
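Repeated selection at a fixed proportion can be sketched by applying $\sigma^2_{P(t+1)} = \sigma^2_{P(t)}[1 - K b^2 h^2_t]$ each cycle. The environmental variance is held at its initial value, and $h^2$ is recomputed as $(\sigma^2_P - \sigma^2_E)/\sigma^2_P$, as in the text. The sketch assumes a large population and undispersed random fertilization ($b^2 = \tfrac{1}{2}$) by default. It uses Python's `statistics.NormalDist` to obtain z and i. The function name and starting values are hypothetical.

```python
from statistics import NormalDist


def selection_cycles(prop, var_p0, h2_0, cycles, b2=0.5):
    """Repeat truncation selection at a fixed proportion selected (a sketch).

    Returns (phenotypic variance, heritability, genetic advance) per cycle.
    """
    std = NormalDist()
    z = std.inv_cdf(1.0 - prop)   # standardized selection threshold
    i = std.pdf(z) / prop         # standardized selection differential
    k = i * (i - z)               # truncation coefficient K
    var_e = (1.0 - h2_0) * var_p0  # environmental variance, held fixed
    var_p, h2, history = var_p0, h2_0, []
    for _ in range(cycles):
        delta_g = i * var_p ** 0.5 * h2       # genetic advance this cycle
        history.append((var_p, h2, delta_g))
        var_p = var_p * (1.0 - k * b2 * h2)   # sigma2_P(t+1) = sigma2_P(t)[1 - K R]
        h2 = (var_p - var_e) / var_p          # h^2 = sigma2_A / sigma2_P
    return history


for var_p, h2, dg in selection_cycles(prop=0.2, var_p0=1.0, h2_0=0.5, cycles=5):
    print(f"var_P={var_p:.4f}  h2={h2:.4f}  dG={dg:.4f}")
```

The printed phenotypic variance, heritability and genetic advance all fall from cycle to cycle, the pattern the text describes.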
The following rearrangement is useful for considering selection on multiple attributes (characters). It starts by expanding the heritability into its variance components: $\Delta G = i\,\sigma_P\,(\sigma^2_A / \sigma^2_P)$. The $\sigma_P$ and $\sigma^2_P$ partially cancel, leaving a solo $\sigma_P$. Next, the $\sigma^2_A$ inside the heritability can be expanded as $(\sigma_A \times \sigma_A)$, which leads to:
$\Delta G = i\,\sigma_A\,(\sigma_A / \sigma_P) = i\,\sigma_A\,h$.
Corresponding re-arrangements could be made using the alternative heritabilities, giving $\Delta G = i\,\sigma_G\,H$ or $\Delta G = i\,\sigma_a\,h_{eu}$.
This traditional view of adaptation in quantitative genetics provides a model for how the selected phenotype changes over time, as a function of the selection differential and heritability. However, it does not provide insight into (nor does it depend upon) any of the genetic details: in particular, the number of loci involved, their allele frequencies and effect sizes, and the frequency changes driven by selection. This, in contrast, is the focus of work on polygenic adaptation[62] within the field of population genetics. Recent studies have shown that traits such as height have evolved in humans during the past few thousand years as a result of small allele frequency shifts at thousands of variants that affect height.[63][64][65]
The entire base population is outlined by the normal curve[59]: 78–89 to the right. Along the z-axis is every value of the attribute from least to greatest, and the height from this axis to the curve itself is the frequency of the value at the axis below. The equation for finding these frequencies for the "normal" curve (the curve of "common experience") is given in the ellipse. Notice it includes the mean (μ) and the variance (σ²). Moving infinitesimally along the z-axis, the frequencies of neighbouring values can be "stacked" beside the previous, thereby accumulating an area that represents the probability of obtaining all values within the stack. [That's integration from calculus.] Selection focuses on such a probability area, being the shaded-in one from the selection threshold (z) to the end of the superior tail of the curve. This is the selection pressure. The selected group (the effective parents of the next generation) includes all phenotype values from z to the "end" of the tail.[66] The mean of the selected group is $\mu_s$, and the difference between it and the base mean (μ) represents the selection differential (S). By taking partial integrations over curve-sections of interest, and some rearranging of the algebra, it can be shown that the "selection differential" is $S = y\,(\sigma / \text{Prob.})$, where y is the frequency of the value at the "selection threshold" z (the ordinate of z).[13]: 226–230 Rearranging this relationship gives $S / \sigma = y / \text{Prob.}$ The left-hand side of this is, in fact, the selection differential divided by the standard deviation, that is, the standardized selection differential (i). The right-hand side of the relationship provides an "estimator" for i: the ordinate of the selection threshold divided by the selection pressure.
Tables of the Normal Distribution[49]: 547–548 can be used, but tabulations of i itself are also available.[67]: 123–124 The latter reference also gives values of i adjusted for small populations (400 and less),[67]: 111–122 where "quasi-infinity" cannot be assumed (but was presumed in the "Normal Distribution" outline above). The standardized selection differential (i) is also known as the intensity of selection.[14]: 174, 186
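The estimator i = y / Prob can be evaluated directly from the standard normal distribution rather than from tables. This sketch assumes a large ("quasi-infinite") population, so it does not include the small-population adjustments mentioned above. It relies on Python's `statistics.NormalDist`, and the function name is hypothetical.

```python
from statistics import NormalDist


def selection_intensity(prop):
    """Standardized selection differential i = y / Prob for truncation selection.

    prop -- proportion selected (the upper-tail area, i.e. the selection pressure)
    """
    std = NormalDist()           # mean 0, standard deviation 1
    z = std.inv_cdf(1.0 - prop)  # standardized selection threshold
    return std.pdf(z) / prop     # ordinate at z divided by the tail area


for prop in (0.5, 0.2, 0.1, 0.05):
    print(prop, round(selection_intensity(prop), 3))
```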
Finally, a cross-link with the differing terminology in the previous sub-section may be useful:μ (here) = "P0" (there),μS = "PS" andσ2 = "σ2P".
Themeiosis determination (b2) is thecoefficient of determination of meiosis, which is the cell-division whereby parents generate gametes. Following the principles ofstandardized partial regression, of whichpath analysis is a pictorially oriented version, Sewall Wright analyzed the paths of gene-flow during sexual reproduction, and established the "strengths of contribution" (coefficients of determination) of various components to the overall result.[27][37] Path analysis includespartial correlations as well aspartial regression coefficients (the latter are thepath coefficients). Lines with a single arrow-head are directionaldeterminative paths, and lines with double arrow-heads arecorrelation connections. Tracing various routes according topath analysis rules emulates the algebra of standardized partial regression.[55]
The path diagram to the left represents this analysis of sexual reproduction. Of its interesting elements, the important one in the selection context ismeiosis. That's where segregation and assortment occur—the processes that partially ameliorate the truncation of the phenotypic variance that arises from selection. The path coefficientsb are the meiosis paths. Those labeleda are the fertilization paths. The correlation between gametes from the same parent (g) is themeiotic correlation. That between parents within the same generation isrA. That between gametes from different parents (f) became known subsequently as theinbreeding coefficient.[13]: 64 The primes ( ' ) indicate generation(t-1), and theunprimed indicate generationt. Here, some important results of the present analysis are given. Sewall Wright interpreted many in terms of inbreeding coefficients.[27][37]
The meiosis determination (b2) is1/2 (1+g) and equals1/2 (1 + f(t-1)), implying thatg = f(t-1).[68] With non-dispersed random fertilization, f(t-1)) = 0, givingb2 =1/2, as used in the selection section above. However, being aware of its background, other fertilization patterns can be used as required. Another determination also involves inbreeding—the fertilization determination (a2) equals1 / [ 2 ( 1 + ft ) ] . Also another correlation is an inbreeding indicator—rA =2 ft / ( 1 + f(t-1) ), also known as thecoefficient of relationship. [Do not confuse this with thecoefficient of kinship—an alternative name for theco-ancestry coefficient. See introduction to "Relationship" section.] ThisrA re-occurs in the sub-section on dispersion and selection.
These links with inbreeding reveal facets of sexual reproduction that are not immediately apparent. The graphs to the right plot the meiosis and syngamy (fertilization) coefficients of determination against the inbreeding coefficient. They show that as inbreeding increases, meiosis becomes more important (its coefficient increases), while syngamy becomes less important. The overall role of reproduction [the product of the two coefficients, $r^2$] remains the same.[69] This increase in $b^2$ is particularly relevant for selection: it means that the truncation of the phenotypic variance by selection is offset to a lesser extent during a sequence of selections when these are accompanied by inbreeding (which is frequently the case).
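The determination formulas can be checked numerically. The Python sketch below evaluates $b^2$, $a^2$ and $r_A$ exactly as written above. It adds one simplification for illustration: a single inbreeding coefficient $f$ is used for both generations $t-1$ and $t$, as when the coefficients are plotted against one inbreeding coefficient. Under that assumption, $b^2$ rises from 1/2 to 1 as $f$ goes from 0 to 1, $a^2$ falls from 1/2 to 1/4, and their product stays at 1/4. The function names are hypothetical.

```python
# Sketch: Wright's reproduction determinations as functions of inbreeding.
# Function names are hypothetical; the formulas follow the text above.


def meiosis_determination(f_prev: float) -> float:
    """b^2 = (1 + f_{t-1}) / 2, where f_{t-1} is the parental inbreeding coefficient."""
    return 0.5 * (1.0 + f_prev)


def fertilization_determination(f_now: float) -> float:
    """a^2 = 1 / (2 (1 + f_t)), where f_t is the offspring inbreeding coefficient."""
    return 1.0 / (2.0 * (1.0 + f_now))


def relationship_coefficient(f_now: float, f_prev: float) -> float:
    """r_A = 2 f_t / (1 + f_{t-1}), the coefficient of relationship."""
    return 2.0 * f_now / (1.0 + f_prev)


# Assumption for illustration: the same f is used for both generations.
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    b2 = meiosis_determination(f)
    a2 = fertilization_determination(f)
    print(f"f={f:.2f}  b2={b2:.3f}  a2={a2:.3f}  a2*b2={a2 * b2:.3f}")
```

Each row shows the meiosis term growing and the fertilization term shrinking, while their product is unchanged.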
The previous sections treated dispersion as an "assistant" to selection, and it became apparent that the two work well together. In quantitative genetics, selection is usually examined in this "biometrical" fashion, but the changes in the means (as monitored by ΔG) reflect changes in allele and genotype frequencies beneath this surface. As the section on "Genetic drift" showed, drift also changes allele and genotype frequencies and the associated means, and it is the companion aspect of the dispersion considered here ("the other side of the same coin"). However, these two forces of frequency change seldom act in concert, and may often act contrary to each other. Selection is "directional", driven by selection pressure acting on the phenotype; genetic drift is driven by "chance" at fertilization (binomial probabilities of gamete samples). If the two tend towards the same allele frequency, their "coincidence" is the probability that genetic drift yields a sample with that frequency; the likelihood of their being "in conflict", however, is the sum of the probabilities of all the alternative frequency samples. In extreme cases, a single syngamy sampling can undo what selection has achieved, and the probability of this happening can be calculated. It is important to keep this in mind. However, genetic drift resulting in sample frequencies similar to those of the selection target does not lead to so drastic an outcome; instead, it slows progress towards selection goals.
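The coincidence and conflict probabilities can be sketched with the binomial distribution. The example below makes these hypothetical assumptions:

* A gamodeme has N = 10 diploid individuals, so each sample contains 2N = 20 gametes.
* Selection has moved the allele frequency in the gamete pool to p = 0.5.

The sketch computes three quantities:

* The probability that drift reproduces exactly the selected frequency.
* The summed probability of all alternative samples.
* As one illustration of an extreme case, the probability that the allele is lost in a single sampling.

Function names and parameter values are illustrative only.

```python
from math import comb


def sample_probability(k: int, two_n: int, p: float) -> float:
    """Binomial probability that 2N gametes sampled at fertilization carry exactly
    k copies of an allele whose frequency in the gamete pool is p."""
    return comb(two_n, k) * p**k * (1.0 - p) ** (two_n - k)


def coincidence_and_conflict(k_target: int, two_n: int) -> tuple[float, float]:
    """Assume selection has moved the allele frequency to p = k_target / 2N.
    Coincidence: drift reproduces exactly that frequency.
    Conflict: summed probability of every alternative sample frequency."""
    p = k_target / two_n
    coincidence = sample_probability(k_target, two_n, p)
    conflict = sum(
        sample_probability(k, two_n, p) for k in range(two_n + 1) if k != k_target
    )
    return coincidence, conflict


# Hypothetical values: N = 10 diploids (2N = 20 gametes), selection target p = 0.5.
two_n, k_target = 20, 10
coincidence, conflict = coincidence_and_conflict(k_target, two_n)
# One extreme case: the allele is lost entirely in a single sampling.
loss = sample_probability(0, two_n, k_target / two_n)
print(f"coincidence={coincidence:.4f}  conflict={conflict:.4f}  loss={loss:.2e}")
```

With these values, the coincidence probability is 184756/2^20, about 0.176. Conflict is therefore far more likely than exact agreement. Outright loss of the allele has a probability of 0.5^20, roughly one in a million.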
Upon jointly observing two (or more) attributes (e.g. height and mass), it may be noticed that they vary together as genes or environments alter. This co-variation is measured by the covariance, which can be represented by "cov" or by θ.[43] It is positive if they vary together in the same direction, and negative if they vary together but in opposite directions. If the two attributes vary independently of each other, the covariance is zero. The degree of association between the attributes is quantified by the correlation coefficient (symbol r or ρ). In general, the correlation coefficient is the ratio of the covariance to the geometric mean[70] of the two variances of the attributes, $r = \operatorname{cov}(X,Y) / \sqrt{\sigma^2_X \sigma^2_Y}$.[59]: 196–198 Observations are usually made on the phenotype, but in research they may also be made at the level of the "effective haplotype" (effective gene product) [see Figure to the right]. Covariance and correlation could therefore be "phenotypic" or "molecular", or any other designation that an analysis model permits. The phenotypic covariance is the "outermost" layer, and corresponds to the "usual" covariance in biometrics and statistics. However, it can be partitioned by any appropriate research model in the same way as the phenotypic variance. For every partition of the covariance, there is a corresponding partition of the correlation. Some of these partitions are given below. The first subscript (G, A, etc.) indicates the partition. The second-level subscripts (X, Y) are "place-keepers" for any two attributes.
The first example is the un-partitioned phenotype.
The genetical partitions (a) "genotypic" (overall genotype), (b) "genic" (substitution expectations) and (c) "allelic" (homozygote) follow.
(a)
(b)
(c)
With an appropriately designed experiment, a non-genetical (environment) partition could also be obtained.
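The ratio definition of the correlation coefficient can be illustrated with a short Python sketch. It computes a correlation as the covariance divided by the geometric mean of the two variances. The same function applies unchanged to any partition (phenotypic, genotypic, genic, and so on), provided the covariance and both variances are taken from the same partition. All data values and the partitioned components are hypothetical.

```python
import math


def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance (divisor n - 1) of two paired attributes."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation_from_components(cov_xy: float, var_x: float, var_y: float) -> float:
    """r = cov / geometric mean of the two variances. All three arguments must
    belong to the same partition (phenotypic, genotypic, genic, ...)."""
    return cov_xy / math.sqrt(var_x * var_y)


def phenotypic_correlation(xs: list[float], ys: list[float]) -> float:
    """Un-partitioned (phenotypic) correlation computed from raw observations."""
    return correlation_from_components(
        covariance(xs, ys), covariance(xs, xs), covariance(ys, ys)
    )


# Hypothetical paired observations of two attributes.
height = [1.0, 2.0, 3.0, 4.0, 5.0]
mass_same = [2.0, 4.0, 6.0, 8.0, 10.0]      # varies in the same direction
mass_opposite = [10.0, 8.0, 6.0, 4.0, 2.0]  # varies in the opposite direction
print(phenotypic_correlation(height, mass_same))      # 1.0
print(phenotypic_correlation(height, mass_opposite))  # -1.0

# Hypothetical genotypic partition: cov_G(X, Y) = 3, var_G(X) = 4, var_G(Y) = 9.
print(correlation_from_components(3.0, 4.0, 9.0))     # 0.5
```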
There are several different ways in which phenotypic correlation can arise. Study design, sample size, sample statistics, and other factors can influence how confidently these causes can be distinguished statistically. Each of these causes has a different scientific significance and is relevant to different fields of work.
One phenotype may directly affect another by influencing development, metabolism, or behavior.
A common gene or transcription factor in the biological pathways for the two phenotypes can result in correlation.
The metabolic pathways from gene to phenotype are complex and varied, but the causes of correlation amongst attributes lie within them.
Multiple phenotypes may be affected by the same factors. For example, many phenotypic attributes are correlated with age, so height, weight, caloric intake, endocrine function, and others are all correlated with one another. A study looking for other common factors must rule these out first.
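One standard way to rule out a known common factor such as age is the first-order partial correlation, $r_{XY\cdot Z} = (r_{XY} - r_{XZ}r_{YZ})/\sqrt{(1-r_{XZ}^2)(1-r_{YZ}^2)}$. The text does not prescribe this method, so the sketch below is only an illustration. The toy data are hypothetical. Each trait is built as age plus a deviation that is unrelated to age and to the other trait's deviation. As a result, the raw correlation of 1/3 vanishes once age is held constant.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance over the geometric mean of the variances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(xs: list[float], ys: list[float], zs: list[float]) -> float:
    """First-order partial correlation of X and Y with Z held constant."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical toy data: each trait = age + a deviation unrelated to age
# and to the other trait's deviation.
age = [1.0, 2.0, 3.0, 4.0]
trait_x = [2.0, 1.0, 2.0, 5.0]   # age + (1, -1, -1, 1)
trait_y = [2.0, -1.0, 6.0, 3.0]  # age + (1, -3, 3, -1)

print(f"raw r = {pearson(trait_x, trait_y):.3f}")  # 0.333
print(f"partial r (age held constant) = {partial_correlation(trait_x, trait_y, age):.3f}")
```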
Differences between subgroups in a population, differences between populations, or selective biases can mean that some combinations of genes are overrepresented compared with what would be expected.[71] Even if the genes have no significant influence on each other, they may still be correlated, especially when certain genotypes are prevented from mixing. Populations that are undergoing, or have already undergone, genetic divergence can have different characteristic phenotypes,[72] so a correlation appears when they are considered together. Phenotypic qualities in humans that depend predominantly on ancestry also produce correlations of this type. The same can be observed in dog breeds, where several physical features together make up the distinctness of a given breed and are therefore correlated.[73] Assortative mating, the sexually selective pressure to mate with a similar phenotype, can keep genotypes more correlated than would otherwise be expected.[74]
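A toy example shows how population structure alone can produce a correlation. In the sketch below, two hypothetical subgroups each show zero correlation between two traits. The second subgroup, however, has higher means for both traits. Pooling the two subgroups therefore produces a strong correlation of 50/52, about 0.96. The data are invented for illustration.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance over the geometric mean of the variances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_of(points: list[tuple[float, float]]) -> float:
    """Correlation of the two traits across a list of (trait_x, trait_y) pairs."""
    xs, ys = zip(*points)
    return pearson(list(xs), list(ys))


# Hypothetical subgroup with no within-group correlation between the traits.
group_1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
# A second subgroup with the same pattern but higher means for both traits.
group_2 = [(x + 5.0, y + 5.0) for x, y in group_1]

print(f"within group 1: r = {r_of(group_1):.3f}")
print(f"within group 2: r = {r_of(group_2):.3f}")
print(f"pooled:         r = {r_of(group_1 + group_2):.3f}")  # 0.962
```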