
Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data
Elena Bitocchi
Laura Nanni
Elisa Bellucci
Monica Rossi
Alessandro Giardini
Pierluigi Spagnoletti Zeuli
Giuseppina Logozzo
Jens Stougaard
Phillip McClean
Giovanna Attene
Roberto Papa
To whom correspondence should be addressed. E-mail: r.papa@univpm.it.
Edited* by Jeffrey L. Bennetzen, University of Georgia, Athens, GA, and approved December 16, 2011 (received for review June 3, 2011)
Author contributions: E. Bitocchi and R.P. designed research; E. Bitocchi, L.N., E. Bellucci, M.R., A.G., P.S.Z., G.L., J.S., P.M., G.A., and R.P. performed research; E. Bitocchi, L.N., E. Bellucci, G.A., and R.P. analyzed data; and E. Bitocchi and R.P. wrote the paper.
Series information
PNAS Plus
Issue date 2012 Apr 3.
Abstract
Knowledge about the origins and evolution of crop species is an important prerequisite for the efficient conservation and use of existing plant materials. This study was designed to resolve the ongoing debate on the origins of the common bean by investigating the nucleotide diversity at five gene loci in a large sample that represents the entire geographical distribution of the wild forms of this species. Our data clearly indicate a Mesoamerican origin of the common bean. They also strongly support the occurrence of a bottleneck during the formation of the Andean gene pool that predates domestication, as suggested by recent studies based on multilocus molecular markers. Furthermore, a remarkable result was the genetic structure of the Mesoamerican accessions, which fall into four different genetic groups that have different relationships with the sets of wild accessions from the Andes and northern Peru–Ecuador. This finding implies that both of the South American gene pools originated through different migration events from the Mesoamerican populations characteristic of central Mexico.
Keywords: crop evolution, Phaseoleae, population structure, mutation rate
The common bean (Phaseolus vulgaris L.) is the main grain legume for direct human consumption, and it represents a rich source of protein, vitamins, minerals, and fiber, especially for the poorer populations of Africa and Latin America (1). Investigations into the origins and evolution of this species would be expected to highlight the structure and organization of its genetic diversity and the role of the evolutionary forces that have been shaping this diversity. Such knowledge is a crucial prerequisite for efficient conservation and use of the existing germplasm for the development of new improved plant varieties.
The current distribution of the wild common bean encompasses a large geographical area: from northern Mexico to northwestern Argentina (2). In general, two major ecogeographical gene pools are recognized: Mesoamerica and the Andes. These two gene pools are characterized by partial reproductive isolation (3, 4), and they are seen in both wild and domesticated materials. They have been recognized in several studies based on morphology (5–7), agronomic traits (7), seed proteins (8), allozymes (9), and different types of molecular markers (10–15), which together indicate at least two independent domestication events in the two different hemispheres. The existence of these two geographically distinct and isolated evolutionary lineages that predate the domestication of the common bean represents a unique scenario among crops. This scenario differs from the multiple domestications over a smaller geographical range that have been proposed for other species, such as barley (16) and wheat (17), and indeed, within the Mesoamerican gene pool of the common bean itself (18; but see 13, 19). In these cases, the lack of isolation between the populations prevented the independent evolution of different lineages in both the wild and domesticated forms. Rice is the only crop that may show a scenario to some extent similar to that of P. vulgaris (20, 21), although a recent study has suggested a single domestication event at the origin of the indica and japonica subgroups (22). While only these two major gene pools are recognized in the domesticated form, the geographical structure of the wild common bean is more complex, with a third gene pool that is localized between Peru and Ecuador (23) and characterized by a specific seed storage protein, phaseolin type I (24).
Moreover, wild populations from Colombia are often seen as intermediates, and a marked geographical structure is observed in wild beans from Mesoamerica (12). The population from northern Peru and Ecuador is usually considered the ancestral population from which P. vulgaris originated (the northern Peru–Ecuador hypothesis) (11, 24, 25). Indeed, the work by Kami et al. (24) analyzed a portion of the gene that codes for the seed storage protein phaseolin, including a phaseolin type (type I) from northern Peru–Ecuador accessions that was not present in the other gene pools, and concluded that type I phaseolin is ancestral to the other phaseolin sequences of P. vulgaris. Thus, the work by Kami et al. (24) suggested that, starting from the core area of the western slopes of the Andes in northern Peru and Ecuador, the wild bean was dispersed north (Colombia, Central America, and Mexico) and south (southern Peru, Bolivia, and Argentina), which resulted in the Mesoamerican and Andean gene pools, respectively.
The Mesoamerican origin of the common bean is an alternative and older hypothesis. This hypothesis is supported by the observation that the closest relatives of wild P. vulgaris in the Phaseolus genus are distributed throughout Mesoamerica (26–29). Additionally, the higher diversity found in the Mesoamerican compared with the Andean gene pool (phaseolin types, allozyme alleles, and molecular markers) (8–10, 30, 31) supports a Mesoamerican origin.
Recently, the work by Rossi et al. (13) compared amplified fragment length polymorphism (AFLP) data from wild and domesticated P. vulgaris accessions with simple sequence repeat (SSR) data obtained by Kwak and Gepts (14) on a similar sample. This analysis suggested that a severe bottleneck occurred in the Andean populations before domestication, and on this basis, Rossi et al. again proposed a Mesoamerican origin of the common bean.
The main aim of the present study was to investigate the evolutionary history of P. vulgaris in order to resolve the question of the origin of this species. We investigated the nucleotide diversity of five different genes in a wide sample of wild P. vulgaris that is representative of its geographical distribution. Our findings identify the origin of P. vulgaris as Mesoamerican, and they also reveal the level and structure of the genetic diversity that characterizes the wild accessions of this species. Our data are relevant for crop improvement and, in particular, for maximizing the efficient use of wild genetic diversity in breeding programs.
Results
Nucleotide Variation in the Wild Common Bean.
We sequenced five loci in a large collection that included wild common bean accessions from the Mesoamerican (n = 49) and Andean (n = 47) gene pools and genotypes from northern Peru–Ecuador (n = 6) that are characterized by the ancestral type I phaseolin (Table S1). These gene pools represent a cross-section of the entire geographical distribution of the wild form of P. vulgaris. The sequenced region for each gene lies within the transcriptional unit, spans between 500 and 900 bp, and includes both introns and exons. Altogether, we sequenced ∼3.4 kb per accession.
We investigated the structure of the sequenced gene fragments in P. vulgaris, starting with BLAST analysis against the reference sequences for the identification of legume anchor (Leg) markers (Fig. S1 and Table S2) (32). For the Leg marker Leg044, we identified an exon of 102 bp within the fragment, and the deduced 34-aa sequence has 85% identity with histidinol dehydrogenase of Arabidopsis thaliana. This enzyme catalyses the last two steps in the L-histidine biosynthesis pathway, which is conserved in bacteria, archaea, fungi, and plants. For Leg100, a coding region of 45 bp was identified at the 3′ extremity of the sequence, and its deduced 15-aa sequence showed 100% identity with biotin synthase of A. thaliana, an enzyme that is involved in the biotin biosynthetic process. Three exons of 75, 72, and 90 bp were found in Leg133. The corresponding 79-aa sequence showed 84% identity with dolichyl-diphosphooligosaccharide-protein glycotransferase of A. thaliana, an enzyme that is involved in asparagine-linked protein glycosylation. Two exons (88 and 56 bp) were identified in Leg223, and the translated protein fragment showed 71% identity with the eukaryotic translation initiation factor SUI1 family protein of A. thaliana. This protein seems to have an important role in accurate initiator codon recognition during translation initiation. The structure of PvSHP1 was characterized in the work by Nanni et al. (19) (Fig. S1 and Table S2), and the fragment sequenced in this study included three coding regions of 12, 42, and 42 bp. This sequence was identified as homologous to Arabidopsis SHP1 (SHATTERPROOF 1), a gene that is involved in the control of fruit shattering.
We considered the variation in the coding regions of the loci studied in all of the P. vulgaris accessions. The coding regions of three loci (Leg044, Leg100, and Leg223) showed no nucleotide substitutions. For Leg133, there was one synonymous substitution in three Mexican accessions and in all of the accessions characterized by type I phaseolin. There was also one nonsynonymous substitution that involved only one Andean accession from Argentina [a 1-aa replacement: Asp (D) for Ala (A)]. Finally, for PvSHP1, a nonsynonymous substitution was found in two Mesoamerican accessions from Mexico [a 1-aa replacement: Gln (Q) for His (H)].
The nucleotide diversity for each of these five loci and for the concatenated sequence was analyzed across all of the P. vulgaris accessions, taking into account the major population subdivisions that correspond to the three wild gene pools: the Mesoamerican, Andean, and phaseolin type I (northern Peru–Ecuador) groups (Table 1). For the concatenated sequence, which included the 84 accessions of P. vulgaris for which all five loci were sequenced, a total of 137 variable sites (V) were observed. There were 56 haplotypes, none of which was shared among the three gene pools: 34 in the Mesoamerican gene pool, 18 in the Andean gene pool, and 4 in the northern Peru–Ecuador accessions. Among all of the groups, the highest diversity was seen in the Mesoamerican wild population (π = 10.6 × 10⁻³), which was 10-fold higher than the diversity of the Andean accessions (π = 1.0 × 10⁻³). Considering each gene fragment separately, the diversity was always higher in Mesoamerica than in the Andes for all of the statistics considered (number of haplotypes, haplotype diversity, π, and θW) (Table 1). Under a binomial distribution, the probability of this pattern occurring by chance is P = 0.03. Moreover, across the estimates from the five loci, the Wilcoxon–Kruskal–Wallis nonparametric test showed significantly higher diversity in the Mesoamerican wild bean than in the Andean populations (from P ≤ 0.009 for number of haplotypes, haplotype diversity, and π to P ≤ 0.016 for θW). The loss of diversity (Lπ) estimates between the Mesoamerican and Andean populations showed the same pattern, with a reduction in the diversity of the latter relative to the former that ranged from 0.49 (Leg223) to 1.00 (Leg100) and was 0.90 for the concatenated sequence (Table 2).
Finally, although Tajima's D was never significant, in the Andean population it was always negative and much lower than in Mesoamerica (Table 1).
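The statistics summarized in Table 1 follow standard definitions. As a minimal sketch, assuming gap-free aligned sequences of equal length (how gaps, indels, and missing data were handled in the study is not described in this section), π, Watterson's θW, and Tajima's D can be computed as below. The function names and toy sequences are hypothetical illustrations, not the software used by the authors.

```python
"""Sketch: Tajima's pi, Watterson's theta_W, and Tajima's D from an alignment.

Assumes gap-free, equal-length aligned sequences; a column counts as a
segregating site if more than one nucleotide occurs there.
"""
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differences over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Tajima's pi per site: k divided by sequence length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def watterson_theta(seqs):
    """Watterson's theta per site: S / a1 / L."""
    n = len(seqs)
    a1 = sum(1 / i for i in range(1, n))
    return segregating_sites(seqs) / a1 / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D; returns None when there are no segregating sites."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return None
    k = mean_pairwise_differences(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical star-like sample: one central haplotype plus singletons
    star = ["AAAA", "TAAA", "ATAA", "AATA"]
    print(nucleotide_diversity(star), watterson_theta(star), tajimas_d(star))
```

In this toy example the star-like sample, with its excess of singleton variants, yields a negative D, the same sign as the Andean estimates in Table 1.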
Table 1.
Population genetics statistics for the five gene fragments and the concatenated sequences in the different gene pools of P. vulgaris
| Population | N | V | Pi | S | H | Hd | π (×10⁻³) | θW (×10⁻³) | D |
|---|---|---|---|---|---|---|---|---|---|
| Concatenate | | | | | | | | | |
| All | 84 | 137 | 123 | 14 | 56 | 0.96 | 9.9 | 8.3 | — |
| MW | 37 | 119 | 98 | 21 | 34* | 0.99* | 10.6* | 8.7* | — |
| AW | 43 | 32 | 22 | 10 | 18 | 0.86 | 1.0 | 2.3 | — |
| PhI | 4 | 18 | 0 | 18 | 4 | 1.00 | 2.7 | 3.0 | — |
| Leg044 | | | | | | | | | |
| All | 96 | 23 | 23 | 0 | 13 | 0.83 | 6.1 | 5.4 | 0.35 ns |
| MW | 44 | 20 | 10 | 10 | 9* | 0.82* | 4.3* | 5.6* | −0.73 ns |
| AW | 46 | 9 | 3 | 6 | 5 | 0.47 | 0.9 | 2.5 | −1.77 ns |
| PhI | 6 | 14 | 0 | 14 | 2 | 0.33 | 5.7 | 7.4 | −1.47 ns |
| Leg100 | | | | | | | | | |
| All | 99 | 45 | 43 | 2 | 13 | 0.70 | 22.6 | 15.4 | 1.47 ns |
| MW | 47 | 40 | 39 | 1 | 10* | 0.83* | 22.9* | 16.0* | 1.46 ns |
| AW | 46 | 1 | 0 | 1 | 2 | 0.04 | 0.1 | 0.4 | −1.11 ns |
| PhI | 6 | 2 | 2 | 0 | 2 | 0.53 | 1.9 | 1.6 | 1.03 ns |
| Leg133 | | | | | | | | | |
| All | 101 | 15 | 12 | 3 | 9 | 0.63 | 5.1 | 5.0 | 0.06 ns |
| MW | 48 | 13 | 12 | 1 | 6* | 0.75* | 6.2* | 5.1* | 0.70 ns |
| AW | 47 | 3 | 1 | 2 | 4 | 0.20 | 0.4 | 1.2 | −1.34 ns |
| PhI | 6 | 0 | 0 | 0 | 1 | 0.00 | 0.0 | 0.0 | — |
| Leg223 | | | | | | | | | |
| All | 94 | 7 | 7 | 0 | 8 | 0.78 | 3.2 | 2.9 | 0.23 ns |
| MW | 43 | 6 | 6 | 0 | 7* | 0.79* | 2.9* | 3.0* | −0.11 ns |
| AW | 47 | 4 | 3 | 1 | 4 | 0.45 | 1.5 | 1.9 | −0.58 ns |
| PhI | 4 | 0 | 0 | 0 | 1 | 0.00 | 0.0 | 0.0 | — |
| PvSHP1 | | | | | | | | | |
| All | 98 | 49 | 44 | 5 | 26 | 0.84 | 14.0 | 11.2 | 0.81 ns |
| MW | 47 | 42 | 36 | 6 | 20* | 0.92* | 16.0* | 11.2* | 1.47 ns |
| AW | 45 | 15 | 15 | 0 | 4 | 0.32 | 1.7 | 4.0 | −1.79 ns |
| PhI | 6 | 2 | 0 | 2 | 3 | 0.60 | 0.8 | 1.0 | −1.13 ns |
All, all genotypes of P. vulgaris; AW, Andean wild; D, the D parameter by Tajima (66) for testing neutrality; H, number of haplotypes; Hd, haplotype diversity; MW, Mesoamerican wild; N, sample size; ns, not significant; PhI, phaseolin type I; Pi, parsimony-informative sites; S, singleton variable sites; V, variable sites; π (×10⁻³) and θW (×10⁻³), two measures of nucleotide diversity from Tajima (64) and Watterson (65) (θ estimator), respectively.
*MW diversity parameters that are higher than the corresponding AW parameters.
Table 2.
Loss of nucleotide diversity in the Andean wild (AW) population vs. the Mesoamerican wild (MW) population, calculated as $L_\pi = 1 - \pi_{AW}/\pi_{MW}$, where $\pi_{MW}$ and $\pi_{AW}$ are the nucleotide diversities in the MW and AW populations, respectively (39)
| Locus | Lπ |
|---|---|
| Leg044 | 0.78 |
| Leg100 | 1.00 |
| Leg133 | 0.93 |
| Leg223 | 0.49 |
| PvSHP1 | 0.89 |
| Concatenate | 0.90 |
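The loss-of-diversity statistic in Table 2 and the binomial probability quoted in the text reduce to simple arithmetic. The sketch below recomputes them from the rounded π values of Table 1. It assumes the binomial probability refers to all five loci showing higher MW diversity when each direction is equally likely (1/2 per locus); this reproduces P = 0.03 but is our reading, not a stated procedure. Because the inputs are rounded to one decimal, some recomputed Lπ values differ from Table 2 in the second decimal (about 0.91 rather than 0.90 for the concatenated sequence).

```python
"""Sketch: loss of diversity (L_pi) and a binomial sign probability.

pi values (x 10^-3) are copied from Table 1; because they are rounded to
one decimal, recomputed L_pi may differ slightly from Table 2.
"""
from math import comb

PI_MW = {"Leg044": 4.3, "Leg100": 22.9, "Leg133": 6.2,
         "Leg223": 2.9, "PvSHP1": 16.0, "Concatenate": 10.6}
PI_AW = {"Leg044": 0.9, "Leg100": 0.1, "Leg133": 0.4,
         "Leg223": 1.5, "PvSHP1": 1.7, "Concatenate": 1.0}


def loss_of_diversity(pi_aw, pi_mw):
    """L_pi = 1 - pi_AW / pi_MW."""
    return 1.0 - pi_aw / pi_mw


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    for locus in PI_MW:
        print(locus, round(loss_of_diversity(PI_AW[locus], PI_MW[locus]), 2))
    # All five loci more diverse in MW, if each direction were equally likely
    print(binomial_upper_tail(5, 5))
```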
Population Structure.
The Bayesian clustering analysis identified six clusters (Fig. 1, B1 to B6) that best describe the population structure. All of the Andean accessions were clearly assigned to cluster B6, with high membership coefficients ($q_{B6} \geq 0.76$). The same was seen for the northern Peru–Ecuador accessions with type I phaseolin, which were assigned to cluster B5 ($q_{B5} \geq 0.91$). The Mesoamerican accessions showed a different pattern: they were subdivided into four clusters (B1–B4) and showed higher levels of admixture. We used a threshold of $q \geq 0.70$ to assign individuals to these four clusters. Cluster B1 included 17 Mesoamerican accessions ($q_{B1} \geq 0.85$): 6 from Mexico, 6 from Guatemala, 4 from Colombia, and 1 from El Salvador. The other three clusters were composed only of Mexican accessions: cluster B2 included seven accessions ($q_{B2} \geq 0.93$), cluster B3 included eight accessions ($q_{B3} \geq 0.76$), and cluster B4 included four accessions ($q_{B4} \geq 0.99$). One Mexican accession (PI325677) was not assigned to any specific Mesoamerican cluster because of its high level of admixture ($q_{B2} = 0.28$, $q_{B3} = 0.45$, $q_{B6} = 0.22$, and $q_{B4} = 0.05$).
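The threshold rule described above can be expressed in a few lines. The following is a minimal sketch, assuming membership coefficients are available as a per-accession mapping from cluster label to q; the function name and the second accession are hypothetical, while the q values for PI325677 are those reported in the text.

```python
"""Sketch: assign accessions to clusters from membership coefficients (q).

An accession is assigned to its highest-q cluster only if that q reaches
the threshold; otherwise it is treated as admixed (None).
"""


def assign_clusters(memberships, threshold=0.70):
    """Map accession -> cluster label, or None if admixed."""
    assignments = {}
    for accession, q in memberships.items():
        best = max(q, key=q.get)  # cluster with the largest membership
        assignments[accession] = best if q[best] >= threshold else None
    return assignments


if __name__ == "__main__":
    memberships = {
        # q values for PI325677 as reported in the text
        "PI325677": {"B2": 0.28, "B3": 0.45, "B6": 0.22, "B4": 0.05},
        # hypothetical accession, for illustration only
        "example_MW": {"B1": 0.88, "B2": 0.12},
    }
    print(assign_clusters(memberships))
```

Because the threshold (0.70) exceeds 0.5, at most one cluster can pass it, so taking the maximum is unambiguous.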
Fig. 1.
Percentages of membership (q) for each of the clusters identified (B1–B6; color-coded as indicated). Each accession is represented by a vertical line divided into colored segments, the lengths of which indicate the proportions of the genome that are attributed to the specific clusters. The accessions are ordered according to latitude from northern Mexico to northern Argentina. The country of origin is indicated by the horizontal line. AW, Andean wild; ar, Argentina; bl, Bolivia; C_mx, central Mexico; col, Colombia; ec, Ecuador; es, El Salvador; gt, Guatemala; MW, Mesoamerican wild; N_mx, north Mexico; N_pr, northern Peru; PhI, type I phaseolin (northern Peru–Ecuador); S_mx, south Mexico; S_pr, southern Peru.
To better understand the geographical distribution of the Mesoamerican accessions belonging to genetic groups B1–B4, we carried out spatial interpolation of the membership coefficients (Fig. 2). The accessions of cluster B1 were distributed throughout the Mesoamerican range, from northern Mexico down to Colombia (Fig. 2A), with a greater presence toward the Pacific Ocean. The accessions of cluster B2 were spread from central Mexico (particularly on the Caribbean side) down to southern Mexico (Fig. 2B). However, both clusters were largely absent from a wide area of central Mexico. Clusters B3 and B4 were found essentially in Mexico, in particular above the Transverse Volcanic Axis, with cluster B3 widely spread from northern to central Mexico (Fig. 2C) and cluster B4 restricted to a small area of central Mexico (Fig. 2D).
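This section does not state which interpolation method was used for Fig. 2. As one common option, inverse distance weighting (IDW) estimates q at an unsampled point as a distance-weighted average of the sampled accessions. The sketch below is a swapped-in illustration under that assumption, using projected coordinates (e.g., UTM easting and northing in meters) so that Euclidean distance is meaningful; the power parameter of 2 and all sample points are hypothetical.

```python
"""Sketch: inverse distance weighting (IDW) of membership coefficients.

Assumes projected planar coordinates (e.g., UTM easting/northing in m);
this is an illustrative technique, not necessarily the one used in Fig. 2.
"""
from math import hypot


def idw(points, target, power=2.0):
    """Estimate q at target=(x, y) from points=[((x, y), q), ...]."""
    num = den = 0.0
    for (x, y), q in points:
        d = hypot(x - target[0], y - target[1])
        if d == 0.0:
            return q  # target coincides with a sampled accession
        w = 1.0 / d**power
        num += w * q
        den += w
    return num / den


def idw_grid(points, xs, ys, power=2.0):
    """Interpolated surface as rows (one per y) of q estimates (one per x)."""
    return [[idw(points, (x, y), power) for x in xs] for y in ys]


if __name__ == "__main__":
    # Hypothetical accessions: (easting, northing) in metres, q for one cluster
    pts = [((0.0, 0.0), 0.9), ((100_000.0, 0.0), 0.1), ((0.0, 100_000.0), 0.5)]
    grid = idw_grid(pts, xs=[0.0, 50_000.0, 100_000.0], ys=[0.0, 50_000.0])
    for row in grid:
        print([round(v, 2) for v in row])
```

IDW estimates always lie between the smallest and largest sampled q, which keeps interpolated membership coefficients within [0, 1].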
Fig. 2.
Spatial interpolation of membership coefficients (q) for the four Mesoamerican clusters identified by the Bayesian clustering analysis. (A) B1 (blue). (B) B2 (light blue). (C) B3 (green). (D) B4 (orange). Latitude and longitude are expressed in the Universal Transverse Mercator system.
The relationships among the clusters identified were revealed by a neighbor-joining (NJ) tree (Fig. 3). There was no clear distinction between the three gene pools (Mesoamerican, Andean, and northern Peru–Ecuador). Interestingly, the Andean cluster (B6) was more closely related to Mesoamerican group B3, and the northern Peru–Ecuador cluster (B5) was more closely related to Mesoamerican group B4, whereas Mesoamerican groups B1 and B2 were in an external position.
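Neighbor joining builds a tree from a matrix of pairwise distances by repeatedly joining the pair of nodes that minimizes the Q criterion. The sketch below is a generic textbook implementation of the Saitou–Nei algorithm, not the software used for Figs. 3 and 4; the distance dictionary layout and the five-taxon toy distances in the usage example are hypothetical.

```python
"""Sketch: neighbor joining (NJ) on a small distance matrix.

Distances are stored in a dict keyed by frozenset({a, b}). Returns a
Newick string (with branch lengths) and the order in which pairs were joined.
"""
from itertools import combinations


def dist_from_pairs(pairs):
    """Build the distance dict from {(a, b): distance, ...}."""
    return {frozenset(k): v for k, v in pairs.items()}


def neighbor_joining(names, dist):
    nodes = list(names)
    d = dict(dist)
    newick = {n: n for n in nodes}  # node -> Newick text of its subtree
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # r(i): summed distance from i to every other current node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}

        def q(pair):
            a, b = pair
            return (n - 2) * d[frozenset(pair)] - r[a] - r[b]

        i, j = min(combinations(nodes, 2), key=q)
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to their new parent node u
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"u{len(joins) + 1}"
        newick[u] = f"({newick[i]}:{li:g},{newick[j]}:{lj:g})"
        # Distance from u to each remaining node
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j))
    a, b = nodes
    return f"({newick[a]},{newick[b]}:{d[frozenset((a, b))]:g});", joins


if __name__ == "__main__":
    # Classic five-taxon toy example (hypothetical distances)
    toy = dist_from_pairs({
        ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
    })
    tree, order = neighbor_joining("abcde", toy)
    print(order[0])  # first pair joined: ('a', 'b')
    print(tree)
```

The resulting tree is unrooted; the outermost parentheses of the Newick string are only a serialization convention, which matches the unrooted presentation of Figs. 3 and 4.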
Fig. 3.
Unrooted NJ tree showing the phylogenetic relationships of genetic clusters identified by cluster analysis.
A similar population structure was revealed by the NJ tree of the individual accessions (Fig. 4), in which each accession is colored according to its group in the Bayesian model-based approach [Bayesian Analysis of Population Structure (BAPS)]. Again, there was no clear distinction between the three gene pools. Indeed, the Mesoamerican accessions were distributed across all of the tree branches: the Andean accessions (Fig. 4, AW) clustered with Mesoamerican group B3 (bootstrap value = 97%), and the northern Peru–Ecuador accessions (Fig. 4, PhI) were more closely related to the other Mesoamerican groups (bootstrap value = 98%), particularly the B4 accessions. Mesoamerican groups B1 and B2 were included in a clade that was statistically well supported (bootstrap value = 79%).
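The bootstrap values cited above measure how often a clade reappears when the alignment is resampled. A minimal sketch of the resampling step follows, assuming uncorrected p-distances (the distance model used for the published trees is not given in this section) and a hypothetical toy alignment. Each replicate resamples alignment columns with replacement and yields a distance matrix; a tree would then be built from each matrix (e.g., by NJ), and a clade's support is the percentage of replicate trees that contain it.

```python
"""Sketch: nonparametric bootstrap of alignment columns for distance trees.

Each replicate samples L columns with replacement and yields a matrix of
uncorrected p-distances; tree building per replicate is left out here.
"""
import random


def p_distance(x, y):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def bootstrap_alignment(seqs, rng):
    """One bootstrap replicate: resample columns with replacement."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


def bootstrap_distance_matrices(seqs, n_reps=100, seed=1):
    """Yield one p-distance matrix per bootstrap replicate."""
    rng = random.Random(seed)
    for _ in range(n_reps):
        rep = bootstrap_alignment(seqs, rng)
        yield [[p_distance(a, b) for b in rep] for a in rep]


if __name__ == "__main__":
    toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTTA", "TCGAACCTTA"]
    for matrix in bootstrap_distance_matrices(toy, n_reps=3):
        print([[round(v, 2) for v in row] for row in matrix])
```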
Fig. 4.
Unrooted NJ bootstrap tree inferred from the concatenated sequence data. Each set of accessions (as indicated) is represented by a colored circle, and each color indicates membership in the BAPS groups. Small gray and violet circles represent the nodes for which bootstrap values are higher than 50% and 80%, respectively (the 80% threshold highlights the relationships with very strong support). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin (northern Peru–Ecuador).
Haplotype Networks.
The haplotype networks for each of the five loci are shown in Fig. 5. The number of haplotypes per gene ranged from 8 (Leg223) to 26 (PvSHP1). The Leg044, Leg133, and Leg223 haplotypes were interrelated through a few mutational steps, whereas more mutational steps separated the Leg100 and PvSHP1 haplotypes. However, a consistent pattern emerged from this analysis: the Mesoamerican gene pool had the greatest number of haplotypes for all of the loci, from 6 (Leg133) to 20 (PvSHP1), and the Andean gene pool showed a star-like structure, with one major high-frequency haplotype (frequency > 0.72) and a few (one to three) additional minor haplotypes. Furthermore, there was no clear distinction between the Mesoamerican and Andean haplotypes: the former were distributed throughout the networks, and the latter were always shared with Mesoamerican accessions, the only exception being PvSHP1, for which no haplotypes were shared between these gene pools. However, for PvSHP1, a clear relationship was evident between a group of Mesoamerican haplotypes (from central Mexico) and the three major Andean haplotypes, which were separated by six mutational steps (Fig. 5, PvSHP1). These haplotype networks also confirmed the close relationship between the Andean accessions and the Mesoamerican accessions of group B3 that was highlighted by the NJ trees (Figs. 3 and 4), although at this level a similar relationship can be seen for B1. Finally, for all of the genes, the type I phaseolin accessions showed haplotypes that were closer to the Mesoamerican accessions and often separated from the majority of the Andean accessions.
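Haplotype counts (H), haplotype diversity (Hd), and the mutational steps drawn in the networks can be illustrated with a short sketch. It assumes gap-free aligned sequences, uses Nei's unbiased haplotype diversity, Hd = n/(n − 1) × (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i, and counts mutational steps as pairwise nucleotide differences. The network-construction method behind Fig. 5 is not described here, so the sketch stops at the pairwise step matrix; the toy data are hypothetical.

```python
"""Sketch: haplotype number, haplotype diversity, and mutational steps."""
from collections import Counter
from itertools import combinations


def haplotype_stats(seqs):
    """Return (H, Hd): number of distinct haplotypes and Nei's unbiased Hd."""
    counts = Counter(seqs)
    n = len(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    return len(counts), hd


def mutational_steps(h1, h2):
    """Number of nucleotide differences separating two haplotypes."""
    return sum(a != b for a, b in zip(h1, h2))


def step_matrix(seqs):
    """Pairwise mutational steps between the distinct haplotypes."""
    haps = sorted(set(seqs))
    return {(a, b): mutational_steps(a, b) for a, b in combinations(haps, 2)}


if __name__ == "__main__":
    # Hypothetical star-like sample: one frequent haplotype, two rare ones
    star = ["ACGT"] * 8 + ["ACGA", "TCGT"]
    print(haplotype_stats(star))
    print(step_matrix(star))
```

A star-like sample, such as the Andean pattern described above, gives a low Hd because one haplotype dominates, whereas a sample in which every individual carries a different haplotype gives Hd = 1.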
Fig. 5.
Haplotype networks of the five nuclear loci. Each circle represents a single different haplotype, and the circle sizes are proportional to the number of individuals that carry the same haplotype. Black circles indicate missing intermediate haplotypes. The lengths of the lines of the haplotype networks are proportional to the number of mutational steps, which are indicated when more than one. Colors show the genotypes belonging to the different BAPS clusters (as indicated). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin (northern Peru–Ecuador).
Discussion
In this study, we analyzed the nucleotide diversity of five gene fragments in a large sample of wild P. vulgaris to address the origins of this species. The main aim was to shed light on the scenario, unique among crop plants, that characterizes the common bean: the existence of two major geographically distinct evolutionary lineages (Mesoamerican and Andean) that predate domestication. Our results indicate a clear pattern associated with a Mesoamerican origin of this species, from which different migration events extended the distribution of P. vulgaris into South America.
To date, the most widely accepted hypothesis on the origins of the common bean has held that, from a core area on the western slopes of the Andes in northern Peru and Ecuador, the wild beans were dispersed north (to Colombia, Central America, and Mexico) and south (to southern Peru, Bolivia, and Argentina), which resulted in the Mesoamerican and Andean gene pools, respectively. This hypothesis has relied on the identification of one phaseolin (the major seed storage protein) type, type I, as ancestral, and it assumed that the species phylogeny is identical to the phylogeny of the gene. However, for several reasons (33), this strict relationship does not always hold, and as our results suggest, the current distribution of phaseolin might not reflect its ancient distribution: type I phaseolin might have become extinct in Mesoamerica, or it might still be present there but not included in the samples analyzed.
Predomestication Bottleneck in the Andes.
Recently, on the basis of a comparison of the levels of AFLP (13) and SSR (14) diversity in wild populations of P. vulgaris from the Andes and Mesoamerica, the work by Rossi et al. (13) suggested that a bottleneck had occurred in the Andes before domestication. Indeed, for the markers with the higher mutation rate (SSRs compared with AFLPs), there were no (or only very small) differences in diversity between the two main geographical areas, whereas for AFLPs, diversity was much lower in the Andes than in Mesoamerican wild P. vulgaris. Indirect estimates of the AFLP mutation rate range from 10⁻⁶ to 10⁻⁵ (34–36), whereas the SSR mutation rate is higher, ranging from 10⁻⁴ to 10⁻³ in both indirect (34, 37, 38) and direct (39, 40) estimates. Thus, following the model of Nei et al. (41), which describes the effects of a bottleneck on the genetic diversity of a population at a neutral locus, Rossi et al. (13) suggested that a bottleneck occurred in the Andes before domestication and that diversity was subsequently recovered faster at markers with a high mutation rate than at markers with lower mutation rates. Under this hypothesis, nucleotide diversity should show an even greater reduction in the Andes relative to Mesoamerica, because, as indicated by Lynch and Conery (42) for Fabaceae, the nucleotide mutation rate is ∼6.1 × 10⁻⁹, much lower than both the AFLP and SSR rates. Our sequence data show a very strong difference in genetic diversity between the wild Mesoamerican and Andean accessions (Lπ = 90%). This reduction is about 2- and 13-fold greater than the reductions computed in comparable samples of P. vulgaris genotypes using AFLP (45%) (13) and SSR (7%) (14) data, respectively. Thus, our data strongly support the bottleneck hypothesis of Rossi et al. (13), given the relationship between mutation rate and the time needed to recover diversity after a bottleneck: the higher the mutation rate, the faster the recovery. To the best of our knowledge, this is one of the few examples (40) showing the critical role of marker mutation rates in describing the diversity of plant populations, which underlines the need to consider mutation rates carefully in diversity studies.
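The argument that diversity recovers faster after a bottleneck at loci with higher mutation rates can be illustrated numerically. The sketch below uses the standard infinite-alleles recurrence for expected homozygosity, F′ = (1 − μ)²[1/(2N) + (1 − 1/(2N))F], with H = 1 − F. It is in the spirit of the model of Nei et al. (41) but is not claimed to reproduce their exact formulation. Population sizes, bottleneck duration, and recovery time are hypothetical; the mutation rates are the orders of magnitude quoted above for SSRs (10⁻³), AFLPs (10⁻⁵), and nucleotides in Fabaceae (6.1 × 10⁻⁹).

```python
"""Sketch: recovery of expected heterozygosity (H) after a bottleneck.

Infinite-alleles recurrence per generation, with F = 1 - H:
    F' = (1 - mu)^2 * (1/(2N) + (1 - 1/(2N)) * F)
All demographic parameters below are hypothetical.
"""


def next_h(h, n_e, mu):
    """Expected heterozygosity after one generation at population size n_e."""
    a = (1.0 - mu) ** 2
    f = 1.0 - h
    return 1.0 - a * (1.0 / (2 * n_e) + (1.0 - 1.0 / (2 * n_e)) * f)


def equilibrium_h(n_e, mu):
    """Fixed point of the recurrence (mutation-drift equilibrium)."""
    a = (1.0 - mu) ** 2
    f_star = (a / (2 * n_e)) / (1.0 - a * (1.0 - 1.0 / (2 * n_e)))
    return 1.0 - f_star


def relative_h_after_bottleneck(mu, n_large=10_000, n_small=10,
                                t_bottleneck=20, t_after=1_000):
    """H / H_equilibrium after a bottleneck plus t_after generations of recovery."""
    h_eq = equilibrium_h(n_large, mu)
    h = h_eq
    for _ in range(t_bottleneck):
        h = next_h(h, n_small, mu)
    for _ in range(t_after):
        h = next_h(h, n_large, mu)
    return h / h_eq


if __name__ == "__main__":
    for label, mu in [("SSR-like", 1e-3), ("AFLP-like", 1e-5), ("nucleotide", 6.1e-9)]:
        print(label, round(relative_h_after_bottleneck(mu), 3))
```

With these toy settings, the SSR-like locus regains most of its equilibrium diversity within the recovery period, whereas the nucleotide-like locus remains well below it, which is the qualitative pattern the argument above relies on.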
Population Structure in the Mesoamerican Gene Pool.
The occurrence of three gene pools of the wild common bean (Mesoamerica, Andes, and northern Peru–Ecuador) has been shown extensively (9, 12–14, 23, 24, 31, 43). In particular, the subdivision into the two major ecogeographical gene pools (Mesoamerica and Andes) has been shown by several studies using different types of markers. Even though strong population structure has been seen in the Mesoamerican wild gene pool (12), it has usually been considered a single gene pool. However, in the present study, whereas the northern Peru–Ecuador and Andean gene pools are indeed characterized by homogeneous assignments to specific genetic groups (B5 and B6, respectively), the Mesoamerican accessions are split into four distinct genetic clusters, B1–B4, which are also clearly separated in the NJ tree based on individual genotypes (Fig. 4). Moreover, a very important result is the lack of a clear distinction between the Mesoamerican and Andean wild gene pools (Figs. 3–5), with the Mesoamerican clusters showing different degrees of relatedness to the other gene pools. In particular, as shown by the NJ trees (Figs. 3 and 4), the Andean wild accessions (B6) were more closely related to Mesoamerican cluster B3, and the northern Peru–Ecuador accessions (B5) to Mesoamerican cluster B4, whereas the major Mesoamerican clusters (B1 and B2) were less related to the Andean and northern Peru–Ecuador clusters. It is thus important to consider the geographical distributions of these groups in Mesoamerica. The B1 group was present across essentially the entire geographical area from northern Mexico down to Colombia, whereas the B2 group was spread from central to southern Mexico. According to our data, the B1 and B2 clusters were almost absent from a wide area of central Mexico, where they were replaced by the B3 and B4 clusters, which were more closely related to the South American populations (B5 and B6).
This finding is clearly not compatible with the hypothesis of a South American origin, under which the type I phaseolin genotypes would be expected to be intermediate between the Mesoamerican and Andean clusters.
The Mesoamerican origin of the wild common bean is supported by the high diversity observed there, and it is also confirmed by the single-locus haplotype networks, in which the northern Peru–Ecuador haplotypes were closer to the Mesoamerican groups and often separated from the majority of the Andean accessions, which were usually represented by a single, highly frequent haplotype.
In summary, this analysis of the population structure supports the Mesoamerican origin hypothesis. At the same time, it reveals a very complex geographical structure of genetic diversity in Mesoamerica, with central Mexico and the Transverse Volcanic Axis, which originated ∼5 Mya in the Late Miocene, as the cradle of diversity of P. vulgaris. Given the ancient origin of this structure, the fact that its magnitude was not clearly identified with multilocus markers suggests that its signature has been partially hidden by the combined effects of different mutation rates and recombination.
Origin of the Common Bean.
Our study presents clear evidence of a Mesoamerican origin of P. vulgaris, most likely in Mexico, from both the analysis of population structure and phylogeny and the confirmation of a bottleneck in the Andes before domestication, as proposed in the work by Rossi et al. (13). The Mesoamerican origin is consistent with the known distribution of most of the close relatives of P. vulgaris, the much higher diversity of the Mesoamerican wild beans compared with those from South America, the occurrence of a severe bottleneck in the Andes before domestication, and finally, the occurrence in Mesoamerica of wild beans that are closely related to those found in South America, in both the Andean and northern Peru–Ecuador gene pools. Thus, we suggest that P. vulgaris from northern Peru–Ecuador is a relict population that represents only a fraction of the genetic diversity of the ancestral population and that this population migrated from central Mexico in ancient times. The results indicating that type I phaseolin is ancestral seem quite robust; thus, the absence of this type of seed protein from the Mesoamerican gene pool could be explained by two alternative hypotheses: type I phaseolin became extinct in Mesoamerica, or it is still present but has not been included in the samples analyzed in the literature.
Our data also present a scenario for the evolutionary history of the wild common bean, revealing population subdivisions in Mesoamerica whose magnitude had not been clearly recognized before this study.
Conclusions
Evolutionary studies of crop species are crucial for several applications; indeed, knowledge of the level and structure of genetic diversity in crop plants and their wild relatives is the starting point of any breeding program (44–52). Our study indicates that, to exploit genetic diversity not yet incorporated into the current domesticated germplasm, and given the high genetic diversity of the Mesoamerican accessions that is absent from the Andean gene pool, breeding programs should make use of the wild Mesoamerican germplasm, which has potential for the release of new cultivars. Moreover, it is crucial to consider the strong population structure of the Mesoamerican wild germplasm in order to sample the largest amount of diversity for introgression into commercial varieties. This is particularly important given that most improved common bean varieties are currently of Andean origin. Furthermore, exploring new genetic diversity is fundamental to meeting future demand for cultivars that can adapt to climate change while maintaining or improving yields.
Materials and Methods
Plant Materials.
A panel of 102 wild accessions of the common bean was selected to represent the geographical distribution of wild P. vulgaris from northern Mexico to northwestern Argentina. The accessions are representative of the different gene pools of the species: 49 Mesoamerican accessions (Mexico, Central America, and Colombia), 47 Andean accessions (South America), and 6 wild accessions from northern Peru and Ecuador that are characterized by the ancestral type I phaseolin (23, 24). The accessions characterized by type I phaseolin come from a few small populations found in restricted geographic areas on the western slope of the Andes. We used six of these accessions to represent the diversity of these populations, including only well-described and well-characterized accessions. A complete list of the accessions studied is available in Table S1. The seeds were provided by the United States Department of Agriculture Western Regional Plant Introduction Station and the International Centre of Tropical Agriculture in Colombia.
PCR and Sequencing.
Genomic DNA was extracted for each accession from young leaves of a single greenhouse-grown plant using the miniprep extraction method (53). A total of five gene regions of ∼500–900 bp across the common bean genome were sequenced (Table S2). Four of these gene fragments (Leg044, Leg100, Leg133, and Leg223) were chosen from the set of Leg markers developed by Hougaard et al. (32), in which single-copy orthologous genes shared among legume species were identified and primers were designed in conserved exon regions to amplify both exon and intron sequences (32, 54–56). The fifth gene fragment, PvSHP1, is homologous to the SHATTERPROOF (SHP1) gene involved in the control of fruit shattering in A. thaliana. This marker was developed by Nanni et al. (19), who analyzed PvSHP1 nucleotide variation in a limited sample of wild P. vulgaris (29 Mesoamerican and 16 Andean wild genotypes). The available PvSHP1 sequence data for 40 wild genotypes from the study by Nanni et al. (19) were included in the present analyses (Table S1).
Mapping experiments have shown that gene duplication and highly repeated gene sequences are generally rare in P. vulgaris, with most loci occurring as single copies (57–59).
DNA fragments were amplified using 25 ng of DNA and the following reagent concentrations: 0.25 μM each of forward and reverse primer, 200 μM each dNTP, 2.5 mM MgCl2, 1× Taq polymerase buffer, 1 unit of AmpliTaq DNA polymerase, and sterile double-distilled H2O to a final volume of 100 μL. Amplifications were conducted in a 9700 Thermal Cycler (Perkin-Elmer Applied Biosystems) with an initial denaturation of 1 min at 95 °C, followed by 30 cycles of 1 min at 95 °C, 1 min at X °C, and 2 min at 72 °C, and a final extension of 10 min at 72 °C. Here, X °C is the annealing temperature, which is specified for each primer pair in Table S2. The PCR products were purified using GFX PCR DNA and Gel Band Purification Kits (GE Healthcare) according to the manufacturer's instructions. For the Leg100, Leg133, and Leg223 loci, the samples were sequenced on both strands using forward and reverse primers in the cycle sequencing reaction with BigDye Terminator Cycle Sequencing Ready Reaction Kits (Applied Biosystems). The products were resolved on an ABI Prism 3100-Avant Automated Sequencer (Applied Biosystems). The sequence data were analyzed using Pregap4 and Gap4 of the Staden Software Package (http://staden.sourceforge.net/). The Pregap4 modules were used to prepare the sequence data for assembly (quality analysis), and Gap4 was used for the final sequence assembly of the Pregap4 output files (normal shotgun assembly). Single-strand sequencing of Leg044 (reverse primer) and PvSHP1 (forward primer) was performed by Macrogen. Fragments were resequenced whenever there was any ambiguity as to which allele was present. The sequences are available in GenBank (accession nos. JN796475–JN796922).
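As a small worked example of the cycling program described above, the profile can be written as data and its total hold time summed: 1 + 30 × (1 + 1 + 2) + 10 = 131 min. This is a sketch under stated assumptions: ramp times between temperatures are ignored, and because the annealing temperature (X °C) is primer-specific and listed only in Table S2, the 58 °C used below is a hypothetical placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold in a thermal-cycling program."""
    name: str
    temp_c: float
    minutes: float


def pcr_program(annealing_c: float, cycles: int = 30) -> list[tuple[Step, int]]:
    """Return (step, repeat count) pairs for the profile described in the text.

    annealing_c stands in for the primer-specific X °C of Table S2.
    """
    return [
        (Step("initial denaturation", 95.0, 1.0), 1),
        (Step("denaturation", 95.0, 1.0), cycles),
        (Step("annealing", annealing_c, 1.0), cycles),
        (Step("extension", 72.0, 2.0), cycles),
        (Step("final extension", 72.0, 10.0), 1),
    ]


def total_hold_minutes(program: list[tuple[Step, int]]) -> float:
    """Sum the hold times only; ramp times between temperatures are ignored."""
    return sum(step.minutes * repeats for step, repeats in program)


if __name__ == "__main__":
    program = pcr_program(annealing_c=58.0)  # 58 °C is a hypothetical value
    print(total_hold_minutes(program))
```

Changing `cycles` shows how the run length scales: each extra cycle adds 4 min of hold time.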
Diversity Analyses.
For the Leg markers, exons and introns were identified by BLAST analysis (60) against the reference sequence of A. thaliana (32) (Table S2). For PvSHP1, the gene structure had previously been identified by Nanni et al. (19). Sequence alignment and editing were carried out using MUSCLE v3.7 (61) and BioEdit v7.0.9.0 (62). Insertions/deletions (indels) were not included in the analyses. Molecular population genetic analyses were conducted using DnaSP 5.10.01 (63). As given in Table 1, the following were estimated for the five gene fragments and for the concatenated sequences in the different gene pools of P. vulgaris: the number of variable sites, singleton variable sites, parsimony-informative sites, haplotypes, haplotype diversity, nucleotide diversity (π, ref. 64; θW, ref. 65), and Tajima's D (66). To measure the loss of nucleotide diversity in the Andean wild vs. Mesoamerican wild populations, as proposed by Vigouroux et al. (39), we used the statistic Lπ = 1 − (πAW/πMW), where πMW and πAW are the nucleotide diversities in the Mesoamerican and Andean wild populations, respectively. Lπ ranges from zero (no loss of diversity, πAW = πMW) to one (total loss of diversity, πAW = 0). Haplotype networks for each gene fragment were constructed using the median-joining algorithm implemented in the program NETWORK 4.5.1.6 (67). For the concatenated sequence, an unrooted phylogenetic tree was constructed based on Kimura two-parameter distances, and the relative support for each node was tested by the bootstrap method using 1,000 replicates in MEGA 4 (68). Sites with gaps were also excluded from these analyses.
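The summary statistics listed above have standard closed forms. The following is a minimal sketch, not a reimplementation of DnaSP: it assumes gap-free aligned sequences of equal length with no missing data (indels removed, as in the text), counts each variable column once as a segregating site, and uses hypothetical toy sequences. It computes π (64), θW (65), Tajima's D (66), and Lπ as defined above.

```python
import math
from itertools import combinations


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns carrying more than one nucleotide (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs: list[str]) -> float:
    """Average number of differences over all pairs of sequences (k)."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def harmonic(n: int, power: int = 1) -> float:
    """Sum of 1 / i**power for i = 1 .. n - 1 (a1 or a2 in Tajima's notation)."""
    return sum(1.0 / i**power for i in range(1, n))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Tajima's pi per site (ref. 64): k divided by the alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def watterson_theta(seqs: list[str]) -> float:
    """Watterson's theta per site (ref. 65): S / a1 / length."""
    return segregating_sites(seqs) / harmonic(len(seqs)) / len(seqs[0])


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D (ref. 66); undefined when there are no segregating sites."""
    n, s = len(seqs), segregating_sites(seqs)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1, a2 = harmonic(n), harmonic(n, 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    variance = e1 * s + e2 * s * (s - 1)
    return (mean_pairwise_differences(seqs) - s / a1) / math.sqrt(variance)


def loss_of_diversity(pi_andean_wild: float, pi_meso_wild: float) -> float:
    """L_pi = 1 - (pi_AW / pi_MW): 0 means no loss, 1 means total loss."""
    return 1.0 - pi_andean_wild / pi_meso_wild


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT", "ATTT"]  # hypothetical aligned haplotypes
    print(nucleotide_diversity(toy), watterson_theta(toy), tajimas_d(toy))
```

Working per site (dividing by alignment length) makes π and θW comparable across fragments of different lengths, which is why both are commonly reported that way.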
Population Structure Analyses.
A Bayesian model-based approach was used to infer the hidden genetic population structure of our sample and thus to assign the genotypes to genetically structured groups/populations. This approach is implemented in the software BAPS 5.3 (69–72). This version of the software can account for the dependence caused by linkage between sites within aligned sequences, unlike the widely used STRUCTURE software (73), whose model is not designed to deal with background linkage disequilibrium between very tightly linked markers (74). A total of 84 accessions, those with high-quality sequences at all five loci, were used in this analysis. We carried out a genetic mixture analysis to determine the most probable number of populations (K) given the data (71, 72). Under its default settings, BAPS treats K as a parameter to be estimated, and the best partition of the data into K clusters is identified as the one with the highest marginal log likelihood. The clustering with linked loci analysis was chosen to account for the linkage between sites within aligned sequences, while the five loci were assumed to be independent. Ten iterations of K (from 1 to 20) were conducted to determine the optimal number of genetically homogeneous groups. The admixture analysis was then applied to estimate individual admixture proportions with regard to the most likely number of clusters (K) identified (70, 72). Admixture inference was based on 100 realizations from the posterior of the allele frequencies. We repeated the admixture analysis five times to confirm the consistency of the results. Spatial interpolation of membership coefficients was performed by kriging, as implemented in the R package spatial (http://www.r-project.org/). To explore the relationships among the identified clusters, an unrooted phylogenetic tree was constructed using the NJ algorithm in MEGA 4 (68).
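Kriging predicts the value of a spatially varying quantity at an unsampled location as a weighted average of the observations, with weights derived from a model of spatial autocorrelation. As an illustration only, the sketch below implements ordinary kriging with an exponential variogram in NumPy; it is not the R spatial package used in the study, and the variogram sill and range, the coordinates, and the membership values are all hypothetical. Using plain Euclidean distance on longitude and latitude is a further simplifying assumption.

```python
import numpy as np


def exponential_variogram(h: np.ndarray, sill: float = 1.0, range_: float = 5.0) -> np.ndarray:
    """Exponential semivariogram without nugget; sill and range are hypothetical."""
    return sill * (1.0 - np.exp(-h / range_))


def ordinary_kriging(coords, values, target, sill=1.0, range_=5.0):
    """Ordinary-kriging prediction at `target`.

    coords: (n, 2) sample locations, e.g. longitude and latitude
    values: (n,) observations, e.g. membership coefficients q for one cluster
    Returns (prediction, weights); the weights sum to one.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    n = len(values)
    # Distances between all pairs of sampled locations
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Semivariance matrix bordered by ones (unbiasedness constraint)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exponential_variogram(dists, sill, range_)
    lhs[n, n] = 0.0
    # Semivariances between each sample and the prediction location
    rhs = np.ones(n + 1)
    to_target = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    rhs[:n] = exponential_variogram(to_target, sill, range_)
    solution = np.linalg.solve(lhs, rhs)
    weights = solution[:n]  # the last entry is the Lagrange multiplier
    return float(weights @ values), weights


if __name__ == "__main__":
    # Hypothetical accessions: (longitude, latitude) and membership q in one cluster
    sites = [(-103.0, 20.5), (-99.1, 19.4), (-92.6, 16.7), (-90.5, 14.6)]
    q = [0.85, 0.60, 0.20, 0.05]
    estimate, w = ordinary_kriging(sites, q, target=(-96.7, 17.1))
    print(round(estimate, 3), round(float(w.sum()), 6))
```

With no nugget term the predictor interpolates exactly, returning the observed q at a sampled site. Predictions are not constrained to [0, 1], so a map of membership coefficients might clip them to that range.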
Supplementary Material
Acknowledgments
We thank G. Bertorelle for critical reading of this manuscript and his valuable advice. This study was supported by the Italian Government (MIUR) Grant n. 20083PFSXA_001, Project Progetti di Ricerca di Interesse Nazionale (PRIN) 2008, and by the Università Politecnica delle Marche (2006–2010).
Footnotes
The authors declare no conflict of interest.
*This Direct Submission article had a prearranged editor.
Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. JN796475–JN796922).
See Author Summary on page 5148 (volume 109, number 14).
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1108973109/-/DCSupplemental.
References
- 1. Broughton WJ, et al. Beans (Phaseolus spp.): model food legumes. Plant Soil. 2003;252:55–128.
- 2. Toro O, Tohme J, Debouck DG. Wild Bean (Phaseolus vulgaris L): Description and Distribution. Cali, Colombia: Centro Internacional de Agricultura Tropical; 1990.
- 3. Gepts P, Bliss FA. F1 hybrid weakness in the common bean: Differential geographic origin suggests two gene pools in cultivated bean germplasm. J Hered. 1985;76:447–450.
- 4. Koinange EMK, Gepts P. Hybrid weakness in wild Phaseolus vulgaris L. J Hered. 1992;83:135–139.
- 5. Delgado-Salinas A, Bonet A, Gepts P. The wild relative of Phaseolus vulgaris in Middle America. In: Gepts P, editor. Genetic Resources of Phaseolus Beans. Boston: Kluwer; 1988. pp. 163–184.
- 6. Gepts P, Debouck DG. Origin, domestication, and evolution of the common bean, Phaseolus vulgaris. In: Voysest O, Van Schoonhoven A, editors. Common Beans: Research for Crop Improvement. Wallingford, Oxon, United Kingdom: CAB International; 1991. pp. 7–53.
- 7. Singh SP, Gutiérrez JA, Molina A, Urrea C, Gepts P. Genetic diversity in cultivated common bean: II. Marker-based analysis of morphological and agronomic traits. Crop Sci. 1991;31:23–29.
- 8. Gepts P, Osborn TC, Rashka K, Bliss FA. Phaseolin-protein variability in wild forms and landraces of the common bean (Phaseolus vulgaris): Evidence for multiple centers of domestication. Econ Bot. 1986;40:451–468.
- 9. Koenig R, Gepts P. Allozyme diversity in wild Phaseolus vulgaris: Further evidence for two major centers of genetic diversity. Theor Appl Genet. 1989;78:809–817. doi: 10.1007/BF00266663.
- 10. Becerra-Velásquez VL, Gepts P. RFLP diversity in common bean (Phaseolus vulgaris L.). Genome. 1994;37:256–263. doi: 10.1139/g94-036.
- 11. Freyre R, Ríos R, Guzmán L, Debouck D, Gepts P. Ecogeographic distribution of Phaseolus spp. (Fabaceae) in Bolivia. Econ Bot. 1996;50:195–215.
- 12. Papa R, Gepts P. Asymmetry of gene flow and differential geographical structure of molecular diversity in wild and domesticated common bean (Phaseolus vulgaris L.) from Mesoamerica. Theor Appl Genet. 2003;106:239–250. doi: 10.1007/s00122-002-1085-z.
- 13. Rossi M, et al. Linkage disequilibrium and population structure in wild and domesticated populations of Phaseolus vulgaris L. Evol Appl. 2009;2:504–522. doi: 10.1111/j.1752-4571.2009.00082.x.
- 14. Kwak M, Gepts P. Structure of genetic diversity in the two major gene pools of common bean (Phaseolus vulgaris L., Fabaceae). Theor Appl Genet. 2009;118:979–992. doi: 10.1007/s00122-008-0955-4.
- 15. Angioi SA, et al. Development and use of chloroplast microsatellites in Phaseolus spp. and other legumes. Plant Biol (Stuttg). 2009;11:598–612. doi: 10.1111/j.1438-8677.2008.00143.x.
- 16. Morrell PL, Clegg MT. Genetic evidence for a second domestication of barley (Hordeum vulgare) east of the Fertile Crescent. Proc Natl Acad Sci USA. 2007;104:3289–3294. doi: 10.1073/pnas.0611377104.
- 17. Özkan H, Willcox G, Graner A, Salamini F, Kilian B. Geographic distribution and domestication of wild emmer wheat (Triticum dicoccoides). Genet Resour Crop Evol. 2010;58:11–53.
- 18. Chacón S MI, Pickersgill B, Debouck DG. Domestication patterns in common bean (Phaseolus vulgaris L.) and the origin of the Mesoamerican and Andean cultivated races. Theor Appl Genet. 2005;110:432–444. doi: 10.1007/s00122-004-1842-2.
- 19. Nanni L, et al. Nucleotide diversity of a genomic sequence similar to SHATTERPROOF (PvSHP1) in domesticated and wild common bean (Phaseolus vulgaris L.). Theor Appl Genet. 2011;123:1341–1357. doi: 10.1007/s00122-011-1671-z.
- 20. Vitte C, Ishii T, Lamy F, Brar D, Panaud O. Genomic paleontology provides evidence for two distinct origins of Asian rice (Oryza sativa L.). Mol Genet Genomics. 2004;272:504–511. doi: 10.1007/s00438-004-1069-6.
- 21. Londo JP, Chiang YC, Hung KH, Chiang TY, Schaal BA. Phylogeography of Asian wild rice, Oryza rufipogon, reveals multiple independent domestications of cultivated rice, Oryza sativa. Proc Natl Acad Sci USA. 2006;103:9578–9583. doi: 10.1073/pnas.0603152103.
- 22. Molina J, et al. Molecular evidence for a single evolutionary origin of domesticated rice. Proc Natl Acad Sci USA. 2011;108:8351–8356. doi: 10.1073/pnas.1104686108.
- 23. Debouck DG, Toro O, Paredes OM, Johnson WC, Gepts P. Genetic diversity and ecological distribution of Phaseolus vulgaris in northwestern South America. Econ Bot. 1993;47:408–423.
- 24. Kami J, Velásquez VB, Debouck DG, Gepts P. Identification of presumed ancestral DNA sequences of phaseolin in Phaseolus vulgaris. Proc Natl Acad Sci USA. 1995;92:1101–1104. doi: 10.1073/pnas.92.4.1101.
- 25. Gepts P, Papa R, Coulibaly S, González Mejía A, Pasquet R. Wild legume diversity and domestication—insights from molecular methods. In: Vaughan D, editor. Wild Legumes: Proceedings of the Seventh MAFF International Workshop on Genetic Resources. Tsukuba, Japan: National Institute of Agrobiological Resources; 1999. pp. 19–31.
- 26. Schmit V, Jardin P, Baudoin JP, Debouck DG. Use of chloroplast DNA polymorphisms for the phylogenetic study of seven Phaseolus taxa including P. vulgaris and P. coccineus. Theor Appl Genet. 1993;87:506–516. doi: 10.1007/BF00215097.
- 27. Freytag GF, Debouck DG. Taxonomy, Distribution, and Ecology of the Genus Phaseolus (Leguminosae-Papilionoideae) in North America, Mexico and Central America. Ft. Worth, TX: Botanical Research Institute of Texas; 2002.
- 28. Delgado-Salinas A, Turley T, Richman A, Lavin M. Phylogenetic analysis of the cultivated and wild species of Phaseolus (Fabaceae). Syst Bot. 1999;24:438–460.
- 29. Delgado-Salinas A, Bibler R, Lavin M. Phylogeny of the genus Phaseolus (Leguminosae): A recent diversification in an ancient landscape. Syst Bot. 2006;31:779–791.
- 30. Koenig R, Singh SP, Gepts P. Novel phaseolin types in wild and cultivated common bean (Phaseolus vulgaris, Fabaceae). Econ Bot. 1990;44:50–60.
- 31. Chacón MI, Pickersgill B, Debouck DG, Salvador Arias J. Phylogeographic analysis of the chloroplast DNA variation in wild common bean (Phaseolus vulgaris L.) in the Americas. Plant Syst Evol. 2007;266:175–195.
- 32. Hougaard BK, et al. Legume anchor markers link syntenic regions between Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and Arachis. Genetics. 2008;179:2299–2312. doi: 10.1534/genetics.108.090084.
- 33. Doyle JJ. Gene trees and species trees: Molecular systematics as one character taxonomy. Syst Bot. 1992;17:144–163.
- 34. Mariette S, et al. Genetic diversity within and among Pinus pinaster populations: Comparison between AFLP and microsatellite markers. Heredity (Edinb). 2001;86:469–479. doi: 10.1046/j.1365-2540.2001.00852.x.
- 35. Gaudeul M, Till-Bottraud I, Barjon F, Manel S. Genetic diversity and differentiation in Eryngium alpinum L. (Apiaceae): Comparison of AFLP and microsatellite markers. Heredity (Edinb). 2004;92:508–518. doi: 10.1038/sj.hdy.6800443.
- 36. Kropf M, Comes HP, Kadereit JW. An AFLP clock for the absolute dating of shallow-time evolutionary history based on the intraspecific divergence of southwestern European alpine plant species. Mol Ecol. 2009;18:697–708. doi: 10.1111/j.1365-294X.2008.04053.x.
- 37. Estoup A, Angers B. Microsatellites and minisatellites for molecular ecology: Theoretical and experimental considerations. In: Carvalho G, editor. Advances in Molecular Ecology. Amsterdam: IOS Press; 1998. pp. 55–86.
- 38. Garoia F, Guarniero I, Grifoni D, Marzola S, Tinti F. Comparative analysis of AFLPs and SSRs efficiency in resolving population genetic structure of Mediterranean Solea vulgaris. Mol Ecol. 2007;16:1377–1387. doi: 10.1111/j.1365-294X.2007.03247.x.
- 39. Vigouroux Y, et al. Identifying genes of agronomic importance in maize by screening microsatellites for evidence of selection during domestication. Proc Natl Acad Sci USA. 2002;99:9650–9655. doi: 10.1073/pnas.112324299.
- 40. Thuillet AC, Bataillon T, Poirier S, Santoni S, David JL. Estimation of long-term effective population sizes through the history of durum wheat using microsatellite data. Genetics. 2005;169:1589–1599. doi: 10.1534/genetics.104.029553.
- 41. Nei M, Maruyama T, Chakraborty R. The bottleneck effect and genetic variability in populations. Evolution. 1975;29:1–10. doi: 10.1111/j.1558-5646.1975.tb00807.x.
- 42. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155. doi: 10.1126/science.290.5494.1151.
- 43. Tohme J, Gonzales DO, Beebe S, Duque MC. AFLP analysis of gene pools of a wild bean core collection. Crop Sci. 1996;36:1375–1384.
- 44. Hammer K. The domestication syndrome (Das Domestikationssyndrom). Kulturpflanze. 1984;32:11–34.
- 45. Doebley J, Stec A, Wendel J, Edwards M. Genetic and morphological analysis of a maize-teosinte F2 population: Implications for the origin of maize. Proc Natl Acad Sci USA. 1990;87:9888–9892. doi: 10.1073/pnas.87.24.9888.
- 46. Harlan JR. Crops and Man. Madison, WI: ASA; 1992.
- 47. Grandillo S, Tanksley SD. QTL analysis of horticultural traits differentiating the cultivated tomato from the closely related species Lycopersicon pimpinellifolium. Theor Appl Genet. 1996;92:935–951. doi: 10.1007/BF00224033.
- 48. Koinange EMK, Singh SP, Gepts P. Genetic control of the domestication syndrome in common bean. Crop Sci. 1996;36:1037–1045.
- 49. Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q. Identification of genetic factors controlling domestication-related traits of rice using an F2 population of a cross between Oryza sativa and O. rufipogon. Theor Appl Genet. 1999;98:243–251.
- 50. Poncet V, et al. Genetic analysis of the domestication syndrome in pearl millet (Pennisetum glaucum L, Poaceae): Inheritance of the major characters. Heredity. 1998;81:648–658.
- 51. Poncet V, et al. Genetic control of domestication traits in pearl millet (Pennisetum glaucum L, Poaceae). Theor Appl Genet. 2000;100:147–159. doi: 10.1007/s00122-002-0889-1.
- 52. Singh SP. Broadening the genetic base of common bean cultivars: A review. Crop Sci. 2001;41:1659–1675.
- 53. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem Bull. 1987;19:11–15.
- 54. Fredslund J, Schauser L, Madsen LH, Sandal N, Stougaard J. PriFi: Using a multiple alignment of related sequences to find primers for amplification of homologs. Nucleic Acids Res. 2005;33:W516–W520. doi: 10.1093/nar/gki425.
- 55. Fredslund J, et al. A general pipeline for the development of anchor markers for comparative genomics in plants. BMC Genomics. 2006;7:207. doi: 10.1186/1471-2164-7-207.
- 56. Fredslund J, et al. GeMprospector—online design of cross-species genetic marker candidates in legumes and grasses. Nucleic Acids Res. 2006;34:W670–W675. doi: 10.1093/nar/gkl201.
- 57. Vallejos CE, Sakiyama NS, Chase CD. A molecular marker-based linkage map of Phaseolus vulgaris L. Genetics. 1992;131:733–740. doi: 10.1093/genetics/131.3.733.
- 58. Freyre R, et al. Towards an integrated linkage map of common bean. 4. Development of a core map and alignment of RFLP maps. Theor Appl Genet. 1998;97:847–856.
- 59. McClean PE, Lee RK, Otto C, Gepts P, Bassett MJ. Molecular and phenotypic mapping of genes controlling seed coat pattern and color in common bean (Phaseolus vulgaris L.). J Hered. 2002;93:148–152. doi: 10.1093/jhered/93.2.148.
- 60. Altschul SF, et al. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
- 61. Edgar RC. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340.
- 62. Hall TA. BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999;41:95–98.
- 63. Librado P, Rozas J. DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25:1451–1452. doi: 10.1093/bioinformatics/btp187.
- 64. Tajima F. Evolutionary relationship of DNA sequences in finite populations. Genetics. 1983;105:437–460. doi: 10.1093/genetics/105.2.437.
- 65. Watterson GA. On the number of segregating sites in genetical models without recombination. Theor Popul Biol. 1975;7:256–276. doi: 10.1016/0040-5809(75)90020-9.
- 66. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585.
- 67. Bandelt HJ, Forster P, Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 1999;16:37–48. doi: 10.1093/oxfordjournals.molbev.a026036.
- 68. Tamura K, Dudley J, Nei M, Kumar S. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007;24:1596–1599. doi: 10.1093/molbev/msm092.
- 69. Corander J, Waldmann P, Sillanpää MJ. Bayesian analysis of genetic differentiation between populations. Genetics. 2003;163:367–374. doi: 10.1093/genetics/163.1.367.
- 70. Corander J, Marttinen P. Bayesian identification of admixture events using multilocus molecular markers. Mol Ecol. 2006;15:2833–2843. doi: 10.1111/j.1365-294X.2006.02994.x.
- 71. Corander J, Tang J. Bayesian analysis of population structure based on linked molecular information. Math Biosci. 2007;205:19–31. doi: 10.1016/j.mbs.2006.09.015.
- 72. Corander J, Marttinen P, Sirén J, Tang J. Enhanced Bayesian modelling in BAPS software for learning genetic structures of populations. BMC Bioinformatics. 2008;9:539. doi: 10.1186/1471-2105-9-539.
- 73. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155:945–959. doi: 10.1093/genetics/155.2.945.
- 74. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics. 2003;164:1567–1587. doi: 10.1093/genetics/164.4.1567.
Author Summary
Studies of the evolution of crop species are aimed at highlighting the structure and organization of their genetic diversity as well as the evolutionary forces that shaped it. Such knowledge is crucial for efficient conservation and development of new, improved plant varieties. Our study investigated the origins of the common bean (Phaseolus vulgaris) by examining the diversity at five gene sites (or loci). We present clear evidence of a Mesoamerican origin and reveal the very complex geographical structure of its genetic diversity in Mesoamerica.
In general, two major ecogeographical gene pools of P. vulgaris have been recognized: Mesoamerica and the Andes. However, the bean's wild form includes an additional gene pool located between Peru and Ecuador (1). Kami et al. (1) suggested that the seed protein type I (Inca) phaseolin, which is characteristic of this wild form, is ancestral and thus identifies the Andes of northern Peru and Ecuador as the bean's place of origin. Recently, however, Rossi et al. (2) suggested the alternative hypothesis of a Mesoamerican origin. Their genetic analyses indicated that, before domestication, the wild bean arrived in the Andean region, where it remained isolated. This process resulted in a bottleneck that reduced the genetic diversity of the Andean bean. To address this ongoing debate, we investigated the diversity of nucleotides (the subunits of the DNA molecule) at five loci in a large sample that represents the entire geographical distribution of the bean's wild forms. The individuals studied are representative of the different gene pools of this species.
Diversity analysis was carried out by estimating several measures of nucleotide variability for the five gene fragments. Furthermore, we used a summary statistic to measure the loss of nucleotide diversity in the Andean wild versus Mesoamerican wild bean populations. A Bayesian model-based approach was used to infer the hidden genetic population structure of our sample and thus to assign the genotypes (the genetic constitution of an organism) to genetically structured groups. Finally, to define the underlying relationships between the genetic groups thus identified, we constructed haplotype networks and neighbor-joining (NJ) trees. Haplotypes are the DNA sequence variants carried on a single chromosome copy at a given locus (in our case, at the gene fragments analyzed); NJ is a widely used method for constructing trees that illustrate the evolutionary development and diversification of a species (in our case, based on multilocus data).
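To make the distance-based tree building concrete, the sketch below computes Kimura two-parameter (K2P) distances, the distance used for the concatenated sequences in the full article, and joins taxa with the standard neighbor-joining algorithm. It is an illustration under stated assumptions, not the MEGA 4 implementation: sequences are assumed to be aligned, gap-free, and restricted to A, C, G, and T; no bootstrap is performed; internal nodes receive hypothetical names n1, n2, and so on (taxon labels must not clash with them); and the five-taxon matrix in the example is a textbook additive case, not bean data.

```python
import math
from collections import defaultdict
from itertools import combinations

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def k2p_distance(s1: str, s2: str) -> float:
    """Kimura two-parameter distance between two aligned, gap-free sequences."""
    diffs = [(a, b) for a, b in zip(s1.upper(), s2.upper()) if a != b]
    p = sum(frozenset(pair) in TRANSITIONS for pair in diffs) / len(s1)  # transitions
    q = len(diffs) / len(s1) - p  # transversions
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def neighbor_joining(labels, dist):
    """Unrooted NJ tree for >= 3 taxa, as a list of (node, node, length) edges."""
    d = defaultdict(dict)
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            d[a][b] = dist[i][j]
    nodes, edges, counter = list(labels), [], 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda pr: (n - 2) * d[pr[0]][pr[1]] - r[pr[0]] - r[pr[1]])
        counter += 1
        u = f"n{counter}"
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Attach the last three nodes to a central node
    a, b, c = nodes
    centre = f"n{counter + 1}"
    edges += [
        (centre, a, 0.5 * (d[a][b] + d[a][c] - d[b][c])),
        (centre, b, 0.5 * (d[a][b] + d[b][c] - d[a][c])),
        (centre, c, 0.5 * (d[a][c] + d[b][c] - d[a][b])),
    ]
    return edges


def path_length(edges, start, end):
    """Sum of branch lengths along the unique path between two tree nodes."""
    graph = defaultdict(list)
    for a, b, length in edges:
        graph[a].append((b, length))
        graph[b].append((a, length))
    stack, seen = [(start, 0.0)], {start}
    while stack:
        node, total = stack.pop()
        if node == end:
            return total
        for nxt, length in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, total + length))
    raise ValueError("nodes are not connected")


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]  # hypothetical taxa, textbook distances
    matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
    for parent, child, length in neighbor_joining(taxa, matrix):
        print(parent, child, round(length, 3))
```

Because the example matrix is additive, path lengths through the reconstructed tree reproduce the input distances exactly; with real sequence data they generally do not.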
The Mesoamerican populations showed the highest genetic diversity estimates for each of the five genes. A very strong reduction in genetic diversity was found in the Andean gene pool compared with that of Mesoamerica, which strongly supports the Andean bottleneck suggested by Rossi et al. (2). A clear relationship was also seen between the mutation rate (the rate at which genetic alterations arise) and the time needed for diversity to recover after a bottleneck: the higher the mutation rate, the faster diversity recovers. Indeed, comparing our results with those obtained with other molecular markers, we found that the diversity lost in the bottleneck had been almost completely recovered at markers with a high mutation rate (simple sequence repeats), but much less so at markers with lower mutation rates (amplified fragment-length polymorphisms and sequence data).
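The relationship between mutation rate and recovery from a bottleneck can be illustrated with a simple deterministic model. The sketch below assumes the infinite-alleles model in a diploid population of constant size N, where the expected homozygosity F changes each generation as F′ = (1 − u)²[1/(2N) + (1 − 1/(2N))F] and diversity is H = 1 − F. The distance from equilibrium shrinks by a factor of (1 − u)²(1 − 1/(2N)) per generation, so a larger mutation rate u shortens recovery. The population size, mutation rates, starting diversity, and number of generations are hypothetical, and this toy model is not the analysis performed in the study.

```python
def recovered_fraction(u: float, n_individuals: int, generations: int,
                       start_fraction: float = 0.1) -> float:
    """Fraction of the lost heterozygosity regained after a bottleneck.

    Infinite-alleles model, diploid population of constant size N:
        F' = (1 - u)^2 * (1 / (2N) + (1 - 1 / (2N)) * F),  H = 1 - F
    Diversity starts at start_fraction of its equilibrium value.
    """
    two_n = 2 * n_individuals
    keep = (1 - u) ** 2  # probability that neither gene copy mutates
    # Equilibrium homozygosity, obtained by solving F = F'
    f_eq = keep / two_n / (1 - keep * (1 - 1 / two_n))
    h_eq = 1 - f_eq
    h0 = start_fraction * h_eq  # diversity left just after the bottleneck
    f = 1 - h0
    for _ in range(generations):
        f = keep * (1 / two_n + (1 - 1 / two_n) * f)
    return ((1 - f) - h0) / (h_eq - h0)


if __name__ == "__main__":
    n, t = 10_000, 5_000  # hypothetical population size and elapsed generations
    for label, rate in [("high rate (SSR-like)", 1e-3), ("low rate (sequence-like)", 1e-6)]:
        print(label, round(recovered_fraction(rate, n, t), 3))
```

Under these assumptions the high-rate locus is essentially back at equilibrium after 5,000 generations, whereas the low-rate locus has regained only about a quarter of its lost diversity, mirroring the contrast between marker types described above.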
In studying the population structure, we found six genetic clusters (B1–B6) (Fig. P1A). Although all the Andean and northern Peru–Ecuador genotypes were assigned clearly to single specific clusters (B6 and B5, respectively), the Mesoamerican genotypes were subdivided into four different clusters (B1, B2, B3, and B4), and they showed higher levels of admixture. The Mesoamerican accessions were found to be distributed in all the branches of the NJ tree (Fig. P1B): The Andean accessions clustered with the Mesoamerican group B3, and the northern Peru–Ecuador accessions (PhI) were more related to the other Mesoamerican groups and particularly to the B4 accessions.
Fig. P1.
(A) Percentages of membership (q) for each genetic cluster (B1, B2, B3, B4, B5, and B6; color coded). Each genotype (accession, i.e., a member of a plant collection) is represented by a vertical line divided into colored segments, whose lengths represent the proportion of its genome assigned to each cluster. The accessions are ordered according to latitude: N_mx, Northern Mexico; C_mx, Central Mexico; S_mx, Southern Mexico; gt, Guatemala; es, El Salvador; col, Colombia; ec, Ecuador; N_pr, Northern Peru; S_pr, Southern Peru; bl, Bolivia; ar, Argentina. The country of origin is indicated by the horizontal line. (B) Unrooted NJ bootstrap tree. Each set of accessions is represented by a colored circle, and each color indicates membership in the clusters. Gray and violet circles represent nodes with bootstrap values higher than 50% and 80%, respectively (indicating strength of support). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin (northern Peru–Ecuador).
Analysis of haplotype networks supported these results. Importantly, the PhI accessions carried haplotypes that were closer to those of the Mesoamerican accessions and were often separated from the majority of the Andean accessions. These outcomes are clearly not compatible with the hypothesis of a South American origin, under which the PhI genotypes would be expected to occupy an intermediate position.
In summary, our study presents clear evidence of a Mesoamerican origin of P. vulgaris, most likely in Mexico, and of different migrations into South America (2). Thus, we suggest that P. vulgaris from northern Peru–Ecuador is a relict population that represents only a fraction of the genetic diversity of the ancestral population.
The magnitude of the population subdivisions in Mesoamerica that we present in our scenario had not been recognized clearly before our study. Our work indicates the potential of exploring genetic diversity that is not incorporated into the current, domesticated germplasm (the total genetic diversity present in a species). Specifically, in our case, the wild Mesoamerican germplasm should be used in breeding programs, because it has the potential for creation of new cultivars (plant varieties produced by selective breeding). Moreover, it is crucial to consider the Mesoamerican wild germplasm to sample the largest amount of diversity for use in commercial bean varieties, especially because most improved varieties of the common bean are of Andean origin at present. Furthermore, exploration of newly recognized genetic diversity is vital for meeting future challenges posed by climate change.
Footnotes
See full research article on page E788 of www.pnas.org.
Cite this Author Summary as: PNAS 10.1073/pnas.1108973109.
References
- 1. Kami J, Velásquez VB, Debouck DG, Gepts P. Identification of presumed ancestral DNA sequences of phaseolin in Phaseolus vulgaris. Proc Natl Acad Sci USA. 1995;92:1101–1104. doi: 10.1073/pnas.92.4.1101.
- 2. Rossi M, et al. Linkage disequilibrium and population structure in wild and domesticated populations of Phaseolus vulgaris L. Evol Appl. 2009;2:504–522. doi: 10.1111/j.1752-4571.2009.00082.x.