Proceedings of the Royal Society B: Biological Sciences
You have accessResearch articles

Tracing back the origin of pumpkins (Cucurbita pepo ssp. pepo L.) in Mexico

Abstract

Cucurbita pepo is an economically important crop, which consists of cultivated C. pepo ssp. pepo, and two wild taxa (C. pepo ssp. fraterna and C. pepo ssp. ovifera). We aimed at understanding the domestication and the diversity of C. pepo in Mexico. We used two chloroplast regions and nine nuclear microsatellite loci to assess the levels of genetic variation and structure for C. pepo ssp. pepo's landraces sampled in 13 locations in Mexico, five improved varieties, one C. pepo ssp. fraterna population and ornamental C. pepo ssp. ovifera. We tested four hypotheses regarding the origin of C. pepo ssp. pepo's ancestor through approximate Bayesian computation: C. pepo ssp. ovifera as the ancestor; C. pepo ssp. fraterna as the ancestor; an unknown extinct lineage as the ancestor; and C. pepo ssp. pepo as hybrid from C. pepo ssp. ovifera and C. pepo ssp. fraterna ancestors. Cucurbita pepo ssp. pepo showed high genetic variation and low genetic differentiation. Cucurbita pepo ssp. fraterna and C. pepo ssp. pepo shared two chloroplast haplotypes. The three subspecies were well differentiated for microsatellite loci. Cucurbita pepo ssp. fraterna was probably C. pepo ssp. pepo's wild ancestor, but subsequent hybridization between taxa complicate defining C. pepo ssp. pepo's ancestor.

1. Background

The extent of genetic variation found in crop species is variable [1,2], and in many cases domesticated varieties show less genetic variation than their wild relatives [3]. The exploration and collection of wild crop relatives, and feral and native landrace germplasm, has important implications in understanding the history of domestication.

Mexico is considered the centre of domestication and diversification of pumpkins. In the genus Cucurbita, all species are diploid with 2n = 40 [4]. This genus is composed of 20 taxa, including five independently domesticated taxa [510], of which Cucurbita pepo ssp. pepo is the best studied – mainly because of its economic importance (i.e. from the 27 449 481 tons of pumpkins, squash and gourds produced worldwide in 2017, most of them belong to C. pepo, but other Cucurbita species are also included [11]), and its worldwide cultivated distribution [12,13].

Cucurbita pepo is subdivided into three subspecies: two wild taxa, C. pepo ssp. fraterna and C. pepo ssp. ovifera, the latter consisting of varieties texana and ozarkana; and one cultivated: C. pepo ssp. pepo [8,12,1416]. Cucurbita pepo ssp. fraterna is endemic to northeast Mexico in the states of Tamaulipas and Nuevo Leon, while C. pepo ssp. ovifera is found in southeast USA in Texas, the Greater Mississippi Valley and the Ozark plateau [6,15]. A recent phylogenetic analysis suggests that C. pepo ssp. ovifera might be polyphyletic [17].

Archaeological, linguistic and molecular data support two independent domestication events in this species; one occurred in Mexico (C. pepo ssp. pepo; ca 10 000 years ago) and the other in southeast USA (C. pepo ssp. ovifera; ca 5000 years ago) [912,14,15,1827]. Nevertheless, the complex history of C. pepo—which includes the establishment of populations from escaped cultivars and signals of hybridization and introgression among wild and cultivated taxa—has hindered the attempts to identify the wild progenitor of pepo cultivars [14,15,17,28,29].

A wide array of molecular markers have been used to analyse the relationships among C. pepo subspecies [3,1217,20,26,3036]. However, these relationships have not been adequately solved [17,25], perhaps as a consequence that most studies have focused on improved and commercial cultivars, usually including only a few landraces from Mexico [12,16,17,32,35], where primitive landraces are still grown [37].

A phylogeographical approach, which considers a wide sample of ancestral landraces from the crop's original range [38], might be most adequate for disentangling the relationships among C. pepo wild and cultivated species. Moreover, molecular markers from different origins (i.e. cpDNA, nuclear DNA) have different modes of inheritance and varying rates of sequence evolution [39], and a combined approach improves the quality of the estimates [40] and enhances resolution for complex domestication histories.

A phylogeographical approach for the analysis of C. pepo genetic variation and genetic differentiation in its region of origin (Mexico) has not been previously documented, and additionally a model-based approach has not been previously applied to C. pepo's molecular data. In this study, we used two chloroplast regions and nine nuclear microsatellite loci to assess the levels of genetic variation and differentiation among C. pepo's subspecies and among ancestral Mexican C. pepo ssp. pepo landraces. We used a model-based approach to test four hypotheses: (i) C. pepo ssp. ovifera as the wild ancestor of C. pepo ssp. pepo; (ii) C. pepo ssp. fraterna as the wild ancestor of C. pepo ssp. pepo; (iii) an unknown wild ancestor that is currently extinct; and (iv) C. pepo ssp. pepo as a hybrid from both C. pepo ssp. fraterna and C. pepo ssp. ovifera wild ancestors.

2. Material and methods

(a) Sampling and DNA extraction

We obtained seeds from C. pepo ssp. pepo from 13 locations throughout Mexico (61 fruits; electronic supplementary material, table S1; figure 1), five samples from C. pepo ssp. pepo's commercial improved varieties (black beauty zucchini, cocozelle, delicata honey boat, spaghetti, and early prolific straightneck) (improved varieties hereafter). We also obtained seed samples from two fruits corresponding to a wild population of C. pepo ssp. fraterna from Tamaulipas, Mexico, and we sampled C. pepo ssp. ovifera from three different fruits from ornamental gourds. The majority of seed samples (15 locations) were obtained from the germplasm collection of Campo Experimental Bajío, INIFAP, in 2014, (electronic supplementary material, table S1) and additional landraces were collected in the field between 2013 and 2015 taking between one and five fruits (one fruit from each different plant) per site. Seeds from collected fruits were deposited in the FESI germplasm collection (electronic supplementary material, table S1). Cucurbita are monoecious, self-incompatible and bee-pollinated outbreeding species, and seeds from a single fruit are potentially produced from different fathers [41]. To obtain a better representation of the genetic variation found at each sampled location, we grew 4–20 seeds from each fruit in commercial substrate under greenhouse conditions (35°C) for 30 days. Young leaves were used for DNA extraction with a modified CTAB protocol [42].

Figure 1.

Figure 1. (a) Seed morphological variation for Mexican C. pepo ssp. pepo sampled localities (asterisks (*) denote samples from seed bank; see the electronic supplementary material, table S1). (b) Morphological variation of some collected fruits of Mexican C. pepo ssp. pepo. (Online version in colour.)

(b) Chloroplast sequences

We amplified two chloroplast regions psbJ-petA and psbD-trnT(GGU) [43] for 29 individuals from C. pepo ssp. pepo's improved varieties, 279 ancestral Mexican C. pepo ssp. pepo landraces, 25 ornamental C. pepo ssp. ovifera and 22 wild C. pepo ssp. fraterna, for a total of 355 individuals (GenBank: MK595798–MK595819). We also included sequences for wild C. pepo ssp. ovifera reported in [9] (accession numbers MH470013 and MH470030) and [27] (electronic supplementary material, appendix S1).

We obtained basic statistics, such as the proportion of segregating sites (S), number of haplotypes (h), haplotypic diversity (Hd) and nucleotide diversity (π) with DnaSP v. 5.10.1 [44] and Arlequin v. 3.5 [45]. To depict the relationships among haplotypes, we obtained a median-joining haplotype network using population analysis with reticulate trees (PopART) [46,47].

We performed an assignment test with adegenet for R [48,49] considering four groups: C. pepo ssp. pepo's improved varieties, ancestral Mexican C. pepo ssp. pepo landraces, ornamental C. pepo ssp. ovifera and wild C. pepo ssp. fraterna. To test for isolation by distance (IBD), we obtained geographical distances with GeoDistanceInMetresMatrix [50] and Nei's pairwise FST with hierfstat [51], and performed a Mantel test, excluding C. pepo ssp. pepo's improved varieties, with 10 000 permutations to assess significance with vegan [52] (electronic supplementary material, appendix S1).

(c) Nuclear microsatellite loci

We amplified 12 nuclear microsatellite loci developed for C. pepo and Cucurbita moschata: CMTp17, CMp88, CMTp129, CMTp175, CMTp193, CMTm11, CMTm54, CMTm120, CMTm144, CMTm187, CMTm221 and CMTm261 [4] (electronic supplementary material, appendix S1). We selected only highly variable dinucleotide loci from different chromosomes to improve genome coverage and to reduce the probability of linkage disequilibrium. In total, we obtained microsatellite data for 374 individuals: 36 individuals from C. pepo ssp. pepo's improved varieties, 291 ancestral Mexican C. pepo ssp. pepo landraces, 18 ornamental C. pepo ssp. ovifera and 29 wild C. pepo ssp. fraterna. The dataset is available from the Dryad Digital Repository (https://doi.org/10.5061/dryad.s374m76) [53].

We used Microchecker v. 2.2.3 [54] for null allele analysis [55], and Arlequin v. 3.5 [46] for a Hardy–Weinberg exact test and a linkage disequilibrium test. We used Genodive v. 2.0b27 [56] to obtain allele frequencies by direct estimation, and basic statistics (allelic richness (A), expected (HE) and observed (HO) heterozygosities, as well as inbreeding coefficient (FIS)). Genetic structure was assessed with Structure v. 2.3.4 [57]—burn-in 500 000 chains; 1 000 000 final Markov chain Monte Carlo chains; K = 1–15; 10 repetitions for each K; priors = admixture and correlated allele frequencies, putative location of origin of each individual not considered. Results from K = 3 would define if each subspecies represents a well-defined genetic cluster, while K = 4 would define structure within subspecies (electronic supplementary material, appendix S1). We used pairwise RST, as implemented in Arlequin v. 3.5, to analyse genetic differentiation and to account for microsatellite stepwise mutation model [58]. In addition, to test for IBD, we used the method described above to estimate geographical distances, and we obtained pairwise RST, which were linearized (RST/1 − RST) to perform a Mantel test with 10 000 permutations with vegan [52].

(d) Approximate Bayesian computation analyses

We used nuclear microsatellite loci and the partitioned chloroplast sequence dataset to test for previously proposed hypotheses regarding C. pepo ssp. pepo's ancestor: (i) C. pepo ssp. ovifera as the wild ancestor; (ii) C. pepo ssp. fraterna as the wild ancestor; (iii) an unknown and extinct lineage as the wild ancestor; and (iv) origin by hybridization between both C. pepo ssp. ovifera and C. pepo ssp. fraterna wild ancestors (electronic supplementary material, figure S1).

To define which scenario has the best adjustment to our observed data, we obtained the posterior probability for each scenario by a direct approach and by the logistic regression approach implemented in DIYABC [5966]. The logistic approach shows good stability and better accuracy for scenario selection when comparing complex models in contrast to the direct approach [62,63]. We also estimated confidence intervals (CI) for the posterior probabilities of each scenario for further support in scenario selection [60] (electronic supplementary material, appendix S1).

For the best-supported scenario, we report the median of each temporal and demographic parameters drawn from the posterior distribution [64]; we also obtained the average relative bias [40] (electronic supplementary material, appendix S1).

We evaluated the ability of the approximate Bayesian computation (ABC) analysis to discriminate between scenarios and we estimated type I and type II error rates for both approaches [59,60,62] (electronic supplementary material, appendix S1). Then, to evaluate if the best-supported scenario could successfully reproduce the observed data, we performed model checking using the summary statistics not used during the inference step [40] (electronic supplementary material, appendix S1 and figure S2).

3. Results

(a) Chloroplast data

We obtained a total of 1724 bp, for intergenic regions petA-psbJ (889 bp) and psbD-trnT (835 bp), 11 haplotypes with 19 segregating sites (seven singleton variable sites and 12 parsimony informative sites) of the chloroplast genome. For ancestral Mexican C. pepo ssp. pepo landraces, we found five haplotypes with nine segregating sites (electronic supplementary material, table S2). Most sampled locations were fixed for a single haplotype, but some of them were fixed for different haplotypes (figure 2). Cucurbita pepo ssp. ovifera showed three haplotypes with 18 segregating sites and wild C. pepo ssp. fraterna showed two haplotypes with one segregating site, which were shared with Mexican C. pepo ssp. pepo (table 1 and figure 2).

Figure 2.

Figure 2. (a) Haplotype network for two concatenated chloroplast regions (petA-psbJ and psbD-trnT(GGU); 1724 bp) depicting the relationship between 11 haplotypes found in C. pepo ssp. pepo, C. pepo ssp. fraterna and C. pepo ssp. ovifera from the current study, plus data reported by Kistler et al. [27], including C. pepo ssp. fraterna archaeological sample (123Fra) as well as C. pepo ssp. fraterna modern sample (KFra). (b) Geographical distribution of 11 haplotypes considered in the haplotype network for the individuals analysed in this study. (Online version in colour.)

Table 1. Diversity statistics for chloroplast sequences (cpDNA) and nuclear microsatellite loci for each of the subspecies. (n, sample size; S, number of segregating sites; h, number of haplotypes; Hd, haplotypic diversity; π, nucleotide diversity; A, allelic richness; HO, observed heterozygosity; HE, expected heterozygosity; FIS, inbreeding coefficient. *p < 0.05.)

cpDNA
microsatellite loci
subspecies
nShHdπNAHOHEFIS
C. pepo subsp. pepoimproved varieties29550.677340.0007364.1110.2160.6380.101
Mexican pepo279950.63180.000829110.7780.430.5730.123*
C. pepo subsp. ovifera251830.29000.0014183.8890.6140.564−0.089*
C. pepo subsp. fraterna22120.51950.0003292.8890.1290.2920.559*

*p < 0.05.

All haplotypes were closely related, and the haplotype network showed a star configuration with haplotype A at the centre of the network, and many derived haplotypes. Haplotypes A and B were shared between C. pepo ssp. pepo and C. pepo ssp. fraterna. Kistler et al.'s [27] haplotypes from modern and archaeological samples, except for samples KFra and 123Fra, were placed within haplotype A. Cucurbita pepo ssp. ovifera showed haplotypes H to K, which differed from haplotype B by two mutational steps. Kistler et al.'s [27] haplotypes for C. pepo ssp. ovifera var. ovifera, var. texana and var. ozarkana were placed within haplotype K. Cucurbita pepo ssp. pepo's improved varieties, except for cocozelle (Coco), shared haplotype A with ancestral Mexican C. pepo ssp. pepo landraces and wild C. pepo ssp. fraterna and haplotypes H or K with C. pepo ssp. ovifera (figure 2).

Contrary to our expectations, most C. pepo ssp. pepo's improved varieties, except for spaghetti (Spa), showed higher genetic variation (S = 4; h = 2–3; Hd = 0.6667–0.733; π = 0.000932–0.0015) than ancestral Mexican C. pepo ssp. pepo landraces (Hd = 0–0.459; π = 0–0.00026) and than ornamental C. pepo ssp. ovifera (Hd = 0.299 and π = 0.0014). Ancestral Mexican C. pepo ssp. pepo landraces showed levels of genetic diversity similar to wild C. pepo ssp. fraterna (Hd = 0.519; π = 0.0003) (table 1; electronic supplementary material, table S2).

For the assignment test, samples from wild C. pepo ssp. fraterna were apportioned to the ancestral Mexican C. pepo ssp. pepo landraces group, while ornamental C. pepo ssp. ovifera was apportioned to its own group and C. pepo ssp. pepo's improved varieties were allocated to these three groups regardless of the variety (electronic supplementary material, figure S3).

Overall, FST was 0.586. Cucurbita pepo ssp. fraterna, shared haplotypes with many ancestral Mexican C. pepo ssp. pepo landraces. Spaghetti (Spa), San Luis Potosi (SLP), Zacatecas (Zac) and Hidalgo (Hgo) were completely differentiated (FST = 1). Improved varieties, cocozelle (Coco), early prolific straightneck (Strai) and delicata honey boat (Deli) shared some similarity with ancestral Mexican C. pepo ssp. pepo landraces, wild C. pepo ssp. fraterna and ornamental C. pepo ssp. ovifera (FST = 0.35–0.71, electronic supplementary material, figure S4). We did not find signals of IBD for chloroplast sequences (p = 0.4584).

(b) Microsatellite data

Three microsatellite loci (CMTp129, CMTm221 and CMTm 261) showed signals of null alleles in all populations and were discarded from analyses. Results were obtained for nine nuclear microsatellite loci that did not show signals of linkage disequilibrium.

Allelic richness was higher for ancestral Mexican C. pepo ssp. pepo landraces (A = 2.77–5.44) than for ornamental C. pepo ssp. ovifera (3.88), wild C. pepo ssp. fraterna (2.88) and C. pepo ssp. pepo's improved varieties (A = 1.3–2.2) (table 1). Genetic diversity was lower in C. pepo ssp. pepo's improved varieties (HE = 0.079–0.394) and wild C. pepo ssp. fraterna (HE = 0.292) than in ornamental C. pepo ssp. ovifera (HE = 0.564). Ancestral Mexican C. pepo ssp. pepo landraces showed intermediate to high levels of genetic diversity (HE = 0.293–0.556), but average HE for C. pepo ssp. pepo's improved varieties (0.638) was higher than average HE for ancestral Mexican C. pepo ssp. pepo landraces (0.573) (table 1; electronic supplementary material, table S2).

We found significant heterozygote deficiency in eight ancestral Mexican C. pepo ssp. pepo sampled locations (FIS = 0.118 in Tlaxcala (Tla) to 0.318 in Chiapas (Chiap)), in one C. pepo ssp. pepo's improved variety (FIS = 0.025 in cocozelle (Coco)) and in wild C. pepo ssp. fraterna (FIS = 0.559). By contrast, ornamental C. pepo ssp. ovifera showed a slight but significant heterozygote excess (FIS = –0.089) (table 1; electronic supplementary material, table S2).

Structure analysis for K = 3 showed that each subspecies constitutes a well-defined cluster (figure 3), with low admixture between C. pepo ssp. pepo and C. pepo ssp. fraterna. Cucurbita pepo ssp. pepo's improved varieties were assigned to either the C. pepo ssp. ovifera cluster (early prolific straightneck (Strai) and delicata honey boat (Deli)) or to the ancestral Mexican C. pepo ssp. pepo cluster (black beauty zucchini (Bbz), cocozelle (Coco) and spaghetti (Spa)). For K = 4, ancestral Mexican C. pepo ssp. pepo landraces were assigned to two clusters, with most individuals collected from the northern locations belonging to one cluster—together with C. pepo ssp. pepo's improved varieties—and most individuals collected from the southern locations assigned to the other cluster (figure 3).

Figure 3.

Figure 3. (a) Histogram depicting the results from the Structure analysis based on nine nuclear microsatellite loci for K = 3 and 4. (b) Geographical distribution of the genetic clusters for K = 4. (Online version in colour.)

Overall RST was 0.395. Pairwise RST values were significant and the lowest values pertained to comparisons among ancestral Mexican C. pepo ssp. pepo landraces (electronic supplementary material, figure S4), and the highest values for wild C. pepo ssp. fraterna (0.46–0.96; electronic supplementary material, figure S4). We did not find signals of IBD for nuclear microsatellite loci (p = 0.3967).

(c) Approximate Bayesian computation analyses

We obtained contrasting results for the direct and logistic methods. The logistic method provided better support to scenario 4 (hybrid origin for C. pepo ssp. pepo; 0.6982 posterior probability and non-overlapping CI (electronic supplementary material, table S3)). By contrast, for the direct approach, the reliability of scenario choice was problematic (overlapping CI), and the best-supported model was scenario 2 (wild C. pepo ssp. fraterna as C. pepo ssp. pepo ancestor; 0.5440 posterior probability), followed by scenario 4 (0.2720 posterior probability; electronic supplementary material, table S3). The rate at which each model was correctly recovered (0.944–0.995 for the logistic versus 0.567–1 for the direct approach), and the posterior predictive error (logistic approach = 0.197; direct approach = 0.321) support higher reliability in model selection for the logistic method (electronic supplementary material, table S3).

Model checking resulted in good adjustment for each scenario, as observed data fall within the cloud of simulated datasets (electronic supplementary material, figure S2). Some summary statistics used for model checking showed significant differences between observed and simulated datasets (electronic supplementary material, table S4).

Median parameter estimates for scenarios 2 (wild C. pepo ssp. fraterna as C. pepo ssp. pepo's ancestor) and 4 (C. pepo ssp. pepo's hybrid origin) were similar. In both scenarios, results suggest a population expansion for C. pepo ssp. pepo's lineage after domestication, and population contraction for C. pepo ssp. ovifera. Cucurbita pepo ssp. fraterna showed small effective population size. For scenario 4, migration rate suggests that C. pepo ssp. pepo has similar ancestry from C. pepo ssp. ovifera and C. pepo ssp. fraterna (table 2).

Table 2. Median value for effective population size (Ne), migration rate and time since divergence, drawn from the posterior distribution, for scenario 2 (C. pepo ssp. fraterna as Mexican C. pepo ssp. pepo wild ancestor) and scenario 4 (Mexican C. pepo ssp. pepo origin by hybridization between C. pepo ssp. fraterna and C. pepo ssp. ovifera), which showed the highest posterior distribution for the direct approach and the logistic approach, respectively. (Average relative bias for the median of each parameter is shown in parenthesis. Migration rate is not considered in scenario 2.)

parameter
scenario 2 (fraterna)scenario 4 (hybrid)
effective population size (Ne)ancestralC. pepo ssp. pepo's ancestral lineage40 610 (1.3269)39 830 (0.8937)
C. pepo ssp. ovifera540 200 (–0.0217)452 000 (–0.3250)
currentC. pepo ssp. pepo112 800 (0.4351)126 100 (0.3615)
C. pepo ssp. ovifera15 340 (0.3235)29 330 (1.2317)
C. pepo ssp. fraterna4353 (0.3028)5406 (–0.1364)
migration rate (ra)0.6186 (–0.1826)
time since divergence in yearsC. pepo ssp. pepo's lineage divergence74 890 (0.1750)43 920 (1.5941)
C. pepo ssp. ovifera's domestication70 829 (0.0435)39 015 (0.3508)
ancestral lineage divergence282 600 (0.7652)366 600 (0.0278)

4. Discussion

(a) Genetic variation and genetic structure

Cucurbita pepo possesses higher genetic diversity in chloroplast regions psbJ-petA and psbD-trnT(GGU) than C. moschata and Cucurbita argyrosperma (Hd = 0, π = 0). Cucurbita pepo ssp. pepo and C. pepo ssp. fraterna shared two haplotypes, while C. pepo ssp. ovifera did not share haplotypes with either. This may suggest that C. pepo ssp. fraterna is, most likely, C. pepo ssp. pepo's ancestor [65].

For nuclear microsatellite loci, average levels of genetic variation for ancestral Mexican C. pepo ssp. pepo and ornamental C. pepo ssp. ovifera (table 1) were higher than those reported for other annual plants (HE = 0.46 [66]), similar to those reported for C. pepo ssp. pepo in Guatemala (HE = 0.5; 10 polymorphic loci [36]), and higher than those reported for C. argyrosperma ssp. argyrosperma (HE = 0.410; nine polymorphic loci [67]) and C. moschata (HE = 0.395; 12 polymorphic loci [68]). These results are consistent with previous isozyme analyses showing higher genetic variation in C. pepo ssp. pepo than in other domesticated Cucurbita [3,32,69]. Low levels of genetic variation in C. pepo ssp. fraterna are consistent with previous analyses [14].

High genetic diversity found in ancestral Mexican C. pepo ssp. pepo, for chloroplast regions and microsatellite loci, might relate to: its ample distribution and its wide range of morphological diversity [25]; the fact that C. pepo ssp. pepo is one of the earliest domesticates in America [6,7,14,22,7072] followed by a rapid process of population expansion and diversification after domestication, as supported by our star-shaped haplotype network, archaeological records and linguistic evidence [5,6,18,73]; and because at least two independent domestication events and admixture among domesticated lineages [26,27,31,37] are involved in this taxa, as is supported by our assignment test. Moreover, high levels of genetic variation may relate to Andres's [37] hypothesis of multiple incipient domestications in the history of C. pepo.

Wild progenitors are expected to show higher levels of genetic variation than their crop relatives [2,3]. Low genetic variation in C. pepo ssp. fraterna may relate to its restricted distribution [7,9]. Currently, this wild subspecies is endemic to a small region in Tamaulipas and Nuevo Leon, Mexico [74]. Moreover, currently this species is rare [7,29] and its populations might have undergone a demographic bottleneck, as population declines for wild Cucurbita have been linked to megafaunal extinction after the Pleistocene [27], and, more recently, to the use of pesticides.

Improved commercial varieties showed a wide range of values of genetic diversity (table 1); nevertheless, this result should be taken with caution, given the small sample size used in the analysis. In this sense, samples from black beauty zucchini, cocozelle, early prolific straightneck and delicata honey boat were assigned to both C. pepo ssp. pepo and C. pepo ssp. ovifera groups, which suggest they might be hybrids from the two domesticated lineages. Early prolific straightneck is a confirmed hybrid [12]. Varieties from the zucchini group and cocozelle are considered as C. pepo ssp. pepo cultivars [12,75,76]; but our results suggest that cocozelle might have some degree of introgression.

Most C. pepo ssp. pepo locations showed significant heterozygote deficiency, which is to be expected as domestication and improvement involved genetic bottlenecks and inbreeding [2]. Wild C. pepo ssp. fraterna also showed heterozygote deficiency, which—as mentioned above—may relate to its restricted historical range, its current rareness and small effective populations sizes.

Genetic structure was higher for chloroplast sequences than for microsatellite loci, and many populations are fixed for different chlorotypes. Nuclear genetic structure was higher than values reported for wild C. argyrosperma ssp. sororia (RST = 0.3 [67]) and domesticated C. moschata (FST = 0.215 [68]), but similar to values reported for domesticated C. argyrosperma ssp. argyrosperma (RST = 0.4 [67]). However, when calculating RST for ancestral Mexican C. pepo ssp. pepo landraces only, we obtain RST = 0.160, which is lower than that reported for domesticated C. argyrosperma and C. moschata.

Patterns of genetic differentiation between C. pepo's landraces contrast between chloroplast and nuclear markers. Cucurbita pepo ssp. ovifera chlorotypes, including those reported by Kistler et al. [27] for wild varieties (haplotype K), cluster together and showed higher values of genetic differentiation, while C. pepo ssp. fraterna and C. pepo ssp. pepo Mexican landraces show shared ancestry. On the other hand, nuclear microsatellites show that each subspecies constitutes a well-defined genetic cluster with higher levels of differentiation between C. pepo ssp. fraterna and Mexican C. pepo ssp. pepo than between C. pepo ssp. ovifera and Mexican C. pepo ssp. pepo. This is in accordance with the data reported by Gong et al. [26], and contrast with a polyphyletic origin for C. pepo ssp. ovifera [17]. Lack of IBD for both molecular markers may relate to human-mediated dispersal. Future analyses should include C. pepo's varieties from the Yucatan Peninsula, as well other native varieties from specific areas.

These contrasting patterns may be related to the differences in their mode of inheritance and mutation rates. The chloroplast is haploid, has low mutation rate, is maternally inherited, recombination may occur but is rare, and is more subject to genetic drift, hence it reflects the evolutionary history of maternal lineages. Nuclear microsatellite loci are codominant genetic markers with high mutation rates, are of biparental inheritance, and their patterns of genetic variation and genetic differentiation reflect the history of the total population in ecological time [39].

Our results suggest that effective population size is smaller for C. pepo ssp. fraterna than for C. pepo ssp. ovifera. Also, there are signals of shared ancestry among C. pepo ssp. fraterna and C. pepo ssp. pepo, and that C. pepo ssp. pepo underwent rapid demographic expansion after domestication. We also found signals suggesting introgression or hybridization for improvement in C. pepo ssp. pepo's commercial lines; but nuclear microsatellite loci showed low gene flow among subspecies in recent times.

(b) Approximate Bayesian computation analysis

The best-supported scenario differed when using the direct and logistic approach. This is to be expected when complex scenarios are tested [62,63]. Thus, our results signal that C. pepo has a complex history of domestication.

ABC analysis supported the hypothesis for the origin for C. pepo ssp. pepo by hybridization between C. pepo ssp. fraterna and C. pepo ssp. ovifera. Nevertheless, shared haplotypes between C. pepo ssp. pepo and C. pepo ssp. fraterna, together with shared isozyme alleles [14,37], and higher support for scenario 2 with the direct approach, might suggest that C. pepo ssp. fraterna is indeed the wild ancestor of C. pepo ssp. pepo.

Indeed, archaeological and linguistic data support two independent domestication events for C. pepo: one in Mexico (C. pepo ssp. pepo) and the other in southeast USA (C. pepo ssp. ovifera). Regarding domesticated C. pepo ssp. pepo, the oldest archaeological remains (approx. 10 000 years ago) and the oldest linguistic evidence (approx. 6500 years ago), based on the reconstruction of words for squash in New World ancestral languages, suggest that initial domestication might have taken place in or near Oaxaca in Mexico, and early dissemination in Mesoamerica [18,22]. Cucurbita pepo ssp. fraterna's current distribution is approximately 1000 km from Guilá Naquitz, Oaxaca. However, past projections of species distribution models suggest that during the mid-Holocene (approx. 6000 years ago) suitable environmental conditions for the presence of C. pepo ssp. fraterna could have been found south into today's state of Puebla (electronic supplementary material, figure S5), close to archaeological sites from Tehuacán, Puebla (7900 BP), and in the area of the Infiernillo and Ocampo caves in Tamaulipas (9000–5000 years BP) [72,77,78].

Our results suggest a population expansion for C. pepo ssp. pepo after domestication, as it has undergone rapid geographical expansion and diversification [12,16,30,32,33]. Changes in effective population size for C. pepo ssp. ovifera are consistent with a bottleneck that might relate to domestication or to geographical range contraction since the Last Interglacial based on species distribution models ([9]; electronic supplementary material, figure S5) and Kistler et al. [27]'s megafaunal extinction hypothesis. Cucurbita pepo ssp. fraterna showed the smallest current effective population size, which agrees with a restricted distribution throughout the Pleistocene ([7,9]; electronic supplementary material, figure S5).

Data accessibility

DNA sequences: GenBank accessions MK595798–MK595819. Microsatellite available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.s374m76 [53].

Authors' contributions

G.C.-M. and K.Y.R.-M. contributed to fieldwork, laboratory work, molecular and population genetics analysis; H.S.H.-R. and G.S.-d.l.V. contributed to fieldwork, laboratory work; N.G. mapped potential distribution models of the wild subspecies; E.A.-P. contributed to laboratory work, logistics; S.M.-H. project leader, contributed germplasm collection, generated database for project design; R.L.-S. project leader, designed and coordinated the project and logistics; L.E.E. project leader, designed and coordinated the project, logistics. All authors contributed to writing and reviewing the manuscript.

Competing interests

We declare we have no competing interests.

Funding

The work was supported by CONABIO (grant nos. KE004, PE001), and CONACyT (Investigación Científica Básica grant no. 2011.167826, Problemas Nacionales grant no. 247730 to D. Piñero, SNI ayudante de Investigador for K.Y.R.-M. (Exp. Ayt. 13162)).

Acknowledgements

This manuscript represents, in part, the results for the Bachelor's degree thesis of K.Y.R.-M. at the Instituto de Ecología, Universidad Nacional Autónoma de México (UNAM). Special thanks to V. Souza and D. Piñero for supporting this research. We thank the Laboratorio de Evolución Molecular y Experimental, the Instituto de Ecología, and the FES-Iztacala, UNAM, and Campo Experimental Bajío del INIFAP. We thank P. Hernández, L. M. Paredes-Torres, D. C. Hernández-Rosales, T. E. Legaspi, G. M. Rosas, S. Barrientos, E. Scheinvar and L. Espinosa-Asuar for their help in fieldwork, laboratory work, logistics and/or help in conducting the analyses. L. F. Castellanos improved artwork. We thank two anonymous reviewers.

Footnotes

Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.4591643.

Published by the Royal Society. All rights reserved.

References