Profound genetic divergence and asymmetric parental genome contributions as hallmarks of hybrid speciation in polyploid toads

The evolutionary causes and consequences of allopolyploidization, an exceptional pathway to instant hybrid speciation, are poorly investigated in animals. In particular, when and why hybrid polyploids versus diploids are produced, and constraints on sources of paternal and maternal ancestors, remain underexplored. Using the Palearctic green toad radiation (including bisexually reproducing species of three ploidy levels) as model, we generate a range-wide multi-locus phylogeny of 15 taxa and present four new insights: (i) at least five (up to seven) distinct allotriploid and allotetraploid taxa have evolved in the Pleistocene; (ii) all maternal and paternal ancestors of hybrid polyploids stem from two deeply diverged nuclear clades (6 Mya, 3.1–9.6 Mya), with distinctly greater divergence than the parental species of diploid hybrids found at secondary contact zones; (iii) allotriploid taxa possess two conspecific genomes and a deeply diverged allospecific one, suggesting that genomic imbalance and divergence are causal for their partly clonal reproductive mode; (iv) maternal versus paternal genome contributions exhibit asymmetry, with the maternal nuclear (and mitochondrial) genome of polyploids always coming from the same clade, and the paternal genome from the other. We compare our findings with similar patterns in diploid/polyploid vertebrates, and suggest deep ancestral divergence as a precondition for successful allopolyploidization.


Details on animal sampling and DNA extraction
Adult individuals were documented photographically and sampled for buccal cells [10], fingertips or tail tips (tadpoles) before release or deposit in scientific collections (Table   S1). Swabs and tissue samples (in 70% ethanol) were stored at -20°C. DNA was extracted using the DNeasy Tissue Kit or the BioSprint robotic workstation (QIAGEN), following the manufacturer's protocols. DNA was eluted in volumes of 150 μl and 50 μl (QIAGEN Buffer AE) and stored at -20°C.

Details on phylogenetic analyses of mtDNA
We performed four runs with 20 million generations with four chains, starting with a random tree and sampling every 2000 generations, until reaching an average standard deviation of split frequencies of < 0.01. Stationarity and convergence of the runs were confirmed using the software Tracer v.1.7.2 (http://beast.bio.ed.ac.uk/Tracer). The first 25% of each run were discarded as burn-in.

Details on subgenome inference and phylogenetic analyses of single nuclear markers
The best-fitting model of marker-specific sequence evolution was selected (Akaike information criterion, AIC; Table S2). PhyML (ver. 3.0) was used with the SPR branch swapping algorithm and, to assess node support, with 10 3 bootstrap replicates (continued on next page). We then generated maximum likelihood-based phylogenies using PhyML (ver. 3.0, [11]) with a GTR + G +I model of sequence evolution, the SPR branch swapping algorithm and 10 3 bootstrap replicates.    Details on the molecular dating approach for nuclear and mitochondrial data Molecular dating was performed in BEAST v. 1.8.3 [12] (with input files created in BEAUTi v. 1.8.3). For the nuclear data set, we optimized the partitioning scheme by initially treating each gene fragment separately and determined the most suitable substitution models using PartitionFinder v.1.1.1 [13]. We evaluated only models available in BEAST with the follow- to specific mtDNA lineages [5], these are the best available calibration points for divergence estimations in Palearctic green toads [14].
For the mtDNA data (D-loop), we selected the best-fit model of evolution through jModel-Test v2.1.7 based on BIC [17,18] (HKY + G), and followed the workflow as described above.
Since B. bufo mtDNA could not aligned to green toad D-loops due to large INDELs, we included B. raddei as an outgroup. In addition to the age constraints (1) and (2) as implemented for the analyses of the nuclear data set (see above), we calibrated the ingroup (including stem) to have a minimum age of 18 My based on the oldest known green toad fossils [19]; cf. [5]. We applied a lognormal prior age distribution on that calibration point with an upper range bound of 29.5 My (mean = 0.48, SD = 1, offset = 18 My, 95% HPDI: 18.2-29.5 My; [20]).  (Fig. S8).
In several markers, some allotetraploid individuals exhibited three alleles clustering with one parental lineage (i.e. maternal or paternal) whereas the remaining allele clustered with the other one (i.e. paternal or maternal respectively) or displayed only alleles from one paternal cluster. In DMRT1 the individual P4 displayed three paternal alleles and only one maternal allele whereas in P450 only three paternal copies and no maternal copy were detected in the individuals O4 and P1. In SOX3, only two paternal alleles were detected for O5 and O1, and one paternal copy for P3 and P5.

Amplification of nuclear markers for three initially unidentified individuals
Thirteen out of the 30 microsatellite markers, developed by Betto-Colliard et al. [21], were used to infer the ploidy level of the four unidentified individuals, using the same PCR pro-

Supplementary Text S4: Maximum likelihood tests on the placement of B. turanensis
in the maternal clade of the Bayesian tree, shown in Fig. 1c We tested the resulting placement of the turanensis clade in the best BI tree (Fig. 1c) against alternative topologies in a ML framework to evaluate whether likelihoods were significantly different.
We estimated the four phylogenies as above and the per-site likelihoods in RAxML 8.2.7 [25]. The p-values were then obtained using the program CONSEL [26].
Although the p-value was the highest for model 1 (best BI topology: B. turanensis monophyletic, B. pewzowi basally) in both, the AU and SH tests, none of the topologies were rejected (Table S3).  ((2n turanensis),4n pewzowi),(4n oblongus),2n variabilis, 2n balearicus, 2n viridis (Fig. 1c), the subclade of allotetraploid B. pewzowi contains a single diploid species, B. turanensis, that also occurs geographically proximate, suggesting its lineage is the most parsimonious candidate maternal genome donor. The paternal genome is probably inherited from a species that had earlier diverged from B. latastii (Fig. 1c). Maternal (0.82 My; 0.32-1.45 My) and paternal (1.45, 0.54-2.64 My) subclades of 4n Bufo pewzowi are both estimated to have started their phylogenetic diversification in Pleistocene periods, suggesting this as a timeframe for the allopolyploidization event. In the mtDNA-and maternal sub-genome trees, B. pewzowi forms a paraphyletic group, which may indicate several diverged subpopulations or, alternatively, multiple origins of the allopolyploid B. pewzowi (Fig. 1b, c). A surprising result in this context is the phylogenetic position of 2n B. turanensis, which looks as if the recent representatives of this diploid lineage have been derived from the tetraploid B. pewzowi, both in the nuclear and the mtDNA phylogenies (i.e. after allopolyploidization in the Pleistocene, the true maternal B. turanensis ancestral lineage got extinct but later re-emerged by loss of the paternal genome from the 4n B. pewzowi). To test whether this is a true evolutionary signature or rather a phylogenetic artifact, we performed maximum likelihood ratio tests on the maternal nuclear ancestry (Suppl. Text S4). The data at hand indeed provide the highest likelihood for the topology suggesting that B. turanensis is derived from 4n B. pewzowi. However, alternative topologies could not be rejected and a final clarification of this question exceeds the framework of our study. However, future work should indeed test the intriguing possibility that a diploid vertebrate taxon could have evolved from an allotetraploid ancestor by the loss of a genome.
II: 4n B. oblongus. The situation is less clear for 4n B. oblongus from northern and eastern Iran, whose maternal genome forms a sister clade to the maternal ancestor of B. pewzowi, from which B. oblongus is cytogenetically distinguished [8]. Tetraploid B. oblongus presumably also inherited its maternal subgenome from the lineage represented by diploid B. turanensis, while its paternal subgenome is derived from a lineage related to B. latastii (Fig.   1c). Within the paternal clade of B. oblongus, we find at least one inconsistency (O5), which either indicates that this tetraploid arose several times (Fig. 1c), or experienced recent genetic interactions with diploid or triploid forms, as documented from Iran [27]. sister relationship with the maternal genome of 3n B. pseudoraddei (Fig. 1c, Fig. S1). This  (Fig. 1c).
V: 3n B. pseudoraddei. While the nuclear ancestry of B. pseudoraddei, endemic to the Swat valley of Pakistan, appears at first glance similar to that of 3n B. zugmayeri, quantitatively different genome contributions led to its formation. Namely, B. pseudoraddei presents 2 sets of maternal subgenomes, versus only one for B. zugmayeri (Fig. 1c, Fig. S1). In addition, B. pseudoraddei's mtDNA is closely related to that of B. shaartusiensis (and 3n B. baturae), while that of B. zugmayeri is closer to B. turanensis (Fig. 1b).
Signatures of three additional genomic interactions that led to allopolyploids ( Fig. S2: UIL X1, X2, X4) While on the one hand, UIL X1 and X2 from northeastern Iran share their mtDNA (Fig. 1b) and maternal nuDNA phylogenetic history with B. oblongus (of which they might simply represent a sub-population), their paternal genomes present the lineage of the paternal ancestry of 3n B. baturae. As mentioned for O5 (see above), this either indicates that B. oblongus-like tetraploids arose several times (Fig. 1c), or reflects recent genetic interactions with diploid or triploid forms in Iran [27]. Triploid UIL X4, on the other hand, relates to 2n B. shaartusiensis both in terms of mtDNA haplotype ( Fig. 1b) and nuclear maternal genome ( Fig. 1c), while its paternal genome clusters with the paternal ancestry of recent 4n B.
pewzowi, suggesting this is a recent hybrid between a B. shaartusiensis mother and a B.
pewzowi father, found at the type locality of the maternal species. It is worth noting that UIL X4 displayed two maternal alleles for SOX3 (Fig. S5): one related to B. baturae and another one to B. pewzowi maternal lineage (which is not apparent in Fig. 1c because our concatenation approach only kept one maternal sequence). Bayesian tree as shown in Fig. 1c; schematically shown are five hybridization events (I to V; bold arrows) that have successfully evolved polyploid taxa; in addition, thinner dashed lines with arrows show inferred additional hybridization events that have led to unidentified polyploid lineages (UIL). Small numbers at branches show Bayesian posterior support values (>50%); large numbers at nodes show divergence time estimates (mean values) in millions of years ago (Mya); grey bars indicate 95% confidence intervals for nodes with sufficient posterior support; labels at tree leaves are shown only for UIL X1, X2 and X4 and consist of the abbreviated taxon name, original sample number, followed by simplified sample number as used in the paper and P for paternal or M for maternal subgenome, respectively. For detailed comments see Supplementary Text S5.