Intercontinental karyotype–environment parallelism supports a role for a chromosomal inversion in local adaptation in a seaweed fly

Large chromosomal rearrangements are thought to facilitate adaptation to heterogeneous environments by limiting genomic recombination. Indeed, inversions have been implicated in adaptation along environmental clines and in ecotype specialization. Here, we combine classical ecological studies and population genetics to investigate an inversion polymorphism previously documented in Europe among natural populations of the seaweed fly Coelopa frigida along a latitudinal cline in North America. We test if the inversion is present in North America and polymorphic, assess which environmental conditions modulate the inversion karyotype frequencies, and document the relationship between inversion karyotype and adult size. We sampled nearly 2000 flies from 20 populations along several environmental gradients to quantify associations of inversion frequencies to heterogeneous environmental variables. Genotyping and phenotyping showed a widespread and conserved inversion polymorphism between Europe and America. Variation in inversion frequency was significantly associated with environmental factors, with parallel patterns between continents, indicating that the inversion may play a role in local adaptation. The three karyotypes of the inversion are differently favoured across micro-habitats and represent life-history strategies likely to be maintained by the collective action of several mechanisms of balancing selection. Our study adds to the mounting evidence that inversions are facilitators of adaptation and enhance within-species diversity.

: Best models of binomial logistic regression and beta-regression explaining and predicting inversion or karyotype frequencies by a combination of environmental variables .....13 Table S6: Dirichlet regression between karyotype composition and environmental variables. ...15 Table S7 Table S8: Environmental variables at locations sampled by Day et al (1983) (9)

Methods: Development and validation of a DNA marker for the inversion
We used the previously-demonstrated association between the chromosome I inversion karyotype (α/β) and two common alleles (B/D) of the alcohol dehydrogenase (Adh) allozyme marker (1) to develop and validate an inversion-specific DNA marker.
First, we searched for the putative Adh locus by blasting the Adh protein sequences of Drosophila from GeneBank on the draft assembly of Coelopa frigida genome (M. Wellenreuther, unpublished). The two best matching scaffolds were aligned together using MAUVE and showed several well-aligned areas, within which three coding loci corresponding to dipteran proteins were identified using Blastx in the swissprot NCBI protein database (the alcohol dehydrogenase, Adh, a predicted 39S ribosomal protein, Rib and an arginine N-methyltransferase 1, Met, see Table S1).
Second, to amplify and sequence those three coding loci in C. frigida, we developed five sets of primers (Table S1). Initial sequencing was applied on 42 (Adh) and 31 (Met, Rib) flies from North America and Europe (details in Table S2). Genomic DNA was extracted from adult flies using a salt-extraction protocol (2) with a RNase A treatment (Quiagen) or a lysis protocol (3) . The same PCR assays were carried out for the three loci in a 25-μL final volume composed of 1 μL of gDNA, 0.4 μM forward primers, 0.4 μM reverse primers and either 10µL of QuantaBio MasterMix or a mix including: 1× Green GoTaq Flexi buffer, 0.625 units of GoTaq DNA polymerase (ProMega), 2.5mM of MgCl2, 0.2 mM dNTP. The PCR amplification temperature profile consisted of an initial denaturation at 95 °C for 2 min followed by 35 cycles at 95°C for 45s; 55°C for 45 s and 72 °C for 1 min and a final elongation at 72 °C for 15 min. Sanger sequencing was performed with standard conditions at the Plate-forme d'Analyses Génomiques (Université Laval, Québec, Canada). Sequences of the three loci were cleaned for sequencing ambiguities and aligned using Geneious 9.1.7.
Third, a restriction enzyme procedure was developed to genotype two targeted SNPs in complete linkage with the Adh haplotypes (Fig.2), thus putatively associated with the inversion. On 42 samples, no other mutation than the targeted polymorphism was observed in the area to which the restriction enzymes linked. In a 10µL volume reaction, 8.5µL of PCR product was digested during 15min at 37°C with 5 units of the restriction enzyme AluI or DraI and 1x CutSmart Buffer (New England Biolabs). Digestion product was run on 2.5% agarose gel electrophoresis for 45min at 115V (Fig. S1).
To validate the association between the haplotype sequence and the inversion karyotype, 44 Coelopa frigida (4 αα, 17 αβ, 23 ββ, Table S2) were cut in two halves: the abdomen was used for allozyme characterization protocol as described in (4), which allow assessing the inversion karyotype, and DNA was isolated from the thorax for sequencing (12 samples) or genotyping with the restriction enzyme procedure (32 samples).

Fig S1: Electrophoresis of the Adh amplified locus digested by AluI and DraI.
The enzyme AluI digests only the β haplotype and thus allows separating the αα individuals from the ones carrying the β haplotype. DraI enzyme digests only the α haplotype and thus allows discriminating ββ individuals from the ones carrying the α haplotype    For each population, we simulated a set of karyotypes of the same sample size by sampling with replacement in a pool composed of the observed numbers of αα, αβ, ββ individuals. H-W deviations were calculated on the simulated set, this procedure was repeated 1000 times and the 2.5-97.5% quantiles on the distribution of bootstrapped values were taken as limits for the confidence interval.

Fig S2: Heterogeneity of inversion & karyotype frequencies between populations.
Deviance analysis and post-hoc contrasts of α rearrangement frequencies and each karyotype frequency between populations. Bars represent confidence intervals.

Fig S3: Heterogeneity of inversion frequency and karyotype proportions between populations.
Pairwise comparison of inversion frequency (left) and karyotype proportions (right) between populations. Stars denote significantly different proportions under a chi² test, corrected for multiple test following (5)

Fig S4: Summary variables drawn by PCA on environmental variables
For each summary variable (Fig. 3A), arrow plot representing the association between each environmental correlating variable and the first and second principal components (PC1, PC2). For all of them, the first PC was retained as the summary variable, following the Kaiser-Guttman and Broken Stick criteria (6). Eigen-values and percentage of variation explained by each PC are represented on the right side.  : Best models of binomial logistic regression and beta-regression explaining and predicting inversion or karyotype frequencies by a combination of environmental variables Analyses were conducted on frequencies corrected for sex-ratio 1:1, frequencies corrected for sex-ratio observed by sampling in natural populations, frequencies in male, frequencies in females. Grey  Salinity/tidal amplitude was a marginal predictor of the karyotype proportions (9%(αβ), 6%(αα), 1%(ββ) of variance,). Yet, it was not associated to variation in α frequency, possibly because the decrease in αβ proportions in the inner estuary was balanced by an increase in both homokaryotypes (Fig.3B). Wrackbed surface or the proportion of Zoosteraceae were never retained in the best models (Table.S5-7).  Table S7: Best models of redundancy analysis including spatial autocorrelation In complement, models of redundancy analysis were build including environmental predictors identified as relevant and, to control for spatial auto-correlation, spatial variables. Those spatial variables describe geographic proximity between populations based on Principal Coordinates of Neighbourhood Matrix (PCNM) map distances (7). Best models were selected using a stepwise backward model selection using permutation tests (8) and mostly support a significant role of the composition of the wrackbed in explaining variations of frequency between populations.
Spatial variables were not a significant factor except in one model addressing variation in αα frequency. When controlling for this spatial autocorrelation, the best environmental predictor (depth/T° of the wrackbed) still explained 18% of the variance in αα frequency.

Methods for comparison North America/Europe
Karyotype frequencies and wrackbed composition were reported by Day et al (9) for 13 Scandinavian populations (Table S8). Using locality names, we inferred approximate GPS coordinates and extracted the same large-scale climatic/abiotic variables as for North America (Table S8). For tidal amplitude, we used archives from The Norwegian Hydrographic Service and data from Gilburn & Day, 1994 (10). In the original publication, seaweed composition is reported with stars as an indicator of presence and abundance (ranging from 0 stars to 4 stars) and we interpreted those data as a ranked variable describing variation in the abundance of each seaweed between populations.
For comparison, the association between inversion/karyotype frequency and each raw environmental variable was analysed as in North America with a binomial GLM and a logit link (Fig.S10). Yet, given the high correlation between all variables in Scandinavia, we applied a PCA on all the variables and build multi-variable modelling on the first, second, third and fourth PCs (summary variables describing a total of 97% of environmental variance).

Results of the comparison North America/Europe
Inversion distribution and karyotype frequencies as reported by Day et al (7) mirrored the pattern that we observed in American populations since all investigated populations in Scandinavia were also polymorphic with a slightly less frequent α rearrangement (mean: 42% [35-50%]), significant heterogeneity between populations (deviance =77.4, df=14, p<0.001) and significant heterokaryotype excess (+24% on average, Fig.1).
Multiple ecological gradients (climate, salinity, tidal amplitude, and seaweed composition) correlated along the Scandinavian cline (Fig.S9D). For instance, Laminariaceae were more abundant in the norther part of the cline and other seaweeds in the southern part. Therefore, disentangling which ecological dimensions that underlined inversion frequency and karyotype composition in Europe was not possible. We therefore limited our comparison to the parallelism in the direction of the association between the inversion frequency and environmental variables.
Parallelism in the direction of association between inversion/karyotype frequencies and environment was observed for two major predictors, namely the North-South thermic cline and the seaweed composition of the wrackbed in Laminariaceae and other seaweeds. On both continents, the α frequency decreased along the North-South cline (negatively associated with latitude) with warmer air temperature observed in the south (Fig.S9BC). This is linked to significantly higher proportions of the karyotype ββ at southern warmer locations, at the depends of αβ in North America and at the depends of αα in Scandinavia. Both in America and Scandinavia, higher abundance of Laminariaceae was positively associated with α frequency linked to higher proportions of αα relatively to ββ. Similarly, higher abundance of other seaweeds was negatively associated with α frequency linked with higher ββ proportions relatively to αβ (Fig.S9BC). The abundance of Fucaceae was not a relevant predictor of α frequency in Scandinavia and showed no parallelism. Other climatic descriptors, such as precipitations and sea temperature, did not correlate with the North-South cline in the same direction on both continents, which may explain why they were differently associated with α frequency and suggest that parallelism is rather linked either to air temperature or other seaweeds abundance. Salinity and tidal amplitude were not significantly associated with α frequency in North America, while in Scandinavia they covaried along the cline, and as such, were also associated with variation in α frequency.
Overall, the cline was a much more important predictor of α frequency in Scandinavia, explaining up to 30% of the variation (PC1, Fig.S11, Table S9), likely because all factors co-varied along this gradient. Yet, seaweed composition of the wrackbed was an additional predictor of α frequency in Scandinavia, explaining 15% of variance (PC2, Fig. S11, Table S9), and underlying the relative proportions of ββ vs. αβ (Table S9-10).