Population genomics reveals possible genetic evidence for parallel evolution of Sebastiscus marmoratus in the northwestern Pacific Ocean

Understanding patterns of population diversity and structuring among marine populations is of great importance for evolutionary biology, and can also directly inform fisheries management and conservation. In this study, genotyping-by-sequencing was used to assess population genetic diversity and connectivity of Sebastiscus marmoratus. Based on 130 individuals sampled from 10 locations in the northwestern Pacific Ocean, we identified and genotyped 17 653 single-nucleotide polymorphisms. The patterns of genetic diversity and population differentiation suggested that the Okinawa Trough might be the ancestral centre of S. marmoratus after the Last Glacial Maximum. A shallow genetic structure was observed among sampled populations based on the implemented structuring approaches. Surprisingly, we detected genetic homogeneity in two population pairs (i.e. Xiamen–Niigata and Zhuhai–Iki Island), in which populations have large geographical and latitudinal intervals. Population structure and allele frequency distribution based on outlier loci also mirrored the observed genetic homogeneity in the above-mentioned population pairs. Integrated with biological, environmental and genomic data, our results provide possible genetic evidence for parallel evolution. Our study also provides new perspectives on the population structure of S. marmoratus, which could have important implications for sound management and conservation of this fishery species.


Introduction
Determining parallel evolution is important because it can reveal the nature of the ecological and evolutionary forces that shape biodiversity [1]. Genetic parallelism appears to be ubiquitous and occurs at all taxonomic levels [2][3][4]. When species adapt to local biotic and abiotic conditions, similar selective pressures will lead to identical or similar adaptive changes in distantly related species or populations [5,6].
Parallel evolution provides valuable systems for studying the genetics of new adaptations [3], and identification of the genetic basis for parallel adaptation is a prominent goal in evolutionary biology [7]. Recent technological advances in population-scale high-throughput sequencing provide powerful tools to explore parallel evolution at genomic scales [8]. In the marine realms, applying population genomics approaches, parallel evolutionary processes have been recently observed in threespine stickleback Gasterosteus aculeatus [8][9][10], European anchovy Engraulis encrasicolus [11], Atlantic cod Gadus morhua [12], steelhead/rainbow trout Oncorhynchus mykiss [7] and Atlantic herring Clupea harengus [13]. However, such evolutionary parallelism studies are generally restricted to similar latitudinal gradients on both sides of the Atlantic Ocean. Therefore, we wanted to investigate whether parallel evolution could be detected within large latitudinal scales.
In the northwestern Pacific Ocean (NWP), with the influence of the Kuroshio Current System (KCS), the ecological conditions between southeastern Chinese (SEC) coasts and Japanese (JPN) coasts are roughly homogeneous [14]. It is thought that parallel evolution could be possibly detected in marine organisms between the two regions. The marbled rockfish Sebastiscus marmoratus is probably an ideal candidate for the detection of parallel evolution. With low genetic background noise, the genetic homogeneity in this species [15] allows us to identify selective outliers associated with local adaptation [13]. Reported biological studies have demonstrated that the mating seasons of S. marmoratus are roughly identical between SEC and JPN regions [16,17]. It is commonly known that mating seasons largely influence the reproduction and population recruitment of ovoviviparous fish, rather than spawning seasons. The shared reproductive features could be an indicator of environmental and genetic similarity. In addition, our recent population genetic study explicated the close relationship between sampled individuals from Xiamen and Japan [18]. Therefore, in the present study, with the objective of delineating a precise genetic pattern of S. marmoratus populations, we applied a genotyping-by-sequencing (GBS) approach to investigate the population diversity and structuring of 10 populations (five Chinese populations and five Japanese populations) sampled from the NWP. The inclusion of S. marmoratus samples from SEC and JPN regions may provide possible evidence for parallel evolution of this species.

Sample collection
Adult S. marmoratus were obtained from 10 locations along the NWP ( figure 1 and table 1). The samples were collected by trawl net or hook fishing in offshore waters, thus ensuring that the samples collected were representative of the local populations. Muscle tissues were preserved in 95% ethanol. All samples were collected in accordance with national legislation.

DNA extraction, GBS library preparation and sequencing
DNA isolation was accomplished by a standard phenolchloroform extraction protocol, followed by RNase A treatment. The GBS libraries were constructed following Elshire et al. [19]. Briefly, the DNA was digested with both high-fidelity NlaIII and MseI restriction enzymes. A total of 10 libraries were created by uniquely barcoding each of the individuals from the respective site and then pooling these individually barcoded samples. The barcodes used were six nucleotides in length. The libraries were pooled for multiplexed polymerase chain reactions (PCRs), and then the PCR products were purified. The sequencing was performed in three lanes of an Illumina HiSeq2500 platform, using 150-bp paired-end reads. The library preparation and sequencing processes were performed commercially at Novogene Co. Ltd in Beijing, China.

SNP calling and filtering
Raw sequences were parsed, trimmed and demultiplexed using the bioinformatics tool Trimmomatic 0.36 [20] with default parameters. A draft genome sequence of S. marmoratus assembled using SOAPdenovo2 software [21] based on whole-genome resequencing data was used as the reference. All quality-filtered reads were sorted and aligned to the reference sequence using the bwa-mem algorithm in BWA 0.7.12 [22] with default parameters. After the alignment, singlenucleotide polymorphism (SNP) calling was performed using SAMtools 1.3.1 [23]. SNP filtering was produced using VCFtools [24] with the following parameters. royalsocietypublishing.org/journal/rsob Open Biol. 9: 190028 allele frequency (MAF) was greater than 5%, (iii) only two alleles were present, (iv) sites that contained an indel were excluded, and (v) sites that failed the Hardy-Weinberg equilibrium (HWE) test at p < 0.05 were excluded. The parameter scripts of BWA, SAMtools and VCFtools are shown in the electronic supplementary material. We also tested for departures from linkage disequilibrium (LD) expectations within each sample site and chose to exclude loci that exhibited strong LD (r 2 > 0.8 [25]). Testing for LD made use of the TASSEL 5.0 software [26]. All datasets were reformatted using PGDSpider 2.0.5.2 [27].

Outlier detection
To identify outlier loci putatively under selection, two different approaches were used. Firstly, we used the fdist approach implemented in Lositan [28]. The probability of each locus F st belonging to the neutral distribution is used to classify loci into one of three selection categories: neutral selection (0.1-0.9), balancing selection (less than 0.1) and divergent selection (greater than 0.995) [29]. Outlier analyses were based on 60 000 simulations assuming an infinite allele mutation model and using neutral mean F st , 0.95 confidence intervals and a false discovery rate of 0.1. Five independent runs were performed to further reduce false positives. Secondly, we used a Bayesian model-based approach in BayeScan 2.1 [30], which has been shown to have lower type 1 error rates [31]. Analyses were conducted using default settings. Loci under selection were defined as those with a false discovery rate (FDR) of 0.1. Loci both under divergent selection in Lositan and with an FDR of 0.1 in BayeScan were considered as outliers. Following removal of these outliers and those under balancing selection, the retained loci were set as the non-outlier dataset. The non-outlier dataset was used in population genetic analyses.

Population genetic analyses
The program Arlequin 3.5 [32] was used to estimate average SNP diversity (π) and expected heterozygosity (H e ) of each population. Given that genetic diversity indexes are normally correlated to sample size [33], rarefaction of sample size (10 individuals per population) was randomly subsampled and the genetic diversity indexes were recalculated based on the non-outlier dataset. Allele frequencies of outliers across all populations were calculated using Arlequin and then the differences were compared SNP by SNP between populations. Hierarchical clustering using Euclidean distance with the Ward clustering method was performed on allele frequencies using the R package pheatmap to detect population homogeneity based on allele frequencies. The pairwise fixation index (F st ) for each of the two populations was estimated using Arlequin software and the significance of each pairwise F st value was assessed by 10 000 bootstrap permutations [32]. To correct for multiple hypothesis testing we applied a Benjamini-Hochberg FDR (α = 0.01) correction [34] using the R function p.adjust in the package stats. Population structure was investigated by using the program Admixture 1.3.0 [35]. The best value of the coancestry cluster (K) was estimated using a cross-validation procedure in the Admixture software. The best value of the coancestry cluster exhibited the lowest cross-validation error (CVE) compared with the other cluster values. In order to further estimate the substructure of S. marmoratus populations, we increased the coancestry clusters spanning from 2 to 10 and ran the analysis with 10 000 iterations. Principal component analysis (PCA) was also implemented using the R package adegenet [36] to determine whether sampled individuals reflected a history of differentiated populations. Finally, a population-based neighbour-joining (NJ) tree was constructed using the program Populations [37] based on estimates of genetic distance among populations (Cavalli-Sforza and Edwards chord distance, D c ) with 1000 bootstrap replications on individuals and visually displayed using SplitsTree 4.14.4 [38].

Gene prediction and functional annotation
The contigs containing outlier loci were used as queries in nucleotide searches with BlastX (E-value 1 × 10 −3 ) against the NR database at the National Center for Biotechnology Information (NCBI) website. In the case of multiple hits, the best match was chosen. Then, the functional annotations of these genes were obtained using Blast2GO [39]. This software conducts Blast similarity searches and maps Gene Ontology (GO) for homologous sequences. Blast2GO produces GO annotations as well as corresponding enzyme commission numbers (EC) for sequences with E-values <1 × 10 −6 , annotation cut-off values greater than 55 and a GO weight greater than 5.

Results
The 130 individual S. marmoratus in this study (figure 1 and  table 1)

Genotyping, SNP calling and filtering
The GBS sequencing produced 212.8 million high-quality reads, with a mean of 1.64 million reads per individual (electronic supplementary material, table S1). Quality-filtered reads were mapped against the assembled genome. A total of 563 880 putative SNPs were produced among our samples and 17 653 SNPs were retained following filtering steps. Of these loci, 1.0% (n = 180) were identified as outliers, 0.8% (n = 141) were under balancing selection, and the rest (n = 17 332) were considered as the non-outlier dataset.

Gene annotation analyses
Genome-wide scans for selection identified 180 outlier loci. The BlastX analysis showed 14 contigs containing outliers corresponding to known proteins in the NR database and eight contigs were functionally annotated. The low proportion of annotated loci might be due to the combination of a lack of a high-quality reference genome and short assembled contigs of GBS reads. The annotated loci had homology to proteins associated with metabolic activities such as transferase activities, protein binding, motor activities and oxidoreductase activities, among others (electronic supplementary material,  table S2). Consequently, inhibition or promotion of such protein expression could result in a broad spectrum of phenotypes. In addition, different geographical populations also need to adapt to a variety of other biotic and abiotic factors not considered in this study.

Population genetic analyses
The global average SNP diversity (π) of S. marmoratus was 0.136 ± 0.064 and ranged from 0.074 to 0.199 within each sample location. The global observed heterozygosity (H o ) and expected heterozygosity (H e ) were 0.130 ± 0.077 and 0.136 ± 0.066, respectively, and the estimates within each population ranged from 0.152 to 0.185 and from 0.158 to 0.202, respectively (figure 2). Genetic diversity estimates of Chinese populations were comparatively higher than those of Japanese populations. Among the 10 populations the highest diversity estimates were all detected in population ZS, even when 10 individuals per population were randomly subsampled (electronic supplementary material,   statistically significant after FDR correction ( p < 0.05, FDR α = 0.01) except for the Niigata-Xiamen and Yokosuka-Kochi pairs, indicating a certain degree of population differentiation. Different methods of determining population structure generally produced similar results. The most likely numbers of the coancestry cluster were one (K = 1), suggesting population panmixia. As the K value increased, the plots showed six genetic subpopulations when K ranged from 5 to 10: It is worth noting that the substructure of the Admixture analyses revealed an unexpectedly close relationship in two population pairs (i.e. XM-NI and ZH-IK), where the populations within the pairs have large geographical and latitudinal distances ( figure 3). Given the relatively short geographical distance, a close genetic relationship should probably be detected in the XM-ZH and NI-IK pairs. The mismatch between genetic distance and geographical distance could be possible evidence for parallel adaptive evolution. Therefore, we further reconstructed outlier-based population structures of 10 populations and these four populations, respectively. PCA plotting revealed clear grouping within the XM-NI and ZH-IK pairs, which was consistent with clustering based on allele frequency of outliers ( figure 5), showing evolutionary similarity of populations with large geographical and latitudinal gradients.

Discussion
In the present study, using high-resolution genomic SNPs, weak but statistically significant levels of genetic differentiation were detected among S. marmoratus populations. S. marmoratus is a demersal rockfish with restricted migration during the juvenile and adult life history and thus is considered to disperse only during the larval stage. However, given a very short pelagic larval duration lasting about 10 days [40], larval dispersal might also be restricted. Therefore, it is likely that the genetic similarity could be the result of limited divergence time from shared ancestry populations. The contemporary distribution ranges of S. marmoratus were almost exposed during the last glacial period [41], so this species may have been extirpated through large parts of its range and survived in glacial refugium. Population expansion and subdivision from glacial refugium are expected when favourable conditions returned and genetic homogeneity would also be expected in the recolonized regions given the relatively young postglacial ecosystems (less than 10 000 years) [41]. Under such a scenario, it can be inferred that the East China Sea glacial refugium, i.e. the Okinawa Trough, should be the centre of the diversity and origin of S. marmoratus, integrated with the highest genetic diversity of population ZS. Similar inferences were also derived in the population genetic studies of Lateolabrax maculatus [41], Synechogobius ommaturus [42] and Thamnaconus hypargyreus [43].
Understanding the capacity of natural populations to adapt to their local environment is a central topic in evolutionary biology [44]. Geographically distinct populations that are exposed to similar environmental conditions generally evolve similar genotypic and phenotypic traits [13]. In the present study, structuring analyses demonstrated genetic similarity in XM-NI and ZH-IK population pairs, in which populations span geographical distances of approximately 2000 km and latitudinal distances of approximately 10°. However, we failed to detect such genetic similarity in XM-ZH and NI-IK pairs. The close genetic relationship was also detected when using mitochondrial coding gene sequences [18]. Integrated with similar reproductive features and an outlier allele frequency pattern, the genetic similarity detected in XM-NI and ZH-IK population pairs could be considered possible evidence for parallel evolution in S. marmoratus. A similar conclusion was also drawn in the Atlantic herring case [13], in which genotypes   royalsocietypublishing.org/journal/rsob Open Biol. 9: 190028 were shared among geographically distant populations with low genetic differentiation. In addition, parallel genetic evolution was also reported in the threespine stickleback [8][9][10] and Atlantic cod [12], given similar environmental conditions (salinity in the threespine stickleback and temperature in Atlantic cod). Previous parallel evolution studies on marine organisms generally focused on populations in the same latitudinal gradients, ensuring that individuals undergo similar selective pressures. Herein, despite the high latitudinal interval (ca. 10°), possible parallel evolution evidence was also detected in S. marmoratus populations. To our knowledge, this might be the first report of parallel evolution in the NWP.
In spite of the large geographical and latitudinal interval, the habitats of SEC and JPN regions are generally homogeneous because of the influence of the KCS [14]. Environmental similarity might promote similar or identical genetic variants in the two regions [4]. The similar mating season [16,17] between the two regions could be an indicator of physiological and biological similarity. However, because of the low-efficiency annotation of the GBS technique, we failed to detect genetic variants associated with reproductive biology in this study. Moreover, the complex and polygenic nature of parallel evolution restricted our understanding of genetic parallelism in S. marmoratus. Further studies linking environment, genotype and biology are warranted to fully demonstrate the pattern of parallel evolution in this species.