Historical gene flow constraints in a northeastern Atlantic fish: phylogeography of the ballan wrasse Labrus bergylta across its distribution range

The distribution and demographic patterns of marine organisms in the north Atlantic were largely shaped by climatic changes during the Pleistocene, when recurrent glacial maxima forced them to move south or to survive in northern peri-glacial refugia. These patterns were also influenced by biological and ecological factors intrinsic to each species, namely their dispersion ability. The ballan wrasse (Labrus bergylta), the largest labrid fish along Europe's continental margins, is a target for fisheries and aquaculture industry. The phylogeographic pattern, population structure, potential glacial refugia and recolonization routes for this species were assessed across its full distribution range, using mitochondrial and nuclear markers. The existence of a marked population structure can reflect both recolonization from three distinct glacial refugia and current and past oceanographic circulation patterns. Although isolated in present times, shared haplotypes between continental and Azores populations and historical exchange of migrants in both directions point to a common origin of L. bergylta. This situation is likely to be maintained and/or accentuated by current circulation patterns in the north Atlantic, and may lead to incipient speciation in the already distinct Azorean population. Future monitoring of this species is crucial to evaluate how this species is coping with current environmental changes.


Background
The northeastern Atlantic experienced considerable climatic changes during the Pleistocene when glacial cycles acted as a major driver shaping the structure of marine communities [1]. With the glaciation of the North Sea and the loss of the warm temperate regime along the west European continental margins [1][2][3], marine populations were forced to move towards southernmost regions or to survive in northern peri-glacial refugia [4]. During glacial maxima the polar front extended far south and reached the north of the Bay of Biscay [3] and possibly the western Iberian coast [2,5]. Despite the drastic changes in sea surface temperature [6], several glacial refugia have been evidenced, including areas around northern Norway, the Faeroes and Iceland, the Hürd Deep, southwest Ireland and southwest Britain, southwest Continental Europe and the Mediterranean Sea [4]. The rising temperatures and resulting regression of the polar front during interglacial periods would then allow the recolonization of northern regions [7].
These long-term environmental changes led to the emergence of distinct distribution patterns and, at the same time, to similarities between the geographical ranges of species that share similar environmental tolerance ranges [8]. Within this framework, many coastal species are thought to have experienced expansions and contractions of their geographical ranges associated with demographic changes, reflecting their biogeographic origins and environmental tolerances. As such, it could be expected that warm temperate and tropical species may have only survived in the warmer Mediterranean or along the West African coast, while cold tolerant species may have also survived in northern refugia [9]. In addition, there are multiple factors involved in the changing geographical distribution of populations and species, some of them related to other environmental variables, species-specific characters such as life-history traits and stochastic effects.
It should also be expected that distinct species coped differently with the above-referred constraints. Even closely related sympatric species with similar environmental tolerances may present distinct genetic structures and demographic history patterns (for a review see [10][11][12]). This should explain why we can find in the northeast Atlantic fish species showing: (i) panmictic populations without latitudinal differences in genetic diversity (e.g. Lipophrys pholis [13]); (ii) significant population structure but still with similar levels of genetic diversity throughout the entire species range (e.g. Taurulus bubalis [14]); or (iii) sharp decline of genetic diversity from southern (west European) to northern (Scandinavian) populations (e.g. Pomatoschistus microps [15]; Symphodus melops [16]). Similarly, studies have indicated that some species may have persisted in northern regions even during glacial maxima (e.g. Pholis gunnellus [17]), while other only subsisted near the North Sea [14,15] or within the Mediterranean (e.g. Chromis chromis [18]; Sprattus sprattus [19]).
The ballan wrasse, Labrus bergylta Ascanius, 1767, is the largest labrid fish in continental waters of the northeastern Atlantic [20], presently ranging from southern Norway to the west coast of Morocco and the Macaronesian archipelagos [21]. It is a sedentary, obligate protogynous hermaphrodite species whereby dominant territorial males derive from sex-changing mature females, which typically live in harems [22]. Adhesive eggs are laid in rocky substrates and larvae are released to the plankton [23].
It has attracted considerable attention due to three main reasons: (i) it is a target for commercial [22] and recreational fisheries [24], (ii) it is currently used as a cleaner species in northern Europe salmonid fish farms as an alternative to chemical treatments [25,26], and (iii) its phenotypic plasticity and lifehistory variation have raised questions about its taxonomic status and the possibility that one or more cryptic species could exist.
In recent years, several consensual and quite complete phylogenies have been published on both the tribe Labrini (where the genus Labrus is included) [27] and on the family Labridae [28]. At  revised due to the recent split of some taxa and the high intraspecific polymorphism, particularly in the genera Labrus and Symphodus ( [27,29] and references therein). Hanel et al. [27] reported high intraspecific genetic divergence in L. bergylta and its systematic status has raised additional interest because [30] (and references therein) showed differences in life-history traits of two main morphotypes (plain and spotted body colour patterns). Genetic differences between morphotypes were reported in [31] but only with microsatellites in one restricted region. Recently, [32] found no differences between morphotypes with genetic markers widely used in fish phylogeography.
A phylogeographic study of the ballan wrasse based on the control region (CR) of mitochondrial DNA has recently been published in [33]. Focusing in the northern distribution range of the species, around the British Isles and southern Norway, this study showed reduced levels of mitochondrial genetic diversity towards northern latitudes, and the presence of two divergent clades showing evidence of population expansion.
This study extends the previous study in [33] and our potential knowledge of the species phylogeography and genetic structure by: (i) covering the species geographical distribution and (ii) analysing a nuclear fragment (the first intron of the S7 ribosomal protein gene) in addition to the mitochondrial CR fragment. The present genetic assessment will aid conservation and management of the species by providing essential information: (i) on potential effects associated with individual translocations or the exploitation of wild cleaner fish in salmonid fish farms in the North Sea and (ii) for stock assessment studies of local fisheries in the central northeastern Atlantic. To date this is the most comprehensive study describing the genetic structure, the signatures of expansion/contraction events and identifying potential Pleistocene refugia of the ballan wrasse along its distributional range.

Sampling
Specimens of L. bergylta were obtained from 14 locations along its distributional range in the northeastern Atlantic and North Sea ( figure 1 and table 1). These sites included: Corvo, CO and Santa Maria, SM in the Atlantic archipelago of the Azores; Lisbon, LI in western Portugal; Vigo, VI and Ferrol, FE in northern Spain (Galicia); Roscoff, RO in France; Portaferry, PO in Northern Ireland; Mweenish, ME and Bertraghboy Bay, BB in western Ireland; Lochaline, LO and Loch Sunart, LS in western Scotland; Arendal, AR, Hidra, HI and Sogne, SO in southern Norway.
All sampling and handling of fish were performed by experienced personnel in accordance with relevant legislation in each country (see the Ethics section for detailed information). A field campaign to Madeira in 2014 provided no fish samples or even sightings despite this species being listed to the region [34]. In all cases, a small piece of fin tissue was clipped and preserved in 96% ethanol.

DNA extraction, amplification and sequencing
Total genomic DNA was extracted with the REDExtract-N-Amp Kit (Sigma-Aldrich) following the manufacturer's instructions. The mitochondrial CR and the first intron of the nuclear S7 ribosomal protein gene (S7) were amplified using L-pro1 and H-DL1 [35], and S7RPEX1F and S7RPEX2R [36], respectively (see electronic supplementary material, appendix I for details).
Sequences were edited with Codon Code Aligner (Codon Code Corporation) and aligned with Clustal X 2.1 [37]. Whenever possible, both strands of the same specimen were recovered for S7 following the approach of [38]. Sequences obtained were deposited in GenBank (accession nos. KU751889-KU752182). Additional sequences available in GenBank were obtained from [32,33] (electronic supplementary material, table S1 provides the original reference for each individual sequence). The ARLEQUIN software package v. 3.5 [44] was used to estimate genetic diversity within each sample, to evaluate potential population differentiation and to perform neutrality tests. The same software was used to perform analyses of molecular variance (AMOVA) [45], to compute pairwise F ST estimates and corrected pairwise differences between populations. In the case of the S7 gene fragment, the analyses were also run in ARLEQUIN after allowing the program to reconstruct the haplotypes using the ELB algorithm [46]. The number of migrants was calculated for each population pair using the same software. The correlation between geographical distance and F ST was computed with the Mantel test [47,48] (also in ARLEQUIN with 10 000 permutations; geographical distances measured along the shore line).
The spatial analysis of molecular variance (SAMOVA 1.0) [49] was used to identify groups of sampling locations that are geographically and genetically homogeneous and maximally differentiated from each other. The most likely number of groups was identified by running SAMOVA with 2 to 13 groups and choosing the partition scheme with the highest F CT value. The sequences of the locations included in each of the groups that maximized F CT were pooled together. Mismatch analysis [50,51], Fu's Fs [52] and Tajima's D [53] tests were performed in ARLEQUIN to test for possible bottlenecks and population expansion in each group. For the mismatch analyses parameters τ , θ 0 , θ 1 and M and their confidence intervals were obtained by a parametric bootstrap approach using 10     sizes were calculated before (N 0 ) and after (N 1 ) a sudden expansion event, the effective population size prior to a spatial expansion event (N), the time since the expansion (t, in years), and the migration rate (m), using the equations τ = 2μτ and θ = 2Nμ. A Markov chain Monte Carlo (MCMC) approach taking into account phylogenetic relationships among haplotypes as implemented in LAMARC 2.1.9 [54] was used to estimate effective population size (Nef), the exponential growth parameter (g) and the migration rates among adjacent groups of populations, using 10 runs of 12 short chains of 1000 steps and five long chains of 50 000 steps, with a burn-in of 10 000 steps. In order to compute estimates of effective population size, their changes with time and the age of populations, the following mutation rates were used: 5% for CR [55] and 0.23% for S7 [56]. MtDNA CR mutation rates in fish are widely variable (e.g. 2.2-4.5%/MY between lineages for East African cichlids [57]; 15-20%/MY for Indo-Pacific sardines [58]). In the absence of a clock calibration for the CR of L. bergylta, we address the uncertainty by tentatively assuming a within lineage mutation rate of 5%/MY, which is within the range of values found for other fish species Past population demography of L. bergylta was inferred using the linear Bayesian skyline plot (BSP) model [59] as implemented in BEAST v. 1.7 [60], employing the Bayesian MCMC coalescent method and a strict clock. The Bayesian distribution was generated using results from five independent runs of 150 million MCMC steps obtaining effective samples sizes (ESS) of parameter estimates of over 200, with a burn-in of 10%. The time to most recent common ancestor (tMRCA) and the median and corresponding credibility intervals of the BSP were depicted using TRACER v. 1.6 [61].

Results
The CR dataset consisted of a total of 333 bp fragment after alignment (433 sequences) and yielded 160 distinct haplotypes with 93 polymorphic sites (94 transitions, 12 transversions and six indels). For the S7, a fragment of 537 bp was analysed, with 39 haplotypes obtained for the 478 sequences (corresponding  The haplotype with the highest out group probability is displayed as a square, other haplotypes as circles. The area of the circles is proportional to each haplotype frequency. Colours refer to the region in which haplotypes were found. In the case where haplotypes are shared among regions, shading is proportional to the frequency of the haplotype in each region. to 239 individuals). Differences between haplotypes included 12 transitions, six transversions and no indels, in 18 polymorphic sites. The haplotype network for the CR of L. bergylta revealed three distinct groups (figure 2). The first group (sub-network (a) in figure 2) included most sequences from the Atlantic European shores and some Norwegian samples, showing star-like patterns with haplotypes shared between several sampling locations. The inferred ancestral haplotype included 17 individuals from Spain, Ireland and Scotland (outgroup weight 0.096). The second sub-network grouped ((b) in figure 2) together the majority of the Norwegian and the remaining Atlantic European sequences. The ancestral haplotype inferred for this sub-network included 82 fish from Norway, Ireland, Scotland and France (outgroup weight 0.238). Interestingly, the only sample caught in the Canary Islands shared its haplotype with fish from Ireland, Scotland, northern Spain and western Portugal. The third sub-network ((c) in figure 2) includes only fish from the Azores (outgroup weight 0.109). The haplotype network obtained for the S7 of the ballan wrasse is less deep and diverse, and the geographical pattern obtained is not as obvious (figure 3). The inferred ancestral, dominant and in the centre of several star-like patterns, includes 138 fish from Norway, Ireland, Scotland, the English Channel, France, Spain and Portugal (mainland and Azores) (outgroup weight 0.104). Table 1 showed the genetic diversity indices per sampling site. For the CR, haplotype diversity indices were higher for the Atlantic locations, comparing to Norwegian ones. Azorean locations presented a much higher percentage of private haplotypes when compared with all other sites. Concerning the S7, the genetic indices showed homogeneity along sites.
AMOVA analyses revealed genetic structure along the distribution area of L. bergylta for both genetic markers (CR: F ST = 0.462, p < 0.001; S7: F ST = 0.107, p < 0.001). Genetic differentiation and gene flow between sampling locations are shown in table 2 and corrected pairwise differences in electronic supplementary material, table S2. For the CR, significant F ST and corrected pairwise differences were found between locations from Norway, Azores and the rest of European sites, confirming the existence of three populations. Significant differences were also found between FE and some other adjacent locations; however, migration among these locations was consistently higher than the threshold limit of 1. The  The haplotype with the highest out group probability is displayed as a square, other haplotypes as circles. The area of the circles is proportional to each haplotype frequency. Colours refer to the region in which haplotypes were found. In the case where haplotypes are shared among regions, shading is proportional to the frequency of the haplotype in each region.
pattern of differentiation was not as straightforward for the S7. Significant F ST and corrected pairwise differences were found between Azorean and all other sites. These sites revealed no evident pattern, with significant differences being found between some of them. Considering the number of migrants (Nm), isolation was only found between the Azores and the remaining locations. SAMOVA results for the CR dataset yielded a maximized F CT (0.623, p = 0.009) for two groups, the two locations from the Azores (CO and SM) versus the 12 locations belonging to other European shores (AR, HI, SO, LS, LO, PO, BB, ME, RO, FE, VI and LI). The three gene pools structured test (Azores versus Norway versus all remaining sites in Continental Europe) was also significant (F CT = 0.610, p < 0.001). Concerning the S7, the SAMOVA analysis revealed a maximized F CT (0.268, p = 0.016) for two gene pools: Azores versus all remaining sites.
Isolation by distance was found for both the CR and the S7 of the ballan wrasse, as the Mantel test yielded significant correlations between genetic and geographical distances (r = 0.738, p < 0.001 and r = 0.573, p < 0.001, respectively). Taking together the results from the F ST , Nm and SAMOVA, the following groups were considered for further demographic and phylogeographic analyses:     The analyses of the mismatch distributions for the CR of L. bergylta were compatible with the models of sudden and spatial expansion for the Azores and all remaining Atlantic populations, except for the sudden demographic expansion in the case of Norway (table 3). Visual inspection of the mismatch graphics reveals clear differences among the three populations: the Azorean population presents a narrower width in comparison with the other two populations, revealing a strong peak at five differences; the Atlantic population revealed a stronger peak at four differences, while Norway presented a peak at 0 differences (electronic supplementary material, figure S1). The average estimated time for the demographic expansion yielded 154 thousand years (ky) for the Azores and 46 ky for the Atlantic. Considering the average spatial expansion, it was estimated to have occurred in the Late Pleistocene for the Azores (154 kya) and in the Middle Pleistocene for the Atlantic (471 kya) and Norway (519 kya). The S7 dataset, on the other hand, did not conform to neither the demographic nor the spatial model, for both populations.
For the CR dataset of the ballan wrasse, higher female effective population size was found for the population of Azores, and the Norwegian group presented the lowest value for this parameter. The same pattern was found for the growth rates, with the Scandinavian population presenting a negative growth rate. For the S7, the yielded values were similar for the two analysed groups (table 3). BSP runs failed to converge for the CR of the Atlantic group and for the S7 of Continental Europe. Estimates of tMRCA were consistent with estimates from previous analyses.
For the CR of L. bergylta, it was not possible to estimate migration rates between the Azorean group and the rest, as the connectivity assumption of LAMARC was not met. Migration rate was higher from Norway to the Atlantic (20.564) than in the opposite direction (0.014). Considering the S7 dataset, the migration rate was higher from the Azores into the European group (390.521) versus 106.857 in the opposite direction.

Discussion
The results presented in this study highlight four main features concerning the phylogeography of the ballan wrasse. First, the northward decrease in genetic diversity along the species' distributional range. Second, the existence of genetic structure along the sampled area. Third, the distinctiveness of the Azorean population. Finally, the demographic and spatial expansion of most of the species' populations. Taken together, these findings agree with and expand those found in [33] for the North Sea.

Genetic diversity and population structure
In general, the pattern of haplotype diversity found for the ballan wrasse follows that reported for other fishes [15,16], with higher indices in the central north Atlantic and lower in the northern locations (CR dataset). The Azorean locations clearly stood out, with higher percentages of private haplotypes. This fact supports the distinctiveness of the population from this isolated archipelago in the central northern Atlantic. Similar results were described for other fish species like Coryphoblennius galerita [62] and L. pholis [63], calling attention to the genetic distinctiveness and importance of the Azorean fish populations.
Are these differences related to life-history traits such as the pelagic larval duration (PLD)? The correlation between the PLD and F ST estimates has been found to be very strong in recent modelling approaches [64]. However, there is an ongoing debate on this topic, as other studies (e.g. [65]) refute the conventional use of PLD as a good predictor for the magnitude of gene flow. Several studies reported the absence of genetic structure for species with relatively long PLD, such as the rocky intertidal L. pholis [13] or the highly migratory Thunnus thynnus [66]. Although somehow expected, this is not a consensual pattern for species with such a life-history trait. In fact, population structure has also been recorded for a variety of fishes [14,62]. This also seems to be the case of the ballan wrasse, which has a relatively long PLD of 37-49 days [67] but revealed genetic structure along the sampled area, especially when considering the CR dataset. The locations from Norway, Azores and the rest of European sites were considered to be significantly different. Although the results from the slow evolving S7 gene do not show such a clear cut, Azores keeps its singularity, being isolated (with no migration) from the remaining European locations (average Nm of 0.160 between Azorean locations and the remaining ones). Similar findings related to the peripheral position of Azorean haplotypes were also reported for several other species [62,68].
Are these differences related to ocean current patterns in the northeastern Atlantic? The North Atlantic circulation pattern can at least partially explain the isolation of the Azorean population as it prevents gene flow between European and Azorean locations. This isolation is mainly related to the Azores recent Miocene origin (1)(2)(3)(4)(5)(6)(7)(8)), its distance from the closest continental shore (approx. 1300 km from mainland Europe), and the fact that predominant sea surface currents cross the North Atlantic in an eastward direction. Main current patterns may show regional variations that can be detected at a seasonal scale [72], yearly scale from decades to thousand years [73], or may occasionally generate considerable mesoscale variability (e.g. according to [74] strong eddies promote northwestward transport from Africa or Madeira towards the Azores).

Complexity and depth of the genealogies
All the analyses reveal expansion events and population ages clearly older than the Last Glacial Maximum (LGM, 21-18 kya) [75] with lineages coalescing in Late and Middle Pleistocene. Similar findings were reported for other fish in northeastern Atlantic (e.g. [62]). The close relative S. melops, on the other hand, showed a different pattern with exclusive haplotypes for the Scandinavian population and an expansion event much younger than the LGM [16].
Migration rate was higher from Norway to the Atlantic than in the opposite direction (CR), unlike what was found for other species (e.g. [14,16]). Considering the S7 dataset, the migration rate was higher from the Azores into the European group compared with the opposite direction. This pattern is uncommon and needs further evaluation. Nevertheless, this historical migration was previously advanced in [76] in a model of blenniid speciation in the northeastern Atlantic. According to this author, fish from European mainland could occasionally be transported to the Atlantic islands and survive glacial maxima, potentially evolving to become incipient new species. The Azores would thus act as potential speciation centre, exporting them back to European shores by predominant currents (e.g. Parablennius ruber [77]).

Glacial refugia
For L. bergylta, the present results suggest three distinct refugia during glacial maxima: North Sea, western and southern Iberian coasts and Azores. Considering the fact that L. bergylta is able to breed at high latitudes in southern Norway [21], it is very likely that it was able to survive in European shores during the last glaciation. During the LGM, the North Sea surface was almost entirely frozen [2]. However, ballan wrasse populations may have survived in the vicinity of the North Sea and in more southern refugia. In fact, L. bergylta could have used unglaciated parts of these shores as glacial refugia, as previously suggested for other north Atlantic fish species (e.g. [14,16]). Our results agree with this hypothesis, namely the reported deep networks and haplotypes shared between northern and southern populations along the northeastern Atlantic.
Western and southern Iberian shores could have kept their populations and may have functioned as glacial refugia for this species. According to several authors, during the LGM the temperatures could have been kept within the thermal limits tolerated by L. bergylta (e.g. [2,5]). While analysing the post-glacial recolonization events in the marine coastal environment, one must also consider the close dependence between dispersal patterns and the ecology, life-history traits and behaviour of the various organisms [7]. The relatively long PLD reported for L. bergylta is probably related to the low degree of genetic differentiation found in this study along European coasts. A long PLD might also be associated with the species' ability to long-distance dispersal in a single or a few generations. This would explain the existence of shared haplotypes between southern and northern locations for both markers, thus diluting the source effect of their glacial refugia.
Another potential refugium where L. bergylta may have survived Pleistocene glaciations is the Azores. Water temperatures in this Macaronesian archipelago would have been suitable for this species during the LGM, as the drop in sea surface temperature was moderate [78]. The distinct nature of Azorean haplotypes and the absence of gene flow between Azorean and Continental European locations support this hypothesis. The existence of an Azorean glacial refugium has been reported in the literature for several fish species (e.g. Raja clavata [79], C. galerita [62]).

Final remarks on species cohesiveness, phenotype plasticity and implications to management
The ballan wrasse is the target of local fisheries across its entire distribution range and is a resource of paramount importance for local fisheries in northern Spain (e.g. [22,24]). It is currently being used as a cleaner species in northern Europe salmonid fish farms (e.g. [25]). Its absence from the archipelago of Madeira (the species was reported in [34]), if confirmed in the next few years, suggests that L. bergylta may be under important selective pressures. This is particularly relevant in its southern distribution limit. In the future, under a warming climate prospective, this could imply a northward regression of this species along its southern limit. Therefore, a thorough monitoring programme should be conducted both in Madeira (where it seems to be currently absent) and also along the Portuguese continental shore. It is important that the results of this paper are taken into account together with the recent attention given to the wide phenotypic plasticity [80] and the systematic status of the ballan wrasse [22,[27][28][29][30][31][32]. Although this study did not control for morphotypes, and this is still an ongoing subject of research, results obtained in [32] using the same genetic markers point to the conclusion that our results are not affected by it.

Conclusion
On the one hand, this study points towards the common origin of the historical population of L. bergylta with shared haplotypes between the Azores and mainland and exchanges of migrants in both directions (S7 dataset). On the other hand, the mitochondrial results reveal a complete isolation of the Azorean archipelago. The present-day pattern of oceanographic currents in the northeastern Atlantic is likely to maintain and/or accentuate the isolation and distinctiveness of this insular population of the ballan wrasse. This situation can possibly lead to a case of incipient speciation, although endemic fish species are rare in this archipelago (see [8]). This trend for the genetic isolation of the Azores can be occasionally reversed by severe or atypical weather events (e.g. [68]).
The relatively long PLD of the ballan wrasse and the absence of geographical barriers could lead us to infer unconstrained gene flow along European shores. However, the patterns found for this species reveal a population substructure, which can reflect both a recolonization from two distinct glacial refugia and the circulation pattern along the continental coast.
Commercial exploitation, together with the ongoing taxonomic discussion on distinct morphotypes, suggests that special attention should be given to this species in the near future. The genetic structure and phylogeographic pattern of L. bergylta still need further enlightenment and it would be interesting to complement this study with microsatellites and/or genomic tools, comparing specimens from both the plain and the spotted morphotypes along their entire distribution area.