Population structure and connectivity of tiger sharks (Galeocerdo cuvier) across the Indo-Pacific Ocean basin

Population genetic structure using nine polymorphic nuclear microsatellite loci was assessed for the tiger shark (Galeocerdo cuvier) at seven locations across the Indo-Pacific, and one location in the southern Atlantic. Genetic analyses revealed considerable genetic structuring (FST > 0.14, p < 0.001) between all Indo-Pacific locations and Brazil. By contrast, no significant genetic differences were observed between locations from within the Pacific or Indian Oceans, identifying an apparent large, single Indo-Pacific population. A lack of differentiation between tiger sharks sampled in Hawaii and other Indo-Pacific locations identified herein is in contrast to an earlier global tiger shark nDNA study. The results of our power analysis provide evidence to suggest that the larger sample sizes used here negated any weak population subdivision observed previously. These results further highlight the need for cross-jurisdictional efforts to manage the sustainable exploitation of large migratory sharks like G. cuvier.


Introduction
Migratory species are described as entire populations (or any geographically separate part of a population), whose members cyclically and predictably cross one or more national jurisdictional boundaries [1]. In the marine environment, defining sharks as migratory species requires considerable knowledge of individual movement patterns, as well as an understanding of   developed for the species (see [26]). Sixty-nine juvenile (birth-200 cm TL), 69 sub-adult (200-300 cm TL), 96 adult (300 cm TL+), and 121 (size not recorded) G. cuvier (159 female, 107 male and 89 sex unknown) totalling 355 individuals were genotyped. Juvenile, sub-adult and adult sharks were sampled from QLD, NSW, NT and NCL, while predominantly adult sharks were sampled in WA and HAW (figure 2). Apart from during transit to Australia, all samples were stored at 4°C or below until laboratory processing. DNA extraction was performed using either a QIAGEN DNeasy blood and tissue extraction kit following the manufacturer's protocols (QIAGEN Inc., Valencia, CA, USA), or a salting out method [28]. Loci were optimized into polymerase chain reaction (PCR) multiplexes using a fluorescently labelled M13 primer. For each locus, one primer stock consisting of forward and reverse primer pairs, and the corresponding M13 primer (Fam, Vic, Ned or Pet) was used. Primer stocks consisted of 3 µl of 100 µM forward primer, 30 µl of 100 µM reverse primer and 30 µl of 100 µM of M13-fluro labelled primer. To achieve optimal amplification across all loci, primer stock proportions were experimentally determined (see electronic supplementary material, table S1). Multiplex 1 consisted of loci tgr_1157, tgr_212, tgr_47, tgr_233 and tgr_348; and multiplex 2 consisted of tgr_1033, tgr_1185, tgr_891 and tgr_943. The samples were amplified using a 14 µl PCR mixture containing 10 ng of genomic DNA, 8 µl of master mix (1x Kapa Buffer A, 1.5% DMSO, 0.18 mM dNTP, 0.25 M Betaine, 0.8 units/reaction Taq), and 4 µl of primer mix (3 µl forward, 30 µl reverse, 30 µl M13-fluoro, 87 µl H 2 O). Loci were amplified in 2 multiplexes using the following protocols: Multiplex 1: 94°C for 2 min, followed by 12 cycles of 94°C for 15 s, 56°C for 30 s and 72°C for 45 s, with the annealing temperature reduced by 0.5°C each cycle (touchdown cycle). The reaction was then exposed to 23 cycles of 94°C for 15 s, 50°C for 30 s and 72°C for 45 s. Multiplex 2: 94°C for 2 min, followed by 10 cycles of 94°C for 15 s, 60°C for 30 s and 72°C for 45 s, reduced by 0.5°C each cycle. The reaction was then exposed to 23 cycles of 94°C for 15 s, 55°C for 30 s and 72°C for 45 s. The reactions for multiplex 1 and 2 completed at 72°C for 7 min before being held at 4°C until required for further analysis. Amplicons were diluted 60-fold, before being further diluted (1 in 5) in formamide containing LIZ-500 size standard (Applied Biosystems, Foster City, CA, USA) and then gel separated by capillary electrophoresis (Applied Biosystems 3130xl) following the manufacturer's recommendation.

Statistics
Alleles were sized against an internal size standard (GeneScan-500 LIZ) before being scored using GeneMapper v. 5. Genotypes were checked for scoring errors using Micro-Checker v. 2.2.3 [29]. We tested for Hardy-Weinberg equilibrium (HWE) using the Markov chain Monte Carlo (MCMC) method in GENEPOP v. 4.1.3 [30], with 100 000 steps, 100 batches and 10 000 subsequent iterations. We also tested for linkage disequilibrium (LD) among pairs of loci using an exact test based on a Markov chain method as implemented in GENEPOP, in both cases using Bonferroni to correct for multiple tests (p < 0.05) [31]. If significant departures from HWE or LD were found at one or more locus by population comparisons, then they were excluded from further analysis. To ensure that duplicate samples were not inadvertently included a Queller and Goodnight [32] relatedness test was performed in GenAlEx v. 6.5. Individuals that were identical across the nine microsatellite loci were treated as duplicates and excluded from further analysis.
Population pairwise standard diversity indices were estimated using GenAlEx v. 6.5, which included number of alleles (N a ), number of effective alleles (N e ), expected (H e ) and observed heterozygosities (H o ), and inbreeding coefficient (F is ) [33]. To investigate the degree of subdivision between locations, pairwise F ST values was estimated using Arlequin v. 3.11 [34] with the level of significance calculated by pseudo-replication (10,100) of individuals between locations; and Jost's Dest (hereafter Dest) was determined using GenAlEx v. 6.5 (999 iterations; [33,35]). Genetic populations were defined by locations (or groups of locations) that did not have significantly different F ST values from one another, while showing significant variation to other sampled populations. An analysis of molecular variance (AMOVA) was also undertaken using a hierarchical approach in Arlequin v. 3.11, where population clusters were validated to ensure that the maximum amount of variance among groups of samples (regions) was maintained. Comparisons were made between the genotypes reported from Hawaii in Bernard et al. [9] and those herein to evaluate the presence of a separate Hawaiian population. To ensure the allele frequency bins were uniform between the studies at each locus, a comparison of dominant alleles was undertaken, and homology was determined for alleles having frequencies above 10% using allele sizes and per study frequencies. Two loci (tgr_348 and tgr_891) were unable to be reliably compared between the studies and were therefore excluded from the pairwise comparison analyses.
Bayesian cluster analysis was performed in STRUCTURE [36] on the microsatellite allele frequencies. STRUCTURE was run 10 independent times on the microsatellite dataset for the potential number of groups (K) ranging from K = 1 to 8. The run assumed an admixture ancestor model with a burn-in length of 100 000 MCMC steps, followed by simulation set at 1 000 000 repetitions. To estimate the number of groups (K) an ad hoc approach was taken by obtaining the mean posterior probability of the data K using STRUCTURE Harvester [37]. To visualize clustering across runs of K, structure outputs were submitted to the CLUMPAK pipeline [38].
To identify the theoretical statistical power required to detect an F ST differentiation among locations, pilot genotyping data from 96 randomly selected sharks was used in POWSIM [27]. Allele frequencies from the pilot data were used to simulate populations (differing by a range of user-defined overall F ST values) to match populations sampled here from the three ocean basins (Central-West Pacific ( Indo-East Indian (IEI), and the Atlantic). Three user-defined levels of divergence were simulated among populations; F ST = 0.01, 0.005 and 0.0025. The following parameters were used to simulate the set of F ST values: effective population sizes (N e ) = 1000; number of simulations = 1000; and generations of drift (t) = 20 (F ST = 0.01), 10 (F ST = 0.005) and 5 (F ST = 0.0025). To test for the effect of sample size on the ability to detect significant population differentiation (α = 0.05) at a given F ST value, tests were undertaken with sample sizes of 25, 50 and 100 per population. The degree of significant differentiation between populations for each replicate run at a set F ST value was tested using Chi-squared and Fisher's probabilities.

Power simulation
When the power simulation was run with sample sizes of 100 across all nine microsatellite loci, there was sufficient power to delineate population subdivision in greater than 98% of simulations across the three levels of genetic differentiation tested (F ST = 0.01, 0.005, 0.0025). When halving the sample size from 100 to 50, the ability to detect differences at the lowest F ST (0.0025) decreased by 35%. Minor reductions in the ability to detect population structure (100% to more than 97%) were also found at F ST = 0.005 and 0.01 when simulated sample sizes were reduced to 50. When sample sizes were further decreased to 25, an F ST of 0.01 could be detected in over 95% of simulations; however, detections of smaller F ST (from 0.005 and 0.0025; 58% and 22%, respectively) were considered unreliable. To summarize, the ability to detect differences among populations with low levels of genetic structure was found to decrease considerably with reductions in sample size.

Microsatellite diversity
Screening of the samples using a Queller and Goodnight [32] relatedness test detected 17 duplicates in the NCL samples, which were subsequently removed from further analysis. Across all samples, the nine loci had an average of 9.8 alleles (range 2-26) and unbiased heterozygosity of 0.70 (range 0.18-0.94). The most polymorphic locus was Tgr_891 (mean = 11.3 alleles), followed by Tgr_348 (9.7) and Tgr_943 (8.0), while Tgr_212 was the least polymorphic (1.4 alleles) (table 1). Screening of genotypes detected no linkage disequilibrium for any population, and genotype proportions at all loci did not deviate from Hardy-Weinberg expectations following Bonferroni correction. Pairwise F ST and Dest comparisons between locations indicated considerable genetic structuring between all Indo-Pacific locations and Brazil (F ST values > 0.14, p < 0.001; Dest values > 0.29, p < 0.001). In contrast, no significant genetic differences were observed between locations from within the Pacific or Indian Oceans (F ST range 0.000-0.004; Dest range 0.000-0.007) (table 2). As a result, collection locations within these regions were pooled for subsequent analysis by ocean basin. The ocean basin pools were the Central-West Pacific (CWP; consisting of NSW, QLD, COR, NCL, HAW), the Indo-East Indian (IEI; consisting of NT and WA) and the Atlantic Ocean (BRA). A comparison of F ST and Dest values among these three ocean basins was unable to identify any significant difference between CWP and IEI, which meant the null hypothesis of genetic homogeneity was unable to be rejected (table 3). Lack of genetic differentiation among samples from CWP and IEI was further supported by AMOVA analysis. Differences among groupings were maximized and variation within groupings was minimized when the Indo-Pacific samples (i.e. CWP and IEI) were considered as a single population (among groupings = 14.72%, within = 82.5%), compared to two separate populations (among groupings = 1.65%, within = 95.17%). Differentiation between the samples from Brazil (Atlantic) and all other locations were confirmed, with significant differences associated with grouped samples from the CWP (F ST = 0.151, p < 0.001; Dest = 0.321, p = 0.001) and IEI (F ST = 0.145, p < 0.001; Dest = 0.317, p = 0.001).
Pairwise comparisons were also made between the Hawaiian genotypes from this study, and the Central Pacific (CP) genotypes reported in Bernard et al. [9]. No significant differences were observed between Hawaiian samples from both datasets. The similarity between datasets was reflected in the allele frequency distributions between datasets (see electronic supplementary material).
Bayesian STRUCTURE analysis of microsatellite genotypes supported the groupings identified by F ST . The K method in STRUCTURE Harvester identified the K = 2 as the most likely number of populations. Visual clustering of individuals using CLUMPAK revealed that all samples from the Indian and Pacific Ocean were assigned to the same group, while the second group was made up of samples from Brazil. To investigate the Bayesian structuring of individuals within the Indo-Pacific, samples from Table 1. Microsatellite marker diversity described by number of alleles (N a ), number of effective alleles (N e ), observed (H o ) and expected heterozygosity (H e ), and inbreeding coefficient (F is ). n = number of individuals. Note that n in combined columns differs to total as not all loci could be genotyped per individual. Tgr_1033 Tgr_1157 Tgr_1185 Tgr_212 Tgr_233 Tgr_348 Tgr_47 Tgr_891 Tgr_943   QLD  N a  4  10  6  3  24  18  5  19             Brazil were removed before the remaining samples were re-analysed. When subsequent STRUCTURE analysis including only the Indo-Pacific samples was run, the K suggested only one discernible grouping was present within the data.

Discussion
Analysis of genetic homogeneity with microsatellites across all locations in this study revealed two genetically independent populations; an Indo-Pacific group composed of all samples collected in the IEI and the CWP, and the Brazilian sample from the western Atlantic. The absence of genetic differentiation between the east and west Australian coasts is consistent with previously reported movement data, with evidence of migrating sharks from both coasts moving into adjoining Northern Territorial waters (see [18,19]). Evidence of tiger shark movement through the Torres Strait land bridge [18] suggests that the relatively shallow waters (less than 15 m deep between Cape York, QLD and Papua New Guinea) do not act as a barrier to movements, and possibly gene flow, as it does for other fish species [39]. A high level of genetic connectivity throughout the Indo-Pacific region was also reported by Bernard et al. [9] in their global study, with little differentiation observed between sharks sampled from the east and west Australian coasts, the Andaman Sea (southeast Asia) and as far west as the eastern seaboard of South Africa. Our investigation suggests genetic connectivity among tiger sharks from the Indo-Pacific extends into the Pacific Ocean as far as Hawaii (central Pacific). This finding was in contrast to that identified in the global study by Bernard et al. [9], which reported significant differences between Hawaii and all other global sites surveyed, including those from the Indo-Pacific. The STRUCTURE analysis by Bernard et al. [9] may explain the differences observed between the two studies. Despite K suggesting K = 2 throughout all global samples, when K = 3 was assumed by Bernard et al. [9], a few individuals from the southwest Pacific (eastern Australia) assigned to a third cluster, which was subsequently reported as evidence of a genetically distinct Hawaiian population. Although Bernard et al. [9] demonstrated a weak signal for population subdivision comparing samples from Eastern Australia (n = 21) and Hawaii (n = 65) (F ST = 0.01, Dest = 0.02), the low statistical power due to the small sample size from Eastern Australia reduces the likelihood that the statistically significant finding actually reflects a true effect [40]. Moreover, the power analysis completed herein demonstrates that the ability to detect differences among tiger shark populations with low levels of genetic structure was found to decrease considerably with reductions in sample size less than 25. Ecological studies undertaking satellite tracking across the Hawaiian archipelago have shown that tiger sharks in this region do exhibit regionally localized movements (e.g. [21,23,41]). Notwithstanding, there is a confirmed record of a conventionally tagged tiger shark from Hawaii being captured in Mexico, approximately 5000 km east of its tagging location (C. Meyer 2016, personal communication). This recapture indicates that tiger sharks have the ability to cross the large oceanic expanses, with the possibility of promoting gene flow across the broader Pacific Ocean. Acquiring and analysing additional tissue samples from the eastern Pacific region, along with further research using alternative genetic markers such as mtDNA and single nucleotide polymorphisms (SNP) in nuclear DNA is needed to confirm the single population. The level of heterozygosity of microsatellite loci reported herein suggests that variation in SNP loci should also be high. The development of SNPs for G. cuvier is likely to increase the power to detect genetic variation, even if sample sizes are small [42].
Continued advances in tagging technology that allow for multi-year data collection from mature sharks, and further developments in molecular analyses, is likely to improve our understanding of how partial migration, sex-biased dispersal and reproductive mixing influences population structure and connectivity of tiger sharks within and among global ocean basins. Given the evidence of exploitation-driven declines in G. cuvier across the world [14,15,43,44], continued assessments are vital to inform population-level management and conservation efforts across the entire distribution of this oceanodromous species.
Ethics. All procedures were approved by the University of Queensland Animal Ethics Committee (CMS/300/08/DPI/