Phylogenomic history of enigmatic pygmy perches: implications for biogeography, taxonomy and conservation

Pygmy perches (Percichthyidae) are a group of poorly dispersing freshwater fishes that have a puzzling biogeographic disjunction across southern Australia. Current understanding of pygmy perch phylogenetic relationships suggests past east–west migrations across a vast expanse of now arid habitat in central southern Australia, a region lacking contemporary rivers. Pygmy perches also represent a threatened group with confusing taxonomy and potentially cryptic species diversity. Here, we present the first study of the evolutionary history of pygmy perches based on genome-wide information. Data from 13 991 ddRAD loci and a concatenated sequence of 1 075 734 bp were generated for all currently described and potentially cryptic species. Phylogenetic relationships, biogeographic history and cryptic diversification were inferred using a framework that combines phylogenomics, species delimitation and estimation of divergence times. The genome-wide phylogeny clarified the biogeographic history of pygmy perches, demonstrating multiple east–west events of divergence within the group across the Australian continent. These results also resolved discordance between nuclear and mitochondrial data from a previous study. In addition, we propose three cryptic species within a southwestern species complex. The finding of potentially new species demonstrates that pygmy perches may be even more susceptible to ecological and demographic threats than previously thought. Our results have substantial implications for improving conservation legislation of pygmy perch lineages, especially in southwestern Western Australia.

CRMA, 0000-0003-1157-570X; LBB, 0000-0003-0944-3003 Pygmy perches (Percichthyidae) are a group of poorly dispersing freshwater fishes that have a puzzling biogeographic disjunction across southern Australia. Current understanding of pygmy perch phylogenetic relationships suggests past east-west migrations across a vast expanse of now arid habitat in central southern Australia, a region lacking contemporary rivers. Pygmy perches also represent a threatened group with confusing taxonomy and potentially cryptic species diversity. Here, we present the first study of the evolutionary history of pygmy perches based on genome-wide information. Data from 13 991 ddRAD loci and a concatenated sequence of 1 075 734 bp were generated for all currently described and potentially cryptic species. Phylogenetic relationships, biogeographic history and cryptic diversification were inferred using a framework that combines phylogenomics, species delimitation and estimation of divergence times. The genome-wide phylogeny clarified the biogeographic history of pygmy perches, demonstrating multiple east-west events of divergence within the group across the Australian continent. These results also resolved discordance between nuclear and mitochondrial data from a previous study. In addition, we propose three cryptic 2018 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

Introduction
The biogeographic histories of species contain information about their past distribution and evolutionary trajectories that can clarify key aspects about community assemblages, biotic exchanges and environmental determinants of biodiversity [1][2][3][4]. In fact, coherent patterns of distribution and evolutionary history are often identified in analyses of multiple codistributed taxa, which points to commonalities in biogeographic history [5][6][7]. On the other hand, biogeographic distributions that cannot be related to ecology or explained by dispersal or vicariance offer enigmatic puzzles that require additional investigation (e.g. [8,9]). Biogeographic knowledge can be further used in applied research, such as assessments of taxonomy and improving conservation legislation [10]. Species delimitation, for example, is a type of phylogenetic analysis that uses computational modelling to determine the number of putative species in a group by using coalescent methods [11,12]. This can lead to revisions of taxonomic uncertainties, a critical aspect of conservation legislation. For example, the identification of independently evolving but cryptic lineages within a single threatened taxon indicates that conservation strategies should manage these lineages separately and avoid outbreeding depression and hybridization among lineages [13].
With the expansion of next-generation sequencing (NGS) technologies, the ability to sequence a representative subset of the genome or even full genomes has become a feasible process in phylogenetics [14][15][16][17]. It allows analyses of thousands or more DNA markers, providing high power for circumventing locus-specific biases and inferring phylogenies. Furthermore, no a priori genomic resources are required Issues with the pygmy perch phylogeny also extend to taxonomy, with some studies suggesting there are cryptic species in Nannoperca. Within N. australis, populations from the eastern portion of its range have been suggested as a separate species informally referred to as N. 'flindersi' that is awaiting formal taxonomic review [27,36]. Within N. vittata, Unmack et al. [27] suggested that a second cryptic species was present. Additionally, N. pygmaea was only recently formally described as a distinct species from its putative sister species N. vittata based on the morphological and allozyme assessment by Morgan et al. [34]. Most pygmy perch species are currently threatened or endangered, thus confirming their taxonomy and identifying cryptic species is imperative for well-informed conservation. Five species (N. obscura, N. oxleyana, N. pygmaea, N. variegata and Nth. balstoni) are listed as threatened in national legislation [33], all species excluding N. vittata are listed in their respective state legislation, and several species have also been listed from vulnerable to endangered in the International Union for Conservation of Nature (IUCN) Red List (IUCN, 2015). Declines in populations of these species have been linked largely to habitat degradation as a result of human-altered hydrological conditions (such as damming and agricultural irrigation) and competition or predation by introduced fish species [27,33,[37][38][39][40]. These pressures are expected to be exacerbated by a drying climate due to contemporary climate change, with large impacts on ecologically specialized species or those with limited distributions, such as pygmy perches [33,38,39,41].
Here, we capitalize on NGS technology to address uncertainties in the evolutionary history of pygmy perches, to assess the taxonomic identity of recently suggested or described species, and to investigate the possibility of further cryptic diversity within pygmy perch lineages. We present phylogenetic trees based on genome-wide data and use a species delimitation framework and divergence estimates encompassing samples from all described species of pygmy perches. Our findings have implications for clarifying the puzzling biogeography of southern Australia and for the taxonomy and conservation management of a threatened group of endemic freshwater fishes.

Sample collection and ddRAD library preparation
Specimens were selected to represent the breadth of currently described pygmy perch species, totalling 45 individual samples across seven formally recognized species and all known potential cryptic forms (table 1). Our samples include populations spanning each species range and representatives of all known lineages, including known evolutionarily significant units (ESUs) within each taxon [27,36,39,40,42] (figure 1). The most closely related extant sister lineage to pygmy perches, the nightfish Bostockia porosa was included as the outgroup [27]. Specimens were collected using electrofishing, dip-, fyke-or seinenetting. Either the caudal fin or the entire specimen was stored at −80°C at the South Australian Museum, or in 99% ethanol at Flinders University.
DNA was extracted from muscle tissue or fin clips using a modified salting-out method [43] or a Qiagen DNeasy kit (Qiagen Inc., Valencia, CA, USA). Genomic DNA was checked for quality using a spectrophotometer (NanoDrop, Thermo Scientific), integrity using 2% agarose gels, and quantity

Sequence filtering and alignment
The resultant reads were filtered and cleaned to create a set of aligned concatenated sequences for statistical analyses. The raw sequences were demultiplexed using the 'process_radtags' module of STACKS 1.29 [45], allowing up to 2 mismatches in the 6 bp barcodes. Barcodes were removed and sequences trimmed to 80 bp to remove low-quality bases from the end of the reads. These cut reads were then aligned using the software package PyRAD 3.0.6 [46], and further cleaned by removing reads that had more than 5 bp with a PHRED score of less than 20. As PyRAD is a de novo assembly pipeline, it is particularly effective for large-scale, divergent sequences which may cause some sequences to 'drop out' due to indels or mutations [46]. To account for the effect of proportions of missing data in the alignment on downstream analyses [20,47], two alignment criteria were used based on minimum coverage: a strict dataset with a minimum coverage of 40 individuals per locus (approx. 89%), and a relaxed dataset with a minimum coverage of 32 individuals per locus (approx. 70%).

Sequence divergence analysis
The genetic distance between concatenated ddRAD sequences from different individuals was calculated using an uncorrected (P) distance matrix generated in PAUP* 4 [48], with mean values across lineages (species and, in the case of N. vittata and N. pygmaea, geographical groups and pairs of lineages). This was done for each filtered dataset (strict and relaxed) to compare the effects of the filtration process on downstream analyses.

Phylogenetic analysis
In order to determine evolutionary relationships among pygmy perch samples, maximum-likelihood (ML) phylogenies were estimated using RAxML for both the strict (4381 loci) and relaxed (13 991 loci) dataset [49]. RAxML remains one of the best options available to estimate a ML phylogeny in genomewide datasets [49]. This was done using rapid hill-climbing and 1000 resampling estimated log-likelihood [50] bootstraps with a GTRGAMMA model, using the online service CIPRES and the supercomputer XSEDE [51]. Invariable sites (+I) were not included in the model as they are intrinsically linked to other factors such as rate categories and are unlikely to be true biologically (as all sites within a sequence are likely to have some, if negligible, mutation rate) [52,53]. This model composition is the recommended choice by the developers of RAxML [49].
To test for consistency of results, Bayesian estimation of the pygmy perch phylogeny was also conducted using PhyloBayes 4.1 [54], using a CAT substitution model and a discrete gamma distribution of site rate heterogeneity model. A Markov chain Monte Carlo chain was run for a total of 1480 cycles consisting of 91 584 tree generations until the log likelihood demonstrated a stable equilibrium. Bayesian analysis was limited to the strict dataset due to computational limitations of the software. The resultant phylogenetic trees of both methods were visualized using MEGA 7 [55], using B. porosa as the outgroup.

Species delimitation
Species were delimited using the software package Bayesian Phylogenetics and Phylogeography (BPP 3.2) [56]. Delimitation was limited to the smaller, strict dataset due to computational constraints. BPP uses a coalescent modelling method to estimate species limits in a Bayesian framework, and estimates multiple species tree hypotheses simultaneously with the delimitation to create a more robust and statistically sound analysis by providing posterior probabilities of both.
The unguided species delimitation method (analysis A11 as described within the manual) was used to simultaneously estimate both the species tree and the delimitation of the designated species, avoiding biases associated with using a fixed guide phylogeny [56,57]. As BPP can only coalesce but not split input species, all three populations of N. vittata were input as separate species on account of their divergent and paraphyletic nature in the maximum-likelihood phylogeny (see Results below), giving a total of 11 input species. Biologically reasonable priors for the gamma distributions of population sizes of lineages (θs) and the divergence time of the root of the phylogeny (τs) were adjusted until convergence of species delimitations were found over two runs for both incorporated algorithms (n = 4) to confidently provide consistent results. Other divergence time parameters were estimated using a Dirichlet prior [58, eqn (2)].
A lower number of replications was initially used for computational efficiency [10 000 burn-in + (100 sample freq * 1000 nsample) = 110 000 generations]. Priors were progressively altered upwards from G(2, 0) for the τs prior and G(30, 1) for the θs prior until convergence of results was found. Convergence of the runs was found using a τs prior of G(10, 10 000), with Dirichlet prior for other divergence times, and a θs prior of G(2, 100), using cleandata = 0 to account for gaps and ambiguous characters in the sequence. Once convergence was found, a final species delimitation and species tree estimation analysis was done using a larger number of generations under each algorithm [2 replicates per algorithm; 10 000 burn-in + (5 sample freq * 98 000 nsample) = 500 000 generations].

Molecular dating
Divergence time estimates were obtained using a maximum clade credibility (MCC) pipeline. The MCC method allows for the summation of thousands of phylogenetic trees into a single, consensus tree by selecting the tree containing the most common clades, rather than building a tree from each most common clade that may never have been generated in the initial analysis [59]. As ambiguous or heterozygous sites can drastically impact divergence time estimates [60,61], haplotypes were generated

NflSRLO1
NflSRLO2 N. australis Nth. balstoni using repeated random haplotype sampling (RRHS), which randomly assigns a particular base to each heterozygous site within an unphased sequence [60]. A total of 3000 sets of RRHS sequences were created for divergence estimates using an edited RRHS Java command line script provided by [60], with each set of haplotype sequences analysed in a ML phylogeny using 1000 bootstraps in RAxML. The 'best tree' output for each RAxML run was unrooted using the R package ape [62] and summarized into a single MCC tree using the consense protocol of ExaBayes [63]. Due to issues with the formation of a polytomy within the root of the phylogeny which prevents divergence time estimation, a B. porosa outgroup sample (BpoHR1) was removed from the tree using the prune command of ape. This had no effect on branch lengths or topology within the rest of the phylogeny. The software r8s 1.81 was then used to estimate divergence times in the MCC phylogeny [64]. R8s estimates absolute rates of divergence times across branches of a given phylogenetic tree based on branch lengths, by using an estimation of relative rate across branches and relating this to a set age calibration of at least one given node [64]. A single calibration point was placed at the split between the eastern clade of N. australis-N. obscura-N. oxleyana and the western clade of N. vittata-N. pygmaea at 14-16 Ma. This date represents the formation of the current day Nullarbor Plain which is associated with the cessation of potential connectivity between the eastern and western freshwater fauna [27]. Divergence times for each node were estimated using a penalized-likelihood model under a truncated Newton algorithm [65], which uses a parametric branch substitution rate model with a nonparametric roughness penalty [64]. A 'smoothing parameter' determines the contribution of the roughness penalty aspect: a cross-validation procedure was used to determine the best value between log 10 0 and log 10 100. The optimum log 10 smoothing parameter of 44.00, with a chi-square error of 139 049.74, was used to estimate divergence times for defined major species divisions (described in figure 2). Confidence intervals for the age estimations were calculated by sub-sampling 100 independent RRHS trees from the 3000 trees dataset. These trees were randomly selected from a pool of 2388 trees that showed matching topology to the MCC phylogenetic tree (determined using the all.equal.phylo function in ape [62]). Each tree was dated using the same methods as above and the distribution of node ages for all major species divisions calculated using the profile function of r8s. Using topologically identical trees with varying branch lengths to estimate confidence intervals is the suggested approach within the r8s manual [64]. Estimates for mutation rates across all branches, as well as the average across the phylogeny, were also calculated using r8s. Including fossil-calibrated reconstructions to estimate divergence time in this study was hampered due to the absence of known suitable fossils from pygmy perches or closely related taxa. Although evolutionarily divergent fossils exist for Centrarchidae-a freshwater family endemic to North America thought to have split from the Australian Percichthyidae over 61 million years ago [66]-applying poorly constrained fossil evidence with questionable placement in the phylogeny is not a recommended option [67,68].

Ancestral area reconstruction analysis
In order to statistically evaluate competing biogeographic hypotheses about multiple dispersal or vicariance events (see Discussion), we estimated ancestral areas using the R package BioGeoBEARS [69]. As BioGeoBEARS requires a time-calibrated ultrametric tree as an input file, the MCC phylogenetic tree was first collapsed down to species using the collapseTree command of the R package phytools [70] and calibrated using the MCC r8s divergence estimates and the chronos command of ape. Species were assigned to one of two biogeographic regions (East or West) with a max range of 2 (i.e. both regions) for ancestral lineages. Ancestral areas were estimated under all six available models (DEC, DIVA-LIKE and BAYAREA-LIKE, as well as their + J counterparts) and compared using the Akaike information criterion (AIC) to assess the fit of each model.

Sequence filtering and alignment
The strict dataset (approx. 11% missing data) of 4381 ddRAD loci produced a concatenated sequence of 334 936 bp, and the relaxed dataset (approx. 30% missing data) of 13 991 ddRAD loci produced a concatenated sequence of 1 075 734 bp (table 2)  Furthermore, individual clades of N. vittata within each superclade showed greater genetic distance to one another (0.36-0.55%) than that found within any other species. These levels of divergence approached the genetic distance between previously suggested species, such as between N. 'flindersi' and N. australis (0.68%). When comparing N. vittata [A] and N. vittata [B], the divergence seen between these (1.26%) exceeds the genetic differentiation between N. 'flindersi' and N. australis. Finally, the eastern species N. oxleyana appeared as the most divergent Nannoperca lineage, showing the greatest pairwise genetic distance to all other pygmy perches (2.50-4.59%). Similar levels of divergence were shown for Nth. balstoni to all other pygmy perches (2.93-4.59%).

Phylogenetic analysis
Visual inspection of the ML phylogenies demonstrated no effect of missing data or loci number, with both datasets showing identical topology for all major clades within pygmy perches and similar branch lengths (electronic supplementary material, figures S1 and S2). Similarly, Bayesian analysis of the strict dataset showed no variation in topology compared to the ML phylogeny of the same dataset and returned very high nodal support (0.95-1 posterior probability for all nodes of tree; electronic supplementary material, figure S1). The MCC tree demonstrated near identical topology to the previously generated trees, albeit with significantly higher bootstrap values and a greater difference in branch lengths (figure 2). This difference is expected due to the pseudo-phasing of data and larger number of bootstraps (3000 individual trees). Topological differences were limited to changes in the position of samples NauMC1, NvaEP2 and BpoHR1 within their respective lineages. Distinctive branch length variations were observed in Darby River N. australis samples (NauDR), within individual N. vittata locality groups (NviAR, NviDC and NviCBP), within Nth. balstoni and the outgroup B. porosa. Bootstrap support varied slightly across datasets, with the strict dataset giving higher support for some nodes (within N. australis, N. 'flindersi' and N. variegata), but lower support for others (within and between    Table 4. Statistical evaluation of biogeographic models implemented in BioGeoBEARS. Extant species were assigned to an eastern or western geographical range with a maximum range of 2 (i.e. ancestors could occupy both areas) for all analyses, with the accuracy of each model assessed and compared using the Akaike information criterion (AIC).  N. obscura and N. oxleyana). This most likely reflects a bias towards N. australis-specific loci in the strict dataset due to the higher number of samples for this taxon. Four of the five species from eastern Australia (N. australis, N. 'flindersi', N. obscura and N. oxleyana) were reciprocally monophyletic. This clade was sister to a western clade comprised of N. vittata (including all putative cryptic forms within this species) and N. pygmaea. The combined eastern and western clade was sister to an eastern species, N. variegata, and these together were sister to a western species, Nth. balstoni. Additionally, the phylogeny included the distinct separation of N. 'flindersi' from its sister taxon N. australis and divergence of N. pygmaea and clades of N. vittata.
Finer scale population-level patterns could also be observed within several taxa, with N. 'flindersi' separating into one Victorian (Snowy River Lagoon, Orbost) and one Tasmanian (Flinders Island + Anson River) clade. Similarly, N. australis separated into three clades: one composed of Murray-Darling Basin individuals, one of fish from the southeast coast and one of a distinct population at Darby River. Shallower phylogeographic patterns were also observed within N. obscura, with geographical clusters of populations forming distinct clades.

Species delimitation
Coalescent modelling of species delimitation resulted in a total of 11 species within the phylogeny (posterior probability = 1 for all runs), with each clade within N. vittata being recognized as an independent species. All previously described or suggested species were similarly recognized as independent of one another, including N. 'flindersi' and N. pygmaea. The species tree output by BPP matched the topology of all ML and Bayesian trees and the MCC tree.

Molecular dating
Divergence time estimates from r8s revealed that pygmy perches are an ancient lineage, with the root of the group estimated at 20 (±0.01) Ma (figure 2). This time of origin is similar to other 'ancient' teleosts from the northern hemisphere [71,72]. Each inferred east-west divergence had a different estimated age within the pygmy perch radiation. Most lineages within the major eastern clade showed comparatively recent divergences, except for N. oxleyana which showed an older divergence time of 12 (±0.01) Ma from N. obscura (node I), reflected by the high genetic distance to all other pygmy perches. More closely related species had relatively young ages such as 3-4 Ma (nodes F and G) within the vittata clade and 6 (±0.02) Ma between N. australis and N. 'flindersi' (node J). All times of divergences were found to be within or close to the ranges proposed by Unmack et al. [27]. All estimations of node ages were highly consistent, as indicated by low standard deviation of all node estimations using 100 RRHS trees (electronic supplementary material, table S1; figure 2). This is expected as individual RRHS trees do not excessively vary in branch lengths and indicates that even pseudo-phasing of the data can produce consistent results.
Estimations of rate variation across sequences had a mean rate of 9.67 × 10 −4 (±4.81 × 10 −7 standard deviation) substitutions per site per million years. Very small differences in rate variation were seen across lineages, as highlighted by the low standard deviation of total rate variation.

Ancestral area reconstruction
Evaluation of the six potential biogeographic models within BioGeoBEARS using AIC suggested that the DIVALIKE+J model was best representative of the data (table 4, figure 3). This model demonstrated low probability of dispersal or extinction events (d and e = 1 × 10 −12 ) but showed an effect of a founder event ( J = 0.0972). Furthermore, nearly all ancestral lineages demonstrated a likely eastern geographical range (excluding the ancestor of the two pygmy perch genera). These results suggest that a founder event (such as a rare, long-distance dispersal event) from the east was a likely driving factor in the colonization of the major western clade containing the N. vittata species complex (including N. pygmaea).

Discussion
We performed a genome-wide study using phylogenomic and species delimitation methods to clarify evolutionary and biogeographic history and to assess cryptic diversification in all known lineages of pygmy perches. We confirmed the biological relevance of previously inferred phylogenies and recently suggested species. Additionally, with this much larger dataset, we provided greater phylogenetic resolution and discovered extra cryptic species. Our findings reveal complex biogeographic patterns in southern Australia and point to the importance of in-depth taxonomic analyses for appropriate conservation management of threatened biodiversity.

Phylogenomics of pygmy perches
Our phylogenetic trees corroborate the combined tree presented by Unmack et al. [27] and went further by providing substantially improved phylogenetic resolution. There was a major improvement in bootstrap support for all nodes in the genomic phylogeny, highlighting the ability of ddRAD to avoid locus-specific biases such as mitochondrial introgression, which may have obscured some findings of the previous study. These phylogenetic trees enabled the resolution of conflicting nodes across different markers from Unmack et al. [27] figure 2). The level of divergence between these N. vittata lineages is similar to that between N. vittata and N. pygmaea, possibly indicating cryptic species within N. vittata (see below). This study also enabled a much greater resolution of intraspecific population structure within N. australis and N. 'flindersi' than suggested by prior studies (e.g. [36]). A species delimitation framework, in combination with other qualitative genetic analyses, supported the proposed N. pygmaea, which was previously only based on morphological assessments and limited allozyme data [27,73]. We also revealed two new cryptic lineages within N. vittata. It is recommended that a more thorough analysis of N. vittata should be conducted in accordance with an integrative taxonomic approach [74][75][76] before taxonomic changes are implemented and used in conservation management [77,78]. Assessing all known populations of N. vittata, with more comprehensive sampling than achieved here, may reveal the full geographical range and ecological nature of the delimited species. Additional research focused on morphological and ecological divergences is important for establishing species boundaries which would help resolve their taxonomy. Clearly, further taxonomic studies of pygmy perches are needed to better inform their conservation management, particularly for those lineages without formal description and legislative protection such as N. 'flindersi'. An understanding of species limits and local adaptation would provide a framework for targeted conservation actions in pygmy perches, such as genetic rescue [79].
While species delimitation provides a statistical framework for testing hypotheses of species identity, there are limitations in the outcomes. Simulations have suggested that BPP has a tendency to exaggerate the number of species within a phylogeny, providing high support for divergent lineages which are not biologically true species [80]. Despite this criticism, a solely genetic basis for species description has rarely (if ever) been used [81]. Taxonomic changes require a complement of various forms of analyses, including morphological, ecological and behavioural data [12,76] to infer species identity and reproductive isolation. Thus, while we do not present our delimitations as fully putative species, we conservatively suggest that those identified may possibly reflect truly cryptic species.

Biogeographic interpretation
Our study provides support for multiple east-west movement events across southern Australia involving both older (i.e. Nth. balstoni and N. variegata), as well as younger pygmy perch lineages. The ancestral area reconstruction analysis did not support a simple model of vicariant divergence of eastern and western pygmy perches as the result of the Nullarbor Plain. Instead, the most supported model suggested a founder event from an eastern ancestor into the west (the predecessor of the N. vittata species group). Given that the climatic and geological changes associated with the formation of the Nullarbor Plain were likely gradual, it is possible that a peripheral population of the eastern ancestor became isolated during the Miocene. Alterations to hydrology and geology, such as river capture, may have disconnected this peripheral population from the rest of the species range, causing it to divert westward and found the western pygmy perch group. This is particularly relevant for freshwater species which inhabit dendritic systems as geological and climatic changes can significantly alter hydrological connections, thereby having significant impacts on phylogeographic structure and interpretation [82]. Thus, we suggest that range expansion, followed by subsequent isolation due to a vicariant barrier, is likely the major mechanism driving this geographical separation. Nonetheless, we assume that the formation of the Nullarbor Plain was still pivotal to the complete separation between younger western (N. vittata) from younger eastern (N. australis, N. obscura and N. oxleyana) lineages. This scenario is similar to that proposed by Unmack et al. [27], who suggested that the lack of a singular east-west split of pygmy perches was indicative of multiple migrations across central southern Australia prior to the formation of the Nullarbor Plain (deemed the 'Multiple Invasion Hypothesis'; figure 4). While projections of historical climate suggest that the hydrology of southern Australia may have been suitable for the migration, little support for multiple migrations has been found in other aquatic taxa [2,29,32]. Of these, many are more dispersive than pygmy perches, suggesting they should likewise have been able to migrate across southern Australia multiple times. While this may seem counterintuitive, little is known about the historic ecology of aquatic biota to propose a mechanism or reason for this disparity. The historically widespread distributions of N. vittata and N. australis have been suggested to predispose them to being tolerant to a range of habitats [27]: thus, it is hypothetically possible that pygmy perches were able to tolerate intermediate habitats between the east and west. Additionally, historical metapopulation dynamics suggested for pygmy perches may have allowed them to respond to temporally unfavourable habitats [40].  Nodes representing migration events are denoted within the phylogeny by asterisks, with the numbers corresponding to a particular migration on the map. Divergence [3] represents the last possible migration event before the arrival of the Nullarbor Plain as a barrier to dispersal.
However, the Multiple Invasion Hypothesis for pygmy perches was later criticized, with Ladiges et al. [83] instead proposing that geographical paralogy may explain the lack of a singular east-west split. Geographical paralogy is when two (or more) independent but closely related lineages experience a biogeographic split at the same time ( figure 5). In the case of pygmy perches, the geographical split of one lineage containing the ancestor of Nth. balstoni and N. variegata, and one lineage containing the ancestor of N. australis, N. obscura, N. oxleyana and N. vittata, would produce a similar phylogenetic pattern to that shown here. These independent divergence events could have been vicariantly caused by the formation of the Nullarbor Plain, or could reflect independent splitting events (both in mechanism and timing). There are unfortunately no known fossil records of pygmy perches across southern Australia to demonstrate the necessary ubiquity of pygmy perches prior to these splits [84].
The Multiple Invasion Hypothesis is instead more strongly supported than geographical paralogy by the current study. This is based on the lack of a sister relationship between N. variegata and Nth. balstoni required for geographical paralogy (electronic supplementary material, figures S1 and S2; figures 2 and 5). This lack of sister group relationships would likely only be incorrect if there are unaccountable artefacts from extinction of lineages or difficulties in detecting relationships in anciently diverged lineages. Additionally, the ancestral area reconstruction modelling did not suggest a simple vicariant separation of eastern and western pygmy perches from a widely distributed ancestor, but instead a founder event (e.g. long-distance dispersal) from the east to the west ( figure 3). While there are significant limitations and assumptions with historical biogeographic analysis, alternative models that included many widespread ancestors were not as well supported by the data (

Biodiversity hotspot in southwestern Australia
The highly divergent nature of populations in endemic species (N. vittata spp. and Nth. balstoni) of southwestern Western Australia (figure 2) indicates that this region is a significant driver of evolution and speciation. This is consistent with other studies that showed high levels of endemism and species diversity within the region, leading to its internationally and nationally recognized status as a biodiversity hotspot [28,29]. Our findings reinforce the high conservation value of the region.
The high levels of in situ speciation in southwestern Australia have often been linked to historical climate changes [29,30]. The drivers of population diversification within pygmy perches probably relate to allopatric isolation as a result of fragmentation of wetter regions, with all species of western pygmy perches limited to areas of higher annual rainfall (greater than 600 mm) [29]. Allopatric speciation has been noted throughout southwest Western Australia for a range of taxonomic groups with limited dispersal capabilities, including many invertebrate groups (e.g. spiders, isopods and crayfishes) and some frog species [29,85,86].
It is likely that a combination of long-term isolation associated with formation of the Nullarbor Plain, as well as in situ speciation within southwest Western Australia, are major factors accounting for the diversification and evolution of pygmy perches. Further assessments focusing on biogeographic history are required to elucidate the complexity of the pygmy perch phylogeny. In this regard, species distribution modelling (SDM) appears as a robust approach to addressing competing biogeographic theories when combined with statistical phylogeographic methods [87,88]. For pygmy perches, SDM could be applied to southwestern Australia to determine what role environmental factors played in driving the divergence of N. vittata lineages.

Conservation concerns
The identification of cryptic species has profound implications for their conservation management, particularly for legislation. While N. vittata is currently unlisted within the IUCN Red List, our results suggest that N. vittata is potentially three independent species, each with a small number of narrow endemic populations that are at risk of losing genetic diversity and local extirpation. Thus, a thorough assessment of N. vittata throughout their currently recognized range and associated species descriptions are required to fully elucidate the taxonomic intricacies within this lineage and to incorporate these into conservation management practices.
Our study also identified intraspecific lineages that are not yet divergent enough to be considered as different species (as opposed to the divergent lineages within N. vittata). Some of these lineages correlated with previously identified ESUs [89], such as in N. obscura [42]. Additional intraspecific genetic structure in N. 'flindersi' and N. obscura suggests that conservation managers should recognize different conservation units. This includes three geographically isolated lineages of N. 'flindersi'. Given the isolation of these lineages by unpassable saltwater barriers (Bass Strait) and environmental differences between the regions [90,91], they may be adaptively divergent and are perhaps precursors of incipient species. A thorough assessment of population structure using genome wide data in each region is required for improving and defining appropriate units for management within species.
Both the taxonomic issues demonstrated by this study, as well as the low genetic diversity and dispersal capabilities of many pygmy perches [33,37,39,40,42], raise concerns for their conservation management under future climate change. Their low diversity potentially reduces their capacity to adapt to changing environments, and their low dispersal capabilities inhibit them from easily moving to more favourable environments. In addition, low genetic variation and human-induced adaptive divergence in habitat fragments threaten the species in the Murray-Darling Basin [39] (see also [40]). Genetic-based captive breeding and restoration efforts have already been required for N. australis and N. obscura in the lower Murray River [37,92], and local extirpations in various parts of the Murray-Darling Basin are already recorded [38,40]. Our novel phylogeographic findings are expected to inform future geneticbased breeding programmes in pygmy perches, particularly for N. vittata spp., as such programmes should take into account the evolutionary history and historical demography of threatened lineages to maximize their success [37].
We used genomics to identify and characterize biodiversity patterns across southern Australia and within the pygmy perches. Specifically, we aimed to improve our understanding of southern Australian biogeography and the taxonomy of pygmy perches, with associated implications for conservation. Using a powerful dataset, we have resolved previously uncertain phylogenetic relationships in pygmy perches, identified and confirmed additional cryptic species within N. vittata through a species delimitation framework, and estimated divergence times for the pygmy perch phylogeny using a ddRAD molecular clock. These findings have strengthened our understanding of biological concepts such as southern Australian biogeography, the need for further taxonomic research into pygmy perches and important conservation suggestions. The indication of high levels of cryptic speciation within southwest Western Australia further corroborates its high conservation importance and biodiversity hotspot identity, while the identification of independent species suggests modification for current conservation statuses and management.

Ethics. Collections were obtained under permits from various state fisheries agencies and research was under Flinders
University Animal Welfare Committee approvals E313 and 396.