Horizontal transfer and subsequent explosive expansion of a DNA transposon in sea kraits (Laticauda)

Transposable elements (TEs) are self-replicating genetic sequences and are often described as important ‘drivers of evolution’. This driving force is because TEs promote genomic novelty by enabling rearrangement, and through exaptation as coding and regulatory elements. However, most TE insertions potentially lead to neutral or harmful outcomes, therefore host genomes have evolved machinery to suppress TE expansion. Through horizontal transposon transfer (HTT) TEs can colonize new genomes, and since new hosts may not be able to regulate subsequent replication, these TEs may proliferate rapidly. Here, we describe HTT of the Harbinger-Snek DNA transposon into sea kraits (Laticauda), and its subsequent explosive expansion within Laticauda genomes. This HTT occurred following the divergence of Laticauda from terrestrial Australian elapids approximately 15–25 Mya. This has resulted in numerous insertions into introns and regulatory regions, with some insertions into exons which appear to have altered UTRs or added sequence to coding exons. Harbinger-Snek has rapidly expanded to make up 8–12% of Laticauda spp. genomes; this is the fastest known expansion of TEs in amniotes following HTT. Genomic changes caused by this rapid expansion may have contributed to adaptation to the amphibious-marine habitat.


Introduction
Transposable elements (TEs) are self-replicating genetic elements that mobilize themselves across genomes. A substantial proportion of eukaryotic genomes is composed of TEs, with most reptilian and mammalian genomes comprising between 30 and 60% [1]. As TEs proliferate within a genome, most insertions will be either neutral or deleterious [2]. Over evolutionary timescales, the movement of TEs can enable major adaptive change; being exapted as coding and regulatory sequences, and by promoting both inter-and intra-chromosomal rearrangements such as segmental duplications, inversions and deletions through non-allelic homologous recombination (NAHR) [3,4]. Due to the deleterious effect of TE expansion, eukaryotes have evolved various defence and regulatory mechanisms [5][6][7].
In addition to being vertically inherited, TEs can also invade a new host through horizontal transposon transfer (HTT). While the exact mechanisms of HTT are unknown, many instances across eukaryotes have been reported [8][9][10][11][12]. It is expected that following HTT the expansion of new TEs is slowed or halted due to the potentially deleterious effects they can cause [2,13], and any continued expansion will likely be slow. For example, following ancient HTT events the BovB retrotransposon has taken 32-39 My and 79-94 My to colonize between 6 and 18% of ruminant and Afrotheria genomes, respectively [9,14,15]. However, the rapid expansion of TEs following HTT has previously been noted in Myotis bats, where hAT transposons expanded to cover 3.3% of the genome over the space of 15 My [16][17][18].
Here, we report the HTT of a Harbinger DNA transposon, Harbinger-Snek, into Laticauda, a genus of marine snakes which diverged from terrestrial Australian snakes 15-25 Mya [19][20][21]. Since diverging from terrestrial snakes Laticauda transitioned to amphibious-marine habitats, foraging on coral reefs and returning to land only to digest prey, mate and lay eggs [22]. Surprisingly, no available strictly terrestrial animal genomes contained any trace of Harbinger-Snek, with the most similar sequences identified in sea urchins. Due to the absence of Harbinger-Snek-like sequences from terrestrial species and highly similar sequences present in marine species, we propose Harbinger-Snek was horizontally transferred to Laticauda from a marine donor genome during habitat transition. Furthermore, since this initial HTT event, Harbinger-Snek has expanded rapidly within the genomes of Laticauda and now accounts for 8% of the L. laticaudata assembly and 12% of the L. colubrina assembly.

(b) Identification of horizontal transfer and potential source/vectors
To identify any TEs restricted to a single lineage of elapid, we searched for all TEs identified by RepeatModeler2 using BLASTN (-task dc-megablast) [28] in the three other assemblies, as well assemblies of the Asian elapids Naja naja [29] and Ophiophagus hannah [30]. TEs present in high numbers in a species, but not present in the other elapids, were considered HTT candidates. This yielded a high copy number of Harbinger elements in L. colubrina. To rule out contamination, we searched for this element in a L. laticaudata genome assembly [25]. Using RPSBLAST [31] and the Pfam database [32], we identified Harbinger copies with intact protein-coding domains. To identify potential source or vector species, we searched all metazoan RefSeq genomes with a contig N50 of at least 10 kbp with BLASTN (-penalty -5 -reward 4 -out -word_size 11 -gapopen 12 -gapextend 8) (electronic supplementary material, table S1). In species containing similar elements, we created consensus sequences using the aforementioned BLAST, extend, align and trim method. As we had identified similar Harbinger elements in fish, bivalves and echinoderms from RefSeq, we repeated this process for all GenBank assemblies of other species from these clades with a contig N50 of at least 10 kbp.
We identified transposase domains present in curated Harbinger sequences and all autonomous Harbinger elements available from Repbase [33] using RPSBLAST [31] and the Pfam database [32]. Using MAFFT (-localpair) [34], we created a protein multiple sequence alignment (MSA) of identified transposase domains. After trimming the MSA with Gblocks [35], we constructed a phylogenetic tree using FastTree [36] and from this tree chose an appropriate outgroup to use with curated elements. We subsequently constructed a protein MSA of the curated transposases using MAFFT, trimmed the MSA with Gblocks and created a phylogeny using IQ-TREE 2 (-m MFP -B 1000), which selected TVMe + I + G4 as the best model [37][38][39]. For comparison, we also created phylogenies using the same MSA with MrBayes and RAxML [40,41]. To compare the repeat and species phylogenies, we created a species tree of major sampled animal taxa using TimeTree [42].

(c) Potential interaction of Harbinger-Snek with genes
Using the improved RepeatModeler2 libraries and the Repbase (-lepidosaur) library, we used RepeatMasker [43] to annotate the two species of Laticauda. Using Liftoff [44], we transferred the No. scutatus gene annotation from RefSeq [45] to the L. colubrina and L. laticaudata genome assemblies. To identify Harbingers in genes, exons and regulatory regions we intersected the RepeatMasker intervals and transferred gene intervals using plyranges [46]. To test for potential effects of these insertions on biological processes and molecular functions in Laticauda, we ran PANTHER overrepresentation tests [47] of each using Anolis carolinensis as a reference with genes annotated in Laticauda as a filter.

(d) Continued expression of Harbinger-Snek
To test if Harbinger-Snek is expressed in L. laticaudata, we aligned raw RNA-seq reads from vomeronasal organ, tongue, nasal cavity and liver tissue from Kishida et al. [25] (BioProject PRJDB7257) to the L. laticaudata genome using STAR [48]. Using IGV [49], we examined the alignments, examining intact Harbinger-Snek TEs and exons of genes in which we had identified Harbinger insertions.

Results and discussion (a) Harbinger-Snek is unlike transposons seen in terrestrial elapid snakes
Our ab initio repeat annotation revealed a novel Harbinger DNA transposon in L. colubrina, Harbinger-Snek. Using BLASTN, we found Harbinger-Snek present in both L. colubrina and L. laticaudata, but failed to identify any similar sequences in terrestrial relatives. Harbingers are a superfamily of transposons encoding two proteins, a transposase and a Myb-like DNA-binding protein [50]. While both are necessary for transposition [51], we identified multi-copy variants of Harbinger-Snek which encoded only one of the two proteins, going forward referred to as solo-ORF variants. These variants likely result from large deletions and may be nonautonomous. In addition, we identified many short non-autonomous variants which retain the same target site duplications and terminal motifs, yet encode no proteins.

(b) Harbinger-Snek was horizontally transferred to Laticauda
Harbingers have previously been reported in a wide variety of aquatic vertebrates including fish and some crocodilians and testudines, but not in solely terrestrial vertebrates [33]. Our repeat annotation of the Laticauda, Aipysurus, Notechis and Pseudonaja assemblies revealed Harbingers to be the dominant TE superfamily in both Laticauda species examined ( Our phylogenetic analysis (figure 1) of similar Harbinger transposase sequences placed Harbinger-Snek in a strongly supported cluster with Harbingers found in two sea urchins, S. purpuratus and Hemicentrotus pulcherrimus (order Echinoida). In addition, the species that cluster together elsewhere on the tree are not closely related, for example, the sister cluster to the Laticauda-Echinoidea cluster contains a variety of fish and bivalve species. The mismatch of the species tree and the transposase tree suggests the horizontal transfer of Harbinger-Snek-like transposons may be widespread among these marine organisms. Interestingly, neither Echinoida assembly contained more than 10 Harbinger-Sneklike transposons, none of which encode both proteins.   Mass expansion of existing TEs during speciation has previously been seen in many groups including primates [53], woodpeckers [54] and salmonids [55]. By making the genome more dynamic, these expansions may have fostered Table 1. The expansion of Harbinger elements in Laticauda spp. This dramatic expansion (cells with grey background), along with that of LTR elements, in L. colubrina has contributed to L. colubrina having a larger genome than terrestrial species. This increase is due to the expansion of Harbinger-Snek alone as they account for over 99.7% of the Harbingers present in each Laticauda assembly. This gain in L. laticaudata appears to have been offset to some degree by loss from other TE families. Mbp or percentage difference in assembly repeat content between Laticauda and the average of the terrestrial Notechis scutatus and Pseudonaja textilis. Repeat content was annotated using RepeatMasker [43] using a combined Repbase [33] and curated RepeatModeler2 [24] library. rapid adaptations. The sharp peak in the divergence profile ( figure 2) indicates Harbinger-Snek's expansion was rapid, and the small number of near-identical copies suggests expansion has slowed massively, especially in L. laticaudata.
Many more apparently complete and potentially intact copies of Harbinger-Snek are present in the L. colubrina assembly than the L. laticaudata assembly, with only one fully intact copy in L. laticaudata, but 269 in L. colubrina. Our alignment of   Figure 1. The absence of Harbinger-Snek from terrestrial vertebrates and its highest similarity to Harbingers present in sea urchins support its horizontal transfer to Laticauda since transitioning to a marine habitat. Nodes without support values have support of 95% or higher. The distribution of species across this tree suggests Harbinger-Snek-like transposons were horizontally transferred between a wide variety of species. This figure is an extract of a maximum-likelihood phylogeny constructed from the aligned nucleotide sequences of the transposases present in curated elements using IQ-TREE 2 [37], for the full tree see electronic supplementary material, figure S1. We also reconstructed trees with similar topologies using RAxML and MrBayes (see methods). Clade phylogeny constructed with TimeTree [42]. L. laticaudata RNA-seq data from four tissues (vomeronasal organ, nasal cavity, tongue and liver) to the L. laticaudata genome revealed reads mapping across both coding regions of the intact copy of Harbinger-Snek. Therefore, Harbinger-Snek and its non-autonomous derivatives may still be transposing in L. laticaudata.
In addition to containing many more intact copies of the full element, Laticauda colubrina also contains a higher number of the solo-ORF variants than L. laticaudata, with 2263 intact transposase-only variants compared to 35, and 452 intact DNA-binding protein only variants compared to six. Based on this stark contrast, since divergence approximately 15 Mya [19] L. colubrina has maintained a higher rate of Harbinger-Snek expansion, or L. laticaudata has had a higher rate of Harbinger-Snek loss or has more efficiently suppressed expansion.

(d) The accordion model-the expansion of Harbinger-
Snek has been balanced by loss in L. laticaudata The peak in Harbinger-Snek expansion in L. colubrina is both higher and more recent than L. laticaudata ( figure 2). In addition, L. laticaudata has a much lower overall Harbinger-Snek content and genome size (table 1). Past observations in birds, mammals and squamates found increases in genome size due to transposon expansion are balanced by loss due to deletions through NAHR [56,57]. We expect that the mass expansion of Harbinger-Snek in Laticauda has generated many near-identical sites in the genome, in turn promoting NAHR. In spite of the explosive expansion of Harbinger-Snek in L. laticaudata, the genome size and total TE content is very similar to that of the terrestrial Pseudonaja and Notechis (table 1). This retention of a similar genome size is not seen in L. colubrina, the genome assembly of which is 20% larger than the terrestrial species. However, the overall TE content of the L. colubrina genome remains similar to that of L. laticaudata and the terrestrial species, with the expansion of TEs only contributing half of the total increase in genome size. This is consistent with the aforementioned expectation of balancing of TE expansion by deletions.  (electronic supplementary material, table S3). These genes have a wide range of functions, many of which could be significant in the context of adaptation. Of note, a fragmented insertion of Harbinger-Snek present in GTP Binding Protein 1 (GTPBP1) appears to have altered an ORF. Because GTPBP1 plays a role in regulating circadian mRNA stability [58], this could be consequential for aquatic adaptation. We also identified insertions into 1685 and 888 potentially regulatory regions (within 5 kbp of the 5' UTR in genes) and into introns of 4141 and 1440 genes in L. colubrina and L. laticauda, respectively. PANTHER over/under-representation tests of these in gene and regulatory region insertions identified a number of pathways of potential adaptive significance (electronic supplementary material, table S4-S7). Therefore, Harbinger-Snek is a prime candidate in the search for genomic changes responsible for Laticauda's adaptation to a marine environment through altered gene expression.

Conclusion
In this report, we describe the rapid expansions of Harbinger-Snek TEs in Laticauda spp., which is to our knowledge, the fastest expansion of a DNA transposon in amniotes reported to date. The large number of insertions of Harbinger-Snek into exons and regulatory regions may have contributed to speciation and adaptation to a new habitat. As the HTT was prior to the divergence of eight Laticauda species, Harbinger-Snek presents a unique opportunity to reconstruct subsequent molecular evolution and determine the impact of HTT on the adaptation of Laticauda to the amphibious-marine habitat.
Ethics. All data were sourced from public data repositories. We carried out no fieldwork and did not deal with specimens.
Data accessibility. All data (genome sequences, transcriptomes) are publicly available from NCBI as detailed in the electronic supplementary material, table S6 in the electronic supplementary material, Information file SI_Tables.xls. All code used in the analyses is available at https://github.com/jamesdgalbraith/Laticauda_HT, a GitHub repository, this code is also archived (see below for link). All sequences used for analysis, multiple alignments and code are archived and freely available for re-use at: https://doi.org/10.5281/ zenodo.5140604 or https://zenodo.org/record/5140605 according to Creative Commons 4.0 international license.