Vertically transmitted rhabdoviruses are found across three insect families and have dynamic interactions with their hosts

A small number of free-living viruses have been found to be obligately vertically transmitted, but it remains uncertain how widespread vertically transmitted viruses are and how quickly they can spread through host populations. Recent metagenomic studies have found several insects to be infected with sigma viruses (Rhabdoviridae). Here, we report that sigma viruses that infect Mediterranean fruit flies (Ceratitis capitata), Drosophila immigrans, and speckled wood butterflies (Pararge aegeria) are all vertically transmitted. We find patterns of vertical transmission that are consistent with those seen in Drosophila sigma viruses, with high rates of maternal transmission, and lower rates of paternal transmission. This mode of transmission allows them to spread rapidly in populations, and using viral sequence data we found the viruses in D. immigrans and C. capitata had both recently swept through host populations. The viruses were common in nature, with mean prevalences of 12% in C. capitata, 38% in D. immigrans and 74% in P. aegeria. We conclude that vertically transmitted rhabdoviruses may be widespread in a broad range of insect taxa, and that these viruses can have dynamic interactions with their hosts.


Introduction
Insects are host to a range of vertically transmitted parasites [1,2]. Vertical transmission is normally associated with maternally transmitted bacterial endosymbionts [1], and with transposable elements that can proliferate within the host genome and spread through populations [3,4]. Many freeliving viruses are also capable of vertical transmission without integrating into the host genome [5]. In most cases, this is combined with horizontal transmission. For example, Helicoverpa armigera densovirus-1, a virus of the cotton bollworm, is transmitted vertically as well as horizontally [6]. However, relatively little is known about most obligately vertically transmitted insect viruses [2,7].
A well-characterized obligately vertically transmitted virus that does not integrate into its host's genome is the sigma virus of Drosophila melanogaster (DMelSV) [8]. This is a negative sense RNA virus in the family Rhabdoviridae that is found in the cytoplasm of cells [9]. Typically, vertically transmitted cytoplasmic pathogens can only be transmitted maternally, and this alone cannot allow them to increase in prevalence [10]. However, DMelSV is transmitted by both infected males and females, which allows it to spread through host populations, even if it carries a cost to the host [2,9,[11][12][13]. This is analogous to the spread of transposable elements through a host population, where the element can spread at rates greater than a Mendelian locus, giving it a natural transmission advantage. Two more vertically transmitted sigma viruses have been characterized in different species of Drosophila (DAffSV and DObsSV from Drosophila affinis and Drosophila obscura, respectively). Like DMelSV, these are biparentally transmitted through both the eggs and sperm of their hosts [14,15]. Females transmit the virus at a high rate (typically close to 100%), whereas males transmit the virus at a lower rate, probably because sperm transmit a lower amount of virus to the developing embryo [9,15].
A consequence of biparental transmission is that, like transposable elements, sigma viruses can rapidly spread through host populations [3,4,16]. For example, a genotype of DMelSV that was able to overcome a host resistance gene called ref(2)P swept across Europe in the 1980s-1990s [17 -20]. Similarly, DObsSV has swept through populations of D. obscura in the last decade [15]. Theoretical models demonstrate that biparental transmission alone can allow sigma viruses to spread to high frequencies in just a few generations, even if the virus is costly to its host [12,15,21]. This mirrors the situation in bacterial endosymbionts where rapid sweeps have been demonstrated in a wide range of different insect taxa (e.g. [22][23][24][25][26][27]).
It is unknown whether obligately vertically transmitted viruses are common in nature or are just a quirk of a few species of Drosophila. Recent metagenomic sequencing has found rhabdoviruses associated with a wide diversity of insects and other arthropods, including numerous viruses closely related to the three vertically transmitted sigma viruses described in Drosophila [28,29]. This raises the prospect that vertically transmitted rhabdoviruses could be common insect pathogens. Here, we examine three recently identified viruses that fall into the sigma virus clade and infect Mediterranean fruit flies (medflies; Ceratitis capitata, Diptera, Tepthritidae), Drosophila immigrans (Diptera, Drosophilidae, sub-genus Drosophila) and speckled wood butterflies (Pararge aegeria, Lepidoptera, Nymphalidae) [28,29]. We go on to test whether these viruses are vertically transmitted and investigate whether they show evidence of the rapid population dynamics seen in other sigma viruses.

Material and methods (a) Transmission
We determined the patterns of transmission of Ceratitis capitata sigmavirus (CCapSV), Drosophila immigrans sigmavirus (DImmSV) and Pararge aegeria rhabdovirus (PAegRV), which all fall into the sigma virus clade [28]. We carried out crosses between infected and uninfected males and females, and measured the rates of transmission to their offspring. We also checked for sexual or horizontal transmission between the adults used in the crosses, to confirm this was true paternal transmission rather than sexual or horizontal and then maternal transmission. A subset of the offspring from each cross were tested for infection as adults.
Infected C. capitata were collected from the Cepa Petapa laboratory stock and uninfected flies were from the TOLIMAN laboratory stock. Virgin females and males were crossed and their offspring collected. Only 29% of flies from the Cepa Petapa stock were infected when we carried out the crosses (see below). In total, we tested 10 crosses between infected females and uninfected males, eight crosses with uninfected females and infected males, and seven crosses where neither sex was infected. We tested both parents and a mean of six offspring for each cross (range ¼ 4 -8, total of 197 offspring) for infection using reverse transcription (RT) PCR (electronic supplementary material, table S1).
Infected D. immigrans were collected from a DImmSV infected isofemale line (EGL 154) and uninfected flies were collected from a stock established from four isofemale lines (all lines originated from Cambridge, UK). Virgin females and males were crossed and their offspring collected. In total, we tested 20 crosses between infected females and uninfected males, 18 crosses with uninfected females and infected males and eight crosses where neither sex was infected. We tested both parents and a mean of four offspring for each cross (range ¼ 2 -4, total of 178 offspring) for infection using RT-PCR (electronic supplementary material, table S1).
To measure transmission of PAegSV in speckled wood butterflies, we examined crosses between the offspring of wild caught P. aegeria females that were an unknown mix of infected and uninfected individuals. The wild caught females were collected in Corsica and Sardinia in May 2014 (see below). Virgin females and males were crossed, and their offspring collected. We tested the infection status of the parents used for the crosses post hoc using RT-PCR; in total there were 10 crosses between infected females and uninfected males, eight crosses with uninfected females and infected males and one cross where neither sex was infected (data not shown from 28 crosses where both parents were infected). We tested a mean of four offspring for each cross (range ¼ 1 -8, total of 171 offspring) for virus infection using RT-PCR (electronic supplementary material, table S1).  (51.363, 22.525). We also sequenced one infected P. aegeria individual from each of the families used for the crosses described above. These individuals were collected in May 2014 from Corsica (41.752, 9.191; 41.759, 9.184; 41.810, 9.246; 41.377, 9.179; 41.407, 9.171; 41.862, 9.379; 42.443, 9.011 and 42.516, 9.174) and Sardinia (39.964, 9.139; 39.945, 9.199; 41.233, 9.408; 40.911, 9.095 and 40.037, 9.256).

(c) Virus detection and sequencing
Individual insects were homogenized in Trizol reagent (Invitrogen) and RNA extracted by chloroform phase separation, followed by RT with random-hexamers using GoScript reverse transcriptase (Promega). For each sample, we carried out PCRs to amplify partial nucleocapsid (N) and RNA Dependant RNA Polymerase (L) gene sequences from the respective viral genomes (electronic supplementary material table S1), as well as a control gene from the insect genome (COI or RpL32) to confirm the extraction was successful. For CCapSV and PAegRV, we sequenced all infected samples (19 and 130, respectively), for DImmSV we sequenced a subset of 87 samples from across all populations. PCR products were treated with Antarctic Phosphatase and Exonuclease I (New England Biolabs) and directly sequenced using BigDye on a Sanger ABI capillary sequencer (Source Bioscience, Cambridge, UK). Data were trimmed in Sequencher (v. 4.5) and aligned using ClustalW in Bioedit software. All polymorphic sites were examined by eye. Recombination is typically absent or rare in negative sense RNA viruses [30]; we confirmed this was the case using GARD [31] with a general time reversible model and gamma-distributed rate variation with four categories. Median joining phylogenetic networks were produced using PopArt (v. 1.7 http://popart. otago.ac.nz.). Population genetic analysis was carried out in DNAsp (v. 5.10.01). P-values for estimates of Tajima's D were estimated using DNAsp; across all sites p-values were calculated using coalescent simulations assuming no recombination, for synonymous sites they were estimated using a beta distribution. Maps used for the figures were from QGIS (v. 2.14) [32].

(d) ADAR edits
We observed some of the DImmSV sequences had a cluster of mutations in the N gene consistent with those caused by adenosine deaminases that act on RNAs (ADARs). ADARs target double-stranded RNA and convert adenosine (A) to inosine (I), and display a 5 0 neighbour preference (A ¼ U . C . G) [33,34]. During viral genome replication I's are paired with guanosine (G), so editing events appear as changes from A to G when sequenced. Sigma viruses have been found to show mutations characteristic of ADAR editing, with single editing events causing clusters of mutations [35,36].
As ADAR-induced hyper-mutations will be a source of nonindependent mutation, we aimed to exclude such mutations to prevent them confounding our analyses. Compared with a 50% majority-rule consensus sequence of our DImmSV sequences, we identified 17 sites with A to G mutations at ADAR preferred sites across the negative sense genome and its positive sense replication intermediate. This is a significant over-representation when compared with ADAR non-preferred sites (Fisher's exact test, p ¼ 0.037). We therefore excluded all ADAR preferred sites from our dataset and carried out our population genetics analysis on this data. For CCapSV and PAegRV sequences, we did not detect an over-representation of A to G mutations at ADAR preferred sites, suggesting these viruses did not contain ADAR hyper-mutations. The R [37] script used to identify ADAR edits and exclude preferential ADAR editing sites is available in a data repository (https://dx.doi.org/10.6084/m9.figshare.3438557.v1).

(e) Reconstructing viral population history
To reconstruct how long ago these viruses shared a common ancestor and how their population size has changed over time, we used a Bayesian phylogenetic inference package (BEAST v. 1.8.0) [38]. The evolutionary rate of viruses was assumed to be the same as in DMelSV [39]. This is similar to other related rhabdoviruses [40,41] and evolutionary rates of DMelSV do not differ significantly between the laboratory and field [42]. To account for uncertainty in this evolutionary rate estimate, we approximated its distribution with a normal distribution (mean ¼ 9.9Â10 25 substitutions/site/year, standard deviation ¼ 3.6 Â 10 25 , substitutions/site/year), and this distribution was used as a fully informative prior to infer dates of the most recent common ancestor of each virus. The model assumed a strict molecular clock model and an HKY85 substitution model [43]. Sites were partitioned into two categories by codon position (1 þ 2, 3), and separate evolutionary rates were estimated for each category. Such codon partition models have been shown to perform as well as more complex non-codon partitioned models but with fewer parameters [44]. We tested whether a strict clock rate can be excluded by running models with a lognormal relaxed clock; for all three viruses, we found the posterior estimate of the coefficient of variation statistic abuts the zero boundary, and so a strict clock cannot be excluded [45].
We reconstructed the phylogeny with an exponentially expanding population ( parametrized in terms of growth rate) or a constant population size model. The population doubling time was calculated from the growth rate as ln(2)/growth rate. We excluded a constant population size if the 95% confidence intervals (CIs) for estimates of growth rate did not cross zero. We verified that we were using the most suitable demographic model for each virus using the path sampling maximumlikelihood estimator implemented in BEAST (see the electronic supplementary material) [46]. We note that estimates of the root age were similar across models (electronic supplementary material, table S2).
Each model was run for 1 billion MCMC steps with sampling every 100 000 generations for each model, and a 10% burnin was used for all parameter estimates, selected after examining trace files by eye. Posterior distributions and model convergence was examined using TRACER (v. 1.6) [47] to ensure an adequate number of independent samples. The 95% CI was taken as the region with the 95% highest posterior density.

Results
We tested for horizontal or sexual transmission between the parents used in the crosses. For CCapSV and DImmSV, such transmission appears to be rare or absent in this situation, as we did not detect virus in uninfected individuals that mated with infected individuals during the crosses. We also did not detect virus in any of the offspring from crosses where both parents were uninfected, suggesting results were not due to contamination. As the infection status of parents in crosses to measure transmission of PAegRV was only established post hoc, we were unable to test for evidence of horizontal or sexual transmission between parents.

(b) Sigma viruses are common in natural populations
We tested 243 C. capitata, 527 D. immigrans and 137 P. aegeria from the wild for the presence of their respective viruses using RT-PCR. We found the mean viral prevalence across populations was 12% for CCapSV, 38% for DImmSV and 74% for PAegRV. There were significant differences in the prevalence between populations (electronic supplementary material, tables S4 -S6) for CCapSV (figure 2; x 2 -test, d.f. ¼ 1, x 2 ¼ 13.08, p ¼ ,0.001) and DImmSV (figure 2;

(c) Sigma viruses can rapidly spread through populations
To infer the past dynamics of these viruses, we sequenced part of the N and L genes from multiple viral isolates. We were unable to detect any evidence of recombination in our data. We used these sequence data to produce a median joining phylogenetic network for each of the three viruses (figure 3), and examined the population genetics of these virus populations. DImmSV had the lowest genetic diversity of the three viruses, and appears to have very recently swept through host populations. We found only 40 polymorphic sites out of 929 sites examined (16 in N and 24 in L gene) over 87 viral sequences. The average number of pairwise differences per site ( p) was 0.20% across all sites. This low genetic diversity appears to have been caused by the virus recently sweeping through host populations, with the DImmSV sequences forming a star-shaped network as is expected following a recent sweep (figure 3). Owing to the low levels of population structure and small sample sizes from individual populations (see below), we combined sequences from across populations to investigate the past demography of the virus. Overall, there was a large excess of rare variants compared with that expected under the neutral model, which is indicative of an expanding population or selective sweep (Tajima's D ¼ 22.45, p , 0.001). Out of 40 segregating sites, 27 are singletons. This result held even if only synonymous sites were analysed (Tajima's D ¼ 22.27 p , 0.01), indicating that it is not likely to be caused by slightly deleterious amino acid polymorphisms being kept at low frequency by purifying selection. Furthermore, these results are unlikely to be confounded by population structure, as when we analysed only the samples from Derbyshire (the population with the largest sample size, n ¼ 41) we found Tajima's D was significant for all sites (D ¼ 21.68 p ¼ 0.026). Tajima's D was negative but not significant for synonymous sites (synonymous: D ¼ 21.11 p . 0.1), probably because there were only six synonymous polymorphisms in this population.
To reconstruct past changes in the effective size of the DImmSV population, we used a Bayesian approach based on the coalescent process in the BEAST software [38,48]. The posterior distribution for the estimated growth rate did not overlap zero (95% CI ¼ 0.12, 0.99) suggesting the population had expanded, and the exponential growth model was also preferred in the path sampling analysis (electronic supplementary material, table S3). Assuming the evolutionary rate is the same as the related D. melanogaster virus DMelSV, we estimated the viral population size has doubled every 1.5 years (95% CI¼ 0.7-5.7), with the viruses in our sample sharing a common ancestor 16 years ago (95% CI¼ 5-31 years).
CCapSV had higher genetic diversity, probably reflecting a somewhat older infection than DImmSV. Combining sequences across populations, the genetic diversity was approximately five times greater than DImmSV (p ¼ 0.99% across all sites) and we found 44 segregating sites over 1278 sites (21 in N gene and 23 in L gene) in 19 viral sequences. As CCapSV showed high levels of genetic population structure (see results below), we restricted our analyses of demography to viruses from Morocco (the population with the greatest number of samples, n ¼ 13). For these viruses, we found a significant excess of rare variants at all  sites (Tajima's D ¼ 21.67 p ¼ 0.035) and for synonymous sites (Tajima's D ¼ 21.81 p , 0.05). This suggests the Moroccan population of CCapSV has been expanding or undergone a recent selective sweep. The coalescent analysis supported the hypothesis that the CCapSV population had expanded (95% CI of the exponential growth parameter ¼ 0.031, 0.343) and the exponential growth model was preferred in the path sampling analysis (electronic supplementary material, table S3). We estimated the effective population size has doubled every 3.9 years (95% CI¼ 2.0-22.7 years), with the Moroccan viruses sharing a common ancestor 45 years ago (95% CI ¼ 16-85 years) suggesting this is a more ancient expansion, with a greater number of mutations accumulating since the lineages separated. Combining the samples from the two populations, the most recent common ancestor of all CCapSV isolates was estimated to be 51 years ago (95% CI ¼ 18-96 years) with an exponential growth model, assuming the evolutionary rate is the same as DMelSV.
PAegRV was inferred to be the oldest of the three infections. The genetic diversity of this virus was over 11 times that of DImmSV (p ¼ 2.22% across all sites), and we found 204 segregating sites over 1281 sites (83 in N gene 121 in L gene) in 130 viral sequences. As PAegRV showed high levels of genetic population structure (see results below), we restricted our analyses of demography to viruses from South Cambridgeshire (the population with the largest sample size, n ¼ 36). There was no evidence of a population expansion in viruses from South Cambridgeshire, as there was not an excess of rare variants for all sites (Tajima's D ¼ 20.58, p ¼ 0.322) or synonymous sites (Tajima's D ¼ 20.65, p . 0.1). Furthermore, we could reject a model of exponential growth in BEAST, as the 95% CI of the growth rate overlapped zero (95% CI ¼ 20.008, 0.028), and the constant population size model was preferred in the path sampling analysis (electronic supplementary material, table S3). The BEAST analysis supported the conclusion that this was an older infection, with the common ancestor of all of our viral isolates existing 309 years ago (95% CI ¼ 105-588 years), assuming the evolutionary rate is the same as DMelSV.

(d) Sigma viruses have genetically structured populations
The virus populations were all geographically genetically structured, with the oldest infections (see above) showing  rspb.royalsocietypublishing.org Proc. R. Soc. B 284: 20162381 the greatest levels of structure. Using partial sequences of the N and L genes, we quantified genetic structure by calculating an analogue of F ST (K ST ) that measures the proportion of the genetic variation contained in subpopulations relative to the population as a whole [49].

Discussion (a) Vertical transmission
Free-living viruses that are obligately vertically transmitted have only been reported from a small number of insect species. However, recent metagenomic studies have found that sigma viruses related to a vertically transmitted pathogen of Drosophila melanogaster are widespread in insects [28,29]. Here, we have sampled three of these viruses from different insect taxa (Lepidoptera, and Diptera in the Tephritidae and Drosophilidae), and found that they are all vertically transmitted. Therefore, most sigma viruses are likely vertically transmitted, and may represent a major group of insect pathogens.
The patterns of vertical transmission that we observed for three sigma viruses from C. capitata, D. immigrans and P. aegeria are consistent with the mode of biparental vertical transmission seen in other sigma viruses, with high rates of maternal transmission and lower rates of paternal transmission [9,15]. In two of the viruses we studied-PAegRV and DImmSV-we found transmission through both eggs and sperm, but higher transmission rates through eggs. However, we only observed maternal transmission of CCapSV. Paternal transmission may also occur in CCapSV but we simply failed to detect it in this experiment, as in other sigma viruses some infected males are unable to transmit the virus if the line is 'unstabilized' [9,15]. This is commonly seen in insect lines with low infection rates (only 29% of flies carried CCapSV in our infected line) and appears to be due to males that are infected from their father receiving only a low dose of virus that fails to infect the male germ-line [9,15].

(b) Population dynamics
Our population genetic data show that vertical transmission can allow rapid sweeps of these viruses through host populations. Here, we have shown DImmSV and CCapSV sequences both show evidence of recent sweeps (approx. 15 and 50 years ago, respectively). Of the two sigma viruses studied previously (DMelSV from D. melanogaster and DObsSV in D. obscura) both showed evidence of a recent sweep through host populations [15,39,42]. The spread of DImmSV and CCapSV has occurred in the last few decades, with the viral populations doubling in just a few years. Therefore, sigma viruses seem to have very dynamic associations with host populations.
Many cytoplasmic bacteria are also vertically transmitted in insects, and these are almost exclusively only transmitted by infected females. This has led these endosymbionts to evolve different strategies to ensure their persistence, such as distorting the host sex ratio and causing cytoplasmic incompatibility [10]. Biparental transmission is an alternate strategy that can allow sigma viruses to rapidly invade host populations, even if they are costly to their host [21,39]. This reflects the pattern seen in other vertically transmitted parasites. For example P-elements invaded populations of D. melanogaster worldwide in the twentieth century and are currently spreading through D. simulans populations [3,16]. Likewise, vertically transmitted bacterial endosymbionts have frequently been found to have recently swept through new species or populations [22][23][24][25][26][27]. A striking example of this is the spread of Wolbachia through populations of Drosophila simulans on the West Coast of the US, at a rate of more than 100 km a year [22].
The high mutation rates and small effective population sizes of sigma viruses means they can reveal fine level population structure of their hosts [39,42]. For example, medflies have expanded their range from Africa into Europe [50], and this may explain the greater diversity seen in CCapSV from Morocco compared with Crete. In PAegRV, we see genetic differentiation between populations from Corsica and Sardinia, suggesting limited migration between these islands, despite them being less than 15 km apart. This mirrors data from island populations in Sweden where it has been suggested even short expanses of open water may constitute a significant barrier to P. aegeria [51]. PAegRV from UK populations show significant population structure, despite speckled wood butterflies having recently undergone a range contraction and then expansion in the UK [52]. Similarly, population genetic analyses of populations of P. aegeria from mainland northern Europe show significant population structure [51]. Drosophila immigrans is found worldwide and its globalization predates the spread of DImmSV. It would therefore be of interest to examine whether the sweep of DImmSV has occurred in populations outside of Europe.
Why do sigma viruses have such dynamic interactions with their hosts? In the case of DMelSV, this was thought to be a selective sweep of viral genotypes able to overcome a host resistance gene called ref (2)P [19,20,53]. DMelSV genotypes that could overcome the ref(2)P resistance allele rapidly increased in frequency between the early 1980s and early 1990s in French and German populations [18,54]. In a recombining population, a selective sweep would only reduce diversity surrounding the site under selection, but as these viruses do not recombine the entire genome is affected by a selective sweep. For DImmSV, CCapSV and DObsSV, we are therefore unable to distinguish between a long-term host -virus association overlain by a recent selective sweep of an advantageous mutation, or the recent acquisition and spread of novel viruses through previously uninfected host populations. To separate these hypotheses, we would need to discover either closely related viruses in other species or populations, or the remnants of more diverse viral populations that existed prior to a selective sweep.

Conclusion
Our results suggest that vertically transmitted rhabdoviruses may be widespread in a broad range of insect taxa. It remains rspb.royalsocietypublishing.org Proc. R. Soc. B 284: 20162381 to be seen whether this mode of vertical transmission is a unique trait of sigma-like rhabdoviruses, or whether this is the case for the numerous rhabdoviruses from other clades that infect insects [28,29]. Sigma viruses commonly have dynamic interactions with their hosts, with vertical transmission though both eggs and sperm enabling them to rapidly spread through host populations.