Changes in brain microRNAs are associated with social evolution in bees

Evolutionary transitions to a social lifestyle in insects are associated with lineage-specific changes in gene expression, but the key nodes that drive these regulatory changes are largely unknown. We tested the hypothesis that changes in gene regulation associated with social evolution are facilitated by lineage-specific function of microRNAs (miRNAs). Genome scans across 12 bee species showed that miRNA copy-number is mostly conserved and not associated with sociality. However, deep sequencing of small RNAs in six bee species revealed a substantial proportion (20-35%) of detected miRNAs had lineage-specific expression in the brain, 24-72% of which did not have homologs in other species. Lineage-specific miRNAs disproportionately target lineage-specific genes, and have lower expression levels than shared miRNAs. The predicted targets of lineage-specific miRNAs are enriched for genes related to social behavior in social species, but they are not enriched for genes under positive selection. Together, these results suggest that novel miRNAs may contribute to lineage-specific patterns of social evolution. Our analyses also support the hypothesis that many new miRNAs are purged by selection due to deleterious effects on mRNA targets, and suggest genome structure is not as influential in regulating bee miRNA evolution as has been shown for mammalian miRNAs.


INTRODUCTION
dozens to hundreds of mRNAs, and may therefore regulate multiple gene networks [6,7]. Like 80 mRNAs, the majority of miRNAs are generated via Pol II transcription, and are spatially-and 81 temporally-specific in their expression patterns. Thus, complex changes in gene regulation can 82 be achieved with relatively minor changes in miRNA expression. This can result in major 83 phenotypic shifts or fine-tuning of phenotypic optimization [6]. Novel miRNAs can originate in 84 a variety of genomic features, including exons and introns of protein-coding and non-coding 85 RNA genes, transposable elements, pseudogenes, or intergenic regions, and thus emerge and 86 disappear over relatively rapid timescales [8][9][10][11]. It is thus not surprising that expansion of the 87 miRNA repertoire is associated with the evolution of morphological complexity across the tree 88 of life [9,[12][13][14].

90
There is accumulating evidence for a role of miRNAs in regulating the social lives of 91 insects. While most miRNAs seem to be conserved in major lineages of insects [15,16], 92 expression levels vary across individuals performing different social functions, such as between 93 workers performing different tasks in honey bees [17][18][19]. MiRNAs may also play a role in caste 94 determination, as queen-and worker-destined larvae express different sets of miRNAs 95 throughout development in honey bees [20][21][22] and bumble bees [23]. Additionally, miRNAs 96 play a role in regulating some physiological correlates of social behavior in honey bees, 97 including activation of ovaries in queens and workers [24] and response to the reproductive 98 protein vitellogenin [25]. Together, these studies suggest that miRNAs could play a role in the 99 evolution of eusociality through their effects on gene regulatory networks involved in socially-  Here we present a comprehensive comparative analysis of miRNAs across bee species 104 with variable social organization. These solitary and social species shared a common ancestor 105 ~75-110 mya [26]. Previous comparative studies of miRNAs associated with eusociality have 106 relied on the parasitoid wasp, Nasonia vitripennis, as a solitary comparison [16]. This is a far 107 more distant relative to the social insects, sharing a last common ancestor with bees nearly 200 108 mya [26]. Moreover, the parasitoid lifestyle of N. vitripennis is different from that of eusocial 109 bees in nearly every way. The lifestyle of solitary bees, such as the ones we include in this study, 110 share many features of their natural history with the presumed ancestors from which eusociality 111 evolved.

113
We first looked for miRNA repertoire expansions associated with eusociality by scanning 114 12 bee genomes for known miRNAs, and statistically evaluating copy-number of each miRNA 115 type with regard to differences in sociality in a phylogenetic model. We then described and 116 compared miRNAs expressed in the brains of six bee species from three families that include 117 repeated origins of eusociality. We tested the hypothesis that changes in gene regulatory function 118 associated with social evolution are facilitated by lineage-specific miRNA regulatory function 119 with two predictions: (1) If lineage-specific miRNAs are assimilated into ancestral gene 120 networks, their predicted target genes should be ancient and conserved. (2)  We used miRDeep2 [27] to identify and quantify miRNAs expressed in the brains of each 159 species, with a three-step process of miRNA detection to identify homologous miRNAs between 160 species. First, we gathered a set of mature miRNA sequences previously described in other insect 161 species (Table S1). Reads for each sample were quality filtered (minimum length 18, removal of 162 reads with non-standard bases), adapter-trimmed, and aligned to the species' genome (Table S2) 163 with the mapper.pl script. Approximately 60-84% of reads successfully mapped.

165
We then identified known and novel miRNAs in each sample with the miRDeep2.pl 166 script, using our curated set of insect miRNAs (Table S1) as known mature sequences. We 167 followed this with the quantifier.pl script to generate sets of known and novel miRNAs in each 168 sample, along with quantified expression information for each. We then filtered novel miRNAs 169 in each species according to the following criteria: no rRNA/tRNA similarities, minimum of five 170 8 reads each on the mature and star strands of the hairpin sequence, and a randfold p-value < 0.05.

173
We used these filtered miRNAs in a second run of detection and quantification, adding 174 the mature sequences of novel miRNAs from each species to our set of known miRNAs, and 175 repeated the pipeline above. This allowed detection of homologous miRNAs (based on matching 176 seed sequences) that are not represented in miRBase across our species. We applied the same set 177 of filtering criteria as for our first run.  We used bedtools intersect [28] to find overlap of miRNAs with predicted gene models 186 (Table S3), and repetitive element repeatmasker [29] annotations from previously established 187 repeat libraries [4,[30][31][32][33].  190 We extracted potential target sites 500 bp downstream from each gene model using 191 bedtools flank and getfasta [28], following previous studies [21] and an average 3' UTR region 192 of 442 nt in Drosophila melanogaster [34]. Target prediction was run with miRanda v3.3 [35]  Gene ages were determined using orthogroups from OrthoDB v9 [37], which includes A.  Gene Ontology (GO) terms for each species were derived from a previous study [4], with  For each species, brain or head gene expression datasets related to socially relevant 217 phenotypes (e.g., caste) and genes under positive selection were compared against targets of 218 lineage-specific miRNAs. The complete list of included studies and gene lists are in Table S4. 219 For M. genalis caste data, RNAseq reads from Jones et al. [40] (NCBI PRJNA331103) were 220 trimmed using Trimmomatic (v. 0.36) [41] and aligned to an unpublished genome assembly of 221 M. genalis (NCBI PRJNA494872) using STAR (v. 2.5.3) [42]. Reads were mapped to gene 222 features using featureCounts in the Subread package (v. 1.5.2) [43]. Remaining differential 223 expression analysis followed the methods of Jones et al. [40] using edgeR [44].

225
We also tested datasets identifying genes under selection in bee species or across social 226 lineages of bees for enrichment of lineage-specific miRNA targets (Table S4). When necessary, 227 we used reciprocal blastp (evalue < 10e -5 ) to identify orthologous genes across species, and only 228 genes with putative orthologs were included in the analysis. Hypergeometric tests (using phyper 229 in R) were used to test for significance of over-or under-enrichment between each pair of lists.

230
The representation factor (RF) given represents the degree of overlap relative to random 231 expectation (RF=1). RF is calculated as RF=x/E, where x is the number of genes in common 232 between two lists and E is the expected number of shared genes (E = nD/N, where n is the 233 number of genes in list 1, D is the number of genes in list 2, and N is the total number of genes.) 234 235 miRNA Diversification 236 We performed genome scans for small RNAs across 12 bee genomes (Table S2)  inclusion (--cut_ga) [45] to find all Rfam accessions in each bee genome. We used Spearman 239 rank regressions to test for significant associations between miRNA copy-number and social 240 biology. We categorized each species as solitary, facultative basic eusocial, obligate basic 241 eusocial, or obligate complex eusocial following Kapheim et al. [4]. We used the ape package 242 [46] in R [39] to calculate phylogenetic independent contrasts for both social organization and 243 miRNA copy-number, cor.test to implement the Spearman's rank correlation, and p.adjust with 244 the Benjamini-Hochberg method to correct for multiple comparisons.  (Table S5). The mean copy-number across all miRNAs in all bee genomes 251 was 1.19 ± 0.74. One exception was miR-1122, for which we found 70 copies in M. genalis, but 252 no copies in the other species. We did not find any significant associations between miRNA 253 copy-number and social organization (Table S5).

255
Expressed miRNA diversity in bee brains 256 We identified 97-245 known and novel miRNAs expressed in the brains of each of our 257 six species (Table S6). The majority of these were located in intergenic regions or introns (Table   258 1). Each species had at least one miRNA that originated from exons of protein-coding genes and 259 repetitive DNA (Table 1). Most of the overlap between miRNA precursors and repetitive DNA 260 corresponded to uncharacterized repeat elements, with very few overlaps with well-characterized 261 transposons or retrotransposons (Table 1).

263
Most of the detected miRNAs in each species had known homologs in at least one other 264 species. However, each species had a substantial proportion (20-35%) of detected miRNAs with 265 lineage-specific expression in the brain (Table 1; Fig. 1A), 24-72% of which did not have any 266 known homologs in other species (Table 1). We defined lineage-specific miRNAs as those with 267 lineage-specific expression and for which no seed match with a known mature miRNA was 268 identified (Table 1,  Lineage-specific miRNAs were localized both within genes and intergenically. The 276 proportion of lineage-specific miRNAs that were intra-or intergenic was similar to miRNAs 277 with homologs for every species except N. melanderi, for which a disproportionate number of 278 lineage-specific miRNAs were intragenic (χ 2 = 4.78, p = 0.03). Genes that serve as hosts for 279 intragenic lineage-specific miRNAs were not significantly older than would be expected by 280 chance (i.e., belong to orthogroups shared with vertebrates) in any species (hypergeometric tests: 281 p = 0.14-0.76). Across all species, genes that serve as hosts for intragenic lineage-specific   were highly conserved and belonged to orthogroups shared by vertebrates ( Fig. 2; Table S8).

320
However, most genes in each genome are also highly conserved, and there was not a significant miRNAs are more likely to target novel genes than would be expected by chance ( Fig. 2; Table   328 S8). lineage-specific miRNAs are more likely to be unique to each species than predicted by chance.

334
Pie chart size is scaled to number of predicted target genes for lineage-specific miRNAs, but not 335 for all genes. Color slices indicate orthogroup age for each predicted gene. The green slice 336 (lineage-specific genes) is larger for the set of genes predicted to be targeted by lineage-specific 337 miRNAs than for all genes.

339
We found mixed support for the prediction that novel miRNAs should target genes that  Contrary to our prediction, targets of lineage-specific miRNAs were not significantly 360 enriched for genes under selection in any species. We assessed overlaps between genes 361 undergoing positive directional selection in A. mellifera [53], B. impatiens [54], M. genalis [33], 362 and N. melanderi [32] and the predicted targets of lineage-specific miRNAs in each species.

393
Interestingly, this miRNA is also upregulated in worker-destined compared to queen-destined 394 honey bee larvae, and may thus play a role in caste differentiation [22]. Further investigation 395 with additional social and solitary species is necessary to determine how this miRNA may 396 influence social behavior across species.

398
We focused attention on miRNAs for which no mature miRNAs with seed matches were 399 detected in any other species, because these have the potential to influence the lineage-specific 400 patterns of gene regulatory changes previously shown to influence social evolution [3,4]. We 401 hypothesized that if novel miRNAs are inserted into existing gene networks that become co-402 opted for social evolution, they should target genes that are highly conserved across species.

403
Instead, we find that the targets of lineage-specific miRNAs are enriched for lineage-specific 404 genes, while genes belonging to ancient orthogroups were not more likely to be targets than 405 expected by chance. This suggests that novel miRNAs co-evolve with novel genes, as has been 406 shown for the evolution of cognitive function in humans [61]. Previous work in honey bees has 407 shown that taxonomically-restricted genes play an important role in social evolution. Expression 408 of taxonomically-restricted genes is significantly biased toward glands with specialized functions 409 for life in a social colony (e.g., the hypopharyngeal gland and the sting gland) [62], and toward 410 genes that are upregulated in workers [63]. Thus, it is reasonable to expect that new miRNAs We find support for the prediction that lineage-specific miRNAs should target genes with 427 social function in the Apidae (e.g., honey bees and bumble bees), but not the Halictidae (M. 428 genalis). One explanation for this pattern is technical. We define genes with social functions as 429 those that are differentially expressed among castes. The genetic basis of social behavior has 430 been much better studied in honey bees and bumble bees than in any other species, and the sets 431 of genes known to function in sociality is thus richer for apids than for halictids. Further, not all 432 genes that function in social behavior are expected to be differentially expressed in the brains of 433 different castes, and our analysis is thus likely to exclude some important genes. reproductive workers can become reproductive queens if given the opportunity [69]. This 442 flexibility is reflected in the magnitude of differences in brain gene expression patterns between 443 queen and worker honey bees (thousands of genes [48]) and M. genalis (dozens of genes [40] and genes with caste-biased expression were not found among two other social insect species 449 with reduced social complexity [72]. thus preventing gene mis-expression [76][77][78]. This selective constraint is likely to be most 466 significant in the 3' UTR region, where miRNA binding sites are located. The evolution of eusociality depends on many different tissues and physiological 482 processes, and brain-specific expression patterns are not likely to be representative of the  of animal complexity in a wide range of species [9,12,13]. The evolution of eusociality from a 495 solitary ancestor is associated with increases in phenotypic complexity, and considered to be one 496 of the major transitions in evolution [79]. We therefore hypothesized that evolutionary increases 497 in social complexity would be associated with expansions in the number of miRNAs found 498 within bee genomes. To the contrary, we find that most bees have a single copy of previously 499 identified miRNAs in their genomes. This is consistent with results of comparative genome scans 500 across several ant species [3]. A recent study of miRNA diversity in insects found that 501 morphological innovations such as holometabolous development was accompanied by the 502 acquisition of only three miRNA families [15]. This suggests that insect evolution is not as 503 reliant on major expansions of miRNA families as other taxonomic groups.

505
Additionally, our characterization of lineage-specific miRNAs expressed in the brain of 506 each species reveals that genome structure is not as influential in regulating bee miRNA 507 evolution as has been shown for human miRNAs. Novel human miRNAs tend to arise within 508 ancient genes that have multiple functions and broad expression patterns [65]. It is hypothesized 509 that this increases the expression repertoire of emergent miRNAs, and thus facilitates persistence 510 in the population [64,65]. Only in one species (N. melanderi) were lineage-specific miRNAs 511 more likely to be localized intragenically than previously identified miRNAs, while lineage-512 specific miRNAs did not differ from previously identified miRNAs in their genomic locations in 513 the other five species. This suggests emergence patterns for new miRNAs are unique to each 514 lineage in bees. We also do not find a consistent pattern between young, emerging miRNAs and 515 host gene age. There was no significant difference in the age of genes that serve as hosts for 516 established versus lineage-specific miRNAs across species. This is despite the fact that a similar 517 25 proportion of bee miRNAs are located within introns (31-43%; Table 1), compared to in 518 vertebrates (36-65%) [8]. However, the fact that 73-88% of miRNAs localized to genes are 519 encoded on the sense strand suggests that they would benefit from host transcription, as is 520 observed in vertebrates [8]. Additional research with insects will be necessary to identify general 521 patterns of miRNA evolution in relationship to genome structure.