Dietary and developmental shifts in butterfly-associated bacterial communities

Bacterial communities associated with insects can substantially influence host ecology, evolution and behaviour. Host diet is a key factor that shapes bacterial communities, but the impact of dietary transitions across insect development is poorly understood. We analysed bacterial communities of 12 butterfly species across different developmental stages, using amplicon sequencing of the 16S rRNA gene. Butterfly larvae typically consume leaves of a single host plant, whereas adults are more generalist nectar feeders. Thus, we expected bacterial communities to vary substantially across butterfly development. Surprisingly, only few species showed significant dietary and developmental transitions in bacterial communities, suggesting weak impacts of dietary transitions across butterfly development. On the other hand, bacterial communities were strongly influenced by butterfly species and family identity, potentially due to dietary and physiological variation across the host phylogeny. Larvae of most butterfly species largely mirrored bacterial community composition of their diets, suggesting passive acquisition rather than active selection. Overall, our results suggest that although butterflies harbour distinct microbiomes across taxonomic groups and dietary guilds, the dramatic dietary shifts that occur during development do not impose strong selection to maintain distinct bacterial communities across all butterfly hosts.


11
Testing for bacterial contamination: 12 13 As reported by a previous study [1] DNA extraction kits can introduce bacterial contaminants. To test for 14 contamination, we carried out a mock DNA extraction using all the kit reagents (negative control). As a 15 positive control, we used larval DNA of A. merione and D. chrysippus. We prepared 16S rRNA libraries and 16 quantified the amount of DNA after each PCR purification step. We used Qubit (Qubit Fluorometric 17 Quantitation, Thermo Scientific), a highly sensitive method for DNA quantification. PCR with larval 18 samples yielded ~700 ng and ~3000 ng of total DNA after the first and second round of PCR purification 19 respectively. In the negative control, we could not detect any DNA using Qubit after either round of PCR 20 purification. Thus, it is unlikely that our DNA extraction kits introduced large amounts of contaminating 21 16S reads in our dataset. 22

23
Analysing bacterial community structure: 24 We quality filtered and analysed our data in QIIME (Quantitative Insights Into Microbial Ecology) version 25 1.9.1 [2]. We used 97% sequence similarity to define bacterial OTUs. We avoided summarizing bacterial 26 OTUs in different taxonomic groups since each group can contain several OTUs with very different relative 27 abundances, leading to confusion while statistically comparing relative abundance across host samples.

28
We used the following commands in QIIME to obtain the final bacterial OTU After obtaining the number of reads for each bacterial OTU in each sample, we obtained relative 38 abundance for each OTU in each sample. We further implemented 5 different filtering cut-offs to remove 39 rare OTUs from our analysis that may not contribute significantly to the bacterial community or may have 40 arisen due to sequencing errors. These cut-offs were as follows: 41 42 (A) Minimum OTU abundance of 5% in at least 1 sample 43 As an alternative to the set of dominant bacteria described above, we selected OTUs that contributed ≥ 44 5% relative abundance in at least 1 sample within the entire dataset. Even if all but one sample had ≥5% 45 relative abundance for an OTU, it was thus retained in the trimmed dataset. As a result, we chose a large 46 fraction of the full 16S-derived community for comparative analysis, while removing most of the rare 47 members 48

(B) Selecting top 5 abundant OTUs 49
To determine the most dominant set of OTUs, we selected the 5 most abundant OTUs (from 5% cut-off 50 OTU set as described above) for each developmental stage for each butterfly species. This small subset of 51 bacterial OTUs also allowed better visualization of the dynamics of dominant community members across 52 butterfly development. To select the top 5 bacterial OTUs, we first calculated the mean relative 53 abundance of each bacterial OTU across all replicates of a developmental stage of a host species. For each 54 stage, we sorted the mean relative abundance by rank and created a combined dataset of the 5 most 55 abundant OTUs across all stages of a host species. For instance, if OTU X was one of the top 5 OTUs in 56 larvae but not in adults, it was still selected as a part of the top 5 bacterial OTU dataset for that butterfly 57 host, with the appropriate mean relative abundance included for each developmental stage. This is 58 exemplified in the tables below. For some species the top 5 OTUs (T5) were similar across larvae, pupae 59 and adults ( (C) Minimum OTU abundance of 0.005% across the entire dataset 69 Next, we excluded rare OTUs based on their abundance across all samples in our original dataset (instead 70 of allowing high abundance in at least one sample, as explained above). We calculated the ratio of total 71 reads contributed by the OTU to the total number of reads obtained across the entire dataset (not per 72 sample), and removed OTUs with a total relative abundance less than 0.005%. We selected this cut-off 73 based on a previous study that reviewed sequencing data generated across several sequencing platforms 74 and found that this cut-off helps eliminate OTUs that may arise due to Illumina sequencing errors [3]. 75

(D) Minimum 20 reads per OTU 76
We filtered out only those OTUs with low sequencing coverage (rather than filtering by relative 77 abundance). For each sample, we removed OTUs that had less than 20 reads. We expected this filtering to 78 remove any spurious OTUs that could have arisen due to PCR or sequencing errors. Previous studies have 79 removed single or doubletons [4][5][6] and occasionally have also applied the cut-off of minimum 10 reads 80 [7][8][9]. However, in order make our analysis stringent we applied a cut-off of minimum 20 reads per OTU 81 per sample. 82

(E) Selecting Core OTUs 83
In addition to various abundance based cut-offs (A-D), we next used a frequency based cut-off to extract 84 bacterial OTUs that occur in >80% of the samples -the "Core OTU" set -using the 85 "compute_core_microbiome.py" function in Qiime 1.9.1. With this set of OTUs, bacterial communities of 86 samples from run 1 and run 2 showed 100% overlap, as expected. 87 After applying each cut-off independently, we recalculated the relative abundance of OTUs for subsequent 88 downstream analysis. The OTU filtering was done in Microsoft Excel and using customized algorithms 89 written in R [10]. 90 91

Statistical analysis 92
We used R [R Core Team 2013] for all statistical analyses. To test the variation in bacterial community 93 structure across different treatment groups (e.g. species, family), we performed Permutational 94 Multivariate Analysis of Variance (PERMANOVA) using the Adonis function in the R package Vegan [11]. To 95 visualise and quantify the impact of treatment groups on bacterial community composition we separately 96 used constrained or unconstrained ordination. 97 For constrained ordination, we used Canonical Analysis of Principal Coordinates based on Discriminant 98 Analysis (CAPdiscrim) in the R package "BiodiversityR" [12]. This function, in association with the R 99 package "Vegan", converts the abundance matrix of all OTUs into a distance matrix using Bray-Curtis 100 distances. We also used this package to perform a linear discriminant analysis (LDA) on the principal co-101 ordinate (PCo) values generated from the distance matrix, to find the impact of treatment groups on 102 bacterial community structure. LD plots show the clustering of data points based on the linear 103 discriminants with highest contribution to between-group variation. We added an ellipse showing 95% 104 confidence intervals (using standard error) using the function "Ordiellipse" in R package "Vegan" [13] in 105 order to visually differentiate the clusters. To statistically test the impact of different treatment groups on 106 variation in bacterial communities, we used MANOVA. 107 In addition to CAPdiscrim, we carried out an unconstrained analysis of principal co-ordinates (PCoA) to 108 test the separation between treatment groups and variation across replicates, based on the phylogenetic 109 distance between bacterial OTUs. For this analysis, we calculated beta diversity of bacterial OTUs using 110 weighted unifrac distance followed by generation of principle coordinates that were used for plotting 111 ordination. For this analysis we used Qiime 1.9.1 [14]. We calculated beta diversity using function 112 beta_diversity.py and plotted PCoA using principal_coordinates.py and make_2d_plots.py in Qiime 1. 113 CAPdiscrim is a constrained ordination analysis method, where groups are defined a priori and LDA 114 maximises between group variation. For our analysis, we report the 3 main outputs of CAPdiscrim, a) 115 classification success, b) proportion of trace (LD1, LD2) and c) p value of MANOVA. Percent classification 116 success suggests how well groups can be distinguished, whereas proportion of trace (LD1 and LD2) 117 represents the linear discriminants that best contribute to the classification success and explain maximum 118 between-group variance. On the other hand, PCoA is an unconstrained analysis where samples are not 119 categorized into groups. PCoA creates an ordination that best explains the variation across the entire 120 dataset.