Intestinal microbes: an axis of functional diversity among large marine consumers

Microbes are ubiquitous throughout the world's oceans, yet the manner and extent of their influence on the ecology and evolution of large, mobile fauna remains poorly understood. Here, we establish the intestinal microbiome as a hidden, and potentially important, ‘functional trait’ of tropical herbivorous fishes—a group of large consumers critical to coral reef resilience. Using field observations, we demonstrate that five common Caribbean fish species display marked differences in where they feed and what they feed on. However, in addition to space use and feeding behaviour—two commonly measured functional traits—we find that interspecific trait differences are even more pronounced when considering the herbivore intestinal microbiome. Microbiome composition was highly species specific. Phylogenetic comparison of the dominant microbiome members to all known microbial taxa suggest that microbiomes are comprised of putative environmental generalists, animal-associates and fish specialists (resident symbionts), the latter of which mapped onto host phylogeny. These putative symbionts are most similar to—among all known microbes—those that occupy the intestines of ecologically and evolutionarily related herbivorous fishes in more distant ocean basins. Our findings therefore suggest that the intestinal microbiome may be an important functional trait among these large-bodied consumers.


Content Description
Data Availability: DOI and accession numbers for study related data. Sample Naming: Naming scheme for samples. Supplementary Methods: Details on DNA extraction, sequencing, and read processing, and functional analysis. Table S1: Number of bites observed for each herbivore species at each site. Table S2: Summary of field-based feeding observations. Table S3: Metadata and microbiome diversity estimates for each sample. Table S4: Total taxonomic diversity (by Class) of herbivore microbiomes.  Table S7: Accession numbers and unique codes for sequence data in Figure 3. Figure S1: Class-level relative abundance of microbial communities from each sample.

Data Availability
We made additional data products and processing scripts available through online repositories. • You can also access the complete figshare project repository here.

Sample Naming
Raw fastq data files were named using the root format RunQ_GnSpe000_G, where Q was the run number (1, 2, or 3), GnSpe was the host genus and species, 00 was a unique host ID number, and G was the gut segment (F = foregut; M = midgut; H = hind). For example, the file name Run1_SpVir11_M_S147_L001_R2_001.fastq corresponded to: the reverse read; midgut sample; Sparisoma viride; individual 11; Run01. All raw data are publicly available. The 53 individual fish encompassed seven species and three genera. Two species-Sparisoma chrysopterum and Scarus vetula-were only represented by 1 and 2 individuals, respectively. Though we omitted these samples from the analysis due to low sample size, they were sequenced and processed along with the rest of the samples. The data are publicly available.

Study organisms
The five species of herbivorous fishes in this study-Acanthurus coeruleus, Acanthurus tractus, Scarus taeniopterus, Sparisoma aurofrenatum, and Sparisoma viride-all feed on benthic algae, but each species feeds in a different way and targets different components of the algal assemblage. Sp. viride is an excavating bioeroder that uses its strong, beak-like jaws to remove reef carbonates while feeding on endolithic algae [1]. The grazer Sc. taeniopterus primarily scrapes epilithic algal turfs from reef surfaces, while Sp. aurofrenatum tends to browse on erect macroalgae and longer turf algae that it tears from the reef [2,3]. Both A. coeruleus and A. tractus crop algal filaments and browse on macroalgae, but the two species feed at different rates, digest food via different mechanisms [4], and display species-specific feeding preferences [5].

DNA extraction, sequencing, & read processing
For all samples, we homogenized material from each gut segment (fore, mid, hind) in separate 50 mL conical tubes for 10 minutes on a Vortex Genie 2. We collected 200 mg (wet weight) of homogenate for DNA extraction following the Human Microbiome Project Core Microbiome Sampling Protocol A (v12.0, HMP Protocol # 07-001) for stool samples. We heat treated each sample, first at 65°C for 10 minutes, followed by 95°C for 10 minutes. We then used the PowerSoil® DNA Isolation Kit (formerly MoBio, Carlsbad CA, USA) following the manufacturer's protocol to extract community DNA from each sample. Extracted DNA was sequenced on an Illumina MiSeq by Integrated Microbiome Resource (Dalhousie University). We targeted the V4-V5 hypervariable region using 515F and 926R primers [6]. We generated sequence data for 159 samples-three gut segments (fore, mid, and hind) from 53 individuals. Sequencing was conducted across three runs. In Run01, 144 samples were sequenced and, due to lower than average yield, were re-sequenced (Run02). The remaining 15 samples were sequenced on a separate run (Run03).
Cutadapt [7] was used to remove adapter sequences (max error rate = 0.12) and then mothur [8] was used to merge fastq files for each gut segment by individual and run (i.e., to pool the sequencing data produced among the three gut segments within an individual). We used DADA2 [9] following the Bioconductor workflow (v2) proposed by Callahan and colleagues [10] to filter and trim based on the quality profiles (maxN = 0; maxEE = 2, 5; truncQ = 2; and truncLen = 270, 200), error correct, dereplicate, and infer amplicon sequence variants (ASVs). ASVs are analogous to OTUs, but have higher (single nucleotide) resolution [11]. We merged pair-end reads, constructed sequence tables for each run, and removed amplicons > 380 or < 368 base pairs. We combined sequence tables from replicate runs (Run01 & Run02) and merged this table with the sequence  table from Run03. Finally, we removed chimeras (method = "consensus") and assigned taxonomy against the Silva_nr_v132_train_set. See the electronic supplementary materials for details about obtaining the DADA2 workflow and associated data products.
In addition to the host species listed in the main text, we also collected two individuals from Scarus vetula, and one individual from Sparisoma chrysopterum in July 2016 at Pickles Reef, Upper Florida Keys, USA. Though these samples were sequenced and the data is publically available, the samples were removed from the study due to a low sample size of collected fish.
See the Data Availability section above for accession numbers and more details on obtaining raw sequencing data, workflows, data products, etc.

Inferring metagenomic function
We attempted to use PICRUSt and PICRUSt2 to predict the putative metagenome functional content of each microbiome; however, because so few intestinal microbiome studies have been conducted to-date with regard to herbivorous coral reef fishes, our weighted Nearest Sequenced Taxon Index (NSTI) scores were above the recommended cutoff (0.03). NSTI scores for PICRUSt ranged from 0.040 to 0.248 (mean: 0.099). PICRUSt2 performed slightly better with NSTI values from 0.045 to 0.166 (mean: 0.099), however still above the recommended cutoff. Therefore, we concluded that both PICRUSt nor PICRUSt2 were inappropriate analyses for these data.

Additional alpha diversity tests
We also conducted Shapiro-Wilk Normality Tests on of the following alpha diversity metrics: Observed AVS, Chao1, and Inverse Simpson. Though only the Shannon index was normally distributed, we conducted additional tests of the non-normally distributed indices. Since host species is categorical, we used Kruskal-Wallis (non-parametric equivalent of ANOVA) to test for significance. For the inverse Simpson, Chao1, and Observed richness the results of the Kruskal-Wallis rank sum test were all significant. We then used Wilcoxon rank sum test for post-hoc analysis of each pairwise comparisons for each index.
Again we saw that only Sp. aurofrenatum was significantly different from the other hosts. For the inverse Simpson index, Sp. aurofrenatum was significantly different from three of the four host species. For both Chao1 richness estimator and Observed ASV richness, Sp. aurofrenatum was significantly different from all other host species.
Complete code and results can be found on the project website. https://projectdigest.github.io/4_diversity.html#statistical_tests

Associations between intestinal microbes, host phylogeny, & herbivore foraging behaviour
We used a series of simple and partial Mantel tests to assess whether DA ASVs were associated with a fish species' foraging ecology and/or phylogenetic history. The dissimilarity matrix for foraging ecology was based on the behavioural foraging data collected on the reef (see above). The phylogenetic dissimilarity matrix was based on a phylogenetic tree we constructed using cytochrome oxidase subunit 1 (COI) genes retrieved from NCBI's Nucleotide Database for the five fish species used in this study.
We used Clustal Omega [12] to align sequences (default settings for DNA) and Jalview [13] to manually curate and trim the final alignment to 593 bp. The alignment contained COI genes from Sc. taeniopterus (n = 5), Sp. aurofrenatum (n = 22), Sp. viride (n = 21), A. coeruleus (n = 28), and A. tractus (bahianus; n = 23), with members of the Gerridae (2 Eucinostomus and 4 Gerres) used as the outgroup. We used RAxML [14] and the GTR model for tree computation and the GAMMA rate model for likelihoods. The tree was then transformed into a distance matrix using the cophenetic function in R [15]. Finally, we constructed separate dissimilarity matrices to focus on putative resident ASVs and putative environmental ASVs, respectively. All matrices were constructed based on Bray-Curtis dissimilarity of Hellinger transformed data using the vegan package [16] in R [15]. Summary data from observations of individual bites by five species of herbivorous fishes in the Florida Keys. Data includes the mean sediment depth and turf height of algal assemblages fed on by each species at each of three sites, as well as the proportion of bites that resulted in a grazing scar on the substrate where reef calcium carbonate had been removed (Prop. grazing scar). Table summarizes the proportion of bites on vertical (> 45 degrees), concave, and convex substrates as well as the relative proportion of bites targeting each of the 10 food types most commonly bitten during the observations. Most macroalgae were aggregated to genus while other food types are summarized by functional group.      This table provides lookup details for all neighbor sequences from BLAST and SILVA-ACT comparisons used in the tree from Figure 3 (main paper). In the tree, sequences are named after the isolation source (e.g., Naso unicornis) plus a unique abbreviated code. This table gives the full accession numbers for each leaf.
• Accession number Full accession numbers for each leaf.
• Tree code Parenthetical code used in the tree.
• Host, Isolation source, Location Details about source of sequence.
• Link Hyperlink to sequence page from NCBI Nucleotide database.

Figure S1
In Figure 2A of the main paper, Class-level abundance was displayed by host species. Here the same data is presented for each individual samples.