Preen gland microbiota covary with major histocompatibility complex genotype in a songbird

Pathogen-mediated selection at the major histocompatibility complex (MHC) is thought to promote MHC-based mate choice in vertebrates. Mounting evidence implicates odour in conveying MHC genotype, but the underlying mechanisms remain uncertain. MHC effects on odour may be mediated by odour-producing symbiotic microbes whose community structure is shaped by MHC genotype. In birds, preen oil is a primary source of body odour and similarity at MHC predicts similarity in preen oil composition. Hypothesizing that this relationship is mediated by symbiotic microbes, we characterized MHC genotype, preen gland microbial communities and preen oil chemistry of song sparrows (Melospiza melodia). Consistent with the microbial mediation hypothesis, pairwise similarity at MHC predicted similarity in preen gland microbiota. Counter to this hypothesis, overall microbial similarity did not predict chemical similarity of preen oil. However, permutation testing identified a maximally predictive set of microbial taxa that best reflect MHC genotype, and another set of taxa that best predict preen oil chemical composition. The relative strengths of relationships between MHC and microbes, microbes and preen oil, and MHC and preen oil suggest that MHC may affect host odour both directly and indirectly. Thus, birds may assess MHC genotypes based on both host-associated and microbially mediated odours.

We used seed-baited Potter traps and mist nets to capture birds at two locations in southern Ontario, Canada: on Western University property in London (43.008°N, 81.291°W) and at the rare Charitable Research Reserve in Cambridge (43.383°N, 80.357°W). We captured and sampled 31 adult song sparrows: 19 (11 males, 8 females) in London from 8 August to 1 September 2017 and 12 (8 males, 4 females) in Cambridge from 3 April to 1 May 2017.
From each bird, we gently probed the preen gland with an unheparinized capillary tube to express approximately 1-5 mg of preen oil. Immediately following preen oil collection, we swabbed the preen gland to collect microbes from both inside and outside the gland (see electronic supplementary material for details). The small body size of song sparrows (approx. 20 g) and the correspondingly small size of the preen gland and papilla (gland dimensions average 7.4 mm W by 4.2 mm L, papilla width averages 1.4 mm, electronic supplementary material, figure S1) prevent noninvasively and nonsurgically sampling microbes from directly within the gland while excluding microbes immediately outside the gland. Instead, because preen oil is frequently excreted from the preen gland, our external swabbing method was designed to collect bacteria living immediately outside the preen gland, as well as those inhabiting the gland. Because preen oil is routinely groomed through the feathers after being released from the preen gland, we think it likely that both bacteria inhabiting the preen gland and those around its external surface may modify the composition of preen oil on feathers through manipulation or degradation.
We collected a small blood sample (approx. 20 µl) from each bird through brachial venipuncture and, after the field season, sexed all birds using the P2/P8 PCR protocol described by Griffiths et al. [27]. This blood sample was also used for MHC genotyping. To ensure individuals were only sampled once, each bird was banded before release at the site of capture.

Bacterial DNA extraction and 16S amplification
Following [23], we extracted bacterial DNA from swabs using Qiagen DNeasy PowerSoil DNA isolation kits with some modifications to the manufacturer's recommended protocol (see electronic supplementary material for detailed protocol). Extractions were carried out in batches, each consisting of 23 preen gland swabs plus one extraction from a fresh sterile swab that served as a negative control. We amplified the V4 region of the bacterial 16S rRNA gene using the universal primers F518 [28] and R806 [29]. In addition to the priming sequences, each primer included an Illumina MiSeq adaptor sequence, four randomized nucleotides and a unique 'barcode' of eight nucleotides. We performed PCR in a total volume of 25 µl, including 10 µl of Quantabio 5PRIME HotMasterMix, 0.2 µM of each primer and 2 µl of DNA template (mean concentration = 0.1 ng µl −1 ). The thermocycling profile consisted of 2 min at 94°C; 35 cycles of 45 s at 94°C, 60 s at 50°C and 90 s at 72°C; and a 10 min final extension at 72°C. Amplification was confirmed by running samples on a 2% agarose gel.

Sequencing and pipeline
We pooled PCR products into a library and sequenced with 250 nt paired-end reads on an Illumina MiSeq at the London Regional Genomics Centre. We used a pipeline [30] to collapse sequences into clusters of identical reads and assign sequences to individuals. We used a second pipeline [31] and the R package dada2 [32] to overlap reads, remove ambiguous reads and filter out chimaeras and singleton sequences (i.e. those appearing only once in the dataset) and those rarer than 0.1% in any sample. We assigned each unique sequence (i.e. sequence variants; hereafter SVs) to bacterial taxon by clustering at greater than or equal to 97% sequence identity (following [30]) using the naive Bayesian Ribosomal Database Project (RDP) Classifier [32,33]. The workflow and parameters used are the same as in [23] and are available at https://github.com/ggloor/miseq_bin/. Raw read count data, including taxonomic assignments for each sample, are available in the electronic supplementary material.
We used a compositional data analysis approach [34] that examines the read ratios between sequences. Most datasets do not actually contain all possible components; often, small values, including values below the detection limit of an instrument such as the Illumina MiSeq, are rounded off to zero. In such cases, zero counts reflect sampling or equipment limitations; that is, they are not true zeros but low counts for which the true value is unknown [35]. Because replacing these values with zero or discarding them can lead to estimation bias, such values should be imputed using an estimation method [35]. Accordingly, following [31], we used Bayesian-multiplicative replacement to impute values for zero count sequences using the R package zCompositions [35]. We then applied a royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210936 centred log-ratio (clr) transformation to the zero-replaced dataset, which renders the use of Euclidean distances meaningful in subsequent analyses [31,36].
Next, because rare sequences that occur in only a few samples are generally uninformative, and because samples with very low read counts are more likely to represent undersampling, we filtered sequences by the minimum proportion, minimum occurrence and minimum sample count of reads. Thus, sequences found in fewer than 0.5% of reads, sequences found in fewer than 10% of samples and samples with fewer than 5000 reads were removed from the initial dataset. We also removed three non-bacterial SVs identified as chloroplasts. The filtered dataset contained 44 SVs from across the 31 samples (mean ± s.e. SVs per individual = 26.4 ± 0.66, electronic supplementary material, table S1). Most of these retained SVs (39/44) were classified to the genus level. These methods were applied to a larger dataset [23]; none of the 31 samples used in this study were removed as a result of these processing steps. Detailed methods and quality control procedures are outlined in the electronic supplementary material.

MHC genetic analysis
We amplified the hypervariable second exon of MHC class II (approx. 350 nt) using primers SospMHCint1f [22] and Int2r.1 [37]. In addition to the priming sequences, each primer included an Illumina MiSeq adaptor sequence, four randomized nucleotides and a unique 'barcode' of eight nucleotides. Detailed PCR conditions and MHC sequencing methods are described elsewhere [8]. Briefly, we aligned amino acid sequences in MEGA v. 7.0 [38] and trimmed based on comparison to conspecific sequences in GenBank [39]. Trimming resulted in alleles of 73-86 amino acids, corresponding to most of exon II. Next, we used a pipeline [30] to collapse sequences into clusters of identical reads and assign sequences to individuals, retaining sequences comprising at least 1% of an individual's reads (mean ± s.e. retained reads per individual = 20 736 ± 1939).
We assigned each retained sequence to its corresponding protein sequence, removed any putative pseudogenes following [22], and applied Bayesian-multiplicative replacement and clr-transformation to the data to allow comparison to the microbial dataset. In some cases, different DNA sequence reads translated to the same amino acid sequence. For these, we calculated the average log-ratio value so that only unique protein sequences were included in further analysis. Across all 31 birds, we detected 151 unique amino acid alleles (mean ± s.e. alleles per individual = 16.23 ± 0.61).

Preen oil chemical analysis
We dissolved preen oil samples in 1-5 ml chloroform (CHCl 3 ; scaled for the volume of preen oil collected for a final concentration of 1 mg preen oil/ml CHCl 3 ) and analysed them using an Agilent 7890A gas chromatograph with flame ionization detector (GC-FID), fitted with a 5% phenyl methyl siloxane column (Agilent Technologies DB-5, 30 m × 0.32 µm ID × 0.25 µm film thickness). We followed the protocol and temperature profile described in [25]. Because the volume of preen oil collected varied across individuals, we quantified peak sizes based on the proportional peak size relative to total chromatogram peak area. To maintain comparability with the 16S and MHC genetic datasets we applied Bayesian-multiplicative replacement and clr-transformation to these proportional data. Across all 31 birds, we detected 65 unique preen oil peaks (mean ± s.e. peaks per individual = 24.6 ± 1.1).

Data analysis
All statistical analyses were performed in R v. 3.3.3 [40]. We used the clr data to construct Euclidean distance matrices for each dataset ( preen gland microbial community composition, MHC amino acid genotype and preen oil chemical composition). Distances were calculated between all pairwise dyads. We compared these pairwise distance matrices (31 × 31 matrices, 961 pairwise combinations per matrix) in three separate tests. We ran Mantel tests in the vegan package [41] with 10 000 permutations to assess correlations (Spearman's r) between (i) MHC amino acid distance (hereafter 'MHC distance') and preen gland microbial distance (hereafter 'microbial distance'), (ii) microbial distance and preen oil chemical distance (hereafter 'chemical distance') and (iii) MHC distance and chemical distance.
Because we sampled birds from two locations and both sexes, for each of the MHC, microbial and chemical Euclidean distance matrices we performed permutational multivariate analysis of variance (PERMANOVA) tests to compare the magnitude of pairwise differences for individuals from different royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210936 populations and sexes. These analyses were done to assess the degree to which population or sex differences at MHC, microbial composition or preen oil chemistry might contribute to associations observed. This was particularly important because we pooled individuals from different populations and sexes and those sampled at different times of year. Pooling was done to maximize statistical power when testing for relationships between MHC, microbial and chemical distance.
Given the possibility that only a subset of microbes covary with MHC and/or influence preen oil composition, and that only a subset of preen oil chemicals reflect MHC, we used the 'bioenv' [42] function in the vegan package to identify the subset of bacterial SVs that best reflect MHC distance (which we interpreted as candidate SVs that may be particularly sensitive to MHC genotype), the subset of bacterial SVs that best predict chemical distance (interpreted as candidate SVs that may influence preen oil chemical composition) and the subset of preen oil chemical peaks that best reflect MHC distance (interpreted as candidate compounds for chemical cues of MHC genotype). This was achieved by considering all possible combinations of variables (SVs, chemical peaks) at increasing levels of complexity, up to a user-specified maximum (here, 6 SVs or peaks). The approach then identifies the variables that maximize the rank correlation between two distance matrices (here, MHC and microbial distance; microbial and chemical distance; MHC and chemical distance).
The rank correlations performed by bioenv are based on a large number of interdependent similarity calculations, so statistical tests of the bioenv results are inappropriate. Instead, these results are used as an exploratory tool, where the optimal variables identified are considered promising candidates for further study. Bias towards a more complicated explanation than the data support is not a major concern with this analysis, because the correlation coefficient tends to decrease with the inclusion of unimportant variables [42]. By selecting only the top six variables (out of a possible 44 SVs, 65 peaks, and 151 alleles) that maximize correlations, such bias is unlikely to be an issue in this dataset.
Finally, song sparrow preen oil is made up of wax monoesters arranged in a series of different combinations of different chain length fatty alcohols and fatty acids [25]. To infer the major and minor acid : alcohol esters in the subset of preen oil peaks that best reflect MHC distance in our bioenv analysis, we compared our GC-FID data with a previously published analysis of song sparrow preen oil that used GC-FID and GC-mass spectrometry approaches [25].
PERMANOVA on the Euclidian distance matrices for microbes, MHC and preen oil identified significant differences in preen gland microbiota, MHC genotype and preen oil chemistry among populations, but not between the sexes (table 1).
Permutation analysis (bioenv) identified a best combination of 6 SVs at which microbial and MHC distance matrices were maximally correlated (Mantel's r = 0.42, figure 2a, electronic supplementary material, table S2). Two of these SVs (Micrococcus and Bacillus) best maximized correlations (indicated by a change in Mantel's r ≥ 0.05, electronic supplementary material, table S2), suggesting that the relative abundance of these taxa may be particularly sensitive to MHC genotype. We also identified a best combination of six SVs at which microbial and chemical distance matrices were maximally correlated (Mantel's r = 0.48, figure 2b, electronic supplementary material, table S3). Three of these SVs (Methylobacterium, Bradyrhizobium and Xylophilus) best maximized correlations, suggesting that these taxa are candidates for influencing preen oil chemical composition. We then identified a best combination of six GC-FID peaks at which chemical and MHC distance matrices were maximally correlated (Mantel's r = 0.54, figure 2c, electronic supplementary material, table S4). Two of these peaks (2,24) best maximized correlations between preen oil and MHC, suggesting that they are candidates for chemical cues of MHC genotype. We were not able to infer the chemical composition of these peaks, but we inferred the major and minor wax monoesters of four of the six preen oil peaks that occurred repeatedly in all subset sizes (10, 18, 27 and 92) and an additional peak (51) that was identified in models with a smaller number of subsets (electronic supplementary material,

Discussion
Understanding the potential interconnections between MHC, microbiota and animal odour is best addressed by examining all three links simultaneously [43]. Recent experimental evidence implicates preen gland microbes in producing behaviourally relevant preen oil compounds in dark-eyed juncos (Junco hyemalis) [24]. In wild and captive song sparrows [6,20] and black-legged kittiwakes (Rissa tridactyla) [44], individuals that are more similar at MHC class II also have more similar preen oil chemistry. However, a recent review [43] identifies a critical knowledge gap in that no studies have concurrently characterized MHC, microbiota and chemical profiles or the relationships between all three. To our knowledge, ours is the first study to do so.
We hypothesized that the correlation previously described between MHC class II genotype and preen oil chemistry [8,22,44] is mediated by microbial communities associated with the preen gland. That is, we hypothesized that MHC genotype shapes preen gland microbial communities, which in turn contribute to variation in the chemical composition of preen oil. Consistent with this microbial mediation hypothesis, dyads that were more similar at MHC also had more similar preen gland microbiota. Permutation analysis identified a subset of six preen oil wax ester peaks that maximized the rank correlation between MHC class II amino acid distance and preen oil chemical distance. Euclidean distances were calculated from all pairwise dyads (N = 31 birds; 465 pairwise combinations). Ellipses indicate the normal-probability contours at 95% confidence. Axis scales differ between panels because different subsets generate different distance ranges. However, inconsistent with the hypothesis, pairwise similarity in preen gland microbiota did not significantly predict similarity in the chemical composition of preen oil. Furthermore, when microbial similarity was calculated based on the full set of sequences retrieved from preen gland samples, the association between matrices of MHC and microbial distance and that between matrices of microbial and chemical distance were not consistently stronger than that between MHC and chemical distance. Together, these findings suggest that although preen gland microbial community composition appears sensitive to host genotype at MHC class II, the resulting variation in overall community composition does not drive individual variation in host chemical cues. The fact that preen oil composition was more strongly related to MHC genotype than to overall preen gland microbiota (r = 0.39 versus r = 0.02) suggests that, counter to our prediction, the effects of MHC on preen oil composition are not mediated primarily through preen gland microbiota. Instead, MHC genotype may affect host odour more directly. Alternatively, the correlation we observed between MHC and preen oil chemical similarity could reflect indirect effects stemming from additional factors such as geographical variation [5,6].
MHC peptides bound to MHC proteins directly reflect the structure of the polymorphic peptide binding regions of MHC proteins. These MHC peptides can be secreted in bodily fluids, becoming chemical cues that convey information about MHC genotype [9,14,45]. This is consistent with our findings of a relatively large effect of MHC genotype on preen oil chemical composition, but a caveat is that we analysed the whole wax esters of preen oil rather than volatiles. We expect that such esters are the precursors to preen oil volatiles [25] but we did not measure volatile compounds directly, nor did we identify peptides or metabolites that might be MHC-derived. Consequently, our approach could have constrained our ability to detect effects of preen gland microbiota on odour.
Symbiotic microbes could affect preen oil composition during its production, storage and/or application. Bacteria living within the gland should be more likely to affect preen oil during production and/or storage, while bacteria living on the external surface of the gland and/or on the feathers should be more likely to affect preen oil after its excretion and application to the body surface. We sampled the overall bacterial community associated with the uropygial gland, so we are not able to reliably distinguish microbes from within and outside the gland. However, we reasoned that bacteria excreted with the preen oil we sampled immediately prior to swabbing and bacteria living on the external surface of the gland both have the potential to affect preen oil composition, since the oil is regularly excreted from the gland to the body surface.
The fermentation hypothesis for chemical recognition posits that symbiotic microbes produce host odours [46], but it is of course possible that not all gland-associated microbes do so. Similarly, some microbes may be more sensitive than others to host genotype at MHC class II. We supplemented our analyses of overall microbial distance with permutation analyses to identify subsets of microbes that maximize correlations between (i) microbial distance and MHC distance (candidates for being sensitive to host MHC genotype) or (ii) microbial distance and chemical distance (candidates for affecting host odour through modification of preen oil chemistry). The bacterial genera Micrococcus and Bacillus maximized correlations with MHC distance. Micrococcus and Bacillus have been found within the preen gland of dark-eyed juncos, and each contain species that are known odour producers [24]. For example, M. luteus breaks down sebaceous secretions such as human sweat, contributing to armpit odour [47]. Within the preen gland of birds, these genera may be able to degrade long-chain preen oil wax esters into odorous short-chain compounds. Our finding that the relative abundance of these genera varies with MHC genotype suggests that the metabolites these taxa produce may provide odour cues of MHC genotype.
We expected that the taxa identified through permutation analysis as maximizing correlations with preen oil chemical distance would include those that have previously been implicated in songbird chemical communication, such as Pseudomonas, Burkholderia or Staphylococcus [24,48]. We did not observe this pattern. Instead, genera Methylobacterium, Bradyrhizobium and Xylophilus maximized the rank correlation to preen oil chemistry. These taxa have primarily been found associated with plants and soils [49] and may thus be transiently acquired from the environment rather than being true preen gland symbionts. However, Methylobacterium and Bradyrhizobium have been found in or around the preen gland or on body feathers of other bird species [24,48,[50][51][52][53][54][55].
Some of the genera we identified in association with the preen gland of song sparrows (e.g. Bacillus, Methylobacterium and Sphingomonas) can be found as contaminants in DNA extraction kits and laboratory reagents [56]. While we cannot be certain that no contaminant SVs remained in our final dataset, we royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210936 think it likely that the filtering steps we took removed contaminant SVs, as these would have been orders of magnitude less abundant than the bacteria recovered from the birds. Finally, although we hypothesized that microbial community structure influences preen gland chemistry (as has been shown previously [24]), we cannot rule out the alternative hypothesis that preen oil chemical composition influences the microbiome, by presenting a more or less favourable environment for certain species to inhabit.
The preen oil peaks that maximized correlations between preen oil chemical composition and MHC genotype were mixtures of C12 : C19 through C19 : C16 acid : alcohol monoesters. By contrast to previous results, these peaks do not match the preen oil peaks previously identified as best explaining the relationship between preen oil and MHC genetic similarity in song sparrows from Newboro, Ontario [22]. We sampled birds from London and Cambridge (approx. 400-500 km away from Newboro); thus, differences in influential preen oil peaks could reflect population differences in preen oil composition [25] and MHC genotype.
Our use of two populations may have contributed to the stronger relationship observed between MHC similarity and preen oil similarity (r = 0.38 for all pairwise dyads) relative to previous studies in this and other species (song sparrows, r = 0.11-0.13 [8,22]; black-legged kittiwakes (r = 0.13-0.22) [44]). Previous studies each focused on a single population, while we screened two geographically distinct populations that differed in the composition of MHC, microbes and preen oil (as inferred from greater pairwise distances between than within sites for each of these measures). Thus, we cannot exclude the possibility that significant correlations observed between MHC and both preen gland microbes and preen oil chemical composition do not exclusively reflect causal relationships but are driven at least partly by geographical differences. Consistent with this possibility, preen gland microbial communities are influenced by environmental factors, though genetic factors also play a role [20]. Another limitation of our study is that we are unable to disentangle the potential influence of geographical differences from the possibility that seasonal or physiological differences may have affected our results. Due to personnel limitations, we sampled the preen gland microbiota of London birds during post-breeding and of Cambridge birds during breeding. Seasonal changes in bird bacterial communities have not been well studied, but they have been observed in cloacal bacteria [57]. Surprisingly, we did not detect sex differences in preen oil chemical composition, despite collecting preen oil from breeding condition birds. This is in contrast to prior work in this species [25,26], and we suspect may be due to the smaller sample size used here.

Conclusion
Song sparrows with more similar MHC genotypes were more similar in the overall community structure of their preen gland microbiota. Overall microbial similarity did not predict chemical similarity of preen oil, despite a robust positive correlation between MHC similarity and chemical similarity, but permutation analysis identified a microbial subset that was strongly predictive of preen oil chemistry. Taken together, our findings suggest that previously reported correlations between MHC and preen oil composition may be explained partly by direct effects of MHC on preen oil chemical composition and partly by indirect effects mediated by a subset of microbes inhabiting the preen gland.