Polyclonal symbiont populations in hydrothermal vent tubeworms and the environment

Horizontally transmitted symbioses usually house multiple and variable symbiont genotypes that are acquired from a much more diverse environmental pool via partner choice mechanisms. However, in the deep-sea hydrothermal vent tubeworm Riftia pachyptila (Vestimentifera, Siboglinidae), it has been suggested that the symbiont Candidatus Endoriftia persephone is monoclonal. Here, we show with high-coverage metagenomics that adult R. pachyptila house a polyclonal symbiont population consisting of one dominant and several low-frequency variants. The dominance of one genotype is confirmed by multilocus gene sequencing of amplified housekeeping genes in a broad range of host individuals, in which three of four loci (atpA, uvrD and recA) revealed no genomic differences, while one locus (gyrB) was more diverse in adults than in juveniles. We also analysed a metagenome of free-living Endoriftia and found that the free-living population showed greater sequence variability than the host-associated population. Most juveniles and adults shared a specific dominant genotype, while other genotypes can dominate in a few individuals. We suggest that, although generally permissive, partner choice is selective enough to restrict the uptake of some genotypes present in the environment.


Introduction
Nearly all ecosystems harbour a large diversity of mutualistic interactions in which microbes live in physical contact with a eukaryotic host and the partners cooperate to increase each other's fitness compared with solitary life [1,2]. However, the persistence of these symbiotic mutualisms represents a challenge for evolutionary theory [1,3-7]. The evolution of cheaters among cooperating symbionts may lead to cooperators being outcompeted in the host-associated symbiont population, reducing the benefits to the host and thus decreasing its fitness [1,2,8,9]. In addition, a further detriment for the host population ensues if virulent variants arise that can successfully compete with cooperators [10]. Therefore, mechanisms are required to create a positive assortment of symbiont genotypes, so that cooperating partners preferentially interact with each other [11].
For horizontally transmitted symbionts, theory predicts that partner choice mechanisms, through signalling between partners or screening of the partner, play a crucial role in maintaining the associations [3,12,13]. Moreover, the structure and abundance of symbiont populations in the environment, as well as dispersal and contact of both partners, were identified as important factors influencing which symbiont strains are transmitted [14,15]. In particular, horizontal transmission creates the opportunity for uptake of symbiont variants that are optimized for local conditions [15]. In horizontally transmitted symbioses such as the Aliivibrio fischeri-squid and rhizobia-legume associations, symbiont populations within and between individual hosts are polyclonal [16,17], and environmental diversity exceeds host-associated diversity. This suggests that partner choice is selective enough to restrict some genotypes from entering the host, but also permissive enough to allow a specific variety of genotypes into the host [16,18,19].
The hydrothermal vent tubeworm Riftia pachyptila (hereafter Riftia; Vestimentifera, Siboglinidae) houses the sulfur-oxidizing bacterial symbiont Candidatus Endoriftia persephone (hereafter Endoriftia) in its symbiont-housing organ, the trophosome. These symbionts are transmitted horizontally through uptake from the environment upon settlement of the larvae [20], and viable bacteria can be released upon host death, suggesting connectivity with the free-living symbiont population [21]. Indeed, several vestimentiferan host species share the same symbiont 16S rRNA phylotype [15,22-27], which was also detected free-living among tubeworm clumps in diffuse vent flow areas, on cold basalt and in the water column at ambient deep-sea conditions [28,29]. Previous studies in Riftia have suggested that the symbionts are clonal or nearly clonal [22-24], with one study based on two adult Riftia individuals reporting very low nucleotide variation of 0.29% across 30 selected loci [22]. While clonal symbiont populations are common among vertically transmitted symbionts, they are unknown among horizontally transmitted symbionts [30]. However, Endoriftia diversity was previously assessed only by 16S rRNA and internal transcribed spacer (ITS) sequencing or low-coverage metagenomics, so that neither the abundance nor the genotypic composition of free-living or host-associated symbionts has been characterized [22,23]. Hence, it remains unknown whether the giant tubeworm symbiosis is indeed unusual in terms of symbiont diversity or partner choice.
Because transmission in Riftia is horizontal, we hypothesized that the establishment of host-associated Endoriftia is a dynamic process involving genotypically diverse symbiont populations acquired from the environment, rather than a monoclonal symbiont population as previously suggested [22,23]. Applying high-coverage metagenomics to one environmental sample taken underneath a tubeworm clump and to one Riftia specimen from this clump, as well as multilocus gene sequencing to a broad range of host-associated and environmental populations, we show that host specimens at the hydrothermal vent environment indeed contain polyclonal populations dominated by one genotype, and that symbiont strain diversity in the environment exceeds diversity in the host. We propose that the community composition and density of free-living symbiont strains, competition between symbiont strains during host growth, and competition of symbionts released from dead hosts with other microbes in the ambient environment [21] contribute to the distribution of the host-associated and free-living symbiont populations.

Material and methods
(a) Sampling and DNA extraction
Tubeworm specimens of R. pachyptila, as well as biofilm, sediment and water samples, were collected from vigorous diffuse-flow vents on the East Pacific Rise (EPR) at 9°50′ N (vent sites Tica and P-Vent) and 13° N (vent sites Janine and Genesis) at approximately 2500 m depth in 2009, 2010 and 2011 (electronic supplementary material, figure S1 and tables S1 and S5). Juveniles between 0.3 and 1 cm long, similar in size to specimens that clearly lacked gonads [31], were recovered attached to lower parts of adult tubes. Freshly collected tubeworms were dissected aboard ship within 1.5 h of collection, and small trophosome pieces were fixed in ethanol or flash frozen in liquid nitrogen. Water samples were taken with the conductivity-temperature-pressure device above hydrothermal vent plumes, filtered immediately aboard onto 0.22 µm pore-size filters (Millipore GTTP Isopore polycarbonate membrane filters, 25 mm diameter) and fixed in ethanol or glycerol/Tris/ethylenediaminetetraacetate (EDTA) buffer, or flash frozen in liquid nitrogen. Basalt rocks were transported to the surface in insulated boxes with in situ seawater. Biofilm samples were scraped off rocks using sterile scalpels and either fixed in ethanol or preserved in glycerol/Tris/EDTA (TE) buffer (0.1 M Tris (pH 7.5) and 1 mM EDTA (pH 8.0)) and flash frozen in liquid nitrogen (electronic supplementary material, tables S2 and S5). Genomic DNA was extracted from trophosome tissue, water filters and biofilm samples using the FastDNA® SPIN Kit for Soil (MP Biomedicals) according to the manufacturer's instructions.
Two universal primer pairs targeting the recA and ileS loci were used to amplify gene fragments from one trophosome sample by polymerase chain reaction (PCR) (for detailed information, see the electronic supplementary material). These amplicons were cloned to compare the specificity of the universal primers with that of symbiont-specific primers, and the diversity among the clones was examined. Furthermore, the five selected housekeeping genes (atpA, uvrD, recA, gyrB and ileS) were amplified by PCR using symbiont-specific primers for direct sequencing of the host-associated symbiont population. For the free-living symbiont population, only the atpA- and uvrD-specific primers successfully amplified the target loci; these amplicons were cloned and further analysed (for detailed information, see the electronic supplementary material). In addition, we used specific primers for the gene encoding the vestimentiferan host exoskeleton protein RP43 of Riftia, Tevnia jerichonana and Oasisia alvinae (RifTOExoF and RifTOExoR [28]) to exclude host contamination; all biofilm, sediment and filtered water samples tested negative for host DNA.
Chromatograms of each host-associated and free-living Endoriftia sequence were manually proofread with FinchTV 1.5.0 (Geospiza). Consensus sequences of forward and reverse reads were generated with CodonCode Aligner 3.7.1.2, and single-nucleotide polymorphisms (SNPs) were accepted only if supported by both the forward and the reverse read. Codon-based multiple sequence alignments for each locus were generated in MEGA 5.2.2 with MUSCLE and confirmed by visual inspection. For the host-associated Endoriftia population, a unique number was assigned to each allele at each locus. The allele numbers of all loci together form an allelic profile, which defines the sequence type (ST) of each sample. For the free-living Endoriftia population (Nfree), some allelic variants were judged not to belong to the symbiont population because of their greater nucleotide variation at the atpA and uvrD loci compared with the host-associated population (Nsym), which showed no variation at these two loci.
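The allele-numbering and ST-assignment scheme described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: it assumes one aligned consensus sequence per locus and sample, numbers alleles in order of first appearance (samples sorted by name), and numbers each distinct allelic profile as an ST in the same way. The function names and toy sequences are hypothetical.

```python
from typing import Dict, Tuple

LOCI = ("atpA", "uvrD", "recA", "gyrB")
Profile = Tuple[int, ...]


def allelic_profiles(seqs: Dict[str, Dict[str, str]]) -> Dict[str, Profile]:
    """Map each sample to a tuple of allele numbers, one per locus in LOCI.

    seqs: {sample_id: {locus: consensus_sequence}} (hypothetical input layout).
    Identical sequences at a locus receive the same allele number.
    """
    allele_ids: Dict[str, Dict[str, int]] = {locus: {} for locus in LOCI}
    profiles: Dict[str, Profile] = {}
    for sample in sorted(seqs):
        numbers = []
        for locus in LOCI:
            known = allele_ids[locus]
            seq = seqs[sample][locus]
            if seq not in known:
                known[seq] = len(known) + 1  # a new allele gets the next free number
            numbers.append(known[seq])
        profiles[sample] = tuple(numbers)
    return profiles


def sequence_types(profiles: Dict[str, Profile]) -> Dict[str, int]:
    """Assign one ST number per distinct allelic profile."""
    st_of_profile: Dict[Profile, int] = {}
    result: Dict[str, int] = {}
    for sample in sorted(profiles):
        profile = profiles[sample]
        if profile not in st_of_profile:
            st_of_profile[profile] = len(st_of_profile) + 1
        result[sample] = st_of_profile[profile]
    return result


# Toy example: hostC differs from hostA and hostB only at gyrB (made-up sequences).
toy = {
    "hostA": {"atpA": "ACGT", "uvrD": "TTGA", "recA": "GGCA", "gyrB": "CATG"},
    "hostB": {"atpA": "ACGT", "uvrD": "TTGA", "recA": "GGCA", "gyrB": "CATG"},
    "hostC": {"atpA": "ACGT", "uvrD": "TTGA", "recA": "GGCA", "gyrB": "CAAG"},
}
profiles = allelic_profiles(toy)
sts = sequence_types(profiles)
print(sts)  # {'hostA': 1, 'hostB': 1, 'hostC': 2}
```

Numbering by first appearance is only one convention; published MLST schemes usually take allele numbers from a curated database instead.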
royalsocietypublishing.org/journal/rspb Proc. R. Soc. B 286: 20181281
The clonal diversity of the host-associated symbiont population was examined with the eBURST tool, using the four loci atpA, uvrD, recA and gyrB [32]. Genetic diversity was calculated from the average nucleotide diversity and from the number of segregating sites [35]. To test the neutral equilibrium model, Tajima's D [36] was calculated for each locus (p < 0.01) from the number of segregating sites and the pairwise sequence differences in R with the packages 'ape' v.3.4 and 'pegas' v.0.9.
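To show what the diversity statistics named above compute, here is a standalone Python sketch of per-site nucleotide diversity (π), Watterson's estimator from the number of segregating sites, and Tajima's D with the standard Tajima (1989) variance constants. It is not the 'pegas' implementation used in the study and does not compute a p-value. It assumes gap-free, equal-length aligned sequences and at least four sequences; with fewer, the variance term of D is zero.

```python
import math
from itertools import combinations
from typing import List, NamedTuple


class DiversityStats(NamedTuple):
    n: int
    segregating_sites: int
    pi_per_site: float
    theta_w_per_site: float
    tajimas_d: float


def diversity_stats(seqs: List[str]) -> DiversityStats:
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Segregating sites: alignment columns with more than one distinct base.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # Mean number of pairwise differences (k); pi per site is k / length.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Tajima (1989) constants.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    theta_w = s / a1  # Watterson's estimator for the whole sequence
    if s == 0:
        d = float("nan")  # no segregating sites: D is undefined
    else:
        d = (k - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return DiversityStats(n, s, k / length, theta_w / length, d)


alignment = ["AAAA", "AAAA", "AAAT", "AATT"]  # toy alignment, not study data
stats = diversity_stats(alignment)
print(stats.segregating_sites, round(stats.pi_per_site, 4))  # 2 0.2917
```

A positive D means the mean pairwise difference exceeds the value expected from the number of segregating sites under neutrality; significance testing is left to dedicated packages.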

(c) Metagenomic comparison
For the host-associated symbiont population from a single Riftia specimen from the Tica site at 9°50′ N EPR, Percoll density gradient centrifugation was used to obtain a symbiont-enriched, host-depleted fraction from the trophosome [23]. Trophosome tissue was homogenized in imidazole-buffered saline (0.49 M NaCl, 0.03 M MgSO4, 0.011 M CaCl2, 0.003 M KCl and 0.05 M imidazole, pH 7.1) and separated by Percoll centrifugation. Subsamples of the gradient were analysed by fluorescence in situ hybridization (FISH), and host-depleted fractions were washed in TE buffer and pooled for DNA extraction (see above). For the free-living population, we used a symbiont-enriched biofilm sample from basalt underneath the Riftia clump that contained the host specimen used for metagenomics; this sample was stored in glycerol/TE buffer and flash frozen in liquid nitrogen. Symbiont enrichment in the host-associated and free-living samples was quantified by FISH [37], and the absence of host tissue was confirmed using the host-specific FISH probe RP1752 (electronic supplementary material, table S3). DNA was extracted as described above.
Metagenomic libraries were prepared from the free-living and the symbiont-enriched host-associated Endoriftia populations using the Illumina Nextera XT DNA Library Preparation Kit. The two samples were barcoded and sequenced in a paired-end 150 bp multiplex run on an Illumina HiSeq 2000 at the Yale Center for Genome Analysis (YCGA). Reads were demultiplexed by the YCGA and imported into CLC Genomics Workbench 8.0.2 (http://www.clcbio.com) for further processing and analysis. Adapters and low-quality regions were trimmed from the demultiplexed sequences, and overlapping paired reads were merged using the default parameters in CLC (mismatch: 2, gap: 3). The processed reads were then mapped in CLC Genomics Workbench to the reference genome sequence (GenBank assembly record GCF_000224455.1 [23]) using the following parameters: mismatch cost = 2, insertion cost = 3, deletion cost = 3, length fraction = 0.9, similarity fraction = 0.8, auto-detect paired distances, random mapping of non-specific matches. Variants at segregating sites in the read mapping were called using the low-frequency variant caller in CLC Genomics Workbench with the following parameters: required significance = 0.1%; ignore positions with coverage > 1 × 10⁵; ignore broken pairs; minimum coverage = 10; minimum count = 2; minimum frequency = 1.0%; base quality filter with radius 5, minimum central quality = 30 and minimum neighbourhood quality = 30; relative read direction filter with significance = 0.1%. Variant overlap between the two samples was calculated using custom Python scripts.
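The fixed coverage and frequency cut-offs listed above can be illustrated with a simple filter. This sketch applies only those thresholds (minimum coverage 10, maximum coverage 1 × 10⁵, minimum count 2, minimum frequency 1.0%); it does not reproduce the statistical error model, base-quality filter or read-direction filter of the CLC low-frequency variant caller. The `VariantCall` record is a hypothetical data layout.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical per-position variant record."""
    position: int
    ref: str
    alt: str
    count: int     # reads supporting the alternative allele
    coverage: int  # total reads covering the position

    @property
    def frequency(self) -> float:
        return self.count / self.coverage


def passes_thresholds(
    v: VariantCall,
    min_coverage: int = 10,
    max_coverage: int = 100_000,
    min_count: int = 2,
    min_frequency: float = 0.01,
) -> bool:
    """Apply the fixed cut-offs quoted in the Methods to one variant.

    The coverage check comes first, so frequency is never computed for zero coverage.
    """
    return (
        min_coverage <= v.coverage <= max_coverage
        and v.count >= min_count
        and v.frequency >= min_frequency
    )


def filter_calls(calls: Iterable[VariantCall]) -> List[VariantCall]:
    return [v for v in calls if passes_thresholds(v)]


calls = [
    VariantCall(1200, "A", "G", count=9, coverage=828),  # about 1.1%: kept
    VariantCall(1500, "C", "T", count=5, coverage=828),  # about 0.6%: below 1%
    VariantCall(2100, "G", "A", count=1, coverage=40),   # only one supporting read
    VariantCall(3000, "T", "C", count=4, coverage=8),    # coverage below 10
]
kept = filter_calls(calls)
print([v.position for v in kept])  # [1200]
```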
To integrate the Sanger multilocus gene sequencing approach with the Illumina metagenomics approach, reads from the host-associated and free-living symbiont metagenomes were mapped with Bowtie2 [38], using default parameters, to the Sanger sequences of Nsym and Nfree that contained segregating sites and variants.

(d) Statistical analysis
To test whether sampling location, sampling year, trophosome subsample location or host size influences the ST, we fitted linear mixed-effects models with the lmerTest package in R (v.2.0-33) and evaluated them by ANOVA (p-value cut-off of 0.01), testing both the effect of interactions between fixed factors on the ST in the combined dataset and each factor separately (location, year, type of Riftia or trophosome, compared with the ST).
For data exploration, the tidyverse packages were used to encode the variables as factors and numeric values.

(e) Quantitative polymerase chain reaction analysis
To quantify the abundance of Endoriftia in each seawater, biofilm, sediment and trophosome sample, we confirmed the presence of the Endoriftia 16S rRNA phylotype by sequencing the 16S rRNA gene and used quantitative PCR (qPCR) assays to estimate total bacterial and Endoriftia-specific 16S rRNA gene copy numbers (electronic supplementary material, table S1). qPCR was performed on a LightCycler 480 Real-Time PCR system (Roche) with LightCycler software release 1.5.0. For both trophosome and free-living Endoriftia samples, we used universal 16S rRNA primers and compared total 16S rRNA gene abundances with symbiont-specific 16S rRNA gene abundances (electronic supplementary material, table S3). To prepare the 16S rRNA standard, a gene fragment amplified with universal 16S rRNA primers was cloned into the TOPO TA Cloning vector pCR2.1 (Invitrogen). After plasmid isolation with the QIAprep Spin Miniprep Kit (Qiagen), the plasmid was linearized with the high-fidelity HindIII restriction enzyme (BioLabs) and checked by agarose gel electrophoresis and SYBR® Gold (Thermo Fisher) staining. For more details, see the electronic supplementary material.
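The arithmetic behind a plasmid-based qPCR standard can be sketched as below. The study does not give its exact formulas, so these are assumptions: standard copies are derived from DNA mass with an assumed mean of 650 g/mol per double-stranded base pair, the standard curve is an ordinary least-squares fit of Cq against log10(copies), efficiency is 10^(-1/slope) - 1, and copies per ng are converted to cells with the 4.1 copies per cell factor given in the Table 1 caption. All numerical inputs in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed mean mass of one double-stranded base pair
COPIES_PER_CELL = 4.1      # average 16S rRNA gene copies per bacterial cell (Table 1 caption)


def plasmid_copies(mass_ng: float, length_bp: int) -> float:
    """Copy number of a linear dsDNA standard of the given mass and length."""
    return mass_ng * 1e-9 * AVOGADRO / (length_bp * BP_MASS_G_PER_MOL)


def fit_standard_curve(copies: Sequence[float], cq: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """E = 10**(-1/slope) - 1; a slope of about -3.32 corresponds to E = 1 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate template copies in one reaction."""
    return 10 ** ((cq - intercept) / slope)


# Synthetic ten-fold dilution series with perfect doubling per cycle (not study data).
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
ideal_slope = -1.0 / math.log10(2.0)
cq_standards = [35.0 + ideal_slope * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, cq_standards)

sample_cq = 25.0   # hypothetical sample Cq
template_ng = 2.0  # hypothetical ng of total DNA in the reaction
copies_per_ng = copies_from_cq(sample_cq, slope, intercept) / template_ng
cells_per_ng = copies_per_ng / COPIES_PER_CELL
print(f"E={amplification_efficiency(slope):.2f} copies/ng={copies_per_ng:.0f} cells/ng={cells_per_ng:.0f}")
```

Normalizing to copies per ng of total DNA, as in Table 1, compensates for the differing DNA yields among samples but not for differences in extraction efficiency.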
We confirmed that the 16S rRNA gene is present in a single copy using qPCR of five samples with universal and specific primers (electronic supplementary material, figure S2 and table S3). The multilocus genes atpA, uvrD, recA and gyrB were also confirmed to be single copy by comparing their qPCR abundances with 16S rRNA gene abundance in six samples (electronic supplementary material, figure S2 and table S3). The ileS locus showed a distinctly higher copy number in all samples, indicating that it is not present in a single copy (electronic supplementary material, figure S2). Therefore, this locus was excluded from further analysis.
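The single-copy check above amounts to comparing each gene's qPCR copy number with that of the single-copy 16S rRNA reference in the same samples. A minimal sketch, assuming paired copy-number estimates per sample and a hypothetical two-fold tolerance; the study does not state a numerical threshold, and the copy numbers below are made up.

```python
from statistics import mean
from typing import Sequence


def mean_copy_ratio(target: Sequence[float], reference_16s: Sequence[float]) -> float:
    """Mean per-sample ratio of target-gene copies to 16S rRNA gene copies."""
    if len(target) != len(reference_16s) or not target:
        raise ValueError("need one reference value per target value")
    return mean(t / r for t, r in zip(target, reference_16s))


def consistent_with_single_copy(ratio: float, low: float = 0.5, high: float = 2.0) -> bool:
    """Hypothetical tolerance: within two-fold of the single-copy reference."""
    return low <= ratio <= high


# Made-up copy numbers for two samples.
ref = [1.0e3, 2.0e3]
gene_a = [1.1e3, 1.8e3]  # ratio about 1: consistent with a single copy
gene_b = [3.0e3, 7.0e3]  # ratio above 2: suggests several copies, as reported for ileS
for name, values in (("gene_a", gene_a), ("gene_b", gene_b)):
    r = mean_copy_ratio(values, ref)
    print(name, round(r, 2), consistent_with_single_copy(r))
```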

Results
To investigate the variability within the symbiont population of Riftia from the four vent sites Tica and P-Vent at 9°50′ N EPR and Janine and Genesis at 13° N EPR, we analysed a total of 16 specimens, comprising nine adults and seven juveniles, using a multilocus gene framework that compared the allelic profiles of the four housekeeping loci atpA, uvrD, recA and gyrB. Because we used direct PCR amplification and sequencing for the Nsym population, the dominant symbiont genotype of each host-associated population could be identified as a single ST. Multilocus gene sequencing revealed no variation at three loci (atpA, uvrD and recA) across all adult and juvenile Riftia specimens from 2010 and 2011 (electronic supplementary material, figure S1 and table S1). The fourth locus (gyrB) had a nucleotide diversity per site of 1.19%.
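eBURST relates sequence types by how many loci their allelic profiles share. The core comparison, counting the loci at which two profiles differ and listing single-locus variants (SLVs), can be sketched as below. This is a simplified illustration rather than the full eBURST algorithm (no group definition, founder prediction or bootstrapping). The allele numbers are hypothetical; only the pattern that ST 2-6 differ from ST 1 at gyrB mirrors the results of this study.

```python
from typing import Dict, List, Tuple

LOCI = ("atpA", "uvrD", "recA", "gyrB")

# Hypothetical allele numbers per ST, in the locus order of LOCI.
ST_PROFILES: Dict[int, Tuple[int, ...]] = {
    1: (1, 1, 1, 1),
    2: (1, 1, 1, 2),
    3: (1, 1, 1, 3),
    4: (1, 1, 1, 4),
    5: (1, 1, 1, 5),
    6: (1, 1, 1, 6),
}


def differing_loci(a: Tuple[int, ...], b: Tuple[int, ...]) -> List[str]:
    """Names of the loci at which two allelic profiles carry different alleles."""
    return [locus for locus, x, y in zip(LOCI, a, b) if x != y]


def single_locus_variants(st: int, profiles: Dict[int, Tuple[int, ...]]) -> List[int]:
    """STs whose allelic profile differs from that of `st` at exactly one locus."""
    ref = profiles[st]
    return [
        other
        for other, prof in sorted(profiles.items())
        if other != st and len(differing_loci(ref, prof)) == 1
    ]


print(single_locus_variants(1, ST_PROFILES))  # [2, 3, 4, 5, 6]
print(differing_loci(ST_PROFILES[1], ST_PROFILES[4]))  # ['gyrB']
```

In this toy set every ST is also an SLV of every other ST, because all differ only at gyrB; this is why eBURST needs additional criteria to predict a founder.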
Our eBURST analysis revealed one dominant sequence type (ST 1), which was identical across a broad range of host individuals, including six juveniles and five adults, and identical to the metagenome of a single, previously studied Riftia trophosome specimen [23]. One (ST 2) of seven juveniles and four (ST 3-6) of nine adults differed from ST 1 at the gyrB locus. Analysing the variable locus gyrB, we did not detect any significant effect of the host
Table 1. Candidatus Endoriftia persephone symbiont abundances in biofilm close to host tubeworms and in sediment away from host tubeworms, determined by quantitative PCR. ('Vent activity' indicates whether the basalt biofilm samples were collected at an active or an inactive hydrothermal vent. The amount of total DNA recovered varied among samples, so the Endoriftia 16S rRNA copy number density was calculated per 1 ng recovered DNA. Each sample was analysed in triplicate, and ± values give the standard deviation. Copy numbers were converted to cell numbers using a factor of 4.1 copies cell⁻¹, the average 16S rRNA gene copy number in bacteria [39].)
We further tested whether additional genotypes can coexist within a single host specimen, reasoning that previous studies might have been unable to detect low-abundance genotypes owing to low sequencing coverage (14× [22] and 25× [23]). We performed deep Illumina metagenomic sequencing at 828× coverage of one Riftia-associated symbiont population (Nsym) collected from Tica at 9°50′ N EPR in 2011 (electronic supplementary material, table S4). Our deep sequencing analysis shows that in this adult Riftia specimen, the majority of reads (76.2%) were identical to ST 1 (this study) and to the Riftia 1 reference metagenome [23].
The analysis of single and multiple nucleotide variants and indels revealed 7796 variants evenly spread across the reference metagenome, with a median frequency of 99.87%. Our dataset confirms the low overall nucleotide heterogeneity (0.17%) within one specimen and suggests that the dominant phylotype is still by far the most abundant member of the endosymbiont community (electronic supplementary material, figure S3).
To examine the genotypic variability in the free-living symbiont population (Nfree), we cloned and sequenced two of the multilocus genes (uvrD and atpA) that were also used in the Nsym analysis. We determined uvrD gene sequences from 14 environmental samples (11-29 clones per sample) and atpA gene sequences from nine samples (1-21 clones per sample), with seven samples overlapping between the two genes (electronic supplementary material, table S2). Moreover, we quantified population densities in the vent tubeworm habitat and in surrounding environments with deep-sea conditions (inactive basalt, off-axis sediments and water) using qPCR with specific 16S rRNA gene primers.
Over 70% of the samples (26 of 36) contained the free-living Endoriftia 16S rRNA phylotype (electronic supplementary material, table S2), and all samples were dominated by ST 1 (88% of uvrD and 86% of atpA clones). Symbiont densities were highest underneath tubeworm clumps at Tica and P-Vent (0.52-2.93 × 10³ 16S rRNA copies per ng total DNA) and lower on inactive basalt close to and far away from these aggregations (0.16-1.01 × 10³ 16S rRNA copies per ng total DNA) (table 1; electronic supplementary material, table S2). These results indicate that the ST 1 genotype dominated the free-living Endoriftia population in all investigated environments. Apart from ST 1, other allelic variants were found exclusively in the free-living microbial population, with a nucleotide heterogeneity of 5.16% (atpA) and 20.09% (uvrD), indicating that the primers were not completely specific for Endoriftia. The data also showed considerable genotypic variation among the free-living populations, with 14 alleles for atpA and 30 for uvrD, and 20 and 37 segregating sites, respectively (table 2).
Variant analysis of the host-associated (Nsym) and free-living (Nfree) Endoriftia metagenomes revealed that both populations share a large number of variants (7031), indicating that they are similar (figure 1a). However, despite its lower coverage, the free-living metagenome harboured more private variants (50 733) than the host-associated metagenome (765). These private variants occurred at low frequency, with a median frequency of 2.6% in Nfree and 5.2% in Nsym. Almost all variants occurred at frequencies below 10%, and nucleotide diversity between Nfree and Nsym was highest below a frequency of 10% (figure 1b).
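The shared/private partition behind these counts, which the Methods state was computed with custom Python scripts, can be sketched as follows. The authors' scripts are not reproduced here; this version assumes that variants are keyed by (position, reference base, alternative base) and that each metagenome's calls are available as a mapping from that key to allele frequency. The frequencies in the example are toy values, not data from this study.

```python
from statistics import median
from typing import Dict, Set, Tuple

VariantKey = Tuple[int, str, str]  # (position, reference base, alternative base)


def partition_variants(
    a: Dict[VariantKey, float], b: Dict[VariantKey, float]
) -> Tuple[Set[VariantKey], Set[VariantKey], Set[VariantKey]]:
    """Split variant keys into shared, private-to-a and private-to-b sets."""
    keys_a, keys_b = set(a), set(b)
    return keys_a & keys_b, keys_a - keys_b, keys_b - keys_a


def median_frequency(freqs: Dict[VariantKey, float], keys: Set[VariantKey]) -> float:
    """Median allele frequency over the given keys (NaN if there are none)."""
    values = [freqs[k] for k in keys]
    return median(values) if values else float("nan")


# Toy allele frequencies as fractions.
n_free = {(10, "A", "G"): 0.02, (20, "C", "T"): 0.995, (30, "G", "A"): 0.05}
n_sym = {(20, "C", "T"): 0.998, (40, "T", "C"): 0.052}

shared, private_free, private_sym = partition_variants(n_free, n_sym)
print(len(shared), len(private_free), len(private_sym))  # 1 2 1
print(median_frequency(n_free, private_free), median_frequency(n_sym, private_sym))
```

Matching on exact (position, ref, alt) keys treats indels and multi-nucleotide variants as shared only if both callers reported them identically, which is a simplifying assumption.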
Comparison of the multilocus genes with the same loci in the metagenomes confirmed that the dominant genotype in both deeply sequenced populations is identical to ST 1. However, the comparison also revealed approximately 2-14% and approximately 3-3.5% variation in the multilocus genes in the Nfree and Nsym metagenomes, respectively (figure 2; electronic supplementary material, figure S4). Among these additional variants, ST 3, 4 and 6 were detectable in both metagenomes, whereas ST 5 from a single adult and ST 2 from a juvenile were supported only by Nsym metagenome reads. Hence, not all variants were found in the Nfree metagenome, suggesting that the free-living symbiont population may have been under-sampled or that some variants were absent from this sample.
The Nfree population of Endoriftia shows higher heterogeneity than Nsym and also the recurrence of sequence
Table 2. Summary of nucleotide variation across genes within the host-associated (Nsym) and free-living (Nfree) Candidatus Endoriftia persephone populations. (n, sample size; seg. sites, segregating sites; bp, number of base pairs; η(s) %, percentage of segregating sites that are singleton alleles; π total, nucleotide diversity per site for all changes; π non, nucleotide diversity per site for non-synonymous changes; π syn, nucleotide diversity per site for synonymous changes; dN/dS, ratio of non-synonymous to synonymous changes. The free-living population was analysed with 118 clones for the atpA locus and 315 clones for the uvrD locus.)
habitats, some of the variants also occurred in more than one habitat. Interestingly, one allele that occurred 18 m above the vents was also found at 103 m (electronic supplementary material, table S6). Another allele was present at the active vent site Tica next to tubeworm clumps and also in the pelagial at 103 m. For the atpA locus, all 14 alleles were supported by Nsym metagenomic reads, and nine of these were also supported by Nfree reads. Moreover, the same sequence, sharing the same SNP, was sampled at active vents at Tica and in the off-axis biofilm BM13 on inactive basalt, and two other samples from underneath different tubeworms at Tica shared the same sequence (electronic supplementary material, table S6). The other 12 alleles each occurred only once, in different habitats. Overall, this comparison further supports that the free-living symbiont population is more variable than the host-associated population and suggests that some variants are distributed across a range of different habitats.

Discussion
Symbioses with horizontal transmission face the challenge of maintaining long-term stability of the association, because each host generation relies on the availability of free-living cooperating symbionts in the environment for uptake into the host through partner choice [8,9,14,15]. Uptake is strongly influenced by the structure and abundance of symbiont populations in the local environment and their encounter with the settled larvae [14]. In this study, we show that instead of a clonal host-associated population, as previously suggested [23], there is considerable variation of Endoriftia genotypes housed in the trophosome of a single Riftia specimen. Accordingly, partner choice cannot be restricted to one specific genotype selected from the environment. We propose that partner choice is more permissive and allows several different Endoriftia strains to enter the host from a large and abundant pool of free-living variants in the Riftia habitat. Multiple genotypes of very closely related symbionts have also recently been detected in the trophosome of the hydrothermal vent tubeworm Ridgeia piscesae [40] using whole-genome shotgun sequencing, and in the metagenomes of Escarpia sp. and Lamellibrachia sp. [41].
Our high-coverage metagenome sequencing analysis revealed that the polyclonal symbiont population in Riftia is dominated by one genotype accompanied by several low-frequency variants. Such variation cannot result from within-host mutation alone, because the same symbiont variants were found across different samples and because the host has a relatively short lifespan [42]. Rather, the variant richness more likely results from infection by multiple strains during horizontal transmission to the larvae [21].
The dominance of one strain in juvenile and adult hosts may result from frequency-dependent infection owing to the dominance of this strain in the local environment, or from preferential growth during host development. The dominant genotype matches the one reported in the previous metagenomic analysis of a specimen collected at 13° N in 2008 [23] and is represented by the dominant sequence type (ST 1) in a broad range of juvenile and adult Riftia. This points to one dominant variant across the four studied vent sites at 9° N and 13° N on the EPR in two consecutive years (2010 and 2011). A previous study revealed that at the EPR, Endoriftia clusters by host species when the core genome is considered, but considering the accessory genome, it [40]. Our multilocus gene sequencing approach, which discriminates closely related bacterial strains based on well-conserved housekeeping loci, paints a more differentiated picture by showing that there is more variation within the same tubeworm species than previously anticipated. Although 70% of the 17 sampled Riftia are dominated by one Endoriftia genotype (ST 1), other STs occur and can even dominate. These results further suggest that symbionts are locally adapted to a host-associated lifestyle and that symbiont reproduction occurs primarily within hosts and not in the environment [43,44].
In Riftia, symbiont genotypes other than ST 1 were dominant in several adults (ST 3, 4, 5 and 6) and one juvenile (ST 2) across vents and sampling years. The two tubeworms housing ST 3 and ST 4 were collected from a single clump together with two other tubeworms that harboured ST 1. Whether spatial and temporal heterogeneity of environmental parameters known to vary in the Riftia habitat [45-49] can lead to competitive replacement and selection of genotypes better adapted to different abiotic conditions remains to be studied in a larger tubeworm collection with concurrent measurements of environmental parameters. Such selection may be enhanced by physiological adaptations of the host, including enzymes and transporters such as a carbonic anhydrase [50], that are specific to a symbiotic lifestyle and may also control which symbiont genotype rises to dominance, potentially allowing the host to select the optimal partner for specific environmental conditions. Additionally, host genotype may influence which symbiont is selected, but elucidating such specific genetic interactions will require host as well as symbiont genotyping. Quantitative PCR analyses of biofilms at active vent sites revealed high abundances of the symbiont 16S rRNA phylotype. ST 1, which dominated in most tubeworms, also dominated in the environmental biofilms of the Riftia habitat. The metagenome of the environmental sample taken underneath a tubeworm clump showed that the dominant free-living genotype is identical to ST 1. Furthermore, the multilocus gene analyses of the free-living symbiont populations also revealed a dominance of sequences identical to ST 1 at two genes (atpA and uvrD). The prevalence of free-living and host-associated ST 1 (and other identified STs) in the Riftia habitat suggests that vigorous diffuse sulfide emissions [51] fuel chemoautotrophic activity in the host [52].
The symbiont population outside the host in biofilms could be dormant or rely on heterotrophy or autotrophy; regardless of its metabolic state, it ensures the availability of source populations for transmission. Especially if the symbiont population is metabolically active, the unstable chemical environment may also introduce temporal variation into the symbiont pool available for host infection. Whether dominant symbiont genotypes in the trophosome are also favoured in the free-living environment after they escape from dead hosts [21] remains to be studied.
We detected the Endoriftia 16S rRNA phylotype in a broad range of pelagic and benthic deep-sea environments, confirming and extending previous studies [28,29]. Owing to the presumably low concentration, we could not quantify free-living Endoriftia in non-vent environments: the pelagial, distant sediments and most inactive basalts far away from tubeworms. We were able, however, to sequence individual clones of the atpA and uvrD genes and found that the vast majority of retrieved sequences match ST 1 (atpA clones: two samples 100% ST 1, two samples with ST 1 dominance of 67% and 93%; uvrD clones: seven samples 100% ST 1, three samples with ST 1 dominance of 73%, 75% and 93%).
However, our data suggest that ST 1 is abundant in various non-vent benthic and pelagic environments along the axial summit trough housing vents. Chemosynthesis should not be possible in those habitats in the absence of sulfide. Instead, functional analyses of metagenomes suggest that Endoriftia might be able to switch to heterotrophy [22]. Whether this metabolic versatility contributes to the distribution of Endoriftia and specifically of ST 1 in non-vent environments under deep-sea conditions is not known. Escape of symbionts from hosts upon cessation of vent flux into environments with deep-sea conditions and dispersal

Conclusion
Our study suggests that genotypically diverse symbionts, available in high abundance for host infection, are acquired from the environment after settlement (figure 3). What functionally differentiates the Endoriftia genotypes remains unknown. However, selection for the dominant genotype most likely happens within the host rather than in the environment, because environmental populations of bacteria are rarely dominated by a single clonal genotype [53]. Dominance in hosts would then cause dominance in environmental samples through the release of symbionts upon host death (figure 3). Regardless, our data support permissive partner choice similar to that of other horizontally transmitted symbionts, because diverse populations persist in the host and different genotypes can dominate. Frequency-dependent uptake and growth, or rather unspecific uptake and preferential growth of a dominant symbiont, might be as crucial for the long-term persistence of this association as the release of symbionts upon host death followed by proliferation, or at least persistence, in the extant environment.
Data accessibility. Sequencing data will be available at NCBI under the following accession number: project SRP117045; SRR6014239 host-associated symbiont population and SRR6014238 free-living symbiont population.