Comparative genomic analysis of the ‘pseudofungus’ Hyphochytrium catenoides

Eukaryotic microbes have three primary mechanisms for obtaining nutrients and energy: phagotrophy, photosynthesis and osmotrophy. Traits associated with the latter two functions arose independently multiple times in the eukaryotes. The Fungi successfully coupled osmotrophy with filamentous growth, and similar traits are also manifested in the Pseudofungi (oomycetes and hyphochytriomycetes). Both the Fungi and the Pseudofungi encompass a diversity of plant and animal parasites. Genome-sequencing efforts have focused on host-associated microbes (mutualistic symbionts or parasites), providing limited comparisons with free-living relatives. Here we report the first draft genome sequence of a hyphochytriomycete ‘pseudofungus’; Hyphochytrium catenoides. Using phylogenomic approaches, we identify genes of recent viral ancestry, with related viral derived genes also present on the genomes of oomycetes, suggesting a complex history of viral coevolution and integration across the Pseudofungi. H. catenoides has a complex life cycle involving diverse filamentous structures and a flagellated zoospore with a single anterior tinselate flagellum. We use genome comparisons, drug sensitivity analysis and high-throughput culture arrays to investigate the ancestry of oomycete/pseudofungal characteristics, demonstrating that many of the genetic features associated with parasitic traits evolved specifically within the oomycete radiation. Comparative genomics also identified differences in the repertoire of genes associated with filamentous growth between the Fungi and the Pseudofungi, including differences in vesicle trafficking systems, cell-wall synthesis pathways and motor protein repertoire, demonstrating that unique cellular systems underpinned the convergent evolution of filamentous osmotrophic growth in these two eukaryotic groups.

GL, 0000-0002-4607-2064; AS-P, 0000-0002-9896-9746; JES, 0000-0002-7591-0020; BW, 0000-0002-4620-9091; TAR, 0000-0002-9692-0973
and cause important diseases of animals, algae and plants [3,4]. The stramenopiles are a phylogenetically robust group (e.g. [5]) defined by the presence of two motile flagella, a 'standard' smooth posterior flagellum and a 'tinselate' anterior flagellum with a tripartite rigid tubular mastigoneme (hairs) [2]. However, secondary flagellum loss has occurred during the radiation of this group, for example in the hyphochytrids like Hyphochytrium catenoides [6], which have lost a smooth posterior flagellum but retained a tinselate anterior flagellum.
Environmental sequencing, specifically of marine environments (e.g. [7]), has increased the known phylogenetic diversity of the stramenopiles, suggesting that this group is one of the most diverse higher-level groups within the eukaryotes [8]. Representatives of these groups remain uncultured with little gene/genome sampling. Furthermore, genome-sequencing efforts in the stramenopiles have largely focused on photosynthetic algae (e.g. [9,10]) or oomycete parasites (e.g. [11,12]), leaving the diversity of heterotrophic free-living stramenopiles undersampled. Here, we describe the sequencing and comparative genomic analysis of H. catenoides (ATCC 18719) originally isolated by D. J. Barr from pine tree pollen in Arizona, USA (however, we note that there is no direct reference in ATCC that accompanies this culture [13]). We propose this organism and associated genome data as a tool to investigate the evolution of stramenopile characteristics and for the purpose of comparing and contrasting the evolution of traits between free-living and parasitic Pseudofungi.
Hyphochytrium catenoides is a free-living hyphochytrid protist that forms hyphal-like networks and spores with only a single anterior tinselate flagellum (figure 1a) [6,14]. The hyphochytrids are thought to branch sister to the oomycetes [4,15], and both these groups grow as filamentous/polarized cells feeding osmotrophically by extracellular secretion of digestive enzymes coupled to nutrient uptake [4,6,14]. These characteristics mean that they 'resemble' fungi [4]. Here, we use genome sequence data to confirm the phylogenetic position of the hyphochytrids, investigate characters shared with oomycete parasites and identify the genes involved in cellular characteristics shared with fungi that characterize filamentous/osmotrophic growth. We also use the genome data to investigate the protein repertoire putatively associated with loss of the posterior flagellum in the hyphochytrids. These data provide a unique genome sample of a free-living stramenopile in order to facilitate further evolutionary and cellular research.

Genome assembly and gene model prediction
Using a range of methods, we assembled and tested the completeness of the H. catenoides genome (see Material and methods). Comparisons measuring the fraction of transcriptome data that aligned to the genome with BLAST, along with CEGMA and BUSCO v. 1.2, demonstrated that the genome assembly was predicted to be, respectively, 97.8%, 91.5% and 52% complete in terms of gene sampling (for further analysis and discussion of genome 'completeness' analysis, see electronic supplementary material, figure S1). Both CEGMA and BUSCO (v. 1.2) are likely to underestimate the completeness of genomes, as the core gene list is derived from a subset of genomes that does not fully sample a diverse collection of eukaryotic genomes (e.g. BUSCO v. 1.2 only samples fungal and metazoan genomes), which inevitably gives a much lower estimation of completion. A full set of tRNAs was identified in the Hyphochytrium genome, including an additional tRNA for selenocysteine. The ≥1 kbp scaffold assembly along with the predicted proteome has been submitted as a draft genome to the EMBL EBI (BioStudies: S-BSST46). Details comparing the assembly with other eukaryotic genome sequences are described in figure 1b. Analysis using REPEATMASKER [16] determined that the ≥1 kbp genome assembly comprised 9.53% repeat regions of which 1.79% were assigned to transposable elements.
The protocol used for genome contamination assessment, genome assembly and identification of putative protein-coding genes and their predicted proteins is provided in the Material and methods. This approach identified 18 481 putative gene models (406 of these gene models demonstrated evidence of multiple splice forms according to MAKER [17]), a total gene count similar to the mean (15 946) for other sequenced stramenopiles (figure 1b). The number of introns and exons reported by the program GENOME ANNOTATION GENERATOR (GAG) was 67 332 and 85 813, respectively, with an average of 3.64 introns per gene, an average exon length of 228 bp and an average intron length of 208 bp.
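The reported annotation counts can be cross-checked with simple arithmetic. A gene model with n introns has n + 1 exons, so, if the counts are taken per gene model, the exon total should exceed the intron total by the number of genes. The following is a minimal sketch using only the numbers quoted above; the variable names are illustrative and are not part of the annotation pipeline.

```python
# Counts as reported in the text for the H. catenoides annotation.
genes = 18_481
introns = 67_332
exons = 85_813

# A gene model with n introns has n + 1 exons, so summed over all
# gene models the exon count exceeds the intron count by the gene count.
assert exons - introns == genes

# Mean number of introns per gene model.
introns_per_gene = introns / genes
print(f"{introns_per_gene:.2f} introns per gene")
```

The difference between the exon and intron totals matches the gene model count, and the ratio reproduces the 3.64 introns per gene given in the text.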
Using the genome assembly, we were able to identify and assemble a hypothetical circular mitochondrial chromosome (electronic supplementary material, figure S2). Further analysis did not identify a candidate relic plastid genome (electronic supplementary material, figure S3), while phylogenomic analysis identified only four genes that, under certain scenarios for gene ancestry, could represent genes acquired as part of the endosymbiosis that gave rise to the plastid organelle present in photosynthetic stramenopiles (electronic supplementary material, figure S3).

Genome size, ploidy and evidence of sexual reproduction
K-mer counting [18] was used to predict a haploid genome size of between 54.1 and 68.6 Mbp, with follow-up analysis focusing specifically on the ≥1 kbp assembly suggesting a genome size of 65.7 Mbp across 4758 scaffolds and a scaffold N50 size of 35.57 kbp (L50 of 399). The average sequencing coverage of the total assembly was estimated to be 312×, and the average coverage over the ≥1 kbp scaffolds was 610×. Extraction and purification of long strands of DNA was not achieved using multiple DNA extraction protocols, preventing sequencing using a long-read technology and/or pulsed-field gel electrophoresis to estimate chromosome number. We used an RT-PCR method for estimation of genome size [19] that indicated a haploid genome size of 46.9 Mb (s.e.m. = 1.5).
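For readers unfamiliar with the assembly statistics quoted above, N50 is the length of the scaffold at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half of the total assembly length, and L50 is the number of scaffolds needed to reach that point. The sketch below illustrates the calculation on a hypothetical list of scaffold lengths; it is not the tool used to produce the values reported here.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first scaffold that takes the running total to at least
        # half of the assembly defines N50; its rank is L50.
        if running >= half:
            return length, count
    raise ValueError("no scaffold lengths supplied")


# Hypothetical scaffold lengths (kbp), for illustration only.
toy = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy))
```

In the toy example, the two longest scaffolds together reach half of the 300 kbp total, so N50 is 70 kbp and L50 is 2.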
rsob.royalsocietypublishing.org Open Biol. 8: 170184

As mentioned in the methods, the N50 of the genome assembly was much improved by the use of Platanus, an assembly algorithm optimized for multi-ploidy genomes. To further investigate evidence of ploidy in our H. catenoides culture, we mapped approximately 101 million reads to the 65.7 Mbp assembly, identifying 1 393 505 single nucleotide polymorphisms (SNPs), with 1 332 610 (96%) of the SNPs identified consisting of a two-way nucleotide polymorphism (i.e. 58.8/41.2% mean character split). We also took all scaffolds and plotted SNP frequency against scaffold size. The majority of the scaffolds are clustered around a SNP frequency of approximately 0.0275 (electronic supplementary material, figure S4), suggesting that this variation is consistent and not specific to a subset of chromosomes, for example, in the case of aneuploidy. Interestingly, this analysis showed two large scaffolds with very low SNP frequency compared with the rest of the assembly. These scaffolds contain a number of genes with high sequence identity to genes found on large DNA viruses, suggesting the presence of a viral genome or evidence of a recent viral introgression, discussed further below. K-mer mapping [18] showed two peaks in coverage frequency, which is consistent with the reads mapping to a diploid genome (electronic supplementary material, figure S5).
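The per-scaffold comparison described above can be sketched as follows: SNP frequency is the number of SNPs on a scaffold divided by its length, and scaffolds far below the typical value stand out as candidates for separate inspection. The scaffold names, counts and the cut-off (one-tenth of the median frequency) are assumptions made for illustration and do not reproduce the analysis in figure S4.

```python
from statistics import median


def snp_frequency(snp_count, scaffold_length):
    """SNPs per base pair for one scaffold."""
    return snp_count / scaffold_length


def low_snp_scaffolds(scaffolds, fraction=0.1):
    """Flag scaffolds whose SNP frequency is below fraction * median.

    scaffolds maps a scaffold name to (snp_count, length_bp).
    The default fraction is an arbitrary illustrative threshold.
    """
    freqs = {name: snp_frequency(*v) for name, v in scaffolds.items()}
    typical = median(freqs.values())
    return sorted(n for n, f in freqs.items() if f < fraction * typical)


# Hypothetical data: three typical scaffolds and one with very few SNPs.
toy = {
    "scaffold_a": (2750, 100_000),
    "scaffold_b": (1400, 50_000),
    "scaffold_c": (540, 20_000),
    "viral_like": (30, 150_000),
}
print(low_snp_scaffolds(toy))
```

Using the median rather than the mean keeps the reference value stable when a few large, low-frequency scaffolds are present.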
Using reciprocal BLAST searches, we confirmed that H. catenoides encodes and expresses putative homologues of all seven eukaryotic meiosis-specific gene families [20] in the culture conditions used to grow H. catenoides (see electronic supplementary material, table S1). To our knowledge, sexual recombination has only been observed once in Hyphochytriomycota cultures, with Johnson [21] identifying cellular forms suggestive of zygote production as a result of fusion in the resting spore development of Anisolpidium ectocarpii [21]. However, a range of different sexual reproduction systems have been identified in the oomycetes (e.g. [22]); collectively these data suggest meiosis is present in representative taxa across the wider Pseudofungi.
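The reciprocal BLAST logic used here and elsewhere in this study can be summarized as a reciprocal best hit test: a reference protein and a candidate are paired only if each is the top-scoring hit of the other. The sketch below assumes hit tables already parsed into (query, subject, bit score) tuples; the identifiers are hypothetical and the function names are not from any published pipeline.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits is an iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (query, subject) that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical hit tables: reference proteins searched against the
# genome (forward) and the candidates searched back (reverse).
forward = [
    ("ref_1", "hypho_A", 250.0),
    ("ref_1", "hypho_B", 90.0),
    ("ref_2", "hypho_C", 300.0),
]
reverse = [
    ("hypho_A", "ref_1", 240.0),
    ("hypho_C", "ref_3", 310.0),
    ("hypho_C", "ref_2", 280.0),
]
print(reciprocal_best_hits(forward, reverse))
```

In this toy case only ref_1 and hypho_A are retained, because the best reverse hit of hypho_C is a different reference protein.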

Phylogenetic position of Hyphochytrium
Hyphochytrium has previously been shown to branch as a sister-group to the oomycetes in rRNA gene phylogenies (e.g. [3,15]). Using a suite of concatenated multiple amino acid sequence alignment approaches (supermatrix and per gene partitioned approaches) and a gene tree coalescence approach [23], we investigated the phylogenetic relationship of Hyphochytrium to other eukaryotes by building on previous phylogenomic analyses (e.g. [24-26]). We generated a concatenated amino acid alignment of 325 orthologues (128 taxa and 90 230 amino acid sites) including a comprehensive sampling of eukaryotic taxa based on previously published analyses [24]. We used this alignment to calculate a eukaryote-wide phylogeny using a maximum likelihood (ML) approach with 100 'standard' bootstrap replicates using the IQ-TREE software [27,28] under the site heterogeneous model LG+G4+F+FMIX (empirical, C60) + PMSF [29] (figure 2; electronic supplementary material, figure S6a shows the wider tree topology). To obtain additional topology support values, we inferred a tree based on this supermatrix with a per gene partitioned model in IQ-TREE with 1000 ultrafast bootstrap replicates (figure 2). Furthermore, using a gene tree coalescence approach in ASTRAL [23] we inferred a species tree with 100 multilocus bootstrap replicates (figure 2). Previously, genes with higher relative tree certainty (RTC) values were shown to improve the overall robustness of phylogenomic analyses [30]. In order to examine the effect of orthologues selected for multi-gene tree analysis, we inferred the RTC for each of the 325 orthologues using RAxML [31], with 100 rapid bootstrap replicates under the LG+G4 model of evolution. The orthologues were ranked, and the top 50% with the highest RTC scores were selected and multiple gene phylogenies were calculated as above (electronic supplementary material, figure S6b).

Figure 1. (a) … [6,14] showing: (i-iii) different views of zoospores (including magnification of tinselate flagellum i), (iv) germination stage of large spore, (v) primary enlargement or primary sporangium, (vi,vii) thallus development on substrate, (viii) unusual extensive branched thallus, which consists of separated sporangia at different stages of maturity (e.g. xii,xiv), connected by long, tubular, septate, hyaline and empty hyphae (x,xi), sometimes with enlargements without sporangia (e.g. ix). Zoospores may fail to swim coming to rest near exit tube (xiii). (b) Table of genome statistics for a range of different stramenopiles. Asterisk indicates k-mer estimation of genome size (column 2). All numbers are from the respective genome datasets (see electronic supplementary material, table S12). Numbers in italics (contigs, column 5) are inferred from the scaffolded data. CEGMA: C, complete; P, partial recovered gene models. BUSCO: C, complete; D, duplicated; F, fragmented; M, missing gene models.
The resulting tree topology (figure 2) demonstrates that H. catenoides forms a sister-branch to the oomycete radiation with ≥99% support from all methods used for both the 325 multi-gene analysis and the orthologues ranked in the top 50% according to RTC scores (electronic supplementary material, figure S6b). The internode certainty (IC) [30,32] scores of nodes within both analyses showed this phylogenetic relationship was moderately supported across the alignment data matrix (electronic supplementary material, figure S7a,b), consistent with the possibility of mixed signal for this branching relationship in our 'orthologue' gene sets. Nonetheless, these results are consistent with the Pseudofungi hypothesis, i.e. the hyphochytriomycetes and the oomycetes are monophyletic and share a common evolutionary trend towards fungal-like osmotrophic feeding and polarized cell growth [3,4].
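The selection of orthologues by RTC score amounts to ranking the gene set and keeping the upper half. The sketch below illustrates this step on hypothetical scores; how the boundary is rounded for an odd number of orthologues (here, rounding down) is an assumption, as the text does not specify it.

```python
def top_fraction_by_rtc(rtc_scores, fraction=0.5):
    """Return orthologue names ranked by RTC, keeping the top fraction.

    rtc_scores maps an orthologue name to its relative tree certainty.
    The number kept is rounded down (an illustrative choice).
    """
    ranked = sorted(rtc_scores, key=rtc_scores.get, reverse=True)
    keep = int(len(ranked) * fraction)
    return ranked[:keep]


# Hypothetical RTC values for four orthologues.
toy_scores = {"og1": 0.8, "og2": 0.2, "og3": 0.5, "og4": 0.9}
print(top_fraction_by_rtc(toy_scores))
```

The retained set would then be concatenated and analysed with the same tree inference settings as the full dataset.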
Our tree places the Pseudofungi as a sister-group to the photosynthetic stramenopiles (i.e. the Ochrophyta) plus Developayella. This is partly consistent with previously published phylogenetic analysis based on three nuclear encoded genes [33] and wider phylogenomic analysis [24,34], and contradicts analyses of mitochondrial gene phylogenies (concatenation of 10 genes, 7479 positions), which have demonstrated that a separate stramenopile group, the Labyrinthulida (i.e. Bigyra), forms a sister-group to the oomycetes [35]. We note, however, that this phylogeny demonstrates a different branching relationship with Developayella, which is shown here to be sister to the Ochrophyta, a relationship very weakly supported in the internode certainty analyses (electronic supplementary material, figure S7a,b) [32]. The tree recovered here has some similarities to that reported by Derelle et al. [34], which uses a large phylogenomic dataset from different taxa. This work argues for monophyly of Bigyra (e.g. Blastocystis + Aplanochytrium and Schizochytrium), although our tree shows that this group is paraphyletic, a relationship also shown in Noguchi et al. [24]. Derelle et al. [34] also recovered paraphyly of this group in a subset of their Bayesian analysis and in their ML analysis, but then went on to demonstrate that this relationship is likely due to a long branch attraction artefact (e.g. [36]) associated with the Blastocystis branch, which can lead to the misplacement of Opalozoa (e.g. Blastocystis). Interestingly, sisterhood of the Pseudofungi and Ochrophyta implies a minimum of two losses of photosynthesis [34] and independent specialization of 'osmotrophic lifestyles' in the Bigyra (e.g. Aplanochytrium and Schizochytrium) and the Pseudofungi (e.g. Hyphochytrium and Phytophthora) within the stramenopiles. However, this scenario implies that the stramenopile lineage was ancestrally photosynthetic [37], a subject of debate [38,39] (electronic supplementary material, figure S3).

Shared derived traits across the Pseudofungi
Given the placement of H. catenoides as a sister-branch to the oomycetes, we were interested in investigating the conservation of cellular, biochemical and genetic traits shared across pseudofungal taxa. Oomycete plant parasites, e.g. Phytophthora spp., are sterol auxotrophs and appear to have lost the enzymes involved in sterol biosynthesis [40]. The sterol biosynthesis pathway has been predicted to function in Saprolegnia, and a putative CYP51 sterol-demethylase encoding gene was identified from the Saprolegnia parasitica genome and transcriptome data [12,41]. The protein encoded by this gene is a target of antimicrobial drugs such as clotrimazole and, therefore, has been suggested as a therapeutic target for treatment of Saprolegnia infections of fish [42]. Reciprocal BLASTp searches and phylogenetic analyses demonstrated that H. catenoides also possesses a putative orthologue (Hypho2016_00003038; electronic supplementary material, figure S8a) of the S. parasitica CYP51 sterol-demethylase, which appears to be lost in plant parasitic oomycetes. To confirm that this is a viable drug target, we grew H. catenoides in the presence of two azole 'antifungals' (clotrimazole and fluconazole) to assess the effectiveness of these compounds in inhibiting H. catenoides growth. Both 'antifungal' agents were able to inhibit growth of H. catenoides (MIC100: clotrimazole 0.25 mg ml⁻¹; fluconazole 4 mg ml⁻¹; electronic supplementary material, figure S8b), indicating that H. catenoides is susceptible to azole compounds, consistent with H. catenoides having a functional CYP51 enzyme.
There has been considerable effort to sequence a number of oomycete genomes, which has largely focused on parasitic taxa (e.g. [11,12,43-46]). This work has also, in part, focused on identifying candidate effector proteins (secreted proteins that perturb host function for the benefit of the invading parasite [47] and which often contain N-terminal RxLR amino acid motifs [48-50]) or lectin proteins that bind host molecules. Searches of the H. catenoides genome demonstrate there is only one putative protein of unknown function with a candidate RxLR motif (table 1). In addition, H. catenoides lacked several gene families linked with the evolution of plant parasitic traits in the oomycetes, i.e. NPP1 or NEP-like proteins (necrosis-inducing Phytophthora protein [51,52]), elicitin proteins [53], cutinase [54], pectin esterase and pectin lyase [55,56]. The animal parasite S. parasitica was noted to show enrichment of Notch proteins and Ricin lectins, as well as presence of other galactose-binding lectins and the bacterial toxin-like gene family (haemolysin E) [12]. While the Notch protein and Ricin lectin gene families are present in H. catenoides, they show no evidence of enrichment comparable to S. parasitica. The galactose-binding lectin and haemolysin E gene families are absent. Protease gene families show no general enrichment in comparison with other stramenopiles (table 1).
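A simple way to screen a predicted proteome for candidate RxLR motifs is a pattern search restricted to the N-terminal region of each protein. The sketch below is an illustration only: the 100-residue window and the toy sequences are assumptions, and it does not reproduce the search criteria behind table 1, which may also require a signal peptide or additional motifs.

```python
import re

# Arg, any residue, Leu, Arg.
RXLR = re.compile(r"R.LR")


def has_n_terminal_rxlr(protein, window=100):
    """True if an RxLR motif occurs within the first `window` residues.

    The default window size is an illustrative assumption.
    """
    return RXLR.search(protein[:window]) is not None


# Hypothetical sequences, for illustration only.
with_motif = "MKLSAVLLAAASRFLRTEE"
without_motif = "MKLSAVLLAAASKFLKTEE"
motif_too_late = "A" * 120 + "RALR"

print(has_n_terminal_rxlr(with_motif))
print(has_n_terminal_rxlr(without_motif))
print(has_n_terminal_rxlr(motif_too_late))
```

Because a four-residue pattern will match by chance in many proteins, hits from a screen of this kind are candidates that need further support rather than confirmed effectors.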
Comparative analysis of candidate secreted proteins defined by in silico identification of putative N-terminal secretion sequences demonstrated that H. catenoides contains a lower proportion of secreted proteins compared with many other stramenopiles, comparable with the paraphyletic obligate biotrophs Albugo laibachii and Hyaloperonospora arabidopsidis (figure 3). The H. catenoides predicted proteome contains a moderate-to-low proportion of carbohydrate active enzymes [57] relative to other stramenopiles. Interestingly, H. catenoides has very few secreted carbohydrate active enzymes in comparison with other stramenopiles, suggesting that H. catenoides has a low diversity of extracellular carbohydrate processing functions and is, therefore, dependent on a limited subset of extracellular sources of fixed carbon (figure 3). To test this observation, we grew H. catenoides cultures in 190 different carbon sources using OmniLog PM1 and PM2 plates, which allow investigation of growth and respiration rate across a diversity of different carbon sources [58]. These data demonstrated (electronic supplementary material, figure S9a,b) a significant increase in respiration rate compared with the controls upon the addition of: α- or β-cyclodextrin (p = 0.01 and 0.01), dextrin (p = 0.02), Tween 40 or 80 (p = 0.03 and 0.03) or melibionic acid (p = 0.03). Of note, dextrin/cyclodextrins are products of enzymatic activity upon starch, a typical component of H. catenoides growth medium (YpSs), and may be indicative of the environment in which this organism is typically found. The addition of Tween 40 or Tween 80 has been shown to improve yield in other organisms [59] and may result from direct accumulation of fatty acids, or altered membrane permeability affecting nutrient uptake. In contrast to many oomycetes (e.g. [60]), H. catenoides demonstrates a limited utilization of diverse carbon sources. These data are consistent with the hypothesis that the evolution of a wide diversity of secreted carbohydrate active enzymes is associated with evolution of parasitic lifestyle within the oomycete lineages (e.g. [12,61-63]), although this pattern could also be the product of secondary loss in the H. catenoides branch.
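The two quantities compared above, the proportion of the whole proteome and the proportion of the predicted secretome assigned to each carbohydrate active enzyme category, can be computed as in the sketch below. The input format and the toy records are assumptions for illustration; the category labels (GH, GT, CE) follow CAZy class abbreviations, and no values from figure 3 are reproduced.

```python
from collections import Counter


def category_proportions(proteins):
    """Proportion of proteome and of secretome in each CAZy category.

    proteins is a list of (category_or_None, secreted) tuples, one per
    predicted protein. Returns (in_proteome, in_secretome) dicts.
    """
    proteome = Counter(cat for cat, _ in proteins if cat)
    secretome = Counter(cat for cat, sec in proteins if cat and sec)
    n_all = len(proteins)
    n_sec = sum(1 for _, sec in proteins if sec)
    in_proteome = {c: n / n_all for c, n in proteome.items()}
    # Categories with no secreted member get a proportion of 0.0.
    in_secretome = (
        {c: secretome[c] / n_sec for c in proteome} if n_sec else {}
    )
    return in_proteome, in_secretome


# Hypothetical annotations for eight proteins.
toy = [
    ("GH", True), ("GH", False), ("GT", False), (None, True),
    (None, False), (None, False), ("CE", True), (None, True),
]
in_proteome, in_secretome = category_proportions(toy)
print(in_proteome)
print(in_secretome)
```

Normalizing by proteome size and by secretome size separately allows organisms with very different gene counts to be compared on the same scale.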
Seidl et al. [64] detected 53 domain architectures that were unique and conserved across the oomycetes P. infestans, P. ramorum, P. sojae and Hy. arabidopsidis. Domain architectures are often recombined by a process of gene fusion and/or domain 'shuffling' [65]. Such gene fusion characters, although subject to sources of homoplasy (such as gene fission [66]), can represent synapomorphic traits useful for polarizing phylogenetic relationships. We searched the H. catenoides genome for evidence of the 53 gene fusions previously identified in oomycetes [64] and found that 12 of these domain architectures were also present in H. catenoides (electronic supplementary material, table S2). Of note, we found a fusion gene of a putative β-glucan synthase enzyme domain and a putative membrane transporter gene (electronic supplementary material, table S2 and GenBank 'nr' protein database) shared across the Pseudofungi, suggesting that domain fusion has led to a unique coupling of substrate transportation and enzymatic processing prior to the radiation of this group. Theoretically, however, without proteomic data we cannot exclude the possibility that this novel domain combination may be the product of a conserved operon-like gene structure.
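A search for shared domain architectures can be reduced to asking which predicted proteins carry both domains of a given fusion. The sketch below assumes domain annotations have already been assigned to each protein; the protein identifiers and domain labels are hypothetical placeholders and not the annotations listed in table S2.

```python
def fusion_candidates(domain_hits, domain_a, domain_b):
    """Proteins annotated with both domain_a and domain_b.

    domain_hits maps a protein identifier to the ordered list of
    domains predicted along its sequence.
    """
    return sorted(
        protein
        for protein, domains in domain_hits.items()
        if domain_a in domains and domain_b in domains
    )


# Hypothetical domain annotations.
toy_hits = {
    "prot_1": ["glucan_synthase", "transporter"],
    "prot_2": ["glucan_synthase"],
    "prot_3": ["kinase", "transporter"],
}
print(fusion_candidates(toy_hits, "glucan_synthase", "transporter"))
```

Co-occurrence within one gene model is necessary but not sufficient evidence for a fusion, since adjacent genes can be merged in error during gene prediction.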
Using OrthoMCL [67] combined with a custom pipeline we identified nine Pseudofungi-specific orthologues, with five of these orthologues representing additional Pseudofungi-specific domain combinations (electronic supplementary material, table S3). Of note, these combined results (electronic supplementary material, tables S2 and S3) demonstrate a novel diversification of the serine/threonine kinase gene families,

Protein repertoire changes associated with loss of the posterior flagellum
The stramenopiles (also known as Heterokonta, meaning possessing two unequal flagella) were formally described as a phylum based on the presence of two motile flagella: a 'standard' smooth posterior flagellum and an anterior flagellum with tripartite rigid tubular mastigonemes (tinselate) [2]. Hyphochytrium builds only a single, anterior tinselate flagellum [6] while the oomycetes build the stramenopile flagella pair. Therefore, the posterior smooth flagellum was lost in the ancestor of the hyphochytrids (figure 2). To explore the consequence of the loss of this organelle in H. catenoides, in terms of gene/protein repertoire, we used a comprehensive list of proteins putatively associated with flagellar function [68] to survey the Hyphochytrium genome. This list comprises 592 amino acid sequences, 355 of which are found in both the major eukaryotic phylogenetic groupings of Opimoda and Diphoda [69], … to the posterior flagellum and three specific to the anterior flagellum [68]. BLAST searches suggest that the three anterior flagellum proteins are also present in H. catenoides, as are 12 of the 14 posterior flagellum proteins identified from C. bullosa. Conservation of these 'posterior-specific' proteins suggests that they have functions associated with the anterior tinselate flagellum in H. catenoides (figure 4a). One of the C. bullosa posterior-specific flagellum proteins absent in H. catenoides and the oomycetes is the PAS/PAC sensor hybrid histidine kinase (also known as a helmchrome, CBJ26132.1), a putative photo-sensor associated with a swelling in the posterior flagellum of brown algae [68], discussed further below.

Figure 3. Comparison of secreted proteome and putative carbohydrate active proteins across the Pseudofungi including photosynthetic stramenopile taxa as an outgroup. The schematic phylogeny at the top indicates the relationship between different oomycete species with the 'lifestyle' of each species indicated by text colour; green (Phytophthora species) indicates plant hemibiotroph, blue (Hyaloperonospora and Albugo) obligate plant biotroph, teal (Pythium) plant necrotroph, orange (Saprolegnia) animal saprotroph/necrotroph and black indicates putatively free living (e.g. Hyphochytrium, Ectocarpus and Thalassiosira). The first heat map in white/purple indicates the proportion of proteome of each organism which was identified as belonging to a particular CAZY (www.cazy.org) category using BLASTp with an expectation of 1 × 10⁻⁵. The number listed is the proportion, and the colour relates to magnitude of the listed number (as shown by scale bar). The second heat map, in blue/yellow, indicates the proportion of the secretome (predicted via a custom pipeline https://github.com/fmaguire/predict_secretome/tree/refactor) that is identified as belonging to each of these CAZY categories. Auxiliary activities (AA) cover redox enzymes that act in conjunction with CAZY enzymes. The bar chart at the bottom shows the proportion of the proteome for each organism which is predicted to be secreted.
Twenty-nine of the UFPs (8%) were present in oomycetes and other eukaryotic groups but absent in H. catenoides. These may represent genuine gene losses, although absences in our draft genome may also be due to incomplete genome sequencing and assembly. If these are genuine losses, it suggests they represent UFP losses that correlate with loss of the posterior flagellum without the function of these UFPs being integrated into the anterior tinselate flagellum (figure 4a). These losses include a putative homologue of the Dynein Regulatory Complex 1 (DRC1) protein, which regulates inner dynein motor activity in Homo sapiens and Chlamydomonas reinhardtii [72], and Radial Spoke Protein 7 (RSP7), a protein that functions in flagellum structure and beating in Ch. reinhardtii [70]. Further, analysis of the radial spoke protein repertoire encoded by H. catenoides identified a number of other components of the radial spoke complex which are putatively absent in H. catenoides. However, RSP7 was the only radial spoke proteome loss specific to the loss of the posterior flagellum in the Hyphochytrium lineage (figure 4a,b); this protein is putatively encoded in the oomycetes but has been separately lost within the Opisthokonta (e.g. Ho. sapiens). In Chlamydomonas [70], RSP11 and RSP7 have been shown to contain an RIIa domain [73]. Association between RIIa and AKAP domains and RSP3 at the spoke stalk is suggested to be important for flagellar function [70]. Interestingly, comparative analysis suggests that neither RSP7 nor RSP11 is conserved across flagellum-bearing eukaryotes, with only Chlamydomonas, Batrachochytrium and H. catenoides retaining RSP11 in our comparative dataset (figure 4a,b). Domain analysis [74] of the putative H. catenoides RSP3 and RSP11 confirmed these proteins contain an AKAP and an RIIa domain, respectively, suggesting that H. catenoides has retained only RSP3-RSP11 protein-protein interaction at the base of the radial spoke, proximate to the outer doublet (figure 4b).
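The comparison underlying these candidate losses is a presence/absence filter: proteins detected in the comparator lineages but not in the focal genome. The sketch below shows the filter on a hypothetical presence table; the protein and taxon labels are placeholders, and, as noted above, an absence in a draft assembly is not by itself evidence of gene loss.

```python
def lineage_specific_absences(presence, focal, comparators):
    """Proteins missing from `focal` but found in every comparator.

    presence maps a protein name to the set of taxa in which a
    putative homologue was detected.
    """
    return sorted(
        protein
        for protein, taxa in presence.items()
        if focal not in taxa and all(c in taxa for c in comparators)
    )


# Hypothetical presence/absence table.
toy_presence = {
    "protein_1": {"oomycete", "outgroup"},
    "protein_2": {"oomycete"},
    "protein_3": {"hyphochytrid", "oomycete", "outgroup"},
    "protein_4": {"hyphochytrid", "outgroup"},
}
print(lineage_specific_absences(
    toy_presence, "hyphochytrid", ["oomycete", "outgroup"]))
```

Requiring presence in more than one comparator lineage reduces the chance that a single spurious hit is interpreted as a loss in the focal genome.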
Phylogenomic analysis of motor protein repertoire, specifically kinesins and dyneins (figure 4c,d), confirmed that the H. catenoides genome assembly has retained many of the motor proteins associated with flagellum function. These include representatives of all seven axonemal dynein heavy chain families (plus their associated intermediate and light-intermediate chains) [75], both the retrograde (DYNC2) and anterograde (Kinesin-2) motors used in intraflagellar transport (IFT), and non-motor components of the IFT particles (figure 4c). Also identifiable are members of Kinesin-9 and -16 families, which are present in organisms which build motile flagella [71] (figure 4d). This motor repertoire is similar to that seen in oomycetes and shows that the modified tinselate H. catenoides anterior flagellum has retained most functions associated with flagellar motors. Wickstead & Gull have also proposed that the Kinesin-17 family has a flagellar function based on its phylogenetic distribution [71]. Our analysis suggests that H. catenoides has lost Kinesin-17 (unlike in the oomycetes). This may be associated with the loss of the posterior smooth flagellum, but may also be due to missing sections of the genome in the draft assembly.

Photoreceptors
Stramenopile species have been shown to encode a range of photoreceptor proteins and to initiate a series of responses to light including phototaxis [76]. Specifically, the zoospores of some stramenopile algae can show positive and negative phototaxis [77] associated with a flavoprotein photoreceptor [78], putatively the 'helmchrome' located in the posterior flagellum [68] and associated with 'flagellar swelling' and a stigma [77]. Consistent with the loss of the posterior flagellum, H. catenoides (figure 4; electronic supplementary material, figure S10) also lacks a gene putatively encoding a helmchrome protein.
A number of additional putative photo-responsive proteins have also been reported from Ectocarpus [10]. Using these data and other seed sequences (e.g. [68,79]), we searched the H. catenoides genome for putative homologues of photoresponsive proteins. Reciprocal BLAST searches demonstrated that the H. catenoides genome contained putative homologues of the flavoprotein gene families Cryptochrome (Hypho2016_00016188), Cryptochrome DASH (Hypho2016_00004514) and Photolyase (Hypho2016_00002462) (electronic supplementary material, figure S10a), and transcriptome data demonstrate that these genes are transcribed. This analysis also identified three putative type I (microbial) rhodopsins (Hypho2016_00006030, Hypho2016_00006031 and Hypho2016_00010050), the first putative representative of this gene family from a stramenopile (electronic supplementary material, figure S10a,b). The three rhodopsins all contain a conserved 11-cis-retinal binding pocket, specifically the lysine residue site of the Schiff base where the retinal is covalently linked (electronic supplementary material, figure S10c). Furthermore, reciprocal BLAST searches of both the genome and the transcriptome sequence datasets confirmed the presence of genes putatively encoding the latter two steps of the retinal biosynthesis pathway (e.g. a putative β-carotene-15,15′-dioxygenase (Hypho2016_00004122) and a putative retinol dehydrogenase (Hypho2016_00000702)). These genes encode the pathway steps that convert the vitamin β-carotene into 11-cis-retinal, the critical cofactor for rhodopsin to function as a light-responsive protein.
Figure 4 legend (partial). , table S4 for full dataset). The heat map identifies 29 proteins present in the oomycetes but absent in H. catenoides, suggesting that these genes were lost at approximately the same point as the posterior flagellum. The analysis also shows 12 proteins (marked as *) identified as posterior flagellum specific in C. bullosa that are retained in H. catenoides and therefore putatively function in the anterior flagellum. Three C. bullosa anterior flagellum specific proteins are also retained in H. catenoides. The putative radial spoke proteome also shows numerous losses similar to Ho. sapiens (**); this includes the loss of RSP7 (***). Only changes in flagella cytology relevant to the evolution of the stramenopiles are sketched on the top tree. (b) Cartoon of the radial spoke protein complex identified in Chlamydomonas, with each shape number referring to the RSP number [70]. Black shapes illustrate proteins of the spoke complex conserved across the eukaryotes sampled, grey shapes are non-conserved proteins (showing evidence of mosaic loss), while the white shape refers to RSP7 which, although absent in Ho. sapiens and other eukaryotes, has been lost separately and is consistent with the loss of the posterior flagellum in the ancestor of H. catenoides.

Gene families encoding hallmarks of fungal characteristics in the Pseudofungi
One of the main purposes for sequencing the H. catenoides genome was to investigate conservation and/or loss of genes that underpin the fungal/pseudofungal lifestyle. Many fungi grow as filamentous cells, reinforced by robust cell walls composed of polysaccharides such as chitin. These characters are not unique to the Fungi but are typical in many fungal lineages [80]. A suite of cellular systems allow fungi to grow as polarized cells, laying down cell wall and feeding on extracellular substrates by a combination of exocytosis of enzymes and cell-wall material combined with endocytosis and transporter protein mediated uptake of target nutrients. Fungal filamentous structures such as hyphae grow almost exclusively from the tip of the hyphal structure [81], allowing fungi to 'grow as they feed'. This feature combined with a robust cell wall means they can generate high turgor pressures, ramify into recalcitrant material, feed osmotrophically and maximize metabolic rates [80,82,83]. Homologous cellular systems also drive bud growth in Saccharomyces cerevisiae, allowing researchers to use S. cerevisiae to study proteome function involved in polarized growth (for reviews, see [81,84]). The proteins that are known to control this system are illustrated in figure 5a and involve key complexes, the exocyst and the polarisome. These systems are important for establishing the temporal and spatial control of polarized cell growth in fungi [81,84]. Comparative analyses show the exocyst and Sec4 orthologues are conserved across a diversity of eukaryotes including H. catenoides, while the polarisome and associated proteins are specific to the Fungi, given current taxon sampling (figure 5c). Comparative analysis demonstrates that specific elements of polarized cell growth control are not present in Pseudofungi, suggesting these filamentous microbes accomplish polarized growth using different proteome functions.
Motor protein evolution has been suggested to be an important factor in the acquisition of filamentous growth phenotypes in the fungi, with a specific focus on myosin and kinesin genes that encode functions involved in polarized cell growth, vesicle-transit and chitin synthesis [95-97]. Phylogenomic analysis of the motor head domain of all three motor types (figure 4c-e) demonstrates no expansion in motor paralogues uniquely shared by the Fungi and Pseudofungi. In addition, Pseudofungi lack the Myosin V and XVII shown to be important in fungal growth and chitin synthesis [96] (figure 4e). The lack of a shared, unique motor repertoire between Fungi and Pseudofungi is consistent with the idea that these groups evolved filamentous polarized growth characteristics separately and based on different cellular systems. It has been noted that oomycetes contain a diverse complement of myosin paralogues [98]. The analyses reported here demonstrate that elements of this oomycete motor protein gene family expansion are also present in H. catenoides. Specifically, Myosin XXX and XXI and Kinesin 14 and 20 show high degrees of expansion by duplication specific to the Pseudofungi (figure 4c,e), suggesting that these motor proteins may be linked to the filamentous polarized growth characteristics present in this group.
Like fungi [99] and many other eukaryotes [100-106], H. catenoides also produces chitin as cell-wall material [107]. Oomycetes have also been shown to produce chitin in their cell walls [108]. This is consistent with previous data that suggest that chitin synthesis and deposition as a cell-wall material predates the diversification of many major lineages of the eukaryotes [80,107]. H. catenoides has a repertoire of chitin synthesis and digestion genes similar to that found in the oomycetes (i.e. chitin synthase division I), while another group of stramenopiles, the diatoms, which also produce chitin [109], have a variant chitin gene repertoire, namely chitin synthase division II and a chitinase (GH19) not present in Pseudofungi (figure 6). This suggests that chitin production as a cell-wall component is universal and anciently acquired in the eukaryotes, but the genes that control the synthesis and remodelling of this structural polysaccharide have been reconfigured numerous times. Specifically, Pseudofungi seem to lack all chitin synthase division II genes (figure 6c), which are numerous and diversified in fungi, suggesting another key difference between the Fungi and Pseudofungi.

Viral integration across the Pseudofungi
The comparative genomic analysis of Pseudofungi demonstrated that H. catenoides, Phytophthora cinnamomi, Phytophthora parasitica and Pythium ultimum harbour genes putatively encoding viral major capsid proteins (MCP) (electronic supplementary material, table S5). These proteins have high sequence identity with each other and branch together with MCP proteins from African swine fever virus (Asfarviridae, a lineage of the nucleocytoplasmic large DNA viruses, NCLDVs), but are divergent when compared with other NCLDV MCP proteins (figure 7a). Exploring the H. catenoides genome assembly to determine the presence of viral-like genes, we identified 45 candidate viral-derived genes, 38 of which are present on two scaffolds which were shown to have very low SNP frequency in the assembly (electronic supplementary material, table S5). All of these 38 genes showed highest similarity to NCLDV families such as Mimiviridae, Marseilleviridae, Phycodnaviridae, Asfarviridae and Poxviridae (electronic supplementary material, table S5). The genome assembly in these regions was confirmed by nested PCR and sequencing from both the 5′ and 3′ ends of the polB, mcp and mg96 genes of viral ancestry (electronic supplementary material, table S6). The viral-like genes were found in linkage with genes of H. catenoides/pseudofungal ancestry. For example, the genome assembly demonstrated that the viral-like mcp gene was on the same DNA contig as a putatively native H. catenoides histone-encoding gene (electronic supplementary material, figure S11). To confirm this assembly and the linkage between 'host' and viral gene, we conducted a bridging PCR resulting in an amplicon of 2837 bp and sequenced this amplicon, confirming that the mcp and histone genes are linked and on the same stretch of DNA (electronic supplementary material, table S6).
One hundred and forty-five predicted genes were identified in the two contigs that contain a high number of viral genes. BLASTx analyses suggest that the two contigs contained 37 (26%) and 18 (12%) genes of highest identity to genes of known viral genomes (electronic supplementary material, table S7). The BLASTx results for the remaining 235 putative genes showed a wide variation of top scoring hits including both prokaryotic- and eukaryotic-like genes. The frequency of putative exons for the two contigs was 1.62 and 1.49, respectively, a lower intron/exon frequency than observed for the wider genome (intron frequency = 3.64), thus suggesting that genes encoded on viral gene-containing contigs have introns. Indeed, multiple viral-like genes show evidence of introns, suggesting that these genes have been incorrectly modelled, have been subject to intronization or exon-like shuffling during integration, or are undergoing pseudogenization and are therefore broken ORFs, which are being reported as intron/exon structures. However, we note that the gene of viral provenance Hypho2016_00000945-RA (scaffold 5419) contains multiple putative coding regions present in our transcriptome data. The low SNP frequency of these contigs suggests they represent a unique haploid portion of the genome, a viral genome captured in our assembly, or alternatively a site of viral introgression in the H. catenoides genome. We currently favour the hypothesis that this is a site of viral introgression due to the presence of putative introns in the contig and the low relative proportion of genes of clear viral provenance. Products from polB, mg96 and rps3 were detected by RT-PCR in our culture conditions, suggesting that viral-like genes are transcriptionally active (figure 7b). By contrast, a lack of transcript from the mcp gene suggests that a complete virus or a viral factory is not being manufactured in the culture conditions tested (figure 7b). Electron microscopy also failed to observe icosahedral structures typical of NCLDV particles or an intracellular viral factory (see electronic supplementary material, figure S12).
Figure 5 legend (partial). [80]). Vesicles are delivered from the Golgi (a(i)) along cytoskeleton tracks to predetermined sites on the plasma membrane. Cdc42p is activated by Cdc24p (a(ii)) [84], promoting assembly of the polarisome complex (a(iii)) and resulting in the formin Bni1p radiating actin cables [85,86]. Msb3p and Msb4p interact with Spa2 in the polarisome (a(iv)), which is thought to recruit Cdc42 from the cytosol at the site of tip growth [87]. Post-Golgi secretory vesicles are transported along actin cables using a type V myosin motor protein [88,89] (a(v)), to dock with the exocyst complex in a process dependent on Sec4 and its GEF Sec2 [90,91] (a(vi)), and so the vesicle is guided to its target site on the plasma membrane [92]. Cdc42p and Rho1 are required for localization of Sec3p, which together form a spatial marker for the exocyst (a(vii)), and Rho3p and Cdc42p mediate vesicle docking (a(viii)). Cdc42p plays a key role in regulating these processes in S. cerevisiae, but in Pezizomycotina and basidiomycete fungi equivalent functions are performed by Rac1p [93,94].
These data combined with evidence of viral genes present in oomycete genome assemblies (figure 7a) [111] suggest a hitherto unsampled diversity of large DNA viruses found infecting or integrated within the genomes of Pseudofungi. This is consistent with other data suggesting the Pseudofungi have been subject to viral transduction [111]. It has also been shown that many different lineages of the stramenopiles have similarly retained fragments of viral genomes [112], suggesting a wider and undersampled diversity of stramenopile-infecting large DNA viruses. It is tempting to speculate that this may be a mechanism driving horizontal gene transfer (HGT) seen in the oomycetes [113], given that NCLDVs have been shown to harbour host-derived and foreign genes [114,115] and that fragments of large DNA viruses have now been shown to be present in fungi [111], a group shown to be a donor of HGT genes to the oomycetes [63,113]. Consistent with this, we note that the two contigs containing the viral derived genes also contain two genes with top BLASTx hits to fungal genes (electronic supplementary material, table S7). The Pseudofungi are thought to lack the capabilities to perform phagotrophy [4], a mechanism hypothesized to be important for HGT in eukaryotes [116]. However, there is evidence of gene transfer into the oomycetes from both fungi and prokaryotes [54,63,117-121]. The extent of ancient HGTs in eukaryotes has recently been questioned [122]. Yet, Ku et al. [122] also identified genes uniquely present in oomycetes and bacteria which are described as 'recent lineage specific acquisitions' (see fig. 1 in [122], marked as b). Evidence of viral introgression within the Pseudofungi, therefore, identifies a possible mechanism driving HGT in the Pseudofungi, which cannot perform phagotrophy. It is important to note that viral transduction as a vector for HGT in the eukaryotes would be likely to produce a very different profile of gene transfer compared with mechanisms such as phagocytosis (in eukaryotes) [116], transformation (prokaryotes and eukaryotes) [123] or conjugation (prokaryotes and eukaryotes) [124,125]. This is because gene transfer via a virus would be likely to transfer a lower number and lower diversity of gene families for two reasons: (i) genes carried by the virus would have been passaged by selection within the viral lineage and (ii) the limited DNA carrying capacity of the virion. Such a mechanism of HGT is, therefore, consistent with the results of Ku et al. [122], which suggest HGT is less frequent in eukaryotes compared with prokaryotes. However, this does not exclude the possibility that infrequent HGTs can lead to the acquisition of novel and/or positively selected traits.
Figure 7. Phylogeny of viral MCP proteins indicating the branching position of the pseudofungal genes and evidence of transcription of viral derived genes in H. catenoides. (a) Homologous sequences were identified using three psi-BLAST iterations with the H. catenoides putative MCP as query; to remove sequence redundancies, retrieved sequences were clustered at 90% amino acid identity with cd-hit v4.6. Sequences were then aligned using MAFFT v7 iterative, global homology mode (G-INS-i); alignment sites retained for subsequent phylogenetic analysis were selected using trimAL [110] gap distribution mode. The final MCP multiple sequence alignment was composed of 386 sites. The ML tree was inferred using IQ-TREE v1.3 and the LG+I+G4+F model (determined as the best-fitting model by Bayesian information criterion). Node supports were evaluated with 100 non-parametric bootstrap replicates. The Mimiviridae clade was used to root the ML tree (unrooted version displayed on the lower left part). (b) RT-PCR showing expression of polB and mg96 viral genes alongside an rps3 positive control. No expression of the mcp gene was detected. RT-PCR was performed on H. catenoides RNA alongside genomic DNA (+) and no-template (−) controls, with PCR products run on an agarose gel alongside a 1 kb ladder (Promega; 250 bp shown).

Conclusion
The draft genome of the free-living stramenopile pseudofungus H. catenoides provides an important reference for comparative biology, specifically with a view to understanding the evolution of filamentous growth and osmotrophic feeding. H. catenoides branches sister to the oomycetes, which contain many important parasitic groups. These data demonstrate that H. catenoides does not encode many of the gene families found in oomycetes that have been associated with parasitic function, suggesting that these characteristics are more recent adaptations/acquisitions within the oomycetes (table 1). Our data also demonstrate that H. catenoides, and the Pseudofungi more widely, possess the genes that encode a range of features associated with filamentous growth and osmotrophic feeding in fungi. These include the exocyst vesicle trafficking system, sterol biosynthesis pathway and a repertoire of chitin cell-wall synthesis systems common to fungi. By contrast, Pseudofungi do not possess the genes encoding a polarisome complex, chitinase I, chitin synthase II/Myosin V or Myosin XVII, identifying clear differences between these two filamentous osmotrophic groups. Figure 8 summarizes how various features associated with filamentous growth and osmotrophic feeding arose relative to the branching position of the Fungi and the Pseudofungi. We hope the H. catenoides draft genome will provide a useful dataset for comparative biology within the Pseudofungi and across the eukaryotes, especially with regard to understanding the evolution of filamentous osmotrophic characteristics.
The Platanus assembly was subsequently filtered into four datasets (all scaffolds, scaffolds ≥10 kbp, scaffolds ≥5 kbp and scaffolds ≥1 kbp) in order to test the effects on the N50 statistic and gene recovery rate of removing short and erroneous scaffolds/contigs (electronic supplementary material, figure S1). We determined that restricting the assembly to scaffolds ≥1 kbp did not affect our predicted proteome complement and increased the N50. The filtered ≥1 kbp Platanus assembly, along with the mitochondrial genome assembly, are deposited in EBI with the accessions: Study ID, PRJEB13950; Scaffolds, FLMG01000001-FLMG01004758; and Mitochondria, LT578416. The full assembly and other filtered datasets can be accessed at https://github.com/guyleonard/hyphochytrium or https://www.ebi.ac.uk/biostudies/studies/S-BSST46.
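The effect of length filtering on N50 can be illustrated with a minimal Python sketch. This is not the code used for the assembly; the function names and the scaffold lengths are invented for illustration, and only the standard definition of N50 is assumed.

```python
def n50(lengths):
    """Return the N50: the length L such that scaffolds of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def filter_by_min_length(lengths, min_bp):
    """Keep only scaffolds that are at least min_bp long."""
    return [length for length in lengths if length >= min_bp]


# Hypothetical scaffold lengths in bp (not taken from the H. catenoides assembly):
# four long scaffolds plus many short fragments.
scaffolds = [8000, 6000, 4000, 2000] + [500] * 24

for cutoff in (0, 1000, 5000):
    kept = filter_by_min_length(scaffolds, cutoff)
    print(f"cutoff={cutoff} bp  scaffolds={len(kept)}  N50={n50(kept)}")
```

In this toy example, removing the short fragments raises the N50 because they no longer contribute to the total length, which is the same qualitative effect reported for the ≥1 kbp dataset.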
K-mer counting analysis was conducted using JELLYFISH along with two publicly available tools (the script estimate_genome_size.pl and the website GenoScope; see https://github.com/josephryan/estimate_genome_size.pl and [132]). The average sequencing coverage of this assembly was estimated using the 'estimate_genome_size.pl' tool for the total assembly and using 'genomeCoverageBed' from BEDTOOLS [133] for the ≥1 kbp subset of scaffolds.
Gene prediction was conducted by using CEGMA to predict which of the 248 core genes are present in our Hyphochytrium ≥1 kbp scaffolds; these predicted CEGs were then used in the training step of the program SNAP (see http://korflab.ucdavis.edu/software.html) to generate a set of ab initio gene models. The program GENEMARK-ES [134] was also run independently on the ≥1 kbp scaffold data, which produced another set of gene models. Both these sets of gene models are in the form of a hidden Markov model (HMM). A first pass of the pipeline MAKER was then run with the default settings, incorporating the gene models from SNAP and GENEMARK-ES while also deriving alignment statistics from the 454-transcriptome assembly with tBLASTn, REPEATMASKER [135] and EXONERATE [136]. The output is a set of gene models in GFF3 format. A second round of SNAP was then performed with the new predictions (after the GFF3 had been converted to an HMM) and the program AUGUSTUS [137] was run in ab initio mode using the MAKER first pass predictions (i.e. AUGUSTUS default gene models were not used as they are generated from distantly related taxa). Both outputs of SNAP (run 2) and AUGUSTUS were then fed back into MAKER for a second run with stricter settings (gene predictions are available here: https://github.com/guyleonard/hyphochytrium/tree/master/gene_predictions). The final output is a GFF3 file, transcripts and protein FASTA files. The resulting gene predictions were then BLAST searched against the SwissProt database along with INTERPROSCAN to assign putative annotations. The results were then used with the program ANNIE [138] to provide the correct format of annotation information to the program GAG [139] for database deposition. The resulting genome data are submitted as an update of a prior BioProject sequence submission [63]; to do this we used the 'gff3toembl' program from PROKKA [140].
Previously, we had sequenced a transcriptome from the same culture strain of Hyphochytrium [63] using 454 FLX sequencing of cDNA reads and assembled it with NEWBLER 2.5 [141] using the default cDNA settings. We removed 70 sequences from this assembly that were less than 100 bp in length (excluding the polyA regions) and/or consisted predominantly of repeat motifs. This resulted in 6202 transcript sequences. The reads were also assembled in TRINITY, but this resulted in significantly more (nearly double) contigs.
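The principle behind k-mer based genome size estimation can be sketched as follows. This is a simplified illustration and not the algorithm implemented in estimate_genome_size.pl or GenoScope: it assumes a single homozygous coverage peak, treats low-multiplicity k-mers as sequencing errors, and uses an invented histogram. The function name and the `min_multiplicity` parameter are hypothetical.

```python
def estimate_genome_size(histogram, min_multiplicity=3):
    """Estimate genome size from a k-mer histogram.

    histogram maps k-mer multiplicity -> number of distinct k-mers seen
    that many times (the kind of table a k-mer counter can report).
    K-mers seen fewer than min_multiplicity times are assumed to be
    sequencing errors and are ignored.
    Returns (estimated genome size in k-mers, peak multiplicity).
    """
    usable = {m: c for m, c in histogram.items() if m >= min_multiplicity}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    # The modal multiplicity approximates the k-mer sequencing depth.
    peak = max(usable, key=usable.get)
    # Total k-mer occurrences divided by depth approximates genome size.
    total_kmers = sum(m * c for m, c in usable.items())
    return total_kmers / peak, peak


# Invented toy histogram: an error tail at multiplicity 1-2, then a peak at 20.
toy_histogram = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
size, depth = estimate_genome_size(toy_histogram)
print(f"peak depth = {depth}, estimated size = {size:.0f} k-mers")
```

Real data usually need more care than this, for example to handle heterozygous peaks and repeats, which is one reason dedicated tools are used.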

Assessment of contamination of the genome sequence
To identify any prokaryotic contamination in the ≥1 kbp scaffold assembly, we first conducted BLASTn searches of the assembly using prokaryotic SSU and LSU rDNA sequences as search seeds ( greater than 50% of the subsections with a top BLAST hit to a prokaryotic genome and only 20 of the scaffolds had greater than 70% of their top BLAST hits to a prokaryotic genome. These 20 scaffolds were inspected manually; 11 of these showed the presence of putative spliceosomal introns and/or other genes more similar to other eukaryotic genes. For the remaining nine scaffolds (totalling 31.8 kbp), we could not exclude them as possible prokaryotic contamination (listed in electronic supplementary material, table S9).
Comparison of GC content versus read coverage, coupled with BLASTn analysis to identify likely aberrant genomic affiliation of assembly scaffolds (e.g. 'blobology' [142]), has emerged as a useful tool for identifying contamination of genome-sequencing projects [143]. We undertook this approach on both the ≥1 kbp scaffold assembly and the total assembly, and the graphs did not identify any suspect traces of contamination; however, they do show the presence of the mitochondrial genome as an aberrant cluster of 'blobs', i.e. with lower than average GC content (electronic supplementary material, figure S13a-d).
A fourth round of checks for contamination was conducted by using tetramer counting of the ≥1 kbp scaffold dataset for the building of Emergent Self Organising Maps [144]. These use similarities in the 4-mer frequencies to build, by way of an artificial neural network, an emergent 'map' of the input space properties of the data. Two runs of the software developed by Dick et al. [144] were completed (see electronic supplementary material, figure S14a,b): (i) the Hyphochytrium scaffolds only and (ii) the Hyphochytrium scaffolds along with the scaffolds from eight 'small' genomes which were added to the tetramer frequency dataset, (Bacteria
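The GC-content axis of such a plot can be sketched in a few lines of Python. This is a minimal illustration only and not the 'blobology' pipeline: the sequences, the scaffold names and the `max_deviation` threshold are invented, and read coverage and taxonomic assignment are omitted.

```python
from statistics import median


def gc_content(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def gc_outliers(scaffolds, max_deviation=0.15):
    """Names of scaffolds whose GC content differs from the median GC
    of all scaffolds by more than max_deviation (an arbitrary cutoff)."""
    values = {name: gc_content(seq) for name, seq in scaffolds.items()}
    centre = median(values.values())
    return sorted(
        name for name, gc in values.items() if abs(gc - centre) > max_deviation
    )


# Invented toy scaffolds: three GC-rich 'nuclear' sequences and one AT-rich
# sequence standing in for an organellar scaffold.
toy_scaffolds = {
    "scaffold_1": "GGCCGGCCAT",
    "scaffold_2": "GGCCGGCATT",
    "scaffold_3": "GCGCGCGCAT",
    "scaffold_mt": "ATATATATGC",
}
print(gc_outliers(toy_scaffolds))
```

Scaffolds flagged in this way are candidates for inspection rather than confirmed contaminants; in the analysis described above, the low-GC cluster corresponded to the mitochondrial genome.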

Hyphochytrium catenoides genome qPCR size estimation
The haploid genome size of H. catenoides was estimated using a qPCR-based method [19]

Mitochondrial genome assembly
Contigs of putative mitochondrial origin, from both assemblies, were identified by BLAST searches against the mitochondrial genome of Phytophthora infestans (NC_002387.1). The contigs from the genome assemblies were visualized, linked and edited using the program SEQUENCHER (https://www.genecodes.com), resulting in two contigs. However, we were unable to circularize the genome using these two fragments. Therefore, regions spanning the gaps in the mtDNA super-contigs were amplified by polymerase chain reaction (PCR) with primers specific to the flanking sequences. Purified PCR products were sequenced using Sanger chemistry (externally at Eurofins Genomics, Ebersberg). This allowed the two contigs to be joined, resulting in a linear genome flanked on one end with rpl16 and atp8 on the other. These genes were identical to the other rpl16 and atp8 genes found in the assembled mitochondrial genome; we therefore inferred that these represented the beginning and end of a 19 kb inverted repeat (electronic supplementary material, figure S2). Mitochondrial genes were identified and annotated using MFANNOT (http://megasun.bch.umontreal.ca/cgi-bin/mfannot/mfannotInterface.pl, last accessed 20 June 2017) followed by manual inspection. The putatively circular genome was visualized using CGVIEW [146]. Results and discussion of the mitochondrial data can be found in the electronic supplementary material, figure S2.

Secretome analysis
Putatively secreted proteins were predicted using a custom pipeline (https://github.com/fmaguire/predict_secretome/tree/refactor) which identifies sequences predicted to have a signal peptide (via SIGNALP 4.1 [150]), no TM domains in their mature peptide (via TMHMM 2.0c [151,152]), a signal peptide that targets for secretion (via TARGETP [153]) and belonging to the extracellular 'compartment' (as predicted by WoLFPSORT 0.2 [154]). The CAZY database [155] was downloaded, converted into a BLAST-DB and searched using the predicted proteome and secretomes using BLASTp with an expectation of 1 × 10⁻⁵. Hit tallies were then summed, proportions calculated and data plotted in Python via the PANDAS and SEABORN packages (figure 3).
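The conjunction of the four criteria can be sketched in Python as below. This is not the predict_secretome pipeline itself: the record structure, field names and example proteins are hypothetical, and the parsing of each tool's output is left out. It only shows that a protein is retained when all four predictions agree, and how a proportion is then derived.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Per-protein calls, as might be parsed from each tool's output."""
    protein_id: str
    has_signal_peptide: bool    # signal peptide call (e.g. from SignalP)
    tm_domains_in_mature: int   # TM helices in the mature peptide (e.g. TMHMM)
    secretory_target: bool      # secretory pathway call (e.g. from TargetP)
    compartment: str            # predicted location (e.g. from WoLF PSORT)


def is_putatively_secreted(p):
    """A protein passes only if all four criteria are met."""
    return (
        p.has_signal_peptide
        and p.tm_domains_in_mature == 0
        and p.secretory_target
        and p.compartment == "extracellular"
    )


def secreted_fraction(predictions):
    """Return (ids of putatively secreted proteins, their proportion)."""
    secreted = [p.protein_id for p in predictions if is_putatively_secreted(p)]
    fraction = len(secreted) / len(predictions) if predictions else 0.0
    return secreted, fraction


# Invented example records; the identifiers are placeholders.
example = [
    Prediction("prot_A", True, 0, True, "extracellular"),
    Prediction("prot_B", True, 2, True, "plasma_membrane"),
    Prediction("prot_C", False, 0, False, "cytosol"),
    Prediction("prot_D", True, 0, True, "extracellular"),
]
print(secreted_fraction(example))
```

Requiring agreement between independent predictors is conservative: it lowers the false-positive rate at the cost of missing proteins that any single tool mislabels.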

Phylogenetic analysis of individual gene families
Unless otherwise stated in the figure legends all phylogenetic analyses were conducted using the following protocols. Using BLASTp we used the seed sequence to identify putative homologues across a locally maintained database of eukaryotic and prokaryotic genome-derived protein datasets (electronic supplementary material,  [157]. Sequences that required a high level of site exclusion (due to the sequence not aligning or not masking well) or where they formed long branches in preliminary analysis were removed. The phylogenies were calculated using RAxML [31] with 1000 (non-rapid) bootstrap replicates and using the substitution matrix and gamma distribution identified using PROTTEST3 (v. 3.2.1). In some cases, the invariant sites parameter was also included in the model (if indicated in the PROTTEST3 analysis).
To identify putative orthologues that arose at the base of the Pseudofungi, gene clusters identified from 74 genomes (electronic supplementary material, table S11) were mapped onto the species phylogeny using a pipeline described at https://github.com/guyleonard/orthomcl_tools and http://dx.doi.org/10.5281/zenodo.51349. Putative pseudofungal specific orthologues were individually tested by conducting gene phylogeny, as described above, combined with additional BLAST searches of NCBI and JGI databases to test and improve taxon sampling (see electronic supplementary material, table S3 for the resulting set of pseudofungal specific orthologues).

Multi-gene concatenated phylogenetic analysis to identify the branching position of Hyphochytrium catenoides
Using previously established methods [25,158], we built a concatenated amino acid alignment of 325 orthologues resulting in a masked data matrix of 128 taxa consisting of 90 203 amino acid sites constructed from previously identified seed alignments [25]. This dataset encompassed a wide sampling of eukaryotes as well as a broad sampling of stramenopiles available in public databases (e.g. [24,25] [27,28] and with the site-heterogeneous model of evolution LG+G4+C60+F+PMSF (posterior mean site frequencies) substitution model [29]. The full phylogeny for each is shown in the electronic supplementary material, figure S6a and b. Partitioned phylogenomic species trees were inferred using IQ-TREE v. 1.5.5, allowing each partition to have its own model and evolutionary rates. Each partition was independently analysed under the LG+G4 model of evolution. This analysis encompassed 1000 ultrafast bootstrap replicates. For summary-coalescent species tree estimation, we employed ASTRAL [23] with default settings and with species tree topology and node support estimated with ASTRAL multilocus bootstrapping (100 replicates). For this coalescence tree, ASTRAL was given all single gene RAxML (PROTCATLGF) best ML phylogenies and 100 rapid bootstrap replicates for each single gene alignment. Internode certainty (IC) was calculated for the IQ-TREE supermatrix ML tree (LG+G4+C60+F+PMSF) for both datasets (325 and the 162-50RTC). These values were calculated in RAxML v.8.2.6 [31] by comparing the overall ML bipartitions to those in the best individual ML gene trees. These IC values, along with the tree certainty (TC) values, are mapped on the phylogeny shown in the electronic supplementary material, figure S7a and b.
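For readers unfamiliar with internode certainty, the quantity can be illustrated with a short Python sketch. It assumes the commonly used two-bipartition definition, in which the frequency of a bipartition among the gene trees is compared with that of its most prevalent conflicting bipartition; the exact variant and sign conventions applied by RAxML may differ, and the counts below are invented.

```python
from math import log2


def internode_certainty(support, conflict):
    """Internode certainty from two counts of gene trees.

    support  -- number of gene trees containing the bipartition
    conflict -- number containing the most prevalent conflicting bipartition
    Uses IC = 1 + p1*log2(p1) + p2*log2(p2), with p1 and p2 the two counts
    normalised to sum to 1. IC is 1 with no conflict and 0 when the two
    bipartitions are equally frequent.
    """
    total = support + conflict
    if total == 0:
        raise ValueError("no gene trees carry either bipartition")
    ic = 1.0
    for count in (support, conflict):
        p = count / total
        if p > 0:  # the limit of p*log2(p) as p -> 0 is 0
            ic += p * log2(p)
    return ic


# Invented counts for three hypothetical internodes.
for support, conflict in [(100, 0), (75, 25), (50, 50)]:
    print(support, conflict, round(internode_certainty(support, conflict), 3))
```

Under this definition, tree certainty is the sum of the IC values over all internal branches, so a tree with many contested internodes has a low TC even when bootstrap support is high.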

Identification of genes of plastid ancestry
We constructed a database of taxonomically diverse representative genomes (electronic supplementary material, table S11) and clustered the respective proteomes into putative orthologue groups using OrthoMCL [67], retaining only the groups containing H. catenoides genes. Next, we resampled sequences from a wider database of 1205 taxa (electronic supplementary material, table S10) using BLASTp searches [159] to recover up to three sequences from each genome using a gathering threshold of 1 × 10⁻¹⁰. We then filtered these clusters, identifying only those containing both a H. catenoides gene and genes from photosynthetic or ancestrally-photosynthetic eukaryotic taxa. These sequences were then aligned using MAFFT [160], masked using TRIMAL [110] and a phylogeny was calculated from the data matrix using FASTTREE2 [160]. The resulting phylogenies were manually inspected for a phylogeny that showed H. catenoides/pseudofungal/stramenopile genes which: (a) branched within the Archaeplastida radiation, (b) branched with genes of photosynthetic eukaryotes and within a bacterial radiation or (c) branched with cyanobacterial genes. This process required re-running of the phylogenetic pipeline for many gene clusters, either reducing gene sampling or removing long-branch sequences. A subset of 101 gene cluster phylogenies putatively showed a phylogenetic relationship consistent with criteria (a)-(c) described above. The alignments from these clusters were then manually refined, the taxon sampling checked using manual BLAST searches of the NCBI nr database and phylogenies recalculated using the RAxML approach described above. The results of this analysis identified four candidate plastid endosymbiosis acquired genes; these are presented and discussed in the electronic supplementary material, figure S3.
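The resampling step, in which up to three sequences per genome are recovered below an expectation threshold, can be sketched as follows. This is an illustrative reimplementation under stated assumptions and not the pipeline used here: hits are assumed to have already been parsed into (genome, subject identifier, e-value) tuples, and all names and values are invented.

```python
from collections import defaultdict


def top_hits_per_genome(hits, max_evalue=1e-10, per_genome=3):
    """Keep at most per_genome best-scoring hits from each genome.

    hits is an iterable of (genome, subject_id, evalue) tuples.
    Hits with an e-value above max_evalue are discarded; the remainder
    are ranked by ascending e-value within each genome.
    """
    by_genome = defaultdict(list)
    for genome, subject_id, evalue in hits:
        if evalue <= max_evalue:
            by_genome[genome].append((evalue, subject_id))
    return {
        genome: [subject_id for _, subject_id in sorted(ranked)[:per_genome]]
        for genome, ranked in by_genome.items()
    }


# Invented hits for one query sequence.
toy_hits = [
    ("genome_A", "a1", 1e-50),
    ("genome_A", "a2", 1e-30),
    ("genome_A", "a3", 1e-12),
    ("genome_A", "a4", 1e-11),  # passes the threshold but ranks fourth
    ("genome_A", "a5", 1e-5),   # fails the threshold
    ("genome_B", "b1", 1e-3),   # fails the threshold
    ("genome_C", "c1", 1e-20),
]
print(top_hits_per_genome(toy_hits))
```

Capping the number of sequences per genome keeps lineage-specific paralogue expansions from dominating the alignments that are passed to tree building.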

Testing for CYP51 sterol-demethylase drug sensitivity
Azole susceptibility was assessed using a modification of the protocol reported in Warrilow et al. [42]. Briefly, fluconazole and clotrimazole were dissolved in dimethyl sulfoxide (DMSO) to a stock concentration of 25.6 mg ml⁻¹. Dilutions were then made with DMSO to prepare 100× stock solutions. These stocks were diluted in PYG (1.25 g l⁻¹ peptone, 1.25 g l⁻¹ yeast extract, 3 g l⁻¹ glucose) medium to a final volume of 5 figure S8 for the results of the CYP51 and drug treatment analysis).

OmniLog 'phenotype microarrays'
Cultures of H. catenoides (100 ml) were grown in PYG in baffled flasks at 25°C with shaking at 170 r.p.m. to minimize aggregation. Cells were recovered by centrifugation at 3200g, washed twice with water and re-suspended in PYG (as above, no carbon-source) to a final concentration of approximately 1.5 × 10³ cells ml⁻¹. Cells were allowed to recover at 25°C with shaking for 30 min before Dye mix D (Biolog) was added to a 1× final concentration. A 100 µl aliquot of cells was inoculated into each well of PM1 and PM2 carbon-source plates and incubated for 7 days at 25°C. Each growth assay was performed in triplicate from independent cultures. OmniLog Phenotype Microarray outputs were analysed using OPM [162]. Data were aggregated using the 'opmfast' method, analysed using the A parameter (maximum value of OmniLog units reached) and tested by t-test. Significant p-values were retained only where they corresponded to an increased respiration rate relative to the negative control well A01 (see electronic supplementary material, figure S9 for the results of the OmniLog analysis).
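
The logic of this comparison can be sketched in Python. The sketch is not the OPM implementation: it takes the A parameter to be the simple maximum of each curve, uses a pooled-variance (Student's) t-test, and applies a one-sided critical value at α = 0.05, all of which are assumptions, since the text does not specify the t-test variant or the significance level. The function names are illustrative.

```python
import math
from statistics import mean, variance

# One-sided critical t value for alpha = 0.05 with 4 degrees of freedom,
# i.e. three replicates per group with pooled variance (assumed design).
T_CRIT_ONE_SIDED_DF4 = 2.132


def a_parameter(curve):
    """Maximum OmniLog value reached in one respiration curve."""
    return max(curve)


def pooled_t(sample, control):
    """Student's t statistic (pooled variance) for sample versus control."""
    n1, n2 = len(sample), len(control)
    pooled_var = ((n1 - 1) * variance(sample)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean(sample) - mean(control)
    if se == 0:
        # No within-group variation: the sign of the difference decides.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def increased_respiration(well_curves, control_curves,
                          t_crit=T_CRIT_ONE_SIDED_DF4):
    """True if the well's A values are significantly higher than those of
    the negative control. The default t_crit is valid only for 3 versus 3
    replicates; pass a different value for other designs."""
    sample = [a_parameter(c) for c in well_curves]
    control = [a_parameter(c) for c in control_curves]
    return pooled_t(sample, control) > t_crit
```

Testing in one direction only mirrors the stated rule that a carbon source counts as a hit only when respiration exceeds that of the negative control well A01.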

Confirmation of viral genes in the Hyphochytrium catenoides assembly and reverse-transcriptase PCR of viral genes
To confirm that the viral genes were assembled correctly and were resident in the H. catenoides genome, PCRs were performed across the 3′ and 5′ junctions of the putative viral open reading frame for three of the viral genes: polB, mcp and mg96. PCR reactions (25 µl; 1× Phusion HF buffer, 400 µM dNTP mix, 200 nM each primer, 0.5 U Phusion polymerase) were performed with the following cycling conditions: initial denaturation of 5 min at 98°C, followed by 30 cycles of 10 s at 98°C, 30 s at 56-64°C and 1 min at 72°C, then a final extension of 5 min at 72°C. The products were purified using a GeneJET PCR Purification Kit or GeneJET Gel Extraction kit (Thermo Scientific) and sequenced to confirm that each product matched the expected amplicon. To confirm that the mcp gene was on the same contig as the histone H3 gene, we performed a PCR across these two genes (expected amplicon of 2837 bp) using the same conditions as above, except with an annealing temperature of 64°C and a 3-min extension. The PCR product was purified and A-tailed using Taq polymerase, then cloned using the StrataClone PCR Cloning Kit (Agilent Technologies). The resulting vector was sequenced using T3/T7 primers, with primer-walking to confirm the entire 2.8 kb sequence.
To investigate whether the viral-derived genes are actively transcribed in our culture conditions, we conducted RT-PCR of the polB, mcp, mg96 and rps3 virus-derived genes. This confirmed that polB, mg96 and rps3 are expressed in our culture conditions, suggesting that the viral-like genes are transcriptionally active. RNA was extracted from H. catenoides using RNA PowerSoil Total RNA Isolation (MoBio). Residual genomic DNA was removed using RQ1 RNase-Free DNase (Promega) and Taq PCR was performed to confirm the absence of DNA. Reverse-transcriptase PCR (RT-PCR) was then performed using a Qiagen OneStep kit according to the manufacturer's instructions, alongside genomic DNA positive and no-template controls. The following cycling conditions were used: reverse transcription for 30 min at 50°C and initial denaturation for 15 min at 94°C, followed by 32 cycles of 1 min at 94°C, 1 min at 50°C and 1 min at 72°C, then a final extension of 10 min at 72°C. Samples were then analysed on a 2% (w/v) agarose gel.

WGA staining
Hyphochytrium catenoides was grown for 7 days at 25°C and 100 µl of mycelial growth was removed and suspended in 1 ml PBS, then 5 µg ml⁻¹ calcofluor white (Fluka) and 10 µg ml⁻¹ WGA, Alexa Fluor 488 conjugate (Invitrogen) were added and cells were incubated for 30 min in the dark. Cells were washed twice in PBS and imaged using an Olympus IX73 microscope with a 40× objective. Unstained cells were also checked to confirm the absence of autofluorescence.