Combined morphological and phylogenomic re-examination of malawimonads, a critical taxon for inferring the evolutionary history of eukaryotes

Modern syntheses of eukaryote diversity assign almost all taxa to one of three groups: Amorphea, Diaphoretickes and Excavata (comprising Discoba and Metamonada). The most glaring exception is Malawimonadidae, a group of small heterotrophic flagellates that resemble Excavata morphologically, but branch with Amorphea in most phylogenomic analyses. However, just one malawimonad, Malawimonas jakobiformis, has been studied with both morphological and molecular-phylogenetic approaches, raising the spectre of interpretation errors and phylogenetic artefacts from low taxon sampling. We report a morphological and phylogenomic study of a new deep-branching malawimonad, Gefionella okellyi n. gen. n. sp. Electron microscopy revealed all canonical features of ‘typical excavates’, including flagellar vanes (as an opposed pair, unlike M. jakobiformis but like many metamonads) and a composite fibre. Initial phylogenomic analyses grouped malawimonads with the Amorphea-related orphan lineage Collodictyon, separate from a Metamonada+Discoba clade. However, support for this topology weakened when more sophisticated evolutionary models were used, and/or fast-evolving sites and long-branching taxa (FS/LB) were excluded. Analyses of ‘–FS/LB’ datasets instead suggested a relationship between malawimonads and metamonads. The ‘malawimonad+metamonad signal’ in morphological and molecular data argues against a strict Metamonada+Discoba clade (i.e. the predominant concept of Excavata). A Metamonada+Discoba clade should therefore not be assumed when inferring deep-level evolutionary history in eukaryotes.


Introduction
Most current views of the diversity of eukaryote life divide almost all known taxa into three massive assemblages [1-5]. These are: (i) Amorphea, which includes animals, fungi, choanoflagellates, many amoebae and most slime moulds; (ii) Diaphoretickes, encompassing land plants, almost all algae, and many heterotrophs like ciliates and foraminifera; and (iii) Excavata, which includes the euglenid algae, diverse parasites (e.g. trypanosomatids, trichomonads, Giardia), and various free-living protozoa like jakobids, heteroloboseids and Carpediemonas (alternative names for similar major assemblages are sometimes used [2]). The Excavata grouping contains two main subclades, Metamonada and Discoba, which are each robustly supported by molecular phylogenetics [6,7]. Some taxa in both Metamonada and Discoba are so-called 'typical excavates', organisms that share a characteristic suspension-feeding groove supported by a complex and specific flagellar apparatus cytoskeleton, as well as a vane-bearing posterior flagellum. These features unite Excavata morphologically [8].
Despite this eukaryote-wide phylogenetic framework, there remain a number of enigmatic protist lineages with poorly resolved evolutionary affinities. The most extraordinary example is Malawimonadidae. Malawimonads are small aerobic heterotrophic flagellates with a feeding groove [9]. An electron microscopy study of Malawimonas jakobiformis identified most of the 'typical excavate' cytoskeletal features [8-10], and phylogenies of one or a few slowly-evolving marker genes usually place malawimonads as a relative of some or all other excavates, though usually with only modest support [11-14]. By contrast, most phylogenomic analyses, which examine scores to hundreds of genes, show malawimonads branching separately from other excavates, and instead place them with Amorphea [7,15-22]. If accurate, this inference profoundly impacts our understanding of the history of eukaryotic cells. Assuming the 'excavate-type' cell architecture is truly homologous in malawimonads and other 'typical excavates', it implies that the last common ancestor of all living eukaryotes was a 'typical excavate' itself, under the most popular model for the placement of the root of the tree of eukaryotes [21,23]. This is a remarkably specific inference about a pivotal species that lived more than a billion years ago.
To date, only one species of malawimonad has been described, Malawimonas jakobiformis. All published morphological information is from one strain of M. jakobiformis [9], while almost all analyses of molecular sequences employ data from two strains, the type strain of M. jakobiformis and a second, undescribed strain usually known informally as 'Malawimonas californiana' [7,15,16,21,24-26]. Given the importance of malawimonads for understanding the deep-level evolutionary history of eukaryotes, this is a perilously narrow base of information.
Recently, the mitochondrial genome was reported from a third malawimonad, 'strain 249' [27]. Here, we describe strain 249 as Gefionella okellyi n. gen. n. sp. (see Taxonomic summary, below). Gefionella okellyi proves to be sister to the previously studied malawimonads. We determined the three-dimensional architecture of its flagellar apparatus cytoskeleton, and conducted phylogenomic analyses incorporating transcriptomic data. Our new data provide a broader base of understanding for malawimonads, allowing for a critical examination of the affinities of this mysterious group.

[…] tubes containing 3 ml of 25%-strength cerophyl medium (ATCC medium 802; ScholAR Chemistry, West Henrietta, NY, USA), with mixed unidentified bacteria. Bulk cultures were grown in 4-l flasks containing 1.0-1.5 l of 100% cerophyl medium, on a rotary shaker (120 rpm), at room temperature (RT).

Microscopy
Live cultures were observed using phase contrast and differential interference contrast optics, with 100× oil-immersion objectives and a 1.6× 'optovar' lens, and documented using a 1.4-megapixel camera.
Cells were fixed for scanning electron microscopy (SEM) using an osmium tetroxide vapour protocol [28], collected on 2.0-µm Isopore filters (Millipore), dehydrated through an ethanol series, critical-point dried in CO2, and sputter-coated with gold/palladium. Cells were imaged using only the secondary electron detector of the SEM at 20 keV.
For transmission electron microscopy (TEM), 3 ml of culture was concentrated by centrifugation (3000 × g for 5 min), fixed in 2.5% glutaraldehyde, rinsed twice, postfixed in 1% osmium tetroxide, and rinsed three times. All steps through the first post-OsO4 rinse were performed in 50% cerophyl medium; the final two rinses were in distilled water. Cells were enrobed in 2% agarose, dehydrated through an ethanol series (30%, 50%, 70%, 80%, 90%, 95% (two changes) and 100% (three changes); 10 min per change), transferred to propylene oxide (50% with ethanol, then three changes in pure reagent), and embedded in SPI-Pon resin (SPI) with intermediate 1 : 2 and 2 : 1 changes in resin : propylene oxide. Serial sections approximately 50 nm thick were cut with a diamond knife, mounted on pioloform film in slot grids, stained with uranyl acetate (10 min) and lead citrate (5 min), and observed on a TEM equipped with a goniometer stage and a 14-megapixel camera. Eighteen series of 8-21 sections were documented, plus several shorter series. A three-dimensional (3-D) model was derived from one 21-section series, as described previously [29]. Briefly, the micrographs were first annotated by hand in a vector drawing program. The vector data were then imported into a 3-D modelling program, where they were aligned and scaled appropriately. Annotations corresponding to the same structure (e.g. the same microtubule) were identified, and model structures were constructed using the annotations as a framework. The assembled model contained significant preparation artefacts (e.g. compression, skew), which were corrected by hand. All stages of the reconstruction process were carried out with reference to multiple other series; no structure was represented in the model unless it could be identified in at least one other series.
Raw data were assembled into contigs using 'Inchworm' from the 'Trinity' package [30]. Low-k-mer contigs were removed to exclude mild contamination introduced during sequencing. Sequences were added to a published 159-gene phylogenomic dataset using an in-house Python pipeline [31,32]. This dataset also included recently reported transcriptome data from shorter-branching metamonads, including Trimastix marina [33]. All phylogenetic trees based on single-gene datasets were inspected by eye, and paralogues and potential lateral transfers were removed. Additionally, all bipartitions in single-gene trees with bootstrap proportions (BP) greater than 70% were cross-checked against a reference tree of eukaryotes, and conflicting bipartitions were examined by eye. The final dataset as analysed here had 84 taxa and 42 564 sites, with G. okellyi showing 75% site coverage.
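The bipartition cross-check described above can be illustrated with a short sketch. This is not the in-house pipeline [31,32]; it assumes that each single-gene tree and the reference tree have already been decomposed into bipartitions, each stored as a frozenset of the taxon names on one side, and the names `compatible` and `flag_conflicts` are hypothetical. Two bipartitions of the same taxon set conflict when all four intersections between their sides are non-empty, so a well-supported gene-tree bipartition is flagged if it conflicts with any reference bipartition pruned to the taxa sampled for that gene.

```python
from typing import FrozenSet, List, Tuple

Split = FrozenSet[str]  # taxon names on one side of a bipartition


def compatible(a: Split, b: Split, taxa: FrozenSet[str]) -> bool:
    """Two bipartitions are compatible if at least one of the four
    intersections between their sides is empty."""
    a_c, b_c = taxa - a, taxa - b
    return not (a & b) or not (a & b_c) or not (a_c & b) or not (a_c & b_c)


def flag_conflicts(
    gene_taxa: FrozenSet[str],
    gene_splits: List[Tuple[Split, float]],
    reference_splits: List[Split],
    min_bp: float = 70.0,
) -> List[Tuple[Split, float]]:
    """Return gene-tree bipartitions with BP > min_bp that conflict with the
    reference tree, so they can be examined by eye."""
    # Prune reference bipartitions to the taxa sampled for this gene.
    pruned = [ref & gene_taxa for ref in reference_splits]
    flagged = []
    for split, bp in gene_splits:
        if bp <= min_bp:
            continue  # only well-supported bipartitions are cross-checked
        if any(not compatible(split, ref, gene_taxa) for ref in pruned):
            flagged.append((split, bp))
    return flagged


# Toy example: reference tree ((A,B),C,(D,E)).
taxa = frozenset("ABCDE")
reference = [frozenset("AB"), frozenset("DE")]
gene = [(frozenset("AC"), 85.0), (frozenset("DE"), 95.0), (frozenset("BC"), 60.0)]
print(flag_conflicts(taxa, gene, reference))  # only the A+C bipartition is flagged
```

Pruning the reference to each gene's taxa matters because single-gene datasets rarely sample every taxon; a pruned reference bipartition that becomes trivial (empty or a single taxon) is always compatible and is never flagged.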
The dataset was initially analysed using maximum likelihood (ML) as implemented in RAxML v. 7.8.1 [34] with the site-homogeneous evolutionary model LG+Γ+I. Parameters were estimated by the software and 500 bootstrap replicates were performed. A second ML analysis was conducted in IQ-TREE v. 1.5.5 [35] using a site-heterogeneous model (LG+C60+F+Γ4), with robustness assessed via 'ultrafast' bootstrap approximation (1000 replicates). We also performed a Bayesian analysis using PHYLOBAYES-MPI [36] on the full-taxon 159-gene dataset, using the site-heterogeneous CAT-GTR+Γ4 model, with four chains sampled every second generation for 24 000 generations. Even after this computationally expensive run, only two of the four chains had converged (maxdiff = 0.168); these two chains were summarised after discarding the first 20% of generations as burn-in.
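The maxdiff statistic used above to judge convergence compares bipartition frequencies between chains. The sketch below illustrates the idea under stated assumptions: each sampled tree is given as a set of split labels, the first 20% of samples are discarded as burn-in, and for each bipartition we take the spread (maximum minus minimum) of its frequency across chains, reporting the largest spread. It is not PhyloBayes's `bpcomp` code, how `bpcomp` combines more than two chains is not assumed here, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set


def split_frequencies(
    samples: Sequence[Set[Hashable]], burnin_fraction: float = 0.2
) -> Dict[Hashable, float]:
    """Frequency of each bipartition in one chain, after discarding burn-in."""
    kept = samples[int(len(samples) * burnin_fraction):]
    counts = Counter(split for tree in kept for split in tree)
    return {split: n / len(kept) for split, n in counts.items()}


def max_diff(chains: List[Dict[Hashable, float]]) -> float:
    """Largest between-chain discrepancy in the frequency of any bipartition.

    A bipartition never sampled by a chain has frequency 0 in that chain.
    """
    all_splits = set().union(*chains)
    return max(
        max(chain.get(s, 0.0) for chain in chains)
        - min(chain.get(s, 0.0) for chain in chains)
        for s in all_splits
    )


# Toy example with two chains of five sampled trees each.
chain_a = [{"X"}] + [{"AB"}] * 4
chain_b = [{"X"}, {"AB"}, {"AB"}, {"AC"}, {"AC"}]
freqs = [split_frequencies(chain_a), split_frequencies(chain_b)]
print(max_diff(freqs))
```

In this toy example the bipartition AB is sampled in every post-burn-in tree of one chain but only half of the other, so the chains clearly disagree; values near zero would indicate that the chains are sampling the same posterior distribution of trees.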
The impact of fast-evolving sites and long-branching taxa (FS/LB) on the phylogenetic inference was assessed in '-FS/LB' analyses as follows. Taxa were sorted by branch length (as inferred under ML using the LG+Γ+I model), and 35 of them (42%) were sequentially removed, to generate 36 taxon sets (including the original alignment). To avoid dependence on the unknown position of the root, we calculated all pairwise tip-to-tip distances and used the average of each taxon's ten longest distances as its branch-length metric. Fast-evolving sites were then removed for each of the 36 taxon sets as follows. An evolutionary rate was estimated for each site using Dist_Est [37], then sites were sorted from fastest- to slowest-evolving and removed in increments of 1000 until 30 000 sites were excluded (generating 30 alternative datasets for each original one). Each of the 1080 datasets (36 × 30) was then bootstrapped using 100 rapid bootstraps in RAxML (model setting PROTCATLGF). Finally, the dataset with the 23 000 fastest-evolving sites and the 22 longest-branching taxa removed was selected for detailed phylogenetic analysis as above, including (i) a maximum-likelihood analysis using the LG+C60+F+Γ4 model in IQ-TREE, with a 1000-replicate 'ultrafast' bootstrap analysis, (ii) a maximum-likelihood analysis under the LG+Γ4+I model with 500 bootstrap replicates in RAxML (v. 8.1.16), and (iii) a Bayesian analysis using PHYLOBAYES-MPI under the CAT-GTR+Γ4 model, as described above, but with convergence among all four chains observed after 32 000 generations (maxdiff = 0.063).
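The construction of the -FS/LB dataset grid described above can be sketched as follows. The sketch assumes a precomputed matrix of tip-to-tip (patristic) distances and per-site rates that have already been estimated (e.g. by Dist_Est, which is not reimplemented here); the function names and the use of NumPy boolean column masks are illustrative assumptions, not the authors' scripts. Following the text, taxa are ranked once by the mean of their ten longest tip-to-tip distances, and site rates would be re-estimated separately for each taxon set before the masks are built.

```python
from typing import List, Sequence, Set

import numpy as np


def taxon_branch_metric(dist: np.ndarray, n_longest: int = 10) -> np.ndarray:
    """Root-independent branch-length proxy: for each taxon, the mean of its
    n_longest tip-to-tip distances to the other taxa."""
    d = np.asarray(dist, dtype=float)
    metric = np.empty(d.shape[0])
    for i in range(d.shape[0]):
        others = np.delete(d[i], i)  # drop the zero self-distance
        metric[i] = np.sort(others)[-n_longest:].mean()
    return metric


def taxon_removal_sets(
    taxa: Sequence[str], dist: np.ndarray, max_removed: int = 35, n_longest: int = 10
) -> List[Set[str]]:
    """Taxon sets with the 0, 1, ..., max_removed longest-branching taxa removed."""
    order = np.argsort(-taxon_branch_metric(dist, n_longest), kind="stable")
    longest_first = [taxa[i] for i in order]
    return [set(taxa) - set(longest_first[:k]) for k in range(max_removed + 1)]


def site_removal_masks(
    site_rates: Sequence[float], step: int = 1000, max_removed: int = 30000
) -> List[np.ndarray]:
    """Boolean column masks (True = keep) that remove the fastest
    step, 2*step, ..., max_removed sites."""
    rates = np.asarray(site_rates, dtype=float)
    fastest_first = np.argsort(-rates, kind="stable")
    masks = []
    for n_removed in range(step, max_removed + 1, step):
        keep = np.ones(rates.size, dtype=bool)
        keep[fastest_first[:n_removed]] = False
        masks.append(keep)
    return masks
```

With 84 taxa, 42 564 sites and the default settings, `taxon_removal_sets` yields 36 taxon sets and `site_removal_masks` yields 30 masks per set, reproducing the 36 × 30 = 1080 datasets that were each bootstrapped. Using the mean of the longest tip-to-tip distances, rather than root-to-tip length, keeps the ranking independent of where the tree is rooted.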

Morphology
Live interphase cells have an approximately 6 µm long main cell body (4.4-7.2 µm; av. 5.9; s.d.: 0.6; n = 30). The main cell body is generally bean-shaped (though sometimes with a pointed posterior, which may generate a temporary extension up to 1.5 µm) and has a ventral feeding groove (figure 1a-d). One groove margin may project slightly as an 'epipodium' (see below; figure […]).

Three microtubular roots, 'R1', 'R2' and 'S', originate near B1, and are associated with the 'typical excavate' set of non-microtubular fibres: 'A', 'B', 'C' and 'I' (see [8]). R1, eventually with six microtubules, originates on the left side of B1, and has the narrow, dense C fibre on its dorsal side (figure 3a,f-i). R2 originates on the right side of B1 as a curved row of about eight microtubules, connected to B1 on its dorsal side by the narrow A fibre (figure 3a,c,e,f). The I fibre adheres to the ventral face of R2, and is thick (approx. 75 nm), with a complex laminate structure (figure 3c,d,g; electronic supplementary material, figure S1b-d). The B fibre is narrow and striated. It originates near B1 (and one end of the distal fibre: see above), and heads right to associate with the right edge of R2 (figure 3d,h). Root 'S' is a single microtubule that originates near the dorsal side of R2 (figure 2f-i; see electronic supplementary material, figure S1).
A novel structure, the P (='paired') fibre, consists of two electron-dense and striated elements joined by fine material. It runs alongside the nucleus and connects the dorsal/right face of R2 to the posterior side of B2 (figure 3a,b,e-h).
Soon after its origin, R2 splits into an inner 'iR2' with six microtubules and an outer 'oR2' that grows to 15+ microtubules by addition along its outer (right) edge. The I fibre continues with oR2 only, and ends approximately 400 nm after the split, distal to which the B fibre connects to the ventral/rightmost part of oR2 (figure 3i), and the P fibre ends against the dorsal side of oR2 (figure 3h). As iR2 and oR2 diverge, a narrow 'G' (='groove') fibre originates against the ventral face of iR2, but bridges the gap between iR2 and oR2, and continues posteriorly with oR2 (electronic supplementary material, figures S1d-g and S2k). Several individual microtubules diverge from both oR2 and iR2 to support the groove membrane between them (electronic supplementary material, figures S1g and S2k,l), while S joins R1, and R1 frays into individual microtubules (electronic supplementary material, figure S2h,i).
figure 4c,d). Rapid bootstrap support (computed in RAxML) for the Malawimonadidae+Collodictyon clade declined with removal of fast-evolving sites, falling from 100% to approximately 50% after removal of 15 000 sites, and later to approximately 25% (figure 4b,c). Support for Metamonada branching with Discoba showed a similar pattern of decline (electronic supplementary material, figure S6). By contrast, support for Opisthokonta (plotted as a control clade in figure 4b) remained at or near 100% throughout this site removal series. In fact, most other major eukaryote groups (e.g. those depicted as triangles in figure 4a) still received (near-)maximal support after removal of 15 000 sites (data not shown), further indicating that the collapse of support for Malawimonadidae+Collodictyon and Metamonada+Discoba was not due to a general loss of deep phylogenetic signal. Interestingly, removing fast-evolving sites and long-branching taxa together revealed a broad 'island' of support for a malawimonads+metamonads clade (figure 4d). A single -FS/LB dataset from this island (23 000 fastest-evolving sites and 22 longest-branching taxa removed) was selected for detailed analysis. These analyses recovered a tree of eukaryotes mostly consistent with the initial phylogeny, but with different positions for malawimonads, metamonads and Collodictyon (figure 5). In this tree, malawimonads branched in a clade with Metamonada (in this case represented by Trimastix and Paratrimastix) that was quite strongly supported in the site-heterogeneous analysis (LG+C60+Γ4+F UFboot 92%), while bootstrap support under the LG+Γ4+F model was 81%, and posterior probability was low (0.7) in the Bayesian analysis (CAT-GTR+Γ4 model). This Malawimonadidae+Metamonada clade branched adjacent to Amorphea on the unrooted tree, while Collodictyon branched between this grouping and Discoba+Diaphoretickes (figure 5).
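Each value on the support plots summarised above is, in essence, the percentage of bootstrap replicate trees that recover a given clade. A minimal sketch of that calculation follows, assuming replicate trees are supplied as sets of unrooted bipartitions (frozensets of taxon names) and that the clade is first restricted to the taxa remaining after long-branch removal; `clade_support` is a hypothetical helper, not a RAxML function.

```python
from typing import FrozenSet, Iterable, List, Optional, Set


def clade_support(
    replicate_splits: List[Set[FrozenSet[str]]],
    clade: Iterable[str],
    taxa: FrozenSet[str],
) -> Optional[float]:
    """Percentage of bootstrap replicates that recover `clade` as a bipartition.

    On an unrooted tree a clade is one side of a split, so either the clade
    or its complement may appear in a replicate's split set.
    """
    side = frozenset(clade) & taxa  # ignore clade members removed as long branches
    other = taxa - side
    if len(side) < 2 or len(other) < 2:  # trivial split: support is undefined
        return None
    hits = sum(1 for splits in replicate_splits if side in splits or other in splits)
    return 100.0 * hits / len(replicate_splits)
```

Computing this value for one clade in every cell of the taxon-removal by site-removal grid gives a support surface of the kind plotted in figure 4, in which an 'island' is a contiguous region of relatively high values.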

Discussion
The distant relationship between malawimonads and all other excavates inferred in most recent phylogenomic-scale analyses demands that the cell architecture of malawimonads be carefully re-evaluated, especially given their importance for understanding deep-level eukaryote evolution (e.g. [21,23]). If our re-examination of malawimonad ultrastructure had shown that the only cytoskeletal similarities between malawimonads and other excavates were those also shared by several other groups of eukaryotes [23], then the tension between morphology and typical phylogenomic results would disappear. Instead, our study does the opposite, extending the known morphological similarity between malawimonads and (other) 'typical excavates'.
The microtubular roots and supporting fibres that are general to 'typical excavates' (R1, splitting R2, singlet root, fibres 'A', 'B', 'I' and 'C'; see [8]) are all present in Gefionella okellyi, as was proposed for Malawimonas jakobiformis [9]. Of these, the B fibre is most significant, since no unambiguous homologue of this structure has been positively identified outside of excavates (though see [41]). The confirmation here that the malawimonad B fibre is striated further supports its homology with the B fibres of other 'typical excavates' [8].
The two other best candidates for cytoskeletal synapomorphies for excavates are (i) the composite fibre, and (ii) the system of vanes on the posterior flagellum [8]. The composite fibre of G. okellyi is the first observed in a malawimonad. It is smaller than in most 'typical excavates', but is position-equivalent, and contains the standard arrangement of striated and dense components [10]. The absence of this fibre from the original description of M. jakobiformis [9] may be because that study focused on the cell's anterior, whereas the composite fibre is located posteriorly.
Malawimonas jakobiformis has vanes on the posterior flagellum, but is unusual in having a single ventral vane only [8,9]. By contrast, the pair of opposed vanes in G. okellyi conforms to the most common arrangement in metamonad 'typical excavates', which is inferred to be the ancestral state for Metamonada, based on mapping characters to molecular phylogenies [42]. Also, our documentation of striations on malawimonad vane lamellae further supports their homology with the lamellae of other 'typical excavates', which are similarly striated [8,43]. The possession of opposed vanes is shared by Malawimonadidae and Metamonada to the exclusion of Jakobida (the only 'typical excavate' group in Discoba), since jakobids have only a single dorsal vane [10].
Otherwise, the ultrastructure of G. okellyi underscores its identity as a malawimonad. The discoidal cristae, striated band and distal fibre connectives between the BBs, sizes of R1 and R2, and epipodium supported by part of R1 are all similar to M. jakobiformis [9]. The G fibre was not observed in M. jakobiformis, though this subtle feature would be easily overlooked. The conspicuous P fibre was also not recorded in M. jakobiformis, however, and provisionally distinguishes Gefionella from Malawimonas.
Meanwhile, our phylogenomic analyses demonstrate the weakness of the common inference that malawimonads are not related to any other excavates. It is well known that systematic errors such as long branch attraction (LBA) artefacts can result when the model of evolution does not sufficiently reflect the actual evolutionary process [44]. In some cases these errors can be overcome using more realistic models of sequence evolution, or by excluding likely sources of phylogenetic 'noise', such as fast-evolving sites or long-branching taxa. On this basis, we suspect that the relationship between malawimonads and Collodictyon recovered in our initial analysis (and in several previous analyses [16,18,20]) represents phylogenetic error. It is only weakly supported in our ML analysis using a site-heterogeneous mixture model (69% UFboot), and support under site-homogeneous substitution models rapidly decreases once several thousand fast-evolving sites are removed (figure 4b). In parallel, site-heterogeneous models support the conventional placement of Metamonada with Discoba only moderately (under ML with the LG+C60+Γ4+F model) or not at all (under Bayesian analysis with the CAT-GTR model), and the initially strong support for Metamonada+Discoba under simpler site-homogeneous models weakens as the noisiest data are excluded (fast-evolving sites in particular: electronic supplementary material, figure S6). The collapse in support for both groupings with exclusion of fast-evolving sites occurred while support for other similar-scale groupings remained very strong (exemplified by Opisthokonta in figure 4b, but equivalent for other clades). This indicates that the dissolution of support for Malawimonadidae+Collodictyon and Metamonada+Discoba is not due to a general loss of deep phylogenetic signal. Instead, we find support for a Malawimonadidae+Metamonada grouping in our '-FS/LB' analyses, where fast-evolving sites and long-branching taxa are both removed (figures 4d and 5).
As pointed out by Derelle et al. [21], a malawimonad+metamonad relationship has also been observed in a few recent phylogenomic analyses, specifically some in which metamonads are represented solely by the shorter-branching species Paratrimastix pyriformis (formerly Trimastix pyriformis) [18,19,30]. Across our various treatments, (i) a malawimonad+metamonad grouping received its strongest support from the slowest-evolving sites, and (ii) we still recovered this grouping with better taxon sampling for malawimonads and short-branched metamonads than was available in previous work. These trends are consistent with the malawimonad+metamonad phylogenetic signal reflecting the true evolutionary history. Conversely, the initial topology, including a Metamonada+Discoba clade, may be affected by LBA. Prior to long-branch removal, Metamonada and Discoba each included some of the longest-branching taxa examined (electronic supplementary material, figure S5), even though the most divergent metamonad taxa (e.g. diplomonads) were excluded a priori. This could have resulted in metamonads being pulled toward discobids and away from malawimonads, the latter being one of the shortest-branching groups of eukaryotes.
A close relationship between malawimonads and metamonads would also be consistent with other non-phylogenomic data. As discussed above, there are considerable morphological similarities between malawimonads and metamonad 'typical excavates'. At the ultrastructural level, they resemble each other more than either resembles any other group of eukaryotes, including other 'typical excavates' (i.e. jakobids), when the new information on flagellar vane organization is taken into account. (Malawimonads and metamonads also share the plesiomorphy of having an anterior R3 root, which is likely absent in all jakobids [43].) Further, some phylogenies inferred for one or a few slowly-evolving nucleus-encoded proteins place malawimonads with at least the shorter-branching metamonads (e.g. Trimastix, Paratrimastix), albeit usually with weak statistical support [11,13,14].
In summary, this study provides additional evidence that malawimonads are 'typical excavates', morphologically speaking, with their greatest similarity being to certain metamonads. Further, it highlights the weakness of the phylogenomic evidence separating malawimonads from all other excavates, and demonstrates a case where a moderately well-supported malawimonad+metamonad grouping can be recovered in selected noise-filtered datasets. Together, the re-examined morphological and phylogenetic evidence imply that the predominant view of the evolutionary relationships among excavates (that malawimonads branch outside of, and probably completely separately from, a robust Metamonada+Discoba clade [3-5]) is extremely insecure. Instead, the proposition that metamonads are more closely related to malawimonads than they are to Discoba is consistent with a greater range of evidence and analyses. Therefore, we caution that an Excavata grouping of Metamonada+Discoba (exclusively) should not be assumed in studies of the evolution of eukaryotes, such as inferring the history of major cellular systems from comparative genome data (e.g. [45-47]). In this view (and contrary to the view with malawimonads branching separately from a Metamonada+Discoba clade), the strong morphological similarity between malawimonads and metamonads is not directly relevant for inferring the cytoskeleton organization in the last common ancestor of eukaryotes, since they are unlikely to branch on opposite sides of the root of eukaryotes [21].
The 'malawimonad+metamonad signal' may or may not reflect an exclusive sister-group relationship between these two taxa. Further testing is needed, especially using high-quality phylogenomic data from other hard-to-place eukaryote lineages (addressed in our ongoing research). In addition, the large disparities in branch lengths among Malawimonadidae, Metamonada and Discoba make it very challenging to resolve their relationships using phylogenomics. Denser and better-quality taxon sampling in phylogenomic datasets would be valuable, especially the addition of shorter-branching lineages of metamonads and discobids, or more malawimonad clades. Recent isolations of novel deep-branching discobids and metamonads [13,14,48] hint that many other important excavate lineages may indeed await discovery. Final resolution of these relationships will probably also require both sophisticated evolutionary models and identifying the most reliable data within the large amounts of sequence information now available.