Proceedings of the Royal Society B: Biological Sciences
You have accessResearch articles

Different phylogenomic methods support monophyly of enigmatic ‘Mesozoa’ (Dicyemida + Orthonectida, Lophotrochozoa)

Marie Drábková

Marie Drábková

Department of Parasitology, University of South Bohemia, České Budějovice 37005, Czech Republic

Laboratory of Molecular Ecology and Evolution, Institute of Parasitology, Biology Centre CAS, České Budějovice 37005, Czech Republic

[email protected]

Contribution: Data curation, Investigation, Methodology, Project administration, Resources, Software, Visualization, Writing – original draft, Writing – review & editing

Google Scholar

Find this author on PubMed

,
Kevin M. Kocot

Kevin M. Kocot

Department of Biological Sciences, The University of Alabama, Campus Box 870344, Tuscaloosa, AL 35487, USA

Contribution: Conceptualization, Data curation, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review & editing

Google Scholar

Find this author on PubMed

,
Kenneth M. Halanych

Kenneth M. Halanych

The Centre for Marine Science, University of North Carolina, Wilmington, 57000 Marvin K. Moss Lane, Wilmington, NC 28409, USA

Contribution: Conceptualization, Writing – review & editing

Google Scholar

Find this author on PubMed

,
Todd H. Oakley

Todd H. Oakley

Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA

Contribution: Conceptualization, Funding acquisition, Writing – review & editing

Google Scholar

Find this author on PubMed

,
Leonid L. Moroz

Leonid L. Moroz

Department of Neuroscience, and the Whitney Laboratory for Marine Bioscience, University of Florida, 9505 Ocean Shore Boulevard, St Augustine, FL 32080, USA

Contribution: Conceptualization, Funding acquisition, Resources, Writing – review & editing

Google Scholar

Find this author on PubMed

,
Johanna T. Cannon

Johanna T. Cannon

Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA

Contribution: Resources, Writing – review & editing

Google Scholar

Find this author on PubMed

,
Armand Kuris

Armand Kuris

Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA

Contribution: Conceptualization

Google Scholar

Find this author on PubMed

,
Ana Elisa Garcia-Vedrenne

Ana Elisa Garcia-Vedrenne

Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA

Contribution: Data curation, Writing – review & editing

Google Scholar

Find this author on PubMed

,
M. Sabrina Pankey

M. Sabrina Pankey

Molecular, Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824, USA

Contribution: Resources

Google Scholar

Find this author on PubMed

,
Emily A. Ellis

Emily A. Ellis

Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA

Contribution: Resources

Google Scholar

Find this author on PubMed

,
Rebecca Varney

Rebecca Varney

Department of Biological Sciences, The University of Alabama, Campus Box 870344, Tuscaloosa, AL 35487, USA

Department of Ecology, Evolution, and Marine Biology, University of California Santa Barbara, Santa Barbara, CA 93106, USA

Contribution: Resources

Google Scholar

Find this author on PubMed

,
Jan Štefka

Jan Štefka

Department of Parasitology, University of South Bohemia, České Budějovice 37005, Czech Republic

Laboratory of Molecular Ecology and Evolution, Institute of Parasitology, Biology Centre CAS, České Budějovice 37005, Czech Republic

Contribution: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing

Google Scholar

Find this author on PubMed

and
Jan Zrzavý

Jan Zrzavý

Department of Zoology, Faculty of Science, University of South Bohemia, České Budějovice 37005, Czech Republic

Contribution: Conceptualization, Funding acquisition, Supervision, Writing – review & editing

Google Scholar

Find this author on PubMed

    Abstract

    Dicyemids and orthonectids were traditionally classified in a group called Mesozoa, but their placement in a single clade has been contested and their position(s) within Metazoa is uncertain. Here, we assembled a comprehensive matrix of Lophotrochozoa (Metazoa) and investigated the position of Dicyemida (= Rhombozoa) and Orthonectida, employing multiple phylogenomic approaches. We sequenced seven new transcriptomes and one draft genome from dicyemids (Dicyema, Dicyemennea) and two transcriptomes from orthonectids (Rhopalura). Using these and published data, we assembled and analysed contamination-filtered datasets with up to 987 genes. Our results recover Mesozoa monophyletic and as a close relative of Platyhelminthes or Gnathifera. Because of the tendency of the long-branch mesozoans to group with other long-branch taxa in our analyses, we explored the impact of approaches purported to help alleviate long-branch attraction (e.g. taxon removal, coalescent inference, gene targeting). None of these were able to break the association of Orthonectida with Dicyemida in the maximum-likelihood trees. Contrastingly, the Bayesian analysis and site-specific frequency model in maximum-likelihood did not recover a monophyletic Mesozoa (but only when using a specific 50 gene matrix). The classic hypothesis on monophyletic Mesozoa is possibly reborn and should be further tested.

    1. Introduction

    Phylogenomic studies have dramatically improved our understanding of deep metazoan phylogeny, however, some key branches remain controversial (e.g. [19]). Incongruence in phylogenomic analyses could be owing to insufficient phylogenetic signal (e.g. owing to closely spaced branching events), limited taxon and/or gene sampling, or systematic error, particularly the long-branch attraction (LBA) artefact ([10]; reviewed by [11,12]).

    The early radiation of Lophotrochozoa (549–535 Ma; [13]), a major clade of protostome metazoans including annelids, molluscs, brachiopods, bryozoans, flatworms and rotifers among others, has been especially challenging to uncover [5,8,9,14,15]. This is probably owing to relatively rapid diversification [16] and fast evolutionary rates in some lineages, leading to long terminal branches in phylogenetic analyses possibly prone to LBA. Differences in branch lengths can be caused by differences in the rate of molecular evolution and generation time, as both are known to be variable for invertebrate phyla [17], especially in parasites [18]. This may aggravate the inherent difficulty of resolving ancient radiations because of short internal branches among deep nodes [19].

    The placement of Orthonectida and Dicyemida, two clades of enigmatic morphologically simple parasites, could be seen as examples of such a controversial case (for images of mesozoans, see electronic supplementary material, figure S1). Orthonectids (ca 25 spp. [20]) are parasites of a variety of marine invertebrates; their adults are free-swimming and produce ciliated larvae that enter the host tissues to form amoeboid trophic syncytia that effectively castrate their hosts [21]. Dicyemida (syn. Rhombozoa, used especially if including still unsequenced Heterocyemida; ca 100 spp. [22]) are symbionts inhabiting the renal organs of benthic cephalopods. Dicyemids were originally placed in Mesozoa by Van Beneden [23] who viewed them as a ‘link’ between Protozoa and Metazoa. By contrast, a hypothesis on mesozoan convergent evolution from more complex animals prevails in modern literature, even though their phylogenetic origins remain uncertain. Some early workers (reviewed in [24]) suggested mesozoans might be simplified platyhelminths, based on similarities in the life cycle and a putative resemblance of the mesozoan body structure to trematode miracidium larvae. Metschnikoff [25] derived orthonectids from annelids such as Dinophilus, which by themselves are secondarily simplified [26]. More recently, Slyusarev & Kristensen [27] found that external cells of Orthonectida are covered by a thin cuticle, which provided some support for the ‘annelid hypothesis’ of orthonectid origin.

    Kozloff [28, p. 193], who worked on both mesozoan groups, viewed the similarities between orthonectids and dicyemids as superficial and stated that ‘we do not know how either group has evolved, but it does seem clear that they are not closely related’. Phylogenetic analysis of complete 18S rRNA sequences recovered orthonectids and dicyemids as two separate clades of early branching animals [29], while analyses of morphology combined with 18S rRNA recovered Mesozoa as a monophyletic group close to the base of Bilateria [30]. Other attempts based on single genes supported a simplified bilaterian scenario for Dicyemida [3133], consistently grouping it within Lophotrochozoa.

    Phylogenomic analyses by Mikhailov et al. [34] (22 909 amino acid positions, 61 taxa), including the first published mesozoan genome (Intoshia, Orthonectida) recovered Orthonectida as a clade of uncertain position within Lophotrochozoa. Lu et al. [35] conducted phylogenomic analyses (58 124 amino acid positions, 29 taxa) with two genomes from both Orthonectida (Intoshia) and Dicyemida (Dicyema) and recovered Mesozoa as a monophyletic group close to gastrotrichs and/or platyhelminths. On the contrary, Schiffer et al. [36] proposed, based on both mitochondrial (2969 amino acid positions, 69 taxa) and nuclear (190 027 amino acid positions, 45 taxa) data, that Mesozoa is not monophyletic: orthonectid Intoshia was recovered as nested within Annelida while the position of Dicyemida (three spp. of Dicyema) within Lophotrochozoa was considered unresolved. Zverkov et al. [15] reported similar results in their analysis of nuclear protein-coding genes (87 610 amino acid positions, 73 taxa including Intoshia and two spp. of Dicyema). Taken together, the phylogenomic analyses to date (and most molecular analyses in general) have converged to the scenario that both mesozoan groups represent secondarily simplified lophotrochozoans, but their phylogenetic position within Lophotrochozoa and whether or not they form a monophyletic group remain unresolved. Difficulties in the placement of these enigmatic marine groups may be owing to their fast rate of evolution, as evidenced by extremely long branches of both mesozoan groups in all analyses.

    Multiple approaches exist to tackle systematic errors in phylogenomics. One straightforward way to treat LBA and test leaf stability is to remove other long branches, thereby decreasing the ‘pull’ of these long branches and compare the resulting position of the taxon in question (e.g. the position of Orthonectida after exclusion of Dicyemida from the matrix, and vice versa [36]). However, if more than two long-branch taxa are present, interpretation of the results becomes difficult. The site-specific models of sequence evolution, such as the CAT + GTR model [37] which is implemented in a Bayesian framework in PhyloBayes [38], and the PMSF model [39], implemented in a maximum-likelihood (ML) framework in IQ-Tree [40], have been purported to reduce LBA by modelling the actual complexity of the data resulting from biological processes. Other approaches thought to help mitigate LBA are based on selecting molecular markers based on properties such as branch-length homogeneity or compositional homogeneity. Balanced sequence composition has also been suggested to be a good predictor of phylogenetic signal (e.g. [41]), and taking steps to reduce compositional heterogeneity was advocated by Nesnidal et al. [42] in order to help overcome systematic artefacts affecting inference of relationships within Lophotrochozoa. Another approach for phylogeny reconstruction that may be useful for genomic datasets containing heterogeneous signal is the reconstruction of the species tree based on coalescent theory. This method has been argued to perform well in cases where heterogeneous single-gene trees are present in the analysed set [43,44].

    Here, we sought to explore the phylogenetic position(s) of Orthonectida and Dicyemida in the context of lophotrochozoan phylogeny, focusing on extensive comparisons of results from different approaches purported to reduce LBA. To this end, we assembled a dataset with significantly increased taxon sampling for Mesozoa (genome/transcriptome data from eight dicyemids and three orthonectids) and diverse representatives of other lophotrochozoan phyla (32 spp.) that we carefully screened to exclude exogenous contamination. We constructed several data matrices from different sets of genes, selecting genes according to different criteria purported to help reduce LBA and other artefacts thought to have impacted earlier studies of lophotrochozoan phylogeny. We performed phylogenetic analyses on these datasets using the following approaches: (i) ML using the best-fitting site-homogeneous model for each gene as implemented in RAxML [45]; (ii) Bayesian inference (BI) using the CAT + GTR empirical profile mixture model to account for site-specific evolutionary rate heterogeneity [46] as implemented in PhyloBayes [38]; (iii) ML inference using the site-specific PMSF profile mixture model [39] as implemented in IQ-Tree [40], enabling us to compare the use of site-specific models in both a ML and Bayesian framework; and (iv) a coalescent approach for phylogenomics as implemented in Astral II [47].

    2. Methods

    (a) Taxon sampling

    We sought to broadly sample the diversity of Lophotrochozoa with an emphasis on Dicyemida and Orthonectida, while avoiding including too many terminals in order to facilitate the computationally intensive analyses (e.g. PhyloBayes analyses with the site-heterogeneous CAT model). Thus, we opted to broadly sample each relevant phylum with the minimum number of taxa (typically 2–4 spp.) necessary to capture the deepest node in that phylum while using only high-quality (e.g. deeply sequenced and contamination-free) data. We constructed a dataset with seven newly sequenced dicyemid transcriptomes, one newly sequenced dicyemid genome, two newly sequenced orthonectid transcriptomes, and representatives of all other lophotrochozoan phyla, including three newly sequenced bryozoan transcriptomes (electronic supplementary material, table S1). To test the historical hypothesis that the mesozoans are closely related to flukes (Trematoda), Schistosoma was included as a representative of Neodermata (Platyhelminthes). Because orthonectids have been recovered as closely related to or even placed within Annelida, representatives of Palaeoannelida (Magelona, Owenia), Chaetopteriformia (Phyllochaetopterus), Sipunculida (Phascolosoma) and Pleistoannelida (Capitella) were included. Representatives of Cnidaria (Nematostella), Xenacoelomorpha (Xenoturbella and Meara), Ambulacraria (Ptychodera and Strongylocentrotus) and Ecdysozoa (Priapulus) were used as outgroups.

    (b) Molecular laboratory techniques

    To produce new transcriptome data for this study, total RNA was extracted from Alcyonidium sp. (Bryozoa), Cristatella mucedo (Bryozoa), Pectinatella magnifica (Bryozoa), two different collections of Rhopalura cf. ophiocomae (Orthonectida; both from ophiurid Amphipholis squamata), and dicyemids Dicyema sp. 1 (from octopus Enteroctopus dofleini), Dicyema sp. 2 (from Octopus bimaculoides), Dicyemida sp. 3 (from octopus Megaleledone setebos), Dicyemida sp. 4 (from Octopus vulgaris), and Dicyemennea brevicephaloides and D. rossiae (both from sepiolid Rossia pacifica). For assembly of dicyemid genome, individuals of Dicyema moschatum from octopus Eledone moschata were used. Dicyemid species were determined by combination of morphological characters (shape of the calotte in vivo, confocal fluorescent microscopy images; electronic supplementary material, figure S1) and host specificity. Specimen collection data, details of sample processing and sequence assembly (sample handling, extraction and cDNA library kits used, sequencing strategy, sequence assembly and annotation) for all newly sequenced taxa are provided in the electronic supplementary material, table S1.

    (c) Contamination filtering

    For the dicyemid genome assembly, only reads not mapping to the Octopus bimaculoides genome (NCBI GCF_001194135.1) were used in downstream analyses in order to remove host contamination (see the electronic supplementary material, S1 for details; e.g. figures S2 and S3 blobplots showing taxonomic assignment of sequences before and after decontamination step).

    In the case of transcriptomes, sequences were compared to the NCBI Protein and Nucleotide databases (accessed March 2017) to check for possible contamination. The result was semi-manually evaluated, checking the similarity scores and taxonomy ID of hits. If the hits clearly originated from contamination (e.g., sequences >95% similar to bacterial sequences, parasitic protists and other nonmetazoans, or cephalopods in the case of the dicyemids), they were removed (for details see electronic supplementary material, S2). Contamination-filtered nucleotide sequences were then translated into amino acids with Transdecoder (http://transdecoder.sf.net; script available in electronic supplementary material, S2).

    (d) Dataset construction

    Translated transcripts for all taxa were searched against 2259 lophotrochozoan-specific core orthologous groups (OGs) as profile hidden Markov models (Lophotrochozoa-Kocot pHMMs [48] in HaMStR 13.2.6 [49]. Briefly, this dataset is based on genes identified to be single-copy in genomes or deeply sequenced transcriptomes from representatives of Annelida, Brachiopoda, Entoprocta, Mollusca, Nemertea, Platyhelminthes, Phoronida, and Rotifera. Sequences matching an OG's pHMM were compared to the proteome of Lottia using blastp. If the Lottia amino acid sequence contributing to the pHMM was the best blastp hit in each of these back-BLASTs, the sequence was then assigned to that OG.

    OGs were carefully filtered for missing data and checked for paralogs. The remaining sequences were aligned with MAFFT (for details see the electronic supplementary material, S3).

    To reduce homoplasy that may be introduced by including distant outgroups, we excluded, for most analyses, all outgroups except the very slowly evolving ecdysozoan Priapulus (matrices with names ending in R have a Reduced outgroup, e.g. COMPR). We assembled several different matrices targeting genes with specific qualities such as missing data, phylogenetic signal, branch-length heterogeneity and compositional heterogeneity (electronic supplementary material, table S2, S4). We used Matrix Reduction (MARE [50]) to exclude OGs with low phylogenetic signal and to reduce missing data (a dataset called ‘MARER’ hereafter). OGs were sorted by branch length heterogeneity (‘CxLBR’ hereafter) and by compositional heterogeneity (‘CxHR’ hereinafter) to select the best 50 and 100 OGs according to each criterion (i.e. C50LBR, C100LBR, C50HR, C100HR). We used TreSpEx 1.0 [51] and the single-gene ML trees described above to assess branch length heterogeneity of each OG and BaCoCA [52] to assess the compositional heterogeneity of each OG according to the relative composition frequency variability (RCFV) metric. We also assembled a matrix of the best 50 OGs of the MARE-reduced dataset (MARER) according to compositional heterogeneity (M50HR). Both C50LBR and MARER datasets were used for our numerous taxon removal experiments (see below) and Bayesian inference analyses using the computationally demanding CAT + GTR model. For an overview of matrices produced in this study, corresponding partition data, and a flowchart explaining data matrix production see the electronic supplementary material, S4.

    (e) Phylogenomic analyses

    (i) Maximum-likelihood analyses

    Phylogenetic analyses of all datasets were conducted using ML with RAxML 8.2.4 [45]. Matrices were partitioned by gene (=orthologous group or OG) and the AUTO option was used to select the best-fitting model for each partition. For each ML analysis, the tree with the best likelihood score after 10 random addition sequence replicates was retained and topological robustness (i.e. nodal support) was assessed by the optimal number of rapid bootstraps. Furthermore, some matrices were analysed with posterior mean site frequency (PMSF) model in ML (LG + C60 + G + F [39]) as implemented in IQ-Tree 1.5.3 [40], as an approximation of the CAT site-specific model [37,53].

    (ii) Bayesian inference

    Bayesian analyses (BI) were performed in PhyloBayes MPI 1.5 or 1.7 [38] using the CAT + GTR model [37,53]. Because of the computationally intensive nature of PhyloBayes analyses, only datasets with a highly reduced number of genes sampled were analysed (see Results). Each analysis was run simultaneously in four chains. Analyses were periodically checked for progress and convergence among chains was assessed with the bpcomp program from the PhyloBayes package after discarding a burn-in of 1500 cycles. Convergence was assumed if the maxdiff dropped below 0.3, as advocated in the PhyloBayes manual. Chains were run until convergence or for at least 12 000 generations, but often for longer if resources were available (for details see the electronic supplementary material, S8).

    (iii) Comparison of maximum likelihood and Baysian inference tree-building methods

    In order to inspect differences between tree-building methods without the confounding factor of the long-branch mesozoans, we analysed the C50LBR matrix (the only BI analysis that successfully converged) by RAxML, IQ-Tree (PMSF model), and BI (CAT model). This data matrix was also modified by the exclusion of Dicyemida, Orthonectida, or both.

    (f) Taxon removal experiments

    To test the robustness of relationships within Lophotrochozoa, we carried out a series of taxon removal experiments based on the MARER dataset. Topological effects of the presence/absence of several clades of interest or their combinations (Platyhelminthes, Gastrotricha, Rouphozoa, Gnathifera, Gnathifera + Platyhelminthes, Entoprocta + Cycliophora, Bryozoa and Annelida, i.e. all possible sister groups of Mesozoa and/or clades containing long-branch taxa; for details see the electronic supplementary material, S5) were tested in ML (IQ-Tree).

    (g) Gene signal analysis

    We examined conflicting signal among genes by the difference in the log-likelihood score (difference in gene likelihood score (ΔGLS)) between two competing tree topologies (as in [54,55]). The two topologies correspond to the unconstrained tree resulting from ML analysis (T1; including monophyletic Mesozoa) and the constrained tree (T2) where Orthonectida were forced to group with Annelida. ML trees were estimated from the 202-gene (MARER) dataset under an automatically selected model per partition in RAxML. Site likelihood scores for contrasting gene topologies needed for ΔGLS computation were estimated under the LG model in RAxML. We also applied the approximately unbiased (AU) test in the software package CONSEL v. 0.20. The AU test was conducted using the multi-scale bootstrap technique based on the site-wise log-likelihood scores.

    (h) Coalescent approach

    Coalescent phylogenetic analysis of single-gene trees generated in RAxML (987 gene trees) was run in Astral-II 4.11.1 [47] with default settings. Supports were computed by gene resampling as suggested by Simmons et al. [56] in 1000 replicates. The presented majority rule consensus tree was computed by Phyutility [57].

    (i) Tree plotting

    All trees were plotted with ETE Toolkit v. 3 [58] and FigTree (http://tree.bio.ed.ac.uk/software/figtree/) and adjusted in Inkscape (https://inkscape.org/nl/) or Vectornator (https://www.vectornator.io). All scripts used for phylogenetic analyses and for plotting trees are provided in the electronic supplementary material, S6.

    3. Results

    (a) Matrix construction

    Our bioinformatic pipeline for orthologue clustering with strict paralogy filtering and decontamination resulted in a matrix with 49 taxa and 987 OGs that was 171 791 amino acid positions long, with 31% missing data (COMP). This set of OGs was further reduced to test the effects of analysing subsets of the best OGs in terms of phylogenetic signal, resulting in a MARE-reduced matrix (MARER) of 202 OGs, as well as branch-length heterogeneity and compositional heterogeneity, resulting in matrices composed of the best 100, and 50 OGs according to these criteria (C100LBR, C50LBR, C100HR and C50HR matrices; see the electronic supplementary material, S4).

    (b) Comparison of tree reconstruction approaches

    (i) Maximum likelihood

    ML analysis of the complete matrix (COMP) in RAxML, using the best-fitting model for each gene, recovered Mesozoa monophyletic with maximal support and sister to Gnathostomulida with strong bootstrap support (BS = 98; electronic supplementary material, figure S4 in the electronic supplementary material, S7). Overall, Lophotrochozoa was split into two subclades: Trochozoa (Annelida, Brachiopoda, Phoronida, Mollusca and Nemertea) and a clade we refer provisionally to as ‘Platyzoa s.l.’, which includes platyzoan as well as polyzoan phyla. Polyzoa (Bryozoa + Entoprocta–Cycliophora) and Gnathifera (Gnathostomulida + Micrognathozoa–Rotifera) were not recovered monophyletic. Removal of distant outgroup taxa in favour of the short-branch ecdysozoan Priapulus did not have any effect on the resulting branching order and support values were comparable (electronic supplementary material, figure S5 in S7). Thus, all outgroup taxa except Priapulus were excluded from subsequent analyses to decrease computational demands.

    To examine the effect of using a site-heterogeneous model in a ML framework, we analysed the COMP matrix with the posterior mean site frequency model (PMSF [39]) with 60 rate categories as implemented in IQ-Tree 1.5.3 [40]. The resulting tree topology (figure 1) slightly differed from that of the RAxML analysis: when the site-specific PMSF model was used, monophyletic Mesozoa was again maximally supported but recovered sister to Platyhelminthes (BS = 90), with that clade sister to monophyletic Gnathifera (BS = 99). For comparison of site-specific models in ML and BI we also analysed a smaller matrix (C50LBR that was used also for BI analyses) with the PMSF model. The resulting tree recovered Mesozoa as two separate clades in Platyzoa s.l.: Orhonectida were sister to Platyhelminthes and Dicyemida sister to Gnathifera (albeit both with low support: BS = 49 and 54, respectively; electronic supplementary material, figure S6 in S7).

    Figure 1.

    Figure 1. Tree representing mesozoan position in Lophotrochozoa based on 987 genes, computed in maximum-likelihood framework with site-specific model with 60 categories (IQ-Tree c60 PMSF COMP), only bootstrap supports lower than 100 showed. (Online version in colour.)

    ML analysis of the MARE-reduced matrix (MARER; electronic supplementary material, figure S7 in S7) also recovered Mesozoa as a maximally supported monophyletic group within paraphyletic Gnathifera, as the sister group of Gnathostomulida (BS = 92). The high similarity of the MARER and COMPR topologies suggests that the reduced matrix represents well the whole dataset and could substitute it where the lower number of genes is essential to carry out the analysis.

    In an attempt to reduce potential sources of systematic error, we assembled datasets based on reduced subsets of the best 100 genes with the lowest branch heterogeneity (C100LBR; electronic supplementary material, figure S8 in S7) and the lowest compositional heterogeneity (C100HR; electronic supplementary material, figure S9 in S7), and then compare RAxML trees derived from both matrices. Results of both analyses recovered Mesozoa monophyletic within Platyzoa, either as a sister group of Platyhelminthes, with monophyletic Gnathifera sister to the Mesozoa–Platyhelminthes clade (C100LBR), or as a sister group of Gnathostomulida, and Platyhelminthes and Micrognathozoa–Rotifera as successive sister groups to the Mesozoa–Gnathostomulida clade (C100HR).

    (ii) Bayesian inference

    As the site-heterogeneous CAT + GTR model has been suggested to be more robust against long-branch attraction artefacts than site-homogeneous models [37] (but see [59]), we performed BI analyses with this model in PhyloBayes. Matrices with reduced numbers of genes (C50LBR, C100LBR, C100HR, and M50HR) and just Priapulus as the outgroup were chosen for BI, owing to the computational demands of the complex CAT + GTR model. In all of the analyses of C50LBR, the four chains converged on one solution, according to the PhyloBayes bpcomp maxdiff statistic [38], but the other analyses failed to converge after greater than 30 000 generations (see the electronic supplementary material S8 for details and resulting trees). The BI analysis of the C50LBR dataset (electronic supplementary material, figure S10 in S7) recovered Mesozoa polyphyletic but within monophyletic Platyzoa, with Dicyemida in a poorly supported polytomy with Gnathostomulida and Micrognathozoa–Rotifera, and Orthonectida sister to Platyhelminthes (with posterior probability (PP) = 0.98, a moderate support by Bayesian standards).

    (iii) Comparison of maximum likelihood and Baysian inference tree-building methods

    In order to inspect differences between tree-building methods, we analysed the C50LBR matrix (the only BI analysis that successfully converged) with or without Dicyemida and Orthonectida, respectively, by RAxML (or with similar setting in IQ-Tree with automatically selected gene models), IQ-Tree (PMSF model) and BI (CAT model in PhyloBayes). Both ML trees with excluded mesozoans were identical (electronic supplementary material, figures S11 and S12 in S7): Platyhelminthes and Gnathifera formed a clade, and Entoprocta–Cycliophora, Bryozoa–Gastrotricha, Nemertea, Annelida–Brachiozoa and Mollusca were successive sister groups to it (i.e. there were monophyletic Platyzoa s.l. within paraphyletic Trochozoa). On the contrary, in the BI tree excl. Mesozoa (electronic supplementary material, figure S13 in S7) there were monophyletic Trochozoa and Platyzoa s.l., both weakly supported, and within Platyzoa s.l., a very weakly supported basalmost position of Gnathostomulida (i.e. polyphyletic Gnathifera). When only Orthonectida were included, they were recovered as a sister group of Platyhelminthes in all trees (electronic supplementary material, figures S14–S16, in S7). Dicyemida alone were recovered as a sister group of Gnathifera as a whole (IQ-Tree, electronic supplementary material, figure S17 in S7), or grouped with Gnathostomulida within monophyletic Gnathifera (RAxML, electronic supplementary material, figure S18 in S7), or in an unresolved trichotomy with Gnathostomulida and Micrognathozoa–Rotifera (BI; electronic supplementary material, figure S19 in S7). When both mesozoan groups were added to the analyses, they became diphyletic, both placed at the same positions as individual mesozoan clades alone (Dicyemida next to Gnathifera, or in a polytomy with their subclades, Orthonectida sister to Platyhelminthes; electronic supplementary material, figures S6 and S10 in S7). Only in the RAxML tree the dicyemids were apparently attracted toward orthonectids to form monophyletic Mesozoa sister to Platyhelminthes (electronic supplementary material, figure S20 in S7).

    (iv) Coalescent approach

    The coalescent tree (electronic supplementary material, figure S21 in S7) recovered mesozoans monophyletic with maximal support. They were nested in polytomy with Gnathostomulida, Rotifera–Micrognathozoa and Platyhelminthes with relatively high support (BS = 97). Overall, the coalescent tree showed a similar branching pattern as ML and BI analyses (Lophotrochozoa split into Platyzoa s.l. and Trochozoa). In this case, however, Bryozoa was not associated with Platyzoa s.l. as in ML and BI trees, but was recovered as an early branching clade of Trochozoa.

    For comparison of topological differences in trees resulting from ML, BI and coalescent methods (matrices COMP, MARER, C50LBR) see the electronic supplementary material, table S3.

    (c) Taxon removal experiments

    In order to examine the effect of taxon sampling on the position of long-branch clades, we performed a series of taxon removal experiments, based on the 202-gene MARER dataset, analysed by both RAxML and IQ-Tree (electronic supplementary material, figures S22–S40 in S5). The IQ-Tree-based tree with both mesozoan groups excluded was split to Trochozoa and Platyzoa s.l.; the latter group included Platyhelminthes and Gnathifera as sister groups, followed by Gastrotricha, Entoprocta–Cycliophora and Bryozoa. Orthonectida alone groups with Gnathostomulida (within Gnathifera), Dicyemida alone with Platyhelminthes (or as a sister group of Catenulida within Platyhelminthes in RAxML tree). When both mesozoan clades were present they formed monophyletic groups, sister to Gnathostomulida (i.e. at the same position as Orthonectida alone). However, all deeper nodes were weakly supported. Subsequent exclusion of Platyhelminthes, Gastrotricha, Rouphozoa (Gastrotricha + Platyhelminthes), Gnathifera, Gnathifera + Platyhelminthes, Entoprocta–Cycliophora, Bryozoa and Annelida had no effects on the tree topology. Only when Gnathostomulida were excluded, Mesozoa did not stay as a sister group of the rest of Gnathifera but transferred towards Platyhelminthes. On the contrary, removing the Micrognathozoa and Rotifera had no effect and Mesozoa remained close to the Gnatostomulida. Importantly, in all experiments Mesozoa remained a clade within Platyzoa s.l. Only when all Platyzoa and Dicyemida were excluded (merely 17 spp. of Trochozoa and Orthonectida plus Priapulus remained), Orthonectida was recovered as nested within Annelida, albeit with low support (electronic supplementary material, figure S37 in S5).

    (d) Gene-conflict analyses

    The likelihood score of the constrained topology T2 (Orthonectida grouping with Annelida), compared by AU test, was significantly worse than that of the unconstrained topology with monophyletic Mesozoa (T1) in the case of the MARER matrix. Out of 202 genes in the data matrix, 148 support topology T1 and only 54 support the constrained topology T2 based on ΔGLS (electronic supplementary material, figure S41 in S9). However, the RAxML tree based on genes supporting T1 includes, as usually, monophyletic Mesozoa within Gnathifera (as a sister group of Gnathostomulida), but the tree based on T2-supporting genes includes Dicyemida as a sister group of Catenulida within Platyhelminthes, Orthonectida sister to the Platyhelminthes–Dicyemida clade, followed by Gastrotricha, and monophyletic Gnathifera are sister to the whole Rouphozoa (Gastrotricha + Platyhelminthes) clade (the latter including both mesozoan groups). The T1-supporting genes are insignificantly longer than T2-compatible ones (mean 159 > 139); also taxonomic representativeness is comparable in both groups of genes (mean 36.5 : 34.5) but compositional heterogeneity is slightly higher in T1-supporting genes (T1 0.163, T2 0.149).

    4. Discussion

    (a) Phylogenetic relationships

    Consistent with previous studies [29,35,36], all our results confirm that both Orthonectida and Dicyemida are secondarily morphologically simplified bilaterians, not a ‘primitive missing link’ between Protozoa and Metazoa as occasionally suggested earlier [23,60]. Both Orthonectida and Dicyemida were recovered as monophyletic groups belonging to Lophotrochozoa in all analyses performed in this study. In general, our results represent a ‘traditional view’ on lophotrochozoan relationships [5] with small- and simple-bodied phyla forming a clade (Platyzoa s.l.) apart from Trochozoa; the newly sequenced transcriptomes supported isolated position of Bryozoa next to other Platyzoa s.l. rather than monophyletic Lophophorata within Trochozoa [61]. Importantly, this pattern was not affected by the presence/absence of mesozoans.

    Dicyemids fundamentally differ from orthonectids by having no cuticle, nervous system, muscles, or true tissues at all. Furthermore, life-histories of the two groups are extremely derived yet quite different (host-parasite relationships, host spectrum, position of parasitic stage in the life cycle, etc.). The only potentially shared ultrastructural character, a unique type of ciliary rooting [62], has been challenged by the discovery of more diverse ciliary rootlets in Orthonectida [27]. Consequently, the monophyletic phylum Mesozoa has recently been doubted or rejected by most authors. However, the genomic data so far provided equivocal results (monophyly: [35]; polyphyly: [15,36]). Nevertheless, the old hypothesis about monophyly of Mesozoa seems to be (quite surprisingly) supported by most of our phylogenomic analyses. All ML and coalescent analyses of the complete 987-gene (COMP) and 202-gene (MARER) datasets (BI analyses did not converge) recovered Mesozoa monophyletic, usually with strong support, and close to Gnathifera or Gnathostomulida (COMP: RAxML, MARER: RAxML, IQ-Tree, coalescent), to Platyhelminthes (COMP: IQ-Tree), or in the polytomy with both (COMP: coalescent; figure 2 summarizing the most probable position of Mesozoa in Lophotrochozoa). The same applied also to RAxML analysis of the C50LBR dataset: Mesozoa were found monophyletic and sister to Platyhelminthes. Both gene subsampling strategies (branch length homogeneity and compositional homogeneity), used in an attempt to alleviate LBA, were not able to disintegrate monophyletic Mesozoa or to reduce its strong support.

    Figure 2.

    Figure 2. Overview of possible scenarios of mesozoan evolutionary history. Animal silhouette images are from Phylopic.com. (Online version in colour.)

    Contrarily, the IQ-Tree and BI analyses (with site-specific models) of the C50LBR data matrix recovered Mesozoa split into two separate clades. Dicyemida was found either within Gnathifera in a poorly supported polytomy with Gnathostomulida and Micrognathozoa–Rotifera (BI, with only very low support pp = 0.62), or as a sister group of Gnathifera (IQ-Tree, again with very low support BS = 54). The position of Orthonectida as a sister group of Platyhelminthes was relatively well supported in the BI tree (PP = 0.98) but not in the IQ-Tree one (BS = 49).

    The possibility that Mesozoa are two separate groups attracted to each other owing to long-branch artefact was tested by including just one mesozoan group in absence of the other. Both Dicyemida and Orthonectida consistently grouped within the Gnathifera–Platyhelminthes clade. The taxon-removal analyses suggest that Gnathostomulida (but not Micrognathozoa and Rotifera) is the only attractor strong enough to change position of mesozoans from Platyhelminthes to Gnathifera. Thus, we can assume that gnathiferan or even within-gnathiferan affinities of Mesozoa are probably artificial, and the alternative, i.e. close relationship between Mesozoa (or at least Dicyemida; see below) and Platyhelminthes, is more probable. The hypothesis considering Mesozoa as extremely simplified parasitic flatworms (namely, Neodermata, the group including flukes and tapeworms) has actually been considered for a long time (see [63]), based on putative similarities between trematode miracidia and mesozoan infective larvae and more recently on the ciliary rootlet system putatively shared by Neodermata and Mesozoa (but see above). On the contrary, Mesozoa has the orthodox mitochondrial genetic code rather than the apomorphic unorthodox code characteristic for Rhabditophora (that includes Neodermata), which suggests the idea that they might be derived flatworms is unlikely [64]. Even if Dicyemida were occasionally recovered as platyhelminths here (T2 tree in ΔGLS analysis or IQ-Tree analysis of the MARER dataset with orthonectids excluded), they were a sister group of Catenulida, not an in-group of Rhabditophora. Hence, parasitism of the two mesozoan groups has certainly evolved independently from the parasitism of neodermatan platyhelminths, even if they share evolutionary origins with flatworms.

    Whereas phylogenetic position of dicyemids alone was usually compatible with the position of the whole Mesozoa, higher instability of position of Orthonectida was evident in published studies. Mikhailov et al. [34] sequenced the genome of Intoshia linei and recovered Orthonectida as a sister group of Platyhelminthes in Bayesian analysis with the CAT + GTR model as well as in ML analysis, yet the use of CAT model with a flat rate in the Bayesian analysis resulted in Orthonectida sister to Annelida. With the addition of a dicyemid genome, Lu et al. [35] recovered Orthonectida (together with Dicyemida in monophyletic Mesozoa) as close relatives of Gastrotricha and/or Platyhelminthes, while Schiffer et al. [36] and Zverkov et al. [15] found them as early branching annelids, unrelated to Dicyemida, and Bondarenko et al. [65] even as a sister group of leeches (Annelida: Pleistoannelida: Clitellata).

    The annelid affinities of the Orthonectida, based on morphology, have been proposed previously, starting with Metschnikoff [25]. In more recent studies, it has been hypothesized on the basis of annelid-like microvilli-formed cuticle in the free-living stages of Orthonectida [27]. Orthonectids (contra dicyemids) also have a true epidermis (but their epidermal basal lamina is reduced or absent). A possible trace of an ancestral segmented body plan is seen as the series of regularly spaced circular muscles along the anterior-posterior axis of Orthonectida (along with repeated bands of cilia and paired serotonin-like immunoreactive nerve cells [66]). However, orthonectids use cilia for locomotion, and their muscle system is reduced to serve almost exclusively for copulation and hatching of larvae. The annelid muscular pattern consists of two muscle layers, external circular/transverse and internal longitudinal. In Orthonectida, the circular muscles are situated inversely, inside the longitudinal muscles. The only annelid with external longitudinal muscles sunken into the epidermis is non-segmented meiofaunal Lobatocerebrum [67], a member of Dinophiliformia, one of the basal annelid clades [26]. The possible relationships of Orthonectida and basal annelids are compatible with the results published by Schiffer et al. [36] who found in the Orthonectida a short stretch of mitochondrial genes (nad1, nad6, cob) in the same order as in Owenia but not as in the pleistoannelids.

    In our study, the scenario of Orthonectida related or belonging to Annelida was recovered only in one unconverged BI analysis (C100HR matrix) and in a taxon removal experiment where no dicyemid and platyzoan taxa were present, both results being extremely problematic to interpret. The AU test showed that topology with orthonectids related to annelids is significantly worse based on our data than topology with monophyletic Mesozoa within Platyzoa. Furthermore, ΔGLS analysis revealed that monophyly of Mesozoa is supported by more genes than its polyphyly with annelid affinities of Orthonectida (148 : 54). Moreover, even in the genes compatible with the annelid hypothesis, the ‘annelid signal’ is minor, and the resulting tree included within-platyzoan Mesozoa, albeit diphyletic.

    (b) Different methods, different views

    Two main approaches are typically used in analysing phylogenomic matrices, BI and ML, which have a wide, but not fully overlapping range of available models of sequence evolution. Site-specific frequency models (or empirical profile mixture models) based on modelling of each alignment site individually are considered to better model sequence evolution and therefore be more reliable in obtaining correct phylogeny in difficult cases [46,68]. Until recently, the site-specific frequency model (CAT [53]) was available only in the Bayesian framework, but an approximation of such a model (posterior mean site frequency model [39]) has recently been implemented in IQ-Tree in the ML framework.

    In the case of Mesozoa, almost all maximum-likelihood-based methods, regardless of the number of genes used, recovered a similar topology; contrarily, Bayesian analyses tended to recover a different tree. Based on the 50-gene matrix (C50LBR, the only matrix that converged in BI), both BI and IQ-Tree analyses recovered diphyletic Mesozoa (Dicyemida close to Gnathifera, Orthonectida to Platyhelminthes), while in the RAxML tree Mesozoa is monophyletic and sister to Platyhelminthes. It could suggest that the site-specific model is a key for recovery of this specific topology in the 50 gene matrix. However, when the same model was used in ML on bigger matrices, it again consistently showed Mesozoa as monophyletic.

    Similar inconsistency was recovered by Zverkov et al. [15]. They presented BI phylogeny with polyphyletic Mesozoa and paraphyletic Platyzoa (basal Gnathifera, followed by the Dicyemida–Entoprocta–Cycliophora–Gastrotricha–Platyhelminthes clade, the latter sister to Trochozoa incl. Bryozoa and Orthonectida) but they recovered monophyletic Mesozoa within monophyletic Platyzoa in RAxML tree (their electronic supplementary material, figure S4) and monophyletic Mesozoa as a sister group of Rouphozoa (Gastrotricha + Platyhelminthes) in IQ-Tree topology (electronic supplementary material, figure S5). Both Zverkov et al.'s and the present analyses suggest a consistent conflict between ML and BI topologies. Surprisingly, ML methods with both classic and site-specific PMSF models showed similar scenarios.

    The inference of relationships among lophotrochozoan clades is thought to be hindered by the effect of LBA owing to (sometimes dramatic) differences in molecular evolutionary rate, the application of inaccurate models of sequence evolution, and the inherent difficulty of resolving ancient radiations because of short stem branches in deep nodes [19]. Using genes with specific characteristics (branch length homogeneity, compositional homogeneity) has been suggested to be more suited for phylogenetic inference of taxa with uneven branch lengths [41,69]. However, targeting genes with these specific qualities did not have a clear effect in the case of mesozoan phylogeny, similarly as in an earlier phylogenomic analysis of Lophotrochozoa [48]. Also, in our case, the phylogenetic position of the long-branch mesozoan groups was not affected by taxon-removal experiments. This suggests that either LBA is not the main player causing significant errors in the case of lophotrochozoan and mesozoan phylogeny, or that LBA in the case of Mesozoa is too strong to be overcome by the currently used methods.

    Data accessibility

    Electronic supplementary material can be found online [70].

    Authors' contributions

    M.D.: data curation, investigation, methodology, project administration, resources, software, visualization, writing—original draft, writing—review and editing; K.M.K.: conceptualization, data curation, funding acquisition, methodology, project administration, resources, supervision, writing—review and editing; K.M.H.: conceptualization, writing—review and editing; T.H.O.: conceptualization, funding acquisition, writing—review and editing; A.E.G.-V.: data curation, writing—review and editing; M.S.P.: resources; R.V.: resources; L.L.M.: conceptualization, funding acquisition, resources, writing—review and editing; J.T.C.: resources, writing—review and editing; A.K.: conceptualization; E.A.E.: resources; J.Š.: conceptualization, funding acquisition, project administration, resources, supervision, writing—review and editing; J.Z.: conceptualization, funding acquisition, supervision, writing—review and editing.

    All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    We declare we have no competing interests.

    Funding

    This work was supported by the Czech Science Foundation (GACR grant nos. 15-08717S to J.Z. and M.D., and 19-28399X to J.Š.) and, in part, by the United States National Science Foundation (grant nos. 1146575, 1557923, 1548121 and 1645219 to L.L.M. and 1846174 to K.M.K.).

    Acknowledgements

    We would like to thank colleagues and institutions who let us use their facilities for sample processing, namely Graziano Fiorito and Giovanna Ponte (Stazione Zoologica Anton Dohrn, Naples, Italy); Andrea De Lucia, Andrea Camedda and Stefano Marra (Institute for Coastal Marine Environment, CNR, Oristano, Sardinia, Italy) and we acknowledge the Laboratory of Microscopy and Histology of the Biology Centre CAS for technical help with the guiding through immunohistochemical visualization and for allowing the use of their facilities. We thank Deb Crocker and Robert Griffin for support of the University of Alabama High-Performance Computing cluster (UAHPC). Computational resources were also supplied by the project ‘e-Infrastruktura CZ’ (e-INFRA LM2018140) provided within the programme Projects of Large Research, Development and Innovations Infrastructures, as well as the National Science Foundation (CNS-1725797) administered by UCSB's Center for Scientific Computing (CSC).

    Footnotes

    Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6070258.

    Published by the Royal Society. All rights reserved.