Unexpected cryptic species among streptophyte algae most distant to land plants

Streptophytes are one of the major groups of the green lineage (Chloroplastida or Viridiplantae). During one billion years of evolution, streptophytes have radiated into an astounding diversity of uni- and multicellular green algae as well as land plants. Most divergent from land plants is a clade formed by Mesostigmatophyceae, Spirotaenia spp. and Chlorokybophyceae. All three lineages are species-poor and the Chlorokybophyceae consist of a single described species, Chlorokybus atmophyticus. In this study, we used phylogenomic analyses to shed light on the diversity within Chlorokybus using a sampling of isolates across its known distribution. We uncovered a consistent deep genetic structure within the Chlorokybus isolates, which prompted us to formally extend the Chlorokybophyceae by describing four new species. Gene expression differences among Chlorokybus species suggest a degree of constitutive variability that might influence their response to environmental factors. Failure to account for this diversity can hamper comparative genomic studies aiming to understand the evolution of stress response across streptophytes. Our data highlight that future studies on the evolution of plant form and function can tap into an unknown diversity at key deep branches of the streptophytes.

II, 0000-0002-3628-1137; TD, 0000-0002-1957-0076; TP, 0000-0002-7858-0434; JMRF-J, 0000-0002-5269-8725; MJ, 0000-0002-2930-9226; JdV, 0000-0003-3507-5195

Background
Green algae and land plants (Chloroplastida or Viridiplantae) consist of three major lineages: the recently pinpointed Prasinodermophyta [1], Chlorophyta and Streptophyta [2]. Streptophyta are about a billion years old [3,4] and encompass the main constituents of the land flora, the Embryophyta (land plants). In addition, Streptophyta include the algal relatives of land plants, known as streptophyte algae. In the past few years, the phylogenetic backbone of the green lineage has been brushed up. This was largely thanks to both an increased effort in sequencing streptophyte algae [5][6][7][8][9][10][11][12][13] and the use of these data in phylogenomic analyses to infer a robust green tree of life [2,14,15]. The new phylogenetic framework marked a milestone; it clarified the phylogenetic relationships among land plants and their streptophyte algal relatives. Within streptophytes, the position of Zygnematophyceae as closest relatives to land plants made quite a splash. However, equally important was the recovery of Mesostigmatophyceae, Spirotaenia spp. [2] and Chlorokybophyceae as sister to all other Streptophyta [16]. Both Chlorokybophyceae and Mesostigmatophyceae are thought to encompass, respectively, one or few extant species. The apparent low diversity in these key lineages complicates macroevolutionary studies that aim to reconstruct the early evolution of key traits in the streptophyte ancestor. Recent genomic and phylogenomic investigations have honed in on freshwater and terrestrial streptophyte algae because they provide important insights into the origin of land plants and the evolution of response mechanisms to terrestrial stressors [2,5,7,10,12,13,17].
Here, we investigate the diversity within the Chlorokybophyceae using a phylotranscriptomic approach with broad sampling of isolates across its known distribution (Eurasia, Central and South America). We show that the Chlorokybophyceae consist of a cryptic species complex of at least five extant members.

Results and discussion
(a) Chlorokybophyceae is an oligotypic class
Chlorokybophyceae is thought to be a monotypic class with a single described species, Chlorokybus atmophyticus Geitler 1942. Chlorokybus is a subaerial alga inhabiting soil and rock surfaces and cracks [18][19][20][21]; it has been isolated from Europe and Central and South America, although it is thought to have a cosmopolitan distribution, despite being rare (electronic supplementary material, 'Portrait and history of Chlorokybus'). To further explore the distribution and diversity of Chlorokybus, we searched four large soil environmental sequencing datasets (Neotropical forest, Swiss Alps, meadow and agricultural soils from the UK and Tibet, and a set of globally distributed soils; approximately 128 million reads in total). Only a single amplicon sequence variant (ASV) of Chlorokybus was obtained, which was composed of 32 reads in total (less than 0.01% abundance; electronic supplementary material, table S1). This ASV originated from a high-altitude Swiss Alpine soil sample [22]. Phylogenetic analyses confirmed the identity of this ASV as Chlorokybus, but its precise phylogenetic position could not be determined because the SSU V4 region has limited phylogenetic signal [23] (electronic supplementary material, figure S1). None of the primer sets used in the above studies were biased against Chlorokybus and DNA extraction methods are unlikely to be so, but the lack of rocky outcrop samples in the above studies could have exacerbated the reported low abundance. Currently, 11 strains of Chlorokybus are available in public culture collections; none of them was isolated from the type locality, and therefore no authentic strain is available (electronic supplementary material, table S2). We performed a phylogenetic analysis including all available Chlorokybus strains with two commonly used nuclear markers (SSU and ITS rDNA). This phylogeny suggested a deep genetic structure within Chlorokybus (figure 1a).
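As a rough check on how rare the Chlorokybus ASV is, the read counts quoted above can be turned into a relative abundance. The sketch below assumes the denominator is the pooled total of approximately 128 million reads; the percentage reported in the text may instead refer to a per-sample or per-dataset denominator, which the text does not specify. The function name is hypothetical.

```python
def relative_abundance_percent(target_reads: int, total_reads: int) -> float:
    """Return target_reads as a percentage of total_reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * target_reads / total_reads


asv_reads = 32              # reads assigned to the single Chlorokybus ASV (from the text)
pooled_reads = 128_000_000  # "approximately 128 million reads" (rounded; pooling is an assumption)

pct = relative_abundance_percent(asv_reads, pooled_reads)
print(f"{pct:.6f}% of pooled reads")
```

Under this pooled-denominator assumption the ASV is several orders of magnitude below the 0.01% threshold mentioned in the text.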
Extensive observations under the light microscope revealed no obvious morphological differences among the studied isolates, despite marked genetic divergences: all studied Chlorokybus isolates form sarcinoid, cubical packets of two to eight cells with a gelatinous matrix; cells are spherical or broadly ellipsoidal and contain a parietal, slightly lobed chloroplast with two types of pyrenoids (figures 1 and 2; electronic supplementary material, figure S3; full description is provided below). The life cycle is haploid and was studied by Rieth [21] (figure 2). Since the phenotype gave no hints as to the differences among the Chlorokybus strains, we gathered more sequence data.

(b) A phylotranscriptomic framework for Chlorokybus
Using the Illumina NovaSeq 6000 platform, we generated 224 million paired-end reads (greater than 47 Gbp of raw sequence information) on four isolates of Chlorokybus from across its known distribution range. Combining these data with published genomic and transcriptomic information from other algae and land plants (electronic supplementary material, table S3), we inferred a robust phylogenomic tree based on 529 densely sampled loci (17% missing data). The maximum-likelihood tree, which was inferred with IQ-TREE under the LG + F + I + Γ4 + C60 mixture model, unambiguously recapitulated the accepted phylogeny of the green lineage (Chloroplastida), including the position of Chlorokybus (Chlorokybophyceae), Mesostigma (Mesostigmatophyceae) and Spirotaenia minuta as the sister group to all other streptophytes (electronic supplementary material, figure S2). Our phylotranscriptomic trees show unmistakable deep genetic structure within Chlorokybus, represented here by eight isolates. The genetic distances among Chlorokybus isolates are often more than twice as large as those recovered among three different species of Arabidopsis (figure 3a). The inferred patristic (maximum-likelihood) distances among Chlorokybus species are between 0.0254 and 0.0874 substitutions per site (p-uncorrected distances: 0.0245-0.0730), whereas the distances among the three Arabidopsis species are between 0.0149 and 0.0346 (p-uncorrected distances: 0.0147-0.0332) (table 1). A Bayesian relaxed molecular clock analysis calibrated with eight fossils (uniform priors) found that divergences within Chlorokybus could be as old as 76 Ma (95% HPD interval: 54-102 Ma) and that the divergence between the two closest isolates described here as species (C. atmophyticus and C. melkonianii sp. nov., see below) was 24 Ma (95% HPD 15-34 Ma) (electronic supplementary material, figure S4).
The use of more informative prior distributions for fossil calibrations (t-cauchy and skew-t) produced slightly younger divergences, as expected, but differences were not substantial (average differences within Chlorokybus were 0.47 and 1.47 Ma, respectively) (electronic supplementary material, figure S4). In contrast to Chlorokybus, the divergences among Arabidopsis species were 13 Ma (95% HPD 7-19 Ma) and 28 Ma (95% HPD 18-39 Ma).
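To make the p-uncorrected distances reported above concrete, the following minimal sketch computes the proportion of differing sites between two aligned sequences. It is an illustration only: the function name, the pairwise deletion of gap and ambiguity characters, and the toy sequences are assumptions and are not taken from the analyses described here.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-?X") -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence carries a gap or ambiguity character are
    skipped (pairwise deletion; an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Toy aligned amino acid fragments, invented for illustration only.
toy_a = "MKTLLVAGG-SR"
toy_b = "MKSLLVAGGASK"
print(round(p_distance(toy_a, toy_b), 4))  # 2 of 11 compared sites differ
```

Unlike the maximum-likelihood (patristic) distances, this measure does not correct for multiple substitutions at the same site, which is consistent with the p-uncorrected values in the text being slightly smaller than the corresponding maximum-likelihood values.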
To further scrutinize the deep genetic structure within Chlorokybus, we performed a maximum-likelihood analysis of 75 plastid proteins using IQ-TREE and the best-fit cpREV + F + I + Γ4 + C60 mixture model. The plastid phylogeny was moderately resolved and statistically supported; it further confirmed the deep divergences among Chlorokybus isolates, even though internal relationships in Chlorokybus differed from the nuclear tree (electronic supplementary material, figure S5). Similar plastid-nuclear incongruences are often observed in algae, for example in Volvocales [24], and might be due to either methodological or biological reasons (e.g. introgression), or both. While biological confounding factors cannot be excluded, the failure to recover Amborella as sister to all other flowering plants suggests the presence of biases and/or limited phylogenetic signal in the plastid dataset. At any rate, both plastid and nuclear marker phylogenies agreed on the presence of deep divergences among Chlorokybus isolates.
The final assessment of the genetic diversity within Chlorokybus is based on the more robust nuclear phylotranscriptomic dataset. On the basis of the inferred deep divergences, we propose a new taxonomic arrangement by describing four new species and assigning a lectotype and an epitype for C. atmophyticus, for which no authentic strain is available in public culture collections (see 'Systematic botany').
royalsocietypublishing.org/journal/rspb Proc. R. Soc. B 288: 20212168
Taking advantage of the fact that the new isolates were grown simultaneously under the same experimental conditions, we explored whether the genetic distances among species are reflected in differences in global gene expression patterns. Clean Illumina reads were mapped against the annotated Chlorokybus genome using STAR [25] and expression was quantified with RSEM [26], followed by TMM (trimmed mean of M-values) cross-sample normalization. While the lack of biological replicates prevented us from inferring differential gene expression, we observed marked differences in steady-state gene expression levels among the four new isolates (figure 3b, c). The clustering of expression values mirrored the species phylogeny, with NIES-160 (C. riethii sp. nov.) showing the most different expression profile, followed by SAG 2611 (C. bremeri sp. nov.), and the more similar profiles shown by ACOI 1086 (C. atmophyticus) and SAG 2609 (C. melkonianii sp. nov.). Yet, even the two latter isolates showed marked differences in gene expression, which together with the reported genetic distances support the notion that they are not only different species but might also exhibit different cell physiologies.
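One simple way to compare global expression profiles between isolates is to compute a correlation-based distance on log-transformed expression values. The sketch below is a hypothetical illustration and not the procedure used here: it does not implement TMM normalization, the sample names and expression values are invented, and Pearson correlation on log2(x + 1) values is an assumed choice of similarity measure.

```python
import math


def log2_profile(values):
    """Log-transform expression values with a pseudocount of 1."""
    return [math.log2(v + 1.0) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile; correlation undefined")
    return cov / math.sqrt(var_x * var_y)


def correlation_distances(profiles):
    """Pairwise (1 - Pearson r) distances between log-transformed profiles."""
    names = sorted(profiles)
    logged = {name: log2_profile(profiles[name]) for name in names}
    return {
        (a, b): 1.0 - pearson(logged[a], logged[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented normalized expression values for five genes in three toy samples.
toy_profiles = {
    "sample_1": [10.0, 200.0, 35.0, 0.0, 80.0],
    "sample_2": [12.0, 180.0, 40.0, 1.0, 75.0],
    "sample_3": [90.0, 20.0, 5.0, 60.0, 10.0],
}
for pair, dist in correlation_distances(toy_profiles).items():
    print(pair, round(dist, 3))
```

A distance matrix of this kind could then be passed to a hierarchical clustering routine to see whether samples group in the same order as the species phylogeny.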

Conclusion
Here, we report on the presence of consistent deep structure within Chlorokybus after analysing all currently available isolates. These divergences might date back to approximately 76 Ma and are twice as large as those among some flowering plant species (e.g. Arabidopsis). Deep genetic divergences among Chlorokybus isolates are further supported by substantial gene expression variation when grown under the same experimental conditions. Yet, these genetic differences are not reflected in appreciable morphological differences, which suggests the presence of undescribed cryptic diversity within this lineage. All this genetic diversity has remained unnoticed under the umbrella name Chlorokybus atmophyticus, the only validly described species so far. To remedy this, we describe four new species of Chlorokybus and designate a cryopreserved culture as epitype for C. atmophyticus. Chlorokybus species are probably cosmopolitan but rare, as further supported by our search across global soil metabarcoding datasets that identified a single sequence of this genus. Properly recognizing the existing diversity within Chlorokybus is paramount, given the key phylogenetic position of Chlorokybophyceae, which together with Spirotaenia spp. [27] and Mesostigmatophyceae are the sister lineage to all other streptophytes. This diversity has to be taken into account for the adequate comparison of current and future data from different Chlorokybus strains [2,8,13].
Table 1. Genetic distances among Chlorokybus isolates and Arabidopsis species measured from concatenated amino acid alignments of 529 loci (178 397 aligned amino acids). p-uncorrected (upper triangle) and maximum-likelihood distances (lower triangle; figure 1) are shown, with intra-specific comparisons in italics.
In fact, the reported gene expression differences might even suggest a degree of interspecific variability in responding to environmental factors, and adequately accounting for this will be essential in comparative genomic studies that aim to understand the evolution of key traits (such as phytohormone or stress response pathways [17]) along the backbone phylogeny of streptophytes. Our phylogenetic analysis of genomic data can aid in uncovering key cryptic diversity, which, together with the discovery of new deep-branching lineages [28][29][30], is revealing important pieces in the puzzle that is the Eukaryotic Tree of Life.

Systematic botany
In the following, we describe four new species of Chlorokybus and designate a lectotype and an epitype for C. atmophyticus, given that no cultured material is available from the different locations studied by Geitler [18][19][20]. We further provide a formal description of the class Chlorokybophyceae, which was originally proposed by Bremer [31] without a formal description or page numbers, and is thus invalid under articles 38.1 and 41.5 of the International Code of Nomenclature (ICN) for algae, fungi and plants [32].
Class Chlorokybophyceae class. nov. (figure 2)
Description: Green algae forming sarcinoid, cubical packets. Single chloroplast containing two pyrenoids. First pyrenoid located in the middle of the chloroplast and surrounded by starch grains. Second, naked pyrenoid (also called pseudopyrenoid) located at the edge of the chloroplast. Reproduction can occur asexually by breaking cell packages into separate cells or by zoospores (figure 2). Zoospores are produced one per cell and possess two laterally inserted flagella. The flagella and body are covered with square scales. The flagellar apparatus is non-cruciate unilateral and contains multi-layered structures (MLS). After settling of the zoospores, the flagella are retracted at the point of their insertion. Cell division type phragmoplast-like, with an advanced cleavage furrow and type VII mitosis (sensu van den Hoek et al. [33]). Sexual reproduction is not observed. The class is supported by SSU rDNA, plastid and nuclear transcriptomic data.
Etymology: The species epithet honours Prof. Dr Michael Melkonian (University of Cologne, Germany) for his important contributions to understanding the diversity and evolution of algae.
Comment: The strain CCAC 0220 represents another isolate of this species and the SSU-ITS sequence and NCBI BioSample accession are available under SAMEA2242428 (RNAseq) and SAMN10351691 (genome assembly), respectively.
Etymology: The species epithet honours Prof. Dr Rüdiger Cerff (Braunschweig University of Technology, Germany) for his contributions on endosymbiosis research and plant evolutionary biology.

(d) RNAseq and transcriptome assembly
Algae were scraped off the agar and transferred into 1 ml Trizol (Thermo Fisher, Carlsbad, CA, USA) and ground using a Tenbroek tissue homogenizer, and RNA was isolated according to the manufacturer's instructions. RNA samples were treated with DNAse I (Thermo Fisher, Waltham, MA, USA), and quality and quantity were assessed with a formamide agarose gel and Nanodrop (Thermo Fisher), respectively. RNA was shipped on dry ice to Genome Québec (Montreal, Canada). After a Bioanalyzer (Agilent Technologies Inc., Santa Clara, CA, USA) quality check, libraries were built using the NEBNext mRNA stranded library preparation kit (New England Biolabs, Beverly, MA, USA). Libraries were checked with Bioanalyzer and sequenced using NovaSeq 6000 (Illumina) with NEBNext dual adapters: 5'-AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3' for read 1 and 5'-AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT-3' for read 2. FastQC (www.bioinformatics.babraham.ac.uk/projects/) reports are available in Dryad.
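The adapter sequences listed above are what a read-cleaning step would remove from the 3' end of each read. The following minimal sketch shows exact-match adapter trimming with those two sequences. It is an assumption-laden illustration, not the cleaning procedure used here: the function name and the `min_overlap` parameter are hypothetical, and dedicated trimming tools additionally tolerate sequencing errors and handle quality scores.

```python
# Adapter sequences as given in the text for read 1 and read 2.
READ1_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
READ2_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"


def trim_adapter(read: str, adapter: str, min_overlap: int = 5) -> str:
    """Remove an exact-match 3' adapter from a read.

    Handles (1) the full adapter occurring anywhere in the read and
    (2) a partial adapter (a prefix of it) hanging off the read's 3' end.
    Mismatches are not tolerated in this sketch.
    """
    read = read.upper()
    # Case 1: full adapter present; keep only the sequence before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: longest adapter prefix (>= min_overlap) that ends the read.
    max_len = min(len(adapter) - 1, len(read))
    for k in range(max_len, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


# Toy reads, invented for illustration only.
insert = "ACGTACGTAC"
print(trim_adapter(insert + READ1_ADAPTER + "TTT", READ1_ADAPTER))
print(trim_adapter(insert + READ2_ADAPTER[:8], READ2_ADAPTER))
print(trim_adapter(insert, READ1_ADAPTER))
```

The `min_overlap` threshold is a trade-off: a low value removes more residual adapter bases but also trims genuine sequence that happens to match the first few adapter bases.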
Data accessibility. RNAseq data are available on NCBI (Bioproject