A single-cell genome perspective on studying intracellular associations in unicellular eukaryotes

Single-cell genomics (SCG) methods provide a unique opportunity to analyse whole genome information at the resolution of an individual cell. While SCG has been extensively used to investigate bacterial and archaeal genomes, the technique has rarely been used to access the genetic makeup of uncultivated microbial eukaryotes. In this regard, the use of SCG can provide a wealth of information; not only do the methods allow exploration of the genome, they can also help elucidate the relationship between the cell and the intracellular entities extant in nearly all eukaryotes. SCG enables the study of total eukaryotic cellular DNA, which in turn allows us to better understand the evolutionary history and diversity of life, and the physiological interactions that define complex organisms. This article is part of a discussion meeting issue ‘Single cell ecology’.


Introduction
Associations of eukaryotes with archaea and bacteria are central to eukaryote evolution and were the driving force behind the emergence of the eukaryotic cell. An alphaproteobacterial symbiont that once settled within a pre-eukaryotic cell gave rise to mitochondria [1]. Likewise, other bioenergetic organelles, primary plastids of Archaeplastida, arose upon a symbiotic event where an archaeplastid ancestor engulfed a cyanobacterium [2]. During later stages of evolution, some major eukaryotic lineages independently acquired secondary or even tertiary plastids, for instance by taking up a cell already containing the primary or secondary plastid, respectively [3]. Over time, the morphologies and genomes of precursors of secondary and tertiary plastids have been reduced to different degrees. Unlike most of the phototrophic eukaryotic lineages, chromistan and chlorarachniophyte algae contain a nucleomorph, a residual nucleus originating from engulfed eukaryotes that contain the primary plastid within their secondary plastids [4,5]. Dinotoms (Kryptoperidiniaceae, a small group of Dinoflagellata) also retained the engulfed nucleus and preserved the symbiont's mitochondria within dinotom plastids [6].
While the evolutionary impact of semiautonomous organelles is readily apparent, they are not the only DNA-containing entities that contribute to the genomic complexity of eukaryotic cells. Eukaryotes frequently host archaeal, bacterial and even eukaryotic endocytobionts (endosymbionts living intracellularly). Note that we use symbiosis here in its broadest sense, that is, any intimate and constant interaction along the continuum between mutualism and pathogenicity, ranging from harmful to beneficial and from facultative to obligate [7]. Whereas some of these associations resemble the fate of the semiautonomous organelles, i.e. being present in the whole host population and transmitted only vertically (e.g. Perkinsela in Neoparamoeba [8]), other endocytobionts (and intracellular entities) depend on horizontal transmission; their presence in host populations is therefore difficult to predict and their prevalence fluctuates. In phagocytosis-capable eukaryotes, ingested prey items represent another, albeit transient, intracellular component that contains nucleic acids. Lastly, viruses, as common obligate intracellular parasites, are extremely diverse and ubiquitous; despite their generally rather small particle and genome sizes, they play a major role in controlling their host populations, thereby impacting biogeochemical cycles across ecosystems [9]. Giant viruses, members of the nucleocytoplasmic large DNA viruses, are comparable in size and complexity to other intracellular entities. They infect a wide range of eukaryotes, especially those capable of phagocytosis [10], can host other viruses [11], and are predicted to have evolved from smaller viruses through successive horizontal transfer of genetic material from their hosts [12].
All the above-mentioned intracellular entities contain nucleic acids, and thus the pool of total cellular DNA in a single eukaryotic cell reflects their sum. Aside from its general biological importance, this has practical implications when using single-cell genomics (SCG). We do not regard these cellular conglomerates as something that weakens SCG data but as an exciting opportunity to study underexplored multipartite associations within eukaryotic cells. Here, we focus on unicellular eukaryotes and discuss the composition of the various possible genomic pools within their single cells. We illustrate how the cultivation skew limits our current understanding of eukaryote intracellular associations and how emerging SCG provides a promising tool to enhance our knowledge of the genetic make-up of these associations.

Genome-wide approaches to studying intracellular associations in unicellular eukaryotes
Over the last decades, the use of easily-maintainable host laboratory cultures has been a standard procedure for studying eukaryote intracellular associations [13,14] and our understanding of the nature of many intracellular associations stems primarily from whole genome information. The need for genomic DNA of sufficient quantity and quality for studying interactions has narrowed the phylogenetic scope of research favouring model hosts grown axenically (i.e. isolated as single species, under contaminant free conditions) [15,16]. These laboratory model systems lack the higher complexity observed in naturally occurring intracellular associations; for example, associations between intracellular entities and a single host cell can vary across natural host populations [17]. Moreover, the genetic diversity of the intracellularly associated entities is expected to greatly exceed what can be deduced from laboratory culture-based models (e.g. [18,19]). Consequently, the effects of eukaryotic intracellular associations on ecosystem processes (through modulating microbial population structure [9]) or on their hosts (by providing new metabolic capabilities [20]) remain poorly studied beyond laboratory conditions. Looking at the phylogenetic diversity of eukaryotes that host three important obligate parasitic groups (Chlamydiae, Rickettsiales, giant viruses) studied at the genomic level (figure 1b), it is clear that Obazoa (which includes animals), Amoebozoa (namely Acanthamoeba strains), and Archaeplastida are overrepresented, suggesting that most of the diversity is still hidden and remains to be discovered. The biases and limitations of laboratory model systems can be partly overcome by employing cultivation-independent methods (e.g. metagenomics) on environmental samples; however, these methods lack single-cell resolution and therefore obscure host-symbiont relationships. 
Thus, only by focusing on one eukaryotic cell at a time can we obtain a detailed understanding of these fascinating multipartite associations, and SCG has already begun to fill this gap (lineages labelled on figure 1a). SCG has become a well-established approach for studying the coding potential and evolutionary histories of bacteria and archaea [22,23], specifically for microbial dark matter clades [24,25]. While bacteria and archaea have been extensively investigated by SCG, studies of associations within unicellular eukaryotes have lagged behind [26]. Unicellular eukaryotes are ubiquitous, encompass the most diverse and abundant part of the eukaryotic domain, and play a critical role across various ecosystems as primary producers, consumers, decomposers, and trophic links in food webs (e.g. [27]). Further, while some groups have begun to use SCG (and SCG-like methods) to investigate this set of organisms, nearly all have followed protocols developed for the study of archaeal and bacterial associations (e.g. [28–32]; for a review of methods, see [23]). Although the standard SCG approach (i.e. physical separation of single cells followed by whole genome amplification and sequencing) works well with eukaryotes, there are some unique aspects associated with sorting eukaryotic cells. Several separation techniques have been developed for obtaining individual cells (reviewed by Blainey [33]). Briefly, the separation techniques can be divided into two approaches based on whether the individual cells are selected randomly (random encapsulation methods) or identified first and then sorted (micromanipulation methods). The choice of separation technique decisively influences cell throughput, and the micromanipulation approaches (e.g. micropipetting, optical tweezers) are among those considerably limited by their speed.
Currently, the most common method, fluorescence-activated cell sorting (FACS), represents the random encapsulation approach and excels in speed. Another major advantage of FACS is the possibility of characterizing cells based on different fluorescence signals, e.g. autofluorescence from chlorophyll and specific staining for acidic organelles [28], providing some level of selectivity. The sizes of protists often make it possible to observe individual cells and their morphology under light microscopy, which enables: (i) assigning them to a group of target organisms [31]; (ii) examination of their current condition in fresh samples; or (iii) identifying the presence of endosymbionts and even their physical separation from the host cell [34]. On the technical side, sample preparation may have to be adjusted for some eukaryotic taxa compared to bacterial/archaeal samples. Generally, eukaryotic cells are much more fragile and sensitive to mechanical treatment (such as filtration or sonication) or cryopreservation, though this varies greatly across eukaryotic groups, e.g. Not et al. [35]. Thus, sample preparation needs to be tailored to meet the specific needs of a particular eukaryotic target group.

Single-cell genomics-enabled biological insights into eukaryote intracellular associations
The SCG approach has sufficient sensitivity to recover genomic data of other DNA-containing entities present within a single cell. As referenced earlier, Yoon and colleagues [26] focused on an (at that time) obscure group of likely photoautotrophic marine plankton, Picobiliphyta (currently called Picozoa). Their data did not provide any evidence of plastid sequences, as was expected, but instead revealed bacterial and viral DNA probably originating from ingested prey items. Similarly, phagotrophic interactions were assumed in subsequent SCG-based studies (table 1). Plotnikov et al. [37] found rich bacterial assemblages within single cells of the ciliate Paramecium aurelia; however, it was not possible to unambiguously identify their role inside their hosts. Aside from the ingested prey, infections caused by nanovirus [26], and putative symbionts related to Rickettsiales and the candidate divisions ZB3 and TG2 [28] were also revealed. These and other studies provide an interesting perspective on the power of SCG and its limitations. SCG and ancillary methods have the potential to provide insight, at the single-cell level, into predator-prey relationships in phagotrophic eukaryotes, uncover prey-associated phages and intracellular symbionts, and identify viral infections. On the other hand, technical issues associated with separation of single cells and genome amplification cannot be ignored. For example, associations may be wrongly inferred if DNA from non-associated entities is introduced into a sample by co-sorting under relaxed sorting conditions or if bacteria/viruses are firmly attached to cells [38]. Another critical step, whole genome amplification, may introduce various artefacts including amplification bias and genome loss [39]. The most challenging aspect, however, is the specific interpretation of the nature of intracellular associations revealed by SCG.
royalsocietypublishing.org/journal/rstb Phil. Trans. R. Soc. B 374: 20190082
In data obtained from phagocytosis-capable eukaryotes (table 1), it has been difficult to discern whether bacterial contigs in SCG assemblies belonged to prey items or endosymbionts. This situation may not be simpler for photosynthetic microbial eukaryotes, as many of them are also capable of phagocytosis [40]. Distinguishing prey from other intracellular entities in silico is of considerable importance; otherwise, long-term interactions may remain obscure because of this uncertainty. Even if viruses (only double-stranded DNA and single-stranded DNA viruses are detectable by SCG) or strictly intracellularly living organisms (e.g. members of Chlamydiae or Rickettsiales) are detected, their origin as prey items cannot be ruled out. Clustered regularly interspaced short palindromic repeat (CRISPR) systems can link viruses to their host, but this approach has two limitations: it is only suited for bacterial/archaeal genomes, and complete or nearly complete genome assemblies from host and virus are necessary [41,42]. Thus, while CRISPRs can link viruses with bacteria and archaea, whether these are food or symbionts, the approach relies on the quality of the assemblies. Alternatively, putative virus-host connections can be identified by aligning viral transfer RNAs with other recovered genomes [43].
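The CRISPR-based linkage described above can be illustrated with a minimal sketch. It assumes that spacer sequences have already been extracted from the bacterial or archaeal contigs of a single-cell assembly by some upstream tool, and that a link is called only when a spacer matches a viral contig exactly on either strand. The function names and the input layout are hypothetical and chosen for illustration; published analyses typically rely on alignment tools that tolerate mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]


def link_spacers_to_viruses(spacers, viral_contigs):
    """Link hosts to viruses through exact CRISPR spacer matches.

    spacers: dict mapping a host bin id to a list of spacer sequences
             (hypothetical layout, assumed to come from an upstream tool).
    viral_contigs: dict mapping a viral contig id to its sequence.
    Returns a sorted list of (host_id, virus_id) pairs.
    """
    links = set()
    for host_id, host_spacers in spacers.items():
        for spacer in host_spacers:
            forward = spacer.upper()
            reverse = reverse_complement(forward)
            for virus_id, contig in viral_contigs.items():
                contig = contig.upper()
                # A protospacer may lie on either strand of the viral contig.
                if forward in contig or reverse in contig:
                    links.add((host_id, virus_id))
    return sorted(links)
```

Because exact matching is strict, fragmented or error-containing assemblies will yield fewer links, which mirrors the dependence on assembly quality noted above.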
A complementary approach to differentiate free-living and endosymbiotic organisms is to look for typical genomic signatures of an intracellular lifestyle. Generally speaking, bacteria obligately associated with and living intracellularly in other organisms tend to have genomes that are (i) small, (ii) AT-rich, and (iii) rapidly evolving [44,45]. These genome changes occur during the transitions from free-living to facultative to obligately intracellular [46]. In the early stage of the adaptation towards endosymbiosis, i.e. during the shift from a free-living to an intracellular lifestyle, genomes undergo extensive pseudogenization and gene loss [47,48]. Further adaptation towards obligate endosymbiosis, hence towards a stable and nutrient-rich environment, leads to genome reduction through gene loss. Endosymbionts tend to have limited biosynthetic capabilities (though a unique combination of functional pathways is often retained [46]) and pathways of energy metabolism and biosynthesis of nucleotides, amino acids or vitamins are the most frequently disrupted [44]. Genome reduction affects many other genes, including those involved in cell motility [49], cell division [50], or DNA repair mechanisms [51,52]. In addition, their genes frequently encode a wide range of proteins involved in host interactions or the transport of metabolites [8,53]. Pathogens have acquired genes encoding proteins with eukaryotic motifs that are probably involved in modulation of host functions, e.g. suppressing host defences and allowing survival and replication within eukaryotic host cells [54]. Taken together, a wealth of genomic signatures is associated with the intracellular lifestyle; these signatures provide an opportunity to build and gradually optimize models for predicting endosymbionts in SCG data. Considering that some genomic signatures indicate well-defined effects of symbiont presence on host fitness (e.g. pathogens, metabolic mutualists), or stages of host adaptation, the models may ideally provide even finer classification.
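As a rough illustration of how two of the signatures listed above (small genome size and AT-richness) could feed a first-pass screen, the following sketch flags genome bins that fall below both a size and a GC-content threshold. The thresholds, function names and input layout are illustrative assumptions, not values taken from the cited studies. Single-cell assemblies are often incomplete, so the assembled size underestimates the true genome size, and the third signature (rapid evolution) is not captured here.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def flag_candidate_endosymbionts(genome_bins, max_size_bp=1_500_000, max_gc=0.35):
    """Flag bins that are both small and AT-rich.

    genome_bins: dict mapping a bin id to a list of contig sequences
                 (hypothetical layout).
    max_size_bp, max_gc: illustrative cut-offs; assumptions, not
                         established values.
    Returns a list of (bin_id, total_length_bp, gc_fraction) tuples.
    """
    flagged = []
    for bin_id, contigs in genome_bins.items():
        total_length = sum(len(contig) for contig in contigs)
        gc = gc_fraction("".join(contigs))
        if total_length <= max_size_bp and gc <= max_gc:
            flagged.append((bin_id, total_length, round(gc, 3)))
    return flagged
```

A screen of this kind can only nominate candidates; a predictive model as envisaged above would need to combine it with gene-content evidence such as disrupted biosynthetic pathways or host-interaction proteins.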

Validating the nature of intracellular associations revealed by single-cell genomics
While the aforementioned strategies for distinguishing between prey and other intracellular entities rely heavily on genomic sequence data, other approaches can be used to further validate the nature of intracellular associations revealed by SCG. Recurrent observation of the same 'contaminating' DNA may suggest a symbiotic lifestyle and argue against a prey origin, especially in combination with starvation experiments. However, it is nearly impossible to eliminate bacteria in environmental samples without depletion of heterotrophic unicellular eukaryotes. A plausible strategy could be the replacement of a diverse bacterial community by a homogeneous and well-defined population of a single bacterium. Lagkouvardos et al. [55] succeeded using the Escherichia coli tolC knockout strain, which not only substituted diverse original bacterial communities in several environmental isolates of Acanthamoeba but could also later be controlled by a sublethal concentration of ampicillin owing to its hypersensitivity. Distinction between prey and endosymbionts can also be achieved by localization of the organism(s) detected in the single-cell data using fluorescence in situ hybridization with specific oligonucleotide probes. The sample has to be chemically fixed [56] for later hybridization and microscopical inspection. Endosymbionts may reside in virtually any compartment within the host cell, whereas phagocytized prey is generally limited to phagosomes and persists only until it is completely digested in the phagolysosome (a phagosome fused with lysosomes). Thus, an intracellular localization beyond vacuoles provides conclusive evidence for the symbiotic lifestyle, whereas presence within vacuoles does not prove a prey origin. Both endosymbionts and prey can be found in phagosomes because phagocytosis is also a common way of entering host cells for endosymbionts [57]. Even presence within acidified vacuoles (late phagosomes) does not necessarily confirm a prey origin, as some pathogens are known for their ability to persist within acidified vacuoles (e.g. Coxiella burnetii [58]).
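The recurrence argument made at the start of this section can be expressed as a simple prevalence calculation across single-cell assemblies from the same host population. The sketch below assumes that each cell has already been assigned a set of detected non-host taxa; the function name, input layout and taxon labels are hypothetical.

```python
from collections import Counter


def taxon_prevalence(cells):
    """Fraction of single cells in which each non-host taxon was detected.

    cells: dict mapping a cell id to an iterable of taxon labels
           detected in that cell's assembly (hypothetical layout).
    Returns a dict mapping each taxon to its prevalence (0 to 1).
    """
    n_cells = len(cells)
    if n_cells == 0:
        return {}
    counts = Counter()
    for taxa in cells.values():
        # Count each taxon at most once per cell.
        counts.update(set(taxa))
    return {taxon: count / n_cells for taxon, count in counts.items()}
```

High prevalence is suggestive rather than conclusive, because an abundant prey species can also recur across cells; this is why the text pairs recurrence with starvation experiments and localization.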
Lastly, single-cell RNA sequencing (scRNA-seq) provides an opportunity to assess the nature of intracellular associations. Though still hampered by technical challenges when applied to single-celled eukaryotic microorganisms [59], progress has been made over the last years (e.g. [60–63]). Because scRNA-seq is capable of showing different patterns of gene expression in each cell at a specific time point, it could be highly beneficial in the differentiation of symbionts versus prey. Unlike adapted endosymbionts, which are able to manipulate phagosome maturation, phagocytized free-living organisms encounter a hostile environment with which they cannot cope. The phagosome becomes more acidic during its maturation and the engulfed microorganisms are gradually degraded by hydrolytic enzymes [64]. Thus, they cannot function optimally, which would be reflected in their gene expression. Consequently, apart from the eukaryote host transcriptome, which is probably represented in high abundance in scRNA-seq data, endosymbiont transcripts could be present with upregulated genes involved in host interactions. One caveat to this approach, however, is that current scRNA-seq protocols applied to single-celled eukaryotic microorganisms [60–63] do not process all RNA present in the eukaryotic host cell. Instead, only eukaryotic messenger RNAs (mRNAs) that possess polyadenylated (polyA) tails are hybridized to an oligo(dT)-containing primer and transcribed into complementary DNA [65]. Thus, ribosomal RNAs (rRNAs) and transfer RNAs, along with bacterial or archaeal mRNAs, are omitted from sequencing. While scRNA-seq has been applied to bacteria [66], there is no rRNA depletion step [67], hampering efficient transcript detection. An alternative to polyA selection that works for both host and symbiont cells is needed.

Concluding remarks
Intracellular associations underlie the emergence and subsequent flourishing and evolutionary trajectory of eukaryotic cells. However, as pointed out by Woyke & Schulz [68], we still do not know much about these intimate relationships, and what we do know is heavily skewed towards established laboratory systems, represented by a small number of model hosts that do not reflect the true diversity of eukaryotes (figure 1) or their intracellular associations. We predict that SCG methods will lead to fundamental discoveries and move us forward in understanding one of the most remarkable phenomena on Earth.
In principle, there are two primary directions for exploring diverse eukaryotic lineages and their associated intracellular entities. First, microbial culture collections around the world store strains of unicellular eukaryotes, some of which represent neglected eukaryotic lineages [69]. Most likely, reasons for their exclusion as established model taxa include difficulties with maintaining these cultures. Their genome data could be obtained by SCG, and priority strains with genomic signatures of likely symbionts or viruses could be selected to optimize culture conditions for further experimental work. The second approach would encompass the targeted sorting of unicellular eukaryotes from environmental samples with a focus on eukaryotic lineages currently lacking in culture collections.
To accelerate progress in this field, we encourage the research community to establish a genomic encyclopaedia of eukaryote intracellular associations. A robust genomic database connecting eukaryote host cells and their intracellular entities will enable (i) extension of the tree of life with new microbes (archaeal, bacterial, and eukaryotic) and viruses, (ii) investigation of the intracellular associations in ecological context (determination of prevalence, distribution and abundance of identified associations), and (iii) large-scale comparative genome analyses (metabolic features, virulence factors, new gene functions provided to hosts). By sequencing a single eukaryotic cell at a time, we can bypass cultivation requirements, overcome existing biases and gain insight into naturally occurring associations within so far neglected eukaryotic hosts.
Data accessibility. This article has no additional data. Authors' contributions. T.T. wrote the first draft of the manuscript which was edited by T.W. and S.V.D.