Head transcriptome profiling of glossiphoniid leech (Helobdella austinensis) reveals clues about proboscis development

Cephalization refers to the evolutionary trend towards the concentration of neural tissues, sensory organs, mouth and associated structures at the front end of bilaterian animals. Comprehensive studies on gene expression related to the anterior formation in invertebrate models are currently lacking. In this study, we performed de novo transcriptional profiling on a proboscis-bearing leech (Helobdella austinensis) to identify differentially expressed genes (DEGs) in the anterior versus other parts of the body, in particular to find clues as to the development of the proboscis. Between the head and the body, 132 head-specific DEGs were identified, of which we chose 11 to investigate their developmental function during embryogenesis. Analysis of the spatial expression of these genes using in situ hybridization showed that they were characteristically expressed in the anterior region of the developing embryo, including the proboscis. Our results provide information on the genes related to head formation and insights into the function of proboscis-related genes during organogenesis with the potential roles of genes not yet characterized.


Introduction
In the development of bilaterian animals, a broadly conserved genetic toolkit underlies the specification of positional identity along the body axis, usually including the establishment of a centralized nerve system, concentration of sensory organs and development of specialized mouth parts at the most anterior region [1][2][3]. The mouth parts of animals have undergone tremendous specialization and diversification as they have evolved to capture and ingest food according to their diverse life styles and ecosystems [4][5][6][7], in large part by modifying the regulation and deployment of the broadly conserved toolkit. through injecting toxin [14,15] and fluid sucking [16,17], and the proboscis as a feeding apparatus has a tube for sucking up food such as nectar, body fluid and cellular constituents [18,19]. Fluid sucking of insects is generated in the canalized proboscis by suction pressure produced from the probosciscranial sucking pump complex with capillary activity which directs the flow of fluid into the oesophagus [19][20][21][22]. On the other hand, the feeding tube of the leech, another representative proboscis-bearing animal, consists only of a muscle complex. The force generated by this complex induces peristalsis for moving food in an aboral direction [11,16].
Cell lineage experiments performed with a well-studied leech model, Helobdella (family Glossiphoniidae), have provided information on the development of the leech proboscis [23,24]. The formation of germ layer-specific tissues in the proboscis is achieved through orchestrated proliferation and interdigitation of clones arising from cells in the early lineage of the mesodermal proteoblast DM' ('4d cell' in spiralian nomenclature), which mainly contribute to the formation of non-segmental mesoderm that can develop into the digestive tract during organogenesis [23]. At stage 10 (220 h to 245 h after zygotic development), the proboscis develops in an everted. During stage 11, the proboscis gradually inverts to its resting adult position within the foregut region, by which time it consists of a complex array of longitudinal, radial and circular muscles [23,25]. These radical and dynamic developmental changes raise the question of whether essential factors are actively expressed at stage 10 for the formation of the head or specific structures such as the proboscis.
Over the past few years, many studies have been performed to understand the develpomental involvement of various transcription factors [26][27][28] and signalling molecules [29][30][31], including physiological substances in the anterior region of the leech [32]. Previous studies have suggested that anterior development involves a combination of numerous signalling pathways and transcription factors and that the anterior head is a vast hub of gene expression. However, large-scale studies on expressed genes have not been reported. Towards this end, we used stage 10 leech embryos and conducted RNA sequencing (RNA-seq), a successful method for understanding differential gene expression in specific tissues or under different conditions in a wide variety of animals and plants [33][34][35]. In this study, we provide detailed information on the stage 10 leech embryo transcriptome and present proboscis-specific genes identified via differentially expressed gene (DEG) analysis between the head and body. Our results will be used as comparative data for the evolutionary relevance of the proboscis and also fill the gap in transcriptional information about the missing stage during leech embryo development.

Results and discussion
2.1. Internal structure development of the proboscis during organogenesis The harmonious proliferation and differentiation of the progeny of lineage-specific blast cells and micromeres give rise to the development of epidermis, ganglia and musculature with the formation of primary mouthparts that can develop into a future digestive tract [23][24][25]36]. After epiboly, the anterior germinal plate develops into the precursor of the mouthpart (stomodeum), surrounded by primary nerves and muscle fibres (figure 1a-d) [23]. At stage 10, the proboscis gradually protrudes, covered with epithelial tissues which develop into the proboscis sheath. This process involves the active development of a complex of innervated muscle layers in the proboscis structure and oesophagus (figure 1a, e-h 00 ). By stage 11, the proboscis invaginates into the mouth pore located in the foregut region with placement of a presumptive circumferential muscle layer and spanning of the radial muscle along with development of longitudinal and oesophageal muscle (figure 1a,i-l 0 ). The timing of the morphological differentiation of the proboscis thus suggests that stage 10 might be a crucial step in the development of the internal structure and that proboscis-related genes might be highly expressed during this dynamic event.

2.2.
De novo transcriptome assembly of a stage 10 leech embryo We used RNA-seq to find specific genes in the proboscis compared with the body region at stage 10 (figure 2a and electronic supplementary material, figure S1). More than 85 million raw reads of the body and 79 million raw reads of the proboscis were obtained, yielding 84 million and 78 million filtered reads, respectively, with high-quality values (filteredout ratio greater than 98%; Q 20 > 98%; for details, see electronic supplementary material, table S1). We combined these reads and subjected them to de novo transcriptome assembly using Trinity v.  , see electronic supplementary  material, table S1). Subsequently, we annotated the assembled transcripts using BLASTP (e-value ≤1 × 10 −10 ) based on Uniprot/SwissProt and UniRef50 databases to obtain complete sequence coverage. We obtained 13 735 annotated transcripts, including 8003 genes annotated based on the Uniprot/ SwissProt database and 5732 genes annotated based on the UniRef50 database (electronic supplementary material, tables S1 and S2). To examine the completeness of this Helobdella austinensis embryo transcriptome, we assessed annotated protein-coding sequences. A total of 82.5% (250/303 genes) and 78.4% (767/978 genes) of the eukaryote and metazoan singlecopy orthologues were identified, respectively (figure 2b).
For further investigation, we focused on genes that were both highly and specifically expressed in the head, assuming that highly expressed genes and those most differentially expressed would mainly contribute to head formation, including proboscis development. We selected carefully based on criteria (highly expressed genes annotated, head fragments per kilobase of transcript per million mapped reads (FPKM) > 100; highly expressed genes uncharacterized, head FPKM > 100; differentially expressed genes annotated, head FPKM >10 and body FPKM <1), and 11 transcripts including three uncharacterized genes were identified (table 1). Among these head-   royalsocietypublishing.org/journal/rsob Open Biol. 12: 210298 related DEGs, genes encoding secreted frizzled-related protein (sFRP) homologues were highly and specifically expressed based on both criteria. These findings suggest that a variety of genes are related to the development of the anterior region. Among them, the high and characteristic expression of sFRPs might be correlated with anterior formation [3,38,44]. In addition to transcriptional expression analysis, we also investigated spatial expression patterns of these 11 transcripts to examine their embryological contribution during organogenesis.

Widespread expression of the hs71l and scot1 homologue
In stage 9, a homologue of the heat shock protein (HSP) hs71l was expressed in the stomodeum and ventral germinal plate (figure 4a). Its expression in the mouthpart persisted. Its expression was also detected in the rostral ganglia, salivary gland and visceral muscle, including the precursor of the posterior sucker (figure 4b). After proboscis invagination, it was expressed ubiquitously in the whole body with discriminate expression in the rostral ganglia, salivary gland and hindgut (figure 4c-c 0 ). HSPs are multifunctional proteins related to protein maturation, protein refolding, protein import and translocation. HSPs facilitate proteolytic degradation of unstable proteins by targeting proteins to lysosomes or proteasomes under normal and stress conditions [45][46][47]. Although some HSPs have been shown to have embryological functions such as neurodevelopment in previous studies [48,49], comparable expression patterns in lophotrochozoans have not to our knowledge been previously described. Our results for the first time suggest that hs71l plays multifunctional roles in tissues or organs undergoing dynamic structural changes during leech embryogenesis. Succinyl-CoA-3-oxaloacid CoA transferase 1 (SCOT1) is related to ketone metabolism, which catalyses acetoacetate to acetoacyl-coenzyme A (acetoacyl-CoA) in the mitochondria [50,51]. At stage 9, the scot1 homologue was expressed around the stomodeum and in the adhesive zone, by which the embryo attaches to the maternal venter during the period of development after hatching from the fertilization membrane and cocoon and before the rear sucker becomes functional (figure 4d). At stage 10, scot1 was expressed in the proboscis, salivary gland and visceral muscle (figure 4e). Even after proboscis invagination, its overall expression pattern in the proboscis persisted along with the developing head, salivary glands and hindgut (figure 4f,f 0 and electronic supplementary material, figure S3). We believe that, during embryogenesis, scot1 is expressed in structures and developing regions in which ketone metabolism might be required [52][53][54]. These results suggest that the stage of organogenesis might be a period in which tissues and organs undergo dynamic changes with organ-specific metabolic process of ketone.

Genes expressed in specific tissue layers in the proboscis
Previous lineage analyses [23,24,36] of leech proboscis development and our morphostructural analysis revealed that the royalsocietypublishing.org/journal/rsob Open Biol. 12: 210298 development of the proboscis is marked by orderly expansion of radial muscles and differentiation of longitudinal muscles. We hypothesized that characteristic core signalling genes might contribute to this process.
Calmodulin (CaM) and calmodulin-like (CML) proteins known as major regulators of Ca 2+ -dependent signalling are key regulators of various mechanisms such as cell proliferation, motility and cell cycle progression [55][56][57]. In addition, CaM is an upstream regulator of calmodulin kinase (CaMK), which activates CaMK signalling to induce the development of the central nervous system [58,59]. At stage 9, transcripts of cml23 homologue were expressed in the furrow of the stomodeum and the adhesion site for attachment to the maternal venter (figure 4g). The adhesion site develops into the future epithelium of the anterior sucker. Transcripts of the cm123 homologue were continuously expressed in the region, including nuclei of radial muscle and ventral ganglia (figure 4h-i 0 ). Our results indicate that the cml23 homologue is related to the cellular process of the developing proboscis and central nerve system during organogenesis.
sFRPs are key modulators of the Wnt signalling system, which determines the anteroposterior axis according to the distribution of Wnt ligands and their frizzled receptors; Wnt signalling is essential for bilaterian head formation [3,44]. In this study, sfrp orthologues showed an anterior-specific expression pattern in embryos during organogenesis (figures 4j-l 0 and 5a-c 0 ). At stage 9, these homologues were expressed around the stomodeum. At stages 10 and 11, these two paralogues showed different expression patterns. Transcripts of sfrp1/2/5a were expressed in the radial muscle precursor in the proboscis but weakly expressed in the posterior body (figure 4k-l 0 ). However, sfrp1/2/5b was expressed in the posteriormost proboscis sheath and the anterior intrinsic muscle [60] (figure 5b-c 0 ). Based on these spatial evidences, sfrp orthologues might play a role as conserved anterior morphogens with proboscis differentiation factors [3,44,61].
Zinc finger CCHC-type containing protein 24 (ZCH24) is one of the zinc finger domain proteins [62] known to be related to the development of somitogenic mesoderm in vertebrates [63], although it is largely unknown in invertebrates. In developing leech embryos, the zch24 homologue was only expressed inside the stomodeum at stage 9 (figure 5d) and in radial muscle precursors from the anterior to the posterior of the proboscis by stage 11 (figure 5e-f 0 ). These restricted expressions of zch24 transcripts in proboscis indicate that zch24 is a possible proboscis-specific gene in H. austinensis as well as related to the radial muscle differentiation of the developing proboscis.
2.6. Nervous system-specific expression of the Dmrt93B homologue At stage 9, Dmrt93B was expressed around and inside the stomodeum and in the ectodermal region of the germinal plate (figure 5g). At stage 10, its transcripts were mainly expressed in the rostral segment of ganglia and the proboscis including ventral ganglia (figure 5h). After proboscis invagination, Dmrt93B transcripts were expressed in the rostral segments, epidermis, ventral ganglion, nerve canal in the proboscis cavity [16] and the posterior sucker (figure 5i,i 0 ). Dmrt genes are well known to be mainly sex dimorphic factors [64,65]. They have been actively studied in various fields such as neuronal differentiation and brain development, in addition to differentiation of animal sex determinants and reproductive organs [66,67]. Furthermore, in Panarthropoda, Dmrt93B is expressed in the nervous system including mouth development [68]. Taken together, these results indicate that this highly conserved gene also plays a role as a transcription factor for the development of the mouth and nervous system during leech embryogenesis.

Developmental relevance of the defence-related gene
Defence protein is characterized by a reeler domain. It is an innate immune substance that protects against pathogens royalsocietypublishing.org/journal/rsob Open Biol. 12: 210298 [69,70]. In this study, dfp3 homologue was not expressed at stage 9 (figure 5j). However, it began to be expressed in the epithelium region of the proboscis at stage 10 (figure 5k). Its expression in the epithelium surrounding the proboscis was maintained even after proboscis invagination (figure 5l,l 0 ). We suggest that this expression pattern is related to the development process of leech embryos. Helobdella embryos are surrounded by a vitelline membrane until stage 9. After hatching, they attach to the maternal venter and take off from it [25]. Based on the developmental evidence of spatial expression, the innate immune system in the leech might begin its protective function as the leech breaks through the vitelline membrane and becomes exposed to the environment.

Developmental contribution of uncharacterized genes during organogenesis
At stage 9, UniRef50_T1FEH8 transcripts were expressed in the periphery of the stomodeum along with overall expression in the germinal band Then, transcripts were expressed in the epithelium of the everted proboscis and rostral ganglia at stage 10. After proboscis invagination, UniRef50_T1FEH8 transcripts were expressed in the proboscis chamber, proboscis sheath and ventral ganglion (figure 6a-c 0 ). From these results, we found that UniRef50_T1FEH8 is related to development of the epithelium of the proboscis and ganglia formation. UniRef50_A7TSG4 transcripts were expressed in the foregut precursor region and around the stomodeum at stage 9 and in the proboscis epithelium, oesophageal funnel and overall gut boundaries at stage 10. At stage 11, the expression at stage 10 was maintained, in addition to the expression in the proboscis chamber, proboscis sheath, oesophagus and the developing hindgut ( figure 6d-f 0 ). Therefore, we found that UniRef50_A7TSG4 is related to the epithelium of the proboscis along with overall gut development.
UniRef50_T1F354 transcripts were expressed in the stomodeum, coelomic cavity and visceral muscle precursors at stage 9 (figure 6g). At stage 10, these transcripts were expressed in the radial muscle precursor in the anterior proboscis, fibres of the proboscis sheath, body and visceral muscles (figure 6h). The expression was maintained in the same region at stage 11 (figure 6i,i 0 ). Therefore, Uni-Ref50_T1F354 is related to the overall mesodermal tissue development during organogenesis. Their intense expression in the proboscis with major organs of these uncharacterized or maybe leech-specific genes indicates that these unique genes are not only important for embryogenesis but also are specifically related to the proboscis development.

Conclusion
The feeding organs of invertebrates have undergone extensive diversification and specialization over the course of evolution. We performed a comprehensive study to expand our knowledge about proboscis evolution and to discover genes related to the development of this specialized organ. Through transcriptome profiling, DEG analysis and spatial analysis, we identified specific genes related to proboscis development including highly conserved anterior formation factors. This not only provides novel information that has not been previously elucidated but also suggests that leeches also undergo anterior formation by conserved morphogens along with diverse transcription factors and that the developing proboscis is a vast hub of signal molecules. Furthermore, the presence of numerous uncharacterized genes in the anterior formation suggests a possible involvement of specific genes in proboscis development, including head formation in leeches, raising questions about the evolution of mouth formation in other proboscis-bearing animals. Taken together, our comprehensive survey will be used for comparative Table 1. Genes of interest selected based on the three criteria. Asterisks indicate previously studied genes (for detailed phylogeny, see electronic supplementary material, figure S2

RNA isolation and RNA-seq library preparation
Embryos were cultured in clean HTR medium until stage 10 (220-245 h after zygotic development). To obtain a sufficient amount of RNA, about 400 embryos were used in the experiment. The head region was dissected using insect pins  royalsocietypublishing.org/journal/rsob Open Biol. 12: 210298 (Shiga, no. 2) under a Leica ZOOM 2000 stereomicroscope (Leica, Wetzlar, Germany). Next, the proboscis region was transferred to a 1.7 ml tube containing 200 µl of TRIzol reagent (Invitrogen, Carlsbad, CA, USA). RNA was isolated from each sample using TRIzol reagent according to the manufacturer's instructions. The purity and integrity of the total RNA isolated from dissected embryo samples were examined using a Nanodrop 2000C spectrophotometer (Thermo Scientific, Waltham, MA, USA) and a Bioanalyzer 2100 (Agilent Technologies, Palo Alto, CA, USA). Then, total RNA concentration was calculated using Quant-IT RiboGreen (Invitrogen, R11490). To assess the integrity of the total RNA, samples were run on a TapeStation RNA screentape (Agilent, 5067-5576). Only high-quality RNA (RNA integrity number greater than 7.0) was used to construct an RNA library. A library was independently prepared with 1 µg of total RNA for each sample using Illumina TruSeq Stranded Total RNA Sample Prep Kit v. 2 (Illumina, Inc., San Diego, CA, USA). Ribosomal RNA in total RNA was depleted using a Ribo-Zero kit (Epicentre Biotechnologies, Madison, WI, USA). After the rRNA depletion, the remaining RNA was purified, fragmented and primed for cDNA synthesis. Cleaved RNA fragments were copied into first-strand cDNAs using reverse transcriptase and random hexamers followed by second-strand cDNA synthesis using DNA polymerase I, RNase H and dUTP. Next, cDNA fragments underwent end repair, addition of a single A base stage 9 (155-175h AZD) stage 10 (220-245h AZD) stage 11 (300-310h AZD) cross-sectional view   [74]. For annotated NRCDS, homology search was performed against the Uniprot/SwissProt database using BLASTP with an E-value cut-off of 10 −10 and the best blast hit. For one-to-one protein and gene mapping, the one with the highest expression level was selected if multiple NRCDS were annotated in a single gene. To minimize the loss of unannotated NRCDS, it was reannotated with the UniRef50 database, an automatically annotated database and clustered sets of sequences from UniProt Knowledgebase and selected UniParc records [37]. Differentially expressed genes between the head and body were identified using the edgeR program with an FDR cut-off of less than 0.05. GO terms of differentially expressed genes were identified using PANTHER (Protein Analysis Through Evolutionary Relationships, http://pantherdb.org/) [75].

Whole mount in situ hybridization
Fluorescence in situ hybridization (FISH) was performed as previously described [29,32]

Immunostaining
Whole-mount immunostaining was performed according to previously published protocols [16]. Briefly, fixed embryos were washed with PBS three times. After washing with PBS, the embryos were rinsed with 1% Triton in 0.1 M Tris-HCl ( pH 7.5) several times for 1 h, incubated with diluted block solution (1 : 9 = 10× Roche Western Blocking Reagent:PBT) for 2 h and then incubated with primary antibodies (anti-acetylated-α-tubulin produced in mouse; Sigma Aldrich; T-7451) in diluted blocking solution (1 : 500) at 4°C for 48 h. After five consecutive washes with PBT, embryos were incubated with a secondary antibody (goat antimouse IgG H&L Alexa Fluor 488, Abcam, ab150113) in diluted blocking solution (1 : 1000) at 4°C for 24 h. After checking the signal, embryos were washed five times with PBT and then stained with Texas Red-X Phalloidin (Thermo Scientific, Waltham, MA, USA) for 24 h to visualize F-actin. After checking the signal, embryos were washed with PBT five times and labelled with DAPI in PBT (1 : 100) at room temperature in the dark for 20 min. Fluorescence-stained embryos were imaged using a LSM 710 confocal microscope (Carl Zeiss, Oberkochen, BW, Germany). The obtained images were edited using ZEN software (Carl Zeiss). Fluorescence-labelled embryos were then dehydrated in ethanol series (70%, 90% and 100% diluted in 1 × PBS) and propylene oxide, followed by infiltration with plastic embedding solution (PolyBed). Embryos were cut with a microtome blade (Leica 818; Leica, Wetzlar, Germany) under an Olympus SZ-STS microscope (Olympus, Tokyo, Japan). These sections were then mounted with Fluoromount-G (SouthernBiotech, Birmingham, AL, USA). Stained embryos and slide samples were imaged using a LEICA DM6 B with a LEICA DFC450 C camera (Leica, Wetzlar, Germany). Obtained images were edited using a Las X software (Leica), and prepared as figure plates using Adobe Illustrator CS6 (Adobe, San Jose, CA, USA).
Data accessibility. Sequences generated in this study are deposited in GenBank (electronic supplementary material, table S4). Raw sequencing data are deposited in the NCBI Sequence Read Archive (SRA) database with accession numbers of SRR15013283 and SRR15013284 under BioProject PRJNA742851. The data are provided in electronic supplementary material [76].