Biology Letters
Research articles

Bioinformatic prediction of putative metallothioneins in non-ciliate protists

Sergio Balzano

Stazione Zoologica Anton Dohrn Napoli (SZN), Department of Ecosustainable Marine Biotechnology, via Ammiraglio Ferdinando Acton 55, 80133, Naples, Italy

NIOZ Royal Netherlands Institute for Sea Research, 1790AB Den Burg, The Netherlands

Contribution: Conceptualization, Formal analysis, Investigation, Writing – original draft, Writing – review & editing

Angela Sardo

Stazione Zoologica Anton Dohrn Napoli (SZN), Department of Ecosustainable Marine Biotechnology, via Ammiraglio Ferdinando Acton 55, 80133, Naples, Italy

Istituto di Scienze Applicate e Sistemi Intelligenti – CNR, via Campi Flegrei 34, 80078 Pozzuoli, Naples, Italy

Contribution: Conceptualization, Writing – original draft, Writing – review & editing



    Intracellular ligands that bind heavy metals (HMs) and thereby minimize their detrimental effects on cellular metabolism are attracting great interest for a number of applications including bioremediation and development of HM-biosensors. Metallothioneins (MTs) are short, cysteine-rich, genetically encoded proteins involved in intracellular metal-binding and play a key role in detoxification of HMs. We searched approximately 700 genomes and transcriptomes of non-ciliate protists for novel putative MTs by similarity and structural analyses and found 21 unique proteins playing a potential role as MTs. Most putative MTs derive from heterokonts and dinoflagellates and share common features such as (i) a putative metal-binding domain close to the N-terminus, (ii) two putative MT-specific domains near the C-terminus and (iii) one to three CTCGXXCXCGXXCXCXXC patterns. Although the biological function of these proteins has not been experimentally proven, knowledge of their genetic sequences adds useful information on proteins that are potentially involved in HM-binding and can contribute to the design of future biomolecular assays on HM–microbe interactions and MT-based biosensors.

    1. Introduction

    Microorganisms inhabiting heavy metal (HM)-contaminated environments, eventually incorporating contaminants within the cell, are biotechnologically interesting because of their potential use for bioremediation [1]. Passive adsorption of cations onto cell walls and transport across cell membranes are the two major mechanisms of HM uptake by living cells [2,3]. Subsequently, intracellular polypeptides such as enzymatically produced phytochelatins and genetically encoded metallothioneins (MTs) limit the detrimental effect of HMs by complexing and transporting them towards vacuoles, chloroplasts or mitochondria [4,5].

    MTs are low-molecular weight proteins exhibiting a low content of aromatic amino acids and high proportions of cysteine residues (10% or more); they have been characterized in great detail in multicellular organisms [6,7] as well as in bacteria [8], yeasts and ciliates [9], and are currently classified in 15 families that are not phylogenetically related but are likely to result from convergent evolution [10]. Ciliate MTs are generally longer than average and, along with MTs from metazoans and fungi, contain greater proportions of cysteine than MTs from plants and bacteria [11]. In addition to classified proteins, MTs isolated and characterized experimentally from the brown macroalga Fucus vesiculosus [12], the excavate Trichomonas vaginalis [10] and different fungi and metazoans [13], as well as HM-contaminated soils [14], could not be classified and were suggested to make up novel MT families [13].

    The broad genetic diversity spanning living organisms [15,16] and the scarcity of known MTs in microbial eukaryotes other than fungi and ciliates [17] suggest that the real diversity of MTs, as well as the number of distinct families, is likely to be broader than is currently known. For example, fewer than 1% of proteins annotated as MTs on GenBank belong to protists; these are mostly associated with parasitic genera (Babesia, Entamoeba, Plasmodium and Trichomonas), and other microorganisms, including microalgae, are highly underrepresented [17]. MTs from both eukaryotic and prokaryotic microbes have recently been reviewed by Gutiérrez et al. [18]: while MTs from ciliates and fungi have been characterized in detail and classified in different families, little is known about MTs from non-ciliate protists. Although some common features, such as a prevalence of CXC motifs, were observed, MTs from non-ciliate protists do not share a common evolutionary origin and are likely to result from the convergent evolution of different genes [18]. Overall, very little is known to date about proteins from microalgae and, in general, from protists other than ciliates. Here we predicted, through a bioinformatic approach, novel potential MTs from eukaryotic microbial genomes and transcriptomes.

    2. Material and methods

    We searched 44 genomes [19] and 636 transcriptomes [20] for novel MTs of non-ciliate protists. The amino acid sequences of the proteins predicted from the genomes were downloaded from GenBank (electronic supplementary material, table S1), whereas a re-assembled version of the proteins predicted from the marine microbial eukaryote transcriptome sequencing project (MMETSP) database (electronic supplementary material, table S2) was downloaded from iMicrobe [21,22]. We carried out structural analyses of the proteins predicted from these databases using InterProScan [23] with default parameters; proteins found to possess regions identified as MT-domains with a score (e-value) of less than 5 × 10⁻⁵ were retained for downstream analyses (electronic supplementary material, table S3). GPS-Prot software [24] was used to plot the position of the different domains within each protein. The retained proteins were aligned using MAFFT-linsi [25], and the alignment revealed one to three highly conserved CTCGXXCXCGXXCXCXXC patterns in most proteins. We then searched the same databases for other proteins possessing the CTCGXXCXCGXXCXCXXC pattern and added the results to the previous alignments (electronic supplementary material, figure S1). A sequence logo of this pattern was generated using WebLogo [26].
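    The pattern search described above can be sketched as a regular-expression scan over predicted protein sequences. This is a minimal illustration, not the authors' pipeline: it assumes that X stands for any residue, and the function names and toy sequences are hypothetical.

```python
import re

# 18-residue pattern from the text; X (any residue) becomes "." in regex syntax.
MOTIF = "CTCGXXCXCGXXCXCXXC"
# A lookahead is used so that overlapping occurrences would also be reported.
MOTIF_RE = re.compile("(?=(" + MOTIF.replace("X", ".") + "))")


def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def find_motif(sequence):
    """Return the 1-based start position of every match of the pattern."""
    return [match.start() + 1 for match in MOTIF_RE.finditer(sequence)]


# Hypothetical toy sequences, not taken from the study.
toy_fasta = """>hit_example
MKACTCGAACACGAACACAACKK
>no_hit_example
MKACTCGAACACAAACACAACKK
"""

hits = {name: find_motif(seq) for name, seq in parse_fasta(toy_fasta).items()}
print(hits)
```

    Proteins with a non-empty list of positions would be the candidates to add to the alignment; the number of positions gives the number of copies of the pattern per protein.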

    3. Results and discussion

    Functional analyses of genomes and transcriptomes sequenced from non-ciliate protists yielded 10 unique proteins possessing putative MT-specific domains (table 1; electronic supplementary material, table S4). The AlanMT protein (Armaparvus languidus, amoebozoan and excavate) possesses a region sharing similarities with a domain present in yeast MTs (IPR035715), whereas all the other proteins found here contain two adjacent regions sharing similarities with known MT domains from molluscs (IPR001008). Most (8 out of 10) proteins also contain a putative HM-associated domain (HMA, IPR006121) located close to the N-terminus (figure 1) and one to three conserved cysteine-rich patterns 18 AA long (CTCGXXCXCGXXCXCXXC), and were originally isolated from species affiliated to the Stramenopile–Alveolata–Rhizaria (SAR) supergroup. Twelve additional unique proteins containing the same 18 AA pattern were subsequently found in other SAR species (electronic supplementary material, table S5). Overall, structural analyses and pattern search allowed the identification of 21 unique proteins (table 1) that are likely to play a role as MTs (figure 2), 19 of which derive from SAR species and possess a highly conserved cysteine-rich pattern. Thirteen putative MTs are present in more than one transcriptome of the MMETSP database and are thus very unlikely to result from contamination or sequencing errors. Interestingly, in many cases, our putative MTs derive from transcriptomes sequenced from specimens cultured under stress conditions such as high light irradiance (greater than 300 µE m⁻² s⁻¹) or nitrogen (less than 2 µM) or phosphorus (less than 0.5 µM) limitation (table 1). Both high light irradiance and nutrient starvation can generate oxidative stress [27,28], which has been reported to induce MT biosynthesis [4,29]. Current data thus suggest that the proteins found here are more likely to be expressed when microorganisms experience oxidative stress, consistent with a potential role as MTs.
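    The check that a putative MT recurs in more than one transcriptome can be sketched as a grouping of identical sequences. This is only an illustration under the assumption that "identical" means an exact, case-insensitive match of the full amino acid sequence; the function names, transcriptome labels and sequences below are hypothetical placeholders.

```python
from collections import defaultdict


def transcriptomes_per_protein(records):
    """Map each distinct protein sequence to the set of transcriptomes containing it.

    `records` is an iterable of (transcriptome_id, protein_sequence) pairs.
    """
    seen = defaultdict(set)
    for transcriptome_id, sequence in records:
        seen[sequence.strip().upper()].add(transcriptome_id)
    return seen


def recurrent_proteins(records, min_transcriptomes=2):
    """Keep only sequences found in at least `min_transcriptomes` transcriptomes."""
    return {
        seq: sorted(ids)
        for seq, ids in transcriptomes_per_protein(records).items()
        if len(ids) >= min_transcriptomes
    }


# Hypothetical placeholder IDs and sequences, for illustration only.
toy_records = [
    ("transcriptome_A", "MKCTCGAAC"),
    ("transcriptome_B", "mkctcgaac"),
    ("transcriptome_B", "MKCTCGAAC"),
    ("transcriptome_C", "MKCQCGAAC"),
]
recurrent = recurrent_proteins(toy_records)
print(recurrent)
```

    A sequence recovered independently from several transcriptomes is less likely to be an assembly artefact, which is the reasoning applied in the paragraph above.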

    Table 1. List of protein sequences predicted from eukaryotic genomes and transcriptomes as likely to play a role as MTs, as revealed by InterProScan analyses or motif search (a).

    protein ID | species | class | supergroup | strain ID | transcriptome ID | database ID | no. identical transcripts (b) | stress condition (c) | InterProScan domain code (d)
    EinvMT | Entamoeba invadens | archamoebae | Amoebozoans | IP1 | NA | XP_004259069 | 1 | NA |
    AlanMT | Armaparvus languidus | vannellids | Amoebozoans | PRA-29 | MMETSP0420 | Tr3694 | 1 | HL | IPR035715
    CsorMT | Chlorella sorokiniana | green algae | Archaeplastida | 1602 | NA | PRW44601.1 | 1 | NA | IPR002045
    MconMT | Micractinium conductrix | green algae | Archaeplastida | SAG 241.80 | NA | PSC70917 | 1 | NA |
    TvagMT | Trichomonas vaginalis | parabasalids | Discoba | ATCC PRA-98 | NA | XP_001321197 | 1 | NA |
    CrotMT | Chrysochromulina rotalis | haptophytes | Hacrobians | UIO044 | MMETSP0287 | Tr26136 | 1 | HL | IPR001008
    CowcMT | Capsaspora owczarzaki | filozoa | Opisthokonts | ATCC 30864 | NA | XP_011270693 | 1 | NA |
    BbigMT | Babesia bigemina | apicomplexa | SAR (e) | | NA | XP_012768823 | 1 | NA |
    BlasMT | Blastocystis sp. | bigyra | SAR | ATCC 50177 | NA | OAO13187 | 1 | NA |
    EsilMT | Ectocarpus siliculosus | brown algae | SAR | | NA | CBJ32637 | | | IPR001008
    FvesMT | Fucus vesiculosus | brown algae | SAR | | NA | CAA06729 | | | IPR001008
    AglaMT1 | Asterionellopsis glacialis | diatoms | SAR | CCMP134 | MMETSP0708 | Tr19519 | 3 | N-/P- | IPR001008
    AglaMT2 | Asterionellopsis glacialis | diatoms | SAR | CCMP1581 | MMETSP1394 | Tr220 | 1 | N-/P- |
    CwaiMT | Coscinodiscus wailesii | diatoms | SAR | CCMP2513 | MMETSP1066 | Tr41518 | 1 | HL | IPR001008
    DbriMT | Ditylum brightwellii | diatoms | SAR | GSO105 | MMETSP0998 | Tr22984 | 8 | HL/No/N-/P- | IPR001008
    EspiMT | Extubocellulus spinifer | diatoms | SAR | CCMP396 | MMETSP0697 | Tr10701 | 1 | Si-/No/HL |
    MpolMT | Minutocellus polymorphus | diatoms | SAR | NH13 | MMETSP1070 | Tr24663 | 2 | No/HL |
    OaurMT | Odontella aurita | diatoms | SAR | Is-1302-5 | MMETSP0015 | Tr34634 | 2 | HL |
    PdubMT | Pseudodictyota dubia | diatoms | SAR | CCMP147 | MMETSP1175 | Tr24667 | 1 | HL | IPR001008
    SyneMT | Synedropsis sp. | diatoms | SAR | CCMP1620 | MMETSP1176 | Tr28518 | 2 | HL | IPR001008
    TpseMT | Thalassiosira pseudonana | diatoms | SAR | CCMP1335 | NA | XP_002296843 | 1 | NA |
    AcatMT | Alexandrium catenella | dinoflagellate | SAR | OF101 | MMETSP0790 | Tr99632 | 1 | No |
    AmonMT | Alexandrium monilatum | dinoflagellate | SAR | CCMP3105 | MMETSP0096 | Tr45933 | 4 | HL/P- |
    AzspMT | Azadinium spinosum | dinoflagellate | SAR | 3D9 | MMETSP1037 | Tr93697 | 2 | HL | IPR001008
    GspiMT | Gonyaulax spinifera | dinoflagellate | SAR | CCMP409 | MMETSP1439 | Tr79705 | 1 | HL |
    LpolMT | Lingulodinium polyedrum | dinoflagellate | SAR | CCMP1738 | MMETSP1032 | Tr14667 | 4 | No/HL |
    AplaMT | Aplanochytrium sp. | labyrinthulids | SAR | PBS07 | MMETSP0956 | Tr7261 | 4 | NA | IPR001008
    AstoMT | Aplanochytrium stocchinoi | labyrinthulids | SAR | GSBS06 | MMETSP1349 | Tr9377 | 4 | NA | IPR001008
    AuanMT1 | Aureococcus anophagefferens | pelagophyceae | SAR | CCMP1850 | MMETSP0917 | Tr30268 | 3 | N- |
    AuanMT2 | Aureococcus anophagefferens | pelagophyceae | SAR | CCMP1984 | NA | XP_009037419 | 1 | NA | IPR001008
    PsubMT | Pelagococcus subviridis | pelagophyceae | SAR | CCMP1429 | MMETSP0883 | Tr17315 | 3 | N- |
    PcalMT | Pelagomonas calceolata | pelagophyceae | SAR | RCC969 | MMETSP1328 | Tr480 | 4 | NA |

    (a) Known MTs identified in previous studies are in bold.

    (b) In many cases, two or more identical proteins possessing an MT-specific domain or resulting from keyword searches were found in different transcriptomes of the same strain.

    (c) Stress condition under which the strain was maintained prior to transcriptome sequencing. ‘No’ refers to transcriptomes derived from strains cultured under standard conditions. In some cases identical sequences were obtained from different transcriptomes reflecting either different stress treatments or both stress and non-stress conditions. Abbreviations: NA, not available; N-, nitrogen deprivation (<2 μM); P-, phosphorus deprivation (<0.5 μM); Si-, silica deprivation (<0.5 μM); HL, high light (>300 μE m⁻² s⁻¹).

    (d) The sequences without an InterProScan code were identified by keyword search of the conserved CTCGXXCXCGXXCXCXXC motif.


    Figure 1. Proteins from different microbial eukaryotes containing MT-specific domains as found by structural analyses using InterProScan [23]. Numbers indicate protein length and the position of the different domains. Domains specific for MTs are in black (mollusc MTs, IPR001008; crustacean MTs, IPR002045; eukaryotic MT, PF12809), whereas HM-associated domains (HMA, IPR006121) are in grey. Species names and sequence identifiers are indicated on the left of each putative MT, whereas class names are on the right.

    Figure 2. Alignment of the putative MTs from heterokonts (Labyrinthulids, Pelagophyceae and diatoms) and dinoflagellates and sequence logo of the highly conserved motif CTCGXXCXCGXXCXCXXC. Underlined sequence IDs correspond to putative MTs found in the present study, whereas IDs that are not underlined correspond to related proteins whose MT activity has been previously proven or predicted. Numbers reflect the amino acid position with respect to the longest protein found here (SyneMT from Synedropsis sp. CCMP1620). Cysteine residues are highlighted in black while histidine residues, which might also be involved in HM binding, are in grey. Only the regions corresponding to the HM-associated domains (HMA, IPR006121, positions 197 to 242) and those exhibiting the cysteine-rich motif CTCGXXCXCGXXCXCXXC are shown for clarity, whereas the full alignment is shown in electronic supplementary material, figure S1. The species, strain and treatment associated with each protein abbreviated here are reported in table 1. The sequence logo was created using WebLogo [26].

    Little is known about metal-binding mechanisms in microalgal MTs. MTs are generally known to have affinity for monovalent and divalent ions, with each cation coordinated by three to four cysteine residues, and each residue coordinating one or two cations [7,30,31]. The number of monovalent or divalent metal cations that can be coordinated by the putative MTs found here cannot be predicted in silico and needs to be evaluated experimentally. It has been suggested that an MT is able to chelate a number of monovalent cations slightly higher than half the number of its cysteine residues and a number of divalent cations lower than 50% of its cysteine residues [7,30,31]. On this basis, short putative MTs such as AlanMT, CrotMT or EspiMT could coordinate around 5–10 cations, whereas the longest proteins found here such as AplaMT (258 AA), AstoMT (264) and SyneMT (255) could coordinate up to 30 cations.
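    The rule of thumb quoted above can be turned into a rough arithmetic sketch. It assumes that the cysteine count can be approximated from the protein length and the rounded cysteine percentage reported in table 2, and it only computes the "half of the cysteine residues" reference value around which the suggested monovalent (slightly above) and divalent (below) capacities lie. The function names are hypothetical, and the result is not a prediction of actual binding stoichiometry, which has to be measured experimentally.

```python
def cysteine_count(length_aa, cysteine_percent):
    """Approximate number of cysteines from length and a rounded percentage."""
    return round(length_aa * cysteine_percent / 100)


def half_cysteine_reference(length_aa, cysteine_percent):
    """Half the approximate cysteine count.

    Per the rule of thumb in the text, monovalent capacity would lie slightly
    above this value and divalent capacity below it.
    """
    return cysteine_count(length_aa, cysteine_percent) / 2


# Length (AA) and cysteine content (%) as listed in table 2.
examples = {"AlanMT": (116, 8), "OaurMT": (320, 17)}
reference = {
    name: half_cysteine_reference(*values) for name, values in examples.items()
}
print(reference)
```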

    Current results strongly suggest that at least the proteins that possess an HMA domain along with two adjacent MT domains (figure 1) that were found here from SAR representatives are likely to play a role as MTs. HMA domains have previously been found in proteins involved in HM transport and detoxification in mammals [32,33], and two adjacent MT-domains typically occur in known MTs from plants [7], mammals [34] and ciliates [35]. Proteins found here from SAR representatives are longer than most known MTs (table 1), ranging from 189 (DbriMT) to 320 AA (OaurMT). The presence of multiple, conserved cysteine-rich patterns (figure 2), and the fact that such proteins are longer than average, suggest that putative SAR MTs might have resulted from gene duplication of shorter MTs, similarly to what has been hypothesized for very long MTs in fungi [36], molluscs [37] and T. vaginalis [10].

    The cysteine content found in our putative MTs is lower than that of most known MTs, ranging from 8% (AlanMT) to 19% (CrotMT and CwaiMT), and was highly variable even within SAR-derived proteins (table 2). Histidine content is very low (less than 2%) in all proteins except for CrotMT (3.6%) and SyneMT (3.9%); aromatic amino acids account for less than 5% in most proteins, whereas lysine contribution ranges from 0.9% (CrotMT) to 10% (AlanMT). Overall, putative SAR MTs found here, along with the known MT AuanMT2, exhibit a similar domain distribution (figure 1), contain cysteine residues mostly clustered in CXC motifs and share one to three conserved 18 AA patterns (figure 2). Gutiérrez et al. [18] observed a predominance of CXC motifs, especially CKC, in MTs from non-ciliate protists. However, while some known MTs like BlasMT, CowcMT and TvagMT are indeed rich (more than 8) in CKC motifs, this does not seem to be a common feature among the putative MTs found here in non-ciliate protists. For example, AuanMTs and MconMT do not contain such motifs, whereas only one CKC motif occurs in BbigMT, CsorMT and TpseMT (table 2). Similarly, among our putative MTs, a CKC motif occurs five times in OaurMT, but it is repeated three times or less in the other proteins. In general, CTC and CQC motifs are more common than CKC motifs in our putative SAR MTs (table 2). Current data indicate that both proteins with an experimentally proven HM-binding activity and putative MTs found here via bioinformatic analyses exhibit a highly variable content of CKC, CTC and CQC motifs.
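    The composition and motif statistics discussed above can be computed along the following lines. This is a sketch under stated assumptions rather than the authors' procedure: aromatic residues are taken to be F, W and Y, overlapping CXC motifs are counted separately (so CKCKC contains two), and the function names and toy sequence are hypothetical.

```python
import re

# Assumption: phenylalanine, tryptophan and tyrosine; histidine is reported separately.
AROMATIC = "FWY"


def percent(sequence, residues):
    """Percentage of positions in `sequence` occupied by any of `residues`."""
    return 100 * sum(sequence.count(r) for r in residues) / len(sequence)


def count_cxc(sequence, middle="."):
    """Count C-x-C motifs; `middle` restricts the central residue (e.g. "K").

    A zero-width lookahead lets overlapping motifs be counted individually.
    """
    return len(re.findall(f"(?=C{middle}C)", sequence))


# Hypothetical toy sequence, not taken from the study.
toy = "MCKCKCTCQCH"

summary = {
    "cysteine_pct": round(percent(toy, "C"), 1),
    "histidine_pct": round(percent(toy, "H"), 1),
    "aromatic_pct": round(percent(toy, AROMATIC), 1),
    "CXC": count_cxc(toy),
    "CKC": count_cxc(toy, "K"),
    "CTC": count_cxc(toy, "T"),
    "CQC": count_cxc(toy, "Q"),
}
print(summary)
```

    Whether overlapping motifs should be counted once or twice is a design choice; the non-overlapping alternative would use the plain pattern without the lookahead.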

    Table 2. Main features and proportions of amino acids potentially involved in metal chelation, for the putative MTs found in the present study.

    species protein ID length amino acid cysteinea (%) residues histidine (%) aromatic AA (%) CXC specific motifs
    18 AAb
    Alexandrium catenella AcatMT 189 9 1.6 3.2 6 2 1 1
    Alexandrium monilatum AmonMT 196 10 1.0 5.1 6 3 1 1
    Aplanochytrium sp. AplaMT 258 15 0.4 1.9 12 0 5 3
    Aplanochytrium stocchinoi AstoMT 264 14 1.1 1.5 12 2 6 3
    Armaparvus languidus AlanMT 116 8 1.7 7.7 3 0
    Asterionellopsis glacialis AglaMT1 204 13 1.5 2.5 9 1 3 2
    Asterionellopsis glacialis AglaMT2 196 14 1.0 2.6 9 1 4 1
    Aureococcus anophagefferens AuanMT1 232 15 0.0 1.3 11 1 4 1
    Azadinium spinosum AzspMT 208 11 0.5 4.8 6 3 1 1
    Chrysochromulina rotalis CrotMT 112 19 3.6 6.3 6 1
    Coscinodiscus wailesii CwaiMT 312 19 0.0 0.0 18 0 7 4
    Ditylum brightwellii DbriMT 193 11 1.0 1.6 6 0 3 1
    Extubocellulus spinifer EspiMT 129 12 0.8 1.6 4 0 2 1
    Gonyaulax spinifera GspiMT 164 10 0.6 3.7 6 3 1
    Lingulodinium polyedrum LpolMT 196 10 1.0 3.6 6 1 1 1
    Minutocellus polymorphus MpolMT 203 11 1.0 1.5 6 0 3 1
    Odontella aurita OaurMT 320 17 0.0 1.3 17 5 6 3
    Pelagococcus subviridis PsubMT 207 11 0.5 2.4 6 3 2 1
    Pelagomonas calceolata PcalMT 160 11 0.6 2.5 6 0 1 2
    Pseudodictyota dubia PdubMT 260 19 0.0 0.8 15 3 5 3
    Synedropsis sp. SyneMT 255 11 3.9 4.7 9 1 2 1
    Aureococcus anophagefferens AuanMT2 171 18 0.0 1.2 12 0 3 1
    Babesia bigemina BbigMT 214 12 1.9 7.9 2 1
    Blastocystis sp. BlasMT 207 40 0.0 0.0 33 29
    Capsaspora owczarzaki CowcMT 176 27 0.0 0.0 15 11 1
    Chlorella sorokiniana CsorMT 56 32 0.0 0.0 6 1 1
    Entamoeba invadens EinvMT 103 35 0.0 1.9 13 4
    Micractinium conductrix MconMT 59 30 0.0 0.0 6 0 3
    Thalassiosira pseudonana TpseMT 141 13 1.4 5.7 6 1
    Trichomonas vaginalis TvagMT 308 30 6.2 2.3 41 9

    aValues refer to the numbers of amino acids in the sequence.

    bSpecific, 18 amino acid motif (CTCGXXCXCGXXCXCXXC) identified in the putative MTs found in the present study.

    The possible role of our SAR proteins as MTs is further suggested by the presence, in known MTs from some metazoans, amoebozoans, fungi and higher plants, of a region slightly different from our 18 AA pattern. In this region, the threonine residue at the second position is replaced by other polar or positively charged amino acids (electronic supplementary material, figure S2). Apart from this difference, putative SAR MTs share the same number and position of cysteine residues with metal-binding domains in Type 1 MTs from plants [7], copper and cadmium MTs in snails [38], and silver MTs in fungi [39].

    In spite of the similarities found, even putative SAR MTs, possessing the shared 18 AA pattern, exhibit great differences among each other, and we could not construct a meaningful (i.e. bootstrap support greater than 30%, using neighbour joining or maximum-likelihood algorithms) phylogenetic tree from the alignment of such sequences. This variability is likely to reflect the broad genetic diversity of non-ciliate protists and suggests that, although SAR species share a common evolutionary origin [16], their MTs are likely to result from convergent evolution of different genes, in spite of the shared 18 AA pattern.

    Although the putative SAR MTs found here possess two regions related to metal-binding domains of mollusc MTs (figure 1) and a conserved 18 AA cysteine-rich region (figure 2) that can be found, in part, in MTs from different organisms (electronic supplementary material, figure S2), none of them possesses the motifs previously described for the 15 MT families [10,11], and they thus do not belong to any family described to date. Although ciliates are part of the SAR supergroup, MTs from ciliates (Family 7) are shorter, contain greater proportions of cysteine and differ in their amino acid sequence from the putative SAR MTs found here [9]. In addition, except for AuanMT2, known unclassified MTs from SAR species (BbigMT, BlasMT, EsilMT, FvesMT and TpseMT) do not possess the conserved 18 AA pattern observed here (figure 2), suggesting great differences even within SAR MTs.

    MTs can contribute to the development of more efficient HM-sensors. Whole-cell MT-based biosensors have been developed in different microbes [40–42] and ciliates are currently considered the most suitable candidates because they lack a cell wall [43,44]. However, testing the potential of MTs from other microbes for the development of whole-cell biosensors might yield more efficient candidates. Microalgae can be cultured autotrophically in simple seawater or freshwater enriched with basic nutrients, and several green algae, diatoms, dinoflagellates and Eustigmatophyceae are commonly used for genetic editing. In particular, lightly silicified diatoms, unarmoured dinoflagellates and Chlorella spp. are known for their weak cell walls [45], and cell wall-free mutants of Chlamydomonas spp. are currently available. Diatoms and dinoflagellates typically dominate shallow benthic communities [46], including HM-contaminated sediments [47], and might thus be suitable for the development of MT-based sensors.

    Bioinformatic mining of eukaryotic genomes and transcriptomes thus led to the prediction of 21 putative MTs, 19 of which derive from SAR representatives and share an 18 amino acid-long cysteine-rich motif. The biological function of these proteins remains to be proven experimentally. Such evidence is needed for a complete structural and functional in vivo characterization, for the quantification of MT expression in polluted environments and in laboratory microcosms by real-time PCR and, finally, for the development of MT-based biosensors. Furthermore, physiological assays of species tolerance to HMs can be combined with gene expression determination to improve our understanding of microbe–HM interactions.

    Data accessibility

    All of the GenBank and MMETSP IDs for the sequences used in this study are included in the electronic supplementary material, tables [48]. The electronic supplementary material also includes the protein sequences used in this study and the same sequences aligned to the identified conserved patterns. Both files are available as fasta files.

    Authors' contributions

    S.B.: conceptualization, formal analysis, investigation, writing—original draft and writing—review and editing; A.S.: conceptualization, writing—original draft and writing—review and editing.

    Both authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Competing interests

    We declare we have no competing interests.


    We received no funding for this study.


    The authors are grateful to M. Miralto and L. Ambrosino (RIMAR, SZN) for their support in bioinformatic data processing, and to G. Lanzotti (RIMAR, SZN) for graphical assistance. Analyses were performed by using the SZN bioinformatics server Falkor available at SZN. The authors received no financial support for the research and authorship of this article.


    Electronic supplementary material is available online at

    Published by the Royal Society. All rights reserved.