Characteristics of 29 novel atypical solute carriers of major facilitator superfamily type: evolutionary conservation, predicted structure and neuronal co-expression

Solute carriers (SLCs) are vital as they are responsible for a major part of the molecular transport over lipid bilayers. At present, there are 430 identified SLCs, of which 28 are called atypical SLCs of major facilitator superfamily (MFS) type. These are MFSD1, 2A, 2B, 3, 4A, 4B, 5, 6, 6L, 7, 8, 9, 10, 11, 12, 13A, 14A and 14B; SV2A, SV2B and SV2C; SVOP and SVOPL; SPNS1, SPNS2 and SPNS3; and UNC93A and UNC93B1. We studied their fundamental properties, and we also included CLN3, an atypical SLC not yet belonging to any protein family (Pfam) clan, because of its involvement in the same neuronal degenerative disorders as MFSD8. With phylogenetic analyses and bioinformatic sequence comparisons, the proteins were divided into 15 families, denoted atypical MFS transporter families (AMTF1-15). Hidden Markov models were used to identify orthologues from human to Drosophila melanogaster and Caenorhabditis elegans. Topology predictions revealed 12 transmembrane segments (for all except CLN3), corresponding to the common MFS structure. With single-cell RNA sequencing and in situ proximity ligation assay on brain cells, co-expressions of several atypical SLCs were identified. Finally, the transcription levels of all genes were analysed in the hypothalamic N25/2 cell line after complete amino acid starvation, showing altered expression levels for several atypical SLCs.


Introduction
It is essential that transport of nutrients, waste and drugs over lipid bilayers is executed accurately to maintain homeostasis within the body, and disturbances in the transport systems are associated with Mendelian diseases [1,2]. Most transport is carried out by three major types of transporters [3]: channels, primary active transporters and secondary active transporters. With their 430 members [4], the secondary active transporters, commonly called the solute carriers (SLCs), constitute the largest group of membrane-bound transporters in humans [5]. The SLCs are currently divided into 52 families [6]. SLCs use energy from coupled ions or facilitative diffusion to move substrates via coupled transport, exchange or uniport [7]. SLC transporters are crucial throughout the body, and their importance is particularly prominent in the brain, where they, for example, gate nutrients over the blood-brain barrier [8], terminate neuronal transmission by clearing neurotransmitters from the synaptic cleft [9,10], refill vesicles [11] and maintain the glutamine-glutamate cycle [12]. These mechanisms are exploited in pharmacology, where transporters are used either as direct drug targets [2,10] or indirectly as facilitators of drug distribution to specific tissues [13].
Most SLC proteins can be divided into Pfam clans based on sequence similarity [4,14], where the major facilitator superfamily (MFS; Pfam clan id: CL0015), amino acid/polyamine/organocation (APC; CL0062), cation:proton antiporter/anion transporter (CPA/AT; CL0064) and drug/metabolite transporter superfamily (DMT; CL0184) clans include more than one SLC family [4,14,15]. Approximately one-third of all SLCs belong to the MFS clan [4], making it the largest group of phylogenetically related SLCs. MFS is a large and diverse family of proteins [16], which evolved from a common ancestor [17]. This ancient family has members in several organisms, including bacteria, yeast, insects and mammals [16-20]. As MFS proteins are closely related, they usually share protein topology. MFS proteins are single polypeptides [16], usually composed of 400-600 amino acids [21]. They probably arose by duplication of a six transmembrane segment (TMS) unit, providing the N and C domains, which are connected by a long cytoplasmic loop between TMS 6 and 7 [21], resulting in a 12 TMS protein [17]. It is suggested that transporters containing the MFS fold move substrates via the rocker-switch mechanism [22] or through the updated clamp-and-switch model [23].
Among the 430 human SLCs, 30 proteins are called atypical SLCs as they are evolutionarily connected to SLCs [4], but are yet to be classified into any existing SLC family. Twenty-eight of the atypical SLCs belong to the MFS Pfam clan [4] and are discussed in this article, together with the non-MFS Pfam clan protein ceroid lipofuscinosis, neuronal 3 (CLN3). According to the transporter classification database [24], CLN3 belongs to the equilibrative nucleoside transporter family, which is a subfamily of the larger MFS superfamily. Additional atypical SLCs are TMEM104, which belongs to the APC clan, and OCA2, which clusters with the IT clan [4]. The atypical SLCs of MFS type are the major facilitator superfamily domain containing (MFSD) proteins, MFSD1, 2A, 2B, 3, 4A, 4B, 5, 6, 6L, 8, 9, 10, 11, 12, 13A, 14A and 14B; the synaptic vesicle glycoprotein 2 (SV2) proteins, SV2A, SV2B and SV2C; the SV2-related proteins SVOP and SVOPL; three sphingolipid transporters, SPNS1, SPNS2 and SPNS3; and two unc-93 proteins, UNC93A and UNC93B1 [4]. These proteins were identified as possible SLCs by searching the human proteome using hidden Markov models (HMM) composed of known SLC sequences originating from the MFS Pfam clan [4]. MFSD7 was also included in the analysis and considered as an atypical SLC, because of its status as an orphan protein. However, MFSD7 is already classified into the SLC49 family [25]. Knowledge about atypical SLCs is limited, which is why we aim to present a cohesive study of the basic characteristics of 29 atypical SLCs belonging to the MFS clan. They cluster phylogenetically with SLC families from the MFS Pfam clan, SLC2, 15, 16, 17, 18, 19, SLCO (SLC21), 22, 29, 33, 37, 40, 43, 45, 46 and 49 [4], suggesting that they have transporter properties and are involved in homeostatic maintenance. Since the atypical SLCs are MFS proteins, it is likely that they all consist of the common 12 TMS polypeptide [17], which has been predicted for some (e.g. MFSD1 [19], MFSD2A [26], MFSD8 [27,28], SVOP [29] and UNC93B1 [30]), while CLN3 only has six predicted TMSs [31,32].
Several atypical SLCs are expressed in the brain, where they are found in neurons [19,20,33,34] and the CNS vasculature system [35]. Concerning their subcellular expression, atypical SLCs are expressed both in the plasma membrane [19,36] and intracellular membranes [27,33,37-40] (localizations summarized in table 1). There are also contradictory reports, suggesting that the same protein is located in several subcellular locations; MFSD1 is found in embryonic mouse neuronal plasma membranes [19] and lysosomal membranes in HeLa and rat liver cells [39,41], which could be explained by translocation of the transporters in the cell, serving multiple functions under different conditions or states of the cell. SV2 proteins are identified both at synaptic vesicles [49] and the plasma membrane, possibly because the synaptic vesicles fuse with the plasmalemma during neurotransmitter release. CLN3 is expressed at the plasma membrane as well as on endosome/lysosome membranes [34], where it is involved in neuronal ceroid lipofuscinosis, which leads to neurodegenerative disorders resulting from the accumulation of lipofuscin [57]. This is of interest because MFSD8 (known as CLN7) is also involved in this pathology [58].
Several atypical SLCs are affected by food intake and nutritional status, where both high-fat diet and food deprivation alter their expression levels in rodents [19,20,33,37,59]. Furthermore, the expression of Mfsd11 is altered in immortalized mouse hypothalamic N25/2 cells exposed to complete amino acid starvation [60]. This suggests that the atypical SLCs are involved in maintaining the nutritional status both in vivo and in vitro, which reinforces the importance of understanding their fundamental properties.
Here, we phylogenetically studied interrelations between the atypical SLCs of MFS type and similarities between the protein sequences. Furthermore, we investigated if the atypical SLCs met the requirements to belong in any of the existing 52 SLC families. SLC families are divided on the basis of homology or phenotype [61], and a protein must share at least 20% sequence identity with another family member [62] to be placed in that family. HMMs were built to search proteomes from several organisms to identify related proteins, showing their evolutionary development. Furthermore, topology predictions were made for the human protein sequences, suggesting 12 TMS for all investigated atypical SLCs, except for CLN3 with its 11 predicted TMS. With single-cell RNA sequencing data retrieved from 10X Genomics (www.10xgenomics.com/), we examined which atypical SLCs were expressed in the same cell in brain tissue from an embryonic day 18 mouse. We supplemented these results at protein level using in situ proximity ligation assay [63,64], where interactions between proteins were quantified in mouse brain sections. Finally, using microarray data [60], we analysed if and how the atypical SLCs were affected by complete amino acid deprivation in N25/2 cells.
An additional tree was built, including all known SLC and atypical SLC sequences originating from the MFS Pfam  [61] and sequence identities [62]. As the atypical SLCs group among SLC families [4], it is possible that they belong to already annotated SLC families or to new families. To study this further, sequence identities were analysed using global pairwise sequence alignment based on the Needleman-Wunsch algorithm [70]. The similarities between human atypical SLCs were analysed, followed by comparison with all SLC members of MFS type (SLC family 2, 15, 16, 17, 18, 19, SLCO, 22, 29, 33, 37, 40, 43, 45, 46 and 49) (matrices in electronic supplementary material, table S1). To group the atypical proteins into families, the following parameters were considered: (i) 20% identity to other atypical SLCs, (ii) phylogenetic clustering among the atypical SLCs, (iii) phylogenetic clustering among SLCs and (iv) 20% identity to at least one other SLC family member. Families including atypical SLCs were called atypical MFS transporter families (AMTF).
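The identity criterion can be illustrated with a minimal sketch of Needleman-Wunsch global alignment followed by a percent-identity calculation. The scoring values (match 1, mismatch -1, linear gap -2), the choice of alignment length as the denominator and the function names are assumptions made for illustration only; the text does not state which scoring matrix, gap penalties or identity definition were used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two sequences; return both gapped strings.

    Scoring values are illustrative assumptions, not the published settings.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of the alignment length."""
    al_a, al_b = needleman_wunsch(a, b)
    if not al_a:
        return 0.0
    matches = sum(x == y and x != "-" for x, y in zip(al_a, al_b))
    return 100.0 * matches / len(al_a)


def meets_identity_criterion(a, b, threshold=20.0):
    """Apply the 20% identity cut-off described in the text."""
    return percent_identity(a, b) >= threshold
```

For real protein comparisons a substitution matrix and affine gap penalties would normally replace the simple scores used here, and the resulting identities depend on those choices.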

Hidden Markov models to identify related proteins
Hidden Markov models (HMM) were built for all 29 atypical SLCs by running mammalian sequences through HMMBUILD from the HMMER package [71]. The models were used to search the protein datasets (obtained from ENSEMBL version 86 [72]) listed in table 2, to identify related proteins in yeast, roundworm, fruit fly, zebrafish, chicken, mouse and human. Sequences were manually curated, and pseudogenes and duplicate proteins originating from the same locus were removed. Genes not in closest phylogenetic proximity to the human version were also removed, as they either lacked specific orthologues in mammals or clustered phylogenetically with other proteins. Predicted full-length proteins were kept as reliable related hits. As the atypical SLCs are relatively similar in amino acid sequence, some proteins were identified by several HMMs. Phylogenetic analyses were therefore performed, using RAxML, as described above, to determine which were orthologues and which were other related proteins. All identified proteins were annotated and listed with accession number in electronic supplementary material, table S2. Note that some proteins were given names with Like (L) as a suffix; these were related proteins identified by the HMM that did not belong to the human protein cluster. It is possible that these are orthologues to proteins not studied here, or that they lack equivalents in humans.

Structural predictions to study possible transporter properties
For an MFS protein to have optimal transporter properties, 12 transmembrane segments (TMS) are required [17]. To investigate if the proteins of interest possessed the common MFS structures, topology predictions were done using the constrained consensus TOPology prediction server (CCTOP) [73,74]. CCTOP combines the results from 10 known online topology tools to incorporate parameters like hydrophobicity, charge bias, helix lengths and signal peptides in the predictions [75,76], and further combines the result with structural information from existing experimental and computational sources [73]. Three of the proteins were not predicted to contain 12 TMS, MFSD13A, SPNS3 and CLN3, and homology models were built to verify these three predictions. The tertiary structures were built using SWISS MODEL, a fully automated homology program [77], where structurally known MFS transporters were used as templates. MFSD13A was aligned against the bacterial sodium symporter, MelB [78], providing a global model quality estimation (GMQE) of 0.47. GMQE indicates the reliability of models on a scale from 0 to 1, where 1 represents total reliability. For the SPNS3 model, the proton-driven YajR transporter from E. coli was used as template [79], with a GMQE of 0.45. For CLN3, a peptide MFS transporter from bacteria [80] was used as template, providing a score of 0.44. Homology models were adjusted in the open-source Java viewer JMOL [81] (http://www.jmol.org/). Finally, the amino acids in each TMS from the homology models were manually identified and compared with the ones predicted by CCTOP.

RNA analysis from single brain cells, to identify co-expression between atypical SLCs
The complete dataset (9 k brain cells from an E18 Mouse) for single-cell RNA sequencing from E18 mouse brain was downloaded from 10X Genomics (www.10xgenomics.com) under a Creative Commons license. The data were analysed to investigate co-expression of atypical SLCs of MFS type in single brain cells. Of note, 10 289 cells were collected from cortex, hippocampus and subventricular zone of an E18 mouse, and sequenced on Illumina Hiseq4000 with approximately 42 000 reads per cell (10X Genomics). A digital expression matrix was constructed from these data to extract information on the atypical SLCs, and cells with fewer than three identified transcripts were removed. Then, cells expressing fewer than two different atypical SLC transcripts were removed. This resulted in 9693 cells co-expressing 21 atypical SLCs. To assess the significance of these observations, we used a bootstrapping approach, implemented in a custom-written Java program. Briefly, as our null hypothesis, we assumed that there was no co-expression in the data beyond what is expected by chance. We created a dataset with the same frequency of each of the transcripts as observed in our actual data and randomly assigned these transcripts to 9693 cells. This process was repeated 1000 times, and the mean number of transcripts and the population standard deviation of the number of transcripts for each cell were calculated. We considered any value more than one standard deviation above or below the mean of the bootstrapped data as significantly different from chance.
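The randomization procedure can be sketched as follows. This is an illustrative Python interpretation, not the custom Java program: it treats each gene as a set of expressing cells, draws the same number of cells at random for each gene of a pair, and compares the observed number of co-expressing cells with the mean and population standard deviation of the randomized counts. The pairwise formulation, the function names and the toy numbers are assumptions.

```python
import random
import statistics


def null_cooccurrence(n_cells, n_expr_a, n_expr_b, n_rounds=1000, seed=0):
    """Co-occurrence counts expected by chance for one gene pair.

    Each round assigns gene A to n_expr_a random cells and gene B to
    n_expr_b random cells (same frequencies as observed), then counts
    the cells that received both transcripts.
    """
    rng = random.Random(seed)
    cells = range(n_cells)
    counts = []
    for _ in range(n_rounds):
        a = set(rng.sample(cells, n_expr_a))
        b = set(rng.sample(cells, n_expr_b))
        counts.append(len(a & b))
    return counts


def differs_from_chance(observed, null_counts, n_sd=1.0):
    """Flag values more than n_sd population SDs from the null mean."""
    mean = statistics.mean(null_counts)
    sd = statistics.pstdev(null_counts)
    return abs(observed - mean) > n_sd * sd, mean, sd


# Toy example (hypothetical data): two genes detected in the same 40 of 100 cells.
cells_a = set(range(40))
cells_b = set(range(40))
observed = len(cells_a & cells_b)
null = null_cooccurrence(100, len(cells_a), len(cells_b), n_rounds=1000, seed=1)
flagged, null_mean, null_sd = differs_from_chance(observed, null)
```

A one-standard-deviation band is a lenient cut-off; the `n_sd` parameter is exposed so that a stricter threshold can be explored.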

In situ proximity ligation assay, sample preparation, execution and analysis
To complement the co-expression, in situ proximity ligation assay (PLA) was performed. Intra-peritoneal injections of sodium pentobarbital (Apoteket Farmaci, Sweden) (10 mg kg⁻¹) were used to anesthetize adult C57BL6/J mice, followed by trans-cardiac perfusion using 4% formaldehyde (Histolab) and then paraffin embedding, as described in [20]. The brains were cut in 7 µm sections using a Microm 355S STS cool cut microtome and mounted on Superfrost Plus slides (Menzel-Gläser). Each slide was dried overnight at 37°C before being stored at 4°C. Sections were deparaffinized by 10 min washes in X-TRA solve (Medite, Dalab), followed by an ethanol (Solveco) rehydration series ranging from 100% to water. Antigen retrieval was performed in boiling 0.01 M citric acid (Sigma-Aldrich) at pH 6.0, for 10 min, after which the slides were cooled, washed in PBS, and placed in a humidity chamber throughout the experiment to avoid drying out during incubations at 37°C. Brain sections were blocked for 1 h at 37°C in blocking solution, provided by Duolink II fluorescence kit (orange detection reagents; Olink Biosciences), followed by primary antibody incubation at 4°C overnight (see table 3 for antibody combinations and concentrations).
Table 3. Antibody combinations and concentrations used for the in situ proximity ligation assay.
Micrographs were taken using a Zeiss Axioplan 2 epifluorescent microscope, and 11 Z-stacks from various brain areas, like cortex and striatum, were acquired for each antibody-pair combination. Filters suitable for the fluorophores used and a filter to detect autofluorescence were used. The Z-stacked images were transformed using the maximum intensity projection function in IMAGEJ v. 1.48 [82], to merge the signals into a single-plane image. CELLPROFILER v. 2.2.0 [83,84] was then used to analyse the signals.
The autofluorescence data were used to subtract background from the images, after which the images were cleared using a white tophat filter to remove anything over 10 pixels in diameter, leaving only the amplified signal. DAPI staining was used to define cells to enable automated counting of PLA signals within specific cells, and all signals with pixel intensity above 0.08 were automatically counted. The combined signal from all brain areas was divided by the number of cells, to get an average number of interactions per cell within the brain. A graph was plotted using GraphPad PRISM 5 software.
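The filtering and counting steps can be illustrated with a simplified one-dimensional sketch. A white top-hat is the signal minus its morphological opening, so features narrower than the structuring element are kept and broader ones are removed. The code below is not the CELLPROFILER pipeline: the 1D intensity profile, the flat 11-pixel window (chosen as the nearest odd width to the 10-pixel limit) and the function names are assumptions for illustration; only the 0.08 threshold is taken from the text.

```python
def _windowed(profile, width, reducer):
    """Apply min or max over a centred window, truncated at the borders."""
    half = width // 2
    n = len(profile)
    return [reducer(profile[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


def white_tophat(profile, width):
    """Profile minus its opening (erosion followed by dilation)."""
    eroded = _windowed(profile, width, min)
    opened = _windowed(eroded, width, max)
    return [p - o for p, o in zip(profile, opened)]


def count_signals(profile, threshold=0.08):
    """Count contiguous runs of pixels with intensity above the threshold."""
    count, inside = 0, False
    for value in profile:
        above = value > threshold
        if above and not inside:
            count += 1
        inside = above
    return count


def signals_per_cell(n_signals, n_cells):
    """Average number of PLA signals per DAPI-defined cell."""
    return n_signals / n_cells


# Hypothetical profile: a 3-pixel PLA-like spot and a 15-pixel broad object.
profile = [0.0] * 5 + [0.5] * 3 + [0.0] * 5 + [0.5] * 15 + [0.0] * 5
filtered = white_tophat(profile, 11)
n_before = count_signals(profile)    # both objects exceed the threshold
n_after = count_signals(filtered)    # only the narrow spot remains
```

In two dimensions the same idea applies with a disc-shaped structuring element, which is how image-analysis packages typically implement it.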

Analysis of gene expression after complete amino acid starvation in N25/2 mouse hypothalamic cells
It was previously shown that gene expression of Mfsd11 is altered upon complete amino acid starvation for 1, 2, 3, 5 or 16 h in immortalized N25/2 mouse hypothalamic cells [60]. Here, we reused the data from that microarray analysis (accession number GSE61402) to study if the atypical SLCs were affected by the removal of all amino acids. Data were downloaded and the probes most similar to the human proteins were included in the analysis. Note that two genes (Unc93a and Cln3) each had two probes on the GeneChip corresponding to the human protein, which is why both are presented in the heat map. The duplicated probes are splice variants that are present under different accession numbers in the database used to define the genes on the chip. GENESIS version 1.7.6 was used to generate the heat map. For 1, 2, 3 and 16 h, the difference between the log2 expression values of starved and control cells was used in the analysis. For 5 h of starvation, the log2 fold change value of expression was used. Green colour represents downregulation and red colour represents upregulation, where larger changes correspond to stronger colour intensity.
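As a small worked illustration of the values shown in the heat map, the difference between log2 expression values equals the log2 of the expression ratio (the log2 fold change). The sketch below assumes raw, positive expression intensities as input; the function names and the example numbers are hypothetical and are not taken from GSE61402.

```python
import math


def log2_difference(starved, control):
    """log2(starved) - log2(control), i.e. the log2 fold change."""
    return math.log2(starved) - math.log2(control)


def heat_map_colour(value):
    """Colour convention used in the text: green = down, red = up."""
    if value < 0:
        return "green"
    if value > 0:
        return "red"
    return "neutral"


# Hypothetical intensities: a four-fold increase and a two-fold decrease.
up = log2_difference(8.0, 2.0)      # 3.0 - 1.0 = 2.0
down = log2_difference(2.0, 4.0)    # 1.0 - 2.0 = -1.0
```

Because the scale is logarithmic, a value of 1 corresponds to a doubling and -1 to a halving of expression, so equal colour intensities represent equal fold changes in either direction.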

Interrelations between human SLCs of MFS type
The phylogenetic interrelations between atypical SLCs were inferred from the phylogenetic tree presented in figure 1, which displays the schematic branching order. Some sequences were seemingly diverged from the other proteins, like MFSD3, MFSD6, MFSD6L, MFSD7, MFSD8, MFSD12, MFSD13A and CLN3 (figure 1), while others formed potential families connected by a common node. Grouping of proteins is important as it strengthens the possibility to elucidate evolutionary conservation, mechanism and substrate specificity, because similar sequences usually share these characteristics [85]. To divide the atypical SLCs into families, members had to share phylogenetic closeness and be 20% identical to other proteins in the family. Among the atypical SLCs we identified 15 possible families that were denoted Atypical MFS Transporter Families (AMTF).
rsob.royalsocietypublishing.org Open Biol. 7: 170142
MFSD2A and MFSD2B belonged to AMTF8, while SV2A, SV2B, SV2C, SVOP and SVOPL were in AMTF9. AMTF10 included MFSD11, UNC93A and UNC93B1, and AMTF11 consisted of SPNS1, SPNS2 and SPNS3 (figure 1). To examine the plausible family members further, similarities between protein sequences were analysed. All sequence identities are listed in the matrices in supplementary table 1, where 24 of the 29 atypical SLCs had more than 20% identical amino acids to at least one other atypical SLC sequence. MFSD3, MFSD6, MFSD6L, MFSD8 and MFSD13A had less than 20% identity with any other atypical SLC protein. In predicted AMTF1 (for members, see figure 1), all four proteins shared more than 20% identity with at least one other member, as was the case for AMTF9, AMTF10 and AMTF11. In AMTF3, MFSD4A and MFSD4B shared 20% identity, and AMTF8 was constituted by MFSD2A and MFSD2B sharing 37% identity. MFSD1 and MFSD5 did not cluster in closest proximity, yet shared 20% identity, and were considered constituents of the same family.
The remaining eight atypical SLCs did not meet the clustering and/or identity criteria and were placed in individual families. Taken together, the atypical SLCs can be grouped into 15 possible AMTF (summarized in figure 1). The AMTF nomenclature was used instead of the SLC nomenclature to highlight that the functions of the atypical SLCs remain to be elucidated.
The distribution of the atypical SLCs among the SLCs of MFS type was investigated through a phylogenetic analysis. It showed that the proteins of interest were placed within the SLC tree, and not as outgroups (figure 2). This strengthens the hypothesis that they are novel transporters of SLC type. When comparing the sequence identities (MFS matrix 2 in supplementary table 1), the following atypical proteins had less than 20% identity with any other SLC: MFSD2A, MFSD4B, MFSD6, SV2A, SV2B, SV2C and UNC93B1. On the other hand, some atypical SLCs had at least 20% identity to members of several families, like MFSD1, which was more than 20% identical with SLC2A8, SLC16A10 and SLC19A2; and MFSD9 and MFSD10, having 20% or higher identity with members from seven different SLC families each. Finally, no atypical SLC shared more than 20% identity with all members in a single SLC family. Therefore, it is not possible to place the atypical SLCs into existing SLC families based only on sequence identity. However, when combining the sequence identity and phylogenetic clustering (figure 2), possible

Identification of related proteins in several species
With hidden Markov models, several protein datasets were searched to identify related proteins in various species. The atypical SLCs were identified in human and mouse (figure 3), where UNC93A had duplicated in mouse, resulting in two variants on the same chromosome. All but MFSD3, MFSD6L, SPNS1 and CLN3 were found in chicken (figure 3). Furthermore, MFSD14B was identified in both the MFSD14A and MFSD14B HMM search in the chicken proteome, but it phylogenetically clustered closer to human MFSD14A. Therefore, MFSD14B was not separately included in figure 3 or electronic supplementary material, table S2, but as one of the two proteins found for MFSD14A. All except MFSD5 were detected in zebrafish (figure 3). Eight proteins had two copies each in the zebrafish proteome. Eleven atypical SLCs had related proteins in fruit flies (figure 3), where MFSD1 had two copies, MFSD14A had four copies (equally related to MFSD14B), SV2A had 10 (equally related to SV2B and SV2C) and UNC93A had two copies (equally related to UNC93B1). In the figure, we listed the proteins under the human protein to which they were most similar, and if they were equally related to several proteins we listed them in the first possible position. Identified proteins were sometimes found by several HMMs, but they were included only once in figure 3 and electronic supplementary material, table S2. About half of the atypical SLCs were found in C. elegans, while only CLN3 was identified in yeast. Furthermore, in some proteomes, several related proteins were found that did not cluster phylogenetically with the human proteins, but were still in relative proximity. We call these 'Like' (L) proteins, and they are included in electronic supplementary material, table S2, but not in figure 3. There are, for example, 11 proteins related to MFSD8, but none in the human cluster, and they were annotated as MFSD8L1-MFSD8L11.

Atypical SLCs are predicted to have 12 TMS
We used CCTOP to predict the structural appearance of the human atypical SLCs. All but MFSD13A (9 TMS), SPNS3 (11 TMS) and CLN3 (11 TMS) were predicted to contain 12 TMS, the common number for MFS proteins [17]. Six TMS have been suggested for CLN3 [31,32,86,87], but different TMS have been identified by the different groups. The general 12 TMS structure is schematically depicted in figure 4a. MFSD6, SV2A, SV2B, SV2C and UNC93B1 were seemingly longer peptides than the regular MFS peptide (table 1), and they were all predicted to contain exceptionally long N-terminals. Furthermore, MFSD6 had a relatively long extracellular loop between TMS 3 and 4, while the SV2 proteins had a longer loop between TMS 7 and 8. To verify the irregular predictions for MFSD13A, SPNS3 and CLN3, homology models were built. Structurally known MFS proteins were used as templates. In the homology models both MFSD13A (figure 4b) and SPNS3 (figure 4c) were predicted to contain the expected 12 TMS, whereas CLN3 (figure 4d) was still composed of 11 TMS. When manually comparing the amino acids in each TMS identified by CCTOP with those in the homology models, it was revealed that MFSD13A contained several amphipathic TMS (figure 4b), which could explain why they were not identified by CCTOP. For SPNS3, all TMS overlapped, except TMS 11, which was lacking in the secondary structure prediction. As TMS 11 was amphipathic, its hydrophobic stretch could have been considered too short to be identified as a TMS by the CCTOP server. Finally, for CLN3, both models predicted the same TMS. In conclusion, we predict all studied atypical SLCs to have 12 TMS, except CLN3, which was predicted to have 11 TMS.
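Why an amphipathic helix can escape hydrophobicity-based detection can be illustrated with a simple sliding-window hydropathy scan using the Kyte-Doolittle scale. This sketch is not the CCTOP method, which combines several predictors and structural information; the 19-residue window and the 1.6 cut-off are the conventional Kyte-Doolittle values and are used here as assumptions, and the two test sequences are artificial.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    return [sum(KD[res] for res in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def has_hydrophobic_segment(seq, window=19, cutoff=1.6):
    """True if any window exceeds the cut-off for a membrane-spanning helix."""
    return any(value > cutoff for value in hydropathy_profile(seq, window))


# Artificial examples: a uniformly hydrophobic stretch and an amphipathic
# stretch in which charged residues alternate with leucines.
hydrophobic = "L" * 19
amphipathic = "LKKLLKL" * 3

detected_hydrophobic = has_hydrophobic_segment(hydrophobic)
detected_amphipathic = has_hydrophobic_segment(amphipathic)
```

In the amphipathic example the polar residues lower every window average below the cut-off, so the segment is missed by this criterion even though such a helix may still span the membrane, with its polar face lining a transport pathway.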

Several atypical SLC genes are expressed in the same cells
To study co-expression of atypical SLC genes in embryonic mouse brain cells, data from single-cell RNA sequencing were analysed. Co-expression of at least two atypical SLC transcripts was identified in 9693 of the total 10 289 cells analysed. Twenty-one of the atypical SLCs were found to be significantly co-expressed with other atypical SLCs (figure 5). Mfsd1, Mfsd4b, Mfsd5, Mfsd6l, Mfsd9, Mfsd13a, Sv2b and Svop were not detected in the analysis, probably due to the relatively shallow sequencing depth or the cut-off values used. There are three different Mfsd7 (Mfsd7a-c) genes in mice corresponding to human MFSD7, but only Mfsd7c was found in the dataset. Some genes were co-expressed with several other genes, like Mfsd11, which was co-expressed with all studied atypical transcripts except Mfsd14b and Cln3. Others showed more restricted co-expression, like Mfsd14b, which only co-localized with Mfsd8, Mfsd10 and Mfsd12. The sequentially similar Mfsd2a and Mfsd2b displayed a complementary co-expression, and together they were co-expressed with all found atypical SLCs except Sv2a and Sv2c. The three Spns genes supplemented each other, and together they were expressed in the same cells as all other genes except Mfsd14b (figure 5). Regarding AMTFs, Mfsd10 showed extensive co-expression with 12 other genes, while its family members were more restricted; Mfsd14a was co-expressed with eight other genes and Mfsd14b with only three, while Mfsd9 was not detected at all. Some of the co-expressions were found only in a few cells, like Unc93a having only 1-2 cells containing each interaction (figure 5). Among the more frequently found co-expressions were Cln3 together with Mfsd10, Mfsd11, Mfsd12 or Sv2a, with co-expression in more than 3000 cells (figure 5).
To supplement the co-localization and to detect probable interactions at protein level, in situ proximity ligation assay was run. As Mfsd11 was most commonly found as co-expressed on transcript level (figure 6a), a subset of its combinations was selected and tested. In all selected combinations, interaction signals were identified, but to different degrees, confirming that co-expressed RNA transcripts were found at protein level (figure 6b). Even Mfsd9, which was not detected in the RNA sequencing, was found in proximity to other atypical SLCs at protein level (figure 6c).
Figure 4. Structural prediction of the atypical SLCs. The online tool CCTOP [73] was used to predict the topology of the atypical SLCs, where all but three proteins were predicted to possess the N and C domain, connected by a long cytoplasmic loop (MFS loop), resulting in a 12 transmembrane segment (TMS) polypeptide, as schematically depicted in (a). MFSD13A, SPNS3 and CLN3 diverged from the common structure, for which homology models were built to verify the predictions. The three proteins were aligned against structurally known MFS proteins, using the automated Swiss model homology program [77].
Figure 6. Verification of co-expression at protein level. In situ PLA was run on mouse brain sections to study interaction between certain atypical SLC proteins, to verify the single-cell RNA sequencing. (a) Mfsd11 was co-expressed with several atypical SLCs using the RNA sequencing dataset. (b) The co-expression of the corresponding proteins was also detected at protein level using in situ PLA. Some atypical SLCs were not detected in the single-cell RNA sequencing data, likely due to low transcript detection. However, interactions for those proteins were still found using in situ PLA. (c) Protein-protein interactions detected by PLA between MFSD9, which was not found on transcript level, and its closely related proteins MFSD8, MFSD10, MFSD14A and MFSD14B are shown here.

Discussion
Here we investigated the characteristics of 29 novel predicted transporters, denoted atypical SLCs, to get a comprehensive understanding of their phylogenetic interrelations, family clustering, protein structures, co-expression and how they responded to altered amino acid levels. With phylogenetic trees, we elucidated the interrelations between the atypical SLCs alone, and how they group among the known SLC of MFS type. Upon closer inspection, the two phylogenetic trees provided mostly similar results, but not identical. UNC93A, for example, clustered with MFSD11 and UNC93B1 in figure 1, and closest to MFSD12 in figure 2.
There could be several reasons for this discrepancy. First, we used different programs for tree calculations. MRBAYES is a good tool for small-to-medium alignments, but for larger and more complex datasets, other methods, like the likelihood method implemented in RAxML [69], have to be used. Here, the main reasons for differences lie within the tree searching algorithms. With the more advanced and computationally intensive models implemented in MRBAYES, only a smaller proportion of the total number of possible trees can be investigated compared with RAxML. In addition, the more stringent models implemented in MRBAYES will not converge in reasonable time for more complex datasets. Second, as more sequences were included when compiling figure 2, there were larger variations, resulting in a less accurate starting alignment. This is why the tree in figure 1 was considered most accurate and primarily used for family clustering, while the second figure showed that the atypical SLCs cluster with SLCs. The atypical SLCs are probably SLC proteins, but most are still orphans regarding function. Therefore, they were divided into AMTF families instead of using the existing SLC nomenclature. This highlights that the proteins are possible transporters, but that their function remains to be elucidated. Once their functions are determined they can be renamed according to the SLC root system, which could result in 64 SLC families instead of the present 52 SLC families.
In general, proteins within an SLC family usually share mechanism and substrate profiles [85], although exceptions to this rule can be observed. Most proteins in the AMTFs are not well studied, but there seem to be both similarities and differences within the families. AMTF1 (MFSD9, MFSD10, MFSD14A and MFSD14B) and AMTF8 (MFSD2A and MFSD2B) are examples of similarities and dissimilarities. In AMTF1, MFSD10 is identified both at the plasma [47] and intracellular membranes [36], while MFSD14A and MFSD14B have only known intracellular expressions [33]. MFSD8, which shares a branching node with the AMTF1 proteins, is also intracellular [27]. Therefore, it is likely that MFSD9 also has an intracellular location. This hypothesis was strengthened as we detected interaction between MFSD9 and MFSD8, MFSD10, MFSD14A and MFSD14B using in situ PLA. This means that MFSD9 is located within 40 nm proximity of the other three intracellular proteins. Regarding their substrates, they are believed to differ as MFSD10 transports organic ions [36], while MFSD14A is suggested to be a sugar transporter as it shares several structural characteristics with known sugar transporters [48]. MFSD14B is a predicted sugar transporter due to its high sequence identity (67.7%) to MFSD14A. However, similar response patterns to amino acid deprivation were found, where small changes were detected until 5 h for all four members, followed by upregulation of all but Mfsd9 after 16 h. If we instead consider AMTF8, both MFSD2A and MFSD2B are located to the endoplasmic reticulum [37], while MFSD2A is also detected in the plasmalemma [42]. As they are nearly 40% identical, it is likely that they share a substrate and mechanism, and as MFSD2A transports lipids in a sodium-dependent manner [43], it is possible that MFSD2B does so as well. The genes were expressed together in some cells, and their combined transcripts were found with all atypical SLCs, except the Sv2s, suggesting they could have similar effects. Mfsd2a was co-expressed with 14 atypical SLCs and Mfsd2b with 12, of which seven were shared. This suggests that MFSD2B could function as the back-up system for MFSD2A in specific cells or that it may have a more direct and specific function. They responded differently to amino acid starvation, where Mfsd2a was significantly reduced, while Mfsd2b remained unaffected.
Figure 7. Transcription levels of atypical SLCs are changed upon complete amino acid starvation. Mouse hypothalamic N25/2 cells were deprived of all amino acids for 1, 2, 3, 5 and 16 h, followed by microarray analysis to study transcriptional changes [60]. Data accession number was GSE61402. Genesis version 1.7.6 was used to generate the heat map, which depicts log2 difference between starved and control cells at each time point. Green colour depicts downregulation while red colour corresponds to upregulated expression, where larger changes correlate with stronger colour intensity. Note that for Cln3 and Unc93a, two probes were identified corresponding to the human proteins, and both were included in the analysis.
It is possible that Mfsd2b functions as a housekeeping gene, and hence lacks alteration upon diet change. On the other hand, Mfsd2a could have a direct function in energy balance, and is therefore found to be affected by starvation. Taken together, there are both similarities and differences between AMTF members, and it is not yet possible to elucidate their expression or functions, but the family clusters are good suggestions on which further investigations can be based.
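The starvation responses discussed above are expressed as log2 differences between starved and control cells, so a value of -1 means that expression was halved and +1 that it was doubled. A minimal Python sketch of this calculation follows; the intensities are hypothetical values chosen for illustration, not measurements from the GSE61402 dataset, and the function name is our own.

```python
import math


def log2_difference(starved: float, control: float) -> float:
    """Return log2(starved) - log2(control) for linear-scale intensities."""
    if starved <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(starved) - math.log2(control)


# Hypothetical probe intensities (arbitrary units) for one gene at each time point.
time_points_h = [1, 2, 3, 5, 16]
control = [200.0, 200.0, 200.0, 200.0, 200.0]
starved = [200.0, 190.0, 210.0, 180.0, 100.0]

# One log2 difference per starvation time point, as plotted in a heat map row.
profile = {
    t: round(log2_difference(s, c), 2)
    for t, s, c in zip(time_points_h, starved, control)
}
print(profile)
```

In this toy profile the gene is unchanged at 1 h and halved at 16 h, which would appear as a neutral cell followed by a green (downregulated) cell in a heat map of this kind.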
Table 4. Results from amino acid starvation on N25/2 mouse hypothalamic cells [60]. Asterisk indicates significantly changed expressions.
To understand how single cells maintain their homeostasis, preserve ion balances, keep optimal sugar levels and so on, we must determine which transporters are expressed together. By studying single-cell transcriptomes, we identified genes that seem to be co-expressed with several other atypical SLCs, like Mfsd8, Mfsd11 and Mfsd12, suggesting that they are needed for basic maintenance, while other genes displayed more restricted co-expression, like Mfsd14b and Unc93a. In the RNA sequencing analysis, there were approximately 42 000 reads per cell, meaning that low-expressed genes are probably missing from the dataset. This is why undetected but anticipated co-expressed transcripts, like Mfsd9, could still be found as interacting partners to other proteins in vitro. There were detectable PLA signals even though the corresponding genes were not present in the sorted RNA dataset. This can be explained by the fact that low levels of mRNA can result in high protein translation in mammalian cells [88]. In many cases, mRNA and protein levels do not correlate completely because of different regulation controls. From the experiments we conclude that if genes were co-expressed according to the RNA sequencing, they were indeed found in the same cell. However, we cannot deduce anything about the undetected interactions; even if Mfsd2b has fewer gene co-expressions than Mfsd2a, transcripts could have been missed. For the in situ PLA, interactions were considered accurate and as confirmations of co-existing proteins in the same cell, but comparisons between protein combinations were not performed. If we were able to understand the complete transporter co-expression map, it would facilitate the understanding of pharmacokinetics and human diseases.
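The logic of scoring co-expression from single-cell data can be sketched as follows: two genes count as co-expressed if transcripts of both are detected in at least one common cell, and shared partners are found by intersecting the partner sets. This is a simplified Python illustration under stated assumptions; the cell identifiers and detection sets are hypothetical toy data, and the function is not the pipeline used in our analysis.

```python
from collections import defaultdict
from itertools import combinations


def coexpression_partners(cells):
    """Map each gene to the set of genes detected with it in at least one cell."""
    partners = defaultdict(set)
    for detected in cells.values():
        # Every pair of genes detected in the same cell is a co-expression pair.
        for gene_a, gene_b in combinations(sorted(detected), 2):
            partners[gene_a].add(gene_b)
            partners[gene_b].add(gene_a)
    return partners


# Hypothetical toy data: genes with detected transcripts in each cell.
cells = {
    "cell_1": {"Mfsd2a", "Mfsd8", "Mfsd11"},
    "cell_2": {"Mfsd2b", "Mfsd11", "Mfsd12"},
    "cell_3": {"Mfsd2a", "Mfsd2b", "Mfsd12"},
}

partners = coexpression_partners(cells)
shared = partners["Mfsd2a"] & partners["Mfsd2b"]
print(sorted(shared))
```

Note that such a table only records positive evidence. A pair that is absent from it may simply reflect transcripts that fell below the detection limit at the available read depth, which is why missing co-expressions cannot be interpreted.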
Most MFSs are similar in structure [17], despite their relatively low sequence identities. Therefore, we found it convincing that the predictions of atypical SLCs containing 12 TMS were accurate. This was in accordance with previous publications describing the structure of some atypical SLCs based on other topology prediction tools [27,29,30] or homology models [19,26]. As the predictions for MFSD13A, SPNS3 and CLN3 did not support our hypothesis, we built homology models to verify their predicted structures. When building homology models, the sequences were aligned against a structurally known MFS protein, providing higher reliability to the model than the prediction pool based only on amino acid sequences. This is why we feel confident in suggesting that MFSD13A and SPNS3 have 12 TMS each. Interestingly, we identified only 11 TMS for CLN3 using CCTOP and homology modelling, while previous reports have postulated conflicting results [31], where a six TMS protein is seemingly accepted [31,32,86,87]. However, the six TMS predicted differ between previous publications [31]. We have identified all previously predicted TMS, and additionally two regions, TMS 8 and 11, which have not been suggested so far. To our knowledge, no homology models have previously been built for CLN3. Since it does not belong to any Pfam clan, but is clustered as a member of the MFS superfamily according to the Transporter classification database [24], and because it shared between 10 and 20% sequence identity with many MFS proteins, we decided to align it against an MFS template. As the predicted TMS corresponded with those found by CCTOP, we considered it a reliable three-dimensional model. Therefore, we deviate from previous reports, and propose an 11 TMS structure for CLN3. Among SLCs belonging to other Pfam clans, 11 TMS is a common structure (e.g. the SLC38 family belongs to the APC Pfam clan, and its members are all predicted to contain 11 TMS [89]). It is thus possible for an atypical SLC to have such a structure.
Since the atypical SLCs phylogenetically group among SLCs of MFS type, share the MFS transporter topology and are affected by complete amino acid deprivation in cell cultures, it is likely that these proteins are novel transporters. As there has been a call for systematic research on transporters [6], we suggest that the atypical SLCs should be included in this. They could interact with drugs and be associated with diseases.
Ethics. All experiments including animals were approved by the local ethical committee in Uppsala (Uppsala Djurförsöksetiska Nämnd, Uppsala district court, permit number C39/16), and conducted in compliance with the European Union legislation (Convention ETS123 and EU-directive 2010/63). Adult C57BL6/J male mice (Taconic M&B, Denmark) were used and housed in accordance with the Swedish regulation guidelines (Animal Welfare Act SFS 1998:56). Euthanasia was performed during the light period by trans-cardiac perfusion of anaesthetized animals. Intra-peritoneal injections of sodium pentobarbital (Apoteket Farmaci, Sweden) (10 mg kg⁻¹) were used to anaesthetize adult mice.
Data accessibility. All data are available through the main article and the online electronic supplementary material.