Logical modelling of in vitro differentiation of human monocytes into dendritic cells unravels novel transcriptional regulatory interactions

Dendritic cells (DCs) are the major specialized antigen-presenting cells, thereby connecting innate and adaptive immunity. Because of their role in establishing adaptive immunity, they constitute promising targets for immunotherapy. Monocytes can differentiate into DCs in vitro in the presence of colony-stimulating factor 2 (CSF2) and interleukin 4 (IL4), activating four signalling pathways (MAPK, JAK/STAT, NFKB and PI3K). However, the downstream transcriptional programme responsible for DC differentiation from monocytes (moDCs) remains unknown. By analysing the scientific literature on moDC differentiation, we established a preliminary logical model that helped us identify missing information regarding the activation of genes responsible for this differentiation, including missing targets for key transcription factors (TFs). Using ChIP-seq and RNA-seq data from the Blueprint consortium, we defined active and inactive promoters, together with differentially expressed genes in monocytes, moDCs and macrophages, which correspond to an alternative cell fate. We then used this functional genomic information to predict novel targets for previously identified TFs. By integrating this information, we refined our model and recapitulated the main established facts regarding moDC differentiation. Prospectively, the resulting model should be useful to develop novel immunotherapies targeting moDCs.


Introduction
Dendritic cells (DCs) are the main antigen-presenting cells [1]. By presenting antigens to the naïve lymphocytes, they initiate the immune response against various kinds of pathogens [2]. This capacity of DCs to activate the adaptive immune response opens interesting prospects for immunotherapies [3]. Circulating in the peripheral blood, monocytes are easily accessible and can be differentiated into dendritic cells, called moDCs (for monocyte-derived DCs), using an established protocol [4].
To integrate multiple signalling pathways into a comprehensive regulatory network and check its coherence with existing expression data, one can rely on the construction of a dynamical model [12]. As most of the available data are qualitative, we opted for using a qualitative approach. Logical modelling is well suited to represent such qualitative data and has been applied to similar processes [13][14][15]. This qualitative formalism relies on the construction of a regulatory graph, whose nodes denote molecular components (e.g. genes, proteins and lncRNA), while signed arcs denote positive or negative (or sometimes dual) regulatory interactions. In the simplest cases, nodes are associated with Boolean variables, which take the values 0 or 1, denoting the absence/inactivity or the presence/activation of the corresponding component, respectively [16]. When needed, multilevel variables (e.g. ternary variables taking the values 0, 1, 2) can be used to account for different ranges of activation (e.g. negligible, medium and high). Logical models are usually derived from a careful manual curation of relevant scientific literature, but they can also be enriched with other sources of information, such as high-throughput sequencing data [17].
GINsim is a computational software dedicated to the building and analysis of logical models of cellular networks [16]. GINsim includes specific tools to perform model simulations, as well as efficient algorithms to identify the attractors of the system (stable states and/or oscillatory behaviour), for wild-type or mutant conditions [16]. The resulting model can be further analysed using the CoLoMoTo suite, an interactive toolbox integrating several logical modelling software tools, with a uniform interface to perform complementary analyses, which are easy to share and reproduce through the use of notebooks [18].
The aim of our study was to integrate all the information gathered from scientific literature and functional genomic data (RNA-seq and ChIP-seq) into a logical model of the regulatory network underlying moDC differentiation. We further included key macrophage differentiation elements to complement our study [11,19,20]. The resulting model recapitulates the salient cell commitment features for each of the  Figure 1. Regulatory graph controlling monocyte to moDC differentiation, as derived from the scientific literature. Ellipsoid nodes are associated with Boolean components (0 and 1), whereas the two rectangular nodes are associated with ternary components (0, 1 and 2). The green nodes at the top represent the inputs (CSF2 and IL4), the yellow nodes denote transcription factors, the blue nodes denote moDC-specific genes, while orange nodes denote macrophage-specific genes. Nodes left in white correspond to components of generic signalling pathways. Green and red arcs denote positive and negative interactions, respectively. The core network interactions are emphasized by thicker arcs.

Information gathered from literature curation leads to a fragmentary model for the differentiation of monocytes into dendritic cells
To better understand the regulatory network controlling moDCs differentiation, we analysed the scientific literature and integrated relevant information into a regulatory graph.
In this process, we focused on monocyte to moDC differentiation studies carried primarily on human cells, in particular on experiments where CSF2 and IL4 were used alone or in combination in otherwise similar culture conditions. The resulting regulatory graph is shown in figure 1. Based on this first regulatory graph, we used GINsim to define logical rules, combining conditions on regulatory nodes with NOT, AND and OR Boolean operators, to compute the corresponding stable states, and to perform simulations in order to determine the cellular phenotypes reached for each specific input condition. Note that although most nodes are associated with Boolean variables, we assigned ternary variables to the model components STAT3 and PU1. In such a situation, two non-overlapping rules are defined, enabling the value 1 and 2, respectively (see electronic supplementary material, table S1). For this preliminary model, we obtained six stable states, but only one of them could be directly interpreted as a cellular phenotype (pre-DCs), while the other stable states diverged from the typical gene expression patterns of activated monocytes and macrophages.
Regarding the regulatory interactions between TFs and their target genes displayed in figure 1, we observed that several TFs exert only a few interactions. For example, CREB solely activates ALOX15, and no other target gene. Furthermore, this regulatory graph contains very few specific moDC markers. Consequently, to complete this preliminary network, we decided to take advantage of the public epigenome and transcriptome datasets to infer novel regulatory interactions and integrate them into our logical model (a proof of concept of this approach can be found in [17]).  royalsocietypublishing.org/journal/rsfs Interface Focus 11: 20200061 2.2. Epigenomic data analysis helps to unravel relevant transcriptional regulatory interactions In order to complete our model of the regulatory network controlling the differentiation of monocytes into moDCs, we included the TFs known to be activated by CSF2 and IL4 signals in moDCs, together with established monocyte markers. Moreover, we included information regarding the differentiation of monocytes into macrophages, which occurs when monocytes are treated with CSF2 alone [21]. In short, we (i) used monocyte, moDC and macrophage epigenome data to define chromatin states, (ii) defined genomic regions likely to be involved in the regulation of the genes included in the model, and (iii) searched for putative TFs binding sites in these regions. We analysed ChIP-seq data from the Blueprint consortium for six histone marks (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3 and H3K27me3) in monocytes, moDCs and macrophages derived from monocytes. We then used ChromHMM [22] to annotate the epigenome in each cell type based on these data. The resulting genomic segment states were classified into 10 categories: quiescent/low signal, polycomb repressed, poised regulation, active TSS, active promoter, Primed enhancer, active gene/enhancer, low transcription, TSS repressed and strong transcription (figure 2a). As expected, it is possible to visualize clear differences in the epigenome of moDCs versus monocytes when exploring genes with specific cell expression in a genome browser. For example, the gene coding for IRF4, a TF that mediates the differentiation of moDCs, is active in moDCs, while it is poised in macrophages and monocytes (figure 2b).
In the following step, we selected the segments corresponding to promoter and enhancer states: active TSS, repressed TSS, active gene/enhancer and poised regulation. These regulatory regions were used to predict binding sites for the TFs known to be activated by the CSF2 and IL4 pathways (figure 1), using position-weight matrices collected from the Jaspar database [19], with the pattern-matching tool matrix-scan [20] from the RSAT suite [23]. This led us to predict novel transcription factor binding sites presumably involved in the regulation of specific gene markers for moDCs, monocytes and macrophages (table 1), thereby enabling us to complete the regulatory network controlling monocyte to moDC differentiation.
For 13 out of 19 genes related to moDC phenotype, we detected putative binding sites for IRF4 (figure 3a) into putative regulatory regions (cf. ChromHMM analysis) located near to the TSS of the corresponding genes. Interestingly, this led us to corroborate the central role of IRF4 in moDC differentiation. Indeed, we predicted that IRF4 possibly regulates several TLR genes (TLR3, 4 and 7), which play a crucial role in antigen recognition in myeloid cells and are thus relevant for moDC. Furthermore, we predicted that TLR6 and TLR8 are regulated by STAT6, another essential TF in moDCs [7], we acknowledge that TLRs genes are not specific for moDCs; however, their transcriptional regulatory mechanisms are not fully understood and our results provided useful insights for future studies. In addition, we predicted that the genes encoding for the trans-membrane proteins CD1A, CD1B and CD1C are regulated by IRF4, as well as by other TFs (PU1, PRDM1, NR4A1 and CEBPA) related to moDC differentiation. Furthermore, we predicted that the gene coding for CD48, a costimulatory molecule involved in T cell activation, is regulated by PU1, which is known to participate in the differentiation of STEM cell progenitors into leucocytes [33]. We also analysed the non-redundant peaks available from Remap and identified several concordances with our results (electronic supplementary material, table S6). We also looked for regulatory interactions between the identified TFs. In particular, we predicted regulatory interactions from PU1 onto CEBPA, IRF4 and IRF8. We also predicted that AHR regulates IRF4, MAFB and PRDM1, which represent interesting candidates to assess experimentally. Figure 3b summarizes the regulatory interactions between TFs that compose our final logical model.

Integration of predicted novel regulatory interactions improves model accuracy
We integrated the selected gene markers for each cell type with the predicted regulatory TFs into our model and connected them with the regulatory interactions reported in figure 3. Using this extended regulatory graph (figure 4a) together with relevant Boolean rules, we computed the stable states, which consistently recapitulated the main cell fates (see electronic supplementary material, table S3).    royalsocietypublishing.org/journal/rsfs Interface Focus 11: 20200061 Our revised model is characterized by four stable states (figure 4b; electronic supplementary material, figure S1). The first stable state, in the absence of IL4 and CSF2 activation, corresponds to cell death, which is the expected outcome in this situation. The second stable state is obtained upon IL4 activation and corresponds to the monocyte signature, with KLF4, SELL and CD14, in the ON setting. The third stable state is obtained for CSF2 activation and corresponds to a macrophage signature, with MAFB, IRF8, CCDC151 and CD206 ON. Finally, the last stable state is obtained when both CSF2 and IL4 are activated and corresponds to the moDC signature, with IRF4, STAT6, CD1A and CD209 all ON. Noteworthy, this analysis recapitulates the fact that PI3K signalling is inhibited during moDC commitment, as previously reported by van de Laar et al. [6].
To validate the different expression signatures, we analysed RNA-seq data from monocytes, moDCs and macrophages. Figure 5 displays the differential expression of the genes included in the model. Interestingly, we found two main clusters of genes highly expressed in moDCs, but downregulated in macrophages. These moDC differentially expressed genes include those coding for the TFs STAT3, STAT6, CEBPA and IRF4, which participate in moDC differentiation, as well as those for CD209, MAOA and SLAMF1, which are specific markers for moDCs. Additionally, monocytes display high expressions of the genes coding for KLF4, IRF8, SELL and CD14.
In the next step, we took advantage of the CoLoMoTo toolbox to recapitulate documented cellular commitment experiments.

Model simulations recapitulate the main aspects of cellular commitment to differentiation
We imported our model into the CoLoMoTo environment to ease further analyses with complementary software tools such as Pint, BioLQM and MaBoSS [34]. The integration of all analyses can be found in a Jupiter (python) notebook (available at http://ginsim.org/model/monocytes-to-dc) further ensures reproducibility. We used the tool BioLQM to compute the trap-spaces for the wild-type situations [35]. As trap-spaces provide approximations of cyclic attractors, we could thereby verify that the model does not generate any cyclic attractors.
We continued to use BioLQM [35] to assess the behaviour of the model for nine single gene losses-of-function (affecting IRF4, STAT6, PU1, IRF8, MAFB, NCOR2, AHR, JAK3 and CEBPB, respectively) that have been reported in the literature STAT3  STAT6  CEBPA  HLA-DRQ1  ITGAX  PRMD1  NCOR2  SPI1  IRF8  TLR4  MAFB  DUSP6  TLR6  LY75  TLR8  CCL22  TIMP3  KLF4  CEBPB   CCL2   MERTK   DUOX1  CIITA  CLEC4A  CD226  CD209  MAOA  IRF4  SLAMF1  THBD  SELL  TLR3  CD14  TLR7  CD83   Heatmap showing differentially expressed genes between cell types after hierarchical clustering based on differential expression. The first three columns correspond to moDCs, the following three columns to macrophages and the last three columns to monocytes (each triplet of columns represent biological replicates). The colour-coded Z-score indicates the level of differential expression for each gene. Cell types are annotated based on known gene expression on the right, with light blue denoting moDCs expression, orange denoting macrophages and purple denoting monocytes.
royalsocietypublishing.org/journal/rsfs Interface Focus 11: 20200061 to affect the differentiation process. Table 2 summarizes the results obtained for these perturbations, while figure 6 shows the activity of each node for each perturbation at the stable states. These results qualitatively replicate the behaviour of each of the documented mutants. We further simulated the knock-out of each TF from figure 3a and identified that BATF3, FOXO1, PU1 and USF1 affected the three final cell types (electronic supplementary material, table S7).
We then used the tool Pint [38] to verify the reachability of the stable states corresponding to the correct cell commitment for each combination of CSF2 and ILF4, starting from a quiescent Monocyte state. Next, we used the stochastic Boolean simulation tool MaBoSS to generate mean temporal curves, starting from initial conditions corresponding to quiescent Monocytes, towards the stable state reached in the presence of the corresponding combination of active CSF2 and ILF4 [39]. In figure 7, we can observe several curves initially growing but later decaying. These curves most likely represent transient cellular states during the conversion of cells from their initial state to the attracting state reached at the end of the simulations.

Discussion
The construction of logical models traditionally relies on manual curation of the literature on a biological system of interest. In this work, we further took advantage of public ChIP-seq and RNA-seq data from the Blueprint consortium [40] to delineate in more detail the network driving the differentiation of monocytes into moDCs. We were able to fill in various gaps in the regulatory network, which allowed us to reach a better understanding of this differentiation process. Noteworthy, this led us to predict 102 novel interactions, which were validated in silico through our simulations, and are amenable to further experimental tests.
In particular, we delineated a series of target genes presumably important for the differentiation of monocytes into moDCs. Some TFs are already well known, such as IRF4, AHR, STAT6 and PU1 [7,11,29]. Our analysis recapitulates several key features regarding the expression of the corresponding genes, such as a high expression of IRF4 and STAT6 genes in moDCs. We further validated the results obtained by Vento-Tormo et al. [7] indicating that STAT6 loss of function lack of moDC-specific markers AHR transcription factor AHR silencing reduced mo-DC differentiation while slightly increasing mo-Mac [11] loss of function lack of every moDC-specific markers, Mac differentiation is normal JAK3 tyrosine-protein kinase STAT6 phosphorylation disappeared following JAK3 inhibition. In the case of Macs, we did not observe STAT6 phosphorylation, given the lack of stimulation of JAK3 [7] loss of function Mac phenotype with CSF2, and IL4. Mac differentiation is not affected CEBPB transcription factor in the absence of CEBPb in monocytes CEBPb-KO, only a very low amount (5%) of this Mac-like morphology was seen, and most of the cells stayed round [37] loss of function lack of some specific macrophage markers royalsocietypublishing.org/journal/rsfs Interface Focus 11: 20200061 is required for moDCs differentiation; according to our model analysis, STAT6 is indeed required for moDC differentiation, but not for macrophage commitment (figure 6). We were also able to predict novel TF targets that are relevant for this process, regulated by FOXO1, CEBPA, AP1 and PRDM1. We predict that FOXO1 regulates at least six moDC genes, while CEBPA could regulate at least seven of them. Furthermore, AP1 presumably regulates TLR4, DEC205 (LY75) and CD209 (DC-SING), which are associated with antigen-presenting cells. CREB1 is also presumably involved in the regulation of moDC genes, through the activation of CD141 and CD1A. We also predict for the first time that NR4A1 could regulate CD1C, a protein found at the surface of moDCs.
We further reviewed data recently published on the predicted TF-gene interaction considered in our model, and we found that some of these interactions have been recently experimentally confirmed. In particular, the regulation for the ITGAX gene was shown to be regulated by PU1 and IRF4 [29], as predicted by our epigenomic data analysis.
In summary, this study represents the first effort to integrate the current knowledge on monocytes to moDCs differentiation in vitro and should foster our understanding of this process.
Additionally, we unravelled novel transcriptional regulatory links presumably involved in this differentiation process.

Model implementation and simulations
Using the software GINsim [16], we integrated previously described signalling pathways activated when monocytes are cultured with CSF2 and IL4 (all the corresponding studies are referenced in the GINsim model as node annotations, and further listed in electronic supplementary material, table S3). We also reviewed the literature on the differentiation of monocytes into moDCs. The logical model was built using GINsim v. 3.0 [41], where nodes represent genes or proteins, and edges represent regulatory interactions between them, which can be negative or positive (or sometimes dual). In general, each node can take two values, zero or one, but in special cases, one may need to consider different qualitative levels of activation (e.g. STAT3 expression is activated by JAK1, but the presence of LnC-DC leads to a further increase of STAT3 expression). For such special cases, it is possible to use multilevel nodes, e.g. ternary variables enabling an additional level of activation (hence taking the values 0, 1 and 2). Logical rules are associated with each

Chromatin states definition
We used ChIP-seq data for six histone marks (H3K4me1, H3K4me3, H3K27ac, H3K36me3, H3K9me3 and H3K27me3) for each cell type (monocytes, macrophages and moDCs), with their respective input control. Chromatin states were defined using ChromHMM [18] v. 1.12 [46] with the recommended parameters (BinarizeBed -b 200, assembly hg38), and specifying 10 hidden states for the hidden Markov model. Each chromatin state was annotated based on the probability of appearance of the different marks (e.g. H3K27ac-Enhancers, H3Kme1-Enhancers, H3K4me3-Promoters, H3K27me3-Repressive, H3K9me3-Repressive and H3K36me3-Transcribed [47]). We then assessed   [19]. We performed pattern-matching searches for TF motif instances using the 22 PSSMs in the selected chromatin regulatory regions, active TSS, repressed TSS, active gene/enhancer and Poised regulation in the vicinity of the TSS annotated in RefSeqTSS (see ChromHMM analysis above). For this task, we used the tool matrix-scan [20] from the RSAT suite [23] with the following main parameters: background model of Markov order 1, and stringent thresholds of p ≤ 10 −5 , and score 1 (-markov 1 -lth score 1 -uth pval 1 × 10 −5 ). In order to assess our results regarding the number of predicted interactions discovered in figure 3a, we generated 100 groups of 20 randomly selected genes, selected their regulatory regions using ChromHMM, used matrix-scan (as described above) with the same collection of motifs and performed a hypergeometric test in the electronic supplementary material, figure S2.

RNA-seq analyses
Raw fastq files from RNA-seq experiments were retrieved from the Blueprint Consortium [40] data access portal (http://dcc. blueprint-epigenome.eu/#/datasets) with dataset identifiers: EGAD00001002308, EGAD00001001506, EGAD00001002526, EGAD00001002507 and EGAD00001001582. Blueprint monocytes samples were obtained from the same laboratory from healthy volunteers using positive selection, then the same samples were treated with CSF2 and IL4 for 6 days (monocyte purification: https://www.blueprint-epigenome.eu/UserFiles/ File/Protocols/UCAM_BluePrint_Monocyte.pdf ). For this analysis, we used the methods described in Law et al. [48]. In brief, we performed quality control with FastQC/0.11.3 [49], pseudo-alignment and count determination with Kallisto 0.43.1 [50] using the release-90 from Ensembl (ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/cdna/Homo_sapiens. GRCh38.cdna.all.fa.gz) to create our index with the following command: kallisto index -i index_kallisto_hsap_90_cdna -make-unique Homo_sapiens.GRCh38.cdna.all.fa.gz. Counts were assigned to genes using Tximport 1.14.0 [51] and were processed from rawscale to counts per million (CPM), and then they were transformed to log-CPM. Genes with expression values below 1 were removed. Then we normalized raw library sizes using the calcNormFactors function from edgeR library in R. Afterwards, we performed a differential gene expression analysis with edgeR 3.28.0 [52]. Finally, we used heatmap.2 from the gplots library to plot the genes found in our model ( figure 5).

Colomoto analysis
In order to foster reproducibility, we used the CoLoMoTo toolbox [18] that integrates several logical modelling tools, including GINsim, bioLQM, Pint and MaBoSS. We used GINsim to compute the stable states and bioLQM to identify trap-spaces approximating cyclic attractors. The computation of mean stochastic temporal trajectories was performed using MaBoSS [39]. The GINsim model and the CoLoMoTo notebook are available on the GINsim repository (http://ginsim.org/model/monocytes-to-dc), as well as a dedicated git repository (https://github.com/karenunez/ moDC_model_differentiation). The results can be replicated with the CoLoMoTo Docker image following the instructions provided at http://colomoto.org/notebook. Figure 1 and 4a were generated with the GINsim software. The plots in figures 2a, 3a,b and 4b were generated with the ggplot2 library from R. Figures 6 and 7 were extracted from the CoLo-MoTo notebook developed for this study.

Generation of the figures
Data accessibility. Supplementary files, including the logical model of monocyte to DCs differentiation, are available at https://github. com/karenunez/moDC_model_differentiation.