Modelling biological behaviours with the unified modelling language: an immunological case study and critique

We present a framework to assist the diagrammatic modelling of complex biological systems using the unified modelling language (UML). The framework comprises three levels of modelling, ranging in scope from the dynamics of individual model entities to system-level emergent properties. By way of an immunological case study of the mouse disease experimental autoimmune encephalomyelitis, we show how the framework can be used to produce models that capture and communicate the biological system, detailing how biological entities, interactions and behaviours lead to higher-level emergent properties observed in the real world. We demonstrate how the UML can be successfully applied within our framework, and provide a critique of UML's ability to capture concepts fundamental to immunology and biology more generally. We show how specialized, well-explained diagrams with less formal semantics can be used where no suitable UML formalism exists. We highlight UML's lack of expressive ability concerning cyclic feedbacks in cellular networks, and the compounding concurrency arising from huge numbers of stochastic, interacting agents. To compensate for this, we propose several additional relationships for expressing these concepts in UML's activity diagram. We also demonstrate the ambiguous nature of class diagrams when applied to complex biology, and question their utility in modelling such dynamic systems. Models created through our framework are non-executable, and expressly free of simulation implementation concerns. They are a valuable complement and precursor to simulation specifications and implementations: they focus purely on thoroughly exploring the biology and on recording hypotheses and assumptions, and they serve as a communication medium detailing exactly how a simulation relates to the real biology.

The simulation platform is the software that simulates the domain. The CoSMoS process does not explicitly dictate how the simulation platform is constructed from the platform model, but it recommends that established software engineering principles, for example testing and agile software development, are adhered to. The process does not dictate exactly which software engineering principles should be observed; it states only that simulation engineers should take steps to ensure that high-quality, well-documented and bug-free code is developed, using measures appropriate to the problem at hand.
The results model encapsulates the observations and understanding that stem from the simulation: simulation dynamics, recorded data and statistics, and observations. The manner in which the domain model reflects the domain is mirrored in the results model's reflection of the simulation platform.
The research context captures the overall context in which the simulation-based research is being conducted. This includes the motivation for the research, the questions that are to be addressed, and requirements for evaluation of success and validation of results.
The CoSMoS process is intended to be iterative, with all of the artefacts potentially undergoing modification between iterations. An iteration is envisaged to comprise three stages: discovery, where the domain model is modified to reflect further investigation of the domain or changes to the context in which research is to be conducted; development, in which the platform model and simulation platform are updated to reflect the modified domain model; and exploration, where simulation experiments are performed.
Though it appears prescriptive in nature, the process is not intended to force researchers to explicitly undertake all iteration phases and maintain all artefacts. Rather, the process highlights that these phases and artefacts are inherent in any simulation endeavour, and prompts their consideration. How much effort is vested in explicitly undertaking or maintaining aspects of the process is a judgement call to be made by the investigators, and is informed by the complexity of the problem at hand, its criticality, and the rigour felt appropriate given the scientific questions being addressed.
With respect to the domain model, the CoSMoS process does not specify any particular format, formalism, or even the exact details that should constitute the model. It only defines the concept, and how it links to other artefacts. The present manuscript provides more specifics on domain modelling for a particular class of biological problems. The framework and use of UML that we present should not be interpreted as prescriptive for those following the CoSMoS process; they provide only an example method for domain modelling. There will be other methods, and investigators should approach domain modelling in the manner most appropriate to the problem at hand.

Domain Model Assumptions
The domain model (DM) is intended to capture the assumptions made of a biological system in creating a consistent abstract model of it. It is impossible to record everything that is abstracted, as the field of biology is constantly expanding, and this exercise can theoretically distil down to the level of organic chemistry, physics, and thereafter quantum mechanics. Instead, a DM should be regarded as a record of assumptions in the sense of what is represented, rather than what is not.
The biological entities represented in a DM in actuality represent a great many biological factors at an abstract level. A domain model, or simulation derived from it, cannot be used to examine the dynamics or contributions to a biological phenomenon of biological components that have been abstracted into the same logical entity.
Although an exhaustive list of abstractions is impossible for the reasons outlined above, we provide the following list of key abstractions for illustration.
• The onset, persistence and recovery of EAE can be adequately described based on the activities occurring within a single peripheral lymph node, the circulatory system, spleen, central nervous system and a single cervical lymph node. Consequence: DM, and any simulation resulting from it, cannot be used to investigate the role of any other spatial compartments in the dynamics of EAE.
• Intracellular signalling cascades have not been explicitly represented. Instead, the behavioural changes that result from extra-cellular signalling events (mediated through, for example, cytokines and receptors) have been represented. Consequence: The dynamics and contributions of aspects of intracellular signalling cascades to cellular behaviours cannot be examined using this DM, or any simulation derived from it.
• The activities of CD4Th17 cells are sufficiently similar to those of CD4Th1 cells that they have not been explicitly represented. The DM's CD4Th1 cells abstract the behaviour of both CD4Th1 and CD4Th17 cells in the real biology. Consequence: DM, and any simulation resulting from it, cannot be used to examine the individual contributions of CD4Th1 and CD4Th17 cells to establishment and persistence of EAE.
• The actions of IL-2, IFN-γ, IL-12 and IL-17 are sufficiently similar to be abstracted into a single cytokine type, type 1 cytokine. Consequence: DM, and any simulation resulting from it, cannot be used to examine the individual roles of these cytokines in EAE.
• The actions of IL-4 and IL-10 are sufficiently similar to be abstracted into a single cytokine type, type 2 cytokine. Consequence: DM, and any simulation resulting from it, cannot be used to examine the individual roles of these cytokines in EAE.
• Demyelination is abstracted as neuronal apoptosis. Phagocytosis of 'apoptotic' neurons leads to MHC:MBP presentation on antigen presenting cells. Consequence: DM, and any simulation resulting from it, cannot examine the specific contributions of the demyelination process and the specific behaviour of demyelinated neurons (as opposed to fully functioning, or apoptotic) on the dynamics of EAE.
• Microglia and macrophages residing in the CNS are sufficiently similar in function to be abstractly represented as a single cell, the CNS macrophage. Consequence: DM, and any simulation resulting from it, cannot be used to examine the individual roles that these cells have in EAE.
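The many-to-one character of these abstractions, and the reason each carries a "Consequence", can be illustrated with a small lookup table. The following is only a sketch for illustration, not part of the domain model or of any simulation derived from it: the dictionary, the function names and the key strings (for example "CNS-resident macrophage") are hypothetical, and only the entity-level abstractions from the list above are included.

```python
# Hypothetical illustration: real biological entities (keys) mapped to the
# single domain-model entity (values) that abstracts them, per the list above.
ABSTRACTION = {
    "IL-2": "type 1 cytokine",
    "IFN-γ": "type 1 cytokine",
    "IL-12": "type 1 cytokine",
    "IL-17": "type 1 cytokine",
    "IL-4": "type 2 cytokine",
    "IL-10": "type 2 cytokine",
    "CD4Th1": "CD4Th1",
    "CD4Th17": "CD4Th1",
    "microglia": "CNS macrophage",
    "CNS-resident macrophage": "CNS macrophage",
}


def represented_by(dm_entity):
    """Return the real entities that a domain-model entity stands for."""
    return sorted(real for real, dm in ABSTRACTION.items() if dm == dm_entity)


def distinguishable(real_a, real_b):
    """True if the DM keeps two real entities apart.

    Entities mapped to the same DM entity cannot have their individual
    contributions examined, which is the 'Consequence' stated in each bullet.
    """
    return ABSTRACTION[real_a] != ABSTRACTION[real_b]


if __name__ == "__main__":
    print(represented_by("type 2 cytokine"))
    print(distinguishable("CD4Th1", "CD4Th17"))
```

Because the mapping cannot be inverted, a question posed about one of several entities sharing a DM entity (for example, IL-4 versus IL-10) has no counterpart in the model.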

Domain Model as Different from the Platform Model
We consider here how a platform model differs from a domain model. The purpose of a domain model is not to provide a comprehensive implementation specification for a simulation. Rather, it is intended to highlight hypotheses, abstractions and assumptions, and to communicate the biology that a simulation represents in a clear and coherent manner.
A simulation specification makes assumptions additional to those present in a domain model where insufficient biological detail is available, and provides specifics of how implementation is to be accomplished. This can be illustrated by comparing elements of our EAE domain model with its implementation described in [4]. Specifically, figure 11 of our manuscript depicts the dynamics of dendritic cells (DC) in our domain model, highlighting how DCs migrate from the periphery into the SLO upon maturation, an event which leads to the establishment of autoimmunity. Neither it, nor any other element of the domain model, indicates how many DCs undergo this migration nor how quickly; this information is not known, but must be provided in the software specification. Figure 2 (below), reproduced from [4], illustrates how the establishment of autoimmunity is to be implemented in ARTIMMUS: through the periodic creation of MBP-presenting immunogenic type 1-polarised DCs in the SLO compartment. It explicitly references simulation parameters, and describes how they together specify a linearly decreasing number of DCs to be created over time. The simulation specification makes further assumptions of the domain model, such as omitting the periphery from explicit representation. There are many alternative ways in which immunization for EAE could be implemented based on information contained in the same domain model, each necessitating different assumptions. Most importantly, the domain model contains information that should not be present in the simulation specification. For instance, it describes how a cascade of cellular interactions, starting with immunization for EAE resulting in MBP-presenting DCs migrating from the periphery to the SLO compartment, culminates in damage to the central nervous system.
High-level overviews such as this are hypotheses, and must not be directly coded into the simulation; they represent the abstract outcomes we seek to observe from a simulation which explicitly codes only single-cell-level behavioural dynamics, thereby allowing us to evaluate our hypotheses. Someone wishing to understand the biology underpinning ARTIMMUS would not benefit from having implementation-level details obscuring the portrayal of purely biological concepts, and someone wishing to implement ARTIMMUS would require more specifics than are contained purely in the domain model. The domain model and simulation specification have different purposes, and are therefore two explicitly separate entities. Domain modelling provides the starting point for simulation specification.

Figure 2: ARTIMMUS's immunization mechanism, and how it is parametrised. This comprises part of ARTIMMUS's simulation specification. The label "Simulation immunizationLinear" has been omitted from parameter names. Immunisation for EAE is accomplished in vivo through the administration of MBP, PTx and CFA. These immunisation substances do not find explicit representation within the simulation, which instead represents immunisation through the appearance of MBP-presenting immunogenic type 1 polarising DCs in the SLO compartment. Hence, the periphery compartment of the domain model is not represented in the simulation specification, and is not implemented in ARTIMMUS. The immunisation mechanism is parametrised through 4 parameters: Simulation immunizationDC0, Simulation immunizationLinearFreq, Simulation immunizationLinearGradient, and Simulation immunizationLinearInitial. The last specifies the number of immunisation DCs placed into the SLO compartment at time zero, as a one-off event. The remainder parametrise a linearly reducing number of DCs that are added to the SLO periodically. The period is defined by Simulation immunizationLinearFreq. Simulation immunizationDC0 and Simulation immunizationLinearGradient describe the level of DCs inserted at time zero, and the rate of linear decay. Every Simulation immunizationLinearFreq hours, the value described by these two parameters, given the current simulation time, is rounded to the nearest whole number of DCs, which are then placed in the SLO. Reproduced from [4].
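
The schedule described in the Figure 2 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ARTIMMUS's actual code: the function names and all numerical values are hypothetical; the gradient is treated as a positive decay rate (so the level at time t is dc0 - gradient * t); the level is clamped at zero once it would become negative; ties are rounded half up; and the periodic insertion is assumed to fire at time zero in addition to the one-off initial insertion. None of these details is fixed by the caption.

```python
import math


def periodic_dcs(t_hours, dc0, gradient, freq_hours):
    """DCs added to the SLO by the periodic mechanism at time t_hours.

    dc0, gradient and freq_hours stand in for the DC0, LinearGradient and
    LinearFreq parameters named in the caption (values here are invented).
    """
    if t_hours % freq_hours != 0:
        return 0  # not an insertion time
    level = dc0 - gradient * t_hours  # assumed form of the linear decay
    # Round half up to a whole number of cells; clamp at zero (assumption).
    return max(0, math.floor(level + 0.5))


def immunization_schedule(end_hours, dc0, gradient, freq_hours, initial):
    """List of (time, DCs added) pairs from time zero up to end_hours.

    `initial` stands in for the one-off LinearInitial insertion at time zero.
    """
    schedule = []
    for t in range(0, end_hours + 1, freq_hours):
        count = periodic_dcs(t, dc0, gradient, freq_hours)
        if t == 0:
            count += initial  # one-off event, assumed additive at t = 0
        schedule.append((t, count))
    return schedule


if __name__ == "__main__":
    for t, n in immunization_schedule(24, dc0=10, gradient=0.5,
                                      freq_hours=4, initial=5):
        print(f"t = {t:2d} h: add {n} DCs to the SLO")
```

The sketch makes the point of this section concrete: every quantity it needs (how many DCs, how often, how quickly the number falls) is a simulation-specification decision that the domain model deliberately leaves open.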