Clinically driven design of multi-scale cancer models: the ContraCancrum project paradigm
Abstract
The challenge of modelling cancer presents a major opportunity to reduce mortality from malignant neoplasms, improve treatments and meet the demands of individualized care. This is the central motivation behind the ContraCancrum project. By developing integrated multi-scale cancer models, ContraCancrum is expected to contribute to the advancement of in silico oncology through the optimization of cancer treatment in the patient-individualized context by simulating the response to various therapeutic regimens. The aim of the present paper is to describe a novel paradigm for designing clinically driven multi-scale cancer modelling by bringing together basic science and information technology modules. In addition, the integration of the multi-scale tumour modelling components has led to novel concepts of personalized clinical decision support in the context of predictive oncology, as is also discussed in the paper. Since clinical adaptation is an indispensable prerequisite, a long-term clinical adaptation procedure of the models has been initiated for two tumour types, namely non-small cell lung cancer and glioblastoma multiforme; its current status is briefly summarized.
1. Introduction
Multi-scale cancer modelling has recently become a very active research area. A relatively recent literature review of representative cancer modelling efforts worldwide is included in Stamatakos et al. [1]. The mission of the ContraCancrum project [2] is to boost the translation of clinically validated multi-level cancer models into clinical practice. To this end, the project has designed and developed a composite multi-level platform for simulating malignant tumour growth as well as tumour response to therapeutic modalities and treatment schedules. The joint effort aims to produce an impact primarily by:
— improving the understanding of the natural phenomenon of cancer at different levels of biocomplexity, and
— optimizing the disease treatment procedure in the patient-individualized context by simulating the response to various therapeutic regimens.
Given that clinical adaptation of the models is a prerequisite for their eventual clinical translation, a long-term clinical adaptation procedure has been initiated within the framework of the ContraCancrum project for two tumour types, namely non-small cell lung cancer and glioblastoma multiforme (GBM), and its current status is briefly outlined. One of the central questions of modern clinical oncology is whether it is possible to select the best treatment scheme and/or schedule for a patient by multi-modal therapy simulation on the computer [3]. To answer this, the actual clinical response for an individual patient needs to be compared with the in silico prediction of the ContraCancrum integrated simulator, as shown in figure 1 for both clinical studies of the project.
Figure 1. Clinical predictive oncology scenarios in ContraCancrum. The data are collected, anonymized and uploaded on the ContraCancrum repository to run the ContraCancrum simulations. A set of clinical cases (including imaging, histopathological and molecular data) is used for the clinical adaptation of the model, whereas another independent set is used for clinical validation of the models.
The clinical adaptation procedure is based on the comparison of multi-level therapy simulation predictions with multi-level patient data, acquired before and after therapy. ContraCancrum data include treatment data, histological data, molecular data and imaging data. All data are pseudonymized or anonymized before they are uploaded to the so-called Individualized MediciNe Simulation Environment (IMENSE), the integrated e-science platform of the project. Patient imaging data are stored as digital imaging and communications in medicine (DICOM) files at the time of diagnosis, after surgery and at the end of treatment. Clinical data, including age, sex, clinical findings, mutation analysis of the tyrosine kinase pathway, treatment and outcome data, are collected from all patients and stored in a database. Lung cancer specimens have been obtained and used for molecular analyses, including gene expression profiling. In total, up to now, 13 lung cancer and four GBM multi-scale datasets have been exploited.
2. Methods and technical components
The ContraCancrum predictive oncology environment consists of a number of predictive multi-scale cancer oncology modules/services, including cellular and higher level tumour dynamics simulation (microscopic and mesoscopic–macroscopic), biomechanical simulations, biochemical simulations and molecular determinants of response to therapy and image analysis modules:
— microscopic GBM tumour growth and response to radiotherapy and chemotherapy simulator,
— mesoscopic–macroscopic GBM tumour growth and response to radiotherapy and chemotherapy simulator,
— mesoscopic–macroscopic lung cancer growth and response to radiotherapy and chemotherapy simulator,
— biomechanics module,
— biomolecular simulations for patient-specific chemotherapy drug ranking,
— molecular determinants of response to therapy, and
— integrated image analysis (e.g. DrEye software).
ContraCancrum is also progressively integrating individual modules into composite multi-scale simulators and technological tools for specific clinical studies on gliomas and lung carcinoma. These include the following:
— the ‘TB’ multi-level integrated simulator: fusion of the Tumour growth and therapy response modules with the respective Biomechanical models,
— the ‘TBI’ multi-level integrated simulator: fusion of the ‘TB’ simulator with the Image analysis module,
— the ‘TBIN’ multi-level integrated simulator: fusion of the Normal tissue response module with the ‘TBI’ simulator; this is based on normal tissue toxicity limits, according to available phase I clinical trial outcomes, and
— the ‘TBINM’ multi-level integrated simulator: fusion of the Molecular simulations and networks module with the ‘TBIN’ simulator.
The final simulator integrates all of the previously mentioned simulation and technology modules. The image processing software DrEye integrates imaging/annotation and visualization services and is freely available (http://biomodeling.ics.forth.gr/). From the above, the clinician is able to define predictive oncology workflows within the IMENSE (described in §3), in the context of multi-level GBM or lung cancer decision support.
In the following, we present a more analytical description of the predictive oncology components that have been developed within ContraCancrum.
2.1. Microscopic cellular and higher level tumour dynamics module
A microscopic tumour growth module is used for the simulation of basic microscopic mechanisms of tumour growth, including, for example, avascular tumour growth, angiogenesis, invasion and metastasis. A hybrid agent-based framework that extends the idea of complex automata [4] has been developed. The framework consists of two main parts. The first part includes a description of hierarchical networks of agents, each operating on a predefined time and space scale and representing different biological processes. Communication between agents reflects interactions between different biological processes and uses different work-flow and data-flow paradigms [5]. The second part contains a modular computational cell-based ‘middle-out’ system for the description of cellular processes and their interrelations, including, for example, a description of micro-environmental (extracellular and intracellular) processes such as diffusion and reaction kinetics, a compartmental model of cells, a description of extracellular long-range interactions, a cellular Potts model (CPM) [6], cell cycle approximations (cdk/cdc, p27, p53), the influence of growth promoter/inhibitor factors on phenotype (EGF/R, VEGF/R pathways, hypoxia), and the influence of adhesion on phenotype (cadherin–catenin pathways, extracellular matrix) [7]. The framework also includes advanced tools for the visualization and editing of spatio-temporal simulations.
The main purpose of the development of these microscopic models within the ContraCancrum framework is to enhance our understanding of tumour dynamics on the microscopic level so that refinement of macroscopic tumour models can be achieved.
Figure 2 illustrates a three-dimensional output from the simulation of avascular spheroid formation after 20 days of growth, using CPM, reaction–diffusion (oxygen, nutrients and growth promoter/inhibitor factors), cell cycle, hypoxia and EGF/R pathway modules and a four-compartment model of a cell.
Figure 2. Output of simulation of an avascular spheroid formation after 20 days. View of XY, XZ, YZ cuts and three-dimensional view. Purple, dead cells; blue, quiescent cells; red, proliferating cells.
2.2. Mesoscopic–macroscopic cellular and higher level tumour dynamics module
This module includes the development of a set of multi-level simulation models of tumour growth and response to radiotherapy and chemotherapy for the cases of GBM and lung cancer in the patient-individualized context. Both discrete and continuous simulation models of tumour growth from a single tumour cell or an already grown tumour, as well as the tumour response to radiotherapy and chemotherapy, are considered and exploited within the framework of ContraCancrum [1,8–12]. In order for the models to be translatable into clinical practice, a thorough long-term clinical adaptation, optimization and validation procedure is being performed by the clinical partners of the University of Saarland hospital, Saarland, Germany. In parallel with the tumour response models, available toxicological data provide safety limits beyond which any candidate treatment scheme would be clinically unacceptable regardless of the tumour control predicted outcome.
Two major modelling approaches are being used. The first one is a continuum-based approach exploiting primarily diffusion theory [11]. The second one is a discrete entity/discrete event modelling approach based on the ‘top-down’ method, which exploits the potential of cellular automata, the Monte Carlo method, cell clustering into equivalence classes, as well as numerous dedicated algorithms [1,8–10]. In this way, both diffusion phenomena (e.g. on tumour invasion) and complex multi-scale biological mechanisms of a predominantly discrete character (e.g. symmetric and asymmetric stem cell division) are addressed. In the framework of the ContraCancrum project [2,13], the continuous approach is mainly used to simulate free tumour growth (in practice applicable to gliomas), whereas the discrete entity/discrete event approach is mainly used to predict tumour response to treatment. In that sense, the two approaches are to be viewed as complementary rather than mutually interchangeable.
According to the continuum approach, the tumour is considered as a spatio-temporal distribution of continuous cell density that follows the generic reaction–diffusion law [11] with sources and sinks. Therefore, the reaction–diffusion equation is used to couple diffusion and proliferation of glioma cancer cells in the brain. The developed diffusive model solves this equation in order to simulate the behaviour of glioma cells. The grid and the equation system (the equation is a second-order partial differential equation) are constructed by using several finite-difference numerical schemes, both explicit (e.g. forward Euler) and implicit (e.g. backward Euler, Crank–Nicolson) [14]. A conjugate gradient solver is used for solving the corresponding numerical system. The final result is the approximated concentration of glioma cells after a predefined time of diffusion.
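For orientation, a generic form of the reaction–diffusion law and of the time discretizations named above can be written as follows. This is a sketch under assumptions: the symbols $c$ (cell density), $D$ (diffusion coefficient), $\rho$ (net proliferation rate), $\theta$ and the discrete spatial operator $\mathcal{L}_h$ are introduced here for illustration only, and the linear proliferation term is one common choice rather than necessarily the form used in the ContraCancrum model.

```latex
\begin{align}
\frac{\partial c}{\partial t} &= \nabla \cdot \bigl( D(\mathbf{x})\, \nabla c \bigr) + \rho\, c, \\
\frac{c^{n+1} - c^{n}}{\Delta t} &= \theta\, \mathcal{L}_h c^{n+1} + (1 - \theta)\, \mathcal{L}_h c^{n}, \\
\bigl( I - \theta\, \Delta t\, \mathcal{L}_h \bigr)\, c^{n+1} &= \bigl( I + (1 - \theta)\, \Delta t\, \mathcal{L}_h \bigr)\, c^{n}.
\end{align}
```

Here $\mathcal{L}_h$ denotes the finite-difference approximation of $\nabla \cdot (D \nabla \,\cdot\,) + \rho$ on the grid. Setting $\theta = 0$ gives forward Euler, $\theta = 1$ backward Euler and $\theta = 1/2$ Crank–Nicolson. For $\theta > 0$ every time step requires the solution of the linear system in the last line, which is the kind of system an iterative solver such as conjugate gradient would typically handle.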
Furthermore, the model has been designed to simulate the heterogeneous and anisotropic migration of glioma cells, observed in real clinical cases [15,16]. More specifically, the diffusive model considers the inhomogeneous diffusion of cells in white and grey matter, by using local diffusion coefficients, which change according to the underlying tissue. Also, the model takes into account the anisotropic migration of glioma cells, which has been observed to be facilitated along white matter fibres, by converting these coefficients into 3 × 3 diffusion tensors. Finally, the model can receive therapy parameters as input, which adds a third term to the reaction–diffusion equation. This term expresses the efficiency of therapy in terms of the cells' death rate (owing to therapy) [14].
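The interplay of tissue-dependent diffusion, proliferation and a therapy-induced death term can be illustrated with a minimal one-dimensional sketch. It is not the project's solver: it uses the explicit forward Euler scheme only, a scalar diffusion coefficient per node instead of 3 × 3 tensors, a linear proliferation term and zero-flux boundaries, and all names (`simulate_glioma_1d`, `d_coeff`, `rho`, `kill_rate`) are hypothetical.

```python
def simulate_glioma_1d(c0, d_coeff, rho, kill_rate, dx, dt, steps):
    """Forward Euler update of dc/dt = d/dx(D dc/dx) + rho*c - kill_rate*c.

    c0        : initial cell density at each grid node
    d_coeff   : diffusion coefficient at each node (e.g. larger in white matter)
    rho       : proliferation rate (assumed linear growth term)
    kill_rate : death rate owing to therapy (the third term)
    Zero-flux boundaries are assumed, so cells cannot leave the domain.
    """
    n = len(c0)
    # Stability limit of the explicit scheme in one dimension
    if max(d_coeff) * dt / dx ** 2 > 0.5:
        raise ValueError("time step too large for the explicit scheme")
    c = list(c0)
    for _ in range(steps):
        # flux[i] is the diffusive flux at the interface between nodes i-1 and i;
        # flux[0] and flux[n] stay zero (zero-flux boundaries)
        flux = [0.0] * (n + 1)
        for i in range(1, n):
            d_face = 0.5 * (d_coeff[i - 1] + d_coeff[i])
            flux[i] = d_face * (c[i] - c[i - 1]) / dx
        c = [
            c[i] + dt * ((flux[i + 1] - flux[i]) / dx + (rho - kill_rate) * c[i])
            for i in range(n)
        ]
    return c
```

Writing the diffusion term in flux form keeps the total cell number conserved when `rho` and `kill_rate` are both zero, which is a convenient sanity check before tissue-specific coefficients are introduced.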
An example of simulating the free growth (without therapy input) of a glioblastoma is presented in figure 3. The graph depicts two approximated concentrations of glioma cells, which have been calculated along the same straight line at two different time points. This line intersects the initial imageable tumour centre, shown in the inset in figure 3. The first graph (lower, red) shows the initial concentration of glioma cells, as annotated by clinicians, while the second graph (upper, blue) shows the approximated concentration of cells after 100 fictitious days of cancer diffusion, using simulation parameters extracted from the bibliography [11].
Figure 3. Simulation of free glioblastoma growth along a straight line intersecting the initial imageable tumour [13]. Blue line, after 100 days; red line, initial.
The second approach uses the discrete-entity/discrete-event modelling of the In Silico Oncology Group (ISOG), based on the ‘top-down’ method, which exploits cellular automata, the Monte Carlo method, cell clustering into equivalence classes, as well as numerous dedicated algorithms [1,8–10]. In this way, both diffusion phenomena (e.g. on tumour invasion) and complex multi-scale biological mechanisms of a predominantly discrete character (e.g. symmetric and asymmetric stem cell division) are addressed. The discrete approach proved particularly amenable to parallelization, and a GPU-based implementation delivered a speed-up factor of more than 120 when compared with computation on a same-generation CPU [17].
In the ISOG discrete-entity/discrete-event model, the tumour is described as a spatio-temporal distribution of discrete cells (and cell death products) belonging to several proliferative potential categories and cell cycling phases. Transitions among the corresponding equivalence classes dictate the spatio-temporal course of the tumour. Pertinent clinical data are used in order to adapt, optimize and validate the discrete simulator. In summary, the discrete simulator module functions as follows.
— All available patient-specific data are collected. Macroscopic imaging data of the patient are collected at baseline, subsequently segmented by the clinician (delineation of tumour boundaries, necrotic areas, etc.), interpolated and three-dimensionally reconstructed. Molecular data (e.g. status, amplification and expression of critical genes) that have been shown to drastically affect the response of the tumour under consideration to the treatment addressed are provided. Estimates (even semi-qualitative) of their effect on the cell kill ratio (CKR) per cell owing to the treatment considered are also provided based on pertinent literature. The idea is to use a carefully chosen reference value of the CKR and then perturb it based on the specifics of the particular tumour case under consideration, in order to achieve higher patient individualization of the simulation.
— The inclusion of patient-specific information in the model is a multi-level procedure involving the direct assignment of specific values to some model parameters or tumour features (e.g. initial and final volume, dead cell population percentage and perturbations of CKR value according to the molecular profile of a tumour), as well as an indirect estimation of most model parameters through perturbation of their reference values while seeking full compliance of various virtual tumour characteristics (such as tumour volume, doubling time, growth fraction, percentages of various cell category subpopulations, etc.) with the corresponding actual clinical data [1,18]. Issues related to the spatio-temporal initialization of a virtual tumour have been analytically described in Stamatakos et al. [1]. As a first approximation, the distribution of the cell cycle duration over the various cell cycle phases is obtained as described in Stamatakos et al. [1], where the methodology for the tumour's cell populations' initialization is also provided. Similarly, an initial approach to the estimation of the mitotic potential cell categories, which relies on a combination of literature and a procedure of adapting the model to histopathological data, is presented in Stamatakos et al. [18].
— A discretization mesh is superimposed upon the three-dimensionally reconstructed tumour. The geometrical cells of the discretization mesh constitute the elementary spatial units of the problem. Within each geometrical cell, biological cells are clustered together based primarily on their mitotic potential, their cell proliferation phase and the treatment killing effect upon them. Based on their mitotic potential they may be stem cells (having theoretically infinite mitotic potential), progenitor cells (having limited mitotic potential that is defined by the number of mitoses they can still undergo), differentiated cells (with no further mitoses possible) and dead cells. Based on the cell proliferation phase in which they are found they may belong to the G1, the S, the G2, the mitosis, the G0, the necrosis or the apoptosis equivalence class. Based on the treatment killing effect they may be treatment hit or treatment non-hit cells.
— At each time step, the discretizing mesh is scanned and the basic cytokinetic, metabolic, pharmacokinetic/pharmacodynamic and mechanical rules that govern the spatio-temporal evolution of the tumour are applied. The outcome is an update of the spatio-temporal distribution of the tumour cells.
— A prediction of the spatio-temporal distribution of the tumour cell categories and the cell cycling phases for free tumour growth (no treatment) and/or tumour response to treatment (chemotherapy/radiotherapy) is obtained. Several forms of prediction visualization (graphs, three-dimensional and four-dimensional rendering, etc.) can be produced.
— The simulated outcome is compared with the actual imaging and other available clinical data following treatment.
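The clustering of biological cells into equivalence classes and their update at each scan of the mesh can be sketched for a single geometrical cell as follows. This is a deliberately reduced, deterministic (expected-value) illustration and not the ISOG simulator itself: the actual model is stochastic (Monte Carlo), resolves the individual cell cycle phases and applies many more rules. The function name, the dictionary layout, the one-cycle time step and the parameters `p_sym`, `ckr` and `diff_death` are all assumptions made for this example.

```python
def step_geometrical_cell(pop, p_sym, ckr, diff_death):
    """Advance the populations of one geometrical cell by one full cell cycle.

    pop        : {"stem": float,
                  "progenitor": [count with 1, 2, ..., K mitoses left],
                  "differentiated": float, "dead": float}
    p_sym      : fraction of stem cell divisions that are symmetric
    ckr        : fraction of proliferating cells killed by treatment in this step
    diff_death : fraction of differentiated cells dying in this step
    """
    k_max = len(pop["progenitor"])

    # Treatment: hit proliferating cells move to the dead class
    stem = pop["stem"] * (1.0 - ckr)
    prog = [n * (1.0 - ckr) for n in pop["progenitor"]]
    dead = pop["dead"] + ckr * (pop["stem"] + sum(pop["progenitor"]))

    # Stem cells: symmetric division -> 2 stem cells,
    # asymmetric division -> 1 stem cell + 1 progenitor with k_max mitoses left
    new_stem = stem * (1.0 + p_sym)
    new_prog = [0.0] * k_max
    if k_max > 0:
        new_prog[k_max - 1] += stem * (1.0 - p_sym)

    # Differentiated cells: a fraction dies, the rest persist
    new_diff = pop["differentiated"] * (1.0 - diff_death)
    dead += pop["differentiated"] * diff_death

    # Progenitors: each division yields two cells with one mitosis fewer;
    # cells using their last mitosis become differentiated
    for idx, n in enumerate(prog):  # idx 0 corresponds to 1 mitosis left
        if idx == 0:
            new_diff += 2.0 * n
        else:
            new_prog[idx - 1] += 2.0 * n

    return {"stem": new_stem, "progenitor": new_prog,
            "differentiated": new_diff, "dead": dead}
```

In a full simulation a function of this kind would be applied to every geometrical cell of the mesh at each time step, followed by rules that redistribute cells between neighbouring geometrical cells.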
Within the framework of ContraCancrum, the core of the ICCS In Silico Oncology Group (ISOG) discrete model has been adapted to the case of lung cancer neoadjuvant chemotherapy treatment with various combinations of cisplatin, gemcitabine, vinorelbine and docetaxel. The model has been applied up to now to a clinical dataset of 13 patients with primary lung cancer: nine cases of squamous cell carcinoma and four cases of adenocarcinoma. The patient-specific data that have been exploited by the model are the applied chemotherapeutic scheme (drugs, administration times) and the three-dimensional image of the tumour as reconstructed from computed tomography (CT) imaging data. The sets of imaging data were provided for two time instants—before and after the completion of the treatment. Owing to non-availability of proliferation indexes and data that could allow us to estimate the tumour growth rate in a patient-specific manner, an extensive literature review has provided biologically reasonable values for critical tumour dynamics features, such as tumour doubling time and growth fraction [19–24]. These values have been exploited in order to achieve an initial quantitative adaptation of the model to clinical reality. More specifically, the virtual tumour implemented was homogeneous with characteristics that fall within the value range reported in the literature, namely a volume doubling time of 200 days and a growth fraction equal to 40 per cent. The above values were achieved by properly adapting the model parameters related to free tumour growth. An excellent fit in terms of volumetric data has been achieved by adapting for each clinical case the CKR of the drugs involved. The suggested value of the CKR for each case study (the ‘apparent’ CKR) is the CKR that produces good agreement between the evolution of the simulated tumour and that of the real tumour according to the clinical data.
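The notion of an ‘apparent’ CKR can be made concrete with a toy fitting loop. The sketch below replaces the discrete simulator with a much cruder stand-in: exponential free growth with a fixed volume doubling time, and an instantaneous volume reduction by a fraction `ckr` at each drug administration. Under these assumptions, a bisection search finds the CKR that reproduces the post-treatment volume. The function names, the daily time step and the single effective CKR for the whole drug combination are hypothetical simplifications.

```python
def simulate_volume(v0, doubling_time, session_days, ckr, total_days):
    """Toy tumour volume model: exponential growth plus a kill at each session.

    v0            : tumour volume at baseline imaging
    doubling_time : volume doubling time in days (e.g. 200)
    session_days  : collection of day numbers on which a drug is administered
    ckr           : fraction of the tumour killed per session (assumed instantaneous)
    """
    daily_growth = 2.0 ** (1.0 / doubling_time)
    volume = v0
    for day in range(1, total_days + 1):
        volume *= daily_growth
        if day in session_days:
            volume *= 1.0 - ckr
    return volume


def fit_apparent_ckr(v0, v_observed, doubling_time, session_days, total_days,
                     tol=1e-6):
    """Bisection on ckr so that the simulated final volume matches v_observed."""
    low, high = 0.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        simulated = simulate_volume(v0, doubling_time, session_days, mid, total_days)
        if simulated > v_observed:
            low = mid   # too little cell kill: increase the CKR
        else:
            high = mid  # too much cell kill: decrease the CKR
    return 0.5 * (low + high)
```

Bisection works here because the simulated final volume decreases monotonically with the CKR; with the full stochastic simulator a more general optimization loop would presumably be needed.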
Regarding GBM, the collection of multi-scale directly exploitable data has turned out to be a very difficult process. The main reason is that neo-adjuvant radio- or chemotherapeutic treatment, which would be convenient for tumour evolution simulation runs, is a rare treatment choice. In most cases, no imageable tumour exists after surgery so that the in silico response to treatment could be compared with its in vivo counterpart. Up to now there have been four exploitable GBM multi-scale datasets identified, and further collection is underway. These clinical cases involve treatment with a combination of radiotherapy and temozolomide.
The basic philosophy of the ISOG discrete models clinical adaptation strategy has been recently described in Stamatakos et al. [18], in which a real clinical case serves as a proof-of-principle case study, demonstrating the basics of an ongoing clinical adaptation process. A comparison of the simulation results with clinical data, in terms of both volume reduction and histological constitution of the tumour (e.g. percentage of dead cells, percentage of differentiated cells), dependent each time on the available actual data of each clinical case, lies at the heart of the procedure. A thorough study of the literature relating to each tumour type precedes the simulations, so as to define—in conjunction with accumulated basic science and clinical experience—plausible reference values and value ranges of the various model parameters. The concurrent constraints imposed by both the actual multi-scale clinical data and the literature-derived plausible value ranges for the model parameters drastically narrow the window of possible solutions to each clinical adaptation problem.
A set of clinical cases (including imaging, histopathological and molecular data) is used for the clinical adaptation of the model, whereas another independent set is used for clinical validation of the models. Clinical adaptation results for lung cancer and GBM within the framework of the ContraCancrum project will be the subject of a dedicated paper.
2.3. The biomechanical simulator
The objective of the biomechanical simulator is to consider the mechanical environment inside and outside the tumour, which is of significant importance especially when there is a marked variation in the mechanical properties of the various surrounding tissues. Information about tumour growth is obtained at the molecular and cellular levels and fed into the biomechanical model. The mechanical information obtained is then transmitted back to the cellular simulator and drives the spatial evolution of the tumour shape and volume.
A fully automatic meshing algorithm for the different tissues has been built for this purpose.
The main challenge centres around the selection of a robust meshing technique able to automatically provide reliable finite-element meshes of the tumour as well as related healthy tissues. For this reason, a voxel mesh approach, which is extremely efficient in terms of time, has been enhanced with a smoothing algorithm to improve the accuracy of the stress/strain calculations. Results show that the output of the voxel mesh is significantly improved with the smoothing algorithm, while keeping the time needed to produce the mesh short [25].
Based on this mesh, a continuum finite-element model is proposed to simulate the tumour, its growth and the mechanical perturbations induced on the surrounding healthy tissues. A new mechanical model has been developed to model the sources and sinks of matter linked to tumour growth and shrinkage. The change in volume is modelled as a uniform strain added in the three main directions in the elastic formulation of the element. This mechanical law has been applied to the elements used to model the tumour. Stresses in the tumour and in the healthy tissues are then calculated according to the change in volume of the tumour elements.
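As a small worked illustration of the uniform strain idea, the relation between a volume change of a tumour element and an equal strain in the three main directions can be sketched as below. This assumes that the element is stretched by the same factor along each axis, so that the volume ratio equals (1 + strain) cubed; whether the project's formulation uses this finite-strain relation or its small-strain limit (strain ≈ ΔV / 3V) is not stated in the text, and the function name is hypothetical.

```python
def isotropic_growth_strain(v_old, v_new):
    """Uniform strain per direction that changes an element's volume from v_old to v_new.

    Assumes equal stretch along the three main directions:
    v_new / v_old = (1 + strain) ** 3.
    Positive values correspond to growth, negative values to shrinkage.
    """
    if v_old <= 0.0 or v_new <= 0.0:
        raise ValueError("volumes must be positive")
    return (v_new / v_old) ** (1.0 / 3.0) - 1.0
```

For a 3 per cent volume increase this gives a strain of roughly 1 per cent per direction, consistent with the small-strain approximation.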
The mechanical information obtained in this manner is transmitted back to the cellular simulator, leading to the coupling of the cellular and biomechanical simulations. On the one hand, the mesoscopic cell simulator (described in §2.2) uses information about the direction along which new tumour cells will spread, based on the pressure gradient in the surrounding tissue. On the other hand, the mechanical simulation needs information on the amount by which individual geometrical cells will expand which is, in turn, provided by the cell concentrations calculated by the cellular simulator. Initial results showed that biomechanical information leads to a 20 per cent correction of the tumour shape in terms of the ratio of smallest to largest moment of inertia compared with simulation performed without biomechanical simulations. For illustrative purposes, figure 4 shows biomechanical simulation results in a lung cancer case.
Figure 4. (a) Biomechanical simulations in lung cancer data with different colours representing the different anatomical structures used for the simulation. (b) Calculated magnitude of displacement (colour-coded) for the growth of a tumour in the lung.
2.4. Biomolecular (biochemical) simulations and molecular determinants of response to therapy
A new generation of anti-cancer drugs has recently been introduced to the clinic, each targeting a specific molecule that plays a crucial role in tumour growth. One class of these drugs is the tyrosine kinase inhibitors (TKIs), which interfere with receptor tyrosine kinases; these are usually overexpressed or overactive in tumour cells. This modelling component focuses on the events taking place on the molecular level. A simulator has been built to provide a better understanding of treatment response to various drug therapy regimens. It has been used for free energy ranking of the binding of TKIs to the epidermal growth factor receptor (EGFR), and for assessing the conformational stability of the EGFR upon mutation.
Using our biomolecular simulator, we have investigated the binding affinities of two TKIs—AEE788 and gefitinib—to wild-type and four mutant EGFRs. Multiple short (ensemble) molecular dynamics simulations have been performed for each inhibitor–EGFR complex on high-performance computing resources on the US TeraGrid (www.teragrid.org) and the EU's DEISA (www.deisa.eu). Structural and energetic analyses indicate that converged sampling is reached with respect to the energy minima. A reasonable correlation has been obtained between the calculated drug-binding affinities and available experimental data [26]. The simulations reveal how interactions change as a result of mutations, and account for the molecular basis of drug efficacy. The free energy calculations show that the simulator is able to rank binding affinities of one drug to multiple EGFR mutants, as well as the efficacy of drugs with respect to a single EGFR sequence [26]. The results indicate that the molecular-level simulator is able to identify drug treatments better suited to an individual's specific genotypes. It can therefore be expected to have an increasing impact in personalized drug treatment of targeted therapy as patients become more frequently subjected to genotypic assays as part of the standard routine.
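The ranking step of an ensemble calculation can be illustrated with a short sketch: the binding free energy estimates from the individual short simulations (replicas) of each complex are averaged, and the complexes are ordered from strongest to weakest predicted binding. This shows only the bookkeeping, not the free energy method itself; the function name and the input layout are assumptions, and any numbers passed to it in an example are hypothetical rather than results from the study.

```python
from statistics import mean, stdev


def rank_by_binding_free_energy(replica_energies):
    """Rank complexes by ensemble-averaged binding free energy.

    replica_energies : {complex label: [free energy estimate per replica]}
    Returns a list of (label, mean, standard error) tuples sorted so that the
    most negative mean (strongest predicted binding) comes first.
    """
    summary = []
    for label, values in replica_energies.items():
        if len(values) > 1:
            std_err = stdev(values) / len(values) ** 0.5
        else:
            std_err = 0.0  # no spread can be estimated from a single replica
        summary.append((label, mean(values), std_err))
    return sorted(summary, key=lambda row: row[1])
```

Reporting the standard error alongside the mean makes it visible when two complexes cannot be distinguished within the sampling uncertainty of the ensemble.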
Mutations can also change the activity of the protein directly. For example, the drug-resistant mutation T790M stabilizes the active conformation of the EGFR, which leads to overactivity. We have used the biomolecular simulator to study the changes of conformational stability upon mutations [27]. The wild-type and L858R mutant EGFRs were simulated for 200 ns, in both the active and inactive states. The simulations clearly show that the wild-type EGFR prefers the inactive state. This result is in agreement with experiments that indicate that the EGFR remains dormant under most physiological conditions [27]. The mutation changes the stabilities of both the active and inactive conformations, and shifts the equilibrium between the two conformations. The mutant EGFR also displays the initial steps of a conformational transition between the two conformations, while the wild-type EGFR remains in its initial state throughout the course of the simulations [27]. The overall conformational transition is expected to occur on microsecond timescales, which we are planning to investigate on Anton, a purpose-built machine for molecular dynamics simulation.
The simulator, called the binding affinity calculator (BAC) [28], is an automated workflow tool that performs rapid simulations and analyses across multiple supercomputing resources. Rapid turnaround is achieved by use of the workflow tools developed in this project (see section ‘Integrated ContraCancrum technical environment’). Our results show that the simulator can accurately rank drug-binding affinities at clinically relevant timescales, and offer real-time support for clinical decision-making [26].
2.5. Biomolecular (molecular network level) simulations
Another molecular-level model aims to provide a statistical model of the individual response to therapy. This component defines the means to incorporate molecular information within the context of the in silico simulation of patient-specific therapy by modifying the cell survival probability within the tissue-level component. The molecular state of the tumour is a key factor in the therapeutic outcome.
With recent developments in the field of high-throughput molecular profiling, it is now practical to consider the contribution of an individual patient's molecular profile to their therapeutic outcome. Using drug growth inhibition 50 (GI50) or radiation 50 per cent lethal dose (LD50) in vitro sensitivity data, coupled with microarray expression data from a panel of cell lines, we have identified signatures of sensitivity and resistance to these cytotoxic therapeutic modalities. A statistical model is constructed based on the correlation between gene expression and therapy-induced cytotoxicity. The known molecular profiles of the tumour type under consideration (e.g. GBM) are classified from the treatment (e.g. temozolomide) responsiveness standpoint into either three groups (sensitive, intermediate and resistant) or a more continuous set of sensitivity grades. This grouping is used in order to perturb the population-based average values of the CKR or equivalently the cell kill probability (CKP) or the survival fraction (SF) so that molecular personalization of the multi-scale model is achieved. The quantitative extent of the perturbation is determined by starting with an empirically plausible fraction of the CKR (e.g. +1/3 CKR) to be added to the CKR in the case of a sensitive tumour or to be subtracted from the CKR (e.g. −1/3 CKR) in the case of a resistant tumour, and by subsequently applying an optimization loop. This statistical model is then used to assess a given patient's tumour profile, providing an estimation of cellular therapy response.
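The starting point of the perturbation described above can be written down in a few lines. The sketch covers only the three-group case and the initial ±1/3 shift; the subsequent optimization loop is not shown. The function name, the group labels as strings and the clamping of the result to the interval [0, 1] are assumptions added for this example.

```python
def personalised_ckr(reference_ckr, sensitivity, fraction=1.0 / 3.0):
    """Initial molecular perturbation of a population-based reference CKR.

    reference_ckr : population-average cell kill ratio
    sensitivity   : "sensitive", "intermediate" or "resistant"
    fraction      : relative shift applied to the CKR (1/3 as a starting value)
    """
    shifts = {"sensitive": +fraction, "intermediate": 0.0, "resistant": -fraction}
    if sensitivity not in shifts:
        raise ValueError("unknown sensitivity group: " + str(sensitivity))
    ckr = reference_ckr * (1.0 + shifts[sensitivity])
    # Assumption: a kill ratio is a fraction, so keep it within [0, 1]
    return min(max(ckr, 0.0), 1.0)
```

For a reference CKR of 0.3, this yields 0.4 for a sensitive tumour and 0.2 for a resistant one, values that an optimization loop would then refine against the clinical data.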
2.6. Image analysis modules
ContraCancrum has developed the necessary image analysis components for in silico modelling of tumours with the aim of extracting as much personalized pathophysiological information as possible from each patient's medical imaging data. Such data are both multi-modal (including T1/T2 MRI and positron emission tomography (PET)/CT) and temporal (the patient is scanned at least once before and after treatment). A variety of powerful tools are provided to enable the clinician to segment multi-modality images of tumours for both applications—gliomas as well as lung cancer. Because sophisticated image analysis methods always depend on the organ of interest as well as on the image modality used, specialized algorithms for both clinical applications have been developed.
Segmenting and labelling brain tissues in tumour-bearing images are difficult tasks. We chose to segment brain volumes implicitly by applying an atlas to the patient image. Atlas-based segmentation has the advantage of being robust while also providing other relevant information, for example on subcortical structures and fibre maps. To adapt the standard approach to tumour images, a Markov random field (MRF)-based tumour growth method has been developed, which introduces a tumour seed into the atlas and grows the lesion to its approximate shape, displacing the surrounding tissues according to their biomechanical properties. In a final step, a non-rigid Demons registration accounts for the accurate mapping of the atlas labels to the patient image. The method is presented in detail in Bauer et al. [29] and Bauer & Reyes [30] and was shown to be accurate, featuring state-of-the-art Dice coefficients of approximately 0.8 for white matter and grey matter. Results on one patient image from the ContraCancrum database are shown in figure 5.
Figure 5. (a) Axial slice of the patient image. (b) Atlas after MRF-based tumour growth. (c) Registered atlas. (d) Labelled patient image.
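For reference, the Dice coefficient quoted for the atlas-based segmentation is the standard overlap measure 2|A ∩ B| / (|A| + |B|) between two label regions A and B. The sketch below computes it for segmentations represented as sets of voxel indices; this representation and the function name `dice_coefficient` are chosen purely for illustration and are not taken from the cited methods.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]


def dice_coefficient(region_a: Iterable[Voxel],
                     region_b: Iterable[Voxel]) -> float:
    """Dice overlap between two segmentations given as voxel index sets.

    Returns 1.0 when both regions are empty, which is a common
    convention rather than part of the definition.
    """
    a, b = set(region_a), set(region_b)
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2.0 * len(a & b) / total


if __name__ == "__main__":
    automatic = {(0, 0, 0), (0, 1, 0), (1, 1, 0)}
    reference = {(0, 0, 0), (1, 1, 0), (1, 2, 0)}
    print(dice_coefficient(automatic, reference))
```

A value of 1 indicates identical regions and 0 indicates no overlap, so a score of approximately 0.8 corresponds to substantial but not perfect agreement with the reference labelling.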
For the glioma application, a novel segmentation technique extending the traditional adaptive snakes algorithm and taking into account spatial image information has been proposed [31]. The method outperforms traditional snakes with an average overlap with the expert clinician's annotation of 89 per cent, while traditional snakes were at 82.5 per cent and region growing at 59.2 per cent [31]. Figure 6 shows the results of spatially adaptive active contours on brain tumour segmentation.
Figure 6. Results of the spatially adaptive active contours on semi-automatic tumour detection on four different magnetic resonance images of glioma cases.
A large number of the mentioned imaging functionalities as well as others, including registration and resampling, have been integrated in the DrEye tool that is available free to the scientific community [32].
In the case of lung cancer, the image analysis tools developed within ContraCancrum aim at registering time series of PET/CT images as well as segmenting tumour and normal tissue in both CT and PET images. We use a fast multi-resolution rigid registration in order to initialize a local block matching in CT images in the region around the tumour [33]. The tumour segmentation is done semi-automatically with little user interaction. A segmentation result in one image of a time series can be propagated to images at other time points and adapted automatically. In order to correct for any segmentation errors, tools are provided to interact with the segmentation result and to adapt it manually [33].
2.7. Simulation module interconnections
The simulation modules have been integrated to form the composite simulator (integrator), which will ultimately perform the simulation tasks as submitted by the end users. Integration across biocomplexity scales is achieved by applying the ‘summarize and jump’ strategy [1]. The method starts from the macroscopic imaging data (a high biocomplexity level) and proceeds towards lower biocomplexity levels. When there is a need for an upwards movement in the biocomplexity scales, a summary of the available information pertaining to the previous lower level is used, most commonly in the form of appropriate parameter value perturbations.
Logical and technical validations of the composite simulator have been performed before the initiation of the clinical testing, optimization and validation procedure.
The input and output of the various simulators as well as their interconnections are basically as follows:
— Biochemical simulator {INPUT: receptor protein mutations, candidate targeted drugs; OUTPUT: sorting of candidate drugs based on their binding affinities and selection of the apparently best drug for a given patient}. At this early stage, the connection of the biochemical simulator with the rest of the simulator modules is to be seen only as a demonstrator of a future scenario in which the selection of the optimal drug in the targeted therapy context for a given patient would be based on in silico experimentation. According to that scenario, the drug properties and data would feed the molecular network simulator, which would provide estimates of the CKR to be input into the cellular and higher level simulators.
— Molecular network-level simulator {INPUT: molecular profile of tumour; OUTPUT: CKR}. The CKR will be the input into the cellular and higher level simulators.
— Normal tissue toxicity limits, based on available phase I clinical trial outcomes, are used to avoid in silico experimentation involving treatment doses that would be forbidden owing to extreme toxicity.
— Cellular and higher level simulators (microscopic and/or mesoscopic–macroscopic approaches) {INPUT: processed multi-scale data referring to tumour cell density, cell cycling, mitotic potential of the various cell categories, neovascularization/necrosis field, treatment data and CKR; OUTPUT: spatio-temporal prediction of the tumour constitution regarding cell phase and mitotic potential}. The latter may be the input into the biomechanical simulator if a refined prediction of tumour morphology is sought. The output (with or without the utilization of the biomechanical simulator) is the input into the image analysis modules for visualization and clinical adaptation and validation purposes.
— Biomechanical simulator {INPUT: tumour cell concentration distribution in space at any simulated instant; OUTPUT: updated macroscopic morphology of the tumour}.
— Image analysis modules {INPUT: tomographic slices or simulation output files; OUTPUT: three-dimensional reconstruction of the tumour and its internal structure}.
It is noted that the simulator and technological modules can be interconnected in either a bottom-up or a top-down sense, depending upon whether the user is seeking a new in silico prediction or a model clinical adaptation and/or a validation procedure.
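The input/output chain listed above can be summarized schematically for the forward (prediction) direction. The sketch below is illustrative only: `SimulationRequest`, `run_pipeline` and all parameter names are hypothetical, each simulator is passed in as a plain callable, and the toxicity check is reduced to a single dose threshold for simplicity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass
class SimulationRequest:
    """Hypothetical container for the patient-specific inputs."""
    molecular_profile: Mapping[str, float]
    imaging_data: Any
    dose: float
    max_tolerated_dose: float  # normal tissue toxicity limit


def run_pipeline(request: SimulationRequest,
                 molecular_sim: Callable[[Mapping[str, float]], float],
                 tissue_sim: Callable[[Any, float, float], Any],
                 image_module: Callable[[Any], Any],
                 biomech_sim: Optional[Callable[[Any], Any]] = None) -> Any:
    """Chain the simulators in the order described in the text."""
    # Toxicity limits exclude doses that would be clinically forbidden.
    if request.dose > request.max_tolerated_dose:
        raise ValueError("dose exceeds the normal tissue toxicity limit")
    # Molecular network level: tumour molecular profile -> CKR.
    ckr = molecular_sim(request.molecular_profile)
    # Cellular and higher levels: imaging data, treatment, CKR -> tumour state.
    tumour_state = tissue_sim(request.imaging_data, request.dose, ckr)
    # Optional biomechanical refinement of the predicted morphology.
    if biomech_sim is not None:
        tumour_state = biomech_sim(tumour_state)
    # Image analysis modules: reconstruction for visualization and validation.
    return image_module(tumour_state)


if __name__ == "__main__":
    request = SimulationRequest({"gene_x": 1.0}, "slices", 2.0, 3.0)
    result = run_pipeline(
        request,
        molecular_sim=lambda profile: 0.5,
        tissue_sim=lambda img, dose, ckr: {"cells": 100 * (1 - ckr)},
        image_module=lambda state: ("reconstruction", state),
    )
    print(result)
```

Passing the simulators as callables keeps the sketch independent of any particular implementation; the stub functions in the usage example merely stand in for the real modules.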
3. Integrated ContraCancrum e-science environment
One of the problems faced by ContraCancrum, the sharing of clinical data, presents a significant hurdle both to incorporating patient-specific medical simulation into clinical practice and to facilitating research using those data. The data sources held by hospitals represent a major resource that is currently not adequately exploited, by either researchers or clinicians. At the heart of the matter is the need to gain access to these distributed data sources in a routine, transparent way, while remaining subject to appropriate anonymization and security procedures. While solutions exist to enable access to federated, distributed data sources, in many cases these are neither appropriate nor acceptable to a hospital, nor sufficiently generic to be used in anything other than the narrow scenarios for which they were originally developed.
In ContraCancrum, several different classes of users need to gain access to these clinical data in order to run the various simulation techniques described in this paper. They do this through the IMENSE, which provides a central platform for researchers from which these data can be accessed, and from which simulation tools can be launched via Web services and orchestrating workflow tools. One of the initial tasks carried out in the ContraCancrum project was a requirements gathering exercise to assess the capabilities that the technical environment must offer in order to meet the needs of researchers.
This involved clinicians, scientists and IT specialists and led to the design of the IMENSE system [34], which comprises three main components. Firstly, IMENSE provides a data warehousing system, hosted at the Centre for Computational Science at University College London (UCL), London, UK, in which anonymized patient data can be stored. These data fall into three broad categories: imaging data, stored in the DICOM format, structured clinical records, and file-based data, generated by the different types of simulation used in the project. The data environment ties together these different types of data after pseudonymization through the use of unique patient identifiers (and, in future, through ontologies [35]). This means that a user of the data environment can query and view multiple different data types held on a single patient or on a population. The second component of IMENSE is a set of Web services, which provide a standard way to access different simulation tools, including the ability to segment medical images and launch simulations on high-performance computing resources. The third component of IMENSE is a workflow engine, which couples together data resources and simulation services to automate the processing of the different types of data available, and to tie different simulation scales together. All components of the technical environment are accessed through a Web portal, which manages access policies and presents data and results to users.
To address the aforementioned clinical data access problems, the data-sharing component of the IMENSE system implements a range of security measures so that the hospital IT managers (the information controllers) can be assured that their data are adequately protected; patient data are managed in compliance with relevant data protection legislation (i.e. the UK Data Protection Act and the EU Health Directive [36]). A clinician or clinical worker identifies a set of data to be shared and ensures, with the assistance of the hospital's data manager, that the data are curated to an acceptable level. The data are then ‘pushed’ from the hospital system to a central project data repository. This approach has been successfully used in the GENIUS project [37], which involves pushing X-ray CT and magnetic resonance angiography (MRA) images from an NHS hospital in the UK to the Centre for Computational Science at UCL. This removes the need to punch inbound holes in hospital firewalls when creating federated databases.
A user wanting to gain access to the data can do so by retrieving it from the central repository. The curation stage can be quite labour intensive, but it is necessary for the system to be usable. The use of data-checking algorithms and ontologies (used to map between disparate datasets) can help to alleviate the problem.
A common feature of computational biomedical simulation scenarios such as those considered in ContraCancrum is the need to perform many tasks in a specific order to achieve a desired end result, ultimately perhaps a clinical decision. Typically, a scenario will involve data acquisition, pre-processing using low-end computational resources such as workstations, simulation using high-performance computational resources, and post-processing using high- or low-end resources. Such steps can present stumbling blocks for even experienced computational scientists; if these techniques are to be taken up by clinical researchers, and ultimately clinicians, then they need to be automated. Developments in e-Science have seen the emergence of workflow toolkits [38,39], designed to connect together discrete processing steps using distributed resources and to provide a simple interface from which to create and launch a workflow. Workflows are essential to the entire Virtual Physiological Human (VPH) effort [40], as they are to many other medical computing scenarios beyond the scope of the VPH; a major goal of VPH research is to integrate simulations at different levels, which is essentially a workflow scenario [41].
The ContraCancrum ‘virtual laboratory’ consists of a number of different simulation techniques that can be usefully employed by the clinical oncologist and that are described in §2. All of these simulation paradigms have one thing in common: they are driven by the use of clinical datasets supplied by the clinical partners in the project. The simulation components must be automatically connected and should access datasets from the ContraCancrum technical environment. The logical way to do this is through a workflow tool, which allows multiple computational components to be linked together and accessed as a single application.
There are many such workflow tools available, but the one that most closely meets the needs of this project is GSEngine; this was originally developed in the EU FP6 ViroLab project [38] and now forms part of the VPH ToolKit [39]. GSEngine provides facilities to develop workflows using the Ruby scripting language, and then to execute them through a remote runtime engine. This is coupled with a workflow repository system, which means that workflows can be developed and used following a community model—workflows are first developed by ‘expert’ users or developers in a community and placed in the workflow repository. End users can then access and execute the workflows in the repository by specifying a few required parameters, without worrying about the details of how the workflow is constructed. Different interfaces exist for workflow development and execution, shielding users from the internals of how the workflow works. This fits the model of the IMENSE system well, since researchers within the community develop domain-specific, clinically relevant workflows, which are then executed by clinicians and other users for clinical and research purposes.
We have deployed the components of GSEngine as part of the IMENSE. GSEngine is connected to the Application Hosting Environment (AHE) [42] server in order to launch computational simulations across a range of distributed computing resources, and to the data environment in order to access the clinical and imaging data on which the simulations are based. Security of data and access to remote resources are provided via the Audited Credential Delegation [43,44] extension to the AHE.
4. Multiscale cancer model integration in the clinical context: a clinical decision-support scenario
Provided that the system has been validated (retrospectively and prospectively) for a specific tumour type, the simulator executes the simulation code for a number of candidate treatment schemes. The clinician uses the results in conjunction with his/her clinical experience and specialist knowledge to decide upon the optimal treatment scheme. Subsequent comparison of the predictions with the real outcome provides feedback for further optimization of the ContraCancrum integrated simulator.
More importantly, the integration across different scales is driven by the planned clinical decision-support scenario. This is illustrated in figure 7.
Figure 7. The envisaged multi-scale predictive oncology clinical decision-support scenario. The clinician can use molecular models for drug-binding ranking and patient sensitivity to a given therapy. The tissue-level oncosimulator provides four-dimensional simulations of patient response to therapy (including molecular-level simulations) predicting the therapy response which can then be evaluated (optimized) against the actual outcome.
The clinician introduces multi-scale patient-specific data (molecular, clinical, treatment and imaging) to the ContraCancrum simulators. The molecular-level models (as described in §§2.4 and 2.5) allow the clinician to use drug-binding affinity models for ranking drug-binding affinities. At the same time, the clinician can query the drug-sensitivity model with a patient's tumour transcriptome profile and estimate a priori sensitivity (either binary or continuous as discussed above). This gives a direct input to the tissue-level oncosimulator (described in §2.2) defining CKPs for the candidate therapy questioned.
In the clinical decision-support layer, the set of responses to potential treatments (i, i + 1, … ) is simulated by the oncosimulator to provide a four-dimensional therapy outcome. It might also be possible to run these simulations automatically with a variety of drugs that are chosen beforehand. In this way, the clinician runs the oncosimulator for all candidate therapies to define the ‘best’ for the individual patient. This, in turn, is validated against the actual therapy outcome in the optimization layer where the predictive oncology results are compared against the actual experimental results. Any deviations from the actual response to therapy may be used to refine and optimize the models.
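The decision-support and optimization layers described above amount to a simple loop: simulate every candidate scheme, rank the predictions, and later compare the chosen prediction with the observed outcome. The sketch below illustrates this under explicit assumptions: the function names are hypothetical, the simulator is any callable returning a single scalar (here taken to be a predicted residual tumour volume), and "best" is taken to mean the smallest predicted value. The clinical choice of criterion is not specified in the text.

```python
from typing import Any, Callable, List, Mapping, Tuple


def rank_candidate_therapies(
        candidates: Mapping[str, Any],
        simulate: Callable[[Any], float]) -> List[Tuple[str, float]]:
    """Simulate each candidate scheme and sort by predicted residual volume.

    `candidates` maps a scheme name to whatever description the simulator
    accepts; the scheme with the smallest prediction comes first.
    """
    predictions = {name: simulate(scheme) for name, scheme in candidates.items()}
    return sorted(predictions.items(), key=lambda item: item[1])


def relative_deviation(predicted: float, actual: float) -> float:
    """Signed relative error of a prediction against the observed outcome."""
    if actual == 0:
        raise ValueError("actual outcome must be non-zero")
    return (predicted - actual) / actual


if __name__ == "__main__":
    # Toy simulator: each scheme is described only by an assumed kill fraction.
    schemes = {"scheme_a": 0.5, "scheme_b": 0.75}
    ranking = rank_candidate_therapies(schemes, lambda kill: 100.0 * (1.0 - kill))
    best_name, best_prediction = ranking[0]
    print(best_name, best_prediction)
    # Feedback for the optimization layer once the real outcome is known.
    print(relative_deviation(best_prediction, actual=30.0))
```

In practice the ranking would only inform the clinician, who retains the decision; the deviation computed afterwards is the quantity that would feed back into model refinement.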
5. Conclusions
ContraCancrum aims to instigate the clinical translation of predictive in silico oncology, for which a supportive and interdisciplinary environment is essential. The mission is to bring to life a novel but also feasible plan for translating multi-level cancer modelling into a clinical setting and to prove its worth, thereby establishing it as an important part of the VPH vision. Through the development of integrated multi-scale cancer models, the ContraCancrum integrated strategy is expected to contribute to the advancement of in silico oncology through the optimization of cancer treatment in the patient-individualized context by simulating the response to various therapeutic regimens.
In this paper, a paradigm for designing clinically driven multi-scale cancer models by combining multi-scale basic biomedical science with information technology has been delineated. While it is hard (and in some cases unfeasible) to integrate all ContraCancrum components into a single piece of software, several integrated components (oncosimulators) have been achieved, as described in §2.7. However, the most important novelty is the presented systemic integration of cancer modelling components under tight clinical expert supervision and interaction. This integration is highlighted in the proposed predictive oncology clinical decision-support scenario presented in §4 that draws a novel predictive medicine workflow in the context of model-assisted clinical decision support for oncology.
The ultimate goal of the work presented here is an integrated predictive in silico oncology environment that can be adapted, optimized and validated within the wider clinical oncology environment. The anticipated impact is a contribution to the optimization of personalized cancer treatment strategies, while the expected socio-economic impact is an alleviation of the societal burden caused by cancer.
Acknowledgements
This work is partially supported by the European Commission under the project ‘ContraCancrum: Clinical Oriented Translational Cancer Multi-level Modelling’ (FP7-ICT-2007-2-223979).
Footnotes
One contribution of 17 to a Theme Issue ‘The virtual physiological human’.