Mechanistic description of spatial processes using integrative modelling of noise-corrupted imaging data
Abstract
Spatial patterns are ubiquitous on the subcellular, cellular and tissue level, and can be studied using imaging techniques such as light and fluorescence microscopy. Imaging data provide quantitative information about biological systems; however, mechanisms causing spatial patterning often remain elusive. In recent years, spatio-temporal mathematical modelling has helped to overcome this problem. Yet, outliers and structured noise limit modelling of whole imaging data, and models often consider spatial summary statistics. Here, we introduce an integrated data-driven modelling approach that can cope with measurement artefacts and whole imaging data. Our approach combines mechanistic models of the biological processes with robust statistical models of the measurement process. The parameters of the integrated model are calibrated using a maximum-likelihood approach. We used this integrated modelling approach to study in vivo gradients of the chemokine (C-C motif) ligand 21 (CCL21). CCL21 gradients guide dendritic cells and are important in the adaptive immune response. Using artificial data, we verified that the integrated modelling approach provides reliable parameter estimates in the presence of measurement noise and that bias and variance of these estimates are reduced compared to conventional approaches. The application to experimental data allowed the parametrization and subsequent refinement of the model using additional mechanisms. Among other results, model-based hypothesis testing predicted lymphatic vessel-dependent concentration of heparan sulfate, the binding partner of CCL21. The selected model provided an accurate description of the experimental data and was partially validated using published data. Our findings demonstrate that integrated statistical modelling of whole imaging data is computationally feasible and can provide novel biological insights.
1. Introduction
In the past decades, our understanding of biological processes has been revolutionized by imaging technologies. Nowadays, super-resolved fluorescence microscopy [1], light sheet fluorescence microscopy [2], cryo-electron microscopy [3] and other technologies provide information about cell and tissue structures over a broad range of scales. Multiplexed information about intracellular processes is, for instance, provided by matrix-assisted laser desorption/ionization imaging mass spectrometry [4] and mass cytometry [5]. These imaging data are analysed using tailored image processing pipelines to quantify properties of interest (see [6] and references therein). This provides detailed information about the imaged system, e.g. biological tissues. Yet, mechanisms often remain elusive; for instance, it is usually not evident from imaging data how the observed spatial patterns are established and controlled. However, such insights are necessary to improve the understanding of complex biological systems [7,8].
Model-based approaches have been introduced to unravel the mechanisms underlying the spatio-temporal organization of tissues [9,10]. Partial differential equation (PDE) models and agent-based models which capture static and dynamic properties of tissue-scale images have been developed [10–13]. These models can describe the underlying biological mechanism and allow for the evaluation of competing biological hypotheses.
Modelling and hypothesis testing, however, mostly employ qualitative information [10] or summary statistics [12–15]. Qualitative information is used due to limited image quality caused, among other factors, by limitations of labelling methods. Summary statistics are considered as they are easy to assess using available processing pipelines. Although qualitative abstractions and summary statistics provide only a fraction of the information encoded in the images, they are widely used. A key reason is the use of sequential analysis approaches (figure 1a) which exploit established image processing pipelines.
Figure 1. Illustration of data-driven modelling in image-based systems biology. (a) Sequential analysis relying on image processing and extracted features. (b) Integrated modelling approach combining image processing, information retrieval and modelling in a single step. (Online version in colour.)
In this paper, we propose an integrated modelling approach for imaging data (figure 1b). The proposed framework combines image processing with the mechanistic description of the biochemical process using PDE models, instead of performing these steps sequentially. To account for outliers and structured measurement noise, e.g. signals generated by biological processes not considered in the model, we employ concepts from robust regression [16,17]. The integrated modelling approach facilitates the simultaneous assessment of the quality of the imaging data, the filtering of outliers and artefacts, and the mechanistic modelling of the biological process. As this integrated framework circumvents preprocessing and the extraction of summary statistics, it avoids a potential information loss and provides a tailored, unbiased filtering. By avoiding the tuning of parameters in the preprocessing, the approach furthermore simplifies the workflow and promises an improved reproducibility of analysis results.
We implemented the integrated modelling approach and assessed it by studying artificial and experimental data for the formation of gradients of the chemokine (C-C motif) ligand 21 (CCL21), a process relevant in the immune response. Using this process, we demonstrate the loss of information associated with the use of summary statistics as well as the influence of structured noise on estimation results. Subsequently, we demonstrate how the integrated modelling framework facilitates the direct use of noise-corrupted whole imaging data. We exploit the integrated framework to generate novel hypotheses regarding the underlying biochemistry, which are partially validated using data from the literature.
2. Methods
In this paper, we present an integrated modelling approach for tissue-scale imaging data. In the following, we outline the considered modelling approaches, data types, and inference methods.
2.1. Mechanistic model of spatio-temporal biological processes
We consider spatio-temporal biological processes described by reaction–diffusion equations—a class of PDE models. Reaction–diffusion equations are widely used in systems and computational biology, for instance, to capture the dynamics of intra- and extracellular substances [18].
The state variable of the PDE model is the abundance of n chemical substances (e.g. their concentrations) at time t and spatial location x ∈ Ω. The state is defined on the modelled spatial domain Ω and changes due to diffusion and biochemical reactions. The Laplace operator is denoted by , the matrix of diffusion coefficients by D(θ) and the reaction term by f(u(x, t), x, θ). The unknown parameters in the matrix of diffusion coefficients and the reaction term are denoted by θ. This yields the PDE model
2.2. Statistical modelling of imaging data
We consider standard image acquisition technologies which provide intensity averages over pixels (or voxels). The spatial domain of the jth pixel is denoted by Ωj, j = 1, …, np. This yields the observation model
The intensity values of individual pixels, yij(tk), are corrupted by experimental noise, providing the measured pixel intensities . In many applications, the measurement noise is assumed to be independent and identically distributed, e.g. multiplicative log-normally distributed measurement noise,
The collection of all imaging data is in the remainder denoted by . Furthermore, the unknown observation parameters, i.e. scaling and background, and noise levels are included in the parameter vector θ.
2.3. Reconstruction of biological processes from imaging data
To achieve a mechanistic understanding of spatio-temporal biological processes, we want (i) to infer the parameters of model (2.1) and (ii) to perform model selection to compare competing hypotheses. To address these problems, we consider three alternative statistical approaches:
| — | Direct approach: The presence of outliers is disregarded and the model is fitted to the data using standard noise models (2.3). | ||||
| — | Filtering approach: The measurement data are preprocessed to detect and remove outliers. The model is fitted to the remaining data using standard noise models (2.3). | ||||
| — | Integrated modelling approach: A statistical model for the outlier distribution is formulated. From outlier and noise distribution a likelihood function is derived and used to simultaneously fit the model and quantify the noise level. | ||||
In the following, these approaches are described in further detail.
2.3.1. Direct approach
The likelihood of observing the imaging data given the parameter vector θ is
2.3.2. Filtering approach
To reduce the impact of outliers and structured noise on the estimation results, image data are preprocessed. We consider filtering methods which provide an index set of filtered pixels, , for the individual observables yi and time points tk. These index sets are masks for regions to be excluded from the objective function. Accordingly, the likelihood is only evaluated for the unfiltered pixels, meaning that for the index j in (2.4) only the set is considered. Appropriate filtering should render parameter estimation more robust against outliers and structured noise.
Filtering can be performed using a variety of algorithms, most of which possess several tuning parameters which have to be chosen manually or in a semi-automated fashion. The choice of algorithm and tuning parameters depends on the type of structured noise. To remove bright spots from the image, maximally stable extremal region (MSER) filtering [23] can be employed. MSER filtering is based on a water shedding mechanism and has been used successfully in a series of studies (e.g. work by Buggenthin et al. [24]).
2.3.3. Integrated modelling approach
We propose to circumvent the selection of filtering algorithms and the manual tuning of filtering parameters by integrating filtering and parameter estimation. Our integrated modelling approach requires a sufficiently flexible statistical model, ideally accounting for standard measurement noise, structured noise and outliers as well as spatial correlation structure (see §2.2). In this study, we follow ideas from robust regression, i.e. ε-contamination models [25], to address these needs. We assume that the intensity measurement for each pixel is with probability wo an outlier/artefact generated by structured noise and with probability 1 − wo no outlier. The outliers are assumed to be distributed according to the density function po while the remaining points are distributed according to the standard noise model (2.3). This yields the likelihood function
Conceptually, integrated statistical modelling weights the impact of a data point on the model fit, while the standard filtering approach employs a hard cut-off. The weighting depends on the model–data agreement in different regions of the image, providing a context-dependent filter.
2.4. Parameter estimation and model selection
The analysis of measurement data using the different statistical approaches requires the estimation of the parameters . For this, we use maximum-likelihood (ML) estimation. The ML estimate of the parameter vector, , is the solution of the PDE-constrained optimization problem
Optimization problem (2.8) is usually nonlinear and can possess multiple local optima. To determine the global optimum, we employ a multi-start local optimization method. The starting points are sampled from Θ using latin hypercube sampling. For local optimization, an interior point algorithm is used, which is supplied with gradients computed using forward sensitivity equations. This multi-start approach is computationally efficient and reliable for a broad range of applications [26,27]. Instead of multi-start local optimization, also evolutionary and genetic algorithms [28], particle swarm optimizers [29] or hybrid optimizers [30] could be employed. For a comprehensive survey and evaluations, we refer to the work of Moles et al. [31] and Raue et al. [26].
The parameter estimates are usually subject to uncertainty due to limited and noise-corrupted data. We determine the uncertainty of the estimated parameters using structural and practical identifiability analysis. For practical identifiability, profile likelihoods are computed [32,33], which provide parameter confidence intervals to particular confidence levels. For profile likelihood calculation, we use the methods recently described for parameter estimation problems with PDE constraints [34].
Biological processes are still poorly understood and there are usually competing hypotheses giving rise to different model structures. To assess the plausibility of hypotheses, we use the Bayesian information criterion (BIC) [35]. The BIC accounts for model–data mismatch and the complexity of the model, measured by the negative log-likelihood and number of parameters , respectively. It is defined as
2.5. Implementation
All methods are implemented in matlab and available as electronic supplementary material, Code S1. The simulation of the PDE model is implemented using the Partial Differential Equation Toolbox of matlab. The multi-start local optimization exploits the matlab routine fmincon.m. Parameter estimation and uncertainty analysis are performed using the Parameter EStimation TOolbox (PESTO) available on GitHub (https://github.com/ICB-DCM/PESTO) [37].
3. Results
In the following, we will illustrate the reliability achieved using whole imaging data and spatial summary statistics, and compare direct, filtering and integrated modelling approaches for statistical inference from imaging data. For this purpose, we studied artificial imaging data, for which the ground truth is known, as well as experimental imaging data, from which new biological insights are gained.
3.1. Biological process
We studied the distribution of CCL21 in dermal interstitium (figure 2a). CCL21 gradients facilitate the delivery of antigens to the lymph nodes by guiding mature dendritic cells (figure 2b) [38]. Inside the lymph nodes, mature dendritic cells present the antigens to T-cells, initiating the adaptive immune response.
Figure 2. Model for CCL21 gradient formation and experimental data. (a) Schematic of model for CCL21 gradient formation, including diffusive and immobilized CCL21. (b) Illustration of mature dendritic cells guided by a gradient of immobilized CCL21. (c) Experimental data for immobilized CCL21 and lymphatic vessels. CCL21 immunostaining is colour-coded and the outlines of the lymphatic vessels (light grey lines), which were determined using an additional staining, are indicated. These data were collected and provided by Weber et al. [22] and we refer to the original publication for details on materials and methods.
The formation of the CCL21 gradients and their biological functions are relatively well understood and experimentally verified [22]. It is known that soluble chemokine CCL21 is secreted at the lymphatic vessels, and it is assumed that from there it diffuses into the dermal interstitium. Furthermore, it has been established that CCL21 binds to heparan sulfate proteoglycan, resulting in immobilized CCL21 which guides the migratory dendritic cells. However, the quantitative properties of the individual processes and the detailed mechanisms remain to be analysed. In addition, the available imaging data (figure 2c) are corrupted by structured noise (see discussion below), rendering the analysis challenging and the process well suited for the evaluation of the proposed approaches.
3.2. Mathematical model and experimental data
We modelled the dynamics of the concentrations of soluble CCL21 u1(x, t), of heparan sulfate u2(x, t) and of heparan sulfate–CCL21 dimers u3(x, t) by a system of PDEs [21] with the two-dimensional (2D) spatial coordinate . The PDE model accounted for
| — | the secretion of soluble CCL21 with rate α from L lymphatic vessels, | ||||
| — | the diffusion of soluble CCL21 with diffusion coefficient D, | ||||
| — | the degradation of soluble CCL21 with rate constant γ, | ||||
| — | the binding of soluble CCL21 to heparan sulfate with rate constant k1, and | ||||
| — | the unbinding of CCL21 from heparan sulfate with rate constant k−1, | ||||
and we assumed no flux conditions at the boundaries. The spatial location of the lth lymphatic vessel is marked by the indicator function ql(x), which is zero outside and one inside the lymphatic vessel. This yields the spatial domains covered by lymphatic vessels, Ωl = {x ∈ Ω | ql(x) = 1}, l = 1, …, L, which we refer to as lymphatic vessel masks. Mathematically, we obtained the evolution equation
Weber et al. [22] succeeded in measuring the in vivo gradients of immobilized CCL21 u3(x, t) in mouse ear sheets by immunostraining. Therefore, mouse ear sheets were incubated with CCL21 antibody and imaged using confocal microscopy. This yielded the 2D images depicted in figure 2c. As the experiments were performed in unperturbed tissue, the images provide the equilibrium distributions. Accordingly, the experimental readout is
As the absolute concentration of CCL21 was unknown and merely the equilibrium distribution of immobilized CCL21 was measured, the parameters (α, D, γ, k1, k−1, S0, s, b)T were structurally non-identifiable (see [39] for definition). To circumvent this, we reformulated the model in terms of the parameters (D/γ, (αk1)/(γk−1), s S0, b)T. For details on the reparametrization, we refer to electronic supplementary material, text S2.
The visual inspection of the imaging data revealed a high level of immobilized CCL21 associated with the lymphatic vessel, which was in agreement with the model. However, there were also high intensity spots outside the lymphatic vessels (figure 2c), which were not explained by the aforementioned processes. As in fixed tissues the immunostaining performed by Weber et al. [22] labels intracellular and extracellular CCL21 [40], these spots are most probably caused by previously reported CCL21 expressing cells [41]. As intracellular CCL21 does not contribute to the extracellular distribution of CCL21 described by model (3.1), the spots should be considered as structured noise and disregarded in the parameter estimation. This rendered the modelling problem appropriate for the evaluation of the integrated modelling approach.
3.3. Integrative modelling approach outperforms conventional methods on artificial experimental data
In this section, we assess the properties of different image-based modelling using artificial imaging data. This allows us to evaluate the accuracy with which the true parameter vector is recovered using (i) whole imaging data versus a summary statistic and (ii) direct approach versus filtering approach versus integrated modelling approach.
3.3.1. Generation of artificial data
We derived artificial imaging data closely resembling the experimentally observed images to ensure a realistic test scenario. The spatial structure of the images was conserved by using the measured lymphatic vessel masks and selecting model parameters which roughly reproduced the experimentally observed CCL21 distributions. The employed parameters are provided in electronic supplementary material, text S2 and table S1. The structured measurement noise was captured by extracting relevant features of the high-intensity spots. Firstly, the high-intensity spots were detected using MSER filtering [23] using an implementation by Nistér & Stewenius [42]. Secondly, the identified spots were analysed to obtain the distributions of spot shape parameters and sizes (figure 3a). Given these distributions, the artificial data were obtained by simulating the model for the selected parameters and adding a varying number of spots with properties sampled from the measured distribution. For simplicity, the spots were assumed to be ellipsoidal. A representative artificial image is depicted in figure 3c. While the artificial data do not capture the full complexity of experimental data, they facilitate the evaluation of the approaches.
Figure 3. Pipeline for the generation of artificial imaging data. (a) Raw and filtered images. Outlines of spots identified using MSER filtering are indicated. (b) Distribution of size and shape parameters of spots. Histogram and points in scatter plot indicate the information extracted using filtering. The lines indicate the densities used for the generation of the artificial data. (c) Artificial data obtained by simulation of the model (3.1) and subsequent addition of spots and measurement noise.
In addition to the artificial imaging data, we generated artificial summary statistics. Here, we considered the distance-dependent average intensity of immobilized CCL21. To calculate this summary statistic for the artificial data, the minimal distance to the next lymphatic vessel is computed for each pixel. Subsequently, the intensity values of all pixels with the same distance are averaged. The particular summary statistic was chosen because (1) it was used in the paper by Weber et al. [22] to analyse the considered dataset and (2) it is similar to spatial summary statistics used in other image-based modelling projects [12,14,43,44].
3.3.2. Detailed spatial information improves estimation accuracy
Given the artificial datasets, we first asked how much information the raw imaging data contain in comparison to the summary statistic computed from them. To study this, we employed the 2D model (3.1), as well as a one-dimensional (1D) model approximating the distance-dependent average CCL21. The 1D model was included as the use of summary statistics and simplified process description often goes hand in hand [12–14]. Overall, we considered three set-ups:
| (i) | Fitting of the distance-dependent average CCL21 intensity using the 1D model. | ||||
| (ii) | Fitting of the distance-dependent average CCL21 using the 2D model accounting for the measured vessel topology. | ||||
| (iii) | Fitting of the CCL21 imaging data using the 2D model accounting for the measured vessel topology. | ||||
The 1D model employed in set-up (i) is a simplified version of model (3.1) with x ∈ [0, L] denoting the distance from the lymphatic vessel. The secretion at the lymphatic vessel (at x = 0) is modelled via the boundary condition ∂u1/∂x|x=0 = α. The diffusion and reaction dynamics stay the same. For details on the 1D model, we refer to the electronic supplementary material, text S2. Set-ups (ii) and (iii) employed model (3.1) with the measured lymphatic vessel mask. To study the relevance of detailed spatial information, we considered artificial data without outliers and structured noise but with independent and identically distributed measurement noise, i.e. multiplicative log-normally distributed measurement noise. The signal-to-noise ratio, which is the mean signal intensity divided by the standard deviation of the noise, was approximately 6.
As the considered artificial data contain neither outliers nor structured noise, we employed the direct approach for statistical modelling. Parameter optimization and uncertainty analysis for set-ups (i)–(iii) were performed using multi-start local optimization and profile likelihood methods, respectively. All parameters were constrained to a regime spanning at least four orders of magnitude (see electronic supplementary material, text S2, table S1).
The analysis of a representative artificial dataset revealed that for set-ups (i) and (ii) a good agreement with the summary statistic was achieved (figure 4a,b), while for set-up (iii) a good agreement with the imaging data was obtained (figure 4c). Indeed, although the artificial data were generated using a 2D model with a (non-trivial) experimentally observed lymphatic vessel geometry, the 1D model provides an accurate fit of the summary statistics for all but small distances from the vessel. The estimated parameters in set-up (i) were however far from the true parameters. This was among other reasons due to practical non-identifiabilities of the parameters (αk1)/(γk−1) and S0 (figure 4d). While the individual parameters are non-identifiable, their product is practically identifiable. The same phenomenon was observed for set-up (ii) (figure 4d,e), implying that modelling the underlying spatial structure did not improve the information extraction substantially. By contrast, for set-up (iii), all parameter estimates were close to the true parameter and practically identifiable (figure 4d). Thus, not the summary statistic but the whole imaging data should be used as they allow for more accurate parameter estimation. While the use of more informative summary statistics might resolve the problems, it is not clear whether such summary statistics exist and how they can be constructed a priori.
Figure 4. Parameter estimation results for the 1D and 2D models using extracted features and whole imaging data. (a) Fitting result for the 1D model using the distance-dependent average intensity of immobilized CCL21 (1D data). (b) Fitting result for the 2D model using the distance-dependent average intensity of immobilized CCL21 (1D data). (c) Fitting result for the 2D model using the measured intensity distribution of immobilized CCL21 (2D data). (d,e) Profile likelihood derived confidence intervals for parameter estimates for set-ups (i)–(iii). The horizontal line marks the true parameter and the vertical bars represent the confidence intervals corresponding to different confidence levels (75%, 90% and 99%), with non-identifiable parameters indicated by ‘n.i.’. (e) Profile likelihood derived confidence intervals for the product of the parameters which are non-identifiable for set-ups (i) and (ii).
3.3.3. Integrated modelling approach yields more accurate and robust results than conventional methods
As whole imaging data are strongly influenced by outliers and structured noise, we compared the accuracy of parameter estimates obtained using the direct approach, the filtering approach and the integrated statistical approach. We considered artificial imaging data with 0 to 620 bright spots and evaluated 30 datasets to obtain robust statistics.
Our analysis revealed that for artificial datasets with a large number of spots, fits obtained using the direct approach overestimated the concentration of immobilized CCL21 outside the spots while filtering and the integrated approach provided consistent results (figure 5a). Apparently, the direct approach could not explain the structured noise and the bimodal distribution of the residual (figure 5b). The filtering of points and the integrated modelling resulted in a more consistent statistical description.
Figure 5. Parameter estimation results for images with structured noise. (a) Spatial structure of residuals and (b) observed (histogram) and modelled (line) residual distribution for representative dataset with 320 spots. The results for the simulation of the 2D model with parameter estimates obtained using (top) the direct approach, (middle) the filtering approach, and (bottom) the integrated modelling approach are depicted. Pixels disregarded by the filtering approach are coded green (histogram). The estimated outlier distribution in the integrated modelling approach is depicted in blue (line). (c) Mean parameter estimation error (black line) 95% percentile interval (area) of the direct approach, the filtering approach and the integrated modelling approach for different spot numbers computed using 30 artificial datasets.
The analysis of the estimation indicated that for low numbers of spots, the filtering approach and the integrated modelling approach yielded almost the same results as without spots while the direct approach already possessed a bias and a large variance (figure 5c). For medium and high numbers of spots, the integrated modelling yielded the smallest estimation error. The improvement of the integrated modelling approach over the alternative approaches was statistically significant (p-value < 0.01; Welch’s paired-sample one-sided t-test) for (αk1)/(γk−1), S0 and σ2 for numbers of spots greater than or equal to 160. This was the case although (a) the filter approach employed the same MSER filter settings used to obtain the spot statistics—this parameter setting appeared to be ideal—and (b) the integrated modelling approach did not account for the spatial structure of measurement noise. Indeed, the integrated modelling approach yielded almost unbiased results. Thus, integrated noise modelling provided robust parameter estimates from imaging data corrupted with the considered type of structured noise.
In conclusion, our analysis of artificial data suggests that mechanistic modelling of spatial processes should be based on detailed imaging data rather than some spatial summary statistic with unknown information content. Additionally, filtering but even more so the proposed integrated modelling approach can provide robust estimates in the presence of structured noise and outliers.
3.4. Integrative modelling approach predicts lymphatic vessel-dependent heparan sulfate concentration
Given the positive results for artificial data, we used the integrated modelling approach on whole imaging data to analyse experimental data for CCL21 gradient formation. Among other things, we asked whether the current assumption of uniform heparan sulfate concentration is appropriate or alternative mechanisms need to be considered.
3.4.1. Model-based image analysis reveals limitation of a literature-based model
We employed model (3.1) with uniform heparan sulfate concentration, s0(x) = S0, to describe the imaging data collected by Weber et al. [22]. This model was based on available information in the literature (e.g. [22]) and suggested by experts in the field. As the no-flux boundary conditions (3.3) are presumably not precisely met in the biological system, we disregarded pixels which are within 40 µm of the boundary for the calculation of the objective function (2.6). This depth was chosen based on preliminary estimates for the diffusion length from the summary statistic (electronic supplementary material, text S2, figure S1) and retrospectively validated given the fitting results.
The fitting results for the model with uniform heparan sulfate concentration for a representative image with multiple lymphatic vessels are depicted in figure 6. For this image, the comparison of experimental data and the fitting results (figure 6b,c) revealed that—as expected—bright spots outside lymphatic vessels are not captured. However, there were also larger regions in the image with substantial disagreement. In particular, in lymphatic vessels 1 and 2, the concentration of immobilized CCL21 was overestimated. Accordingly, the residuals were not uncorrelated but show a clear spatial structure (figure 6d), resulting in a pronounced tail in the residual distribution (figure 6e). This indicated that the model with uniform heparan sulfate concentration might be too simple.
Figure 6. Analysis of model with uniform heparan sulfate concentration. (a) Spatial location of lymphatic vessels. The individual vessels are colour-coded. (b) Experimental data for immobilized CCL21. The difference in the concentration of immobilized CCL21 between lymphatic vessels is indicated along with the presence of spots. (c) Simulation results for immobilized CCL21. The maximum-likelihood estimate for the model with uniform heparan sulfate concentration is depicted. Differences between lymphatic vessels are not captured. Spots are filtered using the integrated statistical model. (d) Residuals of experimental data and simulation results. The low measured intensities in lymphatic vessels 1 and 2 are not captured by the model. (e) Observed residual distribution and distributions of unstructured and structured noise indicated by the integrative modelling approach.
3.4.2. Mathematical modelling supports hypothesis of vessel-dependent heparan sulfate concentration
As the detailed analysis of the whole imaging data revealed limitations of the literature-based model, we evaluated possible model refinements. In addition to the hypothesis underlying the model presented in the previous section:
| (1) | uniform heparan sulfate concentration, s0(x) = S0, | ||||
we considered two alternative hypotheses:
| (2) | Different heparan sulfate concentrations in lymphatic vessels and the tissue, . | ||||
| (3) | Different heparan sulfate concentrations in individual lymphatic vessels and the tissue, . | ||||
These hypotheses yield models 1–3 which are illustrated in figure 7a. We employed the integrated modelling approach to train the models 1–3 on all 9 images recorded by Weber et al. [22], namely image 1 to image 4 and image 12 to image 16. The optimization converged robustly (figure 7b) and the fitting results for different images are provided in electronic supplementary material, text S2, table S2–2.
Figure 7. Comparison of three different hypotheses for the CCL21 distributions. (a) Schematics of models 1–3. (b) Multi-start optimization results for the model alternatives, indicating a unique global optimum for each model. (c) Results of model comparison using BIC for each image and the overall fit. The differences of BIC values for different models are colour-coded, with each model being assigned one colour (model 1, purple; model 2, cyan; and model 3, red). (d) Experimental data for the spatial distribution of immobilized CCL21 and simulation results for model 3. (e) Experimental data for the distance-dependent intensity of immobilized CCL21 and simulation results for models 1–3.
For the individual images as well as for the overall dataset, model 3 was substantially better than models 1 and 2 (figure 7c). Model 3 provided a good agreement with the imaging data (figure 7d). Furthermore, the prediction of differences in the heparan sulfate concentrations between individual lymphatic vessels is consistent with experimental data indicating different levels of extracellular CCL21 in collecting and initial lymphatics [40]. Thus, our model-based analysis provided a mechanistic hypothesis which we were able to partially validate using published results.
To conclude, in this section, we verified the applicability of the integrated modelling approach to experimental imaging data including structured noise. We employed the statistical approach for model-based data analysis and hypothesis testing, thereby providing new insights into the CCL21 gradient formation and dendritic cell guidance. Notably, all models achieved an equally good fit for the summary statistic (figure 7e). This implies that the information content of the considered summary statistic is too limited for model selection and confirms that models should be rather based on the whole imaging data.
4. Discussion
Imaging data are widely used to assess biological processes. In many studies, the richness of imaging data is, however, disregarded and they are merely used to derive and evaluate simple summary statistics. We illustrated that this can result in a considerable loss of information. While summary statistics are often sufficient to draw qualitative conclusions, our analyses suggest that quantitative mechanistic models should be trained using whole imaging data—wherever possible—to exploit their richness. Accordingly, quantitative mechanistic models of spatio-temporal processes should be used instead of simplified models, e.g. 1D models, describing summary statistics. This avoids a loss of information and can improve identifiability.
The model-based analysis of imaging data facilitates the unravelling of novel mechanisms and the comparison of competing hypotheses [9,10,13]. However, this is often demanding and error-prone if structured noise and outliers are present. To address this problem, we introduced an integrated approach for the statistical and mechanistic modelling of imaging data. The integrated modelling approach employs a flexible statistical model with additional parameters. This enables it to cope with intensity distributions arising in the presence of structured noise and outliers. Conceptually, the integrated modelling approach can be interpreted as a direct approach with a more suited model of the measurement noise. For the considered problems, the integrated modelling approach yields similar or better results than conventional sequential methods. Even without knowledge of the precise structure of the noise, the method was able to reduce estimation bias and variance compared to direct and filtering approaches, providing more reliable parameter estimates. The finding that the integrated modelling approach outperforms the filtering approach, which uses information about spot properties, is very promising and hints towards its true potential.
To evaluate the properties of the integrated modelling approach, we studied CCL21 gradient formation. We established the first quantitative mathematical model of CCL21 gradients measured in tissue. Using experimental data, we quantified the estimation error of different models and performed model selection. Among other results, we found indications that the heparan sulfate concentration is vessel dependent. This finding relies on the mechanistic description of the imaging data we proposed. Simple statistical models are not sufficient as the heparan sulfate concentration is not observed directly. The vessel-dependence can influence the gradient formation and cell guidance and might be relevant in some disease conditions [45]. Furthermore, it demonstrates that integrated modelling approaches might reveal novel information from available data and can help to unravel causal factors. While CCL21 gradient formation is a specific example, the principle of sugar-mediated immobilization in gradient formation is observed for many (extracellular) signalling molecules, including growth factors, cytokines and selected hormones. This renders the proposed model and analysis approach interesting for a large number of research projects.
In this study, we proposed a simple statistical model for outliers and structured noise, and used it for the inference of PDE models. In principle, the statistical model could be used in combination with other types of mechanistic spatio-temporal models, including agent-based models and hybrid discrete–continuum models [13]. A further improvement of the integrated modelling approach could be achieved by considering more tailored statistical models. The correlation of noise in neighbouring pixels could be considered and even sophisticated segmentation methods, e.g. graph-based segmentation approaches [46], might be incorporated in a likelihood framework. Extension in this direction and towards image regression could improve robustness and applicability further. In addition, the use of time series data—which is supported by the approach—will facilitate the extraction of dynamic features and improve structural and practical identifiability.
As an alternative to the proposed frequentists method, Bayesian methods could be used to incorporate prior knowledge on the model parameters. In recent years, approximate Bayesian computation (ABC) methods [47,48] became popular and were also used to model spatial processes [13,49]. However, ABC methods employ summary statistics and we are not aware of a study using whole imaging data, which proved necessary in our analysis. Furthermore, ABC methods require accurate noise models [50], such as a simple statistical model for outliers and structured noise, and are often computationally demanding.
In conclusion, mechanistic understanding and rigorous hypothesis testing in biology require the formulation of mathematical and computational models. For cellular processes, this led to the development of modelling and estimation toolboxes, e.g. Data2Dynamics [51], which support the simultaneous inference of kinetic parameters and measurement noise. We illustrated that such a simultaneous inference is also feasible for the case of spatial models, which are usually more challenging. We illustrated parameter optimization, uncertainty analysis and model selection for PDE models of noise-corrupted imaging data. We expect that the proposed concept and algorithms are well suited for a broad range of applications, including scenarios with time-resolved measurements and time-dependent domains [10]. This is also facilitated by the availability of the matlab code, simplifying reuse and extensions of the methods. Accordingly, this study will contribute to the mechanistic description of spatial processes.
Data accessibility
Data, code and results files are provided as electronic supplementary material at: doi:10.5281/zenodo.1202475.
Authors' contributions
F.J.T. and J.H. conception and design of the study; M.S. supply of the imaging data; S.H. and J.H. model development and statistical inference; S.H., M.S. and J.H. interpretation of the results; S.H. and J.H. drafting of the manuscript. All authors revised the manuscript critically for important intellectual content, and provided final approval for publication.
Competing interests
The authors have no competing interests.
Funding
The project was funded by the German Ministry of Education and Research (BMBF) in the SYS-Stomach project (grant no. 01ZX1310B).
Acknowledgements
The authors thank Anna Fiedler and Dantong Wang for fruitful discussions and proof-reading of the manuscript.
Footnotes
References
- 1.
Huang B, Bates M, Zhuang X . 2009Super-resolution fluorescence microscopy. Annu. Rev. Biochem. 78, 993-1016. (doi:10.1146/annurev.biochem.77.061906.092014) Crossref, PubMed, ISI, Google Scholar - 2.
Santi PA . 2011Light sheet fluorescence microscopy. J. Histochem. Cytochem. 59, 129-138. (doi:10.1369/0022155410394857) Crossref, PubMed, ISI, Google Scholar - 3.
Al-Amoudi A et al.2004Cryo-electron microscopy of vitreous sections. EMBO J. 23, 3583-3588. (doi:10.1038/sj.emboj.7600366) Crossref, PubMed, ISI, Google Scholar - 4.
Cornett SD, Reyzer ML, Chauran P, Caprioli RM . 2007MALDI imaging mass spectrometry: molecular snapshots of biochemical systems. Nat. Methods 4, 828-833. (doi:10.1038/nmeth1094) Crossref, PubMed, ISI, Google Scholar - 5.
Giesen C et al.2014Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat. Methods 11, 417-422. (doi:10.1038/nmeth.2869) Crossref, PubMed, ISI, Google Scholar - 6.
Chenouard N et al.2014Objective comparison of particle tracking methods. Nat. Methods 11, 281-289. (doi:10.1038/nmeth.2808) Crossref, PubMed, ISI, Google Scholar - 7.
Turing AM . 1952The chemical basis of morphogenesis. Phil. Trans. R. Soc. Lond. B 237, 37-72. (doi:10.1098/rstb.1952.0012) Link, ISI, Google Scholar - 8.
Gurdon JB, Bourillot PY . 2001Morphogen gradient interpretation. Nature 413, 797-803. (doi:10.1038/35101500) Crossref, PubMed, ISI, Google Scholar - 9.
Iber D, Karimaddini Z, Ünal E . 2016Image-based modelling of organogenesis. Brief Bioinform. 17, 616-627. (doi:10.1093/bib/bbv093) Crossref, PubMed, ISI, Google Scholar - 10.
Uzkudun M, Marcon L, Sharpe J . 2015Data-driven modelling of a gene regulatory network for cell fate decisions in the growing limb bud. Mol. Syst. Biol. 11, 815. (doi:10.15252/msb.20145882) Crossref, PubMed, ISI, Google Scholar - 11.
Menshykau D, Blanc P, Unal E, Sapin V, Iber D . 2014An interplay of geometry and signaling enables robust lung branching morphogenesis. Development 141, 4526-4536. (doi:10.1242/dev.116202) Crossref, PubMed, ISI, Google Scholar - 12.
Hersch M, Hachet O, Dalessi S, Ullal P, Bhatia P, Bergmann S, Martin SG . 2015Pom1 gradient buffering through intermolecular auto-phosphorylation. Mol. Syst. Biol. 11, 818. (doi:10.15252/msb.20145996) Crossref, PubMed, ISI, Google Scholar - 13.
Jagiella N, Rickert D, Theis FJ, Hasenauer J . 2017Parallelization and high-performance computing enables automated statistical inference of multi-scale models. Cell Syst. 4, 194-206. (doi:10.1016/j.cels.2016.12.002) Crossref, PubMed, ISI, Google Scholar - 14.
Hock S, Ng Y-K, Hasenauer J, Wittmann D, Lutter D, Trumbach D, Wurst W, Prakash N, Theis FJ . 2013Sharpening of expression domains induced by transcription and microRNA regulation within a spatio-temporal model of mid-hindbrain boundary formation. BMC Syst. Biol. 7, 48. (doi:10.1186/1752-0509-7-48) Crossref, PubMed, ISI, Google Scholar - 15.
Hross S, Fiedler A, Theis FJ, Hasenauer J . 2016Quantitative comparison of competing PDE models for Pom1p dynamics in fission yeast. In Proc. 6th IFAC Conf. Found. Syst. Biol. Eng. IFAC-PapersOnLine, vol. 49, pp. 264–269. Google Scholar - 16.
Lange KL, Little RJ, Taylor JM . 1989Robust statistical modeling using the t distribution. J. Am. Stat. Assoc. 84, 881-896. (doi:10.2307/2290063) ISI, Google Scholar - 17.
Peel D, McLachlan GJ . 2000Robust mixture modelling using the t distribution. Stat. Comput. 10, 339-348. (doi:10.1023/A:1008981510081) Crossref, ISI, Google Scholar - 18.
Efendiev M . 2013Evolution equations arising in the modelling of life sciences.International Series of Numerical Mathematics, vol. 163 . Basel, Switzerland: Springer. Crossref, Google Scholar - 19.
Goldman DB . 2010Vignette and exposure calibration and compensation. IEEE Trans. Pattern Anal. Mach. Intell. 32, 2276-2288. (doi:10.1109/TPAMI.2010.55) Crossref, PubMed, ISI, Google Scholar - 20.
Schwarzfischer M, Marr C, Krumsiek J, Hoppe PS, Schroeder T, Theis FJ . 2011Efficient fluorescence image normalization for time lapse movies. In 6th Int. Workshop on Microscopic Image Analysis with Applications in Biology (MIAAB), Heidelberg, Germany, 2 September 2011. Google Scholar - 21.
Hock S, Hasenauer J, Theis FJ . 2013Modeling of 2D diffusion processes based on microscopy data: parameter estimation and practical identifiability analysis. BMC Bioinf. 14, S7. (doi:10.1186/1471-2105-14-S10-S7) Crossref, PubMed, ISI, Google Scholar - 22.
Weber M, Hauschild R, Schwarz J, Moussion C, de Vries I, Legler DF, Luther SA, Bollenbach T, Sixt M . 2013Interstitial dendritic cell guidance by haptotactic chemokine gradients. Science 339, 328-332. (doi:10.1126/science.1228456) Crossref, PubMed, ISI, Google Scholar - 23.
Matas J, Chum O, Urban M, Pajdla T . 2004Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22, 761-767. (doi:10.1016/j.imavis.2004.02.006) Crossref, ISI, Google Scholar - 24.
Buggenthin F, Marr C, Schwarzfischer M, Hoppe PS, Hilsenbeck O, Schroeder T, Theis FJ . 2013An automatic method for robust and fast cell detection in bright field images from high-throughput microscopy. BMC Bioinf. 14, 297. (doi:10.1186/1471-2105-14-297) Crossref, PubMed, ISI, Google Scholar - 25.
Berger J, Berliner LM . 1986Robust Bayes and empirical Bayes analysis with ε-contaminated priors. Ann. Stat. 14, 461-486. (doi:10.1214/aos/1176349933) Crossref, ISI, Google Scholar - 26.
Raue A et al.2013Lessons learned from quantitative dynamical modeling in systems biology. PLoS ONE 8, e74335. (doi:10.1371/journal.pone.0074335) Crossref, PubMed, ISI, Google Scholar - 27.
Hross S, Hasenauer J . 2016Analysis of CFSE time-series data using division-, age- and label-structured population models. Bioinformatics 32, 2321-2329. (doi:10.1093/bioinformatics/btw131) Crossref, PubMed, ISI, Google Scholar - 28.
Bäck T . 1996Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. New York, NY: Oxford University Press. Crossref, Google Scholar - 29.
Yang X . 2010Nature-inspired metaheuristic algorithms, 2nd edn. Bristol, UK: Luniver Press. Google Scholar - 30.
Vaz A, Vicente L . 2007A particle swarm pattern search method for bound constrained global optimization. J. Global Optim. 39, 197-219. (doi:10.1007/s10898-007-9133-5) Crossref, ISI, Google Scholar - 31.
Moles CG, Mendes P, Banga JR . 2003Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Res. 13, 2467-2474. (doi:10.1101/gr.1262503) Crossref, PubMed, ISI, Google Scholar - 32.
Murphy SA, van der Vaart AW . 2000On profile likelihood. J. Am. Stat. Assoc. 95, 449-485. (doi:10.1080/01621459.2000.10474219) Crossref, ISI, Google Scholar - 33.
Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, Timmer J . 2009Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 25, 1923-1929. (doi:10.1093/bioinformatics/btp358) Crossref, PubMed, ISI, Google Scholar - 34.
Boiger R, Hasenauer J, Hross S, Kaltenbacher B . 2016Integration based profile likelihood calculation for PDE constrained parameter estimation problems. Inverse Prob. 32, 125009. (doi:10.1088/0266-5611/32/12/125009) Crossref, ISI, Google Scholar - 35.
Schwarz G . 1978Estimating the dimension of a model. Ann. Stat. 6, 461-464. (doi:10.1214/aos/1176344136) Crossref, ISI, Google Scholar - 36.
Burnham KP, Anderson DR . 2004Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33, 261-304. (doi:10.1177/0049124104268644) Crossref, ISI, Google Scholar - 37.
Stapor P et al.2018PESTO: parameter estimation toolbox. Bioinformatics 34, 705-707. (doi:10.1093/bioinformatics/btx676) Crossref, PubMed, ISI, Google Scholar - 38.
Schumann K et al.2010Immobilized chemokine fields and soluble chemokine gradients cooperatively shape migration patterns of dendritic cells. Immunity 32, 703-713. (doi:10.1016/j.immuni.2010.04.017) Crossref, PubMed, ISI, Google Scholar - 39.
Chis O-T, Banga JR, Balsa-Canto E . 2011Structural identifiability of systems biology models: a critical comparison of methods. PLoS ONE 6, e27755. (doi:10.1371/journal.pone.0027755) Crossref, PubMed, ISI, Google Scholar - 40.
Kilarski WW, Güç E, Teo JC, Oliver SR, Lund AW, Swartz MA . 2013Intravital immunofluorescence for visualizing the microcirculatory and immune microenvironments in the mouse ear dermis. PLoS ONE 8, e57135. (doi:10.1371/journal.pone.0057135) Crossref, PubMed, ISI, Google Scholar - 41.
Tal O, Lim HY, Gurevich I, Milo I, Shipony Z, Ng LG, Angeli V, Shakhar G . 2011DC mobilization from the skin requires docking to immobilized CCL21 on lymphatic endothelium and intralymphatic crawling. J. Exp. Med. 208, 2141-2153. (doi:10.1084/jem.20102392) Crossref, PubMed, ISI, Google Scholar - 42.
Nistér D, Stewenius H . 2008Linear time maximally stable extremal regions. In Computer vision: ECCV 2008 (eds D Forsyth, P Torr, A Zisserman). Lecture Notes in Computer Science, vol. 5303, pp. 183–196. Berlin, Germany: Springer. (doi:10.1007/978-3-540-88688-4_14) Google Scholar - 43.
Jagiella N, Müller B, Müller M, Vignon-Clementel IE, Drasdo D . 2016Inferring growth control mechanisms in growing multi-cellular spheroids of NSCLC cells from spatial-temporal image data. PLoS Comput. Biol. 12, e1004412. (doi:10.1371/journal.pcbi.1004412) Crossref, PubMed, ISI, Google Scholar - 44.
Stichel D, Middleton AM, Müller BF, Depner S, Klingmüller U, Breuhahn K, Matthäus F . 2017An individual-based model for collective cancer cell migration explains speed dynamics and phenotype variability in response to growth factors. NPJ Syst. Biol. Appl 3, 5. (doi:10.1038/s41540-017-0006-3) Crossref, PubMed, ISI, Google Scholar - 45.
Dudal S et al.2015Integrated pharmacokinetic, pharmacodynamic and immunogenicity profiling of an anti-CCL21 monoclonal antibody in cynomolgus monkeys. MAbs 7, 829-837. (doi:10.1080/19420862.2015.1060384) Crossref, PubMed, ISI, Google Scholar - 46.
Felzenszwalb P, Huttenlocher D . 2004Efficient graph-based image segmentation. IJCV 59, 167-181. (doi:10.1023/B:VISI.0000022288.19776.77) Crossref, ISI, Google Scholar - 47.
Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MPH . 2009Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187-202. (doi:10.1098/rsif.2008.0172) Link, ISI, Google Scholar - 48.
Toni T, Stumpf MPH . 2010Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 26, 104-110. (doi:10.1093/bioinformatics/btp619) Crossref, PubMed, ISI, Google Scholar - 49.
Sottoriva A et al.2015A Big Bang model of human colorectal tumor growth. Nat. Gen. 47, 209-216. (doi:10.1038/ng.3214) Crossref, PubMed, ISI, Google Scholar - 50.
Wilkinson RD . 2013Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12, 129-141. (doi:10.1515/sagmb-2013-0010) Crossref, PubMed, ISI, Google Scholar - 51.
Raue A et al.2015Data2Dynamics: a modeling environment tailored to parameter estimation in dynamical systems. Bioinformatics 31, 3558-3560. (doi:10.1093/bioinformatics/btv405) Crossref, PubMed, ISI, Google Scholar


