Mechanistic description of spatial processes using integrative modelling of noise-corrupted imaging data

Spatial patterns are ubiquitous on the subcellular, cellular and tissue level, and can be studied using imaging techniques such as light and fluorescence microscopy. Imaging data provide quantitative information about biological systems; however, mechanisms causing spatial patterning often remain elusive. In recent years, spatio-temporal mathematical modelling has helped to overcome this problem. Yet, outliers and structured noise limit modelling of whole imaging data, and models often consider spatial summary statistics. Here, we introduce an integrated data-driven modelling approach that can cope with measurement artefacts and whole imaging data. Our approach combines mechanistic models of the biological processes with robust statistical models of the measurement process. The parameters of the integrated model are calibrated using a maximum-likelihood approach. We used this integrated modelling approach to study in vivo gradients of the chemokine (C-C motif) ligand 21 (CCL21). CCL21 gradients guide dendritic cells and are important in the adaptive immune response. Using artificial data, we verified that the integrated modelling approach provides reliable parameter estimates in the presence of measurement noise and that bias and variance of these estimates are reduced compared to conventional approaches. The application to experimental data allowed the parametrization and subsequent refinement of the model using additional mechanisms. Among other results, model-based hypothesis testing predicted lymphatic vessel-dependent concentration of heparan sulfate, the binding partner of CCL21. The selected model provided an accurate description of the experimental data and was partially validated using published data. Our findings demonstrate that integrated statistical modelling of whole imaging data is computationally feasible and can provide novel biological insights.

1 Two-dimensional model of CCL21 gradient formation To describe the spatio-temporal dynamics of the concentration of soluble CCL21 (u 1 (x, t)), of heparan sulfate (u 2 (x, t)) and of heparan sulfate-CCL21 dimers (u 3 (x, t)) in 2D, we consider the PDE model on x ∈ Ω with initial conditions ∀x ∈ Ω : u 1 (x, 0) = 0, u 2 (x, 0) = s 0 (x) and x 3 (x, 0) = 0 (2) and boundary conditions ∀x ∈ ∂Ω : for the diffusion components u 1 (x, t), with ∂Ω denoting the boundary of Ω and ν denoting its normal vector. The individual lymphatic vessels are exclusive, x ∈ Ω : l q l 1 (x) = q l 2 (x) = 1 with l 1 = l 2 , and the sum l q l (x) defines the location of the lymphatic vessels, For more details in the model, we refer to our previous publication (Hock et al., 2013).

Remark 1.
A two-dimensional model appears sufficient to describe the gradient formation in mouse ear sheets, as mouse ears are rather thin.
We consider the steady state of model (1)-(3), which is obtained by setting the time derivatives to zero. The steady state distribution of soluble CCL21, u s,1 (x, t), fulfils the PDE − D u s,1 (x) = α l q l (x) − γu s,1 (x).
Division by γ reveals that u s,1 (x) merely depends on D/γ and α/γ. While numerical methods have to be employed to calculate the distribution of soluble CCL21 -we used the methods implemented in the MATLAB PDE toolbox -, it can be shown analytically that the solution is proportional to α/γ. The distribution of soluble CCL21 shapes the steady state of heparan sulfate, u s,2 (x, t), and of heparan sulfate-CCL21 dimers, u s,3 (x, t), which are u s,2 (x) = 1 u s,1 (x) with dissociation constant K D = k −1 /k 1 . Indeed, the steady state distributions u s,2 (x) and u s,3 (x) depend merely on s 0 (x) and u s,1 (x)/K D . As the dissociation constant K D has the same scaling effect as α/γ, merely the effective scale parameter α/(γK D ) needs to be considered. The measured pixel intensities depend only on the parameters (D/γ, α/(γK D ), s, b) and the overall heparan sulfate abundance s 0 (x) which is parameterised individually. The individual images possess different scaling constants s.
In addition to the pixel intensities, we consider the distance-dependent intensity of immobilised CCL21 for the 2D model. This distance-dependent intensity is obtained by averaging over all pixels which possess the same distance from the nearest lymphatic vessel. This distancedependent intensity for the filtered and unfiltered image is depicted in Figure 1.
combined images shows a smooth gradient, which has a conside than the gradient obtained by the raw images. It is important to the distance measure in influenced by the outliers (bright spots) indicates that it is no stable measure to obtain an estimate for th of the chemokine CCL21.  Figure 6.5.: Distance measure for the filtered regions. D scaled by the maximal value was calculated for each filtered im The used grid was r j = 1 + 2 ⇤ j for j = 0, . . . , 20. It is visi dient steepness is similar between the images except for CCL high background values. B) Average CCL21 intensity across ima distances from the lymphoid vessels for the raw images and the

Filtered region analysis
In the following sections we will introduce models for the CCL21 g process and estimate the model parameters from the image da e ciency of the proposed estimation procedure we used simulated true LV structure and some realistic kinetic parameters. However the structured noise in the data is of special interest and we introd in the simulated data. To obtain simulated data as close as possib we analyzed the size and shape distribution of the bright spots images. For the size of the bright spots we fitted a normal distribut of the convex hulls for each segmented region. To exclude clumped the region size to maximally 250px 2 . We obtained a nearly normal size with mean µ = 149px 2 and variance = 47px 2 . Furthermo elliptic shape of the bright spots in terms of the length of the ma and fitted a two-dimensional normal distribution with mean min and major axis µ b = 19.3px and covariance matrix ⌃ a,b = ✓ 1 fitting was done with the MATLAB routines normfit and gmdi distributions of the data and our assumed distributions are displa 6.1.4. Problem statement 68 estimation. Based on the distance map a mean intensity for each individual CCL21 staining image is calculated depending on the distance from the LV. For a given set of distances r 0 , . . . , r n the mean intensity for all pixels with r i 1 < dm(x) < r i is calculated. In Figure 6.3 the mean intensity for each image and the mean overall measurements are shown. Based on this extracted mean intensity, the parameters of the models introduced in Section 6.2 will be estimated in Section 6.3.2. It is easily visible that the gradient steepness of the distance measure strongly varies between the images. B) Average CCL21 intensity across images for di↵erent distances from the lymphoid vessels.

Image processing
The distance dependent fluorescent intensity provides a first quantification of the data. To analyze the data in detail, we performed a visual inspection. We found high intensity spots in regions away from the lymphoid vessels. We assumed they originate from other cell types migrating towards the lymphoid vessels or inhomogeneities in the tissue, which react di↵erently to the staining than the other tissue. These spots, however, are not of interest for the process considered later on and can be seen as structured noise produced by the measurement procedure. In the following we discuss how to filter these bright spots to obtain an adjusted image.

Maximally stable extremal regions filtering
For the detection of the bright spots in the staining images we applied a maximally stable extremal region (MSER) filtering introduced by Matas et al. (2004) and the implementation introduced by Nistér & Stewénius (2008). It is based on a water shedding mechanism, where the image is considered as a landscape with heights given by the image gray scale value. This landscape is flooded by gradually raising the water level until the whole image is immersed. The algorithm keeps track of the connected water regions for each water level. MSER is a method to detect high intensity regions for example in fluorescence and brightfield images and has already been successfully applied to detect cells (Buggenthin et al., 2013  2 One-dimensional model of CCL21 gradient formation The two-dimensional model described in the previous section can be simplified by assuming that the concentration of soluble CCL21 (u 1 (x, t)), of heparan sulfate (u 2 (x, t)) and of heparan sulfate-CCL21 dimers (u 3 (x, t)) depends mostly on the distance z from the nearest lymphatic vessel. This assumption was employed by Weber et al. (2013).
The distance z ∈ [0, z max ] ⊂ R + of a point x ∈ Ω ⊂ R 2 from the nearest lymphoid vessel is In the coordinate z, we consider a one-dimensional PDE model which possesses the same structure as the two-dimensional PDE model (1): with initial conditions, ∀z ∈ [0, z max ] : u 1 (z, 0) = 0, u 2 (z, 0) = s 0 (z) and u 3 (z, 0) = 0, and boundary conditions, The lymphatic vessel is localised at z = 0 and we consider z max → ∞. The concentration of soluble CCL21 in the lymphatic vessel is denoted by u 1,LV .
Assuming that turnover dominates diffusion in the vessel, the concentration of soluble CCL21 in the lymphatic vessel in steady state is u 1,LV ≈ α/γ. To derive this result, we set the time derivative in (1) to zero and substitute the second equation in the first, yielding Given that in the lymphatic vessels defined by Ω LV = {x ∈ Ω| l q l (x) = 1} the turnover (γu 1 (x, t)) dominates diffusion (D u 1 (x, t)), we find that This provides the boundary condition for the 1D model at z = 0.

Model parameterisation for artificial data generation
The artificial experimental data were obtained by simulating the 2D model for manually selected parameters. These parameters are provided in Table 1, along with lower and upper bounds used in the parameter estimation for the artificial data. We note that the parameters of the outlier model are not assigned during data generation, but the outliers are simulated using a more involved statistical model (see description in the manuscript).
The parameter values are provided in units of length (UL), units of concentration (UC) and units of fluorescence intensity (UI). As semi-quantitative measurements are used, the scale of the concentration is completely unknown and arbitrary. In accordance with the considered imaging data, the unit of length is µm.

Summary statistic
In addition to the whole imaging data, this study employ the summary statistic proposed by Weber et al. (2013). This summary statistic describes is the distance dependent average intensity.
Each pixel possesses a measured intensity y m j and a distance from the nearest lymphatic vessel z m j . The distance z m j is evaluated at the center point of the pixel and the formal definition is provided in (8). The distance dependent average intensity is in which j is the pixel index and n j is the number of pixels. The function δ(z = z m j ) is one for z = z m j and zeros other. Accordingly, the y (z) is only defined for distances z for which at least one pixel exists.
The summary statistic can be evaluated for the measured data, as well as for the one-and two-dimensional model. Indeed, the one-dimensional model directly provides the summary statistic. For the two-dimensional model, the corresponding image is constructed and the distance is computed using (18).

Estimation results for experimental data
The parameter estimation results for the CCL21 data published by Weber et al. (2013) using the integrated modelling approach are reported in the Tables 2, 3 and 4. These tables provide the results for the model with constant heparan sulfate concentration (H1), the model with different heparan sulfate concentrations in lymphatic vessels and the tissue (H2) and the model with different heparan sulfate concentrations in individual lymphatic vessels and the tissue (H3), respectively. Maximum likelihood estimates were determined using multi-start local optimisation with bounds which were chosen according to the results of pre-runs.
In the Tables 2, 3 and 4 we report only the results for images 12-15 published by Weber et al. (2013) to enhance readability. The comparison of the maximum likelihood estimates obtained for different images is substantial and larger than the uncertainty of the parameter for the individual images as computed using profile likelihoods (results not shown). This indicates substantial biological variability between different regions of the mouse ear.