Estimating parameters of a stochastic cell invasion model with fluorescent cell cycle labelling using approximate Bayesian computation
Abstract
We develop a parameter estimation method based on approximate Bayesian computation (ABC) for a stochastic cell invasion model using fluorescent cell cycle labelling with proliferation, migration and crowding effects. Previously, inference has been performed on a deterministic version of the model fitted to cell density data, and not all parameters were identifiable. Considering the stochastic model allows us to harness more features of experimental data, including cell trajectories and cell count data, which we show overcomes the parameter identifiability problem. We demonstrate that, while difficult to collect, cell trajectory data can provide more information about the parameters of the cell invasion model. To handle the intractability of the likelihood function of the stochastic model, we use an efficient ABC algorithm based on sequential Monte Carlo. Rcpp and MATLAB implementations of the simulation model and ABC algorithm used in this study are available at https://github.com/michaelcarrstats/FUCCI.
1. Introduction
Australia and New Zealand have the highest incidence rates of melanoma in the world, followed by northern America and northern Europe [1]. In Australia, melanoma is the third most common diagnosed form of cancer [2]. Since the 1960s, Australia’s primary strategy to reduce overall mortality rates has been targeted at early prevention and detection [3]. However, a better understanding of the mechanisms which control cell invasion is necessary in order to improve or establish new treatment measures.
The underlying mechanisms of cell invasion we consider are combined cell proliferation and cell migration. Cell proliferation is a fourstage sequence consisting of gap 1 (G1), synthesis (S), gap 2 (G2) and mitosis (M) where the cell divides into two daughter cells, each of which return to the G1 phase [4]. Improvements in technology have enabled us to visualize different phases of the cell cycle in real time using fluorescent ubiquitinationbased cell cycle indicator (FUCCI) technology [5]. FUCCI technology involves two fluorescent probes which emit red fluorescence when the cells are in G1 phase and green fluorescence when in S/G2/M phases. During the transition between G1 and S phase, both probes are active (giving the impression that the cell fluoresces yellow), allowing the visualization of the early S phase, which we refer to as eS. Experiments using FUCCItransduced melonama cells are becoming increasingly important in cancer research because many drug treatments target different phases of the cell cycle [6].
The development of simulation models offer us a quick and inexpensive alternative to in vitro experiments. Although, existing mathematical models have had a long history without incorporating cell cycle information until more recently (e.g. [7,8]). In this study, we adopt the cell invasion model of scratch assay experiments developed by Simpson et al. [7]. This model describes a discrete exclusion based random walk on a twodimensional (2D) hexagonal lattice. Furthermore, this model involves treating the entire population of agents as three subpopulations that correspond to the red, yellow and green phases of the cell cycle as identified by FUCCI. Agents transition through the cell cycle, while simultaneously undergoing a nearest neighbour random walk, with exclusion, to model cell migration. This model is discussed in more detail in §3.1. This previous study did not perform any parameter inference or calibrate the model to experimental data. The primary focus of this present work is to apply Bayesian methods to recover parameter estimates for the model and the associated distribution of uncertainly around them. However, standard Bayesian approaches rely on the computation of the likelihood function which is often intractable in complex stochastic models. We overcome this limitation by applying approximate Bayesian computation (ABC) methods, which is discussed later in §3.2.
Simpson et al. [9] investigate practical parameter identifiability in a deterministic partial differential equation of FUCCI scratch assay experiments. Practical parameter identifiability is a term that describes whether it is possible to produce precise estimates with finite regions of confidence levels [10]. We adopt this terminology here since it is consistent with Simpson et al. [9]. Nevertheless, by using a simpler model, their study was able to adopt standard Bayesian approaches to parameter estimation since the likelihood function is tractable. Using a Markov chain Monte Carlo (MCMC) framework and cell density data, their study found cell diffusivities were practically nonidentifiable when they considered the case where the cell migration rate depends on the cell cycle phase. Although, their study does not consider other types of data which may be more informative of the underlying mechanisms. Here, we address the limitations Simpson et al. [9] identify by modelling individual cell behaviour with a stochastic model which allows the generation of numerous data types. Indeed, we take full advantage of the flexibility of the stochastic model in this study and combine multiple data types (the number of cells in each phase and cell trajectory data accounting for different phases) to improve parameter identifiability. However, working with cell trajectory data can be challenging, and these challenges include time consuming effort to manually track cells and the need for the cell density to be low to make cell tracking easier. Models that can avoid using cell trajectory data are an active area of research [11], but we find using the Simpson et al. [7] model, which incorporates cell trajectory data, leads to a good outcome.
Many other studies have explored modelling and/or parameter estimation in cell invasion models [12–17]. Notably, Vo et al. [12] estimate the parameters of a stochastic cell spreading model of an expanding population of fibroblast cells in a 2D circular barrier assay without cell cycle labelling. While ABC methods have previously been considered in stochastic cell spreading models, such as the Vo et al. [12] study, they have never before been considered with FUCCI models and/or data. Prior to Vo et al. [12], cell invasion models were usually defined by deterministic partial differential equation and when performing parameter inference, they usually used trial and errorbased approaches [16] or nonlinear leastsquares estimation [13–15,17]. However, these approaches to parameter estimation are unable to quantify the uncertainty around the point estimates. In this study, we show that using a discrete stochastic model is necessary to identify the transition and motility parameters when multiple phases of the cell cycle are considered. This difference is due to the wider range of data types that are available since individual cells are modelled rather than working with a simple cell density profile. This allows data types that are more informative about the model parameters, which have previously been unavailable to deterministic modelling approaches, to be considered. That is, we find cell count and cell trajectory data to be the most informative data types as they can produce practically identifiable parameters for the transition and motility parameters, respectively. Rcpp and MATLAB implementations of the simulation model and ABC algorithm used in this study are available at https://github.com/michaelcarrstats/FUCCI.
The paper is structured as follows. In §2, we introduce the experimental data and the process by which it is collected. §3 describes the simulation model, the parameter inference method used, and our prior knowledge on the model parameters. In §4, we explain the image analysis process and present the inference results when using synthetic and experimental data sets. Discussion of results, future work and concluding remarks are presented in §5.
2. Data
2D scratch assay experiments are a good screening tool for more complex experimental models, as they are low cost, allow for easy data interpretation and readily allow control of oxygen, nutrients and drug supply [18,19]. We adopt data from a study conducted by Vittadello et al. [20] where a scratch assay is used to examine melanoma cell proliferation and migration in real time with FUCCI technology. The experiment is initialized by placing a small population of cells and a growth medium in a culture dish (figure 1a) to create a uniform 2D monolayer of cells. Next, a sharptipped instrument is used to make a scratch in the monolayer of cells (figure 1b). Finally, the cells are observed at regular intervals as they proliferate and migrate into the newly created gap over the following 48 h. For this study, we adopt the data from the experiments with WM983C FUCCItransduced melonoma cells and present still images captured at 0 and 48 h in figure 1c,d, respectively. A major advantage of 2D scratch assay experiments is the multitude of different data types which can be easily recovered. The data types which we explore later include the number of cells in each population, position of cell populations, and cell trajectory data (figure 1e). It is important to consider the size of the imaged region compared to the culture plate (figure 1b) because the boundaries of the imaged region are not physical boundaries. Since the cell density outside of the scratched region is approximately uniform, with no macroscopic density gradients away from the leading edge, the net flux of cells across the boundary will be zero [7]. Therefore, the appropriate mathematical boundary conditions along the vertical boundaries will be zero net flux.
3. Methods
3.1. Simulation model
We adopt the discrete random walk model developed by Simpson et al. [7] on a 2D hexagonal lattice. Each lattice site has diameter Δ = 20 μm, which is the average cell diameter [21], and is associated with a set of unique Cartesian coordinates,
To simulate cell migration, agents undergo a nearest neighbour random walk at rates M_{r}, M_{y}, M_{g} per hour for red, yellow and green agents, respectively (figure 2a–f). Potential movement events involve randomly selecting the target site from the set of six nearestneighbouring lattice sites, with the movement event being successful only if the target site is vacant. In this way, crowding effects are simply accommodated. To simulate transitions through the cell cycle, red agents are allowed to transition into yellow agents at rate R_{r} per hour (figure 2h,i), yellow agents to green agents at rate R_{y} per hour (figure 2i,j) and green agents into two red daughter agents at rate R_{g} per hour (figure 2j,k). While we assume that the redtoyellow and yellowtogreen transitions are unaffected by crowding, we model crowding effects for the greentored transition by aborting transitions where the additional red daughter agent would be placed onto an occupied lattice site. By prohibiting multiple agents from occupying the same lattice site, we are able to realistically incorporate crowding effects [23,24].
The Simpson et al. [7] model is dependent on the initial geometry, boundary conditions, the lattice spacing Δ, and the cell cycle transition and motility rates. Since we have reasonable estimates for Δ [21] and we calibrate the initial geometry and boundary conditions to the experimental data, our study is concerned with estimating the unknown cell cycle transition and motility parameters. In a Bayesian setting, the unknown model parameters, θ = (R_{r}, R_{y}, R_{g}, M_{r}, M_{y}, M_{g}), and the uncertainty around them can be quantified by the posterior distribution, which is dependent on the likelihood and the prior distribution. However, while the Markov process model can capture the stochastic nature of cell proliferation and migration, when the dimension of the generator matrix (a matrix of rate parameters which describe the rate of transitioning between states) is too high the likelihood function consequently becomes intractable due to the computational cost of computing the matrix exponential (see [25–28]). Since conventional Bayesian approaches to parameter estimation are no longer feasible, we are motivated to use likelihoodfree methods.
3.2. Approximate Bayesian computation
Using a Bayesian framework, the uncertainty about the unknown parameter θ with respect to the data y can be quantified by sampling from the posterior distribution π(θy) ∝ π(yθ)π(θ), where π(yθ) is the likelihood function and π(θ) is the prior. However, the likelihood function for sufficiently complex models becomes intractable (see examples in biology [12,24,29], in ecology [30,31] and in cosmology [32]). Rather than reverting to simpler models with tractable likelihoods, these types of problems can be instead analysed using likelihoodfree methods that avoid evaluating the likelihood function.
One popular likelihoodfree approach is ABC [33]. ABC involves simulating data from the model x ∼ π(· θ) instead of evaluating the intractable likelihood; accepting configurations of θ which produce simulated data x that is close to the observed data y. It can be impractical to compare the full data sets of x and y, so ABC often relies on reducing the full data sets to summary statistics by some summarizing function S(·), where the summary statistics for x and y are denoted S_{x} = S(x) and S_{y} = S(y), respectively. Provided the summary statistics are highly informative about the model parameters, then π(θy) ≈ π(θS_{y}) is a good approximation or exact if sufficient statistics are used [34]. However, the latter are usually difficult to attain in practice and so this study focuses on the use of summary statistics. In effect, ABC samples from the approximate posterior:
To sample from the approximate posterior, commonly ABCrejection [36,37], MCMCABC [38], or sequential Monte Carlo ABC (SMCABC) (e.g. [39–41]) algorithms are used. ABCrejection samples particles from the prior distribution and accepts particles with a discrepancy measure ρ(S_{y}, S_{x}) less than the desired tolerance ε. In cases when the prior distribution is relatively diffuse compared to the posterior density (such as our application), lower acceptance rates are common because particles are predominantly sampled in regions of low posterior density [39]. To increase efficiency, one could instead use MCMCABC which constructs a Markov chain with a stationary distribution identical to the approximate posterior by proposing particles from a carefullytuned proposal distribution, θ^{i} ∼ q(· θ^{i−1}), and accepting those with probability
The SMCABC replenishment algorithm traverses a set of distributions defined by T nonincreasing tolerance levels ε_{1} ≥ · · · ≥ ε_{T} to sample from the approximate posterior:
A crucial limitation of ABC methods is the curse of dimensionality, where despite the addition of more data, the approximation to the posterior can become distorted as a result of the discrepancy between observed data and simulated data ρ(S_{y}, S_{x}) naturally increasing with the dimension [35]. In applications where increasing the dimension of the summary statistics cannot be avoided, the discrepancy between observed and summary statistics can be accounted for, at least approximately, with regression adjustment [34,35]. Regression adjustment involves explicitly modelling the parameters against the discrepancy between observed and simulated data. Assume for the moment that θ is a scalar parameter. Consider the following regression model
3.3. Prior knowledge
The cell cycle time, which is related to the doubling time, can be thought of as the summation of the time spent in each phase of the cell cycle. Estimates of the cell doubling time for melanoma cells range from 16 to 47 h [4,9]. Furthermore, Simpson et al. [9] estimate the average time 1205Lu FUCCItransduced melanoma cells spend in the G1 to be between 8 and 30 h. This means that the transition from G1 to S/G2/M phase is approximately 1/30−1/8 h^{−1}. Additionally, the duration spent in S/G2/M phases were reported as 8–17 h. This means that the transition from S/G2/M to G1 phase is approximately 1/17–1/8 h^{−1}. Therefore, we propose that our prior information of the transition rates is uniform over the range 0–1 h^{−1} to be conservative.
Cell diffusivity, D, the measurement of motility rate for particles undergoing random diffusive migration, can be used to quantify the cell motility rate $M\hspace{0.17em}\in \hspace{0.17em}\{{M}_{r},{M}_{y},{M}_{g}\}$, by D = MΔ^{2}/4 [46], where Δ is the cell diameter. Empirical evidence finds estimates for cell diffusivity to range from 0 to 3304 μm^{2} h^{−1} [13,14,21]. Furthermore, Simpson et al. [7] suggest the cell diffusivity to be approximately 400 μm^{2} h^{−1}, so that the rates are approximately 4 h^{−1}. We propose that the prior information of the motility rates to be uniform over the range 0−10 h^{−1}; attributing the larger interval to the greater variation of cell diffusivity estimates in existing literature.
4. Results
For SMCABC, we generate samples from the approximate posteriors using N = 1000 particles. From preliminary trials, we found it more useful to use the overall MCMC acceptance rate as the stopping rule for the SMCABC algorithm and adopt the sensible choice for the final acceptance rate as ${f}_{\mathrm{acc}}=1\text{\%}$ and ε_{T} = 0 for the target tolerance.
4.1. Developing summary statistics and validation with synthetic data
The accuracy and precision of ABC methods in approximating the posterior distribution is sensitive to the quality of the summary statistics used [35]. We first trial and validate different summary statistics with multiple synthetic datasets such that the true parameter values are known. In this way, we are able to compare the performance of different summary statistics and determine which are the most effective. While trying to replicate the environment of the experimental data as close as possible, such as domain size, boundary conditions and initial number of cells, we do not calibrate the initial location of cells but rather randomly distribute the cells within a 200 μm by 1745.35 μm region on either side of the scratch. We attain the initial cell counts of red, yellow and green cells by using the procedure outlined in §4.2 (steps 1–4) and report them here to be 119, 35 and 121, respectively.
In our analysis of the simulation model, agents with relatively higher transition rates were found to correspond to lower population sizes, and vice versa. Therefore, we use the number of agents in each population (N_{r}, N_{y}, N_{g}) at the end of the experiment as summary statistics that may be informative about the transition rates R_{r}, R_{y}, R_{g}, respectively. We test the suitability of this summary statistic on four synthetic data sets produced by varying the transition rates amongst biologically plausible values and keeping the motility rates known and constant. Estimates in existing literature of cell transition rates are similar [4,9] and the efficiency of the simulation model is dependent on the number of agents in the system (higher transition rates increase the overall proliferation rate and the frequency of events). Therefore, we choose to keep the transition rates rather close to the estimates of Haass et al. [4], instead of varying them over the extents of the prior domain. Thus, the four parameter configurations we choose to generate the synthetic datasets are $\theta \hspace{0.17em}\in \{(0.04,0.17,0.08,4,4,4),(0.25,0.15,0.22,4,4,4),(0.12,0.07,0.03,4,4,4),(0.3,0.36,0.28,4,4,4)\}$. In §2 of the electronic supplementary material, we present the marginal posterior distributions produced and confirm the suitability of this summary statistic.
For the motility parameters, we explore and compare two sets of summary statistics, namely cell density and cell trajectory data. Of these two datasets, cell density data are desirable due to less manual effort needed to generate the data while cell trajectory data could offer more information but is more challenging to collect. For the cell density data, we first segment the imaged region at the end of the experiment (t = 48 h) directly down the centre of the image in the y direction and calculate the median position and interquartile ranges of the red, yellow and green agent populations in the xdirection for cells on the left and right sides. For the cell trajectory data, we average the distance of multiple cell trajectories through each cell phase until the cell returns to the initial phase or the simulation is terminated. We select cells to be tracked provided that the cell is initially in G1 (red) phase and the cell is located on the leading edge of the cell monolayer toward the gap in the scratch assay. The reasoning behind beginning tracking from the G1 phases is due to a short period of fluorescent negativity in between S/G2/M phases and G1 phase [4] which makes tracking between these phases difficult. Therefore, by starting from the G1 phase, we avoid having to track between these two phases. Furthermore, cells were identified as being on the leading edge if their path towards the middle of the scratch was unhindered. The process of manually tracking cell trajectories can be time consuming. Thus, we are interested in finding the minimum number of cells to track such that sufficient information is acquired. In §2 of the electronic supplementary material, we draw samples from the posterior distribution using 10, 20, 30, 40 and 50 cell trajectories with four synthetic data sets generated from $\theta \hspace{0.17em}\in \{(0.04,0.17,0.08,4,4,4),(0.04,0.17,0.08,2,5,8),\hspace{0.17em}(0.04,0.17,\hspace{0.17em}0.08,8,2,5),(0.04,0.17,0.08,8,2,5),(0.04,0.17,0.08,5,8,2)\}$. Our analysis finds diminishing returns of parameter precision as the number of cell trajectories increases. We find that using 20 cell trajectories achieves a good balance between precision and the number of cell trajectories used. Using the same four synthetic datasets, we also attempt to draw samples from the posterior distribution using cell density data with the cell transition rates held constant in §2 of the electronic supplementary material. However, under these settings, we found the motility parameters to be nonidentifiable.
We now combine the summary statistics formulated to estimate the cell cycle transition and motility rates together with four synthetic data sets. Due to the similarity in estimates for cell cycle transition rates in existing literature (see [4,9]), we adopt estimates for the cell cycle transition rates from Haass et al. [4] for all four parameter configurations. Since estimates for motility rates have been reported to vary by two orders of magnitude (see [13,14,21]), we choose to vary the motility rates over the range of the prior for the four parameter configurations. That is, we generate four synthetic data sets with θ ∈ {(0.04, 0.17, 0.08, 4, 4, 4), (0.04, 0.17, 0.08, 2, 5, 8), (0.04, 0.17, 0.08, 8, 2, 5), (0.04, 0.17, 0.08, 5, 8, 2)}. In figure 3, we present the marginal posterior distributions produced when using the number of cells in each subpopulation and cell density data as summary statistics. In figure 4, we present the marginal posterior distributions when using cell trajectory data in place of cell density data. Again, we see that the motility parameters are practically nonidentifiable when cell density data are used while both cell cycle transition and motility parameters are practically identifiable when cell trajectory data are included. Furthermore, it is clear from the concentration of the marginal posterior distributions around the true parameter values (dashed line) in figure 4 that the cell count and cell trajectory data are highly informative about the transition and motility parameters, respectively. We note that the precision of these distributions is greater for the cell cycle transition parameters than the motility parameters. Importantly, these results show for the first time that practical parameter inference on both transition and motility parameters of a FUCCI scratch assay experiment using Bayesian inference techniques is possible. These results justify the choice of the Markov process model compared to simpler continuum models which do not give insight into cell trajectory data (see [9]).
4.2. Image analysis of experimental data
We analyse the experimental images using ImageJ [47] to record the Cartesian coordinates of cells. Of primary interest is processing the initial frame such that we can replicate the experimental settings as accurately as possible in the simulation but we also repeat this procedure for the final frame to retrieve the final cell counts and cell density data, which we use as summary statistics. The process is as follows:

Step 1: Read in image: File > Open > select image (figure 5a).

Step 2: Convert image to 8bit: Image > Type > 8 − bit (figure 5b).

Step 3: Identify cell edges: Convert the image to black and white ($\mathrm{Process}>\mathrm{Binary}>\mathrm{convert}\hspace{0.17em}\mathrm{to}\hspace{0.17em}\mathrm{mask}$) and then distinguish conjoined cells (Process > Binary > Watershed) (figure 5c).

Step 4: Compute Cartesian coordinates: Analyse > Analyse particles…> OK.
A limitation of using the watershed tool is that we must convert the image to black and white. In doing so, we lose the cell phase identity associated with the cell coordinates recovered from ImageJ. To overcome this, we use R [48] to retrieve the RGB decimal colour code and Cartesian coordinates of pixels. Matching pixel coordinates recovered from R and the coordinates of the centroid of the cells recovered in ImageJ, we create a data set of cell coordinates and their associated RGB decimal codes. To classify the RGB coordinates into one of the three cell cycle phases we use the conditions outlined in table 1.
state  RGB decimal code 


red  green  
G1  >100  ≤100 
eS  >100  >100 
S/G2/M  ≤100  >100 
To extract summary statistics from the experimental data, we repeat the image processing procedure previously outlined above with the final frame (t = 48 h) and extract the final cell counts and cell density data. Additionally, we extract cell trajectory data by processing the entire sequence of still images in ImageJ with the ‘Multipoint’ tool to manually track cell coordinates between frames. We use a similar process as before to identify cell phases in these summary statistics using R and present them in table 2 and the cell trajectory data in figure 6.
summary statistic  description  value 

S_{1}  number of red cells at 48 hours  566 cells 
S_{2}  number of yellow cells at 48 hours  111 cells 
S_{3}  number of green cells at 48 hours  166 cells 
S_{4}  average distance travelled through red phase by 20 cells  105 μm 
S_{5}  average distance travelled through yellow phase by 20 cells  40 μm 
S_{6}  average distance travelled through green phase by 20 cells  100 μm 
S_{7}  median position of red cells on the left and right side  (155, 1170) μm 
S_{8}  median position of yellow cells on the left and right side  (158, 1189) μm 
S_{9}  median position of green cells on the left and right side  (177, 1129) μm 
S_{10}  interquartile range of the red cells position on the left and right side  (196, 197) μm 
S_{11}  interquartile range of the yellow cells position on the left and right side  (164, 144) μm 
S_{12}  interquartile range of the green cells position on the left and right side  (213, 207) μm 
Finally, we calibrate the hexagonal lattice used in the simulation model with the dataset of Cartesian coordinates recovered previously by rearranging equation (3.1) to find their associated lattice row and column indices denoted
4.3. Estimating model parameters with experimental data
After calibrating the simulation to the experimental data of WM983C FUCCItransduced melanoma cells, we first attempt to sample from the posterior distribution using the number of cells in each subpopulation and cell density data (summary statistics S_{1} to S_{3} and S_{7} to S_{12} in table 2, respectively) and present the samples from the posterior distribution in figure 7. Consistent with results found in §4.1 and Simpson et al. [9], estimates for the motility rates are practically nonidentifiable when cell density data are used.
Next, we attempt to sample from the posterior distribution using the number of cells in each subpopulation and cell trajectory data (summary statistics S_{1} to S_{3} and S_{4} to S_{6} in table 2, respectively). We present the marginal posterior distributions produced in figures 8a,b along with the mean, standard deviation, (2.5%, 50%, 97.5%) quantiles, and the coefficient of variation (CV) in table 3. The practical identifiability in the transition and motility parameters clearly shows the benefits of using cell tracking data as the distributions are unimodal and concentrated. We estimate the cell cycle transition rates to be between 0.0411−0.193 h^{−1} which is consistent with estimates in existing literature [4,9]. Our estimates for cell motility were found to range between 0.316−1.12 h^{−1} which corresponds to estimates of cell diffusivity between 31.6 and 112 μm^{2} h^{−1} which is reasonable considering the degree of uncertainty in existing estimates which can vary between 0 and 3304 μm^{2} h^{−1} [13,14,21]. The precision in parameter estimates can be quantified by the CV which is a standard measure for the dispersion of data around the mean. Using the CV, the dispersion in the transition rates range from 2.65 to 5.31% and the motility rates range from 10.9 to 18.4%. To validate the parameter estimates recovered, we also present the posterior predictive distributions for the summary statistics retained from each parameter value in the posterior in figures 8c,d. These distributions are formed by plotting the distribution of simulated summary statistics produced from the posterior samples and is compared to the observed summary statistics (dashed line). These results suggest that the Markov process model developed by Simpson et al. [7] is promising as it is able to recover the observed summary statistics of the experimental data. However, further model validation should be considered in future research to determine if the simulated cell trajectories produce similar paths to those which were observed.
parameter  mean  s.d.  (2.5%, 50%, 97.5%)  CV (%) 

R_{r}  0.0411  0.00109  (0.039, 0.0411, 0.0432)  2.65 
R_{y}  0.192  0.0102  (0.173, 0.191, 0.214)  5.31 
R_{g}  0.193  0.00957  (0.177, 0.192, 0.213)  4.96 
M_{r}  0.316  0.0343  (0.256, 0.313, 0.385)  10.9 
M_{y}  0.514  0.0945  (0.353, 0.502, 0.725)  18.4 
M_{g}  1.12  0.169  (0.836, 1.10, 1.48)  15.1 
5. Discussion
In this study, we calibrate the 2D hexagonallattice random walk model developed by Simpson et al. [7] to scratch assay data where the cell cycle is revealed in real time using FUCCI technology. While this model is well suited to describing the stochastic nature of cell proliferation and migration, the likelihood function consequently becomes intractable. This makes conventional Bayesian approaches to parameter inference infeasible. We resort to using the class of Bayesian methods known as ABC which bypass evaluating the likelihood function. After evaluating the appropriateness of different ABC algorithms in §3.2 we find the SMCABC replenishment algorithm developed by Drovandi & Pettitt [40] to be suitable.
In this study, we work with uniform prior distributions. This may be considered as a reasonably vague prior, but it can also be interpreted as providing more support to larger rate parameters when the prior ranges over several orders of magnitude. To discourage larger rate parameter values, other priors could be considered, such as Jeffry’s prior over all positive reals. If a proper prior (i.e. integrates to unity) is desired, the Jeffrey’s prior could be truncated or an exponential prior used instead. We leave such extensive prior sensitivity analysis and performance for future research.
In §3.1, we previously discussed the intractability of the likelihood function. Although, we did not discuss particle filtering methods to construct a continuous time likelihood function which can be considerably more tractable than those based off discrete data. However, given that the model is highly stochastic, very different cell trajectories can be produced with the same parameter values. This would make filtering approaches difficult to apply. Instead, it is more efficient to match summary statistics of the cell trajectories (here we consider the average time spent in each phase). Additionally, pseudomarginal MCMC [49] could be used to sample from the exact posterior distribution if an unbiased likelihood estimator (based on the full dataset) with a small enough variance can be constructed. Unfortunately, due to the complexity of the model and the need to summarize the data, it does not seem feasible to construct such a likelihood estimator here. Therefore, the nature of the modelling approach and the use of summary statistics naturally lends itself to using ABC methods.
The accuracy of ABC methods in approximating the posterior distribution is sensitive to the quality of the summary statistics used [35]. We trial various summary statistics with multiple synthetic data sets to determine which summary statistics are the most informative. We find using the number of cells in each cell cycle phase at the end of the experiment to be highly informative about the cell cycle transition rates. We trial and compare two sets of summary statistics for the motility parameters: the median position and interquartile range of the cells in the xdirection on the left and right side of the scratch assay (which we refer to as cell density data); and the average distance travelled through each cell phase by 20 individual cells (which we refer to as cell trajectory data). Using these two sets of summary statistics in conjunction with the cell count data, we attempt to draw samples from the posterior distribution using the SMCABC replenishment algorithm with multiple biologically plausible synthetic data sets. We find that when using cell trajectory data as summary statistics the parameters are practically identifiable; however this is not the case when cell density data are used. Importantly, this is the first time practical parameter identifiability for both cell cycle transition and motility has been successfully conducted with fluorescent cell cycle labelling scratch assay experiments.
In this study, we summarize cell trajectory data by taking the average distance travelled in each phase across 20 cell trajectories. However, additional features of cell trajectories could also be considered (for example, the variance). Although, the addition of more summary statistics may increase ABC error due to the increased dimensionality of the summary statistic despite our efforts to treat this with regression adjustment. An additional method which could be used is semiautomatic ABC [50] which constructs a set of summary statistics with the same dimension as the parameter space by modelling the importance of the initial set of summary statistics. However, exploration of these additional summary statistics and the the effects on the precision of the posterior distribution are left for future research.
We extend on the work of previous studies [7,9] by calibrating our model to real data and performing Bayesian inference. Using experimental data of WM983C FUCCItransduced melanoma cells, we estimate the approximate posterior using the SMCABC algorithm with our cell cycle transition rate summary statistics and our two sets of motility summary statistics. Under the experimental setting, our results again find the estimates for the motility parameters to be practically nonidentifiable when cell density data are used but practically identifiable when cell trajectory data are used. These results are consistent with Simpson et al. [9] and justify the motivation to use a stochastic model capable of generating multiple data types. When using the number of cells in each subpopulation and cell trajectory data, we find estimates for the average cell cycle transition rates to range between 0.0411−0.193 h^{−1} and estimates for average cell motility to range between 0.316−1.12 h^{−1}. Interestingly, we find that the motility rates appear to depend upon the cell cycle phase and for these data the motility of cells in S/G2/M phase is higher than the motility rate in the in G1 or eS phase. We quantify the precision of these estimates through the CV which is a standard measure of dispersion about the mean. We find the CV to be suitably small for all parameters as it ranges from 2.65–5.31% and 10.9−18.4% for the transition and motility marginal posteriors, respectively. To validate our results, we also draw samples from the posterior predictive distribution to determine whether the simulated data sets recovered accurately reflect the observed datasets. These results confirm that the model and summary statistics are recovering the underlying mechanisms present in the experiment.
Now that the recovery of precise parameter estimates from a fluorescent cell cycle labelling model has been demonstrated, further models can be built which are more biologically realistic. For instance, the Markov process model we used in this study describes a discrete exclusion based random walk on a 2D hexagonal lattice. However, a more biologically realistic and meaningful model would incorporate a threedimensional (3D) environment (e.g. [51]). By constraining our model to a 2D hexagonal lattice, we ultimately omit realistically modelling: the spatial supply of oxygen, nutrients and drugs; the orientation in 3D space; and interactions with the extracellular matrix [19,52]. Although, increasing model complexity tends to require additional parameters in the model which in some applications may render ABC methods ill suited to inference due to their poorer performance in higher dimensions [50]. Such modelling and inference implications would need to be considered in future work. Nevertheless, we demonstrate that the 2D stochastic model developed by Simpson et al. [7] is able to recover key features of the experimental data set we examined and can be used to provide a quick and inexpensive alternative to in vitro experiments.
Finally, to bypass evaluating the likelihood function we resort to using ABC techniques. However, ABC requires many model simulations, which can be computationally expensive if the simulation model is relatively inefficient. In our application, the computation time for the model is largely dependent on the value of the transition and motility parameters, where larger values will require more computation. We compute the computational cost of 1000 simulations with parameter configurations drawn from the prior distribution and report the computational cost of the model as a 95% empirical confidence interval that ranges between 1.08 and 57.33 s per simulation. Using an Intel(R) Xeon(R) Gold 6140 CPU at 2.3 GHz and paralysing over the 16 cores results in the total computation time of the SMCABC algorithm taking approximately 23 h to run when using cell count and cell trajectory data and 16 h when using cell count and cell density data. We find these computation times to be reasonable but future work may need to consider more computationally efficient modelling and/or statistical methods, particularly if more summary statistics are to be considered.
Data accessibility
Rcpp and MATLAB implementations of the simulation model and SMCABC algorithm along with the data used in this study are available at https://github.com/michaelcarrstats/FUCCI.
Authors' contributions
All authors designed the research. M.J.C. performed the research, wrote the manuscript, produced the figures and implemented the computational algorithms used in this study. All authors edited and approved this manuscript.
Competing interests
We declare we have no competing interests.
Funding
M.J.S. is supported by the Australian Research Council (DP200100177) and C.D. is supported by the Australian Research Council (DP200102101).
Acknowledgements
We would like to acknowledge the services of Queensland University of Technologies High Performance Computing (QUT HPC) for allowing the code used within this study to be ran on their servers. Additionally, M.J.C. would like to thank funding provided by M.J.S. and C.D. We thank the four referees for their helpful suggestions.
References
 1.
Parkin DM, Bray F, Ferlay J, Pisani P . 2005 Global cancer statistics, 2002. CA Cancer J. Clin. 55, 74108. (doi:10.3322/canjclin.55.2.74) Crossref, PubMed, Web of Science, Google Scholar  2. Australian Institute of Health Welfare. 2018 Cancer in Australia: actual incidence data from 1982 to 2013 and mortality data from 1982 to 2014 with projections to 2017. AsiaPacific J. Clin. Oncol. 14, 515. (doi:10.1111/ajco.12761) Crossref, Web of Science, Google Scholar
 3.
Giblin AV, Thomas JM . 2007 Incidence, mortality and survival in cutaneous melanoma. J. Plast., Reconstr. Aesthetic Surg. 60, 3240. (doi:10.1016/j.bjps.2006.05.008) Crossref, PubMed, Web of Science, Google Scholar  4.
Haass NK, Beaumont KA, Hill DS, Anfosso A, Mrass P, Munoz MA, Kinjyo I, Weninger W . 2014 Realtime cell cycle imaging during melanoma growth, invasion, and drug response. Pigment Cell Melanoma Res. 27, 764776. (doi:10.1111/pcmr.12274) Crossref, PubMed, Web of Science, Google Scholar  5.
SakaueSawano A et al. 2008 Visualizing spatiotemporal dynamics of multicellular cellcycle progression. Cell 132, 487498. (doi:10.1016/j.cell.2007.12.033) Crossref, PubMed, Web of Science, Google Scholar  6.
Haass NK, Gabrielli B . 2017 Cell cycletailored targeting of metastatic melanoma: challenges and opportunities. Exp. Dermatol. 26, 649655. (doi:10.1111/exd.13303) Crossref, PubMed, Web of Science, Google Scholar  7.
Simpson MJ, Jin W, Vittadello ST, Tambyah TA, Ryan JM, Gunasingh G, Haass NK, McCue SW . 2018 Stochastic models of cell invasion with fluorescent cell cycle indicators. Physica A 510, 375386. (doi:10.1016/j.physa.2018.06.128) Crossref, Web of Science, Google Scholar  8.
PerezCarrasco R, Beentjes C, Grima R . 2020 Effects of cell cycle variability on lineage and population measurements of messenger RNA abundance. J. R. Soc. Interface 17, 20200360. (doi:10.1098/rsif.2020.0360) Link, Web of Science, Google Scholar  9.
Simpson MJ, Baker RE, Vittadello ST, Maclaren OJ . 2020 Practical parameter identifiability for spatiotemporal models of cell invasion. J. R. Soc. Interface 17, 20200055. (doi:10.1098/rsif.2020.0055) Link, Web of Science, Google Scholar  10.
Raue A, Kreutz C, Maiwald T, Bachmann J, Schilling M, Klingmüller U, Timmer J . 2009 Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood. Bioinformatics 25, 19231929. (doi:10.1093/bioinformatics/btp358) Crossref, PubMed, Web of Science, Google Scholar  11.
Hywood JD, Rice G, Pageon SV, Read MN, Biro M . 2021 Detection and characterization of chemotaxis without cell tracking. J. R. Soc. Interface 18, 20200879. (doi:10.1098/rsif.2020.0879) Link, Web of Science, Google Scholar  12.
Vo BN, Drovandi CC, Pettitt AN, Simpson MJ . 2015 Quantifying uncertainty in parameter estimates for stochastic models of collective cell spreading using approximate Bayesian computation. Math. Biosci. 263, 133142. (doi:10.1016/j.mbs.2015.02.010) Crossref, PubMed, Web of Science, Google Scholar  13.
Maini PK, McElwain DLS, Leavesley DI . 2004 Traveling wave model to interpret a woundhealing cell migration assay for human peritoneal mesothelial cells. Tissue Eng. 10, 475482. (doi:10.1089/107632704323061834) Crossref, PubMed, Google Scholar  14.
Cai AQ, Landman KA, Hughes BD . 2007 Multiscale modeling of a woundhealing cell migration assay. J. Theor. Biol. 245, 576594. (doi:10.1016/j.jtbi.2006.10.024) Crossref, PubMed, Web of Science, Google Scholar  15.
Swanson KR . 2008 Quantifying glioma cell growth and invasion in vitro. Math. Computer Model. 47, 638648. (doi:10.1016/j.mcm.2007.02.024) Crossref, Google Scholar  16.
Takamizawa K, Niu S, Matsuda T . 1997 Mathematical simulation of unidirectional tissue formation: in vitro transanastomotic endothelialization model. J. Biomater. Sci., Polym. Ed. 8, 323334. (doi:10.1163/156856296X00336) Crossref, Web of Science, Google Scholar  17.
Savla U, Olson LE, Waters CM . 2004 Mathematical modeling of airway epithelial wound closure during cyclic mechanical strain. J. Appl. Physiol. 96, 566574. (doi:10.1152/japplphysiol.00510.2003) Crossref, PubMed, Web of Science, Google Scholar  18.
SantiagoWalker A, Li L, Haass N, Herlyn M . 2009 Melanocytes: from morphology to application. Skin Pharmacol. Physiol. 22, 114121. (doi:10.1159/000178870) Crossref, PubMed, Web of Science, Google Scholar  19.
Beaumont KA, MohanaKumaran N, Haass NK . 2014 Modeling melanoma in vitro and in vivo. Healthcare (Basel, Switzerland) 2, 2746. (doi:10.3390/healthcare2010027) Google Scholar  20.
Vittadello ST, McCue SW, Gunasingh G, Haass NK, Simpson MJ . 2018 Mathematical models for cell migration with realtime cell cycle dynamics. Biophys. J. 114, 12411253. (doi:10.1016/j.bpj.2017.12.041) Crossref, PubMed, Web of Science, Google Scholar  21.
Treloar KK, Simpson MJ, Haridas P, Manton KJ, Leavesley DI, McElwain DLS, Baker RE . 2013 Multiple types of data are required to identify the mechanisms influencing the spatial expansion of melanoma cell colonies. BMC Syst. Biol. 7, 137. (doi:10.1186/175205097137) Crossref, PubMed, Web of Science, Google Scholar  22.
Gillespie DT . 1977 Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81, 23402361. (doi:10.1021/j100540a008) Crossref, Web of Science, Google Scholar  23.
Ermentrout GB, EdelsteinKeshet L . 1993 Cellular automata approaches to biological modeling. J. Theor. Biol. 160, 97133. (doi:10.1006/jtbi.1993.1007) Crossref, PubMed, Web of Science, Google Scholar  24.
Johnston ST, Ross JV, Binder BJ, McElwain DLS, Haridas P, Simpson MJ . 2016 Quantifying the effect of experimental design choices for in vitro scratch assays. J. Theor. Biol. 400, 1931. (doi:10.1016/j.jtbi.2016.04.012) Crossref, PubMed, Web of Science, Google Scholar  25.
Sidje RB . 1998 Expokit: a software package for computing matrix exponentials. ACM Trans. Math. Softw. (TOMS) 24, 130156. (doi:10.1145/285861.285868) Crossref, Web of Science, Google Scholar  26.
Moler C, Van Loan C . 2003 Nineteen dubious ways to compute the exponential of a matrix, twentyfive years later. SIAM Rev. 45, 349. (doi:10.1137/S00361445024180) Crossref, Web of Science, Google Scholar  27.
Ho LST, Xu J, Crawford FW, Minin VN, Suchard MA . 2018 Birth/birthdeath processes and their computable transition probabilities with biological applications. J. Math. Biol. 76, 911944. (doi:10.1007/s0028501711603) Crossref, PubMed, Web of Science, Google Scholar  28.
Schnoerr D, Sanguinetti G, Grima R . 2017 Approximation and inference methods for stochastic biochemical kinetics—a tutorial review. J. Phys. A: Math. Theor. 50, 093001. (doi:10.1088/17518121/aa54d9) Crossref, Web of Science, Google Scholar  29.
Kursawe J, Baker RE, Fletcher AG . 2018 Approximate bayesian computation reveals the importance of repeated measurements for parameterising cellbased models of growing tissues. J. Theor. Biol. 443, 6681. (doi:10.1016/j.jtbi.2018.01.020) Crossref, PubMed, Web of Science, Google Scholar  30.
Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP . 2009 Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6, 187202. (doi:10.1098/rsif.2008.0172) Link, Web of Science, Google Scholar  31.
Guillemaud T, Beaumont MA, Ciosi M, Cornuet JM, Estoup A . 2010 Inferring introduction routes of invasive species using approximate Bayesian computation on microsatellite data. Heredity 104, 8899. (doi:10.1038/hdy.2009.92) Crossref, PubMed, Web of Science, Google Scholar  32.
Weyant A, Schafer C, WoodVasey WM . 2013 Likelihoodfree cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty. Astrophys. J. 764, 116. (doi:10.1088/0004637X/764/2/116) Crossref, Web of Science, Google Scholar  33.
Sisson SA, Fan Y, Beaumont M . 2018 Handbook of approximate Bayesian computation. Boca Raton, FL: CRC Press. Crossref, Google Scholar  34.
Blum MG, Nunes MA, Prangle D, Sisson SA . 2013 A comparative review of dimension reduction methods in approximate Bayesian computation. Stat. Sci. 28, 189208. (doi:10.1214/12STS406) Crossref, Web of Science, Google Scholar  35.
Beaumont MA, Zhang W, Balding DJ . 2002 Approximate Bayesian computation in population genetics. Genetics 162, 20252035. (doi:10.1093/genetics/162.4.2025) Crossref, PubMed, Web of Science, Google Scholar  36.
Pritchard JK, Seielstad MT, PerezLezaun A, Feldman MW . 1999 Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol. Biol. Evol. 16, 17911798. (doi:10.1093/oxfordjournals.molbev.a026091) Crossref, PubMed, Web of Science, Google Scholar  37.
Tavaré S, Balding DJ, Griffiths RC, Donnelly P . 1997 Inferring coalescence times from DNA sequence data. Genetics 145, 505518. (doi:10.1093/genetics/145.2.505) Crossref, PubMed, Web of Science, Google Scholar  38.
Marjoram P, Molitor J, Plagnol V, Tavaré S . 2003 Markov chain Monte Carlo without likelihoods. Proc. Natl Acad. Sci. USA 100, 15 32415 328. (doi:10.1073/pnas.0306899100) Crossref, Web of Science, Google Scholar  39.
Sisson SA, Fan Y, Tanaka MM . 2007 Sequential Monte Carlo without likelihoods. Proc. Natl Acad. Sci. USA 104, 17601765. (doi:10.1073/pnas.0607208104) Crossref, PubMed, Web of Science, Google Scholar  40.
Drovandi CC, Pettitt AN . 2011 Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67, 225233. (doi:10.1111/j.15410420.2010.01410.x) Crossref, PubMed, Web of Science, Google Scholar  41.
Harrison JU, Baker RE . 2020 An automatic adaptive method to combine summary statistics in approximate Bayesian computation. PLoS ONE 15, e0236954. (doi:10.1371/journal.pone.0236954) Crossref, PubMed, Web of Science, Google Scholar  42.
Hastings WK . 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97109. Crossref, Web of Science, Google Scholar  43.
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E . 1953 Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 10871092. (doi:10.1063/1.1699114) Crossref, Web of Science, Google Scholar  44.
Epanechnikov VA . 1969 Nonparametric estimation of a multivariate probability density. Theory Probab. Appl. 14, 153158. (doi:10.1137/1114019) Crossref, Google Scholar  45.
Hamilton G, Stoneking M, Excoffier L . 2005 Molecular analysis reveals tighter social regulation of immigration in patrilocal populations than in matrilocal populations. Proc. Natl Acad. Sci. USA 102, 74767480. (doi:10.1073/pnas.0409253102) Crossref, PubMed, Web of Science, Google Scholar  46.
Codling EA, Plank MJ, Benhamou S . 2008 Random walk models in biology. J. R. Soc. Interface 5, 813834. (doi:10.1098/rsif.2008.0014) Link, Web of Science, Google Scholar  47.
Rueden CT, Schindelin J, Hiner MC, DeZonia BE, Walter AE, Arena ET, Eliceiri KW . 2017 ImageJ2: ImageJ for the next generation of scientific image data. BMC Bioinf. 18, 126. (doi:10.1186/s128590171934z) Crossref, PubMed, Web of Science, Google Scholar  48. R Core Team. 2018 R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. See http://www.Rproject.org/. Google Scholar
 49.
Andrieu C, Roberts GO . 2009 The pseudomarginal approach for efficient Monte Carlo computations. Ann. Stat. 37, 697725. (doi:10.1214/07AOS574) Crossref, Web of Science, Google Scholar  50.
Fearnhead P, Prangle D . 2012 Constructing summary statistics for approximate Bayesian computation: semiautomatic approximate Bayesian computation. J. R. Stat. Soc. B (Stat. Methodol.) 74, 419474. (doi:10.1111/j.14679868.2011.01010.x) Crossref, Web of Science, Google Scholar  51.
Jin W, Spoerri L, Haass NK, Simpson MJ . 2021 Mathematical model of tumour spheroid experiments with realtime cell cycle imaging. Bull. Math. Biol. 83, 123. (doi:10.1007/s11538021008784) Crossref, Web of Science, Google Scholar  52.
Smalley KS, Lioni M, Herlyn M . 2006 Life isn’t flat: taking cancer biology to the next dimension. In Vitro Cell. Dev. Biol. Anim. 42, 242247. (doi:10.1290/0604027.1) Crossref, PubMed, Web of Science, Google Scholar