Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Open Access Research article

Pushing coarse-grained models beyond the continuum limit using equation learning

Daniel J. VandenHeuvel
School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
Contribution: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Writing – original draft

Pascal R. Buenzli
School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
Contribution: Conceptualization, Formal analysis, Funding acquisition, Project administration, Supervision, Writing – review & editing

Matthew J. Simpson
School of Mathematical Sciences, Queensland University of Technology, Brisbane, Australia
[email protected]
Contribution: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    Abstract

    Mathematical modelling of biological population dynamics often involves proposing high-fidelity discrete agent-based models that capture stochasticity and individual-level processes. These models are often considered in conjunction with an approximate coarse-grained differential equation that captures population-level features only. These coarse-grained models are only accurate in certain asymptotic parameter regimes, such as requiring that the rate of individual motility far exceeds the rate of birth/death processes. Even when these coarse-grained models are inaccurate, the discrete model still abides by conservation laws at the microscopic level, which implies that there is some macroscopic conservation law that can describe the macroscopic dynamics. In this work, we introduce an equation learning framework to find accurate coarse-grained models when standard continuum limit approaches are inaccurate. We demonstrate our approach using a discrete mechanical model of epithelial tissues, through a series of four case studies with and without free boundaries, and with and without proliferation, illustrating how we can learn macroscopic equations describing mechanical relaxation, cell proliferation, and the equation governing the dynamics of the free boundary of the tissue. While our presentation focuses on this biological application, our approach is broadly applicable across a range of scenarios where discrete models are approximated by continuum-limit descriptions.

    1. Introduction

    Mathematical models of population dynamics are often constructed by considering both discrete and continuous descriptions, allowing for both microscopic and macroscopic details to be considered [1]. This approach has been applied to several kinds of discrete models, including cellular Potts models [2–5], exclusion processes [6–9], mechanical models of epithelial tissues [10–17], hydrodynamics [18,19] and a variety of other types of individual-based models [1,20–27]. Continuum models are useful for describing collective behaviour, especially because the computational requirement of discrete models increases with the size of the population, and this can become computationally prohibitive for large populations, which is particularly problematic for parameter inference [28]. By contrast, the computational requirement to solve a continuous model is independent of the population size, and generally requires less computational overhead than working with a discrete approach only [15]. Continuum models are typically obtained by coarse-graining the discrete model, using Taylor series expansions to obtain continuous partial differential equation (PDE) models that govern the population densities on a continuum or macroscopic scale [10,11,29,30].

    One challenge with using coarse-grained continuum limit models is that while the solution of these models can match averaged data from the corresponding discrete model for certain choices of parameters [10,17,31], the solution of the continuous model can be a very poor approximation for other parameter choices [13,15,32,33]. More generally, coarse-grained models are typically only valid in certain asymptotic parameter regimes [31,33,34]. For example, suppose we have a discrete space, discrete time, agent-based model that incorporates random motion and random proliferation. Random motion involves stepping a distance Δ with probability P_m ∈ [0,1] per time step of duration τ. The stochastic proliferation process involves undergoing proliferation with probability P_p ∈ [0,1] per time step. The continuum limit description of this kind of discrete process can be written as [33]

    ∂q/∂t = ∂/∂x ( D(q) ∂q/∂x ) + R(q),   (1.1)
    where q is the macroscopic density of individuals, D(q) is the nonlinear diffusivity that describes the effects of individual migration and R(q) is a source term that describes the effects of the birth/death process in the discrete model [33]. Standard approaches to derive (1.1) require D(q) = O(P_mΔ²/τ) and R(q) = O(P_p/τ) in the limit that Δ → 0 and τ → 0. To obtain a well-defined continuum limit such that the diffusion and source terms are both present in the macroscopic model, some restrictions on the parameters in the discrete model are required [33,34]. Typically, this is achieved by taking the limit as Δ → 0 and τ → 0 jointly such that the ratio Δ²/τ remains finite, implying that P_p = O(τ) so that both the diffusion and source terms in (1.1) are O(1). In practice, this means that the time scale of individual migration events has to be much faster than the time scale of individual proliferation events, otherwise the continuum limit description is not well defined [33,34]. If this restriction is not enforced, then the solution of the continuum limit model does not always predict the averaged behaviour of the discrete model [33], as the terms on the right-hand side of (1.1) are no longer O(1) and the continuum limit is not well defined [34].
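As a concrete illustration of how a conservation law such as (1.1) is treated numerically, the sketch below discretizes (1.1) with a conservative finite-volume scheme, forward Euler in time and zero-flux boundaries. It is written in Python for brevity (the code accompanying this work is in Julia), and the constant diffusivity D0 and the logistic source term are illustrative choices, not specified by the text above.

```python
# A hedged sketch (not this paper's code, which is in Julia) of a conservative
# finite-volume discretization of (1.1) on a fixed domain with zero-flux
# boundaries. The constant diffusivity D0 and logistic source are
# illustrative choices only.
import numpy as np

def step(q, dx, dt, D0=1.0, beta=0.1, K=1.0):
    """Advance the cell-averaged densities q by one forward-Euler step."""
    Dq = np.full_like(q, D0)                    # a nonlinear D(q) would go here
    # flux -D dq/dx at interior cell interfaces, D averaged between neighbours
    flux = -0.5 * (Dq[1:] + Dq[:-1]) * np.diff(q) / dx
    div = np.zeros_like(q)
    div[1:-1] = (flux[1:] - flux[:-1]) / dx     # interior divergence of the flux
    div[0] = flux[0] / dx                       # zero flux at the left boundary
    div[-1] = -flux[-1] / dx                    # zero flux at the right boundary
    return q + dt * (-div + beta * q * (1.0 - q / K))
```

Because the scheme is written in flux form, the total mass changes only through the source term, mirroring the conservation structure of (1.1).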

    Regardless of whether choices of parameters in a discrete model obey the asymptotic restrictions imposed by coarse-graining, the discrete model still obeys a conservation principle, which implies that there is some alternative macroscopic conservation description that will describe population-level features of interest [35,36]. Equation learning is a means of determining appropriate continuum models outside of the usual continuum limit asymptotic regimes, and has been used in several applications for model discovery. In the context of PDEs, a typical approach is to write ∂q/∂t = N(q, D, θ), where q is the population density, D is a collection of differential operators, and N is some nonlinear function parametrized by θ, the parameters to be estimated [37]. This formulation was first introduced by Rudy et al. [37], who extended previous work in learning ordinary differential equations (ODEs) proposed by Brunton et al. [38]. Equation learning methods developed for the purpose of learning biological models have also been a key interest [39,40]. Lagergren et al. [41] introduce a biologically informed neural network framework in which equation learning is guided by biological constraints, imposing a specific conservation PDE rather than a general nonlinear function N. Lagergren et al. [41] use this framework to discover a model describing data from simple in vitro experiments that describe the invasion of populations of motile and proliferative prostate cancer cells. VandenHeuvel et al. [42] extend the work of Lagergren et al. [41], incorporating uncertainty quantification into the equation learning procedure through a bootstrapping approach. Nardini et al. [32] use discrete data from agent-based models to learn associated continuum ODE models, combining a user-provided library of functions together with sparse regression methods to give simple ODE models describing population densities.
Regression methods have also been used as an alternative to equation learning for this purpose [43].

    These previous approaches to equation learning consider various methods to estimate the parameters θ, such as sparse regression or nonlinear optimization [32,37–39,41,42], representing N as a library of functions [37–39], neural networks [41], or in the form of a conservation law with individual components to be learned [41,42]. In this work, we introduce a stepwise equation learning framework, inspired by stepwise regression [44], for estimating θ from averaged discrete data with a given N representing a proposed form for the continuum model description. We incorporate or remove terms one at a time until a parsimonious continuum model is obtained whose solution matches the data well and no further improvements can be made to this match. Our approach is advantageous for several reasons. Firstly, it is computationally efficient and parallelizable, allowing for rapid exploration of results with different discrete parameters and different forms of N for a given dataset. Secondly, the approach is modular, with different mechanistic features easily incorporated. This approach enables extensive computational experimentation by exploring the impact of including or excluding putative terms in the continuum model without any great increase in computational overhead. Lastly, it is easy to examine the results from our procedure, allowing for ease of diagnosing and correcting reasons for obtaining poor fitting models, and explaining which components of the continuum model are the most influential. We emphasize that a key difference between our approach and other work, such as the methods developed by Brunton et al. [38] and Rudy et al. [37], is that we constrain our problem so that we can only learn conservation laws rather than allow a general form through a library of functions, and that we iteratively eliminate variables from θ rather than use sparse regression. These important features are what support the modularity and interpretability of our approach.

    To illustrate our procedure, we consider a discrete, individual-based, one-dimensional toy model inspired by epithelial tissues [10,17]. Epithelial tissues are biological tissues composed of cells organized in a monolayer; they are present in many parts of the body, interact with other cells [45], and line surfaces such as the skin and the intestine [46]. They are important in a variety of contexts, such as wound healing [47,48] and cancer [49,50]. Many models have been developed for studying their dynamics, considering both discrete and continuum modelling [10–16], with most models given in the form of a nonlinear reaction–diffusion equation with a moving boundary, using a nonlinear diffusivity term to incorporate mechanical relaxation and a source term to model cell proliferation [12,13,16]. These continuum limit models too are only accurate in certain parameter regimes, becoming inaccurate if the rate of mechanical relaxation is slow relative to the rate of proliferation [13,15,33]. To apply our stepwise equation learning procedure, we let the nonlinear function N be given in the form of a conservation law together with equations describing the free boundary. We demonstrate this approach using a series of four biologically motivated case studies, considering problems with and without a free boundary, and with and without proliferation, with each case study building on those before it. The first two case studies are used to demonstrate how our approach can learn known continuum limits, while the latter two case studies show how we can learn improved continuum limit models in parameter regimes where these known continuum limits are no longer accurate. We implement our approach in the Julia language [51], and all code is available on GitHub at https://github.com/DanielVandH/StepwiseEQL.jl.

    2. Mathematical model

    Following Murray et al. [10] and Baker et al. [16], we suppose that we have a set of nodes x_1, …, x_n(t) describing n cell boundaries at time t. The interval (x_i(t), x_{i+1}(t)) represents the ith cell for i = 1, …, n − 1, where we fix x_1 = 0 and x_1 < x_2(t) < ⋯ < x_n(t). The number of nodes, n, may increase over time due to cell proliferation. We model the mechanical interaction between cells by treating them as springs, as indicated in figure 1, so that each node i experiences forces F_{i,i−1} and F_{i,i+1} from nodes i − 1 and i + 1, respectively, except at the boundaries where there is only one neighbouring force. We further assume that each of these springs has the same mechanical properties, and that the viscous force from the surrounding medium is given by η dx_i(t)/dt with drag coefficient η. Lastly, assuming we are in a viscous medium so that the motion is overdamped, we can model the dynamics of each individual node x_i(t), fixing x_1 = 0, by [16]

    η dx_i(t)/dt = F_{i,i−1} + F_{i,i+1},   i = 1, …, n − 1,   (2.1)
    and
    η dx_n(t)/dt = F_{n,n−1},   (2.2)
    where
    F_{i,i±1} = F(|x_i(t) − x_{i±1}(t)|) (x_i(t) − x_{i±1}(t)) / |x_i(t) − x_{i±1}(t)|   (2.3)
    is the interaction force that the ith node experiences from nodes i ± 1 (figure 1). In Case Studies 1 and 3 (see §3, below), we hold x_n(t) = L constant and discard (2.2). Throughout this work, we use linear Hookean springs so that F(ℓ_i) = k(s − ℓ_i), ℓ_i > 0, where ℓ_i(t) = x_{i+1}(t) − x_i(t) is the length of the ith cell, k > 0 is the spring constant and s ≥ 0 is the resting spring length [10]; we discuss other force laws in electronic supplementary material, §S5.
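The force balance (2.1)–(2.3) with the Hookean law can be sketched as follows. This is a minimal Python illustration with forward-Euler time stepping, not the implementation used in this work (the Julia package EpithelialDynamics1D.jl); the function names, integrator and time step are illustrative assumptions.

```python
# Minimal Python sketch of the overdamped spring dynamics (2.1)-(2.2) with the
# Hookean law F(l) = k(s - l), integrated with forward Euler. Names and the
# time step are illustrative, not taken from the paper's Julia implementation.
import numpy as np

def net_forces(x, k=50.0, s=0.2):
    """Net spring force on each node, for sorted node positions x."""
    l = np.diff(x)            # cell lengths l_i = x_{i+1} - x_i
    f = k * (s - l)           # Hookean law F(l) = k(s - l)
    F = np.zeros_like(x)
    F[:-1] -= f               # F_{i,i+1}: contribution from the right-hand spring
    F[1:] += f                # F_{i,i-1}: contribution from the left-hand spring
    return F

def relax(x, dt=1e-3, steps=20000, eta=1.0, k=50.0, s=0.2, fixed_right=True):
    """Forward-Euler integration with x[0] (and optionally x[-1]) held fixed."""
    x = np.array(x, dtype=float)
    for _ in range(steps):
        F = net_forces(x, k, s)
        F[0] = 0.0                    # x_1 = 0 is fixed
        if fixed_right:
            F[-1] = 0.0               # fixed-boundary case (Case Studies 1 and 3)
        x += dt * F / eta
    return x
```

With both boundaries fixed, the net force on an interior node vanishes only when its two neighbouring cells have equal length, so the sketch relaxes towards a uniformly spaced steady state.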

    Figure 1. Discrete model and schematics for each case study (CS). (a) A fixed boundary problem with x_1 = 0 and x_n = L fixed. (b) A free boundary problem with x_1 = 0 and x_n(t) = L(t), shown in red, free. (c) Proliferation schematic, showing a cell (x_i(t), x_{i+1}(t)) dividing into (x_i(t + Δt), x_{i+1}(t + Δt)) and (x_{i+1}(t + Δt), x_{i+2}(t + Δt)) following a proliferation event, where x_{i+1}(t + Δt) = (x_i(t) + x_{i+1}(t))/2. Panels (d–g) show schematics for the four case studies considered in the paper, where the first row in each panel is a representation of the initial configuration of cells at t = 0 and the second row a representation at a later time t > 0.

    The dynamics governed by (2.1)–(2.2) describe a system in which cells mechanically relax. Following previous work [10,14,16], we introduce a stochastic mechanism that allows the cells to undergo proliferation, assuming only one cell can divide at a time over a given interval [t, t + Δt) for some small duration Δt. We let the probability that the ith cell proliferates be given by G_iΔt, where G_i = G(ℓ_i) for some length-dependent proliferation law G(ℓ_i) > 0. As represented in figure 1c, when the ith cell proliferates, the cell divides into two equally sized daughter cells, and the boundary between the new daughter cells is placed at the midpoint of the original cell. Throughout this work, we use a logistic proliferation law G(ℓ_i) = β[1 − 1/(Kℓ_i)] with ℓ_i > 1/K, where β is the intrinsic proliferation rate and K is the carrying capacity density; we consider other proliferation laws in electronic supplementary material, §S5. The implementation of the solution to these equations (2.1)–(2.2) and the proliferation mechanism is given in the Julia package EpithelialDynamics1D.jl; in this implementation, if G(ℓ_i) < 0, we set G(ℓ_i) = 0 to be consistent with the fact that we interpret G(ℓ_i)Δt as a probability. We emphasize that, without proliferation, we need only solve (2.1)–(2.2) once for a given initial condition in order to obtain the expected behaviour of the discrete model, because the discrete model is deterministic in the absence of proliferation. By contrast, incorporating proliferation means that we need to consider several identically prepared realizations of the same stochastic discrete model to estimate the expected behaviour of the discrete model for a given initial condition.
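A minimal sketch of one proliferation attempt over [t, t + Δt) might read as follows, assuming at most one division per interval, with the dividing cell selected in proportion to the probabilities G_iΔt and the new boundary placed at the cell midpoint. This Python sketch is illustrative and is not the Julia implementation in EpithelialDynamics1D.jl; the clamping of negative rates to zero mirrors the treatment described above.

```python
# Hedged sketch of one proliferation attempt over [t, t + dt): at most one cell
# divides, chosen in proportion to G(l_i) * dt, with the new boundary at the
# cell midpoint. Illustrative Python, not the paper's Julia implementation.
import numpy as np

def G(l, beta=0.15, K=15.0):
    """Logistic proliferation law, clamped at zero for short cells."""
    return max(beta * (1.0 - 1.0 / (K * l)), 0.0)

def proliferate(x, dt, rng, beta=0.15, K=15.0):
    l = np.diff(x)
    probs = np.array([G(li, beta, K) * dt for li in l])
    total = probs.sum()
    if total > 0 and rng.random() < total:          # at most one division per interval
        i = rng.choice(len(l), p=probs / total)     # pick the dividing cell
        x = np.insert(x, i + 1, 0.5 * (x[i] + x[i + 1]))
    return x
```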

    In practice, macroscopic models of populations of cells are described in terms of cell densities rather than keeping track of the position of individual cell boundaries. The density of the ith cell (x_i(t), x_{i+1}(t)) is 1/ℓ_i(t). For an interior node x_i(t), we obtain a density q_i(t) by taking the inverse of the average length of the cells to the left and right of x_i(t), giving

    q_i(t) = 2/(x_{i+1}(t) − x_{i−1}(t)),   i = 2, …, n − 1,   (2.4)
    as in Baker et al. [16]. At boundary nodes, we use
    q_1(t) = 2/x_2(t) − 2/x_3(t)   and   q_n(t) = 2/(x_n(t) − x_{n−1}(t)) − 2/(x_n(t) − x_{n−2}(t)),   (2.5)
    derived by linear extrapolation of (2.4) to the boundary. The densities in (2.5) ensure that the slopes of the density curves at the boundaries, ∂q/∂x, match those in the continuum limit. We discuss the derivation of (2.5) in electronic supplementary material, §S2. In the continuum limit where the number of cells is large and mechanical relaxation is fast, the densities evolve according to the moving boundary problem [10,16]
    ∂q/∂t = ∂/∂x ( D(q) ∂q/∂x ) + R(q),   0 < x < L(t), t > 0,
    ∂q/∂x = 0,   x = 0, t > 0,
    ∂q/∂x = H(q),   x = L(t), t > 0,
    q dL/dt = −D(q) ∂q/∂x,   x = L(t), t > 0,   (2.6)
    where q(x, t) is the density at position x and time t, D(q) = −F′(1/q)/(ηq²), R(q) = qG(1/q), H(q) = −2qF(1/q)/[ηD(q)], and L(t) = x_n(t) is the leading edge position with L(0) = x_n(0). The quantity 1/q in these equations can be interpreted as a continuous function related to the length of the individual cells. The initial condition q(x, 0) = q_0(x) is a linear interpolant of the discrete densities q_i(t) of the cells at t = 0. Similar to the discussion of (1.1), for this continuum limit to be valid so that both D(q) and R(q) play a role in the continuum model, constraints must be imposed on the discrete parameters. As discussed by Murphy et al. [15], we require that the time scale of mechanical relaxation is sufficiently fast relative to the time scale of proliferation. In practice, this means that for a given choice of β we must have k/η sufficiently large for the solution of the continuum model to match averaged data from the discrete model. We note that, with our choices of F and G, the functions in (2.6) are given by
    D(q) = k/(ηq²),   R(q) = βq(1 − q/K),   H(q) = 2q²(1 − qs).   (2.7)
    For fixed boundary problems, we take H(q)=0 and dL/dt=0. In electronic supplementary material, §S3, we describe how to solve (2.6) numerically, as well as how to solve the corresponding problem with fixed boundaries numerically.
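The density definitions (2.4)–(2.5) amount to a few lines of array arithmetic. The following Python sketch (illustrative, not the code used in this work) computes the interior and linearly extrapolated boundary densities from a sorted array of node positions; it is written relative to x[0] so that it also applies when the left boundary is not at the origin, reducing to (2.5) when x[0] = 0.

```python
# Illustrative Python sketch (not the paper's code) of the node densities:
# interior nodes use (2.4), q_i = 2/(x_{i+1} - x_{i-1}), and boundary nodes
# use the linearly extrapolated formulas (2.5), written relative to x[0].
import numpy as np

def node_densities(x):
    x = np.asarray(x, dtype=float)
    q = np.empty_like(x)
    q[1:-1] = 2.0 / (x[2:] - x[:-2])                       # interior nodes, (2.4)
    q[0] = 2.0 / (x[1] - x[0]) - 2.0 / (x[2] - x[0])       # left boundary, (2.5)
    q[-1] = 2.0 / (x[-1] - x[-2]) - 2.0 / (x[-1] - x[-3])  # right boundary, (2.5)
    return q
```

For uniformly spaced nodes with spacing h, every formula returns the expected density 1/h.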

    3. Continuum-discrete comparison

    We now consider four biologically motivated case studies to illustrate the performance of the continuum limit description (2.6). These case studies are represented schematically in figure 1d–g. Case Studies 1 and 3, shown in figure 1d,f, are fixed boundary problems, where we see cells relax mechanically towards a steady state where each cell has equal length. Case Studies 2 and 4 are free boundary problems, where the right-most cell boundary moves in the positive x-direction while all cells relax towards a steady state where the length of each cell is given by the resting spring length s. Case Studies 1 and 2 have β = 0 so that there is no cell proliferation and the number of cells remains fixed during the simulations, whereas Case Studies 3 and 4 have β > 0 so that the number of cells increases during the discrete simulations. To explore these problems, we first consider cases where the continuum limit model is accurate, using the data shown in figure 2, where we show space–time diagrams in the left column of figure 2 and a set of averaged density profiles for each problem in the right column of figure 2. Case Studies 1 and 3 initially place 30 nodes in 0 ≤ x ≤ 5 and 30 nodes in 25 ≤ x ≤ 30, or equivalently n = 60 with 28 cells in 0 ≤ x ≤ 5 and 28 cells in 25 ≤ x ≤ 30, spacing the nodes uniformly within each subinterval. Case Studies 2 and 4 initially place 60 equally spaced nodes in 0 ≤ x ≤ 5.


    Figure 2. Space–time diagrams (left column) and densities (right column) for the four case studies from figure 1 considered throughout the paper. The left column shows the evolution of the discrete densities in space and time, with (c) and (d) showing averaged results over 2500 identically prepared realizations of the discrete model. In (b) and (d), the red line shows the position of the free boundary. In the figures in the right column, the solid curves are the discrete densities (2.4) and the dashed curves are solutions to the continuum limit problem (2.6), and the curves are given by black, red, blue, green, orange and purple in the order of increasing time as indicated by the black arrows. The times shown are (a) t=0,1,2,3,4,5; (b) t=0,5,10,25,50,100; (c) t=0,1,5,10,20,50; and (d) t=0,5,10,20,50,100. In (c) and (d), the shaded regions show 95% confidence bands from the mean discrete curves at each time; the curves in (a) and (b) show no shaded regions as these models have no stochasticity.

    The problems shown in figure 2 use parameter values such that the solution of the continuum limit (2.6) is a good match to the averaged discrete density profiles. In particular, all problems use k = 50, η = 1, s = 1/5 and, for Case Studies 3 and 4, Δt = 10⁻², K = 15 and β = 0.15. The accuracy of the continuum limit is clearly evident in the density profiles in figure 2, where, in each case, the solution of the continuum limit model is visually indistinguishable from averaged data from the discrete model. With proliferation, however, the continuum limit can become inaccurate when k/η is not sufficiently large relative to β, and we use Case Studies 3 and 4 to explore this.

    Figure 3 shows further continuum-discrete comparisons for Case Studies 3 and 4 where we have slowed the mechanical relaxation by taking k=1/5. This choice of k means that D(q) and R(q) are no longer on the same scale and thus the continuum limit is no longer well defined, as explained in the discussion of (1.1), meaning the continuum limit solutions are no longer accurate. In both cases, the solution of the continuum limit model lags behind the averaged data from the discrete model. In electronic supplementary material, §S1, we show the 95% confidence regions for each curve in figure 3, where we find that the solutions have much greater variance compared with the corresponding curves in figure 2 where k=50.


    Figure 3. Examples of inaccurate continuum limits for (a) Case Study 3 and (b) Case Study 4, where both case studies use the same parameters as in figure 2 except with k=1/5 rather than k=50. The solid curves are the discrete densities (2.4) and the dashed curves are solutions to the continuum limit problem (2.6). The arrows show the direction of increasing time. The density profiles are plotted in black, red, blue, green, orange and purple for the respective times (a) t=0,1,10,25,40,75 and (b) t=0,5,25,50,100,250.

    We are interested in developing an equation learning method for learning an improved continuum model for problems like those in figure 3, allowing us to extend beyond the parameter regime where the continuum limit (2.6) is accurate. We demonstrate this in Case Studies 1–4 in §4 where we develop such a method.

    4. Learning accurate continuum limit models

    In this section, we introduce our method for equation learning and demonstrate the method using the four case studies from figures 1–3. Since the equation learning procedure is modular, adding these components into an existing problem is straightforward. All Julia code to reproduce these results is available at https://github.com/DanielVandH/StepwiseEQL.jl. A summary of all the parameters used for each case study is given in table 1.

    Table 1. Parameters used for each case study. The parameters are k, the spring constant; η, the drag coefficient; s, the resting spring length; Δt, the duration of the proliferation interval; β, the intrinsic proliferation rate; K, the carrying capacity density; M, the number of time points; t_1, the initial time; t_M, the final time; n_s, the number of identically prepared realizations; n_k, the number of knots used for averaging over realizations; τ_q, which defines the 100τ_q% and 100(1 − τ_q)% density quantiles; τ_{dL/dt}, which defines the 100τ_{dL/dt}% and 100(1 − τ_{dL/dt})% velocity quantiles; and τ_t, which defines the 100τ_t% and 100(1 − τ_t)% temporal quantiles. Values indicated by a line are not relevant for the corresponding case study. For Case Studies 3 and 4, the label 'a' refers to the accurate continuum limit case, and 'b' refers to the inaccurate continuum limit case. For Case Study 4, some parameters are given by a set of four values, with the ith value of this set referring to the value used when learning the ith mechanism; see §4d for details.

    case study
    parameter    1     2     3a     3b     4a                  4b
    k            50    50    50     1/5    50                  1/5
    η            1     1     1      1      1                   1
    s            1/5   1/5   1/5    1/5    1/5                 1/5
    Δt           –     –     10⁻²   10⁻²   10⁻²                10⁻²
    β            –     –     0.15   0.15   0.15                0.15
    K            –     –     15     15     15                  15
    M            50    150   501    751    (25, 50, 100, 250)  (20, 200, 200, 200)
    t_1          0     0     0      0      (0, 0, 5, 10)       (0, 2, 10, 20)
    t_M          5     15    50     75     (101, 5, 10, 50)    (2, 10, 20, 50)
    n_s          –     –     1000   1000   1000                1000
    n_k          –     –     50     200    (25, 50, 100, 50)   (50, 100, 100, 100)
    τ_q          0.1   0.35  0.1    0.25   (0.1, 0, 0, 0)      (0, 0, 0, 0.3)
    τ_{dL/dt}    –     0.1   –      –      (0, 0.2, 0, 0)      (0, 0.4, 0, 0)
    τ_t          0     0     0      0      0                   (0.4, 0.4, 0, 0)

    (a) Case Study 1: fixed boundaries

    Case Study 1 involves mechanical relaxation only so that there is no cell proliferation and the boundaries are fixed, implying R(q)=0 and H(q)=0 in (2.6), respectively, and the only function to learn is D(q).

    Our equation learning approach starts by assuming that D(q) is a linear combination of d basis functions {φ_1, …, φ_d} with coefficients {θ_1, …, θ_d}, meaning D(q) can be represented as

    D(q) = Σ_{i=1}^{d} θ_i φ_i(q).   (4.1)
    These basis functions could be any univariate function of q; for example, the basis could be {φ_1, φ_2, φ_3} = {1/q, 1/q², 1/q³} with d = 3. In this work, we impose the constraint that D(q) ≥ 0 for q_min ≤ q ≤ q_max, where q_min and q_max are the minimum and maximum densities observed in the discrete simulations, respectively. This constraint enforces the condition that the nonlinear diffusivity function is positive over the density interval of interest. While it is possible to work with some choices of nonlinear diffusivity functions for which D(q) < 0 for some interval of density [52–54], we wish to avoid the possibility of having negative nonlinear diffusivity functions, and our results support this approach.

    The aim is to estimate θ = (θ_1, …, θ_d)ᵀ in (4.1). We use ideas similar to the basis function approach from VandenHeuvel et al. [42], using (4.1) to construct a matrix problem for θ. In particular, let us take the PDE (2.6), with R(q) = 0 and H(q) = 0, and expand the spatial derivative term so that we can isolate the θ_k terms,

    ∂q_{ij}/∂t = Σ_{k=1}^{d} { (dφ_k(q_{ij})/dq)(∂q_{ij}/∂x)² + φ_k(q_{ij}) ∂²q_{ij}/∂x² } θ_k,   (4.2)
    where we let q_{ij} denote the discrete density at position x_{ij} = x_i(t_j) and time t_j. We note that while q_{ij} is discrete, we assume it can be approximated by a smooth function, allowing us to define the derivatives ∂q_{ij}/∂t, ∂q_{ij}/∂x and ∂²q_{ij}/∂x² in (4.2); this assumption is appropriate since, as shown in figure 2, these discrete densities can be well approximated by smooth functions. These derivatives are estimated using finite differences, as described in electronic supplementary material, §S4. We also emphasize that, while (4.2) appears similar to results in [37,38], the crucial difference is that we are specifying forms for the mechanisms of the PDE rather than the complete PDE itself; one other important difference is in how we estimate θ, defined below and in (4.7). We save the solution to the discrete problems (2.1)–(2.2) at M times 0 = t_1 < t_2 < ⋯ < t_M so that i ∈ {1, …, n} and j ∈ {2, …, M}, where n = 60 is the number of nodes and we do not deal with data at j = 1 since the PDE does not apply at t = 0. We can therefore convert (4.2) into a rectangular matrix problem Aθ = b, where the rth row in A, r = 1, …, n(M − 1), corresponding to the point (x_{ij}, t_j) is given by a_{ij} ∈ R^{1×d}, where

    a_{ij} = [ (dφ_1(q_{ij})/dq)(∂q_{ij}/∂x)² + φ_1(q_{ij}) ∂²q_{ij}/∂x², …, (dφ_d(q_{ij})/dq)(∂q_{ij}/∂x)² + φ_d(q_{ij}) ∂²q_{ij}/∂x² ],   (4.3)

    with each element of a_{ij} corresponding to the contribution of the associated basis function in (4.2). Thus, we obtain the system

    A = [a_{12}; a_{22}; …; a_{nM}] ∈ R^{n(M−1)×d}   and   b = [∂q_{12}/∂t, ∂q_{22}/∂t, …, ∂q_{nM}/∂t]ᵀ ∈ R^{n(M−1)×1}.   (4.4)
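As an illustration, the system (4.3)–(4.4) can be assembled as follows, assuming a uniform space–time grid with derivatives estimated by numpy.gradient; the actual derivative estimates are described in electronic supplementary material, §S4, so those details are assumptions here, and the sketch is in Python rather than Julia.

```python
# Illustrative assembly of the linear system A theta = b in (4.3)-(4.4),
# assuming a uniform grid and numpy.gradient for the derivative estimates
# (the paper's estimates are described in its supplementary material, S4).
import numpy as np

def build_system(q, x, t, basis, dbasis):
    """q: (n, M) densities at nodes x and times t (both uniformly spaced).
    basis/dbasis: lists of callables phi_k(q) and dphi_k/dq."""
    n, M = q.shape
    dx, dt = x[1] - x[0], t[1] - t[0]
    qx = np.gradient(q, dx, axis=0)      # dq/dx
    qxx = np.gradient(qx, dx, axis=0)    # d^2 q/dx^2
    qt = np.gradient(q, dt, axis=1)      # dq/dt
    rows, rhs = [], []
    for j in range(1, M):                # skip t = t_1, as in the text
        for i in range(n):
            rows.append([dphi(q[i, j]) * qx[i, j] ** 2 + phi(q[i, j]) * qxx[i, j]
                         for phi, dphi in zip(basis, dbasis)])
            rhs.append(qt[i, j])
    return np.array(rows), np.array(rhs)
```

Solving the resulting least-squares problem, e.g. with np.linalg.lstsq, gives the unconstrained estimate θ = (AᵀA)⁻¹Aᵀb used as a starting point below.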
    The solution of Aθ = b, given by θ = (AᵀA)⁻¹Aᵀb, is obtained by minimizing the residual ‖Aθ − b‖₂², which keeps all terms present in the learned model. We expect, however, just as in (2.7), that θ is sparse so that D(q) has very few terms, which makes the interpretation of these terms feasible [37,38]. There are several ways that we could solve Aθ = b to obtain a sparse vector, such as with sparse regression [37], but in this work we take a stepwise equation learning approach inspired by stepwise regression [44], as this helps with both the exposition and modularity of our approach. For this approach, we first let I = {1, …, d} be the set of basis function indices. We let A_k denote the set of active coefficients at the kth iteration, meaning the indices of non-zero values in θ, starting with A_1 = I. The set of indices of zero values in θ, I_k = I ∖ A_k, is called the set of inactive coefficients. To obtain the next set, A_{k+1}, from a current set A_k, we apply the following steps:
    (i)

    Let the vector θ_A denote the solution to Aθ = b subject to the constraint that each inactive coefficient is zero, meaning θ_i = 0 for i ∈ I ∖ A for a given set of active coefficients A. We compute θ_A by solving the reduced problem in which the inactive columns of A are not included. The vector with A = A_k at step k is denoted θ_k. With this definition, we compute the sets

    M_k^+ = {θ_{A_k ∖ {i}} : i ∈ A_k}   and   M_k^− = {θ_{A_k ∪ {i}} : i ∈ I_k}.   (4.5)
    M_k^+ is the set of all coefficient vectors θ obtained by making each active coefficient at step k inactive, one at a time. M_k^− is similar to M_k^+, except we make each inactive coefficient at step k active, one at a time. We then define M_k = {θ_k} ∪ M_k^+ ∪ M_k^−, so that M_k is the set of all coefficient vectors obtained from activating coefficients one at a time, deactivating coefficients one at a time, or retaining the current vector θ_k.

    (ii)

    Choose one of the vectors in Mk by defining a loss function L(θ),

    L(θ) = log[ (1/(n(M − 1))) Σ_{j=2}^{M} Σ_{i=1}^{n} ( (q_{ij} − q(x_{ij}, t_j; θ))/q_{ij} )² ] + ‖θ‖₀,   (4.6)
    where q(x, t; θ) is the solution of the PDE (2.6) with R(q) = H(q) = 0 and with D(q) given by (4.1) using the coefficients θ, q(x_{ij}, t_j; θ) is the linear interpolant of the PDE data at t = t_j evaluated at x = x_{ij}, and ‖θ‖₀ is the number of non-zero terms in θ. The logarithmic term in (4.6) measures the goodness of fit and ‖θ‖₀ measures model complexity, so this loss function balances the goodness of fit with model complexity. If, for some θ, D(q) < 0 within q_min ≤ q ≤ q_max, which we check by evaluating D(q) at n_c = 100 equally spaced points in q_min ≤ q ≤ q_max, we set L(θ) = ∞. With this loss function, we compute the next coefficient vector
    θ_{k+1} = argmin_{θ ∈ M_k} L(θ).   (4.7)
    If θ_{k+1} = 0, so that all the coefficients are inactive, we instead take the vector that attains the second-smallest loss so that a model with no terms cannot be selected.

    (iii)

    If θ_{k+1} = θ_k, then there are no more local improvements to be made and so the procedure stops. Otherwise, we recompute A_{k+1} and I_{k+1} from θ_{k+1} and continue iterating.

    The second step prevents empty models from being returned, allowing the algorithm to more easily find an optimal model when starting with no active coefficients. We note that Nardini et al. [32] consider a loss based on the regression error, ‖Aθ − b‖₂², that has been useful for a range of previously considered problems [32,37,38]. We do not consider the regression error in this work as we find that it typically leads to poorer estimates for θ compared with controlling the density errors as we do in (4.7).
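The steps (i)–(iii) can be sketched as follows. In this Python illustration, the expensive PDE-based loss (4.6) is replaced by a user-supplied callable, and the complexity penalty is taken as the size of the active set; both are simplifying assumptions for brevity, not the implementation used in this work.

```python
# Schematic Python version of steps (i)-(iii). The PDE-based loss (4.6) is
# replaced by a user-supplied callable `fit_loss`, and the complexity penalty
# is the size of the active set; both are simplifying assumptions.
import numpy as np

def solve_subset(A, b, active):
    """Least-squares solution of A theta = b with inactive coefficients at zero."""
    theta = np.zeros(A.shape[1])
    idx = sorted(active)
    theta[idx] = np.linalg.lstsq(A[:, idx], b, rcond=None)[0]
    return theta

def stepwise(A, b, fit_loss, max_iter=50):
    d = A.shape[1]
    active = set(range(d))
    loss = lambda th, act: fit_loss(th) + len(act)   # goodness of fit + complexity
    theta = solve_subset(A, b, active)
    for _ in range(max_iter):
        # the current vector plus every single activation or deactivation
        candidates = [(loss(theta, active), theta, active)]
        for i in range(d):
            new = active ^ {i}          # drop i if active, add it if inactive
            if new:                     # never propose the empty model
                th = solve_subset(A, b, new)
                candidates.append((loss(th, new), th, new))
        best = min(candidates, key=lambda c: c[0])
        if best[2] == active:           # no local improvement: stop
            return best[1]
        _, theta, active = best
    return theta
```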

    Let us now apply the procedure to our data from figure 2, where we know that the continuum limit with D(q) = 50/q² is accurate. We use the basis functions φ_i = 1/qⁱ for i = 1, 2, 3 so that

    D(q) = θ_1/q + θ_2/q² + θ_3/q³,   (4.8)

    and we expect to learn θ = (0, 50, 0)ᵀ. We save the solution to the discrete model at M = 50 equally spaced time points between t_1 = 0 and t_M = 5. With this set-up, and starting with all coefficients initially active so that A_1 = {1, 2, 3}, we obtain the results in table 2. The first iteration gives us θ_1 such that D(q) < 0 for some range of q, as we show in figure 4a, and so we assign L(θ_1) = ∞. To get to the next step, we remove θ_1, θ_2 and θ_3 one at a time and compute the loss for each resulting vector, and we find that removing θ_3 leads to the vector that gives the least loss out of those considered. We thus find A_2 = {1, 2} and θ_2 = (1.46, 47.11, 0)ᵀ. Continuing, we find that, out of the choice of removing θ_1 or θ_2, or putting θ_3 back into the model, removing θ_1 decreases the loss by the greatest amount, giving A_3 = {2}. Finally, we find that there are no more improvements to be made, and so the algorithm stops at θ_3 = (0, 43.52, 0)ᵀ, which is close to the continuum limit. We emphasize that this final θ_3 is a least-squares solution with the constraint θ_1 = θ_3 = 0, thus there is no need to refine θ_3 further by eliminating θ_1 and θ_3 directly in (4.8), as the result would be the same. Comparing the densities from the solution of the learned PDE with θ = θ_3 with the discrete densities in figure 5a, we see that the curves are nearly visually indistinguishable near the centre, but there are some visually discernible discrepancies near the boundaries. We show the form of D(q) at each iteration in figure 4a, where we observe that the first iteration captures only the higher densities, the second iteration captures the complete range of densities, and the third iteration removes a single term with no noticeable difference.
    Figure 4. Progression of D(q) over each iteration for Case Study 1: fixed boundaries. (a) Progression from the results in table 2 (dashed curves). (b) As in (a), except with the results from table 3 using matrix pruning.

    Figure 5. Stepwise equation learning results for Case Study 1: fixed boundaries. (a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in table 2 (dashed curves). (b) As in (a), except with the results from table 3 using matrix pruning so that densities outside of the 10% and 90% density quantiles are not included. (c) Comparisons of the learned $D(q)$ from table 2 without pruning, table 3 with pruning, and the continuum limit from (2.7). In (a,b), the arrows show the direction of increasing time, and the density profiles shown are at times $t = 0, 1, 2, 3, 4, 5$ in black, red, blue, green, orange and purple, respectively.

    Table 2. Stepwise equation learning results for the density data for Case Study 1: fixed boundaries using the basis expansion (4.8), saving the results at M=50 equally spaced times between t1=0 and tM=5 and starting with all coefficients active, A1={1,2,3}. Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

    step   θ1      θ2      θ3      loss
    1      −5.97   70.73   27.06   ∞
    2      1.46    47.11   0.00    −4.33
    3      0.00    43.52   0.00    −5.18

    Table 3. Improved results for Case Study 1: fixed boundaries from table 2, now using matrix pruning so that densities outside of the 10% and 90% density quantiles are not included. Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

    step θ1 θ2 θ3 loss
    1 1.45 42.48 13.76 4.19
    2 0.00 37.79 19.69 5.46
    3 0.00 49.83 0.00 7.97

    To improve our learned model, we introduce matrix pruning, inspired by the data thresholding approach in VandenHeuvel et al. [42], to improve the estimates for $\theta$. Visual inspection of the space–time diagram in figure 2a shows that the most significant density changes occur at early times and near locations where $q$ changes in the initial condition, and a significant portion of the space–time diagram involves regions where $q$ is almost constant. These regions where $q$ has minimal change are problematic, as the points that lead to a higher residual are overshadowed, affecting the least-squares problem and consequently degrading the estimates for $\theta$ significantly; it is therefore useful to include only the most informative points in the construction of $A$. To resolve this issue, we choose to include points only if the associated densities fall between the 10% and 90% quantiles of the complete set of densities, which we refer to as density quantiles; more details on this pruning procedure are given in electronic supplementary material, §S4. This choice of density quantiles is made by trial and error, starting at 0% and 100%, respectively, and shrinking the quantile range until suitable results are obtained. When we apply this pruning and reconstruct $A$, we obtain the improved results in table 3 and associated densities in figure 5b. Compared with table 2, we see that the coefficient estimates for $\theta$ all lead to improved losses, and our final model now has $\theta = (0, 49.83, 0)^T$, which is much closer to the continuum limit, as we see in figure 5b, where the solution curves are now visually indistinguishable everywhere. Moreover, we show in figure 4b how $D(q)$ is updated at each iteration, where we see that the learned nonlinear diffusivity functions are barely different from the expected continuum limit result. These results demonstrate the importance of including only the most important points in $A$.
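The quantile-based pruning described above amounts to a simple row filter on the least-squares system. A minimal sketch, assuming each row of $(A, b)$ carries an associated density value:

```python
import numpy as np

def prune_by_density(A, b, q, lo=0.10, hi=0.90):
    """Keep only the rows of (A, b) whose associated density value q
    falls between the lo and hi quantiles of the complete set of
    densities; rows from flat, nearly constant regions are dropped."""
    q = np.asarray(q)
    q_lo, q_hi = np.quantile(q, [lo, hi])
    keep = (q >= q_lo) & (q <= q_hi)
    return A[keep], b[keep], keep
```

Shrinking the `(lo, hi)` window corresponds to the trial-and-error tightening of the quantile range described in the text.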

    (b) Case Study 2: free boundaries

    Case Study 2 extends Case Study 1 by allowing the right-most cell boundary to move, so that $H(q) \neq 0$. We do not consider proliferation, giving $R(q) = 0$ in (2.6).

    The equation learning procedure for this case study is similar to Case Study 1, namely we expand $D(q)$ as in (4.1) and constrain $D(q) \geq 0$. In addition to learning $D(q)$, we need to learn $H(q)$ and the evolution equation describing the free boundary. In (2.6), this evolution equation is given by a conservation statement, $q\,\mathrm{d}L/\mathrm{d}t = -D(q)\,\partial q/\partial x$ with $q = q(L(t), t)$. Here, we treat this moving boundary condition more generally by introducing a function $E(q)$ so that

    $$q\frac{\mathrm{d}L}{\mathrm{d}t} = -E(q)\frac{\partial q}{\partial x}, \tag{4.9}$$
    at $x = L(t)$ for $t > 0$. While (4.9) could lead to a local loss of conservation at the moving boundary, our approach is to allow for the possibility that the coefficients in $D(q)$ and $E(q)$ differ, and to explore the extent to which this is true, or otherwise, according to our equation learning procedure. We constrain $E(q) \geq 0$ so that (4.9) makes sense for our problem, and we expand $D(q)$, $H(q)$ and $E(q)$ as follows:
    $$D(q) = \sum_{i=1}^{d}\theta_i^d\varphi_i^d(q), \quad H(q) = \sum_{i=1}^{h}\theta_i^h\varphi_i^h(q), \quad E(q) = \sum_{i=1}^{e}\theta_i^e\varphi_i^e(q). \tag{4.10}$$
    The matrix system for $\theta^d = (\theta_1^d, \ldots, \theta_d^d)^T$ is the same as it was in Case Study 1 in (4.4), which we now write as $A^d\theta^d = b^d$, with $A^d \in \mathbb{R}^{n(M-1)\times d}$ and $b^d \in \mathbb{R}^{n(M-1)\times 1}$ given by $A$ and $b$ in (4.4), and we can construct two other independent matrix systems for $\theta^h = (\theta_1^h, \ldots, \theta_h^h)^T$ and $\theta^e = (\theta_1^e, \ldots, \theta_e^e)^T$. To construct these matrix systems, for a given boundary point $(x_n^j, t_j)$, we write
    $$\frac{\partial q_n^j}{\partial x} = \sum_{k=1}^{h}\theta_k^h\varphi_k^h(q_n^j) \quad\text{and}\quad q_n^j\frac{\mathrm{d}L_j}{\mathrm{d}t} = -\frac{\partial q_n^j}{\partial x}\sum_{k=1}^{e}\theta_k^e\varphi_k^e(q_n^j), \tag{4.11}$$
    where $L_j = x_n^j$ is the position of the leading edge at $t = t_j$. In (4.11), we assume that $L_j$ can be approximated by a smooth function so that $\mathrm{d}L_j/\mathrm{d}t$ can be defined. With (4.11) we have $A^h\theta^h = b^h$ and $A^e\theta^e = b^e$, where
    $$A^h = \begin{bmatrix} \varphi_1^h(q_n^2) & \cdots & \varphi_h^h(q_n^2) \\ \vdots & & \vdots \\ \varphi_1^h(q_n^M) & \cdots & \varphi_h^h(q_n^M) \end{bmatrix}, \quad b^h = \begin{bmatrix} \partial q_n^2/\partial x \\ \vdots \\ \partial q_n^M/\partial x \end{bmatrix}, \tag{4.12}$$
    with $A^h \in \mathbb{R}^{(M-1)\times h}$ and $b^h \in \mathbb{R}^{(M-1)\times 1}$, and
    $$A^e = \begin{bmatrix} -\varphi_1^e(q_n^2)\,\dfrac{\partial q_n^2}{\partial x} & \cdots & -\varphi_e^e(q_n^2)\,\dfrac{\partial q_n^2}{\partial x} \\ \vdots & & \vdots \\ -\varphi_1^e(q_n^M)\,\dfrac{\partial q_n^M}{\partial x} & \cdots & -\varphi_e^e(q_n^M)\,\dfrac{\partial q_n^M}{\partial x} \end{bmatrix}, \quad b^e = \begin{bmatrix} q_n^2\,\mathrm{d}L_2/\mathrm{d}t \\ \vdots \\ q_n^M\,\mathrm{d}L_M/\mathrm{d}t \end{bmatrix}, \tag{4.13}$$
    with $A^e \in \mathbb{R}^{(M-1)\times e}$ and $b^e \in \mathbb{R}^{(M-1)\times 1}$. Then, writing
    $$A = \operatorname{diag}(A^d, A^h, A^e) \in \mathbb{R}^{(n+2)(M-1)\times(d+h+e)}, \quad b = \begin{bmatrix} b^d \\ b^h \\ b^e \end{bmatrix} \in \mathbb{R}^{(n+2)(M-1)\times 1}, \tag{4.14}$$
    we obtain
    $$A\theta = b, \quad \theta = \begin{bmatrix} \theta^d \\ \theta^h \\ \theta^e \end{bmatrix} \in \mathbb{R}^{(d+h+e)\times 1}. \tag{4.15}$$
    The solution of $A\theta = b$ is the combined solution of the individual linear systems, as $A$ is block diagonal. Estimates for $\theta^d$, $\theta^h$ and $\theta^e$ are therefore independent, which demonstrates the modularity of our approach: these additional features, in particular the leading edge, are simply extra independent components of our procedure, in addition to the procedure for estimating $D(q)$.
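The independence claim can be checked concretely: solving the stacked block-diagonal system gives exactly the same coefficients as solving each block on its own. The following sketch (our own illustration, not the authors' code) assembles the block-diagonal matrix by hand so it needs only numpy:

```python
import numpy as np

def block_diag(*blocks):
    """Assemble diag(A_d, A_h, A_e) with zeros in the off-diagonal blocks."""
    out = np.zeros((sum(B.shape[0] for B in blocks),
                    sum(B.shape[1] for B in blocks)))
    r = c = 0
    for B in blocks:
        out[r:r + B.shape[0], c:c + B.shape[1]] = B
        r, c = r + B.shape[0], c + B.shape[1]
    return out

def lstsq(A, b):
    """Least-squares solution of A theta = b."""
    return np.linalg.lstsq(A, b, rcond=None)[0]
```

Because the off-diagonal blocks are zero, the normal equations decouple, so a new mechanism (here, the leading edge) can be bolted on without touching the existing blocks.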

    In addition to the new matrix system Aθ=b in (4.15), we augment the loss function (4.6) to incorporate information about the location of the moving boundary. Letting L(t;θ) denote the leading edge from the solution of the PDE (2.6) with parameters θ, the loss function is

    $$\mathcal{L}(\theta) = \underbrace{\log\left[\frac{1}{n(M-1)}\sum_{j=2}^{M}\sum_{i=1}^{n}\left(\frac{q_i^j - q(x_i^j, t_j; \theta)}{q_i^j}\right)^2\right]}_{\text{density goodness of fit}} + \underbrace{\log\left[\frac{1}{M-1}\sum_{j=2}^{M}\left(\frac{L_j - L(t_j; \theta)}{L_j}\right)^2\right]}_{\text{leading edge goodness of fit}} + \underbrace{\|\theta\|_0}_{\text{model complexity}}. \tag{4.16}$$
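A direct transcription of this loss, assuming the candidate PDE model has already been solved so that its densities and leading edge are available as arrays, might look as follows (a sketch under those assumptions, not the authors' implementation):

```python
import numpy as np

def loss(theta, q_data, q_model, L_data, L_model):
    """Loss (4.16): log relative density error, plus log relative
    leading-edge error, plus an l0 penalty counting active coefficients."""
    density_fit = np.log(np.mean(((q_data - q_model) / q_data) ** 2))
    edge_fit = np.log(np.mean(((L_data - L_model) / L_data) ** 2))
    complexity = np.count_nonzero(theta)
    return density_fit + edge_fit + complexity
```

The logs put both goodness-of-fit terms on a scale comparable with the integer complexity penalty, so adding a coefficient must reduce the relative error by a fixed multiplicative factor to pay for itself.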

    Let us now apply our stepwise equation learning procedure with (4.15) and (4.16). We consider the data from figure 2, where we know in advance that the continuum limit with $D(q) = 50/q^2$, $H(q) = 2q^2 - 0.4q^3$ and $E(q) = 50/q^2$ is accurate. The expansions we use for $D(q)$, $H(q)$ and $E(q)$ are given by

    $$D(q) = \frac{\theta_1^d}{q} + \frac{\theta_2^d}{q^2} + \frac{\theta_3^d}{q^3}, \quad H(q) = \theta_1^h q + \theta_2^h q^2 + \theta_3^h q^3 + \theta_4^h q^4 + \theta_5^h q^5 \quad\text{and}\quad E(q) = \frac{\theta_1^e}{q} + \frac{\theta_2^e}{q^2} + \frac{\theta_3^e}{q^3}. \tag{4.17}$$
    With these expansions, we expect to learn $\theta^d = (0, 50, 0)^T$, $\theta^h = (0, 2, -0.4, 0, 0)^T$ and $\theta^e = (0, 50, 0)^T$. We initially consider saving the solution at $M = 1000$ equally spaced times between $t_1 = 0$ and $t_M = 100$, using matrix pruning so that only points whose densities fall within the 35% and 65% density quantiles are included. The results with this configuration are shown in table 4, where we see that we are only able to learn $H(q) = E(q) = 0$ and $D(q) = 25.06/q^3$. This outcome highlights the importance of choosing an appropriate time interval: figure 2b indicates that mechanical relaxation takes place over a relatively short interval, which means that working with data in $0 < t \leq 100$ can lead to a poor outcome.
    Figure 6. Stepwise equation learning results from table 5 for Case Study 2: free boundaries. (a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in table 5 (dashed curves), plotted at the times $t = 0, 5, 10, 25, 50, 100$ in black, red, blue, green, orange and purple, respectively. The arrow shows the direction of increasing time. (b) As in (a), except comparing the leading edges. Panels (c–e) are comparisons of the learned forms of $D(q)$, $H(q)$ and $E(q)$ with the forms from the continuum limit (2.7).

    Table 4. Stepwise equation learning results for Case Study 2: free boundaries, using the basis expansions (4.17), saving the results at M=1000 equally spaced times between t1=0 and tM=100, pruning so that densities outside of the 35% and 65% density quantiles are not included, and starting with all terms inactive. Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

    step   θ1d    θ2d    θ3d     θ1h    θ2h    θ3h    θ4h    θ5h    θ1e    θ2e    θ3e    loss
    1      0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   1.40
    2      0.00   0.00   25.06   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.40

    We proceed by restricting our data collection to $0 \leq t \leq 15$, now saving the solution at $M = 200$ equally spaced times between $t_1 = 0$ and $t_M = 15$. Keeping the same quantiles for the matrix pruning, the new results are shown in table 5 and figure 6. We see that the densities and leading edges are accurate for small time, but the learned mechanisms do not extrapolate as well for $t \geq 15$; for example, $L(t)$ in figure 6b does not match the discrete data. To address this issue, we can further limit the information that we include in our matrices, looking to include only boundary points where $\mathrm{d}L/\mathrm{d}t$ is neither too large nor too small. We implement this by excluding from the construction of $(A^e, b^e)$ in (4.13) all points $(x_n^j, t_j)$ such that $\mathrm{d}L_j/\mathrm{d}t$ falls outside of the 10% or 90% quantiles of the vector $(\mathrm{d}L_2/\mathrm{d}t, \ldots, \mathrm{d}L_M/\mathrm{d}t)$, which we call the velocity quantiles.
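The velocity-quantile filter can be sketched in the same spirit as the density pruning. Here `np.gradient` is our stand-in for the smooth approximation of $L(t)$ used to define $\mathrm{d}L_j/\mathrm{d}t$; this is illustrative code, not the authors' implementation:

```python
import numpy as np

def velocity_mask(L, t, lo=0.10, hi=0.90):
    """Boolean mask selecting boundary points whose estimated speed
    dL/dt lies within the [lo, hi] velocity quantiles; points where the
    boundary moves unusually fast or slowly are excluded."""
    dLdt = np.gradient(L, t)          # finite-difference stand-in for dL/dt
    v_lo, v_hi = np.quantile(dLdt, [lo, hi])
    return (dLdt >= v_lo) & (dLdt <= v_hi)
```

For a boundary that relaxes quickly and then saturates, this drops both the very fast early points and the near-stationary late points, keeping the informative middle of the trajectory.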

    Figure 7. Stepwise equation learning results from table 5 for Case Study 2: free boundaries, except also using matrix pruning on $(A^e, b^e)$ so that points where $\mathrm{d}L_j/\mathrm{d}t$ falls outside of the 10% and 90% velocity quantiles are excluded, giving $\theta_1^e = 9.42$ rather than 8.74. (a) Comparisons of the discrete density profiles (solid curves) with those from the learned PDE (dashed curves), plotted at the times $t = 0, 5, 10, 25, 50, 100$ in black, red, blue, green, orange and purple, respectively. The arrow shows the direction of increasing time. (b) As in (a), except comparing the leading edges. Panels (c–e) are comparisons of the learned forms of $D(q)$, $H(q)$ and $E(q)$ with the forms from the continuum limit (2.7).

    Table 5. Stepwise equation learning results for Case Study 2: free boundaries, using the basis expansions (4.17), saving the results at M=200 equally spaced times between t1=0 and tM=15, pruning so that densities outside of the 35% and 65% density quantiles are not included, and starting with all terms inactive. Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

    step   θ1d    θ2d     θ3d    θ1h    θ2h     θ3h    θ4h    θ5h    θ1e    θ2e    θ3e    loss
    1      0.00   0.00    0.00   0.00   0.00    0.00   0.00   0.00   0.00   0.00   0.00   −3.37
    2      0.00   0.00    0.00   0.00   −0.03   0.00   0.00   0.00   0.00   0.00   0.00   −2.37
    3      0.00   0.00    0.00   0.00   −0.03   0.00   0.00   0.00   8.74   0.00   0.00   −3.68
    4      0.00   47.38   0.00   0.00   −0.03   0.00   0.00   0.00   8.74   0.00   0.00   −4.02
    5      0.00   47.38   0.00   8.41   −1.69   0.00   0.00   0.00   8.74   0.00   0.00   −8.14

    Implementing thresholding on $\mathrm{d}L/\mathrm{d}t$ leads to the results presented in figure 7. We see that the learned densities and leading edges are both visually indistinguishable from the discrete data. Since $H(q)$ and $E(q)$ are only ever evaluated at $x = L(t)$, and $q(L(t), t) \approx 5$ for $t > 0$, we see that $H(q)$ and $E(q)$ only match the continuum limit at $q \approx 5$, which means that our learned continuum limit model conserves mass and is consistent with the traditional coarse-grained continuum limit, as expected. We discuss in electronic supplementary material, §S5, how we can enforce $D(q) = E(q)$ to guarantee conservation of mass from the outset; however, our approach in figure 7 is more general in the sense that our learned continuum limit is obtained without making any a priori assumptions about the form of $E(q)$.

    (c) Case Study 3: fixed boundaries with proliferation

    Case Study 3 is identical to Case Study 1 except that we incorporate cell proliferation, implying $R(q) \neq 0$ in (2.6). This case is more complicated than with mechanical relaxation only, as we also have to consider how we combine the repeated realizations to capture the average density data. For this work, we average over the realizations at each time using linear interpolants, as described in electronic supplementary material, §S4. This averaging procedure gives $n_k$ points $\bar{x}_i^j$ between $x = 0$ and $x = 30$ at each time $t_j$, $j = 1, \ldots, M$, with corresponding density values $\bar{q}_i^j$. The quantities $\bar{x}_i^j$ and $\bar{q}_i^j$ play the same role as $x_i^j$ and $q_i^j$ in the previous case studies.
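At a single time point, the interpolant-based averaging can be sketched as follows; `np.interp` is our stand-in for the linear interpolants described in the supplementary material, and the function signature is an illustrative choice:

```python
import numpy as np

def average_realizations(realizations, n_knots, x_min, x_max):
    """Average the density over repeated realizations at one time point:
    each realization's (positions, densities) pair is linearly
    interpolated onto a common grid of n_knots knots, then averaged."""
    knots = np.linspace(x_min, x_max, n_knots)
    interpolated = np.vstack([np.interp(knots, x, q) for x, q in realizations])
    return knots, interpolated.mean(axis=0)
```

Interpolating onto a shared knot grid is what makes the realizations comparable: each realization has its own cell boundary positions, so the raw densities are not defined at common spatial points.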

    To apply equation learning, we note there is no moving boundary, giving H(q)=0 in (2.7). We proceed by expanding D(q) and R(q) as follows:

    $$D(q) = \sum_{i=1}^{d}\theta_i^d\varphi_i^d(q) \quad\text{and}\quad R(q) = \sum_{i=1}^{r}\theta_i^r\varphi_i^r(q), \tag{4.18}$$
    with the aim of estimating $\theta^d = (\theta_1^d, \ldots, \theta_d^d)^T$ and $\theta^r = (\theta_1^r, \ldots, \theta_r^r)^T$, again constraining $D(q) \geq 0$. We expand the PDE as in (4.2) in §4a, and the only difference is the additional term $\sum_{m=1}^{r}\theta_m^r\varphi_m^r(\bar{q}_i^j)$ for each point $(\bar{x}_i^j, t_j)$. Thus, we have the same matrix as in §4a, denoted $A^d \in \mathbb{R}^{n_k(M-1)\times d}$, and a new matrix $A^r \in \mathbb{R}^{n_k(M-1)\times r}$ whose row corresponding to the point $(\bar{x}_i^j, t_j)$ is given by
    $$a_{ij}^r = \left[\varphi_1^r(\bar{q}_i^j) \;\cdots\; \varphi_r^r(\bar{q}_i^j)\right] \in \mathbb{R}^{1\times r}, \tag{4.19}$$
    so that the coefficient matrix A is now
    $$A = \left[A^d \;\; A^r\right] \in \mathbb{R}^{n_k(M-1)\times(d+r)}. \tag{4.20}$$
    The corresponding entry for the point $(\bar{x}_i^j, t_j)$ in $b \in \mathbb{R}^{n_k(M-1)\times 1}$ is $\partial\bar{q}_i^j/\partial t$. Notice that this additional term in the PDE adds an extra block to the matrix without requiring significant coupling with the existing equations from the simpler problem without proliferation. Thus, we estimate our coefficient vectors using the system
    $$A\theta = b \quad\text{and}\quad \theta = \begin{bmatrix}\theta^d \\ \theta^r\end{bmatrix} \in \mathbb{R}^{(d+r)\times 1}. \tag{4.21}$$
    We can apply exactly the same stepwise procedure as in §4a, except that the loss function (4.6) now uses $n_k$, $\bar{q}_i^j$ and $\bar{x}_i^j$ rather than $n$, $q_i^j$ and $x_i^j$, respectively.
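The horizontally concatenated system above can be sketched as follows. The diffusion block is taken as given (here a single hypothetical column), and polynomial reaction basis functions are assumed, in the spirit of the expansions used below; this is an illustration of the system layout, not the authors' code:

```python
import numpy as np

def assemble_with_reaction(A_d, qbar, dqdt, reaction_basis):
    """Append the reaction columns phi_m^r(qbar) to the diffusion block
    A_d, forming A = [A_d  A_r] with right-hand side dq/dt, and solve
    for (theta_d, theta_r) jointly by least squares."""
    A_r = np.column_stack([phi(qbar) for phi in reaction_basis])
    A = np.hstack([A_d, A_r])
    theta, *_ = np.linalg.lstsq(A, dqdt, rcond=None)
    d = A_d.shape[1]
    return theta[:d], theta[d:]
```

Unlike the free boundary blocks, the diffusion and reaction columns share rows, so $\theta^d$ and $\theta^r$ are estimated jointly rather than independently.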

    (i) Accurate continuum limit

    Let us now apply these ideas to our data from figure 2, where we know that the continuum limit with $D(q) = 50/q^2$ and $R(q) = 0.15q - 0.01q^2$ is accurate. The expansions we use for $D(q)$ and $R(q)$ are given by

    $$D(q) = \frac{\theta_1^d}{q} + \frac{\theta_2^d}{q^2} + \frac{\theta_3^d}{q^3} \quad\text{and}\quad R(q) = \theta_1^r q + \theta_2^r q^2 + \theta_3^r q^3 + \theta_4^r q^4 + \theta_5^r q^5, \tag{4.22}$$
    and we expect to learn $\theta^d = (0, 50, 0)^T$ and $\theta^r = (0.15, -0.01, 0, 0, 0)^T$. We average over 1000 identically prepared realizations, saving the solutions at $M = 501$ equally spaced times between $t_1 = 0$ and $t_M = 50$, with $n_k = 50$ knots for averaging. For this problem, and for Case Study 4 discussed later, we find that working with 1000 identically prepared realizations of the stochastic model leads to sufficiently smooth density profiles. As discussed in appendix A, the precise number of identically prepared realizations is not important provided that the number is sufficiently large; when too few realizations are taken, the results are inconsistent across different sets of realizations and fail to identify the average behaviour from the learned model. We also use matrix pruning so that we only include points whose densities fall within the 10% and 90% density quantiles, as in §4a. The results we obtain are shown in table 6, starting with all coefficients active.

    Table 6 shows that we find $\theta^d = (0, 52.97, 0)^T$ and $\theta^r = (0.15, -0.010, 0, 0, 0)^T$, which are both very close to the continuum limit. Figure 8 visualizes these results, showing that the PDE solutions with the learned $D(q)$ and $R(q)$ match the discrete densities, and the mechanisms that we do learn are visually indistinguishable from the continuum limit functions (2.7), as shown in figure 8b,c.

    Figure 8. Stepwise equation learning results for Case Study 3: fixed boundaries with proliferation, where the continuum limit is accurate. (a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in table 6 (dashed curves), plotted at the times t=0,1,5,10,20,50 in black, red, blue, green, orange and purple, respectively. The arrow shows the direction of increasing time. Panels (b)–(c) are comparisons of D(q) and R(q) with the forms from the continuum limit (2.7).

    Table 6. Stepwise equation learning results for Case Study 3: fixed boundaries with proliferation, where the continuum limit is accurate, using the basis expansions (4.22), saving the results at M=501 equally spaced times between t1=0 and tM=50, averaging across 1000 realizations with nk=50 knots, pruning so that densities outside of the 10% and 90% density quantiles are not included, and starting with all diffusion and reaction coefficients active. Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

    step θ1d θ2d θ3d θ1r θ2r θ3r (×10⁻⁴) θ4r (×10⁻⁵) θ5r (×10⁻⁷) loss
    1 −11.66 147.43 191.51 0.13 −0.00 −0.00 5.83 −11.30
    2 −2.24 60.86 0.00 0.13 −0.00 5.72 2.62 −3.49 −0.71
    3 2.25 60.90 0.00 0.14 −0.01 0.00 −1.25 5.95 −1.92
    4 0.00 52.95 0.00 0.14 −0.01 0.00 1.36 6.49 −3.35
    5 0.00 53.02 0.00 0.15 −0.01 0.00 0.00 0.32 −4.98
    6 0.00 52.97 0.00 0.15 −0.01 0.00 0.00 0.00 −5.70

    (ii) Inaccurate continuum limit

    We now extend the problem so that the continuum limit is no longer accurate, taking $k = 1/5$ to be consistent with figure 3a. Using the same basis expansions in (4.22), we save the solution at $M = 751$ equally spaced times between $t_1 = 0$ and $t_M = 75$, averaging over 1000 realizations with $n_k = 200$ knots. We find that we need to use the 25% and 75% density quantiles, rather than the 10% and 90% density quantiles as in the previous example, to obtain results in this case. With this configuration, the results we find are shown in table 7 and figure 9.

    Figure 9. Stepwise equation learning results for Case Study 3: fixed boundaries with proliferation, where the continuum limit is inaccurate. (a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs obtained from the results in table 7 (dashed curves), plotted at the times $t = 0, 1, 10, 25, 40, 75$ in black, red, blue, green, orange and purple, respectively. The arrow shows the direction of increasing time. Panels (b)–(c) are comparisons of $D(q)$ and $R(q)$ with the forms from the continuum limit (2.7).

    Table 7. Stepwise equation learning results for Case Study 3: fixed boundaries with proliferation, where the continuum limit is inaccurate, using the basis expansions (4.22), saving the results at M=751 equally spaced times between t1=0 and tM=75, averaging across 1000 realizations with nk=200 knots, pruning so that densities outside of the 25% and 75% density quantiles are not included, and starting with all diffusion and reaction coefficients inactive. Coefficients highlighted in blue show the coefficient chosen to be removed or added at the corresponding step.

    step θ1d θ2d θ3d θ1r θ2r θ3r (×10⁻⁴) θ4r (×10⁻⁵) θ5r loss
    1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 −0.33
    2 0.00 0.00 0.00 0.02 0.00 0.00 0.00 0.00 0.51
    3 0.00 0.00 0.00 0.11 −0.01 0.00 0.00 0.00 0.20
    4 0.00 0.11 0.00 0.11 −0.01 0.00 0.00 0.00 −0.04
    5 0.00 0.12 0.00 0.13 −0.01 1.59 0.00 0.00 −0.46
    6 0.00 0.12 0.00 0.16 −0.02 7.49 −1.69 0.00 −1.13

    Results in table 7 show $\theta^d = (0, 0.12, 0)^T$, which is reasonably close to the continuum limit value of $(0, 0.2, 0)^T$. The reaction vector, for which the continuum limit is $(0.15, -0.01, 0, 0, 0)^T$ so that $R(q)$ is a quadratic, is now given by $\theta^r = (0.16, -0.02, 7.49\times 10^{-4}, -1.69\times 10^{-5}, 0)^T$, meaning the learned $R(q)$ is a quartic. Figure 9 compares the averaged discrete densities with the solution of the learned continuum limit model, and figure 9c compares the learned source term with the continuum limit. While the two source terms are visually indistinguishable at small densities, they differ at high densities, with the learned carrying capacity density, where $R(q) = 0$, reduced relative to the continuum limit. This is consistent with previous results [15].

    (d) Case Study 4: free boundaries with proliferation

    Case Study 4 is identical to Case Study 2 except that we now introduce proliferation into the discrete model, so that $R(q) \neq 0$ in (2.6). First, as in Case Study 3 and as described in electronic supplementary material, §S4, we average our data across the realizations of our discrete model. This averaging provides us with points $\bar{x}_i^j$ between $x = 0$ and $x = \bar{L}_j$ at each time $t_j$, $j = 1, \ldots, M$, where $\bar{L}_j$ is the average leading edge at $t = t_j$, with corresponding density values $\bar{q}_i^j$, where $i = 1, \ldots, n_k$ and $n_k$ is the number of knots used for averaging. We expand the functions $D(q)$, $R(q)$, $H(q)$ and $E(q)$ as

    $$D(q) = \sum_{i=1}^{d}\theta_i^d\varphi_i^d(q), \quad R(q) = \sum_{i=1}^{r}\theta_i^r\varphi_i^r(q), \quad H(q) = \sum_{i=1}^{h}\theta_i^h\varphi_i^h(q), \quad E(q) = \sum_{i=1}^{e}\theta_i^e\varphi_i^e(q), \tag{4.23}$$
    again restricting $D(q), E(q) \geq 0$. The function $E(q)$ is used in the moving boundary condition in (2.6), as in (4.9). The matrix $A$ and vector $b$ are given by
    $$A = \operatorname{diag}(A^{dr}, A^h, A^e) \in \mathbb{R}^{(n_k+2)(M-1)\times(d+r+h+e)} \quad\text{and}\quad b = \begin{bmatrix} b^{dr} \\ b^h \\ b^e \end{bmatrix} \in \mathbb{R}^{(n_k+2)(M-1)\times 1}, \tag{4.24}$$
    where $A^{dr} = [A^d \;\; A^r]$ as defined in (4.20), $A^h$ and $A^e$ are the matrices from (4.12) and (4.13), respectively, and similarly for $b^{dr}$ (with entries $\partial\bar{q}_i^j/\partial t$), $b^h$ and $b^e$ from (4.4), (4.12) and (4.13), respectively. Thus,
    $$A\theta = b \quad\text{and}\quad \theta = \begin{bmatrix}\theta^d \\ \theta^r \\ \theta^h \\ \theta^e\end{bmatrix} \in \mathbb{R}^{(d+r+h+e)\times 1}. \tag{4.25}$$
    Similar to Case Study 2, the coefficients for each mechanism are independent, except for $\theta^d$ and $\theta^r$. The loss function we use is (4.16).

    With this problem, it is difficult to learn all mechanisms simultaneously, especially as mechanical relaxation and proliferation occur on different time scales: mechanical relaxation dominates the early part of the simulation, whereas both proliferation and mechanical relaxation play a role at later times. This means $D(q)$ and $R(q)$ cannot be estimated over the entire time range as was done in Case Study 3. To address this, we take a sequential approach to learning these four mechanisms, using four distinct time intervals $I_d$, $I_e$, $I_h$ and $I_r$:

    (i)

    Fix $R(q) = H(q) = E(q) = 0$ and learn $\theta^d$ over $t \in I_d$, solving $A^d\theta^d = b^{dr}$.

    (ii)

    Fix $R(q) = H(q) = 0$ and $\theta^d$, and learn $\theta^e$ over $t \in I_e$, solving $A^e\theta^e = b^e$.

    (iii)

    Fix $R(q) = 0$, $\theta^d$ and $\theta^e$, and learn $\theta^h$ over $t \in I_h$, solving $A^h\theta^h = b^h$.

    (iv)

    Fix $\theta^d$, $\theta^e$ and $\theta^h$, and learn $\theta^r$ over $t \in I_r$, solving $A^r\theta^r = b^{dr} - A^d\theta^d$.

    In these steps, solving a system $A\theta = b$ means applying our stepwise procedure to that system; for these problems, we start each procedure with no active coefficients. The modularity of our approach makes this sequential learning approach straightforward to implement. For these steps, the interval $I_d$ must cover sufficiently small times so that proliferation does not dominate, noting that fixing $R(q) = 0$ will not allow us to identify any proliferation effects when estimating the parameters. This is less relevant for $I_h$ and $I_e$, as the estimates of $\theta^h$ and $\theta^e$ impact the moving boundary only.
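The four steps above can be sketched as follows, with plain least squares standing in for the full stepwise procedure applied at each stage; the dictionary keys and system layout are our own illustrative choices, under the assumption that each `(A, b)` pair has already been restricted to its time interval:

```python
import numpy as np

def lstsq(A, b):
    """Least-squares solution of A theta = b."""
    return np.linalg.lstsq(A, b, rcond=None)[0]

def sequential_learn(systems):
    """Learn the four mechanisms one at a time, each on its own time
    interval, holding previously learned coefficients fixed."""
    theta = {}
    theta["d"] = lstsq(*systems["d"])                  # (i)  diffusion on I_d
    theta["e"] = lstsq(*systems["e"])                  # (ii) boundary speed on I_e
    theta["h"] = lstsq(*systems["h"])                  # (iii) boundary gradient on I_h
    (A_d, A_r), b_dr = systems["dr"]                   # (iv) reaction on I_r, with
    theta["r"] = lstsq(A_r, b_dr - A_d @ theta["d"])   #      diffusion part subtracted
    return theta
```

Step (iv) illustrates the only coupling in the scheme: the already learned diffusion contribution $A^d\theta^d$ is moved to the right-hand side before the reaction coefficients are fitted.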

    (e) Accurate continuum limit

    We apply this procedure to data from figure 2, where the continuum limit is accurate, with $D(q) = 50/q^2$, $R(q) = 0.15q - 0.01q^2$, $H(q) = 2q^2 - 0.4q^3$ and $E(q) = 50/q^2$. The expansions we use are

    $$D(q) = \frac{\theta_1^d}{q} + \frac{\theta_2^d}{q^2} + \frac{\theta_3^d}{q^3}, \quad R(q) = \theta_1^r q + \theta_2^r q^2 + \theta_3^r q^3 + \theta_4^r q^4 + \theta_5^r q^5, \quad H(q) = \theta_1^h q + \theta_2^h q^2 + \theta_3^h q^3 + \theta_4^h q^4 + \theta_5^h q^5 \quad\text{and}\quad E(q) = \frac{\theta_1^e}{q} + \frac{\theta_2^e}{q^2} + \frac{\theta_3^e}{q^3}. \tag{4.26}$$
    With these expansions, we expect to learn $\theta^d = (0, 50, 0)^T$, $\theta^r = (0.15, -0.01, 0, 0, 0)^T$, $\theta^h = (0, 2, -0.4, 0, 0)^T$ and $\theta^e = (0, 50, 0)^T$. We average the data over 1000 realizations. For saving the solution, the time intervals we use are $I_d = [0, 0.1]$, $I_e = [0, 5]$, $I_h = [5, 10]$ and $I_r = [10, 50]$, with 25, 50, 100 and 250 time points inside each interval, respectively. For interpolating the solution to obtain the averages, we use $n_k = 25$, $n_k = 50$, $n_k = 100$ and $n_k = 50$ knots over $I_d$, $I_e$, $I_h$ and $I_r$, respectively.

    To learn the mechanisms, we apply the sequential procedure described above, learning them one at a time. For each problem, we apply pruning so that points outside of the 10% and 90% density quantiles, or the 20% and 80% velocity quantiles, are not included. We find that $\theta^d = (0, 49.60, 0)^T$, $\theta^e = (0, 49.70, 0)^T$, $\theta^h = (0.0084, 0, 0, 0.0011, 0)^T$ and $\theta^r = (0.15, -0.010, 0, 0, 0)^T$. The results with all these learned mechanisms are shown in figure 10. We see from the comparisons in figure 10a,b that the PDE results from the learned mechanisms are nearly indistinguishable from the discrete densities. Similar to Case Study 2, $H(q)$ only matches the continuum limit at $q \approx q(L(t), t)$. Note also that the solutions in figure 10a go up to $t = 100$, despite the stepwise procedure considering only times up to $t = 50$.

    Figure 10. Stepwise equation learning results for Case Study 4: free boundaries with proliferation, when the continuum limit is accurate, using the learned mechanisms with $\theta^d = (0, 49.60, 0)^T$, $\theta^e = (0, 49.70, 0)^T$, $\theta^h = (0.0084, 0, 0, 0.0011, 0)^T$ and $\theta^r = (0.15, -0.010, 0, 0, 0)^T$. (a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs with the given $\theta^d$, $\theta^e$, $\theta^h$ and $\theta^r$ (dashed curves), plotted at the times $t = 0, 5, 10, 25, 50, 100$ in black, red, blue, green, orange and purple, respectively. The arrow shows the direction of increasing time. (b) As in (a), except comparing the leading edges. Panels (c–f) are comparisons of the learned forms of $D(q)$, $R(q)$, $H(q)$ and $E(q)$ with the forms from the continuum limit (2.7).

    (f) Inaccurate continuum limit

    We now consider data from figure 3b, where the continuum limit is inaccurate. Here, $k = 1/5$ and the continuum limit vectors are $\theta^d = (0, 0.2, 0)^T$, $\theta^r = (0.15, -0.01, 0, 0, 0)^T$, $\theta^h = (0, 2, -0.4, 0, 0)^T$ and $\theta^e = (0, 0.2, 0)^T$. Using the same procedures and expansions as for figure 10, we average the data over 1000 realizations. The time intervals we use are $I_d = [0, 2]$, $I_e = [2, 10]$, $I_h = [10, 20]$ and $I_r = [20, 50]$, using 20 time points for $I_d$ and 200 time points for each of $I_e$, $I_h$ and $I_r$. We use $n_k = 50$ knots for averaging the solution over $I_d$, and $n_k = 100$ knots for averaging the solution over $I_e$, $I_h$ and $I_r$.

    To apply the equation learning procedure, we prune all matrices so that points outside of the 40% and 60% temporal quantiles are eliminated, where the temporal quantiles are the quantiles of $\partial q/\partial t$ from the averaged discrete data, and similarly for points outside of the 40% and 60% velocity quantiles. We find $\theta^d = (0, 0.21, 0)^T$, $\theta^e = (0, 0.23, 0)^T$, $\theta^h = (0.15, 0, 0, 0.0079, 0)^T$ and $\theta^r = (0.11, -0.0067, 0, 0, 0)^T$. Interestingly, here we learn that $R(q)$ is quadratic, with coefficients that differ from the continuum limit. The results with all these learned mechanisms are shown in figure 11, where we see that the PDE results from the learned mechanisms are visually indistinguishable from the discrete densities. Moreover, as in figure 10, the learned $H(q)$ and $E(q)$ match the continuum results at $q \approx q(L(t), t)$, which confirms that the learned continuum limit conserves mass, as expected. Note also that the solutions in figure 11a go up to $t = 250$, despite the stepwise procedure considering only times up to $t = 50$, demonstrating the extrapolation power of our method.

    Figure 11. Stepwise equation learning results for Case Study 4: free boundaries with proliferation, when the continuum limit is inaccurate, using the learned mechanisms with $\theta^d = (0, 0.21, 0)^T$, $\theta^e = (0, 0.23, 0)^T$, $\theta^h = (0.15, 0, 0, 0.0079, 0)^T$ and $\theta^r = (0.11, -0.0067, 0, 0, 0)^T$. (a) Comparisons of the discrete density profiles (solid curves) with those learned from PDEs with the given $\theta^d$, $\theta^e$, $\theta^h$ and $\theta^r$ (dashed curves), plotted at the times $t = 0, 5, 25, 50, 100, 250$ in black, red, blue, green, orange and purple, respectively. The arrow shows the direction of increasing time. (b) As in (a), except comparing the leading edges. Panels (c–f) are comparisons of the learned forms of $D(q)$, $R(q)$, $H(q)$ and $E(q)$ with the forms from the continuum limit (2.7).

    5. Conclusion and discussion

    In this work, we presented a stepwise equation learning framework for learning continuum descriptions of discrete models of population biology phenomena. Our approach provides accurate continuum approximations when standard coarse-grained approximations are inaccurate. The framework is simple to implement, efficient, easily parallelizable and modular, allowing additional components to be added to a model with minimal changes to an existing procedure. In contrast to other approaches, such as neural networks [41] or linear regression approaches [43], results from our procedure are interpretable in terms of the underlying discrete process. The coefficients incorporated or removed at each stage of our procedure give a sense of how much each term contributes to the model, making the results easier to interpret and highlighting an advantage of the stepwise approach over traditional sparse regression methods [32,37,38]. The learned continuum descriptions from our procedure enable the discovery of new mechanisms and equations describing the data from the discrete model. For example, the discovered form of $D(q)$ can be interpreted relative to the discrete model, describing the interaction forces between neighbouring cells. In addition, we found in Case Study 4 that, when $k = 1/5$ so that the continuum limit is inaccurate, the positive root of the quadratic form of the source term $R(q)$ is greater than the mean-field carrying capacity density $K$, as seen in figure 11. This suggests that, when the rate of mechanical relaxation is small relative to the proliferation rate, the mean-field carrying capacity density in the continuum description can differ from that in the discrete model.

    We demonstrated our approach using a series of four biologically motivated case studies that incrementally build on each other, studying a discrete individual-based mechanical free boundary model of epithelial cells [10,11,15,16]. In the first two case studies, we demonstrated that we can easily rediscover the continuum limit models derived by Baker et al. [16], including the equations describing the evolution of the free boundary. The last two case studies demonstrated that, when the coarse-grained models are inaccurate, our approach can learn an accurate continuum approximation. The last case study was the most complicated, with four mechanisms needing to be learned, but the modularity of our approach made it simple to learn the mechanisms sequentially, applying the procedure to each mechanism in turn. Our procedure was able to recover terms that conserve mass, despite not enforcing conservation of mass explicitly. The procedure as described has some limitations, such as assuming that the mechanisms are linear combinations of basis functions, which could be handled more generally by instead using nonlinear least squares [42]. The procedure may also be sensitive to the quality of the data points included in the matrices, and thus to the parameters used for the procedure. In appendix A, we discuss a parameter sensitivity study that investigates this in greater detail; there we find that the most important parameters to choose are the pruning parameters. These parameters can be tuned easily thanks to the efficiency of our method, modifying each parameter in sequence and using trial and error to determine suitable values.

    There are many avenues for future work based on our approach. Firstly, two-dimensional extensions of our discrete model could be considered [55,56], which would follow the same approach except that the continuum problems would have to be solved using a more detailed numerical approximation [57–59]. Another avenue for exploration would be to apply the discrete model on a curved interface, which is more realistic than considering an epithelial sheet on a flat substrate [60,61]. Working with heterogeneous populations of cells, where parameters in the discrete model can vary between individuals in the population, is another interesting option for future exploration [14]. Uncertainty quantification could also be considered using bootstrapping [42] or Bayesian inference [62]. Allowing for uncertainty quantification would also allow noisy datasets to be modelled, unlike the idealized, noise-free data used in this work. We emphasize that, regardless of the approach taken for future work, we believe that our flexible stepwise learning framework can form the basis of these potential future studies.

    Data accessibility

    Julia implementations of all computations are available on GitHub at https://github.com/DanielVandH/StepwiseEQL.jl.

    Supplementary material is available online [63].

    Declaration of AI use

    We have not used AI-assisted technologies in creating this article.

    Authors' contributions

    D.J.V.: conceptualization, data curation, formal analysis, investigation, methodology, resources, software, writing—original draft; P.R.B.: conceptualization, formal analysis, funding acquisition, project administration, supervision, writing—review and editing; M.J.S.: conceptualization, formal analysis, funding acquisition, methodology, project administration, resources, supervision, writing—review and editing.

    All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    We declare we have no competing interests.

    Funding

    M.J.S. and P.R.B. are supported by the Australian Research Council DP230100025.

    Appendix A. Parameter sensitivity study

    In this appendix, we provide a brief parameter sensitivity study, exploring the impact of parameters such as the pruning parameters and the number of time points on the results of our stepwise learning framework. We use Case Study 3 for this purpose, taking the case k=1/5 so that the continuum limit is inaccurate. The parameters we consider are h, the duration between time points; ns, the number of identically prepared realizations; tM, the final time, noting that t1=0; nk, the number of knots used for averaging; and τq, the pruning parameter for the density quantiles. We vary each parameter one at a time, so that while a given parameter is being varied the other parameters are held at their default values h=0.1, ns=1000, nk=200, tM=75 and τq=0.25.
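    The one-at-a-time structure of this sweep can be sketched as follows, in Python; `run_study` and the grid values are hypothetical placeholders for the full pipeline of simulating the discrete model, learning a continuum model, and evaluating its loss.

```python
# One-at-a-time parameter sweep: vary each parameter over a grid while
# holding the others at their defaults, recording the loss each time.
defaults = {"h": 0.1, "ns": 1000, "nk": 200, "tM": 75, "tau_q": 0.25}
grids = {  # illustrative grids only
    "h": [0.01, 0.1, 0.5, 1.0],
    "ns": [10, 100, 500, 1000],
    "nk": [50, 100, 200, 400],
    "tM": [10, 25, 50, 75],
    "tau_q": [0.0, 0.1, 0.2, 0.25],
}

def run_study(params):
    # Hypothetical stand-in: in practice this simulates the discrete
    # model with `params`, learns a continuum model, and returns the
    # loss of the learned model. A dummy value is returned here.
    return sum(params.values())

results = {}
for name, grid in grids.items():
    losses = []
    for value in grid:
        params = dict(defaults)  # all other parameters at defaults
        params[name] = value
        losses.append(run_study(params))
    results[name] = losses
print({k: len(v) for k, v in results.items()})
```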

    To assess the results for each set of parameters, we use the loss of the learned model, L(θ̂). To examine the results further, we divide them into two categories: those that learn D(q)=0, and those that learn D(q)≠0. The results of the study are shown in figure 12. We see that there is little dependence of the results on h, or equivalently on the number of time points. Figure 12b shows that ns needs to be sufficiently large, around ns>500, for any diffusion terms to be selected, although the loss does not change significantly once D(q) terms are identified. The final time is important: only final times in 50 ≤ tM ≤ 75 give reasonable results. The number of knots is not too important according to figure 12d, so long as there are not too many or too few. The most impactful parameter is τq, where we need τq ≥ 0.2 to obtain an adequate learned model; for the other case studies, which involve other pruning parameters, such as those on the derivatives or on the leading edge, we also find that these pruning parameters are the most influential.

    Figure 12.

    Figure 12. Dependence of L(θ̂), where θ̂ is the vector combining the learned θd and θr, on the parameters h, ns, tM, nk and τq. For each parameter, as it is varied the other parameters are held at their default values h=0.1, ns=1000, tM=75, nk=200 and τq=0.25.

    Overall, figure 12 shows that τq and tM are the most important parameters for this problem. This is consistent with what we have found for the other case studies, where the choice of pruning parameters is crucial and the time horizon needs to be chosen carefully so that D(q) can be identified. Choosing these parameters can be quite difficult, and trial and error is needed to identify appropriate terms, as is understanding why a given model fails to give good results. For example, in Case Study 2, we determined that we had to shrink the time interval used for learning, and that we needed to use velocity quantiles, by identifying which mechanisms failed to be learned and seeing where the model failed to extrapolate. The values that we used for these parameters, though, had to be chosen by trial and error. Our procedure is efficient enough for this trial-and-error process to be performed quickly, but future work could examine these issues in more detail to simplify the selection of these parameters.

    Footnotes

    Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6984367.

    Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.