Cluster mean-field theory accurately predicts statistical properties of large-scale DNA methylation patterns

The accurate establishment and maintenance of DNA methylation patterns is vital for mammalian development and disruption to these processes causes human disease. Our understanding of DNA methylation mechanisms has been facilitated by mathematical modelling, particularly stochastic simulations. Megabase-scale variation in DNA methylation patterns is observed in development, cancer and ageing and the mechanisms generating these patterns are little understood. However, the computational cost of stochastic simulations prevents them from modelling such large genomic regions. Here, we test the utility of three different mean-field models to predict summary statistics associated with large-scale DNA methylation patterns. By comparison to stochastic simulations, we show that a cluster mean-field model accurately predicts the statistical properties of steady-state DNA methylation patterns, including the mean and variance of methylation levels calculated across a system of CpG sites, as well as the covariance and correlation of methylation levels between neighbouring sites. We also demonstrate that a cluster mean-field model can be used within an approximate Bayesian computation framework to accurately infer model parameters from data. As mean-field models can be solved numerically in a few seconds, our work demonstrates their utility for understanding the processes underpinning large-scale DNA methylation patterns.


Introduction
DNA methylation is a repressive epigenetic mark [1] which is primarily found on the cytosines of CpG dinucleotides in mammals. Double-stranded CpG dyads can be unmethylated or methylated on both strands (u and m, respectively) or methylated on only one strand (hemimethylated, h). DNA methylation is largely erased from the genome during early mammalian development [2]. It is then re-established by the de novo DNA methyltransferases DNMT3A and DNMT3B [3] resulting in a landscape where 70-80% of CpGs are methylated in most human cells [4]. Regulatory elements such as promoters and enhancers often remain methylation free [1]. During DNA replication, the nascent strand is synthesized with unmethylated cytosines and methylation patterns are copied by the maintenance methyltransferase, DNMT1 [5]. Failure to maintain DNA methylation at a locus results in passive DNA demethylation. Methylation can also be removed actively through transient modification by ten eleven translocation (TET) enzymes and subsequent DNA repair [6].
Waves of demethylation and remethylation take place during early development and the generation of germline cells [2]. Changes in DNA methylation patterns also occur during development and cellular differentiation, resulting in cell type-specific methylation patterns [7]. The correct establishment of DNA methylation patterns is vital for normal development. Mutations in DNMTs cause Mendelian disorders in humans [8][9][10] and mice knockouts die before or shortly after birth [3,5]. Widespread alterations in DNA methylation patterns occur in cancer and ageing [11,12], but the significance of these changes is unclear. It has been observed that globally hypomethylated mice expressing a single hypomorphic DNMT1 allele develop cancer, suggesting that altered DNA methylation can cause cancer [13]. However, the mechanisms underpinning DNA methylation changes remain unclear preventing the robust delineation of their role in development and disease.
Mathematical models are powerful tools for understanding complex biological processes, including DNA methylation. The importance of interactions between CpGs in maintaining DNA methylation patterns was first postulated through modelling [14]. Specifically, the authors modelled collaborative interactions where CpGs within a region of the genome can influence the state of other CpGs, e.g. through enzyme recruitment. Models including such collaborativity were subsequently found to explain experimental measurements of methylation maintenance in vitro and in vivo more closely than those that did not include it [15,16]. A recent study also suggests that collaborativity mediated by neighbour-guided error correction through DNMT1 is important for maintaining DNA methylation [17]. Deterministic models, non-spatial stochastic models and spatial stochastic models have all been used to describe DNA methylation [18]. Deterministic models are based on rate equations while stochastic models are based on Fokker-Planck equations or chemical master equations (CMEs). CMEs are ideal because they take into account the inherent discreteness of molecular fluctuations [19] which is well known to play an important role in cellular dynamics [20]. The CME of simple non-spatial stochastic models can be solved exactly in closed-form [21], but this is often not possible for spatial stochastic models. Rather in this case, stochastic simulations are used to model the individual reaction processes described by the CME. Various types of stochastic models have been used to describe collaborative methylation systems (e.g. [22][23][24][25][26]). To date, such mathematical models have been applied to understand methylation patterns on a kilobase scale. However, megabase-scale alterations to DNA methylation patterns occur in development, cancer and ageing [27]. These include the formation of megabase-sized partially methylated domains in cancer which have been linked to genome instability [27,28]. Existing models rely on simulations that are too computationally expensive to run for such large genomic regions.
Here, we test the idea that large-scale steady-state methylation patterns can be modelled in a tractable manner using mean-field (MF) models. By comparison to synthetic data generated from stochastic simulations, we demonstrate that a type of cluster MF model can predict the statistical properties of large-scale methylation patterns. In §2, we introduce a nearest-neighbour collaborative model for DNA methylation and describe the process used to simulate data from this model. We describe the three MF models we test in §3. In §4, we compare the ability of each MF model to predict statistics associated with methylation patterns resulting from the simulations. We find that a type of cluster MF model provides excellent predictions and demonstrate that this model can be used within an approximate Bayesian computation (ABC) framework to infer parameters underpinning large-scale methylation systems. Finally, in §5, we discuss the implications of our findings.

.1. Model set-up
We consider the reaction system in figure 1, where some reactions are non-collaborative (involving only the 'target' CpG, whose methylation state changes during the reaction), while others are collaborative (involving both a target CpG and a 'mediator' CpG). The role of the mediator is to encourage the reaction to occur, e.g. via the recruitment of methylase or demethylase enzymes. This system, and reduced versions, have previously been used to examine small-scale methylation patterns [14,29,30]. Previously, collaborative and non-collaborative reactions have been referred to as positive-feedback and noisy reactions in a study modelling histone modifications [31].
We make the following assumptions: (i) CpGs can only influence the methylation state of their nearest neighbours (see figure 2). While both experimental and modelling studies have demonstrated the importance of CpGs being influenced by surrounding CpGs [14,15], the extent and range of such influence are unknown. We therefore consider only interactions between nearest neighbours. (ii) There are no direct transitions between the unmethylated and methylated states. We justify this with the observation that methylase and demethylase enzymes act on single DNA strands [6,32]. This implies that hemimethylation is a necessary transition state between unmethylated and fully methylated CpGs. (iii) The system has reached a steady state. Here, we assume that there are no long-term effects of DNA replication on methylation patterns. This assumption is supported by the observation that the DNA methylation patterns of cycling and arrested cells are similar [33].
We do not impose an upper bound on the system size, N [ N, allowing large-scale methylation patterns to be considered. We also assume that the rates k i , i = {1, 2, …, 12} are of the form given in  (see table 1 legend), all reactions in figure 1 are possible for any {x, y}.

Simulations of nearest-neighbour collaborative system
In the simple case of two CpGs, the system can be in six possible states: mm, uu, hh, um, hm and uh. For such a small system, all possible transitions between states (via reactions in figure 1) can be identified and the evolution of the system can be described exactly by six mathematical equations, one for each state. However, for large systems, it is infeasible to identify all possible states and transitions between states meaning that equations describing the exact evolution of the system cannot be formulated. While stochastic simulations are computationally expensive, they are the ground truth of the nearest-neighbour collaborative system to which we compare our MF models and so here we describe the process underlying these simulations. We focus on the steady-state case so that u, h and m levels fluctuate around some fixed steady-state values. For a system of N CpGs, we simulate the nearest-neighbour collaborative system using Gillespie's algorithm [34]-see figure 3 for an illustration of how this algorithm simulates a 4-CpG system. Essentially, in each step of Gillespie's algorithm, a timepoint is chosen and a single reaction selected to occur. This reaction changes the methylation state of a single CpG, i.e. all other CpGs remain unchanged. Throughout the simulations, we monitor the proportions of u, h and m sites, declaring steady state to be attained when these fluctuate around fixed values, after which we sample the methylation pattern, containing N methylation states, at constant time intervals. For each parameter set, we sample at T = 10 6 /N timepoints to obtain a dataset of 10 6 steady-state methylation states.

Analysis of simulated data
To facilitate our analysis, we define a sequence u t , t ∈ {1, …, T}, where We define h t and m t similarly; see figure 4. We also define z t via non-collaborative reactions: collaborative reactions:  Figure 2. Collaborative interactions that can influence a target X under the nearest-neighbour collaborative model. Individual CpGs are represented by 'lollipops', with their colour indicating their methylation status (white: unmethylated; grey: hemimethylated; black: methylated). Collaborative methylation and demethylation reactions can only occur between neighbouring CpGs ( potential influences on CpG X are indicated by arrows). Table 1. Reaction rates for the model in figure 1. Here, a > 0, x ≥ 0, y ≥ 0 so that all reaction rates are non-negative. The values of x and y determine the probability associated with each type of reaction. For example, consider the case x < 1, y < 1. Then k 3 = k 4 = a is the largest reaction rate and so non-collaborative demethylation reactions are most probable, while k 5 = k 6 = k 7 = k 8 = axy is the smallest reaction rate and so collaborative methylation reactions are least probable.
We then define sequences v t and w t by comparing v t i and w t i provides information regarding the methylation state of two neighbouring CpGs at time t ∈ {1, …, T}. We calculate the covariance between v t and w t for each t, Covar(v t , w t ), averaging over t ∈ {1, …, T} to obtain an overall steadystate covariance between neighbouring sites: Finally, the steady-state correlation between neighbouring sites, ρ(z s ) : = ρ(v s , w s ), is calculated reaction: Figure 3. Stochastic simulations of the nearest-neighbour collaborative system. For simplicity, a 4-CpG system at 3 timepoints is considered here. We fix a, x and y and impose periodic boundary conditions so that the first and final CpG can interact. After all potential non-collaborative and collaborative reactions are identified, Gillespie's algorithm chooses a timepoint and a single reaction to occur at this timepoint. This reaction changes the methylation state of a single CpG and the list of potential reactions is updated to account for this change. This process is repeated to generate dynamical behaviour. In the example shown, we start with the system on the left. All possible reactions are listed and at the first timepoint (chosen by Gillespie's algorithm) a h À ! k 4 u reaction is chosen to occur at target positiont 1 . The methylation state at the target position is changed accordingly (all other CpGs remain unchanged) and the list of potential reactions is updated to account for this change (middle). At the next timepoint (chosen by Gillespie's algorithm), a m þ u À ! k 10 h þ u reaction is chosen to occur at target position,t 2 , with CpGm acting as mediator. Again, the methylation state at the target position is changed accordingly (all other CpGs remain unchanged) and the list of potential reactions is updated (right). This process is repeated until the system reaches steady state.
unmethylated; hemimethylated; methylated Figure 4. Construction of the sequences u t , h t , m t and z t . For simplicity, only 10 CpGs are shown for a single timepoint, t. A vector u t is created, where u t i ¼ 1 if CpG i is unmethylated at time t and u t i ¼ 0 otherwise. Vectors h t and m t are constructed similarly. Finally, a vector z t is created via z t = u t + 2h t + 3m t .

Mean-field models for DNA methylation maintenance
For large CpG systems, the CME describing the nearestneighbour collaborative model cannot be easily solved and stochastic simulations are computationally expensive. In contrast, it is often the case that models simplified using the MF approximation can be computationally solved in a timeefficient manner and we aim to test whether they accurately approximate the nearest-neighbour collaborative model for DNA methylation. To this end, we construct three MF models (see § §3.1-3.3). These models consider an infinite system of CpGs and so, by design, their ability to accurately describe a genomic region increases with the size of the region. In these models, nearest-neighbour interactions are approximated by considering the mean state of the system. In the first model, nearest-neighbour interactions are entirely approximated by considering the probability that two states are adjacent (one-site MF model). The second model describes distinct pairs of CpGs (distinct pairs MF model).
Interactions occurring within a pair are directly accounted for and other nearest-neighbour interactions are approximated by considering the probability that two paired states are adjacent. In the third model, we consider overlapping pairs of CpGs (overlapping pairs MF model). Interactions occurring within a pair are again directly accounted for, but now other nearest-neighbour interactions are approximated by considering the probability that two paired states overlap. The remainder of this section is devoted to mathematical descriptions of these models. Note that while the computational cost of simulations increases with sequence length, numerical solutions of the MF models discussed in § §3.1-3.3 are independent of sequence length. Hence these models describe arbitrarily large genomic regions.

One-site mean-field model
We define the proportion of sites in the u, h, m states to be the mean u, h, m levels, μ u , μ h , μ m , respectively. Here we construct a one-site MF model, where changes in the system are influenced by μ u , μ h , μ m , rather than nearest-neighbour interactions; see figure 5.
Consider the reaction u þ h À ! k5 h þ h in figure 1. Since the h mediator is unchanged by the reaction, we can write this as an effective first-order reaction u À ! 2k5m h h, where the h mediator is absorbed into the effective reaction rate by making it proportional to μ h . The factor of two accounts for the h mediator being on either side of the u target. Similarly, u þ m À ! k6 h þ m can be written as u À ! 2k6m m h. Thus the figure 1 can be written as a single effective first-order reaction u À ! r h, with r = k 1 + 2k 5 μ h + 2k 6 μ m . We thus write the system in figure 1 as the effective first-order system and μ u + μ h + μ m = 1. The k 7 and k 11 terms have an additional factor of two since their associated reactions involve two h reactants, and either of these can change state during the reaction. Let L u , L h , L m be the 'level' (proportion) of u, h, m at a single CpG, respectively. A CpG can only be in one state at any time and so the state of a CpG at position j can be described by a state vector of the form : ð3:2Þ The probabilities associated with these three states sum to one and so we only need to consider the probabilities associated with (L u , L h , L m ) j = (1, 0, 0), and (L u , L h , L m ) j = (0, 1, 0). Using (3.1) we construct the CME describing the probability that a site is in the (1, 0, 0) or (0, 1, 0) state: We can also obtain moment equations (see appendix C of [35]) for the statistics of L u , L h , L m . The mean values are μ u = 〈L u 〉, μ h = 〈L h 〉, μ m = 〈L m 〉, where the angled brackets denote the expected value. Due to the conservation law L m = 1 − L u − L h , we again need only consider equations for u and h. From equation (C 1) in [35] with the means are described by the equations Note that these equations are identical to (3.3) with P(1, 0, 0) = μ u and P(0, 1, 0) = μ h . Setting leads to implicit equations for the steady-state means, m us ¼ a 1s a 4s a 1s a 3s þ a 2s a 3s þ a 1s a 4s and m hs ¼ a 1s a 3s a 1s a 3s þ a 2s a 3s þ a 1s a 4s ,
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 Using equation (C2) in [35] with (3.4) the second-moment equations are given by Variances, σ 2 (L us ) and σ 2 (L hs ), can then be obtained via along with the covariance Covar(L us , L hs ), which is given by CovarðL us , L hs Þ ¼ hL u L h i s À m us m hs ¼ Àm us m hs : ð3:10Þ The mean, variance and covariances associated with the m state can now be obtained using m ms ¼ 1 À m us À m hs , s 2 ðL ms Þ ¼ s 2 ð1 À L us À L hs Þ ¼ s 2 ðL us Þ þ s 2 ðL hs Þ þ 2CovarðL us , L hs Þ, CovarðL us , L ms Þ ¼ Às 2 ðL us Þ À CovarðL us , L hs Þ and CovarðL hs , L ms Þ ¼ ÀCovarðL us , L hs Þ À s 2 ðL hs Þ: ð3:11Þ We calculate the steady-state mean and variance, μ zs and σ 2 (zs), associated with the variable z = L u + 2L h + 3L m via m zs ¼ m us þ 2m hs þ 3m ms and s 2 ðz s Þ ¼ s 2 ðL us Þ þ 4s 2 ðL hs Þ þ 9s 2 ðL ms Þ þ 2 2CovarðL us , L hs Þ þ 3CovarðL us , L ms Þ þ 6CovarðL hs , L ms Þ : Note that the superscript t was only used in §2.2 to differentiate between samples at different timepoints. Here, we simply have a single z. Since no spatial information is obtained from the one-site MF model, the covariance and correlation between neighbouring sites cannot be extracted.

Distinct pairs mean-field model
We next construct a two-site MF model, where we consider 'clusters' of two adjacent CpGs. Such cluster MF models have been successfully used to study vehicular traffic and driven-diffusive gas models [36,37]. We define the mean level (proportion) of pairs in the six possible states, mm, uu, hh, um ð: ¼ muÞ, hm ð: ¼ mhÞ, uh ð: ¼ huÞ,

ð3:12Þ
to be μ 1 , μ 2 , μ 3 , μ 4 , μ 5 , μ 6 , respectively. Here, μ 4 is the proportion of pairs containing u and m, irrespective of order. Similarly, μ 5 is the proportion of pairs containing h and m, and μ 6 is the proportion of pairs containing u and h, irrespective of order.
In the distinct pairs MF model (DPMF model), CpGs within a pair are allowed to interact directly, preserving some nearest-neighbour interactions. The influence of the nearest-neighbour CpGs flanking the pair is then approximated by considering the probabilities that an adjacent pair is in each of the six possible states; see figure 6. Here, each CpG belongs to only one pair and each pair of sites is a single reactant. As with the one-site model, we consider an effective first-order reaction system, given by mm À ! a1 hm, uu À ! a2 uh, hh À ! a3 uh, hh À ! a4 hm, um À ! a5 hm, um À ! a6 uh, hm À ! a7 um, hm À ! a8 mm, hm À ! a9 hh, uh À ! a10 hh, uh À ! a11 uu, uh À ! a12 um, where the effective rates are given bŷ ð3:14Þ unmethylated; hemimethylated; methylated Figure 5. Schematic of the one-site MF model. CpGs are influenced by the mean of the system rather than nearest-neighbour interactions.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 and P 6 i¼1 m i ¼ 1. We describe the construction ofâ 1 in appendix A. Essentially,â 1 -â 12 are constructed by considering all possible ways that each reaction in (3.13) can occur via a reaction from figure 1 taking place. Such reactions can occur within the reactant pair or can take place between a site within the pair and a site from an adjacent pair. Terms that are associated with the k 7 and k 11 reaction rates and involve interactions between two hh, two hm or two uh pairs have an additional factor of two since either pair can change state during these reactions. While we can, in principle, calculate the distribution of pairs in (3.13) [38], we restrict our attention to obtaining moments of the system.
Let L 1 , L 2 , L 3 , L 4 , L 5 , L 6 be the level ( proportion) of each of the paired states at a single pair of CpGs. Since a pair of CpGs can only be in one state at any time, the state of a single pair of CpGs can be described by one of six vectors S j , j = 1, …, 6, where The u, h, m levels within a pair of CpGs,L u ,L h ,L m , are then given bŷ Using (3.13), we construct the CME for the system along with the first and second moment equations for L i , i = {1, …, 6}; see appendix C. The first moment equations describe μ i = 〈L i 〉, the mean values of L i for i = {1, …, 6}. For fixed parameters, solving these equations numerically in steady state gives the steady-state means, μ is , i = {1, …, 6}. Note that these means are independent of a.
From the second moment equations, we obtain the steady-state expected values of L i L j , 〈L i L j 〉 s , for i, j ∈ {1, 2, …, 6}. As expected from equation (3.15), Variances and covariances are then obtained using and CovarðL is , Once again, these are independent of the parameter a.
We now have statistics for the paired states in (3.12). The steady-state means for the u, h, m levels in a pair are then given by ð3:17Þ The pair-to-pair variance in u level is given by and similarly for the variances associated with h and m, s 2 ðL hs Þ and s 2 ðL ms Þ. Covariances are given by CovarðL 5s , L 6s Þ and analogously for CovarðL hs ,L ms Þ, CovarðL us ,L hs Þ.
Note that the statistics obtained so far relate to u, h, m levels within a pair of CpGs. The mean level of a state within a pair is the same as the mean level of the state at each site. However, this is not the case for higher moments. For example, s 2 ðL us Þ, s 2 ðL hs Þ, s 2 ðL ms Þ are pair-to-pair variances, rather than site-to-site variances.
We aim to obtain statistics relating to where v and w are as in equation (2.5) and we again do not require the superscript t.

Overlapping pairs mean-field model
Similarly to the DPMF model, there are also six possible states for a pair of CpGs in the overlappling pairs MF model (OPMF model); see (3.12). The OPMF model also incorporates direct interactions within a pair. However, each CpG now belongs to two pairs, one with its left-hand neighbour and one with its right-hand neighbour, leading to a system of overlapping pairs. The influence of CpGs flanking a pair is now approximated by considering the conditional probability that the pair overlaps with another pair of a certain state; see figure 7.
An identical approach to that in §3.2 leads to the CME and first and second moment equations for the system, see appendix C, and from these we obtain means, variances and covariances associated with the paired states and with the pair-to-pair u, h, m levels. Again, these depend only on x and y. The statistics associated with z are obtained as for the DPMF model; see §3.2.

Model comparison and parameter inference
To test whether MF models are capable of modelling largescale methylation patterns, we now compare model predictions to synthetic data generated using nearest-neighbour collaborative simulations. The statistical properties obtained from our models are independent of a and so we fix a = 0.2. Since demethylation dominates when y < 1 and methylation dominates when y > 1, we hypothesize that a sharp change in the behaviour of the system may occur at y = 1. To capture this potential transition for different collaborativity strengths, we consider x = {0.1, 1, 5, 50}, y = {0.1, 0.2, …, 2}. Since we approximate the nearest-neighbour collaborative system by MF models, which consider an infinite system of CpGs, finite-size effects cause discrepancies between simulations and MF model predictions as x increases. We counteract this by increasing the number of simulated sites and so simulate N = 200 CpGs when x = {0.1, 1, 5} and N = 500 CpGs when x = 50.
For each parameter set, we simulate n = 10 replicate datasets using the Gillespie algorithm (see §2.2). In particular, in each simulation, we take steady-state samples at T = 10 6 / 200 = 5000 timepoints when x = {0.1, 1, 5} and at T = 10 6 / 500 = 2000 timepoints when x = 50. Each of the replicate datasets therefore contains 10 6 states, from which we calculate the statistics of interest (see §2.3). We then calculate the means and standard errors over the 10 datasets to obtain overall summary statistics.

Mean-field models capture steady-state methylation levels
We first compare the mean u, h, m levels (μ us , μ hs , μ ms ) from the MF models to those from the simulations (figure 8). Considering the simulated data first, we observe that u and m dominate when y < 1 or y > 1, respectively. h is an intermediate state between u and m and peaks at y = 1, where there is also a sharp transition between u-and m-dominant states. While all models capture the qualitative behaviour of the means as x and y are varied, we observe that predictions of methylation levels from the OPMF model are closest to those observed in the simulated data (figure 8). All three models predict the mean u and m levels reasonably well; however only the OPMF model accurately predicts the mean h level for large x and for y close to one. All predictions from the OPMF model are within the error of the simulated data and it successfully captures the transition observed at y = 1 for all x considered. Conversely, the predictions of the other models deviate from the simulations at the transition point when x is large. The one-site MF model deviates to the greatest extent for 87% of the parameter sets and, for each mean, the average percentage error across y is higher for the one-site MF model compared to the other models for all x except μ hs when x = 5 (see appendix D, table 3). This suggests that the one-site model has the worst predictive power and we exclude it from further analysis.

The overlapping pairs mean-field model accurately predicts associations between neighbouring sites
To test whether MF models can predict associations between neighbouring CpGs, we consider z = {z 1 royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 From z, we calculate the mean and variance associated with the methylation state, and the covariance and correlation in methylation state between neighbouring sites. In the simulated data (figure 9), we again observe a transition in these statistics when the methylation and demethylation strengths are equal (y = 1). Our results counterintuitively suggest that neighbouring sites are most correlated here (the peak ρ(z s ) occurs when y = 1).
To gain insight into this observation, we examine the patterns that evolve in the stochastic simulations for x = 50 (figure 10). When y is small, large u clusters form and we intuitively expect neighbouring sites to be highly correlated. However, these large clusters are interspersed with infrequent, isolated occurrences of h and m which have low correlations with their neighbours. Moreover, ten m (or h) sites appearing in isolation will result in smaller u clusters than the ten sites appearing as a single cluster. The overall effect is to reduce the correlation when y is small. A similar rationale explains the low correlation when y is large. u and m cluster sizes are most similar when methylation and demethylation are equally strong, resulting in u and m sites correlating equally with their neighbours and the overall correlation peaking.
We next consider predictions of these statistics from the MF models. Predictions from the OPMF model lie within the error observed in the simulated data for all parameters considered ( figure 9). Conversely, predicted statistics from the DPMF model show large deviations from the corresponding simulated statistics when x is large and y is close to one demonstrating that it has lower predictive power. The OPMF model again shows superior performance over the DPMF model when the percentage errors are considered. For each z statistic in figure 9, the average percentage error across y is higher for the DPMF model compared to the OPMF model for all x except μ zs when x = 0.1, where the percentage errors for the two models are the same to 2 decimal places (see appendix D, table 3).

Overlapping pairs mean-field model can infer the parameters underpinning large-scale methylation patterns
While thorough inference exploration is not the objective of the current work, we carry out a short proof-of-concept study to demonstrate that the OPMF model could, in principle, be used to infer collaborativity and methylation strengths from data. We generate synthetic data for selected model parameters, see §2.2, and then infer these parameters back using the OPMF model. Methylomes are typically assayed by whole-genome bisulfite sequencing [39]. A variant of this, hairpin-bisulfite sequencing, can be used to assay both strands of each DNA molecule [40]. In both cases, the resulting data are composed of short reads. Each read assays few CpGs and we do not know if reads originate from the same cell or DNA molecule. Simulated datasets from previous sections do not provide a good reflection of bisulfite sequencing since all of the CpGs in the simulated system were sampled at the same timepoints, the equivalent of the CpGs originating from the same molecule. To obtain short-read data, we instead simulate data for N = 1000 CpGs. After steady state is reached, we sample the system at 10 000 different timepoints. This is equivalent to sampling 10 000 molecules in steady state at a single timepoint. For each CpG, we take the methylation state at 30 timepoints, randomly chosen from the original 10 000. The timepoints chosen for each CpG site are independent of those chosen for other CpGs. This emulates hairpinbisulfite sequencing data with coverage of 30 reads per CpG. We combine the sample states for all CpGs into a single dataset, X, and consider z = {z 1 , z 2 , …, z 1000 }, where z i = {1, 2, 3} if X i corresponds to a u, h, m state, respectively. The mean and variance of z are calculated and used for inference.
There are numerous well-established methods for conducting inference. For example, in cases where a likelihood function is available, inference can be conducted using unmethylated; hemimethylated; methylated Figure 6. Schematic of the distinct pairs MF model. The two CpGs within a pair can interact directly with each other and the pair is also influenced by the mean state of pairs in the system. In the figure, the uh pair can change state due to interactions between the u and h within the pair, and due to the mean state of pairs in the system. royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 maximum-likelihood estimation [41], which provides point estimates for model parameters. In our case, this likelihood is the solution of the CME for the full nearest-neighbour collaborative model and thus is not available to us. We therefore use the approximate Bayesian computation sequential Monte Carlo algorithm (ABC SMC; see [42] for a comprehensive review) which is an alternative likelihood-free inference approach that allows us to use the moments derived from the OPMF model. Using a Bayesian inference approach also has the advantage of providing us with a distribution of the estimate value from which we can calculate confidence intervals associated with our inferred parameter values [43][44][45][46]. We use uniform priors, U(0, 100) and U(0, 2), for x and y, respectively. We also define the distance, d, between the simulations and model prediction to be the sum of the absolute relative errors of the mean and variance, i.e.
where μ model , μ data are the means of z from the model and data, respectively, and s 2 model , s 2 data are the variances associated with the model and data, respectively. To rapidly select appropriate tolerances, we calculate the true distances between the simulated data and model predictions at the parameter values of interest.
As in previous sections, we examine x = {0.1, 1, 5, 50}. For each x, we infer for y = {0.3, 1, 1.7} using the GpABC Julia package [47]. Accepted x, y values from the final ABC SMC population make up the posterior distributions for x and y, with the means taken to be the inferred parameter values. 95% confidence intervals were calculated by removing the lowest 2.5% and highest 2.5% from the posteriors.
We find that the inferred parameter values are always of the same order of magnitude as the true values (table 2), with the most successfully inferred parameters being inferred within 1% of the true values ( figure 11a,b). There are only two cases where the true parameter values lie outwith the inferred 95% confidence intervals (e.g. figure 11c). However, we obtain wide posteriors for large x (e.g. figure 11d), indicating more uncertainty in the inference.   The accuracy of inference is highly dependent on the model sensitivity to parameters, with ease of inference increasing as sensitivity to the parameters increases. To test the sensitivity of our model to model parameters, we calculate the relative sensitivity [48] of the OPMF model to the parameters x and y, for x = {0.1, 0.2, ..., 99.9, 100} and y = {0.1, 0.2, ..., 1.9, 2}. For all parameters considered, the xsensitivity divided by the y-sensitivity is strictly less than one, indicating that the model shows more sensitivity to y than x. This means that y will be inferred more accurately than x.

Discussion
Genomic DNA methylation patterns vary between cell types, across differentiation and in disease. The mechanisms underpinning this variation remain unclear but can be better understood using mathematical models. Current approaches are limited by their inability to feasibly model large systems of CpGs and thus understand known large-scale features of methylomes. Here, we show that a cluster MF model, based around overlapping pairs of CpGs, can predict DNA methylation patterns generated under a nearest-neighbour collaborative model. This suggests that MF models are a valuable tool for understanding large-scale DNA methylation features.
Previous studies have used mathematical modelling to gain insight into the mechanisms regulating the establishment and maintenance of DNA methylation patterns. In particular, the requirement of collaborativity between CpGs to maintain DNA methylation patterns was postulated through modelling [14] before being observed experimentally [15,17] . The points and error bars correspond to the ground-truth statistics obtained from n = 10 replicate simulations of the full nearest-neighbour model (the mean results obtained from the ten simulations are shown by points, with the error bars denoting the standard error of this estimate across the ten simulations). Note that since z is determined by x and y, the statistics plotted here are implicit functions of x and y.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 replicates with 500 CpGs take approximately 6 h. There are an average of 10 000 CpGs per megabase in the human genome making these models infeasible for such large genomic regions due to computational expense. This is particularly true in the case of parameter inference where such simulations would need to be rerun many times. By contrast, the OPMF model can be applied to arbitrarily large systems of CpGs and solved numerically in seconds for any parameters to give accurate predictions for statistics of interest. Our model is based upon the same, or similar, reaction systems used in previous stochastic modelling studies [14,29,30], but can be used to study larger systems of CpGs than previously considered. This makes our MF model far better suited to understanding the mechanisms underpinning megabase-sized variations in DNA methylation observed in development, ageing and cancer, which occur at a scale three orders of magnitude larger than promoters [27]. To our knowledge, the largest system previously examined mathematically contained 10 5 CpGs [15]. However, here simulations were conducted for only a single model parameter set. Running large-scale simulations of this type for many parameter sets will result in computational bottlenecks, meaning that such simulations cannot be used for inference. A previous study has proposed a method, based on the generalized method of moments, for rapid inference using DNA methylation patterns [49]. However, the largest system tackled with this approach contains 10 CpGs. Here, we show that our OPMF model can, in principle, be used for accurate, time-efficient inference when modelling arbitrarily large genomic regions. 'Divide and conquer' strategies have also previously been used to reduce the computational expense of large-scale simulations [50,51]. Here, a large simulation is split into smaller batches and batch results are aggregated to give overall estimates. Such an approach could make simulations of the full nearest-neighbour collaborative model applicable to larger systems of CpGs by reducing the computational burden. However, we have observed finite-size effects in our stochastic simulations suggesting that splitting them into smaller batches would result in discrepancies between estimates obtained via this approach and those obtained via a single large-scale simulation.
Our results show that the success of a MF model depends on the way in which it is constructed. Of the three MF models considered here we observe that the one-site MF model and OPMF model provide the worst and best approximations to the full nearest-neighbour model, respectively. In the full nearest-neighbour collaborative system, the probability that a CpG is in a particular state depends on the state of its nearest neighbours on the left and right. Since the one-site MF model contains no spatial information, it least reflects the full model. The DPMF and OPMF models both incorporate spatial information by allowing direct nearest-neighbour interactions to occur within pairs of CpGs. However, in the DPMF model, non-overlapping pairs are considered and the probability that a pair is in a certain state depends only on the mean state of pairs in the system, rather than on the states of its adjacent pairs. Conversely, in the OPMF model, overlapping pairs are considered and the state of a pair depends on the state of pairs it overlaps. Hence the OPMF model best reflects the full model.
When used for inference, the OPMF model has a higher sensitivity to methylation strength (y) than collaborativity strength (x) explaining why the former is generally better inferred than the latter. However, some posteriors obtained in §4.3 are very wide and/or the true parameter values lie outside the inferred 95% CIs, indicating that there is scope for inference to be improved. Since our model shows royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 impressive performance in forward prediction, discrepancies between true and inferred parameters are likely due to insufficient data or the inference technique used. ABC algorithms are sensitive to the summary statistics, distance and tolerance used in the acceptance criteria for parameters [52]. Moreover, regardless of the method used, inference will always be difficult for systems of CpGs that are either 100% methylated or 100% unmethylated. This is because a very large x would  Figure 12. Potential collaborative interactions that can influence a target, X, under the models in [14] and the model considered here. (a) A cluster of 80 CpGs is first considered in [14], where a CpG can collaborate with any other CpG in the system with equal probability. (b) A high-density cluster (of 80 CpGs) adjacent to a highly methylated low-density region (of 240 CpGs) is then considered in [14], where there is nearest-neighbour collaborative methylation (red arrows) and the probability of collaborative demethylation occurring due to interaction between two sites decays as the distance between them increases (blue arrows; decay in reaction probability shown by narrowing width of arrows). Note that collaborative demethylation is restricted to the 80-CpG cluster. (c) In this paper, collaborative reactions only occur between neighbouring CpGs and there is no upper bound on the system size, allowing large-scale patterns to be considered. royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210707 result in 100% unmethylation for almost all y < 1 and 100% methylation for almost all y > 1. Despite these challenges, our analysis demonstrates that the OPMF model can in principle be used for inference. The accuracy of inference may be improved in future studies by experimenting with different inference techniques and sample sizes. In addition, the summary statistics that can be reliably calculated from real data are dependent on the technology used to generate the data. Most DNA methylation data are currently derived from technologies that produce short reads containing between one and three CpGs. Statistics involving associations between consecutive CpGs from a single molecule cannot therefore be accurately estimated. This restricts inference based on such data to statistics such as the mean and standard deviation. Long-read technologies which can capture hundreds of CpGs on a single read have recently been applied to assay DNA methylation patterns [53]. Data obtained from such technologies could allow higher order statistics to be accurately estimated and potentially improve inference.
Here, we assume that the processes governing the creation of methylation patterns in vivo are described well by our nearest-neighbour collaborative model. It is possible that collaborativity in vivo can occur between non-nearestneighbours, something which is not explicitly accounted for in our nearest-neighbour collaborative model. Collaborative methylation interactions are likely determined by the properties of the DNA methylation machinery. DNMT1 and DNMT3B both methylate processively along DNA strands whereas DNMT3A methylates in a distributive manner but can form multimers along the DNA fibre [32]. However, the range and strengths across which these interactions occur are currently unclear so we focus on nearest-neighbour interactions. We note that our OPMF model does capture interactions beyond nearest neighbours because the mean state of pairs in the system influences the change in state of CpGs. Previous modelling studies have also considered different forms of collaborative interactions between CpGs in a system ( figure 12a,b). In [14], collaboration between any CpGs in the system and nearest-neighbour collaborative methylation alongside distance-dependent collaborative demethylation were both demonstrated to produce stable CpG clusters that were either methylated or unmethylated. Stable clusters were also observed under a distance-dependent collaborative model, where collaborative demethylation dominates over short ranges and collaborative methylation dominates over long ranges [29].
Here, we have also assumed that the system of CpGs we model reaches a steady state. This means that we consider either non-dividing cells, or dividing cells which settle down to steady state between replication events such that DNA replication has no long-term effects on DNA methylation patterns. Whether or not a cell satisfies the latter case is dependent on the real magnitudes of k i , i = {1, …, 12}, and the time between replication events. Experimental studies show that arrested cells have similar DNA methylation patterns to those that are cycling, supporting the assumption that DNA replication has no long-term effect on DNA methylation [33]. Furthermore, an analysis of DNA methylation patterns on newly synthesized DNA suggests that re-methylation occurs within 20 min of replication [54]. However, another analysis of DNA methylation following replication suggests re-methylation is often delayed [55]. At present, it is unclear whether this delay is sufficient to have an effect on methylation patterns during the following cell cycle.
Our assumption that k i , i = {1, …, 12} take the form in table 1 could be violated in reality. For example, DNMT1 shows a strong preference for h sites over u sites [32], meaning that methylation reactions with an h target may have higher reaction rates than those with a u target. Future work could relax rate assumptions to account for such factors. Preliminary investigations confirm that the OPMF model provides a good approximation to the nearest-neighbour collaborative system when k i , i = {1, …, 12} are considered as twelve independent parameters (data not shown). However, the difficulty of parameter inference increases with the number of parameters, meaning that relaxing parameter assumptions will likely decrease inference quality. Nonetheless, our model suggests that it is the parameters x and y that determine steady-state methylation patterns, rather than individual reaction rates. This is supported by a study where modelling of experimental data suggested that the ratio between methylation and demethylation rates determines steady-state methylation levels at single CpGs [56].
Here, we demonstrate that MF models can accurately predict the behaviour of large CpG systems subjected to nearest-neighbour collaboration. Our study presents the first mathematical modelling approach that can be applied to arbitrarily large systems of CpGs. The future application of this approach will facilitate the delineation of the methylation dynamics that underpin the formation of large-scale methylation patterns in developmental and disease contexts.
Data accessibility. The code used to simulate the full nearest-neighbour collaborative system is available at https://github.com/lkerr34/ kerr˙mean˙field˙paper˙2021.
Next, we consider all mm À ! hm reactions that can occur due to an m within the mm interacting with a site from an adjacent pair. For example, an m could collaborate with a u from an adjacent pair via m þ u À ! k10 h þ u, i.e. we can have mm þ uu À ! hm þ uu, mm þ um À ! hm þ um, mm þ uh À ! hm þ uh: ðA 1Þ The first reaction can be written as an effective first-order reaction, mm À ! 2k10m 2 hm, where the uu is absorbed into the reaction rate by making it proportional to μ 2 , and the factor of two allows for either m within the mm to undergo this reaction.
The second reaction in (A 1) can be written as mm À ! k10m 4 hm, where the um is absorbed into the reaction rate and either m can undergo the reaction, giving us a factor of two. However, this reaction requires the u within the um to be directly adjacent to the mm. The probability of having um in this particular order is 1 2 , giving an effective reaction rate of 2k 10 1 2 m 4 ¼ k 10 m 4 . Similarly, the third reaction in (A 1) can be written as mm À ! k10m 6 hm.
Finally, an m within the mm can interact with an h from an adjacent pair, via m þ h À ! k9 h þ h, i.e. we can have mm þ hh À ! hm þ hh, mm þ hm À ! hm þ hm, mm þ uh À ! hm þ uh: ðA 2Þ Using similar arguments as above, we can write each of these three reactions as the effective first-order reaction mm À ! hm, with rates 2k 9 μ 3 , k 9 μ 5 and k 9 μ 6 , respectively. Hence, Appendix B. Derivation of effective reaction rates for the overlapping pairs mean-field model We now constructã 1 , the reaction rate associated with mm À ! hm in (3.19). As with the DPMF model, reactions occurring within the mm pair contribute a 2k 3 term toã 1 . Now, mm À ! hm can occur due to a m þ u À ! k10 h þ u reaction if one of the m sites in the mm pair is also in a pair with a u site, i.e. we can have where now the um and mm reactants share a common m. This can be written as the effective first-order reaction mm À ! hm with rate where μ 4 /2/(μ 1 + μ 4 /2 + μ 5 /2) is the probability that an m within the mm also forms a um with its other neighbour, i.e. it is the conditional probability that a pair is in the um state given that we know a particular site in the pair is m.
Recall that μ 4 gives the proportion of um and mu pairs, while μ 5 gives the proportion of hm and mh pairs. The factors of 1/2 in (B 1) account for the fact that the u and h in the um and hm must be on a particular side of the m (since an m is on its other side). The factor of two at the front of (B 1) allows for either m in the mm to undergo the reaction. Similarly, mm À ! hm can occur due to a m þ h À ! k9 h þ h reaction if one of the m sites in the mm is also in a pair with a h site, i.e. we can have where the reactant hm and mm share a common m. This reaction can again be written as mm À ! hm, where the rate is derived in a similar way as above.