Distinguishing between models of mammalian gene expression: telegraph-like models versus mechanistic models

Two-state models (telegraph-like models) have a successful history of predicting distributions of cellular and nascent mRNA numbers that can well fit experimental data. These models exclude key rate limiting steps, and hence it is unclear why they are able to accurately predict the number distributions. To answer this question, here we compare these models to a novel stochastic mechanistic model of transcription in mammalian cells that presents a unified description of transcriptional factor, polymerase and mature mRNA dynamics. We show that there is a large region of parameter space where the first, second and third moments of the distributions of the waiting times between two consecutively produced transcripts (nascent or mature) of two-state and mechanistic models exactly match. In this region: (i) one can uniquely express the two-state model parameters in terms of those of the mechanistic model, (ii) the models are practically indistinguishable by comparison of their transcript numbers distributions, and (iii) they are distinguishable from the shape of their waiting time distributions. Our results clarify the relationship between different gene expression models and identify a means to select between them from experimental data.


Introduction
One of the most popular models of gene expression is the telegraph model, a twostate model where genes are assumed to be either on or off , being able to produce mature messenger RNA (mRNA) in the on state and having no mature mRNA production in the off state [1][2][3]. Because gene expression is inherently stochastic [4], mathematical models of the telegraph model often employ probabilistic modelling techniques such as the chemical master equation [5,6] or the stochastic simulation algorithm (SSA) [7]. By fitting the steady-state analytical solution of the telegraph model to experimentally measured distributions of the number of cellular mRNA in single cells, several studies have estimated the rates of gene switching and of initiation for several mammalian genes [8][9][10][11][12]. However, mapping cellular mRNA number to the underlying transcription kinetics is difficult because fluctuations in this number reflect noise owing to many processes downstream of transcription [13,14].
By contrast, the number of actively transcribing RNA polymerase II (Pol II) on a gene is not subject to these complex processes, and hence reveals more information on the details of transcription [15][16][17]. Therefore, unlike mature mRNA statistics, fluctuations of actively transcribing Pol II provide a direct readout of transcription. Because the speed of actively transcribing Pol II is approximately constant along a gene and since its premature detachment is not frequent, it follows that the loss of actively transcribing Pol II (leading to the production of a mature mRNA transcript) cannot be described by a first-order reaction (as assumed in the telegraph model for cellular mRNA). Rather it is much better captured by a delayed degradation reaction where the removal of an actively transcribing Pol II occurs after a fixed elapsed time since its binding to the promoter. A recent paper [14] has modified the telegraph model to account for the aforementioned speciality, a model that we shall refer to as the delayed telegraph model. This alternative two-state model, unlike the telegraph model, is non-Markovian; while its mathematical analysis is complex, it can be solved exactly in steady state to obtain distributions of the number of bound Pol II. Transcriptional parameters can then be obtained by fitting these distributions to those obtained experimentally using electron microscopy [18] or nascent mRNA sequencing [19]. Alternatively, because each actively transcribing Pol II has attached to it an incomplete nascent mRNA, one can also use the delay telegraph model to numerically calculate the steadystate distribution of nascent mRNA numbers which can then be fitted to distributions obtained using single-molecule fluorescence in situ hybridization (smFISH) [20].
Despite their success in predicting distributions of transcript numbers that match those calculated from experimental data, it is important to remember that both the telegraph model and the delayed telegraph model do not include a description of all the key rate limiting steps. In the past decade, several experimental papers have shown the necessity of including Pol II pausing and release in models of transcription. Bartman et al. [21] show experimentally that it is the release of Pol II from the pausing state, and not the Pol II recruitment rate, that is a key control point for gene expression. In fact, it is found universally amongst all metazoan genes that the rate of release of Pol II from pausing is the rate limiting step in transcription [22]. In mammalian cells, the release of Pol II from the paused state is dependent on the activity of several molecules, including the transcription elongation factor P-TEFb [22][23][24]. Specifically in embryonic stem cells, ChIP-Seq data have revealed that Pol II peaks near genes are at the promoter-proximal region, and that inhibiting the P-TEFb causes Pol II to remain in the promoter-proximal region genome-wide [24]. Figs 1 and 2 in Core & Adelman [22] provide a good overview of the key step of transcription, including Pol II pausing and release. The mechanism of Pol II pausing, in addition to the binding of Pol II and other transcription factors to the promoter, provides two layers of control over the production of nascent and mature mRNA. It is also found that expressed genes without a peak of paused Pol II in one cell type can acquire pausing in a different cell type, therefore genes have the potential of being regulated by proximal pausing even when the Pol II pausing peak is absent [23]. Clearly, if Pol II pausing and release is such a key feature of transcriptional models, the current ambiguity of the mechanisms' roles in the standard and delayed telegraph models is a problem in need of a solution.
Thus far, the modelling literature contains few studies where transcription is modelled incorporating Pol II pausing and release. One model, found in [26], includes pausing and release in a three-state gene model based on the findings of Bartman et al. [21], where the three states represent (i) an inactive gene state D 0 , (ii) a 'burst initiated' state D 10 where the gene is bound to transcription factors and enhancers, and (iii) a gene state D 11 in which the Pol II is bound and paused. Mature mRNA is produced in the transition from D 11 À !D 10 ; this reaction should actually produce nascent mRNA but in this model, it is assumed that the nascent lifetime is so short that a nascent mRNA description can be ignored. By ignoring nascent mRNA fluctuations and assuming that the pausing and unpausing of the Pol II is very fast, it was shown in [26] that the mature mRNA distribution from this model is well approximated by that from the telegraph model. Two other recent studies [27,28] also explore similar models albeit in the context of transcription reinitiation [29].
In summary, it is currently not so clear why the telegraph model is so successful in fitting experimental mature mRNA distributions, even though it misses important reaction steps which are key control points for gene expression. It is unclear if the assumptions made in [26] are necessary to guarantee that the true mature mRNA distribution is well approximated by the telegraph model; it could well be that these are sufficient but not necessary conditions. Because this study did not derive nascent mRNA statistics, nothing can be inferred about the reasons underlying the success of the delayed telegraph model in fitting experimental nascent mRNA distributions. A related and important question still remains: if the two-state and more detailed mechanistic models of transcription cannot be distinguished from distributions of the number of transcripts, is there another statistic that is useful to distinguish between them? In this study we take a first step at answering these questions.
The paper is divided as follows. In §2, we introduce the standard and delayed telegraph models (two-state models), as well as a mechanistic multi-state gene model that provides a stochastic description of transcription factor, Pol II and mature mRNA dynamics. Then, in §3 we explore the relationship between the two-state and mechanistic models by comparing the distributions of their waiting times between two consecutive transcripts. We show that two-state models can always be told apart from the mechanistic model from the shape of the waiting time distribution, even when they are indistinguishable from a comparison of their number distributions. We also derive conditions under which the moments of the waiting time distribution (up to third order) of the mechanistic model agree with those of two-state models, leading to expressions relating the parameters of the latter with those of the former. In §4, we perform a sensitivity analysis using the aforementioned expressions to understand which parameters of the mechanistic model are the parameters of two-state model most sensitive to. This uncovers non-trivial correlations between the parameters of two-state models. In §5, we show that the conclusions previously based on waiting time distributions agree with those obtained using model reduction methods based on number statistics. Finally, in §6 we conclude the study and discuss our results in the context of the literature.

Models of transcription
In this section, we start by introducing an effective reaction scheme for a mechanistic model of transcription describing activator, Pol II and mature mRNA dynamics. Then, we introduce the standard and delayed telegraph models as the two-state models whose dynamics we will attempt to match to that of mature mRNA and actively transcribing Pol II in the mechanistic model, respectively.

A non-Markovian mechanistic model of transcription
The mechanistic model of transcription in metazoan cells that we henceforth consider is defined in terms of the following effective reactions: The model is also illustrated in figure 1.  [32][33][34]. Transcription factors recruit cofactors and Pol II, and hence drive the (reversible) change of state from U to U Ã . Initiation starts with the binding of Pol II to the promoter; it then pauses promoter-proximally [35]. These processes are modelled by the change of state from U Ã to U ÃÃ , where the latter is the paused state. Once the pause is released, Pol II begins moving away from the promoter region, thus starting productive elongation that leads to a Pol II molecule with a nascent mRNA tail (even paused Pol II has a tail but it is very short and we will hence ignore it). Note that the nascent transcript is not a fully formed mRNA transcript since the length of the tail attached to Pol II increases as elongation progresses. The number of Pol II bound to the gene is equal to the number of nascent mRNA irrespective of their lengths. We call any Pol II undergoing productive elongation as an active Pol II (A), which implies that the change of state from paused U ÃÃ to unpaused U Ã must simultaneously lead to the production of an A particle. Note that the binding of another Pol II to the promoter is not possible when there is already a Pol II paused promoter-proximally owing to volume exclusion imposed by the latter [36].
Elongation (and termination) finishes after a fixed elapsed time τ leading to the detachment of Pol II from the gene and the dissociation of the mRNA tail from Pol II. We hence call the fully formed mRNA a mature transcript M and elongation is described by the effective reaction A ⇒ M (the double horizontal arrow is here used to denote delayed degradation which occurs after a fixed time τ). Note that the change from A to M cannot be modelled by a first-order reaction because elongation involves the movement of Pol II along the gene with an approximately constant velocity and hence the lifetime of an active Pol II molecule is not exponentially distributed [14,37]. Note that probably there are fluctuations in the elongation time (the lifetime of an active Pol II molecule) but we will ignore them because (i) we could not find single-cell measurements of the distribution of the elongation time for a given gene, and (ii) theory suggests these fluctuations are very small for long genes with low transcription rates [37].
Note that paused Pol II instead of leading to productive elongation can also undergo premature termination [22], i.e. the paused Pol II releases the short nascent mRNA tail attached to it (which is rapidly degraded) and the polymerase is recycled into the free Pol II pool. These reactions may happen quite often [38,39]; it is thus quite unlikely that they simultaneously lead to a dissociation of the dynamic promoter condensate since otherwise the efficiency of gene expression would become extremely low. Hence we assume that premature termination leads to a change from the paused state U ÃÃ to the unpaused state U Ã but do not consider transitions of the type U ÃÃ to the non-permissive/inactive state U. In the U* state, the activator is bound to the ABS meaning the Pol II can bind to the promoter. Pol II has been recruited to the promoter and pauses in state U ** . Transitions from U ** to U* either result from premature termination or else pause release and the subsequent production of an actively transcribing Pol II. Elongation (and termination) takes a deterministic time τ after which the mature mRNA is produced. The latter is subsequently degraded in the cytoplasm. For more details, see the main text.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210510 Finally, the mature transcripts are removed via various mRNA decay pathways in the cytoplasm [40,41]. Because many mammalian genes follow single-exponential decay kinetics [42], we model mature mRNA turnover via a first-order reaction of the form MÀ !;.
We emphasize that a speciality of our model is the reaction U ÃÃ À !U Ã þ A. This involves a change of gene state each time a transcription event occurs, whereas common models of gene expression in the literature do not have such a coupling [1,3,14,[43][44][45]. As explained above, the change of state is necessary to model the fine-scale details of the molecular biology, namely the fact that unpausing a Pol II frees the promoter and enables the binding of a new Pol II to it. Unpausing of Pol II is a key rate limiting step since the mapping of Pol II using chromatin immunoprecipitation (ChIP) revealed peaks of Pol II near many promoters [24,46,47]. In fact, this accumulation of Pol II near the promoters indicates that the relative rates of premature termination (b 0 ) and pause release (c) are much slower than rates of recruitment and entry into the paused state (b). Because regulatory processes often target rate-limiting steps, the release of paused Pol II has emerged as a central point of gene expression control [21,22].
We note that while the mechanistic model described incorporates more biological detail than the standard two-state models, nevertheless it is to be kept in mind that it is still based on some assumptions because the actual mechanisms of pausing and how variable it is between species is still an ongoing discussion in the experimental community. For instance, Pol II volume exclusion may not be enough to avoid the immediate recruitment of other polymerases.
A master equation can be written down which describes the stochastic dynamics of the mechanistic model; its form is quite different than the standard chemical master equation that is popular in the literature of gene expression [6]. The right hand side of the latter equation is only a function of the present time t. By contrast, the master equation describing our model has a right-hand side that is a function of not only the present time t but of the history of the process in the interval [t − τ, t]. This is because of the fixed time τ between the release from the paused state and the production of a mature transcript. The dependence of the dynamics of the system on its history means that our model is non-Markovian [48].

Two-state models of transcription: telegraph and delay telegraph models
In the literature, two models are commonly used to (separately) describe active Pol II and mature mRNA dynamics: we shall refer to model (2.2) as the delayed telegraph model. Note that the former is a Markov model while the latter is non-Markov in character for the same reason as described above for the mechanistic model. Clearly, the difference between the two models is how the transcripts are removed from the system: active Pol II is removed after a fixed elapsed time τ (owing to the termination of elongation which results in a mature transcript), whereas mature mRNA degradation follows the first-order kinetics. Both models postulate that at any point in time, the gene is in one of two states: an active state G from which transcription can occur and an inactive state G*. As argued by Bartman et al. [21], it is unclear what is the precise biological meaning that should be associated with these two states because the reaction in these models cannot be clearly associated with polymerase processes that are central to transcription.
As we argued in the Introduction, both telegraph and delayed telegraph models have been shown to accurately replicate experimental distributions of mature and nascent mRNA numbers. This leads us to the following question: could it be possible that the stochastic dynamics of our mechanistic model defined by (2.1) are accurately approximated by these simpler models? To be more precise, is there a set of effective transcriptional parameters ρ, σ u , σ b of the two-state models such that predict the same (or very similar) distributions of active Pol II and mature mRNA numbers in the mechanistic model. If there is such a set of effective parameters, ideally we would also want expressions showing their relationship to the parameters a, a 0 , b, b 0 , c of the mechanistic model.

3.
Relationship between the parameters of the two-state and mechanistic models 3.1. When can the two-state and mechanistic models be matched? A waiting time distribution perspective In appendices A and B, we calculate the distribution of the time between the production of two consecutive M (A) molecules for the mechanistic and (delayed) telegraph models. Using these distributions we can compute the square of the coefficient of variation of the time between two consecutive M (or A) production events. Throughout the paper, we will refer to this time between production events as the waiting time. In line with previous usage in the single enzyme molecule literature [49], we shall refer to the coefficient of variation of the waiting time distribution as the randomness parameter, which is given by for the telegraph or delayed telegraph models and by for the mechanistic model. Note that the waiting time statistics for A and M in the two-state models are the same because the waiting time distribution calculation is royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210510 not sensitive to the mode of degradation (first-order or delayed) since the absorbing state corresponds to the production of a new mature mRNA transcript or a new active Pol II which necessarily always precedes its degradation or removal. In addition to this reason, the statistics are the same for active Pol II and mature mRNA in the mechanistic model also because of the fixed time τ between the unpausing of a Pol II and the production of a mature mRNA. Note that while the randomness parameter for A or M is greater than 1 for all parameter values (see equation (3.1)) in the two-state models, the same statistical measure can be less than or greater than 1 in the mechanistic model (see equation (3.2)). In fact, evaluating the latter equation for over a million random values of parameters shows that R mec ≥ 1/2. The 2 appears because in our model, it is the smallest number of reaction steps between A production events (U Ã ! U ÃÃ ! U Ã þ A). Similar results have been derived in the context of single molecule enzyme kinetics [49].
It follows that the two-state models can only capture the waiting time statistics of the mechanistic model (up to second order) when R mec ≥ 1, which is the case when the following condition is satisfied: This implies that the conditions which favour a description of the mechanistic model by the two-state models are: (i) premature termination and unpausing from the paused promoterproximal state must be fast i.e. large b 0 + c; (ii) transcription factor binding to DNA elements and the reverse unbinding reaction must be slow, i.e. small a + a 0 ; and (iii) transcription factor unbinding is fast compared to transcription factor binding, i.e. a/a 0 is small. Note that the condition given by equation (3.3) is not a function of b, the rate at which polymerase binds the promoter and moves to the proximal paused state (see later for an explanation of the role of b).

Analytical expressions for the effective parameters of the two-state models
Matching the first three moments of the waiting time distribution of the times between consecutive M or A production events of the telegraph/delayed telegraph model (given by equations (A 5)) with those calculated using the mechanistic model (given by equation (B 5) evaluated for i = 1, 2, 3), we obtain a set of three simultaneous equations for the effective parameters of the two-state models ρ, σ u , σ b . The solution of these equations gives: and . Note that if the condition given by equation (3.3) is satisfied, then Δ ≥ 0 and hence the effective parameters defined by equations (3.4) are positive and physically meaningful. If the condition is not satisfied, then one of these effective parameters is negative which means that there are no two-state models that can approximate the mechanistic model's waiting time moments up to thirdorder. We emphasize that the effective parameters are the same for the telegraph and delay telegraph models because the waiting time calculation is insensitive to the mode of decay (first-order or delayed). In figure 2a-f, we compare the steady-state number distribution of the two-state models (which is analytically derived in [1,14]) evaluated with these effective parameters (for Δ > 0) and the steadystate number distribution of the mechanistic model (which is obtained from stochastic simulations modified to take into account fixed time delays [25]). In the cases shown, the two-state models provide an excellent match to the mechanistic model for both unimodal and bimodal distributions of active Pol II and mature mRNA numbers. Note that because most of the parameters in the mechanistic model have not been measured directly, we chose parameters such that the number distributions looked similar to those measured experimentally and such that the average number of mRNA is larger than the average number of active Pol II (the former can range from few tens to few hundreds whereas the latter is at most few tens) [9,50].

The case of fast switching between U Ã and U ÃÃ
Where min(b, b 0 ) ≫ max(a, a 0 ), the two states U Ã and U ÃÃ can be effectively subsumed into a single super state W and the system dynamics amounts to switching between an inactive state U and an active state W. Physically, one sees that this arises since in this limit transitions between U Ã and U ÃÃ occur almost instantaneously compared to transitions between U and U Ã . The transition rate from U to W is the same as from U to U Ã and hence in the two-state model this implies The transition rate from W to U must be equal to the transition rate from U Ã to U multiplied by the probability of being in state U Ã given that currently the effective system is in state W. This implies Similarly, the effective production rate is the rate of producing active Pol II from state U ÃÃ multiplied by the probability of being in this state given that currently the effective system is in state W. This implies These results can be formally obtained from equations (3.4) by choosing b 0 = γb (where γ is a constant) and taking the limit The case of fast switching in a similar three-state model of gene expression (without an explicit description of active Pol II dynamics) has been previously studied in [26]. While it is obvious that fast switching between U Ã and U ÃÃ simplifies to an effective two-state model, our condition (3.3) shows that fast switching is sufficient but not a necessary condition for a twostate model to describe the dynamics of the mechanistic model. We note that fast switching between U Ã and U ÃÃ is unlikely to be the general case since the average time scale of Pol II pausing is approximately 7min [23] and almost 1 h in a small subset of genes [51]. This indicates Pol II pausing is very stable and 'not the consequence of fast, repeated rounds of initiation and termination' [23].

Distinguishing between two-state and mechanistic models using waiting time distributions
It is interesting to note that while for Δ ≥ 0 the two-state and mechanistic models are practically indistinguishable by comparison of their number distributions, they can be always distinguished by the distribution of the time between consecutive active Pol II or mRNA production events. In particular, in figure 2g-i we show that while f (t), the waiting time distribution between consecutive production events, is a monotonically decreasing function for the two-state models, it has a peak at a non-zero value of time for the mechanistic model. Another distinguishing feature is that for two-state models, f(0) is non-zero while for the mechanistic model it is exactly zero. The latter feature can be explained as follows.
For two-state models, since there is no change in the gene state when production occurs, hence there is no lower bound on how short the time between two consecutive production events can be. However, in the mechanistic model, a production event is accompanied by a change of state (from U ÃÃ to U Ã ), therefore there is a finite non-zero time to switch back to state U ÃÃ from which the next production event occurs. Consequently, for the mechanistic model f (0) must be zero.  Figure 2. Comparison between the molecule number distributions of active Pol II and mature mRNA distributions of the two-state (reduced) models (equations (2.2) and (2.3)) and the mechanistic model (equation (2.1)). For a particular choice of parameters of the mechanistic model, equations (3.4) give the two-state model parameters. In (a-c), for three different parameter sets (one per column), we show that the mechanistic model describing the number of active Pol II molecules is well approximated by the exact steady-state solution of the delay telegraph model [14] evaluated with the effective parameters. Panels (d-f ) show a similar level of agreement between the mechanistic and two-state models but instead for the mature mRNA distributions, where the two-state model is now the telegraph model whose exact steady-state solution can be found in [1]. Steady-state distributions of the mechanistic model were obtained using the delay SSA (algorithm 2 of [25]) with 10 4 samples. In (g-i), we show the corresponding distributions of the waiting time between two consecutive active Pol II (or mature mRNA) production events for the two-state and mechanistic models. royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210510 By this reasoning, it follows that the mode should be close to zero whenever the state U ÃÃ is recovered rapidly after an active Pol II production event, which occurs when b is large. In figure 3, we confirm this intuition and show that for the mechanistic model as we increase b, the waiting time distribution of the two-state model better approximates the waiting time distribution of the mechanistic model. Note that a log-scale is used on the x-axis to emphasize that there are always differences between the mechanistic and two-state models for small values of t.

Sensitivity analysis
Equations (3.4) allow us to understand how the parameters of the mechanistic model influence the effective parameters of the two-state models. We define the ordered set of mechanistic model parameters as θ mec = {a, a 0 , b, b 0 , c} and the ordered set of the two-state model parameters as θ tele = {ρ, σ u , σ b }. In table 1, we show the sign of the derivative of a parameter in a two-state model with respect to changes in the parameter of the mechanistic model (when Δ ≥ 0). For example, the first row shows the sign of the derivative of ρ with respect to a, a 0 , b, b 0 and c. A positive (negative) sign for the pair (ρ, a) indicates that an increase in a leads to an increase (decrease) in ρ. We also show the same but for the burst size β = ρ/σ u , a commonly cited measure equal to the amount of mRNA produced while the gene is in the on state (in the two-state models). Note that while the sign is fixed for most cases, in three instances the sign can flip. There is also a case where one of the two-state model parameters is independent of one of the parameters of the mechanistic model (marked with a 0). Owing to the complicated nature of equations (3.4), it is difficult to deduce the signs in table 1 using simple arguments, however in some cases it can be done. For example, the relationship of ρ with respect to parameters b, b 0 and c is intuitive because: (i) increasing b increases the time in the U ÃÃ state meaning production of A (or M) happens more often; (ii) decreasing b 0 has the opposite effect, meaning production of A (or M) occurs less often; and (iii) increasing c obviously increases the production rate of A (or M) and hence increases the predicted value of ρ.
Next, we investigate the sensitivities of the parameters θ tele of the two-state model to the parameters of the mechanistic model θ mec . For this purpose, we randomly selected 10 3 parameter sets from a log-scaled space in the θ mec parameters, accepting only those parameter set combinations that came within 2 experimental errors of the measurements of Oct4 gene: ρ = 3.2 × 10 −2 ± 1.0 × 10 −2 s −1 , σ u = 3 × 10 −3 ± 2 × 10 −3 s −1 and σ b = 1.5 × 10 −4 ± 0.5 × 10 −4 s −1 [50]. We also did this for the Nanog gene whose measurements were: ρ = 1.3 × 10 −2 ± 0.3 × 10 −2 s −1 , σ u = 1.2 × 10 −4 ± 0.2 × 10 −4 s −1 and σ b = 3.2 × 10 −5 ± 0.3 × 10 −5 s −1 . A log-scaled parameter space was used such that various combinations of mechanistic model parameter timescales could be easily explored. The ranges of the mechanistic model parameters that we explored were u mec i [ ½10 À4 , 10 s À1 for the Oct4 gene and u mec i [ ½10 À5 , 10 À1 s À1 for the Nanog gene. The sensitivities calculated are the absolute values of the relative sensitivities given by, where senðu tele i , u mec j Þ is the magnitude of the relative sensitivity of u tele i with respect to u mec j .
As we show in figure 4, we find that for both genes: (i) the initiation rate ρ of the two-state models is most sensitive to parameters b and c in the mechanistic model, i.e. parameters that control the rate of Pol II binding, of entering and leaving the promoter-proximal paused state; (ii) the rate royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210510 of switching off of the two-state models σ u is most sensitive to parameter a 0 (controlling transcription factor unbinding) and also to parameters b, c which control the initiation rate; and (iii) the rate of switching on of the two-state models σ b is most sensitive to parameter a (controlling transcription factor binding) but also to parameters a 0 , c which control the rate of switching off and the initiation rate. In figure 4, we also show which parameters of the mechanistic model are the three parameters of the two-state model least sensitive to. This analysis identifies how 'microscopic' parameters of the mechanistic model affect the 'macroscopic' parameters of the two-state models. More importantly, it shows that the latter are typically correlated owing to their dependence on common microscopic parameters.   [50]. We chose 10 3 parameter sets θ mec at random, accepting only parameter sets for which the predicted telegraph model parameters ρ, σ u and σ b from equations (3.4) were within two experimental errors of values reported in [50]. From these randomly chosen parameter sets, we then calculated the relative sensitivity senðu tele i , u mec j Þ which is given by equation (4.1). The proportions on the pie charts show the proportion of parameter sets for which {i, j} were the most/least sensitive parameters, where {i, j} states that i is the most/least sensitive parameter followed by j. (a) for Oct4, the most sensitive parameters for ρ are b and c, with the majority of parameter sets being most sensitive to b and second-most to c. (b) for Oct4, the least sensitive parameters for ρ are a and a 0 , with the majority of parameter sets being least sensitive to a and second-least sensitive to a 0 . (c-f ) follow similar interpretations as made for (a) and (b) for the Oct4 gene, and (g-l ) follow similar interpretations for the Nanog gene.

Model reduction using number statistics or three-state models
Thus far, we have explored model reduction solely using waiting time statistics. Alternatively, one can match two-state and mechanistic models using moments of the number of molecules. As well, one can match three-state models and mechanistic models using waiting time or number statistics. In this section, we explore these alternative perspectives.

Obtaining reduced models with two states using number statistics
We begin by finding the Fano factor (defined as the variance divided by the mean) of the active Pol II and mature mRNA numbers in both the two-state and mechanistic models. In appendices C and D, we derive expressions for the mean and variance of active Pol II and mature mRNA numbers in steady-state conditions for both the mechanistic and two-state models (for a test of their accuracy versus stochastic simulations using the delay SSA, see table 2). The Fano factor of the two-state models is easily proved to be always greater than 1. Specifically, for the delayed and standard telegraph models, we have, respectively: The Fano factor of the number of active Pol II in the mechanistic model is given by: where A 0 , A 1 and A 2 can be positive or negative and are complicated functions of a, a 0 , b, b 0 , c, and where we have defined Note that the arguments of the exponential functions are negative for all positive values of the parameters. The Fano factor of the mature mRNA statistics is derived in appendix D and is given by When condition (5.6) is satisfied, one can find a mapping between the standard telegraph model describing mature mRNA and the mechanistic model. We note that this condition is not the same as that derived from model reduction using waiting time statistics, namely equation (3.3

). In fact, if equation (3.3) is not satisfied then neither is equation (5.6);
i.e. for all points in parameter space in which it is not possible to match the moments of waiting time distributions of the twostate and mechanistic models, it is also not possible to match the moments of the mature mRNA numbers. However, it also follows that there is a region of parameter space of the mechanistic model where moment matching of the two-state model using waiting time statistics is possible (equation (3.3) is satisfied) whereas moment matching using number statistics is not (equation (5.6) is not satisfied). This region of parameter space where the two methods give different results is very small whenever a + a 0 ≫ d where the rates of transcriptional factor binding/unbinding to the promoter are much larger than the rate of mature mRNA degradation. This seems to be generally the case since degradation timescales of mature mRNA are generally many hours in eukaryotic cells [8]. Incidentally, this offers an explanation why the Fano factor of mature mRNA is invariably measured to be greater than 1 in the literature of eukaryotic gene expression [9,10,52].
Owing to the complicated nature of the expression in equation (5.3), the derivation of an analytic condition for which the Fano factor of active Pol II is greater than 1 appears to be difficult to obtain. However, in the limit of τ → ∞ it is clear that FF mec A ! R mec . Hence, in the limit of long elongation times, the condition necessary for model reduction using active Pol II moment number statistics, i.e. FF mec A ! 1, is equivalent to the condition necessary for model reduction using waiting time statistics given by equation (3.3) (which is the same as R mec ≥ 1). This is intuitive because the waiting time calculation does not consider the removal of active Pol II via elongation but only their production time statistics. It can also be proved from equation (5.3) that in the limit τ → 0 we have FF mec A ! 1. What happens for finite τ > 0 is difficult to deduce from equation (5.3) and hence we investigate this numerically.
In figure 5, we evaluate equation (5.3) as a function of τ for a number of parameter sets with different R mec . Several notable features can be seen: (i) if R mec < 1 then FF mec A , 1, i.e. if model reduction using waiting time statistics is not possible then it is also impossible using number statistics; and (ii) for R mec ≥ 1, as we increase τ, FF mec A decreases from 1 to a value below 1, reaches a minimum and then increases up to the value R mec . Consequently, if the condition for model reduction using waiting time statistics is satisfied, it is not necessarily true that it is possible to achieve model reduction according to number statistics. In figure 6a-c, we show a heat map of the minimum Fano factor (achieved at intermediate τ) in the parameter space of the mechanistic model. Note that the minimum achieved inside the region where R mec > 1 (the region above the white line) is not far below 1. As a consequence, while here there is no model reduction from a number statistics point of view, model reduction using waiting time statistics is possible, and the distribution computed using the effective parameters given by equations (3.4) while not perfect, it is acceptable ( figure 6d-f ).
Thus far, we have looked at model reduction via number statistics from the perspective of when the Fano factor numbers of the mechanistic and two-state models are both greater than one. In appendix E, we extend this analysis royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210510 further by considering two other types of model reduction via number statistics: (i) matching of the molecule number moments and (ii) of the number distributions of the mechanistic and the two-state models for active Pol II and mature mRNA numbers. In particular, we found the following: (i) within the region of parameter space of the mechanistic model described by the condition equation (3.3), it was possible to numerically find parameters of the two-state models such that the first three moments of the active Pol II and mature mRNA number distributions of the two-state models agreed with those of the mechanistic model (figure 7a-c,g-i); and (ii) the Hellinger distance between the molecule number distributions predicted by the mechanistic model and the molecule number distributions of the twostate models that provides the best approximate distribution of the mechanistic model, is very small within the region defined by equation (3.3) (figure 7d-f, j-l ). The analysis shows there is a close relationship between model reduction using waiting time and number statistics, and supports the conclusions reached in § §3 and 4 using waiting time statistics.

Obtaining reduced models with three states using waiting time statistics
Thus far, we have considered the approximation of the mechanistic model by two-state models (telegraph and delay telegraph models). However, some papers have postulated the existence of two off states for some mammalian genes because the time spent by the gene in the off state is measured to be non-exponential [11]. This has led to a variation of the telegraph model, which we will refer to as the refractory model: One can also postulate a modification ( parallel to the delay telegraph model) that describes active Pol II rather than mature mRNA: An analysis akin to the one shown for the two-state models in appendix A.1 shows that the Laplace transform of the waiting time distribution of the time between consecutive active Pol II or mature mRNA production events is given bỹ The randomness parameter is then given by the square of the coefficient of variation squared of the time between two consecutive production events: ð5:11Þ    Figure 5. Fano factor of the active Pol II number distribution for the mechanistic model as a function of the elongation time τ and the randomness parameter R mec . The Fano factor is evaluated using equation (5.3). Note that the large τ limit of the Fano factor is equal to the randomness parameter R mec which is given by equation  Panel (a) shows that if R mec ≤ 1 then the Fano factor is less than 1 for all τ. Panel (b) shows that if R mec ≥ 1 then the Fano factor is less than 1 for a small enough value of τ.
Hence the randomness parameter of the reduced models (5.7) and (5.8) is always greater than 1. By contrast, we have already shown by equation (3.2) that for the mechanistic model the randomness parameter can be greater than or less than 1. Hence it follows that the condition given by equation (3.3) is necessary for both telegraph models and those with a refractory state to approximate the mechanistic model. Similar to what we have previously done for the two-state models, analytical expressions expressing the four parameters of the reduced refractory models in terms of the six parameters of the mechanistic model can be derived by matching the first four moments of the waiting time distribution of the two models. The steady-state distribution solutions of active Pol II and mature mRNA numbers of the reduced refractory models evaluated with these effective parameters provide an excellent approximation to the distributions of the mechanistic model. However, because this was already achieved using twostate models and since the refractory models have the same limitations as the two-state models (randomness parameter cannot be less than 1), it follows that two-state models provide the optimal choice for model reduction within the parameter space defined by equation (3.3).

Discussion
In this paper, we have investigated to what extent can twostate models predict the active Pol II and mature mRNA dynamics of a more realistic mechanistic model that incorporates transcriptional factor binding and unbinding, Pol II dynamics (binding, pausing, release, elongation) and mature mRNA dynamics. We found that there is a region of parameter space where there exists a choice of parameters of two-state models in terms of the mechanistic model such that the first three moments of their waiting time distributions exactly match. The distributions of active Pol II and mature mRNA numbers predicted by two-state models with these effective parameters provide a very close match to the distributions predicted by the mechanistic model; nevertheless, the models can be distinguished by comparison of the shape of their waiting time distribution. The waiting time distribution for the two-state model has a non-zero value at t = 0 and decreases monotonically with time; whereas for the mechanistic model, the waiting time distribution is zero at t = 0 and has a peak at a non-zero value of time. We note that while in principle these two distributions are always distinguishable, in practice the differences will be small if the rate of Pol II binding and entry into the paused state is very large. We also showed that the necessary condition for the reduction of the mechanistic model to two-state models that was analytically derived using waiting time statistics i.e. equation (3.3), is compatible with the region of parameter space identified by model reduction using matching of moments and distributions of molecule numbers. We note that while our model description was framed in terms of an activator, it has alternative interpretations which increase its generality and applicability. One such alternative interpretation is in terms of a repressor that operates via competitive binding 11 [53,54]. In this interpretation, U is a state that has a repressor bound to the promoter such that Pol II is blocked from being able to bind to the promoter. U Ã then represents the state where the promoter is free and neither repressor nor Pol II is bound to the promoter, meaning that the binding site is accessible to both repressors and Pol II. Finally, the U ÃÃ state represents the state in which Pol II is recruited and proximally paused. A main distinction of this work from the analysis of a similar model studied in [26] is that the present mechanistic model has an explicit description of active Pol II that allows us to study the accuracy of the delay telegraph model. It is also noteworthy that while [26] showed that the telegraph model provided an excellent approximation to the mature mRNA distribution of a similar mechanistic model under the assumption of rapid entry and exit from the paused state, in this study we showed using a variety of model reduction techniques that this assumption though sufficient is not necessary. We also note that while other papers have made use of waiting time statistics in the context of gene expression [11], our approach is distinctly different. The distribution of the off time in another three-state model of gene expression [11] is not the same as the distribution of the time between two consecutive active Pol II production   Figure 7. Comparison of model reduction of the mechanistic model to two-state models using two different types of number statistics and comparison with reduction from waiting time statistics. In (a-c) black dots show the points in parameter space where the first three moments of the active Pol II number distributions of the mechanistic and the delayed telegraph model match numerically using the Newton-Raphson method; in (g-i) we show the same for the distributions of mature mRNA of the mechanistic and telegraph models. The heat map shows the value of Δ = b 0 + c − a − (a 2 /a 0 ) and the solid black lines divides areas where Δ > 0 (waiting time moment matching exists) and Δ < 0 (waiting time moment matching does not exist). Note that the black dots in (a-c) do not fill the whole region Δ > 0 because of numerical issues with the solver (see appendix E for a discussion). In (d-f ) and ( j-l ), we show the Hellinger distance (h in log scale) between the molecule number distributions predicted by the mechanistic model and the molecule number distributions of the two-state models that provides the best approximate distribution of the mechanistic model; the parameters of the two-state models are those learnt after O(10 5 ) iterations of an algorithm that maximizes the likelihood. The mature mRNA decay rate d = 0.0016 s −1 in all cases and the delay time is τ = 273.62 s. See appendix E for details of the numerical procedures used. events; this is because the former provides only information about the time between two successive bursts of gene expression which occurs on long timescales and reflects the accessibility of the promoter but has no information on the fast Pol II processes within each burst. To the best of our knowledge, the experimental measurement of the distribution of the waiting time as defined in our paper has not been attempted yet. This is because with current labelling and imaging technology, it is not easy to directly visualize, track and quantify individual transcriptional initiation events. However, a set of recent papers report progress in this direction by estimating an approximate distribution between two consecutive initiation events in Drosophila using a machine-learning approach [55,56].
We finish by a discussion of the validity and interpretation of equations (3.4) which express the parameters of two-state models as a function of the parameters of the mechanistic model. We have shown from these expressions that different parameters of the two-state models can be effectively correlated owing to their dependence on a common parameter/s of the mechanistic model. This may explain correlations found between parameters of two-state models estimated from single-cell RNA sequencing for mammalian cells [10]. There is a region of parameter space where the effective parameters given by our theory become negative (when the inequality given by equation (3.3) is not satisfied), meaning that in this case there is no two-state model that can match the first three moments of the waiting time distribution of the mechanistic model; we also showed that if the elongation time and the mature mRNA degradation timescale are large enough, the aforementioned region is also characterized by Fano factors of active Pol II numbers and mature mRNA numbers that are less than one, i.e. sub-Poissonian statistics. To see whether such a case is realistic we extensively searched through the experimental literature of gene expression, and found that for mature mRNA all papers report a Fano factor of greater than 1 which is consistent with constitutive or bursty expression; for nascent mRNA, the majority of papers report Fano factors greater than 1 (see for example [9,20,57]) with the exception of one paper (see supplementary fig. 6 of [58]). However, it is to be borne in mind that while theoretically nascent mRNA numbers should equal the active Pol II numbers in our model, in practice owing to the intricacies of smFISH this is not the case, as we now explain. The number of nascent mRNA is most commonly calculated by dividing the total fluorescent signal from a transcription site by the fluorescence emitted by a mature transcript. In this technique, a fluorescent signal is emitted by oligonucleotide probes bound to the nascent mRNA tail. Since as an active Pol II travels along the gene, its nascent mRNA tail grows, we expect the fluorescent signal intensity to increase as well [14]. Hence it follows that the total nascent mRNA n N calculated using this method is generally a lower bound on the actual number of active Pol II n A at a transcription site in the nucleus, i.e. n N ∼ f n A where f is a fraction. From this, it immediately follows that the Fano factor of nascent mRNA is always less than the Fano factor of active Pol II. Thus the measurement of Fano factors of nascent mRNA numbers slightly less than 1 in [58] probably implies Fano factors of active Pol II which are above 1. Hence we come to the conclusion that all available evidence to date for both nascent and mature mRNA seems consistent with equation (3.3), which implies that equations (3.4) provide a generally useful means to understand the parameters of two-state models in terms of underlying microscopic processes.
Data accessibility. Julia Code for stochastic simulations of the mechanistic and the delayed telegraph model has been deposited at https:// github.com/jamesholehouse/Delayed-SSA-Models.
wherefðsÞ ¼ Ð 1 0 f ðtÞ e Àst dt. It then follows that the first three moments of the time between two consecutive active Pol II production events are given by (A 5Þ The square of the coefficient of variation of the waiting time (the randomness parameter) is (A 6Þ Note that R tele > 1 for all parameter values. For reference, the exponential distribution is characterized by a coefficient of variation squared equal to 1.

A.2. Proof of the monotonicity of the waiting time distribution
Here we prove that the waiting time distribution of the delayed telegraph model (and of the telegraph model) is a monotonically decreasing function. We start by rewriting equation (A 4) in the form: Taking the inverse Laplace transform of equation (A 7) one can show that and To determine if f (t) is monotonically decreasing in t, we need to know what is the sign of a 1 , a 2 , β 1 , β 2 . From equations (A 10) and (A 11), since the right hand sides of both equations are positive then a 1 , a 2 must also be positive (if one or both are negative then the sign of the left hand side will not match the sign on the right hand side of one of the two equations). Also by solving equations (A 8) and (A 9) simultaneously for β 1,2 one finds that these are positive. Because a 1 , a 2 , β 1 , β 2 > 0, it follows from equation (A 13) that ∂ t f (t) < 0 for all times and hence f (t) is a monotonic decreasing function of time t. Furthermore, by the initial value theorem [ In this section, we extend the analysis of appendix A to study the mechanistic model, which is given by We now derive the distribution of the time between two consecutive active Pol II production events and also the same but for mature mRNA M. We first consider the active Pol II case. We define four states: state W where the gene is in state U and the number of active Pol II is n; state X where the gene is in state U Ã and the number of active Pol II is n; state Y where the gene is in state U ÃÃ and the number of active Pol II is n; and state Z where the gene is in state U Ã and the number of active Pol II is n + 1. Hence the effective reaction scheme describing these four states is Just after an active Pol II is produced, the gene is in state U Ã and hence our initial condition is state X. The absorbing state is state Z. The master equations describing the effective reaction scheme are: @ t P W ðtÞ ¼ ÀaP W ðtÞ þ a 0 P X ðtÞ, @ t P X ðtÞ ¼ aP W ðtÞ þ b 0 P Y ðtÞ À ða 0 þ bÞP X ðtÞ and @ t P Y ðtÞ ¼ bP X ðtÞ À ðb 0 þ cÞP Y ðtÞ, with initial condition P X (0) = 1, P W (0) = P Y (0) = P Z (0) = 0. The distribution f(t) of the time t at which the system enters the absorbing state Z is given by the probability that the system is in state Y at time t multiplied by the rate of switching from state Y to Z, i.e. f(t) = c P Y (t). Solving the differential equations (B 3) using the Laplace transform we obtain: From the definition of the Laplace transform, we have that the moments are given by The square of the coefficient of variation squared of the time between two consecutive production events (the randomness parameter) is: Note that depending on the parameter values, R mec can be greater than or less than one (unlike for two-state models where it was shown in appendix A that the randomness parameter is always greater than one). Suppose there is a fixed time τ between the production of an active Pol II and the production of a mature mRNA (via elongation and termination). It follows that the time between two consecutive mature mRNA production events is precisely the same as the time between two consecutive Pol II activation events, i.e. all the waiting time statistics that we have derived for active Pol II also hold for mature mRNA too.

B.2. Some properties of the waiting time distribution
We note that sincefðsÞ in equation (B 4) can be written in the formfðsÞ ¼ P 3 k¼1 g k ð1 þ sc k Þ À1 (for particular values of the constants γ k and c k ), it follows that This is unlike that for two-state models in Appendix A where the waiting time distribution was a sum of two exponentials. Also by the initial value theorem and equation (B 4), we have that f ð0Þ ¼ lim s!1 sfðsÞ ¼ 0. As well necessarily for any distribution we have that lim t→∞ f(t) = 0. Hence it follows by the behaviour of f(t) at t = 0 and t = ∞, that the positive function f(t) must achieve one or more maxima at intermediate times. Hence the waiting time distribution for the mechanistic model is non-monotonic in time t (unlike for two-state models, which have a monotonic waiting time distribution).
where 〈X〉 denotes the average number of molecules of species X. The species are numbered in the order U, U Ã , M and the reactions in the order U ! U Ã , U Ã ! U, U Ã ! U ÃÃ , U ÃÃ ! U Ã , U ÃÃ ! U Ã þ M, M ! ;. We have here also used the same conservation law as in the previous appendix C.
The time-evolution equations for the mean numbers and covariance matrix are given by equations (C 4) and (C 5) where we replace S by S M and f by f M . Setting the time derivatives to zero and solving these equations simultaneously, we find the steady-state mean and variance of mature mRNA given by 〈n 3 〉 and C 33 , respectively. The Fano factor of mature mRNA is then determined by their ratio: : Appendix E. Comparison to reduction methods using number statistics Here we compare the waiting time moment matching approach to two well-known model reduction techniques, which are: (i) matching of the moments of the number distributions, and (ii) matching of the number distributions. The first method consists of matching the first three moments of transcript number distributions of the mechanistic and two-state models. We find the steady-state mean, variance and skewness of the transcript numbers in twostate models as functions of ρ, σ b and σ u and then we equate them to the steady-state mean, variance and skewness of the transcript numbers computed for the mechanistic model (the first two moments are in appendices C and D while the third moments can be computed similarly by solving the moment equations). For a given set of parameters of the mechanistic model, we solve the resulting system of three equations (with three unknowns ρ, σ b and σ u ) numerically-this gives us the effective parameters of the two-state models. We have searched for numerical solutions throughout huge ranges of parameter space, however as shown in figure 7a-c,g-i we only find a physically meaningful solution ( positive real numbers for the parameters of the two-state models which are shown by black dots in the figure) in the region of space given by equation (3.3) (where the mechanistic and two-state models can be matched using waiting time statistics; this is the region above the black solid line in the figure). Additionally note that the moment expressions for active Pol II for both the delayed telegraph model and mechanistic model are complicated and moment matching results in transcendental equations for the parameters ρ, σ b and σ u . The effective parameters for the delayed telegraph model, close to the contour lines where Δ = 0, are relatively Table 2. Comparison of the mean and variance of n active Pol II and m mature mRNA numbers in the mechanistic model evaluated from the exact theory (appendices C and D) and delay SSA (dSSA) with 10 5 samples. (Six different parameter sets are considered.) small. Thus, conventional numerical solvers struggle to find solutions close to the boundaries (see figure 7a-c). By contrast, figure 7g-i shows that effective parameters for the telegraph model can be found for nearly all mechanistic model parameter sets within the region given by equation (3.3). This is thanks to the relative simplicity of the analytical expressions for the telegraph model (compared to the transcendental equations encountered for the delayed telegraph model). In both figure 7a-c and g-i, we use the FindRoot function with the Newton-Raphson method in Mathematica, where we start very close to the parameter prediction of the waiting time moment matching method and do 2 × 10 4 iterations with a working precision of 200. Starting with points far from the theoretical predictions, no physical solutions are possible. The second method consists of matching the transcript number distributions of the mechanistic and two-state models via maximum likelihood estimation (MLE). The results are presented in figure 7d-g and j-l. The likelihood function is given by Pðx i juÞ, ( E 1 Þ where {x i } is a set of samples (with length N s ) generated using the delay SSA of the mechanistic model with specified parameters {a, a 0 , b, b 0 , c} (each x i represents the ith sample for the transcript number), θ is some candidate set of telegraph model parameters {ρ, σ u , σ b }, and P(x i |θ) is the probability of measuring x i given a telegraph model with parameters θ.
We calculate the probability density function for a given parameter set θ using the exact solution for the delayed telegraph model [14] in figure 7d-f and the exact solution for the telegraph model [1] in figure 7j-l. Next, we minimize the negative log-likelihood function: , (E 2Þ using the adaptive differential evolution algorithm in Julia's BlackBoxOptim package to find the optimal parameters θ* of the two-state model, where Q is the set of all possible two-state model parameters (essentially amounting to a choice of parameter space bounds in BlackBoxOptim).
Using these optimal parameters we obtain the steady-state distributions of active Pol II and of mature mRNA using the exact solutions of the two-state models. Finally, we compute the Hellinger distance (h) between these distributions and the corresponding ones from the mechanistic modelthe distance is shown by the colour in figure 7d-f and j-l.
Note that in the regions where Δ > 0, h is generally smaller than in the regions where Δ < 0, i.e. the transcript number distributions found by MLE best approximate those of the mechanistic model in the region of parameter space given by equation (3.3) (where the mechanistic and two-state models can be matched using waiting time statistics). Therefore, we can conclude that waiting time moment matching agrees with this alternative model reduction method, albeit the former is much more computationally efficient and accurate than the latter.