Simplified mechanistic models of gene regulation for analysis and design

Simplified mechanistic models of gene regulation are fundamental to systems biology and essential for synthetic biology. However, conventional simplified models typically have outputs that are not directly measurable and are based on assumptions that do not often hold under experimental conditions. To resolve these issues, we propose a ‘model reduction’ methodology and simplified kinetic models of total mRNA and total protein concentration, which link measurements, models and biochemical mechanisms. The proposed approach is based on assumptions that hold generally and include typical cases in systems and synthetic biology where conventional models do not hold. We use novel assumptions regarding the ‘speed of reactions’, which are required for the methodology to be consistent with experimental data. We also apply the methodology to propose simplified models of gene regulation in the presence of multiple protein binding sites, providing both biological insights and an illustration of the generality of the methodology. Lastly, we show that modelling total protein concentration allows us to address key questions on gene regulation, such as efficiency, burden, competition and modularity.

with other protein concentrations. The full ODE model iṡ (1) where the definitions of kinetic rates and species can be found in the paper (Equation 1).

The Reduced Model and Quasi-Steady State Approximation
We next derive the reduced model using the quasi-steady state approximation. To complete this, we set the summed variables which equate to the total mRNA concentration, the total protein concentration, and the total dimer concentration, in both bounds and unbound form. Using the variables z T T together with x T T simplifies the application of singular perturbation theory to the mechanistic model, as we will see below.
We find the reduced model with the proposed slow variables of total mRNA m T T and total protein x T T , while all other variables are assumed fast variables. The proposed slow variables have dynamicsṁ and the quasi-steady state approximations together with the conserved quantities are T = (w −6 + β l2 )X T2 w 8 g T X T2 = (w −8 + β tg )C 2T w 1 Pg T = (w −1 + w 2 )C 1T (4) We include the degradation/dilution terms β l2 X L2 for operator dissociation and β lg C 2L for dimer dissociation as this more closely matches experimental measurements for dissociation [7] and reduces the error. We also include γ t R in the ribosome binding quais-steady state for consistency.
Equation (1) has multiple conserved quantities, including g T + C 1T + C 2T = g T T shown above as well as R + C 3T = R T P + C 1T = P T However, the Ribosome and Polymerase are used by other genes, and so reduced models in terms of R T and P T in this form are specific to the prototype gene regulation, whereas we wish to develop a more general methodology. Thus, we treat R = R(t) and P = P(t) as time-varying parameters. This allows us to easily substitute terms involving all bound forms of RNAp and ribosomes when modelling gene regulatory networks instead of a single inputoutput 'module' presented here. As such, we can model the effects from competition of polymerase and ribosomes. We next use the variables C 2T and C 2L to reduce the equations, as they are directly related to expression levels in the model and so act as a proxy. We have the reduced differential-algebraic equationsṁ and we have the reduced parameters (7) such that V tx T is the transcription rate per non-repressed promoter, F L is the fraction of nonrepressed promoters, B L2 , B T2 are the effective dimerisation association constants,B Lg , B Lg are the effective dimer-operator association constants, γ T , β T are the effective mRNA and protein degradation rates, A 1L , A 1T are the fraction of expressing promoters bound by polymerase, and A 2T is the fraction of RBS (Ribosome Binding Sites) bound by ribosomes, such that The parameters A 1L , A 1T , A 1T are used to simplify terms, and are not reduced parameters or kinetic rates.
We can further simplify the parameters B L2 , B T2 , B Lg , B Lg and A 2T by removing the degradation terms if they are sufficiently small ( w −8 β tg , w −6 β t2 or w 4 P + w −4 + w 5 β tg ). The effect of polymerase binding on B Lg , B Tg occurs due to competitive binding for the operator-promoter in the prototypical mechanistic model between polymerase and transcription factor. It can be noted that k tl T matches with the model used in the RBS calculator [11] for the case that w −4 w 5 + w 4 R + γ tR by neglecting degradation, assuming that only a small fraction of the ribosome binding sites are occupied and assuming that thermodynamic equilibrium occurs for the ribosome binding/unbinding.
The differential-algebraic equations are well defined as θ L1 , θ T1 are monotonically increasing functions of C 2T , C 2L for 0 ≤ C 2T < g T T , 0 ≤ C 2L < g T L and so there is one non-negative solution for each function.
The model (6) can be used directly for numerical simulations, or can be further reduced to an ODE for either analytical or numerical analysis of the model. In order to find an ODE, we need to determine the inverse of θ L1 , θ T1 so that we have . This can be carried out numerically, if the parameters are known, or analytically using various (close) approximations (SI 2 and 3).
A simulation of (6) can be seen in Figure S1, which shows that the reduced DAE model is a (close) match to the full mechanistic model, assuming that the quasi-steady state approximation holds.

Nondimensionalisation and Standard Singular Perturbation Form
To determine under which conditions the quasi-steady state approximation holds, we need to apply singular perturbation theory [9]. The first step in this application is to nondimensionalise the original model, in order to determine which parameters are large or small.
We first transform the ODE, where the transformed equations with summed variables and conserved quantities arė where y T = A 1T g T T − C 1T is introduced as y T is at a maximum steady state when C 2T is maximum, useful for scaling.  (6) for a Repressilator network [5] with a stable equilibrium point. The reduced DAE model is a close approximation of the full mechanistic model when mRNA and protein degradation are much 'slower' than transcription, translation, operator binding and dimerisation. The parameters used for the simulation are P = 1000, R = 1000, a 4 = 0.01, a −4 = 1, , and other variables set at quasi-steady state (4).
The transformations to total transcription factor, total dimer and bound dimer allow the rate of change of each fast variable to (primarily) represent the dynamics of one reaction, e.g. transcription forẏ T or dimerisation forż T T . This representation allows simplified assumptions in terms of the individual reactions.
The dynamics of z T T and C 2T can be rewritten aṡ Notation: In the following section, Y = Y nȲ is used as notation for scaling, where Y is a variable, Y n is a nondimensionalisation constant andȲ is the non-dimensionalised variable.
We nondimensionalise all variables in reference to the maximum steady state over the range of inputs, which references the maximum gene expression level. Ribosome R and Polymerase P are typically referenced against their maximum value, but the initial value can also be used. It should be noted that m T T is minimum when y T L is maximum, and so we are not nondimensionalising using a particular equilibrium point. Using this reference, we have the nondimensionalisation parameters where β is the slow time scale. For simplicity of description, we also use where X Tn is the monomeric transcription factor scaling constant, X T2n is the dimeric transcription factor scaling constant and g Tn is the free operator-promoter scaling constant.
When referencing x T T against the maximum, the scaling parameters are in implicit form, where θ −1 T1 and β are not stated explicitly. However, this approach holds as there is a unique, well defined, non-negative solution due to monotonically increasing θ T1 , θ T2 and β Tn X T Tn with respect to X T Tn . The parameters can be determined explicitly through either solving computationally, if the kinetic parameters are given, or by (close) approximation (see SI 2 and 3).
We can alternatively scale against a lower value for the total transcription factor in order to obtain a more accurate 'typical' maximum, such that where 0 ≤ S L ≤ 1 is a fraction dependent upon the typical operating range of C 2L , and thus dependent upon the entire gene regulatory network. Equivalently, we can scale against a reference regulation fraction 0 ≤ F T,re f < 1, such that where the other parameters are referenced against the new X T Tn , as per Equation (11). Nondimensionalising the ODEs results in with parameters β C2T noting that we cancel terms as the nondimensionalisation is based on the maximum steady state for each variable. We estimate the separate time-scales β C3T , β ZT , β YT , β C2T of the individual fast variables from the diagonal of the Jacobian matrix of the fast variables about the candidate slow manifold.
To place the system in standard form, we need to estimate a lower bound for the fast time-scale, which we achieve based on local analysis about the candidate slow manifold. This lower bound is required to be slower than the individual fast variables, taking any coupling into account between the variables. C 3T is uncoupled from other variables on the fast time scale, while y T ,C 2T ,z T T are coupled. From experimental observations, we can typically assume that β C2T β C1T [7], and so β C1T is effectively decoupled. By using estimates of the minimum time scale based on the eigenvalue analysis, which hold locally about the quasi-steady state, we can estimate the lowest fast time scale to be setting ε w6 , as we need β β coup for time-scale separation to occur. These coupling conditions ensure that as well as both C 2T and z T T being fast variables, all possible local transformations of the two variables are also fast. If required, we can generalise the assumptions for the case that y T ,C 2T ,z T T are on the same time scale, which we can once again achieve using local analysis. We estimate the slow time-scale using The 'slow' time scale may differ for a network compared to a 'module' analysed here, with the network typically slower. However, the network may be faster due to feedback e.g. auto regulation [1]. In order to obtain useful estimates of fast and slow time-scales, care is required in selecting total transcription factor scaling X T Tn using (11) or (13), which should be relevant to the behaviour of the entire network. A choice of X T Tn well above the concentration range relevant for regulation of expression could result in an overestimate of the speed of dimerisation or operator binding. For example, for a large X T Tn where almost all operators are occupied (resulting in C 2Ln = 0.999g T L ) and where there is an excess of free dimeric transcription factor X 2Tn , then the estimated speed of operator binding may be orders of magnitude too high compared to a lower but still typical level of operator binding (C 2Ln = 0.8g T L ∼ g T L ). Similarly, if X T Tn is too low, then there may be an unnecessary underestimate. For example, if dimerisation is (conservatively) required to be fast for all X T Tn , then the speed of dimerisation may be estimated with a lower bound β ZT ≥ w −6 + β t2 based on very low transcription factor concentrations. However, this lower bound is often likely to significantly underestimate the dimerisation time scale at concentrations relevant to regulation, which is typically when C 2Ln ∼ g T L , depending upon the allowed error.
We can place (15) in standard singular perturbation form [9] x by setting . . , ε w6 ). By symmetry, the same methodology applies for reactions involving X L , X L2 , g L X L2 , g L P, using equivalent assumptions, which can be used to derive the gene expression of x T T . From the standard form (19), we can see that ε is small and thus that Quasi-Steady State holds under the assumptions as well as β tg η To and β C2T β C1T together with β Tn , γ Tn β coup , where the last of which can be relaxed to when (20) holds. The conditions in (20) require that the sum of the forward and reverse rates for transcription, translation, operator binding and dimerisation are faster than mRNA and protein degradation. For dimerisation and operator binding, this holds if either the forward or the reverse rates are sufficiently fast compared to degradation. For transcription and translation, this condition holds if ribosome/polymerase binding, unbinding or initiation is sufficiently fast.
Equation (22) holds if (20) holds unless both the reverse rate of operator binding and the forward rate of dimerisation are relatively small, when the operator is not being fully occupied. For the case that (22) does not hold, there would be a near zero free dimer concentration at quasi-steady state, which would limit bound dimer (X T2 g T ) and monomer (X T ) reaching (quasi) equilibrium on a fast time scale. This case is typically only possible at concentrations too low to be of interest for modelling of regulation.
The derived conditions generalise the existing model reduction methodology that uses total transcription factor indirectly [8,6], where the generalisation is required to match with known experimental data. In [7], the isolated forward rate of chromosomal lacI-operator binding has an associated time-scale of ≈ 1 2 min, while the reverse rate has an associated time-scale of ≈ 10 min, noting that lacI is a dimer of dimers and that X T Tn ≈ 20 molecules/cell [4]. As γ T ≈ 5 min and β T has a time scale between 5 min and hours/days [1], then experimental results indicate that in a typical case for lacI, only the forward rate is fast enough for time-scale separation to hold generally. As the assumptions used here require either a fast forward or reverse rate, the current reduced model assumptions are consistent with experimental data for lacI [4,7].
The time-scale separation conditions may be conservative for the asymptotic cases X Tn X T Tn or X T2n X T Tn or g Tn g T T , as for these cases the local analysis may not contain a useful estimate of the time-scales. For example, if X Tn X T Tn holds and β ZT β Tn , γ Tn does not hold, but the latter holds for a hypothetical X Tn ∼ X T Tn , then X Tn typically converges to near zero sufficiently quickly such that the error in degradation rate of x T T and quasi-steady state of C 2T are small. As C 2T is a proxy variable for regulation, the quasi-steady state is a useful estimate of regulation for this case.
Equivalent time-varying equations may be obtained if R and P are time-varying parameters, such that In this case, we also require that P and R are slowly time-varying. We require dP These conditions on parameters R and P in (24) hold locally about the slow manifold, and hold on the entire domain when there is a sufficiently large number of polymerase/ribosomes such that binding has little effect on the free polymerase/ribosomes concentration.

Application of Tikhonov's Theorem
To show that the quasi-steady state approximation holds, we need to show that the conditions for a version of Tikhonov's Theorem are met on the domain of interest [9], which for this case is the set of non-negative values of each concentration below the maximum of each variable. As stated earlier, the non-dimensionalised quasi-steady state has a unique solution. The well defined nature ofC 2T in terms ofx T T on the domain of interest implies an isolated root to the quasi-steady state solution. The functions f 1 and f 2 in (19) are polynomial in terms of (x 1 , x 2 , ε) and so are sufficiently smooth. Treating R and P as time-varying parameters, function f 2 is sufficiently smooth with respect to x T T if R and P are continuously differentiable. The inverse function theorem implies that the functionC 2T = θ −1 T1 (X T Tnx T T )/C 2Tn is sufficiently smooth on the domain for the reduced problem to have a well defined solution. Finally, for the fast dynamics it can be shown that the isolated root is (locally) exponentially stable by using local eigenvalue analysis. We set If R and P are time-varying, then they are assumed slowly time-varying and we can set R = R(0) and P = P(0) for the boundary value problem. The linearisation of the scaled boundary value problem is The matrix is Hurwitz for the entire domain of interest, and so the isolated root is (locally) exponentially stable. Thus the conditions for Tikhonov's Theorem are met internally on the domain of interest [9].
For the boundary cases, the isolated root is only on the boundary of the domain for x T T = 0 or m T T = 0, on the assumption that the kinetic rates are positive constants. We note that and so the quasi-steady state approximation holds trivially for C 2T , x T T , z T T , y T on the boundary of the domain, as well as holding in the limit due to continuity. This logic similarly holds for m T T = 0. Thus for sufficiently small ε in (19) then the quasi-steady state approximation holds on the domain.

Simplified Expression Model
In this section, we simplify the model of expression for gene regulation from an implicit to an explicit form, which together with simplified degradation (SI 3), enables us to represent the reduced model as an ODE rather than a DAE (differential-algebraic equation). To derive an ODE, we need to determine the expression in terms of the total TF (Transcription Factor), which is an inverse function problem. To solve this, we partition the function into multiple cases and then use Padé approximations to approximate the inverse function. The two cases used for the partition are the monomer dominant and multimer dominant regulation, which indicate the predominant form of the regulating transcription factor at concentrations relevant for regulation. We use two methods of perturbation theory and interpolation to determine the Padé approximations and to determine the partitioning of the cases. Using interpolation is conceptually related to previous interpolation methods for Michaelis-Menten kinetic derivations [3].
To determine expression explicitly, we need to invert the function θ L1 (C 2L ), where

Perturbation Theory
We initially approximate the inverse for the multimer dominant case, where and the monomer dominant case, where before using perturbation theory to both determine when the two cases are valid, and to find more accurate approximations.
We provide the approximation in terms of the multimerisation efficiency η Lm := is the fraction of a protein in its full multimer form. There are alternative approaches to represent the approximation than using the multimerisation efficiency. However, we use the approach presented here as it is relatively simple and allows the refined approximations to have a simple biological interpretation.

Multimer Dominant Regulation
We first look at the multimer dominant case. Using z T L = η Lm x T L , we have and so We set the initial approximation where η Lm0 = 1. More formally, we can write out the non-dimensionalised algebraic equation in (28) using the scaling approximation C 2Ln = g T L (noting C 2L = C 2LnC2L ) to obtain the inverse function perturbation problem There is a boundary layer nearC 2L = 1 if 1 − A 1 = O(ε), but we are only interested in the outer solution. We transform this perturbation problem to be in terms of the multimerisation efficiency, such that

Padé Approximation -Multimer Case
We next use a Padé approximation to estimate the multimerisation efficiency, as the Padé approximation is a close approximation for a large range of ε L . In comparison, a 1 st order Taylor series is much slower to converge and predicts negative solutions for small A 2 (i.e. small x T L ), which is not biologically feasible. For the zeroth order Padé approximation, we have η Lm0 = 1, which is the 'multimeric transcription factor only' case. Using a first order Padé approximation η Lm1 = a 0 (A 2 ) 1+εa 1 (A 2 ) and matching to the Taylor series, we have Using the original scaling and variables, we have  Figure S3: Comparison of Regulation function F L from the DAE (6) and perturbation model (33) for the multimer dominant case. Parameters are g T L = 1, 1 B Lg = 1, 1 B L2 = 16 (molecules/cell) and ε L = 1. The parameters chosen give the boundary case ε L = 1, but the first order Padé approximation only has a small error from the full mechanistic model.

Monomer Dominant Regulation
We next look at the monomer dominant case. Using and so We again set but use η Lm0 = 0. We find the non-dimensionalised algebraic equation in (6) using the approximation C 2Ln = g T L to obtain the perturbation problem For leading order behaviour of order 1 asC 2L → 0, we square the perturbation problem, such thatC Solving this, we have h L0 = 1 as expected, as well as With original scaling and parameters, this becomes and we set h L1 = (1 − η Lm1 ) 2 .

Alternative to Monomer Dominant Case
We can describe the expression term in Equation 2 (main paper) in an alternative representation by using the overlapping cases of predominantly multimer and predominantly free monomer/dimer, such that where the free transcription factor fraction h L replaces the multimerisation efficiency η Lm . However, here we use the original regulation term (Equation 3 in the paper), at least for lower order approxations, for simplicity and for the ability to more easily relate the proposed regulation term to traditional models. It can be noted that modelling the total free monomer/dimer concentration is of use to determining reduced models for tetrameric TF.
For the free monomer/dimer case, we use the perturbation problem noting that ε Lu ≤ ε Ls = 1 ε L , and so (47) covers the case of monomer dominant regulation. Using overlapping perturbation problems allows a non-unique partitioning, and so we can also replace ε L with 1/ε Lu if required.
Solving, we have Using the same methodology as the multimer dominant case, the first order (unscaled) Padé approximation is

Interpolation
We also use interpolation to determine the Padé approximation and to determine the required case, such that This test determines the predominant form of transcription factor when half of the operators are occupied.
To approximate multimerisation efficiency for the multimer dominant case, we use the point C 2g = 1 2 g for a one point interpolation, such that Similarly, for the first order rational case, we interpolate the points C 2g = 0, C 2g = 1 2 g, and C 2g → g, which gives For the monomer dominant case, we use an identical zeroth order approximation to the multimer case, where However, for the first order case, we use as using h L1 = (1 − η Lm1 ) 2 results in erroneous asymptotic behaviour of F L as x T L → ∞. Here, we interpolate at C 2L = 1 2 as we are analysing a module and have no information about the network. However, we can also interpolate at or near an equilibrium point for the network, or at a typical operating point.

Simplified Degradation Model
We next simplify the model of degradation from an implicit to an explicit form, which together with SI 2 enables the model of to be written as an ODE. We split degradation into the two cases of uniform and non-uniform degradation, where we define uniform degradation as occurring when different forms of the protein have the same degradation rate. For a transcription factor (TF) in the prototypical model with uniform degradation, the degradation rates of the monomer, free and bound dimer are equal. This differs from two distinct proteins having the same degradation rate. In many cases we can assume uniform degradation of proteins in order to simplify the protein degradation term in the ODEs (e.g. dilution only). This assumption is biologically reasonable in many cases, and more generally is a useful first approximation for the non-uniform case. If non-uniform degradation is required to be modelled, then we can use equivalent approximations to those used for simplifying regulation in SI 2.

Uniform Degradation Model
The uniform degradation assumption in the mechanistic model is Biologically, this is a lumped mechanistic model, as the term is composed of both dilution and degradation. The dilution terms are uniform for all conditions. Uniform degradation can be represented by Degradation The doubled kinetic rate assumes that the protease has two sites to bind on the dimer X T2 or g T X T2 . Equations of either form lead to the same reduced, deterministic equations, and so we lump the degradation and dilution biochemical equations.
Assuming uniform degradation of proteins, we can simplify the differential equation for proteins toẋ where β T is a constant, rather than a function of x T T . Similar to proteins, mRNA degradation can be treated as uniform as a first approximation, where the degradation rate is the same with or without a bound ribosome. Using the approximation of uniform mRNA degradation simplifies the degradation rate of mRNA to a constant γ T = γ t = γ tR , which is otherwise dependent on the often time-varying ribosome concentration.
Mathematically, uniform degradation is similar to and thus named after uniform damping approximations used in power systems and oscillator networks [2].

Non-Uniform Degradation Model
If non-uniform degradation is required to be modelled, then we can use the approximation of η Tm found in an identical fashion to η Lm and R T (see SI 2), where we have We can also use interpolation of the degradation rate directly, where for a zeroth order Padé approximation with interpolation at C 2T = 1/2g T T , we have We can similarly use interpolation to find higher order approximations (see SI 2).

Model Reduction of Activator
We next find the reduced model for an activator, in a similar manner to the repressor. For the case of activation, the transcription reactions in the full mechanistic model need to be replaced by the following biochemical equation:

Degradation (Activation)
g T X T2 P β tg → g T + P Operator Binding where for simplicity we assume uniform transcription factor-operator dissociation and degradation, with or without bound polymerase, noting that this assumption is generalisable.
We use equivalent derivation and nondimensionalisation as for the repressor, only showing the nondimensionalisation for the altered equations.
We set the summed variables for the transcription factor Thus the slow variables arė and the quasi-steady state approximations together with the conserved quantities are Solving, we have the reduced differential-algebraic equationṡ where z T (y T T ) = 2y T T B Tg (g T T −y T T ) + 2y T T is a function of y T and where such that V tx T is the transcription rate per non-repressed promoter, B L2 , B T2 are the effective dimerisation association constants, and B Lg , B Tg are the effective dimer-operator association constants. We can further simplify the parameters B L2 , B T2 and B Lg , B Tg by removing β lg and β l2 if required, as per the case of the repressor (SI 1).
The model can be further simplified in an identical fashion to the repressor case in order to derive an ODE (SI 2 and 3).

Assumptions
To determine the conditions under which the reduced model holds, we again need to apply singular perturbation theory. We rewrite the differential equations which are altered from the repressor caseġ For the activator, we have the transformed differential equationṡ which can be rewritteṅ We have the nondimensionalisation which results in where where g Tn = g T T − Y T Tn and X T2n = 1 2 (Z T Tn − Y T Tn ). Using a similar singular perturbation form for the activator, we have the conditions as well as β tg η To and similarly to the repressor case, we need (22). We also need to assume that if R and P are time-varying, then they are slowly time varying, as per (24). For the case of the activator we do not need β C2T β C1T , as the dynamics of C 1T have no effect on the dynamics of C 2T . Biologically, this is due to the lack of competition between polymerase and transcription factor for DNA binding sites.

Model Reduction -Protein Only
In this section, we determine the reduced model and assumptions for the case that only protein is modelled. We can remove the need to model mRNA if protein degradation/dilution is much smaller than mRNA degradation. For this case, the reduced biochemical equations are where k L are the expression rates per gene, which is k L (x T L ) = k tl L β L V tx L F L . We can determine conditions for this model to hold be setting β = β Tn and ε w7 = β Tn γ Tn in (15), which leads to the alternative mRNA equation We can therefore replace (20) with the assumptions β Tn γ Tn , (w 4 R n + w −4 + w 5 + γ tR ), (w 1 P n + w −1 + w 2 ), (w 8 X T2n + w 8 g Tn + w −8 + β tg ), (4w 6 X Tn + w −6 + β t2 ) as well as β tg η To 4w 6 X Tn + w −6 + β t2 and β C2T β C1T together with

Model Reduction -Basal Expression
In this section, we find the reduced model for gene regulation when basal expression is included. For basal expression, we introduce the extra biochemical equations Basal Transcription (Repression) This biochemical reaction is a lumped reaction representing the different effects such that X L2 bound to g L is only partially effective in repression.
For this case, we represent the modified transcription rate in (6) as The assumptions for the singular perturbation can be found in an equivalent manner to SI 1, while TF degradation rate in the reduced models can be found in a similar manner to that described in SI 3.

Model Reduction -Inducers
In this section, we include a prototypical mechanistic model of the effect of inducers on the transcription factors. We use a relatively simple biochemical model of inducer binding to illustrate the methodology, noting that more complicated biochemical models can also be used if required. We include the extra biochemical reactions Inducer in Equation (1) in the paper. We have the total protein concentration where C I = [X L I] and C 2I = [X L2 2I]. The quasi-steady state approximations are See SI 1 for further information on A 1L and B L2 . Thus We assume that the free inducer concentration is a constant. However, the non-constant inducer case can be treated using Padé approximations as per the methodology in SI 2. Now (76) has the same form as the non-inducer case, but with altered parameters. Thus we have For the multimer dominant case, we have For the monomer dominant case, we have The degradation term for the model can be determined using a similar methodology to SI 3. The assumptions for the singular perturbation can be found in an equivalent manner to SI 1.

Model Reduction -Unbound Dimer Only
In this section, we find a simplified version of the regulation term for the multimer dominant case. We start with the proposed model's multimer dominant regulation term and assume that g T L 1 B Lg , for which the system is predominantly unbound dimer for all concentrations of x T L , ignoring monomer by using the previous assumptions. Using 1 and so we have

Model Reduction -Multiple Operators
In this section, we determine reduced models of gene regulation when additional operators are included. The effect of additional operators can be important for the multimer dominant case of regulation, while for the monomer dominant case, the effect of adding additional operators is typically smaller. To incorporate the additional operators in the reduced models, we can once again split the model into multiple cases and use perturbation theory. For the multimer dominant case, we initially analyse the three special cases of extra operators with much higher, equal and much weaker affinity, when compared to the primary operator. We then use Padé approximations to describe regulation functions for cases in between the three asymptotic cases. Here we use perturbation theory to determine Padé approximations, although we can alternatively use interpolation. The extra operator has no affect on the monomer dominant case for the initial approximation, but is included in higher order approximations through the multimerisation efficiency.
For the case of a second operator, we have the added biochemical equation along with relevant degradation term. Setting C 4L = [X L2 O L ], we have the modified total protein equation along with the quasi-steady state where B Lo = α 9 α −9 . Now we have the conserved quantity C 4L 1+B Lo X L2 . Thus we have The perturbation problem for the dimer dominant regulation is From this we can see that a system with extra operators is more likely to be multimer dominant. The assumptions for the singular perturbation can be found in an equivalent manner to SI 1, noting that the coupling condition equivalent to (17) requires three variables to be taken into account, rather than two.

Multimer Dominant Case
For the multimer dominant case, we analyse the three special cases of B Lo B Lg , B Lo ≈ B Lg and B Lo B Lg . We then use Padé approximations to represent the cases in-between these asymptotic cases, so that the reduced models can be determine for all values of B Lo .
We have the total dimer concentration In the following, we treat the multimerisation efficiency and the effect of added operators as two separate perturbation problems. This minor informality is sufficient for the zeroth and first order approximations determined here, but more care is required for higher order terms or determining error bounds if ε L and the small parameters in this section are of the same order of magnitude.

Low Affinity Extra Operator
If B Lo B Lg then we use the form For the first order approximation, we have Adding more weak operators effectively weakens the binding affinity B Lg A of the primary operator. An example of this is due to non-specific binding.
For a higher order approximation, using a Padé approximation, we have

Equal Affinity Extra Operator
For an equal affinity extra operator, we have For a zeroth order approximation, we can use Thus adding more equal affinity operators effectively increases the gene copy number in the regulation term (Ag T L ). For a first order approximation, we can use

High Affinity Extra Operator
If B Lo B Lg then we have There is a boundary layer near C 2L = 0, and we have the zeroth order outer solution It can be noted that a higher affinity extra operator effectively removes transcription factor. For a rough solution, we can use the estimate
by noting that saturation of A with increasing x T L occurs when C 2L g T L and so any error is relatively small.
For a more accurate estimate across a range of parameters, it is simpler to use the low affinity case to determine the dimer bound to second operator, using (87) with B Lo , O T L swapped with B Lg , g T L to obtain F O . From this, we obtain

Monomer Dominant Case
For the monomer dominant case, the regulation term is not affected by the extra operators for the zeroth order approximation. However, for higher order approximations there is an effect.
Using an equivalent methodology to the single operator case, we have

Regulation Efficiency
In this section, we discuss regulation efficiency, an important concept in gene regulation. We define the regulation efficiency to be the fraction of transcription factor, in monomer units, which is bound to an operator. The efficiency of regulation η Lg for the transcription factor is defined here as noting the 2 is required as there are two monomer units in a dimer, and also noting that regulatory (η Lg ) and multimerisation (η Lm ) efficiency are distinct. The efficiency of regulation η Lg for the dimer can be estimated by where F L is given in Equation 3 in the paper. For the case of multiple operators, the efficiency is defined by and the efficiency can be estimated by where F L and F O are the regulation functions for the respective promoters, found using Equation 3 in the main section, with relevant corrections as described in SI 9.