Data-driven network models for genetic circuits from time-series data with incomplete measurements

Synthetic gene networks are frequently conceptualized and visualized as static graphs. This view of biological programming stands in stark contrast to the transient nature of biomolecular interaction, which is frequently enacted by labile molecules that are often unmeasured. Thus, the network topology and dynamics of synthetic gene networks can be difficult to verify in vivo or in vitro, due to the presence of unmeasured biological states. Here we introduce the dynamical structure function as a new mesoscopic, data-driven class of models to describe gene networks with incomplete measurements of state dynamics. We develop a network reconstruction algorithm and a code base for reconstructing the dynamical structure function from data, to enable discovery and visualization of graphical relationships in a genetic circuit diagram as time-dependent functions rather than static, unknown weights. We prove a theorem showing that dynamical structure functions can provide a data-driven estimate of the size of crosstalk fluctuations from an idealized model. We illustrate this idea with numerical examples. Finally, we show how data-driven estimation of dynamical structure functions can explain failure modes in two experimentally implemented genetic circuits, a previously reported in vitro genetic circuit and a new E. coli-based transcriptional event detector.


Introduction
Synthetic gene networks fulfil diverse roles in realizing circuit logic [1] and timing in living organisms [2], ranging from single-input inverters [3,4] to combinatorial input logic gates [5,6]. Reductions in DNA synthesis and sequencing costs have made it possible to build increasingly complex genetic circuits with tens to hundreds of components. However, the ability to build novel biological circuitry often outpaces our ability to revise designs or to verify that what has been built behaves as intended. As the fields of synthetic and systems biology continue to build on circuit- and device-level successes, methods for inferring network structure from data become increasingly important. For example, if nearby nodes positively or negatively correlate, an activating or repressing relationship between two network nodes can be inferred. In [24], this framework was taken a step further, by showing that the behaviour of direct and indirect links in a benchmark circuit is network-topology dependent. This provided a means for using steady-state perturbation data [39,45,47] to estimate network models. Furthermore, these steady-state estimation algorithms have been verified using a benchmark synthetic gene circuit [43]. More recently, the authors in [42] and [40] showed that retroactivity in gene networks can paradoxically confound network predictions that are based wholly on correlation measures. The core issue is that even when measurement data for all biological states are available, causality is difficult to determine from steady-state measurement data affected by back-action or retroactivity in genetic networks [40].
At the single-cell level, the reconstruction problem for biological networks introduces challenges of inferring nonlinear stochastic models from noisy data [44,48,49]. In [48], the authors show that by comparing average abundances, molecule lifetimes, covariances and magnitude of step, they can map pairwise interaction dynamics, even when the rest of the system is completely unspecified. The key observation is that assembly stoichiometry of new molecules is fixed, so unbalanced production of linked precursor components will exacerbate imbalance further, resulting in empirically observed large fluctuations. Furthermore, Hilfinger et al. [49] showed that there are statistical invariants for certain kinds of network interactions, which can be used to evaluate and challenge existing hypotheses of stochastic gene interaction. More recently, Wang et al. showed that effective stoichiometric spaces can be used to determine network structure from the covariances of single-cell multiplex data [44]. These studies show that it is possible to infer meaningful structural information about a genetic network, even when only a portion of the network states are observed and the data are fundamentally noisy.
In this paper, we introduce a class of mesoscopic network reconstruction models with adaptable resolution, commensurate with the depth or coverage of the circuit states (or genome) available from fluorimetric, spectrometry-based, or sequencing-based measurements. Our method is distinct in that we consider the use of high-resolution time-series data, but where only partial measurement of the network's nodes is feasible. Furthermore, we consider dynamic measurements of bulk culture rather than single cells, where we benefit from the assumptions of high molecular copy number and large reaction volumes [49]. Specifically, we present the dynamical structure function, an abstract model class from linear time-invariant systems theory, and show it can be used as a generalized representation of measured interactions between biological or biochemical states. The contributions of this paper are: (1) we show how a dynamical structure function can encode both direct and crosstalk network interactions, by way of a theorem and simulated examples, (2) we develop a direct estimation algorithm and code to directly estimate the dynamical structure function, as well as visualization tools to monitor repression and activation in genetic circuits and (3) we demonstrate this theory on two experimental systems: (A) an in vitro genelet repressilator from the synthetic biology literature and (B) a novel transcriptional event detector that we build specifically to illustrate dynamical structure reconstruction.

Representing network interactions in partially measured biological networks with dynamical structure functions
In both systems and synthetic biology, discovering (or verifying) the network of an (engineered) biological system is an important problem. However, discovering an entire biochemical reaction network is typically ill-posed, since many network dynamics occur simultaneously from host or environmental context [50], loading effects [51], or unanticipated retroactivity effects [7,51–54]. Even without these effects, the reconstruction problem is equivalent to finding a unique realization of the dynamical system from direct measurements of every state of the system. Unique realization problems are difficult, unless the system of interest has specific structure, e.g. measurement functions of the state that are diffeomorphic [55,56]. On the other hand, there are many inputs that can be used to perturb the system of interest, e.g. silencing RNA [57], genetic knockouts [58] and small chemical inducers [59]. Using these inputs, it is straightforward to reconstruct the transfer function G(s) of the system [60,61], where Y(s) = G(s)U(s): Y(s) is the system's output, U(s) the system's input and s is the Laplace variable. However, the standard system transfer function G(s) only models closed-loop input-to-output dependencies. It has no direct information about how chemical species within the system are interacting with each other.
There are other kinds of transfer function models which have the potential to describe the network of interactions between chemical species, for example, the dynamical structure function [62]. The dynamical structure function is a representation derived from linear systems theory, and thus can be used to model transients of a genetic circuit around an operating point or even unstable network dynamics diverging from an equilibrium point. It is a more detailed description of network structure than the system transfer function G(s) since it models causal interactions between measured outputs, in addition to the causal dependencies of outputs on input variables. Most notably, necessary and sufficient conditions for recovery of dynamical network models have been developed and well-studied [22,62–66], but so far no open-source algorithms, code bases, or applications of this theory have been developed directly for synthetic biology.

Dynamical structure functions
Here we introduce the mathematical formulation of a dynamical structure function, formulated in the context of biological analysis. The dynamical structure function is formulated on the premise that partial, rather than full, state measurements are accessible. The measured states are denoted y ∈ R^p and the hidden or latent states are denoted x_h ∈ R^{n−p}, where n is the dimension of the whole state. We then denote the state of the dynamical system as x = (y, x_h). We let u ∈ R^m denote exogenous inputs that can be introduced to influence the dynamics of the state x. With the exception of oscillators, many biochemical reaction networks converge to a steady state. Moreover, it is generally the case that the parameters of biochemical reaction networks are time-invariant [67], so long as macroscopic experimental settings of the system such as temperature, growth media and dissolved oxygen content remain fixed. Therefore, while the model of a biochemical reaction network is of the nonlinear form

dx/dt = f(x, u),  (2.1)

we will suppose that we can linearize the system about either an equilibrium point, a nominal operating point, or even an (unstable or oscillatory) initial condition to extract network dynamics. In biological systems, networks are almost never precisely linear, but we presume to model local fluctuations or perturbations from a target point in the state space. As we will see in the following, this will be enough to extract relevant network information. Proceeding with the linearization, we can write the system in the partitioned form

dy/dt = A11 y + A12 x_h + B1 u,
dx_h/dt = A21 y + A22 x_h + B2 u.  (2.2)

We also assume the initial condition of the linearized system is x(0) = 0. Taking Laplace transforms and eliminating the hidden states via X_h(s) = (sI − A22)^{−1}(A21 Y(s) + B2 U(s)) gives sY(s) = W(s)Y(s) + V(s)U(s), with W(s) = A11 + A12(sI − A22)^{−1}A21 and V(s) = B1 + A12(sI − A22)^{−1}B2. Letting D denote the diagonal of W(s), we obtain the representation Y(s) = Q(s)Y(s) + P(s)U(s), where

Q(s) = (sI − D)^{−1}(W − D)  (2.6)

and

P(s) = (sI − D)^{−1}V,  (2.7)

and the entries of P(t) := L^{−1}(P(s)) are the corresponding time-domain kernels. Note that Q(s) is defined as Q(s) = (sI − D)^{−1}(W − D) rather than Q_alt(s) = (1/s)W(s). This guarantees that the diagonal entries of Q(s) are 0, which implies that any non-zero terms Q_ij(s) are strictly proper transfer functions and thus descriptions of causal interactions among the measured Y_i and Y_j.
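As a concrete sketch, the elimination of hidden states and the construction of Q(s) and P(s) can be evaluated numerically at any complex frequency s. The helper below is our own illustration (the matrices and function name are not from the paper's code base):

```python
import numpy as np

def dsf_at(A, B, p, s):
    """Evaluate the dynamical structure function (Q(s), P(s)) at a complex
    frequency s for the linearized system dx/dt = A x + B u, where the
    first p states are the measured outputs y."""
    n = A.shape[0]
    A11, A12 = A[:p, :p], A[:p, p:]
    A21, A22 = A[p:, :p], A[p:, p:]
    B1, B2 = B[:p, :], B[p:, :]
    # Eliminate hidden states: X_h = (sI - A22)^{-1} (A21 Y + B2 U)
    hid = np.linalg.inv(s * np.eye(n - p) - A22)
    W = A11 + A12 @ hid @ A21      # sY = W(s) Y + V(s) U
    V = B1 + A12 @ hid @ B2
    D = np.diag(np.diag(W))        # diagonal of W(s)
    lead = np.linalg.inv(s * np.eye(p) - D)
    Q = lead @ (W - D)             # zero diagonal by construction
    P = lead @ V
    return Q, P
```

By construction, (I − Q(s))^{−1}P(s) reproduces the ordinary transfer function G(s) = C(sI − A)^{−1}B with C = [I 0], which provides a useful sanity check for any implementation.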
This also means that Q(s), defined in this way, is unique and has p fewer transfer functions to identify on its diagonal. This construction of Q(s) and P(s) ultimately ensures identifiability [62] under reasonable assumptions of independent input perturbation [24,[39][40][41][42][43][44][45]. Furthermore, if Q(s) = W/s, then we would face two simultaneous challenges in estimation: (1) disentangling autoregulatory dynamics (Y i to Y i ) from pairwise interactions (Y i to Y j ) and (2) too many unknown parameters in both Q(s) and P(s). Lastly, we find that studying the pairwise interactions Q ij (s) can already elicit important functional information about a genetic network, as illustrated by the next two examples.
Based on the parameters selected above, the system yields a stable equilibrium point (x_e, u_e), which becomes the point about which we linearize this example system. The dynamical structure function for this system is derived following the procedure outlined above: first, we take Laplace transforms; second, we eliminate the hidden mRNA states of x_1, x_2 and x_3, namely m_1, m_2, m_3, which yields the network structure Q_a(s) and control structure P_a(s) matrix transfer functions. The network, with edge weight functions corresponding to the entries of Q_a(s), is drawn in figure 1b. Note that if we take s ∈ R_{>0}, the sign of the entries in Q_a(s) coincides with the form of transcriptional regulation implemented by TetR and LasR, respectively. In [71], it was shown that the sign-definite properties of entries in Q(R_{>0}) are useful for reasoning about the monotonicity of interactions between measured outputs and how fundamental limits in system performance relate to network structure.
Let us now consider Q_a(t) as defined above. We remark that y(t) = ∫_0^t Q_a(τ) y(t − τ) dτ follows from the equation Y(s) = Q_a(s)Y(s) + P_a(s)U(s) whenever u(t) ≡ 0, such that U(s) is 0. This argument holds in general for any system of the form (2.2). In particular, the entries of Q_a(t) act as convolution kernels and, taken with the integral, define an operator mapping y_j(t) to y_i(t).
[Q_a(t)]_ij also models the isolated impulse response of y_i(t) to an impulse in y_j(t), while assuming all other elements are off or set to zero. In this way, [Q_a(t)]_ij encodes the time-dependent gain of y_j(t) on y_i(t) assuming all other nodes in the network are momentarily off. In the case of our example, we can see that the network structure of this incoherent feedforward loop is dynamical, hence our usage of the term dynamical structure function to describe the network structure among the measured chemical species y(t). In this particular case, the time-domain analogue of the dynamical structure (or dynamical structure convolution kernel) Q_a(t) has non-zero entries

Q_21(t) = (5.5 × 10^{−3}) e^{−(8.3×10^{−4})t} sinh((6.0 × 10^{−5})t),
Q_31(t) = (4.1 × 10^{−7}) e^{−(4.9×10^{−3})t} sinh((4.1 × 10^{−3})t),
Q_32(t) = −(9.7 × 10^{−8}) e^{−(4.9×10^{−3})t} sinh((4.1 × 10^{−3})t).

A visualization of each of these impulse kernel functions and their corresponding location in the dynamic adjacency matrix, defined by Q_a(t), is given in figure 2. Note how the activating or repressing nature of genetic regulation is encoded by the positive or negative sign of the corresponding kernel response. In addition to uncovering the Boolean network of interactions between biological states, the dynamical network convolution kernel Q_a(t) reveals the time scales of response of each network edge, as well as the amplitude and the rate of decay of the gain from the time of impulse. Similar response profiles can be generated for step function inputs, though finite impulse inputs are typically more common in biological networks. Interestingly, the transfer function G_a(s) of the system is likewise lower triangular, reflecting the feedforward network topology in the genetic circuit. In prototyping a feedforward loop, it is important to anticipate in vivo context effects.
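To make the sign convention concrete, the three kernel entries above can be evaluated directly; a positive kernel indicates activation of y_i by y_j and a negative kernel indicates repression. This snippet simply re-implements the closed-form entries listed above:

```python
import numpy as np

# Non-zero entries of the dynamical structure convolution kernel Q_a(t)
def Q21(t):
    return 5.5e-3 * np.exp(-8.3e-4 * t) * np.sinh(6.0e-5 * t)

def Q31(t):
    return 4.1e-7 * np.exp(-4.9e-3 * t) * np.sinh(4.1e-3 * t)

def Q32(t):
    return -9.7e-8 * np.exp(-4.9e-3 * t) * np.sinh(4.1e-3 * t)

t = np.linspace(1.0, 5000.0, 500)
# Sign of each kernel encodes activation (+) or repression (-)
signs = (np.sign(Q21(t)), np.sign(Q31(t)), np.sign(Q32(t)))
```

Plotting these three curves against t reproduces the qualitative content of figure 2: each edge has its own amplitude, time scale and sign.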
We consider the same biocircuit as described in example 2.1.1, except now we specifically consider loading effects frequently neglected in the design process of synthetic biology. First, we note that each gene may be susceptible to loading effects [7]. For each gene in figure 1a, a degradation tag is added to provide tunability to the rate of degradation of the protein. Inside the cell, a protease called ClpXP targets these degradation tags and degrades the associated protein. Different tags can be incorporated to modulate the gain of the degradation process. Furthermore, these degradation tags can be subject to mutagenesis experiments as a means to modulate tunability. Tunability of degradation introduces a trade-off in performance. Since the ClpXP protease is a housekeeping protein expressed to form a common pool of proteases for all genes in the cell, there is a limit to the supply of free ClpXP protein at any instant of the cell's growth cycle. When there are too many degradation-tagged proteins [72], the overloading of the protein degradation queue can trigger unwanted effects such as stress response. More directly, the competition for scarce proteases can induce coupled dynamics, or a virtual or indirect interaction, between two genes competing for the same protease pool. Even if the genes were engineered to have no direct transcriptional or translational cross-regulation, the competition for the same protease effectively couples the protein states of both genes. Modifying the above model to account for this type of loading effect yields the system (2.9). We use the same parametric values as before and compute the dynamical structure function Q_c(s). Note that Q_c(s) is no longer lower-triangular, but fully connected. Introducing loading effects creates additional coupling between nodes in the network.
If the coupling is significant, the designed network interactions of the incoherent feedforward loop are overcome by the crosstalk network interactions [8,20,51,53,71,73,74]. Thus, the coupling that is introduced into the biochemical reaction network by loading effects is reflected in the structure of (Q c , P c )(s) (see figure 3).
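The effect can be reproduced in a toy linear model: two measured protein states with no direct regulation, coupled only through a hidden shared-protease state. The numbers below are hypothetical and chosen purely for illustration; eliminating the hidden state produces non-zero off-diagonal entries in Q(s):

```python
import numpy as np

# States: y1, y2 (measured proteins), x3 (hidden free-protease level).
# y1 and y2 do not regulate each other directly; they interact only
# through x3, mimicking competition for a common protease pool.
A = np.array([[-1.0,  0.0, -0.5],
              [ 0.0, -1.0, -0.5],
              [-0.4, -0.4, -2.0]])
s = 1.0
# Eliminate the hidden state: W(s) = A11 + A12 (sI - A22)^{-1} A21
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
W = A11 + A12 @ np.linalg.inv(s * np.eye(1) - A22) @ A21
D = np.diag(np.diag(W))
Q = np.linalg.inv(s * np.eye(2) - D) @ (W - D)
# The off-diagonal entries of Q are non-zero: a crosstalk edge appears
# between y1 and y2 even though neither gene regulates the other directly.
```

This is the mechanism by which a fully connected Q_c(s) emerges from a circuit whose designed regulation graph is sparse.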
By contrast, the input-output transfer function G_c(s) of the crosstalk system only characterizes how system outputs causally depend on inputs. When calculated, G_c(s) is also a full matrix, like Q_c(s), of sixth-order SISO transfer functions, but all structural information about how loading effects cause interference among measured system states Y(s) is mixed with the information about how outputs causally depend on inputs U(s). An identification algorithm for the entries of G_c(s) will thus be unable to quantify the size of crosstalk or interference among system states Y(s).
To what extent can the entries of (Q(s), P(s)) be used to quantify the size of crosstalk in a synthetic gene network?
The following theorem shows that the dynamical structure function can quantify crosstalk-induced deviation from idealized or 'designed' system behaviour.
Theorem 2.1. Let L denote the two-sided Laplace operator. Suppose we have a system model that incorporates the effect of crosstalk,

dx^c/dt = f^c(x^c, u),

where y^c ∈ R^p is a vector of measured states in the output, x^c_h ∈ R^{n−p} are the unmeasured states of the system, x^c = (y^c, x^c_h) is the full system state, and u ∈ R^m is a vector of system inputs. Furthermore, suppose we have an idealized system model to simulate system dynamics in the absence of crosstalk,

dx^a/dt = f^a(x^a, u),

where y^a ∈ R^p is a vector of the measured states in the output, x^a_h ∈ R^{n−p} are the unmeasured states of the system, x^a = (y^a, x^a_h) is the full system state, and u ∈ R^m is a vector of system inputs. Let (Q_c(s), P_c(s)) and (Q_a(s), P_a(s)) denote the respective dynamical structure functions calculated for each linearized system about the origin. Let z(t) = x^a(t) − x^c(t) denote the deviation of the crosstalk state from the idealized state. Then z(t) can be bounded in terms of the differences (Q_a − Q_c)(s) and (P_a − P_c)(s), and these can be estimated from input-output data (Y(s), U(s)).
Proof. The proof of this theorem is provided in the electronic supplementary material (theorem C7). ▪

There are two ways to apply this result. First, if one has an idealized or reference dynamical structure model of the circuit, then this can be compared to the dynamical structure model estimated from the data. Second, in the absence of a reference model, the dynamical structure model can be estimated directly from data to observe new edge functions or gain mismatch directly from the data-driven dynamical structure model. In the latter scenario, if the crosstalk interactions are mediated on an existing edge, it will not be possible to separate the magnitude of the crosstalk from the existing (intended) dynamics of a given active edge in the network. However, the discovered edge dynamics can be compared to the intended behaviour, e.g. intentional activation or repression over a certain growth phase of the cells.
When the assumptions of theorem 2.1 are satisfied, the entries Q^c_ij(s) can be used to quantify the size of the crosstalk in the system. Example 1 in the electronic supplementary material shows how the H_2 gains of entries in Q^c_ij(s) quantify crosstalk due to enzyme loading. Furthermore, we can compute Q^c_ij(t) = L^{−1}(Q^c_ij(s)), which gives a simulated response model of how crosstalk gain fluctuates over time to affect Y_i(t) in response to an impulse in Y_j(t). This response model can also be used to estimate the relevant time scales where crosstalk gain is as large as the gain of designed interactions between circuit components.
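The H_2 gain of a scalar entry Q^c_ij(s) can be approximated from its impulse response. The sketch below (the transfer-function coefficients in the usage note are placeholders, not entries from the paper's models) integrates the impulse-response energy numerically:

```python
import numpy as np
from scipy import signal

def h2_gain(num, den, t_max=50.0, n_pts=20001):
    """Approximate the H2 norm of a stable, strictly proper SISO transfer
    function as the square root of its impulse-response energy."""
    t = np.linspace(0.0, t_max, n_pts)
    _, h = signal.impulse((num, den), T=t)
    dt = np.diff(t)
    # Trapezoid rule on h(t)^2
    energy = np.sum(0.5 * (h[1:] ** 2 + h[:-1] ** 2) * dt)
    return np.sqrt(energy)
```

For example, for 1/(s + 1) the impulse response is e^{−t}, so the H_2 norm is sqrt(1/2) ≈ 0.707; `h2_gain([1.0], [1.0, 1.0])` recovers this to within numerical quadrature error.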
Knowledge of the active crosstalk in a genetic circuit can be useful for revising closed-loop design or designing biological controllers to compensate for crosstalk effects. As discussed above, discovering the network model of a complex biological network can be useful for design, verification, analysis or control. In the next section, we formally introduce the theoretical conditions under which identification of the dynamical structure function is possible, as well as the formal problem of network reconstruction.

Direct estimation algorithm for dynamical structure functions
In this section, we introduce a direct estimation algorithm for the dynamical structure function. The dynamical structure function is a tuple of matrix transfer functions (Q(s), P(s)) and can be directly estimated from experimental or simulation data, so long as the data and the conditions of the experiment satisfy the assumptions of the following theorem, originally shown in [62].

Theorem 3.1 ([62]). Given the transfer function G(s) of a system of the form (2.2), the dynamical structure function (Q(s), P(s)) can be identified if and only if p − 1 elements in each column of [Q(s) P(s)]* are known a priori.

Proof. See the proof of theorem 2 in section IIIB of [62]. ▪

It follows from this theorem that, without additional structural information about the columns of the matrix [Q(s) P(s)]*, it is not possible to identify (Q(s), P(s)). In synthetic gene circuits with complex internal network interactions encoded by Q(s), we can still solve for the structure and parameters of Q(s), as long as p − 1 elements of each column of P(s) are known. For example, targeted gene knockdowns (CRISPRi) or knockouts (engineered genomic mutations) can be engineered so that P(s) is a diagonal matrix transfer function. This guarantees that p − 1 entries of each column of P(s) are known (they are zero), which satisfies the conditions of theorem 3.1. In this set-up, all genes or biological states in the network of interest are (A) monitored by some measurement channel over time and (B) independently perturbed by a diagonal element in P(s). These are necessary and sufficient conditions for reconstruction of Q(s) and P(s) and corroborate the general network reconstruction conditions developed for full-state-measurement systems [24,39–45].
Under the above premises, the task is to estimate the diagonal transfer function entries of P(s) and all off-diagonal entries of Q(s). Recall from the derivation in §2.1 that the diagonal entries of Q(s) are set to 0 by subtracting the diagonal D(s) of the precursor transfer function matrix W(s) from W(s). Furthermore, left-multiplying W(s) − D(s) by (sI − D)^{−1} guarantees that the representation Y(s) = Q(s)Y(s) + P(s)U(s) yields a unique Q(s) and P(s). The entries of Q(s) and P(s) are all strictly proper rational transfer functions and thus encode causal dynamics.

Problem 3.2 (Network reconstruction and estimation).
Given output measurements Y(s) and input measurements U(s) for a system (2.1) satisfying the identifiability conditions described in theorem 3.1, the network reconstruction problem is to find the dynamical structure function (Q(s), P(s)), as defined in (2.6) and (2.7), that satisfies the equation Y(s) = Q(s)Y(s) + P(s)U(s). The network estimation problem is to find an estimated dynamical structure function (Q̂(s), P̂(s)) that solves the minimization problem

min_{Q̂,P̂} ‖Y(s) − Q̂(s)Y(s) − P̂(s)U(s)‖.

Figure 3. Dynamical structure functions describe how network structure evolves over time (and as a function of frequency): the time-lapse response of the dynamical structure convolution kernel Q_c(t) = L^{−1}(Q_c(s)) for the incoherent feedforward loop in system (2.9). By examining the functional response of each entry in Q_c(t) (or Q_c(s)), we see that the network structure of the incoherent feedforward loop in example 2.1.1 is a time-evolving, or dynamic, entity.

In practice, there are two approaches to solve for Q(s) and P(s). The first is to estimate the transfer function G(s) using a standard transfer function estimation routine, followed by inversion of G(s) and calculation of the entries P_ii(s) and subsequently the entries of Q(s). This approach has the drawback of relying on inversion of the matrix transfer function G(s), which often requires symbolic inversion and is thus prone to numerical instability and scaling issues for larger networks.
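At a single frequency, the first approach amounts to a matrix inversion. Assuming P(s) is diagonal and Q(s) has a zero diagonal, G = (I − Q)^{−1}P implies G^{−1} = P^{−1}(I − Q), so diag(G^{−1}) = P^{−1} and Q = I − P G^{−1}. A minimal numerical sketch with a synthetic (Q, P), not taken from the paper's data:

```python
import numpy as np

def dsf_from_G(G):
    """Recover (Q, P) from G = (I - Q)^{-1} P at one frequency, assuming
    P is diagonal and Q has a zero diagonal."""
    H = np.linalg.inv(G)                    # H = P^{-1} (I - Q)
    P = np.linalg.inv(np.diag(np.diag(H)))  # diag(H) = P^{-1}
    Q = np.eye(G.shape[0]) - P @ H
    return Q, P

# Round-trip check on a synthetic example
Q0 = np.array([[0.0, 0.3], [0.2, 0.0]])
P0 = np.diag([1.5, 0.7])
G = np.linalg.inv(np.eye(2) - Q0) @ P0
Q, P = dsf_from_G(G)
```

In practice this route requires inverting an estimated G(s) across frequencies (or symbolically), which is exactly the numerical fragility noted above.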
The second approach, which we propose here and develop code for, is to identify the dynamical structure function directly from data by writing the model estimation problem in normal form. First, we denote the discrete-time approximations of Q̂(s) and P̂(s) as Q̂(z), P̂(z), where z denotes the z-transform variable. Q̂(z), P̂(z) follow the same identifiability conditions [62].
Specifically, we have that

Y(z) = Q̂(z)Y(z) + P̂(z)U(z),  (3.1)

and, given that Q̂(z) and P̂(z) share the same denominator, we can multiply both sides by the characteristic polynomial of Q̂(z) to obtain

(z^{−n_d} + a_1 z^{−n_d+1} + ··· + a_{n_d}) Y(z) = Q̂_num(z)Y(z) + P̂_num(z)U(z).  (3.2)

Here, Q̂_num(z) and P̂_num(z) are matrices that contain the numerator entries of each transfer function in Q̂(z) and P̂(z). The known quantity z^{−n_d}Y(z) can be expressed, after inverse z-transforming, in terms of time-shifted output samples, where Y_{t−m_d:t} denotes a matrix containing time-shifted entries of y[t]. This equation can also be stacked for multiple time traces y_1[t], y_2[t], ..., y_{N_exp}[t] collected from N_exp different conditions or experimental replicates, or by simply staggering the time horizon, to obtain the stacked normal-form equations relating B and X through Θ, where B is a stacked matrix of b_1, b_2, ..., b_{N_exp}, X is a stacked matrix of v_1, v_2, ..., v_{N_exp} of time-series data and Θ contains the coefficients determining the characteristic polynomial and the numerators of the dynamical structure function. Once these estimates are obtained, a standard discrete-to-continuous transformation can be used to estimate Q(s) and P(s). The steps of our algorithm are summarized in algorithm 1 and the full MATLAB code is provided at the GitHub repository https://github.com/YeungRepo/NetRecon.
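A stripped-down, single-output version of this normal-form regression can be written as an ordinary least-squares problem. The lag structure and helper names below are our own simplification for illustration, not the released MATLAB implementation of algorithm 1:

```python
import numpy as np

def lagged_regressors(signals, n_d):
    """Stack n_d lagged copies of each signal into a regressor matrix,
    aligned with the targets y[n_d], y[n_d + 1], ..."""
    T = len(signals[0])
    cols = [sig[n_d - k: T - k] for sig in signals for k in range(1, n_d + 1)]
    return np.column_stack(cols)

def estimate_row(y, other_outputs, inputs, n_d):
    """Least-squares estimate of one row of the normal-form equations:
    y_i[t] is regressed on lags of the other outputs (Q numerators),
    the inputs (P numerators) and itself (characteristic polynomial)."""
    X = lagged_regressors(list(other_outputs) + list(inputs) + [y], n_d)
    theta, *_ = np.linalg.lstsq(X, y[n_d:], rcond=None)
    return theta
```

On noiseless data generated by y[t] = 0.5 y[t−1] + 0.2 u[t−1] with n_d = 1, the routine recovers the coefficients (0.2 for the input lag, 0.5 for the autoregressive lag) up to floating-point precision.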
A couple of remarks are in order. First, there are two distinct hyperparameters that require optimization in generating the estimates of Q(s) and P(s): (1) n_d, the order of the characteristic polynomial, and (2) h_max, the number of subsampled timepoints selected from given time-series traces. The optimal value of h_max depends on the dataset; it must be chosen to ensure that the condition number of the matrix X is small. In high-resolution time-series measurements, a small or short subsampled horizon h_max may produce virtually identical data if the transient has a slow rate of change, which can result in ill-conditioning of X. We optimize n_d and h_max to minimize the n-step L1 prediction error across all N_exp experimental samples. In practice, we observe that this criterion, by necessity, guarantees an appropriate selection of h_max as well as a lower optimal n_d value. In general, optimization of n_d is non-convex, as it corresponds to an estimate of the model order.
In the examples below, we leverage knowledge about the system to inform an initial estimate for n d and perform a local optimization. A generalized specification of n d , following the standard calculation of the rank of a Hankel matrix, but generalized for dynamical structure functions, is the subject of future work.
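The ill-conditioning remark can be checked directly: for a slowly varying trace, finely sampled lagged regressors are nearly collinear, and coarser subsampling reduces the condition number of X. The signal below is illustrative, not experimental data:

```python
import numpy as np

t = np.arange(0.0, 10000.0)
y = np.sin(1e-3 * t)  # slow transient sampled at high resolution

def cond_of_lagged(y, step):
    """Condition number of a two-column lagged regressor matrix built
    from the trace subsampled with the given step."""
    ys = y[::step]
    X = np.column_stack([ys[:-1], ys[1:]])
    return np.linalg.cond(X)

# Coarser subsampling decorrelates the lagged columns
fine, coarse = cond_of_lagged(y, 1), cond_of_lagged(y, 200)
```

Here `fine` is several orders of magnitude larger than `coarse`, which is the numerical motivation for tuning h_max.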
Secondly, at face value, the routine described above may appear similar to a classic transfer function estimation routine, analogous to the tools developed in [60] for MATLAB. The primary difference between algorithm 1 and a standard transfer function estimation procedure for [Q(z) P(z)] is that we must impose structural information about Q(z) and P(z) on the estimation process. This results in a structured system identification problem, which typically is non-convex and difficult to initialize properly. Specifically, the diagonal entries of the estimated Q(z) and the off-diagonal entries of P(z) must be exactly 0, as per the premises of [62]. Again, these structural constraints guarantee a unique representation of the dynamical structure function and identifiability of the model. When using standard transfer function estimation tools (in MATLAB's System Identification Toolbox), we found repeatedly over thousands of numerical trials that imposing structural constraints as model priors sometimes resulted in (A) models with extremely poor n-step prediction capacity (forecasting) or (B) stable transfer function models that were unable to capture transient (divergent) dynamics.
If the dynamics of the system appear to behave like a linear, stable system, MATLAB system identification routines can be adapted to perform reconstruction (e.g. the experimental data for the in vitro transcriptional event detector). However, for many in vivo gene networks, dynamics are nonlinear or respond to perturbations with transients that are unlike asymptotically stable linear dynamics. The latter observation violates the premise of many transfer function estimation routines in the MATLAB System Identification Toolbox. This motivated the development of a direct estimation algorithm that mirrors the standard estimation of a discrete-time transfer function model, but where the structural constraints on Q(z) and P(z) are directly encoded in the formulation of the normal-form equations (lines 19-28, 32 and 36 of algorithm 1).

The dynamical structure of an in vitro genelet repressilator
We now illustrate the process of data-driven estimation of dynamical structure functions, using experimental data.
In this section, we take as a first test case the synthetic genelet repressilator developed by Kim & Winfree [75]. The genelet repressilator consists of three DNA switches that repress one another through indirect sequestration. Specifically, each DNA switch transcribes its mRNA product only when its activator strand binds to complete its T7 RNA polymerase promoter sequence. The RNA product produced from each DNA switch, in turn, acts as an inhibitor to the downstream switch by binding to the downstream switch's DNA activator molecule. Thus, by sequestering the DNA activator from completing the T7 RNA polymerase promoter region, the mRNA product of the upstream switch inhibits activation of the downstream switch. Figure 4a shows the mechanistic design of the genelet switch.

Figure 4. Network representations of a synthetic genelet repressilator. (a) A reaction network using push-arrow reaction notation of the synthetic genelet repressilator. (b) A diagram representing the reaction dynamics in (a) as state dependencies from a nonlinear ODE model in [75]. (c) The dynamical structure of the repressilator (without inputs), with nodes representing measured chemical species and edge weights corresponding to entries in Q(s).
The genelet switch relies heavily on RNase H to degrade any activator-mRNA inhibitor complexes. Without degradation, the binding of activator to mRNA inhibitor is much faster than unbinding and so sequestration is effectively irreversible. Thus, in order for the repressilator to function properly, RNase H must degrade its target substrates sufficiently fast. If RNase H is saturated with high levels of a particular substrate, this slows the degradation of other substrates, creating a crosstalk interaction between competing DNA-RNA complexes.
By performing network reconstruction on the genelet repressilator, we can determine how much crosstalk exists in the biocircuit. Furthermore, we can validate our dynamical structure reconstruction algorithm in an in vitro setting, by deliberately attenuating one of the components to create a gain imbalance. We can then check whether the reconstruction process recovers the deliberate imbalance we introduce into the genelet repressilator, even when merely measuring local perturbations about an operating point of a normally oscillatory circuit.
To reconstruct Q(s) and P(s), we performed a single experiment with three perturbations applied in series [24,39–45,62]. To perturb each switch, we pipetted a small perturbative concentration of DNA inhibitor (a DNA analogue of the RNA inhibitor). Since DNA is not degradable by RNase H in a T7 expression system, it effectively acts as a step input: it binds to the DNA activator and does not degrade. In this way, our perturbation design ensures sufficiency of excitation and independent perturbation of each activator (and downstream switch), thereby satisfying the identifiability conditions in [62] and the persistence-of-excitation conditions described in [60]. Furthermore, we attenuated the concentration of the third switch T31 by 20%, to create a deliberate gain imbalance for evaluating our reconstruction algorithm.
A detailed model of the repressilator can be found in the supplement of [75]. Since the derivation is lengthy, it suffices to write the idealized dynamical structure function Q^a(s) of this system, corresponding to the detailed model provided in [75, supplementary §1.6]. The structure is obtained by linearizing the system, transforming into the Laplace domain, and eliminating hidden variables; the resulting Q^a(s) is nonzero only on the designed edges Q^a_21(s), Q^a_32(s) and Q^a_13(s), reflecting the cyclic structure of the system. This represents an idealized model of the system. As stated in theorem 2.1, for every entry where Q^a_ij(s) ≡ 0, a nonzero corresponding entry in Q(s) estimated directly from experimental data indicates a crosstalk interaction present in the network. Here Q(s) is used to denote Q^c(s), the dynamical structure function estimated directly from data.
The experimental data used to fit Q(s) and P(s) are plotted in figure 5, along with their respective fits. For each row i of Q(s), we use Y_j, j ≠ i, and U_i as inputs and Y_i as the output of a direct multi-input, single-output transfer function estimation problem. The impulse response of the convolution kernel Q(t) of the reconstructed Q(s) is plotted in figure 6.
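The per-row estimation described above can be sketched as a discrete-time ARX least-squares fit, in which each measured output is regressed on lagged values of itself, the other outputs and the input. This is a minimal illustration with simulated data; the function and variable names are ours, and this is not the estimation code used for the paper's figures.

```python
import numpy as np

def fit_row_arx(Yi, Yothers, U, na=2, nb=2):
    """Least-squares ARX fit for one row of the network model: output Y_i
    regressed on its own past and the past of the other measured outputs
    Y_j and the input U_i (a sketch of per-row MISO estimation)."""
    N = len(Yi)
    lag = max(na, nb)
    Phi, target = [], []
    for k in range(lag, N):
        row = [-Yi[k - j] for j in range(1, na + 1)]
        for sig in list(Yothers) + [U]:
            row += [sig[k - j] for j in range(1, nb + 1)]
        Phi.append(row)
        target.append(Yi[k])
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(target), rcond=None)
    return theta

# Sanity check on data generated from a known one-lag model.
rng = np.random.default_rng(0)
u = rng.standard_normal(300)
y2 = rng.standard_normal(300)
y1 = np.zeros(300)
for k in range(1, 300):
    y1[k] = 0.5 * y1[k - 1] + 0.3 * y2[k - 1] + 0.2 * u[k - 1]
theta = fit_row_arx(y1, [y2], u, na=1, nb=1)  # expect [-0.5, 0.3, 0.2]
```

Because the synthetic data here are noise-free, the least-squares fit recovers the generating coefficients essentially exactly; with real fluorescence data, the fit quality should be checked with one-step prediction accuracy, as done in the text.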
If we compute the corresponding H∞ gain of each entry Q_ij(s) and scale by the maximum gain, we see significant crosstalk on the edge Q_23(s) and minor crosstalk on the entries Q_31(s) and Q_12(s). This crosstalk need not occur simultaneously, since the H∞ gain is the worst-case or maximum gain over all possible frequencies.
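The gain computation described here can be approximated numerically by taking the peak magnitude of each entry's frequency response over a dense grid and scaling by the maximum. The transfer-function entries below are invented first-order placeholders, not the fitted values from our experiments.

```python
import numpy as np
from scipy import signal

def hinf_gain(num, den, wmin=1e-3, wmax=1e3, npts=2000):
    """Approximate the H-infinity gain of a stable SISO transfer function
    as the peak magnitude of its frequency response on a log-spaced grid."""
    w = np.logspace(np.log10(wmin), np.log10(wmax), npts)
    _, H = signal.freqresp(signal.TransferFunction(num, den), w=w)
    return float(np.max(np.abs(H)))

# Hypothetical entries standing in for estimated edges Q_ij(s).
entries = {
    "Q32 (designed)":  ([1.0], [1.0, 0.5]),   # DC gain 2.0
    "Q23 (crosstalk)": ([1.2], [1.0, 0.4]),   # DC gain 3.0
    "Q12 (crosstalk)": ([0.1], [1.0, 1.0]),   # DC gain 0.1
}
gains = {name: hinf_gain(num, den) for name, (num, den) in entries.items()}
gmax = max(gains.values())
scaled = {name: g / gmax for name, g in gains.items()}
# With these illustrative numbers the crosstalk edge Q23 dominates
# (scaled gain 1.0), mirroring the imbalance discussed in the text.
```

For first-order entries the H∞ gain is attained at zero frequency, so the grid approximation is accurate as long as the lower frequency bound is well below the slowest pole.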
With the exception of Q_23(s), all crosstalk entries have strictly smaller H∞ gain than the designed edges. Examining the impulse response of the convolution kernel confirms these observations; the crosstalk edge Q_23(t) has a larger impulse response than the designed edge Q_32(t). Each output response is normalized by the maximum signal gain achieved, using the technique described in [75], in arbitrary fluorescent units (a.f.u.). As intended in the design of the experiment, our estimated network model shows a gain imbalance among the designed edges Q_32(s), Q_13(s) and Q_21(s). It is well known that in order for a repressilator to oscillate stably [75], it needs approximately the same gain along each edge in the network. This example verifies that our reconstruction algorithm can identify important functional dynamics of a genetic circuit, which is especially valuable for debugging. The linearization scheme is valid so long as we model fluctuations in dynamics about a nominal initial condition, even if that initial condition is not stable or leads to oscillatory dynamics. Our results here illustrate how linearized models can provide insight into local dynamics. In this simple, controlled dataset, we know we can increase the gain of the edge Q_32(s) by adjusting the binding affinity of the activator DNA for its inhibitor RNA, or by increasing the concentration of the corresponding downstream switch T31. This design insight may not be obvious from direct examination of the experimental trajectories of each switch in figure 5. As long as we have an idealized network model Q^a(s), we can measure its deviation from the network model Q^c(s) identified from data and pinpoint edges or nodes in our network that need tuning.

The dynamical structure of an in vivo transcriptional event detector
We now introduce a new transcriptional event detector circuit, designed and built to illustrate the use of our dynamical structure estimation algorithm for in vivo circuit design. Event detectors are useful because of their ability to perform temporal logic. Temporal logic decisions enable applications such as programmed differentiation, where the goal is to perform some operation based on combinatorial and temporal sequences of events that dictate cell fate. So far there are two demonstrations of temporal logic gates: (1) a temporal logic gate that differentiates the start times of two chemical inputs [76] and (2) a molecular counter that counts the number of sequential pulses of inducers [77]. Both event detectors use serine integrases to perform irreversible recombination, while [77] also demonstrates transcription-based event detection to perform event counting. The advantage of an integrase-based approach is the persistent nature of DNA-based memory. At the same time, the drawback of integrase-based event detection is that it is limited to one-time use.
By contrast, transcription-based event detectors use proteins instead of DNA to encode a memory state [77,78]. The advantage of a transcription-based event detector is that proteins are labile: they are diluted through cell growth or can be tagged for degradation. Thus, a transcriptional event detector's memory state can be reset after some period of time. On the other hand, maintaining protein state over multiple generations is metabolically expensive [51], and the dynamics of the circuit can become sensitive to the production and growth phase of the cells. Therefore, a transcription-based event detector biocircuit must be designed with precise timing, balanced production rates, and carefully tuned gain for each transcriptional regulator. This presents a suitable application for our network reconstruction algorithm.

Designing a transcriptional event detector
We designed our transcriptional event detector with two constitutively expressed relay genes, AraC and LasR, and an internal toggle switch. The two relay genes transmit the arrival of two distinct induction events (arabinose and HSL) to the relay output promoters pBAD and pLas, respectively, each of which drives production of a fluorescent readout. To record these induction events historically, the output of each relay gene is coupled to one of two combinatorial promoters (pBAD-Lac or pLas-Tet) in a toggle switch. Each combinatorial promoter implements NIMPLY logic, e.g. pBAD-Lac (pLas-Tet) expresses TetR (LacI) only when arabinose (HSL) and AraC (LasR) are present and LacI (TetR) is absent. Thus, when one analyte (e.g. arabinose) arrives, it latches the toggle switch only if the toggle switch is unlatched to begin with or the prior latching protein state has been diluted out. The relay outputs thus transmit the current or recent induction event state while the toggle switch maintains the historical induction event state.
Depending on the order of arrival of each inducer, we obtain different biocircuit states. Figure 7 details the genetic elements in the event detector biocircuit and the designed component interaction network.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210413

We can write down an idealized model for the event detector (assuming no crosstalk), with first-order degradation and production and with Hill functions encoding the NIMPLY logic of each promoter in the memory module. Here, for each input u_i, k_l is the leaky catalytic transcription rate, k_i is the catalytic transcription rate for m_i, and u_1, u_2 are arabinose and HSL, respectively. Again, the dynamical structure function for this system is calculated by linearizing the system about a nominal initial condition (x_0, m_0), taking a Laplace transform and solving out the hidden variables m_1, …, m_4. We present a simplified case here, assuming algebraic symmetry of the parameters, k_i = k, ρ_i = ρ, k_M,i = k_M, as this does not qualitatively change the structure of (Q(s), P(s)). We obtain P_ii(s) = ρ/((δ_m + s)(δ_p + s)) for i = 1, 2, and off-diagonal entries Q_12(s) and Q_21(s) that, in the absence of protein degradation, can be approximated with first-order SISO transfer functions. These expressions for Q(s) and P(s) give the idealized dynamical structure function of the alternative system. Notice that Q_12(s) and Q_21(s) are strictly negative transfer functions, indicating the repression present in an idealized simulation of the event detector circuit. This is the intended dynamical network structure of the event detector, in the absence of all genetic crosstalk or context effects. Depending on the abundance of transcription factors such as LacI, TetR and AraC, as well as commonly shared transcriptional and translational proteins, the actual dynamical structure function Q^c(s) may not exhibit monotonic repression or may even unveil unwanted interactions. We can investigate these interactions under a range of conditions with dynamical structure estimation.
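The idealized memory-module dynamics described above can be simulated directly. The following is a minimal numerical sketch in which each hybrid promoter is given Hill-type NIMPLY kinetics (activated by its inducer, repressed by the opposing repressor) with first-order mRNA and protein turnover; all parameter values are illustrative choices, not fitted constants.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters: catalytic rate k, leak kl, Michaelis constant
# kM, translation rate rho, mRNA/protein degradation dm, dp.
k, kl, kM, rho, dm, dp = 10.0, 0.1, 1.0, 5.0, 1.0, 0.2

def nimply(u, x_opp):
    """Hill-type NIMPLY promoter activity: induced by u, shut off by the
    opposing repressor x_opp."""
    return (kl + k * u / (kM + u)) * kM / (kM + x_opp)

def rhs(t, z, u1, u2):
    m1, m2, x1, x2 = z  # mRNAs and repressor proteins (TetR, LacI roles)
    dm1 = nimply(u1, x2) - dm * m1
    dm2 = nimply(u2, x1) - dm * m2
    dx1 = rho * m1 - dp * x1
    dx2 = rho * m2 - dp * x2
    return [dm1, dm2, dx1, dx2]

# Arabinose-like input first (u1 = 1, u2 = 0): the toggle latches x1 high.
sol = solve_ivp(rhs, (0.0, 50.0), [0.0, 0.0, 0.0, 0.0],
                args=(1.0, 0.0), rtol=1e-8)
x1_final, x2_final = sol.y[2, -1], sol.y[3, -1]
```

Under these assumed parameters the induced branch settles at a high repressor level while the opposing branch is held near its leak level, reproducing the intended latching behaviour of the idealized (crosstalk-free) model.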
We constructed a biological implementation of the event detector, using the design specified in figure 7. The logical components containing the relays and the memory module were encoded onto a plasmid vector with a kanamycin resistance marker and a ColE1 (high copy) replication origin. The fluorescent reporter elements with the relay promoters and readouts for the toggle switch were encoded on a plasmid vector with chloramphenicol resistance and the p15 replication origin.

Figure 6. Impulse response of the estimated convolution kernel Q(t) matrix. Q(s) is estimated directly from experimental data, transformed into the frequency domain, and simulated in time for t = 0 to t = 300 min. The x-axis is plotted in log scale, to visualize fast, transient edge dynamics that occur upon impulse stimulation of a given edge.

Event detector latching experiments
We evaluated the performance of our transcriptional event detector circuit using a temporal logic test. A standard temporal logic experiment for any two-input event detector is to evaluate the effect of varying the order of presentation of the two input signals. In one test, we present the first input, arabinose, for 7.5 h, followed by induction of the second input, a homoserine lactone (HSL) quorum sensing molecule, to activate the pLas-Tet promoter. In the second test, we swap the order of the inputs, presenting the HSL quorum sensing molecule to the event detector for 7.5 h, then presenting the arabinose inducer as a second input. Both tests evaluate the ability of the memory module of the event detector to latch in the correct state in response to the first input, followed by a challenge to ignore the second input signal while the relays detect and read out the second input signal. The data for both of these in vivo tests are plotted in figure 8b,c. The event detector showed the correct latching response in all tests at standard maximum induction concentrations of arabinose (1 mM) and working induction concentrations of 1 μM HSL. For example, figure 8c shows that when the event detector is given arabinose followed by HSL, it generates the correct fluorescent response of YFP, with lower expression levels of RFP. Conversely, when we add HSL first, followed by arabinose, the RFP signal ramps up immediately, beginning as early as 1-2 h after induction, while YFP expression is reduced to background levels.
We tested a variety of combinations of high and low concentrations of arabinose and HSL. When the concentration of HSL was decreased to 1 nM, we observed consistent leaks in the memory module in either the YFP channel or the RFP channel. Decreasing arabinose to 1 μM still allows latching of high YFP expression, but in the presence of 1 μM HSL, any arabinose latching is reversed by HSL induction (data not plotted). Conversely, when we attenuate HSL induction to 1 nM, HSL does not prevent arabinose from reversing an HSL latch on the memory module; see figure 8b. This leak is significant enough at the 1 nM HSL induction level that the difference in signal between the arabinose-HSL induction scenario and the HSL-arabinose induction scenario vanishes. This temporal logic response profile is evidence of a glitch in the event detector circuit at lower HSL and arabinose concentrations.

Network reconstruction experiments to debug circuit failure
We conducted four in vivo network reconstruction experiments (two inducers, each at two concentrations), recording time-series data of the memory module relay elements, YFP and RFP. The memory module is designed using two hybrid promoters, so from a design standpoint, verification of the memory module was most critical. The arabinose inducer targets the pAra-Lac promoter, while the HSL inducer targets the pLas-Tet promoter (see electronic supplementary material for sequences).
As shown in the model (5.1) of the event detector, the actual event detector we constructed exhibits a nonlinear response. However, within any one concentration regime, e.g. at a fixed arabinose or HSL concentration, the system behaves similarly to a linear system. Thus, we estimated a dynamical structure function for each condition of the reconstruction experiment. The one-step accuracies in fitting dynamical structure models to the low gain condition (1 μM arabinose, 1 nM HSL) and the high gain condition (1 mM arabinose, 1 μM HSL) were 99.996% and 99.995%, respectively.
As in the case of the genelet repressilator, we can plot a dynamical network graph for the in vivo event detector to understand how the memory module components labelled by YFP and RFP, representing TetR and LacI respectively, interact with each other. A movie visualizing the dynamics of the edges of the graph is available for download (see electronic supplementary material). Each edge represents the convolution kernel response of that edge to an impulse applied to its input. All responses are superimposed to form a dynamical graph. Snapshots of the graph are plotted in figure 10, while time-lapse responses of the weights of each edge are plotted in figure 9. Again, as with the repressilator, the regulatory nature of each edge in the event detector's memory module manifests as a negative or positive edge weight, indicating repression or activation, respectively.
The reconstructed network of our transcriptional event detector reveals the functional relationships between states in the circuit at different concentration regimes. At lower concentrations of arabinose and HSL, the reconstructed network reveals the functional cause of failed circuit latching: neither edge in the memory module represses its target promoter as intended, while the pLas-Tet promoter appears to enact a much higher gain of activated expression from HSL induction than the pAra-Lac promoter does in response to arabinose.
In the high gain setting, where arabinose is induced at 1 mM and HSL at 1 μM, the memory module exhibits the proper mutually repressing motif characteristic of the genetic toggle switch after the arrival of the HSL inducer. The gain of the repression on both edges increases as t approaches 4 h, which is roughly when production of the RFP signal plateaus in figure 8c. From our reconstructed model, we can see that the edges are well balanced at the higher concentrations of inducers. At the low inducer concentrations, the network is completely inactive, even though the genetic sequence of the circuit is the same. This example shows that our network verification algorithm can be used to determine the conditions, or performance envelope, under which the circuit functions properly. Even though the underlying model of our system is a linear approximation to a nonlinear system, we can test the system at multiple initial conditions, operating points, or equilibria, to quantify the network behaviour of the system locally. Taken in aggregate, these local models can provide a broader characterization of the circuit's network behaviour.

Conclusion
The dynamical structure function models the dependencies among measured states. It is a flexible representation of network structure that naturally adapts to the constraints imposed by experimental measurement. Since identifiability conditions of the dynamical structure function have been well characterized, appropriate experimental design can ensure that the process of network reconstruction produces a sensible answer.
In this work, we introduced a network reconstruction algorithm and a code base for reconstructing the dynamical structure function from data, to enable discovery and visualization of graphical relationships in a genetic circuit diagram as time-dependent functions rather than static, unknown weights. We proved a theorem showing that dynamical structure functions can provide a data-driven estimate of the size of crosstalk fluctuations from an idealized model. We then illustrated these findings with numerical examples. Next, we used an in vitro genetic circuit, deliberately tuned with gain imbalance, to validate our algorithm on experimental data. Finally, we built a new E. coli-based transcriptional event detector and showed how estimation of the dynamical structure reveals active and inactive network states, depending on inducer concentration. These results show how the dynamical structure function characterizes the operational or active network. They also provide a route for future study of relationships between environmental parameters, active network dynamics, and biocircuit performance.

Experimental methods
All plasmids were constructed using either Golden Gate assembly [79] or Gibson isothermal assembly [80] in E. coli. Plasmids were sequence verified in JM109 cloning strains and transformed into the strain MG1655ΔLacI, provided as a courtesy by R. J. Krom and J. J. Collins. The event detector was transformed as a two-plasmid system with kanamycin and chloramphenicol selection. All in vivo experiments were carried out with n = 2 replicates in MatriPlate (Brook Life Science Systems MGB096-1-2-LG-L) 96 square-well glass-bottom plates at 29°C in a BioTek Synergy H1 plate reader, using 505/535 nm and 580/610 nm excitation/emission wavelengths. Cell density was quantified with optical density at 600 nm.
For in vitro experiments, all genelet repressilator reconstruction experiments were carried out at 37°C in a Horiba spectrofluorometer with 1 min readout times, using Rhodamine Green, TYE 563 and Texas Red fluorophores with 10 nm monochromator excitation and emission bands centred at 502/527, 549/563 and 585/615 nm, respectively. All event detector network reconstruction reactions were performed using 500 μl reaction volumes in transformed E. coli, grown in square-well glass-bottom MatriPlates (Brook Life Science Systems MGB095-1-2-LG-L) with Luria-Bertani rich media broth at 29°C.
Data accessibility. All data files, network reconstruction code and visualization scripts can be obtained from the GitHub repository https://github.com/YeungRepo/NetworkRecon. The DNA sequences for the T31, T12 and T23 switches were obtained as a gift from the Winfree lab, identically mirroring the design of the repressilator genelet circuit used in [75]. Oligonucleotides were ordered with functionalized fluorophores or quenchers, corresponding to the original design of the genelet repressilator. DNA sequences were suspended in Tris-EDTA buffer for primary stock storage. The genelet switches T12, T31 and T23 were added at concentrations of 75 nM, 75 nM and 60 nM, respectively, to match previous tuning experiments that balanced the repressilator, with a 7.5 mM working concentration of mono-NTP solution, 24 mM MgCl2, and 1× T7 expression system buffer.
DNA analogues of the RNA inhibitors were added to sequester DNA activator signal from the switches, acting as an effective step input perturbation to each node. Each switch produces an RNA signal designed to interfere with formation of a complete promoter region on the next downstream switch in the repressilator circuit. Adding DNA inhibitor thus served as a step perturbation to the corresponding switch: by sequestering that switch's activator, it suppresses the switch's inhibitory RNA output and so has the net effect of activating the downstream switch. Activator DNA molecules A1, A2 and A3, each containing an Iowa Black quencher, were added at 75 nM, 80 nM and 75 nM working concentrations at 20 min from the onset of the reaction, to determine the maximum range of quenching. At 58 min, we added 0.7 μl of pyrophosphatase, 3 μl of T7 RNA polymerase and 2.2 μl of RNase H to achieve identical working concentrations to those described in [75].

A.2. The transcriptional event detector circuit
The transcriptional event detector circuit, as illustrated in figure 7 in the main text, is composed of four distinct gene expression cassettes that define the regulatory logic of the circuit and four distinct gene expression cassettes that generate the fluorescent reporter elements of the circuit. Each gene cassette defines a transcriptional unit, with a promoter element, an RBS, a coding sequence and a terminator sequence. Each gene cassette was cloned using a five-part Golden Gate assembly, with the type IIS restriction enzyme BsaI and overhang sequences from [81,82]. Each assembled gene cassette was cloned in JM109 E. coli cloning strains and sequence verified by Sanger sequencing (Eurofins Genomics). Assembled plasmids were engineered to enable a second-stage Golden Gate assembly using the type IIS restriction enzyme BbsI, and assembled to either (1) form a master regulatory logic plasmid (pEY15K), composed of four distinct gene expression cassettes driving transcription factor or allosteric response, or (2) form a master reporter plasmid composed of four distinct reporter elements (pEY14C). Both stage-2 assembled regulatory logic and reporter plasmids were sequence verified by Sanger sequencing (Eurofins Genomics) and transformed into MG1655ΔLacI (a gift from the Collins laboratory). The sequences for all individual plasmids and the circuit plasmids are listed in table 1.

Appendix B. Sequences of genetic circuit components
The sequences for all genetic components and circuits for the event detector circuit are listed in table 1. All genelet repressilator sequences are identical to the sequences used and listed in [75]. All ribosome binding site (RBS) sequences were derived from the bicistronic design (BCD) ribosome binding site library [83], while all terminator sequences were drawn from the synthetic terminator library characterized in [84].

Appendix C. Quantifying crosstalk in biochemical reaction networks
A common way that crosstalk arises in biochemical reaction networks is when species compete for commonly shared enzymes. When this occurs, the sequestration of an enzyme by one competing species makes the enzyme less accessible to the other competing species. For example, when two mRNAs compete for a single ribosome, the binding of one mRNA to the ribosome during translation makes the ribosome less accessible to the other mRNA. At the core of any such crosstalk is a sudden increase in the dependency of one biochemical state on another. Though enzyme loading may be a common source of crosstalk, such interactions can be modelled at a higher level of abstraction, namely how the dynamics of a given state are affected by the concentration fluctuations of other states.

Nearly every synthetic gene network implements causal dependencies among states. Often, these 'designed' interactions take the form of transcription factor binding, sense-antisense mRNA regulation, and sequestration events. In practice, every physical system exhibits trajectories that are a mixture of the consequences of both interaction types: designed and crosstalk interactions. Throughout the course of this paper, we denote the physical system of interest in our models as

dx^c/dt = f^c(x^c, u),  x^c = (y^c, x^c_h),  (C 1)

where y^c collects the measured outputs and x^c_h the hidden states. To quantify crosstalk in such systems, we can compare the dynamics of system (C 1) against the dynamics of a reference or alternative system that is free of crosstalk. Such a reference system still retains the desired interaction dynamics and reflects the idealized model often used to design a synthetic gene network, e.g. the feed-forward loop model in example 2.1.1. Moreover, it can represent the desired behaviour of the system in a regime where the magnitude of crosstalk effects is supposed to be minimal, or engineered in such a way that they are suppressed [7]. We write the reference system as

dx^a/dt = f^a(x^a, u),  x^a = (y^a, x^a_h).  (C 4)

Remark C.1. For the comparison between the alternative and crosstalk system to be fair, it is important that (C 4) satisfies internal equivalence [85]. Specifically, we suppose that any parameters or dynamics unassociated with crosstalk, e.g. interaction dynamics, catalytic reactions, or anabolic reactions with no loading effects, are held fixed. Thus, as we compare the behaviour of both systems, any differences in the hidden state x_h or output y dynamics are purely due to effects of crosstalk.
With the definition of an alternative system in place, it becomes possible to reason about the size of crosstalk, by comparing the dynamics of both systems. In particular, we can develop a rigorous notion for describing the amount of crosstalk arising from the difference of trajectories in both systems.
Definition C.2 (Crosstalk trajectory). Consider two systems, a crosstalk system and an alternative or reference system, initialized from the same initial condition x(0). For each initial condition x(0) = (y(0), x_h(0)) ∈ R^n and input trajectory u(t), we define the crosstalk trajectory ζ(t) as

ζ(t) = x^a(t) − x^c(t).

The crosstalk trajectory is a time-evolving vector that describes the deviation of the physical system (subject to crosstalk) from the reference system's trajectory. With this notion of crosstalk, we can also make precise the concept of crosstalk between states. We note that writing the following quantity of interest, (∂/∂x_j)ζ_i, involves a slight abuse of notation, since ζ_i depends on both x^a(t) and x^c(t). Mathematically, we are computing the jth partial derivative of each term in ζ_i = x^a_i(t) − x^c_i(t). Thus, to be clear, when we write (∂/∂x_j)ζ_i, it is implicit that we mean (∂/∂x^a_j) x^a_i(t) − (∂/∂x^c_j) x^c_i(t).

Definition C.3 (Directed crosstalk). Given an initial condition (x(0), y(0)) and input trajectory u(t), we say that a chemical species x_j exerts a crosstalk effect on chemical species x_i if the ith component of the crosstalk trajectory ζ(t) has non-zero partial derivative

(∂/∂x_j) ζ_i(t) ≠ 0,

for some initial condition (x(0), y(0)) and input trajectory u(t). In general, we refer to (∂/∂x_j)ζ_i(t) as the crosstalk sensitivity of x_i to x_j.
Note that the mathematical definition of crosstalk sensitivity (∂/∂x_j)ζ_i(t) depends on the initial condition x(0) and the input u(t). This dependency is consistent with the parametric sensitivity of biological function. Many genetic circuits in bacteria behave acceptably for one initial condition and one input condition, e.g. in log phase with an attenuated amount of a small molecule or sugar compound, but exhibit significantly different behaviour when input concentrations are increased by an order of magnitude or the cells are subjected to an alternative preparation method prior to the experiment. The latter imposes a state history that defines a distinct initial condition, which can drive a biological network to a highly coupled or decoupled state.
Example C.4. Consider two mRNA species m_1 and m_2 competing for the same degradation enzyme D in a physical system. For simplicity of exposition, suppose their production dynamics do not depend on each other and can be modelled as P_1(t) and P_2(t), respectively. With time rescaled so that the customary catalytic parameter k_cat for degradation is unity, the crosstalk system takes the competitive Michaelis-Menten form

dm_1/dt = P_1(t) − m_1/(1 + m_1 + m_2),  dm_2/dt = P_2(t) − m_2/(1 + m_1 + m_2),

while the reference system degrades each species independently,

dm_1/dt = P_1(t) − m_1/(1 + m_1),  dm_2/dt = P_2(t) − m_2/(1 + m_2).

The crosstalk sensitivities of m_1 and m_2 with respect to each other follow by differentiating ζ = m^a − m^c; the crosstalk sensitivity between m_1 and m_2 is non-zero whenever m_1 or m_2 has a non-zero initial condition.
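A crosstalk trajectory for this kind of example can be computed numerically by simulating the crosstalk and reference systems from the same initial condition and subtracting, per definition C.2. The sketch below assumes the standard competitive Michaelis-Menten form with k_cat and the Michaelis constant normalized to 1; the production rates are our own illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

P1, P2 = 0.3, 0.4   # constant production rates (illustrative)

def crosstalk(t, m):
    """Shared degradation enzyme: each species slows degradation of
    the other through the common Michaelis-Menten denominator."""
    m1, m2 = m
    load = 1.0 + m1 + m2
    return [P1 - m1 / load, P2 - m2 / load]

def reference(t, m):
    """Idealized system: each species is degraded independently."""
    m1, m2 = m
    return [P1 - m1 / (1.0 + m1), P2 - m2 / (1.0 + m2)]

t = np.linspace(0.0, 100.0, 500)
mc = solve_ivp(crosstalk, (0.0, 100.0), [0.0, 0.0], t_eval=t, rtol=1e-8).y
ma = solve_ivp(reference, (0.0, 100.0), [0.0, 0.0], t_eval=t, rtol=1e-8).y
zeta = ma - mc   # crosstalk trajectory, zeta(t) = x^a(t) - x^c(t)
```

With these parameters the shared enzyme inflates both mRNA levels relative to the idealized model, so both components of ζ(t) settle at negative values; ζ(0) = 0 because both systems start from the same initial condition.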
Remark C.5. In synthetic biocircuit design, two chemical species x_i and x_j are often declared orthogonal when there is no designed interaction between them. Mathematically, in the crosstalk-free system, this corresponds to (∂/∂x_j) f^a_i(x, u) ≡ 0. This condition is interesting in experimental settings, since a computational estimate of (∂/∂x_j) ∫_0^t f^c_i(τ) dτ from perturbation experiments then coincides with a direct estimate of the crosstalk sensitivity (∂/∂x_j)ζ_i. More specifically, when x_i and x_j are measured outputs of the system, we show in the following that quantifying ‖Q^c_ij(s)‖ is directly related to an estimate of the crosstalk sensitivity (∂/∂x_j)ζ_i(t) near the equilibrium point x^c_e.
Remark C.6. In general, estimating the crosstalk sensitivity for the nonlinear systems (C 1) and (C 4) can be challenging if either x_i or x_j is not measured directly. First, if experimental data are available, they will often consist of data for the measured species y in the crosstalk system, but not the reference system. Second, if only one of the species x_i (or neither) is available for measurement, then even if perturbation of x_j is possible, a nonlinear observer is required to estimate the trajectory of x_j(t). Unless the parameters of f_i(x, u) are known a priori (which is generally not the case), this in turn requires system identification of the parameters of f^c(x, u) and f^a(x, u), which often results in a non-convex optimization problem. Thus, our goal is to estimate the observed crosstalk between measured species Y_i and Y_j. This crosstalk estimate will invariably include the dynamics of unmeasured chemical species (such as ATP, RNAP, untagged mRNA and protein species, DNA-protein complexes, etc.). From a synthetic biology design standpoint, this is not a disadvantage, since the goal is to design a synthetic gene network with an abstracted circuit architecture operating reliably in the context of many unmeasured species. In any genetic circuit, there are always additional biochemical compounds that are unmeasured. As stated in the main text, theorem 2.1 shows that under certain conditions, the dynamical structure function is able to estimate the crosstalk in a nonlinear system. We present the proof of this theorem here.
Theorem C.7. Let L denote the two-sided Laplace operator. Suppose we have a system model that incorporates the effect of crosstalk,

dx^c/dt = f^c(x^c, u),

where y^c ∈ R^p is a vector of measured states in the output, x^c_h ∈ R^(n−p) are the unmeasured states of the system, x^c = (y^c, x^c_h) is the full system state, and u ∈ R^m is a vector of system inputs. Furthermore, suppose we have an idealized system model to simulate system dynamics in the absence of crosstalk,

dx^a/dt = f^a(x^a, u),

where y^a ∈ R^p is a vector of the measured states in the output, x^a_h ∈ R^(n−p) are the unmeasured states of the system, x^a = (y^a, x^a_h) is the full system state, and u ∈ R^m is a vector of system inputs. Let (Q^c(s), P^c(s)) and (Q^a(s), P^a(s)) denote the respective dynamical structure functions calculated for each linearized system about the origin. Let ζ(t) = x^a(t) − x^c(t) denote the deviation of the crosstalk state from the idealized state. Then

∂L(ζ_i)/∂Y_j = Q^a_ij(s) − Q^c_ij(s) + (∂/∂Y_j) L(O((x^c)^2)),  (C 5)

and in particular, if Q^a_ij(s) ≡ 0, then

∂L(ζ_i)/∂Y_j = −Q^c_ij(s) + (∂/∂Y_j) L(O((x^c)^2)),

which can be estimated from input-output data (Y(s), U(s)).

Proof.
First, note that the Laplace transform of the crosstalk trajectory satisfies L(ζ(t)) = L(x^a − x^c) ≜ X^a(s) − X^c(s), which can be decomposed into its measured and unmeasured components, (Y^a(s) − Y^c(s), X^a_h(s) − X^c_h(s)).

This result is important, since it tells us when estimating Q^c(s) from experimental data corresponds to estimating crosstalk between measured states in Y(s). Since necessary and sufficient conditions for identifying Q(s) and P(s) have already been characterized [62], this provides conditions for inferring crosstalk from input-output data. For example, a sufficient condition is that an input variable is available to excite each measured output of the genetic network being reconstructed. This allows for the possibility that some biological states are unmeasured and unexcited, but these will be viewed as hidden states that play a role in defining the edge dynamics in Q^c(s).
More generally, even if the parameters of f^a(x, u) are unknown, the structure of Q^a(s) can be analytically calculated (using a symbolic algebra package). For every zero entry in Q^a(s) (coinciding with designed orthogonality between measured states), we can then estimate the corresponding entry of Q^c(s) directly.
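As a concrete illustration of this symbolic calculation, the snippet below computes (Q(s), P(s)) for a small hypothetical linearized system with two measured states and one hidden state, following the standard construction: the hidden state is eliminated to obtain sY = W(s)Y + V(s)U, and then Q(s) = (sI − D)^{-1}(W − D) and P(s) = (sI − D)^{-1}V with D = diag(W). The A and B matrices are invented for illustration and are not taken from any circuit in the paper.

```python
import sympy as sp

s = sp.symbols('s')

# Hypothetical linearized system: x1, x2 measured (p = 2), x3 hidden.
A = sp.Matrix([[-1, 0, -2],
               [ 3, -1, 0],
               [ 0, 1, -4]])
B = sp.Matrix([[1, 0],
               [0, 1],
               [0, 0]])
p = 2
A11, A12 = A[:p, :p], A[:p, p:]
A21, A22 = A[p:, :p], A[p:, p:]
B1, B2 = B[:p, :], B[p:, :]

# Eliminate the hidden state: X_h = (sI - A22)^{-1} (A21 Y + B2 U).
inv_h = (s * sp.eye(A22.shape[0]) - A22).inv()
W = A11 + A12 * inv_h * A21   # dependencies among measured states
V = B1 + A12 * inv_h * B2     # direct influence of inputs on outputs

# Dynamical structure function; Q has zero diagonal by construction.
D = sp.diag(*[W[i, i] for i in range(p)])
Q = ((s * sp.eye(p) - D).inv() * (W - D)).applyfunc(sp.simplify)
P = ((s * sp.eye(p) - D).inv() * V).applyfunc(sp.simplify)
```

For this toy system the hidden state mediates the edge from x2 to x1, so Q_12(s) picks up the hidden pole at s = −4 while Q_21(s) is a simple first-order transfer function; the sparsity pattern of Q is exactly what a zero-entry test against Q^a(s) would operate on.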
In practice, estimation of Q c (s) is also confounded by noise. In our analysis in this paper, we suppose that a series of filters can be applied to eliminate the noise in the data. This may not be the case for biological systems that have been characterized as inherently stochastic, e.g. single cell gene expression dynamics. In such settings, the estimated dynamical structure Q c (s) is a mixture of the process noise in the system and the crosstalk. From the standpoint of synthetic biocircuit prototyping, both are undesirable in the ultimate iteration of the biocircuit and thus need to be quantified. In this paper, we will demonstrate our theoretical and computational framework with experimental results derived from in vitro systems, where signal-to-noise ratios are high and the only sources of noise are measurement noise and pipetting error. For a theoretical treatment of how to reverse engineer Q c (s) in the presence of process noise or system perturbation, see [86].
An advantage of using Q^c(s) to estimate the crosstalk is that we can use the H∞ norm of Q^c_ij(s) to calculate the worst-case crosstalk magnitude and the H_2 norm of Q^c_ij(s) to calculate the average crosstalk across all frequencies. In the IFFL example, Q^a(s) is lower-triangular, reflecting the network structure of the intended IFFL. By examining the upper-triangular entries in Q^c(s), we can directly examine the effects of degradation crosstalk. In the lower-triangular entries of Q^c(s), these crosstalk effects are confounded with the direct interactions modelled in Q^a(s). Although the gains of the entries in Q^c(s) are small, they can nonetheless have a significant effect on the dynamics of the IFFL.
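Numerically, the H_2 norm of a stable SISO entry can be approximated from the energy of its impulse response (by Parseval's theorem), complementing a frequency-grid approximation of the H∞ gain. A minimal sketch, with an invented first-order entry whose analytic H_2 norm is known:

```python
import numpy as np
from scipy import signal
from scipy.integrate import trapezoid

def h2_norm(num, den, t_end=50.0, npts=20001):
    """Approximate the H2 norm of a stable SISO transfer function as the
    L2 energy of its impulse response (numerical sketch)."""
    t = np.linspace(0.0, t_end, npts)
    t, h = signal.impulse(signal.TransferFunction(num, den), T=t)
    return float(np.sqrt(trapezoid(h**2, t)))

# Sanity check: k/(s + a) has H2 norm k/sqrt(2a); with k = 1, a = 0.5
# the analytic value is 1.0.
val = h2_norm([1.0], [1.0, 0.5])
```

The time horizon must be long compared with the slowest time constant of the entry so that the truncated impulse-response energy captures essentially all of the H_2 norm.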