Modular dynamic biomolecular modelling with bond graphs: the unification of stoichiometry, thermodynamics, kinetics and data

Renewed interest in dynamic simulation models of biomolecular systems has arisen from advances in genome-wide measurement and applications of such models in biotechnology and synthetic biology. In particular, genome-scale models of cellular metabolism beyond the steady state are required in order to represent transient and dynamic regulatory properties of the system. Development of such whole-cell models requires new modelling approaches. Here, we propose the energy-based bond graph methodology, which integrates stoichiometric models with thermodynamic principles and kinetic modelling. We demonstrate how the bond graph approach intrinsically enforces thermodynamic constraints, provides a modular approach to modelling, and gives a basis for estimation of model parameters leading to dynamic models of biomolecular systems. The approach is illustrated using a well-established stoichiometric model of Escherichia coli and published experimental data.


Introduction
The recent explosion of omics data has generated an interest in developing dynamic whole-cell models that account for the function of every gene and biomolecule over time. Such models have the potential to 'predict phenotype from genotype' [1][2][3] and hence to 'transform bioscience and medicine' [4]. Critical to understanding the large-scale metabolism within cells is the stoichiometric approach [5][6][7][8], which has had notable successes including the genome-scale reconstruction of the metabolism of Escherichia coli [9][10][11] and Neocallimastigomycota fungus [12].
The stoichiometric approach can give rise to constraint-based models such as flux balance analysis (FBA) [13], which predict metabolic fluxes at steady state. However, most implementations of such constraint-based models do not explicitly consider energy. This can lead to mass flows that are not thermodynamically possible because they violate the second law of thermodynamics. Such non-physical flows can be detected and eliminated by adding additional thermodynamic constraints, as in thermodynamics-based metabolic flux analysis (TFA) [14,15], energy balance analysis (EBA) and expression, thermodynamicsenabled flux models (ETFL) [16][17][18][19][20] and loopless FBA [21]. Whereas constraintbased models provide metabolic fluxes, they generally do not explicitly account for metabolite concentrations, or how fluxes vary over time, both of which are required for dynamic whole-cell modelling. However, the stoichiometric approach can help to bridge towards dynamic models capable of satisfying these requirements. In this context, there has been work into developing two types of large-scale dynamic models: fully detailed mass action stoichiometric simulation (MASS) models [7,22,23] and simplified network models that use non-mass action rate laws such as lin-log laws or modular rate laws [24,25]. Although mass-action approaches seem restrictive, we note that models of enzyme kinetics can be built from elementary mass-action reactions [26].
MASS models are parameterized by reaction rate constants which are subject to thermodynamic constraints such as the Wegscheider conditions [27] (Wegscheider conditions are a formulation of detailed balance conditions which avoid models that are inconsistent with thermodynamic laws [26], §1.5). This paper focuses on the mass-action formulation and introduces an alternative to MASS which explicitly incorporates thermodynamics. Specifically, the approach uses an alternative parameterization related to that of thermodynamic-kinetic modelling (TKM) [27,28]. TKM explicitly divides parameters into those associated with capacities and resistances by analogy with electrical systems; this approach gives thermodynamic consistency without invoking additional constraints such as the Wegscheider conditions [27,28]. Mason and Covert [29] developed a similar approach for a non-mass-action rate law.
Recently, the bond graph approach from engineering [30][31][32][33] has been adapted to biochemistry [34][35][36][37]. Bond graphs are close in spirit and application to TKM in that they produce ordinary differential equations for dynamic simulation [37] and that their parameters satisfy thermodynamic consistency without the need to invoke Wegscheider conditions [34][35][36][37]. However, bond graph models are endowed with several additional features: 1. Bond graphs can be easily generalized to model multiphysics systems and thus readily incorporate the physics of electrically charged species into an integrated model combining both chemical and electrical potential [38][39][40][41][42]. 2. Bond graphs are modular [43,44], a key requirement of any large-scale modelling endeavour [45]. 3. Bond graph models can be systematically modified to give simpler bond graph models which remain compatible with thermodynamic laws [37,46,47].
The stoichiometric matrix of a biomolecular network can be derived from the corresponding bond graph [37,43]. Similarly, as shown herein, a bond graph model can be constructed from a stoichiometric matrix. Thus, the large repository of models of biomolecular systems available in stoichiometric form are available as templates for developing bond graph models; we provide a methodology for this later in the paper. Furthermore, once rate laws such as mass action are added, such templates provide a basis for complete dynamic models of metabolic systems.
A key challenge in the development of dynamic models is the fitting of parameters to experimental data, especially when thermodynamic constraints need to be satisfied [48,49]. For large-scale biomolecular models such as whole-cell models, applying these constraints is particularly challenging [50]. In this paper, we use the thermodynamically safe parameterization provided by bond graphs to resolve this issue. As in the TKM [27,28] approach, the bond graph approach uses an alternative parameterization which satisfies thermodynamic constraints as long as the parameters are positive; such inequality constraints are easier to handle than nonlinear constraints. We illustrate this approach by generating a dynamic bond graph model of E. coli metabolism, using a well-established stoichiometric model [51] as a template and show that the use of thermodynamic parameters can significantly streamline the process of parameter estimation.
In summary, this paper proposes the fusion of the stoichiometric and bond graph approaches to modelling biological systems and illustrates its potential for the unification of stoichiometry, thermodynamics, kinetics and data. Section 2 summarizes the bond graph background to the rest of the paper. Section 3 shows how bond graph models can be extracted from stoichiometric information, used to create modular models and analysed in terms of pathways; the relationship of the approach to energy balance analysis is also discussed. Section 4 applies these concepts to two subsystems within the E. coli core model-a well-documented [8,51] and readily available stoichiometric model of a biomolecular system. The model is available within the COBRApy [52] package. Section 5 shows how thermodynamically consistent bond graph parameters can be extracted from experimental data and gives a dynamic simulation of the parameterized model. Section 6 concludes the paper and gives directions for future work.

Bond graphs
This section gives a brief introduction to the bond graph approach to modelling biomolecular systems based on the seminal work of Oster et al. [34,35] as extended by Gawthrop & Crampin [37,44,53].

Basic components
Bond graphs represent the energetic connections between components of a system. The S symbol is used to indicate an energetic connection, or 'bond', between components; the half-arrow indicates the direction corresponding to positive energy flow. In the biomolecular context, each bond is associated with two covariables: chemical potential μ (J mol −1 ) and flow v (mol s −1 ). The key point is that the product of μ and v is power p = μv (W). This ensures that models are consistent with the laws of thermodynamics, as energy flow is explicitly accounted for. In the context of cellular metabolism, and in line with the measurement of redox potentials, it is convenient to scale these co-variables by Faraday's constant F ≈ 96485 C mol −1 to give where (J C −1 ) has been replaced by the more convenient unit volt (V) and (C mol −1 ) has been replaced by the more convenient unit ampere (A) [38]. As a useful rule-of-thumb, μ (kJ mol −1 ) can be converted to ϕ (mV) by dividing by 10 6 /F ≈ 10. Bonds transmit, but do not store or dissipate energy. Within this context, the bonds connect four distinct types of component: 0 and 1 Junctions provide a method of connecting two or more bonds, and therefore creating a network. Analogous to electrical systems, there are two types of junction, denoted 0 and 1 . The bonds impinging on a 0 junction share a common effort (chemical potential); the bonds impinging on a 1 junction share a common flow. Both 0 and 1 junctions transmit, but do not store or dissipate energy. As discussed previously [37], the arrangement of royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 bonds and junctions represents the stoichiometry of the corresponding biomolecular system and thus the relationship both between reaction and species flows and between species potentials and reaction forward and reverse potentials. Furthermore, the reverse is also true: the stoichiometric matrix of a biomolecular system uniquely determines the bond graph, as will be discussed further below. Ce represents biochemical species. Thus species A is represented by Ce:A with the equations: and accumulates the flow f A of species A. Equation (2.3) generates chemical potential ϕ A in terms of the reference potential f É A at reference conditions x É A . Ce components thus store, but do not dissipate, energy. An equivalent parameterization that we use in this paper is to express the chemical potential in terms of ϕ N and the species constant K A , as defined in equation (2.5).
Re represents reactions. The flow f associated with each reaction is given by the Marcelin-de Donder formula [37,54]: where F f and F r are the forward and reverse reaction potentials (or affinities), defined as the sums of the chemical potentials of the reactants and products, respectively. If κ is constant, this represents the mass-action formula.
In general, κ is a function of F f , F r and enzyme concentration [37]; for example, a reversible Michaelis-Menten formulation used in Gawthrop et al. [43] is: where the three constants f max , K f and ρ define the kinetics. As discussed elsewhere [26,37], enzyme kinetics can be modelled using the pair of reactions with mass-action kinetics where A, B, E and C are the substrate, product, enzyme and complex of substrate and enzyme, respectively; the bond graph representation is given in appendix E. Equation (2.7) arises from the steady-state analysis of this model [37]. In particular K C and K E correspond to equation (2.5) for the complex and enzyme, respectively, and e 0 is the total amount of enzyme (unbound and bound within the complex).
Re components dissipate, but do not store, energy. In where F ¼ F f À F r and ϕ is a vector containing the chemical potentials of every species. Since f always has the same sign as F, f() is dissipative in F for all ϕ: fF . 0: ð2:11Þ The key stoichiometric equations arising from bond graph analysis are [37] where x, f, F and ϕ are the species amounts, reaction fluxes, reaction potentials and species potentials , respectively, all represented as vector quantities. N is the stoichiometric matrix of the network. Combining equations (2.12) and (2.13) x is the rate of energy into the species (which must be negative or zero for closed systems) and F T f is the rate of energy dissipated by the reactions. Since f T _ x þ F T f ¼ 0, it follows that the network of bonds and junctions transmits, but does not dissipate or store, energy [37].
Moreover, the stoichiometric matrix N can be decomposed as [37] where N r corresponds to the positive entries of N and N f to the negative entries. The forward and reverse reaction potentials F f and F r are given by In other words, the stoichiometric matrix N can be derived from the system bond graph. Section 3 shows that, conversely, the system bond graph can be derived from the stoichiometric matrix N.

Chemostats, flowstats and pathways
Modularity implies the interconnection of subsystems; thus such subsystems must be thermodynamically open. As discussed previously [38,44], the notion of a chemostat [55] is useful in creating an open system from a closed system. The chemostat has a number of interpretations [38]: 1. One or more species are fixed to give a constant concentration [43]; this implies that an appropriate external flow is applied to balance the internal flow of the species. 2. As a Ce component with a fixed state. 3. As an external port of a module which allows connection to other modules.
In the context of stoichiometric analysis, the chemostat concept provides a flexible alternative to the primary and currency exchange reactions [6,8,56].
Alternatively, reaction flows can be fixed using the dual concept of flowstats [44], which has a number of interpretations: royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 1. As an Re component with a fixed flow. 2. As an external port of a module which allows connection to other modules.
In the context of this paper, we use flowstats to isolate parts of a network by setting the flows of certain reactions to zero. Such zero flow flowstats can also be interpreted as removing the corresponding enzyme via gene knockout.
In terms of stoichiometric analysis, the closed system equations (2.12) and (2.13) are replaced by where N cd is created from the stoichiometric matrix N by setting rows corresponding to chemostats species and columns corresponding to flowstatted reactions to zero [44]. As discussed by Gawthrop & Crampin [44], system pathways corresponding to equation (2.17) are defined by the rightnull space of N cd , that is, the columns of a matrix K p satisfying the equation N cd K p = 0. At steady state, the flows through these pathways are defined by In a similar fashion to equation (2.18), the pathway reaction potentials F p are given by In the same way as the stoichiometric matrix N relates reaction flows to species and thus represents a set of reactions, the pathway stoichiometric matrix N p also represents a set of reactions: these reactions will be called the pathway reactions. Pathways can be divided into three mutually exclusive types [56] according to the species corresponding to the non-zero elements in the relevant column of the pathway stoichiometric matrix N p : Type I The species include primary metabolites; these pathways are of functional interest.
Type II The species include currency metabolites only; these pathways dissipate energy without creating or consuming primary metabolites. Such pathways are sometimes called futile cycles; however, they have an important role to play in regulating metabolite flow [57][58][59][60][61][62]. Type III There are no species. These may arise when the same reaction is catalysed by different isoforms of the same enzyme.
Pathway reactions for type I pathways contain both primary and currency metabolites; pathway reactions for type II pathways contain currency metabolites only; pathway reactions for type III pathways are empty. The concept of pathways is applied to a simple example in appendix B and to a biomolecular example (the pentose phosphate pathway) in §4.1.

Bond graphs integrate stoichiometry and energy
As discussed in the previous section, the stoichiometric matrix can be directly derived from the bond graph; this section shows that the converse is true and thus bond graphs can be automatically derived from preexisting stoichiometric representations thereby allowing bond graph energy-based analysis and modularity to be applied to such models.

Generating a bond graph from a stoichiometric matrix
A bond graph can be constructed from a stoichiometric matrix by using the following procedure: For example, the reaction A À ! À r1 2 B has the stoichiometric matrix and the bond graph of figure 1a. The reaction B þ C À ! À r2 D þ E has the stoichiometric matrix and has the bond graph of figure 1b. Bond graphs provide a graphical representation of a system. While this provides an intuitive and clear visual representation when dealing with small systems such as the ones royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 shown above, such visualization becomes cumbersome for large systems. We employ two approaches to overcome this issue for the large-scale systems considered in this paper: modularity and a non-graphical (or programmatic) representation. In particular, we use a recent concept of bond graph modularity [38] in §3.2 and the recently developed BondGraphTools package [63] (https://pypi.org/project/ BondGraphTools/) as a non-graphical representation that allows large-scale systems to be constructed in a scalable and automated manner. This is discussed further below.

Modularity
Two related but distinct concepts of modularity [44] are computational modularity, where physical correctness is retained, and behavioural modularity, where module behaviour (such as ultra-sensitivity) is retained. Here, we discuss computational modularity. In particular, it is shown how the concept of external flows, as discussed in §2.2, is key to bond graph modularity. Modular bond graphs provide a way of decomposing complex biomolecular systems into manageable subsystems [43,44,53]. This paper combines the modularity concepts of Neal et al. [64][65][66] with the bond graph approach to give a more flexible approach to modularity. The basic idea is simple [38] . Modules are self-contained and have no explicit ports, but any species represented by a Ce component has the potential to become a port available for external connection. Thus, if two modules share the same species, the corresponding Ce component in each module is replaced by a port (labelled with the same name), and the species is explicitly represented as a Ce component in the parent model. This approach allows each module to be individually tested prior to being integrated into a larger model.
We use the following algorithm to merge bond graph models of stoichiometric networks: For example, let modules M1 and M2 correspond to figure 1a,b, respectively. The composition of these modules requires the common species B to be exposed in both modules. This is illustrated in figure 2, where both modules are connected to the new Ce:B component via a 0 junction. The composite system contains the two coupled reactions Section 4 gives examples of modular decomposition of a metabolic system and §4.2 gives an example of how such modules can be combined using the methods of this section. The pathway analysis of §2.2 can be applied to modules themselves, and to systems built of modules, to give insight into the overall behaviour of complex systems; this is illustrated in §4.2.
The concept of modularity can be extended to include common Re (reaction) components [67]; but this concept is not pursued in this paper.

Energy balance analysis in a bond graph context
FBA [13] uses the linear equation (2.19) within a constrained linear optimization to compute pathway flows. EBA [16] adds two sorts of nonlinear constraint arising from thermodynamics. This section shows that the bond graph approach automatically includes the EBA constraint equations by considering Inequality (2.11) and equation (2.18). In particular 1. Inequality (2.11) corresponds to equation (8) of Beard et al. [16]. This inequality can be re-expressed as This corresponds to equation (7) of Beard et al. [16]. Note that K defines the pathways of the closed system, with no chemostats.
Moreover, the pathways of the open system as defined by K cd can be considered by defining R = diag r i and using equation (2.19) K T RK p f p ¼ 0: ð3:7Þ 1. For each species create a Ce component with appropriate name and a 0 junction; connect a bond from the 0 junction to the Ce component.

2.
For each reaction create an Re component with appropriate name and two 1 junctions; connect a bond from one 1 junction to the forward port of the Re component and a bond from the reverse port of the Re component to the other 1 junction.
3. For each negative entry N ij in the stoichiometric matrix, connect − N ij bonds from the zero junction connected to the ith species to the 1 junction connected to the forward port of the jth reaction.
4. For each positive entry N ij in the stoichiometric matrix, connect N ij bonds from the one junction connected to the reverse port of the jth reaction to the zero junction connected to the ith species.

5.
If an Michaelis-Menten formulation is required, each Re component is replaced by a bond graph module ( §3.2) corresponding to the enzyme catalysed reaction pair (2.8) and appendix E.
1. Within each module, each Ce component corresponding to a common species is exposed, that is, replaced by a port, or external connection.

2.
For each common species, create a Ce component connected to a 0 component.

3.
Connect all module ports associated with each species to the 0 junction associated with the species; all instances of Ce components corresponding to each species are thus unified into the same component.

Application to the E. coli core model
The E. coli core model [8,51] (see figure 3) is a welldocumented and readily available stoichiometric model of a biomolecular system; species, reactions and stoichiometric matrix were extracted from the CobraPy model: 'textbook'.
Using the methods of §3.1, the corresponding bond graph model was created which, as discussed in the Introduction, automatically satisfies thermodynamic constraints. To illustrate the concepts developed above, we analyse two subsets of reactions within this model 1. Section 4.1 uses the methods of §2.2 to examine possible pathways within the system formed from the combined glycolysis and pentose phosphate pathway (which produces precursors to the synthesis of nucleotides). 2. Section 4.2 uses the modularity approach of §3.2 to build a modular model of respiration using glycolysis, the TCA cycle, the electron transport chain and ATP synthase as modules. Furthermore, the methods of §2.2 are applied to examine the pathway properties of an individual module (the TCA cycle) as well as the overall system.

Glycolysis and pentose phosphate pathway
The combination of the glycolysis and pentose phosphate networks provides a number of different products from the metabolism of glucose. This flexibility is adopted by proliferating cells, such as those associated with cancer, to adapt to changing requirements of biomass and energy production [69,70].
We construct a stoichiometric model of these pathways, consisting of the upper reactions of glycolysis and the pentose phosphate pathway. The full reaction network is given in appendix C, and a bond graph is constructed using the methods of §3.1.
As discussed in the textbooks [61,71], it is illuminating to pick out individual paths through the network to see how these may be used to provide a variety of products. This is reproduced here by choosing appropriate chemostats and flowstats ( §2.2) to give the results listed by Garrett & Grisham [61] §22.6d. In each case, the corresponding pathway reaction potential is given. For consistency with Garrett & Grisham [61] §22.6d, each pathway starts with glucose 6-phosphate (G6P). We use the following list of chemostats (together with additional chemostats) for the pathway analysis below: {ADP, ATP, CO 2 , G6P, H, H 2 O, NAD, NADH, NADP, NADPH, PI, PYR}. The pathways are generated using the methods of §2.2.

R5P and NADPH generation
Chemostats:  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 Reaction: ADP + G6P + 6 H 2 O + 12 NADP ⇄ ATP + 6 CO 2 + 11 H + 12 NADPH. In §5, we use the model of the glycolysis and pentose phosphate pathways as a basis for inferring parameters from experimental data. Once the parameters have been identified ( §5.4), dynamic simulations of these pathways can be run. This is shown later in §5.6.

Respiration
To illustrate the utility of using bond graphs for the modular construction of stoichiometric models, we construct a model of respiration by combining the subsystems of glycolysis, TCA cycle, electron transport chain and ATP synthase. Reactions for each of these subnetworks were extracted from the CobraPy model; these reactions are listed in appendix D. For simplicity, reactions PDH and PFL (converting PYR to ACCOA) and reaction NADTRHD (converting NADP/ NADPH to NAD/NADH) were included in the TCA cycle module. Once these are converted into bond graphs, the algorithm in §3.2 was used to combine these models together into a model of respiration.

Analysis of individual modules
An advantage of considering subsystems as separate modules is that these modules can be analysed individually. For example, the TCA cycle module can be analysed using the set of chemostats (see §2.  Figure 3. Escherichia coli core model. The extracted reactions corresponding to the glycolysis, pentose phosphate pathways and TCA cycle parts of the model are shown; a complete list of reactions is given in appendix D. The diagram was created using Escher [68].

Analysis of combined network
The bond graph approach provides a method for easily combining stoichiometric models using the methods of §3.2. Here, we demonstrate this by constructing a model of respiration from the individual modules glycolysis, TCA cycle, electron transport chain and ATP synthase. We begin by first combining the glycolysis and TCA modules, as indicated in figure 4a. As well as the common species PYR ( pyruvate) explicitly shown, the set of species fATP, ADP, PI, H, NAD, NADH, H 2 Og, were also declared to be common.
The full model of respiration is then constructed by combining the glycolysis + TCA cycle module with the electron transport chain and ATP synthase modules, as indicated in figure 4b. In addition to the common species explicitly shown

Dynamic modelling and parameter estimation
Dynamic models of biochemical networks have the potential to aid the understanding of how subprocesses change over time, and can potentially elucidate important control structures within these networks [72]. However, due to their nonlinear nature, parameter estimation is one of the most  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 challenging aspects of developing models of biomolecular systems [73]. Parameter estimation depends on both the form of the model and the type of data available. This section assumes a bond graph model with the mass-action kinetics of equation (2.6) and that the following data are available for a single steady-state condition: 1. Reaction potentials F (equivalent to reaction Gibbs free energy). 2. Reaction flows f.

Species concentration c.
If data at three or more steady-state conditions were available, more complex kinetics such as the reversible Michaelis-Menten formulation (2.7) could be used but this is not pursued in this paper.
In recent times, such data are becoming more readily available; species concentrations can be obtained from metabolomics data, and tracer experiments involving 13 C and 2 H have been used to infer both fluxomics data for reaction flows [74,75] and thermodynamic data for reaction potentials [74,76,77]. In the following examples, we make use of the dataset obtained by Park et al. [74] to infer the thermodynamic parameters using a relatively fast quadratic programming algorithm.
Because bond graph models are thermodynamically consistent, the estimated parameters have physical meaning and the resultant estimated model, though not necessarily correct, is physically plausible [47]. Moreover, physical constraints imply parametric constraints thus reducing the parameter search space.

Species potentials
Because of the energetic constraints implied by the bond graph, the reaction potentials F are related to the species potentials ϕ by equation (2.13). Since some reaction potentials may be unavailable, we rearrange and partition F and the stoichiometric matrix N so that where F 0 and F 1 contain the known and unknown values of F respectively. Given the measured value of F 0 and the estimated species potentialsf, the estimation error ϵ is defined as I is the n f Â n f unit matrix and λ > 0 a small positive number. In some cases, there are more species than reactions and so the stoichiometric matrix N has more rows than columns; as a result, the number of species potentials ϕ is greater than the number of reaction potentials F and so equation (2.13) has no unique solution for ϕ given F. In such cases, it is standard practice to use the λI term to turn a nonunique solution for ϕ into a minimum norm solution.
Having deduced a set of estimated species potentialsf using the QP, the corresponding reaction potentialsF 0 and F can be obtained from equation (2.13) rewritten aŝ Once again,F 0 ¼ F 0 and the other values of F can be deduced from (5.7); because of the inequality constraint in the QP, these values are positive and thus physically plausible. QP also handles equality constraints [78]; this provides a potential mechanism for incorporating known parameters into the procedure.

Pathway flows
From basic stoichiometric analysis, steady-state flows f can be written in terms of the pathway matrix K p and pathway flows f p by equation (2.19) repeated here as ð5:8Þ Note that, as discussed in §2.2, the pathway matrix K p is dependent on the choice of chemostats. In general, K p has more rows than columns and thus the pathway flow f p is over-determined by the reaction flows f. Hence, given a set of experimental flows f, an estimatef p of f p can be obtained from the least-squares formula

Reaction constants
In terms of estimated quantities, the reaction flow of equation (2.6) can be rewritten aŝ ð5:12Þ For each reaction, the estimated reaction constantk is then given by equation (5.11).
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 Similarly, reversible Michaelis-Menten reaction kinetics can be written in terms of estimated quantities and three estimated parametersf max ,K f andr from equation (2.7) f ¼f This can be rearranged aŝ and can be in rewritten in linear-in-the-parameters form [79] as and Given an estimateû of θ, the estimation error ϵ 0 is Because there are three unknown parameters (f max ,K f andr), at least three different sets of steady-state data are required to uniquely determine the parameters; this case is not considered here. Alternatively, these unknown parameters can be determined using measured constants from the literature [29]. Such known parameters can be included using an equality constraint of the form A ecû ¼ b ec -an example appears in §5.5. Noting that all elements of θ are positive,û also has the inequality constraintû . 0, the error equation (5.19) together with the constraints can be embedded QP [78] minimize 1 2û T Pû þ q Tû subject toû . 0 and A ecû ¼ b ec , where P ¼ X T X þ lI; q ¼ X T y ð5:20Þ I is the n u Â n u unit matrix and λ > 0 a small number. The parameters of the equivalent bond graph model can be deduced using equation (2.9). More general reaction kinetics [29] can be incorporated in a straightforward manner but, however, would require nonlinear fitting procedures to determine parameters.

Dynamical parameters
The parameter K of the species components (Ce) determines the time course of species amounts and reaction flows when there is a deviation from steady-state. Using equation (2.5), this can be determined from the species potential estimatef and the amount of species x É at the steady-state conditions. Expressing amounts per unit volume, it follows that x É ¼ c, the species concentration at the steady-state conditions.

Parameters for the glycolysis and pentose phosphate model
The bond graph of the glycolysis and pentose phosphate model ( §4.1) was parameterized to fit E. coli experimental data [74] using the approach described in this section. Table 2 [74] gives experimentally measured values of the reaction Gibbs energy ΔG for all of the reactions in the model except for G6PDH2R and PGL. The known values of ΔG were converted to reaction potentials F 0 ðmVÞ. The unknown potentials F 1 were constrained to be greater than 1 mV. The first column of table 1c gives the experimental values of reaction potential F with the unknown values indicated by -; the second column gives the corresponding estimatesF ðmVÞ. The estimated and known values are identical; of the two estimated unknown values, that for PGL lies on the constraint that unconstrained optimization gives a physically impossible negative value. As discussed in §2.2, pathways are determined by chemostats. In this case, it was assumed that the set of chemostats was: {ADP, ATP, CO 2 , G3P, G6P, H, H 2 O, NADP, NADPH, R5P}. Using the methods of §2.2, there were three pathways 1. PGI + PFK + FBA + TPI 2. G6PDH2R + PGL + GND + RPI 3. -2 PGI + 2 G6PDH2R + 2 PGL + 2 GND + TKT2 + TALA + TKT1 + 2 RPE with pathway matrix K p given by

ð5:24Þ
Data normalization is important in the context of parameter identification in systems biology [80]. Here, the experimental concentration and flow data [74] was normalized with respect to the concentration of G6P and flow of PGI (given in mM/min) by defining: where t 0 is the corresponding time unit. Using the pathway decomposition and the method of §5.2, the three pathway flows were deduced to be those of table 1d. The estimated reaction flowsf are then deduced from equation (5.8) and given in the fifth column of table 1c. The chemostat flows are given in table 1b. The concentrations given in table 3 [74] were used to derive the species parameters of table 1a.
The reaction constants κ of the mass action formulation are given in table 1d together with the reaction constants κ 1 and κ 2 of the Michaelis-Menten formulation derived using the QP of (5.20). These parameters are used to perform a dynamical simulation in §5.6.

Simulation
The parameters of table 1a,d were used with the bond graph model of the glycolysis and pentose phosphate pathway ( §4.1) to run simulations. In §4.1, we derived three pathways royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478   royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 within this system; these are now simulated separately here. In particular, chemostats and flowstats (as defined in §4.1) were implemented for the three cases and the initial concentrations were set to those in table 1a where known and to unit values where unknown. The simulation was performed separately for two cases: the mass-action formulation using the κ parameters and the Michaelis-Menten formulation using thek 1 andk 2 parameters. Figure 5 shows the ratios ρ R5P = f R5P /f G6P and ρ NADPH = f NADPH /f G6P of the chemostat flows corresponding to the products R5P and NADH to the chemostat flow corresponding to the substrate G6P. At steady state, these ratios correspond to the stoichiometry of the three pathways of §2.2. In particular, pathway i yields both products, pathway ii yields more R5P at the expense of NADPH and pathway iii yields more NADPH at the expense of R5P. Figure 5a,b correspond to the mass-action formulation and figure 5c,d correspond to the Michaelis-Menten formulation.
Because the two-reaction Michaelis-Menten formulation of enzyme catalysed reactions (2.8) explicitly includes the enzyme, such models can be used to examine system behaviour as enzyme levels change.

Conclusion
The formulation of dynamic simulation models for largescale biological systems remains a key challenge in systems biology. With the advent of genome-scale simulation and whole-cell modelling, there is increasing recognition of the need for a modular approach in which model components can be formulated, tested and validated independently, and then seamlessly integrated together to form a model of the whole system. However, a dynamic modelling framework which is modular and which can in principle describe the broad range of biochemical and biophysical cellular processes has been elusive.
Several authors have acknowledged the need for energetic considerations to be integrated into modelling approaches, both to ensure that models are consistent with basic thermodynamic principles, and to enable calculation of energy flows and related concepts such as efficiency [67].  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210478 Here, we have shown that thermodynamically compliant dynamic models of metabolism can be generated using the bond graph modelling approach, with the stoichiometric matrix as the starting point. Bond graphs, first advocated in the context of biological network thermodynamics by Oster et al. [34], represent both energy and mass flow through the biochemical network. Bond graphs separate the system connectivity from energy-dissipating processes (reactions), and thus are a very natural fit to network-based modelling in systems biology. As a port-based modelling approach, bond graphs are also inherently modular. Furthermore, application of bond graph modelling principles automatically endows models with a number of necessary features for large-scale modelling including modularity, thermodynamically distinguished parameters (wherein system-wide thermodynamic parameters relating to biochemical species are distinguished from reaction-specific parameters) and hence, as noted by Mason & Covert [29], improved opportunity for parameter identification from data. Energy-based modelling of biochemical reaction networks using bond graphs naturally encompasses the EBA approach [16], where we have shown that the key equations of EBA are implicit in the system bond graph. This is a powerful advantage as it means that no additional steps are required in order to satisfy thermodynamic constraints. Any model formulated as a bond graph implicitly satisfies these constraints; it is not possible to impose, or infer from data, parameters which break these constraints.
A further benefit for large-scale modelling is that bond graphs naturally lend themselves to model reduction, for example through generation of reduced-order models using pathway analysis [53,81]: any such simplified model will also satisfy the same thermodynamic constraints. This enables a hierarchical approach to modelling, and it is not necessary to model all aspects of the system at the same level of detail. Different levels of representation can be used as required, for example reflecting available knowledge and data about different parts of the system.
As noted in the Introduction, a key challenge in the development of dynamic models is the fitting of parameters to experimental data. We have shown that both mass-action kinetics and (reversible) Michaelis-Menten kinetics fall within the bond graph framework and therefore have a thermodynamically safe parameterization; moreover, it is shown this parameterization leads to a linear-in-the parameters estimation problem. Bond graphs separate the constitutive relations describing the reactions from the connectivity of the model; it is therefore possible to incorporate more complex kinetic schemes [59], including inhibition, allosteric modulation and cooperativity within the bond graph approach thus retaining thermodynamically safe parameterization. However, the resultant parameter estimation problem will not, in general, be linear-in-the parameters and will therefore require an optimization approach such as that used by K-FIT [82]. Optimization approaches such as K-FIT do not use a set of parameters that is thermodynamically safe by design, hence they need to derive additional constraints to incorporate thermodynamic consistency. Future work will examine how the thermodynamically safe parameterization induced by the bond graph approach can be used to simplify such optimization when applied to large systems and datasets.
According to Noor et al. [83] in the context of obtaining biological insights though omics data integration: 'To maximize predictive power and mechanistic insights on the molecular level, ODE simulations based on physical models of binding and catalysis remain the gold standard.' The illustrative example of this paper shows how data involving flows, concentrations and chemical potentials can be integrated using the physical model structure provided by combining stoichiometric and bond graph approaches. It is believed that this provides a basis for integrating the larger and more varied omics data becoming available. Moreover, the physical basis of the approach can be used to indicate what additional data should be gathered to fully parameterize the model.
Here, we have demonstrated that thermodynamically compliant dynamic models can be constructed starting from the stoichiometric matrix. The plethora of existing stoichiometric models for metabolic networks provides a natural starting point for this endeavour. However, while metabolic models are of central importance in a number of contexts, models of cellular physiology in general, and whole-cell models in particular, require a framework that can incorporate a much broader range of cellular processes, feedback and regulation. As a general tool for physically plausible systems modelling, bond graphs can naturally include energy compliant connections to other physical domains and processes, including transport [84], electrochemical transduction [38,39], membrane potential dynamics [41], mechanochemical transduction and photosynthesis. Furthermore, through incorporation of control-theoretic concepts, enzyme modulation and feedback control can be represented in a coherent manner [62]. There remain however several key domains of cellular biology where to our knowledge there are as yet no examples of bond graph modelling, including transcription and translation [20,85,86]. These will need to be demonstrated in order to provide a complete road map for construction of modular and thermodynamically compliant whole-cell models using bond graphs.
As r i > 0, it follows that v 1 and v 2 must either be zero or have the same sign.

A.2. Example: three-reaction cycle
Beard et al. [16] give the example of a three-reaction cycle. Figure 6b shows the corresponding bond graph. The species A, B and C are joined by three reactions A À ! À As r i > 0, it follows that v 1 and v 2 must either be zero or have the opposite sign. Alternatively, setting A, B and C as chemostats

Appendix B. Pathways: illustrative example
This example refers to §2.2. Noor [19] gives a simple illustrative example of the three types of pathway; figure 7a gives the corresponding bond graph. the reactions are The there are seven species and six reactions giving states x and flows v 2x ¼ The stoichiometric matrix is Setting A, E, ATP and ADP as chemostats, N cd is constructed by setting the corresponding rows of N to zero. The corresponding null space is three dimensional and corresponds to the three pathways (i) r1 + r2 + r3 + r4 (ii) r3 + r4 + r5 (iii) r1 + r2 + r6.
Using (2.20), the pathway stoichiometric matrix N p is Pathway reaction P1 corresponds to a type II pathway, pathway reaction P2 to a type III pathway and pathway reaction P3 to a type I pathway where A is converted to E driven by the conversion of ATP to ADP. The example is extended by assigning a set of nominal chemical potentials f to the species: The pathway reaction potentials are then computed using (2.21) as F P1 ¼ À2, F P2 ¼ 0, F P3 ¼ À1. As the potential for each pathway only depends on the species appearing in the pathway reactions, the potential of non-chemostatted species are irrelevant for this computation. In fact, the potentials of the species will correspond to the steadystate values of concentrations of the non-chemostatted species arising from the flow patterns corresponding to the chemostat potentials [87]. The pathway bond graph appears in figure 7b.

Appendix C. Glycolysis and pentose phosphate pathways
This section contains the reactions used in §4.1 to generate the three pathways arising from the upper reactions of glycolysis and the pentose phosphate pathway. The reactions are extracted as discussed in §4.
The reactions are  Figure 7. Bond graphs for illustrative example [19]. (a) Bond graph and (b) pathway bond graph.