Predicting left ventricular contractile function via Gaussian process emulation in aortic-banded rats

Cardiac contraction is the result of integrated cellular, tissue and organ function. Biophysical in silico cardiac models offer a systematic approach for studying these multi-scale interactions. The computational cost of such models is high, due to their multi-parametric and nonlinear nature. This has so far made it difficult to perform model fitting and has prevented global sensitivity analysis (GSA) studies. We propose a machine learning approach based on Gaussian process emulation of model simulations using probabilistic surrogate models, which enables model parameter inference via a Bayesian history matching (HM) technique and GSA on whole-organ mechanics. This framework is applied to model healthy and aortic-banded hypertensive rats, a commonly used animal model of heart failure. The obtained probabilistic surrogate models accurately predicted the left ventricular pump function (R² = 0.92 for ejection fraction). The HM technique allowed us to fit both the control and diseased virtual bi-ventricular rat heart models to magnetic resonance imaging and literature data, with model outputs from the constrained parameter space falling within 2 SD of the respective experimental values. The GSA identified Troponin C and cross-bridge kinetics as key parameters in determining both systolic and diastolic ventricular function. This article is part of the theme issue ‘Uncertainty quantification in cardiac and cardiovascular modelling and simulation’.


Gaussian process emulation
The employed Gaussian process emulators (GPEs) are obtained as a combination of a mean function and a zero-mean Gaussian process (GP) (for a deeper insight into GPs, see [1]):

\[ f_{\mathrm{emul}}(\mathbf{x}) = h(\mathbf{x}) + g(\mathbf{x}). \]

The mean function has the form of a linear regression model,

\[ h(\mathbf{x}) = \beta_0 + \sum_{i} \beta_i x_i, \]

while the GP is defined as

\[ g(\mathbf{x}) \sim \mathcal{GP}\big(0,\, k(\mathbf{x}, \mathbf{x}')\big), \]

where k(x, x') is a positive definite kernel. The GPE training process consists of: (a) fitting the linear regression model's coefficients to the data by minimising the residual sum of squares between the observed targets (data) and the targets predicted by the linear approximation; (b) fitting the GP's hyperparameters to the residuals (data minus the mean function's predicted mean) by maximising the log-marginal likelihood (equation (2.30) in [1]).

Table 1: GPEs' accuracy, evaluated using 5-fold cross-validation. One fifth of the training dataset is held out, and one GPE per output feature is trained on the remaining points. The held-out fifth is then used to test the GPE accuracy, and an R² score is calculated (Test score). The process is repeated for each of the five subsets (splits 1-5), randomly selected from the full training dataset. The final accuracy (Mean test score) is the average of the R² scores obtained in predicting the five different held-out parts. The LV features' labels are explained in Table 2 of the main manuscript.
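To make the two-step ("sequential") training concrete, the following is a minimal Python sketch, including the 5-fold cross-validation used for Table 1. The use of scikit-learn and the dummy data are our assumptions for illustration; the paper does not state its implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import KFold
from sklearn.metrics import r2_score

def fit_sequential_gpe(X, y):
    """(a) Fit the linear mean by least squares; (b) fit a zero-mean GP to the
    residuals by maximising the log-marginal likelihood (done by .fit())."""
    mean = LinearRegression().fit(X, y)
    residuals = y - mean.predict(X)
    gp = GaussianProcessRegressor(
        kernel=RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel())
    gp.fit(X, residuals)
    return mean, gp

def predict_gpe(mean, gp, X):
    res_mean, res_std = gp.predict(X, return_std=True)
    return mean.predict(X) + res_mean, res_std

# 5-fold cross-validation as in Table 1 (dummy data stands in for the
# simulator training set; one GPE per output feature in the real pipeline).
X_train = np.random.rand(100, 5)
y_feature = X_train.sum(axis=1)
scores = []
for tr, te in KFold(n_splits=5, shuffle=True, random_state=0).split(X_train):
    mean, gp = fit_sequential_gpe(X_train[tr], y_feature[tr])
    y_pred, _ = predict_gpe(mean, gp, X_train[te])
    scores.append(r2_score(y_feature[te], y_pred))
print("Mean test score:", np.mean(scores))
```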

History matching formalisation
This Section provides the mathematical description of the Bayesian history matching (HM) technique (implemented following Coveney et al. [2]). HM starts by defining a dataset of observations (simulator runs), and one GPE is trained for each output feature to be matched. In the first HM iteration, the testing points constituting the "not-ruled-out-yet" (NROY) space are sampled from the input parameter space through a space-filling design, in our case a Latin hypercube design (LHD).
In subsequent iterations, the testing points are sampled from the NROY space of the previous iteration. A subset of NROY points is used to run simulations, and the resulting simulator outputs enrich the training dataset, making the GPEs more accurate within the NROY space. GPEs trained on the enlarged dataset are then evaluated at the remaining points in the NROY space. If too few points remain for the user's needs (for this model: fewer than 256), additional points are generated within NROY using the "cloud" technique described in [2]. Briefly, every point in the NROY space is used to generate new points by sampling from a multivariate normal distribution centred on that point, scaled to the current range of the known NROY points, and then further scaled by a factor chosen so that only 10% of the new points are non-implausible (ensuring the new points lie sufficiently far from the generating points).
The so-called implausibility measure I is then computed for each tested point x and each output feature j:

\[ I_j(\mathbf{x}) = \frac{\left| \mathbb{E}[f_j(\mathbf{x})] - z_j \right|}{\sqrt{\operatorname{Var}[f_j(\mathbf{x})] + \sigma_j^2}}, \]

where z_j and σ_j² are the mean and variance of the j-th observed feature, and E[f_j(x)] and Var[f_j(x)] are the posterior mean and variance of the j-th GPE. The space of non-implausible points is defined by the condition max_j I_j(x) < I_threshold, and the space of implausible points by the reversed inequality. The non-implausible points constitute the new NROY space for the next iteration. The iterations continue, progressively scaling down the I_threshold constant, until the NROY space converges, i.e. the percentage of NROY space out of the total tested point space is close to zero. The value of I_threshold is a measure of the accepted discrepancy between the emulator and the observations, and was updated between waves. I_threshold was manually initialised to ensure that a non-empty non-implausible set was obtained after the first wave: starting from a value of 3.0, we incremented it by 0.5 until a non-empty non-implausible set was achieved, giving I_threshold values of 5.5 and 5.0 for the SHAM and AB models' fitting, respectively. The value of I_threshold was then decremented by 0.5 with each HM wave, down to the commonly used (see e.g. [3,4]) value of 3.0 (the so-called Pukelsheim 3-sigma rule [5]). The decrement rate of 0.5 was selected to reach the desired final value of 3.0 within a reasonable number of waves, maintaining computational tractability while allowing us to initialise the HM over a sufficiently large search space.
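A minimal sketch of the NROY test at one wave is given below. It assumes per-feature GPEs exposing a scikit-learn-style predict(X, return_std=True) interface (our assumption), and uses SciPy's Latin hypercube sampler for the first-wave space-filling design; bounds and sizes are placeholders.

```python
import numpy as np
from scipy.stats import qmc

def nroy_points(emulators, z, sigma2, X, threshold=3.0):
    """Keep the points of X that are non-implausible for ALL output features.
    z[j] and sigma2[j] are the observed mean and variance of feature j."""
    I = np.empty((X.shape[0], len(emulators)))
    for j, gpe in enumerate(emulators):
        mu, std = gpe.predict(X, return_std=True)
        I[:, j] = np.abs(mu - z[j]) / np.sqrt(std**2 + sigma2[j])
    return X[I.max(axis=1) < threshold]

# First wave: space-filling LHD over the plausible parameter bounds.
lower, upper = np.zeros(5), np.ones(5)          # placeholder bounds
sampler = qmc.LatinHypercube(d=5, seed=0)
X_test = qmc.scale(sampler.random(n=100_000), lower, upper)
# X_nroy = nroy_points(emulators, z, sigma2, X_test, threshold=5.5)
```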

History matching target values
This Section summarises the target organ-scale features' mean and standard deviation values for HM, which in our study come from two different sources. Whenever information about a specific organ-scale feature could not be obtained from the available MRI data [6], it was taken from literature data from experimental studies performed in male rats at body temperature with abdominal/ascending aortic banding [7,8,9,10,11,12,13,14,15]. In particular, the EDV, ESV, ET and IVRT features were directly available from the MRI data, while information about the PeakP, maxdP and mindP features was collected from the literature. Mean and standard deviation values for the volume-related features come from recordings performed on the cohort of rats (n = 8 and n = 15 from the SHAM and AB groups, respectively) that underwent MRI examination [6]. Mean and standard deviation values for each pressure-related feature were calculated from the set of experimental mean values collected from each of the above-mentioned literature studies. We did not have specific values for IVCT, Tdiast, Tpeak and ESP, and we chose not to match EF explicitly, as it is derived from EDV and ESV. This resulted in 7 organ-scale phenotypes (see Table 2) used to constrain the model parameters.

History matching parameter bounds
This Section provides detailed information about the input parameters of the SHAM and AB rat heart contraction models. In the first wave of HM, we used an LHD sampling of the plausible parameter space. The bounds of the plausible parameters are defined from prior experimental and modelling studies, summarised in Tables 3-4.
The same parameter bounds are used in both the SHAM and AB models to avoid bias.

Table 4: AB input parameters' ranges. The range for each parameter in HM is given by the union of the experimental and modelling papers' ranges, intersected with the range resulting from the SHAM rat phenotype (Table 3).
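To make the Table 4 construction rule concrete, here is a tiny sketch with hypothetical numbers; the actual ranges are those reported in Tables 3-4.

```python
def union(ranges):
    """Smallest interval containing all the given (lo, hi) ranges."""
    return (min(lo for lo, hi in ranges), max(hi for lo, hi in ranges))

def intersect(a, b):
    """Overlap of two (lo, hi) ranges."""
    return (max(a[0], b[0]), min(a[1], b[1]))

# Hypothetical numbers for one parameter, for illustration only.
literature = [(0.8, 1.4), (1.0, 1.9)]   # experimental + modelling papers
sham = (0.5, 1.6)                       # range from the SHAM phenotype
ab_range = intersect(union(literature), sham)   # -> (0.8, 1.6)
```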

Using the HM technique, we fitted the SHAM and AB rat models to match the experimentally observed LV features. The input parameter space was reduced towards different regions according to the rat phenotype: two different regions of the high-dimensional space were obtained even though the process started from the same hypercube for both the SHAM and AB rat models. Table 5 summarises the final ranges of the mechanics-related parameters after the tuning.

History matching verification using synthetic data
In this Section, we show the efficacy of the HM technique in fitting the SHAM rat heart model to synthetic data. We started from the HM previously performed on the SHAM model (Figure 1) and selected an intermediate wave, specifically wave 4 (Figure 2), out of the 8 waves the full process took to complete. We then sampled points in wave 4 (Figure 3) that did not belong to the final wave, namely wave 8. We simulated all these points (n = 48) and computed the mean and standard deviation of the observed sample for each of the organ-scale LV features, to be used as the synthetic data to match. We then started the HM process from the same initial space-filling design (lightest blue points in Figure 1) used for the SHAM model HM, this time trying to match the synthetic data instead of the experimental data.
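The synthetic targets amount to a per-feature mean and SD over the 48 simulated points; a minimal sketch follows (the array shape and random values are placeholders for the actual simulator outputs).

```python
import numpy as np

# feature_values: (48, 12) array of simulated LV features, one row per point.
feature_values = np.random.rand(48, 12)             # placeholder for simulator runs
synthetic_mean = feature_values.mean(axis=0)        # per-feature target mean
synthetic_std = feature_values.std(axis=0, ddof=1)  # per-feature target SD
# These means/SDs then replace the experimental targets in the HM pipeline.
```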
To replicate exactly the process we performed with the experimental data, we tried to match the same 7 features out of the 12 features of interest, although mean and standard deviation values were available for all the LV features. Figure 4 shows that all the LV features were perfectly matched at the end of the HM process (which took 9 waves to complete), including the 5 features we did not try to match. Moreover, we observed that the input parameters' ranges were, in most cases, restricted to regions having a nonempty intersection with the ranges from which the synthetic data were sampled (Figure 5).

Figure 5: Input parameters' ranges at each wave during SHAM rat heart model history matching on synthetic data (blue). The parameters' ranges from which the synthetic data were sampled are also shown (red).

Investigating different emulation techniques
In Section 2(e) of the main manuscript, we used a specific formulation that describes the GPE as a combination of a mean function in the form of a linear regression model and a zero-mean GP. Although this formulation served well in the context of HM (as previously shown by others, e.g. [3,4]), being fast and easy to repeatedly train in consecutive waves, it is still a heuristic formulation and can potentially lead to underestimating the surrogate model's uncertainty in the predictions. In this Section, we investigate a different emulation strategy, as presented in the work of Oakley and O'Hagan [47] and used for HM in [2]. The main difference from the emulation strategy used so far (described in the Section "Gaussian process emulation") is that the linear regressor's coefficients are fitted concurrently with the GP's hyperparameters by maximum likelihood. This approach can potentially lead to a better representation of the surrogate model's uncertainty.
To investigate the possible underestimation of the emulator's variance, we evaluated how the choice of a different emulation strategy could impact the HM implausibility condition and consequent high-dimensional parameters' space reduction. From now on, we will refer to the emulators trained with the strategy employed throughout this entire work as "sequential GPEs", while we will refer to the Oakley and O'Hagan [47] emulation strategy (currently under study) as "simultaneous GPEs". The following analysis has been performed for both the SHAM and AB rat models.
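To make the distinction concrete, the sketch below contrasts the two fitting strategies. GPy is our choice of library for illustration (the paper does not name one), and the dummy data are placeholders; the "simultaneous" model fits the linear mean coefficients and the RBF kernel hyperparameters jointly, while the "sequential" model fits them in two steps.

```python
import numpy as np
import GPy

X = np.random.rand(50, 5)
Y = X.sum(axis=1, keepdims=True)        # dummy training data
d = X.shape[1]

# "Simultaneous" GPE: linear mean and kernel hyperparameters optimised
# jointly by maximising the marginal likelihood.
simultaneous = GPy.models.GPRegression(
    X, Y,
    kernel=GPy.kern.RBF(input_dim=d, ARD=True),
    mean_function=GPy.mappings.Linear(input_dim=d, output_dim=1),
)
simultaneous.optimize()

# "Sequential" GPE: least-squares linear fit first, then a zero-mean GP on
# the residuals (as in the "Gaussian process emulation" Section above).
A = np.c_[np.ones(len(X)), X]
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
residuals = Y - A @ beta
sequential = GPy.models.GPRegression(
    X, residuals, kernel=GPy.kern.RBF(input_dim=d, ARD=True))
sequential.optimize()
```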
We trained simultaneous GPEs (with a linear regression mean and an RBF kernel) using the HM initial training dataset used for the sequential GPEs (first wave). For better visualisation, an LHD of 1 million points was sampled in the initial parameter hypercube and tested against the implausibility criterion (equation (5)) using both the sequential and the simultaneous GPEs, with the same thresholds as in the original analysis (5.5 and 5.0 for SHAM and AB, respectively). We then repeated the same process for the last wave: simultaneous GPEs were trained on the last wave's training dataset derived using the sequential GPEs. Another 1-million-point LHD was sampled, this time in the second-to-last wave's NROY space, and tested against the implausibility criterion using both the sequential and the simultaneous GPEs, with the threshold set to 3.0.
In Figures 6-7, the implausibility measures obtained by testing all the points in the first wave's LHD with the two different emulation strategies are plotted, for both rat models, according to their magnitude in a hexbin plot. We can compare the reduction of the high-dimensional input parameter space both by visual inspection and by computing the percentage of space removed when splitting it into the wave-1 non-implausible (NIMP) and implausible (IMP) regions. We can notice that, for both rat models, a larger portion of the space is cut using the sequential GPEs (B) compared with the simultaneous GPEs (A). For the SHAM rat, the parameter space is reduced by 98.9134% and 99.9117% for the sequential and the simultaneous GPEs, respectively. For the AB rat, the parameter space is reduced by 99.4345% and 99.918% for the sequential and the simultaneous GPEs, respectively.
At the last wave, however, the situation is different. At this point, the accuracy of both GPEs has improved thanks to denser sampling in the region of interest, and under these conditions the simultaneous and sequential GPEs behave similarly. In Figures 8-9, the implausibility measures obtained by testing all the points in the last wave's LHD with the two emulation strategies are plotted in the same manner as before. We can notice that, for both rat models, the space is split into non-implausible and implausible regions in a very similar way. Again, inspecting the ratio values quantifies how the high-dimensional space is split: 99.5202% and 99.1696% for the simultaneous and sequential GPEs, respectively, for SHAM, and 97.187% and 97.9942% for the simultaneous and sequential GPEs, respectively, for AB.
We can conclude that, given the larger space cut, there might be a non-negligible underestimation of the emulator's variance in the first wave when using the sequential GPE formulation. Smaller variances decrease the denominator in equation (4), thereby increasing the implausibility measure of specific points, which may consequently be deemed implausible and discarded. However, with each HM wave the emulator becomes progressively more accurate, the underestimation of the uncertainty is reduced, and the two GPE formulations converge to the same space. This was further assessed by comparing the one-dimensional parameter ranges obtained using the sequential GPEs with those obtained using the simultaneous GPEs, summarised in Figure 10. We can see that the parameter ranges resulting from the last wave are very similar for both rat models.
We have provided an initial test of how differences in GPE formulations impact HM solutions; a comprehensive evaluation is beyond the scope of this study. Different groups are using the two GPE formulations described here ([2,4,48,49,50]), and determining the costs and benefits of each approach in specific contexts would prove useful.

Global sensitivity analysis
This Section summarises the GP emulator-based global sensitivity analysis results. The first- and second-order interactions and total effects of the input parameters for each organ-scale LV feature under study are shown in Figures 12-15. Figure 11 provides an example of how to read the network charts in Figures 12-15; it illustrates, for example, that the most significant second-order interaction is given by the ca50-koff pair of parameters.

Figure 11: Example network charts. The two charts illustrate the input parameters' contributions to a specific LV feature's total variance in the SHAM (A) and AB (B) models. For each input parameter, the size of the node represents the first-order effect, the thickness of the linking line represents the second-order effect, and the node border thickness represents the total effect.

Sensitivity indices estimation accounting for emulator uncertainty
In Section 2(h) of the main manuscript, we mentioned that the global sensitivity indices were estimated by neglecting the emulator's uncertainty and using the emulator's posterior mean as a point-wise approximation of the model's output. This was acknowledged as a limitation (discussed in Section 5), partially motivated by the fact that the emulator uncertainty around the predictions was small.
In this Section, we provide additional evidence that this choice does not affect the final sensitivity index estimates.
The Sobol sensitivity indices were estimated using the Saltelli method [51], which efficiently estimates first- and second-order and total effects using full model evaluations at a specific set of points belonging to low-discrepancy quasi-random Sobol sequences. Without going into detail, suppose that the Saltelli estimator can be described by a function S that takes as input the model f evaluations Y at specific points X (Y = f(X)) and outputs the sensitivity indices S_i, S_ij, S_T:

\[ (S_i,\, S_{ij},\, S_T) = S(Y), \qquad Y = f(X). \]

In this study, we substituted the full model f with a surrogate model consisting of a set of 12 Gaussian process emulators,

\[ f \approx f_{\mathrm{emul}} = (f_{\mathrm{emul},\,1}, \dots, f_{\mathrm{emul},\,12}), \]

and used the emulator posterior mean to estimate the Sobol indices, namely

\[ (S_i,\, S_{ij},\, S_T) = S\big(\mathbb{E}[f_{\mathrm{emul}}(X)]\big). \]

To show that small uncertainties do not affect the sensitivity index estimates, we performed an additional example simulation in which we took the emulator posterior variance into account when estimating the global sensitivities of the SHAM rat model ejection fraction (EF) to the model parameters. This was done by evaluating the Saltelli estimator S at full emulator posterior predictions Ŷ, which correspond to values randomly sampled from a multivariate normal distribution with mean vector given by the posterior mean of the EF feature's emulator (f_emul,3), μ = E[f_emul,3(X)], and covariance matrix given by the same emulator's posterior covariance, Σ = Cov[f_emul,3(X)]:

\[ \hat{Y} \sim \mathcal{N}(\mu, \Sigma), \qquad (S_i,\, S_{ij},\, S_T)_k = S(\hat{Y}_k). \]

We sampled 1,000 emulator posterior predictions (Ŷ_k, for k = 1, ..., 1000) and compared the obtained sensitivity indices (S_i, S_ij, S_T)_k with those obtained when only the emulator posterior mean was used (equation (8)). In Figure 16, each sensitivity index's distribution, with a 2 SD band, is plotted against the single value obtained when only the posterior mean is used. We can see that the distributions' means almost perfectly match the single-valued effects for each sensitivity index. The similarity of the sensitivities is consistent with a limited impact of GPE uncertainty on the GSA calculation.
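A hedged sketch of this comparison using SALib's Saltelli sampler and Sobol analyser follows (our choice of tooling; the paper only cites Saltelli's method [51]). The emulator posterior is replaced by a dummy mean and covariance, and sample sizes are kept small for illustration.

```python
import numpy as np
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {"num_vars": 5,
           "names": [f"p{i}" for i in range(5)],
           "bounds": [[0.0, 1.0]] * 5}            # placeholder bounds

X = saltelli.sample(problem, 64)                  # quasi-random Saltelli design

# Dummy stand-in for the EF emulator's posterior mean and covariance at X.
mu = X.sum(axis=1)
cov = 1e-4 * np.eye(len(X))

# (a) Point estimate: Sobol indices from the posterior mean only.
Si_mean = sobol.analyze(problem, mu)

# (b) Uncertainty-aware estimate: repeat the analysis on posterior draws Y_k.
rng = np.random.default_rng(0)
draws = [sobol.analyze(problem, rng.multivariate_normal(mu, cov))
         for _ in range(200)]
S1 = np.array([d["S1"] for d in draws])           # distribution of first-order
print("posterior-mean S1:", Si_mean["S1"])
print("draw-averaged  S1:", S1.mean(axis=0), "+/- 2 SD:", 2 * S1.std(axis=0))
```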

Modelling heart tissue mechanics
In this Section we provide a description of the employed mathematical model of heart tissue mechanics.
Tissue models of cardiac mechanics link cellular active contraction models through to whole-organ pump function. We modelled the mechanical properties of the cardiac tissue using non-linear solid mechanics. Like the electrical properties, the tissue's mechanical properties depend on the orientation of the cardiac tissue microstructure. Therefore, in the following description, tensors will be given in terms of a coordinate system locally aligned with the muscle fibre (f), sheet (s) and sheet-normal (n) directions in the reference configuration, determining the orthonormal matrix L = [f s n].
In finite elastic deformation theory, the transformation from the undeformed (X) configuration to the deformed (x) configuration is described by means of the deformation gradient tensor:

\[ \mathbf{F} = \frac{\partial \mathbf{x}}{\partial \mathbf{X}}. \]

Large deformation mechanics equations enforce the stress equilibrium

\[ -\mathrm{Div}(\mathbf{F}\,\mathbf{T}) = \mathbf{0}, \]

where T is the second Piola-Kirchhoff stress tensor. The solution of the mechanics problem requires a description of the material behaviour, known as the constitutive law, which gives the stress tensor T as a function of the Green strain tensor E and the right Cauchy-Green tensor C of the material. The strain tensor is given as a function of the deformation gradient tensor,

\[ \mathbf{E} = \tfrac{1}{2}\big(\mathbf{F}^{\mathsf{T}}\mathbf{F} - \mathbf{I}\big) = \tfrac{1}{2}(\mathbf{C} - \mathbf{I}), \]

and is transformed to the fibre-aligned coordinate system using E_fsn = LᵀEL. The stress tensor T_fsn is determined from the constitutive law, which is generally expressed through a strain energy function W,

\[ \mathbf{T}_{fsn} = \frac{\partial W}{\partial \mathbf{E}_{fsn}}, \]

and is then transformed back using T = L T_fsn Lᵀ. In this work, we used the transversely isotropic cardiac strain energy function proposed by Guccione et al. [52], combined with a Lagrange multiplier scheme with a compressibility penalty term [53,54]:

\[ W = \frac{c_1}{2}\left(e^{Q} - 1\right) + p\,(J - 1) + \frac{\kappa}{2}\,(J - 1)^2, \]

where

\[ Q = c_2 E_{ff}^2 + c_3\big(E_{ss}^2 + E_{nn}^2 + E_{sn}^2 + E_{ns}^2\big) + c_4\big(E_{fs}^2 + E_{sf}^2 + E_{fn}^2 + E_{nf}^2\big), \]

and where κ, c_1, c_2, c_3, c_4 ∈ ℝ⁺, p is the hydrostatic term and J = det(F). The final second Piola-Kirchhoff stress tensor is obtained from the strain energy function by adding an active tension component T_a (calculated using the Land et al. [16] model of cellular contraction) along the fibre direction f:

\[ \mathbf{T} = \frac{\partial W}{\partial \mathbf{E}} + T_a\, \mathbf{f} \otimes \mathbf{f}. \]

T expresses the stresses in actively contracting incompressible cardiac tissue in terms of its strain, and completes the set of equations required to model actively contracting cardiac tissue. The resulting system of equations is solved using the finite element method.

LV features extraction
In this Section, we describe how the 12 left ventricular (LV) features under study were extracted from the model output.
The multi-scale model of rat heart contraction is run for a given input parameter set for 4 heart beats. The 12 key LV features are then extracted from the fourth beat's volume V(t) and pressure P(t) curves, in terms of the positive time points t_0, t_1, t_2, t_3, t_4, t_5 shown in Figure 17; for example, the time to peak pressure is

\[ T_{\mathrm{peak}} = \operatorname*{arg\,max}_{t>0} P(t) \;(= t_5), \]

with the corresponding peak pressure PeakP = P(t_5).
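For illustration, the sketch below extracts a few of these features using standard definitions (EDV/ESV as the volume extrema, EF derived from them, dP/dt extrema). These are our assumptions; the paper's exact timing conventions (t_0..t_5, Figure 17) may differ.

```python
import numpy as np

def extract_features(t, V, P):
    """Extract a subset of the LV features from one beat's V(t), P(t) traces."""
    dPdt = np.gradient(P, t)
    EDV, ESV = V.max(), V.min()
    return {
        "EDV": EDV,                        # end-diastolic volume
        "ESV": ESV,                        # end-systolic volume
        "EF": 100.0 * (EDV - ESV) / EDV,   # ejection fraction, %
        "PeakP": P.max(),                  # peak pressure
        "Tpeak": t[P.argmax()],            # time of peak pressure (t5)
        "maxdP": dPdt.max(),               # max rate of pressure rise
        "mindP": dPdt.min(),               # max rate of pressure fall
    }

# Example with a synthetic beat (toy waveforms, not model output):
t = np.linspace(0.0, 0.2, 400)                    # s
P = 120.0 * np.sin(np.pi * t / 0.2) ** 2          # mmHg
V = 0.6 - 0.35 * np.sin(np.pi * t / 0.2) ** 2     # mL
print(extract_features(t, V, P))
```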