# An experimental design tool to optimize inference precision in data-driven mathematical models of bacterial infections *in vivo*

## Abstract

The management of bacterial diseases calls for a detailed knowledge about the dynamic changes in host–bacteria interactions. Biological insights are gained by integrating experimental data with mechanistic mathematical models to infer experimentally unobservable quantities. This inter-disciplinary field would benefit from experiments with maximal information content yielding high-precision inference. Here, we present a computationally efficient tool for optimizing experimental design in terms of parameter inference in studies using isogenic-tagged strains. We study the effect of three experimental design factors: number of biological replicates, sampling timepoint selection and number of copies per tagged strain. We conduct a simulation study to establish the relationship between our optimality criterion and the size of parameter estimate confidence intervals, and showcase its application in a range of biological scenarios reflecting different dynamics patterns observed in experimental infections. We show that in low-variance systems with low killing and replication rates, predicting high-precision experimental designs is consistently achieved; higher replicate sizes and strategic timepoint selection yield more precise estimates. Finally, we address the question of resource allocation under constraints; given a fixed number of host animals and a constraint on total inoculum size per host, infections with fewer strains at higher copies per strain lead to higher-precision inference.

### 1. Introduction

Bacterial infections remain leading causes of mortality and morbidity worldwide, accounting for greater than 50% of annual deaths in vulnerable demographic groups such as the elderly, young children and immunocompromised individuals, and disproportionately afflicting resource-constrained regions [1]. At the same time, the rapid emergence of antibiotic resistance [2] and tolerance [3] are swiftly incapacitating an increasing number of clinically licenced antibiotic agents. Research and development of both novel antibiotics and vaccines is slow, leaving us exposed to untreatable, chronic and recrudescent bacterial infections.

The establishment, progression and outcome of a bacterial infection in a host is shaped by the dynamics of bacterial transmission, colonization, within-host replication, death, inter-tissue migration and the interaction of bacteria with the immune system or administered therapeutic agents. Deciphering each of these factors individually can accelerate the transition from empirical to targeted engineered approaches in bacterial treatment and prophylaxis [4]. To aid the characterization of these dynamics, we often combine high-resolution experimental techniques with inferential mathematical models that fill in the gaps with regard to processes that remain experimentally unobservable [5].

In the field of bacterial dynamics, isogenic-tagged strains (ITS) constitute one of the most widely used experimental techniques to obtain *in vivo* data about the dynamic behaviour of bacteria in the host [6]. ITS-based data have been historically paired with mathematical models to maximize the biological insight they can offer [7,8]. By complementing experimental data with mechanistic mathematical models, we go beyond the mere characterization of statistical differences and correlations in the data, to formulation of biological hypotheses into mathematical relations and inference-making by fitting the models to the experimental observations. Representative examples of integrating such models with ITS-based data include studies identifying bottlenecks during the colonization and infection process [9–12], mapping out the host immune responses underpinning defence at different stages of the infection [11,12], deciphering the effects of host adaptation in subsequent infections [13,14], comparing the effects of different vaccine formulations [4] and antibiotic therapies [10,15] on infection control, characterizing the within-host dynamics of co-infections [9] and understanding the directionality of bacterial migration and de novo tissue colonization *in vivo* [10,12].

The aim of statistical inference is to estimate biological parameters as accurately and precisely as possible. Precision is typically reported in the form of confidence intervals [4,10,13,15], which are instrumental to assess the evidence for or against hypotheses such as whether bacteriostatic or bactericidal affects different stages in the infection process. Quantifying parameter estimate uncertainty is also important when we use a parametrized model for predictive purposes, as it allows us to predict the range of the potential future trajectories of the infection. To enable predictions and comparisons to be as clear-cut and distinguishable as possible, it is desirable to obtain parameter estimates with the narrowest possible confidence intervals.

Although mathematical models have been increasingly used in the study of *in vivo* bacterial population dynamics, their application has until recently remained limited to inference in simple biological systems for short time-interval post-inoculation (typically in the range of 0–12 h) largely owing to computational inefficiencies [4,10,11,13]. A recently developed statistical inference tool customized for making inferences based on experimental data from studies using ITS [16] has enabled inference in more complex biological systems, with larger inocula, and for the entirety of the infection timeline [15]. This moment-based, divergence-minimization approach represents a computationally efficient inference tool suitable for complex biological models, where traditional maximum-likelihood approaches become too computationally expensive.

To date, we know of no published studies providing executable algorithms for extending the application of data-based mechanistic models to experimental design in microbiology studies on within-host pathogen dynamics. The availability of a computationally efficient inference method now paves the way for an early iterative integration of mathematical modelling with data collection at the experimental design stage; this can optimize the inferential capacity of models and maximize the biological insights they generate (figure 1) [5,16]. Model-driven experimental design can also address the long-standing challenge of resource allocation, especially with regard to the use of animals in experiments. In particular, the ‘Three Rs’ tenet—replacement, reduction and refinement [17]—calls for ‘Reduction Alternatives' encouraging strategies to reduce the number of animals needed to provide answers to scientific questions. Maximizing the information obtained per animal has been recognized as one avenue to achieve [18]. In this paper, we show that with experimental design tools, it is possible to evaluate the information content of alternative designs and identify those with the maximum information content per animal host.

In this paper, we extend the inference methodology to improve experimental designs and provide a case study as a blueprint of how it can be applied to data from ITS-based studies. As ITS remains one of the key experimental techniques in the field of within-host bacterial dynamics [19–30], the development of dedicated tools to facilitate the integration of mathematical modelling along the experimental process can maximize biological insight by improving the statistical quality of inferences for a substantial body of research. A unique feature of ITS-based studies is the multiplicity of infections with different tagged strains within each host organism. Under the assumptions of independent action for bacteria and homogeneity in dynamics both between ITS and host organisms, both of which are assumptions commonly made in ITS-based studies [4,11–14], the number of biological replicates is equal to the number of ITS per host multiplied by the number of hosts. Experiments produce measurements of the number of copies of each ITS in anatomical compartments of interest at the time that each host animal is killed, and inference about the within-host dynamics ‘fills in’ the gap between the sampled timepoints.

Taking these features of ITS infections into account, we provide the first application of mathematical modelling as a tool to improve experimental design in studies of *in vivo* bacterial population dynamics using ITS. These studies commonly address questions around the dynamical composition of bacterial populations jointly shaped by environmental and biological processes such as bacterial replication, death, migration, phenotypic switch and bacterial–host interactions. We model bacterial kinetics with a continuous-time Markovian process where the state variables are the number of copies of a single ITS in each anatomical compartment. Given an experimental design, the model predicts the dynamics of the lower moments (mean, variance and covariance) of the state variables. Parameter inference is based on minimizing the Kullback–Liebler (KL) divergence between the predicted and observed moments (the latter being calculated from biological replicates, i.e. all ITS from all mice at each timepoint). First, we formulate a utility function inspired by the moment-based, divergence-minimization approach [16] that represents our criterion for optimality; the objective is to maximize this function across the design space. As we are interested in maximizing global parameter inference precision, we focus on optimizing the determinant of the parameter variance–covariance matrix. We use synthetic data to study its performance against the width of the corresponding 95% confidence intervals for a range of parameter sets. Finally, we illustrate its application by considering experimental designs for a variety of biological scenarios reflecting different phases of bacterial dynamics including exponential growth and decay as well as cases dominated by unidirectional inter-tissue bacterial migration.

### 2. Methods

#### 2.1. The isogenic-tagged strain technique: experimental output and dataset structure

ITS represents a large class of marker-based techniques, developed over the last decade, in order to study the within-host bacterial dynamics at a population level. Libraries of ITSs are generated by uniquely modifying the non-coding genome of bacteria with distinctive, short nucleotide sequences; the resulting ITS are genetically distinguishable, yet phenotypically identical. The ITS are mixed in equiproportional inocula and administered to the animal host. Animals are culled at predetermined timepoints to harvest or partly sample tissues of interest, and a total bacterial count is obtained per tissue [9,11]. The samples are also processed to determine the ITS composition in each tissue, originally by quantitative polymerase chain reaction (qPCR) [9,10,12] and more recently by next-generation sequencing [8,25]. The final structure of the dataset consists of the number of copies per ITS in each tissue of interest per timepoint.

By obtaining snapshots of spatio-temporal data recapitulating the changing ITS composition of bacteria in different tissues, it is possible to characterize the unobserved processes that underpin the dynamics of the infection *in vivo*. Supplementing these data with compartmental mechanistic models, the rates at which these unobserved dynamical processes evolve can be quantified with optimizable precision. We proceed to show how the inference precision can be optimized for this type of experimental data, by quantifying the impact of a range of design factors that we intuitively expect to affect inference precision.

#### 2.2. Conceptual model structure

Mechanistic models applied to ITS data typically have a compartmental structure, with each compartment representing a tissue of interest, and model parameters corresponding to the rates at which the unobserved biological processes change with time. In this paper, we use a radial model structure with three compartments, inspired by the model used in *Salmonella* Typhimurium infections [4,11,16], and shown as an example in figure 2. Linear compartmental model structures or a combination are also possible, and the model can be generalized to include *n* compartments [16].

#### 2.3. A likelihood-free inference framework for complex dynamical systems

As we are usually interested in tracking the evolution of infections over a period of days, and as bacterial infections are typically multiorgan, data collected from ITS-based experiments are often complex. Their complexity lies in (i) the length of observational time for the biological system in question, (ii) the number of tissues of interest, and (iii) the multiple organ–organ interactions. The ITS composition across all tissues of interest is summarized in a vector of moments (means, variances and covariances) corresponding to each observation time. Assuming first-order dynamics for each time interval between sequential points of data collection, a simple mathematical expression that derives the moments at a future timepoint can be obtained and solved analytically. The moments of the experimental dataset and the calculated moments for the same timepoint are compared via the KL divergence. Minimization of this divergence via standard optimization routines allows us to quantify processes underpinning the change in bacterial numbers and composition during a time interval of interest. A full set of computational tools and examples on how to use this inference framework can be freely accessed at: https://github.com/orestif/SPEEDI.R. For a more detailed description of the likelihood-free framework, see the electronic supplementary material.

#### 2.4. Parametric bootstrap to evaluate inference precision

Parameter inference would not be complete without a measure of its precision. Quantification of precision in this framework is carried out using a parametric bootstrap approach previously described [16]. Having identified the best parameter estimate by minimizing the KL divergence between the experimental and calculated moments for a range of parameter sets, we use it to simulate 200 synthetic datasets of the same size and structure as the experimental data. The choice of 200 bootstrapped samples was motivated by our priority to maximize the computational efficiency of the suggested tool and is in line with previous findings on the minimum adequate number of bootstrapped samples [31]. The corresponding moments are calculated, and the best parameters inferred via the same divergence-minimization method. This process results in a range of parameter estimates, which provides us with a measure of precision regarding the inference for each parameter.

#### 2.5. Deriving optimality criteria in a likelihood-free inference framework

We now turn to formulating an optimality criterion in this likelihood-free framework, inspired by the gold-standard alphabetical optimality criteria in the maximum-likelihood framework [32].

In the maximum-likelihood estimation (MLE) context, the Fisher Information Matrix (FIM) plays a key role in assessing parameter inference precision [33]. It summarizes the amount of information contained in the experimentally observed dataset with regard to the parameter set $\theta $ = {*θ*_{1}, *θ*_{2}, *θ*_{3}, …, *θ _{n}*}. The FIM can be analytically derived from the Hessian matrix ($H(\theta )$), a matrix of second-order derivatives of the likelihood $L(\theta )$ with respect to the parameters:

For a given MLE, the FIM contains information about the curvature of the likelihood. A higher value in the corresponding entry in the FIM indicates that the estimate for that is more precise (which can be expressed as a shorter confidence interval). Furthermore, by assessing different functions of the FIM (e.g. the determinant or trace), it is possible to derive information about different aspects of the inference quality provided by each experimental design; these ways of assessing different aspects of the inference quality of a model given an experimental design are known as alphabetical optimality criteria [32]. For example, the determinant of the FIM provides an index of average size (e.g. width, area and volume) of the confidence interval for the parameters of interest (*D*-optimality criterion), while the trace of the FIM summarizes the average variance of each parameter of interest (*A*-optimality criterion) [32]. Other composite functions of the FIM lead to the full range of the alphabetical optimality criteria.

An important property of the FIM is its inverse relationship with the variance–covariance matrix of *Θ* via the Cramer–Rao lower bound [34]. In §2.3, we used a parametric bootstrap approach to estimate the variance–covariance matrix of the parameter estimates; let that be *M*. As the inverse of *M* is proportional to the FIM, we can define equivalent optimality criteria by minimizing functions of *M*, rather than maximizing functions of the FIM. Analytical evaluation of the FIM is often difficult or impossible [35], but instead typically approximated numerically as we propose to do with the minimum-divergence inference method. Here, we focus on minimizing the determinant of *M*, which serves as the global measure of inference precision.

#### 2.6. Identifying experimental designs with higher information content

The FIM summarizes the information content of an experiment as a function of both the model parameters *Θ* and the design conditions under which the experiment is conducted. As such, it can be used to identify experimental designs that yield data with a higher information content. To achieve this, we first identify which aspects of the experimental process we are interested in optimizing. Here, we focus on the number of biological replicates, the initial number of copies per ITS and the choice of sampling timepoints.

Next, we place sensible bounds on the parameter space to be explored. This can be done through expert elicitation, based on prior literature or understanding of the system, or by using inferences made on previous experiments on the same or a similar system.

For the scenarios we consider in §§3.1, 3.2.1 and 3.2.2, we randomly draw parameters from a uniform distribution $\theta \sim U(0,\hspace{0.17em}1)$. In §3.2.3, we explore six biological patterns of within-host dynamics commonly occurring in *in vivo* bacterial infections, which we list below. For each of the scenarios A1, A2, B1, B2, C1 and C2 and keeping the total number of replicates fixed at 500, we consider 10 parameter sets, and for each parameter set three numbers of copies per ITS in {10, 100, 1000} and eight sampling timepoints reflecting early, mid-, late and a combination of early and late sampling in: *t* ∈ {(1 ,4), (2, 5), (10, 13), (11, 14), (19, 23), (20, 24), (1, 23), (2, 24)}.

To provide an example of how these sampling strategies would be implemented for a time interval of 24 h, a group of animal hosts would be euthanized at 1 h and a separate group at 4 h post-inoculation (1, 4). In the next experimental design, a group of animal hosts would be euthanized at 2 h and a separate group at 5 h post-inoculation (2, 5).

**A. Exponential growth (net growth = +0.2, minimal inter-tissue migration):**

A1. exponential growth with high killing and high replication rates (higher variance in data)

$${k}_{i}\sim U(2,\hspace{0.17em}2.5),{r}_{i}={k}_{i}+0.2,$$A2. exponential growth with low killing and low replication rates (lower variance in data)

$${k}_{i}\sim U(0.1,\hspace{0.17em}0.6),{r}_{i}={k}_{i}+\mathrm{0.2.}$$

**B. Exponential decay (net growth = −0.2, minimal inter-tissue migration):**

B1. exponential decay with high killing and high replication rates (higher variance in data)

$${k}_{i}\sim U(2,\hspace{0.17em}2.5),{r}_{i}={k}_{i}-0.2,$$B2. exponential decay with low killing and low replication rates (lower variance in data)

$${k}_{i}\sim U(0.3,\hspace{0.17em}0.8),{r}_{i}={k}_{i}-\mathrm{0.2.}$$

**C. Source–sink unidirectional bacterial transfer with near-zero intra-organ growth:**

C1. source–sink with high killing and high replication rates (higher variance in data)

$${k}_{i},{r}_{i}\sim U(2,\hspace{0.17em}2.5),\hspace{0.17em}{m}_{1j}\sim U(1.5,\hspace{0.17em}2),$$

C2. source–sink with low killing and low replication rates (lower variance in data)

$${k}_{i},{r}_{i}\sim U(0.1,\hspace{0.17em}0.6),\hspace{0.17em}{m}_{1j}\sim U(1.5,\hspace{0.17em}2).$$

### 3. Results

#### 3.1. A simulation study to correlate the determinant of the parameter variance–covariance matrix with precision in parameter inference

In this section, we perform a simulation study to demonstrate that the magnitude of the determinant of the parameter variance–covariance matrix is a reliable summary statistic for the precision of inference across the entire parameter set. We start with the model structure outlined in §2.2 and generate 10 random parameter sets. We then simulate 10 synthetic datasets for each parameter set (10 × 10 = 100 in total); the synthetic datasets serve as surrogates for experimentally observed ITS-based data. For each of the 100 virtual datasets, we obtain 200 bootstrapped samples and proceed to obtain 200 sets of parameter estimates. For each set of parameter estimates, we calculate the variance–covariance matrix and obtain its determinant, which is taken to be a global measure of inference precision. Next, we obtain the 95% confidence interval for each of the 12 parameters across the 10 synthetic datasets generated for each parameter set index and report the mean of their widths across each parameter set relative to the maximum one, as a function of the corresponding determinant (figure 3).

#### 3.2. Determinants of precision inference in experimental design

Having established the determinant of the parameter variance–covariance matrix as a reliable correlate to inference precision in §3.1, here we study the effect of three design factors on the quality of inference precision in ITS-based experiments, namely the number of biological replicates, the number of copies per ITS, and the choice of sampling timepoints. First, in §3.2.1, we focus on the number of biological replicates $(R)$; under the assumption of homogeneous inter-host immunological responses [4,11–14], *R* is equal to the number of animal hosts multiplied by the number of ITS per host. In §3.2.2, we introduce the biological limitation of a constraint on the total inoculum size $(I)$ per host; *I* is equal to the number of ITS per host multiplied by the number of copies per ITS. Finally, in §3.2.3, we study the mixed effect of experimental designs with different numbers of copies per ITS and different sampling timepoints.

##### 3.2.1. Number of biological replicates $(R)$

In §3.2.1, we use the same simulation-based approach and parameter sets as in §3.1 to evaluate the effect of the number of biological replicates on the magnitude of the determinant of the parameter variance–covariance matrix. At this stage, the number of biological replicates $(R)$ is equal to the animal hosts multiplied by any number of ITS per host, without constraint on the total inoculum size per animal host. As shown in figure 4, there is a directional effect between the determinant of the parameter variance–covariance matrix generated by the parametric bootstrap and the total number of replicates. As the number of total replicates increases (*x*-axis), the quality of parametric inference increases (reflected by lower values in the determinant on the *y*-axis).

##### 3.2.2. Distribution of replicates in animal hosts with a constraint on total inoculum size per host $(I)$

Here, we move a step further from §3.2.1 and introduce the biological constraint of total inoculum size per host $(I)$. In ITS-based experimental infections, there is a target total inoculum *I* achieved by multiplying the number of different ITS used by the number of copies per ITS. As detailed in table 1, we consider experimental designs with the total inoculum size fixed at 10 000 colony forming units (CFU); this is consistent with the orders of magnitude of inocula used in ITS studies [4,8,11,12]. We also keep the number of biological replicates fixed at 1000 biological replicates to minimize the stochastic noise in experiments with small replicate sizes and consider eight possible combinations between the total number of host animals per sampling timepoint and the total number of ITS per host animal.

experimental design index | total number of host animals per sampling timepoint | total number of ITS per host animal | mean number of CFU per ITS | total number of replicates (identical simulations) | total inoculum size | ||
---|---|---|---|---|---|---|---|

compartment 1 | compartment 2 | compartment 3 | |||||

1 | 200 | 5 | 2000 | 1 | 1 | 1000 | 10 000 |

2 | 100 | 10 | 1000 | 1 | 1 | 1000 | 10 000 |

3 | 50 | 20 | 500 | 1 | 1 | 1000 | 10 000 |

4 | 20 | 50 | 200 | 1 | 1 | 1000 | 10 000 |

5 | 10 | 100 | 100 | 1 | 1 | 1000 | 10 000 |

6 | 5 | 200 | 50 | 1 | 1 | 1000 | 10 000 |

7 | 2 | 500 | 20 | 1 | 1 | 1000 | 10 000 |

8 | 1 | 1000 | 10 | 1 | 1 | 1000 | 10 000 |

In figure 5, we plot the determinant of the parameter variance–covariance matrix as a function of the total number of ITS per host animal. There is a consistent positive relationship between the two variables, indicating that lower numbers of ITS per host yield lower determinant values, which reflect higher quality inference in terms of precision. Our analysis shows that given a fixed number of biological replicates and total inoculum size per host, experimental designs with higher numbers of hosts and lower numbers of ITS per host are predicted to yield higher-precision estimates.

##### 3.2.3. The choice of sampling timepoints in designs with different numbers of copies per isogenic-tagged strains

In §3.2.2., we found that given the presence of biologically realistic constraints on total inoculum size per host and total number of replicates, infections with fewer ITS at higher copy number per ITS consistently perform better in terms of precision inference. As a result, achieving high precision comes with the cost of using more animal hosts. In this section, we introduce a third design factor, the choice of sampling timepoints during the infection process. We seek to address the question: can we choose sampling timepoints that maximize the precision inference, allowing us to discount on the number of copies per ITS and, by extension, the number of animal hosts we need to inoculate?

To answer this question, we consider six biological scenarios that encompass common patterns in within-host dynamics of bacterial disease, as described in §2.6. In particular, we look at scenarios of exponential decay and growth with either high or low variance, as well as cases with unidirectional bacterial migration from one organ to the other.

We start with sampling at a single observation point. In the likelihood-free inference framework, inference using a single observation timepoint is only possible for models whose number of parameters is smaller than or equal to the number of moments that summarize the ITS distributions. We first consider a simple, two-compartment model structure with unidirectional bacterial flow from the first to the second compartment, and bacterial death and replication in both (five parameters in total). Figure 6 shows that unlike in the scenarios of exponential growth, in the scenarios of exponential decay intermediate sampling times (12- and 13-h post-inoculation) yield higher-precision inference, whereas early or late sampling times provide less information on average. This is probably owing to either minute changes in the bacterial population in the very early stages (1–2 h), or bacterial populations having died out during the later stages (23–24 h), at which point we can only get bounds on the parameter estimates that are consistent with extinction ‘up to’ that time (figure 6*a*). This pattern differs in the growth example (figure 6*b*), where early timepoints will be subject to greater variation in the population sizes. Later timepoints will have allowed sufficient time for the populations to establish/replicate whereby the variation between subpopulations will be less impactful on the inference.

In figure 7, we present the results of the same analysis performed using a combination of sampling timepoints for a four-compartment model. The precision across all experimental designs considered in figure 7 is consistently higher than in figure 6, as combining data two observation times increases the information content overall. Finally, experimental infections with the higher number of copies per ITS perform consistently better in terms of inference precision (figures 6 and 7). Assuming a constraint on the total inoculum size per host, this translates into designs involving a larger number of hosts per sampling timepoint.

#### 3.3. Strategies to reduce the number of animal hosts

In §3.2.3, we showed that infections with higher numbers of ITS per host tend to correlate with higher inference precision consistently across the sampling timepoints we considered. Under a fixed inoculum size per host, this translates into the cost of requiring more animal hosts, which incurs primarily ethical, but also financial costs. The analysis in §3.2.3 assumed a fixed number of biological replicates of 500.

Here, we consider the interaction between the total number of biological replicates (number of animal hosts multiplied by number of ITS per host) and the number of ITS per host, the latter being inversely correlated with the initial number of copies per ITS. Taking the case of a system with exponential growth dynamics and high variance as an example, we consider the experimental designs tabulated in table 2. Each of A–C includes experimental designs with a decreasing number of animal hosts per timepoint. From our previous analysis, the first design in each category A–C is expected to yield the highest precision, as it requires a high number of animal hosts and a high number of initial copies per ITS. In alternative designs 2 and 3, there is a 10-fold decrease in the number of animal hosts, with design no. 2 having a higher number of biological replicates combined with a low number of copies per ITS, and design no. 3 having a low number of biological replicates combined with a high number of copies per ITS.

experimental design index | total number of host animals per sampling timepoint | total number of ITS per host animal | mean number of CFU per ITS | total number of biological replicates (identical simulations) | total inoculum size | ||
---|---|---|---|---|---|---|---|

compartment 1 | compartment 2 | compartment 3 | |||||

A1 | 200 | 5 | 2000 | 1 | 1 | 1000 | 10 000 |

A2 | 20 | 50 | 200 | 1 | 1 | 1000 | 10 000 |

A3 | 20 | 5 | 2000 | 1 | 1 | 100 | 10 000 |

B1 | 100 | 5 | 2000 | 1 | 1 | 500 | 10 000 |

B2 | 10 | 50 | 200 | 1 | 1 | 500 | 10 000 |

B3 | 10 | 5 | 2000 | 1 | 1 | 50 | 10 000 |

C1 | 50 | 10 | 1000 | 1 | 1 | 500 | 10 000 |

C2 | 5 | 100 | 100 | 1 | 1 | 500 | 10 000 |

C3 | 5 | 10 | 1000 | 1 | 1 | 50 | 10 000 |

In figure 8, we comparatively plot the mean log_{10}(determinant) for the three classes of experimental designs outlined in table 2, across a range of sampling timepoints. A three-compartmental model is used (§2.2), and parameters are drawn at random from distributions specified in §2.5. While experimental design index 2 and 3 use the same number of animal hosts, the latter performs significantly better and very close to experimental design index 1, which involves 10 times more animal hosts. Overall, our analysis shows that given a constraint on the number of animal hosts, higher inference precision is achieved by using experimental infections with large initial numbers of ITS even at a lower number of biological replicates; the effect of the initial number of ITS is stronger than that of number of biological replicates in determining inference precision.

### 4. Discussion

We have presented a novel, computationally efficient approach to information-based experimental design for ITS-based data, with the aim of maximizing inference precision for the within-host dynamics of bacterial infections. Inspired by the canonical *D-*optimality criterion, we chose the determinant of the parameter variance–covariance matrix, obtained via a parametric bootstrap, as a criterion for global parameter inference precision in a likelihood-free inference framework previously developed by Price *et al.* [16]. We used a simulation study to illustrate that the magnitude of the determinant effectively captures the mean width of confidence intervals across parameter sets. As the determinant is a global measure of variance, we correlated it with the mean width of the confidence interval across the entire parameter set (12 parameters in our model) and demonstrated the strong correlation between the value of the determinant and the average width of the confidence interval.

We proceeded to show that the replicate size, initial number of copies per ITS and sampling timepoint selection all have an impact on inference precision in the range of general biological cases representative of bacterial dynamics typically observed during experimental infections. Biological scenarios with low replication and killing rates are characterized by low variance in the distribution of copies per ITS [16]; for the entirety of experimental designs considered for these low-variance scenarios, the inference precision is globally higher than for higher-variance scenarios. Furthermore, there is greater redundancy in experimental designs yielding similar levels of high-precision inference for lower-variance scenarios, entailing that prior intuitive knowledge about the dynamics of a system can inform experimental microbiologists about how much focus they should place on identifying unique experimental designs with the highest information content. Finally, we found that for a fixed total number of replicates and assuming inter-host immunological homogeneity, infections with lower numbers of ITS per host at higher initial copies per ITS tend to give higher-level inference precision for both high- and low-variance scenarios.

The benefit of our approach is that it can be used not only to identify a single optimal experimental design, but also to rank experimental designs, thus providing flexibility to experimentalists to choose those that reflect their own priorities. For example, assuming that the reduction in the number of animal hosts is a priority, one can dismiss the globally optimal experimental design that requires using a 10-fold higher number of animal hosts and can choose the design with the second highest precision, which can be achieved by strategically choosing the sampling timepoints while keeping the number of animal hosts low. Finally, we looked at the interaction between two design factors (total number of replicates and initial copies per ITS) under a constraint on the number of animal hosts, which is one of the key priorities in terms of animal welfare, logistical and financial considerations alike. We show that in this scenario, favouring an experimental design with the lower number of biological replicates and fewer ITS per host yields consistently higher inference precision than the alternative allocation of resources with a high number of biological replicates and more ITS per host. This consideration becomes particularly relevant as individual ITS detection in a multi-ITS sample has become more advanced, moving away from qPCR with a detection threshold of a less than 10 ITS per host tissue [11,12] to next-generation sequencing able to detect tens or hundreds of ITS per host tissue sample [9,29].

Discussing the assumptions and limitations of the likelihood-free inference framework used as the basis for the experimental design tool presented here, the moments-based, divergence-minimization inference framework makes the simplifying assumption that these distributions are multivariate normal across all tissues of interest and sampling timepoints; this assumption facilitates the derivation of the expressions for the moments. Given that the shape of the true distribution is different for every tissue and sampling timepoint, and that one would need to approximate each of those individually, the simplifying choice of a multivariate normal distribution seems sensible. Furthermore, the first- and second-order moments of these distributions were used as sufficient summary statistics. Although it could be argued that higher-order moments would capture additional features of the distributions, the added complexity is not justified by the resolution of the available data and the previous simplifying assumption about the structure of the distributions. As shown in [16], there is excellent agreement between both the inferences and the width of the confidence intervals obtained when using the maximum-likelihood framework and the likelihood-free framework used here; as a result, the use of summary statistics and the assumption of multivariate normal distributions do not affect the quality of inference. Finally, the estimated precision as identified by the optimal design is valid under the assumption that the model used to find the optimal design is a sufficiently accurate characterization of the true underlying mechanism.

In terms of the bootstrapped samples, we limited their number to 200 per experimental design considered in this study, striking a balance between reducing inter-simulation variation (Monte Carlo error) and limiting the applicability of the tool owing to long running times. Earlier work in [16] shows that even at 100 experiments (or bootstrapped samples), the Gillespie and moment-based approaches both lead to more and more precise estimates as the number of biological replicates increases, while at 500–1000 biological samples, the variance in the parameter estimates converges to 0. Hence, by large, the source of the Monte Carlo error is not the number of bootstrapped samples, but the number of biological replicates, which is one of the design factors considered in this study. Furthermore, our choice of 200 bootstrapped samples was further motivated by a study in [31], who developed a computationally efficient bootstrapping algorithm in the field of phylogenetics and used it to formulate testing criteria for determining the minimum number of bootstrapped samples needed to infer confidence values; in most cases, a size of 100–500 bootstrapped samples was adequate. Finally, we considered at least 10 parameter sets for each experimental design, obtaining consistent results in terms of how the experimental designs ranked relative to each other across the 10 individual datasets; this consistent pattern further confirms that our choice of bootstrapped sample size is not a major source of Monte Carlo error.

As exploratory research is gaining momentum and the competition for funding is increasing, optimal resource allocation is emerging as a first step in the process of experimental set-up conceptualization [36], while also affecting the welfare of animal hosts which is further reflected in the principle of the three Rs—replacement, reduction and refinement [17,37]. *In vivo* studies in population dynamics microbiology have typically not used experimental design tools until now. For example, despite the sustained wide application of the ITS technique, questions around optimizing experimental design and resource allocation have remained formally unaddressed, with the exception of past studies having used small-scale simulation-based approaches to determine a suitable inoculum size [4] or the distribution of ITS per host [4,12]. Limited examples of efforts to develop experimental design tools for population-level data in dynamical systems can be found in other microbiological fields like systems biology [38–43], ecology [44–46] and food science [47,48].

The apparent under-representation of optimal design algorithms in population dynamics microbiology is largely attributable to the lack of computationally efficient inference methods that cater for mathematical models with intractable likelihoods. Such models are often necessary to capture even simplified representations of biological systems which become increasingly more complex, as infections progress [16]. Here, we have provided a blueprint for the use of experimental design tools in complex systems in population dynamics microbiology. We addressed replicate size, sampling timepoint selection and the initial number of copies per ITS in the inoculum as design factors commonly considered by experimentalists when trying to optimize multifactorial resource allocation to achieve a desired level of inference precision. In terms of future work, our framework for optimizing experimental design can be further extended to serve a wider range of experimentalists' goals beyond maximizing inference precision. Other common goals include improving experimental designs to enable model selection, which can aid in assessing competing biological hypotheses [49], as well as minimizing the correlations between parameters, which enables a more clear-cut discrimination between often correlated processes, such as bacterial replication and death [4,7]. In this paper, we focused on improving the overall inference precision and focused on the determinant of the parameter variance–covariance matrix as the utility function of choice. However, because our likelihood-free experimental design tool is based on an approximation of the FIM, it could accommodate other utility functions such as the trace and the eigenvalue of the parameter variance–covariance matrix, which corresponds to the *A*- [50] and *E*-optimality criteria [51], respectively.

### Data accessibility

Sample code required to produce parameter estimates and compare experimental designs is available at: https://github.com/myrtovlazaki/OED.

### Authors' contributions

M.V. conceived and performed the study, and drafted the manuscript. O.R. and D.J.P. provided guidance throughout the study and helped with the writing process.

### Competing interests

The authors have no conflicts of interest to declare.

### Funding

M.V. is jointly funded by a Newnham College Major Studentship and a Vergottis Award from the Cambridge Trust. O.R. is funded by the ALBORADA Trust. O.R. and D.J.P. acknowledge research grant no. BB/M020193 from the BBSRC.

## Footnotes

Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5230415.

### References

- 1. World Health Organization. 2019World Health Statistics. See https://www.who.int/gho/publications/world_health_statistics/2019/EN_WHS_2019_Main.pdf?ua=1 (accessed 1 October 2019). Google Scholar
- 2.
Hofer U . 2018The cost of antimicrobial resistance.**Nat. Rev. Microbiol.**, 3. (doi:10.1038/s41579-018-0125-x) Crossref, Google Scholar**17** - 3.
Brauner A, Fridman O, Gefen O, Balaban NQ . 2016Distinguishing between resistance, tolerance and persistence to antibiotic treatment.**Nat. Rev. Microbiol.**, 320-330. (doi:10.1038/nrmicro.2016.34) Crossref, PubMed, ISI, Google Scholar**14** - 4.
Coward C, Restif O, Dybowski R, Grant AJ, Maskell DJ, Mastroeni P . 2014The effects of vaccination and immunity on bacterial infection dynamics*in vivo*.**PLoS Pathog.**, e1004359. (doi:10.1371/journal.ppat.1004359) Crossref, PubMed, ISI, Google Scholar**10** - 5.
Restif O, Thakar J, Harvill ET . 2009Analysis with mathematical models provides insights into infectious diseases.**Microbe**, 176-182. (doi:10.1128/microbe.4.176.1) Google Scholar**4** - 6.
Abel S, Abel zur Wiesch P, Davis BM, Waldor MK . 2015Analysis of bottlenecks in experimental models of infection.**PLoS Pathog.**, e1004823. (doi:10.1371/journal.ppat.1004823) Crossref, PubMed, ISI, Google Scholar**11** - 7.
Vlazaki M, Huber J, Restif O . 2019Integrating mathematical models with experimental data to investigate the within-host dynamics of bacterial infections.**Pathog. Dis.**, ftaa001. (doi:10.1093/femspd/ftaa001) Crossref, PubMed, ISI, Google Scholar**77** - 8.
Rossi O, Vlazaki M, Kanvatirth P, Restif O, Mastroeni P . 2020Within-host spatiotemporal dynamic of systemic salmonellosis: ways to track infection, reaction to vaccination and antimicrobial treatment.**J. Microbiol. Methods**, 106008. (doi:10.1016/j.mimet.2020.106008) Crossref, PubMed, ISI, Google Scholar**176** - 9.
Abel S, Abel zur Wiesch P, Chang HH, Davis BM, Lipsitch M, Waldor MK . 2015Sequence tag-based analysis of microbial population dynamics.**Nat. Methods.**, 223-226. (doi:10.1038/nmeth.3253) Crossref, PubMed, ISI, Google Scholar**12** - 10.
Kaiser P, Regoes RR, Dolowschiak T, Wotzka SY, Lengefeld J, Slack E, Grant AJ, Ackermann M, Hardt W-D . 2014Cecum lymph node dendritic cells harbor slow-growing bacteria phenotypically tolerant to antibiotic treatment.**PLoS Biol.**, e1001793. (doi:10.1371/journal.pbio.1001793) Crossref, PubMed, ISI, Google Scholar**12** - 11.
Grant AJ, Restif O, McKinley TJ, Sheppard M, Maskell DJ, Mastroeni P . 2008Modelling within-host spatiotemporal dynamics of invasive bacterial disease.**PLoS Biol.**, e74. (doi:10.1371/journal.pbio.0060074) Crossref, PubMed, ISI, Google Scholar**6** - 12.
Kaiser P, Slack E, Grant AJ, Hardt W-D, Regoes RR . 2013Lymph node colonization dynamics after oral*Salmonella*Typhimurium infection in mice.**PLoS Pathog.**, e1003532. (doi:10.1371/journal.ppat.1003532) Crossref, PubMed, ISI, Google Scholar**9** - 13.
Dybowski R, Restif O, Goupy A, Maskell DJ, Mastroeni P, Grant AJ . 2015Single passage in mouse organs enhances the survival and spread of*Salmonella enterica*.**J. R. Soc. Interface**, 20150702. (doi:10.1098/rsif.2015.0702) Link, ISI, Google Scholar**12** - 14.
Dybowski R, Restif O, Price DJ, Mastroeni P . 2017Inferring within-host bottleneck size: a Bayesian approach.**J. Theor. Biol.**, 218-228. (doi:10.1016/j.jtbi.2017.09.011) Crossref, PubMed, ISI, Google Scholar**435** - 15.
Vlazaki M, Rossi O, Price DJ, McLean C, Grant AJ, Mastroeni P, Restif O . 2020A data-based mathematical modelling study to quantify the effects of ciprofloxacin and ampicillin on the within-host dynamics of*Salmonella enterica*during treatment and relapse.**J. R. Soc. Interface**, 20200299. (doi:10.1098/rsif.2020.0299) Link, ISI, Google Scholar**17** - 16.
Price DJ, Breuzé A, Dybowski R, Mastroeni P, Restif O . 2017An efficient moments-based inference method for within-host bacterial infection dynamics.**PLoS Comput. Biol.**, e1005841. (doi:10.1371/journal.pcbi.1005841) Crossref, PubMed, ISI, Google Scholar**13** - 17.
Russell WMS, Burch RL . 1959**The principles of humane experimental technique**. London, UK: Universities Federation for Animal Welfare. Google Scholar - 18.
Fenwick N, Griffin G, Gauthier C . 2009The welfare of animals used in science: how the ‘Three Rs’ ethic guides improvements.**Can. Vet. J.**, 523-530. PubMed, ISI, Google Scholar**50** - 19.
Melton-Witt JA, Rafelski SM, Portnoy DA, Bakardjiev AI . 2011Oral infection with signature-tagged*Listeria monocytogenes*reveals organ-specific growth and dissemination routes in guinea pigs.**Infect. Immun.**, 720-732. (doi:10.1128/iai.05958-11) Crossref, PubMed, ISI, Google Scholar**80** - 20.
Schwartz DJ, Chen SL, Hultgren SJ, Seed PC . 2011Population dynamics and niche distribution of uropathogenic*Escherichia coli*during acute and chronic urinary tract infection.**Infect. Immun.**, 4250-4259. (doi:10.1128/iai.05339-11) Crossref, PubMed, ISI, Google Scholar**79** - 21.
Walters MS, Lane MC, Vigil PD, Smith SN, Walk ST, Mobley HLT . 2012Kinetics of uropathogenic*Escherichia coli*metapopulation movement during urinary tract infection.**mBio**, e00303. (doi:10.1128/mbio.00303-11) Crossref, PubMed, ISI, Google Scholar**3** - 22.
Lowe DE, Ernst SMC, Zito C, Ya J, Glomski IJ . 2013*Bacillus anthracis*has two independent bottlenecks that are dependent on the portal of entry in an intranasal model of inhalational infection.**Infect. Immun.**, 4408-4420. (doi:10.1128/iai.00484-13) Crossref, PubMed, ISI, Google Scholar**81** - 23.
Gerlini A 2014The role of host and microbial factors in the pathogenesis of pneumococcal bacteraemia arising from a single bacterial cell bottleneck.**PLoS Pathog.**, e1004026. (doi:10.1371/journal.ppat.1004026) Crossref, PubMed, ISI, Google Scholar**10** - 24.
Lim CH, Voedisch S, Wahl B, Rouf SF, Geffers R, Rhen M, Pabst O . 2014Independent bottlenecks characterize colonization of systemic compartments and gut lymphoid tissue by*Salmonella*.**PLoS Pathog.**, e1004270. (doi:10.1371/journal.ppat.1004270) Crossref, PubMed, ISI, Google Scholar**10** - 25.
Lam LH, Monack DM . 2014Intraspecies competition for niches in the distal gut dictate transmission during persistent*Salmonella*infection.**PLoS Pathog.**, e1004527. (doi:10.1371/journal.ppat.1004527) Crossref, PubMed, ISI, Google Scholar**10** - 26.
Maier L 2014Granulocytes impose a tight bottleneck upon the gut luminal pathogen population during*Salmonella*Typhimurium colitis.**PLoS Pathog.**, e1004557. (doi:10.1371/journal.ppat.1004557) Crossref, PubMed, ISI, Google Scholar**10** - 27.
Rego ROM, Bestor A, Štefka J, Rosa PA . 2014Population bottlenecks during the infectious cycle of the Lyme disease spirochete*Borrelia burgdorferi*.**PLoS ONE**, e101009. (doi:10.1371/journal.pone.0101009) Crossref, PubMed, ISI, Google Scholar**9** - 28.
Gonzalez RJ, Weening EH, Lane MC, Miller VL . 2015Comparison of models for bubonic plague reveals unique pathogen adaptations to the dermis.**Infect. Immun.**, 2855-2861. (Doi:10.1128/iai.00140-15) Crossref, PubMed, ISI, Google Scholar**83** - 29.
Zhang T, Abel S, Abel zur Wiesch P, Sasabe J, Davis BM, Higgins DE, Waldor MK . 2017Deciphering the landscape of host barriers to*Listeria monocytogenes*infection.**Proc. Natl Acad. Sci. USA**, 6334-6339. (doi:10.1073/pnas.1702077114) Crossref, PubMed, ISI, Google Scholar**114** - 30.
Martin CJ, Cadena AM, Leung VW, Lin PL, Maiello P, Hicks N, Chase MR, Flynn JL, Fortune SM . 2017Digitally barcoding*Mycobacterium tuberculosis*reveals*in vivo*infection dynamics in the macaque model of tuberculosis.**MBio**, e00312-17. (doi:10.1128/mbio.00312-17) Crossref, PubMed, ISI, Google Scholar**8** - 31.
Pattengale ND, Alipour M, Bininda-Emonds ORP, Moret BME, Stamatakis A . 2010How many bootstrap replicates are necessary?**J. Comput. Biol.**, 337-354. (doi:10.1089/cmb.2009.0179) Crossref, PubMed, ISI, Google Scholar**17** - 32.
Box GEP . 1982Choice of response surface design and alphabetic optimality.**Util. Math.****21B**, 11-55. Google Scholar - 33.
Chaloner K, Verdinelli I . 1995Bayesian experimental design: a review.**Stat. Sci.**, 273-304. (doi:10.1214/ss/1177009939) Crossref, ISI, Google Scholar**10** - 34.
- 35.
Spall JC . 2003**Introduction to stochastic search and optimization: estimation, simulation, and control, vol. 65**. Hoboken, NJ: John Wiley & Sons. Crossref, Google Scholar - 36.
O'Malley MA . 2007Exploratory experimentation and scientific practice: metagenomics and the proteorhodopsin case.**Hist. Philos. Life Sci.**, 337-360. PubMed, ISI, Google Scholar**29** - 37.
Sneddon LU, Halsey LG, Bury NR . 2017Considering aspects of the 3Rs principles within experimental animal biology.**J. Exp. Biol.**, 3007-3016. (doi:10.1242/jeb.147058) Crossref, PubMed, ISI, Google Scholar**220** - 38.
Mdluli T, Buzzard GT, Rundell AE . 2015Efficient optimization of stimuli for model-based design of experiments to resolve dynamical uncertainty.**PLoS Comput. Biol.**, e1004488. (doi:10.1371/journal.pcbi.1004488) Crossref, PubMed, ISI, Google Scholar**11** - 39.
Sinkoe A, Hahn J . 2017Optimal experimental design for parameter estimation of an IL-6 signaling model.**Processes**, 49. (doi:10.3390/pr5030049) Crossref, ISI, Google Scholar**5** - 40.
Braniff N, Richards A, Ingalls B . 2019Optimal experimental design for a bistable gene regulatory network.**IFAC-PapersOnLine**, 255-261. (doi:10.1016/j.ifacol.2019.12.267) Crossref, Google Scholar**52** - 41.
White A, Tolman M, Thames HD, Withers HR, Mason KA, Transtrum MK . 2016The limitations of model-based experimental design and parameter estimation in sloppy systems.**PLoS Comput. Biol.**, e1005227. (doi:10.1371/journal.pcbi.1005227) Crossref, PubMed, ISI, Google Scholar**12** - 42.
Liepe J, Filippi S, Komorowski M, Stumpf MPH . 2013Maximizing the information content of experiments in systems biology.**PLoS Comput. Biol.**, e1002888. (doi:10.1371/journal.pcbi.1002888) Crossref, PubMed, ISI, Google Scholar**9** - 43.
Steiert B, Raue A, Timmer J, Kreutz C . 2012Experimental design for parameter estimation of gene regulatory networks.**PLoS ONE**, e40052. (doi:10.1371/journal.pone.0040052) Crossref, PubMed, ISI, Google Scholar**7** - 44.
Zhang JF, Papanikolaou NE, Kypraios T, Drovandi CC . 2018Optimal experimental design for predator–prey functional response experiments.**J. R. Soc. Interface**, 20180186. (doi:10.1098/rsif.2018.0186) Link, ISI, Google Scholar**15** - 45.
Moffat H, Hainy M, Papanikolaou NE, Drovandi C . 2020Sequential experimental design for predator–prey functional response experiments.**J. R. Soc. Interface**, 20200156. (doi:10.1098/rsif.2020.0156) Link, ISI, Google Scholar**17** - 46.
Overstall A, McGree J . 2020Bayesian design of experiments for intractable likelihood models using coupled auxiliary models and multivariate emulation.**Bayesian Anal.**, 103-131. (doi:10.1214/19-ba1144) Crossref, ISI, Google Scholar**15** - 47.
Garre A, Peñalver-Soto JL, Esnoz A, Iguaz A, Fernandez PS, Egea JA . 2019On the use of*in-silico*simulations to support experimental design: a case study in microbial inactivation of foods.**PLoS ONE**, e0220683. (doi:10.1371/journal.pone.0220683) Crossref, PubMed, ISI, Google Scholar**14** - 48.
Price DJ, Bean NG, Ross JV, Tuke J . 2018Designing group dose-response studies in the presence of transmission.**Math. Biosci.**, 62-78. (doi:10.1016/j.mbs.2018.07.007) Crossref, PubMed, ISI, Google Scholar**304** - 49.
Vanlier J, Tiemann CA, Hilbers PA, van Riel NA . 2014Optimal experiment design for model selection in biochemical networks.**BMC Syst. Biol.**, 20. (doi:10.1186/1752-0509-8-20) Crossref, PubMed, ISI, Google Scholar**8** - 50.
Chernoff H . 1953Locally optimal designs for estimating parameters.**Ann. Math. Statist.**, 586-602. (doi:10.1214/aoms/1177728915) Crossref, Google Scholar**24** - 51.
Ehrenfeld E . 1955On the efficiency of experimental design.**Ann. Math. Stat.**, 247-255. (doi:10.1214/aoms/1177728541) Crossref, Google Scholar**26**