How well can the exponential-growth coalescent approximate constant-rate birth–death population dynamics?

One of the central objectives in the field of phylodynamics is the quantification of population dynamic processes using genetic sequence data or in some cases phenotypic data. Phylodynamics has been successfully applied to many different processes, such as the spread of infectious diseases, within-host evolution of a pathogen, macroevolution and even language evolution. Phylodynamic analysis requires a probability distribution on phylogenetic trees spanned by the genetic data. Because such a probability distribution is not available for many common stochastic population dynamic processes, coalescent-based approximations assuming deterministic population size changes are widely employed. Key to many population dynamic models, in particular epidemiological models, is a period of exponential population growth during the initial phase. Here, we show that the coalescent does not well approximate stochastic exponential population growth, which is typically modelled by a birth–death process. We demonstrate that introducing demographic stochasticity into the population size function of the coalescent improves the approximation for values of R0 close to 1, but substantial differences remain for large R0. In addition, the computational advantage of using an approximation over exact models vanishes when introducing such demographic stochasticity. These results highlight that we need to increase efforts to develop phylodynamic tools that correctly account for the stochasticity of population dynamic models for inference.


Introduction
The composition of individuals that make up a population often changes through time. In many cases, mathematical models can be formulated that describe the qualitative dynamics of the population. However, in order to obtain a more quantitative description, statistical methods are required that can estimate model parameters from population data. Genetic sequence data, in particular, have been invaluable at informing the dynamical processes that shape populations. Such processes include for example environmental fluctuations [1], speciation events (see for example recent reviews by Burbrink [2] and Stadler [3]) and in particular the infection of new hosts in the case of infectious diseases (reviewed by Kü hnert et al. [4]).
At the core, such methods use the fact that genetic sequences obtained from individuals within the population differ enough to reconstruct their genealogy. In a genealogy, a coalescent event represents the most recent ancestor of two (or more) lineages. Now, typically the genealogy is assumed to be equal to the phylogeny of the population history. In a phylogeny, a branching event represents a birth event in the population. By equalling the genealogy to the phylogeny, we assume that a coalescent event in the genealogy corresponds to a birth event in the population. The reconstructed phylogeny (genealogy) now is used to quantify the parameters of the population dynamic process (birth and death of individuals in the population). Surprisingly, genetic information from only few individuals can lead to deep insights into the population dynamics as a whole (for an overview, see [4]).
The time resolution of this phylogeny depends on the time scale at which genetic mutations occur versus the time scale at which the population composition changes. When both these changes co-occur (i.e. when the evolutionary time scale is comparable to the time scale of the population dynamics), then there is a close correspondence between the population genetic and population dynamic history [5]. In the case of an infectious pathogen, the reconstructed phylogeny can be interpreted as a proxy for an incomplete transmission tree. In such an incomplete transmission tree, branches that link two individuals represent chains of transmission from one individual to another, which may or may not have involved unobserved intermediate individuals. The incomplete transmission tree can then be used to inform models of the population dynamics [6][7][8][9][10][11][12].
Phylogenetic reconstruction methods originally made simplifying assumptions that are common in the field of population genetics. The Wright-Fisher and Moran models, for example, assume that the number of individuals remains fixed through time. Using the coalescent framework, it is then possible to derive the likelihood for any specific phylogenetic tree that can be used in likelihood-based inference methods [13]. It is important to note that while the coalescent is typically used to model genealogies, here the coalescent is a model for the phylogenetic tree (i.e. the population dynamics). The accuracy of a reconstructed phylogeny obtained using this framework (and by proxy a-typically incomplete-transmission tree in the case of infectious disease outbreaks), however, will strongly depend on the validity of the assumptions underlying the coalescent approximation. A major advance in coalescent theory was introduced by Griffiths & Tavaré [14], who generalized the coalescent to population sizes that can be described by an integrable function through time. Central to the derivation of both the original constant population size formulation of Kingman [13] and the parametric coalescent of Griffiths & Tavaré [14] is the assumption that the number of sampled individuals remains fixed and the population size is large compared with the sample size, irrespective of whether the discrete generation Wright-Fisher or Moran model is used [15]. Fu [16] showed that even though the Kingman coalescent derivation is obtained for small sample sizes, the Kingman coalescent is also a good approximation to the population dynamics for bigger sample sizes. Boskova et al. [17] hypothesize that the key assumption of the coalescent often being violated is stochastically changing population size through time.
The joint inference of the phylogeny as well as the model for the population dynamics is called 'phylodynamic inference' and has become popular in a variety of fields, including the study of infectious disease outbreaks [7,8,18 -20]. Phylodynamic inferences mostly rely on the coalescent to approximate the underlying population dynamics. In particular, implementations assuming both parametric [7,8,18,21] and non-parametric [22,23] population size changes are commonly used for statistical inference under the coalescent, and the inferred population size changes shed light on the population dynamic process. However, the justification of the coalescent assumptions is often questionable in the case of epidemic outbreaks. Specifically, epidemic outbreaks commonly originate from a single individual. Thus, in the early stages of the outbreak, the number of infected individuals (i.e. the population size in the coalescent) is small, and therefore cannot be much larger than the number of sampled individuals.
Key to most epidemiological models of infectious dynamics (e.g. SIS, SIR, SEIR) is a phase of (exponential) population growth from the initial infected individual. These compartmental models can be written as forwardin-time birth-death (BD) models, with the initial exponential growth phase being a constant-rate BD model. Thus, deterministic exponential-growth coalescent models [24] appear as an appropriate description of early outbreaks. The deterministic exponential-growth coalescent has been used to estimate the initial growth rate, and from that the basic reproductive number R 0 of the pathogen [7].
Recent work has proposed a direct modelling of stochastic epidemiological dynamics (e.g. SIS or SIR dynamics) for phylogenetic inference that avoids any approximations made in the coalescent framework [4,9,11,12]. These models are forward-in-time stochastic BD models explicitly modelling transmission, recovery and sampling. These approaches build upon the BD framework that is commonly used to reconstruct evolutionary relationships between species [25 -29]. The advantage of these methods is that they do not implicitly require the assumptions of the coalescent to be justified, but assume stochastic epidemiological models such as SIS or SIR; however, they are often computationally much more expensive than many of the coalescent-based approximations.
Our aim here is to investigate the applicability of the coalescent approximation when performing phylodynamic inference. We assume that epidemic outbreaks are described by a constant-rate BD model, which is a good approximation to the early phases of most well-established epidemiological models [30,31]. We show that the coalescent approximation assuming an exponentially varying population size fails to accurately retrieve the true distribution of coalescent times of two randomly chosen individuals. We also show that replacing the deterministic exponential growth function with an ensemble of trajectories that are sampled from the full stochastic BD model in the coalescent approximation yields a good approximation to the true distribution. The added computational cost of such a stochastic extension of the coalescent, however, sacrifices its primary advantage compared with birth-death approaches. In fact, direct rspb.royalsocietypublishing.org Proc. R. Soc. B 282: 20150420 application of BD models is much more computationally efficient than the coalescent approximation for simple stochastic exponential growth.
While written in epidemiological terms, our results hold for any population dynamics, which is assumed to grow exponentially under a constant-rate BD model (e.g. species radiation or a rapid spread of languages).

Models
Our reference model of population growth is the constantrate BD model [32]. For disease outbreaks, the BD model is the simplest stochastic model of infectious disease dynamics in large susceptible populations with an infectious force proportional to the number of infected individuals. We highlight here that deterministic models for epidemic spread cannot be employed as they assume continuous population sizes, while we have discrete changes in number of infected individuals at branching/ removal events. Our model, the constant-rate BD model, assumes that infected individuals have a constant rate of transmitting to susceptibles, l, and a constant rate of becoming uninfected, m. The expected population size or number of infected individuals in this model grows exponentially, which is characteristic of the early phases of most epidemic outbreaks.
We compare the BD model with the coalescent model. In what follows, we refer to a 'coalescent model' when talking about the stochastic process giving rise to trees with a coalescent probability distribution (while typically the coalescent is termed a statistical framework rather than a model). It is important to distinguish two views of time here. Time runs in the usual (forward) sense when we consider population growth models, such that the value of a time t future in the future is larger than the value of time t 0 at the present, t future . t 0 . When we consider ancestral processes that reconstruct the phylogeny of the population, time runs in the opposite (backward) direction, such that a value of t past in the past is larger than a value of t 0 at the present, t past . t 0 (in order to avoid confusion, we denote forward time by the variable t and backward time by the variable t).
We compare the 'time of coalescence' of two present-time lineages under the BD model with two coalescent-type models with exponentially growing populations. Note that the time of coalescence of two lineages is the most recent time in the past at which these two lineages shared a common ancestor. The lineages may have evolved under any model, not only a coalescent model (despite the name 'time of coalescent').
For the first coalescent model, we consider a standard coalescent model with deterministic exponential population growth [14,24,33]. For the second coalescent model, we define a coalescent model with stochastic population growth [10], where the population trajectories are realizations of a stochastic BD process. Figure 1 shows a realization of a BD population size trajectory compared with the corresponding deterministic exponential population size curve, with exemplary two-leaf subtrees.
Here, we define the BD and the two coalescent models formally, and derive characteristics of the models that facilitate their comparison.

(a) The BD model
We consider a constant-rate BD model with constant infectious force (birth rate), l, and constant removal (death) rate, m (0 m , l). The process starts with one initially infected individual and stops after time T since the start of the epidemic (T is also called time since origin). This is a simple continuous-time epidemiological model that preserves exponential growth of the number of infected individuals while properly accounting for discrete population sizes and allowing for the possibility of early termination of the epidemic. The early phases of SIS-and SIR-type epidemic outbreaks are typically modelled by such a process. Key epidemiological parameters can be derived from the BD model parameters, such as the basic reproductive number [30], R 0 ¼ l/m, and the net growth rate r ¼ l 2 m.
In order to determine the expected population growth and the coalescent time distribution under the BD process, we define p 0 (t) as the probability that a single individual has no extant offspring after time t, and p 1 (t) as the probability that a single individual has exactly one extant offspring after time t. Following Kendall [34], (i) Expected population size under the BD model The expected population size under the BD model grows exponentially, (2:1) The BD process is an individual-based model, with each individual having some rate of dying. Thus, it can happen that all individuals die, and the trajectory of the BD process does not 'survive' until time T. When analysing a phylogeny of age T, we know that the process survived until time T. We thus only consider population size trajectories that survive until time T (i.e. that have a non-zero population size until time T). The expected population size for 0 t T is given by [35]. The conditioning on survival results in an early rapid increase in the expected number of individuals, called 'push of the past', after which the population size grows exponentially with rate r ¼ l 2 m [35]. The time T such that the expected number of individuals is equal to N is

(ii) Distribution of coalescent times under the BD model
In the electronic supplementary material, we show that the distribution of time to coalescence (where time is measured backward from the present) of two randomly chosen individuals at time t given parameters l, m and time since origin T under the BD model, denoted by f BD (t), is T ], and f BD (t) ¼ 0 for t . T since the process started at time T in the past.
(b) The coalescent model with deterministic exponential population growth (CD model) In the coalescent approximation, two lineages coalesce with rate 1/(N(t)r), with N(t) being the population size and r being the generation time. Thus, the coalescent is defined by parameters N(t) and r, compared with l, m, T in the BD process. The coalescent with a deterministic exponentially growing population size (referred to hereafter as the CD model) is therefore defined by the following two parameters: (i) the population growth rate r and (ii) a factor Q ¼ N 0 r, where N 0 is the present-day population size N 0 , and r the generation time [14,24]. This yields (i) Expected population size growth under the CD model Because population growth under the CD model is a deterministic process, the (expected) population size at time t is just (2:5) Note that compared with the BD model, there is no extinction of a population.
(ii) Distribution of coalescent times under the CD model The time of coalescence of two lineages picked from a population of size N has the probability density [14,24,36] f CD (t) ¼ 1 N 0 r e rt e (1Àe rt )=(N0rr) : (2:6) Thus, the probability that the two lineages coalesce between the present and time t in the past is Note that the deterministic coalescent is naturally defined for times t [ (0, 1) before the present, in particular for t . T (i.e. prior to the start of the corresponding BD model).

(c) Link between the BD and CD models
Both the BD and the CD models describe populations that (in expectation) grow exponentially in time. In fact, by comparing equations (2.1) and (2.5), it is natural to set the growth rate in the CD model to r ¼ l 2 m. The expected population size in the BD model at time t (unconditioned on survival of the process) is equal to the population size in the CD model at time t. Furthermore, N 0 is the presentday population size and relates to the BD parameters via When the coalescent approximates a BD process, the generation time r must be related to the per capita birth rate l. Volz et al. [8] showed that r ¼ 1/(2l). We explain the derivation of this result in the electronic supplementary material.

(i) Breakdown of the CD model
The length of the process T required to reach N 0 individuals is then given by T ¼ lnN 0 /r. From the point of view of the ancestral process, at time T in the past, the population size was 1. Thus, T is the maximum time at which all lineages should have coalesced to one lineage. However, we show in the following that under the CD model, the probability of observing a coalescent event prior to the origin of the process at time T in the past is strictly greater than zero.
The rate of coalescence for a changing population size is 1/(N(t)r). The probability that two lineages coalesce to one lineage within time interval [0, T ] is (equation (2.7)) (2:8) From this expression, we directly observe F CD,N0!1 (T) ¼ 1 À e À1=rr : (2:9) As in our setting r ¼ 1/2l, we have rr ¼ (l 2 m)/2l and, 2) is e Àg(t) , ( 2 :10) with, and We highlight that not all lineages coalesce by time T. As population size is not defined for t . T, there is a non-zero probability for no coalescence at all.
(e) The coalescent with stochastic population growth via birth-death trajectories (CS model) As pointed out above, the coalescent with a deterministically changing population size does not take into account stochastic fluctuations, which in particular may lead to population extinctions. In order to employ a coalescent with such stochastic population size changes, we define a coalescent process where population size is a population trajectory of the BD model, and the rates of coalescence within this population are 1=(Ñ(t)r) [20], whereÑ(t) is the population size at time t in the trajectory. We refer to this model as a stochastic coalescent (hereafter CS). As before, the generation time is r ¼ 1/(2l). Because the population size is undefined ancestral to time T, we assign 0 probability to coalescent times older than T. In fact, although T is not a parameter in the deterministic coalescent (as lineages simply coalesce backwards in time until the most recent common ancestor of the sample is found), it is a natural parameter for the stochastic coalescent, as BD trajectories are simulated for time T. However, note that under the stochastic coalescent model, lineages may not all coalesce when tracing them back in time for a given population size trajectory.

(i) Expected population size growth
The expected population size growth follows equation (2.2).

(ii) Distribution of coalescent times
While analytical derivation of the distribution of the time to coalescent exists when the population size varies according to a stationary Markov process [37], no such result is available for non-stationary stochastic processes, which is the case here. We therefore obtained the probability density of coalescent times f CS (t) under the stochastic coalescent via simulations. We first simulated BD population size trajectories forward in time for a duration of time T. Then, we sampled the coalescent time of two lineages extant at the present backward in time.

Results
Without loss of generality, we set l ¼ 0.5 (defining our time unit) and thus r ¼ 1/(2l) ¼ 1, meaning only parameters m and T are free to vary. We compare the probability distributions corresponding to the probability densities of the different models, f BD (t), f CD (t), f CDN (t), f CS (t), for R 0 ¼ 1.05, 1.3, 1.6, 2, 4, 10, 20. Furthermore, we plot in dashed lines the coalescent probability densities when using the coalescent rate proportional to N 2 1 rather than to N (see §2g above). We choose T ¼ T BD (N 0 ) such that the expected population size at the end of the process is N 0 ¼ 10, 100, 1000, 10 000 (equation (2.3)). The results are summarized in figure 2 for R 0 ¼ 1.05 and figure 3 for R 0 ¼ 20. The plots for the remaining R 0 values are displayed in the electronic supplementary material, figures S1 -S5.
In the following, we discuss under which circumstances coalescent time distributions under the coalescent models and the BD models are similar. Under these circumstances, the coalescent is a good approximation to the BD model. For large values of R 0 , the CD model has a bias towards older coalescent events. This is most likely to be due to the fact that the CD model allows for two lineages to coalesce at a time t . T before the start of the process. We can quantify the proportion of coalescent events ancestral to T under the deterministic coalescent. The probability that two lineages coalesce before the start of the process (t . T ) is always larger than zero (equation (2.8)), even when N 0 ! 1, under which assumption the coalescent approximation was derived (equation (2.9)). When R 0 ! 1, the probability that two lineages coalesce before the start of the process tends to e 22 % 0.135. This probability decreases with R 0 , and tends to zero for R 0 ! 1, meaning that all coalescent events happen in [0, T ] (though in this case T ¼ 1).
Coalescent times are younger for small R 0 under the deterministic model compared with the stochastic models owing to the differences in population growth. The expected population size at time t under the stochastic population size model (equation (2.2)) is larger than under the deterministic model (equation (2.5); see also figure 4). These differences in expected population size are largest for small values of R 0 , and tend to zero as R 0 increases to infinity. The rate of coalescence of two lineages is proportional to 1/N(t), and hence increases with decreasing population size. Therefore, coalescent events in the deterministic model happen faster (smaller t) than in the stochastic models. Changing the coalescent rate to be proportional to 1/(N(t) 2 1) instead of 1/N(t) does not change the probability density (blue dotted line).
We performed additional numerical experiments to investigate whether using the expected population size conditioned on survival, N BD (t) (equation (2.2)), instead of the usual expression in equation (2.5) improves the approximation of the CD to the BD model (light blue line; figures 2 and 3; electronic supplementary material, figures S1-S5). As N BD (t) is not defined for t . T, we plot only the distribution function in [0, T ], with the value at T denoting the percentage of pairs having coalesced within [0, T ]. As expected, for all considered parameter combinations, the modification of population size makes coalescent times older than under the CD model. However, for all considered parameter combinations, coalescent times are also older than under the BD model, so this new setting over-corrects the deterministic coalescent even for small R 0 (when we expect barely any events at t . T ). Thus, it appears that ignoring full stochasticity in the coalescent population size introduces biases.

(b) Stochastic coalescent
The cumulative probability distribution of the CS model is plotted as a red line. Again, the value at T denotes the percentage of pairs having coalesced within [0, T ]. For small values of R 0 , the CS model is a good approximation to the BD model. As R 0 increases, the CS line flattens out compared with the BD line close to time T. The amplified effect of flattening out for high R 0 is explained by more coalescent events happening ancestral to T for increasing R 0 ; this higher proportion for increasing R 0 is quantified in the deterministic case above. Using a coalescent rate proportional to 1/(N(t) 2 1) rather than to 1/N(t) results in more recent coalescent events. In particular, each pair has coalesced by time T with probability 1. However, for small N, coalescent events now happen more recently than under the BD model.
We point out that when doing phylogenetic inference of population dynamic parameters, we propose a time of origin T and a set of coalescent times, which are on interval [0, T ]. Thus, the prior distributions on coalescent times are the coalescent time distributions plotted in figures 2 and 3 and electronic supplementary material, figures S1 -S5, but normalized to be 1 at time T.

Discussion
We show here that deterministic coalescent approaches both under-and overestimate the coalescent times of two randomly sampled lineages depending on R 0 , with a stochastic BD model being used as a model of reference. Parameter estimates rspb.royalsocietypublishing.org Proc. R. Soc. B 282: 20150420 obtained using the coalescent approximation must therefore be treated with caution, because it is not immediately clear how strong the under-or overestimation is. The reason for the bias for small R 0 is the stochastic change in population size, which is ignored by the deterministic coalescent. A stochastic version of the coalescent can correct the bias for small values of R 0 and large enough population sizes (N . 100 for R 0 ¼ 1.05 and N . 1000 for R 0 ¼ 1.3 and R 0 ¼ 1.6). The reason for the remaining bias for larger R 0 is that the coalescent approximation allows for the two lineages to coalesce after a time that is longer than the duration of the population growth process (i.e. the time at which the population size was 1). More generally, any coalescent approximation (either parametric or non-parametric) that uses a population size function that decreases to a small number at any given time at which two or more lineages have not yet coalesced should be avoided. This has already been highlighted by Griffiths & Tavaré [14,p. 404]: 'there are cases in which variable population size processes are better studied in their original, discrete timescale, particularly those which have very small population sizes for many generations. Although we do not explicitly examine such cases in this paper, the methods developed here can be exploited in that setting too.' Griffiths & Tavaré nevertheless fitted the coalescent with a deterministic exponentially growing population to a dataset (which is fine in case the number of generations where the population size is 'very small' is considered to be not 'many') [14]. With the release of BEAST [38] assuming a coalescent-based prior on trees, the coalescent became the model of choice to use in phylodynamics, without validating its assumptions. We now show here that the exponential growth deterministic coalescent, however, may lead to biased growth rate estimates when studying exponentially growing populations in the presence of demographic stochasticity, questioning the accuracy of such growth rate estimates.
The appealing feature of the deterministic coalescent is that the likelihood of the population size N(t) and generation time r(t) (here constant, but see Volz et al. [8] for time-varying generation time) for a given tree is easily calculated by tracing all lineages backwards in time, with the lineages coalescing at rate 1/(N(t)r(t)). The computational gain, however, comes at the cost of incorrect estimates if the true population growth dynamics are stochastic in nature, which is the case for most biologically relevant applications. A stochastic coalescent can partly correct these errors, but because the population sizẽ N(t) is an ensemble of realizations of a stochastic process, the likelihood must be computed by averaging across the whole ensemble. Thus, the likelihood computation must be repeated a large number of times, which negates any computational advantage of the coalescent approximation.
In fact, when the population size is specified by a single trajectoryÑ(t), the computational cost of calculating the likelihood for a fixed tree under the BD model is of the same order as under the coalescent approximation. While the coalescent likelihood is calculated from the rate of coalescence 1=(rÑ(t)), the BD likelihood is calculated in the following way: each birth and removal event in the tree must coincide with a birth and removal event in the trajectory, otherwise the likelihood is zero. An example for a tree with non-zero likelihood is the tree in figure 1b, given the tra-jectoryÑ(t) in figure 1a. If all birth events coincide, the probability of a tree given the trajectories is a product of simple combinatorial factors, one for each transmission and removal event in the trajectory. For a given event, the factor is either (i) the probability of the event occurring between individuals represented as lineages in the tree (if the event actually occurred on the tree) or (ii) the probability of the event occurring between other individuals outside of the tree (if the event actually occurred outside the tree).
In the particular case studied here-the constant-rate BD model-we can analytically integrate over all trajectories by using the closed-form solution for the coalescent time of two lineages provided in equation (2.4). This provides a computationally efficient way to infer population parameters using the fully stochastic model and avoiding any approximations.
The advantage of the deterministic coalescent is that the rates of coalescence of the sampled lineages backwards in time are easily derived from the population size function. As we described, this backward-in-time interpretation is equivalent when the exact population trajectoryÑ(t) is known in the BD model and the CS model. It is unclear if such a simple backward-in-time (vertical) interpretation exists when averaging over trajectories. The BD model, however, has an alternative 'horizontal' interpretation of how sampled lineages coalesce when integrated over all trajectories, called the point process representation [27,39,40]. The point process allows us to sample the n 2 1 coalescent times in a tree of n extant lineages from the same point process distribution independently. For tips labelled 1, 2, . . . , n, the ith draw from the point process distribution is the coalescent time of the tips i and i þ 1. This horizontal interpretation allows one to quickly simulate a tree on n tips (namely by drawing n 2 1 random variables, each from the same point process distribution). In inference, the likelihood of a tree on n tips is simply a product over the point process densities of each of the n 2 1 coalescent times. Interestingly, the deterministic coalescent model does not have a horizontal point process representation [40]. It remains to be investigated if the stochastic coalescent has a horizontal point process representation. We showed that coalescent models give rise to different coalescent time distributions for two lineages compared with BD models. This has important consequences both when simulating trees and when estimating population dynamic parameters from trees (such as the basic reproductive number R 0 or the net growth of an epidemic r ¼ l 2 m). Trees are simulated under the coalescent by starting at the present and fixing the present-day population size N 0 . Because the process time T CD for the deterministic coalescent model is longer than the time T BD for the BD model for small R 0 (owing to the push of the past effect), the simulated trees will be older under the CD model than under the BD model with the same N 0 . As a consequence, when doing inference, if the true trees are realizations from a stochastic BD model with small R 0 , then growth rates estimated using the CD model will be overestimated, as the CD model expects older trees for the true growth rate. This is in agreement with previous results that, based on fixed trees, the deterministic coalescent overestimates growth rates and induces more ancestral coalescent events [9,17]. Furthermore, when doing inference, it has been observed that the deterministic coalescent is too confident in parameter estimates and underestimates the width of the highest posterior density (HPD) intervals [9,17]. As a consequence, the HPD intervals may not contain the true parameter with high probability when using the deterministic coalescent. This underestimate of parameter uncertainty is a drawback of the coalescent that is not apparent when considering the maximum-likelihood point estimates over a range of trees [41]. For more realistic epidemiological SIS-or SIR-type models, the pattern becomes more complex. Biases will again depend not only on R 0 , but also on the sampling scheme (i.e. sampled during exponential or post-exponential phase) and the overall population size [42].
Decisions have to be made about the sampling scheme in the BD model, and typically we assume that a fraction of individuals is sampled rather than two individuals from an arbitrary-sized population. Here, we conditioned on sampling two lineages from a population of N 0 lineages, where N 0 is a random variable and corresponds to the number of lineages in the population after time T. Different sampling schemes could be employed. A widely used sampling scheme is that each lineage in the present-day population is sampled with some probability p. The probability of sampling two individuals from a population of N 0 at the present then is p 2 (1 À p) N0À2 , which favours trajectories with few surviving lineages. The coalescent time distribution under the BD model will be affected by the specific choice of sampling. The distributions for the BD and CD models have been shown to be different in a number of cases [43].
The stochastic BD model considered here is only a crude approximation of real stochastic epidemic models. We showed that, even for this simple model, using a deterministic coalescent approximation to infer epidemiological parameters can lead to strong biases. We showed that a stochastic coalescent approximation yields correct parameter estimates for values of R 0 close to 1. We do not expect the bias for large values of R 0 to disappear when N ! 1, because lineages coalesce with non-zero probability at times larger than the duration of the process. Thus, parameter estimates obtained using coalescent approximations should be treated with caution. Whenever possible, inference methods that assume a stochastic underlying population model should be used. Computationally tractable implementations of such models have only recently started to become available, such as for the SIS model [11], for skylinetype models [44] and for structured population models with stochastic exponential growth [45]. For more realistic epidemiological models, we must for now rely on coalescent-type approximations [8,10,19,21], or BD-based approximations, such as the BDSIR model [12]. This current dependence on approximations reveals the need for a stronger focus on developing exact methods, as well as thorough validation of approximate methods for epidemiological models. rspb.royalsocietypublishing.org Proc. R. Soc. B 282: 20150420