Pathogen evolution in finite populations: slow and steady spreads the best

The theory of life-history evolution provides a powerful framework to understand the evolutionary dynamics of pathogens. It assumes, however, that host populations are large and that one can neglect the effects of demographic stochasticity. Here, we expand the theory to account for the effects of finite population size on the evolution of pathogen virulence. We show that demographic stochasticity introduces additional evolutionary forces that can qualitatively affect the dynamics and the evolutionary outcome. We discuss the importance of the shape of the pathogen fitness landscape on the balance between mutation, selection and genetic drift. This analysis reconciles Adaptive Dynamics with population genetics in finite populations and provides a new theoretical toolbox to study life-history evolution in realistic ecological scenarios.

Notation. Throughout this SI, we use the following symbols:
S(t): asymptotic density of susceptible individuals at time t
I_i(t): asymptotic density of individuals infected with strain i at time t
R(t): asymptotic density of recovered individuals at time t
N(t): total asymptotic density of individuals at time t
E_⋆,i: endemic equilibrium with resident strain i
I(t) := (I_1(t), . . . , I_d(t))
E(t) := (S(t), I_1(t), . . . , I_d(t), N(t))
Ŝ(t): asymptotic density of susceptible individuals in the slow time limit
Î_i(t): asymptotic density of individuals infected with strain i in the slow time limit
R̂(t): asymptotic density of recovered individuals in the slow time limit
N̂(t): asymptotic total density of individuals in the slow time limit
Î(t) := (Î_1(t), . . . , Î_d(t))
Ê(t) := (Ŝ(t), Î_1(t), . . . , Î_d(t), N̂(t))
P_i(t): asymptotic frequency of individuals infected with strain i in the slow time limit

Introduction
In this SI, we derive the results in the main text. Where suitable references exist in the literature, we keep the discussion informal, sketching how the results are obtained and referring to the appropriate references for rigorous proofs. Where they do not, we first give a heuristic derivation for a broader audience, whilst deferring the proofs to the end.

A Stochastic Epidemiological Model with Multiple Pathogen Strains
We consider a family of random processes (S^(n)(t), I_1^(n)(t), . . . , I_d^(n)(t), R^(n)(t)), indexed by a parameter n, the "system size" [31], which plays a role similar to the census population size in population genetics (see e.g., [15,11,13]): it can be thought of as the area in which the population lives, determining the population density per unit area and the rate of immigration. Similarly to fixed-size population genetic models, we will consider the asymptotic behaviour of our model when n is large. S^(n)(t), I_i^(n)(t), and R^(n)(t) are the numbers of susceptible individuals, individuals infected with strain i = 1, . . . , d, and recovered individuals, respectively. We will write N^(n)(t) for the total population size at time t, so that N^(n)(t) = S^(n)(t) + Σ_{i=1}^d I_i^(n)(t) + R^(n)(t). Equivalently, we may describe our model as a continuous-time Markov chain E^(n) taking values in N^{d+2} with transition rates given in Table 1. When a transition occurs at time t, we will distinguish between the value E(t−) of the Markov chain before the transition and its value E(t) after the transition.

Table 1: Transition rates from the state S^(n)(t) = S, I_k^(n)(t) = I_k and R^(n)(t) = R.
  S → S + 1 (immigration): rate λ^(n) n
  S → S − 1 (death of a susceptible): rate δ^(n) S
  S → S − 1, I_k → I_k + 1 (infection by strain k): rate β_k^(n) S I_k / n
  I_k → I_k − 1 (death of an infected): rate (δ^(n) + α_k^(n)) I_k
  I_k → I_k − 1, R → R + 1 (recovery): rate γ^(n) I_k
  R → R − 1 (death of a recovered): rate δ^(n) R
All parameters given in Table 1 may depend on n, but are assumed to converge to constant values as n → ∞. Simple calculations using the master equation tell us that in the absence of infected individuals, the expected value of N^(n)(t) is
E[N^(n)(t)] = N^(n)(0) e^{−δ^(n) t} + (λ^(n) n / δ^(n)) (1 − e^{−δ^(n) t}),
which approaches an equilibrium value of λ^(n) n / δ^(n) as t → ∞. Thus, to first approximation, the total population size is proportional to n.
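The chain of Table 1 can be simulated exactly with the Gillespie algorithm. The sketch below is ours, not part of the SI; it assumes standard SIR-with-demography rates of the kind named above (immigration λn, per-capita mortality δ, infection β_k S I_k/n, excess mortality α_k of infecteds, recovery γ), and all identifiers and parameter values are illustrative.

```python
import numpy as np

def gillespie_sir(n, lam, delta, beta, alpha, gamma, t_max, state, rng):
    """Exact simulation of the multi-strain SIR chain (assumed rates).

    state = (S, I, R) with I a length-d array of counts per strain.
    Returns the state and time when t_max is passed or activity ceases.
    """
    S, R = state[0], state[2]
    I = np.asarray(state[1], dtype=float)
    beta, alpha = np.asarray(beta, float), np.asarray(alpha, float)
    d, t = len(I), 0.0
    while t < t_max:
        rates = np.concatenate((
            [lam * n,             # immigration: S -> S + 1
             delta * S,           # death of a susceptible: S -> S - 1
             delta * R],          # death of a recovered: R -> R - 1
            beta * S * I / n,     # infection by strain k: S - 1, I_k + 1
            (delta + alpha) * I,  # death of an infected: I_k -> I_k - 1
            gamma * I,            # recovery: I_k - 1, R + 1
        ))
        total = rates.sum()
        if total == 0.0:
            break
        t += rng.exponential(1.0 / total)
        ev = rng.choice(rates.size, p=rates / total)
        if ev == 0:
            S += 1
        elif ev == 1:
            S -= 1
        elif ev == 2:
            R -= 1
        elif ev < 3 + d:
            S -= 1; I[ev - 3] += 1
        elif ev < 3 + 2 * d:
            I[ev - 3 - d] -= 1
        else:
            I[ev - 3 - 2 * d] -= 1; R += 1
    return S, I, R, t

rng = np.random.default_rng(0)
S, I, R, t = gillespie_sir(100, 0.05, 0.05, [10.0], [2.0], 0.5,
                           t_max=5.0, state=(100, [5], 0), rng=rng)
```

Because each event's rate vanishes when the corresponding compartment is empty, counts stay non-negative by construction, mirroring the Markov chain on N^{d+2}.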
If one knows the values of S^(n)(t), I^(n)(t) := (I_1^(n)(t), . . . , I_d^(n)(t)) and N^(n)(t), then the number of recovered individuals can be recovered as R^(n)(t) = N^(n)(t) − S^(n)(t) − Σ_i I_i^(n)(t). In what follows, rather than working with E^(n)(t), we will focus on the rescaled process of densities, Ē^(n)(t) := (1/n) E^(n)(t). Ē^(n)(t) has the advantage of being a density dependent population process [21,22,23,24], as generalized in [28]: the transition rates in (1) depend only on the densities S̄^(n)(t), Ī^(n)(t), N̄^(n)(t) and not on the absolute numbers of individuals. As we discuss below, density dependent population processes have a number of nice features, including a law of large numbers and central limit theorems. Remark 1. To simplify our subsequent use of subscripts, we will consider Ē^(n)(t) as a process taking values in R^{d+2}, the space of points x = (x_0, x_1, . . . , x_d, x_{d+1}), and use S̄^(n)(t) and Ē_0^(n)(t) (and so on) interchangeably.

Stochastic Differential Equation Formulation
Here, we introduce a very convenient way of writing our Markov chain as the solution to a stochastic integral equation with the help of simple Poisson processes.
A Poisson process P is a Markov process making jumps of +1 exclusively, and such that P(0) = 0. A Poisson process P is called a simple Poisson process if it jumps at constant rate 1. In this case, (P(at)) is a Poisson process with rate a. This can be generalized by noting that (P(∫_0^t a(s) ds)) is a time-inhomogeneous Poisson process which jumps at rate a(t) at time t. Similarly, for a non-negative function f, there is a unique continuous-time Markov chain X satisfying
X(t) = X(0) + P( ∫_0^t f(X(s)) ds ),
and when X(t−) = x, X jumps to x + 1 at rate f(x).
Then it is not difficult to extend this (see Chapter 6, §4 in [14] for details) to our Markov process as follows:
E^(n)(t) = E^(n)(0) + Σ_l l P_l( ∫_0^t n ρ_l^(n)(Ē^(n)(s)) ds ),
where all the processes P_l(t) are independent simple Poisson processes, indexed by the corresponding jumps l of the Markov process (E^(n)(t)), and e_i is the i-th standard basis vector, the element of R^{d+2} with zeros everywhere except for a 1 at row i; e.g., an immigration event is indexed by l = e_0 + e_{d+1}, as it increases the number of susceptibles and the total population size by 1.
Changing variables, we get
Ē^(n)(t) = Ē^(n)(0) + Σ_l (l/n) P_l( n ∫_0^t ρ_l^(n)(Ē^(n)(s)) ds ).
This formalism is useful because it will allow us to write each right-hand side as the sum of a deterministic trend and of a stochastic term with zero expectation.
Recall that the marginal value P(t) of a simple Poisson process at time t is a Poisson random variable with parameter t. In particular, P(t) − t has mean 0 and variance t. So if we write
P̃(t) := P(t) − t,
we are writing P(t) as the sum of a deterministic trend t and of a stochastic term P̃(t) with mean 0. If we come back to the example of the Markov process X jumping at rate f(X), we can write
X(t) = X(0) + ∫_0^t f(X(s)) ds + M(t),
where we have set M(t) := P̃( ∫_0^t f(X(s)) ds ). In addition, since the increments P̃(t + s) − P̃(t) are independent of the past before t, have mean 0 and variance s, we can write the last equation in differential form
dX(t) = f(X(t−)) dt + dM(t),
where dM(t) = U(t) − f(X(t−)) dt and U(t) equals 1 iff P jumps at ∫_0^t f(X(s−)) ds and equals 0 otherwise. In particular, conditional on X(t−) = x, dM(t) has mean 0 and variance f(x) dt. Thus, we also recover the infinitesimal variation of X as the sum of an infinitesimal trend in the dynamics and of a stochastic fluctuation term with zero expectation. Now let us return to our initial process. We adopt the same notation as previously, for example
P̃_{−e_0−e_{d+1}}(t) = P_{−e_0−e_{d+1}}(t) − t and M^(n)_{−e_0−e_{d+1}}(t) = P̃_{−e_0−e_{d+1}}( n ∫_0^t ρ^(n)_{−e_0−e_{d+1}}(Ē^(n)(s)) ds ).
Proposition 1. The infinitesimal variation of Ē^(n)(t) can be written as the sum of an infinitesimal deterministic trend and of a stochastic fluctuation term with zero expectation:
dĒ^(n)(t) = Σ_l l ρ^(n)_l(Ē^(n)(t)) dt + (1/n) Σ_l l dM^(n)_l(t),
where the dM^(n)_l(t), for l ranging over the possible jumps, are independent infinitesimal noise terms with mean zero and respective infinitesimal variances nρ^(n)_l(Ē^(n)(t)) dt. Remark 2. Note that nρ^(n)_l(Ē^(n)(t)) is the rate at which the Markov process (E^(n)(t)) makes a jump l. Setting
F^(n)(x) := Σ_l l ρ^(n)_l(x) = (F^(n)_0(x), . . . , F^(n)_{d+1}(x)),
the result in the proposition can be written more compactly as
dĒ^(n)(t) = F^(n)(Ē^(n)(t)) dt + (1/n) Σ_l l dM^(n)_l(t).
Thus, the function F^(n)(x) describes the infinitesimal trend in the dynamics, whereas the terms M^(n)_l(t) capture the de-trended fluctuations corresponding to each type of possible event. This equation is analogous to an Itô SDE, only now the driving noise is the discontinuous M^(n)(t), rather than the more familiar Brownian motion.
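The decomposition of P(t) into a deterministic trend t plus a mean-zero fluctuation is easy to check numerically: the compensated marginal P(t) − t should have mean 0 and variance t. A quick sketch (ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
t = 5.0
# Marginal of a simple (rate-1) Poisson process at time t is Poisson(t).
P_t = rng.poisson(lam=t, size=200_000)
P_tilde = P_t - t          # compensated term: mean 0, variance t

print(P_tilde.mean())      # close to 0
print(P_tilde.var())       # close to t = 5
```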
We note that, for i = 1, . . . , d,
F^(n)_i(x) = β^(n)_i x_0 x_i − (δ^(n) + α^(n)_i + γ^(n)) x_i
and
dĪ^(n)_i(t) = F^(n)_i(Ē^(n)(t)) dt + (1/n)( dM^(n)_{−e_0+e_i}(t) − dM^(n)_{−e_i−e_{d+1}}(t) − dM^(n)_{−e_i}(t) ),
two facts that will prove useful in what follows.
In particular, it makes sense to define the infinitesimal variance-covariance matrix a^(n) of Ē^(n)(t) by
a^(n)_{ij}(x) dt := Cov( dĒ^(n)_i(t), dĒ^(n)_j(t) | Ē^(n)(t) = x ).
Let us compute a^(n). Because all distinct terms in the definition (S.2) of M^(n) are independent, all cross terms vanish. For example, for any i = 1, . . . , d,
a^(n)_{0i}(x) = −(1/n) β^(n)_i x_0 x_i.
We should not be surprised by the fact that this infinitesimal covariance is negative. Each event where a susceptible is infected by strain i has the effect of simultaneously decreasing the number of susceptibles and increasing the number of individuals infected by strain i.
By similar calculations, it is easy to see that all the remaining entries take the same form. In other words, for any 0 ≤ i, j ≤ d + 1,
a^(n)_{ij}(x) = (1/n) Σ_l l_i l_j ρ^(n)_l(x),
where the sum is over all possible jumps l of the Markov process (E^(n)(t)).

Obtaining Equation (4)
Note that similarly to Brownian motion, the infinitesimal mean of (1/√n) M^(n)_l(t) during the time interval dt is zero and its infinitesimal variance is ρ^(n)_l(Ē^(n)(t)) dt. This allows us to rewrite the results in Proposition 1 in the form of the following diffusion approximation for n large (see [23] for a rigorous statement).
Replacing each term (1/√n) M^(n)_l(t) by a Brownian motion with the same infinitesimal variance and combining independent Brownian motions, we obtain equation (3) in the main text (n.b., to simplify notation in the main text, we use S, I and R in lieu of S̄^(n)(t), Ī^(n)(t) and N̄^(n)(t)).
We shall not use this diffusion approximation in the sequel, where we continue to consider the process with discrete jumps, (S.3).
To motivate this, suppose we had a deterministic differential equation
Ẏ(t) = f(Y(t)),
and we let X(t) be a deterministic real function of Y(t), say
X(t) = g(Y(t)),
where g : R^{d+2} → R is assumed to be continuously differentiable. Then, applying the chain rule, we derive a differential equation satisfied by X(t):
Ẋ(t) = Σ_{j=0}^{d+1} (∂g/∂x_j)(Y(t)) f_j(Y(t)),
or equivalently, Ẋ(t) = (Dg)(Y(t)) · f(Y(t)). The analogue of the chain rule in the fully stochastic case is the Meyer-Itô formula (see e.g., [29]).
where a^(n)(x) is the infinitesimal variance-covariance matrix of Ē^(n)(t) defined in (S.6), and where the sum is over the times s of discontinuity of Ē^(n). At a time t of discontinuity, ΔĒ^(n)_j(t) := Ē^(n)_j(t) − Ē^(n)_j(t−) denotes the magnitude of the jump in Ē^(n)_j at time t. The term ε^(n)(t) correcting for discontinuities distinguishes the more general Meyer-Itô formula from the familiar Itô formula for diffusions. In Section 8.1, we show that ε^(n)(t) = O(1/n²).
Using this, we can derive Equation (5) from the main text. Let
P^(n)_i(t) := Ī^(n)_i(t) / Σ_{j=1}^d Ī^(n)_j(t)
be the proportion of the population infected with strain i. Since P^(n)_i(t) is a deterministic function of Ē^(n)(t), we can use Itô's formula (S.8) for jump processes. The following statement will be proved rigorously in Section 8.1. Proposition 2. The fraction of the population infected by strain i satisfies an SDE whose drift gives the Malthusian growth rate of strain i, whereas v^(n) is the infinitesimal variance associated with the growth of strain i. In addition, for any T > 0, there is a constant C such that for all t ∈ [0, T], P{ n²|ε^(n)(t)| > C } → 0 as n → ∞. To obtain Equation (5) in the main text, we omit the lower order error term ε^(n), write Ī^(n)(t) := Σ_i Ī^(n)_i(t) for the total number of infectives, and observe that, similarly to the previous section, we may replace the rescaled noise terms by independent Brownian motions B_1, . . . , B_d, so that Equation (5) follows.

Probability of Fixation of a Mutant Pathogen
In this section, we will be interested in the long time behaviour of our multi-strain stochastic epidemics. In particular, we tackle the problem of predicting which strains will be outcompeted and which strains will fix.

A Deterministic Limit and its Asymptotic Analysis
We begin this section with a result stating the convergence to a deterministic dynamical system as n → ∞. Proposition 3 (Theorem 2.2, [23]). If Ē^(n)(0) → E(0) as n → ∞, then for any fixed T > 0, with probability 1,
lim_{n→∞} sup_{t∈[0,T]} |Ē^(n)(t) − E(t)| = 0,
where E(t) := (S(t), I_1(t), . . . , I_d(t), N(t)) is the solution to the following system of ordinary differential equations:
Ṡ = λ − δS − Σ_{k=1}^d β_k S I_k,
İ_i = β_i S I_i − (δ + α_i + γ) I_i, i = 1, . . . , d,
Ṅ = λ − δN − Σ_{k=1}^d α_k I_k,
with initial conditions S(0), I(0), and N(0).
Note that the result in the previous proposition can be written more compactly as Ė(t) = F(E(t)), where F := Σ_l l ρ_l is the limit of F^(n). While we continue to work with the finite n, fully stochastic process, the bifurcation structure of the deterministic system (S.12) will guide our analysis of the stochastic model. In particular, the steady states of this model, together with the degenerate case that arises when stability is exchanged between fixed points, give rise to two regimes that correspond to strong and weak selection in classical population genetics. To be explicit, let
R_{0,i} := β_i / (δ + α_i + γ)
be the basic reproduction number of strain i: R_{0,i} is the expected total number of new infections caused by a single infected individual, assuming an unlimited supply of susceptibles.
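The deterministic limit is easy to explore numerically. The snippet below (ours, not the SI's) assumes a concrete single-strain form of (S.12), Ṡ = λ − δS − βSI, İ = βSI − (δ+α+γ)I, Ṅ = λ − δN − αI, and checks that when the disease is supercritical the trajectory settles at the endemic susceptible density S⋆ = (δ+α+γ)/β forced by İ = 0; the parameter values are illustrative (λ is our choice, the rest echo the resident values used later in Figure S.2).

```python
import numpy as np
from scipy.integrate import solve_ivp

lam, delta, beta, alpha, gamma = 1.0, 0.05, 10.0, 2.0, 0.5

def rhs(t, y):
    # assumed concrete form of the deterministic limit, single strain
    S, I, N = y
    return [lam - delta * S - beta * S * I,
            beta * S * I - (delta + alpha + gamma) * I,
            lam - delta * N - alpha * I]

sol = solve_ivp(rhs, (0.0, 1000.0), [1.0, 0.01, 1.01], rtol=1e-8, atol=1e-10)
S_inf = sol.y[0, -1]
S_star = (delta + alpha + gamma) / beta   # endemic susceptible density
print(S_inf, S_star)                      # the two should agree closely
```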
If R_{0,i} ≠ R_{0,j} for all 1 ≤ i ≠ j ≤ d, the equations (S.12) have d + 1 fixed points: the disease-free equilibrium, and the d equilibria E_⋆,i at which the population is infected by a single strain i. When d = 1, it is shown in [32] that, when δ > α_1: if R_{0,1} > 1, then the unique endemic equilibrium of the strain, Ē_⋆,1, is globally asymptotically stable, whereas if R_{0,1} ≤ 1, the disease-free equilibrium is globally asymptotically stable. The stability of fixed points is slightly more subtle when there is more than one strain. Definition 1. We distinguish between two regimes of selection.
(i) The strong selection case, when R_{0,1} > R_{0,i} for all i > 1, δ > α_1 and R_{0,1} > 1; (ii) the weak selection case, when R_{0,1} = · · · = R_{0,m} = R_0 > 1 for some m ≤ d and R_0 > R_{0,j} for all m < j ≤ d.
Proposition 4. (i) In the strong selection case, the equilibrium state Ē_⋆,1 with strain 1 endemic and all other strains extinct is globally asymptotically stable from any initial condition for which I_1(0) > 0.
(ii) In the weak selection case, we arrive at a degenerate situation in which deterministic coexistence of strains 1, . . . , m is possible. Strains m + 1, . . . , d will eventually disappear, whereas there is a set Ω ⊂ R^{d+2} of points, corresponding to coexistence of strains 1, . . . , m, each of which is a fixed point for the system (S.12). The set Ω is globally attracting, but no point in Ω is an attracting fixed point.
Proof. Point (i) follows by a direct adaptation of the result in [6]: rearranging (S.12b), we see that
(d/dt)( (1/β_i) log I_i(t) − (1/β_1) log I_1(t) ) = 1/R_{0,1} − 1/R_{0,i},
and, recalling that I_1(t) is bounded for all t > 0, we see that for all i > 1 with R_{0,i} < R_{0,1}, I_i(t) → 0 as t → ∞. The same argument shows that in the weak selection case, strains m + 1, . . . , d will eventually disappear, whereas all points in Ω are fixed points. Moreover, all vectors u tangent to Ω are eigenvectors of the Jacobian of F, evaluated at any x ∈ Ω, corresponding to the eigenvalue 0. Thus, while Ω is a globally attracting set, no point in Ω is an attracting fixed point.
We now turn to the computation of the fixation probability of a novel strain in the fully stochastic system. Informed by the previous statement, we will consider two cases, strong and weak selection, where the dynamics of the process, and thus our approach to the fixation probabilities, are qualitatively different. We will then show that, despite the difference in the approaches and in the resulting expressions, our two results for the fixation probability agree on all intermediate scalings, and may thus be combined (heuristically) via the method of matched asymptotic expansions, to obtain a single expression valid across all scales.

The Strong Selection Case
We begin by recalling that for the deterministic approximation (S.12) to apply, we required that Ē^(n)(0) → E(0) as n → ∞. Unpacking this assumption, we see that I^(n)_i(0) must be of order n whenever I_i(0) > 0, i.e., that a non-trivial portion of the population is already infected with strain i.

For any strain with Ī^(n)_i(0) → 0, the deterministic limit has I_i(0) = 0, and thus I_i(t) ≡ 0 for all t ≤ T, for any fixed T > 0: until O(n) individuals are infected, strain i is effectively invisible to the deterministic approximation on any finite time interval. This is not to say that the strain is absent, but rather that, if we sample individuals from the population uniformly at random, the probability of sampling an individual infected with strain i tends to zero.
We will consider the case when a fixed number k of strain 2 individuals invade an established resident population. For our purposes, a strain i is established if it is initially present in macroscopic numbers, i.e., Ī^(n)_i(0) → I_i(0) > 0. In light of the results in Section 5.1, we will assume that there is only a single resident strain, strain 1. We will first consider the case when the resident strain is at its endemic equilibrium, and then generalise to the case when the resident strain, whilst still present in macroscopic numbers, is initially away from equilibrium.
Should the invading strain, strain 2, exceed εn individuals, for any ε > 0, we arrive again in the domain of applicability of the deterministic approximation (Ī^(n)_2(0) → I_2(0) > ε > 0). If the reproductive number of the invader is greater than that of the resident, R_{0,2} > R_{0,1}, then for n sufficiently large, the dynamics are essentially deterministic, and with high probability (i.e., tending to 1 as n → ∞) the process will in finite time T_ε enter an ε-neighbourhood of the fixed point E_⋆,2, for arbitrarily small ε > 0. Once the process reaches this new equilibrium, we will see that the resident strain is no longer viable, and it subsequently disappears.
If we start at the endemic equilibrium of the resident strain 1, E_⋆,1, then until I^(n)_2 exceeds εn, the epidemic process E^(n)(t) will remain close to that point. We thus have S̄^(n)(t) ≈ S̄_⋆,1, and to first approximation, strain 2 has per-host transmission and clearance/mortality rates of
β_2 S̄_⋆,1 and δ + α_2 + γ.
This latter is a birth and death process (see e.g., [3]), which will go extinct with probability
q = (δ + α_2 + γ) / (β_2 S̄_⋆,1) = R_{0,1} / R_{0,2}
if R_{0,2} > R_{0,1}, and with probability q = 1 otherwise. This probability of extinction q is for a single initial individual infected with strain 2, and becomes q^k for k initial individuals. Now, a birth and death process either goes extinct or grows arbitrarily large, so with probability 1 − R_{0,1}/R_{0,2} it will eventually exceed εn.
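The extinction probability of a supercritical linear birth-death process is the classic ratio of death to birth rates, which here yields q = R_{0,1}/R_{0,2}. A quick Monte-Carlo sanity check of that ratio, with generic per-capita rates b and d of our choosing (the embedded jump chain is a biased random walk, so simulating jump directions suffices):

```python
import numpy as np

def extinct_prob_mc(b, d, k, n_rep, cap, rng):
    """Monte-Carlo extinction probability of a linear birth-death process
    started from k individuals, per-capita birth rate b and death rate d.
    A path reaching `cap` individuals is counted as having escaped."""
    extinct = 0
    for _ in range(n_rep):
        x = k
        while 0 < x < cap:
            # the next event is a birth w.p. b/(b+d), otherwise a death
            x += 1 if rng.random() < b / (b + d) else -1
        extinct += (x == 0)
    return extinct / n_rep

rng = np.random.default_rng(1)
b, d, k = 2.0, 1.0, 1
est = extinct_prob_mc(b, d, k, n_rep=5000, cap=200, rng=rng)
print(est, (d / b) ** k)   # estimate vs. the exact value q^k = (d/b)^k
```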
Similarly, when we have reached a neighbourhood of E_⋆,2, the transmission and clearance/mortality rates of strain 1 are approximately
β_1 S̄_⋆,2 = (R_{0,1}/R_{0,2})(δ + α_1 + γ) and δ + α_1 + γ.
Since R_{0,2} > R_{0,1}, this is a subcritical birth-death process which goes extinct with probability 1. Thus, invasion implies replacement, where for our purposes, the process invades if it exceeds εn individuals for some fixed ε > 0.
To summarize, we have heuristically derived Proposition 5 (Strong Selection). Consider a population infected with 2 strains such that R_{0,2} > R_{0,1}.
(ii) [Novel strain in small number of copies, resident at endemic equilibrium] Suppose that Ī^(n)_2(0) = k/n for a fixed integer k, and that the resident is at its endemic equilibrium; then strain 2 fixes with probability tending to 1 − (R_{0,1}/R_{0,2})^k. In other words, the event that strain 1 becomes extinct asymptotically coincides with the event that strain 2 invades, which happens with a probability asymptotically equal to the survival probability of a time-homogeneous birth-death process; on this event, the system reaches in finite time the deterministic equilibrium (S.13) with only strain 2 endemic.
We give a rigorous proof of this result in Section 8.2, and compare (S.15) to fixation probabilities estimated from simulated epidemics in Figure S.4b.

The Weak Selection Case
Recall that weak selection corresponds to the case when R_{0,i} = R_{0,j} = R_0 for 1 ≤ i, j ≤ m, whereas R_0 > R_{0,j} for m < j ≤ d (to simplify the discussion, we consider only the case where m = d). We hasten to clarify, however, that R^(n)_{0,i} and R^(n)_{0,j} are allowed to differ by O(1/n) terms; as we shall see below, this is analogous to the weak selection limit of classical population genetics, and the values r_i appearing in these O(1/n) corrections will play the role of selection coefficients in a diffusion approximation.
In this case, we have a separation of timescales: there is a fast time-scale, in which (S.11) tells us that the stochastic process approximately follows the trajectories of (S.12) arbitrarily closely to an arbitrarily small neighbourhood of Ω. Then, as we discuss below, there is a slow-time scale, in which, having arrived at Ω, the stochastic process remains near this critical manifold.
We therefore consider the process in the slow time scale, Ê^(n)(t) := Ē^(n)(nt), where, as before, we let Î^(n)(t) = (Î^(n)_1(t), . . . , Î^(n)_d(t)), and note that Ê^(n) is again a density dependent Markov chain, now with all rates multiplied by n. Here, rescaling time by n is analogous to the passage to so-called "coalescent time" or "generation time", which is used to derive the diffusion limit of the Wright-Fisher model in classical population genetics.

Recalling (S.3), the SDE forÊ (n) (t) is then
Thus, in the slow time scale, the advective component is accelerated by a factor of n, causing the process to move rapidly to the critical manifold Ω; as n → ∞, this movement becomes instantaneous, and the process immediately jumps to Ω at time t = 0. Moreover, stochastic fluctuations away from Ω are restored instantaneously, so the process becomes "trapped" on Ω as n → ∞. The following statement formalizes this idea, using a projection π defined as follows. Let E(t, x) = (S(t, x), I(t, x), N(t, x)) be the solution to (S.12) with initial condition E(0, x) = x, and set π(x) := lim_{t→∞} E(t, x); i.e., π(x) is the point on Ω at which the trajectory of (S.12) starting from x meets Ω. Proposition 6. As n → ∞, Ê^(n)(t) ⇒ Ê(t), where the latter is a diffusion on the manifold Ω, solution to the system of stochastic differential equations (S.16). In other words, for each i, the drift of dÊ_i(t) is
(Dπ_i)(Ê(t)) · f(Ê(t)) + (1/2) Tr( Hπ_i(Ê(t)) a(Ê(t)) ),
where Dπ_i denotes the gradient vector of π_i, Hπ_i its Hessian matrix, and Tr denotes the trace operator.
This diffusion can be understood as the result of stochastic fluctuations around Ω immediately followed by a strong deterministic drift towards Ω.
As can be seen from Proposition 3, the drift pushes the process very rapidly onto Ω, so that in the limit, the process lives permanently in Ω. Now to understand the interplay between the deterministic dynamics towards Ω and the stochastic fluctuations around Ω, it is useful to think of the dynamics in two steps. Suppose that starting from a pointÊ(t−) ∈ Ω, the process E has a jump l. Then, the rescaled process (Ê (n) (t)) has a jump 1 n l.
In a second step, it is immediately projected back to the manifold by the drift at the new location, so:
Ê(t) = π( Ê(t−) + l/n ).
Thus, expanding the i-th component of the right-hand side of the last equation in powers of 1/n and recalling that π(Ê(t−)) = Ê(t−), we obtain
π_i( Ê(t−) + l/n ) = Ê_i(t−) + (1/n) Dπ_i(Ê(t−)) · l + (1/2n²) lᵀ Hπ_i(Ê(t−)) l + O(1/n³).
To determine E[dÊ_i(t)], we sum over the jumps l, each weighted by its rate, where, because we have rescaled time, d(nt) replaces dt in the probability of a jump at t. Recalling F(Ê(t)) = 0 and (S.7), this yields the first term of Eq (S.16).
A picture (Figure S.2) explains more immediately the emergence of the variance-induced drift: unless the flow lines are parallel, jumps of identical magnitude and direction will be returned to the manifold Ω at different distances from the initial point as one moves along the manifold. Of course, the rigorous way of obtaining the result is to use Itô's formula, as done in the proof.
To characterise the limit Ê, we shall make use of π. Unfortunately, π(x) is impossible to compute analytically, but we can still use it to obtain an SDE for Ê(t). We first observe that if F is twice-continuously differentiable, then π is as well [18]. The continuity of π then tells us that π(Ê^(n)(t)) ⇒ π(Ê(t)) as well. Since π has first and second derivatives, we may apply Itô's formula (see Section 4) to π(Ê^(n)(t)):

[Figure S.2: The mutant strain has a higher virulence than the resident. The deterministic trajectories are shown as grey arrows that point towards the manifold Ω (the black line). The light red ellipsoid has axes proportional to the infinitesimal variance of the jumps that displace each strain from a given point on the manifold (black dot). The combination of the effect of stochasticity and the fast deterministic return to the manifold generates a drift (red arrow) that favours the strain with the lower virulence. Parameter values of the resident: β_1 = 10, α_1 = 2, δ = 0.05, γ = 0.5. Virulence of the mutant: α_2 = 1.25, 2, and 2.75 in A, B and C, respectively.]
where, as before, a^(n)_{jk}(x) is given by (S.6) and ε^(n)_i(nt) is a lower-order error term. On first inspection, it might appear that the drift term, which is multiplied by n, explodes as n → ∞; however, from the definition of π, we see that π(E(t, x)) = π(x), and thus, differentiating in t at t = 0,
(Dπ)(x) F(x) = 0,
and the terms of order O(n) vanish identically.
We can thus replace the drift by the lower-order terms that remain, which stay bounded, as our assumptions (S.1) guarantee that the rates ρ^(n)_l converge to their limits ρ_l at rate O(1/n). Turning to the noise terms, note that, for example,
(1/n) M^(n)_{−e_0−e_{d+1}}(nt) = (1/n) P̃_{−e_0−e_{d+1}}( n² ∫_0^t ρ^(n)_{−e_0−e_{d+1}}(Ê^(n)(s)) ds ).
The latter is a stochastic process with jumps of order 1/n and variance
∫_0^t ρ^(n)_{−e_0−e_{d+1}}(Ê^(n)(s)) ds = ∫_0^t δ^(n) Ê^(n)_0(s) ds.
Thus, as n → ∞, (1/n) M^(n)_{−e_0−e_{d+1}}(nt) approaches a continuous stochastic process with variance ∫_0^t δ Ê_0(s) ds.
The martingale central limit theorem (see e.g., [14]) tells us that the only stochastic process with these properties is a Brownian motion with the same variance, (i.e., B −e 0 −e d+1 (t) is a standard Brownian motion with mean 0 and variance t).
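The martingale CLT can also be checked empirically: the rescaled compensated Poisson marginal (P(nt) − nt)/√n should be approximately N(0, t) for large n. A sketch (ours, with illustrative values of n and t):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, reps = 10_000, 2.0, 100_000
# marginal of the rescaled compensated Poisson process at time t
X = (rng.poisson(lam=n * t, size=reps) - n * t) / np.sqrt(n)

mean, var = X.mean(), X.var()
skew = (X ** 3).mean() / var ** 1.5   # of order 1/sqrt(n*t): ~0, i.e. Gaussian
print(mean, var, skew)                # close to 0, t = 2, and 0
```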
Proceeding similarly, in the limit, we may replace all the terms M^(n)_l with integrals of independent Brownian motions, so that as n → ∞, (1/n)(M^(n)_{l_1}(nt), . . . , M^(n)_{l_D}(nt)) converges to a vector of stochastic integrals driven by B_1, . . . , B_D, an ordered list of the D Brownian motions corresponding to the D noises M_l(t), with σ(x) as in the statement. Taking the limit as n → ∞ on both sides of (S.18) and recalling that π(Ê(t)) = Ê(t), we obtain (S.16).
While the drift terms seem rather mysterious, they may be interpreted geometrically. We first observe the following. Proposition 7. (Dπ)(x) is the projection onto the tangent space to Ω at x, T_xΩ.
Proof. Since π maps onto Ω and is the identity on Ω, we have π(π(x)) = π(x), so taking derivatives on left and right, using the chain rule, we have that
(Dπ)(π(x)) (Dπ)(x) = (Dπ)(x).
Now, if x ∈ Ω, then π(x) = x, so the left hand side is equal to (Dπ)(x)², and (Dπ)(x) is a projection. It remains to see that it is a projection onto the tangent space. We will do so by showing its image contains, and is contained in, the tangent space.
For the former, we recall that a vector X is in the tangent space to Ω at x if and only if there exists a parametric curve σ_{x,X}(t) in Ω such that σ_{x,X}(0) = x and σ′_{x,X}(0) = X. We then have π(σ_{x,X}(t)) = σ_{x,X}(t), and thus, differentiating at t = 0, (Dπ)(x)X = X, and thus T_xΩ is in the image of (Dπ)(x).
On the other hand, since π(x) ∈ Ω, we have F(π(x)) = 0, and again, taking derivatives using the chain rule, we have
(DF)(π(x)) (Dπ)(x) = 0,
so that the image of (Dπ)(x) is contained in the kernel of the Jacobian of F, i.e., in the tangent space. Thus, the drift vector (Dπ)(x)f(x) from (S.17) is the projection of the vector f(x) onto the tangent space to Ω. This is an immediate consequence of the strong drift: in the absence of constraints, the process would move (on average) in the direction of this vector, whose components are the relative fitness of each strain, multiplied by the density of that strain. However, density limitation prevents unlimited growth, confining the process to the manifold Ω, and thus the direction of motion to the tangent plane, and the strains experience a drift that is the best approximating vector to their unconstrained growth rates.

Computing the derivatives of π
To complete our derivation of the equations for the limiting process Ê(t), we must compute the derivatives of the π_i. Proposition 8. Let x ∈ Ω and 0 ≤ i ≤ d + 1. The first partial derivatives of π_i at x are given by the expressions in the statement below. The second partial derivatives of π_i at x are given, for any k, n both different from 0 and d + 1, by the corresponding expressions. Proof. We recall that under the weak selection hypothesis, the trajectories of (S.12) can be solved explicitly. Taking the limit as t → ∞, and then taking derivatives, we obtain one family of relations. Moreover, using (S.14), we obtain another. Together, these equations give us systems of linear equations that may be solved for the various derivatives of π_i(x). To illustrate, consider ∂π_i/∂x_0: from the above, we obtain a linear system whence ∂π_1/∂x_0 = 0, and thus ∂π_i/∂x_0 = 0 for all i. Proceeding in the same manner, we find ∂π_i/∂x_{d+1} = 0 as well, and thus that all second derivatives of π_i(x) involving x_0 or x_{d+1} vanish identically. We shall only need to evaluate these for x ∈ Ω, where Ê(t) is trapped. For such x, the first derivatives simplify to the expression given in the statement, since π(x) = x for x ∈ Ω. Similar calculations lead to the second partial derivatives.

Reduced Diffusion
We can use the results of the previous section to provide semi-explicit expressions for the SDE satisfied by Ê and displayed in Proposition 6. Proposition 9. Unlike the full stochastic SIR model, the weak selection limit Ê can be completely characterised by a system of equations that depend only on the variables Î_1, . . . , Î_d, in which dB_1(t), . . . , dB_D(t) are independent Brownian motions.
Proof. In the previous section, we observed that ∂π_i/∂x_0 = ∂π_i/∂x_{d+1} = 0, and thus any second partial derivative with respect to x_0 or x_{d+1} vanishes as well. Moreover, for i = 1, . . . , d, the remaining first derivatives are given by Proposition 8, and, for x ∈ Ω, F(x) = 0, so that, from (S.6), we obtain the limiting coefficients as n → ∞. Substituting these and the first derivatives into (S.16) allows us to complete the description of the weak limit Ê(t), exploiting the fact that the triples of Brownian motions (B_{−e_0+e_i}, B_{−e_i−e_{d+1}}, B_{−e_i}) and (B_{−e_0+e_j}, B_{−e_j−e_{d+1}}, B_{−e_j}) are independent for i ≠ j, to combine each triple into a single Brownian motion.

Frequency Process
Repeating the argument of Section 4, we can use the functions π_i to find an equation for the frequency of strain i,
P_i(t) := Î_i(t) / Σ_{j=1}^d Î_j(t),
where, because the limiting process is a diffusion, the standard Itô formula applies. We omit the lengthy calculations this entails, and simply present the result.
For our process P(t), we find that the drift and diffusion coefficients can be written in terms of a frequency-dependent coefficient s_i(p) and a quantity I_e(p). Writing p = (p_1, . . . , p_d), and recalling that the limiting process is confined to Ω, we see that we can explicitly express I_e as a function of p. Remark 3. The notation above has been deliberately chosen to evoke the Wright-Fisher diffusion in population genetics, with s_i(p) and I_e(p) a frequency-dependent selection coefficient and an effective population size, respectively. To understand the motivation for the latter, note that I_e(x) is an idealization of the total density of infected individuals (which is itself a random variable) when the diffusion limit is at the point x ∈ Ω, or, equivalently, when the frequencies of the various strains are p. In principle, additional population structure and the corresponding sampling effects could be taken into account via an "effective infected population size", much as effective population sizes are used in the Wright-Fisher model.

Results for d = 2
If we have d = 2 strains, then, since P_1(t) + P_2(t) = 1, it is sufficient to consider the frequency of the invading strain, strain 2. Writing P(t) := P_2(t), the results of the previous section tell us that the generator of P(t) is the operator
(Lh)(p) = b(p) h′(p) + (1/2) a(p) h″(p)
on the space of continuous functions on [0, 1]. We recall that if the diffusion process P(t) has SDE dP(t) = b(P(t)) dt + ς(P(t)) dB(t), then a(p) = ς(p)² is the infinitesimal variance of dP(t), and expectations of functions of P(t) satisfy the Kolmogorov backward equation associated with L (see e.g., [15,11,13]). The generator allows us to compute many quantities of interest for the process P(t). In particular, if h(p) is the probability of fixation of the invading strain given P(0) = p, then h(p) satisfies the boundary value problem
Lh = 0, h(0) = 0, h(1) = 1.
This may be solved to give
h(p) = ∫_0^p e^{−φ(y)} dy / ∫_0^1 e^{−φ(y)} dy, where φ(y) := ∫_0^y (2b(z)/a(z)) dz.
Let h̃(p) be the numerator of this fraction. Substituting the expressions for a(p) and b(p) and some simplification yields an explicit integral expression for h̃(p).
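The boundary value problem for a fixation probability can always be solved by quadrature, via h(p) = ∫_0^p ψ(y) dy / ∫_0^1 ψ(y) dy with ψ(y) = exp(−∫_0^y 2b(z)/a(z) dz). The sketch below (ours) uses illustrative Wright-Fisher-type coefficients b(p) = s p(1−p) and a(p) = p(1−p)/N_e, not the SI's exact a(p) and b(p), and checks the quadrature against the known closed form for that special case:

```python
import numpy as np

s, Ne = 0.02, 100.0   # illustrative selection coefficient and population size

def trap(f, x):
    """Composite trapezoid rule."""
    return float(((f[:-1] + f[1:]) * 0.5 * np.diff(x)).sum())

def h_numeric(p, m=4000):
    # with these coefficients, 2b/a = 2*Ne*s is constant, so
    # psi(y) = exp(-2*Ne*s*y); generic b, a would be integrated numerically
    y = np.linspace(0.0, 1.0, m + 1)
    psi = np.exp(-2.0 * Ne * s * y)
    mask = y <= p + 1e-12
    return trap(psi[mask], y[mask]) / trap(psi, y)

p = 0.1
closed = (1 - np.exp(-2 * Ne * s * p)) / (1 - np.exp(-2 * Ne * s))
print(h_numeric(p), closed)   # the two should agree closely
```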
For ease of notation, we will write We can evaluate this expression numerically, but we will be particularly interested in a number of special cases, when we can obtain analytical approximations toh(p).
(i) When r_2 = r_1 (or, more generally, when R^(n)_{0,i} = R_0 (1 + o(1/n))), we can give an explicit closed form for h̃(p), and thus h(p). This expression is not, however, especially illuminating.
We then have an expansion whose error term, denoted O(2), collects terms of order O(σ²), O(θ²), or O(σθ).
Then, to order p², the fixation probability may be written informally, in terms of the original parameters, to lowest order. In practice, we are most interested in the case when a single individual carries the mutant strain, so that p = 1/(n I_e(0)) = 1/I^(n)(0). While our proofs (which assume that p is independent of n, and thus that the number of invading individuals is proportional to n) do not justify taking this value for p, we find that the expression for the fixation probability obtained by taking p = 1/I^(n)(0) agrees extremely well with simulations (Figure S.4c), an example of the so-called "unreasonable effectiveness of mathematics" [33], and we will use it to investigate the long term evolution of virulence in Section 6.

Reconciling the Strong and Weak Selection Results
On first inspection, our expressions for the strong and weak selection limits have little in common. In Section 5.2, we saw that if R_{0,2} > R_{0,1}, a fixed number k of invading individuals fix with probability asymptotic to 1 − (R_{0,1}/R_{0,2})^k. More generally, we might consider the intermediary scalings: let 1 ≪ κ_n ≪ n, and suppose that R^(n)_{0,i} = R_0 (1 + r_i κ_n / n). Substituting these into our expression for strong selection, we see that, provided r_2 > r_1, the probability of fixation is asymptotic to 1 − e^{−(r_2 − r_1)ι} as n → ∞ (n.b., when I^(n)_2(0) κ_n / n → 0, the probability of fixation is trivially 0).
On the other hand, we can begin with our expression for the fixation probability under weak selection, written in terms of the original parameters R_{0,1} and R_{0,2}. Replacing R_{0,2} as above, we can find a large-n asymptotic expression for this probability using a pair of lemmas.

Lemma 1. Suppose that φ(x) and g(x) are, respectively, an increasing continuously differentiable function and a continuous function on [a, b] (−∞ < a < b < ∞), and that φ'(a) > 0. Then,

Proof. Fix ε > 0 such that φ'(a) > ε. Using Taylor's theorem, we may write φ(x) = φ(a) + (φ'(a) + R(x))(x − a), where R(x) → 0 as x → a. Fix δ > 0 such that |R(x)| < ε for x ∈ [a, a + δ]. Since ε > 0 can be chosen arbitrarily small, the result follows.

Lemma 2. Proof. By direct computation, we have the stated identity; the result follows.
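Lemma 1 is a boundary version of Laplace's method: for increasing φ, the integral ∫_a^b g(x) e^{−Mφ(x)} dx is dominated by the left endpoint, and equals g(a) e^{−Mφ(a)} / (M φ'(a)) to leading order as M → ∞. A quick numerical sanity check, with toy choices of φ and g (not those of the paper):

```python
import math

def lemma1_check(M, n=200_000):
    """Ratio of the integral int_0^1 g(x) e^{-M phi(x)} dx to the
    boundary-Laplace estimate g(0) e^{-M phi(0)} / (M phi'(0)).
    Toy choices: phi(x) = x + x^2 (so phi'(0) = 1), g(x) = cos(x)."""
    phi = lambda x: x + x * x
    g = lambda x: math.cos(x)
    h = 1.0 / n
    # trapezoidal rule on [0, 1]
    total = 0.5 * (g(0.0) * math.exp(-M * phi(0.0))
                   + g(1.0) * math.exp(-M * phi(1.0)))
    for k in range(1, n):
        x = k * h
        total += g(x) * math.exp(-M * phi(x))
    total *= h
    estimate = g(0.0) * math.exp(-M * phi(0.0)) / (M * 1.0)
    return total / estimate   # tends to 1 as M grows

ratio = lemma1_check(200)
```

The ratio approaches 1 at rate O(1/M), consistent with the first-order nature of the lemma.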
To apply the lemmas here, we take a = 0, b = 1, M_n = n/κ_n, and g as before, whereas we now take a slightly different definition for φ(p), which has an o(1) correction as n → ∞. Then, using Lemma 1, we conclude that h̃(1) is asymptotically equivalent to g(0)e^{−M_n φ(0)}/(M_n φ'(0)). Now, we consider the numerator when we start with I_2(0) ≈ I_e(P^{(n)}(0))κ_n, i.e., so that X^{(n)}_{M_n} ∼ p.
Thus, applying Lemma 2, we have the asymptotic below, and the probability of fixation obtained from the weak selection expression is again asymptotic to 1 − e^{−(r_2 − r_1)ι}.
While this is not a rigorous proof, it does demonstrate heuristically that the weak and strong selection expressions for the fixation probability agree to first order when applied across the intermediate selective regimes. In particular, we can use the method of matched asymptotic expansions (see e.g., [17,20]) to combine our two solutions into a single expression valid across all scales, by summing the expressions for strong and weak selection and subtracting their common limit, where all are expressed in the unscaled (i.e., strong selection) parameters, and where [x]_+ = max{x, 0}. We illustrate how these approximations compare to a simulated epidemic in Figure S.
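The additive composition rule used here - inner solution plus outer solution minus their common (overlap) limit - can be illustrated on a standard boundary-layer problem; the ODE below is a textbook example, not the epidemic model itself.

```python
import math

def exact(x, eps):
    """Exact solution of eps*y'' + y' + y = 0, y(0) = 0, y(1) = 1."""
    d = math.sqrt(1.0 - 4.0 * eps)
    m1 = (-1.0 + d) / (2.0 * eps)   # slow (outer) characteristic root
    m2 = (-1.0 - d) / (2.0 * eps)   # fast (boundary-layer) root
    return (math.exp(m1 * x) - math.exp(m2 * x)) / (math.exp(m1) - math.exp(m2))

def composite(x, eps):
    """Additive composite: outer + inner - overlap.
    Outer solution e^{1-x}; inner solution e(1 - e^{-x/eps}); overlap e."""
    return math.exp(1.0 - x) + math.e * (1.0 - math.exp(-x / eps)) - math.e

def sup_error(eps, n=1000):
    """Sup-norm distance between the exact solution and the composite."""
    return max(abs(exact(k / n, eps) - composite(k / n, eps))
               for k in range(n + 1))
```

The composite is uniformly accurate on [0, 1], with error shrinking as eps does - the same behaviour one expects of the combined fixation-probability expression across the selective regimes.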

Adaptive Dynamics
Using the expressions for the fixation probability derived above, we can apply the framework of adaptive dynamics to investigate the long-term evolution of strains. In what follows, we give an informal discussion of the derivation of the canonical diffusion for the process, a generalisation of the canonical equation of adaptive dynamics which allows us to consider the influence of random drift on phenotypic evolution. We refer the reader to [26] for a more extensive discussion aimed at a biological audience, and to [9] for a mathematically rigorous derivation of the canonical equation.
We briefly recall the assumptions of adaptive dynamics in the context of our epidemic models; throughout, we assume that a novel mutant strain with virulence α invades a population which is at the endemic equilibrium of a resident strain with virulence α'.
We assume a tradeoff between transmissibility and virulence, so that the contact rate of a strain depends on its virulence according to some fixed function β(α). For our numerical investigations, we take the form below, where w is a parameter that determines the "flatness" of the fitness landscape.
The reproductive number is then also a function of the virulence. We will assume that there is a value, α_0, of the virulence that maximises R_0(α).
Under these assumptions, the density of individuals infected with the resident strain at the endemic equilibrium is non-zero on a range (α_min, α_max); outside of this range, R_0(α) ≤ 1, and the pathogen goes extinct (when R_0 is independent of the virulence, α_min = 0; in this case, as we will see below, we must fix α_max < ∞ to have a probability distribution; the choice of α_max is then arbitrary).
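Since the flat-landscape case below uses β(α) = R_0(δ + α + γ), the reproductive number satisfies R_0(α) = β(α)/(δ + α + γ). A small sketch locating α_0 numerically; the square-root tradeoff used here is purely illustrative (the paper's β(α) is not reproduced), chosen because calculus then gives α_0 = δ + γ exactly, which the grid search can be checked against.

```python
import math

delta, gamma, b = 1.0, 2.0, 4.0   # illustrative parameter values

def beta(alpha):
    """Hypothetical concave transmission-virulence tradeoff (toy choice)."""
    return b * math.sqrt(alpha)

def R0(alpha):
    # R0(alpha) = beta(alpha) / (delta + alpha + gamma), as implied by the
    # flat-landscape relation beta(alpha) = R0 * (delta + alpha + gamma)
    return beta(alpha) / (delta + alpha + gamma)

# locate alpha_0 by a fine grid search on (0, alpha_max]
alpha_max = 10.0
grid = [k * alpha_max / 100_000 for k in range(1, 100_001)]
alpha0 = max(grid, key=R0)
# for beta = b*sqrt(alpha), differentiating gives alpha_0 = delta + gamma
```

Any other tradeoff can be dropped in for beta; only the concavity matters for an interior maximiser to exist.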
We then have, using (S.27), the fixation probability of a mutant strain of virulence α arising in a single individual in a population in which a strain of virulence α' is resident. To introduce the evolutionary dynamics, we assume that mutations occur in individuals with virulence α at a per-capita rate µη(α), where µ > 0 is a dimensionless parameter that we will take to 0. This will ensure that, with high probability, fixation occurs before a second novel mutation can arise. The population is thus assumed to be monomorphic (i.e., all individuals have the same virulence) between invasion events. We will assume that, given a mutation occurs in an individual of virulence α, the offspring has virulence α' with probability K(α, α' − α), where K has mean 0 and finite variance. We now pass from the individual-based model to the trait substitution sequence [25,10]: we have seen that whenever a new strain arises, either the mutant or the resident strain will rapidly go extinct.
Until a new mutant arises, the population will be composed entirely of individuals of the surviving strain. Let A_µ(t) be a random variable giving the virulence of the strain that survived the last competition event prior to time t. The population is thus entirely composed of the strain A_µ(t), except for times t in the short intervals when two strains are competing. If we pass to a "mutational time scale", t/µ, then as µ → 0 the duration of these intervals shrinks to 0, and we are left with a process in which novel mutations either fix or disappear instantly, so that the population is only ever observed with a single strain at equilibrium.
Formally, as µ → 0, A_µ(t/µ) ⇒ A(t), a continuous-time Markov chain that jumps from virulence α to α' when a strain of virulence α' successfully invades a population with resident virulence α [9]. The process A(t) has the generator below (recall that the generator is the operator L defined via its action Lf on suitable test functions f); if the mutant has a lower reproductive number than the resident, it cannot fix. We now consider the limit of small mutations and, following [8], assume that mutations will be of order O(ε), where ε is a dimensionless parameter which we will take to 0; this limit of small mutational effects allows us to ignore terms of order O(|α − α'|^2) in the fixation probability. In particular, we introduce a rescaled kernel so that K_ε(α, z) dz is the probability that the mutant has virulence α + εz. Let A_ε(t) be the process defined as A(t) above, but with the kernel K replaced by the kernel K_ε. Now, consider the time-rescaled process Â_ε(t) := A_ε(t/ε^2); this has the generator below. In Section 8.3, we prove:

Proposition 10. As ε → 0, Â_ε(t) converges to a limiting diffusion Â(t) with advective and diffusion coefficients as given in (S.29) and (S.30), and generator L̂, for f ∈ C^2[α_min, α_max] such that f'(α_min) = f'(α_max) = 0.

Stationary Distribution
Using (S.29) and (S.30), we may compute the stationary distribution ψ of Â(t); this stationary distribution describes the long-term behaviour of the virulence, after any "memory" of the initial state has been lost. Given any subset A ⊂ (α_min, α_max), ψ(A) gives the proportion of time that the virulence spends in the set A, or, equivalently, the probability that at some random sampling time t the virulence takes a value in A. ψ is characterised by the relation below. In particular, if ψ(dα) has a density, which, in a slight abuse of notation, we write as ψ(α), then ψ satisfies L̂*ψ = 0, where L̂* is the adjoint operator to L̂ (and thus ∂f/∂t = L̂*f is the Fokker-Planck equation for the probability density of Â(t), f(α, t)). Thus, ψ takes the form below, where Z is a normalising constant.
From the previous, we have the form of ψ. Unfortunately, we can only compute its integral analytically in the case of a "flat landscape", when R_0(α) ≡ R_0, independently of α. In this case, we have β(α) = R_0(δ + α + γ) and β'(α) = R_0. In particular, in the case when η(α) and ν(α) (and thus σ^2(α)) are constants independent of α, we can integrate this to obtain Z, and thus a closed expression for the stationary distribution. Because this is a zero-flux solution (we have a zero-flux boundary condition), α_min and α_max may be chosen arbitrarily. We take α_min = 0; α_max must be finite to ensure that Z is finite.
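A zero-flux stationary density of a one-dimensional diffusion with drift b and diffusion coefficient a is characterised by the vanishing of the probability flux J = bψ − (1/2)(aψ)'. The sketch below verifies this numerically for stand-in coefficients (the paper's (S.29)-(S.30) are not reproduced here): an Ornstein-Uhlenbeck-type drift with constant diffusion, for which the zero-flux density is a Gaussian.

```python
import math

# Stand-in coefficients: drift toward alpha = 1, constant diffusion
b = lambda a: -(a - 1.0)
diff = lambda a: 0.1

def psi(alpha):
    """Unnormalised zero-flux stationary density
    psi ∝ (1/diff) exp( int 2 b / diff ) = 10 exp(-10 (alpha - 1)^2)."""
    return math.exp(-10.0 * (alpha - 1.0) ** 2) / diff(alpha)

def flux(alpha, h=1e-4):
    """Probability flux J = b psi - (1/2) d/dalpha (diff * psi),
    with the derivative taken by central finite differences."""
    dpsi = (diff(alpha + h) * psi(alpha + h)
            - diff(alpha - h) * psi(alpha - h)) / (2.0 * h)
    return b(alpha) * psi(alpha) - 0.5 * dpsi

# flux should vanish identically for the zero-flux solution
worst = max(abs(flux(0.2 + k * 0.016)) for k in range(100))
```

Because J ≡ 0, the boundary points can indeed be moved without changing the form of the solution, only the normalising constant - the arbitrariness of α_min and α_max noted above.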
In the next section we will show how one may obtain an analytical approximation in the large n limit.

Asymptotic Approximation to Stationary Distribution
The stationary distribution (S.31) lends itself to an approximation by Laplace's method (see e.g., [12]), which tells us that if φ(x) is a twice differentiable function with a unique local maximum attained at x_0 ∈ (a, b), and g(x) is continuous, then the approximation (S.34) holds (see e.g., [4]).
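For an interior maximum, Laplace's method gives ∫_a^b g(x) e^{Mφ(x)} dx ≈ g(x_0) e^{Mφ(x_0)} sqrt(2π/(M |φ''(x_0)|)). A short numerical check with toy choices of φ and g (not the paper's):

```python
import math

def laplace_interior(g_x0, phi_x0, dd_phi_x0, M):
    """Laplace estimate g(x0) e^{M phi(x0)} sqrt(2 pi / (M |phi''(x0)|))."""
    return g_x0 * math.exp(M * phi_x0) * math.sqrt(2.0 * math.pi / (M * abs(dd_phi_x0)))

def trapezoid(f, a, b, n=100_000):
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n))
    return s * h

# Toy integrand with an interior maximum of phi at x0 = 0.5
phi = lambda x: -(x - 0.5) ** 2 - (x - 0.5) ** 4
g = lambda x: 1.0 + x
M = 500.0
numeric = trapezoid(lambda x: g(x) * math.exp(M * phi(x)), 0.0, 1.0)
approx = laplace_interior(g(0.5), phi(0.5), -2.0, M)
rel_err = abs(numeric - approx) / numeric
```

The relative error is O(1/M), the same order as the corrections used below to extract finite-n effects.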
We can apply this to our stationary distribution by first identifying the functions g and φ from (S.32); we can thus apply Laplace's formula to compute Z.
We next observe the first-order condition determining α_0, which depends on the choice of tradeoff function β(α). We note briefly that this is a local maximum if and only if β''(α_0) < 0. In particular, if we assume that R_0(α) has a unique global maximum, then it must occur at α_0. We will henceforth make this assumption (n.b., we do not have to assume this in general, but in applying Laplace's method, we require that α_0 be a global maximum; more generally, we could partition (α_min, α_max) into intervals each containing a single local maximum).
We then have the expressions below. Next, recalling that φ'(α_0) = 0, a Taylor expansion gives the local behaviour of φ. Now, for α close to α_0, (α − α_0)^3 will be quite small, so we can locally approximate our full stationary distribution by one that is almost a Gaussian with mean α_0 and the variance identified above, except for a pre-factor of σ^2(α_0)β(α_0)/(σ^2(α)β(α)), which skews the distribution. Again assuming that η(α) and ν(α) are constants independent of α, this simplifies further.

Remark 6. Applying (S.34) with f(α)g(α) in place of g(α), for an arbitrary differentiable function f(α), we may estimate the error in integrating f(α) against the true and approximate densities for the stationary distribution. From this, we see that, in the bounded Lipschitz metric on probability measures, the difference between the true stationary distribution ψ(dα) and the Laplace approximation ψ_approx(dα) satisfies the bound below. Unfortunately, we cannot similarly bound the total variation distance: for any M > 0, the relevant function may be made arbitrarily large.

Mean & Mode of the Stationary Distribution
We observe that, whilst the stationary distribution is closely related to a Gaussian centred at α_0, the value of the virulence that maximises R_0(α), the full stationary distribution does not have mean α_0, nor is α_0 the most probable value of the virulence.

Estimating the Mean
Applying Laplace's method with g(α) replaced by αg(α) allows us to estimate the mean of the stationary distribution; to lowest order, normalising by our prior estimate of Z, we find that the mean is α_0 to order O(1/n). To observe the effects of a finite population size, we can then use higher-order corrections to Laplace's method to obtain the O(1/n) terms in both the integral above and in Z; we omit the calculations, but remark that the mean can then be shown to take the form below, the correction simplifying to a term proportional to −1/(δ + α_0 + γ) in the case when η and ν are independent of α. We note that this order O(1/n) term is proportional to the variance of the best-fit Gaussian of the previous section. One may similarly obtain the variance and skewness of the stationary distribution. We note that if η'(α_0) ≥ 0 and ν'(α_0) ≥ 0 (for example, if both rates are independent of α), then the stationary distribution has negative skew.
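The downward shift of the mean and the negative skew can be seen directly by computing moments of a near-Gaussian density with the skewing pre-factor. The sketch below uses a hypothetical increasing, concave tradeoff (β = sqrt, σ^2 constant, so the pre-factor reduces to β(α_0)/β(α)); for such a tilt, the mean drops below α_0 by approximately s^2 β'(α_0)/β(α_0) = s^2/(2α_0).

```python
import math

# Hypothetical ingredients (the paper's beta and sigma^2 are not reproduced)
alpha0, s = 3.0, 0.3                 # R0-maximiser and best-fit Gaussian sd
beta = lambda a: math.sqrt(a)        # increasing, concave tradeoff (toy)

def density(a):
    """Near-Gaussian stationary density with skewing pre-factor
    beta(alpha0)/beta(alpha), sigma^2 taken constant (unnormalised)."""
    return math.exp(-(a - alpha0) ** 2 / (2.0 * s * s)) * beta(alpha0) / beta(a)

# midpoint-rule moments on [alpha0 - 5s, alpha0 + 5s]
lo, hi, n = alpha0 - 5 * s, alpha0 + 5 * s, 60_000
h = (hi - lo) / n
xs = [lo + (k + 0.5) * h for k in range(n)]
w = [density(x) for x in xs]
Z = sum(w) * h
mean = sum(x * f for x, f in zip(xs, w)) * h / Z
m3 = sum((x - mean) ** 3 * f for x, f in zip(xs, w)) * h / Z
# mean sits below alpha0 by about s^2/(2 alpha0); third moment is negative
```

Replacing beta by any increasing function reproduces the qualitative picture: mean below α_0, with negative skew.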

Estimating the Mode
We begin by observing that the density function of the stationary distribution (S.31) may be written in the form below, and thus has its maximum at the value of α satisfying (S.40), where φ(α) is given by (S.35).
Even in this special case, we cannot solve for α exactly. Instead, we will seek a perturbative solution to (S.40) in the general case. We already know that φ'(α_0) = 0; we thus seek a solution in the form of a small perturbation about α_0. Substituting this into (S.40) and Taylor expanding both sides, we obtain an expression which may be expanded using the formula for g'(α_0)/g(α_0) given in the previous sections. In particular, when η(α) and ν(α) (and thus σ^2(α)) are constants independent of α, we find that, to first order, the modal virulence is the virulence maximising R_0(α), less the product of the variance of the best-fit Gaussian and the expected infectious period when the virulence is α_0.

Some Remarks Regarding Simulations
For all simulations, we simulate the Markov process described in Table 1.
For Figure 2, we consider a single individual infected with the mutant strain, with the initial numbers of susceptible, infected and total individuals set equal to the closest integers to nS_eq, nI_eq and nN_eq, where (S_eq, I_eq, N_eq) is the (deterministic) endemic equilibrium with only the resident strain present. No further mutations are allowed to occur. For each value of s and σ, 10^6 simulations were run until fixation of either the mutant or the resident, and we plot the fraction in which the mutant fixed.
For Figures 3, 4 & 5, following Section 6, we impose a tradeoff between β and α, taking the tradeoff function introduced there. The simulations were started with nN_eq(α_0) individuals, nI_eq(α_0) of them infected with a resident strain of virulence α_0 - the virulence maximising R_0(α) - and nS_eq(α_0) susceptibles (as previously, (S_eq(α_0), I_eq(α_0), N_eq(α_0)) is the endemic equilibrium for the deterministic limit with a resident strain of virulence α_0). With probability µ per time-step, an individual was selected uniformly at random from the infected population, and the virulence of the mutant pathogen was selected according to the kernel taking the value 1/2 if α_m − α = ±∆α, and 0 otherwise, for ∆α = 6/100, α_min = 0, and α_max = 6. There are thus 100 discrete values for the virulence (n.b., by assumption, at α = α_min, mutations of +∆α occur with probability 1/2, whereas the process remains at α_min with probability 1/2, and similarly at α = α_max); these are the bins of the histograms in Figures 3, 4 & 5. In [7], it is shown that µ has to be of the order O(1/(n ln n)) or smaller to ensure that mutations occur sufficiently rarely that little or no clonal interference occurs, and that the resident population is in a small neighbourhood of the equilibrium of the deterministic limit when a mutation occurs. In Figures 3 & 4, we take n = 200, so n ln n ≈ 1060, and take µ = 1/1000 to be in the appropriate range. A small amount of clonal interference was observed, but it did not contribute significantly to the histogram. For selection, taking w = 0 forces all strains to have equal R_0, ensuring weak selection; for w > 0, selection is always of order O(1/n) when w = 0.01, whereas when w = 0.1, one can see strong selection when |α_0 − α| is large. Despite this, our analytical results continue to accurately predict the stationary distribution in Figures 4C & 5, where selection need not be weak (α far from α_0).
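The invasion step of such simulations can be sketched with the linear birth-death (branching) approximation of Section 5.2: started from a single infective, the embedded jump chain steps +1 with probability r/(1 + r) and −1 otherwise, where r is the ratio of reproductive numbers, giving establishment probability 1 − 1/r. The seeded Monte Carlo below is an illustrative stand-in, not the full Table 1 process; the cap of 100 infectives is a hypothetical establishment threshold.

```python
import random

def invasion_trial(r, rng, cap=100):
    """One run of the embedded jump chain of a linear birth-death process:
    from z infectives, the next event is a birth w.p. r/(1+r) and a death
    otherwise (r = R0_mutant / R0_resident; for linear rates these
    probabilities do not depend on z). Returns True if the lineage reaches
    `cap` (treated as establishment) before hitting 0."""
    z = 1
    while 0 < z < cap:
        z += 1 if rng.random() < r / (1.0 + r) else -1
    return z >= cap

rng = random.Random(42)
trials = 4000
est = sum(invasion_trial(2.0, rng) for _ in range(trials)) / trials
# branching theory: establishment probability = 1 - R0_resident/R0_mutant
est_sub = sum(invasion_trial(0.5, random.Random(1)) for _ in range(1000)) / 1000
```

With r = 2, the estimate should sit near 1 − 1/2 = 0.5; a subcritical mutant (r = 0.5) essentially never establishes, matching the rapid-extinction behaviour assumed between mutation events.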

Proof of Proposition 2
Substituting f with Π_i in Itô's formula (S.8) for jump processes yields an expansion with error term ε^{(n)}_i(t). Now it remains to prove that n^2 ε^{(n)}_i(t) is uniformly bounded with high probability. Taylor's theorem tells us that

Now some elementary computations yield
where the functions g ij (x, y) satisfy lim x→y g ij (x, y) = 0 uniformly on compact sets.

Now, recalling the definitions above, we see that the jumps ∆Ī^{(n)} are of order 1/n. In particular, since f_i(x) is smooth outside of a neighbourhood of 0, we can conclude that g_ij is bounded above by a constant multiple of |x − y|; this allows us to conclude that |g_ij(Ī^{(n)}(s), Ī^{(n)}(s−))| ≤ C/n, and that |ε^{(n)}_k(s)| is non-zero only if some pair of processes among P_{−e_0+e_j}, P_{−e_j−e_{d+1}}, P_{−e_j} and P_{−e_0+e_k}, P_{−e_k−e_{d+1}}, P_{−e_k} jump simultaneously; if j ≠ k, the Poisson processes are independent, and the probability of such an event in an interval [t, t + ∆t) is O(∆t^2), and thus tends to 0 as ∆t → 0. We seek an upper bound on this quantity. To that end, we begin by observing that S̄^{(n)}(t) and each Ī^{(n)}_i(t) are bounded above by N̄^{(n)}(t); since the relevant Poisson process is increasing in t, we have a bound for t ≤ T, and, applying Chebyshev's inequality, we see that for any C > 0 the exceptional probability vanishes as n → ∞. Thus, for any fixed T > 0, N̄^{(n)}(t) is bounded above and below on [0, T] by e.g., N̄^{(n)}(0) + λ^{(n)}T ± 1, with probability that approaches one as n tends to infinity.
Thus, for example, we may proceed exactly as above to conclude that, for t ≤ T, the relevant quantity is bounded above with probability approaching 1 as n → ∞, and similarly for the other Poisson processes, from which we conclude that there exists some constant C such that the desired bound holds with high probability.

Proof of Proposition 5
In this section, we will make the heuristic argument of Section 5.2 rigorous using the technique of coupling (see [2] for a very good introduction): we start by constructing birth and death processes that bound I^{(n)}_1(t) above and below provided S̄^{(n)}(t) and N̄^{(n)}(t) remain within ε of the endemic equilibrium, and such that the upper and lower bounds approach one another as ε → 0. Finally, we show that the probability that S̄^{(n)}(t) and N̄^{(n)}(t) depart an ε-neighbourhood of the endemic equilibrium before strain 2 has either successfully invaded or gone extinct goes to 0 as n → ∞. Since ε is arbitrary, we recover the naïve branching process result.

Macroscopic Initial Frequencies
In this section, we prove the following extension of part (i) of Proposition 5 to the case where the population is infected with d ≥ 2 strains.

Proposition 11. Suppose that R_{0,1} > R_{0,i} for all i > 1. If Ī^{(n)}_1(0) → I_1(0) > 0, then all strains i > 1 will go extinct with high probability.
In particular, fix ε > 0 small enough that the inequalities below hold. For reasons that will become transparent below, we will assume a bound involving some constant 0 < B < 1 that will be determined later. Let the stopping time τ^{(n)}_ε be as defined below, where we adopt the convention that τ^{(n)}_ε = ∞ should the corresponding event never occur. Next, fix η > 0 sufficiently small; we can then assume that n is sufficiently large that |β^{(n)}_i − β_i| < η, etc. Thus, for t < τ^{(n)}_ε, the per-infective transmission rate for strain i, and the total per-infective rate of removal of strain i, satisfy the bounds below.

Lemma 3. For t < τ^{(n)}_ε, the number of infectives of strain i is stochastically smaller³ than the birth and death process Z^+_i(t) with Z^+_i(0) = I^{(n)}_i(0) and the birth and death rates given below, and stochastically greater than the birth and death process Z^-_i(t) with Z^-_i(0) = I^{(n)}_i(0) and the corresponding rates.

Proof. It suffices to construct coupled versions of I^{(n)}_i(t), Z^+_i(t) and Z^-_i(t). We will do so inductively, at each step constructing the processes up to the next of the aggregated jump times of all three processes, which we denote τ_1, τ_2, . . .. For our underlying probability space, we assume sequences of independent rate 1 exponential random variables E_k and independent uniformly distributed random variables U_k on [0, 1], for k = 1, 2, . . ..

Suppose that I^{(n)}_i(t), Z^+_i(t) and Z^-_i(t) have been constructed up to τ_k (trivially true for k = 0). Note that ρ_{k+1} below is an upper bound on the combined rate of all transitions of all three processes. Set τ_{k+1} as below, so that the increment is a rate ρ_{k+1} exponential random variable. Next, for t < τ_{k+1}, we set the processes as below. Finally, we set the values at τ_{k+1}. It is readily verified that the resulting processes have the correct jump rates.

³ Given random variables X and X', we say that X is stochastically smaller than X', denoted X ⪯ X', if the corresponding comparison of distribution functions holds. Similarly, a stochastic process X is stochastically smaller than the process X' if X(t) ⪯ X'(t) for all t ≥ 0. One defines stochastically greater analogously.
Remark 7. Note that, given ε > ε_1 > 0, choosing η > 0 as before and η_1 > 0 analogously, we can similarly construct processes Z^{+,1}_i(t) and Z^{-,1}_i(t), etc. We shall apply this with a decreasing sequence of values ε_n > 0 below. Now, our choice of η ensures that Z^+_i(t) is subcritical for all i > 1. In particular, for any sequence t_n > (1/|µ^+_i|) ln n, for all i > 1, we have, using Markov's inequality, that the corresponding probability vanishes as n → ∞. Thus, if we show that P(τ^{(n)}_ε > t_n) → 1 as n → ∞, we can conclude that all strains i > 1 vanish before t_n with high probability.
To this end, we start by defining the stopping times τ^{(n)}_{ε,i} (with τ^{(n)}_{ε,i} = ∞ should the respective process never exceed εn). We continue with a classical result for birth and death processes: for i > 1, a simple calculation yields the hitting probabilities below. We saw above that τ^{(n)}_{0,i}, and thus τ^{(n)}_i, are with high probability bounded above by any such sequence, with error probability converging to 0 as n → ∞ for any B < 1. We thus have P(τ^{(n)}_{ε,i} > t_n) → 1 as n → ∞.
For the remaining three values τ^{(n)}_{ε,i}, i = 0, 1, d + 1, we take a different approach: the coordinates of Ē_{*,1} corresponding to S, I_1, and N are all non-zero, and a branching process approach is no longer appropriate. Instead, we recall the SDE representation of our process (Proposition 1).
Now, Ē_{*,1} is a stable fixed point for the dynamical system Ė = F(Ē), so we may write F as its linearisation plus a remainder G, where A := DF(Ē_{*,1}), the Jacobian of F evaluated at the resident endemic equilibrium, is a stable matrix, and ‖G(Ē)‖ ≤ M‖Ē‖^2 for some fixed M > 0.
Then, using Duhamel's principle, we obtain an integral representation. Let α_1, . . . , α_d denote the eigenvalues of A. It is a standard result (see e.g., [30]) that, given any α < min{−Re(α_j) : Re(α_j) < 0}, there exists a constant C, depending on α, such that, when restricted to E_s, we have ‖e^{At}‖ ≤ Ce^{−αt}. Thus, fixing a sequence ln n ≪ t_n ≪ n, we observe that each of the relevant terms takes the form P_l(n ∫ Λ_l(Ē^{(n)}(s)) ds), for some continuous function Λ_l. We will thus prove the generic lemma:

Lemma 4. Let P be a Poisson process, let Λ : R^{d+2} → R be continuous, and let M^{(n)}(t) = P(n ∫_0^t Λ(Ē^{(n)}(s)) ds).

Novel Strain in Small Number of Copies
We now consider the possibility that a new strain invades an established population. From the results of the previous section, we see that, generically, the population will eventually be in an ε-neighbourhood of the fixed point Ē_{*,1} for arbitrary ε > 0, and that I^{(n)}_i(t) ≡ 0 for all i > 1. If the new strain has reproductive number less than R_{0,1}, then the arguments of the preceding section apply directly, and we can conclude that the novel strain will very rapidly go extinct.
If, however, it has higher reproductive number, the invading strain now has a non-zero probability of establishing itself and replacing the resident strain. In this section, we adapt the techniques above to deal with this case (by the above, we may take d = 2).
We start by defining the quantities τ as before. We now fix ε > 0 and η > 0 sufficiently small that the bounds below hold, and suppose that n is sufficiently large that |β^{(n)}_2 − β_2| < η, etc.
Classical results for birth and death processes (e.g., [1]) tell us that Z^+(t) and Z^-(t) will hit 0 in finite time with probabilities (q^+_2)^{Z^+(0)} and (q^-_2)^{Z^-(0)} respectively, and will grow indefinitely otherwise; moreover, there exist random variables W^+ and W^- taking values in [0, ∞) such that P{W^± = 0} equals the corresponding extinction probability. Now, as before, fix ln n ≪ t_n ≪ n. Taking logarithms in (S.44), we see that for almost all ω ∈ {W^- ≠ 0}, we have ln Z^-(ω, t_n)/t_n → µ^-_2.
We have already established that the latter is bounded above by (q^-_2)^{Z^-(0)} as n → ∞. The former follows almost exactly as the proof that P(τ^{(n)}_ε < t_n) → 0 in the previous section; Ē_{*,1} is now a hyperbolic fixed point rather than a stable fixed point, and A has a positive eigenvalue, but the stable manifold of Ē_{*,1} coincides with the subset of R^{d+2} with x_1 = 0. In particular, we may decompose R^{d+2} = E_S ⊕ E_X, where E_S = {x_1 = 0} and E_X are A-invariant subspaces, corresponding to the sums of the generalised eigenspaces for eigenvalues with negative and positive real parts respectively. We will write P_S and P_X for the corresponding projections (i.e., P_S has image E_S and kernel E_X, and conversely for P_X), and note that both P_S and P_X commute with A. Proceeding exactly as previously, we define the analogous stopping times and apply the arguments of the previous section, using the equation for Ξ. Now, note that for any ω ∈ {W^+ = 0}, we must have Z^+(ω, t) < εn for all t, for n sufficiently large, so ω lies in the corresponding event. Finally, we notice that as ε → 0 (and thus η → 0 also), both q^+_2 and q^-_2 approach R_{0,1}/R_{0,2}. Since the choice of ε was arbitrary in our definition of invasion, we conclude that the probability of successful invasion of the new strain is 1 − R_{0,1}/R_{0,2}.

(S.45)
Once this has happened, as noted above, Kurtz's deterministic approximation is applicable, and with high probability the system will approach an arbitrarily small neighbourhood of the endemic fixed point for the new strain, at which point, by the argument above, the former resident strain goes extinct with probability approaching 1 as n → ∞.
We first note that, to have a well-posed (finite) limit as ε → 0, we must have f'(α_min) = f'(α_max) = 0, so that the first term, which is otherwise of order O(ε^{-1}), vanishes.