How big is an outbreak likely to be? Methods for epidemic final-size calculation

Epidemic models have become a routinely used tool to inform policy on infectious disease. A particular interest at the moment is the use of computationally intensive inference to parametrize these models. In this context, numerical efficiency is critically important. We consider methods for evaluating the probability mass function of the total number of infections over the course of a stochastic epidemic, with a focus on homogeneous finite populations, but also considering heterogeneous and large populations. Relevant methods are reviewed critically, with existing and novel extensions also presented. We provide code in Matlab and a systematic comparison of numerical efficiency.


Introduction (a) Motivation
Epidemic models are now widely used to inform policy on a range of issues, from childhood diseases [1] to bioterrorist smallpox [2], SARS [3], foot-and-mouth disease [4] and pandemic influenza [5]. These models have typically involved either numerical integration of ordinary differential equations that do not explicitly account for the underlying chance events in transmission, or Monte Carlo simulation of models that aspire to a high level of realism [1,6]. There has been growing interest, however, in consideration of the parsimonious stochastic epidemic models that were described during the earliest phase of mathematical epidemiology [7]. This is partly because of the impressive array of formal results that have accumulated over the years in this field [8], but also because of the role that these models can play in modern, computationally intensive inference of epidemiological parameters [9][10][11] and optimization problems.
The final size of an epidemic can be defined informally as the total number of people experiencing infection during the outbreak. This quantity is typically called the attack rate by applied epidemiologists, and expressed as a percentage of the population in question. The probability distribution of final sizes is of particular interest to statistical epidemiologists, owing to its use inter alia in the analysis of household data [12,13].
It turns out that there is a particularly large number of approaches to calculation of the probability mass function (PMF) for the final size of an epidemic (often called the final-size distribution). This paper aims to summarize these approaches, paying particular attention to: (i) numerical implementation, including a novel application for iterative methods; (ii) critical comparison of different methods; and (iii) consideration of how these methods can be applied to calculation of other epidemiological quantities of interest.

(b) Model definition
While we consider various generalizations, our starting point is the susceptible-infectiousrecovered (SIR) epidemic model in a closed finite population. We consider a population of integer size N, and the state of the system is given by non-negative integer-valued stochastic variables S(t) and I(t), obeying S + I ≤ N, which represent the number of individuals who are susceptible or infectious at time t, respectively. The number of recovered individuals R is given by R = N − S − I due to our assumption that the population is closed (i.e. there are no births, deaths or migrations).
We assume that any pair of individuals makes contact at the points of a homogeneous Poisson process of rate β and that contacts between different pairs of individuals are mutually independent. Contact between an infectious and a susceptible individual results in the immediate infection of the susceptible individual, who experiences a random duration of infectiousness drawn from the infectious period distribution (also often called the recovery time distribution) and then recovers. Note that in these conventions, the overall rate of infection is βSI, so when comparing results across different values of N, it is often most instructive to hold the quantity β/(N − 1) constant.
The epidemic process starts in state (S 0 , I 0 ) and must end when there are no more infectious individuals. Writing the final state in the form (N − Z, 0) allows us to define the integer-valued stochastic variable Z as the final size of the epidemic. We are particularly interested in the PMF of this quantity; since this function has finite integer support, it is often conveniently represented as a vector of probabilities p = (p k ), for p k = Pr(Z = k). (1.1) In the rest of this paper, we will be interested in either the calculation of the components of the final-size distribution (often in vector representation) at machine precision or sampling from this distribution using Monte Carlo methods.

Material and methods
We now consider different ways to calculate the final-size distribution, starting with Monte Carlo methods that are simply described in §2a, before moving on to methods that run at machine precision in §2b,c. We have tried to use notation that is consistent within this manuscript, but in doing so, we may deviate from the conventions originally used for each technique.

(a) Monte Carlo methods (i) Direct continuous-time simulation
In the event that the infectious period distribution is exponential with rate parameter γ (i.e. mean γ −1 ) the system dynamics take the form of a continuous-time This system can then be simulated directly using Monte Carlo methods, once initial conditions are specified. Stigler's Law [14] states that scientific discoveries are not named after their originators (and was, of course, discovered earlier by Robert K. Merton). It is therefore as would be expected that while the most common method for Monte Carlo simulation of a continuous-time Markov chain is typically called Gillespie's algorithm [15], this method was actually developed by probabilists some decades earlier [16][17][18].
The results of simulating the Markov chain defined by (2.1) are shown in figure 1a,b, for a relatively large and small outbreak, respectively. We provide the function gil_mc.m in the electronic supplementary material, (S1.1), as an implementation of this, based on the code from Keeling & Rohani [6]. Obviously, if one is interested in temporal dynamics of the epidemic, all of the information created by this method is useful. But for the purposes of sampling from the final-size distribution, there is a lot of unnecessary computational effort expended by this method.
Extension: waning immunity. Some of the methods presented in this paper rely on longlasting immunity following recovery from natural infection-the SIR paradigm. Many diseases, for example respiratory synctial virus (RSV) [19] and rotavirus [20], exhibit significant waning immunity, which is readily incorporated in approaches based on direct simulation by adding the transition (S, I) → (S + 1, I) at rate μ(N − S − I), (2.2) to (2.1), giving SIRS epidemic dynamics. The system as defined always ends in state (N, 0) and so what to calculate depends on the epidemiological problem in hand. In the context of SIRS epidemics with household structure, the total number of infection events is of interest due to its role in the calculation of epidemic thresholds [21,22], and this can be readily extracted from a realization of the full SIRS epidemic. Extension: phase-type infectious periods. An assumption behind (2.1) is that the infectious period durations are exponentially distributed-this is justifiable for some diseases, but certainly not for others. Suppose we now introduce an integer index a between 1 and k for infectious individuals, and modify (2.1) to  the effect of reducing variability compared with an exponential distribution, and leads to a k-Erlang infectious period distribution when the transition rates are all equal. Another frequent use of phase-type distributions is the hyper-exponential distribution, where all q a,b in (2.3) are zero meaning that each infective passes through a unique I a state, which is often interpreted as individuals experiencing different disease severities, and which has the effect of increasing variability compared to an exponential distribution. (ii) Sellke's method Sellke's method provides a way to simulate final sizes of a stochastic epidemic with arbitrary infectious period distribution [24]. We suppose that each individual i has a stochastic variable T i for its infectious period, picked from the infectious period distribution, and that susceptible individuals have a random threshold Q i picked from an exponential distribution of mean 1 (in this work, we take the initial infectives to have zero threshold parameter). We arrange the labelling of individuals, without loss of generality, so that Q i ≤ Q j when i < j. Then The intuition behind this equation is that Q i is essentially how 'resistant' individual i is to infection, and βT i is the force of infection that individual i will contribute if ultimately infected.
In this non-rigorous picture, the epidemic 'stops' when the total infectious pressure drops below the resistance/threshold of all remaining susceptibles. To see more mathematically why this construction is equivalent to the model defined, we consider the argument of Andersson & Britton [8,§2.2]. If we let I(t) be the number of infective individuals at time t, then the rate at which a given susceptible becomes infective is βI(t) in the original model. We define the cumulative force of infection as Then considering a small unit of time δt, and a unit-mean exponentially distributed threshold Q we have that Therefore, the rate at which Λ exceeds Q is βI(t). Putting this together with (2.5) gives (2.4), showing that Sellke's method gives the same final size as direct simulation. Realizations of Sellke's method for a relatively large and small outbreak are shown in figure 1c,d. Numerically, this is much more efficient and general than Gillespie's algorithm for sampling from the final-size distribution, but the function sel_mc.m given in electronic supplementary material, S1.2.1, does not provide the temporal dynamics of the relevant epidemic process, although such dynamics can be readily calculated. The fundamental approach of the Sellke construction is to keep track of total infection pressure and thresholds separately, which is quite generally applicable.
Extension: temporal dynamics. Once a realization of the Sellke construction has been made as described above, it is possible to construct the temporal dynamics of that realization without any further picking of (pseudo-)random numbers. This follows from (2.6) and is practically implemented by maintaining a list of recovery times for currently infectious individuals, infecting each individual in the order specified by the Q thresholds and interspersing recoveries at the appropriate times. We provide an implementation of this method in electronic supplementary material, S1.2.2, which has the major advantage that an arbitrary distribution of infectious period can be simulated efficiently without any approximation.
is cycled through in turn, picking an infectious period T i from the relevant distribution, which then leads to an independent probability of infecting each of the remaining susceptibles. The number of susceptibles S is therefore reduced due to infective i by an integer S i ∼ Bin(S, π i ). (2.8) Once each of the susceptibles in the current rank has been considered, all infectives in the current rank are removed, and the number of infectives in the new rank g + 1 is i∈I g S i . This process is continued until an empty rank is generated. Figure 1e,f shows a realization of this process for large and small outbreaks, respectively. We provide code lud_mc.m in electronic supplementary material, S1.3 to implement this algorithm.
Pellis et al. [26] have argued that while Ludwig's argument is intuitive, it is not always obvious for which generalizations of simple epidemic models it will still hold, leading these authors to provide additional detail about the approach. Indeed, the insights built up from Ludwig's argument can be applied to very complex populations incorporating several layers of structureincluding households and networks-and for intrinsic varying severity, while the inclusion of infector-dependent varying severity invalidates it [27][28][29].
Perhaps the clearest way to confirm the validity of Ludwig's method is to make use of network theory. Using standard modern terminology [30], consider a random directed network where a link starting on individual i and ending on individual j is present with probability π i,j corresponding to the probability of infectious contact being made from i to j if i becomes infective, before i recovers. The key property required for use of Ludwig's method is that the probabilities of infectious contacts emanating from an individual must depend only on quantities that can be determined in advance, but cannot depend on (for example) the temporal behaviour of the epidemic before that individual is infected. Equation (2.7) is a simple example, but all that is needed is the calculability of probabilities of infectious contact in advance of the epidemic. A node j will be infected eventually if there is a path through the network from an individual i in the set of initially infective individuals I 0 to j. Letting D = (D i,j ) be the adjacency matrix for the random directed network (i.e. D i,j is 1 if i makes contact with j and is 0 otherwise) we can see that j is eventually infected ⇔ ∃n < ∞, i ∈ I 0 , such that (D n ) i,j > 0. (2.9) Ludwig's method therefore involves finding the smallest n satisfying the right-hand condition of (2.9), which becomes j's rank. The assumption of independence of the links means that this can be simulated iteratively as described. for appropriate initial probability vector p(0) and rate matrix Q = (Q i,j ). One way to define this matrix explicitly is by defining σ (i) and ι(i) as the number of susceptibles and infectives associated with state i, respectively. Then

(b) Machine-precision, Markov chain methods
where δ i,j is the Kronecker delta. We write the jump matrix for this Markov chain on August 16, 2017 http://rspa.royalsocietypublishing.org/ Downloaded from We use p S,I (t) to stand for the element of p(t) corresponding to the probability of S susceptibles and I infectives at time t. We also use the notation S for the state space of the Markov chain, which is composed of a set of absorbing states A and an irreducible transient class C, and which has dimension |S|.

(i) 'Brute force' methods
In this framework, there are two 'brute force' methods for matrix-based calculation. Firstly, for a Markov chain with finite state space, equation (2.10) has a matrix exponential solution which could be evaluated using, e.g. EXPOKIT [31] at a sufficiently large t. Alternatively, the probability vector after n events is, by definition of the jump matrix, p(0)P n . For SIR dynamics, a maximum of I 0 + 2S 0 events can take place, and so the relation can be used to calculate the final-size distribution. Code for both of these methods is provided in electronic supplementary material, S2.1.

(ii) Bailey's method, with Neuts & Li's implementation
Bailey [7] notes that an integral representation is available for any final size probability: Then by evaluating the Laplace transform of (2.10) at non-negative s, we see that It is possible to solve for q(s) algebraically, and thus obtain Extension: generalized transmission rates. The method as outlined is efficient due to the sparse, triangular structure of Q and does not depend sensitively on the actual transition rates. Neuts & Li [32] in fact considered models that had rates that have general functional dependence on S and I (but not, for example, time) which could be useful in several contexts, and does not significantly alter the algorithmic procedure or the computational time.
Extension: maximum size. Neuts & Li's paper also considered calculation of the distribution for the maximum size of a stochastic epidemic, defined as the maximum value of I(t) over the course of the epidemic and denoted I * . This quantity would typically be called peak prevalence by epidemiologists. Neuts & Li consider implementation of the method of Daniels [33] and provide an iterative procedure for calculation of Pr(I * ≥ y). We implement this scheme as NL_Imax.

(iii) Path integral/sum method
This approach has recently been used to evaluate the PMF of the number of secondary infections caused by an initially infected individual [34]. The method works by appending to the state of the Markov chain an indicator variable that takes the value one when the most recent transition corresponds to an infection event, and takes the value zero otherwise. In the SIR case, calling the indicator variable J gives the augmented Markov chain We then consider the jump chain of this modified Markov chain, a discrete-time Markov chain specifying the probabilities of jumping between states at the time of a jump; we label the transition probability matrix of the jump chain P. The total number of infections over the course of an epidemic, Z, is then equal to the number of jumps which result in being in a state where J = 1 up until there are no infectives remaining in the population. We can evaluate the distribution of Z using the following result [34]. Let (X(n), n ∈ Z + ) be a discrete-time Markov chain taking values in S with transition probability matrix P This result is directly derived by conditioning on the state first jumped to. The result above allows us to evaluate the probability generating function (PGF) φ(z) of the distribution of Z. To evaluate the PMF, we can numerically invert the PGF considered as a general Laplace transform [35]. However, a more efficient procedure can be obtained by differentiating the equation (2.21) k times with respect to z and evaluating at z = 0, resulting in a system of linear equations for {φ (k) i (0), i ∈ C}. Solving this system and using the relation allows computation of the kth element of p. This method was employed in Ross [34], where code was also made available.
In addition to being applicable to the SIR model with exponential infectious period distribution, this method can be applied to any Markovian model, including SIRS models and with any phase-type infectious period distribution; of course, more complicated infectious period distributions and model structures will result in an increase in the size of the matrix P required, and hence the methodology will eventually become computationally unfeasible.
Extension: hitting times. For a discrete random variable such as the final size, the path sum is most appropriate. For continuous random variables, the path integral method can be used [36]. Using notation as above, we consider the Laplace transform of the total cost (rather than the PGF): This is the maximal solution to the equations While moments of the hitting-time distribution can be simply obtained by differentiation of (2.24), calculation of the probability density at a given point in time requires numerical Laplace transform inversion, which is inherently numerically unstable. We recommend the Euler method of Abate & Whitt [35], with the roundoff error control proposed by Sakurai [37]. Our code for calculation of the Laplace transform is given in electronic supplementary material, S2.3.2.

(iv) Null space method
The null space method is based upon the spectral theory of matrices, in particular as appropriate for finite-state Markov chains and was used by Keeling & Ross [38]. We start with the Kolmogorov forward equations (2.10). Being a square matrix, Q is always diagonalizable in the sense of Jordan canonical form. The SIR model has N + 1 absorbing states, and hence N + 1 repeated eigenvalues of 0. We may write for some T, where J is block diagonal with each Jordan block J i of size n i equal to the algebraic multiplicity (i.e. number of repetitions) of its eigenvalue λ (i) . As the geometric multiplicity (i.e. the dimension of the null space) is equal to the algebraic multiplicity, each Jordan block J i is simply diagonal with λ (i) on the diagonal. Perron-Frobenius theory applied to the stochastic ensures that the eigenvalues lie within a disc in the negative half plane. Throughout this paper, we let I n be the identity matrix of dimension n. For the SIR model, where 0 d 1 ×d 2 is the matrix of size d 1 × d 2 consisting entirely of zeros. Now, by noting that if we start in an absorbing state (with probability one) then we must remain in that state, the columns of the matrix U 1 span the null space of Q so that v i u j = δ ij . This is essentially a computationally efficient version of the jump-matrix-based equation (2.14), provided we know the null space. We use the spspaces.m code of Kowal [39] to compute the null space of Q, and provide code in electronic supplementary material, S2.4, for application of this code to the SIR model. Extension: maximum size. Markov chain-based methods, such as the null space and path integral can also be used to calculate maximum epidemic sizes by augmentation of the state space: We do not provide code for this, since the Neuts & Li algorithm is certainly much more efficient, but note that Markov chain-based methods are extremely versatile, at the cost of state space expansion.
(c) Machine-precision, arbitrary infectious period methods Analytic traction can be gained on the Sellke construction by the derivation of a Wald-type identity. This was done by Ball [40], who obtained a triangular system of linear equations for the probability p k of observing k additional cases in a population of S 0 initial susceptibles and I 0 initial infectives: where Φ is the Laplace transform of the infectious period distribution, so for Markovian dynamics where 1 is a column vector with all entries equal to one, and we call B = (B kl ) the Ball matrix. We now consider three methods for numerical solution of (2.32).

(i) Direct substitution
The equation (2.32) can be solved directly, since the Ball matrix B is left triangular Unfortunately, this procedure becomes numerically divergent for large population sizes. The essential reason for this is that the binomial coefficients in (2.31) become extremely large for large l and k ≈ l/2. Figure 2a shows the large mass in the middle of the Ball matrix, which drives this numerical instability. Extension: multiple-precision arithmetic. Demiris & O'Neill [41] showed how the problem of numerical divergence could be overcome through the use of multiple-precision arithmetic. We provide code in the electronic supplementary material, S3.1, to implement this using MATLAB's vpa() function, which can be easily modified to work at standard machine precision. A limitation of this method, however, is the large computational cost involved, since multiple-precision algorithms are typically extremely costly.

(ii) Iterative methods
A large number of numerical methods are available for the solution of equations of type (2.32) [42]. We find that the use of the conjugate gradient method on the equation is stable for population sizes up to around 10 2 . This problem is illustrated in figure 2b, which shows a concentration of the mass of A in one area of the matrix that is similar to that of B noted above. In contrast, iterative schemes that can be applied directly to non-symmetric equations as (2.32) such as biconjugate methods (e.g. MATLAB's bicstab()) or minimum-residual methods (e.g. MATLAB's gmres()) do not out-perform direct substitution.  Extension: preconditioners. A major benefit of iterative methods is the potential to use preconditioners [42]. We find that the use of a Jacobi preconditioner together with an initial probability vector based on asymptotic results [8,41] can give accurate results even for N = 10 3 . This preconditioner is formed through the matrix D = (D kl ) for Then the preconditioned conjugate gradients method (PCG) effectively solves This problem is visualized in figure 2c, which shows how this preconditioner evens out the density of mass in the matrix involved. Code is provided in electronic supplementary material, S3.2, to implement this method. There remains the possibility of further preconditioning based on the properties of E to enhance convergence further without significant computational cost, but we were unable to find an appropriate second preconditioner. While we could not find a preconditioner that significantly improved biconjugate methods, the minimal residual method given by MATLAB's gmres() can be improved through the use of the preconditioner F = (F kl ) where This improved GMRES method effectively solves and has comparable performance to direct substitution. This problem is visualized in figure 2d. As for PCG, the possibility of finding a second preconditioner is still open and could yield a significant improvement over direct substitution if an appropriate one is found.

(iii) Gontcharoff polynomials
Another way of computing the mass function of the final-size distribution is via the expression for the PGF of this distribution given slightly indirectly in theorem 2.6 of Ball [40] (Ball gives the PGF for N − Z, the number of susceptibles remaining at the end of the epidemic). We can use the fact that a mass function (p k , k ∈ Z + ) can be recovered from its generating function φ(z) = k p k z k (z ∈ [0, 1]) using the relationship (2.22) to find that, for k = 0, 1, . . . , n, and

(d) Asymptotic results
There is an extensive literature on asymptotic results for epidemic final sizes in large populations. This goes back to the earliest mathematical representations of the mean behaviour of epidemics [46], and also limiting distributions of simple epidemics [47], but more recently involves formal convergence proofs [45,48] and results obtained for quite complex population structures, including multiple types [49], households [50] and networks [28].
At the heart of most asymptotic results is a transcendental equation for the probability π that an individual avoids 'global' infection (defined in a model-specific way) of the form π = F(π ), for F : [0, 1] → (0, 1]. (2.40) Here, π = 1 (i.e. asymptotically vanishing levels of global infection) will always be a solution. If that satisfies the equation (2.40). For complex models, there may be many non-maximal solutions, rendering general root-finding algorithms, such as bisection, unreliable. Considerinĝ (2.41) should, however, provide an accurate estimate for π for sufficiently small ε and a sufficiently large number m of iterations of F. In the case of household-structured models, there is then the question of derivation of the distribution of final-size proportions, which can often be done using the methods outlined elsewhere in this paper. Sample code for reproduction of fig. 2 of Ball et al. [51] using a Jump chain method and (2.41) is given in electronic supplementary material, S4.

(e) Epidemics on networks
There has been much recent interest in epidemic models where the population is connected on a network [52,53]. In these models, the population is made up of N individuals indexed by integers i, j, . . . and the contacts from i to j happen at the points of a Poisson process of rate β i,j . If i is infectious and j susceptible at the point of contact, then j becomes infectious. Recovery happens, as before, after a time drawn from the infectious period distribution. This very general formulation can be somewhat simplified when there are several individuals with the same epidemiological characteristics (i.e. if there is a large discrete symmetry group for β = (β i,j )). Alternatively, one may wish to preserve individual identity but make the simplifying assumption that β = τ G, where G = (G i,j ) is the adjacency matrix of an undirected, symmetrical, topological network without self-edges. Both simplifications are more commonly considered than the most general case [52,53]. Code to perform direct network epidemic simulation is already available in a form that is readily adapted [6, program 7.7]. For the Sellke method on networks, thresholds Q i ∼ exp(1) can be generated at the start of the process, however, the ordering of thresholds to obtain an expression similar to (2.4) is not straightforward. Instead, we provide sel_net.m in electronic supplementary material, S5.1.1, in which new generations of infectives are created iteratively. This is somewhat similar to Ludwig's approach, but with the difference that the random numbers are generated for the susceptible nodes rather than the infectious contacts produced by infectives.
Ludwig's method is, given its links with random directed networks, particularly natural in the context of epidemics on networks. In this case, the ranks are constructed iteratively as before, but the picking with probability given by (2.7) is applied only to remaining susceptible neighbours of the individual concerned. We provide the function lud_net.m in electronic supplementary material, S5.1.2, that implements this algorithm.

(ii) Machine precision methods
There are two distinct meanings to the final-size distribution for network models. The first, as considered by Neal [54], is the final-size PMF averaged over realizations of a given random graph model. This has, to our knowledge, only been done so far for the Bernoulli/Erdös-Rényi random which can then be solved using similar methods to those discussed for the Ball equations above. Code that creates the matrix N is provided in electronic supplementary material, S5.2.1.
The second broad class of distributions concerns the probabilities of different outcomes for a given β-typically the marginal probability for each node that it is ultimately infected. Here there is a 'multitype' formula also derived by Ball [40] that can be adapted. If we have a vector n = (n i ) whose elements are the initial number of susceptibles of type i, and also m = (m i ) with elements corresponding to the initial number of infectives of each type, then the relevant equations are where operations on vectors are defined in the natural way [8,40]. Code to generate the relevant matrix is given in electronic supplementary material, S5.2.2. It is also possible to consider a Markovian model as defined by (2.42). We provide code in electronic supplementary material, S5.2.3, to return an appropriate generator Q for this Markov chain when β = τ G, which can then be analysed using methods for Markov chains already discussed. An interesting recent development is the approach of Sharkey [55]. This method makes a 'closure' assumption that has been commonly used in pairwise epidemic network models [53] to derive a set of O(N 2 ) differential equations describing the temporal behaviour of a network epidemic that is provably exact for loop-less networks (I. Z. Kiss 2012, personal communication). Since the non-network homogeneous models considered are essentially of fully connected cliques this makes the Sharkey model inappropriate for these applications, however, it can be compared with other methods when the network structure is loop-less, and code to evaluate the model is already available [55, appendix D].

Results and discussion
We compare and benchmark each method systematically in table 1. These were carried out on a Mac Pro with a dual quad-core 3.2 GHz chipset and 16 GB of RAM running MATLAB V. 7.9, 64-bit version. The machine precision ≈ 2.2 × 10 −16 , meaning that numbers close to 1.0 differing by this amount may be evaluated as equal by the machine. For variable-precision algorithms, the minimum number of digits (considering only powers of 2) needed to obtain accurate results is reported. For iterative matrix methods solving Ax = b, and using the vector norm |v| = √ v · v, the standard diagnostic 'residual' is recorded, and is equal to |b − Ax|/|b|. For machine precision methods, we quote times for evaluation of the entire PMF, while for Monte Carlo methods we quote times per sample. We also display the behaviour of each algorithm as a function of population size graphically in figure 3.
The exact times given in seconds will of course vary significantly between machines, and may be subject to significant modification with more optimized code-in addition, the exact part of the code that should be timed to make a fair comparison is not clear, for example, where sparse matrices need to be generated, this is not included in our time measurement because the sparse structure of these matrices can be stored in a parameter-independent manner. Despite these caveats, however, some strong signatures show up in the benchmarking that can be interpreted and generalized.  If a Monte Carlo calculation is necessary, then Gillespie's method is typically the least efficient method. The Sellke construction is much faster for both final-size and temporal dynamics (in fact, it is the only approach that can generate temporal dynamics for an arbitrary infectious period distribution). Gillespie's method is, however, the only Monte Carlo approach that can deal straightforwardly with waning immunity. There is also the question of efficiency at different parameter values. For subcritical epidemics where the expected final size is much smaller than the population size, Ludwig's method can involve generation of many fewer pseudo-random numbers than Sellke's, for example. When considering the usefulness of any Monte Carlo method, it is useful to know how many samples are required.
Which of these measures is most relevant depends on the precise application. For example, the KL divergence is often used in estimation, while the KS D-statistic is used to assess model adequacy. Figure 4 indicates that the convergence with number of samples n is, respectively: While we have considered 'exact' Monte Carlo methods, approximate simulation algorithms also exist, such as the τ -leaping method introduced by Gillespie [56]. This was motivated by the desire to improve the numerical efficiency of the 'Gillespie algorithm', and assumes that the transition rates of the model are held fixed for a period of time τ , implying that the number of events of each type that occur during that time period have independent Poisson distributions. The algorithm is appropriate to use and provides most benefit, when (i) the changes to the state of the system have no or minimal impact on the transition rates of all event types, and (ii) it is unlikely for the number of events to lead to a state which is inconsistent with the state space of the model. The τ -leaping methods therefore seem unlikely to provide substantial benefits  for epidemic models: for example, requirement (i) is unlikely to be satisfied for homogeneously mixing models and (ii) will not be satisfied for network models. Nevertheless, a thorough analysis of this question with more sophisticated approximations [57] could be of significant interest.

(b) Numerical efficiency of Markov chain-based methods
For models based on Markov chains, multiple methods are available to calculate various quantities at machine precision. Of these, Bailey's method is the fastest, and is robust even for system sizes over 10 4 with the limitation on system size related to the resources available to store and process a dense N × N matrix, but does rely on special properties of the SIR model. Of the methods that apply to more general Markov chains, the exact calculation considered will determine the appropriate method: matrix exponentials can capture temporal dynamics; which of the null space and path sum/integral methods is faster is likely to be machine-and implementation-dependent, but both are efficient for calculation of a wide range of quantities. The system size limitations for these approaches are related to the resources available to store and process large sparse matrices, but as for Bailey's method there is no inherent numerical instability involved.

(c) Numerical efficiency of arbitrary infectious period methods
If a complex infectious period distribution is required then it may be impractical to use phasetype infectious distributions and Markovian approaches are unsuitable. Of the machine precision methods this leaves the Ball matrix equations and the use of Goncharoff polynomials. While these approaches are often much lower-dimensional than Markovian models, they are both inherently numerically unstable: the former because some elements of the Ball matrix are much larger than others; and the latter because solution involves summing many different positive and negative terms, leading to 'catastrophic cancellation'. Where possible, however, direct substitution of the Ball equations is numerically efficient; also Gontcharoff polynomials are comparable in numerical efficiency and stability to direct substitution.
When the system size becomes large enough to generate numerical instability, Jacobipreconditioned conjugate gradients can be used to reach system sizes of 10 3 while retaining numerical efficiency. An important point about all iterative methods, however, is that small negative probabilities can be returned for any value of the PMF that is close to zero. While these do not lead to practical problems or major numerical instability, variable-precision should still be viewed as a gold-standard method, albeit one that can be prohibitively computationally costly. On the other hand, in the context of inference schemes such as those based on random-walk Metropolis-Hastings sampling over parameters θ, if a probability vector p(θ) has been evaluated at one step and a change in parameters δθ is proposed, then p(θ) can be used as the starting vector for iterative calculation of p(θ + δθ), and would be expected to converge more quickly than an initial vector based on asymptotic results if δθ is small.  Table 2. Comparison of methods discussed in the main text for final sizes on the network in figure 5. For simulation methods, times are mean per simulation averaged over 10 4 realizations. For all benchmarks, mean infectious period is 1 and infectious period distribution is exponential. Benchmark I is for a fixed loop-less network as shown in figure 5c, with τ = 1. Benchmark II is an average over 20-node p = 0.2 Bernoulli graphs with τ = 3/((N − 1)p) . Times are given in seconds; a dash '-' is displayed if the method is unable to calculate the benchmark.

(d) Network methods
We also performed benchmarking of network methods, as shown in table 2. Here the conclusions are very similar to those for homogeneous models, but we note that for special topologies (averages over Bernoulli graphs or loop-less fixed networks) there are particularly efficient methods available. This is likely to be a general feature: for certain special networks, there will be techniques available that exploit the restricted topology; but in the most general case less efficient but more versatile approaches will be necessary. In this context, we note the efficiency of the Sellke construction, coupled with the possibility of reconstructing temporal dynamics as for the homogeneous case.

(e) Implications for inference
Since one of the main motivations for our study is the relevance of final-size calculations for statistical work, we now consider the implications of our results for estimation of epidemiological parameters. A theoretically attractive approach to inference is via exact evaluation of the likelihood. We have considered here machine-precision methods which facilitate achieving this goal. The inference itself is typically performed within a Bayesian framework, using Markov chain Monte Carlo (MCMC) methods [41,58]. However, all of the machine-precision methods we have considered become practically infeasible to implement as the population size, or the complexity of the epidemic model, increases, owing to growth in the number of equations to be solved and numerical instabilities. We have demonstrated in this work that deployment of appropriate numerical methods such as preconditioning or use of path sums can significantly extend the reach of machine-precision methods. The MCMC inference framework is very flexible and so difficulties such as infeasible evaluation of the exact likelihood can often be overcome by some form of imputation or data augmentation [58,59]. In the context of final-size data for epidemics, an example is the random graph approach developed by Demiris & O'Neill [60] for stochastic multi-type epidemics in structured populations. Often such approaches are heavily reliant on being able to restrict the search space, or at least to be able to sample over it efficiently, by good choice of prior. This is typically not a trivial task. Another approach which is gaining in popularity is to use simulation-based methods, such as Approximate Bayesian Computation or pseudo-marginal methods [58,[61][62][63]. This approach relies on the ability to simulate data efficiently, and hence our comparison of three Monte Carlo methods will facilitate the use of such simulation-based inference methods.
For small population sizes, there is a variety of methods which may be used to perform inference. The advantage of machine precision methods is that the PMF is computed without Monte Carlo errors, and hence inference based upon such calculations is reliable. However, these   methods may be time-consuming, and it may be more efficient to evaluate a Monte Carlo estimate. The difficulty is in determining how many simulations are required in order to estimate the mass function to sufficient accuracy for the problem in hand. To assist in determining this we have considered three measures of 'distance' between the machine-precision evaluation and the Monte Carlo estimate ( figure 4). This informs us that to achieve very close to the same precision as the machine-precision methods, based upon the Benchmark cases running times in tables 1 and 2, that often the machine-precision methods are more computationally favourable. However, if one is happy to forgive some inaccuracy-possibly in bias and confidence-in parameter estimates, then some speed-up is possible using Monte Carlo methods. This trade-off will often be also dependent upon the application, in terms of the total running time required and available. In any case, as the population size increases Monte Carlo estimation methods become more appealing, and eventually become necessary. Additionally, whether the infectious period distribution can be well approximated by a sufficiently low-dimensional phase-type distribution can lead to additional questions about speed and accuracy. Nevertheless, certain general statements as discussed earlier will generally hold: for SIR epidemics, Sellke's method typically outperforms the Gillespie algorithm, while only Markovian models can deal with waning immunity. Temporal data will also restrict the use of Sellke's method to Monte Carlo simulation, while Markovian models can have temporal quantities evaluated at machine precision through the use of, for example, matrix exponentials.

Conclusions
Since the work of Bailey [7], much effort on stochastic epidemic models has focused on analytic results to enhance understanding [8]. Modern computational resources, however, mean that there are three particularly strong reasons to consider numerical algorithms. First, there is the possibility of making a fast sweep over a large region of parameter space to aid intuitive understanding. Secondly, there is improving the performance of computationally intensive inference. Thirdly, there is enhancement of the performance of other 'inverse problems' such as optimization of public health intervention strategies.
In this work, we have reviewed a fairly comprehensive selection of the existing methods for generation of the epidemic final-size distribution, and compared their numerical performance. We have shown how Jacobi-preconditioned conjugate gradients can be used to help alleviate the reported limitations of the Ball method; however, we consider it likely that problem-specific preconditioners and other numerical techniques can be developed with the aim of addressing the epidemiological questions above. We would furthermore encourage anyone reading this paper to contribute to such developments.
T.H. is supported by the UK Engineering and Physical Science Research Council. J.V.R. was supported under Australian Research Council's Discovery Projects funding scheme (project no. DP110102893). We are grateful to Lorenzo Pellis and three anonymous referees for helpful comments on this manuscript.