Inferring R0 in emerging epidemics—the effect of common population structure is small

When controlling an emerging outbreak of an infectious disease, it is essential to know the key epidemiological parameters, such as the basic reproduction number R0 and the control effort required to prevent a large outbreak. These parameters are estimated from the observed incidence of new cases and information about the infectious contact structures of the population in which the disease spreads. However, the relevant infectious contact structures for new, emerging infections are often unknown or hard to obtain. Here, we show that, for many common true underlying heterogeneous contact structures, the simplification to neglect such structures and instead assume that all contacts are made homogeneously in the whole population results in conservative estimates for R0 and the required control effort. This means that robust control policies can be planned during the early stages of an outbreak, using such conservative estimates of the required control effort.


1.1 Introduction
The stochastic and mathematical analysis of the spread of infectious diseases in large populations often relies on the theory of branching processes [1]. Branching processes are introduced as a model to describe family trees, where the simplifying assumption is that all women (in the branching process literature often the female lines are chosen) have the same probability, p k , of having k daughters, where k can be any non-negative integer. Furthermore, the numbers of daughters of different women are independent.
It is clear that this model ignores important properties of real populations, such as changing circumstances which make the distribution of the number of children change over time and the fact that populations in general cannot grow indefinitely because of competition for resources. However, simple as it is, the model has proved useful in many situations.
Branching processes are also useful to describe the spread of SEIR (susceptible → exposed → infectious → recovered/removed) epidemics, where an infection can be seen as a birth, with the infector being the mother and the infectee the daughter. In this model competition for resources is apparent, since once a susceptible individual is infected it cannot be infected again. However, if the population size $n$ is large and the number of no-longer-susceptible individuals is of smaller order than $\sqrt{n}$, then in homogeneous mixing populations, in configuration model network populations, in household models and in multi-type population models, suitable branching process approximations are very good (see e.g. [2]) and we use them without further justification. In the following paragraphs and sections, we discuss branching processes using terminology borrowed from epidemics.

Table 1: Parameters and notation used for the SEIR epidemic model in homogeneously mixing populations, on networks and in multi-type populations.

General parameters and notation
  $\lambda$ : infection rate
  $1/\gamma$ : average duration of infectious period
  $1/\delta$ : average duration of latent period
  $\alpha$ : exponential growth rate of number of infected individuals
  $n$ : population size
  $R_0$ : basic reproduction number, transmission potential, mean number of new infections caused by a typical infected individual
  $v_c$ : required control effort, critical vaccination coverage
  $I(t)$ : number of infectious individuals at time $t$

Parameters specific for the network model
  $\mu$ : average number of acquaintances of individuals
  $\sigma^2$ : variance of the number of acquaintances
  $\kappa$ : mean number of acquaintances of a newly infected individual, excluding the infector, $\kappa = \sigma^2/\mu + \mu - 1$

Parameters specific for the multi-type model
  $\iota$ : number of different types
  $\pi_j$ : fraction of population with type $j$
  $\lambda_{ij}$ : infection rate from a type $i$ to a type $j$ individual
  $M$ : $\iota \times \iota$ next generation matrix, with elements $m_{ij} = \lambda_{ij}\pi_j/\gamma$
  $J$ : $\iota \times \iota$ identity matrix
  $\rho_A$ : largest eigenvalue of matrix $A$
Branching processes can be analysed in real time and in generations. In real time, the Malthusian parameter, or epidemic growth rate, $\alpha$ is arguably the most important parameter. A key theorem in branching processes [1, Thm. 6.8.1] states that if the number of infected individuals in the population grows large, then it roughly grows at a rate proportional to $e^{\alpha t}$, where $t$ is the time since the infectious disease was introduced. From a generation perspective the essential parameter is $R_0$, the basic reproduction number, i.e. the average number of infections per typical infectious individual in an otherwise fully susceptible population. An outbreak can become large only if $R_0 > 1$, which happens if and only if $\alpha > 0$. Note that if $R_0 > 1$, then it is still possible that the epidemic will go extinct quickly. The probability for this to happen can be computed [3, Eq. 3.10] and is less than 1.
In the remainder of this supplementary material, we first discuss some useful results from the theory of branching processes. Then we apply them to epidemics in respectively homogeneously mixing populations, network populations, multi-type populations and household populations. Throughout we focus on R 0 . It is however worth remarking that in homogeneously mixing populations, in (configuration model) network populations and in multi-type populations, we can deduce straightforwardly the required control effort or critical vaccination coverage, v c from R 0 (see main text). For more extensive discussions on control effort and vaccination in the household model see [4]. We note that the critical vaccination coverage is based on vaccination uniformly at random, i.e. all people have the same probability of receiving the vaccine. As stated in the article, this vaccination strategy is not optimal if the population structure is known exactly, but since this relevant population structure is generally hard to obtain for emerging diseases, vaccination uniformly at random might be the best feasible method.
Throughout we often use the superscripts "(hom)", "(net)", "(mult)", and "(house)", to refer to parameters and quantities associated with epidemics in respectively homogeneous mixing populations, network models, multi-type populations and populations consisting of households.
As a leading example we use the Markov SEIR epidemic model. In this model pairs of individuals make (close) contacts independently at a rate which might depend on the pair (depending on the population structure). If an infectious individual contacts a susceptible one, the susceptible one becomes latently infected (exposed) and stays so for an exponentially distributed time with mean 1/δ, after which the individual becomes infectious. An individual stays infectious for an exponentially distributed time with mean 1/γ, after which he or she is removed, which might mean that the individual dies, he or she recovers with permanent immunity or is isolated in a 100% effective way. We also discuss the Markov SIR epidemic, in which there is no latent period (or δ = ∞), but is the same as the Markov SEIR epidemic in all other respects. We assume that there are only a few initially infective individuals in the population and all others are susceptible.

1.2 Branching process results
In this section we need some notation: for $t > 0$, $\xi(t)$ is the random number of individuals infected by an infectious individual in the first $t$ time units of his or her infectious period. Thus, $\xi(t)$ is a non-decreasing random process. Furthermore, define $\mu(t) = E(\xi(t))$ as the expectation of $\xi(t)$. It is clear that $\mu(t)$ is also non-decreasing. For ease of exposition we assume that the derivative of $\mu(t)$ exists and is given by $\beta(t)$. Thus $\mu(t) = \int_0^t \beta(s)\,ds$. This assumption is not necessary and the results below can be generalized in a straightforward way to the case where $\mu(t)$ is not differentiable. From the theory of branching processes [1], we know that $R_0 = \mu(\infty) = \int_0^\infty \beta(s)\,ds$. In general there is no explicit expression for the Malthusian parameter $\alpha$, only the implicit equation specifying $\alpha$:
$$1 = \int_0^\infty e^{-\alpha t}\beta(t)\,dt. \qquad (1)$$
If $R_0 > 1$ (the situation we are interested in), this equation has exactly one real positive solution [1, p. 10], and serves as a definition of $\alpha$.
If the length of the infectious period of an individual is distributed as the random variable $I$, and during his or her entire infectious period he or she infects other individuals at rate $\lambda$ (that is, the infections form a homogeneous Poisson process with intensity $\lambda$), then $\beta(t) = \lambda P(I > t)$. This gives that
$$R_0 = \int_0^\infty \beta(t)\,dt = \lambda \int_0^\infty P(I > t)\,dt = \lambda E(I). \qquad (2)$$
Here we have used the standard equality $\int_0^\infty P(X > t)\,dt = E(X)$ for any non-negative random variable $X$ (e.g. [5, Sec. 4.3]). From now on, for reasons of clarity, we assume that $I$ has a density which is denoted by $f_I(t)$. We may relax these assumptions without further consequences. We deduce that
$$1 = \int_0^\infty e^{-\alpha t}\lambda P(I > t)\,dt = \lambda \int_0^\infty f_I(s) \int_0^s e^{-\alpha t}\,dt\,ds = \frac{\lambda}{\alpha}\left(1 - \phi_I(\alpha)\right), \qquad (3)$$
where $\phi_I(\alpha) = \int_0^\infty e^{-\alpha t} f_I(t)\,dt = E(e^{-\alpha I})$ is the Laplace transform of $I$ or, which is the same, the moment-generating function of $-I$. Equation (3) gives an implicit equation for $\alpha$.
If an infected individual only starts being infectious after a random latent period which is distributed as $L$ and has density $f_L(t)$, and after this period he or she is infectious for another, independent, period which is distributed as $I$, during which he or she infects others at rate $\lambda$, then
$$\beta(t) = \int_0^t f_L(s)\,\lambda P(I > t - s)\,ds = \int_0^t f_L(s)\,\beta_0(t - s)\,ds,$$
which is the convolution of $f_L(t)$ and $\beta_0(t)$, where $\beta_0(t)$ is the derivative of $E(\xi(t))$ when the latent period is 0. This leads to
$$R_0 = \int_0^\infty \beta(t)\,dt = \lambda E(I), \qquad (4)$$
where we have used the same computations as in (2). We note that $R_0$ is independent of the latent period. Similarly we deduce that
$$1 = \int_0^\infty e^{-\alpha t}\beta(t)\,dt = \phi_L(\alpha)\,\frac{\lambda}{\alpha}\left(1 - \phi_I(\alpha)\right), \qquad (5)$$
where $\phi_L$ is the Laplace transform of the random variable $L$. If $L$ does not have a density the results above still hold. Note that if $L = 0$ with probability 1, then $\phi_L(\alpha) = 1$ and we obtain (3) again.
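As a rough numerical illustration of the implicit equation for $\alpha$ with a latent period, the following sketch solves $1 = \phi_L(\alpha)(\lambda/\alpha)(1 - \phi_I(\alpha))$ by bisection. The function name `malthusian_alpha`, the bracketing interval and the number of iterations are illustrative assumptions and not part of the original analysis; the Laplace transforms are passed in as ordinary Python functions.

```python
def malthusian_alpha(lam, phi_I, phi_L=lambda a: 1.0, upper=1e3, iterations=200):
    """Approximate the positive root a of phi_L(a) * (lam / a) * (1 - phi_I(a)) = 1.

    lam   : infection rate during the infectious period
    phi_I : Laplace transform of the infectious period, a -> E(exp(-a I))
    phi_L : Laplace transform of the latent period (default: no latent period)
    upper : assumed upper bound for the root (illustrative choice)
    """
    def excess(a):
        return phi_L(a) * (lam / a) * (1.0 - phi_I(a)) - 1.0

    lower = 1e-9  # close to 0, where excess(a) is approximately R0 - 1
    if excess(lower) <= 0.0:
        raise ValueError("R0 <= 1: no positive growth rate")
    if excess(upper) >= 0.0:
        raise ValueError("root lies above the assumed upper bound")
    # excess(a) is decreasing in a, so plain bisection is sufficient.
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if excess(mid) > 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


# Markov SEIR example: I ~ Exp(gamma), L ~ Exp(delta)
gamma, delta, lam = 1.0, 1.0, 4.0
alpha = malthusian_alpha(
    lam,
    phi_I=lambda a: gamma / (gamma + a),
    phi_L=lambda a: delta / (delta + a),
)
```

For exponential latent and infectious periods the equation reduces to $(\delta + \alpha)(\gamma + \alpha) = \lambda\delta$, so with the parameter values above the root is $\alpha = 1$.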

1.3 Homogeneously mixing populations

1.3.1 Constant infectivity
For SEIR epidemics in a (homogeneously) randomly mixing population, every time an individual makes a close contact, it is with another individual chosen uniformly at random from the population, independently of other close contacts. During the emerging phase of an epidemic it is unlikely that the chosen individual is no longer susceptible. Thus, we assume that all close contacts of infectious individuals are with susceptible ones. To make the above mathematically fully rigorous, we should consider a sequence of epidemics in populations of increasing size and derive limit results for this sequence of epidemics [2], but we leave out this level of technicality here. If individuals each make close contacts independently at rate $\lambda^{(hom)}$, then we deduce from (4) and (5) that
$$R_0^{(hom)} = \lambda^{(hom)} E(I) \quad \text{and} \quad 1 = \phi_L(\alpha)\,\frac{\lambda^{(hom)}}{\alpha}\left(1 - \phi_I(\alpha)\right).$$
In particular,
$$\frac{1}{R_0^{(hom)}} = \frac{\phi_L(\alpha)\left(1 - \phi_I(\alpha)\right)}{\alpha E(I)}. \qquad (6)$$
If $I$ is exponentially distributed with mean $1/\gamma$ and there is no latent period, then $\phi_I(\alpha) = \gamma/(\gamma + \alpha)$ and $\phi_L(\alpha) = 1$, which leads to $R_0^{(hom)} = 1 + \alpha/\gamma$ as was deduced in the main text. If the latent period is exponentially distributed with mean $1/\delta$, then $\phi_L(\alpha) = \delta/(\delta + \alpha)$. Thus in the Markov SEIR model, (6) reads
$$R_0^{(hom)} = \left(1 + \frac{\alpha}{\gamma}\right)\left(1 + \frac{\alpha}{\delta}\right).$$
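The relation between $\alpha$ and $R_0^{(hom)}$ can be evaluated directly. The sketch below, with hypothetical function names, computes $R_0^{(hom)}$ from $1/R_0^{(hom)} = \phi_L(\alpha)(1 - \phi_I(\alpha))/(\alpha E(I))$ for arbitrary Laplace transforms and compares it with the closed form $(1 + \alpha/\gamma)(1 + \alpha/\delta)$ that follows for exponential latent and infectious periods. The parameter values are illustrative.

```python
def r0_hom_general(alpha, phi_I, phi_L, mean_I):
    """R0 under homogeneous mixing from 1/R0 = phi_L(a) (1 - phi_I(a)) / (a E(I))."""
    return alpha * mean_I / (phi_L(alpha) * (1.0 - phi_I(alpha)))


def r0_hom_markov(alpha, gamma, delta=None):
    """Closed form for the Markov SIR (delta=None) or Markov SEIR model."""
    r0 = 1.0 + alpha / gamma
    if delta is not None:
        r0 *= 1.0 + alpha / delta
    return r0


# Illustrative values: infectious period 5 days, latent period 10 days.
gamma, delta, alpha = 1 / 5, 1 / 10, 0.05
r0_general = r0_hom_general(
    alpha,
    phi_I=lambda a: gamma / (gamma + a),
    phi_L=lambda a: delta / (delta + a),
    mean_I=1 / gamma,
)
r0_closed = r0_hom_markov(alpha, gamma, delta)
```

Both routes give the same value for these inputs, which is a quick consistency check on the two expressions.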

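1.3.2 Deterministic infectivity profile after latent period
We proceed by considering the (non-Markov) SEIR model in which, during the infectious period $I$ of random length, the close contact rate equals $h(\tau)$, where $\tau$ is the time since the infectious period starts. Note that we assume that $h(\tau)$ is non-random, i.e. identical for all infected individuals, but that the infectious period $I$ may end after a random time and hence differ between individuals. We also allow for a random latent period $L$ prior to the infectious period. In this case, $R_0^{(hom)} = \int_0^\infty h(\tau) P(I > \tau)\,d\tau$ and $1 = \phi_L(\alpha)\int_0^\infty e^{-\alpha\tau} h(\tau) P(I > \tau)\,d\tau$, so that
$$\frac{1}{R_0^{(hom)}} = \frac{\phi_L(\alpha)\int_0^\infty e^{-\alpha\tau} h(\tau) P(I > \tau)\,d\tau}{\int_0^\infty h(\tau) P(I > \tau)\,d\tau}. \qquad (7)$$
If $h(\tau) = \lambda$ is a constant then this equality can be rewritten as (6).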

1.4 Configuration model network populations

1.4.1 The network
In this subsection we consider the configuration model network. In this network a fraction $d_k$ of the $n$ vertices (= individuals) has degree $k$, that is, a fraction $d_k$ of the population has $k$ other people it can have close contacts with, its acquaintances. The acquaintanceships are represented by so-called bonds or edges. Out of all possible networks created in this way with given $n$ and $d_k$'s, we choose one uniformly at random. See [6, Ch. 3] for more information on the construction of such networks. We choose the (few) initial infective individuals all with equal probability (uniformly at random) from the population. If the population size $n$ is large, then the probability that an initially infective individual has $k$ acquaintances is $d_k$. However, by the construction of the network, the probability that an acquaintance of such an initially chosen infective has $k$ acquaintances is not $d_k$; for $k = 1, 2, \cdots$ the probability is given by
$$\tilde d_k = \frac{k\,d_k}{\sum_{l=1}^\infty l\,d_l},$$
since an initial infective is $k$ times as likely to be an acquaintance of an individual with degree $k$ as to be one of an individual with degree 1. Now, if an individual is infected during the early stage of an epidemic, then at least one of its acquaintances is no longer susceptible (i.e. its infector). However, if $n$ is large, by the construction of the network the probability that its other acquaintances are still susceptible is close to 1. Hence, the expected number of susceptible acquaintances at the moment of infection of an individual infected during the early stages of the epidemic is
$$\sum_{k=1}^\infty (k - 1)\,\tilde d_k = \frac{\sum_{k=1}^\infty k(k-1)\,d_k}{\sum_{l=1}^\infty l\,d_l} = \frac{\sigma^2}{\mu} + \mu - 1,$$
which is equal to $\kappa$ as used in the main article.
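The mean excess degree can be computed directly from a degree distribution. The sketch below uses a hypothetical helper `mean_excess_degree` that takes the fractions $d_k$ as a dictionary and returns $\sum_k (k-1)\,k\,d_k / \sum_l l\,d_l$, which equals $\sigma^2/\mu + \mu - 1$. The example distributions are illustrative only.

```python
def mean_excess_degree(degree_dist):
    """Return kappa for a degree distribution given as {degree k: fraction d_k}.

    kappa = E[D^2] / E[D] - 1, which equals sigma^2 / mu + mu - 1.
    """
    mu = sum(k * d for k, d in degree_dist.items())
    second_moment = sum(k * k * d for k, d in degree_dist.items())
    return second_moment / mu - 1.0


# Every individual has exactly 3 acquaintances: two remain after removing the infector.
kappa_regular = mean_excess_degree({3: 1.0})

# Half of the population has degree 1, half has degree 3.
mixed = {1: 0.5, 3: 0.5}
kappa_mixed = mean_excess_degree(mixed)
mu = sum(k * d for k, d in mixed.items())
sigma2 = sum((k - mu) ** 2 * d for k, d in mixed.items())
```

In the mixed example $\kappa$ exceeds $\mu - 1$ because high-degree individuals are over-represented among those reached along an edge.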

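1.4.2 The epidemic with constant infectivity
Consider an SEIR epidemic on the configuration network described above. Assume again that $f_L(t)$ is the density of the duration of the latent period and $f_I(t)$ the density of the duration of the infectious period. Assume that between every pair of acquaintances the rate of close contacts is $\lambda^{(net)}$ (i.e. close contacts occur according to independent Poisson processes with rate $\lambda^{(net)}$ per pair). Consider an infector at time $t$ since its own infection. The rate at which infection of a given susceptible acquaintance occurs at that time is $\lambda^{(net)}$ multiplied by the probability that the infector is infectious and has not previously infected this acquaintance, i.e.
$$\int_0^t f_L(s)\,\lambda^{(net)} e^{-\lambda^{(net)}(t - s)} P(I > t - s)\,ds.$$
If the number of acquaintances of this infector is $k$, then, since one of them is its own infector, the expected infectivity at time $t$ is
$$(k - 1)\int_0^t f_L(s)\,\lambda^{(net)} e^{-\lambda^{(net)}(t - s)} P(I > t - s)\,ds.$$
Taking the mean over the number of acquaintances of an individual infected during the early stages of an epidemic, we obtain
$$\beta(t) = \kappa \int_0^t f_L(s)\,\lambda^{(net)} e^{-\lambda^{(net)}(t - s)} P(I > t - s)\,ds.$$
This leads, after manipulations as performed in (2) and (3), to
$$R_0^{(net)} = \kappa \int_0^\infty \lambda^{(net)} e^{-\lambda^{(net)} t} P(I > t)\,dt = \kappa\left(1 - \phi_I(\lambda^{(net)})\right) \qquad (9)$$
and
$$1 = \kappa\,\phi_L(\alpha)\,\frac{\lambda^{(net)}}{\lambda^{(net)} + \alpha}\left(1 - \phi_I(\alpha + \lambda^{(net)})\right). \qquad (10)$$
Combining these observations gives
$$\frac{1}{R_0^{(net)}} = \phi_L(\alpha)\,\frac{\lambda^{(net)}}{\lambda^{(net)} + \alpha}\cdot\frac{1 - \phi_I(\alpha + \lambda^{(net)})}{1 - \phi_I(\lambda^{(net)})}. \qquad (11)$$
If, as before, we consider the Markov SIR model in which $L = 0$ and $I$ has an exponential distribution with mean $1/\gamma$, then (9) yields
$$R_0^{(net)} = \frac{\kappa\lambda^{(net)}}{\lambda^{(net)} + \gamma}$$
and (10) yields
$$1 = \frac{\kappa\lambda^{(net)}}{\lambda^{(net)} + \gamma + \alpha}.$$
The latter equality implies $\lambda^{(net)} = \frac{\gamma + \alpha}{\kappa - 1}$, which inserted in the former gives
$$R_0^{(net)} = \frac{\gamma + \alpha}{\gamma + \alpha/\kappa},$$
as claimed in the main text.
If we consider the Markov SEIR epidemic in which the latent period has mean $1/\delta$ and the infectious period has mean $1/\gamma$, then $R_0^{(net)} = \frac{\kappa\lambda^{(net)}}{\lambda^{(net)} + \gamma}$ still holds, while (10) yields
$$1 = \frac{\delta}{\delta + \alpha}\cdot\frac{\kappa\lambda^{(net)}}{\lambda^{(net)} + \gamma + \alpha},$$
which in turn implies
$$\lambda^{(net)} = \frac{(\gamma + \alpha)(\delta + \alpha)}{\delta(\kappa - 1) - \alpha}.$$
Combining these observations gives that for the Markov SEIR epidemic
$$R_0^{(net)} = \frac{(\gamma + \alpha)(\delta + \alpha)}{\gamma\delta + \alpha(\delta + \alpha)/\kappa}.$$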
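A minimal sketch of how the network-based $R_0$ could be evaluated from $\alpha$, $\gamma$, $\delta$ and $\kappa$ in the Markov case. It first solves the growth-rate condition for the per-pair contact rate and then applies $R_0^{(net)} = \kappa\lambda^{(net)}/(\lambda^{(net)} + \gamma)$. The function name and the example values are assumptions made for illustration; the SEIR branch uses $1 = \frac{\delta}{\delta+\alpha}\cdot\frac{\kappa\lambda^{(net)}}{\lambda^{(net)}+\gamma+\alpha}$, obtained from the exponential Laplace transforms.

```python
def r0_net_markov(alpha, gamma, kappa, delta=None):
    """R0 on a configuration-model network for the Markov SIR/SEIR model.

    delta=None means no latent period (Markov SIR).
    """
    if delta is None:
        denominator = kappa - 1.0
        numerator = gamma + alpha
    else:
        denominator = delta * (kappa - 1.0) - alpha
        numerator = (gamma + alpha) * (delta + alpha)
    if denominator <= 0.0:
        raise ValueError("no positive contact rate is compatible with these inputs")
    lam_net = numerator / denominator          # per-pair contact rate
    return kappa * lam_net / (lam_net + gamma)


# Illustrative comparison with the homogeneous-mixing values.
r0_net_sir = r0_net_markov(alpha=1.0, gamma=1.0, kappa=3.0)
r0_hom_sir = 1.0 + 1.0 / 1.0

r0_net_seir = r0_net_markov(alpha=1.0, gamma=1.0, kappa=4.0, delta=2.0)
r0_hom_seir = (1.0 + 1.0 / 1.0) * (1.0 + 1.0 / 2.0)
```

In both examples the homogeneous-mixing value is the larger one, in line with the conservative-estimate claim; as $\kappa$ grows the two values approach each other.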

1.4.3 Deterministic infectivity profile after latent period
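As in the homogeneous mixing case we now assume that the infectivity, conditional upon still being infectious, is a function of the time $\tau$ since the infectious period starts, say $\hat h(\tau)$ (later we assume that $\hat h$ is proportional to $h$ as used in the homogeneous mixing population). Note that we assume that $\hat h(\tau)$ is not random, but that $L$ and $I$ are random and independent. In this case,
$$R_0^{(net)} = \kappa \int_0^\infty \hat h(\tau)\,e^{-\int_0^\tau \hat h(s)\,ds} P(I > \tau)\,d\tau.$$
Similarly, we obtain
$$1 = \kappa\,\phi_L(\alpha) \int_0^\infty e^{-\alpha\tau}\,\hat h(\tau)\,e^{-\int_0^\tau \hat h(s)\,ds} P(I > \tau)\,d\tau.$$
If we combine (6) and (11), and assume that $\alpha$ and the (constant) infection profiles (and thus $\phi_I$ and $\phi_L$) are known and the same for both models, then
$$\frac{R_0^{(hom)}}{R_0^{(net)}} = \frac{\alpha E(I)}{1 - \phi_I(\alpha)}\cdot\frac{\lambda^{(net)}}{\lambda^{(net)} + \alpha}\cdot\frac{1 - \phi_I(\alpha + \lambda^{(net)})}{1 - \phi_I(\lambda^{(net)})}.$$
To analyse this fraction, we introduce a random variable $Y$ by its distribution function
$$P(Y \le y) = \frac{\int_0^y P(I > t)\,dt}{E(I)}, \quad \text{for } 0 \le y < \infty.$$
Using this and recalling that $E(I) = \int_0^\infty P(I > t)\,dt$, we can write
$$\frac{R_0^{(hom)}}{R_0^{(net)}} = \frac{E\left(e^{-(\alpha + \lambda^{(net)})Y}\right)}{E\left(e^{-\alpha Y}\right) E\left(e^{-\lambda^{(net)} Y}\right)}.$$
Since $\lambda^{(net)}, \alpha > 0$, we have that $e^{-\alpha x}$ and $e^{-\lambda^{(net)} x}$ are both non-increasing in $x$. Thus, by Chebyshev's integral inequality (or FKG inequality [5, p. 86]), we have that $e^{-\alpha Y}$ and $e^{-\lambda^{(net)} Y}$ are positively correlated, whence $R_0^{(hom)} \ge R_0^{(net)}$.

The difference between $R_0^{(hom)}$ and $R_0^{(net)}$ is small if $\kappa$ is relatively large compared to $R_0^{(hom)}$ and the standard deviation of the infectious period is not large compared to the mean (see Figure 2 of the main article). It can easily be seen that the opposite makes the approximation worse. Infections taking place a long time after the start of an infector's infectious period contribute relatively little to $\alpha$; on the other hand all infections make the same contribution to $R_0$. Also note that if in the network model a given individual infects all of his/her acquaintances with large probability (say 99%) when he/she is infectious for a moderately long time (say $T$), then increasing the infectious period to $2T$ has little effect on the epidemic, both on its size (which relates to $R_0$) and its speed (which relates to $\alpha$). However, in a homogeneously mixing model, the offspring (which contributes to $R_0$) would double in expectation in this situation, while the speed of the epidemic would hardly change. Thus, if the standard deviation of the infectious period is large, we cannot ignore the long infectious periods which cause the discrepancy between $R_0^{(hom)}$ and $R_0^{(net)}$.

Now consider the second special case discussed above: the infectivity profile, conditional upon still being infectious, $\hat h(\tau)$ is not constant, but is proportional to $h(\tau)$ for the homogeneous mixing model, where $\tau$ is the time since an individual starts to be infectious. Let $\lambda := \hat h(\tau)/h(\tau)$. Then,
$$\frac{R_0^{(hom)}}{R_0^{(net)}} = \frac{\int_0^\infty h(\tau) P(I > \tau)\,d\tau \int_0^\infty e^{-\alpha\tau} h(\tau)\,e^{-\lambda\int_0^\tau h(s)\,ds} P(I > \tau)\,d\tau}{\int_0^\infty e^{-\alpha\tau} h(\tau) P(I > \tau)\,d\tau \int_0^\infty h(\tau)\,e^{-\lambda\int_0^\tau h(s)\,ds} P(I > \tau)\,d\tau}.$$
As for the SEIR model with constant rates, we introduce a random variable $Y$ by its distribution function
$$P(Y \le y) = \frac{\int_0^y h(\tau) P(I > \tau)\,d\tau}{\int_0^\infty h(\tau) P(I > \tau)\,d\tau}.$$
Using this we can write
$$\frac{R_0^{(hom)}}{R_0^{(net)}} = \frac{E\left(e^{-\alpha Y} e^{-\lambda\int_0^Y h(\tau)\,d\tau}\right)}{E\left(e^{-\alpha Y}\right) E\left(e^{-\lambda\int_0^Y h(\tau)\,d\tau}\right)}. \qquad (14)$$
Since $\lambda$ and $\alpha$ are positive and $h(\tau)$ is a non-negative function, we have that $e^{-\alpha x}$ and $e^{-\lambda\int_0^x h(\tau)\,d\tau}$ are both non-increasing in $x$. Thus, copying the argument above, we have that $R_0^{(hom)} \ge R_0^{(net)}$.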
We note that although (14) does not explicitly depend on κ, the relationship between α and λ and h(τ ) does and therefore the exact value of the right hand side does as well.
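For the constant-infectivity case treated earlier in this subsection, the ratio $R_0^{(hom)}/R_0^{(net)}$ can be evaluated numerically from the Laplace transform of the infectious period. The sketch below is an illustration under assumed inputs: the function name is hypothetical, and $\alpha$ and $\lambda^{(net)}$ are supplied as free arguments, whereas in the model they are linked through $\kappa$. It uses the expression $\alpha\lambda E(I)(1 - \phi_I(\alpha + \lambda)) / [(\alpha + \lambda)(1 - \phi_I(\alpha))(1 - \phi_I(\lambda))]$, in which the latent-period factor has cancelled.

```python
import math


def r0_ratio_constant_infectivity(alpha, lam_net, phi_I, mean_I):
    """Ratio R0(hom) / R0(net) for constant infectivity and a common growth rate."""
    numerator = alpha * lam_net * mean_I * (1.0 - phi_I(alpha + lam_net))
    denominator = (
        (alpha + lam_net) * (1.0 - phi_I(alpha)) * (1.0 - phi_I(lam_net))
    )
    return numerator / denominator


# Exponential infectious period with mean 1 (gamma = 1).
ratio_exponential = r0_ratio_constant_infectivity(
    1.0, 1.0, phi_I=lambda a: 1.0 / (1.0 + a), mean_I=1.0
)

# Fixed infectious period of length 1 (zero variance).
ratio_fixed = r0_ratio_constant_infectivity(
    1.0, 1.0, phi_I=lambda a: math.exp(-a), mean_I=1.0
)
```

Both ratios are at least 1, and the fixed-length period gives the value closer to 1, which matches the remark that a small standard deviation of the infectious period keeps the two estimates close.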

1.4.4 Example of a model where $R_0^{(hom)} \ge R_0^{(net)}$ does not hold
The inequality $R_0^{(hom)} \ge R_0^{(net)}$ does not hold in general if $h(\tau)$ is a random function instead of a deterministic function, i.e. if $h(\tau)$ is different for different people, following some distribution over stochastic processes. This is shown in the following extreme example.
We assume that every infective individual is infectious for exactly one point in time, at which he/she infects a random number of other individuals. In the homogeneous mixing case, with probability 1/3 an infectious individual infects on average 2 other individuals at time 0 (relative to his/her time of infection), while with probability 2/3 he/she infects on average 1 other individual at time 1. This corresponds to $\mu(t) = \tfrac{2}{3}\mathbb{1}(t \ge 0) + \tfrac{2}{3}\mathbb{1}(t \ge 1)$, leading to $R_0^{(hom)} = 4/3$ and $1 = 2/3 + (2/3)e^{-\alpha}$, which implies $e^{-\alpha} = 1/2$ (or $\alpha = \log 2$). In the corresponding network case we assume every individual has 3 acquaintances, so $\kappa = 2$. With probability 1/3 an infectious individual infects each of his/her susceptible acquaintances with probability $1 - e^{-2\lambda}$ independently at time 0, while with probability 2/3 he/she infects each of his/her susceptible acquaintances with probability $1 - e^{-\lambda}$ independently at time 1. Here $\lambda$ is chosen such that $e^{-\alpha} = 1/2$.
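For this model
$$R_0^{(net)} = \tfrac{2}{3}\left(1 - e^{-2\lambda}\right) + \tfrac{4}{3}\left(1 - e^{-\lambda}\right) \quad \text{and} \quad 1 = \tfrac{2}{3}\left(1 - e^{-2\lambda}\right) + \tfrac{4}{3}\left(1 - e^{-\lambda}\right)e^{-\alpha}.$$
Some algebra, using $e^{-\alpha} = 1/2$, gives that $e^{-\lambda} = (\sqrt{3} - 1)/2$ and hence $R_0^{(net)} = 2 - \sqrt{3}/3 \approx 1.42$, which exceeds $R_0^{(hom)} = 4/3$.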
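The example can be checked numerically. The sketch below takes the network version described above ($\kappa = 2$, infection probabilities $1 - e^{-2\lambda}$ at time 0 with probability 1/3 and $1 - e^{-\lambda}$ at time 1 with probability 2/3), solves the growth-rate condition with $e^{-\alpha} = 1/2$ for $x = e^{-\lambda}$, and evaluates the resulting reproduction number. The variable names are illustrative, and the quadratic $x^2 + x - 1/2 = 0$ is the condition derived here from those assumptions.

```python
import math

KAPPA = 2            # susceptible acquaintances of a newly infected individual
DISCOUNT = 0.5       # e^{-alpha}, fixed by the homogeneous-mixing version

# Growth-rate condition:
#   1 = (1/3) * KAPPA * (1 - x^2) + (2/3) * KAPPA * (1 - x) * DISCOUNT
# which simplifies to x^2 + x - 1/2 = 0 for x = e^{-lambda}.
x = (-1.0 + math.sqrt(3.0)) / 2.0

growth_condition = (1 / 3) * KAPPA * (1 - x ** 2) + (2 / 3) * KAPPA * (1 - x) * DISCOUNT
r0_net = (1 / 3) * KAPPA * (1 - x ** 2) + (2 / 3) * KAPPA * (1 - x)
r0_hom = 4 / 3
```

With these values `growth_condition` equals 1 up to rounding, and `r0_net` is larger than `r0_hom`, so the homogeneous-mixing estimate is not conservative in this constructed case.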

1.5 Multi-type epidemics
For the SEIR epidemic in a multi-type population, we assume that there are $\iota$ types of individuals, labelled $1, 2, \cdots, \iota$ and again that the population is large. Additionally we assume that the number of individuals of each type is large, and in what follows we assume that there is no relevant depletion of susceptibles of any type during the initial stages of the epidemic. We assume that a fraction $\pi_i$ of the community is of type $i$. Furthermore, we assume that not all close contacts lead to infection. However, we do assume that the probability that a close contact between a susceptible and an infectious individual leads to infection depends only on the time since infection of the infectious one, $\tau$. This probability is random (i.e. different for different individuals) and is denoted by $\Lambda(\tau)$. Note that we assume that the distribution of $\Lambda(\tau)$ does not depend on the types of the individuals. The random function $\Lambda$ incorporates the latent and recovered period, in the sense that before the end of the latent period and after recovery $\Lambda(\tau) = 0$. We use $g(\tau) = E(\Lambda(\tau))$ for the expected probability of infection at age $\tau$ of a randomly selected individual. In an SIR epidemic the infectivity is often a function of $\tau$ conditioned on the individual still being infectious at time $\tau$. In that case $g(\tau)$ can be written as $h(\tau)P(I > \tau)$. Close contacts are not necessarily symmetric. That is, if individual $x$ makes a close contact with individual $y$, then it is not necessarily the case that $y$ makes a close contact with $x$. The rate of close contacts from a given type $i$ individual to a given type $j$ individual is $\lambda_{ij}/n$. Therefore the expected number of $j$-individuals that an infected $i$-individual infects up to its "age" (time since infection) $t$ during the early stages of an outbreak when all individuals are susceptible is given by
$$m_{ij}(t) = \int_0^t a_{ij}(\tau)\,d\tau, \quad \text{where } a_{ij}(\tau) = \lambda_{ij}\pi_j\,g(\tau).$$
The matrices $M(t)$ and $A(\tau)$ are defined by respectively $M(t) = (m_{ij}(t))$ and $A(\tau) = (a_{ij}(\tau))$.
Furthermore, we define M = M (∞) = (m ij (∞)) as the next generation matrix.
It is well-known that the basic reproduction number R (mult) 0 is given by the dominant (i.e. "largest") eigenvalue of M , also denoted by ρ M [3,7].
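To determine the epidemic growth rate, $\alpha$, we use Equation (6.4) and the subsequent paragraphs from [7]. This translates into the requirement that the dominant eigenvalue of $\int_0^\infty e^{-\alpha\tau} A(\tau)\,d\tau$ should equal 1, where the integral is taken elementwise. Now we use that
$$\int_0^\infty e^{-\alpha\tau} A(\tau)\,d\tau = M\,\frac{\int_0^\infty e^{-\alpha\tau} g(\tau)\,d\tau}{\int_0^\infty g(\tau)\,d\tau}.$$
Hence, $\rho_A$, the largest eigenvalue of the matrix $\int_0^\infty e^{-\alpha\tau} A(\tau)\,d\tau$, is given by $\rho_M$ multiplied by $\int_0^\infty e^{-\alpha\tau} g(\tau)\,d\tau / \int_0^\infty g(\tau)\,d\tau$, where $\rho_M$ is the largest eigenvalue of $M$. In particular this gives that
$$\frac{1}{R_0^{(mult)}} = \frac{\int_0^\infty e^{-\alpha\tau} g(\tau)\,d\tau}{\int_0^\infty g(\tau)\,d\tau}.$$
Notice that in the homogeneous case, i.e. the case with $\iota = 1$, we get the same relationship between $\alpha$ and $R_0^{(hom)}$ (as given in equation (7), with $h(\tau)P(I > \tau) = g(\tau)\lambda_{11}$) as between $\alpha$ and $R_0^{(mult)}$, which implies that ignoring the population structure does not affect the estimates for $R_0$.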
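The multi-type result can be illustrated with a small numerical sketch. It builds a next generation matrix with elements $\lambda_{ij}\pi_j/\gamma$, finds its dominant eigenvalue by power iteration, and then recovers the growth rate of a Markov SEIR epidemic from $(1 + \alpha/\gamma)(1 + \alpha/\delta) = R_0$. The function names, the two-type contact rates and the use of power iteration are assumptions made for illustration; the sketch also assumes all matrix entries are positive so that the iteration converges.

```python
import math


def next_generation_matrix(rates, fractions, gamma):
    """Elements m_ij = lambda_ij * pi_j / gamma (Markov case)."""
    size = len(fractions)
    return [[rates[i][j] * fractions[j] / gamma for j in range(size)]
            for i in range(size)]


def dominant_eigenvalue(matrix, iterations=500):
    """Power iteration; assumes a matrix with strictly positive entries."""
    size = len(matrix)
    vector = [1.0] * size
    rho = 0.0
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(size))
                 for i in range(size)]
        rho = max(image)                      # converges to the dominant eigenvalue
        vector = [value / rho for value in image]
    return rho


def growth_rate_markov_seir(r0, gamma, delta):
    """Positive root of (gamma + a)(delta + a) = r0 * gamma * delta."""
    total = gamma + delta
    return 0.5 * (-total + math.sqrt(total ** 2 + 4.0 * gamma * delta * (r0 - 1.0)))


# Hypothetical two-type population.
gamma, delta = 1.0, 2.0
M = next_generation_matrix([[2.0, 1.0], [1.0, 2.0]], [0.5, 0.5], gamma)
r0_mult = dominant_eigenvalue(M)
alpha = growth_rate_markov_seir(r0_mult, gamma, delta)
r0_from_alpha = (1.0 + alpha / gamma) * (1.0 + alpha / delta)
```

The last line applies the homogeneous-mixing formula to the growth rate and returns the multi-type $R_0$ again, which is the point of the argument: the relation between $\alpha$ and $R_0$ does not depend on the type structure.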

1.6 Household epidemics
Household epidemics are harder to study in this context (compared to homogeneous, network and multi-type epidemics) and several papers are already dedicated to these epidemics, e.g. [8]. In particular, there is no easy way to compute $R_0$ or $\alpha$ (instead other threshold parameters are often derived). Furthermore, if $v_c$ is the critical vaccination coverage when vaccination is applied uniformly at random (i.e. the required control effort), then the relationship $v_c^{(house)} = 1 - 1/R_0^{(house)}$ does not hold in general. Also, if the household structure is observed, then there are better vaccination strategies than vaccination uniformly at random [4]. (The same is true if the degrees of individuals are observed in the network model and if the types of individuals and their relative infectivities and susceptibilities are known in the multi-type model.) However, in the article we consider the case where the population structure is hard to obtain. In that case vaccination uniformly at random seems to be the most natural vaccination strategy. Reproduction numbers for household epidemics and the relationships with vaccination uniformly at random and the epidemic growth rate are studied in great detail in [9] and some of the results will be repeated here. For the household model we assume that the population is partitioned into $n/m$ households (or groups or cliques) of equal size $m$. So, we assume that $n$ is an integer multiple of the positive integer $m$. For a population where the households are not of equal size we refer to [10]. We consider only SEIR models in which individuals have constant infectivity during their infectious period. Individuals contact each other with global contacts at per-pair rate $\lambda_G/n$, while members of the same household additionally make local contacts at per-pair rate $\lambda_H$. Note that, unlike in Section 1.5, we assume that close contact of an infective with a susceptible necessarily results in the infection of the latter.
We use the basic reproduction number $R_0^{(house)}$ as defined in [10,9], since this is the parameter having interpretation closest to the common $R_0$ definition. This $R_0^{(house)}$ can be computed by considering one isolated household of size $m$, which has one initial infectious individual and $m - 1$ susceptibles. Let $\mu_0 = 1$ and let $\mu_1$ be the expected number of individuals in this household with whom the initial infective makes close contact during its infectious period (the first generation). Similarly $\mu_i$ is the expected number of individuals in the $i$-th generation, that is, the expected number of initially susceptible individuals which were not in the first $i - 1$ generations, but have a close contact with a generation $(i - 1)$ individual during its infectious period. Note that $\mu_i = 0$ for $i \ge m$. In [10] it is shown that $R_0^{(house)}$ is the unique positive $x$ which solves
$$1 - \lambda_G E(I)\sum_{i=0}^{m-1}\frac{\mu_i}{x^{i+1}} = 0,$$
where $\lambda_G E(I)$ is the expected number of global contacts made by an infective during its infectious period. If the households are not all of the same size then the $\mu_i$ are replaced by household-size-biased averages, see Section 3.3 of [10].
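A minimal sketch of how such a household reproduction number could be computed once the generation means $\mu_0, \mu_1, \ldots$ of a single-household epidemic are available. It assumes the defining equation has the form $1 = \mu_G\sum_i \mu_i/x^{i+1}$, with $\mu_G$ the expected number of global contacts per infective, and solves it by bisection. The function name, the argument `mu_global` and the example values of $\mu_i$ are illustrative assumptions; computing the $\mu_i$ themselves is not attempted here.

```python
import math


def r0_household(mu_global, mu_generations, iterations=200):
    """Positive root x of 1 = mu_global * sum_i mu_generations[i] / x**(i + 1).

    mu_global      : expected number of global contacts per infective (> 0)
    mu_generations : [mu_0, mu_1, ...] with mu_0 = 1, within-household generation means
    """
    def excess(x):
        total = sum(m / x ** (i + 1) for i, m in enumerate(mu_generations))
        return mu_global * total - 1.0

    lower, upper = 0.0, 1.0
    while excess(upper) > 0.0:      # expand until the root is bracketed
        upper *= 2.0
    # excess(x) is decreasing in x > 0, so bisection converges to the unique root.
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if excess(mid) > 0.0:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


# Households of size 1 reduce to homogeneous mixing: R0 equals mu_global.
r0_singletons = r0_household(2.0, [1.0])

# Households of size 2 with mu_1 = 0.5 (hypothetical value).
r0_pairs = r0_household(1.0, [1.0, 0.5])
```

For the size-2 example the equation is $x^2 - x - 1/2 = 0$, whose positive root is $(1 + \sqrt{3})/2$.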
In Section 2.6 of [9] it is shown that for SEIR epidemics $R_0$ estimates based on $\alpha$ and the homogeneous mixing assumption are conservative. We note that $\alpha$ is in general implicitly defined as the solution of an equation involving the infectivity profile of a household. Further arguments provided in [9] also show that in general $v_c^{(house)} \ge 1 - 1/R_0^{(house)}$. If we estimate $v_c$ based on $\alpha$ and the homogeneous mixing assumption, then in most numerically analysed cases enough people are vaccinated. However, some counterexamples are provided in [9].
In Figure 3 of the main text the dependence of $R_0$ and $v_c$ on the relative contribution of the within-household spread is illustrated for household size distributions taken from Nigerian and Swedish datasets [11,12].

Simulations
The simulations used in the article are performed in R and in MATLAB. In all simulations we use a Markov SEIR epidemic with the expected latent period twice the expected infectious period. This resembles the estimates for Ebola in West Africa [13], where the average time between infection and symptom onset (taken as the start of the infectious period) is estimated to be approximately 9.4 days (standard deviation 7.4 days) and the average time between symptom onset and hospitalization or death is approximately 5 days (standard deviation 4.7 days). Because the differences between the means of the infectious and latent periods and their corresponding standard deviations are relatively small, we use a Markov SEIR epidemic model in which both periods are exponentially distributed.
We simulated a Markov SEIR epidemic in a multi-type population 250 times in MATLAB. As a population we took the Dutch population in 1987 (approximately 14.6 million people) as used in [14], for which extensive data on contact structure are available. The population is subdivided into six age groups (0-5, 6-12, 13-19, 20-39, 40-59, 60+) and contact intensities are based on questionnaire data. For the simulations we use an average infectious period $1/\gamma$ of 5 days and an average latent period $1/\delta$ of 10 days. The infection rates $\lambda_{ij}$ are chosen randomly for each simulation as follows. The data in Table 1 of [14] give estimates of $m_{ij}$ ($i, j = 1, 2, \ldots, 6$), where $m_{ij}$ is the mean number of conversational partners per week in age class $i$ of a typical individual in age class $j$. Using such conversations as a proxy for disease transmission, we assume that $\lambda_{ij} = c\,m_{ji}/\pi_j$, where $\pi_j$ is the fraction of the Dutch population that is in age class $j$, estimated from Appendix Table 1 in [14], and $c$ is a multiplicative constant chosen so that $R_0^{(mult)}$ has a specified value, which is sampled independently and uniformly from the interval between 1.5 and 3 for each simulation.
All simulated epidemics start with 1 infectious individual in each of the six age groups. We use two estimates of $R_0$. The first of these estimates is based on the average number of offspring of the individuals who were infected 100th up to 1000th. We ignore the first 100 infecteds to remove the effect of the initial stages of the epidemic, when the proportions of infecteds of each type are still far from equilibrium. This procedure leads to a very good estimate of $R_0$ if the spread of the disease is observed completely. The second estimate is based on $\hat\alpha$, an estimate of the epidemic growth rate $\alpha$, and neglects the multi-type setting by assuming homogeneous mixing. We assume that we know $\gamma$ and $\delta$ exactly and the estimate for $R_0$ is given by $(1 + \hat\alpha/\delta)(1 + \hat\alpha/\gamma)$. The estimate $\hat\alpha$ is obtained from the development of the number of infectious people over time between the time the 100th individual becomes infectious and the time the 1000th individual becomes infectious, by using least squares estimation of the natural logarithm of the number of infecteds against time. More specifically, if $t_{100}, t_{101}, \ldots, t_{1000}$ denote the times at which these individuals become infectious, then a straight line is fitted to the points $(\log(i), t_i)$, $i = 100, 101, \ldots, 1000$, using linear regression, and $\hat\alpha$ is obtained from the slope of the fitted line. In Figure 4(a) of the article we provide a scatter plot depicting the two estimates of $R_0$ for the 250 simulations. The ratios of the two estimates in the 250 simulations are summarized in Figure 4(b). In Figure S1 of this ESM the estimate of $R_0$ based on $\hat\alpha$ and the homogeneous mixing assumption is compared with the theoretical $R_0$, based on the full model. We see that the estimates are generally very good, as predicted by the theory.
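The growth-rate-based estimate can be sketched as follows. This is not the simulation code used for the article (which was written in R and MATLAB) but a small Python illustration under stated assumptions: the log of the cumulative case rank is regressed on the time at which that case becomes infectious, the fitted slope is taken as the estimate of $\alpha$, and $R_0$ is then computed from the homogeneous-mixing Markov SEIR relation $(1 + \alpha/\delta)(1 + \alpha/\gamma)$. The function names and the synthetic, noise-free data are hypothetical.

```python
import math


def growth_rate_estimate(times, first_rank=100):
    """Least squares slope of log(case rank) against time.

    times[k] is the time at which case number first_rank + k becomes infectious.
    """
    logs = [math.log(first_rank + k) for k in range(len(times))]
    t_bar = sum(times) / len(times)
    y_bar = sum(logs) / len(logs)
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, logs))
    s_tt = sum((t - t_bar) ** 2 for t in times)
    return s_ty / s_tt


def r0_from_growth_rate(alpha_hat, gamma, delta):
    """Homogeneous-mixing Markov SEIR relation between growth rate and R0."""
    return (1.0 + alpha_hat / delta) * (1.0 + alpha_hat / gamma)


# Synthetic, perfectly exponential growth with rate 0.05 per day.
true_alpha = 0.05
times = [math.log(i) / true_alpha for i in range(100, 1001)]

alpha_hat = growth_rate_estimate(times)
r0_hat = r0_from_growth_rate(alpha_hat, gamma=1 / 5, delta=1 / 10)
```

On noise-free data the regression recovers the growth rate exactly; with simulated or observed case times the estimate would carry sampling error, which is what the scatter plots in the article display.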
To simulate epidemics on networks we use several networks from the Stanford Large Network Dataset collection [15]. In the main article we use a collaboration network in Condensed Matter Physics, because (i) this graph is undirected (if individual a can contact individual b, then b can contact a), (ii) this graph is large (23133 individuals) and (iii) the mean excess degree, $\kappa$, is not extremely high. Individuals are acquaintances if they were co-authors of a manuscript posted on the e-print service arXiv in the condensed matter physics section between January 1993 and April 2004. A manuscript with more than 2 authors leads to cliques (small groups in which everybody is acquainted with everyone else in the group). Since arguably many networks relevant for the spread of infectious diseases contain such cliques (households, workplaces and groups of friends), the presence of many cliques in collaboration networks is a desirable property.
Our simulations of Markov SEIR epidemics on all the networks considered are performed in R, using the igraph package [16]. An epidemic starts with 10 individuals, chosen uniformly at random, which are at the start of their infectious period at time 0. We estimate the epidemic growth rate $\alpha$ based on the time it takes for the total number of individuals which are infectious or recovered/deceased (the individuals that have shown symptoms) to increase from 200 to 400. We exclude all simulations in which the total number of affected individuals stays below 400. The estimate of $R_0$ based on the real infection tree is obtained by looking at the epidemic from a generation perspective: all individuals infected by the initially infectious individuals are in generation 1, individuals infected by generation 1 infectives are in generation 2, etc. [10]. We consider as a reference generation the first generation in which there are 75 individuals (say generation $k$) and we divide the number of individuals in generations 2 up to $k + 1$ by the number of individuals in generations 1 up to $k$. We exclude the initial individuals from the estimation of $R_0$, because those individuals are chosen uniformly at random and therefore independently of the population structure. By trial and error we tune the infection parameter $\lambda$ such that the estimate of $R_0$ using the infection process is close to 2. Using this $\lambda$ we run 1000 simulations. A typical graph of how the number of observed individuals (i.e. infectious + removed) develops over time is given in Figure S2(a). In part (b) we show the same graph but now we subtract 0.05 times the time to show that the growth of the number of individuals is indeed close to exponential over a long time.
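The generation-based estimate described above can be written compactly. The sketch below is an illustration and not the original R code: it takes the number of individuals per generation (generation 0 being the initial infectives), finds the reference generation $k$, and divides the total size of generations 2 to $k + 1$ by the total size of generations 1 to $k$. The function name is hypothetical, and "the first generation in which there are 75 individuals" is interpreted here as the first generation with at least 75 individuals, which is an assumption.

```python
def r0_from_generations(generation_sizes, threshold=75):
    """Estimate R0 from generation sizes, excluding the initial infectives.

    generation_sizes[g] is the number of individuals in generation g,
    with generation 0 the initially infectious individuals.
    """
    reference = next(
        (g for g, size in enumerate(generation_sizes) if g >= 1 and size >= threshold),
        None,
    )
    if reference is None or reference + 1 >= len(generation_sizes):
        raise ValueError("epidemic too small to estimate R0 with this threshold")
    offspring = sum(generation_sizes[2:reference + 2])   # generations 2 .. k+1
    parents = sum(generation_sizes[1:reference + 1])     # generations 1 .. k
    return offspring / parents


# Hypothetical generation sizes that double each generation.
estimate = r0_from_generations([10, 20, 40, 80, 160])
```

In this example the reference generation is generation 3, and the estimate is (40 + 80 + 160) / (20 + 40 + 80) = 2.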
Because of the mechanical way of estimating $\alpha$, it is possible to have atypical epidemic trajectories for which the estimation procedure does not work well. Examples are (i) epidemics in which the exponential growth has not yet started at the time the 200th individual starts its infectious period or (ii) epidemics where just around the time the 200th or 400th individual starts its infectious period a new part of the network is affected, where this new part contains many acquaintances within itself but is not well connected to the rest of the network. Such an event causes a sudden strong increase in the observed cases. These atypical trajectories can be identified if one observes the number of infectious individuals for a single epidemic, and better estimates can be obtained in this way. We deal with this problem by not considering the simulations which give the 5% lowest and 5% highest estimates for $\alpha$.
In Figure S3 we provide a scatter plot of the two estimates of R0 for the simulations used. In the vast majority of the simulations, the estimate of R0 based on the estimated α and the homogeneous mixing assumption is conservative. We note that the two estimates are hardly correlated.
We further summarize our data in Figure 5 of the article and in Figure S4, in which the ratio and difference of the R0 estimate based on the epidemic growth rate and assuming homogeneous mixing, and the R0 estimate based on the observed infection process, are given.
We also analyse the spread of SEIR epidemics on two other networks described in the Stanford Large Network Dataset collection [15]. The first is the collaboration network in Astro Physics, which is obtained in a similar way as the collaboration network in Condensed Matter Physics. This network is slightly smaller than the Condensed Matter Physics network and has a higher κ (approximately 64 instead of 21). The analysis is performed in the same way as for the Condensed Matter Physics collaboration network. Boxplots of the estimates of R0 using the real infection process, the estimates of R0 using the epidemic growth rate and assuming homogeneous mixing, as well as a boxplot of the ratio of those estimates, are given in Figure S5.
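The quantity κ compared here is defined in Table 1 as κ = σ²/µ + µ − 1, where µ and σ² are the mean and variance of the number of acquaintances. A minimal Python sketch of computing it from a degree sequence is given below; the function name is hypothetical, and the use of the population variance (dividing by the number of individuals rather than by one less) is an assumption.

```python
def mean_excess_degree(degrees):
    """kappa = sigma^2 / mu + mu - 1 for a list of node degrees.

    This equals E[D(D - 1)] / E[D], the mean number of acquaintances of
    a newly infected individual, excluding its infector.
    """
    n = len(degrees)
    mu = sum(degrees) / n
    variance = sum((d - mu) ** 2 for d in degrees) / n  # population variance
    return variance / mu + mu - 1
```

When every individual has the same number of acquaintances d, the variance is zero and κ reduces to d − 1; any variation in degree increases κ above that value.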
We see that the two estimates are close, but that the simpler estimate assuming homogeneous mixing is slightly conservative for all three empirical networks, which is consistent with the theoretical result for the configuration model.
The second alternative network is a part of the Facebook social network from [15]. This part is relatively small and we restrict ourselves to the largest connected component (containing 1034 individuals). This network has a high mean degree (51.7) and mean excess degree (93.5). Because of its relatively small size, and because some substantial parts of the network are connected to the rest of the network through only a few connections, the estimate of R0 through the epidemic growth rate is less accurate. We also have to adapt the bounds for estimating R0 from the infection tree (we use as a reference generation the first generation in which there are 40 individuals), and we estimate the epidemic growth rate from the time it takes for the total number of individuals who are infectious or recovered/deceased to increase from 150 to 350. Furthermore, in order to obtain quicker convergence, the 7 initial infectious individuals are chosen with probability proportional to their number of acquaintances, which gives individuals with many acquaintances a higher probability of being initially infectious. Boxplots of the estimates of R0 using the real infection process, the estimates assuming homogeneous mixing and using the epidemic growth rate, as well as a boxplot of the ratio of those estimates, are given in Figure S5.
Figure S1: The estimated basic reproduction number, R0, for a Markov SEIR model in a multi-type population as described in [14], based on the homogeneous mixing assumption and the estimated epidemic growth rate, α, against the computed R0 based on the full model. The infectivity is chosen at random, such that the theoretical R0 is uniform between 1.5 and 3. The estimate of α is based on the times when individuals become infectious.
Figure S4: Histograms of (a) the ratio of and (b) the difference between the estimates of R0 assuming homogeneous mixing and using the estimated epidemic growth rate, and the estimates based on the real infection process, in the collaboration network in Condensed Matter Physics. 1000 simulations are used and the simulations with the 50 lowest and 50 highest estimated epidemic growth rates are not represented in the histograms.
Figure S5: Boxplots of estimates of R0 for three networks from [15]: the Condensed Matter Physics and Astro Physics collaboration networks and a Facebook social network graph. In (a) the estimates assuming homogeneous mixing and using the epidemic growth rate are plotted in red, while the estimates based on the real infection process are plotted in blue. In (b) the ratios of the two estimates of R0 for each simulation are summarized.