Accounting for super-spreading gives the basic reproduction number R0 of COVID-19 that is higher than initially estimated

Transmission of infectious diseases is characterized by the basic reproduction number R0, a metric used to assess the threat posed by an outbreak and inform proportionate preventive decision-making. Based on individual case reports from the initial stage of the coronavirus disease 2019 epidemic, R0 is often estimated to range between 2 and 4. In this report, we show that a SEIR model that properly accounts for the distribution of the incubation period suggests that R0 lie in the range 4.4 - 11.7. This estimate is based on the doubling time observed in the near-exponential phases of the epidemic spread in China, United States, and six European countries. To support our empirical estimation, we analyze stochastic trajectories of the SEIR model showing that in the presence of super-spreaders the calculations based on individual cases reported during the initial phase of the outbreak systematically overestimate the doubling time and thus underestimate the actual value of R0.


Introduction
The basic reproduction number R 0 is a critical parameter characterizing the dynamics of the outbreak of an infectious disease. R 0 quantifies the expected number of secondary cases generated by an infectious individual in an entirely susceptible population. Although it is pertinent to the situation before imposition of protective measures, R 0 is influenced by natural conditions (such as seasonality) as well as socioeconomic factors (such as population density or ingrained societal norms and practices) (Delamater et al., 2019). Importantly, R 0 suggests the extent of control measures that have to be implemented to stop the spread of the epidemic.
A preliminary estimate published by the World Health Organization (WHO) suggested that R 0 of coronavirus disease 2019  lies in between 1.4 and 2.5 (WHO, 2020a). Later this estimate has been revised to 2-2.5 (WHO, 2020b), which is in agreement with numerous other studies that, based on official data from China, implied the range of 2-3 [see, e.g., Liu et al. (2020) or Boldog et al. (2020) for a summary]. This range suggests an outbreak of a contagious disease that should be containable by imposition of relatively mild restrictions on social interactions, as initially implemented in European countries such as Italy or Spain. Of note, there were early reports of R 0 higher than 4 by Cao et al. (2020) and Shen et al. (2020), in which, however, relatively long infectious periods were assumed.
Here we estimated R 0 of COVID-19 based on the doubling time observed in the (nearly) exponential phase of the increase of confirmed cases in China, United States, and six European countries. By considering the early phase in which the number of registered cases begins to exceed 100, we aimed at accounting for super-spreading events that are implausible when the number of infected individuals is very low. To provide additional motivation for this approach, we show that in the presence of superspreaders the median trajectory starting from a single infection grows much slower than the average trajectory, which very likely leads to overestimation of the doubling time. We used a susceptibleexposed-infected-removed (SEIR) model with six subsequent exposed states to reproduce the shape of the reported distribution of the incubation period. This allowed us to determine a plausible range of R 0 at 4.4-11.7, considerably higher than most initial estimates.

The SEIR model
We used a SEIR model with six exposed states, which is equivalent to the assumption that the incubation period is Erlang-distributed with the shape parameter 6. We assumed that the average incubation period 1/σ ranges between 5.28 days [mean value determined by Lauer et al. (2020), used by us as a lower bound] and 6.47 days [mean value determined by Backer et al. (2020), used by us as an upper bound]. The infectious period is either exponentially or Erlang-distributed with the shape parameter 2 [as in (Kucharski et al., 2020)], resulting in model variants with, respectively, a single or two subsequent infectious states. The average infectious period 1/γ is assumed to be 3 days [as in  and (Kucharski et al., 2020)]. See Methods for more details on the model.

Estimation of the basic reproduction number in the exponential growth phase
We estimated T d within two-and three-week periods starting on the day in which the number of confirmed (in the SEIR model naming convention, removed) cases exceeded 100 in populations of China, United States, and six European countries (Fig. 1A). In nearly all countries (except the United States), values of T d estimated based on three-week periods are higher than based on the two-week periods, which is likely a consequence of protective measures, such as social distancing, and isolation of positively tested individuals. Three-week T d values range from 2.34 (United States) to 3.25 (China), while two-week T d values lie in between 2.16 (Spain) and 2.88 ( United Kingdom).
We estimated the range of R 0 as a function of the doubling time T d using a formula provided in Methods. The lower bound has been obtained using the model variant with 1/σ = 5.28 days and two infectious states, whereas the upper bound results from the model with 1/σ = 6.47 days and one infectious state, Fig. 1B. After plugging in the overall longest T d = 3.25 and the shortest T d = 2.16 into, respectively, a variant of our model with the lowest R 0 (T d ) curve (parameters: m = 6, n = 2, 1/σ = 5.28) . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10. 1101/2020 and a variant with the highest R 0 (T d ) curve (parameters: m = 6, n = 1, 1/σ = 6.47), we arrived at the estimated R 0 range of 4.4-11.7. The two-week doubling time for China, 2.39, is remarkably consistent with the value of 2.4 reported by Sanche et al. (2020), who estimated that R 0 for China lies in the range 4.7 to 6.6, which is contained in our estimated range for China: 4.4-9.6. The models having one or two exposed states, often used to estimate the value of R 0 , substantially underestimated R 0 , whereas a SIR model with a fixed time delay equal to 1/σ overestimated R 0 , cf.  solid curves correspond to R 0 (T d ) according to SEIR models structured and parametrized as in (Kucharski et al., 2020) (n = 2, m = 2) and by (Wu et al., 2020) (n = 1, m = 1). The highest R 0 (T d ) curve (black dash-dotted line) has been obtained assuming a Dirac δ-distributed incubation period and a corresponding delay-differential equations-based model formulation (see Methods).
Our calculations suggest the range of R 0 that is higher than initially estimated (Boldog et al., 2020;Liu et al., 2020). This is because (1) in contrast to most model-based calculations, we appropriately accounted for the distribution of the incubation period and (2) we estimated T d based on the (nearly) exponential growth phases of the outbreak, which gives T d shorter than estimates based on initial individual propagation events. The discrepancy in T d estimation may be potentially attributed to the fact that not all removed individuals are registered. In the case when the ratio of registered to removed individuals is increasing over time, the true increase of the removed cases may be overestimated. We do not rule out this possibility, although we consider it implausible as the expansion of testing capacity in considered countries has been slower than the progression of the outbreak. We rather attribute the . CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10. 1101/2020 discrepancy to the fact that in the very early phase, in which the doubling time (growth rate) is estimated based on individual case reports, the effect of super-spreading events (such as football matches, carnival fests, demonstrations, masses, or hospital-acquired infections) is negligible due to a low probability of such events when the number of infected individuals is low. Sanche et al. (2020) inferred that the pre-exponential period in Wuhan has been dominated by simple transmission chains.
In turn, super-spreading events were very likely the main drivers of the epidemic spread in, e.g., Italy and Germany, where, in the early exponential phase, spatial heterogeneity of registered cases has been evident (Cereda et al., 2020;Mercker et al., 2020).

Stochastic simulations of the SEIR model dynamics with super-spreading
We analyzed the impact of super-spreading on estimation of T d based on stochastic simulations of SEIR model dynamics. Simulations were performed in the perfectly mixed regime according to the Gillespie algorithm. We assumed that a predefined fixed proportion of individuals has higher infectiousness and as such is responsible for on average either half of infections ('super-spreaders') or two-third of infections ('hyper-spreaders'). To reproduce these fractions in systems with different assigned proportions of super-or hyper-spreaders, their infectiousness is assumed to be inversely proportional to their ratio in the simulated population. We estimated T d in two ways: based on one month of growth of the number of new cases since the first registered case ('30 days since the 1 s t case') and based on growth of new cases in the two-week period after the number of registered cases exceeds 100 ('14 days since 100 cases'). As we are interested in the initial phase characterized by exponential growth, we assumed that the susceptible population remains constant. infectiousness has been reduced to give the same deterministic T d = 2 days (vertical dotted gray lines). Remaining model parameters: (m, n) = (6, 1); (1/σ, 1/γ) = (5.28 days, 3 days). Each histogram results from 5,000 stochastic simulations, each starting from a single infected normal individual; trajectories resulting in outbreak failure were discarded. Each distribution is described in terms of its mean (µ), median (Q 2 and vertical dashed lines), and standard deviation (s.d.).
. CC-BY-NC-ND 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted May 1, 2020. . https://doi.org/10.1101/2020 In Fig. 2 we show histograms of T d calculated using either the '14 days since 100 cases' method or the '30 days since the 1 s t case' method. One may observe that the histograms calculated using the '30 days since the 1 s t case' method are broader than those calculated using the '14 days since 100 cases' method, and the width of all histograms increases with increasing infectiousness (which is set inversely proportional to ρ). When T d is calculated using the '14 days since 100 cases' method, its median value is slightly larger than T d in the deterministic model (equal 2 days); however, when T d is calculated using the '30 days since the 1 s t case' method, then for high infectiousness of super-and hyper-spreaders (correspondingly, for low ρ) its median value becomes much larger than the deterministic T d . In the case of the lowest considered ρ = 1%, when super-spreaders (hyper-spreaders) have their infectiousness about 100 times (200 times) higher than the infectiousness of normal individuals , one obtains median T d that is overestimated by 28% (67%). Both methods statistically overestimate T d but when using the '30 days since the 1 s t case' method, overestimation is very significant. The effect is caused by low probability of appearance of super-or ultra-spreaders in the first weeks of the outbreak.
We notice that T d estimation for a given country based on available data is equivalent to the analysis of a single stochastic trajectory.

Conclusions
Based on epidemic data from China, United States, and six European countries, we have estimated that the basic reproduction number R 0 lies in the range 4.6-11.6 (4.4-9.6 for China), which is higher than most previous estimates (Boldog et al., 2020;Liu et al., 2020). There are two sources of the discrepancy in R 0 estimation. First, in agreement with data on the incubation period distribution, we used a model with six exposed states, which substantially increases R 0 (T d ) with respect to the models with one or two exposed states. Second, we estimated T d based on the two-and three-week period exponential growth phase beginning on the day in which the number of registered cases exceeds 100. This approach, in contrast to estimation of R 0 based on individual case reports, allows to implicitly take into account super-spreading events that substantially shorten T d . Spatial heterogeneity of the epidemic spread observed in many European countries, including Italy, Spain, and Germany, can be associated with larger or smaller super-spreading events that initiated outbreaks is particular regions of these countries.
Our estimates are consistent with current epidemic data in Italy, Spain, and France. As of April 25, 2020, these countries managed to terminate the exponential growth phase by means of country-wide quarantine. Current COVID-19 Community Mobility Reports (Google, 2020) show about 80% reduction of mobility in retail and recreation, transit stations, and workplaces in these countries. Together with increased social distancing, this reduction possibly lowered the infection rate β at least fivefold; additionally, massive testing reduced the infectious period, 1/γ. Consequently, we expect that the reproduction number R = β/γ was reduced more than fivefold, which brought it to the values somewhat smaller than 1. This suggests that R 0 in these countries could have been larger than 5.