Challenges in control of Covid-19: short doubling time and long delay to effect of interventions

Early assessments of the spreading rate of COVID-19 were subject to significant uncertainty, as expected with limited data and difficulties in case ascertainment, but more reliable inferences can now be made. Here, we estimate from European data that COVID-19 cases are expected initially to double every three days, until social distancing interventions slow this growth, and that the impact of such measures is typically only seen nine days (i.e. three doubling times) after their implementation. We argue that such temporal patterns are more critical than precise estimates of the basic reproduction number for initiating interventions. This observation has particular implications for low- and middle-income countries currently in the early stages of their local epidemics.


Introduction
In December 2019, a cluster of unexplained pneumonia cases in Wuhan, the capital of Hubei province in the People's Republic of China, rapidly progressed into a large-scale outbreak and, by 11 March 2020, had been declared a global pandemic by the World Health Organisation [1]. The disease caused by this highly contagious virus has since been named COVID-19; it is caused by a single-stranded RNA coronavirus (SARS-CoV-2) similar to the pathogens responsible for severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) [2]. As of 29 March 2020, 657,140 confirmed cases and 29,957 deaths had been reported in nearly 200 countries and territories globally [3].
Various control measures have been implemented worldwide, including isolation of confirmed and suspected cases, contact tracing, and physical distancing. However, only the most aggressive measures have resulted in epidemic suppression. In Hubei, a regional lockdown was implemented on 23-24 January 2020, and reported cases peaked approximately two weeks later [4]. In Italy, a national lockdown was implemented on 9 March 2020, by which time 7,375 confirmed cases and 366 deaths had been recorded; only in the last few days (as of 29 March) has the epidemic seemed to be slowing down [5]. In comparison, India declared a nationwide lockdown on 24 March 2020, with only 434 confirmed cases and no deaths [6,7]. Similarly, South Africa began a 21-day lockdown on 27 March, with 927 known cases and no deaths [8,9]. The implementation of such early and aggressive control measures in India and South Africa may substantially increase their chances of successful containment, although the social and personal costs could be high [10].
Planning of interventions usually relies on estimates of the basic reproduction number R0, defined as the average number of new infections generated by a single infected person in a fully susceptible population. Reported estimates of the R0 of COVID-19 are highly variable, ranging from 1.4 to 6.49 [2,11]; the differences are attributable to the variety of methods, model structures and parameter values (in particular, the estimated or assumed amount of presymptomatic transmission), as well as to the data sources used. Most official sources settle in the range 2-3 [3,12-15], but these estimates mostly derive from early studies of the epidemic in Wuhan [16-18] or on the Diamond Princess cruise ship [19], and so are subject to important limitations: small amounts of data, uncertain or biased reporting of early cases in Wuhan, and the uniqueness of the specific settings in which they occurred.
We argue that official ranges of R0 should be continuously updated with more recent estimates, coming not only from China [20,21] but also from the many different outbreaks observed worldwide [22-25]. Point estimates might not change, but the task remains imperative, both because available data are now more numerous and reliable, and because estimates of R0 in one population do not necessarily translate to another.
However, we emphasise that the speed of epidemic growth and the delay between infection and case detection are more relevant for COVID-19 control implementation than estimates of R0 [26,27]. Since doubling times can be estimated directly from data and estimates of delays are relatively consistent, sophisticated models are not required to infer when action is urgent.

Results
To support our argument, we first estimate the growth rate in multiple countries using different methods. We then estimate the incubation period and the distribution of times from symptom onset to hospitalisation in different settings, and compare them with other results from the literature. Observations of the outbreaks in the UK and Italy provide further support for the highlighted delays from infection to detection. Finally, we discuss the limitations of relying on R0 in this context.
For the estimation of the growth rate, we focus on the number of confirmed cases in European countries that have experienced large local epidemics (Figure 1A), as reported by the WHO [28]. To avoid relying only on case confirmation, which could be affected by numerous biases, we also estimate the growth rate in hospitalisations, intensive care unit (ICU) admissions and deaths in Italy (Figure 1B; [5]). To ensure generalisability of results, we performed another analysis on a larger set of European countries (Figure 2). With the exception of a few countries with small numbers of cases and potentially unreliable data, we consistently find doubling times of about 3 days or less, which appear to be sustained until mitigation interventions are put in place (Figure 2). These are significantly shorter than early estimates from China [16,29]. For robustness, we have used two different methods: a semi-parametric generalised additive model and a generalised linear model (details in Materials and Methods). Unsurprisingly, the results differ in their confidence intervals, but the conclusions are similar, and agree with the common exploratory analysis based on visual inspection of data plotted on a logarithmic scale (Figure S3).
Although our results are robust to the method used, they might still be misleading if there are biases in the data, such as errors in reporting or changes in case definition or testing regime. This issue is particularly critical as lags in reporting can create discrepancies in case counts between national and international official sources [5,28,30]. However, such biases are unlikely to affect our conclusions, for the following reasons:
• The fast growth and high numbers likely make small biases negligible;
• Any multiplicative correction, such as constant underreporting, will not affect the observed trend;
• It is relatively easy for exponential growth to appear slower than reality, for example if reporting rates decline over time, but difficult for growth to appear consistently faster: aggressive swabbing of asymptomatic individuals (e.g. early on in the Italian locked-down towns [31]) and changes in case definition [32] might explain such a bias, but these factors are unlikely to affect observations for longer than a few days or consistently across different countries;
• Hospitalisations and ICU admissions, which in the Italian data appear to grow at similar rates to the number of confirmed cases (Figure 1B), should be much less affected by reporting issues. The even faster increase in deaths that we observe may instead be explained by rapid outbreaks among vulnerable groups, such as those in care homes, coupled with quicker progression to death in these groups, or possibly by local hospital saturation.
We conclude that, although existing data has its limitations, the evidence for fast exponential growth in the absence of intervention is overwhelming.
The delay between infection and case detection is crucial in determining how long cases have been growing unobserved. Early detection of cases during the incubation period, before individuals have become symptomatic, is typically not possible once containment has failed since it relies on full contact tracing and testing of asymptomatic individuals. Detecting cases at symptom onset is potentially more feasible, but in practice depends on the case-finding strategy. For example, in the UK, symptomatic individuals are instructed to self-isolate at home, and are only tested if they subsequently require hospitalisation. Thus the delay between infection and case detection includes the incubation period, the time between symptom onset and hospitalisation, and the time it takes to receive a positive test result. Similar effects will be visible in other countries where case counts are dominated by hospitalisations.
We report published estimates of the incubation period and of the delay between symptom onset and hospitalisation (Table 1). Since none of these estimates simultaneously accounts for truncated observations and exponential growth in the number of infected cases, we also include our own estimates (see Materials and Methods), obtained by analysing UK line-list data provided by Public Health England (unfortunately not publicly available) and a publicly available line-list that collates worldwide data [33]. Our estimates, while more robust, are consistent with the existing literature, and highlight geographical heterogeneity, such as shorter onset-to-hospitalisation intervals in Hong Kong and Singapore than in the UK. With the exception of Singapore, the sum of the mean incubation period and the mean onset-to-hospitalisation interval is never shorter than 9 days, which corresponds to approximately 3 doubling times in an unconstrained epidemic like those observed in Figures 1 and 2.
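The arithmetic behind comparing delays with doubling times is short enough to sketch. Using t_D = ln(2)/r and the means quoted in this document (4.84-day incubation period, 5.14-day UK onset-to-hospitalisation delay), the helpers below (hypothetical names) express a delay as a number of doubling times and as a fold increase in incidence, for the growth rates of 0.22 and 0.25 per day discussed in the text.

```python
import math

def doubling_time(r):
    """Doubling time in days for exponential growth at rate r per day: ln(2) / r."""
    return math.log(2) / r

def doublings_during(delay_days, r):
    """How many doubling times fit into a delay of `delay_days` at growth rate r."""
    return delay_days / doubling_time(r)

def fold_increase(delay_days, r):
    """Multiplicative growth in incidence over the delay: exp(r * delay)."""
    return math.exp(r * delay_days)

# Mean incubation period plus mean UK onset-to-hospitalisation delay, in days
delay_uk = 4.84 + 5.14
for r in (0.22, 0.25):
    print(f"r = {r}/day: t_D = {doubling_time(r):.2f} days, "
          f"{doublings_during(delay_uk, r):.1f} doublings, "
          f"x{fold_increase(delay_uk, r):.1f} growth over {delay_uk:.2f} days")
```

Because the fold increase is exp(r × delay), each extra doubling time of delay before detection doubles the number of infections already in the system when a case is first seen.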
Our estimate of the delay between infection and detection is consistent with observations of the UK and Italian epidemics (Figure 3). For both countries, we plot the daily numbers of new cases and observe a visible drop 8-9 days after the first, relatively soft, control measures were implemented (in the UK, recommended self-isolation if symptomatic, on 13 March; in Italy, lockdown of infected towns and closure of schools and universities in Northern Italy, on 22-23 February). After the first control measure in the UK (Figure 3A), cases continued to increase exponentially for 9 days, with an estimated growth rate of ~0.22/day (corresponding to a doubling time of just over 3 days). During this period, the number of daily confirmed cases rose approximately 8-fold. Subsequently, the number of new cases started to tail off. A similar pattern is observed in Italy (Figure 3B). Because non-pharmaceutical interventions, with unknown compliance, cannot be evaluated until their effects emerge in the data, a pattern of introducing increasingly strong measures has repeated across Europe, with long delays to control. Even if immediate hard interventions halted all community transmission, within-household transmission would continue, creating an additional delay between the start of the intervention and its effect. This is consistent with the approximately 2-week delay from lockdown to peak in new cases observed in Hubei [4]. Further delays in case confirmation, hospitalisation, potential ICU admission and death mean that the latter figures keep growing well after transmission control is achieved.
Since R0 remains the mainstay of most epidemiological analyses, we explored values of R0 consistent with a range of growth rates and modelling assumptions. For a growth rate of 0.25/day and our delay estimates (Table 1), we obtain values ranging from 2 to 4 or larger (Table S1A), owing to the extreme sensitivity of R0 to modelling assumptions, in particular to the extent of presymptomatic transmission, for which estimates in the literature vary widely [34-37].
Although R0 is commonly used to determine the reduction in person-to-person transmission needed to achieve control, it provides limited insight into how to control the COVID-19 pandemic. First, there is sufficient evidence worldwide that interventions must be draconian, particularly in countries where case numbers are high. Second, because the variability in R0 estimates obtained from the same growth rate appears to be driven predominantly by the assumed amount of pre-symptomatic transmission, the exact value of R0 is only poorly correlated with the required aggressiveness of the intervention [38]. For example, if R0 is four and most transmission occurs after symptom onset, 75% of transmission (the fraction 1 - 1/R0) needs to be stopped, and self-isolation when symptomatic can easily achieve this. Conversely, if R0 is two but most transmission is pre-symptomatic, only 50% of transmission needs to be prevented, but in practice this can only be achieved through interventions, such as quarantining apparently healthy individuals, that are highly socially disruptive and likely enforced rather than spontaneous. Finally, R0 informs how aggressive interventions should be, but not how quickly they should be implemented.
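The percentages in this example follow from requiring the reproduction number to fall below 1, so that a fraction 1 - 1/R0 of transmission must be prevented. The sketch below also includes a deliberately simple, hypothetical model of isolating symptomatic cases (perfect isolation from onset, no asymptomatic cases); the pre-symptomatic shares of 20% and 80% are illustrative stand-ins for "most transmission after onset" and "mostly pre-symptomatic", not estimates from the paper.

```python
def fraction_to_prevent(r0):
    """Share of transmission that must be blocked to bring the reproduction number below 1."""
    return max(0.0, 1.0 - 1.0 / r0)

def residual_r(r0, presymptomatic_share, isolation_efficacy=1.0):
    """Hypothetical simple model: isolating symptomatic cases removes a share
    `isolation_efficacy` of all post-onset transmission; pre-symptomatic
    transmission is left untouched."""
    return r0 * (1.0 - isolation_efficacy * (1.0 - presymptomatic_share))

for r0, pre in [(4.0, 0.2), (2.0, 0.8)]:
    print(f"R0 = {r0}: must block {fraction_to_prevent(r0):.0%}; "
          f"R after isolating symptomatic cases = {residual_r(r0, pre):.2f}")
```

In this toy model the larger R0 is controlled by isolation alone (residual R below 1), while the smaller R0 is not, which is the point made in the text: the required type of intervention depends on the timing of transmission, not only on R0.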

Discussion
The highlighted risks of underestimating the combination of short doubling times and long delays between infection and case detection are consistent with the now-common pattern of countries misjudging the initial small number of observed cases, only to realise that the storm has already arrived. At unconstrained growth, even the immense effort of doubling local hospital capacity only buys 3 days (one doubling time) of reprieve before bed capacity is breached. Being blind to the extent of an epidemic and the true number of infections at any one time leads to intervention strategies based on the number of observed cases and measurements of R0. For COVID-19, these are insufficient and dangerously underestimate the true degree of intervention required to slow down and bring the epidemic under control. We advocate stronger action from national and international health care communities, with a particular focus on supporting low- and middle-income countries where numbers of cases, at the time of writing, appear to be relatively low. In settings where health care capacity is low and intergenerational mixing is common, swift action will save numerous lives.
Note that the Italian hospitalised and ICU cases (Figure 1B) are daily prevalence rather than daily incidence.

Fig. 2.
Log daily confirmed cases and growth rate estimation for numerous European countries. Curves (black lines) are fitted using the generalised linear model methodology (see Materials and Methods) to the first 9 days of non-zero data after a cumulative incidence of 25 is reached (period shown as a blue rectangle); for most countries this coincided with the start of sustained local transmission, the exceptions being the UK and Romania, where fitting started an additional 9 days after this criterion was met to reflect the local situation. Daily new cases are shown as red dots. The fit is performed on the points shown as circles; crosses are data added after the prediction was made. The exponential trend appears reasonably accurate for two more days, after which the effect of the first major intervention becomes visible.
Data sources and the code used to carry out our data analysis can be found at: https://github.com/thomasallanhouse/covid19-growth

Data Sources
We consider four sources for our epidemiological data: the WHO [28], line-list data provided by Public Health England (PHE), line-list data from [33] and the Italian Istituto Superiore di Sanità [5]. Of these data sets, three are publicly available; the line-list from PHE is unfortunately not. From these sources, we extract epidemiological data concerning case counts, incidence, hospitalisation, and the delays between infection and symptom onset and between symptom onset and hospitalisation. These data sources and the code used to carry out our data analysis can be found at: https://github.com/thomasallanhouse/covid19-growth.

Supplementary Text
Fitting the growth rate
Typically, an infection spread from person to person will grow exponentially in the early phase of an epidemic. This exponential growth can be measured through the real-time growth rate r so that, loosely speaking, the prevalence of infection is $I(t) \propto e^{rt}$. A natural mathematical model from which to estimate r is a Poisson-family generalised linear model (GLM) with a log link. Given the over-dispersed noise inherent in both disease dynamics and surveillance data, a quasi-Poisson family is considered here. The growth rate r is more intuitively reported as a doubling time $t_D$ (the time taken for case numbers to double), with $t_D = \ln(2)/r$. The log-linear analysis from a Poisson GLM, defined formally below, is restricted to datasets (or time windows) with clear exponential growth, unless additional explanatory variables are available, which is rarely the case in real time.
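The log-linear fit described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' R code: it fits log E[y(t)] = a + r t by iteratively reweighted least squares (IRLS), which yields the Poisson point estimates (identical to the quasi-Poisson ones), and then estimates the quasi-Poisson dispersion from the Pearson statistic. The function name `fit_log_linear_growth` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_log_linear_growth(t, y, n_iter=100, tol=1e-10):
    """Fit log E[y(t)] = a + r*t by IRLS for a Poisson GLM with log link.

    Returns (r, doubling_time, dispersion). Quasi-Poisson shares the Poisson point
    estimates; the dispersion (Pearson chi-square / residual df) only rescales the
    variance, so standard errors would be multiplied by sqrt(dispersion)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(t), t])
    mu = y + 0.1                      # starting values, similar to R's Poisson default
    eta = np.log(mu)
    beta = np.zeros(2)
    for _ in range(n_iter):
        z = eta + (y - mu) / mu       # working response for the log link
        XtW = X.T * mu                # working weights equal mu for Poisson variance
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    dispersion = np.sum((y - mu) ** 2 / mu) / (len(y) - 2)
    r = beta[1]
    return r, np.log(2) / r, dispersion

t = np.arange(20)
y = np.random.default_rng(0).poisson(10 * np.exp(0.25 * t))
r, t_d, phi = fit_log_linear_growth(t, y)
print(f"r = {r:.3f}/day, doubling time = {t_d:.2f} days, dispersion = {phi:.2f}")
```

As noted in the text, this constant-rate model is only meaningful when the data window shows clear exponential growth.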
To allow, in a semi-parametric manner, for time variation in growth rates, we adopt a generalised additive model (GAM) in which $I(t) \propto e^{s(t)}$ for some smoother $s(t)$. In particular, we use a quasi-Poisson family with canonical link and a thin-plate spline, as implemented in the R package mgcv [45,46]. The instantaneous local growth rate is then the time derivative of the smoother, $\dot{s}(t)$, and an instantaneous doubling time is calculated as $t_D = \ln(2)/\dot{s}(t)$. Potential issues with the GAM approach are that extrapolation outside the data range (and hence forecasting of the epidemic trend) is not sensible, and that there may be boundary effects from the choice of smoother. However, this approach has the major advantage that it allows for time-varying estimates of the doubling time, and thereby implicitly accounts for missing explanatory information.
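The idea of reading a time-varying growth rate off the derivative of a smoother can be illustrated without mgcv. The sketch below swaps in a different, simpler technique: a Whittaker-Henderson penalised least-squares smoother applied to log counts (Gaussian errors, second-difference penalty) rather than the quasi-Poisson thin-plate spline GAM used in the paper. The penalty weight `lam`, the function names and the synthetic data are hypothetical, and counts are assumed to be strictly positive.

```python
import numpy as np

def whittaker_smooth(z, lam=10.0):
    """Penalised least squares: minimise ||z - s||^2 + lam * ||D2 s||^2,
    where D2 takes second differences (a discrete analogue of a smoothing spline)."""
    n = len(z)
    d2 = np.diff(np.eye(n), n=2, axis=0)      # (n - 2) x n second-difference matrix
    return np.linalg.solve(np.eye(n) + lam * d2.T @ d2, z)

def instantaneous_growth(counts, lam=10.0):
    """Return the smoothed growth rate s'(t) per day and the doubling time ln(2)/s'(t)."""
    log_counts = np.log(np.asarray(counts, dtype=float))   # counts must be > 0
    s = whittaker_smooth(log_counts, lam)
    s_dot = np.gradient(s)                                  # daily data: unit spacing
    with np.errstate(divide="ignore"):
        doubling = np.log(2) / s_dot    # negative values indicate decline (halving time)
    return s_dot, doubling

# Synthetic example: growth at 0.3/day for 15 days, then 0.1/day
t = np.arange(30)
log_y = np.where(t < 15, np.log(10) + 0.3 * t, np.log(10) + 4.5 + 0.1 * (t - 15))
growth, doubling = instantaneous_growth(np.exp(log_y))
print(np.round(growth, 3))
```

The second-difference penalty leaves straight lines in log counts (pure exponential growth) unpenalised, so a constant growth rate is recovered exactly, while changes in slope are smoothed over a few days.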
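As well as the semi-parametric GAM approach, we take a parametric approach based on direct estimation of the exponential growth rate. This lacks the ability to capture time variation, but allows for extrapolation and epidemiological interpretation. To capture over-dispersion, we use a negative binomial noise model whose variance is proportional to its mean, mirroring the quasi-Poisson family. Explicitly, the negative binomial probability mass function (pmf) is
$$\mathrm{NB}(y \mid n, p) = \frac{\Gamma(y+n)}{\Gamma(n)\, y!}\, p^{n} (1-p)^{y}, \qquad y = 0, 1, 2, \dots,$$
with mean $n(1-p)/p$ and variance $n(1-p)/p^2$. We will work in the parameterisation where the mean is $\mu$ and the variance is $\theta\mu$ (with $\theta > 1$), i.e.
$$p(\theta) = \frac{1}{\theta}, \qquad n(\mu, \theta) = \frac{\mu}{\theta - 1}.$$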
Let the number of new cases on day t be y(t). We assume that this is generated by an exponentially growing mean, $\mu(t) = y_0 e^{rt}$, which is then combined with the negative binomial pmf to give a likelihood function for the observations over a set of times T of
$$L(\mathbf{y} \mid y_0, r, \theta) = \prod_{t \in T} \mathrm{NB}\big(y(t) \mid n(y_0 e^{rt}, \theta),\, p(\theta)\big),$$
where $\mathbf{y} = (y(t))_{t \in T}$. This can then be viewed as a generalised linear model (GLM) with time as a continuous covariate, intercept $\ln(y_0)$, slope r, log link function (so that the mean is the exponential of the linear predictor) and negative binomial noise model [47]. We can perform inference through numerical maximum likelihood estimation (MLE) and calculate confidence intervals using the Laplace approximation [48].
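The negative binomial likelihood above can be written directly in Python. This sketch uses the mean/variance parameterisation p = 1/θ, n = µ/(θ - 1) and checks it numerically. To stay dependency-free, it maximises the likelihood over r by a simple grid search with y0 and θ held fixed, an illustrative simplification; a full fit would optimise all three parameters numerically and derive intervals as described in the text. The data are synthetic and the function names are hypothetical.

```python
import math
import numpy as np

def nb_logpmf(y, mu, theta):
    """Log pmf of NB(y | n, p) with mean mu and variance theta*mu (theta > 1),
    using p = 1/theta and n = mu/(theta - 1) as in the parameterisation above."""
    y = float(y)
    n = mu / (theta - 1.0)
    p = 1.0 / theta
    return (math.lgamma(y + n) - math.lgamma(n) - math.lgamma(y + 1.0)
            + n * math.log(p) + y * math.log1p(-p))

def log_likelihood(y, t, y0, r, theta):
    """ln L(y | y0, r, theta): sum of NB log-pmfs with exponentially growing mean y0*exp(r*t)."""
    return sum(nb_logpmf(yi, y0 * math.exp(r * ti), theta) for yi, ti in zip(y, t))

# Illustration on synthetic, noise-free counts; y0 and theta are held fixed
t_obs = np.arange(15)
y_obs = np.round(100 * np.exp(0.25 * t_obs))
r_grid = np.linspace(0.0, 0.5, 501)
ll = [log_likelihood(y_obs, t_obs, y0=100.0, r=r, theta=2.0) for r in r_grid]
r_hat = r_grid[int(np.argmax(ll))]
print(f"grid MLE of r: {r_hat:.3f}")
```

Setting θ close to 1 recovers Poisson behaviour (variance equal to the mean); larger θ widens the likelihood and hence the confidence interval for r.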
Each page of Figure S1 compares the GAM with a simple GLM with θ = 1 (i.e. Poisson noise) and T taken to be the whole data range, in contrast to the results in the main paper and Figure S2, where we fit θ and let T correspond to the first nine days of the local epidemic after the cumulative number of cases has reached 25 (the only exceptions are the UK and Romania, where fitting started an additional 9 days later to reflect the local situation). Although the simple GLM method is clearly inadequate if the fit is not restricted to a window where exponential growth appears reasonable, it is shown in these plots for comparison. The left panel shows the model fit and the data, the middle panel the instantaneous growth rate from the GAM (black) and the growth rate from the GLM (red) with 95% CIs, and the rightmost panel these growth rates converted to doubling times. Of the fifteen countries, two (Belgium and Romania) have equivalent fits from the GAM and the GLM and show constant growth rates over the time period. The Czech Republic, Greece, Ireland and Poland have the central GLM estimate within the 95% CI of the GAM, suggesting that a constant growth rate is a plausible explanation of the reported data. Austria, France, Italy, Portugal, Spain and Switzerland show a fairly smooth transition from shorter to longer doubling times. Germany, the Netherlands and the UK show more oscillatory behaviour in doubling times.
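The rule for choosing the fitting window T can be sketched as follows. The helper `fitting_window` is hypothetical, and two details are assumptions rather than statements from the text: that the day on which cumulative cases first reach 25 is the first day of the window, and that zero-count days are skipped rather than counted towards the nine days.

```python
def fitting_window(daily_cases, threshold=25, n_days=9, extra_delay=0):
    """Return the indices of the days used for the exponential fit (hypothetical helper).

    Assumed rule: find the first day on which cumulative cases reach `threshold`,
    optionally shift the start by `extra_delay` days, then keep the first `n_days`
    days with non-zero counts from that point on."""
    cumulative = 0
    start = None
    for day, cases in enumerate(daily_cases):
        cumulative += cases
        if cumulative >= threshold:
            start = day
            break
    if start is None:
        return []                      # threshold never reached
    start += extra_delay
    nonzero = [day for day in range(start, len(daily_cases)) if daily_cases[day] > 0]
    return nonzero[:n_days]

daily = [1, 2, 5, 10, 20, 0, 30, 40, 55, 70, 90, 120, 150, 200, 260]
print(fitting_window(daily))                  # standard rule
print(fitting_window(daily, extra_delay=9))   # UK/Romania-style shifted start
```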
As a further test of the robustness of our results, we simply assess visually the growth rates of the epidemic in a set of countries by plotting data on a log scale and comparing the observed slopes with pure exponential trends (Figure S3). To avoid relying on confirmed cases only, we plot a 'mixed bag' of cumulative cases, daily new confirmations, hospitalisations and deaths. We also plot exponential trends with lower growth rates (r = 0.18 and r = 0.13) to aid visual distinction between the slopes of different exponential growths and the visual assessment of the potential effects of interventions in Italy.

Estimating delay distributions
Delay distributions describe the time delay between two events. To understand how long it takes until the impact of an intervention may be observed, we need to understand the delay between infection and symptom onset (the incubation period), and the delay from onset to hospitalisation. A difficulty with estimating delay distributions during an outbreak is that events are only observed if they occur before the final sampling date. Since delay distributions depend on the time between two events, if the first event occurs near the end of the sampling window, it will only be observed if the delay to the second event is short. This causes an over-representation of short delays towards the end of the sampling window, which is exacerbated by the exponential growth of the epidemic, since most first events then occur recently. Therefore, we need to account for both this growth and the truncation within our model.
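A small simulation illustrates the bias described above. It is a sketch under assumed settings: first events arrive at a rate growing like exp(0.25 t) over a 40-day window, delays are gamma-distributed with the UK onset-to-hospitalisation mean and standard deviation reported below (5.14 and 4.20 days), and only pairs completed before the end of the window are kept. The function name and window length are hypothetical.

```python
import numpy as np

def simulate_observed_delays(r=0.25, window=40.0, mean=5.14, sd=4.20, n=200_000, seed=1):
    """Simulate first events whose rate grows like exp(r*t) on [0, window], attach
    gamma-distributed delays, and flag pairs whose second event falls inside the window."""
    rng = np.random.default_rng(seed)
    # Inverse-CDF draw from the density r*exp(r*t) / (exp(r*window) - 1) on [0, window]
    u = rng.random(n)
    first = np.log1p(u * np.expm1(r * window)) / r
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    delays = rng.gamma(shape, scale, size=n)
    observed = first + delays <= window
    return delays, observed

delays, observed = simulate_observed_delays()
print(f"mean of all simulated delays:  {delays.mean():.2f} days")
print(f"naive mean of observed delays: {delays[observed].mean():.2f} days")
```

Under these settings the naive average of the observed delays is roughly half the true mean, because most first events are recent and only their short delays have had time to complete.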
To fit the data we use maximum likelihood estimation. However, we do not observe the delay directly; instead, we observe the timing of the two events, so we need to construct a likelihood function for observing these events. Following [49], we construct the conditional density function for observing the second event given the time of the first event and given that the second event occurs before date T, in which the event times $x_1$ and $x_2$ can be exactly observed or interval censored. The delay from onset to hospitalisation for the UK is estimated using FF100 data provided by Public Health England, which contain data on the first few hundred infected individuals in the UK, including their times of symptom onset and hospitalisation. Some cases were recorded as hospitalised before their onset date; these have been removed from the data set, since they do not provide insight into the delay. Cases with no recorded symptom onset have also been removed. For cases where symptom onset and hospitalisation occur on the same day, we add half a day to the hospitalisation day, since the delay is unlikely to be instantaneous. After tidying, 106 cases remained from which to infer the onset-to-hospitalisation delay. The dates in the line list are recorded exactly, so the likelihood function is expressed in terms of f, the density of the onset-to-hospitalisation delay, and g, the density of the onset time. Using this truncation-corrected method and a gamma distribution to fit the delay distribution, we obtain a mean delay of 5.14 days with a standard deviation of 4.20 days. Unfortunately, we cannot share the FF100 data. To compare different regions, we also use data from Hong Kong and Singapore to estimate the local onset-to-hospitalisation delays. These data are taken from an open access line-list [33], and the filtered data sets used are provided in the supplementary material.
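Under additional simplifying assumptions, the truncation-corrected likelihood has a closed form that is easy to code. If first events grow like exp(rt) and the observation window is long compared with the delays, an observed delay d has density f(d) exp(-rd) / E[exp(-rD)], and for a gamma delay with shape k and scale θ, E[exp(-rD)] = (1 + rθ)^(-k). The sketch below maximises this corrected likelihood by grid search on synthetic data; it is not the authors' MATLAB implementation, and the function names are hypothetical.

```python
import math
import numpy as np

def corrected_loglik_grid(delays, r, shapes, scales):
    """Truncation-corrected gamma log-likelihood on a (shape, scale) grid.

    Assumes first events grow like exp(r*t) and the observation window is long compared
    with the delays, so each observed delay has density f(d) * exp(-r*d) / E[exp(-r*D)],
    with E[exp(-r*D)] = (1 + r*scale)^(-shape) for a gamma delay D."""
    d = np.asarray(delays, dtype=float)
    n, sum_log, sum_lin = d.size, np.log(d).sum(), d.sum()
    k = shapes[:, None]
    theta = scales[None, :]
    lgamma_k = np.array([math.lgamma(x) for x in shapes])[:, None]
    # Ordinary gamma log-likelihood, written with sufficient statistics
    naive = (k - 1.0) * sum_log - sum_lin / theta - n * (lgamma_k + k * np.log(theta))
    # Correction: subtract n * log E[exp(-r*D)]; the exp(-r*d) factor does not depend on (k, theta)
    return naive + n * k * np.log1p(r * theta)

def fit_gamma_delay(delays, r):
    """Grid-search MLE; returns the fitted mean and standard deviation of the delay."""
    shapes = np.linspace(0.5, 4.0, 351)
    scales = np.linspace(0.5, 10.0, 951)
    ll = corrected_loglik_grid(delays, r, shapes, scales)
    i, j = np.unravel_index(np.argmax(ll), ll.shape)
    k, theta = shapes[i], scales[j]
    return k * theta, math.sqrt(k) * theta

# Synthetic check: under the assumptions above, observed delays follow a gamma with the
# same shape and rate (1/scale + r); the true delay has mean 5.14 and sd 4.20.
r = 0.25
true_shape, true_scale = (5.14 / 4.20) ** 2, 4.20 ** 2 / 5.14
rng = np.random.default_rng(2)
observed = rng.gamma(true_shape, 1.0 / (1.0 / true_scale + r), size=50_000)
print(f"naive mean: {observed.mean():.2f}")
print("corrected mean and sd: %.2f, %.2f" % fit_gamma_delay(observed, r))
```

The correction term n·k·log(1 + rθ) is what pulls the estimate back from the short observed delays towards the true mean; with r = 0 it vanishes and the ordinary gamma fit is recovered.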
Using the method above, we obtain a mean delay of 4.41 days (standard deviation 4.63 days) for Hong Kong and 2.62 days (standard deviation 2.38 days) for Singapore. For the incubation period, we use data from Wuhan during the early stages of the outbreak. These data were extracted from an open access line-list [33], containing (among other information) the dates when individuals were in Wuhan and when they developed symptoms. Since these data are from the early stages of the epidemic, when the majority of cases were in Wuhan, it is likely that these individuals were infected in Wuhan, so the time spent in Wuhan provides a potential exposure window during which infection occurred. For individuals with a symptom onset date before, or on the same day as, their departure from Wuhan, the upper bound of the exposure window was set to half a day before symptom onset. Using the data as of 21 February 2020, we have 162 cases from which to infer the incubation period. The infection date is interval censored, so the likelihood function is expressed in terms of f, the incubation period density, and g, the density of the infection date. We assume that g is proportional to the force of infection of the outbreak, which is assumed to grow exponentially with rate 0.25 per day. Using a gamma distribution to describe the incubation period, we obtain a mean incubation period of 4.84 days with a standard deviation of 2.79 days. The incubation period data are also included as supplementary material, along with MATLAB code to perform the maximum likelihood estimation.
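For the interval-censored incubation data, one plausible form of a single case's likelihood integrates the incubation density over the exposure window, weighting candidate infection times by g. The sketch below assumes g(s) proportional to exp(0.25 s) and normalises g within each case's window; that normalisation is an assumption of this sketch, not necessarily the exact expression used in the paper. Integrals use the trapezoidal rule, and all names are hypothetical.

```python
import math
import numpy as np

def gamma_pdf(x, shape, scale):
    """Gamma density evaluated elementwise; zero for x <= 0."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)                 # avoid log(0) on the masked entries
    log_pdf = ((shape - 1.0) * np.log(safe) - safe / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return np.where(x > 0, np.exp(log_pdf), 0.0)

def _trapezoid(values, step):
    """Composite trapezoidal rule on an evenly spaced grid."""
    return step * (values.sum() - 0.5 * (values[0] + values[-1]))

def case_loglik(onset, lo, hi, shape, scale, r=0.25, n_grid=401):
    """Log-likelihood of one onset time when infection is only known to lie in [lo, hi].
    Assumes infection times within the window have density g(s) proportional to exp(r*s)."""
    s = np.linspace(lo, hi, n_grid)
    step = s[1] - s[0]
    g = np.exp(r * (s - hi))    # shifted for numerical stability; the constant cancels
    f = gamma_pdf(onset - s, shape, scale)
    return math.log(_trapezoid(g * f, step) / _trapezoid(g, step))

def total_loglik(cases, shape, scale, r=0.25):
    """Sum over cases given as (onset, window_start, window_end) tuples, times in days."""
    return sum(case_loglik(x, lo, hi, shape, scale, r) for x, lo, hi in cases)

# Hypothetical cases: onset day, start and end of the exposure window
cases = [(10.0, 0.0, 9.5), (12.0, 5.0, 8.0), (7.0, 1.0, 6.5)]
print(total_loglik(cases, shape=3.0, scale=1.6))
```

Maximising `total_loglik` over the gamma shape and scale (for example on a grid) would then give incubation period estimates; as the exposure window shrinks to a point, each term reduces to the ordinary log-density of the incubation time.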

Estimation of R0
The relationship between the growth rate r and the basic reproduction number R0 in a simple homogeneously mixing model is given by the Lotka-Euler equation:
$$\frac{1}{R_0} = \int_0^\infty e^{-r\tau}\, \omega(\tau)\, d\tau, \qquad \text{(S9)}$$
where τ represents the time since the infection of an individual and ω(τ) is the infectious contact interval distribution, defined as the probability density function (pdf) of the times (since infection) at which an infectious contact is made. An infectious contact is a contact that results in an infection if the contactee is susceptible; early in the epidemic, any randomly selected contactee is almost surely susceptible. Equation (S9) assumes that all individuals have the same infectious contact interval distribution. If instead we assume random variability between individuals, there will be a set S of curves Ω(τ); equation (S9) still applies, with ω(τ) being the time-point average of all curves in S [50]; see Figure S4.
The generation time $T_g$ is defined as the mean of the infectious contact interval distribution ω:
$$T_g = \int_0^\infty \tau\, \omega(\tau)\, d\tau.$$
The same definition extends to random infectivity profiles, of which ω is the time-point average.
For the incubation period we use our estimates from Table 1 (mean 4.84 days, standard deviation 2.79 days), which are similar to those estimated by others. However, information about any form of pre-symptomatic transmission, although crucial for R0 estimates, is hard to obtain [34-37]. Furthermore, there is limited information on how infectivity changes over time. Therefore, Table S1 reports the estimates we obtain assuming that the infectious period starts at symptom onset or one, two or three days earlier, and assuming a Gamma-shaped infectivity profile with mean 2 or 3 days. In both cases, the standard deviation of the infectivity profile is assumed to be 1.5 days and infectivity is truncated after 7 days (see Figure S4).
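The Lotka-Euler calculation can be sketched numerically. If ω is the sum of two independent delays, a latent period and a position within the infectivity profile, the integral in equation (S9) factorises into the product of their Laplace transforms. The modelling choices here are hypothetical: the latent period is gamma-distributed with mean equal to the 4.84-day incubation mean minus the number of pre-symptomatic infectious days and with the same 2.79-day standard deviation, and the infectivity profile is gamma-shaped with mean 2 or 3 days and standard deviation 1.5 days, truncated at 7 days. The printed numbers are illustrative and are not claimed to reproduce Table S1.

```python
import numpy as np

def gamma_laplace(r, mean, sd):
    """E[exp(-r*X)] for a gamma-distributed X with the given mean and sd."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    return (1.0 + r * scale) ** (-shape)

def truncated_gamma_laplace(r, mean, sd, cutoff=7.0, n_grid=2001):
    """E[exp(-r*X)] for a gamma-shaped profile truncated at `cutoff` and renormalised."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    tau = np.linspace(0.0, cutoff, n_grid)[1:]          # skip tau = 0 to avoid log(0)
    weights = np.exp((shape - 1.0) * np.log(tau) - tau / scale)   # unnormalised density
    return float(np.sum(np.exp(-r * tau) * weights) / np.sum(weights))

def r0_from_growth(r, incubation_mean=4.84, incubation_sd=2.79, presymptomatic_days=0.0,
                   infectivity_mean=2.0, infectivity_sd=1.5, cutoff=7.0):
    """Lotka-Euler: 1/R0 = E[exp(-r*tau)] under omega. Here omega is assumed to be the latent
    period (incubation shortened by `presymptomatic_days`, same sd) convolved with the
    truncated infectivity profile, so its transform is the product of the two transforms."""
    latent = gamma_laplace(r, incubation_mean - presymptomatic_days, incubation_sd)
    infectivity = truncated_gamma_laplace(r, infectivity_mean, infectivity_sd, cutoff)
    return 1.0 / (latent * infectivity)

for mean_inf in (2.0, 3.0):
    row = [r0_from_growth(0.25, presymptomatic_days=d, infectivity_mean=mean_inf)
           for d in (0, 1, 2, 3)]
    print(f"infectivity mean {mean_inf}: " + ", ".join(f"{v:.2f}" for v in row))
```

Shifting infectiousness earlier, or front-loading the infectivity profile, shortens the generation time, so the same growth rate maps to a smaller R0; this is the sensitivity discussed in the next paragraph.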
We conclude that estimates of R0 are highly sensitive to small variations in quantities that are poorly supported by available data, but that, for a growth rate of 0.25 per day, close to what is observed in Italy and the UK, they are also generally larger, and possibly much larger, than official estimates [3,12-15]. Smaller values in this range are associated with significant amounts of pre-symptomatic transmission [34], leading, for example, to a generation time compatible with some of the shortest estimates of the serial interval in the literature [51], and with a front-loaded infectivity curve (mean 2 rather than 3 days).
We tested further assumptions. A simple SEIR model, with exponentially distributed incubation and infectious periods (with the same means as above but constant infectivity), leads to much smaller values of R0 than our estimates, because the exponential distribution puts substantial weight on very short incubation periods (Table S1B, left). Estimates do not change significantly, instead, if high variability in total infectiousness between individuals, in line with what was observed for SARS, is assumed (Table S1B, right), or if 50% of cases are assumed to be fully asymptomatic and to transmit at half the rate of those with symptoms (not shown).
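The SEIR comparison follows from the Lotka-Euler equation (S9): with exponentially distributed latent and infectious periods and constant infectivity, each stage contributes a factor (1 + r × mean) to R0, whereas a gamma-distributed stage with shape k and scale s contributes (1 + r s)^k, which is larger whenever k > 1. The sketch below makes this comparison for the latent period; the function names are hypothetical, and the 2-day infectious-period mean in the usage line is an illustrative value.

```python
def r0_seir(r, latent_mean, infectious_mean):
    """SEIR with exponentially distributed latent and infectious periods and constant
    infectivity: the Lotka-Euler equation gives R0 = (1 + r*latent_mean) * (1 + r*infectious_mean)."""
    return (1.0 + r * latent_mean) * (1.0 + r * infectious_mean)

def latent_factor(r, mean, sd):
    """Contribution of a gamma-distributed latent period to R0: (1 + r*scale)^shape.
    An exponential latent period is the special case sd == mean, giving 1 + r*mean."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    return (1.0 + r * scale) ** shape

r = 0.25
print(f"exponential latent factor: {latent_factor(r, 4.84, 4.84):.2f}")
print(f"gamma latent factor (sd 2.79): {latent_factor(r, 4.84, 2.79):.2f}")
print(f"SEIR R0 with a 2-day infectious period: {r0_seir(r, 4.84, 2.0):.2f}")
```

With the same 4.84-day mean, the exponential latent period yields the smaller factor because its mass concentrated near zero implies many very short generation intervals.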
These simple estimates are obtained under the assumption of mass-action mixing. The explicit presence of a social structure (e.g. age stratification, household or network structure), which in principle could affect them, is likely to be negligible in such a high R0 and growth-rate regime [52]. The effect of social structure on transmission is expected to grow in importance the closer R0 is to 1 (especially household structure, since isolation and quarantine facilitate within-household transmission).
Table S1: Values of R0 derived from different growth rates and different modelling assumptions. A) Gamma-distributed latent period with estimates from Table 1, and Gamma-shaped infectivity profile with mean 2 (left) and 3 (right) and standard deviation 1.5; B) SEIR model (left) and the same model as in A) but assuming total infectiousness is randomly drawn from a Gamma distribution with mean 1 and standard deviation 1/√k, with k = 0.25.