Ascertaining the initiation of epidemic resurgences: an application to the COVID-19 second surges in Europe and the Northeast United States

Assessing a potential resurgence of an epidemic outbreak with certainty is as important as it is challenging. The low number of infectious individuals after a long regression, and the randomness associated with it, make it difficult to ascertain whether the infectious population is growing or just fluctuating. We have developed an approach to compute confidence intervals for the switching time from decay to growth and to compute the corresponding multiple-location aggregated quantities over a region to increase the precision of the determination. We estimated the aggregate prevalence over time for Europe and the northeast United States to characterize the COVID-19 second surge in these regions during 2020. We find a starting date as early as 3 July (95% confidence interval (CI): 1–6 July) for Europe and 19 August (95% CI: 16–23 August) for the northeast United States; subsequent infectious populations that, as of 31 December, have always increased or remained stagnant; and resurgences that are the collective effect of each overall region, with no location, either country or state, dominating the regional dynamics by itself.


Introduction
Identifying a potential resurgence of an epidemic outbreak is crucial to the timely implementation of measures for its mitigation and control. A major challenge, however, is the high uncertainty that arises from the low prevalence values at which a resurgence typically happens after a long regression, as has been observed in many locations throughout the ongoing COVID-19 pandemic [1–4]. At the field level, direct characterization through randomized testing would need large population studies to provide significant results, and inference from infection case data depends on varying testing rates [2,3]. More robust approaches based on death counts are also affected by the extremely small number of random events on which they rely for inference [2,4]. This uncertainty in assessing the state of the outbreak for a potential resurgence is a source of delays in decision-making and in the implementation of interventions.
Here, we address two main computational needs to precisely characterize a resurgence. The first one is how to establish confidence intervals in the timing of the resurgence. These intervals range from the time it is certain that the infectious population has stopped decreasing to the time it is certain that the population has started to increase with a given confidence level. The second need is how to aggregate different local data into supra-local quantities to identify whether the resurgence is a collective regional effect and, if so, to determine the initiation of the resurgence more confidently.
We focus explicitly on Europe and the northeast United States (USA), which have experienced a second surge of the COVID-19 outbreak after a similar initial outburst and subsequent regression. Neither of these resurgences was widely expected or anticipated [5,6]. Both regions display high mobility among their locations and broad independence among locations to enact measures to mitigate the propagation of the outbreak. In the case of Europe, Schengen Area countries allow for unrestricted border crossings among them. Mobility restrictions, lockdowns and other non-pharmaceutical interventions were able to achieve a major regression of the outbreaks, but the gradual lifting of restrictions has resulted in a resurgence across locations in these two regions [2,7]. The characterization of the similarities and differences of the outbreak progression in these two areas is needed to provide insights into the effectiveness of the actions taken, to ascertain the extent to which their results can be extrapolated from one region to another, and to mitigate the current and potentially forthcoming resurgences in an informed manner.

Upper and lower bounds of the growth rate determine the confidence interval of the resurgences
We consider the estimated infectious population of a specific location at time $t$, denoted by $n_I(t)$, with dynamics given by

$$\frac{dn_I(t)}{dt} = k_G(t)\, n_I(t),$$

where $k_G(t)$ is its per capita growth rate, with upper and lower bounds of the confidence interval (CI) denoted by $k_G^U(t)$ and $k_G^L(t)$, respectively. In epidemiology, it is customary to use the time-varying reproduction number $R_t$, which describes the expected number of infections arising from a single case in the population [8,9]. It is related to the growth rate $k_G(t)$ through the Euler-Lotka equation

$$\frac{1}{R_t} = \int_0^\infty f_{GT}(\tau)\, e^{-k_G(t)\,\tau}\, d\tau,$$

where $f_{GT}(\tau)$ is the probability density function of the generation time [8,9]. We consider the usual description of generation times through a gamma distribution,

$$f_{GT}(\tau) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \tau^{\alpha-1} e^{-\beta\tau},$$

which leads to

$$R_t = \left(1 + \frac{k_G(t)}{\beta}\right)^{\alpha}$$

for $k_G(t) > -\beta$ and $R_t = 0$ for $k_G(t) \le -\beta$. The values of the parameters are given by

$$\alpha = \frac{\tau_G^2}{\sigma_G^2}, \qquad \beta = \frac{\tau_G}{\sigma_G^2},$$

where $\tau_G$ and $\sigma_G^2$ are the mean and the variance of the generation time, respectively.
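The conversion from growth rate to reproduction number can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the standard closed form for a gamma-distributed generation time, $R_t = (1 + k_G/\beta)^{\alpha}$ with shape $\alpha = \tau_G^2/\sigma_G^2$ and rate $\beta = \tau_G/\sigma_G^2$. The function names are hypothetical, and the default mean of 6.5 days and standard deviation of 4.2 days are the values quoted later in the text.

```python
def gamma_parameters(mean_gt, sd_gt):
    """Shape (alpha) and rate (beta) of a gamma law with the given mean and s.d."""
    variance = sd_gt ** 2
    alpha = mean_gt ** 2 / variance
    beta = mean_gt / variance
    return alpha, beta


def reproduction_number(k_g, mean_gt=6.5, sd_gt=4.2):
    """Reproduction number for a per capita growth rate k_g (per day).

    Assumes a gamma-distributed generation time; returns 0 when
    k_g <= -beta, where the closed form no longer applies.
    """
    alpha, beta = gamma_parameters(mean_gt, sd_gt)
    if k_g <= -beta:
        return 0.0
    return (1.0 + k_g / beta) ** alpha


if __name__ == "__main__":
    for k in (-0.05, 0.0, 0.05):
        print(k, reproduction_number(k))
```

A zero growth rate maps to a reproduction number of exactly 1, so the sign of the growth rate and the position of $R_t$ relative to 1 carry the same information.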
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 8: 210773
The starting date of the second surge, $t_2$, is computed as the date the infectious population reached a minimum value after the maximum of the first surge, at a time denoted by $t_1$:

$$t_2 = \underset{t > t_1}{\arg\min}\; n_I(t), \tag{2.5}$$

which in continuous time corresponds to a zero value of the growth rate (reproduction number equal to 1): $k_G(t_2) \simeq 0$.
To compute confidence intervals for the switching time from negative to positive growth, we consider the confidence intervals of the population growth rate. Explicitly, when the upper bound of the growth rate is negative (reproduction number below 1), we can ascertain that the population is decreasing with a given confidence level. Analogously, when the lower bound of the growth rate is positive (reproduction number above 1), we can ascertain that the population is increasing with a given confidence level. Therefore, the lower bound, $t_2^L$, of the CI of the starting date of the second surge is computed as the last day before the minimum in which the upper bound of the CI of the growth rate is negative (reproduction number below 1):

$$t_2^L = \max\{\, t < t_2 : k_G^U(t) < 0 \,\}.$$

Analogously, the upper bound, $t_2^U$, of the CI is computed as the first day after reaching the minimum in which the lower bound of the CI of the growth rate is positive (reproduction number above 1):

$$t_2^U = \min\{\, t > t_2 : k_G^L(t) > 0 \,\}.$$

Note that, although the accuracy of the approach is dependent on the accuracy of the underlying characterization of the infectious population, the determination of $t_2$, $t_2^L$ and $t_2^U$ is independent of potential multiplicative biases in $n_I(t)$, $k_G^U(t)$ and $k_G^L(t)$.

The approach is illustrated for Connecticut (northeast USA) and Austria (Europe's Schengen Area) in figure 1. The trajectories of the infectious populations, the growth rates, and the 95% confidence intervals (CI) for each location were downloaded on 21 April 2021 from https://github.com/Covid19Dynamics/trajectories. The data consider explicitly the age-specific infection fatality rates from Verity et al. [10], which are consistently similar among distinct locations [11,12], to infer the local infectious population from reported death counts [4]. Reproduction numbers were computed from the growth rates for each location by considering a gamma-distributed generation time, independent of the location, with a mean of 6.5 days and a standard deviation of 4.2 days [13].
The two locations in figure 1 show that, in general, there is a high uncertainty in the timing that can be attributed to the starting date of the resurgence.
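The determination of the starting date and its confidence interval described in this section can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the daily series are plain arrays indexed by day, the index of the first-surge maximum is supplied by the caller, "before" and "after" the minimum are taken as strict inequalities, and the function name and the synthetic numbers are hypothetical.

```python
import numpy as np


def resurgence_interval(n_i, k_upper, k_lower, first_peak_index):
    """Return (t2_lower, t2, t2_upper) as day indices.

    n_i      : estimated infectious population per day
    k_upper  : upper CI bound of the growth rate per day
    k_lower  : lower CI bound of the growth rate per day
    A bound is None when no day satisfies its condition.
    """
    n_i = np.asarray(n_i, dtype=float)
    k_upper = np.asarray(k_upper, dtype=float)
    k_lower = np.asarray(k_lower, dtype=float)

    # t2: day of the minimum population after the first-surge maximum.
    start = first_peak_index + 1
    t2 = start + int(np.argmin(n_i[start:]))

    # Lower bound: last day before t2 on which the upper bound is negative.
    decreasing = np.flatnonzero(k_upper[:t2] < 0)
    t2_lower = int(decreasing[-1]) if decreasing.size else None

    # Upper bound: first day after t2 on which the lower bound is positive.
    increasing = np.flatnonzero(k_lower[t2 + 1:] > 0)
    t2_upper = t2 + 1 + int(increasing[0]) if increasing.size else None

    return t2_lower, t2, t2_upper


# Synthetic example (made-up numbers, for illustration only).
n_example = [100, 80, 60, 50, 45, 44, 46, 50, 60, 80]
k_up_example = [-0.10, -0.15, -0.10, -0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.30]
k_lo_example = [-0.30, -0.30, -0.25, -0.20, -0.10, -0.05, -0.01, 0.02, 0.10, 0.20]

if __name__ == "__main__":
    print(resurgence_interval(n_example, k_up_example, k_lo_example, 0))  # (3, 5, 7)
```

Only the signs of the growth-rate bounds and the location of the population minimum enter the computation, which is why a constant positive rescaling of the inputs leaves the three dates unchanged.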

Aggregate values provide a potential avenue to increase the reliability of the estimates for low prevalence values
The aggregate infectious population for a region is expressed as

$$n_I(t) = \sum_j n_{I,j}(t),$$

where $n_{I,j}(t)$ is the infectious population of the specific location with index $j$. Using the method of variance estimates recovery [14], which parallels the methodology of error propagation, the corresponding upper and lower confidence intervals are computed as

$$n_I^U(t) = n_I(t) + \sqrt{\sum_j \left(n_{I,j}^U(t) - n_{I,j}(t)\right)^2}$$

and

$$n_I^L(t) = n_I(t) - \sqrt{\sum_j \left(n_{I,j}(t) - n_{I,j}^L(t)\right)^2}$$

from the upper, $n_{I,j}^U(t)$, and lower, $n_{I,j}^L(t)$, confidence intervals for each location. The method of variance estimates recovery cannot be used directly to compute the confidence intervals for the aggregate growth rate. We derive the expressions for the upper and lower bounds by considering that the overall time-dependent growth rate is given by

$$k_G(t) = \frac{\sum_j k_{G,j}(t)\, n_{I,j}(t)}{\sum_j n_{I,j}(t)},$$

where $k_{G,j}(t)$ is the growth rate of the infectious population of the specific location with index $j$. This expression follows from

$$\frac{dn_I(t)}{dt} = \sum_j \frac{dn_{I,j}(t)}{dt} = \sum_j k_{G,j}(t)\, n_{I,j}(t).$$

The corresponding upper and lower confidence intervals are computed as

$$k_G^U(t) = k_G(t) + \frac{1}{n_I(t)} \sqrt{\sum_j \left(k_{G,j}^U(t) - k_{G,j}(t)\right)^2 n_{I,j}(t)^2}$$

and

$$k_G^L(t) = k_G(t) - \frac{1}{n_I(t)} \sqrt{\sum_j \left(k_{G,j}(t) - k_{G,j}^L(t)\right)^2 n_{I,j}(t)^2} \tag{2.14}$$

from the upper, $k_{G,j}^U(t)$, and lower, $k_{G,j}^L(t)$, confidence intervals for each location. These expressions explicitly consider that the uncertainty in the infectious populations times the corresponding growth rate is much smaller than the uncertainty in the growth rates times the corresponding infectious population.
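The aggregation described in this section can be sketched numerically as below. This is a minimal Python sketch under stated assumptions rather than the authors' code: inputs are arrays of shape (locations, days), the bounds of the sum use the square-root-of-sum-of-squares form of the method of variance estimates recovery, and the growth-rate bounds neglect the uncertainty in the infectious populations, as the text states. The function names are hypothetical.

```python
import numpy as np


def aggregate_population(n, n_lo, n_hi):
    """Regional infectious population with lower and upper CI bounds.

    n, n_lo, n_hi : arrays of shape (locations, days) holding the point
    estimates and the lower and upper CI bounds for each location.
    """
    n, n_lo, n_hi = (np.asarray(a, dtype=float) for a in (n, n_lo, n_hi))
    total = n.sum(axis=0)
    upper = total + np.sqrt(((n_hi - n) ** 2).sum(axis=0))
    lower = total - np.sqrt(((n - n_lo) ** 2).sum(axis=0))
    return total, lower, upper


def aggregate_growth_rate(n, k, k_lo, k_hi):
    """Population-weighted regional growth rate with CI bounds.

    Only the uncertainty in the growth rates is propagated; the
    populations n act as fixed weights.
    """
    n, k, k_lo, k_hi = (np.asarray(a, dtype=float) for a in (n, k, k_lo, k_hi))
    total = n.sum(axis=0)
    k_agg = (k * n).sum(axis=0) / total
    upper = k_agg + np.sqrt((((k_hi - k) * n) ** 2).sum(axis=0)) / total
    lower = k_agg - np.sqrt((((k - k_lo) * n) ** 2).sum(axis=0)) / total
    return k_agg, lower, upper
```

For two locations with populations 30 and 40 and half-widths 3 and 4, the aggregate half-width is 5 on a total of 70, about 7% of the estimate compared with 10% for each location alone, which illustrates how aggregation tightens the relative uncertainty.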

The second surges started in early-mid summer with the northeast USA trailing Europe
To assess the properties of the resurgences with increased confidence, we computed the aggregate values of the infectious populations and the corresponding time-varying reproduction numbers for Europe's Schengen Area and the northeast USA from the individual values of the locations of each region [4]. We considered overall region values and overall region values excluding one location. Exclusion of one location provides an avenue to reliably infer the effect of that location on the overall region.

The resurgence has been more abrupt and intense in Europe than in the northeast USA
Concomitantly, the time-varying reproduction numbers crossed above one on the resurgence dates less abruptly in the northeast USA than in Europe (figure 2c), reaching maximum values of 1.50 (95% CI: 1.48-1.51) in Europe and 1.30 (95% CI: 1.27-1.34) in the northeast USA. The sharp resurgence to exponential growth in Europe coincides with the lifting of major non-pharmaceutical interventions that had curbed the outbreak [15], including the coordinated end of travel bans in Schengen Area countries on 1 July 2020 [16].
No substantial decreases in the overall infectious population, nor corresponding reproduction numbers below one, were observed for either of the two regions over three months after the starting dates of the second surges (figure 2). The estimated infectious population merely stopped growing in the northeast USA in late December (figure 2b) and entered a prolonged stagnant state in Europe in early November (figure 2a).

Aggregate values are highly reliable compared to location-specific data
The low prevalence at the location-specific level leads to broad confidence intervals for both the infectious population and the time-varying reproduction numbers, which makes ascertaining the local growth properties of the outbreak unreliable over prolonged periods of time (figures 1 and 3, and electronic supplementary material, figure S1). The aggregated values for each region provide precise evidence of sustained growth of the outbreaks already over the summer, despite the uncertainty and variability present in each of the locations independently (figure 3 and electronic supplementary material, figure S1).
Our results also provide robust evidence that the resurgence was not driven by a single location, since every aggregate value of the starting date obtained for each region by leaving one of its locations out is within the confidence limits of the overall region (figure 3 and electronic supplementary material, figure S1). Therefore, the resurgences were the collective effect of each overall region.
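The leave-one-out check described above can be sketched as follows. This is an illustrative Python sketch, not the authors' procedure in detail: it assumes that the starting date of each leave-one-out aggregate is taken as the day of its minimum after a common first-peak index, and that the regional confidence limits are supplied by the caller. The function names and the synthetic numbers are hypothetical.

```python
import numpy as np


def minimum_day(n_total, first_peak_index):
    """Day index of the minimum of n_total after the first-surge maximum."""
    start = first_peak_index + 1
    return start + int(np.argmin(n_total[start:]))


def leave_one_out_start_days(n, first_peak_index):
    """Starting day of the aggregate with each location excluded in turn.

    n : array of shape (locations, days) of infectious populations.
    """
    n = np.asarray(n, dtype=float)
    total = n.sum(axis=0)
    return {j: minimum_day(total - n[j], first_peak_index)
            for j in range(n.shape[0])}


def no_location_dominates(n, first_peak_index, t2_lower, t2_upper):
    """True if every leave-one-out starting day lies within the regional CI."""
    days = leave_one_out_start_days(n, first_peak_index)
    return all(t2_lower <= day <= t2_upper for day in days.values())


# Synthetic example with three locations (made-up numbers).
n_locations = [
    [50, 40, 30, 25, 24, 26, 30, 40],
    [60, 45, 35, 28, 27, 26, 29, 38],
    [40, 30, 22, 21, 20, 23, 27, 35],
]

if __name__ == "__main__":
    print(leave_one_out_start_days(n_locations, 0))      # {0: 4, 1: 4, 2: 4}
    print(no_location_dominates(n_locations, 0, 3, 5))   # True
```

If removing one location shifted the starting day outside the regional confidence limits, that location would be a candidate driver of the regional dynamics.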

Discussion
COVID-19 second surges in Europe and the northeast USA exemplify the difficulties of ascertaining the presence of an incipient epidemic resurgence and of determining whether the infectious population is growing or just fluctuating. We have provided an avenue to quantify the uncertainty present and a methodology to increase the reliability of the assessment by aggregating location-specific data into regional quantities.
The approach we have developed to quantify the uncertainty in the timing of a resurgence is based on the confidence intervals of the growth rate (or equivalently, those of the reproduction number) to ascertain, with a given confidence level, that the population is decreasing when the upper bound of the growth rate is negative and increasing when the lower bound of the growth rate is positive. The gap between these two regimes determines the CI of the minimum of the infectious population. The aggregate values and their corresponding confidence intervals for a region, computed from those of its locations, allowed us to make precise assessments at a regional level. We obtained explicitly that regional values for the timing of COVID-19 second surges in Europe and the northeast USA are more precise than those of the individual locations and that the resurgences in these two regions are the collective effect of each overall region with no location, either country or state, dominating the regional dynamics by itself.
There are multiple behavioural, environmental and urban factors that affect the progression of infectious diseases in general and COVID-19 in particular [15,17,18]. These factors have exhibited similar patterns across states in the northeast USA and across countries in Europe's Schengen Area. Our results show that indeed the confidence intervals of the timing of the second surge largely overlap among states and among countries in these two regions.
The northeast USA, as a region, closely trailed Europe in the second surge of the outbreak, but with markedly smaller growth and with evidence of slowing down earlier in the growth phase than Europe. Key differences in the actions taken included more gradual lifting and swifter progressive reimplementation of measures in the northeast USA than in Europe [15]. Our results show with high certainty, through the progression over time of the aggregate prevalence of Europe's Schengen Area countries, that Europe's initial response to the second surge in mid-to-late October [5] took place well after a three-month-long period of sustained growth of the COVID-19 infectious population in the overall region, which has resulted in a second surge deadlier than the first one [19]. With swifter progressive reimplementation of measures, such a high death toll has not been reached in the northeast USA [15]. Therefore, our results highlight the need to implement policies and surveillance approaches that also include data at a supra-location level when there is high mobility among locations.
Data accessibility. This work does not include any original data. The data used in the analysis were downloaded on