Better strategies for containing COVID-19 pandemic: a study of 25 countries via a vSIADR model

We study epidemiological characteristics of 25 early COVID-19 outbreak countries, which emphasizes on the reproduction of infection and effects of government control measures. The study is based on a vSIADR model which allows asymptomatic and pre-diagnosis infections to reflect COVID-19 clinical realities, and a linear mixed-effect model to analyse the association between each country’s control measures and the effective reproduction number Rt. It finds significant effects of higher stringency measures in lowering the reproduction, and a significant shortening effect on the time to the epidemic turning point by applying stronger early counter measures. Epidemic projections under scenarios of the counter measures (China and Korea, the USA and the UK) show substantial reduction in the epidemic size and death by taking earlier and forceful actions. The governments’ response before and after the start of the second wave epidemics were alarmingly weak, which made the average duration of the second wave more than doubled that of the first wave. We identify countries which urgently need to restore to at least the maximum stringency measures implemented so far in the pandemic in order to avoid even higher infection size and death.


SXC, 0000-0002-2338-0873
We study epidemiological characteristics of 25 early COVID-19 outbreak countries, which emphasizes on the reproduction of infection and effects of government control measures. The study is based on a vSIADR model which allows asymptomatic and pre-diagnosis infections to reflect COVID-19 clinical realities, and a linear mixed-effect model to analyse the association between each country's control measures and the effective reproduction number R t . It finds significant effects of higher stringency measures in lowering the reproduction, and a significant shortening effect on the time to the epidemic turning point by applying stronger early counter measures. Epidemic projections under scenarios of the counter measures (China and Korea, the USA and the UK) show substantial reduction in the epidemic size and death by taking earlier and forceful actions. The governments' response before and after the start of the second wave epidemics were alarmingly weak, which made the average duration of the second wave more than doubled that of the first wave. We identify countries which urgently need to restore to at least the maximum stringency measures implemented so far in the pandemic in order to avoid even higher infection size and death.  Table 1. Weekly averages of the estimated reproduction numbers R t (W1-W4) of 25 countries over the four weeks from their respective start date of community transmission (DCT) under the pre-symptomatic rate θ = 0.8. Countries are ranked based on the average R t over the four weeks (4W-Ave). China refers to the provinces excluding Hubei province. The 95% confidence intervals for R 0 are available in  the infection cases and the deaths would have been increased by 894 and 527% under the 5-day delayed setting.
By linking the estimated effective reproductive number R t with respect to three categories of OxCGRT scores on the stringency, economic and healthcare-related counter measures to COVID-19, as well as an NO 2 -based index that reflects the level of road traffics and hence home isolation, we establish a linear mixed-effect regression model [18] for two groups of countries, (i) European and American countries and (ii) Asian and Oceania countries, from the start of epidemics to 31 December 2020, motivated by the work of [19]. The mixed-effect model allows common fixed effect parameters shared by all countries in a group while each country has individual random effect corresponding to different categories of counter measures. The statistical inference on the linear mixed-effect model (table 3) shows the stringency measures were the most significant in reducing the reproductive power of COVID-19 in all countries, and the economic and the NO 2 indices were significant in the European-American group.
Twenty four out of the 25 countries have experienced the second wave of the epidemics since 29 April 2020. Our analysis (figure 4) shows much lower level of counter measures in the three weeks leading to the start of the second wave, with the important stringency score decreased to 69% (SE: 0.03) of the maximum level sustained in countering the first wave epidemics. And quite alarmingly, after the start of the second wave, there was no immediate increase in the stringency measures. It appears that the governments' only response to the second wave was in the healthcare area as reflected in the gradual increase in the health-related score in figure 4. The responses of most governments to the second and third wave were much slower and insufficient as compared with the first wave.
To provide insights on the policy effects on the winter epidemic situation without vaccines, we conduct projections under different strength of control measures with the established linear mixed-effect model. It shows that any relaxation of the stringency measures from the current level would lead to significant increases of infection cases and deaths by the end of February 2021. The projections reveal that, compared to the scenario of maintaining the policy scores and NO 2 levels as those on 31 December 2020, returning to the most strict policy measures would make confirmed cases decreased by 25.9% in average, and deaths reduced by 17.4% by 28 February 2021. If each country lowers its policy scores to half of its maximum policy levels and increases their NO 2 to be twice of their minimums seen in 2020, the confirmed cases would increase by 833.2% in average, and six countries' confirmed cases would increase more than 10 times. In addition, projected deaths would increase by 477.2% in average. This means that the levels of the counter measures should not be relaxed from the current level, and it is highly recommended to return at least to their maximum levels for some countries as specified in §6.

Data and methods (a) Data
We consider 25 countries in our study as listed in table 1, which had experienced COVID-19 with at least four weeks of established community infections on 20 April 2020. The daily records of infected, dead and recovered patients are obtained from Johns Hopkins University Center for Systems Science and Engineering [2] and WHO [3] for the 25 countries, supplemented by statistics from these nations' health ministry and Dingxiangyuan Pneumonia website [4] for China's data. The population sizes are from the United Nation [20]. In our study, data in China only contains those from the mainland provinces without Hubei province where Wuhan is the capital city due to incomplete observations at the start of the epidemic. The daily new cases, deaths and recoveries are smoothed to remove measurement errors and reporting delay. The smoothing procedure is outlined in the electronic supplementary material.
There are 11 countries which under-reported their daily recovery counts (Belgium, Canada, France, Italy, Netherlands, Norway, Portugal, Spain, Sweden, the UK and the USA), as shown by the much under-estimated recovery rates in electronic supplementary material, figure S2. We use 14 days as the average recover time from diagnosis, as suggested by WHO and supported clinically by [13], to impute the number of recovered cases for the 11 countries. For the other 14 countries, we use their reported data in the analysis.
Data on the 25 countries' COVID-19 containment measures are collected in Oxford Coronavirus Government Response Tracker (OxCGRT) project [15], which collects publicly available information on 18 variables and aggregates them into sub-categories. In order to study the impacts of different policy actions, we use the stringency index, the economical support index and recalculate a health index without cross-combining them to form a composition score. Stringency index measures the strictness of government's containment policy, such as restriction on gathering, suspension of schools and lock-down. The economic index measures a government's economic support (cash support and debt relief), while the health index gauges on the government's health-related actions including testing and contact tracing. We use the diagnosis testing scores from OxCGRT to demonstrate countries' responses at the beginning of the pandemic in figure 3. We also construct an index based on daily NO 2 concentrations using data provided by aqicn.org, which reflect the level of road traffics and hence the extent of the home isolation. Specifically, daily city-level NO 2 concentrations is the median of observations from multiple monitoring stations in the city. The NO 2 index is the ratio of the smoothed 2020 daily levels over the corresponding levels in 2019. NO 2 index is not available for Singapore and Malaysia due to a lack of access to the data. Electronic supplementary material, figure S3 displays these scores for each country from 1 January 2020 to 16 January 2021.

(b) Start dates and study period
The start date for established community transmission of a country is determined by considering the time when local infection emerged, the start date provided by the WHO [3] and the estimated infection rate under the proposed vSIADR model. It is set as the first local maximum of the estimated infection rate after the WHO date.
The study period for policy evaluations is from the start date of each country up to April 20 for the first wave of the pandemic, which is then extended to 31 December 2020 for analyses on later waves of COVID-19 reproduction and control measures for the 24 countries without China. The latter is due to China's zero-tolerance policy which made any locally transmitted cases sporadic since April 2020 and the data uninteresting from a modelling point of view. Projections under  (c) Varying coefficient susceptible-infected-asymptomatic-diagnosed-removed model Let S(t), I a (t), I p (t), D(t), R a (t), R r (t) and R d (t) be the counts of the susceptible, infected but asymptomatic, infected and pre-symptomatic, diagnosed, recovered from asymptomatic, recovered from diagnosed and dead people in a country at day t, respectively. Let R(t) be the sum of the recovered R r (t) and death R d (t), which is the number of removed from D(t) at time t. Let N(t) = D(t) + R(t) and N(t) = N(t + 1) − N(t) be the accumulative and daily increment of confirmed cases at time t.
We propose a varying coefficient susceptible-infected-asymptomatic-diagnosed-removed (vSIADR) model. It extends the conventional SIR [11] model in four aspects: (i) separating the classical I state into the asymptomatic state I a and the pre-symptomatic I p state before being diagnosed, where only the diagnosed cases are observable; (ii) allowing infections in both the I states (I a and I p ) and the D state, where I a and I p are not observable; (iii) having asymptomatic cases as a new pathway from the I state in additional to the one for the pre-symptomatic cases; and (iv) allowing time-varying coefficients including the infection and removal rates. The first three aspects of the vSIADR model better capture the COVID-19 epidemics as most infections are made before being clinically diagnosed, and there are substantial asymptomatic cases which are never diagnosed and yet contagious [19,21,22]. Indeed, both SIR and SEIR [23] models assume the infected I state (after being diagnosed) is the only infectious state among the compartments. However, in reality, the infected after being diagnosed will be largely quarantined at home or hospitals and would have reduced contact rates and be less infectious.
The vSIADR model has the following ordinary differential equations (ODEs) which specify the conditional means of the t + 1 daily increments given the counts (S(t), I a (t), where β I a t , β I p t , β D t are time-varying infection rates for the never diagnosed asymptomatic infections in I a , not yet diagnosed infection in I p and the diagnosed infections in D, θ ∈ (0, 1) is the proportion of pre-symptomatic cases who will develop symptom and be diagnosed in the daily increase of infected cases, γ r,t and γ d,t are the time-varying recovery and death rates, respectively, γ t = γ d,t + γ r,t is the overall symptomatic removal rate and M is the population size. Under Model (2.1), 1 − θ proportion of daily newly infected cases would become asymptomatic infections. The proportion θ and the diagnostic rate α are constants in (2.1), while extension to make them time varying can be made as there might be more infected cases not diagnosed at the beginning of the epidemics [19,22]. The selection and estimation of those parameters are discussed in the next subsection. Each equation in (2.1) may be divided by the population size M on both sides. In this sense, (S(t), I a (t), I p (t), D(t), R a (t), R r (t), R d (t)) become the proportions of the compartments such that their summation equals to 1.
A key difference between the SEIR and vSIADR models is that the latter does not have the non-infectious exposed E-state as the I state is already unobservable in vSIADR. We choose to hide the E-state in our model to avoid having two latent states in the pre-symptomatic pathway as otherwise it would create much disparity between model and data, and difficulties with estimation.
The time-varying coefficients in the model reflect the changing aspects of the COVID-19 pandemic in terms of the transmission and removal processes as well as the policy interventions implemented by various governments to contain the spread of the virus. The varying infection rates reflect the varying degrees of the containment measures taken by each country and individual behaviour changes (social distancing and face mask wearing) as well as the varying infectiousness of the virus over time. The varying removal rate accounts for varying medical conditions and improvement of treatments. For the infection rates, although the policies of a country can be fixed over a period of time, individual responses and behaviours to the policy measures may differ which can lead to varying dynamics in the spread of the disease. In any case, constant coefficients are special cases of the varying coefficient setting, and would be reflected in the empirical estimates by the local kernel regression outlined in §2d. Let be the daily increments of the state variables, respectively. The stochastic version of the vSIADR model is constructed so that the daily increments follow the conditional Poisson processes with the conditional means specified by the daily discretized version of (2.1): The Poisson model can be weakened to be semi-parametric as the estimation can be made based on the conditional mean specification in (2.2).
The proposed model assumes that the depletion of the susceptible S(t) is made by the infected in the asymptomatic compartment I a , the pre-symptomatic but yet to be diagnosed cases in the I p compartment and the confirmed (diagnosed) cases in the D compartment. The transmission rate of asymptomatic cases is lower than that of the pre-symptomatic cases [24,25] as supported by a range of clinical data. Focused on 455 contacts exposed to the asymptomatic COVID-19 virus carriers, [24] found no severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections was detected in all these contacts in the nucleic acid tests, and concluded that the infectivity of asymptomatic SARS-CoV-2 carriers might be weak. In a related work, [25] analysed a set of COVID-19 contact tracing surveillance data which included 161 symptomatic cases and 30 asymptomatic cases in Ningbo, China from 20 January to 6 March 2020. Using a SEIR modelling framework, they concluded that the relative transmissibility of asymptomatic case was significantly lower than that of the symptomatic cases. Besides, the contact rate of the cases after diagnosis is much reduced due to self-quarantine or hospitalization. Thus, the pre-symptomatic cases before being diagnosed are the most contagious group. We set β I a t = β D t = β I p t /r for a constant r > 1 to reflect the reality of COVID-19.
The effective reproduction number under the vSIADR model, whose derivation is given in the electronic supplementary material, is We are aware of the fact that the above model framework is only an approximation to the reality and is subject to the specification errors as implicated by the empirical results in [26,27]. The varying coefficient aspect of the model reduces the potential risk of mis-specification. In the proposed estimation, we only use the mean information to formulate estimators without exploiting the mean-variance relationship under the Poisson assumption, which reduces the consequences of potential model mis-specification and increases the robustness of the proposed method.

(d) Estimation
The parameter estimation is made in a two-step procedure. We first estimateβ I p t , γ d,t and γ r,t under the specificationβ I a t =β D t =β I p t /r for a given θ ∈ (0, 1), r > 1 and the diagnosis rate α. Then, in the second stage, we estimate α and r based on a cross-validation measure under a chosen θ.
As the asymptomatic cases are latent, θ which regulates the distribution of infected to be symptomatic and asymptomatic can not be identified from the standard epidemiological statistics, and its value has to be assigned based on results from other studies. There are a diverse range of values for θ among existing studies including [19,22] which reported rather small θ values in the initial stage of the epidemics.
[28] surveyed 79 datasets in published studies up to 10 June 2020, and found most of the asymptomatic cases become symptomatic as the time progresses. They reported the frequencies of symptomatic and asymptotic COVID-19 cases, and found 20% (CI: 17-25%) of COVID-19 infections remained asymptomatic among the 79 studies. The megaanalysis of [28] lends support for θ = 0.8, which is used as the baseline value in our analysis. As raised by a referee, there were under-reporting of the pre-symptomatic cases in some countries at the start of the epidemics, which had aggravated the situation. To consider the uncertainty with the underlying asymptomatic rate, two additional values of θ = 0.6 and 0.4 are also employed, and the results under these two values of θ are reported in electronic supplementary material, figure S4, which shows that the estimated R t is robust with respect to θ.
Another challenge in the estimation for both the conventional SEIR model and the proposed vSIADR model is due to the latent states. The Bayesian method has been a commonly used estimation approach for the SEIR model [29][30][31][32]. Here, we present a frequentist approach by first imputing the latent I a (t) and I p (t) in building the estimating equations. Under the assumption that the conditional mean of the daily increments follows the specification (2.1), the following approximation can be entertained Thus, for a given θ and a given pair of α and r, we can impute I p (t) byÎ p (t) = N(t)/α, and impute I a (t + 1) sequentially by based on the second equation of (2.1), whereγ r,t is the estimate of γ r,t from (2.6) below. Using those imputed values for I a (t) and I p (t), we estimateβ I p t via an estimating function we regress Y t on X t by the locally weighted kernel regression estimator. The kernel estimation ofβ I p (t), at the given r and α, minimizes the objective function is a boundary kernel modified from a usual symmetric kernel [33] and h is the temporal smoothing bandwidth. The use of the boundary kernel is to account for the Notice that the proposed estimator in (2.5) is based on the estimating equations of the means of the daily increments in the compartments. A conditional maximum-likelihood estimator (MLE) could be constructed using the distribution function of the difference between two Poisson random variables. However, this likelihood function depends on the infection rate parameters through the modified Bessel function of the first kind. The MLEs for the infection rate could be computationally unstable, specially for the local smoothing estimation where only observations in a local neighbourhood of t are used for estimatingβ I p t . Although the proposed method is not a locally weighted MLE, it is robust to the distributional assumption of daily increments, and is computationally efficient and stable.
From the last two equation in in the conditional mean. Thus, we estimate γ d,t and γ r,t by regressing R d (t) and R r (t) on D(t), respectively, without intercept using the boundary kernel B(t) similar to (2.5): It should be noted that the estimation of γ d,t and γ r,t are of one-step free of (α, r).
After obtaining estimation for the infection rateβ I p t at a given (α, r), we conduct the second stage estimation for α and r based on a leave-one-out cross-validation criteria. Specifically, for each fixed (α, r), we estimateβ I p t by the whole data except day t's. Denote the leave-one-out version of (2.5) asβ Evaluate the fitting performance by comparing the relative difference betweenŶ −t (α, r) and Y t by the following relative fitting criterion: for the estimation of α and r via the cross-validation approach. They reveal general satisfactory performance of the estimation procedure. There was a noticeable bias and large variance in the first few days of the simulation due to small numbers of infected cases at the beginning of the epidemic, which disappeared quickly as more days were added to the simulation.
To gain further information on the fitting performance of the proposed vSIADR model, we display in electronic supplementary material, figure S8 the fitted values figure S8 shows that the fitted values were very close to the observed trajectories N(t) − N(t − 1) of the countries.
We also consider the out-of-sample validation for the vSIADR model by forecasting the sizes of new infections and deaths of a country. We use the prediction errors for the predicted new case numbers in the 7 days from 13 to 20 April, and the 14 days from 7 to 20 April, respectively, as the performance measures. The prediction of the epidemics is made by substituting the predicted infection, diagnosed, recovery and death rates into the vSIADR model (2.1) with initial values If the estimated infection rates are decreasing in the last 7 days before T, we fit the reciprocal model β Electronic supplementary material, table S3 and figure S9 report the 7-day and the 14-day relative prediction errors for the 24 countries, where the errors were the ratio of the difference between the predict and the actual observed confirmed cases over the actual confirmed cases. It is seen that the relative errors for the number of new infections were generally small, averaged at 11.0 and 18.6% for the 7-day and 14-day predictions, respectively. Those results provided support for the vSIADR model and its estimation, and some assurance for its applications in the COVID-19 modelling and analyses.

(f) Modelling R t by policy scores
To gain knowledge on the effects of the COVID-19 counter measures on the reproduction number R t , we consider a linear mixed-effect model [18] which tries to explain the R t movement in terms of the policies imposed by the governments. This is inspired by [19], which models the effective reproduction number R m,t for the mth country at time t by scaling a baseline prior R m,0 with a linear combination of intervention indicators k α k I k,m (t) under the assumption that the effects of interventions are the same across all countries, where I k,m (t) equals 1 only if intervention policy k is imposed in country m at time t. In order to accommodate both common features shared by a group of countries and country-specific effects in the epidemiological processes, we consider using the linear mixed-effect model (LMM).
LetR m,t be the estimated effective reproduction number for the mth country at time t, and t = 0 represent the DCT, the start date of community transmission. Let s 1,m (t), s 2,m (t) and s 3,m (t) be the OxCGRT policy scores at time t corresponding to the country's stringency measure, economics support and healthcare, respectively, and s 4,m (t) be the NO 2 score. To reduce noise in the policy scores, we conducted 3-day average of s k,m (t) andR m,t , and denote the corresponding average values ass k,m (t) andR m,t . Due to the delayed policy effects as revealed in electronic supplementary material, figure S1, we consider the relationship betweenR m,t ands k,m (t − L), where the latter is the L-lagged values of the scores.
is the change of the policy scores between (t − L)th day and the Lth day before the DCT, β k is a fixed coefficient representing the common effect of the kth score on the change ofR m,t , and {b k,m } are the random effects of the kth score for each country, which are modelled as b k,m ∼ N(0, σ 2 k ) for k = 1, . . . , 4. The errors m (t) are assumed to be independently N(0, σ 2 ) distributed for each t. The lag value L = 12, representing the delay effect of the intervention policies, was chosen by minimizing the residual sum of square of the fitted model.
The advantages of the linear mixed-effect model (2.9) are in its incorporation of the groupcommon and individual-specific effects within a model. The fixed coefficient β k represents the average effect of the kth policy on controlling the spread of COVID-19 across all countries. The random slopes {b k,m } reveal the country-wise specific effect for the kth score. If the variance σ 2 k is not significantly non-zero, we may conclude that the kth policy had the same effect β k for all countries in the group. Otherwise if the variance σ 2 k is significantly non-zero, heterogeneous country-wise effects of the kth policy exist. For instance, a country with a negative (positive) b k,m indicates a country-specific effect of the kth policy on deviating above (below) the common effect β k on the effective reproduction of the epidemic. Hence, with the proposed LMM, we can estimate not only the commonly shared policy effect, but also conduct inference on the homogeneity of policy effects, and compare the heterogeneous effects among different countries.
Since the European and American countries have generally similar policy strategies and most of their epidemics are currently more severe than the countries in Asian and Australia, we divide the 24 countries into two groups: European and American countries as one group and Asian and Oceania countries as another. We fit the LMM (2.9) for the two groups separately by the lmer function in the lme4 R package, and predict the mixed effect {b k,m } by the best linear unbiased prediction (BLUP). Results are presented and discussed in §5.

Estimated effective reproduction numbers
To evaluate community transmission and policy effects of COVID-19 across different countries, we first estimate the effective reproduction number R t from the start date of each country's local transmission under the vSIADR model using the estimation approach outlined in §2. Table 1 reports the estimated reproduction numbers R t on the start date, and their weekly averages over the next four weeks. Figure 1 displays the R t curves up to 20 April focusing on the first wave, while figure 2 shows the R t up to 31 December for the first and second, and even the third waves of the epidemics in these countries. The average reproductive numbers on the start, which may be viewed as the basic reproduction number R 0 , was 5.25 (SE: 0.31) among the 25 countries with the lower and upper 25% quartiles being 4.24 and 6.39, respectively. The 14 European countries had higher R 0 , averaged at 5.65 (SE: 0.39). See table S4 in electronic supplementary material for the 95% confidence intervals of R 0 . As R 0 is known to be subject to high volatility [34], one may look at the average R t in the first week after the start of local transmission, which was 3.97 (SE: 0.22). Our estimate of R 0 for non-Hubei China (4.78) is closer to that in [35] (R 0 = 5.7 for Wuhan). Our estimated average R 0 over the 25 countries was higher than the estimates 2-3 in [36,37] on Wuhan and 3.15 (CI: 3.04-3.26) in [8]. Figure 1 shows that China and Korea's R t curves fell sharply below the other 23 countries' over most of the four weeks after the start dates. Their rapid decline in the reproducing power first four weeks, their cumulative percentages of declines were the most as shown in electronic supplementary material, table S5. The drastic decline in the reproduction of China was consistent with the studies [38,39]. Korea and China's rapid declines in their R t were due to their quick responses during the first four weeks of the epidemics, which included requiring face mask in public areas and reducing contact rates by sealing off outbreak areas and communities, promoting home isolation and enforcing quarantine for close contacts of diagnosed [8,40], which were effective measures in the early stage of the epidemic [5,12]. These were reflected in figure 3a which shows that China had the quickest increase in the stringency score within five days before and after the DCT, while Korea had the highest testing rates in the first four weeks. Indeed, Korea conducted active testing in its epidemic centre with more than half-million tests being carried out in the first month since the epidemic started [41,42].
We identify the time points when the estimated R t started to go above the critical threshold level 1 and stayed so over a period of time as the period of the first, second or the third wave of a country. Figure 2 shows a strong correspondence between R t > 1 and the substantial increase of newly reported cases N(t). Figure 2 also displays the estimated R t curves with marked start and ending dates of the first, second and the third (if any) waves for each country. Table 2 provides the start dates and the lengths of each wave for the 25 countries. Among the 24 countries having had the second wave, Canada, Malaysia and Thailand were still in the middle of it without reaching the turning point (defined as the first time when R t < 1). By 31 December, 14 countries have started the third wave and three of them have reached the turning points. The Kaplan-Meier estimate [43] for the average length of the second wave was 113 (SE 11.2), which was almost three times of the first wave (43, SE 5.9). The average gap time between the first and second waves was 92 days (SE: 8.4), and that between the second and third waves was 38 days (SE: 4.5). As shown in figure 2, although the average R t s in the second and third wave were not as high as those in the first wave, the daily new cases increased much more as the size of the infected population was much larger in the later waves. Thus, one should not think the relative low R t in the second and third wave would suggest the epidemics in the second and third waves was less severe. Table 2 shows a substantial reduction in the death rate in the second wave in most countries relative to the first wave. Despite the overall average of the daily death ratios in the second wave  Table 2. Start date, length (in days) and the number of death between the start date and the turning point of the first, second and third waves in the 25 countries from their DCT to 31 December 2020. The average length of the second wave and their standard error estimated by the Kaplan-Meier method [43] are provided in parentheses. The columns headed by 'Ratio' report the ratios of the average daily death in the second (third) wave to that in the first wave. Empty cells mean the third wave has not started.   to that in the first wave was close to 1 (98.7%, SE 25%), 18 of the 24 countries had the daily death ratio less than 0.85. The average daily death ratios was 68.8% (SE 15%) if we exclude Australia and Germany whose daily death ratios in the second wave were more than 4. Figure 2     that the daily increase of death in the second wave was smaller than that in the first wave for most countries, especially for the European countries. But the death number started to climb up at the end of second wave for some countries due to the increase in the cases, while the numbers of death stayed at high level after the turning points due to the delay effect. We see a much higher daily death rate in 11 of the 14 countries having had the third wave. The average daily death ratios in the third wave relative to the first two waves in the 14 countries were very high, reaching 3.97 and 5.40, respectively, indicating a very grim situation.
Perhaps it was because of the relative low case numbers (hence low deaths) in majority of countries and a stringency fatigue, the second wave failed to arouse sufficient counter measures of the governments in most of the countries, which led to the prolonged duration of the second wave (average 113 days, SE 11.2) with three countries still not emerging from it on 31 December. The lack of governments' response to the second wave was clearly shown in figure 4 which presents the average estimated R t and the four average scores related to control measures in the 24 countries which had experienced the first two waves of epidemics. The figure demonstrates concordance in the changes of R t and the policy scores, which confirms the sensitivity of the epidemics to the control measures. In particular, the decrease in R t after the start of the first wave was highly correlated with the large increase in the stringency and economy scores. However, the very important stringency index was decreasing till the start of the second wave, indicating relaxed stringency measures before the start of the second wave. Only the health index increased in the first 45 days of the second wave, while the average stringency measures were flat at less than 55% of the maximum stringency strength seen in the first wave. This suggests the governments were reluctant to resort to the maximum stringency in the second wave. For the third wave, we observed that in the first 45 days of the third wave, the average stringency measures were more than 65% of the maximum stringency strength, but still largely flat, and the economy index increased from 70.31 to 82.14% of the maximum strength 25 days (in average) after the start of the third wave.

Scenario analysis and evaluation
Part of the COVID-19 control measures were designed to reduce the infection rates by limiting the contact probability. Thus, policy scenarios are made by altering the infection rates and hence the number R t while keeping each country's other epidemic parameters: the diagnose rate α, the recovery and death rates γ r,t and the γ d,t the same within the proposed vSIADR model while fixing the pre-symptomatic rate θ = 0.8. Epidemiological scenarios of Korea and China are generated for other countries in the early stage of the pandemic by applying the daily change percentages of the

(a) Epidemic projections under Korea and China's scenarios
Given the effectiveness of Korea and China's approaches in containing COVID-19 in the early stage of the epidemic, we generate scenarios for other countries that mimic Korea and China's daily reduction percentages in the infection rates from the 8th day since the start of local transmission, while keeping their diagnosis, recovery and death rates intact. The generated numbers after Day 8 create scenarios for other countries (shown in electronic supplementary material, figure S10) where the control measures of Korea and China were implemented.
Comparing to the actual cases observed up to 20 April, figure 5 shows that 1.83 (1.88) millions reductions in the confirmed cases and 139 321 (142 645) reduction in deaths for the 23 countries under the Korea's (China's) scenario, mounting to 89% (91%) and 86% (88%) reductions of the confirmed cases and deaths on average, respectively (more details in electronic supplementary material, table S6). The reductions in the cases and deaths would have been phenomenal for the USA, Japan, the UK and France, attaining more than 92% reductions in the confirmed cases and 86% reduction in deaths under Korea's scenario, if control measures had been implemented earlier that would lead to the reductions in the infection rates from Day 8 of community transmission. And those under China's scenario would be a few percentage points more.

(b) Evaluation on the USA and the UK
We consider two policy intervention experiments specific to the USA and the UK, which represent early and delayed implementation of counter-COVID-19 measures. 13 March and 20 March were the dates of firm policy measures by the USA and the UK governments, respectively, when the USA declared the national emergency and the UK started to close schools and public facilities. It appears from figure 6 that sustained declining trend of R t was established after 13 March and 20 March for the USA and the UK, respectively. The experimental design of 5-day earlier intervention would mean the R t curves start to decline from 8 March for the USA and 15 March for the UK at the actual daily declines rates from 13 and 20 March, respectively. The 5-day delayed experiments is to mimic later actions that would delay the decline in R t from 13 and 20 March for 5 days to 18 and 25 March, respectively. Figure 6a,b display the actual R t curves with the ones under the two designs.
The numbers of death cases under the two experiments are reported in figure 6c,d; see electronic supplementary material, table S7 for the specific numbers of death and total cases, including asymptomatic cases along with the observed statistics up to 20 April. Our result shows that by acting earlier both countries would have seen substantial reductions in the total number of infected cases and deaths: the cases and deaths in the USA would have been reduced by about 80 and 78% on 20 April, respectively; and the UK by 28 and 28%, respectively. By contrast, under the 5-day delayed postulations, the cases and deaths in the USA would have increased by 384 and 315%, and the UK by 42 and 37% on 20 April, respectively.
The above results indicate that the USA was more responsive to the intervention than the UK, as the earlier (delayed) intervention would reduce (increase) more infection cases and deaths than the UK. This can be explained in several aspects. First, the absolute decline of US's R t was 2.46, more than the UK's 1.65 over the two weeks from 13 and 20 March, although the relative decline were comparable at 54 and 55%, respectively. Second, the US implemented the intervention relatively earlier as 13 March was the 14th day after the US's start date while 20 March was the 24th days for the UK. Thus, the USA would have more time to amplify the effect of the intervention.
A less obvious reason for the differential sensitivity lies in the estimated diagnosed rates: 0.17 for the USA and 0.1 for the UK, implying the UK having longer time for diagnosing the infected deaths U S S p a i n I t a l y F r a n c e G e r m a n y U K T u r k e y I r a n B r a z i l B e l g i u m C a n a d a H o l l a n d S w i t z e r l a n d P o r t u g a l S w e d e n A u s t r i a J a p a n K o r e a S i n g a p o r e D e n m a r k N o r w a y A u s t r a l i a M a l a y s i a T h a i l a n d U S S p a i n I t a l y F r a n c e G e r m a n y U K T u r k e y I r a n B r a z i l B e l g i u m C a n a d a H o l l a n d S w i t z e r l a n d P o r t u g a l S w e d e n A u s t r i a J a p a n than the USA. The larger diagnosis rate for the USA means quicker turn-over time from exposure to diagnosis, which would bring early peak time for the active infected cases I(t) and reduce the size of infections and hence the death number. Our result shows the numbers of cases for the USA with the UK's diagnosis rate (0.10) would have amplified by more than 510% and the deaths by 212% on 20 April under the 5-day delay design. For the UK with the US's higher diagnosis rate (0.17), there would have been a further 16% reduction in the number of total cases on 20 April under the 5-day early design, where the impacts on the death was slight. A high diagnosis rate is part of the Korea's counter COVID-19 strategy.

Policy effects on R t
The countries have implemented a range of policies as counter measures to control the COVID-19 pandemic. Electronic supplementary material, figure S1 reports significant negative correlations between the OxCGRT scores with the 0-3 weeks delayed effective numbers R t for all 25 countries from each country's DCT to 31 December 2020. It shows that the correlation between the stringency score and the one or two (three) weeks lagged R t being the highest in 18 (6)  Here the turning point is defined as the first day that R t < 1 since the start of community infection and has stayed so for 7 consecutive days. The average time to the turning point for the 24 countries was 39.7 days (SE: 3.97). It took Brazil 145 days to reach the turning point, more than double the second longest of 58 days by the US as shown in figure 3b, which led us to treat Brazil as an outlier. Figure 3b also presents the scatter plot of the average OxCGRT stringency index within the first two weeks after DCT and the time to the turning points. The correlation between the average stringency score in the first two weeks since the DCT and the time to the turning point was −0.57 (p-value: 0.002) (excluding Brazil), indicating the overall effects of mounting the counter measures within the first two weeks of the epidemic for turning around the epidemics sooner.
In additional to the correlation analysis, we fit the LMM (2.9) for the European-American group and the Asian-Oceania group separately. China is not included in the Asian-Oceania group due to its rather short local transmission period and most of the infections since April were imported cases. Table 3   k of the random effects, together with their significance of testing β k = 0 and σ 2 k = 0, and the BLUP for the random effects {b k,m } under the LMM for those two groups of countries. Electronic supplementary material, table S8 reports the coefficients of determination R 2 for the countries as goodness-of-fit measures for their fitted curves as a function of the policy scores. The results show that the fitting performance of the proposed LMM was generally good with average R 2 being 0.554 for European and American countries and 0.372 for Asian and Oceania countries, respectively. Lower levels of R 2 were observed for Sweden (0.151), Denmark (0.190) and Iran (0.277), largely due to infrequent changes in the policy scores.
For the European-American group, the stringency, economic and mobility (NO 2 ) scores were all significant at the 5% significance level, where stringency and economic were negative and the mobility index was positive, implying that a higher stringency level, cash assistance and debt reliefs offered by the countries were effective in during the epidemics. It also shows that more mobility (increased NO 2 ) encouraged the increase of infection. The variance of the random effects for the four scores were all significantly not zero, implying there were individual country-specific effects for the four factors.
The NO 2 data were not available for Singapore and Malaysia. We fitted the LMM with the NO 2 index for the remaining countries in the Asian-Oceania group, which showed that the NO 2 index was not significant. As a result, we did not consider NO 2 in the model for the group. Table 3(b) shows that the fixed effect β k and random effect variance σ 2 k were significantly non-zero only for the stringency index among the three OxCGRT indices. The magnitude of the estimated β 1 of the Asian-Oceania group was about twice of that of the European-American group. As β 1 indicates the average reduction of log(R m,t ) over all countries in the group for one unit increase in the stringency score, this shows that stringency policy in general was twice more effective for Asian and Oceania countries than that for the European and American countries. For the economic relief effects, since the absolute estimated β 2 of the European-American group was larger than that of the Asian-Oceania group, while β 2 was not significant for the Asian-Oceania group, the economic relief measures were more effective in the European and American countries.
For country-specific effects of the stringency measures within the European-American group, it is noted that the five most negative estimated random effects b 1,m were Germany, Sweden, France, Spain and Turkey (in increasing order). This implies that these countries had the most effective stringency measures among the European-American group. While, the least effective countries are Brazil, Canada, the USA, Italy, the UK with highest estimated b 1,m in positive range, which also had the most and least effective economic relief measures, respectively, based on the ranking of the estimated b 2,m .

Projection based on policy scenarios
Based on the vSIADR model and the LMM, we conduct projections from 31 December 2020 to 28 February 2021 under different settings of control measures to show how the epidemics would evolve under three different strategies of policies. The first scenario (Current Scenario) is to maintain the three policy scores as well as the NO 2 levels of each country as those on 31 December 2020 for the target months of January and February 2021. The scenario represents a situation that the countries would maintain policy measures as they were on 31 December 2020. The second scenario (Maximum Scenario) assumes each country adopts the strongest policy intervention it has taken in the pandemic, meaning the policy scores at their respective maximum values and NO 2 levels at the minimum throughout January and February 2021. Projections under this scenario show how the epidemics would evolve if the control measures are strengthened to their maximum levels again. The third scenario (50% Scenario) has the policy scores in January and February 2021 behalf of the historic maximums and twice of the minimum NO 2 for each country. To gain information on the practical relevance of such projections, we conducted onemonth projections under the three scenarios using data up to 15 December 2020 to predict the state variables on 15 January 2021 as displayed in electronic supplementary material, figures S11 and S12. It is shown that the errors in the projected deaths and cases standardized by population  Table 3. Estimates (multiplied by 10 2 ) for the common policy effects and the standard deviation (σ ) of the individual random effects together with their level of significance as reflected by the number of * under the linear mixed-effect model for (a) European and American countries and (b) Asian and Oceania countries using data from DCT to 31 December 2020, where β 1 , . . . , β 4 represent coefficient of stringency, economics, healthcare and NO 2 related index, respectively. sizes were small under the scenario closest to the actual policy measure of the country during the projection period. It is noted that there were some increases in the stringency and economic scores in the last two months of 2020 as more countries entered the more deadly third wave. The averages of the two scores for the 24 countries on 31 December were 83.7 and 90.2% of the historic maximums, respectively, up 15 and 4.7% from those on 30 October 2020. However, the average health score was down by 1.9% over the period.
For the Current Scenario, the projected R t is chosen as its estimated value on 31 December 2020 according to the LMM (2.9). For the Maximum Scenario, we use a fraction of the estimated R t on 31 December 2020 by a discount factor equal to the ratio of the fitted R t under the strongest policy scores over the actual scores on 31 December 2020; and likewise for the 50% Scenario. Given that policy measures have delayed effect on R t , a two-week transitional period is set to linearly transform the R t on 31 December 2020 to the projected value under a scenario on 13 January 2021, which is then kept fixed until 28 February 2021. We then solve the infection rates β I p t and β I a t = β D t = β I p t /r from the expression of R t in (2.3) with the pre-symptomatic rate θ = 0.8. Other epidemic parameters γ r,t , γ d,t , α and the infection ratio r of pre-symptomatic cases to asymptomatic cases were set as the estimated values based on the data up to 31 December 2020. We substitute those parameters into the conditional Poisson-vSIADR framework (2.2) to project the dynamic epidemiological state variables over the projected period. Projection intervals are constructed using the same procedure as the 95% confidence intervals of R t (in the electronic supplementary material) via the LMM.
The projected results under the scenarios are summarized in figures 7 for the ratio of cumulative confirmed cases and the number of death relative to the population total of respective country with the 95% projection intervals. The absolute numbers of the projected infections and deaths are reported in electronic supplementary material, figures S13 and S14, while the total number of infections (including asymptomatic cases) are reported in figure S15. Compared to the Current Scenario, the Maximum Scenario would have the confirmed cases decreased by 25.9% in average among the 24 countries by the end of February 2021, and eight countries decreased by more than 30%. The total infection size including asymptomatic cases would decrease by 31.4%, averaged over the 24 countries, with 11 countries decreased by more than 30%. Under the Maximum Scenario, for the three countries currently in the second wave and 10 of the 11 countries currently in the third wave, their R t would drop below 1 by the end of February 2021 implying that the second or third wave of the epidemics could be finished by then. However, Japan, who is currently in the third wave, needs to apply 134% of the its past maximum levels of the control measures in order to attain the turning point before the end of February.
Under the 50% Scenario (half of the maximum policy scores and twice of the minimum NO 2 level), the confirmed cases would increase by 833% in average, and for six countries (Japan, Malaysia, Norway, Germany, France and Italy), the confirmed cases would increase more than 10 times as compared to those under the Current Scenario. The total infections would increase more dramatically by 1066% in average among the 24 countries. One reason for the dramatic change is that with the half of the maximum policy scores, some countries whose current R t levels were relatively low on 31 December 2020 would exceed 1, leading to exponential growth of the infections. Under this scenario, all countries which were in the second or third wave would remain so, as their R t would be above 1 under the relaxed control measures.
On the number of death, compared to the Current Scenario, the average projected deaths would be reduced by 17.4% under the Maximum Scenario and increased by 477.2% under the 50% Scenario among the 24 countries. The relative less reduction in death under the Maximum Scenario was partly due to the difference between the score values on 31 December 2020 and the Maximum scores having become smaller. The projection results also reveal that for countries with low current infection cases such as Singapore and Australia, whose total infection size were under 500 at the end of 2020, it is comparatively safe to relax the policy to some extent as the projection results are not sensitive to the three scenarios. However, for countries like Japan, Sweden, the USA    and France, if their policies were reduced to 50% of the maximum levels, the confirmed cases at the end of the February 2021 would increase over 550% of those under the Current Scenario, and deaths more than tripled. These are due to the four countries had at least one of the following characteristics on 31 December 2020: (i) high R t ; (ii) excessive stock of infection, for instance the USA had over 1.9 millions infected cases; (iii) relative lower maximum level of stringency score than most countries. The maximum stringency scores of Japan and the Sweden were the first and third lowest among the 24 countries, respectively. It is expected that much higher casualties would occur once their policy scores reduce to 50% of the maximum level seen in the past. Therefore, it is unwise for these countries to relax their policy as the consequences would be too much to bear.

Discussion
With a novel stochastic disease transmission model, country's epidemic characteristics are estimated, and then used to evaluate the effectiveness of COVID-19 control strategies. The model and the estimation allow a country to obtain potential infection sizes and deaths if other country's policy-implied epidemic characteristics were implemented in the quest for a better strategy. Our study shows that both the sizes of infection and deaths of COVID-19 are particularly responsive to early and more strict containment measures as verified not only by the results from Korea and China, but also by the potential outcomes generated under the counter-factual experiments for the USA and the UK. There are several critical lessons one can deduce from the 25 countries' COVID-19 experiences to prepare for the future course of the COVID-19 pandemics or other infectious disease. The first one is to take action to reduce the contact rate as early as possible, which is shown to the effective in reducing R t and slowing down the infection. Taking early stringency measures is especially efficient in controlling the infection size as COVID-19 is very infectious. Second, stringency measures and economy aids are effective measures in controlling the spread of COVID-19. Our analyses shows stronger stringency measures lead to more reduction in the reproduction number and shortening the time to the turning point. The third lesson is to maintain a high level of diagnostic testing to detect and quarantine the infected cases early, which is proven successfully in Korea and by the USA and the UK diagnosis-rate-exchange experiment.
It is rather disappointing to see the second wave countries was not very responsive to the worsening epidemics in the second wave, probably due to the desire and need to re-start economy which was encouraged by the public's fatigue to the stringency control measures and the lower death rates in many countries in the second wave. Our projections for the epidemic situations in January and February 2021 suggest that countries would have substantial epidemiological benefits if their stringency levels return to their historic maximum levels. Stronger control measures would curb the winter epidemics and avoid more deaths. There is a great urgency for those countries with low maximum stringency levels to adopt tougher containment measures than their respective maximum level in order to turn around the worsening situations.
Data accessibility. The data are all from public data sources as stated in the Data description.