SUIHTER: a new mathematical model for COVID-19. Application to the analysis of the second epidemic outbreak in Italy

The COVID-19 epidemic is the latest in a long list of pandemics that have affected humankind in the last century. In this paper, we propose a novel mathematical epidemiological model named SUIHTER from the names of the seven compartments that it comprises: susceptible uninfected individuals (S), undetected (both asymptomatic and symptomatic) infected (U), isolated infected (I), hospitalized (H), threatened (T), extinct (E) and recovered (R). A suitable parameter calibration that is based on the combined use of the least-squares method and the Markov chain Monte Carlo method is proposed with the aim of reproducing the past history of the epidemic in Italy, which surfaced in late February and is still ongoing to date, and of validating SUIHTER in terms of its predicting capabilities. A distinctive feature of the new model is that it allows a one-to-one calibration strategy between the model compartments and the data that are made available daily by the Italian Civil Protection Department. The new model is then applied to the analysis of the Italian epidemic with emphasis on the second outbreak, which emerged in autumn 2020. In particular, we show that the epidemiological model SUIHTER can be suitably used in a predictive manner to perform scenario analysis at a national level.


Introduction
The coronavirus disease 2019 (COVID- 19) pandemic is a tremendous threat to global health. Since the first outbreak in early December 2019 in China, more than 1 834 573 deaths have been registered worldwide, while the estimated total number of confirmed cases is 84 511 153 up to 2 January 2021 [1]. The real number of people infected is unknown, but it is probably much higher. In this scenario, predicting the trend of the epidemic is of paramount importance to mitigate the pressure on health systems and to activate control strategies (e.g. quarantines, lockdowns and suspension of travel) aimed at containing the disease and delaying the spread.
As these predictions have vital consequences for the different actions taken by governments to limit and control the COVID-19 pandemic, the recent period has seen considerable growth of epidemiological mathematical models (e.g. [2][3][4][5][6][7]). However, estimates and scenarios emerging from modelling highly depend on different factors, ranging from epidemiological assumptions to, perhaps most importantly, the completeness and quality of the data on which models are calibrated. Since the beginning of the COVID-19 emergency, the quality of data on infections, deaths, tests and other factors has been spoiled by under-detection or inconsistent detection of cases, reporting delays and poor documentation. This has affected, and is still to date hampering, the intrinsic predictive capability of mathematical models.
However, the peculiar epidemiological traits of COVID-19 require models that are better able to accurately portray the mutable dynamic characteristics of the ongoing epidemic, with particular emphasis on two critical aspects: (i) the crucial role played by the undetected (both asymptomatic and symptomatic) individuals; (ii) the number of individuals who require intensive care unit (ICU) admission. This latter aspect is of paramount importance in designing realistic scenarios that incorporate the pressure of the epidemic on national health systems.
In this paper, we introduce a new mathematical model, named SUIHTER, based on the initials of the seven compartments that it comprises: susceptible uninfected individuals (S), undetected (both asymptomatic and symptomatic) infected (U), isolated infected (I), hospitalized (H), threatened (T), extinct (E), recovered (R). It is a system of coupled ordinary differential equations (ODEs) that are driven by a set of parameters that are indeed piecewise constant time-dependent functions. A first set of parameters denote the transmission rates due to contacts between susceptible and undetected, quarantined or hospitalized subjects. A second set of parameters mimics the rates at which I (isolated) and H (hospitalized) individuals develop clinically relevant or life-threatening symptoms. A further parameter indicates the probability rate of detection of previously undetected infected individuals. Another set of parameters indicates the rate of recovery for the four classes of infected subjects. Finally, the last parameters denote the mortality rates for the different compartments.
This SUIHTER model has been conceived to overcome some of the limitations that can be found in existing epidemiological models applied to the COVID-19 pandemic. On the one hand, some studies adopt simple SIR-like models [6,12,14], which have the advantage of a limited number of parameters to be calibrated but are unable to track the dynamics of different categories of infected individuals. On the other hand, other multi-compartmental models (e.g. [2,5]) have been proposed to account for the detailed knowledge of the clinical characterization for different classes of infected individuals according to the actual level of disease severity. However, it is not always possible (and, even when possible, it is not easy) to associate the multiple infected compartments with the available data. The SUIHTER model has been designed with the objective of creating the most compact model able to predict the different categories of infectious individuals that are considered relevant by the policymakers.
A key challenge in modelling the dynamics of the COVID-19 epidemic is represented by the large number of undetected cases. Indeed, the contribution to the spread of the epidemic by (often asymptomatic) undetected cases is too relevant to be neglected. Several authors [5,15,16] have attempted, using different strategies, to quantify the number of undetected infections and their effect on epidemic spread. In the present work, we propose a strategy for the initialization of those compartments that are not covered by the data (such as susceptible, undetected and recovered).
The model adopts a two-step calibration process based on a preliminary estimation of the model parameters that uses a least-squares (LS) minimization, followed by a Bayesian calibration performed through a Markov chain Monte Carlo (MCMC) algorithm.
The model has been adopted to simulate the second COVID-19 epidemic outbreak in Italy, began in autumn 2020 (and is still ongoing). In particular, we have investigated the capability of the model in forecasting the occurrence of a peak for the most relevant compartments with adequate advance notice. The results of the calibration, simulation by SUIHTER and predictions for Italy and the six largest Italian regions are also reported.
The outline of the paper is as follows: in §2, we introduce the SUIHTER mathematical model; §3 is devoted to the description of the calibration procedure; and §4 contains the numerical results along with the discussion. In §5, we draw our conclusions and we discuss some of the model's limitations.

Mathematical model
The spread of COVID-19 has made it clear that it is of paramount importance to include in epidemiological models a compartment describing the dynamics of infected individuals who are still undetected. This is, for example, the case in [5]. However, some compartments presented in [5] (undetected asymptomatic infected and undetected symptomatic infected) are virtually impossible to validate since these classes of individuals cannot be traced in public databases (see [17]). For this reason, building upon [5], we propose a new model more suited to taking full advantage of publicly available data. In particular, our model is described by the following system of ODEs:Ṡ healing h e a l i n g and N = S + U + I + H + T + E + R denotes the total population (assumed constant). The model is characterized by the following 14 parameters, some of which are chosen as timedependent piecewise polynomial functions: β U , β I , β H denote the transmission rates due to contacts between a susceptible subject and an undetected infected, a quarantined or a hospitalized subject, respectively; -ω I denotes the rate at which I-individuals develop clinically relevant symptoms, while ω H denotes the rate at which H-individuals develop life-threatening symptoms; -θ H and θ T denote the rates at which the health of H-and T-individuals improves and they return to the less critical I and H compartments, respectively; -δ denotes the probability rate of detection, relative to undetected infected individuals; -ρ U , ρ I and ρ H denote the rate of recovery for three classes (U, I and H, respectively) of infected subjects; -γ I , γ H and γ T denote the mortality rates for the individuals isolated at home, hospitalized and being cared for in ICUs, respectively.
Since data available for the recovered cases do not include those individuals who recovered before being detected, we also propose a novel indicator that will be denoted as recovered from detected and that we define as This indicator can be obtained in postprocessing from computed compartments and collects those individuals who recover after being detected.
In mathematical epidemiology, a fundamental quantity is the basic reproduction number (denoted by R 0 ), which is used to measure the transmission potential of a disease. It represents the average number of secondary infections produced by a typical case of an infection in a population where everyone is susceptible (see [9,11,18]). For our model, by using a similar argument to the one adopted in the proof of proposition 1 in [5], we find For the sake of comparison (see eqn (32) in [5]), we observe that in the present context the characteristic polynomial q(s) of the Jacobian matrix associated with the linearization of (2.1) around the equilibrium configuration (S, 0, 0, 0, 0,Ē,R) withS +Ē +R = N is From a mathematical point of view, the reproduction number R 0 plays the role of a threshold value at the outset of the epidemic. If R 0 > 1, the disease spreads in the population; if R 0 < 1, the number of infected gradually declines to zero. Note that all factors in equation (2.2) are, as expected, actually positive. Furthermore, the expression (2.2) considerably simplifies upon assuming, as done in the remainder of this paper, that β I = β H = θ H = γ H = 0. Our SUIHTER model, as with other compartmental models, corresponds to a particular case of an integral model with arbitrary distribution of infectious time, for which R 0 is well known [18].

(a) Model initialization
A critical issue is the way in which those compartments for which data are unavailable (susceptible, undetected and recovered) are initialized. In particular, when the analysis focuses on a late phase of the epidemic, as in the present investigation of the second epidemic outbreak in autumn 2020, it may be difficult to estimate those 'initial' values as a result of the simulation from day 0.
For these reasons, we have devised a strategy to estimate the number of recovered and undetected individuals based on the value of the infection fatality ratio (IFR), which is defined as the ratio between the number of deaths and the number of resolved cases (dead or recovered) at a specific time (ideally at the end of the epidemic) We assume that IFR will be roughly constant in time, at least over the first wave. By using the age-dependent estimates given in [19] ) denote the deaths and (detected) recovered cases observed in a time window of size t = 28 days around a given time t. The number of undetected individuals at a given time is estimated by assuming that the detection ratio, that is, the percentage of detected cases with respect to the total number of positive cases, is computed as . (2.5) Here, we have assumed that the variation in the number of total positive individuals in the time window [t − t, t + t] can be approximated by the variation in the resolved cases shifted by a confirmation-to-death delay d = 13 (see [20]), namely Similarly, we have assumed that the variation in the number of detected positive individuals in the time window [t − t, t + t] can be approximated by the variation in the resolved detected cases shifted by the same time delay .
By using the available data for I, H, T and E, other than the value of S deduced under the assumption of constant total population and the estimate of CFR(t) given in equation (2.4), we estimate the initial conditions for R and U as and from equations (2.3) and (2.5), respectively.

Parameter calibration
Model calibration through data fitting is essential to reproduce the past history of the epidemic and to perform short-term forecasts by inferring the epidemiological characteristics of COVID-19.
Here, we use data reported for isolated, hospitalized, threatened and extinct cases to estimate the parameters of the proposed SUIHTER model. In particular, we perform the calibration in two steps. Firstly, we find a set of parameter values using an (ordinary) LS estimator. Then, we perform a Bayesian calibration using an MCMC algorithm, starting from a prior distribution of the parameters centred about the LS estimate. Calibration of epidemiological models has already been performed in a Bayesian framework, following the pioneering paper by O'Neill & Roberts [21], for several infectious diseases [22][23][24]. In the case of the COVID-19 epidemic, Bayesian inference has been performed using simpler SIR [25,26], meta-community SEIR-like [2,4,13,27,28] and SEIAR [7] models, in the last case aiming at estimating nine parameters-including a dynamic, time-dependent contact rate β(t)-during the first outbreak of the COVID-19 epidemic. In addition to model calibration, our analysis also provides a numerical assessment of the predictive capability of the model, in forecasting with adequate advance notice the occurrence of a peak for the most relevant compartments.
System (2.1) can be recast in the following general form, which describes a system of ODEs for a state vector Y with n e components (or compartments): find [Y 1 (t), . . . , Y n e (t)] T , such that Y (t) = F(t, Y(t); p(t)) t ∈ (t I , t F ] (3.1) and where F is the right-hand side of system (2.1) and Y 0 ∈ R n e denotes the initial condition at time t I evaluated as discussed in §2a. The evolution of the system depends on n par time-dependent parameters, collected into the function p(t) : (t I , t F ] → R n p . Note that, at the discrete level, the nonlinear nature of system (2.1) is treated by considering an explicit (fourth-order Runge-Kutta) time discretization with time step t = 1 day. Let us partition the interval I = [t I , t F ] into n ph phases, corresponding to different epidemic stages due to, for example, partial restrictions (such as lockdown measures) or different containment rules introduced by the government or by the local authorities, so that in each phase the environmental conditions may be considered fixed. Thus, in each phase, the values of the n par model parameters are assumed constant (but unknown), so that we can introduce the following set of admissible parameters: where p L,k , p U,k are constant vectors. For the sake of notation, let us denote by p ∈ R n p the vectors of unknown parameters to be estimated, with n p = n par n ph , and let Y = Y(t, p) highlight the dependence of the states on the parameters. Consequently, P ad is the n p -dimensional hypercube delimited by the constraints (3.3). Additional constraints on the parameters are assumed, by imposing that some of them are constant over all phases. Let t be a positive time step, for which we consider n me measurements of n com = 5 < n e compartments at equally spaced times t j = j t, j = 1, . . . , n me over the interval The first stage of the calibration process is then performed by seeking a LS estimate of the parameters vector, given by the solution of the following minimization problem: corresponding to the compartments I, H, T, E and R D , respectively, evaluated at time instant t j , j = 1, . . . , n me , for the set of parameters p. For a balanced distribution of the error across the different compartments, whose amplitudes vary with time, the dynamical weight coefficients are defined as α k (t j ) = 1/Ŷ k (t j ). We considered the official epidemiological data supplied daily by the Dipartimento della Protezione Civile (Italian Civil Protection Department), hereafter called 'raw data' and which are freely available at https://github.com/pcm-dpc/COVID-19 [17]. The accuracy of these data is highly questioned, in particular concerning the estimate of the total number of reported positive cases (strongly dependent on the daily screening effort) [4]. This lack of accuracy is indeed one of the most critical aspects of simple SIR-like models in which the calibration of the infected compartment is often performed using data on the reported positive cases. In our model, the n com = 5 time series selected for the calibration (isolated, hospitalized, threatened, extincts and recovered from detected) are in fact the data that have been supplied daily by the Italian authorities since the beginning of the pandemic. One of the key features of the proposed SUIHTER model is indeed the one-to-one correspondence of the compartments with the categories for which reliable data, such as those provided on a daily basis by the Italian Civil Protection Department, are available [17].
When n ph phases are considered, equation (3.5) leads to the optimization of n p = 14n ph parameters in total. Namely, for each phase of the epidemic, we have the 14 parameters given by Unfortunately, so many parameters make the calibration process problematic. In what follows, we calibrate our model under the following simplifying assumptions: β I and β H are set to zero, by assuming that the infection only occurs through a contact between a susceptible individual and an undetected infected individual (see, for instance, [29]); -θ H is set to zero as we assume that a hospitalized individual can return home only once that individual has recovered, since this parameter may be difficult to estimate in the absence of specific data on the H to I flux; -γ H are set to zero, by assuming that, when a hospitalized individual is in a life-threatening condition, that individual is moved to ICU; With these restrictions, the total number of parameters to be calibrated is reduced to 4n ph + 6. The first stage of the calibration process has been performed by solving the minimization problem (3.5) numerically. We have used a parallel version of the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm with box constraints (L-BFGS-B); see [30] for details.
The second stage of the calibration process aims to quantify uncertainties and has been carried out by employing a Bayesian framework, since the latter provides probability densities of the input parameters that can be propagated through the model.
Bayesian inference allows us to construct a probability distribution function (PDF) for the unknown parameters, merging prior information and available data, the latter entering in the expression of the likelihood function. At this stage, in order to account for the uncertainty on the initial conditions, we extend the set of parameters to be estimated top = (p, q), where q = (U(t I ), R(t I )) collects the initial conditions for the undetected and recovered compartments whose values are not available from the data and can only be estimated. The posterior PDF can then be obtained through the Bayes theorem on conditional probabilities. For the case at hand, we quantify the likelihood of the parameter vectorp corresponding to the model outcome Y k (t j ,p), k = {I, H, T, E, R D } in correlation with the reported casesD(t) as where the (unknown) variance σ 2 is assumed to be constant for each compartment.
Note that a usual assumption when estimating parameters of epidemiological models is that reported cases, given their discrete nature, follow either a Poisson or a negative binomial distribution [4,23,28,29]. However, given the relatively large number of cases, the normal distribution represents a flexible approximation, allowing for faster computations.
Using Bayes' theorem, we obtain the posterior distribution of the parametersp accounting for the prior knowledge on the parameters and the reported cases, as where π (p) denotes the (uniform) prior distribution for the parameters. Here, we assume that the prior PDF for the model parameters p is centred at the LS estimatep obtained during the former calibration stage, on a range [0.9p, 1.1p], while the prior of the initial values q is centred around an estimateq obtained based on the IFR (see equation (2.3)), on a range [0.7q, 1.3q]. The larger relative amplitude of the latter prior interval reflects the higher uncertainty on the initial value for the recovered and undetected compartments. An alternative, more common and rigorous procedure would require informative priors to be specified for the parameters, starting from key epidemiological features, as done, for example, in [4,29]. However, given the large numbers of parameters to be estimated-some of which do not find explicit counterparts in the epidemiological literature-we have assumed uniform priors, centred about the LS estimates, as a practical shortcut to overcoming the difficulty in specifying the prior distribution. In terms of predictive capability of the model, the numerical results provided in §4 allow us to assess the proposed approach.
Since we cannot obtain the posterior distribution over the model parameters p analytically, we adopt approximate-inference techniques based on Monte Carlo (MC) methods, which aim to generate a sequence of random samples from a Markov chain whose distribution approaches the posterior distribution asymptotically, whence the name of MCMC [31]. In particular, we have used the delayed rejection adaptive Metropolis (DRAM) algorithm [32] implemented in pymcmcstat; see [33] for the details. The first 500 000 samples of the chain serve to tune the sampler and are later discarded (burn-in period). We use the next 500 000 samples to approximate the posterior distribution for the parametersp.
From the generated chains, we draw N MC samples of the parametersp 1 , . . . ,p N MC that we use to perform forward propagation of uncertainty through the model, and to compute predictive envelopes of the SUIHTER model compartments (or predictive distributions).
We report the MC samples of the trajectories on the time interval (t I , t for ], including a forecast window (t F , t for ] that extends beyond the time window (t I , t F ] where data have been reported, to assess the predictive capability of the model.

Results and discussion
In this section, we present three batteries of numerical results assessing the forecasting capabilities of the SUIHTER model. Our analysis focuses on the second wave of the epidemic that started at the end of summer 2020 and, at the time of writing, is still affecting Italy. In §4a, we present the simulation of the second wave obtained with the SUIHTER model using for its calibration all the data between 20 August 2020 and 31 December 2020. By limiting the time range of the data used for the calibration, we also investigate the model's capability of forecasting the peaks of the different compartments (see §4b).
Our results at the national level for the second outbreak have been obtained by initializing the isolated, hospitalized, threatened and extinct compartments with the data provided by the Italian Civil Protection Department [17] on 20 August 2020, namely I = 15 063, H = 883, T = 68 and E = 35 418. The initial values for the undetected and recovered compartments are estimated using the strategy based on IFR and the time-dependent CFR introduced in §2a, resulting in the values U = 12 274 and R = 2 916 082, respectively. Finally, the initial condition for the susceptible compartments is given by Note that this would imply that, by the end of the first wave, around 4.8% of the Italian population had been infected. A serosurvey organized by Istituto Nazionale di Statistica (ISTAT) and Istituto Superiore di Sanità (ISS) had estimated that 2.5% of the Italian population had been infected [34,35]; the survey however had a low compliance, so that its results may be biased. A corresponding survey in Spain [36] with a much higher compliance rate estimated a seropositivity value of 4.6% or 5%, depending on the methodology used for the seroprevalence analysis. Using an ensemble model calibrated over several countries [37] the estimate of the proportion infected in Italy on 1 September 2020 was around 4.5%. Using instead a dynamical model calibrated over detailed data [28] the estimate of the proportion infected in Italy on 30 September 2020 was 4.78%. Thus, the value of recovered cases obtained for 20 August looks rather realistic.  The SUIHTER model has been used to simulate the second epidemic outbreak, from 20 August 2020 until 31 December 2020. The different phases in which the parameters can take different values have been identified according to the occurrence of some critical events: -24 September 2020: all schools at the national level reopened after the summer (and spring lockdown) closure (schools calendars vary by grades and by region in Italy); -8 October 2020: new rules imposing the mandatory use of masks in all locations (either indoor or outdoor) accessible to public; -26 October 2020: confinement rules including distance learning for most secondary schools, limitations on the activity of shops, bars and restaurants, strong limitation on sport and leisure activities; 1 -6 November 2020: stricter confinement rules including distance learning from the 9th grade, further restrictions on commercial activities, limitations on circulation outside people's own municipality (for some Italian regions, classified as red regions); 2 -15 November 2020: additional confinement rules as more regions became red regions; 3 -19 November 2020: additional confinement rules as more regions became red regions; 4 -29 November 2020: relaxation of confinement rules in some regions as they became orange regions; 5 -6 December 2020: relaxation of confinement rules in some regions as they became yellow regions; 6 -18 December 2020: stricter confinement rules are introduced for the Christmas holidays. 7 By considering a time lag of 4 days (to account for the incubation period) [38], the corresponding phases on which the model parameters are defined (and possibly changing) are: -phase 1: 20 August 2020-28 September 2020; -phase 2: 29 September 2020-11 October 2020; -phase 3: 12 October 2020-29 October 2020; -phase 4: 30 October 2020-9 November 2020; -phase 5: 10 November 2020-18 November 2020; -phase 6: 19 November 2020-23 November 2020; -phase 7: 24 November 2020-3 December 2020; -phase 8: 4 December 2020-10 December 2020; -phase 9: 11 December 2020-22 December 2020; -phase 10: 23 December 2020-31 December 2020.
As mentioned in §3, the compartments employed for calibration are only those with more reliable data, namely isolated (I), hospitalized (H), threatened (T), extinct (E) and recovered from detected individuals.
We performed the model calibration by employing the MCMC parameter estimation procedure described in §3, over the 10 phases, using the data over the full time range from 20 August 2020 to 31 December 2020. The simulations were run for the subsequent 15   In figure 2, we report the expected values for the time evolution of the seven compartments of the SUIHTER model as well as the time evolution of the additional compartment of the daily new positives, which corresponds to δ U(t), and the corresponding 95% prediction intervals obtained by propagating input uncertainties through the model.
We note that the calibrated compartments (isolated, hospitalized, threatened, extinct and recovered from detected) accurately fit the corresponding time series-data in the calibration phase. For all compartments the 15-day forecast also indicates the capability of the model in predicting the evolution of the epidemic at the national level.
Moreover, the time history of the daily new positives is also in reasonable agreement with the data, proving that the model is also able to capture the main dynamics of the system for quantities that are not directly driven by the data calibration. Our calibration indicates that, from 20 August 2020 to 15 January 2021, 4 056 118 ± 284 533 individuals were infected, of which 48.0% ± 2.0% had been detected. In addition, we estimate a posteriori, i.e. by using the outputs of the simulation, that the IFR during the same period was 1.3% ± 0.1%. This latter figure is compatible with the  IFR estimated at 1.2% for Italy by using estimates by age reported in [19], but is somewhat higher than the estimate shown in [37]. We also observe that our calculated estimates are likely to be underestimated as the second outbreak is still ongoing at the present time and compartments of isolated and extinct individuals become populated at different time scales.
The median values and the 95% credibility intervals computed by the MCMC calibration are reported in table 1 for the parameters that are constant over the simulation and for the initial values of undetected and recovered, while in table 2 we report the parameters that are free to change in each phase. The posterior distributions for all the constant and time-dependent parameters are reported in electronic supplementary material, figures S1 and S2. The traceplots for all the parameters (also available in electronic supplementary material, figures S3 and S4) indicate that the MCMC method showed good convergence after the first 500 000 samples.
The former parameters and time-dependent functions represent rates that can be used to interpret the dynamics of the second Italian outbreak. For example, large values of β U indicate sustained transmission rates at the corresponding phases. Values of healing rates ρ U , ρ I and ρ H are proportional to the probability of healing for individuals in the compartments U, I and H, but are inversely proportional to the corresponding average time of healing; the rate ρ I also incorporates the healing on isolated individuals who are however asymptomatic. To better understand the role of the parameters, note that, if they were constant, ρ I /(ρ I + ω I + γ I ) would represent the probability of an isolated individual recovering without being hospitalized, and similarly ρ H /(ρ H + ω H + γ H ) represents the probability of a hospitalized individual recovering without being transferred to an ICU. In the same way, γ T /(γ T + θ T ) represents the probability of an individual in an ICU dying, and δ/(δ + ρ U ) represents the probability that an infected individual is detected.
Finally, table 2 also reports the value of the basic reproduction number R 0 calculated as in equation (2.2) for the SUIHTER model. The calculation uses the model parameters reported in tables 1 and 2 (columns 1-4). Note that the amplitude of the credibility intervals is strongly influenced by the choice of the prior in the interval centred about the values of the model parameters obtained by the LS procedure ±10%. They should mainly be judged in relative terms.
We observe that the value of R 0 obtained by the calibration reflects the full reopening of educational activities and work restaring after holidays, as well as the public health measures and restrictions later introduced by the authorities to contain the second epidemic outbreak. In particular, the rise of R 0 in phases 2 and 3 follows the full reopening of schools and restarting of working activities from mid-September, and probably accounts for seasonality effects too. Restrictions on mobility, schools and businesses and partial lockdowns were introduced in late October at the regional and national levels, as reflected by the decrease in R 0 from phase 5 to phase 6, when R 0 became less than 1. Partial reopening and easing of restrictions were gradually introduced in some regions and at the national level from late November, as the new increment of R 0 from phase 9 indicates.

(i) Simulating the second outbreak for Italian regions
The results obtained by simulating the epidemic at the national scale can indeed hide specific local outbreaks. The SUIHTER model can also simulate the evolution of the epidemic for everyone in the 20 Italian regions for which the same time-series data as those used for the national calibration are available. Unfortunately, this is not true for the finer geographical level (the 107 provinces) since only the number of total cases from the beginning of the epidemic is provided. Following the same initialization and calibration strategies adopted at the national level, we have carried out the simulation of the second epidemic outbreak in the six larger Italian regions, namely Lombardy, Veneto, Emilia-Romagna, Lazio, Campania and Sicily. In figure 3, the expected value for the time evolution of the three infectious compartments used for the calibration and the corresponding 95% prediction intervals are reported for the former six regions. The calibration has been carried out using the same setting as for the national level, i.e. calibrating the model with the data available until 31 December 2020 and then simulating until 15 January 2021. The results obtained by numerical simulations are in good agreement with the real data, with few exceptions, namely the isolated compartment in Veneto (where the time series is clearly affected by some reporting problems) and the hospitalized compartment in Emilia-Romagna.      With the goal of investigating to which extent our SUIHTER model is able to predict the occurrence of the epidemic wave peak, we repeated the calibration using the data over limited time ranges.
In particular, we have considered three different cases: in case 0 we used all the data time histories available until 3 December, while in cases 1, 2 and 3, the data employed for the calibration were limited to 23 November, 18 November and 9 November, respectively. For each case, the simulations were run for the subsequent 30 days beyond the date associated with the last set of data used for the calibration and the linear extrapolation carried out as indicated before.
In figure 4, we report the expected value for the time evolution of the four compartments used for the calibration, and the 95% prediction intervals obtained by propagating input uncertainties through the model. The accuracy of the forecast, as expected, improves when a richer set of data are employed in the calibration. Our simulations show the occurrence of a peak for each of the three compartments, not only for case 0, in which the time lapse of the data used for the calibration covers the peaks, but also for cases 1, 2 and 3, when the time-series data employed for the calibration are still rising. However, we should remark that if the model is calibrated with a shorter time series, namely available data stop more than 30 days before the peak, the occurrence of the peak cannot be correctly predicted.
As already noticed, because of the overall complexity of the problem and the limited data available for its calibration, we do not intend here to certify in rigorous terms the actual values of the future compartments. However, in spite of the widths of the predictive intervals (which depend, to some extent, on the widths of the chosen prior distributions), we nonetheless observe that the expected values (solid lines in figure 4) carry meaningful prediction capabilities.
To further assess the accuracy of the prediction, it is interesting to compare this peak forecasting with respect to the actual data, i.e. the day and value that have been reported for the different compartments at the end of November 2020. Moreover, we propose a comparison with the predictions obtained using the two different strategies based on data fitting. The first is based on a simple polynomial fit of degree 2 on the last recorded 10 days, while the second is obtained by using a curve registration (see [39] for an overview on this subject) by exploiting the similarities between the first and second waves. The registration procedure is performed by first computing the exponentially modified Gaussian (EMG) function that best fits the first wave. We denote this function as w(t), t 0 ≤ t ≤ t 1 , with t 0 the first day of the recorded data (24 February) and t 1 equal to 1 August. Then a second minimization problem is solved to compute the time shift and scaling factors to apply to the computed EMG function to best fit the rising portion of the second wave in the time range [t k , t n ], with t k coinciding with 15 October and t n with the last recorder date. Namely, we look for the optimal time shifth and the scaling factorss 1 ands 2 such that (h,s 1 ,s 2 ) = arg min h,s 1 ,s 2 where d i is the value of the considered data series at day t i .
For each data series, the fitted EMG function and the optimal values for the shift and scaling factors are computed and, in this way, the shape of the first wave can be used to complete the second wave for the different compartments.
A comparison between the peak forecast obtained with the SUIHTER model, the quadratic extrapolation (based on the last 10 days) and the registration approach is displayed in figure 5, for the isolated, hospitalized and threatened compartments. The curves show how the prediction in terms of the day of peak occurrence and peak value changes when an increasing number of data are used (the last data day is reported on the horizontal axis). To minimize the effect of daily data noise, the reference value (dashed line) is obtained by smoothing the data with a Savitzky-Golay polynomial smoothing filter of degree 3 [40].
By comparing the peak predictions with the day and value of the measured (smoothed) data peak (shown with a dashed line in figure 5), we should first remark that the SUIHTER prediction largely outperforms those obtained with polynomial extrapolation. Moreover, even when compared with predictions based on the registration with the first epidemic wave, the SUIHTER model is more accurate for most of the considered quantities. When making this comparison, it is worth noticing that, while prediction based on the registration strongly depends on the evolution of the different compartments during the first epidemic wave, the predictions based on SUIHTER do not require any a priori knowledge of previous epidemic waves.

Conclusion and model limitations
In this paper, we have introduced a new mathematical model, named SUIHTER, to describe the ongoing pandemic of COVID-19. This epidemiological model is constructed on seven compartments-susceptible uninfected individuals (S), undetected (both asymptomatic and symptomatic) infected (U), isolated (I), hospitalized (H), threatened (T), extinct (E) and recovered (R)-and we exploit it to study and analyse the second Italian outbreak that emerged in autumn 2020 and is still ongoing. In particular, our model is suited for calibration against data made available daily by the Italian Civil Protection Department [17]. On the basis of these data at the national level, our calibration populates the compartments I, H, T, E and R D , which we purposely use to determine transmission rates, rates of recovery, IFR, etc. In particular, SUIHTER allows us to estimate the infected but undetected population, a compartment (U) that is crucial for studying and understanding the epidemic, especially considering that large numbers of infected individuals went uncounted during the first and even the second outbreaks in Italy. Moreover, thanks to our approach, transmission rates, and thus the basic reproduction number R 0 , can be estimated. Finally, our calibration is made robust by exploiting Bayesian estimation using the MCMC method.
The SUIHTER model calibrated at the Italian national level is validated against data related to the last part of the second outbreak. Comparisons are made against basic statistical models, namely quadratic regression and registration of the first epidemic wave.  demonstrates the better accuracy of SUIHTER for predictive purposes. This is made possible by using extrapolated transmission rates that are calibrated at earlier times through regression models, a feature that allows the peaks of the second Italian outbreak to be captured correctly and enables SUIHTER to be used in a predictive fashion by leveraging data available at the current date. This novel approach attempts to circumvent a common issue of the use of epidemiological compartmental models for forecasting [7], that is, accurately capturing transmission rates. However, as our approach is based on interpolating values of these transitions rates, the accuracy of their extrapolation and, consequently, their exploitation for prediction within SUIHTER can only be limited to restricted time windows, especially when government interventions and citizen behaviours are changing. Note that, although the calibration procedure did not make any assumptions about the temporal changes in parameters, the estimates accurately reflect the policy changes: estimates of R 0 decrease as control measures are tightened and increase when they are relaxed. The results of the simulation of the second wave carried out at the national and regional levels showed the capability of the model in predicting the time evolution accurately in a time frame of 15 days past the data used in the calibration. In longer term predictions the model should account for the possible changes in restriction rules that may occur in the future to supply analyses of different scenarios (as recently done in [40] based on the SUIHTER model).
A further limitation of our approach is that we are currently calibrating the Italian epidemic outbreaks at the national level, that is, as a whole, without summing up the different contributions at the level of the 20 Italian regions for which data are available [17]; however, we performed the calibration of the six larger Italian regions. Populating compartments at the national level by summing up results obtained by tailored calibrations of each Italian region would allow the spatio-temporal heterogeneity of the Italian outbreaks to be better captured, reflecting different mobility patterns and densities of population. In this respect, several different approaches have been proposed in the literature (e.g. [41] and the references therein), ranging from the use of network-based models [42,43] to systems of ODEs on a network [44,45], as well as non-local partial differential equations [46]. Among the contributions that appeared during the COVID-19 pandemic, we also recall [2,4,13], where a meta-community SEIR-like model was proposed and employed to reproduce the contagion in Italy. However, calibrating our SUIHTER model at the regional level, and for all the regions, would require a more sophisticated design owing to the intrinsic ill-posedness of the inverse problem, especially when taking mobility patterns into account. Nevertheless, we plan to better address spatio-temporal heterogeneity of the Italian outbreaks in the future by generalizing our SUIHTER model to incorporate suitable spatial-multicity mobility terms at the regional level. Even though a more spatially detailed compartment model is desirable, to act, for example, at the provincial level (Italy comprises 107 provinces), currently no detailed data for its calibration have been made available.
Although the SUIHTER model is very sophisticated and involves 14 time-dependent parameters and functions to be determined based on available data, we limited our calibration to a subset of the possible control variables, by forcibly setting to zero some parameters that we deemed to be less relevant for the transmission of the epidemic and by assuming some others as constants over time. We also neglected incubation time, and we implicitly assumed that all distributions in the states are exponential, which is far from correct [47]. Still, we believe that this qualifies as an acceptable compromise among the complexity of the SUIHTER model and its calibration procedure, the associated computational costs and the accuracy of the results. Some of the calibrated parameters assume values that are able to compensate for those parameters prescribed a priori, even if their interpretation may not be straightforward in explaining the outbreak. In this respect, we plan to assess the robustness of our approach by allowing the calibration of additional parameters. Furthermore, our multi-compartment SUIHTER model does not consider stratification of age groups within the compartments. This is an important aspect as some compartments such as H, T and E are mostly populated by the elderly, while the transmission mechanisms widely differ by age and context of infection (workplace, school, family, etc.) [28,48]. We also plan to improve SUIHTER by considering age stratification within its compartments.
Among the limitations of our work is the identifiability of model parameters as SUITHER requires a relatively large set of such parameters to be calibrated. Although we acknowledge its importance, the current study does not present a direct verification of the identification conditions on model parameters nor, in general, an identifiability analysis. Some, albeit indirect, verification of the identifiability properties of the model comes from the very large sets of time series on which calibration has been successfully performed. These are however aspects that we may better address in future studies.
Finally, in consideration of the ongoing emergency situation during the second Italian outbreak, we believe that our SUIHTER model is well suited to being used in a predictive manner to support and motivate public health measures. To the best of our knowledge, apart from [49],