Explicit characterization of human population connectivity reveals long run persistence of interregional dengue shocks

Dengue is hyper-endemic in Singapore and Malaysia, and daily movement rates between the two countries are consistently high, allowing inference on the role of local transmission and imported dengue cases. This paper describes a custom built sparse space–time autoregressive (SSTAR) model to infer and forecast contemporaneous and future dengue transmission patterns in Singapore and 16 administrative regions within Malaysia, taking into account connectivity and geographical adjacency between regions as well as climatic factors. A modification to forecast impulse responses is developed for the case of the SSTAR and is used to simulate changes in dengue transmission in neighbouring regions following a disturbance. The results indicate that there are long-term responses of the neighbouring regions to shocks in a region. By computation of variable inclusion probabilities, we found that each region’s own past counts were important to describe contemporaneous case counts. In 15 out of 16 regions, other regions case counts were important to describe contemporaneous case counts even after controlling for past local dengue transmissions and exogenous factors. Leave-one-region-out analysis using SSTAR showed that dengue transmission counts could be reconstructed for 13 of 16 regions' counts using external dengue transmissions compared to a climate only approach. Lastly, one to four week ahead forecasts from the SSTAR were more accurate than baseline univariate autoregressions.

We consider the following Sparse Space-Time Autoregressive (SSTAR) model for R regions with the following specification (1), with the stack matrix formulation is defined here in order to derive the SSTAR specific forecast error impulse response function in the following section.
Our SSTAR consists of matrices Y t−i1,(R×1) , A i1,(R×R) , B i2,(R×R) , W i2, (R×R) . In addition, we consider a diagonal time autoregressive matrix diag(A i1 ) = φ i1,1:R with φ i1,1:R entry denoting the i th 1 autoregressive coefficient for regions 1 to R, a diagonal time-space autoregressive matrix diag(B i2 ) = γ i2,1:R with γ i2,1:R entry denoting the i th 2 space-time recursive coefficient for regions 1 to R and a weight matrix with diagonal entries made to be zero diag(W) = W the distance/adjacency matrix between regions as described in the main text. We may also consider H exogenous variables C i3,u,(R×R) , with diag(C i3,u ) = c i3,u,1:R denoting the i th 3 lag of variable u associated to Y t . P 1 , P 2 , P 3 are the number of lag terms associated to autoregressive, space-time recursive and exogenous coefficients respectively. Finally t denotes the noise vector.
It is easy to see that SSTAR is of seemingly-unrelated regression form by considering the model specification for the k th row of Y t for the k th region (2) , with estimation proceeding by considering the following equation by equation penalized minimization problem iteratively for all regions (3).

SSTAR Forecast Error Impulse Response Function
The companion form of SSTAR was utilized to derive the forecast error impulse response function (FEIR) (4). Define Y T to be a Px1 vector where entries {Y t . . . Y tp−1 } are NxR observations of N time-points from R regions. A P RxP R an autoregressive matrix as defined in (6), B P RxP R a space-time autoregressive matrix as defined in (7) and W P RxP R a space-time diagonal weight matrix (8) with diag(W) = W , where W RxR the time-invariant weight matrices defining distance between regions as described in section 1. We deprecate exogenous variables from FEIR computations as they may be treated as a time varying intercept term.
By defining the FEIR for some j th region shock as the effect of ε t,j on Y T +h [2], we yield the FEIR for SSTAR on the h th horizon by recursion as in (9). e j denotes a selection vector with zeros everywhere except e jj = 1.
As we assume no contemporaneous relationships for the estimated model, we can simply identify α level confidence intervals for the IRF by using the bootstrap interval estimates for (4). The procedure proceeds as follows: 1. Retrieve point estimates forÂ,B by (3) 2. Obtain bootstrap confidence intervals by block bootstraping N samples and re-estimatingÂ boot ,B boot at each iteration 3. Use samples obtained from Step 2 to construct the sampling distributionsP A ,P B 4. Retrieve the FEIR intervals by plugging in α and 1-α sampled values,Â α ,B α ,Â 1−α ,B 1−α into (9).

SSTAR Generalised Impulse Response Function
We modified Pesaran and Shin's (1998) [3] procedure to the SSTAR to obtain the generalised impulse response functions (GIRF). The GIRF is defined as the difference of conditional expectation of the forward forecast y t+j given an one time shock δ i occurs in series j given the information set Ω t−1 .
The SSTAR (4) has the VMA(∞) form required for computation of GIRF after deriving the companion SSTAR form for the specification. G i X i denotes the companion form matrix of exogenous variables irrelevant to GIRF calculations.
We detail the full computational procedure to compute GIRF for SSTAR below: 1. Obtain the error variance-covariance matrix of under L 1 penalty by adapting Fan, Guo, and Hao (2012)'s estimator [1] to the vector autoregressive case withŝ L,λ number of nonzero coefficients under penalized estimation : This provides us with the approximationˆ ∼ N (0,Σ L,λ ) 2. Pick a one standard deviation shock for the j th region it = δ i = σ jj 3. Modify Pesaran and Shin's (1998) expression to yield the scaled GIRF for the SSTAR, where Λ P RxP R is a selection square matrix diag(Λ) = 1 for the first 1 to J diagonals and 0 everywhere else. SSTAR was estimated using penalized least squares and 5-fold cross validation with both connectivity and weight matrices within the specification, FGLS-AR was estimated using feasible generalized least squares with variance weighted by own region counts, STAR and STAR-2 were estimated using least squares with connectivity and weight matrices of other regions added into the specification SSTAR was estimated using penalized least squares and 5-fold cross validation with both connectivity and weight matrices within the specification, FGLS-AR was estimated using feasible generalized least squares with variance weighted by own region counts, STAR and STAR-2 were estimated using least squares with connectivity and weight matrices of other regions added into the specification SSTAR was estimated using penalized least squares and 5-fold cross validation with both connectivity and weight matrices within the specification, FGLS-AR was estimated using feasible generalized least squares with variance weighted by own region counts, STAR and STAR-2 were estimated using least squares with connectivity and weight matrices of other regions added into the specification SSTAR was estimated using penalized least squares and 5-fold cross validation with both connectivity and weight matrices within the specification, FGLS-AR was estimated using feasible generalized least squares with variance weighted by own region counts, STAR and STAR-2 were estimated using least squares with connectivity and weight matrices of other regions added into the specification SSTAR was estimated using penalized least squares and 5-fold cross validation with both connectivity and weight matrices within the specification, FGLS-AR was estimated using feasible generalized least squares with variance weighted by own region counts, STAR and STAR-2 were estimated using least squares with connectivity and weight matrices of other regions added into the specification SSTAR was estimated using penalized least squares and 5-fold cross validation with both connectivity and weight matrices within the specification, FGLS-AR was estimated using feasible generalized least squares with variance weighted by own region counts, STAR and STAR-2 were estimated using least squares with connectivity and weight matrices of other regions added into the specification