Time-aggregated mobile phone mobility data are sufficient for modelling influenza spread: the case of Bangladesh

Human mobility plays a major role in the spatial dissemination of infectious diseases. We develop a spatio-temporal stochastic model for influenza-like disease spread based on estimates of human mobility. The model is informed by mobile phone mobility data collected in Bangladesh. We compare predictions of models informed by daily mobility data (reference) with that of models informed by time-averaged mobility data, and mobility model approximations. We find that the gravity model overestimates the spatial synchrony, while the radiation model underestimates the spatial synchrony. Using time-averaged mobility resulted in spatial spreading patterns comparable to the daily mobility model. We fit the model to 2014–2017 influenza data from sentinel hospitals in Bangladesh, using a sequential version of approximate Bayesian computation. We find a good agreement between our estimated model and the case data. We estimate transmissibility and regional spread of influenza in Bangladesh, which are useful for policy planning. Time-averaged mobility appears to be a good proxy for human mobility when modelling infectious diseases. This motivates a more general use of the time-averaged mobility, with important implications for future studies and outbreak control. Moreover, time-averaged mobility is subject to less privacy concerns than daily mobility, containing less temporal information on individual movements.


S1 Incompleteness of mobile phone mobility data
The mobility data is completely lacking for five upazilas of Bangladesh in the Chittagong division, and our model thus does not include these upazilas. These upazilas are Thanchi, Barkal, Belai Chhari, Jurai Chhari, and Langadu. The transitions to and from the upazila Parshuram (Chittagong division) are missing, but the total count of active mobile phones in Parshuram is observed, allowing us to estimate the population size in Parshuram. We use the gravity model to estimate the transitions to and from Parshuram.

S2 Descriptive statistics of the hospital sentinel case data
Descriptives for the hospital sentinel cases for the different seasons are given in Table S1. Note that both influenza B and influenza A types were circulating in 2014, 2016, and 2017, while 2015 was mainly dominated by A(H1N1). Most of the type A cases in 2014 and 2016 were of subtype A(H3).

S3 Infectious disease model description
The influenza dynamics model is a stochastic compartmental SEII a R metapopulation model, similar to the model previously published in (Engebretsen et al., 2019). Here the compartments are the individuals who are susceptible to the disease (S), the exposed individuals (E), that is, those who have been infected but are not yet infectious, the symptomatic infectious individuals (I), the asymptomatic infectious individuals (I a ), and the recovered/removed individuals (R). See also Balcan et al. (2009) and Colizza et al. (2007). We let β denote the transmission probability per unit time, 1/λ be the average latent period, 1/γ be the average infectious period, and p a denote the probability that an infection is asymptomatic. We assume that the transmission probability of the asymptomatic infectious is reduced by r β . The stochastic SEII a R equations are given by: , t), and N t i are the number of susceptible individuals, exposed individuals, symptomatic infectious individuals, asymptomatic infectious individuals, and the total population at time t in location i, respectively. X i (1 − p a )λ∆t, p a λ∆t), X i 4 (t) ∼ Binom (I i (t), γ∆t), X i 5 (t) ∼ Binom (I i a (t), γ∆t), Binom(n, p) is the binomial distribution with n trials and success probability p, and Multinom(n, p 1 , p 2 ) is the multinomial distribution with n trials and success probabilities p 1 and p 2 . We assume that the population size is constant during the epidemic, so that R = N − S − E − I − I a . As in Balcan et al. (2009), we fix the probability of an infectious individual being asymptomatic to p a = 0.33, similar to the estimated asymptomatic proportion in Presanis et al. (2011); Birrell et al. (2011). We let the transmission probability of the asymptomatic infectious be reduced by 50%, so we fix r β = 0.50, as in Balcan et al. (2009). The basic reproductive number R 0 is defined as the number of expected new cases generated by one infectious individual, in a fully susceptible population. The basic reproductive number of the model is given by R 0 = β γ (0.5 · 0.33 + 0.67) (Colizza et al., 2007). As we cannot assume that the population is fully susceptible, we will estimate the effective reproductive number, R e , which takes the prior immunity into account.

S4.1 Prior distribution and parameter specifications
We assume prior distributions for the transmission parameter β and the reporting probability r. It is reasonable to assume that only a small proportion of the cases are reported. Some individuals do not go to their doctor with influenza. Moreover, since we only have information for some hospitals, it should be a small fraction of the total number of influenza cases. We assume a beta distribution for r, with shape parameters 1 and 7, corresponding to a mean 1/8 and variance 7/(8 2 · 9). This prior is quite wide and uninformative, but skewed towards lower reporting probabilities. A plot of the prior distribution for the reporting rate is provided in Figure S1. For the transmission parameter β, we assume a uniform prior on (0.4, 0.7), which is quite uninformative. When we estimate the parameters for the other seasons than 2017, we use the results for the 2017 season to define our priors. We fix the mean latent period and the mean infectious period. The mean latent period is fixed to 1.9 days, as in Balcan et al. (2009) andLongini Jr et al. (2004). This is also in accordance with the fact sheet from WHO on seasonal influenza (World Health Organization, 2016), where the average incubation period is claimed to be two days. The mean infectious period is fixed to three days. Similar infectious and/or latent periods for seasonal and pandemic influenza can also be found in the literature in Corbella et al. (2018); Birrell et al. (2017); Balcan et al. (2009);Colizza et al. (2007);Germann et al. (2006);Longini Jr et al. (2004); Bajardi et al. (2011).
The mean latent and infectious periods are fixed, as we do not have data to inform these- Corbella et al. (2018) argue that the latent and infectious periods can only be inferred from detailed information at the individual level, which we do not have. Even though there is good knowledge of the latent and infectious period distributions in the literature, this is a strong assumption. In addition to scarce data, the different model parameters are strongly correlated, further complicating the parameter estimation. The data contain little or no information on the latent period, and the infectious period is correlated with both the transmission parameter and the reporting probability.

S5 Posterior distribution
A scatterplot of the joint posterior for the 2017 season is given in Figure S2. The posterior correlation is −0.0525.

S6 ABC-SMC procedure for inference
The idea behind approximate Bayesian computation (ABC) is to use simulations from the model, and accept parameters if the simulations are sufficiently close to the observed data, measured in summary statistics. We use a sequential Monte Carlo version of ABC. We use four summary statistics-the global observed incidence curve, the observed peak height (that is, the maximum number of daily observed new infectious cases), the duration of the epidemic, and the final size of the epidemic. We define the duration as the number of days between the day with 10% of the total number of observed cases, and the day with 90% of the total number of observed cases. The final size is the total number of observed symptomatic cases. The time-averaged mobility is used in the fitting procedure, as this is computationally advantageous, and since we do not q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q know the initial date.
We use an algorithm similar to the one in Brooks-Pollock et al. (2014), provided in Algorithm 1. In the algorithm, m is the number of summary statistics, and they are denoted by S 1 , . . . , S m . The distance in the summary statistics are weighted, in order to make sure that they are on the same scale. We denote the data by D lt , which is the number of observed cases in location l on day t, and the simulated observed incidence by D lt . In the proposal distribution, we use a variance which is twice the empirical variance of the previous round, as suggested in Beaumont et al. (2009). We use a Euclidean distance metric between the number of reported cases and the simulated number of observed infectious symptomatic cases, for the global incidence curves. In the comparison, we align the global peak date of the simulated data with the global peak date in the observed data. Hence, we compare the shapes of the curves. Let L be the number of locations and T the duration of the epidemic in number of days. The distance in the shape of curve statistic is thus given by For the other summaries, duration, final size, and peak height, we use the absolute distance.

S7 Seeding locations
When investigating the spatial spread for various hypothetical seeding scenarios, we seed in Dhaka (seeding in Sutrapur upazila), and in the divisions located in the different "corners" of the country, that is, North-West (Rangpur division, seeding in Rangpur Sadar), North-East (Sylhet division, seeding in Sylhet Sadar), South-West (Khulna division, seeding in Khulna Sadar), and South-East (Chittagong division, seeding in Double Mooring). For completeness, we also include results when seeding in the remaining two divisions Barisal (seeding in Barisal Sadar), and Rajshahi (seeding in Matihar), provided in the Section S13 in this supplement.
When we simulate the 2014-2016 influenza seasons, we seed in the 11 upazilas with the earliest cases for the year in question, similar to the seeding scenario for 2017. For 2016, this corresponds to six upazilas in Dhaka, four in Chittagong, and one in Sylhet. For 2015, this Algorithm 1 ABC-SMC Initialise: Set a starting value 0 for the tolerance, r = 1 and w 1 = (1, . . . , 1).
Sample θ θ θ p from θ θ θ r−1 with weights w r−1 . Propose θ θ θ i from an independent gaussian distribution G centered at θ θ θ p with variance equal to twice the empirical variance from the previous round.
Run the model with θ θ θ i , providing the simulated number of reported cases D .
set θ θ θ r i = θ θ θ i , and calculate weights w r i for the parameters, as for the accepted parameters. Increment r = r + 1. end while Figure S3: Daily mobility model, 2017. Mean prevalence per 100 000 residents in the different upazilas when seeding according to the 2017 epidemic, a) 2, b) 5, c) 10, d) 20, e) 40, and f) 50 days after the initial seeding date. Note that the scale is different for the different days.
corresponds to six upazilas in Dhaka, two in Barisal, one in Rangpur, one in Rajshahi, and one in Sylhet. For 2014, this corresponds to three upazilas in Barisal, three in Rajshahi, two in Dhaka, two in Sylhet, and one in Chittagong.

S8 Early spread for different seeding scenarios
We investigate the early spread for the different seeding settings, by plotting the mean prevalence in each upazila 2, 5, 10, 20, 40, and 50 days after the initial seeding date (day 0).

S8.1 2017 simulation
The early spatial spread for the 2017 simulation is given in Figure S3. The disease quickly spreads through the country, and most of the early spread seems to be to the neighbouring upazilas, but also to larger cities.

S8.2 Dhaka
The early spatial influenza spread when seeding in Dhaka is given in Figure S4. The disease quickly spreads to the larger cities, but also to upazilas within Dhaka division.

S8.3 Chittagong
The early spatial influenza spread is given in Figure S5. The disease quickly spreads to some of the larger cities (including Dhaka), but also to upazilas within Chittagong.

S8.4 Khulna
The early spatial influenza spread when seeding in Khulna is given in Figure S6. There is early spread to some of the larger cities, including Dhaka, but most of the early spread is within the Khulna division.

S8.5 Rangpur
The early spatial influenza spread when seeding in Rangpur is given in Figure S7. We see that we have early spread to Dhaka city, but that most of the early spread is in the North West.

S8.6 Sylhet
The early spatial influenza spread when seeding in Sylhet is given in Figure S8. There is early spread to Dhaka city, but most of the early spread is within the Sylhet division.

S9 Correlation between upazila initial dates
We compute the correlations between the mean initial dates in each upazila for the model using the daily mobility and the various mobility approximations. In addition, we compute the correlation between the model with the time-averaged mobility and the model with the daily mobility, when seeding on the 1st of April 2017. The correlations are provided in Table S2. The time-averaged mobility has a much higher correlation with the daily mobility than the gravity model and the radiation model. Correlation between the mean initial dates obtained with the daily mobility, and the mean initial dates obtained with the other mobility approximations: the gravity model, the radiation model, and the time-averaged mobility, for the various seeding scenarios. We also provide the correlation between the time-averaged mobility and the daily mobility when seeding on the 1st of April 2017.
In Panigutti et al. (2017), the authors investigated similarity of epidemic outbreak simulations on two alternative mobility networks in France-one based on census data, and the other based on mobile phone mobility data. They calculated a rank correlation between the arrival dates (defined as the first day of an infectious case) in the different locations for different seeding nodes. They found that the epidemic trajectories on the two networks were more similar the higher the degree and strength of the seeding node. We therefore also calculate the degree and strength of the seeding nodes for the different mobility proxies, provided in Table S3. There does not seem to be any connection between neither the strength nor the degree of the node and the correlations in our simulations. Note that all the seeding upazilas are central upazilas, and that such a trend might not be possible to uncover when investigating only five upazilas. It could also be that other network properties could explain the similarities better.

S10 Early spread with gravity and radiation models
We investigate the early spread with the gravity and radiation model approximations, for the different seeding settings, by plotting the prevalence in each upazila 2, 5, 10, 20, 40, and 50 days after the initial seeding date.

S10.1 2017 simulation
The early spatial spread when seeding according to the early cases of 2017 is given in Figure S9 for the gravity model, and in Figure S10 for the radiation model. We clearly see that the epidemic spreads much faster in the gravity model setting than in the mobile phone mobility setting, and more slowly in the radiation model setting.

S10.2 Dhaka
The early spread when seeding in Dhaka is given in Figure S11 for the gravity model, and in Figure S12 for the radiation model. Also in this setting, the spread based on the gravity model is much faster than the mobile phone mobility setting. The epidemic spreads early to the upazilas within Dhaka division, but also to many other upazilas of the country. In the radiation model setting, the spread is much slower, and most of the early spread is to the upazilas in Dhaka division.

S10.3 Chittagong
The early spread when seeding in Chittagong is given in Figure S13 for the gravity model, and in Figure S14 for the radiation model. Again, the epidemic spreads quickly under the gravity model, both to upazilas of Chittagong, but also to many other upazilas of the country. In the radiation model, the early spread is much more restricted to Chittagong division, and the   epidemic spreads more slowly than with mobile phone mobility and the gravity model. Dhaka city is also hit early.

S10.4 Khulna
The early spread when seeding in Khulna is given in Figure S15 for the gravity model, and in Figure S16 for the radiation model. Also when seeding in Khulna, the disease spreads quickly under the gravity model, and all the divisions have been reached early. In the radiation model, the spread is again slower than in the mobile phone mobility setting, and the early spread is mainly restricted to Khulna division and upazilas close to the seeding location. The epidemic also spreads quickly to Dhaka city.

S10.5 Rangpur
The early spread when seeding in Rangpur is given in Figure S17 for the gravity model, and in Figure S18 for the radiation model. In the gravity model, the epidemic spreads more quickly than in the mobile phone mobility setting. The early spread is to the upazilas of Rangpur, and Rajshahi, but also to other upazilas around the country. Under the radiation model, the spread is again slower, and the initial spread is radially from the seeding upazila, but also to Dhaka    city.

S10.6 Sylhet
The early spatial spread when seeding in Sylhet is given in Figure S19 for the gravity model, and in Figure S20 for the radiation model. Again, the spread using the gravity model is much faster than when using the mobile phone mobility data, and the early spread is both to upazilas of Sylhet, and to other upazilas of the country. The spread is slower in the radiation model setting, and the early spread is to upazilas of Sylhet and to Dhaka city.

S11 Early spread when seeding on April 1st 2017
We investigate how the early spread differs when the daily mobility matrices are used for the first days, that is, when seeding on the 1st of April 2017 instead of the 30th of September 2016. We plot the prevalence in each upazila 2, 5, 10, 20, 40, and 50 days after the initial seeding date for the different seeding scenarios. The plots are given in Figures S21, S22, S23, S24, S25, and S26 when seeding according to the 2017 epidemic, in Dhaka, in Chittagong, in Khulna, in Rangpur, and in Sylhet, respectively. The spreading pattern is very similar to the setting when seeding on    the 30th of September, for all the seeding scenarios. Hence the time-averaged mobility is also a good approximation in the initial period of the epidemic.

S12 Initial dates in upazilas
The initial dates in each upazila for the daily mobility setting when seeding on the 30th of September 2016 according to the 2017 epidemic, in Dhaka, in Chittagong, in Khulna, in Rangpur, and in Sylhet are given in Figures S27a-f, respectively. For all seeding scenarios, neighbouring upazilas seem to have similar initial dates. In general, the upazilas in the seeding division have the earliest initial dates. The differences in initial dates between the different upazilas are larger when seeding in Khulna, Rangpur or Sylhet, than when seeding in Dhaka, Chittagong or according to the 2017 epidemic.
We also provide the histogram over the initial dates in the different upazilas in the 100 simulations, provided in Figure S28, by seeding regions. Again we note that the spread is less coherent when seeding in Khulna, Rangpur or Sylhet than when seeding according to the early cases of 2017, in Dhaka or in Chittagong.
The corresponding histograms for the gravity model approximation, the radiation model approximation, the time-averaged mobility and the daily mobility when seeding on the 1st of      April are given in Figures S29-S32, respectively. The initial dates per upazila under the gravity model are very similar for the different seeding scenarios, while for the radiation model the distributions are much wider than for the daily mobility and the gravity model, and depend more on the seeding locations.

S13 Seeding in Barisal and Rajshahi
For completeness, we also include simulations when seeding in the two remaining divisions, Barisal and Rajshahi, seeding in the upazilas Barisal Sadar and Matihar, respectively. The resulting initial dates in each upazila are given in Figure S33. The disease clearly sparks first in the seeding divisions in both settings. When seeding in Barisal, the disease spreads to Khulna, Chittagong, and Dhaka before spreading to the rest of the country. When seeding in Rajshahi, the disease spreads to Rangpur, Khulna, Chittagong, and Dhaka before spreading to the rest of the country, and the upazilas of Sylhet have the latest initial dates. The early spread when seeding in Barisal is given in Figure S34, and the early spread when seeding in Rajshahi is given in Figure S35. Similar to the spreading pattern for the other seeding locations, the early spread is to upazilas within the seeding divisions, but also to larger cities and in particular to Dhaka city. The distributions of initial dates for each division when seeding in Barisal and Rajshahi are given in Figure S36. In both cases, the seeding division clearly has earlier initial dates  than the other divisions, and the synchrony is similar to when seeding in Khulna, Rangpur, and Sylhet. When seeding in Barisal, the order of the initial dates in each division is Barisal, Khulna, Chittagong, Dhaka, Rajshahi, Sylhet, and Rangpur. When seeding in Rajshahi, the corresponding order is Rajshahi, Rangpur, Khulna, Chittagong, Dhaka, Barisal, and Sylhet. When seeding in Barisal, the mean difference in initial dates is slightly more than three weeks between Barisal and Rangpur. When seeding in Rajshahi, the mean difference in initial dates is also slightly more than three weeks between Rajshahi and Sylhet. The final sizes are the same as for the other seeding settings (0.228 (0.000172) when seeding in Barisal, 0.228 (0.000189) when seeding in Rajshahi). The global peak date and peak prevalence when seeding in Barisal are 272.1 (7.0) and 0.00716 (2.21 · 10 −5 ). The global peak date and peak prevalence when seeding in Rajshahi are 272.4 (6.0) and 0.00715 (2.31 · 10 −5 ). The global peaks are thus slightly delayed compared to seeding in Dhaka or Chittagong, but earlier than for the other seeding divisions. The peak prevalences are also slightly lower than when seeding in Dhaka and Chittagong. Standard deviations are given in parentheses.

S14 Gravity and radiation model
Plots of the estimated travel fluxes for the radiation model and the gravity model versus the time-averaged mobile phone mobility are given in Figure S37, along with the line y = x for comparison. The radiation model clearly overestimates the travel volume on some links, while the gravity model does not appear to have this problem. The R 2 values are 0.31 for the gravity model and 0.65 for the radiation model. However, the gravity model estimates are closer to the y = x line, and the estimates are thus closer with the gravity model than with the radiation model. In addition, we perform a fluctuation analysis for the travel approximations, as is also done in Masucci et al. (2013) to compare the two models. We compute the Sørensen-Dice coefficient, Figure S34: Mean prevalence per 100 000 residents in the different upazilas when seeding in Barisal, a) 2, b) 5, c) 10, d) 20, e) 40, and f) 50 days after the seeding event. Note that the scale is different for the different days.  E Sørensen , given as which is a similarity index between 0 and 1, where 0 means that there is no correlation, and 1 means that there is a perfect match. The E Sørensen is given in Figure S38, plotted in a distance and destination population size plane, and an origin population size and destination population size plane. We see that the gravity model performs the worst for large distances and small destination populations. The radiation model performs badly for short distances. The radiation model performs badly for travel between large and small origin and destination populations. The gravity model performs badly for small origin and destination population sizes. In addition, a plot showing which model performs the best in terms of E Sørensen is also given in Figure S38. The radiation model seems to overall perform better in terms of the Sørensen-Dice coefficient, except from for short distances, and travel between large and small populations, where the gravity model seems to perform better.

S15.2 Simulations for 2014, 2015, and 2016
The resulting initial dates for each division for the 2014, 2015, and 2016 season simulations are given in Figure S40. For 2016, the epidemic seems to first spread in the Sylhet division. Considering the mean initial dates for the different divisions, the order of arrival is Sylhet, Chittagong, Dhaka, Barisal, Khulna, Rajshahi, and Rangpur. For 2015, the epidemic seems to be more synchronised across the different divisions. Considering the mean initial date for each division, the relative arrival times are Sylhet, Barisal, Chittagong, Dhaka, Khulna, Rangpur, and Rajshahi, in that order. Also for 2014, the epidemic seems to first spread in Sylhet. The order of arrival is Sylhet, Chittagong, Barisal, Dhaka, Rajshahi, Khulna, and Rangpur. Hence, the results are in agreement with our general results, where we find that Chittagong and Dhaka seem to be hit early in the epidemic, while the epidemics seem to have started in Sylhet. In recent years, our simulations thus indicate that there seems to have been an epidemic wave from East to West. We plot the fitted incidence curves from our simulations, along with the case data for 2014, 2015, and 2016. The plot is given in Figure S41. The model fits the case data well for all the seasons. Note that we have plotted them to align the peak of the case data and the simulations as we have not estimated the initial conditions.

S16 Network descriptives and centrality measures
We compute various descriptive statistics of the time-averaged mobility network. The density of the network is the number of existing links divided by the number of possible links. The density of the time-averaged mobility network is 82%. The reciprocity of the network is the proportion of reciprocal links. The reciprocity is high-95% of the edges are reciprocal. The transitivity (proportion of triangles) is also high, 0.88. The diameter of the network is two, which means that no upazilas are more than two links away from each other. Hence, the mobility network is very dense and tightly connected.
We repeated the network analysis for the daily mobility matrices. The average density of the networks is 42% (standard deviation 3.1%), which is lower than for the time-averaged mobility matrix, as expected, but still a high density. The average reciprocity of the network is high, 79%  (standard deviation 1.6%), but lower than for the time-averaged mobility matrix. The average transitivity is 0.59 (standard deviation 0.023). The average diameter of the network is 2.01 (standard deviation 0.10). Hence, the daily mobility networks are dense and tightly connected.