Risk mapping for COVID-19 outbreaks in Australia using mobility data

COVID-19 is highly transmissible and containing outbreaks requires a rapid and effective response. Because infection may be spread by people who are pre-symptomatic or asymptomatic, substantial undetected transmission is likely to occur before clinical cases are diagnosed. Thus, when outbreaks occur there is a need to anticipate which populations and locations are at heightened risk of exposure. In this work, we evaluate the utility of aggregate human mobility data for estimating the geographical distribution of transmission risk. We present a simple procedure for producing spatial transmission risk assessments from near-real-time population mobility data. We validate our estimates against three well-documented COVID-19 outbreaks in Australia. Two of these were well-defined transmission clusters and one was a community transmission scenario. Our results indicate that mobility data can be a good predictor of geographical patterns of exposure risk from transmission centres, particularly in outbreaks involving workplaces or other environments associated with habitual travel patterns. For community transmission scenarios, our results demonstrate that mobility data add the most value to risk predictions when case counts are low and spatially clustered. Our method could assist health systems in the allocation of testing resources, and potentially guide the implementation of geographically targeted restrictions on movement and social interaction.


S1. DESCRIPTION OF MOBILITY DATA
The data used in our study was provided by the Facebook Data for Good program. The data set (in the Disease Prevention Maps subset) is aggregated from individual-level GPS coordinates collected through use of Facebook's mobile app. The raw data is therefore biased towards the movements of subpopulations that are more likely to use social media applications on mobile devices. After collection, the data is spatially and temporally aggregated into a list of trip numbers between Bing Tiles [1] within a rectangular raster (e.g., centered on a country, state, or city). The sizes and boundaries of these discrete locations are determined by an optimisation procedure that produces the smallest possible subregion size (down to a minimum of 600 m × 600 m), given the extent of the region of interest and the requirement for near-real-time release of new data. A trip between locations is defined by the most frequently visited tile in the first 8-hour period and the most frequently visited tile in the subsequent 8-hour period. Finally, before the data is released, any entries showing fewer than 10 trips between a pair of locations are removed to protect the privacy of individual users. For Australia, the state-level data consists of trip numbers between 2 km × 2 km tiles. We compared this scale with larger (national-scale) and smaller (city-scale) regions of interest, and determined that the state-level data provided the best balance. Its trip numbers were large enough to produce a sufficiently dense network of connections, and its subregion size was usually smaller than the Local Government Areas for which case data is reported.
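The trip definition and privacy threshold can be illustrated with a short Python sketch. It is not Facebook's pipeline, which runs before release on data the authors never see. It assumes a hypothetical per-user input listing the tiles observed in each of two consecutive 8-hour windows. The names `modal_tile` and `trip_counts` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input: user id -> (tiles seen in the first 8-hour window,
#                                  tiles seen in the following 8-hour window)
UserPings = Dict[str, Tuple[List[str], List[str]]]


def modal_tile(tiles: List[str]) -> str:
    """Return the most frequently visited tile in one window.

    Counter.most_common breaks ties by order of first occurrence.
    """
    return Counter(tiles).most_common(1)[0][0]


def trip_counts(pings: UserPings, min_trips: int = 10) -> Dict[Tuple[str, str], int]:
    """Aggregate per-user modal tiles into origin-destination trip counts."""
    counts: Counter = Counter()
    for first_window, second_window in pings.values():
        if not first_window or not second_window:
            continue  # user not observed in both windows: no trip is recorded
        origin = modal_tile(first_window)
        destination = modal_tile(second_window)
        # Same-tile pairs are kept here; how they are treated is not specified.
        counts[(origin, destination)] += 1
    # Privacy filter: drop any pair with fewer than `min_trips` trips.
    return {pair: n for pair, n in counts.items() if n >= min_trips}
```

With `min_trips=10`, a pair travelled by only nine users does not appear in the released data at all. It is not reported as a small count.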

A. Generating correspondences
The raw mobility data is provided as movements between tiles, whereas case data is reported for Local Government Areas (LGAs), so the two must be brought onto a common set of boundaries. Facebook also releases data aggregated to administrative regions, but these regions were not geographically consistent with the current LGA boundaries for Australia. To keep our method consistent across datasets and jurisdictions, we produced our own correspondence system. We did this by performing two spatial join operations, which associate tiles and LGAs, respectively, with Meshblocks (the smallest geographic partition for which the Australian Bureau of Statistics (ABS) releases population data). Meshblocks were associated based on their centroid locations: each meshblock centroid was associated with the tile with the nearest centroid and with the LGA containing it. We did not split meshblocks that straddled an LGA or tile boundary, as their sizes are sufficiently small that edge effects are negligible. In addition, LGAs form a complete partition of meshblocks, so edge effects arose only for tile associations. We then associated tiles with LGAs proportionately. Each tile's weight for an LGA is the fraction of the tile's total meshblock population that was associated with that LGA.
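The correspondence procedure can be sketched as follows. The sketch assumes meshblock centroids in a projected (planar) coordinate system and tile centroids keyed by tile id. It also assumes the containing LGA of each meshblock is already known from the point-in-polygon join. The function names are hypothetical, not the authors' code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# (centroid in projected coordinates, resident population, code of containing LGA)
Meshblock = Tuple[Point, float, str]


def nearest_tile(point: Point, tile_centroids: Dict[str, Point]) -> str:
    """Tile whose centroid is closest to `point` (brute-force Euclidean search)."""
    return min(tile_centroids, key=lambda t: math.dist(point, tile_centroids[t]))


def tile_to_lga_weights(
    meshblocks: List[Meshblock], tile_centroids: Dict[str, Point]
) -> Dict[str, Dict[str, float]]:
    """For each tile, the fraction of its meshblock population lying in each LGA."""
    pop_by_tile: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for centroid, population, lga in meshblocks:
        # Meshblocks are not split: the whole population goes to one tile and one LGA.
        pop_by_tile[nearest_tile(centroid, tile_centroids)][lga] += population
    weights: Dict[str, Dict[str, float]] = {}
    for tile, by_lga in pop_by_tile.items():
        total = sum(by_lga.values())
        if total > 0:  # unpopulated tiles receive no weights
            weights[tile] = {lga: p / total for lga, p in by_lga.items()}
    return weights
```

A spatial index (for example a k-d tree) would replace the brute-force nearest-centroid search for realistic numbers of meshblocks. The resulting weights are unchanged.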

B. Re-partitioning mobility data
Once a correspondence is established between the tile partitions on which mobility data is released and the LGA partitions on which case data is released, the matrix of connections between tiles must be converted into a matrix of connections between LGAs. The Supplementary Technical Note explains how we performed this step, and gives a general method for converting matrices between partition schemes. Briefly, the number of trips between two locations in the initial data is split between the overlapping set of partitions in the new set of boundaries (in this case, local government areas), based on the correspondence between partition schemes determined as explained in the previous subsection.
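The sketch below shows one way to realise this step, assuming two things. First, each tile has weights over LGAs that sum to one, as produced by the proportional association in the previous subsection. Second, origins and destinations are split independently. Under those assumptions, each tile-to-tile trip count is distributed over every pair of overlapping LGAs. This is illustrative only and is not the procedure given in the Supplementary Technical Note.

```python
from collections import defaultdict
from typing import Dict, Tuple

Flows = Dict[Tuple[str, str], float]     # (origin, destination) -> trips
Weights = Dict[str, Dict[str, float]]    # tile -> {LGA: fraction}, summing to 1


def repartition(tile_flows: Flows, weights: Weights) -> Flows:
    """Convert tile-to-tile trip counts into LGA-to-LGA trip counts."""
    lga_flows: Dict[Tuple[str, str], float] = defaultdict(float)
    for (origin, dest), trips in tile_flows.items():
        # Split the trips over every (origin LGA, destination LGA) combination,
        # in proportion to the product of the two tiles' LGA fractions.
        # Trips touching a tile with no weights (e.g. unpopulated) are dropped.
        for lga_o, w_o in weights.get(origin, {}).items():
            for lga_d, w_d in weights.get(dest, {}).items():
                lga_flows[(lga_o, lga_d)] += trips * w_o * w_d
    return dict(lga_flows)
```

Because each tile's weights sum to one, the total number of trips between weighted tiles is preserved. In matrix notation the operation is WᵀMW. Here M is the tile-to-tile flow matrix and W is the tile-by-LGA weight matrix.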

C. Spatial biases in Facebook mobility data
To investigate the spatial sampling biases present in the mobility data provided by Facebook, we examined the ratio of Facebook users to the ABS 2018 resident population for each LGA in Victoria.
Although this ratio varies from day to day, an example of its distribution is shown as a heat map in Supplemental Figure S1. The figure displays the average number of Facebook mobile app users indexed to each LGA between 2am and 10am from May 15th to June 25th, divided by the estimated resident population reported by the ABS in 2018. The distribution is narrow, with most urban areas falling in the range of 5% to 10% Facebook users. However, this is not an exact representation of residential population proportions, as many mobile users work at night and will not be at their residence during the selected period.
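The ratio shown in Figure S1 can be sketched as follows. The sketch assumes hypothetical per-LGA lists of daily user counts for the 2am to 10am window, plus a population table keyed by the same LGA codes. The function name is illustrative.

```python
from statistics import mean
from typing import Dict, List


def coverage_ratio(
    daily_users: Dict[str, List[float]], population: Dict[str, float]
) -> Dict[str, float]:
    """Mean nightly Facebook users per LGA divided by resident population.

    daily_users: LGA -> users indexed to that LGA in the 2am-10am window, one value per day.
    population:  LGA -> resident population (e.g. ABS 2018 estimates).
    LGAs with no (or zero) population entry are skipped to avoid division by zero.
    """
    return {
        lga: mean(counts) / population[lga]
        for lga, counts in daily_users.items()
        if counts and population.get(lga)
    }
```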
Unfortunately, it is not possible to precisely quantify the bias introduced by Facebook's sampling scheme.
Despite these limitations, it may still be informative to examine whether accounting for the bias pictured in Figure S1 affects our validation. To determine this, we re-computed the correlations

D. Temporal autocorrelation of mobility matrices
To investigate the degree to which mobility changed during the study period, we computed the autocorrelation between mobility flows for origin-destination pairs at time t and those at later times t′. The results are shown in Figure S3. Weekend and weekday mobility differ markedly, and the implementation of Stage 3 restrictions in Greater Melbourne altered mobility patterns. Nevertheless, relative mobility volumes remain highly consistent over time throughout the studied period. For this reason, our results for the community transmission scenario shown in Figure 5 are robust to the precise choice of time periods used to generate the mobility matrices for our risk estimates. For example, we could integrate mobility flows over a period longer than one week. We could also introduce a nonzero delay between the period over which mobility is averaged and the time t for which active case data is tabulated. Neither change affects the resulting risk rankings, and both give the same pattern of temporal correlations, though the risk values themselves change slightly.
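Below is a minimal sketch of this temporal autocorrelation. It computes the Pearson correlation, across all origin-destination pairs, between the flow matrix on day t and the one on day t + lag. Pairs absent on one day (for example, suppressed below the 10-trip threshold) are treated as zero. The choice of correlation measure and the zero-fill are assumptions, since neither is specified in the text. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Flows = Dict[Tuple[str, str], float]  # (origin, destination) -> trips on one day


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flow_correlation(day_a: Flows, day_b: Flows) -> float:
    """Correlate two days' flows over the union of their origin-destination pairs."""
    pairs = sorted(set(day_a) | set(day_b))
    # Assumption: a pair missing on one day counts as zero trips on that day.
    x = [day_a.get(p, 0.0) for p in pairs]
    y = [day_b.get(p, 0.0) for p in pairs]
    return pearson(x, y)


def autocorrelation(series: List[Flows], lag: int) -> List[float]:
    """Correlation between day t and day t + lag, for every t with a partner day."""
    return [flow_correlation(series[t], series[t + lag]) for t in range(len(series) - lag)]
```

Because the correlation is scale-invariant, a uniform drop in travel (as under restrictions) leaves it unchanged. Only shifts in the relative pattern of flows lower it.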

S2. CORRELATING RISK ESTIMATES TO CASE DATA
We used Spearman's rank correlation to investigate the correspondence between our relative risk estimates and documented case data. This measure of correlation is typically used when comparing ordinal data or, more generally, when a monotonic relationship is expected but errors are not normally distributed. To assess how monotonically reported case numbers vary with relative risk estimates, we aligned the documented case data for all regions in which infections had been tabulated with the corresponding relative risk estimates for those regions. Note that our correlations did not include regions for which no case data was available.
Therefore, our correlation results illustrate the degree to which risk estimates are monotonic with case numbers, but do not account for any risk estimates made in areas with no cases to compare to. This leads to a high degree of uncertainty when the number of affected areas is small. The uncertainty is reflected in the wide confidence intervals observed in the early stages of the Cedar Meats and Crossroads Hotel outbreaks (Figures 3 and 4a, respectively).
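The correlation step can be sketched in plain Python. Spearman's coefficient is computed as the Pearson correlation of ranks, with tied values given their average rank, over only those regions that have case data. The dictionary inputs and function names are hypothetical. In practice a library routine such as `scipy.stats.spearmanr` gives the same coefficient.

```python
import math
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_on_case_regions(risk: Dict[str, float], cases: Dict[str, float]) -> float:
    """Spearman's rho over regions that have both a risk estimate and case data."""
    # Regions without case data are excluded, so their risk estimates are never tested.
    regions = sorted(r for r in cases if r in risk)
    return pearson(
        average_ranks([risk[r] for r in regions]),
        average_ranks([cases[r] for r in regions]),
    )
```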
The 95% confidence intervals were computed using Fisher's Z transformation with a critical value of z = 1.96 (corresponding to α = 0.05).
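The interval can be sketched in three steps. The correlation r is mapped to z = artanh(r). A symmetric interval of ±1.96 standard errors is then formed, using standard error 1/√(n − 3), where n is the number of regions with case data. Finally, the endpoints are mapped back with tanh. This standard-error formula is the usual one for Pearson coefficients. Whether any Spearman-specific adjustment was used is not stated, so treat this as an assumption.

```python
import math
from typing import Tuple


def fisher_ci(r: float, n: int, z_crit: float = 1.96) -> Tuple[float, float]:
    """Approximate CI for a correlation r computed from n paired observations.

    z_crit = 1.96 gives a two-sided 95% interval (alpha = 0.05).
    Requires -1 < r < 1 (math.atanh raises ValueError at +/-1) and n > 3.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r)             # Fisher's Z transformation
    se = 1.0 / math.sqrt(n - 3)   # standard error on the Z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

For r = 0.5, 28 regions give roughly (0.16, 0.74), while 103 regions narrow the interval to roughly (0.34, 0.63). This is why few affected areas produce wide intervals.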

S3. ABS DATA SOURCES
Two data sets from the Australian Bureau of Statistics were used in this study: 1) number of residents by industry of occupation (2016), and 2) resident population (2018).

A. Population by LGA
The distributions shown in Figure S1 were computed by dividing the number of Facebook users indexed to each LGA during the nighttime period by the resident population in each LGA. We obtained the population data from the ABS 2018 population dataset, which is publicly available. To compute the factors used to weight the mobility-based relative risk predictions, we divided the total number of workers in both of the above categories by the number of employed persons (those employed full-time or part-time) in each LGA. The employment figures were also drawn from the 2016 Australian Census via Census TableBuilder.
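The weighting factor can be sketched as the share of employed persons in each LGA who work in the selected industry categories. The category keys in the sketch are hypothetical placeholders, since the categories themselves are not named here. The multiplicative use of the factor in `weight_risk` is also an assumption. The text says only that the factors weight the relative risk predictions.

```python
from typing import Dict


def industry_weights(
    selected_workers: Dict[str, Dict[str, int]], employed: Dict[str, int]
) -> Dict[str, float]:
    """Share of employed persons in each LGA who work in the selected categories.

    selected_workers: LGA -> {category: residents working in it} (hypothetical layout).
    employed:         LGA -> employed persons (full-time or part-time).
    LGAs with no (or zero) employment entry are skipped.
    """
    return {
        lga: sum(by_category.values()) / employed[lga]
        for lga, by_category in selected_workers.items()
        if employed.get(lga)
    }


def weight_risk(risk: Dict[str, float], weights: Dict[str, float]) -> Dict[str, float]:
    """Assumed multiplicative weighting of mobility-based relative risk."""
    return {lga: r * weights[lga] for lga, r in risk.items() if lga in weights}
```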

S4. CASE DATA
COVID-19 case data by local government area is available from Australian jurisdictional health authorities. For this work, we used data provided by NSW Health [4] (all publicly available) and by the Victorian Department of Health and Human Services (DHHS). The data used for the Cedar Meats outbreak scenario was obtained from DHHS through a formal request to the Victorian Agency for Health Information (VAHI) and cannot be made public in this work. The case data by LGA used to evaluate the Victorian community transmission scenario was taken directly from the COVID-19 daily update archives available on the DHHS public website [5].

S5. DESCRIPTION OF SUPPLEMENTAL DATA
• Time series of total case incidence for the Crossroads Hotel and Cedar Meats studies
• Correlation values used in Figure 5