The spatial dissemination of COVID-19 and associated socio-economic consequences

The ongoing coronavirus disease 2019 (COVID-19) pandemic has wreaked havoc worldwide, with millions of lives claimed, human travel restricted and economic development halted. Leveraging city-level mobility and case data, our analysis shows that the spatial dissemination of COVID-19 can be well explained by a local diffusion process in the mobility network rather than a global diffusion process, indicating the effectiveness of the implemented disease prevention and control measures. Based on the constructed case prediction model, we estimate that the social consequences could have been very different had the COVID-19 outbreak occurred in a different area. During the epidemic control period, human mobility was substantially reduced, and the mobility network underwent remarkable local and global structural changes that helped contain the spread of COVID-19. Our work has important implications for disease mitigation and for evaluating the socio-economic consequences of COVID-19 on society.


Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19), was identified in Wuhan (the provincial capital of Hubei province) in December 2019. It then spread across mainland China, coinciding with the mass human migration of the Spring Festival period [1,2]. Given the scale of this migration and the position of Wuhan in the national transportation network, containing the dissemination of SARS-CoV-2 became both urgent and very challenging.
As the Lunar New Year approached, a series of disease prevention and control measures were implemented, which effectively contained the COVID-19 outbreak in early 2020 [2–5]. For example, people were encouraged to stay at home during a 14-day nationwide epidemic control period around the Lunar New Year. After 9 February 2020, an orderly economic reopening began in most areas, owing to the notable progress in epidemic control.
Human movements are thought to play a crucial role in shaping the spatio-temporal transmission of infectious diseases [6–19]. To this end, a wealth of studies has investigated the relationship between human mobility and the spread of COVID-19, using statistical analysis [9–11,14,18,20–23] and epidemiological modelling [3,16,17,24,25]. In this paper, we complement these studies by addressing the spatial spread of COVID-19 from the perspective of network diffusion. Specifically, using human mobility and case data across more than 360 cities in mainland China, we construct a national human mobility network. We then assess how the spatial dissemination of COVID-19 is associated with mobility patterns, and how it might have differed had the outbreak occurred in a different area. Our analysis suggests that the spatial dissemination of COVID-19 in mainland China can be well explained by the human flow from Wuhan and the city population, which together constitute a local diffusion process in the mobility network. A global diffusion process, in which cities at central positions would tend to have more cases because infected people travel onward through them, explains it less well. This also indicates the effectiveness of the implemented disease prevention and control measures: most infected people were quarantined or isolated during the epidemic control period, which largely prevented further transmission to other areas. Building on these insights, we construct a simple case prediction model to estimate the potential social consequences if the outbreak had occurred in different areas. The estimation suggests that where the outbreak occurs would play an important role in shaping the spatial prevalence of COVID-19.
'The COVID-19 pandemic is far more than a health crisis: it is affecting societies and economies at their core' [26]. The implemented disease prevention and control measures significantly changed the course of COVID-19 spreading. They also triggered substantial changes in human mobility and forced a re-evaluation of social and economic development [26–30]. Although several valuable attempts have been made in this field [27–29,31–37], empirical evaluations of the socio-economic impacts of COVID-19 on society remain scarce. Based on the collected human mobility data, this paper further presents an empirical assessment of the social changes in response to COVID-19. Specifically, we observe a long-lasting reduction of mass migration: human movements were reduced substantially during the epidemic control period and resumed steadily after the reopening. The human mobility network also experienced striking structural changes, with the average path length increasing drastically and the average degree decreasing substantially during the epidemic control period. The human mobility network provides the primary pathway along which infectious diseases are transmitted from one city to another. These changes would therefore, in turn, contribute substantially to containing the spread of COVID-19 [24,28,38]. Our study helps to explain the spatial dissemination of COVID-19. It could also shed light on disease-spread modelling and on the evaluation of socio-economic consequences in the post-epidemic period.

Human mobility network
The human mobility data were collected from the Baidu Migration platform [39], which is curated by the Chinese search engine Baidu based on its location-based services. This platform reports relative daily human movements (the Baidu Migration Index) rather than the exact number of travellers across cities and provinces in mainland China. We collected human flow data for 366 cities at the municipal level, covering most of mainland China. The national human mobility network is then constructed from the human movements across cities (see Methods and electronic supplementary material for details). Figure 1a illustrates the aggregated human mobility network from 1 January 2020 to 23 January 2020, with nodes representing cities and edges representing human flows between them. Cities are placed according to their geographical coordinates, and node and label sizes are proportional to each city's weighted degree in the constructed mobility network. Cities in Hubei province and the human migration from them are highlighted in colour. The corresponding migration data are presented in figure 1b, where cities in the same province are placed together and darker colours indicate larger flows. For ease of visualization, only province names are shown, and each provincial capital appears first in its provincial block. As shown in the figure, most of the large values are concentrated around the diagonal of the migration matrix. This suggests a clustered structure of human mobility, in which human movements primarily circulate among cities in the same province.
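The network construction described above can be sketched as follows. This is a minimal illustration under several assumptions. The record format (origin, destination, date, flow index) and the function names are hypothetical, and the values are toy numbers rather than real Baidu Migration data. Daily directed flows within the chosen window are summed into edge weights. A city's weighted degree is taken to be its total inflow plus outflow, which is our assumption for a directed network.

```python
from collections import defaultdict
from datetime import date


def aggregate_network(records, start, end):
    """Sum daily directed flows within [start, end] into edge weights.

    Each record is a hypothetical (origin, destination, day, flow) tuple, where
    flow is a relative migration index rather than a count of travellers.
    """
    weights = defaultdict(float)
    for origin, dest, day, flow in records:
        if start <= day <= end and origin != dest:  # skip self-loops
            weights[(origin, dest)] += flow
    return dict(weights)


def weighted_degree(weights):
    """Weighted degree of each city: total inflow plus total outflow."""
    strength = defaultdict(float)
    for (origin, dest), w in weights.items():
        strength[origin] += w
        strength[dest] += w
    return dict(strength)


# Toy records (not real Baidu Migration data).
records = [
    ("Wuhan", "Xiaogan", date(2020, 1, 10), 3.0),
    ("Wuhan", "Xiaogan", date(2020, 1, 20), 4.0),
    ("Xiaogan", "Wuhan", date(2020, 1, 20), 2.5),
    ("Wuhan", "Beijing", date(2020, 1, 25), 1.0),  # outside the window
]
net = aggregate_network(records, date(2020, 1, 1), date(2020, 1, 23))
print(net)                   # {('Wuhan', 'Xiaogan'): 7.0, ('Xiaogan', 'Wuhan'): 2.5}
print(weighted_degree(net))  # {'Wuhan': 9.5, 'Xiaogan': 9.5}
```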
To contain the spread of COVID-19, Wuhan was put on lockdown on 23 January 2020, 2 days before the Lunar New Year. Shortly afterwards, similar epidemic control measures were implemented in many other cities in Hubei province. As shown in figure 1e, the lockdown drastically reduced the population flow from Wuhan to other areas. For example, compared with the same lunar-calendar dates in 2019, human migration from Wuhan dropped by about 75% on the first day (25 January 2020) and by about 90% on the third day (27 January 2020) of the Lunar New Year. Figure 1c,d presents two snapshots of the daily human mobility network before (16 January 2020) and after (26 January 2020) the lockdown. Clearly, the epidemic control measures effectively cut off the social connections between Hubei and other areas.
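The year-on-year comparison above requires aligning each 2020 date with the 2019 date that falls the same number of days from the Lunar New Year. The Lunar New Year fell on 25 January 2020 and 5 February 2019. The following is a minimal sketch, assuming the migration index is stored in dictionaries keyed by date. The function names are hypothetical, and the index values are toy numbers chosen to reproduce a 75% drop, not real data.

```python
from datetime import date

# First day of the Lunar New Year in each year.
LUNAR_NEW_YEAR = {2019: date(2019, 2, 5), 2020: date(2020, 1, 25)}


def lunar_aligned_date(day_2020):
    """Return the 2019 date with the same offset from the Lunar New Year."""
    offset = day_2020 - LUNAR_NEW_YEAR[2020]  # a timedelta
    return LUNAR_NEW_YEAR[2019] + offset


def year_on_year_change(index_2020, index_2019, day_2020):
    """Relative change of a daily migration index versus the lunar-aligned 2019 day."""
    base = index_2019[lunar_aligned_date(day_2020)]
    return (index_2020[day_2020] - base) / base


# Toy index values (not real data): a fall from 8.0 to 2.0 is a 75% drop.
index_2019 = {date(2019, 2, 5): 8.0}
index_2020 = {date(2020, 1, 25): 2.0}
print(year_on_year_change(index_2020, index_2019, date(2020, 1, 25)))  # -0.75
```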

The spatial dissemination of COVID-19
Catalysed by the annual Spring Festival Travel Rush and improved clinical testing capacity, the number of confirmed COVID-19 cases escalated with the arrival of the Lunar New Year (figure 2a). In 2019, the Travel Rush involved as many as three billion trips in a 40-day period. Consistent with previous studies [9–11,18,23], we find that the spatial prevalence of COVID-19 in mainland China can be well explained by the human flow from Wuhan during 1–23 January 2020, as measured by R² (figure 2b). For a given date, the R² value is obtained from a univariate ordinary least squares (OLS) regression. The dependent variable is the number of cumulative cases on that day (log-transformed), and the explanatory variable is the human flow from Wuhan (log-transformed) (see electronic supplementary material, Note 2 for further details). In particular, R² reaches approximately 0.8 from 31 January 2020 onwards.
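The daily R² computation can be sketched with NumPy alone. We assume a log(1 + x) transform so that cities with zero cases remain in the regression. The exact transform used in the study is described in the electronic supplementary material and may differ, and the function names are hypothetical.

```python
import numpy as np


def r_squared_log_ols(flow_from_wuhan, cumulative_cases):
    """R^2 of a univariate OLS fit of log(1 + cases) on log(1 + flow), with intercept."""
    x = np.log1p(np.asarray(flow_from_wuhan, dtype=float))
    y = np.log1p(np.asarray(cumulative_cases, dtype=float))
    X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())  # assumes cases vary across cities
    return 1.0 - ss_res / ss_tot


def daily_r_squared(flow_from_wuhan, cases_by_day):
    """Fit the regression separately on each day's cross-section of cities."""
    return {day: r_squared_log_ols(flow_from_wuhan, cases)
            for day, cases in cases_by_day.items()}
```

The flow from Wuhan is fixed (aggregated over 1–23 January), while the case counts change from day to day. This is why only the dependent variable varies across the daily fits.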
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 19: 20210662
We further adopt a multivariate regression model that incorporates more city-specific factors. These include the global centrality of a city in the mobility network (measured by Pagerank [40,41]), city population, the spatial distance to Wuhan, intra-city activity intensity (provided by Baidu) and city tier. From a network perspective, human flow from Wuhan captures a local diffusion process of COVID-19 from Wuhan to neighbouring areas in the mobility network. Pagerank, by contrast, reflects a global network diffusion process involving multi-step transmission across areas (electronic supplementary material, Note 2, figure S3). A direct comparison between human flow from Wuhan and the global centrality of a city (Pagerank) can therefore address the following question: which diffusion process, local or global, dominates the spatial dissemination of COVID-19? Using the variables described above, we fit both OLS and negative binomial regression models (see electronic supplementary material for further details). Figure 2c illustrates the estimated coefficients for each variable in predicting the spatial distribution of cumulative COVID-19 cases on 9 February 2020. We find consistent evidence that both human flow from Wuhan and city population are significant positive predictors (p < 0.001) (figure 2c). In other words, cities receiving larger volumes of human migration from Wuhan and cities with larger populations are likely to have more confirmed cases. More importantly, classic complex network spreading theory would hypothesize that cities located at central positions in the mobility network are generally more vulnerable to infectious diseases.
However, our study reveals that the global network centrality of a city (measured by Pagerank) is positively correlated with the number of confirmed cases (Spearman's $r_s$ = 0.6698, p < 0.001). Yet once human flow from Wuhan and city population are controlled for in the regression, its positive role in predicting cumulative COVID-19 cases disappears (figure 2c and electronic supplementary material, Note 2).
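The attenuation described above can be illustrated with a small synthetic example. The sketch below is not the study's analysis, and the negative binomial model is not reproduced.

- It generates toy data in which a hypothetical "pagerank" variable is correlated with log flow from Wuhan and log population but has no direct effect on log cases.
- It computes Spearman's rank correlation as the Pearson correlation of average ranks.
- It fits a multivariate OLS model on z-scored predictors using only NumPy.

All variable names, coefficients and the random seed are illustrative assumptions.

```python
import numpy as np


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def ols_coefficients(y, predictors):
    """Multivariate OLS with an intercept on z-scored predictors: {name: coefficient}."""
    Z = np.column_stack([(p - p.mean()) / p.std() for p in predictors.values()])
    X = np.column_stack([np.ones(len(y)), Z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(predictors, beta[1:]))


# Synthetic data (not the study's data): "pagerank" tracks flow and population
# but has no direct effect on cases.
rng = np.random.default_rng(0)
n = 366
log_flow = rng.normal(size=n)
log_pop = rng.normal(size=n)
pagerank = 0.6 * log_flow + 0.6 * log_pop + 0.5 * rng.normal(size=n)
log_cases = 1.0 * log_flow + 0.8 * log_pop + 0.3 * rng.normal(size=n)

print(spearman(pagerank, log_cases))  # clearly positive on its own
coefs = ols_coefficients(log_cases, {"log_flow": log_flow, "log_pop": log_pop,
                                     "pagerank": pagerank})
print(coefs)  # the pagerank coefficient is close to 0 once flow and population enter
```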
This finding suggests that the spatial dissemination of COVID-19 in mainland China is well explained by a local network diffusion process, which extends only one step from the outbreak area in the mobility network, rather than by a global network diffusion process. It also implies that the implemented control measures were effective. Most infected people were quarantined and isolated during the epidemic control period, which largely prevented further transmission to other areas. In other words, without effective control measures, a global network diffusion process of COVID-19 might have emerged. Cities at central positions might then have had many more infections owing to the movement of infected people across areas.
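The following sketch makes the local/global distinction concrete. It contrasts a one-step exposure, which is the direct flow out of the outbreak city, with a multi-step exposure. The multi-step exposure sums damped random-walk visit weights over several moves through a row-normalized flow matrix, in the spirit of Katz- or Pagerank-type measures. This construction is an illustrative assumption, not the model used in this study. The damping factor `alpha`, the number of `steps` and the function names are hypothetical.

```python
import numpy as np


def transition_matrix(flows):
    """Row-normalize a flow matrix: row i gives where travellers from city i go."""
    flows = np.asarray(flows, dtype=float)
    row_sums = flows.sum(axis=1, keepdims=True)
    return np.divide(flows, row_sums, out=np.zeros_like(flows), where=row_sums > 0)


def local_exposure(flows, source):
    """One-step (local) exposure: the direct flow out of the outbreak city."""
    return np.asarray(flows, dtype=float)[source]


def multistep_exposure(flows, source, alpha=0.5, steps=20):
    """Multi-step (global) exposure: sum over k = 1..steps of alpha**k * (P**k)[source]."""
    P = transition_matrix(flows)
    state = np.zeros(P.shape[0])
    state[source] = 1.0
    total = np.zeros_like(state)
    for k in range(1, steps + 1):
        state = state @ P  # visit weights after k moves starting from the source
        total += alpha ** k * state
    return total


# Toy chain A -> B -> C (indices 0, 1, 2).
flows = [[0, 10, 0], [0, 0, 10], [0, 0, 0]]
print(local_exposure(flows, 0))      # C receives no direct flow from A
print(multistep_exposure(flows, 0))  # C is reached via B, so its exposure is positive
```

Under a purely local process, city C would be unaffected by an outbreak in A. A multi-step process assigns it a positive exposure because travellers can pass through B.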

COVID-19 outbreak in different areas
Based on the insights gained above, we estimate what would have happened if the COVID-19 outbreak had occurred in different areas. We focus on several key factors that help to predict the prevalence of COVID-19: human flow from the outbreak city, distance to the outbreak city, city population and intra-city activity intensity. Figure 3a presents the population outflow index of nine example cities in January 2020. Some of these cities (e.g. Beijing, Shanghai and Guangzhou) had higher population outflow than Wuhan, while others (e.g. Changsha and Shenyang) had relatively lower population outflow. For the outbreak in Wuhan, we construct a negative binomial regression model, with the cumulative number of cases on 9 February 2020 as the dependent variable and the above key factors as the independent variables. The fitted model characterizes the spatial spread pattern of COVID-19 in terms of these factors. Assuming that the control measures and the spatial spread pattern remain the same, the constructed model can roughly estimate the spatial prevalence of COVID-19 for a different outbreak area (see electronic supplementary material, Note 3 for further details). Figure 3b illustrates the estimated cumulative cases (excluding the outbreak area) as of 9 February 2020 for each outbreak area. The vertical dashed line indicates the actual cumulative number of confirmed cases in cities other than Wuhan on 9 February 2020 (23 236) and serves as the baseline. The relative change of cumulative cases for each outbreak area, compared with the baseline, is shown as a percentage. As shown in the figure, the number of confirmed cases could have nearly doubled had the outbreak happened in cities such as Beijing and Guangzhou. Had it occurred in cities such as Shenyang and Nanchang, the number could have been reduced by nearly half.
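The counterfactual step recomputes each city's flow from, and distance to, the new outbreak city. It then sums the model's expected counts over all other cities. The following is a minimal sketch, assuming a log-link count model with log(1 + flow), log distance, log population and a linear activity term. This functional form, the placeholder coefficients and the input layout are all assumptions, not the study's fitted negative binomial estimates. The baseline of 23 236 is the observed number of cumulative cases outside Wuhan on 9 February 2020, as quoted above.

```python
import math

# Placeholder coefficients for a log-link count model (hypothetical, not the
# study's fitted negative binomial estimates).
COEFS = {"intercept": -2.0, "log_flow": 0.9, "log_distance": -0.3,
         "log_population": 0.7, "activity": 0.2}


def counterfactual_total(outbreak, flows, distances, population, activity, coefs=COEFS):
    """Expected cumulative cases summed over every city except the outbreak city.

    flows[o][c] is the flow index from outbreak city o to city c, and
    distances[o][c] is their distance in km (both hypothetical inputs).
    """
    total = 0.0
    for city, pop in population.items():
        if city == outbreak:
            continue
        eta = (coefs["intercept"]
               + coefs["log_flow"] * math.log1p(flows.get(outbreak, {}).get(city, 0.0))
               + coefs["log_distance"] * math.log(distances[outbreak][city])
               + coefs["log_population"] * math.log(pop)
               + coefs["activity"] * activity[city])
        total += math.exp(eta)  # log link: E[cases] = exp(linear predictor)
    return total


def change_vs_baseline(total, baseline=23236):
    """Relative change against the observed cases outside Wuhan on 9 February 2020."""
    return (total - baseline) / baseline
```

Because the coefficients stay fixed, only the outbreak-dependent inputs (flows and distances from the chosen city) change between scenarios. This mirrors the assumption that the control measures and the spatial spread pattern remain the same.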
This also suggests that where an outbreak occurs could play an important role in the spatial dissemination of COVID-19, which may have meaningful implications for the prevention of infectious diseases in the future.

Social changes
After the implementation of a series of epidemic control measures, human mobility underwent striking changes. Usually, we would expect human movements to recover after the Lunar New Year. In 2020, however, we observe that the national migration scale gradually decreased after the Lunar New Year until the Lantern Festival (close to the reopening). After the orderly economic reopening came into force, national migration steadily resumed.
Figure 2 (caption fragment). The R² value on each day is obtained by a univariate OLS regression of the number of cumulative cases (log-transformed) of each city on that day on the human flow from Wuhan (log-transformed). (c) Estimated coefficients from multiple OLS regression (circles) and negative binomial regression (squares), with error bars indicating 95% confidence intervals. Estimates whose 95% confidence intervals do not cross 0 are coloured.
We also observe remarkable local and global structural changes in the mobility network (figure 4b). First, after the epidemic control measures were implemented, the average degree of the mobility network decreased notably before the economic reopening. This local structural change reduced the connectivity of the mobility network and could help prevent the spread of the virus across areas. Second, the average path length of the mobility network increased substantially during the epidemic control period. This global structural change largely reduced the reachability of each area in the mobility network and could delay the spread of the virus from one place to another. Taken together, these mobility changes during the control period would, in turn, contribute to the mitigation of infectious diseases [28,42]. After the reopening, especially after 15 February 2020, we observe a steady recovery of network connectivity and reachability, reflecting the lifting of travel restrictions across the country.

Discussion
The COVID-19 pandemic is a serious crisis and a daunting challenge for the entire world. Our analysis shows that the spatial dissemination of COVID-19 in mainland China can be well explained by a local network diffusion process rather than a global network diffusion process, which implies that the implemented epidemic control measures were effective. We estimate that the social consequences could have been very different had the COVID-19 outbreak occurred elsewhere, which may have meaningful implications for future epidemic prevention and control. We also note a remarkable reduction in human movements during the epidemic control period, together with significant structural changes to the human mobility network that helped contain the spread of COVID-19. In summary, our work advances the understanding of how human mobility data and network analysis can be used to address the spread of infectious diseases. It also paves the way for applying data analytics to preventing and containing epidemics.
Our work has several limitations. First, most of our conclusions are drawn from correlational analyses of observational data and therefore do not establish causality. Second, the mobility data we adopted were collected by Baidu through its location-based services, so we cannot incorporate the movements of people who do not use such services. Other sources of mobility data are thus needed to strengthen the analysis. Third, because accurate timestamps of human movements are lacking, we do not know the exact departure and arrival times of each trip. The human mobility data may therefore contain travel delays; for example, some people may depart from a city on one day but arrive at their destination on the following day. Finally, this paper mainly investigates the spatial dissemination of COVID-19 in mainland China. Whether the proposed approach applies to other areas or to other kinds of infectious diseases requires further exploration.

Data
The human mobility data were sourced from the Baidu Migration platform [39], based on Baidu's location-based services. As the dominant search engine in China, Baidu has nearly 189 million daily active users and responds to more than 120 billion location service requests per day. Similar to previous studies [3,10], the mobility data do not indicate the absolute number of recorded trips but reflect the relative movements of people using Baidu's location-based services. We collected daily inter- and intra-city mobility data across 366 cities from 1 January to 15 March 2020 and for the corresponding period in 2019 (aligned by the Lunar New Year). For inter-city activity in 2019, only aggregated inflow and outflow data were provided for each city. The COVID-19 data were obtained from the daily case reports released by the Health Commission of each province and from NetEase News [43], a professional media platform that provides timely updates and serves as a supplementary source in our study. The population of each city was collected from the National Economic and Social Development Statistical Bulletin 2019. The spatial distance between two cities was computed as the geodesic distance between their latitude and longitude coordinates.
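A spherical great-circle (haversine) formula is a common approximation for the distance variable. The sketch below uses it with a mean Earth radius of 6371 km. This is a swapped-in approximation rather than the study's exact computation. An ellipsoidal geodesic, for example as computed by the geopy library, can differ from it by up to roughly 0.5%. The function and constant names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is about 111.19 km on this sphere.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 2))
```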

Network analysis
We adopt Pagerank [40,41], a classic global network centrality measure, to quantify how centrally a city is located in the mobility network. The human flow volume between two cities is used as the edge weight when computing Pagerank. The average degree measures the local connectivity of the mobility network and is calculated as the average number of incoming and outgoing edges per node. In the context of human migration, two cities are said to be close to each other if they share a large volume of human flow [28]. As such, we use the inverse of the human flow volume as the 'network distance' between two cities along each edge and compute shortest path lengths between cities on this basis. The average path length of the mobility network is obtained by averaging the shortest path lengths over all pairs of nodes. These network metrics were computed using the Python package networkx. In addition, given a vector V of quantities, each element $V_i$ is min-max normalized as $\tilde{V}_i = (V_i - V_{\min})/(V_{\max} - V_{\min})$, where $V_{\max}$ and $V_{\min}$ are the maximum and minimum values of V, respectively.
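The metrics above can also be sketched without dependencies. The study used networkx, whose `pagerank` and `average_shortest_path_length` functions provide the corresponding computations. The pure-Python versions below are illustrative re-implementations under the following assumptions:

- Edge weights are positive flow volumes.
- Dangling nodes redistribute their rank uniformly.
- The average degree counts in- plus out-edges, that is, 2E/N.
- Unreachable node pairs are skipped when averaging path lengths. networkx would instead raise an error for a graph that is not strongly connected.

The function names and toy flows are hypothetical.

```python
import heapq
from collections import defaultdict


def pagerank(weights, d=0.85, tol=1e-12, max_iter=1000):
    """Weighted Pagerank by power iteration; dangling nodes spread rank uniformly."""
    nodes = sorted({u for edge in weights for u in edge})
    n = len(nodes)
    out_w = defaultdict(float)
    for (u, _), w in weights.items():
        out_w[u] += w
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_w[u] == 0)
        new = {u: (1 - d) / n + d * dangling / n for u in nodes}
        for (u, v), w in weights.items():
            new[v] += d * rank[u] * w / out_w[u]  # rank flows in proportion to weight
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank


def average_degree(weights):
    """Mean of (in-degree + out-degree) over nodes, i.e. 2E / N for a directed graph."""
    nodes = {u for edge in weights for u in edge}
    return 2 * len(weights) / len(nodes)


def shortest_lengths(weights, source):
    """Dijkstra from source, using the 'network distance' 1 / flow on each edge."""
    adj = defaultdict(list)
    for (u, v), w in weights.items():
        adj[u].append((v, 1.0 / w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u]:
            if d_u + length < dist.get(v, float("inf")):
                dist[v] = d_u + length
                heapq.heappush(heap, (dist[v], v))
    return dist


def average_path_length(weights):
    """Mean shortest-path length over ordered pairs of distinct, reachable nodes."""
    nodes = {u for edge in weights for u in edge}
    lengths = [length for s in nodes
               for t, length in shortest_lengths(weights, s).items() if t != s]
    return sum(lengths) / len(lengths)


def min_max_normalize(values):
    """Rescale values to [0, 1]; assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy flows (not real data).
flows = {("A", "B"): 10, ("B", "A"): 10, ("B", "C"): 5,
         ("C", "B"): 5, ("A", "C"): 2, ("C", "A"): 2}
print(average_degree(flows))       # 4.0
print(average_path_length(flows))  # about 0.2 (A-C is shorter via B: 0.1 + 0.2 < 0.5)
pr = pagerank(flows)
print(min_max_normalize([pr[c] for c in ("A", "B", "C")]))
```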

Statistical analysis
Most of the data processing was done with the Python package pandas and the R package dplyr. Spearman rank correlation was computed with the Python package scipy. OLS regression analysis was performed with the Python package statsmodels and the R function lm, and negative binomial regression with the R package MASS and the Python package statsmodels. Further details on the statistical analysis can be found in the electronic supplementary material.
Data accessibility. The raw human mobility data can be obtained from Baidu Migration: https://qianxi.baidu.com/2020/. The processed data and the code developed in this study can be found in the associated GitHub repository: https://github.com/yflyzhang/spatial_COVID_19.