The time geography of segregation during working hours

While segregation is usually evaluated at the residential level, the recent influx of large streams of data describing urbanites’ movement across the city makes it possible to generate detailed descriptions of spatio-temporal segregation patterns across the activity space of individuals. For instance, segregation across the activity space is usually thought to be lower than residential segregation, given the importance of social complementarity, among other factors, in shaping the economies of cities. However, these new dynamic approaches to segregation pose important methodological challenges. This paper proposes a methodological framework to investigate segregation during working hours. Our approach combines three well-known mathematical tools: community detection algorithms, segregation metrics and random walk analysis. Using Santiago (Chile) as our model system, we build a detailed home–work commuting network from a large dataset of mobile phone pings and spatially partition the city into several communities. We then evaluate the probability that two persons at their work location will come from the same community. Finally, a randomization analysis of commuting distances and angles corroborates the description of Santiago as strongly segregated that is provided by the sociological literature. Our findings highlight the benefit of developing new approaches to understand dynamic processes in the urban environment. In addition, unveiling counterintuitive patterns such as segregation at the workplace provides a specific example in which the exposure dimension of segregation is successfully studied using the increasingly available streams of highly detailed, anonymized mobile phone records.


Sensitivity analysis of the community detection algorithm
The XDR dataset consists of four weeks (Monday to Friday) distributed over four months: March, May, October and November. We assessed how community detection changes with the working day chosen by comparing the results with the "aggregated network", that is, the network built from the full dataset. We generated five home-work networks using the procedure outlined in Section 2. The first network was created by aggregating the data of all Mondays (i.e. March 16, May 11, September 3, and November 23), the second network was produced with all available Thursdays, and so on. We then applied the community detection algorithm also explained in Section 2. Results show that six communities were always found in Santiago, irrespective of the day chosen for the analysis, and with a high match rate when compared with the aggregated network (see Figure S1). To quantify the correlation among the resulting communities, each of the six communities was labeled in the aggregated network. Then, for each detected community, we quantified the number of nodes remaining in the same community when compared with the aggregated case. The resulting correlations are shown in Table S1. Except for community B, which changes from the downtown zone in the aggregated case into a broader area in the case of Tuesday to Friday, the communities retained their ascription to the same well-defined zones (see Figure S1).

Table S1: Percentage of nodes retaining their same community ascription, compared to the aggregated network.

Figure S1: Comparing the communities obtained for each weekday network. The upper left map (outlined) is the aggregated network, i.e., the network created by taking all the data.
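The node-retention percentage described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function name `retention_by_community` and the dictionary inputs (tower id mapped to community label) are hypothetical, and it assumes that community labels in each weekday partition have already been matched to the labels of the aggregated network.

```python
from collections import Counter


def retention_by_community(aggregated, weekday):
    """Percentage of nodes of each aggregated community that keep the
    same community label in a weekday partition.

    Both arguments map a node id to a community label. Labels are
    assumed to be already aligned between the two partitions.
    """
    totals = Counter()  # nodes per aggregated community
    kept = Counter()    # nodes whose label is unchanged
    for node, label in aggregated.items():
        if node not in weekday:
            continue  # node absent from the weekday network
        totals[label] += 1
        if weekday[node] == label:
            kept[label] += 1
    return {label: 100.0 * kept[label] / totals[label] for label in totals}


# Toy example with four nodes and two communities.
aggregated = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
monday = {"t1": "A", "t2": "A", "t3": "B", "t4": "A"}
print(retention_by_community(aggregated, monday))
```

In the toy example, both nodes of community A keep their label, while one of the two nodes of community B moves to A.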

Random walk simulations
We also developed an algorithm to create new simulated workplaces for each user in our dataset. The algorithm generates random coordinates from the empirical distance and movement angle distributions of the community that the user belongs to. A pseudo-code description is provided in Algorithm 1.

Algorithm 1 Random walk simulations algorithm
1: procedure RandomWalks(communities, dataset)
2:     for each community comm in communities do
3:         dist_d, dist_θ ← get distance and angle movement probability distributions for community comm
4:     for each user u in dataset do
5:         X_h, Y_h ← get home coordinates of user u
6:         d, θ ← assign a random distance and angle of movement to user u, taken from the probability distributions dist_d(u) and dist_θ(u), respectively
7:         X_w, Y_w ← coordinates reached from (X_h, Y_h) by moving a distance d at angle θ
8:         if (X_w, Y_w) lies outside the urban boundary then
9:             goto line 6
10:        else
11:            assign the closest tower to point (X_w, Y_w) as the new workplace of user u
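As an illustration, the procedure can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the implementation used in the study. All names (`simulate_workplaces`, `inside`, `max_tries`) are hypothetical. Coordinates are assumed to be planar, angles are in radians, and distance and angle are drawn independently and with replacement from the observed values of the user's community. The `max_tries` guard is an addition that prevents an endless loop when no valid point can be found.

```python
import numpy as np


def simulate_workplaces(homes, community, dists, angles, towers, inside,
                        rng, max_tries=1000):
    """Assign a simulated workplace tower to each user.

    homes:     user -> (x, y) home coordinates (planar, assumed)
    community: user -> community label
    dists:     community -> observed commuting distances
    angles:    community -> observed commuting angles (radians)
    towers:    tower id -> (x, y) tower coordinates
    inside:    callable (x, y) -> True if the point is within the
               urban boundary (hypothetical helper)
    """
    tower_ids = list(towers)
    tower_xy = np.array([towers[t] for t in tower_ids], dtype=float)
    new_work = {}
    for user, (xh, yh) in homes.items():
        comm = community[user]
        for _ in range(max_tries):
            # Draw distance and angle from the community's empirical values.
            d = rng.choice(dists[comm])
            theta = rng.choice(angles[comm])
            xw = xh + d * np.cos(theta)
            yw = yh + d * np.sin(theta)
            if inside(xw, yw):
                # Snap the simulated point to the closest tower.
                gaps = np.hypot(tower_xy[:, 0] - xw, tower_xy[:, 1] - yw)
                new_work[user] = tower_ids[int(np.argmin(gaps))]
                break
            # Otherwise redraw, mirroring the "goto" step of Algorithm 1.
    return new_work


# Toy example: one user, one possible displacement, two towers.
rng = np.random.default_rng(0)
result = simulate_workplaces(
    homes={"u1": (0.0, 0.0)},
    community={"u1": "A"},
    dists={"A": [1.0]},
    angles={"A": [0.0]},
    towers={"T1": (1.0, 0.0), "T2": (5.0, 5.0)},
    inside=lambda x, y: abs(x) <= 10 and abs(y) <= 10,
    rng=rng,
)
print(result)
```

Users for whom no valid point is found within `max_tries` draws are left out of the result. A real application would need to decide how to treat them.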

Socioeconomic level classification from census data
We used the classification of socioeconomic level (SEL) proposed by ADIMARK (2009). Family households are classified into one of five categories: ABC1, C2, C3, D, and E, which in our work are relabeled as S1, S2, S3, S4, S5 and S6 for simplicity. ABC1 are the most affluent families. This labeling follows from the criteria described in Table S2. Hence, two dimensions drive the SEL classification: educational level and ownership of material goods. Both dimensions are directly extracted from census data. Each census block (i.e. geographic unit) is then represented by the most frequent SEL in that block.
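The last step, representing a census block by its most frequent SEL, amounts to taking the mode of the household labels. The sketch below is illustrative only. The function name `block_sel` and the list input are hypothetical, and ties are broken in favor of the label seen first, which is an assumption not stated in the text.

```python
from collections import Counter


def block_sel(household_sels):
    """Return the most frequent SEL label among the households of a block.

    Ties are resolved in favor of the label that appears first in the
    input (an assumption made for this sketch).
    """
    counts = Counter(household_sels)
    return counts.most_common(1)[0][0]


# Toy block with four households.
print(block_sel(["S2", "S1", "S2", "S4"]))
```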
Each census block $k$ may be represented by a vector $v_k$ of five components, one per socioeconomic level (SEL), with a value of one for the corresponding SEL and zeros in the other four places (each census block is represented by only one SEL). For example, if census block $k$ belongs to the S1 category, then $v_k = [1, 0, 0, 0, 0]$. Given a Voronoi cell $j$, with area $A_j$, we may obtain its socioeconomic composition $v_j$ as:

\[ v_j = \frac{1}{A_j} \sum_k A_{kj} \, v_k \]

where $A_{kj} = A_j \cap A_k$, the areal intersection. This ensures that area is taken into account, and the Voronoi cell composition is a mere aggregation of census blocks weighted by their area contribution. In Figure S2 we show an example, where five blocks (red rectangles) contribute to a Voronoi cell (in blue). Red filled areas correspond to the intersection areas $A_{kj}$. In this case, S1 and S2 are the main SEL. It is easy to see that the sum of the elements of $v_j$ is less than or equal to one.

Figure S2: Example of a Voronoi cell superposed with census blocks.

However, if we want to interpret the elements of $v_j$ as a proportion (or percentage) of a certain SEL, there is one more issue to deal with. As can be seen in Figure S2, the equality $\sum_k A_{kj} = A_j$ is not always fulfilled, because there are gaps between census blocks (red rectangles), so that the entire area of the Voronoi cell is not filled. These gaps usually correspond to streets, parks, or other non-residential areas. Nonetheless, one can easily correct this by redefining each vector $v_j$:

\[ v_j \leftarrow \frac{v_j}{\sum_s v_j^{(s)}} \]

where $v_j^{(s)}$ denotes the component of $v_j$ for SEL $s$. That is, to interpret area coverage as a percentage, we divide $v_j$ by the sum of its elements so that the resulting sum equals one.
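The area-weighted aggregation and the subsequent rescaling can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the code used in the study. The names `cell_composition` and `normalise` are hypothetical, and the intersection areas $A_{kj}$ are assumed to be precomputed (for example, with a GIS tool) and passed in as plain numbers.

```python
import numpy as np

N_SEL = 5  # five-component SEL vector, as in the text


def cell_composition(cell_area, blocks):
    """Area-weighted SEL composition of one Voronoi cell.

    blocks is a list of (sel_index, intersection_area) pairs, one per
    census block overlapping the cell. sel_index is 0 for S1, 1 for S2,
    and so on.
    """
    v = np.zeros(N_SEL)
    for sel_index, a_kj in blocks:
        # Each block contributes its one-hot SEL vector weighted by A_kj / A_j.
        v[sel_index] += a_kj / cell_area
    return v


def normalise(v):
    """Rescale v so that its elements sum to one (no-op for an empty cell)."""
    total = v.sum()
    return v / total if total > 0 else v


# Toy cell of area 100 overlapped by four blocks covering 80% of it.
v_j = cell_composition(100.0, [(0, 30.0), (1, 20.0), (1, 10.0), (4, 20.0)])
print(v_j.sum())       # below one because of gaps between blocks
print(normalise(v_j))  # proportions that sum to one
```

In this toy cell the blocks cover 80 of the 100 area units, so the raw vector sums to 0.8. After rescaling, S1 and S2 each account for 37.5% and S5 for 25%.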
Finally, we may obtain the SEL composition vector for an entire community i as follows: where the sum is taken over all the Voronoi cells j belonging to community i.