Measuring accessibility using gravity and radiation models

Since the presentation of the radiation model, much work has been done to compare its findings with those obtained from gravitational models. These comparisons always aim at measuring the accuracy with which the models reproduce the mobility described by origin–destination matrices. This has been done at different spatial scales using different datasets, and several versions of the models have been proposed to adjust to various spatial systems. However, the models, to our knowledge, have never been compared with respect to policy testing scenarios. For this reason, here we use the models to analyse the impact of the introduction of a new transportation network, a bus rapid transport system, in the city of Teresina in Brazil. We do this by measuring the estimated variation in the trip distribution, and formulate an accessibility to employment indicator for the different zones of the city. By comparing the results obtained with the two approaches, we are able to not only better assess the goodness of fit and the impact of this intervention, but also understand reasons for the systematic similarities and differences in their predictions.

DP, 0000-0002-4047-4628; MB, 0000-0002-9931-1305 Since the presentation of the radiation model, much work has been done to compare its findings with those obtained from gravitational models. These comparisons always aim at measuring the accuracy with which the models reproduce the mobility described by origin-destination matrices. This has been done at different spatial scales using different datasets, and several versions of the models have been proposed to adjust to various spatial systems. However, the models, to our knowledge, have never been compared with respect to policy testing scenarios. For this reason, here we use the models to analyse the impact of the introduction of a new transportation network, a bus rapid transport system, in the city of Teresina in Brazil. We do this by measuring the estimated variation in the trip distribution, and formulate an accessibility to employment indicator for the different zones of the city. By comparing the results obtained with the two approaches, we are able to not only better assess the goodness of fit and the impact of this intervention, but also understand reasons for the systematic similarities and differences in their predictions.

Introduction
Assessing the impact of new infrastructural projects is a challenging and demanding task that requires knowledge or estimates of the mobility of the individuals living in the city. Many models have been developed to this effect [1][2][3][4], focusing on different scales of the urban system, according to the quality of the data available (for a recent review on the subject refer to [5]). Traditionally, these models allocate trips from one geographical zone to another, according to estimates of where people live and work. Infrastructure projects are then assessed following changes in accessibility which are computed from the model's predictions network, and a rail service connecting the southeast zone to the city centre. The rail service operates sharing the infrastructure with freight trains on a single track in both directions, resulting in a sparse and low usage service. For this reason, it will not be considered for cost comparisons in this study. The present structure of the public transport system is thus non-hierarchical. The majority of the lines form a radial scheme, departing from the suburbs towards the city centre with a few services that directly connect zones in the suburbs. In recent years, several exclusive bus lanes in the central area have been constructed aiming at reducing travel time on congested roads.
As previously mentioned, in 2007 the municipality of Teresina conducted an origin -destination survey which was developed to analyse travel patterns and predict future scenarios [23]. This was population distribution employment distribution (a) BRT corridors and stops used for the Transport and Mobility Master Plan in 2008, which proposes a new system of public transport in the city. The original proposal suggests the implementation of a BRT system, splitting the existing routes and creating a hierarchical system of feeder, inter-terminal and trunk services. This will be composed of eight terminals connected through express bus corridors. The proposal aims to increase the effectiveness of public transportation in the city, improving the accessibility to jobs, education and public services. In this paper, through different measures of accessibilities, we evaluate the impact of such an infrastructure project. The population and employment distribution data are provided by Teresina city council's census. The main mobility data source is the STRANS/PMT (Teresina Transport Authority) database created through an origin-destination survey conducted in 2007 including the cities of Teresina and Timon [23]. The database contains household conditions, personal socio-economic information and travel diaries per person on the day before the survey was taken. The main dataset is based on the trip data (walking times, waiting times, travel times, mode, origin and destination zones and activities and trip costs) combined with disaggregated demographic and socio-economic information about the traveller (education level, income, gender, employment, age) and traveller's household data (traffic zone, comfort and deprivation variables-number of families, bedrooms, bathrooms, sewage system, access to water, energy consumption). The survey consists of 64 traffic zones in Teresina and Timon that coincide with the cities' districts which provide the opportunity to gather socio-demographic data from the national census. In total, the dataset contains 5177 journeys distributed across 138 households. The dataset also contains information about the number of employers, students and total population in each traffic zone. Geo-referenced data are also available for bus routes and stops for all routes in Teresina and Timon and the general transit feed specification (GTFS) for the public transportation in the city. Data about Timon's buses routes were taken from Moovit App. The Bus Journeys Dataset was also taken from STRANS/ PMT and contains 45 090 bus trips for a 24

Methods
In this section, we will present and briefly recount the details of the models we have used to estimate the impact of the introduction of the BRT system in Teresina and Timon, while also introducing the equations we have used to measure the accessibilities. As we will see, we have taken into account the journey to work distribution using both models and infrastructure to calculate the accessibilities.

The gravity model
In the gravity model approach, the flow from zone i to zone j is proportional to the opportunities, employment in this case, E j in destination j, and to the demand in origin i (which is represented by the population) P i , and weighted by the cost function f(c ij ). Given our approach, we consider the cost c ij of going from i to j as the expected time of travel using the public transportation network (see appendix for details on the travel time calculations), and the function as being an exponential decay of the form f(c ij ) ¼ e 2bc ij , where b is a parameter that has to be calibrated on data. Traditionally, to model the journey to work trip distribution, the total number of commuters is constrained (outflow), as are the employees arriving at work (inflow). This corresponds to the doubly constrained model, where the flows are described by the equation where A i and B j are two normalization constants that one has to solve iteratively. By imposing the constraints on the total outflow and on the total inflow X j T ij ¼ P i and X i T ij ¼ E j ð3:2Þ and following the procedure in [4], we get In its single-constrained version, where only the constraint on the outflow is kept, the flow from origin i and destination j in this case has the form where Z i is the normalization constant. Imposing the constraint on the outflow P j T ij ¼ P i leads to ð3:5Þ In the next section, we will use both forms to reproduce the origin-destination matrix and to calculate the accessibilities.

The radiation model
The original radiation model [14] takes its inspiration from the intervening opportunities model [1]. In this approach, the probability of commuting between two units i and j depends on the number of opportunities between the origin and the destination, rather than on their distance. In its original formulation, the radiation model made use of the population in each zone, using it also as a proxy for employment. The flow between zone i and j is therefore quantified as where P ij is the population in zones included in a radius of distance or travel time d ij , and excluding those of zones P i and P j . These represent the opportunities between them, and T i is the amount of commuters in i. As presented, the model in equation (3.6) was formulated to describe flows happening on large scales and the absence of parameters to calibrate makes the model hard to fit to smaller scales. For this reason, the form we have used is slightly different and following [21] we have added a normalization constant that takes into account the finite size of the system. Moreover, we have used the employment to characterize opportunities rather than the population. The flows between zones i and j are now described by where P is the total amount of population in the system, and where E ij is the amount of employment between zones i and j. As previously said, for our analysis we have used the expected travel time to measure the cost of travelling from one zone to the another, therefore as in [34], E ij here represents the opportunities within travel time c ij from i. We have also used the extended version of this model presented in [20] where a parameter a is introduced, and whose calibration makes the model adaptable to different spatial scales. The flow in this version is derived by combining the original radiation model with survival analysis [35] and in this context the probability of commuting from i to j is described by the equation The details of the calculations that lead to this form may be found in [20]. The flows in the extended version are the product of equation (3.8), the population in the origin zone i, and normalization term : ð3:9Þ One may note that as per our construction, equation (3.9) is constrained to meet the outflow P k T ik ¼ P i , but not the inflow P k T ki = E i which is correspondence in the singly constrained version of the gravity model.

Measures of accessibility
As mentioned in the introduction, in order to quantify the impact of the new infrastructure, we measure the accessibility predicted by the models before and after the introduction of the BRT, which is done by using both c old ij and the updated cost matrix c brt ij . Of course, the variation of accessibility induced by changing the cost matrix will have an impact on the population and employment distributions. A zone with an increased accessibility will see the house price rise, which will affect the resident population and consequently the employment. Despite these secondary effects definitely playing an important role in the planning of such rsos.royalsocietypublishing.org R. Soc. open sci. 5: 171668 public interventions, they emerge as a result of complex nonlinear interactions and are therefore hard to predict. For this reason, their description goes beyond the scope of this work, where we have concentrated on the immediate impact of the transportation network.
Accessibility has become a central concept in physical planning in the past decades [36][37][38][39][40], and many different definitions exist which depend on the specific application. In general, as stated in [39], accessibility associates some measure of opportunity at a place with the cost of actually realizing that opportunity, or in other words as the cost of getting to some place traded off against the benefits received once that place is reached. In [39], two main types of accessibility are defined: type 1 takes into account the locational behaviour described through models, with infrastructure only implicitly considered; type 2 considers the physical infrastructure and some generalized measure of the distance from the zone of interest to all others. For a zone i, we define accessibility of type 1 as where c ij is the cost of commuting from i ! j (this is a measured quantity which depends on the infrastructure), and T ij is the predicted flow and depends on the model used to calculate it. The accessibility in equation (3.10) quantifies the inverse of the average cost of commuting from the given area: high values of A i correspond to the flows happening with low values of c ij . We define, as a simple measurement of the infrastructure, the accessibility of type 2 as where E j are the opportunities (employment) in zone j, and N d is the total number of destinations. Once again, equation (3.11) is telling us the average benefit -cost ratio for zone i. As we can see, there is no modelling involved in this measure, and we have only exploited the cost matrix and the opportunities distribution. Comparing equations (3.10) and (3.11) allows us to understand the information benefit of adding a modelling layer to the analysis, especially given the use of the different kinds of model we have applied.

Results
In the first part of this section, we calibrate the models on the origin and destination matrix. To do so, we used Sørensen's index which estimates the degree of similarity between the observed and modelled trips. This is how usually models are compared: the model that yields the highest value of the index is then considered the most fit to describe the system. As mentioned here, we compare models simulating a policy testing scenario, and therefore regardless of the results obtained during the calibration we then use the models to predict the variation of the accessibility of each zone after the introduction of the new BRT corridors. These results are found in the second part of the section. Finally, we study in detail the characteristics of the zones whose predictions obtained using the double-constrained gravity and the extended radiation models are similar, and those for which they are different, in search of systematic differences. We chose the two models given the great similarity in their results.

Sørensen's index calibration
As we have seen, the survey used to calibrate the models counts only 138 households, and therefore does not contain information on trips from every zone to every other. For each origin zone, we only have information on trips to a limited number of destinations. As done in previous comparisons [18 -21], we calibrate our models using Sorensen's index [41], and exploit the dataset in [23]. The index is defined as  When repeating the calculation in equation (4.1) for the extended radiation model, we see how the maximum is asymptotically reached for a ! 0. Given that we are working at an intra-city scale this is not surprising, and in the electronic supplementary material of [20] (in §9) the authors have solved the model's form as equation (3.9) for this limit with these scales in mind. The equation that describes the flow from zone i to zone j for a ! 0 becomes lim a!0 :

ð4:2Þ
Perhaps surprisingly, the single-constrained gravity model is the one which performs the best with the index reaching I sgl ¼ 0.39, followed by the double-constrained where I dbl ¼ 0.38 and the extended radiation where I ext ¼ 0.34. The normalized radiation with no calibration process yielded a Sørensen index of I rad ¼ 0.22.

Accessibility variations after the introduction of the BRT
In order to quantify the impact of the introduction of the BRT, we have measured, for all zones in Teresina, the quantities in equations (3.10) and (3.11) using the cost matrices before the intervention c old ij and after c brt ij . To explicitly study the variation introduced by the BRT system, we have then analysed the ratio of the two quantities so that the zones with A var i . 0 are predicted to benefit from the intervention and vice versa. We repeat this process using the gravitational and radiation approach. The results are summarized in figure 3, where we show in detail all the quantities we have discussed and the spatial distribution of the predicted impact on the city's zoning system. In the top panel, we have shown A old and A var calculated with the various models. Given the difference in the characteristic values between the accessibilities of type 1 and 2, we only compare their predictions on the A var , where the variation of type 2 is represented by a black curve. We will refer to A dbl i , A sng i , A ext i and A rad i for the accessibilities calculated using the double-constrained gravity, single-constrained gravity and the extended and normalized radiation model, respectively.
What immediately catches the eye is the substantial agreement between the A ext i (blue curve) and A dbl i (red curve) before the BRT introduction. The R 2 values in the insets confirm this impression, with higher values between the two models than that measured between A dbl i and A sng i . This is quite unexpected, especially considering the fact that per construction, the extended radiation model is a singly constrained model. Very similar results apply when calculating the accessibilities on the new cost matrix c brt ij , these have not been shown to avoid redundant figures. Looking at the accessibility variations A var i , it is clear that all models predict two main peaks, one for zones 17 and 16, and another for zone 26. The first peak can be seen analysing both types of accessibility, A 1 i and A 2 i (black curve), while the peak in zone 26 is not captured by the measure of the infrastructure. The BRT map in figure 1 shows that a new BRT stop is planned near zone 26, so a spike in its accessibility is indeed reasonable. This seems to suggest that the behavioural layer introduced through the modelling of the flows adds more information to the simple analysis of infrastructure. The variation of the accessibility with current transport network predicted accessibility variation after BRT predicted variation gravity single constrained predicted variation gravity double constrained predicted variation extended radiation predicted variation radiation  Figure 3. In this figure, we show the accessibility to employment with A old i and A var i ¼ A brt i /A old i on the top left and top right, respectively, for each zone of the city of Teresina and Timon. As noted in the legends of the figures, the red (double) and orange (single) curves show the results obtained from the gravity models, while the blue (extended) and magenta (normalized) curves represent the results obtained with the radiation models. The black curve in the accessibility variation is the variation measured with respect to accessibility of type 2. In the insets, we present the R 2 values between the various curves. The maps in the bottom panels present the spatial distribution of A var obtained with all the models. rsos.royalsocietypublishing.org R. Soc. open sci. 5: 171668 accessibilities of type 1 predicted by all models has a high R 2 value (R 2 . 0.8), while the correlation between the models and A 2 is significantly lower.
To understand the spatial configuration of the results, we show heat maps of the A var on the map of the city in figure 3. The maps show that if the zones in the city centre are expected to benefit from the new transport network, the main benefits are located in the south zones 16 and 17, and in the centre north zones 26 and 27. We can see from the BRT map in figure 1 that these zones are close to where new bus stops will be positioned. The zones in the north of the city, despite being close to the new stops and served by a BRT, are expected to generate longer journeys to work, and therefore their accessibility will be lower. The zones in the district of Timon are not included in the BRT intervention, and it is therefore not surprising to find that their employment accessibilities decrease according to all the three models.
In general, we can say that the predictions obtained with the models are in good agreement. This is clear by looking at the high R 2 values obtained when comparing the curves, and it is especially true for the doubly constrained and the extended radiation models. Moreover, the normalized radiation's predictions are in great agreement with the other models, which was not expected considering the absence of a calibration process, and a low Sørensen index. To conclude, we can say that the model's predictions are coherent and, bearing in mind the spatial distribution of the BRT stops and corridors, also reasonable, and that the accessibility of type 1 seems a more appropriate measure for this study than the measure of type 2.

The doubly constrained gravity versus the extended radiation model
As portrayed in the introduction, one of the main objectives of this work is to analyse similarities and differences in the predictions made using the two approaches. Despite the great agreement found between the doubly constrained and the extended radiation models, several zones show contradictory results. We will now look into the properties of these specific zones, and check if there are similarities among them. We do this by comparing the predictions on the accessibility after the BRT introduction, A ext i and A dbl i , by analysing the difference in the rank of each zone, namely where r dbl i is the rank of zone i using the gravity and r ext i is the rank obtained using the radiation model. If Dr i . 0, this implies that r dbl i . r ext i , which means that the accessibility of zone i ranks higher if we use the gravitational model to calculate the accessibilities, and vice versa. The distribution of the quantity in equation (4.4) calculated using the two models is found in the top panel of figure 4. The red and blue areas highlight the zones for which the predictions vary (kDrk . 0), which are those whose characteristics we want to analyse.
By introducing a threshold on the rank difference, we can divide the zone into three categories: -Category 1: zones that rank higher when using the extended radiation model of a quantity T d : Dr i , -T d (blue in the figures). -Category 2: zones that rank higher when using the double-constrained model of a quantity: 2T d : Dr i . T d (red in the figures). -Category 3: zones whose rank is similar when using the two models: T d , Dr i , T d (white in the figures).
Of course for increasing values of T d , categories 1 and 2 contain zones of whose predictions vary considerably using the two models.
We have studied several characteristics of the zones that belong to each category: the population and employment distribution, their spatial distribution and their distance to employment. The population and employment distributions are completely comparable in all three categories, and for this reason we have not shown it in the figures. Furthermore, no precise spatial pattern emerges when projecting these categories on the city's zoning system (map in figure 4a). But, an interesting difference emerges when quantifying the distance to employment of the different categories, as we can see from figure 4b. What emerges is a clear tendency of the zones of category 1 to be closer to the opportunities (blue curves) than those of category 2 (red curves). By looking at the figure, it is clear how the blue curve tends to increase faster than the red one. Zones in category 1 tend to be surrounded by zones rich in employment, while category 2 neighbouring zones have less employment. This seems to imply that the extended radiation tends to give a higher weight to closer neighbours than the double-constrained gravity model, and probably depends on the different way the two models handle distance (metric vs rsos.royalsocietypublishing.org R. Soc. open sci. 5: 171668 topological). Moreover we can see from figure 4 how this tendency increases when increasing the difference threshold T d .

Discussion
In this work, we used two different approaches to test the impact of a new transportation system. In doing so, we achieved two results: on the one hand, we were able to identify which zones would benefit the most from such an infrastructure project by increasing their accessibility to jobs and services; and on the other, we were able to compare the performance of the gravity and the radiation models on real data. Interestingly, the agreement between the results obtained with all models is outstanding. This may suggest that the detailed reproduction of the observed distribution is not crucial in a city planning context with respect to measuring accessibility variations at the scale we are working at. Furthermore, we have compared the estimated impact on the accessibilities, using the type 1 and 2 definitions.
The type 1 accessibility seemed to be in better agreement with the expected impact. By looking at figure 3, we see that the zones with the highest estimated improvement, zones 15, 16 and 26, have low starting accessibilities and are positioned next to a BRT stop. The modelling of flows does seem to add information that the policy maker cannot extract simply by observing the differences in the infrastructure and the trip durations. With this in mind, the models that seem to better describe mobility are the extended radiation and the doubly constrained gravity model. We have seen how, despite considerable agreement, the predictions obtained by the two models differ for those zones with many opportunities around them and those with very few. Indeed, we have seen that the zones with a neighbourhood rich in opportunities perform better with the gravity model, and vice versa. This is an interesting result, which is potentially useful in urban planning scenarios. The results we  present do not take into consideration exogenous changes to the system as an indirect consequence of introducing the BRT network. This is because accessibility, population and employment distribution are inter-dependent, and nonlinear relationships emerge feeding back on each other. In this work, we have not taken into account these externalities and have only described how the new transportation network would impact the accessibility distribution. We leave the rest of the analysis to future work. That said, we may conclude by saying that these preliminary results show that the radiation model in its extended version could be a valid alternative to study urban mobility and test new transportation networks. More research on this topic would help us better understand how the two models could support each other.
Data accessibility. All the data used in this work can be found in the public repository, https://github.com/ ducciopiovani/Data-on-Teresina-, where we have stored the cost matrices and the origin and destination matrix in three separate files.