On the importance of trip destination for modeling individual human mobility patterns

Understanding human mobility patterns and reproducing them accurately is crucial in a wide range of applications, from public health to transport and urban planning. Still, the relationship between the effort individuals are willing to invest in a trip and the importance of its purpose is not taken into account in the individual mobility models in the literature. Here, we address this issue by introducing a model hypothesizing a relation between the importance of a trip and the distance traveled. In most practical cases, quantifying such importance is infeasible. We overcome this difficulty by focusing on shopping trips (for which we have empirical data) and by taking the price of items as a proxy. Our model is able to reproduce the empirically observed long-tailed distribution of travel distances and to explain the collapse of the curves for different price ranges. Our results show the presence of a genuine scaling relation controlled only by the mean distance traveled, which is connected, as hypothesized, to the item value.


INTRODUCTION
Individual human mobility is a complex phenomenon, involving various mechanisms interacting at different spatial and temporal scales. These dynamics are the product of individual behaviors, governed by decisions that may depend on multiple contextual factors such as economic resources, geography, culture, norms, habits or life experiences. However, beneath this apparent complexity lie remarkable temporal and spatial regularities in the way people travel and interact with their environment [1]. Results obtained in several studies based on dollar-bill tracking [2], mobile phone data [3], Twitter data [4,5], Foursquare data [6] and GPS data [7] suggest that the distance $\Delta r$ between two consecutive locations follows a heavy-tailed distribution well approximated by a Pareto function $P(\Delta r) \sim \Delta r^{-(1+\alpha)}$ with $0 < \alpha \le 1$. It has also been shown that individuals tend to be attracted by popular places [8,9] and to return to previously visited locations, which increases the predictability of individual human movements [10] and allows the identification of the most visited locations as well as the characterization of daily commuting patterns [11]. Individual human mobility patterns are also strongly influenced by geographical constraints [12], by individuals' socio-economic status [13-15] and by their social network [16-19].
Based on these empirical observations, several models have been proposed for modeling individual human mobility patterns. The simplest type describes human travel using Lévy flights and continuous-time random walks [2,20]. These models give accurate predictions but fail to reproduce some features, such as the individuals' tendency to revisit locations [3,20,21]. In [21], the authors propose a new model based on two generic mechanisms, exploration and preferential return, which decide whether an individual's next displacement leads to a new place or to a previously visited one. Going further in this direction, several models have been proposed to take into account diverse contextual factors such as the social context, urban geography and/or the type and popularity of locations [9,11,12,22].
Nonetheless, most of these models focus on long-term mobility and, most importantly, they do not take into account the characteristics of the destination, such as the travel purpose and its importance for the individual. Indeed, one can assume that the amount of time or money, or more generically the effort or "energy", that an individual is willing to invest in a trip depends on the value attached to its destination or objective. A basic trip purpose is commuting between home and work, which has been recorded in censuses for decades (in the US, for instance, since 1990). The introduction of new GPS-based technologies has made it possible to explore other trip purposes since the early 2000s [23,24]. The relationship between trip cost and destination importance has been postulated in transport economics and, more recently, in ecology, where travel cost methods are used to assess the value of a natural site based on the time and travel expenses that people incur to visit it [25,26]. However, without adequate empirical data sets to explicitly assess the "value" of a destination, this feature is rarely modeled at the individual scale.
Inspired by search processes for wild food resources in natural environments [7,27-29], we propose in this work an individual human mobility model that takes into account the value given to the trip destination. We test this model on two case studies (the metropolitan areas of Barcelona and Madrid) based on shopping trip data extracted from 40 million bank card transactions made by customers of the Banco Bilbao Vizcaya Argentaria (BBVA). We show that our model is able to reproduce and explain the relationship between the importance of the trip destination and the distance traveled observed in the data.
Figure 1. A schematic diagram of the model. At each step, the individual leaves his/her current location and moves in a random direction over a distance sampled from a Pareto distribution $P(l) = \alpha\, l_0^{\alpha} / l^{\alpha+1}$. If the new location falls outside the square boundaries, the sampling process is repeated. Depending on the value $v$ given to the trip destination, the individual then ends his/her journey with probability $p$. If the individual decides to end his/her journey, the final destination is drawn at random in a circle of radius $r$ around the last position (green circle).

The model
The proposed model can be interpreted as a search process that stops when a satisfying object (destination) has been found [30]. The rules of the model are outlined in Figure 1. We assume that each individual starts his/her trip at his/her current location (at home, for instance) and, at each step, moves in a random direction over a random distance sampled from a Pareto distribution $P(l) = \alpha\, l_0^{\alpha}\, l^{-\alpha-1}$, where $\alpha$ is the exponent and $l_0$ the minimum spatial scale considered. At each step, the possibility of ending the trip is represented by a probability $p$ of fulfilling the trip goal. Note that, unlike most of the models described in the introduction, our model does not explicitly take time into account, since only short-range mobility patterns are considered. We assume that the probability of stopping is related to the importance $v$ given to the trip goal. The higher the value $v$ associated with the objective of the trip, the longer the search process and the distance traveled can become. If the purpose of the trip is to buy an object, individuals would be willing to explore more shops or to travel further as the item price increases (buying a car requires more "energy" than buying a piece of bread). The number of steps in the search is related to the distance from home to the location where the goal was met, i.e., where the item was finally found/purchased.
Figure S4 in Appendix.
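The search rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: home is placed at the center of the square, the final destination is taken to be uniform inside the disc of radius r, and the default values of `l0`, `box` and `r` are placeholders chosen only so that the example runs.

```python
import numpy as np


def sample_pareto(rng, alpha, l0):
    """Draw one step length from P(l) = alpha * l0**alpha * l**(-alpha - 1), l >= l0."""
    u = 1.0 - rng.random()           # uniform in (0, 1], avoids division by zero
    return l0 * u ** (-1.0 / alpha)  # inverse-transform sampling


def simulate_trip(rng, p, alpha=0.7, l0=0.1, box=100.0, r=0.5):
    """Return the home-to-destination distance of one simulated trip.

    Assumptions made for illustration: home sits at the center of the square,
    and l0, box and r are placeholder values, not calibrated parameters.
    """
    half = box / 2.0
    pos = np.zeros(2)  # home (assumed to be the center of the square)
    while True:
        # Propose steps until one lands inside the square boundaries.
        while True:
            theta = rng.uniform(0.0, 2.0 * np.pi)
            step = sample_pareto(rng, alpha, l0)
            cand = pos + step * np.array([np.cos(theta), np.sin(theta)])
            if np.all(np.abs(cand) <= half):
                break
        pos = cand
        # After each accepted step, the search stops with probability p.
        if rng.random() < p:
            break
    # Final destination: uniform point in a disc of radius r around the last position.
    rho = r * np.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * np.pi)
    dest = pos + rho * np.array([np.cos(phi), np.sin(phi)])
    return float(np.hypot(dest[0], dest[1]))


def median_distance(p, n_trips=2000, seed=0, **kwargs):
    """Median home-to-destination distance over n_trips simulated trips."""
    rng = np.random.default_rng(seed)
    return float(np.median([simulate_trip(rng, p, **kwargs) for _ in range(n_trips)]))
```

Calling `median_distance(0.1)` and `median_distance(0.8)` gives a quick check that a smaller stopping probability yields a longer typical trip in this sketch.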

Data
To validate our model and, in particular, the assumption that there exists a relationship between travel cost (energy invested) and the importance given to the destination, we analyzed a dataset containing information about 40 million bank card transactions made by customers of the Banco Bilbao Vizcaya Argentaria (BBVA) in the provinces of Barcelona and Madrid in 2011 (see the Appendix for details). The cost associated with a trip is estimated by the distance between the user's place of residence and the location of the business in which the transaction occurred. The value $v$ given to the objective attained at the final destination is inferred from the amount of money spent.

RESULTS
First, we investigate the relationship between the cost associated with a trip and the importance given to its purpose. Figure 2 displays the probability density function of the distance between the users' home and the location of the business in which the transaction occurred, for the amount of money spent $v$ divided into five intervals. Several regimes can be observed. The probability of traveling a certain distance to make a purchase first increases, reaching a maximum between 500 m and 1 km; it then starts to decrease, slowly at first and then more rapidly, exhibiting a power-law-like decay. Finally, beyond 50 km the province boundaries act as a natural cutoff in the distribution (our data are limited to single provinces). The median $\bar{d}$ has been computed to characterize the impact of the amount of money spent on the distance distribution. It is interesting to note that the median increases with the amount of money spent, from about 2.25 km for the smallest amounts to 4 km for the highest ones, supporting our initial hypothesis (Figure 3b).
More interestingly, a scaling factor depending only on $\bar{d}$ can be used to collapse all the PDFs shown in Figure 2 onto a single curve (Figure 3a). This means that the mechanisms underlying trip generation are the same for all price ranges and the only difference is a characteristic distance $\bar{d}$, which is a function of the price $v$ of the item to purchase. Therefore, the energy that people invest in a trip is directly related to the importance of the trip destination, following the scaling relationship $\bar{d} \sim v^{\gamma}$. Moreover, this positive correlation between the two quantities is independent of the spatial distribution of users' places of residence and of their economic and sociodemographic characteristics (see the Appendix for details), which further supports the universality of the trip generation mechanisms.
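As an illustration of how the exponent of a relation of the form $\bar{d} \sim v^{\gamma}$ could be estimated, the sketch below fits a straight line in log-log space and rescales distances by the characteristic distance of their price range. The function names and the input values are hypothetical: the medians are generated from an exact power law only to exercise the fit and are not the values measured in the data.

```python
import numpy as np


def fit_scaling_exponent(v, d_med):
    """Least-squares fit of log(d_med) = gamma * log(v) + c; returns (gamma, c)."""
    log_v = np.log(np.asarray(v, dtype=float))
    log_d = np.log(np.asarray(d_med, dtype=float))
    gamma, c = np.polyfit(log_v, log_d, 1)  # slope first, then intercept
    return float(gamma), float(c)


def rescale_distances(d, d_med):
    """Express distances in units of the characteristic distance of their price range."""
    return np.asarray(d, dtype=float) / float(d_med)


# Hypothetical inputs: one representative amount per price interval (made up)
# and synthetic medians following an exact power law (not the paper's values).
v_bins = np.array([10.0, 30.0, 80.0, 150.0, 400.0])
d_med = 2.0 * v_bins ** 0.15
gamma, c = fit_scaling_exponent(v_bins, d_med)
```

With real data, `rescale_distances` would be applied separately to the distances of each price interval before estimating the PDFs, so that curves sharing a single functional form fall on top of each other.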
We now calibrate our model to fit the PDF of the distance between the users' place of residence and the location of the business. We first consider the distribution of all the amounts combined in order to calibrate the parameters related to the mobility and to the geographical shape separately. As can be seen in Figure 4a, the fit is quite good. The parameter $\alpha$, the exponent of the Pareto distribution, is equal to 0.7, which is consistent with values obtained in other studies [2-7]. We obtain a value of $p$ equal to 0.2. This value, which lies between 0 and 1, is inversely related to the energy that people are willing to invest in order to go shopping in the province of Barcelona.
Finally, we explore the behavior of $p$ as a function of the amount of money spent $v$. The results obtained are plotted in Figure 4a (inset). As expected, the value of $p$ decreases with increasing $v$, which implies that the distance traveled grows with the price of the item to purchase. Furthermore, we find that a scaling relation of the type $p \sim v^{-\beta}$ fits the data well. However, keep in mind that the model does not impose a given relation between $p$ and $v$; it is general, and different types of data may lead to different relationships (or exponents, if the power-law scaling holds). In our case, both $\bar{d}$ and $p$ can be expressed as scaling functions of $v$. It is, therefore, important to understand the relation between the direct observable in mobility, $\bar{d}$, and our model's $p$. If the basic displacement distribution had a finite second moment, i.e., if the movement were an ordinary random walk in 2D, it would be possible to find analytical approximations for the final distance. However, this task becomes complex for a Lévy flight with a finite number of steps. For this reason, we assume a relation $p \sim v^{-\beta}$ in the model, calculate $\bar{d}$ numerically for different $v$, estimate the exponent $\gamma$ from $\bar{d} \sim v^{\gamma}$ and compare the values of $\gamma$ obtained with those of the corresponding $\beta$.
The results of this exercise are shown in Figure 4b. The relation is initially nonlinear, becomes the identity for a range of exponent values and later saturates. The saturation in $\gamma$ is related to the box size: given that we are considering a limited square in space, if $\beta$ is large then $\bar{d}$ approaches the maximum possible scale and its exponent $\gamma$ no longer changes. The effect of increasing the box size is also shown in Figure 4b. In the infinite limit, the identity between $\gamma$ and $\beta$ is maintained. This result is consistent with the empirical observations made in Barcelona and Madrid: since $p \sim v^{-\beta}$ and $\bar{d} \sim v^{\gamma}$ with $\gamma = \beta$, the probability of stopping the journey is inversely proportional to the distance traveled ($p \sim 1/\bar{d}$).
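The numerical exercise described above (assume $p \sim v^{-\beta}$, simulate, then fit $\gamma$) can be sketched as follows. This is a simplified illustration rather than the procedure actually used for Figure 4b: home is assumed to sit at the center of the box, the final draw within radius $r$ is omitted, the normalization $p = 1$ at $v = 1$ is an arbitrary choice, and the default values of `l0` and `box` are placeholders.

```python
import numpy as np


def trip_distance(rng, p, alpha=0.7, l0=0.1, box=100.0):
    """Distance from home (assumed at the box center) to the stopping point of one search."""
    half, pos = box / 2.0, np.zeros(2)
    while True:
        theta = rng.uniform(0.0, 2.0 * np.pi)
        step = l0 * (1.0 - rng.random()) ** (-1.0 / alpha)  # Pareto step length
        cand = pos + step * np.array([np.cos(theta), np.sin(theta)])
        if np.any(np.abs(cand) > half):
            continue  # outside the box: resample the step
        pos = cand
        if rng.random() < p:  # stop with probability p
            return float(np.hypot(pos[0], pos[1]))


def gamma_from_beta(beta, v_values, n_trips=1000, seed=0, **kwargs):
    """Assume p = v**(-beta), simulate medians for each v, and fit d_med ~ v**gamma."""
    rng = np.random.default_rng(seed)
    medians = []
    for v in v_values:
        p = min(1.0, v ** (-beta))  # assumed normalization: p = 1 at v = 1
        trips = [trip_distance(rng, p, **kwargs) for _ in range(n_trips)]
        medians.append(np.median(trips))
    gamma, _ = np.polyfit(np.log(v_values), np.log(medians), 1)
    return float(gamma)
```

Repeating the call for several values of `beta` and of the `box` keyword is one way to explore, in this sketch, how the fitted exponent depends on the box size.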

DISCUSSION
In summary, we introduced a model of individual human mobility patterns that takes into account the importance of the trip destination. We showed that our model is able to reproduce the link between the cost of a trip and the importance of its purpose found in credit card data. The distribution of the distances between home and the location where the items are finally found follows a single universal curve, with a single parameter that accounts for the difference in price range ($\bar{d}$). The model is able to reproduce these behaviors and also to mimic the final scaling relation. These results shed new light on the modeling of human mobility patterns at the individual scale.
Comparing this relationship across two cities and assessing the effect of the users' characteristics and of the business category on the exponent (see the Appendix for more details) gives us good confidence in the robustness of the scaling relationship observed in the data. However, it will be important to evaluate our hypothesis and our model on case studies from other continents and on different data sources.

APPENDIX
Dataset
The dataset contains information about 40 million bank card transactions made by customers of the Banco Bilbao Vizcaya Argentaria (BBVA) in the provinces of Madrid and Barcelona in 2011. Each transaction is characterized by its amount (in euros) and the time at which it occurred. Each transaction is also linked to a customer and a business through anonymized customer and business IDs. Customers are identified by an anonymized customer ID, associated with sociodemographic characteristics (gender, age and occupation) and their postcode of residence. For convenience, we consider five age groups (]15, 30], ]30, 45], ]45, 60], ]60, 75], > 75) and five types of occupation (student, unemployed, employed, homemaker, and retired). In the same way, businesses are identified by an anonymized business ID, a business category (accommodation, automotive industry, bars and restaurants, etc.) and the geographical coordinates of the credit card terminal. Table S1 presents some basic statistics on the data collected. As mentioned in the main text, the cost associated with a trip is estimated by the distance between the user's postcode of residence (lon/lat coordinates of the centroid) and the location of the business in which the transaction occurred (lon/lat of the credit card terminal). The value $v$ given to the travel purpose is inferred from the amount of money spent. For both case studies, we only consider the credit card payments whose amount was below 500 euros. The PDF of the amount of money spent per transaction is displayed in Figure S1.
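For illustration, a distance between a postcode centroid and a card terminal given as lon/lat pairs can be computed with the haversine (great-circle) formula, as sketched below. The text does not state which distance formula was used, so the choice of haversine, the function name and the Earth radius value are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

At the scale of a single province, a planar approximation on projected coordinates would give very similar values; the spherical formula simply avoids choosing a projection.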

Effect of the users' characteristics on the exponent γ
The relationship between the median distance $\bar{d}$ and the amount of money spent $v$ can be well approximated by a log-linear function. However, we need to verify that this positive correlation between the two quantities is not the result of the spatial distribution of users and businesses in the two provinces. To ensure that this is not the case, we plot in Figure S2 the distribution of the exponent $\gamma$ estimated with a log-linear model for each postcode. The value of $\gamma$ is overall strictly greater than 0, suggesting that the positive correlation between $\bar{d}$ and $v$ does not depend on the user's postcode of residence. Next, the relationship between the distance traveled and the amount of money spent is investigated according to the users' economic and sociodemographic characteristics. The results are displayed in Table S2. Here again, the value of $\gamma$ is always strictly greater than 0. Finally, we need to verify that the positive correlation between the median distance $\bar{d}$ and the amount of money spent $v$ does not depend on the type of purchase (i.e., business category) in the two provinces. The different business categories and their shares of transactions are given in Table S3. The relationship between the distance traveled and the amount of money spent according to the business category is presented in Table S4. In most cases, the value of $\gamma$ is strictly greater than 0. Note that in some cases, such as the Restaurants business category, no log-linear relationship between $\bar{d}$ and $v$ has been found due to the presence of outliers (Figure S3).