Royal Society Open Science
Open AccessResearch article

A model to identify urban traffic congestion hotspots in complex networks

Albert Solé-Ribalta

Albert Solé-Ribalta

Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain

Internet Interdisciplinary Institute, Universitat Oberta de Catalunya, 08018 Barcelona, Catalonia, Spain

[email protected]

Google Scholar

Find this author on PubMed

Sergio Gómez

Sergio Gómez

Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain

Google Scholar

Find this author on PubMed

Alex Arenas

Alex Arenas

Departament d’Enginyeria Informàtica i Matemàtiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain

IPHES, Institut Catala de Paleoecologia Humana i Evolucio Social, 43007 Tarragona, Spain

Google Scholar

Find this author on PubMed



The rapid growth of population in urban areas is jeopardizing the mobility and air quality worldwide. One of the most notable problems arising is that of traffic congestion. With the advent of technologies able to sense real-time data about cities, and its public distribution for analysis, we are in place to forecast scenarios valuable for improvement and control. Here, we propose an idealized model, based on the critical phenomena arising in complex networks, that allows to analytically predict congestion hotspots in urban environments. Results on real cities’ road networks, considering, in some experiments, real traffic data, show that the proposed model is capable of identifying susceptible junctions that might become hotspots if mobility demand increases.

1. Introduction

Urban life is characterized by a huge mobility, mainly motorized. Amidst the complex urban management problems, there is a prevalent one: traffic congestion. Several approaches exist to efficiently design road networks [1] and routing strategies [2]; however, the establishment of collective actions, given the complex behaviour of drivers, to prevent or ameliorate urban traffic congestion is still at its dawn. Usually, congestion is not homogeneously distributed around all city areas but there are salient locations where congestion is settled. We call this locations congestion hotspots. These hotspots usually correspond to junctions and are problematic for the efficiency of the network as well as for the health of pedestrians and drivers. It has been shown [3] that drivers queuing in a traffic jam are the most affected individuals to car exhaust pollution inhalation. In addition, these hotspots are usually located in the city centre, magnifying the problem [4]. Assuming that congestion is an inevitable consequence of urban motorized areas, the challenge is to develop strategies towards a sustainable congestion regime at which delays and pollution are under control. The first step to confront congestion is the modelling and understanding of the congestion phenomena.

The modelling of traffic flows has been a prevalent hot topic since the late 1970s when Gipps’ model appeared [5]. Gipps’ model and other car-following models [6,7] have evidenced the necessity of modelling traffic flows to improve road network efficiency and also have shown how congestion severely affects the traffic flows. Since 10 years ago the complex networks community has also proposed stylized models to analyse the problem of traffic congestion in networks and design optimal topologies to avoid it [821]. The focus of attention of the previous works was the onset of congestion, which corresponds to a critical point in a phase transition, and how it depends on the topology of the network and the routing strategies used. However, the proper analysis of the system after the onset of congestion has remained analytically slippery. It is known that when a transportation network reaches congestion, the system becomes highly nonlinear, large fluctuation exists and the travel time and the number of vehicles queued at a junction diverge [16]. This phenomenon is equivalent to a phase transition in physics, and its modelling is challenging [2224]. Here, we propose an idealized model to predict the behaviour of transportation networks after the onset of congestion. The presented model is analytically tractable and can be iteratively solved up to convergence. To the best of our knowledge, this is the first analytical model that is able to give predictions beyond the onset of congestion. We present the model in terms of road transportation networks but it could also be applied to analyse other types of transportation networks, such as computer networks, business organizations or social networks.

2. Transportation balance equations

To identify congestion hotspots in urban environments, we propose a model based on the theory of critical (congestion) phenomena on complex networks. The model, that we call the microscopic congestion model (MCM), is a mechanistic model (yet simple) and analytically tractable. It is based on assuming that the growth of vehicles observed at each congested node of the networks is constant. This usually happens in real transportation networks at the stationary state. The assumption allows us to describe, with a set of balance equations (one for each node), the increment of vehicles in the junction queues and the number of vehicles arriving or traversing each junction from neighbouring junctions. Mathematically, the increment of the vehicles per unit time at every junction i of the city, Δqi, satisfies the following balance equation

where gi(t) is the average number of vehicles entering junction i from the area surrounding i at time t, σi(t) is the average number of vehicles that arrive to junction i from the adjacent links of that junction and di(t)∈[0,τi] corresponds to the average number of vehicles that actually finish in junction i or traverse towards other junctions. Note that the value of di is upper-bounded by the maximum number of vehicles τi that can traverse junction i in a time step. This simulates the physical constraints of the road network. A graphical explanation of the variables of the model is shown in figure 1.
Figure 1.

Figure 1. Illustration of the variables of the MCM model. (a) Vehicles entering junction i from the area surrounding i. (b) Vehicles entering junction i from its neighbouring junctions. (c) Vehicles leaving junction i, either to go to other neighbouring junctions or to finishing the trip in its surrounding area.

The system of equations (2.1) defined for every node i, is coupled through the incoming flux variables σi(t), that can be expressed as

where Pji(t) accounts for the routing strategy of the vehicles (probability of going from j to i), pj(t) stands for the probability of traversing junction j but not finishing at j and S is the number of nodes in the network.

For each junction i, the onset of congestion is determined by di=τi, meaning that the junction is behaving at its maximum capability of processing vehicles. Thus, for any flux generation (gi), routing strategy (Pij) and origin–destination probability distribution, equations (2.1) can be solved using an iterative approach to predict the increase of vehicles per unit time at each junction of the network (Δqi(t)) (see §3). The only hypothesis we use is that the system dynamics has reached a stationary state in which the growth of the queues is constant. It is worth commenting here that the MCM model considers a fixed average of new vehicles entering the system gi. However, gi certainly changes during daytime, with increasing values in rush hours and lower values during off-peak periods. MCM can easily consider evolving values of gi provided the time scale to reach the stationary state in the MCM (which is usually of the order of minutes in real traffic systems) is shorter than the rate of change in the evolution of gi (which is usually of the order of hours for the daily peaks).

3. Microscopic congestion model

Let node i denote a road junction, edge aij the road segment between junctions i and j, Nini and Nouti the sets of ingoing and outgoing neighbouring junctions of junction i, respectively, and S the number of junctions in the road network of the city. Incoming vehicles to node i at each time step can be of two types: those coming from other junctions Nini and those that start their trip with node i as their first crossed junction. We consider this second type of incoming vehicles as generated in node i. Our MCM describes at the stationary state the increment of the vehicles per unit time at every junction i of the city, Δqi, as

As described above, we decompose the incoming flux of vehicles σi to node i in the stationary state as

As vehicles just generated in a certain node are not affected by the congestion in the rest of the network, we separate their contributions in the computation of probabilities p and P. Thus, we decompose pi as

where the first term accounts for vehicles generated in node i (pgeni) whose destination is not i (ploci) and the second term accounts for vehicles not generated in i whose destination is not i (pexti). Supposing trips consist in travelling through two or more junctions, we have that ploci=1. Probability pgeni is equal to the fraction of vehicles generated in i with respect to the total number of incoming vehicles
Considering the distribution of origins, destinations, the routing strategy and the congestion in the network, probability pexti can be expressed in terms of the effective node betweenness B~i and the effective vehicle arrivals e~i (the number of vehicles with destination node i that arrive to node i at each time step)
The effective betweenness B~i of a node i accounts for the expected number of vehicles each node i receives per unit time considering the routing algorithm and the overall congestion of the network. See §3.2 for an extended description and computation of the effective node betweenness B~i and the effective vehicle arrivals e~i.

In the same spirit, we decompose the probability Pji that a vehicle waiting in node j goes to node i as

The first term corresponds to the routed vehicles generated in node j (prgenj) that go to node i (Plocji) and the second term to the routed vehicles not generated in j that go to node i (Pextji). Similarly as before, prgenj can be expressed as the rate between the vehicles generated in j and the total number of routed vehicles
and Plocji and Pextji can be computed in terms of the normalized effective edge betweenness of the network
where the computation of E~jiloc only considers paths that start on node j and E~jiext only considers paths that do not start on node j. Equivalently to the effective node betweenness B~i, computation of E~jiloc and E~jiext consider, if required, all congested junctions in the network, as described in a later section, as well as the distribution of the vehicle sources and destinations. Note that the sum of Elocji and Eextji corresponds to the classical edge betweenness. Moreover, Pji is an exact expression before and after the onset of congestion.

Eventually, the MCM is composed of a set of S equations (Δqi=gi+σidi), one for each junction, and, in principle, a set of 2S unknowns, Δqi and di for each junction. To see that the system is indeed determined, we need to note that for congested junctions Δqi>0 and, thus, after the transient state, di=τi. For the non-congested junctions, we have that Δqi=0 and consequently di=gi+σi. In conclusion, for any node i, either di=τi or di=gi+σi which reduces the number of unknowns to S.

To solve the model given a fixed generation rate gi, we start by considering that no junction is congested and we solve the set of equations equations (3.1)–(3.9) by iteration. It is possible that some nodes exceed their maximum routing rate. If this is the case, we set the node with maximum di as congested and we solve the system again. This process is repeated until no new junction exceeds its maximum routing rate.

3.1. Onset of congestion using the microscopic congestion model

Most of the works that consider static routing strategies assume that the generation rate of vehicles is the same for all nodes, gi=ρ. In that case, it is possible to compute the critical generation rate ρc such that for any generation rate ρ>ρc the network will not be able to route or absorb all the traffic [2530]. After this point is reached, the number of vehicles Q(t) in the network will grow proportionally with time, Q(t)∝t, as some of the vehicles get stacked in the queues of the nodes. This transition to the congested state is characterized using the following order parameter:

where 〈ΔQ〉 represents the average increment of vehicles per unit of time in the stationary state. Basically, the order parameter measures the ratio between in-transit and generated vehicles.

In the non-congested phase, the number of incoming and outgoing vehicles for each node can be computed in terms of the node’s algorithmic betweenness Bi [25]. In particular,

where the second term inside the parentheses accounts for the fact that, in our model, vehicles are also queued at the destination node, unlike in [25]. When no junction is congested, we have that Δqi=0 for all nodes and consequently
A node i becomes congested when it is required to process more vehicles than its maximum processing rate, di>τ. Thus, the critical generation rate at which the first node, and so the system, reaches congestion is

This is one of the most important analytical results on transportation networks with static routing strategies. In the following, we show that we can recover equation (3.11) before the onset of congestion using our MCM approach. After substitution of the expression of the probabilities in equation (3.2)

and, given we do not have congestion (i.e. dj=ρ+σj), it simplifies to
Equation (3.15) in matrix form becomes
and then
This expression can be shown to be equivalent to equation (3.11) by using the following relationship between node and edge betweenness
The right-hand side corresponds to the accumulated fractions of paths that pass through the neighbours of node i and then go to i. Each neighbour contributes with two terms, the paths that go through j coming from other nodes, and the paths that start in j.

3.2. Effective betweenness in congested transportation networks

The effective betweenness B~i of a node i, as defined in [25], accounts for the expected number of vehicles each node i receives per unit time. When the network is not congested and the vehicle generation rate gi is equal for all nodes, gi=ρ, the number of vehicles each node receives can be obtained using equation (3.11). However, if the network is congested, the traffic dynamics becomes highly nonlinear and the value of σi computed in equation (3.11) becomes a poor approximation.

Suppose we focus on a particular congested node j* of the network. For j*, being congested means that it is receiving more vehicles that the ones it can process and route. In particular, from the σj*+gj* vehicles that arrive to the node, only τj* can be processed at each time step.

Therefore, the contribution to the effective betweenness B~i of the paths from a source/destination pair, (s,t), that traverse the congested node j* before reaching i, must be rescaled by the fraction of processable vehicles

When a path traverses multiple congested nodes j*,k*,…, the remaining fraction of paths that will reach the target node will be the result of the application of the multiple rescalings sx*.

The computation of sj* is not straightforward. In general, σi is not known after the onset of congestion and depends on the effective betweenness that requires, at the same time, to know the sj* fraction for all congested nodes. Thus, an iterative calculation is needed to fit all the parameters at the same time as we do in our MCM.

The effective arrivals e~i account for the number of vehicles with destination node i that arrive at node i at each time step. This value in the non-congested phase can be obtained, considering homogeneous source and destination nodes, as

However, congestion also affects the variable ei and needs to be corrected accordingly using the same procedure presented above.

4. Results

To simulate the traffic dynamics of the road network, we assign a first-in-first-out queue to each junction that simulates the blocking time of vehicles before they are allowed to cross it and continue their trip. We suppose these queues have infinite capacity and a maximum processing rate that simulates the physical constraints of the junction. Vehicle origins and destinations may follow any desired distribution. In this work, we have considered two distributions: a random uniform distribution for the synthetic experiments, and one obtained considering the ingoing and outgoing flux of vehicles of the city of Milan. At each time step (of 1 min duration) vehicles are generated and arrive to their first junction. During the following time steps, vehicles navigate towards their destination following any routing strategy. Here, we have used two different routing strategies: shortest-path and random local search.

4.1. Congestion on synthetic networks

To evaluate the MCM, we have conducted experiments on several synthetic networks and with two different routing strategies: local search strategy and shortest path strategy. In both routing strategies we assume, for simplicity, that all vehicles randomly choose the starting and ending junctions of their journey uniformly within all junctions of the network. Thus, each junction generates new vehicles with the same rate gi=ρ. For shortest path strategy, vehicles follow a randomly selected shortest path towards the destination. Without loss of generality, we fix τ=1 and analyse the performance of MCM for different values of ρ.

Figure 2 shows the accuracy on predicting the values of the order parameter η=Δqi/ρS and di for shortest paths routing strategy. As in [25,30], this order parameter η corresponds to the ratio between in-transit and generated vehicles. In the electronic supplementary material, we extend the evaluation considering local search routing strategies and also evaluate the accuracy on predicting other variables of the model. All experiments show that the MCM achieves high accuracy in predicting the macroscopic and microscopic variables of the stylized transportation dynamics.

Figure 2.

Figure 2. Validation of the microscopic congestion model with Barabási–Albert (a,b) and Erdos–Rényi (c,d) networks of 1000 nodes and shortest path routing strategy. In the construction procedure of the Barabási–Albert networks each new node is connected to one existing node in the network. The Erdos–Rényi networks have an average degree of 50. In (a,c), accuracy in predicting the order parameter η. In (b,d), correlation between predicted and simulated values of di. Vertical solid lines on panels (a,c) show the predicted critical generation rate ρc. Vertical dashed lines show the ρ values where d is evaluated on the panels (b,d).

4.2. Application to real scenarios

INRIX Traffic Scorecard ( reports the rankings of the most congested countries worldwide in 2014. USA, Canada and most of the European countries are in the top 15, with averages that range from 14 to 50 h per year wasted in congestion, with their corresponding economic and environmental negative consequences. To demonstrate that the MCM model can be applied to real scenarios to obtain real predictions, in the following we apply the MCM model to the nine most congested cities according to the INRIX Traffic Scorecard (table 1).

Table 1.Comparison between the INRIX (12 months) traffic index and the number of hotspots estimated by the proposed model for the most congested cities of the world.

city INRIXa hotspots nodes links
Milano 36.2 108 6924 14 315
London 32.4 93 6378 14 662
Los Angeles 32.2 57 6799 19 368
Brussels 30.5 50 6645 15 624
Antwerpen 28.6 44 6530 15 252
San Francisco 27.9 45 8854 25 530
Stuttgart 21.9 34 8330 19 946
Nottingham 21.6 28 7337 16 723
Karlsruhe 21.3 19 4257 10 379

aThe INRIX index is the percentage increase in the average travel time of a commute above free-flow conditions during peak hours, e.g. an INRIX index of 30 indicates a 40-min free-flow trip will take 52 min. Each city has been mapped to a graph with the indicated numbers of nodes and links. See text for details and electronic supplementary material, figures S1–S18 for the graph representation of the cities and the geographical representation of the congestion hotspots.

We first focus on the city of Milan, the city with largest INRIX value. To evaluate the outcome of the MCM model, we first gather data about the road network topology using Open Street Map (OSM). OSM data represent each road (or way) with an ordered list of nodes which can either be road junctions or simply changes of the direction of the road. We have obtained the required abstraction of the road network building a simplified version of the OSM data which only account for road junctions (nodes). Then, for each pair of adjacent junctions we have queried the real travel distance (i.e. following the road path) using the API provided by Google Maps. The resulting network corresponds to a spatial weighed directed network [31] where the driving directions are represented and the weight of each link indicates the expected travelling time between two adjacent junctions (see electronic supplementary material, figure S21).

Second, we build up the dynamics of the model analysing real traffic data provided by Telecom Italia for their Big Data Challenge. The data provide, for every car entering the cordon pricing zone in Milan during November and December 2013, an encoding of the car’s plate number, time and gate of entrance (a total of 9 183 475 records). This allows us to obtain the (hourly) average incoming and outgoing traffic flow, for each gate of the cordon taxed area.

Given the previous topology and traffic information, we generated traffic compatible with the observations, and evaluated the outcome of the MCM model. Specifically, the simulated dynamics is as follows: for each vehicle entering the Area-C we fix a randomly selected location as destination and use the shortest path route towards it. After the vehicle has arrived to its destination, it randomly chooses an exit door and travels to it also using the shortest path route. This is similar to the well-known Home-to-Work travel pattern where vehicles arrive from the outskirts of the city, go to the city centre and then return to the outskirts. Specifically, in our simulation, traffic is generated in the peripheral junctions of the network, goes to a randomly selected junction within the city and then returns back to a randomly selected peripheral junction. We do not consider trips with origin and destination inside the city centre because public transportation systems (e.g. train or subway) usually constitute a better alternative than private vehicles for those trips. The maximum crossing rate of each junction τi accounts, among others, for the existence of traffic lights governing the junction, the width of the street as well as its traffic. We have not been able to get this information for the studied cities, and consequently we cannot set to each junction its precise value. Instead, without loss of generality and for the sake of simplicity, we set to all junctions the same maximum crossing rate, τi=15 (an estimation of the average of their real values).

Figures 3 and 4 show the obtained results. Figure 3b displays the predicted congestion hotspots on a map of Milan, panel (a) of the same figure shows a real traffic situation obtained with Google Maps. We see that the predicted congestion hotspots are located in the circular roads of Milan as well as on the arterial roads of the city; this agrees with the real traffic situation shown in panel (a). Figure 4a shows the distribution of the mean increments each junction has to deal with. This might be a good indicator to decide about future planning actuations to improve city mobility. However, differently from what is described in [11], the improvement of the throughput of a single junction might not be enough to improve city mobility because this might end up with the collapse of neighbouring junctions (their incoming rate σi will increase). This situation is similar to Braess’ paradox [33]. Figure 4b shows the mean increment of vehicles (in vehicles per minute) for each hour of the weekday. The figure clearly shows the morning and evening rush hours as well as the lunch time.

Figure 3.

Figure 3. Congestion hotspot analysis of the city of Milan. Panel (a) shows the typical situation around 9.00 for a weekday. The image and the data have been obtained with Google Maps. Google Maps displays traffic information considering historical data and real-time car velocity reported by smartphones [32]. Panel (b) shows the prediction of the MCM model considering the real road topology obtained using Open Street Map and real traffic data provided by Telecom Italia for their Big Data Challenge. The extracted road network topology is provided in electronic supplementary material, figure S19. For all congestion hotspots the model has predicted, we show its mean increment of the queue size, 〈Δqi〉.

Figure 4.

Figure 4. Statistics of Milan congestion hotspots. Panel (a) shows the distribution of the vehicle increments (Δqi) of each congestion hotspot predicted by the MCM. The plot aggregates the MCM predicted congestion considering the average traffic for every weekday and every hour of the day. Panel (b) shows the average city congestion in terms of mean increment of vehicle in transit for every hour of the day during a weekday.

For the other top nine congested cities, we do not have previous traffic information, neither about the real flux of vehicles nor about the vehicle source and destination distributions (to obtain a fair comparison between all the analysed cities, we have not considered the Telecom traffic data for Milan here). Thus, for each city, we consider homogeneously distributed source and destination locations and the required road traffic to obtain an order parameter η compatible with the congestion effects recorded by INRIX sensing of real traffic. By relating the INRIX value and η, we are assuming that there exists a relation between the fraction of global congestion and the fraction of extra time wasted reported by INRIX. The obtained results are summarized in table 1, which shows that the number of hotspots is correlated with the INRIX value. This shows evidence that the percentage increase in the average travel time to commute between city locations is related to the number of congestion hotspots and the excess of vehicles within the city. The extracted road network topology for all the analysed cities as well as the predicted congestion hotspots is provided in electronic supplementary material, figures S1–S18.

5. Discussion

The previous results show that the MCM can be used to predict the local congestion before and beyond the onset of congestion of a transportation network. To the knowledge of the authors, this is the first analytical model that is able to give predictions beyond the onset of congestion where the system is highly nonlinear, large fluctuation exists and the number of vehicles on transit diverge with respect to time. Our model is based on assuming that the growth of vehicles observed in each congested node of the networks is constant, which allowed us to derive a set of balance equations that can accurately predict macroscopic, mesoscopic and microscopic variables of the transportation network.

Traffic congestion is a common and open problem whose negative impacts range from wasted time and unpredictable travel delays to a waste of energy and an uncontrolled increase of air pollution. A first step towards the understanding and fight of congestion and its related consequences is the analytical modelling of the congestion phenomena. Here, we have shown that the MCM model is detailed enough to give real predictions considering real traffic data and topology. These results pave the way to a new generation of stylized physical models of traffic on networks in the congestion regime that could be very valuable to assess and test new traffic policies on urban areas in a computer-simulated scenario.

Data accessibility

Data available at Dryad Digital Repository: [34].

Authors' contributions

A.S.-R., S.G and A.A. contributed equally to the research and writing of the manuscript; A.S.-R. performed the experiments.

Competing interests

We have no competing interests.


This work has been supported by Ministerio de Economía y Competitividad (Grant FIS2015-71582-C2-1) and European Comission FET-Proactive Projects MULTIPLEX (grant no. 317532). A.A. also acknowledges partial financial support from the ICREA Academia and the James S. McDonnell Foundation.


Electronic supplementary material is available online at

Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.