Abstract
We propose a quantitative method to classify cities according to their street pattern. We use the conditional probability distribution of shape factor of blocks with a given area and define what could constitute the ‘fingerprint’ of a city. Using a simple hierarchical clustering method, these fingerprints can then serve as a basis for a typology of cities. We apply this method to a set of 131 cities in the world, and at an intermediate level of the dendrogram, we observe four large families of cities characterized by different abundances of blocks of a certain area and shape. At a lower level of the classification, we find that most European cities and American cities in our sample fall in their own sub-category, highlighting quantitatively the differences between the typical layouts of cities in both regions. We also show with the example of New York and its different boroughs, that the fingerprint of a city can be seen as the sum of the ones characterizing the different neighbourhoods inside a city. This method provides a quantitative comparison of urban street patterns, which could be helpful for a better understanding of the causes and mechanisms behind their distinct shapes.
1. Introduction
The recent availability of large amounts of data about urban systems has opened the exciting possibility of a new ‘science of cities’, with the aim of understanding and modelling phenomena taking place in the city [1]. Urban morphology and morphogenesis, activity and residence location choice, urban sprawl and the evolution of urban networks, are just a few of the important processes that have been discussed for a long time but that we now hope to understand quantitatively. An important component of cities is their street and road networks. These networks can be thought of as a simplified schematic view of cities, which captures a large part of their structure and organization [2], and contains a large amount of information about underlying and universal mechanisms at play in their formation and evolution. Extracting common patterns between cities is a way towards the identification of these underlying mechanisms. At stake is the question of the processes behind the so-called ‘organic’ patterns—which grow in response to local constraints—and whether they are preferable to the planned patterns which are designed under large-scale constraints. This programme is not new [3,4], but the recent dramatic increase of data availability such as digitized maps, historical or contemporary data [5–8] allows us now to test ideas and models on large-scale cross-sectional and historical data.
Streets and roads form a network (where nodes are intersections and links are road segments) which is planar to a good approximation. This network is now fairly well characterized [9–20]: owing to spatial constraints, the degree distribution is peaked, the clustering coefficient and assortativity are large, and most of the interesting information lies in the spatial distribution of betweenness centrality [21]. An important point is that information about these networks is not contained only in their adjacency matrix. Geometry, encoded in the spatial distribution of nodes, plays a crucial role. A classification of cities according to their street network should then rely on both topology and geometry.
We note that while classifications do not provide any understanding of the objects being classified per se, they provide a useful first insight into the different characteristics exhibited by objects of the same nature. Classifying, from a fundamental point of view, is however difficult: finding a typology of street patterns essentially amounts to classifying planar graphs, a non-trivial problem. The classification of street networks has previously been addressed by the space syntax community [22,23] and a good account can be found in the book by Marshall [24]. These works, although based on empirical observations, contain much subjectivity and our goal is to eliminate this subjective part to reach a non-ambiguous, scientific classification of these patterns.
An interesting direction is provided by the study of leaves and their classification according to their venation patterns [25,26], but with a notable difference which prevents us from a direct application to streets: the existence of a hierarchy of veins governed by their diameter (the width of streets is usually absent from datasets). Another enticing idea can be found in the mathematics literature: there exists an exact bijection between planar graphs and trees [27]. Using this bijection, classifying planar graphs would amount to classifying trees, which is a simpler problem. However, this bijection does not take into account the geometrical shape of the planar graph: indeed two street patterns can have the same topology but cells could be of very different areas, leading to visually different patterns and to cities of different structures. It is thus important to take into account not only the topology of the planar graph (as described by the adjacency matrix) but also the position of the nodes. In order to do that, we propose in this article a method to characterize this complex object by extracting the ‘fingerprint’ of a street pattern. These fingerprints allow us to define a measure of the distance between two graphs and to construct a classification of cities.
2. Streets versus blocks
A major shortcoming of existing classifications is that they are based on the street network. This is problematic for two different reasons. First, there is no unambiguous, purely geometrical definition of what a street is: we could define it as the road segment between two intersections, as an almost straight line (up to a certain angular tolerance, see [12]), or we could also follow the actual street names. There is a certain degree of arbitrariness in each of these definitions, and it is not clear how robust a classification based on streets would be. Second, it seems that what is perceived by the human eye of a city map is not streets but the distribution of the shapes, area and disposition of blocks (figure 1).
Figure 1. (a,b) From the street network to blocks. Example of a street pattern taken in the neighbourhood of Shibuya in Tokyo (Japan) and the corresponding set of blocks. Note that the block representation does not take into account dead-ends.
A natural idea when trying to classify cities is thus to focus on blocks (or cells, or faces) rather than streets. A block can usually be defined without ambiguity as being the smallest area delimited by roads (it has then to be distinguished from a parcel which is a tax-related definition). While the information contained in the blocks and in the streets is equivalent (up to dead-ends), the information related to the visual aspect of the street network seems to be easier to extract from blocks. Blocks are indeed simple geometrical objects (polygons) whose properties are easily measured. The properties of blocks and their arrangement thus seem to be a good starting point for attempting a classification of urban street patterns.
3. Characterizing blocks
Blocks are defined as the cells of the planar graph formed by streets, and it is relatively easy to extract them from a map. We have gathered road networks for 131 major cities across the world, spanning all continents (except Antarctica), and their locations are represented on the map (figure 2). The street networks have been obtained from the OpenStreetMap database [5], and restricted to the city centre using the Global Administrative Areas database (or databases provided by the countries' administration). We extracted the blocks from the street network and removed undesired features (aspects that have no real-world counterpart but appear due to the particular way data are encoded in OpenStreetMap). We end up with a set of blocks, each with a geographical position corresponding to its centroid.
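As an illustration of this extraction step, the following minimal sketch recovers blocks as the bounded faces of a planar street graph. It assumes that the network has already been cleaned and planarized (every crossing is a node, no duplicated segment) and that it is connected; the function names `signed_area` and `extract_blocks` are hypothetical and do not describe the processing actually applied to the OpenStreetMap data.

```python
import math
from collections import defaultdict


def signed_area(polygon):
    """Shoelace formula: positive for counter-clockwise vertex order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0


def extract_blocks(nodes, edges):
    """Return the bounded faces of a planar straight-line graph.

    nodes: dict mapping node id -> (x, y)
    edges: iterable of (u, v) pairs, each street segment listed once
    (assumed input format, chosen for this illustration)
    """
    # Neighbours of every node, sorted counter-clockwise by angle.
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    for u, ring in neighbours.items():
        ux, uy = nodes[u]
        ring.sort(key=lambda w: math.atan2(nodes[w][1] - uy, nodes[w][0] - ux))

    visited = set()  # directed edges already assigned to a face
    blocks = []
    for u in list(neighbours):
        for v in neighbours[u]:
            if (u, v) in visited:
                continue
            # Walk around the face lying to the left of the directed edge.
            face = []
            a, b = u, v
            while (a, b) not in visited:
                visited.add((a, b))
                face.append(nodes[a])
                ring = neighbours[b]
                # Next edge: first neighbour clockwise from the one we came from.
                a, b = b, ring[ring.index(a) - 1]
            # Bounded faces come out counter-clockwise; the outer face does not.
            if signed_area(face) > 0:
                blocks.append(face)
    return blocks


if __name__ == "__main__":
    # Two unit squares sharing a side, plus one dead-end inside the first.
    nodes = {0: (0, 0), 1: (1, 0), 2: (2, 0), 3: (0, 1),
             4: (1, 1), 5: (2, 1), 6: (0.5, 0.5)}
    edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5), (0, 6)]
    print(len(extract_blocks(nodes, edges)))
```

In this sketch a dead-end is traversed in both directions and contributes nothing to the area of the face, which is consistent with the remark that the block representation does not take dead-ends into account.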
Figure 2. Location of the cities in our dataset and geographical distribution of the different groups. The colour of the dots indicates in which group the city falls, as defined in figure 5. At the bottom of the map, the pie charts display the relative importance of the different groups per continent for cities in our dataset (group 1: 0.8%, group 2: 20.9%, group 3: 77.5%, group 4: 0.8%). We see that group 3, composed of cities with blocks of various shapes and a slight predominance of larger areas, is by far the most represented group in the world. (Online version in colour.)
Blocks are polygons and as such can be characterized by simple measures. First, the surface area A of a block gives a useful indication, and its distribution is an important piece of information about the block pattern. As in [13,28], we find that for different cities, the distributions have different shapes for small areas, but display fat tails that decrease as a power law
$$P(A) \sim \frac{1}{A^{\tau}},$$
where $\tau$ denotes the exponent of the tail.
A second characterization of a block is through its shape, with the form (or shape) factor Φ, defined in the geography literature in [29] as the ratio between the area A of the block and the area $A_{\mathcal{C}}$ of the circumscribed circle $\mathcal{C}$
$$\Phi = \frac{A}{A_{\mathcal{C}}}.$$
The quantity Φ is always smaller than one, and the smaller its value, the more anisotropic the block is. There is not a unique correspondence between a particular shape and a value of Φ, but this measure gives a good indication about the block's shape in real-world data, where most blocks are relatively simple polygons. The distribution of Φ displays important differences from one city to another, and a first naive idea would be to classify cities according to the distribution of block shapes given by P(Φ). The shape itself is however not enough to account for visual similarities and dissimilarities between street patterns. Indeed, we find for example that for cities such as New York and Tokyo, even if we observe similar distributions P(Φ) (figure 3), the visual similarity between both cities' layouts is not obvious at all. One reason for this is that blocks can have a similar shape but very different areas: if two cities have blocks of the same shape in the same proportion but with totally different areas, they will look different. We thus need to combine the information about both the shape and the area.
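The shape factor of a single block can be sketched as follows. The sketch assumes that the circumscribed circle is the smallest circle enclosing all vertices of the polygon, and it finds that circle by brute force over pairs and triples of vertices, which is adequate for blocks with a few dozen vertices at most. The function names are hypothetical.

```python
import math
from itertools import combinations


def polygon_area(polygon):
    """Unsigned area of a simple polygon (shoelace formula)."""
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2.0


def _circumcircle(p, q, r):
    """Centre and radius of the circle through three points (None if collinear)."""
    (ax, ay), (bx, by), (cx, cy) = p, q, r
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), math.dist((ux, uy), p)


def enclosing_radius(points):
    """Radius of the smallest enclosing circle, by exhaustive search.

    The smallest enclosing circle is fixed either by two points (as a
    diameter) or by three points (as their circumcircle).
    """
    candidates = []
    for p, q in combinations(points, 2):
        centre = ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
        candidates.append((centre, math.dist(p, q) / 2.0))
    for triple in combinations(points, 3):
        circle = _circumcircle(*triple)
        if circle is not None:
            candidates.append(circle)
    tol = 1e-9  # tolerance for points lying exactly on the circle
    radii = [r for c, r in candidates
             if all(math.dist(c, p) <= r * (1 + tol) + tol for p in points)]
    return min(radii)


def shape_factor(polygon):
    """Phi = block area / area of the enclosing circle (assumed definition)."""
    radius = enclosing_radius(polygon)
    return polygon_area(polygon) / (math.pi * radius ** 2)


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]
    print(round(shape_factor(square), 3))     # 0.637, i.e. 2/pi
    print(round(shape_factor(rectangle), 3))  # 0.509, i.e. 8/(5 pi)
```

With this definition a square gives Φ = 2/π and a 2:1 rectangle gives Φ = 8/(5π), so more elongated rectangles yield smaller values, in line with the interpretation of Φ as a measure of anisotropy.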
Figure 3. The fingerprints of Tokyo (a,b) and New York, NY (c,d). (a,c) We rearrange the blocks of a city according to their area (y-axis) and their Φ value (x-axis). The colour of each block corresponds to the area category it falls into. (b,d) We quantify this pattern by plotting the distribution of shapes, as measured by Φ for each area category, represented by coloured curves. The grey curve is the sum of all the coloured curves and represents the distribution of Φ for all cells. As shown in the inset, we see that intermediate area categories dominate the total number of cells and are thus enough for the clustering procedure. (Online version in colour.)
In order to construct a simple representation of cities which integrates both area and shape, we rearrange the blocks according to their area (on the y-axis) and display their Φ value on the x-axis (figure 3). We divide the range of areas in (logarithmic) bins and the colour of a block represents the area category to which it belongs. We describe this pattern quantitatively by plotting, for each area bin, the conditional probability distribution of shapes weighted by the probability of the bin, P(Φ|A)P(A) (figure 3b,d). The coloured curves represent the distribution of Φ in each area category, and the curve delimited by the grey area is the sum of all of these curves and is the distribution of Φ for all cells, which is simply the translation of the well-known formula for conditional probabilities
$$P(\Phi) = \sum_{A} P(\Phi \mid A)\, P(A).$$
These figures give a ‘fingerprint’ of the city which encodes information about both the shape and the area of the blocks. In order to quantify the distribution of blocks inside a city, and thus the visual aspect of the latter, we will then use P(Φ|A) for different area bins. The comparison between these quantities provides the basis for the classification of street patterns that we propose here.
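A minimal sketch of how such a fingerprint could be computed from a list of blocks is given below. It assumes that each block is summarized by a pair (area in square metres, Φ), that area bins are decades (one bin per power of ten), and that Φ is discretized into a fixed number of equal-width bins; these choices and the function names are illustrative assumptions.

```python
import math
from collections import defaultdict


def fingerprint(blocks, n_phi_bins=10):
    """Discretised P(Phi | A) P(A): one histogram of Phi per area decade.

    blocks: iterable of (area_m2, phi) pairs with area > 0 and 0 <= phi <= 1.
    Every histogram is normalised by the TOTAL number of blocks, so that
    the curves of all area bins add up to the overall distribution of Phi.
    """
    blocks = list(blocks)
    total = len(blocks)
    curves = defaultdict(lambda: [0.0] * n_phi_bins)
    for area, phi in blocks:
        decade = math.floor(math.log10(area))          # logarithmic area bin
        k = min(int(phi * n_phi_bins), n_phi_bins - 1)  # Phi bin index
        curves[decade][k] += 1.0 / total
    return dict(curves)


def overall_shape_distribution(curves):
    """Sum of the per-bin curves, i.e. the discretised P(Phi)."""
    n_phi_bins = len(next(iter(curves.values())))
    return [sum(curve[k] for curve in curves.values())
            for k in range(n_phi_bins)]


if __name__ == "__main__":
    # Toy data, not taken from any real city.
    toy_blocks = [(1500.0, 0.62), (2500.0, 0.35), (20000.0, 0.62), (500.0, 0.90)]
    curves = fingerprint(toy_blocks)
    for decade in sorted(curves):
        print(decade, curves[decade])
    print(overall_shape_distribution(curves))
```

Normalising each curve by the total number of blocks, rather than by the number of blocks in its own bin, is what makes the grey curve of figure 3 the plain sum of the coloured ones.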
4. A typology of cities across the world
Two cities display similar patterns if their blocks have both similar area and shape. In other words, the shape distributions for each area bin should be very close, and this simple idea allows us to propose a distance between street patterns of different cities. More precisely, as one can see in figure 3, blocks with an area in the range $[10^3, 10^5]$ (in square metres) dominate the total number of cells, and we will neglect very small blocks (of area less than $10^3$ m$^2$) and very large ones (of area more than $10^5$ m$^2$). We thus sort the blocks according to their area in two distinct bins
$$\alpha_1 = \{A \mid A \in [10^3, 10^4]\},$$
$$\alpha_2 = \{A \mid A \in [10^4, 10^5]\}.$$
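We denote by fα(Φ) the ratio of the number of cells with a form factor Φ that lie in the bin α over the total number of cells for that city. We then define a distance dα between two cities a and b characterized by their respective $f_\alpha^a$ and $f_\alpha^b$
$$d_\alpha(a, b) = \int_0^1 \left| f_\alpha^a(\Phi) - f_\alpha^b(\Phi) \right|^n \mathrm{d}\Phi.$$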
We tested different choices (n = 1 and n = 2) for dα (a, b), and although they might change the position of some cities in the classification, our conclusions are robust. We then construct a global distance D between two cities by combining all area bins α

— In group 1 (comprising Buenos Aires, Argentina only), we essentially have blocks of medium size (in the bin α2) with shapes that are dominated by the square shape and regular rectangles. Small areas (in bin α1) are almost exclusively squares.
— Athens, Greece, is a representative element of group 2, which comprises cities with a dominant fraction of small blocks with shapes broadly distributed.
— Group 3 (illustrated here by New Orleans, USA) is similar to group 2 in terms of the diversity of shapes but is more balanced in terms of areas, with a slight predominance of medium-size blocks.
— Group 4, which contains for this dataset the interesting example of Mogadishu, Somalia, displays essentially small, square-shaped blocks, together with a small fraction of small rectangles.
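The distance between two cities on which these groups rest can be sketched as follows. The sketch assumes that the curves fα are sampled on a regular grid of Φ values in [0, 1], that dα is the discretized integral of |fα(a) − fα(b)| raised to the power n, and that the global distance D adds the squared per-bin distances. This last combination rule is an assumption made for the illustration, as are the function names and the toy curves.

```python
def bin_distance(f_a, f_b, n=1):
    """Discretised integral over Phi in [0, 1] of |f_a - f_b|**n.

    f_a, f_b: curves for one area bin, sampled on the same regular grid.
    """
    if len(f_a) != len(f_b):
        raise ValueError("both curves must be sampled on the same grid")
    width = 1.0 / len(f_a)
    return sum(abs(x - y) ** n for x, y in zip(f_a, f_b)) * width


def city_distance(fingerprint_a, fingerprint_b, n=1):
    """Global distance between two cities.

    ASSUMPTION: per-bin distances are combined as a sum of squares.
    fingerprint_a, fingerprint_b: dicts mapping an area-bin label to a curve.
    """
    return sum(bin_distance(fingerprint_a[label], fingerprint_b[label], n) ** 2
               for label in fingerprint_a)


if __name__ == "__main__":
    # Toy curves on a two-point grid, not taken from any real city.
    city_a = {"alpha1": [1.0, 0.0], "alpha2": [0.0, 0.0]}
    city_b = {"alpha1": [0.0, 1.0], "alpha2": [0.0, 0.0]}
    print(city_distance(city_a, city_b))
    print(city_distance(city_a, city_a))
```

Whatever the combination rule, the resulting quantity is symmetric and vanishes for identical fingerprints, which is all the hierarchical clustering needs.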

Figure 4. Dendrogram. We represent the structure of the hierarchical clustering at a given level. Interestingly, 68% of American cities are present in the second largest subgroup of group 3 (fourth from the top). Also, all European cities except Athens are in the largest subgroup of group 3 (third from top). This result gives a first quantitative grounding to the feeling that European and most American cities are laid out differently. (Online version in colour.)

Figure 5. The four groups. (Left) Average distribution of the shape factor Φ for each group found by the clustering algorithm (each area bin is represented by a different colour from small areas in dashed green, medium size in orange, and large cells in blue). (Right) Typical street patterns for each group (plotted at the same scale in order to observe differences both in shape and areas). Group 1: Buenos Aires; Group 2: Athens; Group 3: New Orleans; Group 4: Mogadishu. (Online version in colour.)
The proportion and location of cities belonging to each group is shown in figure 2. Although one should be wary of sampling bias here, it seems that the type of pattern characteristic of group 3 (various shapes with larger areas) largely dominates among cities in the world. Interestingly, all North American cities (except Vancouver, Canada) are part of group 3, as well as all European cities (except Athens, Greece). The composition of the other continents is more balanced between the different groups. At a smaller scale within group 3 (figure 4), all European cities (except Athens) in our sample belong to the same subgroup of group 3 (the largest one, third from the top in figure 4). Similarly, 15 American cities out of the 22 in our dataset belong to the same subgroup of group 3 (the second largest one, fourth from the top in figure 4). Exceptions are Indianapolis (IN), Portland (OR), Pittsburgh (PA), Cincinnati (OH), Baltimore (MD), Washington (DC) and Boston (MA), which are classified with European cities, confirming the impression that these US cities have a European feel. These results point towards important differences between US and European cities, and could constitute the starting point for the quantitative characterization of these differences [31].
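Groups and subgroups of this kind are obtained by cutting a dendrogram at a given level. The following minimal sketch shows a naive agglomerative clustering of a distance matrix, stopped when a chosen number of clusters remains. Average linkage is assumed here because the linkage rule is not specified in the text; the function name and the toy matrix are hypothetical.

```python
def agglomerate(dist, n_clusters):
    """Naive agglomerative clustering with average linkage (assumed rule).

    dist: symmetric matrix (list of lists) of pairwise distances.
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Toy example: four items placed at positions 0, 1, 10 and 11 on a line.
    positions = [0.0, 1.0, 10.0, 11.0]
    dist = [[abs(p - q) for q in positions] for p in positions]
    print(agglomerate(dist, 2))  # [[0, 1], [2, 3]]
```

Asking for fewer clusters corresponds to cutting the dendrogram closer to its root, and asking for more clusters reveals the subgroups discussed above.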
5. A local analysis
Cities are complex objects, and it is unlikely that a representation as simple as the fingerprint can capture all their intricacies. Indeed, cities are usually made of different neighbourhoods, which often exhibit different street patterns. In Europe, the division is usually clear between the historical centre and the more recent suburbs (a striking example of such differences is the Eixample neighbourhood in Barcelona, very distinct from other areas of the city). In order to illustrate these differences and to show that they can also be captured with our method, we isolate the different boroughs of New York, NY: the Bronx, Brooklyn, Manhattan, Queens and Staten Island. We extract the fingerprint of each borough, as represented in figure 6. The fingerprint of New York (bottom, figure 3) is indeed the combination of the different fingerprints of the boroughs. While Staten Island and the Bronx have very similar fingerprints, the others are different. Manhattan exhibits two sharp peaks at Φ ≈ 0.3 and Φ ≈ 0.5, which are the signature of a grid-like pattern with the predominance of two types of rectangles. Brooklyn and Queens each exhibit a sharp peak at a different value of Φ, also the signature of grid-like patterns with different rectangles for basic shapes.
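The sense in which the fingerprint of the city combines those of its boroughs can be made explicit with a short derivation. It assumes that the boroughs partition the blocks of the city, so that each block belongs to exactly one borough; the symbols $N_b$ (number of blocks in borough $b$), $N$ (number of blocks in the city) and $n_\alpha^{(b)}(\Phi)$ (number of blocks of borough $b$ with form factor Φ in the area bin α) are introduced here for the illustration.

```latex
\begin{align}
  f_\alpha(\Phi)
    &= \frac{1}{N} \sum_{b} n_\alpha^{(b)}(\Phi),
    \qquad N = \sum_{b} N_b, \\
    &= \sum_{b} \frac{N_b}{N} \, \frac{n_\alpha^{(b)}(\Phi)}{N_b}
     = \sum_{b} \frac{N_b}{N} \, f_\alpha^{(b)}(\Phi).
\end{align}
```

Under this assumption the fingerprint of the city is the average of the borough fingerprints weighted by their share of blocks, so that a borough with many blocks leaves a stronger mark on the city-wide curves.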
Figure 6. New York City, NY, and its different boroughs. (Top) We represent New York City and its five boroughs: the Bronx, Brooklyn, Manhattan, Queens and Staten Island. (Bottom) The corresponding fingerprints for each borough. Only Staten Island and the Bronx have similar fingerprints and the others are different. In particular, Manhattan exhibits two sharp peaks at Φ ≈ 0.3 and Φ ≈ 0.5, which are the signature of a grid-like pattern with the predominance of two types of rectangles. Brooklyn and Queens exhibit a sharp peak at different values of Φ, signalling the presence of grid-like patterns made of different basic rectangles. (Online version in colour.)
6. Discussion and perspectives
We have introduced a new way of representing the road networks of cities, which can be seen as the equivalent of fingerprints for cities. It seems reasonable to think that the possibility of a classification based on these fingerprints hints at common causes behind the shape of the networks of cities in the same categories. Of course, this study has limitations: even if the shape of the blocks alone is good enough for the purpose of giving a rough classification of cities, we miss some aspects of the patterns. In particular, the way the blocks are arranged together locally should also give some information about the visual aspect of the global pattern, as many cities are made of neighbourhoods, built at different times, with different street patterns. What is lacking at this point is a systematic, quantitative way to identify and distinguish different neighbourhoods and to describe the correlation between the positions of the blocks. The New York boroughs, taken as examples in the last section, are administrative, arbitrary definitions of a neighbourhood. The reality is however more complex: similar patterns might span several administrative regions, or a given administrative division might host very distinct neighbourhoods. A further step in the classification would thus be to find a method to extract these neighbourhoods and integrate the spatial correlations between different types of neighbourhoods.
Despite the simplifications that our method entails, we believe that the classification we propose is an encouraging step towards a quantitative and systematic comparison of the street patterns of different cities. This, together with the specific knowledge of architects, urbanists, etc. should lead to a better understanding of the shape of our cities. Further studies are needed in order to relate the various types that we observe to different urban processes. For example, in some cases, small blocks are obtained through a fragmentation process, and their abundance could be related to the age of the city. Consistency of cell shapes could be related to planning, as in the case of Manhattan, but we also know from the example of Paris [7] that a large variety of shapes can be directly related to the effect of urban modifications that do not respect the existing geometry.
Data accessibility
All the data used in this article can be downloaded from the OpenStreetMap database. Information on how to download OpenStreetMap data is available at http://wiki.openstreetmap.org/wiki/downloading_data.
Acknowledgements
We thank Vincenzo Nicosia for interesting discussions at an early stage of this project. We also thank Anne Bretagnolle, Maurizio Gribaudi, Vito Latora, Thomas Louail, Denise Pumain for stimulating discussions at various stages of this study.