The systematic structure and predictability of urban business diversity

Understanding cities is central to addressing major global challenges from climate and health to economic resilience. Although increasingly perceived as fundamental socio-economic units, the detailed fabric of urban economic activities is only now accessible to comprehensive analyses with the availability of large datasets. Here, we study abundances of business categories across U.S. metropolitan statistical areas to investigate how diversity of economic activities depends on city size. A universal structure common to all cities is revealed, manifesting self-similarity in internal economic structure as well as aggregated metrics (GDP, patents, crime). A derivation is presented that explains universality and the observed empirical distribution. The model incorporates a generalized preferential attachment process with ceaseless introduction of new business types. Combined with scaling analyses for individual categories, the theory quantitatively predicts how individual business types systematically change rank with city size, thereby providing a quantitative means for estimating their expected abundances as a function of city size. These results shed light on processes of economic differentiation with scale, suggesting a general structure for the growth of national economies as integrated urban systems.

Diversity is central to the resilience of complex adaptive systems whether ecosystems or economies (1)(2)(3). In particular, it has been argued that the success and resilience of cities, together with their role in innovation and wealth creation, are driven by their ever-expanding diversity (2)(3)(4)(5)(6)(7)(8)(9). The presence and ever-changing admixture of individuals, ethnicities, cultural activities, businesses, services, and social interactions is a defining characteristic of urban life.
Together with its counterpart, specialization, this is often cited as what makes a city unique and distinctive, and has consequently featured prominently in the study of cities across economics, geography and urban planning. Despite its acknowledged importance, however, there have been surprisingly few quantitative investigations into possible systematic regularities and underlying dynamics that govern the diversity of cities across an entire urban system. A recurrent goal in developing an overarching science of cities is to discover, and conceptually understand, general patterns for how people, infrastructure and economic activity are organized and inter-related (10,11). A compelling question, therefore, is: how is diversity related to aggregate urban socio-economic and infrastructural metrics, and how do these depend on city size (12)(13)(14)(16)?
The systematic quantitative understanding of diversity requires that two issues be addressed.
First, measuring diversity typically involves identifying different (business) types and counting their frequency for a given unit of analysis, such as a city or a nation (1). It should be immediately clear, then, that such a task can be problematic because any systematic classification scheme is subject to an arbitrary recognition of specific categories, since any business type can be subdivided further as long as a defining distinction is made. Restaurants, for example, can be decomposed into fine dining, fast food, etc., and further by cuisine, price, quality, etc. In general, therefore, urban diversity is scale-dependent and we should seek a resolution-independent characterization. Secondly, we need to deconvolute the intricate relations between scale, diversity and economic productivity, and between diversification and specialization. New business types must involve a larger number of people, both as workers and clients, and, to be sustained, should lead to greater economic productivity by permitting, for example, greater specialization and interdependence. Thus we may expect that the larger scale of bigger cities should afford greater economic diversity (at least in absolute terms), but such an expectation is at odds with the idea that specialization drives increases in efficiency.
In this paper we present a novel approach to measuring and characterizing economic diversity in order to clarify its underlying role in urban economic development. Our analysis reveals a surprising systematic behaviour common to all cities. We show how this can be derived theoretically and present a simple model for understanding its structure based on a variant of preferential attachment for introducing new business types. The model quantitatively predicts how individual business types systematically change rank with city size, shedding light on processes of economic differentiation with scale.
We focus on the frequency distribution of business types (the number of "species") and first ask how this varies across cities (the "ecosystem"). We identify our unit of analysis as the establishment, which is defined as a single physical location where business is conducted, so that, for example, individual stores of a national chain would be counted separately. Establishments are nowadays seen as fundamental units of economic analysis because innovation, wealth generation, entrepreneurship and job creation all manifest themselves through the formation and growth of workplaces (17). We explore a unique dataset, the National Establishment Time-Series, a proprietary longitudinal database built by Walls & Associates, to capture economic life at an extraordinarily fine-grained level (18,19). This dataset includes records of nearly the entire set of establishments (workplaces) in US urban areas (over 32 million), each of which is classified according to the North American Industry Classification System (NAICS). We aggregate this information into the standard definition of functional cities: the 366 Metropolitan Statistical Areas (MSAs), which are defined by the Census Bureau as unified labor markets centered on a single large city wielding substantial influence over its surrounding region (20). These MSAs account for over 90% of the economic output of the US and almost 85% of its population (the lower limit on MSA size is ∼50,000).
The data reveal that the total number of establishments, $N_f$, in each MSA is linearly proportional to its population size, $N$: that is,
$$N_f \approx \eta N, \qquad (1)$$
with the proportionality constant $\eta \approx 21.6^{-1}$ (Fig. 1A). Thus, on average, there are about 22 people per establishment in a city, regardless of its size. Combined with the fact that the number of employees, $N_e$, also scales linearly with $N$, the average size of establishments is also independent of population size, with $N_e/N_f \approx 11.9$. This remarkable constancy of the average employment rate and number of establishments across cities is puzzling when viewed in light of agglomeration effects, and per capita increases in productivity, wages, GDP, or patent production, with population size (13,14). Clearly, then, the increasing returns to scale characterizing the benefits of urbanization and increasing city size are not simply due to bigger cities having more establishments. This suggests that investigating the diverse composition of different eco- in the description of business types. We return to this point below.
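A linear relation of this kind is commonly checked with a least-squares fit in log-log space, where a slope close to one indicates proportionality. The sketch below is illustrative only: it uses synthetic city data generated under the assumption that establishments equal population divided by 21.6 with multiplicative noise (not the NETS data), and the helper name `fit_scaling` is hypothetical.

```python
import numpy as np

def fit_scaling(population, count):
    """Fit count ~ prefactor * population**beta by least squares in log-log space."""
    log_n = np.log(np.asarray(population, dtype=float))
    log_y = np.log(np.asarray(count, dtype=float))
    beta, log_prefactor = np.polyfit(log_n, log_y, 1)  # slope first, then intercept
    return beta, np.exp(log_prefactor)

# Synthetic example (assumed values, not the paper's data):
# 366 cities spanning 50,000 to 20 million inhabitants.
rng = np.random.default_rng(0)
population = np.logspace(np.log10(5e4), np.log10(2e7), 366)
noise = rng.lognormal(mean=0.0, sigma=0.1, size=population.size)
establishments = population / 21.6 * noise

beta, eta = fit_scaling(population, establishments)
print(f"beta = {beta:.3f}, people per establishment = {1 / eta:.1f}")
```

A fitted exponent near one, together with a prefactor near 1/21.6, would correspond to the constant number of people per establishment described above.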
A more insightful way of assessing economic diversity is to examine the constituent types of D(N ) for individual cities. The abundance of the hundred leading business types for a selection of cities is shown in Fig. 2A. In New York, the most abundant business type is offices of physicians, followed by offices of lawyers and restaurants; Phoenix ranks restaurants first and real estate second (perhaps not surprising in a rapidly growing city); San Jose, which includes Silicon Valley, predictably ranks computer programming second only to restaurants. Indeed, the composition of economic activities in cities has its own distinctive characteristics reflecting the individuality of each city. It is therefore all the more remarkable that, despite the unique admixture of business types for cities, the shape of these distributions is universal; so much so that, with a simple scale transformation, their rank abundances collapse to a single unique curve common to all cities (Fig. 2B). Note that the curve is robust to changes in levels of wealth, density, and population size, which vary widely across the U.S.
This universality can be derived from a sum rule for the total number of establishments as follows. Let $F_i(N)$ be the number of establishments of the $i$th most abundant business type in a city of size $N$, as shown in Fig. 2A. When summed over all ranks, this must add up to the total number of establishments, $N_f = \eta N$. Writing $f_i(N) = F_i(N)/N$ for the per capita frequency, the sum rule reads
$$\sum_{i=1}^{D(N)} f_i(N) = \eta. \qquad (2)$$
As $N$ increases, any growing or diverging dependences of the $f_i(N)$ on $N$ in Eq. (2) cannot be cancelled against each other because of their positive definiteness, whilst any decreasing dependence vanishes. Since $\eta$ is a constant, this implies that each $f_i$ must itself become independent of $N$ for sufficiently large $N$, predicting that the per capita frequency of business types must be the same for all cities. Note that the derivation is independent of the underlying dynamics. Treating the discrete rank, $i$, as a continuous variable, $x$, and correspondingly $f_i$ as a continuous function, $f(x)$, the rank-abundance curves of different cities are thus predicted to collapse onto a single curve for sufficiently large $N$. The surprise in the data is that this predicted collapse to a single curve extends all the way down to relatively small cities (that is, up to relatively high ranks), mirroring a similar precocious scaling observed in urban metrics.
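The scale transformation behind the collapse amounts to ranking business types by abundance and dividing each count by the city's population. The following minimal sketch uses two made-up cities (the counts, populations, and the function name `per_capita_rank_abundance` are all hypothetical) constructed so that both have one establishment per 21.6 inhabitants.

```python
import numpy as np

def per_capita_rank_abundance(type_counts, population):
    """Return f_i = F_i / N, sorted from the most to the least abundant type."""
    counts = np.sort(np.asarray(type_counts, dtype=float))[::-1]
    return counts / population

# Two hypothetical cities with the same per capita composition
# but a tenfold difference in size.
small = per_capita_rank_abundance([50, 200, 100, 25], population=8_100)
large = per_capita_rank_abundance([2_000, 500, 250, 1_000], population=81_000)

# Each per capita curve sums to the establishments-per-person constant.
print(small.sum(), large.sum())
```

In this toy case the two rescaled curves coincide rank by rank, which is the behavior the sum rule predicts for sufficiently large cities.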
The universal form of this scaled rank-size distribution, $f(x)$, has three distinct regimes: for small $x$ ($< x_0$, say), it is well described by a Zipfian power law with exponent $\gamma$, as shown in the inset of Fig. 2B; for larger $x$ ($> x_0$), it is approximately exponential; and finally, as $x$ approaches the maximally allowed value for the total number of categories, $D_{\max}$, $f(x)$ drops off suddenly. To a very good approximation, these can be combined into a single analytic form: The where N depends weakly on N .
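For orientation, one functional form that is consistent with the three regimes described above is a power law multiplied by an exponential and a cut-off factor. This is a sketch under stated assumptions rather than the recovered equation: the normalization $A$ and the arguments of the cut-off function $\phi$ are illustrative choices, with $\phi \approx 1$ well below $D_{\max}$ and $\phi \to 0$ as $x \to D_{\max}$.

```latex
f(x) \;\approx\; A\, x^{-\gamma}\, e^{-x/x_0}\, \phi(x, D_{\max}),
\qquad
f(x) \sim
\begin{cases}
x^{-\gamma}, & x \ll x_0 \quad \text{(Zipfian regime)},\\[2pt]
e^{-x/x_0}, & x_0 \lesssim x < D_{\max} \quad \text{(approximately exponential regime)}.
\end{cases}
```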
This represents an open-ended ever-expanding diversity with population growth and confirms that the cut-off, φ, is associated with the saturation observed in
We can understand the structure of $f(x)$ and $D(N)$ in the context of generalized preferential attachment or growing models (21,22). The growing model is a widely accepted mechanism for generating rank-size distributions, whether for words, genes or cities. It is based on a stochastic growth process in which new elements of the system (business types in this case) are attributed a probability, α, of adding a new type, or adding to an existing type (23,24). In the classic Simon-Yule model, for example, the attachment probability, α, of being an existing type is proportional to the existing frequency. As a result, the model exhibits a feed-back mechanism in which more-frequent types acquire new elements with higher probability than less-frequent
The empirical findings (Figs. 1-2) coupled with the predictions of the model described above suggest that all cities, as they grow, exhibit the same underlying dynamics in the development of their business ecology. Initially, small cities, with a limited portfolio of economic activities, need to create new functionalities at a fast pace. These basic activities constitute the economic core of every city, big and small. Later, as cities grow, the pace at which new functionalities are introduced slows down dramatically, but never completely ceases. Large cities, then, presumably rely primarily on combinatorial processes for developing new relationships among their many existing functionalities, which in turn is the source of observed increases in economic productivity. This is a general feature of combinatorial growth processes: once the set of individual building blocks is large enough, their combination is sufficient to generate novelty even when the set itself expands slowly or not at all.
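The preferential attachment mechanism referred to above can be illustrated with a minimal simulation of the classic Simon-Yule process. This is a sketch, not the generalized model of the text: it assumes a constant probability `alpha` that a new establishment founds a new business type, and otherwise assigns the establishment to an existing type with probability proportional to that type's current count. The function name `simon_yule` and the parameter values are hypothetical.

```python
import random

def simon_yule(steps, alpha, seed=0):
    """Simulate a Simon-Yule process and return type counts sorted by rank.

    At each step a new establishment founds a new business type with
    probability alpha (assumed constant here); otherwise it joins an
    existing type chosen with probability proportional to its count.
    """
    rng = random.Random(seed)
    counts = [1]    # counts[k] = number of establishments of type k
    members = [0]   # one entry per establishment, holding its type index
    for _ in range(steps - 1):
        if rng.random() < alpha:
            counts.append(1)
            members.append(len(counts) - 1)
        else:
            # Picking a uniformly random establishment selects its type
            # with probability proportional to that type's count.
            k = rng.choice(members)
            counts[k] += 1
            members.append(k)
    return sorted(counts, reverse=True)

ranked = simon_yule(steps=20_000, alpha=0.05)
print(len(ranked), ranked[:5])
```

Sorting the resulting counts gives a rank-abundance list with a few very common types and a long tail of rare ones, the qualitative pattern that such growing models are used to explain.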
The universal distribution of frequencies does not, however, account for the entire developmental process of economic functionalities in cities. The stochastic Simon-Yule model, for example, does not predict which business types occupy which ranks. If, during growth, the introduction and success of each establishment were independent of business type (but dependent on frequencies), there would be no structure in how ranks are occupied. This is in clear disagreement with the following observation as well as with the pattern that "creative" activities and innovation concentrate disproportionally in large cities (5,13).
The process by which specific business types assume different ranks in different cities may be particular to the ecology of specific places; or it may also be a property of scale. To distinguish between these two cases, we perform a multi-dimensional allometric scaling analysis of the number of specific establishments in each type. The super (or sub)-linearity of specific business types represents a systematic per capita increase (or decrease) of their abundances with city size. Fig. 3A shows an example: the number of lawyers' offices, $N_{lo}$, scales as $N_{lo} \sim N^{\beta}$, with $\beta \approx 1.17$. That the exponent $\beta$ is greater than one (super-linear) means that larger cities systematically have more lawyers per capita. Because lawyers' offices typically appear at high frequencies ($x < x_0$), we can approximate $f(x)$ in Eq. (3) by its power-law behavior and write $N_{lo}/N \sim x_{lo}^{-\gamma}$, and thereby derive how the rank of lawyers changes with city size: $x_{lo} \sim N^{(1-\beta)/\gamma}$. This predicts $x_{lo} \propto N^{-0.4}$, which is in good agreement with the actual scaling shown in Fig. 3B. We can similarly predict how the ranks of low abundance business types scale. The rank shift can be expressed as: Thus, business types whose abundances scale super-linearly with population size systematically increase their rankings, whereas those that are sub-linear systematically decrease, as expected. Most primary sectors, such as agriculture, mining, and utilities, scale sub-linearly, predicting their systematic suppression, in relative terms, as cities get larger. On the other hand, informational and service businesses, such as professional, scientific, and technical services and management of companies and enterprises, scale super-linearly, and are therefore predicted to increase disproportionally with city size, as observed. There are also sectors, such as restaurants, that do not change ranks. Note that sectors that deviate from linearity tend to be tradable industries that may be exchanged across cities (25). Because markets for these industries are not restricted to their immediate spatial location, comparative advantages may generate agglomeration effects attributable to city size and/or to specific places.
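The rank-shift relation in the power-law regime follows from two steps: abundance scales as $N^{\beta}$, so per capita abundance scales as $N^{\beta-1}$; equating this with $f(x) \sim x^{-\gamma}$ gives $x \sim N^{(1-\beta)/\gamma}$. The short sketch below evaluates that exponent. The value of $\gamma$ used here is an assumption, backed out from the two numbers quoted in the text ($\beta \approx 1.17$ and a rank exponent of about $-0.4$) rather than taken from the source, and the function name `rank_exponent` is hypothetical.

```python
def rank_exponent(beta, gamma):
    """Exponent of x ~ N**((1 - beta) / gamma), valid where f(x) ~ x**(-gamma)."""
    return (1.0 - beta) / gamma

beta_lawyers = 1.17     # scaling exponent for lawyers' offices quoted in the text
gamma_assumed = 0.425   # hypothetical Zipf exponent, chosen to be consistent with x ~ N**-0.4

print(round(rank_exponent(beta_lawyers, gamma_assumed), 2))
```

A negative exponent means the rank number falls as cities grow (the type moves toward the top of the ranking), a positive exponent means the type slips down, and a linear type with $\beta = 1$ keeps its rank.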
We have shown that the distribution of business types in U.S. cities is characterized by a universal rank-size curve in which specific types predictably increase or decrease their relative rankings and frequencies as a function of city size. The results constitute a first general picture of the properties of the economic diversity of US cities measured in terms of business types. We to business life-cycle theories (28,29), where some types of business may be more prevalent in larger cities, but over time tend to move down the urban hierarchy as they mature and internalize more of their business model. We believe that the present results, together with further analyses of revenue, employment, and temporal patterns, provide the foundation for a mechanistic understanding of how large cities realize greater economic productivity and how urbanization tends to promote nationwide economic growth (30).

Materials and Methods
Figure 2: Rank-abundance of establishment types. (A) The number of establishments at rank x, for x ranging from 1 to 90 in descending order of frequency (from common to rare), for New York City, Chicago, Phoenix and San Jose. Establishment types are color-coded by their classification at the 2-digit level. (B) Universal rank-abundance shape of establishment types, obtained by dividing $N_x$ by the population size of the city, shown on a semi-log plot over the full range. All metropolitan statistical areas are denoted by gray circles. Seven selected cities are denoted by various colors and shapes: New York City, Chicago, Phoenix, Detroit, San Jose, Champaign-Urbana, and Danville are marked, respectively, by red squares, pink diamonds, orange triangles, yellow left triangles, green right triangles, sky-blue pluses, and blue crosses. The black dashed line and the black solid line are fits predicted from Eq. 1 without and with φ, respectively. The inset shows the first 200 types on a log-log plot, showing an approximate Zipf-like power-law behavior.