Urban geography and scaling of contemporary Indian cities

This paper attempts a first comprehensive analysis of the integrated characteristics of contemporary Indian cities, using scaling and geographical analysis over a set of diverse indicators. We use data on urban agglomerations in India from the Census 2011 and from a few other sources to characterize patterns of urban population density, infrastructure, urban services, crime and technological innovation. Many of the results are in line with expectations from urban theory and with the behaviour of analogous quantities in other urban systems in both high and middle-income nations. India is a continental-scale, fast-developing urban system, and consequently there are also a number of interesting exceptions and surprises related to both particular quantities and strong regional patterns of variation. Specifically, these relate to the potential salience of gender and caste in driving sub-linear scaling of crime and to the geography of technological innovation. We characterize these patterns in detail for crime and invention, and connect them to the existing literature on their determinants in a specifically Indian context. The paucity of data at the urban level and the absence of official definitions for functional cities in India create a number of limitations and caveats to any present analysis. We discuss these shortcomings and spell out the challenge for a systematic statistical data collection relevant to cities and urban development in India.

Infrastructure data: Data on city-level infrastructure are available from the Census of India at http://www.censusindia.gov.in/2011census/dchb/DCHB.html, under the column "Town Amenities". Each file contains data for a single state, reported at the city level. Aggregation to UA-level data within each state follows the procedure described for population data. For our scaling analysis, we build the infrastructure metrics from the raw data provided by the Census of India in the following manner:
1. Road length = Pucca (paved) Road Length + Kuccha (unpaved) Road Length
2. Number of educational institutions = Schools (including primary, middle, secondary and senior secondary, both government and private) + Colleges (including arts, science, commerce, arts and science, arts and commerce, arts science and commerce, law, university, medical, engineering, management, and others, both government and private) + Polytechnics (government and private)
3. Number of bank branches = Nationalised bank branches + Private bank branches + Cooperative bank branches
4. Number of private toilets is available directly as the latrine count
5. Number of private electricity connections is also provided directly in the raw data
6. Number of commercial and industrial electricity connections = Industrial connections + Commercial connections + Other connections
7. Total Area is directly available in the raw data
For all infrastructure, public and private, we use data from the 2011 census for the scaling analysis. The complete data set has 911 data points.
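The construction of the derived metrics can be illustrated with a minimal sketch. The column names below are hypothetical placeholders (the actual "Town Amenities" headers differ), the three towns are made-up values, and summing constituent towns to obtain UA-level totals is an assumption about the aggregation step, which is not spelled out in this section.

```python
from collections import defaultdict

# Hypothetical column names; the real Census "Town Amenities" headers differ.
METRICS = {
    "road_length": ["pucca_road_km", "kuccha_road_km"],
    "bank_branches": ["nationalised_banks", "private_banks", "cooperative_banks"],
    "commercial_industrial_connections": [
        "industrial_conn", "commercial_conn", "other_conn",
    ],
}

def town_metrics(row):
    """Build each derived metric for one town by summing its raw columns."""
    return {name: sum(row[col] for col in cols) for name, cols in METRICS.items()}

def aggregate_to_ua(rows):
    """Sum town-level metrics over all towns sharing a UA label (assumed rule)."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        for name, value in town_metrics(row).items():
            totals[row["ua"]][name] += value
    return {ua: dict(metrics) for ua, metrics in totals.items()}

# Made-up illustrative records, not Census values.
towns = [
    {"ua": "A", "pucca_road_km": 10.0, "kuccha_road_km": 2.0,
     "nationalised_banks": 3, "private_banks": 1, "cooperative_banks": 0,
     "industrial_conn": 5, "commercial_conn": 20, "other_conn": 1},
    {"ua": "A", "pucca_road_km": 4.0, "kuccha_road_km": 1.0,
     "nationalised_banks": 1, "private_banks": 0, "cooperative_banks": 1,
     "industrial_conn": 2, "commercial_conn": 8, "other_conn": 0},
    {"ua": "B", "pucca_road_km": 6.0, "kuccha_road_km": 0.5,
     "nationalised_banks": 2, "private_banks": 2, "cooperative_banks": 0,
     "industrial_conn": 1, "commercial_conn": 10, "other_conn": 3},
]

ua_totals = aggregate_to_ua(towns)
```

Metrics that the Census reports directly (private toilets, private electricity connections, total area) would need no construction step and could be carried through the same aggregation unchanged.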
Gross Domestic Product (GDP) data: There is no official data series of urban GDP in India, so we searched for other sources. We found two small datasets (see Appendix C).

Technological Innovation data: We used the published patent records of Intellectual Property India at http://ipindiaservices.gov.in/publicsearch. Because the data are not readily available in a document format, we had to search for patent counts individually for each city. We collected data for the years 2004, 2006, 2008, and 2011. Because a few cities had no patents published in a given year (zero data points), the scaling analysis is not performed directly on the raw data points. Instead, we group the data into logarithmic bins (logbins) of population. For instance, the first logbin of population is 12.25-12.75: the patent counts of all cities whose log(population) lies between 12.25 and 12.75 are averaged, and the logarithm of this average patent count is taken. We therefore end up with 9 logbins of population, from 12.25-12.75 to 16.25-16.75, each with a corresponding log(average patent count) measure. We plot the scaling relationship between these two derived values to arrive at the scaling exponent for patents (technological innovation, or invention). The complete data set has 320 data points (cities).
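The logbinning procedure described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used: natural logarithms are assumed (consistent with the stated bin edges for cities of roughly 0.2 to 19 million people), each bin is represented by its midpoint on the population axis, the slope is obtained by ordinary least squares, and the demonstration data are synthetic.

```python
import math

def logbin_points(cities, lo=12.25, width=0.5, n_bins=9):
    """cities: iterable of (population, patent_count) pairs.
    Returns (bin_midpoint, log(mean patent count)) for each usable bin."""
    bins = [[] for _ in range(n_bins)]
    for pop, patents in cities:
        idx = math.floor((math.log(pop) - lo) / width)
        if 0 <= idx < n_bins:
            bins[idx].append(patents)
    points = []
    for i, counts in enumerate(bins):
        # Zero-patent cities lower the bin mean but never produce log(0),
        # unless every city in the bin has zero patents (bin is skipped).
        if counts and sum(counts) > 0:
            mean_patents = sum(counts) / len(counts)
            points.append((lo + (i + 0.5) * width, math.log(mean_patents)))
    return points

def ols_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Synthetic check: two cities per bin, one with zero patents and one with
# twice the target count, so each bin mean follows an exact 1.5 power law.
synthetic = []
for i in range(9):
    mid = 12.25 + (i + 0.5) * 0.5
    target = math.exp(1.5 * mid - 20.0)
    synthetic.append((math.exp(mid), 0.0))
    synthetic.append((math.exp(mid), 2.0 * target))

points = logbin_points(synthetic)
slope = ols_slope(points)
```

On this synthetic input the fitted slope recovers the exponent of 1.5 built into the data, even though half of the cities have zero patents.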

Appendix B: Scaling sensitivity to city boundaries
Fig B1 shows scaling plots and estimated exponents for road length, number of educational institutions, number of bank branches, number of private toilets, and number of private electricity connections for un-agglomerated cities. The results of this scaling analysis are as expected from theory, with sub-linear scaling for public infrastructures (road lengths, educational institutions, and bank branches) and linear scaling for private infrastructures (private electricity connections). The only apparent anomaly is the sub-linear scaling of private toilets, which changes to linear scaling upon agglomerating individual cities into approximately functional urban units (Urban Agglomerations).

Appendix C: Challenges in urban GDP analysis
This lack of official urban GDP statistics has generated estimates from non-official sources (details in Appendix A): one from 2008 covering 13 cities, measuring GDP in Purchasing Power Parity (PPP) terms, and the other from 2010 covering only 9 cities, measuring nominal GDP (see Figure C1). These data sets are inconsistent with each other: one suggests superlinear scaling with an exponent of ≃ 1.12 (roughly in line with other nations and theory), while the other suggests a slightly sublinear relationship with an exponent of ≃ 0.95. Other data sources provide partial proxies for testing the hypothesis of higher-value economic activity in cities. For instance, the number of commercial and industrial electricity connections (not power usage) shows superlinear scaling with city size, with an exponent of ≃ 1.08. In the absence of larger and more reliable datasets, it is difficult to say which of these relationships reflects the reality of GDP scaling, and the strength of economic agglomeration, in urban India.

Appendix D: Sensitivity of scaling exponent for technological innovation to binning
In the main text, we use logarithmic binning with 9 bins to estimate the scaling exponent for technological innovation (base case). In this section, we use a number of techniques to establish the scaling relationship and assess the sensitivity of the exponent (Figure D1). First, we use logarithmic binning with a larger number of bins (19) and find a scaling exponent of ≃ 1.49 (with a 95% Confidence Interval (CI) of [1.18, 1.80]), which is statistically close to the value obtained with 9 bins, ≃ 1.53 with a 95% CI of [1.22, 1.83]. One problem with logarithmic binning is the significant variation in bin sizes (small bins at the low and high ends of the scale and significantly larger bins in the middle). To address this, we create 16 equisized bins of 20 data points each and compute the average population and patents for these bins. The scaling exponent using equisized bins is ≃ 1.55 (with a 95% CI of [1.26, 1.84]). Finally, to use each individual data point in assessing the scaling exponent, we take the simplistic approach of adding 1 to the number of patents for each city (this ensures no zero values and systematically increases patents by 1 across the board). Under this method, the scaling exponent is ≃ 1.55, with a 95% CI of [1.39, 1.70]. Overall, across all these distinct estimates, the value of the scaling exponent and the corresponding CIs do not differ significantly from those obtained using the baseline logbinning approach with 9 bins.
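The two alternatives to logarithmic binning can be sketched as below. This is a hedged illustration, not the exact estimation code: it assumes cities are sorted by population before being cut into groups of 20, that the logarithm is taken of the bin averages, that natural logarithms and ordinary least squares are used, and the 320 cities are synthetic (generated from a 1.5 power law with every fourth city set to zero patents).

```python
import math

def ols_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def equal_count_points(cities, size=20):
    """Sort by population, cut into bins of `size` cities, and return
    (log mean population, log mean patents) for each bin."""
    ordered = sorted(cities, key=lambda c: c[0])
    points = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        mean_pop = sum(pop for pop, _ in chunk) / len(chunk)
        mean_pat = sum(pat for _, pat in chunk) / len(chunk)
        if mean_pat > 0:  # skip bins where every city has zero patents
            points.append((math.log(mean_pop), math.log(mean_pat)))
    return points

def add_one_points(cities):
    """Keep every city by shifting patent counts by 1 before taking logs."""
    return [(math.log(pop), math.log(pat + 1)) for pop, pat in cities]

# Synthetic data only: 320 cities spanning the population range of the
# 9 logbins, with every fourth city assigned zero patents.
cities = []
for i in range(320):
    pop = math.exp(12.25 + 4.5 * i / 320)
    patents = 0 if i % 4 == 0 else int(1e-7 * pop ** 1.5)
    cities.append((pop, patents))

binned = equal_count_points(cities)
shifted = add_one_points(cities)
slope_binned = ols_slope(binned)
slope_shifted = ols_slope(shifted)
```

With 320 cities and bins of 20, the first function yields the 16 equisized bins described above. The add-one transform keeps all 320 points, but on data with many zeros its slope can differ noticeably from the binned estimates, so agreement between the methods should be checked rather than assumed.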