Proceedings of the Royal Society B: Biological Sciences
Open AccessResearch articles

Drivers of geographical patterns of North American language diversity

Marco Túlio Pacheco Coelho

Marco Túlio Pacheco Coelho

Department of Human Dimensions of Natural Resources, Colorado State University, Fort Collins, CO, USA

Departamento de Ecologia, ICB, Universidade Federal de Goiás, 74.690-900 Goiânia, Goiás, Brazil

[email protected]

Google Scholar

Find this author on PubMed

,
Elisa Barreto Pereira

Elisa Barreto Pereira

Departamento de Ecologia, ICB, Universidade Federal de Goiás, 74.690-900 Goiânia, Goiás, Brazil

Google Scholar

Find this author on PubMed

,
Hannah J. Haynie

Hannah J. Haynie

Department of Human Dimensions of Natural Resources, Colorado State University, Fort Collins, CO, USA

Google Scholar

Find this author on PubMed

,
Thiago F. Rangel

Thiago F. Rangel

Departamento de Ecologia, ICB, Universidade Federal de Goiás, 74.690-900 Goiânia, Goiás, Brazil

Google Scholar

Find this author on PubMed

,
Patrick Kavanagh

Patrick Kavanagh

Department of Human Dimensions of Natural Resources, Colorado State University, Fort Collins, CO, USA

Google Scholar

Find this author on PubMed

,
Kathryn R. Kirby

Kathryn R. Kirby

Department of Ecology and Evolutionary Biology and Department of Geography and Planning, University of Toronto, Ontario, Canada

Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany

Google Scholar

Find this author on PubMed

,
Simon J. Greenhill

Simon J. Greenhill

Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany

CoEDL (ARC Centre of Excellence for the Dynamics of Language), Australian National University, Canberra, Australia

Google Scholar

Find this author on PubMed

,
Claire Bowern

Claire Bowern

Department of Linguistics, Yale University, New Haven, CT, USA

Google Scholar

Find this author on PubMed

,
Russell D. Gray

Russell D. Gray

Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany

Google Scholar

Find this author on PubMed

,
Robert K. Colwell

Robert K. Colwell

Departamento de Ecologia, ICB, Universidade Federal de Goiás, 74.690-900 Goiânia, Goiás, Brazil

Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA

University of Colorado Museum of Natural History, Boulder, CO 80309, USA

Google Scholar

Find this author on PubMed

,
Nicholas Evans

Nicholas Evans

CoEDL (ARC Centre of Excellence for the Dynamics of Language), Australian National University, Canberra, Australia

Google Scholar

Find this author on PubMed

and
Michael C. Gavin

Michael C. Gavin

Department of Human Dimensions of Natural Resources, Colorado State University, Fort Collins, CO, USA

Department of Linguistic and Cultural Evolution, Max Planck Institute for the Science of Human History, Jena, Germany

[email protected]

Google Scholar

Find this author on PubMed

    Abstract

    Although many hypotheses have been proposed to explain why humans speak so many languages and why languages are unevenly distributed across the globe, the factors that shape geographical patterns of cultural and linguistic diversity remain poorly understood. Prior research has tended to focus on identifying universal predictors of language diversity, without accounting for how local factors and multiple predictors interact. Here, we use a unique combination of path analysis, mechanistic simulation modelling, and geographically weighted regression to investigate the broadly described, but poorly understood, spatial pattern of language diversity in North America. We show that the ecological drivers of language diversity are not universal or entirely direct. The strongest associations imply a role for previously developed hypothesized drivers such as population density, resource diversity, and carrying capacity with group size limits. The predictive power of this web of factors varies over space from regions where our model predicts approximately 86% of the variation in diversity, to areas where less than 40% is explained.

    1. Introduction

    Humans collectively speak over 7000 distinct languages, and these languages are unevenly distributed across the globe [1,2]. Surprisingly, we still know little about the complex web of processes that shape these geographical patterns of language diversity (i.e. the number of languages spoken in a given region). Linguists distinguish three types of diversity—the number of languages (language diversity), the number of language families (phylogenetic diversity), and the amount of structural difference between languages (typological diversity or disparity). Here, we focus only on the number of languages, using the term language diversity, which in contrast to the more ambiguous term linguistic diversity indicates that languages are the unit of our diversity measures.

    One barrier to our prior understanding has been contradictory results from the limited number of empirical studies that have investigated the relationship between environmental and/or sociocultural variables and language diversity [1,38]. Prior studies have found mixed results for the effect of environmental variables, spatial heterogeneity, and isolation on language diversity [812]. For example, human populations may expand social networks to cope with higher levels of ecological risk, resulting in larger language ranges and lower levels of language diversity per unit area [13]. Although some prior studies have concluded that the most commonly used measure of ecological risk in linguistics—mean growing season—correlates with language diversity (e.g. [10,11]), others have found little support for this relationship (e.g. [4,8,12]).

    Two methodological challenges contribute to the inconsistencies in these results: first, previous studies have tried to identify universal predictors of language diversity, but it is possible that no universal predictor exists. Research in macroecology has shown that the drivers of observed spatial patterns in biodiversity tend to be spatially variable [1416]. We might assume that the mechanisms driving language diversification also vary from one location to another, but the methods used to date cannot capture this potential non-stationarity. Second, contradictory results may also reflect the complexity of the pattern being studied, which can be generated by a web of both direct and indirect pathways. For example, environmental drivers of language density vary across subsistence types [17]; the adoption of agriculture, or new boat and fishing technology, may transform the number of people a given ecoregion can support; or political centralization, the product of a particular historical trajectory, may homogenize a previously disparate linguistic mosaic.

    Surprisingly, only a limited number of statistical techniques have been used to explore the direct and indirect associations between multiple predictors underlying the heterogeneous spatial patterns of language diversity [1]. To the best of our knowledge, only one previous study briefly explores a simple structural equation modelling approach that considers the direct and indirect effect of three variables on the distributional range size of languages [12]. Here, we overcome prior methodological limitations by designing a path analysis model that assumes direct and indirect effects of environmental and sociocultural variables on language diversity, while exploring spatial variation in the predictors' effects. Our study is the first to use a geographically weighted path analysis (GWPath) to examine possible drivers of human diversity patterns.

    (a) Factors contributing to language diversity patterns

    Because languages are markers of social boundaries within and between groups [1820], group boundary formation is a critical step in language diversification. The formation or dissolution of group boundaries can be influenced by many different environmental and social factors [1]. Variation within a language can lead to new language formation (i.e. cladogenesis) if these group boundaries are stable and socially important, amplifying the degree of linguistic difference between groups to the point that erstwhile dialects become distinct languages. Here, our aim is to demonstrate the importance of complex paths and non-stationarity by examining a subset of variables that have been widely discussed in the literature and may contribute to group boundary formation, thus affecting spatial patterns of language diversity. We do not focus on the internal factors contributing to individual language variation [2126], rather we focus on a subset of the large-scale processes that may shape language diversity patterns in a broader ecological context.

    We examine the direct and indirect effects of eight factors hypothesized to influence group boundary formation and language diversity patterns: river density, topographic complexity, ecoregion richness, climate (i.e. temperature and precipitation constancy, and climate change velocity), population density, and carrying capacity with group size limits. Rivers and topography have recently been proposed as universal predictors of language diversity at a global scale [7]. Movement and isolation are both critical processes for the formation of group boundaries [26,27]. When groups of people move to the other side of physical barriers, the costs of interacting with neighbouring groups can increase, leading to social isolation and group boundary formation [7,28,29]. Rivers and complex topography may act as barriers to contact among groups, promoting isolation and driving diversification, in a mechanism similar to models of allopatric speciation developed in ecology and evolutionary biology to explain biodiversity patterns [29]. This mechanistic link implies that both river density and topographic complexity should be positively correlated with language diversity. Alternatively, rivers may also improve transportation, which can increase contact among groups and undermine group boundary formation leading to less language diversity in a region [7,30,31]. In addition, in regions such as Southern New Guinea [32,33] complex linguistic differentiation has occurred despite the absence of any complex topography, suggesting linguistic differentiation in circumstances of ethnic intermarriage and multilingualism can sometimes be accelerated by easily traversed terrain.

    Many prior studies discuss possible links between language diversity and biological diversity [4,11,34,35]. One possible explanation for the association between biological and language diversity is that biodiversity facilitates group boundary formation through resource partitioning [11]. The development of unique subsistence strategies and technologies may allow different groups to thrive within different ecoregions, each of which represents a distinct assemblage of species [36]. Therefore, ecoregion richness (i.e. number of ecoregions) might be expected to associate positively with language diversity.

    Climate may influence group boundary formation and geographical patterns in language diversity via multiple pathways [17]. For example, unstable and extreme climatic conditions of temperature and precipitation contribute to higher ecological risk for human groups, which can lead to the growth of larger social networks that provide a source of alternative resources and manage risk [9,13,32]. Larger social networks limit group boundary formation and promote linguistic homogenization [10,37]. Therefore, we would expect fewer languages in areas that experience greater fluctuation in climatic conditions of temperature and precipitation. We propose that the velocity at which the climate has changed may also be a proxy for long-term ecological risk, because higher velocity of climate change indicates more instability of climate in a region over longer periods of time. In addition, the velocity of climate change over longer periods of time played an important role in the human colonization of the globe, opening pathways and territories for settlement where climatic conditions were suitable for humans (e.g. warming of northern regions) [38].

    Climate may also influence language diversity through its effects on human population densities. When climatic conditions are favourable (i.e. warm and wet) and predictable, human groups can be more assured of rich and stable sources of resources that may support higher population densities [3941]. Several other environmental and sociocultural variables also shape potential population densities. For example, population densities may increase in coastal regions, given greater access to marine resources; in topographically complex areas due to access to a range of nearby ecosystems and restrictions on available level surfaces for settlement [41,42]; and in areas of higher river density, where rivers provide services such as food and water that directly affect the establishment of human groups [7]. In addition, less mobile groups and those with established land ownership norms tend to have higher population densities [41,43,44].

    Multiple possible mechanisms link higher population densities with greater language diversity per unit area. As has been suggested in ecological theory, regions that support more individuals may also accumulate more diversity over time due to stochastic diversification events [44,45]. If more individuals exist in a given location, the probability of high linguistic variation also increases, and therefore we expect higher rates of diversification. Similarly, Bromham et al. [46] found that larger populations have faster rates of innovation, which could lead to more languages as changes accumulate. Another possible link involves the effects of group size on boundary formation. Large groups provide more opportunities to cooperate in resource acquisition, but also increase the costs associated with maintaining social ties [10,47,48]. Limits on the size of human groups imply that regions that can support higher population densities will tend to have greater language diversity [49]. However, these limits are not fixed—for example, increases in food production per unit area (e.g. as a result of the development of intensive agriculture) as well as the evolution of centralized political institutions have both been associated with increases in maximum group sizes and linguistic homogenization [50,51].

    Prior studies seeking to identify factors linked to language diversification have been almost exclusively based on correlative analyses [1], in which no causal story is modelled [52]. Recently, a relatively simple mechanistic simulation model explored causal explanations for language diversity in Australia [49]. The model reproduced the spatial pattern of language diversity in Australia assuming only that carrying capacity varies over space as a function of the environment, and groups have maximum size limits (i.e. carrying capacity with group size limits) [49]. However, the carrying capacity with group size limits mechanism remains untested in other regions of the world.

    Here, we test the hypothesized effect of each of the eight factors discussed above (river density, topographic complexity, ecoregion richness, temperature and precipitation constancy, climate change velocity, population density, and carrying capacity with group size limits) using a path analysis that models the multiple paths through which predictors could be associated with language diversity. Each pathway implies a different set of mechanisms that may shape language diversity. River density, number of ecoregions, topographic complexity, and climate may directly shape language diversity, or influence diversity indirectly through effects on population density. Population density can also directly affect language diversity, or influence diversity by contributing to the carrying capacity with group size limits mechanism. Therefore, large groups of people can occupy small areas if population density is high, which affects the total number of groups in a given region. We designed two types of path analysis models, one assuming that the relationship between predictors is constant over space (i.e. Stationary Path Analysis), and another assuming that the relationship between predictors may vary over space (i.e. GWPath). Our analysis examines the strength of associations between the hypothesized predictors and language diversity, and how these effects vary over space. The only variable that explicitly captures a causal relationship is carrying capacity, which is produced by a mechanistic simulation model (see Methods and [49]).

    (b) Geographical domain

    We applied our models to understand the spatial pattern of language diversity in North America. We obtained the distribution of languages in North America from Goddard [53], which provides information about the approximate spatial distribution, around the time of colonial contact, of languages north of Mexico, and the Survey of California and Other Indian Languages, which provides additional detail in a particularly diverse region. Using these data, we calculated the number of languages occupying geographical cells on a gridded map at the resolution of 300 × 300 km (figure 1; See Sensitivity analysis in the electronic supplementary material).

    Figure 1.

    Figure 1. Observed language diversity. Language ranges are shown in the gridded map. Blank spaces on the map indicate regions in which no information about language distribution is available and thus were not compiled in the grid map. (Online version in colour.)

    North America provides an ideal setting to examine how the relative effects of explanatory factors vary over space, as the continent contains a wide range of environmental and sociocultural conditions and a wide spectrum of language diversity. Prior to European contact, the continent supported hundreds of languages [53,54], unevenly distributed over the continent, with greater richness along the west coast and at lower latitudes [53,55]. Prior research has proposed many factors to explain the empirical pattern of North American language diversity (e.g. [55]), but no empirical study has tested them. Here, we explore the direct and indirect effects of river density, topographic complexity, ecoregion diversity, climate, population density, and carrying capacity with group size limits on the spatial pattern of North American language diversity. These factors encompass proposed drivers of language richness in North America and are also expected to drive global patterns of language diversity [29].

    (c) Results and discussion

    To explore both indirect and direct effects of each factor, we first conducted a stationary path analysis that assumes the effects of environmental and sociocultural variables are constant over space. The variables included in our model vary in the direction of effect (i.e. negative and positive; figure 2). Population density, carrying capacity with group size limits, and ecoregion richness had the strongest direct effects, suggesting a role for multiple mechanisms in shaping language richness patterns (figure 2).

    Figure 2.

    Figure 2. Global path model quantifying direct and indirect effects of environmental and sociocultural factors on North American language richness. The numbers marking each arrow represent the standardized β coefficients (i.e. path coefficients) for language diversity. Model fits (R2) are shown for variables directly affected by other factors. (Online version in colour.)

    Population density had the strongest direct effect on language diversity (β = 0.44; figure 2), supporting the proposed mechanism that a larger number of individuals should lead to a greater accumulation of languages. The simple mechanistic model, simulating the effects of varying carrying capacity with group size limits was also one of the strongest predictors of language diversity (β = 0.25, figure 2). Therefore, in regions with higher potential carrying capacity, limits on the size of human groups tended to lead to greater language richness [49]. Finally, the strength of the direct effect of ecoregion richness (β = 0.20, figure 2)1 implies that resource partitioning may contribute to language diversification [11], as unique subsistence strategies and technologies could allow different human groups to thrive within different ecoregions.

    We emphasize here that carrying capacity with group size limits is the only component of our path analysis that is modelled in a mechanistic, explicitly causal manner. The correlations used to explore all the other components indicate an association with language diversity, but future simulation modelling will be needed to verify the causal mechanisms that link these components with language diversification.

    The stationary path analysis approach also demonstrates the indirect roles played by several variables. For example, if we evaluated only the direct effects of variables, as was commonly done in prior language diversity studies [11], we would conclude that topographic complexity has little influence on language diversity. However, each of these variables does have a substantial indirect effect by shaping population density (figure 2). Topographic complexity may indirectly affect population density through its positive association with resource availability [5658], which, in turn, may influence the number of people that can live in a given location (i.e. population density; [41]).

    2. Geographically weighted path analysis

    The combination of environmental and demographic variables in our stationary path analysis explains 50% of the variation in the spatial pattern of language richness in North America (figure 2). The stationary path analysis has a large statistical effect (effect-error ratio = 28.430) relative to the magnitude of error given the null expectation (see Comparison to a Null Model in the electronic supplementary material). However, this analysis does not allow us to explore how drivers of linguistic richness vary over space. To overcome this limitation, we conducted a GWPath, which assumes that the effects of hypothesized factors may vary over geographical space. To the best of our knowledge, this is the first study to apply a GWPath to examine human diversity patterns.

    The effects of the predictors we tested vary widely over space (figure 3a). The overall model performs well in some regions of North America (e.g. the northwest region where R2 ∼ 0.80, figure 3b), but the model fit varies over space (36–86%), with an average R2 of 0.61. Our model also has a large statistical effect over space relative to the magnitude of errors given the null expectation (minimum effect-error ratio = 3.7, see Comparison to a Null Model in the electronic supplementary material). In addition, we find no universal predictor of language richness. Instead, the variables that most strongly affect language richness change from one region to another across the continent (figure 3c), implying that the mechanisms of language diversification also vary over space. This result helps to explain why the variables tested in previous global-scale studies tend to explain only a limited portion of the variability in language richness, and why different regional analyses point to the importance of distinct sets of variables [1]. Spatial variation in explanatory variables is also found in macroecological analyses of species diversity patterns (e.g. [15,59,60]). For example, although species diversity is strongly limited by water availability in southern regions, in northern regions energy availability is more important [59]. Our results show not only that the most important predictor varies over space, but also that predictors can vary in the direction of their effects in different regions (figure 4). Climate change velocity presents different directions of effect in two different regions of North America: the northern region and eastern region (figure 4d). In the northern region, climate change velocity has a positive direct effect on language richness, while the effect is negative in the eastern region (figure 4d). The high rate of climate change in the northern region reflects rapid warming following the Last Glacial Maximum (LGM) (e.g. ice sheet melting, [61]), which likely opened ecological opportunities for human populations to obtain more resources given the positive effect of past climate change on many aspects of biodiversity in these northern regions [62]. Conversely, in the eastern region (figure 3c), the effect of climate change velocity is negative (figure 4), suggesting that climatic instability since the LGM prevented or reduced language diversity. The effect of climate change velocity across both regions is consistent with a long-term version of the ecological risk hypothesis [9,13]. Nettle [13] proposed that in areas with high seasonal variation in food availability, humans will experience high levels of ecological risk. An increased probability of food deficiencies may force people to form social bonds across wider areas, to ensure access to sufficient resources. Wider social networks may increase the geographical range of a language and reduce language diversity in areas that pose greater ecological risk. Over thousands of years of human spread in North America, higher climate change velocity likely decreased ecological risk in northern regions, while climatic change may have increased ecological risk farther south. The strong indirect effect of temperature constancy (figure 2; electronic supplementary material, figure S5b) on language diversity is another indication of the importance of ecological risk for shaping population density and language diversity.

    Figure 3.

    Figure 3. GWPath applied to North American linguistic diversity. (a) In the GWPath model, the standardized β coefficients of variables, as well as the R2 for the direct relationships are represented by the average value over the continent, followed by its standard deviation. (b) Model fit varies over the geographical domain of North America. (c) Variables with the highest total coefficient (sum of direct and indirect effects) also vary across the continent. (Online version in colour.)

    Figure 4.

    Figure 4. Direct effect of predictors mapped over the North American domain. The standardized β coefficient is mapped for (a) river density, (b) ecoregion richness, (c) topographic complexity, (d) climate change velocity, (e) precipitation constancy, (f) population density, and (g) carrying capacity with group size limits. (Online version in colour.)

    Our GWPath also reveals that river density is not the primary predictor of language diversity in any region of North America (figure 3c). River density has been proposed as a global universal predictor of language diversity [7], but it does not show substantial effects in any region of North America when compared to other variables (figure 3c).

    Where our model performs best (R2 > 0.5; red areas in figure 3b), population density and climate (i.e. temperature or precipitation constancy) are the variables most strongly affecting language diversity (figure 3c). The strong association of these variables in the areas of highest model fit provides support for several of the proposed pathways of language diversification (See factors contributing to language diversity patterns). Therefore, in those regions we can identify the best predictors of language diversity and better understand what is driving the performance of our model. However, in other regions (green in figure 3b), the model explains less than 50% of the variation in language diversity (R2 < 0.5). One possible reason for the poorer model performance in these regions is that pre-colonial human groups may have used rivers differently in different regions. The observed effect of river density on language diversity in the areas of lower model performance is the opposite (negative effect) to what has been hypothesized in the literature (figure 4a). One potential mechanism that may explain this negative correlation involves the impact of rivers on transportation. Compared to the west, many of the rivers in the central part of the continent flow through plains with fewer rapids, making them more navigable. Therefore, these rivers may have served to connect human groups and reduce language diversity, as opposed to acting as a barrier and means of group boundary formation. Finally, there are multiple sociocultural and historical factors that cannot be summarized in gridded map cells, and thus are absent from our model, including subsistence strategies, agricultural development, trade, and political complexity [12,29,63] that may be part of the unexplained percentage of variation. For example, the spread of politically complex agricultural societies may be a dominant factor in the reduction of language diversity [12].

    To the best of our knowledge, this is the first study to investigate the complex web of predictors underlying geographical patterns of language diversity. We show that the strongest effects on North American language diversity involve variables associated with previously developed hypotheses that assume the effect of resource availability, resource diversity, and climate affecting population density, and thus language diversification. The many factors are connected in a complex web of causality, consisting of both direct and indirect effects. Moreover, no single predictor explains the pattern of language diversity in North America, and the best predictors of language diversity vary over space. Thus, our study sheds light on important points that should be taken into consideration in future studies of language diversity, namely that the ecological drivers of language diversity are neither perfectly universal nor entirely direct. The combination of path analysis techniques with the exploration of non-stationarity in predictors' effects can help us to examine these complexities, and better understand a more complete picture of human biogeography. The methodological approach outlined here may serve as a template for exploring the potential interaction between multiple factors that have shaped geographical patterns of human diversity across the planet.

    3. Methods

    (a) Data

    We obtained the approximate distribution of languages in North America immediately prior to European contact from two sources. We used the Survey of California and Other Indian Languages map (http://linguistics.berkeley.edu/~survey/resources/language-map.php) for the approximate spatial extents of California language ranges, and we digitized language ranges for other regions from Goddard [53]. The final map consisted of 344 language ranges. The geographical domain of North America was represented by an equal-area, gridded map at the resolution of 300 × 300 km. Our choice of this grid resolution ensured that grid cells were small enough to capture the variation in language diversity across space. We tested the sensitivity of our results to different grid resolutions; and we concluded that the results remained qualitatively insensitive to grid resolution (see Sensitivity Analysis in the electronic supplementary material). We computed the number of languages (i.e. language diversity) and extracted each predictor variable for each grid map cell (electronic supplementary material, figure S6).

    High-resolution river maps for North America were obtained from the Global Self-Consistent Hierarchical High-resolution Shoreline dataset ([64], www.soest.hawaii.edu/wessel/gshhg/). Following Axelsen & Manrubia [7], we defined river density as the number of river branches within a geographical cell. We obtained data on ecoregions from the Terrestrials Ecoregions of the World dataset ([36]; www.worldwildlife.org/publications/terrestrial-ecoregions-of-the-world), and we used the number of terrestrial ecoregions within each geographical cell as a measure of ecoregion richness. We measured topographic complexity as the standard deviation of elevation above the sea level (m) within a cell ([65]; www.worldclim.org/). We used climate change velocity since the LGM [62] as a measure of long-term ecological risk. Climate change velocity measures the rate of displacement of climate over the geographical space by dividing the climatic difference between two periods by climate change over space. We calculated the inter-annual variability (i.e. constancy) of temperature and precipitation following the Colwell index of constancy [66]. Constancy is used to describe the time-independent magnitude of variability of temperature and precipitation. We calculated precipitation and temperature constancy using data from ecoClimate [67] for 1900–1949 from the CCSM4 model. We extracted the estimated population density (people per km2) for foraging societies [42] in each grid cell (see Population Density in the electronic supplementary material).

    The effect of carrying capacity with group size limits on language diversity was simulated through a recently proposed mechanistic simulation model of language diversity (see Simulation Model section in the electronic supplementary material for additional details) [49]. The model's basic assumption is that the carrying capacity of a region is a function of the environment. Thus, locations that support more humans per unit area can also support more languages. The model accurately predicted the diversity of Australian languages [49], and here we apply it to North America. After running the model, replicated 120 times, we used the simulated geographical distribution of language ranges to summarize the model's prediction in the 300 × 300 km grid of North America. The prediction extracted from the model and used in our path analysis was a ratio between the number of languages predicted in each cell and total number of languages predicted for the geographical domain. We used the average among 120 model replicates as our carrying capacity with group size limits estimation in the path model.

    (b) Statistics

    Based on the hypothesized roles of the predictors used in our study on language and cultural diversity, we designed a path analysis model including the direct and indirect effects of our predictors on language diversity (figure 1). We evaluated the proposed direct and indirect effect of each variable on language diversity while controlling for the effects of the remaining predictor variables. We used the standardized partial slope coefficient of a multiple regression (i.e. path coefficient) to represent the strength of the effect of each variable on language diversity. This modelling technique allows us to explore direct, indirect (i.e. multiplication of direct coefficients), and total effects (i.e. sum of direct and indirect coefficients) of each predictor.

    Path analysis assumes stationarity in the relationship among variables, but no theory would suggest that mechanisms of language diversification must be the same in all locations. In order to explore the potential for non-stationarity in our results, we also employed a GWPath, in which we estimated the coefficients for the predictor variables for each geographical cell following a Geographically Weighted Regression (GWR) [14] with a Gaussian distance function. We estimated a bandwidth for the GWR by visual inspection [14] and Akaike criteria model selection, which considers the likelihood of the model as well as its complexity. The best bandwidth obtained was 8° (approx. 880 km), which avoids overfitting and has a good fit to empirical data. Statistical analysis was conducted in R. GWPath used the ‘gwr’ function of the ‘spgwr’ package ([68]; also see electronic supplementary material for data and code). We also compared the predictions of our model against the expectations of a null model, which randomized language diversity in North America among grid cells, effectively removing the spatial pattern in language diversity (see Contrast Against a Null Model in the electronic supplementary material).

    Data accessibility

    The data used in this study are available as electronic supplementary material.

    Authors' contributions

    M.T.P.C. and M.G. jointly conceived the study. M.T.P.C. led the writing and created the figures with input from all authors. E.B.P. performed statistical analysis. H.H., produced the language distribution map. M.T.P.C., E.B.P., K.K., H.H. and P.K. processed the spatial data. T.F.R. programed the mechanistic simulation and M.T.P.C. applied it for North America. All authors contributed conceptually to the design of the study and interpretation of results.

    Competing interests

    We declare we have no competing interests.

    Funding

    Research was supported by the National Science Foundation (award no. 1660465). M.T.P.C. and E.B.P. are supported by PhD scholarships provided by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES - Finance Code 001). T.F.R. is supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grant nos. PQ309550/2015-7), and by INCT in Ecology, Evolution and Biodiversity Conservation (grant nos. MCTIC/CNPq 465610/2014-5 and FAPEG 201810267000).

    Footnotes

    1 In Australia, there are language-origin stories explicitly linking language regions of clans to ecological differentiation through staple foods, such as the tradition of the founding ancestress Warramurrungunji [25], who placed different plant foods (lily roots, yams, etc.) in different parts of the landscape at the same time as she placed people there and instructed them in what their clans would be, what their languages would be, and what they would eat.

    Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.4440170.

    Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

    References