Philosophical Transactions of the Royal Society B: Biological Sciences

The potential to infer the historical pattern of cultural macroevolution

Dieter Lukas

Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany


Mary Towner

Department of Integrative Biology, Oklahoma State University, Stillwater, OK 74078, USA


Monique Borgerhoff Mulder

Department of Human Behavior, Ecology and Culture, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, 04103 Leipzig, Germany

Department of Anthropology, University of California Davis, Davis, CA 95616, USA



    Phylogenetic analyses increasingly take centre-stage in our understanding of the processes shaping patterns of cultural diversity and cultural evolution over time. Just as biologists explain the origins and maintenance of trait differences among organisms using phylogenetic methods, so anthropologists studying cultural macroevolutionary processes use phylogenetic methods to uncover the history of human populations and the dynamics of culturally transmitted traits. In this paper, we revisit concerns with the validity of these methods. Specifically, we use simulations to reveal how properties of the sample (size, missing data), properties of the tree (shape) and properties of the traits (rate of change, number of variants, transmission mode) might influence the inferences that can be drawn about trait distributions across a given phylogeny and the power to discern alternative histories. Our approach shows that in two example datasets specific combinations of properties of the sample, of the tree and of the trait can lead to potentially high rates of Type I and Type II errors. We offer this simulation tool to help assess the potential impact of this list of persistent perils in future cultural macroevolutionary work.

    This article is part of the theme issue ‘Foundations of cultural evolution’.

    1. Introduction

    Human societies exhibit a diversity of cultural practices around the world (e.g. [1]). The field of cultural macroevolution aims to identify the origins of this diversity and the factors shaping the distribution of cultural variation across societies [2]. It is now almost obligatory for studies investigating cross-cultural variation to link cultural practices to phylogenies derived from linguistic, morphological or genetic data [3–5]. Phylogenetic information is often included in comparative studies, when testing for associations in the distribution of cultural variation, to account for the potential dependencies that arise among traits from a shared history. In addition, to fully understand the history of cultural traits and to determine whether any association among cultural traits does reflect a causal relationship, phylogenetic reconstructions of trait evolution trace changes in cultural practices across a tree reflecting ancestral relationships among the societies. Phylogenetic reconstructions of trait evolution offer the potential to infer what cultural variants might have been present in the past, how variants change from one state to another and what socioecological conditions might have influenced such trajectories of change [6]. As such, they allow us to test a wide range of hypotheses for the patterning of human cultural variability, bringing precision to the pursuits of early anthropologists, such as Boas' [7] interest in separating the roles of culture, environment and biology, or Murdock's [8] proposal that changes in residence rules precede change in other social structures.

    Phylogenetic reconstruction of trait evolution relies on two steps: the first is to infer the likely historical relationships among populations, and the second is to determine whether changes in a cultural trait relate to the patterns of historical splits among these populations. Here, we focus on the second part: we assume that a tree is available, and that we want to understand where and when on the tree changes in the cultural traits occurred. In most instances, we do not have information from the past to guide our inferences of the history of a trait. In effect, phylogenetic reconstructions of trait evolution are not opening a window onto the past, but painting a picture about the past based primarily on information from the present.

    The accuracy of this picture depends on how well we address at least four challenges. The first is to assess how much the past is likely to resemble the present. If traits change rapidly, we cannot say with certainty which of the variants at the tips of the phylogeny might have been present at a particular point in the past, which in our painting metaphor would be akin to the mix of colours from neighbouring societies leading to brown and blurry internal nodes. Second, the accuracy of the picture is fundamentally affected by the assumptions we use to link the current data to patterns in the past, and whether we use appropriate models of rate and directionality in the transmission of cultural variants. In terms of the metaphor, are we even using the right brush for this type of paint to capture our depiction of the past? These first two challenges question whether a phylogenetic method is appropriate to make inferences about the past for the particular trait. Third, the accuracy will depend on how complete our present information is, on whether we have an adequate sampling of cultural practices to make proper inferences about the past. In terms of the metaphor, do we have the full range of colours in our palette or are some of them missing, and if so why? Fourth, and related, what if there are traits that predominate in the present that did not exist in the past? We may be working with an entirely inappropriate paint box. The last two challenges relate to the sample available to answer a specific question about the past.

    Many studies have investigated the strength of cultural ancestry of various traits by mapping them onto an independent language phylogeny and then directly evaluating their fit with population history in order to detect phylogenetic signal. For example, Moylan et al. [9] examined the distribution of 55 East African cultural traits across a linguistic phylogeny and found that only 18 showed a clear phylogenetic branching pattern. Subsequent studies across multiple cultural domains report widely varying phylogenetic signals for both material [10] and social organizational traits [11], signals that can also vary by the scale and the prevalence of cultural boundaries [12,13]. Even across non-humans, behavioural traits often show very low phylogenetic signal compared to morphological and physiological traits (as in non-human primates [14] and other animals [15]). This widely varying strength of cultural ancestry likely reflects an interplay of the factors listed above, raising multiple challenges for phylogenetic analyses of cultural traits. These include (i) the extent to which the history of a trait can be reliably reconstructed using population history (as captured in a phylogeny), (ii) whether traits change primarily at evolutionary time scales, (iii) whether as investigators we have the appropriate data and (iv) samples from which to infer past trait states.

    We first examine these four challenges (§2) and next present simulations (§3) to illustrate some implications of these challenges for inferences in studies of trait evolution. We end (§4) with a discussion of how and why advances in addressing these challenges can make phylogenetic approaches a powerful tool for understanding human cultural diversity. The challenges we review are not fundamental flaws that prohibit cultural phylogenetic approaches, but highlight our need to know the extent to which they affect inferences drawn from phylogenetic analyses. Accordingly, we offer a checklist and our simulation as part of a potential workflow assessing the challenges researchers using phylogenetic methods face, in the spirit of ‘caveat emptor.’

    2. Review of the challenges

    (a) To what extent can the history of a trait be reflected by a phylogeny?

    (i) Are cultural traits inherited together or do they have independent histories?

    Just as biologists recognize that every gene has its own history, so social scientists appreciate this could also be true for cultural traits (e.g. [16,17]). Boyd et al. [18] evaluate how human societies differ with respect to how integrated their cultural traits are. At one end of the continuum, societies are seen as consisting of a tightly knit set of cultural traits, while at the other end, of largely ephemeral traits with only low coherence. Generally, the empirical evidence points to the middle of the continuum, with societies containing a vertically transmitted core of integrated cultural traits in addition to a hugely varying proportion of more peripheral traits, some horizontally borrowed from other populations and some independently invented. Bayesian phylogenetic analyses can be used to identify incongruent cultural histories, by enabling researchers to classify traits as core or peripheral and then test whether allowing rates of change to differ between partitions provides a better fit with the data than assuming equal rates of change [19]. While this approach can show how different traits are likely to have had different histories, it cannot reveal the particular history of the individual traits. This leaves unresolved a determination of the extent to which an independently derived phylogeny can capture the distinct histories of different culturally transmitted traits.

    (ii) Can we determine how much deviations from exclusive vertical transmission of traits will affect our ability to infer internal nodes?

    Various approaches have been suggested to identify the role of horizontal transmission, some retaining and others abandoning tree-based approaches. For example, biologists explicitly model the possibility that not all genes within an individual will necessarily share the same branching phylogeny (reconciliation analysis [20]; incomplete lineage sorting [21]), a method that can be applied to human cultural traits [19,22,23]. Cultural evolutionists turn to network analysis [24] or popularity spectra [25] to detect horizontal transmission, shown in each of these cited studies to be particularly predominant in oceanic environments. Other comparative social and evolutionary scientists turn to various matrix frameworks, employing multiple regression, Mantel tests and autologistic regression models to detect shared ancestry and/or cultural diffusion in their data (e.g. [26–30]). While these latter alternatives can indicate the relative contributions of vertical versus horizontal transmission, there is generally no direct way to link them back to inform inferences of internal nodes.

    Simulation studies are particularly helpful in examining the sensitivity of inferences about evolutionary processes to horizontal transmission. Nunn et al. [31] studied character evolution in a spatial framework, and showed that horizontal transmission can in some cases produce misleading inferences about evolutionary processes. Others show that identifying patterns of trait evolution, and trait values at internal nodes, depends not just on rates of horizontal transmission but on whether traits are borrowed as packets or singly [32] (but see discussion in [33]), or advise focusing on less ‘unrealistic’ rates of horizontal transmission [34]. But, and this is the point, we rarely have direct windows onto the past, so speculations over whether traits are borrowed as packages or singly, and at what rate, are questionable [18,19]. Decisions about what to consider a tolerable rate of borrowing or how to include borrowing explicitly in any analysis depend on what we know empirically, on the nature of the trait, and on whether the borrowing is global or local (as discussed in [24,25,31–33]).

    (b) Does the trait change over evolutionary or shorter time scales?

    (i) Can we identify the rate at which societies change their cultural traits?

    If cultural traits are highly facultative, and change over very short periods of time, they may not be amenable to phylogenetic reconstructions of their evolutionary history because there is simply too much variation at the tips of the trees. The signature of history might be weak compared to the strength of current selection pressures [35,36]. Indeed, simulations show that high rates of evolutionary change have a strong depressive impact on measures of phylogenetic signal when determined through the fit of such traits onto an independently determined tree (the ‘true’ tree, see [33]). Estimating highly facultative/flexible traits on an independently derived tree risks obscuring the possibility that there were in fact multiple undetectable changes on each branch. A phylogenetic approach cannot accurately reconstruct internal nodes if these do not retain a signal about the past [37] and might instead present a regression to the mean with larger changes earlier in the tree where deeper branches merge.
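
    The loss of signal at high rates of change can be made concrete with a minimal Python sketch. It assumes the simplest possible case, a two-state trait that switches in either direction at the same rate (a symmetric Markov model); the function name and the rate values are illustrative assumptions, not quantities taken from any of the cited studies. Under this model, the probability that a lineage shows the same variant after a given time is 1/2 + 1/2 exp(-2 × rate × time), which approaches 1/2 (a coin flip) as the rate grows.

```python
import math


def prob_same_state(rate, time):
    """Probability that a two-state trait, switching symmetrically at `rate`
    per unit time, is in its starting state again after `time`."""
    return 0.5 + 0.5 * math.exp(-2.0 * rate * time)


# Illustrative rates only: from near-static to highly facultative traits.
for rate in (0.01, 0.1, 1.0, 10.0):
    p = prob_same_state(rate, time=1.0)
    print(f"rate={rate:>5}: P(tip matches ancestor) = {p:.3f}")
```

    When this probability is close to one half, the tip value carries essentially no information about the ancestral variant, whatever the tree looks like.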

    The problem of highly facultative traits is well-illustrated in a recent innovative study linking archaeological and contemporary ethnographic data to examine the association between dwelling size and post-marital residence [38]. By plotting dwelling size and residence rules onto a time-calibrated global super-tree of human populations, the authors show that changes in house size precede changes in residence patterns. While there are possible explanations for these findings, it is hard to see intuitively how dwelling size (which might entail simply adding an extra room) is less mutable over evolutionary time than post-marital residence norms (which are known to vary with other traits such as forms of property transmission, lineality and warfare). Dwelling size is by no means a uniquely facultative trait; polygyny, for example, also appears to be highly volatile [36,39], raising the question of whether it is ever reasonable to put highly facultative traits on a phylogeny built on language evolution [35,40,41].

    Identifying rates of change and flexibility is complicated. Short of a time machine, direct evidence from history, archaeology or paleoanthropology is obviously the gold standard [41]. The preferred method for inferring past changes is to identify an independent source revealing the historical relationship among populations (step 1, as noted above), typically a linguistic or genetic tree with known branch lengths. Investigators then use the observed (tip) values of cultural traits to estimate (with probabilities) past values of the cultural trait at internal nodes and, where possible, triangulate these estimates with independent sources of data, such as archaeological data (e.g. [42]) or other well-known (and typically more recent) cultural sequences (such as the technological changes in brasswind cornets [43]). However, for many behavioural and cultural traits such historical data are unavailable.

    A different approach entails making predictions, both from intuition and empirical patterning, about the likely conservatism or volatility of traits in different domains: family and kin-based traits, for example, have long been held by anthropologists to change slowly (e.g. [44]), or at least more slowly than traits directly linked to the environment [45,46], but other patterns can emerge. Evidence from the Austronesian language family shows phylogenetic and geographical (or both) patterning of some of the social/kinship traits studied, but no clear model for just under half the 78 traits studied [11]. Similarly, material and technological traits can show widely varying phylogenetic signal [26,46,47].

    (ii) Does the granularity at which cultural traits are defined affect the inferred rate of change?

    Any measured rate of trait replacement depends on the granularity of trait measurement and categorization. This is particularly likely to be a problem for cultural traits, which are often reported as categorical states (such as bridewealth, dowry, brideservice and no payments), states that are in many instances further combined into binary categories to retain statistical power for analysis [48]. Transition rates may reflect such granularity: for example, a shift from matrilocality to non-matrilocality could take longer than a shift to multilocality, which encompasses matrilocality (see §2b(iii)). The difficulty of inferring transition rates is further compounded if the rate differs in different parts of the tree [49] or if it is higher for some transitions than for others [50]. Finally, studies of contemporary cultural transitions, such as how bridewealth payments shift into effective dowry [51,52] or how matrifocal institutions emerge and wither [53,54], can throw light on the stability and/or changeability of these specific traits, but generalization to other contexts remains problematic.
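
    A small Python sketch illustrates how the coding alone can change the number of transitions that are counted. The sequence of marriage-payment states below is entirely hypothetical, as is the choice to collapse the four states into "payment" versus "none"; neither is drawn from any dataset discussed here.

```python
def count_changes(sequence):
    """Number of times consecutive entries in a sequence of states differ."""
    return sum(a != b for a, b in zip(sequence, sequence[1:]))


# Hypothetical history of one lineage, coded with four states.
HISTORY = ["bridewealth", "brideservice", "bridewealth", "dowry", "none", "bridewealth"]

# Hypothetical binary recoding: any marriage payment versus no payment.
BINARY_CODING = {
    "bridewealth": "payment",
    "brideservice": "payment",
    "dowry": "payment",
    "none": "none",
}

fine_changes = count_changes(HISTORY)
coarse_changes = count_changes([BINARY_CODING[s] for s in HISTORY])
print(f"four-state coding: {fine_changes} changes; binary coding: {coarse_changes} changes")
```

    Merging states can only hide changes, never add them, so the same history yields a lower apparent rate of change under the coarser coding.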

    In short, we still grapple with deriving a priori hypotheses for which cultural domains are most conserved as well as with developing sound methods for determining how rapidly they change. While the ancestral relationships among variants of cultural traits can be represented as trees using ‘the most parsimonious cladogram’ [55, p. 175], the calculation of indices to represent historical changes from such cladograms is problematic ([33], §4d). It is worth stressing that the majority of traits show little or no patterning with demographic and/or geographical indices that are supposed to reflect population histories (e.g. [11,36,56]). This all points to the importance of facultative adaptation across different domains, as envisaged by Rogers & Cashdan [35]. We should then, as previously emphasized (e.g. [19,57]), acknowledge huge variability in the rates at which traits change across domains, world regions, historical periods and geographical scales before deciding whether or not a phylogenetic approach is appropriate for our particular trait of interest.

    (iii) Are the traits of interest measurable at a scale whereby the question can be satisfactorily answered?

    Closely related to the argument above, the grain at which traits are measured may preclude, or even distort, understanding of evolutionary dynamics [58]. Comparisons across large samples of diverse societies raise challenges for identifying traits that can be both measured in all societies and considered homologous. A trait that appears similar in different societies because it appears in the same context (e.g. ensuring food security in agriculturalists and foragers) might represent independent inventions or borrowing [18].

    The granularity at which traits are measured is likely to affect our ability to detect sequences of cultural transitions, whether A is a necessary precondition to B or vice versa. For example, consider ‘social complexity’, a central trait in the debate over whether moral gods precede or postdate the emergence of complex societies [59,60]. In an examination of the robusticity of the measure of social complexity, Miranda & Freeman [61] show how two societies with radically different social organizations can look the same in terms of a unidimensional measure of social complexity. By exploring beyond the first principal component of a society's social complexity ranking (see also [62]), the authors develop a scale that could add considerable nuance to the conclusion (made from transition analyses) that societal complexity causally preceded moralizing gods. The fact that often only coarse codings are available in no way invalidates phylogenetic (or any other comparative) analyses, but clearly impedes the strength of precise inference, particularly with respect to sequences [48]. Furthermore, it bears stressing that breaking down traits into multiple states will likely make it harder to identify phylogenetic signal, if finer grain traits are more likely to change (as discussed in §2b(ii)).

    More generally, for each of these data-related problems discussed in this section, starting from a clear theoretical framework that outlines the specific predicted relationships among well-defined variables can help to decide whether the empirical data available are adequate to the question at hand. For example, to understand the evolution of a concept such as ‘social complexity,’ now that social scientists recognize its general patterning we might learn more from examining the distribution of specific social traits associated with complexity (as in [63], for non-humans) rather than continuing to employ coarse unidimensional variables. Where binarization is unavoidable, it should be based on clear theoretical justification.

    (c) Are there appropriate independently derived population trees available for phylogenetic analysis?

    (i) Is there an appropriate tree to study the trait of interest at the scale of interest?

    Assuming the use of a phylogenetic tree is deemed appropriate, the next question is whether the tree available well reflects the demographic history of the culture-bearing populations. With the use of genetic or, more commonly, linguistic trees, an affirmative answer seems reasonable insofar as the associations between populations and their languages are generally quite stable. However, such detailed trees can generally only be constructed for a small number of closely related populations. Super-trees [64], which patch together multiple of these trees into a global phylogeny [65], offer the potential to explore the generality of explanatory hypotheses for cultural diversity, but they may incorporate very different histories in different parts of the tree. These differences could reflect the distinct data sources from each branch of the tree, or genuine lineage-specific evolutionary patterns, to which we turn below. It is also important to consider whether the trees chosen to represent population histories reflect a time scale appropriate to the social learning and transmission processes that generated the data being evaluated by the comparative method.

    (ii) Are there lineage-specific or adaptive effects where trait changes differ in different parts of the sample?

    Traits may show different evolutionary patterning in different language families and/or in different parts of the world. Take, for example, post-marital residence (whether a new married couple live with his, her or neither set of parents). Given the range of ecological, technological and institutional factors that have been shown to influence this cultural pattern, it is hardly surprising that there are contradictory findings between different studies conducted on different samples (as discussed in [50,66]). These can usefully be thought of as lineage-specific effects with respect to the patterning of evolutionary transitions [49]. As long as the sample sizes in each part of the sample are sufficient for inference, this diversity should be seen as a strength of phylogenetically based global comparisons, providing material with which to sharpen our understanding of the precise conditions under which particular evolutionary transitions occur. In short, while there may be a justification for aiming to test classical hypotheses at increasingly global scales (e.g. [60]), failure to find generalized support may actually be highly informative. In a similar vein, we might expect to find different evolutionary dynamics in agricultural versus foraging societies, rendering problematic inferences made from current samples that are typically dominated by agriculturalists, who (as in our opening metaphor) offer only a limited palette.

    In cases where we know or suspect the specific factors that influence cultural patterns differently across the tree, we can be more explicit about evolutionary processes. For example, Ross et al. [67] first built an explicit evolutionary model to assess how the classification of societies as stratified or not might influence the origin and maintenance of the practice of female genital modification. Next, they used an approach based on the Ornstein–Uhlenbeck process [68], which allows for adaptive hypotheses to be evaluated with phylogenetic models that explicitly include adaptive dynamics, both to test for coevolution across the phylogeny and to contrast the strength of selection on a trait with that of drift and other selective processes. In addition to testing explicit coevolutionary hypotheses, external knowledge about the influence of a factor on the cultural trait can also help to reduce the error in the reconstruction of the history of the trait.

    (iii) Are inferences affected by the resolution and structure of the tree?

    Phylogenetic relationships among an increasing number of societies are starting to be described, primarily based on language patterns [6], and increasingly incorporated into super-trees (e.g. [69]). Details of the phylogeny are important to understanding how much information about evolutionary processes can be extracted [70]. One issue is uncertainty in the tree: researchers need to consider carefully whether their tree provides a clear branching pattern, as well as robust dates for when the splits occurred. Analyses can now incorporate uncertainty, but often at a reduction in the power to make inferences [71]. In addition, the shape and the size of the phylogeny influence the inferences that can be made. Small sets of societies that branch off very early can become very influential for the reconstruction of internal nodes if data are available for only a few societies in the main clade, in the same way that outgroups are used to reconstruct the state of a common ancestor of a clade [72]. More generally, balanced trees, with regular branching patterns across the tree, appear to support more robust phylogenetic inferences of cultural traits, especially in the presence of horizontal transmission, than unbalanced trees [73]. By contrast, phylogenies with early bursts of speciation, with long branches leading to the societies, contain less information to extract the correct patterns of changes in cultural traits [70].

    (d) Is this the right sample from which to make inferences?

    (i) Is the sample large enough?

    Finding societies, or ethnolinguistic units, with appropriate measures for a cultural evolutionary analysis can be challenging. The vast majority of research to date is based on language families with well-resolved trees, although the number of societies available with appropriate data is often quite small. So, for example, in a recent study of post-marital residence, Indo-European provided 27 societies, Uto-Aztecan 26 societies, Pama–Nyungan 66 societies, Bantu 69 societies and Austronesian 135 societies [49]. In a different kind of study investigating the association between parasite stress and political traits like authoritarianism and democracy, data were available only from 52 nation states [30]. The issue is not just the number of societies with data (tips), but the total number of societies in a clade, and the patterning of variation among them: if variant A is present in all societies in one clade and variant B is present in all societies of the other clade, most likely only one change occurred, making it impossible to associate any factor with that change statistically. Indeed, Moravec et al. [49] had to drop the entire Indo-European family from their study because there were no matrilocal populations. The inverse (too many changes) is also a problem: if each society differs from its neighbour, it is impossible to reconstruct changes in the past; in our metaphor, the picture of the past is murky brown. In summary, available sample sizes may not allow for reliable inferences.

    (ii) What about missing data and extinction?

    Researchers might have to drop cases from their phylogeny, or sample societies strategically, on the basis of trait data availability. Missing data are an issue for at least two reasons. First, if the information is available for many societies in one clade but only a few societies in other clades, trees will be highly unbalanced (see §2c(iii)). Second, data (or branches) may be missing not at random with respect to the trait of interest, in which case the sample would not accurately reflect the diversity of the trait. With respect to the first issue, in Hrnčíř et al.'s study [38] the sample for studying the association between dwelling size and post-marital residence was heavily biased to New World sites, on account of the scant attention paid to material traits (such as house size) by European as opposed to American ethnographers. While the resulting sample may not be biased with respect to the hypothesis (i.e. data are missing at random), the effect of such uneven and sparse sampling might obscure any phylogenetic signal in the variables and influence inferences about internal nodes. With regard to the second problem, where there is an absence of records (e.g. among foragers in highly productive habitats; see below), inferences must be drawn with great caution because, following our metaphor, we simply do not have the full palette of colours.

    Extinctions, including unknown ones, are a linked concern insofar as simulation studies show that high extinction rates bring considerable error to estimates of root states [45], as well as influence Type I and II errors [31–33]. Furthermore, extinctions are unlikely to be random. The reason a society is extinct (i.e. does not appear at a tip) may well be associated with its trait values prior to extinction. For example, examining prevalent marriage customs (e.g. are they arranged, are they polygynous) on a tree of contemporary hunter–gatherers [74] may bias ancestral estimates toward the traits of populations that have managed to persist, often in highly marginal areas, into the present era. Surowiec et al. [50] reconstructed, both in their global super-tree and in the Bantu language family, the ancestral state for lineality to be non-matrilineal and the ancestral state for post-marital residence to be non-matrilocal (in the case of Bantu, contrary to inferences from historical and linguistic data [75]). Could this be because matrilineal (and matrilocal) societies are less likely to have survived shifts to intensive agriculture and pastoralism, which typically favour male-centred kinship institutions?

    3. Assessing potential challenges through simulation

    (a) Simulations to reveal both general and specific limitations

    Various studies have used simulations of the likely processes that generate and modify traits to assess the challenges discussed above. These typically focus on the general feasibility of cultural macroevolution studies (e.g. [31,45]) or the limitation of particular approaches to recover the history of traits (e.g. [70,76,77]). Here, following previous suggestions (e.g. [16,34]), we show that simulations can be usefully employed to assess the extent to which these challenges might influence a particular study.

    Our simulations investigate the potential impact that the different challenges might have on phylogenetic reconstructions of trait evolution. We focus on three types of inferences that are common in studies of the phylogenetic reconstruction of trait evolution (figure 1), as follows. Are some changes between the states of the trait favoured (inference i)? Do socioecological conditions influence which changes among the states of the trait occur (inference ii)? Are there differences between lineages in which changes among the states of the trait occur (inference iii)? In trying to derive these inferences, there is a risk of concluding that the answer to the inference is yes even though in truth there was no effect (false-positive Type I error) and of concluding that the answer to the inference is no even though there was an effect (false-negative Type II error). Our simulations estimate the expected chance of obtaining false-positive or false-negative conclusions given a particular phylogeny and dataset. In addition, following from the discussion above, we examine how these error rates are influenced by properties of the sample (sample size, missing data), properties of the tree (tree shape) and properties of the traits (rate of change, number of variants, horizontal transmission). By simulating likely histories of the trait of interest across the given phylogeny, the power to discern alternative histories can be assessed.
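
    The logic of estimating both error rates by repeated simulation can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the procedure used in this paper: it assumes that individual changes between two variants are observed directly (which real reconstructions on a phylogeny never are) and tests for a favoured direction with an exact binomial test. The function names, the bias of 0.7 and the numbers of changes are all illustrative assumptions.

```python
import math
import random


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: summed probability of every outcome
    that is no more likely than the observed count k."""
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(x for x in pmf if x <= pmf[k] * (1 + 1e-9))


def rejection_rate(p_gain, n_changes, n_sims=2000, alpha=0.05, seed=1):
    """Fraction of simulated datasets in which 'no favoured direction' is rejected.

    p_gain is the true probability that any one change is a gain (0.5 = no bias).
    """
    rng = random.Random(seed)
    # Counts of gains that lead to rejecting the null hypothesis at level alpha.
    reject = {k for k in range(n_changes + 1)
              if binomial_two_sided_p(k, n_changes) < alpha}
    hits = 0
    for _ in range(n_sims):
        gains = sum(rng.random() < p_gain for _ in range(n_changes))
        hits += gains in reject
    return hits / n_sims


for n_changes in (20, 100):
    # Null is true: every rejection is a false positive (Type I error).
    type_1 = rejection_rate(0.5, n_changes)
    # Bias is real: every failure to reject is a false negative (Type II error).
    type_2 = 1 - rejection_rate(0.7, n_changes)
    print(f"{n_changes} changes: Type I ~ {type_1:.3f}, Type II ~ {type_2:.3f}")
```

    Even in this idealized setting, the false-negative rate depends strongly on how many changes are available to analyse, which is why the same calculation is worth repeating for the specific tree and trait at hand.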

    Figure 1. Common inferences in cultural macroevolution and their associated errors. We focus on three inferences about the evolution of cultural traits, and the errors that can occur when trying to derive them.

    We demonstrate this approach using two language phylogenies, one for societies in the Western North American Indian (WNAI) dataset [78] (electronic supplementary material, figure S1) and one for societies included in a recent phylogenetic study of the Pama–Nyungan language family [79] (electronic supplementary material, figure S2). For the WNAI, we constructed a phylogenetic tree for the 172 societies based on a hierarchical language classification ([80], modified by [81]). For analyses that require bifurcating trees, we resolved polytomies in the phylogenies randomly, but did not represent this uncertainty by repeating the simulations across multiple resolutions because our aim here is to illustrate example cases. For the 306 Pama–Nyungan societies, we used the phylogeny provided in the study. The results we present are dataset-specific, so inferences cannot be directly transferred to other studies. However, several of the findings are consistent across the two phylogenies, and we emphasize that we selected language trees that largely reflect the sample size, tree shape and trait distributions that cultural evolutionists typically encounter. All input data, code and output are available online. The code can be adapted for use on any phylogeny prior to analysis to assess the specific power of a given dataset to test phylogenetic hypotheses.

    (b) Methods for our simulations of cultural macroevolution

    (i) Simulating discrete and continuous traits

    Across the two phylogenies, we simulated the evolution of discrete (having either two or four different states) and continuous traits using functions of the package ‘phytools’ in R [82]. Simulations start at the root (with the first discrete variant or the value zero for continuous traits) and estimate transitions along all branches. Transitions between different states (discrete traits) or rates of change (continuous traits) occurred according to four different scenarios. In the first scenario (to obtain the Type I error rates of wrongly inferring directional changes, of wrongly inferring socioecological influences on transitions and of wrongly inferring lineage differences), all transitions and changes had the same rate and occurred randomly with one consistent probability, reflecting a null model drift scenario, where all changes are equally likely across the whole tree. In the second scenario (to obtain the Type II error rate of wrongly missing directional change), there are two rates of transitions and changes. For discrete traits, one rate reflects higher transitions towards one of the variants from the other one or three variants, whereas the other rate reflects all other transitions. For continuous traits, increases are more likely than decreases. In the third scenario (to obtain the Type II error rate of wrongly missing socioecological differences), transition rates differ between societies associated with different socioecologies, such that one variant (discrete traits) or positive values (continuous traits) are favoured in branches within lineages associated with different socioecologies. To represent different socioecological conditions as potential correlates of the changes in the simulated cultural trait, we classified the WNAI societies as either living in forest habitats or not, and the Pama–Nyungan societies as hunter–gatherers or food-producers. In the fourth scenario (to obtain the Type II error rate of wrongly missing lineage differences), transition rates differ between lineages, with changes towards one of the variants (discrete traits) or increases towards positive values (continuous traits) more common along the branches of the largest clade compared to the rest of the tree.
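
    As an illustration of the first two scenarios for a two-state discrete trait, the following minimal Python sketch evolves a trait from the root to the tips as a continuous-time Markov process. It is not the phytools code used for the study: the toy tree, the function name simulate_discrete and all rate values are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy tree. Each node is (branch_length, payload), where payload
# is either a tip name (str) or a list of child nodes.
TOY_TREE = (0.0, [(1.0, [(0.5, "A"), (0.5, "B")]), (1.5, "C")])


def simulate_discrete(node, state, rates, rng, tips=None):
    """Evolve a discrete trait along one branch, then recurse into its children.

    rates[i][j] is the instantaneous rate of change from state i to state j.
    Returns a dict mapping tip names to their simulated states.
    """
    if tips is None:
        tips = {}
    length, payload = node
    elapsed = 0.0
    while True:
        targets = [j for j in range(len(rates)) if j != state]
        weights = [rates[state][j] for j in targets]
        total = sum(weights)
        if total <= 0:
            break  # no change is possible from this state
        elapsed += rng.expovariate(total)  # waiting time to the next change
        if elapsed >= length:
            break  # the branch ends before the next change happens
        state = rng.choices(targets, weights=weights)[0]
    if isinstance(payload, str):
        tips[payload] = state
    else:
        for child in payload:
            simulate_discrete(child, state, rates, rng, tips)
    return tips


rng = random.Random(1)
# Scenario 1 (drift): both transitions share one rate (assumed value).
drift_rates = [[0.0, 0.5], [0.5, 0.0]]
# Scenario 2 (directional): changes towards state 1 are more frequent (assumed values).
directional_rates = [[0.0, 1.5], [0.5, 0.0]]

print(simulate_discrete(TOY_TREE, 0, drift_rates, rng))
print(simulate_discrete(TOY_TREE, 0, directional_rates, rng))
```

    The third and fourth scenarios could be sketched in the same way by passing a different rate matrix to branches in a given ecology or clade.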

    For all scenarios, we extract the simulated states of the trait across the tips of the phylogeny. The reported error rates reflect the number of independent simulations in which the analysis supported the wrong inference. The error rates for the baseline (figure 2a) are based on 110 simulations each, reflecting the different settings and ten independent repetitions of each setting. The error rates for simulations assessing the influence of properties of the sample, of the tree and of the trait are based on the respective subsets of simulations, and we report how these deviations from our chosen baseline change the proportion of wrong inferences (indicated by + and − designations in figure 2b–d and electronic supplementary material, figures S3–S5).
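
    The error rates described above are proportions of simulations that led to a wrong inference. A minimal Python sketch of this bookkeeping follows. The function names, the 0.05 significance threshold and the example p-values are assumptions made for illustration and are not taken from the study.

```python
def type_i_error(null_p_values, alpha=0.05):
    """Percentage of null (drift) simulations in which the test wrongly supports an effect."""
    wrong = sum(p < alpha for p in null_p_values)
    return 100.0 * wrong / len(null_p_values)


def type_ii_error(effect_p_values, alpha=0.05):
    """Percentage of simulations with a true effect in which the test fails to support it."""
    wrong = sum(p >= alpha for p in effect_p_values)
    return 100.0 * wrong / len(effect_p_values)


# Invented p-values standing in for ten repetitions of one simulation setting.
null_runs = [0.40, 0.03, 0.71, 0.22, 0.01, 0.56, 0.09, 0.88, 0.35, 0.62]
effect_runs = [0.02, 0.30, 0.04, 0.01, 0.12, 0.008, 0.45, 0.03, 0.07, 0.001]

print("Type I error (%):", type_i_error(null_runs))
print("Type II error (%):", type_ii_error(effect_runs))
```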

    Figure 2. Sources and rates of errors across the simulations of trait evolution for the WNAI and the Pama–Nyungan societies. The illustrations show examples of potential sources of errors in phylogenetic reconstructions of trait evolution. They depict the evolutionary history of a trait with two states (red and black, representing either a discrete trait or negative and positive values of a continuous trait), the resulting values at the tip, and most likely reconstruction. The values underneath show the error rates in baseline simulations (a), and how much these change (increase + or decrease −) depending on an example property of the sample (missing data, b), an example property of the trait (random horizontal transmission, c) and an example property of the tree (a late-burst phylogeny, d). The reported error rates display the percentage of independent repetitions in which a phylogenetic reconstruction of trait evolution either wrongly supported an evolutionary model in which rates of change differed across the tree even though all changes were simulated to occur randomly (Type I error, false positive), or wrongly did not support an evolutionary model in which rates of change differed even though this was how the change was simulated (Type II error, false negative).

    (ii) Analysing simulated traits

    To determine whether some changes in the simulated discrete traits occurred more frequently than other changes (inference i), we compared the likelihood of a reconstruction in which all transitions are assumed to occur at the same rate to that of a reconstruction in which some transitions occur more frequently than others. We estimated the likelihoods of the equal and the biased models using the function ‘ace’ of the package ‘ape’ [83] in R, assessing significance with a likelihood ratio test adjusting the degrees of freedom to reflect the different numbers of parameters in the two models. To determine whether some changes occurred more in specific socioecological conditions (inference ii), we compared the likelihood of a reconstruction where transitions in the simulated trait are independent from changes in the socioecological condition with a reconstruction where changes in the simulated trait depend on the socioecological condition. We used the function ‘discrete’ in the software Bayestraits [84] to estimate the likelihood of a dependent model and the likelihood of an independent model, comparing them using a likelihood ratio test. To determine whether changes were more likely to have occurred in the largest clade compared to the rest of the tree (inference iii), we estimated the likelihood of a reconstruction where transitions in the simulated trait are independent of where they occur in the tree and the likelihood of a reconstruction where transitions in the simulated trait are dependent on whether they occur on a branch in the largest clade or on a branch in a different part of the tree. For this, we also used the function ‘discrete’ in the software Bayestraits and conducted a likelihood ratio test.
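
    The model comparisons above all rest on a likelihood ratio test between nested models. The following Python sketch shows the arithmetic under the usual chi-square approximation, with degrees of freedom equal to the difference in the number of parameters. It is an illustration only, not the ape or Bayestraits implementation; the function names and the example log-likelihoods are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-16 and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def likelihood_ratio_test(loglik_simple, loglik_complex, k_simple, k_complex):
    """Compare two nested models; returns (test statistic, degrees of freedom, p-value)."""
    df = k_complex - k_simple
    if df <= 0:
        raise ValueError("the complex model must have more parameters")
    statistic = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    return statistic, df, chi2_sf(statistic, df)


# Invented log-likelihoods: an equal-rates model (one parameter) against a
# biased model with two transition rates (two parameters).
stat, df, p = likelihood_ratio_test(-50.0, -47.0, k_simple=1, k_complex=2)
print(stat, df, p)
```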

    To determine whether changes in the simulated continuous traits occurred predominantly in one direction rather than randomly (inference i), we compared the likelihood of a reconstruction in which changes occurred as under a null Brownian drift model to a reconstruction in which changes occurred with a trend in one direction using functions of the package ‘OUwie’ [85] in R, assessing significance with a likelihood ratio test. To determine whether values differed between societies in different socioecological conditions (inference ii), we performed phylogenetic regressions to assess whether socioecological conditions are associated with differences in the simulated values observed at the tips (using functions of the package ‘MCMCglmm’ in R [86], assessing significance using the p-value). To determine whether values in the largest clade differed from the values in the remainder of the tree, we performed phylogenetic regressions to compare all values in the largest clade with the remaining values (using functions of the package ‘MCMCglmm’ in R, and assessing significance using the p-value).
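
    The contrast between a null Brownian drift model and a model with a trend can be made concrete with a small simulation. In the Python sketch below, which is not the phytools or OUwie code used in the study, the change along a branch of length t is drawn from a normal distribution with mean trend × t and standard deviation sigma × √t. The toy tree, the function name simulate_continuous and the parameter values are hypothetical.

```python
import math
import random

# Hypothetical toy tree. Each node is (branch_length, payload), where payload
# is either a tip name (str) or a list of child nodes.
TOY_TREE = (0.0, [(1.0, [(0.5, "A"), (0.5, "B")]), (1.5, "C")])


def simulate_continuous(node, value, sigma, trend, rng, tips=None):
    """Evolve a continuous trait along one branch, then recurse into its children.

    trend = 0 gives Brownian drift; trend > 0 makes increases more likely.
    Returns a dict mapping tip names to their simulated values.
    """
    if tips is None:
        tips = {}
    length, payload = node
    if length > 0:
        value = rng.gauss(value + trend * length, sigma * math.sqrt(length))
    if isinstance(payload, str):
        tips[payload] = value
    else:
        for child in payload:
            simulate_continuous(child, value, sigma, trend, rng, tips)
    return tips


rng = random.Random(1)
# Simulations start at zero at the root, as in the text above.
print(simulate_continuous(TOY_TREE, 0.0, sigma=1.0, trend=0.0, rng=rng))  # drift
print(simulate_continuous(TOY_TREE, 0.0, sigma=1.0, trend=0.8, rng=rng))  # trend
```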

    (iii) Identifying the effect of differences in the properties of the trait, of the sample and of the tree

    In addition to a baseline scenario, we varied the following parameters during the simulations: properties of the trait, of the sample and of the tree. To change properties of the trait, we: (i) set the number of possible variants of the trait to be either two or four (discrete traits); (ii) changed the rate at which transitions could occur from low to high (for a slow rate, this is equal to an average of 10 changes across the 610 branches in the Pama–Nyungan phylogeny and about half as many changes across the 342 branches of the WNAI phylogeny; for a fast rate, this is equal to an average of 75 changes across the Pama–Nyungan phylogeny and about 40 changes across the WNAI phylogeny); and (iii) introduced horizontal transmission by having 10% of societies at the tips copy a variant of another random society, another society in the same ecology, or their closest geographical neighbour. To change properties of the sample, we investigated samples with: (i) the full set of societies; (ii) only societies in the largest clade; and (iii) 25% or 50% of data missing from a random set of societies, from societies in the dominant ecology or from societies with one of the variants (discrete traits) or with positive trait values (continuous traits). To change properties of the tree, we manipulated branch lengths: (i) to reflect trait change at the time of the split, where branch lengths are all the same; (ii) to reflect an early burst into the existing societies, where branches between clades are short and branches to the tips are long; and (iii) to reflect a late radiation where branches separating clades are long and all diversification within the clades happened very recently. We display examples of manipulations of properties of the trait, of the sample and of the tree in figure 2 and below; for additional results see electronic supplementary material.
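
    Two of these manipulations, random horizontal transmission and randomly missing data, can be sketched in a few lines of Python. The sketch covers only the random variants; copying within an ecology or from the nearest neighbour, and non-random missingness, would need additional information for each society. The function names and the example data are hypothetical and are not taken from the study's code.

```python
import random


def add_random_horizontal_transmission(tip_states, fraction=0.10, rng=None):
    """Let a fraction of tips copy the state of another randomly chosen tip."""
    rng = rng or random.Random()
    names = sorted(tip_states)
    copiers = rng.sample(names, round(fraction * len(names)))
    result = dict(tip_states)
    for name in copiers:
        donor = rng.choice([other for other in names if other != name])
        result[name] = tip_states[donor]  # copy from the pre-transmission states
    return result


def drop_random_tips(tip_states, fraction, rng=None):
    """Remove a random fraction of tips to mimic randomly missing data."""
    rng = rng or random.Random()
    names = sorted(tip_states)
    dropped = set(rng.sample(names, round(fraction * len(names))))
    return {name: state for name, state in tip_states.items() if name not in dropped}


rng = random.Random(1)
# Invented tip data: twenty societies with a two-state trait.
tips = {f"society_{i:02d}": i % 2 for i in range(20)}
with_copying = add_random_horizontal_transmission(tips, fraction=0.10, rng=rng)
with_missing = drop_random_tips(with_copying, fraction=0.25, rng=rng)
print(len(with_copying), len(with_missing))
```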

    (c) Findings from our simulations of cultural macroevolution

    Our simulations show that for these particular datasets and phylogenies, properties of the trait have the biggest influence on the frequency of erroneous inferences and properties of the sample have a relatively minor influence, while the influence of properties of the tree is context-specific (figure 2).

    We set up the baseline simulations to generate data similar to those commonly available in cultural phylogenetics, and we analysed the simulated data with the standard settings of the respective methods. In these baselines, we obtained substantial false-positive (Type I) error and false-negative (Type II) error that range between 12–40% and 36–65%, respectively, for discrete traits, and 0–51% and 0–51% for continuous traits (figure 2a). Continuous traits generally show lower false-negative errors than discrete traits: the amount of variation in discrete traits is restricted, reducing power to detect weak effects, in particular when relying on the standard settings of the analyses that attempt to estimate a large number of parameters without any prior information. False-positive rates are largely comparable for continuous and discrete traits: the simulated traits changed relatively rapidly, such that by chance lineages and ecologies will differ. There is no consistent difference in performance between the two language trees, despite their difference in shape and tree structure balance. In the following, we present changes in the errors contingent on various properties of traits, trees and sample, but we should emphasize that throughout errors are high. While undoubtedly some of these problems might be specific to the situations we chose to simulate, and the precise magnitude of the errors might be reduced by adapting the settings of the analyses to independently available information (such as likely types of transmission), the size of these errors should not be overlooked.

    Regarding directional change, the false-positive error of wrong inference was increased most markedly from the baseline rate when variants of the trait were transmitted horizontally among societies (figure 2c; electronic supplementary material, figure S3). Such horizontal transmission might reflect actual directional changes, but we found relatively high error rates even when horizontal transmission occurred randomly: relatively high rates of random horizontal transmission (in our simulations, 10% of extant societies copy a variant) will introduce highly divergent variants, leading to Type I errors of wrongly supporting a positive signal as these divergent values would require faster changes than assumed under a null model of random drift. The Type II error of wrongly missing directional change was generally high both for discrete and continuous traits: it was highest when power was reduced because there are very few changes in total across the phylogeny, that is with fewer variants of the trait (electronic supplementary material, figure S5) and lower when horizontal transmission increased the frequency of certain states throughout the tree (figure 2c).

    Regarding the detection of socioecological influences on change, although the false-positive error rate was already high in the baseline for both discrete and continuous traits, it was larger when there was horizontal transmission that could increase the occurrence of a rare variant within one ecology (electronic supplementary material, figure S4) and when the societies in different ecologies are separated by long branches such that changes might occur by chance (e.g. late-burst phylogeny for the Pama–Nyungan, figure 2d). The false-negative error rate of missing socioecological influences on changes was highest when data were missing non-randomly (figure 2b), in particular, if certain variants are rare to begin with (less than 10% of Pama–Nyungan societies are food-producers; electronic supplementary material, figure S2). False-negative error rates for detecting a socioecological influence also increased under random horizontal transmission (figure 2c) and late-burst phylogenies (figure 2d) in both language families, although only for continuous traits.

    Regarding lineage-specific patterns, the false-positive error rate of wrongly inferring lineage differences on changes was already high in the baseline simulation for both discrete and continuous traits. It was considerably increased when there was structured horizontal transmission within lineages or ecologies (electronic supplementary material, figure S3), random horizontal transmission (figure 2c), and as a result of the tree structure, specifically in late-burst phylogenies where long branches separate lineages such that differences might occur by chance (figure 2d). The false-negative error rate of wrongly missing lineage differences was already high in the baseline (for discrete traits) and appears to decline when horizontal transmission occurs (figure 2c; electronic supplementary material, figure S3).

    4. Discussion and outlook

    The most striking aspect of these simulations is the very high rate of false-positive Type I error and false-negative Type II error contingent on properties of the traits, of the tree shape, and less so of the sample. While this result may be specific to the WNAI dataset (with its poorly resolved language phylogeny, its categorical codings and its unbalanced clade structures), largely similar false-positive error and false-negative error in the Pama–Nyungan analyses suggest they might be quite general.

    The strong effects of horizontal transmission on false inferences about selection may reflect the fact that in the WNAI societies linguistic and spatial signals are relatively uncorrelated ([26], see also [39]). This is not always the case; phylogenies vary in the extent to which they can reliably capture the spread of cultural traits, which will depend on whether or not there has been a well-documented linguistic and cultural expansion as in the Austronesian and Bantu language families. While we might all hope that the WNAI-specific findings are not generalizable, the analysis of Pama–Nyungan (with its better-balanced clade structure) is concerning in that it shows very similar false inference when traits are transmitted horizontally.

    Despite the fact that cultural phylogeneticists differentiate, quite rightly, the distinct objectives of different cultural phylogenetic studies, our simulation results reveal problems that are not particular to any one objective. More specifically, we show that properties of the traits, tree and (less so) sample generate inferential problems that result from inaccurate reconstructions of internal nodes. These inaccuracies impact not just studies that seek determination of ancestral states, but also those concerned with the identification of independent adaptations, the understanding of potential sequences of cultural transitions and the determination of whether transitions are correlated with changes in other traits, insofar as each of these objectives requires estimation of internal nodes.

    Given the potential generality of these effects of properties of traits (and to a lesser extent, of the tree and the sample) on the validity of inferences, and the relevance of these findings for the multiple objectives within cultural phylogenetic research, we suggest that the concerns we review in §2 continue to be legitimate. Accordingly, we offer a simulation that identifies the expected error rates associated with a large set of trait, tree and sample properties (see code on GitHub) that we encourage investigators to use to determine the error rates they might expect in their particular analysis. We view our contribution here as complementary to that of staunch supporters of phylogenetic analyses, who have themselves acknowledged that, where cultural reticulation has been rampant, projecting traits onto trees can only produce ‘meaningless results’ [34, p. 366], adding that phylogenetic analysis may not be suitable for all temporal and spatial time scales [3].

    We also propose the following items as a starting point for a checklist when planning a cultural macroevolution study. (i) Research questions should be informed by solid theoretical models (e.g. [67,87]). This will help address many of the dilemmas and tradeoffs outlined here, specifically with respect to how to identify which sample to use, and how to measure the trait of interest. (ii) Investigators should consider carefully whether the tree chosen to represent population histories reflects a time scale appropriate to the social learning and transmission processes that generated the data being evaluated by the comparative method. Ideally, the tree will reflect diverse sources of data across history, linguistics, genetics and archaeology (see also [41]). (iii) Investigators should consider carefully how the trait of interest is transmitted, using ethnographic or historical materials where available, taking care to differentiate mechanisms of transmission for between and within populations; this will inform about the potential rate of change, and whether horizontal transmission is likely to be prevalent (see also [88]). (iv) Investigators should explore how missing data, horizontal transmission and/or differential lineage extinction might influence their inferences for their particular sample (see our code). Investigators may have additional information about the transmission of the cultural trait they are studying that can inform the analyses and there are multiple ways to model such evolutionary processes. (v) Investigators should acknowledge the centrality of internal node trait estimation to their inferences. For example, researchers often assert that if they are using phylogenies only to control for statistical independence, the precise details of internal node reconstructions are not critical. This may be true, but it is incumbent then not to draw inferences of causality from their phylogenetic analyses when, as we have shown, certain features of the tree, the sample or the traits may exacerbate these inaccuracies.

    To return to our metaphor, we urge investigators to consider carefully what the picture they are painting of the past is actually going to show. While it may not offer perfect resolution, nor the full-colour spectrum, is there sufficient detail to discern the key features of interest?

    In sum, we are not arguing against phylogenetic studies of cultural traits. Rather, we hope our overview and our simulation will encourage researchers to consider persistent perils inherent in answering questions about the history of a cultural trait. While we recognize the importance of the cultural evolutionary goal of putting ‘pre-history back into anthropology’ [11, p. 406], we suspect more caution is still needed. Evidence for our position lies in the increasingly contradictory findings that are emerging in the literature regarding ancestral states, sequence of changes and adaptive value. The very fact that the field has grown enough to produce discrepant findings is positive and generates progress. Discrepancies may well reflect lineage, region or scale-specific effects, as well as differences in the definition or classification of traits which, with further research, will only increase our understanding of the drivers of cultural diversity, but we do need to keep in mind that methodological problems might also be entailed.

    Data accessibility

    The raw output estimates from the phylogenetic reconstructions applied to simulated trait data across two cultural phylogenies, as described in the results, are available as: Lukas D. (2021). Output data from article 'The potential to infer the historical pattern of cultural macroevolution'. Max Planck Society.

    Authors' contributions

    D.L.: conceptualization and design, methodology, analyses, writing—original draft, writing—editing and revising. M.T.: conceptualization and design, writing—original draft, writing—editing and revising. M.B.M.: conceptualization and design, analyses, writing—original draft, writing—editing and revising.

    Competing interests

    We declare we have no competing interests.


    Funding

    We received no funding for this study.


    Acknowledgements

    We thank the members of the Department of Human Behaviour, Ecology and Culture at the MPI EVA for feedback during the planning of this study, and Fiona Jordan, Sam Passmore and Tom Currie for helpful comments on an earlier version of this manuscript. We also thank the organizers of this special issue, particularly Eva Boon, as well as the Lorentz Center at Leiden University for hosting the Foundations of Cultural Evolution workshop.


    One contribution of 15 to a theme issue ‘Foundations of cultural evolution’.

    Electronic supplementary material is available online.

    Published by the Royal Society. All rights reserved.