Philosophical Transactions of the Royal Society B: Biological Sciences
Review article

Mammal madness: is the mammal tree of life not yet resolved?

Nicole M. Foley

School of Biology and Environmental Science, Science Centre East, University College Dublin, Dublin 4, Ireland

Mark S. Springer

Department of Biology, University of California, Riverside, CA 92521, USA

Emma C. Teeling

School of Biology and Environmental Science, Science Centre East, University College Dublin, Dublin 4, Ireland

    Abstract

    Most molecular phylogenetic studies place all placental mammals into four superordinal groups, Laurasiatheria (e.g. dogs, bats, whales), Euarchontoglires (e.g. humans, rodents, colugos), Xenarthra (e.g. armadillos, anteaters) and Afrotheria (e.g. elephants, sea cows, tenrecs), and estimate that these clades last shared a common ancestor 90–110 million years ago. This phylogeny has provided a framework for numerous functional and comparative studies. Despite the high level of congruence among most molecular studies, questions still remain regarding the position and divergence time of the root of placental mammals, and certain ‘hard nodes’ such as the Laurasiatheria polytomy and Paenungulata that seem impossible to resolve. Here, we explore recent consensus and conflict among mammalian phylogenetic studies and explore the reasons for the remaining conflicts. The question of whether the mammal tree of life is or can be ever resolved is also addressed.

    This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.

    1. Introduction

    Of all classes of animals, humans are most concerned with and fascinated by class Mammalia, of which we are members. Class Mammalia contains warm-blooded animals that have hair or fur, produce milk and typically give birth to live young [1]. As of 2005, there are approximately 5400 living mammalian species that inhabit every biome on earth, range in size from the 2 g bumblebee bat to the 170 tonne blue whale, and exhibit unprecedented phenomic diversity and ecological adaptations [1]. Class Mammalia is divided into two subclasses: Prototheria, which contains the egg-laying monotremes (platypus and echidna), and Theria, which contains the placental and marsupial clades [2]. The mammalian fossil record extends deep into the Triassic (approx. 220 Ma) and records the evolution of mammalian lineages through extreme changes in flora, environments and landmasses during the Cretaceous Terrestrial Revolution (KTR) and the Cretaceous–Palaeogene (KPg) mass extinction events [3–6].

    Despite the keen interest in mammals, the evolutionary history of this clade has been and remains at the centre of heated scientific debates [4,5,7–12]. In part, these controversies stem from the widespread occurrence of convergent morphological characters in mammals, which makes it difficult to tease apart homology and homoplasy in phylogenetic analyses that are solely based on these characters [4,9,13,14]. Molecules have proven more successful in recovering relationships among extant taxa in the mammalian tree [4,5,15,16], but molecular data cannot be obtained for most extinct taxa. Notable exceptions include South American ungulates and glyptodonts, which have been positioned in the mammalian tree based on protein sequences of type I collagen [17] and complete mitogenomic DNA sequences [18], respectively. Even for molecular data, different data types require different phylogenetic models [14], each of which has its own limitations [10]. Whole genome analyses have promised to revolutionize our understanding of animal evolutionary history, but some claims for the robust resolution of difficult nodes with phylogenomic data are underpinned by problematic analyses and/or poor data [10,19,20]. Despite these difficulties, there is consensus over the main topology [4–6] even though a few local polytomies are still unresolved (e.g. placental root; branching pattern within Paenungulata; position of Scandentia within Euarchontoglires; branching pattern within Laurasiatheria [4,7,21,22]). 
Consensus on the timing and biogeographic history of the placental mammal radiation has proved more elusive [4–6,8,9] owing to morphological convergence and uncertain phylogenetic relationships of fossil to living taxa [9,13]; rapid cladogenesis at difficult-to-resolve nodes [23]; disagreement over appropriate calibration strategies for molecular dating analyses; and higher levels of incompleteness in the Cretaceous and Early Cenozoic fossil record of the Southern Hemisphere than of the Northern Hemisphere. Indeed, the history of debate on these issues, in conjunction with the rich database of mammalian fossils and genomes, provides an unprecedented opportunity to explore the pros and cons of modern methods for species tree estimation and timetree construction.

    Here, we showcase some of the past and present debates pertaining to the phylogenetic and evolutionary history of placental mammals. We discuss the next generation of phylogenetic methods and data that have been employed to resolve the placental mammal tree of life. We analyse a new molecular dataset comprised of sequences for 286 mammals and four outgroups, and date the divergence of these species to generate a novel timetree. We provide a roadmap for future analyses, detailing the pros and cons of current methods, and highlight the ways forward to finally resolve the mammal tree of life.

    2. Shaking the morphological tree

    In 1945, G.G. Simpson published ‘The classification of mammals’ [24]. This seminal work was based on morphological comparisons and is largely reflected in the landmark phylogeny published by Novacek [25] in Nature. Novacek's [25] morphological tree portrays a basal split between an edentate group comprised of Xenarthra (e.g. armadillos, anteaters) and Pholidota (pangolins) and a larger group that includes all other placental mammal orders. The latter group includes Ungulata (all hoofed mammals), Archonta (e.g. colugos, bats, primates), Anagalida (e.g. rodents and elephant shrews), Carnivora and Insectivora (table 1). Throughout the 1990s, aspects of this topology [25] were challenged through new fossil finds and phylogenetic analyses: for example, Beard [30] suggested a closer affinity of Dermoptera (e.g. colugos) with Primates than bats, and the finding of Eocene whale fossils (Artiocetus, Rodhocetus) with diagnostic ankle bones suggested derivation of Cetacea from Artiodactyla rather than from Mesonychia [31]. Nevertheless, a recent cladistic analysis of the largest morphological dataset to date, which includes approximately 4500 characters [8], has many features in common with Novacek's [25] morphological tree and still supports numerous polyphyletic groups including an ‘insectivore’ group, a ‘spiny hedgehog’ group, an ‘ant and termite eating’ group, a ‘tree-dwelling’ group and an ‘ungulate’ group [9] (table 1). On the molecular front, features of Novacek's [25] morphological tree were both corroborated and challenged by early phylogenetic analyses of DNA sequences (reviewed by Springer et al. [14]). Morphological and molecular consensus emerged for Paenungulata (hyraxes, manatees and elephants) [24,25,32–37], and some molecular studies [35,38] agreed with morphology in supporting Glires (lagomorphs and rodents) [24,32,35,38,39]. 
Other morphological hypotheses including Altungulata (perissodactyls and paenungulates), Anagalida (rodents, lagomorphs, and elephant shrews), Archonta (primates, dermopterans, treeshrews and bats), Ungulata (paenungulates, perissodactyls, cetartiodactyls and aardvarks) and Volitantia (bats and colugos) were rejected by molecular data [35]. Of historical interest, early DNA studies also suggested novel or throwback hypotheses including rodent paraphyly, a basal split between rodents or hedgehogs and all other placental mammals, and a modified version of Gregory's [40] Marsupionta (monotremes and marsupials) hypothesis, all of which have now been debunked (reviewed in Novacek [25] and Springer et al. [14]). These extraordinary hypotheses potentially resulted from poor taxon sampling with its attendant long-branch misplacement problems [14]. The next wave of studies (see below) aimed to address these deficiencies with increased gene and taxon sampling.

    Table 1. Higher-level relationships of placental mammal orders based on morphology versus molecules. Orders (italics) are coloured by their superordinal membership according to molecular studies. The majority of superordinal groups based on morphology are polyphyletic and reflect ecomorphological convergence (e.g. ‘ant and termite eating group’ includes representatives from Xenarthra, Afrotheria, and Laurasiatheria).

    3. First wave of large-scale molecular data—comparative molecular phylogenetics

    At the turn of this millennium, the first large-scale molecular phylogenetic studies were published [15,16,26] and drove the most significant revisions of Novacek's [25] phylogeny (table 1; Springer et al. [14]). These initial studies, which included representatives from all recognized mammalian orders, were based on large multigene datasets comprised of both nuclear and mitochondrial markers and provided support for four superordinal clades of placental mammals that are still recognized today: Afrotheria, Xenarthra, Euarchontoglires and Laurasiatheria [15,16,26–29]. Subsequent studies of retrotransposed elements and indels provided confirmatory support for these four superordinal clades [13,15,16,26,29,41,42]. Afrotheria contains the orders Tubulidentata (aardvarks), Afrosoricida (e.g. golden moles and tenrecs), Macroscelidea (elephant shrews), Hyracoidea (hyraxes), Sirenia (manatees, dugongs, sea cows) and Proboscidea (e.g. elephants), with the latter three orders grouping in the clade Paenungulata. Xenarthra contains Cingulata (armadillos) and Pilosa (anteaters, sloths). The remaining mammalian orders are grouped into Boreoeutheria, which is further divided into the superordinal groups Laurasiatheria and Euarchontoglires. Laurasiatheria unites phenotypically diverse orders: Chiroptera (bats), Cetartiodactyla (e.g. cetaceans, cows and pigs), Perissodactyla (e.g. rhinos and horses), Pholidota (pangolins), Carnivora (e.g. lions, dogs, seals) and Eulipotyphla (e.g. shrews and hedgehogs). Euarchontoglires contains Primates (e.g. humans and monkeys), Scandentia (treeshrews), Glires (e.g. rabbits and rodents) and Dermoptera (colugos).

    While numerous studies support the monophyly of each of the four superordinal groups, including studies with expanded gene and taxon sampling [4], their branching order at the base of Placentalia is still unresolved and remains hotly debated [43]. The three competing hypotheses for the placental root posit basal splits between (i) Exafroplacentalia and Afrotheria, (ii) Xenarthra and Epitheria, and (iii) Atlantogenata and Boreoeutheria (Teeling & Hedges [43]; table 2). Many of the early large-scale molecular phylogenetic studies favoured Afrotheria versus Exafroplacentalia [15,16,26–28,44], but Atlantogenata versus Boreoeutheria and to a lesser extent Xenarthra versus Epitheria also received support, sometimes in the same studies that provided support for Afrotheria versus Exafroplacentalia [15,16,28]. Meredith et al.'s [4] multigene dataset, which expanded gene sampling to 26 loci (totalling approx. 35.6 kb) and extended taxonomic coverage from mammalian orders to 97% of mammalian families, was insufficient to resolve the placental root and supported Afrotheria versus Exafroplacentalia (DNA analyses) or Atlantogenata versus Boreoeutheria (amino acid analyses). Analyses of coding indels and retroposons have also returned mixed results, with some studies favouring a basal split between Xenarthra and Epitheria [41] and others supporting Atlantogenata versus Boreoeutheria [52]. The most comprehensive study based on retroposon insertions provides commensurate levels of support for each of the three competing hypotheses [64].

    Table 2. Summary of the various papers supporting one of the three hypotheses for the root node of eutherian mammals: Exafroplacentalia, Epitheria or Atlantogenata.

    Aside from the placental root, other recalcitrant problems in higher-level placental systematics include (i) the Laurasiatheria polytomy, (ii) sister-group relationships within Paenungulata, and (iii) the position of Scandentia (treeshrews) within Euarchontoglires [14]. These difficult problems are characterized by short internal branches, which are indicative of rapid radiations, and are likely to remain difficult to resolve owing to incomplete lineage sorting (ILS) and a variety of systematic biases including long branch misplacement and model mis-specification that can hamper phylogenetic inference. Even though these nodes and branches of the tree remained unresolved, it was hoped that the promise of phylogenomic analyses would resolve these ‘tricky’ branches [65].

    4. Second wave of molecular data—comparative phylogenomics

    In 2001, the draft sequence of the human genome was published [66,67]. Following its initial publication, researchers aimed to discover, annotate and describe functional elements in the human genome through cross-species comparisons with other mammals [68]. In 2005, a sequencing effort began to maximize the representation of mammals from each of the four superordinal groups, which culminated in the publication of 29 mammal genomes [68]. This particular sequencing effort, along with ongoing large-scale sequencing initiatives such as Genome 10K, which aims to sequence 10 000 vertebrate genomes, has revolutionized comparative genomics and mammalian phylogenetics [69,70]. Different approaches to analyse these computationally challenging, extremely large datasets have included standard supermatrix methods (= total evidence), with or without the incorporation of sophisticated models that accommodate tree and dataset heterogeneity [6,45,53], shortcut coalescence methods and concatalescence (= binning) approaches that combine elements of concatenation with shortcut coalescence [7]. The application of these methods to large molecular datasets has not resulted in consensus for difficult problems such as the placental root, the position of treeshrews and the Laurasiatheria polytomy. Rather, these studies often highlight incongruent results that arise from different phylogenomic analyses of the same dataset. Conflicting results of concatenation and coalescence studies, in particular, have become the subject of spirited debates [7,9–11,19,71] that hearken back to the ‘cladist/pheneticist/likelihood’ debates of the 1970s and 1980s [11].

    5. Revolution in analytical methods

    In the supermatrix approach, data from multiple gene fragments are concatenated to form a single matrix for subsequent phylogenetic analyses [19]. The advantage of the supermatrix approach lies in combining individual genes into a single matrix for simultaneous analysis, an approach which has the ability to decrease sampling error, offset homoplastic signals in individual genes and uncover hidden support [19,72] (table 3). Hidden support refers to increased support for a clade in combined analysis relative to the support obtained in separate analysis [19,73], and is the primary advantage of the supermatrix approach. In recent years, the supermatrix approach for species tree estimation has been criticized by champions of coalescent methods, who have correctly noted that concatenation methods do not explicitly address the problem of ILS [7,21,54,74–78]. Furthermore, it has been suggested that the supermatrix approach can overestimate nodal support when an incorrect model of sequence evolution is applied. However, coalescence methods make their own assumptions and the worthwhile goal of accounting for ILS introduces a series of potential problems that may arise if these assumptions are violated. In particular, coalescence methods assume recombination between but not within loci, gene tree heterogeneity that results exclusively from ILS, and neutral evolution [76,77].
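    The concatenation step described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the function name `concatenate`, the toy gene and taxon names, and the use of `?` for unsampled genes are all assumptions made for illustration. The sketch joins per-gene alignments into one matrix and records where each gene starts and ends, so that a partitioned model could later be applied.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence for one gene


def concatenate(
    genes: Dict[str, Alignment], missing: str = "?"
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Concatenate per-gene alignments into a single supermatrix.

    Taxa that lack a gene are padded with the `missing` symbol.
    Returns the supermatrix and the 1-based, inclusive column range
    occupied by each gene.
    """
    taxa = sorted({taxon for aln in genes.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for name, aln in genes.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {name!r} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((name, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input: hypothetical gene names and sequences.
genes = {
    "geneA": {"human": "ACGT", "mouse": "ACGA", "dog": "ACTT"},
    "geneB": {"human": "GGC", "dog": "GAC"},  # mouse not sampled for geneB
}
supermatrix, partitions = concatenate(genes)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
print(partitions)
```

    Padding unsampled genes with a missing-data symbol is one common convention; real pipelines also have to decide how much missing data per taxon is acceptable.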

    Table 3. Brief description and implementation of recent emerging tree reconstruction methods.

    method: coalescence
    description: co-estimates gene trees and species tree; fully parametric
    implementation: BEST, *BEAST
    computation: massively computationally intensive
    suitable for large-scale analysis: no

    method: shortcut coalescence
    description: separate estimation of gene tree and species tree; partially parametric
    implementation: STAR, STEAC, MP-EST, STEM, MDC, ASTRAL
    computation: less computationally intensive
    suitable for large-scale analysis: yes (a)

    method: binning
    description: genes are concatenated into supergenes prior to coalescence-based estimation of the species tree
    implementation: STAR, STEAC, MP-EST, STEM, MDC, ASTRAL
    computation: less computationally intensive
    suitable for large-scale analysis: yes

    method: supermatrix
    description: estimates species tree from gene fragments concatenated into a single matrix; also referred to as concatenation
    implementation: wide variety of programmes that implement maximum parsimony, distance, maximum-likelihood and Bayesian methods
    computation: less computationally intensive
    suitable for large-scale analysis: yes

    method: supertree
    description: various subsets of characters are analysed to generate subtrees which are further combined to produce a species tree
    implementation: MRP–PAUP*
    computation: less computationally intensive
    suitable for large-scale analysis: yes

    (a) MP-EST not suitable for datasets with large numbers of taxa [20].

    Given the occurrence of unresolved nodes on the mammalian tree that are characterized by rapid divergences [65], coalescent approaches may be better suited to resolving such nodes than concatenation [21]. However, fully parametric coalescent models such as *BEAST [79] and BEST [80], which co-estimate gene trees and the species tree [81], cannot be applied to large datasets because they are massively computationally intensive [19]. As such, shortcut coalescence methods such as STEAC, STAR [76], MP-EST [54] and ASTRAL [82], which carry out separate estimation of gene trees and the species tree, have been applied to large mammalian datasets [7,21] (table 3). Criticisms aimed at recent applications of these shortcut methods to the placental tree have focused on inappropriate coalescent-gene (c-gene) size and high levels of gene tree reconstruction error [10]. One of the key assumptions of the multispecies coalescent model is that recombination occurs between c-genes but not within c-genes. To satisfy this assumption, it is necessary to use gene trees that are inferred from short stretches of DNA. Empirical evidence from primates has shown that the average length of recombination-free c-genes is less than 100 bp [83,84]. This is particularly problematic for the application of coalescent approaches to large taxonomic datasets because as the number of taxa increases, c-gene size must decrease owing to the recombination ratchet [19]. Recent phylogenomic studies with coalescence methods have employed widely variable mean locus lengths. McCormack et al.'s [21] mean locus length for ultraconserved elements was only 410 bp, but Song et al. [7] reported an average locus length of 3.1 kb for protein-coding sequences. However, Song et al. [7] did not consider the intervening introns and their true mean locus length, measured from start codon to stop codon, is 139.6 kb [19]. 
This inadvertent application of coalescence methods to ‘c-genes’ that are much too long was dubbed concatalescence by Gatesy & Springer [85], because disparately spaced exons are first concatenated into coding sequences and then subsequently analysed using shortcut coalescence approaches. A recent study using simulated data suggests that species tree estimation using coalescent methods may be more robust to recombination than previously thought [86], but these simulations were for relatively shallow divergences and only included a small number of taxa. Further, these simulations [86] did not compare the performance of coalescence methods to concatenation in their study of the effects of recombination on coalescence methods. More recently, the idea of concatalescence has been reimagined as statistical binning, in which gene trees and alignments are assigned to a bin based on ‘combinability’ before creating supergene trees for downstream coalescence analysis [87]. Accurate gene tree reconstructions are also central to the coalescent approach as each tree is given equal weight in the analysis; however, accurate reconstructions become increasingly difficult with decreasing c-gene sizes as more taxa are added to the analysis [19]. It should be noted that many of the forces impacting accurate gene tree reconstruction such as long branches, mutational saturation and weak signal are also problematic issues for supermatrix approaches [88].

    At present, the relative advantages and disadvantages of coalescence versus supermatrix approaches for phylogeny reconstruction have not been fully explored. This is despite the existence of many simulation studies, which have variably supported one method over the other depending on the simulation parameters [82,87,89–93]. Future studies should simulate more realistic c-gene sizes and increase the number of taxa to better reflect empirical datasets, and examine the effects of recombination on coalescence versus concatenation methods [86].

    Beyond the direct debate over coalescent and supermatrix approaches to species tree construction, alternative solutions are emerging within each of these camps. On the coalescence front, single nucleotide polymorphism (SNP) methods such as SVDquartets [94] provide an alternative to fully parametric and shortcut coalescence methods that depend on gene trees. An important advantage of SNP methods is that they avoid problems with the recombination ratchet [10]. Among novel supermatrix approaches, Romiguier et al. [45] explored the effect of biased gene conversion, as measured by GC content, which is an indicator of recombination, on phylogenetic inference. Through comparative analysis of GC-rich and AT-rich datasets, they showed that GC-rich genes induced a higher amount of conflict between gene trees, whereas AT-rich datasets provided increased resolution of relationships among placental mammals [45]. Crucially, these compositionally biased datasets supported different root node hypotheses and in doing so highlight the potentially confounding role composition bias can play in phylogenetic inference if not adequately accounted for during data selection or through appropriate model selection. Previous large-scale supermatrix analyses of the placental tree have used homogeneous models of sequence evolution [4,95]. However, among placental mammals, there exists considerable heterogeneity among genes, lineage-specific substitution rates and sequence compositional bias [53]. In an analysis that accounted for heterogeneity across the phylogeny using NDRH/NDCH (node-discrete rate matrix heterogeneous/node-discrete composition heterogeneous) models, and CAT to model heterogeneity across the data, Morgan et al. [53] showed that employing models that account for heterogeneity greatly improves phylogenetic resolution compared to homogeneous models.
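    To make the idea of GC-rich versus AT-rich datasets concrete, the following Python sketch splits a set of gene alignments by mean GC content. It is only an illustration and not the procedure of Romiguier et al. [45]: the function names, the toy sequences, the use of overall (rather than codon-position-specific) GC content and the 0.5 threshold are all assumptions.

```python
from typing import Dict, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous A/C/G/T sites.

    Gaps and ambiguity codes such as N are ignored.
    """
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else 0.0


def split_by_gc(
    genes: Dict[str, Dict[str, str]], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Assign each gene to a GC-rich or AT-rich set by its mean GC content."""
    gc_rich, at_rich = [], []
    for name, aln in genes.items():
        if not aln:
            continue  # skip genes with no sequences
        mean_gc = sum(gc_content(seq) for seq in aln.values()) / len(aln)
        (gc_rich if mean_gc > threshold else at_rich).append(name)
    return sorted(gc_rich), sorted(at_rich)


# Toy input: hypothetical gene names and sequences.
genes = {
    "geneA": {"human": "GCGCGA", "mouse": "GCGCCA"},
    "geneB": {"human": "ATATGA", "mouse": "ATTTAA"},
}
gc_rich, at_rich = split_by_gc(genes)
print("GC-rich:", gc_rich)
print("AT-rich:", at_rich)
```

    Each subset could then be analysed separately to check whether the two compositional classes favour different topologies.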

    6. Can comparative genomics resolve problematic nodes in the placental tree?

    Despite the abundance of genomic data for placental mammals and the development of novel methods for phylogeny reconstruction, the ‘tricky’ nodes are still unresolved. The placental root node remains a matter of contention with claims of strong support emerging for one or a number of the three competing hypotheses using different data types and analytical methods: Exafroplacentalia [21,45,46], Epitheria [8] and Atlantogenata [4,7,53]. However, the most recent study of the placental tree has taken a ‘total evidence’ approach to resolve the root by analysing morphological, nucleotide and micro-RNA data using a supermatrix approach with heterogeneous modelling, coalescent approaches with unbinned gene trees and statistical binning [6]. In this study, Tarver et al. [6] recovered high support for the Atlantogenata hypothesis (originally described from coding indels by Murphy et al. [52]) from their supermatrix and concatalescence analyses of nucleotides, as well as from their micro-RNA dataset. Furthermore, to investigate previous studies that supported other hypotheses for the placental root, they re-analysed the datasets of O'Leary et al. [8], Hallström & Janke [46] and the AT-rich dataset of Romiguier et al. [45], and found that model fit for these datasets could be improved using heterogeneous modelling, and that re-analysis of the datasets of O'Leary et al. [8] and Hallström & Janke [46] supported Atlantogenata [6]. While the re-analysis of the Romiguier et al. [45] dataset only yielded minimal support for Atlantogenata (approx. 0.5 posterior probability), the dataset no longer recovered strong support for Exafroplacentalia [6]. While consensus may be tipping in favour of an Atlantogenata versus Boreoeutheria root, advocates of this and other hypotheses for the placental root have not provided compelling explanations for Nishihara et al.'s [64] retroposon study that provides commensurate levels of support for each of the three competing hypotheses, i.e. 22, 25 and 21 L1 insertions favouring Exafroplacentalia, Epitheria and Atlantogenata, respectively. Unfortunately, despite the large datasets, the other hard nodes also remain to be resolved. At present, it is unclear if treeshrews (Order Scandentia) are more closely related to Glires (rodents and lagomorphs), Primatomorpha (primates and colugos) or Dermoptera (colugos). Also, relationships among the laurasiatherian clades Chiroptera, Ostentoria (carnivorans and pangolins), Perissodactyla and Cetartiodactyla await elucidation. There is still much research ahead for future mammal phylogeneticists.
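    One simple way to see why the retroposon counts of 22, 25 and 21 are described as commensurate is a chi-square goodness-of-fit test against equal support for the three root hypotheses. The Python sketch below is an illustration only and is not the analysis used in the retroposon study [64]; it assumes that insertions can be treated as independent observations.

```python
import math

# L1 insertion counts quoted in the text for each root hypothesis.
counts = {"Exafroplacentalia": 22, "Epitheria": 25, "Atlantogenata": 21}


def chi_square_equal(observed):
    """Chi-square statistic for the null of equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


stat = chi_square_equal(list(counts.values()))

# Three categories give 2 degrees of freedom. For df = 2 the chi-square
# upper-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2)

print(f"chi-square = {stat:.3f}, p = {p_value:.3f}")
# prints: chi-square = 0.382, p = 0.826
```

    Under these assumptions the counts are statistically indistinguishable from equal support for all three hypotheses, which is the pattern that any proposed resolution of the root would need to explain.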

    7. Dating the mammal tree—a time bomb

    The timing of the origin and diversification of placental mammals is a highly contentious topic in phylogenetics. At its core lies disagreement on the timing of interordinal and intraordinal divergences in relationship to the KPg boundary. Archibald & Deutschman [96] proposed three models to characterize the results of different studies. More recently, each of these models has received support from one or more large-scale studies of placental mammal phylogeny. Archibald & Deutschman's [96] three models are (i) the explosive model, (ii) the short fuse model and (iii) the long fuse model. The explosive model is generally favoured by palaeontologists and proposes that both interordinal and intraordinal cladogenesis within Placentalia occurred after the KPg boundary (approx. 66 Ma) in response to newly available ecospace vacated by non-avian dinosaurs after the extinction event [8]. This model rejects a crown position for all Mesozoic eutherians and is consistent with O'Leary et al.'s [8] parsimony analysis of a combined phenomic–genomic dataset for representative extant and extinct eutherians.

    By contrast, most molecular timetrees posit much older interordinal divergences, wherein placental mammals originated in the Cretaceous and persisted at low diversity before eventually experiencing an explosion in diversification [96]. Where molecular diversification models differ is in the duration of the lag period between the origin of placentals and their burst of diversification. The long fuse model posits interordinal cladogenesis in the Cretaceous followed by intraordinal cladogenesis after the KPg mass extinction, and is broadly supported by recent studies that have employed large molecular datasets with different models for branch rates (i.e. autocorrelated, independent) and multiple calibrations from the fossil record [4–6,12,97]. We analysed a dataset for Meredith et al.'s 26 genes that was expanded to include 286 mammals and four outgroups (26 loci; see electronic supplementary material, table S1). We included only therian taxa that are represented by at least 16 of 26 genes. Timetree analyses were performed with mcmctree [98] using autocorrelated rates with 99 hard-bounded calibrations (see electronic supplementary material, table S2). The results are highly congruent with Meredith et al.'s [4] original timetree analyses and estimates from dos Reis et al. [5] (figure 1). An estimated eutherian origin at approximately 170 Ma is in agreement with the fossil record following the discovery of a stem eutherian fossil Juramaia sinensis [101] from the Jurassic [4,5] and these results support the long fuse model (figure 1). While limited intraordinal cladogenesis may have commenced in the Late Cretaceous for a few orders (e.g. Eulipotyphla), the vast majority of placental intraordinal diversification takes place in the Cenozoic. The long fuse model is also consistent with a major role for the KPg extinction in promoting morphological and ecological diversification in the wake of the ecological vacuum that ensued after the KPg extinction event [4,5]. 
The short fuse model agrees with the long fuse model in positing interordinal cladogenesis in the Cretaceous, and further suggests that many intraordinal divergences occurred well before the KPg boundary. Among recent studies, support for this model is more limited and derives mostly from Bininda-Emonds et al.'s [95] mammalian supertree.
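    The distinction between the explosive, short fuse and long fuse models can be expressed as a simple rule on where interordinal and intraordinal divergences fall relative to the KPg boundary. The Python sketch below is a deliberately crude illustration: the function names, the toy ages and the 50% majority threshold are assumptions, and none of the cited studies classifies timetrees in this way.

```python
from typing import Sequence

KPG_MA = 66.0  # approximate age of the KPg boundary given in the text


def fraction_before_kpg(ages_ma: Sequence[float]) -> float:
    """Fraction of divergences that are older than the KPg boundary."""
    if not ages_ma:
        raise ValueError("no divergence ages supplied")
    return sum(age > KPG_MA for age in ages_ma) / len(ages_ma)


def classify_model(
    interordinal_ma: Sequence[float],
    intraordinal_ma: Sequence[float],
    majority: float = 0.5,  # assumed cut-off for "most" divergences
) -> str:
    """Label a set of divergence ages (in Ma) with one of the three models."""
    inter = fraction_before_kpg(interordinal_ma)
    intra = fraction_before_kpg(intraordinal_ma)
    if inter <= majority:
        # Interordinal splits mostly postdate the boundary.
        return "explosive"
    # Interordinal splits are mostly Cretaceous; the models then differ
    # in whether intraordinal splits also predate the boundary.
    return "short fuse" if intra > majority else "long fuse"


# Hypothetical ages in Ma, chosen only to exercise each branch.
print(classify_model([95, 88, 80, 75], [60, 55, 70, 40]))  # long fuse
print(classify_model([95, 88, 80], [80, 75, 60]))          # short fuse
print(classify_model([65, 64, 63], [55, 50]))              # explosive
```

    In practice the comparison would need to use the credibility intervals of the estimated ages and not point estimates alone, since many intervals straddle the boundary.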

    Figure 1. Timetree based on mcmctree supports a long fuse model of mammalian diversification [4,5]. Divergence time estimates were obtained under a GTR + Γ model of sequence evolution, 99 calibrated nodes (see electronic supplementary material table S2 for details), and autocorrelated rates of evolution. Divergence time estimates for clades of interest are shown at each node, with the 95% CI denoted by blue bars. The dataset included 26 tree of life gene fragments [4] and was expanded to include 286 taxa (see electronic supplementary material, table S1 for taxa, gene fragments and accession numbers). GenBank sequences for single genes were supplemented with available genome data that were mined using BLAST searches. The tree was estimated using RAxML [99] on the CIPRES web server [100] under a fully partitioned model where each partition had its own GTR + Γ model of sequence evolution. Paintings by Carl Buell. (Online version in colour.)
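    The taxon-inclusion rule stated in the text for this dataset (therian taxa represented by at least 16 of 26 genes) can be sketched in Python as follows. The function name, the taxon labels and the gene labels are hypothetical; the sketch does not reproduce the authors' pipeline and does not model the separate handling of outgroups.

```python
from typing import Dict, Set

MIN_GENES = 16  # threshold stated in the text (at least 16 of 26 genes)


def filter_taxa(
    coverage: Dict[str, Set[str]], min_genes: int = MIN_GENES
) -> Dict[str, Set[str]]:
    """Keep only taxa sampled for at least `min_genes` genes."""
    return {
        taxon: sampled
        for taxon, sampled in coverage.items()
        if len(sampled) >= min_genes
    }


# Hypothetical coverage table: taxon -> set of genes with sequence data.
all_genes = [f"g{i}" for i in range(1, 27)]  # 26 placeholder gene labels
coverage = {
    "taxon_a": set(all_genes),        # 26 genes, retained
    "taxon_b": set(all_genes[:16]),   # exactly 16 genes, retained
    "taxon_c": set(all_genes[:15]),   # 15 genes, excluded
}
kept = filter_taxa(coverage)
print(sorted(kept))
```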

    Both the short and long fuse models suggest that there should be crown placentals from the Cretaceous, but recent morphological parsimony analysis studies provide no support for this prediction and instead position all Cretaceous taxa outside of crown Placentalia [8,102–104]. Explanations for this discrepancy are that (i) reconstructed divergence times in the Cretaceous based on molecular data are inaccurate [8], (ii) the placental fossil record is highly incomplete and it is simpler to propose ghost lineages in the Cretaceous than virus-like rates of molecular evolution in early placental mammals [9] and (iii) morphological parsimony analyses that position Cretaceous eutherians outside of crown Placentalia are inherently unreliable owing to pervasive convergence in morphological characters and clustering of the recent, which is a form of long branch attraction that can result in stemward slippage of fossil taxa. Below, we discuss each of these explanations for the perceived discrepancies between molecular and palaeontological estimates for interordinal divergence times.

    O'Leary et al.'s [8] timetree for Placentalia provides the most explicit formulation of the explosive model. Importantly, O'Leary et al.'s [8] estimated nodal ages for interordinal divergences are minimum ages that are younger than true ages, because the fossil record is incomplete [12]. O'Leary et al.'s [8] enforcement of a maximum age of 66 Ma for the most recent common ancestor of Placentalia suggests that as many as 10 interordinal divergences may have occurred in the 200 000 years just after the KPg boundary. This hypothesis requires rates of molecular evolution that were accelerated more than 60× in early crown placentals relative to the stem placental ancestor [9]. Such rates would be commensurate with those of double-stranded DNA viruses. For these reasons, we reject the explosive model of placental mammal diversification and prefer the alternate explanations that ghost lineages extend into the Cretaceous and/or morphological parsimony analyses that position Cretaceous eutherians outside of Placentalia are incorrect. Phillips [105] suggested that Meredith et al.'s [4] timetree dates require more than 250 Myr of ghost lineages in Boreoeutheria, and for this reason proposed a soft explosive model wherein most interordinal divergences in Placentalia occurred after the KPg boundary. Phillips' [105] soft explosive model only allows for the emergence of the stem Afrotheria, stem Xenarthra, stem Laurasiatheria and stem Euarchontoglires lineages in the Cretaceous. Phillips [105] also suggested that Meredith et al.'s [4] interordinal divergences were inflated because of rate-transference errors, and that these errors can be mitigated by eliminating calibrations in clades that are characterized by large body sizes and long lifespans that are the source of rate-transference errors. 
An alternate explanation is that Phillips' [105] interordinal divergence dates are too young because they are dragged forward by divergence dates in clades with large body size and long lifespan that are also too young. Indeed, Phillip's preferred timetree, which is based on calibrations at only 27 nodes (contra 82 in Meredith et al. [4]), has estimated dates at 62 of 136 internal nodes in Placentalia that are younger than minimum ages implied by the fossil record [106]. Included among these nodes are numerous interordinal divergences as well as some divergences in clades that are characterized by smaller body sizes (e.g. Eulipotyphla). By contrast, timetree analyses that exclude taxa based on large body size and/or long lifespan thresholds provide support for the long fuse model of placental diversification [29].
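
    The comparison described above, between estimated node ages and the minimum ages implied by the fossil record, can be sketched in a few lines of Python. This is only an illustrative sketch and not the procedure used in [106]: the function name, node labels and ages are hypothetical placeholders, not values from any published timetree. For reference, 62 of 136 nodes corresponds to roughly 46% of internal nodes.

```python
def nodes_younger_than_fossils(estimated_ages, fossil_minimums):
    """Return the nodes whose estimated age (Ma) is younger than the
    minimum age (Ma) implied by the oldest fossil assigned to that node."""
    return sorted(
        node
        for node, minimum_age in fossil_minimums.items()
        if node in estimated_ages and estimated_ages[node] < minimum_age
    )


# Hypothetical, illustrative values only (Ma); not real timetree estimates.
estimated_ages = {"node_A": 58.0, "node_B": 64.5, "node_C": 49.0}
fossil_minimums = {"node_A": 61.0, "node_B": 62.0, "node_C": 52.0}

violations = nodes_younger_than_fossils(estimated_ages, fossil_minimums)
fraction = len(violations) / len(fossil_minimums)

print(violations)                 # nodes that conflict with the fossil record
print(f"{fraction:.0%} of calibratable nodes are too young")
```

    A node flagged by such a check indicates that the estimated date conflicts with a fossil minimum; it does not by itself show whether the fault lies with the dating analysis or with the fossil assignment.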

    The final explanation for the perceived disagreement between molecular studies that support Cretaceous interordinal divergences and morphological parsimony analyses that place Cretaceous eutherians outside of crown Placentalia is that higher-level placental relationships in the latter are inaccurate and unreliable. Along these lines, Sansom & Wills [107] documented fundamental taphonomic biases that will cause stemward slippage of fossils. In addition, pseudo-extinction analyses [13] have shown that the majority of living placental orders move to a different superordinal group when molecular data are recoded as missing and that some orders are rendered polyphyletic or paraphyletic. Finally, cladistic analyses of the largest morphological datasets [8,103] exhibit massive homoplasy problems that call into question the results of analyses based on these datasets. In addition to problems with O'Leary et al.'s [8] morphological dataset that were noted above, Halliday et al.'s [103] dataset results in equally surprising relationships and their unconstrained morphological parsimony tree (their fig. 3) supports treeshrew diphyly, talpid diphyly, elephant shrew diphyly, etc., and fails to recover virtually every superordinal clade of placental mammals that is well supported by numerous molecular datasets.

    Molecular scaffolds or combined analyses with molecular and morphological data can effectively rescue extant taxa, but the results of pseudo-extinction analyses suggest that molecular scaffolds/combined analyses are not effective for extinct taxa in higher-level placental phylogenetics. Finally, several authors (e.g. [108] for arctoid carnivorans) have suggested that parallel evolutionary trends can result in excessive homoplasy among living taxa that obscures relationships when diachronous terminals (i.e. extinct and extant taxa) are included in the same analyses. In view of these difficulties, Cretaceous interordinal divergences should not be rejected because they disagree with morphological parsimony trees. New approaches for molecular dating include total evidence or tip dating, but these approaches are dependent on accurate phylogenies and should be used cautiously given potential problems with morphological and palaeontological character data for placental mammals (e.g. correlated homoplasy, taphonomic biases, incompleteness) that are surpassed by the quantity, quality and unambiguity of molecular data [109–111]. Future clever and novel integrative research based on molecules, fossils and timetree estimation methods will be required to resolve the timing of origination and diversification of mammals.

    8. The future

    We have come far and have greatly advanced our understanding of mammalian evolutionary history in the past 20 years. These advances have resulted from (i) new molecular technologies enabling the rapid generation of huge amounts of sequence data from novel taxa [69,70], (ii) the conception and implementation of novel analytical methods for accurately modelling sequence evolution and allowing independent rates and models across branches and (iii) the availability of fast and powerful computers that enable computationally intensive phylogenetic analyses. However, despite these advances, there are still two key areas that must be addressed before we can have ‘mammal tree’ resolution.

    First, the unresolved ‘hard’ nodes in the mammal tree of life must be resolved. Whole genomes should provide the required data, which, if aligned and analysed correctly, should aid in the resolution of these nodes. One requirement is that the sequence data must be excellent: the genomes must be at high coverage and ultimately at a chromosome-level assembly [112]. New sequencing technologies, such as Dovetail and 10X Genomics, promise to aid in the difficult task of scaffolding de novo genomes and, coupled with long sequence reads such as those from PacBio sequencing, should make excellent chromosome-level assemblies of de novo genomes possible in the near future [112]. Appropriate and correct analyses of these data are required to resolve these problems. At present, the field advances on two fronts. On one front, the supermatrix approaches to solve these problems have been advanced through the application of better fitting heterogeneous models of sequence evolution, compared with analyses employing widely and in some cases inappropriately used homogeneous models [6,53]. On the other front, as outlined above, the limitations, powers and pitfalls of novel gene tree estimation methods must be elucidated to ascertain the best methods to analyse these new whole genome data; however, the refinement of statistical binning [82] shows promise to address some of the criticisms of this approach. Potentially, some of the ‘tricky’ nodes may not be resolved with phylogenomic analyses alone, but whole genome data will provide accurate insertion positions of novel transposable elements and insertion/deletion events that will provide independent synapomorphic characters, indicative of true shared ancestry. Non-molecular data can also be used to resolve these hard nodes. Haeckel [113] originally proposed that the study of morphological ontogenic development of embryos may reveal phylogenetic affinities not present in the adults [114]. New imaging technologies promise to provide novel non-lethal ontogenic evo/devo data of developing embryos [115], thus allowing for the study of protected and diverse species, which was never possible before. Molecular phylogenetic trees can act as a scaffold to differentiate between homoplastic and homologous morphological characters, ultimately providing true morphological synapomorphic characters to help resolve the difficult nodes [116]. All of these data, interpreted together, should provide the information needed to resolve the mammal tree of life.

    Second, to determine divergence times for this topology, we still must accurately assign fossils to the appropriate branches, whether across the tree or just at key basal nodes. This is a more difficult task as certain lineages are extensively missing fossil data (e.g. bats are missing more than 70% of fossil history; Teeling et al. [117]), and also the key homologous structures may be lost or modified during preservation. Uncovering key transitional fossils, e.g. Eocene whale fossils (Artiocetus, Rodhocetus), which contain diagnostic characters, could ultimately resolve the tricky nodes. Therefore, more fossil finds, particularly in the under-represented Southern Hemisphere, are required. The development of new sequencing technologies has revolutionized the field of ancient DNA. It is now possible to sequence the whole genome of recent fossils and ultimately place them on a tree, from as little as a single bone (e.g. Denisovan [118]). This greatly advances our understanding of fossil placement in the tree. The integration of both molecular and morphological data and the contribution of each data type to phylogenetic inference must also be explored. Potentially these data cannot be analysed together, but each type of data offers unique insights into evolutionary history and therefore must be considered. Further advances in divergence time analyses are required to refine estimates. Recent simulation studies that incorporate the multispecies coalescent model into estimates of divergence time show promise with small datasets [119], and the application of such an approach to large empirical datasets is surely just around the corner. With the new technological advances in sequencing, imaging, computation and analytical methods, coupled with future fossil finds, the next 20 years of phylogenetic research promise to be exciting and dynamic, and ultimately, should result in the resolution and dating of the mammal tree of life.

    Authors' contributions

    E.C.T. conceived and designed the study. M.S.S. assembled the datasets and built trees. All authors contributed to the timetree analysis, interpretation of results, and writing and editing the manuscript.

    Competing interests

    We have no competing interests.

    Funding

    E.C.T. and N.M.F. are supported through a European Research Council Research Grant awarded to E.C.T. (ERC-20120StG311000). M.S.S. is supported through United States NSF grant no. DEB-1457735.

    Acknowledgements

    We thank Phil Donoghue and Ziheng Yang for the invitation to contribute. We also thank Mario dos Reis, Robert Asher and one anonymous reviewer for helpful comments and suggestions.

    Footnotes

    One contribution of 15 to a discussion meeting issue ‘Dating species divergences using rocks and clocks’.

    Published by the Royal Society. All rights reserved.