Philosophical Transactions of the Royal Society B: Biological Sciences
Introduction

Dating species divergences using rocks and clocks

Ziheng Yang


Department of Genetics, Evolution and Environment, University College London, Gower Street, London WC1E 6BT, UK


and
Philip C. J. Donoghue


School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK


    Knowledge of absolute species divergence times is not only fascinating to evolutionary biologists in establishing the age of a species group, but also critically important to addressing a variety of biological questions. Absolute times allow us to place speciation events (such as the diversification of the mammals relative to the demise of the dinosaurs) in the correct geological and environmental contexts and to gain a better understanding of speciation and dispersal mechanisms [1,2]. They also allow us to characterize species richness and species diversification rates over geological periods. Estimated molecular evolutionary rates can also be correlated with life-history traits and are important for interpretation of the fast-accumulating genomic sequence data. Molecular clock methods are also used widely in establishing the evolutionary history of viruses, including those related to human diseases.

    The molecular clock hypothesis (rate constancy over time), proposed by Zuckerkandl & Pauling [3,4], provides a powerful approach to estimating divergence times. Under the clock assumption, the distance between sequences grows linearly with time, so that if the ages of some nodes are known (for example, from the fossil record), the absolute rate of evolution as well as the absolute geological ages for all other nodes on the tree can be calculated. The past decade has seen exciting developments in clock-dating methodologies, especially in the Bayesian framework, such as stochastic models of evolutionary rate change to deal with the sloppiness of the clock [5–7] and flexible calibration curves to accommodate uncertain fossil information [8]. There has also been a surge of interest in probabilistic modelling of fossil presence and absence within stratigraphic sequences [9–11] and in models of morphological character evolution [12] that use fossil data to generate time estimates, in the analysis of either fossil data alone or in a combined analysis of data from both fossils and modern species.
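    The strict-clock calculation described above can be illustrated with a minimal sketch. The function name, the node labels and all numbers below are hypothetical and chosen only for illustration; the sketch assumes a perfectly clock-like tree in which each node's distance to the present-day tips is known without error.

```python
def strict_clock_ages(calib_distance, calib_age, other_distances):
    """Date nodes under a strict clock from a single calibrated node.

    calib_distance  -- node-to-tip distance of the calibrated node
                       (substitutions per site)
    calib_age       -- known age of that node (e.g. in Myr, from a fossil)
    other_distances -- dict mapping node name -> node-to-tip distance
    """
    # Distance grows linearly with time, so rate = distance / time.
    rate = calib_distance / calib_age
    # Every other node age follows from its distance and the shared rate.
    ages = {node: dist / rate for node, dist in other_distances.items()}
    return rate, ages


# Hypothetical example: a node dated to 30 Myr sits 0.06 substitutions
# per site above the tips.
rate, ages = strict_clock_ages(0.06, 30.0, {"A": 0.02, "B": 0.10})
print(rate, ages)
```

    With these illustrative inputs the implied rate is about 0.002 substitutions per site per Myr, and nodes A and B are dated to roughly 10 and 50 Myr. Real analyses must additionally handle uncertainty in both the distances and the calibration, which is what the Bayesian methods discussed in this issue address.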

    However, many challenges remain, such as the relative merits of the different prior models of evolutionary rate drift (e.g. the correlated- and independent-rate models), the difference between user-specified time prior incorporating fossil calibrations and the effective time prior used by the computer program, the partitioning of molecular sequence data in a Bayesian dating analysis and the persistent uncertainty in time and rate estimation despite explosive increase in sequence data. Realistic models for the analysis of fossil data (either fossil occurrence data or fossil morphological measurements) are still in their infancy.

    With the explosive growth of genomic sequence data, molecular clock-dating techniques are increasingly being used to date divergence events in various systems. It is timely to review the recent breakthroughs in the field and highlight future directions. We thus organized a Royal Society discussion meeting titled ‘Dating species divergences using rocks and clocks’, on 9–10 November 2015, to celebrate Zuckerkandl and Pauling's ingenious molecular clock hypothesis, to assess this fast developing field and to identify the fundamental challenges that remain in developing molecular clock-dating methodology. The meeting brought together leaders in the fields of geochronology and computational molecular phylogenetics, as well as empirical biologists who use molecular clock-dating technologies to establish a timescale for some of the most fundamental events in organismal evolutionary history. This special issue is the result of that meeting.

    The special issue consists of 14 reviews and original papers. In the first [13], we review molecular clock-dating methods developed over the past five decades, with a focus on recent developments and the Bayesian methods. The remaining 13 papers fall into three groups: (i) on features and analyses of rock and fossil data, (ii) on theoretical developments in molecular clock-dating methods, and (iii) on applications of clock-dating methodology to infer divergence times in various biological systems. In the first group, Holland [14] describes the structure of the fossil record. Everyone is familiar with the vagaries of fossil preservation, but the most significant ‘bias’ in the fossil record is perhaps the non-uniform nature of the rock record within which it is entombed. Holland describes variations in preservation among lineages, environments and sedimentary basins, across time and in terms of perception, and finally variation in sampling. While modern biogeography has been shaped by a reliance on the security of direct dating of tectonic events, like the opening and closure of oceans, Holland argues that the predictably non-uniform nature of the rock and fossil records is amenable to probabilistic modelling. The influential factors he discusses may be important ‘covariates’ in building a model of fossil preservation and discovery. De Baets and co-workers [15] show that the high precision of radiometric dating belies the poor accuracy of the estimated age of biogeographic events, which are invariably long drawn-out episodes of tectonism, the impact of which will vary depending on the ecology of the clades. Nevertheless, the uncertainties associated with biogeographic calibrations can be modelled in much the same way as in fossil calibrations and the two approaches, rather than competing, can be used in combination to constrain clade ages.

    The papers in the second group explore theoretical issues of molecular clock dating or implement new models in Bayesian dating programs (e.g. MrBayes and BEAST). Note that in a modern clock-dating analysis, the calculation of the likelihood for the sequence data takes most of the CPU time but the theory is mature, with well-developed substitution models [16,17]. Instead, methodological developments have focused on the other components of the Bayesian analysis, including the prior on divergence times (the time prior), the prior on substitution rates (the rate prior) and the model of morphological character evolution to incorporate fossil data. Rannala [18] discusses a number of issues in the construction of the time prior, such as the important distinction between the user-specified time prior and the effective time prior used by the computer program. Calibration densities for node ages specified by the user often do not satisfy the constraint that ancestors are older than descendants. Bayesian dating programs then automatically and without any warning truncate node ages that violate the constraint, altering the user-specified time prior to become the effective time prior. Rannala shows that conflicts among fossil calibrations, and between fossil calibrations and the molecular data, may lead to highly precise but grossly wrong time estimates and warns that overly narrow posterior distributions of divergence times should be carefully scrutinized. dos Reis [19] discusses another issue with the time prior. He points out that the procedure for constructing a time prior suggested by Yang & Rannala [8] and known as the conditional reconstruction [20], which combines a birth–death model of cladogenesis with user-specified calibration densities, may generate ugly multimodal prior densities for node ages. The difficulty is caused by the fact that phylogenetic dating analysis uses rooted trees while the birth–death model operates on the so-called labelled histories (rooted trees with the interior nodes ranked by age [21]). Lartillot et al. [22] explore the rate prior, in particular the independent-rates and the correlated-rates (Brownian-motion) models, for relaxing the molecular clock and allowing the rate to drift over branches on the tree. The authors propose a mixed relaxed clock model to combine the features of both models, assuming that the rate undergoes short-term independent fluctuations on top of a Brownian long-term trend. Applied to date the divergences of mammals, the new model was found to help reduce the oversensitivity of the posterior to the rate prior, especially when tip calibrations are used.
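    The gap between the user-specified and the effective time prior discussed above can be illustrated with a small Monte Carlo sketch. It is a toy example under stated assumptions, not the algorithm of any particular dating program: the two uniform calibrations (40–60 Myr on an ancestral node and 30–70 Myr on a descendant node) are hypothetical, and the truncation is mimicked by simply discarding draws in which the ancestor is not older than the descendant.

```python
import random


def sample_effective_prior(n, seed=1):
    """Draw n (ancestor_age, descendant_age) pairs from the joint prior
    obtained by truncating two independent calibration densities to the
    region where the ancestor is older than the descendant."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n:
        t_anc = rng.uniform(40.0, 60.0)   # hypothetical ancestor calibration
        t_desc = rng.uniform(30.0, 70.0)  # hypothetical descendant calibration
        if t_anc > t_desc:                # enforce ancestor older than descendant
            kept.append((t_anc, t_desc))
    return kept


samples = sample_effective_prior(20000)
mean_anc = sum(a for a, _ in samples) / len(samples)
mean_desc = sum(d for _, d in samples) / len(samples)
print(mean_anc, mean_desc)
```

    Both user-specified calibrations have a mean of 50 Myr, yet after truncation the descendant's prior mean falls to roughly 41 Myr and the ancestor's rises to roughly 52 Myr. The marginal priors actually used in the analysis can therefore differ noticeably from what the user wrote down, which is why inspecting the effective prior (for example, by running the program without sequence data) is commonly advised.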

    Drummond & Stadler [23] use the fossilized birth–death time prior [11] and a simple model of discrete morphological character evolution [24] to estimate the divergence times in an integrated analysis of molecular sequence data for modern species and morphological characters for both modern and fossil species. They take a jackknife-style approach to estimate the age of each fossil in turn using the other dated fossils, based on two rich and well-characterized datasets. This investigation of the internal consistency of the method produced promising results, finding that the posterior mean age of each fossil is on average less than 2 Myr from the midpoint age of the geological strata from which it was excavated. However, the credibility intervals of the posterior estimates tend to be large.

    Ronquist and co-workers [25] analysed a mammalian dataset to explore why the combined analysis of molecular and morphological data (known as total evidence dating), despite its theoretical advantages, has not closed the gap between rocks and clocks. The authors highlight that the conflict between morphology and molecules under standard models causes the dating method to generate ancient divergence time estimates. They discuss a number of influential factors, such as the inadequacy of the model of species diversification and fossil sampling used to construct the time prior (in particular, the failure to account for diversified sampling) and inadequacies in morphological models (in particular, the failure to account for correlations among characters). By assuming rapid diversification, rare extinction or high fossil sampling rate, the authors were able to obtain highly congruent time estimates with a minimal gap between rocks and clocks. It remains an open question whether molecular time estimates should be judged by how well they match or conflict with the fossil dates. Without knowledge of the true ages, and given the general sensitivity of posterior time estimates to many aspects of the prior formulation, this question may be expected to haunt many molecular dating studies.

    The sensitivity of posterior time estimates in a Bayesian dating analysis means that the details of the substitution models may also have substantial impact. Lee et al. [26] modelled the different clocks (rate-drift patterns) for different types of substitutions. For example, in mammals, CpG dinucleotides have high mutation rates, which tend to be constant over calendar time, while other types of point mutations may be associated with meiosis so that the rate per year may vary if different species have different generation times. In an analysis of an intergenic region from eight primate species, these authors found that the different groupings of substitution types affected the widths of the credibility intervals far more than the posterior means of divergence times. Scally [27] explores the mechanisms of mutation, confronting the incongruence between high rates inferred from fossil-calibrated divergence time analyses of catarrhines with the low mutation rates observed in human parent-offspring triplets. There appears to be a genuine slowing down in the evolutionary rate, obscured by incomplete understanding of spermatogenesis. However, Scally argues that the precipitous nature of mutation rate deceleration is, perhaps, an artefact of the fossil calibrations underestimating the timing of divergences among the great ape lineages. These expectations form an interesting contrast with the study of Cahill and co-workers [28], which explores the utility of a pairwise sequential Markovian coalescent (PSMC) model [29] to date both the end of lineage panmixia and the cessation of gene flow among derivative lineages. The PSMC model is novel in that it accounts for the polymorphism and coalescent in extinct ancestors [30] and obviates the need for phased data. The authors' simulations suggest that the method can be used reliably for analysis of low coverage genome data. They show that while divergences among great apes and among bears show evidence of an abrupt end to gene flow following the end of panmixia, more recently diverged clades like chimpanzees and bonobos exhibit evidence of continued gene flow post divergence.

    The final three papers concern empirical application of divergence time methods to unravelling the evolutionary history of animal and plant lineages. Nicole Foley and co-workers [31] consider the timing of diversification of mammals, which has achieved iconic status because of historic discord between the early applications of the molecular clock and traditional estimates based on the fossil record. Definitive records of placental mammals are limited to the Cenozoic and, hence, the fossil record has been interpreted to reflect an adaptive radiation following the demise of non-avian dinosaurs as part of the end-Cretaceous mass extinction. Molecular clock analyses have invariably indicated that placentals diverged in the Cretaceous or even earlier, though the extent of this prefossil history has been diminishing with the development of new methods, more data, more computational power, and a reinterpretation of the fossil record. Foley and co-workers suggest that a definitive time-scale must await a definitive phylogeny. This appears some way off as the branching relationships involving fundamental clades like Laurasiatheria remain refractory to resolution, and interpretation of the fossil record is confounded by the same widespread convergent evolution of phenotypes that had compromised attempts at a mammalian phylogeny before the molecular revolution in systematics.

    Susanne Renner and co-workers [32] use the fossilized birth–death (FBD) model to establish a timescale for beeches and royal ferns, revealing a fivefold difference in the species turnover rate between these two clades. They attribute the low rate of turnover in royal ferns to their adaptation to low-nutrient marginal environments. This study highlights the power of the FBD model in facilitating the inclusion of fossil data that would otherwise be irrelevant to conventional node-calibrated molecular clock analyses. Finally, Lozano-Fernandez and co-workers [33] employ the molecular clock to tackle the timing of arthropod terrestrialization. Arthropods are an ideal model for exploring this ecological transition because they have achieved this feat in a number of independent lineages. These natural experiments provide a basis for exploring the phenomena of convergence and parallelism in the physiological adaptation of a marine aquatic metabolism to the terrestrial environment. The authors show that fossil and molecular estimates of terrestrialization in arachnids are in close agreement, but the terrestrialization of myriapods significantly predates the oldest myriapod fossils and, indeed, the oldest records of plants, which are assumed to have terraformed the continents before an arthropod invasion. Lozano-Fernandez et al. suggest that this inconsistency may be an artefact of independent terrestrialization events in the two principal lineages of myriapods, as suggested by long-standing arguments that their tracheal systems evolved in parallel rather than being inherited from a common ancestor.

    Molecular clock dating is hard when the clock is violated. The molecular sequence data contain information about the genetic distances, but not about the absolute times and absolute rates separately. In a Bayesian analysis, the resolution of distances into times and rates is achieved through the time prior and the rate prior, and through the analysis of morphological measurements from fossil and living species. The time-rate confounding effect combined with the well-known non-clock nature of morphological evolution means that posterior time estimates will remain sensitive to the priors and to the evolutionary models, even if whole genomes are sequenced from many species. We hope that the papers in this special issue have successfully demonstrated the challenges facing molecular dating studies and have also highlighted areas where future methodological developments are most likely to be fruitful.
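    The confounding of times and rates described above can be made concrete with a minimal sketch. It assumes the simplest possible setting, which is not taken from any paper in this issue: two sequences compared under the Jukes–Cantor (JC69) substitution model, with a hypothetical count of 90 differences at 1000 sites. The function name and all numbers are illustrative only.

```python
import math


def jc69_loglik(rate, time, n_sites, n_diff):
    """Log-likelihood (up to an additive constant) of observing n_diff
    differences at n_sites between two sequences that diverged `time`
    ago and evolved at `rate` substitutions per site per unit time."""
    d = 2.0 * rate * time  # total distance along both lineages
    # JC69 probability that a site differs between the two sequences.
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    return n_diff * math.log(p) + (n_sites - n_diff) * math.log(1.0 - p)


# Halving the time while doubling the rate leaves the distance unchanged.
ll_slow_old = jc69_loglik(0.001, 50.0, 1000, 90)
ll_fast_young = jc69_loglik(0.002, 25.0, 1000, 90)
print(ll_slow_old, ll_fast_young)
```

    The two log-likelihoods agree because the data enter only through the product of rate and time. Adding more sites sharpens the estimate of that product but cannot separate its two factors, so the priors on times and rates continue to matter however long the sequences are.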

    Competing interests

    We declare we have no competing interests.

    Funding

    Z.Y. and P.C.J.D. are funded by the Biotechnology and Biological Sciences Research Council (BB/N000919/1), Natural Environment Research Council (NE/N003438/1) and the Royal Society (Wolfson Merit Award).

    Footnotes

    One contribution of 15 to a discussion meeting issue ‘Dating species divergences using rocks and clocks’.

    Published by the Royal Society. All rights reserved.