Abstract
Genome sequencing studies of de novo mutations in humans have revealed surprising incongruities in our understanding of human germline mutation. In particular, the mutation rate observed in modern humans is substantially lower than that estimated from calibration against the fossil record, and the paternal age effect in mutations transmitted to offspring is much weaker than expected from our long-standing model of spermatogenesis. I consider possible explanations for these discrepancies, including evolutionary changes in life-history parameters such as generation time and the age of puberty, a possible contribution from undetected post-zygotic mutations early in embryo development, and changes in cellular mutation processes at different stages of the germline. I suggest a revised model of stem-cell state transitions during spermatogenesis, in which ‘dark’ gonial stem cells play a more active role than hitherto envisaged, with a long cycle time undetected in experimental observations. More generally, I argue that the mutation rate and its evolution depend intimately on the structure of the germline in humans and other primates.
This article is part of the themed issue ‘Dating species divergences using rocks and clocks’.
1. The germline mutation rate
All evolutionary processes depend on the flow of genetic information from one generation to the next, and as with any signal, errors in transmission can occur. The rate at which this happens is called the germline mutation rate, and is of central importance to evolutionary genetics. Not only is it key to interpreting genomic differences between individuals and populations, it also determines the timescale by which we can relate genetic data to other evidence for the evolutionary past. This relationship is not straightforward, however, because although in evolutionary genetic theory the mutation rate often plays the role of a fundamental constant, in truth it evolves like any other trait and can differ by orders of magnitude between species [1].
Estimates of the mutation rate in humans have varied according to the data and methods available. The first were made even before the nature of the DNA molecule had been established [2,3], and so were indirect and restricted to mutations causing phenotypic differences, such as at dominant disease loci. Subsequent estimates were based on phylogenetic comparisons between species, with divergence times drawn from fossil evidence. More recently, developments in genome sequencing technology have enabled mutation rate estimates based on counting de novo mutations, comparing closely related individuals in parent–offspring trios or larger pedigrees (reviewed in [4,5]).
In principle, phylogenetic and de novo estimates represent different aspects of the same approach, counting genetic differences accumulated over a number of generations. For evolutionary analyses, a de novo estimate seems at first glance more attractive because it avoids the circularity implicit in phylogenetic calibration, particularly when comparing genetic data against fossil dates. However, the first such estimates in human trios yielded a value of 0.5 × 10⁻⁹ bp⁻¹ yr⁻¹ for single-nucleotide mutations, almost half the established phylogenetic rate, and thus implying a substantial lengthening of the evolutionary timescale if applied across all hominoid lineages [4]. While such a revision may be warranted in places, particularly for recent events within the genus Homo and the speciation of the African great apes, a longer timescale for older events is difficult to reconcile with the primate fossil record. For example, with this rate the 4.7% sequence divergence between apes and old-world monkeys [6] implies a genetic divergence time of 47 Ma, and hence speciation approximately 40 Ma (assuming a reasonably large ancestral population), whereas the fossil record seems consistent with a divergence no more than 25–30 Ma [7].
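The arithmetic behind the 47 Ma figure can be sketched as follows. The function name is hypothetical, and the sketch assumes a constant yearly rate on both lineages, so that pairwise divergence accumulates at twice the per-lineage rate; the inputs are the values quoted in the text.

```python
def genetic_divergence_time(divergence, rate_per_bp_per_year):
    """Years since two sequences last shared a common ancestor.

    Illustrative assumption: the same constant yearly rate applies on
    both lineages, so divergence = 2 * rate * time.
    """
    return divergence / (2.0 * rate_per_bp_per_year)


# 4.7% ape / old-world monkey divergence with the de novo rate from the text.
years = genetic_divergence_time(0.047, 0.5e-9)
print(f"genetic divergence time: {years / 1e6:.1f} Ma")
```

The speciation time is more recent than this genetic divergence time by the average coalescence time in the ancestral population, which is why the text quotes approximately 40 Ma rather than 47 Ma.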
Several explanations for this disagreement have been proposed, including the possibility that de novo estimates have failed to correctly quantify false positives or inaccessible regions of the genome [5]. However, while there are caveats to any approach, more than a dozen subsequent de novo studies have consistently produced similarly low values [5]. This includes one study based on more distantly related individuals [8], and while other forthcoming pedigree-based estimates may lead to some adjustment (for reasons discussed below), it seems unlikely that methodological considerations alone will close the gap between phylogenetic and de novo-estimated rates. Furthermore, additional evidence supporting a low germline mutation rate in modern humans comes from comparisons of ancient and modern DNA [9], and a lower rate is arguably more compatible with archaeological evidence for the timing of recent events such as the divergence of Native American and East Asian populations [10].
This paper explores three alternative explanations for the rate discrepancy and discusses factors underlying the germline mutation rate which may have led to its evolution on shorter or longer timescales. Firstly, I discuss the possibility that mutation rates may have slowed due to life-history changes during the last 20 Myr of hominoid evolution. Secondly, I consider whether aspects of the cellular genealogy of the germline might have led to a substantial number of mutations going undetected in trio sequencing experiments. Finally, I discuss how stem-cell processes in spermatogenesis affect the germline mutation rate and how our model for this might be reconciled with recent experimental observations.
2. Life-history changes during hominoid evolution
One possible explanation for the discrepancy between mutation rate estimates is that rates themselves may have changed during hominoid evolution. Since they are observed to differ between species across large evolutionary distances, a slowdown on this scale is not implausible a priori [11,12]. Indeed, great apes have evolved in several ways over this time, notably increasing in body mass [13]. This itself leads to an explanation for the putative slowdown based on a change in generation time (defined as the average time from zygote to zygote along a genetic lineage), since life-history parameters such as generation time scale with body mass across a wide range of mammal species [14,15]. Consider a simplistic model where the per-generation mutation rate μgen is constant and the mutation rate per year μ scales inversely with generation time: μ = μgen/tgen. Then an increase in generation time by a factor of almost two could account for the necessary reduction in yearly rate from approximately 1 × 10⁻⁹ bp⁻¹ yr⁻¹ in the past to 0.5 × 10⁻⁹ bp⁻¹ yr⁻¹ today (figure 1).
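A minimal sketch of this simplistic model is given below. The per-generation value is an illustrative assumption chosen so that a 30-year generation gives the present-day yearly rate; only the ratio between the two yearly rates matters for the argument.

```python
def yearly_rate_simple(mu_gen, t_gen):
    """Yearly rate when the per-generation rate is independent of t_gen."""
    return mu_gen / t_gen


# Assumed for illustration: 0.5e-9 per bp per year over a 30-year generation.
MU_GEN = 0.5e-9 * 30.0

past = yearly_rate_simple(MU_GEN, 15.0)     # shorter ancestral generation time
present = yearly_rate_simple(MU_GEN, 30.0)  # present-day generation time
print(f"past: {past:.2e}  present: {present:.2e}  ratio: {past / present:.2f}")
```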
However, as noted by Ségurel et al. [5], this model is too simplistic, for in supposing that the number of mutations per generation is independent of tgen, it ignores the fact that older fathers tend to pass on more mutations to their offspring than younger fathers. This phenomenon, the paternal age effect, is a consequence of the fact that cell-division replication errors are the major source of germline mutation, and whereas in both sexes there are several divisions associated with embryonic development prior to gametogenesis, spermatogenesis in males involves a process of continuous further cell division throughout reproductive life. Hence the older the father, the more cell divisions his gametes will have passed through, and the more errors accumulated. By contrast, in oogenesis a stock of primary oocytes is generated within the developing embryo, each of which is held in stasis until the final two meiotic divisions leading to ovulation later in life.
Empirical measurements of the paternal age effect in de novo sequencing studies have found that the mean number of mutations passed on by fathers grows linearly with age, approximately doubling between the ages of 20 and 40 years [16–18]. This would seem to largely mitigate the generation time effect on mutation rates [5]. Consider a straightforward extension to the model presented above: for an autosomal lineage (which spends equal time in males and females), we have μgen = (μgen,f + μgen,m)/2, where μgen,f is the female mutation rate per generation and μgen,m the male rate. Then
If tpub is fixed then even a substantial change in generation time has relatively little effect on the yearly mutation rate under this model, as shown in figure 1. However, the assumption of a fixed age of puberty is itself almost certainly invalid, since like other life-history parameters the age of male sexual maturity scales with body mass across the primates, and variation within extant primates suggests a strong correlation (R² = 0.84) with tgen (figure 2). Assessments of sexual maturity can vary and may not coincide with the onset of spermatogenesis in every case [27]. Nevertheless, if we incorporate a linear scaling of tpub with tgen, we recover much of the generation time effect, in the sense that an increase in tgen from 15 to 30 years now corresponds to a reduction in μ by a factor of 1.5 (figure 1).
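The contrast between a fixed and a scaling age of puberty can be explored with a sketch such as the following. The functional form is an assumption made for illustration (a constant maternal contribution, plus a paternal contribution that is constant up to puberty and then grows linearly with age), and every parameter value is hypothetical rather than taken from the figures.

```python
def yearly_rate(t_gen, t_pub, mu_f=15.0, mu_m0=26.0, mu_s=2.0):
    """Sex-averaged yearly rate under an assumed linear paternal age effect.

    mu_f  : maternal mutations per generation (hypothetical value)
    mu_m0 : paternal mutations accumulated by puberty (hypothetical value)
    mu_s  : paternal mutations added per year after puberty (hypothetical)
    """
    mu_m = mu_m0 + mu_s * (t_gen - t_pub)
    return (mu_f + mu_m) / (2.0 * t_gen)


T_PUB_NOW = 13.0  # assumed present-day age of puberty in years

# Case 1: puberty fixed at its present-day value.
ratio_fixed = yearly_rate(15.0, T_PUB_NOW) / yearly_rate(30.0, T_PUB_NOW)

# Case 2: puberty scales linearly with generation time.
scaled_pub = T_PUB_NOW * 15.0 / 30.0
ratio_scaled = yearly_rate(15.0, scaled_pub) / yearly_rate(30.0, T_PUB_NOW)

print(f"slowdown with fixed puberty:   {ratio_fixed:.2f}")
print(f"slowdown with scaling puberty: {ratio_scaled:.2f}")
```

With these illustrative values the paternal count doubles between ages 20 and 40, and letting puberty scale with generation time restores a noticeably larger slowdown than the fixed-puberty case, which is the qualitative behaviour described in the text.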
3. Hidden germline mutations in trio sequencing
An alternative explanation for the discrepancy in rates may lie in how de novo sequencing experiments relate to the cellular genealogy of the germline, and the definition of germline mutation rate as the mean number of mutations acquired on a germline lineage from zygote to zygote. Mutations on somatic lineages are important in the context of diseases such as cancer, but such lineages do not as a rule extend beyond the lifespan of the organism and thus make no direct contribution to evolutionary genetic processes. However, the detection of de novo mutations in trios is based on sequencing somatic cells in parents and offspring, not zygotes (or even other germ cells). To understand the implications of this and how these experiments relate to what we want to measure, we need to consider the cellular genealogy of the germline within a family (figure 3).
Germ cell specification—the process by which certain cellular lineages are set aside as germ cells—occurs in mammals around the time of gastrulation. Following invagination of the epiblast, a number of cells originating there find a niche in the wall of the yolk sac and subsequently migrate as primordial germ cells (PGCs) to the gonadal ridge. Many somatic lineages also differentiate around the same time, and also have their origins within the epiblast. In humans, this specification process occurs about two weeks after fertilization, or approximately 15 cell divisions [28]. Thereafter, germ cell lineages undergo several further divisions in preparation for gametogenesis: approximately 15 more divisions in females and 20–24 in males [28,29]. Thus, in total, there are about 30–40 mitotic divisions from fertilization to puberty, at which point in males the population of gonial stem cells (GSCs) is established in the testes, and the primary oocytes have been formed in females. From then on, the male and female gametogenetic processes differ markedly, with GSCs replicating continuously in adult males to maintain the germ cell lineages and support gamete production.
Given this structure, the fact that de novo sequencing estimates are based on sequencing somatic rather than germ cells creates a potential for error. For example, in comparing parents and offspring, mutations arising early in the somatic cellular genealogy of the offspring may be counted as de novo germline mutations (false-positive errors), and early mutations in either parent on lineages having both somatic and germline descendants may not be recognized as de novo (false negatives).
Some of these cases may be excluded or recovered by careful filtering based on the fraction of somatic cells in which they are present [30]. However, there may be a class of early post-zygotic mutations which cannot be accounted for in this way, depending on when and how the divergence of germ cell and sequenced somatic lineages occurs. Prior to the completion of this divergence, early embryonic cells may be ancestral both to germ and somatic cells within the organism, and mutations occurring then may be shared by some or all cells in either genealogy (figure 3). Such ‘hidden’ mutations could contribute a component to the germline mutation rate which is undetectable in parent–offspring comparisons, and whose size depends on the number of cell divisions and the per-cell-division mutation rate at this early stage [31]. For example, it has been suggested that the first few post-zygotic cell divisions might be particularly mutagenic, based on the high level of chromosomal instability and other errors found in early IVF embryos and the frequency of early pregnancy loss after conception [32–34].
The potential for hidden mutations depends on the distribution of cell fates within the epiblast (for which much of our understanding derives from studies in mice or non-human primates). It may also depend to some extent on which somatic cells are sequenced. For example, compared with cells sampled from multiple tissues or from blood, cells derived only from one tissue or region of the body may descend from a smaller number of lineages at any given stage in development. As a consequence, configurations A and C for the divergence of germ and somatic lineages in figure 3 may be more likely for such cells, potentially increasing the number of early cell divisions in which mutations would be hidden. As an aside, the observation that parental mosaicism in blood is correlated with recurrence risk [35] suggests that lineage ancestries for these cells at least are mixed in humans (case D in figure 3) [36]. Lineage-tracing experiments on mouse oocytes suggest that a similar situation exists in mice across a range of somatic cell types, notwithstanding a degree of lineage clustering by cell type [37].
Might a hidden mutation component explain the discrepancy between phylogenetic and de novo rate estimates? Various considerations suggest that this is unlikely, subject to further data. Firstly, although hidden mutations are impossible to detect in single-generation experiments, comparisons over many generations should be sensitive to mutations on all ‘internal’ segments, including all hidden mutations except at the root and leaves of the pedigree. If hidden mutations make a substantial contribution to the germline mutation rate, we might expect pedigree-based estimates to be higher than those made in trios. Two such studies have been published to date, of which one did not differ significantly from trio-based estimates [8] and the other obtained a value 33% higher [38]. Forthcoming studies may clarify this picture.
Secondly, a large hidden contribution should lead to a correspondingly high rate of within-family recurrence of genetic diseases associated with de novo mutations. This would be in addition to rates of recurrence due to shared gametic ancestry following germ-cell specification, for which previous models have estimated recurrence rates of ≪1% for mutations of paternal origin (i.e. most mutations) and approximately 4% for those of maternal origin [35,39,40]. By definition, hidden mutations occurring in a parent will be present in all of his or her gametes. Thus, if hidden mutations constitute a fraction ϕ of all germline mutations, the probability of recurrence due to such mutations will be ϕR, where R is sibling relatedness. If hidden mutations occurred at a rate similar to the observed de novo mutation rate, so that the total de novo rate matched the phylogenetic rate, we would expect to see at least 25% recurrence of autosomal-linked diseases (R = 0.5). Recurrence at sex-linked loci depends on the sexes of offspring, but even in brothers of female offspring we would expect a recurrence rate of at least 12.5% for diseases caused by de novo mutations on the X chromosome (R = 0.25).
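The recurrence figures quoted above follow from a short calculation, sketched here. The function names are hypothetical; the only inputs are the relation ϕR and the supposition that hidden and observed mutations occur at equal rates.

```python
def hidden_fraction(hidden_rate, observed_rate):
    """Fraction phi of all germline mutations that are hidden."""
    return hidden_rate / (hidden_rate + observed_rate)


def recurrence_probability(phi, relatedness):
    """Probability of sibling recurrence attributable to hidden mutations."""
    return phi * relatedness


# Hidden rate equal to the observed de novo rate (units cancel).
phi = hidden_fraction(1.0, 1.0)
autosomal = recurrence_probability(phi, 0.5)
x_linked = recurrence_probability(phi, 0.25)
print(f"phi = {phi:.2f}, autosomal = {autosomal:.3f}, X-linked = {x_linked:.3f}")
```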
Clinically estimated recurrence rates depend on the disorder involved and the nature of the causative mutation or mutations. Some disorders such as Duchenne muscular dystrophy show recurrence rates as high as 14% [41], but estimates are generally less than 1% [42,43]. However, such estimates may not necessarily reflect the recurrence rates of single-nucleotide variants (SNVs) as counted in de novo sequencing experiments and in phylogenetic comparisons. Even where they relate to clinical genetic data, such data often include structural mutations and chromosomal abnormalities whose origins may tend to differ from those of de novo SNVs. In particular, chromosomal rearrangements may be enriched for meiotic errors [44], whereas the apparent linearity of the paternal age effect suggests that germline SNVs are dominated by mitotic events. Where clinical estimates are based on phenotypic recurrence, uncertainty arises in modelling the relationship between genotype and phenotype, the number of loci involved, and controlling for possible environmental factors. Additionally, some phenotypes may be difficult to diagnose consistently, particularly where there is already a diagnosis in siblings, and further potential bias arises from stoppage, whereby parents of an affected child are less likely to have additional children [45,46]. Some of these considerations suggest that clinical estimates might underestimate the true recurrence of de novo germline mutations. However, the degree of underestimation would have to be at least an order of magnitude to be consistent with a substantial contribution of hidden mutations to the germline mutation rate.
Another effect of hidden mutations would be to inflate the male–female mutation rate ratio as measured in trio comparisons. If hidden mutations occur with equal probability in males and females, and if the male–female ratio of observed (non-hidden) mutations is αobs, it is straightforward to show that the true male–female ratio α is bounded above by αobs and given by
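One form consistent with the definitions used in this section can be sketched as follows. It is a reconstruction under stated assumptions, not necessarily the expression the author had in mind: hidden mutations are taken to add the same expected number per generation in each sex, and ϕ is taken to be the hidden fraction of all germline mutations. Under those assumptions α = [αobs(2 − ϕ) + ϕ] / [2 − ϕ + ϕ αobs]. The function names and the example value of αobs are hypothetical.

```python
def true_sex_ratio(alpha_obs, phi):
    """Male-female mutation ratio after adding hidden mutations.

    Works in units where the observed maternal rate is 1, so the observed
    paternal rate is alpha_obs. Assumes an equal hidden contribution in
    each sex and phi < 1.
    """
    observed_total = 1.0 + alpha_obs
    hidden_total = observed_total * phi / (1.0 - phi)
    hidden_per_sex = hidden_total / 2.0
    return (alpha_obs + hidden_per_sex) / (1.0 + hidden_per_sex)


def true_sex_ratio_closed_form(alpha_obs, phi):
    """Algebraically equivalent closed form of true_sex_ratio."""
    return (alpha_obs * (2.0 - phi) + phi) / (2.0 - phi + phi * alpha_obs)


alpha_obs = 4.0  # illustrative observed ratio, not a value from the text
for phi in (0.0, 0.25, 0.5):
    print(f"phi = {phi:.2f} -> alpha = {true_sex_ratio(alpha_obs, phi):.3f}")
```

When ϕ = 0 the true and observed ratios coincide, and for αobs > 1 any hidden contribution pulls the true ratio below αobs and towards 1.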
4. Changes in the structure of spermatogenesis
The importance of paternally transmitted mutations focuses attention on spermatogenesis as a key factor affecting the germline mutation rate. The established model of human spermatogenesis is based on long-standing experimental observations of the seminiferous epithelium (the environment within the testes where spermatogenesis occurs) [51,52]. Yves Clermont observed several types of spermatogonial cell in humans, differing in their appearance and degree of staining with haematoxylin and eosin [53]. Two of these types correspond to self-renewing (GSC) states [54], and based on their staining are generally referred to as pale (Ap) and dark (Ad) spermatogonia. However, in Clermont's observations only Ap cells were seen to actively divide, each doing so every 16 days to produce a new Ap cell and a progenitor spermatocyte which he termed B-spermatogonia. The latter subsequently undergo two further mitotic divisions and meiosis to produce up to 32 spermatozoa [54] in a process lasting 48 days. Thus, Ap cells are widely regarded as the active spermatogenetic population and Ad cells are thought to comprise a pool of reserve stem cells, to be drawn upon only when the active population has failed or is damaged. Figure 4 (model 1) illustrates this model in terms of the cell states and transitions involved.
One possible explanation for a slowdown in mutation rate would, therefore, be an increase in the cycle time of the seminiferous epithelium during hominoid evolution, leading to fewer mutations acquired during spermatogenesis for a typical adult male. Such a change is equivalent to varying μs in the model discussed above, and figure 5 shows the effect on germline mutation rate, assuming that puberty also scales with generation time as previously discussed. The seminiferous epithelial cycle time in monkeys varies between 9 and 11 days [55], and if the cycle time in ancestral great apes was similar to this, the change since then would correspond to a mutation rate slowdown by almost a factor of 2 (dashed line in figure 5), perhaps sufficient to explain the discrepancy in mutation rates.
However, there is a problem with this model as presented, in that lineages resulting in gametes produced by a 30-year-old male would have passed through approximately 400 cell divisions since fertilization, meaning we would expect a roughly 10-fold increase in the number of mutations passed on to offspring at age 30 compared with those at puberty. Yet sequencing studies have consistently measured only a twofold increase from early to late adulthood [16–18]. This discrepancy, noted also by Ségurel et al. [5], suggests that aspects of the model for spermatogenesis need to be revised.
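The division count behind this expectation can be sketched as below. The 16-day Ap cycle comes from the text; the age of puberty and the number of divisions completed by puberty are assumed values (the latter chosen within the 30–40 range given earlier), and the few divisions between a B-spermatogonium and the mature gamete are ignored.

```python
AP_CYCLE_DAYS = 16.0         # Ap cycle time from the established model
DIVISIONS_AT_PUBERTY = 38.0  # assumed, within the 30-40 range quoted earlier
AGE_AT_PUBERTY = 15.0        # assumed age in years at which Ap cycling begins


def divisions_by_age(age_years):
    """Cell divisions on a male germline lineage from fertilization to this age."""
    years_cycling = max(0.0, age_years - AGE_AT_PUBERTY)
    return DIVISIONS_AT_PUBERTY + years_cycling * 365.0 / AP_CYCLE_DAYS


at_puberty = divisions_by_age(AGE_AT_PUBERTY)
at_30 = divisions_by_age(30.0)
print(f"divisions at puberty: {at_puberty:.0f}")
print(f"divisions at age 30:  {at_30:.0f}")
print(f"fold increase:        {at_30 / at_puberty:.1f}")
```

If mutations accrued at the same rate in every division, the fold increase in divisions would translate directly into the fold increase in transmitted mutations.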
One possibility is that per-cell-division mutation rates are much higher at earlier developmental stages than during spermatogenesis. For example, there is evidence that somatic cell-division mutation rates are substantially higher than those in germ cells [56] and it may be that changes in the environment or phenotype of cells at germ cell specification are accompanied by improved mechanisms of DNA replication error correction. In order to account for a weak paternal age effect, the mutation rate in the first 15 cell divisions (prior to specification) would need to be a factor of approximately 20 higher than in subsequent divisions, which is near the limit of the range of reported estimates for germline and somatic cells [1]. Note also that this excludes any hidden mutations of the kind discussed in the previous section, although such mutations would themselves be made more likely by a high post-zygotic mutation rate. Alternatively, elevated per-cell-division rates may last for a longer time, perhaps until the onset of spermatogenesis or shortly thereafter. If we assume a higher rate applies to the first 40 divisions then it need only be higher by a factor of approximately 9. A recent study of transmitted mutations in a large cohort including some teenage fathers suggested that the very earliest stage of spermatogenesis may be more mutagenic [18]. If true, this might reflect a shift to lower mutation rates once spermatogenesis is fully established. Early spermatogenesis is known to differ in some respects from the process later on: for example, daily sperm production (DSP) volumes are approximately 10 times lower in teenage males than in men 20–30 years old [57].
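One way to arrive at factors of this size is sketched below. It is a reading of the argument rather than the author's stated calculation: the first n divisions are assumed to carry a per-division rate k times higher than all later ones, and k is chosen so that paternal mutations double between ages 20 and 40. The age of puberty and the division count at puberty are assumed values.

```python
AP_CYCLE_DAYS = 16.0         # Ap cycle time from the established model
DIVISIONS_AT_PUBERTY = 38.0  # assumed, within the 30-40 range quoted earlier
AGE_AT_PUBERTY = 15.0        # assumed age in years at which Ap cycling begins


def divisions_by_age(age_years):
    """Cell divisions on a male germline lineage from fertilization to this age."""
    years_cycling = max(0.0, age_years - AGE_AT_PUBERTY)
    return DIVISIONS_AT_PUBERTY + years_cycling * 365.0 / AP_CYCLE_DAYS


def required_fold_elevation(n_early, age_young=20.0, age_old=40.0, fold=2.0):
    """Rate elevation k in the first n_early divisions giving the target fold.

    Mutations at a given age are proportional to k*n_early + (N - n_early),
    where N is the total number of divisions; solve
    fold * (k*n + N_young - n) = k*n + N_old - n for k.
    """
    n_young = divisions_by_age(age_young)
    n_old = divisions_by_age(age_old)
    numerator = (n_old - n_early) - fold * (n_young - n_early)
    return numerator / ((fold - 1.0) * n_early)


k_15 = required_fold_elevation(15.0)
k_40 = required_fold_elevation(40.0)
print(f"elevation needed over the first 15 divisions: {k_15:.1f}")
print(f"elevation needed over the first 40 divisions: {k_40:.1f}")
```

The result is sensitive to the assumed age of puberty, so the printed values should be taken as order-of-magnitude checks on the factors quoted above.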
Another possibility is that the apparent 16-day cycle of the seminiferous epithelium is only part of the picture and that germline lineages actually experience a longer cell-cycle time for some or all of their passage through spermatogenesis. This would imply a more complex structure for GSC state transitions and the number of self-renewing states in which they can exist. It is of course possible to conceive of many such models, but a relatively straightforward extension of the existing model would be for the Ad cells to play a more prominent role. If they replicate with a longer cycle time than was detectable in Clermont's data, and if transitions are possible between the Ad and Ap states, then germline lineages could spend some or even most of their time in the Ad state during spermatogenesis (figure 4, model 2). By reducing the number of germline cell divisions, this could account for a weaker than expected paternal age effect.
Potential evidence for such a model comes from several sources. Within primates, investigations of spermatogonial renewal in monkeys after exposure to radioactive or contraceptive agents [58,59] have shown that Ap cells may be able to transition to Ad without undergoing cell division. If this occurs under normal conditions then the Ad state could play a role other than that of a dormant and non-proliferative reserve. Other evidence comes from comparison with spermatogenesis in mice, which although differing in several respects does share many basic features with primates [60] (figure 4). GSCs in mice can exist in a singular state (As) or in various syncytial states wherein the nuclei share a common cytoplasm, either paired (state Apr) or in longer alignments of n cells (states Aal−n) [61]. Recent experimental studies suggest that progenitor spermatocytes may be produced from divisions of any of these states [60,62], but the degree of commitment to (or likelihood of) differentiation may be greater in the Aal state. An analogy can be drawn with the process in humans, based both on function and on expression levels of several molecular markers, in which the As state corresponds to Ad and the Apr and Aal states correspond to Ap [60,63]. Moreover, a model of stochastic transition between the As, Apr and Aal states, in which intracellular bridges are broken as well as created, has been shown to fit the dynamics of GSC populations in mice [62,63]. The analogous set of transitions in humans and other primates would fit the alternative model shown in figure 4.
By making certain assumptions about the possible transitions between GSC states, we can estimate what cycle time for the Ad state would produce the observed paternal age effect. Previous approaches to modelling stem-cell systems have sometimes assumed homeostatic equilibrium as a way of estimating or constraining model parameters (e.g. [62]), but it may be that spermatogenesis is better represented as a near-equilibrium process. For example, there is notable variation in DSP with age: in humans mean DSP decreases steadily in older men, dropping by a factor of 2 between the ages of 20 and 80 years [57]. To capture these non-equilibrium aspects, we can use a finite state simulation in which transitions between cell states have both a probability parameter and an associated transition time (figure 6). The basic assumption of this model is that the transition Ap → Ad occurs with some probability during each Ap cycle. For the reverse transition, since Ap is observed to divide asymmetrically (Ap → Ap + S), I assume for now that Ad behaves similarly (Ad → Ad + Ap); an alternative choice (Ad → Ap) is discussed below. I also introduce a cell-death state which both regulates the process (since otherwise Ad replication will lead to unbounded GSC proliferation) and ensures that gamete production declines with age. It is plausible that cell death plays a central role in regulating many stem-cell systems [64], and GSC replication occurs only within a niche at the basement membrane of a seminiferous tubule for which several cellular and other environmental factors may be essential. In particular, the availability of Sertoli cells, somatic cells which play both a structural and regulatory role in gamete production, is thought to be a critical factor [52,57]. Cell-death probabilities in this model are an abstraction representing the typical availability of and contention for these critical factors.
A simple parameter-space search, fitting simulated output of this model to the observed slopes of the paternal age effect and DSP age profile, estimates an Ad cycle time of around 300 days and values of 20–30% for transition probabilities to the cell-death state (figure 7). Replication of Ad cells on this timescale would likely not have been observed in Clermont's or subsequent experiments, although transitions Ap → Ad, which in this example are predicted to occur in 10% of Ap cycles, might be detectable.
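A finite state simulation of this general kind can be sketched as follows. It is a simplified illustration, not the simulation used to produce figure 7: the cycle times and the Ap → Ad probability are the values quoted in the text, while the single death probability shared by both states, the all-Ad starting population, the staggered start and the function names are assumptions made here. Division counts are recorded from puberty onwards.

```python
import heapq
import random

AP_CYCLE_DAYS = 16.0   # Ap cycle time quoted in the text
AD_CYCLE_DAYS = 300.0  # fitted Ad cycle time quoted in the text
P_AP_TO_AD = 0.10      # Ap -> Ad probability per Ap cycle quoted in the text
P_DEATH = 0.25         # assumed: one value inside the quoted 20-30% range


def simulate_gsc_lineages(n_start=500, years=20.0, seed=1):
    """Return (day, divisions) for every differentiating cell S produced."""
    rng = random.Random(seed)
    horizon = years * 365.0
    events = []  # heap of (event day, serial number, state, divisions so far)
    serial = 0

    def schedule(day, state, divisions):
        nonlocal serial
        heapq.heappush(events, (day, serial, state, divisions))
        serial += 1

    for _ in range(n_start):
        # Stagger the first Ad division so the population is not synchronised.
        schedule(rng.uniform(0.0, AD_CYCLE_DAYS), "Ad", 0)

    produced = []
    while events:
        day, _, state, divisions = heapq.heappop(events)
        if day > horizon:
            break
        draw = rng.random()
        if draw < P_DEATH:
            continue  # transition to the cell-death state
        if state == "Ap" and draw < P_DEATH + P_AP_TO_AD:
            schedule(day + AD_CYCLE_DAYS, "Ad", divisions)  # Ap -> Ad, no division
        elif state == "Ap":
            produced.append((day, divisions + 1))           # Ap -> Ap + S
            schedule(day + AP_CYCLE_DAYS, "Ap", divisions + 1)
        else:
            schedule(day + AD_CYCLE_DAYS, "Ad", divisions + 1)  # Ad -> Ad + Ap
            schedule(day + AP_CYCLE_DAYS, "Ap", divisions + 1)
    return produced


def mean_divisions(produced, first_year, last_year):
    """Mean division count of S cells produced within a window of years."""
    lo, hi = first_year * 365.0, last_year * 365.0
    counts = [d for day, d in produced if lo <= day < hi]
    return sum(counts) / len(counts)


produced = simulate_gsc_lineages()
early = mean_divisions(produced, 0, 2)
late = mean_divisions(produced, 18, 20)
ap_only = 20 * 365.0 / AP_CYCLE_DAYS  # divisions if a lineage cycled as Ap throughout
print(f"mean divisions, years 0-2:   {early:.1f}")
print(f"mean divisions, years 18-20: {late:.1f}")
print(f"Ap-only count after 20 years: {ap_only:.0f}")
```

Because surviving lineages spend most of their time in the slowly cycling Ad state, the mean division count after 20 years falls far below the count expected if every lineage cycled continuously as Ap, which is the mechanism proposed for the weak paternal age effect.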
Other model assumptions are possible, and may result in different parameter estimates. For example, in a model where the Ad → Ap transition occurs without cell division, so that the Ad cycle is essentially a quiescent interlude before GSCs return to the active Ap state, a similar procedure fitting the observed paternal age effect estimates a cycle time of around 750 days (data not shown). However, the point here is not that parameters can be inferred under one model or another, but that including an active role for the Ad cells allows models which are compatible both with long-standing observations of the seminiferous epithelium and with recent measurements of the paternal age effect. More sophisticated models might also include feedback or global regulation mechanisms other than cell death [52,55], age-related changes in cell-division mutation rate and spermiogenetic efficiency, and perhaps phenomena such as selfish spermatogonial selection [65]. At present, experimental data are limited, but more extensive data including trio sequencing on population-wide scales will provide a better basis for exploration of spermatogenetic models along these or similar lines.
5. Discussion
I have argued that the discrepancy between phylogenetic and de novo estimates of the mutation rate is more probably due to a genuine evolutionary slowdown than to methodological errors or the failure of trio sequencing experiments to detect early post-zygotic mutations. Nevertheless, the latter factors may be present at some level and thus contribute to the discrepancy, meaning that the magnitude of any slowdown may be less than was first hypothesized. Also, while we may be confident that rates have slowed at some point during primate evolution, our inference of the timing, extent and number of ancestral lineages involved in such a slowdown is determined largely by the fossil record and the confidence with which we can constrain speciation events, particularly within the hominoids. Initial attempts to reconcile the rate discrepancy were concerned with the plausibility of a slowdown affecting all four great ape lineages in parallel and to the same degree, given that their branch lengths from an outgroup such as macaque do not differ substantially [49]. However, if newer interpretations of the fossil record were to admit a more ancient speciation time of 20 Ma or more between the ancestors of orangutans and other great apes, they would be consistent with an earlier slowdown affecting only the stem hominoid lineage, and this would suffice to explain the current data without requiring parallel evolution.
More broadly, and regardless of the extent to which rates may have changed in recent primate evolution, the processes considered here are relevant to evolutionary genetic analyses across the mammalian tree and beyond. Previous studies have proposed life-history variation as an explanation for mutation rate change, but it is clear that such explanations need to involve more biologically sophisticated models incorporating factors such as varying pubertal age and sex-dependent parameters [5,50]. Mutation rate change may also be due to evolution in the underlying cellular processes and genealogical structure of the germline, particularly in gametogenesis. Here too, recent experimental data are incongruous with existing models of spermatogenesis and the strength of the paternal age effect. I have focused on potential variation in cell-division mutation rates at different developmental stages and on the stem-cell states involved in spermatogenesis. Other issues not touched on include the relative importance of spontaneous mutation processes [66], potential evidence for a maternal age effect [67], and the evolution of regulatory factors controlling gametogenesis [52]. Progress to date in addressing these questions has been difficult in part because of the challenge of obtaining experimental data on human germline processes: some techniques can only be applied to non-human models, and genome sequence data for human de novo mutations have previously been limited. However, the potential now exists for large-scale genome sequencing of somatic and germ cells and experimental studies of non-human and human stem-cell systems. In combination with computational modelling approaches such as that presented here (and widely used in previous studies to explore stem-cell population dynamics [35,62,68–70]), these developments will facilitate a better understanding of mutation processes and the evolution of the human germline.
Competing interests
I have no competing interests.
Funding
I am grateful for support from an Isaac Newton Trust/Wellcome Trust ISSF Joint Research Grant.
Acknowledgement
I thank Matt Hurles, Anne Goriely, Guy Sella, Molly Przeworski, Guy Amster, Raheleh Rahbari, Sarah Lindsay and Alfonso Martinez-Arias for useful discussions.