Philosophical Transactions of the Royal Society B: Biological Sciences
Review article

Human evolution: a tale from ancient genomes

Bastien Llamas

Australian Centre for Ancient DNA, School of Biological Sciences, University of Adelaide, Adelaide, South Australia 5005, Australia

,
Eske Willerslev

Centre for GeoGenetics, Natural History Museum of Denmark, Øster Voldgade 5–7, 1350 K Copenhagen, Denmark

Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, UK

Wellcome Genome Campus Hinxton, Wellcome Trust Sanger Institute, Cambridge CB10 1SA, UK

and
Ludovic Orlando

Centre for GeoGenetics, Natural History Museum of Denmark, Øster Voldgade 5–7, 1350 K Copenhagen, Denmark

Laboratoire d'Anthropobiologie Moléculaire et d'Imagerie de Synthèse, Université de Toulouse, University Paul Sabatier, CNRS UMR 5288, 31000 Toulouse, France

Published: https://doi.org/10.1098/rstb.2015.0484

    Abstract

    The field of human ancient DNA (aDNA) has moved from mitochondrial sequencing that suffered from contamination and provided limited biological insights, to become a fully genomic discipline that is changing our conception of human history. Recent successes include the sequencing of extinct hominins, and true population genomic studies of Bronze Age populations. Among the emerging areas of aDNA research, the analysis of past epigenomes is set to provide more new insights into human adaptation and disease susceptibility through time. Starting as a mere curiosity, ancient human genetics has become a major player in the understanding of our evolutionary history.

    This article is part of the themed issue ‘Evo-devo in the genomics era, and the origins of morphological diversity’.

    1. Introduction

    The study of the fossil evidence underlying human evolution forms the backbone of palaeoanthropology [1]. This discipline has greatly advanced the understanding of our own origins, in terms of the environment in which we and extinct hominins emerged, dispersed and interacted with each other [2,3]. However, the fossil record is discontinuous and fragmentary, often precluding access to the full morphological range present in past populations. It is thus not surprising that a total redrawing of the human tree, lumping together lineages previously considered as distinct, can be proposed following exceptional discoveries such as the skull and jaw series of Dmanisi [4]. In addition, admixture, which is far more common than previously thought [5], blurs the species limits for extinct groups, especially since the morphological identification of hybrids is difficult.

    In the mid-1980s, molecular anthropology emerged as a complementary approach that exploits the genetic diversity of contemporary human groups to understand their history and genetic makeup [6]. Almost concomitantly, such methods were applied with success to the fossil record for the first time, adding DNA from ancient individuals and extinct species to the list of proxies available to study our origins. Ancient DNA (aDNA) research truly merges palaeoanthropology and molecular anthropology, and its application to ancient humans and archaic hominins has blossomed in the past decade [7]. Ancient genomes can now be almost routinely sequenced [8], allowing us to follow demographic changes [9] and adaptation [10,11] as they happen with considerable statistical power.

    In this review, we will first present how the technology underlying the characterization of ancient hominin sequence data developed in the past 30 years. We will then show that the information collected revolutionized our understanding of our own origins in terms of (i) whether we developed in genetic isolation from archaic hominins, (ii) how our ancestors populated the planet, and (iii) how we became adapted to a wide range of environments.

    2. Sequencing ancient hominins

    (a) From ancient DNA to short sequences

    The characterization of a short 229-bp stretch of mitochondrial DNA (mtDNA) from the skin of a museum specimen of an extinct zebra (quagga) prepared in the mid-1840s kick-started aDNA research [12]. It did not take long before ancient human DNA was first reported [13]. DNA was extracted from tissues from an approximately 2400-year-old Egyptian mummy, and a multicopy 3.4 kb Alu element was sequenced following bacterial cloning. We know today that aDNA is extremely fragmented, down to template sizes of 30–80 bp, so the Egyptian mummy Alu sequence was probably from a modern DNA contaminant. It nonetheless paved the way for the systematic application of recombinant DNA techniques to a broader range of archaeological samples.

    Molecular cloning is quite demanding in terms of amounts and quality of the DNA material. Very conveniently, the polymerase chain reaction (PCR) technique had just been developed to amplify minute amounts of specific genomic targets up to a level compatible with downstream sequencing [14]. PCR was successfully applied to DNA from approximately 7000-year-old human brains preserved in peat bogs [15]. Most importantly, and in combination with new extraction techniques, PCR allowed the analysis of DNA from calcified remains such as bones [16,17] and teeth [18], moving the scope of aDNA research from the anecdotal to the broader scale. PCR naturally became the main technology of aDNA research for the following 20 years.

    (b) PCR limitations and authentication

    No matter how great PCR is, it is not necessarily easy to apply to aDNA as (i) extensive post-mortem DNA fragmentation precludes the amplification of long DNA fragments [15,19] and (ii) post-mortem chemical modifications of the DNA block DNA polymerases [20,21]. As a result, modern DNA contaminants (e.g. present in reagents [22]) can easily outcompete aDNA templates. Extreme caution is thus required to authenticate aDNA results, especially when analysing ancient human remains where contaminant human DNA can be introduced from the field excavation to the laboratory.

    Some authentication criteria directly exploit the properties of aDNA templates, e.g. PCR efficiency should decrease with increasing fragment sizes [23]. Additionally, post-mortem DNA modifications introduce base substitutions during PCR amplification, leading to relatively high frequencies of certain types of sequencing errors. The most abundant error derives from the deamination of cytosine into uracil (a chemical analogue of thymine), and results in the introduction of C-to-T (and complementary G-to-A) transitions during PCR [24–27]. Subcloning of PCR amplicons and the sequencing of multiple clones help to reveal such signatures, and to authenticate the data [28]. Of course, as a single PCR reaction could introduce mis-incorporations very early during amplification, it may happen that the majority of the sequenced clones carry the same error. To be validated, the results obtained in a first laboratory should thus rely on multiple independent PCR amplifications, and be replicated in a second laboratory [29].

    In 1997, this methodological approach perhaps culminated in the characterization of the first mtDNA sequence of a 30 000–100 000-year-old Neanderthal individual [30]. The sequence not only fulfilled all the criteria presented above, but most importantly appeared to fall outside the range of modern human mitochondrial diversity, and the possibility that it represented a nuclear mtDNA insert could be dismissed. Authentication of ancient anatomically modern human (AMH) data proved, however, to be much more difficult, if not impossible, owing to the expected close genetic proximity between ancient and contemporary humans. The general opinion was thus that ‘it is impossible to establish [human DNA] authenticity even with rigorous application of [authentication] criteria’ [31, p. 659].

    Another limitation in the analysis of aDNA through PCR was the characterization of a single locus at a time. Such singleplex PCRs limited the recoverable information to a few short DNA fragments only. Multiplex PCRs helped improve the amount of genetic information retrieved [32], but genetic analyses remained limited to whole mtDNA genomes [33] or a series of short nuclear loci [34,35].

    (c) First shotgun attempts

    The only relatively high-throughput methodology available in the mid-2000s was to end-repair aDNA templates, use bacterial cloning to build genomic DNA libraries, and sequence multiple clones in parallel on a battery of multichannel capillary sequencers. The first attempt on approximately 40 000-year-old cave bear bones was disappointing as it revealed the metagenomic nature of aDNA extracts: a dominant fraction of the sequences produced did not belong to the species of interest but to environmental microbes that colonized fossils after death [36]. Consequently, shotgun sequencing was not time- and cost-effective. An additional step where library clones carrying some fragments of interest were selected by PCR prior to sequencing could help focus the sequencing effort [37], but it still remained low throughput. Fortunately, new sequencing technologies were being developed with the aim to (i) eliminate the rate-limiting bacterial cloning step and (ii) improve the sequencing capacity by several orders of magnitude [38]. High-throughput DNA sequencing (HTS) thus provided the second technological revolution in aDNA research, and paved the way to the characterization of entire ancient genomes.

    (d) The high-throughput sequencing revolution

    The Roche-454 platform replaced bacterial colonies by PCR colonies, which are distributed in individual wells where several hundreds of pyrosequencing reactions are performed in parallel in less than 24 hours [39]. Importantly, the extensive fragmentation of aDNA suited the short sequencing reads (then only approx. 100 nucleotides long, on average). Applied to Neanderthals, HTS first delivered one megabase of nuclear sequence information [40], which was further demonstrated to be predominantly modern human DNA contamination: the estimates of population divergence between Neanderthals and AMHs recovered from the 454 data were similar to those obtained between AMH individuals, in contrast to much older estimates obtained from capillary sequencing data [41]. Clearly, HTS greatly improved the aDNA sequencing capacity, but also demonstrated that new criteria are necessary to authenticate the massive amounts of data generated.

    (e) Novel authentication criteria

    Once again, one of the most important lines of authentication came from the analysis of DNA degradation patterns [42]. First, post-mortem DNA fragmentation proceeds mostly through depurination [43]. After aligning the HTS data against the reference genome of a closely related species, the genomic position preceding read starts is thus expected to show an excess of As and Gs compared to the random base composition of the genome. Second, aDNA templates tend to show single-stranded overhangs, where cytosine deamination occurs much faster than in double strands. As a result, C-to-T transition rates will be uneven along sequencing reads and are expected to increase towards read starts [42,43]. Although the exact DNA degradation profile depends on the molecular tools used for constructing and/or amplifying aDNA libraries [44–47], its characterization has greatly advanced the authentication of ancient HTS datasets. It is nonetheless not bulletproof, as mixtures of aDNA templates and modern DNA contaminants, or chemical treatment of the contaminants present in the bone powder [48] can result in bona fide fragmentation and mis-incorporation signatures. Additional arguments are thus required to rule out contamination.
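
    The read-position dependence of C-to-T mis-incorporations described above can be illustrated with a minimal Python sketch. It assumes gap-free (reference, read) pairs that are already oriented 5′ to 3′; the function name `c_to_t_profile` and the toy alignments are invented for illustration and do not reproduce any published tool.

```python
from collections import defaultdict


def c_to_t_profile(alignments, max_pos=10):
    """Return the C-to-T mismatch frequency for the first max_pos read positions.

    alignments: iterable of (reference, read) string pairs of equal length,
    gap-free and oriented 5'->3' (a simplifying assumption of this sketch).
    """
    c_sites = defaultdict(int)  # reference C observed at read position i
    c_to_t = defaultdict(int)   # ... of which the read shows a T
    for ref, read in alignments:
        for i, (r, q) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if r == "C":
                c_sites[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    # Positions without any reference C are reported as 0.0
    return [c_to_t[i] / c_sites[i] if c_sites[i] else 0.0 for i in range(max_pos)]


# Toy alignments (hypothetical data): two reads carry a C-to-T at the first base
toy = [
    ("CACGTCAGCA", "TACGTCAGCA"),
    ("CCGTACGTCA", "TCGTACGTCA"),
    ("CAGTCCGATC", "CAGTCTGATC"),
    ("CGTACGATCC", "CGTACGATCC"),
]
profile = c_to_t_profile(toy, max_pos=6)
print(profile)
```

    On real data, a profile that stays flat along the read would argue against authentic damage, although the exact shape depends on the library protocol, as noted above.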

    Current approaches involve a direct quantification of the contamination levels based on patterns of mutations observed at loci particularly well characterized in modern populations (representing a panel of possible contamination sources). Haploid loci, such as the mtDNA [49] and the sex chromosomes for males [50,51], make such analyses relatively easy as only one allele is expected at a given site in the absence of contamination. The proportion of alternative alleles found can thus be converted into a measure of contamination levels. For mtDNA, Bayesian statistical approaches have been developed to co-estimate contamination levels and call a sequence consensus deprived of mis-incorporations and other sequencing errors [52]. However, given that the number of mitochondria present in a cell varies across tissues, mtDNA-based contamination levels do not necessarily reflect the contamination levels present in the nuclear DNA [53], which should be gauged through the analysis of sex [51,54] or autosomal [55] chromosomes.
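
    As a rough illustration of how the proportion of alternative alleles at haploid sites translates into a contamination measure, the sketch below pools read counts over diagnostic positions. It is a deliberately naive point estimate, not the Bayesian approach of [52]: it ignores sequencing errors and DNA damage, and it assumes that every contaminant read differs from the endogenous allele at the sites considered. The function name and the counts are hypothetical.

```python
import math


def haploid_contamination(site_counts):
    """Naive contamination estimate from haploid diagnostic sites.

    site_counts: list of (endogenous_allele_reads, other_allele_reads) tuples,
    one per site. Returns (rate, binomial_standard_error).
    """
    endogenous = sum(e for e, _ in site_counts)
    other = sum(o for _, o in site_counts)
    total = endogenous + other
    if total == 0:
        raise ValueError("no reads cover the diagnostic sites")
    rate = other / total
    # Binomial standard error, treating reads as independent draws (assumption)
    se = math.sqrt(rate * (1.0 - rate) / total)
    return rate, se


# Hypothetical counts at three mtDNA positions
counts = [(48, 2), (95, 5), (57, 3)]
rate, se = haploid_contamination(counts)
print(f"contamination ~ {rate:.3f} +/- {se:.3f}")
```

    Because contaminant and endogenous sequences often share the same allele, an estimate of this kind is better read as a lower bound unless the sites are chosen as near-fixed differences.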

    It is important to note that ancient specimens can provide important insights into human evolution, even if extremely contaminated. One way to do so is to stratify the sequence data according to the presence/absence of mis-incorporations driven by DNA damage. This was first proposed by Skoglund et al. [56] and was shown to be sensitive enough to pull out the entire mitochondrial sequence from a Neanderthal DNA extract that was heavily contaminated by modern human DNA [56]. Similar approaches were recently applied to identify a significant proportion of Neanderthal ancestry in the sequence dataset of an early AMH that lived in Romania approximately 45 000 years ago and showed up to 30% of nuclear contamination [57], as well as to authenticate the sequence data recovered from approximately 400 000-year-old archaic hominins excavated from the Sima de los Huesos cave in Spain [58,59].
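
    The idea of stratifying reads by damage can be sketched as a simple filter. The version below is a crude stand-in for the approach cited above, not a reimplementation of it: a read is kept as putatively ancient if it shows a C-to-T mismatch within a few bases of its 5′ end or a G-to-A mismatch near its 3′ end. The G-to-A criterion assumes a double-stranded library protocol, and the window size, function names and toy alignments are all illustrative assumptions.

```python
def has_terminal_damage(ref, read, window=3):
    """True if the read shows a deamination-like mismatch near either end."""
    five_prime = any(
        r == "C" and q == "T" for r, q in zip(ref[:window], read[:window])
    )
    # G-to-A at 3' ends is expected for double-stranded libraries (assumption)
    three_prime = any(
        r == "G" and q == "A" for r, q in zip(ref[-window:], read[-window:])
    )
    return five_prime or three_prime


def split_by_damage(alignments, window=3):
    """Partition gap-free (reference, read) pairs into damaged and undamaged."""
    damaged, undamaged = [], []
    for ref, read in alignments:
        if has_terminal_damage(ref, read, window):
            damaged.append((ref, read))
        else:
            undamaged.append((ref, read))
    return damaged, undamaged


# Hypothetical alignments
toy_reads = [
    ("CATGGA", "TATGGA"),      # C-to-T at the first base
    ("ATCGTG", "ATCGTA"),      # G-to-A at the last base
    ("ACGTAC", "ACGTAC"),      # no mismatch
    ("AAACTGAA", "AAATTGAA"),  # C-to-T, but in the read interior
]
damaged, undamaged = split_by_damage(toy_reads)
print(len(damaged), len(undamaged))
```

    Downstream analyses restricted to the damaged subset trade data volume for a lower contaminant fraction, since undamaged endogenous reads are discarded as well.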

    (f) Current molecular approaches

    The first ancient human genome was characterized from the hairs of an approximately 4000-year-old Palaeo-Eskimo individual [44]. The average depth-of-coverage achieved was approximately 20×, owing to both the extremely limited microbial content of the DNA extract [60] and the sequencing capacity of the Illumina HTS technology. Since then, all ancient hominin genomes sequenced have been almost exclusively generated with the Illumina technology (with a few exceptions [50,61]), which can today generate up to billions of bases per sequencing run (http://www.illumina.com/systems/sequencing.html).

    A few other specimens, such as Neanderthal and Denisovan samples from the Denisova cave [62–64], also benefited from extremely high endogenous DNA content, but such preservation levels are generally exceptional. The petrous portion of the temporal bone, and in particular the inner ear part of the petrous bone, has, however, been recently shown to contain higher proportions of endogenous DNA than other bones [65]. In teeth, cementum contains more endogenous DNA than the dental pulp [66].

    Given the importance of DNA damage patterns for authenticating aDNA, molecular methods have been developed to separate deaminated templates from their undamaged counterparts at the library stage [67]. This strategy builds on the preparation of single-stranded DNA libraries [68]. Since aDNA often contains single-stranded DNA breaks, multiple templates can be formed following heat denaturation (compared to only two for modern DNA contaminants) and be incorporated into the library, which increases the ratio between ancient templates and contaminants. The selection of deaminated cytosines improves this balance even further [67]. This approach can greatly improve the cost-efficiency of sequencing, especially if used in combination with DNA extraction procedures tailored to the ultrashort DNA fragments [69,70] or washing away a fraction of the DNA contaminants [71,72].

    In the case of very low endogenous DNA content or poor preservation, other methods can still be used to retrieve ancient genetic information. All of these methods aim at focusing preferentially on the endogenous fraction preserved in the DNA extract. Leveraging the fact that human and bacterial DNA methylation marks do not occur in similar base compositional contexts, methyl-binding domains (MBDs) targeting human-like methylation marks can be used to separate human from bacterial DNA in aDNA extracts [73]. However, as (i) the deamination rate of methylated epialleles is much faster than that of unmethylated epialleles and (ii) deaminated methylated epialleles do not bind MBDs, this approach is only recommended for the analysis of recent fossil material. Other approaches consist of enriching amplified aDNA libraries using annealing to pre-selected oligonucleotide baits complementary to the regions of interest. A plethora of these DNA capture methods have been applied with success and differ according to the way baits are prepared [74,75], their DNA or RNA nature [76,77], the annealing conditions in solution [78,79] or on a solid phase [80], the annealing temperature [81], etc. The scale of the genomic region targeted can vary, from the characterization of the mtDNA (e.g. [82]) to millions of SNPs spread across the genome (e.g. [83]), whole exomes [84], entire chromosomes [78] or even complete genomes [76,77].

    Finally, and even though DNA damage mis-incorporations are important to establish the authenticity of the data generated, several methods have been developed to limit their impact on downstream analyses, by partially or fully removing deaminated templates prior to sequencing [85,86], or masking them in silico by ignoring transitions [50], trimming read ends [63] or downgrading base quality scores according to their probability of being a damage by-product [87]. Therefore, ancient genomes are not condemned to show high error rates and their quality can even rival that of modern genomes [64].
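
    Two of the in silico masking strategies listed above, trimming read ends and ignoring transitions, are simple enough to sketch directly. The helper names, the number of trimmed bases and the example sequences below are illustrative assumptions rather than settings taken from the cited studies.

```python
# Transitions are the substitution classes that deamination damage mimics
TRANSITIONS = {("C", "T"), ("T", "C"), ("G", "A"), ("A", "G")}


def trim_read_ends(read, n_bases=2):
    """Drop n_bases from both ends of a read, where damage concentrates."""
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    if len(read) <= 2 * n_bases:
        return ""  # nothing informative is left after trimming
    return read[n_bases:len(read) - n_bases]


def transversion_mismatches(ref, read):
    """Positions where a gap-free read differs from the reference by a transversion."""
    positions = []
    for i, (r, q) in enumerate(zip(ref, read)):
        # Skip matches and ambiguous bases such as N
        if r == q or r not in "ACGT" or q not in "ACGT":
            continue
        if (r, q) not in TRANSITIONS:
            positions.append(i)
    return positions


trimmed = trim_read_ends("TACGTCAGCA", n_bases=2)
kept = transversion_mismatches("CAGTN", "TAGAA")
print(trimmed, kept)
```

    Restricting analyses to transversions removes damage-driven errors at the cost of discarding roughly two-thirds of the variable sites, since transitions are the more common class of substitution in human genomes.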

    3. Archaic hominins

    (a) Admixture with Neanderthals

    The debate concerning the possible admixture between Neanderthals and AMHs is old, almost taking root in the early discoveries of Neanderthal remains. Neanderthals represent a distinct group of archaic hominins, defined by specific morphological features, including an occipital bun and a prominent brow ridge [88,89]. They were distributed across the Levant, Europe and East Siberia [35,64] and are present in the fossil record from beyond 230 000 years ago [88] until around 39 000 years ago [90]. AMHs, who left Africa less than ca 55 000 years ago [91] and entered into Europe ca 45 000 years ago [92], might thus have encountered, and become admixed with, Neanderthal groups.

    Early genome scans of modern human populations indicated a number of regions coalescing long before the emergence of AMHs. Since such regions showed no signs of balancing selection, they were proposed to reflect ancestry blocks inherited following admixture between our AMH ancestors and some divergent population of archaic hominins, such as Neanderthals [93]. The sequencing of Neanderthal mitochondrial genes seemed to dismiss this hypothesis [30,94], as Neanderthals formed a divergent monophyletic clade and no Neanderthal haplotype was found within modern humans. However, the non-recombining mtDNA can only trace the genealogical history of a single maternally inherited marker, so the mtDNA tree observed cannot be used to reject the admixture hypothesis [95]. A proper test required sequencing the Neanderthal genome instead.

    The first Neanderthal genome (approx. 1.3×) was drafted using three approximately 38 000-year-old Neanderthal individuals from Croatia (Vindija) [50]. Neanderthals shared more derived polymorphisms with Eurasians than with Africans, which fitted a model where the Neanderthal and AMH populations diverged around 550 000–765 000 years ago (assuming the human mutation rate from [96]) and where Neanderthals and Eurasians admixed after the latter left Africa. An alternative model without admixture assumed that (i) early AMH populations were spatially structured within Africa and (ii) the AMH groups that left Africa shared some ancestry with the ancestors of Neanderthals, who had left Africa hundreds of thousands of years earlier [97]. This alternative model could, however, not explain the size distribution of the genomic blocks of Neanderthal ancestry found in modern humans, which was only compatible with an origin approximately 47 000–65 000 years ago (maximum around 86 000 years ago) [98]. The admixture scenario has since been confirmed through the characterization of two additional Neanderthal genomes from the Altai (approx. 30×), and from Mezmaiskaya, Caucasus (0.5×) [64], showing that 1.5–2.1% of the genome variation present in contemporary humans in Europe and Asia is inherited from Neanderthals [99].
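
    The observation that Neanderthals share more derived alleles with Eurasians than with Africans is commonly summarized with an ABBA-BABA count, often called the D-statistic. The sketch below is a minimal version under stated assumptions: one sampled allele per genome per site, the chimpanzee allele taken as ancestral, and biallelic sites only. The function name and the toy sites are hypothetical, and no assessment of statistical significance is included.

```python
def d_statistic(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    sites: iterable of (african, eurasian, neanderthal, chimp) alleles.
    ABBA: Eurasian and Neanderthal share the derived allele.
    BABA: African and Neanderthal share the derived allele.
    """
    abba = baba = 0
    for african, eurasian, neanderthal, chimp in sites:
        if len({african, eurasian, neanderthal, chimp}) != 2:
            continue  # keep biallelic sites only
        if neanderthal == chimp:
            continue  # Neanderthal carries the ancestral allele: uninformative
        if eurasian == neanderthal and african == chimp:
            abba += 1
        elif african == neanderthal and eurasian == chimp:
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Hypothetical sites: three ABBA, one BABA, two uninformative
toy_sites = [
    ("A", "G", "G", "A"),
    ("A", "G", "G", "A"),
    ("C", "T", "T", "C"),
    ("G", "A", "G", "A"),
    ("A", "A", "G", "A"),
    ("T", "T", "T", "C"),
]
d_value = d_statistic(toy_sites)
print(d_value)
```

    A value near zero is expected when the African and Eurasian genomes are equally related to the Neanderthal; a positive value under this sign convention indicates excess sharing between the Eurasian and the Neanderthal.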

    The shape of the size distribution of Neanderthal genomic blocks found in AMHs is a function of the time elapsed since the admixture event (and also the number of such events). Therefore, Neanderthal blocks should be larger in the genome of early AMHs from Eurasia than in that of modern Eurasians, and their rate of decay by recombination could be used to precisely date the time of admixture(s). Using this principle, and the genome sequences of two Upper Paleolithic AMHs, the Neanderthal admixture could be estimated to be approximately 52 000–58 000 years old [47,100]. Current population models, however, have shown that Neanderthal admixture occurred more than once. First, the approximately 37 000–42 000-year-old AMH from Peştera cu Oase, Romania, was found to show around 6–9% of Neanderthal ancestry [57]. The size of Neanderthal genomic blocks was extremely large, suggesting a Neanderthal ancestor within the past four to six generations (i.e. two centuries). Additionally, tracking Neanderthal alleles in contemporary human individuals [99,101,102] revealed a significantly higher Neanderthal ancestry within contemporary Asians than contemporary Europeans [99,102]. This difference is not compatible with a single admixture pulse from Neanderthals within the ancestral population of modern Europeans and Asians, but requires instead a second pulse within the Asian lineage after it diverged from modern Europeans [99,103,104], i.e. within the past 38 000 years [47].
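
    The link between block size and time since admixture can be made concrete with a back-of-the-envelope calculation. Under a single-pulse model with a small admixture fraction, introgressed tract lengths are approximately exponentially distributed with a mean of 1/t Morgans after t generations, so t can be approximated from the mean tract length. The sketch below uses this textbook approximation only; it is not the estimation procedure of the cited studies, and the generation time of 29 years and the input values are assumptions chosen for illustration.

```python
def generations_since_admixture(mean_tract_cm):
    """Approximate generations since a single admixture pulse.

    Assumes tract lengths follow an exponential distribution with mean
    1/t Morgans (i.e. 100/t centimorgans) after t generations.
    """
    if mean_tract_cm <= 0:
        raise ValueError("mean tract length must be positive")
    return 100.0 / mean_tract_cm


def years_since_admixture(mean_tract_cm, generation_time=29.0):
    """Convert generations to years using an assumed generation time."""
    return generations_since_admixture(mean_tract_cm) * generation_time


# Hypothetical mean tract lengths, in centimorgans
old_pulse_years = years_since_admixture(0.05)
recent_generations = generations_since_admixture(20.0)
print(round(old_pulse_years), recent_generations)
```

    With these invented inputs, short tracts of about 0.05 cM point to an event roughly 2000 generations old, whereas tracts of tens of centimorgans imply an ancestor only a handful of generations back, which is the kind of contrast drawn above for the Oase individual.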

    (b) Admixture with Denisovans

    As a matter of fact, current population models suggest admixture events with more archaic hominins than just the Neanderthals. One such group has been nicknamed the Denisovans, after the analysis of the genome sequence from an Upper Paleolithic finger bone excavated at the Denisova cave (in the Siberian Altai) revealed that a group of archaic hominins other than Neanderthals existed in Eurasia ca 50 000–110 000 years ago [62–64]. Denisovans are known by only one genome sequence, and genome-wide information from two additional specimens [63,105]. The Denisovans' mtDNA is very divergent, and tracks a lineage that split from a population ancestral to both Neanderthals and AMHs approximately 1.0 Ma [105,106]. Their nuclear genome, however, depicts Denisovans as a population that is more related to Neanderthals than to AMHs, and diverged from the Neanderthal lineage approximately 381 000–473 000 years ago [63,64]. All contemporary, non-African peoples carry a similarly small amount (less than 1%) of Denisovan DNA, except for Papuans and other closely related Oceanic/Melanesian populations, who share as much as 2–6% of Denisovan derived mutations [62,63,104]. Simulations explicitly modelling demographic changes indicate a 2.3–3.7% Denisovan contribution by gene flow into modern Papuans and probably 0.1–1.6% into other Asian groups (as represented by Han Chinese) [107]. In addition, the Denisovans probably also received some genetic influx from Neanderthals also present at Denisova cave (1.8% at best) as well as from a still unknown and very divergent group of archaic hominins (0.2–1.2%), probably tracing back to early Eurasian groups of Homo [107]. Finally, some gene flow (0.1–2.1%) occurred from AMHs representing an early African population or the ancestral population of all Africans into Altai Neanderthals [107]. Where and when this gene flow took place cannot be inferred in the absence of a known temporal and spatial map of the Denisovan range. However, this, and recent reports of AMH remains in China as early as 120 000 years ago [108], suggest that AMHs could have left Africa early after the emergence of our own species and interbred with archaic hominins prior to the Out-of-Africa wave approximately 50 000 years ago, which shaped current patterns of worldwide diversity in modern humans.

    (c) From genomes to species-specific traits

    Overall, the Neanderthal and Denisovan genomes have revealed that our own evolutionary history is complex, with multiple episodes of admixture with several groups of archaic hominins. They also provided a fantastic opportunity to track the origin of particular genetic variants along the branches of the hominin tree. In particular, genomic regions showing alleles different from that of the chimp in (almost) all modern humans, but showing the chimp allele in archaic hominins, represent regions that define our own genetic makeup and that were likely important in the process leading to the emergence of our species [80]. Among the ca 31 000 single nucleotide polymorphisms (SNPs) and ca 4000 indels identified, only 96 amino acid substitutions were found to be fixed in modern humans and ca 3000 changes could potentially influence gene expression [64]. Further functional work is required to understand the exact phenotypic consequences of such changes.

    Similarly, finding genomic regions uniquely derived in Neanderthals or Denisovans can reveal the genetic basis underlying archaic traits. Recent surveys of Neanderthal exomes have, for instance, revealed that the Neanderthal lineage accumulated more non-synonymous mutations in genes involved in skeletal morphology, while behavioural and pigmentation genes have changed more along the AMH lineage [84]. In any case, the Neanderthal- and/or Denisovan-specific variants identified, as well as their demographic profile showing very limited effective population sizes [84] (representing a few thousand individuals at best [63,64,107]), start to unveil specific characteristics of archaic hominins.

    4. Reconstructing population models in anatomically modern humans

    (a) The origin of the Palaeo-Eskimos

    Ancient DNA provides access to genetic variation in past populations, much of which may not be accessible in the modern-day genetic pool owing to past demographic processes (e.g. bottleneck and isolation). The first ancient AMH genome was sequenced from a Palaeo-Eskimo, using an approximately 4000-year-old hair sample from northwestern Greenland [44]. Beyond the characterization of some phenotypic traits (e.g. eye and hair colour, adaptation to cold climate), this ancient genome was used to decipher the enigmatic origin of the extinct Saqqaq culture, and its affinities with extant populations. The ancestry analysis provided evidence that Palaeo-Eskimos migrated from Siberia to North America approximately 5500 years ago. The data supported two independent migrations involving first the Palaeo-Eskimos, then the Neo-Eskimos, the ancestors of modern-day Inuit. These findings were later confirmed using additional ancient and modern genomes from the Holarctic, with further evidence of an early gene flow between the Palaeo- and Neo-Eskimo lineages [109].

    (b) The peopling of the Americas

    Since the Saqqaq genome study, several hundreds of ancient AMHs have been analysed for genome-wide sequence data. Statistical tools have been developed to specifically address questions about gene flow and/or population splits [110–112], and the findings have greatly improved, and even challenged, the understanding of our past [7]. The colonization of the American continent by AMHs is one such question where aDNA can provide invaluable insights, mostly because analysis of modern Native American population history is complicated by a bottleneck soon after Columbus' landfall in 1492 [82,113], and the following admixture with Europeans and Africans [114]. The genome from a 24 000-year-old Siberian individual from Mal'ta was found to be basal to present-day Europeans and closely related to Native Americans, revealing some of the European genetic signal in modern-day Native Americans as deriving from a mixed ancestry of the first inhabitants of the Americas [115]. Furthermore, the first ancient human genome from the Americas, that of a Clovis child (Anzick-1) buried 12 600 years ago, supports a pre-Clovis occupation of the Americas [116]. Thus, this genome provides crucial evidence to the debate about who (Clovis or pre-Clovis people) were the first inhabitants of the Americas. Finally, the genome sequence of Kennewick Man (buried around 8500 years ago) is closely related to some North American Native populations, whereas the Anzick-1 genome is related to Central and South American populations, pinpointing an early population structure within the Americas [117], also supported by a large dataset of pre-Columbian mtDNA sequences [82].

    (c) The Neolithic transition in Europe

    Palaeogenomic data have been used to test extreme models of the spread of farming from the Near East during the Neolithic transition in Europe. The cultural diffusion model, in which ideas and technologies spread into Europe, implies a genetic continuity between pre-Neolithic hunter–gatherers and Early Neolithic farmers. The demic diffusion model, in which early farmers migrated, implies a replacement of hunter–gatherers by Early Neolithic farmers as well as ancestral Near-Eastern genetic variants in present-day Europeans. These two models have been tested multiple times using modern and (low resolution) ancient genetic data (see [7] for review). Recent palaeogenomic studies have provided compelling evidence to support a Near-Eastern origin of European Neolithic farmers [79,118], even showing a direct genetic link between the Near-Eastern and European Neolithic farmers via the analysis of genomic data from Anatolian and Aegean individuals buried 8700–6000 years ago [119,120]. In addition, the differential ancestry between pre-Neolithic hunter–gatherer and Early Neolithic farmer genomes demonstrates a clear genetic discontinuity between these two groups [65,79,112,120], supporting a spread of humans, rather than ideas, into Europe.

    (d) The complex European genetic makeup

    Despite the sparse sampling of pre-Neolithic Europeans, inferences about the ancestry of western Eurasians show complex interactions between populations starting during the Upper Palaeolithic [47,83]. Western Eurasians and East Asians diverged outside of Africa between 45 000 [100] and 36 200 years ago [47]. Then, Ice Age western Eurasians formed a meta-population covering a large geographical area between Europe and Central Asia [47,115], and contributed genetic variation to present-day Europeans and the first humans in the Americas [109,115,116] from 37 000 years ago onwards [83]. The analysis of palaeogenomic data from more than 200 western Eurasians who lived between 8000 and 2000 years ago demonstrates that modern European genomic variation can be explained by three main ancestral origins: indigenous Palaeolithic hunter–gatherers, Near-Eastern Neolithic farmers and Bronze Age pastoralists from the Russian steppes [79,118,120,121]. The population transformations occurring during the Bronze Age transition were a large-scale migration pulse from the Russian steppes and the simultaneous admixture of these eastern populations with Middle Neolithic European farmers [79,121].

    5. Adaptation signatures

    (a) Candidate loci

    Adaptation to changing environment, climate, lifestyle or social structure is central to human evolution. Ancient DNA provides a unique means to track genetic determinants of adaptive traits as they emerge in past human populations. Lactase persistence, common in people with European ancestry (but also in some African, Middle Eastern and southern Asian groups), is a dominant Mendelian trait that confers the ability to digest lactose in adults. The derived allele associated with lactase persistence in Europeans (−13 910*T) is absent in early European farmers [122]. Statistical models based on the modern distribution of −13 910 C/T and the absence of the derived allele in Early Neolithic farmers in central Europe suggested that the derived allele was positively selected approximately 7500 years ago in a region between the northern Balkans and Central Europe, and spread as part of the expansion of Early European farmers [123]. However, the statistical models were based on the hypothesis that the derived allele −13 910*T arose first in Early Neolithic farmers and spread from the southeast corner of Europe. The most recent ancient European population studies challenged these results. First, the derived allele was not present before and during the Early Bronze Age, which indicates an onset of positive selection more recently than ca 7500 years ago [65,121]. Second, a large genome-wide study of Early Eurasians led to the identification of a strong selection signal at the genomic site responsible for lactase persistence in Europe [120]. Interestingly, the ancient populations with the highest derived allele frequencies have a Russian steppe ancestry, suggesting that lactase persistence spread in Europe with the arrival of the steppe pastoralists ca 4500 years ago [79].

    (b) Genome-wide scans

    The study by Mathieson and collaborators is the largest aDNA study to date, and includes genome-wide data collected from 67 previously characterized and 163 novel West Eurasian individuals who lived between 8500 and 2300 years ago [120]. It allowed the refinement of previous findings about past population transformations in Europe [79] and, most importantly, the scanning of the genome of ancient West Eurasians for signatures of positive selection. The study reported selection signals at loci associated with diet (SLC22A4, DHCR7, FADS1–2, ATXN2/SH2B3), pigmentation (SLC45A2, GRM5, HERC2/OCA2) and immunity (TLR1-6-10, major histocompatibility complex on chromosome 6), in addition to lactase persistence [120]. Interestingly, selection on complex traits such as height, body mass index, waist-to-hip ratio, type 2 diabetes and inflammatory bowel disease was also tested. Only height showed evidence of selection in ancient Eurasians [120]. Given the trend towards increasing sample sizes and the optimization of molecular techniques, it is likely that statistical approaches for detecting selection will become increasingly robust in the near future. Ultimately, we envision that association studies of traits with a medical relevance will incorporate the dimension of time, and the concomitant evolutionary perspective, offered by aDNA.

    (c) Adaptive introgression

    Beyond selection of genetic variation that arose specifically along the AMH branch, a growing number of studies recently showed that admixture with archaic hominins has provided genetic variants that facilitated humans' adaptation to their environment (see [124] for a review). One such example is the Tibetans' adaptation to high altitude hypoxic conditions [125–130], where the derived allele of EPAS1 (which is associated with low haemoglobin levels at high altitude, decreased blood viscosity and a reduced risk of cardiac complications) likely derives from Denisovans or Denisovan-like groups [131,132].

    The Tibetans' admixture-mediated adaptation to high altitude is not anecdotal. Several alleles found in the modern human genome were strongly selected after gene flow from archaic hominins into AMH, including loci associated with the immune system [104,133–136] and metabolism [104]. Additionally, genomic blocks of Neanderthal ancestry appear to be enriched in genes affecting keratin filaments [99,102], suggesting that Neanderthals might have provided AMHs in Eurasia with adaptive skin phenotypes. Interestingly, a recent study has investigated the genetic association between Neanderthal variants and a large electronic database of medical phenotypes in approximately 28 000 European adults [137]. At approximately 135 000 loci where Neanderthal SNP variants could be unambiguously identified, the study confirmed a higher association of Neanderthal SNPs with skin phenotypes (corns and callosities) but also with the susceptibility to develop myocardial infarction, depression, coronary atherosclerosis and obesity. It is thus likely that introgressed archaic variants that provided an advantage to early AMHs who left Africa have now become maladaptive in the context of the modern Western lifestyle.

    The genetic legacy of Neanderthals and Denisovans in the modern human genome is not random. In addition to showing signatures of adaptive admixture, genomic maps of archaic ancestry also show deserts depleted of archaic introgression [99,102–104]. Typically, the X-chromosome, which contains many male hybrid sterility genes, shows an approximate fivefold reduction in Neanderthal ancestry. Additionally, gene-rich regions and genes highly expressed in testes also tend to show limited Neanderthal ancestry. This probably results from deleterious epistatic interactions between AMH and archaic alleles, reducing fertility in male hybrids [102]. It is also noteworthy that the increased deleterious load in Neanderthals probably resulted in a significant drop in their fitness, which accelerated the elimination of Neanderthal alleles when placed in an AMH genomic background [138].

    6. Perspectives

    Over about 30 years, aDNA research has moved from the characterization of short stretches of DNA to the sequencing of complete genomes [8]. At the time these lines are written, several hundreds of ancient AMHs and half a dozen archaic hominins have provided genome-wide sequence information [56–58,64,76,79,80,83,84,100,109,112,120,139–143] and/or whole-genome sequences [44,47,50,51,61–65,115–119,121,141,144–151]. Even though the history of aDNA applied to ancient humans started with the analysis of Egyptian mummies, no genome information has yet been recovered from such specimens. By contrast, the first 2 megabases from approximately 400 000-year-old archaic hominins have been recently characterized [58], paving the way for further genetic analyses of ancient hominins in the Middle Pleistocene. With molecular methods tailor-made to the biochemical features of aDNA, it is likely that large-scale genetic information could be retrieved from environments not favourable to preservation of DNA (such as tropical areas, or hot and dry deserts) but where hominin remains that are essential for the understanding of our own evolutionary origin have been excavated.

    The first maps presenting how the AMH genetic variation was distributed at key historical periods have now been drafted. Combined with archaeology, history and linguistics, these promise to bring our understanding of past populations to a brand new level. We can also start to better comprehend ancient individuals' phenotypes by exploiting the genetic information present at key functional loci, even for traits that do not fossilize such as skin and eye colour [120] or lactase persistence [120,121]. This probably exceeds by far the expectations of even the most enthusiastic aDNA researchers five years ago.

    Importantly, pioneering work aiming at detecting ancient epigenetic signatures has shown that beyond the DNA alone, DNA methylation marks and the whole compaction state of the chromatin can be preserved [152,153]. Although direct approaches based on bisulfite sequencing [154,155] and enrichment of methylated epialleles [73] have been used, the most successful approach currently available leverages aDNA damage patterns to computationally derive genome-wide cytosine methylation and nucleosome maps [8,152,156]. Interestingly, lifestyle and socio-cultural differences are known to influence such epigenetic signatures in contemporary human groups [157]. Their persistence in aDNA molecules paves the way for an investigation of how epigenetic profiles might have been modified during major human evolutionary transitions, such as the Neolithic and the Industrial Revolution [158].
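
    The damage-based approach mentioned above can be illustrated with a simplified sketch. It rests on an assumption not spelled out in this review: post-mortem deamination converts methylated cytosines into thymines, whereas deaminated unmethylated cytosines become uracils that can be enzymatically removed during library preparation, so that the residual C-to-T mismatch rate at CpG positions serves as a proxy for ancient methylation. The data structure, function name and window size below are hypothetical, and the published methods [8,152,156] involve additional modelling that is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class CpGCounts:
    """Read counts at one reference CpG cytosine (hypothetical input format)."""
    position: int  # genomic coordinate of the cytosine
    c_reads: int   # reads still showing C
    t_reads: int   # reads showing T (candidate deaminated 5-methylcytosine)


def windowed_deamination_rate(sites, window_size):
    """Pool CpG sites into fixed windows and return {window_start: T / (C + T)}.

    Pooling is needed because per-site coverage in aDNA data is usually
    too low to estimate a rate at single-CpG resolution.
    """
    totals = {}
    for site in sites:
        start = (site.position // window_size) * window_size
        c, t = totals.get(start, (0, 0))
        totals[start] = (c + site.c_reads, t + site.t_reads)
    return {
        start: t / (c + t)
        for start, (c, t) in sorted(totals.items())
        if c + t > 0  # skip windows without any informative read
    }


if __name__ == "__main__":
    example_sites = [
        CpGCounts(position=100, c_reads=9, t_reads=1),
        CpGCounts(position=400, c_reads=8, t_reads=2),
        CpGCounts(position=1200, c_reads=10, t_reads=0),
    ]
    for start, rate in windowed_deamination_rate(example_sites, 1000).items():
        print(start, round(rate, 3))
```

    In this sketch, windows with a higher pooled rate would be interpreted as more heavily methylated regions, subject to the stated assumption about library treatment.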

    Even though we focused in this review on ancient human DNA, human fossils can also provide a wealth of additional information about the past. For instance, the DNA from ancient pathogens can help identify the aetiological agents of major historical epidemics, such as the Black Death [159] or the Justinian Plague [160]. In addition to providing invaluable calibration for bacterial molecular clocks [161], such work can also sometimes revolutionize current views on the origins of human diseases. For example, the plague is only described in the historical record from the fifth century BCE (the Plague of Athens) but has been found to have afflicted human populations as early as the Bronze Age, some 5000 years ago [162]. Beyond traces of pathogens, some archaeological material such as dental calculus (i.e. calcified plaque) preserves the genetic composition of ancient microbial communities [163,164], paving the way for studies exploring how the human microbiome changed over time. More prosaically, the genetic material that can be extracted from dental calculus as well as coprolites (fossilized faeces) [165–168] can provide information about past diet, sometimes enabling the identification of the source of meat and plants at the species level and advantageously complementing classical isotopic analyses.

    The methodology presented herein is equally applied to animal and plant remains to provide additional important insights into human evolution, such as how past human groups transported and exchanged crops [169] and livestock [170], but also which animal [171] or plant [169] characters were preferred in different socio-cultural contexts and even represented early selection targets during domestication [172–174]. Similarly, the genetic information present in the fossil record of wild species can be used to track past demographic trajectories and test whether their populations expanded or collapsed in the face of major climate changes and/or human activities [175–177]. Together with the analysis of environmental aDNA preserved in sediments [178], this can reveal when and to what extent humans started to become a major evolutionary force, driving species to extinction [179], transporting commensals overseas [180] and even deeply transforming the plant and animal communities present in their environment [181].

    PCR has been a very useful technique for decades in aDNA research, but is now completely superseded by more powerful technologies. We believe that the foreseeable future of aDNA research relies on the systematic use of HTS methods, or any not-yet-developed technique that will optimize the quality and quantity of output genetic information from as little bioarchaeological material as possible. The detection and handling of exogenous DNA contaminants in aDNA datasets should also remain of paramount importance. Finally, aDNA researchers should not blindly aim at generating ‘big data’, and should instead keep anthropological, archaeological, palaeontological and ecological questions in mind. Only then will aDNA research remain fascinating for the general public and attract more talented aspiring researchers.

    Competing interests

    We declare we have no competing interests.

    Funding

    This work was supported by the Danish Council for Independent Research, Natural Sciences (Grant 4002-00152B); the Danish National Research Foundation (Grant DNRF94); the Villum Fonden (Grant miGENEPI); and the 'Chaires d'Attractivité 2014' IDEX, University of Toulouse, France (OURASI). B.L. is funded by the Australian Research Council.

    Footnotes

    One contribution of 17 to a theme issue ‘Evo-devo in the genomics era, and the origins of morphological diversity’.

    Published by the Royal Society. All rights reserved.

    References