Evolutionary history of enigmatic bears in the Tibetan Plateau–Himalaya region and the identity of the yeti

Although anecdotally associated with local bears (Ursus arctos and U. thibetanus), the exact identity of ‘hominid’-like creatures important to folklore and mythology in the Tibetan Plateau–Himalaya region is still surrounded by mystery. Recently, two purported yeti samples from the Himalayas showed genetic affinity with an ancient polar bear, suggesting they may be from previously unrecognized, possibly hybrid, bear species, but this preliminary finding has been under question. We conducted a comprehensive genetic survey of field-collected and museum specimens to explore their identity and ultimately infer the evolutionary history of bears in the region. Phylogenetic analyses of mitochondrial DNA sequences determined clade affinities of the purported yeti samples in this study, strongly supporting the biological basis of the yeti legend to be local, extant bears. Complete mitochondrial genomes were assembled for Himalayan brown bear (U. a. isabellinus) and black bear (U. t. laniger) for the first time. Our results demonstrate that the Himalayan brown bear is one of the first-branching clades within the brown bear lineage, while Tibetan brown bears diverged much later. The estimated times of divergence of the Tibetan Plateau and Himalayan bear lineages overlap with Middle to Late Pleistocene glaciation events, suggesting that extant bears in the region are likely descendants of populations that survived in local refugia during the Pleistocene glaciations.


Introduction
The Tibetan Plateau, the most extensive and highest plateau in the world with an average altitude of 4500 m above sea level, is partly surrounded by the Himalayan range and many of Earth's highest mountains. Dramatic environmental changes caused by the uplift of the plateau and climatic oscillations during the Quaternary glaciations substantially impacted the evolution, diversification, and distribution of local plant and animal species [1]. Because of its heterogeneous habitat and topography, the region sustains a distinct biome with rich biological diversity and high level of endemism [2]. Extant plants and animals on the plateau are likely either descendants of relict colonists that migrated from other areas or recently derived endemic species [3][4][5][6][7][8][9][10]. However, the colonization and population expansion history of many species remains poorly understood, despite current and future impacts of climate change and anthropogenic threats to diversity loss.
Two brown bear subspecies, the Himalayan (Ursus arctos isabellinus) and the Tibetan (U. a. pruinosus) brown bear, inhabit the northwestern Himalayan region and southeastern Tibetan Plateau, respectively [11][12][13][14] (figure 1). These two subspecies have distinct skull features and the Himalayan brown bear is characterized by its paler and reddish-brown fur, while the Tibetan brown bear has generally darker fur with a developed, white 'collar' around the neck [11]. As the most widely distributed bear in the world, phylogeography of the brown bear has been well studied in North America, Europe and Japan [10,[16][17][18][19][20][21][22][23][24]. However, due to limited sampling, very few studies have been conducted on these enigmatic subspecies. Two complete mitochondrial genomes (mitogenomes) from captive Tibetan brown bears are available, while only two short fragments of mitochondrial DNA (mtDNA) sequences from the Himalayan brown bear have been published [10,15]. Phylogenetic analyses based on these sequences suggested that the Tibetan brown bear might be a relict population of the Eurasian brown bear [10], and that the Himalayan brown bear, which is genetically distinct from the Tibetan brown bear, may represent a more ancient lineage [15]. However, phylogenetic relationships deduced from limited genetic data and number of individuals have put these preliminary findings into question. For example, the phylogenetic placement of a Gobi brown bear (U. a. gobiensis) sequence [25] was inconsistent with a later study also including sequences from Himalayan brown bear [15], and phylogenetic trees based on mtDNA control region and cytochrome b sequences, respectively, of the Tibetan brown bear were incongruent [26]. The other bear species found to inhabit the Tibetan Plateau-Himalaya region is the Asian black bear (U. thibetanus), which historically had a continuous distribution from southeastern Iran through Afghanistan and Pakistan to India, Nepal, China, Korea, Japan, and south into Myanmar and the Malayan peninsula [12,27,28]. Today it occupies a patchy distribution throughout its historic range, including across a narrow band from Pakistan, Kashmir and to Bhutan, the home range of the Himalayan black bear (U. t. laniger) [27,29], which was described as distinguished from other black bear populations by its longer, thicker fur and smaller, whiter chest mark [11]. Although the range of Asian black bear overlaps with brown bear in the Tibetan Plateau-Himalaya region, it is mostly found at lower altitudes in forested hills ranging from 1200 to 3300 m [12,29]. So far, little is known about the evolutionary history of black bear in the region and no sequence data are available from the Himalayan black bear. To elucidate the evolutionary and migration history of the Himalayan and Tibetan bears, more genetic data from additional individuals are critically needed.
It has been reported that the brown bear populations in the Tibetan Plateau -Himalaya region have declined by more than half in the past century because of habitat loss, fragmentation, poaching and intense hunting by humans [12,29 -31]. Facing the same threats as brown bears, Asian black bear populations have also decreased in the past few decades [29,32,33]. The Himalayan brown bear is listed in the IUCN (International Union for the Conservation of Nature) red list of threatened species as critically endangered [34], while the Asian black bear is listed as vulnerable [27]. Hence, clarifying population structure and genetic diversity for conservation management purposes is also urgently needed for these endangered bear species.
The Tibetan Plateau-Himalaya region is also known for the legend of purported 'hominid'-like creatures, referred to  Figure 1. Distribution of Himalayan and Tibetan brown bear and localities of samples studied. Red and blue lines outline the approximate historical range of the Himalayan brown bear and the Tibetan brown bear, respectively (redrawn from Galbreath et al. [15]). The triangles, diamonds and circles, respectively, indicate the approximate collecting localities of the studied samples associated with Asian black bear, Tibetan brown bear and Himalayan brown bear. as the 'yeti', 'chemo', 'mheti' or 'bharmando', among other regional monikers (for simplicity they are referred to in this paper as yeti). Despite decades of research and anecdotal association with bears and other mammals in the region [35,36], the species identity of the mysterious yeti is still debated, given the lack of conclusive evidence. A survey of hair samples attributed to yeti and other anomalous, supposed primates, was recently conducted to identify their genetic affinities [37]. Based on a short fragment of the mtDNA 12S rRNA gene from two samples collected in Ladakh, India and Bhutan, respectively, and a 100% match to a sequence recovered from a subfossil polar bear [38], Sykes et al. [37] speculated that an unclassified bear species or hybrid of polar bear and brown bear might be present in the Tibetan Plateau-Himalaya region. However, this speculation was critiqued by others [39,40], and their phylogenetic analyses using the sequences from Sykes et al. and other available Ursidae sequences did not rule out the possibility that the samples belonged to brown bear. Thus, to get accurate species identification, comprehensive phylogenetic analyses using genetic information from more variable and informative loci are needed.
Here, we report on new analyses of 24 field-collected and museum specimens, including hair, bone, skin and faecal samples, collected from bears or purported yetis in the Tibetan Plateau-Himalaya region. Based on both amplified mtDNA loci as well as complete mitogenomes, we reconstructed maternal phylogenies to increase knowledge about the phylogenetic relationships and evolutionary history of Himalayan and Tibetan bears.

Material and methods (a) Samples
A total of 24 samples, including hair, tissue, bone and faeces, were analysed in this study (electronic supplementary material, table S1). Of these, 12 samples had been collected for a previous analysis of Himalayan brown bear in the Khunjerab National Park, Northern Pakistan [30], two samples were from purported Himalayan brown bears housed in the Lahore and Islamabad Zoos, one bone sample (M-70448) recorded as U. a. pruinosus was obtained from the American Museum of Natural History, and nine samples were provided to us by the Reinhold Messner Museum and the Icon Film Company.

(b) DNA extraction
Genomic DNA from 12 faecal samples collected in the Khunjerab National Park, Northern Pakistan [41], were previously extracted using the QIAmp DNA Stool Kit (Qiagen, USA) in a room dedicated to processing hairs and faeces [30]. DNA from two ethanol-preserved hair samples from Lahore and Islamabad Zoos were isolated in a room dedicated to nucleic acid extraction from modern samples. A DNeasy Blood & Tissue DNA Kit (Qiagen, USA) was used according to the manufacturer's protocol, except for the following modifications to optimize extraction of DNA from hair: 10 strands of hair from each sample were cut into fragments of approximately 0.5 cm with a sterile razor blade. Ethanol was allowed to evaporate (approx. 1 h), and hair fragments were transferred to a microcentrifuge tube. Three hundred microlitres of ATL buffer, 20 ml proteinase K, 20 ml 1M DTT (dithiothreitol) and 4 ml RNase A were added, and samples were incubated at 568C overnight until completely lysed. A negative control was prepared alongside each hair sample. Following lysis, 300 ml AL buffer and 300 ml 100% EtOH were added to each sample, and the mixture was pipetted into the DNeasy Mini Spin Column and centrifuged for 2 min. DNA was eluted twice with 50 ml AE buffer for a total elution volume of 100 ml. The remaining 10 samples, which had not been intentionally preserved for later extraction of DNA, were regarded as non-modern (ancient) samples, and thus DNA extractions and pre-amplifications were performed in a dedicated state-of-the-art cleanroom facility, physically separated from any modern DNA laboratory and appropriate for ancient DNA research. The following protocols designed for ancient DNA extraction were used: for bone samples, 50-100 mg fine bone powder was obtained from each sample by using a dental drill (HKM Surgical Handpiece, Pearson Dental, USA), and 50-100 mg skin samples were sliced into approximately 1 mm pieces with a sterile razor blade. DNA from the bone powder and the sliced skin samples was extracted using the protocol in Dabney et al. [42]. DNA from the hair samples were extracted using the protocol provided by Gilbert et al. [43] with the following modifications: 1 ml digestion buffer was used for each hair extraction. After purification with phenol and chloroform, additional purification was performed using Qiagen MinElute PCR Purification Kit (Qiagen, USA). Finally, a 12.5 ml EB buffer elution step was performed twice to obtain a total elution volume of 25 ml. DNA from approximately 100 mg faecal samples was extracted using the QIAmp DNA Stool Kit (Qiagen, USA). The final elution step was also performed twice to obtain a total volume of 100 ml. Negative controls were prepared alongside all extractions.
(c) PCR amplification PCR amplifications from modern DNA were performed in a 25 ml reaction volume each containing 2.5 ml of 10 Â PCR buffer (Applied Biosystems, USA), 1.0 ml of dNTP mixture (2.5 mM each dNTP; Applied Biosystems), 2.5 ml of MgCl 2 (25 mM, Applied Biosystems), 0.1 ml of Taq DNA polymerase (5-10 U ml 21 ; Applied Biosystems, AmpliTaq Gold), 1 ml each of the forward and reverse primers (10 mM), 2 ml of the genomic DNA and 17.4 ml of H 2 O. The PCR reaction mix for ancient DNAs was prepared in the cleanroom by adding 21 ml H 2 O, 1 ml of each forward and reverse primer, and 2 ml genomic DNA to each GE illustra PuReTaq Ready-To-Go PCR bead (GE Healthcare, USA). A touchdown thermal cycling protocol was used as follows: 10 min at 948C, 10 cycles of 30 s at 948C, 30 s annealing with the temperature decreasing every cycle by 0.58C from 558C to 508C, and 30 s extension at 728C, followed by 25 cycles the annealing temperature set to 508C and denaturation and extension phases as above. For samples of unknown identity, two sets of mtDNA 12S rRNA primers [44,45] were used to determine their approximate taxonomic affinity. Bear-specific primers targeting the mtDNA control region and cytochrome b ( [46] and primers designed for this study; see electronic supplementary material, table S2) were used for samples identified as ursid bears. PCR products were Sanger sequenced directly using the same primers as in the PCR.  (e) Mitochondrial genome assembly Assembly of mitochondrial genomes was performed using the following strategy: species-specific mitochondrial reference genomes were selected from initial species identification based on phylogenetic analyses of amplicon sequences (results not shown). All Ion Torrent reads were first aligned against the reference genome using BWA aln (v. 0.7.13) [47] using the default parameters, except for the parameter '-l 1024' to disable the seed and increase high-quality hits for the damaged ancient DNA reads [48]. The remaining unmapped reads were then aligned against the same reference using BWA mem with default parameters (see electronic supplementary material, table S3, for assembly statistics). We filtered for human contamination by applying an edit-distance based strategy [48]. All reads were mapped to a human mitochondrial genome reference (NCBI accession J01415.2) using the same BWA mapping method described above. Reads with a higher mapping edit-distance to human mtDNA than to bear mitochondrial genomes were considered of likely human origin and were removed from the bear mitogenome mapping results. PCR duplicates were removed with the MarkDuplicates tool in the Picard software suite v. 1.112 using lenient validation stringency (http://broadinstitute.github.io/picard/). Consensus calling was carried out using Samtools mpileup [49] with default settings.

(f ) Phylogenetic analyses
Complete mitochondrial genomes, partial control region sequences, and cytochrome b sequences for 11 Asian black bears, 76 American black bears, two cave bears (U. spelaeus), 200 brown bears, and 52 polar bears were obtained from GenBank (electronic supplementary material, table S4). Two GenBank datasets were created: one dataset included only complete mitogenomes for the non-Tibetan/ Himalayan bears and both partial (amplicon sequences) and complete mitogenomes for Tibetan and Himalayan bear lineages, while the other dataset included both amplicon sequences and complete mitogenomes for non-Tibetan/Himalayan bears. All new sequences produced in this study were added to these two GenBank datasets and used in the phylogenetic analyses. Sloth bear (U. ursinus) and sun bear (U. malayanus) sequences were included to root the trees (electronic supplementary material, table S4). Alignments were generated using MAFFT [50] followed by manual adjustment in BioEdit [51] to exclude the variable number tandem repeats of the D-loop. The total length of the final alignment was 16 412 bp. Maximum-likelihood (ML) phylogenetic analyses were performed using RAxML-HPC BlackBox v. 8.2.8 [52] in the CIPRES Science Gateway under the GTR substitution model, which was identified as the best-supported model by jmodeltest2 [53,54]. A total of 1000 bootstrap replicates were conducted to evaluate branch support. Bayesian inference (BI) phylogenetic analyses were carried out using MrBayes v. 3.2.6 [55] in two runs of 5 000 000 Markov chain Monte Carlo (MCMC) generations, with trees for estimation of the posterior probability distribution sampled every 100 generations. The best-fit substitution model was determined by the program by setting Nst¼mixed; 500 000 trees were discarded as burn-in.

(g) Divergence time estimation
Bayesian MCMC-based divergence time estimation was carried out using BEAST version 1.8.0 under the GTR substitution model. The dataset used for molecular dating analysis included only complete mitogenome, since shorter mtDNA regions (e.g. control region and cytochrome b) are generally associated with considerable uncertainty and may bias molecular dating analyses due to homoplasy [10,17]. The uncorrelated lognormal relaxed clock and the constant size coalescent prior were used. Radiocarbon dates and stratigraphically estimated dates for four ancient sequences were used to calibrate ages for terminal nodes, including three sequences from extinct bear species (U. spelaeus and U. deningeri) dated to 31.8 thousand years (ka) BP [56], 44.1 ka BP [57], and 409 ka BP [42], an approximately 120 ka BP polar bear subfossil [38], and seven European brown bears dated to approximately 4.1-37 ka BP [58]. Trees were sampled every 1000 generations from a total of 1 000 000 000 generations. The maximum clade credibility tree was generated using TreeAnnotator, implemented in the BEAST package [59], with 10% burn-in. Effective sampling size value greater than 200 for all parameters sampled from the MCMC and the posterior distributions were examined using Tracer v. 1.6 [60].

Results (a) Identity and phylogenetic placement of the Tibetan Plateau -Himalayan samples
Except for one tooth sample collected from a stuffed exhibit at the Reinhold Messner Mountain Museum, which BLASTmatched dog (Canis lupus familiaris), all other samples were identified as ursid bears. ML tree reconstruction based on amplicon and mitogenome sequences (electronic supplementary material, figure S1)

Discussion (a) Phylogenetic placement and evolutionary history of Himalayan and Tibetan brown bears
Few genetic studies have been conducted of bears in the Tibetan Plateau and surrounding Himalaya region, and their evolutionary history remains enigmatic. Particularly little is known about the Himalayan brown bear (U. a. isabellinus).
First, Masuda et al. [25] reported a 269 bp mtDNA control region sequence from a Gobi bear collected from the Great Gobi National Park in Mongolia, and suggested it was more closely related to Western European brown bears based on a neighbour-joining phylogenetic analysis. Later, Galbreath et al. [15] investigated homologous DNA fragments from two brown bears collected from the Deosai Plains of the western Himalayas. Their analyses demonstrated that the two Himalayan brown bears grouped together with the Gobi bear, confirming a close relationship between these two populations and a clear separation from European and Tibetan brown bears. Our results, providing more data and better resolution, demonstrate that the Himalayan brown bears, including the previously reported Gobi bear and Deosai bears, form a wellsupported, sister lineage to all other extant brown bear clades included here. This result strongly supports Himalayan brown bears as a relict population that diverged early from other brown bear populations. The phylogenetic position of Tibetan brown bears (U. a. pruinosus), which form a sister clade to North American and Eurasian brown bears consistent with previous reports [10,[17][18][19]25], indicates that the Tibetan and other Eurasian brown bears, as well as North American brown bears, are all descendants of a common ancestral lineage. It was proposed that the Tibetan brown bears migrated to the Tibetan Plateau from its source population-ancestral Eurasian brown bears-approximately 343 ka BP, and that they remained geographically isolated from this source population thereafter [10]. Our phylogenetic analyses strongly support this migration scenario.
In our study, brown bear samples collected in the northwestern to western Himalayas were all identified as Himalayan brown bear, while the ones collected in the southeastern Himalayas and Tibetan Plateau were all identified as Tibetan brown bear (figure 1). The historical range of the Himalayan brown bear extends from the north and west of the Taklimakan Desert to the western Himalayas, while the historical range of the Tibetan brown bear lies in the Tibetan Plateau and the southeastern Himalayas [15]. While the Tibetan brown bears share a common ancestry with extant North American and Eurasian brown bears, the Himalayan brown bear appears to have originated from an ancient lineage that experienced long isolation in the mountains of central Asia, at least over the last 658 ka. Although the habitats of the two brown bear subspecies are geographically close, the high-altitude peaks of the Himalayan Mountains have likely impeded migration between these populations, and subsequently kept them as genetically distinct lineages.
(b) Phylogenetic placement and evolutionary history of the Himalayan black bear The phylogenetic topology of Asian black bears is in agreement with a previous finding [61], except here we also include the rare Himalayan black bear (U.

(c) Quaternary climatic oscillations and divergence of local bear lineages in the Tibetan Plateau-Himalaya region
The Tibetan Plateau is one of the youngest plateaus on Earth, created by the collision of the Indian subcontinent with the Eurasian continental plate in early Cenozoic times, followed by diachronous and extensive surface uplifts in the Miocene and even into the Pleistocene [62,63]. Although the dates and details of the uplifts have long been debated, many studies indicate they caused dramatic climatic changes and topographic variation, which facilitated the introduction and evolution of new plant and animal clades and greatly influenced the current spatial distribution of local species and their genetic diversity [64]. The Pleistocene glaciations of the Tibetan Plateau, which is closely related to the progressive uplift of the plateau and the surrounding Himalayan Mountains, have been suggested to have had a highly complex pattern, occurring asynchronously with the Northern Hemisphere glaciation events [65]. Four Pleistocene glaciations have been described in several geological and geographical studies [66][67][68] changes in environmental conditions from cold and arid to warm and wet during the great interglacial period (500-300 ka BP) [68]. Both the divergence of the Himalayan black bear at around 475 ka BP and the Tibetan brown bear at around 342 ka BP overlap with this interglacial period, indicating that ancestors of these bear lineages migrated from lower altitudes to higher altitude locales after glaciers retreated. Subsequently, these populations may have diverged from lower altitude populations due to isolation in the high mountains and the following Guxiang glaciation event. Phylogeographic studies of many Tibetan plant and animal species indicate that local extant plant and animal populations, which mainly derived from colonists migrating from other areas or represent endemic species that diverged recently [3][4][5][6][7][8][9][10], experienced extensive oscillations and survived through glacial periods in multiple refugia or microrefugia on the plateau [1,65,[70][71][72][73][74][75].
Similarly, we speculate that ancestral bear lineages on the Tibetan Plateau and Himalayan Mountains likely immigrated to the region from nearby Asian locales. These ancestral lineages then likely experienced extensive population oscillations caused by local climatic changes and diverged from other bear populations in refugia during the Pleistocene glaciations.

Conclusion
Samples collected in the field and archived in museum or private collections can significantly aid in our understanding of the genetic variation and phylogeographic patterns of rare and widespread species. To determine accurate species identification and clade affinity, however, phylogenetically informative genetic markers and appropriate phylogenetic analyses are critically needed. Based on a BLAST search using a 104 bp fragment of the mitochondrial 12S rRNA locus, which gave a 100% match to a complete mitogenome recovered from a subfossil polar bear [38], Sykes et al. [37] suggested that a previously unrecognized bear species or possibly a hybrid between brown bear and polar bear exists in the Himalayas. However, as also demonstrated by others [39,40], the short 12S rRNA gene fragment is insufficiently informative to determine precise taxonomic identity, particularly among closely related species, although it can be a useful screening marker to assess preliminary species affinities. We isolated DNA and assembled a complete mitogenome from a hair sample (collected in Ladakh, India, and named 'YHB' in this study), which based on their shared collection locality and other anecdotal evidence obtained from Icon Films, our sample source, may come from the same specimen that Sykes et al. [37] speculated represents an unknown or hybrid bear.
Here, we unambiguously show that this sample is from a bear that groups with extant Himalayan brown bear. Similarly, we were able to determine the clade affinities of all other purported yeti samples in this study and infer their well-supported and resolved phylogenetic relationships among extant bears in the Tibetan Plateau and surrounding Himalayan Mountains. This study represents the most rigorous analysis to date of samples suspected to derive from anomalous or mythical 'hominid'-like creatures, strongly suggesting that the biological basis of the yeti legend is local brown and black bears.
Data accessibility. The DNA sequences generated in this study are deposited in the NCBI database under accession nos MG066702-MG066705 and MG131869 -MG131905.