Barcoding the largest animals on Earth: ongoing challenges and molecular solutions in the taxonomic identification of ancient cetaceans

Over the last few centuries, many cetacean species have witnessed dramatic global declines due to industrial overharvesting and other anthropogenic influences, and thus are key targets for conservation. Whale bones recovered from archaeological and palaeontological contexts can provide essential baseline information on the past geographical distribution and abundance of species required for developing informed conservation policies. Here we review the challenges with identifying whale bones through traditional anatomical methods, as well as the opportunities provided by new molecular analyses. Through a case study focused on the North Sea, we demonstrate how the utility of this (pre)historic data is currently limited by a lack of accurate taxonomic information for the majority of ancient cetacean remains. We then discuss current opportunities presented by molecular identification methods such as DNA barcoding and collagen peptide mass fingerprinting (zooarchaeology by mass spectrometry), and highlight the importance of molecular identifications in assessing ancient species’ distributions through a case study focused on the Mediterranean. We conclude by considering high-throughput molecular approaches such as hybridization capture followed by next-generation sequencing as cost-effective approaches for enhancing the ecological informativeness of these ancient sample sets. This article is part of the themed issue ‘From DNA barcodes to biomes’.

Biomolecular analysis was applied to 17 cetacean bones recovered from seven archaeological sites: Saint Martin, Cougourlude and Saint Sauveur on the southern coast of France [1,2]; Nuraghe Lu Brandali, Porto Torres and Villa Sant'Imbenia in Sardinia; and San Rocchino in Tuscany, Italy [3]. Based on previous morphological analysis, four of these samples, all from Saint Sauveur, were presumed to represent possible gray whale remains [4], while the others could not be confidently assigned to species.

DNA sample preparation, extraction and amplification
The ancient whale samples were prepared and processed for DNA extraction in the Ancient DNA laboratory at the University of York, following strict protocols for contamination control and detection, including positive pressure, the use of protective clothing, UV sources for workspace decontamination, and laminar flow hoods for extraction and PCR set-up. Fragments of bone were immersed in 6% sodium hypochlorite for 5 min, rinsed twice in HPLC-grade water, UV irradiated for 30 min on two sides, and ground into powder. DNA was extracted from 20-55 mg of bone powder using a silica spin column protocol [5] as modified in Dabney et al. [6], and was eluted in 50 µl. PCR amplifications targeted a 182 bp fragment of the mitochondrial cytochrome b gene, which has been demonstrated to successfully distinguish cetacean species [7,8]. PCR reactions and cycling conditions followed those described in Speller et al. [9]; successfully amplified products were sequenced using the forward primer at Eurofins Genomics, Ebersberg, Germany.

mtDNA sequence analysis and species identifications
ChromasPro software (www.technelysium.com.au) was used to visually analyse and edit the sequences and to trim primer sequences. Sequences were compared with published references through the GenBank BLAST application (http://www.ncbi.nlm.nih.gov/BLAST/), and multiple alignments of ancient and published reference sequences were conducted using ClustalW [10] through BioEdit (http://www.mbio.ncsu.edu/BioEdit). A species identification was assigned to a sample only if its sequence was identical to published reference sequences from a single species in GenBank; species identities were further confirmed through 'DNA Surveillance', a web-based programme which provides robust cetacean identifications based on comparisons with a comprehensive set of validated cetacean reference sequences [11]. Twelve sequences were uploaded to the Genetic Sequence Database at the National Center for Biotechnology Information (NCBI) (GenBank ID: KT923090-KT923101).
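The assignment rule described above (a species name is given only when the query is identical to reference sequences from a single species) can be illustrated with a minimal sketch. The function name, the toy sequences and the species labels below are hypothetical and are not taken from the study; the sketch also assumes that query and reference sequences have already been trimmed to the same region.

```python
def assign_species(query, references):
    """Return a species name only if the query is identical to reference
    sequences from exactly one species; otherwise return None.

    `references` is a list of (species, sequence) pairs that are assumed
    to be trimmed to the same fragment as the query.
    """
    query = query.upper()
    matching_species = {
        species for species, seq in references if seq.upper() == query
    }
    if len(matching_species) == 1:
        return next(iter(matching_species))
    return None  # no identical reference, or identical to several species


# Toy reference set (hypothetical sequences, far shorter than a real fragment).
REFERENCES = [
    ("species A", "ACGTACGT"),
    ("species A", "ACGTACGA"),  # second haplotype of the same species
    ("species B", "ACGTTCGT"),
    ("species C", "TTGTACGT"),
    ("species D", "TTGTACGT"),  # shares a haplotype with species C
]

unique_hit = assign_species("acgtacga", REFERENCES)
ambiguous_hit = assign_species("TTGTACGT", REFERENCES)
no_hit = assign_species("GGGGGGGG", REFERENCES)
```

In this sketch a haplotype shared between two species yields no assignment, which mirrors the conservative criterion in the text rather than any particular software implementation.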

Collagen peptide mass fingerprinting
The 17 cetacean samples were analyzed using the ZooMS protocol described in Buckley et al. [12] and Evans et al. [8]. Between 10 and 30 mg of bone powder was fully demineralized through immersion in 0.6 M hydrochloric acid at room temperature or at 4˚C. Samples WH505-507, WH511-513, and WH801-804 were centrifuged, the supernatant was discarded, and the samples were rinsed three times with 200 µl AmBic solution (50 mM ammonium bicarbonate, pH 8.0) before being gelatinised in 100 µl of AmBic solution for 1 hour at 65˚C. Samples WH501-504 and WH508-510 underwent an additional ultrafiltration step. Following demineralization, these samples were centrifuged, the supernatant was discarded, and the collagen was gelatinised through incubation in 250 µl of 0.6 M HCl for three hours at 65˚C. The collagen was ultrafiltered using Amicon Ultra-4 centrifugal filter units (30,000 NMWL, EMD Millipore) until the supernatant was concentrated to approximately 100 µl. The retentate was washed three times with 200 µl AmBic solution, and concentrated to a final volume of 50 µl.
For all samples, the resulting collagen was incubated with 0.4 µg of trypsin overnight at 37˚C and then acidified to 0.1% trifluoroacetic acid (TFA). The collagen was purified using a 100 µl C18 resin ZipTip® pipette tip (EMD Millipore) with conditioning and eluting solutions composed of 50% acetonitrile and 0.1% TFA, while 0.1% TFA was used for the lower hydrophobicity buffer. The resulting collagen was eluted in 50 µl.
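As a bookkeeping check on the two preparation routes described above, the sample ID ranges can be expanded and counted. This is only a sketch: the helper `expand` and the batch names `standard` and `ultrafiltered` are labels introduced here for illustration, and the sketch assumes that each hyphenated range in the text is inclusive.

```python
def expand(prefix, spans):
    """Expand inclusive (first, last) number ranges into sample IDs."""
    ids = []
    for first, last in spans:
        ids.extend(f"{prefix}{n}" for n in range(first, last + 1))
    return ids


# Samples gelatinised directly in AmBic solution.
standard = expand("WH", [(505, 507), (511, 513), (801, 804)])
# Samples that underwent the additional ultrafiltration step.
ultrafiltered = expand("WH", [(501, 504), (508, 510)])

overlap = set(standard) & set(ultrafiltered)
total = len(standard) + len(ultrafiltered)
print(len(standard), len(ultrafiltered), total, sorted(overlap))
# prints: 10 7 17 []
```

The two batches are disjoint and together account for all 17 samples analyzed.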

Mass spectrometry and taxonomic identifications
One microlitre of the collagen extract was mixed with 1 µl of α-cyano-4-hydroxycinnamic acid matrix solution (1% in conditioning solution) and spotted onto a 384-spot MALDI target plate, along with calibration standards. Samples were spotted in triplicate, resulting in a total of 51 individual spectra, and run on a Bruker ultraflex III MALDI TOF/TOF mass spectrometer with a Nd:YAG smart beam laser. A SNAP averaging algorithm was used to obtain monoisotopic masses (C 4.9384, N 1.3577, O 1.4773, S 0.0417, H 7.7583). mMass software [13] was used to visually inspect the spectra; spectra from replicates of the same sample were averaged and compared to the list of m/z markers for marine mammals presented in Buckley et al. [14] and Kirby et al. [15]. Taxonomic identifications were assigned at the most conservative level of identification (genus or family level) based on the presence of unambiguous m/z markers.
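The logic of assigning the most conservative identification supported by the markers can be sketched as follows. This is an illustrative outline only, not the procedure implemented in mMass or in the cited marker lists: the m/z values, taxon labels, tolerance and function names are hypothetical placeholders, and real spectra would require the published marker sets.

```python
def has_peak(peaks, target, tolerance):
    """True if any observed peak lies within `tolerance` of the target m/z."""
    return any(abs(p - target) <= tolerance for p in peaks)


def conservative_id(peaks, markers, tolerance=0.3):
    """Return ('genus', name), ('family', name) or None.

    `markers` maps (family, genus) to the m/z values that must all be
    present for that taxon to remain a candidate.
    """
    candidates = [
        taxon
        for taxon, mz_list in markers.items()
        if all(has_peak(peaks, mz, tolerance) for mz in mz_list)
    ]
    if not candidates:
        return None
    genera = {genus for _family, genus in candidates}
    families = {family for family, _genus in candidates}
    if len(genera) == 1:
        return ("genus", genera.pop())
    if len(families) == 1:
        return ("family", families.pop())  # fall back to the family level
    return None  # markers are ambiguous across families


# Placeholder marker table (NOT real collagen peptide markers).
MARKERS = {
    ("FamilyA", "GenusA1"): [1000.5, 1500.7],
    ("FamilyA", "GenusA2"): [1000.5, 1600.8],
    ("FamilyB", "GenusB1"): [1100.6, 1500.7],
}

genus_level = conservative_id([1000.52, 1500.68, 1700.0], MARKERS)
family_level = conservative_id([1000.5, 1500.7, 1600.8], MARKERS)
ambiguous = conservative_id([1000.5, 1100.6, 1500.7], MARKERS)
```

Here a spectrum compatible with two genera of one family is reported at family level, and a spectrum compatible with taxa from different families is left unassigned.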

Taxonomic identifications
Following analysis of the mtDNA sequences and PMF spectra, taxonomic identifications could be assigned to 15 of the 17 samples. Identifications were assigned to 12 archaeological samples using ancient mtDNA sequences and to 14 samples using PMF spectra (Table S2; Table S3). The combined results produced 11 fin whale (Balaenoptera physalus), one sperm whale (Physeter catodon), one right whale (Eubalaena glacialis), one Cuvier's beaked whale (Ziphius cavirostris) and one family-level identification (Mysticeti). ZooMS and mtDNA identifications were consistent for the 11 samples which produced results using both methods. The three samples that failed to amplify using the whale-specific cytb primers (WH502, 504, 509) also failed to produce unambiguous ZooMS identifications, suggesting poor overall biomolecular preservation in these samples.
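The reported counts can be cross-checked with simple inclusion-exclusion arithmetic: samples identified by either method equal those identified by mtDNA plus those identified by PMF, minus those identified by both. The sketch below uses only the counts stated above; the variable names are introduced here for illustration.

```python
n_samples = 17
n_mtdna = 12  # identified from mtDNA sequences
n_pmf = 14    # identified from PMF spectra
n_both = 11   # identified by both methods

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
n_any = n_mtdna + n_pmf - n_both

# Combined taxonomic tally as reported in the text.
tally = {
    "fin whale": 11,
    "sperm whale": 1,
    "right whale": 1,
    "Cuvier's beaked whale": 1,
    "Mysticeti": 1,
}

print(n_any, sum(tally.values()), n_samples - n_any)
# prints: 15 15 2
```

Both routes give 15 identified samples, leaving 2 of the 17 without any identification.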