New ancient Eastern European Yersinia pestis genomes illuminate the dispersal of plague in Europe

Yersinia pestis, the causative agent of plague, has been prevalent among humans for at least 5000 years, being accountable for several devastating epidemics in history, including the Black Death. Analyses of the genetic diversity of ancient strains of Y. pestis have shed light on the mechanisms of evolution and the spread of plague in Europe. However, many questions regarding the origins of the pathogen and its long persistence in Europe are still unresolved, especially during the late medieval time period. To address this, we present four newly assembled Y. pestis genomes from Eastern Europe (Poland and Southern Russia), dating from the fifteenth to eighteenth century AD. The analysis of polymorphisms in these genomes and their phylogenetic relationships with other ancient and modern Y. pestis strains may suggest several independent introductions of plague into Eastern Europe or its persistence in different reservoirs. Furthermore, with the reconstruction of a partial Y. pestis genome from rat skeletal remains found in a Polish ossuary, we were able to identify a potential animal reservoir in late medieval Europe. Overall, our results add new information concerning Y. pestis transmission and its evolutionary history in Eastern Europe. This article is part of the theme issue ‘Insights into health and disease from ancient biomolecules’.

SpexMill), decalcified in lysis buffer (0.5 M EDTA, 10% proteinase K), and incubated with rotation for 48 hours at 37 o C. The bone powder was then precipitated by centrifugation for 5 min at maximum speed, the supernatant was concentrated using Amicon centrifugal units (30 kD, Millipore) to the final volume 100-150 μl. DNA was then extracted from the filtrate using silica spin columns (Qiagen MinElute PCR Purification Kit) according to the manufacturer's protocol.
The final volume of the extract was 60 μl. DNA quantity was assessed using Qubit fluorometer (Thermo Fisher Scientific). DNA extracts were stored at -20 o C.

Primary screening
Initial screening for the presence of Y. pestis DNA was performed using primers specific to the plasminogen activator (pla) gene located on the high-copy pPCP1 plasmid of Y. pestis as described elsewhere (52 bp fragments, [10]). In addition, the primers for longer pla-fragments (133 bp fragments, [11]) and subsequent Sanger sequencing was done, in order to exclude false-positive results. As a control for the presence of bacterial DNA of the same length in the extracts, a parallel PCR was performed using universal primers for V6 region of bacterial 16S rRNA [12]. Pla-positive samples were built into NGS libraries [13,14]. Due to a small amount of skeletal material for the rat, the screening stage for the rat sample was skipped, and the rat DNA was directly transformed into NGS libraries following the same methods [13,14].

NGS Library preparation and shotgun sequencing
Double-stranded indexed Illumina libraries were constructed according to the protocols [13,14] specifically developed for ancient DNA. Index combinations containing unique 8 bp barcodes were used for double indexing. Ten PCR cycles were used for the indexing step. Indexed libraries were quantified using Agilent 2200 TapeStation System, and equimolar quantities of every library were pooled together and sequenced on Illumina HighSeq 4000 with 2*75+8+8 cycles. The sequencing was performed at the Functional Genomics Center Zurich, Switzerland.

Target enrichment
The five samples showing positive Y. pestis signals in shotgun sequencing (Rostov16039, Rostov2033, Rostov2039, Azov38, and Gdansk8) and the rat sample were subjected to target enrichment [15,16]. The libraries prepared for shotgun sequencing were used for enrichment.
SeqCap EZ Prime Developer Probes (Roche) were used for in-solution capture. Full Y. pestis chromosome (NC_003143.1) and three plasmids, pCD1 (NC_003131.1), pMT1 (NC_003134.1), and pPCP1 (NC_003132.1) were used to design the probes. Target enrichment was performed according to the manufacturer's protocol. Briefly, amplified indexed libraries were mixed in equimolar amounts to a final concentration of 1.5 µg of DNA per capture reaction and hybridized with DNA capture probes using the following regimen: 95 o C for 5 min, 47 o C for 20 hrs in a thermocycler with heated lid (57 o C). After this, the captured DNA samples were washed using HyperCap Beads (Roche). The whole bead-bound DNA samples (about 20 µl) were used for subsequent PCR amplification with the same pair of primers which was used for amplification of indexed DNA libraries prior to enrichment. 14 cycles of amplification were performed. Enriched libraries were quantified using Agilent 2200 TapeStation System and sequenced on Illumina NextSeq500 with 2*75+8+8 cycles (Functional Genomics Center Zurich).

Read processing, mapping, and variant calling
First, all libraries belonging to the same individual were merged. Then, all samples were processed Before variant calling, one base at the 5' and 3' end, respectively, was trimmed by one base pair using FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) to remove sites that could have been affected by ancient DNA damage. Subsequently, the reads were re-filtered for length and remapped using the parameters described above. The Genome Analysis Toolkit (GATK) version 3.8.0 [21,22] was used to generate a mapping assembly and SNP calling. The reference base was called if the position was covered by a read at least three times and the quality score was at least 30. The base was called as a SNP if the quality score was at least 30 and 90% of the mapped reads contained this variant.
In addition, all merged libraries from humans were mapped to the human mitochondrial genome (NC_012920.1) as described above with the exception of variant calling.
To assign the rat to the correct species, this sample was independently mapped against different reference mitochondrial genomes of the genus Rattus, namely Rattus fuscipes (NC_014867.1), Rattus leucopos (NC_014855.1), Rattus norvegicus (NC_001665.2), and Rattus rattus (NC_012374.1) using the parameters described when mapping against Y. pestis, with the exception of the variant calling. In addition, the sample was mapped against Mus musculus (NC_005089.1).
Furthermore, these data were mapped to the complete nuclear genome of R. rattus and R.

Metagenomic screening
To detect the presence of Y. pestis in studied samples and determine the Yersinia species that is most likely present in the rat sample, we performed a comparative mapping with MALT [36] using all complete bacterial, viral, and archaeal genomes in GenBank [37] as a reference (version May 2018). MALT was executed with the following mapping parameters: Only reads with a minimum 85% identity (--minPercentIdentity) were considered as a possible match to the reference.
Moreover, the minimum support parameter (--minSupport) was set to 5, i.e. only nodes with minimum support of five reads are kept. BlastN mode and SemiGlobal alignment were applied and a top percent value (--topPercent) of 1 was set. All other parameters were set to default. MALT results were analyzed and visualized using MEGAN6 [38].
The reference database also includes various Yersinia strains, which were used for the identification of the reads mapping to Yersinia from the rat sample (Y. enterocolitica (NC_008800.1), Y.
intermedia (NZ_CP009801.1), and Y. massiliensis (NZ_CP028487.1)). Indels were excluded from vcf files before creating consensus sequences. The regions with coverage below three were masked during consensus construction. Next, CDS sequences were extracted from consensus sequences using gffread software from GFF Utilities version 0.11.5 To access the phylogenetic placement of the partial Yersinia pestis strain reconstructed from the rat sample, a maximum likelihood tree was calculated based on a SNP alignment using positions that were covered at least three times. We added one random strain per branch, the rat strain, Y. pseudotuberculosis, and Y. enterocolitica since they two contained the maximum number of mapped reads after Y. pestis (Supplementary Figure 5). The alignment was created as described above. RAxML version 8.2.12 [39] was used with 100 bootstraps and the GTR -GAMMA model.

BEAST analysis
We used the Bayesian framework BEAST v1.10.4 [40] to estimate divergence times and substitution rates. All published modern and ancient strains [2, 15, 24-35] were treated with the EAGER pipeline [17] using the parameters described above. In the analysis, we only included the samples representing branch 1 of the Y. pestis phylogeny that fulfilled quality criteria of at least 3fold coverage at each called site and at least 60% of the reference genome covered. The SNP alignment was built with MUSIAL (https://github.com/Integrative-Transcriptomics/MUSIAL) and a SNP was used when it was called in at least one sample. All positions with more than 3% missing data were excluded. The resulting SNP alignment consisted of 620 SNPs for a total of 82 historical and modern strains [15,25,27,28] prepared using an unpublished R script by Sebastian Duchene) and Bayesian Evaluation of Temporal Signal (BETS, [44]). In DRT, substitution rate estimates for ten replicates of the BEAST analysis with tip dates randomized among the samples do not overlap with the estimate using the original tip dates (Supplementary Figure 7), indicating sufficient temporal signal for calibrated phylogenetic analysis in the dataset. BETS analysis using Bayes Factor based model selection also detected temporal signal supporting the justification for our tip dating analysis with Bayes Factors of 1.97x10 96 and 2.65 x10 96 to support the tip dated versus isochronous phylogeny using path sampling and stepping stone sampling, respectively. Root-to-tip regression was projected using TempEst v1.5.3 [45]; however, it did not support the presence of temporal signal in the data resulting in Rsq = 0.19 and a Correlation Coefficient of -0.43.

Genome coverage
Per base depth was obtained by using samtools depth software. CG count were counted in 100 bp windows for Y. pestis CO92 genome. Circular graphs for depth and GC content were created using CIRCOS software [46]. Full Y. pestis chromosome (NC_003143.1) and three plasmids, pCD1 (NC_003131.1), pMT1 (NC_003134.1), and pPCP1 (NC_003132.1) were used for the analysis. Estimates for ten replicates with tip dates randomized among the samples (1-10) and for the original data ("data"). The lack of overlap between the original estimate values and the estimates for the replicates indicate that the dataset represents a measurably evolving population, i.e. the temporal signal in the dataset supports the applicability of tip dating analysis.