Global distribution of single amino acid polymorphisms in Plasmodium vivax Duffy-binding-like domain and implications for vaccine development efforts

Plasmodium vivax (Pv) malaria continues to be geographically widespread with approximately 15 million worldwide cases annually. Along with other proteins, Duffy-binding proteins (DBPs) are used by plasmodium for RBC invasion and the parasite-encoded receptor binding regions lie in their Duffy-binding-like (DBL) domains—thus making it a prime vaccine candidate. This study explores the sequence diversity in PvDBL globally, with an emphasis on India as it remains a major contributor to the global Pv malaria burden. Based on 1358 PvDBL protein sequences available in NCBI, we identified 140 polymorphic sites within 315 residues of PvDBL. Alarmingly, country-wise mapping of SAAPs from field isolates revealed varied and distinct polymorphic profiles for different nations. We report here 31 polymorphic residue positions in the global SAAP profile, most of which map to the PvDBL subdomain 2 (α1–α6). A distinct clustering of SAAPs distal to the DARC-binding sites is indicative of immune evasive strategies by the parasite. Analyses of PvDBL-neutralizing antibody complexes revealed that between 24% and 54% of interface residues are polymorphic. This work provides a framework to recce and expand the polymorphic space coverage in PvDBLs as this has direct implications for vaccine development studies. It also emphasizes the significance of surveying global SAAP distributions before or alongside the identification of vaccine candidates.

receptor-ligand interactions [5][6][7][8][9][10]. Most of the parasite proteins involved in such interactions are potential targets for the human immune system and display extensive polymorphisms as a mechanism for immune evasion [11,12]. One such candidate for Pv is the cysteine-rich Duffy-binding protein (DBP), which is in a subset of the erythrocyte-binding antigens (EBAs) [13,14]. DBP helps the parasite bind to Duffy antigen receptor for chemokines (DARC) for invasion and entry-known as the DARC-dependent invasion pathway [8,15,16]. There are also DARC-independent invasion pathways as reported for in Pf and Plasmodium knowlesi (Pk) [9,17]. DARC is present on the surface of endothelial cells and erythrocytes [18]. It is a hepta-helical transmembrane protein and is involved in pro-inflammatory responses as it is a promiscuous receptor for chemokines/cytokines [18,19]. Pv hijacks human DARC for invasion and uses the parasite-encoded DBL for DARC recognition [20].
In regions where Pv is endemic, naturally acquired immune responses against PvDBL are associated with reduced risk of parasitaemia and lower Pv invasion into host RBCs-thus making PvDBL a domain of interest [33][34][35]. Despite this, one of the major impediments for successful vaccine design to enable global protection is the extensive polymorphic nature of PvDBP-especially of PvDBL, which potentially indicates strategies of evasion from host immune responses [6,7,13,[36][37][38][39][40][41][42]. Some of the polymorphisms in PvDBL cause antigenic drift and hence are responsible for strain-specific immune responses [43]. Complicating these scenarios further are additional facets of Pv biology-several cases of Pv infection have now been observed in Duffy-negative individuals, suggesting that the parasite might have evolved an alternative pathway that is independent of DARC-DBL interaction for invasion [44][45][46][47][48][49][50][51][52]. Studies from Malagasy, Colombia and Ethiopia have also suggested that PvDBP gene copy variations in the field are evident and indicative of Pv evolution [53][54][55]. Therefore, the development of PvDBL-based vaccine seems to be fraught with several challenges-and here we highlight further reasons for scepticism in this direction. V236G  D240E  T259P  F261L  R263S  V282L  V282I   L288F   E283K   R294K  I303R  G323R  K326E  N330D  N331I  F336L  D339G  E340K  K341N  K341Q  R345H  R346C  S353F  S353T  T359R  N372K  I374M  L379I  I385T  E386D  W392R  I393F  Q441E  I443V  I458K  N486I  V488M  F490L  I502S   0 A thorough examination of PvDBL sequences from global Pv isolates will provide a better understanding of how natural selection has shaped this antigen across different populations and continues to do so. India contributes approximately 47% of Pv malaria burden globally [56] (figure 1). We here provide both new PvDBL sequences from across India and also use the PvDBL sequences recorded in GenBank to assess the extent of non-synonymous single-nucleotide polymorphisms which give rise to single amino acid polymorphisms (SAAPs).

Sample collection
A total of 176 finger-pricked blood samples as dried blood spots were collected from adult patients showing symptoms of malaria between 2014 and 2019. The search for cases/samples included active (house-to-house visits) and passive (from malaria clinics, ICMR-NIMR field units and Malaria Parasite Bank of ICMR-

Global Plasmodium vivax Duffy-binding-like domain sequence retrieval, curation and SAAP analyses
The protein sequences under the labels of 'Plasmodium vivax Duffy-binding-like domain' and 'Plasmodium vivax Duffy-Binding Protein Region II' were retrieved from the NCBI GenBank database. All hypothetical, partial and unverified data entries were removed. This formed a final list of 1285 protein sequences to which 73 newly submitted sequences from India were added such that the entire set of 1358 protein sequences constitute the global dataset of PvDBL sequences (table 2). All the sequences were classified on the basis of their respective country of collection. We divided our dataset into six sub-groups so as to represent different continental regions: Southeast Asia (India, Thailand, Republic of Korea, Malaysia, Myanmar and Sri Lanka), Oceania (Papua New Guinea), Americas (Brazil, Colombia and Mexico), Africa (Sudan and Uganda), Middle East (Iran) and Central Asia (Kyrgyz Republic). Protein sequences of all isolates from each region were aligned using Clustal Omega and scanned for polymorphic residues. The country/region-wise divided data points were simultaneously analysed to assess if any country/region-specific SAAPs were prominent. These data were then collated to allow for a global view of polymorphisms and to calculate the frequency of each SAAP. The SAAPs were classified as conservative or non-conservative on the basis of their physico-chemical properties. Analyses beyond this stage included residues that exhibited polymorphic nature with a minimum frequency of 0.5% in the global dataset of 1358 sequences. Those SAAPs with a minimum frequency of 0.5% but less than 5% in the global set are classified here as 'minor' whereas those with a frequency of 5% and above are classified as 'major' (median royalsocietypublishing.org/journal/rsob Open Biol. 10: 200180 L379I  N384S  I389T  W392R  R394L  D399G  S402K  T406S  E407G  Q409K  K414E  T422P  D440H  D440N  I443V  K447R  K447E  Q449R  N455S  N455D  I458K  0  10  20  30  50  40  60  70  80  Molecular surface representation of PvDBL with all polymorphic residue positions found in Papua New Guinean isolates (royal blue) is shown in addition to Site1 ( pink) and Site2 ( purple) residues. Geographically exclusive polymorphic residues are labelled.  royalsocietypublishing.org/journal/rsob Open Biol. 10: 200180 frequency of dataset = 5%). The total number of instances of SAAPs was also classified on the basis of occurrence within particular subdomains of PvDBL to deduce the most variable subdomain of PvDBL. We analysed the distribution of SAAPs on the three-dimensional structure of PvDBL in relation to Site1 and Site2 [25]. The 4 mAb-bound PvDBL structures in PDB (Protein Data Bank), purportedly with straintranscending broadly neutralizing antibodies, were analysed using PDBePISA/PDBsum to identify the interfacial residues [63,68]. These data were finally contextualized to analyse residues that were polymorphic in the global SAAP dataset.

Plasmodium vivax Duffy-binding-like domain SAAP profile within India
Out of the 176 samples from India (collected between 2014 and 2019), 104 were positive for Plasmodium spp. (73 for Pv and 31 for Pf ) and no mixed-species Plasmodium (Pv + Pf ) infection was found. DNA from the 73 P. vivax mono-infected isolates were used for further analysis. Out of 100 sequences downloaded from NCBI, a total of 98 sequences were included in this study (sequences with GenBank IDs FJ491186 and FJ491187 were excluded as they had stop codons in their coding regions). Hence, a total of 171 sequences formed the total sample from India for this study (figure 2). Protein sequence analysis of PvDBL among 171 Pv isolates from India revealed that there were 37 polymorphic residue positions which contained 40 SAAPs, of which 24 SAAPs occurred with a frequency of 1% and above. Of these polymorphisms, amino acid changes at positions 282, 341 and 353 were trimorphic. SAAPs that occurred with a frequency less than 1% but were found to be exclusively present only in isolates from India include D229G, V282I, V236G, D240E, T259P, R294K, I303R, N331I, E386D, I393F, N486I, F490L and I502S. The SAAPs-V282I, I303R, N331I and S353F-were found to occur in multivariant forms globally but have one of those forms exclusive to isolates from India-as shown in figure 3a. Total of 75% SAAPs were non-conservative (30 of 40). E386D was found to occur within 5 Å radius of Site1, while the trimorphic mutation S353F/T occurs within 5 Å radius of Site2 (figure 3b). Considering that N372K, L379I, W392R and I458K either individually or in combination could affect PvDBL antigenicity, the presence of such polymorphisms was sought [43]. I458K was found to occur with a high frequency (56%) as compared to N372K and W392R, which occurred with low frequencies (33% each). The N372K-L379I-W392R trio was found in 33% (56 of 171) of the total samples, while 22% (37 of 171) isolates bear the fourth polymorphic position I458K in addition to the trio. An insertion of a   E340K  K341N  K341Q  R345H  N372K  L379I  W392R  D451N  I458K  T468K  E484K  V506D   10  20  30  50  40  60  70  80
royalsocietypublishing.org/journal/rsob Open Biol. 10: 200180 Kyrgyz Republic from central Asia certified malaria free in 2016 by WHO, also displayed a very unique SAAP profile ( figure 16a,b). Within the 16 sequenced samples, only three major SAAPs were found to occur-D339G, R345H and I458K-resulting in a unique haplotype which reveals much lower PvDBL polymorphism in the Kyrgyz Republic as compared to other geographical regions [42].

Global SAAP profile and implications for vaccine development
We observed that although the cysteine residues in PvDBL were highly conserved, as much as half of all other constituent amino acids show polymorphism (i.e. 140 out of 315). Of these 140 polymorphic residues, approximately 40% were singletons i.e. occurred only once in a total set of 1358 sequences and were distributed randomly on the three-dimensional structure of PvDBL. There were a total of 7298 instances of SAAPs observed in our dataset, out of which approximately 72% of instances of SAAPs occurred in subdomain 2 indicating its highly polymorphic nature when compared to the other two subdomains (0.2% and approximately 22% in subdomains 1 and 3, respectively, figure 17a). Some SAAPs (approx. 6%) which include two minor ones-N260Y and F261L-and one major-R263S-occurred outside the three subdomains in low complexity regions (LCR) ( figure 17a  and b). Thirty-one polymorphic residues were noted to occur with a frequency of at least 0.5% ( figure 18) 18 and electronic supplementary material, table S1). Of these 31 polymorphic residues, three residues displayed conservative changes while 24 residues displayed non-conservative changes. It was noted that of the multivariant residue positions 326, 340, 341, 374, 430 and 460 found in many geographical regions, their distinct forms K326N, E340Q/T, K341T, I374L/R, P430L and V460I were found exclusively within Brazil. Similarly, W349R was exclusively found in PNG but its other form W349L was unique to RoK. S353F was found to be exclusive to isolates from India. K428R was unique in isolates from Myanmar while its other form, K428Q, was found only in Korean isolates. The polymorphic residue A463R/P which showed a trimorphic change was uniquely observed in isolates from Sri Lanka. S402K occurred as a globally major SAAP while being found to be exclusive to PNG. I277M and K366N were globally minor SAAPs which occurred exclusively in isolates from Brazil.
The locations of major and minor SAAPs do not overlap with the DARC engaging pockets called Site1 and Site2 ( figure 19) [23,25]. A distinct clustering of four major SAAPs-D339G, E340K/Q/T, K341N/Q/T and R345H-at a conformational epitope-was found to be distal and opposite to Site1 and Site2 ( figure 19), in agreement with the original hypothesis that immune evasion by the parasite is most likely facilitated by polymorphic clustering on regions away from DARC recognition residues [11]. The above-mentioned residues are part of an immunodominant epitope which may be employed as a decoy by the parasite to evade host immune response. A PvDBL construct lacking the charged and polar residues in this epitope, known as DEK-null, reduces the immunogenicity of this immuneevading epitope and has been hypothesized to be a valid vaccine candidate [78].
Two major polymorphisms-S353F/T in Subdomain 2 and W392R in Subdomain 3-were found to be buried deep within the protein core. S353T/F was noted to occur within 5 Å radius of Site2 while W392R is closer to Site1 than Site2 [25]. The effects of swapping a hydrophobic residue (W392R) to a basic one or that of a polar residue to a hydrophobic one (S353T/F) are likely to be drastic. How this may change the topology of PvDBL remains unanswered.
Our analyses reveal that only 16% instances of total SAAPs occur in the vicinity of Site1 and Site2, and the vast majority (84%) were spread throughout the surface of the PvDBL molecule (figure 20). Among those in the vicinity of the two sites, 2% instances of total SAAPs fall within 5 Å radius of Site1 and 3.6% map within 5 Å radius of Site2 within 5Å radius of Site1 within 5Å radius of Site2 intervening between Site1 and Site2 away from Site1 and Site2 royalsocietypublishing.org/journal/rsob Open Biol. 10: 200180 (table 3). Of the DbCRs, the SAAPs K273R and R274K in Site2 and K297E and R304T in Site1 are observed once each (1/1358 sequences). It is intriguing that the functionally conserved Site2 has more instances of SAAPs in its 5 Å radius than Site1 (3.6% versus 2%). Among the DaBRs, Y295, N296, L369 and I376 fall proximal to Site1 while K289, Y363, K366 and K367 are closer to Site2 while F299 and F373 fall in the space between Sites 1 and 2 (see electronic supplementary material, figure S1). Half of the DaBRs display polymorphisms albeit with varying frequencies-K289N (1/1358), F299S (5/1358), K366N (58/1358), K367E (1/1358) and I376N (1/1358). One major SAAP (L379I) maps to the intervening structural spaces between Site1 and Site2 and contributes 10.4% of total SAAP instances. It is noted that four SAAPs-I277M, W349R (both minor SAAPs), S353T and T359R (both major SAAPs)-occurred within 5 Å of Site2, whereas only one major SAAP-I374M-was found to occur within 5 Å radius of Site1 indicating higher physico-chemical conservation of the local environment of Site1 over Site2 and its    importance as a key DARC-binding region, in agreement with previously published studies [11,14,18,22,23,25,27,28]. Three major SAAPs-N372K, L379I (in Subdomain 2) and W392R (in Subdomain 3)-seem to be closely related as they occurred together in 43% of all the isolates analysed across different geographical regions. Moreover, 28% had the aforementioned trio in addition to I458K (in Subdomain 3). Among these, only L379 occurred in the intervening structural region between Site1 and Site2. N372 was found to be proximal to Site2, W392 was buried closer to Site1 in comparison to Site2, and I458 was distal to both Site1 and Site2 [25]. Previous studies have demonstrated the role of the SAAP trio N372K-W392R-I458K-particularly for residues W392 and I458 [43]. These are part of the dominant neutralizing epitopes that can change the antigenic character of DBP and alter the efficacy of immune inhibition [43].

Polymorphisms in epitopes of broadly neutralizing strain-transcending mAbs
There are four PvDBL-mAb complex structures submitted in the PDB, each purportedly with broadly neutralizing strain-transcending activity (PDB IDs: 5F3J, 6OAN, 6OAO, 6R2S) [79][80][81]. It is notable that in all the four complexes, the PvDBL itself is monomeric and not dimeric [11,25]. Analyses of each of these was performed in detail to assess their interface binding regions and the location of DARC binding sites 1 and 2 in the context of the neutralizing mAb footprints ( figure 21). The oldest of these PvDBL-mAb structures is with a potent inhibitory murine monoclonal antibody 2D10 (PDB: 5F3J) whose epitope lies within subdomain 3 and was suggested to be conserved [81,82]. However, we show here that approximately 44% of the interfacing residues in PvDBL exhibited a tendency to be polymorphic (figure 22b). Indeed, some of these SAAPs interact directly and could alter the binding efficiency of the antibody (table 4). Q441E, K428R/Q and P430A/L are globally minor SAAPs found at the interface of this epitope (figure 22a). Interestingly, an insertion of leucine was observed between V429 and P430 in isolates from Brazil, India and Iran that might potentially change the epitope conformation drastically thus affecting antibody binding efficacy.
Three more recent PvDBL-mAb structures are now available that dissect the structural basis of Pv neutralization with naturally acquired [79,83] or vaccine-induced human antibodies against PvDBP [80]. Two of the three structures were with mAbs 092096 (PDB: 6OAO) and 053054 (PDB: 6OAN) that bind to the same face of PvDBL and their epitopes show substantial overlap (figure 21) [79]. These epitopes lie in subdomain 2 and partly engage with PvDBL Site2 residues, thereby neutralizing Pv by targeting the proposed dimer interface of PvDBL [79,83,84]. Closer inspection of the PvDBL-mAb interface revealed that approximately 54% and 48% of these interface residues (from antibody 092096 and 053054, respectively) tend to be polymorphic (figure 22c,d,e,f ). There was high variability in the topology of recognized epitopes. T359R and N372K are two major SAAPs at the interfaces between PvDBL and mAbs 053054 and 092096 (figure 22c,e). Urusova   royalsocietypublishing.org/journal/rsob Open Biol. 10: 200180 frequencies globally, they do exist in Pv isolates from multiple geographical regions. Most of the polymorphisms observed were non-conservative and therefore may alter the antigenic profiles of PvDBL. Additionally, it has been previously reported that when N372K occurs with W392R and I458K, it compromises the efficiency of immune inhibition [43]. Analysis of the fourth PvDBL-mAb structure (PDB: 6R2S) with mAb DB9 revealed a similar trend of partial conservation of the recognized epitope despite its presence within subdomain 3 [80]. Approximately 24% of the interface residues here were also found to be polymorphic, with one globally minor SAAP V488M (figure 22g,h; table 7).
Since residues that are critical for binding of antibodies have a tendency to be polymorphic, assessing their contribution to antibody-based neutralization becomes essential using a much wider landscape of SAAPs from field isolates. Besides, given the extremely low and non-proportional coverage of polymorphic space in terms of representation of sequences from Pv field isolates, conclusions about any new SAAPs being present in Pv endemic regions cannot be drawn. These analyses therefore suggest that deployment of any potential PvDBL vaccine is unrealistic unless a greater extent of sequence variability has been mapped in Pv affected regions of the world.

Discussion
This study is the first to investigate the SAAP profile from India which contributes approximately half of the global Pv burden [56]. A limiting factor of this analysis is the assumption that sequence isolates from GenBank are from samples with mono-infection of Pv and no mixed samples were involved. A comparison of the most common amino acid changes in PvDBL among presently studied Pv populations revealed that isolates from India showed a different SAAP profiling as compared to isolates from other geographical regions. The non-synonymous For PvDBL, there seem to be currently two vaccine development strategies. The first entails developing a protein vaccine that encompasses most or all of the sequence variants present in endemic regions [66,78,81]. As highlighted, this will certainly not be an efficacious or cost-effective process given the sparse sampling of the PvDBL sequence diversity studied so far. Intriguingly, Afghanistan, Ethiopia, Indonesia and Pakistan are also Pv endemic regions but no PvDBL SAAP data are available for these regions in public databases. Altogether, this paucity of coverage will make it difficult to assess the feasibility of any PvDBL-based vaccine. Periodic surveillance of PvDBL polymorphisms across the whole Pv endemic space is essential. These analyses also call attention to the issue of implicit bias with regard to the country/ region for which the DBL-based vaccine development effort is focused at. The sequence space that has so far addressed polymorphisms in PvDBL is very limited, but despite this, varying profiles of SAAPs are evident in different Pv endemic regions of the world. The evident diversity just within Asia is indicative of the immense difficulty in designing a single subunit vaccine capable of covering the full spectrum of variations in the isolates of Pv in the context of their DBL sequences.
The second strategy purports the use of the antigen with conserved B-cell epitopes that can overcome strain-specific immunity [78,82,83]. Thus, broadly neutralizing antibodies raised against a globally conserved epitope would be the basic requirement for the rational design of a straintranscending DBL-based vaccine [82]. From the four PvDBL-mAb complex structures, it is evident that two of the four purported neutralizing mAbs do not bind near the supposed dimer interface. Further, although it has been suggested that the above mAbs bind to conformational epitopes that are broadly neutralizing and hence the possible target of strain-transcending global protection, we show via an in-depth global SAAP analysis, that this may not be so. The polymorphisms observed at amino acid positions 372, 379, 392 and 458 might be due to immune pressure that aids the parasite to evade host immunity. This pressure generates new PvDBL variants that are still functional but adept at escaping inhibitory antibodies. It is noteworthy that host immunity evasion due to SAAPs in PvDBL are predicted to hamper binding efficacy of neutralizing antibodies that recognize conformational epitopes due to a change in topological features; however, these are not confirmed outcomes. This work therefore suggests that no single PvDBL sequence may be used as a platform for vaccine development as the inherent variability in PvDBL sequences will render such vaccines inefficacious. A global real-time database needs to be built from Pv afflicted regions to assess the current state and spread of polymorphic Pv strains, and their respective DBL sequences to discern presently conserved epitopes that may be targeted by neutralizing antibodies. Moreover, the presence of Pv in DARC negative populations along with the recent finding that gene amplification could be an additional immune evasion mechanism used by Pv emphasizes the importance of a multicomponent vaccine strategy that in addition to eliciting inhibitory antibodies may also reduce the ability of the parasite to escape immunological control [85]. In sum, this study suggests the need of a vast expansion and analysis of the PvDBL sequence database, region-and country-wise, in order to assess the real feasibility of PvDBL as a vaccine against Pv malaria.
Ethics. This article presents research with ethical considerations. The ethical approval (ECR/NIMR/EC/2013/100) for the study was given by the Institutional Ethics Committee of ICMR-NIMR, New Delhi, India, and written informed consents were obtained before the samples were collected from all patients who participated in the study.
Data accessibility. DNA sequences: GenBank accession numbers MT502426-MT502498. The relevant sequences will be publicly available at the NCBI portal after May 2021.
Authors' contributions. A.Sh. conceived of the study, coordinated it and critically revised the manuscript. P.M. performed sequence recovery, curation, alignments, conception and data analyses along with drafting the whole manuscript. S.K. did the sample collection, amplification and sequencing of 73 P. vivax isolates from India. S.M. participated in designing the study, data analyses and critically revising the manuscript. A.Si. aided in designing the sample collection and associated experimentation along with revising the manuscript critically. V.P. interpreted the sequencing results and aided in revision of the manuscript. All authors gave final approval for publication and agree to be held accountable for the work performed therein.