Identification of allosteric disulfides from labile bonds in X-ray structures

Protein disulfide bonds link pairs of cysteine sulfur atoms and are either structural or functional motifs. The allosteric disulfides control the function of the protein in which they reside when cleaved or formed. Here, we identify potential allosteric disulfides in all Protein Data Bank X-ray structures from bonds that are present in some molecules of a protein crystal but absent in others, or present in some structures of a protein but absent in others. We reasoned that the labile nature of these disulfides signifies a propensity for cleavage and so possible allosteric regulation of the protein in which the bond resides. A total of 511 labile disulfide bonds were identified. The labile disulfides are more stressed than the average bond, being characterized by high average torsional strain and stretching of the sulfur–sulfur bond and neighbouring bond angles. This pre-stress likely underpins their susceptibility to cleavage. The coagulation, complement and oxygen-sensing hypoxia inducible factor-1 pathways, which are known or have been suggested to be regulated by allosteric disulfides, are enriched in proteins containing labile disulfides. The identification of labile disulfide bonds will facilitate the study of this post-translational modification.


Introduction
Allosteric disulfide bonds are defined by their ability to affect the functioning of the protein in which the bond resides. Reduction or oxidation of allosteric disulfide bonds leads to conformational transitions in the residing protein that result in a change in ligand binding, enzyme activity, proteolysis or oligomerization of the protein [1]. The bonds are cleaved or formed by oxidoreductases of the thioredoxin family or by intra- or intermolecular thiol-disulfide exchange. Over 30 examples of allosteric disulfides have been described. The extent to which biological processes are controlled by allosteric disulfides has not yet been fully elucidated. However, certain processes are clearly regulated by this form of protein control. In humans, thrombosis and haemostasis is an example of a system that is regulated by allosteric disulfides [2]. Disease processes regulated by allosteric disulfide bonds include cancer [3] and viral infection [4]. These disulfide bonds are clinically relevant because they can be targeted with inhibitors of the factors that cleave them, such as protein disulfide isomerase (PDI). Small molecule PDI inhibitors are being developed [5] and a first generation molecule is currently being tested as an anti-thrombotic in a Phase II cancer clinical trial [6].
Studies of the biophysical properties of allosteric disulfides have led to the recognition of defining features of these bonds. Firstly, a conformational signature for allosteric disulfides has been identified based on the sign of the five dihedral angles which define the cystine residue [7]. There are 20 different disulfide bond configurations based on this classification and 3 of the 20 are emerging as allosteric configurations: the -RHstaple, -LHhook and -/+RHhook bonds. Secondly, the -RHstaple and -/+RHhook bonds are more stressed than the other 18 disulfide types [8], which is primarily due to stretching of the sulfur-sulfur bond and neighbouring bond angles. Stretching of sulfur-sulfur bonds is known to accelerate their cleavage [9][10][11][12], so the pre-stress of the -RHstaple and -/+RHhook configurations is very likely important for their reduction and has probably influenced their evolution as allosteric bonds.
The three allosteric configurations constitute approximately 20% of all disulfide bonds in X-ray structures in the Protein Data Bank (PDB) [7]. While bond configuration has proved useful for identifying allosteric disulfides in proteins, it is likely that many, if not most, bonds with allosteric configurations will not have a functional role. Additional methods are needed to identify this post-translational modification. Here, we identify 511 labile disulfide bonds in PDB X-ray structures from bonds that are present in some molecules of a protein crystal or in some structures of a protein, but absent in others. A notable feature of the labile bonds is their pre-stress that likely underlies their facile nature. Biological pathways enriched in proteins containing labile disulfides are the complement and coagulation cascades and cytoplasmic oxygen-sensing hypoxia inducible factor-1 (HIF-1) system. Potential allosteric disulfide bonds in these pathways are presented.

Methods
All X-ray structures released in the PDB as of June 2017 were assembled. The list was culled to exclude all structures with a resolution >2.5 Å. Structures that had been prepared and crystallized in the presence of dithiothreitol or any other reducing agent were removed from the analysis.
To identify missing disulfide bonds, each PDB chain was first mapped to a corresponding UniProtKB accession and protein sequence using the PDBSWS tool [13]. Subsequently, for each UniProtKB accession, a list of the corresponding PDB chains and of the disulfide bonds present in any of these chains was recorded. Disulfide bonds in structures were determined by the presence of an SSBOND line in the PDB file. Finally, for each disulfide bond associated with a UniProtKB protein, all PDB chains mapped to that UniProtKB accession were analysed to determine whether the disulfide bond was present or missing. If the disulfide bond was missing from a particular PDB chain, that structure was further analysed to establish whether the bond was missing because the protein sequence of the PDB chain was truncated or mutated. The annotation of each disulfide bond was performed as described previously [7]. A schematic diagram illustrating the analysis is shown in electronic supplementary material, figure S1.
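The present/missing comparison described above can be sketched in a few lines of Python. This is an illustrative outline, not the authors' script: the function names and the dictionary layout are hypothetical, the SSBOND columns follow the fixed-width PDB file format, and the mapping of residue numbers to UniProtKB numbering (done with PDBSWS in the text) is assumed to have happened already.

```python
def parse_ssbonds(pdb_text):
    """Collect disulfides annotated in SSBOND records of a PDB-format file.

    Returns a set of ((chain, resnum), (chain, resnum)) tuples read from the
    fixed-column layout of the SSBOND record.
    """
    bonds = set()
    for line in pdb_text.splitlines():
        if line.startswith("SSBOND"):
            first = (line[15], int(line[17:21]))    # chain ID, residue number
            second = (line[29], int(line[31:35]))
            bonds.add(tuple(sorted((first, second))))
    return bonds


def find_labile_bonds(chains):
    """Flag bonds present in some chains of one protein but missing in others.

    `chains` maps a chain label to a dict with two hypothetical fields:
      "bonds"     - set of (i, j) cysteine pairs in UniProtKB numbering
      "cysteines" - set of positions that are cysteine in that chain
    A bond counts as missing only when both cysteines exist in the chain,
    which excludes truncated or mutated sequences.
    """
    all_bonds = set()
    for chain in chains.values():
        all_bonds |= chain["bonds"]
    labile = {}
    for bond in sorted(all_bonds):
        present = sorted(c for c, d in chains.items() if bond in d["bonds"])
        missing = sorted(
            c for c, d in chains.items()
            if bond not in d["bonds"] and set(bond) <= d["cysteines"]
        )
        if present and missing:
            labile[bond] = {"present": present, "missing": missing}
    return labile
```

Requiring both cysteines to be present before a bond is called missing mirrors the truncation and mutation check in the text; any further filtering, such as excluding structures prepared with reducing agents, would sit outside this sketch.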
For the analysis of disulfide bond features, a list of culled disulfide bonds from all PDB structures was used. To define the set of culled disulfide bonds, the PISCES PDB culling server was used with a cut-off of 90% homology, a maximum resolution of 2.5 Å and an R value of 1.0. The subcellular localization of proteins was obtained from the UniProtKB database. Pathways enriched in human proteins containing labile disulfide bonds were identified using the DAVID Functional Annotation Tool and KEGG pathway analysis [14,15].
A Python script was developed implementing the above analysis and can be downloaded from https://github.com/jwon7011/missing_disulfide.

Results
A total of 1361 unique labile disulfide bonds were identified from the PDB as of June 2017. The reference dataset consisted of all 14 033 unique disulfide bonds in the PDB (electronic supplementary material, table S1). To eliminate poorly defined or erroneous bonds, criteria of a structure resolution <2.5 Å and a sulfur-sulfur distance within 10% of the disulfide bond equilibrium length of 2.038 Å [8] were applied. The datasets were refined to present each unique disulfide bond as a single entry (electronic supplementary material, table S2). The final list contains 511 labile disulfides and 13 030 total disulfides. In X-ray crystallography, the B factor is a measure of the degree to which the electron density of an atom is dispersed. To ensure that the missing disulfide bonds that we detected were not due to uncertainty in the position of cysteines, we compared the average B factor of disulfide-bonded cysteines with those of matched missing disulfide bonds. The B factor for each atom of each cysteine involved in disulfide bond formation (present or missing) was extracted from the corresponding PDB structure. The B factor for each disulfide bond was calculated as the average over the 12 atoms of the cysteine pair. To compare present and missing disulfide bonds, the bond B factors were averaged over all redundant structures in which the bond was present and, separately, over those in which it was missing. There was no significant difference between the B factor of present (37.58 ± 21.78, s.d.) and missing (36.24 ± 21.02, s.d.) disulfide bonds (p = 0.0736, paired t-test).
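As a rough illustration of the two quality checks described above, the following Python sketch applies the 10% tolerance around the 2.038 Å equilibrium length and averages the B factors of a cysteine pair. The function names are hypothetical and the paired t-test itself is not reproduced; only the numerical criteria stated in the text are used.

```python
import math
from statistics import mean

S_S_EQUILIBRIUM = 2.038  # Å, equilibrium sulfur-sulfur length quoted in the text
TOLERANCE = 0.10         # accept bonds within 10% of the equilibrium length


def ss_distance(sg1, sg2):
    """Distance in Å between two sulfur (SG) coordinates given as (x, y, z)."""
    return math.dist(sg1, sg2)


def within_tolerance(distance):
    """True if a sulfur-sulfur distance lies within 10% of 2.038 Å."""
    return abs(distance - S_S_EQUILIBRIUM) <= TOLERANCE * S_S_EQUILIBRIUM


def bond_b_factor(atom_b_factors):
    """Average B factor over the 12 atoms of a cysteine pair."""
    if len(atom_b_factors) != 12:
        raise ValueError("expected 12 atoms (6 per cysteine)")
    return mean(atom_b_factors)
```

With these thresholds the accepted sulfur-sulfur distances run from roughly 1.83 to 2.24 Å.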

Structural and functional features of the labile disulfide bonds
Comparing the distribution of the 20 disulfide configurations between the entire PDB and the labile disulfides revealed two notable differences (figure 1a). Within the labile disulfide bonds, an increase in the +/-RHhook and +/-LHstaple configurations was observed (χ² test, p < 0.0001). The -LHspiral, which is the main structural disulfide, as well as the +RHspiral configurations were decreased in the labile disulfides compared with the total PDB (χ² test, p < 0.0001). The +/-RHhook is the predominant configuration of the catalytic disulfide bonds of oxidoreductases [7], such as PDI. This reflects the conserved position of this bond at the end of an α-helix in a thioredoxin fold. The catalytic disulfides of oxidoreductases undergo cycles of reduction and oxidation and there are several examples of oxidized and reduced structures in the PDB, hence their prevalence in the labile disulfide dataset. Of the 67 proteins in the labile disulfide dataset that have a +/-RHhook configuration, 22 are oxidoreductases (electronic supplementary material, table S2).
The secondary structures that a disulfide links can be informative. For instance, allosteric -RHstaple bonds often link adjacent strands in the same antiparallel β-sheet or constrain β-loops [4,16]. Also, the catalytic disulfide bonds of oxidoreductases link an α-helix to another or a loop structure. For the total PDB, disulfide bonds linking two β-strands were the most common, followed by linking of β-strands and loops (figure 1b). For labile disulfides, enrichment of bonds linking α-helices and loops was observed (χ² test, p < 0.0001) (figure 1b), which reflects the higher relative number of catalytic +/-RHhook configurations.
As anticipated, oxidoreductases were enriched in proteins containing labile disulfide bonds (figure 1c). Transferases, which include kinases, methyltransferases and other enzymes that transfer functional groups, were also enriched in proteins containing labile disulfide bonds (figure 1c). Hydrolases and proteins involved in immune function were the largest categories of disulfide-containing proteins in the PDB. Immune function proteins contain relatively fewer labile disulfides in this analysis. Overall, labile disulfide bonds were found in proteins of diverse functionalities.
The subcellular localization of proteins containing disulfide bonds was examined using the UniProt designation of the protein. A high proportion of cytoplasmic and nuclear proteins contained labile disulfide bonds (figure 1d). The cytoplasm and nucleus are environments traditionally thought not to be conducive to disulfide bond formation. This is not the case, however, as 509 disulfide bonds have been structurally defined in cytoplasmic proteins (electronic supplementary material, table S1). A high proportion of these disulfides (113, electronic supplementary material, table S2) have been characterized in oxidized and reduced states, indicating that they are unusually labile.

The labile disulfides are characterized by high strain
The conformational constraints on allosteric disulfides imposed by secondary structural features stress the bonds. The stresses fine tune their cleavage and thus the function of the protein. The stresses of the labile disulfides have been compared and contrasted with the average disulfide. There are different measures of disulfide bond stress [7,8].
Figure 1. (a) Distribution of disulfide bond configurations in all PDB protein structures and in labile disulfide bonds (electronic supplementary material, table S2). Compared with the total PDB, the labile bonds are enriched in +/-RHhook and +/-LHstaple bonds (χ² test, p < 0.0001) and have relatively fewer -LHspiral and +RHspiral bonds (χ² test, p < 0.0001) (indicated by *). (b) Heatmap displaying the frequency of the secondary structures linked by disulfide bonds in all PDB protein structures and by labile disulfide bonds. There is enrichment of disulfides linking α-helices and loops in labile disulfide bonds (χ² test, p < 0.0001). (c) Distribution of the functional classification of all proteins containing disulfide bonds and proteins containing labile disulfide bonds. Compared to the total PDB, there was a significant increase in oxidoreductases, transferases and isomerases (indicated by *). A significant decrease in disulfide bonds in proteins involved in signalling and immune function was observed (χ² test, p < 0.0001). (d) Subcellular localization of all proteins containing disulfide bonds and proteins containing labile disulfide bonds. Compared to the total PDB, a significant increase in cytoplasmic proteins, as well as a decrease in membrane associated and secreted proteins, was observed (χ² test, p < 0.0001) (indicated by *).
Figure 2. (b) Relative frequency of DSE ranging from 0 to 60 kJ mol⁻¹. The DSE was significantly increased for labile disulfides compared to all disulfide bonds in the PDB (p < 0.0001). (c) The relative frequency of the sulfur-sulfur bond distance ranging from 1.96 to 2.14 Å is shown. An increase in sulfur-sulfur bond distance is observed for labile disulfide bonds (p < 0.0001). (d) The average of both α angles was calculated for each disulfide bond. Shown is the relative frequency of the average angle ranging from 95 to 120°. Angles are increased for labile disulfide bonds (p < 0.0001). T-tests were used to compare total PDB to labile disulfides.
Dihedral strain energy (DSE) is an indicator of bond strain. The DSE is defined in terms of the torsion of the five dihedral or χ angles (figure 2a) of the cystine residue [17,18], and has been shown experimentally to reflect the amount of strain in a disulfide bond [19][20][21][22]. The length of the sulfur-sulfur bond and magnitude of the neighbouring angles (figure 2a) also reflect the stress in a disulfide [8]. The allosteric −RHstaple and −/+RHhook disulfide configurations carry tensile pre-stress in the bond due to direct stretching of the sulfur-sulfur bond and α angles, rather than by dihedral angle torsions [8]. This was shown using force distribution analysis, a technique for calculating atom-atom and residue-residue forces from molecular-dynamics simulations. As mechanical stretching of sulfur-sulfur bonds increases their redox potential [9][10][11][12], the pre-stressed bonds are more susceptible to cleavage.
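To make the DSE concrete, here is a minimal Python sketch of an empirical torsional expression that is commonly used for this quantity. The text does not state the formula, so the coefficients below (8.37, 4.18, 14.64 and 2.51 kJ mol⁻¹) are an assumption that should be checked against [17,18]; the function name is hypothetical.

```python
import math


def dihedral_strain_energy(chi1, chi2, chi3, chi2_prime, chi1_prime):
    """Estimate the DSE (kJ/mol) from the five cystine dihedral angles in degrees.

    Assumed empirical form (not given in the text):
      8.37(1 + cos 3χ1) + 8.37(1 + cos 3χ1') + 4.18(1 + cos 3χ2)
      + 4.18(1 + cos 3χ2') + 14.64(1 + cos 2χ3) + 2.51(1 + cos 3χ3)
    """
    def term(k, n, angle_deg):
        # One torsional term: k * (1 + cos(n * angle))
        return k * (1.0 + math.cos(math.radians(n * angle_deg)))

    return (
        term(8.37, 3, chi1) + term(8.37, 3, chi1_prime)
        + term(4.18, 3, chi2) + term(4.18, 3, chi2_prime)
        + term(14.64, 2, chi3) + term(2.51, 3, chi3)
    )
```

Under this assumed form the value depends only on the torsions, which is why the sulfur-sulfur bond length and the α angles are treated as separate measures of stress.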
The mean DSE of labile disulfide bonds was significantly higher than that of all disulfide bonds in the PDB (figure 2b). The mean DSE for all disulfide bonds was 12.48 kJ mol⁻¹, whereas that of labile disulfide bonds was 17.64 kJ mol⁻¹. The mean sulfur-sulfur bond length (figure 2c) and average α angle magnitude (figure 2d) were also significantly higher than those for all disulfide bonds. The mean sulfur-sulfur bond length of all disulfides was 2.046 Å, whereas for the labile disulfide bonds it was 2.055 Å. While the increased bond length is small at approximately 1 pm, the high stiffness of sulfur-sulfur bonds means that this change can entail substantial stress. The mean α angle of all disulfide bonds was 104.7°, whereas for the labile disulfide bonds it was 106.1°. Between 1 and 2° of stretching is also seen for the allosteric −RHstaple and −/+RHhook disulfide configurations [8]. Thus, the labile disulfide bonds are more stressed than the average bond based on three measures of strain.
Correlations between the measures of strain on disulfides were examined (electronic supplementary material, figure S2). DSE positively correlated with stretching of the α angles for both labile (p < 0.0001) and total (p < 0.0001) disulfides. There was no correlation between DSE and sulfur-sulfur bond length, or between sulfur-sulfur bond length and α angles, for either labile or total disulfides. This indicates that stretching of the α angles is associated with high overall torsional strain, whereas sulfur-sulfur bond length is independent of the α angles.

The labile disulfides with allosteric configurations have higher dihedral strain
The allosteric -RHstaple, -LHhook and -/+RHhook configurations represented 9%, 6.5% and 4.5%, respectively, of all disulfide bonds in the PDB (electronic supplementary material, figure S3a). For those allosteric disulfide bonds where there are high resolution crystal structures (n = 29, electronic supplementary material, table S3), these percentages increased to 39%, 16% and 16%, respectively (electronic supplementary material, figure S3a). The labile disulfides with allosteric configurations had a higher mean DSE than all disulfide bonds (18.11 versus 12.74 kJ mol⁻¹, p < 0.0001, t-test) (electronic supplementary material, figure S3b), which is consistent with the known properties of two (−RHstaple and −/+RHhook) of the three allosteric configurations.
To determine whether specific biological processes or pathways are enriched among human labile disulfides, the labile dataset was analysed using the DAVID Functional Annotation Tool [14,15]. The coagulation and complement cascades and oxygen-sensing HIF-1 pathway are significantly enriched in proteins containing labile disulfide bonds.

Coagulation and complement cascades
When the endothelium that lines blood vessels is damaged, the process of thrombosis ensures that any leak is plugged and the endothelium is repaired. Thrombus formation relies on the deposition of two main components: platelets and the product of blood coagulation, fibrin. The complement system clears microbes and damaged cells from the circulation. The coagulation and complement pathways were enriched in proteins containing labile disulfide bonds (p = 0.0081, DAVID Functional Annotation Tool, KEGG pathway analysis, electronic supplementary material, table S2) [14,15] (table 1 and figure 3). Two proteins containing labile disulfide bonds are highlighted.

Urokinase-type plasminogen activator
Urokinase-type plasminogen activator (uPA) is a serine proteinase that converts the zymogen plasminogen into the active protease plasmin. It consists of serine protease, kringle and growth factor domains. A labile disulfide bond was found linking Cys50 and Cys111 in the kringle domain of uPA. Cleavage of this disulfide displaces the N-terminal charged loop that is implicated in uPA binding to its inhibitor, plasminogen activator inhibitor-1 (PAI-1) [29] (figure 4a). Replacing Cys111 with Tyr reduces uPA activity to 7% of wild-type protein [30].

Factor Xa
Factor X (FX) is a vitamin K-dependent plasma zymogen that plays an essential role in blood coagulation. FX is activated through the extrinsic or intrinsic coagulation pathway by the tissue factor-VIIa complex or the IXa-VIIIa complex, respectively. Activated FX (FXa) is then able to form the prothrombinase complex in association with factor Va, which cleaves prothrombin to form active thrombin on negatively charged phospholipid surfaces in the presence of calcium. A labile disulfide bond occurs in the catalytic [...] Loop-70 forms a single Ca²⁺-binding site [31], whereas loop-225 is a Na⁺-binding region [32]. Ca²⁺ and Na⁺ binding both increase the catalytic efficiency of FXa, and thus thrombin generation, in a synergistic manner [33].

Hypoxia inducible factor-1 signalling pathway
Low oxygen levels lead to the activation of the HIF-1 pathway, which is conserved across metazoan species. Activation of this pathway results in rapid accumulation of the transcription factor, HIF-1α, which stimulates expression of glycolysis genes in response to suppressed oxidative phosphorylation [34,35]. The HIF-1 signalling pathway is enriched in human proteins containing labile disulfide bonds (p = 0.026, DAVID Functional Annotation Tool, KEGG pathway analysis, electronic supplementary material, table S2) (figure 5). Three of these proteins are described below. All HIF-1 pathway proteins containing labile disulfide bonds are shown in table 2.

Protein kinase B (AKT1)
AKT1 is a serine/threonine kinase that plays a central role in glycogen metabolism, cell survival, proliferation and angiogenesis. Cytokines and growth stimuli activate phosphoinositide 3-kinase to generate phosphatidylinositol (3,4,5)-trisphosphate (PIP3) patches at the plasma membrane. AKT1 resides in the cytosol but translocates to the plasma membrane through interaction with PIP3 via its N-terminal pleckstrin-homology (PH) domain. AKT1 is then activated by phosphoinositide-dependent kinase 1 and mechanistic target of rapamycin complex by phosphorylation at Thr308 and Ser473, respectively. The PH domain contains a labile disulfide bond that links cysteines 60 and 77 (figure 6a). The disulfide has the archetypal -RHstaple allosteric configuration. Reduction of the disulfide bond results in movement of variable loop (VL) 3 adjacent to the disulfide bond and VL1 that lines the phosphoinositol binding pocket (figure 6a). Additionally, a short acidic α-helix in VL2 is present when the disulfide is intact but not when the bond is cleaved. Cys60 and Cys77 are conserved in AKT2 and AKT3, as well as in mouse and rat AKT1.

Prolyl hydroxylase-containing protein 2
Prolyl hydroxylase-containing protein 2 (PHD2) is encoded by the EGLN1 gene. PHD2 controls the activity of HIF-1α by hydroxylation, which leads to polyubiquitination and degradation of the transcription factor. As oxygen levels decrease, PHD2 fails to hydroxylate HIF-1α, which then relocates to the nucleus to stimulate expression of glycolysis-related genes [35]. PHD2 contains a labile disulfide bond at Cys201-Cys208. Of the many crystal structures available in the PDB, about half show PHD2 with a reduced disulfide bond (table 2). The β2β3 loop encompassing residues 238-250 determines substrate specificity of PHD2 [36] and its position in the structure is influenced by the redox state of the Cys201-Cys208 bond (figure 6b). Notably, reactive oxygen species are implicated in control of PHD2 activity by mediating disulfide-linked homodimerization of the enzyme [37], leading to activation of the HIF-1 pathway and a switch from oxidative phosphorylation to glycolysis.

EP300
EP300 is a histone acetyl transferase that is crucial for normal gene regulation. EP300 and the closely related CBP are involved in binding and coordinating the assembly of transcription factor complexes to influence gene transcription. EP300 consists of multiple well-defined domains, including TAZ1 and TAZ2 domains, which surround the catalytic core. The catalytic core consists of a bromodomain that recognizes acetylated substrates, a HAT domain that acetylates histones and proteins [38,39], and a CH2 region containing a RING domain [40]. A labile disulfide bond was identified in the TAZ2 domain of EP300 that mediates interaction with transcription factors and binds zinc [...]. The bond can differentially form between two of three cysteines (see insets of figure 6c). The disulfide bond can link Cys1796 and Cys1806 or Cys1796 and Cys1801. The redox state of the disulfide influences the conformation of the loops linking the α-helices and may influence zinc binding (figure 6c). Zinc binding is increasingly recognized as being redox sensitive, with reactive oxygen species-mediated disulfide bond formation triggering the release of zinc ions [41].

Discussion
External mechanical forces regulate cleavage of protein disulfide bonds [9][10][11][12][42][43][44][45][46]. Rates of thiol/disulfide bond exchange are subject to mechano-chemical coupling. That is, the reactivity of a disulfide bond can be increased or decreased by mechanical forces that stretch, bend and twist the sulfur-sulfur and neighbouring bonds. For example, stretching of the sulfur-sulfur bond enhances cleavage of the disulfide.
Internal mechanical forces also control cleavage of protein disulfide bonds in an analogous fashion [8]. Two of the twenty disulfide bond configurations, the -RHstaple and -/+RHhook bonds, are particularly subject to topological stresses and allosteric function has been reported for seventeen of these bonds thus far. The -LHhook configuration is also associated with allosteric function, with seven examples thus far, although these bonds are no more stressed than the other eighteen configurations and the reason for this functional association remains to be determined.
While these biophysical properties are informative and have proved useful for identifying new allosteric bonds, it is likely that the majority of the approximately 2800 disulfide bonds with allosteric configurations in known protein structures will not be redox active. To facilitate identification of this post-translational control of protein function, we mined X-ray structures for labile disulfide bonds that exist in some structures of a protein but are reduced in others. Our hypothesis is that the facile nature indicates a propensity for cleavage and so possible allosteric regulation of the protein in which the disulfide resides.
The limitations of this analysis are the availability of crystal structures, potential differences in the qualities of the structures themselves, and the non-native conditions that may have been employed to obtain the crystals and structures, for example, purifying and crystallizing cytosolic proteins in oxidizing conditions. It is also possible that some of the identified labile bonds are the result of cleavage by X-rays during data collection [47]. The labile disulfide bonds are stressed, based on high average dihedral strain coupled with an elongated average sulfur-sulfur bond length and extended bond angles. As stretching of the sulfur-sulfur bond makes disulfides easier to cleave [9], this feature is likely a major reason why the identified bonds are labile. Five known allosteric disulfides were captured in the labile bonds and visual inspection of a number of the labile bonds suggested an allosteric function, which further supports the conclusion that the labile bonds are enriched in allosteric disulfides.
Blood coagulation and haemostasis are processes that are regulated by allosteric disulfide bonds [2]. Four different secreted oxidoreductases, PDI [48], ERp57 [49], ERp5 [50] and ERp72 [51], have been found to be essential for thrombosis in mice, and proteins involved in thrombosis, such as thrombospondin-1 [52], vitronectin [53], plasminogen [26] and tissue factor [54], are known to be regulated by allosteric disulfides. Notably, tissue factor expression [55] and activity [56] have been linked to complement activation. PDI and complement activation have also been linked to tissue factor decryption [57]. Anti-thymocyte globulin activates tissue factor on monocytes and PDI inhibitors block this activation. In addition, C5 complement activation on monocytes results in oxidation of cell surface PDI. It is perhaps not surprising, therefore, that the coagulation and complement cascades are enriched in labile disulfides. For instance, the Cys50-Cys111 disulfide in uPA may be a substrate for one or more of PDI, ERp57, ERp5 and ERp72. Cleavage of the bond could regulate the inactivation of uPA by PAI-1.
A high proportion of labile disulfide bonds occur in proteins that reside in the cytoplasm or nucleus. This finding implies that these intracellular compartments are conducive to this post-translational protein control. Cleavage and/or formation of the labile disulfides is presumably enabled by the precise redox buffering mechanisms of the cytoplasm/nucleus. Pathway analysis showed that proteins involved in the HIF-1 oxygen homeostasis system were enriched among human proteins containing labile disulfides. Notably, reversible disulfide bond formation has been speculated to be a common regulatory mechanism of oxygen sensors [41]. For example, a labile disulfide was identified in the histone acetyl transferase, EP300. The bond is in the TAZ2 domain that mediates interaction with oxygen-sensing HIF-1α. Cleavage and/or formation of this bond is predicted to influence HIF-1α binding.