How quickly can we predict trimethoprim resistance using alchemical free energy methods?

The emergence of antimicrobial resistance threatens modern medicine and necessitates more personalized treatment of bacterial infections. Sequencing the whole genome of the pathogen(s) in a clinical sample offers one way to improve clinical microbiology diagnostic services, and has already been adopted for tuberculosis in some countries. A key weakness of a genetics clinical microbiology is it cannot return a result for rare or novel genetic variants and therefore predictive methods are required. Non-synonymous mutations in the S. aureus dfrB gene can be successfully classified as either conferring resistance (or not) by calculating their effect on the binding free energy of the antibiotic, trimethoprim. The underlying approach, alchemical free energy methods, requires large numbers of molecular dynamics simulations to be run. We show that a large number (N = 15) of binding free energies calculated from a series of very short (50 ps) molecular dynamics simulations are able to satisfactorily classify all seven mutations in our clinically derived testset. A result for a single mutation could therefore be returned in less than an hour, thereby demonstrating that this or similar methods are now sufficiently fast and reproducible for clinical use.

inferential and so cannot make a prediction when it encounters a genetic variant not present in the catalogue, such as is the case for rare genetic mutations. Predictive methods are therefore needed to give the clinician some information about the likely effectiveness of a drug in treating an infection while, at the very least, they wait for the clinical sample to be cultured and tested in the traditional manner [11].
Trimethoprim is a competitive inhibitor of S. aureus dihydrofolate reductase (DHFR; figure 1a), an enzyme in the essential folic acid pathway encoded by the chromosomal gene dfrB. It is usually administered with sulfamethoxazole, which inhibits another protein in the bacterial folic acid pathway, and the combination is used to treat urinary tract and soft tissue infections. Predicting resistance to trimethoprim is a good test of any novel AMR predictive method since there exists a large amount of structural, biophysical and clinical data and the most common mutation that confers resistance to trimethoprim is F99Y (figure 1b), which is a comparatively small mutation and therefore is a stringent test of any predictive method.
Since DHFR is essential, our hypothesis is that non-synonymous protein mutations confer resistance by reducing how well the antibiotic, but not the natural substrate (dihydrofolic acid, DHA), binds. This reduces the problem to calculating how the binding free energy of the drug (ΔΔG tmp ) changes upon introducing the protein mutation. If the mutation reduces the binding free energy below a pre-determined threshold, then one predicts that the mutation confers resistance. Given the subtle nature of the mutations involved and the small size of this protein, we assume that only alchemical free energy methods which are derived from classical statistical mechanics will be sufficiently accurate and precise. It has been previously shown that such methods can be successfully employed to predict the effect of individual amino acid mutations on the action of trimethoprim [13]. By applying simple kinetic theory to clinically observed minimum inhibitory concentrations of trimethoprim for resistant and susceptible samples the previous study was able to establish that for a mutation to confer resistance, ΔΔG tmp ≥ 0.8 kcal mol −1 .
For such a method to be deployed clinically it must be both fast and consume as little computational resource as possible. While broadly successful, the previous study required 32 344 molecular dynamics simulations to be run, yielding a total of 8.1 μs. At the time of writing, one can simulate about 10 ns per day of DHFR using 4 computer cores slaved to a single consumer-grade graphics processing unit (GPU). The calculations underlying a single prediction therefore would require 9720 CPU hours and 2430 GPU hours which, although feasible, is still too large for routine use.
Any method must also meet the thresholds for accuracy and reproducibility as laid out by the existing international standards for new AST methods [14]. The relevant criteria are the very major discrepancy (VMD) and major discrepancy (MD) rates. The former is defined as the number of samples that are classified as susceptible by the method under test which the reference method determined as being resistant as a proportion of the total number of resistant samples and, to pass, VMD must be less than or equal to 3%. The definition of the MD rate is similar but inverted, i.e. the number of samples incorrectly interpreted as resistant that are susceptible. Again, to pass, MD must be less than or equal to 3%.
In this paper, we shall examine how reducing the computational resource allocated to the calculations affects the qualitative prediction of antibiotic resistance and its reproducibility and thereby answer the question: just how quickly can we reliably predict the effect of a mutation in dfrB on the action of trimethoprim? The answer to this question will guide whether it is yet feasible to consider deploying this kind of approach clinically.

Datasets
A previous study calculated 32 independent values of how the binding free energies of both the antibiotic, trimethoprim (ΔΔG tmp ), and the natural substrate, DHA (ΔΔG dha ), changed for each of seven mutations [13]. Here we shall focus solely on the effect of the mutations on the binding free energy of the antibiotic (ΔΔG tmp ). Three of the mutations (F99Y, F99Y/ L21V, L41F) are known to confer resistance (figure 1b) while the remaining four (F123L, A135T, V76A, I83V) have no clinical effect on the action of trimethoprim [15]. Each free energy (ΔΔG tmp ) required four alchemical free energies to be calculated (electronic supplementary material, figure S1). Two of those energies (ΔG 1 and ΔG 6 ) were themselves the sum of three further alchemical free energies, hence 8 free energies were required in total. Each free energy in turn necessitated between 8 and 16 molecular dynamics simulations, each at a different value of the progress parameter, λ. By the standards  Figure 1. The structure of S. aureus DHFR [12] showing (a) the overall topology and the trimethoprim (TMP) binding site and (b) the location of the seven mutations studied. The three mutations that confer resistance are coloured in different shades of red, while the four mutations that have no clinical effect on the action of trimethoprim are coloured in different shades of blue.
royalsocietypublishing.org/journal/rsfs Interface Focus 10: 20190141 of the field, all of the molecular dynamics simulations were short, at just 250 ps in duration. We call this existing collection of simulations Set1 (table 1). So that we may assess whether the calculations are converged, we first extended the simulations underlying ten of the 32 free energies by an order of magnitude, i.e. to 2.5 ns (Set2, table 1). To allow us to further assess the convergence of the calculations, we ran an additional five calculations for the F99Y mutation using simulations 25 ns long; we call this Set3 (table 1).

Assessing the convergence of the free energy calculations
First, let us consider how extending the duration of the molecular dynamics simulations affects the accuracy and precision of how F99Y alters the binding free energy of trimethoprim, ΔΔG tmp . For simplicity, in this and all subsequent calculations, the first half of each trajectory is discarded to remove transients. The mean and standard deviation of ΔΔG tmp is plotted as a function of how much data are included, as described by the simulation duration, t (figure 2). During the first few tens of picoseconds (Set1) the variance decreases, as one might expect, with the mean remaining approximately constant up to 0.25 ns and in agreement with the published thermodynamic data for this mutant measured using isothermal titration calorimetry (ITC) [12,[16][17][18][19]. Swapping to Set2 allows us to examine the behaviour in the range 0.25 ns < t ≤ 2.5 ns at the cost of a reduced number of values of ΔΔG tmp from which to calculate statistics (n = 15 versus n = 37, table 1). Here, the mean value of ΔΔG tmp rises and then falls slightly; however, the variance is observed to gradually increase. Again, it agrees with the published ITC data [12,[16][17][18][19]. Finally, moving into the range 2.5 ns < t ≤ 25 ns using Set3 we observe the mean value of ΔΔG tmp to fluctuate between 0 and 2 kcal mol −1 while the variance remains similar, or increases slightly. The net result is here there is only intermittent agreement with the published ITC data [12,[16][17][18][19]. Note that only five simulations are contributing to the statistics in this last regime (table 1) and therefore one is least confident about the observed trends. The thermodynamic cycle (electronic supplementary material, figure S1) describes how ΔΔG tmp can be decomposed into four separate free energies, ( 2 :1) Table 1. We took a published set of 32 values of ΔΔG tmp per mutation (Set1) and extended the simulations underlying ten of these values by an order of magnitude creating a second set (Set2). A further five values of ΔΔG tmp were calculated for the most commonly observed mutation associated with resistance, F99Y, using simulations two orders of magnitude longer than Set1 (i.e. 25 ns). Note therefore that Set2 is a subset of Set1 since the first 250 ps of all the simulations in Set2 appear also in Set1.
Since an additional five values of ΔΔG tmp were calculated for F99Y mutation (Set3), Sets 1 and 2 for this mutation contain 5 more members than the other mutations with 37 and 15 members, respectively. At short simulation durations, the value of ΔΔG tmp is in agreement with the results of published isothermal titration calorimetry (ITC) data [12,[16][17][18][19]; however, in the region 2.5 < t ≤ 25 ns, there is scant agreement due to the large deviations in the mean of ΔΔG tmp . For each of the three sets of simulations, the mean and standard deviation of the values of ΔΔG tmp are plotted as a function of the simulation duration, t. The dotted line is the threshold (0.8 kcal mol −1 ) above which a mutation would be clinically classified as resistant [13] and the shaded grey area is the 95% confidence limit for the four published values of ΔΔG tmp for the F99Y mutation [12,[16][17][18][19].
royalsocietypublishing.org/journal/rsfs Interface Focus 10: 20190141 where ΔG 1 and ΔG 6 are the free energies of introducing the mutation into the protein when trimethoprim is either absent or bound, respectively. To prevent the ligand unbinding from the protein, a simple distance restraint was applied during the ΔG 6 transition. The remaining two free energies, ΔG 5 and ΔG 7 , estimate the cost of removing this restraint from the mutant and wild-type protein, respectively. By considering Set1, Set2 and Set3 in turn, the variation of all four of these free energies in the regime 0.05 < t ≤ 25 ns is shown in electronic supplementary material, figure S2. Once t > 5 ns, there is little change in the mean and the standard deviation is small for the two alchemical free energies, ΔG 1 and ΔG 6 , and hence they appear to have converged. Interestingly, although when t ≤ 5 ns the means of both values drift (especially in Set2), they appear partially correlated, such that the mean of the difference, as prescribed by equation (2.1), varies less than its components. Note also that at small values of t (Set1), the variance in ΔG 1 (apo) is larger than ΔG 6 (complexed), which might be expected given the experimental structure had trimethoprim bound. By contrast, the variances of the free energies of removing the restraints (ΔG 5 and ΔG 7 ) are small at the shorter simulation durations (Set1) and grow in Set2 before in Set3 approaching 4 kcal mol −1 while the mean fluctuates by around 2 kcal mol −1 .
The observed reduction and then subsequent growth in the variance of ΔΔG tmp for the F99Y mutation observed in figure 2 is hence due to two different effects; at short times (Set1, as previously published [13]) increasing the duration of the simulations reduces the apparent variance in the alchemical free energies, ΔG 1 and ΔG 6 , and there appears to be little contribution from the cost of removing the restraints. But as the simulations are extended, the variances of the restraining free energies, ΔG 5 and ΔG 7 , increase until it is their variances that are dominating the apparent error of ΔΔG tmp .
Similar trends are observed for the other six mutations (figure 3); the variance in ΔΔG tmp decreases with simulation The mean and standard deviations as a function of simulation duration for the other six mutations. As seen in the F99Y mutation (figure 2), the variance in ΔΔG tmp initially falls as the simulation duration increases, but then rises again. To allow a wide range of simulation durations both (a) Set1 and (b) Set2 were used which contain 32 and 10 values of ΔΔG tmp per mutation, respectively. The dotted black line is the resistance threshold [13]: if ΔΔG tmp is greater than this value then we predict that it will confer resistance to trimethoprim.
royalsocietypublishing.org/journal/rsfs Interface Focus 10: 20190141 duration before increasing again. Inspecting the four component free energies (electronic supplementary material, figure S3 and S4) shows that, with the exception of F123L, the initial decrease in the variance of ΔΔG tmp is mainly driven by convergence of the two alchemical free energies, ΔG 1 and ΔG 6 , and, like F99Y, any subsequent increase in the variance is mainly driven by the increasing variance of the free energies of removing the restraints (ΔG 5 and ΔG 7 ). And, as one would expect, these effects are most pronounced for the mutations where the largest number of atoms are being perturbed (L41F, F123L). Why is the variance of the cost of removing the restraints increasing with the simulation duration? Since the restraining potential is weaker for high values of the alchemical progress parameter, λ, one possible reason might be that after sufficient time the ligand is able to unbind or change conformation. Visually inspecting the simulation trajectories confirms that this is not the case, although it remains possible a more subtle effect is occurring that has not been detected. Alternatively, a more complex set of restraints or a flat-bottomed harmonic potential may behave better; however that is outside the scope of the present work.
In absolute free energy calculations, restraints are required to keep the ligand in the bound state as the ligand is decoupled [20,21]. For relative binding free energy calculations, as done here, a restraint is only necessary if the ligand departs from the bound conformation during the simulation [20]. To determine retrospectively if restraints are needed for these mutations, five unrestrained 25 ns simulations of each endstate (e.g. F99Y) with trimethoprim bound for each of the seven mutations were run. The ligand did not depart from the binding site in any of the simulations (electronic supplementary material, figure S5), strongly suggesting that restraints are not, in fact, required for these calculations. That said, unless one runs a series of unrestrained simulations beforehand to check if the mutation does not cause the ligand to unbind, it remains best practice to include restraints to prevent any unbinding events from occurring.
We conclude that the relative binding free energy calculations shown here are not converged, although we note that some of the component free energies (ΔG 1 and ΔG 6 ) for some of the mutations show some evidence of converging as t → 2.5 ns. Due to the lack of convergence, one must assume that the resulting numerical values of ΔΔG tmp are likely inaccurate and that the reported errors are underestimating the uncertainty. We are, however, primarily interested here in how rapidly can we make a qualitative prediction of antibiotic resistance and so we shall now turn our attention to this.

Reducing the simulation duration decreases the sensitivity of the antibiotic resistance prediction
Let us use Set1 to examine how the method performs when simulations with very short durations are used to predict the effect of each mutation on the binding, and thence action, of trimethoprim. We will evaluate the method by comparing its ability to predict the effect of the individual mutations and then, by aggregating the results, calculate the overall sensitivity and specificity and the VMD and MD rates. At a specified simulation duration, t, ten values of ΔΔG tmp were drawn with replacement from Set1 for the mutation being studied. The mean value of ΔΔG tmp and its associated error (95% confidence) were then calculated. If this value was greater than the threshold of 0.8 kcal mol −1 that defines clinical resistance [13] then the mutation was predicted as conferring Resistance (R); conversely, if this value lay below the threshold, it was predicted as having no effect, i.e. the sample would be Susceptible (S) to trimethoprim. Finally, if its confidence limits bracketed the threshold, then it was classified as having an Unknown effect (U). This was repeated 100 times at this value of t (figure 4), thereby allowing us to examine the reproducibility of the method. By then repeating the process across all mutations for all values of t, it also allows us to examine how reducing the simulation duration affects our ability to predict the effect of each mutation on trimethoprim when only ten values of ΔΔG tmp are used.
If we allow the method to use all the data (t = 250 ps) then it correctly and reproducibly predicts the three mutations clinically associated with trimethoprim resistance (F99Y, F99Y/ L21V and L41F). This can be deduced from the earlier graphs of ΔΔG tmp (figures 2 and 3) since the values for these three mutations and their variances lie above the clinical threshold for resistance. Of the four mutations known to have no effect, the method correctly and reproducibly predicts two to be susceptible (A135T and V76A). The other two are predicted either to be susceptible or to have an unknown effect (F123L and I83V), with the latter more probable than the former. Now let us consider the effect of reducing the duration of the alchemical simulations. First, and perhaps most importantly, we observe that, across all simulation durations, the method very rarely makes an incorrect definitive prediction, i.e. falsely predicts a mutation either to confer resistance when it has no effect (a major discrepancy) or to be susceptible when it confers resistance (a very major discrepancy). The clearest example is when a very short duration is used the F123L mutation is incorrectly predicted to confer resistance with an approximately 25% probability. As a result, we see that as the duration of the simulations is decreased, the probability that the method returns an unknown prediction rises. The net effect of this is that once the method has returned a definitive prediction (i.e. resistant or susceptible) then that is overwhelmingly likely to be correct.
If we combine the results for all the mutations, the resulting sensitivity is high (t ≥ 0.5 ns: mean 94.9%, maximum 100%, minimum 74.3%), while the specificity varies between 50 and 60% (t ≥ 0.5 ns: mean 58.6%, maximum 64.0%, minimum 53.8%). Formally since the definitions of sensitivity and specificity assume a binary not a ternary phenotype, the implication of a low specificity is that the method is incorrectly classifying Susceptible samples as Resistant, which is not the case since they are, for the most part, being classified as Unknown. We are therefore perhaps being unduly conservative by including in the denominator cases where an Unknown phenotype has been predicted.  probability of making an Unknown prediction results in a lower specificity; however, the sensitivity remains high. (c) The very major discrepancy (VMD) rate remains below the required 3% threshold throughout while the major discrepancy (MD) rate reaches a maximum of 25% before falling below the threshold after t ∼ 0.01 ns.

Using a fixed amount of computational resource
royalsocietypublishing.org/journal/rsfs Interface Focus 10: 20190141 with both having a small probability (5-10%) of being classified Unknown at larger values of t. The other two mutations (F123L and I83V) are most likely to be predicted as having an Unknown phenotype, with both consistently having a very small probability of being incorrectly predicted as Resistant and F123L having a high probability of being incorrectly classified as Resistant when t < 20 ps. Aggregating these results leads to approximately constant values of the sensitivity (t ≥ 0.5 ns: mean 81.2%, maximum 90.0%, minimum 71.3%) and specificity (t ≥ 0.5 ns: mean 54.7%, maximum 69.0%, minimum 45.2%), although again we emphasize that by including the cases predicted Unknown in the denominator of the sensitivity and specificity calculations, we are probably being over-conservative and excluding these cases would result in much higher values for the sensitivity and specificity.

How fast could we predict whether a mutation confers resistance to trimethoprim?
The detailed analysis in figures 4 and 5 enables us to choose values of (N, t) to minimize the time required to make a satisfactory set of predictions; however one does not a priori usually have this information and, in any case, it is possible that as yet unseen mutations in dfrB could have effects not congruent with our testset of seven mutations. For example, our testset does not contain any mutation that is marginally resistant (which explains why the sensitivity is always greater than the specificity). This analysis therefore should be taken as indicative; however, it is clearly of interest to consider just how rapidly one could predict whether a mutation confers resistance to trimethoprim or not. We shall choose N = 15 and t = 50 ps (figure 5) since the trajectories are long enough to avoid the observed transients in classification behaviour which affect the major discrepancy rate but are still short enough to run quickly using a consumer grade GPU. Using these parameters, the sensitivity and specificity are predicted to be 86.3% and 70%, respectively, with no VMDs or MDs. The lack of any VMDs or MDs illustrates again that the method is either returning a correct prediction of Resistant or Susceptible, or is returning a prediction of Unknown. We estimate that, at the time of writing, a single 50 ps trajectory will take approximately 7.5 min to complete, assuming the GPU is large enough (or there is more than one on the mother board) so that all the replicas can run on the same node to permit Hamiltonian replica exchange which requires rapid communication between the replicas. Since we have restricted ourselves in this paper to only calculating ΔΔG tmp , each value therefore only requires eight alchemical free energies to be calculated, each in turn requiring (at least) eight coupled molecular dynamics simulations, making a total of 960 simulations, which is still daunting. If we arbitrarily decide that a prediction must be complete within one hour, then twenty nodes, each with one or more GPUs and 8-16 CPU cores, could run the simulations in 45 min. Such a facility could easily be provided by a large research university or a commercial cloud platform. That would leave 15 min for setup and analysis: preparing all simulation input files on a single machine would likely create a bottleneck, and hence one would also have to distribute all the setup tasks onto the high-performance computer. We conclude that, while there are obvious challenges, it is now feasible to predict whether individual mutations confer resistance to an antibiotic using free energy methods and that this can be done fast enough to be clinically useful.

Prevalence of the studied mutations in the European Nucleotide Archive
The seven different non-synonymous dfrB mutations studied here were selected from a relatively small dataset of 501 unrelated S. aureus isolates collected from patients in the UK [13,15]. Our analyses and conclusions depend on our testset of seven mutations being representative of the likely mutations in dfrB one might encounter clinically. To estimate how prevalent these mutations are globally, we searched an index of the European Nucleotide Archive [22]. Due to how the index is created, only results for amino acids 11-148 (inclusive) were returned, and samples containing multiple amino acid mutations fewer than 10 positions apart are unlikely to have been detected.  [15], only two were found in the ENA: A135T (n = 6578, 34.3%) and V76A (1412, 7.4%). Further examination of the results returned for k-mers used to probe codon variation at Val76 and Phe123 showed that comparatively few sets of short-reads in the ENA were identified around these positions, suggesting that some samples were missed since they contained multiple amino acid mutations within the width of the k-mer (21 amino acids) or that similar k-mers are found in other species and therefore our ability to detect variation at these sites using this method is probably limited.

Discussion
We conclude that it is now possible to rapidly and reproducibly predict whether individual non-synonymous mutations confer resistance (or not) to trimethoprim, an antibiotic, and we have shown that, for this system at least, it is theoretically possible and practical to make a prediction in less than 1 h. This relies on our observation that calculating a large number of values of ΔΔG tmp using very short alchemical molecular dynamics simulations allows the seven mutations in our test set to be adequately classified with acceptable sensitivities and specificities and low very major and major discrepancy rates, thereby using an order of magnitude less computational resource than a previous study [13]. The relatively small amount of computation resource required also opens up the possibility of running calculations ahead of time. For example, given a large enough royalsocietypublishing.org/journal/rsfs Interface Focus 10: 20190141 computational resource one could predict the effect of all possible single-nucleotide non-synonymous mutations in a small protein like DHFR, thereby providing a lookup table so that a prediction can be returned instantaneously to the clinician. One can further imagine that this data could be applied to help de-risk lead compounds in drug development for resistance, since it would reveal how a protein could evade the action of a small-molecule inhibitor by mutation.
We note, however, that the values of ΔΔG tmp are not yet converged and this appears to be due to the restraints applied to prevent the ligand leaving the binding pocket. Although this invites further investigation, since we have demonstrated that restraints are not required to keep the ligand bound to the protein, it suggests that in future it may be simpler and preferable to run such calculations without restraints, thereby reducing the computational overhead yet further. What is curious is that while the values of ΔΔG tmp are not yet converged and their precision is likely underestimated, they appear sufficiently accurate to satisfactorily predict whether a mutation confers resistance to trimethoprim or not across a wide range of simulation durations. If the mutation under study has a value of ΔΔG tmp that is close to the resistance threshold, however, then quantitative accuracy will be important.
Our conclusions are reliant on the seven mutations that form the testset and we have investigated whether these are representative of the genetic variation one might expect to observe clinically. Since several of mutations were detected at either very low prevalences or not at all in the European Nucleotide Archive, we conclude that our testset is not truly representative and therefore our conclusions may not transfer into the clinic. Clearly, the method needs to be tested on additional dfrB mutations as well as other antibiotic/protein target combinations in a wide range of pathogenic bacteria before we can make more definitive statements about its applicability.
Although we were able to show that calculating 15 values of ΔΔG tmp from alchemical simulations only 50 ps long led to acceptable classification behaviour, this is almost certainly a form of overfitting since we had a free choice of a wide range of combinations and may have simply chosen one that works well for our testset. It will only be possible to gain confidence through applying the method 'blind' to other mutations. We have also restricted ourselves here to only considering the effect of the mutation on the binding of the antibiotic; previous work has shown that taking the effect on the natural substrate, DHA, into account changes and may improve the prediction [13]. Also, all of the mutations studied (with the possible exception of the double mutant, F99Y/ L21V) are tractable by alchemical methods: it remains to be seen how successful this approach will be for mutations involving a change in electrical charge, that involve a proline, or simply require a large number of atoms to be perturbed.
Our method assumes that all the eight alchemical free energies required to calculate a single value of ΔΔG tmp all converge and behave similarly, which we have shown not to be true (electronic supplementary material, figures S2-S4). Further work is needed to assess if different types of alchemical transition (e.g. removing the electrical charges from the alchemical atoms) require more or less simulation time and/ or numbers of molecular dynamics simulations. It is possible up to another order of magnitude of savings is available through careful dynamic control (i.e. 'steering' [23]) of the makeup and number of alchemical free energies run. This will necessarily complicate the calculation of errors which was done here at the level of ΔΔG tmp and in future will likely have to be done at the level of each alchemical free energy with errors then added in quadrature.
The translation of genetics into clinical microbiology shows no sign of abating, with the most progress being made using whole-genome sequencing for AST of tuberculosis where catalogues of observed genetic variants and their associated effects on different drugs are most advanced [24]. Since this approach is purely inferential, there remains a need to develop predictive methods [11]. Even a low-quality prediction for a single antibiotic may prove useful clinically, since it will be viewed alongside the results for other drugs which often allows nonsensical predictions to be discounted. Such a Bayesian approach has already been shown to improve AST of tuberculosis [11]. In any case, if the prediction ends up affecting the clinical decision, it is likely that the sample would still be sent for culturing and testing.
Although we have focused primarily on calculating the effect of the mutation on the binding free energy of the antibiotic, there are a range of other methods that could be brought to bear. Machine learning methods, using genetic, structural and/or chemical features, are likely to be sufficiently accurate and also fast [11,25,26]. Such methods could be used to screen out mutations that have no effect on an antibiotic, leaving only the marginal cases for computationally more intensive approaches such as proposed here.
One key advantage of our approach we have not discussed in detail is that it rarely makes an incorrect classification. This potentially enables a guided process whereby simulations are run until a definite (Resistant/Susceptible) prediction is returned, saving further computational resource and time. More work needs to be done if this, or other, methods are to be formally certified for use in AST. For example, to pass the relevant international standard [14], not only must the very major and major discrepancy rates be less than or equal to 3%, but also there must be a high level of categorial agreement with a reference method. This will necessitate carefully designed and thorough studies in collaboration with clinical microbiology laboratories using standard methods.
Clinical microbiology is built upon a binary paradigm of a sample being Resistant or Susceptible. Converting a quantitative measurement which has a confidence limit into a binary result necessitates a third category (here called Unknown) for those cases that cannot be definitely classified as R or S. This is a subtle point (and one can probably trace its origins back to the Law of the Excluded Middle) and it is nonsensical to deny predictive and experimental methods the option of returning an Unknown result. The Clinical Laboratory and Standards Institute, which provides clinical microbiology standards mainly to the USA, has adopted a ternary system; however, the European Committee on Antimicrobial Susceptibility Testing has only recently begun to introduce such a category into some antibiotic/pathogen combinations [27]. Many of the clinical tools implicitly assume the binary Resistant/Susceptible paradigm and new tools and language need developing, as we have seen here when calculating the sensitivity and specificity of our method.
As computational resource becomes faster and more widespread, the broader field of molecular simulation is gradually moving away from running single simulations towards running large number of replicas [28,29]. This potentially exposes underlying drawbacks with the molecular royalsocietypublishing.org/journal/rsfs Interface Focus 10: 20190141 dynamics codes and how we, as computational scientists, typically work. For example, when setting up and running thousands of molecular dynamics simulations the time taken to set up a simulation becomes appreciable. Likewise, one must use some kind of object store or file hierarchy to archive all the simulation files. Without progress in these areas, it is possible that the time taken to set up, copy files, queue simulations and retrieve and analyse files could become the limiting factor in speeding up and therefore applying these methods clinically.
In the field of alchemical free energy calculations much attention has understandably been focused on demonstrating that free energies can be calculated that agree with experimental data to a high degree of precision. A high degree of precision and accuracy is spurious in this application and one might speculate that applying alchemical methods in this way is a sign that the field is maturing. Finally, while thermodynamic integration is usually described as an equilibrium method, it is not obvious if this remains true when the duration of an alchemical simulation is only 50 ps. Despite this, it is both illuminating and encouraging to look back over the last 30 plus years on the progress made by the field of alchemical free energy calculations [30] and infer what might be possible in just a few more years.

Methods
The GROMACS molecular dynamics [31] simulations underlying the alchemical free energy calculations were set up and run as described previously [13] and followed best practice [32], including using pmx [33] to mutate the wild-type structure of DHFR with trimethoprim bound [12]. In brief, the structure of S. aureus DHFR with trimethoprim and NADPH bound (PDB:3FRE) [12] was mutated as required by pmx [33] and water and counterions added. To provide a range of starting structures to seed the different alchemical free energy calculations, each mutated protein, in both apo and trimethoprim-bound forms, was subjected to a short 2.5 ns equilibration simulation using an integration timestep of 1 fs. At a range of timepoints along each simulation, the mutant sidechain was 'phased-in' over 1000 steps by smoothly increasing λ, as per the Alchembed procedure [34]. Six different initial structures were created in this way for each mutant, thereby reducing the likelihood that short simulations are correlated. The generalized AMBER forcefield and AMBER ff9SB-ildn [35] were used throughout. Electrostatic forces were calculated by the particle mesh Ewald method using a real space cut off of 1.2 nm while van der Waals interactions were cut off at 1.2 nm, with a switching function applied from 0.9 nm. The temperature was maintained at 310 K using a Langevin thermostat with a time constant of 2 ps and the pressure held at 1 bar using an isotropic Parinello-Rahman barostat with a compressibility of 4.46 × 10 −5 bar −1 and a time constant of 1 ps.
The alchemical simulations underlying ten of the 32 values of ΔΔG tmp for each of the seven mutations were extended by an order of magnitude (from 0.25 ns to 2.5 ns). In addition five further values of ΔΔG tmp for the F99Y mutation were calculated using simulations two orders of magnitude longer (25 ns). To cope with the resulting very large numbers of molecular dynamics simulations, all data were stored in a file hierarchy and tagged using the datreant Python module [36]. All simulation data were then parsed and alchemical free energies were calculated as a function of simulation time t using a purposewritten Python class. In all cases, the first half of the available data were discarded and hence the free energy at time t was calculated by thermodynamic integration using the energies in the interval (t/2, t]. Using the simpler method of thermodynamic integration was possible because, as reported previously [13], there were no significant differences in values of ΔG calculated by either this method or the more complex multi-state Bennett acceptance ratio estimator. This yielded large tables of alchemical free energies for Set1, Set2 and Set3 and each was stored as a Pandas dataframe [37]. The resulting values of ΔΔG tmp were then calculated and stored. These dataframes can be found in the electronic supplementary material. Bespoke Python code then read these tables and applied the bootstrapping process described in the main body of the paper to produce the figures. Standard errors were calculated in the usual way and converted to a 95% confidence interval using the appropriate t-statistic for the number of values. All graphs were plotted using Matplotlib [38] and all protein images were rendered using VMD [39]. The BIGSI search index for microbial genomes [22] is interrogated using a k-mer where k ≥ 61 bases. Since searching the index for amino acid mutations involves permuting the bases in a triplet, leading to 80 different variations, we wrote a Python module, called pygsi, that automatically interrogates the index using a 63-mer constructed with 20 base pairs flanking the codon of interest [40]. Once all the permutations have been tried, the code moves on to the next codon.