The influence of base pair tautomerism on single point mutations in aqueous DNA

The relationship between base pair hydrogen bond proton transfer and the rate of spontaneous single point mutations at ambient temperatures and pressures in aqueous DNA is investigated. By using an ensemble-based multiscale computational modelling method, statistically robust rates of proton transfer for the A:T and G:C base pairs within a solvated DNA dodecamer are calculated. Several different proton transfer pathways are observed within the same base pair. It is shown that, in G:C, the double proton transfer tautomer is preferred, while the single proton transfer process is favoured in A:T. The reported range of rate coefficients for double proton transfer is consistent with recent experimental data. Notwithstanding the approximately 1000 times more common presence of single proton transfer products from A:T, observationally there is bias towards G:C to A:T mutations in a wide range of living organisms. We infer that the double proton transfer reactions between G:C base pairs have a negligible contribution towards this bias for the following reasons: (i) the maximum half-life of the G*:C* tautomer is in the range of picoseconds, which is significantly smaller than the milliseconds it takes for DNA to unwind during replication, (ii) statistically, the majority of G*:C* tautomers revert back to their canonical forms through a barrierless process, and (iii) the thermodynamic instability of the tautomers with respect to the canonical base pairs. Through similar reasoning, we also deduce that proton transfer in the A:T base pair does not contribute to single point mutations in DNA.


Introduction
This document provides further information in support of the work presented in the main paper. The first section provides more detail about the quantum mechanical benchmarking of calculations on a gas-phase G:C base pair. The second section contains data obtained from the QM/MM ensembles, in particular reporting on results from the individual replicas whose overall statistical properties are described in the main article. The final section contains further information on 'rare' proton transfer pathways that have been observed within the ensemble-QM/MM method. The description of input data, structure and trajectory files can be found at https://github.com/gh3orghiu/. Figure S1 shows the mean absolute deviation of the displacement for all atoms in the G:C base pair compared to a reference structure that has been optimised using MP2/aug-ccpvdz [S71]. For all QM-methods the deviation of geometries to the benchmark G:C structure do not exceed 7%. The largest errors arise from the B3LYP functional (without dispersion correction) and the M06-2X functional. The MP2/aug-cc-pvdz has a very low deviation to the geometry because the structure we are comparing it to is also optimised using the MP2 method. The LC-ωPBE-XDM/aug-cc-pvdz method converges to both good base pair geometries and hydrogen bond interaction energies. The B3LYP-XDM/aug-cc-pvdz method was selected as our QM method since it represents the best trade-off between computational cost and accuracy.

Base pair stacking interactions
A variety of DFT methods were benchmarked on their ability to best describe the stacking interactions between two stacked base pairs. The stacking interactions of ten unique combinations of base pairs were calculated. An example of one stacked base pair combination (GC:CG) is shown in Figure S2). The initial geometries of the stacked base pairs and the reference interaction energies (calculated using CBS(T)-F12-CP) were taken from the Supplementary Material [S72]. From there, single point energies for each combination of stacked base pairs were performed using NWChem using a variety of QM methods. The two-body stacking energy for each base pair stack, ∆E stack is calculated by the summation of the energies of the four-pair stacking interactions (as marked in Figure S2.) where ∆E XY is the interaction energy between nucleobases X and Y .
The inclusion of the four-body interaction term is given by

Ensemble-QM/MM reaction pathways
The mean electronic energies for the three most commonly observed reaction pathways are displayed in Table S1. These are as follows: G:C step-wise double proton transfer; G:C concerted double proton transfer; and A:T concerted single proton transfer.

G:C -Double proton transfer
The climbing image nudged elastic band (CI-NEB) pathways for each G:C replica in the 25-member QM/MM ensemble are shown in Figures S3 and S5. Each individual pathway is comprised of 24 images along the reaction coordinate. The electronic energies of the optimised geometries for the stepwise and concerted double proton transfer pathways are shown in Tables S2 and S3 respectively. Concerted double proton transfer mechanisms occur in several replicas (1, 15 and 21), while the remaining replicas all occur via the stepwise mechanism.

A:T -Single proton transfer
The climbing image nudged elastic band reaction coordinate for each single proton transfer A:T replica in the QM/MM ensemble is shown in Figure S4. Each individual pathway is comprised of 24 images along the reaction coordinate. The QM/MM energies of the optimised geometries for the concerted single proton transfer pathway are shown in Table  S4. Figure S4: Climbing image nudged elastic band reaction coordinates for the seven A:T replicas that show single proton transfer -out of 25 replicas from the QM/MM-ensemble. Energies were calculated using the B3LYP+XDM/aug-cc-pvdz/AMBER method. This section details the proton transfer mechanisms that we observed throughout the ensemble-QM/MM study. Due to the rarity and statistical insignificance of these processes, they are only discussed here for completeness.

G:C -Concerted single proton transfer
The QM/MM energies of the optimised geometries for the concerted single proton transfer pathway in G:C are shown in Table S5. Figure S5: Climbing image nudged elastic band reaction coordinates for the singe proton transfer G:C QM/MM-ensemble (observed in 1 replica of the 25 total). Energies were calculated using the B3LYP+XDM/aug-cc-pvdz/AMBER method.

A:T -Intra-rearrangement single proton transfer
One of the eight single proton transfer replicas for the A:T QM/MM-ensemble displayed a highly energetic intra-thymine rearrangement transition state, as shown in Figure S7. Figure S6: The mechanism for single proton transfer in A:T involving a high energy the intra-thymine rearrangement . The rearranging proton in this process is shown by the green asterisk. Figure S7: The electronic energy (blue) and the Gibbs free energy (red) as a function of the reaction coordinate for the intra-thymine rearrangement single proton transfer in the A:T base pair. Energies are calculated relative to the energy of the reactant.
Due to its extremely high activation energy of ∼70 kcal/mol, this transition state is deemed to be essentially irrelevant to the single proton transfer pathway. Due to unconverged transition state geometries, the two replicas shown in Figure S8 that displayed double proton transfer within A:T were replaced by two additional replicas in the final A:T QM/MM ensemble. It takes approximately 10 −10 seconds for the concentration of the G*:C* tautomer to reach equilibrium, which does not exceed the concentration of 8 × 10 −7 %. The A + :T − zwitterion reaches equilibrium faster, at 10 −13 s and at a significantly higher concentration (6×10 −2 %) than G*:C*.