The domain of computational biomedicine is a new and burgeoning one. Its areas of concern span all scales of human biology and pathology, from the genome to the whole human and beyond, including epidemiology and population health.
Its methodological underpinnings are those provided by the established use of modelling and simulation in the physical sciences and engineering, supported in various ways by applied mathematics and computer science. The intellectual and practical development of these disciplines has been strongly influenced, and indeed frequently dominated, by theory, the power and scope of which have been increasing relentlessly over the past several centuries. Some of the major advances in physics in recent years have come from experimental observations of phenomena, such as the Higgs boson and gravitational waves, which were predicted, respectively, 50 and 100 years ago. In areas of much greater complexity, such as the environmental sciences, theory has been making progress in predicting events before they occur, one good example being short-term weather forecasting (on timescales of days). Indeed, the scientific method as advocated by many textbooks on the philosophy of science is based on the Popperian view, according to which theories capture aspects of the true nature of the world and their validity is tested by confronting their predictions with experimental observations.
This is not how much of the life sciences and medicine operate. In comparative terms, they are much more empirical in nature. There is considerable reluctance among the peer group to put trust in theoretical predictions, so that the norm today is for theory, together with its adjuncts modelling and simulation, to be relegated to providing post hoc rationalization of observations. There has been some ‘give’ in this approach in recent times, in the context of machine learning and artificial intelligence, whereby a computer is accorded credibility for predicting an unobserved scenario because its output is derived from a large number of empirical data records fed into the machine. It is, however, very difficult to quantify the uncertainty in such predictions, which depend on a number of common assumptions about the underlying distribution of the data that may not be valid [1,2]. Indeed, machine learning is at root nothing more than a glorified form of curve fitting. With so many parameters to fit to a data set, no one should be surprised that it appears to do well; but can we trust it to perform on previously unseen data? With so many correlations made manifest by these approaches, the onus is on distinguishing false correlations from true ones, in general a hard task but one made feasible given an understanding of the structural characteristics of the problem at hand [2].
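The curve-fitting caveat can be made concrete with a toy illustration (not drawn from the references): a heavily parameterized model can reproduce its training data almost perfectly while generalizing poorly to unseen data drawn from the same process.

```python
# Minimal sketch: overfitting a noisy signal with a high-degree polynomial.
# All data here are synthetic; the "true" process is purely illustrative.
import numpy as np

rng = np.random.default_rng(0)

def noisy_signal(x):
    # hypothetical underlying process: a smooth trend plus measurement noise
    return np.sin(2.0 * np.pi * x) + 0.1 * rng.normal(size=x.shape)

x_train = np.linspace(0.0, 1.0, 15)
y_train = noisy_signal(x_train)
x_test = np.linspace(0.0, 1.0, 200)
y_test = noisy_signal(x_test)

for degree in (3, 14):  # a modest fit vs one with as many parameters as data points
    coeffs = np.polyfit(x_train, y_train, degree)
    rmse_train = np.sqrt(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    rmse_test = np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    print(f"degree {degree:2d}: train RMSE = {rmse_train:.3f}, test RMSE = {rmse_test:.3f}")
```

The high-degree fit drives the training error towards zero yet performs far worse on the unseen points, which is precisely the concern raised above about trusting predictions on previously unseen data.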
What the life sciences and medicine really require are mechanistic explanations—the why and the how behind observations in the laboratory and the clinic. Indeed, for a medical or clinical procedure to be approved by regulatory authorities, an explanation for the therapy must be provided. The assertion that we should carry out a particular procedure because a machine learning ‘algorithm’ predicted it will never wash. Instead, we need a description of biology and medicine that is based on theory, which enables modelling and simulation to be performed to provide rational, mechanistic explanations. Indeed, for so-called personalized medicine to work, it is essential that we can rely on predictive mechanistic models to provide explanations of processes, to probe ‘what-if’ scenarios, and so on. This can only be done using the Popperian models alluded to above; their ultimate scope can often be substantially extended in combination with machine-learning methods which pay careful attention to the structural characteristics of the problem at hand. This two-part theme issue on Computational biomedicine is about such approaches, in which mechanistic methods based on the laws of nature are invoked to provide high-fidelity descriptions and predictions of the behaviour of biomedical systems. The papers contained in this two-part theme issue were selected through a two-stage peer-review process, beginning with an open call for papers associated with the Computational Biomedicine 2019 Conference, held in London 22–27 September 2019 (https://www.compbiomed-conference.org/compbiomed-conference-2019/) and organized by the Computational Biomedicine Centre of Excellence (https://www.compbiomed.eu/), funded by the European Commission. The authors of successful abstracts were invited to submit full papers for possible publication in Interface Focus following further peer review.
In part I, we focus on molecular medicine, that aspect of the subject which deals with the lowest levels of structure and function, nearest to the genome and thus aligned with the reductionist view that our DNA provides the blueprint for our lives. It is the dominant part of the subject, built on the massive success of molecular biology over the past 60 or more years. In the main, we are concerned here with modern approaches to drug discovery and design, which is hardly a success story: around 10 years and $2–3 billion are quoted as average industrial figures for taking a drug from concept to market place [3], and the resulting drugs at best work for no more than around 50% of the population [4,5]. Set against the pandemic which has now engulfed us and demands solutions at breakneck speed, we have a very long way to go. Moreover, we live in the post-genomic era and are all very familiar with the notion that one-size-fits-all has limited validity in medicine. The only way in which this rather broken industry can be improved is by much more effective and persistent use of appropriate forms of information technology, certainly not the use of IT for its own sake. Among these, the ability to predict, by rapid, accurate, precise and reliable computational means, the optimal compounds to select as drug candidates is of paramount importance. If that can be done reliably enough for the predictions to be actionable, such an approach will streamline the extremely labour-intensive process of making and testing compounds in the laboratory, as well as providing a much more dynamic basis for future in silico clinical trials by allowing feedback between trial outcomes and the eventual selection of a reliable drug.
Several articles in this issue are dedicated to approaches that will help to change the way drug discovery is done in the longer term.
In [6], Wan et al. describe how a combination of hit-to-lead and lead-optimization methods has been successfully applied to the scientifically and pharmaceutically important class of G-protein coupled receptors, showing how initial lead compounds of diverse nature can be selected and then refined in subsequent, more compute-demanding but less numerous studies.
König & Riniker [7] investigate the discrepancies between classical, force-field-based protein structure calculations using molecular mechanics and the more accurate quantum mechanics which underlies their ‘true’ behaviour, but which is too costly to use in present-day computational techniques. The authors find significant discrepancies between all the widely used force fields and quantum mechanical predictions (themselves based on a particular level of approximation), indicating that enhanced accuracy of the classical parameterization will in the future need to draw on more carefully tuned force fields or ones based directly on quantum calculations.
One of the most exciting ways in which molecular simulation may play a direct role in clinical medicine is through its application to determine which drug a given genotype or genetic variant may respond to most effectively. Philip Fowler shows in his contribution [8] how free energy estimation may be used to treat bacterial (and viral) infections on a personalized basis, following sequencing of the tuberculosis genome from each infected individual. The turnaround times are now short enough for the method to be used in clinical practice.
The nature of such free energy calculations requires access to powerful computational resources. To achieve the highest rates of throughput, it is necessary to have access to the latest supercomputers, whose architectures today, based on a vast proliferation of nodes each containing large numbers of cores and accelerators (mainly general-purpose graphics processing units), facilitate ensemble-based studies comprising very large numbers of ligand–protein simulations. Zasada et al. [9] describe ways in which such free energy calculations can be off-loaded and run with considerable facility on modern computational clouds, by exploiting concepts such as virtual machines and containerization to facilitate secure, rapid and reliable deployment. Indeed, their high-throughput binding affinity calculator can be used to run such calculations as hybrid workloads across combinations of distributed heterogeneous resources, from the ultra-high-end of high-performance computing to elastic compute clouds.
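As a rough illustration of why such workloads map so naturally onto clouds and many-node supercomputers, the sketch below (entirely hypothetical, and not the authors' binding affinity calculator) fans out an ensemble of independent replica simulations and aggregates their per-replica free energy estimates; each stubbed replica stands in for a full molecular dynamics run that would, in practice, execute inside a container on a cloud node or on an HPC compute node.

```python
# Minimal sketch of the ensemble pattern: many independent replica
# simulations launched concurrently, results aggregated afterwards.
import random
import statistics
from concurrent.futures import ProcessPoolExecutor

N_REPLICAS = 25  # replicas differ only in their initial conditions / random seed

def run_replica(seed: int) -> float:
    """Stand-in for a single ligand-protein free energy simulation.

    A real workflow would stage input files, invoke an MD engine and
    post-process the trajectory into a binding free energy estimate.
    Here it simply returns a toy value.
    """
    rng = random.Random(seed)
    return -8.2 + rng.gauss(0.0, 0.6)  # toy ΔG in kcal/mol

if __name__ == "__main__":
    # Replicas are mutually independent, so they scale out trivially across
    # nodes, clouds or a mixture of both; a local process pool suffices here.
    with ProcessPoolExecutor() as pool:
        dgs = list(pool.map(run_replica, range(N_REPLICAS)))
    print(f"ensemble ΔG = {statistics.mean(dgs):.2f} "
          f"± {statistics.stdev(dgs):.2f} kcal/mol (n = {N_REPLICAS})")
```

Because no replica depends on any other, the same pattern can be dispatched to whichever resource is available, which is what makes hybrid cloud/HPC execution attractive.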
Wan et al. [10] provide a review of approaches to drug discovery which are predicated on performing calculations in a reliable and reproducible manner using classical molecular dynamics, and in which the statistical robustness of the predictions is of central concern. This serves to emphasize the importance of treating the basic computational methods as having a probabilistic nature rather than a deterministic one.
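To make the probabilistic viewpoint concrete, the following sketch (with made-up numbers) treats a binding free energy prediction as a distribution over an ensemble of replicas rather than a single deterministic value, using bootstrap resampling to attach a confidence interval to the ensemble mean.

```python
# Minimal sketch: bootstrap uncertainty on an ensemble-averaged ΔG.
import numpy as np

rng = np.random.default_rng(42)

# hypothetical per-replica ΔG estimates (kcal/mol) from independent runs
replica_dg = np.array([-8.9, -8.1, -8.5, -7.6, -8.8, -8.3, -7.9, -8.6, -8.2, -8.4])

# resample the replicas with replacement many times and record the mean
n_boot = 10_000
boot_means = np.array([
    rng.choice(replica_dg, size=replica_dg.size, replace=True).mean()
    for _ in range(n_boot)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])

print(f"ΔG = {replica_dg.mean():.2f} kcal/mol, 95% bootstrap CI [{lo:.2f}, {hi:.2f}]")
```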
Gheorghiu et al. [11] have studied the role of quantum mechanical processes associated with proton hopping between base pairs within DNA double helices, using ensemble-based classical molecular dynamics coupled to quantum mechanics/molecular mechanics to calculate the rates of these reactions, the barriers to the reverse reactions and the associated thermodynamics. Their results lead to the conclusion that the Löwdin mechanism is unlikely to play a significant role in causing mutations within human DNA.
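For readers unfamiliar with how computed barrier heights connect to rates, the sketch below (with purely illustrative barrier values, not those of the paper, and assuming a transmission coefficient of 1) applies the standard transition state theory (Eyring) relation k = (k_BT/h) exp(−ΔG‡/RT), which is the usual route from a free energy barrier to a rate constant.

```python
# Minimal sketch: converting a free energy barrier to a rate constant.
import math

K_B = 1.380649e-23    # Boltzmann constant, J K^-1
H   = 6.62607015e-34  # Planck constant, J s
R   = 8.314462618     # gas constant, J mol^-1 K^-1

def eyring_rate(dg_barrier_kcal: float, temperature: float = 300.0) -> float:
    """Rate constant (s^-1) from a barrier in kcal/mol, transmission coefficient 1."""
    dg_j_per_mol = dg_barrier_kcal * 4184.0
    prefactor = K_B * temperature / H  # ~6.25e12 s^-1 at 300 K
    return prefactor * math.exp(-dg_j_per_mol / (R * temperature))

# illustrative barriers only; the rate falls off exponentially as the barrier grows
for barrier in (5.0, 10.0, 15.0):
    print(f"ΔG‡ = {barrier:4.1f} kcal/mol  ->  k ≈ {eyring_rate(barrier):.3e} s^-1")
```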
But even the most powerful supercomputers on Earth cannot keep pace with the requirements of the most complicated problems in medicine. The problem is that, while the development of computing technology has outstripped progress in virtually all other fields by orders of magnitude, it remains too slow for many large-scale complex systems. Worse, the intrinsic speed of the chips used to perform the processing has now tapered off, a result of approaching physical limits and of the need to reduce the power consumption and energy dissipation of the leviathans we use for this research. So we must always be on the lookout for new and different forms of computing, and quantum computing is one with considerable potential. In her article [12], Viv Kendon describes the opportunities that may arise from the successful implementation of quantum devices and the kinds of problems that may one day become tractable through their use. The example par excellence is of direct relevance here: a quantitative description of the electronic structure of molecules of arbitrary size is utterly out of reach of conventional quantum chemical methods running on classical computers of any size, and the hope is that such problems may be solved on future quantum computers. Armed with such a capability, one might hope to use these descriptions as a basis for more rapid and reliable approaches to drug design and to the description of biochemical processes in which electron dynamics needs to be tracked.
Finally, of tremendous importance for the future uptake and championing of all these theory-based computational methods is a new approach to the education and training of current and future generations of students studying and working in the biosciences and medicine. Townsend-Nicholson [13] describes her innovative work in developing educational and training courses for diverse groups of biomedical students, ranging from early undergraduates to advanced medical students on electives, who have engaged in substantial numbers and with great enthusiasm in a set of educational, training and research projects within which the use of high-performance computing is a central element. Ultimately, this educational effort will help to trigger a seismic change in culture, shifting the emphasis from a data-centric to a theory-led approach to biomedicine.
Data accessibility
This article has no additional data.
Competing interests
I declare I have no competing interests.
Funding
The theme issue is based on an open call for papers associated with the Computational Biomedicine 2019 Conference, held in London 22–27 September 2019 and organized by the Computational Biomedicine Centre of Excellence, funded by the European Union's Horizon 2020 research and innovation programme under grant agreement nos. 675451 and 823712.