Predicting virus emergence amid evolutionary noise

The study of virus disease emergence, whether it can be predicted and how it might be prevented, has become a major research topic in biomedicine. Here we show that efforts to predict disease emergence commonly conflate fundamentally different evolutionary and epidemiological time scales, and are likely to fail because of the enormous number of unsampled viruses that could conceivably emerge in humans. Although we know much about the patterns and processes of virus evolution on evolutionary time scales as depicted in family-scale phylogenetic trees, these data have little predictive power to reveal the short-term microevolutionary processes that underpin cross-species transmission and emergence. Truly understanding disease emergence therefore requires a new mechanistic and integrated view of the factors that allow or prevent viruses spreading in novel hosts. We present such a view, suggesting that both ecological and genetic aspects of virus emergence can be placed within a simple population genetic framework, which in turn highlights the importance of host population size and density in determining whether emergence will be successful. Despite this framework, we conclude that a more practical solution to preventing and containing the successful emergence of new diseases entails ongoing virological surveillance at the human–animal interface and regions of ecological disturbance.

Decision letter (RSOB-17-0189) 12-Sep-2017 Dear Dr Holmes, We are pleased to inform you that your manuscript RSOB-17-0189 entitled "Predicting virus emergence amidst evolutionary noise" has been accepted by the Editor for publication in Open Biology. The reviewer(s) have recommended publication, but also suggest some minor revisions to your manuscript. Therefore, we invite you to respond to the reviewer(s)' comments and revise your manuscript.

We are pleased to inform you that your manuscript entitled "Predicting virus emergence amidst evolutionary noise" has been accepted by the Editor for publication in Open Biology.
If applicable, please find the referee comments below. No further changes are recommended.
You can expect to receive a proof of your article from our Production office within approx. 5 working days. Please let us know if you are likely to be away from e-mail contact during this period. Due to rapid publication and an extremely tight schedule, if comments are not received, we may publish the paper as it stands.
Thank you for your fine contribution. On behalf of the Editors of Open Biology, we look forward to your continued contributions to the journal.
Sincerely, The Open Biology Team mailto: openbiology@royalsociety.org

Referee: 1
Comments to the Author(s) I tremendously enjoyed reading this article, which is beautifully written. I could not agree more with the argument that prediction of viral emergence is inherently difficult (although I am an optimist and would not go as far as calling this a 'futile' exercise). I fully concur with the authors that cataloguing viral diversity, as in the Global Virome Project, is overpromising our forecasting ability.
Response: We thank the reviewer for these positive comments.
I have only a few minor comments: -The authors provide nice examples of emergence 'failures', whereby viruses that seemingly have the required genetic makeup for successful transmission in a new host still fail to cause large-scale outbreaks, presumably due to ecological issues (the poster child being Canine Influenza Virus, CIV, here). There is however very little in this paper on the viruses that successfully caused full-blown epidemics (HIV, SARS, pandemic flu, and perhaps a few more examples from non-human disease systems). It would be useful to elaborate on these 'success' stories for contrast, summarize the processes of genetic adaptation (if any) and the permissive ecological conditions present during the emergence phase.
Response: The reviewer makes a fair point. We have therefore revised the paper to include some extra details on these virus 'success stories', focusing on HIV and HCV, and with a number of additional references. We also note that most (if not all) endemic viral infections likely have an animal reservoir, even though we have not identified that reservoir in most cases (with HCV providing a good example). Hence, for most of the success stories we don't actually know what mutations have been involved in human (or non-human) adaptation. We now make this point in the paper.

-Using CIV as a case study, the authors highlight the importance of permissive ecological factors, which they summarize as "host population size" in Fig 4. However, it is more complex than sheer population size. As a case in point, the global canine population size is huge and would likely be able to sustain CIV outbreaks if it were a more connected population. Further, other factors beyond population structure could hinder viral emergence, including prior immunity and age structure (an emergent virus may be immunologically related to other viruses already encountered by a new host population, e.g. flu or enteroviruses). Perhaps a better terminology for what the authors mean is 'effective susceptible population', which could encompass population size and turnover, contact networks and age structure, prior immunity, and probably other factors. A discussion of these ecological factors would be useful somewhere in the text.
Response: We agree with the reviewer and have revised the paper accordingly. By 'host population size' we were really thinking of a combination of host population size AND density. We have now clarified this in the paper. However, the reviewer is clearly correct that we need more nuance here and hope we have provided this in the revised version of the paper. Indeed, we now explicitly mention prior immunity and age structure. We really like the idea of 'effective susceptible population size' (it sums it up nicely) and have therefore revised Figure 4 and the associated text accordingly.
-In thinking about the 'fault-lines' of disease emergence at the animal-human interface, can the authors elaborate on the geographic regions where sampling should be prioritized? Clearly, there has been much attention devoted to identifying hotspots of viral diversity and regions associated with rapid ecological changes. This broadly aligns with active animal-human interfaces, where I think the authors would like to see increased sampling. However, these fault lines fail to explain the emergence of the 2009 pandemic virus, MERS-CoV, or Zika virus. Do we need more research on identifying these fault lines in a more quantitative way, and if so, could sampling of viral diversity across species, and more behavioral/ecological data on species diversity and contacts, help in any way? Or are we left with sampling already recognized hotspots, which may help in only a fraction of viral emergence situations?
Response: This is a good question. It seems counter-productive to list precise geographic locations given our general nervousness about predictions. However, we have now suggested that existing serological data might be a good way of identifying those human populations that are commonly exposed to animal pathogens and hence where sampling might be most profitable.

Keywords
Emergence; Evolution; Phylogeny; Virus; Spill-over; Virosphere

(5)(6)(7)(8). Gaining this sort of predictive capability would have obvious and wide-ranging benefits. In these approaches, the study of virus emergence is often synonymous with the study of virus evolution, such that the more we understand about the patterns and processes of evolutionary change, the more accurate any emergence prediction is thought to be.
However, accurate predictions of disease emergence must be based on a correct and rigorous understanding of how viruses jump between hosts and adapt to new transmission cycles, including the timescale over which these processes occur. We show here that a more meaningful understanding of virus emergence requires us to shift the focus away from the broader processes of virus evolution and towards the short-term factors that influence the probability of the successful establishment of a virus in a host population. In other words, if the goal is to develop a meaningful predictive model of disease emergence, there may be considerable value in tuning out the background 'noise' of virus evolution rather than building the model around long-term evolutionary processes. More fundamentally, we will argue that a more practical approach to the challenge of virus emergence will involve abandoning prediction in favour of genomic surveillance at the ecological 'fault-lines' of emergence.

The Nature of Virus Emergence
The successful emergence of a virus in a new host will often entail a significant adaptive challenge. Indeed, one of the most important observations in disease emergence is that not all viruses that jump species boundaries successfully evolve onward transmission in the new host. Rather, many such viruses appear as transient 'spill-over' infections that soon die out, even in the absence of infection control. For example, despite repeated spill-over events from birds to humans, H5N1 avian influenza virus has not been able to evolve sustained human-to-human transmission (9). Other viruses have been more successful and resulted in significant outbreaks in new hosts. For example, Ebola virus (EBOV) has caused several localised epidemics that have been largely restricted in their spread by administrative boundaries and border closings (10). Hence, although there is evidence of the active adaptation of EBOV to humans during the recent 2013-2016 outbreak in West Africa (11,12), the virus clearly possesses the base-line virological traits needed to ensure its onward transmission in the new host. Finally, some viruses have evolved to become endemic human pathogens, involving the generation of well-established and long-standing chains of transmission that do not require repeated spill-over events from an animal reservoir.
An obvious case in point is HIV, the agent of AIDS, although a wide variety of human viruses fall into this class. Indeed, it is likely that most endemic virus infections in humans ultimately resulted from cross-species transmission, although in the majority of cases the exact animal reservoir species are unknown or unsampled. For example, although hepaciviruses are being increasingly documented in animal populations, it is likely that the true reservoir species for human hepatitis C virus has yet to be identified (13,14). There has been a great deal of experimental research in a number of systems directed toward identifying those specific viral genomic mutations responsible for successful host adaptation (15), although as noted above an important limitation is that there are still relatively few cases in which the precise chain of evolutionary events from reservoir to recipient species has been determined (16). As expected, many mutations that promote successful host-adaptation are concerned with aspects of virus-receptor binding (11,12,17), although changes in other traits, such as pH (18) and interactions with host antiviral responses (19,20), are also of importance. However, virus genetics alone cannot explain why only some emerging viruses are successful. Indeed, even viruses that appear to be well adapted to a specific host (i.e. that seemingly harbour all necessary host-specific mutations) may fail to spread.
An informative example concerns the recent emergence of the A/H3N8 subtype of canine influenza virus (CIV). Although this virus was first recorded in dogs in the USA in the early 2000s, with horses acting as the reservoir host (21), it has failed to become established in the domestic dog population. Instead, CIV is largely confined to dog shelters, where most dogs are infected soon after they arrive (22). CIV clearly possesses all the genetic characteristics necessary to spread in dogs, and its reproductive number in dog shelters is always sufficient (i.e. R0 > 1) to allow its spread within these confined spaces. However, CIV has failed to ignite a wider epidemic in dogs, likely because contact heterogeneity in the domestic dog population is much greater than in dog shelters such that there is an insufficient density of susceptible hosts for the outbreak to take hold (22). This inhibition of virus emergence through a lack of susceptible hosts is likely to be commonplace. The general lesson to be learned for exercises in prediction is that determining whether a virus can spread in a particular host, for example following cell passage experiments or using animal models, does not mean that it will in the real world unless epidemiological circumstances are permissive.
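The contrast between dog shelters and the wider dog population can be illustrated with a toy calculation. The sketch below is not taken from the paper. It assumes that the reproductive number scales linearly with the contact rate among susceptible hosts and that the number of secondary infections per case is Poisson distributed, and every number in it is hypothetical. Under those assumptions, the probability that a single introduction dies out is the smallest solution q of q = exp(-R(1 - q)).

```python
import math


def effective_r(r0_dense: float, relative_contact_rate: float) -> float:
    """Scale a reproductive number measured in a dense setting (e.g. a shelter)
    by the relative contact rate of a sparser population.
    The linear scaling is an illustrative assumption, not a result from the paper."""
    return r0_dense * relative_contact_rate


def extinction_probability(r: float, iterations: int = 1000) -> float:
    """Probability that a chain of transmission started by one case dies out,
    assuming Poisson-distributed secondary infections with mean r.
    Fixed-point iteration from q = 0 converges to the smallest root of
    q = exp(-r * (1 - q))."""
    q = 0.0
    for _ in range(iterations):
        q = math.exp(-r * (1.0 - q))
    return q


# Hypothetical values chosen only to show the threshold at R = 1.
r_shelter = 3.0                              # assumed R0 inside a shelter
r_household = effective_r(r_shelter, 0.2)    # assumed five-fold fewer contacts

print(f"shelter:   R = {r_shelter:.1f}, P(extinction) = {extinction_probability(r_shelter):.3f}")
print(f"household: R = {r_household:.1f}, P(extinction) = {extinction_probability(r_household):.3f}")
```

With these illustrative values the same virus sits above the threshold in the dense setting and below it in the sparse one, where every introduction eventually dies out. The virus genetics are identical in both cases, so only the host contact structure differs.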

Is Emergence Predictable?
Predicting emergence has become one of the highest-stakes topics in the study of infectious disease. The multi-host dynamics of virus emergence, from reservoir to recipient hosts, require us to consider the interplay of host ecology and virus genetics.
A central goal of research in this area has been to reveal the 'rules' that underpin disease emergence, on the implicit assumption that predictive accuracy will follow. A more ambitious scheme was established in 2016 in the guise of the Global Virome Project (GVP). Through a global partnership the GVP aims to identify and characterize 99% of zoonotic viruses with epidemic potential to better predict, prevent and respond to future viral threats (28). To achieve these aims, the GVP will perform large-scale ...
The idea of a virosphere so expansive is supported by the vast numbers of viruses discovered by recent studies of viral biodiversity that have been stimulated by advances in metagenomics, particularly the use of bulk RNA sequencing (29,30,32).
Importantly, these metagenomics studies have considered virus diversity in terrestrial species, whereas previous studies had a strong focus on aquatic environments and DNA bacteriophage (33)(34)(35). Most dramatically, a recent metagenomic analysis of nine invertebrate phyla identified 1445 novel RNA viruses, as well as newly defined genera and families (and possibly orders) (29). Not only does this represent a major increase in our knowledge of virus diversity, but that it came from a survey of only 220 species from a small number of sampling locations in China hints at the true scale of the virosphere.
Although vertebrates, particularly mammals, may carry a smaller number of viruses, the number is still so very large as to make any detailed experimental follow-up of even the vertebrate virosphere impractical, particularly as the rapid nature of RNA virus evolution means that any individual virus species will harbour a wide diversity of ever-changing variants. This impracticality is augmented when one considers our lack of knowledge of whether this vast set of viruses can replicate in human cells, and even this trait will not guarantee that a virus will be able to successfully transmit between hosts. In this context it has been proposed that machine-learning may help in pandemic prediction, for instance by using sequence data to predict which cell receptors a virus might utilize (36,37). However, attaining knowledge of cell receptor compatibility in itself does not enable accurate predictions of emergence, particularly as viruses with a diverse range of receptors are able to infect humans. For example, it has long been known that influenza viruses bind to sialic acid-containing molecules as receptors (38). However, this information has not improved prediction of influenza virus emergence and re-emergence. More generally, machine learning requires very large amounts of data to predict common events, whereas studies of disease emergence necessarily utilize data on rare events to predict rare events.
Paradoxically, then, the more we sample animal populations, the less frequently virus cross-species transmission to humans seems to occur. For example, when SARS coronavirus (CoV) was revealed to have its origin in bats (39), the total number of known bat viruses was very small so that the likelihood that a bat virus might emerge in humans correspondingly appeared to be relatively high. However, the total number of known bat viruses has increased dramatically with better sampling (6,40,41), and bat-to-human zoonotic transmission now appears to be a rare event.
It is also important to recall that the most recent viruses to achieve epidemic spread in humans, Ebola and Zika, were known and well-described human pathogens, with the first descriptions of Zika virus going back to the 1940s (42). Yet, our previous knowledge of these pathogens was not indicative of their epidemic potential. It may therefore be the case that the greatest pandemic threat in fact lies in those viruses that re-emerge intermittently and whose onward success depends on the availability of large and dense host populations.
It has also been suggested that wildlife host species richness is an important predictor of disease emergence (5). Conversely, however, biodiversity has also been linked to a decrease in disease risk through the 'dilution effect' (44)(45)(46)(47)(48). This was first developed as a framework to infer the dynamics of tick-borne Lyme disease and describes the association between increasing species richness and reduced disease risk, particularly when the most competent hosts were dominant in the community and alternative hosts negatively influenced the dominant hosts as reservoirs (44,49).
Although still debated, the dilution effect highlights the central role of host biodiversity and ecology in shaping the epidemiology of disease-causing pathogens. Inevitably, habitat destruction and ecosystem disturbance due to changes in land use will contribute to the loss of biodiversity. The broader consequences of such losses for emerging human pathogens are unknown and clearly merit further investigation (Figure 2).
Although cross-species virus transmission sits at the heart of virus emergence, phylogenetic studies of the frequency with which different virus families are able to jump species boundaries also offer little predictive power as all exhibit a strong tendency to jump hosts (50). Indeed, it now appears that the evolutionary history of most virus families comprises a complex mix of cross-species transmission and virus-host co-divergence, and that trying to disentangle the respective contributions of each process will be challenging (50). In addition, the greater the diversity of hosts and their viruses sampled, the more cases of species jumping we are likely to document (50).
Importantly, these phylogenetic studies also demonstrate that virus-host associations, including cross-species transmission, may extend over many millions of years, and not only the recent past as is assumed in studies of virus emergence. As a case in point, it is possible that the Narna-Levi group of RNA viruses have co-diverged with their hosts since the α-proteobacteria became endosymbionts (29).
While other comparative analyses have revealed those virological factors that increase the transmissibility of emerging viruses in humans (7), these analyses also likely offer little predictive power. These studies suggest that viruses with low host mortality, that establish chronic infections, that are non-segmented, that do not possess an envelope, and that are not transmitted by vectors have greater 'emergibility' in humans (7).
Nonetheless, many viruses still fall into this class and a number of these traits are not measurable until the virus has already established itself in a new host, diminishing the predictive utility of such 'viral traits' analyses.
Given these uncertainties, and the fact that elements of the evolutionary process that underpins emergence are inherently unpredictable, we suggest that there is no simple algorithm that will enable an accurate prediction of what viruses might emerge in the future. Hence, it is necessary to lower our expectations about disease emergence as a predictive science. In particular, although metagenomics undoubtedly has major implications for our understanding of virus evolution, it also likely undermines biodiversity-based attempts to predict the virus source of the next major disease pandemic (6). There are clearly so many viruses in nature that trying to determine which will ultimately appear in a new host from diversity sampling alone is almost certainly futile.
Predictions also sit uneasily with most aspects of evolutionary biology. Even relatively simple traits like virulence, which have generated considerable evolutionary theory, have proven difficult to predict because of myriad unknown forces that shape their evolutionary trajectory (51). Although there has been some success in using phylogenetic approaches to predict the short-term evolution of human influenza virus (52), the nature of the central selective processes shaping virus evolution (i.e. antigenic drift) is well known and to some extent quantifiable over the timescale studied. This is demonstrably not the case when considering unknown emerging viruses.

The conflation of epidemiological and evolutionary timescales
At face value it seems obvious that evolutionary ideas and analyses will help predict the emergence threat posed by different viruses. However, a major limitation is that evolutionary processes, particularly those reliant on phylogenetic or other comparative analyses, often occur on a markedly different timescale than the epidemiological processes relevant to pandemic prediction. Indeed, one of the most important conclusions of recent work in the study of RNA virus evolution is that the timescale over which these viruses have evolved, including cross-species transmission events, is likely far longer than previously imagined. This realisation comes from both phylogenetic studies of virus biodiversity and branching patterns (50,53), particularly the match between parts of the virus and host trees, and the analyses of endogenous virus elements that act as genomic fossils (54). Hence, it is likely that many of the viral families that infect vertebrates have done so for many millions of years, and have experienced continual cross-species transmission since this time.
Although central to understanding evolutionary processes, these timescales are irrelevant for predicting the next pandemic within an epidemiological timescale (i.e. 1-10 years).
The same caveat applies to studies that have used the taxonomic span covered by viral families as a way of determining which have the greatest propensity to jump hosts (6). These taxonomic ranges may have taken millions of years to generate and not the scale of years necessary for effective pandemic prediction. Evolutionary and epidemiological timescales should not simply be assumed to be equivalent. Although phylogenies can be used to accurately describe both macro- and micro-evolution, and superficially appear similar, the trees at these two scales are produced by markedly different evolutionary processes (Figure 3). As it is clear that the pace of human ecological (anthropogenic) change generally occurs more rapidly than successful virus host-jumping depicted in a phylogenetic tree, from a public health point of view we would do better to monitor ongoing environmental disturbance by humans than quantify long-term aspects of virus evolution.
An informative example of this fundamental disconnect between evolutionary and epidemiological timescales is provided by the hepadnaviruses, which include human hepatitis B virus (HBV). There is strong evidence for hepadnavirus-host co-divergence stretching back for effectively the entire time-span of vertebrate evolution (53,55).
Cross-species transmission has occurred on this background of co-divergence, with a recent analysis revealing ~13 host jumps over an evolutionary period of approximately 400 million years (50).
Although our sample of hepadnaviruses is inevitably small, with new hepadnaviruses recently identified in fish (55), and more cases of cross-species transmission will assuredly be found, this very roughly equates to a successful cross-species transmission event every 30 million years. Even if the rate of host jumping is 10,000 times more frequent, occurring once every 3,000 years, this is still far too broad-brush a timescale to provide any meaningful predictive value for the study of human disease emergence. A similar story can be told for the influenza viruses. Although these are exemplars of cross-species transmission (56), which occurs frequently in the Orthomyxoviridae (50), it is still problematic to make these predictions over the timescale of human observation. For example, the emergence of H3N8 equine influenza virus from an avian host took place in the early 1960s. Although this virus is clearly adapted for mammalian respiratory transmission, there is no evidence that it has transmitted to humans during the last 50 years.
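The hepadnavirus arithmetic quoted above can be checked in a few lines. The sketch below uses only the figures given in the text (about 13 host jumps in about 400 million years, and a hypothetical 10,000-fold higher rate). The final step, which treats host jumps as occurring at a constant average rate and asks how many would be expected in a 10-year window, is an added simplifying assumption rather than a calculation from the paper.

```python
# Figures quoted in the text for the hepadnaviruses.
HOST_JUMPS = 13              # approximate number of inferred host jumps
SPAN_YEARS = 400_000_000     # approximate evolutionary period in years

# Average waiting time between successful host jumps.
years_per_jump = SPAN_YEARS / HOST_JUMPS

# The text's generous scenario: host jumping 10,000 times more frequent.
RATE_MULTIPLIER = 10_000
years_per_jump_fast = years_per_jump / RATE_MULTIPLIER

# Assumption for illustration: a constant average rate, evaluated over the
# upper end of the 1-10 year epidemiological timescale mentioned earlier.
WINDOW_YEARS = 10
expected_jumps_in_window = WINDOW_YEARS / years_per_jump_fast

print(f"one jump every {years_per_jump / 1e6:.1f} million years")
print(f"one jump every {years_per_jump_fast:.0f} years at the higher rate")
print(f"expected jumps in {WINDOW_YEARS} years: {expected_jumps_in_window:.4f}")
```

The results agree with the rounded values in the text: roughly one jump per 30 million years, or roughly one per 3,000 years in the generous scenario. Even in that scenario the expected number of jumps in a decade is far below one.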
Those cases in which viruses have been deliberately released as biological controls also highlight the disconnect between evolutionary and epidemiological timescales. These natural experiments proceed over epidemiological timescales which in many ways parallel the natural emergence and spread of a novel virus in a new host. Most notably, both myxomavirus (a poxvirus) and rabbit haemorrhagic disease virus (a calicivirus) have been successfully released as biological controls into populations of European rabbits in Australia and Europe, in the 1950s and 1990s, respectively (57). What is particularly striking is that to date there are no strongly supported cases of these viruses jumping into other (i.e. non-lagomorph) species over the timescale of release, even though both these virus families appear to experience very frequent host-jumping over long evolutionary timescales (50). Hence, the observation that poxviruses can frequently jump species boundaries over evolutionary timescales provides no assistance in predicting what happens on the shorter timescales that govern epidemics.

A population-genetic framework to understand virus emergence
The study of virus emergence represents a synthesis of two different types of scientific enquiry: virology, which aims to determine, usually experimentally, the mutations that enable a virus to infect a new host, and epidemiology, which primarily seeks to identify the ecological factors responsible for viruses crossing the species boundary and spreading in a new host.
We contend that both these approaches can be synthesised within a single population genetic framework. Specifically, the cross-species transmission and emergence of a virus in a new host can be envisioned as a simple form of the adaptive process, in which the subject under consideration is the acquisition of mutations that facilitate replication and transmission and hence increase viral fitness.
Although they may be of myriad form, the ecological factors that dictate whether such an emergence event will be successful are directly analogous to the random sampling effects that necessarily impact the spread of any new allele in a population, increasing the likelihood of genetic drift that will in turn result in stochastic loss. For example, the extensive contact heterogeneity (i.e. lack of susceptible hosts) that prevents CIV spreading in the domestic dog population is equivalent to the fate of an advantageous allele in a small host population. That is, although the virus (mutation) may be host adapted (advantageous), it will not spread far because the host population is so small/sparse that genetic drift dominates substitution dynamics (and even strongly advantageous alleles may be lost rather than fixed in small populations).
We suggest that this new population genetic view of the process of cross-species transmission and emergence can be achieved by moving away from thinking about viruses spreading horizontally through a population (by host-to-host transmission), which is the realm of epidemiology, and towards thinking about virus alleles/genes being inherited vertically, which is the domain of population genetics. A well-understood framework is that successful cross-species transmission requires three steps: (i) encounter with a new host species, (ii) infection of that new host, and (iii) propagation in the new host population (58). Adaptation to the new host species may often represent a major challenge, as mutations that are beneficial in this host also likely decrease fitness in the reservoir host species. This opposing selection between reservoir and recipient hosts shapes the adaptive landscape of viral emergence (25). For example, as the gradient of the adaptive landscape increases, genetic variants are subjected to stronger opposing selection between the reservoir and the recipient hosts. Models of this adaptive process therefore offer an indication of the part of parameter space in which the host adaptation of a novel virus might be possible (2,4,25,27).
Importantly, however, this adaptive process must also occur within the background of random sampling. Because of a lack of host contacts, or descendants, genetic drift will reduce sampling of the fittest virus and decrease the probability of emergence. Hence, as the host population increases in size, the probability that a virus will be sampled increases (Figure 4), although it is likely that additional factors, including prior immunity and population age structure, will also impact the probability of virus sampling. A simple lesson from this new approach is that host population size and density, which can be thought of as comprising the 'effective susceptible population size' for an emerging virus, will have a major impact on whether a new virus will successfully spread in a population, irrespective of the fitness of a particular mutation (i.e. whether a virus contains all the mutations necessary to adapt it to a new host). Consequently, if the fitness of a virus and the effective susceptible population size were known, or even measurable, it would be possible to make bounded estimates of how likely a successful emergence event might be.
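The dependence of emergence on host contacts described above can be illustrated with a standard branching-process calculation. This is only a minimal sketch and not the framework shown in Figure 4: it assumes that each infected host causes a Poisson-distributed number of secondary infections with mean R = (mean contacts) x (per-contact transmission probability), that contacts scale with host density, and that the population is otherwise unlimited. The function name, parameter names and numerical values are hypothetical.

```python
import math


def emergence_probability(mean_contacts, transmission_prob,
                          tol=1e-12, max_iter=100_000):
    """Probability that a single infection founds a lasting transmission chain.

    Assumes Poisson-distributed secondary infections with mean
    R = mean_contacts * transmission_prob (an illustrative assumption).
    The extinction probability q is the smallest root of
    q = exp(-R * (1 - q)), found here by fixed-point iteration from q = 0.
    """
    r = mean_contacts * transmission_prob
    if r <= 1.0:
        return 0.0  # subcritical or critical chains always die out
    q = 0.0
    q_next = q
    for _ in range(max_iter):
        q_next = math.exp(-r * (1.0 - q))
        if abs(q_next - q) < tol:
            break
        q = q_next
    return 1.0 - q_next


# Same virus "fitness" (per-contact transmission probability), different
# host contact rates standing in for population size and density.
TRANSMISSION_PROB = 0.2  # hypothetical value
for contacts in (1, 2, 5, 10, 20):
    p = emergence_probability(contacts, TRANSMISSION_PROB)
    print(f"mean contacts = {contacts:>2}: P(emergence) = {p:.3f}")
```

Under these assumptions a well-adapted virus still fails to emerge when contacts are sparse, and the probability of emergence rises with contact rate for a fixed per-contact fitness, which is the qualitative point made in the text. Prior immunity and age structure, which the text notes will also matter, are not modelled here.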
The importance of genetic drift can also be seen in the transmission bottlenecks that will routinely occur as a virus moves between hosts (59), which likely put a brake on host adaptation (59–63). If a specific variant is favoured within an individual host but does not increase sufficiently in frequency (i.e. such that it is still found at sub-consensus levels), then a severe population bottleneck may result in its loss. Clearly, the more severe the population bottleneck, the less natural selection will be able to optimise viral fitness at the epidemiological scale.
We therefore propose that a more effective practical strategy for managing emerging and re-emerging epidemic or pandemic disease is the targeted surveillance of viromes at the human-animal interface. The vast biodiversity of viruses in the animal world makes their analysis prior to any emergence in humans a Sisyphean exercise. Rather, humans are the best sentinels: a virus discovered in humans very obviously can replicate in that host, which will not be the case for myriad viruses identified through biodiversity surveys of other animal taxa.
We therefore urge regular genomic surveillance at the fault-line of disease emergence that captures this human-animal interface (Figure 4). Examples of this interface that could be sampled are those associated with (i) major changes in land-use, particularly human encroachment into forest areas during deforestation; (ii) occupational exposure to live animal markets; and (iii) changes in human demographics, behaviour and political instability that result in population mobility and displacement. To take one specific example, the hunting and butchering of wild animals, and the meat trade that flows from it, is common practice in many countries. This activity must represent a conduit for cross-species pathogen transmission, and is likely responsible for the transfer of simian retroviruses from infected non-human primates to humans (64). Virological surveillance of those working in the bushmeat trade therefore appears a necessary measure.
Importantly, accelerating environmental and anthropogenic changes are expanding the human-animal interface (59), and the rapid movement of humans and livestock, as well as agricultural produce, highlights the importance of effective surveillance.
This virome surveillance should be ongoing and performed simultaneously on multiple human populations globally, with existing serological data perhaps helping to determine which geographical locations harbour human populations most frequently exposed to animal viruses and hence where virome surveillance will be most informative.
In addition, while metagenomics is hugely powerful in characterising the viromes of individual organisms, including the discovery of new species, it requires active infection (replication) and that samples be taken from tissues that contain the virus. For this reason meta-serological surveys will also be of importance as they enable the identification of infections that have occurred in the recent past.

Conclusions
Predicting virus emergence has risen to become a key goal of the study of infectious disease. The study of virus evolution has revealed much about the nature of virus emergence and its history over evolutionary timescales. However, due to the fundamental differences between evolutionary and epidemiological timescales, a focus on virus evolution may in fact be a distraction when it comes to predicting the next virus pandemic. Similarly, while virological features that increase the likelihood of virus emergence can be identified, these features cannot be treated as hard and fast rules determining which viruses will in fact successfully emerge. Further, many of these features can only be observed after emergence occurs, such that they are likely to have little predictive power. In partial response to these problems, we suggest that the field may be advanced by utilising a population genetic framework that melds genetic and ecological studies of virus emergence, and which highlights how the effective susceptible population size of a new host plays a major role in dictating the chance of successful emergence. In this manner we identify the possibility of a meaningful theoretical framework for the study of emergence that is grounded in evolutionary theory, but that tunes out the 'noise' of virus macroevolution.
Despite such a framework, the inconvenient truth for all those working in the realm of disease emergence is that the vastness of the unknown virosphere and the diverse range of viruses that have achieved endemic transmission in humans mean that any attempt to predict which virus may emerge next will face substantial, and likely crippling, difficulties. In light of this we suggest it may be of more benefit to public health to target, via surveillance, the fault-line of disease emergence that is the human-animal interface, particularly where it is shaped by ecological disturbance. Once a virus is identified as being of interest in this manner, other analyses may be able to assess its impact and pandemic potential. Such a shift in focus, away from being able to make predictions of emergence based on fundamental rules and toward the better assessment of emergence impact, is both more achievable and more likely to provide positive public health outcomes.
Competing interests. We declare we have no competing interests.
Authors' contributions. JLG and ECH jointly conceived and wrote the paper. Both authors gave final approval for publication.
Critically, however, the adaptive processes (i.e. mutation and selection) that lead to virus 'spill-overs' and possible emergence in a new host are more informative when considering a shorter, microevolutionary timescale.