Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Review article

Don't know, can't know: embracing deeper uncertainties when analysing risks

David J. Spiegelhalter

Statistical Laboratory, Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WB, UK

[email protected]

and

Hauke Riesch

Centre for Environmental Policy, Imperial College London, London SW7 2AZ, UK

Published: https://doi.org/10.1098/rsta.2011.0163

    Abstract

    Numerous types of uncertainty arise when using formal models in the analysis of risks. Uncertainty is best seen as a relation, allowing a clear separation of the object, source and ‘owner’ of the uncertainty, and we argue that all expressions of uncertainty are constructed from judgements based on possibly inadequate assumptions, and are therefore contingent. We consider a five-level structure for assessing and communicating uncertainties, distinguishing three within-model levels—event, parameter and model uncertainty—and two extra-model levels concerning acknowledged and unknown inadequacies in the modelling process, including possible disagreements about the framing of the problem. We consider the forms of expression of uncertainty within the five levels, providing numerous examples of the way in which inadequacies in understanding are handled, and examining criticisms of the attempts taken by the Intergovernmental Panel on Climate Change to separate the likelihood of events from the confidence in the science. Expressing our confidence in the adequacy of the modelling process requires an assessment of the quality of the underlying evidence, and we draw on a scale that is widely used within evidence-based medicine. We conclude that the contingent nature of risk-modelling needs to be explicitly acknowledged in advice given to policy-makers, and that unconditional expressions of uncertainty remain an aspiration.

    In general, there is a degree of doubt, caution, and modesty, which, in all kinds of scrutiny and decision, ought for ever to accompany a just reasoner.

    David Hume [1], p. 89

    1. Introduction

    Hume's ‘just reasoner’, when faced with a difficult public policy decision in current times, would probably commission a risk analysis. But how could they incorporate ‘a degree of doubt, caution and modesty’? Here, we consider the issue of incorporating such deeper uncertainties into formal risk models.

    The academic literature about risk covers a broad spectrum from mathematical analyses to sociological discourses. Both ends of the spectrum deal with uncertainty, but the ‘quantitative’ extreme uses the formal language of probability theory as a tool for analysis [2], while the ‘qualitative’ end tends to emphasize the political, social and personal responses to perceived hazards [3]. Of course, these are archetypal positions and there are increasing attempts to bring these perspectives together. Such a synthesis could take the form of acknowledging the possible limitations of a strictly formal approach and engaging the social sciences with the process of building and critiquing risk analysis models, say by acknowledging variability in values between individuals, the potential for subjective biases in analysis or differing cultural views as to ‘rational’ actions [4]. Norton et al. [5] claim that ‘an important barrier to achieving a common understanding or interdisciplinary framework is the diversity of meanings associated with terms such as ‘uncertainty’ and ‘ignorance’, both within and between disciplines’. We cannot hope to dismantle this barrier, but shall include uncertainty arising from disputed framing of the issue under consideration.

    Additional challenges to a fully quantified approach to the analysis of risks can arise from scientific uncertainty, where there are doubts about how the world works or its current state, which in turn produces increased uncertainty about what may happen in the future and what actions might be appropriate now. Current arguments about climate change illustrate this issue well, where projections based on complex computer models have been accused of understating the deeper uncertainties about the underlying processes governing future climate and their possible consequences [6].

    In this paper, we propose a basic structure for analysing uncertainty when conducting risk analyses, which attempts to cover the whole range between complete numerical formalization and vague uneasiness about the possibility of unspecified but surprising events. Although the term ‘risk’ covers both the likelihood and the consequences of future events, we shall focus on the likelihood aspects and so will not address disagreements about the values that should be attached to different outcomes. Our main application is to the construction of risk models that are intended to help in decision-making, say as part of policy advice, and we follow Hassenzahl [7] and Kandlikar et al. [8] in believing that over-precise numerical expressions of the likelihood of events are potentially misleading and highly undesirable, since we may not feel confident either in delineating the set of events that may occur, or providing a precisely specified probability distribution over that set.

    We build on numerous categorizations of uncertainty that have been previously proposed. For example, a basic distinction is often made between aleatory and epistemic uncertainty [9]: aleatory is an essential, unavoidable unpredictability or chance, say the situation before a fair coin is flipped when we can't know the outcome, and epistemic uncertainty reflects lack of knowledge or ignorance, say the situation after a coin is flipped but while the result is covered up so that we don't know the outcome, although further information could remove or reduce the uncertainty. This broad division can be useful conceptually, but in practice greater refinement is necessary [10]. We shall return later to the controversial issue of the extent to which probability can be used as a quantitative measure of epistemic uncertainty; Kadvany [4] labels such attempts to quantify scientific uncertainty as ‘taming chance’.

    We begin with a motivating case study on the handling of uncertainty by the Intergovernmental Panel on Climate Change (IPCC), which has come under considerable scrutiny. We then briefly review previous analyses of uncertainty related to risk analysis for policy decisions, and introduce our suggested framework. The remainder of this paper explores this structure in more detail, drawing comparisons with other suggestions, and illustrating with a simple running example. A second case study deconstructs a rather silly media story about double-yolked eggs. We conclude by discussing the actions that might be appropriate on the basis of the proposed structure.

    (a) Case study. Intergovernmental Panel on Climate Change methods for handling uncertainty

    In preparation for the Fourth Assessment Report (AR4) of the IPCC in 2007, an attempt was made [11] to standardize the expression of uncertainty across the three Working Groups. Lead authors were advised to consider plausible sources of uncertainty, such as unpredictability, structural uncertainty and value uncertainty, and assess the current level of understanding of key issues using the qualitative scale shown in table 1.

    Table 1.Qualitatively defined levels of understanding recommended for use by Working Groups of the IPCC. Adapted from [11].

    (rows: level of agreement or consensus; columns: amount of evidence (theory, observations, models))
    high agreement, limited evidence    high agreement, much evidence
    low agreement, limited evidence     low agreement, much evidence

    The guidance follows the work of Risbey & Kandlikar [12] in recommending that the precision of any statement about an unknown quantity should depend on the quality of the available evidence, and that numerical probability statements should only be made about well-defined events and when there is ‘high agreement, much evidence’. They distinguish between ‘levels of confidence’ ‘based on expert judgement as to the correctness of a model, an analysis or a statement’, and ‘likelihood’ defined as a ‘probabilistic assessment of some well defined outcome having occurred or occurring in the future’. Tables 2 and 3 provide a mapping between linguistic terms and numerical values for confidence and likelihood.

    Table 2.Likelihood scale recommended for use by Working Groups of the IPCC in 2007. Adapted from [11].

    terminology likelihood of the occurrence or outcome
    virtually certain >99% probability
    very likely >90% probability
    likely >66% probability
    about as likely as not 33–66% probability
    unlikely <33% probability
    very unlikely <10% probability
    exceptionally unlikely <1% probability

    Table 3.Quantitatively calibrated levels of confidence recommended for use by Working Groups of the IPCC in 2007. Adapted from [11].

    terminology degree of confidence in being correct
    very high confidence at least 9 out of 10 chance of being correct
    high confidence about 8 out of 10 chance
    medium confidence about 5 out of 10 chance
    low confidence about 2 out of 10 chance
    very low confidence less than 1 out of 10 chance

    However, as reported in the critique of the IPCC by the Inter-Academy Council (IAC) [13], the Working Groups in AR4 were not consistent in their use of this guidance. Working Group I (The Physical Science Basis) made extensive use of formal models, which allowed representations of between-model variability in projecting key quantities as well as overall uncertainties expressed as probability distributions and confidence intervals. Overall conclusions were qualified with a mixture of likelihood and confidence statements, the choice of which sometimes appears somewhat arbitrary: compare ‘.. very high confidence that the global average net effect of human activities since 1750 has been one of warming’ and ‘Most of the observed increase in global average temperatures since the mid-20th century is very likely due to the observed increase in anthropogenic greenhouse gas concentrations’ [14]. Working Group II (Impacts, Adaptation and Vulnerability) primarily used the confidence scale (table 3), but the IAC report criticized the use of this numerical scale for conclusions that were vaguely worded or based on weak evidence. Working Group III (Mitigation of Climate Change) only used the level-of-understanding scale (table 1). All Working Groups conditioned on a list of emission scenarios that were not given probabilities.

    The IAC concludes that the IPCC uncertainty guidance [11] was a good starting point, but unnecessary errors arose from its inconsistent use, such as the expression of ‘high confidence in statements for which there is little evidence, such as the widely quoted statement that agricultural yields in Africa might decline by up to 50 percent by 2020’ [13]—which is in any case a fairly vacuous statement. They recommend that future Working Groups should use the level-of-understanding scale (table 1) supplemented by quantitative probabilities (table 2) if there is sufficient evidence, traceable accounts should be provided for expressions of scientific understanding and likelihoods, and that the numerical confidence scale (table 3) should be abandoned. We return to the revised IPCC guidance later.

    (b) How can we categorize uncertainty?

    There have been numerous attempts to deconstruct risk and uncertainty, leading to multiple categorizations and hierarchies, made more complex by different disciplines sometimes using terms in different ways [15,16]. Here, we provide only a highly selected sample.

    In their discussion of quantitative risk analysis, Morgan & Henrion [2] focus on the expression of epistemic uncertainty about each empirical quantity as a probability distribution, listing various sources of evidence and emphasizing sensitivity analysis to alternative assumptions about the structure of the model. In contrast to this highly quantitative approach, there is widespread use in the social sciences of Frank Knight's [17] distinction between risk, in which probabilities are either known or can be estimated, and uncertainty, in which the probabilities are unknown and unmeasurable: Knight claims that

    It will appear that a measurable uncertainty, or ‘risk’ proper, as we shall use the term, is so far different from an unmeasurable one that it is not in effect an uncertainty at all. We shall accordingly restrict the term ‘uncertainty’ to cases of the non-quantitive [sic] type.

    Frank Knight [17], p. 20

    Funtowicz & Ravetz [18] suggest that when there is substantial non-quantifiable uncertainty underlying scientific policy decisions then a new ‘post-normal’ or ‘second-order’ science is required. Wynne [19] disputes this claim and delineates the ‘different kinds of uncertainty’ shown in table 4.

    Table 4.‘Different kinds of uncertainty’ identified by Wynne. Adapted from [19].

    risk know the odds
    uncertainty don't know the odds: may know the main parameters, may reduce uncertainty but increase ignorance
    ignorance don't know what we don't know, ignorance increases with increased commitments based on current knowledge
    indeterminacy causal chains or networks open

    Wynne argues that while risk and uncertainty correspond to situations with known and unknown probabilities, indeterminacy and ignorance are not just more extreme forms of uncertainty but are orthogonal concepts that can exist, even if uncertainty is small. Essentially, he claims that numerical risk assessments are conditional on assumptions, and different kinds of uncertainty are needed to deal with the appropriateness of those assumptions to the context being examined: indeterminacy ‘exists in the open-ended question of whether knowledge is adapted to fit the mismatched realities of application situations’, and ignorance is when we don't know what we don't know about the completeness and validity of our knowledge, which by definition escapes recognition. We shall build on these ideas in our own proposals.

    Other contributions are worth noting. First, van Asselt & Rotmans [20] produce a complex encompassing structure for sources of uncertainty appropriate to integrated assessment modelling for climate change, listing five sources of uncertainty due to variability, and seven sources of uncertainty due to limited knowledge, including reducible and irreducible ignorance, inexactness and indeterminacy. Second, Walker et al. [21] propose a structure comprising three dimensions of uncertainty when constructing risk models: its ‘location’ in terms of the aspect of the model under scrutiny, its ‘level’ in terms of the precision of its expression and its ‘nature’ as either epistemic or aleatory.

    Third, Stirling [22] constructs a 2×2 matrix reflecting potential ‘incertitude’ about both probabilities and outcomes: risk corresponds to well-established probabilities and outcomes, uncertainty when the probabilities are problematic, ambiguity when there is doubt about the outcomes and ignorance when both probabilities and outcomes cannot be confidently specified. He suggests alternative strategies in each of these situations (we note the potential for confusion as ‘ambiguity’ is used in the behavioural economics literature to indicate unknown probabilities, precisely the opposite definition to Stirling's). Our focus here, in Stirling's terms, is on uncertainty. Finally, Kandlikar et al. [8] focus on representations of uncertainty about the overall conclusions of a risk model, arguing for a direct mapping from levels of disagreement and ignorance to the appropriate degree of precision; as we shall see later, their analysis has been very influential in the new guidance given to IPCC authors.

    (c) Yet another structure for uncertainty in risk analysis

    Risk analyses typically consist of one or more mathematical models based on scientific understanding of the underlying processes, which contain parameters that quantify the size of the relationships expressed in the model, which together are used to assess the probabilities of different events occurring. It is therefore natural to think of classifying our uncertainty within the modelling process into the residual unpredictability of events for given models and parameters, which we can think of as aleatory, and our uncertainty about the parameters and the model, which is (roughly) epistemic.

    This three-level structure is attractive but ignores the possibility of acknowledging deeper doubts about our capacity to model the issue in question, as emphasized by Wynne's categorization. Our proposed five-level structure, summarized in table 5 and illustrated in figure 1, uses a similar language to Wynne but changes the emphasis towards identifying the limitations in formal models.

    Table 5.Our proposed five levels for objects of uncertainty when conducting model-based risk analysis.

    level what we are uncertain about source of uncertainty
    1 events essential unpredictability
    2 parameters within models limitations in information
    3 alternative model structures limitations in formalized knowledge
    4 effects of model inadequacy from recognized sources indeterminacy—known limitations in understanding and modelling ability
    5 effects of model inadequacy from unspecified sources ignorance—unknown limitations in understanding
    Figure 1. Five levels of uncertainty. While the first three form a natural hierarchy, levels 4 and 5 apply to the entire modelling process and may exist even if there is little uncertainty expressed within the modelling framework.

    The crucial idea follows that of Wynne: while the first three levels reflect traditional ‘statistical’ measures of uncertainty, which are conditional on explicit or implicit assumptions, the remaining two levels attempt to make unconditional and reflective statements about the potential relevance of the model to the context under analysis, with levels 4 and 5 roughly corresponding to Wynne's ‘indeterminacy’ and ‘ignorance’, respectively, although we prefer to use these terms to identify sources rather than objects of uncertainty. Levels 4 and 5 are attempts to express our confidence in the reductionist analysis carried out at levels 1–3; this is related to the IPCC's attempt to separate ‘likelihood’ from ‘confidence’ but without putting a numerical scale on ‘confidence’. Our use of ‘indeterminacy’ in level 4 is intended to include possible disputes about the framing of the problem under consideration.

    Our framework emphasizes that uncertainty is perhaps best thought of as a relationship with a number of characteristics, such as

    — Object of the uncertainty. For example, the five levels specified above.

    — Form of expression of uncertainty. These may include (without implying a strict ordering)

    1. a full explicit probability distribution,

    2. an incompletely specified probability distribution, say using bounds or an order-of-magnitude assessment,

    3. a list of possibilities,

    4. informal qualitative qualifying statements, say about whether an effect may be positive or negative,

    5. informal acknowledgement that uncertainty exists,

    6. explicit denial that uncertainty exists, and

    7. no mention of uncertainty.

    — Source of uncertainty. Possibilities include variability within a population, ignorance, ‘chance’, computational limitations, etc. [20].

    — Subject. Whose uncertainty is it? This may be an individual, a part of the public, a decision-maker, a risk-assessor and so on.

    — ‘Affect’. Feelings associated with the uncertainty may include dread, excitement, fear and so on.

    Comparing with the structure proposed by Walker et al. [21], our ‘object’ is their ‘location’, our ‘form of expression’ is their ‘level’, and our ‘source’ is their ‘nature’.

    The appropriate form of expression will clearly be strongly influenced by uncertainty at a deeper level; for example, if there are acknowledged doubts about the adequacy of a model, then any resulting probabilities should not be given with high precision, and only rough conclusions communicated [12,23]. Kandlikar et al. [8] provide specific guidance on the appropriate level of precision.

    An analysis of the components of an uncertainty statement can be illuminating, but could become too complex and over-prescriptive, which has possibly been a problem with the otherwise admirable Numeral, Unit, Spread, Assessment, Pedigree (NUSAP) [18] nomenclature for specifying characteristics of uncertain quantities in a model, including their pedigree. Here, we focus on the object of uncertainty and the potential form for its expression, trying to retain sufficient simplicity to make the structure routinely applicable and not being too dogmatic about the precise usage.

    Characterization of uncertainty as a relationship emphasizes that our philosophy is strongly influenced by the Bayesian subjectivist approaches. In particular, we do not view probability as an objective state of the world that can be estimated (except perhaps at subatomic levels), and prefer to follow de Finetti's dictum that ‘probability does not exist’ [24]. We also believe that subjective probabilities are not internal, subjective states of mind waiting to be measured, but are constructed in response to the elicitation process. Our final influence is statistician George Box, whose statement that ‘all models are wrong, but some are useful’ [25] summarizes our view that the whole process of model construction and probability assessment is contingent, changeable and arrived at by a process that inevitably involves deliberative aspects. In an abuse of sociological language, we might be classified as ‘analytic weak constructivists’, in that we view any risk models as human constructions based on our limited knowledge and judgement, which are best treated as ‘guide books’ that can be very useful but should be clearly distinguished from reality, cannot be judged purely on the basis of internal consistency and need to be used with caution. In their fine review of methods for dealing with uncertainty surrounding climate change models, Morgan et al. [10] take a very similar view, observing that choice of a model structure is a pragmatic ‘compromise between the credibility of the results and the effort to create and analyze the model’.

    We therefore echo Kadvany's [4] observation that

    It is remarkable then that from one of the most formalist branches of modern philosophy, a radically anti-formalist conclusion should ensue: that making a formalism itself work requires a constructive, temporal and social process.

    Kadvany [4], p. 20

    The way in which a risk assessment is framed is clearly open to deliberation and dispute, whether concerning the outcomes to be evaluated, whose values are to be used and so on. This is unavoidable, and sensitivity analysis to alternative valuations of outcomes should clearly be carried out. However, we again emphasize that our main concern here is with scientific uncertainty rather than disagreement, except insofar as that uncertainty may sometimes be reflected in differing scientific views.

    We now consider each of the individual levels in turn.

    (d) Level 1. Uncertainty about which in a list of events will occur

    We assume a list of mutually exclusive and exhaustive events, so that one and only one will occur, and that our uncertainty is ‘extensional’ in the sense that it does not depend on framing or personal values. The source of uncertainty about events might be described as an essential unpredictability, which may be attributed to aleatory randomness or measurement error, or to the process being deterministic but chaotic, as in weather forecasting; pragmatically, these have the same effect.

    It is natural to wish to use a full probability distribution when expressing uncertainty about events, and in the absence of other expressed uncertainties, this might be assumed known on the basis of either the symmetries in the randomizing device (cards, lotteries, dice, etc), a mathematical model, or extensive past data on situations judged exchangeable with the future. It is important to note that even ‘known’ probabilities are based on judgements and assumptions and represent a relationship between the assessor and the event.

    Of course, in practice, the assumption that we can list all possible future events is almost inevitably a simplification, and we shall see that doubts about this assumption can be expressed at levels 4 and 5, either informally or quantitatively using incomplete probability distributions that retain some probability for unspecified events.

    Probability distributions can be expressed in simplified forms, such as words given a formal interpretation, as in table 2. A slightly more refined approximation is shown in figure 2, which reproduces a fan chart used by the Bank of England [26]. This provides a visual representation of the uncertainty about both past and future growth—the projections use a combination of formal analysis and the ‘best collective judgement’ of the Bank of England Monetary Policy Committee, and the different shades indicate central intervals, from 10 up to 90 per cent, of ‘possible futures’ assuming that ‘economic circumstances identical to today's were to prevail on 100 occasions’.

    Figure 2. An example from November 2007 of a fan chart to illustrate probabilistic projections for percentage annual growth. The observed series subsequently went off the bottom of the chart, but in fairness it is expected that 10 out of 100 projections will lie outside the shaded area. ONS, Office for National Statistics. Reproduced with permission from the Bank of England [26].

    Example: balls in a bag. We shall adopt a running example comprising the classic set-up of a bag in which we are told that there are, say, 10 red or white balls, and we are to receive a prize if we draw a red ball. If we assume there really are five white and five red balls, then we can reasonably assess a probability of 0.5 of drawing a red ball. But this relies on us believing what we have been told.

    (e) Level 2. Uncertainty about parameters in a model

    Any model underlying a risk analysis will contain parameters intended to represent idealized states of nature assuming the model is correct; for example, the slope of a dose–response curve for a toxic compound. Under this heading, we also include external inputs such as forcings in climate models. Uncertainty about these parameters in turn gives rise to uncertainty about the probabilities of future events. The source of uncertainty about parameters is essentially due to limitations in information, in particular lack of quality data, and can be expected to reduce on receipt of further evidence.

    There is continued argument about whether we can use probability theory to express epistemic uncertainty about parameters. Such usage follows naturally from a Bayesian perspective, whereas classical statistical output is restricted to providing estimates and confidence intervals rather than full probability distributions over parameters. The Bayesian approach is increasingly popular in risk modelling as it allows for the propagation of uncertainty about parameters through a model to a final probability distribution on the predictions using, for example, Monte Carlo analysis. This is known as probabilistic sensitivity analysis, in contrast to the deterministic sensitivity analysis that can be carried out if we are only prepared to give a list of possible values for parameters, providing, say, a range without a full distribution. We should also remember that these parameters do not really exist, and are better thought of as reasonable assumptions for the intended purpose.
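    As a minimal sketch of probabilistic sensitivity analysis of this kind (not any particular agency's model), the following Python fragment propagates uncertainty about two parameters of a hypothetical logistic dose–response model through to the predicted risk; the distributions and the functional form are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_sims = 100_000

# Hypothetical epistemic uncertainty about the parameters of a logistic
# dose-response model, standing in for elicited or posterior distributions.
intercept = rng.normal(loc=-4.0, scale=0.5, size=n_sims)
slope = rng.normal(loc=0.8, scale=0.2, size=n_sims)

dose = 2.0  # exposure level of interest (arbitrary units)
risk = 1.0 / (1.0 + np.exp(-(intercept + slope * dose)))  # risk per simulation

# Parameter uncertainty propagated to a distribution over the predicted risk,
# summarized here by its mean and a central 95 per cent interval.
print(f"mean risk: {risk.mean():.3f}")
print(f"central 95% interval: {np.percentile(risk, [2.5, 97.5]).round(3)}")
```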

    As in level 1, there may be additional doubts about the list of possible appropriate values for parameters from known or unknown sources, or possible unwillingness to specify a full probability distribution. For example, environmental regulatory agencies mandated to make decisions about permissible levels of potentially carcinogenic or otherwise harmful compounds are repeatedly faced with making risk assessments in the face of unquantified uncertainty about, for example, the toxicity of the substance being studied, owing to a lack of specific data on the form of the dose–response curve for humans. The US Environmental Protection Agency tends to handle this uncertainty by adopting ‘cautious’ default values of parameters—see the recent review by the National Research Council for a critique of their methods [27]. Similarly, ‘safety’ or ‘uncertainty’ factors are used by the UK Committee on Toxicology (COT) [28], who describe how a dose regarded as safe in animals (NOAEL—no observed adverse effect level) will be divided by 100 when assessing a safe dose in humans: a factor of 10 for moving between species and an additional factor of 10 for within-species variability. The COT [28] have suggested that greater attention should be paid to more formal, model-based methods for adjustment, and some have advocated abandoning uncertainty factors in favour of a fully model-based analysis [29].
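    The conventional safety-factor arithmetic is simple enough to state directly; the NOAEL value below is purely hypothetical and serves only to illustrate the 100-fold division described above.

```python
# Hypothetical animal NOAEL, in mg per kg bodyweight per day.
noael_mg_per_kg = 50.0

interspecies_factor = 10   # animal-to-human extrapolation
intraspecies_factor = 10   # variability within the human population

# Conventional 'safe' human dose: NOAEL divided by the combined factor of 100.
safe_human_dose = noael_mg_per_kg / (interspecies_factor * intraspecies_factor)
print(safe_human_dose)     # 0.5 mg per kg bodyweight per day
```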

    Example: balls in a bag. Suppose another bag is prepared that is said to contain 10 balls, either red or white, but this bag has an unknown number of red balls. This is ambiguity in the sense of the classic Ellsberg paradox [30], in that we don't know the chances of success. However, if we assume a known probability distribution for the number of red balls, then we can work out the probabilities of different events; for example, if the number of red balls has been chosen randomly from a uniform distribution on the integers from 0 to 10, then our probability for choosing a red from the bag should be 0.5, exactly as if we knew there were five of each. However, the difference is that we can use our experience to learn about the composition of balls, so that once a red has been chosen and replaced, it can be shown that our probability for the next ball being red should rise to 0.7.
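    The two predictive probabilities quoted here can be checked with a few lines of exact arithmetic; this is a sketch of the stated set-up, with a uniform prior over the eleven possible compositions.

```python
from fractions import Fraction

# Uniform prior over the number of red balls (0 to 10) in a bag of 10.
compositions = range(11)
prior = [Fraction(1, 11)] * 11

# Probability that the first draw is red: the prior mean of i/10, which is 1/2.
p_red = sum(p * Fraction(i, 10) for p, i in zip(prior, compositions))

# Posterior over compositions after drawing a red and replacing it, then the
# predictive probability that the next draw is also red (which rises to 7/10).
posterior = [p * Fraction(i, 10) / p_red for p, i in zip(prior, compositions)]
p_next_red = sum(p * Fraction(i, 10) for p, i in zip(posterior, compositions))

print(p_red)       # 1/2
print(p_next_red)  # 7/10
```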

    Given an option of choosing between a bag with known chances of success, and one with an unknown chance, people tend to express ‘ambiguity aversion’ and tend to select the bag with known odds [30].

    (f) Level 3. Uncertainty about which model is best

    Mathematical risk models are simplified idealizations of how the world works, intended to capture the essential characteristics of the hazard being studied. Depending on the purpose of the exercise, it may be appropriate to act as if such hypotheses were the case, while not actually believing they were strictly true. In general, there are a wide range of options in constructing such models, for example, in the precision with which they model space and time, their functional form and the influencing factors that are included.

    Suppose a set of K models have been considered, denoted M1 to MK. If we observe data y and have a probability model under each model assumption, we can calculate p(y|Mk), k=1,…,K. This may involve integrating out (Bayesian) or maximizing over (classical) some parameters. Then, two alternative models, say M1 and M2, can be compared by their likelihood ratio p(y|M1)/p(y|M2), also known as the ‘Bayes factor’. Note this relative comparison does not involve a statement that either of the models is ‘true’.

    Some Bayesian practitioners will go further, assigning relative prior odds to the models, p(M1)/p(M2), and hence obtaining the posterior odds through the standard odds form of Bayes theorem: posterior odds = likelihood ratio × prior odds. If we go even further and assume a ‘closed world’ in which one and only one of our candidate models can be assumed to be ‘true’, then we can compute a full posterior probability distribution p(Mk|y), k=1,…,K. In particular, this enables us to make predictions for a new quantity Y of the form

    p(Y|y) = Σk p(Y|y, Mk) p(Mk|y), where the sum is over k=1,…,K.

    This is Bayesian model averaging, in which the predictions under each model are weighted by that model's current plausibility. This can be a powerful prediction technique [31].

    But does it really make sense to put probabilities on models? In a sense, we are expressing our belief in the appropriateness of acting as if a model were the case, and so the answer may depend on whether, say, we are using a model simply for prediction or whether we are going to interpret the mechanism that the model encapsulates. If the former, then probabilities on models seem reasonable, but if the latter, then we follow others [10] in recommending a ‘clairvoyance’ or ‘clarity’ test, in which it is only meaningful to put probabilities on statements if we can at least imagine a future experiment that would reveal the statement's truth or falsity. And since ‘all models are wrong’, how can this be achieved? In addition, this criterion casts doubt on the appropriateness of the IPCC's [14] use of a probability scale for their statements (quoted previously) attributing climate changes to human influences. We find it difficult to imagine a future scenario in which these statements will definitively be determined as true or false, which leads us to agree with the IAC's recommendation [13] that the probability scale for confidence shown in table 3 is inappropriate.

    Uncertainty about alternative model structures might therefore only be expressed by full Bayesian model probabilities in very special circumstances; in general, a qualitative assessment of the relative plausibility of alternative models, or a list of models/scenarios with deterministic sensitivity analysis, seems appropriate.

    Example: balls in a bag. Suppose we are presented with the two otherwise identical bags discussed previously, bag 1 with a known proportion of 50 per cent red balls and bag 2 with an unknown proportion assumed drawn from a uniform distribution. We pick a bag at random, and want to know which bag we have chosen. Picking a single ball provides no way of discriminating between the bags, since we have seen that each is judged equally likely to give a red ball. However, suppose we draw two balls (replacing the first before drawing the second) and both are red. The probability of two red balls from bag 1 is trivially 0.25, and we have seen that our probability for two red balls from bag 2 should be 0.7×0.5=0.35. Therefore, the Bayes factor for the two models is 0.25/0.35=5/7; the prior odds is 1 since we have picked a bag at random, and so the posterior odds is 5/7, giving a posterior probability that we have picked bag 1 of 5/12=0.42. So two red balls have slightly increased our belief that we picked the bag with an unknown proportion, from 0.5 to 0.58. We repeat that such posterior model probabilities only seem appropriate in tightly controlled, idealized situations such as this.
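    The arithmetic of this example, and the model-averaged prediction for a further draw, can be reproduced exactly; this is a sketch of the toy set-up above, not a general recipe for putting probabilities on models.

```python
from fractions import Fraction

# Bag 1: exactly 5 red balls out of 10.  Bag 2: number of reds uniform on 0..10.
# Probability of drawing two reds (with replacement) under each bag.
p_two_red_bag1 = Fraction(1, 2) ** 2                                   # 1/4
p_two_red_bag2 = sum(Fraction(1, 11) * Fraction(i, 10) ** 2
                     for i in range(11))                               # 7/20 = 0.35

bayes_factor = p_two_red_bag1 / p_two_red_bag2                         # 5/7
prior_odds = Fraction(1, 1)            # the bag was picked at random
posterior_odds = bayes_factor * prior_odds
p_bag1 = posterior_odds / (1 + posterior_odds)                         # 5/12 ~ 0.42

print(bayes_factor, p_bag1, 1 - p_bag1)       # 5/7  5/12 (~0.42)  7/12 (~0.58)

# Bayesian model averaging: the predictive probability that a third draw is red,
# weighting each bag's prediction by its posterior probability.  Under bag 2,
# P(red | two reds seen) = E[p^3] / E[p^2] with p = i/10 over the uniform prior.
p_third_red_bag2 = (sum(Fraction(1, 11) * Fraction(i, 10) ** 3 for i in range(11))
                    / p_two_red_bag2)
p_third_red = p_bag1 * Fraction(1, 2) + (1 - p_bag1) * p_third_red_bag2
print(p_third_red, float(p_third_red))        # 2/3 ~ 0.667
```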

    (g) Level 4. Uncertainty about known inadequacies of best model

    Even the best imaginable model is still not the real world and has inevitable limitations, which may or may not be important for the risk being analysed. Such recognized inadequacies broadly reflect Wynne's concept of indeterminacy, and could arise from aspects we know have been omitted from the model, computational limitations, extrapolations to areas for which there is little or no data, and known disagreements about how the analysis should be framed.

    A range of forms exist for expressing uncertainty about aspects that have been left out of formal models. These include informal expressions of concern, listing aspects not included with a verbal expression of their possible importance. In particular, we might list ‘imaginable surprises’ [32]—things we suspect could occur but for which we do not have the understanding to include in a formal model. The UK Climate Impact Programme [33] boldly produced Bayesian probability distributions for quantities such as the mean temperature rise on a 25 km grid under specific emission scenarios, but acknowledged that there are potentially important factors that have been left out of the model due to lack of understanding or unmodelled unpredictability, such as solar activity, volcanic activity or the methane cycle; however, they suggest informally that solar activity is unlikely to have more than a ±0.5°C effect [34].

    The European Food Safety Authority (EFSA) has suggested making qualitative assessments of the impact on a final conclusion of different unmodelled sources of uncertainty [35], an approach that has been taken up by the European Chemicals Agency [36]. An example is shown in table 6. A tiered approach is proposed, in which, if such a qualitative analysis suggests a potentially important problem, individual scenarios are examined in more detail and a full probabilistic modelling assessment is finally carried out, essentially moving the uncertainty into the model.

    Table 6.Qualitative evaluation of influence of uncertainties on an assessment of ochratoxin A exposure for high consumers contaminated with this substance. Adapted from [35].

    source of uncertainty direction and magnitude of effect
    moderate under-reporting of consumption is known to occur −−
    misreporting: some subjects will have reported the food that they ate in a wrong food category ±
    use of broad food categories causes over-estimation of exposure +++
    etc. etc.
    qualitative evaluation of overall effect of identified uncertainties: estimates for high consumers are likely to over-estimate adult exposure by a moderate amount, but there might be under-estimates for regional populations consuming locally produced food, and estimates are probably under-estimates for children ++ adults, ± local populations, −− children

    Following such an informal assessment, it is possible to go further and express the potential bias introduced by unmodelled factors as a probability distribution. Turner et al. [37] elicit judgements as to the quality and relevance of studies contributing to a systematic review of the benefits of a medical intervention, and quantify possible internal and external biases arising from study limitations. The elicitation process has two stages: first a qualitative assessment of the potential effects of biases that strongly resembles the EFSA procedure outlined in table 6, followed by a quantitative assessment. This finally leads to an automatic down-weighting of weaker or less relevant studies when estimating the overall effect of the intervention. If the estimates of potential bias were derived from a formal data-analysis, then this could form part of an integrated probability model.
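    A minimal numerical sketch of the general idea of such bias adjustment (not Turner et al.'s actual elicitation or data) is the following: elicited additive bias terms shift each study's estimate and inflate its variance, so that weaker or less relevant studies are automatically down-weighted in an inverse-variance pooled estimate. All numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical study estimates (log odds ratios) with standard errors, plus
# elicited additive bias terms (mean, sd) representing judged internal and
# external biases -- all values are invented for illustration.
estimates = np.array([-0.40, -0.10, -0.55])
std_errs  = np.array([ 0.15,  0.30,  0.20])
bias_mean = np.array([ 0.00,  0.15,  0.05])   # judged expected bias
bias_sd   = np.array([ 0.05,  0.25,  0.10])   # uncertainty about that bias

# Subtract the expected bias and add the bias variance: studies with larger
# judged bias uncertainty receive smaller inverse-variance weights.
adjusted  = estimates - bias_mean
total_var = std_errs**2 + bias_sd**2
weights   = 1.0 / total_var

pooled    = np.sum(weights * adjusted) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"bias-adjusted pooled effect: {pooled:.3f} (s.e. {pooled_se:.3f})")
```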

    Finally, it is important to note that a ‘consumer’ confronted with a risk assessment is faced with another unmodelled source of uncertainty that the assessment implicitly assumes to be absent: can we assume the modeller to be trustworthy and/or competent?

    Example: balls in a bag. Suppose we had some explicit suspicions about the set-up that confronts us when making the choice between bags described previously; for example, are there really only 10 balls in each bag, and was the mechanism for choosing the number of red balls as claimed? It becomes clear that doubts at this level are intricately tied up with the trust we have in our sources of information.

    (h) Level 5. Uncertainty about unknown inadequacies of all models

    This final level concerns doubts about the entire modelling process, but where there is unwillingness to express the reasons for those doubts. This can be thought of as an expression of our confidence in our basic understanding, of the quality of the evidence underlying the analysis or conversely of our ignorance. Of course, it is important to distinguish between acknowledged or ‘conscious ignorance’ [16], where we know we don't know what we don't know, and unacknowledged or ‘meta-ignorance’ [16], where we don't even consider the possibility of error.

    The idea of acknowledging the limitations in our capacity for formal conclusions has a long history. A classic quote is from Keynes [38]

    By ‘uncertain’ knowledge, let me explain, I do not mean merely to distinguish what is known for certain from what is merely probable. The game of roulette is not subject, in this sense, to uncertainty; nor is the prospect of a Victory bond being drawn. Or, again, the expectation of life is only slightly uncertain. Even the weather is only moderately uncertain. The sense in which I am using the term is that in which the prospect of an European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention, or the position of private wealthowners in the social system in 1970. About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know.

    Keynes [38], pp. 213–214

    More recently, of course, Donald Rumsfeld has become an icon of risk analysis through his poetic statement

    There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we now know we don't know. But there are also unknown unknowns. These are things we do not know we don't know.

    Rumsfeld [39]

    although the quotes elsewhere in this paper show that this was hardly a new idea.

    A particularly powerful plea for humility came from Oliver Cromwell, who in 1650 was trying to avoid a battle with the Church of Scotland, which was then supporting the return of the son of Charles I, the future Charles II. He wrote ‘is it therefore infallibly agreeable to the Word of God, all that you say?’ and continued with what may be termed ‘Cromwell's Law’: ‘I beseech you, in the bowels of Christ, think it possible you may be mistaken’ [40].

    The idea of thinking it possible that you may be mistaken runs through Taleb's discourse on black swans [41]: events that were not predicted and create a huge shock, but after which everyone finds a reason why they were not so surprising after all. Taleb makes a sustained attack on economic modellers who make inappropriate assumptions and then act as if their models were true, which turned out to be prescient of the crisis of 2008. It is interesting to note, however, that an anticipation of possible surprises, arising from entirely unforeseen circumstances, has played an integral part in statistical process control since Shewhart's early work in the 1920s [42], which introduced the idea of ‘special-cause variation’, a qualitatively different cause of variability from that normally observed, and one which is always a surprise.

    One possible response to acknowledgement of the limitations of the modelling process is to adapt the technical methods to deal with an unwillingness to provide a complete specification. Options include incompletely specified probability distributions, such as only specifying upper and lower probabilities for events, or the Dempster–Shafer approach in which a certain probability is left unassigned for everything else that might be the case. Lempert & Collins [43] compare a range of formal approaches that acknowledge deeper uncertainties and seek robust behaviour; these include a precautionary approach in which the worst case is assumed, trading some optimal performance for less sensitivity to assumptions (which is related to the Info-Gap procedure [44]), providing reasonable performance over a wide range of possible scenarios, and keeping options open. The UK COT also adopt a safety factor of 10 ‘when standard data package is not complete’ [28], essentially a precautionary hedge against unspecified inadequacies in the model.
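    The flavour of these robustness-seeking approaches can be conveyed with a toy comparison; this is a generic sketch of trading expected performance for worst-case protection, not the Info-Gap or Lempert & Collins methodology, and all losses and probabilities are invented.

```python
import numpy as np

# Hypothetical losses for three candidate actions under four scenarios; only
# the first three scenarios carry (rough) probabilities, the fourth stands in
# for an unassessed 'surprise'.
losses = np.array([
    # s1   s2   s3   surprise
    [2.0, 3.0, 4.0, 20.0],   # action A: best on average, fragile to surprise
    [3.0, 3.5, 4.5,  8.0],   # action B: slightly worse on average, more robust
    [5.0, 5.0, 5.0,  6.0],   # action C: precautionary, flat performance
])
probs = np.array([0.5, 0.3, 0.2])        # probabilities over s1..s3 only

expected_loss = losses[:, :3] @ probs    # ignores the surprise scenario
worst_case = losses.max(axis=1)          # precautionary (maximin-style) view

for name, e, w in zip("ABC", expected_loss, worst_case):
    print(f"action {name}: expected loss {e:.2f}, worst case {w:.1f}")
```

    Action A minimizes expected loss under the within-model probabilities, but a precautionary view of the unassessed scenario favours the flatter, more robust options.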

    While respecting the insights of these technical approaches, we do not feel it is generally appropriate to respond to limitations in formal analysis by increasing the complexity of the modelling. Instead, we feel a need to have an external perspective on the adequacy and robustness of the whole modelling endeavour, rather than relying on within-model calculations of uncertainties, which are inevitably contingent on yet more assumptions that may turn out to be misguided.

    This brings us back to qualitative measures of the ‘confidence’ we have in our analysis, based on the quality of the evidence available. We have seen earlier how the IPCC has tried to deal with this issue, but it may be valuable to look in other areas in which evidence-based recommendations have developed a very strong profile. For example, in medicine, the Cochrane Collaboration is a major international organization that conducts systematic reviews of the evidence on the effectiveness of treatments, which generally feature meta-analyses of a treatment effect whose conclusions are expressed as the usual point estimates and intervals. However, this formal procedure is supplemented by a judgement of the quality of evidence underlying the estimate, expressed as the Grading of Recommendations Assessments, Development and Evaluation (GRADE) scale shown in table 7 [45].

    Table 7.GRADE scale for quality of evidence. Adapted with permission from [45].

    high quality further research is very unlikely to change our confidence in the estimate of effect
    moderate quality further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate
    low quality further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate
    very low quality any estimate of effect is very uncertain

    It is important to note that the GRADE scale does not depend on some prescriptive ‘hierarchy of evidence’, but provides an operational definition based on a judgement of the robustness of the conclusions to further information. This scale could be adapted to risk analysis simply by replacing ‘estimate of effect’ by ‘assessed risk’.

    Example: balls in a bag. Level 5 consists of unvoiced or unimagined suspicions—in our public demonstrations, we sometimes substitute an unpleasant-feeling object in a bag when asking people to draw a ball, which gives a suitable surprise and communicates the trust implicit in assuming that the list of possible outcomes has been fully provided.

    (i) Case study. Intergovernmental Panel on Climate Change revisited

    The IPCC guidance notes for the Fifth Assessment Report [46] have followed, to some extent, the critical report of the IAC [13] in recommending that author teams (i) summarize the quality of evidence and level of agreement as in table 1, (ii) if there is high agreement and/or robust evidence, express qualitative confidence in findings on a scale ‘very low’, ‘low’, ‘medium’, ‘high’ and ‘very high’, and (iii) when appropriate, assign a quantitative probability as in table 2. When expressing uncertainty about key unknown quantities, a six-level scale of ‘calibrated language’ is also provided, ranging from situations of ambiguity and ignorance where confidence or likelihood should not be assigned, through only specifying an order of magnitude, to full specification of a probability distribution. Authors are urged to provide a traceable account of their uncertainty assessments.

    This approach is strongly based on the work of Kandlikar et al. [8], who argue that confidence and likelihood cannot be separated. We can therefore identify two somewhat different approaches to expressing uncertainty when we recognize indeterminacy through disagreement or poor-quality evidence. The GRADE approach encourages a provisional numerical expression of uncertainty, but qualified by an additional judgement about the robustness of the analysis. In contrast, the revised IPCC guidance recommends that if conclusions cannot be given with high confidence, then quantities should not be given probability distributions. However, the difference between these approaches should not be exaggerated; the vital element is that each allows an expression of caution and humility concerning a precise quantification of uncertainty.

    (j) Case study. Egg-gate

    A somewhat absurd example illustrates our proposed structure. In February 2010, a story was reported in the newspapers [47] and on prime-time radio about a woman who had bought a box of half a dozen extra-large eggs and found they all had double yolks. A representative from the Egg Council declared that only 1 in 1000 eggs were double-yolked, and so getting six such eggs in a box was assumed to have a chance of 1 in 1 000 000 000 000 000 000 (1/1000 multiplied together six times). This is an old English trillion.
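    The story's arithmetic, and the plausibility check applied in the deconstruction below, can be verified directly; this sketch simply follows the story's own (dubious) assumption of an independent 1-in-1000 chance per egg.

```python
# Probability that all six eggs in a box are double-yolked, assuming an
# independent 1-in-1000 chance per egg (the assumption made in the story).
p_double_yolk = 1 / 1000
p_box_of_six = p_double_yolk ** 6            # 1e-18: one in an old English trillion

# How often such a box would then be expected to turn up in the UK.
boxes_per_year = 2_000_000_000               # roughly the half-dozens sold per year
expected_boxes_per_year = boxes_per_year * p_box_of_six
years_between_events = 1 / expected_boxes_per_year

print(f"P(all six double-yolked) = {p_box_of_six:.0e}")
print(f"expected roughly once every {years_between_events:,.0f} years")
```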

    We can deconstruct this ‘risk analysis’ using the five levels.

    — Is this a plausible probability for this event? There are around 2 000 000 000 half-dozens of eggs sold in the UK each year, a huge number, but even so we would only expect such a rare event to occur once in every 500 000 000 years. So this suggests the probability is incorrect.

    — The basic parameter of the model is the 1/1000 chance of a double-yolked egg. But this risk is much higher for extra-large eggs, and so the parameter is thrown into doubt.

    — The model assumes that eggs in a box are independent. But a little research shows that eggs in the same box tend to come from the same flock, and hence getting one double-yolked egg increases the odds of getting another in the same box. Hence, we should consider elaborating our model to allow for correlated eggs.

    — But what about other inadequacies of any such model? A little reflection suggests that we might want to know about where the eggs came from, how they are screened, and many other possibly influential factors, before we could say whether this was really such a surprising event.

    — Finally, what did we not even think of, what were we entirely ignorant about? When we use this example in lectures, we relate how the next box of eggs we bought contained six eggs that were all double-yolked! Extraordinary? We then show the picture in figure 3, revealing that this was not a difficult feat, since such boxes can be bought in a supermarket. This was a total surprise to us and has been to all audiences (although it was pointed out in the comments on the newspaper articles); it had not crossed our minds that double-yolked eggs can easily be detected by holding them up to a light and selected for inclusion in a box, and so the six eggs in the original story were most likely deliberately selected or the box wrongly labelled.

    Figure 3. The next box of eggs were all double-yolked, but that was hardly surprising.

    Of course, this still means that there is uncertainty about whether a box of ‘ordinary’ eggs will all be double-yolked, and it remains an unlikely event, but the basic model that was assumed was utterly wrong.

    (k) So what is to be done in the face of deeper uncertainties?

    The running theme throughout this paper is the importance of recognizing that formal ‘scientific’ analyses are constructed on the basis of current assumptions and judgements, that there are deep uncertainties that are not expressed through the standard intervals and sensitivity analysis, and that these uncertainties are not necessarily reduced by additional information.

    One response to such uncertainty is to adopt the ‘precautionary principle’, in which preventive action may be taken without waiting for firm scientific evidence [48]. However, we have seen in the previous section that we are naturally led into precautionary and robust approaches as a consequence of acknowledging deeper uncertainties without the need to invoke a separate principle; as Stirling [22] emphasizes, precaution arises as part of the risk assessment rather than being part of risk management.

    Our final recommendations may appear mere common sense.

    — Use quantitative models with aleatory and epistemic uncertainty expressed as Bayesian probability distributions.

    — Conduct sensitivity analysis to alternative model forms and assess evidential support for alternative structures, without putting probabilities on models.

    — Provide a list of known model limitations and a judgement of their qualitative or quantitative influence, possibly along the lines shown in table 6, and ensuring there has been a fully imaginative consideration of possible futures.

    — Provide a qualitative expression of confidence, or lack of it, in any analysis based on the quality of the underlying evidence, possibly expressed using an adapted GRADE scale or the IPCC guidance [46].

    — In situations of low confidence, use deliberately imprecise expressions of uncertainty about quantities, such as their orders-of-magnitude, whether they are positive or negative, or even refuse to give any judgement at all; the IPCC guidance suggests a calibrated scale for these expressions.

    — When exploring possible actions, look for robustness to error, resilience to the unforeseen, and potential for adaptivity in the face of the unexpected [10].

    — Seek transparency and ease of interrogation of any model, with clear expression of the provenance of assumptions.

    — Communicate the estimates with humility, communicate the uncertainty with confidence.

    — Fully acknowledge the role of judgement: this ‘..means engaging in policy making by fully accepting the constructive, participatory, ultimately open-ended and untamed nature of judgements under uncertainty’ [4].

    Finally, when working with policy-makers and policy-communicators, it is important to avoid the attrition of uncertainty in the face of an inappropriate demand for certainty: as Wynne claims, ‘the built-in ignorance of science towards its own limiting commitments and assumptions is a problem only when external commitments are built on it as if such intrinsic limitations did not exist’ [19].

    Footnotes

    One contribution of 15 to a Discussion Meeting Issue ‘Handling uncertainty in science’.