On estimating local long-term climate trends

Climate sensitivity is commonly taken to refer to the equilibrium change in the annual mean global surface temperature following a doubling of the atmospheric carbon dioxide concentration. Evaluating this variable remains of significant scientific interest, but its global nature makes it largely irrelevant to many areas of climate science, such as impact assessments, and also to policy in terms of vulnerability assessments and adaptation planning. Here, we focus on local changes and on the way observational data can be analysed to inform us about how local climate has changed since the middle of the nineteenth century. Taking the perspective of climate as a constantly changing distribution, we evaluate the relative changes between different quantiles of such distributions and between different geographical locations for the same quantiles. We show how the observational data can provide guidance on trends in local climate at the specific thresholds relevant to particular impact or policy endeavours. This also quantifies the level of detail needed from climate models if they are to be used as tools to assess climate change impact. The mathematical basis is presented for two methods of extracting these local trends from the data. The two methods are compared first using surrogate data, to clarify the methods and their uncertainties, and then using observational surface temperature time series from four locations across Europe.


London, UK
Climate sensitivity is commonly taken to refer to the equilibrium change in the annual mean global surface temperature following a doubling of the atmospheric carbon dioxide concentration. Evaluating this variable remains of significant scientific interest, but its global nature makes it largely irrelevant to many areas of climate science, such as impact assessments, and also to policy in terms of vulnerability assessments and adaptation planning. Here, we focus on local changes and on the way observational data can be analysed to inform us about how local climate has changed since the middle of the nineteenth century. Taking the perspective of climate as a constantly changing distribution, we evaluate the relative changes between different quantiles of such distributions and between different geographical locations for the same quantiles. We show how the observational data can provide guidance on trends in local climate at the specific thresholds relevant to particular impact or policy endeavours. This also quantifies the level of detail needed from climate models if they are to be used as tools to assess climate change impact. The mathematical basis is presented for two methods of extracting these local trends from the data. The two methods are compared first using surrogate data, to clarify the methods and their uncertainties, and then using observational surface temperature time series from four locations across Europe. 2013 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/ by/3.0/, which permits unrestricted use, provided the original author and source are credited. 1

. Introduction
That increases in atmospheric greenhouse gases will lead to global warming and climate disruption at a level that poses a threat to society is clear. The amount of warming to expect on a global scale during the twenty-first century under any particular concentration scenario is, however, uncertain. It depends partly on climate sensitivity that is defined as the equilibrium change in the annual mean global surface temperature following a sustained doubling of the atmospheric carbon dioxide concentration [1]. For any given change in global mean temperature, a wide variety of changes are possible at local scales. Yet it is at local scales that the impacts of climate change will be felt directly and at which adaptation planning decisions must be made [2]. Many planners and environmental disciplines are aware that an assumption of stationarity is ill-founded; therefore, basing decisions on climatic distributions over the last century is likely to be sub-optimal for the climate of today and even more so for the climate of the future [3]. The question for planners is how to obtain better information to support their assessments and decisions. The simulations of complicated global climate models can suffer from serious questions of robustness and reliability [4][5][6], and are known to inadequately represent behaviour that will have an impact on local scales. They are, therefore, a questionable basis for decision-making [2,4,7]. Our aim herein is to present a method for analysing local climatic time-series data to assess which quantiles of the local climatic distribution show the greatest and most robust trends over the last 60 years. A local trend parameter is constructed to automatically incorporate all the effects of wider regional and global changes while allowing for the local geographical factors that affect the local climate.
The local trend parameter proposed here isolates the change that has occurred in a single set of at-a-point observations of a given climate variable, for example, maximum daily temperature, over the available observation period (here, 60 years). The parameter that we obtain is a function of both geographical location and the given variable (or occurrence likelihood of the given variable); these are treated as independent. We thus obtain as outputs how the trend varies with temperature (or occurrence likelihood), and with geographical location. The trend parameter isolates the change in the climate variable, say temperature, at a fixed likelihood, and hence is closely related to the change in event return time. The novelty of our approach is its simplicity. It offers simple quantitative understanding of what the data can tell us about past changes and how robust they are. The trend parameter is calculated independently for each location; no geographical correlation is assumed. Thus, as outputs it identifies where trends are consistent across extended geographical regions, and which regions do not show a clear signal.
Our approach is thus distinct from, and complementary to, inference-based approaches that combine multiple observations and evidence [8] or combine spatio-temporal trends ( [9]; see also [10]). Importantly, it has relevance for the recent debate on the perception of how the climate has changed [11] in that it quantifies how changes in climate have played out on local scales and what the uncertainties are, simply from a given set of observations, independent of models or other inferences. As our method is model independent and relies entirely on the data, it does not distinguish between the many components that can influence local climate. When combined with understanding of climatic trends expected for the future, where such trends can be identified and understood, this local measure can assist in prioritizing adaptation projects and making judgements over the relative risks of different options.
This quantile-dependent change may or may not be representable by a change in the mean and/or a few higher moments of the distribution. Given the nonlinear nature of the system, we might expect that it would not. Furthermore, variations in the distribution can occur on a wide range of time scales. The mathematical challenge is, therefore, to make best use of the data to quantify the changes, to identify the robust aspects of the results and when the change cannot be well quantified. This paper addresses these challenges. In §2, we derive the local trend parameter and present two methods for extracting it from time-series data. In §3, the methods are illustrated using surrogate climate data designed to illustrate the concepts and the challenges of extracting a clear signal given statistical variations. Section 4 illustrates its application to real data and gives methods for quantifying robustness. The conclusions present opportunities for further development and application of the technique. The relationship to, and representation of, the local long-term trend in terms of return times is covered in appendix A.

Method: a parameter for local long-term trends
Our starting point is that we have a daily observation of some variable, say mean or maximum or minimum temperature, T at different geographical locations and over an extended interval of time. There is a climatic distribution from which these observations are drawn. This local distribution will vary with time, in a manner that depends upon the changing state of the climate at all scales. We represent the climate state at any given time t by the function g(t).
An underlying assumption is then that the effect of changing g on the distribution of observations of T is on a much longer time scale than that over which individual samples of T distributions are obtained. For each location, we can aggregate daily seasonal temperature observations over some multi-year interval τ , centred on time t, to obtain a cumulative density function (cdf). Let us assume that the cdf C(T, g(t)) can be treated as a continuous function of temperature and of time-dependent climate state g. Then at a given time t with corresponding g(t), the cdf value q is the likelihood of the quantile T q of a given temperature observation T ≤ T q : and T q =C(q, g(t)), (2.2) whereC is the quantile function obtained by inverting the cdf with respect to temperature. We can write the variation, to leading order: At some later time, we observe a temperature T * so that dT q = T * − T q ; we will assume that this can be treated as a small change. We can in principle ask for the T * that could occur in the absence of any change in climate state, that is dg = 0, directly from the change in the quantile function; from (2.4) this is We can also ask for the contribution to the observation T * solely owing to the change in climate state, that is at dq = 0, from the quantile function. From (2.4) this is This is illustrated in figure 1. A sufficiently accurate approximation for the quantile function at two times t 1 and t 2 would directly give an estimate of the discrete change T q over an interval [t 1 , t 2 ]. We now approximate dT q by T q by using ∂C ∂g q dg C (q, g(t 2 ), g(t 1 )) C (q, g(t 2 )) −C(q, g(t 1 )).
This direct estimate of the change T q due to the change in g requires full knowledge of the quantile function, that is, the inverse of the cdf. In practice, given that there is a fundamental rsta. royalsocietypublishing

C(g,T)
T C(g(t 1 )) If the temperature cdf is not changing with time (g is constant) then temperature T * is from the same distribution as temperature T q , and is observed with likelihood q + dq.Thisisthepathdg = 0. If the temperature cdf is changing with time, due to change in g(t), then temperature T * is observed at time t 2 with the same likelihood q as temperature T q was observed at t 1 . This is the path dq = 0. (Online version in colour.) upper limit on the number of observations in a given season, this can be problematic, particularly in the tails of the distribution for non-Gaussian processes. We can instead obtain an expression where a direct estimate of the cdf is needed.
where P(T q , g(t)) is the probability density function (pdf). Now let us compare two different realizations of the cdf obtained at different times t 1 and t 2 over which there is the discrete change T q . Then, we will approximate dT q by T q using This expression suggests pragmatic strategies for estimating a local trend parameter S = T q (dq = 0). This parameter is a function of the observed variable, here temperature T, and of the geographical location at which T is observed. It parametrizes which parts of the distribution are changing most rapidly with change in climate state g. We can immediately see that the pdf acts as an amplification factor for the local trend parameter. A given change in cdf will be most impactful where the pdf is small, that is, in the tails of the distribution rather than at the mean. Importantly, in a finite time interval of data, the pdf can be estimated across an aggregate of the entire dataset, whereas the estimate of the change in the cdf necessitates smaller sized samples.
In sparse datasets, such as precipitation, an estimate in the change in cdf may not be feasible. However, it may still be possible to estimate the pdf aggregated over the entire interval. The geographical variation of the pdf alone contains information on the geographical variation of the local trend parameter. Alternatively, we can write (2.10) in terms of the change in return time R(T q ) of events of size T q (see appendix A):

Illustration with synthetic data
We first illustrate the method, and how it can be used to optimize against observational statistical uncertainty by constructing a test time series of pseudo-daily observations that is of a similar finite size as the data as follows. We have daily observations over a 'season' which is typically no longer than three months, over a period of typically 50-100 years. For each of 100 'years' of the test time series, we then represent the 'season' by a sample of 100 independent, identically distributed random numbers that are drawn from a pdf which is the Gamma distribution (where γ is the Gamma function): For each 'year' t, that is, each 100 element sample, we specify the shift and scale parameters a(t) and b(t), where t = [1, 2, 3, . . . , 100]. These determine the mean (ab), the variance (ab 2 ), the skewness (2/ √ b) and the excess kurtosis (6/b). To model a distribution with slowly changing g(t), we will consider a linear change in a and/or b from one 'year' to the next. Figure 2 shows 100 'years' of this pseudo-temperature data. The column (i) is for 100 samples ('years') with the same constant a = a 0 = 3 and b = b 0 = 5, representing a time series with no underlying change in climate state g(t). The column (ii) is the same samples but with each shifted such that x → x + 5 × t/100; this is a uniform trend in time across the entire distribution. The column (iii) has b constant and a(t) = a 0 [1 + 1/2(t/100)], so that the mean and variance both increase with increasing years t, and column (iv) has a constant and b(t) = b 0 [1 + 1/2(t/100)], so that the first four moments all increase with increasing years t. The pseudo-temperature series in column (iv) has an approximately linear trend in these quantiles.
Clearly, there is considerable statistical variation in the pdfs, and to a lesser extent, the cdfs, as a consequence of small yearly sample size. To increase the sample size, we need to aggregate observations over several consecutive years. Our expression (2.10) allows us to optimize this separately for the pdf and the cdf. A procedure for this is shown in figure 3  In the fourth row of panels, the ratio S = − C/P obtained from these is plotted (red line) for all values. Clearly, when either − C or P is small, this ratio is dominated by statistical uncertainty and we have overplotted (black line) values where the magnitudes of both C and P exceed a threshold value of 0.005. For arbitrarily good statistics, these plots should show the following, from left to right: column (i) C → 0, so S → 0, (ii) S constant, (iii) and to a greater extent (iv) S is largest for values of x greater than the mean as the positive skew and kurtosis both increase with t.
Increasing the sample interval τ , over which the cdfs used to obtain C are estimated, improves statistics. This procedure is valid provided that this interval τ is much shorter than that rsta.royalsocietypublishing.org Phil Trans R Soc A 371:   over which g(t) is changing. Importantly, in real data, this procedure effectively removes trends on time scales shorter than τ . The choice of τ thus needs to be optimized for the specific dataset under consideration. We demonstrate this using the pseudo-temperature data in figure 4a,b for τ = 3 and τ = 9 'years', respectively. In these plots we calculate the local trend parameter S for 10 successive realizations of the pseudo-temperature data, that is, we increment t 1 and t 2 by one year for each curve that is plotted, keeping the interval t 2 − t 1 fixed. For arbitrarily large sample size these curves should collapse on top of each other. This suggests an operational measure for the reliability of the local trend parameter; one can consider a given value of the parameter to be reliable provided that value is exceeded in all (here 10) realizations. A further check on the robustness of the trend in time is to construct a surrogate dataset (see also [12]), that is, to recalculate the local trend parameter having randomly shuffled the time order in which the samples are observed; this quantifies the observed local trend parameter value that could occur  randomly with these finite sample statistics. We will illustrate these methods using the E-OBS data in §4.

10
We finally compare this method with direct estimation of the local trend parameter using expression (2.6). This is shown in figure 5a,b, which are in the same format as figure 4a,b, but where we have numerically inverted the cdf to obtain the quantile function. We plot the difference in quantile function with all values plotted in red, and values for 0.05 < C < 0.95 plotted as a black line as an indication of the limits of validity of the method; this is not a firm quantitative estimate of uncertainty and, in practice, one would need to obtain this from an analysis of the counting statistics and other uncertainties in the data. We can see that the indirect method and direct methods correspond within errors; the discretization or 'stepping' in figure 5 is a direct consequence of the process of numerical inversion of the cdf.

Application: E-OBS temperature data
We now illustrate this methodology with daily temperature measurements from the E-OBS gridded dataset [13] (1950-2011 v. 5.0). We will look at four locations chosen arbitrarily, but to be sufficiently far from each other to show geographical variation, at longitude and latitude (i) [ [11.25 43.75] Florence, Italy. We will repeat the above procedure using data from these four locations, focusing on summer; aggregating daily temperatures over three month intervals within each year of observation (June-August) each year gives a sample of 92 observations per year. Figure 6 shows the pdfs and cdfs for these yearly samples in the same format as figure 2. Figure 7a,b are in the same format as figure 3a,b showing the data with sample τ = 3 and 9 years, respectively. As above, a sample centred on year t will for τ = 3 include years t − 1, t, t + 1 and for τ = 9 include years t − 4, t − 3, . . . , t + 4; so that, for τ = 3 the sample centred on 1955 is over years 1954-1956, and for τ = 9, over years 1951-1959 inclusive. Here, C is then estimated using West Wales (ii) shows no significant local trend. North Spain (iii) shows a local trend of ∼1 • -2 • across all quantiles, and Florence (iv) has the same functional dependence upon quantile as The Netherlands, but at higher temperatures; at the quantile ∼27 • -35 • there is a positive shift of ∼2 • -3 • . Figure 8a,b shows the local trend parameter estimated for 10 successive intervals in the same manner as above, for τ = 3 and 9 years, respectively. For each interval, we take two samples centred on year t 1 and t 2 and we will use the same t 2 − t 1 for all 10 intervals. The first estimate of C is made using samples centred on t 1 = 1955 and t 2 = 1995, the second, on samples centred on t 1 = 1956 and t 2 = 1996, and so on. In figure 8a, we show the result of this procedure for samples with τ = 3, so that the first estimate of C is made using two samples aggregated over the years 1954-1956 and 1994-1996, the next estimate over 1955-1957 and 1995-1997, and so on.    1.0 1960 1980 2000 1960 1980 2000 1960 1980 2000 1960 1980 2000 1960 1980 2000 1960 1980 2000 1960 1980 2000 1960 1980 2000   20  10  30  40  20  10  30  40  20  10  30  40  20  10  30  40   20  10  30  40  20  10  30  40  20  10  30  40  20  10  30  40   20  10  30  40  20  10  30  40  20  10  30  40  20  10 30 40    shows the same procedure with τ = 9. Here, there is both variability due to the finite size of the sample as seen in the model data above, and in addition, systematic trends due to departures from our simple assumption (2.10); by taking a lowest order expansion, we have assumed that the trends can be approximated as linear. We have recalculated the curves of figure 8a,b using the direct estimate of the local trend parameter (2.6) and these are shown in figure 9a,b. Again, within uncertainties, these track the behaviour seen in figure 8a,b. The local trend parameter (here, for temperature) provides both geographical and at a quantile variation as outputs. While the dependence on quantile is most easily seen at a given location as shown above, geographical dependence can be seen using maps. We can map the local trend parameter either at fixed quantile (temperature) or at fixed quantile function and the latter case is shown in figure 10a. Here, for the E-OBS data, we plot, as a colour map, the value of S(longitude, latitude, q) in degrees centigrade (indicated by the colour bar), at q = 0.5 and q = 0.9, with C estimated using samples centred on years t 1 = 1958 and t 2 = 2001 with τ = 9 years (so that C is obtained from two values of C from samples over years 1954-1962 and 1997-2005). On these plots, locations where either | C| or P fall below the threshold are indicated as white space. Clear geographical trends can be seen, which vary with quantile function. Detailed interpretation of these maps will be presented in future work; here, we focus on two methods to quantify how robust these trends are. First, we can recalculate S(longitude, latitude, q) as above, with C estimated using 10 samples centred on years t 1 = 1955, 1956, 1957, . . . and t 2 = 1995, 1996, 1997, . . . with the same t 2 − t 1 . We then indicate on the maps shown in figure 10a locations (longitude, latitude) where in all 10 samples the magnitude of S > 2 (black crosses), S > 1 (black dots) and S < 1 (white crosses). This graphically indicates robustness in the trend in C over time. Second, we will estimate how significant the observed values of S(longitude, latitude, q) are compared with that obtained from a random variation in C over time (a surrogate data method, see [12]). We recalculate 100 realizations of the map shown in figure 10a with the years of the E-OBS data randomly shuffled. We then for each (longitude, latitude, q) count the number of times N r the value of S in the randomized realization exceeds that seen in the original data. We then map N r (longitude, latitude, q) in figure 10b. The colour map can be directly read as the likelihood (percent) that the local trend S(longitude, latitude, q) shown in figure 10a could occur at random. We see that throughout most of central Europe, this likelihood is below 2 percent, and generally is large where S is small.

Conclusions
We have presented a method which provides an improved way of understanding the observed consequences of a globally changing climate at specific locations and for specific thresholds. It provides a methodology for informing user-specific decisions that are often vulnerable to specific thresholds. It avoids the use of complicated models and all the epistemic uncertainties which that can involve. There are further interesting mathematical challenges in applying it to different types of distributions such as those representing precipitation, multi-variable distributions and functions of multiple variables. There are also, of course, opportunities to apply the method to thresholds experienced in real-world situations. These could be as diverse as building design, infrastructure vulnerability, strategic planning for heat waves, changing risks of crop failure, impacts on food prices, etc. These opportunities are intrinsically multi-disciplinary in nature and provide an exciting and rich vein of new research opportunities to support science-based policy and decision-making.