Human mortality at extreme age

We use a combination of extreme value statistics, survival analysis and computer-intensive methods to analyse the mortality of Italian and French semi-supercentenarians. After accounting for the effects of the sampling frame, extreme-value modelling leads to the conclusion that constant force of mortality beyond 108 years describes the data well and there is no evidence of differences between countries and cohorts. These findings are consistent with use of a Gompertz model and with previous analysis of the International Database on Longevity and suggest that any physical upper bound for the human lifespan is so large that it is unlikely to be approached. Power calculations make it implausible that there is an upper bound below 130 years. There is no evidence of differences in survival between women and men after age 108 in the Italian data and the International Database on Longevity, but survival is lower for men in the French data.

www.supercentenarians.org. For some countries the IDL has recently been extended to include data on semi-supercentenarians, i.e., people who lived to an age of at least 105. As of October 2019, it contained new French data on 9612 semi-supercentenarians and 241 supercentenarians, which we call the France 2019 data; all these supercentenarians but only some of the semi-supercentenarians were validated.
A 2016 version of the IDL (IDL 2016) was analysed by Gampe (Gampe, 2010) and Rootzén & Zholud (Rootzén & Zholud, 2017), the latter with extensive discussion (Rootzén & Zholud, 2018). Both papers made allowance for the sampling scheme, and in particular for the truncation of lifetimes that it entails. They concluded that there is no indication of an increase in mortality for ages above 110 years, and hence no indication of a finite upper limit to the human lifespan. Rootzén & Zholud (Rootzén & Zholud, 2017) also found no differences in mortality between men and women or between populations from regions and countries as varied as Japan, the USA or Europe. These conclusions are striking, but the small size of the IDL and the lack of balance between the subgroups limit the statistical power available to detect such differences.
The Italian National Institute of Statistics (ISTAT) has recently produced an important new database (ISTAT, 2018) containing individually validated birth dates and survival times in days of all persons in Italy who were at least 105 years old at some point in the period from 1 January 2009 to 31 December 2015. Using advanced survival analysis tools, Barbi et al. (Barbi et al., 2018) found that death rates in this dataset "reach or closely approach a plateau after age 105" and found a small but statistically significant cohort effect.
Data analysis must take into account the sampling scheme underlying such data. The ISTAT lifetimes are left-truncated because only individuals who attain an age of at least 105 years during the sampling period are included, and they are right-censored because the date of death of persons still alive in 2016 is unknown; see Figure 1. The right-censored lifetimes are shown by the tick marks at the right side of the panel, and include the oldest person; ignoring either the truncation or the censoring could lead to incorrect conclusions. The France 2019 lifetimes are left-and right-truncated: only individuals who are observed to die during the sampling period appear in the dataset. The statistical consequences are discussed in the Appendix A.2.
In our analysis we take the sampling frame into account and pinpoint the age, if any, at which mortality attains a plateau, and disentangle the effects of age and of birth cohort. We also compare mortality in the ISTAT, France 2019 and IDL 2016 databases, and between men and women.
We use the generalized Pareto distribution from extreme value statistics in the main analysis, supplemented by fits of the Gompertz distribution, which is standard in demography. We first outline our main results and conclusions; the Appendix gives a more detailed description of our methods.

Results for ISTAT data
Lifetimes beyond 105 years are highly unusual and the application of extreme value models (de Haan & Ferreira, 2006) is warranted. We use the generalized Pareto distribution, to model x, the excess lifetime above u years. In (1), a + = max(a, 0) and σ > 0 and γ ∈ R are scale and shape parameters. For negative shape parameter γ the distribution has a finite upper endpoint at −σ/γ, whereas γ ≥ 0 yields an infinite upper endpoint. The corresponding hazard function, often called the "force of mortality" in demography, is the density evaluated at excess age x, conditional on survival to then, i.e., . Censored observations are displayed in the right margin; the ticks indicate the ages of men (red) and women (black) still alive on 31 December 2015. The sampling frame includes only persons aged at least 105 years and alive on 1 January 2009, or whose 105th birthday occurred before 1 January 2016. The death counts for each calendar year are given at the top of the graph for men (red) and women (black).
where f (x) = dF (x)/dx is the generalized Pareto density function. If γ < 0, the hazard function tends to infinity at the finite upper limit for exceedances. When γ = 0, F is exponential and the hazard function is constant, meaning that the likelihood that a living individual dies does not depend on age beyond the threshold. In this case, mortality can be said to have plateaued at age u.
The choice of a threshold u such that Eq.
[1] models exceedances appropriately is a basic problem in extreme value statistics and is surveyed by Scarrott & MacDonald Scarrott & MacDonald (2012). If u is high enough for Eq. [1] to provide an adequate approximation to the distribution of exceedances, then the shape parameter γ is approximately unchanged if a higher threshold u is used, and the scale parameters for u and u have a known relationship, so a simple and commonly-used approach to the choice of threshold is to plot the parameters of the fitted distributions for a range of thresholds (Davison & Smith, 1990) and to use the lowest threshold above which parameter estimates stabilise. This choice balances the extrapolation bias arising if the threshold is too low with the increased variance incurred when taking u too high to retain an adequate number of observations.
The upper left-hand panel of Figure 2 shows that for age thresholds close to 105 years the estimated shape parameters for excess life lengths are negative, with 95% confidence intervals barely touching zero, but there is no systematic indication of non-zero shape above 107 years. The upper right-hand panel displays the estimated scale parameter of the exponential model fitted to life lengths exceeding the threshold. The scale parameters decrease for ages 105-107 but show no indication of change after age 107, where the scale parameter estimate is 1.45. Parameter stability plots suggest an exponential model and hence a constant hazard after age 107 or so, where a mortality plateau seems to be attained.
The upper part of Table 1 shows results from fitting Eq.
[1] and the exponential distribution to the ISTAT for a range of thresholds. The exponential model provides an adequate fit to the exceedances over a threshold at 108 years, above which the hypothesis that γ = 0, i.e., the exponential model is an adequate model simplification, is not rejected.  Figure 2: Parameter stability plots for the ISTAT data (top) and for the France 2019 data (bottom), showing the shape γ of the generalized Pareto distribution (left) and the scale σ e of the exponential distribution (right) based on lifetimes that exceed the age threshold on the x-axis. The plots give maximum likelihood estimates with 95% confidence intervals derived using a likelihood ratio statistic. The horizontal lines in the right-hand panels correspond to the estimated scale for excess lifetimes over 108 years for the ISTAT data.   Table 2: Estimates of the scale, σ e , of the exponential distribution, with 95% confidence intervals (CI). This distribution is fitted to exceedances of 108 years in the ISTAT and France 2019 data and of 110 years in the IDL 2016 data analysed in Rootzén & Zholud (2017).
The estimated scale parameter obtained by fitting an exponential distribution to the ISTAT data for people older than 108 is 1.45 (years) with 95% confidence interval (1.29, 1.61). Hence the hazard is estimated to be 0.69 (1/years) with 95% confidence interval (0.62, 0.77); above 108 years the estimated probability of surviving at least one more year at any given age is 0.5 with 95% confidence interval (0.46, 0.54).
We investigated birth cohort effects, but found none; see Appendix A.3.

Results for France 2019 data
Estimation for the France 2019 data was done as described in Rootzén & Zholud (Rootzén & Zholud, 2017), taking into account the left-and right truncation of the lifetimes. Both the parameter stability plots in the lower panels of Figure 2 and the results given in the lower part of Table 1 indicate that the exponential model is adequate above 108 years. For persons older than 108 the exponential scale parameter is estimated to be 1.41 (years) with 95% confidence interval (1.32, 1.51), the hazard is estimated to be 0.71 (years −1 ) with 95% confidence interval (0.66, 0.76) and the estimated probability of surviving at least one more year is 0.49 with 95% confidence interval (0.47, 0.52). Table 2 shows that estimates of the scale parameter for the exponential distribution for women and men for the France 2019 data differ. If men are excluded, then the estimated scale parameter increases from 1.41 to 1.46 years, and if the oldest woman, Jeanne Calment, is also excluded, the estimate for women drops to 1.44 years. Similarly to the ISTAT data, survival for ages 105-107 was lower in earlier cohorts.

Power Power
Our analysis above suggests that there is no upper limit to human lifetimes and that constant hazard adequately models excess liftime if one considers only those persons whose lifetimes exceed 108 years: there is no evidence that the force of mortality above this age is other than constant. One might wonder whether increasing force of mortality would be detectable, however, as the number of persons attaining such ages is relatively small. To assess this we performed a simulation study described in the Appendix A.7, mimicking the sampling schemes of the ISTAT, France 2019 and IDL 2016 datasets as closely as possible and generating samples from the generalized Pareto distribution with −0.25 ≤ γ ≤ 0. To remove overlap between the last two datasets, we dropped France from IDL 2016.
Any biological limit to their lifespan should be common for all humans, whereas differences in mortality rates certainly arise due to social and medical environments and can be accommodated by letting hazards vary by factors such as country or sex. With the overlap dropped we can treat the datasets as independent and compute the power for a combined likelihood ratio test of γ = 0 (infinite lifetime) against alternatives with γ < 0 (finite lifetime). For concreteness of interpretation we express the results in terms of the implied upper limit of lifetime ι = u − σ/γ. The left-hand panel of Figure 3 shows the power curves for the three datasets individually and pooled. The power of the likelihood ratio test for the alternatives ι ∈ {125, 130, 135} years, for example, is 0.46/0.32/0.24 for the ISTAT data above 108, 0.82/0.61/0.46 for the France 2019 data above 108, and 0.62/0.40/0.29 for the IDL 2016 data above 110. The power for ι = 125/130/135 years based on all three datasets is 0.96/0.80/0.64, so it appears to be rather unlikely that an upper limit to the human lifespan, if there is such a limit, is below 130 years or so.
Similar calculations give the power for testing the null hypothesis γ = 0 against alternatives γ < 0. Forcing all datasets to have the same shape parameter would allow them to have different endpoints so we reject the overall null hypothesis if we reject the exponential hypothesis any of the three datasets. The power of this procedure is shown with a dashed black line in Figure 3. The resulting combined power exceeds 0.8 for γ < −0.07 and equals 0.97 for the alternative γ = −0.1, giving strong evidence against a sharp increase in the hazard function after 108 years.

Gompertz model
The hazard function of the generalized Pareto distribution cannot model situations in which the hazard increases to infinity, but the upper limit to lifetimes is infinite. This possibility is encompassed by the Gompertz distribution (Gompertz, 1825), which has long been used for modelling lifetimes and often provides a good fit to data at lower ages (e.g., Thatcher, 1999). When the Gompertz model is expressed in the form σ is a scale parameter with the dimensions of x, and the dimensionless parameter β controls the shape of the distribution. Letting β → 0 yields the exponential distribution with mean σ; small values of β correspond to small departures from the exponential model. The fact that β cannot be negative affects statistical comparison of the Gompertz and exponential models; see Appendix A.8.
The Gompertz distribution has infinite upper limit to its support, so it cannot be used to assess whether there is a finite upper limit to the human lifespan. Its hazard function, σ −1 exp(βx/σ), is finite but increasing for all The endpoint cannot be lower than the largest observations in each database. Right: power of the Wald statistic for the null hypothesis γ = 0 against the one-sided alternative γ < 0 as a function of γ; the dashed line represents the power obtained by rejecting the null of exponentiality when any of the three one-sided test rejects. The curves are obtained by conditioning on the birthdates and left-truncated values in the databases, then simulating generalized Pareto data whose parameters are the partial maximum likelihood estimates ( σ γ , γ). The simulated records are censored if they fall outside the sampling frame for the ISTAT data and are simulated from a doubly truncated generalized Pareto distribution for IDL 2016 and France 2019. See Appendix A.7 for more details.
x (β > 0) or constant (β = 0). The limiting distribution for threshold exceedances of Gompertz variables is exponential, and this limit is attained rather rapidly, so a good fit of the Gompertz distribution for lower x would be compatible with good fit of the exponential distribution for threshold exceedances at higher values of x.
Computations summarised in Appendix A.8 show that the exponential model, and hence also the Gompertz model with very small β, give equally good fits to the Italian and the French datasets above age 107, and that the Gompertz and generalised Pareto models fit equally well above age 105.

Conclusions
None of the analyses of the ISTAT, IDL 2016 or France 2019 data, for women and men separately or combined, indicates any deviation from exponentially distributed residual lifetimes, or equivalently from constant force of mortality, beyond 108 years. Table 2 shows no differences between survival after age 108 in the ISTAT data and survival after age 110 in the IDL 2016 data for women, for men, or for women and men combined, so we merged these estimates by taking a weighted average with weights inversely proportional to the estimated variances. The resulting estimates also show no significant differences in survival between men and women, and we conclude that survival times in years after age 108 in the ISTAT data and after age 110 in the IDL 2016 data are exponentially distributed with estimated scale parameter 1.43 and 95% confidence interval (1.33, 1.52). The corresponding estimated probability of surviving one more year is 0.5 with 95% confidence interval (0.47, 0.52).
There was no indication of differences in survival for women or the whole of the France 2019 data and in the combined ISTAT and IDL 2016 data, but survival for men was lower in the France 2019 data. A weighted average of the estimates for the ISTAT data, the France 2019 data and the IDL 2016 data with France removed gives an exponential scale parameter estimate of 1.42 years with 95% confidence interval (1.34, 1.49), and estimated probability 0.49(0.47, 0.51) of surviving one more year.
Deleting the men from the France 2019 data or dropping Jeanne Calment changes estimates and confidence intervals by at most one unit in the second decimal.
There is high power for detection of an upper limit to the human lifespan up to around 130 years, based on fits of the generalized Pareto model to the three datatbases. Moreover there is no evidence that the Gompertz model, with increasing hazard, fits better than the exponential model, constant hazard, above 108 years.

Discussion
The results of the analysis of the newly-available ISTAT data agree strikingly well with those for the IDL supercentenarian database and for the women in the France 2019 data. Once the effects of the sampling frame are taken into account by allowing for truncation and censoring of the ages at death, a model with constant hazard after age 108 fits all three datasets well; it corresponds to a constant probability of 0.49 that a living person will survive for one further year, with 95% confidence interval (0.48, 0.51). The power calculations make it implausible that there is an upper limit to the human lifespan of 130 years or below.
Although many fewer men than women reach high ages, no difference in survival between the sexes is discernible in the ISTAT and the IDL 2016 data. Survival of men after age 108 is lower in the France 2019 data, but it seems unlikely that this reflects a real difference between France and Italy and between France and the other countries in the IDL. It seems more plausible that this effect is due to some form of age bias or is a false positive caused by multiple testing.
If the ISTAT and France 2019 data are split by birth cohort, then we find roughly constant mortality from age 105 for those born before the end of 1905, whereas those born in 1906 and later have lower mortality for ages 105-107; this explains the cohort effects detected by Barbi et al. (2018). Possibly the mortality plateau is reached later for later cohorts. The plausibility of this hypothesis could be weighed if further high-quality data become available.

A. Supplementary material
This section summarises the methods used to produce figures and perform our inferences, and adds goodness of fit diagnostics and hazard estimates.

A.1. Reproducibility
The ISTAT data can be obtained from the Italian National Institute of Statistics by registering at the Contact Center (https://contact.istat.it) and mentioning the Semisupercentenarian Survey and Marco Marsili as contact person.
LATool is a MATLAB toolbox for life length analysis that makes alternative analyses possible. It consists of three files that are available from https://doi.org/10.1007/s10687-017-0305-5, and a data file, data.mat, which can be downloaded, together with the full IDL database, by registering at www.supercentenarians.org.
R R Core Team (2020) was used for many of the analyses and to generate all the figures as well as Tables 1 and 3. Standalone code to reproduce the analyses is provided online. Both censoring and truncation must be handled correctly to avoid biased inferences; the effect of the truncation is that inferences must be conditioned on the event that an individual appears in the database. Below we outline how this is achieved; see Rootzén & Zholud (2017) and Davison (2018) for more details.

A.2. Truncation and censoring
Consider a sampling interval C = (b, e) of calendar time during which individuals aged over a threshold of u years were observed. Let x = age − u denote the excess age of an individual who dies aged older than u, having reached age u at calendar time t. Assume that the excess ages x are independent with cumulative distribution function F (x; θ), probability density function f (x; θ), and survival function S(x, θ) = 1 − F (x, θ), where θ is a vector of parameters to be estimated.
The likelihood contribution for someone who died in C is then whereas that for someone who is known to be alive at the end of C, and thus whose lifetime is censored, is The likelihood function L(θ) is the product of the likelihood contributions from all individuals included in the data under study. Estimates for the generalized Pareto and Gompertz distributions were found by numerical maximization of the log likelihood, with standard errors obtained from the inverse observed information matrix. Explicit expressions exist for the maximum likelihood estimator of the exponential distribution scale parameter and its standard error, The IDL and France 2019 data were left-and right-truncated, and this was taken into account in our analysis; the likelihood contribution for these individuals is Inappropriate analysis can lead to misleading results: for example, fitting an exponential distribution to the ISTAT individuals who survive beyond age 107 without accounting for truncation or censoring gives the estimate σ e = 1.25 with 95% confidence interval (1.17, 1.33), to be compared with σ e = 1.45 with 95% confidence interval (1.35, 1.56) once the truncation and censoring are accounted for; these confidence intervals do not overlap.

A.3. ISTAT cohort effects
The local hazard estimates in Figure 4 are obtained by splitting the likelihood contribution of individuals into yearly blocks, using disjoint intervals with (b, e) = (a, a + 1) years to avoid using the same individuals several times. For the highest interval we set e = ∞ and include all individuals who survived into that interval.
The parameter stability plots in Figure 5 and the estimated hazard plots in Figure 4 show roughly constant mortality for the birth cohort 1886-1905 for the entire age range. Mortality is lower for ages 105 and 106 for the birth cohort 1906-1910, but equals that for the earlier birth cohort at ages 107 and above. This reduction in mortality for the later birth cohort implies that plateauing for the entire ISTAT dataset does not start until approximately the age of 108.

A.4. Graphical diagnostics
A quantile-quantile (or QQ-) plot is a standard diagnostic of the fit of a specified distribution to data, but it must be modified to accommodate censoring or truncation. We plot the ordered ages at death, x i , of noncensored ISTAT individuals against plotting positions from F −1 { G(x)} (Waller & Turnbull, 1992), where in our case F −1 is the quantile function of the exponential distribution fitted to ages exceeding 108 years and G is the Kaplan-Meier estimate of the distribution function, corrected for censoring and truncation (Tsai et al., 1987). To assess the variability of the plot we simulate new ages at death from the fitted exponential distribution, conditioning on the birth dates, truncation time and censoring indicator; this amounts to simulating new lifetimes from a doubly truncated exponential distribution, since individuals whose death is observed during the sampling frame cannot exceed the age they would reach on 31 December 2015. The left-hand panel of Figure 6 shows 100 such curves, corresponding to x i (b) on the y-axis and under the parametric bootstrap, both F and G are re-estimated using the simulated samples. The right-hand panel of Figure 6 shows approximate 95% pointwise and simultaneous confidence intervals obtained using a simulation envelope for  Figure 5: Parameter stability plots for the ISTAT birth cohorts 1896-1905 (top) and 1906-1910 (bottom), for the parameters γ of the generalized Pareto distribution (left) and for the scale σ e of the exponential distribution (right) obtained from exceedances of the age threshold on the x-axis. The plots give maximum likelihood estimates with (profile) likelihood 95% confidence intervals. The horizontal line on the right panels corresponds to the estimated scale for exceedances above 108 for the full ISTAT dataset. (Davison & Hinkley, 1997, §4.2.4). Despite the discrepancy for individuals dying shortly after age 108, an exponential distribution adequately captures the observed mortality.

A.5. Differences between men and women
The imbalance in the number of men and women limits our ability to detect any effects of gender on mortality at great age. To illustrate this, we conducted a simulation study based on the ISTAT data to assess the power of a likelihood ratio test for conditional lifetime exceedances above 108 years, for which we have 40 men and 375 women; the lifetimes of 94 individuals, 15 of them men, are censored. We condition on the sampling frame and the sex of individuals and simulate new life trajectories for both men and women based on an exponential distribution with relative scale differing by a ratio λ > 1, corresponding to lower hazard for women. Thus individuals who were older than 108 in 2009 survive at least beyond that age, but a simulated lifetime that extends beyond 2016 is censored. For each of 10 000 simulated samples, we computed the likelihood ratio statistic for comparison of fits to men and women separately and a combined fit; the statistic has an asymptotic χ 2 1 distribution. We also considered conditioning only on the birth dates and the beginning of the sampling period to assess whether the right-censoring leads to loss of power, but the difference is negligible. Power of 80% is achieved if λ ≈ 1.61, corresponding to an average survival, σ e , of 1.85 years for women and 1.14 years for men, with the corresponding differences for powers 20% and 50% being 0.28 years and 0.50 years. The  Figure 6: Exponential QQ-plot of the ISTAT data for individuals who lived beyond 108 years and died before 2016, with pointwise (dashed) and simultaneous (dotted) 95% confidence intervals obtained through simulation (right). The left panel shows trajectories for simulated lifetimes drawn from the exponential model for observed non-censored observations. The circles give the ordered ages at death of non-censored ISTAT lifetimes against the plotting positions.
power of the corresponding test for the IDL 2016 data is expected to be similar. For the France 2019 data a likelihood ratio test strongly rejects the hypothesis of equal hazard for women and men. The ratio of the estimated hazards for men and women in this dataset is approximately 1.61. The combined power of the tests in the ISTAT and IDL 2016 data for detecting this ratio is approximately 0.96.

A.6. Hazard estimates
To construct a local hazard estimate based on the limiting generalized Pareto distribution, we note that this distribution has reciprocal hazard function r(x) = (σ + γx) + , where a + = max(a, 0) for real a. A more flexible functional form is r(x) = {σ + γx + g(x)} + , where g(x) is a smooth function of x that tends to zero as x increases. For exploratory purposes we take g( here the κ k are fixed knots, and b k (x) = 0 for x ≥ κ k . The resulting cubic spline function g(x) has two continuous derivatives. This model has K + 2 parameters and corresponds to the generalized Pareto distribution for x > κ K , but allows r(x) to depart from linearity for x ≤ κ K . Lifetime data are recorded to the nearest day and often there are ties for small x, so we use a discrete version of the model, with x ∈ {1, . . . , x max } days, where x max corresponds to 16 years after age 105. We let h(x) = 1/r(x) denote the hazard function, set H(x) The black rug shows (jittered) times of deaths, and the red rug shows (jittered) censored survival times; the rugs are suppressed for the lower ages, as there too many points to be informative. The right-hand panel shows the same output for the best-fitting exponential model, for which γ = 0.
The likelihood is then a product of terms of the form Pr(X > s) , x > s, and depends on the parameters σ, γ, β 1 , . . . , β K , estimates of which are readily obtained by numerical maximization of the log likelihood function. The resulting fit depends on the knots κ 1 , . . . , κ K , but to reduce this dependence we generate knots at random, roughly evenly spaced in an interval (0, x max ), where x max is chosen large enough that r(x) should be linear for x > x max , i.e., the generalized Pareto model is fitted when x > x max . The left-hand panel of Figure 7 shows a local estimate of the hazard function for the ISTAT data, constrained to have the form of (2) above 110 years. The hazard dips up to 108 years or so, then rises and declines slowly. To assess the significance of this decline we performed a bootstrap analysis Efron & Tibshirani (1993); Davison & Hinkley (1997), generating 5000 replicate datasets, the hazard function estimates for 100 of which are shown in the panel. These suggest that the slow decrease after age 110 is not significant, and this is confirmed by the pointwise and overall 95% confidence bands. The initial dip seems to be a genuine feature, but above 108 years the confidence bands include a wide range of possible functions, including a constant hazard.

A.7. Power
To assess the statistical power of our procedures, we compute the maximum profile likelihood estimates σ γ for the original data with γ fixed at the values −0.25, . . . , 0 and simulate new excess lifetimes for each γ, conditioning on the sampling frame. In the ISTAT data, for example, we calculate the ages of all individuals on January 1st, 2009, retain the dates at which they reach 108 years and sample new excess lifetimes from the generalized Pareto distribution with parameters (γ, σ γ ), right-censoring any simulated lifetimes beyond the end of 2016. This ensures that the power calculations are as relevant as possible, not only in terms of the sampling scheme but also in terms of the underlying parameters. For each simulated dataset we compute the directed likelihood ratio statistic r = sign( γ − γ){2 ( γ, σ) − 2 (0, σ e )} 1/2 and the Wald statistic w = γ/se( γ) for testing the null hypothesis γ = 0 against the alternative γ < 0, which corresponds to testing for exponentiality against possible upper bounds to the human lifespan. The asymptotic null distribution of the statistics r and w is standard normal and we can assess the quality of this approximation by calculating the Wald statistics when γ = 0. These simulations show that the estimator of the shape parameter is unbiased and normally distributed, but the distribution of the Wald statistics is left-skewed, leading to inflated Type I error. For the power study, we thus use the simulated null distribution for comparison; tests performed in the paper should be considered liberal, meaning their level would be higher than that claimed.
We proceed similarly for the endpoint ι. In order to simulate from generalized Pareto distributions with ι fixed at a given value, we reparametrise the generalized Pareto log likelihood function in terms of ι and σ and estimate the scale parameters σ ι for each of the original three datasets with ι fixed, then use the relation ι = u − σ/γ to obtain the three implied shape parameters γ ι . We then simulate new datasets with the sampling scheme described above, but using the three sets of parameter pairs ( γ ι , σ ι ). For the joint test of the null hypothesis of an exponential distribution, ι = ∞, against the alternative of a common but finite ι, we reparametrize the likelihood in terms of ι and allow the three datasets to have different values of σ.
The left-hand panel of Figure 3 shows how the empirical proportion of rejections for a test of nominal size 5% based on the directed likelihood root statistic r = −{2 gp ( ι, σ) − 2 exp ( σ e )} 1/2 varies with ι = u − σ/γ for γ < 0, for the ISTAT, France 2019 and the rest of the IDL 2016 data. Here ( ι, σ) are the maximum likelihood estimates for the generalized Pareto distribution parametrized in terms of a common finite ι and three scale parameters and σ e are the maximum likelihood estimates of the three exponential scale parameters under the null hypothesis.

A.8. Gompertz model
The reciprocal hazard function of the Gompertz distribution r(x) = σ exp(−βx/σ) encodes the speed of convergence to the limiting extreme value distribution (Smith, 1987); even if β > 0, r (x) = β exp(−βx/σ) → 0 exponentially fast as x → ∞. Any improvement in the fit of the Gompertz model for exceedances over some threshold would be shown by evidence that β > 0. Table 3 summarises the fit of this model for various thresholds and the ISTAT data, without sex or cohort effects. The hypothesis that the Gompertz distribution (β > 0) reduces to an exponential distribution (β = 0) is a boundary hypothesis, so the likelihood ratio statistic to test β = 0 does not have the usual approximate χ 2 1 distribution; its large-sample distribution is a 50:50 mixture of a point mass at 0 and a χ 2 1 distribution, sometimes written as 1 2 χ 2 0 + 1 2 χ 2 1 (Self & Liang, 1987). Barbi et al. Barbi et al. (2018) do not notice this, leading them to mis-state the significance of the difference of log-likelihoods in their Table 2 -the likelihood ratio statistic for testing β = 0 against β > 0 is w = 2 × 0.292 = 0.584, and Pr(χ 2 1 > 0.584) ≈ 0.44, which is the significance level quoted in Barbi et al. (2018). In fact the correct (asymptotic) level would be 1 2 Pr(χ 2 1 > 0.584) ≈ 0.22. Here the conclusion does not change, but it might in other contexts.
In general, p-values obtained using a parametric bootstrap are likely to be more reliable than asymptotic approximations of the form 1 2 χ 2 0 + 1 2 χ 2 1 and are preferable for such comparisons. In the present case this entails simulating independent datasets like the original data from the boundary exponential distribution (the null hypothesis, β = 0), and estimating the p-value by the empirical proportion of these simulated datasets in which the likelihood ratio statistics w * are no smaller than the original value w, i.e., Pr * 0 (w * ≥ w), where the asterisk indicates a parametric bootstrap simulation and the subscript indicates that the simulation is under the null hypothesis; see Chapter 4 of Davison & Hinkley (1997). This approach was used to obtain the p-values in Table 3, which show that the exponential model is statistically indistinguishable from the Gompertz Table 4: Deviances for comparison of extended generalized Pareto (GP) and the exponential, GP and Gompertz sub-models for different thresholds. The rows show the likelihood ratio statistic, i.e., twice the difference in log-likelihood between the specified model and an encompassing extended generalized Pareto model. 108 years onwards, though the Gompertz model fits better at 106 years and below for the ISTAT data and at 107 years and below for the France 2019 data. Table 4 compares the fits of the Gompertz, generalized Pareto and exponential models to the ISTAT data, with the baseline taken to be an extended generalized Pareto distribution that encompasses all three other models; the details will be reported elsewhere. The generalized Pareto and Gompertz models fit equally well for all thresholds, since the differences between their respective deviances are minimal. Both are better than the exponential model below 107 years, but not above, in agreement with Table 3.