Historical citizen science to understand and predict climate-driven trout decline
Abstract
Historical species records offer an excellent opportunity to test the predictive ability of range forecasts under climate change, but researchers often consider historical records to be scarce and unreliable, apart from the datasets collected by renowned naturalists. Here, we demonstrate the relevance of biodiversity records developed through citizen-science initiatives generated outside the natural sciences academia. We used a Spanish geographical dictionary from the mid-nineteenth century to compile over 10 000 freshwater fish records, including almost 4 000 brown trout (Salmo trutta) citations, and constructed a historical presence–absence dataset covering over 2 000 10 × 10 km cells, which is comparable to present-day data. There has been a clear reduction in trout range in the past 150 years, coinciding with a generalized warming. We show that current trout distribution can be accurately predicted based on historical records and past and present values of three air temperature variables. The models indicate a consistent decline of average suitability of around 25% between the 1850s and the 2000s, which is expected to surpass 40% by the 2050s. We stress the largely unexplored potential of historical species records from non-academic sources to open new pathways for long-term global change science.
1. Background
Contemporary anthropogenic climate change is driving critical changes in biological communities, with severe implications for the conservation of biodiversity [1]. Climate-driven distribution changes have been widely documented in response to generalized warming [2], and future shifts are forecasted [1]. Such forecasts are most often performed using bioclimatic envelope models, which relate species occurrence to climate variables to define the range of climatic conditions (i.e. climatic niche) under which the species is likely to occur [3]. Once climate–distribution relationships have been identified, future distributions are forecasted by extending these relationships to future climate scenarios. However, there is a high level of uncertainty when extrapolating predictions to novel conditions, such as future climates. Indeed, the accuracy of species range forecasts cannot be fully evaluated until the forecasted period, typically the end of the twenty-first century, is reached [4].
Historical species records provide a powerful tool to analyse climate-driven biodiversity changes [5] and assess the accuracy of bioclimatic envelope approaches in predicting those shifts [6,7]. However, biodiversity researchers have often assumed that historical biodiversity data are scarce, unreliable, and/or unsuitable for robust descriptions of ecological processes [8]. Exceptions to this distrust are the datasets compiled by renowned naturalists, such as those of Alexander von Humboldt in South America in the early-nineteenth century and Joseph Grinnell in California in the early-twentieth century, which have been the base for pivotal studies on the impacts of climate change [9,10]. However, these data collections are not only extremely rare, but also restricted in space, in contrast with contemporary large-scale global biodiversity databases commonly used to model species distributions. Clavero & Revilla [11] suggested that historical citizen-science-like initiatives aimed at characterizing territories, often promoted from outside the natural sciences academia, should be mined to generate large-scale historical records datasets and characterize long-term biodiversity dynamics. Here, we explore the potential of this approach, using historical Spanish freshwater fish records from the mid-nineteenth century, to describe distribution changes of the brown trout (Salmo trutta) in the past 150 years.
In common with all salmonids, the brown trout (the trout henceforth) is a cold-water-dwelling species and is expected to be sensitive to warming patterns [12]. Spain is a topographically complex territory placed towards the southern, warm extreme of the trout range, and thus constitutes a good study system to analyse the impacts of warming on trout distribution [13]. We built an extensive dataset of historical trout records in Spain, in order to understand the long-term distribution changes and evaluate our capacity to predict them. We modelled historical trout distribution, projected distribution–temperature relationships to present-day climate conditions, and evaluated this forecast using present-day distribution data as an independent dataset. To the best of our knowledge, our approach has no precedents regarding the combination of amount of historical records (over 10 000 fish records), geographical area covered (around 500 000 km2), and time window analysed (150 years) for any group worldwide.
2. Methods
(a) Historical and current trout occurrence data
Freshwater fish records from the mid-nineteenth century (1850s) in Spain were extracted from the geographical dictionary (i.e. gazetteer) edited by Pascual Madoz (henceforth the Madoz) [14], a citizen-science initiative that involved thousands of informants in the description of population centres, rivers, and topographical elements [11]. We extracted 10 223 freshwater fish records from 5 427 localities, which were georeferenced using Google Earth. The trout was cited in 3 943 sites, i.e. 72.6% of the localities. Following Clavero & Hermoso [15], the trout was considered to be absent from a locality when the Madoz provided information on freshwater fish but did not cite the trout, assuming that, being the most appreciated freshwater fish in Spain [16], the trout would have been cited whenever present [17].
For consistency with the available contemporary records (see below), the trout distribution was summarized in Universal Transverse Mercator (UTM) 10 × 10 km cells (figure 1). This resulted in 2 061 cells with 1850s fish records, which included, on average, 2.6 localities (median = 2; maximum = 18). The trout was considered as ‘present’ in cells including at least one locality with trout presence (1 388 cases, 67.3% of the cells), and ‘absent’ otherwise. This presence–absence dichotomy obscures the fact that the trout could be either rare or widespread in a cell. Thus, we selected cells including at least three localities (n = 734) and classified them in terms of trout prevalence as (i) not reported (n = 57; not cited), (ii) low (n = 49; present in 1–50% of localities), (iii) high (n = 183; present in 51–90% of localities), and (iv) widespread (n = 445; present in >90% of localities).
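The cell-level summary described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the input layout (pairs of cell identifier and a trout-cited flag), and the exact handling of class boundaries (shares up to 0.5 counted as low, up to 0.9 as high) are assumptions made for illustration.

```python
from collections import defaultdict


def summarize_cells(localities):
    """Aggregate locality records into per-cell counts.

    localities: iterable of (cell_id, trout_cited) pairs (hypothetical layout).
    Returns {cell_id: (n_localities, n_localities_with_trout)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for cell_id, trout_cited in localities:
        counts[cell_id][0] += 1
        counts[cell_id][1] += int(bool(trout_cited))
    return {cell: (n, k) for cell, (n, k) in counts.items()}


def is_present(n_with_trout):
    """A cell is 'present' if at least one locality cites the trout."""
    return n_with_trout >= 1


def prevalence_class(n_localities, n_with_trout, min_localities=3):
    """Prevalence class for cells with enough localities, else None.

    Boundary handling is an assumption: share <= 0.5 is 'low',
    share <= 0.9 is 'high', anything above is 'widespread'.
    """
    if n_localities < min_localities:
        return None
    if n_with_trout == 0:
        return "not reported"
    share = n_with_trout / n_localities
    if share <= 0.5:
        return "low"
    if share <= 0.9:
        return "high"
    return "widespread"


if __name__ == "__main__":
    records = [("A", True), ("A", False), ("A", False),
               ("B", False), ("B", False), ("B", False),
               ("C", True), ("C", True), ("C", True),
               ("D", True)]
    for cell, (n, k) in sorted(summarize_cells(records).items()):
        print(cell, is_present(k), prevalence_class(n, k))
```

Cells with fewer than three localities return no prevalence class, mirroring the restriction to the 734-cell subset.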
The present-day (2000s) trout distribution in Spain was described using the Spanish Inventory of Terrestrial Species (Inventario Español de Especies Terrestres, available at www.magrama.gob.es), which provides data on species presences in UTM 10 × 10 km cells. We extracted the 19 314 freshwater fish records and identified 1 404 cells with trout presence. The trout was considered absent from a cell when the database had information on freshwater fish for that cell but did not cite the trout, resulting in 2 375 trout absences (figure 1).
In total, 1 878 UTM cells had trout presence–absence information for both the 1850s and 2000s. We classified this subset of cells in terms of the changes in trout occurrence as (i) absence (the trout was absent in both periods), (ii) presence (present in both periods), (iii) colonization (absent in the Madoz, but currently present), and (iv) extinction (present in the Madoz, but currently absent; figure 1).
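As an illustration of this four-way classification, the following Python sketch labels each cell that has data in both periods. The dictionary-based inputs and function names are hypothetical and chosen only for the example.

```python
from collections import Counter


def classify_change(present_1850s, present_2000s):
    """Return the change category for one cell from two presence flags."""
    if present_1850s and present_2000s:
        return "presence"
    if present_1850s:
        return "extinction"      # present in the Madoz, currently absent
    if present_2000s:
        return "colonization"    # absent in the Madoz, currently present
    return "absence"


def classify_cells(hist, current):
    """Classify only the cells that have information in both periods.

    hist, current: {cell_id: bool} presence-absence maps (hypothetical layout).
    """
    shared = hist.keys() & current.keys()
    return {cell: classify_change(hist[cell], current[cell]) for cell in shared}


if __name__ == "__main__":
    hist = {"A": True, "B": True, "C": False, "D": False, "E": True}
    current = {"A": True, "B": False, "C": True, "D": False, "F": True}
    changes = classify_cells(hist, current)
    print(Counter(changes.values()))
```

Cells known from only one period (E and F in the toy data) drop out, which is why the comparison in the text rests on 1 878 cells rather than on all cells with data.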
(b) Climatic characterization
We used digital climate surfaces to describe the spatial pattern of temperature during the mid-nineteenth century (1850s scenario), by the end of the twentieth century (2000s scenario), and in the mid-twenty-first century (2050s scenario). In all cases, we compiled three temperature variables: (i) mean annual temperature, (ii) mean of maximum July temperatures, and (iii) mean of minimum January temperatures.
We built an 1850s thermal scenario because, to the best of our knowledge, operational climate scenarios for Europe in the mid-nineteenth century were not available. We based the 1850s scenario on climatic estimates for the early-twentieth century, assuming that temperatures in these two periods were similar, which is supported by the available long-term temperature records in Spain (electronic supplementary material, figure S1). The 1850s scenario was thus built from the temperature estimates generated by the ALARM project [18] for the period 1901–1920, which were downscaled using regional temperature surfaces, assuming that the relationship between geographical factors and temperature remains stable over time. The downscaling procedure had three steps. First, we checked for spatial coherence between the climate research unit–general circulation model (CRU–GCM; with a spatial resolution of ca 250 km2) and more spatially detailed surfaces (0.01 km2) based on the Digital Climatic Atlas of the Iberian Peninsula (http://www.opengis.uab.es/wms/iberia/en_index.htm) over the same baseline years (1950–2000). Second, we computed the absolute difference between the GCM surface for the averaged baseline period and each one of the target years (i.e. from 1901 to 1920). Third, we added this difference to the averaged baseline period computed from the Atlas surfaces, thus introducing the topographic variability in our 1850s scenario.
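The second and third downscaling steps amount to an additive anomaly (delta) correction, which the Python sketch below illustrates on plain lists. It rests on assumptions not stated in the text: the difference is taken as signed (target year minus baseline), the coarse value covering each fine cell has already been looked up, and the yearly anomalies are averaged before being added. All names are hypothetical.

```python
def downscale_delta(fine_baseline, coarse_baseline, coarse_targets):
    """Additive delta downscaling on aligned per-cell lists.

    fine_baseline:   baseline temperature of each fine-resolution cell.
    coarse_baseline: baseline temperature of the coarse cell covering
                     each fine cell (same length as fine_baseline).
    coarse_targets:  one list per target year, each aligned like
                     coarse_baseline.
    Returns the fine-resolution temperatures for the target period.
    """
    if not coarse_targets:
        raise ValueError("at least one target year is required")
    n_years = len(coarse_targets)
    downscaled = []
    for i, fine_value in enumerate(fine_baseline):
        # Signed anomaly of each target year relative to the coarse
        # baseline, averaged over years (an assumption of this sketch).
        mean_anomaly = sum(year[i] - coarse_baseline[i]
                           for year in coarse_targets) / n_years
        # Adding the anomaly to the fine baseline keeps the topographic
        # detail of the fine surface.
        downscaled.append(fine_value + mean_anomaly)
    return downscaled


if __name__ == "__main__":
    fine = [10.0, 4.0]            # valley and summit within one coarse cell
    coarse = [8.0, 8.0]           # same coarse baseline for both
    targets = [[7.0, 7.0], [7.5, 7.5]]
    print(downscale_delta(fine, coarse, targets))
```

In the toy data both fine cells share one coarse cell, so they receive the same anomaly while keeping their 6 °C topographic contrast.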
The 2000s scenario used the average of the Digital Climatic Atlas of the Iberian Peninsula month-by-month temperature maps from 1991 to 2010, whereas the 2050s scenario was based on the Fifth Assessment Intergovernmental Panel on Climate Change (IPCC) report, as provided by WorldClim (www.worldclim.org). We averaged temperature information assembled from six GCMs (CNRM-CM5, IPSL-CM5A-LR, HadGEM2-ES, MPI-ESM-LR, GISS-E2-R, and CCSM4). From the different representative concentration pathways (RCPs) considered by the IPCC, we assembled temperature information from the optimistic RCP2.6 scenario. The 2050s scenario thus resulted from averaging monthly temperature values for 2041–2060 at a spatial resolution of approximately 1 km2.
In a final step, we averaged each of the three scenarios (1850s, 2000s, and 2050s) over the 5 294 10 × 10 km UTM cells covering conterminous Spain.
(c) Analytical procedures
We modelled trout presences–absences through an ensemble ecological niche modelling approach, using the BIOMOD2 library [19] within R. We used nine different algorithms (see electronic supplementary material, figure S2) and evaluated their predictive performance through the area under the receiver operating characteristic curve (AUC). Only models with AUCs above 0.7 were used to build final ensemble models, using the weighted mean of probabilities option. Ensemble models were evaluated by splitting the data into calibration (80%) and validation (20%) subsets and comparing the resulting AUCs, repeating the procedure 10 times.
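The logic of AUC-based model selection and weighted averaging can be sketched in a few lines of Python. This is not BIOMOD2 and does not reproduce its internals: the AUC is computed through its rank (Mann–Whitney) interpretation, and weighting each retained model in proportion to its AUC is an assumption of this sketch. Function names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a presence outscores an absence.

    labels: 1 for presence, 0 for absence; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def weighted_ensemble(predictions, aucs, threshold=0.7):
    """Weighted mean of per-cell probabilities from models above threshold.

    predictions: one list of per-cell probabilities per model.
    aucs:        evaluation AUC of each model, used here as its weight
                 (an assumption of this sketch).
    """
    kept = [(p, a) for p, a in zip(predictions, aucs) if a > threshold]
    if not kept:
        raise ValueError("no model exceeds the AUC threshold")
    total_weight = sum(a for _, a in kept)
    n_cells = len(kept[0][0])
    return [sum(p[i] * a for p, a in kept) / total_weight
            for i in range(n_cells)]


if __name__ == "__main__":
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    preds = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
    print(weighted_ensemble(preds, [0.9, 0.8, 0.6]))
```

In the toy example the third model (AUC 0.6) is discarded, so the ensemble reflects only the two models above the 0.7 threshold.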
We modelled the distribution of trout using the Madoz dataset and the 1850s temperature scenario (1850s model) and projected this model to the 2000s temperature scenario (2000s predictions). The performance of this forecast was assessed through a threefold procedure. First, we calculated the AUC of the forecast, using present-day presence–absence data as an independent validation dataset. Second, we modelled the present trout distribution (2000s model) and compared the resulting suitability estimates with those of the 2000s predictions. This comparison was made by means of the Pearson correlation coefficient (r) and the slope of major axis (MA) model II regressions, assuming that matching predictions would have a slope close to one and an intercept near zero. MA regressions were analysed with the ‘lmodel2’ package [20] in R. Third, we used the Pearson correlation to test the concordance of the forecasted and observed changes in UTM cells suitability between the 1850s and 2000s.
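The comparison between predicted and modelled suitability relies on two standard statistics, illustrated below in plain Python. The sketch uses the textbook closed-form major axis estimator and is not the ‘lmodel2’ implementation. Function names are hypothetical, and the example values are invented solely to exercise the formulas.

```python
import math


def _centered_sums(x, y):
    """Return means and centred sums of squares and cross-products."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def major_axis(x, y):
    """Slope and intercept of the major axis (model II) regression.

    Uses the closed-form slope
    b = (syy - sxx + sqrt((syy - sxx)^2 + 4 sxy^2)) / (2 sxy),
    which is undefined when x and y are uncorrelated (sxy == 0).
    """
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    if sxy == 0:
        raise ValueError("major axis slope undefined when sxy is zero")
    diff = syy - sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


if __name__ == "__main__":
    predicted = [0.1, 0.3, 0.5, 0.7]   # invented values
    observed = [0.1, 0.3, 0.5, 0.7]
    print(pearson_r(predicted, observed), major_axis(predicted, observed))
```

Unlike ordinary least squares, the major axis treats both variables symmetrically, which suits a comparison of two sets of model-derived suitability estimates where neither is an error-free predictor.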
Finally, we assessed the relationships between present-day presence–absence trout data and temperature variables (i.e. from the 2000s model) to predict the distribution of species in the 2050s, based on the temperatures expected for that period. This distribution forecast was interpreted critically in light of the accuracy of long-term range shifts of the trout previously analysed.
3. Results
Trout distribution in Spain was similar between the 1850s and the 2000s (χ2 = 436.5; p < 0.001; n = 1 878; figure 1), being concentrated in northern areas or linked to mountain ranges towards the south. Changes in trout occurrence (i.e. colonizations or extinctions) affected 25% of the cells, with extinction events outnumbering colonizations by a threefold factor (367 versus 115 cells), indicating a clear trend towards a reduction of trout range. The trout was more resilient in cells where it had been more prevalent in the nineteenth century, with an almost threefold reduction in the probability of extinction from low to widespread prevalence cells (electronic supplementary material, figure S3).
The long-term decline of the trout in Spain coincides with a clear warming trend (mean annual values increased by 1.52°C), although the magnitude of this increase varied both seasonally and spatially. Summer maximum temperatures increased more than winter minima (2.69°C versus 0.63°C) and the spatial patterns of temperature change also differed between seasons (electronic supplementary material, figure S4). Trout extinctions and colonizations tended to occur in areas with intermediate temperatures between those with constant presences or absences, although this pattern was more evident for summer temperatures than for winter ones (figure 2). The changes in temperatures, and not only their absolute values, were also associated with changes in trout distribution. Mean annual and July maximum temperatures (but not January minima) increased more between the 1850s and the 2000s in cells in which the trout had disappeared than in those in which the species had been constantly present (figure 2).
The 1850s and 2000s models had excellent performances (AUCs 0.90 and 0.93, respectively; figure 3). Both ensemble models were built based on the same seven algorithms, and the response curves were almost identical for the 1850s and 2000s models (see electronic supplementary material, figure S2), highlighting the stability and temporal transferability of the trout occurrence–temperature relationships. The mean suitability in the 2000s model was around 25% smaller than that of the 1850s model (0.36 versus 0.50). Even though the 1850s model only analysed presence–absence data, relative suitability values clearly increased along with trout prevalence (figure 4). Relative suitability was also higher in cells with double presences than in those with double absences, attaining intermediate values in cells with a change in status (figure 3).
The 2000s predictions based on the 1850s model had a very high predictive performance (AUC: 0.88; figure 3). The predicted (2000s predictions) and observed (2000s model) average relative suitability values were very similar (0.35 versus 0.36), and the values for individual cells were closely related (Pearson's r = 0.93), with the slope of the type II regression line being very close to one (1.099; figure 3). The direction and magnitude of the observed and predicted suitability changes between the 1850s and 2000s were similar (r = 0.61; p < 0.001; electronic supplementary material, figure S5). This robust validation of the temporal projection of bioclimatic niche models provided strong support for forecasting the trout distribution in the 2050s. Even using an optimistic future climate scenario, relative suitability in the 2050s would be only 56% of that estimated for the 1850s (figure 3).
4. Discussion
This work demonstrates the potential of species records contained in historical citizen-science initiatives to describe long-term dynamics in species distributions and to evaluate our ability to forecast future changes. We were able to compile abundant, fine-grained information on trout presences and absences dating back 150 years, overcoming preconceptions on the scarcity and lack of reliability of historical biodiversity records. These data confirm the long-term vulnerability of the trout to climate change. Trout persistence in Spain and other areas at the warm edge of the species distribution range will plausibly depend on active management. Conservation measures should include the preservation of trout native lineages by avoiding stocking hatchery trout [21], the enhancement of riparian vegetation and its heat-buffer effect [22], and the establishment of effective conservation planning to allow connectivity between current and future suitable areas [23].
The high accuracy of trout distribution forecasts implies that the species has a temporally stable thermal niche. However, this accuracy is especially remarkable in the context of the poor conservation status of Spanish rivers, because trout distribution could be negatively influenced by several non-climatic anthropogenic factors [24]. On the other hand, trout populations could also have been enhanced through widespread stocking [21], which may be the cause of several of the colonization events described here, but also through the recovery of riparian vegetation [25] or flow regulation attenuating summer droughts [26]. In spite of all these possible interfering factors, and in agreement with previous studies [27], air temperatures were strong predictors of the trout distribution both in the 1850s and 2000s.
The interpretation of range shifts based on bioclimatic envelope models should rely on the understanding of the mechanisms that generate climate–distribution relationships. The temporal transferability of climate envelope models may depend on species traits [7,28], and the effects of climate change on species ranges can be hidden, minimized, or amplified by other co-occurring processes, such as land-use changes [29] or overexploitation [30]. Thus, even though our results show a high accuracy of range forecasts for the brown trout, we stress the need to understand the links between climate variables and the distribution of organisms. Salmonids are suitable models to describe and predict the impacts of global warming because they depend on cold waters at all stages of their life history [31], and trout responses to changes in water temperature have been widely described [12]. We are thus confident that the high predictive value of temperatures and the temporal transferability of bioclimatic envelope models reported here are rooted in the close dependency of the trout on cold waters.
As shown here, historical citizen-science initiatives describing human and natural geographies contain massive amounts of biodiversity records. We must stress that while the Madoz dictionary is undoubtedly an important historical source, it was by no means an isolated initiative. All across Europe, there were equivalent systematic compilations of information, often in the form of geographical dictionaries and gazetteers [11], and large amounts of records are available in different types of documents from other regions [32]. The biodiversity information contained in these historical sources is most often focused on socioeconomically relevant and widely known species [8], but this bias is counteracted by the large number of fine-grained records over large spatial extents. Millions of records of wild and cultivated species could be made available in the form of large-scale, geographically precise transnational databases of the distribution of biodiversity in the past. The extraction of this information is a challenging and intrinsically collaborative and interdisciplinary task. There is a huge amount of compilation work to do, but the reward in terms of our knowledge of our environment is even bigger. Historical databases should be incorporated into, and become an important component of, global biodiversity databases [33]. Citizen science has a critical role in these present-day initiatives, through which many people from many places are providing information to improve knowledge on biodiversity and to adapt conservation strategies to future challenges. With the same aims, we should incorporate the biodiversity information provided by many people from many places in the past. We live in their future.
Data accessibility
Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.pb04q [17].
Authors' contributions
M.C. conceived the study, compiled data, and led the writing. M.N. and M.P. produced climatic scenarios. V.H. and D.V. led the analyses. All authors contributed to result interpretation and manuscript writing.
Competing interests
We declare we have no competing interests.
Funding
Part of the project HISTECOL (CGL2014-55266-P) was supported by the Spanish Ministry of Economy and Competitiveness (SMEC). SMEC also supported M.C. and V.H. through Ramón y Cajal contracts, L.B., D.V., and M.P. through FORESTCAST (CGL2014-59742) and M.N. through ACAPI (CGL2015-69888-P). L.B. and D.V. were also supported by the CULPA project (998/2013) from the National Park Autonomous Body and AFF funded by EDP Biodiversity Chair.
Acknowledgements
F. Rodríguez-Sánchez and the members of the Conservation Biology Department at EBD-CSIC commented upon the manuscript. AEMET provided climatic data, compiled by M. Batalla within the MONTES project (Consolider-Ingenio Montes CSD2008-00040).