Strategies for sustainable management of renewable resources during environmental change
Abstract
As a consequence of global environmental change, management strategies that can deal with unexpected change in resource dynamics are becoming increasingly important. In this paper we take a novel approach to studying resource growth problems, using a computational form of adaptive management to find optimal strategies for prevalent natural resource management dilemmas. We scrutinize adaptive management, or learning-by-doing, to better understand how to simultaneously manage and learn about a system when its dynamics are unknown. We study important trade-offs in decision-making with respect to choosing optimal actions (harvest efforts) for sustainable management during change. This is operationalized through an artificially intelligent model in which we analyze how different trends and fluctuations in the growth rate of a renewable resource affect the performance of different management strategies. Our results show that the optimal strategy for managing resources with declining growth is capable of managing resources with fluctuating or increasing growth at a negligible cost, resulting in a management strategy that is both efficient and robust towards future unknown changes. To obtain this strategy, adaptive management should strive for: high learning rates in response to new knowledge, high valuation of future outcomes and modest exploration around what is perceived as the optimal action.
1. Introduction
The uncertainties humans face when managing natural resources are increasing as a consequence of global environmental change [1]. These rising uncertainties pose additional challenges to management and increase the urgency of developing efficient and robust strategies to adapt to these changes [2–6]. While climate change effects on species' distributions and abundance are a known uncertainty [7–9], how climate change will disturb harvested species' growth rates is often overlooked [10–12]. Ignoring this latter type of uncertainty may have detrimental consequences, such as eventual stock collapse [12,13]. Hence, finding management strategies that can handle uncertainty in species' changing growth rates, in addition to uncertainty in their abundance, is a pertinent resource management problem.
Adaptive management [9,14–16] proposes the incorporation of learning-by-doing (LBD) to improve management in the presence of uncertainties, such as sudden fluctuations in the surrounding ecosystem. It focuses on three fundamental trade-off dilemmas for sustainable management of natural resources: how to value future outcomes (e.g. discount factors, planning horizons) [17–20], the rate at which reference points are adjusted in response to new knowledge [3,4,21] and exploration versus business as usual [22,23]. These dilemmas are similar to the key learning components of the LBD process in learning theory (e.g. [24]).
Earlier research using computational models to understand adaptive management under uncertainty includes studies on the robustness of control methods in relation to performance [6,25,26], optimization methods from artificial intelligence for adaptive management under uncertainty in species habitats [27] and the incorporation of uncertainty into adaptive conservation [28]. Here, we take a computational approach to adaptive management, and specifically to LBD, using methods from reinforcement learning (RL; [29,30]). Although RL is considered a promising approach within natural resource management (see e.g. [31,32]), few studies have applied it (but see e.g. [27,32–36]). These studies primarily implement RL methods for conservation in order to produce the best management option in a given situation. We further unpack RL to understand how to simultaneously manage and learn about a system when its dynamics are unknown.
The sources of uncertainty in adaptive management are numerous [9]. In a simplified closed system (i.e. where controllability is high [15]) with one actor and one resource, we identify four major aspects of uncertainty: (1) the degree of variability in species' abundance, (2) the degree to which harvesting can be regulated, (3) the degree of a priori system understanding and (4) the degree to which internal system change can be detected. If all four aspects are known, then the best management choice is to continuously calculate the analytical solution for the harvest effort, for which ample theory has been developed [37]. If the uncertainty in (4) is high, then one can either apply a fixed strategy and ignore that the system might change, or use a learning strategy (or feedback strategy) that continuously updates harvest efforts in order to gain knowledge about the system's dynamics. Furthermore, how best to optimize a given learning strategy may depend on the type of change the system is subjected to: an abrupt change in resource dynamics may require a different strategy than a slow change. Thus, a learning strategy can be optimal for one type of dynamics but sub-optimal for another.
In this paper, we contrast a learning strategy and a fixed strategy with the analytical solution of the problem at hand. We scrutinize three learning components: (i) valuing future outcomes, (ii) adapting to new knowledge, and (iii) exploration versus exploitation. We aim to find which learning strategies are most robust and sustainable under six scenarios of growth rate change for a renewable resource, in situations where stock abundance is randomly reduced. Robustness is defined as the system performance being maintained when the resource is exposed to external perturbations or when there is uncertainty about internal dynamics [25], and a sustainable strategy is defined as the one that provides the highest long-term outcomes. We implement a simulation model of an intelligent agent whose objective is to learn sustainable management of a renewable resource. We do so to investigate the following questions: (i) How does the learning strategy perform, in comparison with a fixed strategy, under different scenarios of change? (ii) How do key learning components (adapting to new knowledge, valuing future outcomes, and exploration versus exploitation) affect the sustainable management outcome? (iii) What trade-offs exist between optimizing for a specific scenario and optimizing for a set of scenarios, regarding performance vis-à-vis robustness?
Recent studies on the management of renewable resources propose the inclusion of a continual learning process [6,38,39], as provided here. However, we acknowledge that there is a vast set of theories of learning and knowledge related to resource dilemmas [21,40]. We therefore emphasize that this study is limited to probing the key learning components of LBD in relation to performance and robustness when managing a resource, under the assumption that the three learning components can be set or changed through management actions. We propose a novel approach for quantitatively studying the management of a renewable resource via an adaptive LBD process. In doing so, we aim to provide useful insights into different aspects of LBD and to identify robust strategies that improve adaptive management in uncertain and changing environments.
2. Material and methods
(a) Introducing reinforcement learning and temporal-difference learning
RL is a class of machine learning problems in which an agent learns a behaviour by interacting with its environment while receiving a reward as a signal of good behaviour or success. The objective for the agent is to find a behaviour that maximizes the total reward over time. A large set of theoretically well-founded algorithms exists to address this task, building on ideas of goal-directed learning, i.e. LBD [29].
Temporal difference (TD) learning is a class of algorithms within RL based on explicitly estimating future outcomes. For every possible situation (referred to as a ‘state’), the TD algorithms store an estimate of the total future reward that can be collected, and this estimate is continuously updated by comparing actual and expected rewards while the agent is undergoing its programmed behaviour. These stored estimates are normally referred to as ‘state values’ and collectively constitute the agent's subjective ‘mental model’ of the environment. Technically, the estimated state values can be stored in different ways. The most straightforward approach is to simply use a lookup table, but this approach lacks the ability to generalize between similar but non-identical states. A common alternative is to use some form of artificial neural network as such networks have an inherent ability to interpolate when used as function approximators [30].
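To make the TD idea concrete, here is a minimal sketch of a one-step TD update (often called TD(0)) on a lookup table of state values. It is illustrative only: the state labels and numbers are hypothetical, and the study itself stores its estimates in a neural network over state–action pairs rather than in a table.

```python
from collections import defaultdict


def td0_update(values, state, reward, next_state, alpha, gamma):
    """Apply one TD(0) update to a lookup table of state values.

    The TD error compares what was actually experienced (the reward plus the
    discounted estimate of the next state) with what was expected (the current
    estimate of this state). The estimate then moves a fraction alpha of the
    way towards the experienced value.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Lookup table: states never seen before start with an estimate of 0.0.
values = defaultdict(float)
first_error = td0_update(values, "low", reward=1.0, next_state="high", alpha=0.5, gamma=0.9)
print(first_error, values["low"])  # 1.0 0.5
```

Because the table has no notion of similarity, an update to `"low"` says nothing about a state that has never been visited, which is the limitation that motivates function approximators such as neural networks.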
The agent's behaviour is defined by the way it chooses actions based on the current state. In general, the agent will choose actions that take it to states with a high estimated value in terms of expected future reward, but it may sometimes be preferable to take suboptimal actions in order to promote exploration of less known states.
In this study, we use an RL-based agent to represent a learning actor managing a renewable resource. We combine a TD-based learning method [29], a mental model stored in a neural network [41] and a decision-making model where the balance between best and explorative actions is parametrized (see figure 1 for a conceptual overview).
Figure 1. Conceptual model of the agent–resource system. The agent architecture consists of three learning components: a learning method, a mental model and a decision-making model. Each component can be set to influence the LBD process by varying its corresponding learning parameter, where γ represents the discount factor, α the learning rate and τ the exploration level. First, the agent decides on an action; next, the harvest is received; and finally, the agent learns from its experience and updates its mental model. The learning goal is to optimize harvests across current and future time steps. (Online version in colour.)
(b) Overview of the agent–resource system
To understand the learning components of LBD in relation to uncertain and changing resource dynamics, we modelled an artificially intelligent agent. The agent represents an actor, e.g. a fisher or a group of fishers learning to adapt their fishing effort to keep the stock at a sustainable level. The task of the agent is to learn to sustainably manage an archetypical renewable resource exposed to unexpected fluctuations in its abundance as well as changes in its growth rate (i.e. a non-stationary problem).
We consider an environment where the only way to collect information about the resource dynamics is by interacting with the resource. In the simulation, at each time step, the agent harvests the resource, receives a reward (harvest minus costs), processes and stores experiences (learns) and decides on the next harvest effort. No a priori understanding of the resource dynamics is given to the agent. The agent's initial mental model (value function) is set to generate the same outcome for all combinations of actions and states, which means that the agent has to learn the consequences of its actions in the different states it encounters over time. For an initial period of 50 time steps the agent learns the system with a stationary growth rate; then, during the remaining time steps (51–200), a growth rate change is introduced and the agent is tested for its ability to absorb these changes with different configurations of its learning parameters. Note that the agent could not directly observe either the growth rate or the change in growth rate. The random reductions in stock were present during the full simulation period.
(i) Terminology
Owing to differences in naming conventions between scientific fields, the following terms are used interchangeably: state = biomass (e.g. of a fish stock); action = harvest effort (the control variable, i.e. how much to harvest); and reward = outcome (i.e. net income, calculated as harvest − costs). Mental model is equivalent to the value function (here represented by a neural network function approximator).
(c) The agent model
The agent model was created by combining three learning components, each associated with a specific learning parameter (figure 1). We present a short description of the agent model here and refer to the electronic supplementary material, section S1, for further details.
The mental model stores the agent's subjective estimate of the total long-term reward for every possible pair of states and next actions. This real-valued function of two variables is represented as a neural network where the values for a large number of state-action pairs are stored using weights [41]. This representation allows a read-out for any state–action pair as the network interpolates between the stored weights. Further, the weights can be updated during learning, as new estimates are made based on new experiences. The rate at which the mental model is updated is determined by the learning rate parameter (α). The learning rate determines the balance between sticking to past beliefs (a low learning rate) or quickly updating the estimates (a high learning rate).
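To illustrate what α controls, the sketch below applies the generic incremental update estimate ← estimate + α(target − estimate) to a single number rather than to neural-network weights. The target series is hypothetical; it only shows how the learning rate trades off sticking to past beliefs against quickly updating them.

```python
def track(targets, alpha, estimate=0.0):
    """Move a single estimate a fraction alpha of the way towards each new target."""
    history = []
    for target in targets:
        estimate += alpha * (target - estimate)
        history.append(estimate)
    return history


# A hypothetical "true" value that holds at 1.0 and then drops abruptly to 0.0.
targets = [1.0] * 20 + [0.0] * 5
slow = track(targets, alpha=0.1)  # low learning rate: sticks to past beliefs
fast = track(targets, alpha=0.5)  # high learning rate: quickly updates
print(round(slow[-1], 2), round(fast[-1], 2))  # 0.52 0.03
```

With α = 0.5 the estimate has almost fully registered the drop after five steps, whereas with α = 0.1 it still sits roughly halfway. The flip side, noted in the results, is that very high rates make the estimate chase the most recent experience.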
For the decision-making model, we use a weighted random method to select an action at each time step, specifically the softmax policy [29]. The likelihood of selecting a specific action is weighted by the estimated value of each action in the current state, as stored in the mental model. An exploration level parameter τ controls the balance between always selecting the perceived best action (low values) and taking more random actions (high values), which corresponds to a more explorative behaviour.
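A minimal sketch of softmax (Boltzmann) action selection, assuming the mental model has already produced one estimated value per discrete harvest effort; the three values below are hypothetical. Probabilities proportional to exp(Q/τ) are the standard softmax form; how the study discretizes or parametrizes its actions is described in its supplementary material and is not reproduced here.

```python
import math
import random


def softmax_probabilities(action_values, tau):
    """Softmax (Boltzmann) selection probabilities; tau is the exploration level."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    best = max(action_values)
    # Subtracting the maximum keeps exp() from overflowing; probabilities are unchanged.
    weights = [math.exp((v - best) / tau) for v in action_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(action_values, tau, rng=random):
    """Draw the index of an action with probability given by the softmax."""
    probs = softmax_probabilities(action_values, tau)
    return rng.choices(range(len(action_values)), weights=probs, k=1)[0]


# Hypothetical estimated values of three candidate harvest efforts in the current state.
values = [1.0, 1.2, 0.8]
greedy_like = softmax_probabilities(values, tau=0.01)  # almost all mass on index 1
explorative = softmax_probabilities(values, tau=1.0)   # much flatter distribution
print(choose_action(values, tau=0.01, rng=random.Random(0)))  # 1
```

At τ = 0.01 the best-valued action is chosen almost always; at τ = 1.0 the best action still has the highest probability, but less than one half, so the other efforts are tried regularly.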
For the learning method, we use a TD method in which the agent updates its mental model after each interaction with its environment. The agent updates its expectations of future rewards, as stored in the mental model, by comparing the expected reward with the reward actually received. The expected reward is here based on the difference between the mental model's estimated values of the current and next states, with future rewards discounted. How much the agent values future reward is therefore controllable via a discount factor parameter (γ). Values of γ near 1.0 mean that the agent values immediate and future rewards equally, while lower values mean that the agent tends to ignore the future. Hence, for values near 1.0 the agent will look for a policy that achieves long-term gains, while values near 0 lead to policies optimized for short-term gain.
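The effect of γ can be seen on a discounted return, in which the reward k steps ahead is weighted by γ^k. The two harvest plans below are hypothetical and only illustrate how the same reward streams are ranked differently by a shortsighted and a farsighted agent.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards in which the reward k steps ahead is weighted by gamma**k."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


# Two hypothetical harvest plans over five steps: take a lot now, or harvest steadily.
greedy = [10.0, 1.0, 1.0, 1.0, 1.0]
steady = [4.0, 4.0, 4.0, 4.0, 4.0]

for gamma in (0.1, 0.9):
    g_val = discounted_return(greedy, gamma)
    s_val = discounted_return(steady, gamma)
    preferred = "greedy" if g_val > s_val else "steady"
    print(f"gamma={gamma}: greedy={g_val:.2f}, steady={s_val:.2f} -> prefers {preferred}")
```

With γ = 0.1 the front-loaded plan scores higher, while with γ = 0.9 the steady plan does, mirroring the short-term versus long-term policies described above.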
(d) The resource model
The resource system was represented by the Gordon–Schaefer model, a nonlinear function with logistic growth, in discrete time. This function is commonly used in resource economics as a ‘general-purpose representation’ of a renewable resource such as a fishery [37].
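Because the model equations are not reproduced in this text, the sketch below uses the textbook discrete-time form of the Gordon–Schaefer model: logistic growth, a catch proportional to effort and stock, and a net income of revenue minus effort costs. The parameter names and values, and the treatment of g as a multiplier on the intrinsic growth rate, are assumptions for illustration; the study's own equations and parametrization may differ.

```python
from dataclasses import dataclass


@dataclass
class GordonSchaefer:
    """Textbook discrete-time Gordon-Schaefer fishery (illustrative parameters)."""

    r: float = 1.0      # intrinsic growth rate
    k: float = 1.0      # carrying capacity
    q: float = 1.0      # catchability coefficient
    price: float = 1.0  # price per unit of catch
    cost: float = 0.2   # cost per unit of effort

    def step(self, stock, effort, g=1.0):
        """Advance one time step and return (next_stock, reward).

        g scales the growth rate (g = 1.0 is the baseline), mimicking a change
        in growth that the harvester cannot observe directly.
        """
        catch = min(self.q * effort * stock, stock)  # Schaefer catch, capped at the stock
        growth = g * self.r * stock * (1.0 - stock / self.k)  # logistic growth
        next_stock = max(stock + growth - catch, 0.0)
        reward = self.price * catch - self.cost * effort  # net income: revenue minus costs
        return next_stock, reward

    def equilibrium_stock(self, effort, g=1.0):
        """Stock at which growth exactly balances catch for a constant effort."""
        return max(self.k * (1.0 - self.q * effort / (g * self.r)), 0.0)


model = GordonSchaefer()
s_star = model.equilibrium_stock(effort=0.5)
print(model.step(s_star, effort=0.5))  # stock stays at the equilibrium of 0.5
```

Setting growth equal to catch, g·r·s(1 − s/K) = qEs, gives the equilibrium s = K(1 − qE/(g·r)); a higher g therefore supports a larger stock at the same effort.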




Figure 2. Scenarios of changes in growth rate. The x-axis shows the time steps and the y-axis shows the change in growth rate. (Online version in colour.)
(i) Scenarios of growth rate change
We explored six scenarios of growth rate change and one baseline scenario, as depicted in figure 2. The corresponding equations are stated in the electronic supplementary material, section S2.2. Although this is a strictly theoretical model, the scenarios relate to plausible empirical observations of how environmental change affects population growth over both short and long timescales [13], such as linear increases and decreases in fish growth rate due to temperature changes [10]. Year-to-year variation in population growth has been found to be common in wild fish populations [42], and cyclical dynamics occur in recurrent epidemics in natural populations [43]. We let these examples inform our choice of scenarios, but do not parametrize the scenarios according to them.
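The study's scenario equations are given in its supplementary material and are not reproduced here. As a rough sketch only, the seven shapes named in figure 2 and table 1 can be generated as a growth-rate multiplier g(t) that stays at 1.0 for the first 50 steps and then follows the scenario until step 200. The amplitude, the period and the random-jump rule are hypothetical choices, not the paper's parametrization.

```python
import math
import random

T_CHANGE = 50   # stationary period before any change (from the text)
T_END = 200     # final time step (from the text)
SCENARIOS = ("constant", "linear increasing", "linear decreasing",
             "abrupt increasing", "abrupt decreasing", "cyclic", "abrupt random")


def growth_trajectory(scenario, amplitude=0.5, period=50, seed=0):
    """Return a hypothetical growth-rate multiplier g(t) for t = 1..T_END."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    rng = random.Random(seed)
    level = 1.0
    g = []
    for t in range(1, T_END + 1):
        if t <= T_CHANGE or scenario == "constant":
            g.append(1.0)
            continue
        step = t - T_CHANGE                  # 1, 2, ... after the change starts
        frac = step / (T_END - T_CHANGE)     # rises linearly to 1 at the final step
        if scenario == "linear increasing":
            g.append(1.0 + amplitude * frac)
        elif scenario == "linear decreasing":
            g.append(1.0 - amplitude * frac)
        elif scenario == "abrupt increasing":
            g.append(1.0 + amplitude)
        elif scenario == "abrupt decreasing":
            g.append(1.0 - amplitude)
        elif scenario == "cyclic":
            g.append(1.0 + amplitude * math.sin(2.0 * math.pi * step / period))
        else:  # "abrupt random": jump to a new random level every `period` steps
            if (step - 1) % period == 0:
                level = rng.uniform(1.0 - amplitude, 1.0 + amplitude)
            g.append(level)
    return g
```

Each trajectory can then be fed, step by step, into the growth term of the resource model, so the agent only experiences the change through the harvests it obtains.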
(e) Analysis, definition of strategies, and experiments
As a performance measure, we calculate how close the agent came to obtaining the accumulated reward maximally possible during a time frame as

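The exact form of equation (2.4) is not reproduced in this text. Reading the description above together with the caption of figure 3 (a performance of 1.0 corresponds to always taking the analytically optimal action), one plausible sketch is the ratio below; treat the function name and the handling of the time frame as assumptions.

```python
def performance(agent_rewards, optimal_rewards):
    """Agent's accumulated reward as a fraction of the maximum achievable over the same steps."""
    if len(agent_rewards) != len(optimal_rewards):
        raise ValueError("both reward series must cover the same time frame")
    best = sum(optimal_rewards)
    if best <= 0:
        raise ValueError("the optimal accumulated reward must be positive")
    return sum(agent_rewards) / best


# Hypothetical per-step rewards for the learning agent and for the optimal policy.
agent = [0.8, 0.9, 1.0]
optimal = [1.0, 1.0, 1.0]
print(f"{performance(agent, optimal):.2f}")  # 0.90
```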
The fixed strategy assumed constant environmental conditions. Its action was calculated using equation (2.3) for g = 1.0 at s = s_MEY (the stock at maximum economic yield). For the learning strategy, we compared two versions of optimization: a specialized learning (SL) strategy and an averaged learning (AL) strategy. In the SL strategy, learning was optimized for each scenario, while in the AL strategy learning was optimized for the aggregated set of scenarios. To identify the best combination of parameter values for the SL strategy, each learning parameter was adjusted towards its optimum until all parameters were optimized according to the agent's maximum performance (equation (2.4)). For the AL strategy, the learning parameters were optimized to achieve the highest total reward for the aggregated set of scenarios.
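Adjusting each learning parameter towards its optimum until all are optimized reads like a coordinate-wise search. The sketch below shows that idea, and the SL/AL distinction, on a hypothetical smooth objective that stands in for the simulated performance; the grids, optima and function names are invented for illustration and are not the study's values.

```python
def coordinate_search(objective, grids, start, max_sweeps=10):
    """Optimize one parameter at a time over its grid until a full sweep changes nothing."""
    params = dict(start)
    for _ in range(max_sweeps):
        changed = False
        for name, grid in grids.items():
            # Try every grid value for this parameter while holding the others fixed.
            best_value = max(grid, key=lambda v: objective({**params, name: v}))
            if best_value != params[name]:
                params[name] = best_value
                changed = True
        if not changed:
            break
    return params


def toy_performance(params, optimum):
    """Hypothetical peaked surface standing in for simulated performance."""
    return 1.0 - sum((params[key] - optimum[key]) ** 2 for key in optimum)


grids = {
    "gamma": [i / 10 for i in range(11)],
    "alpha": [i / 10 for i in range(11)],
    "tau": [0.01, 0.03, 0.1, 0.3, 1.0],
}
# Arbitrary per-scenario optima, chosen only to make the example non-trivial.
optima = {
    "scenario_a": {"gamma": 0.9, "alpha": 0.4, "tau": 0.01},
    "scenario_b": {"gamma": 0.5, "alpha": 0.2, "tau": 0.1},
}
start = {"gamma": 0.0, "alpha": 0.0, "tau": 1.0}

# SL: one optimization per scenario.
specialized = {name: coordinate_search(lambda p, o=opt: toy_performance(p, o), grids, start)
               for name, opt in optima.items()}
# AL: a single optimization of the summed objective over all scenarios.
averaged = coordinate_search(
    lambda p: sum(toy_performance(p, o) for o in optima.values()), grids, start)
print(specialized["scenario_a"], averaged)
```

In this toy setting the AL parameters land between the two specialized optima, which is the compromise the AL strategy represents. With a noisy simulation, each objective evaluation would itself be an average over many runs.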
The simulation experiments thus consisted of: (i) optimizing a combination of learning parameters for each specific scenario of change (i.e. finding SL); (ii) optimizing a combination of learning parameters for the aggregated set of scenarios (i.e. finding AL); (iii) assessing the robustness of each SL strategy by letting it manage each scenario other than the one for which it was optimized, and comparing it with the SL strategy of the corresponding scenario; and (iv) assessing the robustness of the AL strategy by letting it manage every scenario, and comparing it with the SL strategy of the corresponding scenario.
3. Results
The results show that SL outperformed the fixed strategy when the growth rate was changing, and to an even greater extent when the growth rate decreased (figure 3). The SL strategy reached a mean performance of 0.91, while the fixed strategy reached a mean of only 0.73. The fixed strategy had its lowest performance in the abrupt decreasing scenario, with a loss of 51% compared with the SL strategy. The most difficult scenarios for the SL strategy were the cyclical and the abrupt random scenarios (henceforth, fluctuating scenarios), both with a performance of 0.86 (see electronic supplementary material, table S2, for the complete dataset). The highest performance for both strategies was obtained in the scenarios with increasing growth rates. This result depends on the properties of the logistic growth function and the corresponding optimal control curve. Judging from the shape of the state-specific policy (electronic supplementary material, figure S2), the choice of action becomes more robust to deviations from the optimal action at higher growth rates. Hence, a fixed strategy can be efficient under increasing growth, because adaptation to change is less critical.
Figure 3. Performance of the agent using the specialized learning strategy versus the fixed strategy. A performance of 1.0 was possible if the action was taken according to equation (2.3). The learning strategy had optimized parameters for each scenario, while the fixed strategy continuously used the optimal action for the initial time step. The data were based on the average of 150 runs. (Online version in colour.)
(a) Learning parameters' effect on performance
To test the effect of the three learning parameters on performance, 31 values were tested per parameter, using a linear range from 0 to 1.0 for both the discount factor and the learning rate, and a logarithmic range from 0.002 to 1.0 for the exploration level (figure 4). The broad peaks in response to the varied learning parameters indicate a wide range within which the parameters can be set while still maintaining high performance. However, the decreasing scenarios are an exception: these two scenarios are surprisingly sensitive to variations around the optimal value of the discount factor, and they are also more sensitive to the value of the exploration level than the remaining scenarios.
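The sweep behind figure 4 can be sketched as two 31-point grids plus a one-at-a-time evaluation that holds the other two parameters at their optimum, as the figure caption describes. The grid formulas follow the ranges stated in the text; the objective and the optimum are hypothetical placeholders for the simulation.

```python
N_VALUES = 31

# Linear grid from 0 to 1.0, used for the discount factor and the learning rate.
linear_grid = [i / (N_VALUES - 1) for i in range(N_VALUES)]

# Logarithmic grid from 0.002 to 1.0, used for the exploration level.
TAU_MIN, TAU_MAX = 0.002, 1.0
tau_grid = [TAU_MIN * (TAU_MAX / TAU_MIN) ** (i / (N_VALUES - 1)) for i in range(N_VALUES)]


def one_at_a_time(objective, optimum, name, grid):
    """Vary one parameter over its grid while the others stay at their optimal values."""
    return [(value, objective({**optimum, name: value})) for value in grid]


# Hypothetical optimum and objective, standing in for the simulated performance.
optimum = {"gamma": 0.9, "alpha": 0.4, "tau": 0.01}


def toy_performance(params):
    return 1.0 - sum((params[key] - optimum[key]) ** 2 for key in optimum)


sweep = one_at_a_time(toy_performance, optimum, "alpha", linear_grid)
best_alpha = max(sweep, key=lambda pair: pair[1])[0]
print(f"best alpha on the grid: {best_alpha:.3f}")
```

The logarithmic grid spends most of its points on small exploration levels, which is where the text reports the largest differences in performance.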
Figure 4. The variation in performance for different parameter values using the two learning strategies. (a) The sensitivity to the discount factor, (b) to the learning rate and (c) to the exploration level. Each learning parameter was tested at 31 values, represented by the markers, using the specialized learning strategy. While one parameter was tested, the remaining two were held fixed at their optimal values (dashed vertical lines). The solid lines show a smoothed fit to the data points. The black dash-dotted line shows the performance of the averaged learning strategy. The y-axis starts at 0.7 to increase contrast between data. Each data point is based on the average of 50 runs. (Online version in colour.)
The results for the discount factor (figure 4a) show that a high valuation (0.9) of future outcomes relative to present outcomes was always beneficial. A discount factor of 0.5–0.9 had no dramatic impact on the outcome for the increasing and fluctuating scenarios. For the learning rate of the mental model (figure 4b), no learning (learning rate = 0) caused a sharp drop in performance. However, a slight increase in the learning rate was enough to generate high performance for the increasing scenarios, while the remaining scenarios required a moderate increase. The increasing scenarios benefited from a low learning rate (0.1), while the fluctuating and decreasing scenarios peaked in performance around 0.25 and 0.45, respectively. High learning rates of the mental model lead to ‘over-training’: if the agent adapts too quickly to the current experience, an oscillating effect can potentially occur. Further, in figure 4c, a high exploration level (above 0.1) was always detrimental. A low exploration level generated a loss of at most approximately 7% to 13% across the scenarios.
The fluctuating scenarios gained only marginally from exploration, which may relate to the agent's knowledge base: because the agent's knowledge consisted of state–action values, the growth rate itself remained unknown to it. This is one plausible explanation for why, in the fluctuating scenarios, adapting to perceived changes in the system's dynamics was almost sufficient.
(b) Optimal strategies and their robustness
The SL results reveal that the optimal parameter combinations were similar among scenarios with the same type of change (decreasing, increasing or fluctuating), regardless of whether the change was abrupt or linear (figure 5). The optimal parameters for AL matched those of the decreasing scenarios best, with a similar discount factor and exploration level, although with a slightly lower learning rate. Scenarios with clear trends benefited from higher exploration levels than the fluctuating scenarios, which required almost no exploration (see electronic supplementary material, table S3, for parameter values for both strategies).
Figure 5. The optimal configuration of learning parameters for each scenario. Specialized learning is implemented for bars 1–7, and averaged learning for bar 8. The parameter values correspond to the peak performance values for each scenario of change. Data are based on the average of 50 runs. (Online version in colour.)
The robustness analysis of the SL strategies reveals major differences in how efficiently each performs when exposed to scenarios other than the one for which it was optimized. The mean loss varied between 2% and 8%, and the maximum loss between 4% and 19% (table 1). The higher the maximum loss, the worse the strategy performed when exposed to one other specific scenario of change; the higher the mean loss, the worse the strategy performed across all other scenarios of change (see electronic supplementary material, table S4, for robustness data per scenario). The optimal strategies for the ‘abrupt increasing’ and ‘abrupt random’ scenarios had the lowest robustness, while, not surprisingly, the AL strategy had the highest robustness. The optimal strategies for decreasing growth were most similar to the AL parameters and also had the smallest loss in robustness compared with the AL strategy, with only 0.5–1% less mean robustness and 1–2% less maximum robustness.
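Table 1. Robustness of each optimized strategy: mean and maximum loss in performance (%) when managing the scenarios of change (each specialized strategy on the scenarios it was not optimized for; the averaged strategy, optimized for the aggregated set, on all scenarios).

| optimized for | mean loss (%) | max loss (%) |
|---|---|---|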
| opt. [aggregated set] | 1.65 | 3.68 |
| opt. [linear decreasing] | 2.02 | 4.67 |
| opt. [abrupt decreasing] | 2.62 | 5.91 |
| opt. [cyclic] | 2.90 | 5.94 |
| opt. [constant] | 3.18 | 5.08 |
| opt. [linear increasing] | 3.38 | 8.33 |
| opt. [abrupt random] | 5.51 | 13.33 |
| opt. [abrupt increasing] | 8.36 | 18.87 |
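The losses in table 1 can be computed from a matrix of performances in which each strategy is evaluated on each scenario. The sketch below assumes that loss is the percentage shortfall relative to the specialized (SL) strategy of the evaluated scenario; whether the study used this relative form or an absolute difference is not stated here, and the numbers are hypothetical.

```python
def robustness(perf, strategy, scenarios, exclude_own=True):
    """Mean and maximum percentage loss of `strategy` relative to each scenario's specialist.

    perf[optimized_for][evaluated_on] holds a performance value; the specialist (SL)
    strategy for scenario s is the one optimized for s, i.e. perf[s][s].
    """
    losses = []
    for s in scenarios:
        if exclude_own and s == strategy:
            continue  # an SL strategy is only compared on scenarios it was not tuned for
        reference = perf[s][s]
        losses.append(100.0 * (reference - perf[strategy][s]) / reference)
    return sum(losses) / len(losses), max(losses)


# Hypothetical performance values for two scenarios and an aggregated strategy.
perf = {
    "a": {"a": 0.90, "b": 0.80},
    "b": {"a": 0.81, "b": 0.88},
    "aggregated": {"a": 0.88, "b": 0.86},
}
scenarios = ["a", "b"]
print(robustness(perf, "a", scenarios))           # SL for "a", tested on "b" only
print(robustness(perf, "aggregated", scenarios))  # AL, tested on every scenario
```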
4. Discussion
In this study, we demonstrate that adaptive management through LBD can achieve high levels of performance despite no prior knowledge about the dynamics of the system. When implementing specialized learning, and thus optimizing for a specific scenario of change, we find a mean performance of 91 ± 5% relative to the analytically optimal solution, even when the agent is challenged by very different dynamics. However, this learning strategy performs worse when exposed to scenarios of change for which it was not optimized, with a corresponding loss of robustness of up to 19% in performance. Optimizing for the aggregated set of scenarios yields surprisingly high performance compared with specialized learning, as well as high robustness, losing less than 2% on average in other scenarios. Consequently, when implementing a strategy that optimizes for the aggregated set of scenarios, we find that the cost of trading performance for robustness is negligible, an outcome that is often difficult to achieve [26].
Furthermore, our results indicate that increasing resource growth rates create a more tolerant system in which to learn, one that tolerates shortsightedness (represented by a low discount factor), sticking to past beliefs about the system (represented by a low learning rate) and modest experimentation. Declining growth rates make for a more unforgiving resource system, as is also true for resources with threshold dynamics, as shown in Lindkvist & Norberg [34]. Resources with declining growth can only be learned about efficiently with high discount factors (i.e. low discount rates), high learning rates and modest exploration. The high learning rates may be interpreted in light of the results of Brown et al. [13], where resources with declining growth rates need faster adaptation in management strategies, leading to lowered harvest actions. An important finding of this study is that the same holds for a robust learning strategy that can manage all scenarios at a negligible cost. This directly challenges, and provides evidence against, the prevalence of overly low discount factors in the management of renewable resources, even when stocks are in a healthy state. Our simulations show the benefits of LBD at the system level. However, whether the discount rate, exploration level and learning rate can be chosen or influenced in practice, and whether they are optimal from an individual versus a governance perspective, remain open questions.
As Blackmore [40] shows, there is a multitude of learning theories in the context of resource dilemmas and sustainability. Our study provides insights into three key decision-making dilemmas related to adaptive management and LBD. Although management decisions are inherently context dependent, we outline some general advice for management under conditions where both changes in growth rates and random stock removal are present, and where the action (harvest effort) and goal state are learned instead of known, as presupposed by many other studies on optimal resource management (e.g. [6,37]).
Firstly, in most economic decision-making the value of expected future outcomes plays a large role [37]. Our results accentuate its significance: a high valuation of future outcomes is crucial for effectively learning sustainable resource use. These results correspond to the high discount factors that Lindahl et al. [20] find in their laboratory experiments with human subjects for the constant case of the Gordon–Schaefer model. For increasing resource growth rates, high discount factors are of less importance, so a more shortsighted approach may be acceptable in this particular set of circumstances. As Sumaila & Domínguez-Torreiro [18] discuss, the optimal discount factors from an individual's perspective or a governance perspective depend on a number of factors, such as how an economically rational actor weighs its own benefits at present versus in the future. In addition, risk perception, the more altruistic acknowledgement of future generations [17] and the actual discount factor in operation will all vary among the involved fishers and the governance and management systems in place.
Secondly, relatively high exploration rates are favourable for increasing the efficiency of the learning strategy [22,23]. However, somewhat surprisingly, this applies only to the increasing and decreasing trends in growth rate. For the fluctuating scenarios relatively low exploration rates suffice, which indicates that learning by adapting to feedbacks is sufficient. This raises the questions of whether LBD can be efficient without exploration, and when it is more important to adapt to new knowledge learned through the more recent feedbacks between harvesting efforts and catch. This may be the case when resources are highly fluctuating and unpredictable, as the information gained by exploration (here, deviating from the perceived optimal choice of action) will be difficult to interpret because of the turbulence in the system. This issue is a prominent focus of the adaptive management literature on active versus passive adaptive management [14,16]. Our current model suggests that a passive, less explorative approach may be better when systems are highly noisy.
A third learning component is the learning rate, i.e. the rate at which it is appropriate to change one's understanding of which action is more sustainable when harvests deviate from the expected outcome. For resources with an increasing growth rate, slower adaptation is adequate; however, faster adaptation is advantageous for resources with declining or fluctuating growth rates. Our study thus supports the thesis of Brown et al. [13], which argues that resources with a declining growth rate, compared with those with an increasing growth rate, require a faster reduction in harvest effort to avoid the severe effects of overharvesting. In addition, our results suggest that this fast reduction in harvest effort also holds for resources with fluctuating growth rates, which should therefore be included in this thesis. Moreover, as Pinsky [3] and Brander [4] emphasize, our model shows that it is fundamental to continuously adjust current reference points to changing conditions.
(a) Model limitations and future pathways
The simulation experiment undertaken in this study can be regarded as a pilot study that scrutinizes adaptive management through a computational LBD model in the context of a simple renewable resource, and it opens up a variety of potential future studies. In this paper we present some general conclusions for a robust strategy applied to a resource with logistic growth and uncertainty in growth rates. Although logistic growth is a common feature of many population models, applying LBD to other population models would test this generality further. Exploring variable costs in the reward function, and catchability in the growth function, may also challenge the generality of the results. An important assumption in this model is that the agent does not know the growth rate. Allowing the agent to estimate and store knowledge about the growth rate, or providing it with this information, could reveal what knowledge of system dynamics is needed for efficient LBD.
The model, in its present state, lends itself to further investigation, such as exploring different options for experimenting under different resource conditions and allowing the exploration level to be a control variable of the agent. Depending on the context in which the model is further developed, the complexity of the resource model may be increased to incorporate food webs and/or to become spatially explicit. Another path is to test these results empirically, either in laboratory experiments or via online experiments with human subjects, to see how well they correspond with human action. Future work could also incorporate multi-agent-based modelling to study learning and cooperation between actors for sustainable resource use under different management policy settings.
5. Conclusion
This study takes a first step towards operationalizing selected core principles of sustainability science, such as dealing with unexpected events, surprise and radical uncertainty, and towards incorporating these principles into improved management practice by bridging adaptive management theory and practice. Using a coupled social–ecological model, we demonstrate that an LBD strategy can achieve high resilience to shocks and disturbances. However, this requires the core components of LBD to be configured appropriately. While the actual parameter values are difficult to translate into real-world problems, sustainable management strategies with high efficiency and robustness can be obtained by striving for: high learning rates to new knowledge, high discount factors to value immediate and future outcomes equally, and modest exploration around what is perceived as the optimal management strategy. In sum, understanding the role of the core learning components of adaptive management strategies in relation to the nature of change can yield not only increased performance but, more importantly, increased robustness to the unknown. This robustness will be pivotal for dealing with the increasing frequency of surprising events and the growing uncertainty in system dynamics that we will face in the Anthropocene as a consequence of climate change and related anthropogenic stressors on global ecosystems.
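The role of the discount factor γ can be illustrated with the discounted return G = Σ_t γ^t R_t that reinforcement-learning agents commonly maximize [29], where R_t is the reward at step t. The sketch below, with an invented reward stream, shows that as γ approaches 1 a reward received far in the future is weighted almost as heavily as an immediate one. The function name `discounted_return` and the reward values are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, where t is the time step."""
    return sum(reward * gamma**t for t, reward in enumerate(rewards))


# Hypothetical reward stream: a single unit reward arriving 50 steps ahead.
delayed = [0.0] * 50 + [1.0]

for gamma in (0.5, 0.9, 0.99, 1.0):
    weight = discounted_return(delayed, gamma)
    print(f"gamma={gamma}: present value of a reward 50 steps ahead = {weight:.4f}")
```

With γ = 1 the delayed reward counts fully, whereas with γ = 0.5 it is effectively ignored, so a myopic agent would have little incentive to protect the stock for later harvests.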
Data accessibility
Source code and data are available from the Dryad Digital Repository [44].
Authors' contributions
E.L. and J.N. designed the study. E.L., Ö.E. and J.N. jointly participated in developing the model and analysed and interpreted the model results. E.L. was lead writer and software developer. Ö.E. and J.N. contributed to writing parts of the manuscript and critically commented on the manuscript. All authors gave final approval for publication.
Competing interests
We declare we have no competing interests.
Funding
E.L. and J.N. were supported by Mistra through a core grant to the Stockholm Resilience Centre, Stockholm University. E.L. was also supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC grant agreement no. 283950 SES-LINK.
Acknowledgments
We thank Anne-Sophie Crépin, Maja Schlüter, Steven Lade, Andrew Merrie, Andreas Hornegård, Patrick Meyfroidt, Jonas Hentati-Sundberg, Chris Fonnesbeck and three anonymous reviewers for discussions and constructive comments on earlier versions of the manuscript.
References
1. World Bank. 2012 Turn down the heat: why a 4°C warmer world must be avoided. Washington, DC: World Bank. See https://openknowledge.worldbank.org/handle/10986/1186.
2. New M, Liverman D, Schroder H, Anderson K. 2011 Four degrees and beyond: the potential for a global temperature increase of four degrees and its implications. Phil. Trans. R. Soc. A 369, 6–19. (doi:10.1098/rsta.2010.0303)
3. Pinsky ML, Mantua NJ. 2014 Emerging adaptation approaches for climate-ready fisheries management. Oceanography 27, 146–159. (doi:10.5670/oceanog.2014.93)
4. Brander K. 2012 Climate and current anthropogenic impacts on fisheries. Clim. Change 119, 9–21. (doi:10.1007/s10584-012-0541-2)
5. Folke C, Hahn T, Olsson P, Norberg J. 2005 Adaptive governance of social–ecological systems. Annu. Rev. Environ. Resour. 30, 441–473. (doi:10.1146/annurev.energy.30.050504.144511)
6. Anderies JM, Rodriguez AA, Janssen MA, Cifdaloz O. 2007 Panaceas, uncertainty, and the robust control framework in sustainability science. Proc. Natl Acad. Sci. USA 104, 15 194–15 199. (doi:10.1073/pnas.0702655104)
7. Parmesan C, Yohe G. 2003 A globally coherent fingerprint of climate change impacts across natural systems. Nature 421, 37–42. (doi:10.1038/nature01286)
8. Scheffer M. 2009 Critical transitions in nature and society. Princeton, NJ: Princeton University Press.
9. Williams BK. 2011 Adaptive management of natural resources—framework and issues. J. Environ. Manage. 92, 1346–1353. (doi:10.1016/j.jenvman.2010.10.041)
10. Thresher RE, Koslow JA, Morison AK, Smith DC. 2007 Depth-mediated reversal of the effects of climate change on long-term growth rates of exploited marine fish. Proc. Natl Acad. Sci. USA 104, 7461–7465. (doi:10.1073/pnas.0610546104)
11. Roessig JM, Woodley CM, Cech JJ, Hansen LJ. 2004 Effects of global climate change on marine and estuarine fishes and fisheries. Rev. Fish Biol. Fish. 14, 251–275. (doi:10.1007/s11160-004-6749-0)
12. Niiranen S, Blenckner T, Hjerne O, Tomczak MT. 2012 Uncertainties in a Baltic sea food-web model reveal challenges for future projections. AMBIO 41, 613–625. (doi:10.1007/s13280-012-0324-z)
13. Brown CJ, Fulton EA, Possingham HP, Richardson AJ. 2012 How long can fisheries management delay action in response to ecosystem and climate change? Ecol. Appl. 22, 298–310. (doi:10.1890/11-0419.1)
14. Walters C. 1986 Adaptive management of renewable resources. New York, NY: MacMillan Pub. Co.
15. Allen CR, Fontaine JJ, Pope KL, Garmestani AS. 2011 Adaptive management for a turbulent future. J. Environ. Manage. 92, 1339–1345. (doi:10.1016/j.jenvman.2010.11.019)
16. Williams BK. 2011 Passive and active adaptive management: approaches and an example. J. Environ. Manage. 92, 1371–1378. (doi:10.1016/j.jenvman.2010.10.039)
17. Ostrom E. 1990 Governing the commons. Cambridge, UK: Cambridge University Press.
18. Sumaila UR, Domínguez-Torreiro M. 2010 Discount factors and the performance of alternative fisheries governance systems. Fish Fish. 11, 278–287. (doi:10.1111/j.1467-2979.2010.00377.x)
19. Williams BK, Johnson FA. 2013 Confronting dynamics and uncertainty in optimal decision making for conservation. Environ. Res. Lett. 8, 025004. (doi:10.1088/1748-9326/8/2/025004)
20. Lindahl T, Crépin A-S, Schill C. 2016 Potential disasters can turn the tragedy into success. Environ. Resour. Econ. 65, 657–676. (doi:10.1007/s10640-016-0043-1)
21. Armitage D, Marschke M, Plummer R. 2008 Adaptive co-management and the paradox of learning. Glob. Environ. Change 18, 86–98. (doi:10.1016/j.gloenvcha.2007.07.002)
22. Duit A, Galaz V. 2008 Governance and complexity-emerging issues for governance theory. Governance 21, 311–335. (doi:10.1111/j.1468-0491.2008.00402.x)
23. Axelrod RM, Cohen MD. 2000 Harnessing complexity: organizational implications of a scientific frontier. New York, NY: Basic Books.
24. Kolb DA. 1984 Experiential learning: experience as the source of learning and development. Englewood Cliffs, NJ: Prentice-Hall.
25. Anderies JM, Janssen MA, Ostrom E. 2004 A framework to analyze the robustness of social–ecological systems from an institutional perspective. Ecol. Soc. 9, 18.
26. Janssen MA, Anderies JM. 2007 Robustness trade-offs in social–ecological systems. Int. J. Commons 1, 43–66. (doi:10.18352/ijc.12)
27. Nicol S, Fuller RA, Iwamura T, Chadès I. 2015 Adapting environmental management to uncertain but inevitable change. Proc. R. Soc. B 282, 20142984. (doi:10.1098/rspb.2014.2984)
28. McDonald-Madden E et al. 2010 Active adaptive conservation of threatened species in the face of uncertainty. Ecol. Appl. 20, 1476–1489. (doi:10.1890/09-0647.1)
29. Sutton RS, Barto AG. 1998 Reinforcement learning: an introduction. Cambridge, MA: MIT Press.
30. Wiering M, van Otterlo M. 2012 Reinforcement learning. In Adaptation, learning, and optimization, vol. 12. Berlin, Germany: Springer.
31. Nichols JD et al. 2011 Climate change, uncertainty, and natural resource management. J. Wildl. Manage. 75, 6–18. (doi:10.1002/jwmg.33)
32. Fonnesbeck CJ. 2005 Solving dynamic wildlife resource optimization problems using reinforcement learning. Nat. Resour. Model. 18, 1–40. (doi:10.1111/j.1939-7445.2005.tb00147.x)
33. Bone C, Dragićević S. 2010 Incorporating spatio-temporal knowledge in an Intelligent Agent Model for natural resource management. Landsc. Urban Plann. 96, 123–133. (doi:10.1016/j.landurbplan.2010.03.002)
34. Lindkvist E, Norberg J. 2014 Modeling experiential learning: the challenges posed by threshold dynamics for sustainable renewable resource management. Ecol. Econ. 104, 107–118. (doi:10.1016/j.ecolecon.2014.04.018)
35. Chadès I, Curtis JMR, Martin TG. 2012 Setting realistic recovery targets for two interacting endangered species, sea otter and northern abalone. Conserv. Biol. 26, 1016–1025. (doi:10.1111/j.1523-1739.2012.01951.x)
36. Nicol S, Chadès I. 2011 Beyond stochastic dynamic programming: a heuristic sampling method for optimizing conservation decisions in very large state spaces. Methods Ecol. Evol. 2, 221–228. (doi:10.1111/j.2041-210X.2010.00069.x)
37. Clark CW. 2010 Mathematical bioeconomics: the mathematics of conservation, vol. 91, 3rd edn. New York, NY: Wiley.
38. Polasky S, de Zeeuw A, Wagener F. 2011 Optimal management with potential regime shifts. J. Environ. Econ. Manage. 62, 229–240. (doi:10.1016/j.jeem.2010.09.004)
39. Levin S et al. 2012 Social–ecological systems as complex adaptive systems: modeling and policy implications. Environ. Dev. Econ. 18, 111–132. (doi:10.1017/S1355770X12000460)
40. Blackmore C. 2007 What kinds of knowledge, knowing and learning are required for addressing resource dilemmas? A theoretical overview. Environ. Sci. Policy 10, 512–525. (doi:10.1016/j.envsci.2007.02.007)
41. Poggio T, Girosi F. 1989 A theory of networks for approximation and learning. Cambridge, MA: MIT Press.
42. Spencer PD, Collie JS. 1997 Patterns of population variability in marine fish stocks. Fish. Oceanogr. 6, 188–204. (doi:10.1046/j.1365-2419.1997.00039.x)
43. Stone L, Olinky R, Huppert A. 2007 Seasonal dynamics of recurrent epidemics. Nature 446, 533–536. (doi:10.1038/nature05638)
44. Lindkvist E, Ekeberg Ö, Norberg J. 2017 Data from: Strategies for sustainable management of renewable resources during environmental change. Dryad Digital Repository. (http://dx.doi.org/10.5061/dryad.527pn)



