Uncertainty drives deviations in normative foraging decision strategies

Nearly all animals forage to acquire energy for survival through efficient search and resource harvesting. Patch exploitation is a canonical foraging behaviour, but there is a need for more tractable and understandable mathematical models describing how foragers deal with uncertainty. To provide such a treatment, we develop a normative theory of patch foraging decisions, proposing mechanisms by which foraging behaviours emerge in the face of uncertainty. Our model foragers statistically and sequentially infer patch resource yields using Bayesian updating based on their resource encounter history. A decision to leave a patch is triggered when the certainty of the patch type or the estimated yield of the patch falls below a threshold. The time scale over which uncertainty in resource availability persists strongly impacts behavioural variables like patch residence times and decision rules determining patch departures. When patch depletion is slow, as in habitat selection, departures are characterized by a reduction of uncertainty, suggesting that the forager resides in a low-yielding patch. Uncertainty leads patch-exploiting foragers to overharvest (underharvest) patches with initially low (high) resource yields in comparison with predictions of the marginal value theorem. These results extend optimal foraging theory and motivate a variety of behavioural experiments investigating patch foraging behaviour.


Introduction
Foraging is performed by many different species [1][2][3][4][5] and engages cognitive computations such as learning of resource distributions across spatio-temporal scales, route planning and decision-making [6]. Comparing species, one can ask how these integrated processes have been shaped by natural selection to optimize returns in the face of environmental and physiological constraints [6,7]. Foraging thus provides the opportunity to study and quantify how both evolution and neural circuitry shape a natural behaviour [8][9][10][11].
In natural landscapes, foraging involves a decision hierarchy that unfolds across multiple length and time scales, which consider both where to forage as well as how long to exploit a certain resource [12,13]. On long time scales, animals accumulate evidence to choose which of a collection of large areas they will dwell in and forage, during which their activity does not appreciably change the resource landscape. Following previous work, we refer to this as 'habitat choice' [12]. Within habitats, animals exploit resources on shorter time scales, while their activity depletes resources in visits to localized regions. We refer to this behaviour as 'patch exploitation' or 'patch leaving'. Questions of where to forage and how long to exploit local patches of resource constitute a multi-level framework for examining behaviour across spatial scales; still larger scales consider the home range of an individual, as well as the species range [13,14]. For both habitat choice and patch leaving, the local regions of an environment can be conceptualized as a 'patch'; the key difference is in whether or not the forager's activity impacts the resource availability in the landscape. As a decision problem, both are sequential choice processes, so the forager does not make a choice between discrete alternatives that are presented simultaneously, but rather only receives evidence from the current patch and must decide whether to stay or go [15]. Thus, habitat choice and patch leaving are related but differ in the length and time scales involved (figure 1a). However, most theoretical work has considered these two problems separately.
Regarding habitat choice, the optimal behaviour is clear: the forager should locate and spend as much time as possible in the habitat patch that maximizes fitness outcomes. Observational studies of habitat choice often consider the combined effects of multiple factors to ask how well they predict observed habitat use (e.g. [16]). Theoretical and experimental studies have investigated multiple factors, such as density-dependent effects predicted by the ideal free distribution when there are multiple foragers on a landscape [17] and how perceptual constraints may lead to deviations from optimal choices [18]. Our focus here is specifically on understanding an accumulation process by which individuals use information to reach decisions on habitat choice, and deriving this result in a mathematically tractable model that can more precisely parameterize strategy and performance in a wide range of environments.
Patch leaving considers shorter time scales, where the forager substantially depletes the patch during its visit and must subsequently decide when to leave the current patch in search of another. A basic result in behavioural ecology, the marginal value theorem (MVT), states that an animal can optimize the resource intake rate by leaving its current patch when the estimated within-patch resource yield rate falls below the global average resource intake rate of the environment [19]. While this theory has been validated in multiple behavioural studies [20][21][22][23][24][25][26][27][28][29], it does not explicitly indicate how beliefs about environmental features (e.g. resource distribution) are accumulated over time or how decision rules might be applied to these beliefs in the form of concrete dynamical equations. To explain how foraging decisions are shaped by the presence and reduction of uncertainty based on resource encounters, it is useful to have normative Bayesian models of patch leaving that ask how animals use limited information to make foraging decisions. However, those that do tend to consider a narrow range of environmental conditions [30,[31][32][33][34][35][36][37][38][39][40][41][42][43][44][45][46]. General models that have been proposed are challenging to analyse mathematically [47], making it difficult to reveal how environmental parameters shape an optimal forager's strategy and yield in many different environments.
The aim of our study is thus to develop a Bayesian framework of foraging behaviour, treating decisions as a statistical inference problem and connecting normative theory of foraging decisions with mechanistic evidence accumulation models [48]. Normative strategies provide a best-case scenario for a particular objective, which can then be used as a touchstone for comparison with heuristics and actual animal behaviours. Animals may actually use evidence accumulation mechanisms that approximate Bayes' optimal strategies, such as energy measurement via the passage of food through the gut [49], so we consider our model to be an abstraction that such physical strategies can be measured against. We first define a general mathematical framework to model patch-foraging decisions that applies at different time scales with regards to search and depletion of the resource. These represent the different ecological decision cases described above: 'habitat choice' and 'patch leaving'. Because these cases are only separated in the time scale of resource depletion, we treat both with the framework of patch foraging as an evidence accumulation process whereby a threshold on available evidence triggers a decision.
Using several mathematically tractable cases in which probabilistic updating based on the receipt of resources within a patch can be modelled by stochastic differential equations (SDEs), we determine patch-leaving statistics via solutions to first-passage time problems. We thus obtain analytical expressions for optimal decision thresholds that patch environment p(l 0 ) patch resource encounter rate l 0 resource encounters  Figure 1. Patch-departure tasks and model. (a) Task environments: on long time scales, an animal decides between habitats whose resource yields change slowly; on shorter time scales, the animal exploits patches whose resources are depleted more quickly. In our model analysis, we assume that the forager is solving one of these problems at a time, but not both simultaneously. (b) Ideal observer foraging model: the initial yield of the patch is drawn from the distribution p(λ 0 ), generating random resource encounter times t 1:K , and updating the belief of the current resource yield rate λ(t) for the patch. We illustrate the movement of the forager and subsequent time series of resource encounters x(t), resulting in a refinement of the posterior p(λ|x(t)). The maximum-likelihood estimate λ MLE approaches the true λ true yield rate over time.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 connect to observable quantities of interest, including patch residence time, travel time, resource consumption over time and patch yield rate over time. For both habitat choice and patch leaving, we show that, in uncertain or resource-poor environments, uncertainty causes even an ideal Bayesian observer to tend to stay too long in low-yielding patches (overharvesting) and not long enough in high-yielding patches (underharvesting). Past studies have suggested that such deviations can also arise from patch-discrimination limits [18] or behavioural state dependence [50]. Though our work is not the first to demonstrate that over-and underharvesting can arise from uncertainty in statistical inference models [51,52], we are able to show how these trends vary across a wide range of environments owing to our model's mathematical tractability. By establishing a general Bayesian framework for patch foraging at multiple scales, our study provides a platform to study behavioural and neural mechanisms of naturalistic decision-making akin to how trained decision-making behaviour is studied within systems neuroscience [8,53].

Sequential sampling model framework
The patch-foraging model framework, which describes both habitat choice and patch leaving, considers an animal searching its environment, which contains distributed resource patches (figure 1a). When the animal enters a patch, it consumes resources within the patch. In the case of habitat choice, we assume that resources are depleted slowly enough that the depletion is negligible, whereas for the patch leaving problem resources are depleted. We represent the decision process to leave a patch via a sequential sampling model for an ideal observer's posterior of its current patch's yield rate, λ(t). This assumes that an animal learns over time the yield of the patch it is currently in and to decide if and when it should leave and search for another patch.
The initial yield rate l k 0 determines the rate at which the animal initially encounters a resource in the patch and is drawn from the distribution p(λ 0 ). We assume that the forager knows and initializes its belief with the prior p(λ 0 ) when arriving in a patch (figure 1b). This simplifying assumption allows us to obtain tractable solutions. Considering randomly timed resource encounters within a patch, we use a Poisson rate λ(t) = λ 0 − ρK(t) generating exponentially distributed waiting times between encounters (as in random search [54]) that decreases with K(t), the number of resource encounters so far, where ρ is the impact of each resource encounter on the underlying yield rate of the patch. Resource encounter history can be described by the summed sequence of encounters, each at time t j : . An ideal forager performs a Bayesian update of its belief about the current patch yield rate λ, In general, resource encounters both: (i) give evidence of higher yield rates λ, since encounters are more probable in high-yielding patches; and (ii) deplete the patch, decrementing the yield rate λ by ρ (figure 1b). Varying ρ changes the rate of patch depletion relative to the time scale of the foraging process. Small relative values of ρ/λ 0 represent a large resource patch that the forager depletes very slowly. The limiting case ρ/λ 0 → 0 represents the habitat choice problem. Alternatively, when this ratio is intermediate up to unity (ρ/λ 0 ∈ [10 −2 , 1]), the forager considerably depletes the patch with each encounter. We refer to this as the patch leaving problem, and show that, in such cases, uncertainty in the patch yield can play a major role in shaping the departure strategy. We first consider the habitat choice problem in §3; following this, we consider the patch leaving problem in §4.
3. 'Habitat choice': minimizing time to find high-resource habitats Habitat choice refers to patch use at scales where the forager's activity does not significantly affect the resource distribution or, in other words, that resource depletion occurs very slowly relative to the time needed for the search process. We represent this with the mathematically tractable yet representative limit of zero patch depletion. In this case, the optimal behaviour is to quickly locate a patch with the highest yield and remain there. Although in real environments habitats eventually deplete and the forager would leave, our theoretical treatment of a 'remain in the high-yielding patch' strategy can simply translate to a 'stay a long time in the initially high-yielding patch' strategy, with results applying similarly to both because of the separation of time scales: for habitat choice, the time needed to search and decide on a high-yield patch to remain in (which we denote as T arrive ) is much less than the time that would be needed to actually deplete the patch. Upon entering a patch, the forager must use its experience of resource encounters to decide whether to stay in the patch or leave for another. We first consider a simplified binary environment where there are only two patch types-high-yield versus low-yield-and that the forager knows these possible patch types and their return rates. Here, the optimal behaviour is to infer whether or not it is currently in a high-yield patch, and, if so, to stay, but otherwise to leave. Uncertainty and stochasticity of resource encounters means that the forager will visit some low-yielding patches until it learns the yield rate and departs, and may also visit and depart from high-yielding patches if the type is incorrectly inferred. We then consider more general cases, and show that the general trends and optimal strategies from the simpler binary case still apply; this includes environments with multiple patch types and environments with continuous distributions of patch types where the forager has a threshold for accepting a patch as sufficiently dense with resources. With this approach, we can explicitly derive statistics associated with patch departures and examine how the efficient identification of high-quality habitats depends on environmental parameters like patch discriminability (e.g. λ H /λ L ) and high-yield patch prevalence (p H ).

Two patch types
An environment with two possible patch types-high yielding and low yielding-is a mathematically tractable case that gives insight into optimal decision strategies and their resulting behavioural observables. Here, the probability distribution of patch types (likelihood that the next-visited patch is of a certain type) is p 0 (λ) = p H δ(λ − λ H ) + p L δ(λ − λ L ): H denotes the higher yielding patch and L denotes the lower yielding patch. As stated, we assume that the forager knows the values λ H and royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 λ L and uses these as prior information to infer the type of the current patch. Using the limit of slow depletion (ρ → 0) to represent habitat choice, the animal determines which patch type it is currently in using the log-likelihood ratio (LLR) y(t) ≡ log( p(λ H |x(t))/p(λ L |x(t))). With this, their belief update can be written as an SDE: with initial condition set by the prior y(0) = log(( p H )/(1 − p H )). Resource encounters provide evidence for the high-yielding patch (first term) while elapsed time between resource encounters builds up evidence for the low-yielding patch (second term). Equation (3.1) has a simple form similar to classic evidence accumulation models of decision-making psychophysics [55,56], recently extended to foraging decisions [48]. The long-term resource intake rate is maximized if the forager finds and remains in a high-yielding patch (figure 2a). If the forager remains in a high-yielding patch, then the energy intake rate will reach λ H in the limit of long time. Before locating and deciding to remain in a high-yielding patch, the forager may also visit low-yielding patches, leaving when its belief crosses the threshold (figure 2a), and may also visit and depart from high-yielding patches, if they are mistaken for low-yielding ones. The departure threshold sets the certainty that the forager obtains before leaving: a low threshold means high certainty of the patch type before leaving: while a high threshold will result in more departures. Too low a threshold can lead to too much time spent in low-yielding patches while gathering more evidence, while too high a threshold can lead to (incorrectly) departing from high-yielding patches before gathering enough evidence to distinguish their type. By setting the optimal threshold that balances uncertainty to minimize the time to arrive and remain in a high-yielding patch, we can ask how the environmental characteristics of relative patch yield, relative patch density and travel time influence behaviour and expected return of resources.
The forager's strategy is determined by the threshold θ on its belief (LLR), given by equation (3.1), shaping the time to find and remain in a high-yield patch T arrive (u). This quantity can be computed from the patch-departure statistics of the forager, using first-passage time methods, given the prior [57,58]; τ is the mean travel time between patches, which we assume is known or determined from experience. Using this, the time to arrive and remain in a high-yield patch is The minimum of this, corresponding to θ opt (figure 2b,c), can be determined numerically (solid lines in figure 2d,e). An explicit approximation of θ opt is obtained by differentiating equation (3.2), dropping higher order terms and solving for  royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 where A ¼ (l H À l L ) À l L log lH lL is defined for ease of notation (A = 0 for λ H = λ L , and A > 0 for λ H > λ L ) and W −1 (z) is the ( − 1)th branch of the Lambert W function (inverse of z = We W ). This approximation matches well with the numerically obtained minima of equation (3.2) (dotted lines in figure 2d,e), and can be further simplified using the approximation (3:4) indicating the scalings of the optimal threshold in limits of environmental parameters (dashed lines in figure 2d,e). Details on the derivation of the optimal threshold can be found in [58].
How should an animal best adapt its habitat search strategy to the statistics of the environment? When high-yield patches are rare (low p H ), travel times are large (high τ) or patches are easily discriminable (high λ H relative to λ L ), the forager should gain higher certainty by deliberating longer before departing a patch; indeed, from equation (3.4) and figure 2d,e we see that the optimal threshold decreases with ρ H , τ and λ H . Increasing discriminability (λ H /λ L ) or the high-yield patch fraction p H decreases the minimal mean time T opt arrive needed to arrive and remain in a high-yield patch (figure 2f ), since this makes finding a high-yielding patch easier for the animal.
Furthermore, the outcome T arrive (u) is most sensitive to the strategy (choice of threshold θ) in environments with low discriminability and a small fraction of desirable patches (figure 2b,c). In experiments, the value of T arrive is an observable that can be used to infer the effective value of θ that an animal is using. The parameter sensitivity suggests that an animal's patchselection strategy-i.e. the value of θ it is using-could be more precisely inferred when high-yield patches are rare or more difficult to identify. Note that the patch leaving rule of thresholding one's LLR is mathematically equivalent to thresholding the mean estimated resource yield rate since l ¼ (l H þ e Ày l L )=(1 þ e Ày ), analogous to previous patchdeparture rules developed [31,41]. Next, we generalize this approach to environments with more than two patch types, so decisions use multiple LLRs, such that optimal decisions do not simply map to thresholding the estimated yield rate.

Multiple patch types
Animals may have to select from any number of patch types in an environment, which begs the question as to how decision and search strategies should extend to more general environments. With multiple patch types, decisions made by computing only two LLRs is sufficient to obtain near optimal performance in terms of minimizing the time to find and remain in a high-return patch. This result thus complements and extends previous work that has considered optimal strategies for two patch types [33,34] or more general models that are intractable to a mathematical study of how behaviour and yield vary with environmental and strategy parameters [47,59].
As in the binary case, the optimal strategy is to find and remain in the highest yielding patch (λ 1 ). We again represent patch leaving decisions by thresholding the probability of being in the high-yielding patch, such that when p(λ 1 |x(t)) = ϕ ∈ (0, p 1 ) the forager exits the patch. We approximate this thresholding process by requiring y j ≥ θ (for j = 1, 2, …, N) to remain in the patch, so the forager departs given sufficient evidence that it is not in the highest yielding patch (see figure 3a for three patch types).
We also consider how effective reduced strategies are, for which the observer only tracks the first L LLRs y 1 , y 2 , …, y L and compares these with the threshold θ to decide when to leave the patch. Thus, we compute the mean time to arrive and remain in the high-yield patch, which depends on the escape probability π 1 (θ) from the high-yielding patch and the mean time to visit each patch T j (u, L) when escaping, where the patch-departure strategy depends on the number L of LLRs thresholded and the threshold θ used. The mean high-yield patch arrival time T arrive depends strongly on the high-yield patch resource yield rate λ 1 , which decreases considerably as the patch becomes more discriminable (three patches: figure 3b; five patches: figure 3d). On the other hand, T arrive depends weakly on the worst patch's yield rate λ 3 (figure 4a), so uncertainty among the less valuable patches has little effect on behaviour. In a related way, T arrive is much more strongly affected by changes in the fraction of the high-yielding patch (p 1 : figure 3c) than by changes in the balance of the mid-(λ 2 ) and low-yielding (λ 3 ) patches (figure 4b).
Again, the optimal threshold decreases when patches are more discriminable: as λ 1 increases the forager should gain a higher certainty before leaving (figure 3b). The average highyield patch arrival time T arrive depends weakly on the threshold near the optimum, but the optimum threshold shows a non-monotonic dependence on p 1 . The lower optimum threshold for both low-and high-yield p 1 values represents that, in these cases, it is optimal to be more certain of the future harvest rate before leaving: for low p 1 this occurs because high-yield patches are rare (and thus there is a higher premium on distinguishing the high-yielding patch when actually in one), and for high p 1 this occurs because they are plentiful (one is more likely to land in a high-yielding patch, so one can afford to require more certainty to depart). Such an increased premium placed on information gathering for the reduction of uncertainty about the future yield of a patch in sparse environments has been identified in previous analyses of foraging models governed by statistical decision theory [59,60]. Between these cases, although the optimal threshold is slightly higher, the dependence is weak. Additionally, this demonstrates that, if the forager did not know p 1 (we assumed that this is known and is used to formulate the leaving decisions), the best strategy would be to err on the side of choosing a low threshold, because the sensitivity of T arrive to threshold is relatively weak for choices too low but can be higher for choices too high (figure 3c).
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 In an environment with five patches, performance depends weakly on how many LLRs are used to make patch-leaving decisions for L > 2. It is sufficient to simply track the LLRs between the first three patches, which correspond to using L = 2 (figure 3d). This is because the forager only needs to know whether it is in one of the best patches or not, since the goal is to eventually settle in one such patch as a habitat. This demonstrates again that the key features of uncertainty that matter to the optimal forager are the discriminability and prevalence of the best and second best patch type.

Continuum limit: many patch types
Building on the N-patch case, we now consider a scenario where there is a continuous distribution of patch qualities (N → ∞), so the resource yield rate for each patch λ is drawn from a continuous distribution p 0 (λ), which serves as a prior for the posterior p(λ|x(t)) with each patch visit (figure 5a). For any continuous probability distribution function p 0 (λ), the maximum λ will never be sampled, so arriving and remaining in the 'maximum' yielding patch is not possible. We therefore assume that the forager seeks patches with yield rates λ θ or above, but deems lower yield rates to be insufficient. With this formulation, the forager updates an LLR based on a belief of whether the current patch is greater or less than λ θ . Because this divides the continuous distribution into two categories, the mathematical treatment is similar to the binary case, but with added uncertainty because the patches in each category do not have the same return.
To model this, given a reference yield rate λ θ , we represent decisions in an environment with a continuous distribution of patch qualities by tracking P(l . l u jx(t)) ¼ Ð 1 lu p(ljx(t)) dl. For the case of an exponential prior p 0 (λ) = αe −αλ , given K(t) resource encounters, we define ρ(t) = log (P(λ > λ θ |x(t))/ P(λ < λ θ |x(t))) and state that the forager departs the patch when r(t) û or when P(l . l u jx(t)) u : ¼ 1=(1 þ e Àû ). Note that, to allow evidence accumulation, we require that u , a Ð 1 lu e Àal dl ¼ e Àalu ; f, which represents the fraction of patches where λ ≥ λ θ .
Computing the probability of escaping a high-yield patch, p H (u; l u ) ¼ Ð 1 lu p 0 (l)p(u; l) dl, and the mean time per visit to high-and low-yield patch types, T(u; l)dl, when departing (using Monte Carlo sampling), we then use equation (3.2) to compute the time to arrive in a high-yielding patch (figure 5b). Placing a higher threshold λ θ on the quality of an acceptably high-yielding patch increases the time to arrive in an acceptably high-yield patch. Moreover, the optimal threshold θ decreases, as more time must be spent in patches to discriminate a high-yielding patch, which become rarer as λ θ increases. Increasing λ θ corresponds to making sufficiently high-yield patches more discriminable and more rare.
With this formulation, the mathematical treatment in the case of a continuous distribution of patches is then the same as the binary case, and we can map corresponding (a) Beliefs about three possible habitat types y 1 (t) = log ( p(λ 1 |x(t))/p(λ 2 |x(t))) and y 2 (t) = log ( p(λ 1 |x(t))/p(λ 3 |x(t))) increase with resource encounters and decrease between resource encounters until either reaches the departure threshold θ. (b) The mean time T arrive to arrive and remain in the highest yielding patch λ 1 (equation (3.6)) (with N = L = 3) decreases with patch discriminability λ 1 , as does the optimal departure threshold θ opt (circles). Three patch types with λ 2 = 2, λ 3 = 1, τ = 5, and p 1 = p 2 = p 3 = 1/3. (c) T arrive decreases with the prevalence of the best patch p 1 (while p 2 = p 3 = (1 − p 1 )/2), but θ opt varies non-monotonically. (d) The mean time T arrive to arrive and remain in the highest yielding patch λ 1 (see equation (3.6)) depends on how many LLRs (L) the forager uses to make a decision. Although the optimal time T opt arrive (curve minimacircles) decreases with L, the dependence is weak; L = 2 yields nearly identical mean optimal arrival times as L = 4. The optimal threshold which leads to T royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 results: setting a higher l u is equivalent to decreasing p H and concurrently increasing λ H . Although we considered an exponential distribution for p 0 (λ), we note that if this distribution changes, then this will affect the relationship between l u and the equivalent mapping onto the binary case (in terms of p H and λ H ). Another possibility would be to further 'bin' the continuous distribution to correspond to three effective types, instead of two, as we did using a single threshold. The continuous case then would be treated analogously to a three-patch-type environment, and could be represented with two LLRs. However, further binning may not be necessary to achieve near-optimal decisions. Overall, this demonstrates that effective strategies for foraging environments with a continuum of patch types could be generated using particle filters that compute likelihoods over a finite set of patch types [61].

Summary of results: habitat choice problem
In general, we see that strategies that divide the environment into two patch types work well in efficiently finding the best or near-best patches, even in the presence of many patch types. The optimal time to arrive and remain in the highest yielding patch decreases as the high-yield patch discriminability increases and as high-yield patches become more common. Considering more than two patch types, the associated foraging strategies are most strongly coupled to environmental parameters of the highest and second highest yielding patch types. It is not necessary to compute LLRs associated with all possible types in order to efficiently find a high-yield patch-even considering only a single LLR gives reasonable results, and the average time to arrive in a high-yield patch is not strongly affected when the number of LLRs continues to increase beyond two. This suggests that animals select habitats by estimating a possible range of high-quality patches and then making patch-departure decisions based on whether patches meet those criteria or not.

Patch leaving: depletion-versus uncertaintydriven decisions
When the scale of a patch is smaller, the forager will significantly deplete the patch's resources during its visit. The decision is then not of which patch to remain in, but rather of when to leave the current patch in search of another. We therefore refer to this as patch exploitation (figure 1a; [12,62]). While the nature of the resource differs for different animals (and typical patch residence times can accordingly vary from seconds up to hours-for some examples, see [30,36,45,63,64]). These cases all have in common that each resource patch is small enough that availability within the patch is affected by the consumption of the forager. The MVT sets the optimal time to leave a patch in order to maximize resource consumption over time: when the current patch yield rate equals the overall average yield rate for the environment [19]. However, the MVT is simply an optimal rule, and does not specify the mechanistic process of how an animal uses its experience to reach a decision to leave a  There is an optimal θ that minimizes the time to arrive and remain in a high-yielding patch (λ > λ θ ), and the optimal time T opt arrive (circles) increases as the acceptable yield rate is increased. α = 1 is used here. 10 6 Monte Carlo simulations for each λ θ and θ.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 patch. Previous work has demonstrated that rewards in discrete chunks-instead of as a continuous rate-can affect the process an animal uses in decision-making [21,24,48]. From a Bayesian perspective, decisions should use available information about the resource distribution in the environment. If resource availability within a patch is discrete or uncertain, even an ideal observer may not be able to accurately infer the actual rate of return, and thus would not be able to implement the leaving rule prescribed by the MVT. Experiments show that, while the general trends predicted by the MVT hold in many cases, animals often deviate from an MVT-predicted strategy [50]. Moreover, in cases where patches contain very few items (e.g. 0 or 1 resource chunks), reward is not described by a rate function, and the MVT leaving rule does not apply.
Here we consider an animal that encounters resources in discrete chunks and infers the state of the environment and subsequently acts. This allows us to ask when the MVT rule is actually optimal versus when it does not apply, when deviations from the MVT occur owing to uncertainty and how a forager can incorporate prior knowledge about the resource distribution in the environment to reach a patch leaving decision. We first treat the simple case of homogeneous patch types to establish the basic theoretical approach. Then, we consider an environment with two patch types to show how the inference procedure affects decisions in different environmental configurations, which we refer to as the 'depletion-dominated' versus 'uncertainty-dominated' regimes.

Homogeneous environments
To show how discreteness of resources affects decisions [21,24], we first consider the simple case of a homogeneous environment with a single patch type. An ideal forager with prior knowledge of the initial yield rate λ 0 can track time and resource encounters to determine the current yield rate λ(t), and then depart the patch when the inferred value of λ(t) falls below some threshold l u . Prior knowledge of the initial patch yield can be used in order to infer l(t) ¼ l 0 À K(t)r, ( 4 :1) which represents the true underlying value of λ(t). Using this in a patch leaving decision strategy is equivalent to departing after a fixed number of resource encounters [48].
Using this inference strategy, we calculate the long-term resource intake rate by assuming that λ θ is an integer multiple of ρ. With this, the number of chunks consumed before departure is m θ ≡ K(T(λ θ )) : = (λ 0 − λ θ )/ρ. Linearity of expectations allows us to compute the mean departure time as the sum of mean exponential waiting times between resource encounters T lu ¼ [H m0 À H m0Àmu ]=r, where H n is the nth harmonic number. Thus, we can approximate the long-term resource intake rate given λ θ as which is valid for m 0 ≫ 1. There is an interior optimum m θ that maximizes the long-term resource consumption rate, which we can estimate by computing the approximate critical point equation of equation (4.2) m 0 r À l u l u ¼ log rm 0 l u þ rt: For large m 0 (many chunks per patch), l u ¼ R lu , i.e. the leaving threshold is equal to the overall average rate of return in the environment; this is the optimum prescribed in the MVT [19]. Using the exact formula in equation (4.2), we can numerically determine the optimal threshold for small m 0 (a few chunks per patch). This shows that, when there are only a few chunks per patch, the true optimal threshold is close, but not exactly equal, to R lu (figure 6a). Note that the current estimate of the encounter rate is monotonically related to the future expected harvest rate, so a patch-departure rule based on either can easily be mapped to the other.

Binary environments
In binary environments, the forager estimates the underlying yield rate of the current patch,l(t), and departs when this falls below a threshold λ θ . Constant threshold strategies are our focus owing to their relative simplicity, but we note that alternatively dynamic programming could be used to determine optima of a more general class of departure strategies [65]. We assume that the forager uses knowledge that two different patch types exist to estimate the yield rate of the current patch; this involves using prior information to discriminate the patch type (high or low), combined with resource encounters which decrement the estimated where y(0) = log ( p H /(1 − p H )) as in the case of habitat choice. We focus on two different scenarios, which we refer to as: (i) depletion-dominated regime, where the initial yield rate of the patch is known, and therefore leaving decisions are based solely on depletion, and (ii) uncertainty-dominated regime, where the type of patch is not known upon entry, and optimal leaving decisions must consider uncertainty in the estimate of the current yield rate of the patch. For the uncertainty-dominated regime, we consider first the cases where low-return patches contain zero resources, and then generalize to different amounts of resources per patch type.
Depletion-dominated regime. To represent what we term the depletion-dominated regime, we assume that the forager arrives in a patch and immediately knows the patch type λ j ( j ∈ {H, L}) in which it resides (e.g. owing to information provided by conspecifics or visual cues). In this case, the forager can make an accurate estimate of the true underlying yield rate of the patch, and therefore there is no uncertainty. Thus, leaving decisions are driven by depletion of the patch, as determined by when the estimated yield rate falls below some level l j u . Following our calculations from the homogeneous case, the long-term resource intake rate depends on the initial resource chunk count in patches of type j, m j 0 , and the departure thresholds, so in the large m j 0 limit whereT j (l j u ) ¼ log(rm j 0 ) À log l j u and the critical point equations for each partial derivative @ l j This can be rewritten as The aforementioned equation shows that, like the homogeneous case, an optimal strategy when there are many chunks per patch is to depart as the inferred yield rate equals the mean rate of resource encounters for the environment; additionally, the optimal threshold only depends on the average yield rate for the environment, and not the individual patch types (MVT; figure 6b). When there are a few chunks per patch, the optimal threshold may slightly differ from this value (see results for the homogeneous case in figure 6a). The depletion-dominated regime is similar to a homogeneous environment: since the forager knows the initial yield rate of the patch it is currently in, it can accurately infer the true underlying yield rate, and depart based on depletion of the patch. As in the homogeneous case, the optimal decision strategy can be formulated equivalently as either leaving when the estimated rate of return falls below a threshold or as counting-leaving after consuming a certain amount of resources.
Uncertainty-dominated regime-empty low-yield patch. In the 'uncertainty-dominated' regime, the forager does not know the initial yield rate of the patch upon entry. However, we assume that it has prior knowledge of the types of patches in the environment, i.e. that it knows the values of λ H and λ L . We first consider the tractable scenario where the low-yielding patch is empty (λ L = 0). Such situations occur if certain regions of the environment appear to have food (e.g. fruiting vegetation) but on closer inspection turn out to be empty (e.g. already foraged or rotten). The optimal strategy is for the observer to first wait a finite time T θ to depart if no resources are encountered, but if resources are encountered before t = T θ to consume those resources until the inferred yield rate drops to λ θ = (m 0 − m θ )ρ. Early/late decisions are thus driven by uncertainty/depletion. Assuming m 0 ≫ 1, we can continuously approximate the long-term resource intake rate and find that it is maximized using a waiting time T θ that is insensitive to p H . However, the threshold λ θ depends on both p H and travel time, because these parameters affect the overall average rate of resources available in the environment (figure 6c). The optimal threshold l opt u is not specified by the MVT, since uncertainty drives the forager to spend non-zero time in empty patches, adding extraneous time to the foraging process. . Depleting environments with 'known' versus 'unknown' patch resource values. In the 'known' case, the forager knows the patch type and initial resource yield rate upon arrival, whereas in the 'unknown' case the forager must infer the path type. Patch-leaving strategies in the unknown case approach those of the known case in the high-chunk count limit. (a) Total resource intake rate R l u as a function of the estimated yield rate λ θ at which the observer departs a patch. For low initial resource levels m H 0 and m L 0 , R l u near the optimum (dots) deviates between the case of initially known patch type (black) versus initially unknown (blue). Both thresholds are less than the MVT prediction owing to the discreteness of resource encounters. (b) Resource intake rates of the known/unknown cases converge at higher initial resource levels. (c) Deviations in optimal resource intake rate between known and unknown cases are due to less certainty P(l j ) (grey line) about the patch yield rate upon departure, leading the observer to stay in low-yield patches too long (blue line) and leave high-yield patches too soon (red line). Other parameters are ρ = 1, τ = 5 and p H = 0.5. Mean rates and departure times in the unknown patch case were computed from 10 6 Monte Carlo simulations.
royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 Many resource chunks. Next, we generalize to examine binary environments in which m H 0 . m L 0 are arbitrary integers. In this case, the belief y(t) = log (P(λ H − K(t)ρ|x(t))/ P(λ L − K(t)ρ|x(t))) evolves according to equation (4.4). The forager estimates the current yield rate of the patch from this belief,l and an optimal strategy is to depart whenl(t) l u . The threshold λ θ should be tuned to l opt u so the long-term resource intake rate is maximized. We can compute departure times T H and T L numerically via Monte Carlo sampling. For an environment where the overall availability of resources is low and there are few resource chunks per patch, the optimal strategy when the patch type is known is to fully deplete each patch before leaving-this is represented by a threshold of l u ¼ 0 for the inferred return rate. However, this is only optimal when the patch type is known; in the case of an unknown patch type, the forager has uncertainty in whether or not there are remaining resources in the patch, and this causes the optimum threshold to be non-zero. In both cases, the discreteness of resource encounters causes the optimal threshold to be lower than predicted by the MVT, although within this range of threshold values the average resource intake actually received is similar (figure 7a). In the case of high resource availability and many chunks per patch, the optimal thresholds are similar whether or not the forager knows the patch type upon arrival (figure 7b; compare black/blue curves) and coincide with the optimal threshold predicted by the MVT.
Comparing cases, we see that foragers in sparser environments (lower average initial resource amount m 0 ¼ ( m H 0 þ m L 0 )=2) stay in low-yield patches too long and leave high-yield patches too soon in comparison with observers that immediately know their patch type, owing to their uncertainty about their current patch before and at the time of departure (figure 7c). Uncertainty thus drives animals to underexploit (overexploit) high (low)-yielding patches when high-and low-yield patches are different enough that it is optimal to spend significantly more time in high-versus low-yield patches, but similar enough as to not be immediately distinguishable. Optimal leaving decisions in the uncertainty-dominated regime must use a rate-estimation process, because of the associated uncertainty in the true yield rate of the current patch.

Discussion
Patch foraging is a rich and flexible behaviour where an animal enters a patch of resources, harvests them and then leaves to search for another patch. An animal's behaviour can be quantified by its patch residence time distribution, travel time distribution, the amount of resources consumed and the movement pattern between patches. In this work, we used principles of probabilistic inference to establish a normative theory of patch-leaving decisions. With this general framework, we showed how foraging at different temporal and spatial scales is connected by a similar decision problem: 'habitat choice' refers to larger scales when foragers do not significantly deplete a resource, and 'patch exploitation' refers to smaller scales when the forager's activity depletes the patch. For habitat choice the optimal behaviour is to quickly locate and remain in a high-yielding habitat, while for patch exploitation it is optimal to use prior information along with reward encounters to estimate the current underlying yield rate to determine when to leave the patch (figure 8 and table 1).
In ecological contexts, these activities are part of a behavioural hierarchy, where an animal must decide where to forage and how long to exploit a certain resource.
In the case of habitat choice, the forager should use its experience of reward encounters to determine whether to stay or leave; in our model an optimal forager departs a habitat when its LLR for the probability of high-yield versus other habitats falls below a threshold. Optimal decisions are based on inference of habitat quality, with uncertainty being the driving factor in habitat departure times; while this is related to resource intake rates, it is not the same, because of how prior information can be used in patch inference. We showed that, with multiple different patch types, it is not necessary to track LLRs for all patch types-behaviour is most strongly affected by inference related to the highest and second-highest yield habitat types. The optimal time to arrive and remain in a high-yielding habitat is lower when patches are more discriminable, or when high-yield patches are more common. While T arrive is an experimentally observable quantity, an animal's internal decision threshold is not; our model connects these quantities, and thus can be used to infer the decision rules an animal is using ( figure 2). For example, a similar approach has been very informative to  See table 1 for details. In different environments with three patch types (low: red, medium: yellow, high: green yielding), the different time series of decision variables (for a single patch decision) and patch visit time intervals. (a) In habitat choice, an animal must determine whether its current patch is of the highest yielding type, departing if the probability that it is not in the highest reaches some threshold, and undergoing a sequence of patch visits until finding and remaining in a high-yielding patch. (b) An ideal forager performing patch exploitation infers the yield rate of its current patch and departs when the resource yield rate reaches a threshold, continuing patch visits indefinitely in large environments.
infer the parameters underlying two-choice decision tasks [66]. We showed that it is optimal to have a lower threshold-and thus gain a higher certainty before leaving-when travel times are large, high-yield patches are rare or high-yield patches are easier to discriminate. Analogous results have been found in observations of animals seeking habitats, where dispersal costs affect whether or not animals search for a new habitat [67]. These results give quantitative predictions that can be used to interpret experiments, for example to examine whether the animal is minimizing the time to reach the highest yield habitat in a heterogeneous environment. Moreover, we showed how behaviour in the general case where many habitat types exist can be understood by mapping results onto the tractable case of only two different patch types (figures 3 and 5). By varying systematically the percentage of high-yielding habitats and the discriminability (ratio of high-to low-yield rate), the model predicts how this affects the minimal time to arrive at the highest yield habitat, and connects this to a process that could implement such computations. For patch exploitation, when the forager depletes patches in its habitat, in most cases the long-term intake rate is maximized by departing a patch when the in-patch estimated resource yield Table 1. Detailed taxonomy of departure decision strategies. Departure strategies and observable trends depend on the environment and task: habitat selection or patch exploitation (see also figure 8). Columns describe the important aspects of the optimal decision strategy for each case, along with key model results. choose optimal threshold λ θ that maximizes long-term resource intake rate optimal return differs from known case given few resources per patch, converges to known patch case when resource density is high forager stays in low-yield patches too long, leaves high-yield patches too soon when there are few resources per patch (uncertainty-dominated regime) equation (4.7) figure 7 royalsocietypublishing.org/journal/rsif J. R. Soc. Interface 18: 20210337 rate matches the average return rate of the environment (i.e. by implementing the MVT rule). However, this does not apply when resources within a patch are limited so there is more uncertainty about the yield of the current patch upon departure ( figure 7). Often in Nature, the environment is volatile and animals make foraging decisions while uncertain about local resource availability [68]. Our model predicts that, if there is high uncertainty about the patch type, this causes even an ideal Bayesian forager to stay longer in low-yield patches and shorter in high-yield patches than predicted by the MVT. Our theoretical treatment of patch-leaving decisions builds on previous Bayesian models of foraging [31][32][33][34][35][36][37][38][39]41]. Our approach goes further than previous work by providing a step-by-step derivation of the normative strategies associated with a continuum of different environmental conditions, systematically identifying the dependence of observable behaviours (e.g. patch-departure times) on environmental parameters. During habitat choice, the minimal arrival time to the high-yield habitat scales with the probability of high-yield patches in the environment. On the other hand, we have shown that, in the case of depleting patches, the amount of time a forager overstays or understays in a patch scales with the density of resources in the patch. We are also able to infer the optimal threshold an animal should use. While this cannot be measured directly in experiments, our observations do reveal environmental parameter regimes under which performance (e.g. foraging yield) is sensitive to changes in strategy. This not only informs the design of behavioural foraging experiments, so as to determine task parameters that best reveal an animal's strategy, but optimal yields can also be compared with those obtained by animals in the wild to see how finely tuned their foraging strategies are.
Analysis of experiments shows that animals forage in ways that suggest they use Bayesian reasoning [40][41][42][43][44][45][46], using prior knowledge of their environment to modulate foraging behaviour [64,69,70]. For example, bumblebees [30] and Inca doves [71] adjust their foraging strategies in response to the predictability of the environment, as a Bayesian forager would, but this is not a universal trend [72]. Patch-leaving decisions may deviate from Bayes' optimality as animals become risk-averse in variable environments [73]. Other Bayesian models have considered patch-foraging decisions, even in rich multiple patch environments [34,47], but this work does not necessarily systematically vary environmental and strategy parameters to explore how the sensitivity of performance changes. Our work is also sufficiently mathematically tractable to suggest a mechanistic implementation that the forager can use to implement optimal decision rules. Moreover, our theoretical approach applies not only to patch exploitation, but also to habitat choice-where the MVT does not apply-and thus enables connections across these multiple scales of behaviour [13].
Although we used a constant threshold value based on either the belief or estimated yield rate, other work has examined cases where optimal decision strategies involve timedependent decision thresholds [65,[74][75][76]. Typically, these results arise in the context of multi-trial experiments in which the quality of evidence on each trial varies stochastically and is initially unknown. In the habitat-choice context, the quality of evidence is fixed across habitat visits, fitting the assumptions of classic, constant threshold optimal policies. We would therefore expect an analysis allowing for a dynamic threshold to yield the same results as we obtained here. On the other hand, when the animal performed patch exploitation in uncertain binary environments, we projected a higher dimensional description of the patch value to a single scalar estimate of the patch yield rate. In this case, a constantthreshold implementation may not be purely optimal. Leveraging methods from dynamic programming commonly used to set optimal decision policies [75] would be a fruitful next step in ensuring the optimality of our patch-leaving decision strategies. A common theme in previous Bayesian models that use dynamic programming [41,47] and our approach is that the forager should use the expected future return, not the current return, to make departure decisions. However, note that current and future return should be tightly monotonically related, especially in the limit of short times into the future. An advantage to the constant-threshold treatment is that it enables simple explicit quantitative relations that can be used to interpret experimental data (figure 2). For example, experiments evaluating how animals value the quality of evidence could help us delineate whether the animal is using a constant threshold or not.
Effective search is integral to survival in Nature [77], and search behaviour can give information on individual decision strategies. One can consider search within and between patches. Although we assumed random timings of resource encounters, an extension of our model could take into account different spatial arrangements. Within patches, an animal may perform random or systematic searches. For example, a recent study found that rats can solve the stochastic travelling salesman problem using a nearest neighbour algorithm [78]. A similar approach could ask how an animal's search and navigation pattern interacts with different patch-leaving decisions to create an effective foraging strategy that also considers memory of specific patch locations. Indeed, the explicit consideration of spatial movement may be necessary to understand foraging decisions. Previous work found that, when rats must physically move to perform foraging, the observed behaviour differed from tasks that 'simulate' foraging by presenting sequential choices or that consider a visual search [79]. It is an open question as to how the animal integrates aspects of spatial movement with economic valuations of future reward.
Our model assumes that animals know the initial yield rate of each type of patch in the environment. In a real-life context or in an experimental set-up, the animal would learn the environmental parameters, which we could model by considering another level in the inference hierarchy whereby the patch-type quality and fraction are learned along with the transit time distribution. Although we considered a single forager acting alone, another important extension will be to consider interactions between animals, either through predator-prey interactions which affect foraging decisions [80], social foraging of groups [52,81] or even competitive foraging [17,18,82]. In our model, the forager only receives direct (non-social) information about resource availability; in collective foraging, an individual receives both social and non-social information [83,84]. This can significantly affect foraging decisions, for example in the case where an individual must balance resource-seeking with group cohesion. Building on our modelling approach, foragers could share social information either by cooperating in the inference of the patch quality or by signalling to each other when to depart a patch as a threshold is reached.
To conclude, our model establishes a formal framework for the quantitative analysis of a natural behaviourpatch foraging (involving both habitat choice and patch exploitation)-that can be studied with the same formal rigor as many trained behavioural tasks. Such validated behavioural algorithms are crucial for the systematic design of future experiments and interpretation of data on animal behaviour [85]. By comparing with theoretical optimal strategies, experiments and data can be used to understand the decision strategies an animal is employing and relate these to recorded animal movement and neural data. Future work will build on this model framework to generate testable hypotheses on the role of social interactions and the neural mechanistic underpinnings of foraging behaviour.