A stochastic game model of searching predators and hiding prey

When the spatial density of both prey and predators is very low, the problem they face may be modelled as a two-person game (called a ‘search game’) between one member of each type. Following recent models of search and pursuit, we assume the prey has a fixed number of heterogeneous ‘hiding’ places (for example, ice holes for a seal to breathe) and that the predator (say, a polar bear) has the time or energy to search a fixed number of these. If he searches the actual hiding location and also successfully pursues the prey there, he wins the game. If he fails to find the prey, he loses. In this paper, we modify the outcome in the case that he finds but does not catch the prey. The prey is now vulnerable to capture while relocating, with a risk that depends on the intervening terrain. This generalizes the original games to a stochastic game framework, a first for search and pursuit games. We outline a general solution and also compute particular solutions. The modified model has implications for the question of when to stay in or leave the lair and by what routes. In particular, we find the counterintuitive result that in some cases adding risk of predation during prey relocation may result in more relocation. We also model the process by which the players can learn about the properties of the different hiding locations and find that having to learn the capture probabilities is favourable to the prey.


Introduction
Foraging theory is generally concerned with groups of predators and prey, and considerations of spatial density are important. However, when both predator and prey densities are very small, it may be a good approximation to assume that the local environment contains one predator and one prey (or none, in which case the predator is doomed anyway). In this case a two-person zero-sum win-lose game model may be useful, where the predator wins the local game if it finds and successfully pursues the prey and otherwise the prey wins. Such a model, with both search and pursuit considered, was introduced in [1]. In this and the following models, the prey could hide among a fixed number $n$ of 'locations' (hiding places), and the predator had enough time or energy to look into only $k$ of them in any period. The locations $i$ are heterogeneous in the probability $p_i$ that the predator successfully pursues a prey found at location $i$. This model was extended to multiple periods in [2] to cover the case that the prey is found but not caught, in which case it can relocate to any hiding place in the next period. The relocation process was assumed to be riskless for the prey. In this paper that unrealistic assumption is relaxed: the prey is assumed to be captured by the predator when relocating from location $i$ to location $j$ with known probability $\alpha_{i,j}$, representing the danger of such a relocation in terms of the terrain that needs to be crossed. This more realistic treatment of relocation risk was suggested by an anonymous referee of that paper. This paper also introduces a model where the capture probabilities $p_i$ are not known initially by the players, but are learned over time. Thus, while we model the hide-seek part of the game precisely, the pursuit part is simplified by the adoption of known values of the $p_i$. That part of the game might also be modeled, as in [3]. To make the relationship between the published and new models clearer, Table 1 compares their various properties.
A final caution is that our notion of prey animals 'hiding' should not be taken too literally or restrictively. In fact the prey are usually carrying out some other activity, like seals choosing an ice hole for breathing [4], which they wish to do repeatedly in an unpredictable manner to avoid the predatory polar bear. It could be choosing a water hole, as in [12]. We use the metaphor of hiding to place this problem in the hide-seek literature, which we have already extended to hide-seek-pursuit. From the point of view of the predator, the location of the prey is 'hidden', that is, not known in advance of the search procedure.

Qualitative summary of main results
The main results of this paper are mathematical theorems distributed throughout Sections 5 to 8. Most of these are quantitative in nature; for example, we give precise optimal probabilities for the prey to hide at each location, possibly based on its prior location. However, we believe it is useful to give rough qualitative versions of some of these results here. For the precise results on which these summaries are based, refer to the specific results quoted.
1. In a two-location model analyzed in Section 7, there is a risk of inter-period capture only if the prey moves to the other location (relocates) rather than remaining at its original location. We find the counterintuitive result that increasing the risk of relocation (higher capture probability during the move) may also increase the frequency of relocation by the prey under its optimal hiding strategy.
2. For certain data on the $p_i$, if the prey hides optimally with respect to the predator's strategy, rather than simply hiding randomly, it can reduce the probability of eventual capture from about 0.46 to about 0.29. This is a reduction of about 37%. See equation (31).
3. When there is learning, higher variability of locations with respect to their capture probabilities favors the predator if these probabilities are high, but favors the prey if these probabilities are low. (Proposition 9)
4. When the patch is disrupted by some event (hurricane, drought) which may change the pursuit characteristics of the different locations, the fact that their capture probabilities must be learned again is favorable to the prey. (Proposition 8)

The Search Game Literature
The field of search games is an area of two-person zero-sum games in which the Hider and the Searcher are in a known search region and choose their motions: the Hider (mobile or immobile) wishes to avoid or delay capture. In the search games most relevant to our model, the Hider chooses to locate at one of a finite number of locations (called cells, boxes, etc.) and then the Searcher looks sequentially into these boxes to try to find the Hider. These boxes may be heterogeneous in the overlook probability (that the Searcher looks into the correct location but does not see the Hider) and in the cost of searching. The literature on this aspect of our model when the Searcher has a limited amount of time to find the Hider has been discussed in [1]. A related paper is the study of [5], who find a search strategy independent of the limited time horizon. The repetition of search in repeated periods is modeled in [6] and [7], where during the search the prey (Hider) may attempt to flee the search region. The prey will succeed in this attempt if the predator is in a cruise search mode, but not if he is in an ambush mode. In those models, a successful flight by the prey is definitely followed by a renewed attempt by the predator to find it. Search games with a network structure (related to transitions in our model) are studied in [8] and [9]. To extend our work to multiple hidden prey, the abstract model of [10] would be useful. The problem of where to hide food (in discrete packages such as nuts) rather than where to hide oneself has been analyzed in [11] as a search game played between a scatter hoarder, such as a squirrel, and a pilferer. The squirrel has limited digging energy and has to decide between placing nuts deeply hidden in one place or, alternatively, widely scattered at shallower depths. This problem is somewhat analogous to the problem of a prey hiding in a good location or randomly choosing among less good locations. Of course the payoffs are of a different kind, as the prey either gets caught or not, while the squirrel either has enough nuts left to survive the winter or not. Also, there is no pursuit phase in the squirrel's problem.
The work of [12] and [13] included ambush modes for the searching predator. A 'silent predator' (whose approach is not observable by the prey) was considered in [14]. More biologically realistic models were considered by [15] and [16]. The wider subject of search games is treated in the monograph [17].

Behavioral Ecology Literature
The study of predator optimal foraging for stationary prey has a long history, dating back to the 1960s ([18], [19]). Simple situations can be formalized using graphical methods, as for the patch leaving rule, while complex situations, such as foraging in a stochastic environment, require a more elaborate formalism such as stochastic dynamic programming [20]. These studies show that predators follow optimal solutions, but also use simpler rules of thumb. The study of optimal escape of prey is more recent [21]. Indeed, the advent of new tracking devices, from accelerometers to UAVs, has only recently enabled the collection of massive precision data about predator and prey movements (see for example [22], [23]), and the recording of the paths of both antagonists is more recent still, see [24]. The field is thus currently experiencing an explosion in terms of observation and experiments, while the modeling formalism is lagging behind. Here again, several key aspects have been formalized using graphical arguments. The number of models addressing more realistic situations is however much lower. In all these approaches, the focus is on a single agent, the predator or the prey, acting in a possibly changing environment. This is the heart of optimal foraging theory.
Cases in which antagonists have no or incomplete information during the interaction have rarely been studied, either phenomenologically or in terms of optimal behavior. This is surprising given their frequent occurrence in Nature. Biological examples fitting such a description include polar bears hunting for seals at breathing holes, parasitic wasps hunting host larvae hidden inside leaves, and wolves hunting elk in deep forests. While the first two examples have been described in detail earlier in this context ([1], [2]), the interaction between wolves and their prey was not, and is thus summarized here. The authors of [25] have patiently collated numerous observations of wolf packs pursuing many species of prey. Elk in particular (p. 68) seem to use features of the landscape to escape. They prefer areas where dead trees have toppled, creating an entanglement of logs that is difficult to travel through. Mountain sheep, another prey, are also unique in their agility, sure-footedness and maneuverability over rugged terrain. The hide and seek games reported in that book are great examples of search games, including the added complexity displayed by wolves, which are sometimes able to predict the escape route of their prey and to position themselves accordingly. As exemplified by these case studies, Nature seems replete with predator-prey interactions which are best viewed as search games. While repeated search games [2] represent the most realistic types of interactions modeled so far, they still lack essential ingredients of interactions between foraging predators and escaping prey. We focus here on two such missing elements: the spatial distribution of risk between and among hiding sites, and the change of the environment during the interaction.
The spatial distribution of the risk of predation among and between discrete hiding locations can be categorized into two extreme cases. In the first case, the locations are relatively safe places. Examples include retreat holes for mammals, hiding crevices for lizards, bushes for small passerine birds or feeding tubes for worms in the sea [21]. Here, the most dangerous moments are when animals are away from or out of these positions, or when they move to them. By contrast, once the retreat is reached, the probability of being caught is greatly decreased, at times to zero. In the second case, the locations represent zones of high attack probability, while moving between them is riskless. Breathing holes of seals attacked by polar bears or feeding windows of caterpillars attacked by wasps are of this type ([1], [2]). Indeed, polar bears cannot attack seals while they travel below the ice sheet, and wasps cannot attack leafminer larvae if they rest under the intact thick cuticula of a leaf. There is of course a continuum of cases spanning the two extremes. Thus, the most general model should allow capture both on site and while moving from site to site. The capture probability might furthermore depend on the predator's strategy, for example when predators choose which of the sites to visit, but might also be independent of the predator. The amount of vegetation cover and the difficulty of progressing on the terrain between two sites are two such possible influencing factors. In these cases, there still exist capture probabilities that depend on the site-to-site path. We thus conclude that a realistic model should make the distinction between risks among sites and risks between sites. In previous work, we dealt only with among-site variability in predation risk. The present work addresses both types of risk.
The search games played by foraging predators and escaping prey often unfold in conditions which change, possibly under the action of the players. These conditions, called the environment, are here understood in a liberal fashion, being either external (time of day, for example) or internal (hunger level, for example). The proper formalism for such situations is that of stochastic games [39]. Furthermore, a classical optimal foraging model would not make the movement of the prey (if any) a function of the behavior of the predator. Would such two-way interactions modify the outcome of the game? If so, in which way? These are the kinds of questions we are interested in. Our aim is to develop a stochastic game framework including simultaneous decisions of two antagonists during a hide and seek game with multiple bouts in which the motivation of the predator fluctuates. This work therefore represents the natural bridge between search games and the commonly used single-predator, multiple-stationary-prey optimal foraging theory described earlier.

Overview of previous results
The current paper can be seen as an extension of [1] and [2]; we summarize those models, calling them respectively the one stage game and the repeated game (see Table 2). A related notational convention in game theory is to use "he" and "she" to distinguish between the two players: here we will use "he" for the Searcher and "she" for the Hider, reverting to "it" when we refer to predator and prey animals.

The One Stage game
We now review in more detail the One Stage (period) game of [1]. A (stationary) Hider locates in one of $n$ locations $i \in N = \{1, 2, \dots, n\}$ while the Searcher inspects $k$ of these, where $n$ and $k$ are parameters of the game. The order of inspection is not important. If the Searcher inspects the location $i$ chosen by the Hider, the Hider is captured with a probability $p_i$ that depends on the location $i$. For convenience we assume that $p_1 \le p_2 \le \dots \le p_n$; that is, the locations are numbered in decreasing order of attractiveness to the Hider. The Searcher wins the game if he finds and then captures the Hider. The Hider wins if she is not found or if she is found but not captured. So if the Hider hides at location $i$ and the Searcher inspects a $k$-subset $S$ (subset of cardinality $k$) of $N$, then the payoff $P$ to the maximizing Searcher, the probability that the Searcher wins, is given by
$$P(i, S) = \begin{cases} p_i & \text{if } i \in S, \\ 0 & \text{if } i \notin S. \end{cases}$$
If we say that the payoff to the Hider is the probability she is not found and captured, then the game has constant sum 1 (the Hider's payoff is $1 - P$). A mixed hiding strategy is a probability vector $h = (h_1, h_2, \dots, h_n)$ where $h_i$ is the probability that the Hider hides at location $i$. A mixed strategy for the Searcher is a probability distribution over $k$-subsets of $N$. Clearly to every such mixed search strategy there corresponds a probability $r_i$ that location $i$ is inspected. Conversely, if we know all the probabilities $r_i$, we can determine a mixed search strategy that realizes them. This leads us to the following equivalent, and more useful, definition of the mixed Searcher strategy.
Definition 1 A mixed search strategy is a vector of probabilities $r = (r_1, r_2, \dots, r_n)$ where $r_i \le 1$ is the probability that the Searcher visits location $i$ during the $k$ rounds, satisfying $\sum_{i=1}^{n} r_i = k$ and $r_i \ge 0$ for all $i \in N$.
In this constant sum game, the value $v$ is the probability of capture $P$ with best play on both sides. Note that if the Searcher inspects location $i$ when the Hider is adopting the mixed strategy $h$, the Searcher wins with probability $h_i p_i$, the probability that the Hider is found multiplied by the probability she is then captured. We will often consider the mixed hiding strategy called $h^*$ which makes all these probabilities the same, that is, $h^*_i p_i = \lambda$ for all $i \in N$ and for some constant $\lambda$. We say that $h^*$ is the Hider strategy which makes all locations equally attractive for the Searcher. These equations have a unique solution given by
$$h^*_i = \frac{\lambda}{p_i}, \qquad \lambda = \frac{1}{\sum_{j=1}^{n} 1/p_j}.$$
It follows from the formula for $\lambda$ and the assumption that the $p_i$ are increasing in $i$ that $1 \le p_1/\lambda \le n$. The solution of the game is easy to see in the two extreme cases where $k = 1$ and where $k = n$. When $k$ is 1 this is a standard hide-seek game, sometimes called a diagonal game. The value of this game is $\lambda$, the Hider should adopt $h^*$ to make all locations equally attractive, and the Searcher should inspect locations with probabilities inversely proportional to their capture probabilities $p_i$ (that is, $r_i = \lambda/p_i$, which makes the Hider indifferent between locations). On the other hand, when $k = n$ and all locations are inspected, only the Hider has a strategic choice and she is captured with probability $p_i$ if she chooses location $i$, so clearly location $i = 1$ is best for her, with a value of $p_1$. The surprising finding of [1] is that for small $k$ the solution is like that for $k = 1$ and for large $k$ the solution is like that of $k = n$. The dividing value of $k$ is given by $p_1/\lambda$. This result is stated below.

Theorem 2
The solution of the one-stage game described above depends on the value of $k$ relative to $p_1/\lambda$:
1. If $k < p_1/\lambda$ then the optimal hiding strategy is $h^*$, the optimal search strategy visits each location $i$ with probability $r_i = k\lambda/p_i$, and the value is $k\lambda$.
2. If $k \ge p_1/\lambda$ then the value is $p_1$. The Hider can guarantee paying at most $p_1$ by always hiding at location 1, and the Searcher can guarantee at least $p_1$ by choosing $r_1 = 1 \le k\lambda/p_1$ and $r_i \ge \min(k\lambda/p_i, 1)$ for all $2 \le i \le n$.
This presentation of the one stage game of [1] is a good place to mention the distinction between our approach and evolutionary game theory. We note that our game is a broad generalization of the so-called matching pennies game, where each player chooses H or T and the maximizer wins if they choose the same and the minimizer wins if they choose differently. This is our game with $n = 2$ locations called H and T, with both capture probabilities equal to 1 and $k = 1$ searches. This game is mentioned in Section 4.2 of [40]. After observing that the game is not symmetric it is further observed, "Thus, matching pennies games fall outside the domain of evolutionary stability analysis." This applies equally well to our more general hide-seek-pursuit games, as well as to any asymmetric game (see [41]). Obviously we cannot expect pure strategy solutions (saddle points) in hide-seek games, as certain knowledge of the hiding place ensures the prey will be found. However, in matrix games such as the one presented here, iterative methods of solution are known. An evolutionary approach to search games would indeed be an interesting and useful contribution, but to our knowledge no attempts in this direction have been made, and we do not make such an attempt here. In our later stochastic game, the optimal strategies are indeed obtained by an iterative process (Corollary 6), though not exactly an evolutionary one.
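The case distinction in Theorem 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from [1]: the function name `solve_one_stage` is hypothetical, the capture probabilities are assumed to be sorted increasingly, the constant is taken to be the reciprocal of the sum of the $1/p_i$, and in the second case the code returns just one of the many search vectors that meet the bounds of the theorem.

```python
def solve_one_stage(p, k):
    """Return (value, h, r) for the one stage game.

    p: capture probabilities, assumed sorted so that p[0] <= p[1] <= ...
    k: number of locations the Searcher may inspect, 1 <= k <= len(p)
    """
    lam = 1.0 / sum(1.0 / pi for pi in p)   # assumed constant: 1/lam = sum of 1/p_i
    if k < p[0] / lam:
        # Case 1: equally attractive hiding, search inversely proportional to p_i.
        h = [lam / pi for pi in p]
        r = [k * lam / pi for pi in p]
        return k * lam, h, r
    # Case 2: the Hider stays at location 1 (index 0), value p_1.
    h = [1.0] + [0.0] * (len(p) - 1)
    # Start from the lower bounds min(k*lam/p_i, 1) ...
    r = [min(k * lam / pi, 1.0) for pi in p]
    # ... and spread the remaining search effort, never exceeding 1 per location.
    slack = k - sum(r)
    for i in range(len(r)):
        extra = min(1.0 - r[i], slack)
        r[i] += extra
        slack -= extra
    return p[0], h, r


value, h, r = solve_one_stage([0.1, 0.8], k=1)
print(round(value, 4), [round(x, 4) for x in h], [round(x, 4) for x in r])
```

With two locations, $p = (0.1, 0.8)$ and $k = 1$, the threshold $p_1/\lambda$ is 1.125, so the first case applies; calling the function with $k = 2$ exercises the second case.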

Repeated Games
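In [2], the one stage game was extended to a repeated game. We briefly review the model and results for the undiscounted and discounted versions of that game here. Here the payoff $P_S$ to the Searcher is the probability that the Hider is eventually captured (at some stage of the game). The value $v$ of the game is obtained by solving the equation
$$v = \frac{k}{\sum_{j=1}^{n} \dfrac{1}{p_j + (1 - p_j)\, v}}. \qquad (6)$$
The equally attractive hiding strategy is given by
$$h^*_i = \frac{v/k}{p_i + (1 - p_i)\, v}, \qquad i = 1, \dots, n.$$
Note that the "attractiveness" of location $i$ in a repeated game is given by $p_i + (1 - p_i)\, v$: a Hider found at $i$ is either captured there, with probability $p_i$, or escapes and faces the same game again, in which she is eventually captured with probability $v$. The hiding strategy $h^*$ is optimal for all $k$, whereas in the one stage game it was optimal only if $k$ was below a threshold.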

The Discounted Repeated Game
The repeated game can also be studied under the assumption that the payoff is discounted by a discount factor $\beta$, $0 \le \beta \le 1$, in each stage. If $\beta = 0$ we have the one stage game and if $\beta = 1$ we have the undiscounted repeated game. In [2] we have shown that the value $v$ of the discounted game is given as the unique solution of the equation and that in this case the strategy $h^*$ is optimal for the Hider. Otherwise, the 'stay at 1' solution $h_1 = 1$ is optimal for the Hider, with the value We have also proved the following theorem:
Theorem 3 Consider the repeated discounted game with $k$ looks and a discount factor $\beta$. Consider the equation (11).
If there is a solution $\beta_k$ to equation (11), then it is unique, and if $\beta < \beta_k$ then the 'stay at 1' strategy $h_1 = 1$ is the optimal strategy for the Hider.
If $\beta > \beta_k$, then the 'equally attractive' strategy $h^*$ is the only solution to the game.
If equation (11) has no solution in $[0, 1]$, then the 'equally attractive' strategy $h^*$ is the only optimal Hider strategy.

The Stochastic Game $\Gamma_k$
We now present our new model. In this section we modify our repeated game model so that after a prey escapes capture at location $i$, she may still be captured in the course of moving to her chosen next location $j$ (possibly the same as $i$ if she chooses not to move between periods). We assign a fixed probability $\alpha_{i,j}$, which depends on the two locations, to this capture probability. The probability $\alpha_{i,j}$ is a reflection of the properties of the terrain between locations $i$ and $j$. For example, $\alpha_{i,j}$ might be high if the terrain in between is very open and has high visibility to the predator. In practice, this probability might depend on choices of the predator (such as where to position himself between periods), but for simplicity we assume here that it is independent of any such choices. Note that if all the transition capture probabilities $\alpha_{i,j}$ are taken to be 0, then the new stochastic game model, which we will denote by $\Gamma_k$, reduces to the previous repeated game model $G_k$ of [2].
To formally define the stochastic game $\Gamma_k = \Gamma_k(n, p, \alpha_{i,j})$, we must make two changes to the notation of the repeated game model. First, we need to add two artificial states, in addition to our $n$ original location states, to indicate ending situations for the game. If the Hider has not been found at the end of the $k$ searches allowed in a period, then the Hider wins and we say that the game moves to the artificial state $i = -1$. Alternatively, if the Searcher wins because he has found and captured the Hider, we say that the game moves to the artificial state $i = 0$. Clearly the $n$ location states $i \in \{1, 2, \dots, n\}$ are non-absorbing (the game continues from such a state) while the two artificial states $i = -1, 0$ are absorbing states, where one of the players has won. The location state $i$ denotes the state of the game when the Hider has been found at location $i$ but has escaped the pursuing Searcher.
Our previous models were constant-sum, rather than zero-sum, because the payoffs to the players were the probabilities that they would win the game. These probabilities sum to 1 rather than to 0. The theory of stochastic games we use here applies to zero-sum games, so we need to make a simple affine transformation of the payoffs that takes 1 to 1 and 0 to $-1$. (This transformation is $x \mapsto 2x - 1$.) In the new notation the winner's payoff is $+1$ and the loser's payoff is $-1$, so the game is zero-sum. To transform the probability $P_S$ that the Searcher wins (the payoff in the repeated game) into a zero-sum payoff $C$, we adopt the monotone increasing affine transformation given by
$$C = 2P_S - 1. \qquad (12)$$
Thus when the Searcher wins we have $P_S = 1$ and $C = 1$, but when the Hider wins we have $P_S = 0$ and hence $C = -1$. The same transformation applies as well to the values $v$ of the repeated and stochastic games. For example, a value of $v = 0$ now means that with best play either player is equally likely to win the game (the same as the value 1/2 in our previous models). Note that the probability of capture $P_S$ satisfies
$$P_S = \frac{1 + C}{2}. \qquad (13)$$
The dynamics of the stochastic game $\Gamma_k$ (in both the undiscounted and discounted versions) are as follows. The location state $i$ corresponds to the situation where the Hider has been found at location $i$ and has successfully escaped capture. Her pure choice is her next location, and so her mixed choice variable at $i$ is her distribution $h = h^i = (h^i_1, \dots, h^i_n)$ over where to locate in the next period. The choice variable for the Searcher at state $i$ consists of the $k$ locations to search in the next period, given that the Hider has just left location $i$. The Searcher's mixed strategy from state $i$ can be represented by the variable $r = r^i$, where $r^i_j$ denotes the probability that location $j$ is among the locations he will search. Suppose the Hider chooses location $j$. Then:
1. She is captured before reaching the new location with probability $\alpha_{i,j}$. Otherwise:
2. With probability $1 - r^i_j$ she will not be found at $j$, and the next state is the absorbing state $-1$ (Hider wins, payoff is $-1$).
3. With probability $r^i_j$ she will be found at $j$. In this case
(a) With probability $p_j$ she will be captured at $j$ and the next state is 0 (Searcher wins).
(b) With probability $1 - p_j$ she will not be captured and will have to choose a new location for hiding. The new state is $j$.

The undiscounted and discounted stochastic games
Suppose that there exist values $v_i$, $i = 1, \dots, n$, for the stochastic game when the state is a location state $i$. It is a standard matter to find an equation which relates all the $v_i$. Suppose that at state $i$ the Hider chooses to go to location $j$ and the Searcher chooses strategy $r = r^i$. Then it is easy to see that the next state is either a location state $j$ or one of the artificial states $-1$, 0, with the following probabilities and payoffs.
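| next state | probability | payoff |
|---|---|---|
| 0 (captured in transit) | $\alpha_{i,j}$ | 1 (Searcher wins) |
| 0 (found and captured at $j$) | $(1 - \alpha_{i,j})\, r_j\, p_j$ | 1 (Searcher wins) |
| $-1$ (not found at $j$) | $(1 - \alpha_{i,j})(1 - r_j)$ | $-1$ (Hider wins) |
| $j$ (found at $j$ but not captured) | $(1 - \alpha_{i,j})\, r_j\, (1 - p_j)$ | $v_j$ |

It follows that the expected payoff if the Hider goes to location $j$ and the Searcher uses the search strategy $r = r^i = (r_1, \dots, r_n)$ is given by
$$\alpha_{i,j} + (1 - \alpha_{i,j}) \left[ r_j\, p_j + r_j (1 - p_j)\, v_j - (1 - r_j) \right].$$
Consequently the expected payoff if the Hider adopts the mixed strategy $h$ is given by
$$\sum_{j=1}^{n} h_j \left\{ \alpha_{i,j} + (1 - \alpha_{i,j}) \left[ r_j\, p_j + r_j (1 - p_j)\, v_j - (1 - r_j) \right] \right\}.$$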
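The transitions described above translate directly into a one-step expected payoff. The following Python sketch is illustrative only: the function names and the nested-list layout for the transit capture probabilities (called `alpha` here) are assumptions, states are indexed from 0, and payoffs use the zero-sum convention (+1 for capture, -1 for escape).

```python
def expected_payoff(i, j, r, p, alpha, v):
    """Zero-sum payoff to the Searcher when the Hider moves from state i to j.

    r[j]: probability that location j is searched from state i
    p[j]: capture probability at j once found
    alpha[i][j]: probability of capture in transit from i to j (assumed layout)
    v[j]: continuation value of location state j
    """
    a = alpha[i][j]
    arrived = (r[j] * p[j] * 1.0               # found and captured at j
               + r[j] * (1.0 - p[j]) * v[j]    # found, escapes: play continues from j
               + (1.0 - r[j]) * (-1.0))        # not found: the Hider wins
    return a * 1.0 + (1.0 - a) * arrived


def expected_payoff_mixed(i, h, r, p, alpha, v):
    """Average of the pure payoffs over the Hider's mixed choice h."""
    return sum(h[j] * expected_payoff(i, j, r, p, alpha, v) for j in range(len(h)))


# Two locations, Searcher always looks at the second one, continuation values 0.
p = [0.1, 0.8]
alpha = [[0.0, 0.3], [0.3, 0.0]]
print(expected_payoff_mixed(0, [0.5, 0.5], [0.0, 1.0], p, alpha, [0.0, 0.0]))
```

Keeping the pure payoff separate from the mixed one mirrors the two displayed formulas: the second is just a weighted sum of the first.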

Existence of a value for $\Gamma_k$
The theory of stochastic games shows that the game $\Gamma_k$ has a value vector $v = (v_1, \dots, v_n)$, where $v_i$ is the value of the game starting at location state $i$. A stationary strategy is a strategy which chooses actions depending on the current hiding place only. The game $\Gamma_k$ is two-person zero-sum with finite state and action spaces and with a positive probability of stopping for any state and any actions by the players: if the Searcher does not visit the hiding location the game stops (escape), and if the Searcher visits the hiding location the game stops (with capture) with probability at least $p_1 > 0$. Thus, we can use the fundamental result of [26] on stochastic games, that under the above-mentioned conditions an equilibrium exists in stationary strategies. This result is valid both for the undiscounted and the discounted stochastic games, so we have the following theorem.
Theorem 5 There exist unique values $v_i$, $i = 1, \dots, n$, for the stochastic game $\Gamma_k$. This result holds both for the undiscounted and the discounted version. Moreover, there exist optimal stationary strategies for both players.

Value iteration algorithm
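This algorithm was devised by Shapley in his fundamental paper [26]. We now adapt it to the game $\Gamma_k$.
Corollary 6 Consider the following iteration scheme, where $i = 1, 2, \dots, n$: $v_i(0)$ is any initial guess. Then for $L = 1, 2, \dots$, we define iteratively
$$v_i(L) = \operatorname*{val}_{h,\, r} \; \sum_{j=1}^{n} h_j \left\{ \alpha_{i,j} + (1 - \alpha_{i,j}) \left[ r_j\, p_j + r_j (1 - p_j)\, v_j(L-1) - (1 - r_j) \right] \right\},$$
where val denotes the value of the one-period zero-sum game in which the Searcher chooses $r$ to maximize and the Hider chooses $h$ to minimize. Then $\lim_{L \to \infty} v_i(L) = v_i$. This value iteration scheme converges at a geometric rate $(1 - p_1)^L$. The algorithm works for the undiscounted stochastic game $\Gamma_k$ and converges even faster for the discounted one.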
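For a concrete feel for the iteration, the sketch below runs it for two locations and $k = 1$, where each one-period game is a 2×2 matrix game with a closed-form value. It is a hedged illustration rather than the authors' implementation: the helper names are hypothetical, the transit capture probabilities are called `alpha`, and the recursion is written for capture probabilities $w_i = (1 + v_i)/2$ instead of the zero-sum values $v_i$, which is an equivalent affine rescaling.

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; rows belong to the maximizer."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin >= minimax:            # saddle point in pure strategies
        return maximin
    # No saddle point: standard mixed-strategy value of a 2x2 game.
    return (a * d - b * c) / (a + d - b - c)


def value_iteration(p, alpha, iters=200):
    """Capture probabilities w[i] from each location state, for n = 2 and k = 1.

    p[j]: capture probability at location j once found
    alpha[i][j]: probability of capture in transit from i to j (assumed layout)
    """
    w = [0.0, 0.0]                    # arbitrary initial guess
    for _ in range(iters):
        new_w = []
        for i in range(2):
            # m[s][j]: Searcher looks at s, Hider moves from i to j.
            # Capture happens in transit, or at j if s == j and pursuit
            # succeeds now or after a later rediscovery (continuation w[j]).
            m = [[alpha[i][j]
                  + (1.0 - alpha[i][j])
                  * ((p[j] + (1.0 - p[j]) * w[j]) if s == j else 0.0)
                  for j in range(2)]
                 for s in range(2)]
            new_w.append(value_2x2(m))
        w = new_w
    return w


w = value_iteration([0.1, 0.8], [[0.0, 0.3], [0.3, 0.0]])
print([round(x, 3) for x in w], [round(2 * x - 1, 3) for x in w])
```

For larger $n$ or $k$ the one-period game is no longer 2×2 and would need a general matrix-game solver (for instance a linear program); that extension is not attempted here.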

The value at the beginning
At the beginning of the game no location has been chosen yet. The Hider chooses a location $i$, $i = 1, \dots, n$, and the Searcher chooses a set of $k$ locations. What is the probability $q_i$ of eventual capture in the game under the condition that the prey was discovered at location $i$ at the first stage? With probability $p_i$ (by definition) there is a successful pursuit and the Hider is captured, so that $P_S = 1$. With complementary probability $1 - p_i$ the pursuit is not successful, and since the game is then in state $i$ (the Hider has escaped from location $i$), the definition of $v_i$ says that the expected payoff is $C = v_i$. By our affine transformation relating payoff and capture probability, equation (13), we have in this case that $P_S = (1 + C)/2 = (1 + v_i)/2$. Thus overall we have
$$q_i = p_i + (1 - p_i)\, \frac{1 + v_i}{2}. \qquad (16)$$
Thus, the game at the beginning is equivalent to the one stage game with probability of capture $q_i$ for location $i$; that is, $q_i$ plays the role of what we called $p_i$ in the one stage game. The solution of this game is thus given by
Theorem 7 The optimal solution of the game $\Gamma_k$ can be obtained from Theorem 2 as follows: the capture probabilities $q_i$, $i = 1, \dots, n$, are given by (16) and take the place of the $p_i$ in Theorem 2. Then we transform the optimal probability of capture $P_S$ into the zero-sum payoff $C = 2P_S - 1$ (see (12)).
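The recipe of Theorem 7 can be sketched as a small Python function. The names are hypothetical, the location values $v_i$ are assumed to have been computed already (for example by value iteration), the constant of the one stage game is taken to be the reciprocal of the sum of the $1/q_i$, and the two cases of Theorem 2 are folded into a single minimum.

```python
def start_capture_probability(p, v, k):
    """Capture probability at the start of the game under optimal play.

    p: capture probabilities p_i
    v: zero-sum values v_i of the location states (assumed already computed)
    k: number of locations the Searcher may inspect per period
    """
    # q_i = p_i + (1 - p_i) * (1 + v_i) / 2, sorted increasingly as Theorem 2 expects.
    q = sorted(pi + (1.0 - pi) * (1.0 + vi) / 2.0 for pi, vi in zip(p, v))
    lam = 1.0 / sum(1.0 / qi for qi in q)
    # Theorem 2: the value is k*lam when k < q_1/lam, and q_1 otherwise.
    return min(k * lam, q[0])


def to_zero_sum(capture_probability):
    """Affine map from a capture probability to the zero-sum payoff."""
    return 2.0 * capture_probability - 1.0


# Illustrative inputs only: these v_i are placeholders, not results from the paper.
prob = start_capture_probability([0.1, 0.8], [-0.2, -0.1], k=1)
print(prob, to_zero_sum(prob))
```

Sorting the $q_i$ inside the function matters because the ordering of the $q_i$ need not coincide with that of the $p_i$ once the continuation values are taken into account.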

A Comparative Example
We now look at the effect both of optimizing (rather than simply random) prey movement and of adding risk to inter-period prey movement (allowing $\alpha_{ij} > 0$ rather than riskless $\alpha_{ij} = 0$). We do this by comparing our models of Section 4.2 (repeated games) and Section 5 (stochastic games) with the Markov Decision Process solution to the one-sided optimization of a Searcher against a random Hider, in a simple example with just two locations.
Consider a patch with two locations ($n = 2$), $p_1 = 0.1$, $p_2 = 0.8$, and $k = 1$. In case of capture the payoff is 1 for the Searcher and 0 for the Hider, and in case of ultimate escape the payoff for the Searcher (Hider) is 0 (1), so the ultimate payoff to the Searcher is the overall probability of capture. Assume that if the Hider was discovered but not captured she succeeds in reaching another patch and the process continues until capture or ultimate escape. We use the undiscounted case. We denote by $v$ the overall probability of capture in all the models of our toy example.
A Markov Decision Process (MDP) model for the Searcher is a framework in which his actions are optimal given his knowledge of the current state and of the strategy of the Hider. This state is fixed at the beginning, but at any further stage it is the location at which the Hider was discovered (but not captured) at the previous stage. We now compare the MDP to the stochastic game version of this model. At first we neglect the risk that the Hider is captured while changing locations, and then we take this inter-period capture risk into consideration.

Model without risk when changing location
We now consider the earlier model where between periods the Hider prey can move between locations without risk of capture, so that all the transition capture probabilities $\alpha_{ij}$ are 0. We first consider a prey that acts randomly and then a prey that acts so as to minimize the capture probability. In both cases (8.1.1 and 8.1.2) we assume that the searching predator acts to maximize the capture probability.

Random prey, optimizing predator (MDP model)
Assume that the Hider always hides randomly and uniformly, i.e., h = (0:5; 0:5); and that there is no risk for the Hider to change locations. That is, the Hider equiprobably stays or changes location between periods. In this case the optimal strategy of the Searcher is to always look at location 2 (or adopt strategy R = (0; 1) in our notation) at each stage.
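The ultimate probability of capture $v$ satisfies $v = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2}\,(p_2 + (1 - p_2)\,v) = 0.4 + 0.1\,v$, so that $v = 4/9 \approx 0.44$.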

Optimal play in repeated game model
Here also we assume no risk in changing locations, so this is the repeated game. The optimal strategy for both players is to hide/search at location 1 with probability about 0.73 and at location 2 with probability about 0.27. The value of the game is given by equation (6), so we have $v \approx 0.22$. Thus by hiding at the better location (location 1) with higher probability (0.73 rather than 0.50) the Hider prey reduces the probability of eventual capture from about 0.44 to about 0.22, that is, by about 50%.
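The two no-risk values quoted above can be checked numerically. The sketch below is a reconstruction under stated assumptions rather than the paper's own computation: it assumes the MDP value solves v = 0.5 (p_2 + (1 - p_2) v), and that each stage of the repeated game is a diagonal game with entries d_i = p_i + (1 - p_i) v, whose value is 1/(1/d_1 + 1/d_2). All function names are illustrative.

```python
def diag_value(d):
    """Value of a diagonal game that pays d[i] when both players pick i, else 0."""
    return 1.0 / sum(1.0 / di for di in d)


def mdp_value_no_risk(p, look=1):
    """Uniform Hider over two locations, Searcher always looks at index `look`.

    Solves the linear equation v = 0.5 * (p + (1 - p) * v) for v.
    """
    return 0.5 * p[look] / (1.0 - 0.5 * (1.0 - p[look]))


def repeated_game_value(p, iters=200):
    """Fixed-point iteration for v = diag_value(p_i + (1 - p_i) * v)."""
    v = 0.0
    for _ in range(iters):
        v = diag_value([pi + (1.0 - pi) * v for pi in p])
    d = [pi + (1.0 - pi) * v for pi in p]
    # In a diagonal game both players choose i with probability value / d[i].
    strategy = [v / di for di in d]
    return v, strategy


p = [0.1, 0.8]                      # capture probabilities of the toy example
v_mdp = mdp_value_no_risk(p)
v_game, strategy = repeated_game_value(p)
print(round(v_mdp, 3), round(v_game, 3), [round(s, 2) for s in strategy])
```

Under these assumptions the two values come out close to the 0.44 and 0.22 quoted in the text, and the mixed strategy puts roughly three quarters of its weight on location 1.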

Model with risk when Hider changes locations
We now make the main assumption of this paper, that the Hider can be captured between periods when moving from location $i$ to location $j$, with a possibly positive probability $\alpha_{ij}$. For this example we make the simple assumption that the Hider cannot be captured if she stays at the same location, $\alpha_{ii} = 0$ for $i = 1, 2$, but that any move between distinct locations has capture probability 0.3, that is, $\alpha_{ij} = 0.3$ for $i \neq j$. State $i = 1, 2$ corresponds to the event that the Hider has been discovered but not caught at location $i$, and $v_i$ is the overall probability of capture at that state.

Hider moves randomly, Searcher optimizes (MDP model)
We assume that from any state $i = 1, 2$ the Hider moves equiprobably to either location, and that the Hider starts equiprobably at either location (not equiprobably in either state). Clearly in this case the Searcher should always look at location 2, where he has a higher chance of capturing the prey if she is there. We therefore have the following equations for $v_1$ and $v_2$:
\[ v_1 = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2}\left(\alpha_{12} + (1 - \alpha_{12})\,(p_2 + (1 - p_2)\,v_2)\right), \qquad v_2 = \tfrac{1}{2}\left(p_2 + (1 - p_2)\,v_2\right) + \tfrac{1}{2}\,\alpha_{21}, \]
which give $v_2 \approx 0.61$ and $v_1 \approx 0.47$. If the Hider starts in location 1, she will not be found and so the payoff is 0. If she starts at location 2, she will be found and captured with probability $p_2 = 0.8$; with probability 0.2 she will not be captured, in which case the eventual capture probability is $v_2$. So overall the capture probability in this scenario at the beginning of the game is given by
\[ v = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2}\,(0.8 + 0.2\,v_2) \approx 0.46. \]
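A short numerical sketch of this MDP calculation follows. It assumes the reading of the model described in the paragraph above: the Searcher always looks at location 2, a Hider who stays at or starts in location 1 is never found, and a Hider who moves is first exposed to the transit risk and only then to the search. The function and variable names are illustrative.

```python
def mdp_with_risk(p2=0.8, alpha=0.3):
    """Capture probabilities when the Hider moves at random and the Searcher
    always looks at location 2 (index 2 in the text)."""

    def found_at_2(v2):
        # Found at location 2: captured now, or escapes and continues from state 2.
        return p2 + (1.0 - p2) * v2

    # State 2: stay (prob 1/2) and be found, or move (prob 1/2) and risk transit
    # capture; a Hider who reaches location 1 is not searched and wins.
    # v2 = 0.5 * found_at_2(v2) + 0.5 * alpha is linear in v2.
    v2 = (0.5 * p2 + 0.5 * alpha) / (1.0 - 0.5 * (1.0 - p2))
    # State 1: stay (safe, never found) or move to 2 through the risky terrain.
    v1 = 0.5 * (alpha + (1.0 - alpha) * found_at_2(v2))
    # Start of the game: the Hider is at either location with probability 1/2.
    v_start = 0.5 * found_at_2(v2)
    return v1, v2, v_start


v1, v2, v_start = mdp_with_risk()
print(round(v1, 3), round(v2, 3), round(v_start, 3))
```

The starting value agrees with the capture probability of about 0.46 that the text attributes to the random Hider.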

Both Hider and Searcher optimize (stochastic game model)
We now analyze the model of this paper, covering the scenario with inter-period capture risk and two optimizing players in a stochastic game. State $i = 1, 2$ corresponds to the event that the Hider has been discovered but not caught at location $i$, and $v_i$ is the overall probability of capture at that state (this is different from the notation in chapter 5). For the stochastic game we have the following equations, where the minimum is with respect to the Searcher looking at location 1 (left) or location 2 (right): where $x$ and $y$ are the probabilities that the Hider will stay at the same node after escaping capture at locations 1 and 2 respectively. The solution is $v_1 \approx 0.39$ and $v_2 \approx 0.45$, with $x \approx 0.60$, $y \approx 0.27$. Note that $x$, the probability of staying at location 1, is smaller than the corresponding result in the model without risk in moving. This is counterintuitive and is explained in Section 7.
At the beginning of the game, the overall probability of eventual capture if the Hider is discovered at location 1 is $p_1 + (1 - p_1)\,v_1 \approx 0.45$, and at location 2 it is $p_2 + (1 - p_2)\,v_2 \approx 0.89$. The optimal hiding policy at the beginning, as given by Theorem 2 case 1, is about $(2/3, 1/3)$, the same as the optimal search strategy for the first stage. Thus the overall probability of capture, which requires both players to go to the same location, is about 0.29. The Hider thus reduces the probability of capture from about 0.46 for the random strategy to about 0.29 when playing optimally in the stochastic game. This is a reduction of about 37% if she uses the optimal hiding strategy, a function of the predator's actions, rather than a random choice of locations.
Whether we neglect the risk of moving or take it into account, there is thus a marked difference in the probabilities of capture and escape between the stochastic games and the single-agent models used in most optimal foraging theory.
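The stochastic game values can be approximated by iterating on the two state games. The sketch below is a hedged reconstruction, not the authors' equations: it assumes that in state i the Searcher chooses between looking at i or at the other location, the Hider chooses between staying and moving, a mover faces the transit risk before the search, and an unfound Hider wins. The helper `solve_2x2` and all other names are illustrative.

```python
def solve_2x2(m):
    """Value of a 2x2 zero-sum game (row player maximises) and the column
    player's optimal probability of choosing column 0."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    upper = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if lower >= upper:                  # saddle point in pure strategies
        col0 = 1.0 if max(a, c) <= max(b, d) else 0.0
        return lower, col0
    denom = a - b - c + d               # standard mixed solution of a 2x2 game
    return (a * d - b * c) / denom, (d - b) / denom


def solve_stochastic_game(p=(0.1, 0.8), alpha=0.3, iters=500):
    v = [0.0, 0.0]          # capture probability from state i (escaped at i)
    stay = [1.0, 1.0]       # Hider's probability of staying put in state i
    for _ in range(iters):
        d = [p[i] + (1.0 - p[i]) * v[i] for i in range(2)]  # payoff once found at i
        new_v = []
        for i in range(2):
            other = 1 - i
            # Rows: Searcher looks at i / at the other location.
            # Columns: Hider stays at i / moves to the other location.
            m = [[d[i], alpha],
                 [0.0, alpha + (1.0 - alpha) * d[other]]]
            value, stay[i] = solve_2x2(m)
            new_v.append(value)
        v = new_v
    d = [p[i] + (1.0 - p[i]) * v[i] for i in range(2)]
    v_start = 1.0 / (1.0 / d[0] + 1.0 / d[1])   # first stage is a diagonal game
    h_start = [v_start / di for di in d]        # initial hiding/search strategy
    return v, stay, v_start, h_start


v, stay, v_start, h_start = solve_stochastic_game()
print([round(t, 3) for t in v], [round(t, 3) for t in stay],
      round(v_start, 3), [round(t, 3) for t in h_start])
```

Under these assumptions the iteration reproduces the rounded state values of about 0.39 and 0.45 and an initial strategy near (2/3, 1/3). The stay probabilities and the starting value come out close to, but not exactly at, the figures quoted in the text, so the reconstruction may differ from the original model in some detail.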

Relocation Probability and Relocation Risk
In Section 6 we noted the seemingly counterintuitive numerical result that the probability of moving increased when such a move became more risky. We now present a very simple numerical example that will enable us to understand why this happens. Assume we have two locations with probability of capture $p_1 = p_2 = 1 - \varepsilon$ and $k = 1$. Consider first the repeated game with no risk of moving. The optimal strategy for the Hider is always to hide at each location with probability 1/2. Now consider the same example with risk $\alpha = 1/2$ for changing location. If the Hider has been discovered but not captured, then it is easy to see that she should make both locations equally attractive for the Searcher, so she chooses the probability of staying at her present location equal to 1/3. This means she will be captured in transit with probability $(2/3)\,\alpha = 1/3$, she will be at location 1 in the next period with probability 1/3, and she will be at location 2 with probability $(2/3)(1 - \alpha) = 1/3$. So, conditional on her still playing the game, she is equally likely to be at either location. This choice guarantees that she loses the game with probability about 2/3, which is the minimum possible, while staying with probability 1/2 leads to losing with probability 3/4. The paradox is that we have the same (simple) model, but increasing the risk of moving also increases the probability of moving.
We now give an example which makes this phenomenon simpler, without any numbers. We consider the following general case of two identical locations with a common capture probability $p = p_1 = p_2 < 1$. We suppose that staying still is safe ($\alpha_{11} = \alpha_{22} = 0$) and that relocating either way has the same probability $a$ of being captured ($\alpha_{12} = \alpha_{21} = a$). The symmetry of the two locations ensures that $v_1 = v_2 = v$. From state 1 (after a successful escape at location 1) the game matrix, with rows for the Searcher looking at location 1 or 2 and columns for the Hider staying at 1 or moving to 2, is as follows:
\[ \begin{pmatrix} p + (1 - p)\,v & a \\ 0 & a + (1 - a)\,(p + (1 - p)\,v) \end{pmatrix} \]
The existence of a value for this game, which we denote by $v$, follows from Shapley's result, our Theorem 5. First note that there is no pure strategy equilibrium. Suppose the Hider stays at location 1 with probability $q$, moving to location 2 with complementary probability $1 - q$. The equation obtained by equating the payoffs (eventual capture probabilities) when the Searcher looks at location 1 (top row, left side of equation) and location 2 (bottom row, right side of equation) is given by
\[ q\,(p + (1 - p)\,v) + (1 - q)\,a = (1 - q)\left(a + (1 - a)\,(p + (1 - p)\,v)\right), \]
with solution $q = (1 - a)/(2 - a)$, which is decreasing in $a$. Note that the optimal probability $q$ of staying at 1 (or at 2, by symmetry) does not depend on the common capture probability $p$ or the common value $v$. The optimal probability is $q = 1/2$ when there is no relocation risk ($a = 0$). This makes sense because it makes the Hider distribution most random. As the relocation risk $a$ goes to 1, the probability $1 - q$ of relocating goes to 1, as shown in Figure 1. We note that this symmetric model extends easily to $n$ identical locations, where by symmetry of locations the Hider has two choices: remain at her current location or move to a randomly selected new location. For $n$ such locations the formula for remaining becomes $q = (1 - a)/(n - a)$.
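The indifference argument is easy to check with exact arithmetic. The sketch below uses the formula q = (1 - a)/(n - a) from the text; the helper names are illustrative, and d stands for the capture probability p + (1 - p) v once the Hider is found, an abbreviation introduced here for convenience.

```python
from fractions import Fraction as F


def stay_probability(a, n=2):
    """Optimal probability of staying put with relocation risk a and n identical locations."""
    return (1 - a) / (n - a)


def searcher_payoffs(q, a, d):
    """Two-location capture probabilities when the Hider stays with probability q.

    Returns (Searcher looks at the current location, Searcher looks at the other one).
    """
    look_current = q * d + (1 - q) * a
    look_other = (1 - q) * (a + (1 - a) * d)
    return look_current, look_other


a = F(1, 2)                       # relocation risk of the numerical example
q = stay_probability(a)           # Fraction(1, 3)
# Limit case p -> 1 (so d = 1): both searches are equally good for the Searcher.
print(q, searcher_payoffs(q, a, 1))
# Staying with probability 1/2 instead lets the Searcher do better by looking
# at the current location.
print(searcher_payoffs(F(1, 2), a, 1))
# The stay probability falls as the risk rises, for any number of locations.
print([stay_probability(F(k, 4), n=3) for k in range(5)])
```

The indifference holds for any value of d, which is why q depends on neither p nor v.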
Thus the prey may have more incentive to relocate when this move becomes riskier. We note that a somewhat similar observation was made, in a slightly different context, in [43] and [44]. There, a prey had to decide when to change locations when facing a predator who might be either in cruising search mode or in ambush mode. If the predator was in ambush mode, then changing locations resulted in capture. However, it was found that as the unsearched region decreased in size, the predator was more likely to be in ambush mode (so a higher "relocation cost" for the prey), but nevertheless the prey optimally increased her likelihood of relocating. The specifics of the calculations are different from those given here, as the model is only partly similar. The idea is that the relocation cost in the current model has some similarity to the ambush frequency in the earlier papers, in that both incur a risk to a prey who changes location. It would be useful to have an additional explanation for the counterintuitive result that could be put purely in words, without the necessity of a mathematical model.

Learning the Capture Probabilities
An anonymous referee has asked how the predator and prey know the capture probabilities $p_i$: can they be learned? To answer this question we give a simple learning model. We consider the simplest case that allows for learning: two locations, two (or more) periods, and only $k = 1$ location to be searched in each period. We assume that at each location the capture probability is known to be $a$ or $b$, equiprobably and independently, with $a < b$. (If $a = b$ there is nothing to be learned.) This means locations have either a low capture probability or a high capture probability, only it is not known which. At a location where the prey has escaped $j$ times, the conditional probability that the capture probability is $a$ (low) is denoted by $g(j)$, where $g(0) = 1/2$ and, by Bayes' law,
\[ g(j) = \frac{(1 - a)^j}{(1 - a)^j + (1 - b)^j}. \]
In other words, each successful escape from a location makes it more likely that it has a low capture probability and hence makes it more attractive to the prey and hence also to the predator. The effective capture probability, denoted $\bar{p}$, is initially given simply by $\bar{p}(0) = (a + b)/2$, and more generally by
\[ \bar{p}(j) = g(j)\,a + (1 - g(j))\,b. \]
Note that when only one location is searched in each period, the payoff matrix has 0 entries off the diagonal (when the Hider is not in the searched location), so the matrix is a diagonal matrix. For two locations this is a matrix of the form
\[ \begin{pmatrix} d_1 & 0 \\ 0 & d_2 \end{pmatrix}, \]
whose value is $1/(1/d_1 + 1/d_2)$, and the optimal probability (for both players) of strategy $i$ is given by this value divided by $d_i$. So in the final period of a game, if there have been $i$ escapes from location 1 and $j$ escapes from location 2, the payoff matrix of this one-stage game, called $L_{i,j,1}$, is simply the diagonal matrix with entries $(\bar{p}(i), \bar{p}(j))$. More generally let $L_{i,j,m}$ be the learning game where location 1 has had $i$ escapes, location 2 has had $j$ escapes, and there are $m$ more plays of the game.
These games are recursively described by the matrix
\[ L_{i,j,m} = \begin{pmatrix} \bar{p}(i) + (1 - \bar{p}(i))\,L_{i+1,j,m-1} & 0 \\ 0 & \bar{p}(j) + (1 - \bar{p}(j))\,L_{i,j+1,m-1} \end{pmatrix}, \]
with value
\[ v(i,j,m) = \frac{1}{1/d_1 + 1/d_2}, \qquad d_1 = \bar{p}(i) + (1 - \bar{p}(i))\,v(i+1,j,m-1), \quad d_2 = \bar{p}(j) + (1 - \bar{p}(j))\,v(i,j+1,m-1). \]
In the game $L_{i,j,m}$ it is easy to show that the optimal probability of hiding/searching in location 1 is given by
\[ x_{i,j,m} = \frac{v(i,j,m)}{\bar{p}(i) + (1 - \bar{p}(i))\,v(i+1,j,m-1)}. \tag{37} \]
For the two-stage game ($m = 2$), the players randomize between the symmetric locations 1 and 2 in the first period and, assuming we name the location of escape in the first period as location 1, they go back to the same location in the second period with probability $x_{1,0,1} = 9/17 > 1/2$ (for the example $a = 1/3$, $b = 2/3$). Now suppose that there are three stages. Clearly in the first stage the players have no choice but to locate equiprobably at the two locations. But how do they play in the second stage (assume there was an escape at location 1) if they know it is a three-stage game? In this case the probability of choosing location 1 (for both hiding and searching) is given by $x_{1,0,2} = 99/191 < 9/17 = x_{1,0,1}$. This says that the presence of an additional final (third) period decreases the probability of going back to the same location as the escape in the previous period, but this probability is still greater than one half. In fact we find that this phenomenon is true in general: learning reduces the bias toward returning to locations one has escaped from. This phenomenon obviously requires three stages in our model. Using numerical methods, this can be shown to be true for all $a$ and $b$.
It is useful to compare the learning game with low and high capture probabilities $a$ and $b$ with the similar non-learning game with a fixed and known capture probability $(a + b)/2$, which is the effective capture probability of the learning game. We consider both in the setting of a two-stage game with identical locations.
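The recursion can be evaluated exactly with rational arithmetic. The following sketch assumes the Bayesian update and diagonal-game value described in this section, takes the example values a = 1/3 and b = 2/3, and sets the value to 0 when no plays remain; the function names are illustrative.

```python
from fractions import Fraction as F
from functools import lru_cache

A, B = F(1, 3), F(2, 3)   # low and high capture probabilities of the example


def g(j):
    """Posterior probability that a location is low-risk after j escapes."""
    return (1 - A) ** j / ((1 - A) ** j + (1 - B) ** j)


def p_eff(j):
    """Effective capture probability at a location with j past escapes."""
    return g(j) * A + (1 - g(j)) * B


@lru_cache(maxsize=None)
def entries(i, j, m):
    """Diagonal entries of L_{i,j,m}: capture now, or escape and play on."""
    d1 = p_eff(i) + (1 - p_eff(i)) * value(i + 1, j, m - 1)
    d2 = p_eff(j) + (1 - p_eff(j)) * value(i, j + 1, m - 1)
    return d1, d2


@lru_cache(maxsize=None)
def value(i, j, m):
    """Value v(i, j, m) of the learning game with m plays remaining."""
    if m == 0:
        return F(0)
    d1, d2 = entries(i, j, m)
    return 1 / (1 / d1 + 1 / d2)


def x(i, j, m):
    """Optimal probability of hiding/searching at location 1 in L_{i,j,m}."""
    d1, _ = entries(i, j, m)
    return value(i, j, m) / d1


print(x(1, 0, 1), x(1, 0, 2), value(0, 0, 2))
```

For these parameters the script returns the fractions 9/17, 99/191 and 21/68 that appear in the text.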
If the capture probability in the non-learning game is $c$ at both locations, then the value of the second stage is given by $1/(1/c + 1/c) = c/2$, and hence the first stage has value $(1/2)\,(c \cdot 1 + (1 - c)(c/2)) = (3 - c)\,c/4$ (half the time they go to the same location; there, capture (payoff 1) has probability $c$ and escape (with payoff $c/2$ from the previous calculation) has probability $1 - c$). For the example $a = 1/3$, $b = 2/3$, $c = 1/2$, the non-learning game has value $(3 - 1/2)(1/2)/4 = 5/16 = 0.3125$, while the learning game has the lower value $v(0,0,2) = 21/68 \approx 0.3088$. This means that the capture probability is lower (better for the Hider) when there is learning. We show that this observation holds in general, at least for the two-stage game.
Proposition 8 When there are two stages and identical locations (a priori), the optimal capture probability (value) is lower in the learning game with capture probabilities $a < b < 1$ than in the game where the fixed capture probability is set equal to the effective capture probability $\bar{p}(0) = c = (a + b)/2$. That is, $v(0,0,2) < (3 - c)\,c/4$.
Proof. After some algebraic simplification, the difference in the values between the no-learning and learning games can be written as
\[ \frac{(3 - c)\,c}{4} - v(0,0,2) = \frac{(a + b)\,(2 - a - b)\,(b - a)^2}{16\,\left(4a + 4b - 3a^2 - 3b^2 - 2ab\right)} > 0, \tag{38} \]
because all the factors in the numerator are positive, and for the denominator we note that $a > a^2$, $b > b^2$ and $a + b > 2a > 2ab$.
An interesting question concerns the variability of the capture probabilities. For example, in a final period, is it better for the Hider to have escaped twice from one location (and face varied capture probabilities $\bar{p}(2)$ and $\bar{p}(0) = (a + b)/2$) or once from each location (with an effective capture probability $\bar{p}(1)$ at each location)? In other words, what is the sign of $v(2,0,1) - v(1,1,1)$? It turns out that the answer depends in a simple way on the size of the two probabilities $a$ and $b$. If $a + b > 1$, then the Hider prefers the low variability case $L_{1,1,1}$; if $a + b < 1$ the Hider prefers the high variability case $L_{2,0,1}$; if $a + b = 1$ the players are indifferent between these cases. In particular, $v(2,0,1) - v(1,1,1)$ has the sign of $a + b - 1$.
Proof. The difference $v(2,0,1) - v(1,1,1)$ is given by the fraction
\[ \frac{(b - a)^4\,(a + b - 1)}{2\,(2 - a - b)\left(3a(1 - a)^2 + 3b(1 - b)^2 + b(1 - a)^2 + a(1 - b)^2\right)}. \]
The multinomial in the denominator has a minimum of 0 at $a = b = 1$, so for $0 < a < b \le 1$ the sign of the fraction is the sign of $a + b - 1$, as claimed.
(Note that if we allowed $a = b$ then the escapes are irrelevant to current probabilities and the difference would also be 0.) A normal form game (not a dynamic game) which considers this type of learning was analyzed in [48].
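Both comparisons can be spot-checked on a grid of rational values of a and b. The sketch below recomputes the two-stage and final-period values from the definitions in this section and compares them with closed forms. Those closed forms are our own algebra under the model as described here and are not quoted from the original typeset equations; all names are illustrative.

```python
from fractions import Fraction as F
from itertools import combinations


def p_eff(j, a, b):
    """Effective capture probability after j escapes (equal priors on a and b)."""
    wa, wb = (1 - a) ** j, (1 - b) ** j
    return (wa * a + wb * b) / (wa + wb)


def diag_value(d1, d2):
    """Value of the diagonal game with entries d1 and d2."""
    return d1 * d2 / (d1 + d2)


def v_final(i, j, a, b):
    """Final-period value v(i, j, 1)."""
    return diag_value(p_eff(i, a, b), p_eff(j, a, b))


def v_two_stage_learning(a, b):
    """v(0, 0, 2): by symmetry both diagonal entries are equal."""
    c = (a + b) / 2
    d = c + (1 - c) * v_final(1, 0, a, b)
    return diag_value(d, d)


def sign(t):
    return (t > 0) - (t < 0)


grid = [F(k, 10) for k in range(1, 10)]
for a, b in combinations(grid, 2):            # pairs with a < b
    c = (a + b) / 2
    gap = (3 - c) * c / 4 - v_two_stage_learning(a, b)
    closed_gap = ((a + b) * (2 - a - b) * (b - a) ** 2
                  / (16 * (4 * a + 4 * b - 3 * a * a - 3 * b * b - 2 * a * b)))
    assert gap == closed_gap and gap > 0      # learning lowers the capture value
    diff = v_final(2, 0, a, b) - v_final(1, 1, a, b)
    assert sign(diff) == sign(a + b - 1)      # variability preference flips at a + b = 1

example_gap = F(5, 16) - v_two_stage_learning(F(1, 3), F(2, 3))
print(example_gap)
```

For the example a = 1/3, b = 2/3 the gap between the non-learning and learning values is 5/16 - 21/68 = 1/272.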

Discussion
Biology, economics, computer science and studies of human behavior have long considered stochastic games ([45], [46]). A recent important perspective is presented by [39]. What is new here is therefore the application to the context of search games and behavioral ecology. Hence, our work has implications beyond behavioral ecology for any situation described by hide and seek games, from ecology and immune systems to computer science [1]. We are now in a position to assess the change in success of attack and escape in stochastic search games, compared to the situation in which one player faces another who is moving randomly. This latter case is known to be equivalent to games in which only one player is behaving optimally, a Markov Decision Process (MDP); see [42] or [27]. Of course it is necessary to observe that our modelling is appropriate only in the case of low densities for both predator and prey, allowing each to assume that there is at most one of the opposite type in the search region.
While we have given a complete solution to these problems in the text, the specific example of Section 6 is sufficient to indicate some differences in the capture time for differing assumptions. See Table 3. It is of course obvious that optimizing prey do better (lower $v$) than random prey and that prey would benefit from having a risk-free transition between locations between periods. In general, the prey reduces the capture probability by about 37-50% if she uses the optimal hiding strategy rather than moving and/or hiding randomly. There is thus a marked difference in the probabilities of capture and escape between our (repeated or stochastic) game theoretic models and single agent predator optimization models, as used in most optimal foraging theory. This marked difference extends to the use of space by the protagonists: in our no-transition-capture example of Section 6 (middle column) the predator should always visit location 2 in the first case, and should concentrate its visits on the first location in the second case, a complete reversal of the distribution of effort as a function of the tightness of the interaction! These differences explain why organisms tend to act according to the other player's actions and why the stochastic/repeated search game approach supersedes the classical optimal foraging one for modelling such interactions: the more complex modelling approach reflects the complex, multi-step trajectories of the antagonists as we observe them. The myriad of delicate and intricate biochemical, physiological or behavioral adaptations of prey for escaping predators and of predators for successfully attacking and subduing their prey [47] show that natural selection is acting on all these traits. A stochastic game formulation is thus definitely required when players behave according to what the other is doing. The model can be developed in two promising directions.
First, we did not consider prey fatigue or more complex situations in which prey balance risk of predation with risk of starvation. A refined model taking fatigue into account would then have three state variables (motivation, fatigue, and the most recent location of encounter), and its development would follow lines similar to those we have proposed. Second, our model is a zero-sum game. One may argue that a real game between a predator and a prey is not a zero-sum game, as the predator is running after its dinner while the prey is running for its life. This is an important if difficult aspect to deal with. Indeed, while non-zero-sum stochastic games were modeled only a few years after zero-sum games were developed, the level of complexity is strongly increased. In contrast to zero-sum games, the existence of a value of the game cannot be taken for granted. For search games, implementing non-zero-sum games represents a virgin and much needed field. We advocate future analysis of the following non-zero-sum model. In the single period problem we could require the predator to search the $k$ locations sequentially. If the prey is found on the $j$'th search and successfully pursued, the payoff to the predator would be $1 - jc$ for some small fixed search cost $c$, modeling the effort or energy of a search. The cost could also be location dependent, $c_i$. The prey also might prefer later capture within a period in such a model, but this would still not make it zero-sum, as survival would be more significant.