Spatial correlated games

This article studies correlated two-person games constructed from games with independent players as proposed in Iqbal et al. (2016 R. Soc. open sci. 3, 150477. (doi:10.1098/rsos.150477)). The games are played in a collective manner, both in a two-dimensional lattice where the players interact with their neighbours, and with players interacting at random. Four game types are scrutinized in iterated games where the players are allowed to change their strategies, adopting that of their best paid mate neighbour. Particular attention is paid in the study to the effect of a variable degree of correlation on Nash equilibrium strategy pairs.


Introduction
This paper considers the four two-person (A and B), 2×2 non-zero-sum game types defined by the pay-off matrices given in table 1, namely the Prisoner's Dilemma (PD), the Hawk-Dove (HD), the Samaritan's Dilemma (SD) and the Battle of the Sexes (BOS), whose interpretation is described below.
In the PD game, both players may choose either to cooperate (C) or to defect (D). Mutual cooperators each score the reward R; mutual defectors score the punishment P; and D scores the temptation T against C, who scores S (sucker's pay-off) in such an encounter. In the PD, T > R > P > S. In this study, the PD pay-off values will be T = 5, R = 3, P = 2 and S = 1. The PD with these pay-offs will be referred to as PD(5,3,2,1).
In the HD game, the structure of the pay-off matrices is similar to that in the PD, but with P < S instead of P > S. In this study, the HD pay-off values will be T = 3, R = 2, S = 0 and P = −1. The HD with these pay-offs will be referred to as HD(3, 2, 0, −1).
In the SD game, the charity player A may choose Aid/No Aid, whereas the beneficiary player B may choose Work/Loaf. The Samaritan's dilemma arises in the act of charity. The charity player wants to help (Aid) people in need. However, the beneficiary may simply rely on the handout (Loaf) rather than try to improve their situation (Work). This is not anticipated by the charity player. Many people may have experienced this dilemma when confronted with people in need. Although there is a desire to help them, there is the recognition that a handout may be harmful to the long-run interests of the recipient [1][2][3][4][5]. Following references [6][7][8], we adopt here the pay-off matrices given in the corresponding panel of table 1, and the SD with these pay-offs will be referred to as SD(3, 2, 1, −1).
In the so-called BOS game, the rewards R > r > 0 quantify the preferences in a conventional couple fitting the traditional stereotypes: the male player A prefers to attend a Football match, whereas the female player B prefers to attend a Ballet performance. Both players hope to coordinate their choices, but conflict is also present because their preferred activities differ [9,10]. In this study, the BOS pay-off values will be R = 5, r = 1. The BOS with these pay-offs will be referred to as BOS(5,1).
The PD and HD games are symmetric, i.e. the pay-off matrices of both players coincide after transposition, whereas the SD and the BOS games are not symmetric. In symmetric games, the roles of both players are interchangeable, whereas in asymmetric games every player has to be studied separately. This issue is taken into account throughout this study, but particularly in §5.
This paper studies the game types under scrutiny interacting in a collective manner, either with players connected in a spatially structured manner (§3) or with players randomly connected (§4). Collective games on networks have long been studied [11,12]. The novelty of this study lies in the consideration of the mechanism for correlating independent strategies given in [13], and contextualized here in §2.

Independent players and correlated games
In the somewhat canonical approach to game theory, both players choose their strategies independently of each other. In an alternative approach, an external (probabilistic) mechanism sends a signal to each player, so that, in principle, the players do not have any active role. Both approaches, as well as a mechanism for combining them, are featured in this section.

Games with independent players
In conventional games, both players independently decide their probabilistic strategies $\mathbf{x} = (x, 1 - x)$ and $\mathbf{y} = (y, 1 - y)$, which give rise to the joint probability distribution $\Pi = \mathbf{x}'\mathbf{y}$. As a result, in a game with pay-off matrices $P_A$ and $P_B$, the expected pay-offs ($p$) of both players are ($\circ$ indicates element-by-element matrix multiplication, $\mathbf{1} = (1, 1)$):
$$p_A = \mathbf{1}\,(P_A \circ \Pi)\,\mathbf{1}', \qquad p_B = \mathbf{1}\,(P_B \circ \Pi)\,\mathbf{1}'. \tag{2.1}$$
The strategy pair $(\mathbf{x}, \mathbf{y})$, referred to here as $(x, y)$, is in Nash equilibrium (NE) if $x$ is the best response to $y$ and $y$ is the best response to $x$.
In the PD game, mutual defection, i.e. $x^* = y^* = 0$, is the only pair of strategies in NE. The HD game has three strategy pairs in NE: two of them are given by the pure strategies $(x^* = 1, y^* = 0) \equiv (D, H)$ and $(x^* = 0, y^* = 1) \equiv (H, D)$, whereas the third NE is achieved with mixed strategies, which in the particular case of the HD(3, 2, 0, −1) considered here becomes $x^* = y^* = 1/2$, leading to $p_{A,B} = 1$. Note that $(x^* = y^* = 0) \equiv (H, H)$ is not in NE in the HD game. The SD game has only one NE, which in the particular case of the SD(3, 2, 1, −1) considered here becomes $(x^* = 1/2, y^* = 1/5)$, leading to $(p_A = -0.2, p_B = 1.5)$. The BOS game has three strategy pairs in NE: two of them are given by the pure strategies $(x^* = y^* = 1) \equiv (F, F)$ and $(x^* = y^* = 0) \equiv (B, B)$, whereas the third NE is achieved with the mixed strategies $(x^* = R/(R + r), y^* = r/(R + r))$, leading to $p_{A,B} = Rr/(R + r) < r$.
Social welfare (SW) functions may be envisaged as summarizing some particular conception of the common good [14]. In its simplest form, an SW solution maximizes the sum of the pay-offs of both players. In the games studied here, only (1,1) is the SW solution in the HD(3, 2, 0, −1) and the SD(3, 2, 1, −1); in the PD(5,3,2,1), (1,1), (1,0) and (0,1) are SW solutions, although only (1,1) is pay-off balanced; in the BOS(5,1), both (1,1) and (0,0) are SW solutions.
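The expected pay-offs and the NE statements above can be checked numerically. The following is a minimal sketch, not the author's code: the matrix layout (rows for A's choice, columns for B's choice, first entry C/Dove/Football) and all function names are assumptions introduced here, and the BOS is taken with R = 5, r = 1 as in the BOS(5,1) label. Because a player's expected pay-off is linear in his own probability, comparing against the two pure deviations is enough to test a best response.

```python
import numpy as np

def expected_payoffs(x, y, PA, PB):
    """Expected pay-offs (p_A, p_B) of independent players, as in (2.1)."""
    Pi = np.outer([x, 1 - x], [y, 1 - y])          # joint distribution x'y
    return float((PA * Pi).sum()), float((PB * Pi).sum())

def is_nash(x, y, PA, PB, tol=1e-12):
    """True if neither player gains by a pure-strategy deviation.
    Pay-offs are linear in a player's own probability, so the two
    pure deviations are the only ones that need checking."""
    pA, pB = expected_payoffs(x, y, PA, PB)
    dev_A = max(expected_payoffs(d, y, PA, PB)[0] for d in (0.0, 1.0))
    dev_B = max(expected_payoffs(x, d, PA, PB)[1] for d in (0.0, 1.0))
    return dev_A <= pA + tol and dev_B <= pB + tol

def symmetric_game(T, R, P, S):
    """Assumed layout: rows = A's choice, columns = B's choice, index 0 = C/Dove."""
    PA = np.array([[R, S], [T, P]], dtype=float)
    return PA, PA.T                                 # B's matrix is the transpose

def bos_game(R, r):
    """Assumed layout: index 0 = Football, index 1 = Ballet."""
    return np.diag([R, r]).astype(float), np.diag([r, R]).astype(float)

PD = symmetric_game(T=5, R=3, P=2, S=1)
HD = symmetric_game(T=3, R=2, P=-1, S=0)
BOS = bos_game(R=5, r=1)

print("PD  NE at (0,0):", is_nash(0, 0, *PD), " at (1,1):", is_nash(1, 1, *PD))
print("HD  NE flags   :", [is_nash(x, y, *HD)
                           for x, y in [(1, 0), (0, 1), (0.5, 0.5), (0, 0)]])
print("HD  mixed p    :", expected_payoffs(0.5, 0.5, *HD))
print("BOS mixed p    :", expected_payoffs(5 / 6, 1 / 6, *BOS))
```

The SD is left out of the sketch because its pay-off matrices are only given in table 1.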

Correlated games
In a different game scenario, that of correlated games, an external probability distribution $\Pi = \begin{pmatrix} \pi_{11} & \pi_{12} \\ \pi_{21} & \pi_{22} \end{pmatrix}$ assigns a probability to every combination of player choices [10], giving rise to expected pay-offs computed from $\Pi$ in the same way as in (2.1). A non-factorizable $\Pi$ may be generated from independent strategies $(x, y)$ with the ad hoc method, based on an external parameter $k \in [0, 1]$, given in [13] (equations (2.2)). Equations (2.3) give the values of the elements of $\Pi$ from equations (2.2) for three relevant values of $k$. Note that $k = 1$ interchanges the $k = 0$ values of $\pi_{12}$ and $\pi_{21}$, whereas those of $\pi_{11}$ and $\pi_{22}$ remain unaltered. Also relevant is that if $x = y = 1/2$, $\Pi$ is uniform (all its elements equal to 1/4) for $k = 0$ and $k = 1$, but for $k = 1/2$ it is $\pi_{11} = 0$, $\pi_{12} = \pi_{21} = 1/4$, $\pi_{22} = 2/4$. As a result, in a balanced $x = y = k = 1/2$ scenario, player B is privileged in the KBOS game. Thus, following the male-female stereotypes, a male modeller would describe the BOS game assigning player B to the female, whereas a female modeller would reverse such role assignments.
[…] $= \frac{1}{4}(11 - k(1 - k))$.
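The remark that player B is privileged at x = y = k = 1/2 can be illustrated directly from the joint probabilities quoted in the text, without using the correlation formulas (2.2). This is a sketch under stated assumptions: the BOS(5,1) matrix layout (index 0 for Football, index 1 for Ballet) and the helper names are hypothetical, and the factorizability test relies on the fact that a 2×2 joint distribution is the product of its marginals exactly when its determinant vanishes.

```python
import numpy as np

# Assumed BOS(5,1) layout: index 0 = Football, index 1 = Ballet.
PA = np.array([[5.0, 0.0], [0.0, 1.0]])   # player A prefers (F, F)
PB = np.array([[1.0, 0.0], [0.0, 5.0]])   # player B prefers (B, B)

def correlated_payoffs(Pi, PA, PB):
    """Expected pay-offs under an arbitrary joint distribution Pi."""
    Pi = np.asarray(Pi, dtype=float)
    if Pi.shape != (2, 2) or (Pi < 0).any() or not np.isclose(Pi.sum(), 1.0):
        raise ValueError("Pi must be a 2x2 probability distribution")
    return float((PA * Pi).sum()), float((PB * Pi).sum())

def is_factorizable(Pi, tol=1e-12):
    """A 2x2 joint distribution is a product of marginals iff det(Pi) = 0."""
    Pi = np.asarray(Pi, dtype=float)
    return abs(Pi[0, 0] * Pi[1, 1] - Pi[0, 1] * Pi[1, 0]) < tol

Pi_uniform = [[0.25, 0.25], [0.25, 0.25]]  # x = y = 1/2 with k = 0 or k = 1
Pi_half = [[0.0, 0.25], [0.25, 0.5]]       # x = y = 1/2 with k = 1/2 (from the text)

for name, Pi in [("uniform", Pi_uniform), ("k = 1/2", Pi_half)]:
    print(name, correlated_payoffs(Pi, PA, PB), "factorizable:", is_factorizable(Pi))
```

With the uniform distribution both players obtain the same expected pay-off, whereas the k = 1/2 distribution puts half of the probability mass on the (Ballet, Ballet) cell that player B prefers.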
[Figure panel: KSD(3, 2; −1, 3; −1, 1) NE; plotted quantities: x, y, p]
Consequently, the strategy pairs in NE in the KSD(3, 2, −1, 0) are given in (2.8), where the threshold k* = 0.89 emerges from the restraint x ≤ 1. Before k*, it is […] (2.8). Figure 2a shows the strategies and pay-offs in NE in a KSD(3, 2, −1, 0) for variable k. Figure 2b shows pay-offs in a KSD(3, 2, −1, 0) with (x = 0.5, y = 0.2). It is remarkable that the pay-offs in the latter scenario do not differ very much from those in NE, particularly in the case of $p_A$.
From the joint probabilities given in equations (2.4), the pay-offs in a KBOS(5,1) with pure strategies and x = y = 0.5 are given in the following equations, and plotted in figure 7a.

Spatial games
In the spatial version of the two-person games we deal with, each player occupies a site (i, j) in a two-dimensional N × N lattice. The A and B players alternate in the site occupation in a chessboard form, so that every player is surrounded by four partners (A-B, B-A) and four mates (A-A, B-B). The game is played in the cellular automata (CA) manner, i.e. with uniform, local and synchronous interactions [15]. In this way, every player (i, j) plays with his four adjacent partners, so that his pay-off at time step T, namely $p^{(T)}_{i,j}$, is the sum over these four interactions. The evolution is ruled by the (deterministic) imitation of the best paid neighbour, so that in the next generation every generic player (i, j) will adopt the probabilities of the mate player (k, l) with the highest pay-off among his mate neighbours. Table 2 shows a simple example with the PD(5,3,2,1) game where initially every player cooperates (x = y = 1), except the defector (x = 0) player A located in the (3,4) cell. Thus at T = 1, the defector player A gets the pay-off p = 20 instead of the common pay-off p = 12. The imitation mechanism spreads the x = 0 defection across the player A cells, whereas player B cooperation remains unaltered as no player B defects.
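The single-defector example can be reproduced with a short sketch of one synchronous CA step. Several assumptions are made here and are not taken from the original code: partners are the four orthogonal neighbours and mates the four diagonal neighbours (which is what the chessboard arrangement implies), boundaries are periodic, a player keeps his own probability unless a mate is strictly better paid, players are independent (no correlation parameter k), and the function names are hypothetical. Because the PD is symmetric, one pay-off function serves both player types.

```python
import numpy as np

# PD(5,3,2,1): rows = own choice (C, D), columns = partner's choice (C, D).
T_, R_, P_, S_ = 5, 3, 2, 1
PAY = np.array([[R_, S_], [T_, P_]], dtype=float)

PARTNER_SHIFTS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # orthogonal: other type
MATE_SHIFTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # diagonal: same type

def expected_payoff(own, other):
    """Expected pay-off for cooperation probabilities own vs. other."""
    return (own * other * PAY[0, 0] + own * (1 - other) * PAY[0, 1]
            + (1 - own) * other * PAY[1, 0] + (1 - own) * (1 - other) * PAY[1, 1])

def step(prob):
    """One synchronous step: play the four partners, then imitate the best paid mate."""
    payoff = np.zeros_like(prob)
    for s in PARTNER_SHIFTS:
        payoff += expected_payoff(prob, np.roll(prob, s, axis=(0, 1)))
    best_pay, best_prob = payoff.copy(), prob.copy()   # assumption: self is a candidate
    for s in MATE_SHIFTS:
        pay_s = np.roll(payoff, s, axis=(0, 1))
        prob_s = np.roll(prob, s, axis=(0, 1))
        better = pay_s > best_pay
        best_pay = np.where(better, pay_s, best_pay)
        best_prob = np.where(better, prob_s, best_prob)
    return best_prob, payoff

prob = np.ones((8, 8))          # everybody cooperates ...
prob[2, 3] = 0.0                # ... except one defector (cell (3,4), 1-indexed)
new_prob, payoff = step(prob)
print(payoff[2, 3], payoff[0, 0])       # defector vs. a player far from him
print(int((new_prob == 0).sum()))       # defectors after one imitation round
```

After one step the defector's four diagonal mates copy him, while his orthogonal partners, whose mates are all cooperators, keep cooperating.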
All the simulations in this section are run in an N = 200 lattice with periodic boundary conditions and initial random assignment of the probability values sampled from a uniform distribution in the [0, 1] interval. Thus, initially x̄ ≈ 0.5 and ȳ ≈ 0.5. As a rule, the results regarding player A are shown in red, and those regarding player B are shown in blue. The computations have been performed by a double precision Fortran code run on a mainframe.
Figure 3 deals with spatial simulations of the PD(5,3,2,1) with joint probabilities generated according to (2.2). Figure 3b shows the mean pay-offs (p) and mean values of x and y at T = 200 starting from five different random assignments of x and y. Mutual defection (x = y = 0) arises below the lower threshold k = 0.25 and mutual cooperation (x = y = 1) beyond the higher threshold k• = 0.707. In the transition interval between both thresholds, where both (1,0) and (0,1) are in NE, x̄ and ȳ are fairly similar, increasing from 0.0 to 1.0 as k increases from the lower threshold up to k•; the mean pay-offs of both players are in turn fairly similar, reaching values not far from R = 3.
With the more sophisticated method of correlating independent probability distributions presented in [16], referred to here as EWL, the transition interval from mutual defection up to mutual cooperation in the PD is shorter, and a strategy pair in NE providing the pay-off of mutual cooperation appears with a lower degree of correlation (entanglement in the quantum approach implemented by the EWL method). In the PD(5,3,2,1) studied here, the lower and higher thresholds of the correlation parameter applying the EWL method (referred to here as $k_q$), in a normalized scale from 0.0 up to 1.0, are $k_q$ = 0.333 and $k^{\bullet}_q$ = 0.500 [17].
Figure 3b also shows the mean-field pay-offs (p*) achieved in a single hypothetical two-person game with players adopting the mean probabilities appearing in the spatial dynamic simulation, namely with the joint probability matrix obtained from x̄ and ȳ. The mean-field pay-offs (coloured brown for player A, green for player B) fully coincide with the actual mean pay-offs outside the transition interval, but underestimate them inside it. The lack of coincidence between mean-field and actual mean pay-offs is due to spatial effects that will be illustrated here when addressing the BOS game (figures 8 and 9).
Figure 4b shows the results of five spatial simulations of the HD(3, 2, 0, −1) at T = 200. Spatial effects arise before the threshold, so that the mean-field approaches underestimate the actual mean pay-offs, as in the spatial […]
No results on the spatial simulations of the HD using the EWL correlation method have been reported elsewhere, so figure 5 is included in this article. Again, as stressed above regarding the PD, the outcome of mutual cooperation (Dove in the HD) emerges earlier with the EWL method: $k_q$ = 0.392 < $k_c$ = 0.848. Note in figure 5a that spatial effects also arise in spatial simulations using the EWL method before $k_q$, so that the mean-field estimates (p*) also underestimate the actual mean pay-offs (p) in the QHD before the $k_q$ threshold.
As the SD has only one NE regardless of k, (i) the results shown in the spatial simulation mimic those corresponding to NE in two-person games shown in figure 2a, and (ii) no spatial effects arise, so that both mean-field and actual pay-offs coincide for every k. In spatial simulations of the SD using the EWL correlation method [6], it is $k^{\bullet}_q$ = 0.500 < $k^{\bullet}_c$ = 0.890.
Figure 7b,c shows the results of five spatial simulations of the KBOS(5,1) at T = 200. Owing to the particular structure of the BOS game, where both $\pi_{12}$ and $\pi_{21}$ are irrelevant, the graphs in these panels are symmetric around k = 0.5.
The general form of the pay-offs (figure 7b) corresponds to that of x̄ = ȳ = 1, diminishing close to k = 0.5 (figure 7c), where notable spatial effects arise, and particularly close to the extreme values of k, both 0.0 and 1.0. The output of the spatial simulations of the BOS using the EWL correlation method differs notably from that shown in figure 7 [18]. Let us say that the BOS game proves to be a highly challenging game.

Games on random networks
In the simulations of this section, every player is connected at random with four partners and four mates, so that any spatial structure is absent in such random networks. To allow comparison with the simulations based on spatially structured lattices in §3, 200 × 200 players also interact in the games on networks studied in this section, half of them of type A and the other half of type B.
Figure 10 deals with the KPD(5,3,2,1) game with variable k in network simulations. Figure 10a shows the mean pay-offs of both players and their mean values of x and y at T = 200 in five simulations. The overall structure of the graphs in figure 10a coincides with that in figure 3b. Both thresholds remain unaltered, with x = y = 0 before the lower threshold and x = y = 1 after k• in both scenarios. By contrast, the behaviour of the system in the transition interval between the thresholds differs significantly in figure 10 from that in figure 3, as in the network simulation the (1,0) and (0,1) NE emerge with no spatial effects masking them. Panel b shows that in network simulations the dynamics induced by the imitation of the best paid neighbour also acts in a straightforward manner, so that the permanent regime is achieved almost immediately for k = 0.0 and k = 1.0, and just after T = 10 for k = 0.4.
In figure 11, the k• threshold and the permanent x = y = 1 regime after k• remain unaltered compared to those in figure 4. But before k•, the KHD system behaves much as the KPD in its transition interval in network simulations: the (1,0) and (0,1) NE emerge with no spatial effects masking them.
In figure 12, the k• threshold and the permanent x = y = 1 regime after k• remain unaltered compared to those in figure 6. But before k•, the KSD system shows a kind of helter-skelter oscillation, particularly pronounced around k = 0.5.
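A random wiring of this kind, together with the imitation rule, can be sketched as follows. This is only an illustration under assumptions that the text does not settle: links are drawn independently for each player (so they are directed rather than mutual), even indices are labelled type A and odd indices type B, a player keeps his own probability on ties, and both function names are hypothetical.

```python
import numpy as np

def random_network(n_players, n_links=4, seed=0):
    """Draw, for every player, n_links partners (other type) and n_links mates (own type).
    Assumption: even indices are type A, odd indices are type B; links are directed."""
    rng = np.random.default_rng(seed)
    idx = np.arange(n_players)
    type_a, type_b = idx[idx % 2 == 0], idx[idx % 2 == 1]
    partners = np.empty((n_players, n_links), dtype=int)
    mates = np.empty((n_players, n_links), dtype=int)
    for i in idx:
        own, other = (type_a, type_b) if i % 2 == 0 else (type_b, type_a)
        partners[i] = rng.choice(other, size=n_links, replace=False)
        mates[i] = rng.choice(own[own != i], size=n_links, replace=False)
    return partners, mates

def imitate_best(prob, payoff, mates):
    """Each player adopts the probability of the best paid among himself and his mates."""
    cand_pay = np.concatenate([payoff[:, None], payoff[mates]], axis=1)
    cand_prob = np.concatenate([prob[:, None], prob[mates]], axis=1)
    best = cand_pay.argmax(axis=1)          # first maximum: self wins ties
    return cand_prob[np.arange(len(prob)), best]

partners, mates = random_network(40, seed=1)
prob = np.linspace(0.0, 1.0, 40)
payoff = np.arange(40, dtype=float)         # toy pay-offs, for illustration only
new_prob = imitate_best(prob, payoff, mates)
print(partners[0], mates[0], new_prob[0])
```

In a full simulation the toy pay-off vector would be replaced by the sum of each player's expected pay-offs against his four partners.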
The overall structure of the graphs in figure 13 coincides with that in figure 7, so that x̄ = ȳ = 1 prevail, except close to the extreme values of k, both 0.0 and 1.0. The absence of spatial structure in the network simulations of figure 13 produces crisp pay-off (and probability) graphs, with no relevant alterations around k = 0.5, although in one of the simulations it is x̄ = ȳ = 0, rendering p (k = 0.5) = 5. In the graphs of pay-offs in figure 13a, player B outscores player A in the wide interval between the lower threshold k = 0.5 − √2/4 and the higher threshold k• = 0.5 + √2/4 (both defined at the intersection of the pay-offs given in equations (2.9a)). This indicates a kind of bias of the proposed correlation mechanism that favours player B (already pointed out when commenting on equations (2.3) in §2.2), a characteristic that is also found in the EWL model regarding the BOS game [18]. It is relevant to point out that π

Partial strategy updating
In this section, it is assumed that only one player type updates his strategies in the manner indicated in §3. Thus, in figures 14 and 15 only player A updates strategies in the symmetric games of PD and HD. The asymmetric games of SD and BOS are studied in figures 16-19, where both players are treated separately.
In all the figures of this section, panels a and b deal with spatial simulations and games on networks, respectively, with the initial strategy probabilities assigned at random. In panel c, the probability of the player that does not update his strategies is fixed at 0.5, instead of being assigned at random as is done with the player that updates his strategies. Thus, panel c provides a kind of theoretical reference of what is to be expected in the collective behaviour, both in spatial simulations and in games on networks.
In the mean-field analysis with partial updating, the player that does not update his probabilities will have his mean probability equal to the middle level 1/2. In this scenario, the joint probability matrices, when player B is fixed to y = 1/2 and when player A is fixed to x = 1/2, follow from equations (2.3) […].
In the KPD context of figure 14, it is […].
In the KHD context of figure 15, it is […]. As a result, the general form of the pay-off of player B, […] p = 0.25. As reported on the PD, moderate spatial effects emerge in the spatial simulations of player A close to the threshold in figure 15a.
The strong effect that the absence of updating capacities from one of the players exerts on the collective dynamics studied here is remarkable. Thus, figure 14a,b are to be compared to figures 3 and 10 regarding the PD, respectively, and figure 15a,b are to be compared to figures 4 and 11 regarding the HD, respectively. In any case, the intrinsic symmetry of both the PD and HD games ceases to be operative in this section, favouring player A, i.e. the player allowed to find a best response to the fixed strategies of the other player, player B, so far.
In the KSD context of figure 16, it is $p_A^{(x,y=1/2)} = (6k^2 - 6k + \frac{3}{2})x - \frac{1}{2}$, where $6k^2 - 6k + \frac{3}{2} \ge 0 \rightarrow x = 1$, so that $p_A^{(x=1,y=1/2)} = 6k^2 - 6k + 1$, and $p_B^{(x,y=1/2)} = (4k^2 - 6k + 2)x + k + \frac{1}{2}$ becomes $p_B^{(x=1,y=1/2)} = 4k^2 - 5k + \frac{5}{2}$. Note that the intrinsic unfairness of the SD game prevents the charity player A from outscoring the beneficiary player B, even in the scenario of figure 16, which is favourable to A. A common feature of all the simulations of this section is the independence of the permanent regime from the initial configuration: the five simulations run in every frame cannot be distinguished. Thus, in the particular case of the KSD in figure 16, the outputs of the five CA and NW simulations are superimposed, so that it seems that only one has been implemented. This contrasts with the results shown in figures 6 and 12, where the five simulations may be identified before the threshold, although their outputs are qualitatively similar.
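The claims in the KSD paragraph follow from completing the square; the short derivation below is a worked check that uses only the expressions quoted above for x = 1 and y = 1/2.

```latex
\begin{align*}
6k^2 - 6k + \tfrac{3}{2} &= 6\left(k - \tfrac{1}{2}\right)^2 \ge 0
  \;\Rightarrow\; x = 1 \text{ is a best response of player A for every } k, \\
p_A^{(x=1,\,y=1/2)} &= 6k^2 - 6k + \tfrac{3}{2} - \tfrac{1}{2} = 6k^2 - 6k + 1, \\
p_B^{(x=1,\,y=1/2)} - p_A^{(x=1,\,y=1/2)}
  &= \left(4k^2 - 5k + \tfrac{5}{2}\right) - \left(6k^2 - 6k + 1\right)
   = -2k^2 + k + \tfrac{3}{2}.
\end{align*}
```

The last expression is concave in k and takes the values 3/2 at k = 0 and 1/2 at k = 1, so it is at least 1/2 on the whole interval [0, 1]: player B stays ahead of player A for every k, in agreement with the remark on the unfairness of the SD.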
In the KBOS context of figure 19, it is $p_B^{(x=1/2,y)} = (-8k^2 + 8k - 2)y + \frac{5}{2}$, where $-8k^2 + 8k - 2 \le 0$ and consequently the best response of player B is y = 0, which leads to $p_B^{(x=1/2,y=0)} = \frac{5}{2}$ and $p_A^{(x=1/2,y=0)} = \frac{1}{2}$. As pointed out when dealing with figure 18, for k = 1/2 it is $-8k^2 + 8k - 2 = 0$, so that now $p_B^{(x=1/2,y)} = \frac{5}{2}$ and $p_A^{(x=1/2,y)} = \frac{1}{2}$ regardless of y, and there is no repercussion for y = 1/2 at k = 1/2 on the actual pay-offs in the NW simulations or on the mean-field pay-off approaches in the CA simulations of figure 19. The spatial simulations show an odd aspect of the pay-off graphs that remains unexplained. Player B notably outscores player A regardless of k in the KBOS simulations of figure 19. This is to be expected, as in addition to the structural bias favouring player B in the KBOS, only player B is allowed to search for best responses.
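The sign of the coefficient of y in the KBOS paragraph can be made explicit by factoring it as a perfect square; the lines below are a worked check based only on the expressions quoted above for x = 1/2.

```latex
\begin{align*}
-8k^2 + 8k - 2 &= -2\left(4k^2 - 4k + 1\right) = -2\,(2k - 1)^2 \le 0, \\
p_B^{(x=1/2,\,y)} &= -2\,(2k - 1)^2\, y + \tfrac{5}{2} \;\le\; \tfrac{5}{2},
  \quad \text{with equality at } y = 0 \text{ (and for any } y \text{ when } k = \tfrac{1}{2}\text{)}, \\
p_B^{(x=1/2,\,y=0)} - p_A^{(x=1/2,\,y=0)} &= \tfrac{5}{2} - \tfrac{1}{2} = 2 .
\end{align*}
```

The coefficient vanishes only at k = 1/2, which is why the choice of y has no effect on the pay-offs there, and the constant gap of 2 matches the observation that player B outscores player A regardless of k.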

Conclusion
This article studies correlated two-person games constructed from games with independent players. The games are studied in a collective manner, both in a spatially structured two-dimensional lattice and with players connected at random. Iterated games are analysed where the players interact with their nearest neighbours, and after every round each player adopts the strategy of his best paid mate neighbour for the next round. The implementation of this imitation-of-the-best evolution rule proves to be a very useful tool to analyse the collective behaviour of two-person games via simulation.