Dynamic utility: the sixth reciprocity mechanism for the evolution of cooperation

Game theory has been extensively applied to elucidate the evolutionary mechanism of cooperative behaviour. Dilemmas in game theory are important elements that disturb the promotion of cooperation. An important question is how to escape from dilemmas. Recently, a dynamic utility function (DUF) that considers an individual's current status (wealth) and that can be applied to game theory was developed. The DUF is different from the famous five reciprocity mechanisms called Nowak's five rules. Under the DUF, cooperation is promoted by poor players in the chicken game, with no changes in the prisoner's dilemma and stag-hunt games. In this paper, by comparing the strengths of the two dilemmas, we show that the DUF is a novel reciprocity mechanism (sixth rule) that differs from Nowak's five rules. We also show the difference in dilemma relaxation between dynamic game theory and (traditional) static game theory when the DUF and one of the five rules are combined. Our results indicate that poor players unequivocally promote cooperation in any dynamic game. Unlike conventional rules that have to be brought into game settings, this sixth rule is universally (canonical form) applicable to any game because all repeated/evolutionary games are dynamic in principle.

HI, 0000-0001-9350-0546

Background
The evolution of cooperation in human and animal societies is enigmatic because a non-cooperative agent (defector) can obtain an evolutionarily selective advantage by taking the benefits of the social contributions of cooperators while avoiding the costs of cooperation [1]. However, we often observe cooperative behaviour in human and animal societies, even when those societies are composed of non-kin agents [2,3]. Game theory has been extensively studied to explain how cooperation is promoted in human and animal societies [4–9]. One of the main foci of studies in game theory is the kind of reciprocity mechanisms that can resolve the social dilemmas that disturb the promotion and evolution of cooperative behaviour, and how these mechanisms allow players to escape from dilemmas [9–11]. In game theory, many 2 × 2 (pairwise) dilemma games have been built to investigate the types of reciprocity mechanisms that enable a player to overcome conflicts of interest and promote cooperative behaviour [10,11]. We can denote the pay-off matrix of pairwise games with two strategies: cooperation (C) and defection (D). The rewards of players are determined by the pay-off matrix and the strategies that the players choose (equation (1.1)), where rows give the focal player's strategy and columns give the opponent's strategy:
$$\begin{array}{c|cc} & \mathrm{C} & \mathrm{D} \\ \hline \mathrm{C} & R & S \\ \mathrm{D} & T & P \end{array} \tag{1.1}$$
This pay-off matrix is read as follows: if both players cooperate, each receives the 'reward' R; if both defect, each receives the 'punishment' P; and if one cooperates while the other defects, the defector receives the 'temptation' T and the cooperator is left with the 'sucker's' pay-off S [11]. In a pairwise game, two indicators measure the strength of the dilemma situation. One is the gamble-intending dilemma (GID), which appears because players try to exploit their opponents, and the other is the risk-averting dilemma (RAD), which appears because players try not to be exploited by their opponents [9,12–14]. The strengths of these two dilemmas can be calculated from the elements of the pay-off matrix (equation (1.1)) [14]. Let $D_g'$ and $D_r'$ be the values of the GID and RAD, respectively. Then, we obtain the following:
$$D_g' = \frac{T - R}{R - P} \tag{1.2}$$
and
$$D_r' = \frac{P - S}{R - P}. \tag{1.3}$$
Note that the following equations are established by these definitions [15]:
$$T = R + (R - P) D_g' \tag{1.4}$$
and
$$S = P - (R - P) D_r'. \tag{1.5}$$
Depending on the strengths of these two dilemmas, the game can be divided into four classes: a prisoner's dilemma (PD) game, a chicken game (also known as a snowdrift or hawk-dove game), a stag-hunt (SH) game and a trivial game with no dilemma. Therefore, we can evaluate the evolution of cooperation more precisely if we quantitatively compare the two constitutional strengths of the reciprocity mechanisms in all pairwise games (irrespective of the reciprocity mechanisms and finiteness properties) using a RAD-GID phase plane diagram that consists of the two standardized measures (figure 1a) [15]. According to the concept of universal scaling, the relaxation of these two types of dilemmas is expressed by shifting the x-axis (i.e. the RAD-axis) and the y-axis (i.e. the GID-axis) of the RAD-GID phase plane diagram to the positive domain [15]. In this paper, we refer to the $D_r'$-$D_g'$ phase diagram without reciprocity as the 'default' (figure 1a).
Note that in the RAD-GID phase diagram, the first, second, third and fourth quadrants represent the PD, chicken, trivial and SH game structures, respectively (figure 1a).
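As a minimal sketch of how the two dilemma strengths and the quadrant classification fit together, the following Python computes the GID as (T − R)/(R − P) and the RAD as (P − S)/(R − P), the ratios implied by equations (1.4) and (1.5), and then names the game class. The function names and the numerical pay-offs are illustrative assumptions, as is the treatment of points lying exactly on an axis (counted here as having no dilemma of that type).

```python
def dilemma_strengths(R, S, T, P):
    """Return (D_r', D_g') for a 2 x 2 game; the scaling assumes R > P."""
    if R <= P:
        raise ValueError("the scaling by (R - P) assumes R > P")
    d_g = (T - R) / (R - P)  # gamble-intending dilemma (GID), y-axis
    d_r = (P - S) / (R - P)  # risk-averting dilemma (RAD), x-axis
    return d_r, d_g


def game_class(d_r, d_g):
    """Name the quadrant of the RAD-GID plane (x = RAD, y = GID)."""
    if d_r > 0 and d_g > 0:
        return "PD"        # first quadrant: both dilemmas present
    if d_g > 0:
        return "chicken"   # second quadrant: GID only
    if d_r > 0:
        return "SH"        # fourth quadrant: RAD only
    return "trivial"       # third quadrant: no dilemma


# Hypothetical pay-offs (R, S, T, P), one per game class.
examples = {
    "PD": (3, 0, 5, 1),
    "chicken": (3, 2, 5, 1),
    "SH": (5, 0, 3, 1),
    "trivial": (5, 2, 3, 1),
}
for label, payoffs in examples.items():
    d_r, d_g = dilemma_strengths(*payoffs)
    print(label, (d_r, d_g), game_class(d_r, d_g))
```

Working with the scaled pair rather than the four raw pay-offs means that any two games with the same (RAD, GID) coordinates occupy the same point of the phase plane, which is what allows different reciprocity mechanisms to be compared on one diagram.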
Nowak's five reciprocity rules (i.e. direct reciprocity, indirect reciprocity, kin selection, group selection and network reciprocity) work as reciprocity mechanisms to resolve (relax) social dilemmas and promote cooperative behaviour [9,14,16]. These fundamental mechanisms are collectively known as social viscosity. In our previous paper, we used a RAD-GID phase plane diagram to visually show how Nowak's five rules relax the dilemma structure [15]. Our results showed that Nowak's five rules had different relaxation functions for the two dilemma strengths [15].
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 7: 200891
Recently, however, the promotion of cooperation was reported in a chicken game using a dynamic utility function (DUF) [17]. The DUF was developed in dynamic utility theory based on the maximization of a stochastic growth process by applying the optimality principle of Bellman's dynamic programming [18,19]. Because dynamic utility theory optimizes Markov chains (stochastic processes) as a form of sequential decision making, it maximizes the geometric mean of multiplicative growth rates [20]. The DUF is derived as follows [21,22]. Let time t = 0, …, T (final time), and let $w_t$ and $r_t$ represent wealth and the multiplicative growth rate of wealth, respectively, at time t, such that $w_{t+1} = r_t w_t$. Note that $r_t$ (> 0) is a positive state variable of the decision maker (an independent, identically distributed random variable). Wealth at the final time point, $w_T$, is then expressed as follows: $w_T = w_0 r_0 r_1 r_2 \cdots r_{T-1}$. We assume that the growth rates $r_t$ (t = 0, …, T) are independent, identically distributed random variables that represent a stochastic process. The decision maker can optimize this stochastic process by choosing the best option at every time point. Therefore, we maximize wealth at the final time point T, such that $w_T \to \max$. The maximization of $w_T$ is equivalent to that of the geometric mean growth rate, such that
$$G(r) = \left( \prod_{i=0}^{T-1} r_i \right)^{1/T} \to \max.$$
Taking the logarithm, we obtain
$$\log\{G(r)\} = \frac{1}{T} \sum_{i=0}^{T-1} \log(r_i) = E\{\log(r)\} \to \max.$$
Therefore, we can define the DUF u(r) for this maximization as u(r) = log r. We now maximize the expected dynamic utility E(u) [23]. From the temporal equation $w_{t+1} = r_t w_t$, we can rewrite $r_t = w_{t+1}/w_t = (g_t + w_t)/w_t$ to obtain r = (g + w)/w, where g and w are the current gain and current wealth, respectively.
The growth utility formula is then rewritten in the form of g (decision variable) given w (state variable), such that
$$u(g; w) = \log\left( \frac{g + w}{w} \right), \tag{1.6}$$
and we maximize the expected utility E{u(g; w)}, which indicates that current wealth is the state variable for the maximization of final wealth. Thus, the derived dynamic utility takes the form of a logarithmic function (equation (1.6)). Note that the value of g satisfies −w < g. This analytical solution demonstrates that the utility function depends on the current gain (decision variable) and the current wealth status (state variable) at the time of decision making, as in dynamic programming [17–24]. These properties show that cooperative behaviour evolves with the introduction of a DUF that accommodates the current wealth condition of individuals (decision makers) without the known five reciprocity mechanisms. However, we cannot explain why cooperation is promoted by poor players (whose current wealth, w, is very low) in the chicken game but not in the PD game or SH game [17].
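The dependence of the dynamic utility on current wealth can be illustrated numerically. The sketch below evaluates u(g; w) = log((g + w)/w) for a hypothetical risky option and a hypothetical sure option; the function names, the gains and the two wealth levels are assumptions chosen only for illustration and do not come from the cited studies.

```python
import math


def dynamic_utility(g, w):
    """u(g; w) = log((g + w) / w), defined for w > 0 and g > -w."""
    if w <= 0 or g <= -w:
        raise ValueError("requires w > 0 and g > -w")
    return math.log((g + w) / w)


def expected_dynamic_utility(outcomes, w):
    """E{u(g; w)} for a list of (probability, gain) pairs."""
    return sum(p * dynamic_utility(g, w) for p, g in outcomes)


# Hypothetical options: the gamble has the larger expected gain (2.5 vs 2.0).
gamble = [(0.5, 10.0), (0.5, -5.0)]
sure = [(1.0, 2.0)]

for w in (6.0, 1000.0):
    eu_gamble = expected_dynamic_utility(gamble, w)
    eu_sure = expected_dynamic_utility(sure, w)
    choice = "gamble" if eu_gamble > eu_sure else "sure option"
    print(f"w = {w}: E[u] gamble = {eu_gamble:.5f}, sure = {eu_sure:.5f} -> {choice}")
```

With these illustrative numbers the low-wealth decision maker prefers the sure option and the high-wealth decision maker prefers the gamble, because log((g + w)/w) is close to g/w when w is large and strongly concave in g when w is small.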
Here, we combine two new developments, namely, universal scaling parameters and the DUF. Specifically, we apply the DUF to a traditional (well-mixed infinite population) 2 × 2 game and analyse how it relaxes the strengths of the two dilemmas. By drawing the RAD-GID phase plane diagram, we compare the dilemma relaxation mechanism of the DUF with that of Nowak's five rules, which were investigated in a previous study [15]. Based on this comparison, we show that the DUF has an entirely different dilemma relaxation mechanism from that of all five rules. Then, we show the dilemma relaxation when the DUF and one of the five rules are combined. The resulting picture is completely different from that of previous studies, which addressed only the social dilemma relaxation mechanisms of the five rules [15]. Our aim is to demonstrate the difference between dynamic game theory and (traditional) static game theory. Finally, we discuss and predict the evolution of cooperation in truly dynamic games.

Methods
Here, we evaluate the two dilemma strengths of a 2 × 2 dynamic game defined by a pay-off matrix (equation (1.1)). We assume an infinite, well-mixed population (i.e. an infinite number of agents) with no pre-existing social viscosity. Two individuals (players) are selected at random from the infinite population and asked to play the game. Players receive a reward depending on the selected strategies C and D (equation (1.1)).

Dynamic utility function
Here, we introduce the concept of individual current status from the DUF, where w is the current wealth of the player: Then, the coordinates $(D_r', D_g')$ are transformed to $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ by the DUF as follows: Thus, the static model is considered an approximate model of DUF games for extremely rich people, but not for ordinary people.
Note that the default phase plane is the region enclosed by the red square. The default game class for each coordinate is indicated by the colour of the dot. For example, a red dot is the coordinate of a default PD game without any reciprocity mechanism. The introduction of reciprocity mechanisms enhances or relaxes the strengths of the two dilemmas, and the phase plane is transformed from the red default square into the black square. The transformation of the coordinates changes the game class wherever a dot moves from a background of its own colour to a background of another colour. For example, if a red dot moves to a region with a green background, the game structure at that coordinate has changed from a PD game to an SH game due to the introduction of the reciprocity mechanism. These methods expand upon those detailed in our previous work [15]. Note that the introduction of the DUF does not change the game class.
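A minimal numerical sketch of this transformation follows. It assumes that the DUF is applied to each pay-off element as u(g; w) = log((g + w)/w) and that the dilemma strengths are then recomputed from the transformed elements with the same ratios as in the default game, (T − R)/(R − P) for the GID and (P − S)/(R − P) for the RAD. This element-wise reading, the function name and the pay-off values are assumptions of the sketch rather than a restatement of the transformation equations.

```python
import math


def duf_coordinates(R, S, T, P, w):
    """(D_r', D_g') recomputed after mapping each pay-off x to log((x + w) / w).

    Assumes w > 0 and every pay-off greater than -w, so each logarithm exists.
    """
    if w <= 0 or min(R, S, T, P) <= -w:
        raise ValueError("requires w > 0 and all pay-offs > -w")
    uR, uS, uT, uP = (math.log((x + w) / w) for x in (R, S, T, P))
    d_g = (uT - uR) / (uR - uP)  # GID after the transformation
    d_r = (uP - uS) / (uR - uP)  # RAD after the transformation
    return d_r, d_g


# Hypothetical PD pay-offs; the default coordinates are (RAD, GID) = (0.5, 1.0).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
for w in (1.0, 10.0, 1000.0):
    d_r, d_g = duf_coordinates(R, S, T, P, w)
    print(f"w = {w}: RAD = {d_r:.3f}, GID = {d_g:.3f}")
```

Under these assumptions the GID falls below its default value and the RAD rises above it for a poor player, and both approach the default as w grows. Because the logarithm is monotonic, the signs of the two strengths, and hence the game class, are unchanged.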

Direct reciprocity
Note that λ is the probability of two players meeting each other in another round. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ by direct reciprocity as follows:
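For illustration only, the sketch below recomputes the two dilemma strengths under direct reciprocity. It assumes a commonly used repeated-game pay-off matrix in which mutual cooperation and mutual defection are each scaled by 1/(1 − λ), and a mismatched first round is followed by mutual defection (R/(1 − λ), S + λP/(1 − λ), T + λP/(1 − λ), P/(1 − λ)). This matrix is an assumption of the sketch and may differ from the form used in [15]; the function name and pay-off values are likewise hypothetical.

```python
def direct_reciprocity_coordinates(R, S, T, P, lam):
    """(D_r', D_g') after an assumed direct-reciprocity pay-off transformation.

    lam is the probability of another round between the same two players.
    """
    if not 0 <= lam < 1:
        raise ValueError("requires 0 <= lam < 1")
    future = lam / (1 - lam)       # expected number of further rounds
    R2 = R / (1 - lam)             # sustained mutual cooperation
    P2 = P / (1 - lam)             # sustained mutual defection
    S2 = S + future * P            # exploited once, then mutual defection
    T2 = T + future * P            # exploits once, then mutual defection
    d_g = (T2 - R2) / (R2 - P2)
    d_r = (P2 - S2) / (R2 - P2)
    return d_r, d_g


# Hypothetical PD pay-offs; the default coordinates are (RAD, GID) = (0.5, 1.0).
for lam in (0.0, 0.25, 0.5, 0.75):
    d_r, d_g = direct_reciprocity_coordinates(3.0, 0.0, 5.0, 1.0, lam)
    print(f"lam = {lam}: RAD = {d_r:.3f}, GID = {d_g:.3f}")
```

Under this assumed matrix both strengths decrease as λ increases, and the GID can become negative while the RAD remains positive, which corresponds to a move from the PD region into the SH region of the phase plane.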

Indirect reciprocity
Note that q is the probability of knowing the reputation of another individual. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ by indirect reciprocity as follows (figure 2a):

Kin selection
Note that r is the average relatedness between interacting individuals. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ by kin selection as follows (figure 2d):

Group selection
Note that m is the number of groups and n is the maximum size of a group. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ by group selection as follows (figure 3a):

Network reciprocity
The term H is defined as follows: Note that k is the number of neighbours. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ by network reciprocity as follows (figure 3d):
Note that these calculations of the coordinate transformations by the five rules are detailed in our previous work (i.e. equations (2.6)-(2.21)) [15].
The shift to the new coordinates $(D_{r\,\mathrm{rev}}', D_{g\,\mathrm{rev}}')$ under the DUF differs from that under each of the five rules. Unlike Nowak's five rules, the DUF simultaneously relaxes the GID and enhances the RAD (figure 1): the DUF does not unilaterally enhance the negative value of dilemma strength (figure 1b and c). Moreover, the introduction of the DUF does not change the game class. By contrast, the five reciprocity rules can cause three types of changes in game class by shifting the origin of the coordinates: (i) PD to chicken, (ii) PD to SH, and (iii) PD to trivial (no dilemmas). The DUF is unique in enhancing a dilemma strength: it is the only rule that enhances the RAD while relaxing the GID without changing the origin. None of the other rules enhances either dilemma upon introduction.

Analysis
We combine the concept of DUF with the five reciprocity rules; thus, the player, considering their current wealth, plays a game in which one of the five reciprocity rules works. Each combination of reciprocity mechanisms is calculated as follows.

Combining dynamic utility function and direct reciprocity
Again, λ is the probability of two players meeting each other in another round, and w is the current wealth of a player. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev2}}', D_{g\,\mathrm{rev2}}')$ by DUF and direct reciprocity as follows (figure 2b and c):

Combining dynamic utility function and indirect reciprocity
Again, q is the probability of knowing the reputation of another individual, and w is the current wealth of a player. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev2}}', D_{g\,\mathrm{rev2}}')$ by DUF and indirect reciprocity as follows (figure 2b and c): Again, direct reciprocity and indirect reciprocity of the same strength will transfer the phase plane to the same coordinates [25].

Combining dynamic utility function and kin selection
Again, r is the average relatedness between interacting individuals, and w is the current wealth of a player. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev2}}', D_{g\,\mathrm{rev2}}')$ by DUF and kin selection as follows (figure 2e and f):

Combining dynamic utility function and group selection
Again, m is the number of groups, n is the maximum size of a group and w is the current wealth of a player. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev2}}', D_{g\,\mathrm{rev2}}')$ by DUF and group selection as follows (figure 3b and c):

Combining dynamic utility function and network reciprocity
Again, k is the number of neighbours and w is the current wealth of a player. The coordinates $(D_r', D_g')$ are transferred to $(D_{r\,\mathrm{rev2}}', D_{g\,\mathrm{rev2}}')$ by DUF and network reciprocity as follows (figure 3e and f):

Discussion
The current study is closely related to our previous study [15], but its findings are distinctly different. The DUF is a dynamic version of the utility function, whereas the traditional utility function assumes independence from current wealth, that is, a static model. Game theory should by definition be dynamic as long as players repeat games. In this sense, the five rules should fundamentally be viewed not under static utility functions, but under the DUF. We here call the DUF the sixth reciprocity mechanism because it modifies the elements of the pay-off matrix, as Nowak's five rules do. However, we should note that the current DUF model is not a functional mechanism, unlike Nowak's five rules, but a more realistic model that considers the effects of current wealth in the optimization of individual behaviour. We show here that the DUF changes the traditional view of the dilemma structure that has been assumed under the static, or quasi-static, model of the von Neumann-Morgenstern axioms. Thus, the dilemma structures under the DUF are what we have to examine when we consider all other dilemma relaxation rules. We analysed the dilemma strength of a game with a sixth reciprocity mechanism, the DUF, compared with that of a game with Nowak's five rules. The current result explains why the DUF promotes cooperative behaviour only in a chicken game [17]: the chicken game has a positive GID but no positive RAD, and the GID is the dilemma that the DUF relaxes. RAD enhancement by the DUF means that the DUF strengthens the dilemma under the SH game (figure 1). We should also note that the coordinates change in the chicken and trivial regions under the DUF (see the grids of these regions in figure 1b and c). An increase in the wealth level w in the DUF decreases the degree of relaxation/enhancement, bringing the coordinates back towards the default (figure 1). This result is consistent with a previous analysis showing that a dynamic game (with a DUF) becomes a static game as wealth w → ∞ [17]. These properties of the DUF are distinct from those of Nowak's five rules of relaxation [15,16].
Under the DUF, the GID is relaxed and the RAD is amplified simultaneously in the same game structure (figure 1b and c). This outcome is intuitive because, in social contexts, the GID drives humans towards defective behaviours that deviate from cooperative actions more strongly than the RAD does. Note that the GID arises from an ambitious intention to exploit others, while the RAD is caused by the fear of being exploited by others. In other words, the GID is an indicator of the intention of exploitation, while the RAD is an indicator of the avoidance of exploitation [9]. Therefore, the relaxation of the GID is more critical to the development of cooperative behaviour. In this sense, the DUF should have played an important role in the evolution of animal and human societies.
We assume that the pay-off (utility) matrix depends on the player's current state, but more realistically, we can expect that the pay-off matrix also depends on the current state of the opponent [24,26]. We also currently assume that the growth rates $r_t$ are independent, identically distributed random variables. In future, this condition may be relaxed, for example, by allowing the growth rate to depend on current wealth, which is the more likely situation. However, relaxing these assumptions remains an unsolved problem because it makes analytic derivation difficult.
The concept of the DUF, which considers the player's current wealth, can be combined with the five reciprocity rules. All combinations of the DUF and the five rules work effectively to relax the dilemma strength (figures 2, 3 and 4). In particular, the GID is dramatically relaxed by a combination of reciprocity mechanisms because both the DUF and the five rules relax the GID. By contrast, the DUF enhances the RAD; thus, the combinations of the DUF and the five rules cannot be expected to relax the RAD. If the effect of the DUF is strong (i.e. the current wealth of the player is small), the RAD may be enhanced by the DUF, despite RAD relaxation according to the five rules. However, as mentioned above, the GID is a more critical obstacle than the RAD for the promotion of cooperation. Therefore, the combination of the DUF and the five reciprocity mechanisms is a highly effective mechanism for promoting cooperative behaviour in pairwise games.
The concept of the DUF is a possible alternative framework to the five reciprocity protocols elucidated by Nowak, leading to the evolution of mutual cooperation. More importantly, any game played in human and animal societies is dynamic [17,24]. Therefore, the current DUF should apply to any game in any society. This universality means that a game with the DUF is a true dynamic game that should follow the canonical form of games.
Ethics. The authors confirm that the study did not use humans or animals.
Data accessibility. The authors confirm that the article has no data.
Authors' contribution. H.I. and J.T. conceived the study. H.I. generated the figures. H.I. and J.T. wrote the manuscript.
Competing interests. The authors declare that they have no competing interests.
Funding. This work was supported by the JSPS KAKENHI (grant nos. 17J06741 and 17H04731 to H.I., and 18K18924 and 19KK0262 to J.T.).