Cost efficiency of institutional incentives for promoting cooperation in finite populations

Institutions can provide incentives to enhance cooperation in a population where this behaviour is infrequent. This process is costly, and it is thus important to optimize the overall spending. This problem can be mathematically formulated as a multi-objective optimization problem where one wishes to minimize the cost of providing incentives while ensuring a minimum level of cooperation, sustained over time. Prior works that consider this question usually omit the stochastic effects that drive population dynamics. In this paper, we provide a rigorous analysis of this optimization problem, in a finite population and stochastic setting, studying both pairwise and multi-player cooperation dilemmas. We prove the regularity of the cost functions for providing incentives over time, characterize their asymptotic limits (infinite population size, weak selection and large selection) and show exactly when reward or punishment is more cost efficient. We show that these cost functions exhibit a phase transition phenomenon when the intensity of selection varies. By determining the critical threshold of this phase transition, we provide exact calculations for the optimal cost of the incentive, for any given intensity of selection. Numerical simulations are also provided to demonstrate analytical observations. Overall, our analysis provides for the first time a selection-dependent calculation of the optimal cost of institutional incentives (for both reward and punishment) that guarantees a minimum level of cooperation over time. It is of crucial importance for real-world applications of institutional incentives since the intensity of selection is often found to be non-extreme and specific for a given population.

© 2021 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
Providing incentives is costly and it is therefore important to minimize the cost while ensuring a sustained level of cooperation over time [28,31,41]. Despite its paramount importance, only a few works have explored this question so far. In particular, Wang et al. [35] use optimal control theory to provide an analytical solution for cost optimization of institutional incentives assuming deterministic evolution and infinite population sizes (modelled using replicator dynamics). This work therefore does not take into account various stochastic effects of evolutionary dynamics such as mutation and non-deterministic behavioural update [4,43,44]. In a deterministic system consisting of cooperators and defectors, once the latter disappear (for instance through strong institutional punishment), there is no further change to the system and thus no further interference in it is required. When mutation is present, however, defection can recur and become abundant over time, requiring institutions to spend more of their budget on providing further incentives. Moreover, a key factor of behavioural update, the intensity of selection [4] (which determines how strongly an individual bases their decision to copy another individual's strategy on their fitness difference), might strongly impact an institutional incentive strategy and its cost efficiency. Its value is usually found to be specific for a given population [45–48] and thus should be taken into account when designing suitable cost-efficient incentives. For instance, when selection is weak such that behavioural update is close to a random process (i.e. an imitation decision is nearly independent of how large the fitness difference is), providing incentives would make little difference to behavioural change, however strong they are. When selection is strong, incentives that ensure a minimum fitness advantage to cooperators would ensure a positive behavioural change.
In a stochastic, finite-population context, so far this problem has been investigated primarily using agent-based and numerical simulations [28,31,49–52]. Results demonstrate several interesting phenomena, such as the significant influence of the intensity of selection on incentive strategies and optimal costs. However, there is no satisfactory rigorous analysis available at present that allows one to determine the optimal way of providing incentives. This is a challenging problem because of the large but finite population size and the complexity of stochastic processes governing the population dynamics.
In this paper, we provide exactly such a rigorous analysis. We study cooperation dilemmas in both pairwise (the Donation game (DG)) and multi-player (the Public Goods game (PGG)) settings [4]. They are among the most well-studied models for investigating the evolution of cooperative behaviour, where individual defection is always preferred over cooperation while mutual cooperation is the preferred collective outcome for the population as a whole. Adopting a popular stochastic evolutionary game approach for analysing well-mixed finite populations [53–55], we derive the total expected costs of providing institutional reward or punishment, characterize their asymptotic limits (namely, for an infinite population, weak selection and strong selection) and show the existence of a phase transition phenomenon in the optimization problem when the intensity of selection varies. We calculate the critical threshold of the phase transition and study the minimization problem when the selection intensity is below and above this critical value. We furthermore provide numerical simulations to demonstrate the analytical results.
The rest of the paper is organized as follows. In §2, we introduce the models and methods, deriving mathematical optimization problems that will be studied. The main results of the paper are presented in §3. In §4, we discuss possible extensions for future work. Finally, detailed computations, technical lemmas and proofs of the main results are provided in the electronic supplementary material.

Models and methods

(a) Cooperation dilemmas
We consider a well-mixed, finite population of N self-regarding individuals or players, who interact with each other using one of the following one-shot (i.e. non-repeated) cooperation dilemmas: the DG or its multi-player version, the PGG. In these games, a player can choose either to cooperate (i.e. a cooperator or C player) or to defect (i.e. a defector, or D player).
Let $\Pi_C(i)$ and $\Pi_D(i)$ be the average pay-offs of a C player and a D player in a population with i C players and N − i D players, respectively (see also §2.3 for more details). We show below that the difference $\delta = \Pi_C(i) - \Pi_D(i)$ does not depend on i. For cooperation dilemmas, it is always the case that δ < 0.

(i) Donation game
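The pay-off matrix of the DG (for a row player) is given as follows:
$$\begin{array}{c|cc} & C & D \\ \hline C & b - c & -c \\ D & b & 0 \end{array}$$
where c and b represent the cost and benefit of cooperation, respectively, with b > c. The DG is a special version of the Prisoner's Dilemma (PD) game.
Denoting $\pi_{X,Y}$ as the pay-off of a strategist X when playing with a strategist Y from the pay-off matrix above, we obtain
$$\Pi_C(i) = \frac{(i-1)\pi_{C,C} + (N-i)\pi_{C,D}}{N-1} = \frac{(i-1)(b-c) - (N-i)c}{N-1}, \qquad \Pi_D(i) = \frac{i\,\pi_{D,C} + (N-i-1)\pi_{D,D}}{N-1} = \frac{ib}{N-1}.$$
Thus,
$$\delta = \Pi_C(i) - \Pi_D(i) = -\left(c + \frac{b}{N-1}\right).$$

(ii) Public Goods game
In a PGG, players interact in a group of size n, where they decide to cooperate, contributing an amount c > 0 to a common pool, or to defect, contributing nothing to the pool. The total contribution in a group is multiplied by a factor r, where 1 < r < n (for the PGG to be a social dilemma), and is then shared equally among all n members of the group. With groups sampled at random from the population, the average pay-offs are
$$\Pi_C(i) = \sum_{j=0}^{n-1} \frac{\binom{i-1}{j}\binom{N-i}{n-1-j}}{\binom{N-1}{n-1}} \left(\frac{(j+1)rc}{n} - c\right) = \frac{rc}{n}\left(1 + (i-1)\frac{n-1}{N-1}\right) - c,$$
$$\Pi_D(i) = \sum_{j=0}^{n-1} \frac{\binom{i}{j}\binom{N-1-i}{n-1-j}}{\binom{N-1}{n-1}} \frac{jrc}{n} = \frac{rc(n-1)}{n(N-1)}\, i.$$
Thus,
$$\delta = \Pi_C(i) - \Pi_D(i) = -c\left(1 - \frac{r(N-n)}{n(N-1)}\right).$$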
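The claim that δ does not depend on i can be checked numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes the standard well-mixed setting in which a focal player's co-players are drawn at random from the other N − 1 individuals (hypergeometric sampling for the PGG). The closed forms for δ used in the check are the ones implied by these assumptions.

```python
from math import comb

def dg_payoffs(i, N, b, c):
    """Average pay-offs (Pi_C, Pi_D) in the Donation game with i cooperators."""
    pi_c = ((i - 1) * (b - c) - (N - i) * c) / (N - 1)
    pi_d = i * b / (N - 1)
    return pi_c, pi_d

def pgg_payoffs(i, N, n, r, c):
    """Average pay-offs (Pi_C, Pi_D) in the Public Goods game, obtained by
    summing over the number j of cooperators among the n - 1 co-players."""
    groups = comb(N - 1, n - 1)
    pi_c = sum(comb(i - 1, j) * comb(N - i, n - 1 - j) * ((j + 1) * r * c / n - c)
               for j in range(n)) / groups
    pi_d = sum(comb(i, j) * comb(N - 1 - i, n - 1 - j) * (j * r * c / n)
               for j in range(n)) / groups
    return pi_c, pi_d

def delta_dg(N, b, c):
    """Closed form of Pi_C(i) - Pi_D(i) for the DG (assumed, see lead-in)."""
    return -(c + b / (N - 1))

def delta_pgg(N, n, r, c):
    """Closed form of Pi_C(i) - Pi_D(i) for the PGG (assumed, see lead-in)."""
    return -c * (1 - r * (N - n) / (n * (N - 1)))

if __name__ == "__main__":
    N = 10
    for i in range(1, N):
        pc, pd = dg_payoffs(i, N, b=1.8, c=1.0)
        qc, qd = pgg_payoffs(i, N, n=5, r=3.0, c=1.0)
        print(i, round(pc - pd, 6), round(qc - qd, 6))
```

Each printed row should show the same two differences for every i, which is the property that later allows the cost functions to be analysed in closed form.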

(b) Cost of institutional reward and punishment
To reward a cooperator (respectively, punish a defector), the institution has to pay an amount θ/a (resp., θ/b) so that the cooperator's (defector's) pay-off increases (decreases) by θ, where a, b > 0 are constants representing the efficiency ratios of providing the corresponding incentive. As we study reward and punishment separately, without loss of generality, we set a = b = 1 [22,28]. Thus, the key question here is: what is the optimal value of the individual incentive cost θ that ensures a desired level of cooperation in the population (in the long run) while minimizing the total cost spent by the institution?

(i) Deriving the expected cost of providing institutional incentives
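We adopt here the finite population dynamics with the Fermi strategy update rule [44], stating that a player A with fitness $f_A$ adopts the strategy of another player B with fitness $f_B$ with a probability given by
$$P_{A,B} = \left(1 + e^{-\beta(f_B - f_A)}\right)^{-1},$$
where β is the intensity of selection. Let $S_i$ denote the state in which the population contains i C players, so that $S_0$ and $S_N$ are the absorbing (homogeneous) states and $S_1, \ldots, S_{N-1}$ are transient, and let U be the transition matrix among the transient states. Under either incentive the fitness difference between a C player and a D player is $\Pi_C(i) - \Pi_D(i) + \theta$, so that U is tridiagonal with
$$u_{i,i\pm 1} = \frac{N-i}{N}\,\frac{i}{N}\left(1 + e^{\mp\beta[\Pi_C(i) - \Pi_D(i) + \theta]}\right)^{-1}, \qquad u_{i,i} = 1 - u_{i,i+1} - u_{i,i-1}.$$
The entries $n_{ij}$ of the so-called fundamental matrix $\mathcal{N} = (n_{ij})_{i,j=1}^{N-1} = (I - U)^{-1}$ of the absorbing Markov chain give the expected number of times the population is in the state $S_j$ if it starts in the transient state $S_i$ [57]. As a mutant can randomly occur at either $S_0$ or $S_N$, the expected number of visits at state $S_i$ is, thus, $\frac{1}{2}(n_{1i} + n_{N-1,i})$. With a = b = 1, the total cost per generation in state $S_i$ is iθ for reward and (N − i)θ for punishment. Hence, the expected total costs of interference for institutional reward and institutional punishment are, respectively,
$$E_r(\theta) = \frac{\theta}{2}\sum_{i=1}^{N-1}\left(n_{1i} + n_{N-1,i}\right) i, \qquad E_p(\theta) = \frac{\theta}{2}\sum_{i=1}^{N-1}\left(n_{1i} + n_{N-1,i}\right)(N - i). \tag{2.2}$$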
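The construction above can be turned into a short numerical routine. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it assumes that U is the tridiagonal matrix of the Fermi (pairwise comparison) process on the transient states, with up and down probabilities (N − i)/N · i/N · 1/(1 + e^{∓β(δ+θ)}), that a = b = 1, and that the per-generation cost in state S_i is iθ for reward and (N − i)θ for punishment. All function names are hypothetical, and the matrix inverse uses plain Gauss-Jordan elimination to keep the example dependency-free.

```python
from math import exp

def transition_matrix(N, beta, delta, theta):
    """Transition matrix U among the transient states S_1..S_{N-1} (assumed form)."""
    x = beta * (delta + theta)       # beta * (fitness of C minus fitness of D)
    p_up = 1.0 / (1.0 + exp(-x))     # Fermi probability that a D copies a C
    p_down = 1.0 - p_up              # Fermi probability that a C copies a D
    U = [[0.0] * (N - 1) for _ in range(N - 1)]
    for i in range(1, N):
        pair = (N - i) / N * i / N   # chance of picking one C and one D
        if i < N - 1:
            U[i - 1][i] = pair * p_up
        if i > 1:
            U[i - 1][i - 2] = pair * p_down
        U[i - 1][i - 1] = 1.0 - pair # no change in the number of C players
    return U

def invert(A):
    """Inverse of a small square matrix by Gauss-Jordan elimination."""
    m = len(A)
    aug = [row[:] + [1.0 if r == c else 0.0 for c in range(m)]
           for r, row in enumerate(A)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(m):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def expected_costs(N, beta, delta, theta):
    """Return (E_r, E_p) from the fundamental matrix (I - U)^{-1}."""
    U = transition_matrix(N, beta, delta, theta)
    I_minus_U = [[(1.0 if r == c else 0.0) - U[r][c] for c in range(N - 1)]
                 for r in range(N - 1)]
    fund = invert(I_minus_U)
    # Expected visits to S_i when the mutant appears at S_0 or S_N with equal chance.
    visits = [0.5 * (fund[0][i - 1] + fund[N - 2][i - 1]) for i in range(1, N)]
    e_r = theta * sum(v * i for i, v in enumerate(visits, start=1))
    e_p = theta * sum(v * (N - i) for i, v in enumerate(visits, start=1))
    return e_r, e_p

if __name__ == "__main__":
    print(expected_costs(N=3, beta=1.0, delta=-1.9, theta=1.0))
```

For large β(δ + θ) in absolute value the call to `exp` can overflow; a production version would clamp the exponent, which is omitted here for brevity.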

(ii) Cooperation frequency
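Since the population consists of only two strategies, the fixation probabilities of a C (D) player in a homogeneous population of D (C) players when the interference scheme is carried out are, respectively,
$$\rho_{D,C} = \left(1 + \sum_{i=1}^{N-1}\prod_{k=1}^{i}\frac{T^-(k)}{T^+(k)}\right)^{-1}, \qquad \rho_{C,D} = \left(1 + \sum_{i=1}^{N-1}\prod_{k=N-i}^{N-1}\frac{T^+(k)}{T^-(k)}\right)^{-1}.$$
Computing the stationary distribution using these fixation probabilities, we obtain the frequency of cooperation (see §2.3), $\rho_{D,C}/(\rho_{D,C} + \rho_{C,D})$. Hence, this frequency of cooperation can be maximized by maximizing
$$\max_{\theta}\ \frac{\rho_{D,C}}{\rho_{C,D}}. \tag{2.3}$$
The fraction in equation (2.3) can be simplified as follows [54]:
$$\frac{\rho_{D,C}}{\rho_{C,D}} = \prod_{k=1}^{N-1}\frac{T^+(k)}{T^-(k)} = \prod_{k=1}^{N-1}\frac{1 + e^{\beta[\Pi_C(k) - \Pi_D(k) + \theta]}}{1 + e^{-\beta[\Pi_C(k) - \Pi_D(k) + \theta]}} = e^{\beta(N-1)(\delta + \theta)}.$$
In the above transformation, $T^-(k)$ and $T^+(k)$ are the probabilities of decreasing or increasing the number of C players (i.e. k) by one in each time step, respectively. We consider non-neutral selection, i.e. β > 0 (under neutral selection, there is no need to use incentives). Assuming that we desire to obtain at least an ω ∈ [0, 1] fraction of cooperation, i.e. $\rho_{D,C}/(\rho_{D,C} + \rho_{C,D}) \ge \omega$, it follows that
$$\theta \ge \theta_0(\omega) = \frac{1}{(N-1)\beta}\log\left(\frac{\omega}{1-\omega}\right) - \delta.$$
Therefore, it is guaranteed that, if $\theta \ge \theta_0(\omega)$, at least an ω fraction of cooperation can be expected. This condition implies that the lower bound of θ depends monotonically on β. Namely, when ω > 0.5 it decreases with β (since log(ω/(1 − ω)) > 0), while when ω < 0.5 it increases with β; for ω = 0.5 it equals −δ for every β.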
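As a quick sanity check of the threshold, the sketch below evaluates the lower bound on θ and the resulting cooperation frequency. It assumes the closed forms θ_0(ω) = log(ω/(1 − ω))/((N − 1)β) − δ and frequency = e^{β(N−1)(δ+θ)}/(1 + e^{β(N−1)(δ+θ)}), which follow from the Fermi rule when δ is independent of i; the function names are hypothetical. The value δ = −1.9 is the one implied by the N = 3 example discussed later in the text.

```python
from math import exp, log

def theta_0(omega, N, beta, delta):
    """Smallest per-capita incentive that yields at least a fraction omega of cooperation."""
    return log(omega / (1.0 - omega)) / ((N - 1) * beta) - delta

def cooperation_frequency(theta, N, beta, delta):
    """Long-run frequency of cooperation in the small mutation limit (assumed closed form)."""
    ratio = exp(beta * (N - 1) * (delta + theta))   # rho_{D,C} / rho_{C,D}
    return ratio / (1.0 + ratio)

if __name__ == "__main__":
    t0 = theta_0(omega=0.7, N=3, beta=1.0, delta=-1.9)
    print(round(t0, 2))                                     # threshold for 70% cooperation
    print(cooperation_frequency(t0, N=3, beta=1.0, delta=-1.9))
    # For omega > 0.5 the threshold shrinks as selection gets stronger.
    print([round(theta_0(0.7, 3, b, -1.9), 3) for b in (0.5, 1.0, 2.0, 5.0)])
```

With these inputs the threshold rounds to 2.32, matching the value quoted for N = 3, β = 1 and ω = 0.7 in the main results.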

(iii) Optimization problems
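Bringing all these factors together, we obtain the following cost-optimization problems of institutional incentives in stochastic finite populations:
$$\min_{\theta \ge \theta_0(\omega)} E(\theta), \tag{2.6}$$
where E is either $E_r$ or $E_p$, defined in (2.2) for institutional reward and punishment, respectively.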

(c) Methods: evolutionary dynamics in finite populations
We adopt in our analysis the evolutionary game theory (EGT) methods for finite populations [53–55]. Herein, individuals' pay-offs represent their fitness or social success, and evolutionary dynamics is shaped by social learning [4,43], whereby the most successful players will tend to be imitated more often by the other players. Here, social learning is modelled using the pairwise comparison rule [44], that is, a player A with fitness $f_A$ adopts the strategy of another player B with fitness $f_B$ with probability given by the Fermi function,
$$P_{A,B} = \left(1 + e^{-\beta(f_B - f_A)}\right)^{-1},$$
where β conveniently describes the selection intensity (β = 0 represents neutral drift while β → ∞ represents increasingly deterministic selection).
In the absence of mutations or exploration, the end states of evolution are inevitably monomorphic: once such a state is reached, it cannot be escaped through social learning. We assume that, with a certain mutation probability, an individual switches randomly to a different strategy without imitating another individual. In addition, we assume here the small mutation limit [53,55,58]. Thus, at most two strategies are present in the population at a time. The evolutionary dynamics can be described by a Markov chain, where each state represents a homogeneous population and the transition probabilities between any two states are given by the fixation probability of a single mutant [53,55,58]. The resulting Markov chain has a stationary distribution, which describes the average time the population spends in an end state. The small mutation limit allows us to obtain an analytical form of the frequency of cooperation (see below). It is noteworthy that, although we focus here on the small mutation limit, this approach has been shown to be widely applicable to scenarios which go well beyond the strict limit of very small mutation rates [45,46,48,59].
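The fixation probability of a single mutant A taking over a whole population with (N − 1) B players is as follows (see [44,55,60] for details):
$$\rho_{B,A} = \left(1 + \sum_{i=1}^{N-1}\prod_{j=1}^{i}\frac{T^-(j)}{T^+(j)}\right)^{-1},$$
where $T^{\pm}(j) = \frac{N-j}{N}\,\frac{j}{N}\left[1 + e^{\mp\beta[\Pi_A(j) - \Pi_B(j)]}\right]^{-1}$ describes the probability of changing the number of A players by ± one in a time step. Specifically, when β = 0, $\rho_{B,A} = 1/N$, representing the transition probability at the neutral limit.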
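Consider the set of two strategies C and D (see [53,58] for the calculation for any number of strategies). Their stationary distribution is given by the normalized eigenvector associated with the eigenvalue 1 of the transpose of the matrix [53,58]
$$M = \begin{pmatrix} 1 - \rho_{C,D} & \rho_{C,D} \\ \rho_{D,C} & 1 - \rho_{D,C} \end{pmatrix},$$
that is,
$$\left(\frac{\rho_{D,C}}{\rho_{C,D} + \rho_{D,C}},\ \frac{\rho_{C,D}}{\rho_{C,D} + \rho_{D,C}}\right).$$
The first term is the frequency of cooperation and the second one is that of defection.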
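The fixation probabilities and the two-state stationary distribution described in this subsection are straightforward to compute. The sketch below is illustrative only: it assumes the pairwise comparison (Fermi) process, for which the ratio T⁻(j)/T⁺(j) reduces to e^{−β[Π_A(j) − Π_B(j)]}, and a pay-off difference δ between C and D that does not depend on the state. The function names and the parameter `theta` (the incentive added to the C minus D fitness difference) are hypothetical.

```python
from math import exp

def fixation_probability(N, beta, payoff_diff):
    """Probability that a single A mutant takes over N - 1 B players.

    payoff_diff(j) must return Pi_A(j) - Pi_B(j) when there are j A players.
    """
    total, prod = 1.0, 1.0
    for j in range(1, N):
        prod *= exp(-beta * payoff_diff(j))   # T^-(j) / T^+(j) for the Fermi rule
        total += prod
    return 1.0 / total

def stationary_distribution(rho_dc, rho_cd):
    """Stationary weights of the all-C and all-D states in the small mutation limit."""
    s = rho_dc + rho_cd
    return rho_dc / s, rho_cd / s

def cooperation_frequency(N, beta, delta, theta=0.0):
    """Frequency of cooperation when C has a constant fitness edge delta + theta over D."""
    rho_dc = fixation_probability(N, beta, lambda j: delta + theta)     # C invades D
    rho_cd = fixation_probability(N, beta, lambda j: -(delta + theta))  # D invades C
    return stationary_distribution(rho_dc, rho_cd)[0]

if __name__ == "__main__":
    print(fixation_probability(10, 0.0, lambda j: -1.2))   # neutral limit, 1/N
    print(cooperation_frequency(10, 0.5, delta=-1.2))       # no incentive
    print(cooperation_frequency(10, 0.5, delta=-1.2, theta=1.5))
```

Passing the pay-off difference as a function keeps the routine usable for games in which the difference does depend on the number of A players.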

Main results
The present paper provides a rigorous analysis of the expected total cost of providing an institutional incentive (2.2) and the associated optimization problem (2.6). In this section, we state our main analytical results, theorems 3.1, 3.2 and 3.4. In the following theorems, E denotes the cost function for either institutional reward, $E_r$, or institutional punishment, $E_p$, as obtained in (2.2). Also, $H_N$ denotes the well-known harmonic number. Our first main result provides qualitative properties and asymptotic limits of E.
Theorem 3.1 (qualitative properties and asymptotic limits of total cost functions).
(I) (finite population estimates) The expected total cost of providing an incentive satisfies the following estimates for all finite populations of size N:
(II) (infinite population limit) The expected total cost of providing an incentive satisfies the following asymptotic behaviour when the population size N tends to +∞: where γ = 0.5772... is the Euler-Mascheroni constant.
(III) (weak selection limit) The expected total cost of providing an incentive satisfies the following asymptotic limit when the selection strength β tends to 0:
(IV) (strong selection limit) The expected total cost of providing an incentive satisfies the following asymptotic limit when the selection strength β tends to +∞: (3.6)
The lower and upper bounds obtained in part (I) of the theorem suggest that the total expected cost function E for both reward and punishment behaves asymptotically in the order of $(N^2 H_N) \times \theta$ for sufficiently large N. This is confirmed in part (II), noting that $H_N \sim \ln N$. We also show that the leading asymptotic coefficient of E depends on the game (i.e. DG or PGG) and its parameters. Hence, it is important to adopt a precise optimal value of θ (e.g. obtained by solving the optimization problem (2.6)), as a small increase in this individual incentive cost can lead to a significant increase in E, especially when the population size is large. Figure 1 numerically demonstrates this asymptotic limit. Parts (III) and (IV) of the theorem provide theoretical estimations of E under the weak (β → 0) and strong (β → +∞) selection limits. For the weak selection limit, the expected total costs are the same for reward and punishment, i.e. $E_r(\theta) = E_p(\theta)$. For the strong selection limit, $E_r$ is smaller than, equal to or greater than $E_p$, depending on whether θ is smaller than, equal to or greater than −δ. Figure 2 provides numerical validation of the theoretical weak and strong selection asymptotic behaviours of E, for different population sizes N. We can observe that, for a given individual incentive cost θ, the range of E increases significantly for larger N.
Our second main result concerns the optimization problem (2.6). We show that the cost function E exhibits a phase transition when the selection intensity β varies.
where P(u) and F(u), as well as their counterparts for punishment, are defined in the electronic supplementary material (see §§1 and 2 there, respectively). There exists a threshold value $\beta^*$ given by
The proofs of theorems 3.1 and 3.2 for the cases of reward and punishment are given in §§1 and 2 in the electronic supplementary material, respectively. We also provide explicit computations for N = 3 and N = 4 to illustrate these theorems in §3 in the electronic supplementary material. Based on numerical simulations, we conjecture that the requirement $N \le N_0$ could be removed and that theorem 3.2 is true for all finite N. In electronic supplementary material, figure S2, using numerical calculation we have shown that $N_0 = 100$ satisfies the conjecture, ensuring the validity of the numerical examples below. Theorem 3.2 gives rise to the following algorithm to determine the optimal value $\theta^*$ for $N \le N_0$.
To illustrate theorem 3.2 and algorithm 3.3, we focus on the case of reward. Figure 3 shows the cost function $E_r$ as a function of θ, for different values of N, β and ω, to illustrate the phase transition when varying β in a DG. We can see that, in all cases, the numerical observations are in close accordance with the theoretical results. For example, with N = 3 (figure 3a), we found $\beta^*$ = 10.9291/1.9 = 5.752, where 1.9 = −δ. For $\beta < \beta^*$, E(θ) is an increasing function of θ. Thus, the optimal cost of incentive is $\theta^* = \theta_0$, for a given required minimum level of cooperation ω. For example, with N = 3 and β = 1, to ensure at least 70% cooperation (ω = 0.7), $\theta^* = \theta_0 = 2.32$. When $\beta \ge \beta^*$, one needs to compare $E(\theta_0)$ and $E(\theta_2)$. For example, with N = 3, β = 10: for ω = 0.25 …
Similarly, with a larger population size (N = 50; see figure S1 in the electronic supplementary material, bottom row), we obtained $\beta^*$ = 3.15/1.03673 = 3.039. In general, similar observations are obtained as in the case of a small population size N = 3, except that, when N is large, the values of $\theta_0$ for different non-extreme values of the minimum required cooperation ω (say, ω ∈ (0.01, 0.99)) are very small (given the log scale of ω/(1 − ω) in the formula of $\theta_0$). This value is also smaller than $\theta_2$, with a cost $E(\theta_0) > E(\theta_2)$, making $\theta_2$ the optimal cost of incentive. Similar results are obtained for the PGG (figure 4). When ω is extremely high (i.e. greater than $1 - 10^{-k}$, for a large k), we can also see other scenarios where the optimal cost is $\theta_0$ (see figure S1 in the electronic supplementary material, bottom row); we do not look at extremely low values of ω since we would like to ensure at least a sufficient level of cooperation. We thus observe that, for ω ∈ (0.01, 0.99), a sufficiently large population size N and a large enough β (somewhat above $\beta^*$), the optimal value of θ is always $\theta_2$. Otherwise, $\theta_0$ is the optimal cost.
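The phase transition can be reproduced in a few lines for the smallest case, N = 3. The sketch below is a numerical illustration, not the algorithm from the paper: it assumes that, for N = 3, inverting the 2 × 2 matrix I − U of the Fermi process by hand gives E_r(θ) = (9θ/4)(4 + p)/(1 − p + p²) with p = 1/(1 + e^{−β(θ+δ)}), it takes δ = −1.9 as implied by the numbers quoted above, and it replaces the analytical comparison of E(θ_0) and E(θ_2) by a brute-force grid search over θ ≥ θ_0. All names, the grid span and the step size are arbitrary choices.

```python
from math import exp, log

DELTA = -1.9   # pay-off difference implied by the N = 3 example in the text

def cost_reward_n3(theta, beta, delta=DELTA):
    """Expected total cost of reward for N = 3 (closed form assumed in the lead-in)."""
    p = 1.0 / (1.0 + exp(-beta * (theta + delta)))
    return 9.0 * theta / 4.0 * (4.0 + p) / (1.0 - p + p * p)

def theta_0(omega, beta, N=3, delta=DELTA):
    """Lower bound on theta that guarantees a fraction omega of cooperation."""
    return log(omega / (1.0 - omega)) / ((N - 1) * beta) - delta

def is_nondecreasing(beta, start=0.0, stop=10.0, step=0.01):
    """Check on a grid whether theta -> E_r(theta) is non-decreasing."""
    count = int(round((stop - start) / step)) + 1
    values = [cost_reward_n3(start + k * step, beta) for k in range(count)]
    return all(b >= a for a, b in zip(values, values[1:]))

def optimal_theta(beta, omega, span=5.0, step=0.001):
    """Grid search for the minimizer of E_r over theta >= theta_0(omega)."""
    t0 = theta_0(omega, beta)
    grid = [t0 + k * step for k in range(int(round(span / step)) + 1)]
    return min(grid, key=lambda t: cost_reward_n3(t, beta))

if __name__ == "__main__":
    for beta in (1.0, 10.0):
        print(beta, is_nondecreasing(beta),
              round(theta_0(0.7, beta), 3), round(optimal_theta(beta, 0.7), 3))
```

For β = 1 the cost is monotone and the search returns θ_0 itself; for β = 10 the cost first dips and then rises again, so the minimizer moves to an interior point above θ_0, which is the behaviour described for β above the threshold.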
Our last result provides a comparison of the expected total costs for providing institutional reward and punishment, for different individual incentive costs θ.
Figure: Reward is less costly than punishment ($E_r < E_p$) for small θ, and vice versa. The threshold of θ for this change was obtained analytically (see theorem 3.1) and is exactly equal to −δ. Results are obtained for the DG with b = 2, c = 1.

Theorem 3.4 (reward versus punishment costs). The difference between the expected total costs of reward and punishment is given by
As a consequence, when $\beta \le \min\{\beta^*_r, \beta^*_p\}$ we have $E^*_r = E_r(\theta_0)$ and $E^*_p = E_p(\theta_0)$. In this case, reward is less costly than punishment ($E^*_r < E^*_p$) when ω < 0.5, and more costly when ω > 0.5.
When varying θ, we observe that reward is less costly than punishment ($E_r < E_p$) for θ < −δ and vice versa when θ > −δ, exactly as shown analytically in theorem 3.4. This analytical result is confirmed here for different population sizes N and intensities of selection β. Figure 6 also confirms the second part of the theorem: for small β, if one can choose the type of incentive to use, reward provides a lower cost when requiring less than 50% cooperation at minimum, and punishment otherwise. This is in line with previous work showing that reward mechanisms work very well to promote cooperation in environments in which it is rare, while punishment mechanisms are better at maintaining high levels of cooperation (e.g. [28,35,52]).
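The sign change at θ = −δ can be made concrete with a worked example for N = 3. The derivation below is a sketch under assumptions, not a restatement of theorem 3.4: it assumes the Fermi transition probabilities $u_{i,i\pm1} = \frac{N-i}{N}\frac{i}{N}(1+e^{\mp\beta(\delta+\theta)})^{-1}$ on the two transient states and a = b = 1, and it writes $p = (1+e^{-\beta(\theta+\delta)})^{-1}$ and $q = 1 - p$ as shorthand.

```latex
% N = 3: both transient states have (N-i) i / N^2 = 2/9.
I - U = \frac{2}{9}\begin{pmatrix} 1 & -p \\ -q & 1 \end{pmatrix},
\qquad
(I - U)^{-1} = \frac{9}{2(1 - pq)}\begin{pmatrix} 1 & p \\ q & 1 \end{pmatrix}.

% Expected visits, summed over the two possible starting states S_1 and S_2:
n_{11} + n_{21} = \frac{9(1 + q)}{2(1 - pq)},
\qquad
n_{12} + n_{22} = \frac{9(1 + p)}{2(1 - pq)}.

% Costs: reward pays i*theta in S_i, punishment pays (3 - i)*theta.
E_r(\theta) = \frac{9\theta}{4}\,\frac{4 + p}{1 - p + p^{2}},
\qquad
E_p(\theta) = \frac{9\theta}{4}\,\frac{5 - p}{1 - p + p^{2}},
\qquad
E_r(\theta) - E_p(\theta) = \frac{9\theta}{4}\,\frac{2p - 1}{1 - p + p^{2}}.
```

Since $2p - 1 < 0$ exactly when $\theta < -\delta$, reward is cheaper below the threshold and punishment above it, for every β > 0. Setting $\theta = \theta_0(\omega)$ and using $\theta_0(\omega) < -\delta$ if and only if ω < 0.5 then gives the comparison of the optimal costs in the regime where both minima are attained at $\theta_0$.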

Discussion
Institutional incentives such as punishment and reward provide an effective tool for promoting the evolution of cooperation in social dilemmas. Both theoretical and experimental analyses have been carried out [29,36,37,52,61–63]. However, past research usually ignores the question of how institutions' overall spending, i.e. the total cost of providing these incentives, can be minimized, while at the same time guaranteeing a minimum desired level of cooperation over time. Answering this question allows one to estimate exactly how incentives should be provided, that is, how much to reward a cooperator and how severely to punish a wrongdoer. Existing works that consider this question usually omit the stochastic effects that drive population dynamics, in particular the effect of a varying intensity of selection.
Resorting to a stochastic evolutionary game approach for finite, well-mixed populations, we have provided theoretical results for the optimal cost of incentives that ensure a desired level of cooperation while minimizing the total budget, for a given intensity of selection, β. We show that this cost strongly depends on the value of β, owing to the existence of a phase transition in the cost functions when β varies. This behaviour is missing in works that consider a deterministic evolutionary approach [35]. The intensity of selection plays an important role in evolutionary processes. Its value differs depending on the pay-off structure (i.e. scaling the game pay-off matrix by a factor is equivalent to dividing β by that factor) and is usually found to be specific for a given population, which can be estimated through behavioural experiments [45–48]. Thus, our analysis provides a way to calculate the optimal incentive cost for a given population and game pay-off matrix at hand.
With regard to theoretical importance, we characterized asymptotic behaviours of the total cost functions for both reward and punishment (namely, in the limits of a large population, weak selection and strong selection) and compared these functions for the two types of incentive. We showed that punishment is always more costly for a small (individual) incentive cost (θ) but less so when this cost is above a certain threshold. We provided an exact formula for this threshold. This result provides insights into the choice of which type of incentives to use.
In the context of institutional incentives modelling, a crucial issue is the question of how to maintain the budget for providing incentives [59,64]. The problem of who pays or contributes to the budget is a social dilemma in itself, and how to escape this dilemma is a critical research question. In this work, we focus on the question of how to optimize the budget used for the provided incentives.
There are several simplifications made for the theoretical analysis to be possible. First, in order to derive the analytical formula for the frequency of cooperation, we assumed the small mutation limit. Despite this simplifying assumption, the small mutation limit approach has been shown to be widely applicable to scenarios which go well beyond the strict limit of very small mutation rates [46,48,59]. Relaxing this assumption would make the derivation of a closed form for the frequency of cooperation intractable.
Second, we focused in this paper on two important cooperation dilemmas, the DG and the PGG. They have in common a useful property that the difference in (average) pay-off between a cooperator and a defector, $\delta = \Pi_C(i) - \Pi_D(i)$, does not depend on i, the number of cooperators in the population. This property allows us to simplify the fundamental matrix to a tridiagonal form and apply the techniques of matrix analysis to obtain a closed form of its inverse matrix (see electronic supplementary material). In games with more complex pay-off matrices such as the PD in its general form and the collective risk game [65], the difference δ depends on i and the technique in this paper cannot be directly applied. We might consider other approaches to approximate the inverse matrix, exploiting its block structure.
Data accessibility. This article has no additional data.