Electronic Supplementary Material: Evolution of coordinated punishment to enforce cooperation from an unbiased strategy space

An analytical approach based on small mutation rates [3] is problematic for the present model, as there are stable mixtures of strategies which call for mutation rates exponentially small in the population size and the intensity of selection [4]. A weak selection analysis for any mutation rate also makes limited sense in our case [5], as it is based on deviations from a state in which a large number of strategies coexist at identical abundance, a situation that appears ill-fitted to our setup with a large number of strategies. To bypass these difficulties, we focus on simulations to single out the important parts of the dynamics, which we then study analytically.

• The offspring replaces a randomly chosen individual.

B Parameter choice
For the evolutionary dynamics, we assume a population of size N = 50, a mutation rate µ = 10^−3, and a high intensity of selection ω = 10 to obtain results robust under changes of the parameters of the game [6,7]. Note that the model is interesting only if these parameters fulfill the following four conditions:
(i) A group in which all individuals contribute to a pro-social pool cooperates and is better off than a group in which everyone abstains, i.e., rc − c − γ > σ.
(ii) It is better to abstain than to be in a group of defectors, σ > 0.
(iii) The fine exceeds the cost of contributing, β > c, i.e., punishment is always a deterrent, since the fine paid for not contributing outweighs the contribution itself.
(iv) The fine exceeds the cost of punishing someone, β > γ, and the cost of punishing is smaller than the cost of contribution, γ < c.
For our simulations, the game parameters are chosen as follows: The loner payoff is σ = 1. Contribution to the public good is c = 1, and the multiplier is r = 3. The individual cost of supporting a sanctioning pool is γ = 0.7. The fine per unit of contribution to a sanctioning pool is β = 1.5. In our simulations, we work with a population size of N = 50 and a group size of n = 5.
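As a quick sanity check, the four conditions can be evaluated for the default parameters. The sketch below is illustrative only; the function and dictionary names are hypothetical and not taken from the simulation code.

```python
# Default game parameters as listed in the text.
PARAMS = {"r": 3.0, "c": 1.0, "gamma": 0.7, "beta": 1.5, "sigma": 1.0}


def check_conditions(r, c, gamma, beta, sigma):
    """Return the truth value of conditions (i)-(iv) for one parameter set."""
    return {
        "i": r * c - c - gamma > sigma,      # full cooperation beats abstaining
        "ii": sigma > 0,                     # abstaining beats mutual defection
        "iii": beta > c,                     # fine exceeds the contribution
        "iv": beta > gamma and gamma < c,    # fine exceeds pool cost, pool cost below c
    }


if __name__ == "__main__":
    for name, holds in check_conditions(**PARAMS).items():
        print(f"condition ({name}): {holds}")
```

With the default values, rc − c − γ = 1.3, which exceeds σ = 1 only narrowly, so condition (i) is the one most sensitive to changes in r or γ.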
C Analysis of pairwise coexistences for k = 1
As discussed in the main text, when a single individual can trigger the existence of a pool, k = 1, all pairwise stable coexistences between opportunists and supporters of the pro-social institution display the same behaviour. Supporters of the pro-social pool always find themselves in fully cooperative groups, leading to a payoff
$$rc - c - \gamma.$$
To simplify the analysis, we assume that the population is so large that it can be treated as infinite. Then, the probability that an opportunist faces a group with at least one supporter of the pro-social pool is $1 - x^{n-1}$, where x is the fraction of opportunists, who obtain a payoff rc − c in that case. If opportunists are alone, no one cooperates and their payoff is zero. On average, this leads to a payoff
$$(1 - x^{n-1})(rc - c).$$
In the unique stable equilibrium x* of the associated replicator dynamics [8], both payoffs must be equal, leading to
$$x^* = \left(\frac{\gamma}{rc - c}\right)^{1/(n-1)}.$$
Supporters of pro-social sanctions can invade opportunists (as they immediately induce cooperation) and vice versa (as opportunists save the cost γ). For k = 1, the game is identical to a volunteer's dilemma [9,10,11].
These results do not change with the parameters, but require k = 1. If the population size is no longer infinite, but large compared to the group size, the classification also determines the evolutionary dynamics in finite populations via fixation probabilities and fixation times.
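For illustration, the k = 1 equilibrium can be evaluated numerically. The sketch assumes that supporters earn rc − c − γ and opportunists earn (1 − x^(n−1))(rc − c) on average, as described above, so that equal payoffs give x^(n−1) = γ/(rc − c). The function names are hypothetical.

```python
def opportunist_fraction_k1(n, r, c, gamma):
    """Interior fixed point x* (fraction of opportunists) for k = 1."""
    return (gamma / (r * c - c)) ** (1.0 / (n - 1))


def payoffs_k1(x, n, r, c, gamma):
    """Payoffs of pool supporters and opportunists at opportunist fraction x."""
    supporter = r * c - c - gamma
    opportunist = (1.0 - x ** (n - 1)) * (r * c - c)
    return supporter, opportunist


# Default parameters from the text: n = 5, r = 3, c = 1, gamma = 0.7.
x_star = opportunist_fraction_k1(n=5, r=3.0, c=1.0, gamma=0.7)
supporter, opportunist = payoffs_k1(x_star, n=5, r=3.0, c=1.0, gamma=0.7)

if __name__ == "__main__":
    print(f"x* = {x_star:.4f}; payoffs: {supporter:.4f} vs {opportunist:.4f}")
```

Under these assumptions roughly three quarters of the population are opportunists at equilibrium for the default parameters, and both types earn 1.3.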

D Pairwise coexistences for k > 1
Let us focus on the coexistences between supporters of the sanctioning pools and opportunists, N DC • •, in the case where more than one individual is needed to implement punishment, k > 1. We assume that only effective pools that are actually realised cause any costs.
We assume that supporters of the pool defect if the pool is not implemented successfully, i.e. we focus on strategies W DC • C (the strategies W CC • C can be invaded by W DC • C for k > 1). We abbreviate the payoff of the W DC • C strategies as
$$\pi_{ID} = (rc - c - \gamma) \sum_{j=k-1}^{n-1} \binom{n-1}{j} x^{n-1-j} (1-x)^{j},$$
where the sum captures the probability that a pool is realised; otherwise their payoff is zero. The opportunists obtain
$$\pi_{O} = (rc - c) \sum_{j=k}^{n-1} \binom{n-1}{j} x^{n-1-j} (1-x)^{j}.$$
Equating the payoffs, $\pi_{ID} = \pi_{O}$, leads to an expression for the fixed point,
$$\frac{\gamma}{rc - c} = \frac{\binom{n-1}{k-1} x^{n-k} (1-x)^{k-1}}{\sum_{j=k-1}^{n-1} \binom{n-1}{j} x^{n-1-j} (1-x)^{j}}. \qquad (6)$$
The right hand side of this equation increases monotonically from 0, for x = 0, to 1, for x = 1, see Peña et al. [12]. Thus, the associated replicator dynamics [8] has a unique interior fixed point for γ < rc − c. This needs to be solved numerically for an arbitrary k > 1, but for [...]. This fixed point is stable, because supporters of the pool can invade a monomorphous population of opportunists (as they sometimes benefit from inducing cooperation) and opportunists can also invade a monomorphous population of institution supporters (as they save the cost γ). The probability that a pro-social pool is successfully implemented is given by $\sum_{j=k}^{n} \binom{n}{j} x^{n-j} (1-x)^{j}$, where x is obtained from solving Eq 6.
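The numerical solution mentioned above can be sketched with a simple bisection. The payoff forms used here are assumptions reconstructed from the verbal description: a W DC • C player earns rc − c − γ whenever at least k − 1 of its n − 1 co-players support the pool, and an opportunist earns rc − c whenever at least k co-players do. All function names are hypothetical.

```python
from math import comb


def prob_at_least(m, trials, x):
    """P(at least m supporters among `trials` players), opportunist fraction x."""
    return sum(comb(trials, j) * x ** (trials - j) * (1 - x) ** j
               for j in range(max(m, 0), trials + 1))


def payoff_gap(x, n=5, k=2, r=3.0, c=1.0, gamma=0.7):
    """Supporter payoff minus opportunist payoff (assumed forms, see lead-in)."""
    supporter = (r * c - c - gamma) * prob_at_least(k - 1, n - 1, x)
    opportunist = (r * c - c) * prob_at_least(k, n - 1, x)
    return supporter - opportunist


def interior_fixed_point(n=5, k=2, r=3.0, c=1.0, gamma=0.7, tol=1e-12):
    """Bisection for the interior root of the payoff gap on (0, 1)."""
    lo, hi = 1e-9, 1.0 - 1e-9  # gap < 0 near x = 0 and gap > 0 near x = 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if payoff_gap(mid, n, k, r, c, gamma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_star = interior_fixed_point()                  # default parameters, k = 2
pool_success = prob_at_least(2, 5, x_star)       # chance a sampled group realises the pool

if __name__ == "__main__":
    print(f"x* = {x_star:.4f}, pool success probability = {pool_success:.4f}")
```

Bisection is sufficient here because the payoff gap changes sign exactly once on the interval whenever γ < rc − c.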
D.1 Pairwise coexistences for k = 2 that cannot be invaded
For k = 2, there is again a coexistence between W DC • C and N DC • •. Even for k = 2, this coexistence cannot be invaded by other strategies for our default parameters (n = 5, r = 3, c = 1, γ = 0.7, β = 1.5):
• Invasion of other supporters of pro-social pools: If W DC • C players cooperated in the absence of sanctions, they would be exploited, reducing their payoff.
• Invasion of other players who do not support any institution: If N DC • • players cooperated in the absence of sanctions, they would be exploited. If N DC • • players defected in the presence of a sanctioning pool, they would suffer from punishment.
• Invasion of supporters of anti-social pools: We assume that invaders continue to defect in the absence and cooperate in the presence of the pro-social pool, i.e. we focus on M DCDD. This provides the scenario most attractive for the supporters of anti-social sanctions. If a player started to support the anti-social pool, her payoff would be identical to that of the N DC • • players as long as no sanctions are implemented (invasion is neutral until the anti-social pool is implemented). We then compare the payoff these players can obtain to that of the residents in their equilibrium, which implies that the number of mutants remains small.
If the mutants find themselves in groups with only the anti-social pool, they pay the cost γ, but do not induce any cooperation and thus have a disadvantage compared to the residents. If the mutants find themselves in groups with both institutions, they are punished in addition. Since no player is punished in the pairwise coexistence they invade, the fine implies a disadvantage for the supporters of the anti-social institution.
Thus, the coexistence between W DC • C and N DC • • is stable to any invasion also for k = 2.
D.2 Pairwise coexistences for k = 2 that can be invaded
Thus, all other coexistences that emerge for k = 2 and appear to be stable in a pairwise analysis are unstable with respect to a third invader strategy. Therefore, the qualitative results do not change when we switch from k = 1 to k = 2, see Fig. 2. However, as there is now selection on the first bit of the pool supporter strategies to defect in the absence of sanctions, we only have six possible coexistences (displaying exactly the same behavior as for k = 1) instead of twelve.

E Additional costs for the sanctioning pools
Can cooperation be maintained if the supporters of the pro-social pool have an additional disadvantage by paying the cost γ for the institution even when the pool is not implemented? For k = 1, this situation never occurs. Only for k > 1 do the additional costs affect the results and lead to a Volunteer's Dilemma [13,14]. In this case, the payoff of supporters of the pro-social pool is
$$\pi_{ID} = (rc - c) \sum_{j=k-1}^{n-1} \binom{n-1}{j} x^{n-1-j} (1-x)^{j} - \gamma,$$
where the cost γ now always has to be paid. The payoff of opportunists remains unchanged. Equating the payoffs leads to
$$\frac{\gamma}{rc - c} = \binom{n-1}{k-1} x^{n-k} (1-x)^{k-1}.$$
The right hand side of this equation has a unique maximum at $x^* = \frac{n-k}{n-1}$ given by
$$\Gamma = \binom{n-1}{k-1} \left(\frac{n-k}{n-1}\right)^{n-k} \left(\frac{k-1}{n-1}\right)^{k-1}.$$
For $\Gamma < \frac{\gamma}{rc-c}$, the coexistence equilibrium vanishes and opportunists dominate over punishment supporters, so no cooperation emerges. For $\Gamma > \frac{\gamma}{rc-c}$, there are two internal equilibria, one stable and one unstable. This implies that opportunists can invade and coexist with sanction supporters. But in contrast to the case of k = 1 and our usual assumption of not charging costs for a non-existing pool, opportunism is now stable and a substantial fraction of sanction supporters is necessary to reach the stable coexistence. This results in a complete breakdown of cooperation.
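The peak value and the internal equilibria can be computed directly. The sketch assumes that, with the cost γ always paid, equal payoffs reduce to γ/(rc − c) = C(n−1, k−1) x^(n−k) (1−x)^(k−1), i.e. the probability that exactly k − 1 co-players support the pool; under that assumption interior roots exist only if the peak of the right hand side exceeds γ/(rc − c). The function names are hypothetical.

```python
from math import comb


def rhs(x, n, k):
    """Probability that exactly k-1 of the n-1 co-players support the pool."""
    return comb(n - 1, k - 1) * x ** (n - k) * (1 - x) ** (k - 1)


def peak(n, k):
    """Location and height of the maximum of rhs on [0, 1]."""
    x_peak = (n - k) / (n - 1)
    return x_peak, rhs(x_peak, n, k)


def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo_negative = f(lo) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) < 0) == f_lo_negative:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def internal_equilibria(n=5, k=2, r=3.0, c=1.0, gamma=0.7):
    """Opportunist fractions at which both payoffs coincide (empty if none)."""
    target = gamma / (r * c - c)
    x_peak, height = peak(n, k)
    if height <= target:
        return []  # the curve never crosses the target level
    f = lambda x: rhs(x, n, k) - target
    return [bisect(f, 0.0, x_peak), bisect(f, x_peak, 1.0)]


roots = internal_equilibria()

if __name__ == "__main__":
    print(f"peak = {peak(5, 2)[1]:.6f}, equilibria = {roots}")
```

For n = 5 and k = 2 the peak is 0.421875 against γ/(rc − c) = 0.35, so two internal equilibria exist. Supporters outperform opportunists only between the two roots, which makes the lower root the stable coexistence and the upper root the threshold above which opportunists take over.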
In this case, as in the case of unconditional strategies, the option not to take part in the game can provide an escape hatch out of mutual defection [15,16]. If we add loners to our simulations, the system again shows some cooperation. Loners are necessary because, for k > 1, opportunists can no longer invade defectors and an alternative path out of defection is needed. However, for k = 2 and additional costs for the pools, their success is very limited, see Fig. 3.

F Additional costs on conditional strategies
Our model assumes that the cost for institutions to be visible is part of the funds that establish the institutions. However, it is also possible to assume that this cost is paid by those strategies that use conditional information. We can study this setup by considering a cost δ > 0, paid by all individuals using strategies that discriminate. We will see that, provided the game is optional, the results remain valid in a wide range of parameters.
For the new cost, it is important that δ satisfies 0 < σ < (r − 1)c − γ − δ. That is, conditional cooperators in groups of cooperators still earn more than those that do not play the game.
The first thing to know is that the stable mixture that guarantees cooperation remains an equilibrium, provided δ < β − c. First note that with the exception of W CCCC, any two strategies W • C • C and N DC • • will both pay the cost δ.
This coexistence cannot be invaded by any other single mutant:
• Individuals cooperating in the absence of an institution, N CC • •, would be exploited by the N DC • • resident.
• Defectors in the presence of an institution, N DD • •, would be punished.
• Any player supporting the anti-social institution is outperformed by individuals in the stable coexistence: There, both players obtain −γ + rc − c − δ. A single supporter of the anti-social institution would at most get −γ + rc − β. Therefore, supporters of an anti-social institution cannot invade provided δ < β − c.
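The two requirements on δ can be checked for concrete values. The sketch below takes the institution cost in the participation condition to be γ (an assumption) and uses the default parameters; the function name is hypothetical.

```python
def delta_conditions(delta, r=3.0, c=1.0, gamma=0.7, beta=1.5, sigma=1.0):
    """Return (participation_ok, invasion_blocked) for an information cost delta.

    participation_ok: conditional cooperators in cooperative groups still earn
        more than loners, 0 < sigma < (r - 1) * c - gamma - delta.
    invasion_blocked: supporters of the anti-social pool cannot invade,
        delta < beta - c.
    """
    participation_ok = 0 < sigma < (r - 1) * c - gamma - delta
    invasion_blocked = delta < beta - c
    return participation_ok, invasion_blocked


if __name__ == "__main__":
    for delta in (0.1, 0.2):
        print(delta, delta_conditions(delta))
```

Both conditions hold for δ = 0.1 and δ = 0.2. With the default parameters the participation condition is the tighter one, since it already fails for δ = 0.4 while the invasion condition only fails from δ = 0.5 onwards.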
In addition, for this equilibrium to be reached we require that the public goods game is optional. Our simulations show that the typical path into the stable mixture is via defection, which is neutral to N DD • • (see Figure 3 in the main text). This path is no longer neutral if N DD • • pays a cost δ; thus, getting into the mixture requires the loner strategy (see Figure 1 in this supplement). Figure 4 shows how simulations align with this prediction. A cost in the provided range leads to qualitatively similar results provided the public goods game is optional.
Figure 4: Additional costs for the conditional strategies. The stable mixture is prevalent, provided the public goods game is optional (panel A), but is less prominent when the public goods game is mandatory (panel B). We show results for δ = 0.1 and δ = 0.2, other parameters as before, and averaging as before.

G Computational methods
The simulations reported follow a standard agent-based simulation of a Moran process. Fitness inside groups is deterministic and can be obtained using an algorithm that, given a group composition, computes the size of the pro-social and anti-social pools as well as the number of PGG contributors.
Given the payoff inside groups, we can use a multivariate hypergeometric distribution to compute the probability of each possible group composition. This is efficient when the number of types is small. When the population is composed of three or more types, we resort to Monte Carlo simulations to estimate group sampling probabilities out of 1000 samples. However, the payoff computation is exact for the majority of simulation runs, since a small mutation rate guarantees that the population is at most dimorphic for most of the time.
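The two ways of obtaining group composition probabilities can be sketched as follows. This is a minimal illustration, not the published simulation code: it assumes that co-players are drawn without replacement from a pool given as per-type counts, and all function names are hypothetical.

```python
import random
from itertools import product
from math import comb, prod


def composition_probability(pool_counts, group_counts):
    """Multivariate hypergeometric probability of drawing group_counts
    (one entry per type) without replacement from pool_counts."""
    ways = prod(comb(p, g) for p, g in zip(pool_counts, group_counts))
    return ways / comb(sum(pool_counts), sum(group_counts))


def all_compositions(num_types, size):
    """Yield every tuple of per-type counts that sums to size."""
    for counts in product(range(size + 1), repeat=num_types):
        if sum(counts) == size:
            yield counts


def monte_carlo_probabilities(pool_counts, size, samples=1000, rng=None):
    """Estimate composition probabilities by repeated sampling."""
    rng = rng or random.Random()
    pool = [t for t, count in enumerate(pool_counts) for _ in range(count)]
    tally = {}
    for _ in range(samples):
        drawn = rng.sample(pool, size)  # sampling without replacement
        counts = tuple(drawn.count(t) for t in range(len(pool_counts)))
        tally[counts] = tally.get(counts, 0) + 1
    return {counts: hits / samples for counts, hits in tally.items()}


if __name__ == "__main__":
    # Dimorphic pool of 49 co-player candidates, 4 co-players per group.
    pool = (30, 19)
    for counts in all_compositions(len(pool), 4):
        print(counts, round(composition_probability(pool, counts), 4))
```

The exact enumeration grows combinatorially with the number of types, which is consistent with switching to sampling once three or more types are present.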
The frequency of mixtures is estimated by post-processing the file that reports a complete simulation run. At each reported time step, we remove strategies that are below a given noise threshold and compute the support of the population without noise. We then count how many times each support appears as a fraction of all the time steps in the simulation. Supports of size one correspond to monomorphous populations. Each support of size two corresponds to a potential stable mixture between two strategies. Supports of size three are reported, but their abundance is negligible given that mutation rates are small. The code for the simulations is available online. Payoff inside groups is determined with the following algorithm.