Escaping the tragedy of the commons through targeted punishment

Failures of cooperation cause many of society's gravest problems. It is well known that cooperation among many players faced with a social dilemma can be maintained thanks to the possibility of punishment, but achieving the initial state of widespread cooperation is often much more difficult. We show here that there exist strategies of ‘targeted punishment’ whereby a small number of punishers can shift a population of defectors into a state of global cooperation. We conclude by outlining how the international community could use a strategy of this kind to combat climate change.

• Groups: A defector is considered at fault if and only if at least a fraction θ of the players making up the group immediately before hers in the ordering are currently cooperating.
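The groups rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The ordering is taken to be partitioned into consecutive blocks of a hypothetical size `g`, since the group size is not given in this section. Defectors in the first group have no preceding group, and we treat them as always at fault; that choice is also our own hypothetical one. The function name `at_fault_groups` is likewise hypothetical.

```python
def at_fault_groups(cooperating, g, theta):
    """Flag defectors at fault under the 'groups' rule (illustrative sketch).

    cooperating[k] is True if the player at position k of the ordering
    currently cooperates. g (group size) is a hypothetical parameter.
    """
    n = len(cooperating)
    fault = [False] * n
    for k in range(n):
        if cooperating[k]:
            continue  # only defectors can be at fault
        group = k // g
        if group == 0:
            # Assumption: the first group has no predecessor, so its
            # defectors are always considered at fault.
            fault[k] = True
            continue
        previous = cooperating[(group - 1) * g : group * g]
        # At fault iff at least a fraction theta of the preceding group cooperates.
        fault[k] = sum(previous) >= theta * len(previous)
    return fault


# Two cooperators followed by four defectors, in groups of two.
print(at_fault_groups([True, True, False, False, False, False], g=2, theta=0.5))
# [False, False, True, True, False, False]
```

Only the defectors whose preceding group has (mostly) switched to cooperation are singled out, so punishment advances group by group along the ordering.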
The strategies are most effective if players are arranged in descending order of their net perceived payoff $h_i$, that is, in increasing order of their temptation to defect. We show that, by focusing their punishment on defectors considered at fault according to the rule adopted, punishers can shift a population of defectors towards global cooperation in many situations where attempting to punish all defectors would have no appreciable effect. But the generality of these results is limited by two assumptions: that the number of punishers is proportional to the number of cooperators, and that the perceived payoffs of players follow a linear form which is known to punishers. In this appendix we relax both assumptions and find that the effectiveness of targeted punishment does not depend strongly on either of them (Sections 1 and 2). We also look into the effects of the number of players (Section 3) and of heterogeneity (Section 4). Our results are found not to depend on a small system size, and heterogeneity is not necessary for targeted punishment to work, although it does promote cooperation.
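The contrast between targeted and diluted punishment can be made concrete with a minimal Python sketch. As a hypothetical modelling choice not spelled out in this appendix, it assumes that the punishers' total punishment π·n_p is shared equally among whichever players are targeted. Under equal punishment that means all defectors. Under a targeted strategy it means only those at fault. The function name is hypothetical.

```python
def punishment_received(pi, n_p, targets):
    """Punishment felt by each player when a total of pi * n_p is split
    equally among the targeted players (hypothetical allocation rule)."""
    n_targets = sum(targets)
    if n_targets == 0:
        return [0.0] * len(targets)
    share = pi * n_p / n_targets
    return [share if t else 0.0 for t in targets]


# Ten players, the first two cooperating (and punishing, n_p = n_c = 2).
cooperating = [True] * 2 + [False] * 8
all_defectors = [not c for c in cooperating]
# Targeted: only the two defectors next in the ordering are at fault.
at_fault = [False] * 2 + [True] * 2 + [False] * 6

print(punishment_received(1.0, 2, all_defectors)[2])  # 0.25
print(punishment_received(1.0, 2, at_fault)[2])       # 1.0
```

Under this assumption, concentrating the same total on fewer players raises each target's penalty by the ratio of defectors to targets. This is how a handful of punishers can tip the players next in line.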

Constant punishment
In the main text, we consider only scenarios in which the number of punishers is equal to the number of cooperators. If it were instead proportional to the number of cooperators, this would simply amount to a rescaling of the punishment parameter, π. But what if all players were punishers, irrespective of their individual state of cooperation? Figure S1 shows the situations corresponding to Figure 1 of the main text, with the difference that now the number of punishers is $n_p = N$ at all times. As in the case where only cooperators punish, there is a large region of parameter space where cooperation is sustainable but not achievable when punishment is diluted among all defectors. The targeted punishment strategies described above are able to bring about global cooperation in large regions of the parameter space, as can be seen in Figure S2. This figure corresponds to Figure 3 of the main text, with the difference that here $n_p = N$ instead of $n_p = n_c$. Note that, if we wished to consider a situation where a fraction a of players were punishers, it would suffice to rescale the punishment parameter as π → aπ in Figures S1 and S2.
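The rescaling argument can be spelled out in one line. We assume, consistently with the rescaling stated above, that what enters the dynamics is the total amount of punishment π n_p. Suppose either that a fraction a of all players punish, or that the number of punishers is a times the number of cooperators. Then

```latex
\pi\, n_p = \pi\,(a N) = (a\pi)\, N,
\qquad
\pi\, n_p = \pi\,(a\, n_c) = (a\pi)\, n_c ,
```

so either case is equivalent to the corresponding baseline ($n_p = N$ or $n_p = n_c$) with π replaced by aπ.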

Robustness to noise
In the main text we consider situations where player i's payoff, in the absence of punishment, is $h_i = -(i-2)/(N-2)$, and players are arranged in descending order of h. In real situations, punishers may not know the payoff perceived by player i. We therefore corrupt this setting with two sources of noise to gauge the strategies' robustness to this imperfect knowledge. We now consider that $h_i = -(i-2)/(N-2) + \xi_i$, where $\xi_i$ is drawn from a Gaussian with mean zero and variance $\sigma^2$. We also reshuffle the ordering by choosing a fraction f of the N players at random and switching their positions in the ordering with those of randomly chosen players. Figure S3 shows a setting like that of Figure 3 of the main text, after we have corrupted the payoffs with a (quenched) noise set by $\sigma^2 = 1$ and reshuffled a proportion f = 25% of players.
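The two corruptions can be sketched in Python as follows. This is an illustration, not the code used for Figures S3 and S4. It assumes that the Gaussian term is added to each baseline payoff and drawn once, so that it is quenched. It also represents the ordering as a list `order` in which `order[k]` is the (0-based) player at position k. The function name `corrupt` is hypothetical.

```python
import math
import random


def corrupt(N, sigma2, f, rng):
    """Return noisy payoffs and a partially reshuffled ordering."""
    # Baseline payoffs h_i = -(i - 2)/(N - 2) for i = 1..N (stored 0-based),
    # plus quenched Gaussian noise of variance sigma2, drawn once and kept fixed.
    h = [-(i - 2) / (N - 2) + rng.gauss(0.0, math.sqrt(sigma2))
         for i in range(1, N + 1)]
    # Start from the noiseless ordering: position k holds player k.
    order = list(range(N))
    # Pick a fraction f of positions and swap each with a randomly chosen position.
    for pos in rng.sample(range(N), round(f * N)):
        other = rng.randrange(N)
        order[pos], order[other] = order[other], order[pos]
    return h, order


rng = random.Random(0)
h, order = corrupt(N=200, sigma2=1.0, f=0.25, rng=rng)
```

Because only swaps are applied, `order` always remains a permutation of the players; with `sigma2 = 0` and `f = 0` the noiseless setting of the main text is recovered.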
Although the targeted punishment strategies lose some of their effectiveness, there are still large regions of parameter space where their adoption leads to most players cooperating. In Figure S4, we carry out the same corruption of payoffs and of the ordering as in Figure S3, but here we consider the situation where all players are punishers, $n_p = N$. Again, the strategies can be seen to be fairly robust under these conditions.

Number of players
The results presented thus far in the figures are for a relatively small set of players, N = 200, a value chosen to coincide roughly with the number of countries in the world and to keep the computation of the heat maps in the previous figures inexpensive. None of the reported results stem from finite-size effects, however, and strategies of targeted punishment can work for any size N. In Fig. S5 we show the stationary proportion of cooperators against punishment level for N = 1000 players initially set to defect, for each of the punishment strategies described. There is always a discontinuous (first-order) transition between a regime in which global cooperation is attained and one in which most players defect indefinitely. The targeted punishment strategies serve to limit the proportion of parameter space yielding coexistence of phases; in other words, they reduce the area of the hysteresis loop.
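One way to quantify the area of the hysteresis loop is to sweep π and record two branches. The lower branch is the stationary ρ reached from an all-defect start, and the upper branch is the one reached from an all-cooperate start. The gap between them can then be integrated with the trapezoid rule. The Python sketch below assumes that both branches are sampled on the same grid of π values. The numbers in the example are made up for illustration and are not results from Fig. S5; the function name is hypothetical.

```python
def hysteresis_area(pis, rho_from_defect, rho_from_coop):
    """Trapezoid-rule area between the two branches of rho(pi)."""
    area = 0.0
    for k in range(1, len(pis)):
        gap_prev = rho_from_coop[k - 1] - rho_from_defect[k - 1]
        gap_curr = rho_from_coop[k] - rho_from_defect[k]
        area += 0.5 * (gap_prev + gap_curr) * (pis[k] - pis[k - 1])
    return area


# Toy branches: cooperation survives from pi = 0.5 but only arises at pi = 1.5.
pis = [0.0, 0.5, 1.0, 1.5, 2.0]
print(hysteresis_area(pis, [0, 0, 0, 1, 1], [0, 1, 1, 1, 1]))  # 1.0
```

A strategy that moves the jump of the lower branch to smaller π shrinks this area, which is the sense in which targeted punishment reduces the coexistence region.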
The fact that our results are independent of system size suggests that strategies of targeted punishment might also be effective in situations where the players are a large number of individuals. For instance, if every person willing to recycle waste, vote or travel by bicycle chose just a few individuals to chastise whenever these failed to cooperate too, socially beneficial behaviour could spread throughout society. Needless to say, under certain conditions, socially noxious behaviour might also percolate by the same mechanism.

Heterogeneity
We have so far been considering situations in which the individual predispositions of players are heterogeneously distributed. In the absence of noise, these have been given by $h_i = -(i-2)/(N-2)$. We expect the effectiveness of targeted punishment strategies to be increased by heterogeneity, since it allows cooperation to build up gradually, beginning with the players most inclined to cooperate. In order to look into the importance of this aspect, we consider a different set of predispositions,
$$h_i = \bar{h} + \alpha\left(-\frac{i-2}{N-2} - \bar{h}\right),$$
where $\bar{h} = -(N-3)/[2(N-2)]$ is the mean predisposition, and α is a parameter determining the degree of heterogeneity: for α = 0 the situation is completely homogeneous, while for α = 1 we recover the heterogeneity considered in the main text. Note that the mean predisposition is independent of α.
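Assume the interpolated predispositions take the linear form $h_i = \bar{h} + \alpha(h_i^{(1)} - \bar{h})$. Here $h_i^{(1)} = -(i-2)/(N-2)$ are the main-text values and $\bar{h}$ is their mean. Under that assumption, the Python sketch below builds the predispositions and lets one check numerically that the mean does not change with α. The function name is hypothetical.

```python
def predispositions(N, alpha):
    """h_i for i = 1..N, interpolating between homogeneous (alpha = 0)
    and the main-text heterogeneous values (alpha = 1)."""
    base = [-(i - 2) / (N - 2) for i in range(1, N + 1)]
    h_bar = sum(base) / N  # equals -(N - 3) / (2 (N - 2))
    return [h_bar + alpha * (b - h_bar) for b in base]


N = 200
for alpha in (0.0, 0.5, 1.0):
    h = predispositions(N, alpha)
    print(alpha, sum(h) / N, min(h), max(h))
```

The spread max(h) − min(h) scales linearly with α while the mean stays fixed, so α isolates the effect of heterogeneity from that of the average temptation to defect.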
In Fig. S6 we show the effect of heterogeneity on each of the punishment strategies. When punishment is applied equally to all defectors, heterogeneity has no effect on the stationary proportion of cooperators, ρ. Under strategies of targeted punishment, heterogeneity does indeed increase the parameter range in which global cooperation can be achieved. However, the strategies are still significantly more effective than equal punishment even for completely homogeneous predispositions (α = 0). We can therefore conclude that heterogeneity is not necessary for targeted punishment to work. It is noteworthy, however, that in the 'groups strategy' case not all Monte Carlo runs achieve global cooperation when α = 0. The proportion that do seems to depend on the maximum time for which the simulation is run ($10^4$ Monte Carlo steps, MCS, in Fig. S6), suggesting that global defection acts as a metastable state from which the system eventually escapes thanks to a degree of irrationality (finite β).
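The role of finite β can be illustrated with a logit (Fermi) choice rule, a common way to model bounded rationality. This appendix does not state the update rule, so both the rule and the hypothetical `payoff_difference(i, state)` callback are assumptions. The callback gives player i's gain from cooperating over defecting. One Monte Carlo step is taken here to be N randomly chosen single-player updates.

```python
import math
import random


def coop_probability(delta, beta):
    """Logit probability of cooperating given payoff advantage delta.

    Large beta approaches a strict best response; beta = 0 gives a coin flip.
    Split into two branches so math.exp never overflows (beta >= 0 assumed).
    """
    if delta >= 0:
        return 1.0 / (1.0 + math.exp(-beta * delta))
    z = math.exp(beta * delta)
    return z / (1.0 + z)


def monte_carlo_step(state, payoff_difference, beta, rng):
    """One MCS: N asynchronous updates of randomly chosen players (in place)."""
    n = len(state)
    for _ in range(n):
        i = rng.randrange(n)
        state[i] = rng.random() < coop_probability(payoff_difference(i, state), beta)


# Toy check: if cooperating always pays, an all-defect start turns cooperative.
rng = random.Random(42)
state = [False] * 20
for _ in range(50):
    monte_carlo_step(state, lambda i, s: 10.0, beta=10.0, rng=rng)
print(all(state))
```

In the β → ∞ limit, a configuration in which no defector gains by cooperating would be absorbing. At finite β, rare "irrational" switches give the system a chance to leave it, in line with the metastability described above.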