Proceedings of the Royal Society B: Biological Sciences

Payoff-based learning explains the decline in cooperation in public goods games

Maxwell N. Burton-Chellew

Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK

Magdalen College, Oxford OX1 4AU, UK

Nuffield College, New Road, Oxford OX1 1NF, UK

Heinrich H. Nax

Department of Social Sciences-SOMS, ETH Zurich, Clausiusstrasse 37, 8092 Zurich, Switzerland

Stuart A. West

Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK

Magdalen College, Oxford OX1 4AU, UK


    Economic games such as the public goods game are increasingly being used to measure social behaviours in humans and non-human primates. The results of such games have been used to argue that people are pro-social, and that humans are uniquely altruistic, willingly sacrificing their own welfare in order to benefit others. However, an alternative explanation for the empirical observations is that individuals are mistaken, but learn, during the game, how to improve their personal payoff. We test between these competing hypotheses, by comparing the explanatory power of different behavioural rules, in public goods games, where individuals are given different amounts of information. We find: (i) that individual behaviour is best explained by a learning rule that is trying to maximize personal income; (ii) that conditional cooperation disappears when the consequences of cooperation are made clearer; and (iii) that social preferences, if they exist, are more anti-social than pro-social.

    1. Introduction

    The results from economic games have been used to argue that humans are altruistic in a way that differs from most if not all other organisms [1–4]. In public goods game experiments, participants have to choose how much of their monetary endowment they wish to keep for themselves and how much to contribute to a group project [5,6]. Contributions to the group project are automatically multiplied by the experimenter before being shared out equally among all group members regardless of their relative contributions [7,8]. The multiplier is usually smaller than the group size, so that a contributor receives back less from her contribution than she contributed. In this case, participants have to choose between retaining their full endowment and thus maximizing their personal income, or sacrificing some of their earnings to the benefit of the group. Hundreds of experiments have shown that most people partially contribute to the group project and thus fail to maximize personal income [5,6]. It has been argued that this robust result demonstrates that humans have a unique regard for the welfare of others, termed pro-social preferences, which cannot be explained by kin selection [9], reciprocity [10] and/or improved reputation [11–14]. Consequently, economic games are also increasingly being used in non-human primates in attempts to explore the evolutionary origins of such puzzling social behaviours [15–17].

    The conclusion that humans are especially, perhaps uniquely, altruistic has relied on the assumption that individuals play ‘perfectly’ in experiments such as the public goods game. Specifically, it assumes that individuals have a full understanding of the game, in terms of the consequences of their behaviour for themselves and others, such that their play reflects how they value the welfare of others (social preferences) [1,18]. This results in the inference that players who make costly decisions knowingly incur a personal cost in order to benefit others [3]. Consequently, the typical decline in contributions when players are made to play the game repeatedly [5,6] (figure 1) is argued to be a withdrawal of cooperation in response to a minority of non-cooperators [19–21].

    Figure 1. We analyse the data from Burton-Chellew & West [23]. Participants played a public goods game for 20 repeated rounds, with random group composition each round. There were three different information treatments (see text for details). The results conform to the stereotypical results of public goods games, in that contributions commence at intermediate values and decline steadily with repetition of the game.

    An alternative explanation for the data is that individuals are trying to maximize their financial gain, but they are not playing the game ‘perfectly’ [22,23]. This hypothesis predicts that individuals initially cooperate to some degree, because they are uncertain and bet-hedge [23], or they are mistaken about how the payoffs operate [22,24,25], or perhaps they operate a heuristic from everyday life that starts off cooperating without calculating the consequences [26]. This hypothesis consequently predicts a decline in cooperation over time as individuals learn, albeit imperfectly, how behaviour influences payoffs. Consistent with this alternative hypothesis, individuals have been found to contribute similar amounts over time to the group project even when they do not know they are playing the public goods game with others [23,27]. However, this alternative hypothesis has been argued against, with the suggestion that the decline in cooperation is better explained by pro-social individuals conditionally cooperating depending upon the behaviour of others, rather than individuals learning how to better play the game [21].

    We explicitly test these competing hypotheses, by examining the rules that individuals use to vary their behaviour when playing the public goods game [28,29] (figure 2). Our first rule assumes that individuals are trying to maximize their own income, but are uncertain or mistaken as to how to do this. They thus subsequently use information from game play to try to improve their earnings. For example, if contributing less over time to the public good coincided with an increase in such an individual's financial reward, then this individual would contribute even less next time, and vice versa (directional learning [27,29–33]). Our second and third rules are based on two forms of pro-social behaviour that have been previously argued to lead to altruistic behaviour in public goods games [19,20,34,35]. Our second rule assumes that individuals are trying to maximize a weighted function of their own income and that of their group-mates [35]. This also allows directional learning, but in a way that takes account of the consequences of behaviour for others. Our third rule is conditional cooperation, in response to the cooperation of others [19,20,34,36]. For example, if the average contributions of one's group-mates increase from one round to the next, then one will respond by contributing more in the next round.

    Figure 2. We considered the explanatory power of three behavioural response rules: (a) payoff-based learning based on increasing own income; (b) pro-social directional learning, based on own income and the income of others (weighted by α); and (c) conditional cooperation, based on own income and a desire to equalize incomes (weighted by β).

    We analysed data from three public goods games, all with the same payoff-structure, but which differ in the amount of information that the players are given about the consequences of their behaviour for others. Specifically, individuals had no knowledge that their behaviour even benefited others (black box), or were told at the start how their behaviour benefited others (standard), or were also shown after each round of play that contributions benefited others (enhanced) [23]. By comparing behaviour in these different games, we could explicitly examine the extent to which behaviour was influenced by consequences for the actor himself/herself (the only concern in the black box), and consequences for others (increasingly highlighted in the standard and enhanced treatments). In addition, we told players in the standard and enhanced treatment the decisions of their group-mates after each round. This allows us to test whether players are attempting to condition their cooperation and whether this depends on how clear the benefits of contributing are for others.

    2. Material and methods

    (a) Data collection

    We analysed the dataset from our previously published study, where the experimental methods are described in detail [23] (figure 1). This experiment examined the behaviour of 236 individuals, distributed among 16 sessions. Here, we provide a brief summary of the parts of the experimental design relevant to this study.

    We tested three versions of the public goods game and used an identical set-up and payoff matrix, but provided different levels of social information, each time. In each session, we had 12 or 16 participants, grouped them into groups of four and had them play the public goods game repeatedly for a total of 20 rounds. Groups were randomly created every round. In all treatments, we gave our participants a fresh endowment of 40 monetary units (MU), or 40 coins (for the black box), per round, and multiplied the contributions of players by 1.6 before sharing them out equally among all four group members. This meant that the marginal-per-capita-return (MPCR) for each unit contributed was 0.4. Consequently, contributions were always personally costly and to not contribute was the payoff-maximizing (strictly dominant) strategy in each round.
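
    The payoff structure described above can be illustrated with a minimal sketch. The function name and argument names are illustrative choices, not taken from the experimental software; only the endowment (40), the multiplier (1.6) and the group size (4) come from the text.

```python
def public_goods_payoff(contributions, i, endowment=40, multiplier=1.6):
    """Payoff to player i in one round of a linear public goods game.

    contributions: list with each group member's contribution this round.
    The endowment and multiplier defaults follow the values given in the text.
    """
    n = len(contributions)
    mpcr = multiplier / n  # marginal per-capita return: 1.6 / 4 = 0.4 here
    kept = endowment - contributions[i]
    share_of_project = mpcr * sum(contributions)
    return kept + share_of_project


# Hypothetical group of four: player 0 contributes everything, the rest nothing.
group = [40, 0, 0, 0]
print("contributor:", public_goods_payoff(group, 0))
print("free-rider: ", public_goods_payoff(group, 1))
```

    Because each unit contributed returns only 0.4 units to the contributor, the contributor in this example earns less than any group-mate who kept the full endowment, which is why contributing nothing is the strictly dominant strategy in each round.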

    Our most extreme condition was an entirely asocial set-up, with no social framing, and where instead of allowing participants to contribute to a group project, we let them contribute to a ‘black box’, even though they were in reality playing a standard inter-connected public goods game. We told the participants that the black box ‘performs a mathematical function that converts the number of coins inputted into a number of coins to be outputted’. This allowed us to deliberately create participants who would not know the payoff-maximizing strategy and are also unconcerned by other-regarding preferences. In such a condition, the participants could only be motivated to adjust behaviour so as to maximize their own income, as much as participants are ever so motivated.

    Our other two treatments were revealed public goods games, where we told our participants they could either contribute each MU to a group project (the public good) or keep it for themselves. We told our players how the game works, specifically that contributions are multiplied by 1.6 before being shared out equally among all four players. In both of these ‘revealed’ versions of the game, we gave our participants the exact same instructions, but we gave more information after each round of play in one treatment than the other. Specifically, in the ‘standard’ set-up, we told participants after each round what their own payoffs were, and also what the decisions of their three group-mates were. This is the most typical information content of public goods game studies, e.g. [1], which has provided the template for many subsequent studies. In our ‘enhanced’ treatment, we also informed our participants what their group-mates' individual returns from the group project were and their subsequent individual earnings. Note that in this enhanced treatment, there is strictly speaking no new information relative to the standard treatment, if players (i) understood the game and (ii) were calculating the earnings of their group-mates from their contributions.

    Methodologically, in each session, we had our participants play two ‘game-frames’, i.e. both a black box game and a revealed public goods game, in order to enable a within-participant analysis. We presented the two games as two entirely separate experiments to minimize spill-over effects: in one they could ‘input’ ‘coins’ into a ‘black box’, in the other they could ‘contribute’ ‘MU’ to a ‘group project’, and the order of play of these games was counter-balanced across sessions.

    (b) Statistical analysis

    We tested three learning rules (figure 2). In all cases, we assumed that players adjusted their behaviour according to whether previous behavioural adjustments led to positive or negative consequences for the proposed underlying utility function. For example, if players derive utility only from their personal income, and a previous reduction (or increase) in their contributions led to an increase in their personal income, then in the next time step they would gravitate towards the lesser (or greater), more successful, level of contribution. Similarly, if players value the payoffs to others, then ceteris paribus, others' changes in income would be responded to in an equivalent way. The three underlying utility functions that we examined were as follows:

    (I) payoff-based learning: individuals set contributions, ci, in response to their own income, φi(c) and the resulting utility is simply ui(c) = φi;

    (II) pro-social learning: individuals set contributions, ci, in response to both their own income, φi(c), and the income of the other members of their group, φj(c), and the resulting utility is a weighted function of the two, in which αi measures the agent's concern for others' payoffs. Pro-sociality implies αi > 0; and

    (III) conditional cooperation: individuals set contributions, ci, in response to their own income, φi(c), and to the contributions of their group-mates, such that the resulting utility depends on both, with βi measuring the agent's concern to match others' contributions.

    We chose these utility functions because of their relationship to the utility functions already discussed in the literature, and because they allow a clear comparison between cases with and without pro-social preferences. Different, and potentially more elaborate behavioural rules could be favoured in different scenarios allowing more behavioural flexibility [37,38].

    We perform ordinary linear regressions with individual-level clustering of the form $y^{t+1} = \mathbf{x}^{t}\boldsymbol{\beta} + e^{t+1}$, where $y^{t+1}$, measuring a contribution adjustment by player i, is the response variable and $\mathbf{x}^{t}$ is the vector of the predictor variables including those for the three hypotheses. β is the vector of parameters to be estimated and βi is the estimator of predictor variable xi's positive effect on the response variable for a unit change in xi. $e^{t+1}$ represents the standard (normally distributed) error term for this model. We focus on adjustments in periods 1–10 because median contributions, having reached zero in the enhanced treatment, and near zero otherwise (5/6 for black box and 4 for standard), change little after this and we are interested in modelling how cooperative behaviour changes over time.
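
    As a rough illustration of the kind of linear regression described above, the following sketch computes least-squares point estimates in plain Python. It is not the authors' analysis code: the intercept column, the function name `ols` and the synthetic data are assumptions made for the example, and the individual-level clustering of standard errors is not implemented here.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X: list of rows, each a list of predictor values (assumed full column rank).
    y: list of responses, one per row.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[p] * row[q] for row in X) for q in range(k)]
        + [sum(row[p] * yi for row, yi in zip(X, y))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [u - factor * v for u, v in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Synthetic check: rows are [intercept (assumed), predictor 1, predictor 2],
# with predictors in {-1, 0, +1} as in the text. The responses are generated
# from known, made-up coefficients, so the fit should recover them.
X = [[1, 1, 0], [1, -1, 1], [1, 0, -1], [1, 1, 1], [1, -1, -1]]
made_up_beta = [0.1, 0.3, -0.2]
y = [sum(b * x for b, x in zip(made_up_beta, row)) for row in X]
print([round(b, 6) for b in ols(X, y)])
```

    In practice, a statistics package would be used so that standard errors can be clustered by individual; the sketch only shows how each coefficient relates a unit change in a predictor to the response.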

    Our response variable records an individual's directional changes in contributions over time: $y^{t+1} = \operatorname{sign}\left(c_i^{t+1} - \tfrac{1}{2}\left(c_i^{t} + c_i^{t-1}\right)\right)$, and takes the value +1 when representing an increase in contributions (relative to the average of the previous two periods), −1 when representing a decrease and 0 otherwise. Our predictor variables specify the directional change in contributions that should occur in line with the relevant utility function or learning rule.
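
    The coding of the response variable can be sketched as follows. The function and argument names are illustrative; the rule itself (compare the new contribution with the average of the previous two periods) is the one stated in the text.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def response(c_two_back, c_previous, c_new):
    """Directional change in contribution relative to the mean of the
    previous two periods: +1 for an increase, -1 for a decrease, 0 otherwise."""
    reference = (c_two_back + c_previous) / 2
    return sign(c_new - reference)


# Hypothetical contribution histories (in MU) for illustration only.
print(response(10, 20, 18))  # new contribution above the two-period mean of 15
print(response(10, 20, 15))  # equal to the two-period mean
print(response(20, 10, 5))   # below the two-period mean
```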

    The predictor variables xi represent the three different learning rules above by encoding the previous relationship between an agent's contributions and (I) their payoffs, (II) their group-mates' payoffs or (III) their group-mates' actions, respectively. They take integer values from –1 to 1. Specifically, for utility function (I), payoff-based learning, if a player's contribution increased across the two rounds (if $c_i^{t} > c_i^{t-1}$) along with their payoff ($\varphi_i^{t} > \varphi_i^{t-1}$), then we predict that this coupling of increased contributions with ‘success’ (increased payoff) will lead to a contribution increase (relative to the mean of the two previous rounds). We therefore encode this as +1. Likewise, following a contribution decrease and ‘failure’ (if $c_i^{t} < c_i^{t-1}$ and $\varphi_i^{t} < \varphi_i^{t-1}$) we also predict a contribution increase and encode +1. By contrast, following a contribution decrease and ‘success’ (if $c_i^{t} < c_i^{t-1}$ and $\varphi_i^{t} > \varphi_i^{t-1}$) or a contribution increase and ‘failure’ (if $c_i^{t} > c_i^{t-1}$ and $\varphi_i^{t} < \varphi_i^{t-1}$), we predict a contribution decrease (relative to the mean of the two previous rounds) and encode −1, and we predict 0 for all other cases.
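
    The four cases for the payoff-based learning predictor reduce to the product of two signs, as the following sketch shows. It assumes strict inequalities for ‘success’ and ‘failure’ (an unchanged contribution or payoff gives 0); the function names are illustrative.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def payoff_based_signal(c_previous, c_current, payoff_previous, payoff_current):
    """Predicted direction of the next contribution change under rule (I).

    +1: the last change is repeated upwards (increase and success) or
        reversed upwards (decrease and failure);
    -1: decrease and success, or increase and failure;
     0: contribution or payoff unchanged (assumed strict inequalities).
    """
    contribution_change = sign(c_current - c_previous)
    payoff_change = sign(payoff_current - payoff_previous)
    return contribution_change * payoff_change


# Hypothetical values: the player lowered their contribution and earned more,
# so the rule predicts a further decrease.
print(payoff_based_signal(c_previous=20, c_current=10,
                          payoff_previous=44, payoff_current=50))
```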

    For utility function (II), pro-social learning, we likewise encode the value +1 following either a contribution increase and ‘other-regarding success’ (if $c_i^{t} > c_i^{t-1}$ and $\varphi_{-i}^{t} > \varphi_{-i}^{t-1}$, where $\varphi_{-i}^{t}$ denotes the payoffs of player i's group-mates in round t) or a contribution decrease and ‘other-regarding failure’ (if $c_i^{t} < c_i^{t-1}$ and $\varphi_{-i}^{t} < \varphi_{-i}^{t-1}$); –1 following either a contribution decrease coupled with ‘other-regarding success’ (if $c_i^{t} < c_i^{t-1}$ and $\varphi_{-i}^{t} > \varphi_{-i}^{t-1}$) or a contribution increase with ‘other-regarding failure’ (if $c_i^{t} > c_i^{t-1}$ and $\varphi_{-i}^{t} < \varphi_{-i}^{t-1}$), and 0 otherwise. Thus this variable, like the payoff-based learning variable, is positive if the prior directional changes in contributions were maintained after success or reversed after failure, but success and failure are now judged in terms of others' payoffs instead of own payoffs. For our third utility function, (III), conditional cooperation, we encode +1 when there has been an increase in the mean contribution of group-mates across the previous two rounds (if $\bar{c}_{-i}^{t} > \bar{c}_{-i}^{t-1}$, where $\bar{c}_{-i}^{t}$ is the mean contribution of player i's group-mates in round t) and 0 otherwise.
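
    The two remaining predictors can be sketched in the same way. The sketch assumes that others' ‘success’ is judged from the mean payoff of the three group-mates in each round and that inequalities are strict; both are assumptions for illustration, and the function names are hypothetical.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def mean(values):
    return sum(values) / len(values)


def pro_social_signal(c_previous, c_current, others_pay_previous, others_pay_current):
    """Rule (II): as for payoff-based learning, but success and failure are
    judged by the change in group-mates' (mean) payoff, not own payoff."""
    contribution_change = sign(c_current - c_previous)
    others_change = sign(mean(others_pay_current) - mean(others_pay_previous))
    return contribution_change * others_change


def conditional_cooperation_signal(others_c_previous, others_c_current):
    """Rule (III): +1 if the mean contribution of group-mates increased
    across the previous two rounds, 0 otherwise."""
    return 1 if mean(others_c_current) > mean(others_c_previous) else 0


# Hypothetical rounds: the player contributed more and group-mates earned more,
# while group-mates' own mean contribution fell.
print(pro_social_signal(10, 20, [50, 52, 48], [55, 56, 54]))
print(conditional_cooperation_signal([20, 10, 30], [10, 10, 20]))
```

    Note that the conditional cooperation predictor, as described in the text, never takes the value −1, whereas the two learning predictors range over −1, 0 and +1.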

    Positive estimators of the βi mean a positive correlation between the learning rule and the subsequent changes in contributions, and thus support the respective hypothesis, whereas negative estimators, meaning a negative correlation between the learning rule and the subsequent changes in contributions, contradict the respective hypothesis. For pro-social learning, the coefficient indicates whether the average of weights αi on others' income is supportive of pro-sociality (positive) or not. Table 1 summarizes the results according to their implications for the various hypotheses. Table 2 provides full details of the parameter estimates for all models on all the data. The electronic supplementary material provides the parameter estimates for models that analysed sub-sets of the data according to which game-frame order they belonged to (see Material and methods, data collection). We also provide a table detailing the utility functions and their quantitative relationship to the data (electronic supplementary material, table 2).

    Table 1. Summary of results from testing the three different learning rules together. (The table details the statistical significance of the three learning rules (payoff-based learning, pro-social learning and conditional cooperation) for the three information treatments (black box, standard and enhanced). ✓, estimators significantly support direction of hypothesis in this treatment; ✗, estimators significantly contradict direction of hypothesis in this treatment; n.s., non-significant. The values represent the estimate of the effects of unit changes in the hypothesis-specific predictor variables on the response variable; positive (negative) parameter estimators support (contradict) the respective hypothesis. Table 2 details the regressions fully.)

                                 black box     standard     enhanced
    payoff-based learning        ✓ 0.30*       ✓ 0.25*      ✓ 0.14*
    pro-social learning^a        ✗ −0.13*      ✗ −0.23*     ✗ −0.29*
    conditional cooperation^a    n.s. 0.05     ✓ 0.21*      n.s. −0.001

    *significance < 0.001.

    ^a Controlling for payoff-based learning.

    Table 2. A comparison of the different behavioural rules, plus one combining them all together, across three different information treatments. (PBL, payoff-based learning (own success); PSL, pro-social learning (own success and others' success); CC, conditional cooperation (own success and others' actions). All, a combination of all the components from the three rules (own success, others' success, and others' actions). The parameters in the first three rows estimate the effects of unit changes in the predictor variables that act as components in the three learning rules; positive (negative) parameter estimators support (contradict) the respective hypothesis. Cells show estimate (significance); - marks a component not included in that model.)

    black box
                       PBL             PSL              CC               All
    own success        0.31 (0.001)    0.29 (0.001)     0.30 (0.001)     0.30 (0.001)
    others' success    -               −0.12 (0.001)    -                −0.13 (0.001)
    others' actions    -               -                −0.04 (0.241)    0.05 (0.178)
    r²                 0.09            0.10             0.09             0.10
    no. obs.           1888            1888             1888             1888

    standard
                       PBL             PSL              CC               All
    own success        0.28 (0.001)    0.22 (0.001)     0.30 (0.001)     0.25 (0.001)
    others' success    -               −0.16 (0.001)    -                −0.23 (0.001)
    others' actions    -               -                0.09 (0.038)     0.21 (0.001)
    r²                 0.07            0.09             0.07             0.10
    no. obs.           928             928              928              928

    enhanced
                       PBL             PSL              CC               All
    own success        0.22 (0.001)    0.14 (0.001)     0.19 (0.001)     0.14 (0.001)
    others' success    -               −0.29 (0.001)    -                −0.29 (0.001)
    others' actions    -               -                −0.11 (0.012)    −0.001 (0.95)
    r²                 0.04            0.11             0.05             0.11
    no. obs.           960             960              960              960

    3. Results and discussion

    We found that our payoff-based learning rule was significant for all three versions of the public goods game, in contrast to both our pro-social and conditional-cooperation rules which were typically non-significant or significant in the wrong direction (tables 1 and 2; electronic supplementary material).

    (a) Learning in a black box

    In the black box treatment, the behaviour of individuals could best be explained by payoff-based responses, with players significantly learning to improve their income (tables 1 and 2). Figure 1 confirms that this leads to behaviour at the group level which is strikingly similar to play in standard public goods games. By contrast, the pro-social response rule estimate was significantly negative, attributing a negative weight to the welfare of other players. This would represent anti-social preferences if it were not for the asocial frame of the black box treatment and provides a baseline estimate for the anti-social nature of payoff-based learning. The conditional-cooperation response rule was not significant when payoff-based learning was controlled for.

    (b) Learning in public goods games

    We found that the behaviour of individuals in public goods games could, as in the black box, be significantly explained by payoff-based learning, but not by pro-sociality (tables 1 and 2). Again the pro-social learning rule estimated a significantly negative weight on the other players (α), implying ‘anti-social’ behaviour in this socially framed game (tables 1 and 2). The coefficient was considerably larger in magnitude in the enhanced treatment than in the standard treatment, and considerably larger in the standard treatment than in the black box, suggesting that providing players with more information on how contributions benefit others but are personally costly has anti-social consequences. This would not be the case if players understood the game and were willingly sacrificing in order to benefit others.

    Conditional cooperation was significant in the standard version but not the enhanced version of the game, which has identical instructions and game structure, but where individuals were explicitly shown the returns to the other group members from the group project. This enhanced information could of course in principle be calculated by participants in the standard treatment as they knew the decisions of their group-mates. In the standard version, the conditional cooperation rule was less strongly significant unless we controlled for anti-social responses to others' success (table 2).

    Conditional cooperation is proposed to explain the typical decline in contributions over time [19,20], but contributions declined faster in the enhanced treatment where conditional cooperation was either non-significant (combined model, table 2) or significantly negative (non-combined model, table 2). This suggests that the conditional cooperation in the standard treatment is more to do with social learning than social preferences, as the reduced uncertainty in the enhanced treatment may reduce uncertain participants' reliance upon imitation [39]. In addition, if some participants have incorrect beliefs about how the payoffs are determined and choose to match others in the standard treatment, they may be less likely to do so in the enhanced treatment as they revise their mistaken beliefs.

    The dataset we used also contains three additional experimental treatments, where the contributions were multiplied by 6.4 instead of 1.6 and thus the resulting MPCR was 1.6 instead of 0.4 [23]. In these treatments, the MPCR > 1.0, which means that contributing fully was both the income-maximizing (strictly dominant) strategy for any particular round and also the social optimum. We do not analyse the data from these treatments here, because in such treatments it is impossible to differentiate our first and second behavioural rules, as individual and pro-social outcomes are aligned in these settings (there is no conflict between individual and group outcomes). However, the fact that contributions were significantly below full contribution in all three treatments, even after 20 rounds, but increased over time in both the black box and the standard games [23], is also consistent with the payoff-based learning hypothesis.

    However, such payoff-based learning does not require that people realize that the dominant strategy is independent of their group-mates' actions. Therefore the re-start phenomenon [40,41] whereby average cooperation levels temporarily increase from a previous decline when the experiment is ‘re-started’, while challenging, does not falsify learning hypotheses, and may also be partly owing to selfish players attempting to manipulate others [40,42,43].

    (c) Cooperation in public goods games

    Overall, our analyses suggest that changes in behaviour over time in public goods games are largely explained by participants learning how to improve personal income. We found conflicting support for conditional cooperation as such behaviour disappeared when the consequences of contributing were made clearer. This suggests that conditional cooperation is largely due to confusion/error and not pro-sociality. This is reinforced by our lack of evidence of a desire to help others (pro-sociality). Indeed, we found that, if anything, the benefits to others are weighted negatively, with individuals adjusting their behaviour to better reduce the income of others. We are not suggesting that humans are anti-social, nor that they are never pro-social—pro-sociality is found across the tree of life from genes to cells to vertebrates [44]—rather, that public goods games do not demonstrate that humans are uniquely altruistic.

    Our conclusions contradict a widely accepted paradigm in the field of human behaviour, that the results of public goods games reflect a uniquely human regard for the welfare of others [3,18,20]. We suggest that the acceptance of this human pro-sociality hypothesis was based on two things. First, there has perhaps been a lack of control treatments where imperfect behaviour would not always lead to higher than expected levels of cooperation [22], and null hypotheses, such as that provided by the black box treatment [23]. Second, there has been an implicit assumption that humans behave as utility-maximizers, such that their costly choices reliably reveal their (social) preferences [18].

    However, there is an increasing range of evidence that individuals do not play games as perfect maximizing machines [22–25], that they instead exhibit bounded rationality, and can be influenced by a variety of ‘irrelevant’ factors that do not influence payoffs in the game [45–47]. This is in accord with one of the revolutionary findings of behavioural economics, that people are predictably irrational, and make systematic errors that limit their own welfare [28]. Yet paradoxically, the behavioural economics approach is routinely used to ‘measure’ pro-sociality, using methods that rely upon the assumption of rational choice and revealed preferences.

    Data accessibility

    All the data have been submitted to Dryad and are available at doi:10.5061/dryad.cr829.


    We thank Jay Biernaskie, Innes Cuthill, Claire El Mouden, Nichola Raihani and two anonymous referees for comments.

    Funding statement

    We thank the ERC, Nuffield College and the Calleva Research Centre, Magdalen College, for funding.


    †Joint first authors.