Cooperation in the Prisoner’s Dilemma: an experimental comparison between pure and mixed strategies

Cooperation is—despite not being predicted by game theory—a widely documented aspect of human behaviour in Prisoner’s Dilemma (PD) situations. This article presents a comparison between subjects restricted to playing pure strategies and subjects allowed to play mixed strategies in a one-shot symmetric PD laboratory experiment. Subjects interact with 10 other subjects and take their decisions all at once. Because subjects in the mixed-strategy treatment group are allowed to condition their level of cooperation more precisely on their beliefs about their counterparts’ level of cooperation, we predicted the cooperation rate in the mixed-strategy treatment group to be higher than in the pure-strategy control group. The results of our experiment reject our prediction: even after controlling for beliefs about the other subjects’ level of cooperation, we find that cooperation in the mixed-strategy group is lower than in the pure-strategy group. We also find, however, that subjects in the mixed-strategy group condition their cooperative behaviour more closely on their beliefs than in the pure-strategy group. In the mixed-strategy group, most subjects choose intermediate levels of cooperation.


Introduction
The Prisoner's Dilemma (PD) is a social dilemma in which (usually) two players simultaneously face a choice between two options: to cooperate or to defect. The game matrix of the PD with payoffs T > R > P > S is displayed in table 1 (the first payoff in each cell belongs to Player A, the second to Player B). If both players cooperate, they both receive payoff R (for reward). If both players defect, they receive P (for punishment).
To increase the chances that decisions are taken deliberately, subjects in both treatments are asked to state a belief about their opponents' cooperativeness. Eliciting these beliefs also allows us to examine the relationship between beliefs and cooperative behaviour more closely. The elicitation of beliefs about other players' behaviour and the consequences of these beliefs for one's own behaviour was an early research topic. Subjects in PD experiments guess that others will play as they themselves intend to play [21][22][23]. Croson [24] found that asking subjects for their best (binary) estimate of what their counterpart in the experiment would do decreased subsequent cooperation in one-shot PD experiments by about 30% compared with subjects who were not asked. Acevedo & Krueger [25] attribute this relationship between beliefs and behaviour to evidential reasoning and social value orientation. Rubinstein & Salant [26] present related evidence for self-similarity in strategic interactions akin to the PD.
In a post-experimental questionnaire, we asked subjects about control variables we considered important for experiments conducted with students on a university campus. We decided not to include more control variables because the subjects in the experiments were exclusively students, and hence of similar age and educational level, and none of them had previously participated in a PD experiment in the laboratory. We did not include a measure of risk aversion because there is evidence that it does not correlate with behaviour in the PD [27] or the Trust Game [28]. First, we included gender because [29] found females to be more cooperative in the first rounds of a repeated PD experiment (this difference was more pronounced in mixed-gender sessions than when single-gender sessions were compared). See [30, pp. 461-463] for a more general discussion of gender differences in PD experiments and [31] for a meta-study of gender differences in Dictator Game and PD experiments. Second, we included whether subjects had already heard about the experiment (because having heard of the experiment from peers may make subjects behave differently from subjects who have not). Third, we included whether they were familiar with game theory (as the PD is usually taught in game theory classes and knowing the solution may make students behave more in line with theory; see, e.g. [32,33] on the role of subjects' experience in PD experiments). Finally, we asked how many other subjects in the room the subjects knew personally (knowing more of the other subjects personally may make subjects behave more pro-socially, i.e. make them more prone to cooperate in the PD).
Standard game theory predicts that the option to play mixed strategies in a one-shot PD game will not affect cooperation at all. Mutual defection is the game's only Nash Equilibrium, which means that players have no incentive to unilaterally deviate from the probability distribution of 100% defection and 0% cooperation. Empirically, however, up to 80% of the choices in experimental PD games are cooperative, depending on the calibration of the payoffs [34]. For our experiments, we chose the game matrix presented in table 2. It had already been used in [24], who reported a cooperation rate of 55% and a belief rate of 45%. In Pure, pro-social subjects have to face an 'all-or-nothing' decision. Here, uncertainty about others' behaviour is likely to draw pro-social subjects toward defection, because the fear of being taken advantage of overwhelms the desire to maximize joint outcomes. In Mixed, we expect the option to play mixed strategies to encourage pro-social subjects to reciprocally cooperate at least to the same degree that they expect their opponents to cooperate. The crucial point is that only mixing strategies enables subjects to give the best response to their belief. As we expect a distribution very close to 50% cooperation/50% defection of both beliefs and behaviour, the chosen game matrix should give us clear results.
Prediction: The cooperation rate in Mixed is higher than in Pure.
The one-shot decision provides the cleanest test for social dilemmas like the PD. When a decision is only taken once, subjects cannot learn over the course of time (as some subjects gain understanding when feedback is provided [8]). Conditioning one's own behaviour on the observed past behaviour of others is not possible (unlike the reciprocity reported in Public Goods Game experiments, e.g. in [35]) and reputation-building does not play a role (as it does in [36] when interacting more than once with the same subject).

Methods
The replication crisis [37][38][39] has revealed that many results in psychology, experimental economics and other social sciences are not reproducible. We address this crisis by determining the number of required observations with the help of a power calculation (where the expected effect size is based on the literature) before conducting our experiments. Using G*Power 3.1.9.2 [40], a required sample size of 40 in each of the two treatment groups was calculated to provide a statistical power of 1 − β = 0.8 to detect an effect of d = 0.58, assuming a one-sided Wilcoxon rank-sum test and an error probability of α = 0.05. We used the results in [24] and calculated the effect size based on an expected increase in cooperation of 7 percentage points in Mixed over the cooperation rate of 55% reported in [24, Table 4(a), p. 310], whose payoff matrix we also use. We assumed a standard deviation of s.d. = 12.16 in both treatments (calculated from the data points in a recent meta-study [13, fig. 3, p. 71]). A total of 97 students from the University of Potsdam who had subscribed to the ORSEE database (based on [41]) of the Potsdam Laboratory for Economic Experiments (or PLEx, https://www.unipotsdam.de/plex) were recruited to participate in this experiment. These subjects were randomly assigned to two treatments: 48 subjects in Pure and 49 subjects in Mixed. A total of 12-18 subjects participated in each of the six sessions conducted in June 2018. Each subject participated in one session only.
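The effect size stated above follows directly from the reported inputs: d = 7 / 12.16 ≈ 0.58. As a rough cross-check of the required sample size, the following sketch uses a normal approximation for a one-sided two-sample comparison and then divides by 3/π, the asymptotic relative efficiency of the rank-sum test relative to the t-test under normality. Both steps are assumptions made for illustration; this is not a reproduction of the G*Power procedure, and the function name is hypothetical.

```python
from math import ceil, pi
from statistics import NormalDist


def approx_n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a one-sided two-sample test.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d)^2 and
    then inflates it by the A.R.E. of the rank-sum test (3 / pi).
    Illustrative only; G*Power's own computation may differ.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_power = NormalDist().inv_cdf(power)
    n_t = 2 * ((z_alpha + z_power) / d) ** 2   # t-test-like approximation
    n_ranksum = n_t / (3 / pi)                 # adjustment for the rank-sum test
    return ceil(n_t), ceil(n_ranksum)


d = 7 / 12.16  # expected increase (percentage points) / assumed s.d.
n_t, n_ranksum = approx_n_per_group(d)
print(round(d, 2), n_t, n_ranksum)
```

Under these assumptions the adjusted approximation comes out at 40 subjects per group, in line with the sample size reported above.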
After entering the laboratory, subjects were randomly assigned to a computer terminal, after which point any communication between subjects was forbidden. Blinds between workstations prohibited subjects from looking at their neighbours' screens and observing their decisions. A blank sheet of paper and a pen were provided for each subject. Experimental instructions were displayed on the computer screen at the beginning of the experiment (for translations of the experiment and screenshots in German, refer to the repository in the Data Accessibility statement). Sessions were either Pure or Mixed sessions so that the instructions were identical for all subjects in the room. Each experimental session lasted about 15 min. Subsequent to the experimental game, subjects were asked to fill in a short questionnaire collecting information about subjects' gender (dummy variable Female = 1 if female), whether they had already heard about the experiment (dummy variable Known Experiment = 1 if yes), whether they were familiar with game theory (dummy variable Game Theory = 1 if yes) and how many other subjects in the room they knew personally (variable Known Subjects = number of known subjects). Subjects earned a show-up fee of €4 and an average of €6.18 in the game (€6.47 in Pure, €5.90 in Mixed). Subjects received their payoff in private. The experiment was programmed in z-Tree [42] and framed in a neutral way. In both groups, subjects were presented with the payoff matrix in table 2. Cooperation was labelled decision A, defection decision B.
In both groups, subjects had to take one single payoff-relevant decision. In Pure, subjects had to decide to play either decision A or decision B in all 10 subsequent interactions (variable Cooperation: either 0 or 1, transformed into rates of either 0 or 100). In Mixed, in contrast, subjects had to decide in how many of the 10 interactions they would take decision A (variable Cooperation: integers between 0 and 10, transformed into rates between 0 and 100). In the remaining interactions, they played decision B. The order in which they played the chosen mix of A and B against their counterparts was randomly determined by the computer. Following this, the computer randomly matched each subject, for each interaction, with one of 10 other subjects in the room. Each subject's payoff from the experiment was the sum of profits earned in the 10 interactions. Subjects did not receive any information about their counterparts or other subjects' decisions.
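The decision and payoff mechanism described above can be sketched as follows. The payoff values are hypothetical placeholders that merely satisfy T > R > P > S; they are not the values of table 2. The function names and the simulated counterpart decisions are likewise assumptions made for illustration.

```python
import random

# Hypothetical payoffs with T > R > P > S; NOT the values of table 2.
T, R, P, S = 5, 3, 1, 0


def payoff(own_coop, other_coop):
    """Payoff of one interaction from the point of view of 'own'."""
    if own_coop:
        return R if other_coop else S
    return T if other_coop else P


def play_schedule(k, rng):
    """Ten decisions with k cooperations (decision A), in random order.

    Pure corresponds to k in {0, 10}; Mixed allows any k from 0 to 10.
    """
    decisions = [True] * k + [False] * (10 - k)
    rng.shuffle(decisions)
    return decisions


def total_payoff(own, others):
    """Sum of profits over the 10 interactions."""
    return sum(payoff(a, b) for a, b in zip(own, others))


rng = random.Random(1)
own = play_schedule(6, rng)                       # Mixed: cooperate in 6 of 10
others = [rng.random() < 0.5 for _ in range(10)]  # made-up counterpart decisions
print(sum(own) * 10, total_payoff(own, others))   # cooperation rate, total payoff
```

The cooperation rate is simply the number of A decisions times 10, which is how both the binary Pure decision and the integer Mixed decision map onto the same 0 to 100 scale.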
Before subjects took their decision, they were asked to give a (non-incentivized) estimate of the other subjects' behaviour. In Pure, subjects had to state how many of their 10 interaction partners they expected would choose decision A (variable Belief: integer between 0 and 10, transformed into rates between 0 and 100). In Mixed, subjects had to state in how many interactions they believed their 10 interaction partners would choose decision A on average (variable Belief: number with up to two decimal places between 0 and 10, also transformed into rates).

Comparison of treatment means
Most important are the comparisons of the means of the two variables of interest, Cooperation and Belief, in our treatment groups (both variables are expressed here as rates and range between 0 and 100%). We also check our control variables for balanced samples, as differences between treatments may affect the outcomes. Table 3 presents the sample means, differences between treatments and test results on the differences between the treatments. We randomly assign 49 subjects to Mixed and 48 subjects to Pure. We do not exclude any observations.
royalsocietypublishing.org/journal/rsos R. Soc. open sci. 6: 182142
In order to compare the (quasi-)continuous variables in the two independent samples, we use the Wilcoxon rank-sum test. It is a non-parametric test: in contrast to the t-test, it requires neither the assumption that both samples are of equal variance nor that the two samples are normally distributed. We apply the χ²-test to detect differences in the frequencies of binary categories in the two independent samples.
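As a minimal sketch of how a two-sided Wilcoxon rank-sum test works, the following pure-Python function ranks the pooled observations (using midranks for ties) and applies the tie-corrected normal approximation. The function name and the sample values are made up for illustration; this is not the software or the data used in the study, and exact small-sample p-values would differ from this approximation.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (z, p), where z is based on the rank sum of sample x.
    """
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2          # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i                          # size of this tie group
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:                         # all observations identical
        return 0.0, 1.0
    z = (w - mean_w) / sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Made-up cooperation rates (0-100), not the experimental data.
pure = [100, 100, 0, 100, 100, 0, 100, 100]
mixed = [60, 50, 70, 40, 80, 50, 60, 70]
z, p = rank_sum_test(pure, mixed)
print(round(z, 3), round(p, 4))
```

Because the test only uses ranks, it applies equally to the binary rates in Pure (0 or 100) and the graded rates in Mixed; the many ties this produces are the reason for the tie correction in the variance.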

Result
Our main question concerns the difference in cooperation rates between the Mixed treatment group and the Pure control group. The cooperation rate in Mixed is 60%, in Pure 75%. A two-sided Wilcoxon rank-sum test shows the difference between the two groups to be highly statistically significant (p = 0.0003).
Our prediction that the possibility to play mixed strategies will increase cooperation in the PD is shown to be wrong: the cooperation rate in Pure is higher than in Mixed.
Beliefs about other subjects' cooperativeness may, of course, also be affected by the decision environment (Belief is an endogenous variable). A two-sided Wilcoxon rank-sum test finds the difference in Belief between Mixed and Pure to be statistically significant (p = 0.0396). Hence, the subjects' beliefs correctly reflect the lower cooperation rate in Mixed compared to Pure.
In our check for balanced samples, only the variable Known Subjects was found to be statistically different between the treatments (p = 0.0388). We will later include this variable in a robustness check of the different cooperation rates identified in the two treatments.

Test for gender differences in beliefs and cooperation
Given the interest in gender differences in cooperation mentioned in the introduction, we briefly examine the relationship between gender and cooperation rate and between gender and belief separately. We neither observe a statistically significant relationship between

The relationship between cooperation and beliefs
First, we consider the distributions of the variables Belief and Cooperation. Figure 1 displays histograms of these two variables in Pure and Mixed. We observe that the distributions of Belief in
Table 3. Variable means in both treatments and tests of differences. Note: standard deviations in parentheses; asterisks indicate a difference between the treatments.
This leads us to the main issue in this section: the relationship between the subjects' beliefs regarding the cooperative play of others and their own decision. Figure 2 shows a boxplot of Belief by Cooperation in Pure. Cooperators have a slightly higher median Belief than defectors and their beliefs are more compressed. However, the Pearson correlation coefficient of 0.140 is not found to be significantly different from zero (p = 0.3445). Figure 3 shows a scatterplot which suggests a linear relationship between Cooperation and Belief in Mixed. A positive correlation between Cooperation and Belief in this treatment is confirmed by a Pearson correlation coefficient of 0.403, significantly different from zero (p = 0.0041).
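The significance statements for the two correlation coefficients can be related to the usual t statistic for testing a zero correlation, t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom. The sketch below assumes that all 48 subjects in Pure and all 49 subjects in Mixed enter the correlations (no observations were excluded); the function name is hypothetical, and the p-values themselves would require a t-distribution routine that is not included here.

```python
from math import sqrt


def corr_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


t_pure = corr_t_statistic(0.140, 48)   # Pure: r = 0.140, assumed n = 48
t_mixed = corr_t_statistic(0.403, 49)  # Mixed: r = 0.403, assumed n = 49
print(round(t_pure, 2), round(t_mixed, 2))
```

A statistic of roughly 0.96 in Pure against roughly 3.02 in Mixed fits the pattern reported above: only in Mixed is the correlation clearly distinguishable from zero.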

Controlling for confounds using OLS regressions
Does the result that subjects in Mixed cooperate less than the subjects in Pure still hold if we control for the two variables that differed between treatments? Table 4 displays the results from OLS regressions (in economics, the multivariate ordinary least-squares regression is the most common technique to estimate relationships between variables while controlling for covariates' influence). In Model 1, we regress Cooperation, using our pooled data, on a constant and a treatment dummy for Mixed. The result confirms our previous finding: significantly more cooperation in Pure (t-test, p = 0.043). In Model 2, we extend Model 1 by adding Belief into the regression. Both variables are statistically
Which model provides the best statistical fit (as we neither want to overfit nor underfit our model)? The measure of explained variance, adjusted R², is highest for Model 2, and the Akaike and Bayesian information criteria (AIC and BIC; the most common criteria for model selection) are lowest for Model 2. All three metrics indicate that Model 2 provides the best statistical fit of the three models. We conclude from this robustness check that the cooperation rate in Pure is higher than in Mixed even when we control for the variable Belief (which is endogenous to the two treatment groups), contrary to our prediction.
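To illustrate what a regression like Model 1 estimates, the following sketch fits an OLS regression of an outcome on a constant and a single 0/1 dummy in pure Python. In this special case the intercept equals the mean of the group coded 0 (here Pure) and the dummy coefficient equals the difference in group means. The data are made up (chosen only so that the group means are 75 and 60), the function name is hypothetical, and the sketch covers neither standard errors nor the additional covariates of the other models.

```python
from statistics import mean


def ols_on_dummy(y, d):
    """OLS of y on a constant and a 0/1 dummy d; returns (intercept, slope)."""
    d_bar, y_bar = mean(d), mean(y)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    return y_bar - slope * d_bar, slope


# Made-up cooperation rates (0-100), not the experimental data.
pure = [100, 100, 0, 100]        # mean 75
mixed = [60, 50, 70, 40, 80]     # mean 60
y = pure + mixed
d = [0] * len(pure) + [1] * len(mixed)  # treatment dummy: 1 = Mixed
intercept, slope = ols_on_dummy(y, d)
print(round(intercept, 6), round(slope, 6))
```

A negative coefficient on the Mixed dummy therefore corresponds directly to a lower average cooperation rate in Mixed than in Pure; adding Belief as a regressor asks whether that gap persists among subjects with the same stated belief.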

Conclusion
To summarize, we conducted one-shot PD game experiments. Our treatment variable was the opportunity to play mixed strategies. In a control group, subjects were limited to playing either full cooperation or full defection against 10 other subjects. In the treatment group, the subjects were allowed to choose any mix of the two strategies. Before subjects took their decision, we elicited their beliefs about the other subjects' level of cooperativeness. Using a two-sided test, we found that, contrary to our prediction, the cooperation rate in Pure was actually higher than in Mixed. Even after controlling for the subjects' beliefs in OLS regressions, this difference remains significantly different from zero (though only at the 10% level). As we conducted only a power calculation for a comparison of the treatment averages for Cooperation, we are careful with the interpretation of the higher cooperation rate we detected in Pure. However, we see our findings as an indication that cooperation rates differ when subjects can use mixed strategies.
A reviewer of this paper pointed out that the subjects in Mixed might cooperate with a certain probability. In Pure, these subjects would only cooperate if this probability is higher than a certain threshold (it is very likely that they only cooperate if they believe that more than 50% of the other subjects also cooperate). This switching-point theory sounds reasonable. However, in order to test it one would require an experimental design where each subject goes through both a Pure and a Mixed stage (the order of Pure and Mixed should be randomized between subjects and controlled for in the analyses). With our design, we only test whether cooperation rates (and beliefs) differ when subjects can play mixed strategies (between subjects). The beliefs about the cooperativeness of others are endogenous in the control and in the treatment group.
A previous study showed how cooperation rates vary across one-shot symmetric PD experiments when the cooperation/cooperation payoff in the underlying game matrices is varied [43]. They find, as predicted, that the cooperation rate increases when they increase the payoff. They also find that the beliefs about other subjects' behaviour (which were elicited after the subjects took their decision) closely track the cooperation rate in the respective treatment.
We think it would be interesting to combine the experimental design in [43] with our approach. Depending on the parametrization of the PD game matrix, the effect of mixed strategies may be different. When the cooperation rate in a Pure treatment is very low, this rate may be higher in a Mixed treatment (due to subjects who do not completely defect but choose an intermediate level of cooperation). This, of course, requires another series of experiments. These experiments could also include a questionnaire asking for the subjects' social value orientation in order to disentangle the subjects' motives for cooperating (see [44] for a meta-study of social value orientation in social dilemmas).
Ethics. Economic experiments like ours are not subject to approval by the university's ethical review board (https://www.uni-potsdam.de/senat/kommissionen-des-senats/ek.html). A general informed consent/data privacy statement was signed by all subjects prior to the first experiment at the PLEx. No minors participated in the experiments.
Competing interests. We declare we have no competing interests.
Funding. We acknowledge the support of Deutsche Forschungsgemeinschaft (German Research Foundation) and Open Access Publication Fund of University of Potsdam.