Journal of The Royal Society Interface

The effect of model uncertainty on cooperation in sensorimotor interactions


    Decision-makers have been shown to rely on probabilistic models for perception and action. However, these models can be incorrect or partially wrong in which case the decision-maker has to cope with model uncertainty. Model uncertainty has recently also been shown to be an important determinant of sensorimotor behaviour in humans that can lead to risk-sensitive deviations from Bayes optimal behaviour towards worst-case or best-case outcomes. Here, we investigate the effect of model uncertainty on cooperation in sensorimotor interactions similar to the stag-hunt game, where players develop models about the other player and decide between a pay-off-dominant cooperative solution and a risk-dominant, non-cooperative solution. In simulations, we show that players who allow for optimistic deviations from their opponent model are much more likely to converge to cooperative outcomes. We also implemented this agent model in a virtual reality environment, and let human subjects play against a virtual player. In this game, subjects' pay-offs were experienced as forces opposing their movements. During the experiment, we manipulated the risk sensitivity of the computer player and observed human responses. We found not only that humans adaptively changed their level of cooperation depending on the risk sensitivity of the computer player but also that their initial play exhibited characteristic risk-sensitive biases. Our results suggest that model uncertainty is an important determinant of cooperation in two-player sensorimotor interactions.

    1. Introduction

    When interacting with its environment, the human sensorimotor system has been shown to use predictive models for control and estimation [1–6]. These models are thought to be probabilistic in nature, and considerable evidence suggests that learning of such models is consistent with the process of Bayesian inference [7–10]. Such probabilistic models are not only important for perception but can also be used for decision-making and motor control [11–16]. Importantly, decision-makers who maximize expected gain (or minimize expected costs) require probabilistic models of their environment so that they can determine an expectation value. However, such optimal decision-makers have no performance guarantees if their model happens to be partially or entirely incorrect [17]. This raises the issue of decision-making strategies that do not rely on accurate probabilities. An extreme example of a strategy that dispenses with probabilities altogether is the maximin strategy, where the decision-maker picks the action that is optimal under the assumption of a worst-case scenario (or the minimax strategy in the case of costs). Such a decision-maker would, for example, take out insurance not for the calamity with the highest expected costs, but for the most disastrous (possibly low-probability) calamity, because the probabilities are not trusted. Similarly, an extremely optimistic decision-maker would assume a best-case scenario following a maxmax strategy (or a minmin strategy in the case of costs), for example by buying the lottery ticket with the highest prize, irrespective of the presumed winning probabilities. Risk-sensitive decision-makers strike a compromise between the two extremes: they have a probabilistic model that they distrust to some extent, but they do not completely dismiss it, though the extreme cases of robust (or optimistic) and expected-gain decision-making can also be considered as risk-sensitive limit cases [18].

    More formally, we can think of a decision-maker who considers model uncertainty in the following way [17,19]. Initially, the decision-maker has a probabilistic model p0, but knowing that this model may not be entirely accurate, the decision-maker allows deviations from it, which leads to a new effective probabilistic model p. The transformation between p0 and p has to be constrained when the decision-maker is very confident about the model. Conversely, when the decision-maker is very insecure about the correctness of the model, there should be leeway for larger deviations. The effective value of a choice set with outcomes x under the effective probability p can then be stated as

    $$V = \operatorname*{ext}_{p}\left[\,\sum_{x} p(x)\,U(x) \;-\; \frac{1}{\beta}\,D_{\mathrm{KL}}\!\left(p \,\|\, p_{0}\right)\right],\tag{1.1}$$
    where the utility U(x) quantifies the desirability of x. The first term is the expected utility under p, and the second term, formed by the cost factor 1/β times the Kullback–Leibler (KL) divergence, captures the cost of the transformation from p0 to p. When 1/β > 0, the extremum operator is a max operator (concave maximization); when 1/β < 0, it is a min operator (convex minimization). Sensitivity to model uncertainty is modulated by β. When β → 0, we recover a decision-maker without model uncertainty, since the extremal p then coincides with p0. For β → −∞, we get a maximin decision-maker who picks the choice set with maximum V, where each V considers the worst-case scenario of the choice set. In fact, the quantity V is a free energy difference, and equation (1.1) can be motivated by statistical physics (see §2.1); the extremum is attained by an exponentially tilted distribution p(x) ∝ p0(x) e^{βU(x)}, which yields the closed form V = (1/β) log Σx p0(x) e^{βU(x)}.
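Equation (1.1) has the closed-form solution V = (1/β) log Σx p0(x) e^{βU(x)}, the exponentially tilted expectation derived in §2.1. A minimal numerical sketch (the function name and the two-outcome gamble are our own) illustrates the three regimes:

```python
import numpy as np

def risk_sensitive_value(p0, U, beta):
    """Value of a choice set under model uncertainty, equation (1.1).

    beta > 0 tilts the model p0 optimistically (towards high-utility
    outcomes), beta < 0 pessimistically; beta -> 0 recovers the plain
    expected utility. Uses the closed form
    V = (1/beta) * log(sum_x p0(x) * exp(beta * U(x))).
    """
    p0, U = np.asarray(p0, float), np.asarray(U, float)
    if beta == 0:
        return float(np.dot(p0, U))
    z = beta * U
    m = z.max()                      # log-sum-exp for numerical stability
    return float((m + np.log(np.dot(p0, np.exp(z - m)))) / beta)

# a 50:50 gamble between utilities 0 and 1
p0, U = [0.5, 0.5], [0.0, 1.0]
print(risk_sensitive_value(p0, U, 0))     # expected utility: 0.5
print(risk_sensitive_value(p0, U, 50))    # approaches the best case, 1.0
print(risk_sensitive_value(p0, U, -50))   # approaches the worst case, 0.0
```

As β sweeps from −∞ to +∞, the value interpolates continuously between the worst-case (maximin) and best-case (maxmax) evaluations of the same gamble.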

    Recently, it was found that model uncertainty also affects decision-making in sensorimotor integration tasks where subjects have to form beliefs about latent variables, for example the position of a hidden target [20]. However, latent variables do not only play an important role in single-player environments but also in multi-player sensorimotor interactions [21,22], where the policy of the other player can be considered as a particular latent variable. Sensorimotor interactions in humans range from hand shaking and avoiding collisions with another passer-by to tandem riding, tango dancing and arm wrestling. As in the case of single-player environments, the presence of a latent variable suggests the formation of a belief model that can be exploited for prediction and planning [7–10]. And as in the case of the single-player environment, decision-makers might exhibit model uncertainty [20]. Especially when meeting a player for the first time, only a little information about this player's strategies is available. The initial trust or distrust with respect to this player can be thought of as an optimistic or pessimistic bias. However, as more information about the unknown player becomes available, such deviations should vanish and be replaced by accurate statistical estimates.

    Sensorimotor interactions can be of cooperative nature, as in the case of dancing, or of competitive nature as in the case of arm wrestling. To investigate the effect of model uncertainty on cooperation, we study sensorimotor interactions similar to the stag-hunt game. In the stag-hunt game, each player decides whether to hunt a higher-valued stag or a lower-valued hare. However, the stag is caught only if both players have decided to hunt stag. By contrast, a hare can be caught by each player independently. The stag-hunt game is a coordination game with two pure Nash equilibria, given by the pay-off-dominant stag solution, where both players hunt stag and achieve the highest possible pay-off, and the risk-dominant hare solution, where both players hunt hare and obtain a lower pay-off. The latter solution is called risk-dominant, because a player hunting hare knows exactly the pay-off he will receive, which is higher than he would get if he hunted stag by himself. The stag-hunt game is therefore often used to study the emergence of cooperation.

    In our study, we investigate a decision-making model that forms Bayesian beliefs about the other player's strategy based on empirically observed choice frequencies. In simulations, we study how model uncertainty with respect to these beliefs affects cooperation in a stag-hunt-like setting. To test human behaviour in stag-hunt-like sensorimotor interactions, we use a previously developed paradigm that allows translating 2 × 2 matrix games into sensorimotor interactions [21,22]. In the experiment, one of the players is simulated by a virtual computer player who is based on our risk-sensitive decision-making model. This way, we can directly manipulate the risk sensitivity of the artificial player and observe the response of the human player.

    2. A risk-sensitive model of interaction

    Classic models in game theory are usually equilibrium models that predict the occurrence of Nash equilibria, that is, joint settings of strategies where no individual player has any incentive to deviate unilaterally from their strategy [23]. In evolutionary game theory, this problem is addressed by developing dynamic learning models that converge to the equilibria [24]. One of the simplest classes of such learning models is fictitious play [25–27]. In fictitious play, it is assumed that the other player plays with a stationary strategy, which is estimated by the hitherto observed empirical choice frequencies. In our model, we also adopt the assumption of modelling the other player with a stationary strategy, but form a Bayesian belief about this strategy. In the case of the stag-hunt game, this strategy is a distribution over a binary random variable that indicates the two possible actions, namely whether to hunt stag or hare. This distribution can be expressed as a beta distribution. After observing S choices of stag and H choices of hare from the opponent, the decision-maker's belief about the strategy x of the other player is then given by

    $$p(x \mid S, H) = \frac{x^{S}\,(1-x)^{H}}{B(S+1,\,H+1)},\tag{2.1}$$
    where B denotes the beta function and the opponent's stationary strategy is represented by the probability x of choosing stag. For a known strategy x* of the opponent, where p(x) = δ(x − x*), the decision-maker faces the following expected pay-off
    $$\mathbb{E}\!\left[U(a_1)\right] = x^{*}\,U(a_1, \mathrm{S}) + (1 - x^{*})\,U(a_1, \mathrm{H}),\tag{2.2}$$
    with U(a1, a2) denoting the player's pay-off if he chooses action a1 and the opponent chooses a2. Under strategy x*, the opponent chooses a2 = S with probability x* and a2 = H with probability 1 − x*. In fictitious play, the decision-maker simply gives a best response to this expected pay-off, where x* is given by the empirical frequencies and corresponds to the mean of the beta distribution. By contrast, we construct a decision-maker that takes the uncertainty of the x-estimate into account and exhibits risk sensitivity with respect to this belief over x. This can be achieved by inserting (2.1) as p0 and (2.2) as U(x) into equation (1.1), which results in
    $$V(a_1) = \frac{1}{\beta}\,\log \int_{0}^{1} p(x \mid S, H)\; e^{\beta\left[x\,U(a_1,\mathrm{S}) \,+\, (1-x)\,U(a_1,\mathrm{H})\right]}\, \mathrm{d}x.\tag{2.3}$$
    The value V(a1) assigned to each action depends on the parameter β, that in our case represents the risk sensitivity. For action selection, we assume a soft-max decision rule
    $$P(a_1) = \frac{e^{\alpha V(a_1)}}{\sum_{a_1'} e^{\alpha V(a_1')}},\tag{2.4}$$
    where α is a rationality parameter that regulates how deterministic the response is. Soft-max decision rules are prevalent in quantal response equilibrium models to formalize the bounded rationality of decision-makers in games [28]. This includes the theoretical best response in the limit α → ∞, which corresponds to a perfectly rational agent that is able to distinguish between tiny differences in the values V. At the other end of the spectrum is a decision-maker with α → 0, which leads to P(a1) → 0.5, corresponding to an irrational agent that only produces random actions. In the remainder of the paper, we will also refer to P(a1) as λ1 if chosen by player 1 and λ2 if chosen by player 2.
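The belief, value and choice rule above can be condensed into a short numerical sketch. The pay-off matrix below is a generic stag-hunt of our own choosing, not the experimental matrix of figure 4a, and all function names are ours; the belief integral is evaluated on a grid over x:

```python
import numpy as np

# illustrative stag-hunt pay-offs U[a1][a2] for the row player
# (NOT the experimental matrix of figure 4a); 0 = stag, 1 = hare
U = np.array([[4.0, 0.0],     # stag: 4 if the opponent hunts stag, else 0
              [3.0, 3.0]])    # hare: safe pay-off of 3 either way

def value(a1, S, H, beta, n=2001):
    """Risk-sensitive value of action a1 after observing the opponent
    choose stag S times and hare H times; the beta-distributed belief
    and the integral over x are evaluated on a grid."""
    x = np.linspace(0.0, 1.0, n)
    w = x**S * (1.0 - x)**H                   # unnormalized Beta(S+1, H+1)
    w /= w.sum()
    eu = x * U[a1, 0] + (1.0 - x) * U[a1, 1]  # expected pay-off given x
    if beta == 0:
        return float(np.dot(w, eu))
    z = beta * eu
    m = z.max()                               # log-sum-exp for stability
    return float((m + np.log(np.dot(w, np.exp(z - m)))) / beta)

def p_stag(S, H, alpha, beta):
    """Soft-max choice rule: probability of choosing stag."""
    v = np.array([value(0, S, H, beta), value(1, S, H, beta)])
    e = np.exp(alpha * (v - v.max()))
    return float(e[0] / e.sum())

# after seeing one stag and two hares, an optimist (beta = 20) still
# cooperates, while a pessimist (beta = -10) defects
print(p_stag(S=1, H=2, alpha=10, beta=20))    # close to 1
print(p_stag(S=1, H=2, alpha=10, beta=-10))   # close to 0
```

With these placeholder pay-offs the qualitative pattern matches the text: the same evidence yields opposite choices depending on the sign of β, and the effect shrinks as the belief sharpens.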

    The expression for the value V also models the learning process for the parameter x. In the limit when x is completely determined, the distribution p(x|S, H) approaches a delta function in x. In that case, the integral collapses, and the free energy becomes equal to the expected pay-off. Fictitious play is therefore obtained in the limit of p(x|S, H) → δ(x − x*) and α → ∞. Before this limit is reached, the distribution p(x|S, H) captures the uncertainty over the opponent, and the temperature parameter β determines the risk sensitivity with respect to this distribution. In the infinitely risk-seeking limit β → ∞, the decision-maker is so optimistic about the stag outcome that he will ignore any information to the contrary, and such a player will always cooperate independent of the history of the game. This is because

    $$\lim_{\beta \to \infty} V(a_1) = \max_{x \in [0,1]} \left[x\,U(a_1,\mathrm{S}) + (1-x)\,U(a_1,\mathrm{H})\right],$$
    which for a1 = stag is attained at x = 1 and thus equals the pay-off of mutual stag hunting. Similarly, an infinitely risk-averse decision-maker (β → −∞) is so pessimistic that he will only expect the worst-case scenario. This decision-maker will never cooperate, independent of any experienced play. For any finite settings of α and β, both cooperative and non-cooperative solutions can occur.

    2.1. Model uncertainty and statistical physics

    The central idea of having model uncertainty is that we do not fully trust our probabilistic model p0(x) of a latent variable x. We therefore bias our estimates of x taking into account our utility function U(x). If we are extremely pessimistic and cautious, for example, we will completely dismiss our probability model and simply assume a worst-case scenario. We then pick the action with the best worst-case scenario. If we fully trust our probability model, then we will pick the action with the highest expected utility. But if we are a risk-averse decision-maker with a finite amount of model uncertainty, we compromise between the two extremes and bias our probability model towards the worst-case to some extent.

    This decision-making scenario can be translated into state changes in physical systems, where we start with a probability distribution p0(x) and end up with a new distribution p(x), because we have added an energy potential Δϕ(x) to the system. In this analogy, energy plays the role of a negative utility. In physics, a statistical system in equilibrium can be described by a Boltzmann distribution p0(x) = (1/Z0) e^{−βϕ0(x)} with inverse temperature β = 1/kT, energy potential ϕ0(x) and partition sum Z0. The distribution p0 is called an equilibrium distribution, because it minimizes the free energy

    $$F[q] = \sum_{x} q(x)\,\phi_0(x) + \frac{1}{\beta}\sum_{x} q(x)\,\log q(x),$$
    such that p0 = arg minq F[q] with F[p0] = −(1/β) log Z0. If an energy potential Δϕ(x) is now added to the system, then the new equilibrium distribution that arises is p(x) = (1/Z1) e^{−β[ϕ0(x)+Δϕ(x)]} with partition sum Z1. This equilibrium distribution minimizes a free energy F1[q]
    $$F_1[q] = \sum_{x} q(x)\left[\phi_0(x) + \Delta\phi(x)\right] + \frac{1}{\beta}\sum_{x} q(x)\,\log q(x).$$

    The distribution p = arg minq F1[q] can be interpreted as the biased model. If the inverse temperature β is low, then p is going to be very similar to p0; if the inverse temperature β is high, then p is going to be biased towards low-energy outcomes of the added potential Δϕ. In the KL-control setting [29–31], p0 is the equilibrium distribution resulting from the uncontrolled dynamics, whereas p corresponds to the controlled dynamics.

    Both free energies can be combined into a free energy difference as a single variational principle such that

    $$\Delta F[q] = F_1[q] - F[p_0] = \sum_{x} q(x)\,\Delta\phi(x) + \frac{1}{\beta}\,D_{\mathrm{KL}}\!\left(q \,\|\, p_0\right)$$
    and p = arg minq ΔF[q] with ΔF[p] = −(1/β) log Σx p0(x) e^{−βΔϕ(x)} = −(1/β) log(Z1/Z0). When replacing Δϕ(x) = −U(x), we recognize in −ΔF[q] the same variational principle as suggested in equation (1.1) to describe model uncertainty. This variational principle has recently been suggested as a principle for decision-making with information-processing costs [32–34]. Moreover, in non-equilibrium thermodynamics, the same expression for the free energy difference ΔF[p] can be obtained from the Jarzynski equality for infinitely fast switching between the two states. Crucially, the Jarzynski equality holds for any switching process between the two states, and generalizes classical results for infinitely slow and fast switching [35]. When the utilities are negative log-likelihoods of outcomes under a generative model, this becomes the free energy principle that has recently been proposed to model action and perception in living systems trying to minimize surprise [36].

    3. Simulation results

    To illustrate the behaviour that arises when two decision-makers interact following equation (2.3), we simulated two model players with rationality parameter α1 = α2 = 10 and risk-sensitive parameters β1 = −10 and β2 = 20 for player 1 and 2, respectively. In figure 1, we depict beliefs and action probabilities of the two players after the pessimistic player 1 played stag once and hare twice, and the very optimistic player 2 played stag three times in a row. Accordingly, player 1's belief about player 2 is biased towards cooperative strategies (figure 1a), whereas player 2's belief about player 1 is biased towards non-cooperative strategies (figure 1b). Despite being risk-averse, player 1 has a higher probability for cooperation, given the strong evidence of cooperative behaviour of player 2. By contrast, player 2 has evidence of non-cooperativeness of player 1, but because he is optimistic, he most probably chooses to cooperate anyway. In figure 2, it can be seen how both players converge to a cooperative equilibrium after 25 interactions. In the left panel, the mean and standard deviations of the beta distribution beliefs of the two players are shown over the course of the 25 trials. It can be seen that both beliefs converge towards cooperative strategies, implying that both players believe in the cooperativeness of the other player. In the right panel, the action probabilities of choosing stag for both players are shown. Both action probabilities converge to cooperative strategies.
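The interaction just described can be sketched as a simulation loop in which each player updates counts of the opponent's choices and responds stochastically through the soft-max rule. The pay-off matrix here is again an illustrative placeholder, so the equilibrium that emerges depends on it and on the sampled history; this sketch shows the mechanics rather than reproducing figure 2:

```python
import numpy as np

def p_stag(S, H, alpha, beta, U=((4.0, 0.0), (3.0, 3.0)), n=1001):
    """Probability of hunting stag after observing the opponent choose stag
    S times and hare H times; pay-offs are illustrative placeholders,
    not the experimental matrix."""
    x = np.linspace(0.0, 1.0, n)
    w = x**S * (1.0 - x)**H                       # unnormalized belief
    w /= w.sum()
    v = []
    for a in (0, 1):                              # 0 = stag, 1 = hare
        eu = x * U[a][0] + (1.0 - x) * U[a][1]
        if beta == 0:
            v.append(float(np.dot(w, eu)))
        else:
            z = beta * eu
            v.append(float((z.max() + np.log(np.dot(w, np.exp(z - z.max())))) / beta))
    e = np.exp(alpha * (np.array(v) - max(v)))    # soft-max choice rule
    return float(e[0] / e.sum())

rng = np.random.default_rng(1)
alpha = 10.0
betas = (-10.0, 20.0)      # player 1 risk-averse, player 2 risk-seeking
obs = [[0, 0], [0, 0]]     # each player's [stag, hare] counts of the other
for trial in range(25):
    probs = [p_stag(obs[i][0], obs[i][1], alpha, betas[i]) for i in (0, 1)]
    acts = [int(rng.random() >= probs[i]) for i in (0, 1)]
    obs[0][acts[1]] += 1   # player 1 observes player 2's action
    obs[1][acts[0]] += 1   # and vice versa
print(obs, [round(q, 3) for q in probs])
```

Tracking the belief means obs[i][0]/(obs[i][0]+obs[i][1]) and the choice probabilities over trials reproduces the kind of trajectories plotted in figure 2.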

    Figure 1.

    Figure 1. Belief and action probabilities. Belief probability of player 1 (a) and player 2 (b) after observing three actions of the other player. Player 1 observed three cooperative actions and player 2 observed one cooperative and two non-cooperative actions. Accordingly, player 1 has more probability mass on the right half, whereas player 2 has more probability mass on the left half. Action probability of player 1 (c) and player 2 (d) resulting from the beliefs and the player's risk sensitivity. Player 1 has a higher probability to cooperate even though he is risk-averse, due to the strong evidence of cooperation. Player 2 also places high probability on cooperation, because he is strongly risk-seeking, even though the evidence points more towards a non-cooperative opponent. (Online version in colour.)

    Figure 2.

    Figure 2. Evolution of (a) belief and (b) action probabilities over 25 trials. (a) Mean and standard deviation of the beta distribution reflecting each player's beliefs. (b) Action probabilities of the players according to equation (2.3). The third trial corresponds to the beliefs and actions displayed in figure 1. (Online version in colour.)

    In the bottom row of figure 3, we show the probability of a cooperative equilibrium after 25 interactions depending on all possible combinations of risk sensitivities of the two players ranging from risk-averse (β = −20) to risk-seeking (β = +20). In this simulation, the rationality of player 1 was always set to α1 = 10, whereas the rationality of player 2 was set to α2 = 2 (right panels) or α2 = 10 (left panels). The prior probability of cooperation before any interaction has taken place is shown in the upper panels. For uninformative priors, the probability of cooperation in the first trial is greater than one half for all risk-seeking decision-makers and lower than one half for all risk-averse decision-makers independent of the opponent's risk sensitivity. Naturally, in later interactions, the opponent's risk sensitivity comes to bear. If both players have positive risk sensitivities, then there is a higher probability they will end up cooperating, and similarly if both players have negative risk sensitivities there is an increased probability they will end up with a non-cooperative equilibrium. If one of the players is risk-seeking and the other one risk-averse, then the player whose risk sensitivity has higher absolute value will more probably drive the behaviour of the interaction towards cooperation if risk-seeking or non-cooperation if risk-averse. If player 2 has a low rationality α2 = 2, the overall pattern is similar, but more noisy.

    Figure 3.

    Figure 3. Probability of choosing the cooperative action for both players with different risk-sensitivities and rationality parameters. (top) Prior probability of cooperating. In the first trial, the probability of cooperation only depends on the risk sensitivity of the player and does not depend on the risk sensitivity of the opponent. (bottom) Probability of cooperating after 25 trials. In later trials, the probability of cooperation depends on the risk sensitivity of both the player and the opponent. (first two columns) Probability of cooperation when both players have equal rationality α. (last two columns) Probability of cooperation when players have different rationality α. The probability of cooperating was computed according to equation (2.3). (Online version in colour.)

    4. Experimental methods

    To investigate the effect of risk sensitivity in sensorimotor interactions in human subjects, we used a previously developed virtual reality paradigm to translate 2 × 2 matrix games into sensorimotor games [21,22]. One of the players was always simulated by a virtual agent modelled by equation (2.3). This way, we could directly manipulate the risk sensitivity of the virtual player and record subjects' responses to these changes.

    4.1. Experimental design

    As illustrated in figure 4b, participants held the handle of a robotic interface with which they could control the position of a cursor on a display. On each trial, participants had to move the cursor from a start bar to a target bar and back. Importantly, they could do so at any lateral position within the width of the target bar, so participants could complete the task with their final hand position anywhere between the left and right target bounds. During the outward movement to the target, subjects had to cross a yellow decision line 3 cm into the movement. Once the line was crossed, both the subject's and the virtual player's decisions were made. The left half of the subject's lateral workspace represented the cooperative stag solution, whereas the right half represented the non-cooperative hare solution.

    Figure 4.

    Figure 4. The sensorimotor stag-hunt game. (a) Pay-off matrix of the game. (b) Experimental set-up. Subjects had to move a cursor from the start bar to the target bar. The left half of the workspace corresponded to selecting ‘stag’, and the right half to selecting ‘hare’. Once they crossed the decision line, a circle on the target line indicated the choice of the virtual player, which followed equation (2.3), and subjects experienced a force opposing their forward movement that depended on their own and the virtual opponent's action selection, as indicated by the pay-off matrix. (Online version in colour.)

    An implicit pay-off was placed on the movements beyond the decision line by using the robot to generate a resistive force opposing the forward motion of the handle. The forces were generated by simulating springs that acted between the handle and the yellow decision bar. The stiffness of the spring during the movement depended on the lateral position of the handle at the time of crossing the decision line and the computer player's choice. The spring constant was determined by the pay-off indicated in figure 4a and multiplied by a constant factor of 1.9 N cm−1. For successful trial completion, the target bar had to be reached within 1200 ms. The distance of the target bar from the start bar was sampled randomly each trial from a uniform distribution between 15 and 25 cm. Subjects performed two sessions where they faced virtual players with two different rationality parameters. In the first session, the rationality of the virtual player was α2 = 10 and subjects performed 40 sets of 25 trials, where the virtual player could assume one of five different β2-values from the set [±20, ±10, 0]. At the beginning of each set, the β2-parameter of the virtual player was determined and remained constant throughout the set. Each β2-parameter was chosen eight times, but in randomized order. In the second session, the rationality of the virtual player was set to α2 = 2 and subjects performed again 40 sets of 25 trials each with different β-parameters. At the start of every session, they had between 100 and 125 training trials where they could see the degree of risk sensitivity of the virtual player displayed on a bar.
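The force mapping can be summarized in a few lines. The linear spring profile and the example pay-off value are assumptions for illustration; the paper specifies only that the spring constant is the pay-off multiplied by 1.9 N cm−1:

```python
def resistive_force(payoff, handle_y_cm, decision_line_y_cm, gain=1.9):
    """Spring force (N) opposing forward motion beyond the decision line.

    The spring constant is the pay-off entry selected by the two players'
    choices, scaled by 1.9 N/cm; the linear dependence on the distance
    travelled past the line is our reading of the spring description.
    """
    stretch = max(0.0, handle_y_cm - decision_line_y_cm)
    return gain * payoff * stretch

# e.g. a (hypothetical) pay-off of 2 at 5 cm past the decision line:
print(resistive_force(2, 8.0, 3.0))   # 1.9 * 2 * 5 = 19.0 N
```

Because the force scales with the selected pay-off entry, a poor joint outcome is literally felt as a heavier spring for the remainder of the reach.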

    4.2. Experimental apparatus

    The experiments were conducted using a planar robotic manipulandum (vBOT) [37]. Participants held a handle of the vBOT, which constrained hand movements to the horizontal plane. A virtual reality system was used to overlay visual feedback onto the plane of movement and players were prevented from seeing their own hand. The vBOT allowed us to record the position of the handle and to generate forces on the hand with a 1 kHz update rate.

    4.3. Participants

    Six naive participants from the student pool of the Eberhard-Karls-Universität Tübingen participated in the study. All experimental procedures were approved by the ethics committee of the medical faculty at the University of Tübingen.

    The precise instructions given to subjects were as follows. Subjects were told that they were playing a game against a virtual player and that they could choose between two actions on every trial: either to cooperate or not to cooperate. They were instructed to make their choice by moving the handle across the decision line in either the left or right half of the workspace, where the left half corresponded to cooperation and the right half to non-cooperation. They were also informed that there would be a force opposing their movement between the decision line and the target line. They were told that in the case of non-cooperation they would always experience the same medium force, but that in the case of cooperation the force would depend on the choice of the virtual player, who could also choose to cooperate or not. If both players cooperated, there would be no force, but if the virtual player chose not to cooperate, there would be a very high force. Subjects were also told that the virtual player could learn and adapt to the subject's play.

    At the beginning of each block of training trials, subjects could see a bar displaying the degree of the virtual player's risk sensitivity and they were told that the bar indicates the virtual player's attitude towards cooperation. They were also told that there was a different player with a different attitude every 25 trials. After the training trials, they were told that the bar would be no longer displayed and that they can learn the player's attitude towards cooperation only from actual play. Between blocks of 25 trials, there was a short break to mark the transition between different virtual players clearly.

    5. Results

    In figure 5, we show subjects' prior cooperation probabilities in the first trial of every set of 25, when they faced a novel virtual player. This is shown in white for virtual players with rationality α2 = 10 and in black for virtual players with rationality α2 = 2. In the α2 = 10 condition, we found that four of six subjects chose to cooperate most of the time in the first trial. In the α2 = 2 condition, only three out of six subjects chose to cooperate. This implies that about half of our subjects were risk-seeking and optimistic about cooperation, whereas the others were risk-averse and pessimistic.

    Figure 5.

    Figure 5. Prior cooperation probabilities in human subjects playing virtual opponents with high (α2 = 10, white colour) and low (α2 = 2, black colour) rationality. In the first trial, when facing a new opponent, subjects knew the rationality of the opponent, but not their risk sensitivity.

    After the first trial, subjects received feedback about the choice of the virtual player and could make a first inference about the virtual player's willingness to cooperate. Accordingly, subjects' probability of cooperation in subsequent trials in a set of 25 needs to be investigated separately for the different risk sensitivities of the virtual players. For the extreme risk sensitivities of β2 = 20 and β2 = −20, this is depicted in figure 6. When playing a risk-averse opponent (β2 = −20), subjects mostly converged to non-cooperative behaviour (figure 6c,d), whereas when playing a risk-seeking opponent (β2 = 20), subjects mostly converged to cooperative behaviour (figure 6a,b). This pattern is clearly demonstrated when facing virtual players with high rationality α2 = 10 (figure 6b,d), but much more diffuse in the case of virtual players with low rationality α2 = 2 (figure 6a,c).

    Figure 6.

    Figure 6. Evolution of cooperation over the course of 25 trials of human subjects facing virtual opponents with low (α2 = 2, (a,c)) or high (α2 = 10, (b,d)) rationality and positive (β2 = 20, (a,b)) and negative (β2 = −20, (c,d)) risk sensitivity. Different lines indicate different subjects. (Online version in colour.)

    To directly assess the effect of risk sensitivity on the cooperative behaviour of human subjects over all trials, we computed the mean probability of cooperation averaged over all trials where the opponent had the same risk sensitivity β2 and rationality α2. In figure 7, this is shown for all six subjects playing an opponent with rationality α2 = 10 (figure 7a) and rationality α2 = 2 (figure 7b), respectively. For both rationalities, the risk sensitivity β2 of the opponent has a significant effect on the probability of cooperation (non-parametric Jonckheere–Terpstra trend test p < 0.05 for α2 = 2 and p < 0.001 for α2 = 10). However, in the case of high rationality α2 of the virtual player, this effect is stronger and clearer than in the case of inconsistent play resulting from an opponent with low α2. The general trend is that subjects' tendency to cooperate increases for higher β2 and decreases for lower β2. Importantly, most subjects deviated on average from a 50 : 50 cooperation probability when playing a risk-neutral opponent of high rationality (α2 = 10), which is another signature of subjects' risk sensitivity.

    Figure 7.

    Figure 7. Average probability of cooperation depending on the risk sensitivity of the opponent with either high (α2 = 10, (a)) or low (α2 = 2, (b)) rationality. Different lines indicate the six different subjects. (Online version in colour.)

    To compare the predictive power of our model with the traditional fictitious play model, we investigated the ratio of cooperation after subjects had experienced an (approximately) 50 : 50 sequence of actions of the virtual player, i.e. the opponent had cooperated roughly half the time and refused cooperation the other half. Importantly, we did this at two different stages of the game, such that the 50 : 50 ratio was the result of either a small number of trials (after two trials) or a large number of trials (after 10 trials). In the two-trial case, only trials with one stag and one hare choice were included. For the 10-trial case, however, there were not enough instances with an exact 50 : 50 ratio; we therefore also included trials with between 40% and 60% cooperation, and even then this analysis was only possible in the case of a virtual player with low rationality (α2 = 2). The crucial observation is that after two trials the estimate of the other player's cooperation is highly uncertain, whereas after 10 trials this estimate is much more consolidated. In both cases, fictitious play makes the same prediction, namely the best response to the ratio (compare the dashed line in figure 8). By contrast, a risk-sensitive model predicts that the best response should depend on the uncertainty of the estimate of the ratio. For our model predictions, we fitted to each subject an α1- and a β1-parameter by maximizing the log-likelihood of the subject's choices given the predicted choice probabilities of equation (2.3). In particular, this predicts that a risk-seeking player will deviate towards cooperation in early trials, whereas a risk-averse player will deviate towards non-cooperation in early trials (compare figure 8a). In late trials, when a large part of the uncertainty has been removed, both players converge to fictitious play.
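The maximum-likelihood fit can be sketched as follows. The pay-off matrix, the choice data and the coarse grid search are all stand-ins of our own; the actual fit would use the experimental pay-offs and a continuous optimizer:

```python
import numpy as np

def p_stag(S, H, alpha, beta, U=((4.0, 0.0), (3.0, 3.0)), n=501):
    """Choice probability of the risk-sensitive model; pay-offs are
    illustrative placeholders, not the experimental matrix."""
    x = np.linspace(0.0, 1.0, n)
    w = x**S * (1.0 - x)**H
    w /= w.sum()
    v = []
    for a in (0, 1):                              # 0 = stag, 1 = hare
        eu = x * U[a][0] + (1.0 - x) * U[a][1]
        if beta == 0:
            v.append(float(np.dot(w, eu)))
        else:
            z = beta * eu
            v.append(float((z.max() + np.log(np.dot(w, np.exp(z - z.max())))) / beta))
    e = np.exp(alpha * (np.array(v) - max(v)))
    return float(e[0] / e.sum())

def log_likelihood(trials, alpha, beta):
    """trials: (S, H, choice) tuples, i.e. the opponent counts observed
    before the trial and the subject's choice (0 = stag, 1 = hare)."""
    ll = 0.0
    for S, H, c in trials:
        p = p_stag(S, H, alpha, beta)
        ll += np.log(p if c == 0 else 1.0 - p)
    return ll

# made-up choice data; a coarse grid search stands in for a proper optimizer
trials = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (2, 2, 1)]
grid = [(a, b) for a in (1.0, 2.0, 5.0, 10.0) for b in (-20.0, -10.0, 0.0, 10.0, 20.0)]
alpha_hat, beta_hat = max(grid, key=lambda ab: log_likelihood(trials, *ab))
print(alpha_hat, beta_hat)
```

The fitted (α1, β1) pair then generates the model predictions of figure 8a for each subject.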

    Figure 8. Comparison of risk-sensitive predictions to fictitious play and human subjects' behaviour. (a) Predictions of probability of cooperation when observing a sequence with 50% cooperation after two trials (filled bar) or 10 trials (open bar). The dashed line is the prediction of fictitious play. (b) Subjects' cooperation probabilities when observing a sequence with roughly 50% cooperation after two trials (black) or 10 trials (white).

    In figure 8b, it can be seen that most subjects' behaviour was inconsistent with fictitious play. Subjects 1, 4 and 6 were risk-seeking and deviated significantly towards cooperation in the third trial (one-sided t-test, p < 0.01). Subject 5 was risk-averse and refused cooperation in early trials (p < 0.01). Subjects 2 and 3 were risk-neutral and consistent with fictitious play; accordingly, their deviation from 0.5 choice probability was not significant (p > 0.1). Importantly, after 10 trials, all subjects were consistent with fictitious play and best-responded to the observed sequence of the opponent's play; the deviation from 0.5 choice probability was not significant for any of them (p > 0.1).

    6. Discussion

    Most current theoretical frameworks of motor control rely on probabilistic models that are used for prediction, estimation and control. However, when such models are partially incorrect or wrong, there are usually no performance guarantees [17]. Model uncertainty is therefore an important factor in real-world control problems, because, in practice, one can never be absolutely sure about one's model. In this paper, we investigated risk-sensitive deviations arising from model uncertainty in sensorimotor interactions. We found that human subjects adapted their cooperation depending on the risk sensitivity of a virtual computer player. Furthermore, we found that subjects not only best-responded to the frequency of observed play, but were also sensitive to the certainty of this estimate. In particular, they exhibited risk-sensitive deviations in initial interaction trials, when uncertainty was high. This behaviour is consistent with a risk-sensitive decision-maker with model uncertainty.

    Recently, it was found that risk sensitivity is an important determinant in human sensorimotor behaviour [38]. Risk-sensitive decision-makers do not base their choices exclusively on the expectation value of a particular cost function, but they also consider higher-order moments of this cost function. This can be seen when approximating the risk-sensitive cost function with a Taylor series

    $$\frac{1}{\beta}\log E\!\left[\mathrm{e}^{\beta C}\right] \approx E[C] + \frac{\beta}{2}\operatorname{var}[C],$$
    assuming that the risk-sensitivity parameter β is small [39]. Sensitivity to the second-order moment of the cost function was found, for example, in motor tasks with speed–accuracy trade-off [40]. Such risk-sensitive decision-makers can be thought of as trading off the mean cost versus the variability of the cost. A mean–variance trade-off in effort was found, for example, in a motor task where subjects had to decide between hitting differently sized targets that were associated with different levels of effort [41]. Sensitivity to the variance of the control cost was also found in continuous motor tasks, where subjects had to control a cursor undergoing a random walk [42]. The sensitivity to the variance can also be exploited by assistive technologies that consider the human as a (useful) noise source [43,44].
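    As a quick numerical check of this mean–variance reading, the exponential risk-sensitive cost can be compared with its second-order approximation for small β on an illustrative discrete cost lottery (the numbers below are made up for illustration, not taken from the paper):

```python
import numpy as np

# Illustrative cost lottery: mean 2.0, variance 0.5.
c = np.array([1.0, 2.0, 3.0])
p = np.array([0.25, 0.5, 0.25])
beta = 0.1  # small risk-sensitivity parameter

mean = np.sum(p * c)
var = np.sum(p * (c - mean)**2)
exact = np.log(np.sum(p * np.exp(beta * c))) / beta   # risk-sensitive cost
approx = mean + 0.5 * beta * var                      # mean + (beta/2) * variance
print(exact, approx)  # agree to about three decimal places
```

For larger |β|, higher-order moments of the cost start to matter and the two values drift apart.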

    When x is a latent variable that needs to be inferred, risk sensitivity also allows decision-makers to take model uncertainty into account. This can be seen when rewriting the risk-sensitive cost function as in equation (1.1) yielding

    $$J = \frac{1}{\beta}\log\sum_x p_0(x)\,\mathrm{e}^{\beta U(x)},$$
    where J can be re-expressed as a variational principle that trades off the maximization of a utility term and the deviation from p0 to p [17]. Such model uncertainty was recently found to play a role in a sensorimotor integration task, where subjects had to infer the position of a hidden target (the latent variable) [20]. When given feedback information about the target position with varying degrees of reliability, subjects' estimates of the target position were consistent with a Bayesian estimator that optimally combines prior knowledge of the distribution of target positions with the actual feedback information. Subjects' behaviour was therefore also consistent with previous reports on information integration in sensorimotor tasks [9]. However, when subjects' beliefs were associated with control costs, study [20] found that subjects exhibited characteristic deviations from the Bayes optimal response that could be described by a risk-sensitive decision-making model depending on the level of model uncertainty, the reliability of the feedback and the control cost. These risk-sensitive deviations were particularly prominent in trials with high uncertainty and vanished as more and more information about the latent variable became available and the uncertainty disappeared.
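    The variational re-expression mentioned above, J = max_p [E_p[U] − (1/β) KL(p‖p0)] with the optimum at p*(x) ∝ p0(x) exp(βU(x)), can be verified numerically on a toy example (the utilities and reference model below are illustrative, not quantities from the paper):

```python
import numpy as np

# Illustrative utilities and reference model p0 over three outcomes.
U = np.array([1.0, 0.0, 2.0])
p0 = np.array([0.5, 0.3, 0.2])
beta = 1.5  # optimistic (risk-seeking) for beta > 0

# Closed-form risk-sensitive objective (log-sum-exp form).
J = np.log(np.sum(p0 * np.exp(beta * U))) / beta

# Variational optimum: p*(x) proportional to p0(x) * exp(beta * U(x)).
p_star = p0 * np.exp(beta * U)
p_star /= p_star.sum()

def free_energy(p):
    """Utility term minus (1/beta)-weighted KL divergence from p0."""
    return np.sum(p * U) - np.sum(p * np.log(p / p0)) / beta

print(J, free_energy(p_star))  # identical up to rounding
rng = np.random.default_rng(1)
worse = max(free_energy(rng.dirichlet(np.ones(3))) for _ in range(1000))
print(worse <= J + 1e-9)       # no sampled distribution beats the optimum
```

The trade-off is visible in p*: for β > 0 it shifts probability mass from p0 towards high-utility outcomes, with 1/β controlling how far the decision-maker is willing to deviate from the reference model.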

    In the context of model uncertainty, risk sensitivity can be distinguished from risk attitudes modelled by the curvature of the utility function, both theoretically and experimentally [45,46]. Utility functions generally express the subjective desirability of an outcome and not necessarily its nominal value. For example, the subjective value of money typically does not increase linearly with the nominal amount: receiving an extra $1000 has more utility for a beggar than for a millionaire. Such a utility function is said to be marginally decreasing. Intriguingly, this property can also be used to model risk attitudes. For example, people with a marginally decreasing utility function of money will prefer $50 for sure over a 50 : 50 lottery between $0 and $100, because U($50) > 1/2 U($100), assuming that U($0) = 0. Importantly, these risk attitudes are independent of the level of information about the probabilities; in fact, the probabilities are assumed to be perfectly known. Risk attitudes are thus conceptually very different from model uncertainty, which captures the lack of information about a lottery and vanishes in the limit of perfect information about the probabilities.
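    For concreteness, the lottery example can be computed with an illustrative concave utility (a square root, chosen here purely for illustration; the argument only needs marginal decrease):

```python
import math

# Illustrative marginally decreasing utility of money.
U = math.sqrt

sure = U(50)                        # sure $50
gamble = 0.5 * U(0) + 0.5 * U(100)  # 50:50 lottery over $0 and $100
print(sure > gamble)  # True: the concave utility prefers the sure amount
```

Note that the probabilities (0.5 and 0.5) are taken as perfectly known here; the preference arises entirely from the curvature of U, not from any uncertainty about the lottery itself.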

    The effect of risk attitudes on cooperation in the stag-hunt game has been investigated in behavioural economics tasks [47–49], in which the risk attitude of subjects is determined from their choice behaviour when deciding between risky and safe lotteries. These studies found that subjects' risk attitude does not predict their cooperation in the stag-hunt game, although players do consider information about the other player's risk attitude; in particular, subjects are less likely to cooperate if they know that their opponent is risk-averse. The fact that subjects' own risk attitude is a poor predictor of their cooperation suggests that model uncertainty, rather than risk attitude, may be the stronger factor affecting cooperation in the game.

    In the traditional stag-hunt game, pay-offs are usually framed as gains, whereas in our experiment the pay-offs are framed as losses in the shape of forces that subjects have to exert. In the economics literature, it is well known that the framing of losses versus gains can have a strong influence on human choice behaviour [50]. It is therefore not surprising that different pay-off levels have also been found to influence choice behaviour in the stag-hunt game [51]; in particular, it was found that facing losses increases players' probability of choosing the riskier stag. Crucially, our results showing sensitivity to model uncertainty do not depend on the exact shape of the utility function. Expected utility players that have experienced 50 : 50 play of their opponent after N trials will choose between a1 = S and a1 = H according to equation (2.2), where x* = 0.5. The decision-maker's preference depends, of course, on the utilities U(a1, a2), but crucially these utilities and the resulting expected utility do not change with the number of trials N as long as the empirical frequency is 50 : 50. The fact that we have used a loss scenario therefore does not invalidate our results on model uncertainty, although the exact choice probabilities might look different in a gain scenario.

    Fictitious play is one of the earliest models developed to explain learning in games [25,52]. Crucially, it assumes stationary strategies for both players. It can be shown to converge for a wide class of problems, including all two-player zero-sum games [53]. However, fictitious play can also lead to non-converging limit cycles in very simple games [54]. In our study, we found that subjects were not simply best-responding to the observed frequency of the opponent's play, as assumed by fictitious play. Rather, subjects were sensitive to the amount of information they had gathered about the other player when deciding whether to cooperate or not—compare figure 8. Our risk-sensitive model of cooperation can account for this dependency. However, it still makes the simplistic assumption of stationary strategy beliefs. This limitation may be overcome in the future by considering more complex belief models.
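    The basic fictitious play dynamic is easy to state in code. The sketch below uses hypothetical stag-hunt pay-offs (stag pays 4 if matched and 0 otherwise, hare pays a safe 3; these are not the force pay-offs of our experiment): each player best-responds to the opponent's empirical stag frequency, so joint play locks into either the pay-off-dominant or the risk-dominant equilibrium depending on the opening moves.

```python
# Best response to the opponent's empirical frequency of playing stag:
# expected gain of stag is 4 * freq, versus a safe 3 for hare.
def best_response(opp_stag_freq):
    return 'S' if 4 * opp_stag_freq > 3 else 'H'

def fictitious_play(first, n_rounds=50):
    opp_stag = [0, 0]  # each player's count of the opponent's stag choices
    actions, history = first, [first]
    for t in range(1, n_rounds):
        opp_stag[0] += actions[1] == 'S'
        opp_stag[1] += actions[0] == 'S'
        actions = (best_response(opp_stag[0] / t),
                   best_response(opp_stag[1] / t))
        history.append(actions)
    return history

print(fictitious_play(('S', 'S'))[-1])  # ('S', 'S'): pay-off-dominant outcome
print(fictitious_play(('S', 'H'))[-1])  # ('H', 'H'): risk-dominant outcome
```

This sensitivity to early play is what a risk-sensitive extension modulates: an optimistic agent effectively inflates its estimate of the opponent's cooperation while that estimate is still uncertain, making the cooperative basin of attraction easier to reach.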

    An important objection to risk-sensitive models is often that they could be replaced by a standard risk-neutral Bayesian model under a different (post hoc) prior [17]. This is also true in our case: subjects could develop biased prior beliefs about the population of virtual players. Importantly, the population of virtual players was statistically balanced and there is therefore no statistical reason why subjects should develop biased priors. However, if the prior is thought to reflect not only the (prior) statistics of the environment but also traits of the decision-maker, then a risk-neutral Bayesian model with a biased prior could, in principle, also explain our data. This is sometimes also discussed in the context of so-called complete class theorems, in which the existence of priors is investigated when modelling Bayesian decision-makers with different loss functions [55,56].

    The results of our study also speak to cognitive theories of (dyadic) social interactions and joint actions. Several recent studies have investigated how humans mutually adjust and synchronize their behaviour during online joint actions, revealing the role of mechanisms that range from automatic entrainment to action prediction [22,57–59]. An open research question is whether and how sensorimotor interactions are influenced by the co-actors' goals and attitudes. Given that socially and culturally relevant information (e.g. facial expression, racial or social group membership) is automatically processed in the brain [60] and can automatically modulate imitation [61] and empathy [62], most studies have focused on the impact of socially relevant variables in joint actions, with the hypothesis that they could favour pro-social or anti-social behaviour. It has been shown that interpersonal perception and (positive and negative) attitudes towards the co-actor modulate cooperation and joint actions [63,64]. In turn, sensorimotor interactions can modulate a co-actor's attitude; for example, it has been reported that dyads engaged in synchronous interactions show increased altruistic behaviour [65].

    The aforementioned studies focus on social attitudes and leave unanswered the issue of how personal traits and non-social attitudes influence sensorimotor interactions. Here, we studied the influence of model uncertainty on the evolution of sensorimotor interactions. We designed a sensorimotor task that is equivalent to the stag-hunt game. Our results show that model uncertainty modulates sensorimotor interactions and their success. In particular, optimistic (risk-seeking) adaptive agents are much more likely to converge to cooperative outcomes. Furthermore, humans adaptively change their level of cooperation depending on the risk sensitivity of their co-actor (in our study, a computer player). Effects of model uncertainty are particularly strong in early interactions with a novel player. In summary, our results indicate that interacting agents can build sophisticated models of their co-actors [66] and use them to modulate their level of cooperation taking model uncertainty into account.

    Funding statement

    This study was supported by the DFG, Emmy Noether grant no. BR4164/1-1 and EU's FP7 under grant agreement no. FP7-ICT-270108 (goal-leaders).