Choice mechanisms for past, temporally extended outcomes

Accurate retrospection is critical in many decision scenarios, ranging from investment banking to hedonic psychology. A notoriously difficult case is integrating previously perceived values over the duration of an experience. Failures of retrospective evaluation lead to suboptimal outcomes when previous experiences are under consideration for revisit. A biologically plausible mechanism underlying the evaluation of temporally extended outcomes is leaky integration of evidence. The leaky integrator favours positive temporal contrasts, in turn leading to undue emphasis on recency. To investigate the choice mechanisms underlying suboptimal retrospective evaluation, we used computational and behavioural techniques to model choice between perceived extended outcomes with different temporal profiles. Second-price auctions served to establish the perceived values of virtual coins offered sequentially to humans in a rapid monetary gambling task. Results show that lesser-valued options involving successive growth were systematically preferred to better options with declining temporal profiles. This disadvantageous inclination towards persistent growth was mitigated in some individuals, in whom a longer time constant of the leaky integrator resulted in fewer violations of dominance. These results demonstrate how focusing on immediate gains is less beneficial than taking a longer perspective.

In the discrete implementation, incentive value can be expressed in its recursive form, E_t = (τ/(τ+1))·E_{t−1} + u_t, where u_t is the perceived value of element t (Fig. 2a).
For two options of equal contents presented so that the temporal profile is increasing for one option and decreasing for the other, as in Fig. 2b, preference for positive contrasts causes a differential discount on the perceived values of the competing options. This mechanism yields evidence in favour of the increasing profile under the parameterization of τ, B and β. Notice that optimal accumulation of evidence is characterized by τ = ∞, leading to d = log E_A − log E_B. The parameters and variables of the model are summarized in Table S1.
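The contrast-favouring mechanism can be sketched numerically. The following is a minimal illustration, assuming the standard discrete leaky integrator E_t = (τ/(τ+1))·E_{t−1} + u_t; the function names and toy values are ours, not the original code:

```python
def leaky_integrate(values, tau):
    """Accumulate perceived values with an exponential leak.

    A larger time constant tau means less leak; tau -> infinity
    recovers optimal (lossless) summation of the evidence.
    """
    decay = tau / (tau + 1.0)
    e = 0.0
    for u in values:
        e = decay * e + u
    return e

increasing = [1, 2, 3, 4, 5]
decreasing = list(reversed(increasing))

# With a short time constant, late elements dominate the sum, so the
# increasing profile accrues more incentive value despite identical contents.
evidence_gap = leaky_integrate(increasing, 2.0) - leaky_integrate(decreasing, 2.0)
```

With τ = 2 the final elements carry most of the weight, so the gap is positive; as τ grows, both sequences converge to the same total and the gap vanishes.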

Alternative mechanisms
Because the bounded dynamic range of neural encoding makes linear integration of the perceived values computationally intractable, especially for episodes of unknown duration, an observer may use numerosity or symbolic representations of reward value to construct summary evaluations for temporally extended outcomes. However, even for symbolically represented reward magnitudes, summary evaluation is facilitated by relative encoding [4].
Alternatively, the observer might adopt different valuation strategies for increasing and decreasing sequences. It is conceivable that expectations about upcoming events might affect some observers in ways that lead to differential valuation depending on expectation.
However, such a strategy would require the observer to decide which strategy to follow before the experience begins. By contrast, leaky integration of the perceived values implements accumulation of evidence in the same way for all experiences. Thus, contrast-guided evaluation as described above is a biologically plausible and parsimonious mechanism for constructing summary evaluations of temporally extended outcomes.

Simulations
In order to test the ability of the proposed framework to characterize violations of dominance for temporally extended outcomes, the generative model was tested in several parametric implementations. Competing options with identical contents were used to simulate the mechanism in Fig. 2a. Each option had 19 elements, arranged along a decreasing or increasing profile as shown in Fig. 2b. The function u = (x/100)^κ implements perceived value, and the exponential filter w_t = ((τ+1)/τ)^−t implements accumulation of incentive value. Signal-to-noise ratio (SNR) was simulated by variance in the sampling noise, ε_x ∈ N(0, σ_x²), SNR = EV/σ_x, and the sensitivity and biases of the decision maker were simulated by decision noise. Since it is the difference in the expected values of the decision noises that determines bias, and the combined variance that determines sensitivity, for the sake of simplicity we added all the decision noise to option A, ε_A ∈ N(μ_A, σ_A²), leaving option B noise-free.
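As a concrete (hypothetical) re-implementation of one trial of this generative model, the sketch below combines the value function u = (x/100)^κ, leaky accumulation, sampling noise on the state, and decision noise added to option A only; all names are ours:

```python
import random

def simulate_trial(seq_a, seq_b, kappa, tau, sigma_x=0.0, mu_a=0.0, sigma_a=0.0,
                   rng=random.Random(0)):
    """Simulate one choice between two coin sequences (state magnitudes 0-100).

    Perceived value u = (x/100)**kappa is computed on a noisy state sample,
    accumulated by a leaky integrator with time constant tau; all decision
    noise (bias mu_a, spread sigma_a) is added to option A.
    """
    decay = tau / (tau + 1.0)

    def accumulate(seq):
        e = 0.0
        for x in seq:
            sample = max(x + rng.gauss(0.0, sigma_x), 0.0)  # noisy state sample
            e = decay * e + (sample / 100.0) ** kappa
        return e

    e_a = accumulate(seq_a) + rng.gauss(mu_a, sigma_a)      # decision noise on A only
    return "A" if e_a > accumulate(seq_b) else "B"
```

In the noiseless case the model deterministically prefers whichever option holds the increasing profile, reproducing the contrast bias described above.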

Simulation 1
We first investigated the interaction of the value function and decay in the accumulation of evidence. The decay parameter τ discounts perceived value in the accumulation of incentive value, and the value function parameter κ determines the absolute amount of this discount. We simulated the relative discount on incentive value based on the expected value of the evidence (i.e. no noise) over the ranges 0.34 ≤ κ ≤ 1 and 1 ≤ τ ≤ 21 (Fig. S2, Fig. 2c).
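One cell of this noise-free sweep can be sketched as follows, assuming the leaky form E_t = (τ/(τ+1))·E_{t−1} + u_t and a 19-element linear profile (both our choices for illustration):

```python
import math

def incentive_log_ratio(tau, kappa, n=19):
    """Noise-free evidence for the increasing over the decreasing profile,
    d = log E_inc - log E_dec; d > 0 means the increasing profile wins."""
    xs = [100.0 * (t + 1) / n for t in range(n)]
    decay = tau / (tau + 1.0)

    def accumulate(seq):
        e = 0.0
        for x in seq:
            e = decay * e + (x / 100.0) ** kappa
        return e

    return math.log(accumulate(xs)) - math.log(accumulate(list(reversed(xs))))
```

The relative discount shrinks as the time constant grows: the log-ratio at τ = 1 exceeds that at τ = 21, and both remain positive, while τ → ∞ drives it to zero.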

Simulation 2
We then investigated the mutual representation of SNR, bias and sensitivity of the decision maker for choice data produced by the model. Here we used a fixed value function u = (x/100)^κ with a fixed exponent and the exponential filter w_t = ((τ+1)/τ)^−t with τ = 21, and varied the specification of the decision and sampling noises. We used three decision noises, with expected values of 0, 1, …

Simulation 3
We finally simulated the interaction of SNR and decay in the accumulation of evidence. To investigate the effect of state sampling noise, three values of the variance σ_x² were calculated to simulate signal-to-noise ratios SNR ∈ {−4, 0, 4} dB. For each SNR, we simulated preference for the increasing profile for four values of decay, τ ∈ {1, 3, 8, 21} s (Fig. 2e).
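A Monte Carlo sketch of this interaction (our re-implementation; the profile, κ, and noise magnitudes are illustrative assumptions, not the original settings):

```python
import random

def p_prefer_increasing(tau, sigma_x, n_trials=1000, kappa=0.7, seed=0):
    """Estimate preference for the increasing profile under state sampling noise."""
    rng = random.Random(seed)
    xs = [5.0 * (i + 1) for i in range(19)]   # 19-element profile, magnitudes 5..95
    decay = tau / (tau + 1.0)
    wins = 0
    for _ in range(n_trials):
        e_inc = e_dec = 0.0
        for x_inc, x_dec in zip(xs, reversed(xs)):
            u_inc = (max(x_inc + rng.gauss(0.0, sigma_x), 0.0) / 100.0) ** kappa
            u_dec = (max(x_dec + rng.gauss(0.0, sigma_x), 0.0) / 100.0) ** kappa
            e_inc = decay * e_inc + u_inc
            e_dec = decay * e_dec + u_dec
        wins += e_inc > e_dec
    return wins / n_trials
```

Without sampling noise the increasing profile is preferred on every trial for any finite τ; adding noise pushes the preference toward chance, more so for long time constants, where the noise-free evidence gap is smaller.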

Pre-experimental evaluation
We measured the relationship between the physical size and perceived value of the virtual coins. The coins were presented to the subjects in a 3D projection to simulate variation in volume. The state variable is coin volume, and objective value would be proportional to volume insofar as the specific value of a precious coin is given in price units per mass unit (e.g. £/g). This means that a coin scaled in diameter by a factor m differs in volume by m³ compared to an unscaled coin. If perceived value corresponded to objective value, there would be a linear relationship between simulated volume and subjective evaluations (κ = 1).
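The geometric relationship, and the κ = 1 benchmark, reduce to two short helpers (hypothetical names; the power-law form follows the value function used throughout):

```python
def simulated_volume(m):
    """A coin scaled in diameter by factor m has m**3 times the volume."""
    return m ** 3

def predicted_bid(volume, kappa, reference_bid=100.0):
    """Power-law perceived value in pence; kappa == 1 is veridical valuation."""
    return reference_bid * volume ** kappa
```

With κ < 1, small coins are over-bid relative to their objective value: a coin at half the reference diameter has 12.5% of the volume but, at κ = 0.7, attracts a bid above 12.5 pence.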
The perceived value of the virtual coins was determined by a Becker-DeGroot-Marschak (BDM) evaluation task.
The participants were given a £5 budget and were instructed to place bids on each of 120 coins according to how much they felt each coin was worth. The coins varied in simulated volume from 1% to 100% of a reference coin of nominal value 100 pence (£1) that was shown on the screen before the bidding started. A randomly chosen subset of the bids was drawn to exchange the £5 budget for approximately 30 virtual coins of varying size and value according to a second-price auction [5]. This set of coins would become the subject's endowment in the main experiment that followed. A training round of 20 coins preceded the actual bidding round. Each coin was visible for 350 ms, followed by a response window showing a black screen with the text "Place a bid for the coin" for up to 5 seconds.
After the bidding round, the participants watched a computer animation of the auction, in which one coin at a time was drawn from the total pool and the associated bid placed by the participant was shown against the computer's bid. For each coin drawn in this way, the computer placed a random bid drawn from a rectangular distribution, without taking into account the specification of the coin at stake or the participant's bid. If the computer's bid was higher, the coin at stake was discarded and a new coin was drawn from the pool of bids without replacement. If the participant's bid was higher, they bought the coin and paid the amount bid by the computer. This amount was taken from their budget, and the procedure continued until the budget was spent. In this way approximately 30 bids were realized, providing the participants with an initial endowment of virtual coins. This endowment had an expected value of £10 because the coins were obtained in a second-price auction [5] consistent with the BDM method [6]. In the subsequent experiments, the value of the endowment was adjusted according to the participants' performance in the task, and payment was made on the basis of the final adjusted value of the endowment at the end of the experiment.
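The realization procedure maps onto a short loop. This is a hypothetical re-implementation (the names, the 0-100 pence range of the computer's bid, and the stopping rule are our assumptions):

```python
import random

def realize_auction(bids, budget=500.0, rng=None):
    """Realize bids as in the described animation (all amounts in pence).

    Coins are drawn without replacement; the computer bids from a rectangular
    distribution, ignoring both the coin and the participant's bid; if the
    participant bids higher, they buy the coin at the computer's price.
    """
    rng = rng or random.Random(0)
    pool = list(bids.items())                  # (coin_id, participant_bid) pairs
    rng.shuffle(pool)
    endowment = []
    for coin, bid in pool:
        if budget <= 0:                        # continue until the budget is spent
            break
        computer_bid = rng.uniform(0.0, 100.0)
        if bid > computer_bid:
            endowment.append(coin)
            budget -= computer_bid             # second-price payment
    return endowment
```

Because the participant pays the computer's bid rather than their own, truthful bidding is the optimal strategy, which is what makes the procedure incentive-compatible in the BDM sense.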

Experiment 1
Experiment 1 was designed to test whether participants were sensitive to the chronological configuration of perceived outcomes. As option alternatives, we used a set of arbitrary temporal profiles and slight modifications of these profiles. Experiment 1a used the temporal profiles shown in Fig. 3a (top). They included 9 pairs of sequences of up to 8 coins, and each condition was repeated 30 times. In three conditions the last coin was omitted, so 7 coins competed with 8 coins. In one case (Fig. 3a, top left), the contents were identical; only the order in which the coins were presented differed. Experiment 1b used the temporal profiles shown in Fig. 3a (bottom). They included 9 pairs of sequences of 10 coins, and each condition was repeated 20 times. In one case (Fig. 3a, bottom left), the contents were identical; only the order in which the coins were presented differed.
The CSs were randomized and balanced as follows: each CS pair was used twice in two different subjects to indicate either option of one condition. In every condition, one option was at least as good as the alternative [7]. A brute-force method was used to determine which coins to remove without changing the average objective value of the remainder. We used two levels of steepness to examine the potential effect of valence abruptness, and two levels of variance to obscure the underlying average profile. Thus, the sequences never proceeded monotonically along the temporal profile. The temporal profiles were composed by scaling a sigmoid function to first generate a decreasing reference profile between an initial scale I and a final scale F. Individual obfuscated magnitudes were then calculated as x_t = r_t + ε, where r_t is the reference profile and the obfuscation ε ∈ N(0, c) was pseudo-randomized so that the average followed the reference profile. Obfuscation was used to make the underlying average temporal profile less obvious. Increasing profiles were generated by inverting the sequence along the temporal dimension, and dominated alternatives to each profile were generated by removing elements from the longer profile. The values in Table S2 were used to generate the profiles shown in Fig. S1.
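A sketch of this generation scheme (our assumed sigmoid parameterization; the original scaling constants are those listed in Table S2):

```python
import math
import random

def reference_profile(n, scale_i, scale_f, steepness):
    """Decreasing sigmoid reference profile running from roughly I down to F."""
    out = []
    for t in range(n):
        z = steepness * (t - (n - 1) / 2.0)
        sig = 1.0 / (1.0 + math.exp(z))        # strictly decreasing in t
        out.append(scale_f + (scale_i - scale_f) * sig)
    return out

def obfuscate(profile, c, rng):
    """Add zero-mean noise of variance c so the average follows the reference."""
    return [x + rng.gauss(0.0, math.sqrt(c)) for x in profile]

def make_pair(profile):
    """Increasing profile by time-inversion; dominated option by element removal."""
    increasing = list(reversed(profile))
    dominated = profile[:-1]                   # shorter, hence weakly dominated
    return increasing, dominated
```

Inverting a sequence preserves its contents exactly, so any preference between the inverted pair isolates the effect of temporal order; removing an element makes one option weakly dominated regardless of the value function exponent.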

Experiment 1
To test the predictive leverage of simple proxies for decision value, we used multiple logistic regression on the preference data P according to the proxies d = log(V_A/V_B), where V_I is the perceived value (mean or proxy) of option I ∈ {A, B}, and Q is the modified Gram-Schmidt orthogonalization matrix [8]. We first fitted B for the proxies initial and final value (Fig. 3b, top). Next, to enable analysis of the distribution of specific predictive leverage of coin position on preference for sequences of varying duration, we resampled all sequences to the maximum number of coins by linear interpolation. We then fitted B to fixed-effects models in which choice was characterized by the perceived values of each element in the resampled sequences (Fig. 3b, bottom). Note that this approach cannot separate effects related to correlation between the proxies; it only removes the effect of mean value. Thus, each slope estimate indicates the predictive leverage of coin position for the residual. We used a combination of nonlinear optimization algorithms implemented in MATLAB to estimate the parameters from each participant's full data set over the trials of all conditions. Group-level scores can be obtained by summing the scores across subjects [9]. When the models are compared at the group level, the best model fitness was obtained for the bias-free leaky integrator (Table S4).
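The orthogonalization step can be illustrated as follows; this is a stdlib sketch of modified Gram-Schmidt applied to the proxy regressors, with invented data:

```python
import math

def modified_gram_schmidt(columns):
    """Orthogonalize each proxy column against all earlier columns, so a
    later regressor carries only the residual predictive leverage."""
    ortho = []
    for col in columns:
        v = list(col)
        for u in ortho:
            scale = sum(a * b for a, b in zip(u, v)) / sum(a * a for a in u)
            v = [b - scale * a for a, b in zip(u, v)]
        ortho.append(v)
    return ortho

# Invented example: log value-ratio proxies d = log(V_A / V_B) per trial,
# mean value first, final value second.
mean_proxy  = [math.log(r) for r in (2.0, 1.5, 0.5)]
final_proxy = [math.log(r) for r in (0.5, 2.0, 1.0)]
q_mean, q_final = modified_gram_schmidt([mean_proxy, final_proxy])
```

After orthogonalization, the second column is uncorrelated with the first, so its regression weight reflects the final value's leverage beyond what mean value already explains, consistent with the residual-leverage interpretation above.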

Additional Analyses
Separate models were implemented to analyse decision bias for the first option and bias for positive contrasts. The predictors indicating option order and the respective temporal contrasts are independent, and the associated biases from the primary analyses were uncorrelated (ρ² = 0.0052, p = 0.699). We therefore tested a dual-bias model in which parameter estimates were obtained for primacy bias (β₁) and positive contrasts (β₊) by fitting the choice data (P) with both predictors. The estimates for positive contrasts closely matched those from the primary analyses (ρ² = 0.97, p = 8.5×10⁻²⁴). In no case did the dual-bias model outperform any of the other models according to any criterion.
In every trial, one of the options was weakly dominated by the alternative. Since the value function is non-linear, ambiguity in the dominance relation was avoided by offering option alternatives composed of a coin sequence and a subset of that same sequence. Hence, in any trial the shorter option is dominated by the longer option regardless of the individual value function exponent κ. It is nevertheless conceivable that it is not perceived value but rather the state estimate that is discounted in the accumulation of evidence (although it is not immediately obvious how an internal representation of state would be divorced from its perceived value).
To address this question, we re-fitted the parameters of the bias-free leaky integrator to the data using objective value in place of perceived value for all subjects. We analysed the difference in parameter estimates obtained with this simplified model, which generally led to a decrease in model fitness (Fig. S6).

Response strategies
We interviewed the participants at the end of the experiments and asked them how they had approached the task. Most participants reported a tendency to rely on a strategy whereby coins were classified into two or three bins (e.g. small/medium/large), and the number of large coins became the main determinant of choice. Some participants also reported that the coins seemed to be presented along a temporal profile and that they were mindful not to let it affect their judgment. No participant reported relying entirely on duration, option order or screen position. Thus, the participants appeared to operate analytically while trying to avoid contextual effects. In spite of these attitudes, a large number of choices revealed preference for the dominated option. This result suggests that the participants had no explicit access to the generative process underlying their choices.

Table S2 (caption). Parametric specification of the M different utility profiles in the second version of the monetary venture games (Experiment 2). N is the number of coins in the dominating reference sequence, and I and F are the initial and final scale values, respectively. The parameters s and c specify the steepness of the temporal profile and the variance of the obfuscation noise.

Caption fragment: … and individual value function exponent (κ) are shown for each subject.