The functional role of serial dependence

The world tends to be stable from moment to moment, leading to strong serial correlations in natural scenes. As similar stimuli usually require similar behavioural responses, it is highly likely that the brain has developed strategies to leverage these regularities. A good deal of recent psychophysical evidence is beginning to show that the brain is sensitive to serial correlations, causing strong drifts in observer responses towards previously seen stimuli. However, it is still not clear that this tendency leads to a functional advantage. Here, we test a formal model of optimal serial dependence and show that as predicted, serial dependence in an orientation reproduction task is dependent on current stimulus reliability, with less precise stimuli, such as low spatial frequency oblique Gabors, exhibiting the strongest effects. We also show that serial dependence depends on the similarity between two successive stimuli, again consistent with the behaviour of an ideal observer aiming at minimizing reproduction errors. Lastly, we show that serial dependence leads to faster response times, indicating that the benefits of serial integration go beyond reproduction error. Overall our data show that serial dependence has a beneficial role at various levels of perception, consistent with the idea that the brain exploits the temporal redundancy of the visual scene as an optimization strategy.


Introduction
As most objects in the environment are relatively stable over time, there are large temporal redundancies in the spatio-temporal flow of information. It has long been known that sensory systems exploit spatial redundancies by shifting their responses to match the stimulation statistics [1], but until recently there has been little evidence as to whether perceptual systems carried over information across time.
Two recent papers [2,3] introduced a new psychophysical paradigm, serial dependence, which provided direct evidence of how a system incorporates past information into the perception of the current stimulus. These effects have now been confirmed with a variety of stimuli and tasks, from simple orientation judgements [3][4][5], numerosity [2], position [6,7], facial identity and expression [8,9], eye gaze [10], pulchritude [11] or body size [12], to complex judgements such as summary statistics [13], variance [14] and confidence [15]. A series of control experiments showed that serial dependence effects could not be accounted for by effects such as priming, hysteresis, explicit memory or expectation. Furthermore, functional magnetic resonance imaging results [16] have shown that neural representations in the primary visual cortex (V1) were biased towards previous perceptual decisions, demonstrating a direct neural correlate of serial dependence, and suggesting that the effects occur early in primary visual cortex.
In non-symbolic numerosity judgements, serial dependence effects were found to be strong enough to compress the subjective spatial representation of numbers [2], an effect previously thought to reflect the logarithmic encoding of numbers [17]. The compression is a direct result of the fact that not all stimuli have the same level of dependence on the previous presentation: low numerosities are reproduced more reliably (with less variability) than higher numerosities, and show less serial dependence. This suggests that serial dependence is related to the reliability of the current sensory information, which prompted a model where serial dependence becomes a form of response optimization.
Here we develop an ideal observer model and test its predictions with an orientation reproduction task. To this aim, we first test if serial dependence is stronger with less reliable stimuli, varying reliability by varying the orientation and spatial frequency of grating patches [18,19]. We then explore how serial dependence changes as a function of inter-stimulus orientation change. We also show that besides reducing error, serial dependence leads to faster responses. All this evidence shows that serial dependence increases the accrual of sensory information, improving efficiency.

Methods
(a) General procedure
The study was approved by the Regional Ethics Committee (Comitato Etico Pediatrico Regionale-Azienda Ospedaliero Universitaria Meyer-Firenze) and was in accordance with the ethical standards of the 1964 Declaration of Helsinki. Informed written consent was obtained from each participant prior to the experiments. Ten participants (two authors plus four naive observers for experiment 1, two authors plus four naive observers for experiment 2; mean age = 31, range = 28–41), all with normal or corrected-to-normal vision, participated in the study.
Stimuli were presented on the face of a calibrated 23 inch LCD monitor subtending 26° (horizontal) by 14.5°. Stimuli were generated using MATLAB (the MathWorks, Natick, MA) in conjunction with routines from the PSYCHTOOLBOX [20]. Responses were collected via a standard mouse and keyboard connected via USB to a PC yielding a temporal resolution of 4 ms.
(b) Experiment 1
Experiment 1 investigates the effect of stimulus reliability on serial dependence by manipulating the orientation and spatial frequency of the Gabor patch in a 2 × 2 design. Stimulus orientation was either oblique (close to the diagonal: 25° to 65°, in steps of 10°) or cardinal (close to vertical: −20° to +20°, in steps of 10°). The spatial frequency of the Gabor also varied: either 0.3 or 1.2 cycles per degree (cpd). Following Fischer & Whitney [3], the spatial frequency content of the mask was matched to that of the stimulus.
The experimental paradigm (figure 1a) was a close replication of the adjustment paradigms of Fischer & Whitney [3]. Each trial began with the presentation of an eccentric Gabor stimulus (contrast 25%, 500 ms, 3.2° full-width half-height), followed by a mask (random noise filtered, contrast 50%, 1000 ms) rightward of fixation (8° horizontal, 4° vertical eccentricity). Participants were instructed to reproduce the orientation of the Gabor patch by moving the mouse and setting the orientation of an oval (width 0.2°, length 1°). Participants confirmed the orientation with the space bar of the keyboard and the reproduction bar disappeared.
We expressed the strength of serial dependence as the weight of the previous orientation on the current judgement (figure 3e-h). To combine various conditions, we plotted reproduction bias (current reproduced orientation minus current stimulus orientation) on the ordinate against the orientation change across trials (orientation of the previous minus orientation of the current stimulus) on the abscissa. We excluded responses made more than 3.5 s after the disappearance of the Gabor patch and also those deviating by more than 30° from the physical orientation of the patch. A simple linear fit provides an estimate of the loading of previous orientation on current error. As serial effects are predicted and found to be maximal for relatively small stimulus differences [3,5] (see equation (3.1)), we restricted our linear fitting to trial pairs where the orientation change between the previous stimulus and the current response was between −10° and +10°. Six subjects took part in the experiment, each contributing 280 trials to each condition.
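The weight estimate described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names, the trial format `(stimulus_deg, response_deg, rt_s)`, the trial-by-trial (rather than binned) least-squares fit, and the choice to apply the exclusion criteria to the current trial only are all assumptions.

```python
def wrap_orientation(angle_deg):
    """Wrap an orientation difference into [-90, 90) degrees (180-degree period)."""
    return (angle_deg + 90.0) % 180.0 - 90.0


def serial_dependence_weight(trials, max_rt=3.5, max_error=30.0, max_change=10.0):
    """Slope of bias (response - current) against change (previous - current).

    `trials` is a list of (stimulus_deg, response_deg, rt_s) tuples in
    presentation order (hypothetical format).
    """
    changes, biases = [], []
    for (prev_stim, _, _), (stim, resp, rt) in zip(trials, trials[1:]):
        bias = wrap_orientation(resp - stim)
        change = wrap_orientation(prev_stim - stim)
        # Exclusions: slow responses, large errors, large orientation changes.
        if rt > max_rt or abs(bias) > max_error or abs(change) > max_change:
            continue
        changes.append(change)
        biases.append(bias)
    n = len(changes)
    if n < 2:
        raise ValueError("need at least two valid trial pairs")
    mean_x = sum(changes) / n
    mean_y = sum(biases) / n
    sxx = sum((x - mean_x) ** 2 for x in changes)
    if sxx == 0:
        raise ValueError("orientation change does not vary across trial pairs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(changes, biases))
    return sxy / sxx  # ordinary least-squares slope
```

A slope of 0 would indicate no influence of the previous stimulus, whereas a slope of 0.5 would indicate equal weighting of previous and current stimuli.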
Figure 1. Timeline and stimuli. (a) In experiments 1 and 2 we presented a Gabor stimulus for 500 ms, followed by a mask for 1000 ms and a response oval, which the participants had to adjust in order to match the orientation of the Gabor, confirming their response by pressing the space bar of the keyboard. In experiment 1, the orientation and spatial frequency (SF) of the Gabor were manipulated in a 2 × 2 design. Stimulus orientation in 'oblique' conditions was close to the diagonal (from 25° to 65°, in steps of 10°), whereas in 'cardinal' conditions it was close to vertical (−20° to +20°, in steps of 10°). The spatial frequency of the Gabor was either 0.3 or 1.2 cycles per degree (cpd) in 'low SF' or 'high SF' conditions, respectively. In experiment 2, the orientation of the Gabor was all around the clock in steps of 15° and the spatial frequency was fixed at 0.3 cpd. (b) In experiment 1, in separate sessions, participants performed a 2-AFC orientation judgement task on two Gabors followed by a mask and separated by 3000 ms; participants were asked to indicate which was more clockwise. (Online version in colour.)
rspb.royalsocietypublishing.org Proc. R. Soc. B 285: 20181722
The same participants also performed a two-alternative forced choice (2-AFC) orientation judgement task to measure individual sensitivity (figure 1b). To mimic the typical serial dependence paradigm, we presented Gabors sequentially, so that only a single stimulus was present at a given time. Stimulus parameters, position and mask were the same as in the reproduction task. The two presentations were separated by a 3 s pause, which was the average temporal separation of stimuli in the serial dependence studies. The proportion of 'more clockwise' responses (of 80 trials per participant) was plotted as a function of orientation difference between the first and the second stimulus to yield psychometric functions, which were fitted with cumulative Gaussians.
The standard deviation (σ) of these functions is an estimate of the underlying noise distribution. As there were two stimulus presentations in each trial, the final estimate of reliability was given by σ/√2.
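As an illustration of this step, the sketch below fits a cumulative Gaussian to 2-AFC counts and converts the fitted standard deviation into a single-stimulus estimate by dividing by √2. The fitting method (a coarse grid search over a binomial likelihood), the function names and the data layout are assumptions made for the example; the source does not say how the fits were obtained.

```python
import math


def cumulative_gaussian(x, mu, sigma):
    """Probability of a 'more clockwise' response at orientation difference x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_psychometric(diffs, n_clockwise, n_trials, mu_grid, sigma_grid):
    """Grid-search maximum-likelihood fit; returns the best (mu, sigma)."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            log_lik = 0.0
            for x, k, n in zip(diffs, n_clockwise, n_trials):
                # Clip probabilities so that log() stays finite.
                p = min(max(cumulative_gaussian(x, mu, sigma), 1e-6), 1.0 - 1e-6)
                log_lik += k * math.log(p) + (n - k) * math.log(1.0 - p)
            if best is None or log_lik > best[0]:
                best = (log_lik, mu, sigma)
    return best[1], best[2]


def single_stimulus_sigma(sigma_2afc):
    """Two noisy presentations per trial: divide the fitted sigma by sqrt(2)."""
    return sigma_2afc / math.sqrt(2.0)
```

The division by √2 follows from the fact that the difference of two independent estimates with equal variance has twice the variance of each.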
(c) Experiment 2
Experiment 2 studied how serial dependence varies with similarity of stimuli. We presented all possible orientations in steps of 15°, keeping spatial frequency fixed at 0.3 cpd (figure 1a), with all other parameters and timings the same as in experiment 1. Data were arranged in a two-dimensional space according to the orientation of the current and previous stimulus. For each combination of the two values, we calculated the average bias, the average root-mean-square error and the average response time. To gain power in the analysis, we then averaged together the conditions with the same 'previous-current' orientation difference. By convention, a positive difference indicates that the previous trial was more clockwise than the current. Six participants completed the experiment, leading to about 12 000 trials in total.
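The pooling by 'previous-current' difference can be sketched as follows. This is a simplified illustration that groups trial pairs directly by orientation difference rather than first building the two-dimensional table; the trial format, the function names and the convention that angles increase clockwise are assumptions.

```python
import math
from collections import defaultdict


def wrap_orientation(angle_deg):
    """Wrap an orientation difference into [-90, 90) degrees (180-degree period)."""
    return (angle_deg + 90.0) % 180.0 - 90.0


def summarize_by_difference(trials):
    """Group trial pairs by previous-minus-current orientation difference.

    `trials` is a list of (stimulus_deg, response_deg, rt_s) tuples in
    presentation order (hypothetical format). Returns a dict mapping each
    difference to (mean bias, RMSE, mean response time).
    """
    groups = defaultdict(list)
    for (prev_stim, _, _), (stim, resp, rt) in zip(trials, trials[1:]):
        diff = wrap_orientation(prev_stim - stim)
        error = wrap_orientation(resp - stim)
        groups[diff].append((error, rt))
    summary = {}
    for diff, rows in sorted(groups.items()):
        n = len(rows)
        mean_bias = sum(e for e, _ in rows) / n
        rmse = math.sqrt(sum(e * e for e, _ in rows) / n)
        mean_rt = sum(rt for _, rt in rows) / n
        summary[diff] = (mean_bias, rmse, mean_rt)
    return summary
```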

Results
(a) Ideal observer model
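Here we develop a model predicting how serial dependence can lead to optimal performance, gauged by measuring the deviation from correct responses in the face of sensory noise. The literature on multisensory integration and regression to the mean suggests that any sensory representation which can be characterized by noise can benefit when other information is taken into account [21][22][23][24][25]. Similarly, our model is essentially a weighted sum of the current and previous stimulus, to reduce noisiness:

$$R_{\mathrm{curr}} = (1 - w_{\mathrm{prev}})\,S_{\mathrm{curr}} + w_{\mathrm{prev}}\,S_{\mathrm{prev}} \tag{3.1}$$

where $R_{\mathrm{curr}}$ is the response to the current stimulus. Current and previous stimuli are $S_{\mathrm{curr}}$ and $S_{\mathrm{prev}}$, and the weight of the previous stimulus is $w_{\mathrm{prev}}$. In general, when an observer combines two signals there is a reduction of uncertainty, as the overall variance is

$$\sigma^2_{\mathrm{comb}} = (1 - w_{\mathrm{prev}})^2\,\sigma^2_{\mathrm{curr}} + w_{\mathrm{prev}}^2\,\sigma^2_{\mathrm{prev}} \tag{3.2}$$

which, for suitable weights, is smaller than either of the two variances alone. At the same time, a linear combination may introduce a detrimental biasing term:

$$\mathrm{bias} = w_{\mathrm{prev}}\,(S_{\mathrm{prev}} - S_{\mathrm{curr}}) = w_{\mathrm{prev}}\,d \tag{3.3}$$

which is proportional to the weight of the other cue and the distance $d$ between the two cues. Overall, the total squared error is given by the sum of the squared bias and the variance (figure 2h):

$$E^2 = \mathrm{bias}^2 + \sigma^2_{\mathrm{comb}} \tag{3.4}$$

Optimization entails the minimization of this quantity by selecting the appropriate $w$:

$$w_{\mathrm{prev}} = \arg\min_{w}\left[(w\,d)^2 + (1 - w)^2\,\sigma^2_{\mathrm{curr}} + w^2\,\sigma^2_{\mathrm{prev}}\right] \tag{3.5}$$

Rearranging so as to highlight $w_{\mathrm{prev}}$ (the term to be optimized) yields

$$E^2 = w_{\mathrm{prev}}^2\,(\sigma^2_{\mathrm{curr}} + \sigma^2_{\mathrm{prev}} + d^2) - 2\,w_{\mathrm{prev}}\,\sigma^2_{\mathrm{curr}} + \sigma^2_{\mathrm{curr}}$$

which is a quadratic function of $w_{\mathrm{prev}}$ (i.e. $y = ax^2 + bx + c$), minimized at $x = -b/2a$, that is, when

$$w_{\mathrm{prev}} = \frac{\sigma^2_{\mathrm{curr}}}{\sigma^2_{\mathrm{curr}} + \sigma^2_{\mathrm{prev}} + d^2} \tag{3.6}$$

When reliabilities of previous and current stimuli are the same ($\sigma_{\mathrm{curr}} = \sigma_{\mathrm{prev}} = \sigma$), this can be simplified to

$$w_{\mathrm{prev}} = \frac{\sigma^2}{2\sigma^2 + d^2} = \frac{1}{2 + (d/\sigma)^2} \tag{3.7}$$

which reveals that the crucial variable is the ratio between the stimulus change and the sensory resolution.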
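The optimization can be checked numerically. The sketch below assumes the weighted-sum form described above, with a bias of w·d and a combined variance of (1 − w)²σ²_curr + w²σ²_prev, and compares the closed-form weight σ²_curr / (σ²_curr + σ²_prev + d²) against a brute-force search over candidate weights. The function names are illustrative only.

```python
def total_squared_error(w_prev, d, sigma_curr, sigma_prev):
    """Squared bias plus variance of the weighted combination."""
    bias = w_prev * d
    variance = (1.0 - w_prev) ** 2 * sigma_curr ** 2 + w_prev ** 2 * sigma_prev ** 2
    return bias ** 2 + variance


def optimal_weight(d, sigma_curr, sigma_prev):
    """Closed-form minimizer of the quadratic in w_prev."""
    return sigma_curr ** 2 / (sigma_curr ** 2 + sigma_prev ** 2 + d ** 2)


def brute_force_weight(d, sigma_curr, sigma_prev, steps=10000):
    """Search weights in [0, 1] on a regular grid for the smallest error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda w: total_squared_error(w, d, sigma_curr, sigma_prev))
```

With σ = 10° for both stimuli, `optimal_weight(10, 10, 10)` gives 1/3 and `optimal_weight(40, 10, 10)` gives about 0.056, matching the weights of 0.33 and 0.056 quoted in the text for the 10° and 40° examples.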
The core idea of the model is illustrated in the example scenarios of figure 2a-d. The leftmost column shows hypothetical distributions of sensory representations of a currently displayed stimulus orientated at 40°, for which the resolution variability is 10° (figure 2a, grey). This information could in principle be used either alone or in conjunction with that from the previous stimulus which, in this example, was 10° away (mean = 30°, s.d. = 10°). If the observer uses the optimal weight (w = 0.33), the combined distributions of response estimates (given by the weighted product of the two original distributions) are slightly off the correct value, but also narrower (pink distribution in figure 2a). The lower plot (figure 2c) shows the distribution of overall squared error, calculated as the product of the magnitude of the error and the probability of its occurrence (figure 2a, grey). If the current sensory representation is used on its own (grey), the error distribution will be symmetrically bimodal. However, if the previous stimulus is combined in an optimal manner, with a weighting of 0.33 (pink distribution of figure 2c), the error distribution will become asymmetric, shifting to the left; but it will also become narrower, so the overall error (given by the area under the curve) is less than for the grey curve.
An important aspect of the model is that the weight given to the previous representation should be scaled down as the difference between past and present stimuli (d of equation (3.7)) increases. To illustrate this, we simulate a condition where a very different previous stimulus (40° more tilted) is combined with the present with an inappropriately high weighting of 0.33. The response distribution for the combined information is similarly narrower than either distribution alone (yellow distribution in figure 2b). However, the bias is now much larger (about 13° instead of 3°), and it produces squared error distributions that are very high, far higher than that for just considering the current stimulus. Clearly, a weighting of 0.33 for such large inter-stimulus distances is not optimal. Indeed, our formula prescribes that for a 40° difference, the weight of the past should only be 0.056 (purple star in figure 2e-g). Figure 2e shows the optimal weight as a function of orientation difference. The theoretical considerations above suggest that for very small differences the weight should be 0.5 (equal for past and present), then roll off as stimulus distance increases. Figure 2f shows the corresponding biasing errors, which are reminiscent of those found by Fischer & Whitney [3]. Figure 2g shows the overall root-mean-square error (RMSE) as a function of stimulus distance, with the best performance obtained when two successive stimuli are identical, and thus serial effects are maximal. Many of these signatures are evident in published data. Here, we aim to test directly various aspects of the model, showing that serial dependence leads to optimizing perception.
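The scenarios above can be reproduced with a short sketch. It assumes equal noise σ for previous and current stimuli, an optimal weight of σ² / (2σ² + d²), a bias of w·d and a variance of ((1 − w)² + w²)σ²; the function names are illustrative. Under these assumptions the bias of the ideal observer peaks at d = σ√2, and the RMSE at d = 0 is σ/√2.

```python
import math


def squared_error(w, d, sigma):
    """Total squared error for an arbitrary weight w of the previous stimulus."""
    return (w * d) ** 2 + ((1.0 - w) ** 2 + w ** 2) * sigma ** 2


def ideal_observer(d, sigma):
    """Optimal weight, bias and RMSE at stimulus distance d (equal noise)."""
    w = sigma ** 2 / (2.0 * sigma ** 2 + d ** 2)
    bias = w * d
    rmse = math.sqrt(squared_error(w, d, sigma))
    return w, bias, rmse


if __name__ == "__main__":
    # Tabulate weight, bias and RMSE for the two noise levels used in figure 2.
    for sigma in (10.0, 3.0):
        for d in range(0, 91, 10):
            w, bias, rmse = ideal_observer(float(d), sigma)
            print(f"sigma={sigma:4.1f} d={d:2d} w={w:.3f} bias={bias:5.2f} rmse={rmse:5.2f}")
```

For σ = 10° and d = 40°, `squared_error(0.33, 40.0, 10.0)` is well above `squared_error(0.0, 40.0, 10.0)` (the error of the current stimulus alone), which is the non-optimal case described in the text.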

(b) Experiment 1
We measured sensitivity and serial dependence in orientation judgements for four types of Gabor stimuli differing in average orientation (oblique, cardinal) and spatial frequency (low, high). Figure 3a-d shows psychometric functions for discriminating the orientation of the four types of stimuli. It is clear that the steepness of the psychometric functions depends on both spatial frequency and orientation: they are steeper (implying higher sensitivity) for cardinal stimuli of high spatial frequency, and shallower for oblique stimuli of low spatial frequency. Average just-noticeable differences (JNDs) (given by 1 s.d. of the cumulative Gaussian fit) are given in figure 3a-d. A two-way repeated measures ANOVA confirmed that high spatial frequencies yield lower thresholds than low spatial frequencies (F(1,5) = 67.8, p = 0.0004) and that cardinal stimuli yield lower thresholds than oblique stimuli (F(1,5) = 21.3, p = 0.006). No significant interaction was found (F(1,5) = 1.34, p = 0.30). Figure 3e-h shows serial dependence for the four types of stimuli. In all cases, the current trial was biased towards the orientation of the previous trial: positive when it was positive, and negative when negative. We calculate the weights of the past stimuli by the slope of the best-fitting linear regression to the three data points. The estimated weights (shown in figure 3e-h) increase in an orderly fashion, from the weakest serial dependence for high spatial frequency cardinal Gabors to the strongest effects with the low spatial frequency oblique Gabors. A two-way repeated measures ANOVA shows a main effect both of spatial frequency (F(1,5) = 9.4, p = 0.028) and average orientation (F(1,5) = 7.2, p = 0.044). No significant interaction was found (F(1,5) = 0.001, p = 0.97). This indicates that both factors have a strong and independent effect upon serial dependence.
Figure 2. (a,c) show representation histograms. Along with the representation of sensory estimates, we show the histograms of estimates provided by an observer who integrates current and previous signals with a 0.33 weight of the past (which is optimal for the 10° difference and non-optimal for the 40° difference). (b,d) show squared error distributions associated with each estimate of the observer: squared error grows fast for estimates which are far from the correct value (40°). In the case of small difference and optimal weight (b, pink) the observer performing integration fares better than the original set of estimates. In (e) we show how the weight of the previous stimulus should vary as a function of distance between the two stimuli, for noise of 10° (continuous line) and 3° (dashed line). Panels (f,g) display the bias and root-mean-square error (RMSE) of the ideal observer employing optimal weighting of current and previous information. Again, two examples are shown assuming noise of 10° (continuous line) and 3° (dashed line). The conditions depicted in panels (a-d) are highlighted with a pink star (optimal weight for 10° difference) or a hollow circle (non-optimal weight with 40° difference) along with the optimal choice for 40° (purple star). Inset (h) illustrates that RMSE is the Pythagorean sum of bias and the square root of variance.
To better explore the relationship between serial effects and sensory thresholds, we plot the strength of the serial effect against the discrimination threshold for all conditions and participants (figure 4). Superimposed on the data is the prediction of the ideal observer model of equation (3.1), which aims to minimize reproduction errors, considering both sensory noisiness and inter-stimulus distance. Note that there are no free parameters in this simulation, yet it follows the trend of the data very well (R² = 0.21). If we allow a simple scaling factor of k = 0.75 (which could reflect underestimation by the system of its noisiness, or corruption of the memory trace), the fit improves to R² = 0.58.

(c) Experiment 2
Fischer & Whitney's initial report [3] showed that serial dependence is strongest when two successive stimuli are relatively similar. This fact is well captured by our ideal observer model, with the term d² in the denominator. To test further whether the model could predict this behaviour quantitatively, we measured serial dependence with low spatial frequency Gabors (which yield the largest serial dependence effects) at all possible orientations in steps of 15°. Figure 5a shows the signed error in the current trial as a function of orientation difference between the previous stimulus and the current response. As reported by Fischer & Whitney [3], the maximum bias occurs at about 15° and scales down when stimulus differences are larger. The black curve shows the prediction of our ideal observer model for the average data across all orientations. Since in this experiment we did not collect independent measures of sensory resolution, we employed the average sensory resolution for low spatial frequency stimuli in the previous experiment (6.7°). The model clearly provides a good match to the data, in particular in the central region (from −60° to +60°), where the fit is very good (R² = 0.75). The Bayesian-based ideal observer model reduces overall error in discriminating noisy sensory inputs [22,24,26] by averaging successive noisy stimuli (figure 2). Figure 5c plots the average response scatter (root variance) as a function of the difference in orientation between the current and previous stimuli. When two orientations are identical (d = 0), responses are less scattered, about 15% less than when stimuli are 15°, 30° or 45° different (worst t-test is −15° versus 0°: t(5) = −2.6, p < 0.025). This is a clear signature of automatic averaging effects in the perceptual system.
The black curve in figure 5c shows the predicted scatter of the ideal observer model, using the model employed in figure 5a (again assuming that the average sensory resolution is 6.7°), with the only extra assumption that the response adds a constant noise to all of the trials (adjusted to best-fit, about 6.2°). The model, with only one degree of freedom, captures well the pattern of data (R² = 0.74). Figure 5d plots scatter independently for near cardinal and near oblique stimuli. As expected, the scatter in this condition inherits the amount of perceptual noise associated with each stimulus, and near oblique stimuli have more response scatter. Again, the ideal observer (dashed lines), with sensory resolutions of σ = 5.6° and 8.2° for cardinal and oblique, and 6.2° of motor noise fixed from the fit of figure 5c, provides a very good description of the data (R² = 0.77 and R² = 0.81 for oblique and cardinal). Figure 6a plots root-mean-square error (RMSE), or total error in the reproduction responses, as a function of the stimulus orientation difference between trials. RMSE is given by the Pythagorean sum of biasing errors and scatter errors, displayed independently in figure 5a,c (see illustration in figure 2h and equation (3.4)). This plot demonstrates the optimality of the model and the observer responses. Consistent with the predictions of the ideal observer, when two successive stimuli are identical (d = 0), serial dependencies should be at their highest, yielding minimal error, which is what is found.
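The scatter prediction can be sketched as below. This is a hedged illustration, not the authors' fitting code: it assumes equal sensory noise for previous and current stimuli, the optimal weight σ² / (2σ² + d²), and a constant response ('motor') noise that is independent of the sensory noise and therefore adds in quadrature. The function names are hypothetical; the default values of 6.7° and 6.2° are the sensory resolution and fitted response noise quoted in the text.

```python
import math


def predicted_scatter(d, sigma_sensory=6.7, sigma_motor=6.2):
    """Predicted response scatter (degrees) at stimulus distance d."""
    w = sigma_sensory ** 2 / (2.0 * sigma_sensory ** 2 + d ** 2)
    sensory_variance = ((1.0 - w) ** 2 + w ** 2) * sigma_sensory ** 2
    # Assumption: independent motor noise adds in quadrature.
    return math.sqrt(sensory_variance + sigma_motor ** 2)


def rmse(bias, scatter):
    """Total error as the Pythagorean sum of bias and scatter."""
    return math.hypot(bias, scatter)
```

Under these assumptions the predicted scatter is smallest at d = 0 and grows towards the no-integration value, √(σ²_sensory + σ²_motor), as the orientation difference increases.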
Although we did not ask subjects to make speeded responses, it is possible that the conditions which resulted in less error were accompanied by longer response times (speed-accuracy trade-off). We therefore plot median response times in figure 6b. The response times show a clear minimum for identical successive stimuli and increase with orientation difference (figure 6b). Figure 6c plots one quantity against the other, revealing that when two successive stimuli are identical there is a benefit along both dimensions, ruling out speed-accuracy trade-offs. This also shows that serial dependence can lead to increased efficiency, not only for error, but also for the more conventional measure of reaction times.

Discussion
In this paper, we explicitly tested a model for response optimization that leverages the previous stimulus to minimize response errors. The model was first developed to explain the mapping of number to space [2], and is adapted here for orientation reproduction. We show that serial dependence complies fully with the predictions of the Bayesian inference model. On the one hand, we demonstrate that serial effects scale with stimulus uncertainty and similarity of current and previous stimuli; on the other, we show that both reproduction errors and reaction times are lower when serial dependence is strong, showing it is beneficial.
The first prediction of our model is that serial dependence scales with sensory uncertainty (figure 2e). We measured serial dependence and sensory discrimination thresholds for four types of Gabors varying in orientation and spatial frequency and found that they vary together. This result confirms our suggestion that oblique and cardinal stimuli have different serial effects, reflecting their different reliabilities [5,18,19,27]. The model simulated these effects well, using the measured estimates of noise, and hence no free parameters. This result  [2], and shows that it generalizes well to other perceptual tasks such as orientation reproduction. It suggests that stimulus uncertainty is the major driving effect of serial effects regardless of the source of noise.
We also replicate and model the tuning of serial dependence (figure 5) [3,4], an important feature as it proscribes integration of dissimilar stimuli, which could lead to large estimation errors. The model is not perfect, especially for large differences in orientations (around 60°), where it tends to predict more serial dependence than is actually observed. This may reflect inadequacies of the model, or the fact that the system does not have direct access to the model parameters (discussed below). However, the model does capture the fact that dissimilar features tend not to be integrated, a fact often observed in research into multisensory fusion: diverse stimuli are usually not integrated [28][29][30]. This points to stimulus similarity as a general perceptual rule, which is implemented in many systems and may be an important hallmark of the neural implementation of Bayesian processes. Interestingly, our model gives some indication of appropriate parameters for measuring serial dependence. When the difference between stimuli is about 1 JND (σ = d in equation (3.7)), we expect the weight of the previous stimulus to be about 0.3.
Our model is an ideal observer model, developed to minimize total reproduction error (equation (3.5)). This results in a simple equation (equation (3.6)), where the theoretical weighting to the past depends on the reliability of past and present stimuli, and on their similarity. This is ideal. However, observers do not have access to the ground truth, either of the reliability of the stimuli or of the actual difference between them. All these parameters would need to be estimated in some way, and the estimation itself would not be noise-free. Several ideas have been advanced on how the system may extract an estimate of internal noisiness [31], and also of similarity [30], especially in multisensory research where this has long been acknowledged as a problem. Typically, researchers assume there is a 'coupling prior' [32], which they estimate from their data. We do not propose here any specific method of estimation, but rather use a theoretical value that should optimize performance. That this works well in predicting the data suggests that the system does have access to estimates of both noisiness and similarity of successive stimuli, although the mechanisms by which the parameters are estimated remain unknown and should be the subject of further research.
Importantly, serial dependence leads to an improvement of overall performance measured by response scatter (figure 5). This could only occur if noisiness were reduced by the integration of information over trials. Reduction of response scatter, as well as of overall response error (figures 5 and 6), was well predicted by our model and is in line with the Bayesian framework. Recently, a debate has emerged about whether serial dependence acts at the level of perception, or at the level of decision processes [3][4][5][6]33]. This distinction is fundamental in understanding the role of serial dependence for perception in general: if it acts only on decisional processes, it may have little to do with perception itself. The paradigms used here do not attempt to distinguish 'perception' from 'decision'. Indeed, to model the data we measured perceptual reliability with a forced choice technique that emulated the conditions of the main experiments, including a 3 s pause between stimuli. Thus the measured noise includes not only sensory components of encoding the stimuli, but also effects that could reflect short-term memory, over the 3 s interval between trials. However, the fact that both data and model show very clear improvement in performance suggests that serial dependence does not only bias perceptual decisions, but acts to improve perceptual efficiency, presumably by acting on perceptual processes themselves.
Our results also show that serial dependence can lead to decreases in reaction times. This result was unexpected and shows that the error reduction is genuine and not simply the by-product of a speed-accuracy trade-off. Although the conditions of our experiment are far from ideal (because we did not explicitly request speeded responses), the results still suggest that serial dependence may facilitate the accrual of information over time for the reproduction task. This fact helps to relate the recent research line of serial dependence to the older priming literature, which has long documented speed-up of responses when current presentations were primed by a suitable stimulus [34][35][36][37]. This is particularly obvious for examples when the term 'priming' has referred to low-level attentional selection [36,37], and has been conceived as a general perceptual process. Our demonstration that serial dependencies speed up response times in a reproduction task shows that biases in perception may well go hand in hand with perceptual distortions. After all, it is now becoming clear that the various processing stages of the brain accumulate evidence, and an alteration in lower-level representations that improves the quality of information also impacts on response speed.
Ethics. The study was approved by the Regional Ethics Committee (Comitato Etico Pediatrico Regionale-Azienda Ospedaliero Universitaria Meyer-Firenze), and was in accordance with the ethical standards of the 1964 Declaration of Helsinki. Informed written consent was obtained from each participant prior to the experiments.