Abstract
In the present review, we address the relationship between attention and visual stability. Even though with each eye, head and body movement the retinal image changes dramatically, we perceive the world as stable and are able to perform visually guided actions. However, visual stability is not as complete as introspection would lead us to believe. We attend to only a few items at a time and stability is maintained only for those items. There appear to be two distinct mechanisms underlying visual stability. The first is a passive mechanism: the visual system assumes the world to be stable, unless there is a clear discrepancy between the pre- and post-saccadic image of the region surrounding the saccade target. This is related to the pre-saccadic shift of attention, which allows for an accurate preview of the saccade target. The second is an active mechanism: information about attended objects is remapped within retinotopic maps to compensate for eye movements. The locus of attention itself, which is also characterized by localized retinotopic activity, is remapped as well. We conclude that visual attention is crucial in our perception of a stable world.
1. Introduction
In recent years, many researchers have emphasized that vision is an active process (e.g. [1,2]). This emphasis is well justified, since what we see depends as much on internal cognitive processes as it does on what is actually out there to see. An important aspect of active vision is that of all the visual information that is available to us, only a very limited selection is fully processed and ultimately guides action and perception. The remainder of the information is filtered out in the early stages of processing. This mechanism of selection is generally referred to as selective visual attention. By covertly attending (i.e. without making an eye movement) to a stimulus, we perceive that stimulus more clearly than we would if attention were unfocused or directed elsewhere. This increased perceptual ability can be measured as an increased sensitivity to faint stimuli [3], enhancement of perceived contrast [4] and decreased reaction times to attended stimuli [5]. In addition, visual attention is characterized by an inhibitory surround: processing of stimuli outside of but near the focus of attention is suppressed (e.g. [6–8]). These findings are paralleled by neurophysiological studies which have shown that visual attention enhances neural responsiveness and selectivity [9,10] and that the neural response to non-attended stimuli near the focus of attention is inhibited ([11]; for a review, see [12]). In addition to directing attention to a location in space, it is also possible to direct attention based on non-spatial features, such as colour or direction of motion [13]. However, in the present review, we will focus on spatial attention, which is especially relevant in the context of visual stability.
The effects of attention as studied in the laboratory are generally modest. For example, people respond about 20 ms faster to a validly cued, attended stimulus than to an uncued, neutral stimulus [5]. Presumably, this effect is small, because the display is sparse. In such a display, there is little competition between stimuli and therefore little effect of attention [9]. However, in more natural settings, the effects of attention can be substantial. This has been elegantly demonstrated in experiments on change blindness ([14]; see also [15]). In a typical change blindness experiment, participants observe two displays that are presented in alternation and differ in some important respect. If the two pictures are presented in immediate succession, the change is readily detected, because it constitutes a unique visual event. However, if a blank screen is introduced between the two displays, it takes considerable time and effort to detect the change. This is because the entire display now flashes and the change is no longer a unique visual event. In order to nevertheless find the changing element, you have to attend to different parts of the display in a serial fashion. This illustrates that, in natural settings, it is an understatement to say that attention provides us with improved perceptual abilities. Rather, we consciously perceive only what we attend to [16], which will be a recurring theme in the present review.
An equally important aspect of active vision is that we continuously make eye, head and body movements. This way, we actively control which visual input we receive, even prior to any effects of covert visual attention. Eye movements are an integral part of vision, because without eye movements we would only perceive a very small part of the visual field with high acuity and in colour: the part that projects onto the fovea. By making eye movements we sequentially extract information from different parts of the visual field. This method of actively sampling our environment comes so naturally that we are generally not aware of it. Perhaps even more surprisingly, we are also not aware of the fact that with each eye movement there is a corresponding shift in our retinal image of the world. Somehow, despite incomplete and unstable visual input, we feel as though we have a complete and stable percept of the world and are able to effortlessly perform visually guided actions.
In the current review, we focus on the role of attention in visual stability. Section 2 discusses trans-saccadic memory (TSM), a visual memory buffer that allows information to be retained across saccades. Section 3 describes the assumption of stability: we perceive a stable world, simply because we assume the world to be stable. The final three sections discuss remapping of receptive fields (RFs), which has received considerable interest as a potential mechanism underlying visual stability. Sections 4 and 5 deal with neurophysiological and behavioural studies on remapping, respectively. Section 6 describes a number of alternative views, which challenge the traditional notion of remapping.
2. Trans-saccadic memory
Subjective experience suggests that visual stability is absolute and complete. Not surprisingly, therefore, it has been suggested that conscious experience does not rely directly on retinotopically organized input, but on a representation of the world which is independent of eye position (spatiotopic). In general terms, TSM is such a spatiotopic memory buffer. However, its exact characteristics have been the subject of substantial debate and revision (for a review, see [17]). Initially, TSM was assumed to be a pre-attentive visual buffer, containing all visual detail of the world. In this form, it was also called an integrative visual buffer to emphasize its role in trans-saccadic integration [18,19]. Because trans-saccadic integration was believed to occur pre-attentively (at an early stage of processing), it was predicted that people should be able to seamlessly integrate information across saccades. Essentially, it should not matter whether people make eye movements or not. Although there was some initial support for this idea [20,21], further scrutiny revealed that people are often unable to integrate information across saccades [22–24], whereas they have no difficulty doing so while fixating [25]. These findings did not cause the notion of TSM to be abandoned, but the concept clearly needed modification (figure 1).
In a series of studies, Irwin [26–28] investigated the properties of TSM. In one experiment, participants were presented with an array of letters [27]. Next, a saccade target was presented. As soon as participants initiated an eye movement, the array of letters was extinguished. After the eye movement, a cue was presented and participants had to report which letter had been presented at the cued location. This experiment revealed two important properties of TSM. First, people remembered only three to four letters, suggesting a capacity limitation. In addition, memory was best for objects near the saccade target. The importance of this latter finding became apparent when later studies revealed that an eye movement is always preceded by a covert shift of attention [29,30], so that the saccade target receives an attentional benefit. This explained why in Irwin's study [27] TSM was best for stimuli near the saccade target: those stimuli received an attentional benefit and were therefore stored in TSM. The idea that attention functions as a ‘gatekeeper’ for TSM was investigated in more detail by Prime et al. ([31]; see also [32]). They instructed participants to remember a number of randomly positioned stimuli (patches of tilted lines known as Gabor patches). One of these stimuli was cued prior to its presentation, indicating that it was likely to be probed in the response phase. Presumably, participants attended to the cued stimulus. After an eye movement, a probe stimulus was presented (another Gabor patch). Participants reported whether the probe was tilted clockwise or counter-clockwise, relative to the original stimulus (the stimulus that had previously been presented at the same location). The crucial finding was that performance was best for stimuli that had been cued, confirming that TSM is best for attended stimuli.
On the basis of these findings, it can be concluded that TSM has a limited capacity and that attention acts as a gatekeeper. Other properties, not directly related to visual attention, are that TSM deals predominantly with abstract, conceptual information [17,33] and has a coarse spatial resolution [34]. Low-level, non-conceptual information has some effect on trans-saccadic integration, the extent of which is a matter of debate (e.g. [35,36]), but there appears to be a type of ‘gradient’: low-level features are not entirely lost, but conceptual features are dominant [37]. Taken together, the properties of TSM are strongly reminiscent of spatial working memory. The natural conclusion is that TSM is not a separate entity, but simply a name for spatial working memory in the context of eye movements [26].
To conclude, researchers have posited the existence of TSM. TSM contains a spatiotopic representation of the world, which is independent of eye position. In order to be integrated across saccades, stimuli need to be stored in TSM. Rather than a dedicated mechanism for trans-saccadic perception, TSM appears to rely on working memory [26,31]. TSM has a limited capacity and only information about attended stimuli is retained [31,32].
3. The assumption of stability
As was mentioned in the previous section, every saccade is preceded by a covert shift of attention [29,30,38]. In a typical paradigm investigating pre-saccadic shifts of attention, participants are instructed to make an eye movement to a particular location. After participants have been cued to make a saccade, but before the eyes start to move, a stimulus is presented at the saccade goal. The pre-saccadic shift of attention is reflected by the finding that stimuli presented at the saccade goal are more readily discriminated [29] and elicit stronger priming effects [38] than stimuli presented elsewhere. A related finding is that people subjectively feel that the eyes have already moved to the saccade target, when in fact the saccade is yet to be executed [39,40]. Presumably, this is due to the pre-saccadic shift of attention, which provides improved perception of the saccade target before it has been foveated.
A number of researchers have suggested that the pre-saccadic shift of attention is integral to visual stability [41–44]. In this view, attention precedes an eye movement to allow for an accurate preview of the saccade target. After the eye movement, this region is observed again and trans-saccadic integration occurs based on the assumption that the saccade target and its surroundings have remained stable. It is, in a sense, a ‘snapshot’ theory, in which pre- and post-saccadic snapshots are superimposed. This differs from the traditional notion of TSM in that no knowledge of absolute spatial positions is required, since snapshots are integrated based on content rather than location. This also differs from the integrative visual buffer in that these snapshots are believed to contain mostly abstract representations, modulated by attention.
Assuming that the saccade target is stable (at least for the duration of a saccade) makes ecological sense, but in the laboratory this assumption can be violated quite easily by moving the saccade target while the eyes are in motion. Since visual perception is strongly suppressed during eye movements [45,46], the exact moment of displacement is not observed and the visual system relies on pre- and post-saccadic snapshots to detect the displacement. Remarkably large displacements of the saccade target go unnoticed [47], confirming the notion that the visual system assumes the saccade target to be stable unless there is strong evidence to the contrary. In situations where the saccade target is clearly not stable, for example if the saccade target is already in motion prior to the saccade [48] or is briefly blanked after the saccade [49], displacement detection is greatly improved.
Visual attention is intricately related to the assumption of stability, as attention appears to be a determining factor in which objects are assumed to be stable. We can illustrate this by describing the assumption of stability in terms of ‘finding the best fit’ (figure 2). As mentioned, pre- and post-saccadic snapshots of the saccade goal and its surroundings are constructed. These snapshots contain representations of stimuli to the extent that they are attended. Effectively, this means that the saccade target itself is strongly represented, but nearby stimuli can also be represented, although more weakly. Integration occurs based on the assumption that the best fit between the pre- and post-saccadic snapshots is the true fit. This simple principle explains many findings. For example, if the saccade target is displaced during the saccade, there is still a perfect fit between pre- and post-saccadic snapshots (figure 2a). The only difference lies in absolute spatial position, which is not a factor in determining the best fit. Consequently, the visual system fails to perceive the displacement. We can also consider what happens if a second stimulus (an ‘X’) is added, which remains stable while the saccade target is displaced (figure 2b). In this case, the best fit still results from matching the pre- and post-saccadic saccade target. The best fit requires a misalignment of the pre- and post-saccadic X, because it receives less attention than the saccade target and therefore contributes less to the overall fit. Consequently, the X is erroneously perceived as being displaced [50]. This principle also explains why, if multiple stimuli are presented, a displacement is generally attributed to the stimulus that is briefly blanked at the moment the eyes arrive at the saccade target, regardless of which stimulus was actually displaced [42,51]. This is because only the stimuli that are present right after the saccade contribute to the fit. If one of the stimuli is missing (because it has been blanked), the fit will be poor, but the best fit will nevertheless result from aligning the stimuli that are present.
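The ‘best fit’ principle can be illustrated with a toy computation. The following sketch (Python; the positions, attentional weights and Gaussian match term are hypothetical choices made purely for illustration, not parameters from the cited studies) searches for the snapshot alignment that maximizes an attention-weighted match between pre- and post-saccadic items. With a strongly attended, displaced saccade target and a weakly attended, stable ‘X’, the best fit aligns the target, so the X is misjudged as displaced, as in figure 2b:

import numpy as np

def best_fit_shift(pre, post, weights, shifts=np.linspace(-5, 5, 101)):
    # Score each candidate alignment: items that line up after shifting the
    # pre-saccadic snapshot contribute in proportion to how strongly they
    # are attended.
    scores = [np.sum(weights * np.exp(-((pre + s - post) ** 2) / 2.0))
              for s in shifts]
    return shifts[int(np.argmax(scores))]

# One-dimensional positions in degrees: a saccade target and a second
# stimulus 'X'. The target is displaced by 1 deg during the saccade; X is stable.
pre = np.array([0.0, 2.0])
post = np.array([1.0, 2.0])
weights = np.array([1.0, 0.2])   # target strongly attended, X weakly attended

shift = best_fit_shift(pre, post, weights)
perceived_displacement = post - (pre + shift)
print(shift, perceived_displacement)   # target ~aligned, X appears displaced

Because absolute position does not enter the score, displacing the whole configuration (as in figure 2a) leaves the best fit unchanged and therefore goes unnoticed in this toy scheme as well.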
There are a number of qualifications that should be made. First, we have not considered what would happen if a stimulus is replaced by a qualitatively different stimulus during a saccade. Changing stimulus identity has a definite effect on trans-saccadic integration, which indicates that qualitative factors are important in matching pre- and post-saccadic information (e.g. [33]). In addition, even if a stimulus is briefly blanked after the saccade, it may still serve as a stable reference point, provided that other stimuli are blanked for a longer period of time [51]. This suggests that there is substantial temporal ‘fuzziness’ in the assumption of stability. Perhaps even more surprisingly, effects of stimulus blanking and displacement can also be observed during fixation, suggesting that the assumption of stability is a general phenomenon and not limited to trans-saccadic perception [51].
An important question is: if only a saccade target is presented, why does post-saccadic blanking improve detection of its displacement [49]? The fact that blanking breaks the assumption of stability is part of the explanation, but leaves us with another question: why do we still have a sense of position when we cannot rely on the assumption of stability? The answer must be that we fall back on different mechanisms (see §§4–6). This is also supported by evidence from corrective saccades. If a saccade target is displaced during the saccade, corrective saccades are executed towards the new location of the saccade target [52]. This is the assumption of stability at work. However, if the saccade target is removed (after the eyes have started to move), corrective saccades are executed towards the former location of the saccade target [53]. Clearly, the visual system has a way of maintaining positional information across saccades that does not rely on the assumption of stability.
To conclude, our visual system exploits the fact that the world is a stable place, at least for the duration of an eye movement (e.g. [43,44]). Generally, the saccade target dominates the assumption of stability, because it is strongly attended just before each eye movement (e.g. [30]), but other attended stimuli may serve as a stable reference point as well.
4. Remapping and attention: neurophysiology
As visual information enters the primary visual cortex, retinal topography is preserved: adjacent neurons process information from adjacent, and usually largely overlapping, parts of the retina [54]. However, as we move further along the visual processing hierarchy, things become considerably less clear. RFs of neurons in these later areas differ in many important respects, but here we focus on the distinction between retinotopy and spatiotopy. In addition, RFs change in different ways in the interval preceding an eye movement ([55]; see §6), but here we restrict the discussion to pre-saccadic RF shifts in the direction of the eye movement, usually called predictive remapping.
If the RF of a neuron is retinotopic, it is anchored to a location on the retina, which may correspond to different locations in the world depending on eye position. This is essentially what underlies the problem of visual stability. In contrast, if a neuron has a spatiotopic RF, it is always responsive to the same spatial location, irrespective of eye, body and head position. Because in most studies the head and body are in a fixed position, the term ‘spatiotopic’ is often used loosely and applied to responses that are highly independent of eye position. An important question is whether spatiotopy exists in the brain. It is attractive to assume that it does, since this would effectively solve the problem of visual stability. According to the spatiotopic hypothesis, action and conscious experience are based on spatiotopically organized brain areas. This bears some conceptual resemblance to TSM, although TSM is a cognitive construct that is not necessarily intended to reflect a spatiotopic map at the neural level.
In apparent support of the spatiotopic hypothesis, brain areas have been identified in which RFs are modulated by eye position ([56–58]; but see [59]). RFs in these areas are not retinotopic, but neither is it obvious that they are of the fine-grained spatiotopic sort that would be expected based on the spatiotopic hypothesis. An alternative, perhaps more likely, interpretation is that these RFs are tailored towards a specific modality, rather than being spatiotopic and directly related to visual stability. For example, in the extended dorsal stream, there is a continuum from visual to motor responses, such that observing an object automatically activates an associated motor programme [60]. Since information in retinal coordinates is of little use for programming manual reaching movements, a translation from retinotopic coordinates to a more appropriate frame of reference (for example, body-centred coordinates) seems natural. However, this does not require true spatiotopy and does not provide strong evidence for the spatiotopic hypothesis.
For this reason, the spatiotopic hypothesis has fallen out of favour as the complete solution to the problem of visual stability (see [61] for a discussion). However, it is well established that many RFs are modulated by eye position, presumably mediated by a corollary discharge [62]. It has been proposed that remapping of RFs might be the solution to the problem of visual stability. Before discussing neurophysiological studies, we will briefly introduce the concept of remapping by analogy.
Imagine that you are sitting in a train without windows. You are instructed to remain at the same position—not relative to the train, but relative to the outside world. This is tricky, because the train occasionally moves and you cannot look out of the windows to see where you are. Fortunately, the train operator always announces exactly how far and in what direction the train is going to move, just before the train actually sets in motion. Therefore, if you hear ‘Folks, we are about to move 20 m forward’, you quickly run 20 m to the back of the train, thus compensating for the movement of the train.
How does this example relate to visual stability? Imagine that a stimulus is briefly presented. Even after the stimulus has been extinguished, there is some residual neural activity. This is often called a memory trace [63], but you can also think of it as an attention-related increase in baseline activity [64]. The problem that the memory trace faces is analogous to that of our example. If the eyes move, the memory trace becomes misaligned with the world: the same spot in the retinotopic map now corresponds to a different location in the real world and therefore the memory trace is not sitting in the right spot of the retinotopic map any more. Fortunately, the corollary discharge informs the visual system of the impending eye movement. Using this information, the memory trace can be transferred onto a different set of neurons in the same retinotopic map, so that it remains correctly aligned with the world (e.g. [63]). This mechanism is called remapping or spatial updating. In a nutshell, remapping is a transfer of activity between retinotopically organized neurons. This transfer of activity is such that it compensates for eye movements, effectively updating retinotopic representations to prevent a misalignment with the world. This provides a way for the visual system to maintain visual stability without the need for spatiotopic RFs, and therefore it is sometimes called the retinotopic hypothesis.
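As a concrete analogue of the train example, the following sketch (Python/NumPy; the map size, trace position and saccade amplitude are arbitrary illustration values, not taken from the cited studies) shifts a memory trace within a one-dimensional retinotopic map, in the direction opposite to the eye movement, so that it continues to point to the same location in the world:

import numpy as np

positions = np.arange(-10, 11)            # retinotopic positions (deg)
activity = np.zeros(positions.size)
activity[positions == 5] = 1.0            # memory trace at +5 deg on the retina

eye_before = 0
saccade = 5                               # intended 5 deg rightward saccade
eye_after = eye_before + saccade

# The corollary discharge signals the impending saccade, so the trace is
# handed over to neurons 5 deg to the left within the same retinotopic map.
remapped = np.roll(activity, -saccade)

world_before = positions[activity > 0][0] + eye_before   # +5 deg in the world
world_after = positions[remapped > 0][0] + eye_after     # still +5 deg
print(world_before, world_after)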
Remember that the train operator signals movement before the train actually sets in motion. This allows you to get a head start, by running to the back of the train before the train starts moving forward. Similarly, a corollary discharge informs the visual system of an eye movement before it occurs, since it conveys information about intended rather than actual eye movements. This allows remapping to start before an eye movement, in which case it is referred to as predictive remapping. So far, we have looked at remapping from the perspective of the memory trace (of course, the same principles apply to remapping of visual information in general). However, predictive remapping is commonly described in terms of RFs. This distinction is important, because the identity of a memory trace is independent of the neurons that encode it. After all, the memory trace may be remapped from one set of neurons onto another. This shift in perspective is also useful, because it sheds some light on how remapping works. In the interval preceding an eye movement, RFs shift in the direction of the eye movement [65]. This may seem at odds with the fact that the memory trace is remapped in the direction opposite from the eye movement (as you run against the movement of the train), but it is not (figure 3). The anticipatory RF shift allows a neuron to take a ‘sneak peek’ at the location that will be brought into its RF. This is somewhat analogous to the pre-saccadic shift of attention (see §3) but applies to the visual field as a whole, rather than just the saccade target. In this context, the RF-location-to-be is often called the future field (FF). If the memory trace happens to be in a neuron's FF, the neuron will take over some of the memory trace activity, which corresponds to remapping of the memory trace. Remapping of activity is therefore in the direction opposite from the anticipatory RF shift.
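A short sketch of this future-field logic (hypothetical coordinates, chosen only to illustrate the directions involved) may help keep the two directions apart: the RF previews a location shifted in the saccade direction, while the trace activity is transferred against it:

import numpy as np

saccade = np.array([5.0, 0.0])         # intended saccade vector (deg)
rf_centre = np.array([0.0, 0.0])       # neuron's current RF centre (retinal coords)
future_field = rf_centre + saccade     # pre-saccadic shift in the saccade direction

trace_location = np.array([5.0, 0.0])  # retinal location of the memory trace

# If the trace lies in this neuron's FF, the neuron takes over the trace
# before the saccade: the trace moves from the neuron whose RF is at +5 deg
# to this neuron at 0 deg, i.e. opposite to the RF shift.
takes_over = np.linalg.norm(trace_location - future_field) < 1.0
print(takes_over)   # True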
We now move on to the actual neurophysiological studies. The first evidence for remapping was reported in primate single-cell recording studies of the frontal eye fields (FEFs; [66]) and the lateral intraparietal area (LIP; [67]), using the double-step paradigm. In a typical double-step task, two saccade targets are briefly presented. After the targets have been removed, participants (monkeys in this case) make two successive eye movements to where the targets used to be. The rationale behind this paradigm is that the first eye movement causes a retinal displacement of the location of the second target. Because the second target is no longer visible at the time of the second eye movement, somehow this retinal displacement needs to be taken into account when programming the second eye movement. The crucial finding was that if the location of the second target was brought into a neuron's RF (or movement field) by the first saccade, the neuron would often respond, even though the second target was no longer visible. The explanation is that the memory trace of the second target was remapped to compensate for the eye movement, and that the neuron was responding to the remapped memory trace.
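The geometry of the double-step task can be summarized in a few lines (hypothetical target positions): because the first saccade changes eye position, the correct vector of the second saccade differs from the retinal vector the second target had when it was flashed, which is why the retinal displacement must be taken into account:

import numpy as np

fixation = np.array([0.0, 0.0])
target1 = np.array([10.0, 0.0])    # first saccade target (world coordinates, deg)
target2 = np.array([10.0, 10.0])   # second saccade target (world coordinates, deg)

retinal_t2_at_flash = target2 - fixation    # (10, 10): where T2 fell on the retina
correct_second_saccade = target2 - target1  # (0, 10): vector needed after saccade 1

print(retinal_t2_at_flash, correct_second_saccade)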
In a landmark study, Duhamel et al. [65] extended these findings in a remarkable way. They recorded from cells in the monkey LIP. The crucial finding was that almost half the neurons became responsive to their FF after the monkey had been instructed to make a saccade, but before the saccade had been executed: unmistakable evidence for predictive remapping. In addition, neurons became less responsive to their current RF: RFs shifted from current to FF. However, later studies showed that in other areas neurons sometimes become responsive to their FF, but remain responsive to their current RF as well [68]. When a stimulus was removed before the saccade, Duhamel et al. [65] found evidence for remapping (not necessarily predictive) of the memory trace of the removed stimulus. This was the case for almost all LIP neurons, so remapping appears to be a ubiquitous phenomenon in at least some brain areas.
In addition to the parietal cortex [65], remapping has been demonstrated in the FEF [69], the superior colliculus [70], areas V3 and V2 [68] and even in V1 [71]. Despite the fact that remapping occurs at many, if not all, levels of the visual system, the tendency is for early visual areas to show less remapping, and to show it later in time, than areas such as the FEF [72]. This, and the observation that the FEF receives a strong corollary discharge [73], has led researchers to suggest that the FEF may be an important source for remapping of visual responses [74].
With respect to visual attention, an important question is whether a covert shift of attention is by itself sufficient to trigger remapping, as one might think given the strong link between visual attention and the oculomotor system [75]. This is not the case: remapping occurs only in combination with eye movements and there is no evidence to suggest that it can be induced by a covert shift of attention [76]. This makes sense, of course, because remapping in the absence of an eye movement would cause retinotopic maps to become misaligned with the world, which would stand in contrast with the assumption that remapping is a mechanism to prevent misalignment. However, visual attention does play an important role in remapping in a different way. By recording from LIP neurons, Gottlieb et al. [77] investigated how remapping is affected by attention. Area LIP is often conceptualized as a priority (or saliency) map [78]. That is, it is believed to contain little information about specific features, such as colour and form, but to be driven by the abstract notion of ‘priority’. The priority of an object is determined by bottom-up and top-down factors. Bottom-up factors are due to stimulus features, such as a sudden onset or a conspicuous colour, but are short-lasting [79]: an onset stimulus initially captures attention, but attention can be disengaged quickly. Top-down factors are due to the behavioural relevance of an object and can be long-lasting: if you want to, you can attend for a long time to a stimulus, even if it is inconspicuous. The priority map is ‘read out’ by the visual system to guide attention and therefore there is a strong correspondence between activation in the priority map and the allocation of attention [78]. In accord with the view of LIP as a priority map, Gottlieb et al. [77] found that LIP neurons showed a sustained response to a behaviourally relevant stimulus (a saccade target), a brief response to a behaviourally irrelevant onset stimulus and little to no response to a behaviourally irrelevant persistent stimulus (i.e. a stimulus that has been visible for an extended period of time). With respect to visual stability, an important question is what happens if a persistent stimulus is brought into a neuron's RF by an eye movement. From the perspective of a neuron that has never ‘seen’ the persistent stimulus before, the stimulus is novel and might therefore elicit a burst of activity as though it were an onset. This would result in a large number of pseudo-onset stimuli with every saccade, which would clearly be detrimental to performance and our sense of visual stability. What Gottlieb et al. [77] found was that a stimulus elicits a burst of activity only once, even if an eye movement brings it into the RFs of a new population of neurons. This shows that an important characteristic of bottom-up attention is preserved across saccades: stimuli capture attention only once.
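One toy reading of these findings (all weights and time constants below are hypothetical and are not fitted to the data of Gottlieb et al.) is that priority combines a rapidly decaying bottom-up term, triggered only once per stimulus, with a sustained top-down term; after a saccade the ‘already seen’ status travels with the remapped representation, so the stimulus does not re-capture attention:

import numpy as np

def priority(t_since_rf_entry, relevant, already_seen, tau=0.1):
    # Bottom-up burst: only for genuinely new stimuli, decaying quickly.
    bottom_up = 0.0 if already_seen else np.exp(-t_since_rf_entry / tau)
    # Top-down component: sustained as long as the stimulus is relevant.
    top_down = 1.0 if relevant else 0.0
    return bottom_up + top_down

print(priority(0.0, relevant=False, already_seen=False))  # onset: brief burst (~1.0)
print(priority(0.5, relevant=False, already_seen=False))  # irrelevant, later: ~0
print(priority(0.0, relevant=True, already_seen=True))    # saccade target: sustained (1.0)
print(priority(0.0, relevant=False, already_seen=True))   # after a saccade: no new burst (0.0)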
An important question is whether remapping applies to all stimuli or only to a subset. The study by Gottlieb et al. [77] shows that stimuli that are attended are also remapped. Of course, it is difficult to show conclusively that information about unattended objects is never remapped, but many researchers believe this to be the case. Therefore, since most stimuli are not attended, most stimuli are not remapped. This is strongly reminiscent of the behavioural studies that we discussed previously, showing that TSM is best for attended stimuli [32,80].
To conclude, remapping (or spatial updating) is a strong candidate mechanism for visual stability [61]. Remapping refers to the transfer of visual information within retinotopic maps to compensate for eye movements [65]. It is generally believed that remapping is limited to attended stimuli [77]. Therefore, visual stability is maintained only for those stimuli that guide action and conscious perception.
5. Remapping and attention: behavioural findings
The hypothesis that remapping of RFs underlies visual stability is originally based on neurophysiological findings. However, there is a fast growing body of behavioural research on remapping. In this section, we highlight a number of behavioural studies that have specifically investigated the role of visual attention in remapping.
Melcher [32] investigated trans-saccadic integration using the tilt-adaptation after-effect (TAE). In a typical TAE experiment, participants are exposed to a tilted grating (the adapter) for some time. Next, they are presented with another, slightly tilted grating (the tester) and are asked to report the orientation of the tester. TAE is a bias to report the tester as being tilted away from the adapter orientation. TAE persists, albeit in slightly reduced form, if an eye movement is executed between the presentation of the adapter and the tester, if they are presented at the same spatial location ([32,81]; but see [82]). This suggests that the representation of the adapter is remapped to compensate for the eye movement (see also [83]). Gottlieb et al. [77] have shown that, at least for LIP neurons, visual attention determines which objects are represented and consequently remapped. Similarly, Melcher [32] found that if attention was directed to an adapter stimulus, TAE increased. However, this was the case regardless of whether an eye movement had been made between the presentation of the adapter and the tester. Again, this demonstrates that attention determines which objects are represented and that only represented (i.e. attended) objects are remapped. The role of attention in visual stability is therefore the same as the more general role of attention as a perceptual filter.
Another important question is whether attention itself is remapped. Even though the locus of attention is not a physical stimulus, it is characterized by localized activity in the visual system and as such can be remapped like a regular stimulus. It has generally not been described in these terms, but this is exactly what has been done in the previously discussed neurophysiological and neuroimaging studies, which investigated remapping of a memory trace [63,65,66]. In these studies, a stimulus was presented briefly, presumably attracting attention. Even after the stimulus had been removed, some residual activity was observed. This residual activity is usually referred to as a memory trace, but as suggested earlier, it can also be thought of as an attention-related increase in activity [64].
Posner & Cohen [84] were the first to investigate the reference frame of attention or, using modern terminology, remapping of attention. They investigated both attentional facilitation and the subsequent inhibitory phase (inhibition of return, IOR). Posner & Cohen [84] found that facilitation was retinotopic: if participants made an eye movement, the locus of attention moved with the eyes to a new spatial position. In contrast, they found that IOR was spatiotopic: the locus of inhibition remained at the same spatial location regardless of eye movements (see also [85]; but see [86]). The finding that IOR is spatiotopic makes ecological sense, because IOR is a relatively sustained effect, typically spanning multiple eye movements. However, the dissociation between facilitation (retinotopic) and inhibition (spatiotopic) was surprising, since these two phenomena are generally assumed to be linked.
More recently, Golomb et al. [87] investigated remapping of attention in more detail. In order to attract attention to a location, they instructed participants to remember the location of a briefly flashed cue [88]. After participants had made an eye movement, a line segment was presented at one of three locations: the original attended location (spatiotopic), a location that retinotopically matched the original attended location (retinotopic) or a control location. The difference in reaction time for reporting the orientation of the line segment between the location of interest and the control location (attentional facilitation) was taken as a measure of attentional allocation. The results depended strongly on the task instruction and on the moment at which the line segment was presented. If the instruction was simply to remember the cued location, facilitation was initially strongest at the retinotopic location. However, retinotopic facilitation dissipated quickly, whereas spatiotopic facilitation was more sustained. This suggests that the locus of attention was remapped to compensate for the eye movement, resulting in spatiotopic facilitation. Because remapping was incomplete, there was retinotopic facilitation directly after the eye movement, which dissipated rapidly owing to a lack of maintenance. However, if the instruction was to remember the location relative to the eyes, the results were quite different. In this case, there was sustained retinotopic facilitation and even a hint of spatiotopic inhibition. This led the authors to conclude that the locus of attention is essentially tied to retinotopic coordinates, and is not remapped unless this is explicitly required. In other words, the authors propose an additional restriction on remapping: even attended objects are remapped only when this is required for the task at hand. However, to memorize a location relative to the eyes is arguably an awkward instruction and participants may have resorted to unknown strategies in order to comply. The instruction to simply memorize a location is more natural and indeed yielded results that are more consistent with neurophysiological studies, which typically do not consider task instruction at all (e.g. [65]).
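The three probe locations in such a design follow from a simple coordinate transform (the sketch below uses hypothetical coordinates for illustration only): the spatiotopic location is the cue's original location in the world, whereas the retinotopic location is the post-saccadic eye position plus the cue's original retinal vector:

import numpy as np

cue_world = np.array([4.0, 2.0])    # cued (attended) location in world coordinates (deg)
fix_before = np.array([0.0, 0.0])   # fixation before the saccade
fix_after = np.array([8.0, 0.0])    # fixation after the saccade

cue_retinal = cue_world - fix_before         # cue's retinal vector before the saccade

spatiotopic_probe = cue_world                # same location in the world
retinotopic_probe = fix_after + cue_retinal  # same location on the retina: (12, 2)
print(spatiotopic_probe, retinotopic_probe)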
In a paradigm inspired by the study by Golomb et al. [87], we likewise investigated the effect of an eye movement on the locus of attention [89]. We presented an onset stimulus, which is known to attract attention [90]. In one experiment, we presented the probe (which was also a line segment) at the time of the saccade, in which case we found facilitation at both the spatiotopic and retinotopic location. Again, this suggests that the locus of attention is remapped, resulting in spatiotopic facilitation, but that remapping is only partial, resulting in retinotopic facilitation. In a related experiment ([91]; see also [92]), participants did not respond manually to a probe, but, after the first eye movement, made a second eye movement to the location of interest (spatiotopic, retinotopic or one of two control locations). The first eye movement allowed us to dissociate retinotopic and spatiotopic coordinates. The latency of the second eye movement was used as a measure of attention: faster eye movements indicate more attention (e.g. [30]). Because of the relatively long interval between onset presentation and the second saccade, we expected IOR rather than facilitation. The results were clear-cut: if the second saccade was made right after the first saccade, IOR was predominantly retinotopic. At longer intervals, IOR was predominantly spatiotopic. These findings resemble those of Golomb et al. [87], who reported the same pattern of results for attentional facilitation. In relation to the studies of Posner & Cohen [84], these findings illustrate that reference frames are flexible and dynamic: effects may appear to be retinotopic or spatiotopic, depending on when you probe. This may account for the apparent dissociation between attentional facilitation (retinotopic) and IOR (spatiotopic).
In another experiment [89], we presented the probe stimulus after observers were instructed to make a saccade, but before the saccade had been executed (figure 4). The rationale behind this experiment was as follows: we assumed that the presentation of the onset stimulus excited a population of neurons [77]. If a probe is subsequently presented within the RFs of these excited neurons, processing of the probe is facilitated. Under normal circumstances, this means facilitation for probes presented at the same location as the onset. However, in the pre-saccadic interval, a proportion of neurons become transiently responsive to their FF [65]. If a probe were to be presented within the FFs of the neurons that were excited by the onset, facilitation should, in theory, be observed. Therefore, in some trials, we presented the probe at the ‘future-retinotopic’ location that fell within these presumed FFs. Crucially, we found attentional facilitation for probes presented just before the saccade at the future-retinotopic location. This suggests that predictive remapping affects the locus of attention in the interval preceding saccade execution.
Another important question is whether an eye movement causes an attentional ‘spread’ or ‘split’. A recent study shows that the spatiotopic and retinotopic loci of attention form two non-contiguous locations, suggestive of a split [93], which is exactly what would be expected based on neurophysiological evidence [69].
In summary, behavioural findings on remapping and attention are consistent with neurophysiological evidence. There are two important conclusions. First, attention determines which stimuli are remapped [32]. This is an efficient strategy, because it limits the problem of visual stability to those objects for which it is truly a problem: attended objects, which we act upon and consciously perceive. Second, the locus of attention itself is remapped in much the same way as a physical stimulus [89]. Remapping is not an instantaneous process and a gradual shift from retinotopic to spatiotopic coordinates can be observed [87,91].
6. Remapping and attention: alternative interpretations
Not all researchers agree that the findings discussed in the previous sections should be interpreted as evidence for remapping of RFs. Here we discuss two divergent interpretations of the available data, which invoke the concept of attention in different ways.
Hamker and colleagues have constructed a computational model of peri-saccadic RF changes [94]. By simulating single-cell recording studies, they have shown that their model produces output consistent with empirical data [95]. Importantly, their model does so without incorporating predictive remapping in the sense that cells become selectively responsive to their FF. Rather, their model relies on RF shifts towards the saccade target [55]. For certain parts of the visual field, this results in RF shifts that resemble predictive remapping, but this is an illusion (figure 5). Because of these shifts, the number of RFs that encompass the saccade target increases, which results in increased capacity for processing the saccade target. This could correspond to the pre-saccadic shift of attention. Essentially, in this model, all peri-saccadic RF changes are ultimately linked to the pre-saccadic shift of attention. This compelling model explains many findings in a parsimonious way, although it does not account for all results. Notably, the finding that FF and RF are non-contiguous areas is not easily explained [69], nor is the finding that, depending on stimulus configuration, the locus of attention may predictively shift to a location beyond the saccade target [89].
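The contrast between the two descriptions can be sketched in a few lines (illustrative numbers only; this is not the actual model of Hamker et al., and the attraction gain is a hypothetical parameter): a uniform translation of RF centres by the saccade vector versus an attraction of RF centres towards the saccade target. For RF centres on the near side of the target, the attraction also produces a shift in the saccade direction, which is why it can resemble predictive remapping:

import numpy as np

fixation = np.array([0.0, 0.0])
saccade_target = np.array([10.0, 0.0])
saccade = saccade_target - fixation

rf_centres = np.array([[2.0, 0.0], [6.0, 0.0], [14.0, 0.0]])

# 'Predictive remapping': every RF centre shifts by the full saccade vector.
translated = rf_centres + saccade

# 'Attraction towards the target': centres move part of the way towards the
# saccade target (gain is a hypothetical attraction strength).
gain = 0.4
converged = rf_centres + gain * (saccade_target - rf_centres)

print(translated)   # [[12, 0], [16, 0], [24, 0]]
print(converged)    # [[5.2, 0], [7.6, 0], [12.4, 0]] -- all towards (10, 0)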
Cavanagh and colleagues propose yet another view on remapping [96]. According to them, remapping is best explained as predictive shifts of attention. They argue that just before a stimulus is brought into a neuron's RF, the neuron becomes more active in order to prepare for the incoming information. Traditionally, information is believed to be transferred within retinotopic maps so that we do not need to re-acquire visual information after every saccade. According to Cavanagh et al. [96], information is not preserved across saccades, but attentional shifts facilitate the process of re-acquiring what has been lost. Based on neurophysiological studies, this is difficult to prove or refute, since remapping is typically investigated without taking stimulus features into account. However, if they are correct, there should be no spatiotopic after-effects, since that would indicate remapping of stimulus features. As pointed out by Cavanagh et al. [96], a number of studies have indeed failed to show spatiotopic after-effects [82,97–99]. However, there are also several studies that have shown clear spatiotopic after-effects [37,81,100,101] and trans-saccadic integration of object features [35,36].
7. Conclusion
Over the years, research on visual stability has made considerable progress and a number of conclusions can be drawn. First, visual stability is not as absolute as introspection would lead us to believe. Stability is preserved only for a limited number of attended objects [26,31,32,77], which is sufficient since those objects guide action and conscious perception [14]. The feeling that we have a complete and stable perception of the entire visual field has been called a ‘grand illusion’ [16].
Second, remapping appears to be one of the underlying mechanisms in visual stability (but see [94,96]). To compensate for eye movements, visual information is remapped within retinotopic maps. Although not all visual information is encoded retinotopically [57,58], there is little evidence to suggest, and really no reason to a priori assume, that true spatiotopy exists [61]. The characteristics of TSM as revealed by behavioural experiments strongly resemble the characteristics of remapping. As mentioned earlier, attention-gated limited capacity is a feature of both TSM and remapping [32,77]. In addition, the fact that TSM contains mostly, although not exclusively, conceptual information [26,37] is compatible with the finding that remapping occurs predominantly in higher visual areas, and is much less pronounced in visual areas dealing with those low-level features that are not readily integrated across saccades [72].
Third, attention is not only involved in visual stability in a supervisory manner, but is itself the subject of remapping. In the pre-saccadic interval, the focus of attention is remapped predictively [89]. Remapping of attention continues into the post-saccadic interval during which there is a gradual remapping from retinotopic to spatiotopic coordinates [87].
Fourth, the visual system relies on the assumption of stability. That is, we perceive the world to be stable by default and substantial evidence to the contrary is required to break this assumption. This is related to the finding that a covert shift of attention precedes every eye movement [30], allowing for an accurate preview of the saccade target. This preview is subsequently integrated with the post-saccadic percept of the saccade target, based on the assumption that the target has remained stable [43,44]. Not all items are equally important in the assumption of stability: attention appears to determine which objects serve as a reference point. Like TSM, this theory does not make any claims about the underlying neurophysiology. However, one cannot help but wonder how this finding relates to remapping of RFs. It has been suggested that there is no direct relationship at all, but that both mechanisms are solutions to different problems: the assumption of stability underlies perceptual stability, whereas remapping is concerned with visually guided actions [102]. This is a plausible proposal, but an important avenue for future research will be to further investigate the relationship between remapping and the assumption of stability.
Acknowledgements
This research was funded by a grant from NWO (Netherlands Organization for Scientific Research), grant 463-06-014 to J.T.
Footnotes
One contribution of 11 to a Theme Issue ‘Visual stability’.