Perceptual teleology: expectations of action efficiency bias social perception

Primates interpret conspecific behaviour as goal-directed and expect others to achieve goals by the most efficient means possible. While this teleological stance is prominent in evolutionary and developmental theories of social cognition, little is known about the underlying mechanisms. In predictive models of social cognition, a perceptual prediction of an ideal efficient trajectory would be generated from prior knowledge against which the observed action is evaluated, distorting the perception of unexpected inefficient actions. To test this, participants observed an actor reach for an object with a straight or arched trajectory on a touch screen. The actions were made efficient or inefficient by adding or removing an obstructing object. The action disappeared mid-trajectory and participants touched the last seen screen position of the hand. Judgements of inefficient actions were biased towards the efficient prediction (straight trajectories upward to avoid the obstruction, arched trajectories downward towards the target). These corrections increased when the obstruction's presence/absence was explicitly acknowledged, and when the efficient trajectory was explicitly predicted. Supplementary experiments demonstrated that these biases occur during ongoing visual perception and/or immediately after motion offset. The teleological stance is at least partly perceptual, providing an ideal reference trajectory against which actual behaviour is evaluated.

3 subjects factor. The Trajectory X Efficiency interaction did not differ with response initiation time bin (F(4,336) = .855, p = .491, ηp² = .010). There was a marginal effect of Bin on response execution times (F(4,336) = 2.26, p = .063, ηp² = .026), with a trend for the Trajectory X Efficiency interaction to decrease with increasing execution times.

Method

Participants
Fifteen participants took part in the experiment (mean age = 32.1 years, SD = 15.7, 12 females, 12 right-handed). All had normal/corrected vision, were recruited from Plymouth University and the wider community, and received course credit or payment. As participants completed twice the number of trials as those in the main experiments (see Procedure), a reduced sample size could be tested whilst maintaining an equivalent level of statistical power.

Apparatus & Stimuli
All apparatus was the same as in the main experiments. The target stimulus was a circle of the same size (30 X 30 px) and colour as the tip of the index finger of the action stimuli in the main experiments.

Procedure
The design of the experiment matched that of the main experiments. Each of the 80 different movie sequences was represented, with the placement of the circle corresponding to the four final positions in each respective movie, producing 320 trials in total. The duration of the circle was aligned to the duration of the action stimulus. For example, a position that matched an offset after 4 frames was on screen for 320 ms (4 x 80 ms), whereas a position that matched an offset after 7 frames was on screen for 560 ms (7 x 80 ms). As in the main experiments, each trial began with the instruction to hold the spacebar, after which an image depicting the target object on the far left and, when relevant, the obstructing object (the response stimulus from the main experiments) was shown, to which participants responded either "Yes" or "No". After a delay of 1000-3000 ms, the circle appeared and disappeared. Participants then released the spacebar and touched the screen where they thought the circle had appeared, after which the next trial began.
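The timing and trial-count arithmetic described above can be sketched in a few lines. This is a minimal illustration that assumes only the 80 ms frame duration and the 80 x 4 design stated in the text; the function name `circle_duration_ms` is hypothetical and not taken from the experiment software.

```python
FRAME_MS = 80            # duration of one frame, as stated in the Procedure
N_MOVIES = 80            # movie sequences in the main experiments
POSITIONS_PER_MOVIE = 4  # final positions sampled from each movie


def circle_duration_ms(offset_frame: int) -> int:
    """Time on screen for a circle standing in for an action that
    disappeared after `offset_frame` frames."""
    if offset_frame < 1:
        raise ValueError("offset_frame must be a positive frame count")
    return offset_frame * FRAME_MS


n_trials = N_MOVIES * POSITIONS_PER_MOVIE

# Reproduces the two worked examples in the text and the trial total.
print(circle_duration_ms(4), circle_duration_ms(7), n_trials)
```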

Participant performance and exclusions
No participants were excluded on the basis of the distance between the real and selected screen coordinate (mean = 22.5 px, SD = 4.5), but one was excluded based on the correlation between the real and selected positions on the X (median r = .991, SD = .028) or Y axis (median r = .987, SD = .029).
Due to the stationary nature of the stimulus, the rate of anticipatory responses (releasing the spacebar before stimulus offset) was very high (28.8%). Furthermore, the number of trials in which a response was initiated less than 200 ms after stimulus offset (31.1%) was considerably higher than in the main experiments (3.5%). To maintain equivalent trial numbers, the lower limit of 200 ms for the inclusion of trials based on response initiation times was removed, such that only responses initiated more than 3SD slower than the group mean were excluded (the results were unaffected by this altered exclusion criterion). Response execution times were comparable to those of the main experiments (M = 784.8 ms, SD = 190.8), and trials were excluded based on the same criteria as in the main experiments. Overall, 0.6% of trials were excluded.
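The trial-level inclusion rule can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis code: the function name `rt_inclusion_mask` is hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption, and the response times in the example are made up. Passing `lower_ms=None` corresponds to the relaxed criterion used here, whereas `lower_ms=200` corresponds to the criterion described for the main experiments.

```python
import numpy as np


def rt_inclusion_mask(rt_ms, lower_ms=None, n_sd=3.0):
    """Return a boolean array marking trials that are kept.

    Trials slower than mean + n_sd * SD are dropped. If `lower_ms` is
    given, trials faster than that limit are dropped as well.
    """
    rt = np.asarray(rt_ms, dtype=float)
    upper = rt.mean() + n_sd * rt.std(ddof=1)  # ddof=1 is an assumption
    keep = rt <= upper
    if lower_ms is not None:
        keep &= rt >= lower_ms
    return keep


# Made-up initiation times (ms): one fast, twenty typical, one very slow.
rts = [150] + [300] * 20 + [5000]
relaxed = rt_inclusion_mask(rts)               # no lower limit
strict = rt_inclusion_mask(rts, lower_ms=200)  # with the 200 ms limit
print(int(relaxed.sum()), int(strict.sum()))
```

With the lower limit removed, only the very slow trial is dropped; with the limit in place, the fast trial is dropped too.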

Data analysis
As the positions of the stationary circles matched those of the different action trajectory conditions of the main experiments, it was possible to analyse the screen coordinates of the stimulus in terms of Action Trajectory (straight, arched) and Action Efficiency (efficient, inefficient), an interaction of which is equivalent to a main effect of obstacle presence/absence, to facilitate comparison with the main experiments.
Supplementary Figure 1. Trial sequence and results for Supplementary Experiment 1. An example of the trial sequence is depicted in Panel A, with the action sequence replaced by a stationary circle that matched the tip of the index finger for size, colour and position. The results are depicted in Panel B. The difference between the real location and the selected location is plotted along the X and Y axes. A value of 0 on both axes indicates no difference, and therefore the real position on any given trial. Although the stimulus was a stationary circle, its locations corresponded to the 4 stimulus conditions (Trajectory X Efficiency) of the main experiments, which are depicted here to facilitate comparison. Error bars represent 95% confidence intervals.

Discussion
The results demonstrate that when participants were required to localise the screen position of a stationary geometric shape, the presence or absence of a second object did not influence participant responses. The perceived location was no further upwards when the "obstruction" was present than when it was absent. These results differ markedly from those of the main experiments. This implies that the observed perceptual biases rely on the participant's interpretation of the action as goal-directed, and of the second object as an obstruction that determines whether that action is efficient or not. Neither interpretation is available when the stimulus to be localised is a simple geometric shape.

Supplementary Experiment 2: Probe judgments
The touch screen judgements of the main experiments provide a direct measure of perceptual shift in each trial, but leave open at which processing step they occur. Do they directly affect the perceptual representations of the observed actions, or do they emerge from later changes to the action's perceptual representations in working memory or in the sensorimotor maps that guide the motor responses to the relevant locations on the screen? Here, we therefore replicate the Report Object experiment with a well-established psychophysical task that is free from such memory or motoric influences, but reliably measures changes to the perceived motion in the predicted path (i.e., representational momentum, Freyd & Finke, 1984; Hudson, Bach & Nicholson, 2017; Hudson, Nicholson, Ellis & Bach, 2016; Hudson, Nicholson, Simpson, Ellis & Bach, 2016; for reviews, see Hubbard, 2005; Kerzel, 2005).
In each trial, participants compared the hand's last seen position to a probe stimulus presented directly after hand offset (250 ms gap), which was displaced vertically either in the predicted direction (e.g. downwards for inefficient arched reaches) or in the opposite unpredicted direction, and horizontally leftwards or rightwards. Participants indicated, with the press of a button, whether the probe stimulus position was identical to or different from the hand's last seen position on the screen. Importantly, if predictions of efficient action affect the on-going perceptual representation of the observed actions (for example in non-biological perception, see Muckli, Kohler, Kriegeskorte & Singer, 2005; Yantis & Nakama, 1998) or lead to spontaneous perceptual filling in of the predicted trajectories after the sudden offset (e.g., Ekman, Kok & de Lange, 2017), then participants should be more likely to mistake probe displacements in the expected direction for the hand's last seen position, compared to displacements in the opposite direction. Because the probe stimuli appear directly after action offset and participants' responses do not need access to visuospatial representations, any such effects will reflect perceptual changes either during on-going action observation or directly after action offset.

Method

Participants
Thirty-nine participants took part in the experiment (mean age = 20.0 years, SD = 1.7, 28 females). All participants were right-handed, had normal/corrected vision, and were recruited from Plymouth University for course credit.

Apparatus & Stimuli
The experiment was presented on a HP EliteDisplay S230tm 23-inch widescreen (1920 X 1080) touch screen monitor. Verbal responses were recorded with Microsoft LifeChat LX-3000 Headsets. All other components of the apparatus were the same as in the main experiments. The stimulus set was identical to the main experiments. The only addition was the probe stimulus, a single red circle the same size (30 X 30 pixels) as the tip of the index finger of the action stimuli in the main experiments.

Procedure
The design of the experiment closely matched that of the main experiments. As before, participants completed two blocks of 80 randomised trials. Each trial began with the first static image of the action sequence, and continued to replicate the trial sequence of the Report Obstacle experiment until the response stimulus. Thus, participants saw the action commence after they reported, verbally into the microphone, whether an obstacle was present in the scene. After the action disappeared, participants did not make a touch response.
Instead, the probe stimulus was presented 250 ms after hand offset (preventing masking effects). The probe stimulus was overlaid on top of the scene. Each participant received two practice blocks containing six trials each. In the first practice block, the final action frame remained on screen instead of the response stimulus, and the probe was overlaid on top of this frame. This made it clear to participants when the probe was in the same or a different position to the tip of the index finger. The second practice block was the same as the experimental trials.

Participant performance and exclusions
Participants were excluded if the correlation between their probe judgements and the probe positions was more than 3SD away from the median r value (X axis: median = .858, SD = .141; Y axis: median = .898, SD = .123, 2 participants excluded). Exclusion of these participants does not affect the results. Individual trials were excluded if response times were faster than 200 ms or slower than 3000 ms (.04% of trials).
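The participant-level criterion can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the standard deviation is assumed to be the sample SD of the per-participant r values, and the example values are made up.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def flag_outlier_participants(r_values, n_sd=3.0):
    """Flag participants whose r lies more than n_sd SDs from the median r."""
    r = np.asarray(r_values, dtype=float)
    sd = r.std(ddof=1)  # sample SD across participants (assumption)
    return np.abs(r - np.median(r)) > n_sd * sd


# Made-up example: 19 participants track the probe positions well, one does not.
r_values = [0.90] * 19 + [0.10]
flags = flag_outlier_participants(r_values)
print(int(flags.sum()), int(np.flatnonzero(flags)[0]))
```

In practice `pearson_r` would be applied per participant and per axis to obtain the r values before flagging.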

Data analysis
Analysis was conducted on the proportion of "different" responses, averaged across the three probe positions in each of the four directions. Difference scores were calculated along the X and Y axes separately to measure the size of the perceptual shift. For the X axis, responses for rightward probes were subtracted from responses for leftward probes. Therefore, positive difference scores denote the proportion of rightward probes judged as "same" and negative difference scores denote the proportion of leftward probes judged as "same". For the Y axis, responses for upward probes were subtracted from responses for downward probes. Therefore, positive difference scores denote the proportion of upward probes judged as "same" and negative difference scores denote the proportion of downward probes judged as "same". These difference scores were entered into two separate 2 X 2 ANOVAs with Trajectory (arched, straight) and Efficiency (efficient, inefficient) as within-subjects factors.
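The difference-score computation can be sketched as follows for a single participant and condition. The function names, the dictionary layout and the proportions in the example are illustrative assumptions; only the subtraction rule (leftward minus rightward, downward minus upward) is taken from the text.

```python
def mean_proportion(proportions):
    """Average the proportion of "different" responses over probe positions."""
    return sum(proportions) / len(proportions)


def difference_scores(p_different):
    """Compute X and Y difference scores from proportions of "different"
    responses per probe direction.

    X: leftward minus rightward (positive = rightward probes accepted more).
    Y: downward minus upward (positive = upward probes accepted more).
    """
    x_score = p_different["left"] - p_different["right"]
    y_score = p_different["down"] - p_different["up"]
    return x_score, y_score


# Made-up proportions of "different" responses at three probe distances.
p_different = {
    "left": mean_proportion([0.5, 0.75, 1.0]),
    "right": mean_proportion([0.25, 0.5, 0.75]),
    "up": mean_proportion([0.0, 0.25, 0.5]),
    "down": mean_proportion([0.25, 0.5, 0.75]),
}
x_score, y_score = difference_scores(p_different)
print(x_score, y_score)
```

In this made-up example both scores are positive, meaning that rightward and upward probes were more often accepted as "same" than leftward and downward probes.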

X Axis
As in the main experiment, we did not have specific predictions for the X Axis. The reported effects should therefore be considered exploratory and interpreted with caution.
Interestingly, the analysis revealed an interaction between Trajectory and Efficiency (F(1,36) = 7.49, p = .010, ηp² = .172), showing that the likelihood of accepting rightward compared to leftward probes as "same" was greater for efficient arched reaches than for inefficient arched reaches, and greater for inefficient straight reaches than for efficient straight reaches. While unpredicted, this finding is fully in line with the expected deviation towards the predicted "efficient" trajectory. Because straight reaches exert more forward displacements than arched reaches (see above), this forward displacement also takes place, albeit to a smaller extent, when participants see an arched reach but predict a straight reach, or conversely, is reduced when participants see a straight reach but predict an arched one. As noted, this effect was not predicted and not observed with the touch screen responses. It should therefore be interpreted with caution until it has been replicated.
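As a consistency check, partial eta squared can be recovered from an F ratio and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the interaction reported above; the function name is illustrative, and small discrepancies can arise because the reported F values are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    ss_ratio = f_value * df_effect  # proportional to SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Trajectory X Efficiency interaction on the X axis: F(1,36) = 7.49
eta = partial_eta_squared(7.49, 1, 36)
print(round(eta, 3))
```

The value agrees with the reported ηp² of .172 to three decimal places.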

Discussion
The results of Supplementary Experiment 2 confirm that perceptual distortions of observed actions towards an ideal reference trajectory can be measured with probe stimuli shortly after action offset (250 ms), with responses that do not rely on perceptual working memory representations or visuospatial motor maps (e.g., Kerzel, 2005). Moreover, because the perceptual biases measured in this paradigm are to a large extent involuntary (Courtney & Hubbard, 2008; Ruppel, Fleming & Hubbard, 2009), they rule out strategic responses aimed to satisfy the experimental demands of the task. The results therefore confirm that the perceptual changes happen either during on-going motion perception (e.g., Muckli et al., 2005; Yantis & Nakama, 1998), or in the brief interval directly after its sudden offset, when the visual system spontaneously fills in the further expected trajectory (e.g., Ekman et al., 2017). They link the effects either to top-down changes that sharpen the considerable perceptual uncertainty during motion perception (i.e. motion blurring & sharpening, Hammett, 1997), and/or to changes in short term iconic memory that rely on early visual representation and are responsible for their conscious representation, linked to such phenomena as integration of stimulus features, change blindness, and the experience of stable percepts across saccades (e.g., Becker, Pashler & Anstis, 2000; Jonides, Irwin & Yantis, 1982; see Öğmen & Herzog, 2016 for a recent review).
To test whether such top-down interactions with early visual processes are responsible for the biases towards efficient actions, we replicated the Report Obstacle experiment but inserted, in half of the trials, a short (560 ms) rapidly changing visual noise pattern immediately after the action offset. Because such dynamic visual noise causes apparent motion (MacKay, 1965), it should interfere with motion based predictions that contribute either to the conscious perception of the seen action, or to the perceptual "filling in" of the suddenly missing information directly after action offset. If the perceptual biases emerge from such changes to early visual perceptual representations, then these biases should be only (or more strongly) observed in the no-mask compared to the masked trials.

Method

Participants
Twenty-eight participants took part in the experiment (mean age = 19.6 years, SD = 1.1, 26 females). All participants were right-handed, had normal/corrected vision, and were recruited from Plymouth University for course credit.

Apparatus & Stimuli
The experiment was presented on a HP EliteDisplay S230tm 23-inch widescreen (1920 X 1080) touch screen monitor. Verbal responses were recorded with Microsoft LifeChat LX-3000 Headsets. All other components of the apparatus were the same as in the main experiments.
The stimulus set was identical to the main experiments. The additional mask stimuli were created in R. The mask covered an area of 200 X 200 pixels and contained 50 black and 50 white squares of equal size (12 X 12 pixels) on a transparent background. Twenty different mask images were created, each containing a randomised arrangement of the squares.
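One way such masks could be generated is sketched below. The original masks were created in R; this Python version is an illustrative re-sketch, not the authors' script. The sizes and counts come from the text, but the placement rule is an assumption: squares are dropped into randomly chosen cells of a non-overlapping 16 X 16 grid centred in the 200 X 200 area, which may differ from how the original images were laid out. The function name `make_mask` and the seed are hypothetical.

```python
import numpy as np

MASK_PX = 200    # mask side length in pixels (from the text)
SQUARE_PX = 12   # square side length in pixels (from the text)
N_BLACK = 50     # number of black squares (from the text)
N_WHITE = 50     # number of white squares (from the text)
N_MASKS = 20     # number of mask images (from the text)


def make_mask(rng):
    """Return one RGBA mask image as a (200, 200, 4) uint8 array."""
    cells = MASK_PX // SQUARE_PX                   # 16 cells per side (assumed grid)
    margin = (MASK_PX - cells * SQUARE_PX) // 2    # centre the grid in the area
    mask = np.zeros((MASK_PX, MASK_PX, 4), dtype=np.uint8)  # fully transparent
    chosen = rng.choice(cells * cells, size=N_BLACK + N_WHITE, replace=False)
    for i, cell in enumerate(chosen):
        row, col = divmod(int(cell), cells)
        y0 = margin + row * SQUARE_PX
        x0 = margin + col * SQUARE_PX
        grey = 0 if i < N_BLACK else 255           # first 50 black, rest white
        mask[y0:y0 + SQUARE_PX, x0:x0 + SQUARE_PX] = (grey, grey, grey, 255)
    return mask


rng = np.random.default_rng(seed=1)
masks = [make_mask(rng) for _ in range(N_MASKS)]
print(len(masks), masks[0].shape)
```

Because cells are sampled without replacement, squares never overlap, so every mask contains exactly 50 black and 50 white squares.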

Procedure
The design of the experiment closely matched the Report Obstacle version of the main experiments. Participants completed two blocks of 80 randomised trials. Half of the trials were an exact replication of the Report Obstacle experiment (no-mask condition), and half the trials had the addition of the mask (mask condition), randomly interspersed. Participants again reported whether an obstacle was present in the scene or not, by speaking "Yes" or "No" into the microphone. The action sequence then started and disappeared before completion. In no-mask trials, participants simply indicated on the response stimulus (the scene with the hand removed) the index finger's last seen location. For masked trials, the mask was overlaid on top of the response stimulus for 560 ms immediately after action offset, on which participants reported, with a touch response, the hand's last seen position. The centre of the mask was positioned at the disappearance point of the tip of the index finger, plus or minus 20 pixels in the X and Y direction, to ensure that participants could not simply use the mask to aid their judgment. As soon as the hand disappeared, a sequence of seven randomised mask images was presented at the same rate as the prior action sequence (80 ms per frame), creating a mask which was on screen for 560 ms. Once the mask ended, the response stimulus remained on screen until the touch response was recorded. Any touch responses recorded while the mask remained on screen ended the trial. An example trial sequence for masked trials can be seen in Supplementary Figure 3.

Participant performance and exclusions
Exclusion criteria were identical to the main experiments. No participants were excluded on the basis of the distance between the real and selected screen coordinate (mean = 36.3 px, SD = 21.9), but one was excluded based on the correlation between the real and selected positions on the X (median r = .944, SD = .039) or Y axis (median r = .888, SD = .038). A total of 3.2% of trials were excluded due to incorrect response procedure and 2.8% of trials were excluded because initiation or execution times were less than 200 ms or more than 3SD above the sample mean (Initiation: mean = 350.5 ms, SD = 158.7; Execution: mean = 527.8 ms, SD = 161.8). In 2.9% of trials, a response was made while the mask remained on screen. These trials were included in the analysis but their exclusion/inclusion does not affect the results.

Data analysis
Data were analysed in the same way as in the main experiments. Difference values (reported minus actual disappearance points) were entered into a 2 X 2 X 2 repeated-measures ANOVA for the X and Y coordinates separately, with Trajectory (arched, straight), Efficiency (efficient, inefficient), and Condition (mask, no-mask) as within-subjects factors.
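The dependent measure and the size of the Trajectory X Efficiency interaction can be sketched as follows for one axis and one mask condition. This is an illustration rather than the authors' analysis code: the function names, the record layout, the direction of the contrast, and the four trial values are all assumptions made for the example, and the sign of the result depends on the screen coordinate convention.

```python
from collections import defaultdict


def cell_means(trials):
    """Mean difference value (reported minus actual) per design cell."""
    totals = defaultdict(lambda: [0.0, 0])
    for t in trials:
        key = (t["trajectory"], t["efficiency"])
        totals[key][0] += t["reported_y"] - t["actual_y"]
        totals[key][1] += 1
    return {key: total / n for key, (total, n) in totals.items()}


def interaction_contrast(means):
    """Efficiency effect for straight reaches minus that for arched reaches."""
    straight = means[("straight", "inefficient")] - means[("straight", "efficient")]
    arched = means[("arched", "inefficient")] - means[("arched", "efficient")]
    return straight - arched


# Made-up trials, one per cell, in pixels.
trials = [
    {"trajectory": "straight", "efficiency": "efficient", "reported_y": 100, "actual_y": 100},
    {"trajectory": "straight", "efficiency": "inefficient", "reported_y": 106, "actual_y": 100},
    {"trajectory": "arched", "efficiency": "efficient", "reported_y": 200, "actual_y": 200},
    {"trajectory": "arched", "efficiency": "inefficient", "reported_y": 196, "actual_y": 200},
]
contrast = interaction_contrast(cell_means(trials))
print(contrast)
```

Computing this contrast separately for mask and no-mask trials gives two values whose difference corresponds to the three-way interaction with Condition.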
There was also a three-way interaction between Efficiency, Trajectory and Mask condition (F(1,26) = 4.89, p = .036, ηp² = .158) revealing that the Trajectory X Mask condition effect was larger for Efficient actions than for Inefficient actions. While this effect reveals a similar mask effect as for the Y Axis, it should be treated with caution as it was not predicted, no similar interaction of Efficiency and Trajectory was found for any of the main experiments, and it was one of many possible (unpredicted) effects in the ANOVA, and would therefore be subject to adjustments for multiple comparisons (Cramer et al., 2016).
Supplementary Figure 3. Trial sequence and Results for Supplementary Experiment 3. An example of the trial sequence for the Mask condition is depicted in Panel A. The results for the no-mask condition are depicted in Panel B and the results for the Mask condition are depicted in Panel C. Panel D depicts a comparison of the size of the Y axis interaction in pixels, equivalent to the total amount by which inefficient actions were corrected towards a more efficient trajectory. Error bars represent 95% confidence intervals.

Discussion
Supplementary Experiment 3 replicated the finding that perceptual judgments of observed actions are biased towards efficient trajectories. Crucially, it showed that a brief dynamic visual noise mask inserted directly after action offset successfully disrupted the resulting effects on perceptual judgments, substantially reducing the bias towards efficient actions.
Dynamic visual noise masks as used here specifically interfere with the re-entrant top-down interactions with early perceptual regions (Boehler et al., 2008; Fahrenfort et al., 2007) that are crucial for visual awareness of a stimulus (e.g., Lamme & Roelfsema, 2000; Lamme et al., 2002) or the creation of a detailed mental image during visual imagery that is akin to actual perception and which can be accessed for further processing (e.g., Andrade et al., 2002; Borst et al., 2012; McConnell & Quinn, 2000). The masking effects therefore further confirm that the perceptual biases in the main experiments reflect either on-line changes to the action's perceptual representation during observation, or spontaneous "filling in" of the suddenly missing input briefly after its offset, creating an impression of an action displaced towards the anticipated ideal reference trajectory. They cannot be explained in terms of demand characteristics, which were equivalent across both Mask and No-Mask conditions, especially as the two conditions varied rapidly in an unpredictable manner, and participants' attention was equally drawn to the environmental constraints in both conditions.