Proceedings of the Royal Society B: Biological Sciences
Research article

The perceptual shaping of anticipatory actions

Giovanni Maffei

Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain

Department of Information and Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona, Spain

Ivan Herreros

Department of Information and Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona, Spain

Imaging Neuroscience and Theoretical Neurobiology, Wellcome Trust Centre for Neuroimaging, University College of London (UCL), London, UK

Marti Sanchez-Fibla

Department of Information and Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona, Spain

Karl J. Friston

Imaging Neuroscience and Theoretical Neurobiology, Wellcome Trust Centre for Neuroimaging, University College of London (UCL), London, UK

Paul F. M. J. Verschure

Institute for Bioengineering of Catalonia (IBEC), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain

Department of Information and Communication Technologies, Universitat Pompeu Fabra (UPF), Barcelona, Spain

Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain

[email protected]
DOI: https://doi.org/10.1098/rspb.2017.1780

    Abstract

    Humans display anticipatory motor responses to minimize the adverse effects of predictable perturbations. A widely accepted explanation for this behaviour relies on the notion of an inverse model that, learning from motor errors, anticipates corrective responses. Here, we propose and validate the alternative hypothesis that anticipatory control can be realized through a cascade of purely sensory predictions that drive the motor system, reflecting the causal sequence of the perceptual events preceding the error. We compare both hypotheses in a simulated anticipatory postural adjustment task. We observe that adaptation in the sensory domain, but not in the motor one, supports the robust and generalizable anticipatory control characteristic of biological systems. Our proposal unites the neurobiology of the cerebellum with the theory of active inference and provides a concrete implementation of its core tenets with great relevance both to our understanding of biological control systems and, possibly, to their emulation in complex artefacts.

    1. Introduction

    Anticipatory motor actions, thought to depend on the cerebellum [1–3], are part of our everyday behaviour: from walking [4,5], to grasping [6–8] and to riding a bicycle [9]. The question then arises as to how these actions are controlled. Decades of research in motor control support the notion that internal models are key to skilful performance [10–12]. Specifically, this research has highlighted two kinds of internal models: forward models, which map the efference copies of motor commands into their expected sensory consequences [13,14]; and inverse models, which map desired sensory outcomes into their required motor commands [15,16].

    However, here we argue that offering an alternative to these interpretations is a pressing issue for the field of motor control as neither forward nor inverse models (in their standard formulation) can explain the versatile anticipatory control observed in animals. In particular, standard forward models allow for rapid feedback control in the presence of the long transport latencies of the nervous system [13] or action planning [11] but, as they exclusively predict the consequences of motor commands, they cannot anticipate disturbances that are not contingent upon those motor commands [17]. That is, one cannot call upon efference-driven forward models to support behaviours that precede external events. This obvious limitation has led researchers to conclude that preparatory actions should result from inverse models that output anticipatory motor signals [18–22]. The benchmark computational model for that theory is feedback error learning (FEL), which offers both an adaptive motor control architecture [23] and a theory of cerebellar function [10,16]. In FEL, predictive actions are the result of anticipatory motor signals, learned by shifting forward in time the output of the feedback controller [18,22]. However, we will show that inverse model schemes present some important limitations in the context of anticipatory control. For instance, while rapid corrections of erroneous anticipatory actions are commonly reported in biological systems, most notably in experiments that include catch trials (i.e. trials where a predictable disturbance is signalled but not delivered) [24,25], FEL has no mechanism to correct feed-forward motor responses once the course of events violates a prediction. In addition, FEL acquires motor commands that are tied to the dynamics of the plant that it controls and cannot easily be generalized to new configurations. Yet experimental evidence suggests that in humans, anticipatory responses are still effective even if one changes the posture and/or the effector after learning [26,27]. Hence, given that standard forward and inverse models cannot fully account for anticipatory control, alternatives should be considered that both overcome the theoretical and practical limitations of these motor-centric accounts and resolve the forward–inverse model dichotomy.

    Here, we advance the hypothesis that biological anticipatory control can be explained by the ability of the brain to advance predictions of future perceptual events [28] and use those predictions to drive the motor system in an anticipatory way [29]. We formulate this hypothesis in computational terms by proposing the cerebellar-based Hierarchical Sensory Predictive Control (HSPC) architecture, in which internal models issue sensory predictions that facilitate anticipatory control, with motor signals (i.e. efference copies of motor commands) playing no role in adaptation itself. With that, HSPC challenges the inverse model interpretation of anticipatory control – and, indirectly, the ‘motor-centric’ forward–inverse model dichotomy. More precisely, we suggest that, in contrast to the FEL hypothesis, where predictive actions are the result of anticipatory motor signals, anticipatory actions can be controlled by predictive sensory signals, becoming reactions to events that are brought forward in time [30–32]. Moreover, in HSPC the internal generation of sensory predictions can mirror the (hierarchical) causal structure of the sequence of perceptual events (figure 1). HSPC builds on the hypothesis that motor control can be understood as a process of sensory-sensory learning where sensory predictions are only mapped onto motor commands at the late stage before motor execution, for example through reflexes, as proposed in the Distributed Adaptive Control (DAC) theory and formalized in the theory of Active Inference [33–35]. At the theoretical level, this hypothesis has been studied mostly within the active inference framework, using generative hierarchical models and focusing on the aspect of reformulating control as Bayesian inference [35,36], whereas DAC generalized it to robot-based foraging tasks showing Bayesian equivalence [34]. Hence, here we propose for the first time a detailed computational and practical treatment of the sensory-sensory learning hypothesis in the context of anticipatory actions. To this end, we provide a systematic comparison between HSPC and FEL by synthesizing each hypothesis into an architecture applied to a postural control task, minimally modelled as the stabilization of an inverted pendulum through a torque at its base (i.e. ankles; figure 1), demonstrating how learning in the sensory rather than in the motor domain can account for the robustness and generalization capabilities of biological control systems with emphasis on the relation between the cerebellum and the neo-cortex. In summary, this study presents an approach to motor control that could provide an alternative interpretation of the physiology of anticipatory control and contribute to the theory of cerebellar learning.

    Figure 1. Conceptualization of the Hierarchical Sensory Predictive Control (HSPC) hypothesis. A predictable displacement caused by a soccer ball directed to the chest elicits an anticipatory response that minimizes the loss of balance before it is perceived. In HSPC, the anticipatory response is the result of a hierarchy of descending sensory predictions from distal (visual detection) to proprioceptive (impact) to vestibular (loss of balance) modalities, where each modality advances in time the expected consequences on the next modality until the predicted error in balance triggers a reflexive action in a feed-forward manner. The minimal model for this behaviour is an inverted pendulum of mass (m) and height (h), whose error in angle (θ) is minimized by generating a torque (τ) at the ankles that counteracts the disturbance (F).

    2. Methods

    In order to compare the behaviour of a control strategy based on motor anticipation (FEL) with one based on sensory prediction (HSPC), we synthesize these hypotheses into two architectures that control an inverted pendulum (a common model for bipedal postural control, see [37] for review; figure 2b,c; electronic supplementary material, figure S1) engaged in an anticipatory postural adjustment (APA) task. This task, in line with experimental psychology paradigms [2,38] (figure 2a), requires the agent to learn an appropriate combination of anticipatory and compensatory responses to minimize the effect of a disturbance (i.e. loss of balance) signalled by a cue.

    Figure 2. Motor anticipation (FEL) versus sensory prediction (HSPC) strategies. (a) Different responses are elicited by different sensory modalities. (i) A corrective reaction is triggered by the perceived postural error. (ii) A fast compensatory corrective action is triggered by the perceived impact (proximal stimulus). (iii) An anticipatory action is triggered by the distance to the obstacle (distal stimulus). (b) Motor anticipation strategy (FEL). (i) A postural error is converted into a reflexive action by a feedback controller (R). (ii) A feed-forward compensatory action associated with the impact signal is acquired by the proximal adaptive module (FFp) on the basis of the feedback response to the error. (iii) A feed-forward anticipatory action associated with the distal cue is acquired by the cerebellar distal module (FFd) on the basis of the same feedback response. (c) Sensory prediction strategy (HSPC). (i) Reflexive action elicited as in FEL. (ii) Feed-forward compensatory action: triggered by the proximal cue and learned from the closed-loop error, a counterfactual error is issued by the proximal module (FFp) in response to the proximal cue driving the feedback controller. (iii) Anticipatory action: evoked by the cue, a prediction of the expected impact issued by the distal module (FFd) triggers the compensatory response in an anticipatory manner.

    (a) Model of the agent

    The inverted pendulum actuated by a torque (τ) at its base is modelled as follows:

    Display Formula
    2.1
    The pendulum has a mass (m) of 67 kg and a height of its centre of mass (h) equal to 0.85 m. θ measures the angular deviation from the vertical position. The disturbance is introduced as a force (F) parallel to the ground applied to the centre of mass.
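A minimal sketch of such a plant, assuming the standard point-mass form m h² θ̈ = m g h sin θ + F h cos θ + τ. The sign conventions, the value of g, the integrator and the name `pendulum_step` are assumptions made for illustration and are not taken from the article; only m, h and the time step come from the text.

```python
import math

G = 9.81       # gravitational acceleration in m/s^2 (assumed; not stated in the text)
MASS = 67.0    # kg, from the text
HEIGHT = 0.85  # m, height of the centre of mass, from the text
DT = 0.01      # s, simulation time step, from the text


def pendulum_step(theta, omega, torque, force, m=MASS, h=HEIGHT, dt=DT):
    """Advance the pendulum by one step (semi-implicit Euler, an assumed integrator).

    theta: angle from the vertical (rad); omega: angular velocity (rad/s);
    torque: ankle torque (N m); force: horizontal force at the centre of mass (N).
    """
    inertia = m * h * h  # point mass at distance h from the pivot
    accel = (m * G * h * math.sin(theta)
             + force * h * math.cos(theta)
             + torque) / inertia
    omega = omega + accel * dt
    theta = theta + omega * dt
    return theta, omega
```

Under these assumptions, a 100 N push on the upright pendulum is cancelled exactly by an ankle torque of −85 N m (F·h), which gives a feel for the torque scale the controllers have to produce.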

    (b) Control architectures

    The APA task involves three different sensory modalities: distal (perceiving a cue that precedes the collision), proximal or proprioceptive (sensing the magnitude of the impact on the body) and vestibular (sensing the postural effects of the impact, i.e. the inclination). Each modality enables a different type of response: distal sensing allows for preparatory responses, proximal sensing for fast compensation and vestibular for compensation through feedback control [6,30,39–41] (figure 2a). Note that a similar distinction between distal and proximal sensory modalities can be found in [34,35] to account for sensory predictions within extrinsic and intrinsic frames of reference, respectively.

    (i) Feedback controller

    The agent is stabilized by a torque generated through a proportional-derivative feedback controller as follows:

    Display Formula
    2.2

    Note that in the error term we use the angle and angular velocity values delayed by δs (=100 ms) to account for the latency of the error feedback.
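The delayed proportional-derivative law can be sketched as follows. The gains `kp` and `kd`, the class name and the choice of an upright target (so that the error is simply −θ) are hypothetical; the text only fixes the delay δs = 100 ms, which corresponds to 10 steps at Δt = 0.01 s.

```python
from collections import deque


class DelayedPDController:
    """PD controller acting on angle and angular velocity delayed by a fixed number of steps."""

    def __init__(self, kp, kd, delay_steps):
        self.kp = kp  # proportional gain (hypothetical value supplied by the caller)
        self.kd = kd  # derivative gain (hypothetical value supplied by the caller)
        # Pre-fill with zeros: until the first delayed sample arrives, no error is seen.
        self.buffer = deque([(0.0, 0.0)] * delay_steps, maxlen=delay_steps + 1)

    def torque(self, theta, omega):
        """Store the current state and return the torque computed from the delayed state."""
        self.buffer.append((theta, omega))
        theta_d, omega_d = self.buffer[0]  # sample from delay_steps steps ago
        # Target is the upright position, so the error terms are -theta and -omega.
        return -(self.kp * theta_d + self.kd * omega_d)
```

With `delay_steps=2`, for instance, the first two calls return zero torque regardless of the input, and the third call reacts to the state passed in the first call.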

    (ii) Adaptive feed-forward modules

    In addition to a feedback controller, both architectures include the same adaptive feed-forward modules to process the proximal and distal cues. Each adaptive feed-forward module (i.e. inversion of a forward or generative model) is implemented as an adaptive filter extended with an eligibility trace mechanism [42–44]. Each feed-forward module receives a single sensory input signal that is expanded into N (=20) different signals or bases. Each basis corresponds to the convolution of the (sensory) input with an α signal that can be formulated as two serially linked leaky integrators with identical time constants. For a particular basis, its output value is generated as follows:

    Display Formula
    2.3
    and
    Display Formula
    2.4
    where Δt(=0.01 s) is the simulation time step and Inline Formula is the j-th basis decay factor, derived from a relaxation time constant τj. ζj is a scaling factor that equalizes the power of all bases. At this point, an expansion of the original signal x(t) into a series of bases or transients with different temporal profiles is obtained. The second processing step consists in mixing those bases according to a weight vector w(t) to generate an output signal (ff(t)):
    Display Formula
    2.5
    where Inline Formula is the vector of the bases. The weight vector is adaptively set by means of an LMS (Least Mean Squares) or Widrow–Hoff update rule [45] extended with an eligibility trace:
    Display Formula
    2.6
    where Inline Formula is an appropriate error signal that is used to update the weights. The eligibility trace is implicit in the use of a delayed copy of the bases activity Inline Formula for the update, with x indexing the type of stimulus processed: proprioceptive (p) or distal (d). In short, to update the weights the current error is associated with the activity of the basis signals δx seconds ago. With that, we assume that activity at time t − δx is the one that should have been used to trigger a reaction with sufficient anticipation to cancel the current error at time t. In general, we set both δd and δp greater than the error feedback delay (δs), implying that the extent of the anticipation goes beyond the transport (or error feedback) delay.
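The module described above can be sketched as follows: two chained leaky integrators per basis, a weighted sum, and an LMS update that pairs the current error with the basis activity δx seconds earlier. The decay factor exp(−Δt/τj), the log-spaced time constants, the learning rate, the default delay and all names are assumptions for illustration; the power-equalizing scaling ζj is omitted for brevity.

```python
import math
from collections import deque


class AdaptiveFilter:
    """Alpha-function basis expansion with a delayed LMS (Widrow-Hoff) update."""

    def __init__(self, n_bases=20, dt=0.01, tau_min=0.05, tau_max=1.0,
                 delay=0.15, learning_rate=0.01):
        # Hypothetical log-spaced relaxation time constants tau_j (seconds).
        ratio = tau_max / tau_min
        taus = [tau_min * ratio ** (j / max(n_bases - 1, 1)) for j in range(n_bases)]
        # Assumed decay factor per basis: gamma_j = exp(-dt / tau_j).
        self.gammas = [math.exp(-dt / tau) for tau in taus]
        self.stage1 = [0.0] * n_bases  # first leaky integrator
        self.bases = [0.0] * n_bases   # second leaky integrator (basis output)
        self.weights = [0.0] * n_bases
        self.learning_rate = learning_rate
        # Eligibility trace: history of basis vectors; the oldest entry is `delay` old.
        steps = round(delay / dt)
        self.history = deque([[0.0] * n_bases for _ in range(steps)],
                             maxlen=steps + 1)

    def step(self, x):
        """Feed one input sample and return the weighted sum of the bases."""
        for j, g in enumerate(self.gammas):
            self.stage1[j] = g * self.stage1[j] + (1.0 - g) * x
            self.bases[j] = g * self.bases[j] + (1.0 - g) * self.stage1[j]
        self.history.append(list(self.bases))
        return sum(w * p for w, p in zip(self.weights, self.bases))

    def learn(self, error):
        """Associate the current error with the basis activity `delay` seconds ago.

        Call once per time step, after `step`.
        """
        delayed = self.history[0]
        for j, p in enumerate(delayed):
            self.weights[j] += self.learning_rate * error * p
```

Because the update uses the delayed basis vector, an error that arrives before any input has aged by `delay` leaves the weights untouched; only activity old enough to have triggered a timely reaction receives credit.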

    (iii) Configuration of the FEL and HSPC architectures

    Both control architectures include the feedback controller and two feed-forward modules (distal and proximal) wired according to the heuristic of either predicting motor commands from sensory signals (FEL architecture), or predicting sensory signals from sensory signals (HSPC architecture).

    In FEL, feed-forward modules act upon the plant and are supervised by the feedback reaction to the error in posture (figure 2b; electronic supplementary material, figure S1a). In particular, the proximal module issues a feed-forward action in response to the impact learned by shifting the reactive action earlier in time, while the distal module similarly acquires a response that is triggered by the distal stimulus, and thus can precede the impact itself.

    Let ffp(t) and ffd(t) be the outputs of the proximal and distal feed-forward modules; xp(t) and xd(t), their respective input signals; and Inline Formula and Inline Formula, their respective teaching signals. The structure of the FEL architecture is determined by the following equations:

    Display Formula
    2.7
    Display Formula
    2.8
    Display Formula
    2.9
    where Inline Formula and Inline Formula represent the cue (distal) and impact (proximal) signals, respectively, and fb is the output of the feedback controller. As a final step, the outputs of all modules are summed to generate the control signal (fel(t)):
    Display Formula
    2.10
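Read from the prose, the FEL wiring for a single time step can be sketched as below. The function name, argument names and dictionary keys are hypothetical, and the sketch only encodes what the text states: each module is driven by its own stimulus, both are taught by the feedback output, and the three outputs are summed.

```python
def fel_step(cue, impact, fb, ff_p, ff_d):
    """Return module inputs, teaching signals and the motor command for one FEL step."""
    return {
        "x_p": impact,                # proximal module is driven by the impact signal
        "x_d": cue,                   # distal module is driven by the cue signal
        "e_p": fb,                    # both modules are supervised by the feedback output
        "e_d": fb,
        "command": fb + ff_p + ff_d,  # all outputs act on the plant in parallel
    }
```

The flat structure is visible in the last entry: every module contributes a motor-level term directly to the command.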

    In HSPC, upstream modules drive and learn from the input of downstream modules (figure 2c; electronic supplementary material, figure S1b). That is, the proximal module learns counterfactual errors [29] contingent on the impact so that the feedback controller reacts to the expected error before the actual one occurs, while the distal module learns to predict the collision signal contingent on the cue and triggers the proximal module ahead of the impact. Note that, by necessity, the HSPC architecture includes an internal comparator that computes the prediction errors associated with the collision signal.

    In keeping with the above notational conventions, the equations determining the distal feed-forward module inputs and error signals in HSPC are:

    Display Formula
    2.11
    and
    Display Formula
    2.12

    Note that the error signal that controls learning in the distal feed-forward module is a prediction error, coding the difference between a past prediction, Inline Formula, and the actual stimulus, Inline Formula, where δd is the anticipatory delay of the distal module. The proximal feed-forward module is integrated within the control architecture as follows:

    Display Formula
    2.13
    and
    Display Formula
    2.14

    In brief, the sensory prediction error (SPE), Inline Formula, and the prediction signal, ffd(t), related to the collision drive the proximal module, which is supervised by the error in angle (measured with a delay of δs seconds).

    In the last stage, the output of the proximal feed-forward module is added to the error in velocity driving the feedback controller. We formulate that operation by introducing Inline Formula and then rewriting the first equation of the feedback controller:

    Display Formula
    2.15

    Finally, the motor control signal generated by the HSPC architecture is simply the output of the feedback controller, Inline Formula.
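The HSPC wiring described in this subsection can be sketched for a single time step as follows. All names, the gains `kp` and `kd`, and the signs of the error terms are assumptions for illustration; the sketch follows the prose only (prediction error on the collision signal, proximal input formed from that error plus the current prediction, proximal output added to the velocity error).

```python
def hspc_step(cue, impact, ff_d_now, ff_d_past, ff_p,
              theta_delayed, omega_delayed, kp, kd):
    """Return module inputs, teaching signals and the motor command for one HSPC step.

    ff_d_now:  distal prediction issued at the current time step
    ff_d_past: distal prediction issued delta_d seconds ago
    """
    spe = impact - ff_d_past                # sensory prediction error on the collision
    x_p = spe + ff_d_now                    # proximal module sees SPE plus the prediction
    angle_error = -theta_delayed            # assumed sign: target is the upright position
    velocity_error = -omega_delayed + ff_p  # counterfactual error added to velocity error
    command = kp * angle_error + kd * velocity_error
    return {"x_d": cue, "e_d": spe, "x_p": x_p, "e_p": angle_error, "command": command}
```

Three cases illustrate the behaviour under these assumptions: when the prediction is issued (`ff_d_now=1`, no impact yet) the proximal module is driven ahead of the impact; when the impact arrives as predicted the SPE is zero, so the impact is not reacted to twice; and in a catch trial the SPE is negative, which drives the proximal module in the opposite direction.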

    3. Results

    Below, we report on the performance of both the HSPC and FEL control schemes for three experimental conditions: standard acquisition trials, robustness (catch) trials in which the disturbance is cued but not delivered, and generalization trials in which we provide both cued and non-cued trials, and change the weight of the agent during training.

    (a) Acquisition

    We start by analysing the performance of the two adaptive control architectures in the acquisition of an APA trained in a trial-by-trial manner. We use a simulated self-balancing system that at each trial receives an impact, preceded by a distal cue by a fixed interval of 400 ms, and resulting in a disturbance force (100 N during 300 ms). The force, applied to the pendulum, produces an angular displacement that, in the naive system, is uniquely counteracted by the reactive controller introducing oscillations in the angular position (figure 3a, grey line). After learning, acquired motor responses evoked by the two predictive stimuli (cue and collision) substantially reduce the angular error (figure 3a, red and cyan). Note that despite implementing different adaptive strategies, we could configure both architectures to exhibit similar learning curves (figure 3b).
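The stimulus timeline of one trial can be sketched as follows. The cue lead (400 ms), force amplitude (100 N), force duration (300 ms) and time step come from the text; the shape and duration of the cue pulse, the trial window and the function name are assumptions.

```python
def make_trial(dt=0.01, pre=0.5, post=1.5, cue_lead=0.4, force=100.0,
               force_duration=0.3, cue_duration=0.05, deliver=True, cued=True):
    """Return (times, cue, disturbance) lists for one trial; impact onset is at t = 0.

    deliver=False yields a catch trial (cue without disturbance);
    cued=False yields a non-cued trial.
    """
    n_pre = round(pre / dt)                # samples before the impact
    n_total = n_pre + round(post / dt)
    times = [(i - n_pre) * dt for i in range(n_total)]
    cue_on = n_pre - round(cue_lead / dt)  # cue precedes the impact by cue_lead
    cue_off = cue_on + round(cue_duration / dt)  # brief pulse (assumed shape)
    force_off = n_pre + round(force_duration / dt)
    cue = [1.0 if cued and cue_on <= i < cue_off else 0.0 for i in range(n_total)]
    disturbance = [force if deliver and n_pre <= i < force_off else 0.0
                   for i in range(n_total)]
    return times, cue, disturbance
```

Working in integer sample indices rather than comparing floating-point times avoids off-by-one errors at the onset and offset of the pulses.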

    Figure 3. Experimental results. Acquisition. (a) Mean angular position during the disturbance rejection task for feedback-control condition (grey – 10 trials), trained FEL architecture (red – trials 90–100) and trained HSPC architecture (cyan – trials 90–100). Disturbance is delivered at t = 0 (dashed line). (b) Root mean square error (RMSE) in angular position over trials during acquisition phase for FEL (red) and HSPC (cyan) architectures normalized by the maximum error in the naive system (feedback-control only). Robustness. (c) Mean angular position of FEL (red) and HSPC (cyan) during catch (N = 5 – solid) and regular perturbed trials (N = 5 – dashed). (d) Root mean square error (RMSE) in angular position during regular trained perturbed trials (FEL, HSPC, N = 5), catch trials (FEL-C, HSPC-C, N = 5) and extinction trials (FEL-E, HSPC-E, N = 10). Generalization. (e) Root mean square error (RMSE) in angular position during light-to-heavy generalization phases for FEL (red) and HSPC (cyan). ‘Light plant’ denotes the phase before plant perturbation. ‘Heavy plant’ denotes the phase after plant perturbation. (f) FEL mean angular position after plant perturbation (heavy plant – N = 10) without (dashed) and with the cue (solid) and after regular training with heavy plant (solid magenta). (g) HSPC: mean angular position after plant perturbation. As (f).

    After learning, in FEL the reactive controller is only marginally engaged as the errors in behaviour that drove it initially are almost cancelled (electronic supplementary material, figure S1a). Note that in this architecture, only the cue-evoked command contributes to preparatory behaviour (before the collision) but both cue- and collision-evoked commands contribute to the fast feed-forward compensation that takes place after the collision (electronic supplementary material, figure S1a).

    Conversely, in HSPC the proximal adaptive module that associates the collision signal with inertial errors steers the feedback controller both during anticipation and fast compensation (electronic supplementary material, figure S1b). Still, after learning, the proximal module is fed with a mixture of actual and anticipated collision signals, where the former is sensed and the latter provided by the distal module (electronic supplementary material, figure S1b). Importantly, the distal module predicts the collision signal from the cue and issues an anticipated impact signal preceding the actual impact by 100 ms (the extent of the anticipation, δd, is a design parameter – see Methods). Hence, the anticipatory part of the response, despite being evoked only by the cue stimulus, results from a cascade of predictions that involves both adaptive feed-forward modules and the feedback controller.

    In sum, despite the marked differences in the processing, both architectures converged to similar motor commands and behaviour, indicating that both motor anticipation- (FEL) and sensory prediction-based (HSPC) strategies can be equally successful in acquiring APAs.

    (b) Robustness

    Next, we assess the reaction of both architectures to violations in the sequence of predicted events that was learned during training. To that end, after 100 acquisition trials, we run 50 trials within which we randomly intersperse 10% catch trials in which we present the cue but omit the disturbance. During catch trials, the agent initiates an anticipatory motor response that later, due to the lack of disturbance, results in a performance error [7,18,25]. Here, we use such errors to quantify how responsive FEL and HSPC are in recovering from erroneous predictions [25].
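A schedule of this kind can be drawn as in the sketch below. The function name and the seeding are hypothetical, and the article does not state how the random positions were chosen, so uniform sampling without replacement is an assumption.

```python
import random


def catch_schedule(n_trials=50, catch_fraction=0.1, seed=0):
    """Return a list of booleans, True where the trial is a catch trial."""
    rng = random.Random(seed)  # local generator so the schedule is reproducible
    n_catch = round(n_trials * catch_fraction)
    catch_idx = set(rng.sample(range(n_trials), n_catch))
    return [i in catch_idx for i in range(n_trials)]
```

With the defaults this yields 5 catch trials among 50, consistent with the N = 5 catch trials reported in figure 3.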

    Prior to the expected impact time, both architectures introduce a slight anticipatory angular error (figure 3c) by issuing the preparatory part of the response (electronic supplementary material, figure S2a). However, once the impact fails to occur, HSPC promptly corrects the initial error while in FEL the error keeps increasing. In terms of performance, the error in a catch trial incurred by HSPC (median of the RMSE) is approximately half of the error introduced by FEL (0.3 versus 0.6 in normalized RMSE; figure 3d). The errors seen in catch trials are the same ones observed at the onset of extinction training. Both architectures greatly suppress these errors (also called after-effects) after 50 extinction trials (figure 3d).
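The normalized RMSE used for these comparisons can be sketched as follows. The exact normalization is an assumption: here each trial's RMSE of the angle (relative to the upright target, θ = 0) is divided by the largest per-trial RMSE of the naive, feedback-only system. Function names are hypothetical.

```python
import math


def rmse(angles):
    """Root mean square of the angular error, taking the upright position as target."""
    return math.sqrt(sum(a * a for a in angles) / len(angles))


def normalized_rmse(angles, naive_trials):
    """RMSE of one trial divided by the largest RMSE among naive (feedback-only) trials."""
    reference = max(rmse(trial) for trial in naive_trials)
    return rmse(angles) / reference
```

Under this convention a value of 1 corresponds to the worst naive trial, so values such as 0.3 and 0.6 read directly as fractions of the untrained error.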

    The reasons behind the difference in performance in catch trials are the following: FEL reacts to the absence of the impact by omitting the collision-evoked command, but maintains the whole cue-evoked command even after the lack of the expected collision has shown it to be unnecessary. By contrast, HSPC rapidly aborts the (feed-forward) action once the proximal module receives the SPE triggered by the missed collision (electronic supplementary material, figure S2b).

    In summary, the HSPC architecture outperforms the FEL in that, due to the computation of sensory prediction errors, it can react online to violations in the course of expected events (i.e. to SPEs).

    (c) Generalization

    In a final set of simulations, we test how both architectures respond to changes in the plant dynamics and task contingencies. We run an additional set of 60 trials after acquisition. During the first 10 extra trials, we measure the performance of the feed-forward compensatory layer in isolation, omitting the cue. At trial 11, the plant is made heavier (+10% – light-to-heavy condition; note a similar manipulation in behavioural postural control studies [46]) and the agent receives additional non-cued collisions (40 trials). Afterwards, we reintroduce the cue for 10 more trials. In a separate set of simulations, we initially train the heavier agent and afterwards remove the weight (−10% – heavy-to-light condition).
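The 60-trial protocol can be written down as a simple schedule. The function name and dictionary keys are hypothetical; the phase lengths, the 67 kg base mass and the 10% change come from the text.

```python
def generalization_schedule(base_mass=67.0, mass_change=0.10):
    """Return one dict per trial with the plant mass and whether the cue is shown."""
    changed = base_mass * (1.0 + mass_change)
    phases = [
        (10, base_mass, False),  # trials 1-10: original plant, cue omitted
        (40, changed, False),    # trials 11-50: perturbed plant, non-cued collisions
        (10, changed, True),     # trials 51-60: perturbed plant, cue reintroduced
    ]
    return [{"mass": m, "cued": c} for n, m, c in phases for _ in range(n)]
```

A heavy-to-light run could be expressed by passing the heavier starting mass and a negative `mass_change`; how exactly the −10% is referenced to the starting mass is not specified here and would need to be checked against the original simulations.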

    In FEL, any change in the task decreases performance (removing or reinstating the cue), irrespective of whether the plant has increased or decreased its weight (figure 3e,f; electronic supplementary material, figure S3c). In HSPC, the performance deteriorates, albeit to a lesser extent, after removing the cue. However, once the cue is reintroduced after having retrained the compensatory module, we observe a gain in performance in both cases, greater when transitioning to the lighter plant (figure 3e,g; electronic supplementary material, figure S3c).

    The difference in performance stems from the different ways in which both architectures combine the two stimuli. FEL deals with the cue and impact as independent stimuli. Initially, both contribute to the response, but once the cue is removed a part of the response is removed as well, damaging performance (electronic supplementary material, figure S3a). Further training makes FEL able to trigger appropriate compensatory responses just with the proximal stimulus, but then, reinstating the cue superimposes a partly redundant motor command, damaging performance again (figure 3f). Notably, if one considers that cue and impact form a compound stimulus in regular trials, one could explain the interference between the cue and the impact stimuli with the Rescorla–Wagner model [47]. On the contrary, in HSPC the distal module learns to predict the impact from the cue, and uses that prediction to trigger (a part of) the compensatory action in anticipation (electronic supplementary material, figure S3b). That implies that even after changing the properties of the plant, anticipating an appropriate compensatory action can result in an improvement in performance (figure 3g).

    In summary, in the face of perturbations to the plant dynamics or changes in the task contingencies, a control strategy learning a cascade of sensory predictions allows for better generalization than one that treats the different stimuli independently.

    4. Discussion

    Even though it is clearly established that skilled motor behaviour relies on internal models, their nature is still under debate. The two prevailing views are that internal models can be either inverse models, mapping the desired sensory consequences into their required motor commands, or forward models, mapping motor commands into their predicted consequences. Here, we have challenged this dichotomy and advanced an alternative proposal (HSPC) that reformulates anticipatory motor control as a sensory–sensory learning problem. On this view, the predicted consequences of responses to (distal or proprioceptive) cues prescribe action or motor commands (that are mediated – or realized – by reflexes). This simplification and generalization of the ‘standard model’ appeal to active inference, with an emphasis on estimating and predicting states of the world and the self. In order to test this hypothesis, we designed two control architectures that adopted either a motor anticipation- or a sensory prediction-based approach. We based the motor-anticipation architecture on the well-established FEL model [15,16,23] whereas HSPC provided the sensory prediction-based architecture.

    We compared both architectures in a simulated APA task [6,7,38]. Despite differences in the processing, both architectures acquired an APA equally well (figure 3a,b). However, as soon as we extended the basic APA protocol, either by introducing catch trials or by perturbing the plant, the sensory prediction strategy outperformed motor anticipation. Below, we will argue that the reasons for that superior performance are grounded in two specific consequences of the sensory prediction strategy: first, its reliance on SPEs; and second, its hierarchical processing architecture, which encapsulates learning at different levels. In other words, in line with active inference, placing a hierarchical model on top of reflexive sensorimotor control equips behaviour with a context-sensitivity and intentional aspects that are precluded in ‘standard’ formulations.

    (a) Origin of the robustness and generalization capabilities in HSPC

    The hierarchical structure of the HSPC explains its superior generalization ability. The FEL architecture has a flat structure as far as controlling behaviour is concerned: all modules send motor commands in parallel to the plant. This means that after a perturbation of the plant, the output of all modules has to be retrained to the new plant dynamics. In HSPC, the hierarchical structure entails that all modules are only concerned with driving and learning from the module immediately below in the hierarchy. Hence, HSPC solves the control problem by partitioning it into two smaller sub-problems: predicting the collision from the cue and predicting the postural errors from the collision. As a consequence, changing the mass of the agent only changes the sensory consequences of the collision; hence, once a new feed-forward reaction to the collision is acquired, a gain in performance can still be obtained by correctly anticipating the collision (thereby bringing the trained reaction forward in time).

    On the other hand, SPEs enable the fast reaction to erroneous predictions. As FEL only learns to react to stimuli, but not to predict them, it cannot (at least naturally) incorporate SPEs. On the contrary, HSPC relies on SPEs both for improving prediction accuracy and to preclude reaction to predicted stimuli at the time of their actual occurrence [13]. That is, SPEs are intrinsic to the design principle behind HSPC. In catch trials, as no collision occurs, the prediction of the distal module fails, generating a negative SPE that interrupts the ongoing response of the proximal module initiated by the distal module, thereby enabling a fast recovery (in addition to readjustment – learning – as the absence of the collision may imply a lasting change in task contingencies).

    (b) Environmental forward models and inverted sensory–sensory forward models

    The distal module in HSPC is a forward model of the environment that solves the problem of predicting one stimulus (a collision) given another stimulus (a cue); that is, a task contingency. In general, forward models of the environment have been acknowledged [48], but usually not considered specifically in the context of physiological motor control except, recently, within the domain of active inference [33,35,36,49]. However, the forward model in HSPC is not generically predicting one stimulus from another; it is anticipating a stimulus with the objective of driving a behavioural response that minimizes a defined error. For that, it must take into account not only sensorimotor latencies but also the dynamics of the plant (e.g. musculo-skeletal system). Hence, the environmental forward model in HSPC affords action-aware sensory predictions in that they are made having knowledge about the dynamics of the action that they will drive. By contrast, standard forward models do not require knowledge of the dynamics of the feedback action itself, as they only need to be tuned to the afferent and efferent delays [13].

    On the other hand, the internal model dealing with the collision signal acts as an inverse model. Even though it is supervised by a postural error signal, its goal is not to learn to predict postural errors, but to steer its downstream feedback controller to avoid these errors. We have earlier called this approach counterfactual predictive control (CFPC) [29]. The goal of CFPC is to acquire counterfactual error signals: signals that do not code any forthcoming errors derived from the interaction with the physical world, but that are processed by a feedback controller as if they were real errors. In practice, this leads the adaptive model within the HSPC architecture to acquire an inverse model of the closed-loop system that reflects jointly the dynamics of the plant and the controller [29]. That is, a model is said to be inverse because it reverses a causal relationship: from the desired effects (i.e. avoiding errors in performance) to inferring the right causes (i.e. the motor commands that will avoid those errors). The module processing proximal events within HSPC shares the same goal as the standard inverse model just mentioned. The only difference is that it outputs a predicted sensory signal that signifies an error rather than a motor command. This signal must be considered counterfactual. This demonstrates that a learning process that depends on sensory errors (in contrast to motor errors) does not automatically build a forward model (for another example, see [50]).
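
    The flow of signals through this cascade can be sketched as follows. This is only an illustration of the signal types involved, under strong simplifying assumptions: the modules are static gains rather than learned adaptive filters, the controller is purely proportional, and the counterfactual error is assumed to be simply summed with the real error. All function names and gain values are hypothetical.

```python
def distal_module(cue, gain=1.0):
    """Sensory-to-sensory forward model: cue -> predicted collision."""
    return gain * cue


def proximal_module(predicted_collision, gain=0.5):
    """Predicted collision -> counterfactual postural error (not a motor command)."""
    return gain * predicted_collision


def feedback_controller(postural_error, kp=2.0):
    """Proportional feedback controller: postural error -> motor command."""
    return -kp * postural_error


def hspc_command(cue, real_error):
    """Motor command when the controller receives real plus counterfactual error."""
    counterfactual_error = proximal_module(distal_module(cue))
    # Assumption: the controller cannot tell the two error sources apart.
    return feedback_controller(real_error + counterfactual_error)


# A cue alone (no real error yet) already elicits a command (anticipation).
print(hspc_command(cue=1.0, real_error=0.0))
# Without a cue, the same controller acts purely reactively.
print(hspc_command(cue=0.0, real_error=0.5))
```

    The point of the sketch is that only feedback_controller emits a motor command; both adaptive stages output sensory quantities, which is what distinguishes this arrangement from a scheme in which the adaptive module outputs motor commands directly.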

    (c) Related research in experimental psychology and predictions of the HSPC hypothesis

    Experimental APA protocols include standing human participants receiving the impact of an object attached to a pendulum [38,39]. As expected, those experiments show that, faced with the incoming pendulum, participants rely on distal sensing (vision) to issue the anticipatory responses [38,39]; that is, no anticipatory responses were observed when participants closed their eyes. Regarding the interplay between proprioceptive and vestibular information, separate studies in compensatory postural control have shown that humans with compromised proprioception display compensatory responses that are delayed with respect to healthy controls [51], and that animals with pyridoxine-induced loss of peripheral sensory afferents have delayed compensatory responses and increased postural sway [52]. This suggests that, despite some simplifications, the design of the task and the adaptive interplay between sensory modalities and responses in our simulated APA task are in close agreement with well-studied properties of biological control. We note, however, that in humans and animals, anticipatory and compensatory strategies often act synergistically across different sets of muscle synergies, reflecting different demands (i.e. upper extremities respond with a higher degree of anticipation compared to lower ones) [53]. Still, those findings do not discriminate between the sensory prediction and motor-anticipation hypotheses. An exception comes from experiments showing that altered proprioceptive information at the level of the Achilles tendon delays anticipatory postural responses [39]. Note that FEL would predict that decreasing the information in the proprioceptive channel would have no effect on the preparatory actions, which are motor commands triggered by the visual stimulus. However, in the HSPC hypothesis, anticipatory actions are elicited by generating proprioceptive predictions. Hence, one could expect that a manipulation that alters the processing of real proprioceptive information would also affect the mapping of predicted proprioception into action.

    HSPC further predicts that in catch trials, subjects will correct erroneous anticipatory actions with a latency equal to the time needed to detect SPEs. By contrast, as FEL makes no use of SPEs, it has no mechanism that could detect and process such a sharp change in behaviour at the expected time of the disturbance. Note that errors observed in catch trials, or after-effects, which provide a means to quantify learned motor responses, are a hallmark of adaptive motor behaviour. Hence, as HSPC greatly diminishes those after-effects, it may seem that we are advancing a control scheme whose performance is non-biological. However, this is not the case. First, HSPC reduces after-effects but does not suppress them; rather, they remain subject to extinction, or washout (figure 3c,d). Second, HSPC curtails erroneous feed-forward responses as soon as SPEs can be detected. Experimentally, the fast correction in catch trials that we demonstrate with HSPC has also been observed using a grip-force modulation paradigm where participants learned to anticipate an artificially delayed (but self-generated) disturbance [25]. However, to the best of our knowledge, this kind of catch trial has not been studied in the context of the anticipatory control of balance. For an APA task such as the one we have modelled, providing catch trials will likely require a virtual reality setup that allows the distal and proximal cues to be decoupled; that is, showing a virtual looming object that in paired trials coincides with an actual object hitting the participant but in catch trials does not.

    In addition, generalization of adaptive motor responses has been found in limb [26,54] and postural control [27]. Subjects trained to catch a ball with one arm perform equally well when they switch arms [54], a result that cannot be explained in terms of inverse models (which are, by definition, effector specific). Moreover, subjects who learned to counter a force-field perturbation in a sitting position correctly anticipated the postural disturbances that compensating for the force field would introduce in an upright posture [27]. This result argues in favour of an architecture composed of a forward internal representation of the dynamics of the environment coupled with an internal model of the postural dynamics, where the former is effector independent and the latter is already fine-tuned by experience; a proposal consistent with the hierarchical structure of HSPC.

    Put together, these three sources of evidence (generalization of acquired responses across limbs and postures, rapid reversal of the erroneous response in catch trials and anticipatory responses affected by altered proprioception) support a hierarchical control architecture that acquires forward models of the environment, exploits SPEs and shows a dependency between anticipatory and compensatory responses. All these features are embodied in HSPC but are difficult to reconcile with an inverse model-based architecture such as FEL.

    Finally, APAs are also observed in response to voluntary actions that trigger self-generated perturbations (e.g. extending an arm, loading a weight) [4,8]. Even though we focused on externally generated perturbations, HSPC could account for self-initiated perturbations by replacing the distal sensory input with an internally generated signal encoding the initiation of a motor plan [55], which would trigger a similar cascade of sensory predictions.

    (d) Implications for cerebellar physiology

    HSPC advances a hypothesis of cerebellar function in the domain of anticipatory control. It has its origins in a model of the cerebellum [29,42], as is the case for FEL [16]. In both architectures, adaptive modules are implemented as adaptive filters, a widely used computational model of cerebellar function [56,57]. Moreover, here we have demonstrated HSPC in a task that depends on the cerebellum [2,3,58]. A distinctive trait of our implementation of the cerebellar algorithm is the use of a delayed eligibility trace (Methods – equation (2.1)) [42]. In the cerebellum, contextual information reaches Purkinje cells through the parallel fibres, whereas specific error signals arrive via the climbing fibres. In terms of cerebellar physiology, the eligibility trace mechanism therefore predicts a plasticity rule at the synapses between parallel fibres and Purkinje cells that modifies synaptic weights whenever activity in the parallel fibres precedes climbing fibre input by a certain time interval. Both in HSPC and FEL, we set that interval according to the behavioural constraints of the agent/task [29], a requirement that seems to apply also in the cerebellum, where the timing of the plasticity rule of cerebellar Purkinje cells is matched to behavioural function [59].
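
    The timing property of such a rule can be illustrated with a small adaptive filter. The code below is a minimal sketch and not a reproduction of equation (2.1): it assumes a discrete-time, least-mean-squares-style update in which the error at time t is paired with the inputs presented delay steps earlier. The class name DelayedTraceFilter, the one-hot input basis and all parameter values are hypothetical.

```python
from collections import deque


class DelayedTraceFilter:
    """Linear adaptive filter whose update pairs the current error
    with the inputs seen `delay` steps earlier (a delayed trace)."""

    def __init__(self, n_inputs, delay, rate):
        self.weights = [0.0] * n_inputs
        self.rate = rate
        # Buffer holding the last delay + 1 input vectors.
        self.history = deque([[0.0] * n_inputs for _ in range(delay + 1)],
                             maxlen=delay + 1)

    def step(self, inputs, error):
        self.history.append(list(inputs))
        output = sum(w * x for w, x in zip(self.weights, inputs))
        delayed = self.history[0]  # inputs from `delay` steps ago
        for j, x_past in enumerate(delayed):
            self.weights[j] += self.rate * error * x_past
        return output


def run_trial(filt, n_steps=8, error_time=5):
    """One trial: input j is active only at step j; the error arrives at error_time."""
    outputs = []
    for t in range(n_steps):
        inputs = [1.0 if j == t else 0.0 for j in range(n_steps)]
        error = 1.0 if t == error_time else 0.0
        outputs.append(filt.step(inputs, error))
    return outputs


filt = DelayedTraceFilter(n_inputs=8, delay=3, rate=0.5)
first = run_trial(filt)   # naive filter: no output yet
second = run_trial(filt)  # output now appears before the error time
print(second)
```

    In this toy setting the error arrives at step 5 but, because of the three-step trace, credit is assigned to the input active at step 2, so that on the next trial the filter responds three steps ahead of the error. Note that the error here is imposed open loop and is not reduced by the filter output, which keeps the example focused on timing only.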

    From a system-level perspective, our proposal emphasizes the computations that could be achieved by organizing cerebellar modules in a hierarchical fashion. At the level of anatomy, such a functional hierarchy would require cerebellar microcircuits to be serially connected. That is, the output from one microcircuit could provide an input to the next one (or ones) in the hierarchy. This could be realized as non-reciprocal nucleo-cortical connections by which a particular area of the cerebellar nucleus could feed a cerebellar cortical microzone projecting to a separate region of the cerebellar nucleus. This arrangement is in agreement with the descriptions of the organization of the nucleo-cortical projections between the nucleus interpositus posterior (NIP) and the nucleus interpositus anterior (NIA) already present in the literature [60]. Indeed, it has been shown that a proportion of nucleo-cortical projections originating in NIP target NIA, whereas the opposite is not the case. This would imply that activity in NIP could modulate, after one step of cerebellar cortical processing, activity in NIA. On a more speculative note, from the perspective of HSPC, we would expect NIA to be more directly involved in motor control tasks (i.e. targeting motor nuclei) [61], whereas NIP would be more linked to sensory processing areas. Indeed, tracing studies showed that NIP sends its outputs to the ventro-lateral and ventro-posterior nuclei of the thalamus [62], main relays to somatosensory cortical areas crucial for the computation of SPEs [63] and to frontal cognitive areas [64,65].

    (e) Summary

    We have shown how a hierarchical control architecture based on sensory predictions enables the acquisition of responsive and generalizable APAs better than one based on the traditional view building on sensory-motor associations. In doing so, we went beyond the standard inverse–forward model dichotomy by showing how (the inversion of) forward models that acquire sensory–sensory associations can contribute to motor behaviour with, what we have called, action-aware sensory predictions. Our results provide a validation of key principles behind the active inference framework of motor behaviour and their realization in the DAC theory. In future work, we shall study how this anatomically constrained theory of anticipatory motor control could be extended to address the questions of optimality that arise when one takes effort-error trade-offs or the modulation of task-irrelevant versus task-relevant variability into account [66]. At this point, however, we expect the HSPC architecture to allow for the advancement of our understanding of the mechanisms underlying physiological anticipatory motor control, which we propose can now be treated in a framework related to active inference, while also contributing to the development of robust control architectures for artificial systems.

    Data accessibility

    Source code of the simulation and data used to generate the graphics are available at the following Dryad Digital Repository http://dx.doi.org/10.5061/dryad.1vs77 [67].

    Authors' contributions

    G.M., I.H., M.S., P.V. designed the research; G.M., I.H., M.S. designed the experiments; G.M., I.H. implemented the simulations; G.M., I.H., P.V. wrote the original manuscript; G.M. analysed the results; K.F., P.V. supervised the research and contributed to the final manuscript.

    Competing interests

    The authors declare that they have no competing interests.

    Funding

    This work has been supported by the ERC grant no. 341196 cDAC and the European Commission's Horizon 2020 socSMC grant (no. socSMC-641321H2020-FETPROACT-2014).

    Footnotes

    These authors contributed equally to this work.

    Electronic supplementary material is available online at https://dx.doi.org/10.6084/m9.figshare.c.3948505.

    Published by the Royal Society. All rights reserved.