An optogenetic analogue of second-order reinforcement in Drosophila

In insects, odours are coded by the combinatorial activation of ascending pathways, including their third-order representation in mushroom body Kenyon cells. Kenyon cells also receive intersecting input from ascending and mostly dopaminergic reinforcement pathways. Indeed, in Drosophila, presenting an odour together with activation of the dopaminergic mushroom body input neuron PPL1-01 leads to a weakening of the synapse between Kenyon cells and the approach-promoting mushroom body output neuron MBON-11. As a result of such weakened approach tendencies, flies avoid the shock-predicting odour in a subsequent choice test. Thus, increased activity in PPL1-01 stands for punishment, whereas reduced activity in MBON-11 stands for predicted punishment. Given that punishment-predictors can themselves serve as punishments of second order, we tested whether presenting an odour together with the optogenetic silencing of MBON-11 would lead to learned odour avoidance, and found this to be the case. In turn, the optogenetic activation of MBON-11 together with odour presentation led to learned odour approach. Thus, manipulating activity in MBON-11 can be an analogue of predicted, second-order reinforcement.


Introduction
Animals and humans go to great lengths to obtain rewards, such as food and water, and to avoid punishment, such as bodily damage and pain. Essential to these processes is the learning of cues predictive of such actual or first-order reinforcement. Critically, predictive cues not only acquire learned valence but, once predictive relationships are established, can also confer learned valence themselves; i.e. they can serve as second-order reinforcement [1–3]. In humans, for example, learning that money can buy food establishes money as a second-order reward. In general, second-order conditioning may underlie chains of predictions and early anticipatory behaviour in humans and animals. Indeed, the capacity for second-order conditioning is widely distributed across the animal kingdom, including insects [4–7], and is implemented in many computational models of associative learning [8].
In flies, presenting odour A with an electric shock punishment and odour B without punishment leads to learned avoidance of A in a subsequent choice test. This learning of an odour as a predictor of electric shock takes place in the Kenyon cells (KCs) of the mushroom body (figure 1a) [9–12]. The mushroom body provides a sparse, combinatorial representation of the sensory environment, including odours. Along their long axonal fibres, the KCs further receive intersecting input from neurons mediating internal reinforcement, many of which are dopaminergic (DANs). The coincidence of activation by odour and of DAN signalling can lead to presynaptic plasticity at the cholinergic synapse between the KCs and the output neurons of the mushroom body (MBONs). Arborizations from DANs and MBONs overlap and are regionally confined along the KC fibres, establishing a characteristic compartmental organization. In the case of the PPL1-01 DAN mediating an internal punishment signal, synaptic strength between the odour-coding KCs and the approach-promoting MBON-11 is reduced [18,19]. For the punished odour, the innate balance between approach and avoidance is thus tilted in favour of avoidance. In other words, activity in PPL1-01 can provide first-order punishment, and an odour that predicts first-order punishment leads to reduced activity in MBON-11. We therefore wondered whether, in experimentally naive flies, optogenetically silencing MBON-11 might be an analogue of a punishment-predicting odour, such that it confers a second-order punishing effect upon an actually present odour associated with such silencing (also see [20]), and whether, in turn, optogenetically activating MBON-11 might have a rewarding effect.
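The compartment logic described above can be illustrated with a deliberately simple Python toy. It is a sketch for intuition only and not a model used in this study: each odour is treated as a sparse set of active KCs, pairing an odour with DAN activity weakens only the synapses of the KCs active at that moment, and behaviour is read out as the difference between an approach-promoting and an avoidance-promoting output. All names and numbers (six KCs, equal initial weights, a depression factor of 0.5, a single lumped avoidance-promoting output) are hypothetical assumptions.

```python
def mbon_drive(kc_activity, weights):
    """Summed KC input onto one output neuron."""
    return sum(a * w for a, w in zip(kc_activity, weights))


def depress_active_synapses(kc_activity, weights, depression=0.5):
    """Weaken only the synapses of KCs that were active during DAN signalling.

    The depression factor is a hypothetical value chosen for illustration.
    """
    return [w * (1.0 - depression) if a > 0 else w
            for a, w in zip(kc_activity, weights)]


def net_approach(kc_activity, approach_w, avoid_w):
    """Approach-promoting minus avoidance-promoting drive (toy readout)."""
    return mbon_drive(kc_activity, approach_w) - mbon_drive(kc_activity, avoid_w)


# Hypothetical sparse KC activity patterns for two odours (1 = active KC).
odour_a = [1, 0, 1, 0, 0, 1]
odour_b = [0, 1, 0, 1, 1, 0]

# Naive state: balanced approach and avoidance tendencies (assumed equal weights).
approach_w = [1.0] * 6
avoid_w = [1.0] * 6

naive_a = net_approach(odour_a, approach_w, avoid_w)

# Pair odour A with punishing DAN activity: only A-coding synapses are depressed.
approach_w = depress_active_synapses(odour_a, approach_w)

trained_a = net_approach(odour_a, approach_w, avoid_w)
trained_b = net_approach(odour_b, approach_w, avoid_w)
```

In this toy, the balance for the paired odour tilts towards avoidance (`trained_a` is negative), whereas the balance for the odour presented alone is unchanged (`trained_b` stays at zero), because the two odours are assumed to activate non-overlapping KCs.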

Material and methods
Procedures follow [21], unless mentioned otherwise. Drosophila melanogaster were maintained on standard food, with 60–70% relative humidity, at 25°C, and in constant darkness to prevent unintended optogenetic effects. Flies aged 1–3 days after hatching were collected and kept at 18°C for up to four additional days. MB320C and MB085C (Fly Light Split-GAL4 Driver Collection) [13] as driver strains covering the PPL1-01 and MBON-11 neurons, respectively, were crossed to UAS-ChR2-XXL (Bloomington stock number: 58374) [22] or UAS-GtACR1 as effectors for optogenetic activation or silencing, respectively. To generate the latter strain, the GtACR1 DNA was synthesized (Thermo Fisher Scientific) according to the published sequence [23] with codon usage optimized to D. melanogaster. The synthesized GtACR1 DNA with a C-terminal YFP was inserted into the expression vector pJFRC7. Embryo injection (BestGene Inc.) was
(Figure caption, beginning truncated:) [9–14]). Odour presentation in untrained animals mediates balanced approach and avoidance tendencies of mushroom body output neurons (MBONs). Coincidence of odour-evoked activity in the mushroom body Kenyon cells (KCs) and activity of the dopaminergic neuron PPL1-01 evoked by the electric shock leads to a depression of the synapses from these KCs to an approach-promoting MBON. In a subsequent test, this allows avoidance tendencies through non-depressed KC-MBON synapses in parallel compartments to prevail. The organization of innate olfactory, punishment- and reward-related behaviour largely bypasses the mushroom body.
Behavioural experiments used a set-up from CON-ELEKTRONIK (Greussenheim, Germany) and took place at 23–25°C and 60–80% relative humidity. Training was performed in red light, which is invisible to flies, and testing in darkness. As odorants, 50 µl benzaldehyde (BA) and 250 µl 3-octanol (OCT) (CAS 100-52-7, 589-98-0; both from Fluka, Steinheim, Germany) were applied to 1 cm-deep Teflon containers of 5 and 14 mm diameter, respectively.
Flies were presented with both odours during training, but only one was paired with light for optogenetic activation (465 nm) or silencing (520 nm), whereas the other odour was presented alone (see electronic supplementary material, figure S2, for more details). The flies were then tested in a T-maze for their choice between the two odours. From the number of flies choosing each odour (#), the relative preference (BA Preference) was calculated. The presentation of BA and OCT with or without the light (*) was alternated between repetitions of the experiment, allowing an associative memory score to be obtained from reciprocally trained sets of flies as

$$\text{Memory score} = \frac{\text{BA Preference}_{\mathrm{BA}^{*}} - \text{BA Preference}_{\mathrm{OCT}^{*}}}{2}. \qquad (2.2)$$

Data were analysed with Kruskal–Wallis tests (KW-tests) to compare more than two groups, Mann–Whitney U-tests (U-test) for pairwise comparisons, and one-sample sign-tests for comparisons to chance level (i.e. zero), in all cases with Bonferroni–Holm corrections of p < 0.05 significance levels as appropriate, using Statistica 11.0 (StatSoft, Hamburg, Germany) and R 2.15.1 (www.r-project.org).
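The scoring and threshold-correction steps described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the analysis code of this study, which used Statistica and R. The form of the BA Preference (difference in fly counts divided by the total counted in both arms) is an assumption, as are all function names and all example numbers; the memory score follows the formula given above; the sign test and the Holm step-down procedure are generic textbook versions.

```python
from math import comb


def ba_preference(n_ba, n_oct):
    """Relative preference for BA.

    ASSUMED form: count difference divided by the total in both arms.
    """
    total = n_ba + n_oct
    if total == 0:
        raise ValueError("no flies counted in either arm")
    return (n_ba - n_oct) / total


def memory_score(pref_ba_paired, pref_oct_paired):
    """Half the difference between the two reciprocal BA preferences."""
    return (pref_ba_paired - pref_oct_paired) / 2


def sign_test_p(scores, chance=0.0):
    """Exact two-sided one-sample sign test against chance level."""
    above = sum(s > chance for s in scores)
    below = sum(s < chance for s in scores)
    n = above + below  # scores exactly at chance level are dropped
    if n == 0:
        return 1.0
    k = min(above, below)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def holm_reject(p_values, alpha=0.05):
    """Holm step-down: compare ordered p-values with alpha / (m - rank)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break  # stop at the first non-significant test
        reject[idx] = True
    return reject


# Hypothetical fly counts (BA arm, OCT arm) from two reciprocally trained groups.
pref_ba_star = ba_preference(20, 60)   # group in which BA was paired with light
pref_oct_star = ba_preference(50, 30)  # group in which OCT was paired with light
score = memory_score(pref_ba_star, pref_oct_star)

# Hypothetical memory scores of one genotype, tested against chance level (zero).
p_chance = sign_test_p([-0.3, -0.2, -0.5, -0.1, -0.4, -0.6, -0.2, -0.3])

# Hypothetical p-values from three tests, corrected as one family.
decisions = holm_reject([0.001, 0.04, 0.03])
```

With these hypothetical counts the memory score is -0.375. Under this sign convention, negative scores correspond to avoidance of the light-paired odour and positive scores to approach. In the Holm example only the smallest p-value passes its corrected threshold, because the procedure stops at the first comparison that fails.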

Results
Presenting an odour together with optogenetically silencing MBON-11 via the green-light-gated anion-channel GtACR1 established aversive memory for the odour (figure 1b). This effect was replicated using three training cycles (figure 1c). Consideration of the genetic controls suggests a weak appetitive olfactory memory through the pairing of odour with the green light, which is visible to the flies. Critically, relative to either genetic control, silencing MBON-11 had a punishing effect. Conversely, does activating MBON-11 have a rewarding effect?
Presenting an odour together with optogenetically activating MBON-11 via the blue-light-gated cation-channel ChR2-XXL established appetitive memory for the odour (figure 2). Corresponding to what is typically observed for primary food rewards such as sugar [24], this appetitive memory appeared slightly stronger under starved conditions (figure 2c; indeed starvation was shown to facilitate MBON-11 activity [25]). In the case of blue light too, the data from the genetic controls suggest a weakly rewarding effect. We further note that relative to the respective genetic controls, the punishing effect of silencing MBON-11 (figure 1c) appears to be stronger than the rewarding effect of activating it (figure 2b).
We conclude that silencing/activating MBON-11 has a punishing/rewarding effect.

Discussion
MBON-11 is GABAergic [13]. It targets premotor circuitry outside the mushroom bodies, and hetero-compartmental regions in the ipsi- and the contralateral mushroom body, and furthermore features a homo-compartmental and contralateral feedback loop onto the dopaminergic, punishing PPL1-01 neuron (figure 1d) [13,17,25,26]. All of these regions could contribute to reinforcement through manipulation of MBON-11 activity, and we expressly do not draw a conclusion as to which of these regions is indeed involved in these reinforcing effects. One scenario is that silencing MBON-11 lifts inhibition from PPL1-01, promotes PPL1-01 activity and thus exerts a punishing effect (but see [20]). Accordingly, the observation that activating MBON-11 has just a mild rewarding effect (figure 2) would suggest that spontaneous activity in PPL1-01 is moderate, and thus that silencing PPL1-01 would have less effect than activating it. Indeed, as previously reported, activating PPL1-01 is very strongly punishing (electronic supplementary material, figure S4B) [14,19], whereas silencing it has no measurable rewarding effect (electronic supplementary material, figure S4C) (see [27] for a punishing effect of silencing the DAN of the γ3 compartment). This scenario would therefore suggest that targets other than PPL1-01 are responsible for the rewarding effect of activating MBON-11 (also see [20]).
royalsocietypublishing.org/journal/rsbl Biol. Lett. 15: 20190084
Interestingly, the pathway from MBON-11 onto the glutamatergic MBON-01 neuron of the γ5 compartment and further from MBON-01 to the rewarding DANs of that compartment is critical for extinction learning after aversive training ([26]; also see [25] report here! To reconcile this contradiction, consider that during second-order conditioning a stimulus X is first paired with primary reinforcement, and then X is presented together with a novel stimulus A in the absence of primary reinforcement. Whereas during AX training the effects of X as a reinforcement-predicting, second-order reinforcer will initially dominate, extended AX training will extinguish the X-with-reinforcement association. The above scenario would thus suggest that the opposing effects of second-order reinforcement and extinction learning, well known to practitioners of this paradigm, are related to homo- versus hetero-compartmental processes. We note that placing the behavioural effects of manipulating MBON-11 activity into an experimental psychology framework of secondary reinforcement processing also encompasses the effect labelled 'BGAM' (for blockade of MBON-γ1pedc-induced aversive memory) [20, fig. 3B,C], obtained by blocking synaptic output from MBON-11 (also see [28]). Critically, the present framework suggests that silencing MBON-11 or preventing synaptic output from it leads to aversive learning about the odour paired with such treatment, whereas [20, p. 569] suggests that synaptic output from MBON-11 is necessary to prevent aversive learning about odours presented in an unpaired manner (for a discussion of paired and unpaired learning, see [29]).
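The opposing time courses of second-order reinforcement and extinction during AX training, as argued above, can be illustrated with a toy simulation in Python. It is a hypothetical sketch for intuition and not a model fitted to, or used in, this study: the update rules, the function name, the learning rates and the number of trials are all assumptions. Values are magnitudes of learned valence; for a punishment-predicting X they would correspond to the strength of learned avoidance.

```python
def simulate_ax_training(n_trials=30, v_x=1.0, lr_a=0.3, extinction_x=0.1):
    """Toy AX training without primary reinforcement (all parameters hypothetical).

    v_x: predictive strength of X after first-order training.
    lr_a: rate at which A acquires valence from X (second-order learning).
    extinction_x: fraction of X's strength lost per unreinforced trial.
    Returns the learned strength of A after each trial.
    """
    v_a = 0.0
    trace = []
    for _ in range(n_trials):
        # A moves towards the current predictive strength of X.
        v_a += lr_a * (v_x - v_a)
        # X is presented without primary reinforcement, so it extinguishes.
        v_x -= extinction_x * v_x
        trace.append(v_a)
    return trace


trace = simulate_ax_training()
peak_trial = trace.index(max(trace)) + 1  # 1-based trial number of the maximum
```

Under these assumed parameters, the learned strength of A first rises, as X acts as a second-order reinforcer, and then declines as extended AX training extinguishes X. This is the qualitative pattern described in the text; the toy makes no claim about which compartments implement either process.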
We think that it is interesting that activity in a cell such as MBON-11 can be an analogue of second-order reinforcement, because this is the earliest site efferent to the memory trace in the presynaptic terminals of the mushroom body KCs for such an effect. This might inform the search for such analogues of secondary reinforcement in other species. It also raises the question of how much further down efferent pathways such analogues of second-order reinforcement can be observed, and indeed what the relation of action to valence is.
Ethics. All experiments comply with applicable law and ethics regulations.
Data accessibility. For data and statistical report, see the electronic supplementary material, table S1.