Experiments in artificial culture: from noisy imitation to storytelling robots

This paper presents a series of experiments in collective social robotics, spanning more than 10 years, with the long-term aim of building embodied models of (aspects of) cultural evolution. Initial experiments demonstrated the emergence of behavioural traditions in a group of social robots programmed to imitate each other’s behaviours (we call these Copybots). These experiments show that the noisy (i.e. less than perfect fidelity) imitation that comes for free with real physical robots gives rise naturally to variation in social learning. More recent experimental work extends the robots’ cognitive capabilities with simulation-based internal models, equipping them with a simple artificial theory of mind. With this extended capability we explore, in our current work, social learning not via imitation but robot–robot storytelling, in an effort to model this very human mode of cultural transmission. In this paper, we give an account of the methods and inspiration for these experiments, the experiments and their results, and an outline of possible directions for this programme of research. It is our hope that this paper stimulates not only discussion but suggestions for hypotheses to test with the Storybots. This article is part of a discussion meeting issue ‘The emergence of collective knowledge and cumulative culture in animals, humans and machines’.


Introduction
In this paper, we describe two sets of experiments with small groups of real robots, conducted over the course of more than 10 years, in the Bristol Robotics Lab. The long-term aim of these ongoing experiments is to explore aspects of the question 'how do we have culture?' in a new way, by modelling the low-level processes and mechanisms of cultural evolution with robots. In this paper we adopt Mesoudi's definition of culture: 'information that is acquired from other individuals via social transmission mechanisms such as imitation, teaching or language' [1]. We outline two sets of experiments (the first already completed and the second in preparation) with a focus on two of these transmission mechanisms: imitation and language.
The first set of experiments we describe was directly inspired by the thought experiment in [2, p. 106], which imagines a group of robots capable of imitating each other. Referred to as Copybots, their ability to imitate actions with variation makes them very simple meme machines. Another source of inspiration was Gabriel Tarde who proposed 'a remarkable sociological research project' [3] when he wrote [...] Artificial Culture project, we realized that we could set up a free-running group of robots (an artificial society) and literally observe, record and analyse every minute detail of the robots' interactions with each other.
A second and more recent set of experiments extends our robots' cognitive capabilities with simulation-based internal models. A simulation-based internal model (literally a robot with a simulation of itself, inside itself) allows a robot to ask itself 'what if' questions. This capability has been described as a functional imagination [4], as it enables a robot to 'imagine' the consequences of its actions (and, in our implementation, the reaction of others to those actions). Our experimental implementation of a simulation-based internal model, which we refer to as a consequence engine (CE), has proven to be remarkably powerful. Our experiments with the CE were inspired by both the simulation theory of cognition [5,6] and Dennett's 'Tower of Generate-and-Test' [7]. Both simulation and the loop of generate-and-test are present in the architecture of the CE.
In our current work, also using the CE, we aim to explore social learning not via imitation but robot-robot storytelling in an effort to model this very human mode of cultural transmission. Although we are not aiming at evolving language, we have nevertheless been influenced by both the seminal Talking Heads experiments of Steels [8] and work to evolve mechanisms of communication in a swarm of robots [9]. Instead, and in addition to the CE, our 'Storybots' are being equipped with the means to communicate via speech, and what Penn et al. [10] call the 'spectacular scaffolding provided by language'.
Our method for both sets of experiments is to build a working model or, as we prefer to describe it, an embodied simulation consisting of a group of autonomous robots, in which the robots are programmed with simple behaviours and interact with each other in an artificial arena. The arena is equipped with a system that allows each robot's movements to be tracked and recorded, alongside a time-stamped record of each robot's internal decisions sent to the logging system via a local WiFi network. In this way, we are able to capture Tarde's 'minute transformations' for analysis.
Physical embodiment is important to us for several reasons: first, because experiments with real robots are noisy and unpredictable. Even though our robots are seemingly identical, small differences between the motors and sensors mean that each robot will move and sense in slightly different ways. These unintended heterogeneities serve to model differences between conspecific animals and humans. And the noise will prove to be of critical importance. Unlike computer simulations, the noise, stochasticity and physics come for free, just as they do for animals and us. Second, and perhaps most importantly, robots, like animals, have physical bodies that constrain how they behave and 'think' [11]. Robots also see each other only from their own first-person perspectives. Yes, our robots have distinctly non-human minds [10] (albeit of a kind so simple that the term 'mind' is hardly appropriate) but, we contend, they have enough in common with animals and humans to allow us to plausibly model interesting aspects of social learning and behavioural evolution. The work of this paper fits, we believe, within the microevolution strand of the science of cultural evolution [12], and although simulation models of cultural evolution are not new [13], we believe that our approach using robots in an embodied individual-based simulation is novel. Physical embodiment and noise together provide the 'natural phenomena' [14] that can be exploited in cumulative cultural evolution.
This paper proceeds as follows. In §2, we outline the Copybots and the key findings from the first set of experiments. Then, in §3 we describe the consequence engine, the key innovation of our second generation of experimental work. We illustrate the CE and the kind of emergent behaviour that is typical of real-robot embodied simulations with the pedestrian experiment, before then introducing the Storybots.
We conclude the paper in two parts: in §4a we outline Dennett's tower of generate-and-test before then showing how it provides a unifying framework in which we can classify all of the robots of this paper. Finally in §4b, we discuss experimental possibilities for the Storybots hoping this will stimulate research questions in cultural microevolution that might feasibly be explored.

Copybots
In a series of experiments, we implemented social learning in a group of robots [15]. Simple wheeled robots (figure 1) were programmed to learn socially, from each other, by imitation. These miniature robots, called e-pucks, are extremely simple compared with animals; they have just two wheels and so cannot interact with objects in their environment except by colliding with them. Their sensorium is equally limited: they 'see' only with a single 640 × 480 resolution camera, and something approaching a sense of touch is provided by eight short-range infrared proximity sensors mounted around their body radius [16]. From a behavioural perspective the robots are a blank canvas. They have no built-in or innate behaviours; all must be programmed [17].
In these experiments, imitation was strictly embodied. Robots have no access to each other's internal states; instead, they observed each other using their onboard sensors and, on the basis of only visual sense data from a robot's own camera and perspective, the learner robot inferred another robot's pattern of movements. (This contrasts with the social learning in Bredeche & Fontbonne [18], in which robots learn by sharing internal parameters when they encounter each other.) We 'seed' each Copybot with initial behaviours, which are self-contained movement sequences (or 'dances') that we refer to as memes (Dawkins [19] defines a meme as 'that which is imitated'). We then free run the Copybots with each robot alternating between enacting memes and watching (and learning) those memes.
Not surprisingly embodied robot-robot imitation is imperfect. A combination of factors including the e-puck robots' low-resolution camera, variations in ambient lighting, heterogeneities among the robots, multiple robots sometimes appearing within a learner robot's field of view, and of course having to infer another robot's movements by tracking the relative size and position of that robot in the learner's field of view, lead to imitation errors. Furthermore, some memes are easier to learn by imitation than others (think of how much easier it is to learn the steps of a slow waltz than the tango by watching your dance teacher). The fidelity of embodied imitation for robots, just as for animals, is a complex function of four factors: (i) the behaviours being imitated, (ii) the robots' sensorium and morphology, (iii) environmental noise and (iv) the inferential learning algorithm (for a brief outline of how the algorithm works see appendix A).
royalsocietypublishing.org/journal/rstb Phil. Trans. R. Soc. B 377: 20200323
But rather than being a problem, noisy imitation was our aim. We are interested in the dynamics of social learning, and in particular the way that memes evolve as they propagate across the collective, by social learning. Noisy social learning means that behaviours are subject to variation as they are copied from one robot to another. Multiple cycles of imitation (robot B socially learns behaviour m from A, then robot C learns the same behaviour m′ (m mutated) from robot B, and so on) give rise to behavioural heredity. And if robots are able to select which learned behaviours to enact we have the three Darwinian operators for evolution, except that this is behavioural, or memetic, evolution.
Our experiments demonstrate that embodied behavioural evolution does indeed take place. If selection is random, that is, robots select which behaviour to enact from those already learned with equal probability, then we see several interesting findings. First, if by chance one or more high fidelity copies follow a poor fidelity imitation, the large variation in the initial noisy learning can lead to a new behavioural species, as shown here in figure 2, thus demonstrating that noisy social learning can play a role in the emergence of new (and potentially useful) behaviours in behavioural (i.e. cultural) evolution [20]. Second, we observe that behaviours adapt to be easier to learn, i.e. better 'fitted' to the sensorium and morphology of the robots [21], a result which appears to mirror the findings of Kirby et al. [22], that artificially evolved languages evolve to be easier to learn.
A third finding from this series of experiments is perhaps the most unexpected. When we ran the same embodied behavioural evolution with three memory sizes (no memory, limited memory and unlimited memory), the limited memory case led to the most 'stable' population of behaviours across the robot collective, i.e. a smaller number of larger clusters of related memes; in other words, a small number of relatively persistent behavioural types. In figure 3, we see one cluster of 12 closely related memes. This result suggests the intriguing conclusion that forgetting may be a significant collective trait in behavioural evolution [21], and might also be related to what is referred to as conformist social learning, in which learners are more prone to act as others do [23].
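The cycle of variation, heredity and selection described above can be illustrated with a minimal software sketch. This is not the authors' robot code: the Gaussian noise model, the parameter values, the names `noisy_copy` and `free_run`, and the forget-the-oldest policy for limited memory are all assumptions made for illustration. Memes are represented as lists of (turn, distance) pairs.

```python
import random

def noisy_copy(meme, rng, turn_sd=5.0, dist_sd=1.0):
    """Return an imperfect copy of a meme.

    Assumption: imitation error is modelled as independent Gaussian noise on
    each turn (degrees) and distance (cm); real robots' errors are richer.
    """
    return [(turn + rng.gauss(0.0, turn_sd),
             max(0.0, dist + rng.gauss(0.0, dist_sd)))
            for turn, dist in meme]

def free_run(seed_memes, steps, rng, memory_limit=None):
    """Free-run a group of simulated Copybots, one seed meme per robot.

    Each step: a random teacher enacts one of its remembered memes, chosen
    with equal probability (selection), and a random learner stores a noisy
    copy of it (heredity with variation).
    """
    memories = [[list(meme)] for meme in seed_memes]
    for _ in range(steps):
        teacher, learner = rng.sample(range(len(memories)), 2)
        enacted = rng.choice(memories[teacher])
        memories[learner].append(noisy_copy(enacted, rng))
        # Hypothetical forgetting policy: drop the oldest meme when full.
        if memory_limit is not None and len(memories[learner]) > memory_limit:
            memories[learner].pop(0)
    return memories

rng = random.Random(42)
seeds = [[(60.0, 15.0)] * 3, [(90.0, 10.0)] * 4]
limited = free_run(seeds, steps=50, rng=rng, memory_limit=5)
```

In this sketch `memory_limit=None` corresponds to unlimited memory and a small positive integer to limited memory; the no-memory condition is not modelled.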
A related series of experiments combines social and individual learning. We extended reinforcement learning with imitation, so that robots could observe and learn, by imitation, from more 'experienced' individual learners (for an outline description see appendix B). Reinforcement learning is a well-known approach to machine learning based on trial-and-error interactions between an agent and its environment [24]. As above, the imitation is strictly embodied, and an imitating robot has no access to the internal state of an observed robot. In a series of experiments, we saw that robots with imitation-enhanced reinforcement learning learned faster than those with reinforcement learning alone. Not a surprising result; social learning is very much faster than individual learning, and robots, just like animals, can benefit from learning socially from more experienced others [25]. Such social learning in animals, for instance when juveniles typically prefer to copy older individuals who are more experienced, is one of the social learning strategies reviewed in Kendal et al. [26]. However, we were surprised to observe that errors in the imitation phase sometimes led to robots learning even faster. It appears that imitation errors that arise while copying another robot can lead to faster learning.
This work has, perhaps for the first time, studied embodied social learning, by imitation, in real-robot collectives. The work has value in extending techniques for robot-robot learning. But its primary purpose is to model and illuminate low-level processes and mechanisms of behavioural evolution. Embodied social learning provides minimal but sufficient biological plausibility and as outlined here, embodiment leads naturally to imperfect imitation, which appears to play an important role in the dynamics of behavioural evolution.

Storybots
In more recent work, we have extended the robots' cognition with a simulation-based internal model. Robots equipped with a simulation-based internal model have the ability to simulate (or 'imagine') the future actions of both themselves and others, and the consequences of those actions. Figure 4 shows a block diagram of a robot equipped with a consequence engine (CE), which consists of the blue boxes and dataflows on the left. The simulator at the heart of the CE contains three components: a model of the world, which must be initialized to mirror the robot's immediate environment including the objects and actors in it, as it is now, via the 'object tracker-localizer'; a model of the robot itself; and an exact copy of the robot's controller. The loop of generate-and-test shown on the left generates each of the robot's next possible actions, then 'runs' the simulator for each of those actions in turn. The consequence evaluator determines the anticipated outcome for each of those actions, so that the robot's action selection can be appropriately moderated (what counts as appropriate depends on whether the CE's primary purpose is keeping the robot safe, or behaving ethically, etc.). In our experiments, the CE will typically generate and test 30 next possible actions and, for each action, simulate 10 s into the future. The complete generate-and-test cycle will be repeated every 0.5 s.
The CE has proven to be a remarkably powerful piece of cognitive machinery. With it we have experimentally demonstrated (i) robots that can make simple ethical decisions in order to pro-actively prevent another robot (acting as a proxy human) from coming to harm [28,29]; (ii) robots with enhanced safety [30]; and (iii) robots capable of the imitation of goals [31]. We have also argued that the CE provides a robot with a simple artificial theory-of-mind [32]. These experiments were conducted with real physical e-puck (figure 1) and NAO robots (figure 6).
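The generate-and-test loop of the CE can be sketched in a few lines. What follows is a toy illustration under stated assumptions, not the CE implementation: the world model is a set of circular obstacles, the robot is a point moving at constant speed along a fixed heading, and the function names (`simulate`, `evaluate`, `consequence_engine`), the cost function and all parameter values are hypothetical. Only the defaults of 30 candidate actions and a 10 s look-ahead come from the text. Note that, if the 30 actions were simulated one after another, those figures would imply 300 s of simulated time per 0.5 s cycle, i.e. a simulator running some 600 times faster than real time.

```python
import math

def simulate(pos, heading, speed, horizon_s, dt, obstacles, robot_radius=0.05):
    """Internal model: roll the robot forward and report (final position, collided).

    Assumption: obstacles are (x, y, radius) circles and the robot holds its heading.
    """
    x, y = pos
    for _ in range(int(round(horizon_s / dt))):
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        for ox, oy, orad in obstacles:
            if math.hypot(x - ox, y - oy) < orad + robot_radius:
                return (x, y), True
    return (x, y), False

def evaluate(final_pos, collided, goal):
    """Consequence evaluator (hypothetical): distance to goal plus a collision penalty."""
    cost = math.hypot(final_pos[0] - goal[0], final_pos[1] - goal[1])
    return cost + (100.0 if collided else 0.0)

def consequence_engine(pos, goal, obstacles, n_actions=30, horizon_s=10.0,
                       dt=0.1, speed=0.05):
    """Generate n_actions candidate headings, test each in simulation, return the best."""
    best_cost, best_heading = None, None
    for k in range(n_actions):
        heading = 2.0 * math.pi * k / n_actions          # generate
        final_pos, collided = simulate(pos, heading, speed, horizon_s, dt, obstacles)
        cost = evaluate(final_pos, collided, goal)        # test
        if best_cost is None or cost < best_cost:
            best_cost, best_heading = cost, heading
    return best_heading

# One cycle: an obstacle sits directly between the robot and its goal.
chosen = consequence_engine((0.0, 0.0), (1.0, 0.0), [(0.25, 0.0, 0.1)])
```

On a real robot this cycle would be repeated with the world model re-initialized from sensing each time; here a single call stands in for one cycle.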
The pedestrian experiment provides an elegant example of emergent behaviour in two robots, each equipped with a CE, and programmed with the goal of approaching and then passing each other safely. Figure 5 shows the trajectories of the two robots: blue, starting from the left, and green, starting from the right. In real-robot experiments, four times out of five blue and green pass each other as two pedestrians would, each stepping to her left (or right) (see figure 5a). But one time in five, both blue and green step toward each other and, just like humans, engage in a short dance before resolving and proceeding on their way (figure 5b).
More recently we have theorized that the CE may also be co-opted as a mechanism for robot-robot storytelling [27]. The CE provides a robot with the cognitive machinery to be able to ask 'what if?' questions. These could be very [...]
Figure 6 illustrates this process for robot A. If robot B, the listener, is equipped with a microphone and speech recognition process it is able to listen to robot A's story, as shown in figure 7. Because robot B has the same internal modelling machinery as A (they are conspecifics), it is capable of 'running' the story it has just heard within its own internal model. In order that this can happen we need to modify the robot's programming so that the what-if sequence it has heard and interpreted is substituted for an internally generated what-if sequence. Once that substitution is made, robot B is able to run A's what-if sequence (its story) in exactly the same way it runs its own internally generated next possible actions, simulating and evaluating the consequences. Robot B is therefore able to 'imagine' robot A's story. Does this story mean anything to robot B? Arguably it does, as B is able to simulate and therefore 'experience' the sensory inputs, and consequences (if any) of listening to A's story.
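The substitution step can be made concrete with a small sketch. Everything here is hypothetical: the 'turn'/'forward' vocabulary, the sentence format joined by 'then', and the dead-reckoning internal model are stand-ins chosen for illustration, and speech synthesis and recognition are reduced to plain strings. The point is only that the listener feeds the interpreted sequence into the same `imagine` function it would use for its own what-if sequences.

```python
import math

def narrate(sequence):
    """Storyteller: turn a what-if action sequence into an utterance (text stands in for speech)."""
    return " then ".join(f"{verb} {value:g}" for verb, value in sequence)

def interpret(utterance):
    """Listener: recover an action sequence from what was heard."""
    sequence = []
    for clause in utterance.split(" then "):
        verb, value = clause.split()
        sequence.append((verb, float(value)))
    return sequence

def imagine(sequence, pose=(0.0, 0.0, 0.0)):
    """Toy internal model: dead-reckon the pose (x, y, heading in degrees) after a sequence."""
    x, y, heading = pose
    for verb, value in sequence:
        if verb == "turn":
            heading += value
        elif verb == "forward":
            x += value * math.cos(math.radians(heading))
            y += value * math.sin(math.radians(heading))
    return x, y, heading

# Robot A narrates a what-if sequence it has tested but not executed.
story = [("turn", 90.0), ("forward", 20.0)]
utterance = narrate(story)
# Robot B substitutes the heard sequence for an internally generated one.
heard = interpret(utterance)
imagined_pose = imagine(heard)
```

A noisy channel could be modelled by corrupting `utterance` before `interpret` is called, which is where story mutation would enter.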
Note that the humanoid NAO robots, shown in figures 6 and 7, do not have human-like intelligence even though their appearance might suggest otherwise. Like the e-puck Copybots, all behaviours must be programmed from scratch. The NAO robots do, however, have the advantage of microphones and loudspeakers, alongside a library of functions for speech recognition and synthesis. This makes them much more suitable as Storybots.
If we provide not just two, but a group of robots with a rich physical environment they can explore then we are providing the robots with something they can tell each other stories about. And, for the same reasons that our Copybots' imitation is noisy, so will our Storybots experience imperfect communication, so the stories will mutate as they are told and re-told. The architecture of the CE and its simulation-based internal model opens the possibility that we can replay and visualize any episode in a robot's 'imagination', thus adding further detail to Tarde's 'minute transformations' and allowing us to inspect the robots' mental representation of stories as they pass from robot to robot.
Figure 5. [...] (b) Here both robots make a decision to turn at the same time, green to its left and blue to its right; a 'dance' then ensues before the impasse is resolved. Adapted from Winfield [32].

Discussion and conclusion
(a) Dennett's tower of generate-and-test: a unifying framework
As mentioned in the introduction, Dennett's tower of generate-and-test directly inspired the second series of experiments outlined above, culminating in the Storybots. Dennett's tower also provides us post facto with a single framework to unify all of the experimental work outlined in this paper. Dennett [7] proposes a conceptual framework, the tower of generate-and-test, for thinking about design options for brains. Each floor of the tower uses the three Darwinian operators: copy, generate variations, test outcomes, repeat. Each floor builds on the outcome of the previous ones. The framework provides a way of seeing how humans, as a cultural species, emerged from creatures with no cumulative culture, using the same 'generate and test' process all the way up. The ground floor is inhabited by Darwinian creatures. Variation is provided by the more or less random recombination and mutation of genes, and selection is brutal: design-by-death [33]. All living things are Darwinian creatures.
Some of these creatures emerged with conditionable plasticity; that is, not all their behaviour was genetically determined. Occupying the first floor of Dennett's tower, these Skinnerian creatures try out a variety of responses to their environment selecting only actions that are reinforced for repeating. Dennett named them after Skinner's comment that, 'Where inherited behaviour leaves off, the inherited modifiability of the process of conditioning takes over' [34, p. 83]. [...] models of (relevant features of) the environment, as well as of their own body and behaviour. Variation comes from imagining different actions; selection is by imagined consequences. As Popper remarked, when imagining outcomes we 'let our conjectures, our theories, die in our stead' [35].
Figure 6. Robot A, the storyteller, 'narrativizes' one of the 'what-if' sequences modelled by its generate-and-test machinery. First, an action is tested in the robot's internal model (left); second, that action, which is not executed for real, is converted into speech and spoken by the robot. Adapted from Winfield [27].
We are not alone in being Popperian creatures; most mammals, birds, fish and reptiles can learn through both classical and operant conditioning, and can contemplate the consequences of at least some of their actions. On the fourth floor are Gregorian creatures. As far as we know, we are the only Gregorian creatures, at least on this planet. Gregory introduced the idea of 'tools of Mind' or 'mind-tools' by which he meant 'aids to measuring, calculating and thinking' [36, p. 48], including tools like scissors or levers as well as spoken and written words, and ways of counting. Language makes possible long trains of thought, the ability to look ahead, and the sharing of tools that enhance intelligence. These tools are built up over generations by creatures that can copy information from each other, building up culture. Gregorian creatures are therefore meme machines [2] as well as Darwinian, Skinnerian and Popperian creatures. This ability to imitate and learn from others makes possible what Dennett calls the 'deliberate, foresightful generate-and-test known as science' [7, p. 380].
Dennett's tower of generate-and-test contrasts with other modular-mind frameworks, for instance Mithen's cathedral model [37], in three important respects: (i) it is not a flat model of intelligence modules but instead a nested hierarchy, (ii) it defines a number of key transitions in the evolution of mind, and (iii) in the third transition, from Skinnerian to Popperian, it introduces the crucial innovation of an internal model. Dennett's tower is commented on in several papers (including [38,39], and more recently [40]). Godfrey-Smith's paper [40] both critiques and extends Dennett's framework, and notably points out an omission in the original framework, that an internal model requires a mechanism for consequence evaluation in order to be of value. In developing our consequence engine (CE), we too realized that we needed to implement just such a mechanism.
Let us now place our robots within the floors of Dennett's tower. The basic Copybots are Darwinian creatures alone, their design having been selected from many other possible designs we might have chosen (a form of design-by-death). The Copybots with imitation-enhanced learning are also Skinnerian creatures. Both types can imitate, which might suggest they are Gregorian, but this would be a misclassification as the Copybots have no internal model, the defining characteristic of Popperian, and hence also Gregorian, creatures. By contrast, all of our robots with a CE are Popperian; the CE enables them to generate and test hypotheses about what to do next. They are, however, not strictly Skinnerian because we have not added reinforcement learning (although this is perfectly feasible). The Storybots proposed in figures 6 and 7 finally take us from the Popperian to Gregorian level.
While the Copybots imitate by visually observing movement memes, lacking a CE they cannot predict and evaluate the consequences of those imitated behaviours. The Storybots, on the other hand, do not imitate behaviour directly. Their method of learning from others is mediated by the mind-tool of language. Table 1 summarizes the classification outlined here.
Applying the framework of Dennett's tower does raise the interesting question of whether intelligence must necessarily be achieved by building each level of generate-and-test on top of the preceding one, or whether robot intelligence might skip one or more, to achieve cumulative culture more directly. The fact that we are able to skip the Skinnerian level is a consequence of hard coding the capabilities summarized in table 1. Strictly speaking, this is a serious break with both the evolutionary origins and cumulative nature of each level in Dennett's framework. Given, however, that artificially evolving each capability in succession is far beyond what can be achieved in evolutionary robotics at present [41], and that we have demonstrated the implementation of Skinnerian learning, the break is, we believe, justified.

(b) What can we learn from the Storybots?
In order to address this question, consider first what we might learn from the Storybots as presented in §3. These robots are equipped with a CE plus the speech synthesis and recognition capabilities shown in figures 6 and 7. Like the other robots with a CE outlined at the start of §3, the basic Storybots have a short-term memory which is used to retain the evaluated consequences of each generated-and-tested action, in order that the most appropriate action can be selected. But at the end of each complete cycle of generate-and-test that short-term memory is cleared ready for the next cycle. These robots do not learn, which may seem surprising given the capabilities demonstrated by the CE. However, even these basic Storybots (like the first Copybots we tested) can usefully allow us to explore how stories vary as they are told and re-told by several robots. As part of this experiment, we would first need to find the optimal balance between zero and perfect fidelity robot-robot speech-transmission (as we had to do with the Copybots for vision-based imitation) such that we do see a reasonable level of variation; this would require, for instance, adjusting speech loudness and microphone sensitivities, and attending to directionality so that the listener robot is facing the narrating robot.
Next consider memory. It would be straightforward to equip each Storybot with a long-term (episodic) memory. The memory would store discrete events (things that [...]
When it comes to deciding how to select which of a robot's stories it should narrate, we have several options: (i) we could use the same strategy as the Copybots and choose one from the robot's memory at random with equal probability. If we also limit the Storybots' memory, as suggested by the Copybots experiments of Erbas et al. [21], then we might expect that, in time, some stories become dominant in the collective memory of the group of Storybots, for no other reason than they happen to have been selected then re-told with high fidelity. Of more interest would be (ii) selecting stories by content, for instance those that point out hazards, so that telling (for the first time) or re-telling the story is, in effect, 'spreading the word'. The robot would run each of the stories it has heard and remembered, in its CE, and select the one that it 'imagines' as the most dangerous, using exactly the same evaluation mechanism the robot would use when (generating and) testing its own possible actions. A third interesting option (iii) would be to select stories to re-tell on the basis of which other robot told the story. One strategy would be to choose to re-tell stories told by the robot whose stories have been re-told the most often in the group, thus introducing a frequency bias. Another would be to re-tell stories from the robot whose stories are judged the most impactful in the sense outlined in strategy (ii). If we introduce new robots into the group at different times we might see the emergence of an 'elder' storyteller robot that is accorded a prestige bias. Is it possible that we might also see the emergence of both 'cultural ratcheting' and collective 'memory splitting' (unlike in figure 3) [43]? There are many interesting options to be explored within strategy (iii).
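The three selection strategies can be compared side by side in a short sketch. The `Story` record, its `danger` field (a stand-in for the score a robot's CE would assign when it 'imagines' the story) and the function names are assumptions for illustration only; none of them is taken from an existing Storybot implementation.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Story:
    text: str
    teller: str     # which robot the story was heard from
    danger: float   # hypothetical stand-in for the CE's evaluation of the story

def select_random(memory, rng):
    """Strategy (i): choose any remembered story with equal probability."""
    return rng.choice(memory)

def select_by_content(memory):
    """Strategy (ii): choose the story 'imagined' as the most dangerous."""
    return max(memory, key=lambda story: story.danger)

def select_by_teller(memory, retell_counts, rng):
    """Strategy (iii), frequency-biased variant: prefer the most re-told teller.

    retell_counts maps a teller's name to how often its stories have been
    re-told in the group (assumed to be known to the selecting robot).
    """
    tellers = sorted({story.teller for story in memory})
    favourite = max(tellers, key=lambda name: retell_counts.get(name, 0))
    return rng.choice([story for story in memory if story.teller == favourite])

memory = [
    Story("cliff near the ramp", "A", 0.9),
    Story("flat floor ahead", "B", 0.1),
    Story("narrow gap by the wall", "B", 0.4),
]
rng = random.Random(7)
chosen_by_content = select_by_content(memory)
chosen_by_teller = select_by_teller(memory, {"A": 1, "B": 3}, rng)
```

A prestige-biased variant of strategy (iii) could reuse `select_by_teller` with counts replaced by per-teller impact scores.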
Also important is how a robot decides when to tell the story it has selected for re-telling. Since our Storybots will, like the Copybots, be moving around in their shared environment they will encounter each other quite frequently. These encounters present opportunities for Storybots to tell new stories or re-tell previously heard stories. How, on meeting each other, would robots agree which one will be the storyteller (as in figure 6) and which the listener (figure 7)? A simple mechanism would be for both robots, on meeting, to start an internal timer, choosing at random the number of seconds to wait. If a robot has not heard the other one speak before its timer runs out then it will speak first, taking the role of storyteller. The other robot will hear the storyteller start to speak while its timer is still running and adopt the role of listener. If the experiment uses selection strategy (iii) then a robot could, on encountering a 'prestigious' storyteller, add a few extra seconds to its randomly chosen wait-before-speaking timer. We could call this a 'deference' value.
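The random-timer mechanism, including the optional 'deference' value, can be sketched as follows. The uniform distribution, the 5 s maximum wait and the function name `negotiate_roles` are assumptions; the text specifies only that each robot waits a randomly chosen number of seconds and that a deferring robot adds a few extra seconds.

```python
import random

def negotiate_roles(robots, rng, max_wait_s=5.0, deference_s=None):
    """Decide who speaks first when robots meet.

    Each robot draws a random wait time; the robot whose timer expires first
    becomes the storyteller and the others become listeners. deference_s maps
    a robot's name to extra seconds it adds to its own timer, for example
    because it has met a 'prestigious' storyteller.
    """
    deference_s = deference_s or {}
    timers = {name: rng.uniform(0.0, max_wait_s) + deference_s.get(name, 0.0)
              for name in robots}
    storyteller = min(timers, key=timers.get)
    listeners = [name for name in robots if name != storyteller]
    return storyteller, listeners, timers

rng = random.Random(3)
# Robot B defers to A by adding extra seconds to its own timer.
storyteller, listeners, timers = negotiate_roles(
    ["A", "B"], rng, deference_s={"B": 10.0})
```

The sketch ignores the unlikely case of two timers expiring at the same instant, which a real implementation would need to resolve, for example by restarting both timers.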
By integrating an autobiographical/episodic memory within the CE of the Storybots, we are in effect providing the robots with what Conway [44] calls a self-memory system. Arguably our Storybots will have sufficient cognitive machinery for the emergence of an artificial 'narrative self' that will become, in a short time, unique to each robot. The directions we have outlined here suggest the exciting possibility that we can, with our embodied simulation, experimentally explore the relationship between the developing 'narrative selves' of the individual robots and their evolving shared narrative, i.e. oral culture. What might such an experiment tell us about animal or human cultural evolution?
Figure 9. An illustration of line fitting, the third stage of learning by imitation. Note that positions P3 and P4, and P8 and P9 coincide, marking the points at which the teacher robot turned and thus the beginning and end of line L. Adapted from Erbas et al. [21].
Appendix A. Imitation in the Copybots
In this appendix, we outline how the Copybots learn movement-memes by imitation. This algorithm solves the so-called 'correspondence problem', a term which refers to the learner's problem of translating a set of perceptual inputs to motor actions that correspond with the perceived actions of the teacher [45].
To simplify the process, each movement-meme consists only of turns (in which the e-puck robot rotates on the spot) and straight line segments of a given length. Thus the 'triangle' meme seeded into e-puck A, in figure 2 (meme 1), is described by the list of three pairs (60, 15), (60, 15), (60, 15), where (60, 15) means 'rotate 60° then move 15 cm'.
The e-pucks are fitted with red 'skirts', so that when the learner robot observes the teacher robot with its camera the learner 'sees' the teacher's skirt as a red rectangle in its field of vision. While the teacher is enacting the movement-meme, the red rectangle will both move within the learner's field of vision and sometimes get larger or smaller, as the teacher moves either nearer to or further away from the learner. The learner robot uses vision processing to estimate the position of the teacher robot, relative to itself, as x,y coordinates. Such a list of coordinates is shown on the left of figure 8. This list is the input to the algorithm.
The first stage of the algorithm, 'detect turns', is to identify when the teacher robot is turning, by finding pairs of similar x,y coordinates. These are circled in figure 8. The turns mark the beginning and end of each straight line move. The second stage is to then use a line-fitting (regression) algorithm to estimate each straight line move. An illustration is shown in figure 9. The final stage is to put each turn and estimated line together as a reconstructed trajectory: a list of pairs of turn angles and distances moved. In this way, the imitation algorithm enables the learner robot to infer the teacher robot's sequence of moves. For a more detailed explanation, see Erbas et al. [21].
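The three stages can be sketched on idealized data as follows. This is an illustrative reconstruction under stated assumptions, not the algorithm of Erbas et al. [21]: the input is taken to be a list of x,y positions already extracted from vision, the teacher is assumed to start with a heading of zero in the learner's frame, the distance threshold `eps` is arbitrary, and orthogonal (total least squares) regression is used as one possible line-fitting method. The helper `enact` is a hypothetical teacher that generates a noise-free track from a meme, with a repeated position wherever it turns on the spot.

```python
import math

def enact(meme, samples_per_move=5):
    """Hypothetical teacher: sample positions while enacting (turn deg, dist cm) pairs."""
    x, y, heading = 0.0, 0.0, 0.0
    track = [(x, y)]
    for turn, dist in meme:
        heading += math.radians(turn)
        track.append((x, y))  # position repeats while rotating on the spot
        x0, y0 = x, y
        for k in range(1, samples_per_move + 1):
            f = k / samples_per_move
            x, y = x0 + f * dist * math.cos(heading), y0 + f * dist * math.sin(heading)
            track.append((x, y))
    return track

def fit_line(points):
    """Orthogonal regression: return (heading in radians, length) of one straight move."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)   # principal axis of the points
    # Project the end-to-end displacement onto the fitted direction.
    length = ((points[-1][0] - points[0][0]) * math.cos(theta)
              + (points[-1][1] - points[0][1]) * math.sin(theta))
    if length < 0:                                   # orient along direction of travel
        theta, length = theta + math.pi, -length
    return theta, length

def imitate(track, eps=0.5):
    """Infer a list of (turn deg, dist cm) pairs from observed positions."""
    # Stage 1: detect turns as consecutive near-identical positions.
    segments, current = [], [track[0]]
    for p, q in zip(track, track[1:]):
        if math.dist(p, q) < eps:
            if len(current) > 1:
                segments.append(current)
            current = [q]
        else:
            current.append(q)
    if len(current) > 1:
        segments.append(current)
    # Stages 2 and 3: fit a line to each segment, then rebuild turn/distance pairs.
    meme, heading = [], 0.0
    for segment in segments:
        theta, length = fit_line(segment)
        turn = (math.degrees(theta - heading) + 180.0) % 360.0 - 180.0
        meme.append((turn, length))
        heading = theta
    return meme

learned = imitate(enact([(60.0, 15.0)] * 3))
```

With noise-free input the learner recovers the seeded pairs; adding noise to the track before calling `imitate` is one way to explore how imitation errors arise in this simplified setting.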

Appendix B. Imitation-enhanced reinforcement learning
In this experiment, each robot has its own work area, as shown in figure 10, and must, using individual (reinforcement) learning, learn how to navigate from the top right-hand corner, to the bottom left-hand corner of its area. Learning this way is slow, taking several hours. But in this experiment the robots also have the ability to learn socially, by watching each other. Periodically one of the robots will stop its individual learning and drive itself out of its own area, to the small opening at the bottom left corner of the other robot's work area. There it will stop and simply watch the other robot while it is learning for a few minutes. Using the same movement imitation algorithm outlined in appendix A, the watching robot will (socially) learn a fragment of what the other robot is doing, then combine this knowledge into what it is individually learning. The robot then runs back to its own work area and resumes its individual learning. We call the combination of social and individual learning 'imitation-enhanced learning'. For a more detailed explanation, see Erbas et al. [25].