Memory for stimulus sequences: a divide between humans and other animals?

Humans stand out among animals for their unique capacities in domains such as language, culture and imitation, yet it has been difficult to identify cognitive elements that are specifically human. Most research has focused on how information is processed after it is acquired, e.g. in problem solving or ‘insight’ tasks, but we may also look for species differences in the initial acquisition and coding of information. Here, we show that non-human species have only a limited capacity to discriminate ordered sequences of stimuli. Collating data from 108 experiments on stimulus sequence discrimination (1540 data points from 14 bird and mammal species), we demonstrate pervasive and systematic errors, such as confusing a red–green sequence of lights with green–red and green–green sequences. These errors can persist after thousands of learning trials in tasks that humans learn to near perfection within tens of trials. To elucidate the causes of such poor performance, we formulate and test a mathematical model of non-human sequence discrimination, assuming that animals represent sequences as unstructured collections of memory traces. This representation carries only approximate information about stimulus duration, recency, order and frequency, yet our model predicts non-human performance with a 5.9% mean absolute error across 68 datasets. Because human-level cognition requires more accurate encoding of sequential information than afforded by memory traces, we conclude that improved coding of sequential information is a key cognitive element that may set humans apart from other animals.

Summary: Trained wild-caught (WC) and hand-raised (HR) black-capped chickadees (BC) and mountain chickadees (MC) to discriminate between two sets of 10 songs each. For some birds, rewarded songs were all BC, for others all MC, and for others still the rewarded and unrewarded songs were a mix of BC and MC songs (pseudo-category, PS).
Reference: M. R. D'Amato and D. P. Salmon. Tune discrimination in monkeys (Cebus apella) and in rats. Summary: Exp. 1 trained rats and capuchin monkeys to discriminate between two tunes with different average frequency. Exp. 2 trained a similar discrimination, but with the tunes close in overall frequency. Data from two rats that failed to learn in Exp. 2 are not included. Data from the "random notes" conditions are not analyzed here. This condition involved many different tunes of unreported structure, composed at random from a given set of tones.
Reference: Akihiro Izumi. Effect of temporal separation on tone-sequence discrimination in monkeys. Hearing Research, 175 (1)
Summary: Trained to reproduce sequences of 2 coloured lights by pecking the lights themselves in correct order, after the sequence had been presented. The different experiments manipulate the relative duration of the two lights, as well as ISI and RI intervals. Pigeons in Exp. n had been part of Exp. (n − 1). The author claims results falsify the memory trace model, but in fact the model fits well, see Figure 3 in the main text.
Summary: In Exp. ZF1, zebra finches were trained to respond to 5 ABA sequences, and not to respond to 5 AAB sequences composed of the same sounds. The birds were then tested with other sequences in ABA and AAB pattern. Exp. ZF2 was a replication with artificial sounds rather than zebra finch song syllables. Exp. B was a replication of Exp. ZF1 with budgerigars as subjects.
Summary: Trained to discriminate AXCX from BXCX, where A and B were red or green lights, X was blank and C was the presentation of two patterns (one had to be touched to get the reward). Trials given are an underestimate; the author reports 1000 daily trials for 4 months.
Summary: Trained to peck one color if food had been previously delivered, another color if food had not been delivered. Food/No-food stimulus was followed by a variable delay of 2-38 s. Jackdaws performed better than pigeons at intervals up to 20 s, with a similar amount of training.

2 Model fitting
The memory trace model described in the main text (equations 1 and 3) states that responding to a stimulus sequence $x$ is a linear function of the sequence's Euclidean distance from positive and negative sequences, i.e., from sequences the animal has been trained to respond to (positive) or not to respond to (negative). Fitting this model to data means finding the values of the memory parameters $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$ (see main text) that best account for the data.
Below, we first describe how we evaluate model fit for a given set of parameter values, and then how we search for those parameter values that maximize the fit. We conclude with some further considerations about the fitting procedure.

Calculating model fit for a given set of parameters
In equation 7, cor(·, ·) denotes Pearson's correlation. Because linear transformations leave correlations unaffected, it does not matter whether we use $D(x_i)$ or $R(x_i)$. Note that, on the right-hand side of equation 7, the dependence on the model parameters $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$ is implicit. It exists because the distances that enter the $D(x_i)$ values are based on memory traces, which depend on the values of $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$.
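The invariance invoked here can be checked numerically. The sketch below uses made-up values for the distances and the observed responses (they are not data from any reviewed study), and the helper name `pearson` is ours. It also illustrates a caveat: the correlation is unchanged only for a positive slope, while a negative slope flips its sign without changing its magnitude.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two equal-length 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical model distances D(x_i) and observed responses R_i (illustrative only).
D = np.array([0.2, 0.5, 0.9, 1.4, 2.0])
R_obs = np.array([0.10, 0.30, 0.35, 0.70, 0.80])

# A linear map R(x) = g + h * D(x) with arbitrary g and positive h.
g, h = 0.05, 0.4
fit_D = pearson(D, R_obs)            # fit computed on raw distances
fit_R = pearson(g + h * D, R_obs)    # fit computed on transformed values
fit_neg = pearson(g - h * D, R_obs)  # negative slope: same magnitude, opposite sign

print(fit_D, fit_R, fit_neg)
```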

Maximizing model fit
In the previous section we showed how to calculate how well a given parameter triplet $\rho = (r_{\mathrm{up}}, r_{\mathrm{down}}, r_{\mathrm{blank}})$ fits a set of experimental data. The last step of model fitting is to find the triplet that best fits the data. To do this, we maximize the correlation in equation 7 by systematic exploration of the parameter space. That is, we define lower and upper boundaries for $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$, and we evaluate model fit on all points of a three-dimensional lattice of equally spaced points lying within these boundaries. We then select the parameter triplet that yields the highest fit, which is the final result of model fitting. The search space generally ranged from 0.1 to 10 $\mathrm{s}^{-1}$ for all three parameters, with the constraint $r_{\mathrm{blank}} \le r_{\mathrm{down}}$.
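The lattice search can be sketched as follows. This is a minimal illustration rather than the code used for the analyses: the function names, the number of lattice points per axis, and the toy objective are assumptions made for the example. In practice, `fit` would compute the correlation of equation 7 from memory traces.

```python
import itertools
import numpy as np

def grid_search(fit, lo=0.1, hi=10.0, n=20):
    """Return the (r_up, r_down, r_blank) lattice point with the highest fit.

    `fit` is any callable mapping a parameter triplet to a score.
    The value of n is an assumption; the lattice resolution is not given here.
    """
    values = np.linspace(lo, hi, n)  # equally spaced points along each axis
    best, best_score = None, -np.inf
    for r_up, r_down, r_blank in itertools.product(values, repeat=3):
        if r_blank > r_down:  # enforce the constraint r_blank <= r_down
            continue
        score = fit(r_up, r_down, r_blank)
        if score > best_score:
            best, best_score = (r_up, r_down, r_blank), score
    return best, best_score

def toy_fit(r_up, r_down, r_blank):
    """Toy objective for illustration only; it peaks at (2.0, 1.0, 0.5)."""
    return -((r_up - 2.0) ** 2 + (r_down - 1.0) ** 2 + (r_blank - 0.5) ** 2)

best, score = grid_search(toy_fit, lo=0.5, hi=3.0, n=6)
print(best, score)
```

An exhaustive lattice is costly, since the number of evaluations grows with the cube of the points per axis, but it needs no gradients and cannot get stuck in a local maximum within the lattice.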
Once we have determined the best fitting values of $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$, we fit $g$ and $h$ in equation 1 using a linear model, to match numerically the observed response values. The values plotted in Figure 4 in the main text are the values of $R(x_i)$ with the best fitting $g$, $h$, $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, $r_{\mathrm{blank}}$. Note, however, that fitting $g$ and $h$ does not affect model fit, as remarked above. Rather, it serves merely to transform the $D(x_i)$ values so that they lie within the same range as the observed $R_i$ values, which is convenient for ease of comparison. A $D(x_i)$ value, in fact, has arbitrary units (representing distance in an abstract memory space), while responses $R_i$ have units such as rate of responding or fraction of trials in which a response was observed.
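As an illustration of this rescaling step, the sketch below fits $g$ and $h$ by ordinary least squares. The numbers are invented for the example, and the use of `np.polyfit` is our choice of tool, not necessarily the one used for the published fits.

```python
import numpy as np

# Hypothetical distances D(x_i) in arbitrary units and observed responses R_i
# expressed as a fraction of trials with a response (illustrative values only).
D = np.array([0.2, 0.5, 0.9, 1.4, 2.0])
R_obs = np.array([0.10, 0.30, 0.35, 0.70, 0.80])

# Least-squares line R = g + h * D; polyfit returns the slope first.
h, g = np.polyfit(D, R_obs, deg=1)
R_model = g + h * D  # model values, now on the scale of the responses

corr_before = np.corrcoef(D, R_obs)[0, 1]
corr_after = np.corrcoef(R_model, R_obs)[0, 1]
print(g, h, corr_before, corr_after)
```

The rescaled values share the mean of the observed responses, which makes them easy to plot against the data, while the correlation used as the measure of fit is the same before and after.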

Further considerations
To fit the model to data, we had to construct an empirical metric of the difficulty of discriminations. The best such metric would be based on learning curves for all discriminations involved in a study, but what is most often available is performance on each discrimination at the end of training. Fortunately, at any given time, learning speed and performance correlate, because the discrimination that is learned more quickly has, by definition, the higher performance (see Figure 1 in the main text). Therefore, we can use performance at a given time to gauge the relative difficulty of discriminations. Most often, we use data on performance at the end of training. In a few cases, however, training continues for long enough that the performance on many discriminations is almost equally good. In these cases, we use, when available, data from earlier stages of training.
The general procedure just described had, in some cases, to be adapted to accommodate missing data or to address specific features of a study. For example, the model predicts short inter-trial intervals (ITIs) to be detrimental to responding because a short ITI may not allow the memory of a trial to decay fully before the next trial starts, which effectively decreases the distance between memory traces. Lacking precise information about the succession of trials, we estimated the memory trace at the beginning of each trial by calculating the trace that would be left, after the ITI, by each sequence appearing in the experiment, and then using the average of these traces as the initial memory trace for each trial. This is adequate if trial order is randomized or pseudo-randomized, as is typical in the reviewed experiments.
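The averaging step can be sketched as below. The sketch assumes that, during the ITI, every component of a memory trace decays exponentially at rate $r_{\mathrm{blank}}$. This is a simplifying assumption made for the example (the trace dynamics are defined in the main text), and the function names, trace values and rate are hypothetical.

```python
import numpy as np

def decay(trace, rate, duration):
    """Exponential decay of every trace component (assumed dynamics)."""
    return np.asarray(trace, dtype=float) * np.exp(-rate * duration)

def initial_trace(end_of_trial_traces, r_blank, iti):
    """Average, over all sequences in the experiment, of the trace left after the ITI."""
    leftovers = [decay(m, r_blank, iti) for m in end_of_trial_traces]
    return np.mean(leftovers, axis=0)

# Hypothetical end-of-trial traces for four sequences (AB, BA, AA, BB);
# each pair holds (trace of A, trace of B).
traces = [(0.3, 0.9), (0.9, 0.3), (0.95, 0.0), (0.0, 0.95)]

start_short = initial_trace(traces, r_blank=0.2, iti=3.0)   # short ITI
start_long = initial_trace(traces, r_blank=0.2, iti=35.0)   # long ITI
print(start_short, start_long)
```

With the short ITI a sizeable residual trace carries over into the next trial, whereas with the long ITI it is close to zero, which is the sense in which short ITIs reduce the distance between memory traces.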
In Spierings2016, only responses to a subset of training sequences are reported. We assumed responses to all training sequences to be the same as the reported ones.
Another case in which we modified the fitting procedure slightly concerns data from MacDonald1993 (see section 1 of this Supplementary Information). In her experiments, pigeons had to produce a sequence of two pecks. Namely, pigeons were shown a sequence of two colors, and then asked to reproduce the sequence by pecking at two of three key lights, simultaneously lit with the colors that had just been shown as well as a third color. In this case, we computed distances between memory traces at the time of both the first and second peck (assumed to occur 5 s later based on information about stimulus duration). We assumed that the decision to peck a color first would depend on the distance between the memory trace of the sequence just witnessed and the memory traces of the rewarded and unrewarded sequences having the color in first position. For example, suppose the sequence AB is shown and, subsequently, the pigeon has to choose which to peck among the colors A, B, and C. In such a situation the pigeon would choose to peck A first if it remembered (correctly) having seen AB, but also if it remembered (incorrectly) having seen AC. Conversely, the pigeon would choose not to peck A if it remembered having seen any of the sequences without A in first position, namely BA, BC, CA, or CB. A similar argument holds for the probabilities of choosing other stimuli. The choice of which color to peck second is determined in the same way, but using the memory traces at the time of the second choice.
Overall, this reasoning still gives rise to equation 3 in the main text as a means to relate a sample sequence to rewarded and non-rewarded sequences, but the sets of rewarded and non-rewarded sequences are different for each sample sequence.
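The partition of candidate sequences used for the first peck can be enumerated explicitly. The sketch below assumes, following the example above, that candidate sequences are ordered pairs of distinct colors drawn from A, B, and C; the function name is ours.

```python
from itertools import permutations

COLORS = ("A", "B", "C")
# Candidate sequences: ordered pairs of distinct colors (assumption from the example above).
SEQUENCES = ["".join(p) for p in permutations(COLORS, 2)]

def first_peck_sets(color):
    """Split candidate sequences by whether `color` occupies the first position."""
    peck = [s for s in SEQUENCES if s[0] == color]
    no_peck = [s for s in SEQUENCES if s[0] != color]
    return peck, no_peck

peck_A, no_peck_A = first_peck_sets("A")
print(peck_A)     # sequences whose memory favors pecking A first
print(no_peck_A)  # sequences whose memory favors not pecking A first
```

Calling `first_peck_sets` with "B" or "C" gives the corresponding sets for the other colors. For the second peck, the same partition would be formed on the second position, using the memory traces at the time of the second choice.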
Lastly, it would be valuable to obtain confidence intervals for our estimates of $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$, which are in practice estimates of animal memory spans under given experimental conditions. However, we have not pursued this line of investigation for several reasons. First, our main goal was not to quantify memory spans accurately, but to show that a memory model with an imperfect representation of order accounts well for animal data. Second, memory spans are observed to differ across experimental conditions (cf. J. Lind, S. Ghirlanda, and M. Enquist. Animal memory: A review of delayed match-to-sample data from 25 species. Behavioral Processes, 117:52-58, 2015). Thus, while estimates of $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$ would be valid under specific experimental conditions, a useful quantification of memory span would require a systematic investigation of how these parameters vary across studies. It was not our aim to undertake such an investigation. Furthermore, distances between memory traces depend non-linearly on $r_{\mathrm{up}}$, $r_{\mathrm{down}}$, and $r_{\mathrm{blank}}$, so that estimating their confidence intervals is not trivial.

Human sequence discrimination experiment
To compare human and non-human memory for sequences, we replicated Experiment 1 in "RG Weisman, EA Wasserman, PW Dodd, and Mark B Larew. Representation and retention of two-event sequences in pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 6(4):312, 1980." In this experiment, pigeons were trained to peck a white square if they had just seen an AB sequence of lights, and to refrain from pecking after BA, AA, and BB sequences. The results are shown in Fig. 1d in the main text. In our experiments, human subjects received these instructions on screen:
The data collected in this experiment will not be linked to your name or other identifying information. If you would like to terminate the experiment at any time, you may do so without penalty.
PLEASE READ THE FOLLOWING INSTRUCTIONS CAREFULLY.

THE EXPERIMENT LASTS LESS THAN 15 MINUTES, but if you don't pay attention your results may mislead us to incorrect conclusions! PLEASE TRY TO PAY ATTENTION THROUGHOUT THE EXPERIMENT!
During the experiment you will see short sequences of colored squares. The last square of each sequence will always be white. When you see the white square, you have the option of doing nothing or pressing the spacebar. Pressing the spacebar to certain sequences will cause a smiley face to appear. This means that your response was correct. However, pressing the spacebar to other sequences will not cause a smiley face to appear. For these sequences, you should choose to do nothing.
You have to learn which sequences will cause a smiley face to appear when you press the spacebar. Initially, of course, you will not know when to press, so you may respond incorrectly. This is fine, but try to press the spacebar only if you think that doing so will make the smiley face appear.
You must decide quickly, because the white square will be shown only briefly. If you do not respond fast enough, the experiment will move on. Responding before the white square appears is a mistake and will be ignored.
If you are clear on the above instructions, you can press 'Y' now to begin the experiment.
The explicit instruction to respond only to the white square mirrors the pigeons' preliminary training to peck only at a white square (the length of this training is not included in our analysis). For half of our participants, the correct sequence was yellow-blue, for the other half it was blue-yellow. By nonhuman standards (Fig. 2b in the main text), our replication is more difficult than the original experiment because it features shorter stimuli (1 s rather than 5 s) and shorter inter-trial intervals (3 s rather than 35 s). The experiment was programmed using the ALEX software, freely available at www.github.com/drghirlanda/alex. The configuration files are available upon request. The experiment was approved by the CUNY IRB with code 412807.
Thirty-nine participants were recruited from the subject pool of the Brooklyn College Department of Psychology and participated for course credit. The latter was not contingent upon performance. Nine participants either never responded, or responded on every trial (presumably to finish the experiment earlier). We excluded these data from analysis. The results from the remaining subjects appear alongside the pigeon data in Fig. 1d in the main text.
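The exclusion criterion can be stated as a simple rule. The sketch below is illustrative only: the data layout (one list of per-trial responses per participant), the identifiers and the function name are assumptions, not the format of the actual data files.

```python
def keep_participant(responses):
    """True unless the participant never responded or responded on every trial."""
    n_responses = sum(responses)
    return 0 < n_responses < len(responses)

# Hypothetical data: participant id -> per-trial responses (True = spacebar pressed).
data = {
    "p1": [True, False, True, False],    # mixed responding: kept
    "p2": [False, False, False, False],  # never responded: excluded
    "p3": [True, True, True, True],      # responded on every trial: excluded
}

included = [pid for pid, responses in data.items() if keep_participant(responses)]
print(included)
```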