Philosophical Transactions of the Royal Society B: Biological Sciences
Open AccessResearch articles

Measuring rhythms of vocal interactions: a proof of principle in harbour seal pups

Marianna Anichini

Marianna Anichini

Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands

Department of Biological Sciences, Faculty of Natural Sciences, Norwegian University of Science and Technology N-6025 Ålesund, Norway

Hanse-Wissenschaftskolleg Institute for Advanced Study, 'Brain' Research Area, 27753 Delmenhorst, Germany

Division Animal Physiology and Behaviour, Department for Neuroscience, Carl von Ossietzky University Oldenburg, 26129 Oldenburg, Germany

[email protected]

Contribution: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

Google Scholar

Find this author on PubMed

Koen de Reus

Koen de Reus

Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands

Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium

Donders Institute for Brain, Cognition and Behaviour, Radboud University, 6500 GL Nijmegen, The Netherlands

Contribution: Conceptualization, Data curation, Investigation, Writing – original draft, Writing – review & editing

Google Scholar

Find this author on PubMed

Taylor A. Hersh

Taylor A. Hersh

Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands

Contribution: Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing

Google Scholar

Find this author on PubMed

Daria Valente

Daria Valente

Department of Life Sciences and Systems Biology, University of Turin, 10123 Turin, Italy

Contribution: Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing

Google Scholar

Find this author on PubMed

Anna Salazar-Casals

Anna Salazar-Casals

Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands

Research Department, Sealcentre Pieterburen, 9968 AG Pieterburen, The Netherlands

Contribution: Methodology, Resources, Validation, Writing – review & editing

Google Scholar

Find this author on PubMed

Caroline Berry

Caroline Berry

Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands

Contribution: Data curation, Methodology, Software

Google Scholar

Find this author on PubMed

Peter E. Keller

Peter E. Keller

MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, NSW 2751, Australia

Center for Music in the Brain, Department of Clinical Medicine, Aarhus University and The Royal Academy of Music Aarhus/Aalborg, 8000 Aarhus, Denmark

Contribution: Formal analysis, Investigation, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

Google Scholar

Find this author on PubMed

Andrea Ravignani

Andrea Ravignani

Comparative Bioacoustics Group, Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands

Center for Music in the Brain, Department of Clinical Medicine, Aarhus University and The Royal Academy of Music Aarhus/Aalborg, 8000 Aarhus, Denmark

[email protected]

Contribution: Conceptualization, Funding acquisition, Resources, Supervision, Validation, Writing – review & editing

Google Scholar

Find this author on PubMed


    Rhythmic patterns in interactive contexts characterize human behaviours such as conversational turn-taking. These timed patterns are also present in other animals, and often described as rhythm. Understanding fine-grained temporal adjustments in interaction requires complementary quantitative methodologies. Here, we showcase how vocal interactive rhythmicity in a non-human animal can be quantified using a multi-method approach. We record vocal interactions in harbour seal pups (Phoca vitulina) under controlled conditions. We analyse these data by combining analytical approaches, namely categorical rhythm analysis, circular statistics and time series analyses. We test whether pups' vocal rhythmicity varies across behavioural contexts depending on the absence or presence of a calling partner. Four research questions illustrate which analytical approaches are complementary versus orthogonal. For our data, circular statistics and categorical rhythms suggest that a calling partner affects a pup's call timing. Granger causality suggests that pups predictively adjust their call timing when interacting with a real partner. Lastly, the ADaptation and Anticipation Model estimates statistical parameters for a potential mechanism of temporal adaptation and anticipation. Our analytical complementary approach constitutes a proof of concept; it shows feasibility in applying typically unrelated techniques to seals to quantify vocal rhythmic interactivity across behavioural contexts.

    This article is part of a discussion meeting issue ‘Face2face: advancing the science of social interaction’.

    1. Introduction and methodological approaches

    (a) Rhythmic interaction and turn-taking

    Humans adopt precise signalling behaviours to exchange information [1,2]. No matter the signal modality (e.g. acoustic and visual), an interactive event between sender and receiver is governed by a timed structure [36]. The structured exchange of communicative turns (i.e. turn-taking) characterizes our capacity for social interaction, enabling us to communicate with others [7,8]. The study of interactive rhythms—how two (or more) individuals coordinate their signalling in time—is an emerging field of research, with more recent work extending structural analyses of communication signals, once restricted to human spoken conversation, to other species [3,9]. Turn-taking in communication has been documented in primates [10] and in other non-primate species [6,1113]. For example, call exchanges in adult common marmoset monkeys (Callithrix jacchus) exhibit coupled oscillator dynamics, like those observed in human turn-taking [14]. Two key components of turn-taking are the flexible organization and distribution of turns, and the temporal relationship between adjacent turns [11]. In humans, face-to-face interactions require participants to be, among others, both socially and temporally sensitive [8]. Does behavioural context also affect signal timing in non-human animals? Comparative investigations on interactive vocal timing in mammals may help uncover shared turn-taking mechanisms, potentially providing more insights into their evolution.

    (b) Methodological approaches and challenges for studying rhythmic interaction

    Expanding the human turn-taking framework to other species is currently hindered by, among other things, the lack of suitable methodological approaches [11,15]. Cross-species frameworks exist [16], but open questions still remain in animal face-to-face interaction, such as: which analytical methodologies used to investigate turn-taking in humans may reveal temporal adjustments in other species? Can turn-taking arise from non-cooperative behavioural interactions? Can methods developed for individual rhythm analyses be used to study rhythmic interaction? Can parametric models for human rhythmic prediction and reaction detect similar features in other species? These open questions require suitable animal models and quantitative methods.

    Different forms of vocal rhythmic interaction, such as synchronous chorusing and turn-taking, have been mainly studied in mammals within a cooperative dynamic, like parent–infant and male–female dyads. Time-series analyses like Granger causality have shown temporal interdependence between vocalizations in male–female pairs [17,18] and movements [19] of non-human primates. Circular statistics is another method to study timing adjustments in interactions and has been used in previous animal work, including a seal pup playback experiment [20] and a study on parent–infant monkey interaction [21]. Categorical rhythms—those for which the temporal intervals between signal onsets are distributed categorically rather than uniformly—are a universal characteristic of human music, which is often produced in an interactive context [22]. Similar rhythms are also present in non-human animal songs (e.g. thrush nightingales [22], indris [23]), but whether such rhythms characterize other interactive non-song vocalizations, such as animal calls, is unknown [22,23]. Roeske et al. [22] hypothesized that categorical rhythms play a role in calls produced to attract and hold conspecific attention, by making sequences of vocalizations more predictable to listeners. Categorical rhythm analyses could therefore be an interesting method to test the predictability of vocal sequences in non-human animal interactions. Lastly, the ADaptation and Anticipation Model (ADAM), originally developed to model the mechanisms for interpersonal coordination in humans [24], has been adopted to probe sensorimotor and cognitive mechanisms underlying temporal dynamics in interaction [25,26]. Although ADAM is designed for ‘simultaneous chorusing’, it could also be used for a mixture of bouts of synchrony, turn-taking and other regimes [20,27].

    In this proof of concept study, we showcase how these methodological tools—Granger causality, circular statistics, categorical rhythm analysis and ADAM—can be applied to a new animal model: the harbour seal (Phoca vitulina). To illustrate the use and compatibility of these different analytical methods, we show how they can be used to better understand the rhythmic communication of a small sample of harbour seal pups in different behavioural contexts.

    (c) Our animal model

    The ‘vocal learning-beat perception and synchronization’ (VL-BPS) hypothesis states that only vocal learning species—those capable of producing new vocalizations or modifying existing ones based on auditory experience—may possess advanced rhythmic abilities [28,29]. This hypothesis is inherently cross-modal: it suggests a strong link between audition and timed movement. For example, Snowball, a sulfur-crested cockatoo (Cacatua galerita eleonora), was shown to perceive auditory rhythms at different tempi and to predictively synchronize his body movements to them [30]. Parrots are phylogenetically distant from humans and, among mammals, pinnipeds (seals, sea lions and walruses) are one of the vocal learning groups (besides humans, bats, elephants and cetaceans). Pinnipeds may well be the best mammalian model for testing the VL-BPS hypothesis—the ability to extract a beat from periodic acoustic stimuli and entrain to it in a predictive and adaptive manner—since some species show vocal mimicry and plasticity [31,32] and others can keep a beat [33]. These characteristics, parallelling human abilities, make pinnipeds an ideal animal clade for comparative research on the origins of rhythmic communicative behaviour.

    Harbour seals exhibit both vocal flexibility [32,34] and rhythmic interactivity [20], and are particularly vocal in the first few weeks of life [35]. During the lactation period, harbour seal pups emit ‘mother attraction calls' (hereafter ‘calls’) to draw their mothers' attention [36]. Mothers are silent and use the individual vocal signatures in these calls to recognize their pups [35,37]. Against the acoustically complex backdrop of large mother–pup rookeries, rhythmically tuned pup calls could constitute a socio-ecologically selected trait that allows individual pups to avoid conspecific call overlap by adjusting the timing of their own call onsets. Such timing plasticity could allow a pup to be more acoustically conspicuous and increase its chances of successful reunions with its mother. Unlike cooperative types of turn-taking (e.g. in humans and in common marmosets [38]) harbour seal pups’ interactions are a by-product of neighbouring pups vocalizing to attract their silent mothers and are thus probably competitive.

    To date, only two papers studied vocal rhythms in harbour seals, crucially both focusing on single individuals [20,27]. The first study was a playback experiment in which a pup vocally interacted with sounds broadcasted from a loudspeaker [20]. The pup adjusted the timing of its calls in an asynchronous manner by responding to the broadcasted conspecific calls with a non-uniformly distributed response phase whose mean approximated 90° [20]. The second study looked at the presence and development of vocal rhythms in three harbour seal pups [27]. Complementary analytical approaches showed how the pups' individual calling patterns gained more rhythmic structure over time [27]. However, a major limitation of both studies was the lack of sociality (i.e. individuals were tested alone) and, by extension, interactivity (i.e. the stimuli did not adapt to the response of the tested animals).

    (d) Aims and research questions

    In this work, we show how vocal interactive rhythmicity in non-human animals can be quantified using a multi-method approach spanning various research domains (e.g. temporal, social and cognitive) (table 1). We illustrate this approach through four research questions, all of which relate back to whether harbour seal vocal interactive rhythmicity varies in different behavioural contexts (table 1). While our sample sizes are too small to enable species-wide inferences, they are sufficient to illustrate how methods typically used to study human communication can be adopted to study interactivity in animal communication. The goal of this paper is thus to outline a quantitative roadmap that future research can follow. Circular statistics and categorical rhythm analysis are used to address the first question about temporal adjustment in interaction: ‘does the presence of a calling partner affect the call timing of individual pups?’ (Q1). The next two questions consider the effect of behavioural context on temporal adjustment: ‘does the type of calling partner (real or broadcasted) affect the call timing of individual pups?’ (Q2) and ‘when the focal pup is vocalizing, does the presence of a silent partner (versus no partner) affect call timing?’ (Q3). We answer these questions using circular statistics (Q2 and Q3) and Granger causality tests (Q2). Lastly, ADAM is used to investigate the fourth question about the cognitive processes involved in temporal adjustment: ‘which timing mechanisms are used by pups during vocal interactions?’ (Q4).

    Table 1. Summary table showing in order: research questions, analyses, contributing pups, predictions, whether the data supports each prediction, statistical test(s) used and result(s) obtained. (The column ‘supported by data?’ has three possible answers: results support the prediction (Y), results only partially support the prediction (partial) and results do not support the prediction (N). Owing to sample size and/or analytical requirements, not all pup data could be used in each analysis. The acoustic variables of interest for the different analytical approaches were response phases (circular statistics), inter-onset intervals (IOI) ratios (categorical rhythms), IOIs (Granger causality, ADAM) and asynchronies (ADAM). SIR, small integer ratio; KS, Kolmogorov–Smirnov.)

    Research question Analytical approaches Pups Prediction Supported by data? Statistical test(s) and result(s)
    temporal domain, Q1: does the presence of a calling partner affect the call timing of individual pups? circular statistics A–I pups will not vocalize at random points in time Y Rayleigh test: unimodal distribution of response phases
    pups will call in asynchrony to avoid overlap during vocal interactions Y V-test: pup calls start at one-quarter of the partner's period
    response phases will be affected by the presence of a vocalizing partner Y Watson's U2 test: response phase distributions differ between non-interactive and interactive contexts
    categorical rhythms A, B, C, E, H, I empirical and chance ratio distributions will only significantly differ when pups are vocally interacting partial one-sample KS tests: simulated and empirical ratio distributions are rarely significantly different (exceptions: pup I alone, pup I one-way, pup B two-way)
    A, B, I vocally interacting pups will have a significant peak at the 4 : 1 SIR N paired Wilcoxon signed-rank test: no significant peaks at any of the tested SIRs
    ratio distributions of individual pups will differ across behavioural contexts Y two-sample KS tests: ratio distributions significantly differed for the same individuals in different behavioural contexts
    social domain, Q2: does the type of calling partner (real or broadcasted) affect the call timing of individual pups? circular statistics A–I pups interacting with a real partner will show more adaptive call timing than the pup interacting with a broadcasted partner N Watson's U2 test: no difference in response phases between one-way and two-way interactions
    Granger causality A, B, C, D, E, F, I the time series of a pup will be better predicted considering the time series of a vocalizing partner rather than the time series of a broadcasted signal Y bidirectional and unidirectional causality: interaction with a real partner impacted the pup's vocal behaviour more than the playback. Mutual temporal adaptation among pairs of vocally interacting pups
    social domain, Q3: when the focal pup is vocalizing, does the presence of a silent partner (versus no partner) affect call timing? circular statistics A–I the calling pattern of pups will show similar rhythmic structure in both the alone and silent partner conditions, as no vocal interaction is taking place in both cases Y Watson's U2 test: no difference between response phase distributions of pup calling alone and pups calling with a silent partner
    cognitive domain, Q4: which timing mechanisms are used by pups during vocal interactions? ADAM A, B, I in the one-way interaction, pups may show sensitivity to (non-) interactivity which would be reflected by parameter changes over repeated sessions partial one-way interaction: temporal anticipation, and to a lesser extent also adaptation, decreased across the playback sessions, and was absent in the final session
    the two-way interaction may be mediated by basic temporal adaptation and possibly higher level anticipatory timing two-way interaction: clear evidence for temporal adaptation, with differing parameter estimates for each seal pup suggesting the emergence of different interactive roles

    (e) Subjects, housing conditions and behavioural contexts

    We recorded nine wild-born pups (A–I) calling in different behavioural contexts while housed at Sealcentre Pieterburen (the Netherlands) (electronic supplementary material, Methods S1 and S2). During the recordings, each pup was housed in an enclosure with a swimming pool and a resting platform (electronic supplementary material, figure S1). One pup was housed alone (I) while the others were housed in pairs (A/B, C/D, E/F, G/H). Note that the enclosures were physically but not acoustically isolated from each other, meaning that pups could hear other pups in neighbouring enclosures.

    We analysed focal pup vocalizations during four different behavioural contexts (figure 1; electronic supplementary material, table S1): (i) when the focal pup was alone (pup I); (ii) when the focal pup heard a playback of conspecific calls (pup I); (iii) when the focal pup's partner was silent (pups A–H); and (iv) when the focal pup's partner was also vocalizing (pups A–H). Hereafter, we refer to these conditions as: (i) alone, (ii) one-way interaction with a broadcasted partner, (iii) silent partner, and (iv) two-way interaction with a real partner. Notice that only some pups entered each condition and vice versa (electronic supplementary material, table S1).

    Figure 1.

    Figure 1. Schematic of experimental conditions and key measurements. Each of the four rows represents one bout and indicates a different behavioural context: (a) alone, (b) silent partner, (c) one-way interaction, and (d) two-way interaction. Boxes are coloured to represent the call source and grey rectangles denote vocal interactions. For bouts (c) and (d), calls are isochronously spaced for ease of visualization, but empirical patterns were not necessarily isochronous. Horizontal bars indicate how inter-onset intervals (IOIs) were calculated for different analyses. The call onsets (black arrows) and offsets (black lines) are shown for the first two calls in (a). ICI, inter-call interval.

    Our sample size was affected by the unpredictable arrival of animals at the Sealcentre, which varies seasonally. Only medically stable and healthy pups were included in this study. The number of daily recording sessions per pup pair varied based on veterinary staff recommendations at the rehabilitation centre (electronic supplementary material, table S1); we did not record pups with signs of disease. Data from pup I (i.e. the alone and one-way contexts both with and without playback stimuli) have been re-analysed from previous studies ([20,27], respectively) and combined with unpublished data from pups A–H (the silent partner and two-way contexts) (electronic supplementary material, table S1). The contribution of each pup to each analysis is shown in the electronic supplementary material, table S2.

    (f) Extraction of temporal variables, definition of call bout and vocal interactions

    We first extracted the onsets and offsets of each pup call recorded in each behavioural context (figure 1; electronic supplementary material, Method S3). From these values, we calculated rhythmic metrics such as call duration, inter-onset intervals (IOIs), ratios of adjacent IOIs and inter-call intervals (ICIs; i.e. silent gaps). Each IOI was obtained by subtracting the onset of call n from the onset of call n + 1, while the ICI was calculated by subtracting the offset of call n from the onset of call n + 1 (i.e. IOI minus duration of call n). Calls were organized into bouts, defined here as a series of at least three subsequent calls that were separated from adjacent bouts by a period greater than 1.5 times the median ICI of the recorded individual(s) calls (figure 1). The response phase was computed as the ratio of the ‘response IOI’ (i.e. time interval between the call onset of the partner and the call onset of the pup's response) and the previous IOI, multiplied by 360, resulting in a unit vector with an angle on a circle. A vocal interaction was defined as a group of three calls within the same bout, which includes two calls from the broadcasted/real partner followed by the response of the focal individual (figure 1). Following previous methodology [27,37], we calculated the IOI ratio, rk, for each pair of adjacent IOIs, tk and tk+1, in a bout as:


    (g) Descriptions of analytical approaches

    Circular statistics, wherein periodic measures are converted to angles on a circle and compared to distributions of interest [39], were used to investigate rhythmic periodicities in pup call response phases (electronic supplementary material, Method S4). Following [20], we considered the values of the response phases as circular data falling between 0° and 360°. We obtained the circular mean (μ) (i.e. the average direction of the response phases calculated from the pup calls; electronic supplementary material, table S3). Then, we ran Rayleigh z-tests to investigate whether the distribution of response phases was uniform (e.g. arousal hypothesis) or showed a unimodal peak (electronic supplementary material, table S4) [20]. Subsequently, we tested for uniformity against a specified mean direction for the unimodal peak using a V-test [3942].

    Our data met the assumptions for circular statistics. We tested if the response phases in all four behavioural contexts followed a von Mises distribution using one-sample Watson tests (electronic supplementary material, table S5). With deviations from uniformity (null hypothesis von Mises distribution rejected), we used Kuiper's test, Watson's test and Rao's spacing test, to confirm the p-value obtained from the Rayleigh test (electronic supplementary material, table S6) as suggested by Landler et al. [39]. More details are shown in the electronic supplementary material, Method S4.

    We then tested whether response phase distributions varied depending on the presence of the calling partner. We expected that vocally interacting pups would adjust their responses to broadcasted (one-way interaction) or real (two-way interaction) conspecific calls to avoid overlap and, hence, their response phases would show a unimodal distribution. Following previous work [20] and applying the V-test, we tested the null hypothesis of call response phase uniformity against two alternative unimodal departures: 0° (i.e. synchrony) and 90° (i.e. asynchrony). Using Watson's two-sample U2 test [43], we also compared the call phase distributions of: (i) a pup calling alone versus when responding to a broadcasted partner (pup I), and (ii) a pup calling in the presence of a silent partner versus when their partner was also calling (pups A–H). For interacting (one-way or two-way) pups, we applied Watson's two-sample U2 test to assess whether the type of partner (i.e. real or broadcasted) differentially affected the pups' response timing. We predicted that pups interacting with a real partner would show more adaptive call timing, thanks to potential communicative cues from other modalities. Lastly, we compared the distributions of call phases of the single pup vocalizing alone to those of the paired pups when their partner was silent to test whether the simple presence of a silent partner affects individual call timing. In both behavioural contexts, we predicted that calling patterns for pups without a responsive partner would show a different rhythmic structure to those observed in interaction.

    Categorical rhythm analysis tests whether the temporal intervals between signal onsets, as inferred from IOI ratios, are distributed categorically rather than uniformly. We predicted that empirical and simulated null ratio distributions (i.e. the expected distribution if no rhythmic categories exist) will not differ when a pup is alone or with a silent partner but will differ when a pup is vocally interacting. For vocally interacting pups, we predicted a significant peak in empirical ratio distributions at the 4 : 1 small integer ratio (SIR) based on the lone seal in [20], which called at approximately one-quarter of the playback's period. All categorical rhythm analyses were done within bouts following previous methodology (electronic supplementary material, Method S5) [22,23], with IOIs calculated in various ways depending on the behavioural context (electronic supplementary material, table S8; figure 1). We used one-sample Kolmogorov–Smirnov (KS) tests to determine whether empirical IOI ratio distributions significantly differed from simulated null IOI ratio distributions. Our data met the one-sample KS test assumptions, namely that the sample is random and the theoretical distribution is continuous and fully defined. When the empirical and simulated distributions were significantly different, we also looked for evidence of SIR categorical rhythms—specifically at the 1 : 4, 1 : 3, 1 : 2, 1 : 1, 2 : 1, 3 : 1, and 4 : 1 ratios—which have been found in other species' vocalizations [22,23]. In these analyses, the empirical ratio distributions were divided into ‘on-integer’ and ‘off-integer’ ratio bins (electronic supplementary material, table S7). On- and off-integer bin counts for each SIR were normalized by bin size and compared using a paired Wilcoxon signed-rank test (a non-parametric test that allows for non-normality in the population data and assumes paired differences are continuous, symmetrically distributed and mutually independent). When sample sizes allowed, we used two-sample KS tests (having met the assumption of mutual independence of measurements within samples) to determine whether the ratio distributions of individual pups differed across behavioural contexts.

    The Granger causality test investigates whether the values of a time series A are better predicted when considering the values from a second time series B, as opposed to only using values from time series A [44]. Here, we assessed whether the call timing of a pup partaking in a one-way or two-way vocal interaction can be predicted using the call timing of its partner. More specifically, to investigate whether the call timing of a pup differed in relation to the type of partner, we tested whether there is a difference in predicting the time series of the pup interacting with a broadcasted partner versus time series of the pups interacting with a real partner. Previous work showed that individuals respond to conspecific calls with a non-random pattern [1719,45]. We therefore expected that the time series of a pup can be better predicted considering the time series of a vocalizing partner rather than those of a broadcasted signal. In both the one-way and two-way interactive scenarios, we considered Granger causality at two levels: (i) the entire recording, regardless of the length of the pauses between consecutive calls, and (ii) different bouts within each recorded session. We restricted the analysis on the different bouts to call sequences that were long enough to generate accurate estimates (i.e. a minimum of five paired calls [46]). The bouts included in this analysis range from 5 to 20 calls. We conducted the Granger causality test using call onsets and different lag measures, from one to five (electronic supplementary material, Method S6), testing whether the previous one to five onsets in the first time series can be used to better predict the second time series (electronic supplementary material, figures S5 and S6, and table S11). For the one-way interaction, we performed a one-way analysis, considering whether the pup's timing could be predicted using the playback timing. For the two-way interaction, we performed a two-way analysis to assess whether the two interacting pups influenced the timing of each other's calls.

    We used ADAM to test for evidence of reactive error correction and predictive processes in the one-way and two-way interactive scenarios. ADAM consists of three computational modules that interact via internal models of ‘self’ and ‘other’ that support one's own action planning and external event prediction, respectively (electronic supplementary material, figure S2). The adaptation module compensates for synchronization errors by implementing error correction processes that alter the phase and/or period of an internal timekeeper controlling for action (here, call) timing. These error correction processes determine the provisional timing of the next planned action by providing input to an internal model of the ‘self’. The anticipation module computes the expected timing of upcoming events based on the weighted sum of two processes: the linear extrapolation of previous IOIs in the sequence and the copying (or ‘tracking’) of the previous IOI, with the output informing temporal predictions generated by the ‘other’ internal model. Finally, a joint module integrates and compares the output of the adaptation and anticipation modules and compensates for discrepancies by implementing an anticipatory error correction process before the next motor command is issued. The joint module thus reduces potential temporal mismatches between action plans in ‘self’ internal models and temporal predictions in ‘other’ internal models, thereby regulating the balance between the integration (merging) and segregation (distinction) of information about ‘self’ and ‘other’ [47,48]. Each process instantiated in ADAM is controlled by an independent parameter, and the value of these parameters can be estimated for a particular individual by fitting the model to behavioural time-series data [25,26,49,50]. Parameter estimates were obtained for both the adaptation-only version of ADAM—which includes phase correction and period correction—and the full (joint) version—including period correction, temporal prediction-tracking and anticipatory error correction. Both versions of ADAM were applied to each interactive context because it is not possible to know a priori whether the pups' call sequences (real or broadcasted) have a steady base tempo (for which adaptation is sufficient) or a systematically changing tempo (which benefits from both anticipation and adaptation) (electronic supplementary material, Method S7).

    3. Results

    (a) Does the presence of a calling partner affect the call timing of individual pups? (Q1)

    All pups’ data entered the circular statistics analysis. Running the Rayleigh test, we found that the response phase distribution was uniform for pup I which was recorded alone (z = 0.04, p = 0.254; figure 2a), whereas it was non-uniform for pups A–H which were recorded with a silent partner (z = 0.11, p < 0.001; figure 2a; table 1). This non-uniformity may have been driven by the individual contributions of pups B and C, which had non-uniformly distributed response phases (electronic supplementary material, table S4), whereas the other six pups had a uniform distribution. The Rayleigh tests run anew in the interactive contexts, showed that the response phase distributions of pup calls were unimodal in both the one-way (pup I: z = 0.39, p < 0.001) and two-way (pups A–H: z = 0.41, p < 0.001) interactions (figure 2b; table 1). Applying the V-test in both contexts, the direction of the response phases did not statistically match 0° (one-way: z = −0.02, p = 0.587; two-way: z = 0.06, p = 0.110), suggesting that pups did not synchronize with their partner (real or broadcasted). However, the response phase direction did match 90° (one-way: z = 0.38, two-way: z = 0.41, p < 0.001), supporting the previously reported evidence of asynchronous calling behaviour [20]. A Watson's two-sample U2 test confirmed that the response phase distributions significantly differed between the alone versus one-way interaction context for pup I (U2 = 1.76, p < 0.001), and between the silent partner versus two-way interaction context for pups A–H (U2 = 0.78, p < 0.001; electronic supplementary material, figure S3; table 1). Finally, the circular standard deviation values were higher for the alone and silent partner contexts compared to both interactive vocal contexts, indicating a larger dispersion of the response phases for the former conditions. This outcome is also confirmed by the values for the mean resultant length.

    Figure 2.

    Figure 2. Circular histogram plots (bin width = 20°) showing response phases in (a) both types of behavioural contexts with no vocal interaction (alone/silent partner), and (b) in both types of vocal interaction contexts (one-way/two-way). Angles are measured in degrees starting from 0° and going clockwise to 360°. The arrows indicate the circular mean (µ) and colours correspond to the different behavioural contexts. The length of the arrow corresponds to the value of the mean resultant length (ρ).

    In the categorical rhythm analyses, the empirical ratio distribution was significantly different from chance when pup I was recorded alone (figure 3a) and during the one-way interaction (figure 3c). In both contexts, there was no evidence of significant peaks at any of the tested ratios (table 1). When the playback calls were disregarded from IOI calculations (figure 3b), there was no significant difference in empirical and simulated ratio distributions for pup I. However, pairwise KS tests showed that the ratio distributions significantly differed when comparing each of the three behavioural contexts (alone versus one-way interaction disregarding playback versus one-way interaction when pup I responds) to each other (electronic supplementary material, table S10; table 1). For the five pups that were well-sampled in the silent partner context (pups A, B, C, E and H; electronic supplementary material, table S8), the empirical ratio distributions did not significantly differ from chance (figure 3; electronic supplementary material, table S9 and figure S4, Method S5). Finally, when considering two-way interactions, only pups A and B were well-sampled enough (i.e. had at least 10 ratios for both the silent partner and two-way interaction contexts) to compare, but the empirical and simulated ratio distributions were significantly different only when pup B was the responder (figure 3; electronic supplementary material, table S8 and table S9). Once again, there were no significant peaks at any of the tested ratios for pup B. For both pups A and B, the empirical ratio distributions significantly differed when comparing different behavioural contexts (electronic supplementary material, table S10). Collectively, there was thus little evidence of SIR rhythmic categories in pup calls, but IOI ratios did significantly differ when looking at the same individuals in different behavioural contexts (table 1).

    Figure 3.

    Figure 3. IOI ratio density plots for pups in different behavioural contexts. Pup I: (a) alone, (b) one-way interaction (disregarding playback), (c) one-way interaction (responding to playback). Pup A: (d) silent partner, (e) two-way interaction (responding to partner). Pup B: (f) silent partner and (g) two-way interaction (responding to partner). For each plot, the dashed vertical lines indicate, from left to right, 1 : 4, 1 : 3, 1 : 2, 1 : 1 (i.e. isochrony), 2 : 1, 3 : 1 and 4 : 1 SIRs. On-integer ratio ranges are in dark green and off-integer ratio ranges are in light green, with white lines and black dashed lines denoting bin boundaries. The orange curves indicate the ratio distribution expected under a uniform distribution if no rhythm categories exist. The empirical ratio distribution significantly differed from the simulated ratio distribution for (a), (c) and (g) only. Note that the scale of the y-axes differ.

    Thus, the results from both analyses suggest that the presence of a calling partner does affect the call timing of the focal pup, in terms of both call response phases and IOI ratios.

    (b) Does the type of calling partner (real or broadcasted) affect the call timing of individual pups? (Q2)

    To address this research question, we once again used circular statistics and the response phase distributions of all nine pups. Specifically, we compared calls from pup I during the playback (one-way interaction) with calls of pups A–H when their partner was also calling (two-way interaction). A Watson's two-sample U2 test statistically confirmed that the response phase distributions did not differ between the one-way and two-way vocal interactions (U2 = 0.07, p > 0.10; table 1).

    Interestingly, however, the Granger causality results from seven pups (A–F, I) showed that call timing behaviour differed depending on the type of partner (table 1). For the one-way interaction, five different playback sessions featuring pup I were considered (ranging from 34 to 121 paired calls) and the timing of the pup's calls were never significantly predicted by the timing of the playback (electronic supplementary material, table S11 and figure S5). For the two-way interaction, five different recording sessions were considered (ranging from 8 to 71 paired onsets). We found that in two sessions, the timing of the first pup did not significantly predict the timing of the second, in either direction (pair A > B and B > A; pair E > F and F > E; electronic supplementary material, table S11 and figure S6B and S6D). However, in two other sessions, the timing of a pup was significantly predicted by the calling partner in both directions, across different lag values (A > B: lag-1, lag-3, lag-4, lag-5; B > A: lag-3, lag-4, lag-5; C > D and D > C: lag-2 and lag-3; electronic supplementary material, table S11 and figure S6B and S6C). In the last session, the timing of pup A could be predicted by that of pup B, while we found no indication of temporal adjustment for pup B. At the bout level, we restricted our analyses to four series of paired onsets featuring pups A and B (as they were the only pup pair that met the sample size requirements). In this last scenario, only the timing of pup B was significantly influenced by pup A (electronic supplementary material, figure S6A).

    While the circular statistics results thus suggest that the type of calling partner (real versus broadcasted) does not affect focal pup call timing, the Granger causality results suggest that in certain two-way (but not one-way) interactions, focal pup call timing can be predicted by the partner's call timing.

    (c) When the focal pup is vocalizing, does the presence of a silent partner (versus no partner) affect call timing? (Q3)

    Using circular statistics, we compared the calls of pup I recorded alone with those of the eight other pups (A–H) recorded with a silent partner present. Watson's two-sample U2 test results show that the response phase distributions did not statistically differ between the two contexts (U2 = 0.10, p > 0.05). In other words, having a silent partner was essentially the same as having no partner—in terms of the effect on focal pup response phase distributions—for the pups in our study. This is intuitive, given that in both behavioural contexts, there is no acoustic stimuli to ‘respond’ to.

    (d) Which timing mechanisms are used by pups during vocal interactions? (Q4)

    ADAM parameter estimation was conducted on call data from all five playback sessions featuring pup I (one-way, figure 4a,b), but was restricted to pups A and B for the two-way interactions owing to sample size limitations (figure 4c,d). Interactive vocal bouts were concatenated to obtain a time-series length which would provide reliable ADAM parameter estimates. A simulation test then ensured that the estimates were not compromised by differing sequence lengths or by the concatenation procedure (electronic supplementary material, Method S7). The reliability of observed parameter estimates was tested by comparison against corresponding values for randomly permuted data (electronic supplementary material, Method S7). The quality of the fits to the data did not differ significantly between versions of ADAM (electronic supplementary material, Method S7). Results for the one-way interaction featuring pup I were remarkable with regard to typical human data (e.g. [25,26]) as most significant parameters were negative in sign (figure 4a,b; electronic supplementary material, tables S13 and S14). Negative phase and period correction estimates indicate that calling earlier will lead to a shortening of the next IOI, while calling later will lead to a lengthening of the next IOI. Negative prediction-tracking estimates mean that when the playback's IOIs increase (i.e. deceleration), the pup's IOIs will decrease (i.e. acceleration), and vice versa. This systematically enhances the timing distinction between calls, possibly testing the responsiveness of the partner (i.e. playback) by introducing timing asynchronies and gauging their effects. It is worth noting that evidence of such behaviour generally decreased across the five playback sessions, with no significant parameter estimates emerging in the final session.

    Figure 4.

    Figure 4. ADAM parameter estimates for seal pups A, B and I. (a,b) Parameter estimates for pup I in five playback sessions (one-way interaction); (c,d) estimates for pups A and B (two-way interaction). (a) Phase correction and period estimates obtained with the adaptation-only version of ADAM; (b,d) period correction and prediction-tracking estimates for the full ‘joint’ model (anticipatory error correction estimates are not shown). Parameter estimates are shown separately for pup A relative to pup B (aquamarine bars) and pup B relative to pup A (blue bars) in the recording session, and for pup I (green bars) relative to playback calls in separate playback sessions (PB1–PB5). Corresponding parameter estimates for randomly permuted data from each session are represented by box plots, with the central horizontal line indicating the median, the bottom and top edges of the box indicating the 25th and 75th percentiles, and the whiskers indicating the 5th and 95th percentiles. Real estimates with asterisks are significantly different from permuted data estimates at the two-tailed (orange) or one-tailed (grey) level.

    For the two-way interaction (figure 4c,d; electronic supplementary material, tables S13 and S14), the parameter estimation procedure was run twice, each time with one of the two pups serving as the focal individual and the other as the external reference. With this procedure, similar parameter estimates for each pup would indicate a symmetrical pattern of influence, whereas different estimates would indicate asymmetrical influence. Results for this type of vocal interaction showed evidence for temporal adaptation. In particular, parameter estimates were consistent with pup A engaging in period correction while pup B engaged in both phase and period correction. Period correction estimates were positive in sign for pup A and negative for pup B. Pup A's positive period correction estimates suggest that calling earlier will lead to a lengthening of the next IOI, whereas calling later will lead to a shortening of the next interval. Pup B exhibited the opposite effect but to a lesser degree. Pup A thus implemented a timing mechanism that increased the similarity in their calling rates, while pup B implemented a timing mechanism that increased the distinction in calling rates. It should be noted, however, that the observed parameter estimates do not necessarily reflect individual call styles per se, but rather the roles that dynamically emerge within the context of this specific pairing of pups.

    4. Discussion

    This study sought to provide a methodological proof of concept for quantifying vocal rhythmicity in non-human animal interactive communication. Particularly, we showed how complementary methodologies can be used to test whether the (Q1) presence and (Q2) type of a vocalizing partner, or the (Q3) presence of a silent partner affect patterns in animal communication (table 1). We also investigated which (Q4) underlying cognitive mechanisms potentially play a role in vocal interactions (table 1). The different analytical techniques proved fruitful; combining and contrasting their results could highlight nuances in rhythmic behaviour. Nonetheless, our sample size is undeniably small, and our opportunistic design (e.g. mixing within- and between-individual comparisons) cannot provide strong inference. We believe that our experimental set-up, combined with the approaches we present, can be adapted and expanded in future work to better understand the temporal, social and cognitive processes underlying interactive communication in animals.

    (a ) Q1. The presence of a vocalizing partner influences rhythm in vocal interactions

    The prediction of overlap avoidance was supported by circular statistics, which showed that the distributions of response phase angles were unimodal. When vocally interacting, pups responded at approximately one-quarter of the playback/real partner calling period (90°) and showed phase angles significantly different from 0°, supporting previous results from one individual (pup I) [20]. Pups responding to conspecifics seem to time their calls to avoid overlap, consistently de-synchronizing their call onsets. When pups were recorded alone or with a silent partner, the distribution of phase angles was uniform, with no significant rhythmic pattern. Our findings confirm and extend previous outcomes [20]; in our limited sample, pups not only react to playbacks with asynchrony, but also respond to conspecific partner calls in an asynchronous manner.

    Regarding categorical rhythms, the empirical ratio distribution did not significantly differ from the distribution expected by chance if IOIs were uniformly distributed for most pup/behavioural context combinations (7 out of 10). For the remaining three combinations (pup I alone; pup I, one-way interaction; pup B, two-way interaction), there was no significant evidence that call timing exhibited categorical rhythms at the seven tested SIRs. This includes the 4 : 1 ratio, which we hypothesized would frequently occur when pups were vocally interacting with playback stimuli or partners based on previous work [20]. Importantly, however, the empirical ratio distributions significantly differed across behavioural context conditions for the three pups (A, B, I) for whom such comparisons were possible. When alone or with a silent partner, pup calls generally showed unimodal ratio distributions centred around isochrony, whereas calls emitted by vocally interacting pups showed a clear right-shifted peak, or a second peak to the right of isochrony, indicating that the pup's IOIs during vocal interactions were generally shorter than the playback's/partner's IOIs. This context-dependent asymmetry bolsters results related to Q4, namely that the presence of vocalizing partner can significantly impact pup vocal behaviour, with interacting individuals trying to prevent call overlap. Roeske et al. [22] hypothesized that categorical rhythms may be prevalent in calls meant to attract and hold conspecific attention, such as the calls pups use to attract the attention of their mothers. We did not find evidence of rhythm categories at seven tested ratios; this negative result has methodological value, highlighting that not all species which produce attention-seeking vocalizations show integer ratio categories.

    A vocal interaction with non-adaptive playback stimuli (one-way) may represent a limitation to studying spontaneous vocal production. The extent of this limitation can be gauged by comparing results to data from vocal interactions with a real partner (two-way), and with the use of complementary analytical methodologies, such as categorical rhythm analysis and circular statistics.

    (b) Q2. Type of calling partner (real or broadcasted) partly affects rhythm in interaction

    Findings from Q2 show contrasting results. Call phases were statistically the same no matter if pups could interactively adjust their call timing to each other (two-way) or not (one-way). Partly in contrast with this, the Granger causality analysis showed how vocally interacting with a real individual impacted the pup's vocal behaviour more than interacting with a broadcasted partner. Indeed, we found evidence of mutual temporal adaptation among pairs of vocally interacting pups and, consistent with a recent hypothesis [6], conspecific interactions can be facilitated by the reciprocal adjustment of timing behaviour. By contrast, we found little evidence that a pup adjusts the time series of its calls to a playback series. Indeed, in most cases, the pup's call timing could not be predicted by the playback's call timing.

    Together, findings from circular statistics (Q1) and Granger causality (Q2) point towards a directionality-overlap avoidance relationship, which has also been suggested for other non-human species (e.g. primates [17,51]; birds [52]; amphibians [53] and seals [54]). Interestingly, when infant marmosets interact with their parents, the probability that their vocalizations will overlap with those of adults decreases over time [21], suggesting that turn-taking in some mammals is a learned vocal behaviour scaffolded by active parental feedback [38].

    (c) Q3. A silent partner does not trigger call rhythmicity

    Circular statistics indicated that pups did not show any periodic pattern when calling alone or with a silent partner present (Q1). This is consistent with the fact that wild pups produce calls to attract a silent mother [35]: if no other calling conspecifics are present, there is no need to adjust one's call timing and vocalizations are produced with a random onset. Moreover, given that the response phase distributions did not differ between pups recorded with a silent partner or alone, it suggests that the presence of a silent partner did not trigger variable calling behaviour in our study population.

    (d) Q4. Purported timing mechanisms for vocal interactions

    The ADAM analysis suggests that seal pups may perceive temporal patterns [55], which arise between their calls and those of others, and react to them by adopting different mechanisms for temporal adaptation. The negative parameter estimates that we observed in the one-way interaction sessions (consistent with enhancing the distinction between calls) could reflect attempts to lead the temporal interaction or even to test the responsiveness of the (broadcasted) partner by introducing timing perturbations and gauging their effects. The decrease of temporal adaptation and/or anticipation observed across playback sessions is consistent with a gradual process of habituation, with the pup possibly learning that the playback is non-interactive. Harbour seals are capable of acoustic recognition based on habituation paradigms; they can discriminate among different stimuli and selectively habituate to them [56]. The lack of temporal adaptation to the playback stimulus we observed in a seal pup may entail similar habituation processes. From a comparative perspective, these findings also suggest a sensitivity to social contingency that may be analogous to capacities in human infants, who become disinterested and display fewer signs of positive affect during vocal interactions with non-responsive or delayed video recordings of their mothers [5761]. Future studies could address the role of temporal contingency by using interactive playback sequences [6264].

    Ours constitutes, to our knowledge, the first attempt to apply ADAM to non-human animals. On the technical side, this necessitated the validation of an approach where brief interactive vocal bouts were concatenated to produce longer time series and thereby reduce the risk of model overfitting (electronic supplementary material, Discussion S1). Demonstrating the use of this procedure opens the door to applying the model in a wider range of behavioural contexts. Nevertheless, caveats are necessary when interpreting the seal data in light of previous work with ADAM in humans, where individuals intentionally produce movements, whose sensory effects occur simultaneously with rhythmically regular sounds (e.g. [25,26]). Assumptions about intentionality and simultaneity may not apply to seal pup vocal interactions or rhythmic interactions in other animals [6,9]. Points of convergence in the main outcomes of the complementary analysis techniques suggest that ADAM, like categorical rhythm analysis, circular statistics and Granger causality, may also be robust and informative under such conditions.

    5. Conclusion

    Crucially, our work highlights the efficacy of combining multiple methods to study rhythmic vocal behaviour. Our approaches vary in the degree to which they capture global temporal characteristics across events versus local temporal dependencies between events. Global measures (e.g. from circular statistics or categorical rhythm analysis) reveal predominant rhythmic features of a vocal interaction while local time-series measures (e.g. from Granger causality or ADAM) provide information about how these features might arise. Through this approach we could tease apart rhythm nuances in our dataset, further develop harbour seals as a model species, and demonstrate how certain analyses often restricted to humans, such as categorical rhythms (but see [22,23]) and ADAM, can be applied to non-human animals. The categorical rhythm and circular statistical analyses tackled similar questions from different angles, namely whether the distributions of IOI ratios (the former) or call response phases (the latter) significantly differed across behavioural contexts. The categorical rhythm analyses also sought to determine whether the rhythmicity of pup calling behaviour conforms to SIRs. Our negative result is, to our knowledge, the first published case of a species for which categorical rhythms are clearly absent from vocalizations, which adds to understanding of how, why and when such rhythms evolve in communication systems. Time-series analyses such as Granger causality allowed testing for timing adjustment. In the case of ADAM, mechanisms of temporal adaptation and anticipation that have previously been used to describe rhythmic behaviour in humans [65] were used to describe rhythmic behaviour in seals. Interestingly, the ADAM model provides an empirical warning about potential seal pup habituation effects when vocally interacting with a recorded partner.

    Though our sample size is small, studies on single individuals are not unusual in comparative research [19,66]. Nevertheless, it is possible that the lack of adjustment to a playback, the adaptation to a real individual, or both, reflect a peculiar vocal behaviour of the individuals we tested and cannot be generalized to the species as a whole. The pups in this study were in a temporarily captive setting, albeit in acoustic proximity to other individuals, similar to conditions they would experience in nature. Unfortunately, vocal development in harbour seal pups has not been studied in wild colonies, hence we do not know whether captivity affects their vocal development. We do know, however, that pups vocally interact with neighbouring pups in the colony and not with their silent mothers; hence the turn-taking behaviour observed in our captive conditions might extend to the same behaviour in wild conspecifics.

    Motivation for an individual to respond and engage in an interaction, with the closest partner in the colony, may depend on the degree of participation signalled by the partner. This, in turn, may be triggered by individual-specific behaviours or by cues from other modalities. Multi-modal communication should be the target of future studies since we cannot assume that such interactions are limited to acoustic cues. Despite these limitations, our study shows that adopting multiple complementary approaches can be a fruitful way to study rhythmic interactivity in non-human animal communication.


    This research was conducted following the principles of animal care and the Dutch and EU regulations for the ethical treatment of animals in behavioural research and was subject to the health and needs of the animals. All recordings complied with the guidelines outlined by the Veterinary Department of the Sealcentre Pieterburen.

    Data accessibility

    The datasets and codes for statistical analysis used in this article have been uploaded as part of the electronic supplementary material [67]. Please follow this link to find the related files:

    Authors' contributions

    M.A., K.d.R. and A.R. conceptualized the manuscript; M.A., C.B. and A.R. designed the spontaneous interaction experiment; C.B. and A.R. collected the spontaneous interaction data and C.B. annotated the video and audio files; A.R. designed the playback experiment and collected the corresponding data; K.d.R. collected the data for the single pup calling in isolation; M.A. and K.d.R. created the initial datasets and coded the R script to extract the rhythmical measures from the audio annotations; K.d.R. did the data curation; A.S.-C. drafted paragraphs “c” and “e” in the introduction and validated experimental protocols; M.A. conducted the circular statistic analyses and wrote the corresponding sections; T.A.H. conducted the categorical rhythm analyses and wrote the corresponding sections; D.V. conducted the Granger analysis and wrote the corresponding sections; P.E.K. conducted the ADAM model analysis and wrote the corresponding sections; M.A. drafted introduction, discussions. K.d.R., T.A.H., D.V., A.S.C. and P.E.K. edited and revised the overall work. P.E.K. and A.R. validated and supervised the whole work. All authors contributed to the ideas, discussion and interpretation of the results.

    All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    We declare we have no competing interests.


    Open access funding provided by the Max Planck Society.

    M.A. was supported by the Junior Fellowship of the Hanse-Wissenschaftskolleg (Institute for Advanced Study); The Comparative Bioacoustics Group is supported by Max Planck Independent Research Group Leader funding to A.R. Center for Music in the Brain is funded by the Danish National Research Foundation (DNRF117).


    We are grateful to the editor and the two anonymous reviewers who provided insightful comments and suggestions on an earlier version of this manuscript. We thank all the volunteers, seal care members and veterinarians of Sealcentre Pieterburen who helped during the experiment and in the rehabilitation process of the seal pups. We are thankful to Chiara De Gregorio for sharing the R code used for the categorical rhythm analyses and to Laura Verga for providing helpful comments on a draft of this manuscript.


    One contribution of 15 to a discussion meeting issue ‘Face2face: advancing the science of social interaction’.

    Joint senior authors.

    Electronic supplementary material is available online at

    Published by the Royal Society under the terms of the Creative Commons Attribution License, which permits unrestricted use, provided the original author and source are credited.