Philosophical Transactions of the Royal Society B: Biological Sciences
You have accessResearch articles

A data-driven method for reconstructing and modelling social interactions in moving animal groups

R. Escobedo

R. Escobedo

Centre de Recherches sur la Cognition Animale, Centre de Biologie Intégrative (CBI), Centre National de la Recherche Scientifique (CNRS) & Université de Toulouse – Paul Sabatier, 31062 Toulouse, France

Google Scholar

Find this author on PubMed

, ,
V. Papaspyros

V. Papaspyros

MOBOTS group, Biorobotics laboratory, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

Google Scholar

Find this author on PubMed

,
F. Bonnet

F. Bonnet

MOBOTS group, Biorobotics laboratory, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

Google Scholar

Find this author on PubMed

,
F. Mondada

F. Mondada

MOBOTS group, Biorobotics laboratory, École Polytechnique Fédérale de Lausanne, 1015 Lausanne, Switzerland

Google Scholar

Find this author on PubMed

,
C. Sire

C. Sire

Laboratoire de Physique Théorique, Centre National de la Recherche Scientifique (CNRS) & Université de Toulouse – Paul Sabatier, 31062 Toulouse, France

Google Scholar

Find this author on PubMed

and
G. Theraulaz

G. Theraulaz

Centre de Recherches sur la Cognition Animale, Centre de Biologie Intégrative (CBI), Centre National de la Recherche Scientifique (CNRS) & Université de Toulouse – Paul Sabatier, 31062 Toulouse, France

Centre for Ecological Sciences, Indian Institute of Science, Bengaluru, India

[email protected]

Google Scholar

Find this author on PubMed

    Abstract

    Group-living organisms that collectively migrate range from cells and bacteria to human crowds, and include swarms of insects, schools of fish, and flocks of birds or ungulates. Unveiling the behavioural and cognitive mechanisms by which these groups coordinate their movements is a challenging task. These mechanisms take place at the individual scale and can be described as a combination of interactions between individuals and interactions between these individuals and the physical obstacles in the environment. Thanks to the development of novel tracking techniques that provide large and accurate datasets, the main characteristics of individual and collective behavioural patterns can be quantified with an unprecedented level of precision. However, in a large number of studies, social interactions are usually described by force map methods that only have a limited capacity of explanation and prediction, being rarely suitable for a direct implementation in a concise and explicit mathematical model. Here, we present a general method to extract the interactions between individuals that are involved in the coordination of collective movements in groups of organisms. We then apply this method to characterize social interactions in two species of shoaling fish, the rummy-nose tetra (Hemigrammus rhodostomus) and the zebrafish (Danio rerio), which both present a burst-and-coast motion. From the detailed quantitative description of individual-level interactions, it is thus possible to develop a quantitative model of the emergent dynamics observed at the group level, whose predictions can be checked against experimental results. This method can be applied to a wide range of biological and social systems.

    This article is part of the theme issue ‘Multi-scale analysis and modelling of collective migration in biological systems’.

    1. Introduction

    The identification and characterization of interactions between the constituent elements of a living system are a major challenge for understanding its dynamic and adaptive properties [1,2]. In recent years, this issue is also at the heart of research conducted in the field of collective behaviour in animal societies [3,4]. However, the identification from field data of the interaction rules between individuals in species whose level of cognitive complexity can be quite high remains problematic. Indeed, the way in which individuals interact is strongly influenced and modulated by the physical characteristics of the environment in which the organisms live, such as temperature, humidity, brightness, or even the presence of air currents (see for instance [5,6] in social insects). The situation is quite different in the laboratory, where conditions can be precisely controlled and monitored. Moreover, new tracking techniques make it possible to record the behaviour of individuals alone or in groups for relatively long periods of time [711]. Using large sets of tracking data, one can then reconstruct and model the social interactions between two individuals of the same species, and between them and the obstacles present in their environment [12,13].

    Explaining collective behaviour in groups of organisms consists in describing the mechanisms by which the behaviours of an individual are influenced by the behaviour of the other group members that are present in its neighbourhood [14,15]. This social influence is often assumed to result from the additive combination of pairwise interactions of the individual with part or all of the members of the group. Determining how these pairwise interactions may be combined, and which neighbours must be taken into account, is a central problem in the study of collective behaviour [16].

    The behavioural response of an individual is the set of its successive positions during a given period of time, usually at some discrete time steps. From these data, it is possible to draw the trajectories of all the individuals in a group and calculate their instantaneous velocity and acceleration, their distance and angle of incidence to obstacles (e.g. the wall of an experimental tank), their heading, as well as relative quantities such as the distance between individuals, the angle of their relative position, and group quantities such as cohesion and heading polarization. These measures can reveal individual behavioural patterns such as the average velocity or the frequency of heading changes close to obstacles, and also collective behavioural patterns such as the level of cohesion and polarization of the group. Thus, the analysis of collective behaviour consists in measuring behavioural changes at the individual scale that likely result from social interactions, and associating these measures with the relative state of the individuals involved in these interactions [12]. The relative state of an individual j with respect to a focal individual i is determined by the distance dij between i and j, its relative velocity vij, its angular position with respect to i, ψij (viewing angle), and its relative heading ϕij (figure 1). The behavioural changes are precisely given by the variations of an individual’s position and velocity, or, equivalently, by the position, speed and heading variations.

    Figure 1.

    Figure 1. State variables of a focal fish i with respect to its neighbour j. Fish position is determined by the position of the centre of mass of the fish (black circles) in an orthonormal system of reference Oxy. dij: distance between fish; vij = vjvi: relative velocity of j with respect to i; ψij: viewing angle with which i perceives j; ϕij = ϕjϕi: relative heading of j with respect to i. Angles are measured with respect to the horizontal axis of coordinates Ox; we use the convention that angles are positive in the counterclockwise direction. (Online version in colour.)

    In fish that have a burst-and-coast swimming mode, the behavioural changes of an individual correspond to significant variations of its heading that occur exactly at the onset of the acceleration phase (i.e. the bursts). These discrete behavioural decisions are called ‘kicks’ [12,17]. Other quantities such as the intensity of the acceleration in the direction perpendicular to the direction of motion, or simply the turning direction (right or left), can be used to detect behavioural changes. Thus, the task is to relate the heading variation of a focal fish δϕi with its state variables, that is, to find a function δϕi(d, v, ψ, ϕ).

    In this article, we first show that force maps, which are widely used to describe the effects of social interactions on the behaviour of individuals, have important limitations when it comes to describe these interactions. These limitations result mostly from the limited number of variables that force maps can handle, the difficulty of identifying intermediate contributions to behavioural patterns, and the difficulty of distinguishing the effects of state variables from constitutive parameters. We then describe in detail a method to analyse behavioural data obtained from digitized individual trajectories. The method allows us (1) to quantify the social interactions between two individuals and describe how the intensities of these interactions vary as a function of the state variables of the individuals, and (2) to reconstruct the interaction functions and to derive an explicit and concise mathematical model reproducing the observed behaviours. Finally, we apply this method to the analysis of the social interactions in two species of fish that both have a burst-and-coast type of swimming and that are characterized by very different levels of coordination when swimming in groups. The reconstruction of interaction rules allows to understand the origin of the differences in the level of coordination, and to predict in which experimental conditions other behavioural differences can arise.

    2. Use and limitations of force maps to infer social interactions

    A first way to infer social interactions between individuals directly from experimental data consists in using the force-map technique [13,18]. This technique has been used, for instance, to estimate from experiments performed with two fish the effective turning and speeding forces experienced by an individual, once the relevant variables on which they may depend have been chosen [11,18,19]. This is a simple way to visualize the strength and direction of behavioural changes. However, force maps have strong limitations that can induce profound misunderstandings.

    (a) Visualization of social interactions with force maps

    Force maps are planar representations of two-dimensional functions of the form f(x, y) where the variation of the value of the function is represented by a colour gradient. Generating and interpreting force maps is easy and this explains their success for inferring interactions between moving groups of individuals.

    A 2D function f(x, y) given by a dataset is a sequence of N triplets (xn, yn, fn), where the index n denotes for example the instant of time tn, n = 1, …, N. To build a force map of this function, the (x, y)-space is discretized in I × J boxes of the form [x^i,x^i+1]×[y^j,y^j+1], where the nodes x^i and y^j are given by

    x^i=x^min+(i1)(x^maxx^min)/I,i=1,,I+1
    and
    y^j=y^min+(j1)(y^maxy^min)/J,j=1,,J+1,
    and the data (xn, yn, fn) is placed in the ij-box such that xn is in [x^i,x^i+1] and yn is in [y^j,y^j+1]. Then, the number of data εij in the ij-box and the mean value fij of the values of fn that fall in the ij-box are calculated. The value fij is then considered as the value of f(x, y) at the middle point of the ij-box, xi=(x^i+x^i+1)/2, yi=(y^i+y^i+1)/2. The resulting points (xi, yj, fij) are then represented in a colour surface, after optional interpolation with, for instance, multilevel B-splines [20].

    Figures 2 and 3 show the force maps of the heading variation δϕ of an individual fish performing a burst-and-coast type of swimming, as a function of different variables, in the presence of a conspecific in a circular tank. In the case of Hemigrammus rhodostomus (figure 2), we used a tank of radius R = 0.25 m, about 8.3 times the body length (BL) of the fish, and in the case of Danio rerio (figure 3), a tank of radius RZ = 0.29 m, about 6.4 BL. Both figures 2a and 3a show the intensity of δϕ as a function of dij, the distance between fish, and ψij, the angle with which fish i perceives fish j, respectively, and figures 2b and 3b show δϕ as a function of dij and ϕij, the heading difference between fish.

    Figure 2.

    Figure 2. Force maps of H. rhodostomus swimming in a circular arena of radius 0.25 m. Heading change δϕ of a focal fish, located at the origin and pointing north in the four panels, as a function of (a) the distance to its neighbour dij and the angle of perception of its neighbour ψij, (b) the distance dij and its relative heading ϕij, (c) the left–right distance dijLR and ϕij, and (d) the relative heading ϕij and the front–back distance dijFB. The colour scale shown on the right represents the average value of heading change δϕ. Distances are measured in body lengths (BL), angles in degrees.

    Figure 3.

    Figure 3. Force maps of D. rerio swimming in a circular arena of radius 0.29 m. Heading change δϕ of a focal fish, located at the origin and pointing north in the four panels, as a function of (a) the distance to its neighbour dij and the angle of perception of its neighbour ψij, (b) the distance dij and its relative heading ϕij, (c) the left–right distance dijLR and ϕij, and (d) the relative heading ϕij and the front–back distance dijFB. The colour scale shown on the right represents the average value of heading change δϕ. Distances are measured in body lengths (BL), angles in degrees.

    These force maps provide some information about the individual behaviour of fish. In H. rhodostomus, figure 2a shows that the focal fish tends to turn towards its neighbour, to the left (resp. right) when the neighbour is on the left (resp. right), except when the neighbour is very close. In that case, the behaviour is quite complex: the fish performs small angular changes of amplitude ca3060, probably owing to collision avoidance manoeuvres. When the neighbour is further away from the focal fish, it maintains its heading (i.e. the white circular region at dij12BL). From this force map, one could conclude that a fish is attracted by its neighbour when it is beyond a distance of about 2 BL, and repulsed when its neighbour is too close (dij < 1 BL). The force map in figure 2b shows that the focal fish turns left (resp. right) when the relative heading of the neighbour is shifted to the left (resp. right). The larger the heading difference, the stronger the turn: the colour intensity increases as |ϕij| grows from 0 to 120°. When fish swim in more or less opposite directions, the intensity of the heading change is small. This force map reveals that a fish tends to align with its neighbour. In D. rerio, figure 3a shows that the focal fish turns towards its neighbour when they are close to each other (dij12BL) or when the neighbour is located behind the focal fish (|ψij| > 90°, whatever the distance). When the neighbour is at dij23BL in front of the focal fish (|ψij| < 60°), the focal fish turns away from its neighbour, as well as when its neighbour is very close to it (dij < 0.5 BL). The reaction to the neighbour’s heading is less intense than in H. rhodostomus, as shown by the wide white regions in figure 3b. When fish are beyond 3 BL from each other, the focal fish turns to adopt the orientation of its neighbour. At short distances, the behaviour is more complex, with changes of smaller size than those observed in H. rhodostomus.

    The visualization of the data by means of force maps therefore suggests the presence of two distinct types of interaction: an attraction interaction, which leads a fish to turn towards its neighbour to get closer to it, and an alignment interaction, which leads a fish to turn so as to adopt the same heading as its neighbour. However, nothing can be deduced from these colour maps about what happens when the two contributions to heading variation have different signs. Thus, would a fish turn right or left when its neighbour is on its left side? Attraction alone would induce the focal fish to turn left. However, if the relative heading of the neighbour is turned to the right, alignment alone would induce the focal fish to turn right. This difficulty comes from the fact that a function (δϕ) that depends on three variables (dij, ψij and ϕij) cannot be represented in 3D. To overcome this limitation, some authors use a kind of force map where the relative position of a fish with respect to a focal fish is decomposed into the left–right (LR) and front–back (FB) distances [13]. In figure 2a, (dij, ψij) are the polar coordinates of fish j in the system of reference centred on fish i pointing north. This is a continuous system of reference in which all the relative positions of fish can be represented. Instead, the LR and FB distances are projections of the relative position of a neighbour with respect to the focal fish on the dij-axis, where all the points of figure 2a that are in the left semicircle of radius 1 BL are averaged in a single point where dijLR=1BL (where dijLR is the LR distance of fish j with respect to fish i), because these points are at 1 BL to the left of the focal fish. Then, the third variable ϕij is used to expand this averaged point on a vertical line with different values of ϕij, in a system of reference with coordinates (dijLR,ϕij), giving rise to figure 2c. Similarly, the upper semicircles of figure 2a are averaged on the dijFB-axis in figure 2d.

    Force maps in figures 2c,d and 3c,d provide additional information about individual fish behaviour. In H. rhodostomus, figure 2c shows that turning direction is homogeneously distributed in the upper and lower half-planes, meaning that the focal fish turns to adopt the heading of its neighbour almost independently of the LR distance separating them, although the turning intensity is larger when the neighbour is far from the focal fish and perpendicular to it (i.e. the regions of highly intense colour at |dijLR|>2BL and ϕij ≈ ±90°). In the white horizontal region, the focal fish maintains its heading when it is aligned with its neighbour (|ϕij| < 10°), whatever the horizontal distance between them. Figure 2d exhibits two large regions homogeneous in colour, showing that the focal fish turns almost always to adopt the heading direction of its neighbour, except when this one is far behind it (i.e. the small regions of the opposite colour for dijFB<2BL). In D. rerio, the colour of each vertical half-plane of figure 3c is almost uniform, except for some regions close to the focal fish (dLRij ≈ ±1 BL) at ϕij ≈ ±90°, and some distant regions located at dLRij ≈ ± 3 BL and ϕij ≈ ±135°). This means that the focal fish turns almost always towards its neighbour, and almost independently from its relative heading, except when both fish are close and perpendicular to each other, and when they are very far and almost anti-aligned. In figure 3d, one can see that the fish turns to adopt the same direction as that of its neighbour when the neighbour is close and in front of it (i.e. the large green and orange homogeneous regions in the centre of the figure). But the fish tends to turn towards the opposite direction when its neighbour is far ahead (small regions where dijFB>2BL) or behind it and not very close to it (regions where dijFB<1BL). This is a different and more complex behaviour than the one observed in H. rhodostomus, where heading changes depend on ϕij but not on dijLR. In D. rerio, we observe the opposite, and the dependence on the FB distance is more complex than in H. rhodostomus. However, neither dijLR nor dijFB, separately, can determine the distance at which the neighbour is (e.g. a fish j located at 1 BL to the left can be at 0.5 or 2 BL to the front). Therefore, this kind of representation hides the effect of the absolute intensity of the interactions as a function of the distance between fish, and moreover does not allow the contribution of intermediate effects such as attraction and alignment to be disentangled.

    (b) Limitations of force maps

    The above descriptions show that the use of force maps to characterize interactions between individuals raises several problems, especially when more than two state variables must be taken into account to describe the behaviour of fish. Other issues are not exclusive to force maps. For example, data can be scarce; not only because they are difficult to collect but also because the phenomenon under observation rarely produces data of a given kind. This is what happens for instance when we consider the repulsive interactions: as individuals repel each other, there are very few cases in which the distance between them is small, so that repulsive interactions are very difficult to describe when the distance between fish is small. This particular issue is common to other methods, as well as ours. More importantly, data are rarely homogeneously distributed, so that different regions of the map can result from averaging a very disparate number of data, meaning that similar colour intensities do not have the same relevance. For instance, figure 4 shows that the relative position of the neighbour of fish i is not homogeneously distributed on the (d, ψ)-plane, and is differently distributed in both panels, so that, e.g., information from the frontal region in figure 4a is more relevant to describe the behaviour of H. rhodostomus than the information from the same region in figure 4b is to describe the behaviour of D. rerio.

    Figure 4.

    Figure 4. Density maps of the location of a neighbouring fish j in the system of reference centred on the focal fish i. (a) Hemigrammus rhodostomus in an arena of radius 0.25 m, and (b) D. rerio in an arena of radius 0.29 m, when the focal fish is at least at 2 BL (6 cm) in H. rhodostomus and 1 BL (4.5 cm) in D. rerio from the wall of the tank. The vertical arrow pointing north located at the centre of the coordinate system indicates the position and orientation of the focal fish i.

    Thus, the relative simplicity to visualize and interpret data with force maps comes at the cost of important limitations and oversimplifications. Four of the most critical of these limitations are the following:

    (i)

    the reduced number of state variables involved in individual behaviour that force maps can handle is limited to two, at most three. This limitation leads to the use of projections and average values that can hide crucial features of the phenomenon under study. Moreover, when data are not homogeneously distributed, the same limitation can even produce wrong or inaccurate observations or conclusions.

    (ii)

    the difficulty of identifying and disentangling intermediate contributions such as attraction and alignment to behavioural patterns;

    (iii)

    the difficulty of finding simple analytical expressions of the interaction functions in order to implement them in a mathematical model;

    (iv)

    the difficulty of distinguishing the effects of variables (e.g. the distance between fish dij) from the effect of parameters (e.g. the BL of fish).

    Let us describe these limitations in more detail.

    (i) When a function depends on more than two variables, force maps are mere projections of the function on a 3D surface, where the value of the function has been averaged with respect to one or more variables. However, averaging raises two problems. First, the function can be odd with respect to the variable used to calculate the average, as is the case of the heading change δϕ with respect to the angle of perception ψij and the relative heading ϕij in H. rhodostomus, where δϕ(dij, ψij, ϕij) = −δϕ(dij, − ψij, ϕij) = −δϕ(dij, ψij, − ϕij) [12]. Then, for instance, huge variations of δϕ with respect to ψij, but of different sign, can cancel each other and yield a small average value, as if δϕ was almost independent of ψij. In other words, a crucial feature of the behaviour can be hidden by the averaging process. Second, averaging by simply adding the values and dividing by the number of values implicitly assumes that the probability of occurrence is the same for all the possible states.1 This would mean for instance that the probability of a fish j being in the state dij = 1 BL, ψij = 90° and ϕij = 10° with respect to a focal fish i is the same as the probability of being at (dij, ψij, ϕij) = (2, 0, 180), which is clearly not the case, at least in H. rhodostomus [12].

    Thus, even if averaging over some state variables to obtain a 2D force map can provide qualitative information, it can also hide crucial effects and can even produce a completely misleading result owing to the strong correlations between the states variables along the actual trajectories. For instance, averaging the interaction between two fish (and neglecting the effect of the wall) over their distance d, to only keep the dependence on the two angular variables ψ (viewing angle) and ϕ (heading difference) in a 2D representation, involves the unknown correlation function C(d, ψ, ϕ) between d, ψ and ϕ along the experimental trajectories. Note that even if the true interactions were in fact separable into a product of one-variable interaction functions, δϕ = f(d) g(ψ) h(ϕ), the 2D projection would then be δϕ(ψ,ϕ)=g(ψ)h(ϕ)×0C(r,ψ,ϕ)f(r)dr. The integral term (averaging over the distance between the two fish) leads to an unknown and non-trivial function of ψ, and ϕ, which is in general not separable unless the correlation function is itself separable. And it is not: if two fish are close, they are likely to be aligned (ϕ ≈ 0), whereas if they are far apart, their headings are much less correlated (ϕ nearly uniformly distributed). This simple example demonstrates that force map projections not only do not recover the true interaction (g(ψ) h(ϕ) in this case), but can artificially produce a non-separable interaction, even if the actual interaction takes a product form.

    As a consequence, the small number of dimensions that can be represented by force maps constitutes a crucial limitation for an accurate description of behaviour, especially if more variables are taken into account, such as, e.g. the relative velocity vij, or the interaction of fish with an obstacle (such as the wall of a tank), which would require two more variables rwi and θwi, the distance and angle to the wall, respectively. Even for the depicted intermediate values, the precise contributions of the different variables to the heading change are still entangled. For instance, how does ϕi vary when j points to the left and is located at the right side of i, and how does this change vary with the distance dij? The contribution of each variable cannot be disentangled from those of the other variables.

    Moreover, force maps of 3D functions like δϕ require huge amounts of data. Using 30 bins per variable with an average of 100 points per bin would require 2.7 × 106 data points, which is very far from being the amount of data collected in most of the experiments on collective motion. We anticipate here that, in order to get the same level of precision, our procedure described below reduces the required number of data by up to 150 times.

    (ii) Another limitation of force maps is that intermediate contributions are difficult or often impossible to identify. A function f can depend on the state variables x and y through two intermediate functions a and b such that f(x, y) = f(a(x, y), b(x, y)). This is precisely the case for the heading change δϕ, which depends on the state variables dij, ψij and ϕij through the combination of at least two intermediate contributions, attraction and alignment, themselves depending on the three state variables [12]. Moreover, attraction and alignment can have opposite contributions, so that their combined effect can be cancelled as if the fish were isolated from each other, while both forces are in action. Force maps are not able to identify which part of the heading change is due to the attraction or the alignment, that is, the functions a(x, y) and b(x, y) cannot be directly extracted from force maps. Even if representations in 7D were possible, these ‘maps’ would not allow identification of intermediate contributions.

    (iii) Extracting analytical expressions from a colour map is difficult unless the relation between variables is very simple. However, it is essential to build mathematical models to understand how the combination of social interactions gives rise to the observed behaviour [14,21]. The simulations of these models will then be used to make predictions in other experimental situations, to draw phase portraits, etc. Simple piecewise linear functions interpolating the data are not suitable because they lack the physical or biological meaning and do not help to build explicit and concise mathematical models. Figures 2 and 3 show the force maps of δϕ for two different species of fish. Significant differences appear between maps, e.g. in panels (a), the left–right symmetry of the heading change with respect to the position of the neighbour in H. rhodostomus, while in D. rerio the symmetry is with respect to the position of the focal fish; in panels (c), the half-planes of homogeneous colour are horizontal in H. rhodostomus, but vertical in D. rerio. Will such a difference be observed if larger groups are considered? Force maps cannot be extrapolated from one experimental situation to another.

    (iv) Finally, it is difficult to distinguish the effect of a variable from the effect of a parameter in a force map, e.g. the effect of the instantaneous distance between fish at each instant of time, from the effect of the fixed BL of individuals. Are the differences observed in the previous maps of each species due to the experimental conditions (e.g. the radius of the arena, which is a variable of the experimental set-up), or to the physical characteristics of the species (e.g. the BL, which is a fixed parameter of the species)?

    The method we present below does not have these limitations, first, because it can handle a large number of state variables (i.e. of dimensions), and second, because the analytic expressions it provides are precisely the optimal way to disentangle the interaction functions at play and to describe the role of each state variable and each parameter of the system. These analytic expressions can then be exploited to build an explicit and yet concise model, whose agreement with experiment can be tested, and whose predictions can be further investigated experimentally.

    3. Method to extract and model social interactions from behavioural data

    Section 2 shows that force maps are representations of one quantity (action force, acceleration, heading variation, etc.) as a function of pairs of other quantities (relative position, velocity or orientation, angle of perception of other individuals, etc.). These quantities only make sense in a framework of the physical world described by a mathematical model which, even if it is often not mentioned explicitly in studies [18,19,22,23] (but see also [13,24,25]), is usually based on equations of motion built in analogy with Newtonian mechanics.

    The method essentially consists in defining this framework and deriving the mathematical model describing the relation between the quantities used to quantify the behaviour of an organism and its interactions with other organisms or physical objects that are present in its environment.

    Of course, our procedure requires the measurements (i.e. the distribution) of the heading changes δϕ, and also calculation of the average in each box of the discretized grid. However, our procedure does not require at any moment to represent heading changes as a function of two or more variables, i.e. using force maps.

    (a) Outline of the method

    Figure 5 provides a general overview of the method used to design the mathematical model and to extract the social interaction functions. The method consists of five consecutive steps, starting from experimental observations and ending with numerical simulations of a model reproducing these observations. These five steps are repeated in a cycle as shown in figure 5, as the model predictions allow better targeted experiments to be performed, which in turn allow the model to be refined: the experiments feed the model, and in turn the model helps the design of experiments. Figure 6 shows the same five steps in a detailed flow chart that includes the test of the model.

    Figure 5.

    Figure 5. General overview of the methodology. The successive steps carried out in our method are labelled with a number and illustrated with a representative picture: (1) Trajectories of real fish from experimental data; (2) observables of collective behaviour, the distance dij between fish, the relative position ϕij, and the relative orientation ϕij, and PDF of dij; (3) model equations of the heading angle changes δϕ; (4) extracted interaction functions FAtt, FAli, EAtt, OAtt, OAli and OAli, and their dependence on dij, ψij and ϕij; (5) trajectories resulting from the numerical simulations of the model. The darker blue arrow illustrates the possibility that the cycle starts again, which includes the realization of a new set of experiments or the design of a new set-up, to check the predictions of the model or to deepen our understanding of the mechanisms at work.

    Figure 6.

    Figure 6. Flowchart of the methodology, with a special emphasis on the procedure of extraction of the interaction functions (step 4). When the test at point 6 is not verified, the cycle starts again at a previous step, depending on the reason for which the test failed: if the model has to be revised (e.g. because the relation between observables and state variables is not well reproduced), then go to step 3; if other state variables or observables have to be considered in the model (e.g. because they may contribute to some observed effect), then go to step 2; if new experiments must be carried out (e.g. because there are not enough data or some important case has not been observed), then go to step 1. (Online version in colour.)

    (i) Data collection and preparation

    The method starts by carrying out experiments, motivated by a fundamental question, or the results of the previous cycle of an ongoing modelling process. Then, data are collected and prepared in the form of individual trajectories ui(t) resulting from a tracking procedure conveniently purged from identification errors (step 1 in figures 5 and 6). The velocity vector vi of an individual i is calculated from successive positions with finite differences, and its heading ϕi is extracted directly from the tracking data, or, alternatively, taken as the orientation of the velocity vector.

    (ii) Identification and analysis of state variables and observables

    The second step consists in identifying and analysing the state variables and the relevant observables that can be used to describe the phenomenon under study (step 2 in figures 5 and 6).

    In the case considered here, the individual state of a fish is characterized by its position and velocity vectors, or, equivalently, by its speed and its distance and orientation to the wall, v, rw and θw, respectively. The state of a fish i with respect to another fish j is characterized by the distance between them dij, the relative speed vij, the viewing angle with which j is perceived by i, ψij (≠ψji), and their relative heading ϕij = ϕjϕi. The heading angle change δϕS resulting from social interactions is thus a function of four variables.

    Then, an analysis is performed on a series of observables that can be used to describe both the individual and collective behaviour of fish when swimming in pairs. These observables are the probability density function (PDF) of the spatial distribution of individuals, their velocity, their distance and orientation to obstacles, their heading variation, the distance between individuals (cohesion), their relative positions, and their relative heading (polarization). Figure 7 shows the PDF of dij, ψij and ϕi corresponding to the species of fish studied here. The analysis of these observables relies exclusively on the experimental data and is independent of a model. Eventually, force maps can be drawn.

    Figure 7.

    Figure 7. Probability density functions (PDFs) of (a) distance between fish dij, (b) difference of heading ϕij, and (c) angle ψfollower with which a fish i perceives its neighbour j when |ψij| < |ψji| (the ‘geometrical follower’ is hence the fish that would have to turn the less to face the other fish, the latter being called the ‘geometrical leader’). Blue lines: H. rhodostomus (HR) in a circular arena of radius 0.25 m, red solid lines: D. rerio (DR) in a circular arena of radius 0.29 m. The vertical lines in (a) denote the relative radius R/BL of the arena with respect to fish body length of each species, 8.3 in H. rhodostomus (long-dashed line) and 6.4 in D. rerio (dotted line), with 1 BL = 0.03 m in H. rhodostomus and 0.045 m in D. rerio. (Online version in colour.)

    (iii) Modelling hypotheses

    The next step consists in defining a class of models that can provide the most suitable description of the observed phenomenon (step 3 in figures 5 and 6). This is by far the most crucial part of the method.

    This process has been described in detail in [12] in the case of H. rhodostomus; here we simply summarize it, as it is identical for D. rerio. We first consider that fish move in a straight line during short time intervals of different durations. These time intervals are separated by instantaneous ‘kicks’ during which fish adjust their direction and make an abrupt acceleration, after which the speed decays exponentially as the fish glides between two kicks. These assumptions come from the analysis of the trajectories and the variation of the velocity [12]. We also assume that the effects of social interactions occur exclusively when kicks are performed, that is, at the decision times when fish adjust their heading. Then, we provide an explicit equation for the heading variation of the focal fish δϕ when the fish performs a kick, in terms of the quantities that can potentially have an effect on δϕ. The selection of the duration and length of the gliding phase and the motion of the fish, with quasi-exponential decay of the velocity between two kicks, are given by simple analytical probability distributions fairly well reproducing the experimental ones, as described in [12].

    The equations for the time-evolution of the vector position u(t) and the heading ϕ(t) of a fish are

    u(t+dt)=u(t)+l(t)e(ϕ(t+dt))
    and
    ϕ(t+dt)=ϕ(t)+δϕ(t),
    where l(t) is the length of the kick, performed in the direction of the angle ϕ(t + dt), given by the unitary vector e(ϕ(t + dt)). The choice of the kick length is described in detail in [12]. Here, we focus on the equation for the heading variation δϕ, which, we hypothesize, accounts for the effects of the social interactions between fish, and reads
    δϕ(t)=δϕR(t)+δϕw(t)+δϕS(t),3.1
    where δϕR is a random angle change accounting for the spontaneous decisions of the fish, δϕw is due to the repulsion of the wall when the fish is close to a tank wall, and δϕS is due to the social interactions with other fish, usually attraction, alignment, or a combination of both, so that
    δϕS(t)=δϕAtt(t)+δϕAli(t).3.2
    Equations (3.1) and (3.2) are based on the hypotheses that δϕR, δϕw, δϕAtt and δϕAli are the main contributions to the heading variation of a fish, and that these contributions are combined linearly (additive hypothesis).

    (iv) Extraction of interaction functions from trajectory data

    Step 4, which is the one we wish to emphasize in this article, consists in extracting, from the experimental data, the interaction functions determining the contributions of a neighbouring fish and of the obstacles in the environment to the instantaneous heading variation of a fish. It is a relatively long step that involves several substeps, for which an overview is presented in figure 8.

    Figure 8.

    Figure 8. Detailed description of the successive substeps involved in the extraction of the interaction functions (step 4 of our method). For each substep, a short text describes what is done and the main formulae used in carrying it out. (Online version in colour.)

    Discretizing the six state variables in Ω = Q × P × I × J × K × M boxes, and calculating the mean value of δϕ in each box from the experimental data, equation (3.1) can be rewritten as a system of Ω equations and Ω unknowns,

    δϕw(rq,θp)+δϕS(di,ψj,ϕk,vm)=δϕqpijkm,3.3
    where the noise term δϕR vanishes as it is assumed to have zero mean in each box. We use the convention that unknowns are written in the left-hand side of the equation and the known values in the right-hand side. Even if one could build seven-dimensional representations, the contributions of each variable would still be entangled, and intermediate functions impossible to detect in the corresponding behavioural maps of social interactions.

    The key hypothesis to the procedure consists in assuming that the contribution of each state variable to each kind of interaction can be separated in a product form, which is the case for physical particles [12]:

    δϕw(rw,θw)=f¯(rw)g¯(θw),3.4
    δϕAtt(d,ψ,ϕ,v)=f(d)g(ψ)h(ϕ)l(v)3.5
    andδϕAli(d,ψ,ϕ,v)=f^(d)g^(ψ)h^(ϕ)l^(v).3.6
    Note that functions with a hat are in general different from the functions without hat, (i.e. f(d)f^(d), etc.). With this additional hypothesis, the procedure provides analytical expressions of the interaction functions of attraction and alignment, and of the contribution of each of them to the heading variation. In the case where the contribution of one of the interactions is non-existent, for instance if there is no explicit alignment, then the procedure will detect it.

    For the sake of simplicity, we consider here the case where fish are far enough from the tank wall that the contribution of δϕw to the heading variation can be neglected with respect to the effects of social interactions. The procedure can easily be extended to extract both the social interactions and the interaction with the wall, but at the cost of more complicated notation. We refer the reader to the work of Calovi et al. [12], in which the disentangling of the combined effects of a tank wall and social interactions between fish is described in detail.

    Hence, we are left with four state variables, so that the heading change δϕ(t) is averaged in four-dimensional ijkm-boxes. We introduce the notation f(di) = fi, g(ψj) = gj, …, l^(vm)=lm, so that, after removing the interaction with the wall, the system (3.3) becomes

    figjhklm+f^ig^jh^kl^m=δϕijkm,3.7
    which is a system of I × J × K × M equations with D = 2(I + J + K + M) unknowns, which is much smaller than the number of boxes, and in practice, smaller than the usual number of data available.

    The goal is now to solve this system of equations. This will provide the discrete values of the interaction functions (the unknowns fi,gj,,l^m) as a function of the known values δϕijkm (calculated from the experimental data as the mean heading change in each box). Systems with more equations than unknowns are called overdetermined systems and rarely have a solution. One way of overcoming this problem consists in reducing the overdetermined system to a solvable one by minimizing the error Δ with which the overdetermined system is satisfied by a candidate solution that is updated iteratively. This corresponds to step 4.1 in the flowchart figure 6, which is described in detail in box 1 and in §3b below for a simple 2D case where the equation is fi gj = aij. Figure 9 shows how the error function is built in this simple case.

    Figure 9.

    Figure 9. Construction of the error function Δ(X) in a simple two-dimensional case where a(x, y) = f(x)g(y). After discretizing the variables x and y in I and J intervals, respectively, the values of a(x, y) are distributed in the resulting ij-boxes. The number of data in each box εij is counted, and the mean value of a(x, y) in each box, aij, is calculated. The square error is formulated in each box, (fi gjaij)2, and summed up along all boxes. Starting from an initial guess of I + J values fi0, gj0, i = 1, …, I, j = 1, …, J, which produces a set of I × J green balls (one per ij-box), the iterative process is carried out. At each iteration, the green balls of each box converge globally towards the yellow balls, until a minimum of the error Δ(X) is reached. The final values fin, gjn (blue and red points, respectively) are then fitted by means of analytical expressions, which are the interaction functions represented by the blue and red solid curves. Note that n is the index of iteration, not the time discretization index.

    The error function is a function Δ(X) of D = 2(I + J + K + M) variables which returns a real value calculated as follows:

    Δ(X)=i=1Ij=1Jk=1Km=1Mϵijkm(δϕS,ijkmδϕijkm)2,3.8
    where X is the vector of dimension D
    X=(f1,,fI,g1,,gJ,h1,,hK,l1,,lM,f^1,,f^I,g^1,,g^J,h^1,,h^K,l^1,,l^M),
    and where δϕS,ijkm=figjhklm+f^ig^jh^kl^m is the discretization of the social force in each ijkm-box (the unknowns), δϕijkm is the mean heading change in the ijkm-box (calculated from experimental data), and εijkm is the number of data in the ijkm-box. By weighting the regions with more data, the factor εijkm allows the structure of the dataset and the correlations between variables resulting from the dynamics to be preserved (see also endnote 1 in §2). For a given X, the local error in the ijkm-box is defined as the difference between X and the experimental data, δϕS,ijkm − δϕijkm. Then, the local error is squared and weighted by the number of data in the box, εijkm, and summed over all the I × J × K × M boxes. The result is a positive real value Δ(X) that must be as small as possible, so that an optimal X must be found.

    The minima of Δ(X) are obtained by finding the zeros of the gradient of Δ (see the inset in box 1). Finding the zeros of a function (or root finding) can be carried out by different methods such as descent methods or other iterative methods. Here, we use an iterative method based on the fact that the equation that each component must verify is linear in this component, so that each component can be written explicitly in terms of the other components. This relation is found as follows.

    Box 1. Minimizing the error between data and a product form.

    Let us consider an N × M two-dimensional array aij, for instance, built after averaging some quantity obtained from experiments on an N × M grid. We look for the best approximation of aij under a product or separable form, aijxi × yj, i = 1, …, N, j = 1, …, M (see the system (3.7), in the more general case when more than two dimensions are involved). Note that in the product form, the number of unknown variables is only N + M, compared with the much larger size N × M of the original array of data aij. In order to quantify the accuracy of the product description, we define the quadratic error function Δ:

    Δ(x1,,xN,y1,,yM)=i=1Nj=1Mϵij(xiyjaij)2.6.1
    Here, εij is the number of data that fall in the cell ij. For systems that derive from physical and biological phenomena, it is fundamental to modulate the contribution of each cell, not only to give a higher weight to the more frequent values but also to preserve actual correlations arising between the variables.

    The goal is thus to find a set of values x1, …, xN, y1, …, yM that minimizes the error Δ. Δ = 0 implies that xi yj = aij, for all i and j, and that the original data had exactly a product form. Finding the minima of a function requires finding the zeros of its derivative, which, in high dimensions, is the gradient vector Δ, given by the partial derivatives of Δ with respect to its components; see the Finding minima in the boxed inset below.

    Finding minima

    A minimum of a one-variable function f(x) is a point xm where the derivative vanishes, f′(xm) = 0, and the second derivative is positive, f″(xm) > 0.

    In several dimensions, the ‘derivative’ of a function F:RDR is the gradient vector F(x), whose components are given by the partial derivatives of the function with respect to each component of x: F(x)=(F/x1,F/x2,,F/xD). The gradient vector points in the direction of maximum variation of F, and is therefore zero (i.e. equal to the null vector 0) when x is an extremum, and in particular, a minimum:

    Fx1=0,Fx2=0,,FxD=0.6.2

    In principle, to ensure that the extremum is indeed a minimum, one also has to check that the Hessian matrix of second-order partial derivatives has only strictly positive eigenvalues, a condition generically satisfied for squared error functions like the one considered here.

    Differentiating (6.1) with respect to xk gives

    Δxk=i=1Nj=1Mxk[ϵij(xiyjaij)2]=2j=1Mϵijyj(xkyjakj),6.3
    where we used the fact that xiyj does not depend on xk, if ik. Then,
    Δxk=2xkj=1Mϵijyj22j=1Mϵijyjakj,6.4
    and the conditions ∂Δ/∂xi = 0 and ∂Δ/∂yj = 0 for all i and j can be rewritten as
    xi=j=1Mϵijyjaijj=1Mϵijyj2,i=1,,N,yi=i=1Mϵijxiaiji=1Mϵijxi2,j=1,,M.6.5
    This is a system of N + M equations and unknowns that can be solved with different methods. Owing to the large dimension of the systems arising in social interaction analysis, iterative method are often used. See box 2.

    For the gradient to be zero, we must have

    Δf1=0,Δf2=0,,ΔfI=0,,ΔlM=0.
    For the component fi, the partial derivative of Δ is
    Δfi=2j=1Jk=1Km=1Mϵijkmgjhklm(δϕS,ijkmδϕijkm),
    so the condition ∂Δ/∂fi = 0 is equivalent to
    fi=j=1Jk=1Km=1Mϵijkmgjhklm(f^ig^jh^kl^mδϕijkm)j=1Jk=1Km=1Mϵijkm(gjhklm)2.3.9
    The corresponding equations for the other components are derived in the same way and all have the same structure where each component is given by an explicit combination of the other components and the known values δϕijkm and εijkm. For instance, the equation of gj is obtained by replacing gj by fi, j = 1 by i = 1, and J by I, and the equation for a function with a hat (e.g. f^i) is obtained by replacing the functions that have a hat by the same functions without the hat, and vice versa (i.e. gj is replaced by g^j, g^j is replaced by gj, and so on).

    The resulting set of explicit equations for each component can be written in the following form,

    f1=R1(g1,,gJ,h1,,h^K,l^1,,l^M),f2=R2(g1,,gJ,h1,,h^K,l^1,,l^M),=fI=RI(g1,,gJ,h1,,h^K,l^1,,l^M),g1=RI+1(f1,,fI,h1,,h^K,l^1,,l^M),=g^J=RI+J(f1,,fI,h1,,h^K,l^1,,l^M),=l^M=RD(f1,,fI,g1,,g^J,h^1,,h^K).
    To find the solution of such a system, it is possible to write it in a compact form as X = R(X), where R is a function from RD to RD, and consider the problem as finding a fixed point X0 of the function R. This is done by means of an iterative method, as explained in detail in box 2, which also shows how this method works in the 1D case. Again, note that the number D = 2(I + J + K + M) of fitting parameters is much smaller than the number of boxes I × J × K × M.

    Box 2. Fixed point iterations.

    A point xR is a fixed point of a function f:RR if f(x*) = x*. Fixed points can be found under certain conditions2 by means of the iterative method

    xn+1=f(xn).6.6
    Starting from an initial point x0, the method builds a sequence x1, x2, x3, … that converges to (one of) the fixed point(s) of f; see figure 15.

    In d dimensions, a vector xRd is a fixed point of a function F:RdRd if F(x*) = x*.

    When the dimension of the system d = N + M is very large, and in order to improve the stability of the recursion dynamics, it is convenient to use a relaxation method,

    xn+1=λxn+(1λ)F(xn),6.7
    where λ ∈ (0, 1) is the weight of the previous iteration in the value of the new iteration, averaged with what would have been the new iteration.

    In the end, the procedure provides the values of each interaction function at the points representing each box minimizing the error functions. Depending on the number of boxes, a satisfactory analytical form of each one-variable function can be obtained, based on physical principles and specific observations. This part corresponds to the step 4.3 in figure 6 and depends on the specific phenomenon under study; it is detailed in §4 for the case of fish swimming in pairs. For example, the function of the angle of perception g(ψ) must be odd, i.e. g(−ψ) = −g(ψ), because the attractive effect of a neighbour on the heading change of a focal fish has the same intensity wherever the neighbour is located at the right or at the left of the focal fish, but has opposite directions, and hence opposite signs, in each case: if the neighbour is at the right (resp. left) side, the focal fish would turn right (resp. left) to approach the neighbour. Physical properties of the interactions at play, such as the exponential decay of some interactions, must be taken into account and guide the choice of the final analytical expressions.

    (v) Numerical simulations of the model

    The last step consists in performing numerical simulations of the model with the double purpose of (1) verifying that the simulation results are in good agreement with the experimental results, and (2) making new predictions that can be ultimately confirmed by new experiments. If one of these two points is not satisfactorily verified, then it is necessary to go back to a previous step: step 3 to revise or reformulate the model, step 2 to use alternative state variables or observables, or even step 1 to carry out new experiments or measures of the data (figure 6).

    (b) Application to a simple case study

    In order to illustrate how the extraction procedure (step 4 of our methodology in figure 6) can be used, we have produced artificial data for a simple case in which the interaction functions are known and in which we controlled the level of noise and the distribution of data. This simple case also allows us to illustrate the efficiency of the procedure and the accuracy that can be obtained according to the quality of the data.

    Figure 10a,b shows the colour map of a real function of two variables a(x, y) going from [0, L] × [−π, π] to R, for two levels of noise

    a(x,y)=2e(x/x0)2siny,3.10
    with x0 = 0.1 and L = 2. The map is built as follows. We first discretize the 2D space (x, y) in Ω = I × J rectangular cells [x^i,x^i+1]×[y^j,y^j+1], where
    x^i=(i1)LI,i=1,,I+1
    and
    y^j=π+(j1)2πJ,j=1,,J+1.
    Then, we evaluate the function a(x, y) on 100 000 points randomly selected, and place each point on the corresponding cell ij. After that, we count the number of points in each cell, εij, and we assign to aij the average of the values of the function for all the points that are in the cell ij. To simulate the effect of the noise, which is always present in real datasets, for each point found in the cell ij, we add a small noise of zero mean and standard deviation 0.35 to the value of aij. This value corresponds to the noise intensity observed in the experiments with pairs of real fish [12]. The resulting value is then considered as the measured value of the function a(x, y) in the corresponding cell, denoted by (xi, yj), and usually defined as the middle point of the cell: xi=(x^i+1x^i)/2, yj=(y^j+1y^j)/2.
    Figure 10.

    Figure 10. Force maps and reconstruction of the interaction functions in a simple case. (a,b) Force maps of the function a(x, y) + η(x, y) for two levels of noise, of zero mean and amplitude (a) 0.35, which is the one observed in the experiments with H. rhodostomus, and (b) 3.5, a level of noise ten times higher. We used the same distribution of points εij as the one observed in the experiments with H. rhodostomus. (c,d) Reconstruction of the interaction functions (c) f(x) and (d) g(y) extracted with our procedure (red dots and blue circles), compared with the analytical expressions exp[−(x/x0)2] and 2siny, respectively (black lines). The red dots correspond to the case of a noise of amplitude 0.35, where we used 20 nodes in x and 31 in y, and the blue circles to the case with a much higher level of noise (3.5) and with fewer nodes in x, 11 instead of 20. Note that the function g(y) is normalized.

    We now apply the procedure described in §3a(iv), which provides analytical expressions of the interaction functions of x and y that give rise to this colour map.

    The key hypothesis is that there exist two decoupled functions f(x) and g(y) such that a(x, y) = f(x) g(y). Evaluating these functions in each ij-cell, this means that

    figj=aij,i=1,I,j=1,,J.3.11
    The values of fi and gj are unknown for all i and all j. The values of aij are known for all ij because it is the mean value of the data found in the ij-cell. These I × J equations and I + J unknowns constitute precisely the overdetermined system (3.7). Following the procedure to reduce this overdetermined system, we write the error function Δ(X), for X = (f1, …, fI, g1, …, gJ),
    Δ(X)=i=1Ij=1Jϵij(figjaij)2,3.12
    which has the same form as the one shown in equation (6.1) of box 1. Figure 9 shows how Δ(X) is calculated as the sum of all the squared local errors in each box, denoted by the vertical green line between the yellow and green balls corresponding to aij and fi gj respectively.

    To minimize the error function, we look for the zeros of its gradient. The partial derivatives are

    Δfi=2j=1Jϵijgj(figjaij)3.13
    and
    Δgj=2i=1Iϵijfi(figjaij),3.14
    and they are equal to zero when
    fi=j=1Jϵijgjaij/j=1Jϵijgj23.15
    and
    gj=i=1Iϵijfiaij/i=1Iϵijfi2.3.16
    Note that each component is explicitly given by a combination of the other components. This property always holds, independently of the number of interaction functions. This is due to the hypothesis of separability of variables and to the square norm used in the error function. Moreover, in this particularly simple case, we observe that the fis are given by the gjs, and vice versa (recall that the values aij and εij are known for all ij). Note finally that the minus sign that appears in equation (3.9) does not appear here.

    Thus, starting from an initial guess of the solution, X0=(f10,f20,,fI0,g10,g20,,gJ0), it is possible to obtain the values of the fi s and the gj s after one iteration, that is, X1=(f11,f21,,fI1,g11,g21,,gJ1), repeating this process iteratively, X1X2X3 → · · · until convergence, i.e. Xn+1Xn.

    When the jumps between iterations are excessively abrupt, for example if the initial guess is far from a solution, it is convenient to smooth or relax the iterative process by averaging the new values with those of the previous step weighted with a larger coefficient

    fin+1=λfin+(1λ)f~in3.17
    and
    gjn+1=λgjn+(1λ)g~jn.3.18
    Here, f~in and g~jn are calculated from the values of Xn in the previous step with the formulae (3.15) and (3.16), and λ ∈ [0, 1] is chosen with the compromise that the iterations progress smoothly but in a reasonable computational time. See also box 2.

    Figure 11 shows the successive values that each component fi, gj adopts during the first 120 steps of the iteration process, starting from an initial condition where fi0=gj0=1 for all i and j. We used I = 11, J = 17, a value of λ = 0.975 quite close to 1 in order to illustrate the convergence process in detail, and a final tolerance |Δ(Xn+1) − Δ(Xn)| < ɛ = 1.87 × 10−7 (using ɛ = I × J × 10−9). Typical values of λ can be much smaller (λ = 0.75), depending on how close the initial guess is from the solution.

    Figure 11.

    Figure 11. Evolution of the iterative process for the minimization of the error function Δ(X) in the simple case where a(x,y)=2exp[(x/x0)2]siny. Blue circles denote the successive values of the I + J components of the unknown functions fin and gjn during the first 120 steps of the iterative process. Red lines show the initial guess of each function fi0=gj0=1, i = 1, …, I and j = 1, …, J, and the final state of convergence. Note that n is the index of iteration, not the time discretization index. (Online version in colour.)

    The final state to which the iterative method has converged, that is, the points {fi}i=1I and {gj}j=1J that solve the reduced system (6.5), is shown in figure 10c,d. There is an excellent agreement with the original data, despite the addition of noise, including in the case of a much larger noise than the one inferred in the actual data of our fish experiments.

    Note that if f and g are a solution of (6.5), then the functions (1/α)f and αg, where α is a real number, are also a solution of (6.5), since the product of the two functions remains invariant. Hence, we need to impose an additional condition in order to account for this under-determination and to generally allow for a proper comparison of reconstructed interaction functions. Following [12], we chose to normalize all angular functions so that their squared average is equal to 1. In particular, g is normalized such that (1/2π)ππg(y)2dy=1, a normalization applied in figure 10c,d.

    The last step consists in finding simple analytical expressions that interpolate the discrete values of the reconstructed interaction functions, in order to implement them in an explicit mathematical model. There is an infinite number of combinations, so that one must be guided by key physical features of the phenomenon under study that are well established, properties such as symmetries or analogies with other physical systems. For example, in the case of angular functions, the parity is often easily identifiable from the data or can be asserted from general principles (mirror symmetry, left/right symmetry etc.), so that a small number of Fourier modes can be sufficient to interpolate the angular functions from the data that result from the reconstruction procedure.

    (c) On the hypothesis of separation of variables of interaction functions

    The central assumption of the extraction procedure is the separation of variables made in (3.4)–(3.6). Without this hypothesis, the solution of equation (3.3) far from the wall is δϕS(di, ψj, ϕk, vm) = δϕijkm, and we are led back to force maps, with all contributions still entangled and no analytical expressions of the interaction functions. Separability of variables is thus crucial for our method. However, what happens if the interaction functions do not satisfy this condition?

    Consider a 2D function a(d, ψ), odd in ψ. Such a function can always be expanded in a Fourier series as

    a(d,ψ)=sinψk=0ck(d)cos(kψ).3.19
    In order to be separable as a(d, ψ) = f(d)g(ψ), the functions ck(d) in (3.19) have all to be proportional to some f(d), that is, ck(d) = αk  f(d), with αk constant for all k. Then, g(ψ)=sinψk=0αkcos(kψ).

    A simple and biologically meaningful example, inspired by the fish interaction functions, would consist in writing ck(d)=αkexp[d2/(2lk2)] in (3.19) and using only the modes k = 0 and 2:

    a(d,ψ)=sinψ[α0ed2/(2l02)+α2ed2/(2l22)cos(2ψ)].
    When l2 = l0, the variables are separable and we have
    a(d,ψ)=α0ed2/(2l02)sinψ[1+α2α0cos(2ψ)],3.20
    which, when α0=2, l0=x0/2 and α2 = 0, corresponds to the simple case used in §3b, for which the reconstruction procedure worked quite well.

    Let us see the result of applying our procedure to a similar function b(d, ψ) that does not satisfy the separability condition. Using the Fourier expansion (3.19), we consider two examples b1(d, ψ) and b2(d, ψ) of the form

    b1,2(d,ψ)=2ed2/(2l02)sinψ+1.13ed2/(2l22)sinψcos(2ψ),
    where l2 = 2l0/3 ≈ 0.047 and l2 = 3l0/2 ≈ 0.106, respectively, and α2 = 0.8α0 ≈ 1.13 in both functions. Note that the first term is precisely the function used in the simple case of §3b, and the presence of the second term makes the functions b1 and b2 not separable.

    Figure 12a,b shows the colour maps of b1(d, ψ) and the solution found by the reconstruction procedure fi gj. The general shape is quite well reproduced, and differences appear only at a relatively small scale. Figure 12c,d shows the reconstructed functions f(d), g(ψ) for which we have found the following analytical expressions,

    f(d)=β0ed2/[2(l¯0)2]3.21
    and
    g(ψ)=β1sinψ[1+β2cos(2ψ)],3.22
    with β0 = 0.72, l¯0=l0=0.071, β1 = 1.86, β2 = 0.62. The resulting interaction function is
    f(d)g(ψ)=1.34ed2/[2(l¯0)2]sinψ[1+0.62cos(2ψ)],
    where the first term is quite close to the one of b1(d, ψ) (with 1.34 instead of 1.41), and the second is not too far from the corresponding one of b1(d, ψ), with 1.34 × 0.62 = 0.83 instead of 1.13, apart from the different decreasing rate of the exponential.
    Figure 12.

    Figure 12. Colour maps of (a) the non-separable function b(d,ψ)=2ed2/(2l02)sinψ+1.13ed2/(2l22)sinψcos(2ψ), and (b) the solution found by the reconstruction procedure f(d)g(ψ)=1.34ed2/(2l02)sinψ+0.83ed2/(2l02)sinψcos(2ψ). (c,d) Red and blue solid lines: cuts of the surface of b1(d, ψ) for a fixed value of (c) ψ = π/2 (red) and ψ = π/4 (blue), and (d) d = 0.03 (red) and d = 0.1 (blue). Black dots denote the reconstructed functions fi in (c) and gj in (d). Solid black lines in both panels are the analytical expressions found for the reconstructed functions. (c,d) Red and blue dots are cuts of the reconstructed solution fi gj corresponding to the respective cut of the non-separable function b1(d, ψ). Values of the cuts are f(0.03) = 0.67, f(0.1) = 0.26, g(π/2) = 0.73 and g(π/4) = 1.36. A noise of the same amplitude as the one measured in the experimental data of H. rhodostomus was added to b1(d, ψ) to create the dataset for the reconstruction procedure. The same data structure (number of data per box) as the one found in real fish has been used. We have used I = 21 and J = 31.

    Although the procedure provided an apparently satisfactory result (figure 12a,b are quite similar), the reconstructed functions fail to reproduce some features such as the (two) changes of variation of the intensity along the radial coordinate (first increasing, then decreasing) when the neighbour is at one side of the focal individual (ψ ≈ ±π/2), and introduce an artificial slight decrease in the intensity when the neighbour is far away (d ≈ 3 BL) and at one side (ψ ≈ ±π/2) of the focal individual (figure 12c,d).

    A similar qualitative result is obtained in the second case with b2(d, ψ), as shown in figure 13. In that case, we found β0 = 0.69, l¯0=0.078, β1 = 1.99, β2 = 0.9, so the interaction function provided by our procedure is

    f(d)g(ψ)=1.37ed2/[2(l¯0)2]sinψ[1+0.9cos(2ψ)].
    In both examples, the worst agreement between the reconstructed interaction and the actual one is obtained for the angular dependence of the interaction at very large distance (blue curves in figures 12d and 13d), and for the radial dependence of the interaction at ψ = ±π/2 (red curves in figures 12c and 13c). However, both cases correspond to situations where the interaction is the weakest and the reconstructed interaction matches perfectly the actual one in situations where the interaction is most significant (black and red curves in both panels (d); black and blue curves in both panels (c)).
    Figure 13.

    Figure 13. Colour maps of (a) the non-separable function b(d,ψ)=2ed2/(2l02)sinψ+1.13ed2/(2l22)sinψcos(2ψ) and (b) the solution found by the reconstruction procedure f(d)g(ψ)=1.34ed2/(2l02)sinψ+0.83ed2/(2l02)sinψcos(2ψ). Red and blue solid lines: Cuts of the surface of b2(d, ψ) for a fixed value of (c) ψ = π/2 (red) and ψ = π/4 (blue), and (d) d = 0.03 (red) and d = 0.1 (blue). Black dots denote the reconstructed functions fi in (c) and gj in (d). Solid black lines in both panels are the analytical expressions found for the reconstructed functions. Red and blue dots are cuts of the reconstructed solution fi gj corresponding to the respective cut of the non-separable function b2(d, ψ). Values of the cuts are f(0.03) = 0.23, f(0.1) = 0.29, g(π/2) = 0.66 and g(π/4) = 1.38. A noise of the same amplitude as the one measured in the experimental data of H. rhodostomus was added to b2(d, ψ) to create the dataset for the reconstruction procedure. The same data structure (number of data per box) as the one found in real fish has been used. We have used I = 21 and J = 31.

    Let us now discuss the general validity of the product assumption on the illustrative example of the interaction of a fish with a circular wall. In this case, the heading angle change between two time steps (kicks) is δϕ(rw, θw), where rw is the distance of the fish from the wall, and θw is the angle between the fish heading and the normal to the wall. Here, the product hypothesis amounts to assuming that δϕ(rw, θw) = f(rw)g(θw). As explained in [12], the product assumption is verified for physical particles interacting with a wall via a conservative force (deriving from a potential energy), with g(θw) given exactly by g(θw) = sin θw (projection of the central force on the normal to the velocity, i.e. on the angular acceleration). For animals, and in particular fish, the anisotropic perception of their environment generally leads to non-conservative interactions. The product hypothesis assumes that this anisotropic perception can be fully encoded in a non-trivial function g(θw) generally different from a simple sinus, while the function remains odd if the left/right symmetry is preserved. As explained above, δϕ(rw, θw) for animals takes the general Fourier expansion form of equation (3.19), δϕ(rw,θw)=sin(θw)k=0ck(rw)cos(kθw). Strictly speaking, δϕ(rw, θw) can be separated into the product of two functions of rw and θw only if all functions ck(rw) are proportional to each other. However, if these functions decay similarly with the distance to the wall rw, the proportionality assumption and hence the product assumption would be only weakly invalid, as illustrated in the practical examples presented above. In fact, we generally expect only a few Fourier modes to be relevant in the above expansion (in [12], it was found that only one or two non-zero modes were enough to describe all interactions; see next section). Hence, the product assumption can be only severely invalidated if the few relevant functions ck(rw) have very different behaviour. However, on general biological grounds, we expect that all these functions characterizing the interaction with the wall should decay smoothly with the distance rw. Similarly, for the corresponding functions describing the attractive interaction between two fish, we expect them to first increase with distance between the fish before decaying at larger range. Hence, even if these few functions are not strictly proportional to each other, their similar anticipated behaviour should lead to effective product interactions grasping the most important features of the actual interactions.

    In conclusion of this section, the product assumption allows for an efficient method to reconstruct the individual one-variable interaction functions, only requiring a moderate amount of experimental data. Even if the product assumption is not strictly verified by the actual interactions, general biological considerations ensure that the product interactions would still describe their main features. In any case, the product form offers an explicit representation of the interactions, separating the different contributions (interaction with the wall, attraction/repulsion and alignment between individuals). The reconstructed one-variable interaction functions can be fitted by simple analytic forms (see next section) and the full interaction functions can then be straightforwardly implemented in an explicit and concise model whose predictions can be checked against experimental results, hence providing a further validation of the product assumption. In addition to ultimately producing models for the dynamics of animal groups, the simple and explicit form of the interactions allows for a precise analysis of the behavioural interactions at play in the system, and in particular, for the disentangling of their different components.

    4. Extraction and comparison of social interactions in different species of fish

    The model proposed in §3 is based on the assumption that social interactions are combined in an additive form (see step 3, figure 6). In equation (3.2), two functions δϕAtt and δϕAli were introduced to account for the attraction and alignment interactions, for which no a priori assumptions were made except that their dependence on the state variables dij, ψij and ϕij is decoupled; see equations (3.5) and (3.6). Steps 4.1–4.3 in figure 6 provide us with functional forms, but do not determine which functional form corresponds to which kind of interaction.

    To do that, the analysis of the relevant observables (step 2 in figure 6) is used to say that δϕ must change sign when ψij and ϕij both change sign, and that, consequently, the same must happen for δϕAtt and δϕAli. This way, the parity of the angular components of the interaction functions is univocally determined. Thus, to have an interaction of attraction, the fish must turn left (resp. right) if its neighbour is on its left (resp. right); that is, δϕ > 0 if ψij > 0 (resp. δϕ < 0 if ψij < 0). Assuming perfect left/right symetry, this exactly means that δϕAtt must be an odd function of ψij, and thus an even function of ϕij, provided fish do not have side preferences to turn (that is, fish do not prefer turning left to turning right). Similarly, to have an interaction of alignment, the fish must turn left when the relative heading of its neighbour is turned to the left, and turn right if it is turned to the right; that is, δϕAli must be an odd function of ϕij, and thus an even function of ψij.

    This allows us to rewrite the interaction functions from equations (3.5) and (3.6) as follows:

    δϕAtt(dij,ψij,ϕij)=FAtt(dij)OAtt(ψij)EAtt(ϕij)4.1
    and
    δϕAli(dij,ψij,ϕij)=FAli(dij)EAli(ψij)OAli(ϕij),4.2
    where function names O and E stand for ‘odd’ and ‘even’, respectively. The six unknown interaction functions are then tabulated on a grid with typically 30 boxes leading to effectively 180 fitting parameters, a number much smaller than the number of kicks experimentally recorded (2 × 105 for H. rhodostomus and 4 × 104 in D. rerio). Ultimately, these fitting parameters are determined by minimizing the corresponding error function (see §3).

    Figure 14 shows the social interaction functions reconstructed with the procedure described in the previous section in the case of two H. rhodostomus (figure 14ac) and two D. rerio (figure 14df) swimming in circular arenas. In both species, attraction and alignment have been detected and, although some important differences can be observed, each kind of interaction has essentially a similar shape and intensity, although the interaction ranges are very different. Hence, we used the analytical expressions introduced in [12] for H. rhodostomus (solid lines in figure 14) to fit the discrete values of D. rerio extracted from the experimental data in step 4 of our method (figure 6), that is, with the procedure described in §3 (points in figure 14).

    Figure 14.

    Figure 14. Analytical expressions of the social interaction functions of H. rhodostomus (a–c) and D. rerio (d–f) swimming in arenas of radius 0.25 and 0.29 m, respectively (solid lines), interpolating the discrete values extracted from the experimental data with the procedure described in §3 (dots). (a,d) Intensity of the attraction (red) and alignment (blue), modulated by the angular functions of (b,e) attraction and (c,f) alignment. (Online version in colour.)

    Figure 15.

    Figure 15. Iterations converging to (x*, x*). Thick red line: function f(x); thin brown line: y = x; thin polygonal: iteration process; coloured dots: successive values of xn. (Online version in colour.)

    These expressions are, for the intensity of the social interaction of attraction and alignment, as follows:

    FAtt(d)=γAtt(ddAtt1)11+(dlAtt)24.3
    and
    FAli(d)=γAliddAliexp[(dlAli)2],4.4
    where the values of the parameters depend on the species and on the size of the arena. Note that the expression for the intensity of the alignment has been simplified with respect to the one obtained in [12] and now has one less parameter. Here, γAtt and γAli are the (dimensionless) intensities of the attraction and alignment interactions, dAtt is the distance below which attraction changes sign and becomes repulsion, lAtt and lAli are the ranges of each interaction (the higher the value, the longer the range), and dAli = 1 BL is a characteristic length used to make γAli dimensionless. Having dimensionless factors γAtt and γAli allows direct comparison between intensities of social interactions and the intensity of the interaction with obstacles δϕw and the spontaneous decision term δϕR in equation (3.1). Table 1 shows the parameter values corresponding to each species. Note that the values for H. rhodostomus have been adapted from those in [12], according to the slightly different expression used here.

    Table 1. Parameter values for H. rhodostomus and D. rerio in circular arenas of radius 0.25 and 0.29 m, respectively.

    H. rhodostomus D. rerio
    R (m) 0.25 0.29
    R/BL 8.33 6.44
    γAtt 0.124 0.42
    dAtt (m) 0.03 0.015
    lAtt (m) 0.193 0.042
    γAli 0.092 0.32
    dAli (m) 0.03 0.045
    lAli (m) 0.16 0.1

    Regarding the normalized angular functions for D. rerio, we found the following expansions (with at most two Fourier modes in addition to the trivial zero mode):

    OAtt(ψ)=1.66sin(ψ)[10.77cosψ+0.6cos(2ψ)],EAtt(ϕ)=0.81[10.95cosϕ+0.12cos(3ϕ)],EAli(ψ)=0.54[1+cosψ2cos(2ψ)]andOAli(ϕ)=1.53sin(ϕ)[1+0.24cosϕ].

    Following step 5 of our methodology (figure 6), these analytical expressions should be implemented in the model introduced in §3. Then, numerical simulations of the model should be performed and compared with the known experiments, and predictions should be made, which must be verified a posteriori. This corresponds to step 6 of our methodology (figure 6); this has been done for H. rhodostomus in [12], and will be done elsewhere for D. rerio and other species.

    Comparing the interaction functions found for both species, we observe that they have a similar shape, especially the angular functions (see the angular functions of attraction in figure 14b, for which one can use the same function of the angle of perception ψij). The intensities of attraction and alignment have the same order of magnitude in both species: the maximum of the attraction is around 0.4 in both species, and the maximum of the alignment is around 0.2–0.3 (figure 14a).

    The most important difference between the two species is that the range of the interactions is much larger in H. rhodostomus than in D. rerio: in H. rhodostomus, the maximum intensities of attraction and alignment are around 7 and 3.5 BL respectively, while in D. rerio these maxima are both near 1.5 BL. Moreover, the intensity of these interactions decays more rapidly in D. rerio than in H. rhodostomus, especially with respect to the fish BL (in D. rerio, the alignment intensity is zero beyond 5 BL (approx. 22.5 cm), but it is still noticeable at a distance of 10 BL (approx. 30 cm) in H. rhodostomus). Attraction almost always dominates alignment in D. rerio (the intersection of the red and blue lines in figure 14a is at around 0.5 BL), while, in H. rhodostomus, alignment was found to dominate attraction at short distances (under 2.5 BL).

    In H. rhodostomus, a fish i is subject to a stronger attraction when the other fish j is at its right or left side (OAtt reaches its highest values when ψij ≈ ±90°; see figure 14b) and moves more or less perpendicular to it (EAtt is higher when ϕij90--100; see figure 14c). Alignment is stronger when the other fish is in front (EAli is higher when |ψij| < 80°). In D. rerio, attraction is stronger when the other fish is clearly behind the focal fish (OAtt is higher when ψij ≈ ±135°) and moves in the opposite direction (EAtt is higher when |ϕij| > 100°; see figure 14e,f).

    In both species, the strength of the alignment vanishes when fish are already almost aligned (OAli ≈ 0 when |ϕij| < 30°; see figure 14c,f). In D. rerio, the strength of the alignment is active essentially when both fish are perpendicular to each other (the high intensity of |OAli| ≈ 1.5 is reached when ϕij ≈ ±85°) and the focal fish has its neighbour at one of its sides (EAli peaks at ψij ≈ ±85°), while in H. rhodostomus, the ranges of interaction both in the angle of relative heading and in the angle of perception of the neighbour are much wider: alignment is active when the neighbour is ahead of the focal fish (|ψij| < 100°) and fish are simply slightly aligned (45° < |ϕij| < 135°).

    In summary, when swimming in pairs, H. rhodostomus interact in a much wider range of situations than D. rerio. This is true with respect to the three state variables of a focal fish: (1) the distance dij at which both attraction and alignment interactions are active is much larger in H. rhodostomus than in D. rerio; (2) the zone around a focal fish where the strength of the interactions with a neighbour is important is much wider in H. rhodostomus than in D. rerio, and (3) the same is true for the range of relative headings for which the intensity of interactions between fish is not negligible. Finally, the maximum intensity of the interaction is similar in both species, although high-intensity values are reached in a much wider range of situations in H. rhodostomus than in D. rerio.

    5. Discussion and conclusion

    Behavioural biology has recently become a ‘big-data science’ mainly supported by the advances in imaging and tracking techniques. These new tools have revolutionized the observation and quantification of individual and collective animal behaviour, improving to unprecedented levels the variety and precision of available data [9,2628]. As the access to large volumes of data is gradually stepping animal behaviour research into a new era, there is also a growing need for understanding interactions between individuals and the collective properties that emerge from these interactions. Animal societies are complex systems whose properties are qualitatively different from those of their individual members, and whose behaviours are impossible to predict from a priori knowledge of individuals [3,4]. However, understanding how the interactions between individuals in swarms of insects, schools of fish, flocks of birds, herds of ungulates, or human crowds give rise to the ‘collective level’ properties requires the development of mathematical models. These models allow investigation of how complex processes are connected, to systematically analyse the impact of perturbations on collective behaviour (e.g. when a predator is detected in the neighbourhood), to develop hypotheses to guide the design of new experimental tests, and ultimately, to assess how each biological variable contributes to the emergent group-level properties.

    We have presented a general methodology that leads to the measurement of the social interactions (defined in the framework of a model) from a set of individual trajectories. This procedure, illustrated here on two different species of fish, can be similarly applied on any set of trajectories of other organisms, including humans [29,30]. Once the experimental trajectories have been obtained, the extraction of interaction functions only makes sense after they have been defined in the framework of a general model for the equation of motion of an individual interacting with its environment (i.e. the obstacles and another individual). The model should involve the relevant variables regarding the interaction with obstacles (distance to the nearest wall, angle of the velocity with respect to the normal to the wall etc.) and another individual (distance between individuals, relative velocities, viewing angle etc.). In addition, each interaction component (repulsion, attraction, alignment etc.) is reasonably assumed to contribute additively: for instance, the influence of the wall and another individual on the focal individual is the independent sum of the two corresponding interactions. More importantly, the central hypothesis of our approach consists in assuming that the contribution of each interaction can be adequately described by a product of unknown single-variable functions of the relevant variables, or of a combination of these variables. The structure of the model is obviously also constrained by the considered species, their motion mode, and their anticipated interactions. For instance, for the fish species that have a burst-and-coast swimming mode, the dynamical model is intrinsically discrete in time and returns the angle change of an individual after each kick. For humans or some other fish species with a smooth swimming mode, a continuous-time model is necessary.

    The unknown interaction functions defined in the structure of the model are not constrained, and the aim of the extraction procedure (step 4 in figure 6) is to measure them, without any a priori assumption about their form or intensity. In order to achieve that, each unknown single-variable interaction function is tabulated on a one-dimensional grid, and their values at each grid point are the fitting parameters. These parameters are then determined by minimizing the mean quadratic error between the prediction of the model and the experimental angle changes after a kick (in the case of discrete dynamics) or the experimental acceleration (in the case of a continuous-time dynamics). This minimization process (box 1) is achieved by solving, for instance with an iterative method (box 2), the equations expressing the vanishing of the partial derivatives of the error with respect to each fitting parameter, as we have done here, or by gradient descent methods. Once the interaction functions have been obtained on their respective grid, they are fitted and represented by simple analytical forms that best capture their general shape (figure 14).

    The original general equation of motion is now complemented by these explicit interaction functions, leading to a concise and explicit model, which can be straightforwardly implemented numerically and can even be studied mathematically. The ability of the model to reproduce the experimental results and even to predict the behaviour of individuals in other situations not yet investigated experimentally can then be assessed. The original model can also be extended and the full extraction procedure repeated if an important feature appears to be missing. In particular, quantities like the probability distributions of the distance to the wall, of the distance between two individuals, of the angle between the two velocities or between the velocity and the normal to the wall, as well as other observables, permit the predictive power of the model to be assessed. Note that the interaction functions appearing in the model are determined by finding the best equation of motion describing the instantaneous decisions of the individuals. It is by no means trivial that this is enough for the resulting dynamical model to be able to reproduce observable quantities measured after averaging over many trajectories, or to predict the behaviour of individuals in different experimental conditions. A model able to achieve this certainly provides a convincing indication that its original and general structure and its extracted interaction functions properly represent and describe the behaviour and motion of the studied species.

    The main limitations of our methodology lie in the reasonable assumption of additive contributions for the different interactions, and more critically, in assuming that each of these interactions is the product of single-variable interaction functions. This is the cost to pay for only involving a limited number of fitting parameters, yet capturing a large part of the complex structure of these interactions, and also for ultimately obtaining a concise, explicit and exploitable model.

    To address these issues, we are planning in the near future to test our models on a robotic platform [31]. Such a platform could also allow us to study bidirectional interactions between robots reproducing the trajectories generated by our models and real fish, to obtain a more representative validation of our models of interactions [32].

    Yet, the advantages and benefits of our approach are numerous. First, the number of fitting parameters (typically 30 for each of the typically five to eight interaction functions), although apparently large (a total of typically 150–300 parameters), is in general much smaller than the number of data points in the available experimental trajectories (typically 104 for human groups [29], and 105 for fish [12]). In comparison, a complete force map in typically five or more dimensions (one dimension per relevant variable) on a mesh involving 305 boxes, with enough data points in each of them, would require millions if not billions of experimental data points, which is in general impossible to achieve. Two-dimensional force maps obtained after projection (i.e. averaging on the other 5 − 2 = 3 variables), although less noisy than the original force map, cannot be exploited to build a model, since it can be shown that they are strongly affected by the existing correlations between these variables along actual trajectories. For instance, if a fish is very close to a wall, there is a high probability that it swims parallel to the wall, so that its distance from the wall is strongly correlated with its heading angle. On the other hand, it can be shown [12] that our methodology to extract interaction functions is not at all affected by the likely correlations present in the system, and actually exploits them.

    Our method is also robust with respect to the presence of noise in the data (intrinsic behavioural noise or unwanted experimental noise), and can actually be used to measure the spontaneous fluctuations of the speed and heading angle of the individuals [12,30]. Moreover, the extraction of interaction functions requires very limited computing power, being obtained within a few seconds on a standard workstation. Ultimately, the analysis of the resulting interaction function allows general qualitative conclusions to be made about the interaction at play for the considered species. Force maps can also help in this analysis, but our approach allows for an even finer analysis thanks to the disentangling of interactions by means of separate and explicit interaction functions, instead of mere projected colour maps affected in an uncontrolled manner by the inherent correlations present in the system and the mixing of the different interaction contributions.

    More importantly, our methodology ultimately leads to a concise and explicit model that can be exploited to understand and explain diverse experimental features and various forms of collective behaviour, and that has a predictive power, while force maps cannot be directly exploited to build such explicit models. Note that the structure of the model should be robust for different species having comparable motion mode. For instance, this structure is the same for H. rhodostomus and D. rerio (or for any species with a burst-and-coast or run-and-tumble motion mode), and the behavioural differences between the species are solely and fully encoded in their different measured interaction functions.

    In the specific cases of H. rhodostomus and D. rerio, we have also found that the interaction functions characterizing their interaction with the wall are very similar (see [12] for H. rhodostomus), which explains their common tendency to swim close to the wall, especially when a fish swims alone in a tank. However, even if the interaction functions describing the repulsion/attraction and alignment interactions between two fish have a similar general structure and shape for both species (figure 14), the range of the attraction and alignment interactions is much shorter for D. rerio. In addition, the intensity of both interactions in D. rerio is strongly reduced when the focal fish is behind and hence follows the other fish (i.e. when |ψij|< 30°). Both features contribute to a much weaker coordination of the motion in groups of two fish in D. rerio, compared with H. rhodostomus, which can already be qualitatively noticed by observing recorded trajectories. This weaker coordination is quantitatively illustrated by the probability distribution of the distance between two fish (figure 14a,d), which is wider for D. rerio and extends up to much larger distances. Moreover, the probability distribution of the heading angle difference between the two fish is less peaked near ϕij = 0 in D. rerio (figure 14f). Finally, the weaker coordination observed in D. rerio results in a similar behaviour for the geometrical leader and follower, whereas the leader and follower have a very distinct behaviour in H. rhodostomus.

    6. Material and methods

    (a) Ethics

    Experiments with H. rhodostomus have been approved by the Ethics Committee for Animal Experimentation of the Toulouse Research Federation in Biology no. 1 and comply with the European legislation for animal welfare. Experiments with D. rerio were conducted under authorization by the state ethical board of the Department of Consumer and Veterinary Affairs of the Canton de Vaud (SCAV) of Switzerland (authorization no. 2778). During the experiments, no mortality occurred.

    (b) Study species

    Hemigrammus rhodostomus were purchased from Amazonie Labège (http://www.amazonie.com) in Toulouse, France. Fish were kept in 150 l aquariums on a 12 : 12 h, dark : light photoperiod, at 26.8°C (±1.6°C) and were fed ad libitum with fish flakes. The average BL of the fish used in the experiments was 31 mm. Sixty wild-type D. rerio with short fins (AB strain) were acquired from a pet shop, and stored in a 60 l aquarium. The average BL of the fish used in the experiments was approximately 4.5 cm. The water in the housing aquarium was kept at a temperature of 26°C. The fish were fed once per day with commercial food between 16.00 and 18.00. Additionally, we opted to use enrichment for the aquarium in the form of plastic plants, Cladophora, gravel, rocks and aquatic snails.

    (c) Experimental procedures and data collection

    The experimental tank (120 × 120 cm) used to investigate swimming behaviour in H. rhodostomus was made of glass and was set on top of a box to isolate fish from vibrations. The set-up, placed in a chamber made by four opaque white curtains, was surrounded by four LED light panels giving isotropic lighting. A circular tank of radius R = 25 cm was set inside the experimental tank filled with 7 cm of water of controlled quality (50% water purified by reverse osmosis and 50% water treated by activated carbon) heated to 26.1°C (±0.3°C). Reflections of light due to the bottom of the experimental tank were avoided thanks to a white PVC layer. Each trial started by placing two fish randomly sampled from their breeding tank in a circular tank. Fish were left for 10 min to habituate before the start of the trial. A trial consisted in 1 or 3 h of fish freely swimming (i.e. without any external perturbation) in a circular tank. A total of 16 trials were performed. Fish trajectories were recorded by a Sony HandyCam HD camera filming from above the set-up at 50 Hz (50 frames per second) in HDTV resolution (1920 × 1080 pixels).

    In the case of D. rerio, the experimental set-up had dimensions of 100 × 100 × 25 cm, inside which a circular tank of radius R = 29 cm was placed. The sides and bottom part of the tank were covered with a Teflon plate to avoid reflections. Furthermore, the set-up was confined behind white sheets to isolate the fish from external stimuli in the room, while also maintaining a consistent lighting environment inside the set-up bounds. A uniform luminosity for the room was provided by four 110 W fluorescent lamps placed at each of the four sides of the tank. Prior to placing fish in the experimental set-up, we ensured that the height of the water was 6 cm. Then, 10 videos of 70 min duration each were recorded in the circular tank. Subsequently, a group of fish was randomly selected and caught from the rearing tanks to participate in the experiment. A pair of fish was then chosen and placed in the set-up. The fish were allowed to habituate for 5 min before starting the 70 min long recording. After a single experiment was completed, the fish were returned to the original rearing tank without being re-inserted in the selected group (i.e. no individual was used twice in the same day). The positions of fish on each frame were tracked with idTracker 2.1 [8]. Time series of positions were converted from pixels to metres and the origin of the coordinate system was set to the centre of the ring-shaped tank. Tracking errors (approx. 20% of the data) were corrected and instances where at least one fish moved less than 0.5 BL per second during 4 s were removed. More than 8 h remained during which one fish could kick.

    We provide the dataset of each species in Figshare: https://doi.org/10.6084/m9.figshare.9933356.v1.

    (d) Segmentation of trajectories

    Instants of kicks were identified as local minima of the velocity preceding local maxima (which are more easy to identify), along time intervals [ttw, t + tw], where tw is a time window of 0.32 s. Trajectories are thus considered as a sequence of kicks. Fish almost never kick at the same time (asynchronous kicks); the position of the other fish is calculated at each instant of kick of the focal fish by simple linear interpolation.

    (e) Fortran code

    We provide the code to perform the analysis described in §3b, together with a simple script for quick visualization of the result in the gnuplot environment and the required data in Figshare: https://doi.org/10.6084/m9.figshare.11777325.

    Data accessibility

    Supporting data are available from the Figshare Repository: https://doi.org/10.6084/m9.figshare.9933356. We provide the code to perform the analysis described in §3.2, together with a simple script for quick visualization of the result in the gnuplot environment and the required data in Figshare: https://doi.org/10.6084/m9.figshare.11777325.

    Authors' contributions

    R.E., C.S. and G.T. conceived and designed the study. R.E., V.L., V.P. and F.B. performed research; R.E., V.L., C.S. and G.T. analysed data. R.E., C.S. and G.T. wrote the paper with an input from all other authors.

    Competing interests

    We declare we have no competing interest.

    Funding

    This work was supported by the Germaine de Staël project no. 2019-17 and the Swiss National Science Foundation project ‘Self-Adaptive Mixed Societies of Animals and Robots’, grant no. 175731. R.E. was supported by Marie Curie Core/Program Grant Funding, grant no. 655235-SmartMass. V.L. was supported by doctoral fellowships from the scientific council of the Université Paul Sabatier.

    Acknowledgements

    G.T. gratefully acknowledges the Indian Institute of Science to serve as Infosys visiting professor at the Centre for Ecological Sciences in Bengaluru. We thank Gérard Latil for technical assistance.

    Footnotes

    Endnotes

    1 The average function of a(x, y) with respect to the variable x is given by ax(y)=xp(x,y)a(x,y)dx, where p(x, y) is the probability of occurrence of the state (x, y). However, force maps calculate the 〈ax(y) as the mean of the values of a(x, y) over all the values of x, i.e. ax(y)=xa(x,y)/x1, as if p(x, y) = 1 for all (x, y), because knowing p(x, y) is part of the problem.

    2 The function f must be contractive and the initial value of the iterations must be sufficiently close to the fixed point.

    One contribution of 14 to a theme issue ‘Multi-scale analysis and modelling of collective migration in biological systems’.

    Published by the Royal Society. All rights reserved.

    References