Journal of The Royal Society Interface
Review article

Automated bioacoustics: methods in ecology and conservation and their potential for animal welfare monitoring

Michael P. Mcloughlin, Rebecca Stewart and Alan G. McElligott

Michael P. Mcloughlin and Rebecca Stewart: Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Campus, London, UK

Alan G. McElligott: Centre for Research in Ecology, Evolution and Behaviour, Department of Life Sciences, University of Roehampton, London, UK

Correspondence: [email protected] (M.P.M.); [email protected] (A.G.M.)
    Abstract

    Vocalizations carry emotional, physiological and individual information. This suggests that they may serve as useful indicators for inferring animal welfare. At the same time, automated methods for analysing and classifying sound have developed rapidly, particularly in the fields of ecology, conservation and sound scene classification. These methods are already used to classify animal vocalizations automatically, for example in identifying animal species and estimating the numbers of individuals. Despite this potential, they have not yet found widespread application in animal welfare monitoring. In this review, we first discuss current trends in sound analysis for ecology, conservation and sound classification. Following this, we detail the vocalizations produced by three of the most important farm livestock species: chickens (Gallus gallus domesticus), pigs (Sus scrofa domesticus) and cattle (Bos taurus). Finally, we describe how these methods can be applied to monitor animal welfare, with new potential for developing automated methods for large-scale farming.

    1. Introduction

    Bioacoustics is the study of the production, transmission and reception of animal sounds. This includes not only the vocalizations of animals such as birds and mammals [1–3], but also the sounds that can be produced by insects [4,5]. In ecology, the automated analysis of animal sounds can be used for individual animal detection [6], species detection [7,8], animal location detection [9–11] and population monitoring [6,12–14]. In conservation, it is useful when verifying whether human activities such as shipping or seismic survey vessels affect wild animal behaviour [15–19]. Vocalizations of some species such as goats (Capra hircus) and horses (Equus caballus) also differ during positive and negative experiences [20–23].

    Methods in bioacoustics are becoming increasingly automated, with researchers deploying autonomous recorders that are capable of automatically collecting data [24–26]. The automated analysis of sound has also been applied to tasks such as speech recognition [27]. This is easily the most well-known application of audio analysis, and it is found on every smartphone today [28,29]. Outside of speech recognition, computer scientists have focused their attention on the classification of ‘sound scenes’ (the type of environment an audio recording was collected in, such as a street or the inside of a bus), and of ‘sound events’ (for example, identifying whether a car has passed by) [30].

    Most animal welfare research to date has focused on reducing negative experiences for animals. This involves improving environmental factors such as housing [31–33], lighting [34] and stocking density [35–37], reducing aggression [38–40], and preventing injury and disease [41]. Assessing animal welfare can be difficult, but is usually achieved using some type of scoring method indicative of negative experiences [41–43] or through physiological assessment of the animal to identify conditions such as hock burn in poultry [44]. While these factors are important for monitoring the physiological welfare of the animals, it is now accepted that good animal welfare should not only involve protection from negative experiences, but also the inclusion of positive ones [45–48]. More recently, technologically advanced methods such as thermal imaging use infrared cameras to measure variation in blood flow and body temperature, providing a non-invasive means of monitoring heat loss, and thus discomfort and risk of illness [49].

    Animal welfare assessment and monitoring could benefit from increased use of automated methods [50,51]. One area in particular that shows promise is the automated analysis of the vocalizations that animals produce for monitoring their health and welfare. While ecology and conservation are rapidly adopting advanced audio analysis methods for monitoring animal populations [7,52,53], the uptake of these methods in animal welfare has been slow and limited. This is despite previous research discussing the benefits of bioacoustic monitoring for animal welfare [54], and research projects investigating common livestock vocalizations that have highlighted the potential of their methods for animal welfare monitoring [55,56]. The main goal of this review is to present advanced computational audio analysis methods that are already used in ecology, conservation and animal cognition research, and to discuss how they may be applied to monitoring negative and positive animal welfare in agricultural settings. Applications in speech processing and in sound scene analysis and classification are also discussed, because these fields implement the most technically advanced methods overall.

    Herein, we first outline how to extract meaningful information from audio recordings through the process known as acoustic feature extraction. We also introduce methods being deployed in ecology and conservation that implement the most technically advanced algorithms for analysing animal sounds. We then discuss the function of vocalizations in some of the most common farmed livestock (chickens, Gallus gallus domesticus; pigs, Sus scrofa domesticus; and cattle, Bos taurus), and the potential application of the new methods that could be implemented for automated monitoring of animal welfare. Chickens and pigs are highly vocal species [57–60] that are likely to be particularly suitable for these methods. Finally, we close the review by discussing the most pressing challenges facing bioacoustics in welfare and the future direction of the field.

    2. Literature collection methodology

    The literature was collected using the Web of Science and Google Scholar search engines. While the field of automated bioacoustic monitoring is in its infancy regarding animal welfare, bioacoustics in ecology and electronic engineering is advancing rapidly, resulting in a large body of literature. In order to narrow down the literature search, and reflect the cutting edge of the field, we restricted our search to papers published in the past 5 years, from January 2013 to June 2018. The following keywords were used: bioacoustics; ecoacoustics; animal names in English and Latin (chickens, Gallus gallus domesticus; pigs, Sus scrofa domesticus; and cattle, Bos taurus); sound scene classification; sound event detection and classification. Searches were run on individual keywords and on Boolean combinations of them. For the farm livestock discussion, we restricted our searches to some of the most common livestock (chickens, pigs and cattle), because they are also highly vocal [50,61–63] and farmed in large numbers on an industrial scale. The chosen published studies on livestock species are used to illustrate key aspects of their vocalizations relevant to this review. The authors identified literature that deployed techniques that could be adapted for animal welfare, such as call identification, density estimation, species identification and physiological information detection. The authors omitted papers on fish, insect and amphibian bioacoustics. Methods involving multimodal data are not covered, in order to keep the focus on audio methods. The total number of papers in this review is 149, of which 66 were published before 2013. Pre-2013 papers are included either because they illustrate a particular aspect of bioacoustics well or because information on the topic from the past 5 years has been scant.

    3. Audio feature extraction

    After completing data collection, the first step in analysing audio recordings is to extract meaningful information from the signal. This process is commonly termed audio feature extraction [64]. There are several methods for extracting audio features from a signal, and identifying what type of features should be used can be viewed as a research task in itself [65,66]. While these methods can be carried out in the time domain, the majority of algorithms operate in the time–frequency domain. In order to transform a signal from the time domain (the raw audio samples stored in an array, or some other format) to the time–frequency domain, it is necessary to carry out what is known as a discrete Fourier transform (DFT) [67]. In its simplest form, a Fourier transform breaks down a signal into a number of sinusoidal functions, each with its own frequency, phase and amplitude. Once a signal is converted to the frequency domain, using an implementation of the DFT called the fast Fourier transform (FFT), it is possible to extract a number of acoustic features. The most common of these are mel frequency cepstrum coefficients (MFCC), which gained considerable attention because of their success in human speech recognition algorithms [68]. This trend has been noted in reviews of the Detection and Classification of Acoustic Scenes and Events (DCASE) competition, where mel-based feature extraction methods were the most popular in classification and detection tasks [30].

    The report on the DCASE challenge also noted that recent trends in environment classification have implemented a variety of deep learning methods. A simple definition of deep learning refers to supervised and unsupervised machine learning algorithms that carry out a variety of tasks (such as classification, data generation, translation and prediction) using very large datasets (big data) and large neural networks [69]. A useful comparison of deep learning methods for environmental sound detection is given in [70]. In audio applications, the mel spectrogram has been the most common input to deep learning networks, although researchers are investigating the potential of raw audio samples as input [71,72].

    Linear predictive coding, a model inspired by the source-filter theory of speech [73], analyses a sound in order to estimate filter coefficients that approximate the spectral envelope of the original signal. The fundamental frequency of a signal is the lowest voiced harmonic in that signal [73]. There are many other acoustic features that have been applied to the analysis of music recordings. These include spectral flux, which measures the change in magnitude across all frequency bins and has been used as an onset detection function (for example, detecting the start of a piano note) [74]. The spectral centroid has been used as a feature for describing the ‘brightness’ of a sound, making it useful when characterizing timbre [75]. Spectral flatness is a common method in speech analysis for detecting how noisy a signal is. Zero crossing rate examines how often an audio signal crosses the zero axis and is useful for detecting voices in noisy environments. While an exhaustive description of every acoustic feature and parameter is beyond the scope of this review, we have summarized the advantages and disadvantages of some of the most common audio features and parameters in table 1.

    Table 1. Common audio feature extraction algorithms. Each row corresponds to a different algorithm, with the first column giving the name of the feature, the second column some of the advantages associated with the method and the third column giving some disadvantages.

    feature name | advantages | disadvantages
    mel frequency cepstrum coefficients | Available in most software packages. Successfully implemented in many speech and birdsong studies. The popularity of the algorithm means it is well optimized and fast. | Susceptible to interference from background noise.
    linear predictive coding | Represents the spectral envelope of a signal and is based on the source-filter model, making it relevant to many animal vocalization studies. | Does not perform well with sounds outside of the formant range.
    mel spectrogram | Commonly used for deep learning algorithms. It is a spectrogram that has been mapped to the mel scale. | While suitable for many deep learning algorithms, it is not practical for many classic machine learning algorithms.
    fundamental frequency | The lowest partial in a signal after carrying out Fourier analysis. Associated with the concept of ‘pitch’. Used in several animal studies. Easier to conceptualize than some other features. | High computational cost.
    spectral centroid | Associated with the ‘brightness’ of a sound. Used in music research as a method for timbre analysis. | Typically combined with other audio features. Not often the only parameter measured in a signal.
    spectral flux | Associated with timbre. Has been useful for identifying percussive sounds in music. | Typically combined with other audio features. Not often the only parameter measured in a signal.
    spectral flatness | Useful for detecting how noise-like or tone-like a signal is. | Typically combined with other audio features. Not often the only parameter measured in a signal.
    zero crossing rate | Analyses how frequently a signal crosses the zero axis. Has been used to detect voices in noisy environments and also to detect percussive sounds in music. | Typically combined with other audio features. Not often the only parameter measured in a signal.
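    To make the feature extraction step concrete, the following is a minimal sketch (not taken from any of the cited studies) showing how several of the features in table 1 could be computed with the open-source librosa Python library; the file name, FFT parameters and pitch-search range are illustrative assumptions.

```python
import librosa
import numpy as np

# Load a recording (file name is hypothetical); sr=None keeps the original sample rate.
y, sr = librosa.load("hen_vocalisation.wav", sr=None)

# Time-frequency representations built on the FFT.
mel_spec = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512)
log_mel = librosa.power_to_db(mel_spec)           # common input to deep learning models
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Spectral and temporal summary features from table 1.
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
flatness = librosa.feature.spectral_flatness(y=y)
zcr = librosa.feature.zero_crossing_rate(y)

# Fundamental frequency estimate (YIN); the search range is an assumption.
f0 = librosa.yin(y, fmin=100, fmax=4000, sr=sr)

# Summarize frame-wise features into a single vector per recording,
# a common step before classic machine learning.
feature_vector = np.concatenate([
    mfcc.mean(axis=1), centroid.mean(axis=1),
    flatness.mean(axis=1), zcr.mean(axis=1), [np.nanmean(f0)]
])
print(feature_vector.shape)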

    In supervised machine learning tasks, audio features are usually combined with other data, such as the name of the species and the location in which it was recorded [76]. In machine learning, these labels are often called ‘classes’ and the combined classes are referred to as the ‘taxonomy’. Labelling data can be a challenging task [77] because it requires expert knowledge of the data, is time consuming and can be subject to human error. Some researchers use citizen science programmes to assist in annotating recordings [7]. These annotations are highly important, as they are required for supervised machine learning tasks. A major setback in applying the methods discussed in this review is the lack of well-labelled open source databases for common farm animals. Creating such a database is non-trivial, because recording animal vocalizations is a challenging task in itself. Finally, the creation of a database requires a human to accurately label each individual vocalization, which means the database will be subject to some degree of human error. After extracting a feature, variation in the duration of a signal can also affect analysis. One method for adjusting the length of a signal is dynamic time warping. An excellent example of its application is its use in comparing individual units of vocalizations in birds [78]. It has also been used to identify similarities between speech recordings in which an individual speaks at different speeds [79].
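    As an illustration of how dynamic time warping can compare two calls of unequal duration, the sketch below aligns their MFCC sequences using librosa's DTW implementation; the file names are hypothetical, and this is not the procedure used in [78] or [79].

```python
import librosa

# Two calls of different durations (file names are hypothetical).
y1, sr1 = librosa.load("call_a.wav", sr=None)
y2, sr2 = librosa.load("call_b.wav", sr=None)

# Represent each call as a sequence of MFCC frames.
X = librosa.feature.mfcc(y=y1, sr=sr1, n_mfcc=13)
Y = librosa.feature.mfcc(y=y2, sr=sr2, n_mfcc=13)

# Dynamic time warping aligns the two sequences despite their differing lengths.
# D is the accumulated cost matrix, wp the optimal warping path.
D, wp = librosa.sequence.dtw(X=X, Y=Y, metric="euclidean")

# The path-normalized accumulated cost is a simple dissimilarity score between the calls.
print("alignment cost:", D[-1, -1] / len(wp))
```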

    4. Automated acoustic monitoring in ecology and conservation

    Bioacoustic monitoring in ecology and conservation is an extremely challenging task, and the relationship between an ecosystem and audio recorded from it is still not fully understood [80,81]. Here we outline methods that have been developed over the past 5 years to investigate a variety of topics in ecology and conservation. Bioacoustic analysis has proven especially useful in environments that are naturally hostile to humans and where visibility is low, such as marine [15,82,83] and tropical [52,84–86] ecosystems. Acoustic monitoring can also be useful in detecting nocturnal animals such as bats [7,12]. The concept of a hostile environment can be extended to animal production facilities, which have been shown to be associated with increased risk of respiratory diseases in humans [87]. Automated acoustic monitoring can help reduce the amount of time that humans have to spend in potentially dangerous environments, and aid farmers in monitoring animal health and welfare. It also allows animals to be monitored at night, when workers may not be available and visibility is low. The interdisciplinary and highly technical nature of the field requires researchers to be familiar with digital signal processing, mathematics, machine learning and ecology. This can make it difficult for people with backgrounds in animal behaviour and welfare, as well as veterinary science, to navigate the literature discussed in this review. To address this issue, we designed the decision tree shown in figure 1 to aid researchers in selecting papers with which to begin their own investigations into the field.

    Figure 1. A decision tree to help researchers identify bioacoustics studies relevant to animal disease status, location detection, physiological information, number of animals and species detection.

    Torti et al. [88] implemented a method known as the Acoustic Complexity Index to estimate the number of lemurs (Indri indri) taking part in a choral display in a tropical environment. They found that relatively simple spectrographic analysis was sufficient for identifying up to three singers, but for larger numbers of animals the Acoustic Complexity Index [89] performed well, correlating positively with the number of animals in the environment. Other investigations have found that acoustic indices (mathematical descriptions of sounds, similar to audio features) can accurately detect the number of biological sounds in terrestrial recordings, but they perform poorly in marine recordings [90]. The same research noted that the performance of acoustic indices was negatively affected by noise from insects, weather and anthropogenic sounds.
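    To illustrate the idea behind such indices, the following is a simplified sketch of the Acoustic Complexity Index computed from a magnitude spectrogram; it omits the temporal clumping used in the full formulation of [89], so its values will differ from published implementations, and the file name is hypothetical.

```python
import numpy as np
import librosa

def acoustic_complexity_index(y, sr, n_fft=1024, hop_length=512):
    """Simplified Acoustic Complexity Index over a whole recording.

    For each frequency bin, the summed absolute change in intensity between
    adjacent frames is divided by the total intensity in that bin; these
    ratios are then summed over all bins.
    """
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    diff = np.abs(np.diff(S, axis=1)).sum(axis=1)   # change over time, per bin
    total = S.sum(axis=1) + 1e-12                   # avoid division by zero
    return float(np.sum(diff / total))

# Hypothetical usage on a field recording.
y, sr = librosa.load("dawn_chorus.wav", sr=None)
print(acoustic_complexity_index(y, sr))
```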

    There is recent evidence to suggest that acoustic monitoring can be used to infer individuality, behaviour and morphology in animals. In a study of African penguins (Spheniscus demersus), discriminant function analysis (DFA) applied to acoustic parameters extracted from recordings of calls allowed 12 individuals to be identified 62–78% of the time [91–93]. When implementing leave-one-out cross-validation, the accuracy of the DFA was 66%. DFA has also been applied to the study of three different crane species, investigating how fledglings increase the nonlinearity of their calls as they grow older so as to avoid habituation of parents to their vocalizations [94]. It achieved an accuracy of 73% for animals aged 3–45 days old, and 79% for animals aged 83–183 days old. However, it should be noted that DFA does not account for spectral or temporal features that may also be important in determining individuality. In fallow deer (Dama dama), lower frequency groans correlate with larger animal size, and indirectly with the individual's social status [3]. In goats (Capra hircus), feed-forward artificial neural networks have been used to classify calls according to individual identity, group membership and maturation [95]. Contact calls (n = 321) from 11 individuals were collected, and 27 acoustic features were extracted from each call. Each input node corresponded to a different acoustic feature. The study achieved 71% accuracy for vocal individuality, 29% for social group and 91% for age.
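    Discriminant analysis of per-call acoustic features can be reproduced with standard statistical software. The sketch below uses scikit-learn's linear discriminant analysis with leave-one-out cross-validation, mirroring the validation scheme described above; the feature matrix and identity labels are randomly generated placeholders rather than data from any of the cited studies.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical data: one row of acoustic features per call (e.g. duration,
# fundamental frequency statistics, formants) and the identity of the caller.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 8))          # 120 calls x 8 acoustic features
y = rng.integers(0, 12, size=120)      # 12 individuals

# Linear discriminant analysis with leave-one-out cross-validation.
clf = LinearDiscriminantAnalysis()
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("mean classification accuracy:", scores.mean())
```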

    A challenge faced by many of these methods is that they often require labelled datasets. For example, a researcher may have to manually annotate which sounds occur in a recording in order to implement supervised learning methods. One way of addressing the issue of unlabelled data is to apply unsupervised analysis methods to infer information such as diversity from recordings. Ulloa et al. [96] developed a method called multiresolution analysis of acoustic diversity, which first identifies regions of interest in recordings using the short-time Fourier transform. These regions were characterized by extracting the median frequency and applying two-dimensional wavelet analysis, and then automatically annotated using a clustering technique. Another approach to handling poorly labelled datasets is to automatically annotate and label them by breaking audio transcription into multiple intermediate tasks, such as when events occur and to which class they belong [97]. Morfi & Stowell [97] achieved this by training two types of neural network (a stacked convolutional neural network and a recurrent neural network) using three different training methods: separate training (identifying when an event occurs and what class it belongs to are trained separately); joint training (the networks share a convolutional part, and the network outputs both when an event occurs and to what class it belongs); and tied weights training. Tied weights training aims to combine the benefits of separate and joint training by having a shared convolutional part, but unlike joint training, different types of input can be used to train each task. Their results showed that tied weights training outperformed joint training, but that separate training still outperformed both.

    In marine mammal science, the most common method of determining the location of an animal is known as passive acoustic sonar. Passive acoustic sonar implements an array of evenly spaced microphones that records the sound of an individual, and then calculates the difference in the time of arrival of this vocalization between all microphones in order to triangulate the location [82,98–102]. The combination of detecting species and animal location is often referred to as passive acoustic monitoring [52,53,98].
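    The core of such localization methods is estimating the time difference of arrival between pairs of microphones. A minimal sketch using cross-correlation is shown below; the synthetic click, sample rate and speed of sound are illustrative assumptions, and real deployments intersect estimates from several microphone pairs.

```python
import numpy as np
from scipy.signal import correlate

def time_difference_of_arrival(sig_a, sig_b, sr):
    """Estimate how much later (in seconds) a sound arrives at microphone B than at A."""
    corr = correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(corr) - (len(sig_a) - 1)   # lag in samples (positive = B later)
    return lag / sr

# Hypothetical example: the same click arrives 5 ms later at microphone B.
sr = 48000
t = np.arange(sr) / sr
click = np.exp(-((t - 0.5) ** 2) / 1e-6)
mic_a = click
mic_b = np.roll(click, int(0.005 * sr))

tdoa = time_difference_of_arrival(mic_a, mic_b, sr)
# Multiplying the delay by the speed of sound (~343 m/s in air, ~1500 m/s in water)
# gives the path-length difference; intersecting the resulting hyperbolae from
# several microphone pairs triangulates the source position.
print("estimated delay:", tdoa, "s; path difference:", tdoa * 343, "m")
```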

    5. Detecting emotion

    The term emotion is a challenging one in animal behaviour science because of the several different descriptive and prescriptive definitions found in the literature [103]. Some researchers describe emotions using the valence and arousal model [104], a dimensional model that conceptualizes emotions in terms of positivity and negativity (valence) and the intensity of the state, ranging, for example, from contentment to elation (arousal). This model can be assessed using judgement bias tests [105]. Other researchers refer to more specific systems, such as the anxiety–depression continuum [106]. In this review, we specify which system was used in each study.

    Briefer et al. [20] investigated the relationship between emotional state and vocalizations in goats (Capra hircus) by recording the physiology of the animals (e.g. heart rate variability) using a bio-harness, along with sound recordings. Recordings were made when the animal was placed in four situations designed to evoke different states of arousal and valence (control, negative food frustration, negative isolation and positive food anticipation) [20,104]. Vocalizations produced during these different emotional states showed that goats uttered calls with a lower fundamental frequency and a lower level of frequency modulation when placed in positive situations compared with negative ones. This study highlights how we can infer the emotional state of animals from their vocalizations, and thus whether they are having positive experiences during their lives, but the methods used to identify this have not been automated. Automation could be achieved through some of the classification methods discussed in the ecology section above. For example, it would be possible to apply call identification algorithms such as those used in [107] to identify distress vocalizations in chickens, pigs and cattle. Outside of ecology, several investigations have been carried out into determining emotional state from recordings of human speech [108–110]. In one, four basic human emotions (happiness, anger, fear and neutrality) were classified by analysing changes in the vowel regions of speech, focusing on the fundamental frequency and the first three formants of the signal. These features were then classified using a support vector machine, which achieved the best results at classifying happiness and the poorest results when classifying fear. Another approach focused on selecting features for the classification of emotions by using a small database of speech signals with emotional labels and a high number of acoustic features [110]. These were then combined with decision tree classification and random forests in order to classify the speech sounds. These methods could also be used to identify animal vocalizations associated with welfare, but would require a well-labelled dataset of sounds associated with positive and negative welfare.
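    As an illustration of the classification step described above, the sketch below trains a support vector machine on a small set of per-call features (for example, fundamental frequency and formant measures); the feature matrix and the positive/negative labels are hypothetical placeholders, not data from the cited studies.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: one row per call, columns such as mean
# fundamental frequency and the first three formants; labels are the
# emotional context in which each call was recorded.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = rng.choice(["positive", "negative"], size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling followed by an RBF-kernel support vector machine.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```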

    6. Anthropogenic noise

    The effect of anthropogenic noise on animals [15,111–114] is a key topic in bioacoustics research. Such noise usually results from vehicles and has been shown to have a negative effect on animal foraging [113]. Researchers have noted that noise can also interfere with data collection itself, for example where background noise interferes with acoustic methods to determine the number of animals taking part in a choral display [88,90] or with the application of acoustic indices to monitoring biodiversity [90]. This is one of the major challenges bioacoustics faces in terms of its application to animal welfare. Animal housing often relies on ventilation systems for maintaining air quality [115], which produce noise and interfere with data collection. Bioacoustic researchers should look towards the fields of speech and music analysis, which are developing methods to separate different sound sources in audio recordings [116]. Noise on farms has also been highlighted as a major concern for the welfare of farm workers [117], and acoustic monitoring provides a method by which it could be monitored and thus controlled. In marine mammals, it has been suggested that noise from shipping has elicited a change in the vocalizations of humpback whales [17], requiring them to switch from primarily vocal acoustic displays to surface-active displays such as breaching. For this reason, it is important for welfare researchers to be aware of other sounds in animal production environments, as they may influence the vocalizations they are trying to monitor.

    7. Discussion of livestock vocalizations

    In order to link the discussion back to animal welfare, it is necessary to provide some information on the bioacoustics of some of the major farm livestock species, including their call functions, what information their vocalizations may carry and what previous studies have revealed.

    7.1. Chickens

    The repertoire of chickens was first described by Collias & Joos [58], who identified different vocalizations specific to the age and sex of the animal. For chicks, they identified pleasure chirps, distress chirps and fear trills. Pleasure chirps consist of short ascending vocalizations, distress chirps of short descending sounds and fear trills of rapidly modulating vocalizations. In adults, they identified parental calls, so named because they are used to attract chicks. These included clucking (repeated vocalizations with a low-frequency content) by a broody hen to stimulate the chicks to follow her, and calls to let the chicks know there is food nearby. They also identified a roosting call: when a broody hen is settled for the night and does not have her chicks underneath her, she emits a long, low purring sound. This sound is stimulated by distress calls from chicks and the onset of darkness. Broody hens also produce alert calls whenever a person approaches; the chicks respond by ceasing their activities and remaining still. Finally, broody hens produce fear squawks whenever they are held by a labourer or researcher. Adult males produce two different types of warning call that distinguish between predators located on the ground and predators located in the air. The repertoire of red jungle fowl (the ancestor of domestic chickens) was also analysed, and the general vocalizations and behaviour of poultry and jungle fowl were noted to be the same [118]. As the animals grow, their vocalizations change, and it is possible to predict this change over time [56].

    Research has elicited both ground and aerial chicken alarm calls using visual stimuli presented on a video monitor [119]. Research has also identified other behaviours associated with the different types of alarm call. For example, after hearing aerial alarm calls, hens are more likely to run towards areas with cover. Both alarm call types increase rates of horizontal scanning, but hens are more likely to look upwards following aerial alarm calls. This shows that chicken alarm calls are functionally referential. The same question has been investigated in food calls [120]. Male chickens are more likely to produce food calls when a female is present [121], meaning that these calls depend on both food and social context. Two playback experiments were carried out to determine their function. In the first, isolated hens were played food calls, and their behavioural responses were compared with their responses to ground alarm calls and contact calls. Food calls resulted in the hens fixating their view downwards. This type of behaviour was not observed with the other calls, suggesting that food calls provide hens with information about the presence of food.

    Domestic fowl vary their vocalizations when they are anticipating different types of rewards [62]. Calls in the McGrath et al. [60] study were first manually classified, and then subjected to classification and regression tree (CART) and random forest analysis. The CART and random forest analyses were used to identify the call repertoire in anticipation of rewards and during frustrative non-reward. The results revealed that chickens produce different call types in anticipation of different types of rewards, and the acoustic analysis showed that the peak frequency of these calls varied depending on the reward. This work is also an excellent example of how methods from ecology are already influencing animal welfare research, as this decision tree method was originally used as a labelling convention to identify the repertoire of social sounds in humpback whales [122].

    Sufka et al. [106] investigated the relationship between chicken distress vocalizations and the anxiety–depression continuum over time. This research was carried out to validate a chicken model of depression–anxiety for use in clinical drug trials as an alternative to rodent models, but nevertheless provides insights into the relationship between vocalizations and emotions in chicks. Socially raised chicks were separated from conspecifics and initially produced distress vocalizations. The rate of production of these vocalizations was most intense at the onset of separation and then began to decline. Three temporally sequential phases were suggested from these results (an anxiety-like stage, a transitional phase and finally a depressive stage). Socially separated animals displayed higher rates of production of stress vocalizations, and higher levels of the stress-associated hormone corticosterone, which peaked during the anxiety-like stage.

    There have also been spectral approaches to the analysis of chicken vocalizations associated with respiratory disease [123]. Sick chickens produce a vocalization known as a rale, a type of sound only produced when they are infected with respiratory diseases. Rales were detected using sparse spectrogram decomposition, a method in which audio recordings of the animals are first divided into one-minute segments. A spectrogram is generated from each segment, and any frequency content not associated with the respiratory system of the animals is discarded. This is then used to generate a sparse coefficient matrix, which is essentially a matrix based on the spectrogram but with very few non-zero elements. The coefficient matrix is then summed in order to create a feature vector, and this is carried out for each segment of audio to create a dictionary of vectors. These dictionaries corresponded to recordings of a healthy flock and of a flock infected with respiratory disease. The labels and vectors were used to train a support vector machine, which learned to distinguish between the healthy and unhealthy flocks. Another algorithm detected rales by labelling spectrograms from 8 min of audio selected from 25 days of continuous recordings [124]. MFCC vectors were then extracted, clustered in order to examine their distribution over a window of time, and classified using a decision tree. Another group of birds was infected, and the researchers were able to track the course of the disease using the trained decision tree. These studies focus on animal health and welfare, but their methods are more inspired by research in electronic engineering than by conservation, ecology and behavioural studies. It may nevertheless be possible to use these methods to examine other issues related to animal welfare, such as detecting pain calls in pigs [60,125].
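    A sketch loosely following the second approach [124] is given below: MFCC frames are clustered, each recording is summarized as a histogram of cluster assignments, and a decision tree separates healthy from infected flocks. This is not the authors' implementation, and the file names, number of clusters and labels are assumptions.

```python
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def mfcc_frames(path):
    """MFCC frames (time x coefficients) for one recording."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def cluster_histogram(path, kmeans):
    """Summarize a recording as a normalized histogram of MFCC-frame cluster assignments."""
    assignments = kmeans.predict(mfcc_frames(path))
    hist = np.bincount(assignments, minlength=kmeans.n_clusters).astype(float)
    return hist / hist.sum()

# Hypothetical recordings of healthy and infected flocks (file names are placeholders).
healthy_files = ["healthy_01.wav", "healthy_02.wav"]
infected_files = ["infected_01.wav", "infected_02.wav"]

# Cluster MFCC frames pooled from all training recordings.
all_frames = np.vstack([mfcc_frames(f) for f in healthy_files + infected_files])
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(all_frames)

# One histogram feature vector per recording; 0 = healthy, 1 = infected.
X = np.array([cluster_histogram(f, kmeans) for f in healthy_files + infected_files])
y = np.array([0] * len(healthy_files) + [1] * len(infected_files))
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
```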

    Chickens are highly vocal, and thus they are particularly suitable for automated bioacoustic monitoring methods. Some techniques already used in ecology, such as call classification, have great potential for welfare monitoring. Intensive chicken production also usually relies on automated lighting systems [126], and cameras used for monitoring welfare operate poorly in low lighting; acoustic monitoring bypasses this issue and can be used regardless of light levels. Similarly, the distress vocalizations discussed by Sufka et al. [106] could potentially be detected automatically using methods such as convolutional neural networks [127].

    7.2. Pigs

    The calls of domestic pigs can be divided into three different categories: high-frequency distress calls (squeals and screams) [23], shorter low-frequency vocalizations known as grunts [128,129] and higher intensity short vocalizations known as barks [130]. Screams differ from squeals in that they have a significantly lower peak and main frequency [125]. During social isolation, there is a direct relationship between production rate of low-frequency vocalizations (below 500 Hz) and environment, with pigs kept in barren housing producing fewer vocalizations than those kept in enriched environments [63]. In addition, some call parameters (formant frequencies) in pig grunts can also be used to indicate body size and thus growth rates, another important indicator of good welfare [131].

    An experiment involving two manipulations was carried out to determine whether there were differences in the calls of thriving (heaviest in the litter) and non-thriving (lightest in the litter) piglets during separation from their mother, and whether these differences could indicate that an animal was in need of food [132]. The test did not distinguish between the different call types of pigs, such as grunts and squeals. It found that the non-thriving animals used more high-frequency, long-duration calls, and that their calls increased more in frequency than those of the thriving, well-fed animals. The same study also investigated the response of mothers to the playback of piglet isolation calls and white noise. Mothers were more likely to return a response vocalization and approach the loudspeaker when they heard recordings of piglets kept in isolation. This suggests that the calls of piglets contain information about their needs [132]. However, previous research has shown that not all signals are honest, and care must therefore be taken when analysing pig vocalizations for welfare assessment [133].

    Piglet vocalizations have been analysed in order to estimate the level of pain they are experiencing [125]. Grunts, squeals and screams were analysed while piglets were being castrated with and without local anaesthesia. Piglets castrated without local anaesthesia produced twice as many screams as piglets castrated with anaesthesia. This suggests that pig vocalizations also carry information about pain, further highlighting automated vocal analysis as an appropriate tool for assessing their welfare. Painful situations, such as tail biting [50], could be detected using automated acoustic monitoring. Pig screams have been detected in production environments by combining linear predictive coding with an artificial neural network [134]. Another algorithm was developed to detect the location of cough sounds in a pig house by calculating the difference in time of arrival across an array of microphones [135]. This allows for the early detection of respiratory disease in pigs before it can spread to healthy animals. The same algorithm could be adapted to work with screams or squeals, allowing the farmer to localize where in the housing an incident is occurring.
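    The general idea of combining linear predictive coding with a neural network can be sketched as follows; this is not the implementation of [134], and the recordings, frame sizes and network architecture are assumptions.

```python
import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

def lpc_features(y, frame_length=2048, hop_length=1024, order=12):
    """Linear predictive coding coefficients for each short frame of a signal."""
    frames = librosa.util.frame(y, frame_length=frame_length, hop_length=hop_length)
    # librosa.lpc returns [1, a_1, ..., a_order]; drop the leading 1.
    return np.array([librosa.lpc(np.ascontiguousarray(frame), order=order)[1:]
                     for frame in frames.T])

# Hypothetical training material: recordings containing screams and background sounds.
scream, sr = librosa.load("pig_scream_examples.wav", sr=None)
background, _ = librosa.load("pig_house_background.wav", sr=sr)

X_scream = lpc_features(scream)
X_background = lpc_features(background)
X = np.vstack([X_scream, X_background])
y = np.concatenate([np.ones(len(X_scream)), np.zeros(len(X_background))])

# A small feed-forward neural network labels each frame as scream (1) or not (0).
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
```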

    Emotional arousal was investigated in piglets using two call types, distress calls and contact calls, across three levels of arousal in negative situations [23]. Central frequency was a good indicator of arousal in both call types, and harmonicity increased in screams but decreased in grunts as arousal increased. Linhart et al. [23] also found that amplitude increased with arousal in screams, but not in grunts.

    Research on the vocalizations of wild boar has shown that their calls can be categorized into grunts (pulsatile, low-frequency sounds), squeals (noisy, harsh vocalizations spanning a broad frequency range), grunt–squeals (cases where both vocalizations occur within a single call), barks (isolated, short, high-intensity, non-harmonic vocalizations) and trumpets (harmonic calls with a high fundamental frequency) [136]. The recordings were analysed by extracting acoustic parameters and entering them into multinomial logistic regression models and a hierarchical cluster analysis. The analysis confirmed that the vocalizations of wild boars could be broadly categorized into the classes listed above. Wild boar calls also contain information on emotional valence [137]. Animals were given three different treatments (anticipation of a food reward, affiliative interactions and antagonistic interactions) and had their calls recorded during these treatments. Body movement was used as an indicator of emotional arousal. Screams and squeals tended to be produced during negative interactions, and grunts were associated with positive situations. Maigrot et al. [137] also used energy quartiles, duration, formants and harmonicity in order to infer emotional valence for the different call types and situations.

    Overall, the calls that both domestic and wild pigs produce are related to body size and various positive and negative emotional states, and thus have great potential for future automated monitoring of their welfare. However, it should be noted that there are distinct differences in the vocalizations of the wild boar and domestic pig. For example, wild boars possess a vocalization known as the trumpet that is not observed in domestic piglets [136]. Like grunts, trumpets are used as contact calls, but possess a higher frequency content than grunts. This highlights that we need to be careful in extrapolating results from studies regarding an animal's wild ancestors if we wish to apply them to welfare assessment.

    7.3. Cattle

    Green et al. [61] provide an excellent review of the evolution of cattle vocal communication, as well as an overview of how these vocalizations relate to various welfare contexts. They separated cattle vocalization functions according to: individuality of vocalizations, vocal recognition, calf separation, social isolation, oestrus, feeding and painful husbandry procedures. Cattle calls contain information on individuality due to high levels of inter-cow variability in the acoustic characteristics of their vocalizations. This allows each animal to be identified by the ‘uniqueness' of its call [138–141]. Cattle are herd animals, and isolation from conspecifics results in physiological changes such as increased heart rate, salivary cortisol, urination and defecation rates, and an increase in vocal responses [142]. The different contexts put forward by Green et al. [61] could be detected by creating a database of audio recordings of these different vocalizations and their related contexts. Different machine learning algorithms could then be trained on this labelled dataset to identify a vocalization, and thus the context in which it occurred.

    Cattle cough sounds have been classified using labelled data from a variety of recordings, which were identified by a human labeller using a combination of audio and visual scoring [143]. A total of 205 min of sound was labelled, yielding 285 labelled calf coughs. Features were extracted by calculating the FFT of the incoming audio, removing the background noise and reducing the resolution of the spectrograms by summing the frequencies into 12 separate bands; the duration of each cough was also calculated. An example-based classifier was then used to compare the reduced spectrogram of incoming audio with the reduced spectrograms of the labelled data by calculating the Euclidean distance between them: the lower the distance, the more closely the incoming sound resembled a labelled cough. This approach achieved 98% specificity (true negative rate) and 52% sensitivity (true positive rate). Despite the low sensitivity, the algorithm was still able to detect periods of increased coughing, allowing farmers to administer treatment for the respiratory disorder.
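    The example-based matching described above can be sketched as follows: spectrograms are reduced to 12 frequency bands and compared with a labelled cough template by Euclidean distance. This is not the authors' code, and the file names and decision threshold are placeholder assumptions.

```python
import numpy as np
import librosa

def reduced_spectrogram(y, sr, n_bands=12, n_fft=1024, hop_length=512):
    """Magnitude spectrogram summed into a small number of frequency bands."""
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length))
    bands = np.array_split(S, n_bands, axis=0)       # split frequency bins into bands
    return np.vstack([band.sum(axis=0) for band in bands])

def match_score(candidate, template):
    """Euclidean distance between two reduced spectrograms, cropped to equal length."""
    n = min(candidate.shape[1], template.shape[1])   # crude length alignment
    return np.linalg.norm(candidate[:, :n] - template[:, :n])

# Hypothetical labelled cough template and an incoming sound event.
template_audio, sr = librosa.load("labelled_calf_cough.wav", sr=None)
event_audio, _ = librosa.load("incoming_event.wav", sr=sr)

template = reduced_spectrogram(template_audio, sr)
candidate = reduced_spectrogram(event_audio, sr)

# The smaller the distance, the more the event resembles the labelled cough.
# The decision threshold here is an arbitrary placeholder.
print("cough detected" if match_score(candidate, template) < 50.0 else "not a cough")
```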

    Cattle grazing sounds have also been analysed to determine the relationship between behavioural and acoustic measurements and herbage dry matter intake [144]. This was achieved by attaching microphones and cameras to a cow's forehead and exposing the cattle to treatments that varied the plant species, sward height (two levels), increasing herbage mass and the number of bites required to finish (10–30). The sounds were analysed by extracting the energy flux density, which was found to relate linearly to dry matter intake.

    8. Summary and recommendations

    In this review, we have provided an overview of feature extraction methods, automated bioacoustics monitoring for ecology and conservation, detecting emotions via vocalizations and the effects of anthropogenic noise on animals. Following this, a discussion of the vocalizations of three of the most important farm livestock species was provided, and how these vocalizations can be related to welfare state. Throughout the discussion on livestock vocalizations, we highlighted a number of areas that could benefit from automated monitoring. These include automatic classification of distress vocalizations in poultry [145], monitoring aggressive interactions between conspecifics such as tail biting in pigs [50,146] and implementing a context-based labelling for cattle calls [61].

    It is clear that there is no shortage of automated methods for classifying animal sounds. Today, one of the most pressing issues facing the use of acoustic monitoring for animal welfare is the lack of an open source database. If such a database were developed, it would be possible to implement many of the methods discussed in this review. Ideally, such a database would be designed similarly to open source projects such as the DCASE challenges [30]. Animal behaviour and welfare scientists have done much to identify the vocal repertoires of many important farm livestock species [58,61,136]. We suggest that labels for this type of database could be based around the descriptions and analyses found in the discussion of livestock vocalizations in this review. Due to the rapid growth and maturation of livestock, it is also necessary to capture information about age, size and weight, and the context and location in which the vocalizations were produced. However, simply identifying these vocalizations is not enough. It is essential that such a database is related back to the core issues of animal welfare, such as the Five Freedoms [46,147], the environment the animals live in and the quality of life that they experience.

    Since no open source dataset is currently available, we recommend that animal welfare researchers working with vocalizations focus on building such a dataset and implementing classic machine learning and classification methods. As traditional methods are deployed, large databases will emerge, and with them researchers will be able to implement deep learning methods, which have been shown to outperform more traditional machine learning approaches [7,69,70,97]. Deep learning is a class of machine learning methodology that can carry out supervised or unsupervised learning using very large datasets and large neural networks with many layers, such as convolutional neural networks [69]. Previously, many of these methods were inaccessible to researchers because of the large amount of processing power and memory they required. However, advances in the use of graphics processing units have made deep learning available to many researchers, and it has become one of the cutting-edge topics in machine learning. Its application to audio is relatively recent [30], and deep learning requires a much larger dataset than the more common classes of machine learning algorithms.
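    Once a sufficiently large labelled database exists, a typical starting point would be a small convolutional neural network operating on fixed-size log-mel spectrogram patches. The sketch below uses PyTorch, one of several frameworks that could be used; the number of call classes, the patch size and all hyperparameters are assumptions rather than recommendations from the cited work.

```python
import torch
import torch.nn as nn

class CallClassifierCNN(nn.Module):
    """Small CNN that maps a log-mel spectrogram patch to call-type scores."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),      # pool to one value per channel
            nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):                 # x: (batch, 1, n_mels, n_frames)
        return self.classifier(self.features(x))

# Hypothetical batch of 8 patches, each 64 mel bands x 128 frames.
model = CallClassifierCNN(n_classes=4)
dummy = torch.randn(8, 1, 64, 128)
logits = model(dummy)                     # shape (8, 4); train with cross-entropy loss
print(logits.shape)
```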

    Finally, automated acoustic monitoring could be a useful tool in precision livestock farming [77,148]. As farming systems become increasingly automated, it is possible to dynamically adjust the environment in which the animals are kept, automatically changing the temperature, lighting and ventilation. For example, if chicken rale calls were detected [149], this could indicate that there is not enough airflow in the housing and prompt a computer to turn on fans and open windows to increase the airflow. Lamb vocalizations have also been analysed, showing that calls which reflect poor vocal fold engagement and arousal were less likely to be preferred by their parents [150]. This suggests that automated analysis of vocalizations could be an indicator of offspring quality. The application of vocalization monitoring to precision livestock farming is not new [56,77]; however, previous efforts have been aimed at labelling methods and growth monitoring. Animal welfare researchers must look towards how these automated systems can integrate with vocal monitoring in order to deliver the highest levels of animal welfare.

    Data accessibility

    This article has no additional data.

    Authors' contribution

    M.P.M., R.S. and A.G.M. wrote the manuscript.

    Competing interests

    We declare we have no competing interests.

    Funding

    This research was carried out as part of the LIVEQuest project supported by InnovateUK and BBSRC grant no. 2016YFE01242200.

    Acknowledgements

    We thank the Editor Dr Tim Holt and the reviewers for their helpful comments. We also thank Emmanouil Benetos, Livio Favaro, Nicky McGrath and Dan Stowell for valuable feedback on an early draft of this review.

    Footnotes

    Published by the Royal Society. All rights reserved.