Interface Focus

    Abstract

    Traditionally, state-space models are fitted to data where there is uncertainty in the observation or measurement of the system. State-space models are partitioned into an underlying system process describing the transitions of the true states of the system over time and an observation process linking the observations of the system to the true states. Open population capture–recapture–recovery data can be modelled in this framework by regarding the system process as the state of each individual observed within the study, in terms of being alive or dead, and the observation process as the recapture and/or recovery process. The traditional observation error of a state-space model is incorporated via the recapture/recovery probabilities being less than unity. The models can be fitted using a Bayesian data augmentation approach, including within standard BUGS packages. Applying this state-space framework to such data permits additional complexities, including individual heterogeneity, to be fitted to the data at very little additional programming effort. We consider the efficiency of the state-space model-fitting approach by considering a random effects model for capture–recapture data relating to dippers and compare different Bayesian model-fitting algorithms within WinBUGS.

    1. Introduction

    Survival parameter estimation is of particular interest within ecological systems for gaining insight into the underlying biological system and for conservation or management policies [1]. To obtain estimates of survival probabilities, capture–recapture data are often collected. The data collection process involves the repeated (re-)capturing of individuals at a series of capture events, where each individual within the study is uniquely identifiable, either by natural markings or by a unique mark applied to the individual at their initial capture, such as a tag or ring. It is then possible to specify a statistical model for the data, including an explicit likelihood function for the data given the model parameters, and hence obtain estimates of the survival and recapture probabilities [2–4]. If individual animals can also be recovered dead within the study period, this is referred to as capture–recapture–recovery data, and the approach can be extended to obtain estimates of the survival, recapture (of live animals) and recovery (of dead animals) probabilities [5–8]. Survival parameters are often subject to individual heterogeneity, and recent interest has focused on including such additional complexities within the statistical framework, for example, by specifying the survival parameters as a function of individual covariates [9–16] and/or incorporating random effects [17–20]. Including individual heterogeneity typically increases the complexity of the model specification and fitting process.

    State-space (or hierarchical) models have become a very useful modelling tool in the analysis of complex ecological data. For example, state-space models have been used for fisheries data [21–23], animal count data [24–27] and telemetry data [28–31]. Traditionally, state-space models are fitted to data that are observed with error. We will show how capture–recapture–recovery data can be modelled using a state-space framework, where the observation error component arises from a non-perfect recapture and/or recovery process, i.e. the recapture and recovery probabilities are less than unity. For capture–recapture–recovery models, a state-space formulation can simplify the model specification, by constructing the model as the composition of a series of simpler models corresponding to each separate process acting on the population, and permits additional complexity to be incorporated in a straightforward manner. We initially describe the underlying ideas of the general state-space framework and how such models can be fitted within a Bayesian framework, before applying these ideas to capture–recapture–recovery data.

    1.1. State-space models

    State-space models separate an observed process into two components: a system process that models the underlying biological process over time and an observation process that accounts for imperfect detection of the system process, such as measurement error. State-space models have become increasingly popular within ecological problems, often separating the nuisance parameters (observational error) from the biological processes of interest.

    In general, the state-space framework takes the following form. Let y1, … , yT denote the (possibly multivariate) observations of the process; x1, … , xT the true underlying states at times t = 1, … ,T and θ the set of model parameters. The state-space model can be described by,

    \[ x_t \mid x_{t-1}, \theta \sim f_t(x_t \mid x_{t-1}, \theta) \qquad \text{(system process)} \]
    \[ y_t \mid x_t, \theta \sim g_t(y_t \mid x_t, \theta) \qquad \text{(observation process)} \]

    When the underlying states of the system process, x1, … , xT, are discrete-valued, the state-space model reduces to a traditional hidden Markov model (although we note that some authors have extended the term hidden Markov model to include continuous state variables; see Cappé et al. [32]). For such models with finitely many states, there is an explicit closed-form likelihood function for the observed data, denoted by f(y|θ) (obtained by summing over all possible states in the system process), which can be maximized to obtain parameter estimates. Alternatively, the EM algorithm [33] can be implemented to obtain parameter estimates (see also van Deusen [34] for an application of the EM algorithm to capture–recapture data). For a continuous system process, in general there is no such closed-form expression for the likelihood function of the observed data (although it can be expressed in the form of an integral over the x values), with the Kalman filter [35] traditionally used to obtain parameter estimates, assuming a linear Gaussian state-space model. More recently, Bayesian model-fitting approaches have been used that permit a greater degree of flexibility, including, for example, nonlinear and non-Gaussian processes. We focus on the Bayesian approach and briefly describe a data augmentation approach for fitting state-space models. See Newman et al. [36] for further discussion.
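    As a concrete illustration of the closed-form likelihood for a discrete system process, the following sketch computes f(y|θ) for a two-state hidden Markov model by summing over all state paths via the forward algorithm. All names and parameter values are our own illustrative choices, not taken from the paper:

```python
import numpy as np

def hmm_log_likelihood(y, trans, emit, init):
    """Observed-data log-likelihood f(y | theta) of a discrete HMM,
    obtained by summing over all possible state sequences via the
    (scaled) forward recursion."""
    alpha = init * emit[:, y[0]]          # forward probabilities at t = 1
    log_lik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for obs in y[1:]:
        alpha = (alpha @ trans) * emit[:, obs]   # propagate, then condition
        log_lik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return log_lik

# Illustrative two-state system ('alive' = 0, 'dead' = 1, absorbing), with
# observation symbols 0 = not seen and 1 = seen.
trans = np.array([[0.9, 0.1],     # P(state at t+1 | state at t)
                  [0.0, 1.0]])
emit = np.array([[0.4, 0.6],      # alive individuals are seen w.p. 0.6
                 [1.0, 0.0]])     # dead individuals are never seen
init = np.array([1.0, 0.0])       # known alive at the first occasion
print(hmm_log_likelihood([1, 0, 1], trans, emit, init))
```

    With only two states, the same value can be verified by enumerating the state paths by hand, which is exactly the summation the recursion performs efficiently.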

    1.2. Bayesian model fitting

    In order to fit a state-space model, we use a Bayesian data augmentation approach [37], also referred to as an auxiliary or latent variable approach, and, more recently, a complete data likelihood approach [38,39]. We treat the underlying states x as parameters (or auxiliary variables) to be estimated in addition to the model parameters, θ. The joint posterior distribution of the model parameters and auxiliary variables can be written in the form,

    \[ \pi(\theta, x \mid y) \propto f(y \mid x, \theta)\, f(x \mid \theta)\, p(\theta), \]
    where
    \[ f(y \mid x, \theta) = \prod_{t=1}^{T} g_t(y_t \mid x_t, \theta) \]
    \[ f(x \mid \theta) = f_1(x_1 \mid \theta) \prod_{t=2}^{T} f_t(x_t \mid x_{t-1}, \theta), \]
    and p(θ) denotes the prior specified on the model parameters. In general, the posterior distribution of the model parameters is given by
    \[ \pi(\theta \mid y) = \int \pi(\theta, x \mid y)\, \mathrm{d}x. \]

    However, the necessary integration to obtain the posterior distribution is analytically intractable. We will use Markov chain Monte Carlo (MCMC) to obtain a sample from the joint posterior distribution of x and θ. Considering only the sampled values of θ (and ignoring the x values) indirectly performs the necessary integration to obtain a set of sampled values from the marginal distribution π(θ|y) which can be used to obtain summary statistics of interest. We note that the posterior (marginal) distribution of x is often of interest itself, relating to the posterior distribution of the true underlying states of the system under study.

    For some studies, some of the underlying states x may be observed without error. In such cases, we let x = {xobs, xmis}, where xobs and xmis denote the observed and unobserved states, respectively. xobs is observed data and so is regarded in the same manner as the observations y, while xmis are auxiliary variables, and we can write

    \[ \pi(\theta, x_{\mathrm{mis}} \mid x_{\mathrm{obs}}, y) \propto f(y \mid x, \theta)\, f(x \mid \theta)\, p(\theta). \tag{1.1} \]

    At each iteration of the MCMC algorithm, the auxiliary variables xmis (in addition to the model parameters θ) are updated.

    The introduction of auxiliary variables is often used in two particular circumstances: (i) when the likelihood function of the observed data, f(y|θ), is analytically intractable or computationally expensive; and/or (ii) when the inclusion of the additional auxiliary variables simplifies or improves the MCMC algorithm. Thus, we note that an auxiliary variable approach may be used even if an explicit likelihood is available. For example, for hidden Markov models (i.e. discrete system processes), there is an explicit likelihood. However, using an auxiliary variable approach (where xmis corresponds to the discrete underlying states) leads to a complete data likelihood, f(y|x,θ), that can be significantly faster to calculate than the observed data likelihood, f(y|θ). The auxiliary variable approach may also lead to posterior conditional distributions of standard form for the model parameters (and auxiliary variables), permitting use of the Gibbs sampler, whereas the observed data likelihood often leads to non-standard posterior conditional distributions requiring the Metropolis–Hastings algorithm. See, for example, [27,36,39–42] for further discussion of Bayesian model fitting in the context of a range of ecological state-space models.

    Finally, we comment on the issue of parameter redundancy, or local identifiability, within the Bayesian framework. A parameter-redundant model has a ridge in the corresponding likelihood surface, leading to non-unique maximum-likelihood estimates. Traditionally, to remove this problem, restrictions are specified on the parameter space to allow estimation of all identifiable parameters. In general, parameter redundancy can be difficult to identify, particularly for models with a large number of parameters and/or processes acting, although some advances have been made using symbolic algebraic approaches [43–47]. However, within the Bayesian framework, the likelihood is combined with the prior distribution to form the posterior distribution of the parameters. Thus, parameter redundancy still permits the estimation of the posterior distribution (assuming a proper prior distribution), but, due to the identifiability issues, parameters (or functions of parameters) suffer from strong prior sensitivity. For example, suppose that a given parameter is unidentifiable (so that the likelihood is flat with respect to this parameter); then the posterior distribution for this parameter is simply the corresponding prior. The idea extends to when functions of parameters are unidentifiable, in which case the individual parameters within such a function will typically exhibit strong prior dependence. Gimenez et al. [48] used this idea to assess potential identifiability issues by considering the overlap between the posterior and prior distributions: a large overlap may indicate that the parameters are not well estimated and that there are potential identifiability issues. See also King et al. [41] and Kéry & Schaub [42] for further discussion and applications.

    1.2.1. Model discrimination

    The Bayesian approach can be extended to incorporate model uncertainty. For example, suppose that there are a total of M possible models. We introduce a discrete model indicator parameter, denoted by m ∈ {1, … , M} with corresponding parameters θm (since the set of parameters is model dependent). The joint posterior distribution over the combined parameter and model space is given by

    \[ \pi(m, \theta_m \mid y) \propto f(y \mid m, \theta_m)\, p(\theta_m \mid m)\, p(m), \]
    where p(m) denotes the prior specified on model m. Marginal posterior model probabilities are often of particular interest, providing a quantitative comparison of competing models, with the posterior probability for model m given by
    \[ \pi(m \mid y) = \int \pi(m, \theta_m \mid y)\, \mathrm{d}\theta_m. \]
    Bayes factors, defined to be the ratio of posterior model probabilities to prior model probabilities, are also frequently used to compare competing models [49]. Posterior model probabilities (or Bayes factors) also naturally permit posterior model-averaged estimates of parameters of interest. For example, suppose that the parameter θ1 is common to all models. The posterior model-averaged distribution for θ1 is given by
    \[ \pi(\theta_1 \mid y) = \sum_{m=1}^{M} \pi(\theta_1 \mid y, m)\, \pi(m \mid y). \]

    MCMC methods can again be used to obtain a sample from the posterior distribution of interest and hence Monte Carlo estimates of quantities of interest. For example, the posterior model probabilities are estimated as the proportion of time that the chain spends in each model. However, an extension of the standard MCMC algorithm is necessary in order for the constructed Markov chain to move between models with different numbers of parameters. The most common such algorithm is the reversible jump MCMC algorithm [50], but see also Godsill [51] for additional discussion. See King et al. [41] and references therein for further discussion of ecological applications.

    1.2.2. Model checking

    Model checking (or goodness of fit) is concerned with the absolute fit of a model to the data. Many different models may be fitted to the data, and a model discrimination procedure applied, but none of the models may fit the data well. Within the Bayesian approach, model checking is typically assessed by effectively simulating data from the given model and assessing whether the simulated data appear consistent with the observed data. The most common approach is that of Bayesian p-values [52], a posterior predictive checking approach. The approach takes the form of repeatedly sampling parameter values from the posterior distribution. For each set of sampled values, a dataset is simulated from the model and compared with the observed data using some discrepancy function, g (for example, the deviance, Pearson χ2 statistic or Freeman–Tukey statistic). The Bayesian p-value is simply the proportion of times that the discrepancy function of the simulated data is less than that of the observed data. If the data are inconsistent with the model, we would expect the Bayesian p-value to be close to 0 or 1. For examples of the use of Bayesian p-values in the context of capture–recapture–recovery data, see earlier studies [12,17,39,41,42,53–55]. Bayesian p-values have also been applied within a model-averaging context [8,12]. Previous applications of Bayesian p-values to capture–recapture–recovery data include discrepancy functions of the observed data xobs given the model parameters θ (i.e. g(xobs|θ)) [53] and discrepancy functions of the observed data xobs given the model parameters and auxiliary variables xmis (i.e. g(xobs|θ, xmis)) [54,55].
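    The posterior predictive recipe above can be sketched generically. The interface below is our own, and the toy binomial model with a conjugate Beta posterior simply stands in for a fitted capture–recapture model:

```python
import numpy as np

rng = np.random.default_rng(1)

def bayesian_p_value(y_obs, posterior_draws, simulate, discrepancy):
    """Proportion of posterior draws theta for which the discrepancy of a
    replicated dataset is less than that of the observed data."""
    count = 0
    for theta in posterior_draws:
        y_rep = simulate(theta)                  # replicate data from the model
        if discrepancy(y_rep, theta) < discrepancy(y_obs, theta):
            count += 1
    return count / len(posterior_draws)

# Toy check: binomial data, Beta posterior and a Pearson chi-squared
# discrepancy. The model is correct here, so the p-value should lie far
# from 0 and 1.
n_trials, y_obs = 50, 31
draws = rng.beta(1 + y_obs, 1 + n_trials - y_obs, size=2000)
simulate = lambda p: rng.binomial(n_trials, p)
discrepancy = lambda y, p: (y - n_trials * p) ** 2 / (n_trials * p * (1 - p))
print(bayesian_p_value(y_obs, draws, simulate, discrepancy))
```

    In a state-space setting, `simulate` would generate replicate encounter histories given sampled parameters (and, if desired, sampled auxiliary variables), mirroring the two styles of discrepancy function noted above.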

    2. Capture–recapture–recovery data

    Capture–recapture studies involve a series of capture events. At the first capture event, all individuals observed in the study population are uniquely marked and released back into the population. For example, a tag/ring is applied to each individual or, more recently, photo-identification is used when it is possible to exploit natural physical features, such as individual markings (often used for seals, newts or tigers) and dorsal fin features (of dolphins). At each subsequent capture event, all previously observed individuals are recorded, new individuals are uniquely marked and all are released back into the population. A recapture (or initial capture when using photo-identification techniques) of an individual does not necessarily mean a physical capture but often relates to a sighting of the individual; however, for simplicity, we use the term recapture. Typically, for such data a number of assumptions are made, including that there is no mark-loss, that migration from the system is permanent (so that apparent survival probabilities are typically estimated) and that individuals behave independently of each other and are representative of the population. For a review of such data and associated models, see Schwarz & Seber [56]. In some studies, in addition to live resightings, dead recoveries of individuals may also occur, leading to capture–recapture–recovery data.

    Capture–recapture–recovery data are typically displayed in the form of the encounter histories of each individual observed within the study. We let n denote the number of individuals observed within the study and T the number of capture events. We let w denote the n × T encounter matrix, where

    \[ w_{it} = \begin{cases} 0 & \text{individual } i \text{ is unobserved at time } t; \\ 1 & \text{individual } i \text{ is captured alive at time } t; \\ 2 & \text{individual } i \text{ is recovered dead at time } t, \end{cases} \]
    for i = 1, … , n and t = 1, … , T. The ith row of the matrix w, denoted by wi corresponds to the encounter history of individual i. For example, consider the following encounter histories:
    \[ w_1 = (1\ 0\ 1\ 1\ 1\ 0\ 0\ 2), \qquad w_2 = (1\ 1\ 0\ 0\ 0\ 0\ 0\ 0). \]

    Individual 1 is first observed at time 1, unobserved at times 2, 6 and 7, recaptured at times 3, 4 and 5 and recovered dead at time 8. The second individual is first observed at time 1, recaptured at time 2 but then unseen throughout the remainder of the study—either it survives until the end of the study and remains unobserved, or is unobserved until it dies at some time within the study and is not recovered.
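    To make the coding of encounter histories concrete, a small helper (our own illustration, assuming the coding 0 = unobserved, 1 = captured alive, 2 = recovered dead) extracts the first capture, live recaptures and any dead recovery:

```python
def decode_history(w):
    """Summarize an encounter history: first capture occasion, subsequent
    live recaptures and the recovery occasion (None if never recovered).
    Occasions are 1-indexed, matching the text."""
    first = next(t for t, v in enumerate(w, start=1) if v != 0)
    recaptures = [t for t, v in enumerate(w, start=1) if v == 1 and t > first]
    recovery = next((t for t, v in enumerate(w, start=1) if v == 2), None)
    return first, recaptures, recovery

print(decode_history([1, 0, 1, 1, 1, 0, 0, 2]))  # individual 1: (1, [3, 4, 5], 8)
print(decode_history([1, 1, 0, 0, 0, 0, 0, 0]))  # individual 2: (1, [2], None)
```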

    For such data, the parameters of interest are typically demographic parameters (most notably survival probabilities) and/or abundance. The traditional Cormack–Jolly–Seber (CJS) model [2,4] considers only capture–recapture data, and the ring-recovery (or tag-recovery) model [57] considers only capture–recovery data. Each of these models conditions on the initial capture of each individual observed within the study, with the corresponding likelihood a function of the survival and recapture or recovery probabilities (namely, the product over each individual of the probability of their corresponding encounter history conditional on their initial capture). These ideas can be extended to integrated capture–recapture–recovery data with an associated likelihood expression [58]. Consequently, for CJS-type models (due to the conditioning on the initial capture of individuals), abundance cannot be estimated. The Jolly–Seber (JS) model [3,4] removes this conditioning on the initial capture of individuals. For both the CJS and JS models, the observed data likelihoods are available [58–60], permitting estimation of abundance and demographic parameters.

    We initially consider the state-space formulation for the CJS model before extending the approach to allow for additional system processes, including births (leading to JS-type models and abundance estimation) and individual heterogeneity (in the form of discrete and continuous individual covariates and/or individual random effects).

    2.1. State-space model formulation

    The state-space model formulation of the CJS model has been proposed independently by Royle [18], Gimenez et al. [61] and Schofield & Barker [62] (although this modelling approach dates back to (at least) Dupuis [63]—see §2.3.1). The state-space approach separates the survival process from the observation processes of individuals observed within the study (either alive or dead). We note that (as pointed out by an anonymous referee) this underlying concept can be seen within the original seminal papers, although not formally expressed in a (conditional) state-space representation. For example, both Jolly [3] and Seber [4] describe (in words) the two separate processes acting on each individual member of the population of surviving from one capture event to the next and subsequently being captured (or not).

    Formally, within the state-space formulation, capture–recapture–recovery data can be viewed as the combination of two different processes: (i) individuals surviving between successive time points and (ii) individuals being observed (either alive or dead) with some probability. Thus, we can regard each encounter history as a combination of the processes of survival between each capture event and observations of whether they are recaptured (if alive) or recovered (if dead), and thus dissect w into these distinct (conditional) components. Mathematically, to represent the survival process, we define x to be the n × T matrix such that

    \[ x_{it} = \begin{cases} 1 & \text{individual } i \text{ is alive and in the study population at time } t; \\ 0 & \text{otherwise}. \end{cases} \]

    We note that xit = 0 corresponds to the cases where an individual is either dead or has not yet entered the study (i.e. before it is first observed). We let f(i) denote the first time individual i is observed in the study and ϕit the probability individual i survives until time t + 1, given they are alive at time t, with ϕ = {ϕit, i = 1, … , n; t = 1, … , T − 1}. Submodels are obtained by setting restrictions on these parameters. For example, a constant survival model is obtained by setting ϕit = ϕ for all i = 1, …, n and t = 1, … , T − 1. However, we retain the more general notation as we will consider a range of models allowing for temporal and/or individual heterogeneity. The underlying system process can be easily described as follows:

    \[ x_{it} \mid x_{i,t-1}, \phi \sim \mathrm{Bernoulli}(\phi_{i,t-1}\, x_{i,t-1}), \tag{2.1} \]
    for t = f(i) + 1, … , T, with xif(i) = 1 (since the CJS model conditions on the first time an individual is observed alive). For many encounter histories, the xit values will not all be known. For example, considering encounter history w2 given above, we know that x21 = x22 = 1; but, as the individual is unobserved thereafter, it may either be alive and not observed or have died and not been recovered. Recall that we let xobs and xmis denote the sets of values of xit that are known and unknown, respectively.

    For capture–recapture–recovery data, we partition the observation process into two distinct processes corresponding to the (re-)capture (of live animals) and recovery (of dead animals). We initially consider the recapture process and let y denote the n × T matrix such that

    \[ y_{it} = \begin{cases} 1 & \text{individual } i \text{ is recaptured at time } t; \\ 0 & \text{otherwise}. \end{cases} \]

    We note that for an individual to be recaptured, it is conditional on the individual being alive in the study at that time (i.e. if xit = 0 then yit = 0). Let pit denote the probability individual i is recaptured at time t, given they are alive in the study area at this time, for i = 1, … , n and t = 2, … , T. The observation process can be written as

    \[ y_{it} \mid x_{it}, p \sim \mathrm{Bernoulli}(p_{it}\, x_{it}), \tag{2.2} \]
    for t = f(i) + 1, … , T, with yif(i) = 1.

    The recovery process is defined similarly. For simplicity, we assume that the recapture and recovery processes are observed over the same period of time, though this can be easily relaxed. We let z denote the n × T matrix with elements

    \[ z_{it} = \begin{cases} 1 & \text{individual } i \text{ is recovered dead at time } t; \\ 0 & \text{otherwise}, \end{cases} \]
    for i = 1, … , n and t = 2, … , T. Once more the values of this observation process, z, are conditional on the system process, x. For an individual to be recovered dead at time t, it is usually assumed that the individual must have died in the interval (t − 1,t] (although see below for further discussion of relaxing this assumption). Let λit denote the probability individual i is recovered at time t, given that they die in the interval (t − 1,t], for i = 1, … , n and t = 2, … , T, and set λ = {λit: i = 1, … , n, t = 2, … , T}. The observation process can be written as
    \[ z_{it} \mid x_{it}, x_{i,t-1}, \lambda \sim \mathrm{Bernoulli}\big(\lambda_{it}\, (1 - x_{it})\, x_{i,t-1}\big), \tag{2.3} \]
    for i = 1, … , n and t = f(i) + 1, … , T. The product (1 − xit)xi,t−1 ensures that an individual can only be recovered at time t if they are alive at time t − 1 (i.e. xi,t−1 = 1) but dead at time t (i.e. xit = 0); in other words, they die in the interval (t − 1, t].
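    The three Bernoulli processes (2.1)–(2.3) can be simulated directly. The sketch below generates one individual's latent states and observations; the constant ϕ, p and λ are a simplifying assumption of ours, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_individual(f_i, T, phi, p, lam):
    """Simulate latent states x and observations y (live recapture) and
    z (dead recovery) for one individual first captured at occasion f_i,
    following the Bernoulli system and observation processes."""
    x = np.zeros(T, dtype=int)
    y = np.zeros(T, dtype=int)
    z = np.zeros(T, dtype=int)
    x[f_i - 1] = 1                   # conditions on first capture
    y[f_i - 1] = 1
    for t in range(f_i, T):          # occasions f_i + 1, ..., T (0-indexed)
        x[t] = rng.binomial(1, phi * x[t - 1])               # survival (2.1)
        y[t] = rng.binomial(1, p * x[t])                     # recapture (2.2)
        z[t] = rng.binomial(1, lam * (1 - x[t]) * x[t - 1])  # recovery (2.3)
    return x, y, z

x, y, z = simulate_individual(f_i=1, T=8, phi=0.8, p=0.5, lam=0.3)
```

    Whatever the realized history, the simulated values respect the model's structural zeros: recapture requires being alive, the alive state is non-increasing after first capture, and recovery is only possible in the occasion immediately following death.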

    The observed encounter histories w can be separated into the data components xobs, y and z. Note that a glossary of these latent variables, or auxiliary variables (and others introduced later in the manuscript), are listed in table 4 in appendix A. In addition, we note that this state-space formulation is simply a (partial) hidden Markov model, with the system process composed of the two discrete states ‘alive’ and ‘dead’, with some of the states known. The corresponding underlying state-space (or hidden Markov model) representation is provided graphically in figure 1.


    Figure 1. State-space representation of the system process, x, recapture observation process, y, and recovery observation process, z, for capture–recapture–recovery data. Square nodes represent observed data, circular nodes unknown states and arrows a stochastic relationship between nodes.

    We fit the models using a Bayesian data augmentation approach to form the joint posterior distribution (cf. equation (1.1)),

    \[ \pi(\phi, p, \lambda, x_{\mathrm{mis}} \mid x_{\mathrm{obs}}, y, z) \propto f(x \mid \phi)\, f(y \mid x, p)\, f(z \mid x, \lambda)\, p(\phi, p, \lambda), \]
    where f(x|ϕ), f(y|x,p) and f(z|x,λ) denote the (Bernoulli) probability mass functions associated with the system process and the two observation processes, respectively (such that their product is the complete data likelihood), and p(ϕ, p, λ) denotes the joint prior on the survival, recapture and recovery probabilities.

    Using a state-space framework, when considering deviations from the above model and/or additional complexities, it is not necessary to explicitly derive a new observed likelihood expression, but simply consider each individual process in turn. For example, the assumption of recovery only in the capture event following the death of an individual is often reasonable, due to the decay of marks used to uniquely identify individuals. However, in some cases, this assumption is not valid, as for abalone (see Catchpole et al. [64], who derive an explicit observed data likelihood for the model where individuals can be recovered at any point following death). The model proposed by Catchpole et al. [64] can be easily fitted within the state-space framework changing only the observation process for recoveries such that

    \[ z_{it} \mid x_{it}, \lambda \sim \mathrm{Bernoulli}\Big(\lambda_{it}\, (1 - x_{it}) \prod_{k=f(i)+1}^{t-1} (1 - z_{ik})\Big) \]
    for i = 1, … , n and t = f(i) + 1, … , T, where the product ∏_{k=f(i)+1}^{t−1} (1 − z_{ik}) is included since an individual is removed from the study once it is recovered dead, and the interpretation of λit now changes to the probability an individual is recovered given they are dead, with no restriction on the time of their death. More generally, models that explicitly allow for a decay of a mark over time can be fitted within the framework, for example, by specifying the recovery probability to be a decreasing function of time since death.
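    Under the state-space formulation, only the recovery process needs changing to accommodate this variant. A sketch of the modified process (our own illustration, with a running product over previous recoveries preventing a second recovery) might read:

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate_recovery_any_time(x, lam):
    """Recovery process when a dead individual may be recovered at any later
    occasion: recovery is possible whenever the individual is dead and has
    not yet been recovered (tracked by the running product over 1 - z_ik)."""
    T = len(x)
    z = np.zeros(T, dtype=int)
    not_yet_recovered = 1
    for t in range(1, T):
        z[t] = rng.binomial(1, lam * (1 - x[t]) * not_yet_recovered)
        not_yet_recovered *= 1 - z[t]
    return z

# An individual alive for two occasions, then dead for four
z = simulate_recovery_any_time([1, 1, 0, 0, 0, 0], lam=0.4)
```

    Compared with the simulation of process (2.3), only this one function changes; the survival and recapture processes are untouched, which is precisely the modularity the state-space framework provides.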

    2.2. Estimating abundance

    Removing the conditioning on initial capture from the CJS model permits the estimation of birth rates and abundance via the JS model [3,4]. An observed data likelihood can be obtained [58–60] by considering a superpopulation corresponding to all possible individuals who may be observed within the study (i.e. individuals present and available for capture for at least one capture event) and summing over all possible combinations of their entry and exit times (i.e. ‘birth’ and ‘death’ times). Entry to the system could be via birth or immigration, and exit from the system via death or permanent emigration; however, for simplicity, we use the terms ‘birth’ and ‘death’ to include such migration. Thus, for the JS model, there is an additional entry process to be included within the system process. We let the number of individuals in the superpopulation be denoted by N, which is a parameter to be estimated. We note that (N − n) individuals are available for capture at least once within the study but are not observed, and that x is of dimension N × T, and hence a function of N. We let bit denote the probability that individual i enters the study population in the interval (t − 1,t], and so is available for capture for the first time at capture event t, for t = 1, … , T. Here bi1 is interpreted as the probability that individual i is in the population and available for capture at the first capture event. Since we condition on all N individuals being available for capture at least once within the study period, we have that ∑_{t=1}^{T} bit = 1 for all i = 1, … , N.

    Within the state-space model, the survival process is replaced by a ‘presence’ process allowing for individuals to both join and leave the study (assuming individuals can only join and leave the study a maximum of one time), corresponding to ‘birth’ and ‘death’. In particular, we have that for t = 1, … , T and i = 1, … , N,

    \[ x_{it} \mid x_{i,1}, \ldots, x_{i,t-1} \sim \mathrm{Bernoulli}(\nu_{it}), \]
    where
    \[ \nu_{it} = \phi_{i,t-1}\, x_{i,t-1} + \gamma_{it} \prod_{s=1}^{t-1} (1 - x_{is}), \tag{2.4} \]
    such that
    \[ \gamma_{it} = \frac{b_{it}}{1 - \sum_{s=1}^{t-1} b_{is}} \]
    (with the conventions xi0 = 0, an empty product equal to 1 and an empty sum equal to 0, so that νi1 = γi1 = bi1).

    The term ∏_{s=1}^{t−1} (1 − xis) ensures that an individual can only be ‘born’ once within the study and cannot be ‘reborn’ (i.e. re-enter the study once it has died). The parameter γit can be interpreted as the probability that individual i enters the population within the interval (t − 1,t], conditional on not having entered the study population before or at time t − 1. The recapture and recovery processes essentially remain the same as for the CJS model, given in equations (2.2) and (2.3), but are now specified for t = 1, … , T, since the JS model removes the conditioning on first capture. The posterior distribution for the JS model, assuming no recoveries, is given by

    \[ \pi(\phi, b, p, N, x_{\mathrm{mis}} \mid x_{\mathrm{obs}}, y) \propto f(x \mid \phi, b, N)\, f(y \mid x, N, p)\, p(\phi, b, p, N), \]
    where f(x|ϕ, b, N) and f(y|x, N, p) denote the Bernoulli probability mass functions for the system presence process and observation process, respectively. For further discussion, see Schofield & Barker [62].
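    A simulation sketch of the presence process (our own illustration, with entry probabilities b summing to one over the study) shows how γit is derived from bit and how re-entry is excluded:

```python
import numpy as np

rng = np.random.default_rng(11)

def simulate_presence(b, phi):
    """Simulate the JS 'presence' process for one member of the
    superpopulation: conditional entry probability gamma_t = b_t / (1 - sum
    of earlier b's), survival phi after entry, and no re-entry after death."""
    T = len(b)
    x = np.zeros(T, dtype=int)
    never_alive = 1                  # tracks prod_{s < t} (1 - x_s)
    x_prev = 0
    for t in range(T):
        # min(..., 1.0) guards against floating-point overshoot at the
        # final possible entry occasion, where gamma equals one exactly
        gamma = min(b[t] / (1.0 - sum(b[:t])), 1.0)
        nu = phi * x_prev + gamma * never_alive
        x[t] = rng.binomial(1, nu)
        never_alive *= 1 - x[t]
        x_prev = x[t]
    return x

x = simulate_presence(b=[0.4, 0.3, 0.2, 0.1], phi=0.7)
```

    Because γ at the last occasion equals one for an individual that has not yet entered, every simulated member of the superpopulation is present for at least one occasion, matching the conditioning on availability for capture, and the realized presence run is always a single contiguous block.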

    We note that when specifying the model within the set of BUGS (and associated) computer packages, it is not possible to specify x as a function of a stochastic node (i.e. N). The solution proposed is to specify a fixed upper bound for N, denoted by N*, and let x be of fixed dimension (N* × T), allowing for (N* − N) individuals who are not available for capture within the study period, or, in other words, who do not enter the study population. See earlier studies [40,42,65–68] for further discussion of this approach, alternative parametrizations and associated computer codes. Finally, we note the link between the JS model and two other common model classes: occupancy models and closed population models.

    Occupancy models relate to the occupancy of a species (rather than individuals) within a given area. Species can colonize an area (enter the population) and die out from the area (exit the population), as for capture–recapture data, but can also recolonize an area (i.e. be ‘reborn’). The state-space formulation of the JS model can be easily extended to an occupancy model by adding a recolonization term, multiplied by (1 − xi,t−1), to νit in (2.4) for capture times t = 3, … , T to allow re-entry into the population (see Royle and Dorazio [40] for further discussion and references therein in relation to occupancy models). Finally, we note that removing the presence process (assuming that all individuals are present for the full duration of the study, so that xit = 1 for all i = 1, … , N and t = 1, … , T) and considering only a recapture process leads to the special case of a closed population, where the estimation of the total population size, N, is of particular interest. See earlier research [39–42,67–69] for a range of different closed population models.

    2.3. Individual heterogeneity

    In many biological systems, the model parameters may be individual specific, in that they differ between individuals despite identical environmental conditions. For example, individuals typically differ in body condition (which itself may be a function of feeding or breeding behaviour, parasite load, etc.), which in turn has a direct impact on their corresponding survival (animals in better condition are more able to compete for resources and/or survive harsh environmental conditions). Similarly, animals may disperse through an area, with individuals more or less likely to be recaptured/recovered in more remote or densely vegetated areas. To incorporate such heterogeneity, individual-level factors (or covariates) and/or random effects are often used to model these differences.

    For simplicity, we consider capture–recapture data and the CJS model, extensions allowing for further system and/or observation processes follow by including additional separate processes. We consider the model parameters as a function of individual level time-varying covariates. The matrix of observed encounter histories, w, has elements given by

    \[ w_{it} = \begin{cases} 1 & \text{individual } i \text{ is captured at time } t; \\ 0 & \text{otherwise}. \end{cases} \]

    The survival and recapture processes for x and y given in equations (2.1) and (2.2) have the same definitions as before but where we now have the survival and recapture probabilities expressed as a function of the covariate values. However, we need an additional system process corresponding to the transition process for the underlying covariates. For the transition process, we let u be an n × T matrix, with elements

    Display Formula
    for i = 1, … , n and t = 1, … , T. Typically, there are unknown observations within the transition process. For example, for time-varying covariates, if an individual is not observed, generally the corresponding covariate values are also unobserved. We let u = {uobs,umis}, where uobs and umis denote the observed and unobserved states within the transition process, and let ψ denote the parameters relating to the transition process. The joint posterior distribution can be expressed in the form
    Display Formula
    2.5
    where f(x, u|ϕ,ψ) denotes the joint probability function of the survival and transition processes. Within the MCMC algorithm, we update the model parameters and missing states xmis and umis at each iteration of the Markov chain. An estimate of the posterior distribution of the model parameters can be obtained by considering the marginal distribution of these parameters within the algorithm. We consider particular cases for different types of covariate information.
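Before adding covariates, it may help to see the two basic CJS processes as simulation code. The following Python sketch makes the conditional Bernoulli structure of the survival and recapture processes in equations (2.1) and (2.2) explicit; the constant survival and recapture probabilities, and the assumption that all individuals are first captured at the first occasion, are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_cjs(n, T, phi, p, f):
    """Simulate the CJS state-space processes.

    x[i, t] = 1 if individual i is alive at occasion t (system process);
    y[i, t] = 1 if individual i is recaptured at occasion t (observation
    process).  f[i] is the occasion of first capture; phi and p are
    hypothetical constant survival and recapture probabilities.
    """
    x = np.zeros((n, T), dtype=int)
    y = np.zeros((n, T), dtype=int)
    for i in range(n):
        x[i, f[i]] = 1          # alive (and captured) at first capture
        y[i, f[i]] = 1
        for t in range(f[i] + 1, T):
            # survival process: Bernoulli(phi), conditional on being alive
            x[i, t] = rng.binomial(1, phi) if x[i, t - 1] == 1 else 0
            # recapture process: Bernoulli(p), conditional on being alive
            y[i, t] = rng.binomial(1, p) if x[i, t] == 1 else 0
    return x, y

x, y = simulate_cjs(n=100, T=7, phi=0.6, p=0.9, f=np.zeros(100, dtype=int))
```

Extending the sketch to covariate dependence amounts to replacing the constants phi and p with functions of the simulated covariate values.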

    2.3.1. Multi-state data

Discrete-valued time-varying covariates, such as breeding state or discrete location, are most commonly referred to as multi-state (or multi-site) data. These models are an example of a (partially) hidden Markov model, as the system processes have a finite number of states. Notationally, we let the set of possible states for the covariates be denoted by 𝒦 = {1, … , K} (this does not include the state of death). The traditional Arnason–Schwarz (AS) model extends the CJS model by considering a first-order Markov transition process between the different states, with an observed data likelihood available [9,10,70,71], allowing parameter estimation. Dupuis [63] proposed an alternative Bayesian state-space modelling approach, although the approach was phrased not as a state-space model but as an auxiliary variable approach (see also Kéry & Schaub [42, ch. 9]; Marin & Robert [72, ch. 5.5]; Clark et al. [73] and references therein, and King & Brooks [54] for an extension allowing for model uncertainty in terms of time and covariate dependence of the model parameters).

We let the transition parameters ψit(j,k) denote the probability that individual i is in state k at time t + 1, given that they are alive and in state j at time t, for t = f(i), … , T − 1, and set ψit = {ψit(j,k): j, k ∈ 𝒦}. We initially consider a first-order Markovian structure for the transition process, but discuss relaxations of this assumption below. In particular, we can write

    Display Formula
    where
    Display Formula
for i = 1, … , n, t = f(i) + 1, … , T, ψit(k,·) = {ψit(k, j): j = 1, … , K} and sif(i) = uif(i). In other words, if xit = 1 and uit−1 = sit−1 = j, the probability that individual i is in state k at time t is equal to ψit−1(j,k); if xit = 0 (individual is dead), then uit = 0 (although sit ∈ 𝒦). The survival and recapture parameters can be extended to be dependent on the additional state parameter, with the analogous Bernoulli processes occurring for the observation and system survival processes.
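The first-order multi-state system process can be sketched directly; the survival probabilities and transition matrix below are purely illustrative, and the observation process is omitted. State 0 denotes death.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical AS-style system process with K = 2 live states.
# phi[j-1] is the survival probability in state j; psi[j-1, k-1] the
# transition probability from live state j to live state k.
K = 2
phi = np.array([0.8, 0.6])
psi = np.array([[0.7, 0.3],
                [0.4, 0.6]])   # each row sums to one

def step(u):
    """One combined survival/transition step; returns next state (0 = dead)."""
    if u == 0:
        return 0                                         # dead stays dead
    if rng.random() > phi[u - 1]:
        return 0                                         # survival process
    return rng.choice(np.arange(1, K + 1), p=psi[u - 1]) # state transition

states = [1]                    # alive in state 1 at first capture
for _ in range(6):
    states.append(step(states[-1]))
```

The survival and transition components are applied in sequence, mirroring the Bernoulli survival process followed by the multinomial transition process in the text.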

    The joint probability mass function of the survival and transition processes, f(x, u|ϕ,ψ) given in equation (2.5) can be specified in conditional form

    Display Formula

The first term corresponds to the Bernoulli survival process and the second to the multinomial transition process. In the standard AS model, if an individual is recaptured (i.e. yit = 1), the corresponding state of the individual, uit, is also known. However, if this is not the case, so that the state may not be observed even when an individual is observed, then within the state-space formulation the model specification remains the same. It is simply that the sets umis and uobs change to reflect the missing and observed states, and that any unobserved uit values are auxiliary variables updated within the MCMC algorithm. The state-space model can be extended to include additional system or observation processes. For example, within the traditional AS model, we assume that there are no recoveries within this process, but these can be easily included by considering an additional observation process corresponding to the recovery process [42,61,74]. Similarly, a birth process can be included (and a parameter for total population size), leading to the multi-state JS model [55,65].

    Using a state-space framework, it is straightforward to fit more complex model structures, including a second-order transition process, such that the state of an individual at time t is dependent not only on their state at time t − 1 but also t − 2. An explicit observed data likelihood for a second-order Markov model is available [10,75] but is computationally expensive to calculate compared with the complete data likelihood, conditional on the unobserved states within the transition process. In addition, within the state-space framework the survival and observation processes remain the same, with only the transition system process needing to be modified. Let Ψit(j,k,l) denote the probability that individual i is in state l ∈ 𝒦 at time t + 2, given that they are alive and in states j ∈ 𝒦 and k ∈ 𝒦 at times t and t + 1, respectively, for t = f(i), … , T − 2. We let Ψit = {Ψit(j,k,l): j, k, l ∈ 𝒦}. The transition process for t = f(i) + 1, … , T can be described in the form

    Display Formula
    where
    Display Formula
for t = f(i) + 2, … , T and Ψit−2(sit−2, sit−1,·) = {Ψit−2(sit−2, sit−1, l): l ∈ 𝒦}; and for t = f(i) + 1, typically, a first-order process is considered, so that
    Display Formula
    using the notation as for the first-order Markovian process and where sif(i) = uif(i).
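A minimal sketch of the second-order transition component follows, with the survival and observation processes omitted; the array Ψ and its entries are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

# Second-order transition sketch: Psi[j-1, k-1, l-1] is the (hypothetical)
# probability of moving to live state l at time t+2, given live states
# j at time t and k at time t+1. Only the transition component is shown.
K = 2
Psi = np.full((K, K, K), 0.5)          # uniform baseline, purely illustrative
Psi[0, 0] = [0.9, 0.1]                 # e.g. strong persistence in state 1

def next_state(s_prev2, s_prev1):
    """Draw the next state given the two preceding states."""
    return rng.choice(np.arange(1, K + 1), p=Psi[s_prev2 - 1, s_prev1 - 1])

s = [1, 1]                             # first step would use a first-order rule
for _ in range(5):
    s.append(next_state(s[-2], s[-1]))
```

Only this component changes relative to the first-order model; the survival and observation processes are untouched, as noted in the text.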

    Possible error within the state observation process can also be incorporated, leading to multi-event models proposed by Pradel [76]. In such models, the state of an individual is not observed directly, but an event relating to the state is observed. For example, states may refer to ‘breeding’ and ‘non-breeding’ and events ‘seen near nest’ and ‘not seen near nest’. In this case, there are no observed states (uobs = ∅︀). The state-space framework extends the AS model specification by including an additional event observation process, conditional on the underlying states. We note that the event process can, equivalently, be incorporated into the state process, corresponding to the event an individual is in at each capture event (with the introduction of auxiliary variables for the unobserved events). However, for simplicity, we retain the event process within the observation process (but note that if the model parameters such as the recapture probabilities are dependent on the event process, rather than the true underlying states, the event process needs to be incorporated into the system process and the unobserved events treated as auxiliary variables).

    Let ℰ = {1, … , E} denote the set of possible observed events, and let vit denote the event observation process for individual i at time t, conditional on the individual being observed at this time (i.e. yit = 1). For statistical convenience, we let vit = 0 if yit = 0 (i.e. the individual is not observed). Thus, we have

    Display Formula

    We define the additional model parameters βit(kit, eit) to be the probability individual i is in event eit ∈ ℰ at time t, given they are in state kit ∈ 𝒦 and observed at this time and let βit = {βit(k,e): k ∈ 𝒦,e ∈ ℰ}. Then, for i = 1, … , n and t = f(i), … , T, we have

    Display Formula
    where
    Display Formula
and βit(uit,·) = {βit(uit, rit): rit = 1, … , E}, such that ∑e∈ℰ βit(uit, e) = 1 for all uit ∈ 𝒦 (assuming that on initial capture the observed event is independent of the fact that it is their first encounter).

    The underlying system process is the same as for the AS model, with the event process incorporated in the observation process. However, we note that within the AS model, the likelihood conditions on the state an individual is in when first observed. Within multi-event models, this conditioning is removed (since we do not observe the true states), and an initial state distribution needs to be specified. Let κi(k) denote the probability individual i is in state k when it is first encountered. The initial state distribution is given by

    Display Formula
    where κi = {κi(k): k = 1, … , K}. More generally, the multi-event framework allows for both observed known state and partially observed state where an observed event restricts the set of possible true states (see Smout et al. [77] who consider this framework in relation to tag-loss models and King & McCrea [78] who derive an efficient closed form observed data likelihood).
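The event observation process can be sketched as follows for the ‘seen near nest’/‘not seen near nest’ example; the event probabilities β and initial state distribution κ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Multi-event observation sketch: beta[k-1, e-1] is the (hypothetical)
# probability of recording event e given true state k, conditional on
# the individual being observed; kappa is the initial state distribution.
K, E = 2, 2
kappa = np.array([0.5, 0.5])
beta = np.array([[0.8, 0.2],       # state 1 (breeding): usually 'seen near nest'
                 [0.1, 0.9]])      # state 2 (non-breeding): usually not

def observe(u, observed):
    """Return the recorded event v (0 if the individual is not observed)."""
    if not observed:
        return 0
    return rng.choice(np.arange(1, E + 1), p=beta[u - 1])

u0 = rng.choice([1, 2], p=kappa)   # initial state draw at first encounter
v0 = observe(u0, observed=True)
```

The true state u0 is latent; only the event v0 enters the data, which is why the initial state distribution κ must be specified.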

Finally, we consider the special case of mixture models proposed by Pledger et al. [79] to allow for unobservable heterogeneity within the population. Within these models, the population is composed of K homogeneous sub-populations, with each individual belonging to a single sub-population and no transitions between sub-populations. Typically, the sub-populations are unobservable, so that there is no directly observable covariate information (uobs = ∅). The covariate values are time-invariant, so that uit = ui for all t = 1, … , T, and

    Display Formula
    where κ = {κ(k): k = 1, … , K} and κ(k) denotes the probability an individual belongs to sub-population k. In general, K may be known or be a parameter to be estimated [79].
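A minimal simulation of this mixture structure, with hypothetical mixture weights κ, illustrates that the latent sub-population label is drawn once per individual and held fixed over time.

```python
import numpy as np

rng = np.random.default_rng(5)

# Pledger-style finite mixture sketch: each individual is assigned one of
# K latent sub-populations at entry and retains it for the whole study.
K = 2
kappa = np.array([0.3, 0.7])                  # hypothetical mixture weights
n, T = 50, 7
u = rng.choice(np.arange(1, K + 1), size=n, p=kappa)  # one label per individual
U = np.repeat(u[:, None], T, axis=1)          # u_{it} = u_i for all t
```

Survival and recapture probabilities would then be indexed by the latent label u, which is updated as an auxiliary variable within the MCMC algorithm.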

    2.3.2. Mixed/random effect models

We now consider continuous individual covariates. In these cases, the demographic parameters are typically specified as a parametric function of the covariate values [11,12,15–20,41,42,65,80] or, more recently, flexible semi-parametric spline models have been proposed to model the relationship between the demographic parameters and covariates [14,81]. For example, let uit = {uit(j): j = 1, … , J} denote the set of J covariate values for individual i at time t. The survival probability of individual i at time t may be (parametrically) logistically regressed on the set of covariates, such that

    Display Formula

We note that if an individual is unobserved at any capture event (or following capture for the CJS model), their corresponding time-varying covariate(s) are also generally unobserved. This typically leads to an analytically intractable observed data likelihood, specified only in integral form, where the integration is over the unknown covariate values. Catchpole et al. [15] proposed an alternative conditional likelihood approach, conditioning on only the observed covariate values, while Bonner et al. [16] compared this approach with an alternative Bayesian approach. The comparative study suggested that, in general, the conditional likelihood approach suffers from poorer precision (as a result of discarding a potentially large proportion of the data) and can have boundary value problems for the parameter estimates, but has the potential advantage that no model needs to be specified for the underlying covariate values, as is required in the Bayesian approach. In particular, in the Bayesian approach the transition process for the covariate values is written in the form of the probability density function associated with the covariates (potentially allowing for dependence between the values). For example, Bonner & Schwarz [13] considered only one time-varying covariate and set

    Display Formula
    for t = f(i) + 1, … , T, dropping the notation on covariate j and where δt denotes an additive year effect. A probability density function is specified for uif(i) if there are unobserved values at initial capture. The missing covariate values are treated as auxiliary variables and updated within the MCMC algorithm.
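This covariate transition can be sketched as a Gaussian random walk with additive year effects, uit ∼ N(ui,t−1 + δt, σ2); the year effects, variance and initial value below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(11)

# Sketch of a Bonner & Schwarz-style covariate transition: a random walk
# with an additive year effect delta_t. All numerical values hypothetical.
T = 7
delta = rng.normal(0.0, 0.5, size=T)   # year effects delta_t
sigma = 1.0                            # transition standard deviation

def simulate_covariate(u0, f):
    """Simulate one individual's covariate path from first capture f."""
    u = np.full(T, np.nan)
    u[f] = u0
    for t in range(f + 1, T):
        u[t] = rng.normal(u[t - 1] + delta[t], sigma)
    return u

path = simulate_covariate(u0=0.0, f=0)
```

In the MCMC algorithm, the same density is used in reverse: missing values along this path are imputed as auxiliary variables given the observed values and (δ, σ).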

    Including individual random effects to the model allows for additional unexplained individual heterogeneity, such that

    Display Formula
where εi ∼ N(0, σε2). Gimenez & Choquet [20] use a numerical integration procedure to approximate the observed data likelihood for random effect models on the survival probabilities. Such an approach can be computationally very expensive for more than two individual random effects. Alternatively, within the Bayesian model-fitting approach, the additional random effect terms ε = {εi: i = 1, … , n} are treated as auxiliary variables that are updated within the Markov chain. The joint posterior distribution over the model parameters, unobserved system state values and random effect terms is given by
    Display Formula
    where Inline Formula denotes the probability density function for the random effects (independent Gaussian for the above random effects model). Model selection can be performed on the covariates present in the model as a variable selection problem [12,19,41].
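The individual random effect specification can be illustrated by drawing individual-specific survival probabilities on the logit scale; the intercept and variance values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def inv_logit(x):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-x))

# logit(phi_i) = a + eps_i, with eps_i ~ N(0, sigma_eps^2).
# The intercept a and sigma_eps are illustrative values only.
n = 200
a = 0.5                      # baseline survival on the logit scale
sigma_eps = 1.0
eps = rng.normal(0.0, sigma_eps, size=n)
phi = inv_logit(a + eps)     # individual-specific survival probabilities
```

In the Bayesian fit, the εi are auxiliary variables updated at each iteration, while a and σε are model parameters.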

    2.4. Efficiency

In general, when there are competing model-fitting algorithms that can be used, their relative performance will be both model and data dependent. In order to investigate the computational efficiency of a state-space modelling approach, we consider capture–recapture data relating to dippers (see Lebreton et al. [82] for further details), where there are T = 7 capture occasions, and compare different model-fitting algorithms. Brooks et al. [53] compare different time-dependent models for these data, identifying the model with constant recapture probability, p, and two survival probabilities, corresponding to flood-effect years (years 2 and 3), with survival probability ϕf, and non-flood years (years 1, 4, 5 and 6), with survival probability ϕn, as having the largest posterior support (assuming uniform priors on the recapture and survival probabilities). Using this temporal model, we specify additional individual random effects, so that the model parameters are of the form

    Display Formula
    Display Formula
    2.6
where Inline Formula and Inline Formula. Royle [18] considers the same data but specifies the fully time-dependent model with individual random effects. We specify the analogous priors to those in [18], such that logit−1an ∼ U[0, 1], logit−1af ∼ U[0, 1], logit−1b ∼ U[0, 1] and σp, σϕ ∼ U[0, 10]. Posterior summary statistics of the model parameters are provided in table 1. We note that the posterior distribution of the underlying mean recapture and survival probabilities is similar to the posterior distribution of the recapture and survival probabilities obtained under the analogous temporal model without random effects [53]. For some discussion relating to prior sensitivity analyses on these data, see studies by Brooks et al. [53] and Royle [18], and for further discussion of individual heterogeneity and alternative models, see Choquet & Gimenez [83].

    Table 1.Posterior mean, standard deviation (s.d.) and 95% symmetric credible intervals (CIs) for the parameters in the model with individual random effects on the recapture and survival probabilities. The terms logit−1an and logit−1af correspond to the underlying mean survival probabilities of the individuals in the presence of individual random effects and logit−1b the underlying mean recapture probability.

    parameter posterior mean (s.d.) 95% CI
    logit−1an 0.61 (0.04) (0.54, 0.69)
    logit−1af 0.48 (0.05) (0.39, 0.58)
    logit−1b 0.94 (0.04) (0.86, 0.99)
    σϕ 2.43 (1.28) (0.33, 5.15)
    σp 0.28 (0.19) (0.04, 0.74)

    We consider three model-fitting algorithms:

    SS1: the underlying state-space formulation described in §2 with individual random effects on the model parameters specified in (2.6);

SS2: a reparametrized state-space formulation replacing the set of values xil(i)+1, …, xiT, where l(i) corresponds to the final time individual i is recaptured alive, with the parameter d = {di: i = 1, … , n} corresponding to the time of death of individual i (see appendix B for further details); and

    L1: explicit calculation of the joint probability of the encounter histories given the parameters and auxiliary variables corresponding to the random effect terms, denoted by f(y|xobs, ε) (see appendix B for further details).

Each model-fitting approach was fitted within the computer package WinBUGS [84]. Method SS1 uses an adapted version of the code provided by Royle [18] allowing for the different time-dependence on the model parameters, with the code for SS2 and L1 being modified versions of this code. Three different models are considered, relating to individual random effects on the recapture probabilities only, the survival probabilities only, or on both the recapture and survival probabilities. These models are denoted by {p(h), ϕ(f)}, {p, ϕ(f + h)} and {p(h), ϕ(f + h)}, respectively, where the h in brackets corresponds to the presence of individual heterogeneity and the f to the flood effect. The length of time to run each model for 10 000 iterations is given in table 2. In general, there is a trade-off between the number of parameters to be updated and the complexity of the posterior distribution. For L1, the probabilities of the encounter histories (following the final time an individual is observed) need to be calculated, whereas for SS1 and SS2 this reduces to a product of Bernoulli terms and imputing the alive/dead status of each individual animal. In this example, the state-space approaches are clearly computationally faster than the L1 approach.

Table 2. Length of time (in seconds) that each model-fitting approach took to run 10 000 iterations in WinBUGS for three different models, where h in the brackets corresponds to individual random effects and f to a flood effect.

model-fitting approach (time for 10 000 iterations)
model SS1 SS2 L1
    p(h), ϕ(f + h) 126 83 180
    p(h), ϕ(f) 65 50 115
    p, ϕ(f + h) 83 61 97

Of the three models fitted including individual heterogeneity, model {p(h), ϕ(f)}, corresponding to individual random effects on the recapture probabilities only, is preferred in terms of posterior model probabilities. Bayes factors of 3.9 and 32 are obtained for model {p(h), ϕ(f)} versus {p, ϕ(f + h)} and {p(h), ϕ(f + h)}, respectively. (However, we do note that the model excluding any random effects, {p, ϕ(f)}, is preferred with a Bayes factor of 5.2 versus model {p(h), ϕ(f)}.) We use model {p(h), ϕ(f)} to compare the performance of the different model-fitting algorithms including some level of individual heterogeneity. Note that for this model, we set q = logit−1b, corresponding to the underlying mean recapture probability of individuals in the presence of individual random effects, and ϕ = {ϕn, ϕf}, the survival probabilities for non-flood and flood years, respectively (no individual random effects present).

We consider the convergence of the Markov chains to the stationary distribution, using the Brooks–Gelman–Rubin (BGR) statistic [85]. The BGR statistics for parameters ϕ, b and σp2 consistently tend to one within 10 000 iterations for repeated simulations starting from different initial values, though the statistics for ϕn and ϕf always tend to one much more quickly than those for the other parameters. This appears to be a result of the poorer mixing of the Markov chain for σp2 (and b). We return to this issue below in relation to autocorrelation. To be conservative, we specify a burn-in of 20 000 iterations for the MCMC simulations, to ensure that the chain has sufficiently converged.

To compare the relative performances of the MCMC algorithm for the different model-fitting approaches, accounting for autocorrelation present within the Markov chain, we consider the effective sample size (ESS) [86] of each of the parameters. The ESS is essentially the number of independent samples within the MCMC sampled values, correcting for autocorrelation within the Markov chain; the values are presented in table 3. There are clear differences in the ESS across the different parameters, with the consistent signal that all the MCMC algorithms perform poorly in the updating of the random effect variance term σp, with very high autocorrelation. Generally, the L1 model-fitting approach appears to have the largest ESS, suggesting a better-mixing Markov chain; however, we recall that this was also the slowest model-fitting algorithm. Table 3 also provides the corresponding ESS per unit time (100 s), i.e. the effective number of independent samples per unit of computational time. Using this ESS per unit time, the state-space approaches SS1 and SS2 each perform better than the L1 approach for all parameters in this example. Moreover, the SS2 state-space parametrization (auxiliary variables correspond to the time of death of each individual) significantly outperforms the SS1 parametrization (auxiliary variables correspond to the alive/dead state of each individual at each capture time following final capture). Finally, we note the very poor performance of the MCMC algorithm for the random effect variance across all algorithms, and subsequently q (or b), due to the high posterior correlation between b and σp.

    Table 3.(a) Effective sample size (ESS) of each parameter under the different model-fitting algorithms using 100 000 iterations after 20 000 burn-in and (b) ESS per 100 seconds.

    parameter SS1 SS2 L1
    (a) ESS
    ϕn 4014 4529 5142
    ϕf 3569 3974 3979
    q 629 943 749
    σp 77 104 121
    (b) ESS per 100 seconds
    ϕn 616 900 447
    ϕf 547 790 346
    q 96 187 65
    σp 12 21 11
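The ESS values in table 3 can be computed from the sampled chains via the sample autocorrelations. The following Python sketch uses the simple and common initial-positive-sequence truncation rule; this is one of several possible ESS estimators, and not necessarily the one implemented in any particular package.

```python
import numpy as np

def ess(chain, max_lag=200):
    """Effective sample size: ESS = N / (1 + 2 * sum of sample
    autocorrelations), truncating the sum at the first non-positive
    autocorrelation."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = x.var()
    s = 0.0
    for k in range(1, min(max_lag, n - 1)):
        rho = np.dot(x[:-k], x[k:]) / (n * var)   # lag-k autocorrelation
        if rho <= 0:
            break
        s += rho
    return n / (1.0 + 2.0 * s)

# A nearly independent chain versus a strongly autocorrelated AR(1) chain.
rng = np.random.default_rng(0)
iid = rng.normal(size=5000)
ar = np.empty(5000)
ar[0] = 0.0
for t in range(1, 5000):
    ar[t] = 0.95 * ar[t - 1] + rng.normal()
```

Dividing the ESS by the run time then gives the ESS per unit time reported in table 3(b).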

    We emphasize that, in general, the performance of different model-fitting approaches will be both problem and model dependent. For example, King [87] compares a state-space approach (using the Gibbs sampler) with the observed data likelihood approach (using a Metropolis–Hastings random walk algorithm) for the AS model using data analysed by Dupuis [63] consisting of 96 lizards, six capture events and three sites. In this case (using bespoke written code), the state-space approach was computationally faster, exhibited lower autocorrelation and did not require pilot-tuning compared with the observed data likelihood approach.

    3. Discussion

    The application of state-space models to capture–recapture–recovery data is very appealing due to its simplicity and ease with which models can be generalized. Partitioning the system and observation processes into distinct components leads to the natural identification of each individual process operating. This allows a complex model to be constructed using a sequence of simpler models, corresponding to each individual process acting on the study population. For example, for the capture–recapture–recovery data, the observation process is partitioned into distinct recapture and recovery processes; for the AS model, the system process is decomposed into the distinct survival process and state transition process. The different processes operating within a system will be problem dependent, but the general modelling approach remains the same. The full model is constructed by considering each process in turn and any conditional dependence between these processes (for example, for capture–recapture data the capture process is conditional on the individual being alive). This type of approach is clearly applicable to integrated analyses, combining together different sources of data within a single analysis. Integrated models simply specify the different processes acting on each individual dataset [42]. To construct and represent the state-space model, a graph is often a useful tool. See Buckland et al. [25,88] for further discussion and examples of different possible processes, including an ageing process (potentially leading to age-dependent parameters) and removals of individuals upon capture. Specifying the complete system process as a combination of simpler distinct system components also leads to the necessary and important consideration of the order in which the system processes occur [88,89]. 
The state-space approach has the advantage that if the model is modified in some way, the impact on each separate component can be considered, and in many cases, the different components may be unaltered. For example, for the AS model, changing the transition process from a first-order to a second-order process only affects the system transition process, with the survival and observation processes being unaltered. This is in contrast to the derivation of the observed data likelihood, which typically needs to be recalculated for different underlying models and which can itself be a non-trivial exercise. In addition, standard software including variants of BUGS (such as WinBUGS/OpenBUGS/JAGS) can be used to implement the state-space modelling approach, specifying the underlying distributional form for each individual process, which in general is significantly simpler than providing the explicit likelihood formulation.

When fitting capture–recapture–recovery data within a Bayesian state-space approach, there is a computational trade-off between the simplicity of the complete data likelihood and the number of parameters and auxiliary variables being updated in the MCMC algorithm. Using the standard state-space approach, imputing the alive/dead status of an individual following their final capture increases the number of parameters to be updated within the MCMC algorithm, but significantly simplifies the form of the posterior distribution. This can also lead to the posterior conditional distribution of the parameters being of standard form, so that the Gibbs sampler may be implemented within the MCMC algorithm, for example, within the AS model for the survival/recapture/recovery probabilities (assuming independent beta priors) and transition probabilities (assuming independent Dirichlet priors). When implementing a Bayesian MCMC algorithm to fit such models, the performance of the Markov chain (particularly in standard packages) should be examined.
For poorly performing chains, alternative updating schemes can be implemented, for example, via the use of blocking of highly correlated parameters, hierarchical centring, parameter expansion and orthogonal reparametrization (see Browne et al. [90] for an overview of such methods and references therein). The use of alternative and efficient updating algorithms is an area of ongoing research.

    Acknowledgements

    I thank Roland Langrock, Byron Morgan, Diana Cole and Ian Goudie for their very helpful comments and discussions on earlier versions of this manuscript. I also thank three anonymous reviewers for their insightful comments on the initial manuscript which has led to the current improved version.

    Appendix A

    Table 4.Glossary of auxiliary variables.

    auxiliary variable value definition
    (a) system processes
    xit 0 individual i is not in the study at time t;
    1 individual i is in the study at time t.
    uit 0 individual i is not in the study at time t;
    k individual i has covariate value k at time t.
    (b) observation processes
    yit 0 individual i is not recaptured alive at time t;
    1 individual i is recaptured alive at time t.
    zit 0 individual i is not recovered dead at time t;
    1 individual i is recovered dead at time t.
    vit 0 individual i is not observed at time t;
    e individual i has event value e at time t,
     given individual is observed.

    Appendix B. Model-fitting algorithms

Here, we describe in greater detail the model-fitting approaches SS2 and L1 described in §2.4. Method SS2 uses a very similar state-space formulation to method SS1, but reparametrizes the model, replacing the set of values xil(i)+1, …, xiT, where l(i) corresponds to the final time individual i is recaptured alive, with the parameter di corresponding to the time of death of individual i. Clearly, the parametrizations are equivalent, but the updating of such parameters in generic packages (including WinBUGS) will differ, with only a single parameter needing to be updated for each individual when using the parameters d = {d1, … , dn}. Conversely, when using the x parametrization, each parameter following the last time an individual is observed alive will be updated (though many of the probability mass functions of these terms may correspond to a point mass on a single value). It is straightforward to show that

    Display Formula
where di = T + 1 corresponds to an individual surviving to the end of the study and the convention that the (empty) product Inline Formula equals one for j2 < j1 is adopted.
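As an illustration of the d parametrization, under a constant survival probability the implied probability mass function of the death time di, conditional on being alive when last recaptured and ignoring the recapture terms (the (1 − p) factors) that also enter the posterior, can be sketched as follows; all parameter values are hypothetical.

```python
# Survival-only sketch of the death-time distribution in the SS2
# parametrization. An individual last seen alive at occasion l survives
# each subsequent interval with probability phi; d = T + 1 denotes
# survival to the end of the study.
def death_time_pmf(phi, l, T):
    pmf = {}
    for d in range(l + 1, T + 1):
        # survives from l to d-1, then dies in the interval ending at d
        pmf[d] = phi ** (d - 1 - l) * (1 - phi)
    pmf[T + 1] = phi ** (T - l)          # survives past the study
    return pmf

pmf = death_time_pmf(phi=0.6, l=3, T=7)
```

The probabilities sum to one over d ∈ {l + 1, … , T + 1}, so a single discrete auxiliary variable di replaces the whole tail of x values for each individual.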

    The third approach for fitting the MCMC algorithm, denoted by L1, explicitly calculates the joint probability of the encounter histories given the parameters and auxiliary variables corresponding to the random effects present in the study, denoted by ε. Let χit denote the probability individual i is not observed again in the study following time t, given they are alive at time t. The term χit can be calculated using the recursive formulae

    Display Formula
    with χiT = 1 for all i = 1, … , n [7]. The joint posterior distribution is given by
    Display Formula
    where σ2 denotes the random effect variances present in the model and f(w|p,ϕ,ε) is the joint probability of the observed data, given the model and random effect parameters. Thus, this approach does not involve any updating of the x (or d) values, but uses an explicit expression for the probability of not being observed again, taking into account all possible times an individual may have died.
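For the constant-parameter CJS case, the χit recursion takes the well-known form χit = (1 − ϕ) + ϕ(1 − p)χi,t+1 with χiT = 1; a minimal Python sketch (illustrative parameter values) is:

```python
import numpy as np

def chi(phi, p, T):
    """chi[t] = P(not observed after occasion t | alive at occasion t),
    computed backwards from chi[T] = 1 for constant survival probability
    phi and recapture probability p."""
    c = np.ones(T + 1)                   # index t = 1..T; c[T] = 1
    for t in range(T - 1, 0, -1):
        # either dies, or survives, is missed, and is never seen again
        c[t] = (1 - phi) + phi * (1 - p) * c[t + 1]
    return c[1:]                          # chi_1, ..., chi_T

c = chi(phi=0.6, p=0.9, T=7)
```

These χ terms are exactly what makes the L1 likelihood more expensive to evaluate than the Bernoulli products of the state-space approaches.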

    The WinBUGS code for model fitting approach SS1 is available in Royle [18], with the adapted codes for approaches SS2 and L1 available from the author on request. Finally, we note that these approaches can be extended to other models, such as multi-state models (including extensions), but that approach L1 becomes more complicated to code, owing to the complexity of the observed data likelihood.

    Footnotes

    One contribution of 11 to a Theme Issue ‘Mathematical and theoretical ecology’.