Monte Carlo profile confidence intervals for dynamic systems

Monte Carlo methods to evaluate and maximize the likelihood function enable the construction of confidence intervals and hypothesis tests, facilitating scientific investigation using models for which the likelihood function is intractable. When Monte Carlo error can be made small, by sufficiently exhaustive computation, then the standard theory and practice of likelihood-based inference applies. As datasets become larger, and models more complex, situations arise where no reasonable amount of computation can render Monte Carlo error negligible. We develop profile likelihood methodology to provide frequentist inferences that take into account Monte Carlo uncertainty. We investigate the role of this methodology in facilitating inference for computationally challenging dynamic latent variable models. We present examples arising in the study of infectious disease transmission, demonstrating our methodology for inference on nonlinear dynamic models using genetic sequence data and panel time-series data. We also discuss applicability to nonlinear time-series and spatio-temporal data.


Introduction
This paper develops profile likelihood inference methodology for situations where computationally intensive Monte Carlo methods are employed to evaluate and maximize the likelihood function. If the profile log-likelihood function can be computed with a Monte Carlo error small compared to one unit, carrying out statistical inference from the Monte Carlo profile as if it were the true profile will have relatively small effects on resulting confidence intervals. Sometimes, however, no reasonable amount of computation can reduce the Monte Carlo error in evaluating the profile to levels at or below one log unit. This predicament typically arises with large datasets and complex models. However, to investigate large datasets in the context of complex models there is little alternative to the use of Monte Carlo methods. Monte Carlo approximation of the profile likelihood function provides opportunities to assess Monte Carlo variability and make appropriate compensations. We use this approach to construct profile likelihood confidence intervals with statistical behaviour properly adjusted for Monte Carlo uncertainty.
Our paper is organized as follows. First, we set up mathematical notation to formalize the task of Monte Carlo profile likelihood estimation via a metamodel. Section 2 puts this task in the context of some previous work on likelihood-based inference for intractable models. Section 3 develops our methodological approach. Section 4 presents a dynamic latent variable modeling framework of broad applicability for which the methodology is appropriate. We demonstrate the capabilities of our methodology by solving two inferential problems for which scientific progress has been limited by the lack of effective statistical methodology. These examples arise from the study of transmissible human diseases, a field characterized by extensive and diverse data, indirect observation of the underlying infection processes, strongly nonlinear stochastic dynamics and public health importance. Infectious disease data therefore provide many inference opportunities and challenges. Section 4.1 concerns inference on population dynamics from genetic data. Section 4.2 concerns fitting nonlinear partially observed Markov models to panel data. Section 4.3 discusses the role of our methodology for nonlinear time-series and spatio-temporal data analysis. Section 5 investigates our methodology via a simulation study on a toy example. Section 6 is a concluding discussion which situates our paper within the broader goal of inference for large datasets and complex models.
We consider a general statistical inference framework in which data are a real-valued vector, y*, modelled as a realization of a random variable Y having density f Y (y ; u), where u is an unknown parameter in R p . We are concerned with inference on u in situations where the data analyst cannot directly evaluate f Y (y ; u). Instead, we suppose that approximate evaluation of f Y (y ; u) is possible through Monte Carlo approaches. One situation in which this arises is when the statistician can simulate draws from the density f Y (y ; u) despite being unable to directly evaluate it [1]. In addition to a simulator for the full joint distribution of Y, we might also have access to simulators for various marginal and conditional distributions related to f Y (y ; u). For example, this can arise if Y has the structure of a fully or partially observed Markov process [2]. Simulation-based methods are growing in usage, motivated by advances in the availability of complex data and the desire for statistical fitting of complex models to these data. Although we cannot calculate them, we can nevertheless define the log-likelihood function, and a maximum-likelihood estimate (MLE), To formalize the task of constructing marginal confidence intervals, we suppose that Here, f is a focal parameter for which we are interested in obtaining a confidence interval using the data, y*. As the choice of focal parameter is arbitrary, we are solving the general problem of obtaining a marginal confidence interval for each component of a parameter vector. The profile log-likelihood function for f is defined as The profile log likelihood is maximized at a marginal MLE, A profile likelihood confidence interval with cut-off d is defined as Profile likelihood confidence intervals are a widespread inference approach with some favourable properties, including asymptotic efficiency and natural transformation under reparameterization [3]. Modifications can lead to higher-order asymptotic performance [4] but these are not routinely available. In our context, (1.3)-(1.5) are not directly accessible to the data analyst. Instead, we work with independent Monte Carlo profile likelihood evaluations at a sequence of points f 1:K ¼ (f 1 , . . ., f K ). We denote the evaluations as ( ' P k (y Ã ), k [ 1 : K), using a breve accent to distinguish Monte Carlo quantities. Without loss of generality we can write ' where e 1:K (Y ) are Monte Carlo random variables which are, by construction, mean zero and independent conditional on Y . In (1.6), b 1:K (y*) gives the Monte Carlo bias of each profile log-likelihood evaluation. Motivation for the decomposition into target quantity, bias and additive error on the log scale in (1.6) is that this is a proper scale for Monte Carlo central limit theory relevant to our subsequent examples [5] as well as an appropriate scale for inference.
The amount of information about f in the data is represented by the curvature of the profile log likelihood, and if this is large then statistically relevant region with high profile likelihood is narrow. As represented pictorially in figure 1, increasing the curvature of the profile log likelihood reduces the consequence of non-constant bias in the construction of profile confidence intervals. A useful simplification arises when it is reasonable to treat the distribution of the Monte Carlo bias and error in (1.6) as constant across the statistically relevant region having high-profile likelihood. This leads us to consider a metamodel with b k (y*) ¼ b( y*) and with e 1:K ( y*) independent and identically distributed (i.i.d.) Figure 1. The effect of bias on confidence intervals for a quadratic profile log-likelihood function. (a) The blue dotted quadratic represents a log-likelihood profile. The maximum-likelihood estimator of the profiled parameter is f 3 , with corresponding log likelihood with variance s 2 ( y*) , 1. The resulting metamodel is ' and He et al. [6] introduced the term plug-and-play to describe statistical methodology for which the model (viewed as an input to an inference algorithm) is specified via a simulator in this broader sense. The term likelihood-free has been used similarly, in the context of Markov chain Monte Carlo (MCMC) [7] and sequential Monte Carlo (SMC) [8,9]. The term equation-free has been used for the related concept of simulation-based model investigations in the physical sciences [10]. Related terms implicit [1] and doubly intractable [11] have been used to describe models for which only plug-and-play algorithms are practical. From the point of view of categorizing statistical methodology, it is natural to view the way in which an inference algorithm accesses the statistical model as a property of the algorithm rather than a property of the model. Rubio & Johansen [12] investigated non-parametric estimation of a likelihood surface via approximate Bayesian computing (ABC). These authors also provided a literature review of previous approaches to carry out statistical inferences in situations where likelihood evaluation and maximization necessarily involve computationally intensive and noisy Monte Carlo procedures. We are not aware of previous work developing Monte Carlo profile likelihood methodology. Profile methodology focuses the computational effort on parameters of key interest-specifically, parameters for which one computes a profile. The process of constructing a profile requires computation of a relevant feature of the likelihood surface in the region of inferential interest. Studying the likelihood surface on this scale, rather than focusing exclusively on a point estimate such as the maximum-likelihood estimate, has some theoretical justification [13]. In the general theory of stochastic simulation-based optimization, building metamodels describing the response surface is a standard technique [14]. Our goal is to develop metamodel methodology that takes advantage of the statistical properties of the profile likelihood and constructs confidence intervals correcting properly for Monte Carlo variability.

Profile cut-off correction via a local quadratic metamodel
Local asymptotic normality (LAN) provides a general theoretical framework in which the log-likelihood function is asymptotically well approximated by a quadratic [15]. Under sufficient regularity conditions, this quadratic approximation is inherited by the profile log likelihood [16]. Here, we write the marginal f component of the LAN property as a finite sample normal approximation given by is a normal random variable with mean 0 and variance 1. In (3.1), % indicates approximate equality in distribution. Under regular asymptotics, the curvature of the quadratic approximation in LAN is the Fisher information, and LAN is therefore a similar property to asymptotic normality of the maximum-likelihood estimate. The quantity I in (3.1) can be interpreted as the marginal Fisher information for f, also known as the f-aspect of the Fisher information [4,Section 3.4]. Specifically, if we write the inverse of the full Fisher information as In this article, we focus on developing and demonstrating statistical methodology rather than on presenting theoretical results. Therefore, the formal mathematical representation of the approximations in this paper as asymptotic limit theorems is postponed to subsequent work.
The LAN property suggests that the Monte Carlo profile log likelihood evaluated at f 1:K can be approximated, in a neighbourhood of its maximum, by a quadratic metamodel, '

ð3:2Þ
This local quadratic metamodel is a special case of (1.7). The unknown coefficientsâ(y Ã ),b(y Ã ) andĉ(y Ã ), corresponding to equation (3.2) evaluated at y ¼ y*, describe a quadratic approximation to the numerically intractable likelihood surface. We can use standard linear regression to estimatê a(y Ã ),b(y Ã ) andĉ(y Ã ) from the Monte Carlo profile evaluations. Writing e ¼ e 1:K , we denote the resulting linear regression coefficients as The marginal MLEf Ã can be approximated by the maximum of ' Q (f ; y Ã ), which is given by into two components: (1) Statistical error is the uncertainty resulting from randomness in the data, viewed as a draw from the statistical model. This is the error in the ideal quadratic profile approximation estimateb(y Ã )=2â(y Ã ) as an estimate of f 0 . (2) Monte Carlo error is the uncertainty resulting from implementing a Monte Carlo estimator. This is the error in b(y Ã , e)=2 a(y Ã , e) as a Monte Carlo estimate ofb(y Ã )=2â(y Ã ).
The LAN approximation in (3.1) suggests a normal approximation for the distribution of the marginal MLEf Ã which we write asf ðYÞ % N½f 0 , SE 2 stat : ð3:4Þ The usual statistical standard error, 1= ffiffiffiffiffiffiffiffiffiffiffiffi 2â(y Ã ) p , is not available to us, but we can instead use its Monte Carlo estimate, The usual asymptotic profile likelihood confidence interval cut-off value can be obtained by converting the standard error of the MLE into an equivalent cut-off on a quadratic approximation to the profile log likelihood. In our setting, the asymptotic (1 2 a) confidence interval, Monte Carlo adjusted profile cut-off for the quadratic approximation Note that, if SE mc ¼ 0, the calculation in (3.11) for a ¼ 0.05 reduces to the usual cut-off to construct a 95% CI for an exact profile likelihood.
Confidence intervals based on a quadratic approximation to the exact log likelihood are asymptotically equivalent to using the same cut-off d with a smoothed version of the likelihood, so long as an appropriate smoother is used [13]. An appropriate smoother should return a quadratic when the points do indeed lie on a quadratic, a property satisfied, for example, by local quadratic smoothing such as the R function loess [17]. We, therefore, propose using d as an appropriate cut-off on a profile likelihood estimate obtained by applying a suitable smoother to the Monte Carlo evaluations in (1.6). A smoother, S(f ; f 1:K , ' Here, we suppose that S(f ; f 1:K , ' P 1:K , l) is evaluated at f via local quadratic regression with weight w k (f) on the point (f k , ' P k ), where w k (f) depends on the proximity of f to f k . Specifically, we take S to be the widely used local quadratic smoother of [18] as implemented by the function loess in R3.3.3. In this case, l is the span of the smoother, defined as the fraction of the data used to construct the weights in the local regression at any point f. In practice, the statistician needs to specify l. While automated choices of smoothing parameter have been proposed, it remains standard practice to choose the smoothing parameter based on some experimentation and looking at the resulting fit. In our experience, the default loess choice of l ¼ 0.75 in R3.3.3 has been appropriate in most cases. However, a larger value of l is needed when the profile is evaluated at very few points (as demonstrated in the electronic supplementary material, Section S1). When the exact profile is not far from quadratic, one can expect local quadratic smoothing of the Monte Carlo profile likelihood to be insensitive to the choice of l.
Just as the local quadratic regression smoother has weights w 1:K (f ), the quadratic metamodel in (3.2) can be fitted using regression weights. A natural choice of these weights for obtaining a profile confidence interval cut-off for S(f ; f 1:K , ' Fit a local quadratic smoother, ' Obtain regression weights w 1:K for the evaluation of The observation Y n is modelled as a measurement of X(t n ) by requiring that Y n is conditionally independent of the other observations and of fX(t)g given X(t n ). We will suppose that Y n takes values in Y ¼ R d , the space of d-dimensional real vectors.

Inferring population dynamics from genetic sequence data
Genetic sequence data on a sample of individuals in an ecological system has potential to reveal population dynamics. Extraction of this information has been termed phylodynamics [21]. Likelihood-based inference for joint models of the molecular evolution process, population dynamics and measurement process is a challenging computational problem. The bulk of extant phylodynamic methodology has therefore focused on inference for population dynamics conditional on an estimated phylogeny and replacing the population dynamic model with an approximation, called a coalescent model that is convenient for calculations backwards in time [22]. Working with the full joint likelihood is not entirely beyond modern computational capabilities; in particular it can be done using the genPomp algorithm of Smith et al. [23]. The genPomp algorithm is an application of iterated filtering methodology [19] to phylodynamic models and data. To the best of our knowledge, genPomp is the first algorithm capable of carrying out full joint likelihood-based inference for population-level phylodynamic inference. However, the genPomp algorithm leads to estimators with high Monte Carlo variance, indeed, too high for reasonable amounts of computation resources to reduce Monte Carlo variability to negligibility.
rsif.royalsocietypublishing.org J. R. Soc. Interface 14: 20170126 This, therefore, provides a useful scenario to demonstrate our methodology. Figure 2 presents a Monte Carlo profile computed by Smith et al. [23], with confidence intervals constructed by applying the MCAP algorithm implemented by the mcap procedure (electronic supplementary material, Section S3) with default smoothing parameter. The model and data concern HIV transmission in Southeast Michigan, but details of the model and computations are not of immediate interest since all we need to consider are the estimated profile likelihood points. The profiled parameter quantifies HIV transmission from recently infected, diagnosed individuals-it is 1 J 0 in the notation of Smith et al. [23] but we rename it as f for the current paper. The computations for figure 2 took approximately 10 days using 500 cores on a Linux cluster. To scale this methodology to increasingly large datasets and more complex models, it is apparent that one may be limited by the computational effort required to control Monte Carlo error. The MCAP procedure gives a Monte Carlo standard error of SE mc ¼ 0.151 on the value maximizing the smoothed Monte Carlo profile, based on the quadratic approximation at the maximum. The statistical error is SE stat ¼ 0.32. Combining these sources of uncertainty gives a total standard error of SE total ¼ 0.354. From (3.11), the resulting 95% CI cut-off is d ¼ 2.35. We see in figure 2 that the smoothed profile is close to its quadratic approximation in the neighbourhood of the maximum statistically supported by the data. We also see that the Monte Carlo uncertainty in the profile confidence interval is rather small, leading to a profile cut-off not much bigger than the value of 1.92 for zero Monte Carlo error, despite the large Monte Carlo variability in the evaluation of any one point on the profile.

Panel time-series analysis
Panel data consist of a collection of time series which have some shared parameters, but negligible dynamic dependence. We consider inference using mechanistic models for panel data, i.e. equations for how the process progresses through time derived from scientific principles about the system under investigation. In principle, statistical methods for mechanistic time-series analysis [2] extend to the panel situation [24]. However, extensive data add computational challenges to Monte Carlo inference schemes. In particular, with increasing amounts of data, it must eventually become infeasible to calculate the likelihood with an error as small as one log unit. The MCAP procedure nevertheless succeeds so long as the signal-to-noise ratio in the Monte Carlo profile is adequate. In a simple situation, where each time series is modelled as i.i.d. and each time-series model contains the same parameters, we can check how this ratio scales. The Fisher information scales linearly with the number of time series in the panel, and therefore the curvature of the loglikelihood profile also scales linearly. The Monte Carlo standard error on the likelihood scales at a square-root rate. In this case, we, therefore, expect the MCAP methodology to scale successfully with the number of time series in the panel.
Investigations of population-level infectious disease transmission lead to highly nonlinear, stochastic, partially observed dynamic models. The great majority of disease transmission is local, despite the importance of spatial transmission to seed the local epidemics [25]. Fitting models to panels of epidemiological time-series data, such as incidence data for collections of cities or states, offers potential to elucidate the similarities and differences between these local epidemics.
We demonstrate the MCAP procedure on a panel estimate of the reporting rate of paralytic polio in the pre-vaccination era USA. Reporting rate has important consequences for understanding the system: conditional on observed incidence data, reporting rate determines the extent of the unreported epidemic. Yet, in the presence of many uncertainties about this complex disease transmission system, a single disease incidence time series often cannot conclusively pin down this epidemiological parameter. The profile evaluations in figure 3 were obtained by Bretó et al. [24] in an extension of the analysis of Martinez-Bakker et al. [26]. Martinez-Bakker et al. [26] analysed state-level paralytic polio incidence data in order to study the role of unobserved asymptomatic polio infections in disease persistence. Here, the reporting rate parameter (log(r) in the terminology of [24]) is denoted by f. The MCAP procedure gives a Monte Carlo standard error of SE mc ¼ 0.033 and a statistical error of SE stat ¼ 0.013. Combining them gives a total standard error of SE total ¼ 0.035. The resulting profile cut-off is d ¼ 13.6. The profile decreases slowly to the right of the smoothed MLE, since higher reporting rates can be compensated for by lower transmission intensities. The model struggles to explain reporting rates much lower than the smoothed MLE, since the reporting rate must be sufficient to explain the observed number   of cases in a situation where almost all individuals acquire non-paralytic polio infections. This asymmetrical trade-off may explain why the profile log likelihood shows some noticeable deviation from its quadratic approximation in a neighbourhood of the maximum. A consequence of this changing curvature is that the quadratic approximation used to construct the Monte Carlo profile at its maximum (figure 3, dotted blue line) does not share this maximum.
The computations for figure 3 required approximately 24 h on 300 cores. At this level of computational intensity, we see that the majority of uncertainty about the parameter f is due to Monte Carlo error rather than statistical error. For this large panel dataset, in the context of the fitted model, the parameter f would be identified very accurately by the data if we had access to the actual likelihood surface. Additional computation could, therefore, reduce the uncertainty on our estimate of f by a factor of three. However, the data analyst may decide the available computational effort is better used exploring other parameters or alternative model specifications.

Applications to time-series and spatio-temporal data analysis
The examples in § §4.1 and 4.2 demonstrate applications which were computationally intractable without MCAP.
Applications of the POMP framework to nonlinear timeseries analysis typically involve smaller data sets, and a relatively simple dependence structure, and are therefore less computationally demanding. This consideration has facilitated the utilization of Monte Carlo profile likelihood, without the benefits of MCAP, as a technique at the cutting edge of nonlinear time-series analysis. In the context of infectious disease dynamics, Dobson [27] wrote, 'Powerful new inferential fitting methods [28] considerably increase the accuracy of outbreak predictions while also allowing models whose structure reflects different underlying assumptions to be compared. These approaches move well beyond time series and statistical regression analyses as they include mechanistic details as mathematical functions that define rates of loss of immunity and the response of vector abundance to climate The main practical limitation of this approach is computational resources [6]. We have shown that our methodology can both quantify and dramatically reduce the Monte Carlo error in computationally intensive inferences for POMP models. The MCAP procedure therefore improves the accessibility and scalability of inference for nonlinear time-series models. Spatio-temporal data consists of time series collected at various locations. Models for partially observed spatiotemporal dynamics extend the panel models of §4.2 by allowing for dynamic dependence between locations. SMC methods, that provide a foundation for likelihood-based inference relating POMP models to time-series data, struggle with spatio-temporal data since they scale poorly with spatial dimension [34]. Theoretically, SMC methods with sub-exponential scaling can be developed for weakly coupled spatio-temporal systems [35]. Nevertheless, practical methodology for fitting nonlinear non-Gaussian spatio-temporal models continues to be constrained by high Monte Carlo variance [36]. Thus, this class of inference challenges stands to benefit from our MCAP methodology. The electronic supplementary material, Section S1, presents an example for fitting a coupled spatio-temporal model to measles incidence in twenty cities.

A simulation study of the Monte
Carlo-adjusted profile procedure We look for a numerically convenient toy scenario that generates Monte Carlo profiles resembling figures 2 and 3.
Our simulated data are an independent, identically distributed lognormal sample Y 1:N , where log(Y n ) N[f, 2s 2 ] for n [ 1 : N. We consider a profile likelihood confidence interval for the log mean parameter, f. The lognormal distribution leads to log-likelihood profiles that deviate from quadratic. To set up a situation with Monte Carlo error in evaluating and maximizing the likelihood, we supposed that the likelihood is accessed via Monte Carlo integration of a latent variable. Specifically, we write Y n jX n lognormal(X n , s 2 ) with X n N[f, s 2 ]. Then, our Monte Carlo density estimator is where f LN ( y ; m, t 2 ) is the lognormal density, Àðlog y À mÞ 2 2t 2 ( ) and e 1:J is a sequence of standard normal pseudo-random numbers corresponding to a seed s. We suppose that we are working with a parallel random number generator such that pseudo-random sequences corresponding to different seeds behave numerically like independent random sequences. Our Monte Carlo log-likelihood estimator is '(f, s ; y 1:N , s, J) ¼ X N n¼1 log f Y (y n ; f, s, s þ n À 1, J): ð5:2Þ Our Monte Carlo profile is calculated at f [ f 1:K . We maximize the likelihood numerically, at a fixed seed, to give a corresponding estimate of s given by s P k (y 1:N , s, J) ¼ arg max s '(f k , s ; y 1:N , s þ N(k À 1), J): ð5:3Þ We do not wish to imply that practical examples will generally result from a fixed-seed Monte Carlo likelihood calculation. Seed fixing is an effective technique for removing Monte Carlo variability from relatively small calculations, but can become difficult or impossible to implement effectively for complex, coupled, nonlinear systems.
The following numerical results used N ¼ 50 and J ¼ 3 with true parameter values f 0 ¼ 0 and s 2 0 ¼ 1. There are two ways to increase the Monte Carlo error in the log likelihood for this toy example, by increasing the sample size, N, and decreasing the Monte Carlo effort, J. The Monte Carlo variance of the log-likelihood estimate increases linearly with N, but at the same time the curvature of the log likelihood increases and, within the inferentially relevant region, the profile log likelihood becomes increasingly close to quadratic. Thus, in the context of our methodology, increasing N rsif.royalsocietypublishing.org J. R. Soc. Interface 14: 20170126 actually makes inference easier despite the increasing Monte Carlo noise. This avoids a paradoxical difficulty of Monte Carlo inference for big data: more data should be a help for a statistician, not a hindrance! Decreasing J represents a situation where Monte Carlo variability increases without increasing information about the parameter of interest. In this case, the Monte Carlo variability and the Monte Carlo bias on the log likelihood due to Jensen's inequality both increase. Also, likelihood maximization becomes more erratic for small J since the maximization error due to the fixed seed becomes more important. However, figure 4 shows that, even when there is considerable bias and variance in the Monte Carlo profile evaluations, the Monte Carlo profile confidence intervals can be little wider than the exact interval.
We computed intervals with nominal coverage of 95%. The MCAP coverage here was 93.4%, compared to 94.3% for the asymptotically exact profile (with a simulation study Monte Carlo standard error of 0.2%). The MCAP intervals were, on average, 12.5% larger than the corresponding exact profile interval, with the increased width accounting for the additional Monte Carlo uncertainty.
Two alternative approaches to generating confidence intervals based on a maximum-likelihood estimator are observed Fisher information and the bootstrap method. Comparisons with these methods on the toy example are presented in the supplement (electronic supplementary material, Section S2). Profile likelihood confidence intervals were found to perform favourably on this example, measured by interval width for a given coverage and given computational effort.

Discussion
This paper has focused on likelihood-based confidence intervals. An alternative to likelihood-based inference is to compare the data with simulations using some summary statistic. Various plug-and-play methodologies of this kind have been proposed, such as synthetic likelihood [37] and nonlinear forecasting [38]. For large nonlinear systems, it can be hard to find low-dimensional summary statistics that capture a good fraction of the information in the data. Even summary statistics derived by careful scientific or statistical reasoning have been found surprisingly uninformative compared to the whole data likelihood in both scientific investigations [39] and simulation experiments [40].
Much attention has been given to scaling Bayesian computation to complex models and large data. Bayesian computation is closely related to likelihood inference for stochastic dynamic models: the random variables generating a dynamic system are typically not directly observed, and these latent random variables are therefore similar to Bayesian parameters. We refer to these latent random variables as random effects since they have a similar role as linear model random effects. To carry out inference on the structural parameters of the model (i.e. the vector u in this article) the Bayesian approach looks for the marginal posterior of u, which involves integration over the random effects. Likelihood-based inference for u similarly involves integrating out the random effects. Numerical methods such as expectation propagation (EP) [41] and variational Bayes [42] are effective for some model classes. Another approach is to combine MCMC computations on subsets of the data, as in the posterior interval estimation (PIE) method of Li et al. [43]. The above approaches (EP, VB and PIE) all emphasize situations where the joint density of the data and latent variables can be conveniently split up into conditionally independent chunks, such as a hierarchical model structure. Our methodology has no such requirement. The panel model example above does have a natural hierarchical structure, with individual panels being independent (in the frequentist model sense) or conditionally independent given the shared parameters (in the Bayesian model sense). Our genetic example, and the spatio-temporal example of the electronic supplementary material, Section S1, do not have such a representation.
Some simulation-based Bayesian computation methodologies have built on the observation that unbiased Monte Carlo likelihood computations can be used inside an MCMC algorithm [44]. For large systems, high Monte Carlo variability of likelihood estimates is a concern, in this context, since it slows down MCMC convergence [45]. Doucet et al. [46] found that, for a given computational budget, the optimal balance between number of MCMC iterations and time spent on each likelihood evaluation occurs at a Monte Carlo likelihood standard deviation of one log unit. For the systems we demonstrate, Monte Carlo errors that small are not computationally feasible.
Our simple and general approach permits inference when the signal-to-noise ratio in the Monte Carlo profile log likelihood is sufficient to uncover the main features of this function, up to an unimportant vertical shift. For large datasets in which the signal (quantified as the curvature of the log likelihood) is large, the methodology can be effective even when the Monte Carlo noise is far too big to carry out standard MCMC techniques. Although the frequentist motivation for likelihood-based inference differs from the goal of Bayesian posterior inference, both approaches can be used for deductive scientific reasoning [47,48].
Our methodology builds on the availability of Monte Carlo algorithms to evaluate and maximize the likelihood. If these Monte Carlo algorithms are completely overwhelmed by the problem at hand, our method will fail. Geometric features of the likelihood surface, such as nonlinear ridges and multimodality, can lead to challenges for all numerical methods including Monte Carlo approaches. High dimensionality can also be problematic, particularly if combined with difficult geometry. The presence of challenging   Author's contributions. All authors performed the analysis and wrote the manuscript.