Stochastic modelling of urban structure

The building of mathematical and computer models of cities has a long history. The core elements are models of flows (spatial interaction) and the dynamics of structural evolution. In this article, we develop a stochastic model of urban structure to formally account for uncertainty arising from less predictable events. Standard practice has been to calibrate the spatial interaction models independently and to explore the dynamics through simulation. We present two significant results that will be transformative for both elements. First, we represent the structural variables through a single potential function and develop stochastic differential equations to model the evolution. Second, we show that the parameters of the spatial interaction model can be estimated from the structure alone, independently of flow data, using the Bayesian inferential framework. The posterior distribution is doubly intractable and poses significant computational challenges that we overcome using Markov chain Monte Carlo methods. We demonstrate our methodology with a case study on the London, UK, retail system.


Introduction
The task of understanding the inner workings of cities and regions is a major challenge for contemporary science. The key features of cities and regions are activities at locations, flows between locations and the structure that facilitates these activities [1]. It is well understood that cities and regions are complex systems, and that an emergent structure arises from the actions of many interacting individuals. The flows between locations arise from the choices of individuals. An understanding of the underlying choice mechanism is therefore advantageous for planning and decision-making. Economists have long supported the idea that consumer choices are derived from utility, a measure of net benefit, although preferences can only be measured indirectly by the phenomena they give rise to [2].
Random utility models, such as the multinomial logit model [3], provide a discrete choice mechanism based on a utility function. These models have received considerable attention in the econometrics literature [4]. The more conventional random utility models assume that choices are conditionally independent and require large volumes of flow data to calibrate. It is generally difficult to ascertain the flow data for a large number of individuals residing in a country or city, and this may require an extensive survey that suffers from sampling biases. On the other hand, the structure facilitating activities can be more straight-forward to measure.
It turns out that the flows between locations concern a vast number of individuals and are well represented by statistical averaging procedures [5]. It also turns out that the evolution of urban structure can be described by a system of coupled first-order ordinary differential equations that are related to the competitive Lotka-Volterra models in ecology [6]. The conventional Harris and Wilson model in [6] is obtained by combining Lotka-Volterra models with statistical averaging procedures, after having expressed the flows in terms of the evolving structure and spatial interaction. As it tends to be more feasible to observe the emergent structure, for example configurations of floorspace dedicated to retail activity, our work is largely motivated by the existing models of urban structure [1,[6][7][8][9]. By adopting a similar approach, we view the flows between locations as 'missing data'.
We note, however, that there is an urgent need to provide an improved modelling capability that captures the stochastic nature and uncertainty associated with the evolution of urban structure. The key shortcoming of the Harris and Wilson model is that it is deterministic and converges to one of multiple equilibria as determined by the initial conditions. In reality, the behaviour provided by the Harris and Wilson model would be accompanied by fluctuations arising from less predictable events. We instead introduce mathematically well-posed systems of stochastic differential equations (SDEs) to address this shortcoming, and provide an associated Bayesian inference methodology for parameter estimation and model calibration.
To this end, we take a novel approach and construct a probability distribution to represent the uncertainty in equilibrium structures for urban and regional systems. The probability distribution is a Boltzmann-Gibbs measure that is the invariant distribution of a related SDE model [10], and is defined in terms of a potential function whose gradient describes how we expect urban structure to evolve forward in time. The potential function may be interpreted as constraints on consumer welfare and running costs from a maximum entropy argument [8,11]. For the purposes of parameter estimation, the Boltzmann-Gibbs measure forms an integral part of the assumed datagenerating process in a Bayesian model of urban structure [12,13]. A computational statistical challenge arises as there is an intractable term in the density of the Boltzmann-Gibbs measure that is parameter dependent. The intractable term must be taken into consideration when using Markov chain Monte Carlo (MCMC) to explore the probability distributions of interest [14,15]. Our approach is applicable to a wide range of applications in urban and regional modelling; we demonstrate our approach by inferring the full distribution over the model parameters and latent structure for the London, UK, retail system.

Modelling urban systems
In this section, we construct a probability distribution for urban and regional systems. We work in the setting of the Harris and Wilson model [6] and use consumer behaviour as an archetype; however, the methodology is general and has wider applications such as archaeology, logistics, healthcare and crime to name a few [1]. We are interested in the sizes of M destination zones where consumer-led activities take place, for example shopping. Similarly, there are N origin zones from where consumers create demands for each of the destination zones. We define urban structure   Figure 1. Illustration of a flow in an urban or regional system. It is assumed that there are N origin zones (e.g. left) and M destination zones (e.g. right). The flow T ij denotes the flow of quantities from origin zone i to destination zone j. In an urban or regional system, there are NM flows similar to the one depicted.
as the vector of sizes In what proceeds it is more natural to work in terms of log-sizes X = {X 1 , . . . , X M } ∈ R M , where each W j = exp(X j ). We refer to log-size as the attractiveness, which is an unscaled measure of benefit, and by working in terms of attractiveness we avoid positivity issues when developing a stochastic model. We first describe a stochastic generalization of the Harris and Wilson model and then consider the equilibrium distribution as a probability distribution of urban structure.

(a) A stochastic reformulation of the Harris and Wilson model
The flow between destination zone j and origin zone i is denoted T ij . We illustrate a component of an urban or regional system in figure 1. For a singly constrained urban system, the demands made by the N origin zones are and are known. The demands made for the M destination zones are and are to be determined. The demands for the destination zones depend on urban or regional structure. It is assumed that larger zones provide more benefits for their use and that local zones are more convenient and cost less to use. A suitable model of the flows is obtained by maximizing an entropy function subject to the constraint in (2.1) in additional to fixed benefit and cost constraints [5,8]. The resulting flows are where α, β > 0 are scaling parameters and each c ij ≥ 0 represents the cost or inconvenience of carrying out an activity at zone j from i. We expect that zones with unfulfilled demand will grow, whereas zones that do not fulfil their capacity will reduce to a more sustainable size. It is therefore reasonable to expect a degree of stability in the sizes of the destination zones. A suitable model of the dynamics is given by the Harris and Wilson model [6], which is described by a system of ordinary differential equations (ODEs) 1 dW where > 0 is the responsiveness parameter and κ > 0 is the cost per unit floor size. The assumption that zones aim to maximize their size until an equilibrium is reached is justified by including the cost of capital in the running costs. A natural generalization of the Harris and Wilson model is the following SDE with multiplicative noise that we interpret in the Stratonovich 2 sense: for a standard M-dimensional Brownian motion B and volatility parameter σ > 0. A heuristic interpretation of the SDE is that, over a short time δt, the net capacity term 'D j − κW j ' in (2.4) is randomly perturbed by centred Gaussian noise with a standard deviation of σ √ δt. The noise term represents fluctuations in the growth rates arising from less predictable events that are not captured by the original model. The specification of multiplicative noise preserves the positivity of each W j .
With the change of variables X j = ln W j , the Harris and Wilson model in (2.4) can be expressed as a gradient flow. The corresponding stochastic dynamics in (2.5) is an overdamped Langevin diffusion. To express this notion, we introduce a potential function V: R M → R, its gradient ∇V: R M → R M and an 'inverse-temperature' parameter γ = 2σ −2 and reformulate the stochastic dynamics as where the potential function is It is well understood that the density of X(t), denoted ρ(x, t), evolves in time according to the Fokker-Planck equation [10]. For the SDE in (2.6), the Fokker-Planck equation can be written as While (2.8) is very challenging to solve, especially in higher dimensions, its steady-state solution is available in closed form and is the density of a Boltzmann-Gibbs measure given by The Boltzmann-Gibbs measure described by (2.9) forms the basis of our stochastic model of urban structure. The potential function given by (2.7) does not yield a well-defined probability distribution as the normalizing constant in (2.9) is not finite. In order to address the issue, we could restrict the dynamics to a bounded subset of R M , or introduce a confining term in the potential function. We adopt the latter approach and later argue that this approach amounts to an economically meaningful constraint.

(b) Boltzmann-Gibbs measures for urban structure
We model urban and regional structure as a single realization of the Boltzmann-Gibbs measure described by (2.9). The Boltzmann-Gibbs measure is the stationary distribution of the overdamped Langevin dynamics considered; however, we acknowledge that there are other stochastic processes that have the same stationary distribution [16]. It is desirable that the potential function satisfies the assumptions in appendix A. It suffices to say here that smooth potential functions that grow at least linearly but no faster than exponentially at infinity have the desired mathematical properties. The Boltzmann-Gibbs measure can also be obtained from a maximum entropy argument [8,11]. The advantage of this view is that the terms in the potential function can be interpreted as economic constraints. We consider a potential function with three components to develop a baseline model, although more comprehensive presentations are possible 3 (2.10) where κ is as before and δ > 0 is an additional parameter. The utility potential describes consumer welfare arising from utility-based choices; the cost potential enforces capacity limits in the system; and the additional potential is a confining term that represents government initiatives, continued investment or a background level of demand. If we consider a random variable X ∈ R M that is subject to the following equality constraints: with each C i ∈ R, then the maximum entropy distribution of X can be written as the Boltzmann-Gibbs measure whose density is given by (2.9) with reference to the potential function in (2.10). We now consider the meaning of each of these constraints in turn.

(i) Utility potential
A natural candidate for a utility potential is a measure of consumer welfare. For example, welfare may be taken to be the area under the demand curve [17], given by (2.2), that is equal to the path integral where we have defined the deterministic utility function In appendix B, we show that (2.13), but with α dependent on i, is also obtained by seeking a utility function consistent with a singly constrained model and the path integral in (2.12). The log-sum function is commonly used as a welfare measure in the economics literature [17][18][19]. To make the connection with random utility maximization explicit, we define the stochastic utility function for a choice being made from origin zone i as where the ξ ij are independent and identically distributed Gumbel random variables. Then under the utility maximization framework, the expected utility attained from a unit flow leaving origin zone i is where c is the Euler-Mascheroni constant [17]. The utility potential may then be expressed as the expected utility attained from all flows in units of α Two remarks are in order. First, the scaling factor of α −1 is necessary to ensure that the utility potential is non-constant in the limit α → 0. Second, the tight bounds [20] show that V Utility (x) may be finite when any x j → −∞, and so an additional potential is needed to prevent zones from collapsing from a lack of activity. The bounds again show that the utility potential is closely related to the best alternative available to each of the origin zones.
(ii) Cost potential The cost potential prevents each zone from becoming too large, and is justified by the notion that running costs increase with size. We therefore require that lim x j →+∞ V(x) = +∞. In view of the equality constraints in (2.11), an appropriate cost potential is the total size or capacity of the system this choice of potential yields the linear cost term in the overdamped Langevin dynamics considered in (2.6).

(iii) Additional potential
The final potential term must satisfy lim x j →−∞ V(x) = +∞ and must grow sufficiently fast at infinity in order for (2.9) to be well defined. The purpose of the additional potential is to prevent zones from collapsing from a lack of activity. Such mechanisms are commonplace in urban and regional systems, for example continued investment or government initiatives. In view of the equality constraints in (2.11), a suitable potential function is 20) which ensures that the attractiveness of each zone is finite. The partial derivatives of the additional potential function are Therefore, the finiteness constraint requires that there is an additional positive constant term in the deterministic part of the SDE model, given by (2.6), to ensure that the SDE has a well-defined stationary distribution.

(c) Model summary
In summary, we have specified the following potential function: Illustration of the potential function for a small model comprising two competing zones. The profile of the potential function is largely determined by the α and β pairing; here, we have held β fixed and show e −γ V(x) for different values of α.
an additional term to prevent zones from collapsing. A stochastic generalization of the Harris and Wilson model is given by the overdamped Langevin diffusion in (2.6), for which the process converges at a fast rate to the well-defined Boltzmann-Gibbs measure described by (2.9). The corresponding size dynamics, in the form of (2.5), are given by the Stratonovich SDE which is a stochastic generalization of the original Harris and Wilson model that includes a positive shift to the multiplicative scale factor. In the limit δ, σ → 0, we obtain the original Harris and Wilson model in (2.4).
In the regime δ → 0, the potential function has stationary points coinciding with the fixed points of the original Harris and Wilson model. The stationary points for the potential function are given by M simultaneous equations While the behaviour of the stochastic and deterministic models may be similar in low-noise regimes over finite time intervals, we emphasize that the asymptotic behaviour differs greatly between the two. Here, we consider a deterministic model to be given by (2.23) in the limit σ → 0. For a deterministic model, the dynamics will converge to a stable fixed point satisfying (2.24), as determined by the initial condition. For a stochastic model, the system will converge to a statistical equilibrium that does not depend on the initial condition. As t → +∞, the stochastic model spends more time around the lower values of V(x), which occur around stable stationary points, as summarized by the limiting stationary distribution given by (2.9). We now comment on the Boltzmann-Gibbs measure described by (2.9). The Boltzmann-Gibbs measure is the equilibrium distribution of (2.6), but is also justified as a probability distribution for urban and regional structures with a maximum entropy argument. When considering the Boltzmann-Gibbs measure, we specify = 1 to avoid over-parametrizing the model since the relative level of noise is controlled by the inverse temperature γ = 2σ −2 . As γ → +∞, the Boltzmann distribution collapses to a Dirac mass around the global minimum of V(x), which is unlikely to provide a good fit to the observed urban structure. As γ → 0, the distribution of sizes approaches an improper uniform distribution. The profile of V(x) is largely influenced by the pair of α and β values, as illustrated in figure 2. A large α relative to β results in all activity taking place in one of the zones, whereas this regime is unlikely when α is low relative to β. Lastly, we can use the deterministic model to specify appropriate values of the cost of floorspace κ and the additional parameter δ. By defining  .25) is justified with a supply and demand argument [21]. For simplicity, we use K = 1; the choice is arbitrary. We can then specify δ relative to the size of the smallest zone possible, since at equilibrium the size of a zone with no inward flows is δ/κ.

Parameter estimation
In this section, we consider the inverse problem; the task of determining α and β from observed urban structure. The value of α describes consumer preference towards more popular destinations and the value of β describes how much consumers are inconvenienced by travel. We use retail activity as an archetype; however, our methodology is general and can be applied to other singly constrained systems. While α and β can be estimated using discrete choice models [22][23][24][25], this approach requires large volumes of flow data and is impractical for large systems. We instead make use of the model described by the Boltzmann-Gibbs measure in §2.
We formulate the task of inversion as a statistical inference problem, as advocated in [12]. The Bayesian approach is based on the following principles: the unknown parameters are modelled as random variables; our degree of uncertainty is described by probability distributions; and the solution to the inverse problem is the posterior probability distribution. Unlike classical methods, a Bayesian approach is well posed and allows us to incorporate prior knowledge of the unknowns into the modelling process. A Bayesian approach yields a posterior probability over the model parameters, and the parameter values can be determined from the posterior mean or maximum a posteriori estimates.

(a) A Bayesian approach to parameter estimation
The Boltzmann distribution in §2 is assumed to form an integral part of the data-generating process; however, further uncertainty arises from measurement noise. 4 To this end, we assume that an observed configuration of urban structure Y ∈ R M >0 is related to some latent sizes W ∈ R M >0 and multiplicative noise E ∈ R M >0 by Multiplicative noise is appropriate as all measurements are positive, and there is more scope for error when measuring larger zones. As before, it is natural to work in terms of log-sizes X = ln W ∈ R M , and we assume that X ∼ ρ ∞ is a realization of the Boltzmann-Gibbs measure given by (2.9) and (2.22). The latent variables X depend on the model parameters Θ = {α, β} ∈ R 2 >0 , which we summarize by a single variable for notational convenience. We assume that ln E ∼ N(0, Σ) is a realization of Gaussian noise for some symmetric positive definite covariance matrix Σ ∈ R M×M .
We specify a prior π (θ ) on the model parameters. The prior distribution for the latent variables is denoted π (x|θ ) and is given by (2.9), which we repeat here to make the θ-dependence explicit in our notation We emphasize that π (x|θ ) is only known up to a normalizing constant z(θ) that is a function of θ. The likelihood function π (y|x) is the Gaussian density given by (3.1). The joint posterior density then has the form π (x, θ|y) ∝ π (θ ) 1 z(θ ) exp(−γ V θ (x))π (y|x), (3.3) and is 'doubly intractable' as both the normalization factor of (3.3) and the function z(θ) are unknown. The estimation of z(θ) is a notoriously challenging problem as it requires the integration of a complex function over a high-dimensional space [13,15]. The normalization constant z(θ) is a probability-weighted sum of all possible outcomes and is a necessary penalty against model complexity. The θ -dependence for the z(θ )-term poses significant computational challenges as the joint posterior density cannot be evaluated at all, not even up to an irrelevant multiplicative constant.

(b) Computational strategies
To explore the posterior distribution, we resort to numerical simulation and use MCMC to estimate integrals of the form where g(x, θ) is an integrable function of interest. For example, (3.4) can be used to compute the mean, variance and density estimates of the posterior marginals. As suggested in [13], we can use an approximate method to estimate z(θ ). We consider the quadratic approximation of V θ (x) that is obtained from a second-order Taylor expansion around its global minima m θ As the integral for z(θ ) only has significant contributions in the neighbourhood of m θ , where (3.5) is a good approximation, we estimate z(θ ) as This is known as a saddle point approximation and is asymptomatically accurate as γ → +∞ [26]. In all but special cases, the global minima of V θ (x) is unique and can be found inexpensively using Newton-based optimization; for example, using the limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) algorithm with the right initial condition [27]. We run the optimization procedure for multiple initializations to provide a good coverage of the basins, although this is only necessary for α > 1. The curvature term V θ (m θ ) is given by (A 5). With (3.6), we can proceed with the MCMC scheme in appendix C.
To obtain more accurate posterior summaries, especially in the case that the saddle point approximation performs poorly, we look towards a consistent estimator of (3.4). Despite the intractable z(θ) term, we are able to construct a Markov chain . (3.7) The estimator requires that we can obtain unbiased estimates of the reciprocal normalizing constant 1/z(θ ), which can be obtained by randomly truncating an infinite series involving importance sampling estimates of z(θ ) [15,28]. The estimator given by (3.7) is an importancesampling-style estimator, but with each weight Ω (i) ∈ {−1, 1} equal to the sign of the unbiased estimate of 1/z(θ) for that iteration. The suitability of the scheme is dependent on being able to obtain precise importance sampling estimates of z(θ), which is challenging for low-noise regimes due to the concentration of measure. Negative values of Ω (i) arise from imprecision in the z(θ)

(c) Implementation details
We specify weakly informative uniform priors on α and β, restricted to the interval [0, 2] with a suitable scaling of β determined by a preliminary study, as done in [1,21]. 5 In this setting, we are able to compare our inferred α and β values with the R 2 analysis performed for the deterministic Harris and Wilson model in [1,21]. While ideally we would place priors on all parameters that specify the Boltzmann-Gibbs measure, we acknowledge that in doing so we would encounter both identifiability issues and tuning difficulties with regards to the importance sampling scheme for z(θ ). We are able to proceed by fixing the remaining hyperparameters to suitable values. We specify = 1 to avoid over-parametrizing the model and specify γ to reflect a desired level of noise. We set δ to the size of the smallest zone. This is justified by considering the Gamma distribution of a zone with no inward flows. We normalize the origin quantities and total sizes to determine κ from (2.25). Lastly, for demonstration purposes, we specify independent and homogeneous observation noise by setting Σ = λ 2 I, where λ is the standard deviation of the noise.
To compute low-order summary statistics of the form in (3.4), we use the Monte Carlo scheme in appendix C, consisting of a block Gibbs scheme. We use a Metropolis-Hastings random walk with reflective boundaries for the Θ-updates and a Hamiltonian Monte Carlo (HMC) for the X-updates. We tune the step size parameter for the Θ-updates to obtain an acceptance rate in the range 30-70%, and we tune the step size and number of steps for the X-updates to obtain an acceptance rate of at least 90%. For the Θ-updates, we require either global minima of V θ (x), for the saddle point approximation in (3.6), or unbiased estimates of 1/z(θ), for the pseudo-marginal MCMC framework described in appendix C. When requiring global minima of V θ (x), we perform multiple runs of the L-BFGS algorithm for M different initial conditions. When requiring consistent estimates of 1/z(θ ), we use annealed importance sampling (AIS) with HMC transition kernels. We initialize AIS with the log-gamma distribution that can be obtained from π (x|θ ) by letting α, β → 0. We produce unbiased estimates by truncating an infinite series of importance sampling estimates with a random stopping time T with Pr(T ≥ k) ∝ k −1.1 , and therefore requiring T + 1 runs of AIS. Running AIS a large number of times is a computationally intensive task; however, the estimates can be obtained in parallel.

Case study: the London retail system
In this section, we illustrate our proposed methodology with an aggregate retail model using London, UK, data similar to the example in [1,21]. While the model can be improved with disaggregation to capture further problem-specific characteristics, the underlying arguments would remain the same. We demonstrate how the Boltzmann-Gibbs measure can be used to simulate configurations of urban structure before setting out to infer the α and β values in the utility function. In the context of retail, the attractiveness term in (2.13) is justified by the benefit consumers gain from the improved range of options and economies of scale, and the cost term represents inconvenience of travel. The inverse problem is of particular interest in the context of retail as the flow data are difficult to obtain. On the other hand, urban structure is relatively straight-forward to measure and may be routinely available. While some attempts have been made in the literature to estimate the parameters of a similar spatial interaction model [1,21], these approaches are somewhat ad hoc but do provide a basis of comparison.
We obtain measurements of retail floorspace for London town centres for 2008 from a London Town Centre Health Check report prepared by the Greater London Authority [29]. We only include town centres classified as international, metropolitan and major town centre  classifications in our study, giving M = 49 town centres. The remaining town centres are mostly district town centres that have a relatively high concentration of convenience goods and more localized catchment; we argue that these would be better modelled separately. We determine the origin quantities from ward-level household and income estimates, with N = 625, published by the Greater London Authority [30,31]. We take the origin quantities to be the spending powers as given by the population size multiplied by the average income. The floorspace measurements and residential data are presented in figure 3, over the map of London [32]. In our implementation, we calculate the cost matrix from Euclidean distance, although a better representation would use a transport network [1].
We first perform a preliminary study of our model in the limit of no observation noise λ → 0, in which case the θ -marginal of (3.3) is With this simplification, we are able to evaluate the posterior probabilities over a grid of α and β values. We evaluate the probabilities over a 100 × 100 grid for γ = 10 2 and γ = 10 4 , representing high-noise and low-noise regimes, respectively. Using the justification given in the previous section, we specify δ = 0.006 and κ = 1.3. We produce the grid by estimating z(θ) with the saddle point approximation in (3.6). 6 The results are presented in figure 4, in which the scales indicate that the model with high noise provides the better explanation of the data. We find that the best fit for the high-noise regime is α = 0.90 and β = 0.46 and the best fit for the low-noise regime is α = 1.18 and β = 0.28. As expected, the low-noise regime suggests stronger attractiveness effects as the model with a higher level of noise is able to explain variation by stochastic growth. The α and β values are positively correlated; this can be seen in figure 4 and is due to the competing effects in the utility function in (2.13).
In [1], the authors perform an R 2 analysis that we replicate here for our deterministic version of the Harris and Wilson model for a basis of comparison. The predicted value W Pred is taken to be the equilibrium obtained from the ODE model given by (2.23) with σ → 0 and the initial condition w 0 = Y. The R 2 value is defined as R 2 = 1 − SS res /SS tot , where SS res /SS tot is the ratio of  the variance of the residuals Y − W Pred and the variance of the observed Y. While our Bayesian approach is fundamentally different, and we should not expect to obtain too similar results, the R 2 analysis yields a best fit of α = 1.36 and β = 0.42. This is agreeable with the findings for the low-noise regime in figure 4. Furthermore, there are some strong similarities between the profile of posterior probabilities and the profile of the R 2 -values. First, both approaches find that the poorest fit is for a regime in which α is too high and β is too low; these values result in most activity taking place in a single zone. Second, both approaches agree in that a good fit can be found for 1 < α < 2. Next, we draw the latent variables from the prior distribution π (x|θ) to verify the suitability of the modelling. For illustrative purposes, we consider a range of α values across [0, 2], and hold β = 0.5 fixed. For the regime with high noise, the approximate draws are obtained by running a Markov chain of length 10 000 using HMC combined with parallel tempering for five different temperature levels [14]. For the regime with low noise, we plot configurations of the global minima of V θ (x) obtained from numerical optimization as there is little variation between samples. The results are in figures 5 and 6, respectively. It can be seen that higher values of α and lower values of β create a sparse structure in that all activity takes place in very few zones. Conversely, lower values of α and higher values of β lead to a more dense structure.
We now return to the observation model in (3.1) to account for observation noise in the data. For illustrative purposes, we specify λ = 0.1 so that the relative noise for a zone of size 1/M 7 is 3%. Although an improved specification of observation noise from a preliminary study would lead to more accurate inferences, the arguments and methodology we are presenting would remain the same. We run Markov chains of length 20 000. For the high-noise regime, we use the pseudomarginal MCMC methodology in appendix C. Our importance sampling estimates comprised 10 particles and 50 equally spaced inverse temperatures. For the low-noise regime, we were unable to obtain precise importance sampling estimates due to the concentration of measure, so used the MCMC methodology in appendix C with the saddle point approximation in (3.6). data suggest that the model provides a reasonable fit to the data, and that the assumption of homogeneous observation noise is reasonable. This is to be expected as the high-noise model provides a flexible model. After taking into account the observation noise, a weaker attractiveness effect was observed. Similar plots are presented in figures 9 and 10 for the low-noise regime, and the posterior marginals of α and β give a mean of 1.17 ± 0.01 and 0.26 ± 0.01, respectively. The inferred values for the low-noise regime are in line with figure 4. Both sets of posterior summaries suggest that attractiveness and inconvenience effects are present in the data, though the inferred α values are considerably higher for the low-noise model. The plots of the expected residuals and observation data suggest that the model also provides a reasonable fit to the data; however, there is more dispersion in the plotted quantities and possibly a degree of heteroscedasticity due to the less flexible model. The model with more noise favours the simpler explanation that most variation is due to stochastic growth, whereas the low-noise model is more constrained. There is notably more uncertainty in the latent variables and the model parameters for the high-noise regime, as there are more possible explanations for the observation data. The uncertainty in the α and β estimates is so great for the high-noise regime that limited insights are gained for the purposes of model calibration. As a result, we conclude that strong assumptions are required in the prior modelling in order to be able to exploit known structure in the data-generating process. The required assumptions can be made, for example, through the prior modelling of α and β or by specifying a high value of γ . On the other hand, the low-noise regime results in very confident posteriors. Although the resulting inferences are consistent with previous studies, care must be taken to avoid being overconfident in a particular model by not adequately accounting for uncertainty in the modelling process.

Discussion
We have developed a novel stochastic model to simulate realistic configurations of urban and regional structure. Our model is a substantial improvement on existing deterministic models as it fully addresses the uncertainties arising in the modelling process. Unlike existing time-stepping schemes, our model can be used to simulate realistic configurations of urban structure using MCMC methods without recourse to numerical error. We have demonstrated that our model can be used to infer the components of a utility function from observed structure, thereby providing an alternative to the existing discrete choice models. The key advantage is that we avoid the need to collect vast amounts of flow data. While we have presented our methodology in the context of consumer-led behaviour, our approach is applicable to other urban and regional settings such as archaeology, logistics, healthcare and crime to suggest a few. Our work has led to specific areas for further research. We are actively investigating the deployment of our methodology to large-scale urban systems, for which there are substantial computational challenges to overcome. The cost of a potential or gradient evaluation is O(NM); however, increasing M means that the z(θ ) estimates are more challenging to obtain owing to the curse of dimensionality. It is of interest to develop more tractable methods, for example optimization based, so that inference can be performed for international models on a practical time scale. We have presented an aggregate model that can be refined to better represent domainspecific characteristics as discussed in [1]. It remains to use the proposed methodology as part of a more realistic study with wider objectives. Lastly, we emphasize that our methodology is only applicable to cross-sectional data. In practice, many applications of interest require processing time-series data that are highly correlated over time. In this setting, we would need to solve the filtering or smoothing problem for (2.23), and in doing so would also need to account for general trends and seasonality effects that are exogenous to our model. Our work continues to be part of ongoing efforts to draw insights from data by making use of the known mathematical structure [33].
( C 2 ) The key challenge arises from proposing new Θ-values, in which case the acceptance probability contains an intractable ratio z(θ )/z(θ ). We can either proceed with a deterministic estimate of z(θ), at the expense of a bias, or we can obtain a consistent estimator of (3.4) with pseudo-marginal MCMC [38] provided that unbiased and reasonably precise estimates of 1/z(θ) are available.
(a) Unbiased estimates of the reciprocal of the normalizing constant An unbiased estimate of z(θ ) is given by averaging over a batch of importance weights. The importance weights are evaluations of at locations drawn from a proposal distribution with density q(x). By Jensen's inequality, the reciprocal of an importance sampling estimate of z(θ) is a biased estimate of 1/z(θ). Instead, unbiased estimates of 1/z(θ ) can be obtained by randomly truncating an infinite series: for a sequence {V i } satisfying lim i→+∞ E[V i ] = 1/z(θ ), and a random stopping time T, the estimator gives an unbiased estimate of 1/z(θ ) [28,[39][40][41]. We follow [15] and use the increasing averages estimator ,X (k) ∼ q, ( C 5 ) with reference to the importance sampling weights described by (C 3). The unbiased estimates of 1/z(θ ) can have high variance when the importance weights are highly variable. Fortunately, importance sampling may be carried out on an augmented state space, for example using AIS [42].

(b) Pseudo-marginal Markov chain Monte Carlo
The unbiased estimators of 1/z(θ ) given by (C 4) may be a negative estimate, which prohibits the use of the pseudo-marginal MCMC. Fortunately, the so-called 'sign problem' has been addressed in [28]. We can use the following importance sampling style of estimator that gives a consistent estimator of (3.4) in that: where {X (i) , Θ (i) , Ω (i) } n i=1 is a Markov chain obtained using the Metropolis-within-Gibbs scheme described at the start of this section but with the following acceptance probability for Θ-updates: a Θ (θ |θ ) = min 1, π (y|x , θ )|S | exp(−γ V θ (x))π (θ ) π (y|x, θ)|S| exp(−γ V θ (x))π (θ) (C 7) and Ω (i) = sgn(S (i) ). It is necessary to cache the value of S (i) at each iteration as part of the pseudomarginal MCMC scheme.