Generative models for network neuroscience: prospects and promise

Network neuroscience is the emerging discipline concerned with investigating the complex patterns of interconnections found in neural systems, and identifying principles with which to understand them. Within this discipline, one particularly powerful approach is network generative modelling, in which wiring rules are algorithmically implemented to produce synthetic network architectures with the same properties as observed in empirical network data. Successful models can highlight the principles by which a network is organized and potentially uncover the mechanisms by which it grows and develops. Here, we review the prospects and promise of generative models for network neuroscience. We begin with a primer on network generative models, with a discussion of compressibility and predictability, and utility in intuiting mechanisms, followed by a short history on their use in network science, broadly. We then discuss generative models in practice and application, paying particular attention to the critical need for cross-validation. Next, we review generative models of biological neural networks, both at the cellular and large-scale level, and across a variety of species including Caenorhabditis elegans, Drosophila, mouse, rat, cat, macaque and human. We offer a careful treatment of a few relevant distinctions, including differences between generative models and null models, sufficiency and redundancy, inferring and claiming mechanism, and functional and structural connectivity. We close with a discussion of future directions, outlining exciting frontiers both in empirical data collection efforts as well as in method and theory development that, together, further the utility of the generative network modelling approach for network neuroscience.


Introduction
Many complex systems are composed of elements that interact dyadically with one another and can therefore be represented as graphs (networks) composed of nodes interconnected by edges. The network framework can be applied to systems across a range of disciplines, from sociology and psychology to molecular biology and genomics, making it possible to leverage a common mathematical language and set of analytic tools to investigate the topological organization of systems that, outwardly, might appear dissimilar to one another [1].
In neuroscience, network-based analyses have become common. This is due in part to initiatives for sharing large, multimodal neuroimaging datasets [2,3], the availability of easy-to-use software packages for computing graph-theoretic metrics [4,5], and the fact that networks are natural vehicles for representing and analysing complex spatio-temporal interactions among neural elements, including neurons, populations and brain areas [6].
Though the scope of topics studied in network neuroscience is broad, the typical study involves characterizing the structure of a network with a series of summary statistics. Each statistic describes a particular feature of the network, ranging from simple to complex and operating over all topological scales. For example, degree is a local (node-level) property that simply counts a node's total number of incoming and outgoing connections. On the other hand, characteristic path length is a global (whole-network) measure of the average length of all pairwise shortest paths. In general, summary statistics offer succinct descriptions of a network's organizational features, especially those that are not immediately apparent given a network's list of nodes and edges.
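To make these two statistics concrete, the following minimal Python sketch computes them for a toy undirected network. The adjacency-dictionary representation, the function names and the toy graph are our illustrative assumptions; they are not taken from the software packages cited above.

```python
from collections import deque


def degrees(adj):
    """Degree of every node in an undirected graph.

    `adj` maps each node to the set of its neighbours (assumed symmetric).
    """
    return {node: len(neighbours) for node, neighbours in adj.items()}


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives shortest paths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1  # reachable nodes other than the source
    return total / pairs if pairs else float("inf")


# Toy example: a path graph 0 - 1 - 2 - 3.
toy = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
toy_degrees = degrees(toy)
toy_cpl = characteristic_path_length(toy)
```

For a directed network, the degree would instead sum incoming and outgoing connections, as described in the text.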
The application of summary statistics to better understand the structure and function of biological neural networks has been fruitful. Over a decade or so, evidence from networks across different organisms and spatial scales [7] has converged onto a small set of properties and summary statistics that, collectively, can be used to describe the organization of most biological neural networks. These include indices of small-worldness [8], heavy-tailed degree and edge-weight distributions [9,10], a diverse meso-scale structure that includes segregated modules but also core-periphery structure [11-13], hubs and rich clubs [14,15], and economic spatial layouts favouring the formation of short-range (low-cost) connections [16,17]. Further, such core organizational principles also include functional constraints, like the need to balance properties that support either segregated or integrated brain function [18], but also emphasize the trade-off between the cost of such properties and their functionality [19]. These properties, collectively, create a caricature of neural system organization and function.
While illuminating, the process of describing networks in terms of their topological properties amounts to an exercise in 'fact collecting'. Though summary statistics might be useful for comparing individuals [20] and as biomarkers of disease [21], they offer limited insight into the mechanisms by which a network functions, grows, and evolves. Arguably, one of the overarching goals of neuroscience (and biology, in general) is to manipulate or perturb networks in targeted and deliberate ways that result in repeatable and predictable outcomes [22]. For network neuroscience to take steps towards addressing this goal, it must shift its current emphasis beyond network taxonomy (i.e. studying subtle individual- or population-level differences in summary statistics) towards a science of mechanisms and processes [23,24].
While there exist many methodological approaches for seeking mechanisms in networks and a range of spatial, topological and temporal scales at which those methods can be deployed [25], the focus of this article is on network generative modelling. Network generative modelling is a flexible framework for generating synthetic networks from a set of parametrized wiring rules. Generative models figure prominently in the network science canon [26-29], and have recently been deployed in domain-specific scenarios to study the evolution of protein interaction networks [30-32], the worldwide web [33], and social systems [34]. Importantly, and provided that the wiring rule is sufficiently informed and biologically grounded, generative models can be used to test and identify potential mechanisms that underlie the growth and evolution of biological neural networks. With mechanisms in hand, it becomes possible to distinguish the topological features that drive a network's growth from those that emerge as mere byproducts [35], and to pursue deliberate and targeted interventions [36,37].
In the following sections, we present a primer on network generative models, highlighting their past use, their interpretation and several open methodological considerations. We review current applications of generative models to neural systems, emphasizing several outstanding questions and implementation details. Finally, we plot a course for future studies. While our review discusses generative models of biological neural networks, in general, we focus our discussions on structural networks that represent the physical pathways among neural elements, e.g. synapses, axonal projections and white-matter fibre bundles.

Generative models: a primer
This article deals with the topic of generative models. Broadly, a generative model is a statistical process that outputs a synthetic set of data or observations. Usually, these synthetic data and the generative process are designed to have some properties in common with empirical data and with the process believed to have generated those data. Generative models are often parametrized, and those parameters can be chosen so as to minimize the discrepancy between observed and synthetic data. The models, themselves, can be compared against one another using standard model comparison techniques, including goodness-of-fit criteria and cross-validation approaches.
In the context of network science, generative models represent algorithmically implemented wiring rules or causal processes that output synthetic networks with a particular set of topological properties or that perform a particular set of functions. While a network's nodes and edges encode all of its structural properties, studying generative models shifts focus away from those structural properties and instead onto wiring rules and the process of network formation. This shift in emphasis confers a number of distinct advantages: (1) Generative models compress our descriptions of networks and highlight regularities in their organization. (2) They make predictions about out-of-sample and unobserved network data. (3) Under the best circumstances, generative models can uncover network mechanisms.
We discuss these topics in greater detail throughout the following subsections.

Compressibility of networks
Generative models compress our descriptions of a network, encoding the network's topology in a set of wiring rules and parameters. Naively, we could describe a network exactly given a list of its nodes and edges: that is, by consulting the list, we could correctly connect nodes that are supposed to be connected and avoid connecting nodes that are not supposed to be connected. However, connections in many networks are not independent of one another, but instead exhibit statistical regularities such that, given the wiring rule that matches those regularities, we could predict the presence/absence of connections ahead of time. In this case, it becomes unnecessary to consult the list of nodes and edges to describe the network. More importantly, we can often interpret the wiring rule itself to uncover the network's organizing principles.
As an example, consider real-world spatial networks, where the probability of observing an edge between two nodes decays as a function of distance [38]. Often, these kinds of networks can be well approximated by a simple geometric model whose wiring rule mimics the network's distance-dependent connection formation [39]. To perfectly describe a spatial network, we could generate a long and possibly unwieldy list of its nodes and edges. However, if the geometric model is a good approximation, e.g. synthetic networks generated by the model recapitulate many observed edges, then the model can be used to replace those edges in the list, effectively shortening our description of the network. The geometric model naturally mimics the distance dependencies of the spatial network. For many networks, however, the statistical regularities among links may not be obvious, in which case selecting the appropriate model may not be straightforward. We discuss this issue of model selection later in this section.

Predictability
Besides compressing our descriptions of a network, generative models also have predictive capacity and can be used as forward models of unobserved and out-of-sample data. Returning to the example of spatial networks, we might hypothesize the relevance of a generative model in which the probability of connection formation is given by a decaying exponential. If we let $A_{ij} \in \{0, 1\}$ indicate the presence or absence of an edge between nodes i and j, we can write this connection probability as $P(A_{ij} = 1) \propto \exp(-\beta \cdot D_{ij})$, where $D_{ij}$ is the distance between nodes i and j and $\beta \geq 0$ is a parameter to be fit [40]. If we were given a network $G$, we could fit the parameter $\beta$ so that the discrepancy between synthetic networks generated by the model and $G$ is minimized. Having fit the model, we could use it to make predictions about a second network, $G'$, whose connectivity pattern is unknown but whose nodes' spatial locations are given. As another example, consider the stochastic blockmodel [27,41], in which nodes are assigned membership to one of $K$ communities, $z_i \in \{1, \ldots, K\}$, and where the probability of two nodes, i and j, being connected to one another depends only on their community assignments: $P(A_{ij} = 1) = \omega_{z_i, z_j}$ ($\omega$ is a $K \times K$ matrix that encodes community-to-community connection probabilities). Fitting this model to a network $G$ entails inferring nodes' communities and connection probabilities. If we encountered a second network, $G'$, with an unknown connectivity pattern but whose nodes correspond to those in $G$, e.g. the same set of neurons or brain regions, then we could use the model to predict the configuration of nodes and edges in that network.
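The fitting step for the exponential distance rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: we fix the proportionality constant to 1, measure discrepancy with the Bernoulli log-likelihood, and search a coarse grid of candidate decay values. All function names, the random node coordinates and the grid are hypothetical.

```python
import math
import random


def connection_prob(dist, beta):
    """Exponential distance rule with proportionality constant 1 (assumption)."""
    return math.exp(-beta * dist)


def sample_network(coords, beta, rng):
    """Draw one undirected network; returns the set of edges (i, j) with i < j."""
    edges = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if rng.random() < connection_prob(d, beta):
                edges.add((i, j))
    return edges


def log_likelihood(coords, edges, beta):
    """Bernoulli log-likelihood of the observed edge set given beta."""
    ll = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            p = connection_prob(math.dist(coords[i], coords[j]), beta)
            p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
            ll += math.log(p) if (i, j) in edges else math.log(1 - p)
    return ll


def fit_beta(coords, edges, grid):
    """Grid search: return the candidate beta with the highest log-likelihood."""
    return max(grid, key=lambda b: log_likelihood(coords, edges, b))


rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(60)]  # hypothetical node positions
observed = sample_network(coords, beta=5.0, rng=rng)        # stand-in for an empirical network
grid = [0.5 * k for k in range(1, 41)]
beta_hat = fit_beta(coords, observed, grid)
```

Once the decay parameter has been estimated, `sample_network` can be called with the coordinates of a second set of nodes to generate predictions for a network whose connectivity is unknown.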

Mechanisms
Finally, provided that it incorporates sufficient system-specific details (in our case, neurobiological information), a generative model can be used to gain insight into the mechanisms that guide the formation and growth of a system. This last point is critical. A generative model, under ideal circumstances, is a recipe for building a network. Having such a recipe opens new avenues for interrogating a network. It allows us to identify structural features of a network that emerge as a direct result of the wiring rule, versus those that emerge spontaneously as a consequence of constraints imposed by a given wiring rule [35]. For example, a geometric model will generate networks with high levels of clustering even though the wiring rule never explicitly optimizes for this property. Importantly, a recipe for building a network also gives us the ability to explore alternative ingredients. What happens if we change a parameter slightly? Does the model generate networks of vastly different character? Can we control the trajectory of a network's growth and guide it into a desired target configuration [42]? The ability to selectively drive the growth of a network is a tantalizing prospect, and one with profound implications for the treatment of psychiatric disease and neurological disorders.

Canonical generative models for networks
Before engaging neuroscience-specific questions, it is useful to discuss examples of generative models as they have been applied in network science and other fields. In the remainder of this section we review some canonical generative models, emphasizing the properties that they share with one another as well as those that make them distinct. While the models discussed in this section certainly fit the definition of generative models, we emphasize that the space of all possible models is broad and includes models that share few characteristics with those discussed here. Generative models have a long history in network science and mathematics. One of the earliest examples is the so-called Erdős-Rényi (ER) model [26], in which connections are formed independently between pairs of N nodes with probability P (another version exists where, instead of P, a fixed number of edges, M, are added uniformly at random). While the ER model has interesting combinatoric and mathematical properties, e.g. binomially distributed node degree [43], it is a poor approximation of most real-world networks. That is, the random and independent process by which connections are formed in the ER model results in networks with no real structure (poor compressibility) and does not resemble any of the mechanisms by which many real-world networks grow. Accordingly, if we wish to model networks in the real world, we need a set of models that generate networks with realistic properties.
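Both versions of the ER model are short enough to write out in full. The sketch below is illustrative only; the function names, network sizes and seed are our assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """G(N, P): include each possible undirected edge independently with probability p."""
    return {(i, j) for i, j in combinations(range(n), 2) if rng.random() < p}


def erdos_renyi_fixed_edges(n, m, rng):
    """G(N, M): choose exactly m distinct edges uniformly at random."""
    return set(rng.sample(list(combinations(range(n), 2)), m))


rng = random.Random(1)
g_np = erdos_renyi(200, 0.05, rng)
g_nm = erdos_renyi_fixed_edges(200, 500, rng)

# The mean degree of G(N, P) should lie close to its expectation, P * (N - 1).
mean_degree = 2 * len(g_np) / 200
expected_mean_degree = 0.05 * (200 - 1)
```

Because every edge is drawn independently, nothing about one connection helps to predict another, which is the sense in which these networks have poor compressibility.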
Initial explorations into generative models for real-world data resulted in two models that, collectively, helped spark broad interest in complex networks. The first, introduced by Duncan Watts and Steven Strogatz, sought the origin of empirically observed 'small world' topologies, in which a network simultaneously exhibits greater-than-expected clustering and shorter-than-expected path length [28]. Broadly speaking, the model supposed that small-world networks are an interpolation between two extreme configurations: a ring lattice network (nodes arranged on the circumference of a circle and linked to their k clockwise and anti-clockwise neighbours) and an ER network. To move from one extreme to the other, the authors introduced a tuning parameter, p, which governed the probability that an edge in the lattice network would be rewired randomly. When p is small, the model generates networks that have mostly lattice-like properties, but when p is large, the model generates networks whose properties are indistinguishable from those produced by the ER model. Between those extremes, however, is a 'sweet spot': a region of parameter space yielding networks with properties of both extremes, namely high clustering and short path length. This model is referred to as the Watts-Strogatz (WS) model.
rsif.royalsocietypublishing.org J. R. Soc. Interface 14: 20170623
At around the same time, a second group sought an explanation for why many real-world networks exhibited heavy-tailed degree distributions. The proposed model, by Réka Albert and Albert-László Barabási, was based on a growth rule [29]. Starting with a small set of fully connected nodes, the model adds new nodes to the network by forming connections preferentially to already-existing nodes with higher degrees. This growth mechanism is a sort of 'rich get richer' process; nodes that have existed for a long time accumulate many connections, which further increases their likelihood of being connected to newly added nodes. The result of this process is a network with an approximately power-law degree distribution, mimicking those frequently observed in real networks [44]. This model is identical to that defined by Price in 1976 with a single value change to one parameter [45], and is generally referred to as the Barabási-Albert (BA) or the preferential attachment (PA) model.
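The two wiring rules can be written compactly. The following Python sketch is a simplified illustration rather than a reproduction of the original implementations: in the rewiring step we move one end of each selected lattice edge to a uniformly chosen non-neighbour, and in the growth step each new node attaches to m distinct existing nodes, with m + 1 fully connected seed nodes. These choices, the function names and the parameter values are our assumptions.

```python
import random


def watts_strogatz(n, k, p, rng):
    """Ring lattice with k neighbours on each side; each lattice edge is rewired with probability p."""
    if n <= 2 * k:
        raise ValueError("n must exceed 2 * k")
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            if j in adj[i] and rng.random() < p:
                # Move the far end of edge (i, j) to a node not yet linked to i.
                options = [t for t in range(n) if t != i and t not in adj[i]]
                if options:
                    new = rng.choice(options)
                    adj[i].remove(j)
                    adj[j].remove(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return {(a, b) for a in adj for b in adj[a] if a < b}


def barabasi_albert(n, m, rng):
    """Growth with preferential attachment: each new node links to m existing nodes."""
    seed = range(m + 1)
    edges = {(i, j) for i in seed for j in seed if i < j}
    # A node appears in `stubs` once per incident edge, so drawing uniformly
    # from this list selects nodes with probability proportional to degree.
    stubs = [v for edge in sorted(edges) for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.add((t, new))
            stubs.extend((t, new))
    return edges


def degree_sequence(edges, n):
    """Degree of nodes 0..n-1 given an undirected edge set."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


rng = random.Random(2)
lattice = watts_strogatz(100, 2, 0.0, rng)      # p = 0: pure ring lattice
small_world = watts_strogatz(100, 2, 0.1, rng)  # a few shortcut edges
scale_free = barabasi_albert(500, 2, rng)
```

Rewiring preserves the number of edges, so the lattice and the rewired network differ only in where a handful of connections are placed. In the growth model, the earliest nodes typically end up with degrees far above the mean, which is the 'rich get richer' effect described above.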

Generative models in practice and application
The WS and BA models generate synthetic networks with properties qualitatively similar to those observed in real-world networks (small-worldness and heavy-tailed degree distribution). If we wanted to make the similarity of empirical and synthetic networks quantitative and more precise, how would we do so? Supposing that a model yields networks that repeatably and exactly recapitulate all properties of an empirical network, can we equate the model with mechanism? Both of these questions are difficult to answer, and represent some of the technical challenges associated with generative modelling.

Choosing an objective function
We will first address the issue of how to perform quantitative comparisons between synthetic and empirical networks. Fortunately, there exists a plurality of approaches for quantitatively comparing networks. The challenge is selecting the approach that is best suited to a given research question.
Typically, we wish to answer the question of whether an empirically observed network could have been produced by some generative model. One strategy for addressing this question involves defining a likelihood function over the space of all possible networks, and evaluating that function for the observed network. Stochastic blockmodels are a good example of this strategy in action [27,41,46]. The probability of a connection forming between nodes i and j, $P_{ij}$, depends on their community assignments, $z_i$ and $z_j$: $P_{ij} = \omega_{z_i, z_j}$, where $\omega$ is the matrix of community-to-community connection probabilities. The probability that i and j are disconnected is therefore $1 - P_{ij}$, and the likelihood that the observed (undirected) network was generated by this model is given by $\mathcal{L} = \prod_{i<j} P_{ij}^{A_{ij}} (1 - P_{ij})^{1 - A_{ij}}$, where $A_{ij} \in \{0, 1\}$ indicates the presence or absence of an edge between i and j. Blockmodels are convenient in that this likelihood function can be written in closed form. This approach can be generalized for other models, even when the precise likelihood function is unknown, by generating a sample of networks from a given set of parameters and estimating, from those samples, the probability of any connection existing. This approach is similar to others in the literature [47], in that it links the model's fitness with its ability to correctly account for the empirical network's exact configuration of nodes and edges. While this approach seems useful, it is not difficult to envision scenarios where even near-perfect prediction of an empirical network's connections nonetheless fails to account for some of its critical topological properties. For example, consider the canonical small-world network: a ring lattice plus a few random (shortcut) connections that reduce the network's characteristic path length. The ring lattice and small-world network have nearly perfect edge overlap. If we were to regard edge overlap as the definitive measure of fitness, we might be inclined to treat the lattice network as a good approximation of the small-world network. 
In other words, from a strictly structural point of view, these two networks are almost perfect matches. From a functional perspective, however, the two networks are highly dissimilar: because of its longer characteristic path length, the ring lattice will lack efficient (short) routes that would be useful for communication or transportation.
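The blockmodel likelihood described above is straightforward to evaluate in log form. The Python sketch below assumes an undirected, unweighted network, zero-indexed community labels and connection probabilities strictly between 0 and 1; the function name and the toy example are hypothetical.

```python
import math
from itertools import combinations


def sbm_log_likelihood(edges, z, omega):
    """Log-likelihood of an undirected network under a Bernoulli stochastic blockmodel.

    edges: set of pairs (i, j) with i < j
    z:     list of community labels, one per node (0, ..., K - 1)
    omega: K x K nested list of connection probabilities in (0, 1)
    """
    ll = 0.0
    for i, j in combinations(range(len(z)), 2):
        p = omega[z[i]][z[j]]
        # Connected pairs contribute log(p); disconnected pairs contribute log(1 - p).
        ll += math.log(p) if (i, j) in edges else math.log(1.0 - p)
    return ll


# Toy example: two tightly connected pairs of nodes with no links between them.
edges = {(0, 1), (2, 3)}
omega = [[0.8, 0.1],
         [0.1, 0.8]]
ll_matching = sbm_log_likelihood(edges, [0, 0, 1, 1], omega)     # partition follows the edges
ll_mismatched = sbm_log_likelihood(edges, [0, 1, 0, 1], omega)   # partition cuts across them
```

Working with the logarithm avoids numerical underflow, since the product runs over every pair of nodes. In this toy case the partition that groups connected nodes together receives the higher log-likelihood.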
Comparing synthetic and empirical networks on the basis of their edge configuration is useful, but has some shortcomings that motivate the exploration of alternative approaches. Another approach, and one that has been used in several recent studies [48,49], eschews the edgewise comparison of two networks, instead simultaneously comparing them along several topological dimensions (e.g. their efficiency, clustering, modularity, etc.), and calculating a statistic of average dissimilarity. For example, Betzel et al. [49] defined the energy function $E = \max(KS_K, KS_C, KS_B, KS_E)$, where each term is a Kolmogorov-Smirnov statistic comparing degree (K), clustering (C), betweenness centrality (B) and edge length (E) distributions of synthetic and observed networks. Intuitively, smaller energy implies greater fitness.
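An energy of this form can be sketched as follows. The code assumes that the node- and edge-level distributions (degree, clustering, betweenness, edge length or any other choice) have already been computed for both networks and are supplied as lists of values; the function names and the dictionary layout are our illustrative assumptions, not the implementation used in [49].

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    gap = 0.0
    for v in sorted(set(xs) | set(ys)):
        cdf_x = bisect_right(xs, v) / len(xs)
        cdf_y = bisect_right(ys, v) / len(ys)
        gap = max(gap, abs(cdf_x - cdf_y))
    return gap


def energy(synthetic, observed):
    """Maximum KS statistic across the named distributions of two networks.

    Both arguments map a property name to a list of values for that network.
    """
    return max(ks_statistic(synthetic[name], observed[name]) for name in observed)


# Hypothetical distributions for a synthetic and an observed network.
synthetic = {"degree": [1, 2, 3, 4], "clustering": [0.1, 0.2]}
observed = {"degree": [3, 4, 5, 6], "clustering": [0.1, 0.2]}
e = energy(synthetic, observed)
```

Taking the maximum means that a synthetic network is only as good as its worst-matched property, so a low energy requires all of the included distributions to agree.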
This approach is flexible and can be adapted to include virtually any set of metrics. It is important to note, however, that many network measures are correlated with one another, so the choice of which to include should take this into account. Also, there might be synthetic networks that match an empirical network in terms of network statistics but not its precise set of connections. Irrespective of how the objective function is defined, having one makes it possible to perform different kinds of comparisons. For a given model, we can perform model fitting by selecting the parameter values that optimize the objective function. We can also leverage an objective function to compare different generative models to one another. For example, we may wish to discount a model that is incapable of generating networks that resemble our real-world network of interest.

Cross-validation
Suppose that we fit a generative model by optimizing some objective function so that the model generates synthetic networks that share some set of properties with an empirical network. As in any model-fitting exercise, we can continue adding layers of complexity and free parameters to the model so that it matches our real-world network to some arbitrary degree of precision. It is often the case, however, that we are less interested in predicting the organization of a single network than that of a class of networks. For example, we might wish to identify wiring rules that can recapitulate the organization of structural brain networks, on average, rather than the network of any one individual. Even if our aim was to predict subject-specific networks, it might be unsurprising (in a statistical sense) that our models reproduce many of the features of those networks; after all, the model's parameters were selected only after an optimization procedure.
In both cases (fitting models to empirical network data based on edge- or property-matching), it is essential that we perform a cross-validation procedure. This procedure might entail taking the best-fitting parameters from one model and using them to generate estimates of a second network not involved in the model-fitting process. We can compare the goodness-of-fit to that of a random (ER) model, to ensure that our model performs above chance. This type of cross-validation ensures that a generative model is identifying general wiring rules and not overfitting. A second type of cross-validation involves testing whether synthetic networks have properties in common with real-world networks that they were not explicitly optimized to possess. In other words, does a generative model give us certain properties 'for free'? This type of cross-validation ensures that our objective function is sufficiently general rather than emphasizing a specific subset of network properties. However, even in this case it is important to note that many network properties are correlated with one another, and so the emergence of one may necessarily imply the emergence of another. For example, if a wiring rule incidentally results in assortative modules, it is also likely that the network will exhibit a greater-than-expected clustering coefficient. In short, the network properties used for cross-validation must be carefully chosen and ideally would be orthogonal to one another. One possible solution is to cross-validate using the principal components of a list of network features rather than any individual feature [50].
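The first type of cross-validation can be illustrated with a small simulation. In this sketch, which rests on several assumptions of our own, two networks are drawn from the same exponential distance rule; the decay parameter is fitted to the first by a grid search on the Bernoulli log-likelihood, and the fitted model is then scored on the second alongside a density-matched ER baseline. The names, the simulated data and the choice of log-likelihood as the goodness-of-fit measure are all hypothetical.

```python
import math
import random
from itertools import combinations


def sample_edges(coords, beta, rng):
    """Draw an undirected network from the rule P(edge) = exp(-beta * distance)."""
    return {(i, j) for i, j in combinations(range(len(coords)), 2)
            if rng.random() < math.exp(-beta * math.dist(coords[i], coords[j]))}


def log_lik(coords, edges, prob):
    """Bernoulli log-likelihood of `edges`, where prob(d) gives the edge probability at distance d."""
    ll = 0.0
    for i, j in combinations(range(len(coords)), 2):
        p = min(max(prob(math.dist(coords[i], coords[j])), 1e-9), 1 - 1e-9)
        ll += math.log(p) if (i, j) in edges else math.log(1 - p)
    return ll


rng = random.Random(3)
coords = [(rng.random(), rng.random()) for _ in range(50)]
train = sample_edges(coords, 5.0, rng)     # network used for fitting
held_out = sample_edges(coords, 5.0, rng)  # second network, never used for fitting

# Fit the distance model on the training network only.
grid = [0.5 * k for k in range(1, 41)]
beta_hat = max(grid, key=lambda b: log_lik(coords, train, lambda d: math.exp(-b * d)))

# Fit the ER baseline on the training network: its single parameter is the edge density.
n_pairs = len(coords) * (len(coords) - 1) // 2
p_er = len(train) / n_pairs

# Score both fitted models on the held-out network.
ll_model = log_lik(coords, held_out, lambda d: math.exp(-beta_hat * d))
ll_er = log_lik(coords, held_out, lambda d: p_er)
```

A model that has captured a general wiring rule should achieve a higher held-out score than the ER baseline; a model that merely memorized the training network would not.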

The space of generative models
What distinguishes one generative model from another? Is it possible to delineate classes of generative models based on their functions or characteristics? Arguably, one of the distinguishing features of any generative model is the timescale over which it operates (figure 1). On one extreme are models with no timescale at all, like stochastic blockmodels [27,41] or the family of exponential random graphs, which uses a regression-based framework to predict a network's link structure from node- and edge-level attributes, and which has recently been applied to brain network data [52-55]. These kinds of models are 'single-shot' generators of networks, and can therefore be quite poor representations of real-world networks that grow and evolve over time. On the other extreme are models whose internal timescale matches that of the real system. Nodes and edges are added or rewired on a realistic timescale to match known properties of the system. The growth model of C. elegans presented by Nicosia et al. [51] is a good example. In this model, nodes and edges are added according to their empirically measured birth times (time of cell division), a feature that contributed to the success of that model in predicting different properties of the C. elegans connectome.
Most generative models are situated between these two extremes, in which models operate either without any timescale or with a biologically plausible timescale. In this middle ground, edges and nodes are added to or rewired in an existing network, but the timescale over which these processes occur is arbitrary. A good example is the BA model, in which new nodes are linked to an existing network over a series of steps. These steps are ordered, so the addition of one node precedes or follows that of another. However, time is measured in arbitrary units (steps) and direct comparison to biological timescales, e.g. human development, might be inappropriate. Ordering generative models based on their internal timescales is similar to ordering them according to their plausibility and mechanistic understanding. Blockmodels and models with arbitrary timescales can do a good job compressing our description of a network and might identify general organizational principles [35]. However, if our aim is to develop realistic mechanistic models of network growth and development, it is essential that we include the necessary components that ground the model in reality. While a generative model's intrinsic timescale naturally results in a stratification of models according to their neurobiological plausibility, it is essential to note that increased plausibility does not necessarily imply improved fit or increased model performance. Entirely implausible models with many parameters that receive a variety of metadata as input can conceivably outperform more neurobiologically grounded mechanistic models, simply due to increased complexity [56].

Generative models of biological neural networks
Now that we have an intuition for what a generative model is, and what the goals are for building a generative model, we turn to a brief review of existing generative models for biological networks observed in neural systems. 
We note that this review is not comprehensive, but instead focuses on areas in which significant work has been accomplished, or areas that motivate important current and future frontiers. We also refer readers elsewhere for additional details on the mechanisms of connectome development [24], biophysical models of neural dynamics [57], and modelling mesoscale structure in dynamic networks [58] and multiscale networks [25]. Finally, we note that this review focuses mostly on generative models of structural and not functional networks (the distinction is in how edges are defined; in structural networks they represent physical connections, e.g. synapses, projections and fibre tracts, whereas in functional networks they represent statistical associations among neural elements' activity, e.g. correlation, coherence, etc.). Because of differences in how structural and functional networks are generated and evolve, certain classes of models that are appropriate for one type may be wholly inappropriate for the other. For example, functional networks are not generated through an edge addition process-they emerge from constrained dynamical processes. We discuss the implications of these differences in more detail later in this section.

The requisite ingredients
An open and important question that scientists face when embarking on a study to develop a generative model is: 'what features are required to build good network models?' Perhaps the simplest feature one requires is a target network topology, the organization of the network that one is trying to recapitulate and ultimately explain. Yet, a single network topology can be built in many different ways, with strikingly different underlying mechanisms [59]. Thus one might also wish to have a deep understanding of (i) the constraints on anatomy, from physical distance [60] to energy consumption [61], (ii) the rules of neurobiological growth, from chemical gradients [62] to genetic specification [63], and (iii) the pressures of normal or abnormal development, and their relevance for functionality. Moreover, each of these constraints, rules and pressures can change as the system grows, highlighting the importance of developmental timing [63]. Of course, one might also wish to choose which of these details to include in the model, with model parsimony being one of the key arguments in support of building models with fewer details.

Generative models at the cellular level
Recent efforts to model cellular level network architecture have had the benefit of building on rich empirical observations made over the last several decades. At one of the smallest spatial scales of neuronal connectivity, evidence suggests that the arbours of single neurons can be characterized by both local [64] and global [65] optimization rules to more strongly minimize volume than length, signal propagation speed or surface area. Within the confines of relative volume cost minimization, there is also evidence for a maximization of the repertoire of possible connectivity patterns between dendrites and surrounding axons: in basal dendritic arbours of pyramidal neurons, arbour size scales with the total dendritic length, the spatial correlation of arbour branches appears to have a single functional form, and small sections of an arbour display self-similarity [66].
The morphology of dendritic arbours specifically and other parts of the cell more generally have a direct bearing on the degree of connectivity that can take place between neurons [67]. Like dendritic arbours, synaptic connectivity appears to be organized in a highly non-random manner [68], with unexpectedly high density in relation to its volume [67]. Interestingly, both synaptic connectivity and neuronal morphology appear to experience some similar constraints, including principles of wiring optimization [60,69]. Some suggest that constraints on synaptic wiring may be the more fundamental of the two, explaining the degree of separation between cortical neurons [60], as well as the placement of cell bodies [70]. Others suggest that it is in fact the combination of wiring economy and volume exclusion that can determine neuronal placement [71].
In either case, the highly non-random nature of synaptic connectivity has been the subject of several recent generative modelling efforts. Initial observations that this non-random organization could be parsimoniously described as small-world [8,72] have motivated the question of how this particular type of network complexity is combined with pressures for wiring minimization. Nicosia et al. [51] suggest that the growth rules shaping cellular nervous systems balance an economical trade-off between wiring cost and the functionality of network topology (figure 2). Using a dynamic economical model incorporating a continuously negotiated trade-off between wiring cost and network topology, they recapitulate an empirically observed phase transition in the proportion of nodes to links present over the developmental time period of C. elegans [51]. The authors speculate that such dynamically negotiated trade-offs may be characteristic of other complex systems, whether biological or man-made. It will be interesting in the future to consider scenarios in which such trade-offs may be negotiated over shorter time periods, such as in the alteration of the prevalence of autaptic connections posited to play a role in homeostatic network control of bursting [73].
The incorporation of a dynamic economic trade-off is an example of the broader importance of incorporating biophysically accurate features in generative models of cellular neural systems. Another example of such a biophysical feature is axon and dendrite geography, which has been shown to predict the specificity of synaptic connections in a functioning spinal cord network of hatchling frog tadpoles [74]. Some generative models have also sought to determine the role of neuron type in observed network topology and function, for example by building models of sensory neurons, sensory pathway interneurons, central pattern generator (CPG) interneurons and motoneurons, and then linking them in a network with known inter-type connectivity [75]. By adding knowledge about development including chemical gradients and physical barriers [62], a cell-type specific model of 2000 neurons in the spine of a young Xenopus tadpole can produce swimming behaviour in response to sensory stimulation [76]. These and related efforts demonstrate the ability of generative network models built with neuron and synapse resolution, and incorporating biophysical phenomena, to reproduce behaviours observed in whole organisms. Such findings are reminiscent of other biophysical modelling efforts at the large scale of human areal networks [77,78], where the biophysics of regional rhythms and inter-regional synchronization inform our understanding of human cognition [79].
rsif.royalsocietypublishing.org J. R. Soc. Interface 14: 20170623

Increasing in scale: generative models of large-scale connectomes in non-human animals
In the previous section, we reviewed some of the literature supporting the notion that cellular network organization in neural systems is characterized by pressures of wiring economy and topological complexity. Such pressures are similarly thought to play a role in the organization of networks at the meso- and large scale in both human and non-human mammalian brains [19]. Computational studies suggest that trade-offs between wiring economy and topological complexity [80] support the formation of network modules, offering relative segregation of function, and network hubs, offering relative integration of function [81]. The role of topological complexity and the presence of unusually high wiring costs in some parts of cortex suggests that simple notions of spatial embedding are not sufficient to explain the observed organization of the connectome. This limitation has motivated models deriving a latent (rather than physical) space from which to predict missing links [82], or incorporating information about cytoarchitecture [83] such that cytoarchitectonically similar cortical areas in the two hemispheres have an unexpectedly high probability of connecting with one another [84]. A particularly salient example of a generative model of areal connectivity in a mammalian brain that incorporates many of these considerations is the recent predictive model of Beul et al. [83] (figure 3). In this paper, the authors study mesoscale structural connectivity between 49 areas of the cat cerebral cortex as estimated by tract tracing techniques [83].
They test the predictive utility of three separate wiring rules: (i) a structural rule in which the laminar patterns of origins and terminations of inter-areal projections vary according to the relative cytoarchitectonic differentiation of the projection sources and targets, (ii) a distance rule in which connections are more frequent, and more dense, among neighbouring regions and sparser or absent between remote regions, and (iii) a hierarchical rule in which differences in the functional hierarchical levels of source and target areas are inversely related to the degree of connectivity between them. While the latter rule did not accurately fit the data, the first two rules (structure and distance) explained significant variance in the observed connectivity patterns, with a linear combination of the two predicting the existence of connections with more than 85% accuracy.
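As an illustration of how two wiring rules might be combined into a single predictor of connection existence, the following is a minimal sketch on synthetic data. It is not the analysis of Beul et al. [83]: the choice of logistic regression, the function names (fit_logistic, predict), the two predictors (struct_diff, distance) and the coefficients used to simulate the data are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(features, labels, lr=0.5, n_iter=5000):
    """Fit P(connection) = sigmoid(b + w . x) by plain gradient descent."""
    x = np.column_stack([np.ones(len(features)), features])  # prepend bias column
    w = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (p - labels) / len(labels)  # gradient of the mean log-loss
    return w

def predict(w, features):
    """Predict a connection wherever the fitted logit is positive (P > 0.5)."""
    x = np.column_stack([np.ones(len(features)), features])
    return (x @ w > 0).astype(int)

# Synthetic pairs of areas (assumption: both predictors are scaled to [0, 1]).
rng = np.random.default_rng(0)
n_pairs = 500
struct_diff = rng.random(n_pairs)  # hypothetical cytoarchitectonic difference
distance = rng.random(n_pairs)     # hypothetical inter-areal distance
true_logit = 3.0 - 4.0 * struct_diff - 4.0 * distance  # made-up ground truth
connected = (rng.random(n_pairs) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

features = np.column_stack([struct_diff, distance])
weights = fit_logistic(features, connected)
accuracy = np.mean(predict(weights, features) == connected)
print("bias, w_struct, w_dist:", np.round(weights, 2))
print("in-sample accuracy:", round(float(accuracy), 2))
```

The accuracy printed here is computed on the same pairs used for fitting; a faithful evaluation would hold out a subset of pairs, in line with the emphasis on cross-validation elsewhere in this review.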
Work in non-human primates generally and the macaque cortex specifically recapitulates many of the same motifs from work in lesser mammals. Early work suggested that cortical components are optimally placed so as to minimize the costs of their interconnections [86], facilitating a global optimal cerebral cortex layout [87]. Later work suggested that component placement did not maximally minimize wiring, but also tended to favour short processing paths, due to long-distance projections [16]. Indeed, separate from where components are placed, it has been noted that there appear to be successfully arbitrated optimization problems in the organization of inter-areal connectivity, for example favouring near-minimization of distance [17,88] and increased support for connectivity between areas with similar topological properties [47]. In an extension of the model described above for the cat, Beul and colleagues similarly demonstrate the striking  [89]. In this case, the distance rule was surprisingly not predictive. Future extensions of this model may include explicit nonlinear growth rules, which have previously been linked to the emergence of network hubs [90].

Generative models of large-scale connectomes in humans
Efforts in humans support the notions of wiring economy [91,92] and topological complexity [49], and further add new considerations such as the geometric segregation of the brain into grey and white matter, enabling the relative minimization of conduction delays [93]. While one-shot models have been the most commonly exercised generative models for human structural networks, relatively new evaluation criteria for them include an assessment of their controllability profiles [94] and homological features [95]. Moreover, there has been a recent and growing interest in developing network growth models that incorporate biologically motivated rules for the probability of connections [96,97]. For example, spatially constrained adaptive rewiring creates small-world network architectures with spatially localized modules [97], while wiring rules based on topological affinities recapitulate known scaling laws of physical network topology [96]. It would be interesting in future work to determine how these rules could be adapted to explain the patterns of conserved and variable architecture of white matter networks across individual humans [98]. The recent paper by Betzel et al. [49] represents one of the first attempts at subject-level generative modelling. In this study, the authors fit 13 generative models to white-matter networks acquired from three independent datasets, totalling 380 subjects (figure 4). Each model generated synthetic networks using an edge-addition algorithm, in which connections were added probabilistically and one at a time according to a set of parametrized wiring rules. Each of the 13 models was evaluated in two stages: it was first fit by matching distributional statistics of the white-matter networks, and then cross-validated on a separate set of network measures. The best fitting models across all three datasets featured wiring rules based on wiring cost reduction and homophilic attraction principles, the severity of each controlled by a separate parameter.
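The edge-addition procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [49]: we assume that the probability of adding an edge is proportional to a power of inter-node distance (exponent eta, the wiring cost term) multiplied by a power of a homophily score (exponent gamma), and we use the number of shared neighbours plus a constant eps as a stand-in homophily score. The function name grow_network and all parameter values are hypothetical.

```python
import numpy as np

def grow_network(coords, n_edges, eta=-2.0, gamma=1.0, eps=1.0, seed=0):
    """Add n_edges undirected edges one at a time, each drawn with
    probability proportional to distance**eta * (shared neighbours + eps)**gamma."""
    rng = np.random.default_rng(seed)
    n = coords.shape[0]
    iu, ju = np.triu_indices(n, k=1)  # all candidate node pairs
    if n_edges > len(iu):
        raise ValueError("n_edges exceeds the number of node pairs")
    # Euclidean distance between every pair of nodes
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = np.zeros((n, n), dtype=int)
    for _ in range(n_edges):
        # (adj @ adj)[i, j] counts the neighbours shared by nodes i and j
        shared = (adj @ adj)[iu, ju] + eps
        weight = dist[iu, ju] ** eta * shared ** gamma
        weight[adj[iu, ju] == 1] = 0.0  # an existing edge cannot be added again
        pick = rng.choice(len(iu), p=weight / weight.sum())
        adj[iu[pick], ju[pick]] = adj[ju[pick], iu[pick]] = 1
    return adj

# Example: 20 nodes placed at random in a unit cube (synthetic coordinates).
coords = np.random.default_rng(42).random((20, 3))
synthetic = grow_network(coords, n_edges=30)
print("edges placed:", synthetic.sum() // 2)
```

With eta negative, long connections are penalized; with gamma positive, pairs that already share neighbours are favoured. Fitting such a model would then amount to searching over (eta, gamma) for the values whose synthetic networks best match the statistics of an observed network.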
Because the models were fit to individual subjects, it was possible to explore individual variability in model fit. When applied to lifespan data from the Nathan Kline Institute, the authors found that the parameter governing the severity of the wiring cost reduction weakened systematically with age, as did the model goodness of fit. These findings suggest that generative models are sensitive to changes in network organization with development and ageing, and may be useful tools in studying variation across individuals [99].
Interestingly, Vértes et al. [48] found that a similar model also reproduced many of the topological features of brain networks reconstructed from functional MRI data. In that study, the authors extended their analyses to show that, based on the optimal parameters fit to networks from individual subjects, the parameter space could be partitioned so as to distinguish brain networks of patients with schizophrenia from those of healthy controls.
The observation that basically the same model, when applied to structural or functional brain networks, outperformed all other models is intriguing. On one hand, it suggests that the brain's physical wiring and functional architecture might be organized according to similar principles. On the other hand, this similarity could simply be coincidental. While both Betzel et al. [49] and Vértes et al. [48] tested and compared a range of generative models, testing the space of all possible models and generative mechanisms is unfeasible. It is likely that other models not tested in either study would result in improved performance. Yet another explanation for the convergence of these two studies is that they model the brain at roughly the same organizational scale (inter-areal networks estimated from MRI data), and that deviations from this scale would uncover a different set of optimal models.
Figure 3 (caption fragment). Hub-module areas, as classified by Zamora-López et al. [14], are marked by a white outline. Reproduced with permission from [83]. (Online version in colour.)
In a more recent study, Tang and colleagues study individual variation in youth by examining the white-matter networks of 882 individuals between the ages of 8 and 22 years [100]. Here, the authors posited that over this developmental time period, structural brain networks become optimized for a greater diversity of neural dynamics, as instantiated by recently defined metrics of network controllability [42]. They tested the hypothesis that an observed trajectory of network change over youth could be recapitulated by a generative model that increased average controllability (predicted ease of transitioning between nearby network states, where a state is the level of activity in each region across the entire brain), increased modal controllability (predicted ease of transitioning between distant network states) and decreased synchronizability (predicted capacity for global synchronization). The model was initiated with a given brain network, and then evolved in silico according to a rewiring rule such that an existing edge was randomly chosen to take the place of an edge that did not exist, and this edge swap was retained only if the new network advanced the Pareto front, the set of all network configurations that were optimal in their trade-off between average and modal controllability (figure 5). As rewiring progressed forward in time, a course was charted in which networks increased in controllability and decreased in synchronizability; while as rewiring progressed backwards in time, networks decreased in controllability and increased in synchronizability. The simulated developmental trajectories displayed a striking similarity in functional form to the observed developmental trajectories, suggesting a possible mechanism of human brain development that preferentially optimizes dynamic network control over static network architecture.
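The rewiring rule described above can be illustrated with a short sketch. Two simplifications are assumed and should be read as such. First, a swap is accepted only if it is a Pareto improvement on the current network (no objective decreases and at least one increases), which is a simplified stand-in for advancing the Pareto front. Second, the two objectives used here (the spectral radius of the adjacency matrix and the algebraic connectivity of the graph Laplacian) are placeholders chosen because they are cheap to compute; they are not the controllability or synchronizability metrics of [100]. The function names are hypothetical.

```python
import numpy as np

def objectives(adj):
    """Placeholder objectives to be jointly increased (assumptions, see text)."""
    laplacian = np.diag(adj.sum(axis=1)) - adj
    spectral_radius = np.linalg.eigvalsh(adj)[-1]              # largest eigenvalue
    algebraic_connectivity = np.linalg.eigvalsh(laplacian)[1]  # second smallest
    return np.array([spectral_radius, algebraic_connectivity])

def pareto_rewire(adj, n_swaps, seed=0):
    """Move a random existing edge to a random empty position and keep the
    swap only if no objective gets worse and at least one improves."""
    rng = np.random.default_rng(seed)
    adj = adj.copy()
    score = objectives(adj)
    iu, ju = np.triu_indices(adj.shape[0], k=1)
    for _ in range(n_swaps):
        present = np.flatnonzero(adj[iu, ju] == 1)
        absent = np.flatnonzero(adj[iu, ju] == 0)
        if len(present) == 0 or len(absent) == 0:
            break
        old, new = rng.choice(present), rng.choice(absent)
        trial = adj.copy()
        trial[iu[old], ju[old]] = trial[ju[old], iu[old]] = 0  # remove old edge
        trial[iu[new], ju[new]] = trial[ju[new], iu[new]] = 1  # place new edge
        trial_score = objectives(trial)
        if np.all(trial_score >= score) and np.any(trial_score > score):
            adj, score = trial, trial_score
    return adj, score

# Example: start from a sparse random undirected graph on 12 nodes.
rng = np.random.default_rng(3)
upper = np.triu((rng.random((12, 12)) < 0.3).astype(int), k=1)
start = upper + upper.T
start_score = objectives(start)
evolved, evolved_score = pareto_rewire(start, n_swaps=200)
print("objectives before:", np.round(start_score, 3))
print("objectives after: ", np.round(evolved_score, 3))
```

Because every accepted swap moves one edge, the number of connections is conserved throughout, so any change in the objectives reflects topology rather than density. Evolving a network in the opposite direction would only require reversing the inequalities in the acceptance test.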

A few relevant distinctions
In this section, we describe a few important distinctions that are particularly relevant to the understanding and further development of generative network models for neural systems. First, we will explore the relations between generative models that seek mechanisms and explanations, and null models for statistical testing of hypotheses. Second, we will discuss the important trade-off in sufficiency of a generative model versus redundancy. Third, we will seek to disambiguate between inferring a possible mechanism versus claiming proof of a mechanism. And finally, we will describe some relevant considerations when building or evaluating generative models of structural versus functional connectivity.

Generative models and null models
The stated goals of the generative modelling approach, as described in the early sections of this review, include the identification of putative mechanisms of observed network architecture, and intuitive explanations for some of the features that characterize that architecture. Yet, depending on their degree of biological realism, such models can also be used as statistical null models, potentially enabling the dismissal of a null hypothesis. In general, topological and spatially informed null models play a critical role in network science broadly [101-103], and network neuroscience specifically [92,104,105]. One could consider using a generative model to test the hypothesis that the topology of an empirically measured neural network was consistent with a topology of an artificial network built on a fixed set of rules or principles. In this case, one would need to be careful in the exposition of the study to distinguish between when the model was being used to propose a generative mechanism, and when the model was being used in a statistical sense to dismiss a null hypothesis.

Sufficiency and redundancy
When building generative network models of neural systems, a common observation is that the models often fit topological signatures that they were designed to fit, but do not fit topological signatures that they were not designed to fit [96] (although see also [90,97,106]). It is important to ask whether such a model is sufficient, or whether one should seek a model that also predicts a topological signature that has not been hard coded into the objective function and/or generative algorithm. In addition to sufficiency, one might also wish to consider model redundancy: does the model combine two or more wiring rules that both induce the same topological signature? Such a scenario can be quite common, as there exist whole families of graphs that display similar graph metric values [107], community structure [108], controllability profiles [94] and homological features [95]. Broadly, one might wish to build generative models that balance a trade-off between (i) sufficiency, potentially enabled by a greater number of wiring rules, and (ii) redundancy, whose relative minimization is supported by notions of biological efficiency and parsimony.

Inferring and claiming mechanism
Suppose that one is thoroughly successful, and creates a generative model that beautifully reproduces an empirically observed network structure. Do the rules that compose the generative model provide a mechanism explaining the empirical network's architecture [109]?
Even more brazenly, can such a generative model help us to develop a theory of brain network organization and resultant behaviour [110]?
Figure 5 (caption fragment). Pareto-optimal networks (purple dots) are the networks where these properties are most efficiently distributed, i.e. it is impossible to increase one property without decreasing another property, unlike in the non-optimal networks (green dots). The boundary connecting the Pareto-optimal networks forms the Pareto front (purple line). (b,c,d) Rewiring along Pareto fronts is a generative network process, here used to model developmental changes in network architecture according to a rewiring rule that maximizes two sorts of network controllability while minimizing synchronizability. Beginning from empirically measured brain networks of youth between the ages of 8 years and 22 years (purple dots), the authors swapped edges to modify the topology and test if the modified network advances the Pareto front. This procedure charts a course of network evolution characterized by increasingly optimal features: here the authors increased the mean average controllability and the mean modal controllability, and decreased global synchronizability, in 1500 edge swaps (yellow curves). For comparison, the authors also evolved the network in the opposite direction (to decrease controllability and increase synchronizability, pink curves). The trajectory for one subject (blue dot) is highlighted (orange and red). The simulated trajectories defined by the rewiring rules of increasing network controllability and decreasing synchronizability provided a surprisingly good fit to the observed data, suggesting that they are sufficient mechanisms for the empirical observations. Reproduced with permission from [100]. (Online version in colour.)
In seeking answers to these questions, it is important to disambiguate between inferring a possible mechanism and claiming proof of a mechanism. If a generative network model built upon rule a recapitulates the network structure of interest, one can say that rule a is a possible mechanism, but one cannot claim that it is the mechanism. To provide a more concrete example embedded in network neuroscience, let us consider the topological feature of Rentian scaling, an isometric scaling relationship between the number of processing elements and the number of connections, which is often found in systems that are built upon the principle of wiring reduction, and is observed in brain networks [111] as well as other transmission systems such as computer circuits [112], transportation systems [113] and vasculature [114]. Given the scaling relationship, one might infer that the network's structure is given by a mechanism that operates uniformly across all scales such as wiring minimization. However, such an inference would neglect the fact that many scale-heterogeneous mechanisms also produce topological scaling relationships [59]. In future work, it will be important to concretely discuss support for possible mechanisms separately from exact claims that such mechanisms have been proven.
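To make the scaling example concrete: Rent's rule is commonly written as a power law, E = k N^p, relating the number of elements N in a block of the system to the number of connections E crossing the boundary of that block, where p is the Rent exponent and k a constant. The symbols E, N, k and p are our notation here rather than the original text's. The sketch below estimates p by a straight-line fit in log-log coordinates on synthetic counts; how N and E are obtained from a real network (for example, by partitioning it into nested blocks) is outside the sketch, and the function name is hypothetical.

```python
import numpy as np

def fit_rent_exponent(n_elements, n_connections):
    """Least-squares fit of log E = log k + p log N; returns (p, k)."""
    slope, intercept = np.polyfit(np.log(n_elements), np.log(n_connections), 1)
    return slope, np.exp(intercept)

# Synthetic counts generated from a known power law with mild multiplicative noise.
rng = np.random.default_rng(1)
n_elements = np.logspace(np.log10(2), 3, 50)  # block sizes from 2 to 1000
true_p, true_k = 0.75, 3.0                    # made-up ground truth
noise = rng.lognormal(mean=0.0, sigma=0.05, size=n_elements.size)
n_connections = true_k * n_elements ** true_p * noise

p_hat, k_hat = fit_rent_exponent(n_elements, n_connections)
print("estimated exponent:", round(float(p_hat), 3))
print("estimated prefactor:", round(float(k_hat), 3))
```

A clean fit of this kind establishes only that the scaling relationship is present. As argued above, it does not by itself identify which mechanism, scale-uniform or scale-heterogeneous, produced it.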

Functional connectivity and structural connectivity
This review has focused mostly on generative models for structural networks, where links represent physical pathways among neural elements. Generative network models can also be built for functional connectivity data, with some caveats and limitations [48,55,115]. Posited drivers of functional network organization across species include similar notions of cost-efficiency [116-118], small-world architecture [119] and spatial clustering [120]. However, the appropriate growth mechanisms that such generative models employ face different constraints in the functional domain from those in the structural domain [121]. Functional connectivity is not generated piece by piece, as instantiated by a discrete placement of edges in a network [122]. Instead, functional connectivity is a consequence of dynamical processes constrained by many factors [123], including but not limited to anatomical structure [124-126], the activity elicited by a particular task [127], the distance between brain areas [123], genetics [128-130] and any stimulation or other input to the system [131,132]. Many good models of brain dynamics exist, ranging from the biologically realistic to the heavily idealized [57]. However, growth models built from the placement of independent edges are conceptually more appropriate for structural networks than for functional networks.
A particularly powerful generative model for producing a functional network topology from a structural network topology (or for inferring a structural network topology from a functional network topology) is the pairwise maximum entropy model [133-135]. The technique was initially applied to neural spiking data to demonstrate that pairwise interactions give an excellent approximation of the full correlation network [136]. More recently, the technique has been used to accurately fit fMRI BOLD collected in humans [137,138], dynamic functional connectivity patterns [139] and patterns of transitions between brain states [140].
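For reference, the pairwise maximum entropy model is usually written in the following standard form. The notation is assumed here rather than taken from the text above: \sigma_i is the binarized activity of unit i (for example \pm 1), h_i is a bias for unit i, J_{ij} is the effective interaction between units i and j, and Z normalizes the distribution over all 2^N activity patterns.

```latex
P(\sigma_1,\dots,\sigma_N)
  = \frac{1}{Z}\exp\!\Big(\sum_{i=1}^{N} h_i\,\sigma_i
      + \sum_{i<j} J_{ij}\,\sigma_i\sigma_j\Big),
\qquad
Z = \sum_{\sigma}\exp\!\Big(\sum_{i=1}^{N} h_i\,\sigma_i
      + \sum_{i<j} J_{ij}\,\sigma_i\sigma_j\Big)
```

The parameters h_i and J_{ij} are chosen so that the model reproduces the empirical means \langle\sigma_i\rangle and pairwise averages \langle\sigma_i\sigma_j\rangle. Among all distributions that match those statistics, this is the one with maximal entropy, which is the sense in which the model adds no structure beyond pairwise interactions.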
While the pairwise maximum entropy model has proven useful in inferring structural network organization from functional network organization, and vice versa, it is certainly true that non-pairwise interactions may nevertheless play a non-trivial role in neural population function. Intuitively, beyond-pairwise interactions can occur via common input [141], producing multiway synchrony [142] with varying prevalence across different length scales in the system [143]. Generative models of such high-order relations include beyond-pairwise maximum entropy models [144] and dichotomous Gaussian models [145]. Another way in which beyond-pairwise functional interactions can occur is if neurons themselves do not only display pairwise connections, but also higher-order connections. This possibility highlights a complementary challenge in describing the presence of such higher-order relations in structural networks from a topological point of view, with the goal of building generative models that account for them. We will discuss initial efforts to address these challenges using notions from algebraic topology in the next section.

Future directions
In this section, we discuss future directions in efforts to develop, extend and apply network generative modelling approaches to questions of import to neuroscience. We begin by offering a description of what generative models could accomplish, with a particular focus on clinical applications. We next consider dream datasets and experiments whose acquisition and open sharing would inherently change the sorts of questions that network generative modelling approaches could tackle. Finally, we discuss a few natural directions in which to increase the sophistication of network generative models, including the consideration of multilayer networks and simplicial complexes. This section is purposefully more forward-looking and speculative than the previous sections, but we nevertheless offer a generous helping of appropriate citations to the relevant domain-specific efforts.

What would a generative model accomplish?
In practice, many of the current approaches for studying biological neural networks involve computing and comparing summary statistics between groups or continuously across individuals. While this approach is useful in identifying 'what' is different, it fails to explain 'how' those differences come to be, in the first place. In this review, we echo other recent reviews [23,24] and call for a shift in emphasis away from 'fact collecting' studies and towards uncovering the mechanisms that explain the organization of neural systems. We argue that network generative modelling represents a framework that can help us move towards addressing these lofty goals.
Suppose that, with the right dataset and the right modelling approach, we can devise a model that, to a reasonable approximation, can successfully mimic the growth or evolution of a real-world neural system. In other words, the model results in a network that changes over time (where time has a clear developmental or biological interpretation) and whose topology evolves in a way that is consistent with known facts about the real-world growth of that network. What does having such a model buy us? On the one hand, we could simply maintain the status quo, fit the model's parameters to individual subjects and compute statistical relationships between parameters and behavioural measures (figure 6a), or use machine learning techniques to partition the model's parameter space into regions associated with clinical and control populations (figure 6b). While useful, these approaches are quite similar to the current state of the field.
On the other hand, another more novel possibility is to use the model for disease simulation. Many psychiatric [146] and neurodegenerative diseases [147] are manifest at the network level in the form of miswired or dysconnected systems [148], but it is unclear what predisposes an individual to evolve into a disease state. The generative model can be used to propagate individuals from one time point to another and identify those that are likely to evolve into a state similar to that of the disease phenotype and perhaps likely to develop that disease. In this way, the model has a clear role as a forecaster (figure 6c).
Similarly, the generative model can be used to explore in silico the effect of potential intervention strategies. We can think of biological neural networks as living in a high-dimensional space based on their topological characteristics, where some regions (of this space; not of the brain) are associated with neurological disease and considered maladaptive (and perhaps even deadly) [149,150]. In this context, the generative model represents an evolution operator that propagates a network from one point to another, tracing out a trajectory through this space. If we can identify individuals who are predisposed to travel near those maladaptive regions, we can begin to identify perturbations (changes to model parameters or wiring rules) that steer those trajectories towards regions not associated with disease (figure 6c). These goals are in line with current theoretical work, applying tools from network control theory to neuroimaging data [42,151,152].

Dream datasets and experiments
Generative models have clear utility in furthering our capacity to predict disease and identify the mechanisms that shape the development, growth and evolution of biological neural networks. A major hindrance in realizing these goals, however, is the absence of data tailored for generative models. The ideal data would (i) be longitudinal, enabling one to track and incorporate individual-level changes over time in the model, and (ii) include multiple data modalities, such as functional and structural connectivity, and genetics, along with other select factors that could influence network-level organization. In short, any metadata that could theoretically be incorporated into a model would be valuable and possibly worth collecting. Ideally, these data would be acquired at the earliest possible time point in utero [153] and proceed through maturity.
Clearly, collecting and curating such a dataset represents a massive undertaking. Though recent large-scale studies have made it possible to image thousands of individuals over a short period of time [2,3,154] and a small number of individuals over a long period of time [155-157], the duration and scale of a longitudinal study of the nature proposed here seems, at present, out of reach. Furthermore, the studies that have come closest to acquiring these kinds of data have relied on MRI due to its non-invasive nature. However, this same advantage also limits the fidelity and kinds of data that can be acquired from an individual (e.g. region-specific gene transcription levels can only be acquired post-mortem [158]).
An attractive alternative, then, is to consider building generative models of data from non-human model organisms. Not only are the life cycles of several model organisms much shorter than that of humans (making it possible to track an individual over the course of its entire life), but new advances in network reconstruction techniques [159-161] and the ability to make recordings of activity in unprecedented detail [162,163] ensure that any generative model will be endowed with sufficiently rich data to probe for novel wiring rules. Moreover, working with model organisms also makes it possible to collect data modalities that, otherwise, would be inaccessible, including details about gene expression [164].

Increasing sophistication of network generative models
Finally, given ideal data, there are also exciting and important future directions in increasing the mathematical sophistication of network generative models. One particularly accessible extension of current methods lies in multilayer network generative models. A multilayer network consists of multiple single-layer networks, e.g. representing a neural system's structural connectivity, functional connectivity and gene co-expression [165,166], that are linked across layers to one another. A generative model for these types of data is one that, instead of single-layer networks, generates multilayer networks [167], and the rules of generation can apply to a single layer, to multiple layers or to the interconnectivity between layers [168]. One potentially useful place to start would be to construct multilayer generative models where the neural connectivity evolves with a specific set of dynamics (or network growth rules) that are explicitly coupled to the underlying tissue growth or to the innervating vasculature growth [169].
Figure 6. Applications using generative models. Model parameters can be fit to individual subjects and those parameters compared to some behavioural measures (a) or used to classify different populations from one another (b). Generative models can also be used to simulate the development of a biological neural network (c). These simulations can be used as forecasting devices to identify individuals at risk of developing maladaptive network topologies. They can also be used to explore possible interventions, e.g. perturbations to parameters or wiring rules, that drive an individual away from an unfavourable, maladaptive network topology towards a more favourable state. (Online version in colour.)
At the larger scale, one could also consider developing multilayer generative models that couple brain network growth with social network growth, a coupling that has recently been postulated to occur through processes of development and learning [170]. Indeed, it is likely that there are other ways in which our brain network topology and changes in that topology are coupled to our experiences. Such experiences could be defined by our environment, for example as partially stipulated by our socio-economic status [171], or by our practices, for example as instantiated in our practice of curiosity [172]. Indeed, it is interesting to speculate that network generative models may be useful in understanding the relations between brain network architecture and the architecture of knowledge networks, which are physically instantiated in the brain [173], as well as semantic networks [174], which can be tuned by our attention [175]. Semantic networks, social networks, brain networks, vasculature networks and tissue networks may all evolve with one another in intertwined multilayer network systems, an understanding of any pair of which will require concerted efforts in extending the sophistication of current network generative modelling techniques.
In addition to multilayer approaches, network generative modelling could also benefit from incorporating methods to address the existence and growth of non-pairwise relations between nodes. A useful language with which to meet this challenge is the language of algebraic topology and specifically simplicial complexes [176] whose fundamental units are simplices: a 0-simplex is a node, a 1-simplex is a dyad, a 2-simplex is a face, a 3-simplex is a tetrahedron, a 4-simplex is a 5-cell, etc. A collection of simplices, called a simplicial complex, can include many interesting features including cliques (i.e. fully connected subgraphs) and cavities (collections of n-simplices arranged so that they have an empty geometric boundary). In patterns of correlations among the activity of pyramidal neurons in rat hippocampus, the topology of cliques and cavities demonstrates geometric organization consistent with a generative model of simplicial complexes related to random geometric graphs [177]. This higher-order structure has also enabled the identification of unexpectedly long structural loops linking regions of early and late evolutionary origin, underscoring their unique role in controlling brain function [178]. Indeed, the topology of cliques and cavities has specific implications for local processing (cliques) versus processing in which information may flow in either diverging or converging patterns (cavities) [178], and can support efficient coding by enabling inference of neural codes even in a highly undersampled set of patterns [179]. While generative models of simplicial complexes based on random geometric graphs have shown some utility in explaining these structures, further work is needed to understand the extent of their applicability, and to consider models for growing simplicial complexes [180].
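As a small illustration of this vocabulary, the sketch below builds the simplices of a graph under the assumption that every fully connected set of k+1 nodes is treated as a k-simplex (the clique complex of the graph). The function name and the toy graph are hypothetical; the toy graph joins a tetrahedron on nodes 0 to 3 to a hollow four-node cycle through nodes 3, 4, 5 and 6, so that it contains both a clique and a cavity.

```python
from itertools import combinations

def clique_complex(n_nodes, edges, max_dim=3):
    """Return {k: list of k-simplices}, where a k-simplex is a sorted tuple
    of k+1 mutually connected nodes."""
    neighbours = {v: set() for v in range(n_nodes)}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    simplices = {0: [(v,) for v in range(n_nodes)]}
    for dim in range(1, max_dim + 1):
        found = []
        for simplex in simplices[dim - 1]:
            # nodes adjacent to every vertex of the simplex; requiring a label
            # larger than the last vertex lists each clique exactly once
            common = set.intersection(*(neighbours[v] for v in simplex))
            found.extend(simplex + (w,) for w in sorted(common) if w > simplex[-1])
        simplices[dim] = found
    return simplices

# Toy graph: all six edges among nodes 0-3, plus the cycle 3-4-5-6-3.
edges = list(combinations(range(4), 2)) + [(3, 4), (4, 5), (5, 6), (3, 6)]
complex_ = clique_complex(7, edges)
counts = [len(complex_[k]) for k in range(4)]
euler = sum((-1) ** k * c for k, c in enumerate(counts))
print("simplices per dimension:", counts)  # → [7, 10, 4, 1]
print("Euler characteristic:", euler)      # → 0
```

The Euler characteristic of zero is consistent with one connected component and one unfilled loop (the four-node cycle), while the tetrahedron contributes no cavity because all of its faces and its interior are filled. Detecting cavities in general requires computing homology, which this sketch does not attempt.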
Another way in which the network generative modelling framework can be extended is to consider the effect of generative mechanisms on not only the structural properties of networks, but also the dynamics that they support [181]. Different growth mechanisms result in particular network motifs and topological features. In doing so, they shape flows over the network, organize dynamics and contribute to determining a network's functional fingerprint [182,183]. Incorporating information about dynamics into objective functions or the generative process itself facilitates the exploration of a richer tapestry of models. This theoretical work could eventually be used to model the functional effect of perturbations to the network's structure, retuning of a model's parameters or alterations to the wiring rule itself [184].

Conclusion
As the field of network neuroscience matures, efforts in data description and statistical characterization are being complemented by efforts to infer principles, to predict unobserved data and to perturb the system with theoretically grounded expectations about the results of those perturbations. Generative modelling is a particularly powerful approach for moving beyond description towards prediction, mechanism and eventually theory. In this article, we have offered a simple primer on generative models, a review of recent efforts in generative models of biological neural networks, and a discussion of current frontiers in empirical data collection and mathematical sophistication. We look forward with anticipation to efforts in the coming years that use generative models to understand human development, and to potentially inform interventions in psychiatric disease or neurological disorders in which wiring patterns have gone awry.
Data accessibility. This article has no additional data.
Authors' contributions. R.F.B. and D.S.B. wrote this manuscript and approved of its final version.