Do-it-yourself networks: a novel method of generating weighted networks

Network theory is finding applications in the life and social sciences for ecology, epidemiology, finance and social–ecological systems. While there are methods to generate specific types of networks, the broad literature is focused on generating unweighted networks. In this paper, we present a framework for generating weighted networks that satisfy user-defined criteria. Each criterion hierarchically defines a feature of the network and, in doing so, complements existing algorithms in the literature. We use a general example of ecological species dispersal to illustrate the method and provide open-source code for academic purposes.


Introduction
Network theory is gaining traction in the life and social sciences, finding applications in the fields of ecology, economics, epidemiology, finance and more recently, in analysing socialecological systems [1][2][3][4]. In ecology, there is growing interest in the role of connectivity between spatial locations on the stability and resilience of those systems [5,6]. Numerous other applications exist for the study of infectious disease spread [7,8], optimal management of spatially distributed species [9], neural communication [10], bank failure [11] and the diffusion of ideas [12]. While there are methods to generate specific types of networks (table 1), these are primarily limited to unweighted networks [1]. In this paper, we present a general framework for generating suites of weighted networks that meet user-defined criteria, e.g. symmetry, clustering or other metrics measuring network connectivity [3,23]. The framework makes use of a hierarchy of metrics, where each metric specifies a distinct feature that describes the structural properties of the network; the hierarchy gives a natural order in which  Given a k-connected network of N nodes and a particular degree distribution. Order the nodes of the graph (n i ) by decreasing degree such that n 1 > n 2 > . . . > n N . Recursively connect the nodes of the lowest degree (h) to those of the highest degree (from n 1 to the first node of degree h, n x ) and to those of the lowest degree (from n N to n N−(h−x) ).

Bansal
Given a random network of N nodes. Iteratively select a random edge in the network and rewire it to a new node chosen in proportion to its degree. Continues rewiring until the coefficient of variation of the degree distribution is equal to one (characterizes an exponential degree distribution).

scale-free
Given an initial number of nodes in a network, N(0). For a set of t = 1,2, . . . ,T iterations, add a new node n(t) that is connected to no more than N(0) nodes in the system. The probability that the new node is connected to node i is given by k i /Σ j k j , where k i is the number of connections of node i. Exhibits preferential connections between nodes. Degree distribution follows a power law.
Barabasi & Albert [20], Barabasi et al. [21] . to consider the addition of new metrics. Our method uses optimization theory to create adjacency matrices that define the relationship between different entities such as spatial locations or the interactions between species, people and institutions. Our work is most akin to the network generation algorithms of Carayol et al. [15] and Small et al. [16], who use genetic algorithms and pay-off functions to optimize the generation process. However, we specifically generate weighted networks and optimize a suite of network metrics rather than node degree. We demonstrate our method using a general example of ecological species dispersal and provide open-source code of our algorithm for academic purposes. 1

Terminology
For the purposes of illustrating the algorithm, we consider a 'habitat graph' network composed of spatially distinct habitats or patches that are connected by species dispersing among them. However, our method is not limited to spatial networks and can be used to generate weighted networks in other applied fields such as those of species interaction or social networks. See [1,3] for reviews of different types of networks across disciplines. In our study, each patch is referred to as a node; each connection is referred to as an edge, which allows for the flow of species between patches. An adjacency matrix describes the connections between patches. Let us construct an adjacency matrix that represents the connectivity between n spatial locations or patches. Patches are connected by the movement of species, people or any flow between spatial locations. For illustrative purposes, we consider the case of a single species dispersing between patches.
Let the entries of the adjacency matrix correspond to the cost of movement between patches where the dispersal of species is symmetric, i.e. species move bidirectionally at the same rate. We assume a weighted network, where the off-diagonal entries of the adjacency matrix are constrained to be real numbers greater than or equal to zero. Increasing the cost of movement between patches to infinity is equivalent to assuming that two patches are not connected. 2 Mathematically we define a randomly generated adjacency matrix describing the cost of species movement between patches as: where A is a square n × n matrix and a ij is the degree of connectivity between patches i and j. Restrict a ij > 0 for all i = j. We assume that a patch cannot be connected to itself, i.e. A is a zero-diagonal matrix. 3 We constrain the adjacency matrix to possess a spectral radius of r, along with a specific variance (v) and skewness (s) of the associated dominant eigenvector x r . We focus on these three for the following reasons. First, because we are primarily interested in weighted networks, metrics concerning node degree distributions (e.g. node or link density) are less appropriate. With weighted networks, our focus is the node connectivity distribution. Second, our metrics are particularly useful for illustrating a hierarchical approach to network construction and identification. Combining multiple network metrics-starting with general summary measures and then moving to more specific metrics (the hierarchy of ordering)gives a more holistic understanding of network structure. For instance, spectral radius is a good proxy of connectivity [26] but there exist an infinite number of potential networks for a given spectral radius.
Combining spectral radius with the dominant eigenvector provides a more complete picture of network structure. For any given spectral radius there exists a dominant eigenvector that can be described as a distribution of node connectivity scores. The statistical moments of such a distribution are a natural (hierarchical) method of discerning differences between networks. In other words, any two networks can be differentiated as higher statistical moments are calculated. The spectral radius is the largest (dominant) eigenvalue of the adjacency matrix. 4 In network terms, it represents the average distance it takes to traverse across the entire landscape and is therefore a proxy of network connectivity [26]. A low (high) spectral radius indicates a network that is highly (poorly) connected. We formally define the spectral radius as: where {λ k } k=1...n are the eigenvalues associated with A. Let x r represent the eigenvector associated with the dominant eigenvalue. Spectral radius alone is insufficient to guarantee a particular network structure. For a specific spectral radius there exists an infinite number of potential network configurations (figure 1). This makes it difficult to confidently generalize conclusions drawn from network process outcomes. Therefore, we further constrain the adjacency matrix (of spectral radius, r) to possess a defined variance and skewness in the associated dominant eigenvector, x r .
In a network context, the dominant eigenvector represents a collection of node connectivity scores or 'eigenvector centralities' of the network [29]. Each component in the dominant eigenvector measures the relative contribution of each node to the overall connectivity of the network. Each network consists of nodes with varying degrees of contribution. Some nodes may be high contributors to connectivity while other nodes may be isolated and contribute less. We define highly (weakly) connected nodes within a  In c, only node 3 is a strong contributor to connectivity; in d, nodes 1-4 contribute strongly. Adapted from Salau et al. [28].
given network as nodes with a contribution score greater (less) than the mean contribution; where, the mean contribution score is the average of the dominant eigenvector components. Variance of the dominant eigenvector measures the spread in node contribution across the network and thus quantifies node heterogeneity for a given network. In statistical terms, variance is the mean squared deviation of node contribution to connectivity. A low variance indicates that each node contributes similarly to network connectivity; a high variance is indicative of a wide distribution in node contribution.
Skewness of the dominant eigenvector measures the net proportion of highly to weakly connected nodes in a network. A skewness of zero implies an even proportion of highly and weakly connected nodes. A positive (negative) skewness indicates a greater (lower) number of highly connected to weakly connected nodes. Taken together, spectral radius, variance and skewness of a network highlight unique topological properties of the desired network (figure 1).
Variance (v) and skewness (s) of x r take the forms: to summarize a network's node connectivity scores, each of our metrics highlights a distinct feature of networks and make hierarchy a non-issue [28].

The algorithm
In this section, we outline an algorithm to generate networks that conform to specific values for all three metrics [28]. Let r*, v* and s* serve as user-defined criteria that indicate the desired spectral radius and variance and skewness of the associated dominant eigenvector. 5 We generate a random adjacency matrix, A 0 , to serve as initial conditions. Denote the spectral radius and variance/skewness of the associated eigenvector of A 0 as r 0 , v 0 and s 0 , respectively. In our algorithm, A 0 and its associated network metrics are updated according to a numerical hill-climb method.
We choose the entries of A 0 such that they minimize the weighted sum of squared differences between the properties of our 'new' adjacency matrix and our user-defined criteria. 6 This results in the following minimization problem: The values of ω k are the weights associated with the preferred degree of accuracy of the estimated entry to the predefined user criteria. The larger the weight, the greater the variable contributes to the sum of squares and the closer the estimated value must be to the desired theoretical value. The values of r 0 , v 0 and s 0 are determined directly from the entries of A 0 and are, therefore, functions of a ij .
The minimization problem in (3.1) is constrained by (subject to) a set of equations defining the structural properties of the desired adjacency matrix. The constraints form an n 2 × n 2 system of equations such that: Collectively, the above equation is referred to as the equality constraints. On the left-hand side, the matrix B refers to the boundary equality conditions matrix. The matrix B is an n 2 × n 2 matrix that captures the characteristics of the matrix A 0 . Each row corresponds to an equation describing the connectivity between patches. Each column corresponds to a variable representing an entry in A 0 . The matrix B is multiplied by an n 2 × 1 vector of variables, one for each entry of A 0 . The solutions are given by an n 2 × 1 vector E, often referred to as the equality conditions. Together (3.2) forms a system of at most n 2 equations and n 2 unknowns that capture the desired structural properties of the adjacency matrix. An illustration of the construction of (3.2) for a three-patch system is found in the electronic supplementary material, A. We present the general method below.
Initially define B and E to be a zero matrix and vector, respectively. For the entries of B corresponding to the diagonal and zero off-diagonals of A 0 , the following must hold: -A patch cannot be connected to itself. For each diagonal entry of A 0 there exists a row in B such that the corresponding entry of a ii in B is equal to one with all other entries in the row equal to zero. (After matrix multiplication in (3.2) this results in a ii = 0). -To specify that two patches are not connected in the adjacency matrix, set the entries of B corresponding to the respective off-diagonal entries of A 0 equal to zero. In other words, if a ij = a ji = 0 in A 0 , then there exist two rows in B such that the corresponding entries for a ij and a ji in B should be one while all other entries in both rows are equal to zero. 5 For some network with a given number of patches (n) and spectral radius (r*), it follows that bounds for the variance and skewness of the associated eigenvector are given by: where w min is the user-defined minimum value of an entry of A 0 . The function g(v) represents the mean of the dominant eigenvector, which exhibits a one-to-one relationship with the variance where g(v) = ((1/n) − v * ) 1/2 [28]. These inequalities represent 'weak' bounds for v* and s*. Proofs for these inequalities are found in electronic supplementary material, A. Examples of available network configurations generated using our method can be found in figure 2; electronic supplementary material For the entries of B corresponding to the non-zero off-diagonals of A 0 , the following must hold: -For each pair of connected patches in A 0 , there exists a row in B such that the corresponding entries for a ij and a ji are equal to 1 and −1, respectively. All other entries in the row are 0. This implies symmetry. (After matrix multiplication we obtain a ij = a ji .) The minimized values of A 0 are fed into a generic hill-climbing numerical algorithm to solve for the optimal solution. The pseudo-code for the method is outlined below: -Define a new adjacency matrix consisting of the minimized values of A 0 as A 0 . Let the values of the network metrics associated with A 0 be given by r 0 , v 0 and s 0 . -Compare the values of each network metric derived from A 0 to r*, v* and s*. One method to do so is to calculate the absolute value of the difference between the desired and derived metrics (e.g. |r * − r 0 |, |v * − v 0 |, |s * − s 0 |). Another is to calculate the per cent difference between the desired and derived metrics. -If all three metric differences fall below a pre-determined tolerance level, the algorithm concludes. -However, if the difference between any of the metrics exceeds a pre-determined tolerance level: (i) Randomize one off-diagonal element of A 0 subject to the assumptions discussed above. (ii) Define the perturbed A 0 as new initial conditions for the minimization problem in (3.2). In other words, update A 0 , r 0 , v 0 and s 0 to be the perturbed adjacency matrix A 0 . Minimize the sum of squared differences.
The resulting output is an n × n adjacency matrix that is symmetric and possesses a spectral radius, variance and skewness of its associated eigenvector of r*, v* and s*. Figure 2 illustrates available network configurations generated using our method. Other configurations are found in electronic supplementary material, A, including network configurations of 50, 100 and 250 patches. While small compared with neural networks or the World Wide Web, these network sizes are suitable for many empirical social and ecological networks [4,[31][32][33].
Convergence of the algorithm is dependent on several factors. First, for a given spectral radius and number of patches, there exists a lower and upper bound on the feasible values of the variance and skewness of the dominant eigenvector. These are derived in electronic supplementary material, A. Second, while minimizing the weighted sum of squared differences guarantees a concave function, the value of the desired properties of the adjacency matrix (and their associated weights) in (3.1) determine the specific shape of the function. The flatter the function to be minimized, the more difficult it is for the algorithm to converge to a solution. Finally, we cannot rule out the possibility of convergence to a local minima, in contrast with a global minima. However, this is a general problem characteristic of optimization theory [34,35], and formal analysis is left for future work.
Our algorithm was coded in Matlab 2016a. Source code is available in the electronic supplemental material and is free for academic use. 7

Discussion
By varying initial conditions one may generate an infinite number of weighted networks with specific structural properties. Defining multiple criteria in the generation process allows one to fine-tune the structure of the resulting network, and explicitly test the effect of network structure on system properties such as stability or resilience.
Further, our method is not limited to spectral radius and eigenvector centrality, but may be applied using any network metric deemed appropriate to describe the connectivity of the system. For example, it is possible to use clustering and other metrics that are employed to analyse social and biological networks (i.e. clustering, degree centrality and betweenness) [1,2,22]. It is feasible to use our algorithm to generate networks with combinations of distances between nodes, node contributions to connectivity and/or clustering.  Nor is our method limited to symmetric, bidirectional dispersal. By changing the system of equations in (3.2), one may allow for unidirectional or asymmetric dispersal between patches. The method can also be extended to multiple species and unweighted networks. For the former, each species has its own adjacency matrix that describes the flow of that species within the network. The latter is intrinsically more difficult. Intuitively one would constrain the entries of the adjacency matrix to zero or one, and (if desired) solve for an adjacency matrix with a particular average, variance and skewness in the number of connections per node. In this case, the minimization problem in (3.1) becomes a nonlinear binary integer problem, which is difficult to solve and is left for future work. 8 Like other network generation methods (table 1), there is the possibility that the networks we generate reflect only a small subset of the entire suite of networks that meet our specified criteria. While we know of no way to explicitly test this, we can combine our method for generating weighted networks with those of unweighted networks to more fully 'paint the picture' of a network's structure. We may, for example, create a weighted network that has a particular node degree distribution. To do so, one would first generate an unweighted network describing the structural connections between nodes (e.g Erdõs-Réyni [17], scale-free [20], small world [22]). This would have a particular node degree distribution. The binary connected/not connected coefficients could then be fed into the equality constraints of (3. create a weighted network with a specific degree distribution that also meets the desired properties of the weights.
Our method has applications across disciplines. We briefly discuss several examples. In epidemiology, disease propagation is often faster in structured population networks than random ones [20]. More specifically, the structure of the network of contacts has implications for the management of human and animal disease spread [8,36]. In ecology, the dispersal of species plays a distinct role in the persistence and survival of species. Examples include mass and rescue effects [37,38], source-sink dynamics [39,40] and spatial insurance [41,42]. At the core of each of these principles is the underlying structure describing the movement of species between spatial locations. More recent work has identified 'keystone patches'patches that contribute strongly to diversity or stability of the entire system [43]. Finally, the management of spatially distributed resources has been a topic in the field of economics [44]. Though much of the literature focuses on fisheries management [9,44,45], applications exist for the spatial management of other renewable resources [46] and invasive species [47].
While we focus the application of our method on spatial networks, it is by no means limited to them. Our method is suitable for generating matrices of species interactions such as competition, parasitism or mutualism. In ecology, one common measure of ecosystem stability is the dominant eigenvalue of the interaction matrix or 'asymptotic resilience' [48]. A negative (positive) value indicates a stable (unstable) system, with the magnitude being a loose indicator of how stable the system is. Recent advances have demonstrated that for random interaction matrices the dominant eigenvalue can be a poor indicator of stability compared to other metrics [49]. It would be useful to explicitly test under which species interaction structures the dominant eigenvalue is a good or poor indicator of stability.
Our method complements existing algorithms for generating networks (table 1) while giving researchers precision in the structure of networks they wish to construct. We hope that our algorithm will aid researchers in testing the effects of explicit network structure on model outcomes.
Data accessibility. We include source code for the algorithm in electronic supplementary material, B. Alternatively, source code and convergence data can be retrieved from the Open Science Framework (https://osf.io/4sd65).