A growth model for water distribution networks with loops

Water distribution networks (WDNs) expand their service areas over time. These growth dynamics are poorly understood. One facet of WDNs is that they have loops in general, and closing loops may be a functionally important process for enhancing their robustness and efficiency. We propose a growth model for WDNs that generates networks with loops and is applicable to networks with multiple water sources. We apply the proposed model to four empirical WDNs to show that it produces networks whose structure is similar to that of the empirical WDNs. The comparison between the empirical and modelled WDNs suggests that the empirical WDNs may realize a reasonable balance between cost, efficiency and robustness in terms of the network structure. We also study the design of pipe diameters based on a biological positive feedback mechanism. Specifically, we apply a model inspired by Physarum polycephalum to find moderate positive correlations between the empirical and modelled pipe diameters. The difference between the empirical and modelled pipe diameters suggests that we may be able to improve the performance of WDNs by following organizing principles of biological flow networks.

Water distribution networks (WDNs) expand their service areas over time. These growth dynamics are poorly understood. One facet of WDNs is that they have loops in general, and closing loops may be a functionally important process for enhancing their robustness and efficiency. We propose a growth model for WDNs that generates networks with loops and is applicable to networks with multiple water sources. We apply the proposed model to four empirical WDNs to show that it produces networks whose structure is similar to that of the empirical WDNs. The comparison between the empirical and modelled WDNs suggests that the empirical WDNs may realize a reasonable balance between cost, efficiency and robustness in terms of the network structure. We also study the design of pipe diameters based on a biological positive feedback mechanism. Specifically, we apply a model inspired by Physarum polycephalum to find moderate positive correlations between the empirical and modelled pipe diameters. inputs apart from the physical location of each node. It is a growth model of networks that produces tree networks that are similar to some empirical infrastructure networks such as gas pipeline networks, sewage networks and railway networks. However, an important limitation of this model is to the application to WDNs because, in contrast to sewer water networks, empirical WDNs usually have a not negligible amount of loops. Loops in flow networks have been suggested to realize optimal transport efficiency under fluctuations in flow [17][18][19] and some tolerance to damages [19,20]. Overall, there is not a generative model of WDNs that produces networks similar in structure to empirical WDNs without relying on detailed geospatial data of the empirical urban setting.
To address this gap, the research objectives of the present study are twofold. First, we propose a growth model for WDNs that produces networks quantitatively similar to empirical WDNs and do not require detailed input apart from the physical location of each node. The proposed model is classified as a greedy model, in which one adds edges one by one based on a local optimization principle. Our model generates networks with loops and is applicable to networks with multiple root nodes. Note that many empirical WDNs have multiple root nodes (e.g. Modena [21], Colorado [22], Pescara [21] and Balerma [23]). Second, we aim to understand the genesis of the distribution of pipe diameters in the empirical WDNs. The pipe diameter generally varies across pipes in a WDN. It is not well known whether the distribution of pipe diameters in empirical WDNs attains approximate optimality or if there is a large room for improving the efficiency by engineering the pipe diameters under the given constraints. Specifically, we apply a model that is inspired by an amoeba-like organism, Physarum polycephalum, with which one gradually adjusts the conductance of each edge [24], to our empirical and synthetic WDNs. In [25], the authors applied a similar Physarum polycephalum model to relatively small benchmark WDNs and compared the wiring cost between the generated and empirical WDNs. Physarum polycephalum has shown various intelligent behaviour such as finding the shortest paths [26], adapting to changing environments [27] and building high-quality networks that realize a feasible balance between cost, efficiency and robustness [24]. Previous studies suggested that Physarum polycephalum may inform the design of next-generation, adaptive and robust spatial infrastructure networks with decentralized control systems [28][29][30].

Structural properties of WDNs (a) Data
We analyse four WDNs constructed from the empirical datasets (i.e. ZJ, Colorado, Modena and Tampa). These networks are WDNs in Zhijiang in China, Colorado Springs in the USA, Modena in Italy and Tampa in the USA, respectively. The datasets include the IDs and the physical location of junctions, reservoirs, tanks and pipes, and the diameter of pipes. We model each WDN as a graph G = G(V, E), in which V is the set of N nodes (i.e. junctions, reservoirs and tanks) and E is the set of M edges (i.e. pipes). Each WDN has one or more reservoirs and/or tanks, which act as source of water and is referred to as root node. We denote the set of the root nodes by V R . We assume that all the nodes except for the root nodes are demand nodes and denote the set of the demand nodes by V D .

(b) Indices to be measured
In addition to the number of nodes and edges, we measure six indices, two of which are the average degree, denoted by k , and the maximum degree, denoted by k max . The other four indices are meshedness coefficient, cost, route factor and robustness.
First, the meshedness coefficient, denoted by m, quantifies the amount of loops for planar graphs [31] and is defined by where N and M are the number of nodes and edges, respectively. It ranges between zero (for tree graphs) and one (for maximal planar graphs). WDNs are generally near-planar graphs [32]. In fact, among the aforementioned four WDNs, the ZJ, Colorado and Modena networks are planar graphs. Second, we define the cost c as the total Euclidean length of all edges, which is the same definition as that in [16]. The reason for using this definition is that our growth model can be regarded as an extension of a model proposed in [16], which was shown to produce tree-like networks similar to real-world tree-like networks such as sewage networks and gas pipeline networks in terms of the network structure. This definition of the cost is common in studies on spatial networks [3].
Third, we define the route factor as a measurement for the network efficiency. We use the definition in [16], which is also commonly applied to spatial networks [3]. For spatial networks in which flows are transported from the root nodes to the demand nodes, a network should be more efficient if the paths from the root nodes to each demand node are shorter. For a network with the single root node, denoted by 0, we define the route factor q by where d i0 is the Euclidean distance between the root node and demand node i, and l i0 is the shortest Euclidean distance of the path among the paths between the root node and demand node i [16]. The route factor is always greater than or equal to one, and a value close to one implies a high efficiency. The route factor for networks with multiple root nodes is defined as the average of the route factor over the root nodes. Fourth, for WDNs, it is necessary that each node is connected to a root node for the node to receive water supply. Therefore, we define the robustness R by where s(e) is the fraction of demand nodes that are connected to any root node after one removes e edges. A simulation of sequential edge removal starts from the given WDN. At every step of the edge removal process, we remove an edge selected uniformly at random. We calculate R as the average over 100 simulations. This index is an adjustment of the one proposed in [33] to the case of failures of edges and the connectivity to root nodes.

(c) Results
We show the values of the six indices for the four empirical WDNs in table 1. We also visualize these networks in figure 1a. In figure 1, the red and blue nodes represent the root nodes and the demand nodes, respectively. The ZJ and Tampa networks have one root node. The Colorado and Modena networks have four root nodes. Each WDN has just one connected component. All the four WDNs have similar average degrees, k ≈ 2.5. The maximum degree is 4 for ZJ, Colorado and Tampa, and is 5 for Modena. These results are reasonable because the spatial constraints of networks strongly restrict the degree of each node in general [2,3]. The meshedness coefficient, m, ranges from 0.0580 to 0.229, which is consistent with the previous observation that m is generally less than 0.5 for WDNs [32,39]. The cost c ranges between 6.31 for ZJ and 590 for Tampa, which depends on the size of the service areas. The four WDNs have similar values of the route factor, q ≈ 1.3. Finally, Modena has the highest robustness, followed by ZJ, Colorado and Tampa. This last result may be related to the number of root nodes per demand node, which is also the highest for Modena, followed by ZJ, Colorado and  Table 1. Structural properties of the empirical WDNs, the synthetic networks generated by our growth model, and the synthetic networks generated by the Waxman model. N: number of nodes, |V R |: number of root nodes, M: number of edges, J: Jaccard index for the sets of edges between the empirical and synthetic networks, N c : number of connected components, k : average degree, k max : maximum degree, m: meshedness coefficient, c: cost (i.e. total edge length) (km), q: route factor and R: robustness.      we add an edge between nodes i and j, which we select as follows. We impose that node i is in a connected component including a root node and that node j is in a connected component that is different from the one including the node i. Under this condition, we select i and j such that the Euclidean distance between i and j, denoted by d ij , is the smallest among all possible node pairs. (b) Loop closure: Alternatively, i.e. with probability P loop , we add an edge to close a loop.

ZJ
We impose that node i has degree one and that node j is in the same connected component as the one that contains node i. Under this condition, we select i and j where ij is the Euclidean distance of the path that is the shortest among the paths between nodes i and j. Then, we add an edge between i and j. If the number of nodes in the largest connected component is less than three, we carry out step (a) with probability 1 because we cannot carry out the loop closure without creating a multiple edge (i.e. more than one edge directly connecting two nodes). If there is no node with degree one, we also carry out step (a) with probability 1.
(iii) Repeat step (ii) until all the nodes are connected as one component.
Note that the reason why we set P loop = (M − N + 1)/M is that the expected number of edges should be M when we complete steps (i)-(iii). In other words, in order for the N nodes to be connected for the first time when we add M edges, we need to perform step (ii-a) exactly N − 1 times. In expectation, this condition is equivalent to Closing loops has been suggested to help realizing optimal transport efficiency under fluctuations in flow distributions [17][18][19] and improving robustness [19,20]. In our model, parameter γ controls how loops are closed. On the right-hand side of equation (3.1), the quantity d ij represents the cost of the new edge. The quantity γ ij represents the gain of closing a path of Euclidean length ij to make a loop. In other words, when we connect two nodes with large ij , the Euclidean length of the shortest loop closed by the new edge is large. Parameter γ controls the trade-off between the Euclidean length of the new edge and the Euclidean length of the shortest path closed by the new edge. With γ = 0, we always add the shortest edge in terms of the Euclidean distance. As we will show later, the generated networks with γ = 0 tend to have loops with a small Euclidean length and easily fall apart into disjoint components if one sequentially removes edges. With a large γ , generated networks are expected to be more robust against edge removal because they tend to have loops with a large Euclidean length, which generally offer multiple paths for many pairs of nodes.

(b) Examples of generated networks
We visualize the networks that our growth model generates with γ = 0.35 for ZJ and Tampa, γ = 0.4 for Colorado and γ = 0.5 for Modena in figure 1b. The synthetic networks are apparently similar to the empirical WDNs (figure 1a). For comparison, we also visualize the synthetic networks generated by the Waxman model, which is a spatial variant of random graphs (see appendix A for details of the Waxman model), in figure 1c. The Waxman model produces networks that are apparently dissimilar to the empirical WDNs.
We compare structural properties among the empirical WDNs, our model and the Waxman model in table 1. The synthetic networks generated by our model are roughly similar to the empirical WDNs in terms of the indices measured. The synthetic networks generated by the Waxman model are not connected as one component, and therefore we do not compute the route factor and the robustness for them. Apart from the route factor and robustness, the Waxman model is roughly as similar to the empirical WDNs as our model is in terms of the average degree and meshedness coefficient. However, the Waxman model is substantially more dissimilar to the empirical WDNs than our model is in terms of the Jaccard index, maximum degree and cost.
We show a simulated time course of network growth with our model in the case of Colorado in figure 2. We observe that the connected component associated with each root node expands its service area, and such connected components merge at later times.
(c) Trade-offs between the cost, route factor and robustness Our model has parameter γ , which controls the trade-off between the wiring cost and the length of the added loops. We visualize sample synthetic networks generated by our model with γ = 0,  With γ = 0, we sequentially add the edges that yield the smallest Euclidean distance in each step. As shown in figure 3a, the resulting network has loops with small Euclidean lengths. For a larger γ , there are more long edges that close loops. We computed the cost, route factor and robustness for these three networks. The cost increases as γ increases, i.e. it is equal to 367, 720 and 2248 when γ = 0, γ = 0.5 and γ = 1, respectively. The route factor decreases as γ increases, i.e. it is equal to 2.20, 1.24 and 1.23 when γ = 0, γ = 0.5 and γ = 1, respectively. The robustness increases as γ increases, i.e. it is equal to 0.0184, 0.217 and 0.280 when γ = 0, γ = 0.5 and γ = 1, respectively. This preliminary result indicates that there are trade-offs between the cost, route factor and robustness as we vary γ . In this section, we study these trade-offs. We start by comparing the cost and route factor for the empirical and synthetic networks. For the synthetic networks, we use γ ∈ {0, 0.05, . . . , 1} and generate three networks for each value of γ . We show the results in figure 4. First, the cost tends to increase as γ increases. This result is consistent with our observation with figure 3. Second, the route factor is the largest when γ = 0, and it decreases sharply as γ increases. The route factor is largely independent of γ when γ is approximately larger than 0.5. Third, our growth model generates networks similar to the   empirical WDNs in terms of the cost and route factor when 0.35 ≤ γ ≤ 0.5. The synthetic networks when 0.35 ≤ γ ≤ 0.5 realize route factor values that are close to the minimum (i.e. less than ≈ 115% of the smallest possible value when one varies γ ) and a cost that is less than ≈ 170% of the minimum for each empirical WDN.
We show in figure 5 the trade-offs between the cost and robustness for synthetic networks across different values of γ . Again, for the synthetic networks, we use γ ∈ {0, 0.05, . . . , 1} and generate three networks for each value of γ . The robustness increases as a function of γ , and it does so considerably more when γ is small than when γ is large. Similar to the case of the route factor, our growth model generates networks similar to the empirical WDNs in terms of the cost and robustness when 0.35 ≤ γ ≤ 0.5 except for the case of ZJ. The synthetic networks with 0.35 ≤ γ ≤ 0.5 realize robustness values that are more than ≈ 60% of the maximum and a cost that is less than ≈ 170% of the minimum for each of the empirical WDNs. The robustness of ZJ is smaller than those of the synthetic networks with a similar cost. A possible reason for this result is that the ZJ network apparently has a strong community structure (figure 1a), which our model does not intend to reproduce.   Figure 6a indicates that large-diameter pipes tend to be physically close to a root node. However, this is not always the case. For example, there are large-diameter pipes that are located far from the root node in the upper part of the Tampa network. The diameter of pipes influences the installation cost and the performance of WDNs such as the flow velocity and the head loss [40]. Abrupt changes in the pipe diameter may cause rapid variation in head loss, which may lead to physical damages and failures in pipes [41], and the gradual change of pipe diameters along flow paths has been shown to increase both energy and path redundancies in WDNs [42]. Therefore, the distribution of pipe diameters is an indispensable component for assessing the performance and design of WDNs. Likewise, edge conductance per unit length (i.e. the reciprocal of the edge's resistance per unit length), which we call conductance for short in the following text, is an important determinant for various natural flow networks such as leaf veins of plants [43], vascular systems of animals [44] and river networks [45]. These networks continuously adapt to the environment and are survivors of evolution. Therefore, they may attain optimal structure according to an evolutionary criterion. It is of interest to design transportation networks inspired by such natural networks. Models inspired by an amoeba-like organism, Physarum polycephalum, have been used for designing optimal transportation networks [24,27,[46][47][48]. These models operationalize positive feedback between the conductance of edges and the amount of flow passing through them. This feedback mechanism is informed by the physiology of Physarum polycephalum, with which plasmodial tubes thicken if the protoplasmic flow through them increases [49]. In this section, we compare the pipe diameters for the empirical WDNs and the ones determined by a model [24], which we refer to as the Physarum model. We use this specific model because the networks generated by this model have been suggested to be optimal in terms of the balance between the cost, efficiency and robustness [24].

Pipe diameter (a) Background and the Physarum model
Here we briefly describe the Physarum model proposed in [24]. Under the assumption that the flow is laminar and follows the Hagen-Poiseuille equation, the flux through pipe (i, j), denoted by Q ij , is given by where the flow is measured in the direction from the ith to jth nodes; R ij is the diameter of pipe (i, j); η is the viscosity of the fluid; D ij = π R 4 ij /128η is the conductance of pipe (i, j); ij is the physical length of pipe (i, j); and p i is the pressure at node i. We consider the conservation law of flux given by where I 0 (with I 0 > 0) is the total flux outgoing from the root nodes. We remind that V R and V D are the set of the root nodes and that of the demand nodes, respectively, such that V = V R ∪ V D . We assume that the viscosity of the fluid η is constant, which implies that the diameter R ij evolves in proportion to (D ij ) 1/4 as D ij evolves. The first term on the right-hand side of equation (4.3) describes the increase in D ij in response to the flux, which is a positive feedback effect. In the absence of flow, D ij exponentially decays over time because of the second term. We use where μ(> 0) is a parameter that specifies the nonlinearity of the positive feedback.

(b) Numerical results
We apply the Physarum model to the three empirical WDNs (i.e. Colorado, Modena and Tampa) and the corresponding synthetic networks generated by our growth model. The synthetic networks are those shown in figure 1b.
We visualize the networks with the pipe diameters, R ij , being obtained by the Physarum model with μ = 1 and I 0 = 10 for the empirical and synthetic networks in figure 6b and 6c, respectively. In these figures, there are large-diameter pipes physically close to the root nodes, which is similar to the empirical WDNs shown in figure 6a. In addition, the Physarum model generates largediameter pipes in the upper part of the Tampa network ( figure 6b and 6c), which is also consistent with the empirical Tampa network (figure 6a). However, there are also dissimilarities, especially in the case of the synthetic networks shown in figure 6c. For example, the distribution of pipe diameters in Colorado in figure 6c is not close to that of the WDN shown in figure 6a.
To quantify the similarity between the empirical and synthetic networks in terms of the distribution of the pipe diameters, we measure the Pearson correlation coefficient between the pipe diameter obtained by the Physarum model and the empirical counterpart, where we regard each pipe as a sample. In the case of synthetic networks, we calculate the Pearson correlation coefficient only based on the edges co-present in the empirical and synthetic networks. We show the Pearson correlation coefficient, denoted by r, for different values of the two parameters of the Physarum model, μ and I 0 , in figure 7. According to a standard, we interpret the Pearson correlation coefficient to be strong, moderate or weak when |r| > 0.7, |r| > 0.4 or |r| > 0.1, respectively [50]. Figure 7 suggests that, given the structure of the empirical network, the correlation between the empirical data and the synthetic data obtained from the Physarum model is moderately positive, albeit not strongly positive in a broad parameter region for all the three WDNs. By contrast, the pipe diameter for the synthetic networks is only weakly correlated with the empirical data.

Conclusion
We proposed a growth model for WDNs that does not require detailed inputs, apart from the physical location of each node. We have found that our model produces networks whose structure is similar to that of the empirical WDNs when 0.35 ≤ γ ≤ 0.5. These networks realize route factor values that are less than ≈ 115% of the smallest possible value when one varies γ , robustness values that are more than ≈ 60% of the largest possible value and cost values that are less than ≈ 170% of the smallest possible value for each of the four empirical WDNs that we have used. When γ ≤ 0.35, the route factor tends to decrease drastically as one increases γ . On the other hand, when γ ≥ 0.5, the gain in the robustness when one increases γ is small. Therefore, the empirical WDNs may realize a reasonable balance between the cost, efficiency and robustness. These results imply that our model may inform growth mechanisms of real-world WDNs and design of WDNs. Because the range of the γ values for which the synthetic network is most similar to the empirical WDN is close for the four empirical WDNs (i.e. 0.35 ≤ γ ≤ 0.5), the proposed growth model with this range of γ may be able to explain growth dynamics of various other real-world WDNs. The model may also be applicable to other spatial flow networks such as gas pipeline networks. Nonetheless, our model and its evaluation only depended on the network structure. Future studies along this line should examine the hydraulic properties of WDNs for our model to be informative in practical contexts. Some studies showed that increasing degree-degree correlations is effective at improving the network robustness [33,51,52]. There is also a strong relation between robustness and loops [19,20,[53][54][55]. In the present study, we have shown that the robustness is highly dependent on the value of γ for γ ≤ 0.5, which suggests that the distribution of loops may be a key determinant of the robustness. In [56], the authors proposed a method for analysing the distribution of loops for planar graphs and found that leaf venation networks have a hierarchical structure, with which large loops contain small loops. Such a hierarchical structure of loops may allow leaves to maintain the supply of water and nutrients even when flow through some edges in the network is lost due to damages [57][58][59]. In addition, loop nestedness increases flow path redundancies in WDNs [42]. Our results also show that such a hierarchical structure of loops may exist for some values of γ ( figure 3). However, analysing the distribution of loops for non-planar graphs is still an open question [60,61]. Further studies are desirable for investigating the correlation between the network-based robustness and hydraulic performance of WDNs, and clarifying relationships between the robustness and the distribution of loops in non-planar graphs including many of WDNs.
We also studied the design of pipe diameters based on a biological positive feedback mechanism. We found moderate positive correlations between the empirical and modelled pipe diameter across the pipes for all the three WDNs. These results suggest that the distribution of the empirical pipe diameters may be closer to optimal than to a uniformly random distribution under the assumption that the Physarum model approximately optimizes the conductance of edges in terms of some efficiency or robustness criteria. In fact, a similar Physarum model realizes distributions of pipe diameters that realize a small overall wiring cost in WDNs [25]. At the same time, the Physarum model or similar models lack hydraulic considerations, and it is underexplored whether optimality in the sense of evolution of amoeba-like organisms has anything to do with hydraulic performances. For example, the Physarum model does not take into account the elevation change over the pipe's length. When such elevation changes are nonnegligible, it may be necessary to adjust the model in this respect for practical applications. In addition, we assumed for simplicity that all the root nodes have the same flux value and that all the demand nodes have the same flux value (see equation (4.2)). Obviously, it is of pragmatic interest to adjust the model to take into account heterogeneity in these quantities. Further studies should also compare the hydraulic performance between the empirical WDNs and the networks whose pipe diameters are obtained by these biologically inspired models. Such studies will provide the insight on the fundamental contribution of network structure (i.e. edge configuration and pipe diameter arrangement) to the performance of WDNs. Moreover, additional natureinspired mechanisms such as the growth of underlying tissues in leaf venation networks [62] and other optimization techniques, such as simulated annealing [19] and combination between biological principles and engineering control [63], may also be useful for realizing desirable distributions of pipe diameters. Acknowledgements. The authors thank Brian Pickard and Seung Park from the city of Tampa Water Department for providing the water network data.

Appendix A. Waxman model
The Waxman model is a spatial variant of the Erdős-Rényi model [64]. In the original Waxman model, the nodes are uniformly distributed in the two-dimensional space. By contrast, we use the two-dimensional coordinate of each node informed by the empirical data. One adds each edge between each pair of nodes, i and j, with probability where d 0 is the typical length of edge and β controls the density of edges. One determines whether or not to lay an edge between each pair of the ith and jth nodes independently for different node pairs. We set d 0 to the average Euclidean length of edges in the given empirical WDN.