Principled network extraction from images

Images of natural systems may represent patterns of network-like structure, which could reveal important information about the topological properties of the underlying subject. However, the image itself does not automatically provide a formal definition of a network in terms of sets of nodes and edges. Instead, this information must be suitably extracted from the raw image data. Motivated by this, we present a principled model to extract network topologies from images that is scalable and efficient. We map this goal into solving a routing optimization problem where the solution is a network that minimizes an energy function which can be interpreted in terms of an operational and infrastructural cost. Our method relies on recent results from optimal transport theory and is a principled alternative to standard image-processing techniques that are based on heuristics. We test our model on real images of the retinal vascular system, slime mold and river networks and compare with routines combining image-processing techniques. Results are evaluated in terms of a similarity measure related to the amount of information preserved in the extraction. We find that our model extracts networks from retinal vasculature images that are more similar to hand-labeled ones, while also performing well on images of rivers and slime mold for which no ground truth is available. While no single method fits all images best, our approach performs consistently across datasets, its algorithmic implementation is efficient and it can be fully automated to run on several datasets with little supervision.


I. Introduction
Extracting network topologies from images is a relevant problem in applications where the subject of the image has a network-like structure. Examples include satellite images of rivers [1], neuronal networks [2,3], blood or vein networks [4,5], mitochondrial networks [6,7] and road networks [8]. Assuming this could be done automatically and quantitatively, practitioners would then be able to apply the mathematical study of networks to make quantitative analyses of the topological properties of the system under study. In practice, given a raw image, for instance a satellite image of a river embedded in a landscape, extracting a network requires identifying a set of nodes and a set of edges connecting them. While it might be relatively easy to perform this identification qualitatively, the challenge is performing the extraction automatically, thus avoiding tedious manual extraction or specific domain knowledge and ad-hoc tools. At the same time, this task should scale with system size and with the number of images, as high-quality images of increasingly large systems become more widely available. In addition, a qualitative intuition of the possible existence of a network behind an image is not enough to ensure that no degree of subjectivity is introduced by the observer's eye. For instance, two different observers might both perceive the presence of a network-like structure but distinguish two different sets of nodes, and thus two different networks, behind the same image. Another challenge is therefore that of performing this extraction in a principled way, so that the number of arbitrary choices in defining the network is limited, if not absent.
Here we present a method that addresses these issues by considering the framework of optimal transport. Specifically, inspired by a recently developed model to extract network topologies from solutions of routing optimization problems [9], we adapt this formalism to our specific and different setting. We start from a raw image as input and propose a model that outputs a network representing the topological structure contained in the image. The novelty of this method is that its theoretical underpinning relies on a principled optimization framework. In fact, a proper energy function is efficiently minimized using numerical methods, which results in an output network topology. This implies in particular that network extraction need not depend on the observer's eye, but can be done automatically by solving this optimization problem. We study our model on real images from different fields, focusing in particular on ecology and biology, and compare results with an algorithm that relies on standard image-processing techniques, highlighting the main differences resulting from these two approaches. In particular, our model performs two tasks automatically and in a principled way: filtering network redundancies and selecting edge weights. These are usually challenging tasks for image-processing schemes, as they rely on pre-defined input parameters, whereas our model yields both directly as output.
Many solutions for the problem of automatic network extraction from images have been proposed in computer vision, mainly relying on image-processing techniques [10][11][12][13], for instance segmentation [14,15], or junction-point processes [16]. The idea is to measure variation of intensity contrast in the image's pixels to highlight curve-like structures. Within this context, NEFI [17] is a flexible toolbox based on a combination of standard image-processing routines. A different approach, closer to the one considered in this work, is that of adopting some sort of optimization framework. For instance, [18] considered an optimization problem where the goal is to minimize the total roughness of a path (a measure depending on the difference of weights in adjacent edges), in order to decompose a filamentous network into individual filaments. However, these usually rely on domain-specific optimization setups that cannot be easily transferred across domains. Another example is the ant-colony optimization scheme used to extract blood vessels from images of retinas [19]. They all suffer from the NP-hardness of the problem, typical of routing optimization settings; thus, this type of approach relies on approximation techniques. Finally, another approach is that of biologically inspired mathematical models like the one of Tero et al. [20,21], which consider dynamical systems of equations that emulate network adaptability, like that observed for the Physarum polycephalum slime mold. Our model is also inspired by the feedback mechanism of a slime mold, which adapts the conductivity of the network edges in response to differences in fluxes.
Our method relies on the formalism of optimal transport theory used to extract networks from solutions of routing optimization problems proposed in [9] and referred to as NextRout. NextRout consists of three subsequent steps, but here we need only the last two, namely the pre-extraction and the filtering steps. While we refer to that work for all the mathematical details, here we describe the main principles behind this method and adapt it to images. The idea is inspired by the behavior of the Physarum polycephalum slime mold. The body of this organism forms a network structure that flexibly adapts to the surrounding environment and the distribution of food sources in it. This network grows with a feedback mechanism between two physical quantities, the conductivities of network edges and the flow passing through them, through a dynamics described by a set of equations (sometimes referred to as "adaptation equations"). In practice, the problem starts by assigning food sources in space and spreading the slime mold uniformly to cover the whole space. The dynamics regulates how the slime mold changes its body shape in time to reach the food in an efficient way. The stationary solution of this dynamical system is a set of conductivities and flows on edges that describe the optimal network topology covered by the mold. When the underlying space is continuous, like a square patch in 2D, these solutions are functions defined on (x, y) coordinates of this space. They are not immediately associated with a network meant as a set of nodes and a set of edges connecting them. However, the authors of [9] propose principled rules to automatically extract network topologies from these solutions in continuous space. While the main focus of that work was to extract network topologies from this particular type of input (functions defined in a continuous domain, e.g. the space where food sources are located), they hinted at the possibility of adapting this formalism to discrete spaces, like images made of pixels. Here we expand on this insight and adapt this principled network extraction to inputs that are images. In particular, we propose an algorithm to effectively tackle two problems that are relevant for images and that were only briefly discussed for general applications in [9]: how to select sources and sinks, and how to obtain loopy structures.
FIG. 1: Analogy between optimizing trajectories and images. Left: the grid structure covering a continuous 2D space and the optimal flux obtained by NextRout for a specific routing problem where sources and sinks are inside the green and red rectangles respectively. Red tones of increasing darkness denote higher fluxes on the corresponding grid triangles. Right: the pixel grid and colors of a reference network-like image.

II. Image2net: the method

The key idea is to treat the image as a particular discretization of a 2D space by means of its pixels, and to treat the RGB color values on them as conductivities. With this setup, we can frame the problem as if there were an imaginary flow of colors that behaves analogously to the slime mold: it starts by covering the whole discrete domain of the image uniformly and then flows through the pixels until it consolidates on a certain subset of them. The observed image corresponds to the network-like shape that the mold converges to in order to optimize the paths to reach the food sources. Figure 1 illustrates the analogy between the solutions of NextRout in continuous space and an image of a network-like structure in discrete space.
As we mentioned before, this is not yet formally a network, as we do not have a rigorous definition of what constitutes a node and how nodes are connected. However, thanks to the analogy proposed here, we can use the rules introduced in [9] for the continuous case and adapt them to images. Specifically, we consider the pixels' barycenters as nodes and draw edges between them depending on the pixels' locations and values, so that two pixels are connected whenever they are neighbors and their color intensity is high enough. The result is a pre-extracted network that we denote with G_pe. We denote with V and E the sets of nodes and edges respectively. The network is mathematically encoded by a signed incidence matrix with entries B_ie = ±1 if edge e has node i as its start/end point, and 0 otherwise. The sign is important to define the orientation of the flow passing through an edge.
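As an illustration, the pre-extraction step can be sketched as follows. This is a minimal sketch, not the implementation of [9]: the function name `pre_extract`, the 8-neighborhood connectivity and the threshold parameter `delta` are illustrative assumptions, and the actual rules may differ in detail.

```python
import numpy as np
import networkx as nx

def pre_extract(intensity, delta=0.5):
    """Sketch of the pre-extraction of G_pe from a grayscale image.

    Nodes are the barycenters of pixels whose normalized intensity
    exceeds delta; edges connect retained pixels that are grid
    neighbors (8-neighborhood, an illustrative choice).
    """
    n_rows, n_cols = intensity.shape
    keep = intensity > delta
    G = nx.Graph()
    for r in range(n_rows):
        for c in range(n_cols):
            if not keep[r, c]:
                continue
            # pixel barycenter in image coordinates (y axis pointing up)
            G.add_node((r, c), pos=(c + 0.5, n_rows - r - 0.5))
            # connect to already-visited retained neighbors only,
            # so each neighboring pair is considered exactly once
            for dr, dc in [(-1, -1), (-1, 0), (-1, 1), (0, -1)]:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols and keep[rr, cc]:
                    G.add_edge((r, c), (rr, cc),
                               length=float(np.hypot(dr, dc)))
    return G
```

A thin vertical filament of bright pixels, for instance, yields a path graph whose nodes are the filament's pixel barycenters.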
This temporary network might contain redundancies such as dangling nodes or redundant edges; see Fig. 2 for an example. Standard image-processing techniques address this problem with pruning routines, e.g. by pruning away edges or branches shorter than a certain length. However, pruning has to be handled with great care, as small redundancies could be a major source of information or could be completely irrelevant, depending on the network at hand. Usually, pruning is tuned by the user, thus introducing potential subjective bias into the extraction. Instead, our model relies on a principled method for filtering such redundancies, which exploits a dynamics similar to that of the original problem but adapted to a discrete space like that of the network G_pe. However, to apply the filter, one must specify a set of terminals, sources and sinks, as input to the discrete dynamics. Continuing with our analogy, we need to locate the pixels where we imagine that color mass is being injected and extracted. These are the sources and sinks that drive the dynamics to consolidate the flux of colors on the network-like structure observed in the input image.
FIG. 2: G_pe extracted from a river image. The subplot in the bottom left corner shows a section of G_pe (in black), highlighted in red in the main plot, together with the filtered graph G_f (in blue).

A. Dynamics
Assume for a moment that we know this set of terminal pixels, and denote with f_i the amount of color mass that enters or exits the image at node i (the barycenter of pixel i). Note that to preserve mass we must have Σ_i f_i = 0. Here we describe in more detail the dynamical rules that regulate how colors spread along the pixels in an optimal way. To describe the flow of colors, we consider the conductivity μ_e on an edge e ∈ E and the potential u_i on a node i ∈ V. The conductivities can be interpreted as the sizes of the edges' diameters, the potentials as pressures on nodes. Together, these two quantities determine the flow F_e of color passing through an edge e = (v_1, v_2) in the network:

    F_e(t) = μ_e(t) [u_{v_1}(t) − u_{v_2}(t)] / ℓ_e .    (1)

In turn, the flow influences the conductivities and potentials through a feedback mechanism described by the following set of equations:

    Σ_{e ∈ E} B_{ie} F_e(t) = f_i ,    ∀ i ∈ V    (2)
    dμ_e(t)/dt = |F_e(t)|^β − μ_e(t) ,    ∀ e ∈ E    (3)
    μ_e(0) = μ_e^0 > 0 ,    ∀ e ∈ E    (4)

where |·| is the absolute value, ℓ_e is the Euclidean length of an edge computed from the barycenters' coordinates, and β is a parameter that determines the optimization mechanism. Equation (2) is Kirchhoff's law; Eq. (3) is the discrete dynamics describing the feedback mechanism between conductivity and flow: when the flow of colors is high on an edge e, the conductivity increases, and vice versa, when the flow is low the conductivity decreases; Eq. (4) is the initial condition. The stationary solution of this dynamical system can be mapped to the solution of an optimization problem whose cost function can be interpreted as a network transportation cost:

    L(μ(t)) = (1/2) Σ_{e ∈ E} [F_e(t)^2 / μ_e(t)] ℓ_e + [β / (2(2 − β))] Σ_{e ∈ E} μ_e(t)^{(2−β)/β} ℓ_e ,    (5)

where μ(t) = {μ_e(t)}_e; the first term is the network operational cost and the second is the cost to build the network. The values of μ_e at convergence can be used not only to determine the set of edges in the extracted network, but also its weights, which can be interpreted as the diameters of the edges in the image. This is one of the advantages of our model, as estimating the diameter of edges extracted from an image is an open problem when using image-processing techniques.
We obtain this automatically from the optimal conductivities. The dynamics works as a filter, i.e. it removes redundancies, for β ≥ 1. In this work we fix β = 1.5, as it gives good performance consistently across the datasets studied here. The output is a tree, i.e. it does not contain loops, and it is optimal in terms of minimizing the transportation cost of Eq. (5). In our experiments, we use the numerical solver proposed in [9] to extract the stationary solutions of the system of Eqs. (2)-(4).
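A minimal numerical sketch of this feedback dynamics may look as follows. This is an illustration with explicit Euler steps, not the solver of [9] used in our experiments; the function name, time step and iteration count are illustrative assumptions.

```python
import numpy as np

def run_dynamics(n_nodes, edges, lengths, f, beta=1.5, dt=0.1, n_steps=3000):
    """Euler sketch of the conductivity-flux feedback dynamics.

    edges: list of (i, j) node pairs; lengths: Euclidean edge lengths;
    f: net color mass injected (+) or extracted (-) at each node,
    with sum(f) == 0 so that mass is conserved.
    """
    m = len(edges)
    B = np.zeros((n_nodes, m))                 # signed incidence matrix
    for e, (i, j) in enumerate(edges):
        B[i, e], B[j, e] = 1.0, -1.0
    lengths = np.asarray(lengths, dtype=float)
    mu = np.ones(m)                            # uniform initial conductivities
    F = np.zeros(m)
    for _ in range(n_steps):
        L = B @ np.diag(mu / lengths) @ B.T    # weighted graph Laplacian
        # Kirchhoff's law: potentials are defined up to a constant,
        # so solve the singular system in the least-squares sense
        u = np.linalg.lstsq(L, f, rcond=None)[0]
        F = (mu / lengths) * (B.T @ u)         # flux on each edge
        mu = mu + dt * (np.abs(F) ** beta - mu)  # conductivity adaptation
    return mu, F
```

On a toy graph with a short and a long path between one source and one sink, the conductivities consolidate on the short path, mirroring how the slime mold selects efficient routes; with β > 1 the long path is filtered away.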

B. Selecting terminal pixels
Having introduced how the dynamics of the colors works, we now tackle the problem of selecting the terminal pixels where to inject or extract imaginary color mass. This choice is crucial, as it determines the final extracted network, in the same way that the location of food sources determines the particular shape of the network that a slime mold forms to optimally reach them in a 2D space. In the original problem of [9], this was not an issue because the set of terminals could be selected from that of the original problem in continuous space; in other words, it was an input of the problem. Here we do not start from that input, but rather have access only to a raw image, without any notion of prespecified terminals attached to it. In practice, we need to find the pixel nodes corresponding to the rectangles in Fig. 1 (left). Here we propose a method to make this selection effectively. Specifically, we first run our dynamics on G_pe passing in input all of its pixel nodes as terminals (one of them, chosen at random, as the source, all the rest as sinks); this outputs a tree G_pe^tree, and we select as the set of eligible terminals T(G_pe^tree) all the leaves of this tree. This choice is motivated by the fact that the tree resulting from the filtering is a good approximation of the pre-extracted G_pe, as it follows a principled optimization framework. Its leaves determine the coverage of the network, as they are usually located in distant parts of it. Potentially, one could select terminal pixels "manually", using domain knowledge to determine which pixels are the most important. However, this strategy is not scalable to a large number of images. Our proposed procedure does not suffer from this problem, as it can be automatically implemented, while remaining flexible enough to receive "hand-picked" terminals if available.
Alternatively, a practitioner could make this selection based on some notion of network centrality, for instance selecting the most "central" nodes as terminals. However, this again assumes having extra information to decide which definition of centrality is appropriate for the application. We do not explore this here.
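The leaf-based terminal selection described above can be sketched in a few lines. This is an illustrative sketch: the function names, the convention of unit mass at the source split evenly across the sinks, and the seeded random choice are assumptions, not the exact conventions of the paper.

```python
import random
import networkx as nx

def select_terminals(tree, seed=0):
    """Eligible terminals are the leaves (degree-1 nodes) of the
    filtered tree; one random leaf is the source, the rest are sinks."""
    leaves = sorted(v for v in tree.nodes if tree.degree(v) == 1)
    rng = random.Random(seed)
    source = rng.choice(leaves)
    sinks = [v for v in leaves if v != source]
    return source, sinks

def forcing(nodes, source, sinks):
    """Balanced forcing vector f: unit mass injected at the source,
    extracted evenly at the sinks (one of several possible choices),
    so that the total mass sums to zero."""
    f = {v: 0.0 for v in nodes}
    f[source] = 1.0
    for v in sinks:
        f[v] = -1.0 / len(sinks)
    return f
```

For a path graph, for instance, the two endpoints are the leaves, so one endpoint becomes the source and the other the sink.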

C. Obtaining loops
Running the dynamics of Sec. II A outputs trees, while network-like structures in images might have loops. The question is thus how to recover networks that are not limited to trees. We tackle this problem by re-running the dynamics multiple times, each time selecting a particular node as source from the eligible ones (and all the others as sinks). Specifically, we randomly select an individual source pixel i ∈ T(G_pe^tree) and assign all the others j ≠ i ∈ T(G_pe^tree) as sinks. Applying the dynamics to G_pe with this choice of one source and multiple sinks outputs a filtered network G_f^r, indexed by the run r. Repeating this filtering step N_runs times, each time selecting a different source from T(G_pe^tree) (and all the remaining node pixels as sinks), results in a set of filtered networks G_f^1, ..., G_f^{N_runs}, all of them trees. We combine them by superposition, so that we obtain a final network G(V, E, W) whose sets of nodes and edges are the unions of those of the individual G_f^r. The weights on the edges of the final network are given by the sum of the weights over the runs, w_jk = Σ_r w_jk^r, where w_jk^r is the weight of edge (j, k) in network G_f^r and corresponds to the optimal edge conductivity obtained from the dynamics at convergence; we set w_jk^r = 0 if (j, k) ∉ G_f^r. The value of N_runs ≤ |T(G_pe^tree)| is a parameter that has to be tuned based on the input image. Notice that a high value of N_runs does not necessarily result in a network more similar to the one depicted in the input image; for instance, in the extreme scenario where the original network-like structure is a tree, N_runs = 1 suffices. Empirically, we find that N_runs = 5 gives good results in all the experiments reported here, see Supplementary Information for more details.
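The superposition step can be sketched as follows: the union of the filtered trees can contain loops even though each individual tree does not, and edges recovered in several runs are reinforced by summing their weights. The function name is illustrative.

```python
import networkx as nx

def superpose(filtered_trees):
    """Superpose the N_runs filtered trees G_f^1, ..., G_f^{N_runs}:
    nodes and edges are the unions across runs, and each edge weight
    is the sum of its weights over the runs in which it appears
    (absent edges contribute zero)."""
    G = nx.Graph()
    for T in filtered_trees:
        G.add_nodes_from(T.nodes)
        for u, v, data in T.edges(data=True):
            w = data.get("weight", 1.0)
            if G.has_edge(u, v):
                G[u][v]["weight"] += w
            else:
                G.add_edge(u, v, weight=w)
    return G
```

For example, superposing a tree with edges (0,1), (1,2) and a tree with edges (0,1), (0,2) yields a triangle: a loop appears in the union, and the shared edge (0,1) accumulates weight from both runs.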
Combining these steps we obtain the whole algorithmic pipeline of our method, which we refer to as Image2net. We provide an algorithmic pseudo-code in Algorithm 1 and an open-source implementation at https://github.com/diegoabt/Img2net.

3: G_pe^tree ← run the dynamics of Sec. II A on G_pe for the chosen β, using as input sources and sinks all the nodes in G_pe
4: T(G_pe^tree) ← {v ∈ V(G_pe^tree) | d_v = 1}, where d_v is the degree of node v

III. Datasets and algorithms

We run our model on three datasets of images covering various types of network-like topologies observed in biology and ecology. The images represent: i) the slime mold Physarum polycephalum (Physarum Polycephalum) [22], which is also the inspiration for our dynamics; ii) the retinal vascular system (retina) [23]; iii) river networks (rivers) obtained by extracting images from [24]. The number of images taken from the Physarum Polycephalum, retina and rivers sources is 25, 20 and 10 respectively, see Table I. Pre-processing was applied, see Supplementary Information for details. For model comparison, we consider NEFI, a routine that combines various image-processing techniques, and a variant of our routine based on a combination of minimum spanning tree (MST) and Steiner tree optimization (Image2net-MST). The idea behind this last routine is to run our procedure but replace the optimization steps based on the dynamics of Sec. II A with standard routing optimization algorithms, namely a combination of minimum spanning and Steiner trees [26]. The goal is to see how the underlying optimization setup impacts the final network topology. In fact, while the underlying idea of treating network extraction from images within the framework of routing optimization is the same for Image2net and Image2net-MST, the details of their corresponding optimizations differ. Specifically, for Image2net-MST, we first run a standard MST algorithm to extract G_MST^tree from G_pe (the same input given to Image2net). From the corresponding set of leaves T(G_MST^tree), we then extract a subset of terminals of predefined size (no distinction between sources and sinks is necessary to solve a minimum Steiner tree problem). From G_pe and these terminals, we extract a Steiner tree G_St^r; we repeat this N_runs times to obtain the set G_St^1, ..., G_St^{N_runs}.
Finally, we superimpose them as done for our method to obtain G_mst, see Supplementary Information for more details. Notice that Steiner tree optimization has a complexity that scales with the number of terminals, a problem not present in our dynamics. As a result, running Image2net-MST is noticeably more computationally expensive than Image2net.
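One run of the MST + Steiner-tree variant can be sketched with standard library routines. This is an illustrative sketch, not the paper's implementation: networkx's `steiner_tree` (a metric-closure-based approximation) stands in for whichever exact or approximate Steiner solver is used in [26], and the seeded terminal sampling is an assumption.

```python
import random
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def image2net_mst_run(G_pe, n_terminals, seed=0):
    """Sketch of one Image2net-MST run: compute the MST of G_pe,
    sample a subset of its leaves as terminals (no source/sink
    distinction is needed), and extract an approximate Steiner tree
    over those terminals in G_pe."""
    mst = nx.minimum_spanning_tree(G_pe, weight="length")
    leaves = sorted(v for v in mst.nodes if mst.degree(v) == 1)
    rng = random.Random(seed)
    terminals = rng.sample(leaves, min(n_terminals, len(leaves)))
    return steiner_tree(G_pe, terminals, weight="length")
```

Repeating this with different seeds and superposing the resulting trees mirrors the loop-recovery step of the main method; each Steiner-tree call, however, grows more expensive as the number of terminals increases.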
Finally, edge weights were assigned with rules specific to each method, as there is no common definition that applies to all of them. In fact, the ability to extract edge weights is rare among image-processing techniques, and usually relies on image preprocessing and segmentation of the input image. Instead, Image2net extracts edge weights in a principled way based on the results at optimality in terms of conductivity, which hence has a direct interpretation as the diameter of the edges in the image. For Image2net, we use the Effective Reweighing (ER) rule on the resulting conductivities, see [9]; for Image2net-MST, we use the weights given in input to the Steiner tree problem, i.e. the weights given by the ER rule on G_pe; for NEFI, we use as weight the width, which is an output of the algorithm; for the original image, we assign the RGB values of the pixels mapped into an integer that increases with color intensity. All of these definitions agree that the higher the weight, the thicker the edge, and thus the higher the conductivity. Figure 3 illustrates an example of the networks extracted by the various algorithms for an image in retina.

A. Performance metrics
We measure performance in terms of the ability of an algorithm to recover the network-like subject depicted in the underlying image. We consider a measure of similarity adapted from the quality measure defined in [9]. This relies on partitioning the image into a grid of P non-intersecting subsets C_α inside the pixels' domain and then comparing, within each C_α, the edges of G assigned by the algorithm with those observed in the original image I (RGB values):

    ŵ_b(G, I) = (1/P) Σ_α | n_α(G) / Σ_{α'} n_{α'}(G) − Σ_{i ∈ I} I_α(δ, i) / Σ_{α'} Σ_{i ∈ I} I_{α'}(δ, i) | ,    (6)

where n_α(G) is the number of edge points of G falling inside C_α, and I_α(δ, i) = 1, 0 for i ∈ I if the pixel i, with intensity above the threshold δ, is in C_α or not, respectively; δ is used to decide whether that pixel contributes to the network-like image. In words, if the pixel color intensity is high enough, then we label it as an edge. For δ, we use the same value as given in input to Image2net. This is a coarse-grained measure of similarity that tells how many edges in the extracted graph correspond to high-intensity pairs of pixels. In order to account for edge weights and pixel intensities, we also consider a weighted version:

    ŵ(G, I) = (1/P) Σ_α | W_α(G) / Σ_{α'} W_{α'}(G) − Σ_{i ∈ I} p_i I_α(i) / Σ_{α'} Σ_{i ∈ I} p_i I_{α'}(i) | ,    (7)

where W_α(G) is the sum of the weights of the edges of G inside C_α, p_i is the intensity of the pixel i, and I_α(i) is 1 if i ∈ C_α and 0 otherwise. Notice that in this case δ is not needed, since pixels with low intensity are penalized by the lower weight of their contributions to ŵ(G, I). In both cases, smaller values of these measures mean higher similarity between the extracted network and the underlying network-like structure in the image. While ground truth for this network-like structure is normally absent, the retina dataset contains ground-truth networks which were hand-labeled by individuals [23]. In this case, we calculate the binary similarity using the hand-labeled images instead of the one given in input. There are two sets of labeled images, each corresponding to a different person performing this manual identification. While similar, the resulting two sets of networks are different. In the absence of ground truth, we compare against the input image.
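One concrete way to implement this coarse-grained comparison is sketched below, assuming coordinates rescaled to the unit square and a regular grid of square cells; the function name and the exact normalization (cell densities normalized to sum to one) are illustrative assumptions and may differ from the paper's definition.

```python
import numpy as np

def binary_similarity(edge_points, image, delta, n_cells):
    """Sketch of the grid-based binary similarity: partition
    [0,1] x [0,1] into n_cells x n_cells cells, then compare the
    normalized distribution of extracted edge points with that of
    above-threshold pixels. Smaller values mean higher similarity.
    """
    bins = np.linspace(0.0, 1.0, n_cells + 1)
    pts = np.asarray(edge_points, dtype=float)
    h_net, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=(bins, bins))
    # map above-threshold pixel indices to barycenter coordinates
    rows, cols = np.nonzero(image > delta)
    xs = (cols + 0.5) / image.shape[1]
    ys = 1.0 - (rows + 0.5) / image.shape[0]
    h_img, _, _ = np.histogram2d(xs, ys, bins=(bins, bins))
    h_net = h_net / max(h_net.sum(), 1.0)
    h_img = h_img / max(h_img.sum(), 1.0)
    return np.abs(h_net - h_img).mean()
```

If the extracted edge points coincide with the bright pixels the measure is zero; the further their spatial distributions diverge, the larger it grows.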

B. Implementation details.
We apply image pre-processing to the input image to improve image quality and distinguish the main subject from the background, see Supplementary Information for details. We rescale NEFI's pixel locations to the same scale as the other methods, i.e. the set [0, 1] × [0, 1] (for simplicity, we consider only square-shaped images). All edge lengths ℓ_e have been assigned using the Euclidean distance between the corresponding endpoints. For NEFI, we used the two best-performing pipelines of image-processing techniques, polycephalum_high (NEFI-high) and crack_patterns (NEFI-crack), among the available predefined pipelines. In the figures we show only the best results; these vary based on the input image.

IV. Results

A. Retinal vessel image validation
We use the similarity measure defined in the previous section to compare every graph-based approximation of the image with the provided hand-labeled ones, assuming the latter to be the ground truth I_gt. We compute ŵ_b(G, I_gt) for each retinal image and the corresponding extracted network, to measure how close a particular network is to the human-labeled one. Fig. 4 shows that Image2net consistently outperforms NEFI over all images and both hand-labeled datasets. Image2net and Image2net-MST perform similarly according to the binary similarity. However, if we account for weights, Image2net outperforms Image2net-MST on the majority of the images. Notice that Image2net-MST does not assign new weights while selecting the edges: as in a Steiner tree problem, it uses the weights of the input network, in this case G_pe. Instead, Image2net selects edges and weights at the same time, within the same optimization setup. The fact that the weighted similarity gives better results signals that the values of the optimal conductivities (the weights assigned to Image2net-extracted networks) have a meaningful interpretation, as they match the pixels' intensities better than the weights given by the other algorithms.

B. Physarum Polycephalum and river networks.
We measure the performance on the two datasets where there is no ground truth, which is often the case for real images. We find that Image2net recovers the river networks better, for both performance metrics, as we show in Fig. 5. In fact, our model is able to capture the detailed geometry of the network when there are curves, while NEFI has the limitation that edges with curves or kinks are contracted to straight lines. This is one of the main advantages of our model being based on an underlying optimization framework: the geometry of the network is automatically selected based on optimality, rather than by a predefined, manually tuned setting. As a result, Image2net is flexible in detecting different network geometries, as can be seen in Fig. 6 (top). The situation for Physarum Polycephalum is more nuanced: Image2net is
better than Image2net-MST, in particular when considering the weights, but NEFI outperforms all the others. However, this holds when using the NEFI-high routine, which is the one built on purpose to detect Physarum Polycephalum networks; it is not surprising that it performs strongly on this dataset. In Fig. 6 (bottom) we notice how these networks contain many small details that are better captured by NEFI. Indeed, using other NEFI routines, performance aligns more closely with Image2net and Image2net-MST. This also shows that a practitioner aiming to extract networks from a particular image has, with all the approaches, a few degrees of freedom to tune in order to increase performance. NEFI allows one to specify individual routines to design a custom pipeline; Image2net and Image2net-MST have various parameters that can be tuned, the most important being δ and β. For instance, decreasing δ allows for finer details on Physarum Polycephalum networks, see Supplementary Information. However, tuning each routine on each input image goes beyond the scope of this work, as we aim at describing how different approaches perform on a corpus of potentially quite different images, and thus at automating network extraction in a scalable way.

V. Qualitative results
Beyond validating the model on recovering a network structure that resembles what is pictured in an image, we illustrate the differences in topological properties of the extracted networks. This also showcases possible applications of our model, where a practitioner extracts a network and can then perform further analysis on it, for instance using the extracted topological properties.
We calculated the total network length as L = Σ_e ℓ_e, where ℓ_e is the Euclidean distance between the nodes defining the edge e, see Fig. 7. We find that Image2net extracts on average longer river networks, and networks of length similar to NEFI's for retina, but with lower variance in this case. Instead, NEFI extracts much longer Physarum Polycephalum networks, mainly due to the many small minor paths permeating the whole image (this was signaled above by the wider difference in similarity results). Image2net-MST finds smaller values on all the datasets. This highlights an important difference due to the underlying optimization setups that distinguish these two approaches.
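Given an extracted network with node coordinates, the total length L is straightforward to compute; the following sketch assumes coordinates are stored in a `pos` node attribute (an illustrative convention).

```python
import math
import networkx as nx

def total_length(G):
    """Total network length L = sum of Euclidean edge lengths,
    with endpoint coordinates read from the 'pos' node attribute."""
    L = 0.0
    for u, v in G.edges:
        (x1, y1) = G.nodes[u]["pos"]
        (x2, y2) = G.nodes[v]["pos"]
        L += math.hypot(x1 - x2, y1 - y2)
    return L
```

Quantities like this, together with the numbers of nodes and edges, are the kind of topological summaries a practitioner can compare across extraction methods.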
While a longer total network length might be due to a higher number of edges, this is not always the case. This can be seen from the results on rivers in Fig. 8, where we plot the distributions of the numbers of nodes and edges, two other important topological properties. In fact, for these images, NEFI finds much smaller network sizes than Image2net, while the distribution of L in Fig. 7 is similar for the two routines. This is again due to NEFI representing curved parts of the network with fewer but longer straight edges, see Fig. 6 for an example. On these river networks, Image2net has a higher resolution, where NEFI fails to find enough details. The opposite extreme is seen for the Physarum Polycephalum images, where NEFI has many more nodes and edges when using the NEFI-high routine, and also a much higher L, as we saw before. For the retina vessel networks, Image2net extracts on average networks with higher numbers of nodes and edges than the other two methods, while L is similar to NEFI's; hence both Image2net and Image2net-MST have on average shorter edges than NEFI, with the difference that Image2net extracts networks with bigger sizes.

VI. Conclusions
We propose Image2net, a model for extracting networks from images. It takes as input an image and returns a network structure as a set of nodes, edges and corresponding weights. Standard approaches to this problem rely on image-processing techniques. Instead, our model is based on a principled formalism adapted from recent results in optimal transport theory. We build an analogy with fluid dynamics by treating the colors on pixels as a fluid flowing through the image and considering a set of dynamical equations for their conductivities and flows. At convergence, these correspond to stationary solutions minimizing a cost function that has a nice interpretation as a transportation cost.
The advantage of our approach over more conventional methods is that our model naturally incorporates a principled definition of edge weights as the optimal conductivities, which can be interpreted as the diameters of network edges. In addition, it allows for a principled and automatic filtering of possible redundancies by solving a routing optimization problem, instead of relying on pre-defined pruning routines. We test our model on various datasets and calculate performance measures in terms of how well the network-like shape in the input image is recovered. Image2net performs well compared to other network extraction tools and yields networks that closely approximate those depicted in the images. In particular, it is flexible in finding various network shapes, as it can capture curved geometries such as those observed in river networks.
In addition to being efficient, automated network extraction has the advantage of yielding reproducible results and reducing human biases. Indeed, given an input image, Image2net will always yield the same network, whereas manual extraction depends on the perception of the individual performing the measurement. Our model also enables practitioners to measure network-related quantities such as centrality measures, branching points, or curvatures and angles. More importantly, given the computational efficiency of the underlying solver, it also works for large networks where manually measuring metrics across the whole network is not feasible.
In this work, we mostly show example applications from biology and ecology, but the usage of our model is not limited to these kinds of networks. It can be used on a broad array of datasets to detect and measure network-like shapes. We foresee that our model will be useful for practitioners willing to perform automatic and scalable network analysis of large datasets of images.

Supporting Information (SI)

S1. Dependency on the parameter δ
Image2net has two main parameters: δ and β. While β directly acts on the optimization being performed, changing the dynamics and the transportation cost, δ controls which fine-level details of the resulting conductivities and fluxes are kept in the final network. Specifically, the higher δ, the less detailed the extracted network is. We show the impact of this in Fig. S1.
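The role of δ as a filter on fine details can be illustrated with a minimal sketch. The function and data below are purely illustrative (not the actual Image2net API): edges whose optimal conductivity falls below the threshold δ are dropped from the final network.

```python
# Illustrative sketch: prune edges whose optimal conductivity is below
# the threshold delta. Edge names and values are hypothetical.
def prune_by_delta(conductivities, delta):
    """Keep only edges whose conductivity exceeds delta."""
    return {edge: mu for edge, mu in conductivities.items() if mu > delta}

conductivities = {("a", "b"): 0.9, ("b", "c"): 0.05, ("c", "d"): 0.4}

# A small delta keeps most of the structure ...
print(prune_by_delta(conductivities, 0.1))  # {('a', 'b'): 0.9, ('c', 'd'): 0.4}
# ... while a larger delta removes more fine-detail edges.
print(prune_by_delta(conductivities, 0.5))  # {('a', 'b'): 0.9}
```

This mirrors the qualitative behavior described above: increasing δ yields a coarser, less detailed extracted network.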

S2. Similarity metric
Here we show an example of how the partition C_α is used to compute the similarity measure. As explained in Section III of the main manuscript, the set [0, 1]^2 is divided into P non-intersecting sets, in each of which the difference (both binary and weighted) is computed. Fig. S2 shows the values of this difference inside each element of the partition. For all the experiments reported here the value of P is (15 − 1)^2 = 196, obtained by placing 15 equally-spaced points along each axis, i.e. dividing each axis into 14 equally-sized intervals.
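The per-cell computation can be sketched as follows. This is an illustrative NumPy implementation, not the exact Image2net code: it divides two equally sized square images into an n-by-n grid of cells (n = 14 matching P = 196 above) and records, per cell, the absolute difference of total pixel mass.

```python
import numpy as np

def partition_difference(img_a, img_b, n=14):
    """Divide the domain into an n-by-n grid of non-intersecting cells
    and return the absolute difference of total pixel mass between the
    two images inside each cell. Illustrative sketch only."""
    assert img_a.shape == img_b.shape
    side = img_a.shape[0] // n  # pixels per cell along each axis
    diffs = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            cell_a = img_a[i * side:(i + 1) * side, j * side:(j + 1) * side]
            cell_b = img_b[i * side:(i + 1) * side, j * side:(j + 1) * side]
            diffs[i, j] = abs(cell_a.sum() - cell_b.sum())
    return diffs

# Toy example: the images differ only in the top-left quadrant.
a = np.zeros((28, 28)); a[:14, :14] = 1.0
b = np.zeros((28, 28))
d = partition_difference(a, b, n=14)
print(d.shape, d.sum())  # (14, 14) 196.0
```

Summing (or averaging) the per-cell values then yields a single score, where smaller values indicate higher similarity.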

S3. Dependency on N_runs
In Fig. S3 we show the impact of the parameter N_runs on the final extracted graphs. For six example images (two per dataset), we extract the networks obtained for N_runs = 1, 2, 3, 5, 10, 15. The larger N_runs, the better the similarity measure; however, this comes at a higher computational cost, which is proportional to N_runs. Beyond N_runs = 5 we notice that the network stabilizes to a common structure, with no further major structural changes.

S6. Image preprocessing
This section contains the information related to the steps performed prior to the network extraction. We preprocessed the images by applying color enhancement, grayscale mapping, segmentation, cropping, and resizing.

A. Cropping
All of the images I were cropped in order to make them square shaped (width = height). We obtained the cropped image by specifying the center of the crop (x, y) and the half-width w, and then using OpenCV (cv) functionalities:

```python
import cv2 as cv

w = int(min(x, y) / 3) - 1  # half-width of the square crop
x = int(x / 2)              # row coordinate of the crop center
y = int(y / 2)              # column coordinate of the crop center
img = cv.imread(image)
crop_img = img[x - w:x + w, y - w:y + w]
```

B. Grayscale mapping
For the retina and rivers datasets we converted the colorful cropped images to grayscale by using:

```python
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
```

E. Resizing
The procedures proposed in this paper depend on the size of the input images. Thus, before extracting the networks, the images I are mapped into smaller ones by a procedure called resizing_image(). The idea behind this function is to obtain a similar representation of I by replacing it with another image whose pixel values are a function of the pixel values in I. The functions used to generate the new values are those accepted by the OpenCV function resize (input interpolation). We set interpolation to either cv.INTER_NEAREST or cv.INTER_AREA for all the experiments exhibited in this work. The former performs nearest-neighbour interpolation, while the latter resamples using the pixel-area relation, i.e. each new pixel value is an average over the corresponding area of source pixels. We used cv.INTER_AREA for Physarum Polycephalum and rivers, and cv.INTER_NEAREST for retina. Images in Physarum Polycephalum, rivers and retina were reduced to widths of 300, 200 and 200 pixels, respectively.
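To make the effect of nearest-neighbour interpolation concrete, here is a minimal NumPy sketch of what cv.INTER_NEAREST does (the function name resize_nearest is illustrative, not the actual resizing_image() implementation, which uses cv.resize):

```python
import numpy as np

def resize_nearest(img, new_w):
    """Nearest-neighbour downsizing of a square grayscale image: each
    output pixel copies the value of the closest source pixel.
    Illustrative sketch of cv.INTER_NEAREST behavior."""
    h, w = img.shape
    new_h = int(round(h * new_w / w))
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return img[rows[:, None], cols]

img = np.arange(16, dtype=float).reshape(4, 4)
small = resize_nearest(img, 2)
print(small)  # [[ 0.  2.] [ 8. 10.]]
```

In contrast, cv.INTER_AREA would average the source pixels covered by each output pixel, which is generally preferable when shrinking images with fine, filament-like structures.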