Cycle and flow trusses in directed networks

When we represent real-world systems as networks, the directions of links often convey valuable information. Finding module structures that respect link directions is one of the most important tasks in analysing directed networks. Although many notions of a directed module have been proposed, no consensus has been reached. This lack of consensus arises partly because distinct types of modules might exist in a single directed network, whereas most previous studies focused on a single criterion for modules. To address this issue, we propose a generic notion of the so-called truss structures in directed networks. Our definition of a truss is able to extract two distinct types of trusses, named the cycle truss and the flow truss, from a unified framework. By applying the method for finding trusses to empirical networks obtained from a wide range of research fields, we find that most real networks contain both cycle and flow trusses. In addition, the abundance of (and the overlap between) the two types of trusses may be useful to characterize module structures in a wide variety of empirical networks. Our findings shed light on the importance of simultaneously considering different types of modules in directed networks.


S1 Algorithms to compute trusses
In this section, we describe our algorithms for finding the flow and cycle $k$-trusses in a network. For a network $G$, the sets of nodes and links are denoted by $V(G)$ and $E(G)$, respectively. For a node $v \in V(G)$, we define $N^{+}_{G}(v)$ as the set of out-neighbors of $v$, that is, $N^{+}_{G}(v) = \{w \in V(G) \mid vw \in E(G)\}$; we define $\deg^{+}_{G}(v)$ as the out-degree of $v$, i.e., $\deg^{+}_{G}(v) = |N^{+}_{G}(v)|$. Here, for the sake of simplicity, we denote the link from node $u$ to node $v$ by $uv$.
Our algorithm uses a subroutine called CommonNeighbor (Algorithm 1). This subroutine takes two networks $G$ and $G'$ on the same node set $V$ and two nodes $u, v \in V$, and returns the set of nodes $w$ such that $uw \in E(G)$ and $vw \in E(G')$. This subroutine is used to enumerate the cycle or flow triangles involving the link $uv$. The algorithm is straightforward: it chooses either $u$ or $v$, enumerates its out-neighbors $w$, and then checks whether $w$ is also an out-neighbor of the other node. For efficiency, we choose $u$ if $\deg^{+}_{G}(u) < \deg^{+}_{G'}(v)$ and choose $v$ otherwise. Using hash tables for storing the out-neighbors of nodes, the time complexity of CommonNeighbor is bounded by $O(\min\{\deg^{+}_{G}(u), \deg^{+}_{G'}(v)\})$.
Now we present our algorithm for enumerating cycle trusses (Algorithm 2). Given a network $G$, it computes the cycle $k$-trusses for all $k$ at once.
First, for every link $uv \in E(G)$, we count the number of cycle triangles involving $uv$ and store the number in $c[uv]$ (Line 6). This can be done by calling CommonNeighbor($\overline{G}$, $G$, $u$, $v$), where $\overline{G}$ is the network obtained from $G$ by reversing the directions of links. This is because if there exists a cycle triangle in $G$ with links $uv$, $vw$, and $wu$, then $\overline{G}$ contains the link $uw$ and $G$ contains the link $vw$.

Algorithm 2: CycleTruss
    Let $\overline{G}$ be the network obtained from $G$ by reversing the directions of links.
    ...
12: for $w \in$ CommonNeighbor($\overline{G}$, $G$, $u$, $v$) do
    ...
15: Remove the link $uv$, and update $G$ and $\overline{G}$.
16: end while
17: $k \leftarrow k + 1$.
18: end while
19: end procedure

Next, starting with $k = 0$, as long as links remain, we perform the following process. As long as there is a link $uv$ such that $c[uv]$ is at most $k$, we set the truss number of $uv$ to $k$ (Line 11); then, for each cycle triangle involving the link $uv$, we decrease the counts of the other two links (Lines 12-14); finally, we remove the link $uv$ from the network (Line 15). If there is no link with a count of at most $k$, we increment the value of $k$ and repeat the process. Note that, when the process starts for a particular value of $k$, all remaining links have counts of at least $k$, and thus all these links have truss numbers of at least $k$. On the other hand, when we remove a link in the process for a particular value of $k$, its truss number is at most $k$, because we have only removed links that cannot be members of a cycle $(k + 1)$-truss. Therefore, each link is assigned the correct truss number.
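The peeling procedure described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `common_neighbor` and `cycle_truss`, the link-list input format, and the stack used to find links whose count has dropped to at most k are all choices made here for illustration.

```python
from collections import defaultdict


def common_neighbor(adj_a, adj_b, u, v):
    """Return {w : u->w is a link of network A and v->w is a link of network B}."""
    nu, nv = adj_a[u], adj_b[v]
    # Scan the smaller out-neighbor set and test membership in the larger one.
    small, large = (nu, nv) if len(nu) < len(nv) else (nv, nu)
    return {w for w in small if w in large}


def cycle_truss(links):
    """Map every link (u, v) to its cycle truss number (hypothetical helper)."""
    out = defaultdict(set)  # G: out[u] holds the out-neighbors of u
    rev = defaultdict(set)  # reversed network: rev[v] holds the in-neighbors of v in G
    for u, v in links:
        if u != v:  # self-loops are ignored; sets drop multiple links
            out[u].add(v)
            rev[v].add(u)
    alive = {(u, v) for u in out for v in out[u]}
    # A node w closes a cycle u->v->w->u iff w->u and v->w are links of G,
    # i.e. w is an out-neighbor of u in the reversed network and of v in G.
    count = {(u, v): len(common_neighbor(rev, out, u, v)) for (u, v) in alive}
    truss = {}
    k = 0
    while alive:
        # Candidate links whose triangle count is at most k.
        stack = [e for e in alive if count[e] <= k]
        while stack:
            u, v = stack.pop()
            if (u, v) not in alive:  # already removed via an earlier stack entry
                continue
            truss[(u, v)] = k
            for w in common_neighbor(rev, out, u, v):
                # The triangle disappears, so the other two links lose one count.
                for e in ((v, w), (w, u)):
                    count[e] -= 1
                    if count[e] <= k:
                        stack.append(e)
            # Remove the link and keep both adjacency structures in sync.
            alive.discard((u, v))
            out[u].discard(v)
            rev[v].discard(u)
        k += 1
    return truss
```

For example, for a directed 3-cycle with one dangling link, `cycle_truss([(0, 1), (1, 2), (2, 0), (2, 3)])` assigns truss number 1 to the three links of the cycle and 0 to the link (2, 3). Rescanning all remaining links at each value of k keeps the sketch short; a bucket queue keyed by count would avoid the rescans.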
The overall time complexity of CycleTruss is dominated by the time complexity of enumerating the cycle triangles. Naively, this can be bounded by $O(NM)$, where $N$ and $M$ are the numbers of nodes and links, respectively, in the input network, because enumerating the triangles that involve a single link takes at most $O(N)$ time. In practice, however, the running time is almost linear in $M$ for real networks.
Finally, we explain our algorithm for enumerating flow trusses (Algorithm 3), which simultaneously computes the flow $k$-trusses for all values of $k$. Conceptually, FlowTruss is almost the same as CycleTruss. However, for a link $uv$ and a node $w$, there are three types of flow triangles involved: (i) a flow triangle with links $uv$, $wv$, and $uw$, (ii) a flow triangle with links $uv$, $wv$, and $wu$, and (iii) a flow triangle with links $uv$, $vw$, and $uw$. These triangles can be enumerated by calling CommonNeighbor($G$, $\overline{G}$, $u$, $v$), CommonNeighbor($\overline{G}$, $\overline{G}$, $u$, $v$), and CommonNeighbor($G$, $G$, $u$, $v$), respectively, where $\overline{G}$ is the network obtained from $G$ by reversing the directions of all links. The other parts of the algorithm and the analysis of the time complexity are the same as those of CycleTruss, and therefore we omit them.

Algorithm 3: FlowTruss
    Let $\overline{G}$ be the network obtained from $G$ by reversing the directions of links.
    ...
 7: Count flow triangles with links $uv$, $wv$, and $uw$ for some $w \in V(G)$.
    ...
 9: Count flow triangles with links $uv$, $wv$, and $wu$ for some $w \in V(G)$.
    ...
11: Count flow triangles with links $uv$, $vw$, and $uw$ for some $w \in V(G)$.
    ...
17: for $w \in$ CommonNeighbor($G$, $\overline{G}$, $u$, $v$) do
    ...
    Remove the link $uv$, and update $G$ and $\overline{G}$.
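As a rough illustration of how the three types of flow triangles enter the peeling process, the following Python sketch computes flow truss numbers. It is a sketch under assumptions rather than the authors' code: the names `flow_triangles` and `flow_truss` and the link-list input are hypothetical, and the calls to CommonNeighbor are replaced here by Python's built-in set intersection on out-neighbor and in-neighbor sets, which returns the same common neighbors.

```python
from collections import defaultdict


def flow_triangles(out, inn, u, v):
    """Yield the other two links of every flow triangle that contains u->v."""
    for w in out[u] & inn[v]:  # type (i):   links uv, wv, uw
        yield (u, w), (w, v)
    for w in inn[u] & inn[v]:  # type (ii):  links uv, wv, wu
        yield (w, u), (w, v)
    for w in out[u] & out[v]:  # type (iii): links uv, vw, uw
        yield (v, w), (u, w)


def flow_truss(links):
    """Map every link (u, v) to its flow truss number (hypothetical helper)."""
    out = defaultdict(set)  # out[u]: out-neighbors of u in G
    inn = defaultdict(set)  # inn[v]: in-neighbors of v in G (the reversed network)
    for u, v in links:
        if u != v:  # self-loops are ignored; sets drop multiple links
            out[u].add(v)
            inn[v].add(u)
    alive = {(u, v) for u in out for v in out[u]}
    count = {e: sum(1 for _ in flow_triangles(out, inn, *e)) for e in alive}
    truss = {}
    k = 0
    while alive:
        # Candidate links whose flow-triangle count is at most k.
        stack = [e for e in alive if count[e] <= k]
        while stack:
            u, v = stack.pop()
            if (u, v) not in alive:  # already removed via an earlier stack entry
                continue
            truss[(u, v)] = k
            # Materialise the triangles before the adjacency sets are modified.
            for pair in list(flow_triangles(out, inn, u, v)):
                for e in pair:
                    count[e] -= 1
                    if count[e] <= k:
                        stack.append(e)
            alive.discard((u, v))
            out[u].discard(v)
            inn[v].discard(u)
        k += 1
    return truss
```

For instance, `flow_truss([(0, 1), (1, 2), (0, 2)])` assigns truss number 1 to all three links of the transitive triangle, whereas the directed cycle `[(0, 1), (1, 2), (2, 0)]` contains no flow triangle and all of its links receive 0.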

S2 Data sources
The network data sets used in the present study were downloaded from the following websites. The metabolic networks were based on those used in Ref. [1], and the network data were provided by Kazuhiro Takemoto through personal communication. Any additional information attached to links, such as weights, signs, or time stamps, was discarded from the network data. We also removed self-loops and multiple links to make the networks simple.
The basic statistics of the truss structure for empirical networks obtained from 12 different fields are summarized in Tables S1 and S2. Except for $k^{f}_{\max}$ for the circuit networks and a few other examples, almost all $k^{c}_{\max}$ and $k^{f}_{\max}$ values are nontrivial. Additionally, the $k^{c}_{\max}$ and $k^{f}_{\max}$ values do not necessarily increase with the number of cycle and flow triangles. This result implies that the trusses carry information about the module structure that is not captured by the triangle counts alone.

Figure S3: Histogram of the R measure for the metabolic networks. The measure R quantifies the overlap between the set of links with the largest $k^{c}$ values and that with the largest $k^{f}$ values.