Core–periphery structure in directed networks

Empirical networks often exhibit different meso-scale structures, such as community and core–periphery structures. Core–periphery structure typically consists of a well-connected core and a periphery that is well connected to the core but sparsely connected internally. Most core–periphery studies focus on undirected networks. We propose a generalization of core–periphery structure to directed networks. Our approach yields a family of core–periphery block model formulations in which, contrary to many existing approaches, core and periphery sets are edge-direction dependent. We focus on a particular structure consisting of two core sets and two periphery sets, which we motivate empirically. We propose two measures to assess the statistical significance and quality of our novel structure in empirical data, where one often has no ground truth. To detect core–periphery structure in directed networks, we propose three methods adapted from two approaches in the literature, each with a different trade-off between computational complexity and accuracy. We assess the methods on benchmark networks, where our methods match or outperform standard methods from the literature, with a likelihood approach achieving the highest accuracy. Applying our methods to three empirical networks—faculty hiring, a world trade dataset and political blogs—illustrates that our proposed structure provides novel insights into empirical networks.

4. The structure can have no reciprocal edges between any two distinct blocks.
We detail below the structures that satisfy the above assumptions.
(i) 1 Core set and 1 periphery set
We first consider the case of one core and one periphery set. As the core must connect internally, and the periphery set must connect to the core, there are two possible design matrices, (A.1) and (A.2). Directed core–periphery structures (A.1) and (A.2) represent an asymmetric flow between core and periphery, as a periphery either only sends or only receives links. This can be a flow of information, such as follower-only accounts in Twitter networks, or a flow of money, such as peripheral banks in inter-bank networks that sell their loans to central banks.
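The two design matrices are fully determined by the stated constraints (the core connects internally; the periphery either only sends to the core or only receives from it, and never connects internally). As a sketch, they can be encoded as 0/1 block matrices; the block ordering and variable names below are ours, not taken from the equations:

```python
import numpy as np

# Block order: [core, periphery]; entry B[r, s] = 1 means the model
# places edges from block r to block s.
B_A1 = np.array([[1, 0],    # core -> core (internal connectivity)
                 [1, 0]])   # periphery -> core: the periphery only sends

B_A2 = np.array([[1, 1],    # core -> core, core -> periphery
                 [0, 0]])   # the periphery only receives

# In both variants the periphery never connects internally.
assert B_A1[1, 1] == 0 and B_A2[1, 1] == 0
```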
(ii) 2 Core sets and 1 periphery set
As we restrict the periphery set to connect to only one core, and each core set to connect to at least one periphery, there are no possible design matrices that satisfy these constraints.
(iii) 1 Core set and 2 periphery sets
To make the structure identifiable in this case, we exclude structures that can be obtained by splitting the peripheries of our 1-core, 1-periphery structures. Under this restriction, only one configuration is possible, namely having the peripheries connect in different directions; the resultant design matrix is (A.3). Directed core–periphery structure (A.3) is a particular instance of the well-known bow-tie structure, essentially having a set of vertices that only send material (content, money, etc.) and a set of vertices that only receive material; there are several known real-world examples, namely the internet in [1] and biological networks [2].
(iv) 2 Core sets and 2 periphery sets
In the case of 2 core and 2 periphery sets, the cores must interconnect, and the link cannot be reciprocated, so there is only one possible structure between the two core sets. By assumption, the periphery vertices only connect to one core set, and each core set must be connected to a periphery. Therefore, there are four possible such structures. To the best of our knowledge, these structures have not been directly studied before. The directed core–periphery structure (A.5) is essentially a bow-tie in which the core can be split into two sets: one that takes the incoming connections of the periphery vertices, and a remaining core set that sends links to the additional periphery set. This can be seen as a two-step version of the bow-tie structure, in which material is taken in, processed, and then distributed to a second set that performs a similar action. In contrast, the structure (A.4), on which the main paper focuses, is not a simple extension of the bow-tie structure because, contrary to the bow-tie, the flow in (A.4) is not uni-directional.
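For illustration only, one 4-block design matrix consistent with the constraints above (cores interconnect without reciprocation, each periphery attaches to exactly one core) might look as follows. The exact block assignments of (A.4) are not recoverable from this text, so treat the matrix below as a hypothetical example rather than the paper's definition:

```python
import numpy as np

# Hypothetical block order: [C_in, C_out, P_in, P_out].
# C_out sends to C_in (the single unreciprocated core-core link),
# P_out sends into C_in, and P_in receives from C_out.
B = np.array([
    [1, 0, 0, 0],   # C_in: internally connected only
    [1, 1, 1, 0],   # C_out -> C_in, C_out internal, C_out -> P_in
    [0, 0, 0, 0],   # P_in: receives only
    [1, 0, 0, 0],   # P_out -> C_in
])

# Assumption 4 above: no reciprocated edges between distinct blocks.
off_diag = 1 - np.eye(4, dtype=int)
assert not ((B * B.T) * off_diag).any()
```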

B Selection and Performance of Different Variants of the HITS Methods
Several methodological choices were made during the development of the HITS and AdvHits methods, based on their relative performance on our synthetic benchmarks. The HITS method can be divided into the following steps.
1. Compute the HITS scores of the observed graph adjacency matrix.
2. Construct vertex scores from the low-rank approximation.
3. Cluster the vertices by applying k-means++ to the scores.
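The three steps above can be sketched as follows. We use standard HITS hub/authority scores obtained by power iteration, and a plain k-means step as a stand-in for the k-means++ initialised clustering; the paper's full method uses four scores per vertex, which are not reproduced here, and all function names are ours:

```python
import numpy as np

def hits_scores(A, iters=100):
    """Standard HITS: authority (x) and hub (y) scores of a directed
    adjacency matrix A, computed by power iteration."""
    n = A.shape[0]
    x, y = np.ones(n), np.ones(n)
    for _ in range(iters):
        x = A.T @ y; x /= np.linalg.norm(x)   # authorities
        y = A @ x;  y /= np.linalg.norm(y)    # hubs
    return x, y

def kmeans(X, k=4, iters=50, seed=0):
    """Plain k-means with random initialisation (a simplified stand-in
    for the k-means++ pipeline used in the paper)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centres[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centres[j] = X[labels == j].mean(0)
    return labels

# Steps 1-3 on a random directed graph: scores, vertex features, clustering.
rng = np.random.default_rng(1)
A = (rng.random((60, 60)) < 0.1).astype(float)
auth, hub = hits_scores(A)
labels = kmeans(np.column_stack([auth, hub]), k=4)
```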
For the scoring step described in the main text, we use the scores C^HITS_in, C^HITS_out, P^HITS_in and P^HITS_out. This formulation is purely score-based, and it does not leverage other network information, such as the scores of the neighbours of a vertex. In this section, we detail an additional scoring variant we considered. For the alternative formulation, heuristically, a vertex i should have a high P_in score if
1. it has incoming edges from vertices with high C_out scores and does not have incoming edges from vertices with high C_in scores;
2. it does not have many outgoing edges to vertices with high C_out or C_in scores.
Additionally, a vertex i should have a high P_out score if
1. it has outgoing edges to vertices with high C_in scores and does not have outgoing edges to vertices with high C_out scores;
2. it does not have many incoming edges from vertices with high C_out or C_in scores.
Combining these, we propose the alternative scoring scheme given in Eqs. (B.8) and (B.9). To decide which approach to use in the main paper, we considered both scoring schemes, together with two variants of the HITS method based on the alternative scoring in Eqs. (B.8) and (B.9). We selected the scoring scheme presented in the main body, as it outperforms the alternatives on all three benchmarks.
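The heuristics above translate directly into neighbour-aware score updates; the following sketch uses unit weights and our own sign convention, which need not match the exact form of Eqs. (B.8) and (B.9):

```python
import numpy as np

def alt_periphery_scores(A, C_in, C_out):
    """Neighbour-aware P_in / P_out scores following the stated heuristic.

    With A[i, j] = 1 for an edge i -> j, (A.T @ v)[i] sums v over the
    in-neighbours of i and (A @ v)[i] over its out-neighbours. P_in rewards
    in-edges from high-C_out vertices and penalises in-edges from high-C_in
    vertices and out-edges to high-C vertices; P_out is the mirror image.
    The unit weights are an assumption, not taken from the paper.
    """
    P_in = A.T @ C_out - A.T @ C_in - A @ (C_out + C_in)
    P_out = A @ C_in - A @ C_out - A.T @ (C_out + C_in)
    return P_in, P_out
```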

C Stochastic Block Model Fitting: The MaxLike Method
In this section, we detail the algorithms for fitting a stochastic block model to the proposed structure. We implemented MaxLike from [3] as follows.
1. Assign all vertices to a random set (CurrentPartition).
2. Calculate the likelihood of the random partition, and store it and the partition itself in BestPartition.
3. Make a list List1 of all vertices.
4. Find the vertex in List1 for which the movement to a new set increases the likelihood the most (or decreases it the least). Perform the move and remove the vertex from List1.
5. If the likelihood of the current partition is larger than that of BestPartition, replace BestPartition with the current partition.
6. If List1 is not empty, go to Step 4.
7. If BestPartition has improved since Step 3, set CurrentPartition to BestPartition and go to Step 3.
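These steps can be sketched as follows, using a Bernoulli block-density log-likelihood as a stand-in for the likelihood of [3]; the exact likelihood and the restart scheme in the paper may differ:

```python
import numpy as np

def loglik(A, z, K):
    """Bernoulli SBM log-likelihood of labels z with MLE block densities."""
    ll = 0.0
    for r in range(K):
        for s in range(K):
            rows, cols = z == r, z == s
            nr, ns = rows.sum(), cols.sum()
            m = nr * ns - (nr if r == s else 0)   # possible edges, no loops
            if m == 0:
                continue
            e = A[np.ix_(rows, cols)].sum()
            p = min(max(e / m, 1e-9), 1 - 1e-9)
            ll += e * np.log(p) + (m - e) * np.log(1 - p)
    return ll

def maxlike(A, K=4, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    z = rng.integers(K, size=n)                      # Step 1
    best_z, best_ll = z.copy(), loglik(A, z, K)      # Step 2
    improved = True
    while improved:                                  # Step 7: outer loop
        improved = False
        todo = list(range(n))                        # Step 3
        while todo:                                  # Steps 4-6
            move, move_ll = None, -np.inf
            for v in todo:                           # Step 4: best single move
                keep = z[v]
                for s in range(K):
                    if s == keep:
                        continue
                    z[v] = s
                    cur = loglik(A, z, K)
                    if cur > move_ll:
                        move_ll, move = cur, (v, s)
                z[v] = keep
            v, s = move
            z[v] = s                                 # perform the move
            todo.remove(v)
            if move_ll > best_ll:                    # Step 5
                best_ll, best_z = move_ll, z.copy()
                improved = True
        z = best_z.copy()                            # restart from the best
    return best_z, best_ll
```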

D Synthetic Benchmarks
In this section, we present additional content and results which relate to our synthetic benchmarks.

(a) Similarity Measures
Adjusted Rand index
The Adjusted Rand Index (ARI) [4] measures the similarity between two partitions of a data set. Let V = (v_1, v_2, ..., v_n) be a set of vertices we wish to cluster, let X = (X_1, X_2, X_3, X_4) be a partition of V into four clusters, and let Y = (Y_1, Y_2, Y_3, Y_4) be the ideal partition. We let a be the number of vertex pairs that are placed in the same set of X and in the same set of Y; b be the number of vertex pairs that are placed in the same set of X but in different sets of Y; c be the number of vertex pairs that are placed in different sets of X but in the same set of Y; and d be the number of vertex pairs that are placed in different sets of X and in different sets of Y. The Adjusted Rand Index [4] is then

ARI(X, Y) = 2(ad − bc) / ((a + b)(b + d) + (a + c)(c + d)).

The ARI is bounded by ±1 with an expected value of 0. An ARI close to 1 indicates an almost perfect match between the two partitions, whereas an ARI close to −1 indicates that the agreement between the two partitions is less than what is expected from random labellings (which have an ARI of 0) and points towards complementary partitions.

Normalised Mutual Information (NMI) and Variation of Information (VOI)
These measures are based on the concepts of entropy and mutual information [5]. For a random variable X with values in a discrete set S_X, and a random variable Y with values in a discrete set S_Y [5], the entropy of X is

H(X) = − Σ_{a ∈ S_X} P(X = a) log(P(X = a)),

and the mutual information MI(X, Y) of X and Y is given by

MI(X, Y) = Σ_{a ∈ S_X} Σ_{b ∈ S_Y} P(X = a, Y = b) log( P(X = a, Y = b) / (P(X = a) P(Y = b)) ).

The variation of information VOI(X, Y) is defined as

VOI(X, Y) = H(X) + H(Y) − 2 MI(X, Y),

see for example [6]. Thus, if MI(X, Y) = H(X) = H(Y), indicating that the information in X is perfectly captured by the information in Y, VOI(X, Y) has a value of 0. If MI(X, Y) = 0, then VOI(X, Y) takes its largest possible value. Normalised mutual information NMI(X, Y), defined in [7], rescales MI(X, Y) by the entropies H(X) and H(Y) so that it lies between 0 and 1.

(b) Additional Results for Benchmark 1
The NMI and VOI results are shown in Table 1 and Table 2. Broadly, the results are qualitatively similar to the ARI results.
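The similarity measures above are short to implement; the following self-contained sketch uses the standard pair-count form of the ARI and base-2 logarithms for the VOI (the base is our choice, matching the −log_2(0.25) = 2 value quoted below):

```python
import numpy as np
from itertools import combinations
from collections import Counter

def ari(x, y):
    """Adjusted Rand Index from the pair counts a, b, c, d."""
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x, same_y = x[i] == x[j], y[i] == y[j]
        if same_x and same_y:
            a += 1
        elif same_x:
            b += 1
        elif same_y:
            c += 1
        else:
            d += 1
    den = (a + b) * (b + d) + (a + c) * (c + d)
    return 2 * (a * d - b * c) / den if den else 1.0

def voi(x, y):
    """Variation of information H(X) + H(Y) - 2 MI(X, Y), base-2 logs."""
    n = len(x)
    pxy, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    H = lambda cnt: -sum(c / n * np.log2(c / n) for c in cnt.values())
    mi = sum(c / n * np.log2(c * n / (px[i] * py[j]))
             for (i, j), c in pxy.items())
    return H(px) + H(py) - 2 * mi
```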
For both NMI and VOI, SaPa2 and DiSum are outperformed by all other methods, which is consistent with the ARI results. Similarly, AdvHits and MaxLike outperform clustering based on degree in all cases, with HITS performing at least as well as clustering based on degree. As with ARI, AdvHits improves over the HITS method but is outperformed by MaxLike.
We highlight the VOI results for GraphTool. It outperforms the other methods for p < 2^−4, whereas ARI and NMI show the opposite performance. This discrepancy may be related to the effect that placing almost all vertices into one set has on the similarity measures. Indeed, Table 3 shows the average size of the largest set in the detected partitions of AdvHits and GraphTool in Benchmark 1 (note that the maximum achievable set size for a division into four non-empty sets is 400). For AdvHits, the average largest set size is close to an equal division into four equally-sized sets, whereas GraphTool incorrectly places most of the vertices in a single set for p < 0.06 ≈ 2^−4.06. With most vertices in the GraphTool partition in one set, the entropy of that partition will be close to 0, as −log(1 − ε) ≈ 0 for small ε. As the mutual information is by definition bounded by the entropy, it is also close to 0, and the VOI thus becomes close to the entropy of the balanced ground-truth partition, which is equal to −log_2(0.25) = 2, a relatively high value. The behaviour of NMI on GraphTool can be explained similarly; as the mutual information is close to 0, the NMI also becomes close to 0.

(c) Additional Results for Benchmark 2
Plots for Benchmark 2, which show the combinations of parameters p_1 and p_2 that yield an ARI of 0.75 or 0.9, respectively, are shown in Fig. SI 3. The results are broadly consistent, with similar performance for many regimes in the 0.9 contour, but a larger performance difference in the 0.75 contour. The 0.9 contour (Fig. SI 3) shows that the performance of GraphTool is closer to that of the likelihood methods than in the 0.75 contour. This could be related to GraphTool often placing all nodes in the same set when the structure is weak. Fig. SI 4 shows contour plots for NMI and VOI for network size n = 400; the results for n = 1000 are broadly similar. The NMI contours give the same ordering and roughly the same performance as observed for ARI. The VOI contours, displayed in the right panel of Fig. SI 4, show broadly consistent findings but with a few small deviations.

(d) Additional Results for Benchmark 3
The average ARI results, as well as the NMI and VOI results, of all methods on Benchmark 3 can be found in Fig. SI 5. We fix the size of three sets at n_1 = n/4 = 100 and vary the size of the final set, n_2. For example, to vary the size of P_out, we fix n_{C_in} = n_{C_out} = n_{P_in} = n_1 and test performance when we let n_{P_out} = n_2 ∈ {2^−3 n_1, 2^−2 n_1, ..., 2^3 n_1}, with equivalent formulations for the other sets. Thus, for n_2/n_1 = 1 we have equal-sized sets, which is equivalent to the model in Benchmark 1; for n_2/n_1 > 1 one set is larger than the remaining sets; and for n_2/n_1 < 1 one set is smaller than the others. The results are broadly consistent across the similarity measures, although there is a small deviation in the performance of AdvHits for n_2/n_1 > 2; the information-theoretic measures (NMI and VOI) yield a slightly higher performance for AdvHits than ARI. In both cases, the performance of AdvHits collapses in this region.
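Planted benchmarks of this kind can be generated in a few lines. The sketch below assumes a planted-partition construction with probability p on edges prescribed by the design matrix and a noise probability q elsewhere; the paper's exact benchmark parameters are in the main text:

```python
import numpy as np

def planted_dcp(sizes, B, p, q=0.01, seed=0):
    """Directed planted structure: an edge u -> v appears with probability
    p if B[block(u), block(v)] == 1, and with noise probability q otherwise.

    sizes: list of block sizes (e.g. [100, 100, 100, n2] for Benchmark 3);
    B: 0/1 design matrix as a NumPy array.
    """
    rng = np.random.default_rng(seed)
    z = np.repeat(np.arange(len(sizes)), sizes)   # block label per vertex
    probs = np.where(B[z][:, z] == 1, p, q)       # n x n edge probabilities
    A = (rng.random(probs.shape) < probs).astype(int)
    np.fill_diagonal(A, 0)                        # no self-loops
    return A, z
```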
The performance of the methods differs depending on whether it is a core set or a periphery set which changes size. In the periphery, smaller set sizes seem to have a minimal impact on performance, while large set sizes lead to a large decay in performance. In the core, both smaller and larger sets appear to have a significant impact.
Summarising, in Benchmarks 1 and 2 the results are qualitatively similar between the graph sizes, with methods failing in the same order and in the same manner, although the range of values where this failure occurs shifts. This is expected, as there is additional information present in larger graphs, so we would expect smaller values of p to have potentially larger effects on performance (up to detectability limits). We observe a similar effect in Benchmark 3 for n = 1000: with the increased network size (and fixed value of p = 0.1), many methods perform better over the range of block sizes, although we observe a similar decay in the performance of AdvHits.

(e) Performance of DiSum and SaPa variants
As alluded to in the main text, we consider a small number of variants of our spectral comparison methods and only display the best performing method. In this section we discuss the other variants we considered on each of our benchmarks.
SaPa Variants
Following [8], SaPa employs a similarity measure using two different types of symmetrisation for directed (asymmetric) adjacency matrices, namely bibliometric symmetrisation and degree-discounted symmetrisation. We use them here to construct two SaPa variants, SaPa1 and SaPa2, which we then cluster: SaPa1 uses the bibliometric symmetrisation and SaPa2 the degree-discounted symmetrisation, where D_i is a diagonal matrix with the in-degrees on the diagonal, D_o is a diagonal matrix with the out-degrees on the diagonal, and A_I = A + I, with I the identity matrix. In the original formulation of SaPa, these similarity matrices were then clustered via various clustering algorithms. Instead, we use the same k-means++ pipeline used previously.
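A sketch of the two symmetrisations follows. The bibliometric form A_I A_I^T + A_I^T A_I and the square-root degree discounting are our reading of [8]; the exact exponents and the use of A_I in the paper's SaPa2 are assumptions:

```python
import numpy as np

def sapa_similarities(A, tau=1e-3):
    """Two symmetrisations of a directed adjacency matrix A.

    S1 is the bibliometric symmetrisation of A_I = A + I; S2 discounts by
    in-/out-degree (regularised by tau to avoid division by zero) before
    symmetrising in the same way.
    """
    n = A.shape[0]
    AI = A + np.eye(n)
    S1 = AI @ AI.T + AI.T @ AI                 # bibliometric symmetrisation
    kin = A.sum(axis=0) + tau                  # regularised in-degrees (D_i)
    kout = A.sum(axis=1) + tau                 # regularised out-degrees (D_o)
    Ad = A / np.sqrt(kout)[:, None] / np.sqrt(kin)[None, :]
    S2 = Ad @ Ad.T + Ad.T @ Ad                 # degree-discounted symmetrisation
    return S1, S2
```

The rows of S1 or S2 can then be fed to the same k-means++ pipeline as before.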
DiSum Variants
The second fastest approach, DiSum from [9], clusters the graph using the left and right singular vectors of the regularised and degree-normalised adjacency matrix with entries ((k^out_u + m/n)(k^in_v + m/n))^−1/2 A_uv. While [9] performed a column and row clustering, as we require a clustering of the whole network, we consider four different variants of this approach. In particular, DiSum4 combines a row clustering into 2 sets with a column clustering into 2 sets to obtain 4 resultant sets (with each set defined by the intersection of a pair of sets from the different clusterings). Note that we use separate row and column clusterings rather than the combined clustering in [9]. We again cluster the resulting vectors using our standard k-means++ pipeline.
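The normalisation and embedding step can be sketched as follows, assuming m is the number of edges and n the number of vertices, so that m/n (the average degree) acts as the regulariser; the choice of dense SVD and the function name are ours:

```python
import numpy as np

def disum_embedding(A, d=2):
    """Left/right singular-vector embedding of the regularised,
    degree-normalised adjacency L_uv = A_uv / sqrt((kout_u + m/n)(kin_v + m/n)).

    Returns d-dimensional row (sender) and column (receiver) coordinates,
    which can then be clustered separately or jointly.
    """
    n = A.shape[0]
    m = A.sum()
    kout, kin = A.sum(axis=1), A.sum(axis=0)
    L = A / np.sqrt(np.outer(kout + m / n, kin + m / n))
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return U[:, :d], Vt[:d].T
```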

Results
We observe that SaPa2 and DiSum3 outperform their counterparts on Benchmarks 1 and 2. In Benchmark 3, DiSum4 outperforms DiSum3 for large differences in group sizes. However, given that DiSum3 performs reasonably in this setting and considerably outperforms DiSum4 in Benchmarks 1 and 2, we select DiSum3 as our overall DiSum method.

(f) Testing our Novel Quality Measures
In this section, we present additional results testing the relationship between our quality measures and the ground-truth partition. We display the average DCPM of the partitions from Benchmark 1 in the main body (Table 4). While the method with the highest average DCPM does not always have the highest ARI, its ARI is often close to the highest value; e.g., at p = 0.09, GraphTool has the highest average DCPM, with an ARI of 0.968, whereas the highest ARI is 0.987, only a marginal difference. For completeness, in Table 5, we also present the sample standard deviations of the p-value, DCPM and ARI. The sample standard deviation is often many orders of magnitude below the average for the p-values, and typically at least one order of magnitude below for the DCPM and ARI.

(g) Run Time On Synthetic Networks
The two panels of Fig. SI 6 show the run time of each of the methods on Benchmark 1. We note that the time comparison is advisory, as this calculation was performed on an Azure Virtual Machine. To mitigate the effects of hyperthreading, we used half of the available vCPUs, but we cannot control for other users present on the same physical hardware. Further, different approaches are implemented in different languages, with the matrix-based methods (HITS, AdvHits, SaPa and DiSum) implemented in Python using NumPy and SciPy vectorised matrix operations, and GraphTool in C++ with Python wrapping. Finally, the likelihood methods are implemented in pure Python and run under CPython, with the exception of MaxLike for n = 1000, which was run under PyPy [10]. When testing these methods under PyPy, we obtain an approximately 10× speed-up over the CPython times. The implementations of SaPa, DiSum and HITS are relatively naïve, and could be improved, for example, by exploiting the specialised matrix-construction code used in AdvHits.
We observe that the run time of many of the methods is relatively constant with respect to p, with an increase when p is small. This behaviour is especially pronounced in AdvHits, with comparatively fast times for the easier graphs and slower times for the harder graphs, likely because the method falls back to the single-vertex update scheme for small values of p. This is also likely to be related to the error bars (one sample standard deviation), which increase for low p in AdvHits for n = 400. We note that AdvHits is slower than HITS when p is small, and has a better run time than both the PyPy and non-PyPy versions of MaxLike (with 10 repeats), but may be closer in run time to a simpler likelihood optimiser, highlighting that AdvHits is a medium-speed approach in contrast to the fast approaches.

E Political Blogs Results
The PolBlogs data set from [11] consists of political blogs as nodes, with a directed edge from one blog to another denoting that the first blog contains at least one link to the second. The data set was collected on a single day during the 2004 US presidential election. After collapsing 65 multi-edges, it contains 1,490 nodes and 19,090 edges. The set of blogs is divided into 758 liberal blogs and 732 conservative blogs. For the analysis in this section, we focus on the largest weakly connected component of the network, which has n = 1,222 nodes and 19,024 edges.
The right panel of Fig. SI 7 shows the ARI between the partitions given by 11 runs (original partition with 10 additional repeats) for each of our methods and a single run of BowTie. We note a high within-method ARI and a high similarity between AdvHits and MaxLike. Furthermore, and importantly, there is a low ARI between our methods and BowTie, indicating that our methods do uncover a different structure.
Inspecting the left panel of Fig. SI 7, the HITS method gives p-values lower than 5% against the directed ER null model, but not against the directed configuration model, potentially indicating that the recovered structures can be modelled as a function of the in- and out-degree distributions. The other methods yield partitions with p-values less than 5% on both tests. Considering the DCPM, we note that the MaxLike partition has the highest value, and thus we focus on this structure for the remainder of the analysis.
Fig. SI 8A shows the structure uncovered with MaxLike, presenting both a network visualisation and a matrix showing the density of edges between each pair of sets. We note a visible 'L'-shaped structure, indicating a split into the four roles. Fig. SI 8B shows the degree patterns of each set, which broadly match our intuition from the 'L'-shaped structure. It is notable that not all nodes with very high out-degree are in C_out, although all four nodes in C_out are in the top 4% of nodes by out-degree.
To validate the structure, we first explore the relationship between the sets we uncover and the political division of the network. Figure SI 8C gives the confusion matrix between each of our divisions and the political division. We observe that the sets divide into two roughly similarly-sized sets, which are not in a one-to-one relationship with the Left/Right party division of the blogs. Hence there is evidence that, similar to the division in [12], our partition relates to the role the sets play in the network.

[Caption of Figs. SI 7 and SI 8:] For the slow methods, we use a single run and thus display a blank (white) square. To compare to bow-tie, we compare both to the raw partition into 7 sets and to a partition formed by a subset of the nodes corresponding to the main three sets. We focus on the MaxLike partition as it has the highest DCPM. In A, we show summary network diagrams of the structure. In the first summary, the size of each node is proportional to the number of nodes in the corresponding set, and the width of the lines is given by the percentage of edges that are present. The second summary displays the percentage of edges present between each pair of blocks, allowing for a visualisation that highlights the 'L'-structure. In B, we present the in- and out-degree distributions of each of the groups. Finally, in C, we compare our division into four groups with the political division of the blogs.
A possible interpretation of the directed core–periphery structure is that the C_in set represents blogs that are seen as authorities, and thus do not link to many other blogs (but are linked to themselves), whereas the opposite is true of those in C_out. A potentially related structure was observed in the 15-block SBM of [13], in which some blocks cite but are not cited, and other blocks are highly cited either in general or by specific groups. An inspection of the web addresses corresponding to the nodes in each set suggests that certain sets are enriched for 'blogspot' sites. A 'blogspot' site is a free blogging site that requires less expertise to set up than a full website. We see such an authority set in the MaxLike partition: the fraction of 'blogspot' sites is 0.21 in C_in versus more than 0.43 in all other sets. Moreover, [11] provides a list of the top 20 conservative and the top 20 liberal blogs, obtained by filtering the top ≈100 liberal and conservative blogs by degree and ranking them by the number of citations of blog posts in an external index (see [11] for details). In the MaxLike partition, 39 out of these 40 top blogs are in C_in, providing further evidence for our hypothesis that C_in is an authority set.
In conclusion, our method finds a division of the PolBlogs data with an 'L'-shaped structure. We further investigated this structure and found that our C_in set appears to represent authorities, in contrast to C_out, which represents non-authority core nodes. We validate this split using two covariates: first, the presence of 'blogspot' websites, and second, the list of top blogs provided by [11].