##### Correction

- This article has a correction: Correction to ‘Intermediacy of publications’, *Royal Society Open Science*, volume 7, issue 9, 02 September 2020. https://doi.org/10.1098/rsos.201326

## Abstract

Citation networks of scientific publications offer fundamental insights into the structure and development of scientific knowledge. We propose a new measure, called intermediacy, for tracing the historical development of scientific knowledge. Given two publications, an older and a more recent one, intermediacy identifies publications that seem to play a major role in the historical development from the older to the more recent publication. The identified publications are important in connecting the older and the more recent publication in the citation network. After providing a formal definition of intermediacy, we study its mathematical properties. We then present two empirical case studies, one tracing historical developments at the interface between the community detection literature and the scientometric literature and one examining the development of the literature on peer review. We show both conceptually and empirically how intermediacy differs from main path analysis, which is the most popular approach for tracing historical developments in citation networks. Main path analysis tends to favour longer paths over shorter ones, whereas intermediacy has the opposite tendency. We conclude that, compared with main path analysis, intermediacy offers a more principled approach for tracing the historical development of scientific knowledge.

### 1. Introduction

Citation networks provide invaluable information for tracing historical developments in science. The idea of tracing scientific developments based on citation data goes back to Eugene Garfield, the founder of the Science Citation Index. In a report published more than 50 years ago, Garfield *et al*. [1] concluded that citation analysis is ‘a valid and valuable means of creating accurate historical descriptions of scientific fields’. Garfield also developed a software tool called HistCite that visualizes citation networks of scientific publications. This tool supports users in tracing historical developments in science, a process sometimes referred to as *algorithmic historiography* by Garfield et al. [2–4]. More recently, a software tool called CitNetExplorer [5] was developed that has similar functionality but offers more flexibility in analysing large-scale citation networks. Other software tools, most notably CiteSpace [6] and CRExplorer [7,8], provide alternative approaches for tracing scientific developments based on citation data.

Main path analysis, originally proposed by Hummon & Doreian [9], is a widely used technique for tracing historical developments in science. Given a citation network, main path analysis identifies one or more paths in the network that are considered to represent the most important scientific developments. Many variants and extensions of main path analysis have been proposed [10–16], not only for citation networks of scientific publications but also for patent citation networks [17–21]. However, despite the large body of literature in which main path analysis is used, we question whether the technique is really suitable for tracing historical developments in science. We show that main path analysis has the tendency to favour longer citation paths over shorter ones. In our view, this is an undesirable property that leads to counterintuitive results.

As an alternative to main path analysis, we introduce a new approach for tracing historical developments in science based on citation networks. This approach is based on a measure that we call intermediacy. Given two publications dealing with a specific research topic, an older publication and a more recent one, intermediacy can be used to identify publications that appear to play a major role in the historical development from the older to the more recent publication. These are publications that, based on citation links, are important in connecting the older and the more recent publication.

Like main path analysis, intermediacy can be used to identify paths between publications in a citation network. However, as we show both conceptually and empirically, there are fundamental differences between intermediacy and main path analysis. Most significantly, whereas main path analysis tends to favour longer citation paths over shorter ones, intermediacy has the opposite tendency. For the purpose of tracing historical developments in science, we argue that intermediacy yields better results than main path analysis.

Intermediacy might seem similar to centrality, but there is an essential difference. Centrality measures [22], such as degree centrality, closeness centrality, betweenness centrality and eigenvector centrality, indicate how central a node is in a network as a whole. Intermediacy, by contrast, is defined relative to a specific source node and a specific target node. This is why centrality measures cannot be used to capture the idea of intermediacy. Preliminary work on intermediacy was presented in [23].

### 2. Intermediacy

Consider a directed acyclic graph *G* = (*V*, *E*), where *V* denotes the set of nodes of *G* and *E* denotes the set of edges of *G*. The edges are directed. We are interested in the connectivity between a source *s* ∈ *V* and a target *t* ∈ *V*. Only nodes that are located on a path from source *s* to target *t* are of relevance. We refer to such a path as a source-target path. We assume that each node *v* ∈ *V* is located on a source-target path.

### Definition 2.1.

Given a source *s* and a target *t*, a path from *s* to *t* is called a *source-target path*.

In this paper, our focus is on citation networks of scientific publications. In this context, nodes are publications and edges are citations. We choose edges to be directed from a citing publication to a cited publication. Hence, edges point backward in time. This means that the source is a more recent publication and the target an older one.

Informally, the more important the role of a node *v* ∈ *V* in connecting source *s* to target *t*, the higher the intermediacy of *v*. To formally define intermediacy, we assume that each edge *e* ∈ *E* is either *active* or *inactive*. An edge is active with a certain probability *p*, where *p* ∈ (0, 1). This probability is the same for all edges. We exclude the possibility that this probability equals 0 or 1, since this would not yield useful results. Based on the notion of active and inactive edges, we introduce the following definitions.

### Definition 2.2.

If all edges on a path are active, the path is called *active*. Otherwise, the path is called *inactive*. If a node *v* ∈ *V* is located on an active source-target path, the node is called *active*. Otherwise, the node is called *inactive*.

For two nodes *u*, *v* ∈ *V*, we use *X*_{uv} to indicate whether there is an active path (or multiple active paths) from node *u* to node *v* (*X*_{uv} = 1) or not (*X*_{uv} = 0). The probability that there is an active path from node *u* to node *v* is denoted by $\Pr(X_{uv}=1)$. We use *X*_{st}(*v*) to indicate whether there is an active source-target path that goes through node *v* (*X*_{st}(*v*) = 1) or not (*X*_{st}(*v*) = 0). The probability that there is an active source-target path that goes through node *v* is $\Pr(X_{st}(v)=1)=\Pr(X_{sv}=1)\Pr(X_{vt}=1)$. This probability equals the probability that node *v* is active. The product form holds because, in an acyclic graph, no edge can lie both on a path from *s* to *v* and on a path from *v* to *t*, so the two events depend on disjoint sets of edges and are independent.

Intermediacy can now be defined as follows.

### Definition 2.3.

The *intermediacy* *ϕ*_{v} of a node *v* ∈ *V* is the probability that *v* is active, that is,

$$\phi_v = \Pr(X_{st}(v)=1) = \Pr(X_{sv}=1)\,\Pr(X_{vt}=1).$$

In the interpretation of intermediacy, we focus on the ranking of nodes relative to each other. We do not consider the absolute values of intermediacy. For instance, suppose the intermediacy of node *v* ∈ *V* is twice as high as the intermediacy of node *u* ∈ *V*. We then consider node *v* to be more important than node *u* in connecting the source *s* and the target *t*. However, we do not consider node *v* to be twice as important as node *u*.
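To make definition 2.3 concrete, the following minimal sketch computes intermediacy exactly on a very small graph by enumerating every combination of active and inactive edges. The function names and the toy graph are assumptions introduced here for illustration; they do not come from the paper, and the brute-force enumeration is only feasible for a handful of edges.

```python
from itertools import product


def reachable(active_edges, start, goal):
    """Return True if goal can be reached from start using only active edges."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for a, b in active_edges:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def exact_intermediacy(edges, s, t, p):
    """Exact intermediacy of every node, enumerating all 2^|E| edge states."""
    nodes = {n for e in edges for n in e}
    phi = dict.fromkeys(nodes, 0.0)
    for states in product([True, False], repeat=len(edges)):
        active = [e for e, on in zip(edges, states) if on]
        # Probability of this particular combination of active/inactive edges.
        weight = 1.0
        for on in states:
            weight *= p if on else 1.0 - p
        for v in nodes:
            # v is active if it is reachable from s and t is reachable from v.
            if reachable(active, s, v) and reachable(active, v, t):
                phi[v] += weight
    return phi


# Hypothetical toy graph: a short path s -> w -> t and a longer path s -> u -> v -> t.
edges = [("s", "w"), ("w", "t"), ("s", "u"), ("u", "v"), ("v", "t")]
phi = exact_intermediacy(edges, "s", "t", p=0.5)
for node in sorted(phi, key=phi.get, reverse=True):
    print(node, round(phi[node], 4))
```

In this toy graph, node *w* lies on a path of length 2 and nodes *u* and *v* lie on a path of length 3, so *w* should rank above *u* and *v*, in line with the focus on rankings rather than absolute values.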

We now present an analysis of the mathematical properties of intermediacy. The proofs of the mathematical results provided below can be found in appendix A.

#### 2.1. Limit behaviour

To get a better understanding of intermediacy, we study the behaviour of intermediacy in two limit cases, namely the case in which the probability *p* that an edge is active goes to 0 and the case in which the probability *p* goes to 1. In each of the two cases, the ranking of the nodes in a graph based on intermediacy turns out to have a natural interpretation. The difference between the two cases is illustrated in figure 1*a*.

Let ℓ_{v} denote the length of the shortest source-target path going through node *v* ∈ *V*. The following theorem states that in the limit as the probability *p* that an edge is active tends to 0, the ranking of nodes based on intermediacy coincides with the ranking based on ℓ_{v}. Nodes located on shorter source-target paths are more intermediate than nodes located on longer source-target paths.

#### Theorem 2.4.

*In the limit as the probability* *p* *tends to* 0, ℓ_{u} < ℓ_{v} *implies* *ϕ*_{u} > *ϕ*_{v}.

The intuition underlying this theorem is as follows. When the probability that an edge is active is close to 0, almost all edges are inactive. Consequently, almost all source-target paths are inactive as well. However, from a relative point of view, longer source-target paths are more likely to be inactive than shorter source-target paths. This means that nodes located on shorter source-target paths are more likely to be active than nodes located on longer source-target paths (even though for all nodes the probability of being active is close to 0). Nodes located on shorter source-target paths, therefore, have a higher intermediacy than nodes located on longer source-target paths.
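This intuition can be made slightly more explicit. As a sketch only (the formal proof is the one referred to in appendix A), let *n*_{v} denote the number of shortest source-target paths going through node *v*; this symbol is introduced here for illustration and is not part of the original notation. Each shortest path is active with probability $p^{\ell_v}$, and any event in which two distinct paths are both active involves at least $\ell_v + 1$ edges, so

$$\phi_v = n_v\,p^{\ell_v} + O\!\left(p^{\ell_v+1}\right) \quad (p \to 0), \qquad \frac{\phi_v}{\phi_u} = \frac{n_v}{n_u}\,p^{\ell_v-\ell_u}\,\bigl(1+O(p)\bigr) \to 0 \quad \text{if } \ell_u < \ell_v .$$

For sufficiently small *p*, the exponent therefore dominates the ranking, and the number of shortest paths only matters for breaking ties between nodes with equal ℓ.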

We now consider the limit case in which the probability *p* that an edge is active goes to 1. Let *σ*_{v} denote the number of edge-independent source-target paths going through node *v* ∈ *V*. Theorem 2.5 states that in the limit as *p* tends to 1, the ranking of nodes based on intermediacy coincides with the ranking based on *σ*_{v}. The larger the number of edge-independent source-target paths going through a node, the higher the intermediacy of the node.

#### Theorem 2.5.

*In the limit as the probability* *p* *tends to* 1, *σ*_{u} > *σ*_{v} *implies* *ϕ*_{u} > *ϕ*_{v}.

Intuitively, this theorem can be understood as follows. When the probability that an edge is active is close to 1, almost all edges are active. Consequently, almost all source-target paths are active as well, and so are almost all nodes. A node is inactive only if all source-target paths going through the node are inactive. If there are *σ* edge-independent source-target paths that go through a node, this means that the node can be inactive only if there are at least *σ* inactive edges. Consider two nodes *u*, *v* ∈ *V*. Suppose that the number of edge-independent source-target paths going through node *v* is larger than the number of edge-independent source-target paths going through node *u*. In order to be inactive, node *v* then requires more inactive edges than node *u*. This means that node *v* is less likely to be inactive than node *u* (even though for both nodes, the probability of being inactive is close to 0). Hence, node *v* has a higher intermediacy than node *u*. More generally, nodes located on a larger number of edge-independent source-target paths have a higher intermediacy than nodes located on a smaller number of edge-independent source-target paths.

#### 2.2. Parameter choice

The probability *p* that an edge is active is a free parameter of intermediacy for which one needs to choose an appropriate value. The results presented above are concerned with the behaviour of intermediacy in the limit cases in which the probability *p* tends to either 0 or 1. Figure 1*b* provides some insight into the behaviour of intermediacy for values of the probability *p* that are in between these two extremes. The figure shows two graphs. In the left graph, there is a direct path (i.e. a path of length 1) from node *u* to node *v*. There are no indirect paths. In this graph, the probability that there is an active path from *u* to node *v* equals *p*. In the right graph, there is no direct path from node *u* to node *v*, but there are *k* indirect paths of length 2. Each of these paths has a probability of *p*^{2} of being active. Consequently, the probability that there is at least one active path from node *u* to node *v* equals 1 − (1 − *p*^{2})^{k}. The bar chart in figure 1*b* shows for different values of *k* the values of *p* for which the probability that there is an active path from node *u* to node *v* is higher (in orange) or lower (in grey) in the left graph than in the right graph. For instance, suppose that *k* = 5. For *p* < 0.22, the probability that there is an active path from node *u* to node *v* is higher in the left graph than in the right graph. For *p* > 0.22, the situation is the other way around. If the probability *p* that an edge is active is set to 0.22, a direct path between two nodes is considered equally strong as five indirect paths of length 2. Based on figure 1*b*, one can set the probability *p* to a value that one considers appropriate for a particular analysis.
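The crossover value discussed above can be reproduced numerically. The sketch below solves *p* = 1 − (1 − *p*^{2})^{k} by bisection; the function name, the tolerance and the bracketing interval are illustrative choices made here, not part of the paper.

```python
def crossover_probability(k, tol=1e-10):
    """Find p in (0, 1) where a direct edge and k two-step paths are equally strong.

    Solves p = 1 - (1 - p**2)**k by bisection; requires k >= 2.
    """
    if k < 2:
        raise ValueError("for k < 2 the direct edge is stronger for every p in (0, 1)")

    def gap(p):
        # Positive when the k indirect paths form the stronger connection.
        return 1.0 - (1.0 - p * p) ** k - p

    lo, hi = 1e-9, 1.0 - 1e-3  # gap(lo) < 0 and gap(hi) > 0 for moderate k >= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


for k in (2, 5, 10):
    print(k, round(crossover_probability(k), 3))
```

For *k* = 5, the computed value should agree with the threshold of about 0.22 mentioned in the text. Below the crossover the single direct edge is the stronger connection; above it the *k* indirect paths are.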

#### 2.3. Path addition and contraction

Next, we study two additional properties of intermediacy, the property of path addition and the property of path contraction. We show that both adding paths and contracting paths lead to an increase in intermediacy. Path addition and path contraction are important properties because they reflect the basic intuition underlying the idea of intermediacy. (Of course, in practice, paths cannot simply be added or contracted in a citation network. However, we can have two regions in a citation network that are topologically identical except for a path addition or a path contraction. Our theoretical analysis can be interpreted as an analysis comparing the intermediacy of the nodes in the two regions of the citation network.)

We start by considering the property of path addition. We define path addition as follows.

#### Definition 2.6.

Consider a directed acyclic graph *G* = (*V*, *E*) and two nodes *u*, *v* ∈ *V* such that there does not exist a path from node *v* to node *u*. *Path addition* is the operation in which a new path from node *u* to node *v* is added. Let ℓ denote the length of the new path. If ℓ = 1, an edge (*u*, *v*) is added. If ℓ > 1, nodes *w*_{1}, …, *w*_{ℓ−1} and edges (*u*, *w*_{1}), (*w*_{1}, *w*_{2}), …, (*w*_{ℓ−2}, *w*_{ℓ−1}), (*w*_{ℓ−1}, *v*) are added.

This definition includes the condition that there does not exist a path from node *v* to node *u*. This condition ensures that the graph *G* will remain acyclic after adding a path. The following theorem states that adding a path increases intermediacy.

#### Theorem 2.7.

*Consider a directed acyclic graph* *G* = (*V*, *E*), *a source* *s* ∈ *V*, *and a target* *t* ∈ *V*. *In addition*, *consider two nodes* *u*, *v* ∈ *V* *such that there does not exist a path from node* *v* *to node* *u*. *Adding a path from node* *u* *to node* *v* *strictly increases the intermediacy* *ϕ*_{w} *of any node* *w* ∈ *V* *located on a path from source* *s* *to node* *u* *or from node* *v* *to target* *t*.

Theorem 2.7 does not depend on the probability *p*. Adding a path always increases intermediacy, regardless of the value of *p*. To illustrate the theorem, consider figure 2*a*,*b*. The graph in figure 2*b* is identical to the one in figure 2*a* except that a path from node *u* to node *v* has been added. As can be seen, adding this path has increased the intermediacy of nodes located between source *s* and node *u* or between node *v* and target *t*, including nodes *u* and *v* themselves. While the intermediacy of other nodes has not changed, the intermediacy of these nodes has increased from 0.17 to 0.23. This reflects the basic intuition that, after a path from node *u* to node *v* has been added, going from source *s* to target *t* through nodes *u* and *v* has become ‘easier’ than it was before. This means that nodes located between source *s* and node *u* or between node *v* and target *t* have become more important in connecting the source and the target. Consequently, the intermediacy of these nodes has increased.
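A small worked example may help to see why the increase holds for every *p*. Assume a hypothetical chain graph *s* → *u* → *v* → *t* (this is not the graph of figure 2). Before path addition, the only source-target path has length 3, so

$$\phi_u = \Pr(X_{su}=1)\,\Pr(X_{ut}=1) = p \cdot p^{2} = p^{3}.$$

Now add a path of length 2 from *u* to *v* through a new node *w*_{1}. The probability of an active connection from *u* to *v* becomes $1-(1-p)(1-p^{2}) = p + p^{2} - p^{3}$, and therefore

$$\phi_u = p\,\bigl(p + p^{2} - p^{3}\bigr)\,p = p^{3} + p^{4}(1-p) > p^{3} \quad \text{for all } p \in (0,1).$$

The same expression holds for *v*, *s* and *t* in this example. For *p* = 0.5, the intermediacy rises from 0.125 to 0.15625.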

We now consider the property of path contraction. We use *V*_{uv} to denote the set of all nodes located on a path from node *u* to node *v*, including nodes *u* and *v* themselves. Path contraction is then defined as follows.

#### Definition 2.8.

Consider a directed acyclic graph *G* = (*V*, *E*) and two nodes *u*, *v* ∈ *V* such that there exists at least one path from node *u* to node *v*. *Path contraction* is the operation in which all nodes in *V*_{uv} are contracted. This means that the nodes in *V*_{uv} are replaced by a new node *r*. Edges pointing from a node $w\notin {V}_{uv}$ to nodes in *V*_{uv} are replaced by a single new edge (*w*, *r*). Edges pointing from nodes in *V*_{uv} to a node $w\notin {V}_{uv}$ are replaced by a single new edge (*r*, *w*). Edges between nodes in *V*_{uv} are removed.

The following theorem states that contracting paths increases intermediacy.

#### Theorem 2.9.

*Consider a directed acyclic graph* *G* = (*V*, *E*), *a source* *s* ∈ *V*, *and a target* *t* ∈ *V*. *In addition, consider two nodes* *u*, *v* ∈ *V* *such that there exists at least one path from node* *u* *to node* *v* *and such that nodes in* *V*_{uv} *do not have neighbours outside* *V*_{uv} *except for incoming neighbours of node* *u* *and outgoing neighbours of node* *v*. *Contracting paths from node* *u* *to node* *v* *strictly increases the intermediacy* *ϕ*_{w} *of any node* *w* ∈ *V* *located on a path from source* *s* *to node* *u* *or from node* *v* *to target* *t*.

Like theorem 2.7, theorem 2.9 does not depend on the probability *p*. Theorem 2.9 is illustrated in figure 2*b*,*c*. The graph in figure 2*c* is identical to the one in figure 2*b* except that paths from node *u* to node *v* have been contracted. As a result, there has been an increase in the intermediacy of nodes located between source *s* and node *u* or between node *v* and target *t*, including nodes *u* and *v* themselves (which have been contracted into a new node *r*). While the intermediacy of other nodes has not changed, the intermediacy of these nodes has increased from 0.23 to 0.34. This reflects the basic intuition that, after paths from node *u* to node *v* have been contracted, going from source *s* to target *t* through nodes *u* and *v* has become ‘easier’ than it was before. In other words, nodes located on a path from source *s* to target *t* going through nodes *u* and *v* have become more important in connecting the source and the target, and hence the intermediacy of these nodes has increased.
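As a minimal illustration under assumed conditions, consider again a hypothetical chain graph *s* → *u* → *v* → *t* (not the graph of figure 2). Here *V*_{uv} = {*u*, *v*}, and the only neighbours outside *V*_{uv} are the incoming neighbour *s* of *u* and the outgoing neighbour *t* of *v*, so the condition of theorem 2.9 is satisfied. Contracting *u* and *v* into a new node *r* gives the graph *s* → *r* → *t*, and

$$\phi_u = \phi_v = p^{3} \quad \text{before contraction}, \qquad \phi_r = p^{2} > p^{3} \quad \text{after contraction, for all } p \in (0,1).$$

The intermediacy of *s* and *t* increases in the same way, from *p*^{3} to *p*^{2}.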

#### 2.4. Alternative approaches

How does intermediacy differ from alternative approaches? We consider three alternative approaches. One is main path analysis [9]. This is the most commonly used approach for tracing the historical development of scientific knowledge in citation networks. The second alternative approach is the expected path count approach. Like intermediacy, the expected path count approach distinguishes between active and inactive edges and focuses on active source-target paths. While intermediacy considers the probability that there is at least one active source-target path going through a node, the expected path count approach considers the expected number of active source-target paths that go through a node. The third alternative approach is resistance [24–26]. Resistance is a measure of the distance between nodes in a graph. We use it to define an alternative to intermediacy.

Consider the graph shown in figure 3*a*. To get from source *s* to target *t*, one could take either a path going through nodes *u* and *v* or the path going through node *w*. Based on intermediacy, the latter path represents a stronger connection between the source and the target than the former one. This follows from the path contraction property.

Interestingly, main path analysis gives the opposite result, as can be seen in figure 3*b*. For each edge, the figure shows the search path count, which is the number of source-target paths that go through the edge. There are two source-target paths that go through (*s*, *u*) and (*v*, *t*), while all other edges are included only in a single source-target path. Because the search path counts of (*s*, *u*) and (*v*, *t*) are higher than the search path counts of (*s*, *w*) and (*w*, *t*), main path analysis favours paths going through nodes *u* and *v* over the path going through node *w*. This is exactly opposite to the result obtained using intermediacy. Figure 3*b* makes clear that main path analysis yields outcomes that violate the path contraction property. Main path analysis tends to favour longer paths over shorter ones. For the purpose of identifying publications that play an important role in connecting an older and a more recent publication, we consider this behaviour to be undesirable. There are various variants of main path analysis, which all show the same type of undesirable behaviour.
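The search path counts described above can be computed with a short script. The graph below is an assumption: it is one hypothetical graph that is consistent with the description of figure 3 (two paths between *u* and *v*, and a single path through *w*), not a copy of the figure. The function name is also illustrative, and the sketch only computes the counts; it does not implement the subsequent path selection of main path analysis.

```python
from functools import lru_cache


def search_path_counts(edges, s, t):
    """Number of source-target paths going through each edge of a DAG."""
    successors, predecessors = {}, {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
        predecessors.setdefault(b, []).append(a)

    @lru_cache(maxsize=None)
    def paths_from_source(node):
        # Number of distinct paths from s to node.
        if node == s:
            return 1
        return sum(paths_from_source(q) for q in predecessors.get(node, []))

    @lru_cache(maxsize=None)
    def paths_to_target(node):
        # Number of distinct paths from node to t.
        if node == t:
            return 1
        return sum(paths_to_target(q) for q in successors.get(node, []))

    # A source-target path through (a, b) is a path s -> a followed by a path b -> t.
    return {(a, b): paths_from_source(a) * paths_to_target(b) for a, b in edges}


# Hypothetical graph: s -> w -> t, plus s -> u, two routes u -> v and u -> x -> v, and v -> t.
edges = [("s", "w"), ("w", "t"), ("s", "u"), ("u", "v"),
         ("u", "x"), ("x", "v"), ("v", "t")]
for edge, count in search_path_counts(edges, "s", "t").items():
    print(edge, count)
```

In this assumed graph, the edges (*s*, *u*) and (*v*, *t*) receive a count of 2 and all other edges a count of 1, which matches the pattern described in the text and shows why a method based on these counts prefers the longer route.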

Instead of focusing on the probability of the existence of at least one active source-target path, as is done by intermediacy, one could also focus on the expected number of active source-target paths going through a node. This alternative approach, which we refer to as the expected path count approach, is illustrated in figure 3*c*. As can be seen in the figure, nodes *u* and *v* have a higher expected path count than node *w*. Paths going through nodes *u* and *v* may, therefore, be favoured over the path going through node *w*. Figure 3*c* shows that, unlike intermediacy, the expected path count approach does not have the path contraction property. Depending on the probability *p*, contracting paths may cause expected path counts to decrease rather than increase. Because the expected path count approach does not have the path contraction property, we do not consider this approach to be a suitable alternative to intermediacy.

Finally, in figure 4, we illustrate the difference between intermediacy and resistance [24–26]. To get from source *s* to target *t*, one could take either a path going through node *u* or a path going through node *v*. Based on intermediacy, node *v* offers a stronger connection between the source and the target than node *u* (figure 4*a*). This follows from the path addition property. On the other hand, based on resistance, nodes *u* and *v* offer equally strong connections between the source and the target (figure 4*b*). Resistance is a measure of the distance between two nodes in a graph. Our interest focuses on the resistance between the source and the target. We define the resistance of a specific node as the resistance between the source and the target when only paths going through the node of interest are taken into account. Nodes *u* and *v* both have the same resistance of 2. According to the path addition property, node *v* should have a lower resistance than node *u*. (A lower resistance corresponds to a higher connectedness of the source and the target.) The equal resistance of nodes *u* and *v* shows that resistance does not have the path addition property.

### 3. Empirical analysis

We now present two case studies that serve as empirical illustrations of the use of intermediacy. Case 1 deals with the topic of community detection and its relationship with scientometric research. This case was selected because we are well acquainted with the topic and because we expect many readers of the present paper to be familiar with the topic as well. Case 2 deals with the topic of peer review. This case is of interest because it was examined using main path analysis in a recent paper by Batagelj *et al.* [27]. We consider this paper to be representative of the state of the art in main path analysis. Case 2, therefore, is well suited for demonstrating the differences between intermediacy and main path analysis.

In both case studies, the intermediacy of publications was calculated using the Monte Carlo algorithm presented in appendix B.
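For readers who want a feel for how such an estimate can be obtained, the following is a minimal Monte Carlo sketch. It is not the algorithm of appendix B, whose details are not reproduced here; it is a straightforward sampling scheme written under the definitions of §2, and the function names, the sample size and the toy graph are assumptions made for illustration.

```python
import random
from collections import defaultdict


def reach(adjacency, start):
    """Set of nodes reachable from start (including start) via adjacency lists."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def monte_carlo_intermediacy(edges, s, t, p, samples=20000, seed=0):
    """Estimate the intermediacy of every node by sampling active edge sets."""
    rng = random.Random(seed)
    nodes = {n for e in edges for n in e}
    hits = dict.fromkeys(nodes, 0)
    for _ in range(samples):
        forward, backward = defaultdict(list), defaultdict(list)
        for a, b in edges:
            if rng.random() < p:  # the edge is active in this sample
                forward[a].append(b)
                backward[b].append(a)
        # Active nodes: reachable from s and able to reach t over active edges.
        for v in reach(forward, s) & reach(backward, t):
            hits[v] += 1
    return {v: hits[v] / samples for v in nodes}


# Hypothetical toy graph: s -> w -> t and s -> u -> v -> t.
toy_edges = [("s", "w"), ("w", "t"), ("s", "u"), ("u", "v"), ("v", "t")]
estimates = monte_carlo_intermediacy(toy_edges, "s", "t", p=0.5)
print({v: round(x, 3) for v, x in sorted(estimates.items())})
```

Each sample costs two graph traversals, so the running time grows linearly with the number of edges and the number of samples. For networks of the size considered in the case studies, a more careful implementation would be needed.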

#### 3.1. Case 1: community detection and scientometrics

We analyse how a method for community detection in networks ended up being used in the field of scientometrics to construct classification systems of scientific publications. In particular, we are interested in the historical development from Newman & Girvan [28] to Klavans & Boyack [29]. These are our target and source publications. Newman & Girvan [28] introduced a new measure for community detection in networks, known as modularity, while Klavans & Boyack [29] compared different ways in which modularity-based approaches can be used to identify communities in citation networks.

Our analysis relies on data from the Scopus database produced by Elsevier. We also considered the Web of Science database produced by Clarivate Analytics. However, many citation links relevant for our analysis are missing in Web of Science. There are also missing citation links in Scopus, but for Scopus the problem is less significant than for Web of Science. We refer to van Eck & Waltman [30] for a further discussion of the problem of missing citation links.

In the Scopus database, we found *n* = 64 223 publications that are located on a citation path between our source and target publications. In total, we identified *m* = 280 033 citation links between these publications. This means that on average each publication has *k* = 2*m*/*n* ≈ 8.72 citation links, counting both incoming and outgoing links.

Figure 5*a* shows how the probability of the existence of an active path between the source and target publications depends on the parameter *p*. This probability increases from zero for *p* = 0 to almost one starting from *p* = 0.25. The vertical line indicates the value *p* = 1/*k*. At this value, traditional percolation theory for random graphs suggests that the probability that the source and target publications are connected becomes non-negligible [22]. When searching for a suitable value of *p*, the value *p* = 1/*k* suggested by percolation theory may serve as a reasonable starting point. In our case, this yields *p* ≈ 1/8.72 ≈ 0.11, resulting in a probability of about 0.40 for the existence of an active source-target path.
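The arithmetic behind this starting value is easy to reproduce. The sketch below uses the values of *n* and *m* reported above; the function name is an illustrative choice.

```python
def mean_degree(num_publications, num_citation_links):
    """Average number of citation links per publication, counting both directions."""
    return 2 * num_citation_links / num_publications


k = mean_degree(64223, 280033)
p_start = 1 / k  # starting value suggested by percolation theory
print(round(k, 2), round(p_start, 2))  # prints: 8.72 0.11
```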

For five different values of the parameter *p*, figure 5*b* shows the cumulative distribution of the intermediacy scores of our *n* = 64 223 publications. As is to be expected, when *p* is close to zero, intermediacy scores are extremely small. On the other hand, when *p* is getting close to one, intermediacy scores also approach one.

Figure 5*c*,*d* shows Spearman and Pearson correlations between the intermediacy scores obtained for five different values of the parameter *p*. We consider intermediacy scores to be most useful from an ordinal perspective. From this point of view, Spearman correlations are more relevant than Pearson correlations, but for completeness, we report both types of correlations. The Spearman correlations show that values of 0.3, 0.5, 0.7 and 0.9 for *p* all yield fairly similar rankings of publications in terms of intermediacy. However, the ranking obtained for *p* = 0.1 is substantially different. Pearson correlations tend to be lower than Spearman correlations. Hence, even when different values of *p* yield similar rankings of publications, there usually does not exist a clear linear relationship between the intermediacy scores.
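The difference between the two types of correlation can be illustrated with a few lines of code. The sketch below implements both coefficients in plain Python and applies them to the intermediacy scores of the 10 ranked publications in table 1 for *p* = 0.1 and *p* = 0.3. Because this is only a small, selected subset of the *n* = 64 223 publications, the resulting values are not expected to match the correlations shown in figure 5*c*,*d*. The function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Intermediacy of the 10 ranked publications in table 1 for p = 0.1 and p = 0.3.
phi_01 = [0.061, 0.060, 0.052, 0.037, 0.035, 0.024, 0.022, 0.021, 0.020, 0.020]
phi_03 = [0.376, 0.695, 0.300, 0.629, 0.736, 0.360, 0.836, 0.851, 0.296, 0.803]
print(round(spearman(phi_01, phi_03), 3), round(pearson(phi_01, phi_03), 3))
```

Spearman correlation depends only on the ordering of the scores, which is why it fits the ordinal interpretation of intermediacy, whereas Pearson correlation is also sensitive to how the scores are spaced.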

Figure 5*c*,*d* also shows correlations of intermediacy scores with citation counts and reference counts. The term *citation count* refers to the number of incoming citation links of a publication, while the term *reference count* refers to the number of outgoing citation links of a publication. Only citation links located on a citation path between the source and target publications are counted. Regardless of the value of *p*, intermediacy scores are not very strongly correlated with citation counts or reference counts.

Based on our expert knowledge of the topic under study, we found that the most useful results were obtained by setting the parameter *p* equal to 0.1. Table 1 lists the 10 publications with the highest intermediacy for *p* = 0.1. For each publication, the intermediacy is reported for five different values of *p*. In addition, the table also reports each publication’s citation count and reference count. Figure 5*e* shows the citation network of the 10 most intermediate publications for *p* = 0.1.

| | publication | *p* = 0.1 | *p* = 0.3 | *p* = 0.5 | *p* = 0.7 | *p* = 0.9 | cit. | ref. |
|---|---|---|---|---|---|---|---|---|
| *t* | Newman & Girvan [28] | 0.301 | 0.992 | 1.000 | 1.000 | 1.000 | 468 | 0 |
| *s* | Klavans & Boyack [29] | 0.301 | 0.992 | 1.000 | 1.000 | 1.000 | 0 | 24 |
| 1 | Waltman & van Eck [31] | 0.061 | 0.376 | 0.656 | 0.878 | 0.988 | 2 | 27 |
| 2 | Waltman & van Eck [32] | 0.060 | 0.695 | 0.964 | 0.999 | 1.000 | 15 | 22 |
| 3 | Hric *et al.* [33] | 0.052 | 0.300 | 0.499 | 0.700 | 0.900 | 1 | 29 |
| 4 | Fortunato [34] | 0.037 | 0.629 | 0.972 | 1.000 | 1.000 | 73 | 154 |
| 5 | Newman [35] | 0.035 | 0.736 | 0.979 | 1.000 | 1.000 | 221 | 8 |
| 6 | Ruiz-Castillo & Waltman [36] | 0.024 | 0.360 | 0.624 | 0.847 | 0.981 | 2 | 24 |
| 7 | Blondel *et al.* [37] | 0.022 | 0.836 | 0.998 | 1.000 | 1.000 | 78 | 21 |
| 8 | Newman [38] | 0.021 | 0.851 | 0.999 | 1.000 | 1.000 | 138 | 18 |
| 9 | Newman [39] | 0.020 | 0.296 | 0.501 | 0.700 | 0.900 | 246 | 1 |
| 10 | Rosvall & Bergstrom [40] | 0.020 | 0.803 | 0.994 | 1.000 | 1.000 | 70 | 10 |

Using our expert knowledge to interpret the results presented in table 1 and figure 5*e*, we are able to trace how a method for community detection ended up in the scientometric literature. The two publications with the highest intermediacy [31,32] played a key role in introducing modularity-based approaches in the scientometric community. Waltman & van Eck [32] proposed the use of modularity-based approaches for constructing classification systems of scientific publications, while Waltman & van Eck [31] introduced an algorithm for implementing these modularity-based approaches. This algorithm can be seen as an improvement of the so-called Louvain algorithm introduced by Blondel *et al.* [37], which is also among the 10 most intermediate publications. Most of the other publications in table 1 and figure 5*e* are classical publications on community detection in general and modularity in particular. The publications by Newman all deal with modularity-based community detection. Rosvall & Bergstrom [40] proposed an alternative approach to community detection. They applied their approach to a citation network of scientific journals, which explains the connection with the scientometric literature. Fortunato [34] is a review of the literature on community detection. The intermediacy of this publication is probably strongly influenced by its large number of references. Hric *et al.* [33] is a more recent publication on community detection. This publication focuses on the challenges of evaluating the results produced by community detection methods. This issue is very relevant in a scientometric context, and therefore the publication was cited by our source publication [29]. Finally, there is one more scientometric publication in table 1 and figure 5*e*. This publication [36] is one of the first studies presenting a scientometric application of classification systems of scientific publications constructed using a modularity-based approach. The publication was also cited by our source publication.

The citation counts reported in table 1 show that some publications, especially the more recent ones, have a high intermediacy even though they have been cited only a very limited number of times. This makes clear that a ranking of publications based on intermediacy is quite different from a citation-based ranking of publications. The publications in table 1 that have a high intermediacy and a small number of citations do have a substantial number of references.

Finally, we compare the results obtained using intermediacy to the results given by main path analysis. The latter results, obtained using the original version of main path analysis [9] and using a more recent variant [12], can be found in electronic supplementary material, figures S1 and S2. Intermediacy and main path analysis provide completely different results. As shown in figure 5*e*, intermediacy yields a number of short paths between Newman & Girvan [28] in the community detection literature and Klavans & Boyack [29] in the scientometric literature. These paths go through well-known publications. On the other hand, main path analysis yields an extremely long path, going through more than 50 publications, most of which are not particularly well known. Despite our expert understanding of both the community detection literature and the scientometric literature, many of the publications on this path are unfamiliar to us. We believe that, unlike the results obtained using intermediacy, the results given by main path analysis do not provide much insight into the historical development from Newman & Girvan [28] to Klavans & Boyack [29].

Case 2, presented next, offers another comparison between intermediacy and main path analysis.

#### 3.2. Case 2: peer review

In case 2, we analyse the literature on peer review. The analysis is based on data from the Web of Science database. We use the same data as a recent paper by Batagelj *et al.* [27].

We started with a citation network of 45 965 publications dealing with peer review. This is the citation network that was labelled CiteAcy by Batagelj *et al.* [27]. We selected Cole & Cole [41] and Garcia *et al.* [42] as our target and source publications. The main path analysis carried out by Batagelj *et al.* [27] suggests that these are central publications in the literature on peer review. For the purpose of our analysis, only publications located on a citation path between our source and target publications are of relevance. Other publications play no role in the analysis. We, therefore, restricted the analysis to the *n* = 615 publications located on a citation path from Garcia *et al.* [42] to Cole & Cole [41]. These publications are connected by *m* = 3420 citation links, resulting in an average of *k* = 2*m*/*n* ≈ 11.12 citation links per publication.

As can be seen in figure 6*a*, percolation theory suggests a value of 1/*k* ≈ 1/11.12 ≈ 0.09 for the parameter *p*. This is close to the value of 0.11 obtained in case 1. However, the probability of the existence of an active path between the source and target publications equals 0.03, which is much lower than the probability of 0.40 in case 1. Intermediacy scores tend to be higher in case 2 than in case 1. This can be seen by comparing figure 6*b* to figure 5*b*. We note that the former figure has a linear horizontal axis, while the horizontal axis in the latter figure is logarithmic. The Spearman and Pearson correlations are somewhat higher in case 2 (figure 6*c*,*d*) than in case 1 (figure 5*c*,*d*).
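The arithmetic behind these values can be checked in a few lines of Python. This is only a sketch: the variable names are ours, and the numbers are those reported above for case 2.

```python
# Values reported for case 2 (variable names are illustrative, not from the paper).
n_publications = 615      # n: publications on a citation path from source to target
n_citation_links = 3420   # m: citation links among these publications

# Average number of citation links per publication, k = 2m / n.
k = 2 * n_citation_links / n_publications

# Value of the parameter p suggested by percolation theory, 1 / k.
p_percolation = 1 / k

print(f"k = {k:.2f}, 1/k = {p_percolation:.2f}")  # prints "k = 11.12, 1/k = 0.09"
```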

Table 2 lists the 10 publications with the highest intermediacy, where we use a value of 0.1 for the parameter *p*, as in table 1. Figure 6*e* shows the citation network of the 10 most intermediate publications. There are numerous paths in this citation network going from our source publication [42] to our target publication [41]. We regard these paths as the core paths between the source and target publications.

| | publication | *p* = 0.1 | *p* = 0.3 | *p* = 0.5 | *p* = 0.7 | *p* = 0.9 | cit. | ref. |
|---|---|---|---|---|---|---|---|---|
| t | Cole & Cole [41] | 0.048 | 0.841 | 0.995 | 1.000 | 1.000 | 14 | 0 |
| s | Garcia *et al.* [42] | 0.048 | 0.841 | 0.995 | 1.000 | 1.000 | 0 | 8 |
| 1 | Lee *et al.* [43] | 0.018 | 0.510 | 0.865 | 0.986 | 1.000 | 5 | 71 |
| 2 | Zuckerman & Merton [44] | 0.016 | 0.336 | 0.622 | 0.847 | 0.981 | 73 | 2 |
| 3 | Campanario [45] | 0.013 | 0.592 | 0.967 | 0.999 | 1.000 | 23 | 35 |
| 4 | Crane [46] | 0.009 | 0.270 | 0.498 | 0.700 | 0.900 | 34 | 1 |
| 5 | Campanario [47] | 0.009 | 0.517 | 0.952 | 0.999 | 1.000 | 15 | 30 |
| 6 | Gottfredson [48] | 0.008 | 0.320 | 0.622 | 0.847 | 0.981 | 26 | 2 |
| 7 | Bornmann [49] | 0.008 | 0.333 | 0.776 | 0.975 | 1.000 | 6 | 71 |
| 8 | Bornmann [50] | 0.007 | 0.259 | 0.500 | 0.700 | 0.900 | 1 | 20 |
| 9 | Bornmann [51] | 0.007 | 0.275 | 0.500 | 0.700 | 0.900 | 1 | 17 |
| 10 | Merton [52] | 0.005 | 0.243 | 0.497 | 0.701 | 0.901 | 29 | 1 |

The core paths shown in figure 6*e* can be compared to the results obtained by Batagelj *et al.* [27] using main path analysis. Batagelj *et al.* [27] used different variants of main path analysis. With both the original version of main path analysis [9] and a more recent variant [12], the identified paths are rather lengthy, as can be seen in figures 9 and 10 of Batagelj *et al.* [27]. The shortest main paths include about 20 publications.

The above findings, together with the observations made in case 1, confirm the fundamental difference between intermediacy and main path analysis. Main path analysis tends to favour longer paths over shorter ones, whereas intermediacy has the opposite tendency.

Using the results presented in table 2 and figure 6*e*, experts on the topic of peer review could discuss the historical development of the literature on this topic. Since our own expertise on the topic of peer review is limited, we refrain from providing an interpretation of the results.

### 4. Conclusion

Citation networks provide valuable information for tracing the historical development of scientific knowledge. For this purpose, citation networks are usually analysed using main path analysis [9]. However, the idea of a main path is not very well understood. The algorithmic definition of a main path is clear, but the underlying conceptual motivation remains somewhat obscure. As we have shown in this paper, main path analysis has the tendency to favour longer paths over shorter ones. We regard this as a counterintuitive property that lacks a convincing justification.

Intermediacy, introduced in this paper, offers an alternative to main path analysis. It provides a principled approach for identifying publications that appear to play a major role in the historical development from an older to a more recent publication. The older publication and the more recent one are referred to as the target and the source, respectively. Publications with a high intermediacy are important in connecting the source and the target publication in a citation network. As we have shown, intermediacy has two intuitively desirable properties, referred to as path addition and path contraction. Because of the path contraction property, intermediacy tends to favour shorter paths over longer ones. This is a fundamental difference with main path analysis. Intermediacy also has a free parameter that can be used to fine-tune its behaviour. This parameter enables interpolation between two extremes. In one extreme, intermediacy identifies publications located on a shortest path between the source and the target publication. In the other extreme, it identifies publications located on the largest number of edge-independent source-target paths.

We have also examined intermediacy in two case studies. In the first case study, intermediacy was used to trace historical developments at the interface between the community detection literature and the scientometric literature. This case study has shown that intermediacy yields results that make sense from our viewpoint as domain experts. In the second case study, intermediacy was applied to the literature on peer review. Both case studies have demonstrated the strong preference of main path analysis for long paths.

There are various directions for further research. First of all, a more extensive mathematical analysis of intermediacy can be carried out, possibly resulting in an axiomatic foundation for intermediacy. Intermediacy can also be generalized to weighted graphs. In a citation network, a citation link may, for instance, be weighted in inverse proportion to the total number of incoming or outgoing citation links of a publication. Another way to generalize intermediacy is to allow for multiple sources and targets. The ideas underlying intermediacy can also be used to develop other types of indicators for graphs, such as an indicator of the connectedness of two nodes in a graph. In empirical analyses, intermediacy can be applied not only in citation networks of scientific publications but also, for instance, in patent citation networks or in completely different types of networks, such as human mobility and migration networks, world trade networks, transportation networks, and passing networks in sports. Also, more comprehensive comparisons between intermediacy and main path analysis can be performed. The results of the two approaches can be evaluated in a systematic way based on input from domain experts.

### Data accessibility

The data used in the first case study have been obtained from the Scopus database produced by Elsevier. Due to licence restrictions, the data cannot be made openly available. Readers can contact Elsevier to obtain the data (https://www.elsevier.com/solutions/scopus). The data used in the second case study have been obtained from the Web of Science database produced by Clarivate Analytics. Due to licence restrictions, the data cannot be made openly available. Readers can contact Clarivate Analytics to obtain the data (https://clarivate.com/products/web-of-science). The code used for computing the intermediacy is freely available online (https://github.com/lovre/intermediacy).

### Authors' contributions

L.Š., L.W., V.T. and N.J.E. designed research, L.Š., L.W., V.T. and N.J.E. performed research, L.Š., V.T. and N.J.E. analysed data and L.W. wrote the paper. All authors gave final approval for publication.

### Competing interests

The authors have no conflicting interests to declare.

### Funding

This work has been supported in part by the Slovenian Research Agency under the programmes P2-0359 and P5-0168, and by the European Union COST Action number CA15109.

## Acknowledgements

The authors thank Vladimir Batagelj for sharing the data used to study the literature on peer review.

## Appendix A. Proofs

Below we provide the proofs of the theorems presented in the main text. We first need to introduce some additional notation. We use $Pr({X}_{uv})$ as a shorthand for $Pr({X}_{uv}=1)$. To make explicit that this probability depends on a graph *G*, we write $Pr({X}_{uv}\mid G)$. Furthermore, we use *A*_{e} to indicate whether an edge *e* is active. Hence, *A*_{e} = 1 if edge *e* is active and *A*_{e} = 0 if edge *e* is not active.

**A.1. Limit behaviour**

## Proof of theorem 2.4.

Let *m* = |*E*| denote the number of edges in the graph *G*. Suppose that the *m* edges are split into two sets, one set of *M* edges and another set of *m* − *M* edges. The probability that the edges in the former set are all active while the edges in the latter set are all inactive equals

$$p^{M}{(1-p)}^{m-M}.$$

Consider a node *v* ∈ *V*. The shortest source-target path that goes through node *v* has a length of ℓ_{v}. This means that at least ℓ_{v} edges need to be active in order to obtain an active source-target path that goes through node *v*. Hence, the probability that there is an active source-target path that goes through node *v* can be written as

$${\varphi}_{v}=\sum _{i={\ell}_{v}}^{m}{n}_{vi}\,p^{i}{(1-p)}^{m-i},$$

where *n*_{vi} denotes the number of ways of choosing *i* active edges such that an active source-target path through node *v* exists, so that *n*_{vi} > 0 for all *i* = ℓ_{v}, …, *m*. Note that this probability equals the intermediacy of node *v*. Now consider two nodes *u*, *v* ∈ *V* with ℓ_{u} < ℓ_{v}. In the limit as *p* tends to 0, *ϕ*_{u} and *ϕ*_{v} both tend to 0. However, they do so at different rates. More specifically, in the limit as *p* tends to 0, we have

$$\lim_{p\to 0}\frac{{\varphi}_{v}}{{\varphi}_{u}}=\lim_{p\to 0}\frac{{n}_{v{\ell}_{v}}\,p^{{\ell}_{v}}}{{n}_{u{\ell}_{u}}\,p^{{\ell}_{u}}}=0.$$

Hence, in the limit as *p* tends to 0, *ϕ*_{u} > *ϕ*_{v}. ▪
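As a small worked illustration of this argument (a hypothetical graph, not taken from the paper), consider a graph with edges *s* → *u*, *u* → *t*, *s* → *v*, *v* → *w* and *w* → *t*. Node *u* lies on a source-target path of length 2 and node *v* on one of length 3, so ℓ_{u} = 2 < ℓ_{v} = 3. Because each of these nodes lies on a single source-target path, the intermediacies follow directly:

$${\varphi}_{u}=p^{2},\qquad {\varphi}_{v}=p^{3},\qquad \frac{{\varphi}_{v}}{{\varphi}_{u}}=p\to 0\quad \text{as } p\to 0.$$

Both scores vanish as *p* tends to 0, but *ϕ*_{v} vanishes faster, so node *u* is ranked above node *v* for small *p* (in this particular example, for every 0 < *p* < 1).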

## Proof of theorem 2.5.

Let *m* = |*E*| denote the number of edges in the graph *G*, and let *q* denote the probability that an edge is inactive, that is, *q* = 1 − *p*. Suppose that the *m* edges are split into two sets, one set of *M* edges and another set of *m* − *M* edges. The probability that the edges in the former set are all inactive while the edges in the latter set are all active equals

$$q^{M}{(1-q)}^{m-M}.$$

Consider a node *v* ∈ *V*. There are *σ*_{v} edge-independent source-target paths that go through node *v*. This means that at least *σ*_{v} edges need to be inactive in order for there to be no active source-target path that goes through node *v*. Hence, the probability that there is no active source-target path that goes through node *v* can be written as

$${\Phi}_{v}=\sum _{i={\sigma}_{v}}^{m}{n}_{vi}\,q^{i}{(1-q)}^{m-i},$$

where *n*_{vi} denotes the number of ways of choosing *i* inactive edges such that no active source-target path through node *v* exists, so that *n*_{vi} > 0 for all $i={\sigma}_{v},\dots ,m$. Note that the intermediacy of node *v* equals 1 minus this probability, that is, *ϕ*_{v} = 1 − *Φ*_{v}. Now consider two nodes *u*, *v* ∈ *V* with *σ*_{u} > *σ*_{v}. In the limit as *p* tends to 1, *Φ*_{u} and *Φ*_{v} both tend to 0. However, they do so at different rates. More specifically, in the limit as *p* tends to 1, we have

$$\lim_{p\to 1}\frac{{\Phi}_{u}}{{\Phi}_{v}}=\lim_{q\to 0}\frac{{n}_{u{\sigma}_{u}}\,q^{{\sigma}_{u}}}{{n}_{v{\sigma}_{v}}\,q^{{\sigma}_{v}}}=0.$$

Hence, in the limit as *p* tends to 1, *Φ*_{u} < *Φ*_{v}, which implies that *ϕ*_{u} > *ϕ*_{v}. ▪
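A similar hypothetical example (again not taken from the paper) illustrates the limit as *p* tends to 1. Consider a graph with edges *s* → *u*, *s* → *a*, *a* → *u*, *u* → *t*, *u* → *b* and *b* → *t*. Node *u* lies on two edge-independent source-target paths (*s* → *u* → *t* and *s* → *a* → *u* → *b* → *t*), so *σ*_{u} = 2, whereas every source-target path through node *a* uses the edge *s* → *a*, so *σ*_{a} = 1. Writing *q* = 1 − *p* and factorizing intermediacy as ${\varphi}_{v}=Pr({X}_{sv})Pr({X}_{vt})$, we obtain

$$Pr({X}_{su})=Pr({X}_{ut})=1-q(1-p^{2})=1-2q^{2}+q^{3},$$

$${\Phi}_{u}=1-{(1-2q^{2}+q^{3})}^{2}=4q^{2}+\mathcal{O}(q^{3}),\qquad {\Phi}_{a}=1-{(1-q)}^{2}(1-2q^{2}+q^{3})=2q+\mathcal{O}(q^{2}).$$

The ratio *Φ*_{u}/*Φ*_{a} therefore behaves like 2*q* and tends to 0 as *p* tends to 1, so node *u* has the higher intermediacy in this limit. The leading coefficients 4 and 2 count the smallest sets of edges whose removal disconnects all source-target paths through *u* and *a*, respectively.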

**A.2. Path addition and path contraction**

## Proof of theorem 2.7.

Suppose that node *w* is located on a path from source *s* to node *u*. Let *H* denote the graph obtained after the path from node *u* to node *v* has been added, and let *E*_{uv} denote the set of newly added edges. The intermediacy of node *w* in graph *G* can be factorized as ${\varphi}_{w}(G)=Pr({X}_{sw}\mid G)Pr({X}_{wt}\mid G)$. Similarly, for graph *H*, we have ${\varphi}_{w}(H)=Pr({X}_{sw}\mid H)Pr({X}_{wt}\mid H)$. Clearly, $Pr({X}_{sw}\mid G)=Pr({X}_{sw}\mid H)$, since the paths from node *s* to node *w* are identical in graphs *G* and *H*. Furthermore, $Pr({X}_{wt}\mid G)=Pr({X}_{wt}\mid H\hspace{0.17em}\mathrm{and}\hspace{0.17em}\mathrm{\forall}e\in {E}_{uv}\hspace{0.17em}:\hspace{0.17em}{A}_{e}=0)$. Since $Pr({X}_{wt}\mid H\hspace{0.17em}\mathrm{and}\hspace{0.17em}\mathrm{\forall}e\in {E}_{uv}\hspace{0.17em}:\hspace{0.17em}{A}_{e}=0)<Pr({X}_{wt}\mid H)$, it follows that $Pr({X}_{wt}\mid G)<Pr({X}_{wt}\mid H)$. This means that *ϕ*_{w}(*G*) < *ϕ*_{w}(*H*).

An analogous proof can be given if node *w* is located on a path from node *v* to target *t*. ▪

## Proof of theorem 2.9.

Suppose that node *w* is located on a path from source *s* to node *u*. Let *H* denote the graph obtained after paths from node *u* to node *v* have been contracted, and let *E*_{uv} denote the set of all edges between nodes in *V*_{uv}. The intermediacy of node *w* in graph *G* can be factorized as ${\varphi}_{w}(G)=Pr({X}_{sw}\mid G)Pr({X}_{wt}\mid G)$. Similarly, for graph *H*, we have ${\varphi}_{w}(H)=Pr({X}_{sw}\mid H)Pr({X}_{wt}\mid H)$. Clearly, $Pr({X}_{sw}\mid G)=Pr({X}_{sw}\mid H)$, since the paths from node *s* to node *w* are identical in graphs *G* and *H*. Furthermore, because nodes in *V*_{uv} , except for nodes *u* and *v*, do not have neighbours outside *V*_{uv} , we have $Pr({X}_{wt}\mid H)=Pr({X}_{wt}\mid G\hspace{0.17em}\mathrm{and}\hspace{0.17em}\mathrm{\forall}e\in {E}_{uv}\hspace{0.17em}:\hspace{0.17em}{A}_{e}=1)$. Since $Pr({X}_{wt}\mid G\hspace{0.17em}\mathrm{and}\hspace{0.17em}\mathrm{\forall}e\in {E}_{uv}\hspace{0.17em}:\hspace{0.17em}{A}_{e}=1)>Pr({X}_{wt}\mid G)$, it follows that $Pr({X}_{wt}\mid H)>Pr({X}_{wt}\mid G)$. This means that *ϕ*_{w}(*H*) > *ϕ*_{w}(*G*).

An analogous proof can be given if node *w* is located on a path from node *v* to target *t*. ▪

## Appendix B. Algorithms

Intermediacy depends on the probability that there exists a path between two nodes in a graph. Determining this probability is known as the problem of network reliability. This problem is NP-hard [53]. Below we provide an outline of an exact algorithm for calculating intermediacy. Because of its exponential run-time, the exact algorithm can be used only in relatively small graphs. We, therefore, also propose a Monte Carlo algorithm that approximates intermediacy.

**B.1. Exact algorithm**

The exact algorithm, illustrated in figure 7*a*, is based on contraction and deletion of edges [54]. Suppose we have a graph *G* = (*V*, *E*). The probability that there exists a path between two nodes *u*, *v* ∈ *V* can be written as

$$Pr({X}_{uv}\mid G)=p\,Pr({X}_{uv}\mid G/e)+(1-p)\,Pr({X}_{uv}\mid G-e),\tag{B 1}$$

where *G*/*e* denotes the contraction of an edge *e* ∈ *E* and *G* − *e* denotes the deletion of an edge *e* ∈ *E*. Edge contraction must respect reachability [55]. Equation (B 1) yields a recursive algorithm for calculating $Pr({X}_{uv})$. For a node *v* ∈ *V*, this algorithm can be used to calculate $Pr({X}_{sv})$ and $Pr({X}_{vt})$. The intermediacy *ϕ*_{v} of node *v* is then given by equation (2.1). We are usually interested in calculating the intermediacy of all nodes in a graph *G*, not just of one specific node. This can be performed efficiently by calculating $Pr({X}_{sv})$ and $Pr({X}_{vt})$ for all nodes *v* ∈ *V* in a single recursion.

The run-time of the exact algorithm is exponential in the number of edges *m*. The algorithm has a complexity of $\mathcal{O}({2}^{m})$. In the special case of a so-called series–parallel graph, the run-time of the algorithm can be reduced from exponential to polynomial [56].
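As an illustration of the contraction and deletion recursion described above, the following Python sketch conditions on one edge at a time. It is not the authors' implementation, and several choices are our own assumptions: the function names `path_probability` and `intermediacy` are hypothetical, the graph is given as a list of directed edge pairs and is assumed to be acyclic, and contraction is realized by growing the set of nodes already reached from the start node, which is one simple way to respect reachability. Intermediacy is computed as the product of the two path probabilities, the factorization used in appendix A.

```python
def path_probability(edges, start, end, p):
    """Probability that an active path from `start` to `end` exists
    when each directed edge is independently active with probability p."""

    def recurse(reached, undecided):
        # `reached`: nodes connected to `start` by edges already taken as active.
        if end in reached:
            return 1.0
        # Pick any undecided edge leaving the reached set.
        frontier = next(
            ((a, b) for (a, b) in undecided if a in reached and b not in reached),
            None,
        )
        if frontier is None:
            return 0.0  # no remaining edge can extend the reached set
        rest = undecided - {frontier}
        head = frontier[1]
        # Active edge: merge its head into the reached set (contraction).
        # Inactive edge: drop it from the graph (deletion).
        return (p * recurse(reached | {head}, rest)
                + (1 - p) * recurse(reached, rest))

    return recurse(frozenset([start]), frozenset(edges))


def intermediacy(edges, source, target, node, p):
    """Intermediacy of `node` as Pr(X_sv) * Pr(X_vt) (assumes an acyclic graph)."""
    return (path_probability(edges, source, node, p)
            * path_probability(edges, node, target, p))


# Hypothetical toy graph: two parallel two-step paths from s to t.
diamond = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
print(path_probability(diamond, "s", "t", 0.5))   # 0.4375 = 1 - (1 - 0.5**2)**2
print(intermediacy(diamond, "s", "t", "a", 0.5))  # 0.25 = 0.5 * 0.5
```

Each call decides one edge and branches twice, so the sketch shares the exponential worst-case run-time of the exact algorithm and is only suitable for small graphs. It also recomputes the recursion for every node rather than handling all nodes in a single recursion.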

**B.2. Monte Carlo algorithm**

The Monte Carlo algorithm, illustrated in figure 7*b*, is quite straightforward. Suppose we have a graph *G* = (*V*, *E*) and we are interested in the intermediacy *ϕ*_{v} of a node *v* ∈ *V*. A subgraph *H* can be obtained by sampling the edges in the graph *G*, where each edge *e* ∈ *E* is sampled with probability *p*. Given a subgraph *H*, it can be determined whether in this subgraph node *v* is located on a path from source *s* to target *t*. We sample *N* subgraphs *H*_{1}, …, *H*_{N}. We then approximate the intermediacy of node *v* by ${\varphi}_{v}\approx \frac{1}{N}\sum _{i=1}^{N}{I}_{\mathrm{st}}(v\mid {H}_{i})$, where *I*_{st}(*v*| *H*_{i}) equals 1 if there exists a path from source *s* to target *t* going through node *v* in graph *H*_{i} and 0 otherwise.

The Monte Carlo algorithm can be implemented efficiently by simultaneously sampling subgraphs and checking path existence. To do so, we perform a probabilistic depth-first search. We maintain a stack of nodes that still need to be visited. We start by pushing source *s* to the stack. We then keep popping nodes from the stack until the stack is empty. When a node *v* has been popped from the stack, we determine for each of its outgoing edges whether the edge is active. An edge is active with probability *p*. If an edge (*v*, *u*) is active and if node *u* is not yet on the stack, then node *u* is pushed to the stack. At some point, target *t* may be reached, resulting in the identification of nodes that are located on a path from source *s* to target *t*. This implementation of the Monte Carlo algorithm is especially fast for smaller values of the probability *p*. The run-time of the Monte Carlo algorithm is linear in the number of edges *m*.

In this paper, we use a Java implementation of the Monte Carlo algorithm. The source code is available at https://github.com/lovre/intermediacy [57].
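The basic sampling scheme of the Monte Carlo algorithm can also be sketched in Python. The sketch below is independent of the Java implementation and makes simplifying assumptions of its own: the names `reachable` and `monte_carlo_intermediacy` are hypothetical, the graph is a list of directed edge pairs and is assumed to be acyclic, and each subgraph is sampled in full before reachability is checked forwards from the source and backwards from the target, instead of using the probabilistic depth-first search described above.

```python
import random
from collections import defaultdict


def reachable(adjacency, start):
    """Set of nodes reachable from `start` via an iterative depth-first search."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def monte_carlo_intermediacy(edges, source, target, p, n_samples=10000, seed=0):
    """Estimate the intermediacy of every node from `n_samples` sampled subgraphs.
    Nodes never found on an active source-target path are omitted (estimate 0)."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    for _ in range(n_samples):
        forward, backward = defaultdict(list), defaultdict(list)
        for a, b in edges:
            if rng.random() < p:  # the edge is active in this sample
                forward[a].append(b)
                backward[b].append(a)
        from_source = reachable(forward, source)
        if target not in from_source:
            continue  # no active source-target path in this sample
        to_target = reachable(backward, target)
        # In an acyclic graph, a node is on an active source-target path
        # exactly when it is reachable from the source and can reach the target.
        for node in from_source & to_target:
            counts[node] += 1
    return {node: count / n_samples for node, count in counts.items()}


# Hypothetical toy graph: two parallel two-step paths from s to t.
diamond = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
estimates = monte_carlo_intermediacy(diamond, "s", "t", p=0.5, n_samples=20000)
print(estimates)
```

For this toy graph, the estimate for node `a` should be close to the exact value of 0.25 and the estimates for `s` and `t` close to 0.4375, with an error that shrinks as the number of samples grows.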

### References

1. Garfield E, Sher I, Torpie R. 1964 The use of citation data in writing the history of science. Technical Report F49(638)-1256. The Institute for Scientific Information.
2. Garfield E, Pudovkin A, Istomin V. 2003 Why do we need algorithmic historiography? *J. Am. Soc. Inf. Sci. Technol.* **54**, 400–412. (doi:10.1002/asi.10226)
3. Garfield E, Pudovkin A, Istomin V. 2003 Mapping the output of topical searches in the Web of Knowledge and the case of Watson-Crick. *Inf. Technol. Libr.* **22**, 183–187.
4. Garfield E. 2004 Historiographic mapping of knowledge domains literature. *J. Inf. Sci.* **30**, 119–145. (doi:10.1177/0165551504042802)
5. van Eck N, Waltman L. 2014 CitNetExplorer: a new software tool for analyzing and visualizing citation networks. *J. Inform.* **8**, 802–823. (doi:10.1016/j.joi.2014.07.006)
6. Chen C. 2006 CiteSpace II: detecting and visualizing emerging trends and transient patterns in scientific literature. *J. Am. Soc. Inf. Sci. Technol.* **57**, 359–377. (doi:10.1002/asi.20317)
7. Marx W, Bornmann L, Barth A, Leydesdorff L. 2014 Detecting the historical roots of research fields by reference publication year spectroscopy (RPYS). *J. Assoc. Inf. Sci. Technol.* **65**, 751–764. (doi:10.1002/asi.23089)
8. Thor A, Marx W, Leydesdorff L, Bornmann L. 2016 Introducing CitedReferencesExplorer (CRExplorer): a program for reference publication year spectroscopy with cited references standardization. *J. Inform.* **10**, 503–515. (doi:10.1016/j.joi.2016.02.005)
9. Hummon N, Doreian P. 1989 Connectivity in a citation network: the development of DNA theory. *Soc. Networks* **11**, 39–63. (doi:10.1016/0378-8733(89)90017-8)
10. Batagelj V. 2003 Efficient algorithms for citation network analysis. (http://arxiv.org/abs/cs/0309023v1) pp. 1–27.
11. Lucio-Arias D, Leydesdorff L. 2008 Main-path analysis and path-dependent transitions in HistCite^{TM}-based historiograms. *J. Am. Soc. Inf. Sci. Technol.* **59**, 1948–1962. (doi:10.1002/asi.20903)
12. Liu J, Lu L. 2012 An integrated approach for main path analysis: development of the Hirsch index as an example. *J. Am. Soc. Inf. Sci. Technol.* **63**, 528–542. (doi:10.1002/asi.21692)
13. Batagelj V, Doreian P, Ferligoj A, Kejžar N. 2014 *Understanding large temporal networks and spatial networks*. Chichester: Wiley.
14. Yeo W, Kim S, Lee JM, Kang J. 2014 Aggregative and stochastic model of main path identification: a case study on graphene. *Scientometrics* **98**, 633–655. (doi:10.1007/s11192-013-1140-3)
15. Liu J, Kuan CH. 2016 A new approach for main path analysis: decay in knowledge diffusion. *J. Assoc. Inf. Sci. Technol.* **67**, 465–476. (doi:10.1002/asi.23384)
16. Tu YN, Hsu SL. 2016 Constructing conceptual trajectory maps to trace the development of research fields. *J. Assoc. Inf. Sci. Technol.* **67**, 2016–2031. (doi:10.1002/asi.23522)
17. Verspagen B. 2007 Mapping technological trajectories as patent citation networks: a study on the history of fuel cell research. *Adv. Complex Syst.* **10**, 93–115. (doi:10.1142/S0219525907000945)
18. Park H, Magee C. 2017 Tracing technological development trajectories: a genetic knowledge persistence-based main path approach. *PLoS ONE* **12**, e0170895. (doi:10.1371/journal.pone.0170895)
19. Gwak J, Sohn S. 2018 A novel approach to explore patent development paths for subfield technologies. *J. Assoc. Inf. Sci. Technol.* **69**, 410–419. (doi:10.1002/asi.23962)
20. Kim J, Shin J. 2018 Mapping extended technological trajectories: integration of main path, derivative paths, and technology junctures. *Scientometrics* **116**, 1439–1459. (doi:10.1007/s11192-018-2834-3)
21. Kuan CH, Huang MH, Chen DZ. 2018 Missing links: timing characteristics and their implications for capturing contemporaneous technological developments. *J. Inform.* **12**, 259–270. (doi:10.1016/j.joi.2018.01.005)
22.
23. Šubelj L, Waltman L, Traag V, van Eck NJ. 2019 Intermediacy of publications. In *Proc. 17th Int. Conf. on Scientometrics and Informetrics ISSI '19, Rome, Italy, 2 September*, pp. 1288–1300. Leuven, Belgium: ISSI. See http://issi-society.org/publications/issi-conference-proceedings/proceedings-of-issi-2019.
24. Stephenson K, Zelen M. 1989 Rethinking centrality: methods and examples. *Soc. Netw.* **11**, 1–37. (doi:10.1016/0378-8733(89)90016-6)
25. Klein DJ, Randić M. 1993 Resistance distance. *J. Math. Chem.* **12**, 81–95. (doi:10.1007/BF01164627)
26. Bozzo E, Franceschet M. 2013 Resistance distance, closeness, and betweenness. *Soc. Netw.* **35**, 460–469. (doi:10.1016/j.socnet.2013.05.003)
27. Batagelj V, Ferligoj A, Squazzoni F. 2017 The emergence of a field: a network analysis of research on peer review. *Scientometrics* **113**, 503–532. (doi:10.1007/s11192-017-2522-8)
28. Newman MEJ, Girvan M. 2004 Finding and evaluating community structure in networks. *Phys. Rev. E* **69**, 026113. (doi:10.1103/PhysRevE.69.026113)
29. Klavans R, Boyack KW. 2017 Which type of citation analysis generates the most accurate taxonomy of scientific and technical knowledge? *J. Assoc. Inf. Sci. Technol.* **68**, 984–998. (doi:10.1002/asi.23734)
30. van Eck N, Waltman L. 2017 Accuracy of citation data in Web of Science and Scopus. In *Proc. 16th Int. Conf. on Scientometrics & Informetrics ISSI ’17, Wuhan, China, 19 October*, pp. 1087–1092. Leuven, Belgium: ISSI.
31. Waltman L, van Eck NJ. 2013 A smart local moving algorithm for large-scale modularity-based community detection. *Eur. Phys. J. B* **86**, 471. (doi:10.1140/epjb/e2013-40829-0)
32. Waltman L, van Eck NJ. 2012 A new methodology for constructing a publication-level classification system of science. *J. Assoc. Inf. Sci. Technol.* **63**, 2378–2392. (doi:10.1002/asi.22748)
33. Hric D, Darst RK, Fortunato S. 2014 Community detection in networks: structural communities versus ground truth. *Phys. Rev. E* **90**, 062805. (doi:10.1103/PhysRevE.90.062805)
34. Fortunato S. 2010 Community detection in graphs. *Phys. Rep.* **486**, 75–174. (doi:10.1016/j.physrep.2009.11.002)
35. Newman MEJ. 2006 Modularity and community structure in networks. *Proc. Natl Acad. Sci. USA* **103**, 8577–8582. (doi:10.1073/pnas.0601602103)
36. Ruiz-Castillo J, Waltman L. 2015 Field-normalized citation impact indicators using algorithmically constructed classification systems of science. *J. Inform.* **9**, 102–117. (doi:10.1016/j.joi.2014.11.010)
37. Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. 2008 Fast unfolding of communities in large networks. *J. Stat. Mech.* **2008**, P10008. (doi:10.1088/1742-5468/2008/10/P10008)
38. Newman MEJ. 2006 Finding community structure in networks using the eigenvectors of matrices. *Phys. Rev. E* **74**, 036104. (doi:10.1103/PhysRevE.74.036104)
39. Newman MEJ. 2004 Fast algorithm for detecting community structure in networks. *Phys. Rev. E* **69**, 066133. (doi:10.1103/PhysRevE.69.066133)
40. Rosvall M, Bergstrom CT. 2008 Maps of random walks on complex networks reveal community structure. *Proc. Natl Acad. Sci. USA* **105**, 1118–1123. (doi:10.1073/pnas.0706851105)
41. Cole S, Cole JR. 1967 Scientific output and recognition: a study in the operation of the reward system in science. *Am. Sociol. Rev.* **32**, 377–390. (doi:10.2307/2091085)
42. García JA, Rodriguez-Sánchez R, Fdez-Valdivia J. 2015 The author-editor game. *Scientometrics* **104**, 361–380. (doi:10.1007/s11192-015-1566-x)
43. Lee CJ, Sugimoto CR, Zhang G, Cronin B. 2013 Bias in peer review. *J. Am. Soc. Inf. Sci. Technol.* **64**, 2–17. (doi:10.1002/asi.22784)
44. Zuckerman H, Merton RK. 1971 Patterns of evaluation in science: institutionalisation, structure and functions of the referee system. *Minerva* **9**, 66–100. (doi:10.1007/BF01553188)
45. Campanario JM. 1998 Peer review for journals as it stands today: part 1. *Sci. Commun.* **19**, 181–211. (doi:10.1177/1075547098019003002)
46. Crane D. 1967 The gatekeepers of science: some factors affecting the selection of articles for scientific journals. *Am. Sociol.* **2**, 195–201.
47. Campanario JM. 1998 Peer review for journals as it stands today: part 2. *Sci. Commun.* **19**, 277–306. (doi:10.1177/1075547098019004002)
48. Gottfredson SD. 1978 Evaluating psychological research reports: dimensions, reliability, and correlates of quality judgments. *Am. Psychol.* **33**, 920–934. (doi:10.1037/0003-066X.33.10.920)
49. Bornmann L. 2011 Scientific peer review. *Annu. Rev. Inf. Sci. Technol.* **45**, 197–245. (doi:10.1002/aris.2011.1440450112)
50. Bornmann L. 2012 The Hawthorne effect in journal peer review. *Scientometrics* **91**, 857–862. (doi:10.1007/s11192-011-0547-y)
51. Bornmann L. 2014 Do we still need peer review? An argument for change. *J. Assoc. Inf. Sci. Technol.* **65**, 209–213. (doi:10.1002/asi.23033)
52. Merton RK. 1968 The Matthew effect in science. *Science* **159**, 56–63. (doi:10.1126/science.159.3810.56)
53. Ball M. 1980 Complexity of network reliability computations. *Networks* **10**, 153–165. (doi:10.1002/net.3230100206)
54. Moskowitz F. 1958 The analysis of redundancy networks. *Trans. Am. Inst. Electr. Eng.* **77**, 627–632.
55. Page L, Perry J. 1989 Reliability of directed networks using the factoring theorem. *IEEE Trans. Reliab.* **38**, 556–562. (doi:10.1109/24.46479)
56. Misra K. 1970 An algorithm for the reliability evaluation of redundant networks. *IEEE Trans. Reliab.* **R-19**, 146–151. (doi:10.1109/TR.1970.5216434)
57. Šubelj L. 2018 Intermediacy of publications. See http://dx.doi.org/10.5281/zenodo.1424365.