Analysis of node2vec random walks on networks

Random walks have proven useful for constructing algorithms that extract information from networks. The node2vec algorithm employs biased random walks to embed nodes into low-dimensional spaces; the embeddings can then be used for tasks such as multi-label classification and link prediction. The performance of node2vec in these applications is considered to depend on properties of the random walks that the algorithm uses. In the present study, we theoretically and numerically analyze the random walks used by node2vec. These random walks are second-order Markov chains. We exploit the mapping of their transition rule to a transition probability matrix among directed edges to analyze the stationary probability, the relaxation time in terms of the spectral gap of the transition probability matrix, and the coalescence time. In particular, we show that node2vec random walks accelerate diffusion when walkers are designed to avoid both back-tracking and visiting a neighbor of the previously visited node, provided that they do not avoid these moves completely.


Introduction
Random walks on finite networks have been a favorite research topic for decades [1][2][3][4]. Perhaps more importantly, random walks are a core technique for building algorithms to extract useful information from network data. Such applications of random walks include community detection, ranking of nodes and edges, dimension reduction of data, and sampling, to name a few [4,5]. Many theoretical, computational, and algorithmic studies have employed simple random walks on unweighted networks, in which, by definition, a walker moves to one of its neighbors with equal probability in each time step. However, there are also various other types of random walks, many of which have been used in random-walk algorithms [4,5].
The random walks developed for the algorithmic framework called the node2vec are one such random walk [6]. Unlike simple random walks, transitions of node2vec random walkers not only depend on the degree of the currently visited node or its variant with edge weights, but also on the structure of the local network and last visited node. Grover and Leskovec proposed node2vec for scalable feature learning on networks, which can be used in tasks such as community detection, multilabel classification, and link prediction. In node2vec, one can tune the weight of local versus global search of the network by modulating parameter values [6]. The node2vec has found applications in, for example, predicting genes associated with Parkinson's disease [7] and movie recommendation [8].
To date, not much is known about the behavior of node2vec random walks. Note that, among various properties of random walks, the stationary probability plays a key role in ranking the nodes [4,9,10], and the relaxation time affects, for example, the rate of convergence of random-walk algorithms and the quality of community structure [4]. In the present study, we theoretically and numerically examine node2vec random walks on finite networks. In particular, we provide multiple lines of evidence that diffusion (i.e., the approach to the stationary probability and the coalescence of random walkers) is accelerated when the parameters of node2vec random walks are tuned such that back-tracking and visiting the neighbors of the last visited node are suppressed and exploration of the rest of the network, similar to depth-first sampling, is explicitly promoted. This is the case unless the avoidance of local sampling, including back-tracking, is excessive.

Model
Consider a finite network $G(V, E)$, where $V = \{1, \ldots, N\}$ is a finite set of nodes, $N$ is the number of nodes, and $E \subseteq \{(i, j) \mid (i, j) \in V \times V \text{ and } i \neq j\}$ is a set of edges. In the present study, we assume undirected and possibly weighted networks that are free of self-loops and multiple edges, although the node2vec random walks and the formalism developed below are also valid for directed networks. Denote by $v_t$ ($t = 0, 1, \ldots$) the position of a random walker at discrete time $t$. We say that a discrete-time random walk is node2vec if its transition probability $p_{i \to j}(t)$ at time $t$, where $v_t = i$ and $(i, j) \in E$, is given by
$$p_{i \to j}(t) \propto \begin{cases} \alpha w_{ij} & \text{if } v_{t-1} = j, \\ \beta w_{ij} & \text{if } (v_{t-1}, j) \in E, \\ \gamma w_{ij} & \text{otherwise}, \end{cases} \tag{1}$$
where $w_{ij}$ is the weight of edge $(i, j)$, and the symbol $\propto$ means "proportional to" [6]. The normalization is given by $\sum_{j=1}^{N} p_{i \to j}(t) = 1$ for all $i \in V$ and $t > 0$. Variable $\alpha$ represents the propensity for the random walk to backtrack, $\beta$ the weight of reaching a common neighbor of the currently visited node and the node visited in the last step, and $\gamma$ the weight of exploring any of the other neighbors. A large $\beta$ value implies an approximate breadth-first sampling, and a large $\gamma$ value implies an approximate depth-first sampling [6]. If $\alpha = \beta = \gamma \neq 0$, the node2vec random walk is reduced to a simple random walk. If $\alpha = 0$ and $\beta = \gamma \neq 0$, the node2vec random walk is a non-backtracking random walk [11,12]. Possible one-step transitions of the node2vec random walk are schematically shown in Fig. 1.
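As a minimal sketch of the transition rule in Eq. (1), the following Python function computes the one-step probabilities for a walker that is at node `cur` and arrived from node `prev`. The function name `node2vec_step_probs`, the dict-of-dicts adjacency format, and the five-node example network (one previous node, one common neighbor, and two other neighbors, as in the situation described for Fig. 1) are assumptions made for illustration and do not come from the original study.

```python
def node2vec_step_probs(adj, prev, cur, alpha, beta, gamma):
    """One-step node2vec probabilities out of `cur`, given the last node `prev`.

    adj is a dict of dicts: adj[i][j] is the weight w_ij of undirected edge (i, j).
    """
    raw = {}
    for nxt, w in adj[cur].items():
        if nxt == prev:            # back-tracking move
            raw[nxt] = alpha * w
        elif nxt in adj[prev]:     # common neighbor of prev and cur
            raw[nxt] = beta * w
        else:                      # any other neighbor of cur
            raw[nxt] = gamma * w
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no admissible move from this (prev, cur) pair")
    return {nxt: x / total for nxt, x in raw.items()}


# Hypothetical unweighted example: the walker sits at node 0 and came from node 1.
# Node 2 is adjacent to both 0 and 1; nodes 3 and 4 are adjacent to 0 only.
adj = {
    0: {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0},
    3: {0: 1.0},
    4: {0: 1.0},
}
probs = node2vec_step_probs(adj, prev=1, cur=0, alpha=1.0, beta=2.0, gamma=0.5)
print(probs)
```

With these illustrative parameter values the normalizing constant is $\alpha + \beta + 2\gamma = 4$, so the walker backtracks with probability 1/4, moves to the common neighbor with probability 1/2, and moves to each of the two remaining neighbors with probability 1/8.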
Equation (1) implies that a node2vec random walk is a second-order Markov chain [6]. In other words, the transition probability $p_{i \to j}(t)$ depends on the currently visited node $i$ and the node visited in the previous time step (i.e., $t - 1$), but not on the further history of the walk. To transform the node2vec random walk into a first-order Markov chain, we change the state space from the nodes of the network to the directed edges of the network, similar to the formation of memory networks [13,14]. Let $M$ denote the number of undirected edges. Let $\mathcal{E} = \{e_1, \ldots, e_{2M}\}$ be the set of directed edges, which consists of each undirected edge $(u, v) \in E$ duplicated as directed edges $(u, v)$ and $(v, u)$. For notational convenience, we use $(\cdot, \cdot)$ to represent both an undirected and a directed edge. For $e = (u, v) \in \mathcal{E}$, we denote $e(0) = u$ and $e(1) = v$. Under this transformation, the $2M \times 2M$ transition probability matrix $T$ is given by
$$T_{i,j} \propto \begin{cases} \alpha w_{e_j} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) = e_j(1), \\ \beta w_{e_j} & \text{if } e_i(1) = e_j(0) \text{ and } (e_i(0), e_j(1)) \in E, \\ \gamma w_{e_j} & \text{if } e_i(1) = e_j(0) \text{ and } (e_i(0), e_j(1)) \notin E, \\ 0 & \text{otherwise}, \end{cases} \tag{2}$$
where $w_{e_j}$ is the weight of edge $e_j$ and the cases are read from top to bottom. The normalization is given by $\sum_{j=1}^{2M} T_{i,j} = 1$ for $i = 1, 2, \ldots, 2M$.

Figure 1: Schematic of the node2vec random walk. We assume that the network is unweighted. The transition probability to one of the four neighbors at time $t$ in this example is given by $\alpha/(\alpha + \beta + 2\gamma)$, $\beta/(\alpha + \beta + 2\gamma)$, or $\gamma/(\alpha + \beta + 2\gamma)$.
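The change of state space from nodes to directed edges can be sketched in a few lines of Python. The helper name `edge_transition_matrix`, the lexicographic ordering of the directed edges, and the four-node example (a triangle with one pendant node) are hypothetical choices made here for illustration; the construction itself only follows the case distinction of the transition rule described above.

```python
def edge_transition_matrix(adj, alpha, beta, gamma):
    """Return (edges, T): the ordered directed edges and the 2M x 2M matrix T.

    adj[i][j] is the weight of undirected edge (i, j); T is a list of rows.
    """
    edges = [(u, v) for u in sorted(adj) for v in sorted(adj[u])]
    index = {e: k for k, e in enumerate(edges)}
    T = [[0.0] * len(edges) for _ in edges]
    for (u, v), i in index.items():
        raw = {}
        for x, w in adj[v].items():      # only edges (v, x) can follow (u, v)
            if x == u:                   # e_i(0) = e_j(1): back-tracking
                raw[x] = alpha * w
            elif x in adj[u]:            # (e_i(0), e_j(1)) is an edge: triangle
                raw[x] = beta * w
            else:                        # all remaining neighbors of v
                raw[x] = gamma * w
        total = sum(raw.values())
        for x, r in raw.items():
            T[i][index[(v, x)]] = r / total
    return edges, T


# Hypothetical example: triangle 0-1-2 with pendant node 3 attached to node 2.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0},
}
edges, T = edge_transition_matrix(adj, alpha=1.0, beta=2.0, gamma=1.0)
idx = {e: k for k, e in enumerate(edges)}
print(len(edges), T[idx[(0, 2)]][idx[(2, 1)]])
```

In this example a walker that has just traversed $(0, 2)$ continues along $(2, 1)$ with probability $\beta/(\alpha + \beta + \gamma)$ because nodes 0, 1, and 2 form a triangle, and a walker that has traversed $(2, 3)$ must backtrack because node 3 has degree 1.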

Stationary probability in special cases
We start by briefly reviewing some definitions. A directed network is strongly connected if there exists a directed path from $u$ to $v$ and from $v$ to $u$ for any nodes $u$ and $v$. We say that a network is aperiodic if the greatest common divisor of the lengths of all the closed directed paths is equal to 1. Most empirical networks are aperiodic, although there are important exceptions such as bipartite networks, including trees. Therefore, we assume aperiodicity throughout this paper. A node2vec random walk on a strongly connected aperiodic finite network with state space $\mathcal{E}$ (i.e., the set of directed edges) induces a unique positive probability vector $q^* = (q^*_1, \ldots, q^*_{2M})$, where $q^*_j$ is the stationary probability on directed edge $e_j$ ($j = 1, 2, \ldots, 2M$), such that
$$q^* T = q^*. \tag{3}$$
Denote $p^* = (p^*_1, \ldots, p^*_N)$, where $p^*_i$ is the stationary probability at node $i$ ($i = 1, \ldots, N$). Probability vectors $p^*$ and $q^*$ are related by
$$p^*_i = \sum_{j;\, e_j(1) = i} q^*_j. \tag{4}$$
In particular, if the network is undirected and the random walk is simple (i.e., $\alpha = \beta = \gamma$), one obtains
$$q^*_j = \frac{w_{e_j}}{\sum_{\ell=1}^{2M} w_{e_\ell}}. \tag{5}$$
Therefore, for a simple random walk on undirected networks, we recover the well-known result given by
$$p^*_i = \frac{d_i}{\sum_{\ell=1}^{N} d_\ell}, \tag{6}$$
where $d_i$ is the weighted degree, which is called the node strength, of node $i$. We say that a network is simple if it is unweighted, undirected, and free of self-loops and multiple edges. Non-backtracking random walks on a simple finite network with degree $d_i \geq 2$ ($i = 1, \ldots, N$) have the same stationary distribution as the simple random walk [11]. Here we present a slight generalization of this result, stated as follows.
Theorem 1. For a node2vec random walk on a simple finite network, the stationary distribution is the same as that for the simple random walk if $\beta = \gamma$ and $\alpha > 0$. In other words, it is given by Eq. (5). Therefore, the stationary distribution for nodes is given by Eq. (6).
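Proof. Let $\beta = \gamma$. In this case, we do not have to distinguish whether or not nodes $e_i(0)$, $e_i(1) = e_j(0)$, and $e_j(1)$ form a triangle. Therefore, the transition probability matrix is given by
$$T_{i,j} = \begin{cases} \dfrac{\alpha}{\alpha + (d_{e_i(1)} - 1)\gamma} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) = e_j(1), \\ \dfrac{\gamma}{\alpha + (d_{e_i(1)} - 1)\gamma} & \text{if } e_i(1) = e_j(0) \text{ and } e_i(0) \neq e_j(1), \\ 0 & \text{otherwise}. \end{cases} \tag{7}$$
It is straightforward to verify that $T$ has a left eigenvector $\mathbf{1} = (1, \ldots, 1)$ such that $\mathbf{1} T = \mathbf{1}$. Indeed, the column of $T$ corresponding to directed edge $(v, x)$ receives $\alpha / [\alpha + (d_v - 1)\gamma]$ from the backtracking edge $(x, v)$ and $\gamma / [\alpha + (d_v - 1)\gamma]$ from each of the other $d_v - 1$ directed edges incoming to $v$, and these contributions sum to 1. Because of the uniqueness of the Perron-Frobenius vector, the stationary distribution is given by Eq. (5).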
We remark that Theorem 1 allows nodes with degree 1. If a node2vec random walker arrives at a node with degree 1, it always backtracks in the next time step because backtracking is the only possible move. This is consistent with the assumption α > 0 in the theorem.
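The key step of the proof, namely that the all-ones vector is a left eigenvector of $T$ when $\beta = \gamma$, can be checked numerically by summing the columns of $T$. The sketch below is only an illustration under assumed names (`edge_rows`, `column_sums`) and an assumed four-node simple network that contains a degree-1 node; it stores $T$ sparsely as a dictionary of rows.

```python
def edge_rows(adj, alpha, beta, gamma):
    """Sparse rows of T: rows[(u, v)][(v, x)] is the probability of (u, v) -> (v, x)."""
    rows = {}
    for u in adj:
        for v in adj[u]:
            raw = {x: w * (alpha if x == u else beta if x in adj[u] else gamma)
                   for x, w in adj[v].items()}
            total = sum(raw.values())
            rows[(u, v)] = {(v, x): r / total for x, r in raw.items()}
    return rows


def column_sums(adj, alpha, beta, gamma):
    """Sum of each column of T, i.e., the entries of the row vector 1 T."""
    rows = edge_rows(adj, alpha, beta, gamma)
    sums = {e: 0.0 for e in rows}
    for row in rows.values():
        for target, p in row.items():
            sums[target] += p
    return sums


# Hypothetical simple network: triangle 0-1-2 plus pendant node 3 (degree 1).
adj = {
    0: {1: 1, 2: 1},
    1: {0: 1, 2: 1},
    2: {0: 1, 1: 1, 3: 1},
    3: {2: 1},
}
equal_case = column_sums(adj, alpha=0.3, beta=1.0, gamma=1.0)    # beta == gamma
unequal_case = column_sums(adj, alpha=0.3, beta=3.0, gamma=1.0)  # beta != gamma
print(max(abs(s - 1.0) for s in equal_case.values()))
print(max(abs(s - 1.0) for s in unequal_case.values()))
```

When $\beta = \gamma$ every column sums to 1, so the uniform vector over directed edges is stationary. When $\beta \neq \gamma$ the column sums deviate from 1 in this example, which illustrates that the condition of the theorem is doing real work on networks with triangles.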
We now examine how symmetry in the network constrains the stationary distribution of the node2vec random walk. Consider a network $G(V, E)$ and its corresponding adjacency matrix $A$, where $G$ can be directed or undirected, and weighted or unweighted. An automorphism $\pi$ of network $G$ is a permutation of the nodes that preserves the adjacency of the nodes [15][16][17][18]. In other words, automorphism $\pi : V \to V$ is a bijection that satisfies $A_{ij} = A_{\pi(i)\pi(j)}$ for any $i, j = 1, \ldots, N$. Two nodes, denoted by $v$ and $v'$, are said to be automorphically equivalent if there is an automorphism that maps one node to the other, i.e., $\pi(v) = v'$ [15,16]. A vertex-transitive network is an undirected network in which any pair of nodes is automorphically equivalent [17,19].
Theorem 2. If nodes $u$ and $v$ are automorphically equivalent in undirected network $G(V, E)$, then they have the same stationary probability of being visited by a node2vec random walker, i.e., $p^*_u = p^*_v$.
Proof. Let $\pi$ be an automorphism of $G$. Let $\mathcal{E} = \{e_1, e_2, \ldots, e_{2M}\}$ be an ordered set of the directed edges in the undirected network $G$, in which each undirected edge $(u, v) \in E$ is duplicated as directed edges $(u, v) \in \mathcal{E}$ and $(v, u) \in \mathcal{E}$. Define $\phi(\mathcal{E}) = \{\phi(e_1), \phi(e_2), \ldots, \phi(e_{2M})\}$, where directed edge $\phi(e_i) := (\pi(e_i(0)), \pi(e_i(1)))$ for $i = 1, \ldots, 2M$. Because $\phi(e_i) \in \mathcal{E}$ and $\phi(e_i) \neq \phi(e_j)$ if $i \neq j$, set $\phi(\mathcal{E})$ is also an ordered set of the directed edges in $G$. Therefore, $\phi$ is a permutation of $\mathcal{E}$.
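First, we show that $\phi$ is an automorphism of a directed weighted network $\tilde{G}$ derived from $G$. In $\tilde{G}$, the set of nodes is given by the set of directed edges $\mathcal{E}$ of $G$, and the set of edges is specified by the weighted adjacency matrix, $T$, given by Eq. (2). Therefore, two directed edges of $G$ (i.e., nodes of $\tilde{G}$), denoted by $e_i$ and $e_j$, are connected by a directed edge of $\tilde{G}$ if and only if random walkers that have traversed $e_i$ may traverse $e_j$ in the next time step. Formally, for arbitrary $e_i, e_j \in \mathcal{E}$, ordered pair $(\phi(e_i), \phi(e_j))$ is an edge of $\tilde{G}$ if and only if $(e_i, e_j)$ is an edge of $\tilde{G}$, because $\pi(e_i(1)) = \pi(e_j(0))$ if and only if $e_i(1) = e_j(0)$. We also obtain
$$T_{\phi(e_i), \phi(e_j)} \propto \begin{cases} \alpha w_{\phi(e_j)} & \text{if } \pi(e_i(1)) = \pi(e_j(0)) \text{ and } \pi(e_i(0)) = \pi(e_j(1)), \\ \beta w_{\phi(e_j)} & \text{if } \pi(e_i(1)) = \pi(e_j(0)) \text{ and } (\pi(e_i(0)), \pi(e_j(1))) \in E, \\ \gamma w_{\phi(e_j)} & \text{if } \pi(e_i(1)) = \pi(e_j(0)) \text{ and } (\pi(e_i(0)), \pi(e_j(1))) \notin E, \\ 0 & \text{otherwise}. \end{cases} \tag{8}$$
Because $\pi$ is a bijection that preserves adjacency and edge weights, each condition in Eq. (8) holds if and only if the corresponding condition in Eq. (2) holds for $e_i$ and $e_j$, and $w_{\phi(e_j)} = w_{e_j}$. Hence $T_{\phi(e_i), \phi(e_j)} = T_{e_i, e_j}$.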
Therefore, $\phi$ is an automorphism of the derived network $\tilde{G}$, whose nodes are the directed edges of $G$. Note that, in Eq. (8), we used, for example, $e_i$ rather than $i$ to refer to the row and column of $T$ to avoid an abuse of notation. Second, we show that automorphically equivalent nodes in $\tilde{G}$ have the same stationary probability of the random walk whose transition probability matrix is given by $T$. To show this, let $T'$ be the weighted adjacency matrix of $\tilde{G}$ when the rows and columns are reordered as $\phi(\mathcal{E}) = \{\phi(e_1), \ldots, \phi(e_{2M})\}$. Because $\phi$ is an automorphism, we obtain
$$T'_{i,j} = T_{\phi(e_i), \phi(e_j)} = T_{e_i, e_j} = T_{i,j} \tag{9}$$
for any $i, j = 1, \ldots, 2M$. Let $q^*$ and $q'^*$ be the stationary probability of the random walk whose transition probability matrix is given by $T$ and $T'$, respectively. Because $T' = T$, we obtain $q'^* = q^*$, i.e., $q^*_{e_i} = q^*_{\phi(e_i)}$, $i = 1, \ldots, 2M$. Finally, assume that $u \in V$ and $v \in V$ are automorphically equivalent in $G$ and connected by an automorphism $\pi$, i.e., $v = \pi(u)$. For any directed edge $e_i$ incoming to $u$, i.e., $e_i(1) = u$, the directed edge $\phi(e_i)$ is incoming to $v$ and satisfies $q^*_{\phi(e_i)} = q^*_{e_i}$. Because this argument holds true for any pair of $e_i$ incoming to $u$ and the corresponding edge incoming to $v$, we use Eq. (4) to conclude that $p^*_u = p^*_v$.
Corollary 1. If network $G$ is vertex-transitive, $p^*_i = 1/N$ for all nodes.
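Corollary 1 can be illustrated numerically on a small vertex-transitive network. The sketch below is an assumption-laden illustration rather than the procedure used in the study: it builds a ring in which each node is adjacent to its nearest and second-nearest neighbors, iterates the edge-level chain from a deliberately asymmetric initial condition, and then sums the stationary mass over the directed edges incoming to each node. The function names, the ring size, the parameter values, and the number of iterations are all hypothetical.

```python
def extended_ring(n):
    """Node i is adjacent to i-2, i-1, i+1, and i+2 (mod n); all weights are 1."""
    return {i: {(i + s) % n: 1.0 for s in (-2, -1, 1, 2)} for i in range(n)}


def edge_rows(adj, alpha, beta, gamma):
    """Sparse rows of T: rows[(u, v)][(v, x)] is the probability of (u, v) -> (v, x)."""
    rows = {}
    for u in adj:
        for v in adj[u]:
            raw = {x: w * (alpha if x == u else beta if x in adj[u] else gamma)
                   for x, w in adj[v].items()}
            total = sum(raw.values())
            rows[(u, v)] = {(v, x): r / total for x, r in raw.items()}
    return rows


def node_stationary(adj, alpha, beta, gamma, steps=5000):
    """Approximate p* by power iteration on directed edges, then aggregate per node."""
    rows = edge_rows(adj, alpha, beta, gamma)
    q = {e: 0.0 for e in rows}
    q[next(iter(rows))] = 1.0          # all mass on one edge: asymmetric start
    for _ in range(steps):
        nxt = {e: 0.0 for e in rows}
        for e, mass in q.items():
            for f, p in rows[e].items():
                nxt[f] += mass * p
        q = nxt
    p_node = {i: 0.0 for i in adj}
    for (u, v), mass in q.items():     # sum over directed edges incoming to v
        p_node[v] += mass
    return p_node


n = 9
p_star = node_stationary(extended_ring(n), alpha=0.5, beta=2.0, gamma=1.0)
print(max(abs(x - 1.0 / n) for x in p_star.values()))
```

The printed deviation from $1/N$ should be at the level of floating-point error for any positive parameter values, whereas the stationary probability on individual directed edges need not be uniform.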

Relaxation time
The relaxation speed of the random walk is governed by the second largest eigenvalue of T in modulus [2,4,20]. The spectral gap defined by 1 − |λ 2 |, where λ 2 is the second largest eigenvalue of T in modulus, quantifies the relaxation speed (see SM for numerical examples). A large spectral gap implies a fast convergence. A node2vec random walk is specified by three parameters α, β, and γ. Because only the ratio among α, β, and γ specifies the transition probabilities, we set γ = 1. Note that we are not interested in the case γ = 0 because it implies that the walker always backtracks or visits the neighbor of the previously visited node without exploring a node different from v t−1 or its neighbor. In this section, we examine relaxation time of node2vec random walks on empirical and synthetic networks.
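As a rough sketch of how the spectral gap $1 - |\lambda_2|$ can be evaluated for a given network, the following code builds $T$ on the directed edges and sorts its eigenvalues by modulus with NumPy. The helper names, the dense-matrix approach, and the six-node example (two triangles joined by a single edge) are assumptions for illustration; for large networks a sparse eigenvalue solver would be more appropriate.

```python
import numpy as np


def edge_transition_matrix(adj, alpha, beta, gamma):
    """Dense 2M x 2M transition matrix on directed edges (row-stochastic)."""
    edges = [(u, v) for u in sorted(adj) for v in sorted(adj[u])]
    index = {e: k for k, e in enumerate(edges)}
    T = np.zeros((len(edges), len(edges)))
    for (u, v), i in index.items():
        for x, w in adj[v].items():
            bias = alpha if x == u else beta if x in adj[u] else gamma
            T[i, index[(v, x)]] = bias * w
        T[i] /= T[i].sum()             # normalize each row to sum to 1
    return T


def spectral_gap(T):
    """Return 1 - |lambda_2|, with eigenvalues ordered by decreasing modulus."""
    moduli = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return 1.0 - moduli[1]


# Hypothetical example: two triangles connected by the edge (2, 3).
edge_list = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {i: {} for i in range(6)}
for u, v in edge_list:
    adj[u][v] = adj[v][u] = 1.0

T = edge_transition_matrix(adj, alpha=0.5, beta=0.5, gamma=1.0)
T_scaled = edge_transition_matrix(adj, alpha=1.0, beta=1.0, gamma=2.0)
gap = spectral_gap(T)
print(gap, np.allclose(T, T_scaled))
```

The comparison between `T` and `T_scaled` illustrates why fixing $\gamma = 1$ loses no generality: multiplying $\alpha$, $\beta$, and $\gamma$ by a common factor leaves every transition probability unchanged.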

Empirical networks
We study node2vec random walks on six empirical networks. Basic properties of the data sets are shown in Table 1. All the networks are treated as unweighted and undirected networks. The data sets can be downloaded at [21][22][23][24][25][26].
The voles network is one of the 128 wild vole networks gathered in Kielder Forest on the English-Scottish border around 2001 [27]. Each node denotes a vole. An edge is present if two voles were caught in at least one common trap. The dolphin network is a social network, in which nodes are bottlenose dolphins, and an edge is present if there is a frequent association between two bottlenose dolphins [28]. The Enron email data set was collected and prepared by the CALO (A Cognitive Assistant that Learns and Organizes) project [29]. Each node represents a manager or an employee of the Enron Corporation. There is an edge between two nodes if at least one email was exchanged between the two individuals. The jazz network is constructed based on collaboration between jazz bands [30]. Each node denotes a band. Two nodes are adjacent if they had a musician in common at any time between 1912 and 1940. The coauthorship network represents coauthor relationships between authors who published papers on network science up to 2006 [31]. The original data set has 1589 nodes, and we only use the largest connected component. The email network was gathered from the Universitat Rovira i Virgili in Tarragona, Spain, and contains 1669 users [32]. Each node represents an email address. An edge is present between two nodes if there was at least one email communication between them. Among the 1669 nodes, 1133 belong to the largest connected component, which we use in the following analysis.
Figure 2 shows the numerically calculated spectral gap for the different empirical networks when we vary the $\alpha$ and $\beta$ values while keeping $\gamma = 1$. The figure suggests that the spectral gap tends to decrease as $\alpha$ or $\beta$ increases for all the networks. The global maximum value of the spectral gap is obtained near $(\alpha, \beta) = (0, 0)$.
Therefore, smaller α and β values, which imply a larger probability of exploring the network without backtracking or visiting common neighbors of the presently visited node and the last visited node, accelerate relaxation. In Figs. 2(d), 2(e), and 2(f), the spectral gap is small for excessively small β even when α is relatively large. It is probably because a tiny β value compels the random walker to leave local neighbors of a node, such as a community, before it sufficiently explores the neighborhood with a breadth-first sampling mechanism.

Extended ring network with triangles
Empirical networks are heterogeneous in terms of the nodes' degrees and the local abundance of triangles. Therefore, the stationary probability depends on the $\alpha$ and $\beta$ values given $\gamma = 1$, unless $\beta = 1$. Consequently, the result that small $\alpha$ and $\beta$ values largely accelerate the exploration by node2vec random walkers may partly rely on the change in the stationary probability as $\alpha$ or $\beta$ changes. To exclude this possibility, in this section and in the section on two-layer extended ring networks, we consider model networks whose stationary probability does not depend on $\alpha$ or $\beta$. Our choice of the model networks is based on analytical tractability rather than on sufficient similarity to empirical networks. Specifically, in this section we consider an extended ring network shown in Fig. 3(a). As the figure indicates, each node has degree $k = 4$, and all the nodes are automorphically equivalent to each other. Therefore, Theorem 2 implies that, owing to the symmetry induced by the vertex-transitivity of the network, the stationary probability of the node2vec random walk is given by $p^*_i = 1/N$ regardless of the values of $\alpha$, $\beta$, and $\gamma$.
To analyze the spectral gap, given $k \times k$ matrices $B_i$, where $i = 1, 2, \ldots, n$, we define the $kn \times kn$ block circulant matrix $\mathrm{bcirc}(B_1, B_2, \ldots, B_n)$ by
$$\mathrm{bcirc}(B_1, B_2, \ldots, B_n) = \begin{pmatrix} B_1 & B_2 & \cdots & B_n \\ B_n & B_1 & \cdots & B_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ B_2 & B_3 & \cdots & B_1 \end{pmatrix}.$$
Consider the extended ring network and the set of directed edges $\mathcal{E}$. Note that there are $2M = kN = 4N$ directed edges in $\mathcal{E}$. We order the directed edges in $\mathcal{E}$ as illustrated in Fig. 3(b). Then, the transition probability matrix $T$ is block circulant and is given by $\mathrm{bcirc}(0, A, B, 0, \ldots, 0, C, D)$, where $A$, $B$, $C$, and $D$ are $4 \times 4$ matrices whose entries follow from Eq. (2) under this ordering. We let
$$\rho_j = \exp\left(\frac{2\pi \mathrm{i} j}{N}\right) \tag{16}$$
denote the $N$th roots of 1, where $\mathrm{i}$ is the imaginary unit and $j = 0, 1, 2, \ldots, N - 1$. Then, we define $4 \times 4$ matrices
$$H_j = \rho_j A + \rho_j^2 B + \rho_j^{N-2} C + \rho_j^{N-1} D,$$
where $j = 0, 1, \ldots, N - 1$. In particular, $H_0 = A + B + C + D$ has a right eigenvector $\mathbf{1} = (1, 1, 1, 1)^\top$ corresponding to eigenvalue 1. Theorem 3 in Ref. [33] guarantees that
$$\mathrm{spec}(T) = \bigcup_{j=0}^{N-1} \mathrm{spec}(H_j), \tag{19}$$
where $\mathrm{spec}(\cdot)$ denotes the spectrum of the matrix, i.e., the set of all its eigenvalues (also see Ref. [18]).
Equation (19) allows us to calculate $\mathrm{spec}(T)$, and therefore the spectral gap of $T$, by calculating the spectra of $N$ matrices of size $4 \times 4$. This method reduces the time for computing the spectral gap from $O(N^3)$ to $O(N)$. The method can be generalized to the $k$-regular extended ring without difficulty, where $k$ is an even number larger than 4.
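The reduction from one large eigenvalue problem to many small ones can be illustrated with generic blocks. The sketch below does not use the actual blocks of the extended ring, which depend on the edge ordering of Fig. 3(b); instead it draws four random $4 \times 4$ blocks (an assumption made purely for illustration), assembles $\mathrm{bcirc}(0, A, B, 0, \ldots, 0, C, D)$, and compares the eigenvalues of the full matrix with those of the small matrices $\sum_m \rho_j^{m} B_{m+1}$. The function names and the number of blocks are hypothetical.

```python
import numpy as np


def bcirc(blocks):
    """Block circulant matrix whose first block row is blocks[0], ..., blocks[n-1]."""
    n, k = len(blocks), blocks[0].shape[0]
    T = np.zeros((n * k, n * k), dtype=complex)
    for r in range(n):
        for c in range(n):
            T[r * k:(r + 1) * k, c * k:(c + 1) * k] = blocks[(c - r) % n]
    return T


def bcirc_spectrum(blocks):
    """Eigenvalues obtained from the n small matrices H_j = sum_m rho_j^m blocks[m]."""
    n = len(blocks)
    eigs = []
    for idx in range(n):
        rho = np.exp(2j * np.pi * idx / n)      # idx-th of the n-th roots of 1
        H = sum(rho ** m * B for m, B in enumerate(blocks))
        eigs.extend(np.linalg.eigvals(H))
    return np.array(eigs)


rng = np.random.default_rng(0)
n_blocks = 7
A, B, C, D = (rng.random((4, 4)) for _ in range(4))
zero = np.zeros((4, 4))
blocks = [zero, A, B] + [zero] * (n_blocks - 5) + [C, D]

full = np.linalg.eigvals(bcirc(blocks))   # one (4n) x (4n) eigenvalue problem
reduced = bcirc_spectrum(blocks)          # n eigenvalue problems of size 4 x 4

# Largest distance from any eigenvalue in one set to the nearest one in the other.
mismatch = max(
    max(np.min(np.abs(full - z)) for z in reduced),
    max(np.min(np.abs(reduced - z)) for z in full),
)
print(mismatch)
```

With the blocks ordered as above, the small matrix for index $j$ equals $\rho_j A + \rho_j^2 B + \rho_j^{n-2} C + \rho_j^{n-1} D$, which is the same combination that appears in the analysis of the extended ring.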
The spectral gap of $T$ for the 4-regular extended ring networks with $N = 100$, 1000, and 10000 nodes is shown in Fig. 4. The spectral gap is smaller when $N$ is larger for any $\alpha$ and $\beta$. This result is reasonable because the average path length between node pairs is proportional to $N$ for this network. Furthermore, Fig. 4 indicates that the spectral gap is large when $\alpha$ and $\beta$ are small for any $N$. However, for $N = 100$, the spectral gap is not the largest when $\alpha$ and $\beta$ are the smallest (see Fig. 4(a)). These results are roughly consistent with the results for the empirical networks. When $\alpha$ and $\beta$ are both extremely small, the random walker has to go clockwise or counterclockwise for a long time before changing direction. We consider that the spectral gap is small when $\alpha$ and $\beta$ are both tiny because the walker skips some nodes when unidirectionally sweeping the ring.

Two-layer extended ring network
Similarly, one can also semi-analytically calculate the spectral gap of the transition probability matrix of node2vec random walks on two-layer extended ring networks defined as follows. Consider a pair of extended ring networks, each of which has $N'$ nodes labeled $1, 2, \ldots, N'$ in the same manner, e.g., counterclockwise. Then, we connect the nodes with the same label in the different layers by an edge with weight $w$ (Fig. 5(a)). We assume that the edges within each extended ring have weight 1. The obtained network is an undirected weighted network with $N = 2N'$ nodes. Note that each node $v$ has degree 5; four edges in the same layer as $v$ have weight 1, and the other edge, which connects the two layers, has weight $w$. The network is composed of two communities when $w$ is small. Furthermore, it can be regarded as a multilayer network with two layers under the so-called ordinal coupling [34][35][36].
Because the network is vertex-transitive, Theorem 2 implies that the stationary probability is $p^*_i = 1/N$. To analyze the spectral gap of this network, we label the $5N$ directed edges as shown in Fig. 5(b). The transition probability matrix $T$ is a block circulant matrix given by where and where Theorem 3 in Ref. [33] yields We define and where $\rho_j$ is given by Eq. (16). Because $M_1 + M_2$ and $M_1 - M_2$ are block circulant, one obtains and Therefore, the spectrum of $T$ is given by Similar to the case of mono-layer extended ring networks, this method enables practical computation of the spectrum and the spectral gap for two-layer extended ring networks of various sizes and can be easily generalized to two-layer $k$-regular extended ring networks. Equation (33) implies that one can reduce the computation time from $O(N^3)$ to $O(N)$.
Numerically calculated spectral gaps for the two-layer extended ring networks with N = 200 nodes are shown in Fig. 6 for various α and β values and four values of w. We find that backtracking (i.e., large α) slows down mixing for all the w values. When w is small, the spectral gap increases as α or β decreases (Figs. 6(a), 6(b) and 6(c)). These results are consistent with the results for the empirical networks and the mono-layer extended ring network. When w is large, movements between the two layers are frequent. In this case, the spectral gap decreases as α increases, whereas it is relatively insensitive to β within the range of β values that we have explored (Fig. 6(d)). In this situation, a random walker that visits more neighbors within the same layer by the breadth-first sampling mechanism (i.e., large β) mixes roughly as fast as a walker that frequently switches the layer (i.e., small β). The dependence of the spectral gap on N is examined in the SM.
Last, Fig. 6 indicates that the spectral gap is not monotonic in terms of w for any given α and β values. When w is small ( Fig. 6(a)), walkers find it difficult to transit from one layer to the other, which poses a bottleneck of diffusion. The spectral gap is the largest (i.e., relaxation is the fastest) for an intermediate value of w (w = 0.1 among the four values of w; Fig. 6(b)). When w is larger (Figs. 6(c) and 6(d)), the diffusion is decelerated presumably because exploration within the individual layers is not enough relative to inter-layer moves. This deceleration result is opposite to the previous result that strong inter-layer coupling makes the spectral gap larger than for random walks confined to the individual layers for simple random walks [37]. The difference may be ascribed to the different types of random walks employed in these studies, i.e., simple random walks in Ref. [37] and node2vec random walks in the present study.

Mean coalescence time on two-clique networks
In this section, we provide an analysis that is different from the spectral gap with the aim of supporting our main claim that diffusion accelerates with small α and β values. The voter model is a linear stochastic model of collective opinion formation, where each node in the network has one of the two opinions, denoted by A and B [38]. At least in finite networks, the consensus of opinion A and that of B are the only absorbing states. The duality relationship guarantees that the mean time to consensus is given by the mean time to coalescence of N coalescing random walkers deployed on each node of the edge-reversed network into one walker [2,4,38,39]. There are two random walkers just before all the N walkers coalesce into one walker. Therefore, in this section, we evaluate the mean time to coalescence of two node2vec random walkers as an alternative measure of speed of diffusion.
We consider a weighted network composed of two cliques, each of which has $N' = N/2$ nodes; by definition, each pair of nodes in a clique is adjacent to each other. We assume that the edges forming a clique have weight 1 and that the two cliques are connected by one edge with weight $w$, which we call the bridge (Fig. 7). We refer to the two nodes that are incident to the bridge as portal nodes. Unless $w$ is extremely large, this network is composed of two well-distinguished communities such that diffusion needs a long time when $N$ is large. Because the two portal nodes are automorphically equivalent and so are the $N - 2$ non-portal nodes, the stationary probability for a single node2vec random walker takes one value at the portal nodes and another value at the non-portal nodes.
The state of two coalescing node2vec random walkers is described by the currently visited node and the last visited node of each walker. In every time step, we update the position of one of the two walkers using the link dynamics rule [40,41]. In other words, we select one of the two walkers with equal probability (i.e., 1/2), and the selected walker then makes a single move according to the rule of node2vec. This dynamics repeats until the two walkers meet at the same node and coalesce.
Specifying the currently visited and last visited nodes for the two walkers is equivalent to specifying two directed edges (while the network is assumed to be undirected). By exploiting the automorphical equivalence of the two portal nodes and that of the N − 2 non-portal nodes, we only need to distinguish the following types of the pairs of directed edges for specifying the state of the pair of the walkers. The possible states are enumerated in Table 2 and schematically shown in Fig. 7. A first level of classification of the pair of directed edges is whether they are in the same or different cliques, or on the bridge. Owing to the symmetry, if the two directed edges are contained in the same clique, we do not need to know which of the two cliques contains the two edges. There are ten such states. Alternatively, the two edges may belong to the opposite cliques. There are six such states. As the third and last possibility, one of the two edges may be on the bridge. There are five such states. Note that it is impossible for both edges to be on the bridge because it would mean that the walkers coalesced in a previous time step.
A second level of classification is based on whether or not and how the two directed edges share a node. At this classification level, we distinguish between four configurations, which are schematically shown in Fig. 8. First, we say that two directed edges $e_1$ and $e_2$ are disjoint if they do not share a node, i.e., $e_1(0) \neq e_2(0)$, $e_1(0) \neq e_2(1)$, $e_1(1) \neq e_2(0)$, and $e_1(1) \neq e_2(1)$ (Fig. 8(a)). Second, $e_1$ and $e_2$ are divergent if $e_1(0) = e_2(0)$ and $e_1(1) \neq e_2(1)$ (Fig. 8(b)). Third, the two edges are said to be chasing if $e_1(1) = e_2(0)$ and $e_1(0) \neq e_2(1)$, or $e_1(0) = e_2(1)$ and $e_1(1) \neq e_2(0)$ (Fig. 8(c)). Fourth, if $e_1(1) = e_2(1)$, we say that the two edges are confluent (Fig. 8(d)), which implies the coalescence of the two walkers. In some cases, in addition to applying the aforementioned two levels of the classification scheme, one has to distinguish between different states depending on whether or not and how the nodes coincide with the portal node. For example, Table 2 indicates that there are three states for a pair of directed edges that qualify as "same clique" (according to the first-level classification) and "disjoint" (second-level). The exhaustive classification yields 21 states excluding the coalescent (i.e., confluent) state. We use the state numbers 1 through 21 as the row and column indices of the transition probability matrix. We assign state 22 to the coalescent state.
Let $p_i(t)$ be the probability that the two walkers are in state $i$ ($i = 1, 2, \ldots, 21$) at time $t$ and $r(t)$ the probability that the two walkers coalesce at time $t$. Let $T^{\mathrm{CRW}}$ be the $22 \times 22$ transition probability matrix derived in the Appendix, and let $S$ be the $21 \times 21$ submatrix of $T^{\mathrm{CRW}}$ that one obtains by removing its last row and column, which correspond to the confluent state. Note that $T^{\mathrm{CRW}}_{22,j} = \delta_{j,22}$, where $\delta$ is the Kronecker delta. We obtain [42]
$$p(t) = p(0) S^t, \tag{35}$$
where $p(t) = (p_1(t), \ldots, p_{21}(t))$, and
$$r(t) = p(t-1)\, v, \tag{36}$$
where $v = (T^{\mathrm{CRW}}_{1,22}, \ldots, T^{\mathrm{CRW}}_{21,22})^\top$. Therefore, the mean coalescence time is given by
$$\langle \tau \rangle = \sum_{t=1}^{\infty} t\, r(t) = p(0) (I - S)^{-2} v, \tag{37}$$
where $I$ is the identity matrix.
We consider the two-clique network with $N = 200$ nodes (i.e., $N' = 100$ nodes in each clique) and three initial conditions, i.e., two walkers starting from the same clique, the opposite cliques, or either clique with probability 1/2 independently for the different walkers. Specifically, we define the initial condition under which the two walkers start from the same clique by $p_j(0) = 1/12$ for $j = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 21$, and $p_j(0) = 0$ otherwise. The initial condition under which the two walkers start from the opposite cliques is defined by $p_j(0) = 1/9$ for $j = 11, 12, 13, 14, 15, 16, 17, 19, 20$, and $p_j(0) = 0$ otherwise. The initial condition under which the two walkers start from a uniformly randomly selected clique is defined by $p_j(0) = 1/21$ for $j = 1, \ldots, 21$.
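The mean time to absorption of a chain with one absorbing state can be computed in closed form from the transient block $S$ and the vector $v$ of one-step absorption probabilities, using the standard identity $\sum_{t \geq 1} t\, p(0) S^{t-1} v = p(0)(I - S)^{-2} v$. The sketch below checks this identity on a toy chain with three transient states. The $4 \times 4$ matrix is invented for illustration and is unrelated to the $22 \times 22$ matrix of the Appendix; the function name is also an assumption.

```python
import numpy as np


def mean_absorption_time(p0, T_full):
    """Mean time to reach the last (absorbing) state of the row-stochastic T_full."""
    S = T_full[:-1, :-1]               # transitions among transient states
    v = T_full[:-1, -1]                # one-step probabilities of absorption
    F = np.linalg.inv(np.eye(S.shape[0]) - S)
    return p0 @ F @ F @ v              # p(0) (I - S)^{-2} v


# Toy chain (hypothetical numbers): states 0-2 are transient, state 3 is absorbing.
T_full = np.array([
    [0.5, 0.3, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.0, 0.0, 0.0, 1.0],
])
p0 = np.array([1 / 3, 1 / 3, 1 / 3])   # uniform initial condition on transient states

closed_form = mean_absorption_time(p0, T_full)

# Brute-force check: accumulate t * r(t) with r(t) = p(t-1) v and p(t) = p(t-1) S.
S, v = T_full[:-1, :-1], T_full[:-1, -1]
p, series = p0.copy(), 0.0
for t in range(1, 2001):
    series += t * (p @ v)
    p = p @ S

# Equivalent form: since rows sum to 1, v = (I - S) 1, so the mean is p0 (I - S)^{-1} 1.
alt = p0 @ np.linalg.solve(np.eye(3) - S, np.ones(3))
print(closed_form, series, alt)
```

The third expression is often preferable in practice because it requires a single linear solve rather than an explicit matrix inverse.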
We show the mean coalescence time numerically calculated using Eq. (37) in Fig. 9 for the three initial conditions and two values of $w$ (i.e., $w = 1$ and $w = 10$). As expected, the mean coalescence time is considerably smaller if the two walkers start in the same clique (Figs. 9(a) and 9(d)) than in the opposite cliques (Figs. 9(b) and 9(e)). The results for the uniformly random initial condition (Figs. 9(c) and 9(f)) are intermediate between the other two initial conditions. Under each initial condition, the mean coalescence time is larger for $w = 1$ (Figs. 9(a)-(c)) than for $w = 10$ (Figs. 9(d)-(f)) because a large $w$ enables the two walkers to move between cliques relatively frequently so that they have more chances to coalesce.

Discussion
The node2vec algorithm has been recognized as a competitive algorithm for network embedding and has also inspired further network embedding algorithms [43,44]. However, theoretical properties of the node2vec random walks, which are considered to affect the performance and applicability of node2vec, have been underexplored. A previous study provided a theoretical foundation of the stationary probability of node2vec random walks [45]. In the present study, we have investigated properties of node2vec random walks with a particular focus on diffusion speed. We have shown that diffusion measured in terms of the spectral gap and coalescence time is faster when random walkers are encouraged to explore the network without backtracking or visiting common neighbors of the currently visited node and the last visited node. We have confirmed this conclusion for several empirical and model networks except for some cases in which the avoidance of backtracking or visiting the common neighbors is excessive.
Node2vec random walks are a second-order Markov process. Second-order Markov processes have been shown to be a promising representation of temporal network data, as opposed to first-order (i.e., memoryless) Markov processes [13,14]. For temporal network data, second-order random walks find various applications. Therefore, apart from network embedding for which the node2vec random walks are originally used [6], they may also find applications in, for example, community detection, ranking of nodes, network search, and collaborative filtering [4,5]. For example, one may be able to accelerate network search and sampling by setting α and β to small values. However, we have pointed out that the stationary probability depends on the parameters of node2vec random walks, i.e., α and β assuming γ = 1 (also see Ref. [45]). Therefore, applications that depend on the stationary probability have to be carefully considered; one may have to calibrate the dependence of the stationary probability on the α and β values to realize such applications.
In the analysis of the spectral gap of model networks (i.e., the extended ring network with triangles and the two-layer extended ring network), we analyzed networks whose stationary probability is independent of the $\alpha$ and $\beta$ values. To this end, we used vertex-transitive networks, in which all nodes are automorphically equivalent to each other. We avoided the complete graph, which is trivially vertex-transitive, because all the triplets of nodes form a triangle such that the approximate depth-first sampling, which is defined to occur with the probability proportional to $\gamma$, is irrelevant. Both of the vertex-transitive networks that we have employed have a large average path length because they are essentially one-dimensional. This choice allowed us to employ a theorem in Ref. [33] for conveniently calculating the spectrum of block circulant matrices. However, these networks do not resemble most empirical networks, which have a small average path length relative to the number of nodes, $N$ [10,46]. In fact, there are various named vertex-transitive networks, and methods to construct vertex-transitive networks such as Cayley graphs are available in algebraic graph theory [17]. Analysis of the diffusion speed in vertex-transitive and small-world networks (i.e., networks having a small average path length and reasonably many triangles) warrants future work. Analysis of second-order Markov chains with other types of memory also warrants future work.

Appendix: Transition probabilities for a pair of coalescent node2vec random walkers
In this section, we list the transition probability for a pair of coalescent random walkers on the twoclique graph. The non-zero elements of the 22 × 22 transition probability matrix, T CRW , are enumerated as follows: All the other elements of T CRW are equal to 0.